23 May, 2020

1 commit


17 Jan, 2020

1 commit


06 Nov, 2019

1 commit

  • The maximum number of memory controllers is fixed within a family/model
    group. In most cases, this has been fixed at 2, but some systems may
    have up to 8.

    The struct amd64_family_type already contains family/model-specific
    information, and this can be used rather than adding model checks to
    various functions.

    Create a new field in struct amd64_family_type for max_mcs.
    Set this when setting other family type information, and use this when
    needing the maximum number of memory controllers possible for a system.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Cc: "linux-edac@vger.kernel.org"
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: Robert Richter
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20191106012448.243970-4-Yazen.Ghannam@amd.com

    Yazen Ghannam
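
The idea in this commit can be sketched in plain C. This is a simplified stand-in, not the driver's code: `struct family_type_sketch`, its two table entries, and `count_enabled_mcs()` are hypothetical names. Only the notion of a per-family `max_mcs` field comes from the commit text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct amd64_family_type. */
struct family_type_sketch {
    const char *ctl_name;   /* human-readable name (illustrative) */
    unsigned int max_mcs;   /* max memory controllers for this family/model group */
};

/* Illustrative entries only: 2 is the common case, 8 the stated upper bound. */
static const struct family_type_sketch family_types[] = {
    { "example-2mc", 2 },
    { "example-8mc", 8 },
};

/*
 * Walk the controllers using the per-family limit rather than a
 * family/model check inside the function itself.
 */
static unsigned int count_enabled_mcs(const struct family_type_sketch *fam,
                                      const unsigned char *enabled, size_t n)
{
    unsigned int i, count = 0;

    assert(fam != NULL && enabled != NULL);
    for (i = 0; i < fam->max_mcs && i < n; i++)
        if (enabled[i])
            count++;
    return count;
}
```

With this layout, supporting a new family/model group means adding one table entry rather than another model check in each function.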
     

07 Sep, 2019

1 commit

  • Add the new Family 17h Model 70h PCI IDs (device 18h functions 0 and 6)
    to the AMD64 EDAC module.

    [ bp: s/f17_base_addr_to_cs_size/f17_addr_mask_to_cs_size/g ]

    Signed-off-by: Isaac Vaughn
    Signed-off-by: Borislav Petkov
    Cc: James Morse
    Cc: linux-edac@vger.kernel.org
    Cc: Mauro Carvalho Chehab
    Cc: Robert Richter
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20190906192131.8ced0ca112146f32d82b6cae@knights.ucf.edu

    Isaac Vaughn
     

23 Aug, 2019

3 commits

  • Future AMD systems will support asymmetric dual-rank DIMMs. These are
    DIMMs where the ranks are of different sizes.

    The even rank will use the Primary Even Chip Select registers and the
    odd rank will use the Secondary Odd Chip Select registers.

    Recognize if a Secondary Odd Chip Select is being used. Use the
    Secondary Odd Address Mask when calculating the chip select size.

    [ bp: move csrow_sec_enabled() to the header, fix CS_ODD define and
    tone-down the capitalized words spelling. ]

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Cc: "linux-edac@vger.kernel.org"
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20190821235938.118710-8-Yazen.Ghannam@amd.com

    Yazen Ghannam
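
A minimal sketch of the mask selection described in this commit. The register layout is an assumption: `struct cs_regs_sketch`, the `CS_ENABLE` bit position, and `pick_addr_mask()` are hypothetical and are not taken from the driver or the hardware documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CS_ENABLE 0x1u  /* assumed: bit 0 of a base register marks it enabled */

/* Hypothetical per-DIMM snapshot of the relevant registers. */
struct cs_regs_sketch {
    uint32_t base_sec_odd;  /* secondary odd chip select base */
    uint32_t mask_pri;      /* primary address mask */
    uint32_t mask_sec;      /* secondary address mask */
};

/* Is the secondary odd chip select in use? */
static int sec_odd_enabled(const struct cs_regs_sketch *r)
{
    return (r->base_sec_odd & CS_ENABLE) != 0;
}

/*
 * Choose the address mask for a chip select: an odd chip select with its
 * secondary base enabled uses the secondary mask; everything else uses
 * the primary mask.
 */
static uint32_t pick_addr_mask(const struct cs_regs_sketch *r, unsigned int csrow)
{
    assert(r != NULL);
    if ((csrow & 1) && sec_odd_enabled(r))
        return r->mask_sec;
    return r->mask_pri;
}
```

On a symmetric DIMM the secondary base is not enabled, so both ranks fall through to the primary mask and the size calculation is unchanged.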
     
  • AMD Family 17h systems have a set of secondary Chip Select Base
    Addresses and Address Masks. These do not represent unique Chip
    Selects, rather they are used in conjunction with the primary
    Chip Select registers in certain cases.

    Cache these secondary Chip Select registers for future use.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Cc: "linux-edac@vger.kernel.org"
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20190821235938.118710-7-Yazen.Ghannam@amd.com

    Yazen Ghannam
     
  • The struct chip_select array that's used for saving chip select bases
    and masks is fixed at length of two. There should be one struct
    chip_select for each controller, so this array should be increased to
    support systems that may have more than two controllers.

    Increase the size of the struct chip_select array to eight, which is the
    largest number of controllers per die currently supported on AMD
    systems.

    Fix number of DIMMs and Chip Select bases/masks on Family 17h, because
    AMD Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
    channel.

    Also, carve out the Family 17h+ reading of the bases/masks into a
    separate function. This effectively reverts the original bases/masks
    reading code to before Family 17h support was added.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Cc: "linux-edac@vger.kernel.org"
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20190821235938.118710-2-Yazen.Ghannam@amd.com

    Yazen Ghannam
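
The sizing described above might look roughly like the following. It is a sketch under stated assumptions: `struct chip_select_sketch`, `MAX_CS_REGS`, and `prep_f17_counts()` are hypothetical names. Only the figures (eight controllers, 4 CS bases and 2 CS masks per channel on Family 17h) come from the commit text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONTROLLERS 8   /* from the commit text: up to eight controllers per die */
#define MAX_CS_REGS     8   /* assumed upper bound on bases/masks per controller */

/* Hypothetical stand-in for struct chip_select: one instance per controller. */
struct chip_select_sketch {
    uint32_t bases[MAX_CS_REGS];
    uint32_t masks[MAX_CS_REGS];
    uint8_t  b_cnt;         /* number of valid entries in bases[] */
    uint8_t  m_cnt;         /* number of valid entries in masks[] */
};

/* Set the Family 17h counts named in the commit: 4 bases, 2 masks per channel. */
static void prep_f17_counts(struct chip_select_sketch *csels, unsigned int num_ctrls)
{
    unsigned int i;

    assert(csels != NULL && num_ctrls <= MAX_CONTROLLERS);
    for (i = 0; i < num_ctrls; i++) {
        csels[i].b_cnt = 4;
        csels[i].m_cnt = 2;
    }
}
```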
     

25 Apr, 2019

1 commit

  • This reverts commit 0a227af521d6df5286550b62f4b591417170b4ea.

    Unfortunately, this commit caused wrong detection of chip select sizes
    on some F17h client machines:

    --- 00-rc6+ 2019-02-14 14:28:03.126622904 +0100
    +++ 01-rc4+ 2019-04-14 21:06:16.060614790 +0200
    EDAC amd64: MC: 0: 0MB 1: 0MB
    -EDAC amd64: MC: 2: 16383MB 3: 16383MB
    +EDAC amd64: MC: 2: 0MB 3: 2097151MB
    EDAC amd64: MC: 4: 0MB 5: 0MB
    EDAC amd64: MC: 6: 0MB 7: 0MB
    EDAC MC: UMC1 chip selects:
    EDAC amd64: MC: 0: 0MB 1: 0MB
    -EDAC amd64: MC: 2: 16383MB 3: 16383MB
    +EDAC amd64: MC: 2: 0MB 3: 2097151MB
    EDAC amd64: MC: 4: 0MB 5: 0MB
    EDAC amd64: MC: 6: 0MB 7: 0MB

    Revert it for now until it has been solved properly.

    Signed-off-by: Borislav Petkov
    Cc: Yazen Ghannam

    Borislav Petkov
     

27 Mar, 2019

4 commits

  • The struct chip_select array that's used for saving chip select bases
    and masks is fixed at length of two. There should be one struct
    chip_select for each controller, so this array should be increased to
    support systems that may have more than two controllers.

    Increase the size of the struct chip_select array to eight, which is the
    largest number of controllers per die currently supported on AMD
    systems.

    Also, carve out the Family 17h+ reading of the bases/masks into a
    separate function. This effectively reverts the original bases/masks
    reading code to before Family 17h support was added.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Tested-by: Kim Phillips
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190228153558.127292-5-Yazen.Ghannam@amd.com

    Yazen Ghannam
     
  • Future AMD systems may support x16 symbol sizes.

    Recognize if a system is using x16 symbol size. Also, simplify the print
    statement.

    Note that a x16 syndrome vector table is not necessary like with x4 or
    x8 syndromes. This is because systems that support x16 symbol sizes are
    SMCA systems and in that case, the syndrome can be directly extracted
    from the MCA_SYND[Syndrome] field.

    [ bp: massage. ]

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Tested-by: Kim Phillips
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190228153558.127292-4-Yazen.Ghannam@amd.com

    Yazen Ghannam
     
  • The first few models of Family 17h all had 2 Unified Memory Controllers
    per Die, so this was treated as a fixed value. However, future systems
    may have more Unified Memory Controllers per Die.

    Related to this, the channel number and base address of a Unified Memory
    Controller were found by matching on fixed, known values. However,
    current and future systems follow this pattern for the channel number
    and base address of a Unified Memory Controller: 0xYXXXXX, where Y is
    the channel number. So matching on hardcoded values is not necessary.

    Set the number of Unified Memory Controllers at driver init time based
    on the family/model. Also, update the functions that find the channel
    number and base address of a Unified Memory Controller to support more
    than two.

    [ bp: Move num_umcs into the .c file and simplify comment. ]

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Tested-by: Kim Phillips
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190228153558.127292-3-Yazen.Ghannam@amd.com

    Yazen Ghannam
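
The `0xYXXXXX` pattern places the channel number in the top hex digit of a six-digit address, i.e. bits 23:20. A minimal sketch of deriving one from the other follows. The low-order offset `UMC_OFFSET_SKETCH` is an illustrative assumption, and the function names are hypothetical rather than the driver's.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed low-order part (the XXXXX digits); illustrative value only. */
#define UMC_OFFSET_SKETCH 0x50000u

/* Build a base address of the form 0xYXXXXX for a given channel Y. */
static uint32_t umc_base(unsigned int channel)
{
    assert(channel < 16);   /* Y is a single hex digit */
    return UMC_OFFSET_SKETCH + ((uint32_t)channel << 20);
}

/* Recover the channel number Y from an address of the form 0xYXXXXX. */
static unsigned int umc_channel(uint32_t addr)
{
    return (addr >> 20) & 0xF;
}
```

Because both directions are plain shifts, no table of known addresses is needed, and a system with more than two controllers requires no code change beyond the controller count.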
     
  • Add the new Family 17h Model 30h PCI IDs to the AMD64 EDAC module.

    This also fixes a probe failure that appeared when some other PCI IDs
    for Family 17h Model 30h were added to the AMD NB code.

    Fixes: be3518a16ef2 (x86/amd_nb: Add PCI device IDs for family 17h, model 30h)
    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Tested-by: Kim Phillips
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190228153558.127292-1-Yazen.Ghannam@amd.com

    Yazen Ghannam
     

27 Aug, 2018

1 commit

  • Add new device IDs for family 17h, models 10h-2fh.

    This is required by amd64_edac_mod in order to properly detect PCI
    device functions 0 and 6.

    Signed-off-by: Michael Jin
    Reviewed-by: Yazen Ghannam
    Link: http://lkml.kernel.org/r/20180816192840.31166-1-mikhail.jin@gmail.com
    Signed-off-by: Borislav Petkov

    Michael Jin
     

14 Feb, 2017

1 commit


28 Jan, 2017

2 commits

  • Match one of the devices in amd64_cpuids[] before loading the module.
    This is an additional sanity check against users trying to load
    amd64_edac_mod on unsupported systems.

    Signed-off-by: Yazen Ghannam
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1485537863-2707-9-git-send-email-Yazen.Ghannam@amd.com
    [ Get rid of err_ret label, make it a bit more readable this way. ]
    Signed-off-by: Borislav Petkov

    Yazen Ghannam
     
  • amd64_{debug,notice} don't have any users, so remove them.

    Signed-off-by: Yazen Ghannam
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1485537863-2707-6-git-send-email-Yazen.Ghannam@amd.com
    Signed-off-by: Borislav Petkov

    Yazen Ghannam
     

15 Dec, 2016

1 commit


01 Dec, 2016

1 commit


30 Nov, 2016

2 commits

  • How we need to decode UMC errors is different from how we decode bus
    errors, so let's define a new function for this. We also need a way to
    determine the UMC channel since we're not guaranteed that there is a
    fixed relation between channel and MCA bank.

    Signed-off-by: Yazen Ghannam
    Cc: Aravind Gopalakrishnan
    Cc: linux-edac
    Cc: x86-ml
    Link: http://lkml.kernel.org/r/1480359593-80369-1-git-send-email-Yazen.Ghannam@amd.com
    [ Fold in decode_synd_reg(), simplify. ]
    Signed-off-by: Borislav Petkov

    Yazen Ghannam
     
  • Read a few more UMC registers and provide debug output in order to be as
    similar as possible to older AMD systems.

    Signed-off-by: Yazen Ghannam
    Cc: Aravind Gopalakrishnan
    Cc: linux-edac
    Cc: x86-ml
    Link: http://lkml.kernel.org/r/1480344621-14966-1-git-send-email-Yazen.Ghannam@amd.com
    [ Remove unneeded K8 check and comments, fixup others. ]
    Signed-off-by: Borislav Petkov

    Yazen Ghannam
     

29 Nov, 2016

3 commits

  • Fam17h has new register offsets and fields for setting up the DRAM
    scrubber so add support for this.

    Signed-off-by: Yazen Ghannam
    Cc: Aravind Gopalakrishnan
    Cc: linux-edac
    Cc: x86-ml
    Link: http://lkml.kernel.org/r/1479423463-8536-17-git-send-email-Yazen.Ghannam@amd.com
    Signed-off-by: Borislav Petkov

    Yazen Ghannam
     
  • Fam17h has a different set of registers and bitfields. Most of these
    registers are read through SMN (System Management Network) rather
    than PCI config space. Also, the derivation of various values is now
    different.

    Update amd64_edac to read the appropriate registers and extract the
    correct values for Fam17h.

    Signed-off-by: Yazen Ghannam
    Cc: Aravind Gopalakrishnan
    Cc: linux-edac
    Cc: x86-ml
    Link: http://lkml.kernel.org/r/1479423463-8536-12-git-send-email-Yazen.Ghannam@amd.com
    [ Save us the indentation level in read_mc_regs(), add defines ]
    Signed-off-by: Borislav Petkov

    Yazen Ghannam
     
  • Fam17h needs PCI device functions 0 and 6 instead of 1 and 2 as on older
    systems. Update struct amd64_pvt to hold the new functions and reserve
    them if on Fam17h.

    Also, allocate an array of UMC structs within our newly allocated PVT
    struct.

    Signed-off-by: Yazen Ghannam
    Cc: Aravind Gopalakrishnan
    Cc: linux-edac
    Cc: x86-ml
    Link: http://lkml.kernel.org/r/1479423463-8536-11-git-send-email-Yazen.Ghannam@amd.com
    [ init_one_instance() error handling, shorten lines, unbreak >80 cols lines. ]
    Signed-off-by: Borislav Petkov

    Yazen Ghannam
     

25 Nov, 2016

2 commits

  • Add a family type and associated ops for Fam17h. Define a struct to hold
    all the UMC registers that we need. Make this a part of struct amd64_pvt
    in order to maximize code reuse in the rest of the driver.

    Signed-off-by: Yazen Ghannam
    Cc: Aravind Gopalakrishnan
    Cc: linux-edac
    Cc: x86-ml
    Link: http://lkml.kernel.org/r/1479423463-8536-10-git-send-email-Yazen.Ghannam@amd.com
    Signed-off-by: Borislav Petkov

    Yazen Ghannam
     
  • Update the ecc_enabled() function to work on Fam17h. This entails
    reading a different set of registers and using the SMN (System
    Management Network) rather than PCI devices.

    Signed-off-by: Yazen Ghannam
    Cc: Aravind Gopalakrishnan
    Cc: linux-edac
    Cc: x86-ml
    Link: http://lkml.kernel.org/r/1479423463-8536-9-git-send-email-Yazen.Ghannam@amd.com
    [ Fixup ecc_en assignment and get_umc_base(). ]
    Signed-off-by: Borislav Petkov

    Yazen Ghannam
     

10 May, 2016

1 commit

  • - remove homegrown instances counting.
    - take F3 PCI device from amd_nb caching instead of F2 which was used with the
      PCI core.

    With those changes, the driver doesn't need to register a PCI driver and
    relies on the northbridges caching which we do anyway on AMD.

    Signed-off-by: Borislav Petkov
    Cc: Yazen Ghannam

    Borislav Petkov
     

29 Sep, 2015

2 commits

  • Git provides us all the changelogs anyway. So trim the comments section
    here. Update the copyrights info while at it.

    Signed-off-by: Aravind Gopalakrishnan
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1443440593-2316-3-git-send-email-Aravind.Gopalakrishnan@amd.com
    Signed-off-by: Borislav Petkov

    Aravind Gopalakrishnan
     
  • The scrub rate control register has moved to function 2 in PCI config
    space and is at a different offset on family 0x15, models 0x60 and
    later. The minimum recommended scrub rate has also changed. (Refer to
    D18F2x1c9_dct[1:0][DramScrub] in Fam15hM60h BKDG).

    Adjust set_scrub_rate() and get_scrub_rate() functions to accommodate
    this.

    Tested on F15hM60h, Fam15h, models 00h-0fh and Fam10h systems.

    Signed-off-by: Aravind Gopalakrishnan
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1443440593-2316-2-git-send-email-Aravind.Gopalakrishnan@amd.com
    [ Cleanup conditionals. ]
    Signed-off-by: Borislav Petkov

    Aravind Gopalakrishnan
     

23 Feb, 2015

1 commit

  • Instead of calling device_create_file() and device_remove_file()
    manually, pass the static attribute groups with the new
    edac_mc_add_mc_with_groups(). The conditional creation of inject sysfs
    files is done by a proper is_visible callback.

    Signed-off-by: Takashi Iwai
    Link: http://lkml.kernel.org/r/1423046938-18111-4-git-send-email-tiwai@suse.de
    Signed-off-by: Borislav Petkov

    Takashi Iwai
     

30 Oct, 2014

1 commit

  • This patch adds support for ECC error decoding for F15h M60h processor.
    Aside from the usual changes, the patch adds support for some new features
    in the processor:
    - DDR4 (unbuffered, registered); LRDIMM DDR3 support
      - relevant debug messages have been modified/added to report these
        memory types
    - new dbam_to_cs mappers
      - if (F15h M60h && LRDIMM); we need a 'multiplier' value to find
        cs_size. This multiplier value is obtained from the per-dimm
        DCSM register. So, change the interface to accept a 'cs_mask_nr'
        value to facilitate this calculation
    - switch-casing determine_memory_type()
      - done to cleanse the function of too many if-else statements
        and improve readability
      - This is now called early in read_mc_regs() to cache dram_type

    Misc cleanup:
    - amd64_pci_table[] is condensed by using PCI_VDEVICE macro.

    Testing details:
    Tested the patch by injecting 'ECC' type errors using mce_amd_inj
    and error decoding works fine.

    Signed-off-by: Aravind Gopalakrishnan
    Link: http://lkml.kernel.org/r/1414617483-4941-1-git-send-email-Aravind.Gopalakrishnan@amd.com
    [ Boris: determine_memory_type() cleanups ]
    Signed-off-by: Borislav Petkov

    Aravind Gopalakrishnan
     

23 Sep, 2014

1 commit

  • Rationale behind this change:
    - F2x1xx addresses were stopped from being mapped explicitly to DCT1
      from F15h (OR) onwards. They use _dct[0:1] mechanism to access the
      registers. So we should move away from using address ranges to select
      DCT for these families.
    - On newer processors, the address ranges used to indicate DCT1 (0x140,
      0x1a0) have different meanings than what is assumed currently.

    Changes introduced:
    - amd64_read_dct_pci_cfg() now takes in dct value and uses it for
      'selecting the dct'
    - Update usage of the function. Keep in mind that different families
      have specific handling requirements
    - Remove [k8|f10]_read_dct_pci_cfg() as they don't do much different
      from amd64_read_pci_cfg()
      - Move the k8 specific check to amd64_read_pci_cfg
    - Remove f15_read_dct_pci_cfg() and move logic to amd64_read_dct_pci_cfg()
    - Remove now needless .read_dct_pci_cfg

    Testing:
    - Tested on Fam 10h; Fam15h Models: 00h, 30h; Fam16h using 'EDAC_DEBUG'
      and mce_amd_inj
      - driver obtains info from F2x registers and caches it in pvt
        structures correctly
      - ECC decoding works fine

    Signed-off-by: Aravind Gopalakrishnan
    Link: http://lkml.kernel.org/r/1410799058-3149-1-git-send-email-aravind.gopalakrishnan@amd.com
    Signed-off-by: Borislav Petkov

    Aravind Gopalakrishnan
     

28 Feb, 2014

1 commit

  • Extend ECC decoding support for F16h M30h. Tested on F16h M30h with ECC
    turned on using mce_amd_inj module and the patch works fine.

    Signed-off-by: Aravind Gopalakrishnan
    Link: http://lkml.kernel.org/r/1392913726-16961-1-git-send-email-Aravind.Gopalakrishnan@amd.com
    Tested-by: Arindam Nath
    Acked-by: H. Peter Anvin
    Signed-off-by: Borislav Petkov

    Aravind Gopalakrishnan
     

22 Oct, 2013

1 commit

  • GENMASK is used to create a contiguous bitmask ([hi:lo]). It is
    implemented twice in the current kernel: once in the EDAC driver and
    once in the SiS/XGI FB driver. Move it to a more generic place so that
    other code can use it.

    Signed-off-by: Chen, Gong
    Cc: Borislav Petkov
    Cc: Thomas Winischhofer
    Cc: Jean-Christophe Plagniol-Villard
    Cc: Tomi Valkeinen
    Acked-by: Borislav Petkov
    Acked-by: Mauro Carvalho Chehab
    Signed-off-by: Tony Luck

    Chen, Gong
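
For readers unfamiliar with the idea, a contiguous [hi:lo] bitmask can be built as below. This is a standalone sketch with a hypothetical helper name, `genmask64()`, and is not the kernel's macro definition.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return a 64-bit value with bits hi..lo (inclusive) set and all other
 * bits clear. Requires lo <= hi <= 63.
 */
static uint64_t genmask64(unsigned int hi, unsigned int lo)
{
    assert(lo <= hi && hi < 64);
    /* Clear everything above hi, then everything below lo. */
    return (~UINT64_C(0) >> (63 - hi)) & (~UINT64_C(0) << lo);
}
```

For example, `genmask64(7, 4)` yields `0xF0`, which can be ANDed with a register value to isolate bits 7:4. Writing the mask as two shifts of an all-ones value avoids the undefined shift by 64 that a naive `(1 << width) - 1` form hits for a full-width mask.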
     

12 Aug, 2013

2 commits

  • Now that we cache (family, model, stepping) locally, use them instead of
    boot_cpu_data.

    No functionality change.

    Signed-off-by: Borislav Petkov

    Borislav Petkov
     
  • On newer models, support has been included for up to 4 DCTs; however,
    only DCT0 and DCT3 are currently configured (cf. BKDG Section 2.10).
    Also, the routing DRAM Requests algorithm is different for F15h M30h.
    Thus it is cleaner to use a brand new function rather than adding quirks
    to the more generic f1x_match_to_this_node(). Refer to "2.10.5 DRAM
    Routing Requests" in the BKDG for further info.

    Tested on Fam15h M30h with ECC turned on using mce_amd_inj facility and
    verified to be functionally correct.

    While at it, verify if erratum workarounds for E505 and E637 still hold.
    From email conversations within AMD, the current status of the errata
    is:

    * Erratum 505: fixed in model 0x1, stepping 0x1 and later.
    * Erratum 637: not fixed.

    Signed-off-by: Aravind Gopalakrishnan
    [ Cleanups, corrections ]
    Signed-off-by: Borislav Petkov

    Aravind Gopalakrishnan
     

19 Apr, 2013

1 commit


10 Jan, 2013

2 commits

  • Use appropriate types for northbridge IDs and memory ranges. Mark
    immutable data const and keep within compilation unit on related
    structures.

    Signed-off-by: Daniel J Blueman
    Link: http://lkml.kernel.org/r/1354265060-22956-2-git-send-email-daniel@numascale-asia.com
    [Boris: Drop arg change to node_to_amd_nb]
    Signed-off-by: Borislav Petkov

    Daniel J Blueman
     
  • Fix get_node_id to match northbridge IDs from the array of detected
    ones, allowing multi-server support such as with Numascale's
    NumaConnect, renaming to 'amd_get_node_id' for consistency.

    Signed-off-by: Daniel J Blueman
    Link: http://lkml.kernel.org/r/1353997932-8475-1-git-send-email-daniel@numascale-asia.com
    [Boris: shorten lines to fit 80 cols]
    Signed-off-by: Borislav Petkov

    Daniel J Blueman
     

28 Nov, 2012

2 commits