23 May, 2020
1 commit
-
Add support for AMD Renoir (4000-series Ryzen CPUs).
Signed-off-by: Alexander Monakov
Signed-off-by: Borislav Petkov
Acked-by: Yazen Ghannam
Link: https://lkml.kernel.org/r/20200510204842.2603-4-amonakov@ispras.ru
17 Jan, 2020
1 commit
-
Add family ops to support AMD Family 19h systems. Existing Family 17h
functions can be used. Also, add Family 19h to the list of families to
automatically load the module.

Signed-off-by: Yazen Ghannam
Signed-off-by: Borislav Petkov
Link: https://lkml.kernel.org/r/20200110015651.14887-5-Yazen.Ghannam@amd.com
06 Nov, 2019
1 commit
-
The maximum number of memory controllers is fixed within a family/model
group. In most cases, this has been fixed at 2, but some systems may
have up to 8.

The struct amd64_family_type already contains family/model-specific
information, and this can be used rather than adding model checks to
various functions.

Create a new field in struct amd64_family_type for max_mcs.
Set this when setting other family type information, and use this when
needing the maximum number of memory controllers possible for a system.

Signed-off-by: Yazen Ghannam
Signed-off-by: Borislav Petkov
Cc: "linux-edac@vger.kernel.org"
Cc: James Morse
Cc: Mauro Carvalho Chehab
Cc: Robert Richter
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20191106012448.243970-4-Yazen.Ghannam@amd.com
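The max_mcs idea described in this commit can be sketched in userspace C as below. This is a minimal illustration, not the driver's code: only `struct amd64_family_type` and `max_mcs` are named in the message, while the other fields, the table entries and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the driver's struct. Only max_mcs comes from the
 * commit message; every other field is an illustrative assumption. */
struct amd64_family_type {
	const char *ctl_name;   /* hypothetical: label for this group */
	unsigned int family;    /* hypothetical: CPU family */
	unsigned int model_lo;  /* hypothetical: first model in the group */
	unsigned int model_hi;  /* hypothetical: last model in the group */
	unsigned int max_mcs;   /* maximum memory controllers for the group */
};

/* Illustrative table; the values are examples, not a statement about
 * real hardware. */
static const struct amd64_family_type family_types[] = {
	{ "example-2mc", 0x17, 0x00, 0x2f, 2 },
	{ "example-8mc", 0x17, 0x30, 0x3f, 8 },
};

/* Resolve the family type once. Callers then read ->max_mcs instead of
 * repeating family/model checks. Returns NULL if nothing matches. */
static const struct amd64_family_type *
lookup_family_type(unsigned int family, unsigned int model)
{
	size_t n = sizeof(family_types) / sizeof(family_types[0]);

	for (size_t i = 0; i < n; i++) {
		const struct amd64_family_type *ft = &family_types[i];

		if (ft->family == family &&
		    model >= ft->model_lo && model <= ft->model_hi)
			return ft;
	}
	return NULL;
}

/* Example consumer: count enabled controllers, bounded by max_mcs. */
static unsigned int
count_enabled_mcs(const struct amd64_family_type *ft, unsigned int enabled_mask)
{
	unsigned int count = 0;

	assert(ft != NULL);
	for (unsigned int i = 0; i < ft->max_mcs; i++)
		if (enabled_mask & (1u << i))
			count++;
	return count;
}
```

A caller would look up the family type once at init time and pass it to every function that needs the controller limit.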
07 Sep, 2019
1 commit
-
Add the new Family 17h Model 70h PCI IDs (device 18h functions 0 and 6)
to the AMD64 EDAC module.

[ bp: s/f17_base_addr_to_cs_size/f17_addr_mask_to_cs_size/g ]
Signed-off-by: Isaac Vaughn
Signed-off-by: Borislav Petkov
Cc: James Morse
Cc: linux-edac@vger.kernel.org
Cc: Mauro Carvalho Chehab
Cc: Robert Richter
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20190906192131.8ced0ca112146f32d82b6cae@knights.ucf.edu
23 Aug, 2019
3 commits
-
Future AMD systems will support asymmetric dual-rank DIMMs. These are
DIMMs where the ranks are of different sizes.

The even rank will use the Primary Even Chip Select registers and the
odd rank will use the Secondary Odd Chip Select registers.

Recognize if a Secondary Odd Chip Select is being used. Use the
Secondary Odd Address Mask when calculating the chip select size.

[ bp: move csrow_sec_enabled() to the header, fix CS_ODD define and
tone-down the capitalized words spelling. ]

Signed-off-by: Yazen Ghannam
Signed-off-by: Borislav Petkov
Cc: "linux-edac@vger.kernel.org"
Cc: James Morse
Cc: Mauro Carvalho Chehab
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20190821235938.118710-8-Yazen.Ghannam@amd.com
-
AMD Family 17h systems have a set of secondary Chip Select Base
Addresses and Address Masks. These do not represent unique Chip
Selects, rather they are used in conjunction with the primary
Chip Select registers in certain cases.

Cache these secondary Chip Select registers for future use.

Signed-off-by: Yazen Ghannam
Signed-off-by: Borislav Petkov
Cc: "linux-edac@vger.kernel.org"
Cc: James Morse
Cc: Mauro Carvalho Chehab
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20190821235938.118710-7-Yazen.Ghannam@amd.com
-
The struct chip_select array that's used for saving chip select bases
and masks is fixed at length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.

Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.

Fix number of DIMMs and Chip Select bases/masks on Family 17h, because
AMD Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
channel.

Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.

Signed-off-by: Yazen Ghannam
Signed-off-by: Borislav Petkov
Cc: "linux-edac@vger.kernel.org"
Cc: James Morse
Cc: Mauro Carvalho Chehab
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20190821235938.118710-2-Yazen.Ghannam@amd.com
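The sizing described in this commit can be pictured with a small userspace sketch. `struct chip_select` is named in the message, but its fields, the `NUM_*` constants, `struct pvt_sketch` and both helpers are assumptions made for illustration. In particular, pairing one mask with two bases (one mask per DIMM) is an assumed reading of "2 DIMMs, 4 CS bases, and 2 CS masks per channel".

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CONTROLLERS 8  /* from the commit text: array grows from 2 to 8 */
#define NUM_CHIPSELECTS 8  /* assumption: storage upper bound per controller */

/* Hypothetical layout: saved bases and masks plus how many are valid. */
struct chip_select {
	uint32_t csbases[NUM_CHIPSELECTS];
	uint8_t  b_cnt;
	uint32_t csmasks[NUM_CHIPSELECTS];
	uint8_t  m_cnt;
};

/* Hypothetical container: one struct chip_select per controller. */
struct pvt_sketch {
	struct chip_select csels[NUM_CONTROLLERS];
};

/* Apply the Family 17h counts quoted in the message:
 * 4 CS bases and 2 CS masks per channel. */
static void prep_chip_selects_f17(struct pvt_sketch *pvt, unsigned int num_umcs)
{
	assert(num_umcs <= NUM_CONTROLLERS);
	for (unsigned int umc = 0; umc < num_umcs; umc++) {
		pvt->csels[umc].b_cnt = 4;
		pvt->csels[umc].m_cnt = 2;
	}
}

/* Assumed mapping: two chip selects share one mask, so mask index = cs / 2. */
static unsigned int cs_to_mask_index(unsigned int cs)
{
	return cs / 2;
}
```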
25 Apr, 2019
1 commit
-
This reverts commit 0a227af521d6df5286550b62f4b591417170b4ea.
Unfortunately, this commit caused wrong detection of chip select sizes
on some F17h client machines:

--- 00-rc6+ 2019-02-14 14:28:03.126622904 +0100
+++ 01-rc4+ 2019-04-14 21:06:16.060614790 +0200
EDAC amd64: MC: 0: 0MB 1: 0MB
-EDAC amd64: MC: 2: 16383MB 3: 16383MB
+EDAC amd64: MC: 2: 0MB 3: 2097151MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: UMC1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
-EDAC amd64: MC: 2: 16383MB 3: 16383MB
+EDAC amd64: MC: 2: 0MB 3: 2097151MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB

Revert it for now until it has been solved properly.

Signed-off-by: Borislav Petkov
Cc: Yazen Ghannam
27 Mar, 2019
4 commits
-
The struct chip_select array that's used for saving chip select bases
and masks is fixed at length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.

Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.

Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.

Signed-off-by: Yazen Ghannam
Signed-off-by: Borislav Petkov
Tested-by: Kim Phillips
Cc: James Morse
Cc: Mauro Carvalho Chehab
Cc: linux-edac
Link: https://lkml.kernel.org/r/20190228153558.127292-5-Yazen.Ghannam@amd.com
-
Future AMD systems may support x16 symbol sizes.
Recognize if a system is using x16 symbol size. Also, simplify the print
statement.

Note that a x16 syndrome vector table is not necessary like with x4 or
x8 syndromes. This is because systems that support x16 symbol sizes are
SMCA systems and in that case, the syndrome can be directly extracted
from the MCA_SYND[Syndrome] field.

[ bp: massage. ]
Signed-off-by: Yazen Ghannam
Signed-off-by: Borislav Petkov
Tested-by: Kim Phillips
Cc: James Morse
Cc: Mauro Carvalho Chehab
Cc: linux-edac
Link: https://lkml.kernel.org/r/20190228153558.127292-4-Yazen.Ghannam@amd.com
-
The first few models of Family 17h all had 2 Unified Memory Controllers
per Die, so this was treated as a fixed value. However, future systems
may have more Unified Memory Controllers per Die.

Related to this, the channel number and base address of a Unified Memory
Controller were found by matching on fixed, known values. However,
current and future systems follow this pattern for the channel number
and base address of a Unified Memory Controller: 0xYXXXXX, where Y is
the channel number. So matching on hardcoded values is not necessary.

Set the number of Unified Memory Controllers at driver init time based
on the family/model. Also, update the functions that find the channel
number and base address of a Unified Memory Controller to support more
than two.

[ bp: Move num_umcs into the .c file and simplify comment. ]
Signed-off-by: Yazen Ghannam
Signed-off-by: Borislav Petkov
Tested-by: Kim Phillips
Cc: James Morse
Cc: Mauro Carvalho Chehab
Cc: linux-edac
Link: https://lkml.kernel.org/r/20190228153558.127292-3-Yazen.Ghannam@amd.com
-
Add the new Family 17h Model 30h PCI IDs to the AMD64 EDAC module.
This also fixes a probe failure that appeared when some other PCI IDs
for Family 17h Model 30h were added to the AMD NB code.

Fixes: be3518a16ef2 (x86/amd_nb: Add PCI device IDs for family 17h, model 30h)
Signed-off-by: Yazen Ghannam
Signed-off-by: Borislav Petkov
Tested-by: Kim Phillips
Cc: James Morse
Cc: Mauro Carvalho Chehab
Cc: linux-edac
Link: https://lkml.kernel.org/r/20190228153558.127292-1-Yazen.Ghannam@amd.com
27 Aug, 2018
1 commit
-
Add new device IDs for family 17h, models 10h-2fh.
This is required by amd64_edac_mod in order to properly detect PCI
device functions 0 and 6.

Signed-off-by: Michael Jin
Reviewed-by: Yazen Ghannam
Link: http://lkml.kernel.org/r/20180816192840.31166-1-mikhail.jin@gmail.com
Signed-off-by: Borislav Petkov
14 Feb, 2017
1 commit
-
Last time we did that was when we enabled Bulldozer. Now, we enabled Zen
so it is only natural ... :-)

Signed-off-by: Borislav Petkov
Cc: Yazen Ghannam
28 Jan, 2017
2 commits
-
Match one of the devices in amd64_cpuids[] before loading the module.
This is an additional sanity check against users trying to load
amd64_edac_mod on unsupported systems.

Signed-off-by: Yazen Ghannam
Cc: linux-edac
Link: http://lkml.kernel.org/r/1485537863-2707-9-git-send-email-Yazen.Ghannam@amd.com
[ Get rid of err_ret label, make it a bit more readable this way. ]
Signed-off-by: Borislav Petkov
-
amd64_{debug,notice} don't have any users, so remove them.
Signed-off-by: Yazen Ghannam
Cc: linux-edac
Link: http://lkml.kernel.org/r/1485537863-2707-6-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov
15 Dec, 2016
1 commit
-
Everything still declared in edac_core.h now lives in
drivers/edac/edac_mc.c, so rename it to edac_mc.h.

Signed-off-by: Mauro Carvalho Chehab
01 Dec, 2016
1 commit
-
Prefix the warn and error macros with the respective string so that
callers don't have to say "Error" or "Warning". This saves string length
in the actual calls.

While at it, shorten the calls in reserve_mc_sibling_devs().

Signed-off-by: Borislav Petkov
Cc: Dan Carpenter
Cc: Yazen Ghannam
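The prefixing idea in this commit can be sketched in userspace C. The macro names `sketch_warn`/`sketch_err`, the helper `format_with_prefix()` and the buffer-based interface are hypothetical stand-ins. The kernel macros would wrap a printk-style call rather than format into a buffer, and the "EDAC amd64: " tag is borrowed from the log excerpt quoted elsewhere in this history.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "EDAC amd64: <prefix><formatted message>"
 * into buf. Returns the number of characters the full message needs. */
static int format_with_prefix(char *buf, size_t len, const char *prefix,
			      const char *fmt, ...)
{
	va_list ap;
	int n = snprintf(buf, len, "EDAC amd64: %s", prefix);

	if (n < 0 || (size_t)n >= len)
		return n; /* error or truncated: nothing more to append */

	va_start(ap, fmt);
	n += vsnprintf(buf + n, len - (size_t)n, fmt, ap);
	va_end(ap);
	return n;
}

/* The macros own the severity word, so call sites pass only the message. */
#define sketch_warn(buf, len, ...) \
	format_with_prefix((buf), (len), "Warning: ", __VA_ARGS__)
#define sketch_err(buf, len, ...) \
	format_with_prefix((buf), (len), "Error: ", __VA_ARGS__)
```

A call such as `sketch_err(buf, sizeof(buf), "F%d not found", 3)` then produces the severity word without the caller spelling it out, which is the saving the message describes.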
30 Nov, 2016
2 commits
-
How we need to decode UMC errors is different from how we decode bus
errors, so let's define a new function for this. We also need a way to
determine the UMC channel since we're not guaranteed that there is a
fixed relation between channel and MCA bank.

Signed-off-by: Yazen Ghannam
Cc: Aravind Gopalakrishnan
Cc: linux-edac
Cc: x86-ml
Link: http://lkml.kernel.org/r/1480359593-80369-1-git-send-email-Yazen.Ghannam@amd.com
[ Fold in decode_synd_reg(), simplify. ]
Signed-off-by: Borislav Petkov
-
Read a few more UMC registers and provide debug output in order to be as
similar as possible to older AMD systems.

Signed-off-by: Yazen Ghannam
Cc: Aravind Gopalakrishnan
Cc: linux-edac
Cc: x86-ml
Link: http://lkml.kernel.org/r/1480344621-14966-1-git-send-email-Yazen.Ghannam@amd.com
[ Remove unneeded K8 check and comments, fixup others. ]
Signed-off-by: Borislav Petkov
29 Nov, 2016
3 commits
-
Fam17h has new register offsets and fields for setting up the DRAM
scrubber so add support for this.

Signed-off-by: Yazen Ghannam
Cc: Aravind Gopalakrishnan
Cc: linux-edac
Cc: x86-ml
Link: http://lkml.kernel.org/r/1479423463-8536-17-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov
-
Fam17h has a different set of registers and bitfields. Most of these
registers are read through SMN (System Management Network) rather
than PCI config space. Also, the derivation of various values is now
different.

Update amd64_edac to read the appropriate registers and extract the
correct values for Fam17h.

Signed-off-by: Yazen Ghannam
Cc: Aravind Gopalakrishnan
Cc: linux-edac
Cc: x86-ml
Link: http://lkml.kernel.org/r/1479423463-8536-12-git-send-email-Yazen.Ghannam@amd.com
[ Save us the indentation level in read_mc_regs(), add defines ]
Signed-off-by: Borislav Petkov
-
Fam17h needs PCI device functions 0 and 6 instead of 1 and 2 as on older
systems. Update struct amd64_pvt to hold the new functions and reserve
them if on Fam17h.

Also, allocate an array of UMC structs within our newly allocated PVT
struct.

Signed-off-by: Yazen Ghannam
Cc: Aravind Gopalakrishnan
Cc: linux-edac
Cc: x86-ml
Link: http://lkml.kernel.org/r/1479423463-8536-11-git-send-email-Yazen.Ghannam@amd.com
[ init_one_instance() error handling, shorten lines, unbreak >80 cols lines. ]
Signed-off-by: Borislav Petkov
25 Nov, 2016
2 commits
-
Add a family type and associated ops for Fam17h. Define a struct to hold
all the UMC registers that we need. Make this a part of struct amd64_pvt
in order to maximize code reuse in the rest of the driver.

Signed-off-by: Yazen Ghannam
Cc: Aravind Gopalakrishnan
Cc: linux-edac
Cc: x86-ml
Link: http://lkml.kernel.org/r/1479423463-8536-10-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov
-
Update the ecc_enabled() function to work on Fam17h. This entails
reading a different set of registers and using the SMN (System
Management Network) rather than PCI devices.

Signed-off-by: Yazen Ghannam
Cc: Aravind Gopalakrishnan
Cc: linux-edac
Cc: x86-ml
Link: http://lkml.kernel.org/r/1479423463-8536-9-git-send-email-Yazen.Ghannam@amd.com
[ Fixup ecc_en assignment and get_umc_base(). ]
Signed-off-by: Borislav Petkov
10 May, 2016
1 commit
-
- remove homegrown instances counting.
- take F3 PCI device from amd_nb caching instead of F2 which was used with the
PCI core.

With those changes, the driver doesn't need to register a PCI driver and
relies on the northbridges caching which we do anyway on AMD.

Signed-off-by: Borislav Petkov
Cc: Yazen Ghannam
29 Sep, 2015
2 commits
-
Git provides us all the changelogs anyway. So trim the comments section
here. Update the copyrights info while at it.

Signed-off-by: Aravind Gopalakrishnan
Cc: linux-edac
Link: http://lkml.kernel.org/r/1443440593-2316-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov
-
The scrub rate control register has moved to function 2 in PCI config
space and is at a different offset on family 0x15, models 0x60 and
later. The minimum recommended scrub rate has also changed. (Refer to
D18F2x1c9_dct[1:0][DramScrub] in Fam15hM60h BKDG).

Adjust set_scrub_rate() and get_scrub_rate() functions to accommodate
this.

Tested on F15hM60h, Fam15h, models 00h-0fh and Fam10h systems.

Signed-off-by: Aravind Gopalakrishnan
Cc: linux-edac
Link: http://lkml.kernel.org/r/1443440593-2316-2-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Cleanup conditionals. ]
Signed-off-by: Borislav Petkov
23 Feb, 2015
1 commit
-
Instead of calling device_create_file() and device_remove_file()
manually, pass the static attribute groups with the new
edac_mc_add_mc_with_groups(). The conditional creation of inject sysfs
files is done by a proper is_visible callback.

Signed-off-by: Takashi Iwai
Link: http://lkml.kernel.org/r/1423046938-18111-4-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov
30 Oct, 2014
1 commit
-
This patch adds support for ECC error decoding for F15h M60h processor.
Aside from the usual changes, the patch adds support for some new features
in the processor:
- DDR4 (unbuffered, registered); LRDIMM DDR3 support
  - relevant debug messages have been modified/added to report these
    memory types
- new dbam_to_cs mappers
  - if (F15h M60h && LRDIMM); we need a 'multiplier' value to find
    cs_size. This multiplier value is obtained from the per-dimm
    DCSM register. So, change the interface to accept a 'cs_mask_nr'
    value to facilitate this calculation
- switch-casing determine_memory_type()
  - done to cleanse the function of too many if-else statements
    and improve readability
  - This is now called early in read_mc_regs() to cache dram_type

Misc cleanup:
- amd64_pci_table[] is condensed by using PCI_VDEVICE macro.

Testing details:
Tested the patch by injecting 'ECC' type errors using mce_amd_inj
and error decoding works fine.

Signed-off-by: Aravind Gopalakrishnan
Link: http://lkml.kernel.org/r/1414617483-4941-1-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Boris: determine_memory_type() cleanups ]
Signed-off-by: Borislav Petkov
23 Sep, 2014
1 commit
-
Rationale behind this change:
- F2x1xx addresses were stopped from being mapped explicitly to DCT1
from F15h (OR) onwards. They use _dct[0:1] mechanism to access the
registers. So we should move away from using address ranges to select
DCT for these families.
- On newer processors, the address ranges used to indicate DCT1 (0x140,
0x1a0) have different meanings than what is assumed currently.

Changes introduced:
- amd64_read_dct_pci_cfg() now takes in dct value and uses it for
'selecting the dct'
- Update usage of the function. Keep in mind that different families
have specific handling requirements
- Remove [k8|f10]_read_dct_pci_cfg() as they don't do much different
from amd64_read_pci_cfg()
- Move the k8 specific check to amd64_read_pci_cfg
- Remove f15_read_dct_pci_cfg() and move logic to amd64_read_dct_pci_cfg()
- Remove now needless .read_dct_pci_cfg

Testing:
- Tested on Fam 10h; Fam15h Models: 00h, 30h; Fam16h using 'EDAC_DEBUG'
and mce_amd_inj
- driver obtains info from F2x registers and caches it in pvt
structures correctly
- ECC decoding works fine

Signed-off-by: Aravind Gopalakrishnan
Link: http://lkml.kernel.org/r/1410799058-3149-1-git-send-email-aravind.gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov
28 Feb, 2014
1 commit
-
Extend ECC decoding support for F16h M30h. Tested on F16h M30h with ECC
turned on using mce_amd_inj module and the patch works fine.

Signed-off-by: Aravind Gopalakrishnan
Link: http://lkml.kernel.org/r/1392913726-16961-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Tested-by: Arindam Nath
Acked-by: H. Peter Anvin
Signed-off-by: Borislav Petkov
22 Oct, 2013
1 commit
-
GENMASK is used to create a contiguous bitmask([hi:lo]). It is
implemented twice in current kernel. One is in EDAC driver, the other
is in SiS/XGI FB driver. Move it to a more generic place for other
usage.

Signed-off-by: Chen, Gong
Cc: Borislav Petkov
Cc: Thomas Winischhofer
Cc: Jean-Christophe Plagniol-Villard
Cc: Tomi Valkeinen
Acked-by: Borislav Petkov
Acked-by: Mauro Carvalho Chehab
Signed-off-by: Tony Luck
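For readers unfamiliar with the idiom, a contiguous [hi:lo] bitmask can be sketched in userspace C as follows. This is an illustrative 64-bit function rather than the kernel macro itself; the names `genmask64` and `extract_bits` are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with bits lo..hi (inclusive) set, for 0 <= lo <= hi <= 63.
 * The right shift clears everything above hi, the left shift clears
 * everything below lo, and the AND keeps the overlap. */
static inline uint64_t genmask64(unsigned int hi, unsigned int lo)
{
	assert(hi < 64 && lo <= hi);
	return (~UINT64_C(0) >> (63 - hi)) & (~UINT64_C(0) << lo);
}

/* Typical use: pull a register field out and shift it down to bit 0. */
static inline uint64_t extract_bits(uint64_t reg, unsigned int hi,
				    unsigned int lo)
{
	return (reg & genmask64(hi, lo)) >> lo;
}
```

For example, `genmask64(7, 4)` yields 0xf0, and `extract_bits(0xabcd, 11, 8)` yields 0xb.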
12 Aug, 2013
2 commits
-
Now that we cache (family, model, stepping) locally, use them instead of
boot_cpu_data.

No functionality change.

Signed-off-by: Borislav Petkov
-
On newer models, support has been included for up to 4 DCTs, however,
only DCT0 and DCT3 are currently configured (cf BKDG Section 2.10).
Also, the routing DRAM Requests algorithm is different for F15h M30h.
Thus it is cleaner to use a brand new function rather than adding quirks
to the more generic f1x_match_to_this_node(). Refer to "2.10.5 DRAM
Routing Requests" in the BKDG for further info.

Tested on Fam15h M30h with ECC turned on using mce_amd_inj facility and
verified to be functionally correct.

While at it, verify if erratum workarounds for E505 and E637 still hold.
From email conversations within AMD, the current status of the errata
is:

* Erratum 505: fixed in model 0x1, stepping 0x1 and later.
* Erratum 637: not fixed.

Signed-off-by: Aravind Gopalakrishnan
[ Cleanups, corrections ]
Signed-off-by: Borislav Petkov
19 Apr, 2013
1 commit
-
Add code to handle DRAM ECC errors decoding for Fam16h.
Tested on Fam16h with ECC turned on using the mce_amd_inj facility and
works fine.

Signed-off-by: Aravind Gopalakrishnan
[ Boris: cleanups and clarifications ]
Signed-off-by: Borislav Petkov
10 Jan, 2013
2 commits
-
Use appropriate types for northbridge IDs and memory ranges. Mark
immutable data const and keep within compilation unit on related
structures.

Signed-off-by: Daniel J Blueman
Link: http://lkml.kernel.org/r/1354265060-22956-2-git-send-email-daniel@numascale-asia.com
[Boris: Drop arg change to node_to_amd_nb]
Signed-off-by: Borislav Petkov
-
Fix get_node_id to match northbridge IDs from the array of detected
ones, allowing multi-server support such as with Numascale's
NumaConnect, renaming to 'amd_get_node_id' for consistency.

Signed-off-by: Daniel J Blueman
Link: http://lkml.kernel.org/r/1353997932-8475-1-git-send-email-daniel@numascale-asia.com
[Boris: shorten lines to fit 80 cols]
Signed-off-by: Borislav Petkov
28 Nov, 2012
2 commits
-
Instead of open-coding it, use the DBAM_DIMM macro in
amd64_csrow_nr_pages() which we have already.

Signed-off-by: Borislav Petkov
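The message does not show the macro or the open-coded expression it replaces. A minimal sketch of the idea follows, assuming the DBAM register packs one 4-bit field per DIMM and that two chip-select rows map to one DIMM. The macro body, the layout and both helper functions are assumptions for illustration.

```c
#include <stdint.h>

/* Assumed layout: field i occupies bits [4*i+3 : 4*i] of the register. */
#define DBAM_DIMM(i, reg) (((reg) >> (4 * (i))) & 0xF)

/* Open-coded form (illustrative): shift and mask written out by hand. */
static inline uint32_t cs_mode_open_coded(uint32_t dbam, unsigned int csrow_nr)
{
	return (dbam >> ((csrow_nr / 2) * 4)) & 0xF;
}

/* Same computation expressed through the macro. */
static inline uint32_t cs_mode_with_macro(uint32_t dbam, unsigned int csrow_nr)
{
	return DBAM_DIMM(csrow_nr / 2, dbam);
}
```

Both helpers return the same value for every row, so the macro form changes readability only.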
-
Rewrite CE/UE paths so that they use the same code and drop additional
code duplication in handle_ue. Add a struct err_info which collects
required info for the error reporting. This, in turn, helps slimming all
edac_mc_handle_error() calls down to one.

Signed-off-by: Borislav Petkov
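The shape of this refactoring can be sketched in userspace C. The message only says that `struct err_info` "collects required info for the error reporting", so the fields below, the `err_kind` enum and `report_error()` are hypothetical. `report_error()` stands in for the single remaining edac_mc_handle_error() call site and formats a string instead of calling into the EDAC core.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: distinguishes corrected (CE) from uncorrected (UE) errors. */
enum err_kind { ERR_CORRECTED, ERR_UNCORRECTED };

/* Hypothetical fields gathered by the decode step. */
struct err_info {
	int err_code;      /* 0 means the address was decoded successfully */
	int csrow;
	int channel;
	uint16_t syndrome;
	uint32_t page;
	uint32_t offset;
};

/* One reporting path shared by CE and UE handling. */
static int report_error(char *buf, size_t len, enum err_kind kind,
			const struct err_info *err)
{
	const char *type = (kind == ERR_CORRECTED) ? "CE" : "UE";

	if (err->err_code)
		return snprintf(buf, len, "%s: decode failed (code %d)",
				type, err->err_code);

	return snprintf(buf, len,
			"%s: csrow %d channel %d page 0x%x offset 0x%x syndrome 0x%x",
			type, err->csrow, err->channel,
			(unsigned int)err->page, (unsigned int)err->offset,
			(unsigned int)err->syndrome);
}
```

Because both paths fill the same struct, only the error kind differs at the reporting call.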