27 Apr, 2017
1 commit
-
Leave it to the user to decide whether to enable this or not. Otherwise,
platform-specific drivers won't initialize (currently, EDAC supports
only a single platform driver loaded).Signed-off-by: Borislav Petkov
10 Apr, 2017
10 commits
-
Change them to have the edac_ prefix.
No functionality change.
Signed-off-by: Borislav Petkov
-
Move the remaining functionality to edac_mc.c. Convert "edac_report=" to
a module parameter.Signed-off-by: Borislav Petkov
-
Remove the old URLs.
Signed-off-by: Borislav Petkov
-
Move all the EDAC core functionality behind CONFIG_EDAC and get rid of
that indirection. Update defconfigs which had it.While at it, fix dependencies such that EDAC depends on RAS for the
tracepoints.Signed-off-by: Borislav Petkov
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Chris Metcalf
Cc: linux-edac@vger.kernel.org -
... and this happens only when CONFIG_RAS is enabled.
Signed-off-by: Borislav Petkov
-
... as part of moving stuff away from edac_stub.c
Signed-off-by: Borislav Petkov
-
... and the glue around it. It is not needed anymore.
Signed-off-by: Borislav Petkov
-
Use mc_devices list instead to check whether we have EDAC driver
instances successfully registered with EDAC core.Signed-off-by: Borislav Petkov
-
Apparently, some machines used to report DRAM errors through a PCI SERR
NMI. This is why we have a call into EDAC in the NMI handler. Seec0d121720220 ("drivers/edac: add new nmi rescan").
From looking at the patch above, that's two drivers: e752x_edac.c and
e7xxx_edac.c. Now, I wanna say those are old machines which are probably
decommissioned already.Tony says that "[t]the newest CPU supported by either of those drivers
is the Xeon E7520 (a.k.a. "Nehalem") released in Q1'2010. Possibly some
folks are still using these ... but people that hold onto h/w for 7
years generally cling to old s/w too ... so I'd guess it unlikely that
we will get complaints for breaking these in upstream."So even if there is a small number still in use, we did load EDAC with
edac_op_state == EDAC_OPSTATE_POLL by default (we still do, in fact)
which means a default EDAC setup without any parameters supplied on the
command line or otherwise would never even log the error in the NMI
handler because we're polling by default:inline int edac_handler_set(void)
{
if (edac_op_state == EDAC_OPSTATE_POLL)
return 0;return atomic_read(&edac_handlers);
}So, long story short, I'd like to get rid of that nastiness called
edac_stub.c and confine all the EDAC drivers solely to drivers/edac/. If
we ever have to do stuff like that again, it should be notifiers we're
using and not some insanity like this one.Signed-off-by: Borislav Petkov
Acked-by: Thomas Gleixner
Cc: Tony Luck -
... like the rest of the file.
Signed-off-by: Borislav Petkov
07 Apr, 2017
2 commits
-
Remove unused code reserved for upcoming CPUs.
Reported-by: Dan Carpenter
Signed-off-by: Sergey Temerkhanov
Cc: David Daney
Cc: Jan.Glauber@cavium.com
Cc: linux-edac
Link: http://lkml.kernel.org/r/20170406113834.17153-1-s.temerkhanov@gmail.com
Signed-off-by: Borislav Petkov -
Shift the node number by 3 bits instead of 8 allowing proper functioning
with default EDAC_MAX_MCS.Signed-off-by: Sergey Temerkhanov
Cc: David Daney
Cc: Jan.Glauber@cavium.com
Cc: linux-edac
Link: http://lkml.kernel.org/r/20170406113755.17082-1-s.temerkhanov@gmail.com
Signed-off-by: Borislav Petkov
06 Apr, 2017
1 commit
-
The peripherals' RAS functionality only exist on the Arria10 SoCFPGA.
The Cyclone5 initialization generates EDAC warnings when the peripherals
aren't found in the device tree. Fix by checking for Arria10 in the init
functions.Signed-off-by: Thor Thayer
Cc: linux-edac
Link: http://lkml.kernel.org/r/1491415262-5018-1-git-send-email-thor.thayer@linux.intel.com
Signed-off-by: Borislav Petkov
05 Apr, 2017
1 commit
-
Fix a typo that disabled the MCI interrupts using the wrong bitmask.
Signed-off-by: Jan Glauber
Cc: David Daney
Cc: Ralf Baechle
Cc: Sergey Temerkhanov
Cc: linux-edac
Link: http://lkml.kernel.org/r/20170405102739.6301-1-jglauber@cavium.com
Signed-off-by: Borislav Petkov
27 Mar, 2017
1 commit
-
Add support for Cavium ThunderX EDAC capable on-chip peripherals, namely
the DRAM controller (LMC), cache coherent processor interconnect (CCPI)
and level 2 cache blocks (L2C-TAD, L2C-MCI, L2C-CBC)Signed-off-by: Sergey Temerkhanov
Cc: David.Daney@cavium.com
Cc: Jan.Glauber@cavium.com
Cc: linux-edac
Link: http://lkml.kernel.org/r/20170324222837.60583-1-s.temerkhanov@gmail.com
Signed-off-by: Borislav Petkov
26 Mar, 2017
1 commit
-
DIMM number passed to edac_mc_handle_error() was accidentally hardcoded
to zero. Pass in the correct daddr->dimm value.Signed-off-by: Qiuxu Zhuo
Signed-off-by: Borislav Petkov
23 Mar, 2017
2 commits
-
Provide debugfs function stubs when EDAC_DEBUG is not enabled so that we
don't fail the build:drivers/edac/pnd2_edac.c: In function ‘pnd2_init’:
drivers/edac/pnd2_edac.c:1521:2: error: implicit declaration of function ‘setup_pnd2_debug’ [-Werror=implicit-function-declaration]
setup_pnd2_debug();
^
drivers/edac/pnd2_edac.c: In function ‘pnd2_exit’:
drivers/edac/pnd2_edac.c:1529:2: error: implicit declaration of function ‘teardown_pnd2_debug’ [-Werror=implicit-function-declaration]
teardown_pnd2_debug();
^Signed-off-by: Borislav Petkov
-
The debugfs.c functionality relies on DEBUG_FS so select it.
Signed-off-by: Borislav Petkov
16 Mar, 2017
1 commit
-
Initial target for this driver is the Intel Apollo Lake platform and
Denverton micro-server, they use the same internal memory controller IP
called Pondicherry2.Memory controller registers are not in PCI config space like earlier
Intel memory controllers. For Apollo Lake platform they are accessed via
a "side-band" interface, for Denverton micro-server they are access via
PCI config space and memory map I/O. This driver is for Apollo Lake and
Denverton, but only the Denverton is fully enabled while we wait for the
sideband driver.Apollo lake driver and initial cut at Denverton driver by Tony Luck.
Extensive cleanup, refactoring and basic verification by Qiuxu Zhuo.Signed-off-by: Tony Luck
Signed-off-by: Qiuxu Zhuo
Cc: linux-edac
Link: http://lkml.kernel.org/r/20170308174539.14432-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov
09 Mar, 2017
1 commit
-
The MTR_DRAM_WIDTH macro returns the data width. It is sometimes used
as if it returned a boolean true if the width if 8. Fix the tests where
MTR_DRAM_WIDTH is misused.Signed-off-by: Jérémy Lefaure
Cc: linux-edac
Link: http://lkml.kernel.org/r/20170309011809.8340-1-jeremy.lefaure@lse.epita.fr
Signed-off-by: Borislav Petkov
07 Mar, 2017
1 commit
-
Fix spelling mistake in dev_err message.
Signed-off-by: Colin Ian King
Reviewed-by: Loc Ho
Cc: linux-edac
Link: http://lkml.kernel.org/r/20170223002609.9440-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov
21 Feb, 2017
1 commit
-
Pull RAS updates from Ingo Molnar:
"The main changes in this cycle were:- Assign notifier chain priorities for all RAS related handlers to
make the ordering explicit (Borislav Petkov)- Improve the AMD MCA banks sysfs output (Yazen Ghannam)
- Various cleanups and restructuring of the x86 RAS code (Borislav
Petkov)"* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority
x86/ras: Get rid of mce_process_work()
EDAC/mce/amd: Dump TSC value
EDAC/mce/amd: Unexport amd_decode_mce()
x86/ras/amd/inj: Change dependency
x86/ras: Flip the TSC-adding logic
x86/ras/amd: Make sysfs names of banks more user-friendly
x86/ras/therm_throt: Do not log a fake MCE for thermal events
x86/ras/inject: Make it depend on X86_LOCAL_APIC=y
16 Feb, 2017
1 commit
-
Currently, the IPID and Syndrome are printed on the same line as the
Address. There are cases when we can have a valid Syndrome but not a
valid Address.For example, the MCA_SYND register can be used to hold more detailed
error info that the hardware folks can use. It's not just DRAM ECC
syndromes. There are some error types that aren't related to memory that
may have valid syndromes, like some errors related to links in the Data
Fabric, etc.In these cases, the IPID and Syndrome are not printed at the same log
level as the rest of the stanza, so users won't see them on the console.Console:
[Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
[Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2Dmesg:
[Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
, Syndrome: 0x000000010b404000, IPID: 0x0001002e00000002
[Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2Print the IPID first and on a new line. The IPID should always be
printed on SMCA systems. The Syndrome will then be printed with the IPID
and at the same log level when valid:[Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
[Hardware Error]: IPID: 0x0001002e00000002, Syndrome: 0x000000010b404000
[Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2Signed-off-by: Yazen Ghannam
Cc: linux-edac
Link: http://lkml.kernel.org/r/1487192182-2474-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov
14 Feb, 2017
1 commit
-
Last time we did that was when we enabled Bulldozer. Now, we enabled Zen
so it is only natural ... :-)Signed-off-by: Borislav Petkov
Cc: Yazen Ghannam
10 Feb, 2017
1 commit
-
Fix the following sparse warnings:
drivers/edac/fsl_ddr_edac.c:148:1: warning:
symbol 'dev_attr_inject_data_hi' was not declared. Should it be static?
drivers/edac/fsl_ddr_edac.c:150:1: warning:
symbol 'dev_attr_inject_data_lo' was not declared. Should it be static?
drivers/edac/fsl_ddr_edac.c:152:1: warning:
symbol 'dev_attr_inject_ctrl' was not declared. Should it be static?Signed-off-by: Wei Yongjun
Cc: linux-edac
Link: http://lkml.kernel.org/r/20170209150424.15124-1-weiyj.lk@gmail.com
Signed-off-by: Borislav Petkov
03 Feb, 2017
1 commit
-
The L2 cache controller on the T2080 SoC has similar capabilities to the
others already supported by the mpc85xx_edac driver. Add it to the list
of compatible devices.Signed-off-by: Chris Packham
Acked-by: Johannes Thumshirn
Acked-by: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: devicetree@vger.kernel.org
Cc: linux-edac
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170201231624.28843-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov
28 Jan, 2017
8 commits
-
Match one of the devices in amd64_cpuids[] before loading the module.
This is an additional sanity check against users trying to load
amd64_edac_mod on unsupported systems.Signed-off-by: Yazen Ghannam
Cc: linux-edac
Link: http://lkml.kernel.org/r/1485537863-2707-9-git-send-email-Yazen.Ghannam@amd.com
[ Get rid of err_ret label, make it a bit more readable this way. ]
Signed-off-by: Borislav Petkov -
Having ECC disabled on a node doesn't necessarily mean that it's
disabled for the entire system. So let's return a non-failing code when
ECC is disabled on a node. This way we can skip initialization for the
node but still continue with the remaining nodes.After probing all instances, make sure we have at least one MC device
allocated.This issue is seen and fix tested on Fam15h and Fam17h MCM systems.
Signed-off-by: Yazen Ghannam
Cc: linux-edac
Link: http://lkml.kernel.org/r/1485537863-2707-8-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov -
We need to know if any MC devices have been allocated.
Signed-off-by: Yazen Ghannam
Cc: linux-edac
Link: http://lkml.kernel.org/r/1485537863-2707-7-git-send-email-Yazen.Ghannam@amd.com
[ Prettify text. ]
Signed-off-by: Borislav Petkov -
amd64_{debug,notice} don't have any users, so remove them.
Signed-off-by: Yazen Ghannam
Cc: linux-edac
Link: http://lkml.kernel.org/r/1485537863-2707-6-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov -
Print the node number when informing that DRAM ECC is disabled so
that we can show which nodes have DRAM ECC disabled. Also, print more
detailed system information as edac_dbg(), so as to not bother general
users.Switch amd64_notice to amd64_info to match the message above it.
Signed-off-by: Yazen Ghannam
Cc: linux-edac
Link: http://lkml.kernel.org/r/1485537863-2707-5-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov -
We have a few functions that register/unregister an ECC error decoding
routine. These functions are called when we init/remove instances.
However, they are global and so don't need to be registered/unregistered
multiple times.So move them out of the init/remove instance functions and into the
module init/exit routines.Signed-off-by: Yazen Ghannam
Cc: linux-edac
Link: http://lkml.kernel.org/r/1485297149-13733-4-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov -
Jump to memory freeing routines when init_one_instance() fails.
Signed-off-by: Yazen Ghannam
Cc: linux-edac
Link: http://lkml.kernel.org/r/1485297149-13733-3-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov -
Users may not be familiar with the concept of deferred errors. There is
no action for users to take on this type of error, so give more context
in the error message to make this more clear.Signed-off-by: Yazen Ghannam
Cc: linux-edac
Link: http://lkml.kernel.org/r/1485297149-13733-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov
26 Jan, 2017
1 commit
-
REDMEMB[17] is the ECC_Locator bit, which, when set, identifies the
CS[3:2] as the simbols in error. And thus the second channel.The macro computing it was wrong so get rid of it (it was used at one
place only) and get rid of the conditional too. Generates better code
this way anyway.Signed-off-by: Borislav Petkov
Reported-by: David Binderman
Reviewed-by: Mauro Carvalho Chehab
24 Jan, 2017
3 commits
-
Assign all notifiers on the MCE decode chain a priority so that they get
called in the correct order.Suggested-by: Thomas Gleixner
Signed-off-by: Borislav Petkov
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Tony Luck
Cc: Yazen Ghannam
Cc: linux-edac
Link: http://lkml.kernel.org/r/20170123183514.13356-10-bp@alien8.de
Signed-off-by: Ingo Molnar -
Dump the TSC value of the time when the MCE got logged.
Signed-off-by: Borislav Petkov
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: Yazen Ghannam
Cc: linux-edac
Link: http://lkml.kernel.org/r/20170123183514.13356-8-bp@alien8.de
Signed-off-by: Ingo Molnar -
It is not used outside of the driver anymore.
Signed-off-by: Borislav Petkov
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: Yazen Ghannam
Cc: linux-edac
Link: http://lkml.kernel.org/r/20170123183514.13356-7-bp@alien8.de
Signed-off-by: Ingo Molnar
23 Jan, 2017
1 commit
-
Function sbridge_register_mci() sets pvt->info.show_interleave_mode
to knl_show_interleave_mode() on Knight's Landing and
show_interleave_mode() anywhere else.Merge show_interleave_mode() and knl_show_interleave_mode() in a single
implementation and use it without an indirect function pointer.Signed-off-by: Nicolas Iooss
Cc: Mauro Carvalho Chehab
Cc: linux-edac
Link: http://lkml.kernel.org/r/20170122172806.10412-1-nicolas.iooss_linux@m4x.org
[ Call it get_intlv_mode_str(). ]
Signed-off-by: Borislav Petkov