16 Sep, 2009
5 commits
-
The old code was using smp_call_function_many which skips the current
cpu if it is in the supplied cpumask. Switch to the rdmsr_on_cpus()
interface which takes care of that.
In addition, add get_cpus_on_this_dct_cpumask helper which computes a
cpumask of all the cores on a node and thus on a DCT.
Signed-off-by: Borislav Petkov
-
Simplify the procedure by checking if there is any DIMM in each channel.
This fixes bugs that occur when there are no DIMMs under a certain node,
when two DIMMs are in the same channel, and when there is only one DIMM
in each channel of the node.
Borislav: minor fixups
Signed-off-by: Wan Wei
Signed-off-by: Borislav Petkov
-
Simplify code flow and make sure return value is always valid since
further driver init depends on it. Carve out long warning string and
make code more readable. Shorten some names, while at it.
There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov
-
Signed-off-by: Andreas Herrmann
Signed-off-by: Borislav Petkov
Acked-by: H. Peter Anvin
-
-tip testing found the following build failure (config attached):
drivers/built-in.o: In function `amd64_check':
amd64_edac.c:(.text+0x3e9491): undefined reference to `amd_decode_nb_mce'
drivers/built-in.o: In function `amd64_init_2nd_stage':
amd64_edac.c:(.text+0x3e9b46): undefined reference to `amd_report_gart_errors'
amd64_edac.c:(.text+0x3e9b55): undefined reference to `amd_register_ecc_decoder'
drivers/built-in.o: In function `amd64_nbea_store':
amd64_edac_dbg.c:(.text+0x3ea22e): undefined reference to `amd_decode_nb_mce'
drivers/built-in.o: In function `amd64_remove_one_instance':
amd64_edac.c:(.devexit.text+0x3eea): undefined reference to `amd_report_gart_errors'
amd64_edac.c:(.devexit.text+0x3ef6): undefined reference to `amd_unregister_ecc_decoder'
The AMD EDAC code has a dependency on CONFIG_CPU_SUP_AMD facilities. The
patch below solves the problem here.
Signed-off-by: Ingo Molnar
Signed-off-by: Borislav Petkov
15 Sep, 2009
14 commits
-
See Fam10h BKDG (31116, rev. 3.28), Table 101.
Signed-off-by: Borislav Petkov
-
See Fam10h BKDG (31116, rev. 3.28), Table 100.
Signed-off-by: Borislav Petkov
-
... according to Table 69, Fam10h BKDG (31116, rev. 3.28).
Signed-off-by: Borislav Petkov
-
See Fam10h BKDG (31116, rev. 3.28), Table 95
Signed-off-by: Borislav Petkov
-
Those get reported in MC0_STATUS, see Table 92, F10h BKDG (31116, rev.
3.28) for more details.
Signed-off-by: Borislav Petkov
-
This is the MCE error code from the MCi_STATUS banks, bits [15:0] which
describe what type of error was encountered: GART TLB, Memory or Bus
error. The semantics of those bits are identical across all MCE banks so
decode those separately, irrespective of MCE type.
Signed-off-by: Borislav Petkov
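A minimal sketch of what decoding the low 16 bits independently of the bank might look like. The function and enum names are hypothetical, and the bit masks are an assumption based on the error-code layout described in the BKDG (TLB: 0000 0000 0001 TTLL, memory: 0000 0001 RRRR TTLL, bus: 0000 1PPT RRRR IILL); this is not the code from the patch itself.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum mce_err_class { ERR_UNKNOWN, ERR_TLB, ERR_MEM, ERR_BUS };

/*
 * Classify an MCE by the error code in MCi_STATUS[15:0].
 * The masks below are assumed from the BKDG error-code layout.
 */
enum mce_err_class classify_error_code(uint64_t status)
{
    uint16_t ec = (uint16_t)(status & 0xFFFF);  /* keep bits [15:0] only */

    if ((ec & 0xFFF0) == 0x0010)
        return ERR_TLB;   /* 0000 0000 0001 TTLL */
    if ((ec & 0xFF00) == 0x0100)
        return ERR_MEM;   /* 0000 0001 RRRR TTLL */
    if ((ec & 0xF800) == 0x0800)
        return ERR_BUS;   /* 0000 1PPT RRRR IILL */

    return ERR_UNKNOWN;
}
```

Because only the low 16 bits are inspected, the same helper can be called for any bank's status value.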
-
The MCi_STATUS registers have most field definitions in common so decode
them in the general path. Do not pass ecc_type along and compute it in
__amd64_decode_bus_error instead.
Signed-off-by: Borislav Petkov
-
Move NB decoder along with required defines to EDAC MCE core. Add
registration routines for further decoding of the MCE info in the AMD64
EDAC module.
CC: Andi Kleen
Signed-off-by: Borislav Petkov
-
Signed-off-by: Borislav Petkov
-
Signed-off-by: Borislav Petkov
-
* don't dump info which mcheck already does
* update to newest BKDG
* mv amd64_process_error_info -> amd64_decode_nb_mce
* shorten error struct names
* remove redundant info ptr in amd64_process_error_info
* remove unused ErrorCodeExt[19:16] (MCx_STATUS) defines
Signed-off-by: Borislav Petkov
-
* mv amd64_error_info_regs -> err_regs
* remove redundant info ptr
Signed-off-by: Borislav Petkov
-
Signed-off-by: Borislav Petkov
-
This is in preparation of adding AMD-specific MCE decoding functionality
to the EDAC core. The error decoding macros originate from the AMD64
EDAC driver albeit in a simplified and cleaned up version here.
While at it, add macros to generate the error description strings and
use them in the error type decoders directly which removes a bunch of
code and makes the decoding functions much more readable. Also, fix
strings and shorten macro names.
Remove superfluous htlink_msgs.
Signed-off-by: Borislav Petkov
04 Aug, 2009
1 commit
-
Add forgotten return calls for the successful cases.
Signed-off-by: Doug Thompson
Signed-off-by: Borislav Petkov
03 Aug, 2009
1 commit
-
On the good path of BIOS enabled ECC and no override, the value returned
is 1 by omission and thus is deemed failing by the probe-function.
Allow proper module initialization by clearing the retval explicitly.
Signed-off-by: Doug Thompson
Signed-off-by: Borislav Petkov
30 Jul, 2009
1 commit
-
Intel X38 MCHBAR is a 64-bit register based at 0x48, so its upper half
is at 0x4C.
Signed-off-by: Lu Zhihe
Signed-off-by: Doug Thompson
Cc: [2.6.30.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
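As an illustration of the offsets in the commit above, here is a minimal sketch that assembles a 64-bit value from two 32-bit config-space reads. The helper `cfg_read32()` and the byte-array model of config space are hypothetical stand-ins, not the driver's actual accessors, and any enable or reserved bits in the register are left unmasked.

```c
#include <stdint.h>

#define X38_MCHBAR_LOW  0x48  /* lower 32 bits, offset from the commit message */
#define X38_MCHBAR_HIGH 0x4C  /* upper 32 bits, offset from the commit message */

/* Hypothetical stand-in for a little-endian PCI config-space dword read. */
uint32_t cfg_read32(const uint8_t *cfg, unsigned int off)
{
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}

/* Combine both halves into one 64-bit value (no masking applied). */
uint64_t x38_read_mchbar(const uint8_t *cfg)
{
    uint64_t lo = cfg_read32(cfg, X38_MCHBAR_LOW);
    uint64_t hi = cfg_read32(cfg, X38_MCHBAR_HIGH);

    return (hi << 32) | lo;
}
```

Reading the upper half from any other offset would silently yield a wrong base whenever the register holds an address above 4 GiB.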
27 Jul, 2009
1 commit
-
Signed-off-by: Wan Wei
Signed-off-by: Borislav Petkov
01 Jul, 2009
1 commit
-
Some new MPC85xx SoCs now support DDR3 memory, so add the DDR3 memory
type to MPC85xx EDAC.
Signed-off-by: Yang Shi
Cc: Doug Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Jun, 2009
3 commits
-
- cleanup debug calls
- shorten function names
- cleanup error exit paths
Signed-off-by: Borislav Petkov
-
amd64_check_ecc_enabled() returns non-zero status when ECC
checking/correcting is disabled and this fails further loading of the
driver even when 'ecc_enable_override' boot param is used.
Fix that by clearing return status in that case.
Signed-off-by: Borislav Petkov
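The control flow behind the fix above can be sketched roughly as follows. The function name, parameters, and the choice of -ENODEV are assumptions for illustration; the real amd64_check_ecc_enabled() reads hardware registers and a module parameter instead.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical simplification: return 0 when driver loading may
 * proceed, a negative errno otherwise.
 */
int check_ecc_enabled(bool ecc_on, bool override)
{
    int ret = 0;

    if (!ecc_on) {
        ret = -ENODEV;      /* ECC checking/correcting is disabled */
        if (override)
            ret = 0;        /* the fix: clear status when overridden */
    }

    return ret;
}
```

Without the inner assignment, the override case would still return a non-zero status and the probe path would treat it as a failure.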
-
Checking whether the machine is using ECC enabled DRAM is done through
testing the DimmEccEn bit in the DRAM Cfg Low register (F2x[1,0]90). Do
that instead of testing all bits from the DimmEccEn upwards.
Also, remove mci->edac_cap assignment and use value returned from
amd64_determine_edac_cap().
Signed-off-by: Borislav Petkov
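The difference between testing a single bit and testing every bit from that position upwards can be sketched as below. The function names are hypothetical, and the bit position (19) is an assumption about the DRAM Cfg Low layout rather than something stated in the commit.

```c
#include <stdbool.h>
#include <stdint.h>

#define DIMM_ECC_EN_BIT 19  /* assumed position of DimmEccEn in F2x[1,0]90 */

/* Test only the DimmEccEn bit. */
bool dimm_ecc_enabled(uint32_t dclr)
{
    return (dclr >> DIMM_ECC_EN_BIT) & 1;
}

/* Loose variant: true if DimmEccEn or any higher bit is set. */
bool dimm_ecc_enabled_loose(uint32_t dclr)
{
    return (dclr >> DIMM_ECC_EN_BIT) != 0;
}
```

The loose variant reports ECC as enabled whenever an unrelated higher bit is set, which is the kind of false positive the single-bit test avoids.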
19 Jun, 2009
4 commits
-
Correct the expansion of EDAC (Error Detection And Correction).
[akpm@linux-foundation.org: add missing space]
Signed-off-by: GeunSik Lim
Cc: Alan Cox
Acked-by: Doug Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The remove function uses __devexit, so the .remove assignment needs
__devexit_p() to fix a build error with hotplug disabled.
Signed-off-by: Mike Frysinger
Cc: Doug Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Add edac_device_alloc_index(), because for MAPLE platform there may
exist several EDAC driver modules that could make use of
edac_device_ctl_info structure at the same time. The index allocation
for these structures should be taken care of by EDAC core.
[akpm@linux-foundation.org: cleanups]
Signed-off-by: Harry Ciao
Cc: Doug Thompson
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Kumar Gala
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Introduce IBM CPC925 EDAC driver, which makes use of ECC, CPU and
HyperTransport Link error detections and corrections on the IBM
CPC925 Bridge and Memory Controller.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Harry Ciao
Cc: Doug Thompson
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Kumar Gala
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Jun, 2009
1 commit
-
Signed-off-by: Martin Olsson
Signed-off-by: Jiri Kosina
10 Jun, 2009
8 commits
-
Prevent EDAC compilation units from being built by default and let the
user explicitly select the needed modules.
Acked-by: Randy Dunlap
Tested-by: Randy Dunlap
Signed-off-by: Borislav Petkov
-
While at it, fix a link failure when !K8_NB.
Acked-by: Doug Thompson
Acked-by: Randy Dunlap
Tested-by: Randy Dunlap
Signed-off-by: Borislav Petkov
-
Also, link into Kbuild by adding Kconfig and Makefile entries.
Borislav:
- Kconfig/Makefile splitting
- use zero-sized arrays for the sysfs attrs if not enabled
- rename sysfs attrs to more conform values
- shorten CONFIG_ names
- make multiple structure members assignment vertically aligned
- fix/cleanup comments
- fix function return value patterns
- fix err labels
- fix a memleak bug caught by Ingo
- remove the NUMA dependency and use num_k8_northbridges for initializing
  a driver instance per NB.
- do not copy the pvt contents into the mci struct in
  amd64_init_2nd_stage() and save it in the mci->pvt_info void ptr
  instead.
- cleanup debug calls
- simplify amd64_setup_pci_device()
Reviewed-by: Mauro Carvalho Chehab
Signed-off-by: Doug Thompson
Signed-off-by: Borislav Petkov
-
Borislav:
- convert to the new {rd|wr}msr_on_cpus interfaces.
- convert pvt->old_mcgctl to a bitmask thus saving some bytes
- fix/cleanup comments
- fix function return value patterns
- add a proper bugfix found by Doug to amd64_check_ecc_enabled where we
missed checking for the ECC enabled bit in NB CFG.
- cleanup debug calls
Reviewed-by: Mauro Carvalho Chehab
Signed-off-by: Doug Thompson
Signed-off-by: Borislav Petkov
-
Borislav:
- add an amd64_free_mc_sibling_devices() helper instead of open-coding the
  release path.
- fix/cleanup comments
- fix function return value patterns
- cleanup debug calls
Reviewed-by: Mauro Carvalho Chehab
Signed-off-by: Doug Thompson
Signed-off-by: Borislav Petkov
-
Borislav:
- fold amd64_error_info_valid() into its only user
- fix/cleanup comments
- fix function return value patterns
- cleanup debug calls
Reviewed-by: Mauro Carvalho Chehab
Signed-off-by: Doug Thompson
Signed-off-by: Borislav Petkov
-
Borislav:
- fix comments
- cleanup debug calls
Reviewed-by: Mauro Carvalho Chehab
Signed-off-by: Doug Thompson
Signed-off-by: Borislav Petkov
-
Borislav:
- fix comments
- fix function return value patterns
Reviewed-by: Mauro Carvalho Chehab
Signed-off-by: Doug Thompson
Signed-off-by: Borislav Petkov