Doug / smarc-fsl-linux-kernel | Embedian Git Server

16 Mar, 2013

2 commits

9713faecf EDAC: Merge mci.mem_is_per_rank with mci.csbased ... Browse Code »

Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrows based. Merge both fields into one.

There's no need for the driver to actually fill it, as the core detects
it by checking if one of the layers has the csrows type as part of the
memory hierarchy:

if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
per_rank = true;

Signed-off-by: Mauro Carvalho Chehab
Signed-off-by: Borislav Petkov

Mauro Carvalho Chehab
2013-03-16 13:32:30 +0800
1eef12825 amd64_edac: Correct DIMM sizes ... Browse Code »

We were filling the csrow size with a wrong value. 16a528ee3975 ("EDAC:
Fix csrow size reported in sysfs") tried to address the issue. It fixed
the report with the old API but not with the new one. Correct it for the
new API too.

Signed-off-by: Mauro Carvalho Chehab
[ make it a per-csrow accounting regardless of ->channel_count ]
Signed-off-by: Borislav Petkov

Mauro Carvalho Chehab
2013-03-16 13:32:02 +0800

05 Mar, 2013

1 commit

fbe2d3616 EDAC: Make sysfs functions static ... Browse Code »

Fixes lots of sparse warnings here.

Signed-off-by: Stephen Hemminger
Signed-off-by: Borislav Petkov

Stephen Hemminger
2013-03-05 18:33:57 +0800

01 Mar, 2013

1 commit

ad6c2c2eb Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac ... Browse Code »

Pull EDAC fixes and ghes-edac from Mauro Carvalho Chehab:
"For:

- Some fixes at edac drivers (i7core_edac, sb_edac, i3200_edac);
- error injection support for i5100, when EDAC debug is enabled;
- fix edac when it is loaded builtin (early init for the subsystem);
- a "Firmware First" EDAC driver, allowing ghes to report errors via
EDAC (ghes-edac).

With regards to ghes-edac, this fixes a longstanding BZ at Red Hat
that happens with Nehalem and Sandy Bridge CPUs: when both GHES and
i7core_edac or sb_edac are running, the error reports are
unpredictable, as both BIOS and OS race to access the registers. With
ghes-edac, the EDAC core will refuse to register any other concurrent
memory error driver.

This patchset moves the ghes struct definitions to a separate header
file (include/acpi/ghes.h) and adds 3 hooks at apei/ghes.c to
register/unregister and to report errors via ghes-edac. Those changes
were acked by ghes driver maintainer (Huang)."

* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (30 commits)
i5100_edac: convert to use simple_open()
ghes_edac: fix to use list_for_each_entry_safe() when delete list items
ghes_edac: Fix RAS tracing
ghes_edac: Make it compliant with UEFI spec 2.3.1
ghes_edac: Improve driver's printk messages
ghes_edac: Don't credit the same memory dimm twice
ghes_edac: do a better job of filling EDAC DIMM info
ghes_edac: add support for reporting errors via EDAC
ghes_edac: Register at EDAC core the BIOS report
ghes: add the needed hooks for EDAC error report
ghes: move structures/enum to a header file
edac: add support for error type "Info"
edac: add support for raw error reports
edac: reduce stack pressure by using a pre-allocated buffer
edac: lock module owner to avoid error report conflicts
edac: remove proc_name from mci structure
edac: add a new memory layer type
edac: initialize the core earlier
edac: better report error conditions in debug mode
i5100_edac: Remove two checkpatch warnings
...

Linus Torvalds
2013-03-01 12:42:33 +0800

26 Feb, 2013

9 commits

b0769891b i5100_edac: convert to use simple_open() ... Browse Code »

This removes an open coded simple_open() function and
replaces file operations references to the function
with simple_open() instead.

Signed-off-by: Wei Yongjun
Signed-off-by: Mauro Carvalho Chehab

Wei Yongjun
2013-02-26 21:06:18 +0800
5dae92a71 ghes_edac: fix to use list_for_each_entry_safe() when delete list items ... Browse Code »

Since we will remove items off the list using list_del() we need
to use a safe version of the list_for_each_entry() macro aptly named
list_for_each_entry_safe().

Signed-off-by: Wei Yongjun
Signed-off-by: Mauro Carvalho Chehab

Wei Yongjun
2013-02-26 21:05:35 +0800
8ae8f50ad ghes_edac: Fix RAS tracing ... Browse Code »

With the current version of CPER, there's no way to associate an
error with the memory error. So, the error location in EDAC
layers is unused.

As CPER has its own idea about memory architectural layers, just
output whatever is there inside the driver's detail at the RAS
tracepoint.

The EDAC location keeps untouched, in the case that, in some future,
we could actually map the error into the dimm labels.

Now, the error message:

[ 72.396625] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 72.396627] {1}[Hardware Error]: APEI generic hardware error status
[ 72.396628] {1}[Hardware Error]: severity: 2, corrected
[ 72.396630] {1}[Hardware Error]: section: 0, severity: 2, corrected
[ 72.396632] {1}[Hardware Error]: flags: 0x01
[ 72.396634] {1}[Hardware Error]: primary
[ 72.396635] {1}[Hardware Error]: section_type: memory error
[ 72.396637] {1}[Hardware Error]: error_status: 0x0000000000000400
[ 72.396638] {1}[Hardware Error]: node: 3
[ 72.396639] {1}[Hardware Error]: card: 0
[ 72.396640] {1}[Hardware Error]: module: 0
[ 72.396641] {1}[Hardware Error]: device: 0
[ 72.396643] {1}[Hardware Error]: error_type: 18, unknown
[ 72.396666] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)

Is properly represented on the trace event:

kworker/0:2-584 [000] .... 72.396657: mc_event: 1 Corrected error: reserved error (18) on unknown label (mc:0 location:-1:-1:-1 address:0x00000000 grain:1 syndrome:0x00000000 APEI location: node:3 card:0 module:0 status(0x0000000000000400): Storage error in DRAM memory)

Tested on a 4 sockets E5-4650 Sandy Bridge machine.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-26 06:42:17 +0800
689c9cd81 ghes_edac: Make it compliant with UEFI spec 2.3.1 ... Browse Code »

The UEFI spec defines the memory error types ans the bits that
validate each field on the memory error record, at
Appendix N om items N.2.5 (Memory Error Section) and
N.2.11 (Error Status). Make the error description compliant with
it, only showing the valid fields.

The EDAC error log is now properly reporting the error:

[ 281.556854] mce: [Hardware Error]: Machine check events logged
[ 281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 281.557044] {2}[Hardware Error]: APEI generic hardware error status
[ 281.557046] {2}[Hardware Error]: severity: 2, corrected
[ 281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected
[ 281.557050] {2}[Hardware Error]: flags: 0x01
[ 281.557052] {2}[Hardware Error]: primary
[ 281.557053] {2}[Hardware Error]: section_type: memory error
[ 281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400
[ 281.557056] {2}[Hardware Error]: node: 3
[ 281.557057] {2}[Hardware Error]: card: 0
[ 281.557058] {2}[Hardware Error]: module: 1
[ 281.557059] {2}[Hardware Error]: device: 0
[ 281.557061] {2}[Hardware Error]: error_type: 18, unknown
[ 281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9
[ 281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)

Tested on a 4 CPUs E5-4650 Sandy Bridge machine.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-26 06:42:16 +0800
d2a685661 ghes_edac: Improve driver's printk messages ... Browse Code »

Provide a better infrastructure for printk's inside the driver:
- use edac_dbg() for debug messages;
- standardize the usage of pr_info();
- provide warning about the risk of relying on this
driver.

While here, changes the size of a fake memory to 1 page. This is
as good or as bad as 1000 pages, but it is easier for userspace to
detect, as I don't expect that any machine implementing GHES would
provide just 1 page available ;)

Signed-off-by: Mauro Carvalho Chehab

Conflicts:
drivers/edac/ghes_edac.c

Mauro Carvalho Chehab
2013-02-26 06:42:15 +0800
5ee726db5 ghes_edac: Don't credit the same memory dimm twice ... Browse Code »

On my tests on a 4xE5-4650 CPU's system, the GHES
EDAC driver is called twice. As the SMBIOS DMI enumeration
call will seek for the entire DIMM sockets in the system, on
this machine, equipped with 128 GB of RAM, the memory is
displayed twice:

+-----------------------+
| mc0 | mc1 |
----------+-----------------------+
memory45: | 8192 MB | 8192 MB |
memory44: | 0 MB | 0 MB |
----------+-----------------------+
memory43: | 0 MB | 0 MB |
memory42: | 8192 MB | 8192 MB |
----------+-----------------------+
memory41: | 0 MB | 0 MB |
memory40: | 0 MB | 0 MB |
----------+-----------------------+
memory39: | 8192 MB | 8192 MB |
memory38: | 0 MB | 0 MB |
----------+-----------------------+
memory37: | 0 MB | 0 MB |
memory36: | 8192 MB | 8192 MB |
----------+-----------------------+
memory35: | 0 MB | 0 MB |
memory34: | 0 MB | 0 MB |
----------+-----------------------+
memory33: | 8192 MB | 8192 MB |
memory32: | 0 MB | 0 MB |
----------+-----------------------+
memory31: | 0 MB | 0 MB |
memory30: | 8192 MB | 8192 MB |
----------+-----------------------+
memory29: | 0 MB | 0 MB |
memory28: | 0 MB | 0 MB |
----------+-----------------------+
memory27: | 8192 MB | 8192 MB |
memory26: | 0 MB | 0 MB |
----------+-----------------------+
memory25: | 0 MB | 0 MB |
memory24: | 8192 MB | 8192 MB |
----------+-----------------------+
memory23: | 0 MB | 0 MB |
memory22: | 0 MB | 0 MB |
----------+-----------------------+
memory21: | 8192 MB | 8192 MB |
memory20: | 0 MB | 0 MB |
----------+-----------------------+
memory19: | 0 MB | 0 MB |
memory18: | 8192 MB | 8192 MB |
----------+-----------------------+
memory17: | 0 MB | 0 MB |
memory16: | 0 MB | 0 MB |
----------+-----------------------+
memory15: | 8192 MB | 8192 MB |
memory14: | 0 MB | 0 MB |
----------+-----------------------+
memory13: | 0 MB | 0 MB |
memory12: | 8192 MB | 8192 MB |
----------+-----------------------+
memory11: | 0 MB | 0 MB |
memory10: | 0 MB | 0 MB |
----------+-----------------------+
memory9: | 8192 MB | 8192 MB |
memory8: | 0 MB | 0 MB |
----------+-----------------------+
memory7: | 0 MB | 0 MB |
memory6: | 8192 MB | 8192 MB |
----------+-----------------------+
memory5: | 0 MB | 0 MB |
memory4: | 0 MB | 0 MB |
----------+-----------------------+
memory3: | 8192 MB | 8192 MB |
memory2: | 0 MB | 0 MB |
----------+-----------------------+
memory1: | 0 MB | 0 MB |
memory0: | 8192 MB | 8192 MB |
----------+-----------------------+

Total sum of 256 GB.

As there's no reliable way to credit DIMMS to the right memory
controller, just put everything on memory controller 0 (with should
always exist).

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-26 06:42:14 +0800
32fa1f53c ghes_edac: do a better job of filling EDAC DIMM info ... Browse Code »

Instead of just faking a random value for the DIMM data, get
the information that it is available via DMI table.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-26 06:42:13 +0800
f04c62a70 ghes_edac: add support for reporting errors via EDAC ... Browse Code »

Now that the EDAC core is capable of just forward the errors via
the userspace API, add a report mechanism for the GHES errors.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-26 06:42:13 +0800
77c5f5d2f ghes_edac: Register at EDAC core the BIOS report ... Browse Code »

Register GHES at EDAC MC core, in order to avoid other
drivers to also handle errors and mangle with error data.

The edac core will warrant that just one driver will be used,
so the first one to register (BIOS first) will be the one that
will be reporting the hardware errors.

For now, the EDAC driver does nothing but to register at the
EDAC core, preventing the hardware-driven mechanism to
interfere with GHES.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-26 06:42:12 +0800

22 Feb, 2013

3 commits

06991c28f Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core ... Browse Code »

Pull driver core patches from Greg Kroah-Hartman:
"Here is the big driver core merge for 3.9-rc1

There are two major series here, both of which touch lots of drivers
all over the kernel, and will cause you some merge conflicts:

- add a new function called devm_ioremap_resource() to properly be
able to check return values.

- remove CONFIG_EXPERIMENTAL

Other than those patches, there's not much here, some minor fixes and
updates"

Fix up trivial conflicts

* tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
base: memory: fix soft/hard_offline_page permissions
drivercore: Fix ordering between deferred_probe and exiting initcalls
backlight: fix class_find_device() arguments
TTY: mark tty_get_device call with the proper const values
driver-core: constify data for class_find_device()
firmware: Ignore abort check when no user-helper is used
firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
firmware: Make user-mode helper optional
firmware: Refactoring for splitting user-mode helper code
Driver core: treat unregistered bus_types as having no devices
watchdog: Convert to devm_ioremap_resource()
thermal: Convert to devm_ioremap_resource()
spi: Convert to devm_ioremap_resource()
power: Convert to devm_ioremap_resource()
mtd: Convert to devm_ioremap_resource()
mmc: Convert to devm_ioremap_resource()
mfd: Convert to devm_ioremap_resource()
media: Convert to devm_ioremap_resource()
iommu: Convert to devm_ioremap_resource()
drm: Convert to devm_ioremap_resource()
...

Linus Torvalds
2013-02-22 04:05:51 +0800
e7e248304 edac: add support for raw error reports ... Browse Code »

That allows APEI GHES driver to report errors directly, using
the EDAC error report API.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-22 01:16:03 +0800
c7ef76455 edac: reduce stack pressure by using a pre-allocated buffer ... Browse Code »

The number of variables at the stack is too big.
Reduces the stack usage by using a pre-allocated error
buffer.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-22 00:48:45 +0800

21 Feb, 2013

13 commits

80cc7d87d edac: lock module owner to avoid error report conflicts ... Browse Code »

APEI GHES and i7core_edac/sb_edac currently can be loaded at
the same time, but those are Highlander modules:
"There can be only one".

There are two reasons for that:

1) Each driver assumes that it is the only one registering at
the EDAC core, as it is driver's responsibility to number
the memory controllers, and all of them start from 0;

2) If BIOS is handling the memory errors, the OS can't also be
doing it, as one will mangle with the other.

So, we need to add an module owner's lock at the EDAC core,
in order to avoid having two different modules handling memory
errors at the same time. The best way for doing this lock seems
to use the driver's name, as this is unique, and won't require
changes on every driver.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-21 22:06:38 +0800
c66b5a79a edac: add a new memory layer type ... Browse Code »

There are some cases where the memory controller layout is
completely hidden. This is the case of firmware-driven error
code, like the one provided by GHES. Add a new layer to be
used on such memory error report mechanisms.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-21 22:06:37 +0800
4ab19b06a edac: initialize the core earlier ... Browse Code »

In order for it to work with it builtin, the EDAC core should
be initialized earlier, otherwise the ghes_edac driver initializes
before edac_mc_sysfs_init() being called:

...
[ 4.998373] EDAC MC0: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
...
[ 4.998373] EDAC MC1: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
[ 6.519495] EDAC MC: Ver: 3.0.0
[ 6.523749] EDAC DEBUG: edac_mc_sysfs_init: device mc created

The net result is that no EDAC sysfs nodes will appear.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-21 22:06:36 +0800
3d958823e edac: better report error conditions in debug mode ... Browse Code »

It is hard to find what's wrong without a proper error
report. Improve it, in debug mode.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-21 22:06:35 +0800
59b9796d1 i5100_edac: Remove two checkpatch warnings ... Browse Code »

The last changeset introduced a few checkpatch warnings:

WARNING: debugfs_remove_recursive(NULL) is safe this check is probably not required
261: FILE: drivers/edac/i5100_edac.c:1207:
+ if (priv->debugfs)
+ debugfs_remove_recursive(priv->debugfs);

WARNING: debugfs_remove(NULL) is safe this check is probably not required
290: FILE: drivers/edac/i5100_edac.c:1250:
+ if (i5100_debugfs)
+ debugfs_remove(i5100_debugfs);

Get rid of them.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-21 22:06:34 +0800
9cbc6d38f i5100_edac: connect fault injection to debugfs node ... Browse Code »

Create a debugfs direcotry i5100_edac/mcX for each memory controller and
add nodes to control how fault injection is preformed.

After configuring an injection using inject_channel, inject_deviceptr1,
inject_deviceptr2, inject_eccmask1, inject_eccmask2 and inject_hlinesel
trigger the injection by writing anything to inject_enable.

Example of a CE injection:

echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 61440 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable

Example of UE injection:

echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 2 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask2
echo 17 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr1
echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr2
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable

Sometimes it is needed to enable the injection more then once (echo to
the inject_enable node) for the injection to happen, I am not sure why.

Signed-off-by: Niklas Söderlund
Signed-off-by: Mauro Carvalho Chehab

Niklas Söderlund
2013-02-21 22:06:34 +0800
53ceafd6a i5100_edac: add fault injection code ... Browse Code »

Add fault injection based on information datasheet for i5100, see 1. In
addition to the i5100 datasheet some missing information on injection
functions where found through experimentation and the i7300 datasheet,
see 2.

[1] Intel 5100 Memory Controller Hub Chipset
Doc.Nr: 318378
http://www.intel.com/content/dam/doc/datasheet/5100-
memory-controller-hub-chipset-datasheet.pdf

[2] Intel 7300 Chipset MemoryController Hub (MCH)
Doc.Nr: 318082
http://www.intel.com/assets/pdf/datasheet/318082.pdf

Signed-off-by: Niklas Söderlund
Signed-off-by: Mauro Carvalho Chehab

Niklas Söderlund
2013-02-21 22:06:33 +0800
52608ba20 i5100_edac: probe for device 19 function 0 ... Browse Code »

Probe and store the device handle for the device 19 function 0 during
driver initialization. The device is used during fault injection.

Signed-off-by: Niklas Söderlund
Signed-off-by: Mauro Carvalho Chehab

Niklas Söderlund
2013-02-21 22:06:32 +0800
e7100478f edac: only create sdram_scrub_rate where supported ... Browse Code »

Currently, sdram_scrub_rate sysfs node is created even if the device
doesn't support get/set the scub rate. Change the logic to only
create this device node when the operation is supported.

Reported-by: Felipe Balbi
Acked-by: Borislav Petkov
Reviewed-by: Felipe Balbi
Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-21 22:06:31 +0800
61734e183 i3200_edac: Fix the logic that detects filled memories ... Browse Code »

After running a series of tests on an HP DL320, filled with different
memory sizes, it was noticed that, when filled with just one DIMM
on such hardware, the driver wrongly detects twice the memory, and
thinks that both channels 0 and 1 are filled.

It seems to be partially caused by the BIOS and partially by the driver.

The i3200_edac current logic would be working fine if the BIOS were
disabling the unused second channel when just one DIMM is connected,
in order to do power-saving, as recommended on this chipset's datasheet.

However, the BIOS on this particular machine doesn't do it:

[ 16.741421] EDAC DEBUG: how_many_channels: In dual channel mode
[ 16.741424] EDAC DEBUG: how_many_channels: 2 DIMMS per channel enabled

So, the driver were assuming that 2 channels are enabled (well, they are,
but the second is unused).

Combined with that, I found two issues at the logic that creates the
EDAC data, that were failing when the two channels are not equally
filled (AFAICT, that happens only when just 1 DIMM is plugged).

The first one is that a 0 at DRB means that nothing is filled. The
driver's logic, however, do some calculation with that.

The second one is that the logic that fills the DIMM data currently
assumes that both channels are equally filled.

I tested the system already with the current configuration and my
patch and it is now working fine. So, for a 2R single DIMM 2Gb memory
at dimm slot 01 (channel 0), it is now displaying:

[ 16.741406] EDAC DEBUG: i3200_get_drbs: drb[0][0] = 16, drb[1][0] = 0
[ 16.741410] EDAC DEBUG: i3200_get_drbs: drb[0][1] = 32, drb[1][1] = 0
[ 16.741413] EDAC DEBUG: i3200_get_drbs: drb[0][2] = 32, drb[1][2] = 0
[ 16.741416] EDAC DEBUG: i3200_get_drbs: drb[0][3] = 32, drb[1][3] = 0
...
[ 16.741896] EDAC DEBUG: i3200_probe1: csrow 0, channel 0, size = 1024 Mb
[ 16.741899] EDAC DEBUG: i3200_probe1: csrow 1, channel 0, size = 1024 Mb

and the corresponding sysfs nodes are now properly filled.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-21 22:06:31 +0800
5f466cb02 i3200_edac: Add more debug to the driver ... Browse Code »

Currently, it is not possible to know, when debug is enabled,
if the driver is using 2 DIMMS per channel mode or not. It is
not possible to know the values of the drbs registers, used
to identify the memory rank sizes.

Add debug for both, as it helps to track issues on the driver.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2013-02-21 22:06:22 +0800
55529fa57 Merge tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp ... Browse Code »

Pull EDAC updates from Borislav Petkov:
"Mostly AMD's side of EDAC. It is basically a new family enablement
stuff: AMD F16h MCE decoding enablement from Jacob Shin. The rest is
trivial cleanups."

* tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
mpc85xx_edac: Fix typo
EDAC, MCE, AMD: Remove unneeded exports
EDAC, MCE, AMD: Add MCE decoding support for Family 16h
EDAC, MCE, AMD: Make MC2 decoding per-family
amd64_edac: Remove dead code

Linus Torvalds
2013-02-21 03:28:50 +0800
1339730e7 Merge tag 'v3.8-rc7' into next ... Browse Code »

Linux 3.8-rc7

* tag 'v3.8-rc7': (12052 commits)
Linux 3.8-rc7
net: sctp: sctp_endpoint_free: zero out secret key data
net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
atm/iphase: rename fregt_t -> ffreg_t
ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned
ARM: DMA mapping: fix bad atomic test
ARM: realview: ensure that we have sufficient IRQs available
ARM: GIC: fix GIC cpumask initialization
net: usb: fix regression from FLAG_NOARP code
l2tp: dont play with skb->truesize
net: sctp: sctp_auth_key_put: use kzfree instead of kfree
netback: correct netbk_tx_err to handle wrap around.
xen/netback: free already allocated memory on failure in xen_netbk_get_requests
xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
xen/netback: shutdown the ring if it contains garbage.
drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try
virtio_console: Don't access uninitialized data.
net: qmi_wwan: add more Huawei devices, including E320
net: cdc_ncm: add another Huawei vendor specific device
ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
...

Mauro Carvalho Chehab
2013-02-21 02:45:52 +0800

20 Feb, 2013

1 commit

f98982ce8 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 platform changes from Ingo Molnar:

- Support for the Technologic Systems TS-5500 platform, by Vivien
Didelot

- Improved NUMA support on AMD systems:

Add support for federated systems where multiple memory controllers
can exist and see each other over multiple PCI domains. This
basically means that AMD node ids can be more than 8 now and the code
handling this is taught to incorporate PCI domain into those IDs.

- Support for the Goldfish virtual Android emulator, by Jun Nakajima,
Intel, Google, et al.

- Misc fixlets.

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Add TS-5500 platform support
x86/srat: Simplify memory affinity init error handling
x86/apb/timer: Remove unnecessary "if"
goldfish: platform device for x86
amd64_edac: Fix type usage in NB IDs and memory ranges
amd64_edac: Fix PCI function lookup
x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id
x86, AMD, NB: Add multi-domain support

Linus Torvalds
2013-02-20 12:11:07 +0800

10 Feb, 2013

1 commit

e7d2c215e mpc85xx_edac: Fix typo ... Browse Code »

Correct typos.

Signed-off-by: Baruch Siach
Cc: Dave Jiang
Signed-off-by: Borislav Petkov

Baruch Siach
2013-02-10 20:28:41 +0800

30 Jan, 2013

2 commits

d3d09e182 EDAC: Fix kcalloc argument order ... Browse Code »

First number, then size.

Signed-off-by: Joe Perches
Cc:
Signed-off-by: Borislav Petkov

Joe Perches
2013-01-30 18:38:25 +0800
8024c4c0b EDAC: Test correct variable in ->store function ... Browse Code »

We're testing for ->show but calling ->store().

Signed-off-by: Dan Carpenter
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov

Dan Carpenter
2013-01-30 18:38:11 +0800

23 Jan, 2013

4 commits

0f08669e8 EDAC, MCE, AMD: Remove unneeded exports ... Browse Code »

Initially, those strings describing different parts of an MCE message
were shared with amd64_edac and were therefore exported to modules.
However, all except pp_msgs are used only in one place right now so hide
them and make them static.

No functionality change.

Reported-by: Fengguang Wu
Signed-off-by: Borislav Petkov

Borislav Petkov
2013-01-23 05:40:03 +0800
980eec8b2 EDAC, MCE, AMD: Add MCE decoding support for Family 16h ... Browse Code »

Add MCE decoding logic for AMD Family 16h processors.

Boris:

- drop unneeded uu_msgs export
- exit early in cat_mc1_mce and save us an indentation level

Signed-off-by: Jacob Shin
Signed-off-by: Borislav Petkov

Jacob Shin
2013-01-23 05:39:58 +0800
4a73d3de6 EDAC, MCE, AMD: Make MC2 decoding per-family ... Browse Code »

Currently only AMD Family 15h processors have special handling for MC2
errors. Since upcoming Family 16h will also need unique handling, let's
make MC2 handling part of amd_decoder_ops.

Signed-off-by: Jacob Shin
Signed-off-by: Borislav Petkov

Jacob Shin
2013-01-23 05:39:54 +0800
acc7fcb40 amd64_edac: Remove dead code ... Browse Code »

5e2af0c09e60 ("edac: Don't initialize csrow's first_page & friends when
not needed") removed useless initialization of variables but left in the
functions which did that. They're unused now so drop them.

Signed-off-by: Borislav Petkov

Borislav Petkov
2013-01-23 05:39:41 +0800

18 Jan, 2013

1 commit

053417a53 drivers/edac: remove depends on CONFIG_EXPERIMENTAL ... Browse Code »

The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

Acked-by: Mauro Carvalho Chehab
Signed-off-by: Kees Cook
Signed-off-by: Greg Kroah-Hartman

Kees Cook
2013-01-18 04:11:26 +0800

10 Jan, 2013

2 commits

c7e5301a1 amd64_edac: Fix type usage in NB IDs and memory ranges ... Browse Code »

Use appropriate types for northbridge IDs and memory ranges. Mark
immutable data const and keep within compilation unit on related
structures.

Signed-off-by: Daniel J Blueman
Link: http://lkml.kernel.org/r/1354265060-22956-2-git-send-email-daniel@numascale-asia.com
[Boris: Drop arg change to node_to_amd_nb]
Signed-off-by: Borislav Petkov

Daniel J Blueman
2013-01-10 23:18:00 +0800
e2c0bffea amd64_edac: Fix PCI function lookup ... Browse Code »

Fix locating sibling memory controller PCI functions by using the
correct PCI domain and use a northbridge descriptor only if found. We
need to at least warn if it wasn't found so that it gets fixed and we
don't go off with wrong results.

Signed-off-by: Daniel J Blueman
Link: http://lkml.kernel.org/r/1354265060-22956-1-git-send-email-daniel@numascale-asia.com
[Boris: remove wrong comment, sanitize code and warn if NB desc lookup fails]
Signed-off-by: Borislav Petkov

Daniel J Blueman
2013-01-10 23:17:59 +0800