29 May, 2012
7 commits
-
While userspace doesn't fill the dimm labels, add there the dimm location,
as described by the used memory model. This could eventually match what
is described at the dmidecode, making it easier for people to identify the
memory.

For example, on an Intel motherboard where the DMI table is reliable,
the first memory stick is described as:

Memory Device
Array Handle: 0x0029
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 2048 MB
Form Factor: DIMM
Set: 1
Locator: A1_DIMM0
Bank Locator: A1_Node0_Channel0_Dimm0
Type:
Type Detail: Synchronous
Speed: 800 MHz
Manufacturer: A1_Manufacturer0
Serial Number: A1_SerNum0
Asset Tag: A1_AssetTagNum0
Part Number: A1_PartNum0

The memory named as "A1_DIMM0" is physically located at the first
memory controller (node 0), at channel 0, dimm slot 0.

After this patch, the memory label will be filled with:

/sys/devices/system/edac/mc/csrow0/ch0_dimm_label:mc#0channel#0slot#0

And (after the new EDAC API patches) as:

/sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0

So, even if the memory label is not initialized on userspace, useful
information with the error location is filled there, especially since
several systems/motherboards are provided with enough info to map from
channel/slot (or branch/channel/slot) into the DIMM label. So, letting the
EDAC core fill it by default is a good thing.

It should be noted that, as the label filling happens at the
edac_mc_alloc(), drivers can override it to better describe the memories
(and some actually do it).

Cc: Aristeu Rozanski
Cc: Doug Thompson
Signed-off-by: Mauro Carvalho Chehab
-
Now that all drivers got converted to use the new ABI, we can
drop the old one.

Acked-by: Chris Metcalf
Signed-off-by: Mauro Carvalho Chehab
-
Change the EDAC internal representation to work with non-csrow
based memory controllers.

There are lots of those memory controllers nowadays, and more
are coming. So, the EDAC internal representation needs to be
changed, in order to work with those memory controllers, while
preserving backward compatibility with the old ones.

The edac core was written with the idea that memory controllers
are able to directly access csrows.

This is not true for FB-DIMM and RAMBUS memory controllers.
Also, some recent advanced memory controllers don't present a per-csrow
view. Instead, they view memories as DIMMs rather than ranks.

So, change the allocation and error report routines to allow
them to work with all types of architectures.

This will allow the removal of several hacks with FB-DIMM and RAMBUS
memory controllers.

Also, several tests were done on different platforms using different
x86 drivers.

TODO: multi-rank DIMMs are currently represented by multiple DIMM
entries in struct dimm_info. That means that changing a label for one
rank won't change the same label for the other ranks at the same DIMM.
This bug is present since the beginning of the EDAC, so it is not a big
deal. However, on several drivers, it is possible to fix this issue, but
it should be a per-driver fix, as the csrow => DIMM arrangement may not
be equal for all. So, don't try to fix it here yet.

I tried to make this patch as short as possible, preceding it with
several other patches that simplified the logic here. Yet, as the
internal API changes, all drivers need changes. The changes are
generally bigger in the drivers for FB-DIMMs.

Cc: Aristeu Rozanski
Cc: Doug Thompson
Cc: Borislav Petkov
Cc: Mark Gross
Cc: Jason Uhlenkott
Cc: Tim Small
Cc: Ranganathan Desikan
Cc: "Arvind R."
Cc: Olof Johansson
Cc: Egor Martovetsky
Cc: Chris Metcalf
Cc: Michal Marek
Cc: Jiri Kosina
Cc: Joe Perches
Cc: Dmitry Eremin-Solenikov
Cc: Benjamin Herrenschmidt
Cc: Hitoshi Mitake
Cc: Andrew Morton
Cc: "Niklas Söderlund"
Cc: Shaohui Xie
Cc: Josh Boyer
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab
-
The edac_align_ptr() function is used to prepare data for a single
memory allocation kzalloc() call. It counts how many bytes are needed
by some data structure.

Using it as-is is not that trivial, as the quantity of memory elements
reserved is not there, but, instead, it is on a next call.

In order to avoid mistakes when using it, move the number of allocated
elements into it, making it easier to use.

Reviewed-by: Borislav Petkov
Cc: Aristeu Rozanski
Cc: Doug Thompson
Signed-off-by: Mauro Carvalho Chehab
-
The number of pages is a dimm property. Move it to the dimm struct.
After this change, it is possible to add sysfs nodes for the DIMMs that
will properly represent the DIMM stick properties, including its size.

A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when
the memory controller represents the memory via chip select rows.

Reviewed-by: Aristeu Rozanski
Acked-by: Borislav Petkov
Acked-by: Chris Metcalf
Cc: Doug Thompson
Cc: Mark Gross
Cc: Jason Uhlenkott
Cc: Tim Small
Cc: Ranganathan Desikan
Cc: "Arvind R."
Cc: Olof Johansson
Cc: Egor Martovetsky
Cc: Michal Marek
Cc: Jiri Kosina
Cc: Joe Perches
Cc: Dmitry Eremin-Solenikov
Cc: Benjamin Herrenschmidt
Cc: Hitoshi Mitake
Cc: Andrew Morton
Cc: "Niklas Söderlund"
Cc: Shaohui Xie
Cc: Josh Boyer
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab
-
On systems based on chip select rows, all channels need to use memories
with the same properties, otherwise the memories on channels A and B
won't be recognized.

However, such an assumption is not true for all types of memory
controllers.

Controllers for FB-DIMMs don't have such requirements.
Also, modern Intel controllers seem to be capable of handling such
differences.

So, we need to stop storing the DIMM information in per-csrow
data, and store it at the right place instead.

The first step is to move grain, mtype, dtype and edac_mode to the
per-dimm struct.

Reviewed-by: Aristeu Rozanski
Reviewed-by: Borislav Petkov
Acked-by: Chris Metcalf
Cc: Doug Thompson
Cc: Borislav Petkov
Cc: Mark Gross
Cc: Jason Uhlenkott
Cc: Tim Small
Cc: Ranganathan Desikan
Cc: "Arvind R."
Cc: Olof Johansson
Cc: Egor Martovetsky
Cc: Michal Marek
Cc: Jiri Kosina
Cc: Joe Perches
Cc: Dmitry Eremin-Solenikov
Cc: Benjamin Herrenschmidt
Cc: Hitoshi Mitake
Cc: Andrew Morton
Cc: James Bottomley
Cc: "Niklas Söderlund"
Cc: Shaohui Xie
Cc: Josh Boyer
Cc: Mike Williams
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab
-
The way a DIMM is currently represented implies that they're
linked into a per-csrow struct. However, some drivers don't see
csrows, as they're hidden behind some chip like the AMB's
on FBDIMM's, for example.

This forced drivers to fake^Wvirtualize a csrow struct, and to create
a mess under the original csrow/channel concept.

Move the DIMM labels into a per-DIMM struct, and add there
the real location of the socket, in terms of csrow/channel.
Later patches will modify the location to properly represent the
memory architecture.

All other drivers will use a per-csrow type of location.
Some of those drivers will require a later conversion, as
they also fake the csrows internally.

TODO: While this patch doesn't change the existing behavior, on
csrows-based memory controllers, a csrow/channel pair points to a memory
rank. There's a known bug at the EDAC core that allows having different
labels for the same DIMM, if it has more than one rank. A later patch
is needed to merge the several ranks for a DIMM into the same dimm_info
struct, in order to avoid having different labels for the same DIMM.

The edac_mc_alloc() will now contain a per-dimm initialization loop that
will be changed by later patches in order to match other types of
memory architectures.

Reviewed-by: Aristeu Rozanski
Reviewed-by: Borislav Petkov
Cc: Doug Thompson
Cc: Ranganathan Desikan
Cc: "Arvind R."
Cc: "Niklas Söderlund"
Signed-off-by: Mauro Carvalho Chehab
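
The per-DIMM layout described in the commit above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the names dimm_sketch, mc_sketch, mc_sketch_alloc() and LABEL_LEN are hypothetical, and the kernel's real struct dimm_info and edac_mc_alloc() carry more fields and logic than shown here. The sketch keeps one entry per csrow/channel pair, which is exactly the arrangement the TODO warns about for multi-rank DIMMs.

```c
#include <assert.h>
#include <stdlib.h>

#define LABEL_LEN 32	/* hypothetical label size for this sketch */

/* Hypothetical per-DIMM record: a label plus its csrow/channel location. */
struct dimm_sketch {
	char label[LABEL_LEN];
	unsigned int csrow;
	unsigned int channel;
};

/* Hypothetical controller: a flat array of csrows * channels entries. */
struct mc_sketch {
	unsigned int nr_csrows;
	unsigned int nr_channels;
	struct dimm_sketch *dimms;
};

/* Allocate the controller and run the per-DIMM initialization loop. */
static struct mc_sketch *mc_sketch_alloc(unsigned int nr_csrows,
					 unsigned int nr_channels)
{
	struct mc_sketch *mc;
	unsigned int row, chn;

	assert(nr_csrows > 0 && nr_channels > 0);

	mc = calloc(1, sizeof(*mc));
	if (!mc)
		return NULL;

	mc->dimms = calloc((size_t)nr_csrows * nr_channels, sizeof(*mc->dimms));
	if (!mc->dimms) {
		free(mc);
		return NULL;
	}
	mc->nr_csrows = nr_csrows;
	mc->nr_channels = nr_channels;

	for (row = 0; row < nr_csrows; row++) {
		for (chn = 0; chn < nr_channels; chn++) {
			struct dimm_sketch *d = &mc->dimms[row * nr_channels + chn];

			/* Record where the socket really is. */
			d->csrow = row;
			d->channel = chn;
			/* The label stays empty until someone fills it. */
		}
	}
	return mc;
}

static void mc_sketch_free(struct mc_sketch *mc)
{
	if (mc) {
		free(mc->dimms);
		free(mc);
	}
}
```

With two csrows and two channels, entry 3 of the array describes csrow 1, channel 1, and every label starts out empty.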
29 Mar, 2012
1 commit
-
Pull EDAC fixes from Mauro Carvalho Chehab:
"A series of EDAC driver fixes. It also has one core fix at the
documentation, and a rename patch, fixing the name of the struct that
contains the rank information."

* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
edac: rename channel_info to rank_info
i5400_edac: Avoid calling pci_put_device() twice
edac: i5100 ack error detection register after each read
edac: i5100 fix erroneous define for M1Err
edac: sb_edac: Fix a wrong value setting for the previous value
edac: sb_edac: Fix a INTERLEAVE_MODE() misuse
edac: sb_edac: Let the driver depend on PCI_MMCONFIG
edac: Improve the comments to better describe the memory concepts
edac/ppc4xx_edac: Fix compilation
Fix sb_edac compilation with 32 bits kernels
22 Mar, 2012
1 commit
-
What is pointed to by a csrow/channel vector is rank information, and
not channel information.

On a traditional architecture, the memory controller directly accesses the
memory ranks, via chip select rows. Different ranks at the same DIMM are
selected via different chip select rows. So, typically, one
csrow/channel pair means one different DIMM.

On FB-DIMMs, there's a microcontroller chip at the DIMM, called Advanced
Memory Buffer (AMB) that serves as the interface between the memory
controller and the memory chips.

The AMB selection is via the DIMM slot, and not via a csrow.
It is up to the AMB to talk with the csrows of the DRAM chips.
So, the FB-DIMM memory controllers see the DIMM slot, and not the DIMM
rank. RAMBUS is similar.

Newer memory controllers, like the ones found on Intel Sandy Bridge and
Nehalem, even working with normal DDR3 DIMM's, don't use the usual
channel A/channel B interleaving schema to provide 128 bits data access.

Instead, they have more channels (3 or 4 channels), and they can use
several interleaving schemas. Such memory controllers see the DIMMs
directly on their registers, instead of the ranks, which is better for
the driver, as its main usage is to point to a broken DIMM stick (the
Field Replaceable Unit), and not to point to a broken DRAM chip.

The drivers that support such newer memory architecture models
currently need to fake information and to abuse EDAC structures, as
the subsystem was conceived with the idea that the csrow would always be
visible by the CPU.

To make things a little worse, those drivers don't currently fake
csrows/channels in a consistent way, as the concepts there don't apply
to the memory controllers they're talking with. So, each driver author
interpreted the concepts using a different logic.

In order to fix it, let's rename the data structure that points into a
DIMM rank to "rank_info", in order to be clearer about what's stored
there.

Later patches will provide a better way to represent the memory
hierarchy for the other types of memory controller.

Signed-off-by: Mauro Carvalho Chehab
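
To make the rank-versus-DIMM distinction above concrete, here is a small sketch of how one csrow/channel entry names a rank and how several ranks may fold into one DIMM slot. The names rank_sketch and rank_to_dimm_slot() are hypothetical, and the mapping itself is an assumption: it supposes that consecutive csrows on a channel belong to the same stick, which the commit log elsewhere notes is not the same for every driver.

```c
#include <assert.h>

/* Hypothetical: each entry of the csrow/channel vector names one rank. */
struct rank_sketch {
	unsigned int csrow;	/* chip select row that selects the rank */
	unsigned int channel;	/* channel the rank sits on */
};

/*
 * Assumed arrangement (not universal): consecutive csrows on one channel
 * belong to the same DIMM, with ranks_per_dimm csrows per stick.
 */
static unsigned int rank_to_dimm_slot(const struct rank_sketch *rank,
				      unsigned int ranks_per_dimm)
{
	assert(ranks_per_dimm > 0);
	return rank->csrow / ranks_per_dimm;
}
```

Under that assumption, csrows 0 and 1 of a dual-rank configuration both resolve to slot 0, while csrow 2 resolves to slot 1. With single-rank sticks (ranks_per_dimm of 1) every csrow is its own DIMM.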
20 Mar, 2012
1 commit
-
Signed-off-by: Cong Wang
15 Dec, 2011
1 commit
-
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Cc: Doug Thompson
Cc: Paul Gortmaker
Cc: Lucas De Marchi
Cc: Borislav Petkov
Signed-off-by: Kay Sievers
Signed-off-by: Greg Kroah-Hartman
27 May, 2011
1 commit
-
synchronize_rcu() does the stuff as needed.
Signed-off-by: Lai Jiangshan
Cc: Doug Thompson
Cc: "Paul E. McKenney"
Cc: Mauro Carvalho Chehab
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
31 Mar, 2011
1 commit
-
Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi
07 Jan, 2011
1 commit
-
Add a macro per printk level, shorten up error messages. Add relevant
information to KERN_INFO level. No functional change.

Signed-off-by: Borislav Petkov
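
The idea of one macro per printk level can be sketched in userspace as follows. This is not the kernel's code: the sk_* names and the SK_* prefixes are hypothetical stand-ins, and the sketch formats into a caller-supplied buffer with snprintf() instead of calling printk(), purely so that the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's KERN_* severity prefixes. */
#define SK_ERR     "<3>"
#define SK_WARNING "<4>"
#define SK_INFO    "<6>"

/*
 * Generic formatter: writes "<level>EDAC MC: message" into buf and returns
 * what snprintf() returns. The first variadic argument must be a string
 * literal so that it can be concatenated with the prefix.
 */
#define sk_printk(buf, len, level, ...) \
	(assert((len) > 0), \
	 snprintf((buf), (len), level "EDAC MC: " __VA_ARGS__))

/* One short wrapper per level, so call sites no longer repeat the prefix. */
#define sk_err(buf, len, ...)  sk_printk(buf, len, SK_ERR, __VA_ARGS__)
#define sk_warn(buf, len, ...) sk_printk(buf, len, SK_WARNING, __VA_ARGS__)
#define sk_info(buf, len, ...) sk_printk(buf, len, SK_INFO, __VA_ARGS__)
```

A call such as sk_info(buf, sizeof(buf), "Removed device %d\n", 0) then produces a line that starts with the info-level prefix followed by "EDAC MC: ", without the caller spelling out either.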
09 Dec, 2010
1 commit
-
00740c58541b6087d78418cebca1fcb86dc6077d changed edac_core to
un-/register a workqueue item only if a lowlevel driver supplies a
polling routine. Normally, when we remove a polling low-level driver, we
go and cancel all the queued work. However, the workqueue unreg happens
based on the ->op_state setting, and edac_mc_del_mc() sets this to
OP_OFFLINE _before_ we cancel the work item, leading to NULL ptr oops on
the workqueue list.

Fix it by putting the unreg stuff in proper order.
Cc: #36.x
Reported-and-tested-by: Tobias Karnat
LKML-Reference:
Signed-off-by: Borislav Petkov
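
The ordering problem described in the commit above can be modelled with a few lines of userspace C. Everything here is a hypothetical stand-in (sk_mci, sk_workq_teardown(), sk_del_mc(), the SK_* states); the boolean work_queued merely represents a queued work item. The point of the sketch is only the order of the two steps: because the teardown decides what to do from op_state, the state must not be switched to offline before the teardown has run.

```c
#include <assert.h>
#include <stdbool.h>

enum sk_op_state { SK_RUNNING_POLL, SK_RUNNING_INTERRUPT, SK_OFFLINE };

struct sk_mci {
	enum sk_op_state op_state;
	bool work_queued;	/* stands in for a queued delayed work item */
};

/* Cancel the poll work only when the state says polling was set up. */
static void sk_workq_teardown(struct sk_mci *mci)
{
	if (mci->op_state == SK_RUNNING_POLL)
		mci->work_queued = false;
}

/* Fixed order: tear down the work item first, then mark offline. */
static void sk_del_mc(struct sk_mci *mci)
{
	sk_workq_teardown(mci);
	mci->op_state = SK_OFFLINE;

	/* Nothing may remain queued once the controller is offline. */
	assert(!mci->work_queued);
}
```

Swapping the two statements in sk_del_mc() would make the teardown see SK_OFFLINE, skip the cancel, and leave the work item queued, which is the situation the commit describes.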
24 Oct, 2010
4 commits
-
This is a nasty bug. Since kobject count will be reduced to zero by
edac_mc_del_mc(), and this triggers the kobj release method, the
mci memory will be freed automatically. So, all we have left is ctl_name,
as shown by enabling debug:

[ 80.822186] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device() remove_link
[ 80.832590] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device() remove_mci_instance
[ 80.843776] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
[ 80.855163] EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:3f:03.0
[ 80.862936] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2089: (null): free structs
[ 80.871134] EDAC DEBUG: in drivers/edac/edac_mc.c, line at 238: edac_mc_free()
[ 80.878379] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 726: edac_mc_unregister_sysfs_main_kobj()
[ 80.888043] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1232: drivers/edac/i7core_edac.c: i7core_put_devices()

Also, kfree(mci) shouldn't happen at the kobj.release, as it happens
when edac_remove_sysfs_mci_device() is called, but the logic is:

	edac_remove_sysfs_mci_device(mci);
	edac_printk(KERN_INFO, EDAC_MC,
		"Removed device %d for %s %s: DEV %s\n", mci->mc_idx,
		mci->mod_name, mci->ctl_name, edac_dev_name(mci));

So, as the edac_printk() needs the mci struct, this generates an OOPS.

Signed-off-by: Mauro Carvalho Chehab
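
The hazard described above is a use-after-free: the last reference is dropped, the release callback frees the structure, and the code then reads it to build a log message. The sketch below illustrates one generic way to avoid that, namely formatting the message while the object is still valid. It is not the fix the commit applied (the log does not show the diff), and sk_mci, sk_put() and sk_remove() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct sk_mci {
	int refcount;
	int mc_idx;
	const char *ctl_name;
};

/* Drop one reference; the object is freed when the count hits zero. */
static int sk_put(struct sk_mci *mci)
{
	assert(mci->refcount > 0);
	if (--mci->refcount == 0) {
		free(mci);
		return 1;	/* freed: the caller must not touch mci again */
	}
	return 0;
}

/* Safe order: format the message while mci is still valid, then put. */
static int sk_remove(struct sk_mci *mci, char *msg, size_t len)
{
	snprintf(msg, len, "Removed device %d for %s",
		 mci->mc_idx, mci->ctl_name);
	return sk_put(mci);
}
```

After sk_remove() returns 1, only the already formatted message may be used; the structure itself is gone.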
-
This is important to track a nasty bug at the free logic.
Signed-off-by: Mauro Carvalho Chehab
-
Make sure we remove groups in the right order
Signed-off-by: Mauro Carvalho Chehab
-
With multi-sockets, more than one edac pci handler is enabled. Be sure to
un-register all instances.

Signed-off-by: Mauro Carvalho Chehab
27 Sep, 2010
1 commit
-
f4347553b30ec66530bfe63c84530afea3803396 removed the edac polling
mechanism in favor of using a notifier chain for conveying MCE
information to edac. However, the module removal path didn't test
whether the driver had set up the polling function workqueue at all and
the rmmod process was hanging in the kernel at try_to_del_timer_sync()
in the cancel_delayed_work() path, trying to cancel an uninitialized
work struct.

Fix that by adding a balancing check to the workqueue removal path.
Signed-off-by: Borislav Petkov
08 Dec, 2009
1 commit
-
Instead of using deeply-nested conditionals for dumping the DIMM type in
debug mode, add a strings array of the supported DIMM types.

This is useful in cases where an edac driver supports multiple DRAM
types and is only defined in debug builds.

Signed-off-by: Borislav Petkov
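
Replacing nested conditionals with a strings array indexed by the type enum can look like the sketch below. The enum values and strings are hypothetical and much shorter than the kernel's real list of memory types; only the lookup pattern is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of memory types, for illustration only. */
enum sk_mem_type {
	SK_MEM_EMPTY,
	SK_MEM_DDR,
	SK_MEM_DDR2,
	SK_MEM_DDR3,
	SK_MEM_NR_TYPES		/* keep last */
};

/* One string per enum value, indexed directly by the enum. */
static const char * const sk_mem_type_names[SK_MEM_NR_TYPES] = {
	[SK_MEM_EMPTY] = "Empty",
	[SK_MEM_DDR]   = "DDR",
	[SK_MEM_DDR2]  = "DDR2",
	[SK_MEM_DDR3]  = "DDR3",
};

static const char *sk_mem_type_name(enum sk_mem_type type)
{
	/* Out-of-range values get a fixed fallback string. */
	if ((unsigned int)type >= SK_MEM_NR_TYPES)
		return "unknown";

	/* Catches an enum value that was added without a name. */
	assert(sk_mem_type_names[type] != NULL);
	return sk_mem_type_names[type];
}
```

A debug print then needs a single call, sk_mem_type_name(type), instead of a chain of comparisons, and adding a type means adding one enum value and one table entry.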
24 Sep, 2009
1 commit
-
Module edac_core.ko uses call_rcu() callbacks in edac_device.c, edac_mc.c
and edac_pci.c.

They all use a wait_for_completion() scheme, but this scheme is not 100%
safe on multiple CPUs. See the _rcu_barrier() implementation which
explains why extra precaution is needed.

The patch adds a comment about rcu_barrier() and as a precaution calls
rcu_barrier(). A maintainer needs to look at removing the
wait_for_completion code.

[dougthompson@xmission.com: remove the wait_for_completion code]
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Doug Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Apr, 2009
1 commit
-
The edac-core driver includes code which assumes that the work_struct
which is included in every delayed_work is the first member of that
structure. This is currently the case but might change in the future, so
use to_delayed_work() instead, which doesn't make such an assumption.

linux-2.6.30-rc1 has the to_delayed_work() function that will allow this
patch to work.

Signed-off-by: Jean Delvare
Signed-off-by: Doug Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
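
The reason a helper like to_delayed_work() is safer than a plain cast can be shown with a userspace container_of() sketch. The sk_* types and helper below are hypothetical; the embedded work item is deliberately not the first member, so a cast from the inner pointer to the outer type would give the wrong address while the offset-based conversion still works.

```c
#include <assert.h>
#include <stddef.h>

struct sk_work {
	int pending;
};

/* The work item is deliberately NOT the first member here. */
struct sk_delayed_work {
	long timer_expires;
	struct sk_work work;
};

/* Userspace equivalent of the container_of() idiom. */
#define sk_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing structure from a pointer to its embedded work. */
static struct sk_delayed_work *sk_to_delayed_work(struct sk_work *work)
{
	assert(work != NULL);
	return sk_container_of(work, struct sk_delayed_work, work);
}
```

Because the conversion subtracts the member's real offset, it keeps working if fields are added, removed, or reordered in the outer structure.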
07 Jan, 2009
1 commit
-
This patch is part of a larger patch series which will remove the "char
bus_id[20]" name string from struct device. The device name is managed in
the kobject anyway, and without any size limitation, and just needlessly
copied into "struct device".

[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Greg Kroah-Hartman
Acked-by: Doug Thompson
Signed-off-by: Kay Sievers
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 May, 2008
1 commit
-
Commit 06916639e2fed9ee475efef2747a1b7429f8fe76 ("driver-core: add
dev_name() to help transition away from using bus_id") added a static
inline dev_name() and used it in dev_printk.

Unfortunately, drivers/edac/edac_core.h defines a macro called
dev_name(). Rename the latter.

Diagnosis by Tony Breeds and Michael Ellerman.
Signed-off-by: Stephen Rothwell
Acked-by: Doug Thompson
Signed-off-by: Linus Torvalds
29 Apr, 2008
2 commits
-
Collection of patches, merged into one, from Adrian that do the following:
1) This patch makes the following needlessly global functions static:
- edac_pci_get_log_pe()
- edac_pci_get_log_npe()
- edac_pci_get_panic_on_pe()
- edac_pci_unregister_sysfs_instance_kobj()
- edac_pci_main_kobj_setup()
2) Remove unneeded function edac_device_find()
3) Added #if 0 around function edac_pci_find()
4) make the needlessly global edac_pci_generic_check() static
5) Removed function edac_check_mc_devices()

Doug Thompson modified Adrian's patches, to better represent
the direction of EDAC, and make them one patch.

Cc: Alan Cox
Signed-off-by: Adrian Bunk
Signed-off-by: Doug Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Signed-off-by: Robert P. J. Day
Acked-by: Doug Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Jul, 2007
1 commit
-
This fixes a deadlock that could occur on a 'setup' and 'teardown' sequence of
the workq for an edac_mc control structure instance. A similar fix was
previously implemented for the edac_device code.

In addition, the edac_mc device code there was missing code to allow the workq
period value to be altered via sysfs control.

This patch adds that fix on the code, and allows for the changing of the
period value as well.

Cc: Alan Cox
Signed-off-by: Doug Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Jul, 2007
12 commits
-
Fix mutex locking deadlock on the device controller linked list. Was calling
a lock then a function that could call the same lock. Moved the cancel workq
function to outside the lock.

Added some short circuit logic in the workq code
Added comments of description
Code tidying
Signed-off-by: Doug Thompson
Cc: Greg KH
Cc: Alan Cox
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This patch refactors the 'releasing' of kobjects for the edac_mc type of
device. The correct pattern of kobject release is followed.

As internal kobjs are allocated they bump a ref count on the top level kobj.
It in turn has a module ref count on the edac_core module. When internal
kobjects are released, they dec the ref count on the top level kobj. When the
top level kobj reaches zero, it decrements the ref count on the edac_core
object, allowing it to be unloaded, as all resources have now been released.

Cc: Alan Cox alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson
Acked-by: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Refactoring of sysfs code necessitated the refactoring of the edac_mc_alloc()
and edac_mc_add_mc() APIs, moving the index value to the alloc() function.
This patch alters the in tree drivers to utilize this new api signature.

Having the index value assigned later created a chicken-and-the-egg issue.
Moving it to the alloc() function allows for creating the necessary sysfs
entries with the proper index number.

Cc: Alan Cox alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Refactor the edac_align_ptr() function to reduce the noise of casting the
aligned pointer to the various types of data objects and modified its callers
to its new signature.

Signed-off-by: Douglas Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This patch fixes some remnant spaces inserted by the use of Lindent.
Seems Lindent adds some spaces when it shouldn't. These have been fixed.
In addition, goto targets have issues, these have been fixed
in this patch.

Signed-off-by: Douglas Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The origin of this code comes from patches at sourceforge, that
allow EDAC to be updated to various kernels. With kernel version 2.6.20 a
new workq system was installed, thus the patches needed to be modified
based on the kernel version. For submitting to the latest kernel.org
those #ifdefs are removed.

Signed-off-by: Douglas Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Run the EDAC CORE files through Lindent for cleanup
Signed-off-by: Douglas Thompson
Signed-off-by: Dave Jiang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Fixup poll values for MC and PCI.
Also make mc function names unique to mc.

Signed-off-by: Dave Jiang
Signed-off-by: Douglas Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Change error check and clear variable from an atomic to an int
Signed-off-by: Dave Jiang
Signed-off-by: Douglas Thompson
Signed-off-by: Linus Torvalds
-
Move the memory controller object from the kernel thread based
implementation to a work queue based one.

Signed-off-by: Dave Jiang
Signed-off-by: Douglas Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Move dev_name() macro to a more generic interface since it's not possible
to determine whether a device is pci, platform, or of_device easily.

Now each low level driver sets the name into the control structure, and
the EDAC core references the control structure for the information.

Better abstraction.

Signed-off-by: Dave Jiang
Signed-off-by: Douglas Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
In the refactoring of edac_mc.c into several subsystem files,
the header file edac_mc.h became meaningless. A new header file
edac_core.h was created. All the files that previously included
"edac_mc.h" are changed to include "edac_core.h".

Signed-off-by: Douglas Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds