12 Dec, 2013

1 commit

  • This new parameter controls how HW errors are reported, especially on
    newer Intel platforms like Ivybridge-EX, which contain enhanced error
    decoding functionality in the firmware, i.e. eMCA.

    Signed-off-by: Chen, Gong
    Acked-by: Tony Luck
    Link: http://lkml.kernel.org/r/1386310630-12529-2-git-send-email-gong.chen@linux.intel.com
    [ Boris: massage commit message. ]
    Signed-off-by: Borislav Petkov

    Chen, Gong
     

24 Oct, 2013

1 commit


24 Jul, 2013

1 commit

  • Fix the following:

    BUG: key ffff88043bdd0330 not in .data!
    ------------[ cut here ]------------
    WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
    DEBUG_LOCKS_WARN_ON(1)
    Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
    CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
    Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
    0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
    ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
    ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
    Call Trace:
    dump_stack
    warn_slowpath_common
    warn_slowpath_fmt
    lockdep_init_map
    ? trace_hardirqs_on_caller
    ? trace_hardirqs_on
    debug_mutex_init
    __mutex_init
    bus_register
    edac_create_sysfs_mci_device
    edac_mc_add_mc
    sbridge_probe
    pci_device_probe
    driver_probe_device
    __driver_attach
    ? driver_probe_device
    bus_for_each_dev
    driver_attach
    bus_add_driver
    driver_register
    __pci_register_driver
    ? 0xffffffffa0010fff
    sbridge_init
    ? 0xffffffffa0010fff
    do_one_initcall
    load_module
    ? unset_module_init_ro_nx
    SyS_init_module
    tracesys
    ---[ end trace d24a70b0d3ddf733 ]---
    EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
    EDAC sbridge: Driver loaded.

    What happens is that bus_register needs a statically allocated lock_key
    because the latter is handed to lockdep. However, struct mem_ctl_info
    embeds struct bus_type (the whole struct, not a pointer to it) and the
    whole thing gets dynamically allocated.

    Fix this by using a statically allocated struct bus_type for the MC bus.

    Signed-off-by: Borislav Petkov
    Acked-by: Mauro Carvalho Chehab
    Cc: Markus Trippelsdorf
    Cc: stable@kernel.org # v3.10
    Signed-off-by: Tony Luck

    Borislav Petkov
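
A minimal userspace sketch of the fix described above, assuming a fixed maximum number of controllers: the bus objects live in a static array, and each dynamically allocated controller only holds a pointer into it, so anything derived from the bus's address (such as a lockdep key) refers to static storage. All type and constant names here (bus_type_sketch, mem_ctl_info_sketch, MAX_MCS, attach_static_bus) are hypothetical stand-ins, not the kernel's definitions, and indexing by controller number is an assumption of the sketch.

```c
#include <assert.h>

/* Hypothetical upper bound on memory controllers in this sketch. */
#define MAX_MCS 4

/* Hypothetical stand-in for struct bus_type. */
struct bus_type_sketch {
	const char *name;
};

/* One statically allocated bus per memory controller (static storage),
 * instead of a bus embedded in the dynamically allocated controller. */
static struct bus_type_sketch mc_bus[MAX_MCS];

/* Hypothetical stand-in for struct mem_ctl_info. */
struct mem_ctl_info_sketch {
	int mc_idx;
	struct bus_type_sketch *bus;   /* pointer, not an embedded struct */
};

/* Point the controller at its static bus slot. */
static void attach_static_bus(struct mem_ctl_info_sketch *mci)
{
	assert(mci->mc_idx >= 0 && mci->mc_idx < MAX_MCS);
	mci->bus = &mc_bus[mci->mc_idx];
}
```

The controller struct itself can still be heap-allocated; only the object whose address matters to lockdep has to stay static.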
     

16 Mar, 2013

2 commits

  • Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
    memory controller is csrows based. Merge both fields into one.

    There's no need for the driver to actually fill it, as the core detects
    it by checking if one of the layers has the csrows type as part of the
    memory hierarchy:

    if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
    per_rank = true;

    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Borislav Petkov

    Mauro Carvalho Chehab
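
The detection rule quoted above can be sketched as a complete function. This is an illustration under assumptions: the enum and struct names (layer_type, layer, is_csrow_based) are hypothetical simplifications of the EDAC layer description, not the kernel header.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the layer types named in these commit messages. */
enum layer_type {
	LAYER_BRANCH,
	LAYER_CHANNEL,
	LAYER_SLOT,
	LAYER_CHIP_SELECT,
};

struct layer {
	enum layer_type type;
	unsigned int size;
};

/* The controller is csrow based if any layer of its hierarchy is a chip select. */
static bool is_csrow_based(const struct layer *layers, size_t n_layers)
{
	for (size_t i = 0; i < n_layers; i++)
		if (layers[i].type == LAYER_CHIP_SELECT)
			return true;
	return false;
}
```

Because the answer follows from the layer description the driver already supplies, the driver does not need to set a separate flag.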
     
  • We were filling the csrow size with a wrong value. 16a528ee3975 ("EDAC:
    Fix csrow size reported in sysfs") tried to address the issue. It fixed
    the report with the old API but not with the new one. Correct it for the
    new API too.

    Signed-off-by: Mauro Carvalho Chehab
    [ make it a per-csrow accounting regardless of ->channel_count ]
    Signed-off-by: Borislav Petkov

    Mauro Carvalho Chehab
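
One reading of "per-csrow accounting" is that a csrow's size is the sum of the pages of the DIMMs behind each of its channels, converted to MiB. The sketch below shows that arithmetic under stated assumptions: 4 KiB pages, and hypothetical struct names (dimm_sketch, channel_sketch, csrow_sketch) rather than the kernel's.

```c
#include <stdint.h>

/* Assumes 4 KiB pages; the kernel uses its own PAGE_SHIFT (must be <= 20 here). */
#define SKETCH_PAGE_SHIFT 12
#define PAGES_TO_MIB(pages) ((pages) >> (20 - SKETCH_PAGE_SHIFT))

struct dimm_sketch { uint32_t nr_pages; };
struct channel_sketch { const struct dimm_sketch *dimm; };
struct csrow_sketch {
	unsigned int nr_channels;
	const struct channel_sketch *channels;
};

/* Size of one csrow: sum the pages of every channel's DIMM, independently
 * of any separate channel count kept elsewhere in the controller. */
static uint32_t csrow_size_mib(const struct csrow_sketch *csrow)
{
	uint32_t nr_pages = 0;

	for (unsigned int i = 0; i < csrow->nr_channels; i++)
		nr_pages += csrow->channels[i].dimm->nr_pages;
	return PAGES_TO_MIB(nr_pages);
}
```

For example, two channels each backed by 262144 pages of 4 KiB give 2048 MiB.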
     

22 Feb, 2013

2 commits


21 Feb, 2013

2 commits


21 Dec, 2012

1 commit


28 Nov, 2012

2 commits


27 Jun, 2012

1 commit


12 Jun, 2012

4 commits

  • Kernel kobjects have rigid rules: each container object should be
    dynamically allocated, and can't be allocated into a single kmalloc.

    EDAC never obeyed this rule: it has a single malloc function that
    allocates all needed data into a single kzalloc.

    As this is no longer accepted, change the allocation scheme of the
    EDAC *_info structs to enforce this kernel standard.

    Acked-by: Chris Metcalf
    Cc: Aristeu Rozanski
    Cc: Doug Thompson
    Cc: Greg K H
    Cc: Borislav Petkov
    Cc: Mark Gross
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: Shaohui Xie
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
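
A minimal userspace sketch of the allocation pattern described above: the container and every child object get their own allocation, instead of carving everything out of one block, so each object could later have an independent lifetime. The struct and function names (mci_s, dimm_s, mci_alloc, mci_free) are hypothetical, and calloc/free stand in for the kernel allocators.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the EDAC *_info structs. */
struct dimm_s { unsigned int idx; };
struct mci_s {
	unsigned int n_dimms;
	struct dimm_s **dimms;   /* array of pointers: each DIMM is its own allocation */
};

static void mci_free(struct mci_s *mci)
{
	if (!mci)
		return;
	if (mci->dimms)
		for (unsigned int i = 0; i < mci->n_dimms; i++)
			free(mci->dimms[i]);   /* unallocated slots are NULL; free(NULL) is a no-op */
	free(mci->dimms);
	free(mci);
}

/* Allocate the container and each child separately. */
static struct mci_s *mci_alloc(unsigned int n_dimms)
{
	struct mci_s *mci = calloc(1, sizeof(*mci));

	if (!mci)
		return NULL;
	mci->n_dimms = n_dimms;
	mci->dimms = calloc(n_dimms, sizeof(*mci->dimms));
	if (!mci->dimms)
		goto err;
	for (unsigned int i = 0; i < n_dimms; i++) {
		mci->dimms[i] = calloc(1, sizeof(**mci->dimms));
		if (!mci->dimms[i])
			goto err;
		mci->dimms[i]->idx = i;
	}
	return mci;
err:
	mci_free(mci);
	return NULL;
}
```

The single error path works because calloc zeroes the pointer array, so a partially built container can be released with the same free routine.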
     
  • Sometimes, it is useful to have a mechanism that generates fake
    errors, in order to test the EDAC core code, and the userspace
    tools.

    Provide such a mechanism by adding a few debugfs nodes.

    Reviewed-by: Aristeu Rozanski
    Cc: Doug Thompson
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Now that all users of the old kobj raw access are gone,
    we can get rid of the legacy kobj-based structures and
    data.

    Reviewed-by: Aristeu Rozanski
    Cc: Doug Thompson
    Cc: Michal Marek
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • The EDAC subsystem uses the old struct sysdev approach,
    creating all nodes using the raw sysfs API. This is bad,
    as the API is deprecated.

    As we'll be changing the EDAC API, let's first port the existing
    code to struct device.

    There's one drawback to this patch: driver-specific sysfs
    nodes, used by mpc85xx_edac, amd64_edac and i7core_edac,
    won't be created anymore. While it would be possible to
    also port the device-specific code, that would mix kobj with
    struct device, which is not recommended. Also, it is easier and nicer
    to move the code to the drivers instead, as the core can get rid
    of some complex logic that just emulates what device_add()
    and device_create_file() already do.

    The next patches will convert the driver-specific code to use
    the device-specific calls. Then, the remaining bits of the old
    sysfs API will be removed.

    NOTE: a per-MC bus is required, otherwise devices with more than
    one memory controller will hit a bug like the one below:

    [ 819.094946] EDAC DEBUG: find_mci_by_dev: find_mci_by_dev()
    [ 819.094948] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device() idx=1
    [ 819.094952] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device(): creating device mc1
    [ 819.094967] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device creating dimm0, located at channel 0 slot 0
    [ 819.094984] ------------[ cut here ]------------
    [ 819.100142] WARNING: at fs/sysfs/dir.c:481 sysfs_add_one+0xc1/0xf0()
    [ 819.107282] Hardware name: S2600CP
    [ 819.111078] sysfs: cannot create duplicate filename '/bus/edac/devices/dimm0'
    [ 819.119062] Modules linked in: sb_edac(+) edac_core ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun kvm microcode pcspkr iTCO_wdt iTCO_vendor_support igb i2c_i801 i2c_core sg ioatdma dca sr_mod cdrom sd_mod crc_t10dif ahci libahci isci libsas libata scsi_transport_sas scsi_mod wmi dm_mod [last unloaded: scsi_wait_scan]
    [ 819.175748] Pid: 10902, comm: modprobe Not tainted 3.3.0-0.11.el7.v12.2.x86_64 #1
    [ 819.184113] Call Trace:
    [ 819.186868] [] warn_slowpath_common+0x7f/0xc0
    [ 819.193573] [] warn_slowpath_fmt+0x46/0x50
    [ 819.200000] [] sysfs_add_one+0xc1/0xf0
    [ 819.206025] [] sysfs_do_create_link+0x135/0x220
    [ 819.212944] [] ? sysfs_create_group+0x13/0x20
    [ 819.219656] [] sysfs_create_link+0x13/0x20
    [ 819.226109] [] bus_add_device+0xe6/0x1b0
    [ 819.232350] [] device_add+0x2db/0x460
    [ 819.238300] [] edac_create_dimm_object+0x84/0xf0 [edac_core]
    [ 819.246460] [] edac_create_sysfs_mci_device+0xe8/0x290 [edac_core]
    [ 819.255215] [] edac_mc_add_mc+0x5a/0x2c0 [edac_core]
    [ 819.262611] [] sbridge_register_mci+0x1bc/0x279 [sb_edac]
    [ 819.270493] [] sbridge_probe+0xef/0x175 [sb_edac]
    [ 819.277630] [] ? pm_runtime_enable+0x58/0x90
    [ 819.284268] [] local_pci_probe+0x5c/0xd0
    [ 819.290508] [] __pci_device_probe+0xf1/0x100
    [ 819.297117] [] pci_device_probe+0x3a/0x60
    [ 819.303457] [] really_probe+0x73/0x270
    [ 819.309496] [] driver_probe_device+0x4e/0xb0
    [ 819.316104] [] __driver_attach+0xab/0xb0
    [ 819.322337] [] ? driver_probe_device+0xb0/0xb0
    [ 819.329151] [] bus_for_each_dev+0x56/0x90
    [ 819.335489] [] driver_attach+0x1e/0x20
    [ 819.341534] [] bus_add_driver+0x1b0/0x2a0
    [ 819.347884] [] ? 0xffffffffa0346fff
    [ 819.353641] [] driver_register+0x76/0x140
    [ 819.359980] [] ? printk+0x51/0x53
    [ 819.365524] [] ? 0xffffffffa0346fff
    [ 819.371291] [] __pci_register_driver+0x56/0xd0
    [ 819.378096] [] sbridge_init+0x54/0x1000 [sb_edac]
    [ 819.385231] [] do_one_initcall+0x3f/0x170
    [ 819.391577] [] sys_init_module+0xbe/0x230
    [ 819.397926] [] system_call_fastpath+0x16/0x1b
    [ 819.404633] ---[ end trace 1654fdd39556689f ]---

    This happens because the bus is not being properly initialized.
    Instead of putting the memory sub-devices inside the memory controller,
    it is putting everything under the same directory:

    $ tree /sys/bus/edac/
    /sys/bus/edac/
    ├── devices
    │ ├── all_channel_counts -> ../../../devices/system/edac/mc/mc0/all_channel_counts
    │ ├── csrow0 -> ../../../devices/system/edac/mc/mc0/csrow0
    │ ├── csrow1 -> ../../../devices/system/edac/mc/mc0/csrow1
    │ ├── csrow2 -> ../../../devices/system/edac/mc/mc0/csrow2
    │ ├── dimm0 -> ../../../devices/system/edac/mc/mc0/dimm0
    │ ├── dimm1 -> ../../../devices/system/edac/mc/mc0/dimm1
    │ ├── dimm3 -> ../../../devices/system/edac/mc/mc0/dimm3
    │ ├── dimm6 -> ../../../devices/system/edac/mc/mc0/dimm6
    │ ├── inject_addrmatch -> ../../../devices/system/edac/mc/mc0/inject_addrmatch
    │ ├── mc -> ../../../devices/system/edac/mc
    │ └── mc0 -> ../../../devices/system/edac/mc/mc0
    ├── drivers
    ├── drivers_autoprobe
    ├── drivers_probe
    └── uevent

    On a multi-memory controller system, the names "csrow%d" and "dimm%d"
    should be under "mc%d", and not at the main hierarchy level.

    So, we need to create a per-MC bus, so that each MC has its own namespace.

    Reviewed-by: Aristeu Rozanski
    Cc: Doug Thompson
    Cc: Greg K H
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
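
The duplicate-name failure in the trace above can be modeled with a toy namespace: registering "dimm0" for two controllers collides when both share one flat list, and succeeds when each controller has its own. This is a hypothetical model (struct ns, ns_add, NS_MAX are invented for illustration), not how sysfs is implemented.

```c
#include <string.h>

#define NS_MAX 16

/* Hypothetical model of one bus namespace: a flat list of device names. */
struct ns {
	const char *names[NS_MAX];
	int count;
};

/* Returns 0 on success, -1 if the name already exists (the
 * "cannot create duplicate filename" case) or the namespace is full. */
static int ns_add(struct ns *ns, const char *name)
{
	for (int i = 0; i < ns->count; i++)
		if (strcmp(ns->names[i], name) == 0)
			return -1;
	if (ns->count == NS_MAX)
		return -1;
	ns->names[ns->count++] = name;
	return 0;
}
```

With one shared namespace, the second "dimm0" (from mc1) is rejected; with one namespace per controller, both registrations succeed.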
     

11 Jun, 2012

2 commits

  • No functional changes. Just comment improvements.

    Reviewed-by: Aristeu Rozanski
    Cc: Doug Thompson
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • As EDAC doesn't use struct device itself, it created a parent dev
    pointer called "pdev". Now that we'll be converting it to use
    struct device instead of struct sysdev, this needs to be fixed.

    No functional changes.

    Reviewed-by: Aristeu Rozanski
    Acked-by: Chris Metcalf
    Cc: Doug Thompson
    Cc: Borislav Petkov
    Cc: Mark Gross
    Cc: Jason Uhlenkott
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Joe Perches
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: "Niklas Söderlund"
    Cc: Shaohui Xie
    Cc: Josh Boyer
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

29 May, 2012

6 commits

  • Until userspace fills the DIMM labels, store the DIMM location there,
    as described by the memory model in use. This may match what
    dmidecode reports, making it easier for people to identify the
    memory.

    For example, on an Intel motherboard where the DMI table is reliable,
    the first memory stick is described as:

    Memory Device
    Array Handle: 0x0029
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 2048 MB
    Form Factor: DIMM
    Set: 1
    Locator: A1_DIMM0
    Bank Locator: A1_Node0_Channel0_Dimm0
    Type:
    Type Detail: Synchronous
    Speed: 800 MHz
    Manufacturer: A1_Manufacturer0
    Serial Number: A1_SerNum0
    Asset Tag: A1_AssetTagNum0
    Part Number: A1_PartNum0

    The memory named as "A1_DIMM0" is physically located at the first
    memory controller (node 0), at channel 0, dimm slot 0.

    After this patch, the memory label will be filled with:
    /sys/devices/system/edac/mc/csrow0/ch0_dimm_label:mc#0channel#0slot#0

    And (after the new EDAC API patches) as:
    /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0

    So, even if the memory label is not initialized in userspace, useful
    information about the error location is filled in there, especially since
    several systems/motherboards provide enough info to map from
    channel/slot (or branch/channel/slot) to the DIMM label. So, letting the
    EDAC core fill it by default is a good thing.

    Note that, as the label filling happens in
    edac_mc_alloc(), drivers can override it to better describe the memories
    (and some actually do).

    Cc: Aristeu Rozanski
    Cc: Doug Thompson
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
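
A minimal sketch of how a default label such as "mc#0channel#0slot#0" can be assembled from the controller number and the DIMM's position in each layer. The layer-name table, enum, and function name (default_dimm_label) are hypothetical; the only thing taken from the text above is the resulting label format.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical layer types and their printable names. */
enum sketch_layer { SK_BRANCH, SK_CHANNEL, SK_SLOT, SK_CSROW };

static const char *const layer_name[] = {
	[SK_BRANCH]  = "branch",
	[SK_CHANNEL] = "channel",
	[SK_SLOT]    = "slot",
	[SK_CSROW]   = "csrow",
};

/* Write "mc#<mc>" followed by "<layer>#<pos>" for each layer into buf.
 * Returns the untruncated length, like snprintf, or a negative value on error. */
static int default_dimm_label(char *buf, size_t len, unsigned int mc,
			      const enum sketch_layer *layer_type,
			      const unsigned int *pos, size_t n_layers)
{
	int total = snprintf(buf, len, "mc#%u", mc);

	for (size_t i = 0; i < n_layers && total >= 0; i++) {
		/* Append after what fits so far; a size of 0 writes nothing. */
		size_t used = (size_t)total < len ? (size_t)total : len;
		int n = snprintf(buf + used, len - used, "%s#%u",
				 layer_name[layer_type[i]], pos[i]);

		if (n < 0)
			return n;
		total += n;
	}
	return total;
}
```

For a channel/slot hierarchy at channel 0, slot 0 of controller 0, this produces the label shown above, which a driver remains free to overwrite.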
     
  • Change the EDAC internal representation to work with non-csrow
    based memory controllers.

    There are lots of those memory controllers nowadays, and more
    are coming. So, the EDAC internal representation needs to be
    changed, in order to work with those memory controllers, while
    preserving backward compatibility with the old ones.

    The edac core was written with the idea that memory controllers
    are able to directly access csrows.

    This is not true for FB-DIMM and RAMBUS memory controllers.

    Also, some recent advanced memory controllers don't present a per-csrow
    view. Instead, they see memory as DIMMs rather than as ranks.

    So, change the allocation and error report routines to allow
    them to work with all types of architectures.

    This will allow the removal of several hacks with FB-DIMM and RAMBUS
    memory controllers.

    Also, several tests were done on different platforms using different
    x86 drivers.

    TODO: multi-rank DIMMs are currently represented by multiple DIMM
    entries in struct dimm_info. That means that changing a label for one
    rank won't change the same label for the other ranks of the same DIMM.
    This bug has been present since the beginning of EDAC, so it is not a big
    deal. However, on several drivers it is possible to fix this issue, but
    it should be a per-driver fix, as the csrow => DIMM arrangement may not
    be the same for all. So, don't try to fix it here yet.

    I tried to make this patch as short as possible, preceding it with
    several other patches that simplified the logic here. Yet, as the
    internal API changes, all drivers need changes. The changes are
    generally bigger in the drivers for FB-DIMMs.

    Cc: Aristeu Rozanski
    Cc: Doug Thompson
    Cc: Borislav Petkov
    Cc: Mark Gross
    Cc: Jason Uhlenkott
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Chris Metcalf
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Joe Perches
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: "Niklas Söderlund"
    Cc: Shaohui Xie
    Cc: Josh Boyer
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • The edac core was written with the idea that memory controllers
    are able to directly access csrows, and that the channels are
    used inside a csrow select.

    This is not true for FB-DIMM and RAMBUS memory controllers.

    Also, some recent advanced memory controllers don't present a per-csrow
    view. Instead, they see memory as DIMMs rather than as ranks accessed
    via csrow/channel.

    So, changes are needed in order to allow the EDAC core to
    work with all types of architectures.

    In preparation for handling non-csrows based memory controllers,
    add some memory structs and a macro:

    enum hw_event_mc_err_type: describes the type of error
    (corrected, uncorrected, fatal). To be used by the new
    edac_mc_handle_error function.

    enum edac_mc_layer_type: describes the type of a given memory
    architecture layer (branch, channel, slot, csrow).

    struct edac_mc_layer: describes the properties of a memory
    layer (type, size, and whether the layer
    will be used on a virtual csrow).

    EDAC_DIMM_PTR() - as the number of layers can vary from 1 to 3,
    this macro converts from an address with up to 3 layers into
    a linear address.

    Reviewed-by: Borislav Petkov
    Cc: Doug Thompson
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
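
The conversion from an up-to-3-layer position to a linear address can be sketched as row-major indexing, where each layer's size scales the index of the layer above it. This is an illustration of the idea behind EDAC_DIMM_PTR(), not the macro itself; the type and function names (layer_type_sketch, mc_layer_sketch, dimm_offset) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified layer description. */
enum layer_type_sketch { LT_BRANCH, LT_CHANNEL, LT_SLOT, LT_CSROW };

struct mc_layer_sketch {
	enum layer_type_sketch type;
	unsigned int size;       /* number of elements in this layer */
	bool is_virt_csrow;      /* whether the layer backs a virtual csrow */
};

/* Row-major linearization of (l0, l1, l2); positions for unused
 * trailing layers are ignored. */
static unsigned int dimm_offset(const struct mc_layer_sketch *layers,
				unsigned int n_layers,
				unsigned int l0, unsigned int l1, unsigned int l2)
{
	assert(n_layers >= 1 && n_layers <= 3);

	unsigned int off = l0;
	if (n_layers > 1)
		off = off * layers[1].size + l1;
	if (n_layers > 2)
		off = off * layers[2].size + l2;
	return off;
}
```

For a 2 branch x 2 channel x 3 slot hierarchy, position (1, 1, 2) maps to (1 * 2 + 1) * 3 + 2 = 11, the last of 12 DIMM entries.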
     
  • The number of pages is a dimm property. Move it to the dimm struct.

    After this change, it is possible to add sysfs nodes for the DIMMs that
    will properly represent the DIMM stick properties, including its size.

    A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when
    the memory controller represents the memory via chip select rows.

    Reviewed-by: Aristeu Rozanski
    Acked-by: Borislav Petkov
    Acked-by: Chris Metcalf
    Cc: Doug Thompson
    Cc: Mark Gross
    Cc: Jason Uhlenkott
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Joe Perches
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: "Niklas Söderlund"
    Cc: Shaohui Xie
    Cc: Josh Boyer
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • On systems based on chip select rows, all channels need to use memories
    with the same properties, otherwise the memories on channels A and B
    won't be recognized.

    However, such an assumption is not true for all types of memory
    controllers.

    Controllers for FB-DIMMs don't have such requirements.

    Also, modern Intel controllers seem to be capable of handling such
    differences.

    So, we need to stop storing the DIMM information in per-csrow
    data and store it at the right place instead.

    The first step is to move grain, mtype, dtype and edac_mode to the
    per-dimm struct.

    Reviewed-by: Aristeu Rozanski
    Reviewed-by: Borislav Petkov
    Acked-by: Chris Metcalf
    Cc: Doug Thompson
    Cc: Borislav Petkov
    Cc: Mark Gross
    Cc: Jason Uhlenkott
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Joe Perches
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: James Bottomley
    Cc: "Niklas Söderlund"
    Cc: Shaohui Xie
    Cc: Josh Boyer
    Cc: Mike Williams
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
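
Once grain, mtype, dtype and edac_mode live per DIMM, each channel can describe its own memory, and the old "all channels identical" rule becomes something a csrow-based driver can check rather than something the core assumes. The struct, enum values, and function below (dimm_props, mem_type_sketch, channels_identical) are hypothetical illustrations.

```c
#include <stdbool.h>

/* Hypothetical memory types; the real enum has many more entries. */
enum mem_type_sketch { MEM_DDR2_SKETCH, MEM_DDR3_SKETCH, MEM_FB_DDR2_SKETCH };

/* Hypothetical per-DIMM properties mirroring the fields moved by this commit. */
struct dimm_props {
	unsigned int grain;          /* error reporting granularity, in bytes */
	enum mem_type_sketch mtype;  /* memory type */
	unsigned int dtype;          /* device type, e.g. x4 or x8 width */
	unsigned int edac_mode;      /* ECC mode */
};

/* True if every channel carries the same properties as channel 0. */
static bool channels_identical(const struct dimm_props *chan, unsigned int n)
{
	for (unsigned int i = 1; i < n; i++)
		if (chan[i].grain != chan[0].grain ||
		    chan[i].mtype != chan[0].mtype ||
		    chan[i].dtype != chan[0].dtype ||
		    chan[i].edac_mode != chan[0].edac_mode)
			return false;
	return true;
}
```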
     
  • The way a DIMM is currently represented implies that it is
    linked into a per-csrow struct. However, some drivers don't see
    csrows, as they're hidden behind a chip like the AMB
    on FB-DIMMs, for example.

    This forced drivers to fake^Wvirtualize a csrow struct, and to create
    a mess on top of the original csrow/channel concept.

    Move the DIMM labels into a per-DIMM struct, and add there
    the real location of the socket, in terms of csrow/channel.
    Later patches will modify the location to properly represent the
    memory architecture.

    All other drivers will use a per-csrow type of location.
    Some of those drivers will require a later conversion, as
    they also fake the csrows internally.

    TODO: While this patch doesn't change the existing behavior, on
    csrow-based memory controllers, a csrow/channel pair points to a memory
    rank. There's a known bug in the EDAC core that allows having different
    labels for the same DIMM if it has more than one rank. A later patch
    is needed to merge the several ranks for a DIMM into the same dimm_info
    struct, in order to avoid having different labels for the same DIMM.

    edac_mc_alloc() will now contain a per-dimm initialization loop that
    will be changed by later patches in order to match other types of
    memory architectures.

    Reviewed-by: Aristeu Rozanski
    Reviewed-by: Borislav Petkov
    Cc: Doug Thompson
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: "Niklas Söderlund"
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

29 Mar, 2012

1 commit

  • Pull EDAC fixes from Mauro Carvalho Chehab:
    "A series of EDAC driver fixes. It also has one core fix at the
    documentation, and a rename patch, fixing the name of the struct that
    contains the rank information."

    * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
    edac: rename channel_info to rank_info
    i5400_edac: Avoid calling pci_put_device() twice
    edac: i5100 ack error detection register after each read
    edac: i5100 fix erroneous define for M1Err
    edac: sb_edac: Fix a wrong value setting for the previous value
    edac: sb_edac: Fix a INTERLEAVE_MODE() misuse
    edac: sb_edac: Let the driver depend on PCI_MMCONFIG
    edac: Improve the comments to better describe the memory concepts
    edac/ppc4xx_edac: Fix compilation
    Fix sb_edac compilation with 32 bits kernels

    Linus Torvalds
     

22 Mar, 2012

2 commits

  • What a csrow/channel vector points to is rank information, and
    not channel information.

    On a traditional architecture, the memory controller directly accesses the
    memory ranks via chip select rows. Different ranks on the same DIMM are
    selected via different chip select rows. So, typically, each
    csrow/channel pair maps to a different DIMM.

    On FB-DIMMs, there's a microcontroller chip at the DIMM, called Advanced
    Memory Buffer (AMB) that serves as the interface between the memory
    controller and the memory chips.

    The AMB selection is via the DIMM slot, and not via a csrow.

    It is up to the AMB to talk with the csrows of the DRAM chips.

    So, the FB-DIMM memory controllers see the DIMM slot, and not the DIMM
    rank. RAMBUS is similar.

    Newer memory controllers, like the ones found on Intel Sandy Bridge and
    Nehalem, even when working with normal DDR3 DIMMs, don't use the usual
    channel A/channel B interleaving scheme to provide 128-bit data access.

    Instead, they have more channels (3 or 4 channels), and they can use
    several interleaving schemes. Such memory controllers see the DIMMs
    directly in their registers, instead of the ranks, which is better for
    the driver, as its main purpose is to point to a broken DIMM stick (the
    Field Replaceable Unit), and not to a broken DRAM chip.

    The drivers that support such newer memory architecture models
    currently need to fake information and to abuse EDAC structures, as
    the subsystem was conceived with the idea that the csrow would always be
    visible to the CPU.

    To make things a little worse, those drivers don't currently fake
    csrows/channels in a consistent way, as the concepts there don't apply
    to the memory controllers they're talking to. So, each driver author
    interpreted the concepts using a different logic.

    In order to fix it, let's rename the data structure that points to a
    DIMM rank to "rank_info", in order to be clearer about what's stored
    there.

    Later patches will provide a better way to represent the memory
    hierarchy for the other types of memory controller.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Computer memory terminology has changed over time since EDAC was
    originally written: new concepts were introduced, and some things have
    different meanings, depending on the memory architecture.

    Improve the definition of all related terms.

    Also, describe each memory type in a more detailed fashion.

    No functional changes. Just comments were touched.

    Acked-by: Borislav Petkov
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

16 Mar, 2012

1 commit

  • The header includes a lot of stuff, and
    it in turn gets a lot of use just for the basic "struct device"
    which appears so often.

    Clean up the users as follows:

    1) For those headers only needing "struct device" as a pointer
    in fcn args, replace the include with exactly that.

    2) For headers not really using anything from device.h, simply
    delete the include altogether.

    3) For headers relying on getting device.h implicitly before
    being included themselves, now explicitly include device.h

    4) For files in which doing #1 or #2 uncovers an implicit
    dependency on some other header, fix by explicitly adding
    the required header(s).

    Any C files that were implicitly relying on device.h to be
    present have already been dealt with in advance.

    Total removals from #1 and #2: 51. Total additions coming
    from #3: 9. Total other implicit dependencies from #4: 7.

    As of 3.3-rc1, there were 110, so a net removal of 42 gives
    about a 38% reduction in device.h presence in include/*

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
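
Point 1 above relies on a standard C rule: a header that only passes "struct device" around by pointer can use an incomplete (forward-declared) type instead of including the full definition. A minimal self-contained sketch; the function example_register and the tiny struct body are hypothetical stand-ins for what a real header and .c file would contain.

```c
/* --- What a slimmed-down header could contain (hypothetical example) --- */
struct device;                             /* forward declaration: no include needed */
int example_register(struct device *dev);  /* only pointers to the type are used */

/* --- A .c file that really needs the full definition --- */
/* In the kernel this would come from including the device header; a tiny
 * stand-in definition keeps the sketch self-contained. */
struct device {
	const char *name;
};

int example_register(struct device *dev)
{
	return (dev && dev->name) ? 0 : -1;
}
```

Only code that dereferences the pointer or needs sizeof(struct device) has to see the complete definition, which is why the removals in #1 are safe once the implicit dependencies in #4 are made explicit.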
     

15 Dec, 2011

1 commit


01 Nov, 2011

1 commit


27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

21 Oct, 2010

1 commit


29 Apr, 2008

1 commit

  • I implemented opstate_init() as an inline function in linux/edac.h.

    Added calls to opstate_init() in:
    i82443bxgx_edac.c
    i82860_edac.c
    i82875p_edac.c
    i82975x_edac.c

    I wrote a fixed version of
    edac-fix-module-initialization-on-several-modules.patch,
    and tested building 2.6.25-rc7 with it applied. It succeeded.
    I think the patch is now correct.

    Cc: Alan Cox
    Signed-off-by: Hitoshi Mitake
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hitoshi Mitake
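
The commit message does not show the body of opstate_init(). The sketch below is a guess at what such an initializer does, assuming (as the NMI commit further down suggests) that the error detection mode is chosen through a module parameter and should fall back to polling when the value is not a recognized mode. All names (op_state, OPSTATE_*, opstate_init_sketch) are hypothetical.

```c
/* Hypothetical operating-state values. */
enum { OPSTATE_INVAL = -1, OPSTATE_POLL, OPSTATE_NMI };

/* Would be a module parameter in a real driver; a plain global here. */
static int op_state = OPSTATE_INVAL;

/* Keep a valid user-selected mode, otherwise fall back to polling. */
static inline void opstate_init_sketch(void)
{
	switch (op_state) {
	case OPSTATE_POLL:
	case OPSTATE_NMI:
		break;
	default:
		op_state = OPSTATE_POLL;
	}
}
```

Calling it once from each driver's init path, as the list above describes, gives every module the same defaulting behavior.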
     

20 Jul, 2007

2 commits

  • Change error check and clear variable from an atomic to an int

    Signed-off-by: Dave Jiang
    Signed-off-by: Douglas Thompson
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • Provides a way for NMI-reported errors on x86 to notify the EDAC
    subsystem of pending ECC errors by writing to a software state variable.

    Here's the reworked patch. I added an EDAC stub to the kernel so we can
    have variables that are in the kernel even if EDAC is a module. I also
    implemented the idea of using the chip driver to select error detection
    mode via module parameter and eliminate the kernel compile option.
    Please review/test. Thx!

    Also, I only made changes to some of the chipset drivers since I am
    unfamiliar with the other ones. We can add similar changes as we go.

    Signed-off-by: Dave Jiang
    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
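
The NMI-to-EDAC handoff described above follows a common pattern: the asynchronous handler only sets a plain int-sized flag, and the regular polling path later checks and clears it. A userspace analogy, not kernel code: a signal handler stands in for the NMI handler, and sig_atomic_t stands in for the software state variable. The names fake_nmi_handler and check_and_clear_pending are hypothetical.

```c
#include <signal.h>

/* Plain int-sized flag, safe to write from a signal handler. */
static volatile sig_atomic_t err_pending;

/* Stand-in for the NMI handler: only record that an error happened. */
static void fake_nmi_handler(int sig)
{
	(void)sig;
	err_pending = 1;
}

/* Called from the poller: report and clear the pending flag.
 * An event arriving between the read and the clear is folded into this one. */
static int check_and_clear_pending(void)
{
	if (!err_pending)
		return 0;
	err_pending = 0;
	return 1;
}
```

Keeping the handler's work to a single store is what makes a plain int sufficient here, which matches the later change from an atomic to an int.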