29 Jul, 2013

1 commit

  • commit 88d84ac97378c2f1d5fec9af1e8b7d9a662d6b00 upstream.

    Fix the following:

    BUG: key ffff88043bdd0330 not in .data!
    ------------[ cut here ]------------
    WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
    DEBUG_LOCKS_WARN_ON(1)
    Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
    CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
    Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
    0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
    ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
    ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
    Call Trace:
    dump_stack
    warn_slowpath_common
    warn_slowpath_fmt
    lockdep_init_map
    ? trace_hardirqs_on_caller
    ? trace_hardirqs_on
    debug_mutex_init
    __mutex_init
    bus_register
    edac_create_sysfs_mci_device
    edac_mc_add_mc
    sbridge_probe
    pci_device_probe
    driver_probe_device
    __driver_attach
    ? driver_probe_device
    bus_for_each_dev
    driver_attach
    bus_add_driver
    driver_register
    __pci_register_driver
    ? 0xffffffffa0010fff
    sbridge_init
    ? 0xffffffffa0010fff
    do_one_initcall
    load_module
    ? unset_module_init_ro_nx
    SyS_init_module
    tracesys
    ---[ end trace d24a70b0d3ddf733 ]---
    EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
    EDAC sbridge: Driver loaded.

    What happens is that bus_register needs a statically allocated lock_key
    because the latter is handed over to lockdep. However, struct mem_ctl_info
    embeds struct bus_type (the whole struct, not a pointer to it) and the
    whole thing gets dynamically allocated.

    Fix this by using a statically allocated struct bus_type for the MC bus.
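
    The difference between the two layouts can be modeled in userspace. This
    is only a rough sketch, not kernel code: struct lock_key, struct bus,
    mc_bus[] and key_is_static() are hypothetical stand-ins, and the range
    check merely imitates the idea of lockdep testing whether a key lives in
    static storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the lock key that lockdep receives. */
struct lock_key { int dummy; };

/* Hypothetical stand-in for struct bus_type with its embedded key. */
struct bus { const char *name; struct lock_key key; };

/* Model of static storage: one statically allocated bus per controller. */
#define MAX_MCS 4
struct bus mc_bus[MAX_MCS];

/* Model of the lockdep check: does the key live inside the static array? */
bool key_is_static(const struct lock_key *key)
{
	assert(key != NULL);
	uintptr_t p = (uintptr_t)key;
	uintptr_t lo = (uintptr_t)&mc_bus[0];
	uintptr_t hi = (uintptr_t)&mc_bus[MAX_MCS];

	return p >= lo && p < hi;
}

/* Before the fix: the bus is embedded in a heap-allocated controller. */
struct mci_old { int mc_idx; struct bus bus; };

/* After the fix: the controller only points at a static bus. */
struct mci_new { int mc_idx; struct bus *bus; };
```

    In this model, a key reached through a malloc()ed struct mci_old fails the
    check, while a struct mci_new pointing at mc_bus[0] passes it.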

    Signed-off-by: Borislav Petkov
    Acked-by: Mauro Carvalho Chehab
    Cc: Markus Trippelsdorf
    Signed-off-by: Tony Luck
    Signed-off-by: Greg Kroah-Hartman

    Borislav Petkov
     

16 Mar, 2013

1 commit

  • Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
    memory controller is csrows based. Merge both fields into one.

    There's no need for the driver to actually fill it, as the core detects
    it by checking if one of the layers has the csrows type as part of the
    memory hierarchy:

    if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
        per_rank = true;
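
    As a self-contained illustration, the check above can be wrapped in a
    small userspace function. The enum names follow the kernel's layer types,
    but struct edac_mc_layer is reduced here and layers_are_per_rank() is a
    hypothetical helper, so treat this as a sketch rather than the core's
    actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Layer types; names mirror the kernel's enum edac_mc_layer_type. */
enum edac_mc_layer_type {
	EDAC_MC_LAYER_BRANCH,
	EDAC_MC_LAYER_CHANNEL,
	EDAC_MC_LAYER_SLOT,
	EDAC_MC_LAYER_CHIP_SELECT,
};

/* Reduced stand-in for struct edac_mc_layer. */
struct edac_mc_layer {
	enum edac_mc_layer_type type;
	unsigned int size;
};

/* Hypothetical helper: true if any layer is a chip select (csrow) layer. */
bool layers_are_per_rank(const struct edac_mc_layer *layers, size_t n_layers)
{
	bool per_rank = false;

	assert(layers != NULL || n_layers == 0);
	for (size_t i = 0; i < n_layers; i++)
		if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
			per_rank = true;
	return per_rank;
}
```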

    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Borislav Petkov

    Mauro Carvalho Chehab
     

22 Feb, 2013

2 commits


21 Feb, 2013

3 commits

  • APEI GHES and i7core_edac/sb_edac currently can be loaded at
    the same time, but those are Highlander modules:
    "There can be only one".

    There are two reasons for that:

    1) Each driver assumes that it is the only one registering with
    the EDAC core, as it is the driver's responsibility to number
    the memory controllers, and all of them start from 0;

    2) If the BIOS is handling the memory errors, the OS can't also be
    doing it, as each would interfere with the other.

    So, we need to add a module owner lock to the EDAC core,
    in order to avoid having two different modules handling memory
    errors at the same time. The best way to implement this lock seems
    to be to use the driver's name, as it is unique and won't require
    changes to every driver.
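
    A name-based owner lock of this kind might look roughly like the
    userspace sketch below. The names edac_owner, edac_claim_owner() and
    edac_release_owner() are assumptions made for illustration; they are not
    taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Name of the module that currently owns error handling (hypothetical). */
static const char *edac_owner;

/*
 * Claim ownership for mod_name. The same module may claim repeatedly
 * (one call per memory controller); a different module gets -EBUSY.
 * The string is not copied, so it must outlive the ownership.
 */
int edac_claim_owner(const char *mod_name)
{
	assert(mod_name != NULL);
	if (edac_owner && strcmp(edac_owner, mod_name) != 0)
		return -EBUSY;
	edac_owner = mod_name;
	return 0;
}

/* Drop ownership, e.g. when the owning module unloads. */
void edac_release_owner(void)
{
	edac_owner = NULL;
}
```

    Comparing names rather than module pointers keeps the check inside the
    core, which matches the stated goal of not touching every driver.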

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • There are some cases where the memory controller layout is
    completely hidden. This is the case of firmware-driven error
    code, like the one provided by GHES. Add a new layer to be
    used on such memory error report mechanisms.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Linux 3.8-rc7

    * tag 'v3.8-rc7': (12052 commits)
    Linux 3.8-rc7
    net: sctp: sctp_endpoint_free: zero out secret key data
    net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
    atm/iphase: rename fregt_t -> ffreg_t
    ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned
    ARM: DMA mapping: fix bad atomic test
    ARM: realview: ensure that we have sufficient IRQs available
    ARM: GIC: fix GIC cpumask initialization
    net: usb: fix regression from FLAG_NOARP code
    l2tp: dont play with skb->truesize
    net: sctp: sctp_auth_key_put: use kzfree instead of kfree
    netback: correct netbk_tx_err to handle wrap around.
    xen/netback: free already allocated memory on failure in xen_netbk_get_requests
    xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
    xen/netback: shutdown the ring if it contains garbage.
    drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try
    virtio_console: Don't access uninitialized data.
    net: qmi_wwan: add more Huawei devices, including E320
    net: cdc_ncm: add another Huawei vendor specific device
    ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
    ...

    Mauro Carvalho Chehab
     

30 Jan, 2013

1 commit


21 Dec, 2012

1 commit


12 Dec, 2012

1 commit

  • Pull EDAC fixes from Borislav Petkov:

    - EDAC core error path fix, from Denis Kirjanov.

    - Generalization of AMD MCE bank names and some minor error reporting
    improvements.

    - EDAC core cleanups and simplifications, from Wei Yongjun.

    - amd64_edac fixes for sysfs-reported values, from Josh Hunt.

    - some heavy amd64_edac error reporting path shaving, leading to
    removing a bunch of code.

    - amd64_edac error injection method improvements.

    - EDAC core cleanups and fixes

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (24 commits)
    EDAC, pci_sysfs: Use for_each_pci_dev to simplify the code
    EDAC: Handle error path in edac_mc_sysfs_init() properly
    MCE, AMD: Dump error status
    MCE, AMD: Report decoded error type first
    MCE, AMD: Dump CPU f/m/s triple with the error
    MCE, AMD: Remove functional unit references
    EDAC: Convert to use simple_open()
    EDAC, Calxeda highbank: Convert to use simple_open()
    EDAC: Fix mc size reported in sysfs
    EDAC: Fix csrow size reported in sysfs
    EDAC: Pass mci parent
    EDAC: Add memory controller flags
    amd64_edac: Fix csrows size and pages computation
    amd64_edac: Use DBAM_DIMM macro
    amd64_edac: Fix K8 chip select reporting
    amd64_edac: Reorganize error reporting path
    amd64_edac: Do not check whether error address is valid
    amd64_edac: Improve error injection
    amd64_edac: Cleanup error injection code
    amd64_edac: Small fixlets and cleanups
    ...

    Linus Torvalds
     

04 Dec, 2012

1 commit


28 Nov, 2012

2 commits


25 Oct, 2012

1 commit

  • The driver is currently filling data in the wrong way, on drivers
    for csrows-based memory controllers, when the first layer is a
    csrow.

    This is not easy to notice because, in general, memories are
    filled in dual, interleaved, symmetric mode, as very few memory
    controllers support asymmetric modes.

    While digging into a bug in the i82975x_edac driver, the asymmetric
    mode there is now working, allowing us to fill the machine with
    4x1GB ranks at channel 0, and 2x512MB at channel 1:

    Channel 0 ranks:
    EDAC DEBUG: i82975x_init_csrows: DIMM A0: from page 0x00000000 to 0x0003ffff (size: 0x00040000 pages)
    EDAC DEBUG: i82975x_init_csrows: DIMM A1: from page 0x00040000 to 0x0007ffff (size: 0x00040000 pages)
    EDAC DEBUG: i82975x_init_csrows: DIMM A2: from page 0x00080000 to 0x000bffff (size: 0x00040000 pages)
    EDAC DEBUG: i82975x_init_csrows: DIMM A3: from page 0x000c0000 to 0x000fffff (size: 0x00040000 pages)

    Channel 1 ranks:
    EDAC DEBUG: i82975x_init_csrows: DIMM B0: from page 0x00100000 to 0x0011ffff (size: 0x00020000 pages)
    EDAC DEBUG: i82975x_init_csrows: DIMM B1: from page 0x00120000 to 0x0013ffff (size: 0x00020000 pages)

    Instead of properly showing the memories as such, before this patch, it
    shows the memory layout as:

              +-----------------------------------+
              |                mc0                |
              |  csrow0   |  csrow1   |  csrow2   |
    ----------+-----------------------------------+
    channel1: |  1024 MB  |  1024 MB  |   512 MB  |
    channel0: |  1024 MB  |  1024 MB  |   512 MB  |
    ----------+-----------------------------------+

    as if both channels were symmetric, grouping the DIMMs in the wrong
    layout.

    After this patch, the memory is correctly represented.
    So, for csrows at layers[0], it shows:

              +-----------------------------------------------+
              |                      mc0                      |
              |  csrow0   |  csrow1   |  csrow2   |  csrow3   |
    ----------+-----------------------------------------------+
    channel1: |   512 MB  |   512 MB  |     0 MB  |     0 MB  |
    channel0: |  1024 MB  |  1024 MB  |  1024 MB  |  1024 MB  |
    ----------+-----------------------------------------------+

    For csrows at layers[1], it shows:

            +-----------------------+
            |          mc0          |
            | channel0  | channel1  |
    --------+-----------------------+
    csrow3: |  1024 MB  |     0 MB  |
    csrow2: |  1024 MB  |     0 MB  |
    --------+-----------------------+
    csrow1: |  1024 MB  |   512 MB  |
    csrow0: |  1024 MB  |   512 MB  |
    --------+-----------------------+

    So, regardless of which layer comes first, the mapping between
    channel and csrow will be properly represented.
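
    The numbers above can be reproduced with a small userspace model. It
    assumes 4 KiB pages and that the last layer varies fastest when walking
    the flat DIMM array; pages_to_mib(), struct pos and dimm_pos() are
    hypothetical names, and this is a sketch of the idea rather than the code
    in edac_mc_alloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Convert a page count, as printed in the debug log, to MiB. */
uint64_t pages_to_mib(uint64_t nr_pages)
{
	return (nr_pages << MODEL_PAGE_SHIFT) >> 20;
}

struct pos { unsigned int row, chn; };

/*
 * Map a flat DIMM index to (csrow, channel). Which coordinate varies
 * fastest depends on whether the csrow layer is layers[0] or layers[1].
 */
struct pos dimm_pos(unsigned int idx, bool csrow_first,
		    unsigned int tot_csrows, unsigned int tot_channels)
{
	struct pos p;

	assert(tot_csrows > 0 && tot_channels > 0);
	if (csrow_first) {		/* layers = { csrow, channel } */
		p.row = idx / tot_channels;
		p.chn = idx % tot_channels;
	} else {			/* layers = { channel, csrow } */
		p.chn = idx / tot_csrows;
		p.row = idx % tot_csrows;
	}
	return p;
}
```

    With 4 csrows and 2 channels, flat index 5 lands on csrow2/channel1 when
    csrows come first and on csrow1/channel1 when channels come first, so the
    two orderings must not share one mapping.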

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

03 Oct, 2012

1 commit

  • Pull workqueue changes from Tejun Heo:
    "This is workqueue updates for v3.7-rc1. A lot of activities this
    round including considerable API and behavior cleanups.

    * delayed_work combines a timer and a work item. The handling of the
    timer part has always been a bit clunky leading to confusing
    cancelation API with weird corner-case behaviors. delayed_work is
    updated to use new IRQ safe timer and cancelation now works as
    expected.

    * Another deficiency of delayed_work was lack of the counterpart of
    mod_timer() which led to cancel+queue combinations or open-coded
    timer+work usages. mod_delayed_work[_on]() are added.

    These two delayed_work changes make delayed_work provide interface
    and behave like timer which is executed with process context.

    * A work item could be executed concurrently on multiple CPUs, which
    is rather unintuitive and made flush_work() behavior confusing and
    half-broken under certain circumstances. This problem doesn't
    exist for non-reentrant workqueues. While non-reentrancy check
    isn't free, the overhead is incurred only when a work item bounces
    across different CPUs and even in simulated pathological scenario
    the overhead isn't too high.

    All workqueues are made non-reentrant. This removes the
    distinction between flush_[delayed_]work() and
    flush_[delayed_]work_sync(). The former is now as strong as the
    latter and the specified work item is guaranteed to have finished
    execution of any previous queueing on return.

    * In addition to the various bug fixes, Lai redid and simplified CPU
    hotplug handling significantly.

    * Joonsoo introduced system_highpri_wq and used it during CPU
    hotplug.

    There are two merge commits - one to pull in IRQ safe timer from
    tip/timers/core and the other to pull in CPU hotplug fixes from
    wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."

    Fixed a number of trivial conflicts, but the more interesting conflicts
    were silent ones where the deprecated interfaces had been used by new
    code in the merge window, and thus didn't cause any real data conflicts.

    Tejun pointed out a few of them, I fixed a couple more.

    * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
    workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
    workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
    workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
    workqueue: remove @delayed from cwq_dec_nr_in_flight()
    workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
    workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
    workqueue: use __cpuinit instead of __devinit for cpu callbacks
    workqueue: rename manager_mutex to assoc_mutex
    workqueue: WORKER_REBIND is no longer necessary for idle rebinding
    workqueue: WORKER_REBIND is no longer necessary for busy rebinding
    workqueue: reimplement idle worker rebinding
    workqueue: deprecate __cancel_delayed_work()
    workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
    workqueue: use mod_delayed_work() instead of __cancel + queue
    workqueue: use irqsafe timer for delayed_work
    workqueue: clean up delayed_work initializers and add missing one
    workqueue: make deferrable delayed_work initializer names consistent
    workqueue: cosmetic whitespace updates for macro definitions
    workqueue: deprecate system_nrt[_freezable]_wq
    workqueue: deprecate flush[_delayed]_work_sync()
    ...

    Linus Torvalds
     

24 Sep, 2012

2 commits

  • Fix potential NULL pointer dereference in edac_unregister_sysfs() on
    system boot introduced in 3.6-rc1.

    Since commit 7a623c039 ("edac: rewrite the sysfs code to use struct
    device") edac_mc_alloc() no longer initializes embedded kobjects in
    struct mem_ctl_info. Therefore edac_mc_free() can no longer simply
    decrement a kobject reference count to free the allocated memory unless
    the memory controller driver module had also called edac_mc_add_mc().

    Now edac_mc_free() will check if the newly embedded struct device has
    been registered with sysfs before using either the standard device
    release functions or freeing the data structures itself with logic
    pulled out of the error path of edac_mc_alloc().
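
    The decision described above can be sketched in userspace as follows.
    The dev_registered flag stands in for the kernel's
    device_is_registered() check, and struct model_mci, enum free_path and
    model_mc_free() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Reduced model of struct mem_ctl_info. */
struct model_mci {
	bool dev_registered;	/* stands in for device_is_registered() */
	int *csrows;		/* stands in for the nested allocations */
};

enum free_path { FREED_DIRECTLY, FREED_VIA_RELEASE };

/* Free a controller, choosing the path based on sysfs registration. */
enum free_path model_mc_free(struct model_mci *mci)
{
	assert(mci != NULL);
	if (!mci->dev_registered) {
		/* Never added to sysfs: free the data structures directly. */
		free(mci->csrows);
		free(mci);
		return FREED_DIRECTLY;
	}
	/*
	 * Registered: the kernel would unregister the device and let its
	 * release function free the memory; modeled here by freeing inline.
	 */
	mci->dev_registered = false;
	free(mci->csrows);
	free(mci);
	return FREED_VIA_RELEASE;
}
```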

    The BUG this patch resolves for me:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    EIP is at __wake_up_common+0x1a/0x6a
    Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 task.ti=f3dc6000)
    Call Trace:
    complete_all+0x3f/0x50
    device_pm_remove+0x23/0xa2
    device_del+0x34/0x142
    edac_unregister_sysfs+0x3b/0x5c [edac_core]
    edac_mc_free+0x29/0x2f [edac_core]
    e7xxx_probe1+0x268/0x311 [e7xxx_edac]
    e7xxx_init_one+0x56/0x61 [e7xxx_edac]
    local_pci_probe+0x13/0x15
    ...

    Cc: Mauro Carvalho Chehab
    Cc: Shaohui Xie
    Signed-off-by: Shaun Ruffell
    Signed-off-by: Linus Torvalds

    Shaun Ruffell
     
  • coccinelle warns about:

    + drivers/edac/edac_mc.c:429:9-23: ERROR: reference preceded by free on line 429

      421    if (mci->csrows) {
    > 422        for (chn = 0; chn < tot_channels; chn++) {
      423            csr = mci->csrows[chn];
      424            if (csr) {
    > 425                for (chn = 0; chn < tot_channels; chn++)
      426                    kfree(csr->channels[chn]);
      427                kfree(csr);
      428            }
    > 429            kfree(mci->csrows[i]);
      430        }
      431        kfree(mci->csrows);
      432    }

    and that code block seems to mess things up in several ways (double free, memory
    leak, out-of-bound reads etc.):

    L422: The iterator "chn" and bound "tot_channels" are totally wrong. They should be
    "row" and "tot_csrows" respectively. This means either a memory leak, or
    out-of-bound reads (which, if they do not trigger an immediate page fault
    error, will further lead to kfree() on random addresses).

    L425: The inner loop is reusing the same iterator "chn" as the outer loop,
    which could lead to a premature end of the outer loop, and hence a memory leak.

    L429: The array index 'i' in mci->csrows[i] is a temporary value used in
    previous loops, and won't change at all in the current loop. This
    means either an out-of-bound read and possibly kfree(random number), or the
    same mci->csrows[i] gets freed over and over, and possibly a double free
    for the kfree(csr) in L427.

    L426/L427: a kfree(csr->channels) is needed in between to avoid leaking the memory.

    The buggy code was introduced by commit de3910eb ("edac: change the mem
    allocation scheme to make Documentation/kobject.txt happy") in the 3.6-rc1
    merge window. Fix it by freeing up resources in this order:

    free csrows[i]->channels[j]
    free csrows[i]->channels
    free csrows[i]
    free csrows
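
    The free order listed above can be illustrated with a userspace sketch
    that uses free() in place of kfree(). The model_* structure and function
    names are hypothetical; the returned count of released objects exists
    only to make the behavior easy to check.

```c
#include <assert.h>
#include <stdlib.h>

struct model_channel { int chan_idx; };

struct model_csrow {
	unsigned int nr_channels;
	struct model_channel **channels;
};

struct model_mci {
	unsigned int nr_csrows;
	struct model_csrow **csrows;
};

/* Free innermost objects first; returns how many objects were released. */
unsigned int model_free_csrows(struct model_mci *mci)
{
	unsigned int freed = 0;

	assert(mci != NULL);
	if (!mci->csrows)
		return 0;

	for (unsigned int row = 0; row < mci->nr_csrows; row++) {
		struct model_csrow *csr = mci->csrows[row];

		if (!csr)
			continue;
		if (csr->channels) {
			/* A separate iterator avoids clobbering "row". */
			for (unsigned int chn = 0; chn < csr->nr_channels; chn++) {
				if (csr->channels[chn]) {
					free(csr->channels[chn]);	/* csrows[i]->channels[j] */
					freed++;
				}
			}
			free(csr->channels);			/* csrows[i]->channels */
			freed++;
		}
		free(csr);					/* csrows[i] */
		freed++;
	}
	free(mci->csrows);					/* csrows */
	mci->csrows = NULL;
	return freed + 1;
}
```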

    CC: Mauro Carvalho Chehab
    CC: Shaun Ruffell
    Signed-off-by: Fengguang Wu
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     

14 Aug, 2012

1 commit

  • Convert delayed_work users doing cancel_delayed_work() followed by
    queue_delayed_work() to mod_delayed_work().
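
    To make the difference concrete, here is a userspace model of the two
    patterns. It is only a sketch: struct model_dwork and the model_*()
    functions are hypothetical, and they imitate just the pending flag and
    expiry time, not real timers or workqueues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of a delayed work item: a pending flag plus an expiry time. */
struct model_dwork {
	bool pending;
	unsigned long expires;	/* absolute time, in ticks */
};

/* Old pattern, step 1: cancel. Returns true if the item was pending. */
bool model_cancel(struct model_dwork *w)
{
	assert(w != NULL);
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/* Old pattern, step 2: queue. Does nothing if already pending. */
bool model_queue(struct model_dwork *w, unsigned long now, unsigned long delay)
{
	assert(w != NULL);
	if (w->pending)
		return false;
	w->pending = true;
	w->expires = now + delay;
	return true;
}

/* New pattern: (re)arm in one step. Returns true if it was pending. */
bool model_mod(struct model_dwork *w, unsigned long now, unsigned long delay)
{
	assert(w != NULL);
	bool was_pending = w->pending;

	w->pending = true;
	w->expires = now + delay;
	return was_pending;
}
```

    In the model, model_queue() on a pending item leaves the old expiry in
    place, which is why callers previously had to cancel first; model_mod()
    updates the expiry in either state.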

    Most conversions are straight-forward. Ones worth mentioning are,

    * drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
    use mod_delayed_work() and cancel loop in
    edac_mc_reset_delay_period() is dropped.

    * drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
    watchdog is active or not. @fan_watchdog_active and related code
    dropped.

    * drivers/power/charger-manager.c: Seemingly a lot of
    delayed_work_pending() abuse going on here.
    [delayed_]work_pending() are unsynchronized and racy when used like
    this. I converted one instance in fullbatt_handler(). Please
    convert the rest so that it invokes workqueue APIs for the intended
    target state rather than trying to game work item pending state
    transitions. e.g. if timer should be modified - call
    mod_delayed_work(), canceled - call cancel_delayed_work[_sync]().

    * drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
    simplified. Note that round_jiffies() calls in this function are
    meaningless. round_jiffies() works on absolute jiffies, not the delta
    delay used by delayed_work.

    v2: Tomi pointed out that __cancel_delayed_work() users can't be
    safely converted to mod_delayed_work(). They could be calling it
    from irq context and if that happens while delayed_work_timer_fn()
    is running, it could deadlock. __cancel_delayed_work() users are
    dropped.

    Signed-off-by: Tejun Heo
    Acked-by: Henrique de Moraes Holschuh
    Acked-by: Dmitry Torokhov
    Acked-by: Anton Vorontsov
    Acked-by: David Howells
    Cc: Tomi Valkeinen
    Cc: Jens Axboe
    Cc: Jiri Kosina
    Cc: Doug Thompson
    Cc: David Airlie
    Cc: Roland Dreier
    Cc: "John W. Linville"
    Cc: Zhang Rui
    Cc: Len Brown
    Cc: "J. Bruce Fields"
    Cc: Johannes Berg

    Tejun Heo
     

30 Jul, 2012

1 commit

  • * devel: (33 commits)
    edac i5000, i5400: fix pointer math in i5000_get_mc_regs()
    edac: allow specifying the error count with fake_inject
    edac: add support for Calxeda highbank L2 cache ecc
    edac: add support for Calxeda highbank memory controller
    edac: create top-level debugfs directory
    sb_edac: properly handle error count
    i7core_edac: properly handle error count
    edac: edac_mc_handle_error(): add an error_count parameter
    edac: remove arch-specific parameter for the error handler
    amd64_edac: Don't pass driver name as an error parameter
    edac_mc: check for allocation failure in edac_mc_alloc()
    edac: Increase version to 3.0.0
    edac_mc: Cleanup per-dimm_info debug messages
    edac: Convert debugfX to edac_dbg(X,
    edac: Use more normal debugging macro style
    edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs
    Edac: Add ABI Documentation for the new device nodes
    edac: move documentation ABI to ABI/testing/sysfs-devices-edac
    i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy
    edac: change the mem allocation scheme to make Documentation/kobject.txt happy
    ...

    Mauro Carvalho Chehab
     

12 Jun, 2012

9 commits

  • In order to avoid losing error events, it is desirable to group
    error events together and generate a single trace for several identical
    errors.

    The trace API already allows reporting multiple errors. Change the
    handle_error function to also allow that.

    The changes at the drivers were made by this small script:

    $file .=$_ while (<>);
    $file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g;
    print $file;

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Remove the arch-dependent parameter, as it was not used,
    because the MCE tracepoint wasn't implemented. It probably doesn't
    make sense to have an MCE-specific tracepoint, as this would
    cost more bytes in the tracepoint, and a tracepoint is not free.

    The changes at the EDAC drivers were done by this small perl script:

    $file .=$_ while (<>);
    $file =~ s/(edac_mc_handle_error)\s*\(([^\;]+)\,([^\,\)]+)\s*\)/$1($2)/g;
    print $file;

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Add a check here for if kzalloc() failed.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Mauro Carvalho Chehab

    Dan Carpenter
     
  • The edac_mc_alloc() routine allocates one dimm_info device for all
    possible memories, including the non-filled ones. The debug messages
    there are somewhat confusing. So, clean them up by moving the code
    that prints the memory location to edac_mc, and using it in both
    edac_mc_sysfs and edac_mc.

    Also, only dump information when DIMMs/ranks are actually
    filled.

    After this patch, a dimm-based memory controller will print the debug
    info as:

    [ 1011.380027] EDAC DEBUG: edac_mc_dump_csrow: csrow->csrow_idx = 0
    [ 1011.380029] EDAC DEBUG: edac_mc_dump_csrow: csrow = ffff8801169be000
    [ 1011.380031] EDAC DEBUG: edac_mc_dump_csrow: csrow->first_page = 0x0
    [ 1011.380032] EDAC DEBUG: edac_mc_dump_csrow: csrow->last_page = 0x0
    [ 1011.380034] EDAC DEBUG: edac_mc_dump_csrow: csrow->page_mask = 0x0
    [ 1011.380035] EDAC DEBUG: edac_mc_dump_csrow: csrow->nr_channels = 3
    [ 1011.380037] EDAC DEBUG: edac_mc_dump_csrow: csrow->channels = ffff8801149c2840
    [ 1011.380039] EDAC DEBUG: edac_mc_dump_csrow: csrow->mci = ffff880117426000
    [ 1011.380041] EDAC DEBUG: edac_mc_dump_channel: channel->chan_idx = 0
    [ 1011.380042] EDAC DEBUG: edac_mc_dump_channel: channel = ffff8801149c2860
    [ 1011.380044] EDAC DEBUG: edac_mc_dump_channel: channel->csrow = ffff8801169be000
    [ 1011.380046] EDAC DEBUG: edac_mc_dump_channel: channel->dimm = ffff88010fe90400
    ...
    [ 1011.380095] EDAC DEBUG: edac_mc_dump_dimm: dimm0: channel 0 slot 0 mapped as virtual row 0, chan 0
    [ 1011.380097] EDAC DEBUG: edac_mc_dump_dimm: dimm = ffff88010fe90400
    [ 1011.380099] EDAC DEBUG: edac_mc_dump_dimm: dimm->label = 'CPU#0Channel#0_DIMM#0'
    [ 1011.380101] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000
    [ 1011.380103] EDAC DEBUG: edac_mc_dump_dimm: dimm->grain = 8
    [ 1011.380104] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000
    ...

    (a rank-based memory controller would print, instead of "dimm?", "rank?"
    on the above debug info)

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Use a more common debugging style.

    Remove __FILE__ uses, add missing newlines,
    coalesce formats and align arguments.

    Signed-off-by: Joe Perches
    Signed-off-by: Mauro Carvalho Chehab

    Joe Perches
     
  • The debug macro already adds that. Most of the work here was
    made by this small script:

    $f .=$_ while (<>);

    $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*": /\1"/g;
    $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*/\1/g;
    $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*"MC: /\1"/g;

    $f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g;
    $f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g;
    $f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g;
    $f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g;

    $f =~ s/\"MC\: \\n\"/"MC:\\n"/g;

    print $f;

    After running the script, manual cleanups were done to fix the remaining
    places.

    While here, removed __LINE__ in most places, as it doesn't actually give
    useful info there.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Kernel kobjects have rigid rules: each container object should be
    dynamically allocated, and can't be allocated into a single kmalloc.

    EDAC never obeyed this rule: it has a single malloc function that
    allocates all needed data into a single kzalloc.

    As this is not accepted anymore, change the allocation scheme of the
    EDAC *_info structs to enforce this Kernel standard.

    Acked-by: Chris Metcalf
    Cc: Aristeu Rozanski
    Cc: Doug Thompson
    Cc: Greg K H
    Cc: Borislav Petkov
    Cc: Mark Gross
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: Shaohui Xie
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Now that all users of the old kobj raw access are gone,
    we can get rid of the legacy kobj-based structures and
    data.

    Reviewed-by: Aristeu Rozanski
    Cc: Doug Thompson
    Cc: Michal Marek
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • The EDAC subsystem uses the old struct sysdev approach,
    creating all nodes using the raw sysfs API. This is bad,
    as the API is deprecated.

    As we'll be changing the EDAC API, let's first port the existing
    code to struct device.

    There's one drawback to this patch: driver-specific sysfs
    nodes, used by mpc85xx_edac, amd64_edac and i7core_edac
    won't be created anymore. While it would be possible to
    also port the device-specific code, that would mix kobj with
    struct device, which is not recommended. Also, it is easier and nicer
    to move the code to the drivers, instead, as the core can get rid
    of some complex logic that just emulates what device_add()
    and device_create_file() already do.

    The next patches will convert the driver-specific code to use
    the device-specific calls. Then, the remaining bits of the old
    sysfs API will be removed.

    NOTE: a per-MC bus is required, otherwise devices with more than
    one memory controller will hit a bug like the one below:

    [ 819.094946] EDAC DEBUG: find_mci_by_dev: find_mci_by_dev()
    [ 819.094948] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device() idx=1
    [ 819.094952] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device(): creating device mc1
    [ 819.094967] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device creating dimm0, located at channel 0 slot 0
    [ 819.094984] ------------[ cut here ]------------
    [ 819.100142] WARNING: at fs/sysfs/dir.c:481 sysfs_add_one+0xc1/0xf0()
    [ 819.107282] Hardware name: S2600CP
    [ 819.111078] sysfs: cannot create duplicate filename '/bus/edac/devices/dimm0'
    [ 819.119062] Modules linked in: sb_edac(+) edac_core ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun kvm microcode pcspkr iTCO_wdt iTCO_vendor_support igb i2c_i801 i2c_core sg ioatdma dca sr_mod cdrom sd_mod crc_t10dif ahci libahci isci libsas libata scsi_transport_sas scsi_mod wmi dm_mod [last unloaded: scsi_wait_scan]
    [ 819.175748] Pid: 10902, comm: modprobe Not tainted 3.3.0-0.11.el7.v12.2.x86_64 #1
    [ 819.184113] Call Trace:
    [ 819.186868] warn_slowpath_common+0x7f/0xc0
    [ 819.193573] warn_slowpath_fmt+0x46/0x50
    [ 819.200000] sysfs_add_one+0xc1/0xf0
    [ 819.206025] sysfs_do_create_link+0x135/0x220
    [ 819.212944] ? sysfs_create_group+0x13/0x20
    [ 819.219656] sysfs_create_link+0x13/0x20
    [ 819.226109] bus_add_device+0xe6/0x1b0
    [ 819.232350] device_add+0x2db/0x460
    [ 819.238300] edac_create_dimm_object+0x84/0xf0 [edac_core]
    [ 819.246460] edac_create_sysfs_mci_device+0xe8/0x290 [edac_core]
    [ 819.255215] edac_mc_add_mc+0x5a/0x2c0 [edac_core]
    [ 819.262611] sbridge_register_mci+0x1bc/0x279 [sb_edac]
    [ 819.270493] sbridge_probe+0xef/0x175 [sb_edac]
    [ 819.277630] ? pm_runtime_enable+0x58/0x90
    [ 819.284268] local_pci_probe+0x5c/0xd0
    [ 819.290508] __pci_device_probe+0xf1/0x100
    [ 819.297117] pci_device_probe+0x3a/0x60
    [ 819.303457] really_probe+0x73/0x270
    [ 819.309496] driver_probe_device+0x4e/0xb0
    [ 819.316104] __driver_attach+0xab/0xb0
    [ 819.322337] ? driver_probe_device+0xb0/0xb0
    [ 819.329151] bus_for_each_dev+0x56/0x90
    [ 819.335489] driver_attach+0x1e/0x20
    [ 819.341534] bus_add_driver+0x1b0/0x2a0
    [ 819.347884] ? 0xffffffffa0346fff
    [ 819.353641] driver_register+0x76/0x140
    [ 819.359980] ? printk+0x51/0x53
    [ 819.365524] ? 0xffffffffa0346fff
    [ 819.371291] __pci_register_driver+0x56/0xd0
    [ 819.378096] sbridge_init+0x54/0x1000 [sb_edac]
    [ 819.385231] do_one_initcall+0x3f/0x170
    [ 819.391577] sys_init_module+0xbe/0x230
    [ 819.397926] system_call_fastpath+0x16/0x1b
    [ 819.404633] ---[ end trace 1654fdd39556689f ]---

    This happens because the bus is not being properly initialized.
    Instead of putting the memory sub-devices inside the memory controller,
    it is putting everything under the same directory:

    $ tree /sys/bus/edac/
    /sys/bus/edac/
    ├── devices
    │ ├── all_channel_counts -> ../../../devices/system/edac/mc/mc0/all_channel_counts
    │ ├── csrow0 -> ../../../devices/system/edac/mc/mc0/csrow0
    │ ├── csrow1 -> ../../../devices/system/edac/mc/mc0/csrow1
    │ ├── csrow2 -> ../../../devices/system/edac/mc/mc0/csrow2
    │ ├── dimm0 -> ../../../devices/system/edac/mc/mc0/dimm0
    │ ├── dimm1 -> ../../../devices/system/edac/mc/mc0/dimm1
    │ ├── dimm3 -> ../../../devices/system/edac/mc/mc0/dimm3
    │ ├── dimm6 -> ../../../devices/system/edac/mc/mc0/dimm6
    │ ├── inject_addrmatch -> ../../../devices/system/edac/mc/mc0/inject_addrmatch
    │ ├── mc -> ../../../devices/system/edac/mc
    │ └── mc0 -> ../../../devices/system/edac/mc/mc0
    ├── drivers
    ├── drivers_autoprobe
    ├── drivers_probe
    └── uevent

    On a multi-memory controller system, the names "csrow%d" and "dimm%d"
    should be under "mc%d", and not at the main hierarchy level.

    So, we need to create a per-MC bus, in order to have its own namespace.
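
    The namespace collision can be illustrated by formatting the sysfs link
    names both ways. This is a userspace sketch: flat_name(), per_mc_name()
    and names_collide() are hypothetical helpers, and the per-MC path layout
    shown is an assumption for illustration, not output from a patched kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Single shared bus: the controller index is not part of the name. */
int flat_name(char *buf, size_t len, int mc_idx, int dimm_idx)
{
	assert(buf != NULL);
	(void)mc_idx;
	return snprintf(buf, len, "/sys/bus/edac/devices/dimm%d", dimm_idx);
}

/* Per-MC bus: each controller gets its own namespace (assumed layout). */
int per_mc_name(char *buf, size_t len, int mc_idx, int dimm_idx)
{
	assert(buf != NULL);
	return snprintf(buf, len, "/sys/bus/mc%d/devices/dimm%d",
			mc_idx, dimm_idx);
}

/* True if two link names would clash in sysfs. */
int names_collide(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

    With a shared bus, dimm0 of mc0 and dimm0 of mc1 produce the same name,
    which corresponds to the "cannot create duplicate filename" warning in
    the trace; with one bus per controller the names differ.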

    Reviewed-by: Aristeu Rozanski
    Cc: Doug Thompson
    Cc: Greg K H
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

11 Jun, 2012

3 commits

  • The logic was checking the sizeof the structure being allocated to
    determine whether an alignment fixup was required. This isn't right;
    what we actually care about is the alignment of the actual pointer that's
    about to be returned. This became an issue recently because struct
    edac_mc_layer has a size that is not zero modulo eight, so we were
    taking the correctly-aligned pointer and forcing it to be misaligned.
    On Tile this caused an alignment exception.
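
    The two ways of computing the fixup can be compared with a small
    userspace sketch. fixup_by_size() and fixup_by_addr() are hypothetical
    helpers that model the old and new idea on plain integers; they are not
    the kernel's alignment routine.

```c
#include <assert.h>
#include <stdint.h>

/* Old idea: derive the fixup from the size of the structure. */
uintptr_t fixup_by_size(uintptr_t addr, uintptr_t size, uintptr_t align)
{
	assert(align != 0);
	uintptr_t r = size % align;

	return r == 0 ? addr : addr + align - r;
}

/* New idea: derive the fixup from the address about to be returned. */
uintptr_t fixup_by_addr(uintptr_t addr, uintptr_t align)
{
	assert(align != 0);
	uintptr_t r = addr % align;

	return r == 0 ? addr : addr + align - r;
}
```

    For an already aligned address of 64 and a 12-byte structure with 8-byte
    alignment, the size-based version returns 68, which is misaligned, while
    the address-based version leaves 64 untouched.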

    Signed-off-by: Chris Metcalf
    Signed-off-by: Mauro Carvalho Chehab

    Chris Metcalf
     
  • As EDAC doesn't use struct device itself, it created a parent dev
    pointer called "pdev". Now that we'll be converting it to use
    struct device, instead of struct sysdev, this needs to be fixed.

    No functional changes.

    Reviewed-by: Aristeu Rozanski
    Acked-by: Chris Metcalf
    Cc: Doug Thompson
    Cc: Borislav Petkov
    Cc: Mark Gross
    Cc: Jason Uhlenkott
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Joe Perches
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: "Niklas Söderlund"
    Cc: Shaohui Xie
    Cc: Josh Boyer
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Add a new tracepoint-based hardware events report method for
    reporting Memory Controller events.

    Part of the description below is shamelessly copied from Tony
    Luck's notes about the Hardware Error BoF during LPC 2010 [1].
    Tony, thanks for your notes and discussions to generate the
    h/w error reporting requirements.

    [1] http://lwn.net/Articles/416669/

    We have several subsystems & methods for reporting hardware errors:

    1) EDAC ("Error Detection and Correction"). In its original form
    this consisted of a platform specific driver that read topology
    information and error counts from chipset registers and reported
    the results via a sysfs interface.

    2) mcelog - x86 specific decoding of machine check bank registers
    reporting in binary form via /dev/mcelog. Recent additions make use
    of the APEI extensions that were documented in version 4.0a of the
    ACPI specification to acquire more information about errors without
    having to rely on reading chipset registers directly. A user-level
    program decodes this into a somewhat human-readable format.

    3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
    decodes errors reported via machine check bank registers in AMD
    processors to the console log using printk();

    Each of these mechanisms has a band of followers ... and none
    of them appear to meet all the needs of all users.

    As part of a RAS subsystem, let's encapsulate the memory error hardware
    events into a trace facility.

    The tracepoint printk will be displayed like:

    mc_event: [quant] (Corrected|Uncorrected|Fatal) error:[error msg] on [label] ([location] [edac_mc detail] [driver detail])

    Where:
    [quant] is the quantity of errors
    [error msg] is the driver-specific error message
    (e. g. "memory read", "bus error", ...);
    [location] is the location in terms of memory controller and
    branch/channel/slot, channel/slot or csrow/channel;
    [label] is the memory stick label;
    [edac_mc detail] describes the address location of the error
    and the syndrome;
    [driver detail] is driver-specific error message details,
    when needed/provided (e. g. "area:DMA", ...)

    For example:

    mc_event: 1 Corrected error:memory read on memory stick DIMM_1A (mc:0 location:0:0:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)

    Of course, any userspace tools meant to handle errors should not parse
    the above data. They should, instead, use the binary fields provided by
    the tracepoint, mapping them directly into their Management Information
    Base.
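
    As a rough user-space sketch of the idea (not the actual tracepoint
    definition), the record below carries the fields as data and only
    formats them for display. The struct, its field names and the helper
    functions are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record; the field names are assumptions for this sketch. */
enum mc_severity { MC_CORRECTED, MC_UNCORRECTED, MC_FATAL };

struct mc_event_rec {
	unsigned int count;         /* [quant] */
	enum mc_severity severity;
	const char *msg;            /* [error msg] */
	const char *label;          /* [label] */
	unsigned int mc_index;
	int layer[3];               /* [location] */
	unsigned long page;         /* [edac_mc detail] starts here */
	unsigned long offset;
	unsigned int grain;
	unsigned long syndrome;
	const char *driver_detail;  /* [driver detail] */
};

static const char *mc_severity_name(enum mc_severity s)
{
	switch (s) {
	case MC_CORRECTED:   return "Corrected";
	case MC_UNCORRECTED: return "Uncorrected";
	default:             return "Fatal";
	}
}

/* Render the human-readable line; returns what snprintf() returns. */
static int mc_event_format(char *buf, size_t len, const struct mc_event_rec *e)
{
	assert(buf != NULL && e != NULL);
	return snprintf(buf, len,
			"mc_event: %u %s error:%s on %s "
			"(mc:%u location:%d:%d:%d page:0x%lx offset:0x%lx "
			"grain:%u syndrome:0x%lx %s)",
			e->count, mc_severity_name(e->severity), e->msg, e->label,
			e->mc_index, e->layer[0], e->layer[1], e->layer[2],
			e->page, e->offset, e->grain, e->syndrome,
			e->driver_detail);
}

/* A consumer can test the fields directly instead of parsing the text. */
static int mc_event_hits_label(const struct mc_event_rec *e, const char *label)
{
	return strcmp(e->label, label) == 0;
}
```

    A tool would call something like mc_event_hits_label() on the record
    and treat the formatted line as display-only output.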

    NOTE: The original patch was providing an additional mechanism for
    MCA-based trace events that also contained MCA error register data.
    However, as no agreement was reached so far for the MCA-based trace
    events, for now, let's add events only for memory errors.
    A later patch is planned to change the tracepoint, for those types
    of event.

    Cc: Aristeu Rozanski
    Cc: Doug Thompson
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

29 May, 2012

7 commits

  • Until userspace fills in the DIMM labels, store the DIMM location there,
    as described by the memory model in use. This could eventually match what
    dmidecode reports, making it easier for people to identify the
    memory.

    For example, on an Intel motherboard where the DMI table is reliable,
    the first memory stick is described as:

    Memory Device
    Array Handle: 0x0029
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 2048 MB
    Form Factor: DIMM
    Set: 1
    Locator: A1_DIMM0
    Bank Locator: A1_Node0_Channel0_Dimm0
    Type:
    Type Detail: Synchronous
    Speed: 800 MHz
    Manufacturer: A1_Manufacturer0
    Serial Number: A1_SerNum0
    Asset Tag: A1_AssetTagNum0
    Part Number: A1_PartNum0

    The memory named as "A1_DIMM0" is physically located at the first
    memory controller (node 0), at channel 0, dimm slot 0.

    After this patch, the memory label will be filled with:
    /sys/devices/system/edac/mc/csrow0/ch0_dimm_label:mc#0channel#0slot#0

    And (after the new EDAC API patches) as:
    /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0

    So, even if the memory label is not initialized by userspace, useful
    information about the error location is filled in there, especially since
    several systems/motherboards provide enough info to map from
    channel/slot (or branch/channel/slot) to the DIMM label. So, letting the
    EDAC core fill it by default is a good thing.

    It should be noted that, as the label filling happens in
    edac_mc_alloc(), drivers can override it to better describe the memories
    (and some actually do).
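
    A sketch of how such a default label could be assembled, assuming a
    hypothetical helper default_dimm_label() that takes the layer names
    and positions as arrays. This is not the code in edac_mc_alloc(); it
    only reproduces the "mc#0channel#0slot#0" format shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "mc#<idx>" followed by "<layer>#<pos>" for
 * each layer. Returns the label length, or -1 if buf is too small.
 */
static int default_dimm_label(char *buf, size_t len, unsigned int mc_idx,
			      const char *const *layer_names,
			      const unsigned int *pos, size_t n_layers)
{
	size_t used, i;
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "mc#%u", mc_idx);
	if (n < 0 || (size_t)n >= len)
		return -1;
	used = (size_t)n;

	for (i = 0; i < n_layers; i++) {
		n = snprintf(buf + used, len - used, "%s#%u",
			     layer_names[i], pos[i]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

    A driver that knows the silkscreen names can simply overwrite the
    buffer afterwards, which matches the override behaviour described
    above.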

    Cc: Aristeu Rozanski
    Cc: Doug Thompson
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Now that all drivers got converted to use the new ABI, we can
    drop the old one.

    Acked-by: Chris Metcalf
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Change the EDAC internal representation to work with non-csrow
    based memory controllers.

    There are lots of those memory controllers nowadays, and more
    are coming. So, the EDAC internal representation needs to be
    changed, in order to work with those memory controllers, while
    preserving backward compatibility with the old ones.

    The edac core was written with the idea that memory controllers
    are able to directly access csrows.

    This is not true for FB-DIMM and RAMBUS memory controllers.

    Also, some recent advanced memory controllers don't present a per-csrow
    view. They view memories as DIMMs rather than as ranks.

    So, change the allocation and error report routines to allow
    them to work with all types of architectures.
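
    One way to picture a representation that covers all these
    architectures is a list of layers, each with a type and a size, with
    a DIMM addressed by one position per layer. The sketch below is
    loosely modelled on that idea; the type names, enum values and
    functions are assumptions for illustration, not the kernel's
    definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layer description for this sketch. */
enum layer_type { LAYER_BRANCH, LAYER_CHANNEL, LAYER_SLOT, LAYER_CHIP_SELECT };

struct layer {
	enum layer_type type;
	unsigned int size;      /* number of positions in this layer */
};

/* Total number of DIMM entries: the product of all layer sizes. */
static unsigned int layout_total(const struct layer *layers, size_t n_layers)
{
	unsigned int total = 1;
	size_t i;

	for (i = 0; i < n_layers; i++)
		total *= layers[i].size;
	return total;
}

/* Flatten one position per layer into a row-major array index. */
static unsigned int layout_index(const struct layer *layers, size_t n_layers,
				 const unsigned int *pos)
{
	unsigned int idx = 0;
	size_t i;

	for (i = 0; i < n_layers; i++) {
		assert(pos[i] < layers[i].size);
		idx = idx * layers[i].size + pos[i];
	}
	return idx;
}
```

    The same two functions serve a csrow/channel controller (two layers)
    and a branch/channel/slot FB-DIMM controller (three layers); only the
    layer table differs.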

    This will allow the removal of several hacks with FB-DIMM and RAMBUS
    memory controllers.

    Also, several tests were done on different platforms using different
    x86 drivers.

    TODO: multi-rank DIMMs are currently represented by multiple DIMM
    entries in struct dimm_info. That means that changing a label for one
    rank won't change the same label for the other ranks on the same DIMM.
    This bug has been present since the beginning of EDAC, so it is not a big
    deal. However, on several drivers, it is possible to fix this issue, but
    it should be a per-driver fix, as the csrow => DIMM arrangement may not
    be equal for all. So, don't try to fix it here yet.

    I tried to make this patch as short as possible, preceding it with
    several other patches that simplified the logic here. Yet, as the
    internal API changes, all drivers need changes. The changes are
    generally bigger in the drivers for FB-DIMMs.

    Cc: Aristeu Rozanski
    Cc: Doug Thompson
    Cc: Borislav Petkov
    Cc: Mark Gross
    Cc: Jason Uhlenkott
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Chris Metcalf
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Joe Perches
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: "Niklas Söderlund"
    Cc: Shaohui Xie
    Cc: Josh Boyer
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • The edac_align_ptr() function is used to prepare data for a single
    memory allocation kzalloc() call. It counts how many bytes are needed
    by some data structure.

    Using it as-is is not trivial, as the number of elements being
    reserved is not given in that call but, instead, in the next one.

    In order to avoid mistakes when using it, pass the number of allocated
    elements to it, making it easier to use.
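
    The pattern can be sketched in user space as follows: each
    reservation states its own element count, the running offset gives
    the total size, and a single allocation backs everything. reserve(),
    struct layout and layout_alloc() are hypothetical names for this
    example, and calloc() stands in for kzalloc(); the kernel function's
    real signature is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: reserve n_elems objects of `size` bytes at the
 * running offset *off, aligned to `align` (a power of two). Returns the
 * offset of the first object and advances *off past the last one.
 */
static size_t reserve(size_t *off, size_t size, size_t align, size_t n_elems)
{
	size_t start;

	assert(align != 0 && (align & (align - 1)) == 0);
	start = (*off + align - 1) & ~(align - 1);
	*off = start + size * n_elems;
	return start;
}

struct ctl  { int id; };
struct dimm { char label[32]; unsigned int nr_pages; };

struct layout {
	void *base;             /* the single allocation */
	struct ctl *ctl;
	struct dimm *dimms;
};

/* Lay out one ctl plus n_dimms dimm entries in one zeroed allocation. */
static int layout_alloc(struct layout *l, size_t n_dimms)
{
	size_t off = 0;
	size_t ctl_off = reserve(&off, sizeof(struct ctl),
				 _Alignof(struct ctl), 1);
	size_t dimm_off = reserve(&off, sizeof(struct dimm),
				  _Alignof(struct dimm), n_dimms);

	l->base = calloc(1, off);
	if (l->base == NULL)
		return -1;
	l->ctl = (struct ctl *)((char *)l->base + ctl_off);
	l->dimms = (struct dimm *)((char *)l->base + dimm_off);
	return 0;
}
```

    Because the count is an argument of the same call that reserves the
    space, there is no need to remember it until the next reservation.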

    Reviewed-by: Borislav Petkov
    Cc: Aristeu Rozanski
    Cc: Doug Thompson
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • The number of pages is a dimm property. Move it to the dimm struct.

    After this change, it is possible to add sysfs nodes for the DIMM's that
    will properly represent the DIMM stick properties, including its size.

    A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when
    the memory controller represents the memory via chip select rows.

    Reviewed-by: Aristeu Rozanski
    Acked-by: Borislav Petkov
    Acked-by: Chris Metcalf
    Cc: Doug Thompson
    Cc: Mark Gross
    Cc: Jason Uhlenkott
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Joe Perches
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: "Niklas Söderlund"
    Cc: Shaohui Xie
    Cc: Josh Boyer
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • On systems based on chip select rows, all channels need to use memories
    with the same properties, otherwise the memories on channels A and B
    won't be recognized.

    However, this assumption does not hold for all types of memory
    controllers.

    Controllers for FB-DIMM's don't have such requirements.

    Also, modern Intel controllers seem to be capable of handling such
    differences.

    So, we need to stop storing the DIMM information in per-csrow
    data and store it in the right place instead.

    The first step is to move grain, mtype, dtype and edac_mode to the
    per-dimm struct.

    Reviewed-by: Aristeu Rozanski
    Reviewed-by: Borislav Petkov
    Acked-by: Chris Metcalf
    Cc: Doug Thompson
    Cc: Borislav Petkov
    Cc: Mark Gross
    Cc: Jason Uhlenkott
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Joe Perches
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: James Bottomley
    Cc: "Niklas Söderlund"
    Cc: Shaohui Xie
    Cc: Josh Boyer
    Cc: Mike Williams
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • The way a DIMM is currently represented implies that it is
    linked into a per-csrow struct. However, some drivers don't see
    csrows, as they're hidden behind some chip, like the AMBs
    on FB-DIMMs, for example.

    This forced drivers to fake^Wvirtualize a csrow struct, and to create
    a mess under the original csrow/channel concept.

    Move the DIMM labels into a per-DIMM struct, and store there
    the real location of the socket, in terms of csrow/channel.
    Later patches will modify the location to properly represent the
    memory architecture.

    All other drivers will use a per-csrow type of location.
    Some of those drivers will require a later conversion, as
    they also fake the csrows internally.

    TODO: While this patch doesn't change the existing behavior, on
    csrows-based memory controllers, a csrow/channel pair points to a memory
    rank. There's a known bug in the EDAC core that allows having different
    labels for the same DIMM, if it has more than one rank. A later patch
    is needed to merge the several ranks for a DIMM into the same dimm_info
    struct, in order to avoid having different labels for the same DIMM.

    The edac_mc_alloc() will now contain a per-dimm initialization loop that
    will be changed by later patches in order to match other types of
    memory architectures.
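
    A simplified picture of such a per-DIMM initialization loop, assuming
    a hypothetical struct dimm_rec that holds the label and the
    csrow/channel location (the real struct dimm_info layout is not shown
    in this log):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-DIMM record for this sketch. */
struct dimm_rec {
	char label[32];         /* left empty here; filled in later */
	unsigned int csrow;     /* location of the socket */
	unsigned int channel;
};

/*
 * Allocate one record per csrow/channel pair and store each record's
 * own location in it. Returns NULL on allocation failure.
 */
static struct dimm_rec *dimm_table_alloc(unsigned int n_csrows,
					 unsigned int n_channels)
{
	struct dimm_rec *tbl;
	unsigned int row, ch;

	assert(n_csrows > 0 && n_channels > 0);
	tbl = calloc((size_t)n_csrows * n_channels, sizeof(*tbl));
	if (tbl == NULL)
		return NULL;

	for (row = 0; row < n_csrows; row++) {
		for (ch = 0; ch < n_channels; ch++) {
			struct dimm_rec *d = &tbl[row * n_channels + ch];

			d->csrow = row;
			d->channel = ch;
		}
	}
	return tbl;
}
```

    Keeping the location inside the record means later code can report
    where a DIMM sits without walking back through a csrow structure.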

    Reviewed-by: Aristeu Rozanski
    Reviewed-by: Borislav Petkov
    Cc: Doug Thompson
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: "Niklas Söderlund"
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

29 Mar, 2012

1 commit

  • Pull EDAC fixes from Mauro Carvalho Chehab:
    "A series of EDAC driver fixes. It also has one core fix at the
    documentation, and a rename patch, fixing the name of the struct that
    contains the rank information."

    * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
    edac: rename channel_info to rank_info
    i5400_edac: Avoid calling pci_put_device() twice
    edac: i5100 ack error detection register after each read
    edac: i5100 fix erroneous define for M1Err
    edac: sb_edac: Fix a wrong value setting for the previous value
    edac: sb_edac: Fix a INTERLEAVE_MODE() misuse
    edac: sb_edac: Let the driver depend on PCI_MMCONFIG
    edac: Improve the comments to better describe the memory concepts
    edac/ppc4xx_edac: Fix compilation
    Fix sb_edac compilation with 32 bits kernels

    Linus Torvalds
     

22 Mar, 2012

1 commit

  • What is pointed to by a csrow/channel vector is rank information, not
    channel information.

    On a traditional architecture, the memory controller directly accesses the
    memory ranks via chip select rows. Different ranks on the same DIMM are
    selected via different chip select rows. So, typically, one
    csrow/channel pair means one different DIMM.

    On FB-DIMMs, there's a microcontroller chip on the DIMM, called the
    Advanced Memory Buffer (AMB), that serves as the interface between the
    memory controller and the memory chips.

    The AMB selection is via the DIMM slot, and not via a csrow.

    It is up to the AMB to talk with the csrows of the DRAM chips.

    So, the FB-DIMM memory controllers see the DIMM slot, and not the DIMM
    rank. RAMBUS is similar.

    Newer memory controllers, like the ones found on Intel Sandy Bridge and
    Nehalem, even when working with normal DDR3 DIMMs, don't use the usual
    channel A/channel B interleaving scheme to provide 128 bits data access.

    Instead, they have more channels (3 or 4 channels), and they can use
    several interleaving schemes. Such memory controllers see the DIMMs
    directly on their registers, instead of the ranks, which is better for
    the driver, as its main usage is to point to a broken DIMM stick (the
    Field Replaceable Unit), and not to point to a broken DRAM chip.

    The drivers that support such newer memory architecture models
    currently need to fake information and to abuse EDAC structures, as
    the subsystem was conceived with the idea that the csrow would always be
    visible to the CPU.

    To make things a little worse, those drivers don't currently fake
    csrows/channels in a consistent way, as the concepts there don't apply
    to the memory controllers they're talking with. So, each driver author
    interpreted the concepts using a different logic.

    In order to fix it, let's rename the data structure that points into a
    DIMM rank to "rank_info", in order to be clearer about what's stored
    there.

    Later patches will provide a better way to represent the memory
    hierarchy for the other types of memory controller.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab