09 May, 2013

1 commit

  • I get the following warning on boot:

    ------------[ cut here ]------------
    WARNING: at drivers/base/core.c:575 device_create_file+0x9a/0xa0()
    Hardware name: -[8737R2A]-
    Write permission without 'store'
    ...

    Drilling down, this is related to dynamic channel ce_count attribute
    files sporting a S_IWUSR mode without a ->store() function. Looking
    around, it appears that they aren't supposed to have a ->store()
    function. So remove the bogus write permission to get rid of the
    warning.

    Signed-off-by: Srivatsa S. Bhat
    Cc: Mauro Carvalho Chehab
    Cc: # 3.[89]
    [ shorten commit message ]
    Signed-off-by: Borislav Petkov

    Srivatsa S. Bhat
     

25 Mar, 2013

1 commit


16 Mar, 2013

2 commits

  • Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
    memory controller is csrows based. Merge both fields into one.

    There's no need for the driver to actually fill it, as the core detects
    it by checking if one of the layers has the csrows type as part of the
    memory hierarchy:

    if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
    per_rank = true;

    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Borislav Petkov

    Mauro Carvalho Chehab
     
  • We were filling the csrow size with a wrong value. 16a528ee3975 ("EDAC:
    Fix csrow size reported in sysfs") tried to address the issue. It fixed
    the report with the old API but not with the new one. Correct it for the
    new API too.

    Signed-off-by: Mauro Carvalho Chehab
    [ make it a per-csrow accounting regardless of ->channel_count ]
    Signed-off-by: Borislav Petkov

    Mauro Carvalho Chehab
     

05 Mar, 2013

1 commit


21 Feb, 2013

2 commits


08 Jan, 2013

2 commits

  • Use device_unregister to replace put_device + device_del for
    cleanup, and fix the potential use after free.

    Signed-off-by: Lans Zhang
    Signed-off-by: Borislav Petkov

    Lans Zhang
     
  • This patch fixes use-after-free and double-free bugs in
    edac_mc_sysfs_exit(). mci_pdev has single reference and put_device()
    calls mc_attr_release() which calls kfree(). The following
    device_del() works with already released memory. An another kfree() in
    edac_mc_sysfs_exit() releses the same memory again. Great.

    Signed-off-by: Konstantin Khlebnikov
    Cc: stable@vger.kernel.org # 3.[67]
    Cc: Denis Kirjanov
    Cc: Mauro Carvalho Chehab
    Link: http://lkml.kernel.org/r/20121214110310.11019.21098.stgit@zurg
    Signed-off-by: Borislav Petkov

    Konstantin Khlebnikov
     

28 Nov, 2012

5 commits


27 Jun, 2012

1 commit


12 Jun, 2012

12 commits

  • Create a single, top-level "edac" directory for debugfs. An "mc[0-N]"
    directory is then created for each memory controller. Individual drivers
    can create additional entries such as h/w error injection control.

    Signed-off-by: Rob Herring
    Signed-off-by: Mauro Carvalho Chehab

    Rob Herring
     
  • In order to avoid loosing error events, it is desirable to group
    error events together and generate a single trace for several identical
    errors.

    The trace API already allows reporting multiple errors. Change the
    handle_error function to also allow that.

    The changes at the drivers were made by this small script:

    $file .=$_ while (<>);
    $file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g;
    print $file;

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Remove the arch-dependent parameter, as it were not used,
    as the MCE tracepoint weren't implemented. It probably doesn't
    make sense to have an MCE-specific tracepoint, as this will
    cost more bytes at the tracepoint, and tracepoint is not free.

    The changes at the EDAC drivers were done by this small perl script:

    $file .=$_ while (<>);
    $file =~ s/(edac_mc_handle_error)\s*\(([^\;]+)\,([^\,\)]+)\s*\)/$1($2)/g;
    print $file;

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • The edac_mc_alloc() routine allocates one dimm_info device for all
    possible memories, including the non-filled ones. The debug messages
    there are somewhat confusing. So, cleans them, by moving the code
    that prints the memory location to edac_mc, and using it on both
    edac_mc_sysfs and edac_mc.

    Also, only dumps information when DIMM/ranks are actually
    filled.

    After this patch, a dimm-based memory controller will print the debug
    info as:

    [ 1011.380027] EDAC DEBUG: edac_mc_dump_csrow: csrow->csrow_idx = 0
    [ 1011.380029] EDAC DEBUG: edac_mc_dump_csrow: csrow = ffff8801169be000
    [ 1011.380031] EDAC DEBUG: edac_mc_dump_csrow: csrow->first_page = 0x0
    [ 1011.380032] EDAC DEBUG: edac_mc_dump_csrow: csrow->last_page = 0x0
    [ 1011.380034] EDAC DEBUG: edac_mc_dump_csrow: csrow->page_mask = 0x0
    [ 1011.380035] EDAC DEBUG: edac_mc_dump_csrow: csrow->nr_channels = 3
    [ 1011.380037] EDAC DEBUG: edac_mc_dump_csrow: csrow->channels = ffff8801149c2840
    [ 1011.380039] EDAC DEBUG: edac_mc_dump_csrow: csrow->mci = ffff880117426000
    [ 1011.380041] EDAC DEBUG: edac_mc_dump_channel: channel->chan_idx = 0
    [ 1011.380042] EDAC DEBUG: edac_mc_dump_channel: channel = ffff8801149c2860
    [ 1011.380044] EDAC DEBUG: edac_mc_dump_channel: channel->csrow = ffff8801169be000
    [ 1011.380046] EDAC DEBUG: edac_mc_dump_channel: channel->dimm = ffff88010fe90400
    ...
    [ 1011.380095] EDAC DEBUG: edac_mc_dump_dimm: dimm0: channel 0 slot 0 mapped as virtual row 0, chan 0
    [ 1011.380097] EDAC DEBUG: edac_mc_dump_dimm: dimm = ffff88010fe90400
    [ 1011.380099] EDAC DEBUG: edac_mc_dump_dimm: dimm->label = 'CPU#0Channel#0_DIMM#0'
    [ 1011.380101] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000
    [ 1011.380103] EDAC DEBUG: edac_mc_dump_dimm: dimm->grain = 8
    [ 1011.380104] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000
    ...

    (a rank-based memory controller would print, instead of "dimm?", "rank?"
    on the above debug info)

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Use a more common debugging style.

    Remove __FILE__ uses, add missing newlines,
    coalesce formats and align arguments.

    Signed-off-by: Joe Perches
    Signed-off-by: Mauro Carvalho Chehab

    Joe Perches
     
  • The debug macro already adds that. Most of the work here was
    made by this small script:

    $f .=$_ while (<>);

    $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*": /\1"/g;
    $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*/\1/g;
    $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*"MC: /\1"/g;

    $f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g;
    $f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g;
    $f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g;
    $f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g;

    $f =~ s/\"MC\: \\n\"/"MC:\\n"/g;

    print $f;

    After running the script, manual cleanups were done to fix it the remaining
    places.

    While here, removed the __LINE__ on most places, as it doesn't actually give
    useful info on most places.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Kernel kobjects have rigid rules: each container object should be
    dynamically allocated, and can't be allocated into a single kmalloc.

    EDAC never obeyed this rule: it has a single malloc function that
    allocates all needed data into a single kzalloc.

    As this is not accepted anymore, change the allocation schema of the
    EDAC *_info structs to enforce this Kernel standard.

    Acked-by: Chris Metcalf
    Cc: Aristeu Rozanski
    Cc: Doug Thompson
    Cc: Greg K H
    Cc: Borislav Petkov
    Cc: Mark Gross
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: Shaohui Xie
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • This patch actually fixes a bug with the legacy API, where, at the
    same csrow, some channels may have different DIMMs. This can happen
    on FB-DIMM/RAMBUS and modern Intel controllers.

    This is the case, for example, of Nehalem machines:

    $ ./edac-ctl --layout
    +-----------------------------------+
    | mc0 |
    | channel0 | channel1 | channel2 |
    -------+-----------------------------------+
    slot2: | 0 MB | 0 MB | 0 MB |
    slot1: | 1024 MB | 0 MB | 0 MB |
    slot0: | 1024 MB | 1024 MB | 1024 MB |
    -------+-----------------------------------+

    Before this patch, non-filled memories were shown. Now, only what's
    filled is there:

    grep . /sys/devices/system/edac/mc/mc0/csrow*/ch?*
    /sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:0
    /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label:CPU#0Channel#0_DIMM#0
    /sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count:0
    /sys/devices/system/edac/mc/mc0/csrow0/ch1_dimm_label:CPU#0Channel#0_DIMM#1
    /sys/devices/system/edac/mc/mc0/csrow1/ch0_ce_count:0
    /sys/devices/system/edac/mc/mc0/csrow1/ch0_dimm_label:CPU#0Channel#1_DIMM#0
    /sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count:0
    /sys/devices/system/edac/mc/mc0/csrow2/ch0_dimm_label:CPU#0Channel#2_DIMM#0

    Thanks-to: Aristeu Rozanski Filho
    Reviewed-by: Aristeu Rozanski
    Cc: Doug Thompson
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Sometimes, it is useful to have a mechanism that generates fake
    errors, in order to test the EDAC core code, and the userspace
    tools.

    Provide such mechanism by adding a few debugfs nodes.

    Reviewed-by: Aristeu Rozanski
    Cc: Doug Thompson
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • The userspace tools need to know what's the maximum location on each
    system, as it helps to create nice maps showing how the memory was
    filled at the system.

    Reviewed-by: Aristeu Rozanski
    Cc: Doug Thompson
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • The old EDAC API is broken. It only works fine for systems manufatured
    before 2005 and for AMD 64. The reason is that it forces all memory
    controller drivers to discover rank info.

    Also, it doesn't allow grouping the several ranks into a DIMM.

    So, what almost all modern drivers do is to create a fake virtual-rank
    information, and use it to cheat the EDAC core to accept the driver.

    While this works if the user has enough time to discover what DIMM slot
    corresponds to each "virtual-rank" information, it prevents EDAC usage
    for users with less available time. It also makes life hard for vendors
    that may want to provide a table with their motherboards to the userspace
    tool (edac-utils) as each driver has its own logic for the virtual
    mapping.

    So, the old API should be removed, in favor of a more flexible API that
    allows newer drivers to not lie to the EDAC core.

    Reviewed-by: Aristeu Rozanski
    Cc: Doug Thompson
    Cc: Borislav Petkov
    Cc: Randy Dunlap
    Cc: Josh Boyer
    Cc: Hui Wang
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • The EDAC subsystem uses the old struct sysdev approach,
    creating all nodes using the raw sysfs API. This is bad,
    as the API is deprecated.

    As we'll be changing the EDAC API, let's first port the existing
    code to struct device.

    There's one drawback on this patch: driver-specific sysfs
    nodes, used by mpc85xx_edac, amd64_edac and i7core_edac
    won't be created anymore. While it would be possible to
    also port the device-specific code, that would mix kobj with
    struct device, with is not recommended. Also, it is easier and nicer
    to move the code to the drivers, instead, as the core can get rid
    of some complex logic that just emulates what the device_add()
    and device_create_file() already does.

    The next patches will convert the driver-specific code to use
    the device-specific calls. Then, the remaining bits of the old
    sysfs API will be removed.

    NOTE: a per-MC bus is required, otherwise devices with more than
    one memory controller will hit a bug like the one below:

    [ 819.094946] EDAC DEBUG: find_mci_by_dev: find_mci_by_dev()
    [ 819.094948] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device() idx=1
    [ 819.094952] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device(): creating device mc1
    [ 819.094967] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device creating dimm0, located at channel 0 slot 0
    [ 819.094984] ------------[ cut here ]------------
    [ 819.100142] WARNING: at fs/sysfs/dir.c:481 sysfs_add_one+0xc1/0xf0()
    [ 819.107282] Hardware name: S2600CP
    [ 819.111078] sysfs: cannot create duplicate filename '/bus/edac/devices/dimm0'
    [ 819.119062] Modules linked in: sb_edac(+) edac_core ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun kvm microcode pcspkr iTCO_wdt iTCO_vendor_support igb i2c_i801 i2c_core sg ioatdma dca sr_mod cdrom sd_mod crc_t10dif ahci libahci isci libsas libata scsi_transport_sas scsi_mod wmi dm_mod [last unloaded: scsi_wait_scan]
    [ 819.175748] Pid: 10902, comm: modprobe Not tainted 3.3.0-0.11.el7.v12.2.x86_64 #1
    [ 819.184113] Call Trace:
    [ 819.186868] [] warn_slowpath_common+0x7f/0xc0
    [ 819.193573] [] warn_slowpath_fmt+0x46/0x50
    [ 819.200000] [] sysfs_add_one+0xc1/0xf0
    [ 819.206025] [] sysfs_do_create_link+0x135/0x220
    [ 819.212944] [] ? sysfs_create_group+0x13/0x20
    [ 819.219656] [] sysfs_create_link+0x13/0x20
    [ 819.226109] [] bus_add_device+0xe6/0x1b0
    [ 819.232350] [] device_add+0x2db/0x460
    [ 819.238300] [] edac_create_dimm_object+0x84/0xf0 [edac_core]
    [ 819.246460] [] edac_create_sysfs_mci_device+0xe8/0x290 [edac_core]
    [ 819.255215] [] edac_mc_add_mc+0x5a/0x2c0 [edac_core]
    [ 819.262611] [] sbridge_register_mci+0x1bc/0x279 [sb_edac]
    [ 819.270493] [] sbridge_probe+0xef/0x175 [sb_edac]
    [ 819.277630] [] ? pm_runtime_enable+0x58/0x90
    [ 819.284268] [] local_pci_probe+0x5c/0xd0
    [ 819.290508] [] __pci_device_probe+0xf1/0x100
    [ 819.297117] [] pci_device_probe+0x3a/0x60
    [ 819.303457] [] really_probe+0x73/0x270
    [ 819.309496] [] driver_probe_device+0x4e/0xb0
    [ 819.316104] [] __driver_attach+0xab/0xb0
    [ 819.322337] [] ? driver_probe_device+0xb0/0xb0
    [ 819.329151] [] bus_for_each_dev+0x56/0x90
    [ 819.335489] [] driver_attach+0x1e/0x20
    [ 819.341534] [] bus_add_driver+0x1b0/0x2a0
    [ 819.347884] [] ? 0xffffffffa0346fff
    [ 819.353641] [] driver_register+0x76/0x140
    [ 819.359980] [] ? printk+0x51/0x53
    [ 819.365524] [] ? 0xffffffffa0346fff
    [ 819.371291] [] __pci_register_driver+0x56/0xd0
    [ 819.378096] [] sbridge_init+0x54/0x1000 [sb_edac]
    [ 819.385231] [] do_one_initcall+0x3f/0x170
    [ 819.391577] [] sys_init_module+0xbe/0x230
    [ 819.397926] [] system_call_fastpath+0x16/0x1b
    [ 819.404633] ---[ end trace 1654fdd39556689f ]---

    This happens because the bus is not being properly initialized.
    Instead of putting the memory sub-devices inside the memory controller,
    it is putting everything under the same directory:

    $ tree /sys/bus/edac/
    /sys/bus/edac/
    ├── devices
    │ ├── all_channel_counts -> ../../../devices/system/edac/mc/mc0/all_channel_counts
    │ ├── csrow0 -> ../../../devices/system/edac/mc/mc0/csrow0
    │ ├── csrow1 -> ../../../devices/system/edac/mc/mc0/csrow1
    │ ├── csrow2 -> ../../../devices/system/edac/mc/mc0/csrow2
    │ ├── dimm0 -> ../../../devices/system/edac/mc/mc0/dimm0
    │ ├── dimm1 -> ../../../devices/system/edac/mc/mc0/dimm1
    │ ├── dimm3 -> ../../../devices/system/edac/mc/mc0/dimm3
    │ ├── dimm6 -> ../../../devices/system/edac/mc/mc0/dimm6
    │ ├── inject_addrmatch -> ../../../devices/system/edac/mc/mc0/inject_addrmatch
    │ ├── mc -> ../../../devices/system/edac/mc
    │ └── mc0 -> ../../../devices/system/edac/mc/mc0
    ├── drivers
    ├── drivers_autoprobe
    ├── drivers_probe
    └── uevent

    On a multi-memory controller system, the names "csrow%d" and "dimm%d"
    should be under "mc%d", and not at the main hierarchy level.

    So, we need to create a per-MC bus, in order to have its own namespace.

    Reviewed-by: Aristeu Rozanski
    Cc: Doug Thompson
    Cc: Greg K H
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

11 Jun, 2012

1 commit

  • As EDAC doesn't use struct device itself, it created a parent dev
    pointer called as "pdev". Now that we'll be converting it to use
    struct device, instead of struct devsys, this needs to be fixed.

    No functional changes.

    Reviewed-by: Aristeu Rozanski
    Acked-by: Chris Metcalf
    Cc: Doug Thompson
    Cc: Borislav Petkov
    Cc: Mark Gross
    Cc: Jason Uhlenkott
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Joe Perches
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: "Niklas Söderlund"
    Cc: Shaohui Xie
    Cc: Josh Boyer
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

29 May, 2012

4 commits

  • While userspace doesn't fill the dimm labels, add there the dimm location,
    as described by the used memory model. This could eventually match what
    is described at the dmidecode, making easier for people to identify the
    memory.

    For example, on an Intel motherboard where the DMI table is reliable,
    the first memory stick is described as:

    Memory Device
    Array Handle: 0x0029
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 2048 MB
    Form Factor: DIMM
    Set: 1
    Locator: A1_DIMM0
    Bank Locator: A1_Node0_Channel0_Dimm0
    Type:
    Type Detail: Synchronous
    Speed: 800 MHz
    Manufacturer: A1_Manufacturer0
    Serial Number: A1_SerNum0
    Asset Tag: A1_AssetTagNum0
    Part Number: A1_PartNum0

    The memory named as "A1_DIMM0" is physically located at the first
    memory controller (node 0), at channel 0, dimm slot 0.

    After this patch, the memory label will be filled with:
    /sys/devices/system/edac/mc/csrow0/ch0_dimm_label:mc#0channel#0slot#0

    And (after the new EDAC API patches) as:
    /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0

    So, even if the memory label is not initialized on userspace, an useful
    information with the error location is filled there, expecially since
    several systems/motherboards are provided with enough info to map from
    channel/slot (or branch/channel/slot) into the DIMM label. So, letting the
    EDAC core fill it by default is a good thing.

    It should noticed that, as the label filling happens at the
    edac_mc_alloc(), drivers can override it to better describe the memories
    (and some actually do it).

    Cc: Aristeu Rozanski
    Cc: Doug Thompson
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • The number of pages is a dimm property. Move it to the dimm struct.

    After this change, it is possible to add sysfs nodes for the DIMM's that
    will properly represent the DIMM stick properties, including its size.

    A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when
    the memory controller represents the memory via chip select rows.

    Reviewed-by: Aristeu Rozanski
    Acked-by: Borislav Petkov
    Acked-by: Chris Metcalf
    Cc: Doug Thompson
    Cc: Mark Gross
    Cc: Jason Uhlenkott
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Joe Perches
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: "Niklas Söderlund"
    Cc: Shaohui Xie
    Cc: Josh Boyer
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • On systems based on chip select rows, all channels need to use memories
    with the same properties, otherwise the memories on channels A and B
    won't be recognized.

    However, such assumption is not true for all types of memory
    controllers.

    Controllers for FB-DIMM's don't have such requirements.

    Also, modern Intel controllers seem to be capable of handling such
    differences.

    So, we need to get rid of storing the DIMM information into a per-csrow
    data, storing it, instead at the right place.

    The first step is to move grain, mtype, dtype and edac_mode to the
    per-dimm struct.

    Reviewed-by: Aristeu Rozanski
    Reviewed-by: Borislav Petkov
    Acked-by: Chris Metcalf
    Cc: Doug Thompson
    Cc: Borislav Petkov
    Cc: Mark Gross
    Cc: Jason Uhlenkott
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Joe Perches
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: James Bottomley
    Cc: "Niklas Söderlund"
    Cc: Shaohui Xie
    Cc: Josh Boyer
    Cc: Mike Williams
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • The way a DIMM is currently represented implies that they're
    linked into a per-csrow struct. However, some drivers don't see
    csrows, as they're ridden behind some chip like the AMB's
    on FBDIMM's, for example.

    This forced drivers to fake^Wvirtualize a csrow struct, and to create
    a mess under csrow/channel original's concept.

    Move the DIMM labels into a per-DIMM struct, and add there
    the real location of the socket, in terms of csrow/channel.
    Latter patches will modify the location to properly represent the
    memory architecture.

    All other drivers will use a per-csrow type of location.
    Some of those drivers will require a latter conversion, as
    they also fake the csrows internally.

    TODO: While this patch doesn't change the existing behavior, on
    csrows-based memory controllers, a csrow/channel pair points to a memory
    rank. There's a known bug at the EDAC core that allows having different
    labels for the same DIMM, if it has more than one rank. A latter patch
    is need to merge the several ranks for a DIMM into the same dimm_info
    struct, in order to avoid having different labels for the same DIMM.

    The edac_mc_alloc() will now contain a per-dimm initialization loop that
    will be changed by latter patches in order to match other types of
    memory architectures.

    Reviewed-by: Aristeu Rozanski
    Reviewed-by: Borislav Petkov
    Cc: Doug Thompson
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: "Niklas Söderlund"
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

19 Mar, 2012

1 commit

  • The original scrub rate API definition states that if scrub rate
    accessors are not implemented, a negative value (-1) should be written
    to the sysfs file (/sys/devices/system/edac/mc/mc/sdram_scrub_rate,
    where N is the memory controller number on the system). This is
    counter-intuitive and awkward at the very least because, when setting
    the scrub rate, userspace has to write to sysfs and then read it back to
    check error status of the operation.

    As Tony notes, best it would be to not have the sdram_scrub_rate in
    sysfs if scrub rate support is not implemented. It is too late about
    that and a bunch of drivers on a bunch of arches would need to be
    changed and tested which is not a trivial task ATM.

    Instead, settle for the next best thing of returning -ENODEV when
    implementation is missing and -EINVAL when there was an error
    encountered while setting the scrub rate.

    Reported-by: Han Pingtian
    Cc: Tony Luck
    Link: http://lkml.kernel.org/r/20110916105856.GA13253@hpt.nay.redhat.com
    Signed-off-by: Borislav Petkov

    Borislav Petkov
     

15 Dec, 2011

1 commit


21 Apr, 2011

1 commit


31 Mar, 2011

1 commit


17 Mar, 2011

1 commit


07 Jan, 2011

1 commit

  • Make the ->{get|set}_sdram_scrub_rate return the actual scrub rate
    bandwidth it succeeded setting and remove superfluous arg pointer used
    for that. A negative value returned still means that an error occurred
    while setting the scrubrate. Document this for future reference.

    Signed-off-by: Borislav Petkov

    Borislav Petkov
     

27 Oct, 2010

1 commit

  • * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: (34 commits)
    i7core_edac: return -ENODEV when devices were already probed
    i7core_edac: properly terminate pci_dev_table
    i7core_edac: Avoid PCI refcount to reach zero on successive load/reload
    i7core_edac: Fix refcount error at PCI devices
    i7core_edac: it is safe to i7core_unregister_mci() when mci=NULL
    i7core_edac: Fix an oops at i7core probe
    i7core_edac: Remove unused member channels in i7core_pvt
    i7core_edac: Remove unused arg csrow from get_dimm_config
    i7core_edac: Reduce args of i7core_register_mci
    i7core_edac: Introduce i7core_unregister_mci
    i7core_edac: Use saved pointers
    i7core_edac: Check probe counter in i7core_remove
    i7core_edac: Call pci_dev_put() when alloc_i7core_dev() failed
    i7core_edac: Fix error path of i7core_register_mci
    i7core_edac: Fix order of lines in i7core_register_mci
    i7core_edac: Always do get/put for all devices
    i7core_edac: Introduce i7core_pci_ctl_create/release
    i7core_edac: Introduce free_i7core_dev
    i7core_edac: Introduce alloc_i7core_dev
    i7core_edac: Reduce args of i7core_get_onedevice
    ...

    Linus Torvalds
     

24 Oct, 2010

1 commit

  • This is a nasty bug. Since kobject count will be reduced by zero by
    edac_mc_del_mc(), and this triggers the kobj release method, the
    mci memory will be freed automatically. So, all we have left is ctl_name,
    as shown by enabling debug:

    [ 80.822186] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device() remove_link
    [ 80.832590] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device() remove_mci_instance
    [ 80.843776] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
    [ 80.855163] EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:3f:03.0
    [ 80.862936] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2089: (null): free structs
    [ 80.871134] EDAC DEBUG: in drivers/edac/edac_mc.c, line at 238: edac_mc_free()
    [ 80.878379] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 726: edac_mc_unregister_sysfs_main_kobj()
    [ 80.888043] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1232: drivers/edac/i7core_edac.c: i7core_put_devices()

    Also, kfree(mci) shouldn't happen at the kobj.release, as it happens
    when edac_remove_sysfs_mci_device() is called, but the logic is:
    edac_remove_sysfs_mci_device(mci);
    edac_printk(KERN_INFO, EDAC_MC,
    "Removed device %d for %s %s: DEV %s\n", mci->mc_idx,
    mci->mod_name, mci->ctl_name, edac_dev_name(mci));
    So, as the edac_printk() needs the mci struct, this generates an OOPS.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab