18 Sep, 2020

1 commit

  • Reading those sysfs entries gives:

    [root@localhost /]# cat /sys/devices/system/edac/mc/mc0/max_location
    memory 3 [root@localhost /]# cat /sys/devices/system/edac/mc/mc0/dimm0/dimm_location
    memory 0 [root@localhost /]#

    Add a newline after each printed value for better readability.

    [ bp: Make len a signed int and change the check to catch wraparound.
    Increment the pointer p only when the length check passes. Use
    scnprintf(). ]

    Signed-off-by: Xiongfeng Wang
    Signed-off-by: Borislav Petkov
    Link: https://lkml.kernel.org/r/1600051734-8993-1-git-send-email-wangxiongfeng2@huawei.com

    Xiongfeng Wang
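    A minimal userspace sketch of the length handling described in the bp note above. scnprintf_sketch() is a hypothetical stand-in for the kernel's scnprintf() built on vsnprintf(), and format_max_location() is a hypothetical formatter modelled on the max_location output shown above; neither is the kernel code.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the kernel's scnprintf(): it returns
 * the number of characters actually stored in buf (excluding the NUL), so
 * the result is always smaller than size.
 */
static int scnprintf_sketch(char *buf, int size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size <= 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, (size_t)size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;
	return n >= size ? size - 1 : n;
}

/*
 * Hypothetical max_location-style formatter: prints "<layer> <max index> "
 * per layer, then a newline. Returns the number of bytes produced.
 */
static int format_max_location(char *data, int bufsize,
			       const char *const names[], const int sizes[],
			       int n_layers)
{
	int len = bufsize;	/* signed: a miscount shows up as <= 0, not as a huge value */
	char *p = data;
	int i, n;

	for (i = 0; i < n_layers; i++) {
		n = scnprintf_sketch(p, len, "%s %d ", names[i], sizes[i] - 1);
		len -= n;
		if (len <= 0)
			goto out;

		p += n;		/* advance only after the length check passed */
	}

	p += scnprintf_sketch(p, len, "\n");
out:
	return (int)(p - data);
}
```

    With a 64-byte buffer, one layer named "memory" and a size of 4, the result is "memory 3 " followed by a newline (10 bytes). Because the stand-in never reports more than len - 1 characters, the signed len <= 0 check acts as a safety net rather than the normal exit path.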
     

17 Feb, 2020

2 commits

  • Looking at how mci->{ue,ce}_per_layer[EDAC_MAX_LAYERS] is used, it
    turns out that only the leaves in the memory hierarchy are consumed
    (in sysfs), but not the intermediate layers, e.g.:

    count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx];

    These unused counters only add complexity; remove them. The error
    counter values are directly stored in struct dimm_info now.

    Signed-off-by: Robert Richter
    Signed-off-by: Borislav Petkov
    Acked-by: Aristeu Rozanski
    Link: https://lkml.kernel.org/r/20200123090210.26933-11-rrichter@marvell.com

    Robert Richter
     
  • There are dimm and csrow devices linked to the mci device, esp. to show
    up in sysfs. It must be guaranteed that child devices are removed before
    their mci parent. Thus, the release functions must be called in the
    correct order and must not miss any child before releasing its parent. In
    the current implementation this is only guaranteed by the correct order of
    release functions.

    A much better approach is to use put_device() that releases the device
    only after all users are gone. It is the recommended way to release a
    device and free its memory. The function uses the device's refcount and
    only frees it if there are no users of it anymore such as children.

    So implement an mci_release() function to remove mci devices, use
    put_device() to free them and initialize the mci device early, right
    after its struct has been allocated.

    Change the release function so that it can be used universally,
    whether or not the device is registered. Since subsequent dimm and
    csrow sysfs links are implemented as child devices, their
    refcounts will keep the parent mci device from being removed as long
    as sysfs entries exist and until all users have been unregistered in
    edac_remove_sysfs_mci_device().

    Remove edac_unregister_sysfs() and merge mci sysfs removal into
    edac_remove_sysfs_mci_device(). There is only a single instance now that
    removes the sysfs entries. The function can now be used in the error
    paths for cleanup.

    Also, create device release functions for all involved devices
    (dev->release), remove device_type release functions (dev_type->
    release) and also use dev->init_name instead of dev_set_name().

    [ bp: Massage commit message and comments. ]

    Signed-off-by: Robert Richter
    Signed-off-by: Borislav Petkov
    Acked-by: Aristeu Rozanski
    Link: https://lkml.kernel.org/r/20200212120340.4764-5-rrichter@marvell.com

    Robert Richter
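    The put_device() reasoning in the second commit can be illustrated with a small userspace model. struct obj, obj_new(), obj_get() and obj_put() are hypothetical names that only loosely mimic how the driver core reference-counts struct device: a child pins its parent, so dropping the parent's last external reference does not free it while a child still exists.

```c
#include <stdlib.h>

/* Hypothetical refcounted object, loosely modelled on struct device. */
struct obj {
	int refcount;
	struct obj *parent;
	void (*release)(struct obj *o);
};

static int released;	/* number of release() calls, for illustration */

static void obj_get(struct obj *o)
{
	o->refcount++;
}

/* Drop one reference; release() runs only when the last one is gone. */
static void obj_put(struct obj *o)
{
	if (o && --o->refcount == 0) {
		struct obj *parent = o->parent;

		o->release(o);		/* frees o */
		obj_put(parent);	/* the child's reference on its parent */
	}
}

static void obj_release(struct obj *o)
{
	released++;
	free(o);
}

/* The caller owns the initial reference; a child also pins its parent. */
static struct obj *obj_new(struct obj *parent)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;
	o->refcount = 1;
	o->release = obj_release;
	o->parent = parent;
	if (parent)
		obj_get(parent);
	return o;
}
```

    Dropping the mci-like parent first leaves it alive; it is released only after the dimm-like child goes away, regardless of the order in which the callers drop their references.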
     

13 Feb, 2020

2 commits

  • All created csrow objects must be removed in the error path of
    edac_create_csrow_objects(). The objects have been added as devices.

    They need to be removed by doing a device_del() *and* put_device() call
    to also free their memory. The missing put_device() causes a memory
    leak. Use device_unregister() instead of device_del(); it properly
    unregisters the device by doing both.

    Fixes: 7adc05d2dc3a ("EDAC/sysfs: Drop device references properly")
    Signed-off-by: Robert Richter
    Signed-off-by: Borislav Petkov
    Tested-by: John Garry
    Cc:
    Link: https://lkml.kernel.org/r/20200212120340.4764-4-rrichter@marvell.com

    Robert Richter
     
  • A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and
    DEBUG_KMEMLEAK set, revealed several issues when removing an mci device:

    1) Use-after-free:

    On 27.11.19 17:07:33, John Garry wrote:
    > [ 22.104498] BUG: KASAN: use-after-free in
    > edac_remove_sysfs_mci_device+0x148/0x180

    The use-after-free is caused by the mci_for_each_dimm() macro called in
    edac_remove_sysfs_mci_device(). The iterator was introduced with

    c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator").

    The iterator loop calls device_unregister(&dimm->dev), which removes
    the sysfs entry of the device, but also frees the dimm struct in
    dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(),
    the dimm struct is accessed again, after having been freed already.

    The fix is to free all the mci device's subsequent dimm and csrow
    objects at a later point, in _edac_mc_free(), when the mci device itself
    is being freed.

    This keeps the data structures intact and the mci device can be
    fully used until its removal. The change allows the safe usage of
    mci_for_each_dimm() to release dimm devices from sysfs.

    2) Memory leaks:

    The following memory leaks have been detected:

    # grep edac /sys/kernel/debug/kmemleak | sort | uniq -c
    1 [] edac_mc_alloc+0x3bc/0x9d0 # mci->csrows
    16 [] edac_mc_alloc+0x49c/0x9d0 # csr->channels
    16 [] edac_mc_alloc+0x518/0x9d0 # csr->channels[chn]
    1 [] edac_mc_alloc+0x5c8/0x9d0 # mci->dimms
    34 [] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc()

    All leaks are from memory allocated by edac_mc_alloc().

    Note: The test above shows that edac_mc_alloc() was called here from
    ghes_edac_register(), thus both functions show up in the stack trace
    but the module causing the leaks is edac_mc. The comments with the data
    structures involved were made manually by analyzing the objdump.

    The data structures listed above and created by edac_mc_alloc() are
    not properly removed during device removal, which is done in
    edac_mc_free().

    There are two paths implemented to remove the device, depending on
    device registration: _edac_mc_free() is called if the device is not
    registered and edac_unregister_sysfs() otherwise.

    The implementations differ. For the sysfs case, the mci device removal
    lacks the removal of subsequent data structures (csrows, channels,
    dimms). This causes the memory leaks (see mci_attr_release()).

    [ bp: Massage commit message. ]

    Fixes: c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator")
    Fixes: faa2ad09c01c ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.")
    Fixes: 7a623c039075 ("edac: rewrite the sysfs code to use struct device")
    Reported-by: John Garry
    Signed-off-by: Robert Richter
    Signed-off-by: Borislav Petkov
    Tested-by: John Garry
    Cc:
    Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com

    Robert Richter
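    The use-after-free in the second commit comes from an iterator that finds the next element through the current one. The sketch below is a hypothetical userspace model: for_each_dimm() is modelled on mci_for_each_dimm() under the assumption that it steps to the next DIMM via dimm->idx, remove_sysfs() only marks entries as unregistered, and all memory is freed afterwards in mc_free(), mirroring the move of the frees into _edac_mc_free().

```c
#include <stdbool.h>
#include <stdlib.h>

struct dimm {
	unsigned int idx;	/* position in mc->dimms[] */
	bool registered;
};

struct mc {
	unsigned int tot_dimms;	/* must be >= 1 for the iterator below */
	struct dimm **dimms;
};

/* The next element is reached through dimm->idx, so dimm must still be valid. */
#define for_each_dimm(mc, dimm)					\
	for ((dimm) = (mc)->dimms[0];				\
	     (dimm);						\
	     (dimm) = (dimm)->idx + 1 < (mc)->tot_dimms		\
		      ? (mc)->dimms[(dimm)->idx + 1] : NULL)

/* Teardown of visibility only: freeing dimm here would break the loop step. */
static void remove_sysfs(struct mc *mc)
{
	struct dimm *dimm;

	for_each_dimm(mc, dimm)
		dimm->registered = false;
}

/* All memory is released in one place, once nothing iterates any more. */
static void mc_free(struct mc *mc)
{
	unsigned int i;

	for (i = 0; i < mc->tot_dimms; i++)
		free(mc->dimms[i]);	/* free(NULL) is fine after a partial alloc */
	free(mc->dimms);
	free(mc);
}

static struct mc *mc_alloc(unsigned int n)
{
	struct mc *mc;
	unsigned int i;

	if (n == 0)
		return NULL;
	mc = calloc(1, sizeof(*mc));
	if (!mc)
		return NULL;
	mc->dimms = calloc(n, sizeof(*mc->dimms));
	if (!mc->dimms) {
		free(mc);
		return NULL;
	}
	mc->tot_dimms = n;
	for (i = 0; i < n; i++) {
		mc->dimms[i] = calloc(1, sizeof(**mc->dimms));
		if (!mc->dimms[i]) {
			mc_free(mc);
			return NULL;
		}
		mc->dimms[i]->idx = i;
		mc->dimms[i]->registered = true;
	}
	return mc;
}
```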
     

10 Nov, 2019

1 commit

  • Introduce an mci_for_each_dimm() iterator. It returns a pointer to
    a struct dimm_info. This makes the declaration and use of an index
    obsolete and avoids access to internal data of struct mci (direct array
    access etc).

    [ bp: push the struct dimm_info *dimm; declaration into the
    CONFIG_EDAC_DEBUG block. ]

    Signed-off-by: Robert Richter
    Signed-off-by: Borislav Petkov
    Reviewed-by: Mauro Carvalho Chehab
    Cc: "linux-edac@vger.kernel.org"
    Cc: James Morse
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20191106093239.25517-4-rrichter@marvell.com

    Robert Richter
     

09 Nov, 2019

1 commit

  • The EDAC_DIMM_OFF() macro takes 5 arguments to get the DIMM's index.
    Simplify this by storing the index in struct dimm_info to avoid its
    calculation and remove the EDAC_DIMM_OFF() macro. The index can be
    directly used then.

    Another advantage is that edac_mc_alloc() could be used even if the
    exact size of the layers is unknown. Only the number of DIMMs would be
    needed.

    While at it, rename the iterator variable to idx. The name is handier,
    esp. when searching for it in the code.

    Signed-off-by: Robert Richter
    Signed-off-by: Borislav Petkov
    Reviewed-by: Mauro Carvalho Chehab
    Cc: "linux-edac@vger.kernel.org"
    Cc: James Morse
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20191106093239.25517-3-rrichter@marvell.com

    Robert Richter
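    For illustration, the flat index that EDAC_DIMM_OFF() computed from its five arguments, and which is now stored in struct dimm_info, can be written as a row-major calculation over the layers. dimm_off(), struct dimm_info_sketch and assign_indices() are hypothetical; the sketch assumes row-major ordering with the last layer varying fastest and at most three layers (EDAC_MAX_LAYERS).

```c
#include <stddef.h>

/*
 * Hypothetical sketch of what EDAC_DIMM_OFF() computed: a flat DIMM index
 * from per-layer positions, assuming row-major order (last layer fastest).
 * Returns -1 for invalid input.
 */
static int dimm_off(const unsigned int layer_size[], const unsigned int pos[],
		    size_t n_layers)
{
	unsigned int off = 0;
	size_t i;

	if (n_layers == 0 || n_layers > 3)	/* EDAC uses at most 3 layers */
		return -1;

	for (i = 0; i < n_layers; i++) {
		if (pos[i] >= layer_size[i])
			return -1;
		off = off * layer_size[i] + pos[i];
	}
	return (int)off;
}

/* After the change: the index is assigned once and stored with each DIMM. */
struct dimm_info_sketch {
	unsigned int idx;
};

static void assign_indices(struct dimm_info_sketch *dimms, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dimms[i].idx = (unsigned int)i;	/* needs only the DIMM count */
}
```

    Storing idx once, as assign_indices() does, only needs the number of DIMMs, which is the advantage the commit message points out for edac_mc_alloc().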
     

04 Sep, 2019

3 commits

  • Debug messages are inconsistently used in the error handlers. Some lack
    an error message, some are called regardless of the return status,
    messages for the same error are at different locations in the code
    depending on the error code. This happens esp. near put_device() calls.

    Make those debug messages more consistent. Additionally, unify the error
    messages to have the same terms for the same operations of the device.

    Signed-off-by: Robert Richter
    Signed-off-by: Borislav Petkov
    Reviewed-by: Mauro Carvalho Chehab
    Cc: "linux-edac@vger.kernel.org"
    Cc: James Morse
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20190902123216.9809-5-rrichter@marvell.com

    Robert Richter
     
  • Use direct returns instead of gotos. Error handling code becomes
    smaller and more readable.

    Signed-off-by: Robert Richter
    Signed-off-by: Borislav Petkov
    Reviewed-by: Mauro Carvalho Chehab
    Cc: "linux-edac@vger.kernel.org"
    Cc: James Morse
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20190902123216.9809-4-rrichter@marvell.com

    Robert Richter
     
  • Prefer 'unsigned int' to bare use of 'unsigned', as reported by
    checkpatch.pl. Fix this for edac_mc*, ghes and the i5100 driver.

    While at it, the struct member dev_ch_attribute->channel is always used
    as unsigned int. Change its type to unsigned int to avoid type casts.

    [ bp: Massage. ]

    Signed-off-by: Robert Richter
    Signed-off-by: Borislav Petkov
    Reviewed-by: Mauro Carvalho Chehab
    Cc: "linux-edac@vger.kernel.org"
    Cc: James Morse
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20190902123216.9809-2-rrichter@marvell.com

    Robert Richter
     

28 Jun, 2019

1 commit

  • Commit 9da21b1509d8 ("EDAC: Poll timeout cannot be zero, p2") assumes
    edac_mc_poll_msec to be unsigned long, but the variable's type remained
    int. Setting edac_mc_poll_msec can therefore trigger an out-of-bounds
    write.

    Reproducer:

    # echo 1001 > /sys/module/edac_core/parameters/edac_mc_poll_msec

    KASAN report:

    BUG: KASAN: global-out-of-bounds in edac_set_poll_msec+0x140/0x150
    Write of size 8 at addr ffffffffb91b2d00 by task bash/1996

    CPU: 1 PID: 1996 Comm: bash Not tainted 5.2.0-rc6+ #23
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
    Call Trace:
    dump_stack+0xca/0x13e
    print_address_description.cold+0x5/0x246
    __kasan_report.cold+0x75/0x9a
    ? edac_set_poll_msec+0x140/0x150
    kasan_report+0xe/0x20
    edac_set_poll_msec+0x140/0x150
    ? dimmdev_location_show+0x30/0x30
    ? vfs_lock_file+0xe0/0xe0
    ? _raw_spin_lock+0x87/0xe0
    param_attr_store+0x1b5/0x310
    ? param_array_set+0x4f0/0x4f0
    module_attr_store+0x58/0x80
    ? module_attr_show+0x80/0x80
    sysfs_kf_write+0x13d/0x1a0
    kernfs_fop_write+0x2bc/0x460
    ? sysfs_kf_bin_read+0x270/0x270
    ? kernfs_notify+0x1f0/0x1f0
    __vfs_write+0x81/0x100
    vfs_write+0x1e1/0x560
    ksys_write+0x126/0x250
    ? __ia32_sys_read+0xb0/0xb0
    ? do_syscall_64+0x1f/0x390
    do_syscall_64+0xc1/0x390
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7fa7caa5e970
    Code: 73 01 c3 48 8b 0d 28 d5 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 99 2d 2c 00 00 75 10 b8 01 00 00 00 04
    RSP: 002b:00007fff6acfdfe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa7caa5e970
    RDX: 0000000000000005 RSI: 0000000000e95c08 RDI: 0000000000000001
    RBP: 0000000000e95c08 R08: 00007fa7cad1e760 R09: 00007fa7cb36a700
    R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000005
    R13: 0000000000000001 R14: 00007fa7cad1d600 R15: 0000000000000005

    The buggy address belongs to the variable:
    edac_mc_poll_msec+0x0/0x40

    Memory state around the buggy address:
    ffffffffb91b2c00: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
    ffffffffb91b2c80: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
    >ffffffffb91b2d00: 04 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa
    ^
    ffffffffb91b2d80: 04 fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
    ffffffffb91b2e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    Fix it by changing the type of edac_mc_poll_msec to unsigned int.
    This patch uses unsigned int rather than unsigned long because
    msecs_to_jiffies() takes an unsigned int argument. That avoids integer
    conversion bugs, and unsigned int is large enough for
    edac_mc_poll_msec.

    Reviewed-by: James Morse
    Fixes: 9da21b1509d8 ("EDAC: Poll timeout cannot be zero, p2")
    Signed-off-by: Eiichi Tsukata
    Signed-off-by: Tony Luck

    Eiichi Tsukata
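    A userspace sketch of a setter whose parsed value, destination pointer and storage all use the same type, so the final write can never be wider than the variable behind it. set_poll_msec_sketch() and poll_msec_sketch are hypothetical; kernel code would typically parse with kstrtouint(), and the 1000 ms lower bound is an assumption made here for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for the module parameter storage. */
static unsigned int poll_msec_sketch = 1000;

static int set_poll_msec_sketch(const char *val, unsigned int *dst)
{
	unsigned long parsed;
	char *end;

	if (!val || *val == '\0' || *val == '-')
		return -EINVAL;

	errno = 0;
	parsed = strtoul(val, &end, 0);
	if (end == val || errno)
		return -EINVAL;
	if (*end == '\n')		/* echo appends a newline */
		end++;
	if (*end != '\0' || parsed > UINT_MAX)
		return -EINVAL;
	if (parsed < 1000)		/* assumed minimum poll period */
		return -EINVAL;

	*dst = (unsigned int)parsed;	/* writes exactly sizeof(unsigned int) bytes */
	return 0;
}
```

    With the reproducer's input, "1001\n" is accepted and stored as 1001; out-of-range or malformed input leaves the stored value untouched.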
     

21 Jun, 2019

2 commits

  • Do put_device() if device_add() fails.

    [ bp: do device_del() for the successfully created devices in
    edac_create_csrow_objects(), on the unwind path. ]

    Signed-off-by: Greg KH
    Signed-off-by: Borislav Petkov
    Link: https://lkml.kernel.org/r/20190427214925.GE16338@kroah.com

    Greg KH
     
  • In edac_create_csrow_object(), the reference to the object is not
    released when adding the device to the device hierarchy fails
    (device_add()). This may result in a memory leak.

    Signed-off-by: Pan Bian
    Signed-off-by: Borislav Petkov
    Reviewed-by: Greg Kroah-Hartman
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/1555554438-103953-1-git-send-email-bianpan2016@163.com

    Pan Bian
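    The error-path rule from these two commits, a put for the object whose add failed plus teardown of the ones already added, can be modelled in userspace. Every name below is hypothetical: csrow_add() stands in for device_add(), csrow_del() for device_del() and csrow_put() for the final put_device(). The unwind loop does both del and put, which is what the later device_unregister() fix (13 Feb, 2020) arranges.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object: "added" mimics device_add(), free() the last put. */
struct csrow_sketch {
	bool added;
};

static int live_objects;	/* allocated and not yet freed */

static struct csrow_sketch *csrow_init(void)
{
	struct csrow_sketch *c = calloc(1, sizeof(*c));

	if (c)
		live_objects++;
	return c;
}

static void csrow_put(struct csrow_sketch *c)	/* like the final put_device() */
{
	free(c);
	live_objects--;
}

static void csrow_del(struct csrow_sketch *c)	/* like device_del() */
{
	c->added = false;
}

/* Hypothetical add that fails for the index fail_at (-1: never fails). */
static int csrow_add(struct csrow_sketch *c, int i, int fail_at)
{
	if (i == fail_at)
		return -1;
	c->added = true;
	return 0;
}

static int create_csrows(struct csrow_sketch **rows, int n, int fail_at)
{
	int i, err = 0;

	for (i = 0; i < n; i++) {
		rows[i] = csrow_init();
		if (!rows[i]) {
			err = -1;
			goto unwind;
		}
		err = csrow_add(rows[i], i, fail_at);
		if (err) {
			csrow_put(rows[i]);	/* a failed add still needs its put */
			rows[i] = NULL;
			goto unwind;
		}
	}
	return 0;

unwind:
	while (--i >= 0) {		/* only the rows that were added */
		csrow_del(rows[i]);
		csrow_put(rows[i]);
		rows[i] = NULL;
	}
	return err;
}
```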
     

14 Nov, 2018

2 commits

  • ... and use the single edac_subsys object returned from
    subsys_system_register(). The idea is to have a single bus
    and multiple devices on it.

    Signed-off-by: Borislav Petkov
    Acked-by: Mauro Carvalho Chehab
    CC: Aristeu Rozanski Filho
    CC: Greg KH
    CC: Justin Ernst
    CC: linux-edac
    CC: Mauro Carvalho Chehab
    CC: Russ Anderson
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20180926152752.GG5584@zn.tnic

    Borislav Petkov
     
  • Nobody(*) uses them. Dropping this will allow us to make the total
    number of memory controllers configurable (as we won't have to worry
    about duplicated device names under this directory).

    (*) https://lkml.kernel.org/r/20180927221054.580220e5@coco.lan

    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Acked-by: Mauro Carvalho Chehab
    CC: Aristeu Rozanski Filho
    CC: Greg KH
    CC: Justin Ernst
    CC: Mauro Carvalho Chehab
    CC: Russ Anderson
    CC: linux-edac
    Link: http://lkml.kernel.org/r/20181001224313.GA9487@agluck-desk

    Tony Luck
     

17 Jun, 2018

1 commit

  • Make sure to use put_device() to free the initialised struct device so
    that resources managed by the driver core also get released in the event of
    a registration failure.

    Signed-off-by: Johan Hovold
    Cc: Denis Kirjanov
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Fixes: 2d56b109e3a5 ("EDAC: Handle error path in edac_mc_sysfs_init() properly")
    Link: http://lkml.kernel.org/r/20180612124335.6420-1-johan@kernel.org
    Signed-off-by: Borislav Petkov

    Johan Hovold
     

14 Mar, 2018

1 commit

  • Somehow we ended up with two separate arrays of strings to describe the
    "enum mem_type" values.

    In edac_mc.c we have an exported list edac_mem_types[] that is used
    by a couple of drivers in debug messages.

    In edac_mc_sysfs.c we have a private list that is used to display
    values in:
    /sys/devices/system/edac/mc/mc*/dimm*/dimm_mem_type
    /sys/devices/system/edac/mc/mc*/csrow*/mem_type

    This list was missing a value for MEM_LRDDR3.

    The string values in the two lists were different :-(

    When combining the lists, I kept the sysfs strings so that the sysfs
    output will be unchanged, as some scripts may depend on it.

    Reported-by: Borislav Petkov
    Acked-by: Borislav Petkov
    Signed-off-by: Tony Luck
    Cc: "Rafael J. Wysocki"
    Cc: Aristeu Rozanski
    Cc: Dan Williams
    Cc: Jean Delvare
    Cc: Len Brown
    Cc: Mauro Carvalho Chehab
    Cc: Qiuxu Zhuo
    Cc: linux-acpi@vger.kernel.org
    Cc: linux-edac
    Cc: linux-nvdimm@lists.01.org
    Link: http://lkml.kernel.org/r/20180312182430.10335-2-tony.luck@intel.com
    Signed-off-by: Borislav Petkov

    Tony Luck
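    One way to keep a single table of names for an enum is a designated-initializer array indexed by the enum value, so every user looks up the same strings. This is a hypothetical, trimmed-down sketch: the enum and the strings are placeholders, not the contents of edac_mem_types[].

```c
#include <stddef.h>

/* Hypothetical, trimmed-down enum; placeholder names and values. */
enum mem_type_sketch {
	MEMS_EMPTY,
	MEMS_UNKNOWN,
	MEMS_DDR3,
	MEMS_LRDDR3,
	MEMS_NR,	/* number of entries, keep last */
};

/* One table for every user (debug messages and sysfs), indexed by enum value. */
static const char *const mem_types_sketch[] = {
	[MEMS_EMPTY]	= "Empty",
	[MEMS_UNKNOWN]	= "Unknown",
	[MEMS_DDR3]	= "DDR3",
	[MEMS_LRDDR3]	= "LRDDR3",
};

/* Fails to build if the table is shorter than the enum. */
_Static_assert(sizeof(mem_types_sketch) / sizeof(mem_types_sketch[0]) == MEMS_NR,
	       "every mem_type needs a string");

static const char *mem_type_name(enum mem_type_sketch t)
{
	if ((size_t)t >= MEMS_NR || !mem_types_sketch[t])
		return "Unknown";
	return mem_types_sketch[t];
}
```

    A value such as MEMS_LRDDR3 that was missing from one of the old lists cannot silently fall off the end here: either the build fails or the lookup returns "Unknown".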
     

31 Oct, 2017

1 commit

  • Several function prototypes for the set/get functions defined by
    module_param_call() have slightly wrong argument types. This fixes
    those in an effort to clean up the calls when running under type-enforced
    compiler instrumentation for CFI. This is the result of running the
    following semantic patch:

    @match_module_param_call_function@
    declarer name module_param_call;
    identifier _name, _set_func, _get_func;
    expression _arg, _mode;
    @@

    module_param_call(_name, _set_func, _get_func, _arg, _mode);

    @fix_set_prototype
    depends on match_module_param_call_function@
    identifier match_module_param_call_function._set_func;
    identifier _val, _param;
    type _val_type, _param_type;
    @@

    int _set_func(
    -_val_type _val
    +const char * _val
    ,
    -_param_type _param
    +const struct kernel_param * _param
    ) { ... }

    @fix_get_prototype
    depends on match_module_param_call_function@
    identifier match_module_param_call_function._get_func;
    identifier _val, _param;
    type _val_type, _param_type;
    @@

    int _get_func(
    -_val_type _val
    +char * _val
    ,
    -_param_type _param
    +const struct kernel_param * _param
    ) { ... }

    Two additional by-hand changes are included for places where the above
    Coccinelle script didn't notice them:

    drivers/platform/x86/thinkpad_acpi.c
    fs/lockd/svc.c

    Signed-off-by: Kees Cook
    Signed-off-by: Jessica Yu

    Kees Cook
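    What matching prototypes means here can be shown with a userspace model of a parameter ops table. struct kernel_param_sketch, struct param_ops_sketch, int_set() and int_get() are hypothetical; the point is that each callback has exactly the type of the function pointer it is stored in, so no casts are needed and an indirect-call type check, as performed by CFI, would accept the call.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct kernel_param and its set/get ops. */
struct kernel_param_sketch {
	const char *name;
	void *arg;
};

struct param_ops_sketch {
	int (*set)(const char *val, const struct kernel_param_sketch *kp);
	int (*get)(char *buffer, const struct kernel_param_sketch *kp);
};

/*
 * Both callbacks have exactly the type of the pointer they are stored in,
 * so the table needs no casts and an indirect-call type check would pass.
 */
static int int_set(const char *val, const struct kernel_param_sketch *kp)
{
	*(int *)kp->arg = (int)strtol(val, NULL, 10);
	return 0;
}

static int int_get(char *buffer, const struct kernel_param_sketch *kp)
{
	return sprintf(buffer, "%d\n", *(int *)kp->arg);
}

static const struct param_ops_sketch int_ops = {
	.set = int_set,
	.get = int_get,
};
```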
     

20 Aug, 2017

1 commit

  • Make these const as they are only stored in the type field of a device
    structure, which is const.

    Done using Coccinelle.

    Signed-off-by: Bhumika Goyal
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1503130946-2854-2-git-send-email-bhumirks@gmail.com
    Signed-off-by: Borislav Petkov

    Bhumika Goyal
     

17 Jul, 2017

1 commit

  • attribute_groups are not supposed to change at runtime. All functions
    working with attribute_groups work with const attribute_group. So mark
    the non-const structs as const.

    Signed-off-by: Arvind Yadav
    CC: linux-edac@vger.kernel.org
    Link: http://lkml.kernel.org/r/776cb8265509054abd01b0b551624cc0da3b88e7.1499078335.git.arvind.yadav.cs@gmail.com
    Signed-off-by: Borislav Petkov

    Arvind Yadav
     

19 Jan, 2017

1 commit

  • The old csrowX sysfs directories have per-csrow error counters, but the
    new dimmX directories do not currently expose error counts.

    EDAC already keeps these counts; add them to sysfs so that per-DIMM
    counts are still available when CONFIG_EDAC_LEGACY_SYSFS=n.

    Signed-off-by: Aaron Miller
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/20161103220153.3997328-1-aaronmiller@fb.com
    Signed-off-by: Borislav Petkov

    Aaron Miller
     

06 Jan, 2017

1 commit

  • The dev_attr_sdram_scrub_rate is not declared in a header or used
    anywhere else, so make it static to fix the following warning:

    drivers/edac/edac_mc_sysfs.c:816:1: warning: symbol
    'dev_attr_sdram_scrub_rate' was not declared. Should it be static?

    Signed-off-by: Ben Dooks
    Reviewed-by: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1465407356-7357-1-git-send-email-ben.dooks@codethink.co.uk
    Signed-off-by: Borislav Petkov

    Ben Dooks
     

15 Dec, 2016

1 commit


16 Jun, 2016

1 commit

  • c44696fff04f ("EDAC: Remove arbitrary limit on number of channels")
    lifted the arbitrary limit on memory controller channels in EDAC.
    However, the dynamic channel attributes dynamic_csrow_dimm_attr and
    dynamic_csrow_ce_count_attr remained at 6.

    This wasn't a problem, except that channels 6 and 7 weren't visible in
    sysfs on machines with more than 6 channels after the conversion to
    static attr groups with

    2c1946b6d629 ("EDAC: Use static attribute groups for managing sysfs entries")

    [ without that, we're exploding in edac_create_sysfs_mci_device()
    because we're dereferencing out of the bounds of the
    dynamic_csrow_dimm_attr array. ]

    Add attributes for channels 6 and 7 along with a guard for the
    future, should more channels be required and/or to sanity check for
    misconfigured machines.

    We still need to check against the number of channels present on the MC
    first, as Thor reported.

    Signed-off-by: Borislav Petkov
    Reported-by: Hironobu Ishii
    Tested-by: Thor Thayer
    Cc: # 4.2

    Borislav Petkov
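    A sketch of the guard described above, in userspace C. ch_dimm_label_attrs[], check_channels(), ch_attr_visible() and ch_attr_name() are hypothetical; they show a fixed-size attribute array that is only indexed after checking both the channel count of the MC and the size of the array.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical static per-channel attributes, now covering 8 channels. */
static const char *const ch_dimm_label_attrs[] = {
	"ch0_dimm_label", "ch1_dimm_label", "ch2_dimm_label", "ch3_dimm_label",
	"ch4_dimm_label", "ch5_dimm_label", "ch6_dimm_label", "ch7_dimm_label",
};

/* Refuse an MC reporting more channels than there are static attributes. */
static int check_channels(size_t nr_channels)
{
	return nr_channels > ARRAY_SIZE(ch_dimm_label_attrs) ? -1 : 0;
}

/* Show slot idx only if the MC has that channel and the array has the slot. */
static int ch_attr_visible(size_t idx, size_t nr_channels)
{
	return idx < nr_channels && idx < ARRAY_SIZE(ch_dimm_label_attrs);
}

static const char *ch_attr_name(size_t idx, size_t nr_channels)
{
	return ch_attr_visible(idx, nr_channels) ? ch_dimm_label_attrs[idx] : NULL;
}
```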
     

23 Apr, 2016

1 commit

  • Code flow looks like this:

    device_unregister(&mci->dev);
    -> kobject_put+0x25/0x50
    -> kobject_cleanup+0x77/0x190
    -> device_release+0x32/0xa0
    -> mci_attr_release+0x36/0x70
    -> kfree(mci);
    bus_unregister(mci->bus);

    The fix is to grab a local copy of "mci->bus" and use that when we call
    bus_unregister().

    Signed-off-by: Tony Luck
    Acked-by: Aristeu Rozanski
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/21d595b0ab3d718d9cb206647f4ec91c05e62ec4.1461261078.git.tony.luck@intel.com
    Signed-off-by: Borislav Petkov

    Tony Luck
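    The fix amounts to reading the field before the object holding it can be freed. A minimal userspace sketch; the struct and function names are hypothetical, and mci_release_sketch() stands in for the device_unregister() -> mci_attr_release() -> kfree(mci) chain shown above.

```c
#include <stdlib.h>

/* Hypothetical, stripped-down stand-ins for the bus and the mci. */
struct bus_sketch {
	int registered;
};

struct mci_sketch {
	struct bus_sketch *bus;
};

/* Stands in for device_unregister() -> ... -> mci_attr_release() -> kfree(mci). */
static void mci_release_sketch(struct mci_sketch *mci)
{
	free(mci);
}

static void bus_unregister_sketch(struct bus_sketch *bus)
{
	bus->registered = 0;
}

static void unregister_sysfs_sketch(struct mci_sketch *mci)
{
	struct bus_sketch *bus = mci->bus;	/* copy while mci is still valid */

	mci_release_sketch(mci);	/* mci is gone after this call */
	bus_unregister_sketch(bus);	/* using mci->bus here would be a use-after-free */
}
```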
     

11 Dec, 2015

3 commits

  • It cannot fail now. We either load EDAC core after having successfully
    initialized edac_subsys or we don't.

    Signed-off-by: Borislav Petkov

    Borislav Petkov
     
  • This was really dumb: reference counting for the main EDAC sysfs
    object, when we could've simply registered it as the first thing in the
    module init path and then handed it around to whatever needs it.

    Do that and rip out all the code around it, thus simplifying the whole
    handling significantly.

    Move the edac_subsys node back to edac_module.c.

    Signed-off-by: Borislav Petkov

    Borislav Petkov
     
  • I get the splat below when modprobing/rmmoding EDAC drivers. It happens
    because bus->name is invalid after bus_unregister() has run. The Code: section
    below corresponds to:

    .loc 1 1108 0
    movq 672(%rbx), %rax # mci_1(D)->bus, mci_1(D)->bus
    .loc 1 1109 0
    popq %rbx #

    .loc 1 1108 0
    movq (%rax), %rdi # _7->name,
    jmp kfree #

    and %rax has some funky stuff 2030203020312030 which looks a lot like
    something walked over it.

    Fix that by saving the name ptr before doing stuff to the string it points to.

    general protection fault: 0000 [#1] SMP
    Modules linked in: ...
    CPU: 4 PID: 10318 Comm: modprobe Tainted: G I EN 3.12.51-11-default+ #48
    Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011
    task: ffff880311320280 ti: ffff88030da3e000 task.ti: ffff88030da3e000
    RIP: 0010:[] [] edac_unregister_sysfs+0x22/0x30 [edac_core]
    RSP: 0018:ffff88030da3fe28 EFLAGS: 00010292
    RAX: 2030203020312030 RBX: ffff880311b4e000 RCX: 000000000000095c
    RDX: 0000000000000001 RSI: ffff880327bb9600 RDI: 0000000000000286
    RBP: ffff880311b4e750 R08: 0000000000000000 R09: ffffffff81296110
    R10: 0000000000000400 R11: 0000000000000000 R12: ffff88030ba1ac68
    R13: 0000000000000001 R14: 00000000011b02f0 R15: 0000000000000000
    FS: 00007fc9bf8f5700(0000) GS:ffff8801a7c40000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000403c90 CR3: 000000019ebdf000 CR4: 00000000000007e0
    Stack:
    Call Trace:
    i7core_unregister_mci.isra.9
    i7core_remove
    pci_device_remove
    __device_release_driver
    driver_detach
    bus_remove_driver
    pci_unregister_driver
    i7core_exit
    SyS_delete_module
    system_call_fastpath
    0x7fc9bf426536
    Code: 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb e8 52 2a 1f e1 48 8b bb a0 02 00 00 e8 46 59 1f e1 48 8b 83 a0 02 00 00 5b 8b 38 e9 26 9a fe e0 66 0f 1f 44 00 00 66 66 66 66 90 48 8b
    RIP [] edac_unregister_sysfs+0x22/0x30 [edac_core]
    RSP

    Signed-off-by: Borislav Petkov
    Cc: Mauro Carvalho Chehab
    Cc: # v3.6..
    Fixes: 7a623c039075 ("edac: rewrite the sysfs code to use struct device")

    Borislav Petkov
     

15 Oct, 2015

1 commit

  • debugfs_remove() is used to remove a file or a directory from the
    debugfs filesystem, but mci->debugfs might not be empty.

    This can be triggered by the following sequence:

    1) Enable CONFIG_EDAC_DEBUG
    2) insmod an EDAC module (like i3000_edac or similar)
    3) rmmod this module
    4) files remain under /edac/, for example "fake_inject".

    Removing edac_core then causes a NULL pointer dereference.

    Reported-by: Yun Wu (Abel)
    Signed-off-by: Tan Xiaojun
    Cc: Doug Thompson
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1444787364-104353-1-git-send-email-tanxiaojun@huawei.com
    Signed-off-by: Borislav Petkov

    Tan Xiaojun
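    The underlying issue is that a non-recursive remove leaves a non-empty directory behind. Kernel code has debugfs_remove_recursive() for this; the sketch below only models the idea in userspace with a hypothetical first-child/next-sibling tree, where remove_one() refuses a node that still has children and remove_recursive() deletes the children first.

```c
#include <stdlib.h>

/* Hypothetical directory tree node: first-child / next-sibling links. */
struct node {
	struct node *child;
	struct node *sibling;
};

static int freed_nodes;

/* Non-recursive remove: refuses a directory that still has entries. */
static int remove_one(struct node *n)
{
	if (n->child)
		return -1;	/* n and its children are left behind */
	free(n);
	freed_nodes++;
	return 0;
}

/* Remove all children first, then the node itself. */
static void remove_recursive(struct node *n)
{
	while (n->child) {
		struct node *c = n->child;

		n->child = c->sibling;
		c->sibling = NULL;
		remove_recursive(c);
	}
	(void)remove_one(n);	/* cannot fail: n has no children left */
}
```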
     

28 Sep, 2015

1 commit

  • Updating dimm_label to an empty string does not make much sense. Change
    the sysfs dimm_label store operation to fail a request when an input
    string is empty.

    Suggested-by: Borislav Petkov
    Signed-off-by: Toshi Kani
    Cc: elliott@hpe.com
    Cc: Mauro Carvalho Chehab
    Cc: Tony Luck
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1443124767.25474.172.camel@hpe.com
    Signed-off-by: Borislav Petkov

    Toshi Kani
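    A userspace sketch of a label store handler that combines the empty-string check above with the newline and size handling described under 26 Sep, 2015. struct dimm_label_sketch and label_store_sketch() are hypothetical, and LABEL_LEN is assumed to mirror EDAC_MC_LABEL_LEN (31).

```c
#include <errno.h>
#include <string.h>

#define LABEL_LEN 31	/* assumed to mirror EDAC_MC_LABEL_LEN */

struct dimm_label_sketch {
	char label[LABEL_LEN + 1];
};

/*
 * Store count bytes from data as the label. A trailing newline (as written
 * by echo) or NUL is dropped; empty or oversized input is rejected and the
 * old label is left untouched. Returns count on success.
 */
static long label_store_sketch(struct dimm_label_sketch *dimm,
			       const char *data, size_t count)
{
	size_t copy_count = count;

	if (count == 0)
		return -EINVAL;

	if (data[count - 1] == '\0' || data[count - 1] == '\n')
		copy_count -= 1;

	if (copy_count == 0 || copy_count >= sizeof(dimm->label))
		return -EINVAL;

	memcpy(dimm->label, data, copy_count);
	dimm->label[copy_count] = '\0';

	return (long)count;
}
```

    Writing just a newline is rejected as empty, the 30-character label from the examples below fits, and the over-long "_TEST" variant is refused as a whole instead of being stored in pieces.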
     

26 Sep, 2015

2 commits

  • Sysfs "dimm_label" and "chX_dimm_label" nodes have the following issues
    in their store operation:

    1) A newline-terminated input string causes redundant newlines:

    # echo "test" > /sys/bus/mc0/devices/dimm0/dimm_label
    # cat /sys/bus/mc0/devices/dimm0/dimm_label
    test

    # od -bc /sys/bus/mc0/devices/dimm0/dimm_label
    0000000 164 145 163 164 012 012
    t e s t \n \n
    0000006

    2) The original label string (31 characters) cannot be stored due to
    an improper size check:

    # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0" > /sys/bus/mc0/devices/dimm0/dimm_label
    # cat /sys/bus/mc0/devices/dimm0/dimm_label

    # od -bc /sys/bus/mc0/devices/dimm0/dimm_label
    0000000 012 012
    \n \n
    0000002

    3) An input string longer than the buffer size results in wrong label
    info, as it allows a retry with the remaining string:

    # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0_TEST" > /sys/bus/mc0/devices/dimm0/dimm_label
    # cat /sys/bus/mc0/devices/dimm0/dimm_label
    _TEST

    Fix these issues by making the following changes:
    1) Replace a trailing newline character with a NUL. This also
    ensures that the string is NUL-terminated in the label buffer.
    2) Check the label buffer size with 'sizeof(dimm->label)'.
    3) Fail a request if its string exceeds the label buffer size.

    Signed-off-by: Toshi Kani
    Acked-by: Tony Luck
    Cc: linux-edac
    Cc: Mauro Carvalho Chehab
    Cc: Robert Elliott
    Link: http://lkml.kernel.org/r/1443121564.25474.160.camel@hpe.com
    Signed-off-by: Borislav Petkov

    Toshi Kani
     
  • After

    7d375bffa524 ("sb_edac: Fix support for systems with two home agents per socket")

    sysfs "dimm_label" and "chX_dimm_label" show their label string without a
    newline "\n" at the end.

    [root@orange ~]# cat /sys/bus/mc0/devices/dimm0/dimm_label
    CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#

    [root@orange ~]# cat /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label
    CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#

    The label strings now have 31 characters, which are the same as
    EDAC_MC_LABEL_LEN. Since the snprintf()s in channel_dimm_label_show()
    and dimmdev_label_show() limit the whole length to EDAC_MC_LABEL_LEN,
    the newline in the format "%s\n" is dropped.

    [root@orange ~]# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
    0000000 103 120 125 137 123 162 143 111 104 043 060 137 110 141 043 060
    C P U _ S r c I D # 0 _ H a # 0
    0000020 137 103 150 141 156 043 060 137 104 111 115 115 043 060 000
    _ C h a n # 0 _ D I M M # 0 \0
    0000037

    Fix it by using 'sizeof(dimm->label) + 1' as the whole length in the
    snprintf()s in channel_dimm_label_show() and dimmdev_label_show().

    Reported-by: Robert Elliott
    Signed-off-by: Toshi Kani
    Acked-by: Tony Luck
    Cc: linux-edac
    Cc: Mauro Carvalho Chehab
    Link: http://lkml.kernel.org/r/1442933883-21587-2-git-send-email-toshi.kani@hpe.com
    Signed-off-by: Borislav Petkov

    Toshi Kani
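    The truncation in the second commit can be reproduced with plain snprintf(). This is a hypothetical userspace sketch: LABEL_LEN is assumed to mirror EDAC_MC_LABEL_LEN, and the label buffer is assumed to be LABEL_LEN + 1 bytes, so sizeof(label) + 1 leaves room for the label, the newline and the NUL.

```c
#include <stdio.h>

#define LABEL_LEN 31	/* assumed to mirror EDAC_MC_LABEL_LEN */

struct dimm_show_sketch {
	char label[LABEL_LEN + 1];
};

/* Old behaviour: with a 30-character label, the limit leaves no room for "\n". */
static int label_show_old(const struct dimm_show_sketch *dimm, char *data)
{
	return snprintf(data, LABEL_LEN, "%s\n", dimm->label);
}

/* Fixed: label, newline and NUL all fit in sizeof(label) + 1 bytes. */
static int label_show_new(const struct dimm_show_sketch *dimm, char *data)
{
	return snprintf(data, sizeof(dimm->label) + 1, "%s\n", dimm->label);
}
```

    With the 30-character label from the od output, label_show_old() returns 31 but stores only the label and a NUL, which matches the 31 bytes ending in \0 shown above; label_show_new() stores the label followed by "\n".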
     

22 Sep, 2015

1 commit


03 Jun, 2015

1 commit

  • Currently set to "6", but the rest of the code will dynamically
    allocate as needed. We need to go to "8" today, but drop the check
    completely to save doing this again when we need even larger numbers.

    Signed-off-by: Tony Luck
    Acked-by: Aristeu Rozanski
    Signed-off-by: Mauro Carvalho Chehab

    Tony Luck
     

23 Feb, 2015

3 commits

  • edac_init() does not deallocate already allocated resources on the
    failure path.

    Found by Linux Driver Verification project (linuxtesting.org).

    [ Boris: The unwind path functions have __exit annotation but are being
    used in an __init function, leading to section mismatches. Drop the
    section annotation and make them normal functions. ]

    Signed-off-by: Alexey Khoroshilov
    Link: http://lkml.kernel.org/r/1423203162-26368-1-git-send-email-khoroshilov@ispras.ru
    Signed-off-by: Borislav Petkov

    Alexey Khoroshilov
     
  • Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info
    object with the optional attribute groups. This allows drivers to
    pass additional sysfs entries without manual (and racy)
    device_create_file() and co calls.

    edac_mc_add_mc() is kept as is, just calling edac_mc_add_mc_with_groups()
    with NULL groups.

    Signed-off-by: Takashi Iwai
    Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de
    Signed-off-by: Borislav Petkov

    Takashi Iwai
     
  • Instead of manual calls of device_create_file() and
    device_remove_file(), use static attribute groups with proper
    is_visible callbacks for managing the sysfs entries.

    This simplifies the code a lot and avoids the possible races.

    Signed-off-by: Takashi Iwai
    Link: http://lkml.kernel.org/r/1423046938-18111-2-git-send-email-tiwai@suse.de
    Signed-off-by: Borislav Petkov

    Takashi Iwai
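    A userspace model of the static-group approach: the attributes are listed once, and an is_visible callback decides per object which ones appear, instead of creating and removing files by hand. struct attr_sketch, struct group_sketch, struct mc_sketch and the has_scrub flag are hypothetical simplifications of struct attribute, struct attribute_group and the EDAC code; in this sketch every visible file is read-only (0444).

```c
#include <stddef.h>

/* Hypothetical simplifications of struct attribute / struct attribute_group. */
struct attr_sketch {
	const char *name;
};

struct group_sketch {
	const struct attr_sketch *const *attrs;	/* NULL-terminated */
	/* Returns the file mode to show attribute n, or 0 to hide it. */
	unsigned int (*is_visible)(const void *obj, const struct attr_sketch *a,
				   int n);
};

struct mc_sketch {
	int has_scrub;	/* hypothetical capability flag */
};

static const struct attr_sketch attr_size = { "size_mb" };
static const struct attr_sketch attr_scrub = { "sdram_scrub_rate" };

static const struct attr_sketch *const mc_attrs[] = {
	&attr_size, &attr_scrub, NULL,
};

static unsigned int mc_is_visible(const void *obj,
				  const struct attr_sketch *a, int n)
{
	const struct mc_sketch *mc = obj;

	(void)n;
	if (a == &attr_scrub && !mc->has_scrub)
		return 0;	/* never created, so nothing to remove later */
	return 0444;		/* read-only in this sketch */
}

static const struct group_sketch mc_group = {
	.attrs = mc_attrs,
	.is_visible = mc_is_visible,
};

/* Count the files that registering group g for obj would create. */
static int count_visible(const struct group_sketch *g, const void *obj)
{
	int i, shown = 0;

	for (i = 0; g->attrs[i]; i++)
		if (!g->is_visible || g->is_visible(obj, g->attrs[i], i))
			shown++;
	return shown;
}
```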
     

30 Jan, 2015

2 commits


02 Dec, 2014

1 commit