18 Sep, 2020
1 commit
-
Reading those sysfs entries gives:
[root@localhost /]# cat /sys/devices/system/edac/mc/mc0/max_location
memory 3 [root@localhost /]# cat /sys/devices/system/edac/mc/mc0/dimm0/dimm_location
memory 0 [root@localhost /]#Add newlines after the value it prints for better readability.
[ bp: Make len a signed int and change the check to catch wraparound.
Increment the pointer p only when the length check passes. Use
scnprintf(). ]Signed-off-by: Xiongfeng Wang
Signed-off-by: Borislav Petkov
Link: https://lkml.kernel.org/r/1600051734-8993-1-git-send-email-wangxiongfeng2@huawei.com
17 Feb, 2020
2 commits
-
Looking at how mci->{ue,ce}_per_layer[EDAC_MAX_LAYERS] is used, it
turns out that only the leaves in the memory hierarchy are consumed
(in sysfs), but not the intermediate layers, e.g.:count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx];
These unused counters only add complexity, remove them. The error
counter values are directly stored in struct dimm_info now.Signed-off-by: Robert Richter
Signed-off-by: Borislav Petkov
Acked-by: Aristeu Rozanski
Link: https://lkml.kernel.org/r/20200123090210.26933-11-rrichter@marvell.com -
There are dimm and csrow devices linked to the mci device esp. to show
up in sysfs. It must be granted that children devices are removed before
its mci parent. Thus, the release functions must be called in the
correct order and may not miss any child before releasing its parent. In
the current implementation this is only granted by the correct order of
release functions.A much better approach is to use put_device() that releases the device
only after all users are gone. It is the recommended way to release a
device and free its memory. The function uses the device's refcount and
only frees it if there are no users of it anymore such as children.So implement a mci_release() function to remove mci devices, use
put_device() to free them and early initialize the mci device right
after its struct has been allocated.Change the release function so that it can be universally used no
matter if the device is registered or not. Since subsequent dimm
and csrow sysfs links are implemented as children devices, their
refcounts will keep the parent mci device from being removed as long
as sysfs entries exist and until all users have been unregistered in
edac_remove_sysfs_mci_device().Remove edac_unregister_sysfs() and merge mci sysfs removal into
edac_remove_sysfs_mci_device(). There is only a single instance now that
removes the sysfs entries. The function can now be used in the error
paths for cleanup.Also, create device release functions for all involved devices
(dev->release), remove device_type release functions (dev_type->
release) and also use dev->init_name instead of dev_set_name().[ bp: Massage commit message and comments. ]
Signed-off-by: Robert Richter
Signed-off-by: Borislav Petkov
Acked-by: Aristeu Rozanski
Link: https://lkml.kernel.org/r/20200212120340.4764-5-rrichter@marvell.com
13 Feb, 2020
2 commits
-
All created csrow objects must be removed in the error path of
edac_create_csrow_objects(). The objects have been added as devices.They need to be removed by doing a device_del() *and* put_device() call
to also free their memory. The missing put_device() leaves a memory
leak. Use device_unregister() instead of device_del() which properly
unregisters the device doing both.Fixes: 7adc05d2dc3a ("EDAC/sysfs: Drop device references properly")
Signed-off-by: Robert Richter
Signed-off-by: Borislav Petkov
Tested-by: John Garry
Cc:
Link: https://lkml.kernel.org/r/20200212120340.4764-4-rrichter@marvell.com -
A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and
DEBUG_KMEMLEAK set, revealed several issues when removing an mci device:1) Use-after-free:
On 27.11.19 17:07:33, John Garry wrote:
> [ 22.104498] BUG: KASAN: use-after-free in
> edac_remove_sysfs_mci_device+0x148/0x180The use-after-free is caused by the mci_for_each_dimm() macro called in
edac_remove_sysfs_mci_device(). The iterator was introduced withc498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator").
The iterator loop calls device_unregister(&dimm->dev), which removes
the sysfs entry of the device, but also frees the dimm struct in
dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(),
the dimm struct is accessed again, after having been freed already.The fix is to free all the mci device's subsequent dimm and csrow
objects at a later point, in _edac_mc_free(), when the mci device itself
is being freed.This keeps the data structures intact and the mci device can be
fully used until its removal. The change allows the safe usage of
mci_for_each_dimm() to release dimm devices from sysfs.2) Memory leaks:
Following memory leaks have been detected:
# grep edac /sys/kernel/debug/kmemleak | sort | uniq -c
1 [] edac_mc_alloc+0x3bc/0x9d0 # mci->csrows
16 [] edac_mc_alloc+0x49c/0x9d0 # csr->channels
16 [] edac_mc_alloc+0x518/0x9d0 # csr->channels[chn]
1 [] edac_mc_alloc+0x5c8/0x9d0 # mci->dimms
34 [] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc()All leaks are from memory allocated by edac_mc_alloc().
Note: The test above shows that edac_mc_alloc() was called here from
ghes_edac_register(), thus both functions show up in the stack trace
but the module causing the leaks is edac_mc. The comments with the data
structures involved were made manually by analyzing the objdump.The data structures listed above and created by edac_mc_alloc() are
not properly removed during device removal, which is done in
edac_mc_free().There are two paths implemented to remove the device depending on device
registration, _edac_mc_free() is called if the device is not registered
and edac_unregister_sysfs() otherwise.The implemenations differ. For the sysfs case, the mci device removal
lacks the removal of subsequent data structures (csrows, channels,
dimms). This causes the memory leaks (see mci_attr_release()).[ bp: Massage commit message. ]
Fixes: c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator")
Fixes: faa2ad09c01c ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.")
Fixes: 7a623c039075 ("edac: rewrite the sysfs code to use struct device")
Reported-by: John Garry
Signed-off-by: Robert Richter
Signed-off-by: Borislav Petkov
Tested-by: John Garry
Cc:
Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com
10 Nov, 2019
1 commit
-
Introduce an mci_for_each_dimm() iterator. It returns a pointer to
a struct dimm_info. This makes the declaration and use of an index
obsolete and avoids access to internal data of struct mci (direct array
access etc).[ bp: push the struct dimm_info *dimm; declaration into the
CONFIG_EDAC_DEBUG block. ]Signed-off-by: Robert Richter
Signed-off-by: Borislav Petkov
Reviewed-by: Mauro Carvalho Chehab
Cc: "linux-edac@vger.kernel.org"
Cc: James Morse
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20191106093239.25517-4-rrichter@marvell.com
09 Nov, 2019
1 commit
-
The EDAC_DIMM_OFF() macro takes 5 arguments to get the DIMM's index.
Simplify this by storing the index in struct dimm_info to avoid its
calculation and remove the EDAC_DIMM_OFF() macro. The index can be
directly used then.Another advantage is that edac_mc_alloc() could be used even if the
exact size of the layers is unknown. Only the number of DIMMs would be
needed.Rename iterator variable to idx, while at it. The name is more handy,
esp. when searching for it in the code.Signed-off-by: Robert Richter
Signed-off-by: Borislav Petkov
Reviewed-by: Mauro Carvalho Chehab
Cc: "linux-edac@vger.kernel.org"
Cc: James Morse
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20191106093239.25517-3-rrichter@marvell.com
04 Sep, 2019
3 commits
-
Debug messages are inconsistently used in the error handlers. Some lack
an error message, some are called regardless of the return status,
messages for the same error are at different locations in the code
depending on the error code. This happens esp. near put_device() calls.Make those debug messages more consistent. Additionally, unify the error
messages to have the same terms for the same operations of the device.Signed-off-by: Robert Richter
Signed-off-by: Borislav Petkov
Reviewed-by: Mauro Carvalho Chehab
Cc: "linux-edac@vger.kernel.org"
Cc: James Morse
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20190902123216.9809-5-rrichter@marvell.com -
Use direct returns instead of gotos. Error handling code becomes
smaller and better readable.Signed-off-by: Robert Richter
Signed-off-by: Borislav Petkov
Reviewed-by: Mauro Carvalho Chehab
Cc: "linux-edac@vger.kernel.org"
Cc: James Morse
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20190902123216.9809-4-rrichter@marvell.com -
Use of 'unsigned int' instead of bare use of 'unsigned'. Fix this for
edac_mc*, ghes and the i5100 driver as reported by checkpatch.pl.While at it, struct member dev_ch_attribute->channel is always used as
unsigned int. Change type to unsigned int to avoid type casts.[ bp: Massage. ]
Signed-off-by: Robert Richter
Signed-off-by: Borislav Petkov
Reviewed-by: Mauro Carvalho Chehab
Cc: "linux-edac@vger.kernel.org"
Cc: James Morse
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20190902123216.9809-2-rrichter@marvell.com
28 Jun, 2019
1 commit
-
Commit 9da21b1509d8 ("EDAC: Poll timeout cannot be zero, p2") assumes
edac_mc_poll_msec to be unsigned long, but the type of the variable still
remained as int. Setting edac_mc_poll_msec can trigger out-of-bounds
write.Reproducer:
# echo 1001 > /sys/module/edac_core/parameters/edac_mc_poll_msec
KASAN report:
BUG: KASAN: global-out-of-bounds in edac_set_poll_msec+0x140/0x150
Write of size 8 at addr ffffffffb91b2d00 by task bash/1996CPU: 1 PID: 1996 Comm: bash Not tainted 5.2.0-rc6+ #23
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
Call Trace:
dump_stack+0xca/0x13e
print_address_description.cold+0x5/0x246
__kasan_report.cold+0x75/0x9a
? edac_set_poll_msec+0x140/0x150
kasan_report+0xe/0x20
edac_set_poll_msec+0x140/0x150
? dimmdev_location_show+0x30/0x30
? vfs_lock_file+0xe0/0xe0
? _raw_spin_lock+0x87/0xe0
param_attr_store+0x1b5/0x310
? param_array_set+0x4f0/0x4f0
module_attr_store+0x58/0x80
? module_attr_show+0x80/0x80
sysfs_kf_write+0x13d/0x1a0
kernfs_fop_write+0x2bc/0x460
? sysfs_kf_bin_read+0x270/0x270
? kernfs_notify+0x1f0/0x1f0
__vfs_write+0x81/0x100
vfs_write+0x1e1/0x560
ksys_write+0x126/0x250
? __ia32_sys_read+0xb0/0xb0
? do_syscall_64+0x1f/0x390
do_syscall_64+0xc1/0x390
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fa7caa5e970
Code: 73 01 c3 48 8b 0d 28 d5 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 99 2d 2c 00 00 75 10 b8 01 00 00 00 04
RSP: 002b:00007fff6acfdfe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa7caa5e970
RDX: 0000000000000005 RSI: 0000000000e95c08 RDI: 0000000000000001
RBP: 0000000000e95c08 R08: 00007fa7cad1e760 R09: 00007fa7cb36a700
R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000005
R13: 0000000000000001 R14: 00007fa7cad1d600 R15: 0000000000000005The buggy address belongs to the variable:
edac_mc_poll_msec+0x0/0x40Memory state around the buggy address:
ffffffffb91b2c00: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
ffffffffb91b2c80: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
>ffffffffb91b2d00: 04 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa
^
ffffffffb91b2d80: 04 fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
ffffffffb91b2e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00Fix it by changing the type of edac_mc_poll_msec to unsigned int.
The reason why this patch adopts unsigned int rather than unsigned long
is msecs_to_jiffies() assumes arg to be unsigned int. We can avoid
integer conversion bugs and unsigned int will be large enough for
edac_mc_poll_msec.Reviewed-by: James Morse
Fixes: 9da21b1509d8 ("EDAC: Poll timeout cannot be zero, p2")
Signed-off-by: Eiichi Tsukata
Signed-off-by: Tony Luck
21 Jun, 2019
2 commits
-
Do put_device() if device_add() fails.
[ bp: do device_del() for the successfully created devices in
edac_create_csrow_objects(), on the unwind path. ]Signed-off-by: Greg KH
Signed-off-by: Borislav Petkov
Link: https://lkml.kernel.org/r/20190427214925.GE16338@kroah.com -
In edac_create_csrow_object(), the reference to the object is not
released when adding the device to the device hierarchy fails
(device_add()). This may result in a memory leak.Signed-off-by: Pan Bian
Signed-off-by: Borislav Petkov
Reviewed-by: Greg Kroah-Hartman
Cc: James Morse
Cc: Mauro Carvalho Chehab
Cc: linux-edac
Link: https://lkml.kernel.org/r/1555554438-103953-1-git-send-email-bianpan2016@163.com
14 Nov, 2018
2 commits
-
... and use the single edac_subsys object returned from
subsys_system_register(). The idea is to have a single bus
and multiple devices on it.Signed-off-by: Borislav Petkov
Acked-by: Mauro Carvalho Chehab
CC: Aristeu Rozanski Filho
CC: Greg KH
CC: Justin Ernst
CC: linux-edac
CC: Mauro Carvalho Chehab
CC: Russ Anderson
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20180926152752.GG5584@zn.tnic -
Nobody(*) uses them. Dropping this will allow us to make the total
number of memory controllers configurable (as we won't have to worry
about duplicated device names under this directory).(*) https://lkml.kernel.org/r/20180927221054.580220e5@coco.lan
Signed-off-by: Tony Luck
Signed-off-by: Borislav Petkov
Acked-by: Mauro Carvalho Chehab
CC: Aristeu Rozanski Filho
CC: Greg KH
CC: Justin Ernst
CC: Mauro Carvalho Chehab
CC: Russ Anderson
CC: linux-edac
Link: http://lkml.kernel.org/r/20181001224313.GA9487@agluck-desk
17 Jun, 2018
1 commit
-
Make sure to use put_device() to free the initialised struct device so
that resources managed by driver core also gets released in the event of
a registration failure.Signed-off-by: Johan Hovold
Cc: Denis Kirjanov
Cc: Mauro Carvalho Chehab
Cc: linux-edac
Fixes: 2d56b109e3a5 ("EDAC: Handle error path in edac_mc_sysfs_init() properly")
Link: http://lkml.kernel.org/r/20180612124335.6420-1-johan@kernel.org
Signed-off-by: Borislav Petkov
14 Mar, 2018
1 commit
-
Somehow we ended up with two separate arrays of strings to describe the
"enum mem_type" values.In edac_mc.c we have an exported list edac_mem_types[] that is used
by a couple of drivers in debug messaged.In edac_mc_sysfs.c we have a private list that is used to display
values in:
/sys/devices/system/edac/mc/mc*/dimm*/dimm_mem_type
/sys/devices/system/edac/mc/mc*/csrow*/mem_typeThis list was missing a value for MEM_LRDDR3.
The string values in the two lists were different :-(
Combining the lists, I kept the values so that the sysfs output
will be unchanged as some scripts may depend on that.Reported-by: Borislav Petkov
Acked-by: Borislav Petkov
Signed-off-by: Tony Luck
Cc: "Rafael J. Wysocki"
Cc: Aristeu Rozanski
Cc: Dan Williams
Cc: Jean Delvare
Cc: Len Brown
Cc: Mauro Carvalho Chehab
Cc: Qiuxu Zhuo
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/20180312182430.10335-2-tony.luck@intel.com
Signed-off-by: Borislav Petkov
31 Oct, 2017
1 commit
-
Several function prototypes for the set/get functions defined by
module_param_call() have a slightly wrong argument types. This fixes
those in an effort to clean up the calls when running under type-enforced
compiler instrumentation for CFI. This is the result of running the
following semantic patch:@match_module_param_call_function@
declarer name module_param_call;
identifier _name, _set_func, _get_func;
expression _arg, _mode;
@@module_param_call(_name, _set_func, _get_func, _arg, _mode);
@fix_set_prototype
depends on match_module_param_call_function@
identifier match_module_param_call_function._set_func;
identifier _val, _param;
type _val_type, _param_type;
@@int _set_func(
-_val_type _val
+const char * _val
,
-_param_type _param
+const struct kernel_param * _param
) { ... }@fix_get_prototype
depends on match_module_param_call_function@
identifier match_module_param_call_function._get_func;
identifier _val, _param;
type _val_type, _param_type;
@@int _get_func(
-_val_type _val
+char * _val
,
-_param_type _param
+const struct kernel_param * _param
) { ... }Two additional by-hand changes are included for places where the above
Coccinelle script didn't notice them:drivers/platform/x86/thinkpad_acpi.c
fs/lockd/svc.cSigned-off-by: Kees Cook
Signed-off-by: Jessica Yu
20 Aug, 2017
1 commit
-
Make these const as they are only stored in the type field of a device
structure, which is const.Done using Coccinelle.
Signed-off-by: Bhumika Goyal
Cc: linux-edac
Link: http://lkml.kernel.org/r/1503130946-2854-2-git-send-email-bhumirks@gmail.com
Signed-off-by: Borislav Petkov
17 Jul, 2017
1 commit
-
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by work with
const attribute_group. So mark the non-const structs as const.Signed-off-by: Arvind Yadav
CC: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/776cb8265509054abd01b0b551624cc0da3b88e7.1499078335.git.arvind.yadav.cs@gmail.com
Signed-off-by: Borislav Petkov
19 Jan, 2017
1 commit
-
The old csrowX sysfs directories have per-csrow error counters, but the
new dimmX directories do not currently expose error counts.EDAC already keeps these counts, add them to sysfs so per-DIMM counts
are still available when CONFIG_EDAC_LEGACY_SYSFS=n.Signed-off-by: Aaron Miller
Cc: linux-edac
Link: http://lkml.kernel.org/r/20161103220153.3997328-1-aaronmiller@fb.com
Signed-off-by: Borislav Petkov
06 Jan, 2017
1 commit
-
The dev_attr_sdram_scrub_rate is not declared in a header or used
anywhere else, so make it static to fix the following warning:drivers/edac/edac_mc_sysfs.c:816:1: warning: symbol
'dev_attr_sdram_scrub_rate' was not declared. Should it be static?Signed-off-by: Ben Dooks
Reviewed-by: Mauro Carvalho Chehab
Cc: linux-edac
Link: http://lkml.kernel.org/r/1465407356-7357-1-git-send-email-ben.dooks@codethink.co.uk
Signed-off-by: Borislav Petkov
15 Dec, 2016
1 commit
-
Now, all left at edac_core.h are at drivers/edac/edac_mc.c,
so rename it to edac_mc.h.Signed-off-by: Mauro Carvalho Chehab
16 Jun, 2016
1 commit
-
c44696fff04f ("EDAC: Remove arbitrary limit on number of channels")
lifted the arbitrary limit on memory controller channels in EDAC.
However, the dynamic channel attributes dynamic_csrow_dimm_attr and
dynamic_csrow_ce_count_attr remained 6.This wasn't a problem except channels 6 and 7 weren't visible in sysfs
on machines with more than 6 channels after the conversion to static
attr groups with2c1946b6d629 ("EDAC: Use static attribute groups for managing sysfs entries")
[ without that, we're exploding in edac_create_sysfs_mci_device()
because we're dereferencing out of the bounds of the
dynamic_csrow_dimm_attr array. ]Add attributes for channels 6 and 7 along with a guard for the
future, should more channels be required and/or to sanity check for
misconfigured machines.We still need to check against the number of channels present on the MC
first, as Thor reported.Signed-off-by: Borislav Petkov
Reported-by: Hironobu Ishii
Tested-by: Thor Thayer
Cc: # 4.2
23 Apr, 2016
1 commit
-
Code flow looks like this:
device_unregister(&mci->dev);
-> kobject_put+0x25/0x50
-> kobject_cleanup+0x77/0x190
-> device_release+0x32/0xa0
-> mci_attr_release+0x36/0x70
-> kfree(mci);
bus_unregister(mci->bus);Fix is to grab a local copy of "mci->bus" and use that when we call
bus_unregister().Signed-off-by: Tony Luck
Acked-by: Aristeu Rozanski
Cc: Mauro Carvalho Chehab
Cc: linux-edac
Link: http://lkml.kernel.org/r/21d595b0ab3d718d9cb206647f4ec91c05e62ec4.1461261078.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov
11 Dec, 2015
3 commits
-
It cannot fail now. We either load EDAC core after having successfully
initialized edac_subsys or we don't.Signed-off-by: Borislav Petkov
-
This was really dumb - reference counting for the main EDAC sysfs
object. While we could've simply registered it as the first thing in the
module init path and then hand it around to what needs it.Do that and rip out all the code around it, thus simplifying the whole
handling significantly.Move the edac_subsys node back to edac_module.c.
Signed-off-by: Borislav Petkov
-
I get the splat below when modprobing/rmmoding EDAC drivers. It happens
because bus->name is invalid after bus_unregister() has run. The Code: section
below corresponds to:.loc 1 1108 0
movq 672(%rbx), %rax # mci_1(D)->bus, mci_1(D)->bus
.loc 1 1109 0
popq %rbx #.loc 1 1108 0
movq (%rax), %rdi # _7->name,
jmp kfree #and %rax has some funky stuff 2030203020312030 which looks a lot like
something walked over it.Fix that by saving the name ptr before doing stuff to string it points to.
general protection fault: 0000 [#1] SMP
Modules linked in: ...
CPU: 4 PID: 10318 Comm: modprobe Tainted: G I EN 3.12.51-11-default+ #48
Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011
task: ffff880311320280 ti: ffff88030da3e000 task.ti: ffff88030da3e000
RIP: 0010:[] [] edac_unregister_sysfs+0x22/0x30 [edac_core]
RSP: 0018:ffff88030da3fe28 EFLAGS: 00010292
RAX: 2030203020312030 RBX: ffff880311b4e000 RCX: 000000000000095c
RDX: 0000000000000001 RSI: ffff880327bb9600 RDI: 0000000000000286
RBP: ffff880311b4e750 R08: 0000000000000000 R09: ffffffff81296110
R10: 0000000000000400 R11: 0000000000000000 R12: ffff88030ba1ac68
R13: 0000000000000001 R14: 00000000011b02f0 R15: 0000000000000000
FS: 00007fc9bf8f5700(0000) GS:ffff8801a7c40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000403c90 CR3: 000000019ebdf000 CR4: 00000000000007e0
Stack:
Call Trace:
i7core_unregister_mci.isra.9
i7core_remove
pci_device_remove
__device_release_driver
driver_detach
bus_remove_driver
pci_unregister_driver
i7core_exit
SyS_delete_module
system_call_fastpath
0x7fc9bf426536
Code: 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb e8 52 2a 1f e1 48 8b bb a0 02 00 00 e8 46 59 1f e1 48 8b 83 a0 02 00 00 5b 8b 38 e9 26 9a fe e0 66 0f 1f 44 00 00 66 66 66 66 90 48 8b
RIP [] edac_unregister_sysfs+0x22/0x30 [edac_core]
RSPSigned-off-by: Borislav Petkov
Cc: Mauro Carvalho Chehab
Cc: # v3.6..
Fixes: 7a623c039075 ("edac: rewrite the sysfs code to use struct device")
15 Oct, 2015
1 commit
-
debugfs_remove() is used to remove a file or a directory from the
debugfs filesystem, but mci->debugfs might not empty.This can be triggered by the following sequence:
1) Enable CONFIG_EDAC_DEBUG
2) insmod an EDAC module (like i3000_edac or similar)
3) rmmod this module
4) we can see files remaining under /edac/ like
"fake_inject", for example.Removing edac_core then, causes a NULL pointer dereference.
Reported-by: Yun Wu (Abel)
Signed-off-by: Tan Xiaojun
Cc: Doug Thompson
Cc: Mauro Carvalho Chehab
Cc: linux-edac
Link: http://lkml.kernel.org/r/1444787364-104353-1-git-send-email-tanxiaojun@huawei.com
Signed-off-by: Borislav Petkov
28 Sep, 2015
1 commit
-
Updating dimm_label to an empty string does not make much sense. Change
the sysfs dimm_label store operation to fail a request when an input
string is empty.Suggested-by: Borislav Petkov
Signed-off-by: Toshi Kani
Cc: elliott@hpe.com
Cc: Mauro Carvalho Chehab
Cc: Tony Luck
Cc: linux-edac
Link: http://lkml.kernel.org/r/1443124767.25474.172.camel@hpe.com
Signed-off-by: Borislav Petkov
26 Sep, 2015
2 commits
-
Sysfs "dimm_label" and "chX_dimm_label" nodes have the following issues
in their store operation:1) A newline-terminated input string causes redundant newlines:
# echo "test" > /sys/bus/mc0/devices/dimm0/dimm_label
# cat /sys/bus/mc0/devices/dimm0/dimm_label
test# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
0000000 164 145 163 164 012 012
t e s t \n \n
00000062) The original label string (31 characters) cannot be stored due to
an improper size check:# echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0" > /sys/bus/mc0/devices/dimm0/dimm_label
# cat /sys/bus/mc0/devices/dimm0/dimm_label# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
0000000 012 012
\n \n
00000023) An input string longer than the buffer size results a wrong label
info as it allows a retry with the remaining string:# echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0_TEST" > /sys/bus/mc0/devices/dimm0/dimm_label
# cat /sys/bus/mc0/devices/dimm0/dimm_label
_TESTFix these issues by making the following changes:
1) Replace a newline character at the end by setting a null. It also
assures that the string is null-terminated in the label buffer.
2) Check the label buffer size with 'sizeof(dimm->label)'.
3) Fail a request if its string exceeds the label buffer size.Signed-off-by: Toshi Kani
Acked-by: Tony Luck
Cc: linux-edac
Cc: Mauro Carvalho Chehab
Cc: Robert Elliott
Link: http://lkml.kernel.org/r/1443121564.25474.160.camel@hpe.com
Signed-off-by: Borislav Petkov -
After
7d375bffa524 ("sb_edac: Fix support for systems with two home agents per socket")
sysfs "dimm_label" and "chX_dimm_label" show their label string without a
newline "\n" at the end.[root@orange ~]# cat /sys/bus/mc0/devices/dimm0/dimm_label
CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#[root@orange ~]# cat /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label
CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#The label strings now have 31 characters, which are the same as
EDAC_MC_LABEL_LEN. Since the snprintf()s in channel_dimm_label_show()
and dimmdev_label_show() limit the whole length by EDAC_MC_LABEL_LEN,
the newline in the format "%s\n" is ignored.[root@orange ~]# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
0000000 103 120 125 137 123 162 143 111 104 043 060 137 110 141 043 060
C P U _ S r c I D # 0 _ H a # 0
0000020 137 103 150 141 156 043 060 137 104 111 115 115 043 060 000
_ C h a n # 0 _ D I M M # 0 \0
0000037Fix it by using 'sizeof(dimm->label) + 1' as the whole length in the
snprintf()s in channel_dimm_label_show() and dimmdev_label_show().Reported-by: Robert Elliott
Signed-off-by: Toshi Kani
Acked-by: Tony Luck
Cc: linux-edac
Cc: Mauro Carvalho Chehab
Link: http://lkml.kernel.org/r/1442933883-21587-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov
22 Sep, 2015
1 commit
-
... into a separate compilation unit and drop a couple of
CONFIG_EDAC_DEBUG ifdefferies. Rename edac_create_debug_nodes() to
edac_create_debugfs_nodes(), while at it.No functionality change.
Cc:
Signed-off-by: Borislav Petkov
03 Jun, 2015
1 commit
-
Currently set to "6", but the reset of the code will dynamically
allocate as needed. We need to go to "8" today, but drop the check
completely to save doing this again when we need even larger numbers.Signed-off-by: Tony Luck
Acked-by: Aristeu Rozanski
Signed-off-by: Mauro Carvalho Chehab
23 Feb, 2015
3 commits
-
edac_init() does not deallocate already allocated resources on failure
path.Found by Linux Driver Verification project (linuxtesting.org).
[ Boris: The unwind path functions have __exit annotation but are being
used in an __init function, leading to section mismatches. Drop the
section annotation and make them normal functions. ]Signed-off-by: Alexey Khoroshilov
Link: http://lkml.kernel.org/r/1423203162-26368-1-git-send-email-khoroshilov@ispras.ru
Signed-off-by: Borislav Petkov -
Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info
object with the optional attribute groups. This allows drivers to
pass additional sysfs entries without manual (and racy)
device_create_file() and co calls.edac_mc_add_mc() is kept as is, just calling edac_mc_add_with_groups()
with NULL groups.Signed-off-by: Takashi Iwai
Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov -
Instead of manual calls of device_create_file() and
device_remove_file(), use static attribute groups with proper
is_visible callbacks for managing the sysfs entries.This simplifies the code a lot and avoids the possible races.
Signed-off-by: Takashi Iwai
Link: http://lkml.kernel.org/r/1423046938-18111-2-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov
30 Jan, 2015
2 commits
-
Fix sparse warnings.
Signed-off-by: Borislav Petkov
-
Also use goto labels for all failure paths in
edac_create_sysfs_mci_device and update meaningless labels.Signed-off-by: Junjie Mao
Link: http://lkml.kernel.org/r/BLU436-SMTP25291B6B612942A212AEBFE95300@phx.gbl
[ Boris: Use ! for 0 checks and add newlines for less crammed code. ]
Signed-off-by: Borislav Petkov
02 Dec, 2014
1 commit
-
This prevented edac sysfs code from properly handling 6 channels
per memory controller.Signed-off-by: Jim Snow
Signed-off-by: Lukasz Anaczkowski
Signed-off-by: Mauro Carvalho Chehab