10 Apr, 2017
6 commits
-
Change them to have the edac_ prefix.
No functionality change.
Signed-off-by: Borislav Petkov
-
Move the remaining functionality to edac_mc.c. Convert "edac_report=" to
a module parameter.
Signed-off-by: Borislav Petkov
-
... and this happens only when CONFIG_RAS is enabled.
Signed-off-by: Borislav Petkov
-
... as part of moving stuff away from edac_stub.c
Signed-off-by: Borislav Petkov
-
... and the glue around it. It is not needed anymore.
Signed-off-by: Borislav Petkov
-
Use mc_devices list instead to check whether we have EDAC driver
instances successfully registered with EDAC core.
Signed-off-by: Borislav Petkov
28 Jan, 2017
1 commit
-
We need to know if any MC devices have been allocated.
Signed-off-by: Yazen Ghannam
Cc: linux-edac
Link: http://lkml.kernel.org/r/1485537863-2707-7-git-send-email-Yazen.Ghannam@amd.com
[ Prettify text. ]
Signed-off-by: Borislav Petkov
25 Dec, 2016
1 commit
-
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*'
sed -i -e "s!$PATT!#include !" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro
Signed-off-by: Linus Torvalds
15 Dec, 2016
2 commits
-
Several functions are documented in edac_mc.c.
As we'll be including edac_core.h in the drivers-api book, move
those, in order for the kernel-doc markups to be part of the API
documentation book.
Signed-off-by: Mauro Carvalho Chehab
-
Now, everything left in edac_core.h is implemented in drivers/edac/edac_mc.c,
so rename it to edac_mc.h.
Signed-off-by: Mauro Carvalho Chehab
14 Nov, 2016
1 commit
-
When accessing the mc_devices list of memory controller descriptors, we
need to hold mem_ctls_mutex. This was not always the case, fix that.
Make all external callers call a version which grabs the mutex since the
latter is local to edac_mc.c.
Reported-by: Yazen Ghannam
Signed-off-by: Borislav Petkov
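The locking pattern described in this commit can be sketched in user space as follows. This is only an illustration under stated assumptions: struct mc_dev, find_mc_by_idx_unlocked(), find_mc_by_idx() and add_mc() are hypothetical names, and a pthread mutex stands in for the kernel mutex. Only mc_devices and mem_ctls_mutex are names taken from the commit message.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's memory controller descriptor. */
struct mc_dev {
	int mc_idx;
	struct mc_dev *next;
};

/* List head; every access must happen with mem_ctls_mutex held. */
static struct mc_dev *mc_devices;
static pthread_mutex_t mem_ctls_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Unlocked lookup: the caller must already hold mem_ctls_mutex. */
static struct mc_dev *find_mc_by_idx_unlocked(int idx)
{
	struct mc_dev *mci;

	for (mci = mc_devices; mci; mci = mci->next)
		if (mci->mc_idx == idx)
			return mci;
	return NULL;
}

/* Locked wrapper: the version external callers are meant to use. */
struct mc_dev *find_mc_by_idx(int idx)
{
	struct mc_dev *mci;

	pthread_mutex_lock(&mem_ctls_mutex);
	mci = find_mc_by_idx_unlocked(idx);
	pthread_mutex_unlock(&mem_ctls_mutex);
	return mci;
}

/* Registration takes the same mutex before touching the list. */
void add_mc(struct mc_dev *mci)
{
	pthread_mutex_lock(&mem_ctls_mutex);
	mci->next = mc_devices;
	mc_devices = mci;
	pthread_mutex_unlock(&mem_ctls_mutex);
}
```

Keeping the mutex private to one file and exporting only the locked wrapper means callers outside that file cannot forget to take the lock.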
03 Jun, 2016
1 commit
-
After the workqueue cleanup, we're registering workqueues based on
the presence of an ->edac_check function. When that is the case,
we're setting OP_RUNNING_POLL. But we forgot to check that in
edac_mc_reset_delay_period(), leading to:
BUG: unable to handle kernel paging request at 0000000000015d10
IP: [ .. ] queued_spin_lock_slowpath
PGD 3ffcc8067 PUD 3ffc56067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: ...
CPU: 1 PID: 2792 Comm: edactest Not tainted 4.6.0-dirty #1
Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
Stack:
Call Trace:
? _raw_spin_lock_irqsave
? lock_timer_base.isra.34
? del_timer
? try_to_grab_pending
? mod_delayed_work_on
? edac_mc_reset_delay_period
? edac_set_poll_msec
? param_attr_store
? module_attr_store
? kernfs_fop_write
? __vfs_write
? __vfs_read
? __alloc_fd
? vfs_write
? SyS_write
? entry_SYSCALL_64_fastpath
Code:
RIP [ .. ] queued_spin_lock_slowpath
RSP <>
CR2: 0000000000015d10
---[ end trace 3f286bc71cca15d1 ]---
Kernel panic - not syncing: Fatal exception
Fix it.
Signed-off-by: Nicholas Krause
Cc: # 4.5
Cc: Mauro Carvalho Chehab
Cc: linux-edac
Link: http://lkml.kernel.org/r/1463697958-13406-1-git-send-email-xerofoify@gmail.com
[ Rewrite commit message. ]
Signed-off-by: Borislav Petkov
24 Apr, 2016
1 commit
-
Fix typo in edac_inc_ue_error() to increment ue_noinfo_count instead of
ce_noinfo_count.
Signed-off-by: Emmanouil Maroudas
Cc: Mauro Carvalho Chehab
Cc: linux-edac
Fixes: 4275be635597 ("edac: Change internal representation to work with layers")
Link: http://lkml.kernel.org/r/1461425580-5898-1-git-send-email-emmanouil.maroudas@gmail.com
Signed-off-by: Borislav Petkov
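A minimal sketch of the kind of slip this commit fixes, assuming a simplified counter structure. struct mc_counters and inc_ue_error() are hypothetical; only the two *_noinfo_count field names come from the commit message.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of per-controller error counters. */
struct mc_counters {
	unsigned int ce_mc;		/* corrected errors, total */
	unsigned int ue_mc;		/* uncorrected errors, total */
	unsigned int ce_noinfo_count;	/* corrected, location unknown */
	unsigned int ue_noinfo_count;	/* uncorrected, location unknown */
};

/* Account for uncorrected errors; only ue_* counters may change here. */
void inc_ue_error(struct mc_counters *c, bool have_location,
		  unsigned int count)
{
	c->ue_mc += count;
	if (!have_location)
		c->ue_noinfo_count += count;	/* the typo bumped ce_noinfo_count */
}
```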
02 Feb, 2016
3 commits
-
They're both running only when ->edac_check is initialized so remove
that check from the workqueue function itself. Synchronize/generalize
the ->op_state check between the two.
Kill useless comments, while at it.
Signed-off-by: Borislav Petkov
-
We have the generic wrappers now, use those. edac_pci_workq_setup() had
an unused argument anyway.
Signed-off-by: Borislav Petkov
-
We use the ->edac_check function pointers to determine whether we need
to set up a polling workqueue. However, the destroy path is not balanced
and we might try to tear down an uninitialized workqueue.
Balance init and destroy paths by looking at ->edac_check in both cases.
Set op_state to OP_OFFLINE *before* destroying anything.
Reported-by: Zhiqiang Hou
Cc: Varun Sethi
Signed-off-by: Borislav Petkov
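The balanced setup and teardown described here can be modelled in plain C. This is a sketch under assumptions, not the driver code: struct mc_dev, mc_start(), mc_poll_once(), mc_stop() and the work_armed flag (standing in for a queued delayed work item) are all hypothetical. Only ->edac_check, op_state, OP_OFFLINE and OP_RUNNING_POLL are names that appear in the commit messages.

```c
#include <stdbool.h>
#include <stddef.h>

enum op_state { OP_OFFLINE, OP_RUNNING_POLL, OP_RUNNING_INTERRUPT };

struct mc_dev {
	void (*edac_check)(struct mc_dev *mci);	/* NULL: no polling needed */
	enum op_state op_state;
	bool work_armed;	/* models "delayed work is queued" */
};

int check_calls;	/* counts polls, for demonstration only */

void count_check(struct mc_dev *mci)
{
	(void)mci;
	check_calls++;
}

/* Init path: arm polling only when the driver supplied ->edac_check. */
void mc_start(struct mc_dev *mci)
{
	if (mci->edac_check) {
		mci->op_state = OP_RUNNING_POLL;
		mci->work_armed = true;
	} else {
		mci->op_state = OP_RUNNING_INTERRUPT;
	}
}

/* Work function: poll and requeue only while the device is polling. */
void mc_poll_once(struct mc_dev *mci)
{
	if (mci->op_state != OP_RUNNING_POLL) {
		mci->work_armed = false;	/* do not requeue */
		return;
	}
	mci->edac_check(mci);
	mci->work_armed = true;			/* requeue */
}

/* Destroy path: go offline first, then cancel using the same condition. */
void mc_stop(struct mc_dev *mci)
{
	mci->op_state = OP_OFFLINE;
	if (mci->edac_check)
		mci->work_armed = false;
}
```

Because the state flips to offline before anything is cancelled, a poll that races with teardown sees the new state and declines to requeue itself.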
11 Dec, 2015
2 commits
-
Hide the EDAC workqueue pointer in a separate compilation unit and add
accessors for the workqueue manipulations needed.
Remove edac_pci_reset_delay_period() which wasn't used by anything. It
seems it got added without a user with
91b99041c1d5 ("drivers/edac: updated PCI monitoring")
Signed-off-by: Borislav Petkov
-
EDAC workqueue destruction is really fragile. We cancel delayed work
but if it is still running and requeues itself, we still go ahead and
destroy the workqueue and the queued work explodes when workqueue core
attempts to run it.
Make the destruction more robust by switching op_state to offline so
that requeuing stops. Cancel any pending work *synchronously* too.
EDAC i7core: Driver loaded.
general protection fault: 0000 [#1] SMP
CPU 12
Modules linked in:
Supported: Yes
Pid: 0, comm: kworker/0:1 Tainted: G IE 3.0.101-0-default #1 HP ProLiant DL380 G7
RIP: 0010:[] [] __queue_work+0x17/0x3f0
< ... regs ...>
Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600)
Stack:
...
Call Trace:
call_timer_fn
run_timer_softirq
__do_softirq
call_softirq
do_softirq
irq_exit
smp_apic_timer_interrupt
apic_timer_interrupt
intel_idle
cpuidle_idle_call
cpu_idle
Code: ...
RIP __queue_work
RSP
Signed-off-by: Borislav Petkov
Cc:
23 Oct, 2015
1 commit
-
The PAGES_TO_MiB macro is used for unit conversion but the
trace_mc_event() tracepoint expects a page address. Fix that.
Signed-off-by: Tan Xiaojun
Cc: Mauro Carvalho Chehab
Cc: linux-edac
Link: http://lkml.kernel.org/r/1445341538-24271-1-git-send-email-tanxiaojun@huawei.com
Signed-off-by: Borislav Petkov
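To make the distinction concrete, here is a small sketch contrasting a pages-to-MiB conversion with building a byte address from a page frame number. The 4 KiB page size, the macro body and the helper names pages_to_mib() and error_address() are assumptions made for illustration; the commit message itself does not show the code.

```c
#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Unit conversion: a page count expressed in MiB (assumed definition). */
#define PAGES_TO_MiB(pages)	((pages) >> (20 - PAGE_SHIFT))

/* Size helper: how many MiB a number of pages covers. */
unsigned long long pages_to_mib(unsigned long long nr_pages)
{
	return PAGES_TO_MiB(nr_pages);
}

/* Address helper: page frame number shifted up, plus offset in the page. */
unsigned long long error_address(unsigned long long pfn,
				 unsigned long long offset_in_page)
{
	return (pfn << PAGE_SHIFT) | offset_in_page;
}
```

Feeding a page frame number through the size conversion shifts it the wrong way, so the result is neither a size nor an address.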
28 May, 2015
1 commit
-
So first of all, this atomic_scrub() function's naming is bad. It looks
like an atomic_t helper. Change it to edac_atomic_scrub().
The bigger problem is that this function is arch-specific and every new
arch which doesn't necessarily need that functionality still needs to
define it, otherwise EDAC doesn't compile.
So instead of doing that and including arch-specific headers, have each
arch define an EDAC_ATOMIC_SCRUB symbol which can be used in edac_mc.c
for ifdeffery. Much cleaner.
And we already are doing this with another symbol - EDAC_SUPPORT. This
is also much cleaner than having CONFIG_EDAC enumerate all the arches
which need/have EDAC support and drivers.
This way I can kill the useless edac.h header in tile too.
Acked-by: Ralf Baechle
Acked-by: Michael Ellerman
Acked-by: Chris Metcalf
Acked-by: Ingo Molnar
Acked-by: Russell King
Cc: Benjamin Herrenschmidt
Cc: Doug Thompson
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "Maciej W. Rozycki"
Cc: Markos Chandras
Cc: Mauro Carvalho Chehab
Cc: Paul Mackerras
Cc: "Steven J. Hill"
Cc: x86@kernel.org
Signed-off-by: Borislav Petkov
23 Feb, 2015
1 commit
-
Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info
object with the optional attribute groups. This allows drivers to
pass additional sysfs entries without manual (and racy)
device_create_file() and co calls.
edac_mc_add_mc() is kept as is, just calling edac_mc_add_mc_with_groups()
with NULL groups.
Signed-off-by: Takashi Iwai
Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov
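The "keep the old entry point as a thin wrapper" approach can be sketched like this. The types and function names below (struct attr_group, struct mc_dev, add_mc_with_groups(), add_mc()) are simplified, hypothetical stand-ins and do not reproduce the real EDAC signatures.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the sysfs attribute group and MC types. */
struct attr_group {
	const char *name;
};

struct mc_dev {
	const struct attr_group **groups;	/* optional extra sysfs entries */
	int registered;
};

/* Extended entry point: registers the device together with its groups. */
int add_mc_with_groups(struct mc_dev *mci, const struct attr_group **groups)
{
	if (!mci)
		return -1;
	mci->groups = groups;	/* attached before the device becomes visible */
	mci->registered = 1;
	return 0;
}

/* Original entry point, kept for existing callers: no extra groups. */
int add_mc(struct mc_dev *mci)
{
	return add_mc_with_groups(mci, NULL);
}
```

Passing the groups at registration time avoids a window in which the device exists but its extra attributes do not.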
20 Oct, 2014
2 commits
-
Make it simpler to keep the mem_types enum and the actual string
names in sync by using designated initializers.
Signed-off-by: Borislav Petkov
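As an illustration of why designated initializers help here, the sketch below ties each string to its enumerator by name instead of by position. The enum members and strings are a made-up subset chosen for the example, and mem_type_name() is a hypothetical helper.

```c
#include <stddef.h>

/* Illustrative subset; not the kernel's full list of memory types. */
enum mem_type {
	MEM_EMPTY,
	MEM_DDR3,
	MEM_LRDDR3,
	MEM_DDR4,
	MEM_TYPE_COUNT
};

/* Each entry names its index, so reordering the enum cannot skew the table. */
static const char * const mem_type_names[MEM_TYPE_COUNT] = {
	[MEM_EMPTY]  = "Empty",
	[MEM_DDR3]   = "DDR3",
	[MEM_LRDDR3] = "Load-Reduced DDR3",
	[MEM_DDR4]   = "DDR4",
};

/* Bounds-checked lookup; entries left unset in the table read as NULL. */
const char *mem_type_name(enum mem_type type)
{
	if ((unsigned int)type >= MEM_TYPE_COUNT || !mem_type_names[type])
		return "Unknown";
	return mem_type_names[type];
}
```

With a positional initializer, inserting a new enumerator in the middle silently shifts every later string by one; with designated initializers the worst case is a missing entry, which the lookup reports as "Unknown".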
-
F15hM60h adds support for DDR4 and DDR3 LRDIMMs. Add them here.
Signed-off-by: Aravind Gopalakrishnan
Link: http://lkml.kernel.org/r/1411070218-10258-1-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Boris: improve comments. ]
Signed-off-by: Borislav Petkov
02 Sep, 2014
1 commit
-
This one got forgotten during an earlier cleanup.
Signed-off-by: Borislav Petkov
24 Jun, 2014
1 commit
-
To avoid confusion and conflicting usage of RAS-related trace events,
add a unified RAS trace event stub.
Start a RAS subsystem menu which will be fleshed out in time, when more
features get added to it.
Signed-off-by: Chen, Gong
Link: http://lkml.kernel.org/r/1402475691-30045-2-git-send-email-gong.chen@linux.intel.com
Signed-off-by: Borislav Petkov
Signed-off-by: Tony Luck
09 May, 2014
1 commit
-
The MC structure field scrub_mode is of integer type - not a bit field.
Use it accordingly.
Signed-off-by: Loc Ho
Link: http://lkml.kernel.org/r/1399590199-12256-2-git-send-email-lho@apm.com
Signed-off-by: Borislav Petkov
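The difference between treating an enumerated mode as a bit mask and comparing it as an integer can be shown with a short sketch. The enum below is illustrative and its numeric values are assumptions, as are the helper names; the commit message only states that scrub_mode is an integer and not a bit field.

```c
#include <stdbool.h>

/* Illustrative sequential enum; the values are consecutive, not bit flags. */
enum scrub_type {
	SCRUB_UNKNOWN = 0,
	SCRUB_NONE,
	SCRUB_SW_PROG,
	SCRUB_SW_SRC,		/* 3 */
	SCRUB_SW_PROG_SRC,
	SCRUB_SW_TUNABLE,
	SCRUB_HW_PROG,
	SCRUB_HW_SRC,		/* 7 */
};

/* Mask test: wrong for sequential values, since 7 & 3 is non-zero. */
bool wants_sw_scrub_mask(enum scrub_type mode)
{
	return (mode & SCRUB_SW_SRC) != 0;
}

/* Equality test: matches exactly one mode. */
bool wants_sw_scrub(enum scrub_type mode)
{
	return mode == SCRUB_SW_SRC;
}
```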
14 Feb, 2014
2 commits
-
We're using edac_mc_workq_setup() both on the init path, when
we load an edac driver and when we change the polling period
(edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec.
On that second path we don't need to init the workqueue which has been
initialized already.
Thanks to Tejun for workqueue insights.
Signed-off-by: Borislav Petkov
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc:
-
Sanitize code even more to accept unsigned longs only and to not allow
polling intervals below 1 second as this is unnecessary and doesn't make
much sense anyway for polling errors.
Signed-off-by: Borislav Petkov
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: Doug Thompson
Cc:
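A user-space sketch of the validation this commit describes: accept only unsigned values and reject polling intervals below one second. parse_poll_msec() is a hypothetical helper built on strtoul(); the kernel code uses its own parsing routines, which are not shown in the commit message.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a polling interval in milliseconds.
 * Returns 0 and stores the value on success, -EINVAL otherwise.
 */
int parse_poll_msec(const char *val, unsigned long *out)
{
	unsigned long v;
	char *end;

	if (!val)
		return -EINVAL;

	/* strtoul() silently wraps negative input, so reject it up front. */
	while (isspace((unsigned char)*val))
		val++;
	if (*val == '\0' || *val == '-')
		return -EINVAL;

	errno = 0;
	v = strtoul(val, &end, 0);
	if (errno || end == val)
		return -EINVAL;

	/* Allow one trailing newline, as written by "echo 2000 > file". */
	if (*end != '\0' && !(*end == '\n' && end[1] == '\0'))
		return -EINVAL;

	/* Polling more often than once per second is refused. */
	if (v < 1000)
		return -EINVAL;

	*out = v;
	return 0;
}
```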
05 Nov, 2013
1 commit
-
Log messages differ slightly between EDAC subsystems. Unify them.
Signed-off-by: Robert Richter
Acked-by: Rob Herring
Acked-by: Borislav Petkov
Signed-off-by: Robert Richter
24 Jul, 2013
1 commit
-
Fix the following:
BUG: key ffff88043bdd0330 not in .data!
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
Call Trace:
dump_stack
warn_slowpath_common
warn_slowpath_fmt
lockdep_init_map
? trace_hardirqs_on_caller
? trace_hardirqs_on
debug_mutex_init
__mutex_init
bus_register
edac_create_sysfs_mci_device
edac_mc_add_mc
sbridge_probe
pci_device_probe
driver_probe_device
__driver_attach
? driver_probe_device
bus_for_each_dev
driver_attach
bus_add_driver
driver_register
__pci_register_driver
? 0xffffffffa0010fff
sbridge_init
? 0xffffffffa0010fff
do_one_initcall
load_module
? unset_module_init_ro_nx
SyS_init_module
tracesys
---[ end trace d24a70b0d3ddf733 ]---
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC sbridge: Driver loaded.
What happens is that bus_register needs a statically allocated lock_key
because the last is handed in to lockdep. However, struct mem_ctl_info
embeds struct bus_type (the whole struct, not a pointer to it) and the
whole thing gets dynamically allocated.
Fix this by using a statically allocated struct bus_type for the MC bus.
Signed-off-by: Borislav Petkov
Acked-by: Mauro Carvalho Chehab
Cc: Markus Trippelsdorf
Cc: stable@kernel.org # v3.10
Signed-off-by: Tony Luck
16 Mar, 2013
1 commit
-
Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrows based. Merge both fields into one.
There's no need for the driver to actually fill it, as the core detects
it by checking if one of the layers has the csrows type as part of the
memory hierarchy:
	if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
		per_rank = true;
Signed-off-by: Mauro Carvalho Chehab
Signed-off-by: Borislav Petkov
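The detection quoted in the commit message can be wrapped into a complete function as follows. This is a sketch: the reduced enum, the struct layout and layers_are_csrow_based() are assumptions, and only EDAC_MC_LAYER_CHIP_SELECT and the layers[i].type test come from the message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced, illustrative set of layer types. */
enum edac_mc_layer_type {
	EDAC_MC_LAYER_BRANCH,
	EDAC_MC_LAYER_CHANNEL,
	EDAC_MC_LAYER_SLOT,
	EDAC_MC_LAYER_CHIP_SELECT,
};

struct edac_mc_layer {
	enum edac_mc_layer_type type;
	unsigned int size;
};

/* A controller is csrows based if any layer is of chip-select type. */
bool layers_are_csrow_based(const struct edac_mc_layer *layers,
			    size_t n_layers)
{
	bool per_rank = false;
	size_t i;

	for (i = 0; i < n_layers; i++)
		if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
			per_rank = true;

	return per_rank;
}
```

Deriving the flag from the layer description means a driver cannot set it inconsistently with the hierarchy it registered.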
22 Feb, 2013
2 commits
-
That allows APEI GHES driver to report errors directly, using
the EDAC error report API.
Signed-off-by: Mauro Carvalho Chehab
-
The number of variables on the stack is too big.
Reduce the stack usage by using a pre-allocated error
buffer.
Signed-off-by: Mauro Carvalho Chehab
21 Feb, 2013
3 commits
-
APEI GHES and i7core_edac/sb_edac currently can be loaded at
the same time, but those are Highlander modules:
"There can be only one".
There are two reasons for that:
1) Each driver assumes that it is the only one registering at
the EDAC core, as it is the driver's responsibility to number
the memory controllers, and all of them start from 0;
2) If BIOS is handling the memory errors, the OS can't also be
doing it, as one will interfere with the other.
So, we need to add a module owner's lock at the EDAC core,
in order to avoid having two different modules handling memory
errors at the same time. The best way of doing this lock seems
to be to use the driver's name, as this is unique, and won't require
changes on every driver.
Signed-off-by: Mauro Carvalho Chehab
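An owner lock keyed on the driver's name might look like the sketch below. It is an assumption-laden illustration: edac_mc_owner, claim_edac_core() and release_edac_core() are hypothetical names, the names are compared with strcmp() for clarity, and real code would also need to serialize access to the owner variable.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Name of the module that currently owns the core; NULL when unowned. */
static const char *edac_mc_owner;

/*
 * Let a driver claim the core. The first claimant wins; the same driver
 * may claim again (one call per memory controller); others get -EPERM.
 */
int claim_edac_core(const char *mod_name)
{
	if (edac_mc_owner && strcmp(edac_mc_owner, mod_name) != 0)
		return -EPERM;

	edac_mc_owner = mod_name;
	return 0;
}

/* Drop ownership once the owner's last controller has been removed. */
void release_edac_core(void)
{
	edac_mc_owner = NULL;
}
```

Using the name as the key lets a driver register several controllers while a second driver is refused, without adding a new field to every driver.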
-
There are some cases where the memory controller layout is
completely hidden. This is the case of firmware-driven error
code, like the one provided by GHES. Add a new layer to be
used on such memory error report mechanisms.
Signed-off-by: Mauro Carvalho Chehab
-
Linux 3.8-rc7
* tag 'v3.8-rc7': (12052 commits)
Linux 3.8-rc7
net: sctp: sctp_endpoint_free: zero out secret key data
net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
atm/iphase: rename fregt_t -> ffreg_t
ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned
ARM: DMA mapping: fix bad atomic test
ARM: realview: ensure that we have sufficient IRQs available
ARM: GIC: fix GIC cpumask initialization
net: usb: fix regression from FLAG_NOARP code
l2tp: dont play with skb->truesize
net: sctp: sctp_auth_key_put: use kzfree instead of kfree
netback: correct netbk_tx_err to handle wrap around.
xen/netback: free already allocated memory on failure in xen_netbk_get_requests
xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
xen/netback: shutdown the ring if it contains garbage.
drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try
virtio_console: Don't access uninitialized data.
net: qmi_wwan: add more Huawei devices, including E320
net: cdc_ncm: add another Huawei vendor specific device
ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
...
30 Jan, 2013
1 commit
-
First number, then size.
Signed-off-by: Joe Perches
Cc:
Signed-off-by: Borislav Petkov
21 Dec, 2012
1 commit
-
There are no more embedded kobjects in struct mem_ctl_info. Remove a header and
a comment that does not reflect the code anymore.
Signed-off-by: Shaun Ruffell
Signed-off-by: Mauro Carvalho Chehab
12 Dec, 2012
1 commit
-
Pull EDAC fixes from Borislav Petkov:
- EDAC core error path fix, from Denis Kirjanov.
- Generalization of AMD MCE bank names and some minor error reporting
improvements.
- EDAC core cleanups and simplifications, from Wei Yongjun.
- amd64_edac fixes for sysfs-reported values, from Josh Hunt.
- some heavy amd64_edac error reporting path shaving, leading to
removing a bunch of code.
- amd64_edac error injection method improvements.
- EDAC core cleanups and fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (24 commits)
EDAC, pci_sysfs: Use for_each_pci_dev to simplify the code
EDAC: Handle error path in edac_mc_sysfs_init() properly
MCE, AMD: Dump error status
MCE, AMD: Report decoded error type first
MCE, AMD: Dump CPU f/m/s triple with the error
MCE, AMD: Remove functional unit references
EDAC: Convert to use simple_open()
EDAC, Calxeda highbank: Convert to use simple_open()
EDAC: Fix mc size reported in sysfs
EDAC: Fix csrow size reported in sysfs
EDAC: Pass mci parent
EDAC: Add memory controller flags
amd64_edac: Fix csrows size and pages computation
amd64_edac: Use DBAM_DIMM macro
amd64_edac: Fix K8 chip select reporting
amd64_edac: Reorganize error reporting path
amd64_edac: Do not check whether error address is valid
amd64_edac: Improve error injection
amd64_edac: Cleanup error injection code
amd64_edac: Small fixlets and cleanups
...
04 Dec, 2012
1 commit
-
Pull EDAC fixes from Mauro Carvalho Chehab:
"One EDAC core fix, and a few driver fixes (i7300, i82975x, i7core)."
* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
i7core_edac: fix panic when accessing sysfs files
i7300_edac: Fix error flag testing
edac: Fix the dimm filling for csrows-based layouts
i82975x_edac: Fix dimm label initialization