17 Oct, 2020
1 commit
-
memory_failure() is supposed to call action_result() when it handles a
memory error event, but there's one missing case. So let's add it.

I find that include/ras/ras_event.h has some other MF_MSG_* undefined, so
this patch also adds them.

Signed-off-by: Naoya Horiguchi
Signed-off-by: Oscar Salvador
Signed-off-by: Andrew Morton
Cc: "Aneesh Kumar K.V"
Cc: Aneesh Kumar K.V
Cc: Aristeu Rozanski
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Dmitry Yakunin
Cc: Michal Hocko
Cc: Mike Kravetz
Cc: Oscar Salvador
Cc: Qian Cai
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20200922135650.1634-13-osalvador@suse.de
Signed-off-by: Linus Torvalds
25 Jan, 2019
1 commit
-
Commit 297b64c74385 ("ras: acpi / apei: generate trace event for
unrecognized CPER section") introduced an inconsistency in the UUID
types used across the RAS subsystem.

Fix this by using guid_t everywhere.
Signed-off-by: Andy Shevchenko
Signed-off-by: Borislav Petkov
Reviewed-by: Christoph Hellwig
Cc: "Rafael J. Wysocki"
Cc: "Steven Rostedt (VMware)"
Cc: Bjorn Helgaas
Cc: Thomas Tai
Cc: Tony Luck
Cc: Tyler Baicar
Link: https://lkml.kernel.org/r/20190125143035.81589-1-andriy.shevchenko@linux.intel.com
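To illustrate why mixing UUID types is a problem: guid_t stores its first three fields little-endian (the UEFI/ACPI convention), while uuid_t is big-endian throughout, so the same 16 bytes read as different identifiers depending on the type assumed. The sketch below uses only Python's standard uuid module; the sample bytes are made up for the example and do not come from this commit.

```python
import uuid

# Sixteen arbitrary bytes standing in for a CPER section type (made-up value).
raw = bytes(range(0x10, 0x20))

# guid_t view: the first three fields are stored little-endian.
as_guid = uuid.UUID(bytes_le=raw)

# uuid_t view: every field is stored big-endian.
as_uuid = uuid.UUID(bytes=raw)

print("guid_t view:", as_guid)
print("uuid_t view:", as_uuid)
```

The two printed strings differ in their first three groups, which is the kind of mismatch a single consistent type avoids.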
10 May, 2018
1 commit
-
When a PCIe AER error occurs, the TLP header information is printed in the
kernel message but it is missing from the tracepoint. A userspace program
can use this information in the tracepoint to better analyze problems.

To enable the tracepoint:

  echo 1 > /sys/kernel/debug/tracing/events/ras/aer_event/enable

Example tracepoint output:

  $ cat /sys/kernel/debug/tracing/trace
  aer_event: 0000:01:00.0
  PCIe Bus Error: severity=Uncorrected, non-fatal, Completer Abort
  TLP Header={0x0,0x1,0x2,0x3}

Signed-off-by: Thomas Tai
Signed-off-by: Bjorn Helgaas
Reviewed-by: Steven Rostedt (VMware)
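As a rough sketch of how a userspace script might pick the TLP header out of the textual output shown above: the helper name parse_tlp_header and the regular expression are assumptions made for this example, and the sample line is simply the example output joined onto one line.

```python
import re

# Matches the "TLP Header={...}" part of the example output.
TLP_RE = re.compile(r"TLP Header=\{([^}]*)\}")


def parse_tlp_header(text):
    """Return the TLP header dwords as a list of ints, or None if absent."""
    match = TLP_RE.search(text)
    if match is None:
        return None
    # int(..., 16) accepts the "0x" prefix used in the trace output.
    return [int(word, 16) for word in match.group(1).split(",")]


sample = (
    "aer_event: 0000:01:00.0 PCIe Bus Error: "
    "severity=Uncorrected, non-fatal, Completer Abort "
    "TLP Header={0x0,0x1,0x2,0x3}"
)
print(parse_tlp_header(sample))
```

Scraping the text is only convenient for quick inspection; a long-lived tool would more likely read the tracepoint's fields directly.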
02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
23 Jun, 2017
2 commits
-
Currently there are trace events for the various RAS
errors with the exception of ARM processor type errors.
Add a new trace event for such errors so that the user
will know when they occur. These trace events are
consistent with the ARM processor error section type
defined in UEFI 2.6 spec section N.2.4.4.

Signed-off-by: Tyler Baicar
Acked-by: Steven Rostedt
Reviewed-by: Xie XiuQi
Signed-off-by: Will Deacon
-
The UEFI spec includes non-standard section type support in the
Common Platform Error Record. This is defined in section N.2.3 of
UEFI version 2.5.

Currently if the CPER section's type (UUID) does not match any
section type that the kernel knows how to parse, a trace event is
not generated.

Generate a trace event which contains the raw error data for
non-standard section type error records.

Signed-off-by: Tyler Baicar
CC: Jonathan (Zhixiong) Zhang
Tested-by: Shiju Jose
Signed-off-by: Will Deacon
16 Jul, 2016
1 commit
-
__get_str(msg) no longer needs a (char *) cast to access
msg's elements. This patch replaces
((char *)__get_str(msg))[0] with __get_str(msg)[0].

It is just a code cleanup, with no changes to the tracepoint ABI.
Link: http://lkml.kernel.org/r/6f2db5be7705da2cb483923320c91283d7c712a7.1467407618.git.bristot@redhat.com
Cc: Trond Myklebust
Cc: Anna Schumaker
Cc: Ingo Molnar
Reviewed-by: Steven Rostedt
Signed-off-by: Daniel Bristot de Oliveira
Signed-off-by: Steven Rostedt
25 Jun, 2015
1 commit
-
RAS user space tools like rasdaemon, which are based on trace events, can
receive MCE error events but no memory recovery result event. So, I want
to add this event to make this scenario complete.

This patch adds an event to the ras group for memory-failure.
The output looks like this:
# tracer: nop
#
# entries-in-buffer/entries-written: 2/2   #P:24
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
      mce-inject-13150 [001] ....   277.019359: memory_failure_event: pfn 0x19869: recovery action for free buddy page: Delayed

[xiexiuqi@huawei.com: fix build error]
Signed-off-by: Xie XiuQi
Reviewed-by: Naoya Horiguchi
Acked-by: Steven Rostedt
Cc: Tony Luck
Cc: Chen Gong
Cc: Jim Davis
Signed-off-by: Xie XiuQi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
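A minimal sketch of how a script could split the memory_failure_event line shown above into its parts. The function name, the dictionary keys and the regular expression are assumptions for illustration; only the sample line is taken from the commit message.

```python
import re

# pfn, page type and recovery result, in the order they appear in the output.
EVENT_RE = re.compile(
    r"memory_failure_event: pfn (0x[0-9a-fA-F]+): "
    r"recovery action for (.+): (\w+)$"
)


def parse_memory_failure(line):
    """Return a dict describing the event, or None if the line does not match."""
    match = EVENT_RE.search(line.strip())
    if match is None:
        return None
    return {
        "pfn": int(match.group(1), 16),
        "page_type": match.group(2),
        "result": match.group(3),
    }


sample = (
    "mce-inject-13150 [001] .... 277.019359: memory_failure_event: "
    "pfn 0x19869: recovery action for free buddy page: Delayed"
)
event = parse_memory_failure(sample)
print(event)
```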
25 Sep, 2014
3 commits
-
In PCIe r1.0, sec 5.10.2, bit 0 of the Uncorrectable Error Status, Mask,
and Severity Registers was for "Training Error." In PCIe r1.1, sec 7.10.2,
bit 0 was redefined to be "Undefined."

Rename PCI_ERR_UNC_TRAIN to PCI_ERR_UNC_UND to reflect this change.
No functional change.

[bhelgaas: changelog]
Signed-off-by: Chen, Gong
Signed-off-by: Bjorn Helgaas
-
Add all AER error bits defined in PCIe r3.0.
[bhelgaas: changelog]
Signed-off-by: Chen, Gong
Signed-off-by: Bjorn Helgaas
-
Replace bare numbers like "BIT(0)" with the existing #defines, e.g.,
PCI_ERR_COR_RCVR, to improve maintainability. This way grep will find more
uses of the #defines.

No functional change.
[bhelgaas: changelog]
Signed-off-by: Chen, Gong
Signed-off-by: Bjorn Helgaas
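The idea behind the three AER changes above, a table keyed by named bit constants instead of bare BIT(n) values, can be sketched in a few lines. This is an illustration only: it covers a small subset of the uncorrectable bits, the values and names are written from memory of the kernel's PCI register header and should be checked against it, and decode_uncorrectable is a hypothetical helper.

```python
# Subset of uncorrectable error status bits (assumed values, verify before use).
PCI_ERR_UNC_UND = 0x00000001         # bit 0: "Undefined" since PCIe r1.1
PCI_ERR_UNC_DLP = 0x00000010         # Data Link Protocol Error
PCI_ERR_UNC_POISON_TLP = 0x00001000  # Poisoned TLP
PCI_ERR_UNC_COMP_ABORT = 0x00008000  # Completer Abort

# Named constants keep the table searchable, unlike bare bit numbers.
UNCORRECTABLE_ERRORS = [
    (PCI_ERR_UNC_UND, "Undefined"),
    (PCI_ERR_UNC_DLP, "Data Link Protocol Error"),
    (PCI_ERR_UNC_POISON_TLP, "Poisoned TLP"),
    (PCI_ERR_UNC_COMP_ABORT, "Completer Abort"),
]


def decode_uncorrectable(status):
    """Return the names of all known bits set in a status word."""
    return [name for bit, name in UNCORRECTABLE_ERRORS if status & bit]


print(decode_uncorrectable(PCI_ERR_UNC_COMP_ABORT))
print(decode_uncorrectable(PCI_ERR_UNC_UND | PCI_ERR_UNC_POISON_TLP))
```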
26 Jun, 2014
1 commit
-
Add a trace interface that reports all H/W error related information in detail.
Signed-off-by: Chen, Gong
Acked-by: Borislav Petkov
Signed-off-by: Tony Luck
24 Jun, 2014
1 commit
-
AER currently uses a separate trace interface. To make it
consistent, move it into the unified RAS trace interface.

Signed-off-by: Chen, Gong
Acked-by: Borislav Petkov
Signed-off-by: Tony Luck
22 Feb, 2013
1 commit
-
The CPER spec defines a fourth type of error: informational
logs. Add support for it in the EDAC API and in the
trace event interface.

Signed-off-by: Mauro Carvalho Chehab
11 Jun, 2012
1 commit
-
Add a new tracepoint-based hardware events report method for
reporting Memory Controller events.

Part of the description below is shamelessly copied from Tony
Luck's notes about the Hardware Error BoF during LPC 2010 [1].
Tony, thanks for your notes and discussions to generate the
h/w error reporting requirements.

[1] http://lwn.net/Articles/416669/

We have several subsystems & methods for reporting hardware errors:

1) EDAC ("Error Detection and Correction"). In its original form
this consisted of a platform specific driver that read topology
information and error counts from chipset registers and reported
the results via a sysfs interface.

2) mcelog - x86 specific decoding of machine check bank registers
reporting in binary form via /dev/mcelog. Recent additions make use
of the APEI extensions that were documented in version 4.0a of the
ACPI specification to acquire more information about errors without
having to rely on reading chipset registers directly. A user level
program decodes them into a somewhat human readable format.

3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
decodes errors reported via machine check bank registers in AMD
processors to the console log using printk();

Each of these mechanisms has a band of followers ... and none
of them appear to meet all the needs of all users.
of them appear to meet all the needs of all users.As part of a RAS subsystem, let's encapsulate the memory error hardware
events into a trace facility.The tracepoint printk will be displayed like:
mc_event: [quant] (Corrected|Uncorrected|Fatal) error:[error msg] on [label] ([location] [edac_mc detail] [driver detail])
Where:
[quant] is the quantity of errors;
[error msg] is the driver-specific error message
(e.g. "memory read", "bus error", ...);
[location] is the location in terms of memory controller and
branch/channel/slot, channel/slot or csrow/channel;
[label] is the memory stick label;
[edac_mc detail] describes the address location of the error
and the syndrome;
[driver detail] is driver-specific error message details,
when needed/provided (e.g. "area:DMA", ...).

For example:
mc_event: 1 Corrected error:memory read on memory stick DIMM_1A (mc:0 location:0:0:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)
Of course, any userspace tools meant to handle errors should not parse
the above data. They should, instead, use the binary fields provided by
the tracepoint, mapping them directly into their Management Information
Base.

NOTE: The original patch was providing an additional mechanism for
MCA-based trace events that also contained MCA error register data.
However, as no agreement was reached so far for the MCA-based trace
events, for now, let's add events only for memory errors.
A later patch is planned to change the tracepoint, for those types
of event.

Cc: Aristeu Rozanski
Cc: Doug Thompson
Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Signed-off-by: Mauro Carvalho Chehab
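The mc_event layout described in this commit message can be sketched as a small formatter that goes from structured fields to the printed line, which is the direction the message recommends (tools keep the fields, the text is for humans). The function and parameter names below are hypothetical and only mirror the bracketed placeholders; this is not the kernel's own format string.

```python
def format_mc_event(quant, severity, error_msg, label,
                    location, edac_mc_detail, driver_detail=""):
    """Build a line following the mc_event template from the commit message."""
    # The driver detail is optional, so only add it (and its space) when given.
    tail = f" {driver_detail}" if driver_detail else ""
    return (
        f"mc_event: {quant} {severity} error:{error_msg} on {label} "
        f"({location} {edac_mc_detail}{tail})"
    )


line = format_mc_event(
    quant=1,
    severity="Corrected",
    error_msg="memory read",
    label="memory stick DIMM_1A",
    location="mc:0 location:0:0:0",
    edac_mc_detail="page:0x586b6e offset:0xa66 grain:32 syndrome:0x0",
    driver_detail="area:DMA",
)
print(line)
```

With these inputs the result matches the example line given in the commit message.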