26 Sep, 2020

1 commit


20 Aug, 2020

1 commit

  • late_initcall() expects a function that returns an integer. Update the
    function signature to match.

    [ bp: Massage commit message into proper sentences. ]

    Fixes: 9554bfe403bd ("x86/mce: Convert the CEC to use the MCE notifier")
    Signed-off-by: Luca Stefani
    Signed-off-by: Borislav Petkov
    Reviewed-by: Sami Tolvanen
    Tested-by: Sami Tolvanen
    Link: https://lkml.kernel.org/r/20200805095708.83939-1-luca.stefani.ge1@gmail.com

    Luca Stefani
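
The signature requirement in the entry above can be illustrated with a small userspace sketch. It is not kernel code: `initcall_fn`, `demo_cec_init()` and `run_initcalls()` are hypothetical stand-ins, and the only point is that a table of init functions typed as "returns int" needs every entry to have exactly that signature.

```c
#include <stddef.h>

/* Userspace stand-in for an initcall type: no arguments, int status
 * (0 on success, non-zero on failure). Hypothetical name. */
typedef int (*initcall_fn)(void);

static int cec_ready; /* hypothetical state flag, for illustration only */

/* An init routine whose signature matches initcall_fn, so it can be
 * stored in the table below without a cast. */
static int demo_cec_init(void)
{
	cec_ready = 1;
	return 0;
}

/* Run every registered init function; return how many reported failure. */
static int run_initcalls(const initcall_fn *calls, size_t n)
{
	int failures = 0;

	for (size_t i = 0; i < n; i++)
		if (calls[i]() != 0)
			failures++;

	return failures;
}
```

Calling a function through a pointer of a different type is undefined behaviour in C, and control-flow integrity checks can reject it at run time, which is why the declared return type matters even when the value is ignored.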
     

14 Apr, 2020

2 commits

  • If the handler took any action to log or deal with the error, set a bit
    in mce->kflags so that the default handler on the end of the machine
    check chain can see what has been done.

    Get rid of NOTIFY_STOP returns. Make the EDAC and dev-mcelog handlers
    skip over errors already processed by CEC.

    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Tested-by: Tony Luck
    Link: https://lkml.kernel.org/r/20200214222720.13168-5-tony.luck@intel.com

    Tony Luck
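
A rough userspace model of the flag-based scheme described in the entry above, assuming hypothetical names throughout (`demo_mce`, the `DEMO_*` bits and the three handlers are not the kernel's identifiers). Every handler on the chain runs; instead of stopping the chain, a handler records what it did and later handlers consult that record.

```c
#include <stddef.h>

/* Hypothetical flag bits; the real mce->kflags bit names are not given
 * in the text above. */
#define DEMO_HANDLED_CEC    (1u << 0)
#define DEMO_HANDLED_EDAC   (1u << 1)
#define DEMO_LOGGED_DEFAULT (1u << 2)

struct demo_mce {
	unsigned long long addr;
	int corrected;        /* non-zero for a corrected error */
	unsigned int kflags;  /* which handlers acted on this record */
};

typedef void (*demo_handler)(struct demo_mce *m);

/* First handler: takes corrected errors and records that it did so. */
static void demo_cec(struct demo_mce *m)
{
	if (m->corrected)
		m->kflags |= DEMO_HANDLED_CEC;
}

/* Later handler: skips records the first handler already processed. */
static void demo_edac(struct demo_mce *m)
{
	if (m->kflags & DEMO_HANDLED_CEC)
		return;
	m->kflags |= DEMO_HANDLED_EDAC;
}

/* Last handler: acts only when nobody earlier did anything. */
static void demo_default(struct demo_mce *m)
{
	if (m->kflags == 0)
		m->kflags |= DEMO_LOGGED_DEFAULT;
}

/* Every handler is called; none can cut the chain short. */
static void demo_run_chain(struct demo_mce *m)
{
	static const demo_handler chain[] = { demo_cec, demo_edac, demo_default };

	for (size_t i = 0; i < sizeof(chain) / sizeof(chain[0]); i++)
		chain[i](m);
}
```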
     
  • The CEC code has its claws in a couple of routines in mce/core.c.
    Convert it to just register itself on the normal MCE notifier chain.

    [ bp: Make cec_add_elem() and cec_init() static. ]

    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Tested-by: Tony Luck
    Link: https://lkml.kernel.org/r/20200214222720.13168-3-tony.luck@intel.com

    Tony Luck
     

08 Aug, 2019

2 commits

  • In addition, the 0day bot reported this build error:

    >> drivers/ras/debugfs.c:10:5: error: redefinition of 'ras_userspace_consumers'
    int ras_userspace_consumers(void)
    ^~~~~~~~~~~~~~~~~~~~~~~
    In file included from drivers/ras/debugfs.c:3:0:
    include/linux/ras.h:14:19: note: previous definition of 'ras_userspace_consumers' was here
    static inline int ras_userspace_consumers(void) { return 0; }
    ^~~~~~~~~~~~~~~~~~~~~~~

    for a riscv-specific .config where CONFIG_DEBUG_FS is not set. Fix all
    that by making debugfs.o depend on that define.

    [ bp: Rewrite commit message. ]

    Reported-by: kbuild test robot
    Signed-off-by: Valdis Kletnieks
    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac@vger.kernel.org
    Cc: x86@kernel.org
    Link: http://lkml.kernel.org/r/7053.1565218556@turing-police

    Valdis Kletnieks
     
  • When building with C=2 and/or W=1, legitimate warnings are issued about
    missing prototypes:

    CHECK drivers/ras/debugfs.c
    drivers/ras/debugfs.c:4:15: warning: symbol 'ras_debugfs_dir' was not declared. Should it be static?
    drivers/ras/debugfs.c:8:5: warning: symbol 'ras_userspace_consumers' was not declared. Should it be static?
    drivers/ras/debugfs.c:38:12: warning: symbol 'ras_add_daemon_trace' was not declared. Should it be static?
    drivers/ras/debugfs.c:54:13: warning: symbol 'ras_debugfs_init' was not declared. Should it be static?
    CC drivers/ras/debugfs.o
    drivers/ras/debugfs.c:8:5: warning: no previous prototype for 'ras_userspace_consumers' [-Wmissing-prototypes]
    8 | int ras_userspace_consumers(void)
    | ^~~~~~~~~~~~~~~~~~~~~~~
    drivers/ras/debugfs.c:38:12: warning: no previous prototype for 'ras_add_daemon_trace' [-Wmissing-prototypes]
    38 | int __init ras_add_daemon_trace(void)
    | ^~~~~~~~~~~~~~~~~~~~
    drivers/ras/debugfs.c:54:13: warning: no previous prototype for 'ras_debugfs_init' [-Wmissing-prototypes]
    54 | void __init ras_debugfs_init(void)
    | ^~~~~~~~~~~~~~~~

    Provide the proper includes.

    [ bp: Take care of the same warnings for cec.c too. ]

    Signed-off-by: Valdis Kletnieks
    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac@vger.kernel.org
    Cc: x86@kernel.org
    Link: http://lkml.kernel.org/r/7168.1565218769@turing-police

    Valdis Klētnieks
     

08 Jun, 2019

11 commits

  • Signed-off-by: Borislav Petkov

    Borislav Petkov
     
  • The pfn and array files in (debugfs)/ras/cec are intended for debugging
    the CEC code itself. They are not needed on production systems, so the
    default setting for this CONFIG option is "n".

    [ bp: Have it with less ifdeffery by using IS_ENABLED(). ]

    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov

    Tony Luck
     
  • When dumping the array elements, print them in the following format:

    [ PFN | generation in binary | count ]

    to be perfectly clear what all those sections are.

    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac

    Borislav Petkov
     
  • ... which is the better, more-fitting name anyway.

    Tony:
    - make action_threshold u64 due to debugfs accessors expecting u64.
    - rename the remaining: s/count_threshold/action_threshold/g

    Co-developed-by: Tony Luck
    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Cc: linux-edac

    Borislav Petkov
     
  • Check the elements order in the array after every insertion.

    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac

    Borislav Petkov
     
  • Free the array page if a failure is encountered while creating the
    debugfs nodes.

    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac

    Borislav Petkov
     
  • When the value requested doesn't match the allowed (min,max) range,
    the @data buffer should not be modified with the invalid value because
    reading "decay_interval" shows it otherwise as if the previous write
    succeeded.

    Move the data write after the check.

    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac

    Borislav Petkov
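
The ordering fix in the entry above, validate first and store afterwards, can be sketched as follows. The bounds and the name `demo_interval_set()` are assumptions made for illustration; the real limits are not stated in the commit message.

```c
#include <errno.h>

/* Hypothetical bounds, chosen only for the example. */
#define DEMO_INTERVAL_MIN 3600ULL
#define DEMO_INTERVAL_MAX 2592000ULL

/* Store val in *data only if it lies inside the allowed range. */
static int demo_interval_set(unsigned long long *data, unsigned long long val)
{
	if (val < DEMO_INTERVAL_MIN || val > DEMO_INTERVAL_MAX)
		return -EINVAL;   /* rejected: *data keeps its old value */

	*data = val;          /* written only after the check passed */
	return 0;
}
```

With the write placed before the check, a rejected value would still be visible on the next read, which is the symptom the commit message describes.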
     
  • The count_threshold should be checked unconditionally, after insertion
    too, so that a count_threshold value of 1 can cause an immediate
    offlining. I.e., offline the page on the *first* error encountered.

    Add comments to make it clear what cec_add_elem() does, while at it.

    Reported-by: WANG Chao
    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190418034115.75954-3-chao.wang@ucloud.cn

    Borislav Petkov
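
A simplified model of the behaviour described in the entry above: the threshold test sits after both the "found" and the "freshly inserted" paths, so a threshold of 1 triggers on the first error. This sketch uses a small unsorted table and hypothetical names (`demo_cec`, `demo_add_elem()`); it is not how the real cec_add_elem() stores its data.

```c
#define DEMO_MAX 8

struct demo_elem {
	unsigned long long pfn;
	unsigned int count;
};

struct demo_cec {
	struct demo_elem e[DEMO_MAX];
	int n;
	unsigned int threshold;
};

/* Returns 1 when the page reached the threshold (caller would offline it),
 * 0 when the error was only recorded, -1 when the table is full. */
static int demo_add_elem(struct demo_cec *c, unsigned long long pfn)
{
	int i;

	for (i = 0; i < c->n; i++)
		if (c->e[i].pfn == pfn)
			break;

	if (i == c->n) {              /* not seen before: insert it */
		if (c->n == DEMO_MAX)
			return -1;
		c->e[i].pfn = pfn;
		c->e[i].count = 0;
		c->n++;
	}

	c->e[i].count++;

	/* Checked on every path, including right after a fresh insertion. */
	if (c->e[i].count >= c->threshold) {
		c->e[i] = c->e[c->n - 1]; /* drop the entry from the table */
		c->n--;
		return 1;
	}

	return 0;
}
```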
     
  • When inserting random PFNs for debugging the CEC through
    (debugfs)/ras/cec/pfn, depending on the return value of pfn_set(),
    multiple values get inserted per a single write.

    That is because simple_attr_write() interprets a retval of 0 as
    success and claims the whole input. However, pfn_set() returns the
    cec_add_elem() value, which, if > 0 and smaller than the whole input
    length, makes glibc continue issuing the write syscall as long as there's
    input left:

    pfn_set
    simple_attr_write
    debugfs_attr_write
    full_proxy_write
    vfs_write
    ksys_write
    do_syscall_64
    entry_SYSCALL_64_after_hwframe

    leading to those repeated calls.

    Return 0 to fix that.

    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac

    Borislav Petkov
     
  • cec_timer_fn() is a timer callback which reads ce_arr.array[] and
    updates its decay values. However, it runs in interrupt context and the
    mutex protection the CEC uses for that array is inadequate. Convert the
    used timer to a workqueue to keep the tasks the CEC performs preemptible
    and thus low-prio.

    [ bp: Rewrite commit message.
    s/timer/decay/gi to make it agnostic as to what facility is used. ]

    Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
    Signed-off-by: Cong Wang
    Signed-off-by: Borislav Petkov
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190416213351.28999-2-xiyou.wangcong@gmail.com

    Cong Wang
     
  • Switch to using Donald Knuth's binary search algorithm (The Art of
    Computer Programming, vol. 3, section 6.2.1). This should've been done
    from the very beginning but the author must've been smoking something
    very potent at the time.

    The problem with the current one was that it would return the wrong
    element index in certain situations:

    https://lkml.kernel.org/r/CAM_iQpVd02zkVJ846cj-Fg1yUNuz6tY5q1Vpj4LrXmE06dPYYg@mail.gmail.com

    and the noodling code after the loop was fishy at best.

    So switch to using Knuth's binary search. The final result is much
    cleaner and straightforward.

    Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
    Reported-by: Cong Wang
    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac

    Borislav Petkov
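
For reference, a textbook binary search over a sorted array in the style the entry above refers to. This is a generic sketch, not the code from the commit; `demo_find()` and its `to` out-parameter are illustrative names. The useful property is that a failed search leaves `min` at the position where the key would have to be inserted, so no fix-up code is needed after the loop.

```c
/*
 * Search the sorted array arr[0..n-1] for key.
 * Returns the index of key if present, -1 if absent. In both cases *to
 * (when non-NULL) receives the index where key is, or where it would
 * have to be inserted to keep the array sorted.
 */
static int demo_find(const unsigned long long *arr, int n,
		     unsigned long long key, int *to)
{
	int min = 0, max = n - 1;

	while (min <= max) {
		int mid = min + (max - min) / 2;

		if (arr[mid] < key) {
			min = mid + 1;
		} else if (arr[mid] > key) {
			max = mid - 1;
		} else {
			if (to)
				*to = mid;
			return mid;
		}
	}

	if (to)
		*to = min;

	return -1;
}
```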
     

21 May, 2019

2 commits

  • Add SPDX license identifiers to all Make/Kconfig files which:

    - Have no license information of any form

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

20 Apr, 2019

1 commit


25 Jan, 2019

1 commit

  • The commit

    297b64c74385 ("ras: acpi / apei: generate trace event for unrecognized CPER section")

    brought inconsistency in UUID types which are used across the RAS
    subsystem.

    Fix this by using guid_t everywhere.

    Signed-off-by: Andy Shevchenko
    Signed-off-by: Borislav Petkov
    Reviewed-by: Christoph Hellwig
    Cc: "Rafael J. Wysocki"
    Cc: "Steven Rostedt (VMware)"
    Cc: Bjorn Helgaas
    Cc: Thomas Tai
    Cc: Tony Luck
    Cc: Tyler Baicar
    Link: https://lkml.kernel.org/r/20190125143035.81589-1-andriy.shevchenko@linux.intel.com

    Andy Shevchenko
     

21 Dec, 2018

1 commit

  • The Kconfig lexer supports special characters such as '.' and '/' in
    the parameter context. In my understanding, the reason is just to
    support bare file paths in the source statement.

    I do not see a good reason to complicate Kconfig for the room of
    ambiguity.

    The majority of code already surrounds file paths with double quotes,
    and it makes sense since file paths are constant string literals.

    Make it treewide consistent now.

    Signed-off-by: Masahiro Yamada
    Acked-by: Wolfram Sang
    Acked-by: Geert Uytterhoeven
    Acked-by: Ingo Molnar

    Masahiro Yamada
     

24 Jan, 2018

1 commit


14 Nov, 2017

1 commit

  • Pull timer updates from Thomas Gleixner:
    "Yet another big pile of changes:

    - More year 2038 work from Arnd slowly reaching the point where we
    need to think about the syscalls themselves.

    - A new timer function which allows to conditionally (re)arm a timer
    only when it's either not running or the new expiry time is sooner
    than the armed expiry time. This allows to use a single timer for
    multiple timeout requirements w/o caring about the first expiry
    time at the call site.

    - A new NMI safe accessor to clock real time for the printk timestamp
    work. Can be used by tracing, perf as well if required.

    - A large number of timer setup conversions from Kees which got
    collected here because either maintainers requested so or they
    simply got ignored. As Kees pointed out already there are a few
    trivial merge conflicts and some redundant commits which was
    unavoidable due to the size of this conversion effort.

    - Avoid a redundant iteration in the timer wheel softirq processing.

    - Provide a mechanism to treat RTC implementations depending on their
    hardware properties, i.e. don't inflict the write at the 0.5
    seconds boundary which originates from the PC CMOS RTC to all RTCs.
    No functional change as drivers need to be updated separately.

    - The usual small updates to core code clocksource drivers. Nothing
    really exciting"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits)
    timers: Add a function to start/reduce a timer
    pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday()
    timer: Prepare to change all DEFINE_TIMER() callbacks
    netfilter: ipvs: Convert timers to use timer_setup()
    scsi: qla2xxx: Convert timers to use timer_setup()
    block/aoe: discover_timer: Convert timers to use timer_setup()
    ide: Convert timers to use timer_setup()
    drbd: Convert timers to use timer_setup()
    mailbox: Convert timers to use timer_setup()
    crypto: Convert timers to use timer_setup()
    drivers/pcmcia: omap1: Fix error in automated timer conversion
    ARM: footbridge: Fix typo in timer conversion
    drivers/sgi-xp: Convert timers to use timer_setup()
    drivers/pcmcia: Convert timers to use timer_setup()
    drivers/memstick: Convert timers to use timer_setup()
    drivers/macintosh: Convert timers to use timer_setup()
    hwrng/xgene-rng: Convert timers to use timer_setup()
    auxdisplay: Convert timers to use timer_setup()
    sparc/led: Convert timers to use timer_setup()
    mips: ip22/32: Convert timers to use timer_setup()
    ...

    Linus Torvalds
     

02 Nov, 2017

2 commits

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • In preparation for unconditionally passing the struct timer_list pointer to
    all timer callbacks, switch to using the new timer_setup() and from_timer()
    to pass the timer pointer explicitly.

    Cc: Borislav Petkov
    Cc: Thomas Gleixner
    Cc: Christophe JAILLET
    Cc: Nicolas Iooss
    Cc: Ingo Molnar
    Signed-off-by: Kees Cook
    Reviewed-by: Borislav Petkov

    Kees Cook
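
The idea behind passing the timer pointer to the callback, as described in the entry above, is that the callback can recover its surrounding object from the embedded timer. The sketch below models that in userspace with a mock timer type; `demo_timer`, `demo_container_of()` and the other `demo_*` names are hypothetical and only imitate the pattern, they are not the kernel's timer_setup()/from_timer() implementation.

```c
#include <stddef.h>

/* Userspace mock of a timer object; not the kernel's struct timer_list. */
struct demo_timer {
	void (*function)(struct demo_timer *t);
};

/* Recover a pointer to the enclosing struct from a pointer to its member. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_ce_array {
	unsigned int decay_count;
	struct demo_timer timer;   /* embedded, so the callback can find us */
};

/* The callback gets only the timer pointer and derives its context from it. */
static void demo_decay_fn(struct demo_timer *t)
{
	struct demo_ce_array *ca =
		demo_container_of(t, struct demo_ce_array, timer);

	ca->decay_count++;
}

/* Mock setup: just record the callback. */
static void demo_timer_setup(struct demo_timer *t,
			     void (*fn)(struct demo_timer *))
{
	t->function = fn;
}

/* Mock expiry: invoke the callback with the timer itself as argument. */
static void demo_timer_fire(struct demo_timer *t)
{
	t->function(t);
}
```

Compared with passing an opaque integer argument, this keeps the callback type-checked and removes the need for casts at the registration site.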
     

05 Oct, 2017

1 commit

  • parse_cec_param() compares a string with "cec_disable" using only 7
    characters of the 11-character-long string.

    The proper solution for this would be:

    #define CEC_DISABLE "cec_disable"

    strncmp(str, CEC_DISABLE, strlen(CEC_DISABLE))

    but when comparing a string against a string constant strncmp() has no
    advantage over strcmp() because the comparison is guaranteed to be bound by
    the string constant. So just replace strncmp() with strcmp().

    [ tglx: Made it use strcmp and updated the changelog ]

    Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
    Signed-off-by: Nicolas Iooss
    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20170903075440.30250-1-nicolas.iooss_linux@m4x.org

    Nicolas Iooss
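
The difference described in the entry above is easy to reproduce in userspace. The two helpers below are illustrative only (`demo_match_prefix7()` and `demo_match_full()` are made-up names): the first mimics a comparison limited to 7 characters, the second compares the whole string.

```c
#include <string.h>

/* Compares only the first 7 characters, i.e. "cec_dis". */
static int demo_match_prefix7(const char *str)
{
	return strncmp(str, "cec_disable", 7) == 0;
}

/* Compares the whole string against the constant. */
static int demo_match_full(const char *str)
{
	return strcmp(str, "cec_disable") == 0;
}
```

With the length-limited version, any parameter that merely starts with "cec_dis" is accepted; the full comparison accepts exactly "cec_disable" and nothing else.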
     

06 Jul, 2017

1 commit

  • Pull arm64 updates from Will Deacon:

    - RAS reporting via GHES/APEI (ACPI)

    - Indirect ftrace trampolines for modules

    - Improvements to kernel fault reporting

    - Page poisoning

    - Sigframe cleanups and preparation for SVE context

    - Core dump fixes

    - Sparse fixes (mainly relating to endianness)

    - xgene SoC PMU v3 driver

    - Misc cleanups and non-critical fixes

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits)
    arm64: fix endianness annotation for 'struct jit_ctx' and friends
    arm64: cpuinfo: constify attribute_group structures.
    arm64: ptrace: Fix incorrect get_user() use in compat_vfp_set()
    arm64: ptrace: Remove redundant overrun check from compat_vfp_set()
    arm64: ptrace: Avoid setting compat FP[SC]R to garbage if get_user fails
    arm64: fix endianness annotation for __apply_alternatives()/get_alt_insn()
    arm64: fix endianness annotation in get_kaslr_seed()
    arm64: add missing conversion to __wsum in ip_fast_csum()
    arm64: fix endianness annotation in acpi_parking_protocol.c
    arm64: use readq() instead of readl() to read 64bit entry_point
    arm64: fix endianness annotation for reloc_insn_movw() & reloc_insn_imm()
    arm64: fix endianness annotation for aarch64_insn_write()
    arm64: fix endianness annotation in aarch64_insn_read()
    arm64: fix endianness annotation in call_undef_hook()
    arm64: fix endianness annotation for debug-monitors.c
    ras: mark stub functions as 'inline'
    arm64: pass endianness info to sparse
    arm64: ftrace: fix !CONFIG_ARM64_MODULE_PLTS kernels
    arm64: signal: Allow expansion of the signal frame
    acpi: apei: check for pending errors when probing GHES entries
    ...

    Linus Torvalds
     

26 Jun, 2017

1 commit

  • Check the correct variable when handling a potential error from
    debugfs_create_file(). Most likely a copy-paste botch.

    [ Rewrite commit message. ]
    Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
    Signed-off-by: Christophe JAILLET
    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20170623062440.6726-1-christophe.jaillet@wanadoo.fr

    Christophe JAILLET
     

23 Jun, 2017

2 commits

  • Currently there are trace events for the various RAS
    errors with the exception of ARM processor type errors.
    Add a new trace event for such errors so that the user
    will know when they occur. These trace events are
    consistent with the ARM processor error section type
    defined in UEFI 2.6 spec section N.2.4.4.

    Signed-off-by: Tyler Baicar
    Acked-by: Steven Rostedt
    Reviewed-by: Xie XiuQi
    Signed-off-by: Will Deacon

    Tyler Baicar
     
  • The UEFI spec includes non-standard section type support in the
    Common Platform Error Record. This is defined in section N.2.3 of
    UEFI version 2.5.

    Currently if the CPER section's type (UUID) does not match any
    section type that the kernel knows how to parse, a trace event is
    not generated.

    Generate a trace event which contains the raw error data for
    non-standard section type error records.

    Signed-off-by: Tyler Baicar
    CC: Jonathan (Zhixiong) Zhang
    Tested-by: Shiju Jose
    Signed-off-by: Will Deacon

    Tyler Baicar
     

22 May, 2017

1 commit


28 Mar, 2017

1 commit

  • Introduce a simple data structure for collecting correctable errors
    along with accessors. More detailed description in the code itself.

    The error decoding is done with the decoding chain now and
    mce_first_notifier() gets to see the error first and the CEC decides
    whether to log it and then the rest of the chain doesn't hear about it -
    basically the main reason for the CE collector - or to continue running
    the notifiers.

    When the CEC hits the action threshold, it will try to soft-offline the
    page containing the ECC and then the whole decoding chain gets to see
    the error.

    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/20170327093304.10683-5-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     

13 Aug, 2015

2 commits

  • This is an x86-specific module and would benefit from being
    closer to the arch code. Move it there. Update copyright while
    at it.

    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Link: http://lkml.kernel.org/r/1439396985-12812-14-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • Text taken from a previous patch by "Gong Chen".

    Signed-off-by: Borislav Petkov
    Cc: Gong Chen
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Link: http://lkml.kernel.org/r/1439396985-12812-11-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     

26 Jun, 2014

2 commits

  • Add trace interface to elaborate all H/W error related information.

    Signed-off-by: Chen, Gong
    Acked-by: Borislav Petkov
    Signed-off-by: Tony Luck

    Chen, Gong
     
  • Implement a new debugfs interface for the RAS subsystem.
    A file named daemon_active is added there accordingly.
    This file is used to track whether a user space daemon accesses
    the perf/trace interface or not. One can track which daemon
    opens it via "lsof /path/to/debugfs/ras/daemon_active".

    Signed-off-by: Chen, Gong
    Link: http://lkml.kernel.org/r/1402475691-30045-5-git-send-email-gong.chen@linux.intel.com
    Signed-off-by: Borislav Petkov
    Signed-off-by: Tony Luck

    Chen, Gong
     

24 Jun, 2014

1 commit

  • To avoid confusion and conflicting usage of RAS-related trace events,
    add a unified RAS trace event stub.

    Start a RAS subsystem menu which will be fleshed out in time, when more
    features get added to it.

    Signed-off-by: Chen, Gong
    Link: http://lkml.kernel.org/r/1402475691-30045-2-git-send-email-gong.chen@linux.intel.com
    Signed-off-by: Borislav Petkov
    Signed-off-by: Tony Luck

    Chen, Gong