26 Sep, 2020

1 commit


20 Aug, 2020

1 commit

  • late_initcall() expects a function that returns an integer. Update the
    function signature to match.

    [ bp: Massage commit message into proper sentences. ]

    Fixes: 9554bfe403bd ("x86/mce: Convert the CEC to use the MCE notifier")
    Signed-off-by: Luca Stefani
    Signed-off-by: Borislav Petkov
    Reviewed-by: Sami Tolvanen
    Tested-by: Sami Tolvanen
    Link: https://lkml.kernel.org/r/20200805095708.83939-1-luca.stefani.ge1@gmail.com

    Luca Stefani
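
A minimal userspace sketch of the signature issue described above, assuming an initcall slot stores a pointer to a function that takes no arguments and returns an integer. The names `demo_initcall_t`, `demo_cec_init` and `demo_run_initcall` are hypothetical and exist only for illustration; this is not the kernel's late_initcall() machinery.

```c
/* Hypothetical stand-in for the function-pointer type an initcall slot stores. */
typedef int (*demo_initcall_t)(void);

/* The init function returns int so that it matches the pointer type exactly. */
int demo_cec_init(void)
{
	return 0; /* 0 signals success in this sketch */
}

/* Call an init function through the typed pointer and hand back its result. */
int demo_run_initcall(demo_initcall_t fn)
{
	return fn();
}
```

A function declared as `void demo_cec_init(void)` would not be compatible with `demo_initcall_t`, and calling it through that pointer type is undefined behavior in C.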
     

14 Apr, 2020

2 commits

  • If the handler took any action to log or deal with the error, set a bit
    in mce->kflags so that the default handler on the end of the machine
    check chain can see what has been done.

    Get rid of NOTIFY_STOP returns. Make the EDAC and dev-mcelog handlers
    skip over errors already processed by CEC.

    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Tested-by: Tony Luck
    Link: https://lkml.kernel.org/r/20200214222720.13168-5-tony.luck@intel.com

    Tony Luck
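
The flag-based scheme described above can be sketched in userspace roughly as follows. This is a simplified model, not the kernel code: the `DEMO_HANDLED_*` bits, `struct demo_err` and the handler names are assumptions made for illustration, and the real flag names and values are not given in this log.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real names and values are not given in this log. */
#define DEMO_HANDLED_CEC  (1ULL << 0)
#define DEMO_HANDLED_EDAC (1ULL << 1)

struct demo_err {
	uint64_t kflags;   /* which handlers acted on this error */
	int corrected;     /* 1 if this is a corrected error */
	int decodable;     /* 1 if the EDAC-like handler can decode it */
};

typedef void (*demo_handler_t)(struct demo_err *);

/* CEC-like handler: records that it dealt with a corrected error. */
static void demo_cec(struct demo_err *e)
{
	if (e->corrected)
		e->kflags |= DEMO_HANDLED_CEC;
}

/* EDAC-like handler: skips errors the CEC-like handler already processed. */
static void demo_edac(struct demo_err *e)
{
	if (e->kflags & DEMO_HANDLED_CEC)
		return;
	if (e->decodable)
		e->kflags |= DEMO_HANDLED_EDAC;
}

/*
 * Run every handler in order (no handler stops the chain) and return 1 if
 * the default handler at the end still has to log the error, 0 otherwise.
 */
int demo_run_chain(struct demo_err *e)
{
	static const demo_handler_t chain[] = { demo_cec, demo_edac };
	size_t i;

	for (i = 0; i < sizeof(chain) / sizeof(chain[0]); i++)
		chain[i](e);

	return e->kflags == 0;
}
```

Because every handler always runs and only records what it did, the last handler can decide from the flags alone whether anything is left to report.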
     
  • The CEC code has its claws in a couple of routines in mce/core.c.
    Convert it to just register itself on the normal MCE notifier chain.

    [ bp: Make cec_add_elem() and cec_init() static. ]

    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Tested-by: Tony Luck
    Link: https://lkml.kernel.org/r/20200214222720.13168-3-tony.luck@intel.com

    Tony Luck
     

08 Aug, 2019

1 commit

  • When building with C=2 and/or W=1, legitimate warnings are issued about
    missing prototypes:

    CHECK drivers/ras/debugfs.c
    drivers/ras/debugfs.c:4:15: warning: symbol 'ras_debugfs_dir' was not declared. Should it be static?
    drivers/ras/debugfs.c:8:5: warning: symbol 'ras_userspace_consumers' was not declared. Should it be static?
    drivers/ras/debugfs.c:38:12: warning: symbol 'ras_add_daemon_trace' was not declared. Should it be static?
    drivers/ras/debugfs.c:54:13: warning: symbol 'ras_debugfs_init' was not declared. Should it be static?
    CC drivers/ras/debugfs.o
    drivers/ras/debugfs.c:8:5: warning: no previous prototype for 'ras_userspace_consumers' [-Wmissing-prototypes]
    8 | int ras_userspace_consumers(void)
    | ^~~~~~~~~~~~~~~~~~~~~~~
    drivers/ras/debugfs.c:38:12: warning: no previous prototype for 'ras_add_daemon_trace' [-Wmissing-prototypes]
    38 | int __init ras_add_daemon_trace(void)
    | ^~~~~~~~~~~~~~~~~~~~
    drivers/ras/debugfs.c:54:13: warning: no previous prototype for 'ras_debugfs_init' [-Wmissing-prototypes]
    54 | void __init ras_debugfs_init(void)
    | ^~~~~~~~~~~~~~~~

    Provide the proper includes.

    [ bp: Take care of the same warnings for cec.c too. ]

    Signed-off-by: Valdis Kletnieks
    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac@vger.kernel.org
    Cc: x86@kernel.org
    Link: http://lkml.kernel.org/r/7168.1565218769@turing-police

    Valdis Klētnieks
     

08 Jun, 2019

11 commits

  • Signed-off-by: Borislav Petkov

    Borislav Petkov
     
  • The pfn and array files in (debugfs)/ras/cec are intended for debugging
    the CEC code itself. They are not needed on production systems, so the
    default setting for this CONFIG option is "n".

    [ bp: Have it with less ifdeffery by using IS_ENABLED(). ]

    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov

    Tony Luck
     
  • When dumping the array elements, print them in the following format:

    [ PFN | generation in binary | count ]

    to be perfectly clear what all those sections are.

    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac

    Borislav Petkov
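
To make the three fields in that format concrete, here is a hedged sketch of packing and dumping one element. The bit widths are assumptions chosen for illustration (4K pages, 2 generation bits, the remaining low bits for the count); the log above only states the order of the fields, and all `demo_*` names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed layout, for illustration only: 4K pages, 2 generation bits. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_GEN_BITS   2
#define DEMO_COUNT_BITS (DEMO_PAGE_SHIFT - DEMO_GEN_BITS)
#define DEMO_GEN_MASK   ((1ULL << DEMO_GEN_BITS) - 1)
#define DEMO_COUNT_MASK ((1ULL << DEMO_COUNT_BITS) - 1)

/* Pack PFN, generation and count into one 64-bit element. */
uint64_t demo_pack(uint64_t pfn, unsigned int gen, unsigned int count)
{
	return (pfn << DEMO_PAGE_SHIFT) |
	       (((uint64_t)gen & DEMO_GEN_MASK) << DEMO_COUNT_BITS) |
	       ((uint64_t)count & DEMO_COUNT_MASK);
}

uint64_t demo_pfn(uint64_t elem)
{
	return elem >> DEMO_PAGE_SHIFT;
}

unsigned int demo_gen(uint64_t elem)
{
	return (unsigned int)((elem >> DEMO_COUNT_BITS) & DEMO_GEN_MASK);
}

unsigned int demo_count(uint64_t elem)
{
	return (unsigned int)(elem & DEMO_COUNT_MASK);
}

/* Print one element as "[ PFN | generation in binary | count ]". */
void demo_dump(uint64_t elem)
{
	unsigned int gen = demo_gen(elem);

	printf("[ 0x%llx | %u%u | %u ]\n",
	       (unsigned long long)demo_pfn(elem),
	       (gen >> 1) & 1, gen & 1, demo_count(elem));
}
```

With these assumed widths, `demo_dump(demo_pack(0x1234, 2, 7))` prints `[ 0x1234 | 10 | 7 ]`.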
     
  • ... which is the better, more-fitting name anyway.

    Tony:
    - make action_threshold u64 due to debugfs accessors expecting u64.
    - rename the remaining: s/count_threshold/action_threshold/g

    Co-developed-by: Tony Luck
    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Cc: linux-edac

    Borislav Petkov
     
  • Check the elements order in the array after every insertion.

    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac

    Borislav Petkov
     
  • Free the array page if a failure is encountered while creating the
    debugfs nodes.

    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac

    Borislav Petkov
     
  • When the value requested doesn't match the allowed (min,max) range,
    the @data buffer should not be modified with the invalid value because
    reading "decay_interval" shows it otherwise as if the previous write
    succeeded.

    Move the data write after the check.

    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac

    Borislav Petkov
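
The ordering fix described above, validate first and store afterwards, can be sketched like this. The bounds and the name `demo_interval_set` are hypothetical; the log does not state the actual (min,max) values.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bounds, in seconds; the real limits are not given in this log. */
#define DEMO_MIN_INTERVAL 3600ULL
#define DEMO_MAX_INTERVAL 2592000ULL

/*
 * Validate first, store second: on an out-of-range request *data keeps its
 * previous value, so a later read does not show the rejected input.
 */
int demo_interval_set(uint64_t *data, uint64_t val)
{
	if (val < DEMO_MIN_INTERVAL || val > DEMO_MAX_INTERVAL)
		return -EINVAL;

	*data = val;
	return 0;
}
```

Writing `*data = val` before the range check would reproduce the problem from the entry above: the call fails, but the stored value has already changed.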
     
  • The count_threshold should be checked unconditionally, after insertion
    too, so that a count_threshold value of 1 can cause an immediate
    offlining. I.e., offline the page on the *first* error encountered.

    Add comments to make it clear what cec_add_elem() does, while at it.

    Reported-by: WANG Chao
    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190418034115.75954-3-chao.wang@ucloud.cn

    Borislav Petkov
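
A minimal sketch of checking the threshold on every path, including right after a fresh insertion. It tracks a single page in a hypothetical `struct demo_slot` rather than a sorted array, so it only illustrates the control flow and is not how cec_add_elem() is written.

```c
#include <stdint.h>

/* Hypothetical single-entry store: one PFN and its error count. */
struct demo_slot {
	uint64_t pfn;
	unsigned int count;   /* 0 means the slot is empty */
};

/*
 * Record one corrected error for pfn. Returns 1 if the threshold was reached
 * (the caller would offline the page), 0 otherwise.
 */
int demo_record_error(struct demo_slot *s, uint64_t pfn, unsigned int threshold)
{
	if (s->count == 0 || s->pfn != pfn) {
		/* First error seen for this page: insert it with a count of 1. */
		s->pfn = pfn;
		s->count = 1;
	} else {
		s->count++;
	}

	/* Checked unconditionally, so a threshold of 1 fires on insertion. */
	if (s->count >= threshold) {
		s->count = 0;   /* drop the entry once it has been acted on */
		return 1;
	}

	return 0;
}
```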
     
  • When inserting random PFNs for debugging the CEC through
    (debugfs)/ras/cec/pfn, depending on the return value of pfn_set(),
    multiple values get inserted per single write.

    That is because simple_attr_write() interprets a retval of 0 as
    success and claims the whole input. However, pfn_set() returns the
    cec_add_elem() value, which, if > 0 and smaller than the whole input
    length, makes glibc continue issuing the write syscall as long as there's
    input left:

    pfn_set
    simple_attr_write
    debugfs_attr_write
    full_proxy_write
    vfs_write
    ksys_write
    do_syscall_64
    entry_SYSCALL_64_after_hwframe

    leading to those repeated calls.

    Return 0 to fix that.

    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac

    Borislav Petkov
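
The effect of a positive setter return value can be modelled in userspace as below. This is a simplified model of the write path described above, assuming only what the entry states: a return of 0 means the whole input is claimed, and any other value is passed back to the caller. All `demo_*` names are hypothetical, as is the choice of 1 for the add routine's positive return value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the add routine: returns a positive value. */
static int demo_add_elem(uint64_t pfn)
{
	(void)pfn;
	return 1;
}

/* Buggy setter: leaks the positive return value to the write path. */
int demo_pfn_set_buggy(uint64_t val)
{
	return demo_add_elem(val);
}

/* Fixed setter: always reports plain success. */
int demo_pfn_set_fixed(uint64_t val)
{
	(void)demo_add_elem(val);
	return 0;
}

/*
 * Simplified model of the attribute write: 0 from the setter means success
 * and the whole input (len bytes) is reported as consumed; any other value
 * is returned unchanged.
 */
long demo_attr_write(int (*set)(uint64_t), uint64_t val, size_t len)
{
	int ret = set(val);

	return ret == 0 ? (long)len : ret;
}
```

With a 7-byte input, the buggy setter makes the write report 1 byte consumed, so a caller that retries short writes submits the remaining 6 bytes again; the fixed setter reports all 7.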
     
  • cec_timer_fn() is a timer callback which reads ce_arr.array[] and
    updates its decay values. However, it runs in interrupt context and the
    mutex protection the CEC uses for that array is inadequate. Convert the
    used timer to a workqueue to keep the tasks the CEC performs preemptible
    and thus low-prio.

    [ bp: Rewrite commit message.
    s/timer/decay/gi to make it agnostic as to what facility is used. ]

    Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
    Signed-off-by: Cong Wang
    Signed-off-by: Borislav Petkov
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190416213351.28999-2-xiyou.wangcong@gmail.com

    Cong Wang
     
  • Switch to using Donald Knuth's binary search algorithm (The Art of
    Computer Programming, vol. 3, section 6.2.1). This should've been done
    from the very beginning but the author must've been smoking something
    very potent at the time.

    The problem with the current one was that it would return the wrong
    element index in certain situations:

    https://lkml.kernel.org/r/CAM_iQpVd02zkVJ846cj-Fg1yUNuz6tY5q1Vpj4LrXmE06dPYYg@mail.gmail.com

    and the noodling code after the loop was fishy at best.

    So switch to using Knuth's binary search. The final result is much
    cleaner and more straightforward.

    Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
    Reported-by: Cong Wang
    Signed-off-by: Borislav Petkov
    Cc: Tony Luck
    Cc: linux-edac

    Borislav Petkov
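
For reference, a textbook binary search of the kind the entry above refers to can be sketched as follows. This is a userspace illustration over a plain sorted array, not the code from cec.c; `demo_find_slot` and its `slot` out-parameter are hypothetical names.

```c
#include <stdint.h>

/*
 * Binary search over a sorted array. Returns the index of key if it is
 * present, otherwise -1. In both cases *slot (if not NULL) is set to the
 * index where key is, or where it would have to be inserted to keep the
 * array sorted.
 */
int demo_find_slot(const uint64_t *arr, int n, uint64_t key, int *slot)
{
	int min = 0, max = n - 1;

	while (min <= max) {
		int mid = min + (max - min) / 2;

		if (arr[mid] < key) {
			min = mid + 1;
		} else if (arr[mid] > key) {
			max = mid - 1;
		} else {
			if (slot)
				*slot = mid;
			return mid;
		}
	}

	/* Not found: min is the first position whose element exceeds key. */
	if (slot)
		*slot = min;
	return -1;
}
```

Since the loop invariant already leaves `min` at the insertion point, no fix-up code is needed after the loop.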
     

20 Apr, 2019

1 commit


24 Jan, 2018

1 commit


14 Nov, 2017

1 commit

  • Pull timer updates from Thomas Gleixner:
    "Yet another big pile of changes:

    - More year 2038 work from Arnd slowly reaching the point where we
    need to think about the syscalls themselves.

    - A new timer function which allows to conditionally (re)arm a timer
    only when it's either not running or the new expiry time is sooner
    than the armed expiry time. This allows to use a single timer for
    multiple timeout requirements w/o caring about the first expiry
    time at the call site.

    - A new NMI safe accessor to clock real time for the printk timestamp
    work. Can be used by tracing, perf as well if required.

    - A large number of timer setup conversions from Kees which got
    collected here because either maintainers requested so or they
    simply got ignored. As Kees pointed out already there are a few
    trivial merge conflicts and some redundant commits which were
    unavoidable due to the size of this conversion effort.

    - Avoid a redundant iteration in the timer wheel softirq processing.

    - Provide a mechanism to treat RTC implementations depending on their
    hardware properties, i.e. don't inflict the write at the 0.5
    seconds boundary which originates from the PC CMOS RTC to all RTCs.
    No functional change as drivers need to be updated separately.

    - The usual small updates to core code and clocksource drivers. Nothing
    really exciting"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits)
    timers: Add a function to start/reduce a timer
    pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday()
    timer: Prepare to change all DEFINE_TIMER() callbacks
    netfilter: ipvs: Convert timers to use timer_setup()
    scsi: qla2xxx: Convert timers to use timer_setup()
    block/aoe: discover_timer: Convert timers to use timer_setup()
    ide: Convert timers to use timer_setup()
    drbd: Convert timers to use timer_setup()
    mailbox: Convert timers to use timer_setup()
    crypto: Convert timers to use timer_setup()
    drivers/pcmcia: omap1: Fix error in automated timer conversion
    ARM: footbridge: Fix typo in timer conversion
    drivers/sgi-xp: Convert timers to use timer_setup()
    drivers/pcmcia: Convert timers to use timer_setup()
    drivers/memstick: Convert timers to use timer_setup()
    drivers/macintosh: Convert timers to use timer_setup()
    hwrng/xgene-rng: Convert timers to use timer_setup()
    auxdisplay: Convert timers to use timer_setup()
    sparc/led: Convert timers to use timer_setup()
    mips: ip22/32: Convert timers to use timer_setup()
    ...

    Linus Torvalds
     

02 Nov, 2017

2 commits

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier should be applied
    to a file was done in a spreadsheet of side by side results of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • In preparation for unconditionally passing the struct timer_list pointer to
    all timer callbacks, switch to using the new timer_setup() and from_timer()
    to pass the timer pointer explicitly.

    Cc: Borislav Petkov
    Cc: Thomas Gleixner
    Cc: Christophe JAILLET
    Cc: Nicolas Iooss
    Cc: Ingo Molnar
    Signed-off-by: Kees Cook
    Reviewed-by: Borislav Petkov

    Kees Cook
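
The idea of handing the callback a pointer to the timer and recovering the enclosing object from it can be sketched in userspace as below. This assumes the timer is embedded in a larger structure; `demo_from_timer`, `struct demo_timer` and `struct demo_ctx` are hypothetical stand-ins and do not reproduce the kernel's timer_setup() or from_timer().

```c
#include <stddef.h>

/* Hypothetical stand-in for a timer object. */
struct demo_timer {
	unsigned long expires;
};

/* Hypothetical owner structure with the timer embedded in it. */
struct demo_ctx {
	int id;
	struct demo_timer timer;
};

/* Recover a pointer to the enclosing structure from a pointer to its member. */
#define demo_from_timer(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Callback that receives only the timer pointer and looks up its owner. */
int demo_timer_fn(struct demo_timer *t)
{
	struct demo_ctx *ctx = demo_from_timer(t, struct demo_ctx, timer);

	return ctx->id;
}
```

Passing the timer pointer itself means the callback needs no separate opaque data argument; the owner is derived from the pointer's position inside the structure.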
     

05 Oct, 2017

1 commit

  • parse_cec_param() compares a string with "cec_disable" using only 7
    characters of the 11-character-long string.

    The proper solution for this would be:

    #define CEC_DISABLE "cec_disable"

    strncmp(str, CEC_DISABLE, strlen(CEC_DISABLE))

    but when comparing a string against a string constant strncmp() has no
    advantage over strcmp() because the comparison is guaranteed to be bounded
    by the string constant. So just replace strncmp() with strcmp().

    [ tglx: Made it use strcmp and updated the changelog ]

    Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
    Signed-off-by: Nicolas Iooss
    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20170903075440.30250-1-nicolas.iooss_linux@m4x.org

    Nicolas Iooss
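
The difference between the two comparisons is easy to demonstrate in userspace. The helper names below are hypothetical, and the buggy variant is reconstructed from the description above (a 7-character comparison), not copied from the original source.

```c
#include <string.h>

/* Buggy variant: only the first 7 characters ("cec_dis") are compared. */
int demo_matches_first7(const char *str)
{
	return strncmp(str, "cec_disable", 7) == 0;
}

/* Fixed variant: the whole string has to match the constant. */
int demo_matches_exact(const char *str)
{
	return strcmp(str, "cec_disable") == 0;
}
```

For the input "cec_disXYZ", `demo_matches_first7()` returns 1 while `demo_matches_exact()` returns 0; both accept "cec_disable".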
     

26 Jun, 2017

1 commit

  • Check the correct variable when handling a potential error from
    debugfs_create_file(). Most likely a copy-paste botch.

    [ Rewrite commit message. ]
    Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
    Signed-off-by: Christophe JAILLET
    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20170623062440.6726-1-christophe.jaillet@wanadoo.fr

    Christophe JAILLET
     

28 Mar, 2017

1 commit

  • Introduce a simple data structure for collecting correctable errors
    along with accessors. More detailed description in the code itself.

    The error decoding is done with the decoding chain now and
    mce_first_notifier() gets to see the error first and the CEC decides
    whether to log it and then the rest of the chain doesn't hear about it -
    basically the main reason for the CE collector - or to continue running
    the notifiers.

    When the CEC hits the action threshold, it will try to soft-offline the
    page containing the ECC and then the whole decoding chain gets to see
    the error.

    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/20170327093304.10683-5-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov