20 Apr, 2019

1 commit


30 Aug, 2018

1 commit

  • Some architectures need to use stop_machine() to patch functions for
    ftrace, and the assumption is that the stopped CPUs do not make function
    calls to traceable functions when they are in the stopped state.

    Commit ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after
    MULTI_STOP_PREPARE") added calls to the watchdog touch functions from
    the stopped CPUs and those functions lack notrace annotations. This
    leads to crashes when enabling/disabling ftrace on ARM kernels built
    with the Thumb-2 instruction set.

    Fix it by adding the necessary notrace annotations.

    Fixes: ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE")
    Signed-off-by: Vincent Whitchurch
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: oleg@redhat.com
    Cc: tj@kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com

    Vincent Whitchurch
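
    A minimal sketch of the fix described above; which helpers in
    kernel/watchdog.c needed the annotation is assumed here, not taken from
    the commit text:

      /*
       * Sketch: mark the watchdog touch helpers notrace so that CPUs
       * spinning in stop_machine() never enter the ftrace trampoline
       * while text patching is in progress.
       */
      notrace void touch_softlockup_watchdog(void)
      {
              touch_softlockup_watchdog_sched();
              wq_watchdog_touch(raw_smp_processor_id());
      }
      EXPORT_SYMBOL(touch_softlockup_watchdog);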
     

03 Aug, 2018

1 commit

  • The code emits the following error message during boot on systems without
    PMU hardware support while probing NMI capability.

    NMI watchdog: Perf event create on CPU 0 failed with -2

    This error is emitted as the perf subsystem returns -ENOENT due to lack of
    PMUs in the system.

    It is followed by the warning that NMI watchdog is disabled:

    NMI watchdog: Perf NMI watchdog permanently disabled

    While the information that the NMI watchdog is disabled is useful for
    ordinary users, a perf event creation failure with error code -2 is not.

    Reduce the message severity to debug so that the error code returned by
    perf is still available for analysis when debugging is required.

    Signed-off-by: Sinan Kaya
    Signed-off-by: Thomas Gleixner
    Acked-by: Don Zickus
    Cc: Kate Stewart
    Cc: Greg Kroah-Hartman
    Cc: Colin Ian King
    Cc: Peter Zijlstra
    Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=599368
    Link: https://lkml.kernel.org/r/20180803060943.2643-1-okaya@kernel.org

    Sinan Kaya
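
    A hedged sketch of the resulting probe path, loosely following the
    hardlockup_detector_event_create() helper of that era (details assumed):
    a missing PMU (-ENOENT) now only produces a debug message, while the
    permanent-disable notice stays visible.

      evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
                                             watchdog_overflow_callback, NULL);
      if (IS_ERR(evt)) {
              /* e.g. -ENOENT when no PMU is present: debug, not an error */
              pr_debug("Perf event create on CPU %d failed with %ld\n",
                       cpu, PTR_ERR(evt));
              return PTR_ERR(evt);
      }
      this_cpu_write(watchdog_ev, evt);
      return 0;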
     

04 Nov, 2017

1 commit


02 Nov, 2017

3 commits

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner, Kate Stewart, and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX license identifier should be applied
    to a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files, created by Philippe Ombredanne. Philippe prepared the
    base worksheet and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file-by-file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source.
    - File already had some variant of a license header in it (even if <5
    lines).
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
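
    For illustration, the added identifier is a single comment at the top of
    each source file, for example:

      // SPDX-License-Identifier: GPL-2.0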
     
  • Guenter reported:
    There is still a problem. When running
    echo 6 > /proc/sys/kernel/watchdog_thresh
    echo 5 > /proc/sys/kernel/watchdog_thresh
    repeatedly, the message

    NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.

    stops after a while (after ~10-30 iterations, with fluctuations).
    Maybe watchdog_cpus needs to be atomic ?

    That's correct as this again is affected by the asynchronous nature of the
    smpboot thread unpark mechanism.

    CPU 0                       CPU1                        CPU2
    write(watchdog_thresh, 6)
      stop()
        park()
      update()
      start()
        unpark()
                                thread->unpark()
                                  cnt++;
    write(watchdog_thresh, 5)                               thread->unpark()
      stop()
        park()                  thread->park()
                                  cnt--;                      cnt++;
      update()
      start()
        unpark()

    That's not a functional problem, it just affects the informational message.

    Convert watchdog_cpus to atomic_t to prevent the problem.

    Reported-and-tested-by: Guenter Roeck
    Signed-off-by: Don Zickus
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Link: https://lkml.kernel.org/r/20171101181126.j727fqjmdthjz4xk@redhat.com

    Don Zickus
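
    A minimal sketch of the conversion, assuming the counter and message live
    in kernel/watchdog_hld.c as in this series:

      static atomic_t watchdog_cpus = ATOMIC_INIT(0);

      /* enable path: only the first CPU to arm its event prints the notice */
      if (!atomic_fetch_inc(&watchdog_cpus))
              pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");

      /* disable path */
      atomic_dec(&watchdog_cpus);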
     
  • Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")

    Guenter reported a crash in the watchdog/perf code, which is caused by
    cleanup() and enable() running concurrently. The reason for this is:

    The watchdog functions are serialized via the watchdog_mutex and cpu
    hotplug locking, but the enable of the perf based watchdog happens in
    context of the unpark callback of the smpboot thread. But that unpark
    function is not synchronous inside the locking. The unparking of the thread
    just wakes it up and leaves so there is no guarantee when the thread is
    executing.

    If it starts running _before_ the cleanup has happened, it will create an
    event and overwrite the dead event pointer. The new event is then released
    by the pending cleanup because its slot is still marked dead.

    lock(watchdog_mutex);
    lockup_detector_reconfigure();
      cpus_read_lock();
      stop();
        park()
      update();
      start();
        unpark()
      cpus_read_unlock();         thread runs()
                                    overwrite dead event ptr
    cleanup();
      free new event, which is active inside perf....
    unlock(watchdog_mutex);

    The park side is safe as that actually waits for the thread to reach
    parked state.

    Commit a33d44843d45 removed the protection against this kind of scenario
    under the stupid assumption that the hotplug serialization and the
    watchdog_mutex cover everything.

    Bring it back.

    Reverts: a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
    Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Thomas Feels-stupid Gleixner <tglx@linutronix.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Don Zickus <dzickus@redhat.com>
    Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710312145190.1942@nanos

    Thomas Gleixner
     

28 Sep, 2017

1 commit


26 Sep, 2017

1 commit

  • for_each_cpu() unintuitively reports CPU0 as set independent of the actual
    cpumask content on UP kernels. That leads to a NULL pointer dereference
    when the cleanup function is invoked and there is no event to clean up.

    Reported-by: Fengguang Wu
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
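
    A sketch of the fixed cleanup loop; the per-cpu dead_event bookkeeping is
    assumed from the surrounding series:

      for_each_cpu(cpu, &dead_events_mask) {
              struct perf_event *event = per_cpu(dead_event, cpu);

              /*
               * for_each_cpu() reports CPU0 as set on UP kernels even when
               * the mask is empty, so the event pointer may be NULL here.
               */
              if (event)
                      perf_event_release_kernel(event);
              per_cpu(dead_event, cpu) = NULL;
      }
      cpumask_clear(&dead_events_mask);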
     

14 Sep, 2017

8 commits

  • Now that all functionality is properly serialized against CPU hotplug,
    remove the extra per cpu storage which holds the disabled events for
    cleanup. The core makes sure that cleanup happens before new events are
    created.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194148.340708074@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Get rid of the hodgepodge which tries to be smart about perf being
    unavailable and error printout rate limiting.

    That's all not required simply because this is never invoked when the perf
    NMI watchdog is not functional.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194148.259651788@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • watchdog_nmi_enable() is an unparseable mess. Provide a clean, perf-specific
    implementation, which will be used when the existing setup/teardown mess is
    replaced.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194148.180215498@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
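
    A hedged sketch of what the clean, perf-specific enable path looks like;
    function and variable names follow this series, the body is assumed:

      void hardlockup_detector_perf_enable(void)
      {
              if (hardlockup_detector_event_create())
                      return;

              if (!watchdog_cpus++)
                      pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");

              perf_event_enable(this_cpu_read(watchdog_ev));
      }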
     
  • The watchdog tries to create perf events even after it figured out that
    perf is not functional or the requested event is not supported.

    That's braindead as this can be done once at init time and if not supported
    the NMI watchdog can be turned off unconditionally.

    Implement the perf hardlockup detector functionality for that. This creates
    a new event create function, which will replace the unholy mess of the
    existing one in later patches.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194148.019090547@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
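
    A rough sketch of the probe-once idea, with names assumed from this
    series: perf is probed exactly once at init, and the watchdog is disabled
    for good if the probe fails.

      int __init hardlockup_detector_perf_init(void)
      {
              int ret = hardlockup_detector_event_create();

              if (ret) {
                      pr_info("Perf NMI watchdog permanently disabled\n");
              } else {
                      /* probe succeeded, drop the probe event again */
                      perf_event_release_kernel(this_cpu_read(watchdog_ev));
                      this_cpu_write(watchdog_ev, NULL);
              }
              return ret;
      }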
     
  • Commit:

    b94f51183b06 ("kernel/watchdog: prevent false hardlockup on overloaded system")

    tries to fix the following issue:

    proc_write()
    set_sample_period() park()
    disable_nmi()

    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1709052038270.2393@nanos
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • The following deadlock is possible in the watchdog hotplug code:

    cpus_write_lock()
      ...
      takedown_cpu()
        smpboot_park_threads()
          smpboot_park_thread()
            kthread_park()
              ->park() := watchdog_disable()
                watchdog_nmi_disable()
                  perf_event_release_kernel();
                    put_event()
                      _free_event()
                        ->destroy() := hw_perf_event_destroy()
                          x86_release_hardware()
                            release_ds_buffers()
                              get_online_cpus()

    when a per cpu watchdog perf event is destroyed which drops the last
    reference to the PMU hardware. The cleanup code there invokes
    get_online_cpus() which instantly deadlocks because the hotplug percpu
    rwsem is write locked.

    To solve this add a deferring mechanism:

    cpus_write_lock()
                                kthread_park()
                                  watchdog_nmi_disable(deferred)
                                    perf_event_disable(event);
                                    move_event_to_deferred(event);
    ....
    cpus_write_unlock()
    cleanup_deferred_events()
      perf_event_release_kernel()

    This is still properly serialized against concurrent hotplug via the
    cpu_add_remove_lock, which is held by the task which initiated the hotplug
    event.

    This is also used to handle event destruction when the watchdog threads are
    parked via other mechanisms than CPU hotplug.

    Analyzed-by: Peter Zijlstra

    Reported-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194146.884469246@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
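
    A hedged sketch of the deferring disable path described above; the
    per-cpu dead_event slot and dead_events_mask names are assumptions in
    line with this series:

      void hardlockup_detector_perf_disable(void)
      {
              struct perf_event *event = this_cpu_read(watchdog_ev);

              if (event) {
                      /*
                       * Only disable here. Releasing the event would recurse
                       * into the hotplug lock, so defer the actual release.
                       */
                      perf_event_disable(event);
                      this_cpu_write(watchdog_ev, NULL);
                      this_cpu_write(dead_event, event);
                      cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
              }
      }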
     
  • The self disabling feature is broken vs. CPU hotplug locking:

    CPU 0                         CPU 1
    cpus_write_lock();
      cpu_up(1)
        wait_for_completion()
                                  ....
                                  unpark_watchdog()
                                    ->unpark()
                                      perf_event_create() parked);

    Result: End of hotplug and instantaneous full lockup of the machine.

    There is a similar problem with disabling the watchdog via the user space
    interface as the sysctl function fiddles with watchdog_enable directly.

    It's very debatable whether this is required at all. The watchdog may work
    nicely on N CPUs and then fail to enable on CPU N + 1, either during
    hotplug or because the user space interface excluded that CPU via the
    sysctl cpumask and some perf user grabbed the counter in the meantime,
    making it unavailable for the watchdog when the cpumask is changed back.

    There is no real justification for this.

    One of the reasons WHY this is done is the utter stupidity of the init code
    of the perf NMI watchdog. Instead of checking upfront at boot whether PERF
    is available and functional at all, it just does this check at run time
    over and over when user space fiddles with the sysctl. That's broken beyond
    repair along with the idiotic error code dependent warn level printks and
    the even more silly printk rate limiting.

    If the init code checks whether perf works at boot time, then this mess can
    be more or less avoided completely. Perf does not come magically into life
    at runtime. Brain usage while coding is overrated.

    Remove the cruft and add a temporary safeguard which gets removed later.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194146.806708429@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Provide an interface to stop and restart perf NMI watchdog events on all
    CPUs. This is only usable during init and especially for handling the perf
    HT bug on Intel machines. It's safe to use it this way as nothing can
    start/stop the NMI watchdog in parallel.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194146.167649596@linutronix.de
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
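
    A minimal sketch of the stop half of that interface, assuming the per-cpu
    watchdog_ev pointer from this series; restart does the same with
    perf_event_enable():

      void __init hardlockup_detector_perf_stop(void)
      {
              int cpu;

              lockdep_assert_cpus_held();

              for_each_online_cpu(cpu) {
                      struct perf_event *event = per_cpu(watchdog_ev, cpu);

                      if (event)
                              perf_event_disable(event);
              }
      }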
     

18 Aug, 2017

1 commit

  • The hardlockup detector on x86 uses a performance counter based on unhalted
    CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the
    performance counter period, so the hrtimer should fire 2-3 times before the
    performance counter NMI fires. The NMI code checks whether the hrtimer
    fired since the last invocation. If not, it assumes a hard lockup.

    The calculation of those periods is based on the nominal CPU
    frequency. Turbo modes increase the CPU clock frequency and therefore
    shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x
    nominal frequency) the perf/NMI period is shorter than the hrtimer period
    which leads to false positives.

    A simple fix would be to shorten the hrtimer period, but that comes with
    the side effect of more frequent hrtimer and softlockup thread wakeups,
    which is not desired.

    Implement a low pass filter, which checks the perf/NMI period against
    kernel time. If the perf/NMI fires before 4/5 of the watchdog period has
    elapsed then the event is ignored and postponed to the next perf/NMI.

    That solves the problem and avoids the overhead of shorter hrtimer periods
    and more frequent softlockup thread wakeups.

    Fixes: 58687acba592 ("lockup_detector: Combine nmi_watchdog and softlockup detector")
    Reported-and-tested-by: Kan Liang
    Signed-off-by: Thomas Gleixner
    Cc: dzickus@redhat.com
    Cc: prarit@redhat.com
    Cc: ak@linux.intel.com
    Cc: babu.moger@oracle.com
    Cc: peterz@infradead.org
    Cc: eranian@google.com
    Cc: acme@redhat.com
    Cc: stable@vger.kernel.org
    Cc: atomlin@redhat.com
    Cc: akpm@linux-foundation.org
    Cc: torvalds@linux-foundation.org
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos

    Thomas Gleixner
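
    A hedged sketch of such a low-pass filter; the 4/5 threshold follows the
    text above, the helper names are assumptions:

      static DEFINE_PER_CPU(ktime_t, last_timestamp);
      static ktime_t watchdog_hrtimer_sample_threshold __read_mostly;

      /* called whenever the hrtimer sample period is (re)computed */
      void watchdog_update_hrtimer_threshold(u64 period)
      {
              /* hrtimer period is ~2/5 of the perf period, so 2 * period ~ 4/5 */
              watchdog_hrtimer_sample_threshold = period * 2;
      }

      static bool watchdog_check_timestamp(void)
      {
              ktime_t delta, now = ktime_get_mono_fast_ns();

              delta = now - __this_cpu_read(last_timestamp);
              if (delta < watchdog_hrtimer_sample_threshold)
                      return false;   /* NMI fired early (turbo), ignore it */

              __this_cpu_write(last_timestamp, now);
              return true;
      }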
     

13 Jul, 2017

2 commits

  • Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
    HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.

    LOCKUP_DETECTOR implies the general boot, sysctl, and programming
    interfaces for the lockup detectors.

    An architecture that wants to use a hard lockup detector must define
    HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH.

    Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides the
    minimum arch_touch_nmi_watchdog, and it otherwise does its own thing and
    does not implement the LOCKUP_DETECTOR interfaces.

    sparc is unusual in that it has started to implement some of the
    interfaces, but not fully yet. It should probably be converted to a full
    HAVE_HARDLOCKUP_DETECTOR_ARCH.

    [npiggin@gmail.com: fix]
    Link: http://lkml.kernel.org/r/20170617223522.66c0ad88@roar.ozlabs.ibm.com
    Link: http://lkml.kernel.org/r/20170616065715.18390-4-npiggin@gmail.com
    Signed-off-by: Nicholas Piggin
    Reviewed-by: Don Zickus
    Reviewed-by: Babu Moger
    Tested-by: Babu Moger [sparc]
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicholas Piggin
     
  • For architectures that define HAVE_NMI_WATCHDOG, instead of having them
    provide the complete touch_nmi_watchdog() function, just have them
    provide arch_touch_nmi_watchdog().

    This gives the generic code more flexibility in implementing this
    function, and arch implementations don't miss out on touching the
    softlockup watchdog or other generic details.

    Link: http://lkml.kernel.org/r/20170616065715.18390-3-npiggin@gmail.com
    Signed-off-by: Nicholas Piggin
    Reviewed-by: Don Zickus
    Reviewed-by: Babu Moger
    Tested-by: Babu Moger [sparc]
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicholas Piggin
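
    A sketch of what the generic wrapper then looks like (placement in
    include/linux/nmi.h is an assumption):

      static inline void touch_nmi_watchdog(void)
      {
              /* arch-specific hard lockup part (HAVE_NMI_WATCHDOG arches) */
              arch_touch_nmi_watchdog();

              /* generic softlockup part is no longer missed */
              touch_softlockup_watchdog();
      }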
     

02 Mar, 2017

1 commit


23 Feb, 2017

1 commit

  • When CONFIG_BOOTPARAM_HOTPLUG_CPU0 is enabled, the socket containing the
    boot cpu can be replaced. During the hot add event, the message

    NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.

    is output implying that the NMI watchdog was disabled at some point. This
    is not the case and the message has caused confusion for users of systems
    that support the removal of the boot cpu socket.

    The watchdog code assumes that cpu 0 is always the first cpu to
    initialize the watchdog, and the last to stop its watchdog thread. That
    is no longer true for initialization once cpu 0 has been removed and
    re-added. The removal case has never been correct because the smpboot
    code removes the watchdog threads starting with the lowest cpu number.

    This patch adds watchdog_cpus to track the number of cpus with active NMI
    watchdog threads so that the first and last thread can be used to set and
    clear the value of firstcpu_err. firstcpu_err is set when the first
    watchdog thread is enabled, and cleared when the last watchdog thread is
    disabled.

    Link: http://lkml.kernel.org/r/1480425321-32296-1-git-send-email-prarit@redhat.com
    Signed-off-by: Prarit Bhargava
    Acked-by: Don Zickus
    Cc: Borislav Petkov
    Cc: Tejun Heo
    Cc: Hidehiro Kawai
    Cc: Thomas Gleixner
    Cc: Andi Kleen
    Cc: Joshua Hunt
    Cc: Ingo Molnar
    Cc: Babu Moger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Prarit Bhargava
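
    A rough sketch of the first/last tracking described above; the variable
    names follow the commit text, the surrounding enable/disable code is
    assumed:

      static unsigned int watchdog_cpus;  /* CPUs with an active watchdog thread */

      /* enable: the first CPU to bring up its thread records the probe result */
      if (!watchdog_cpus++ && err)
              firstcpu_err = err;

      /* disable: the last CPU to take its thread down clears it */
      if (!--watchdog_cpus)
              firstcpu_err = 0;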
     

25 Jan, 2017

1 commit

  • On an overloaded system, it is possible that a change in the watchdog
    threshold can be delayed long enough to trigger a false positive.

    This can easily be achieved by having a cpu spin indefinitely on a
    task while another cpu updates the watchdog threshold.

    What happens is while trying to park the watchdog threads, the hrtimers
    on the other cpus trigger and reprogram themselves with the new slower
    watchdog threshold. Meanwhile, the nmi watchdog is still programmed
    with the old faster threshold.

    Because the one cpu is blocked, it prevents the thread parking on the
    other cpus from completing, which is needed to shutdown the nmi watchdog
    and reprogram it correctly. As a result, a false positive from the nmi
    watchdog is reported.

    Fix this by setting a park_in_progress flag to block all lockups until
    the parking is complete.

    Fix provided by Ulrich Obergfell.

    [akpm@linux-foundation.org: s/park_in_progress/watchdog_park_in_progress/]
    Link: http://lkml.kernel.org/r/1481041033-192236-1-git-send-email-dzickus@redhat.com
    Signed-off-by: Don Zickus
    Reviewed-by: Aaron Tomlin
    Cc: Ulrich Obergfell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Don Zickus
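
    A minimal sketch of the flag (using the final watchdog_park_in_progress
    name); the park helper shown is assumed, and both the hrtimer callback
    and the NMI handler bail out while parking is underway:

      atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);

      static int watchdog_park_threads(void)
      {
              int ret;

              atomic_set(&watchdog_park_in_progress, 1);
              ret = park_all_watchdog_threads();      /* assumed helper */
              atomic_set(&watchdog_park_in_progress, 0);
              return ret;
      }

      /* in the hrtimer and NMI paths: */
      if (atomic_read(&watchdog_park_in_progress) != 0)
              return;         /* suppress lockup checks until parking is done */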
     

15 Dec, 2016

1 commit

  • Separate the hardlockup code from watchdog.c and move it to
    watchdog_hld.c. It is mostly straightforward. Remove everything inside
    CONFIG_HARDLOCKUP_DETECTOR; this code will go to the file watchdog_hld.c.
    Also update the makefile accordingly.

    Link: http://lkml.kernel.org/r/1478034826-43888-3-git-send-email-babu.moger@oracle.com
    Signed-off-by: Babu Moger
    Acked-by: Don Zickus
    Cc: Ingo Molnar
    Cc: Jiri Kosina
    Cc: Andi Kleen
    Cc: Yaowei Bai
    Cc: Aaron Tomlin
    Cc: Ulrich Obergfell
    Cc: Tejun Heo
    Cc: Hidehiro Kawai
    Cc: Josh Hunt
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Babu Moger