15 Nov, 2020

1 commit

  • Define watchdog_allowed_mask only when SOFTLOCKUP_DETECTOR is enabled.

    Fixes: 7feeb9cd4f5b ("watchdog/sysctl: Clean up sysctl variable name space")
    Signed-off-by: Santosh Sivaraj
    Signed-off-by: Andrew Morton
    Reviewed-by: Petr Mladek
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20201106015025.1281561-1-santosh@fossix.org
    Signed-off-by: Linus Torvalds

    Santosh Sivaraj
     

09 Jun, 2020

1 commit

  • After a recent change introduced by Vlastimil's series [0], the kernel is
    now able to handle sysctl parameters on the kernel command line; also, the
    series introduced a simple infrastructure to convert legacy boot
    parameters (that duplicate sysctls) into sysctl aliases.

    This patch converts the watchdog parameters softlockup_panic and
    {hard,soft}lockup_all_cpu_backtrace to use the new alias infrastructure.
    It fixes the documentation too, since the alias only accepts values 0 or
    1, not the full range of integers.

    We also took the opportunity here to improve the documentation of the
    previously converted hung_task_panic (see the patch series [0]) and put
    the alias table in alphabetical order.

    [0] http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz

    Signed-off-by: Guilherme G. Piccoli
    Signed-off-by: Andrew Morton
    Acked-by: Vlastimil Babka
    Cc: Kees Cook
    Cc: Iurii Zaikin
    Cc: Luis Chamberlain
    Link: http://lkml.kernel.org/r/20200507214624.21911-1-gpiccoli@canonical.com
    Signed-off-by: Linus Torvalds

    Guilherme G. Piccoli
     

27 Apr, 2020

1 commit

  • Instead of having all the sysctl handlers deal with user pointers, which
    is rather hairy in terms of the BPF interaction, copy the input to and
    from userspace in common code. This also means that the strings are
    always NUL-terminated by the common code, making the API a little bit
    safer.

    As most handlers just pass the data through to one of the common handlers,
    a lot of the changes are mechanical.
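
    A rough userspace sketch of the idea, assuming a much simplified handler
    type: common code copies the caller's bytes into a private buffer and
    NUL-terminates it, so a handler only ever parses a plain string. The names
    sketch_handler_t, sketch_common_write() and sketch_handle_int() are
    hypothetical and are not the kernel's proc_handler API; memcpy() stands in
    for copy_from_user().

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical handler type: receives a private buffer, never a user pointer. */
    typedef int (*sketch_handler_t)(char *buffer, size_t len, int *out);

    /* Example handler: may rely on the buffer being NUL-terminated. */
    static int sketch_handle_int(char *buffer, size_t len, int *out)
    {
        char *end;
        long v;

        (void)len;
        v = strtol(buffer, &end, 10);
        if (end == buffer)
            return -1;                 /* no digits at all */
        *out = (int)v;
        return 0;
    }

    /* Common code: copy the input, terminate it, then call the handler. */
    static int sketch_common_write(const char *user_buf, size_t len,
                                   sketch_handler_t handler, int *out)
    {
        char *kbuf;
        int ret;

        assert(handler != NULL);
        kbuf = malloc(len + 1);
        if (!kbuf)
            return -1;
        memcpy(kbuf, user_buf, len);   /* stands in for copy_from_user() */
        kbuf[len] = '\0';              /* termination is done once, here */
        ret = handler(kbuf, len, out);
        free(kbuf);
        return ret;
    }
    ```

    Only the first len bytes reach the handler: writing "4217" with a length
    of 2 parses as 42, because the common code terminated the copy.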

    Signed-off-by: Christoph Hellwig
    Acked-by: Andrey Ignatov
    Signed-off-by: Al Viro

    Christoph Hellwig
     

17 Jan, 2020

1 commit

  • Robert reported that during boot the watchdog timestamp is set to 0 for one
    second which is the indicator for a watchdog reset.

    The reason for this is that the timestamp is in seconds and the time is
    taken from sched clock and divided by ~1e9. sched clock starts at 0 which
    means that for the first second during boot the watchdog timestamp is 0,
    i.e. reset.

    Use ULONG_MAX as the reset indicator value so the watchdog works correctly
    right from the start. ULONG_MAX would only conflict with a real timestamp
    if the system reaches an uptime of 136 years on 32bit and almost eternity
    on 64bit.
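
    A small userspace sketch of the sentinel change, assuming the timestamp is
    whole seconds derived from a nanosecond clock that starts at 0 (plain
    division by 1e9 here, where the text above says "~1e9"). SKETCH_TS_RESET
    and the helper names are hypothetical. The last helper reproduces the
    136-year figure: ULONG_MAX on a 32-bit system is 2^32 - 1 seconds.

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <stdbool.h>

    /* Reset marker: a value a real uptime in seconds practically never reaches. */
    #define SKETCH_TS_RESET ULONG_MAX

    /* Whole seconds of uptime from a nanosecond clock that starts at 0. */
    static unsigned long sketch_get_timestamp(unsigned long long now_ns)
    {
        return (unsigned long)(now_ns / 1000000000ULL);
    }

    static bool sketch_ts_is_reset(unsigned long ts)
    {
        return ts == SKETCH_TS_RESET;
    }

    /* Uptime in years at which a 32-bit seconds counter reaches 2^32 - 1. */
    static double sketch_years_until_32bit_collision(void)
    {
        const double max32 = 4294967295.0;              /* 2^32 - 1 seconds */
        const double secs_per_year = 365.25 * 24 * 3600; /* Julian year */
        return max32 / secs_per_year;                    /* about 136.1 */
    }
    ```

    During the first second of boot the timestamp is 0, which with this
    sentinel is an ordinary value rather than a reset marker.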

    Reported-by: Robert Richter
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/87o8v3uuzl.fsf@nanos.tec.linutronix.de

    Thomas Gleixner
     

16 Jan, 2020

2 commits

  • commit 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u" threads
    with cpu_stop_work") ensures that the watchdog is reliably touched during
    a task switch.

    As a result, the check for an unnoticed task switch is no longer needed.

    Remove the relevant code, which effectively reverts commit b1a8de1f5343
    ("softlockup: make detector be aware of task switch of processes hogging
    cpu")

    Signed-off-by: Petr Mladek
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Link: https://lore.kernel.org/r/20191024114928.15377-2-pmladek@suse.com

    Petr Mladek
     
  • After commit 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u"
    threads with cpu_stop_work"), the percpu soft_lockup_hrtimer_cnt is
    not used any more, so remove it and related code.

    Signed-off-by: Jisheng Zhang
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20191218131720.4146aea2@xhacker.debian

    Jisheng Zhang
     

02 Aug, 2019

1 commit

  • The watchdog hrtimer must expire in hard interrupt context even on
    PREEMPT_RT=y kernels as otherwise the hard/softlockup detection logic would
    not work.

    No functional change.

    [ tglx: Split out from larger combo patch. Added changelog ]

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20190726185753.262895510@linutronix.de

    Sebastian Andrzej Siewior
     

18 Apr, 2019

1 commit

  • Signed-off-by: Arash Fotouhi
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: loberman@redhat.com
    Cc: vincent.whitchurch@axis.com
    Link: http://lkml.kernel.org/r/1553308112-3513-1-git-send-email-arash@arashfotouhi.com
    Signed-off-by: Ingo Molnar

    Arash Fotouhi
     

28 Mar, 2019

1 commit

  • The rework of the watchdog core to use cpu_stop_work broke the watchdog
    cpumask on CPU hotplug.

    The watchdog_enable/disable() functions are now called unconditionally from
    the hotplug callback, i.e. even on CPUs which are not in the watchdog
    cpumask. As a consequence the watchdog can become unstoppable.

    Only invoke them when the plugged CPU is in the watchdog cpumask.
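
    A userspace model of the fix, assuming a small bitmask stands in for the
    watchdog cpumask and a bool array for the per-CPU watchdog state. The
    callback names are hypothetical; the point is only that the hotplug
    enable/disable paths are gated on the cpumask test.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    #define SKETCH_NR_CPUS 8

    /* Hypothetical watchdog cpumask: bit N set means CPU N may run the watchdog. */
    static unsigned long sketch_watchdog_cpumask = 0x5UL;   /* CPUs 0 and 2 */
    static bool sketch_wd_running[SKETCH_NR_CPUS];

    static bool sketch_cpu_in_mask(unsigned long mask, unsigned int cpu)
    {
        assert(cpu < SKETCH_NR_CPUS);
        return (mask >> cpu) & 1UL;
    }

    /* CPU online callback: start the watchdog only for CPUs in the mask. */
    static int sketch_watchdog_cpu_online(unsigned int cpu)
    {
        if (sketch_cpu_in_mask(sketch_watchdog_cpumask, cpu))
            sketch_wd_running[cpu] = true;
        return 0;
    }

    /* CPU offline callback: the same check before stopping it. */
    static int sketch_watchdog_cpu_offline(unsigned int cpu)
    {
        if (sketch_cpu_in_mask(sketch_watchdog_cpumask, cpu))
            sketch_wd_running[cpu] = false;
        return 0;
    }
    ```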

    Fixes: 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work")
    Reported-by: Maxime Coquelin
    Signed-off-by: Thomas Gleixner
    Tested-by: Maxime Coquelin
    Cc: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Don Zickus
    Cc: Ricardo Neri
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903262245490.1789@nanos.tec.linutronix.de

    Thomas Gleixner
     

22 Mar, 2019

1 commit

  • sparse complains:
    CHECK kernel/watchdog.c
    kernel/watchdog.c:45:19: warning: symbol 'nmi_watchdog_available'
    was not declared. Should it be static?
    kernel/watchdog.c:47:16: warning: symbol 'watchdog_allowed_mask'
    was not declared. Should it be static?

    They're not referenced by name from anywhere else, so make them static.

    Signed-off-by: Valdis Kletnieks
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/7855.1552383228@turing-police

    Valdis Kletnieks
     

01 Nov, 2018

1 commit

  • The hard and soft lockup detector threshold has a default value of 10
    seconds which can only be changed via sysctl.

    During early boot, lockup detection can trigger when noisy debugging emits
    a large amount of messages to the console, but there is no way to set a
    larger threshold on the kernel command line. The detector can only be
    completely disabled.

    Add a new watchdog_thresh= command line parameter to allow boot time
    control over the threshold. It works in the same way as the sysctl and
    affects both the soft and the hard lockup detectors.
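
    A sketch of how one threshold can drive both detectors, assuming the
    relationships used in kernel/watchdog.c of this era: the hard lockup
    threshold is watchdog_thresh seconds, the soft lockup threshold is twice
    that, and the hrtimer sample period is the soft threshold divided by 5.
    The parser and its 0..60 bound (borrowed from the sysctl's limit) are
    illustrative assumptions, not the kernel's command line handler.

    ```c
    #include <assert.h>
    #include <stdlib.h>

    static int sketch_watchdog_thresh = 10;   /* default: 10 seconds */

    /* Parse the value of "watchdog_thresh=N"; reject junk and out-of-range values. */
    static int sketch_parse_watchdog_thresh(const char *arg)
    {
        char *end;
        long v = strtol(arg, &end, 10);

        if (end == arg || *end != '\0' || v < 0 || v > 60)
            return -1;
        sketch_watchdog_thresh = (int)v;
        return 0;
    }

    /* Hard lockup threshold in seconds: watchdog_thresh itself. */
    static int sketch_hardlockup_thresh(void) { return sketch_watchdog_thresh; }

    /* Soft lockup threshold in seconds: twice watchdog_thresh. */
    static int sketch_softlockup_thresh(void) { return sketch_watchdog_thresh * 2; }

    /* hrtimer period in ns: five samples per soft lockup window. */
    static unsigned long long sketch_sample_period_ns(void)
    {
        return (unsigned long long)sketch_softlockup_thresh() * (1000000000ULL / 5);
    }
    ```

    With watchdog_thresh=20 on the command line, the sketch yields a 40 second
    soft lockup window sampled every 8 seconds.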

    Signed-off-by: Laurence Oberman
    Signed-off-by: Thomas Gleixner
    Cc: rdunlap@infradead.org
    Cc: prarit@redhat.com
    Link: https://lkml.kernel.org/r/1541079018-13953-1-git-send-email-loberman@redhat.com

    Laurence Oberman
     

30 Aug, 2018

1 commit

  • Some architectures need to use stop_machine() to patch functions for
    ftrace, and the assumption is that the stopped CPUs do not make function
    calls to traceable functions when they are in the stopped state.

    Commit ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after
    MULTI_STOP_PREPARE") added calls to the watchdog touch functions from
    the stopped CPUs and those functions lack notrace annotations. This
    leads to crashes when enabling/disabling ftrace on ARM kernels built
    with the Thumb-2 instruction set.

    Fix it by adding the necessary notrace annotations.

    Fixes: ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE")
    Signed-off-by: Vincent Whitchurch
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: oleg@redhat.com
    Cc: tj@kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com

    Vincent Whitchurch
     

16 Jul, 2018

1 commit

  • When scheduling is delayed for longer than the softlockup interrupt
    period it is possible to double-queue the cpu_stop_work, causing list
    corruption.

    Cure this by adding a completion to track the cpu_stop_work's
    progress.
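
    A single-threaded model of the guard, assuming a bool stands in for the
    per-CPU completion. In the kernel the corresponding calls would be
    completion_done(), reinit_completion(), complete() and
    stop_one_cpu_nowait(); the sketch only mirrors their roles in comments and
    does not model real concurrency. The struct and function names are
    hypothetical.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical per-CPU state for the soft lockup stop work. */
    struct sketch_softlockup_state {
        bool work_done;   /* true when no stop work is in flight */
        int  queued;      /* how many times work was queued, for illustration */
    };

    /* Simulated hrtimer callback. */
    static void sketch_timer_fn(struct sketch_softlockup_state *s)
    {
        /* Only queue new work if the previous one has finished. */
        if (s->work_done) {       /* like completion_done() */
            s->work_done = false; /* like reinit_completion() */
            s->queued++;          /* like stop_one_cpu_nowait() */
        }
    }

    /* Runs when the stopper finally executes the queued work. */
    static void sketch_stop_work_fn(struct sketch_softlockup_state *s)
    {
        s->work_done = true;      /* like complete() */
    }
    ```

    If scheduling is delayed and the timer fires twice before the work runs,
    the second firing sees the work still pending and does not queue it again.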

    Reported-by: kernel test robot
    Tested-by: Rong Chen
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work")
    Link: http://lkml.kernel.org/r/20180713104208.GW2494@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

03 Jul, 2018

1 commit

  • Oleg suggested replacing the "watchdog/%u" threads with
    cpu_stop_work. That removes one thread per CPU while at the same time
    fixes softlockup vs SCHED_DEADLINE.

    But more importantly, it does away with the single
    smpboot_update_cpumask_percpu_thread() user, which allows
    cleanups/shrinkage of the smpboot interface.

    Suggested-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

08 Nov, 2017

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

27 Oct, 2017

3 commits

  • Before we implement isolcpus under housekeeping, we need the isolation
    features to be more fine-grained. For example, some people want NOHZ_FULL
    without the full scheduler isolation, others want full scheduler
    isolation without NOHZ_FULL.

    So let's cut all these isolation features piecewise, at the risk of
    overcutting it right now. We can still merge some flags later if they
    always make sense together.

    Signed-off-by: Frederic Weisbecker
    Acked-by: Thomas Gleixner
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Linus Torvalds
    Cc: Luiz Capitulino
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1509072159-31808-9-git-send-email-frederic@kernel.org
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • While trying to disable the watchdog on nohz_full CPUs, the watchdog
    implements an ad-hoc version of housekeeping_cpumask(). Let's replace
    those re-invented lines.
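
    A userspace model of replacing the ad-hoc computation with a shared
    helper, assuming one bitmask per CPU set and hypothetical per-feature
    flags in the spirit of the finer-grained isolation features described for
    this date. Which flag the watchdog really queries is an assumption here
    (a timer housekeeping set); the sketch only shows the watchdog delegating
    the question instead of recomputing the nohz_full complement itself.

    ```c
    #include <assert.h>

    /* Hypothetical per-feature isolation flags, one bit each. */
    enum sketch_hk_flag {
        SKETCH_HK_TIMER = 1 << 0,
        SKETCH_HK_RCU   = 1 << 1,
    };

    /* Bit N set = CPU N. Example: 4 CPUs, CPUs 1-2 booted as nohz_full. */
    static const unsigned long sketch_possible_mask = 0xFUL;
    static const unsigned long sketch_nohz_full_mask = 0x6UL;

    /* Shared helper: CPUs that still do housekeeping work for a feature. */
    static unsigned long sketch_housekeeping_cpumask(enum sketch_hk_flag flag)
    {
        if (flag & SKETCH_HK_TIMER)
            return sketch_possible_mask & ~sketch_nohz_full_mask;
        return sketch_possible_mask;   /* other features not isolated here */
    }

    /* The watchdog asks the helper instead of recomputing the complement. */
    static unsigned long sketch_watchdog_allowed_mask(void)
    {
        return sketch_housekeeping_cpumask(SKETCH_HK_TIMER);
    }
    ```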

    Signed-off-by: Frederic Weisbecker
    Acked-by: Thomas Gleixner
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Linus Torvalds
    Cc: Luiz Capitulino
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1509072159-31808-3-git-send-email-frederic@kernel.org
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • The housekeeping code is currently tied to the NOHZ code. As we are
    planning to make housekeeping independent from it, start with moving
    the relevant code to its own file.

    Signed-off-by: Frederic Weisbecker
    Acked-by: Thomas Gleixner
    Acked-by: Paul E. McKenney
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Linus Torvalds
    Cc: Luiz Capitulino
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1509072159-31808-2-git-send-email-frederic@kernel.org
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

04 Oct, 2017

5 commits

  • The variable is unused when the softlockup detector is disabled in Kconfig.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • The function names made sense up to the point where the watchdog
    (re)configuration was unified to use softlockup_reconfigure_threads() for
    all configuration purposes. But that includes scenarios which solely
    configure the nmi watchdog.

    Rename softlockup_reconfigure_threads() and softlockup_init_threads() so
    the function names match the functionality.

    Signed-off-by: Thomas Gleixner
    Cc: Linus Torvalds
    Cc: Michael Ellerman
    Cc: Peter Zijlstra
    Cc: Don Zickus

    Thomas Gleixner
     
  • The rework of the core hotplug code triggers the WARN_ON in start_wd_cpu()
    on powerpc because it is called multiple times for the boot CPU.

    The first call is via:

    start_wd_on_cpu+0x80/0x2f0
    watchdog_nmi_reconfigure+0x124/0x170
    softlockup_reconfigure_threads+0x110/0x130
    lockup_detector_init+0xbc/0xe0
    kernel_init_freeable+0x18c/0x37c
    kernel_init+0x2c/0x160
    ret_from_kernel_thread+0x5c/0xbc

    And then again via the CPU hotplug registration:

    start_wd_on_cpu+0x80/0x2f0
    cpuhp_invoke_callback+0x194/0x620
    cpuhp_thread_fun+0x7c/0x1b0
    smpboot_thread_fn+0x290/0x2a0
    kthread+0x168/0x1b0
    ret_from_kernel_thread+0x5c/0xbc

    This can be avoided by setting up the cpu hotplug state with nocalls and
    moving the initialization to the watchdog_nmi_probe() function. That
    initializes the hotplug callbacks without invoking the callback and the
    following core initialization function then configures the watchdog for the
    online CPUs (in this case CPU0) via softlockup_reconfigure_threads().

    Reported-and-tested-by: Michael Ellerman
    Signed-off-by: Thomas Gleixner
    Acked-by: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Nicholas Piggin
    Cc: linuxppc-dev@lists.ozlabs.org

    Thomas Gleixner
     
  • Instead of dropping the cpu hotplug lock after stopping the NMI watchdog
    and threads and reacquiring it for the restart, hold the cpu hotplug lock
    across the full reconfiguration. This makes the code and the protection
    rules more obvious.

    Suggested-by: Linus Torvalds
    Signed-off-by: Thomas Gleixner
    Acked-by: Michael Ellerman
    Cc: Peter Zijlstra
    Cc: Don Zickus
    Cc: Benjamin Herrenschmidt
    Cc: Nicholas Piggin
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710022105570.2114@nanos

    Thomas Gleixner
     
  • The recent cleanup of the watchdog code split watchdog_nmi_reconfigure()
    into two stages. One to stop the NMI and one to restart it after
    reconfiguration. That was done by adding a boolean 'run' argument to the
    code, which is functionally correct but not necessarily a piece of art.

    Replace it by two explicit functions: watchdog_nmi_stop() and
    watchdog_nmi_start().

    Fixes: 6592ad2fcc8f ("watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage")
    Requested-by: Linus 'Nursing his pet-peeve' Torvalds
    Signed-off-by: Thomas 'Mopping up garbage' Gleixner
    Acked-by: Michael Ellerman
    Cc: Peter Zijlstra
    Cc: Don Zickus
    Cc: Benjamin Herrenschmidt
    Cc: Nicholas Piggin
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710021957480.2114@nanos

    Thomas Gleixner
     

14 Sep, 2017

16 commits

  • All watchdog thread related functions are delegated to the smpboot thread
    infrastructure, which handles serialization against CPU hotplug correctly.

    The sysctl interface is completely decoupled from anything which requires
    CPU hotplug protection.

    No need to protect the sysctl writes against cpu hotplug anymore. Remove it
    and add the now required protection to the powerpc arch_nmi_watchdog
    implementation.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/20170912194148.418497420@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Get rid of the hodgepodge which tries to be smart about perf being
    unavailable and error printout rate limiting.

    That's all not required simply because this is never invoked when the perf
    NMI watchdog is not functional.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194148.259651788@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Use the init time detection of the perf NMI watchdog to determine whether
    the perf NMI watchdog is functional. If not, disable it permanently. It
    won't come back magically at runtime.
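
    A sketch of the "decide once at boot" idea, assuming a hypothetical probe
    helper that reports whether a perf counter could be created. The
    availability flag is modeled on nmi_watchdog_available but carries a
    sketch_ prefix; none of these functions are the kernel's own.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    static bool sketch_nmi_watchdog_available;   /* decided once at boot */
    static bool sketch_nmi_watchdog_running;

    /* Hypothetical probe: would try to create a perf counter on the boot CPU. */
    static int sketch_hw_probe(bool perf_works)
    {
        return perf_works ? 0 : -1;
    }

    /* Boot-time detection; the result is not re-evaluated later. */
    static void sketch_watchdog_nmi_probe(bool perf_works)
    {
        sketch_nmi_watchdog_available = (sketch_hw_probe(perf_works) == 0);
    }

    /* Runtime enable request (e.g. from sysctl): ignored if the probe failed. */
    static void sketch_nmi_watchdog_enable(bool on)
    {
        sketch_nmi_watchdog_running = on && sketch_nmi_watchdog_available;
    }
    ```

    Because the answer is fixed at boot, the runtime path needs no retries,
    error-code dependent warnings or printk rate limiting.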

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194148.099799541@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Letting user space poke directly at variables which are used at run time is
    stupid and causes a lot of race conditions and other issues.

    Separate the user variables and on change invoke the reconfiguration, which
    then stops the watchdogs, reevaluates the new user value and restarts the
    watchdogs with the new parameters.
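
    A userspace sketch of the separation, assuming hypothetical variable and
    function names: the sysctl side writes only the user copies, and the
    runtime copies change only inside a stop / re-evaluate / restart sequence.
    Treating a threshold of 0 as "disabled" is an assumption borrowed from the
    documented watchdog_thresh behavior.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* User-facing copies: written only by the (simulated) sysctl handler. */
    static int  sketch_user_thresh = 10;
    static bool sketch_user_enabled = true;

    /* Runtime copies: read by the running watchdog, changed only while stopped. */
    static int  sketch_thresh = 10;
    static bool sketch_enabled = true;
    static bool sketch_running = true;

    static void sketch_watchdogs_stop(void)  { sketch_running = false; }
    static void sketch_watchdogs_start(void) { sketch_running = sketch_enabled; }

    /* Stop, re-evaluate the user values, then restart with the new parameters. */
    static void sketch_reconfigure(void)
    {
        sketch_watchdogs_stop();
        sketch_thresh = sketch_user_thresh;
        /* Assumption: a threshold of 0 means "disabled". */
        sketch_enabled = sketch_user_enabled && sketch_thresh > 0;
        sketch_watchdogs_start();
    }

    /* Sysctl write: touch only the user copy, then request reconfiguration. */
    static void sketch_sysctl_write_thresh(int val)
    {
        sketch_user_thresh = val;
        sketch_reconfigure();
    }
    ```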

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194147.939985640@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure()
    need to be done in two steps.

    1) Stop all NMIs
    2) Read the new parameters and start NMIs

    Right now watchdog_nmi_reconfigure() is a combination of both. To allow a
    clean reconfiguration add a 'run' argument and split the functionality in
    powerpc.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/20170912194147.862865570@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Reflect that these variables are user interface related and remove the
    whitespace damage in the sysctl table while at it.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194147.783210221@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Use a single function to update sysctl changes. This is not a high
    frequency user space interface and it's root only.

    Preparatory patch to cleanup the sysctl variable handling.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194147.549114957@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • The lockup detector reconfiguration tears down all watchdog threads when
    the watchdog is disabled and sets them up again when it's enabled.

    That's a pointless exercise. The watchdog threads are not consuming an
    insane amount of resources, so it's enough to set them up at init time and
    keep them in parked position when the watchdog is disabled and unpark them
    when it is reenabled. The smpboot thread infrastructure takes care of
    keeping the force parked threads in place even across cpu hotplug.

    Aside from that, the code implements the park/unpark facility of smp hotplug
    threads on its own, which is even more pointless. We have functionality in
    the smpboot thread code to do so.

    Use the new thread management functions and get rid of the unholy mess.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194147.470370113@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • The lockup detector reconfiguration tears down all watchdog threads when
    the watchdog is disabled and sets them up again when it's enabled.

    That's a pointless exercise. The watchdog threads are not consuming an
    insane amount of resources, so it's enough to set them up at init time and
    keep them in parked position when the watchdog is disabled and unpark them
    when it is reenabled. The smpboot thread infrastructure takes care of
    keeping the force parked threads in place even across cpu hotplug.

    Another horrible mechanism are the open coded park/unpark loops which are
    used for reconfiguration of the watchdog. The smpboot infrastructure allows
    exactly the same via smpboot_update_cpumask_percpu_thread(), which is cpu
    hotplug safe. Using that instead of the open coded loops allows to get rid
    of the hotplug locking mess in the watchdog code.

    Implement a clean infrastructure which allows to replace the open coded
    nonsense.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194147.377182587@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • smpboot_update_cpumask_percpu_thread() allocates a temporary cpumask at
    runtime. This is suboptimal because the call site needs more code size for
    proper error handling than a statically allocated temporary mask requires
    data size.

    Add a static temporary cpumask. The function is globally serialized, so no
    further protection is required.

    Remove the half-baked error handling in the watchdog code and get rid of
    the export as there are no in tree modular users of that function.
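
    A userspace model of the cpumask update, assuming an unsigned long stands
    in for struct cpumask and bool flags for parked/unparked threads. It parks
    threads on CPUs leaving the mask and unparks them on CPUs joining it,
    using one static scratch mask instead of a runtime allocation. All names
    are hypothetical, and callers are assumed to be serialized by a lock.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    #define SKETCH_NR_CPUS 8

    /* Bit N set: the watchdog thread on CPU N is unparked. */
    static unsigned long sketch_thread_mask = 0xFUL;   /* CPUs 0-3 */
    static bool sketch_unparked[SKETCH_NR_CPUS] = { true, true, true, true };

    /* Static scratch mask: no allocation, so no error path at the call site. */
    static unsigned long sketch_tmp_mask;

    static void sketch_park(unsigned int cpu)   { sketch_unparked[cpu] = false; }
    static void sketch_unpark(unsigned int cpu) { sketch_unparked[cpu] = true; }

    static void sketch_update_cpumask(unsigned long new_mask)
    {
        unsigned int cpu;

        /* Park threads on CPUs that are leaving the mask. */
        sketch_tmp_mask = sketch_thread_mask & ~new_mask;
        for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
            if (sketch_tmp_mask & (1UL << cpu))
                sketch_park(cpu);

        /* Unpark threads on CPUs that are joining the mask. */
        sketch_tmp_mask = new_mask & ~sketch_thread_mask;
        for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
            if (sketch_tmp_mask & (1UL << cpu))
                sketch_unpark(cpu);

        sketch_thread_mask = new_mask;
    }
    ```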

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194147.297288838@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Split the write part of the cpumask proc handler out into a separate helper
    to avoid deep indentation. This also reduces the patch complexity in the
    following cleanups.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194147.218075991@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • The #ifdef maze in this file is horrible. Group stuff at least a bit so one
    can figure out what belongs to what.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194147.139629546@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Having stub functions which take a full page is not helping the
    readability of code.

    Condense them and move the doubled #ifdef variant into the SYSFS section.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194147.045545271@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Commit:

    b94f51183b06 ("kernel/watchdog: prevent false hardlockup on overloaded system")

    tries to fix the following issue:

    proc_write()
    set_sample_period() park()
    disable_nmi()

    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1709052038270.2393@nanos
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • The following deadlock is possible in the watchdog hotplug code:

    cpus_write_lock()
    ...
      takedown_cpu()
        smpboot_park_threads()
          smpboot_park_thread()
            kthread_park()
              ->park() := watchdog_disable()
                watchdog_nmi_disable()
                  perf_event_release_kernel();
                    put_event()
                      _free_event()
                        ->destroy() := hw_perf_event_destroy()
                          x86_release_hardware()
                            release_ds_buffers()
                              get_online_cpus()

    when a per cpu watchdog perf event is destroyed which drops the last
    reference to the PMU hardware. The cleanup code there invokes
    get_online_cpus() which instantly deadlocks because the hotplug percpu
    rwsem is write locked.

    To solve this add a deferring mechanism:

    cpus_write_lock()
      kthread_park()
        watchdog_nmi_disable(deferred)
          perf_event_disable(event);
          move_event_to_deferred(event);
      ....
    cpus_write_unlock()
    cleanup_deferred_events()
      perf_event_release_kernel()

    This is still properly serialized against concurrent hotplug via the
    cpu_add_remove_lock, which is held by the task which initiated the hotplug
    event.

    This is also used to handle event destruction when the watchdog threads are
    parked via other mechanisms than CPU hotplug.
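
    A userspace model of the deferral, assuming a singly linked list of
    hypothetical event objects and a bool standing in for the write-locked
    hotplug rwsem. Disabling only moves the event onto a deferred list; the
    actual release, which is the step that may call get_online_cpus(),
    happens after the lock is dropped. The names are illustrative and do not
    match the kernel's perf or watchdog code.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct sketch_event {
        bool enabled;
        bool released;
        struct sketch_event *next;
    };

    static struct sketch_event *sketch_dead_events;   /* deferred list */
    static bool sketch_hotplug_locked;

    /* Called with the hotplug lock held: disable, but never free here. */
    static void sketch_nmi_disable_deferred(struct sketch_event *ev)
    {
        ev->enabled = false;              /* like perf_event_disable() */
        ev->next = sketch_dead_events;    /* move to the deferred list */
        sketch_dead_events = ev;
    }

    /* Called after the hotplug lock is dropped: now safe to release. */
    static void sketch_cleanup_deferred_events(void)
    {
        assert(!sketch_hotplug_locked);
        while (sketch_dead_events) {
            struct sketch_event *ev = sketch_dead_events;

            sketch_dead_events = ev->next;
            ev->next = NULL;
            ev->released = true;          /* like perf_event_release_kernel() */
        }
    }
    ```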

    Analyzed-by: Peter Zijlstra

    Reported-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194146.884469246@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • The self disabling feature is broken vs. CPU hotplug locking:

    CPU 0 CPU 1
    cpus_write_lock();
    cpu_up(1)
    wait_for_completion()
    ....
    unpark_watchdog()
    ->unpark()
    perf_event_create() parked);

    Result: End of hotplug and instantaneous full lockup of the machine.

    There is a similar problem with disabling the watchdog via the user space
    interface as the sysctl function fiddles with watchdog_enable directly.

    It's very debatable whether this is required at all. If the watchdog works
    nicely on N CPUs and it fails to enable on the N + 1 CPU either during
    hotplug or because the user space interface disabled it via sysctl cpumask
    and then some perf user grabbed the counter which is then unavailable for
    the watchdog when the sysctl cpumask gets changed back.

    There is no real justification for this.

    One of the reasons WHY this is done is the utter stupidity of the init code
    of the perf NMI watchdog. Instead of checking upfront at boot whether PERF
    is available and functional at all, it just does this check at run time
    over and over when user space fiddles with the sysctl. That's broken beyond
    repair along with the idiotic error code dependent warn level printks and
    the even more silly printk rate limiting.

    If the init code checks whether perf works at boot time, then this mess can
    be more or less avoided completely. Perf does not come magically into life
    at runtime. Brain usage while coding is overrated.

    Remove the cruft and add a temporary safe guard which gets removed later.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Don Zickus
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Metcalf
    Cc: Linus Torvalds
    Cc: Nicholas Piggin
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Ulrich Obergfell
    Link: http://lkml.kernel.org/r/20170912194146.806708429@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner