20 Nov, 2015

1 commit

  • Emulate NMIs on systems where they are not available by using timer
    interrupts on other cpus. Each cpu will use its softlockup hrtimer
    to check that the next cpu is processing hrtimer interrupts by
    verifying that a counter is increasing.

    This patch is useful on systems where the hardlockup detector is not
    available due to a lack of NMIs, for example most ARM SoCs.
    Without this patch any cpu stuck with interrupts disabled can
    cause a hardware watchdog reset with no debugging information,
    but with this patch the kernel can detect the lockup and panic,
    which can result in useful debugging info.

    Change-Id: Ia5faf50243e19c1755201212e04c8892d929785a
    Signed-off-by: Colin Cross

    Colin Cross
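
    The neighbour check described in this entry can be sketched in plain
    userspace C. This is a toy model, not the patch itself: NR_CPUS, the
    array names and both helper functions are hypothetical, and real timer
    interrupts are replaced by explicit calls.

```c
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical CPU count for the toy model */

/* Per-cpu count of timer interrupts handled so far. */
static unsigned long hrtimer_interrupts[NR_CPUS];
/* Value of the neighbour's counter seen at the previous check. */
static unsigned long hrtimer_interrupts_saved[NR_CPUS];

/* Stand-in for a cpu handling one timer interrupt. */
static void timer_tick(unsigned int cpu)
{
    hrtimer_interrupts[cpu]++;
}

/*
 * Run from the timer of 'cpu': returns true if the next cpu has not
 * handled any timer interrupt since the previous check, i.e. its
 * counter did not increase.
 */
static bool next_cpu_looks_stuck(unsigned int cpu)
{
    unsigned int next = (cpu + 1) % NR_CPUS;
    unsigned long now = hrtimer_interrupts[next];
    bool stuck = (now == hrtimer_interrupts_saved[next]);

    hrtimer_interrupts_saved[next] = now;
    return stuck;
}
```

    In this model a cpu that stops calling timer_tick() is reported by its
    predecessor at the second check after its last tick.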
     

20 May, 2015

1 commit

  • Commit ab992dc38f9a ("watchdog: Fix merge 'conflict'") has introduced an
    obvious deadlock because of a typo. watchdog_proc_mutex should be
    unlocked on exit.

    Thanks to Miroslav Benes who was staring at the code with me and noticed
    this.

    Signed-off-by: Michal Hocko
    Duh-by: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Michal Hocko
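
    A toy illustration of this class of bug, assuming the typo was a second
    lock call where an unlock was intended (the message above does not quote
    the code). The toy_mutex type and all function names are hypothetical;
    a real mutex would block forever where the toy only counts the event.

```c
#include <stdbool.h>

/* Toy lock that records a self-deadlock instead of blocking. */
struct toy_mutex {
    bool held;
    int deadlocks;
};

static void toy_mutex_lock(struct toy_mutex *m)
{
    if (m->held)
        m->deadlocks++; /* a real mutex would never return from here */
    m->held = true;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
    m->held = false;
}

static struct toy_mutex proc_mutex;

/* Buggy exit path: takes the lock a second time instead of releasing it. */
static void enable_all_buggy(void)
{
    toy_mutex_lock(&proc_mutex);
    /* ... work done under the lock ... */
    toy_mutex_lock(&proc_mutex); /* typo: should be unlock */
}

/* Fixed exit path: the lock is released before returning. */
static void enable_all_fixed(void)
{
    toy_mutex_lock(&proc_mutex);
    /* ... work done under the lock ... */
    toy_mutex_unlock(&proc_mutex);
}
```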
     

19 May, 2015

1 commit

  • Two watchdog changes that came through different trees had a non
    conflicting conflict, that is, one changed the semantics of a variable
    but no actual code conflict happened. So the merge appeared fine, but
    the resulting code did not behave as expected.

    Commit 195daf665a62 ("watchdog: enable the new user interface of the
    watchdog mechanism") changes the semantics of watchdog_user_enabled,
    which thereafter is only used by the functions introduced by
    b3738d293233 ("watchdog: Add watchdog enable/disable all functions").

    There further appears to be a distinct lack of serialization between
    setting and using watchdog_enabled, so perhaps we should wrap the
    {en,dis}able_all() things in watchdog_proc_mutex.

    This patch fixes a s2r failure reported by Michal; which I cannot
    readily explain. But this does make the code internally consistent
    again.

    Reported-and-tested-by: Michal Hocko
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

15 Apr, 2015

10 commits

  • Merge first patchbomb from Andrew Morton:

    - arch/sh updates

    - ocfs2 updates

    - kernel/watchdog feature

    - about half of mm/

    * emailed patches from Andrew Morton: (122 commits)
    Documentation: update arch list in the 'memtest' entry
    Kconfig: memtest: update number of test patterns up to 17
    arm: add support for memtest
    arm64: add support for memtest
    memtest: use phys_addr_t for physical addresses
    mm: move memtest under mm
    mm, hugetlb: abort __get_user_pages if current has been oom killed
    mm, mempool: do not allow atomic resizing
    memcg: print cgroup information when system panics due to panic_on_oom
    mm: numa: remove migrate_ratelimited
    mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
    mm: split ET_DYN ASLR from mmap ASLR
    s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
    mm: expose arch_mmap_rnd when available
    s390: standardize mmap_rnd() usage
    powerpc: standardize mmap_rnd() usage
    mips: extract logic for mmap_rnd()
    arm64: standardize mmap_rnd() usage
    x86: standardize mmap_rnd() usage
    arm: factor out mmap ASLR into mmap_rnd
    ...

    Linus Torvalds
     
  • Have kvm_guest_init() use hardlockup_detector_disable() instead of
    watchdog_enable_hardlockup_detector(false).

    Remove the watchdog_hardlockup_detector_is_enabled() and the
    watchdog_enable_hardlockup_detector() function which are no longer needed.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • Rename the update_timers*() functions to update_watchdog*().

    Remove the boolean argument from watchdog_enable_all_cpus() because
    update_watchdog_all_cpus() is now a generic function to change the run
    state of the lockup detectors and to have the lockup detectors use a new
    sample period.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • With the current user interface of the watchdog mechanism it is only
    possible to disable or enable both lockup detectors at the same time.
    This series introduces new kernel parameters and changes the semantics of
    some existing kernel parameters, so that the hard lockup detector and the
    soft lockup detector can be disabled or enabled individually. With this
    series applied, the user interface is as follows.

    - parameters in /proc/sys/kernel

      . soft_watchdog
        This is a new parameter to control and examine the run state of
        the soft lockup detector.

      . nmi_watchdog
        The semantics of this parameter have changed. It can now be used
        to control and examine the run state of the hard lockup detector.

      . watchdog
        This parameter is still available to control the run state of both
        lockup detectors at the same time. If this parameter is examined,
        it shows the logical OR of soft_watchdog and nmi_watchdog.

      . watchdog_thresh
        The semantics of this parameter are not affected by the patch.

    - kernel command line parameters

      . nosoftlockup
        The semantics of this parameter have changed. It can now be used
        to disable the soft lockup detector at boot time.

      . nmi_watchdog=0 or nmi_watchdog=1
        Disable or enable the hard lockup detector at boot time. The patch
        introduces '=1' as a new option.

      . nowatchdog
        The semantics of this parameter are not affected by the patch. It
        is still available to disable both lockup detectors at boot time.

    Also, remove the proc_dowatchdog() function which is no longer needed.

    [dzickus@redhat.com: wrote changelog]
    [dzickus@redhat.com: update documentation for kernel params and sysctl]
    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
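
    The boot-time parameters listed in this entry can be modelled with a
    small userspace parser. This is only a sketch: struct lockup_config and
    parse_cmdline() are hypothetical names, tokens are applied in the order
    they appear, and the starting point of "both detectors enabled" is an
    assumption rather than something the message states.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of which lockup detectors end up enabled. */
struct lockup_config {
    bool soft; /* soft lockup detector */
    bool hard; /* hard lockup detector */
};

/* Apply the watchdog-related tokens of a space-separated command line. */
static struct lockup_config parse_cmdline(const char *cmdline)
{
    struct lockup_config cfg = { true, true }; /* assumed default */
    char buf[256];

    /* strtok() modifies its input, so work on a bounded private copy. */
    strncpy(buf, cmdline, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (char *tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
        if (strcmp(tok, "nowatchdog") == 0) {
            cfg.soft = false;
            cfg.hard = false;
        } else if (strcmp(tok, "nosoftlockup") == 0) {
            cfg.soft = false;
        } else if (strcmp(tok, "nmi_watchdog=0") == 0) {
            cfg.hard = false;
        } else if (strcmp(tok, "nmi_watchdog=1") == 0) {
            cfg.hard = true;
        }
    }
    return cfg;
}
```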
     
  • If watchdog_nmi_enable() fails to set up the hardware perf event of one
    CPU, the entire hard lockup detector is deemed unreliable. Hence, disable
    the hard lockup detector and shut down the hardware perf events on all
    CPUs.

    [dzickus@redhat.com: update comments to explain some code]
    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
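
    The all-or-nothing policy in this entry can be sketched as a rollback
    loop. This is a toy model: perf events are replaced by a boolean array,
    and failing_cpu is a hypothetical fault-injection knob that does not
    exist in the kernel.

```c
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical CPU count for the toy model */

static bool perf_event_active[NR_CPUS];
static bool hard_lockup_enabled = true;
static int failing_cpu = -1; /* fault injection: cpu whose setup fails */

/* Stand-in for setting up the hardware perf event of one cpu. */
static int toy_nmi_enable(int cpu)
{
    if (cpu == failing_cpu)
        return -1;
    perf_event_active[cpu] = true;
    return 0;
}

static void toy_nmi_disable(int cpu)
{
    perf_event_active[cpu] = false;
}

/* Enable on every cpu; one failure disables the detector everywhere. */
static void enable_hard_lockup_detector(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (toy_nmi_enable(cpu) != 0) {
            hard_lockup_enabled = false;
            for (int i = 0; i < NR_CPUS; i++)
                toy_nmi_disable(i);
            return;
        }
    }
}
```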
     
  • Separate handlers for each watchdog parameter in /proc/sys/kernel replace
    the proc_dowatchdog() function. Three of those handlers merely call
    proc_watchdog_common() with one different argument.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • Three of four handlers for the watchdog parameters in /proc/sys/kernel
    essentially have to do the same thing.

    if the parameter is being read {
        return the state of the corresponding bit(s) in 'watchdog_enabled'
    } else {
        set/clear the state of the corresponding bit(s) in 'watchdog_enabled'
        update the run state of the lockup detector(s)
    }

    Hence, introduce a common function that can be called by those handlers.
    The callers pass a 'bit mask' to this function to indicate which bit(s)
    should be set/cleared in 'watchdog_enabled'.

    This function handles an uncommon race with watchdog_nmi_enable() where a
    concurrent update of 'watchdog_enabled' is possible. We use 'cmpxchg' to
    detect the concurrency. [This avoids introducing a new spinlock or a
    mutex to synchronize updates of 'watchdog_enabled'. Using the same lock
    or mutex in watchdog thread context and in system call context needs to be
    considered carefully because it can make the code prone to deadlock
    situations in connection with parking/unparking the watchdog threads.]

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
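
    The lock-free update described in this entry can be approximated in
    userspace with C11 atomics. This is a sketch under assumptions: the bit
    values and both function names are hypothetical, and
    atomic_compare_exchange_weak() stands in for the kernel's cmpxchg. The
    sketch simply retries when a concurrent update is detected, which may
    differ from how the patch reacts.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed bit layout; the commit message does not give the values. */
#define NMI_WATCHDOG_ENABLED  (1u << 0)
#define SOFT_WATCHDOG_ENABLED (1u << 1)

static atomic_uint watchdog_enabled =
    NMI_WATCHDOG_ENABLED | SOFT_WATCHDOG_ENABLED;

/* Read path: report 1 if any bit in 'which' is set, else 0. */
static unsigned int watchdog_read_bits(unsigned int which)
{
    return (atomic_load(&watchdog_enabled) & which) ? 1u : 0u;
}

/*
 * Write path: set or clear the bits in 'which'. If another thread changed
 * 'watchdog_enabled' in the meantime, the compare-exchange fails, reloads
 * 'old', and the loop recomputes the new value.
 */
static void watchdog_write_bits(unsigned int which, bool enable)
{
    unsigned int old = atomic_load(&watchdog_enabled);
    unsigned int new_val;

    do {
        new_val = enable ? (old | which) : (old & ~which);
    } while (!atomic_compare_exchange_weak(&watchdog_enabled, &old, new_val));
}
```

    Passing both bits as the mask gives the behaviour of the combined
    'watchdog' parameter: a read returns the logical OR of the two states.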
     
  • This series removes proc_dowatchdog(). Since multiple new functions need
    the 'watchdog_proc_mutex' to serialize access to the watchdog parameters
    in /proc/sys/kernel, move the mutex outside of any function.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • This series introduces a separate handler for each watchdog parameter in
    /proc/sys/kernel. The separate handlers need a common function that they
    can call to update the run state of the lockup detectors, or to have the
    lockup detectors use a new sample period.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • The hardlockup and softlockup detectors had always been tied together.
    At the request of the KVM folks, who needed to have one enabled but not
    the other, internally rework the code to split things apart more cleanly.

    There is a bunch of churn here, but the end result should be code that
    should be easier to maintain and fix without knowing the internals of what
    is going on.

    This patch (of 9):

    Introduce new definitions and variables to separate the user interface in
    /proc/sys/kernel from the internal run state of the lockup detectors. The
    internal run state is represented by two bits in a new variable that is
    named 'watchdog_enabled'. This helps simplify the code, for example:

    - In order to check if any of the two lockup detectors is enabled,
      it is sufficient to check if 'watchdog_enabled' is not zero.

    - In order to enable/disable one or both lockup detectors,
      it is sufficient to set/clear one or both bits in 'watchdog_enabled'.

    - Concurrent updates of 'watchdog_enabled' need not be synchronized via
      a spinlock or a mutex. Updates can either be atomic or concurrency can
      be detected by using 'cmpxchg'.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     

02 Apr, 2015

1 commit

  • This patch adds two new functions to enable/disable
    the watchdog across all CPUs.

    This will be used by the HT PMU bug workaround code to
    disable/enable the NMI watchdog across quirk enablement.

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: bp@alien8.de
    Cc: jolsa@redhat.com
    Cc: kan.liang@intel.com
    Cc: maria.n.dimakopoulou@gmail.com
    Cc: Frederic Weisbecker
    Cc: Don Zickus
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1416251225-17721-12-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

13 Feb, 2015

1 commit

  • When the hypervisor pauses a virtualised kernel, the kernel will observe a
    jump in timebase; this can cause spurious messages from the softlockup
    detector.

    Whilst these messages are harmless, they are accompanied by a stack
    trace which causes undue concern and, more problematically, the stack trace
    in the guest has nothing to do with the observed problem and can only be
    misleading.

    Furthermore, on POWER8 this is completely avoidable with the introduction
    of the Virtual Time Base (VTB) register.

    This patch (of 2):

    This permits the use of arch specific clocks for which virtualised kernels
    can use their notion of 'running' time, not the elapsed wall time which
    will include host execution time.

    Signed-off-by: Cyril Bur
    Cc: Michael Ellerman
    Cc: Andrew Jones
    Acked-by: Don Zickus
    Cc: Ingo Molnar
    Cc: Ulrich Obergfell
    Cc: chai wen
    Cc: Fabian Frederick
    Cc: Aaron Tomlin
    Cc: Ben Zhang
    Cc: Martin Schwidefsky
    Cc: John Stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyril Bur
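
    The difference between wall time and 'running' time in this entry can
    be shown with a toy model. All names here are hypothetical, and time is
    advanced by hand instead of being read from a hardware register.

```c
#include <stdbool.h>
#include <stdint.h>

static uint64_t wall_ns;   /* elapsed wall time, including paused time */
static uint64_t paused_ns; /* time the guest spent paused by the host */

/* Clock that keeps counting while the guest is paused. */
static uint64_t wall_clock(void)
{
    return wall_ns;
}

/* Clock that counts only the time the guest was actually running. */
static uint64_t running_clock(void)
{
    return wall_ns - paused_ns;
}

/* Soft lockup test: too long since the watchdog timestamp was touched. */
static bool is_softlockup(uint64_t touch_ts, uint64_t now, uint64_t thresh_ns)
{
    return now - touch_ts > thresh_ns;
}
```

    With a 60 second gap of which 59 seconds were spent paused, the
    wall-clock timestamps report a lockup against a 20 second threshold
    while the running-clock timestamps do not.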
     

15 Oct, 2014

1 commit

  • Pull percpu consistent-ops changes from Tejun Heo:
    "Way back, before the current percpu allocator was implemented, static
    and dynamic percpu memory areas were allocated and handled separately
    and had their own accessors. The distinction has been gone for many
    years now; however, the now duplicate two sets of accessors remained
    with the pointer based ones - this_cpu_*() - evolving various other
    operations over time. During the process, we also accumulated other
    inconsistent operations.

    This pull request contains Christoph's patches to clean up the
    duplicate accessor situation. __get_cpu_var() uses are replaced with
    this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

    Unfortunately, the former sometimes is tricky thanks to C being a bit
    messy with the distinction between lvalues and pointers, which led to
    a rather ugly solution for cpumask_var_t involving the introduction of
    this_cpu_cpumask_var_ptr().

    This converts most of the uses but not all. Christoph will follow up
    with the remaining conversions in this merge window and hopefully
    remove the obsolete accessors"

    * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
    irqchip: Properly fetch the per cpu offset
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
    ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
    Revert "powerpc: Replace __get_cpu_var uses"
    percpu: Remove __this_cpu_ptr
    clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
    sparc: Replace __get_cpu_var uses
    avr32: Replace __get_cpu_var with __this_cpu_write
    blackfin: Replace __get_cpu_var uses
    tile: Use this_cpu_ptr() for hardware counters
    tile: Replace __get_cpu_var uses
    powerpc: Replace __get_cpu_var uses
    alpha: Replace __get_cpu_var
    ia64: Replace __get_cpu_var uses
    s390: cio driver &__get_cpu_var replacements
    s390: Replace __get_cpu_var uses
    mips: Replace __get_cpu_var uses
    MIPS: Replace __get_cpu_var uses in FPU emulator.
    arm: Replace __this_cpu_ptr with raw_cpu_ptr
    ...

    Linus Torvalds
     

14 Oct, 2014

1 commit

  • In some cases we don't want hard lockup detection enabled by default.
    An example is when running as a guest. Introduce

    watchdog_enable_hardlockup_detector(bool)

    allowing those cases to disable hard lockup detection. This must be
    executed early by the boot processor from e.g. smp_prepare_boot_cpu, in
    order to allow kernel command line arguments to override it, as well as
    to avoid hard lockup detection being enabled before we've had a chance
    to indicate that it's unwanted. In summary,

    initial boot:                                   default=enabled
    smp_prepare_boot_cpu
      watchdog_enable_hardlockup_detector(false):   default=disabled
    cmdline has 'nmi_watchdog=1':                   default=enabled

    The running kernel still has the ability to enable/disable at any time
    with /proc/sys/kernel/nmi_watchdog as usual. However even when the
    default has been overridden /proc/sys/kernel/nmi_watchdog will initially
    show '1'. To truly turn it on one must disable/enable it, i.e.

    echo 0 > /proc/sys/kernel/nmi_watchdog
    echo 1 > /proc/sys/kernel/nmi_watchdog

    This patch will be immediately useful for KVM with the next patch of this
    series. Other hypervisor guest types may find it useful as well.

    [akpm@linux-foundation.org: fix build]
    [dzickus@redhat.com: fix compile issues on sparc]
    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Andrew Jones
    Signed-off-by: Don Zickus
    Signed-off-by: Don Zickus
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     

13 Oct, 2014

1 commit


10 Oct, 2014

1 commit

  • For now, soft lockup detector warns once for each case of process
    softlockup. But the thread 'watchdog/n' may not always get the cpu at the
    time slot between the task switch of two processes hogging that cpu to
    reset soft_watchdog_warn.

    An example would be two processes hogging the cpu. Process A causes the
    softlockup warning and is killed manually by a user. Process B
    immediately becomes the new process hogging the cpu preventing the
    softlockup code from resetting the soft_watchdog_warn variable.

    This case is a false negative of "warn only once for a process", as there
    may be a different process that is going to hog the cpu. Resolve this by
    saving/checking the task pointer of the hogging process and use that to
    reset soft_watchdog_warn too.

    [dzickus@redhat.com: update comment]
    Signed-off-by: chai wen
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    chai wen
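
    The task-pointer check described in this entry can be sketched as
    follows. This is a simplified userspace model, not the patch: struct
    task, saved_task and report_softlockup() are hypothetical, and the
    per-cpu variables are reduced to plain statics for a single cpu.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task descriptor. */
struct task {
    const char *name;
};

static bool soft_watchdog_warn;        /* already warned for the current hog */
static const struct task *saved_task;  /* task that triggered the warning */

/*
 * Called each time the detector finds the cpu hogged by 'hog'.
 * Returns true if a warning should be printed.
 */
static bool report_softlockup(const struct task *hog)
{
    /* A different task is hogging the cpu now: allow a new warning. */
    if (soft_watchdog_warn && saved_task != hog)
        soft_watchdog_warn = false;

    if (soft_watchdog_warn)
        return false; /* warn only once per hogging task */

    soft_watchdog_warn = true;
    saved_task = hog;
    return true;
}
```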
     

27 Aug, 2014

1 commit

  • Most of these are the uses of &__raw_get_cpu_var for address calculation.

    touch_softlockup_watchdog_sync() uses __raw_get_cpu_var to write to
    per cpu variables. Use __this_cpu_write instead.

    Cc: Wim Van Sebroeck
    Cc: linux-watchdog@vger.kernel.org
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
     

18 Aug, 2014

2 commits

  • This patch avoids printing the message 'enabled on all CPUs,
    ...' multiple times. For example, the issue can occur in the
    following scenario:

    1) watchdog_nmi_enable() fails to enable PMU counters and sets
    cpu0_err.

    2) 'echo [0|1] > /proc/sys/kernel/nmi_watchdog' is executed to
    disable and re-enable the watchdog mechanism 'on the fly'.

    3) If watchdog_nmi_enable() succeeds in enabling PMU counters,
    each CPU will print the message because step 1 left behind a
    non-zero cpu0_err.

        if (!IS_ERR(event)) {
                if (cpu == 0 || cpu0_err)
                        pr_info("enabled on all CPUs, ...")

    The patch avoids this by clearing cpu0_err in watchdog_nmi_disable().

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Andrew Jones
    Signed-off-by: Don Zickus
    Cc: pbonzini@redhat.com
    Link: http://lkml.kernel.org/r/1407768567-171794-4-git-send-email-dzickus@redhat.com
    [ Applied small cleanups. ]
    Signed-off-by: Ingo Molnar

    Ulrich Obergfell
     
  • Signed-off-by: chai wen
    Signed-off-by: Don Zickus
    Cc: pbonzini@redhat.com
    Link: http://lkml.kernel.org/r/1407768567-171794-2-git-send-email-dzickus@redhat.com
    Signed-off-by: Ingo Molnar

    chai wen
     

09 Aug, 2014

1 commit

  • This taint flag will be set if the system has ever entered a softlockup
    state. Similar to TAINT_WARN it is useful to know whether or not the
    system has been in a softlockup state when debugging.

    [akpm@linux-foundation.org: apply the taint before calling panic()]
    Signed-off-by: Josh Hunt
    Cc: Jason Baron
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Hunt
     

07 Aug, 2014

1 commit


24 Jun, 2014

2 commits

  • A 'softlockup' is defined as a bug that causes the kernel to loop in
    kernel mode for more than a predefined period of time, without giving
    other tasks a chance to run.

    Currently, upon detection of this condition by the per-cpu watchdog
    task, debug information (including a stack trace) is sent to the system
    log.

    On some occasions, we have observed that the "victim" rather than the
    actual "culprit" (i.e. the owner/holder of the contended resource) is
    reported to the user. Often this information has proven to be
    insufficient to assist debugging efforts.

    To avoid loss of useful debug information, for architectures which
    support NMI, this patch makes it possible to improve soft lockup
    reporting. This is accomplished by issuing an NMI to each cpu to obtain
    a stack trace.

    If NMI is not supported we just revert back to the old method. A sysctl
    and boot-time parameter is available to toggle this feature.

    [dzickus@redhat.com: add CONFIG_SMP in certain areas]
    [akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations]
    [mq@suse.cz: fix warning]
    Signed-off-by: Aaron Tomlin
    Signed-off-by: Don Zickus
    Cc: David S. Miller
    Cc: Mateusz Guzik
    Cc: Oleg Nesterov
    Signed-off-by: Jan Moskyto Matejka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aaron Tomlin
     
  • Peter Wu noticed the following splat on his machine when updating
    /proc/sys/kernel/watchdog_thresh:

    BUG: sleeping function called from invalid context at mm/slub.c:965
    in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
    3 locks held by init/1:
    #0: (sb_writers#3){.+.+.+}, at: [] vfs_write+0x143/0x180
    #1: (watchdog_proc_mutex){+.+.+.}, at: [] proc_dowatchdog+0x33/0x110
    #2: (cpu_hotplug.lock){.+.+.+}, at: [] get_online_cpus+0x32/0x80
    Preemption disabled at:[] proc_dowatchdog+0xe4/0x110

    CPU: 0 PID: 1 Comm: init Not tainted 3.16.0-rc1-testing #34
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
    dump_stack+0x4e/0x7a
    __might_sleep+0x11d/0x190
    kmem_cache_alloc_trace+0x4e/0x1e0
    perf_event_alloc+0x55/0x440
    perf_event_create_kernel_counter+0x26/0xe0
    watchdog_nmi_enable+0x75/0x140
    update_timers_all_cpus+0x53/0xa0
    proc_dowatchdog+0xe4/0x110
    proc_sys_call_handler+0xb3/0xc0
    proc_sys_write+0x14/0x20
    vfs_write+0xad/0x180
    SyS_write+0x49/0xb0
    system_call_fastpath+0x16/0x1b
    NMI watchdog: disabled (cpu0): hardware events not enabled

    What happened is after updating the watchdog_thresh, the lockup detector
    is restarted to utilize the new value. Part of this process involved
    disabling preemption. Once preemption was disabled, perf tried to
    allocate a new event (as part of the restart). This caused the above
    BUG_ON as you can't sleep with preemption disabled.

    The preemption restriction seemed aggressive as we are not doing anything
    on that particular cpu, but with all the online cpus (which are
    protected by the get_online_cpus lock). Remove the restriction and the
    BUG_ON goes away.

    Signed-off-by: Don Zickus
    Acked-by: Michal Hocko
    Reported-by: Peter Wu
    Tested-by: Peter Wu
    Acked-by: David Rientjes
    Cc: [3.13+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Don Zickus
     

19 Apr, 2014

1 commit

  • Fix:

    BUG: using __this_cpu_write() in preemptible [00000000] code: systemd-udevd/497
    caller is __this_cpu_preempt_check+0x13/0x20
    CPU: 3 PID: 497 Comm: systemd-udevd Tainted: G W 3.15.0-rc1 #9
    Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.02 04/27/2012
    Call Trace:
    check_preemption_disabled+0xe1/0xf0
    __this_cpu_preempt_check+0x13/0x20
    touch_nmi_watchdog+0x28/0x40

    Reported-by: Luis Henriques
    Tested-by: Luis Henriques
    Cc: Eric Piel
    Cc: Robert Moore
    Cc: Lv Zheng
    Cc: "Rafael J. Wysocki"
    Cc: Len Brown
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

04 Apr, 2014

1 commit

  • I ran into a scenario where while one cpu was stuck and should have
    panic'd because of the NMI watchdog, it didn't. The reason was another
    cpu was spewing stack dumps on to the console. Upon investigation, I
    noticed that when writing to the console and also when dumping the
    stack, the watchdog is touched.

    This causes all the cpus to reset their NMI watchdog flags and the
    'stuck' cpu just spins forever.

    This change causes the semantics of touch_nmi_watchdog to be changed
    slightly. Previously, I accidentally changed the semantics and we
    noticed there was a codepath in which touch_nmi_watchdog could be
    touched from a preemptible area. That caused a BUG() to happen when
    CONFIG_DEBUG_PREEMPT was enabled. I believe it was the acpi code.

    My attempt here re-introduces the change to have the
    touch_nmi_watchdog() code only touch the local cpu instead of all of the
    cpus. But instead of using __get_cpu_var(), I use the
    __raw_get_cpu_var() version.

    This avoids the preemption problem. However my reasoning wasn't because
    I was trying to be lazy. Instead I rationalized it as, well if
    preemption is enabled then interrupts should be enabled too and the NMI
    watchdog will have no reason to trigger. So it won't matter if the
    wrong cpu is touched because the percpu interrupt counters the NMI
    watchdog uses should still be incrementing.

    Don said:

    : I'm ok with this patch, though it does alter the behaviour of how
    : touch_nmi_watchdog works. For the most part I don't think most callers
    : need to touch all of the watchdogs (on each cpu). Perhaps a corner case
    : will pop up (the scheduler?? to mimic touch_all_softlockup_watchdogs() ).
    :
    : But this does address an issue where if a system is locked up and one cpu
    : is spewing out useful debug messages (or error messages), the hard lockup
    : will fail to go off. We have seen this on RHEL also.

    Signed-off-by: Don Zickus
    Signed-off-by: Ben Zhang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Zhang
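
    The effect of touching every cpu versus only the local one can be seen
    in a toy model. The array and function names below are illustrative
    assumptions, and the hard lockup check is reduced to "the interrupt
    counter did not advance".

```c
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical CPU count for the toy model */

/* Per-cpu "skip the next check" flag set by touching the watchdog. */
static bool watchdog_nmi_touch[NR_CPUS];

/* Old behaviour: touching resets the flag on every cpu. */
static void touch_all_cpus(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        watchdog_nmi_touch[cpu] = true;
}

/* New behaviour: touching affects only the calling cpu. */
static void touch_local_cpu(int cpu)
{
    watchdog_nmi_touch[cpu] = true;
}

/*
 * NMI-side check for one cpu. A touched cpu skips this check once;
 * otherwise a counter that did not advance is reported as a hard lockup.
 */
static bool nmi_check(int cpu, bool counter_advanced)
{
    if (watchdog_nmi_touch[cpu]) {
        watchdog_nmi_touch[cpu] = false;
        return false;
    }
    return !counter_advanced;
}
```

    If cpu 1 is stuck while cpu 0 keeps printing, touch_all_cpus() hides
    the lockup on cpu 1 at every check, whereas touch_local_cpu(0) leaves
    cpu 1 detectable.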
     

25 Feb, 2014

1 commit

  • In order to remotely restart the watchdog hrtimer, update_timers()
    allocates a csd on the stack and passes it to __smp_call_function_single().

    There is no particular need, however, for a specific csd here. Let's
    simplify that a little by calling smp_call_function_single()
    which can already take care of the csd allocation by itself.

    Acked-by: Don Zickus
    Reviewed-by: Michal Hocko
    Cc: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Don Zickus
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Michal Hocko
    Cc: Srivatsa S. Bhat
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Jens Axboe

    Frederic Weisbecker
     

25 Sep, 2013

2 commits

  • watchdog_thresh controls how often nmi perf event counter checks per-cpu
    hrtimer_interrupts counter and blows up if the counter hasn't changed
    since the last check. The counter is updated by per-cpu
    watchdog_hrtimer hrtimer which is scheduled with 2/5 watchdog_thresh
    period which guarantees that hrtimer is scheduled 2 times per the main
    period. Both hrtimer and perf event are started together when the
    watchdog is enabled.

    So far so good. But...

    But what happens when watchdog_thresh is updated from sysctl handler?

    proc_dowatchdog will set a new sampling period and hrtimer callback
    (watchdog_timer_fn) will use the new value in the next round. The
    problem, however, is that nobody tells the perf event that the sampling
    period has changed so it is ticking with the period configured when it
    has been set up.

    This might result in an ear ripping dissonance between perf and hrtimer
    parts if the watchdog_thresh is increased. And even worse it might lead
    to KABOOM if the watchdog is configured to panic on such a spurious
    lockup.

    This patch fixes the issue by updating both nmi perf event counter and
    hrtimers if the threshold value has changed.

    The nmi one is disabled and then reinitialized from scratch. This has
    an unpleasant side effect that the allocation of the new event might
    fail theoretically so the hard lockup detector would be disabled for
    such cpus. On the other hand such a memory allocation failure is very
    unlikely because the original event is deallocated right before.

    It would be much nicer if we just changed perf event period but there
    doesn't seem to be any API to do that right now. It is also unfortunate
    that perf_event_alloc uses GFP_KERNEL allocation unconditionally so we
    cannot use on_each_cpu() and do the same thing from the per-cpu context.
    The update from the current CPU should be safe because
    perf_event_disable removes the event atomically before it clears the
    per-cpu watchdog_ev so it cannot change anything under running handler
    feet.

    The hrtimer is simply restarted (thanks to Don Zickus who has pointed
    this out) if it is queued because we cannot rely on it to fire and adapt to
    the new sampling period before a new nmi event triggers (when the
    threshold is decreased).

    [akpm@linux-foundation.org: the UP version of __smp_call_function_single ended up in the wrong place]
    Signed-off-by: Michal Hocko
    Acked-by: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Fabio Estevam
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
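
    The 2/5 ratio mentioned at the start of this entry works out as
    follows. This is a worked-arithmetic sketch: both function names are
    hypothetical, and it assumes the NMI check period is exactly
    watchdog_thresh seconds.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* hrtimer period: 2/5 of watchdog_thresh, in nanoseconds. */
static uint64_t hrtimer_sample_period_ns(unsigned int watchdog_thresh)
{
    return (uint64_t)watchdog_thresh * 2 * (NSEC_PER_SEC / 5);
}

/* Whole hrtimer periods that fit into one NMI check period. */
static unsigned int hrtimer_fires_per_nmi_period(unsigned int watchdog_thresh)
{
    uint64_t sample = hrtimer_sample_period_ns(watchdog_thresh);

    if (sample == 0)
        return 0; /* watchdog_thresh of 0 means the watchdog is off */
    return (unsigned int)(((uint64_t)watchdog_thresh * NSEC_PER_SEC) / sample);
}
```

    For watchdog_thresh = 10 the hrtimer period is 4 seconds, so two full
    hrtimer periods fit into each 10 second NMI period. If only one side
    picks up a new threshold, that margin is lost.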
     
  • proc_dowatchdog doesn't synchronize multiple callers, so two parallel
    callers might race between watchdog_enable_all_cpus and
    watchdog_disable_all_cpus (e.g. the watchdog gets enabled even if
    watchdog_thresh was already set to 0).

    This patch adds a local mutex which synchronizes callers to the sysctl
    handler.

    Signed-off-by: Michal Hocko
    Cc: Frederic Weisbecker
    Acked-by: Don Zickus
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

31 Jul, 2013

1 commit

  • A perf event can be used without forcing the tick to
    stay alive if it doesn't use a frequency but a sample
    period and if it doesn't throttle (raise storm of events).

    Since the lockup detector neither uses a perf event frequency
    nor should ever throttle due to its high period, it can now
    run concurrently with the full dynticks feature.

    So remove the hack that disabled the watchdog.

    Signed-off-by: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian
    Cc: Don Zickus
    Cc: Srivatsa S. Bhat
    Cc: Anish Singh
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1374539466-4799-9-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

20 Jun, 2013

3 commits

  • When the watchdog runs, it prevents the full dynticks
    CPUs from stopping their tick because the hard lockup
    detector uses perf events internally, which in turn
    rely on the periodic tick.

    Since this is a rather confusing behaviour that is not
    easy to track down and identify for those who want to
    test CONFIG_NO_HZ_FULL, let's default disable the
    watchdog on boot time when full dynticks is enabled.

    The user can still enable it later on runtime using
    proc or sysctl.

    Reported-by: Steven Rostedt
    Suggested-by: Peter Zijlstra
    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Li Zhong
    Cc: Don Zickus
    Cc: Srivatsa S. Bhat
    Cc: Anish Singh

    Frederic Weisbecker
     
  • We have two very conflicting state variable names in the
    watchdog:

    * watchdog_enabled: This one reflects the user interface. It's
    set to 1 by default and can be overridden with boot options
    or sysctl/procfs interface.

    * watchdog_disabled: This is the internal toggle state that
    tells if watchdog threads, timers and NMI events are currently
    running or not. This state mostly depends on the user settings.
    It's a convenient state latch.

    Now we really need to find clearer names because those
    are just too confusing to encourage deep review.

    watchdog_enabled now becomes watchdog_user_enabled to reflect
    its purpose as an interface.

    watchdog_disabled becomes watchdog_running to suggest its
    role as a pure internal state.

    Signed-off-by: Frederic Weisbecker
    Cc: Srivatsa S. Bhat
    Cc: Anish Singh
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Don Zickus

    Frederic Weisbecker
     
  • The user activation/deactivation of the watchdog through boot parameters
    or sysctl is currently implemented with a dance involving kthread parking
    and unparking methods: the threads are unconditionally registered on
    boot and they park as soon as the user wants the watchdog to be disabled.

    This method involves a few noisy details to handle though: the watchdog
    kthreads may be unparked anytime due to hotplug operations, after which
    the watchdog internals have to decide to park again if it is user-disabled.

    As a result the setup() and unpark() methods need to be able to request a
    reparking. This is not currently supported in the kthread infrastructure
    so this piece of the watchdog code only works halfway.

    Besides, unparking/reparking the watchdog kthreads consumes unnecessary
    cputime on hotplug operations when those could be simply ignored in the
    first place.

    As suggested by Srivatsa, let's instead only register the watchdog
    threads when they are needed. This way we don't need to think about
    hotplug operations and we don't burden the CPU onlining when the watchdog
    is simply disabled.
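
    The idea can be sketched with a small userspace model in C. This is
    not the kernel code: threads_registered, thread_wakeups, cpu_up() and
    watchdog_user_set() are hypothetical names used only to show that a
    CPU coming online costs the watchdog nothing while no threads are
    registered.

```c
#include <stdbool.h>

#define NR_CPUS 4

static bool threads_registered;   /* are the per-cpu threads registered? */
static bool cpu_online[NR_CPUS];
static int  thread_wakeups;       /* hotplug work done for the watchdog */

/* Hotplug path: only touches watchdog threads if they exist at all. */
static void cpu_up(int cpu)
{
	cpu_online[cpu] = true;
	if (threads_registered)
		thread_wakeups++;     /* would unpark this cpu's thread */
}

/* User toggle: register on demand, unregister when disabled. */
static void watchdog_user_set(bool enable)
{
	if (enable && !threads_registered)
		threads_registered = true;
	else if (!enable && threads_registered)
		threads_registered = false;
}
```

    With the watchdog disabled, cpu_up() leaves thread_wakeups untouched;
    once the user enables it, each onlined CPU accounts for one wakeup.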

    Suggested-by: Srivatsa S. Bhat
    Signed-off-by: Frederic Weisbecker
    Cc: Srivatsa S. Bhat
    Cc: Anish Singh
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Don Zickus

    Frederic Weisbecker
     

14 Mar, 2013

1 commit

  • The watchdog_disabled flag is a bit cryptic. However, its
    usefulness is multifold. Uses are:

    1. Check whether the smpboot_register_percpu_thread() call succeeded.

    2. Make sure that the user enables and disables the watchdog in
    sequence, i.e. enable watchdog->disable watchdog->enable watchdog,
    unlike enable watchdog->enable watchdog, which is wrong.
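
    Both uses can be illustrated with a minimal userspace sketch in C.
    It is a model under stated assumptions, not the kernel implementation:
    fake_register_threads() stands in for smpboot_register_percpu_thread(),
    and watchdog_enable_all(), watchdog_disable_all() and register_calls
    are hypothetical names.

```c
#include <stdbool.h>

static bool watchdog_disabled = true;  /* latch: set until threads are up */
static int  register_calls;            /* counts registration attempts */

/* Stand-in for smpboot_register_percpu_thread(); returns 0 on success. */
static int fake_register_threads(bool fail)
{
	register_calls++;
	return fail ? -1 : 0;
}

static int watchdog_enable_all(bool fail)
{
	if (!watchdog_disabled)
		return 0;       /* enable->enable: nothing to do */
	if (fake_register_threads(fail))
		return -1;      /* use 1: latch stays set on failure */
	watchdog_disabled = false;
	return 0;
}

static void watchdog_disable_all(void)
{
	if (watchdog_disabled)
		return;         /* use 2: disable only after an enable */
	watchdog_disabled = true;
}
```

    A second enable in a row returns early without registering anything
    again, and a failed registration leaves the latch in the disabled
    state.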

    Signed-off-by: anish kumar
    [small text cleanups]
    Signed-off-by: Don Zickus
    Cc: chuansheng.liu@intel.com
    Cc: paulmck@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1363113848-18344-1-git-send-email-dzickus@redhat.com
    Signed-off-by: Ingo Molnar

    anish kumar
     

23 Feb, 2013

1 commit

  • Pull core locking changes from Ingo Molnar:
    "The biggest change is the rwsem lock-steal improvements, both to the
    assembly optimized and the spinlock based variants.

    The other notable change is the clean up of the seqlock implementation
    to be based on the seqcount infrastructure.

    The rest is assorted smaller debuggability, cleanup and continued -rt
    locking changes."

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    rwsem-spinlock: Implement writer lock-stealing for better scalability
    futex: Revert "futex: Mark get_robust_list as deprecated"
    generic: Use raw local irq variant for generic cmpxchg
    lockdep: Selftest: convert spinlock to raw spinlock
    seqlock: Use seqcount infrastructure
    seqlock: Remove unused functions
    ntp: Make ntp_lock raw
    intel_idle: Convert i7300_idle_lock to raw_spinlock
    locking: Various static lock initializer fixes
    lockdep: Print more info when MAX_LOCK_DEPTH is exceeded
    rwsem: Implement writer lock-stealing for better scalability
    lockdep: Silence warning if CONFIG_LOCKDEP isn't set
    watchdog: Use local_clock for get_timestamp()
    lockdep: Rename print_unlock_inbalance_bug() to print_unlock_imbalance_bug()
    locking/stat: Fix a typo

    Linus Torvalds
     

19 Feb, 2013

1 commit

  • The get_timestamp() function is always called with the current cpu,
    thus using local_clock() would be more appropriate and it makes
    the code shorter and cleaner IMHO.
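
    The shape of the change can be sketched in C with stand-in clocks.
    Everything here is a hypothetical model: fake_clock_ns, current_cpu,
    fake_cpu_clock() and fake_local_clock() only imitate the kernel's
    per-cpu clock helpers, and get_timestamp_old() is an illustrative
    name for the previous form.

```c
#include <stdint.h>

#define NR_CPUS 4

/* Hypothetical per-cpu clock values in nanoseconds. */
static uint64_t fake_clock_ns[NR_CPUS] = { 100, 200, 300, 400 };
static int current_cpu = 2;            /* stand-in for the running cpu */

static uint64_t fake_cpu_clock(int cpu)
{
	return fake_clock_ns[cpu];
}

static uint64_t fake_local_clock(void)
{
	return fake_clock_ns[current_cpu];
}

/* Before: the caller passes a cpu that is always the current one. */
static uint64_t get_timestamp_old(int this_cpu)
{
	return fake_cpu_clock(this_cpu);
}

/* After: no argument is needed. */
static uint64_t get_timestamp(void)
{
	return fake_local_clock();
}
```

    Since every caller passed the current cpu, both forms read the same
    clock; the second simply drops the redundant parameter.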

    Signed-off-by: Namhyung Kim
    Acked-by: Don Zickus
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1356576585-28782-1-git-send-email-namhyung@kernel.org
    Signed-off-by: Ingo Molnar

    Namhyung Kim
     

08 Feb, 2013

1 commit


20 Dec, 2012

1 commit

  • Commit 8d4516904b39 ("watchdog: Fix CPU hotplug regression") causes an
    oops or hard lockup when doing

    echo 0 > /proc/sys/kernel/nmi_watchdog
    echo 1 > /proc/sys/kernel/nmi_watchdog

    while the kernel is booted with nmi_watchdog=1 (the default).

    Running laptop-mode-tools and disconnecting/connecting AC power will
    cause this to trigger, making it a common failure scenario on laptops.

    Instead of bailing out of watchdog_disable() when !watchdog_enabled we
    can initialize the hrtimer regardless of watchdog_enabled status. This
    makes it safe to call watchdog_disable() in the nmi_watchdog=0 case,
    without the negative effect on the enabled => disabled => enabled case.
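
    The ordering argument can be shown with a small userspace model in C.
    It is a sketch, not the kernel code: struct fake_timer and
    fake_timer_cancel() are hypothetical stand-ins for the hrtimer and
    its cancel operation, and the watchdog_enabled parameter models the
    user setting.

```c
#include <stdbool.h>

struct fake_timer {
	bool initialized;
	bool armed;
};

/* Stand-in for cancelling the hrtimer: an uninitialized timer is a bug. */
static int fake_timer_cancel(struct fake_timer *t)
{
	if (!t->initialized)
		return -1;      /* models the oops or hard lockup */
	t->armed = false;
	return 0;
}

static void watchdog_enable(struct fake_timer *t, bool watchdog_enabled)
{
	t->initialized = true;  /* initialize regardless of the setting */
	if (!watchdog_enabled)
		return;         /* user disabled: do not start the timer */
	t->armed = true;
}

static int watchdog_disable(struct fake_timer *t)
{
	return fake_timer_cancel(t);    /* safe even if never armed */
}
```

    Because initialization no longer depends on the user setting,
    watchdog_disable() succeeds in the nmi_watchdog=0 case, and the
    enabled => disabled => enabled sequence always operates on an
    initialized timer.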

    All these tests pass with this patch:
    - nmi_watchdog=1
    echo 0 > /proc/sys/kernel/nmi_watchdog
    echo 1 > /proc/sys/kernel/nmi_watchdog

    - nmi_watchdog=0
    echo 0 > /sys/devices/system/cpu/cpu1/online

    - nmi_watchdog=0
    echo mem > /sys/power/state

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=51661

    Cc: # v3.7
    Cc: Norbert Warmuth
    Cc: Joseph Salisbury
    Cc: Thomas Gleixner
    Signed-off-by: Bjørn Mork
    Signed-off-by: Linus Torvalds

    Bjørn Mork