07 Nov, 2018
1 commit
-
The split out of the hard lockup detector exposed two new weak functions,
but no prototypes for them, which triggers the build warning:kernel/watchdog.c:109:12: warning: no previous prototype for ‘watchdog_nmi_enable’ [-Wmissing-prototypes]
kernel/watchdog.c:115:13: warning: no previous prototype for ‘watchdog_nmi_disable’ [-Wmissing-prototypes]Add the prototypes.
Fixes: 73ce0511c436 ("kernel/watchdog.c: move hardlockup detector to separate file")
Signed-off-by: Mathieu Malaterre
Signed-off-by: Thomas Gleixner
Cc: Babu Moger
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180606194232.17653-1-malat@debian.org
10 Jul, 2018
1 commit
-
I got confused by all the various CONFIG options here about and
conflated CONFIG_LOCKUP_DETECTOR and CONFIG_SOFTLOCKUP_DETECTOR.This results in a build failure for:
CONFIG_LOCKUP_DETECTOR=y && CONFIG_SOFTLOCKUP_DETECTOR=n
As reported by Abdul.
Reported-and-tested-by: Abdul Haleem
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-next
Cc: linuxppc-dev
Cc: mpe
Cc: sachinp
Cc: stephen Rothwell
Fixes: 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work")
Link: http://lkml.kernel.org/r/20180710114210.GI2476@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar
03 Jul, 2018
1 commit
-
Oleg suggested to replace the "watchdog/%u" threads with
cpu_stop_work. That removes one thread per CPU while at the same time
fixes softlockup vs SCHED_DEADLINE.But more importantly, it does away with the single
smpboot_update_cpumask_percpu_thread() user, which allows
cleanups/shrinkage of the smpboot interface.Suggested-by: Oleg Nesterov
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.By default all files without license information are under the default
license of the kernel, which is GPL version 2.Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
04 Oct, 2017
2 commits
-
The rework of the core hotplug code triggers the WARN_ON in start_wd_cpu()
on powerpc because it is called multiple times for the boot CPU.The first call is via:
start_wd_on_cpu+0x80/0x2f0
watchdog_nmi_reconfigure+0x124/0x170
softlockup_reconfigure_threads+0x110/0x130
lockup_detector_init+0xbc/0xe0
kernel_init_freeable+0x18c/0x37c
kernel_init+0x2c/0x160
ret_from_kernel_thread+0x5c/0xbcAnd then again via the CPU hotplug registration:
start_wd_on_cpu+0x80/0x2f0
cpuhp_invoke_callback+0x194/0x620
cpuhp_thread_fun+0x7c/0x1b0
smpboot_thread_fn+0x290/0x2a0
kthread+0x168/0x1b0
ret_from_kernel_thread+0x5c/0xbcThis can be avoided by setting up the cpu hotplug state with nocalls and
move the initialization to the watchdog_nmi_probe() function. That
initializes the hotplug callbacks without invoking the callback and the
following core initialization function then configures the watchdog for the
online CPUs (in this case CPU0) via softlockup_reconfigure_threads().Reported-and-tested-by: Michael Ellerman
Signed-off-by: Thomas Gleixner
Acked-by: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Nicholas Piggin
Cc: linuxppc-dev@lists.ozlabs.org -
The recent cleanup of the watchdog code split watchdog_nmi_reconfigure()
into two stages. One to stop the NMI and one to restart it after
reconfiguration. That was done by adding a boolean 'run' argument to the
code, which is functionally correct but not necessarily a piece of art.Replace it by two explicit functions: watchdog_nmi_stop() and
watchdog_nmi_start().Fixes: 6592ad2fcc8f ("watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage")
Requested-by: Linus 'Nursing his pet-peeve' Torvalds
Signed-off-by: Thomas 'Mopping up garbage' Gleixner
Acked-by: Michael Ellerman
Cc: Peter Zijlstra
Cc: Don Zickus
Cc: Benjamin Herrenschmidt
Cc: Nicholas Piggin
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710021957480.2114@nanos
14 Sep, 2017
11 commits
-
watchdog_nmi_enable() is an unparseable mess, Provide a clean perf specific
implementation, which will be used when the existing setup/teardown mess is
replaced.Signed-off-by: Thomas Gleixner
Reviewed-by: Don Zickus
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: Chris Metcalf
Cc: Linus Torvalds
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Cc: Sebastian Siewior
Cc: Ulrich Obergfell
Link: http://lkml.kernel.org/r/20170912194148.180215498@linutronix.de
Signed-off-by: Ingo Molnar -
The watchdog tries to create perf events even after it figured out that
perf is not functional or the requested event is not supported.That's braindead as this can be done once at init time and if not supported
the NMI watchdog can be turned off unconditonally.Implement the perf hardlockup detector functionality for that. This creates
a new event create function, which will replace the unholy mess of the
existing one in later patches.Signed-off-by: Thomas Gleixner
Reviewed-by: Don Zickus
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: Chris Metcalf
Cc: Linus Torvalds
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Cc: Sebastian Siewior
Cc: Ulrich Obergfell
Link: http://lkml.kernel.org/r/20170912194148.019090547@linutronix.de
Signed-off-by: Ingo Molnar -
Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure()
need to be done in two steps.1) Stop all NMIs
2) Read the new parameters and start NMIsRight now watchdog_nmi_reconfigure() is a combination of both. To allow a
clean reconfiguration add a 'run' argument and split the functionality in
powerpc.Signed-off-by: Thomas Gleixner
Reviewed-by: Don Zickus
Cc: Andrew Morton
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Chris Metcalf
Cc: Linus Torvalds
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Cc: Sebastian Siewior
Cc: Ulrich Obergfell
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170912194147.862865570@linutronix.de
Signed-off-by: Ingo Molnar -
Reflect that these variables are user interface related and remove the
whitespace damage in the sysctl table while at it.Signed-off-by: Thomas Gleixner
Reviewed-by: Don Zickus
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: Chris Metcalf
Cc: Linus Torvalds
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Cc: Sebastian Siewior
Cc: Ulrich Obergfell
Link: http://lkml.kernel.org/r/20170912194147.783210221@linutronix.de
Signed-off-by: Ingo Molnar -
The sysctl of the nmi_watchdog file prevents writes by setting:
min = max = 0
if none of the users is enabled. That involves ifdeffery and is competely
non obvious.If none of the facilities is enabeld, then the file can simply be made read
only. Move the ifdeffery into the header and use a constant for file
permissions.Signed-off-by: Thomas Gleixner
Reviewed-by: Don Zickus
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: Chris Metcalf
Cc: Linus Torvalds
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Cc: Sebastian Siewior
Cc: Ulrich Obergfell
Link: http://lkml.kernel.org/r/20170912194147.706073616@linutronix.de
Signed-off-by: Ingo Molnar -
Having the same #ifdef in various places does not make it more
readable. Collect stuff into one place.Signed-off-by: Thomas Gleixner
Reviewed-by: Don Zickus
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: Chris Metcalf
Cc: Linus Torvalds
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Cc: Sebastian Siewior
Cc: Ulrich Obergfell
Link: http://lkml.kernel.org/r/20170912194147.627096864@linutronix.de
Signed-off-by: Ingo Molnar -
Commit:
b94f51183b06 ("kernel/watchdog: prevent false hardlockup on overloaded system")
tries to fix the following issue:
proc_write()
set_sample_period() park()
disable_nmi()
Reviewed-by: Don Zickus
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: Chris Metcalf
Cc: Linus Torvalds
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Cc: Sebastian Siewior
Cc: Ulrich Obergfell
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1709052038270.2393@nanos
Signed-off-by: Ingo Molnar -
The following deadlock is possible in the watchdog hotplug code:
cpus_write_lock()
...
takedown_cpu()
smpboot_park_threads()
smpboot_park_thread()
kthread_park()
->park() := watchdog_disable()
watchdog_nmi_disable()
perf_event_release_kernel();
put_event()
_free_event()
->destroy() := hw_perf_event_destroy()
x86_release_hardware()
release_ds_buffers()
get_online_cpus()when a per cpu watchdog perf event is destroyed which drops the last
reference to the PMU hardware. The cleanup code there invokes
get_online_cpus() which instantly deadlocks because the hotplug percpu
rwsem is write locked.To solve this add a deferring mechanism:
cpus_write_lock()
kthread_park()
watchdog_nmi_disable(deferred)
perf_event_disable(event);
move_event_to_deferred(event);
....
cpus_write_unlock()
cleaup_deferred_events()
perf_event_release_kernel()This is still properly serialized against concurrent hotplug via the
cpu_add_remove_lock, which is held by the task which initiated the hotplug
event.This is also used to handle event destruction when the watchdog threads are
parked via other mechanisms than CPU hotplug.Analyzed-by: Peter Zijlstra
Reported-by: Borislav Petkov
Signed-off-by: Thomas Gleixner
Reviewed-by: Don Zickus
Cc: Andrew Morton
Cc: Chris Metcalf
Cc: Linus Torvalds
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Cc: Sebastian Siewior
Cc: Ulrich Obergfell
Link: http://lkml.kernel.org/r/20170912194146.884469246@linutronix.de
Signed-off-by: Ingo Molnar -
This interface has several issues:
- It's causing recursive locking of the hotplug lock.
- It's complete overkill to teardown all threads and then recreate them
The same can be achieved with the simple hardlockup_detector_perf_stop /
restart() interfaces. The abuse from the busy looping poweroff() loop of
PARISC has been solved as well.Remove the cruft.
Signed-off-by: Thomas Gleixner
Reviewed-by: Don Zickus
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: Chris Metcalf
Cc: Linus Torvalds
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Cc: Sebastian Siewior
Cc: Ulrich Obergfell
Link: http://lkml.kernel.org/r/20170912194146.487537732@linutronix.de
Signed-off-by: Ingo Molnar -
PARISC has a a busy looping power off routine. If the watchdog is enabled
the watchdog timer will still fire, but the thread is not running, which
causes the softlockup watchdog to trigger.Provide a interface which allows to turn the watchdog off.
Signed-off-by: Thomas Gleixner
Reviewed-by: Don Zickus
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: Chris Metcalf
Cc: Helge Deller
Cc: Linus Torvalds
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Cc: Sebastian Siewior
Cc: Ulrich Obergfell
Cc: linux-parisc@vger.kernel.org
Link: http://lkml.kernel.org/r/20170912194146.327343752@linutronix.de
Signed-off-by: Ingo Molnar -
Provide an interface to stop and restart perf NMI watchdog events on all
CPUs. This is only usable during init and especially for handling the perf
HT bug on Intel machines. It's safe to use it this way as nothing can
start/stop the NMI watchdog in parallel.Signed-off-by: Peter Zijlstra
Signed-off-by: Thomas Gleixner
Reviewed-by: Don Zickus
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: Chris Metcalf
Cc: Linus Torvalds
Cc: Nicholas Piggin
Cc: Sebastian Siewior
Cc: Ulrich Obergfell
Link: http://lkml.kernel.org/r/20170912194146.167649596@linutronix.de
Signed-off-by: Ingo Molnar
18 Aug, 2017
1 commit
-
The hardlockup detector on x86 uses a performance counter based on unhalted
CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the
performance counter period, so the hrtimer should fire 2-3 times before the
performance counter NMI fires. The NMI code checks whether the hrtimer
fired since the last invocation. If not, it assumess a hard lockup.The calculation of those periods is based on the nominal CPU
frequency. Turbo modes increase the CPU clock frequency and therefore
shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x
nominal frequency) the perf/NMI period is shorter than the hrtimer period
which leads to false positives.A simple fix would be to shorten the hrtimer period, but that comes with
the side effect of more frequent hrtimer and softlockup thread wakeups,
which is not desired.Implement a low pass filter, which checks the perf/NMI period against
kernel time. If the perf/NMI fires before 4/5 of the watchdog period has
elapsed then the event is ignored and postponed to the next perf/NMI.That solves the problem and avoids the overhead of shorter hrtimer periods
and more frequent softlockup thread wakeups.Fixes: 58687acba592 ("lockup_detector: Combine nmi_watchdog and softlockup detector")
Reported-and-tested-by: Kan Liang
Signed-off-by: Thomas Gleixner
Cc: dzickus@redhat.com
Cc: prarit@redhat.com
Cc: ak@linux.intel.com
Cc: babu.moger@oracle.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: acme@redhat.com
Cc: stable@vger.kernel.org
Cc: atomlin@redhat.com
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos
13 Jul, 2017
3 commits
-
Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.LOCKUP_DETECTOR implies the general boot, sysctl, and programming
interfaces for the lockup detectors.An architecture that wants to use a hard lockup detector must define
HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH.Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides the
minimum arch_touch_nmi_watchdog, and it otherwise does its own thing and
does not implement the LOCKUP_DETECTOR interfaces.sparc is unusual in that it has started to implement some of the
interfaces, but not fully yet. It should probably be converted to a full
HAVE_HARDLOCKUP_DETECTOR_ARCH.[npiggin@gmail.com: fix]
Link: http://lkml.kernel.org/r/20170617223522.66c0ad88@roar.ozlabs.ibm.com
Link: http://lkml.kernel.org/r/20170616065715.18390-4-npiggin@gmail.com
Signed-off-by: Nicholas Piggin
Reviewed-by: Don Zickus
Reviewed-by: Babu Moger
Tested-by: Babu Moger [sparc]
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
For architectures that define HAVE_NMI_WATCHDOG, instead of having them
provide the complete touch_nmi_watchdog() function, just have them
provide arch_touch_nmi_watchdog().This gives the generic code more flexibility in implementing this
function, and arch implementations don't miss out on touching the
softlockup watchdog or other generic details.Link: http://lkml.kernel.org/r/20170616065715.18390-3-npiggin@gmail.com
Signed-off-by: Nicholas Piggin
Reviewed-by: Don Zickus
Reviewed-by: Babu Moger
Tested-by: Babu Moger [sparc]
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Patch series "Improve watchdog config for arch watchdogs", v4.
A series to make the hardlockup watchdog more easily replaceable by arch
code. The last patch provides some justification for why we want to do
this (existing sparc watchdog is another that could benefit).This patch (of 5):
Remove unused declaration.
Link: http://lkml.kernel.org/r/20170616065715.18390-2-npiggin@gmail.com
Signed-off-by: Nicholas Piggin
Reviewed-by: Don Zickus
Reviewed-by: Babu Moger
Tested-by: Babu Moger [sparc]
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Mar, 2017
1 commit
-
These methods don't belong into , they are neither directly
related to task_struct or are scheduler functionality.Put them next to the other watchdog methods in .
( Arguably that header's name is a misnomer, and this patch makes it
more so - but it should be renamed in another patch. )Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
25 Jan, 2017
1 commit
-
On an overloaded system, it is possible that a change in the watchdog
threshold can be delayed long enough to trigger a false positive.This can easily be achieved by having a cpu spinning indefinitely on a
task, while another cpu updates watchdog threshold.What happens is while trying to park the watchdog threads, the hrtimers
on the other cpus trigger and reprogram themselves with the new slower
watchdog threshold. Meanwhile, the nmi watchdog is still programmed
with the old faster threshold.Because the one cpu is blocked, it prevents the thread parking on the
other cpus from completing, which is needed to shutdown the nmi watchdog
and reprogram it correctly. As a result, a false positive from the nmi
watchdog is reported.Fix this by setting a park_in_progress flag to block all lockups until
the parking is complete.Fix provided by Ulrich Obergfell.
[akpm@linux-foundation.org: s/park_in_progress/watchdog_park_in_progress/]
Link: http://lkml.kernel.org/r/1481041033-192236-1-git-send-email-dzickus@redhat.com
Signed-off-by: Don Zickus
Reviewed-by: Aaron Tomlin
Cc: Ulrich Obergfell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Dec, 2016
1 commit
-
Patch series "Clean up watchdog handlers", v2.
This is an attempt to cleanup watchdog handlers. Right now,
kernel/watchdog.c implements both softlockup and hardlockup detectors.
Softlockup code is generic. Hardlockup code is arch specific. Some
architectures don't use hardlockup detectors. They use their own
watchdog detectors. To make both these combination work, we have
numerous #ifdefs in kernel/watchdog.c.We are trying here to make these handlers independent of each other.
Also provide an interface for architectures to implement their own
handlers. watchdog_nmi_enable and watchdog_nmi_disable will be defined
as weak such that architectures can override its definitions.Thanks to Don Zickus for his suggestions.
Here are our previous discussions
http://www.spinics.net/lists/sparclinux/msg16543.html
http://www.spinics.net/lists/sparclinux/msg16441.htmlThis patch (of 3):
Move shared macros and definitions to nmi.h so that watchdog.c, new file
watchdog_hld.c or any other architecture specific handler can use those
definitions.Link: http://lkml.kernel.org/r/1478034826-43888-2-git-send-email-babu.moger@oracle.com
Signed-off-by: Babu Moger
Acked-by: Don Zickus
Cc: Ingo Molnar
Cc: Jiri Kosina
Cc: Andi Kleen
Cc: Yaowei Bai
Cc: Aaron Tomlin
Cc: Ulrich Obergfell
Cc: Tejun Heo
Cc: Hidehiro Kawai
Cc: Josh Hunt
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Oct, 2016
1 commit
-
Patch series "improvements to the nmi_backtrace code" v9.
This patch series modifies the trigger_xxx_backtrace() NMI-based remote
backtracing code to make it more flexible, and makes a few small
improvements along the way.The motivation comes from the task isolation code, where there are
scenarios where we want to be able to diagnose a case where some cpu is
about to interrupt a task-isolated cpu. It can be helpful to see both
where the interrupting cpu is, and also an approximation of where the
cpu that is being interrupted is. The nmi_backtrace framework allows us
to discover the stack of the interrupted cpu.I've tested that the change works as desired on tile, and build-tested
x86, arm, mips, and sparc64. For x86 I confirmed that the generic
cpuidle stuff as well as the architecture-specific routines are in the
new cpuidle section. For arm, mips, and sparc I just build-tested it
and made sure the generic cpuidle routines were in the new cpuidle
section, but I didn't attempt to figure out which the platform-specific
idle routines might be. That might be more usefully done by someone
with platform experience in follow-up patches.This patch (of 4):
Currently you can only request a backtrace of either all cpus, or all
cpus but yourself. It can also be helpful to request a remote backtrace
of a single cpu, and since we want that, the logical extension is to
support a cpumask as the underlying primitive.This change modifies the existing lib/nmi_backtrace.c code to take a
cpumask as its basic primitive, and modifies the linux/nmi.h code to use
the new "cpumask" method instead.The existing clients of nmi_backtrace (arm and x86) are converted to
using the new cpumask approach in this change.The other users of the backtracing API (sparc64 and mips) are converted
to use the cpumask approach rather than the all/allbutself approach.
The mips code ignored the "include_self" boolean but with this change it
will now also dump a local backtrace if requested.Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.com
Signed-off-by: Chris Metcalf
Tested-by: Daniel Thompson [arm]
Reviewed-by: Aaron Tomlin
Reviewed-by: Petr Mladek
Cc: "Rafael J. Wysocki"
Cc: Russell King
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Ralf Baechle
Cc: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Mar, 2016
1 commit
-
Commit 10f9014912 ("x86: Cleanup hw_nmi.c cruft") removed unused
code in the hw_nmi.c file because of the redesign of the hardlockup
watchdog but left declaration of hw_nmi_is_cpu_stuck in linux/nmi.h,
so remvoe it.Signed-off-by: Yaowei Bai
Link: http://lkml.kernel.org/r/1458697210-3027-1-git-send-email-baiyaowei@cmss.chinamobile.com
Signed-off-by: Thomas Gleixner
06 Nov, 2015
1 commit
-
In many cases of hardlockup reports, it's actually not possible to know
why it triggered, because the CPU that got stuck is usually waiting on a
resource (with IRQs disabled) in posession of some other CPU is holding.IOW, we are often looking at the stacktrace of the victim and not the
actual offender.Introduce sysctl / cmdline parameter that makes it possible to have
hardlockup detector perform all-CPU backtrace.Signed-off-by: Jiri Kosina
Reviewed-by: Aaron Tomlin
Cc: Ulrich Obergfell
Acked-by: Don Zickus
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Sep, 2015
1 commit
-
Pull NMI backtrace update from Russell King:
"These changes convert the x86 NMI handling to be a library
implementation which other architectures can make use of. Thomas
Gleixner has reviewed and tested these changes, and wishes me to send
these rather than taking them through the tip tree.The final patch in the set adds an initial implementation using this
infrastructure to ARM, even though it doesn't send the IPI at "NMI"
level. Patches are in progress to add the ARM equivalent of NMI, but
we still need the IRQ-level fallback for systems where the "NMI" isn't
available due to secure firmware denying access to it"* 'nmi' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: add basic support for on-demand backtrace of other CPUs
nmi: x86: convert to generic nmi handler
nmi: create generic NMI backtrace implementation
05 Sep, 2015
4 commits
-
Rename watchdog_suspend() to lockup_detector_suspend() and
watchdog_resume() to lockup_detector_resume() to avoid confusion with the
watchdog subsystem and to be consistent with the existing name
lockup_detector_init().Also provide comment blocks to explain the watchdog_running and
watchdog_suspended variables and their relationship.Signed-off-by: Ulrich Obergfell
Reviewed-by: Aaron Tomlin
Cc: Guenter Roeck
Cc: Don Zickus
Cc: Ulrich Obergfell
Cc: Jiri Olsa
Cc: Michal Hocko
Cc: Stephane Eranian
Cc: Chris Metcalf
Cc: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove watchdog_nmi_disable_all() and watchdog_nmi_enable_all() since
these functions are no longer needed. If a subsystem has a need to
deactivate the watchdog temporarily, it should utilize the
watchdog_suspend() and watchdog_resume() functions.[akpm@linux-foundation.org: fix build with CONFIG_LOCKUP_DETECTOR=m]
Signed-off-by: Ulrich Obergfell
Reviewed-by: Aaron Tomlin
Cc: Guenter Roeck
Cc: Don Zickus
Cc: Ulrich Obergfell
Cc: Jiri Olsa
Cc: Michal Hocko
Cc: Stephane Eranian
Cc: Chris Metcalf
Cc: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This interface can be utilized to deactivate the hard and soft lockup
detector temporarily. Callers are expected to minimize the duration of
deactivation. Multiple deactivations are allowed to occur in parallel but
should be rare in practice.[akpm@linux-foundation.org: remove unneeded static initialization]
Signed-off-by: Ulrich Obergfell
Reviewed-by: Aaron Tomlin
Cc: Guenter Roeck
Cc: Don Zickus
Cc: Ulrich Obergfell
Cc: Jiri Olsa
Cc: Michal Hocko
Cc: Stephane Eranian
Cc: Chris Metcalf
Cc: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The kernel's NMI watchdog has nothing to do with the watchdog subsystem.
Its header declarations should be in linux/nmi.h, not linux/watchdog.h.The code provided two sets of dummy functions if HARDLOCKUP_DETECTOR is
not configured, one in the include file and one in kernel/watchdog.c.
Remove the dummy functions from kernel/watchdog.c and use those from the
include file.Signed-off-by: Guenter Roeck
Cc: Stephane Eranian
Cc: Peter Zijlstra (Intel)
Cc: Ingo Molnar
Cc: Don Zickus
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Jul, 2015
1 commit
-
x86s NMI backtrace implementation (for arch_trigger_all_cpu_backtrace())
is fairly generic in nature - the only architecture specific bits are
the act of raising the NMI to other CPUs, and reporting the status of
the NMI handler.These are fairly simple to factor out, and produce a generic
implementation which can be shared between ARM and x86.Reviewed-by: Thomas Gleixner
Signed-off-by: Russell King
25 Jun, 2015
1 commit
-
Change the default behavior of watchdog so it only runs on the
housekeeping cores when nohz_full is enabled at build and boot time.
Allow modifying the set of cores the watchdog is currently running on
with a new kernel.watchdog_cpumask sysctl.In the current system, the watchdog subsystem runs a periodic timer that
schedules the watchdog kthread to run. However, nohz_full cores are
designed to allow userspace application code running on those cores to
have 100% access to the CPU. So the watchdog system prevents the
nohz_full application code from being able to run the way it wants to,
thus the motivation to suppress the watchdog on nohz_full cores, which
this patchset provides by default.However, if we disable the watchdog globally, then the housekeeping
cores can't benefit from the watchdog functionality. So we allow
disabling it only on some cores. See Documentation/lockup-watchdogs.txt
for more information.[jhubbard@nvidia.com: fix a watchdog crash in some configurations]
Signed-off-by: Chris Metcalf
Acked-by: Don Zickus
Cc: Ingo Molnar
Cc: Ulrich Obergfell
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Frederic Weisbecker
Signed-off-by: John Hubbard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Apr, 2015
4 commits
-
Have kvm_guest_init() use hardlockup_detector_disable() instead of
watchdog_enable_hardlockup_detector(false).Remove the watchdog_hardlockup_detector_is_enabled() and the
watchdog_enable_hardlockup_detector() function which are no longer needed.Signed-off-by: Ulrich Obergfell
Signed-off-by: Don Zickus
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With the current user interface of the watchdog mechanism it is only
possible to disable or enable both lockup detectors at the same time.
This series introduces new kernel parameters and changes the semantics of
some existing kernel parameters, so that the hard lockup detector and the
soft lockup detector can be disabled or enabled individually. With this
series applied, the user interface is as follows.- parameters in /proc/sys/kernel
. soft_watchdog
This is a new parameter to control and examine the run state of
the soft lockup detector.. nmi_watchdog
The semantics of this parameter have changed. It can now be used
to control and examine the run state of the hard lockup detector.. watchdog
This parameter is still available to control the run state of both
lockup detectors at the same time. If this parameter is examined,
it shows the logical OR of soft_watchdog and nmi_watchdog.. watchdog_thresh
The semantics of this parameter are not affected by the patch.- kernel command line parameters
. nosoftlockup
The semantics of this parameter have changed. It can now be used
to disable the soft lockup detector at boot time.. nmi_watchdog=0 or nmi_watchdog=1
Disable or enable the hard lockup detector at boot time. The patch
introduces '=1' as a new option.. nowatchdog
The semantics of this parameter are not affected by the patch. It
is still available to disable both lockup detectors at boot time.Also, remove the proc_dowatchdog() function which is no longer needed.
[dzickus@redhat.com: wrote changelog]
[dzickus@redhat.com: update documentation for kernel params and sysctl]
Signed-off-by: Ulrich Obergfell
Signed-off-by: Don Zickus
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Separate handlers for each watchdog parameter in /proc/sys/kernel replace
the proc_dowatchdog() function. Three of those handlers merely call
proc_watchdog_common() with one different argument.Signed-off-by: Ulrich Obergfell
Signed-off-by: Don Zickus
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The hardlockup and softockup had always been tied together. Due to the
request of KVM folks, they had a need to have one enabled but not the
other. Internally rework the code to split things apart more cleanly.There is a bunch of churn here, but the end result should be code that
should be easier to maintain and fix without knowing the internals of what
is going on.This patch (of 9):
Introduce new definitions and variables to separate the user interface in
/proc/sys/kernel from the internal run state of the lockup detectors. The
internal run state is represented by two bits in a new variable that is
named 'watchdog_enabled'. This helps simplify the code, for example:- In order to check if any of the two lockup detectors is enabled,
it is sufficient to check if 'watchdog_enabled' is not zero.- In order to enable/disable one or both lockup detectors,
it is sufficient to set/clear one or both bits in 'watchdog_enabled'.- Concurrent updates of 'watchdog_enabled' need not be synchronized via
a spinlock or a mutex. Updates can either be atomic or concurrency can
be detected by using 'cmpxchg'.Signed-off-by: Ulrich Obergfell
Signed-off-by: Don Zickus
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Oct, 2014
1 commit
-
In some cases we don't want hard lockup detection enabled by default.
An example is when running as a guest. Introducewatchdog_enable_hardlockup_detector(bool)
allowing those cases to disable hard lockup detection. This must be
executed early by the boot processor from e.g. smp_prepare_boot_cpu, in
order to allow kernel command line arguments to override it, as well as
to avoid hard lockup detection being enabled before we've had a chance
to indicate that it's unwanted. In summary,initial boot: default=enabled
smp_prepare_boot_cpu
watchdog_enable_hardlockup_detector(false): default=disabled
cmdline has 'nmi_watchdog=1': default=enabledThe running kernel still has the ability to enable/disable at any time
with /proc/sys/kernel/nmi_watchdog us usual. However even when the
default has been overridden /proc/sys/kernel/nmi_watchdog will initially
show '1'. To truly turn it on one must disable/enable it, i.e.echo 0 > /proc/sys/kernel/nmi_watchdog
echo 1 > /proc/sys/kernel/nmi_watchdogThis patch will be immediately useful for KVM with the next patch of this
series. Other hypervisor guest types may find it useful as well.[akpm@linux-foundation.org: fix build]
[dzickus@redhat.com: fix compile issues on sparc]
Signed-off-by: Ulrich Obergfell
Signed-off-by: Andrew Jones
Signed-off-by: Don Zickus
Signed-off-by: Don Zickus
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Jul, 2014
1 commit
-
Currently APEI depends on x86 architecture. It is because of NMI hardware
error notification of GHES which is currently supported by x86 only.
However, many other APEI features can be still used perfectly by other
architectures.This commit adds two symbols:
1. HAVE_ACPI_APEI for those archs which support APEI.
2. HAVE_ACPI_APEI_NMI which is used for NMI code isolation in ghes.c
file. NMI related data and functions are grouped so they can be wrapped
inside one #ifdef section. Appropriate function stubs are provided for
!NMI case.Note there is no functional changes for x86 due to hard selected
HAVE_ACPI_APEI and HAVE_ACPI_APEI_NMI symbols.Signed-off-by: Tomasz Nowicki
Acked-by: Borislav Petkov
Signed-off-by: Tony Luck