14 Jan, 2009
1 commit
-
At run-time, if softlockup_thresh is changed to a much lower value,
touch_timestamp is likely to be much older than the new softlockup_thresh.

This will cause a false softlockup to be detected. If softlockup_panic
is enabled, the system will panic.

The fix is to touch all watchdogs before changing softlockup_thresh.
Signed-off-by: Mandeep Singh Baines
Signed-off-by: Ingo Molnar
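The failure mode and the fix described in this commit can be modelled in user space. The sketch below is not the kernel code: `NCPUS`, `struct watchdog_model`, `is_lockup()` and `set_thresh()` are hypothetical names, and time is a plain count of seconds passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4 /* hypothetical CPU count for the model */

/* Hypothetical user-space model of the per-CPU watchdog state. */
struct watchdog_model {
    unsigned long touch_timestamp[NCPUS]; /* last touch, in seconds */
    unsigned long thresh;                 /* models softlockup_thresh */
};

/* A CPU looks locked up when its last touch is older than the threshold. */
static bool is_lockup(const struct watchdog_model *w, int cpu,
                      unsigned long now)
{
    assert(cpu >= 0 && cpu < NCPUS);
    return now - w->touch_timestamp[cpu] > w->thresh;
}

/* Models the fix: refresh every timestamp, then install the new threshold. */
static void set_thresh(struct watchdog_model *w, unsigned long new_thresh,
                       unsigned long now)
{
    for (int cpu = 0; cpu < NCPUS; cpu++)
        w->touch_timestamp[cpu] = now;
    w->thresh = new_thresh;
}
```

Writing `thresh` directly while a timestamp is 30 seconds old makes `is_lockup()` fire as soon as the threshold drops below 30; going through `set_thresh()` restarts the measurement from the moment of the change.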
01 Jan, 2009
2 commits
-
Impact: Reduce stack usage, use new cpumask API.
Mainly changing cpumask_t to 'struct cpumask' and similar simple API
conversion. Two conversions worth mentioning:
1) we use cpumask_any_but to avoid a temporary in kernel/softlockup.c,
2) Use cpumask_var_t in taskstats_user_cmd().
Signed-off-by: Rusty Russell
Signed-off-by: Mike Travis
Cc: Balbir Singh
Cc: Ingo Molnar
-
Impact: Remove obsolete API usage
any_online_cpu() is a good name, but it takes a cpumask_t, not a
pointer.

There are several places where any_online_cpu() doesn't really want a
mask arg at all. Replace all callers with cpumask_any() and
cpumask_any_and().
Signed-off-by: Rusty Russell
Signed-off-by: Mike Travis
25 Dec, 2008
1 commit
03 Dec, 2008
1 commit
-
Impact: fix warnings-limit cutoff check for debug feature
unsigned sysctl_hung_task_warnings cannot be less than 0
Signed-off-by: Roel Kluin
Signed-off-by: Ingo Molnar
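The point about unsigned values can be shown with a small user-space sketch. It is only an illustration: `struct warn_budget`, `take_warning()` and `unchecked_decrement()` are hypothetical names, not the kernel's, and the exact form of the corrected check in the patch is not quoted here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical warnings budget; unsigned, like the sysctl named above. */
struct warn_budget {
    unsigned long left;
};

/*
 * Consume one warning if any remain. The cutoff compares against zero:
 * a test of the form "left < 0" can never be true for an unsigned type.
 */
static bool take_warning(struct warn_budget *b)
{
    assert(b != NULL);
    if (b->left == 0)
        return false;
    b->left--;
    return true;
}

/* What an unchecked decrement does at zero: it wraps to the largest value. */
static unsigned long unchecked_decrement(unsigned long left)
{
    return left - 1;
}
```

With a budget of 2, `take_warning()` succeeds twice and then refuses; without the zero check the counter would wrap to `ULONG_MAX` and the limit would effectively disappear.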
25 Nov, 2008
1 commit
-
…ignal', 'core/urgent' and 'core/xen' into core/core
17 Oct, 2008
1 commit
-
It's somewhat unlikely that it happens, but right now a race window
between interrupts or machine checks or oopses could corrupt the tainted
bitmap because it is modified in a non-atomic fashion.

Convert the taint variable to an unsigned long and use only atomic bit
operations on it.

Unfortunately this means the intvec sysctl functions cannot be used on it
anymore.

It turned out the taint sysctl handler could actually be simplified a bit
(since it only increases capabilities) so this patch actually removes
code.

[akpm@linux-foundation.org: remove unneeded include]
Signed-off-by: Andi Kleen
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
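As a rough user-space analogue of "an unsigned long with only atomic bit operations", the sketch below uses C11 `<stdatomic.h>` rather than the kernel's own bit helpers. `taint_mask`, `model_add_taint()` and `model_test_taint()` are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a taint bitmap held in one unsigned long. */
static atomic_ulong taint_mask;

/* Set one flag with a single atomic read-modify-write. */
static void model_add_taint(unsigned int flag)
{
    assert(flag < sizeof(unsigned long) * CHAR_BIT);
    atomic_fetch_or(&taint_mask, 1UL << flag);
}

/* Read one flag from an atomic snapshot of the mask. */
static bool model_test_taint(unsigned int flag)
{
    assert(flag < sizeof(unsigned long) * CHAR_BIT);
    return (atomic_load(&taint_mask) >> flag) & 1UL;
}
```

Because each update is one indivisible OR, two contexts setting different flags at the same time cannot lose each other's bit, which is the corruption a plain `mask |= bit` sequence risks.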
10 Sep, 2008
2 commits
-
Conflicts:
lib/vsprintf.c

Manual merge:
include/linux/kernel.h
Signed-off-by: Ingo Molnar
-
Andrew says:
> Seems that about 100% of the reports we get of this warning triggering
> are sys_sync, transaction commit, etc.

Increase the timeout. If it still triggers for people, we can kill it.
Signed-off-by: Ingo Molnar
03 Sep, 2008
1 commit
-
The recent commit 16d9679f33caf7e683471647d1472bfe133d858 changed
check_hung_task() to filter out the TASK_KILLABLE tasks. We can
move this check to the caller which has to test t->state anyway.
Signed-off-by: Oleg Nesterov
Acked-by: Andi Kleen
Signed-off-by: Linus Torvalds
30 Aug, 2008
1 commit
-
Pulling the ethernet cable on a 2.6.27-rc system with NFS mounts
currently leads to an ongoing flood of soft lockup detector backtraces
for all tasks blocked on the NFS mounts when the hiccup takes
longer than 120s.

I don't think NFS problems should be all that noisy.
Luckily there's a reasonably easy way to distinguish this case.
Don't report task softlockup warnings for tasks in TASK_KILLABLE
state, which is used by the network file systems.

I believe this patch is a 2.6.27 candidate.
Signed-off-by: Andi Kleen
Signed-off-by: Linus Torvalds
27 Jul, 2008
1 commit
-
A previous patch added the early_initcall(), to allow a cleaner hooking of
pre-SMP initcalls. Now we remove the older interface, converting all
existing users to the new one.

[akpm@linux-foundation.org: cleanups]
[akpm@linux-foundation.org: build fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
Signed-off-by: Eduard - Gabriel Munteanu
Cc: Tom Zanussi
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Jul, 2008
1 commit
-
The print_timestamp can never be bigger than the touch_timestamp; at
maximum it can be equal. And if it is, the second check, that
touch_timestamp + 1 is bigger than print_timestamp, is always true, too.

The check for equality is sufficient as we proceed in one-second steps
and are at least one second away from the last print-out if we have
another timestamp.
Signed-off-by: Johannes Weiner
Cc: Peter Zijlstra
Signed-off-by: Ingo Molnar
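The argument can be checked mechanically. The sketch below assumes the earlier test had the shape "print >= touch and print < touch + 1"; that shape is inferred from the message, not quoted from the patch, and `range_check()`, `equality_check()` and `checks_agree()` are hypothetical helpers working on whole seconds.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed shape of the earlier two-part test (an inference, see above). */
static bool range_check(unsigned long print_ts, unsigned long touch_ts)
{
    return print_ts >= touch_ts && print_ts < touch_ts + 1;
}

/* The simplified test the message argues for. */
static bool equality_check(unsigned long print_ts, unsigned long touch_ts)
{
    return print_ts == touch_ts;
}

/*
 * Compare both tests for every pair with print_ts <= touch_ts <= limit,
 * which is the invariant the message states.
 */
static bool checks_agree(unsigned long limit)
{
    for (unsigned long touch_ts = 0; touch_ts <= limit; touch_ts++)
        for (unsigned long print_ts = 0; print_ts <= touch_ts; print_ts++)
            if (range_check(print_ts, touch_ts) !=
                equality_check(print_ts, touch_ts))
                return false;
    return true;
}
```

With integer timestamps the half-open range `[touch, touch + 1)` contains exactly one value, so the two predicates agree on every pair the loop visits.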
30 Jun, 2008
1 commit
-
Updating the timestamp more often is pointless as we print the warnings
only if we exceed the threshold. And the check for hung tasks relies on
the last timestamp, so it will keep working correctly, too.
Signed-off-by: Johannes Weiner
Cc: Peter Zijlstra
Signed-off-by: Ingo Molnar
25 Jun, 2008
1 commit
-
This patch adds some information about when interrupts were last
enabled and disabled to the output of the softlockup detector.
Signed-off-by: Vegard Nossum
Cc: Peter Zijlstra
Cc: Johannes Weiner
Cc: Arjan van de Ven
Signed-off-by: Ingo Molnar
18 Jun, 2008
1 commit
-
Most places in the kernel that go BUG: print a module list
(which is very useful for doing statistics and finding patterns),
however the softlockup detector does not do this yet.

This patch adds the one line change to fix this gap.
Signed-off-by: Arjan van de Ven
Signed-off-by: Ingo Molnar
02 Jun, 2008
1 commit
-
The touch_nmi_watchdog() routine on x86 ultimately calls
touch_softlockup_watchdog(). The problem is that to touch the
softlockup watchdog, the cpu_clock code has to be called which could
involve multiple cpu locks and can lead to a hard hang if one of the
locks is held by a processor that is not going to return anytime soon
(such as could be the case with kgdb or perhaps even with some other
kind of exception).

This patch causes the public version of the
touch_softlockup_watchdog() to defer the cpu clock access to a later
point.

The test case for this problem is to use the following kernel config
options:

CONFIG_KGDB_TESTS=y
CONFIG_KGDB_TESTS_ON_BOOT=y
CONFIG_KGDB_TESTS_BOOT_STRING="V1F100I100000"

It should be noted that kgdb test suite and these options were not
available until 2.6.26-rc2, so it was necessary to patch the kgdb
test suite during the bisection.

I would consider this patch a regression fix because the problem first
appeared in commit 27ec4407790d075c325e1f4da0a19c56953cce23 when some
logic was added to try to periodically sync the clocks. It was
possible to work around this particular problem by simply not
performing the sync anytime the system was in a critical context.
This was ok until commit 3e51f33fcc7f55e6df25d15b55ed10c8b4da84cd,
which added config option CONFIG_HAVE_UNSTABLE_SCHED_CLOCK and some
multi-cpu locks to sync the clocks. It became clear that accessing
this code from an nmi was the source of the lockups. Avoiding the
access to the low level clock code from code inside the NMI
processing also fixed the problem with the 27ec44... commit.
Signed-off-by: Jason Wessel
Signed-off-by: Ingo Molnar
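One way to "defer the clock access to a later point" is to have the public touch function write only a sentinel and let the periodic tick read the clock. The sketch below illustrates that idea in user space. Whether the patch uses exactly this mechanism is an assumption, and `struct deferred_watchdog`, `touch_deferred()` and `tick()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a timestamp of 0 means "touched, not yet stamped". */
struct deferred_watchdog {
    unsigned long touch_timestamp; /* seconds; 0 is the sentinel */
    unsigned long thresh;          /* seconds */
};

/* Cheap touch: no clock read, so nothing here can block on a lock. */
static void touch_deferred(struct deferred_watchdog *w)
{
    w->touch_timestamp = 0;
}

/*
 * Periodic tick. `now` stands in for the clock read, which happens only
 * here. Returns true when a lockup would be reported.
 */
static bool tick(struct deferred_watchdog *w, unsigned long now)
{
    assert(now != 0); /* 0 is reserved for the sentinel */
    if (w->touch_timestamp == 0) {
        w->touch_timestamp = now; /* resolve the pending touch */
        return false;
    }
    return now - w->touch_timestamp > w->thresh;
}
```

The caller in a restricted context pays only for a store; the cost of reading the clock moves to the next tick, which runs in a context where that is safe.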
25 May, 2008
2 commits
-
Fix unaligned access errors when setting softlockup_thresh on
64 bit platforms.

Allow softlockup detection to be disabled by setting
softlockup_thresh
Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner
-
Allow users to configure the softlockup detector to generate a panic
instead of a warning message.

High-availability systems might opt for this strict method (combined
with panic_timeout= boot option/sysctl), instead of generating
softlockup warnings ad infinitum.

Also, automated tests work better if the system reboots reliably (into
a safe kernel) in case of a lockup.

The full spectrum of configurability is supported: boot option, sysctl
option and Kconfig option.

It's default-disabled.
Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner
01 Mar, 2008
1 commit
-
kthread_stop() can be called when a 'watchdog' thread is executing after
kthread_should_stop() but before set_task_state(TASK_INTERRUPTIBLE).
Signed-off-by: Dmitry Adamushko
Signed-off-by: Ingo Molnar
02 Feb, 2008
1 commit
-
Rafael J. Wysocki reported weird, multi-second delays during
suspend/resume and bisected it back to:

commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
Author: Ingo Molnar
Date: Fri Jan 25 21:08:02 2008 +0100

softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks

fix it:
- restore the old wakeup mechanism
- fix break usage in do_each_thread() { } while_each_thread().
- fix the hotplug switch stmt, a fall-through case was broken.
Bisected-by: Rafael J. Wysocki
Signed-off-by: Peter Zijlstra
Tested-by: Rafael J. Wysocki
Signed-off-by: Ingo Molnar
Acked-by: Rafael J. Wysocki
Signed-off-by: Linus Torvalds
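The "break usage" item is a general pitfall of macro pairs that expand to a for loop wrapping a do-while: `break` leaves only the inner loop. The sketch below reproduces the shape with hypothetical macros `do_each_item()` / `while_each_item()` over a fixed grid. It is not the kernel's thread iteration, and using `goto` as the way out is shown only as one possible remedy.

```c
#include <assert.h>

#define GROUPS 3
#define THREADS_PER_GROUP 4

/*
 * Hypothetical macro pair with the same shape as the kernel's:
 * an outer for loop whose body is an inner do { } while loop.
 */
#define do_each_item(g, t) \
    for ((g) = 0, (t) = 0; (g) < GROUPS; (g)++, (t) = 0) do
#define while_each_item(g, t) \
    while (++(t) < THREADS_PER_GROUP)

/* Tries to stop after `limit` visits using a plain break. */
static int visits_with_break(int limit)
{
    int g, t, visits = 0;

    do_each_item(g, t) {
        visits++;
        if (visits >= limit)
            break; /* leaves only the inner do-while */
    } while_each_item(g, t);
    return visits;
}

/* Same walk, but leaves both loops at once. */
static int visits_with_goto(int limit)
{
    int g, t, visits = 0;

    do_each_item(g, t) {
        visits++;
        if (visits >= limit)
            goto out;
    } while_each_item(g, t);
out:
    return visits;
}
```

With a limit of 2, the `break` version still visits the first item of every remaining group, while the `goto` version stops where intended.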
26 Jan, 2008
2 commits
-
fix softlockup tunables signedness.
mark tunables read-mostly.
Signed-off-by: Ingo Molnar
-
This patch extends the soft-lockup detector to automatically
detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
printed the following way:

------------------>
INFO: task prctl:3042 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
prctl D fd5e3793 0 3042 2997
f6050f38 00000046 00000001 fd5e3793 00000009 c06d8264 c06dae80 00000286
f6050f40 f6050f00 f7d34d90 f7d34fc8 c1e1be80 00000001 f6050000 00000000
f7e92d00 00000286 f6050f18 c0489d1a f6050f40 00006605 00000000 c0133a5b
Call Trace:
[] schedule_timeout+0x6d/0x8b
[] schedule_timeout_uninterruptible+0x15/0x17
[] msleep+0x10/0x16
[] sys_prctl+0x30/0x1e2
[] sysenter_past_esp+0x5f/0xa5
=======================
2 locks held by prctl/3042:
#0: (&sb->s_type->i_mutex_key#5){--..}, at: [] do_fsync+0x38/0x7a
#1: (jbd_handle){--..}, at: [] journal_start+0xc7/0xe9
: CPU hotplug fixes. ]
[ Andrew Morton : build warning fix. ]
Signed-off-by: Ingo Molnar
Signed-off-by: Arjan van de Ven
20 Oct, 2007
1 commit
-
The task_struct->pid member is going to be deprecated, so start
using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
the kernel.

The first thing to start with is the pid, printed to dmesg - in
this case we may safely use task_pid_nr(). Besides, printks produce
more (much more) than a half of all the explicit pid usage.

[akpm@linux-foundation.org: git-drm went and changed lots of stuff]
Signed-off-by: Pavel Emelyanov
Cc: Dave Airlie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Oct, 2007
5 commits
-
Control the trigger limit for softlockup warnings. This is useful for
debugging softlockups, by lowering the softlockup_thresh to identify
possible softlockups earlier.

This patch:
1. Adds a sysctl softlockup_thresh with valid values of 1-60s
(Higher value to disable false positives)
2. Changes the softlockup printk to print the cpu softlockup time

[akpm@linux-foundation.org: Fix various warnings and add definition of "two"]
Signed-off-by: Ravikiran Thirumalai
Signed-off-by: Shai Fultheim
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
kernel/softirq.c grew a few style uncleanlinesses in the past few
months, clean that up. No functional changes:

   text    data     bss     dec     hex filename
   1126      76       4    1206     4b6 softlockup.o.before
   1129      76       4    1209     4b9 softlockup.o.after

( the 3 bytes .text increase is due to the "" appended to one of
the printk messages. )
Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Improve the debuggability of kernel lockups by enhancing the debug
output of the softlockup detector: print the task that causes the lockup
and try to print a more intelligent backtrace.

The old format was:
BUG: soft lockup detected on CPU#1!
[] show_trace_log_lvl+0x19/0x2e
[] show_trace+0x12/0x14
[] dump_stack+0x14/0x16
[] softlockup_tick+0xbe/0xd0
[] run_local_timers+0x12/0x14
[] update_process_times+0x3e/0x63
[] tick_sched_timer+0x7c/0xc0
[] hrtimer_interrupt+0x135/0x1ba
[] smp_apic_timer_interrupt+0x6e/0x80
[] apic_timer_interrupt+0x33/0x38
[] syscall_call+0x7/0xb
=======================

The new format is:
BUG: soft lockup detected on CPU#1! [prctl:2363]
Pid: 2363, comm: prctl
EIP: 0060:[] CPU: 1
EIP is at sys_prctl+0x24/0x18c
EFLAGS: 00000213 Not tainted (2.6.22-cfs-v20 #26)
EAX: 00000001 EBX: 000003e7 ECX: 00000001 EDX: f6df0000
ESI: 000003e7 EDI: 000003e7 EBP: f6df0fb0 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: 4d8c3340 CR3: 3731d000 CR4: 000006d0
[] show_trace_log_lvl+0x19/0x2e
[] show_trace+0x12/0x14
[] show_regs+0x1ab/0x1b3
[] softlockup_tick+0xef/0x108
[] run_local_timers+0x12/0x14
[] update_process_times+0x3e/0x63
[] tick_sched_timer+0x7c/0xc0
[] hrtimer_interrupt+0x135/0x1ba
[] smp_apic_timer_interrupt+0x6e/0x80
[] apic_timer_interrupt+0x33/0x38
[] syscall_call+0x7/0xb
=======================

Note that in the old format we only knew that some system call locked
up, we didn't know _which_. With the new format we know that it's at a
specific place in sys_prctl(). [which was where I created an artificial
kernel lockup to test the new format.]

This is also useful if the lockup happens in user-space - the user-space
EIP (and other registers) will be printed too. (such a lockup would
either suggest that the task was running at SCHED_FIFO:99 and looping
for more than 10 seconds, or that the softlockup detector has a
false-positive.)

The task name is printed first too, just in case we don't manage to print
a useful backtrace.

[satyam@infradead.org: fix warning]
Signed-off-by: Ingo Molnar
Signed-off-by: Satyam Sharma
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This Xen-related commit:

commit 966812dc98e6a7fcdf759cbfa0efab77500a8868
Author: Jeremy Fitzhardinge
Date: Tue May 8 00:28:02 2007 -0700

Ignore stolen time in the softlockup watchdog

broke the softlockup watchdog to never report any lockups. (!)

print_timestamp defaults to 0, this makes the following condition
always true:

if (print_timestamp < (touch_timestamp + 1) ||

and we'll in essence never report soft lockups.

Apparently the functionality of the soft lockup watchdog was never
actually tested with that patch applied ...
Signed-off-by: Ingo Molnar
Cc: Jeremy Fitzhardinge
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
sched_clock() is not a reliable time-source, use cpu_clock() instead.
Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Jul, 2007
1 commit
-
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.

It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.

The patch causes all kernel threads to be nonfreezable by default (i.e. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki
Acked-by: Nigel Cunningham
Cc: Pavel Machek
Cc: Oleg Nesterov
Cc: Gautham R Shenoy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
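The opt-in scheme described in this commit can be pictured with a small flag model. Everything below is hypothetical and user-space only: `MODEL_PF_NOFREEZE` has an arbitrary bit value, and `model_kthread_create()`, `model_set_freezable()` and `model_freezable()` merely mirror the roles named in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PF_NOFREEZE 0x1u /* arbitrary bit value for the model */

struct model_task {
    unsigned int flags;
};

/* In the scheme described above, new kernel threads start nonfreezable. */
static struct model_task model_kthread_create(void)
{
    struct model_task t = { .flags = MODEL_PF_NOFREEZE };
    return t;
}

/* A thread that wants to be frozen opts in by clearing the flag. */
static void model_set_freezable(struct model_task *t)
{
    assert(t != NULL);
    t->flags &= ~MODEL_PF_NOFREEZE;
}

static bool model_freezable(const struct model_task *t)
{
    assert(t != NULL);
    return !(t->flags & MODEL_PF_NOFREEZE);
}
```

The default flips from "freezable unless you say otherwise" to "nonfreezable unless you opt in", so threads that do not care about freezing need no freezer-related code at all.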
10 May, 2007
1 commit
-
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress. This
patch introduces such notifications and causes them to be used during
suspend and resume transitions. It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).

[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki
Cc: Gautham R Shenoy
Cc: Pavel Machek
Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 May, 2007
3 commits
-
Add touch_all_softlockup_watchdogs() to allow the softlockup watchdog
timers on all cpus to be updated. This is used to prevent sysrq-t from
generating a spurious watchdog message when generating lots of output.

Softlockup watchdogs use sched_clock() as their timebase, which is inherently
per-cpu (at least, when it is measuring unstolen time). Because of this,
it isn't possible for one CPU to directly update the other CPU's timers,
but it is possible to tell the other CPUs to update themselves
appropriately.
Signed-off-by: Jeremy Fitzhardinge
Acked-by: Chris Lalancette
Signed-off-by: Prarit Bhargava
Cc: Rick Lindsley
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The softlockup watchdog is currently a nuisance in a virtual machine, since
the whole system could have the CPU stolen from it for a long period of
time. While it would be unlikely for a guest domain to be denied timer
interrupts for over 10s, it could happen and any softlockup message would
be completely spurious.

Earlier I proposed that sched_clock() return time in unstolen nanoseconds,
which is how Xen and VMI currently implement it. If the softlockup
watchdog uses sched_clock() to measure time, it would automatically ignore
stolen time, and therefore only report when the guest itself locked up.
When running native, sched_clock() returns real-time nanoseconds, so the
behaviour would be unchanged.

Note that sched_clock() used this way is inherently per-cpu, so this patch
makes sure that the per-processor watchdog thread initializes its own
timestamp.
Signed-off-by: Jeremy Fitzhardinge
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: john stultz
Cc: Zachary Amsden
Cc: James Morris
Cc: Dan Hecht
Cc: Paul Mackerras
Cc: Martin Schwidefsky
Cc: Prarit Bhargava
Cc: Chris Lalancette
Cc: Rick Lindsley
Cc: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Don't use hardcoded 99 value, use MAX_RT_PRIO.
Signed-off-by: Oleg Nesterov
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
30 Sep, 2006
1 commit
-
Spawning ksoftirqd, migration, or watchdog, and calling init_timers_cpu()
may fail with small memory. If it happens in initcalls, kernel NULL
pointer dereference happens later. This patch makes crash happen
immediately in such cases. It seems a bit better than getting kernel NULL
pointer dereference later.
Cc: Ingo Molnar
Signed-off-by: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Aug, 2006
1 commit
-
A few of the callback functions and notifier blocks that are associated with cpu
notifications incorrectly have __devinit and __devinitdata. They should be
__cpuinit and __cpuinitdata instead.

It makes no functional difference but wastes text area when CONFIG_HOTPLUG is
enabled and CONFIG_HOTPLUG_CPU is not.

This patch fixes all those instances.
Signed-off-by: Chandra Seetharaman
Cc: Ashok Raj
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Jun, 2006
2 commits
-
This patch reverts notifier_block changes made in 2.6.17
Signed-off-by: Chandra Seetharaman
Cc: Ashok Raj
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
In 2.6.17, there was a problem with cpu_notifiers and XFS. I provided a
band-aid solution to solve that problem. In the process, I undid all the
changes you both were making to ensure that these notifiers were available
only at init time (unless CONFIG_HOTPLUG_CPU is defined).

We deferred the real fix to 2.6.18. Here is a set of patches that fixes the
XFS problem cleanly and makes the cpu notifiers available only at init time
(unless CONFIG_HOTPLUG_CPU is defined).

If CONFIG_HOTPLUG_CPU is defined then cpu notifiers are available at run
time.

This patch reverts the notifier_call changes made in 2.6.17
Signed-off-by: Chandra Seetharaman
Cc: Ashok Raj
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Jun, 2006
2 commits
-
If a cpu hotplug callback fails on CPU_UP_PREPARE, all callbacks will be
called with CPU_UP_CANCELED. A few of these callbacks assume that on
CPU_UP_PREPARE a pointer to task has been stored in a percpu array. This
assumption is not true if CPU_UP_PREPARE fails and the following calls to
kthread_bind() in CPU_UP_CANCELED will cause an addressing exception
because of passing a NULL pointer.
Signed-off-by: Heiko Carstens
Cc: Ashok Raj
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
There are several instances of per_cpu(foo, raw_smp_processor_id()), which
is semantically equivalent to __get_cpu_var(foo) but without the warning
that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled. For
those architectures with optimized per-cpu implementations, namely ia64,
powerpc, s390, sparc64 and x86_64, per_cpu() turns into more and slower
code than __get_cpu_var(), so it would be preferable to use __get_cpu_var
on those platforms.

This defines a __raw_get_cpu_var(x) macro which turns into per_cpu(x,
raw_smp_processor_id()) on architectures that use the generic per-cpu
implementation, and turns into __get_cpu_var(x) on the architectures that
have an optimized per-cpu implementation.
Signed-off-by: Paul Mackerras
Acked-by: David S. Miller
Acked-by: Ingo Molnar
Acked-by: Martin Schwidefsky
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds