12 Oct, 2016

1 commit

  • Previously hung_task_panic would not be respected if enabled after
    hung_task_warnings had already been decremented to 0.

    Permit the kernel to panic if hung_task_panic is enabled after
    hung_task_warnings has already been decremented to 0 and another task
    hangs for hung_task_timeout_secs seconds.

    Check if hung_task_panic is enabled so we don't return prematurely, and
    check if hung_task_warnings is non-zero so we don't print the warning
    unnecessarily.

    [akpm@linux-foundation.org: fix off-by-one]
    Link: http://lkml.kernel.org/r/1473450214-4049-1-git-send-email-jsiddle@redhat.com
    Signed-off-by: John Siddle
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Siddle
     

24 Aug, 2016

1 commit

  • When we get a hung task, it can often be valuable to see _all_ the held
    locks on the system (in case we are blocked trying to acquire one).
    For example, with this patch we can immediately see where the problem
    is below:

    INFO: task trinity-c3:14933 blocked for more than 120 seconds.
    Not tainted 4.8.0-rc1+ #135
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    trinity-c3 D ffff88010c16fc88 0 14933 1 0x00080004
    ffff88010c16fc88 000000003b9aca00 0000000000000000 0000000000000296
    00000000776cdf88 ffff88011a520ae0 ffff88011a520b08 ffff88011a520198
    ffffffff867d7f00 ffff88011942c080 ffff880116841580 ffff88010c168000
    Call Trace:
    [] schedule+0x77/0x230
    [] __lock_sock+0x129/0x250
    [] ? __sk_destruct+0x450/0x450
    [] ? wake_bit_function+0x2e0/0x2e0
    [] lock_sock_nested+0xeb/0x120
    [] irda_setsockopt+0x65/0xb40
    [] SyS_setsockopt+0x139/0x230
    [] ? SyS_recv+0x20/0x20
    [] ? trace_event_raw_event_sys_enter+0xb90/0xb90
    [] ? __this_cpu_preempt_check+0x13/0x20
    [] ? __context_tracking_exit.part.3+0x30/0x1b0
    [] ? SyS_recv+0x20/0x20
    [] do_syscall_64+0x1b3/0x4b0
    [] entry_SYSCALL64_slow_path+0x25/0x25

    Showing all locks held in the system:
    2 locks held by khungtaskd/563:
    #0: (rcu_read_lock){......}, at: [] watchdog+0x106/0x910
    #1: (tasklist_lock){......}, at: [] debug_show_all_locks+0x74/0x360
    1 lock held by trinity-c0/19280:
    #0: (sk_lock-AF_IRDA){......}, at: [] irda_accept+0x176/0x10f0
    1 lock held by trinity-c0/12865:
    #0: (sk_lock-AF_IRDA){......}, at: [] irda_accept+0x176/0x10f0

    Signed-off-by: Vegard Nossum
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Mandeep Singh Baines
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1471538460-7505-1-git-send-email-vegard.nossum@oracle.com
    Signed-off-by: Ingo Molnar

    Vegard Nossum
     

23 Mar, 2016

1 commit

  • When a new timeout is written to /proc/sys/kernel/hung_task_timeout_secs,
    khungtaskd is interrupted and sleeps again for the full timeout duration.

    This means that hung tasks will never be checked if a new timeout is
    written periodically within the old timeout duration, and/or the check
    will be delayed by up to the previous timeout duration. Fix this by
    remembering the last time khungtaskd checked for hung tasks.

    This change will allow other watchdog tasks (if any) to share khungtaskd
    by sleeping for the smallest remaining timeout among all watchdog tasks.
    Running more watchdog tasks from khungtaskd will reduce the likelihood
    of printk() collisions between multiple watchdog threads.

    Signed-off-by: Tetsuo Handa
    Cc: Oleg Nesterov
    Cc: Aaron Tomlin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     

16 Apr, 2015

1 commit

  • In check_hung_uninterruptible_tasks() avoid the use of deprecated
    while_each_thread().

    The "max_count" logic will prevent a livelock - see commit 0c740d0a
    ("introduce for_each_thread() to replace the buggy while_each_thread()").
    Having said that, let's use for_each_process_thread().

    Signed-off-by: Aaron Tomlin
    Acked-by: Oleg Nesterov
    Cc: David Rientjes
    Cc: Dave Wysochanski
    Cc: Aaron Tomlin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aaron Tomlin
     

05 Jun, 2014

1 commit


04 Apr, 2014

1 commit

  • Code that is obj-y (always built-in) or dependent on a bool Kconfig
    (built-in or absent) can never be modular. So using module_init as an
    alias for __initcall can be somewhat misleading.

    Fix these up now, so that we can relocate module_init from init.h into
    module.h in the future. If we don't do this, we'd have to add module.h
    to obviously non-modular code, and that would be a worse thing.

    The audit targets the following module_init users for change:
    kernel/user.c obj-y
    kernel/kexec.c bool KEXEC (one instance per arch)
    kernel/profile.c bool PROFILING
    kernel/hung_task.c bool DETECT_HUNG_TASK
    kernel/sched/stats.c bool SCHEDSTATS
    kernel/user_namespace.c bool USER_NS

    Note that direct use of __initcall is discouraged in favor of one of the
    priority-categorized subgroups. As __initcall gets mapped onto
    device_initcall, our use of subsys_initcall (which makes sense for these
    files) will thus change this registration from level 6-device to level
    4-subsys (i.e. slightly earlier). However, no impact from that
    difference has been observed during testing.

    Also, two instances of missing ";" at EOL are fixed in kexec.

    Signed-off-by: Paul Gortmaker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Eric Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Gortmaker
     

25 Jan, 2014

1 commit

  • When khungtaskd detects hung tasks, it prints out
    backtraces from a number of those tasks.

    Limiting the number of backtraces being printed
    out can result in the user not seeing the information
    necessary to debug the issue. The hung_task_warnings
    sysctl controls this feature.

    This patch makes it possible for hung_task_warnings
    to accept a special value to print an unlimited
    number of backtraces when khungtaskd detects hung
    tasks.

    The special value is -1. To allow this value, the
    type must change from ulong to int.

    Signed-off-by: Aaron Tomlin
    Reviewed-by: Rik van Riel
    Acked-by: David Rientjes
    Cc: oleg@redhat.com
    Link: http://lkml.kernel.org/r/1390239253-24030-3-git-send-email-atomlin@redhat.com
    [ Build warning fix. ]
    Signed-off-by: Ingo Molnar

    Aaron Tomlin
     

15 Nov, 2013

1 commit

  • Pull KVM changes from Paolo Bonzini:
    "Here are the 3.13 KVM changes. There was a lot of work on the PPC
    side: the fact that the HV and emulation flavors can now coexist in a
    single kernel is probably the most interesting change from a user point of view.

    On the x86 side there are nested virtualization improvements and a few
    bugfixes.

    ARM got transparent huge page support, improved overcommit, and
    support for big endian guests.

    Finally, there is a new interface to connect KVM with VFIO. This
    helps with devices that use NoSnoop PCI transactions, letting the
    driver in the guest execute WBINVD instructions. This includes some
    nVidia cards on Windows, which fail to start without these patches and
    the corresponding userspace changes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
    kvm, vmx: Fix lazy FPU on nested guest
    arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu
    arm/arm64: KVM: MMIO support for BE guest
    kvm, cpuid: Fix sparse warning
    kvm: Delete prototype for non-existent function kvm_check_iopl
    kvm: Delete prototype for non-existent function complete_pio
    hung_task: add method to reset detector
    pvclock: detect watchdog reset at pvclock read
    kvm: optimize out smp_mb after srcu_read_unlock
    srcu: API for barrier after srcu read unlock
    KVM: remove vm mmap method
    KVM: IOMMU: hva align mapping page size
    KVM: x86: trace cpuid emulation when called from emulator
    KVM: emulator: cleanup decode_register_operand() a bit
    KVM: emulator: check rex prefix inside decode_register()
    KVM: x86: fix emulation of "movzbl %bpl, %eax"
    kvm_host: typo fix
    KVM: x86: emulate SAHF instruction
    MAINTAINERS: add tree for kvm.git
    Documentation/kvm: add a 00-INDEX file
    ...

    Linus Torvalds
     

06 Nov, 2013

1 commit

  • On certain occasions the hung task detector can report a false
    positive: for example, when a paused VM is resumed.

    Add a method to reset detection, similar to what is done
    for other kernel watchdogs.

    Acked-by: Don Zickus
    Acked-by: Paolo Bonzini
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Gleb Natapov

    Marcelo Tosatti
     

31 Oct, 2013

1 commit

  • Currently check_hung_task() prints a warning if it detects the
    problem, but watching the system logs is inconvenient if
    user-space wants to be notified about the hang.

    Add the new trace_sched_process_hang() into check_hung_task(),
    so that a user-space monitor can easily wait for the hang and
    potentially resolve the problem.

    Signed-off-by: Oleg Nesterov
    Cc: Dave Sullivan
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20131019161828.GA7439@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

23 Sep, 2013

1 commit

  • Because 'sysctl_hung_task_check_count' is 'unsigned long', it is
    truncated to 'int' when it is assigned to max_count in
    check_hung_uninterruptible_tasks().

    This causes a minor artifact: if we write 2^32 to sysctl.hung_task_check_count,
    hung task detection will be effectively disabled.

    With this fix, it will still truncate the user input to 32 bits, but
    reading sysctl.hung_task_check_count reflects the actual truncated value.

    Signed-off-by: Li Zefan
    Acked-by: Ingo Molnar
    Link: http://lkml.kernel.org/r/523FFF4E.9050401@huawei.com
    Signed-off-by: Ingo Molnar

    Li Zefan
     

02 Aug, 2013

1 commit

  • printk(KERN_ERR) from check_hung_task() likely means we have a bug,
    but unlike BUG_ON()/WARN_ON() it doesn't show the kernel version,
    which complicates investigating bug reports.

    Add the additional pr_err() to print tainted/release/version
    like dump_stack_print_info() does, the output becomes:

    INFO: task perl:504 blocked for more than 2 seconds.
    Not tainted 3.11.0-rc1-10367-g136bb46-dirty #1763
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    ...

    While at it, turn the old printk's into pr_err().

    Signed-off-by: Oleg Nesterov
    Cc: ahecox@redhat.com
    Cc: Christopher Williams
    Cc: dwysocha@redhat.com
    Cc: gavin@redhat.com
    Cc: Mandeep Singh Baines
    Cc: nshi@redhat.com
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20130801165941.GA17544@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

25 Apr, 2012

1 commit

  • Send an NMI to all CPUs when a hung task is detected and the hung
    task code is configured to panic. This gives us a fairly up-to-date
    snapshot of all CPUs in the system.

    This lets us get a stack trace of all CPUs, which makes life easier
    when trying to debug a deadlock, and the NMI doesn't change anything
    since the next step is a kernel panic.

    Signed-off-by: Sasha Levin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1331848040-1676-1-git-send-email-levinsasha928@gmail.com
    [ extended the changelog a bit ]
    Signed-off-by: Ingo Molnar

    Sasha Levin
     

06 Mar, 2012

1 commit

  • check_hung_uninterruptible_tasks()->rcu_lock_break() introduced by
    "softlockup: check all tasks in hung_task" commit ce9dbe24 looks
    absolutely wrong.

    - rcu_lock_break() does put_task_struct(). If the task has exited,
    it is not safe to even read its ->state; nothing protects this
    task_struct.

    - The TASK_DEAD checks are wrong too. Contrary to the comment, we
    can't use it to check if the task was unhashed. It can be unhashed
    without TASK_DEAD, or it can be valid with TASK_DEAD.

    For example, an autoreaping task can do release_task(current)
    long before it sets TASK_DEAD in do_exit().

    Or, a zombie task can have ->state == TASK_DEAD but release_task()
    was not called, and in this case we must not break the loop.

    Change this code to check pid_alive() instead, and do this before we drop
    the reference to the task_struct.

    Note: while_each_thread() under rcu_read_lock() is not really safe, it can
    livelock. This will be fixed later, but fortunately in this case the
    "max_count" logic saves us anyway.

    Signed-off-by: Oleg Nesterov
    Acked-by: Frederic Weisbecker
    Acked-by: Mandeep Singh Baines
    Acked-by: Paul E. McKenney
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

04 Jan, 2012

1 commit

  • A vfork parent waits uninterruptibly and unkillably for its child to
    exec/exit. This wait is of unbounded length. Ignore such waits
    in the hung_task detector.

    Signed-off-by: Mandeep Singh Baines
    Reported-by: Sasha Levin
    LKML-Reference:
    Cc: Linus Torvalds
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: John Kacur
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Mandeep Singh Baines
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include
    +#include

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

28 Apr, 2011

1 commit

  • This patch allows the default value for sysctl_hung_task_timeout_secs
    to be set at build time. The feature carries virtually no overhead,
    so it makes sense to keep it enabled. On heavily loaded systems, though,
    it can end up triggering stack traces when there is no bug other than
    the system being underprovisioned. We use this patch to keep the hung task
    facility available but disabled at boot-time.

    The default of 120 seconds is preserved. As a note, commit e162b39a may
    have accidentally reverted commit fb822db4, which raised the default from
    120 seconds to 480 seconds.

    Signed-off-by: Jeff Mahoney
    Acked-by: Mandeep Singh Baines
    Link: http://lkml.kernel.org/r/4DB8600C.8080000@suse.com
    Signed-off-by: Ingo Molnar

    Jeff Mahoney
     

17 Aug, 2010

2 commits


27 Nov, 2009

1 commit

  • I'm seeing spikes of up to 0.5ms in khungtaskd on a large
    machine. To reduce this source of jitter I tried setting
    hung_task_check_count to 0:

    # echo 0 > /proc/sys/kernel/hung_task_check_count

    which didn't have the intended effect. Change to a post-decrement
    of max_count, so a value of 0 means check 0 tasks.

    Signed-off-by: Anton Blanchard
    Acked-by: Frederic Weisbecker
    Cc: msb@google.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Anton Blanchard
     

24 Sep, 2009

1 commit

  • It's unused.

    It isn't needed -- read or write flag is already passed and sysctl
    shouldn't care about the rest.

    It _was_ used in two places in arch/frv for some reason.

    Signed-off-by: Alexey Dobriyan
    Cc: David Howells
    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc: Ralf Baechle
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Cc: "David S. Miller"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

11 Feb, 2009

1 commit

  • When we check if a task has been switched out since the last scan, we might
    have a race condition in the following scenario:

    - the task is freshly created and scheduled

    - it sets its state to TASK_UNINTERRUPTIBLE and is not yet switched out

    - check_hung_task() scans this task and will report a false positive because
    t->nvcsw + t->nivcsw == t->last_switch_count == 0

    Add a check for such cases.

    Signed-off-by: Frederic Weisbecker
    Acked-by: Mandeep Singh Baines
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

09 Feb, 2009

1 commit


06 Feb, 2009

2 commits

  • Since the tasklist is protected by rcu list operations, it is safe
    to convert the read_lock()s to rcu_read_lock().

    Suggested-by: Peter Zijlstra
    Signed-off-by: Mandeep Singh Baines
    Signed-off-by: Ingo Molnar

    Mandeep Singh Baines
     
  • Impact: extend the scope of hung-task checks

    Changed the default value of hung_task_check_count to PID_MAX_LIMIT.
    hung_task_batch_count added to put an upper bound on the critical
    section. Every hung_task_batch_count checks, the rcu lock is dropped,
    so it is never held for too long.

    Keeping the critical section small minimizes the time preemption is
    disabled and keeps rcu grace periods short.

    To prevent following a stale pointer, get_task_struct is called on g and t.
    To verify that g and t have not been unhashed while outside the critical
    section, the task states are checked.

    The design was proposed by Frédéric Weisbecker.

    Signed-off-by: Mandeep Singh Baines
    Suggested-by: Frédéric Weisbecker
    Acked-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Mandeep Singh Baines
     

19 Jan, 2009

1 commit


16 Jan, 2009

1 commit

  • Decoupling allows:

    * hung tasks check to happen at very low priority

    * hung tasks check and softlockup to be enabled/disabled independently
    at compile and/or run-time

    * individual panic settings to be enabled/disabled independently
    at compile and/or run-time

    * softlockup threshold to be reduced without increasing hung tasks
    poll frequency (hung task check is expensive relative to softlock watchdog)

    * hung task check to be zero overhead when disabled at run-time

    Signed-off-by: Mandeep Singh Baines
    Signed-off-by: Ingo Molnar

    Mandeep Singh Baines