03 Nov, 2020

1 commit

  • Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    removed various __user annotations from function signatures as part of
    its refactoring.

    It also removed the __user annotation for proc_dohung_task_timeout_secs()
    at its declaration in sched/sysctl.h, but not at its definition in
    kernel/hung_task.c.

    Hence, sparse complains:

    kernel/hung_task.c:271:5: error: symbol 'proc_dohung_task_timeout_secs' redeclared with different type (incompatible argument 3 (different address spaces))

    Adjust the annotation at the definition to match that refactoring. This
    makes sparse happy again and also resolves this warning from sparse:

    kernel/hung_task.c:277:52: warning: incorrect type in argument 3 (different address spaces)
    kernel/hung_task.c:277:52: expected void *
    kernel/hung_task.c:277:52: got void [noderef] __user *buffer

    No functional change. No change in object code.

    Signed-off-by: Lukas Bulwahn
    Signed-off-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Tetsuo Handa
    Cc: Al Viro
    Cc: Andrey Ignatov
    Link: https://lkml.kernel.org/r/20201028130541.20320-1-lukas.bulwahn@gmail.com
    Signed-off-by: Linus Torvalds

    Lukas Bulwahn
     

09 Jun, 2020

2 commits

  • Commit 401c636a0eeb ("kernel/hung_task.c: show all hung tasks before
    panic") introduced a change in that we started to show all CPUs
    backtraces when a hung task is detected _and_ the sysctl/kernel
    parameter "hung_task_panic" is set. The idea is good, because usually
    when observing deadlocks (that may lead to hung tasks), the culprit is
    another task holding a lock and not necessarily the task detected as
    hung.

    The problem with this approach is that dumping backtraces is a somewhat
    expensive operation, especially when printing to the console (and
    especially on machines with many CPUs, as servers commonly are nowadays).
    So, users that plan to collect a kdump to investigate the hung tasks and
    narrow down the deadlock definitely don't need the CPU backtraces on
    dmesg/console, which delay the panic and pollute the log (the crash tool
    can easily grab all CPU traces with the 'bt -a' command).

    Also, there's the reciprocal scenario: some users may be interested in
    seeing the CPU backtraces but not in having the system panic when a hung
    task is detected. The current approach hence almost amounts to embedding
    a policy in the kernel, by forcing the CPU backtrace dump (only) on
    hung_task_panic.

    This patch decouples the panic event on hung task from the CPU
    backtrace dump, by creating (and documenting) a new sysctl called
    "hung_task_all_cpu_backtrace", analogous to the approach taken for
    soft/hard lockups, which have both a panic and an "all_cpu_backtrace"
    sysctl to allow individual control. The new mechanism for dumping the CPU
    backtraces on hung task detection respects "hung_task_warnings" by not
    dumping the traces when there are no warnings left.

    Signed-off-by: Guilherme G. Piccoli
    Signed-off-by: Andrew Morton
    Reviewed-by: Kees Cook
    Cc: Tetsuo Handa
    Link: http://lkml.kernel.org/r/20200327223646.20779-1-gpiccoli@canonical.com
    Signed-off-by: Linus Torvalds

    Guilherme G. Piccoli
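
The decoupling described in this entry can be pictured with a small user-space model. This is only a sketch under stated assumptions: the function name and its parameters are hypothetical stand-ins for the three sysctls, and the kernel's actual control flow is not reproduced here.

```c
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code.
 *   panic_enabled     models hung_task_panic
 *   all_cpu_backtrace models hung_task_all_cpu_backtrace
 *   warnings          models hung_task_warnings (-1 is assumed to mean unlimited)
 *
 * Returns true when the all-CPU backtrace dump should be requested.
 */
bool want_all_cpu_backtrace(bool panic_enabled, bool all_cpu_backtrace,
                            int warnings)
{
    (void)panic_enabled;  /* deliberately ignored: the dump no longer depends on panic */
    return all_cpu_backtrace && warnings != 0;
}
```

With this shape, enabling the panic sysctl alone never produces the dump, and enabling the backtrace sysctl produces it only while warnings remain.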
     
  • We can now handle sysctl parameters on kernel command line and have
    infrastructure to convert legacy command line options that duplicate
    sysctl to become a sysctl alias.

    This patch converts the hung_task_panic parameter. Note that the sysctl
    handler is more strict and allows only 0 and 1, while the legacy
    parameter allowed any non-zero value. But there is little reason anyone
    would not be using 1.

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Reviewed-by: Kees Cook
    Acked-by: Michal Hocko
    Cc: Alexey Dobriyan
    Cc: Christian Brauner
    Cc: David Rientjes
    Cc: "Eric W . Biederman"
    Cc: Greg Kroah-Hartman
    Cc: "Guilherme G . Piccoli"
    Cc: Iurii Zaikin
    Cc: Ivan Teterevkov
    Cc: Luis Chamberlain
    Cc: Masami Hiramatsu
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200427180433.7029-4-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

08 Mar, 2019

2 commits

  • Since commit a2e514453861 ("kernel/hung_task.c: allow to set checking
    interval separately from timeout") added hung_task_check_interval_secs,
    setting a value different from hung_task_timeout_secs

    echo 0 > /proc/sys/kernel/hung_task_panic
    echo 120 > /proc/sys/kernel/hung_task_timeout_secs
    echo 5 > /proc/sys/kernel/hung_task_check_interval_secs

    causes confusing output as if the task was blocked for
    hung_task_timeout_secs seconds from the previous report.

    [ 399.395930] INFO: task kswapd0:75 blocked for more than 120 seconds.
    [ 405.027637] INFO: task kswapd0:75 blocked for more than 120 seconds.
    [ 410.659725] INFO: task kswapd0:75 blocked for more than 120 seconds.
    [ 416.292860] INFO: task kswapd0:75 blocked for more than 120 seconds.
    [ 421.932305] INFO: task kswapd0:75 blocked for more than 120 seconds.

    Although we could update t->last_switch_time after sched_show_task(t) if
    we want to report only every 120 seconds, reporting every 5 seconds
    might not be very bad for monitoring after a problematic situation has
    started. Thus, let's use continuously blocked time instead of updating
    previously reported time.

    [ 677.985011] INFO: task kswapd0:80 blocked for more than 122 seconds.
    [ 693.856126] INFO: task kswapd0:80 blocked for more than 138 seconds.
    [ 709.728075] INFO: task kswapd0:80 blocked for more than 154 seconds.
    [ 725.600018] INFO: task kswapd0:80 blocked for more than 170 seconds.
    [ 741.473133] INFO: task kswapd0:80 blocked for more than 186 seconds.

    Link: http://lkml.kernel.org/r/1551175083-10669-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
    Signed-off-by: Tetsuo Handa
    Acked-by: Dmitry Vyukov
    Cc: "Paul E. McKenney"
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
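
The reported figure in the second log excerpt grows because it is derived from how long the task has been blocked, not from the configured timeout. A minimal sketch of that arithmetic, assuming a tick counter and a tick rate of 1000 per second (MODEL_HZ and blocked_secs() are hypothetical names used only for illustration):

```c
#include <stdint.h>

#define MODEL_HZ 1000u  /* assumed tick rate for this sketch only */

/* Whole seconds the task has been continuously blocked as of now_ticks. */
unsigned long blocked_secs(uint64_t now_ticks, uint64_t last_switch_ticks)
{
    return (unsigned long)((now_ticks - last_switch_ticks) / MODEL_HZ);
}
```

Because the last switch time is not advanced after a report, each later scan prints a larger value for the same task.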
     
  • sparse complains:

    CHECK kernel/hung_task.c
    kernel/hung_task.c:28:19: warning: symbol 'sysctl_hung_task_check_count' was not declared. Should it be static?
    kernel/hung_task.c:42:29: warning: symbol 'sysctl_hung_task_timeout_secs' was not declared. Should it be static?
    kernel/hung_task.c:47:29: warning: symbol 'sysctl_hung_task_check_interval_secs' was not declared. Should it be static?
    kernel/hung_task.c:49:19: warning: symbol 'sysctl_hung_task_warnings' was not declared. Should it be static?
    kernel/hung_task.c:61:28: warning: symbol 'sysctl_hung_task_panic' was not declared. Should it be static?
    kernel/hung_task.c:219:5: warning: symbol 'proc_dohung_task_timeout_secs' was not declared. Should it be static?

    Add the appropriate header file to provide declarations.

    Link: http://lkml.kernel.org/r/467.1548649525@turing-police.cc.vt.edu
    Signed-off-by: Valdis Kletnieks
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Valdis Kletnieks
     

05 Jan, 2019

2 commits

  • check_hung_uninterruptible_tasks() currently calls rcu_lock_break()
    once every 1024 threads. But check_hung_task() is very slow if printk()
    was called, and is very fast otherwise.

    If printk() is called for many threads within one batch of 1024 threads,
    the RCU grace period might be extended enough to trigger RCU stall
    warnings. Therefore, calling rcu_lock_break() once every fixed number of
    jiffies is safer.

    Link: http://lkml.kernel.org/r/1544800658-11423-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
    Signed-off-by: Tetsuo Handa
    Acked-by: Paul E. McKenney
    Cc: Petr Mladek
    Cc: Sergey Senozhatsky
    Cc: Dmitry Vyukov
    Cc: "Rafael J. Wysocki"
    Cc: Vitaly Kuznetsov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
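
The difference between a count-based and a time-based break can be sketched with two pure predicates. Both helpers and their parameters are hypothetical; the kernel's actual comparison and budget value are not taken from this entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count-based policy: break once every batch_size visited threads. */
bool break_due_by_count(uint64_t visited, uint64_t batch_size)
{
    return batch_size != 0 && visited != 0 && visited % batch_size == 0;
}

/*
 * Time-based policy: break once budget_ticks have elapsed since the
 * previous break, no matter how many threads were visited in between.
 */
bool break_due_by_time(uint64_t now_ticks, uint64_t last_break_ticks,
                       uint64_t budget_ticks)
{
    return now_ticks - last_break_ticks >= budget_ticks;
}
```

If every visit is slow, the count-based test still waits for a full batch, while the time-based test fires as soon as the budget is used up.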
     
  • Based on commit 401c636a0eeb ("kernel/hung_task.c: show all hung tasks
    before panic"), we could get the call stack of hung task.

    However, if the console loglevel is not high, we still cannot see the
    useful panic information in practice, and in most cases users don't set
    the console loglevel to a high level.

    This patch forces the console to verbose mode before the system panics,
    so that the useful information can be seen on the console, instead of
    output like the following, which doesn't have hung task information.

    INFO: task init:1 blocked for more than 120 seconds.
    Tainted: G U W 4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Kernel panic - not syncing: hung_task: blocked tasks
    CPU: 2 PID: 479 Comm: khungtaskd Tainted: G U W 4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1
    Call Trace:
    dump_stack+0x4f/0x65
    panic+0xde/0x231
    watchdog+0x290/0x410
    kthread+0x12c/0x150
    ret_from_fork+0x35/0x40
    reboot: panic mode set: p,w
    Kernel Offset: 0x34000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

    Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A6015B675@SHSMSX101.ccr.corp.intel.com
    Signed-off-by: Chuansheng Liu
    Reviewed-by: Petr Mladek
    Reviewed-by: Sergey Senozhatsky
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Liu, Chuansheng
     

26 Oct, 2018

1 commit

  • It is possible to observe hung_task complaints when the system enters
    the suspend-to-idle state:

    # echo freeze > /sys/power/state

    PM: Syncing filesystems ... done.
    Freezing user space processes ... (elapsed 0.001 seconds) done.
    OOM killer disabled.
    Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
    sd 0:0:0:0: [sda] Synchronizing SCSI cache
    INFO: task bash:1569 blocked for more than 120 seconds.
    Not tainted 4.19.0-rc3_+ #687
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    bash D 0 1569 604 0x00000000
    Call Trace:
    ? __schedule+0x1fe/0x7e0
    schedule+0x28/0x80
    suspend_devices_and_enter+0x4ac/0x750
    pm_suspend+0x2c0/0x310

    Register a PM notifier to disable the detector on suspend and re-enable
    it on wakeup.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: Rafael J. Wysocki

    Vitaly Kuznetsov
     

23 Aug, 2018

1 commit

  • Currently the task hung checking interval is equal to the timeout; as a
    result, a hang is detected anywhere between timeout and 2*timeout. This is
    fine for most interactive environments, but it hurts automated testing
    setups (syzbot). In an automated setup we need to strictly order CPU
    lockup < RCU stall < workqueue lockup < task hung < silent loss, so that
    an RCU stall is not detected as task hung and task hung is not detected
    as silent machine loss. The large variance in the task hung detection
    timeout requires setting the silent machine loss timeout to a very large
    value (e.g. if task hung is 3 mins, then silent loss needs to be set to
    ~7 mins). The additional 3 minutes significantly reduce testing
    efficiency because usually we crash the kernel within a minute, and this
    can add hours to the bug localization process as it needs to do dozens of
    tests.

    Allow setting the checking interval separately from the timeout. This
    makes it possible to set the timeout to, say, 3 minutes, but the checking
    interval to 10 secs.

    The interval is controlled via a new hung_task_check_interval_secs sysctl,
    similar to the existing hung_task_timeout_secs sysctl. The default value
    of 0 results in the current behavior: checking interval is equal to
    timeout.

    [akpm@linux-foundation.org: update hung_task_timeout_max's comment]
    Link: http://lkml.kernel.org/r/20180611111004.203513-1-dvyukov@google.com
    Signed-off-by: Dmitry Vyukov
    Cc: Paul E. McKenney
    Cc: Tetsuo Handa
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
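
The effect of the separate interval on detection latency can be worked through numerically. The sketch below is a model in seconds; both function names are hypothetical, and clamping the interval to the timeout is an assumption of this sketch.

```c
/*
 * Effective time between scans. An interval of 0 keeps the old
 * behaviour (interval equals timeout). Clamping an interval larger than
 * the timeout down to the timeout is an assumption of this model.
 */
unsigned long effective_interval(unsigned long timeout_secs,
                                 unsigned long interval_secs)
{
    if (interval_secs == 0)
        return timeout_secs;
    return interval_secs < timeout_secs ? interval_secs : timeout_secs;
}

/*
 * Upper bound on detection latency: a task that blocks right after a
 * scan is reported by the first scan that finds it blocked for at least
 * timeout_secs, which is at most one interval later.
 */
unsigned long worst_case_detect_secs(unsigned long timeout_secs,
                                     unsigned long interval_secs)
{
    return timeout_secs + effective_interval(timeout_secs, interval_secs);
}
```

For a 3 minute timeout the bound is 360 seconds with the default interval and 190 seconds with a 10 second interval.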
     

08 Jun, 2018

1 commit

  • When we get a hung task it can often be valuable to see _all_ the hung
    tasks on the system before calling panic().

    Quoting from https://syzkaller.appspot.com/text?tag=CrashReport&id=5316056503549952
    ----------------------------------------
    INFO: task syz-executor0:6540 blocked for more than 120 seconds.
    Not tainted 4.16.0+ #13
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    syz-executor0 D23560 6540 4521 0x80000004
    Call Trace:
    context_switch kernel/sched/core.c:2848 [inline]
    __schedule+0x8fb/0x1ef0 kernel/sched/core.c:3490
    schedule+0xf5/0x430 kernel/sched/core.c:3549
    schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3607
    __mutex_lock_common kernel/locking/mutex.c:833 [inline]
    __mutex_lock+0xb7f/0x1810 kernel/locking/mutex.c:893
    mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
    lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
    __blkdev_driver_ioctl block/ioctl.c:303 [inline]
    blkdev_ioctl+0x1759/0x1e00 block/ioctl.c:601
    ioctl_by_bdev+0xa5/0x110 fs/block_dev.c:2060
    isofs_get_last_session fs/isofs/inode.c:567 [inline]
    isofs_fill_super+0x2ba9/0x3bc0 fs/isofs/inode.c:660
    mount_bdev+0x2b7/0x370 fs/super.c:1119
    isofs_mount+0x34/0x40 fs/isofs/inode.c:1560
    mount_fs+0x66/0x2d0 fs/super.c:1222
    vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
    vfs_kern_mount fs/namespace.c:2514 [inline]
    do_new_mount fs/namespace.c:2517 [inline]
    do_mount+0xea4/0x2b90 fs/namespace.c:2847
    ksys_mount+0xab/0x120 fs/namespace.c:3063
    SYSC_mount fs/namespace.c:3077 [inline]
    SyS_mount+0x39/0x50 fs/namespace.c:3074
    do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    (...snipped...)
    Showing all locks held in the system:
    (...snipped...)
    2 locks held by syz-executor0/6540:
    #0: 00000000566d4c39 (&type->s_umount_key#49/1){+.+.}, at: alloc_super fs/super.c:211 [inline]
    #0: 00000000566d4c39 (&type->s_umount_key#49/1){+.+.}, at: sget_userns+0x3b2/0xe60 fs/super.c:502 /* down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); */
    #1: 0000000043ca8836 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */
    (...snipped...)
    3 locks held by syz-executor7/6541:
    #0: 0000000043ca8836 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */
    #1: 000000007bf3d3f9 (&bdev->bd_mutex){+.+.}, at: blkdev_reread_part+0x1e/0x40 block/ioctl.c:192
    #2: 00000000566d4c39 (&type->s_umount_key#50){.+.+}, at: __get_super.part.10+0x1d3/0x280 fs/super.c:663 /* down_read(&sb->s_umount); */
    ----------------------------------------

    When reporting an AB-BA deadlock like shown above, it would be nice if
    trace of PID=6541 is printed as well as trace of PID=6540 before calling
    panic().

    Showing hung tasks up to /proc/sys/kernel/hung_task_warnings could delay
    calling panic() but normally there should not be so many hung tasks.

    Link: http://lkml.kernel.org/r/201804050705.BHE57833.HVFOFtSOMQJFOL@I-love.SAKURA.ne.jp
    Signed-off-by: Tetsuo Handa
    Acked-by: Paul E. McKenney
    Acked-by: Dmitry Vyukov
    Cc: Vegard Nossum
    Cc: Mandeep Singh Baines
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     

09 May, 2017

1 commit

  • When I was running my testcase which may block hundreds of threads on fs
    locks, I got lockup due to output from debug_show_all_locks() added by
    commit b2d4c2edb2e4 ("locking/hung_task: Show all locks").

    For example, if 1000 threads were blocked in TASK_UNINTERRUPTIBLE state
    and 500 out of 1000 threads hold some lock, debug_show_all_locks() from
    the for_each_process_thread() loop will report locks held by 500 threads
    1000 times. This is too much noise.

    In order to make sure rcu_lock_break() is called frequently, we should
    avoid calling debug_show_all_locks() from the for_each_process_thread()
    loop, because debug_show_all_locks() effectively runs its own
    for_each_process_thread() loop. Let's defer calling
    debug_show_all_locks() until just before panic() or until after leaving
    the for_each_process_thread() loop.

    Link: http://lkml.kernel.org/r/1489296834-60436-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
    Signed-off-by: Tetsuo Handa
    Reviewed-by: Vegard Nossum
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     

02 Mar, 2017

2 commits


13 Dec, 2016

1 commit

  • Since sysctl_hung_task_warnings == -1 is allowed (infinite warnings),
    commit 48a6d64edadb ("hung_task: allow hung_task_panic when
    hung_task_warnings is 0") should decrement it only when it is not -1.

    This prevents the kernel from ceasing warnings after the first
    4294967295 ;)

    Signed-off-by: Tetsuo Handa
    Cc: John Siddle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
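
The counter rule in this entry is small enough to state as code. A minimal sketch, assuming the counter is a signed int where -1 means unlimited and 0 means exhausted (consume_warning() is a hypothetical name):

```c
/*
 * Returns the warnings counter after one report has been printed.
 * Only a positive budget is consumed; -1 (unlimited) and 0 (exhausted)
 * are returned unchanged.
 */
int consume_warning(int warnings)
{
    if (warnings > 0)
        warnings--;
    return warnings;
}
```

Leaving -1 untouched is what keeps the unlimited setting from drifting to other negative values over time.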
     

12 Oct, 2016

1 commit

  • Previously hung_task_panic would not be respected if enabled after
    hung_task_warnings had already been decremented to 0.

    Permit the kernel to panic if hung_task_panic is enabled after
    hung_task_warnings has already been decremented to 0 and another task
    hangs for hung_task_timeout_secs seconds.

    Check if hung_task_panic is enabled so we don't return prematurely, and
    check if hung_task_warnings is non-zero so we don't print the warning
    unnecessarily.

    [akpm@linux-foundation.org: fix off-by-one]
    Link: http://lkml.kernel.org/r/1473450214-4049-1-git-send-email-jsiddle@redhat.com
    Signed-off-by: John Siddle
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Siddle
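
The two checks named in this entry can be modelled as a function that returns the set of actions to take for a detected hang. The enum values and the function name are hypothetical; only the decision table is meant to reflect the description.

```c
#include <stdbool.h>

/* Hypothetical action flags for this sketch. */
enum { ACT_NONE = 0, ACT_WARN = 1, ACT_PANIC = 2 };

/*
 * Decide what to do for one detected hung task.
 *   warnings      models hung_task_warnings (0 means none left)
 *   panic_enabled models hung_task_panic
 */
int hung_task_actions(int warnings, bool panic_enabled)
{
    int actions = ACT_NONE;

    if (warnings != 0)
        actions |= ACT_WARN;   /* print only while warnings remain */
    if (panic_enabled)
        actions |= ACT_PANIC;  /* honoured even with no warnings left */
    return actions;
}
```

The case this entry fixes is the second assertion-style row: zero warnings with panic enabled now yields a panic instead of nothing.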
     

24 Aug, 2016

1 commit

  • When we get a hung task it can often be valuable to see _all_ the held
    locks on the system (in case we are being blocked on trying to acquire
    one), e.g. with this patch we can immediately see where the problem is
    below:

    INFO: task trinity-c3:14933 blocked for more than 120 seconds.
    Not tainted 4.8.0-rc1+ #135
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    trinity-c3 D ffff88010c16fc88 0 14933 1 0x00080004
    ffff88010c16fc88 000000003b9aca00 0000000000000000 0000000000000296
    00000000776cdf88 ffff88011a520ae0 ffff88011a520b08 ffff88011a520198
    ffffffff867d7f00 ffff88011942c080 ffff880116841580 ffff88010c168000
    Call Trace:
    [] schedule+0x77/0x230
    [] __lock_sock+0x129/0x250
    [] ? __sk_destruct+0x450/0x450
    [] ? wake_bit_function+0x2e0/0x2e0
    [] lock_sock_nested+0xeb/0x120
    [] irda_setsockopt+0x65/0xb40
    [] SyS_setsockopt+0x139/0x230
    [] ? SyS_recv+0x20/0x20
    [] ? trace_event_raw_event_sys_enter+0xb90/0xb90
    [] ? __this_cpu_preempt_check+0x13/0x20
    [] ? __context_tracking_exit.part.3+0x30/0x1b0
    [] ? SyS_recv+0x20/0x20
    [] do_syscall_64+0x1b3/0x4b0
    [] entry_SYSCALL64_slow_path+0x25/0x25

    Showing all locks held in the system:
    2 locks held by khungtaskd/563:
    #0: (rcu_read_lock){......}, at: [] watchdog+0x106/0x910
    #1: (tasklist_lock){......}, at: [] debug_show_all_locks+0x74/0x360
    1 lock held by trinity-c0/19280:
    #0: (sk_lock-AF_IRDA){......}, at: [] irda_accept+0x176/0x10f0
    1 lock held by trinity-c0/12865:
    #0: (sk_lock-AF_IRDA){......}, at: [] irda_accept+0x176/0x10f0

    Signed-off-by: Vegard Nossum
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Mandeep Singh Baines
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1471538460-7505-1-git-send-email-vegard.nossum@oracle.com
    Signed-off-by: Ingo Molnar

    Vegard Nossum
     

23 Mar, 2016

1 commit

  • When a new timeout is written to /proc/sys/kernel/hung_task_timeout_secs,
    khungtaskd is interrupted and sleeps again for the full timeout duration.

    This means that hung tasks will not be checked if a new timeout is written
    periodically within the old timeout duration, and/or that checking of hung
    tasks will be delayed for up to the previous timeout duration. Fix this by
    remembering the last time khungtaskd checked for hung tasks.

    This change will allow other watchdog tasks (if any) to share khungtaskd
    by sleeping for minimal timeout diff of all watchdog tasks. Doing more
    watchdog tasks from khungtaskd will reduce the possibility of printk()
    collisions by multiple watchdog threads.

    Signed-off-by: Tetsuo Handa
    Cc: Oleg Nesterov
    Cc: Aaron Tomlin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
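
Remembering the last check time turns the sleep into a deadline computation. The following is a sketch in whole seconds with a hypothetical helper name; treating a timeout of 0 as "sleep indefinitely" is an assumption of this model.

```c
#include <limits.h>

/*
 * Seconds left until the next scan is due. A result <= 0 means the scan
 * is already due and should run immediately.
 */
long remaining_sleep_secs(long now, long last_checked, long timeout)
{
    if (timeout == 0)
        return LONG_MAX;  /* assumed: detector disabled, never due */
    return last_checked + timeout - now;
}
```

With this shape, rewriting the timeout at second 100 after a scan at second 40 leaves 60 seconds of a 120 second timeout, instead of restarting a full 120 second sleep.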
     

16 Apr, 2015

1 commit

  • In check_hung_uninterruptible_tasks() avoid the use of deprecated
    while_each_thread().

    The "max_count" logic will prevent a livelock - see commit 0c740d0a
    ("introduce for_each_thread() to replace the buggy while_each_thread()").
    Having said this let's use for_each_process_thread().

    Signed-off-by: Aaron Tomlin
    Acked-by: Oleg Nesterov
    Cc: David Rientjes
    Cc: Dave Wysochanski
    Cc: Aaron Tomlin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aaron Tomlin
     

05 Jun, 2014

1 commit


04 Apr, 2014

1 commit

  • Code that is obj-y (always built-in) or dependent on a bool Kconfig
    (built-in or absent) can never be modular. So using module_init as an
    alias for __initcall can be somewhat misleading.

    Fix these up now, so that we can relocate module_init from init.h into
    module.h in the future. If we don't do this, we'd have to add module.h
    to obviously non-modular code, and that would be a worse thing.

    The audit targets the following module_init users for change:
    kernel/user.c obj-y
    kernel/kexec.c bool KEXEC (one instance per arch)
    kernel/profile.c bool PROFILING
    kernel/hung_task.c bool DETECT_HUNG_TASK
    kernel/sched/stats.c bool SCHEDSTATS
    kernel/user_namespace.c bool USER_NS

    Note that direct use of __initcall is discouraged, vs. one of the
    priority categorized subgroups. As __initcall gets mapped onto
    device_initcall, our use of subsys_initcall (which makes sense for these
    files) will thus change this registration from level 6-device to level
    4-subsys (i.e. slightly earlier). However no observable impact of that
    difference has been observed during testing.

    Also, two instances of missing ";" at EOL are fixed in kexec.

    Signed-off-by: Paul Gortmaker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Eric Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Gortmaker
     

25 Jan, 2014

1 commit

  • When khungtaskd detects hung tasks, it prints out
    backtraces from a number of those tasks.

    Limiting the number of backtraces being printed
    out can result in the user not seeing the information
    necessary to debug the issue. The hung_task_warnings
    sysctl controls this feature.

    This patch makes it possible for hung_task_warnings
    to accept a special value to print an unlimited
    number of backtraces when khungtaskd detects hung
    tasks.

    The special value is -1. To use this value it is
    necessary to change types from ulong to int.

    Signed-off-by: Aaron Tomlin
    Reviewed-by: Rik van Riel
    Acked-by: David Rientjes
    Cc: oleg@redhat.com
    Link: http://lkml.kernel.org/r/1390239253-24030-3-git-send-email-atomlin@redhat.com
    [ Build warning fix. ]
    Signed-off-by: Ingo Molnar

    Aaron Tomlin
     

15 Nov, 2013

1 commit

  • Pull KVM changes from Paolo Bonzini:
    "Here are the 3.13 KVM changes. There was a lot of work on the PPC
    side: the HV and emulation flavors can now coexist in a single kernel
    is probably the most interesting change from a user point of view.

    On the x86 side there are nested virtualization improvements and a few
    bugfixes.

    ARM got transparent huge page support, improved overcommit, and
    support for big endian guests.

    Finally, there is a new interface to connect KVM with VFIO. This
    helps with devices that use NoSnoop PCI transactions, letting the
    driver in the guest execute WBINVD instructions. This includes some
    nVidia cards on Windows, that fail to start without these patches and
    the corresponding userspace changes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
    kvm, vmx: Fix lazy FPU on nested guest
    arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu
    arm/arm64: KVM: MMIO support for BE guest
    kvm, cpuid: Fix sparse warning
    kvm: Delete prototype for non-existent function kvm_check_iopl
    kvm: Delete prototype for non-existent function complete_pio
    hung_task: add method to reset detector
    pvclock: detect watchdog reset at pvclock read
    kvm: optimize out smp_mb after srcu_read_unlock
    srcu: API for barrier after srcu read unlock
    KVM: remove vm mmap method
    KVM: IOMMU: hva align mapping page size
    KVM: x86: trace cpuid emulation when called from emulator
    KVM: emulator: cleanup decode_register_operand() a bit
    KVM: emulator: check rex prefix inside decode_register()
    KVM: x86: fix emulation of "movzbl %bpl, %eax"
    kvm_host: typo fix
    KVM: x86: emulate SAHF instruction
    MAINTAINERS: add tree for kvm.git
    Documentation/kvm: add a 00-INDEX file
    ...

    Linus Torvalds
     

06 Nov, 2013

1 commit

  • On certain occasions a hung task detector positive can be false:
    when a paused VM is resumed, for example.

    Add a method to reset detection, similar to what is done
    with other kernel watchdogs.

    Acked-by: Don Zickus
    Acked-by: Paolo Bonzini
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Gleb Natapov

    Marcelo Tosatti
     

31 Oct, 2013

1 commit

  • Currently check_hung_task() prints a warning if it detects the
    problem, but it is not convenient to watch the system logs if
    user-space wants to be notified about the hang.

    Add the new trace_sched_process_hang() into check_hung_task(),
    this way a user-space monitor can easily wait for the hang and
    potentially resolve a problem.

    Signed-off-by: Oleg Nesterov
    Cc: Dave Sullivan
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20131019161828.GA7439@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

23 Sep, 2013

1 commit

  • 'sysctl_hung_task_check_count' is 'unsigned long', so when this
    value is assigned to max_count in check_hung_uninterruptible_tasks(),
    it's truncated to 'int' type.

    This causes a minor artifact: if we write 2^32 to sysctl.hung_task_check_count,
    hung task detection will be effectively disabled.

    With this fix, it will still truncate the user input to 32 bits, but
    reading sysctl.hung_task_check_count reflects the actual truncated value.

    Signed-off-by: Li Zefan
    Acked-by: Ingo Molnar
    Link: http://lkml.kernel.org/r/523FFF4E.9050401@huawei.com
    Signed-off-by: Ingo Molnar

    Li Zefan
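
The 2^32 artifact follows directly from keeping only the low 32 bits of a 64-bit value. A minimal sketch using fixed-width types so the narrowing is well defined (truncate_to_32() is a hypothetical helper, and a 64-bit unsigned long is assumed for the scenario in the text):

```c
#include <stdint.h>

/* Keep only the low 32 bits, as a 64-bit value stored in a 32-bit counter would. */
uint32_t truncate_to_32(uint64_t requested)
{
    return (uint32_t)requested;
}
```

Writing exactly 2^32 therefore yields a limit of 0, which is why detection ends up effectively disabled.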
     

02 Aug, 2013

1 commit

  • printk(KERN_ERR) from check_hung_task() likely means we have a bug,
    but unlike BUG_ON()/WARN_ON() it doesn't show the kernel version,
    which complicates the investigation of bug reports.

    Add the additional pr_err() to print tainted/release/version
    like dump_stack_print_info() does, the output becomes:

    INFO: task perl:504 blocked for more than 2 seconds.
    Not tainted 3.11.0-rc1-10367-g136bb46-dirty #1763
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    ...

    While at it, turn the old printk's into pr_err().

    Signed-off-by: Oleg Nesterov
    Cc: ahecox@redhat.com
    Cc: Christopher Williams
    Cc: dwysocha@redhat.com
    Cc: gavin@redhat.com
    Cc: Mandeep Singh Baines
    Cc: nshi@redhat.com
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20130801165941.GA17544@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

25 Apr, 2012

1 commit

  • Send an NMI to all CPUs when a hung task is detected and the hung
    task code is configured to panic. This gives us a fairly up-to-date
    snapshot of all CPUs in the system.

    This lets us get stack traces of all CPUs, which makes life easier
    when trying to debug a deadlock, and the NMI doesn't change anything
    since the next step is a kernel panic.

    Signed-off-by: Sasha Levin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1331848040-1676-1-git-send-email-levinsasha928@gmail.com
    [ extended the changelog a bit ]
    Signed-off-by: Ingo Molnar

    Sasha Levin
     

06 Mar, 2012

1 commit

  • check_hung_uninterruptible_tasks()->rcu_lock_break() introduced by
    "softlockup: check all tasks in hung_task" commit ce9dbe24 looks
    absolutely wrong.

    - rcu_lock_break() does put_task_struct(). If the task has exited
    it is not safe to even read its ->state, nothing protects this
    task_struct.

    - The TASK_DEAD checks are wrong too. Contrary to the comment, we
    can't use it to check if the task was unhashed. It can be unhashed
    without TASK_DEAD, or it can be valid with TASK_DEAD.

    For example, an autoreaping task can do release_task(current)
    long before it sets TASK_DEAD in do_exit().

    Or, a zombie task can have ->state == TASK_DEAD but release_task()
    was not called, and in this case we must not break the loop.

    Change this code to check pid_alive() instead, and do this before we drop
    the reference to the task_struct.

    Note: while_each_thread() under rcu_read_lock() is not really safe, it can
    livelock. This will be fixed later, but fortunately in this case the
    "max_count" logic saves us anyway.

    Signed-off-by: Oleg Nesterov
    Acked-by: Frederic Weisbecker
    Acked-by: Mandeep Singh Baines
    Acked-by: Paul E. McKenney
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

04 Jan, 2012

1 commit

  • vfork parent uninterruptibly and unkillably waits for its child to
    exec/exit. This wait is of unbounded length. Ignore such waits
    in the hung_task detector.

    Signed-off-by: Mandeep Singh Baines
    Reported-by: Sasha Levin
    LKML-Reference:
    Cc: Linus Torvalds
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: John Kacur
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Mandeep Singh Baines
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

28 Apr, 2011

1 commit

  • This patch allows the default value for sysctl_hung_task_timeout_secs
    to be set at build time. The feature carries virtually no overhead,
    so it makes sense to keep it enabled. On heavily loaded systems, though,
    it can end up triggering stack traces when there is no bug other than
    the system being underprovisioned. We use this patch to keep the hung task
    facility available but disabled at boot-time.

    The default of 120 seconds is preserved. As a note, commit e162b39a may
    have accidentally reverted commit fb822db4, which raised the default from
    120 seconds to 480 seconds.

    Signed-off-by: Jeff Mahoney
    Acked-by: Mandeep Singh Baines
    Link: http://lkml.kernel.org/r/4DB8600C.8080000@suse.com
    Signed-off-by: Ingo Molnar

    Jeff Mahoney
     

17 Aug, 2010

2 commits


27 Nov, 2009

1 commit

  • I'm seeing spikes of up to 0.5ms in khungtaskd on a large
    machine. To reduce this source of jitter I tried setting
    hung_task_check_count to 0:

    # echo 0 > /proc/sys/kernel/hung_task_check_count

    which didn't have the intended effect. Change to a post-decrement
    of max_count, so a value of 0 means check 0 tasks.

    Signed-off-by: Anton Blanchard
    Acked-by: Frederic Weisbecker
    Cc: msb@google.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Anton Blanchard
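
Why a limit of 0 failed to stop the scan comes down to whether the counter is modified before or after the zero test. A minimal sketch, assuming the loop tested !--max_count before the change and !max_count-- after it (both helper functions are hypothetical and the exact kernel expressions are an assumption):

```c
/* Number of tasks checked when the counter is decremented before the test. */
int checked_with_predecrement(int max_count, int ntasks)
{
    int checked = 0;

    for (int i = 0; i < ntasks; i++) {
        if (!--max_count)  /* starting from 0 this goes to -1 and never hits 0 */
            break;
        checked++;
    }
    return checked;
}

/* Number of tasks checked when the counter is tested first, then decremented. */
int checked_with_postdecrement(int max_count, int ntasks)
{
    int checked = 0;

    for (int i = 0; i < ntasks; i++) {
        if (!max_count--)  /* starting from 0 this stops immediately */
            break;
        checked++;
    }
    return checked;
}
```

The same comparison also shows the off-by-one for positive limits: a limit of 3 checks two tasks in the first form and three in the second.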
     

24 Sep, 2009

1 commit

  • It's unused.

    It isn't needed -- read or write flag is already passed and sysctl
    shouldn't care about the rest.

    It _was_ used in two places at arch/frv for some reason.

    Signed-off-by: Alexey Dobriyan
    Cc: David Howells
    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc: Ralf Baechle
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Cc: "David S. Miller"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

11 Feb, 2009

1 commit

  • When we check if a task has been switched out since the last scan, we might
    have a race condition in the following scenario:

    - the task is freshly created and scheduled

    - it puts its state to TASK_UNINTERRUPTIBLE and is not yet switched out

    - check_hung_task() scans this task and will report a false positive because
    t->nvcsw + t->nivcsw == t->last_switch_count == 0

    Add a check for such cases.

    Signed-off-by: Frederic Weisbecker
    Acked-by: Mandeep Singh Baines
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
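
The false positive in the scenario above comes from comparing two counters that are both still zero. A minimal sketch of the comparison with the added guard; looks_hung() is a hypothetical helper, and its parameters mirror the fields named in the text.

```c
#include <stdbool.h>

/*
 * Returns true when the task appears not to have been switched out since
 * the previous scan.
 */
bool looks_hung(unsigned long nvcsw, unsigned long nivcsw,
                unsigned long last_switch_count)
{
    unsigned long switch_count = nvcsw + nivcsw;

    /* A task that has never been switched out cannot be judged yet. */
    if (switch_count == 0)
        return false;
    return switch_count == last_switch_count;
}
```

Without the zero guard, a freshly created task with all three values at 0 would compare equal and be reported.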
     

09 Feb, 2009

1 commit


06 Feb, 2009

2 commits

  • Since the tasklist is protected by rcu list operations, it is safe
    to convert the read_lock()s to rcu_read_lock().

    Suggested-by: Peter Zijlstra
    Signed-off-by: Mandeep Singh Baines
    Signed-off-by: Ingo Molnar

    Mandeep Singh Baines
     
  • Impact: extend the scope of hung-task checks

    Changed the default value of hung_task_check_count to PID_MAX_LIMIT.
    hung_task_batch_count is added to put an upper bound on the critical
    section: the rcu lock is released and re-acquired every
    hung_task_batch_count checks, so it is never held for too long.

    Keeping the critical section small minimizes the time preemption is
    disabled and keeps rcu grace periods short.

    To prevent following a stale pointer, get_task_struct is called on g and t.
    To verify that g and t have not been unhashed while outside the critical
    section, the task states are checked.

    The design was proposed by Frédéric Weisbecker.

    Signed-off-by: Mandeep Singh Baines
    Suggested-by: Frédéric Weisbecker
    Acked-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Mandeep Singh Baines