13 Feb, 2020

1 commit

  • It's desirable to be able to rely on the following property: all stores
    preceding (in program order) a successful call to queue_work() will be
    visible to the CPU which executes the queued work by the time that work
    executes, e.g.,

    { x is initially 0 }

    CPU0                          CPU1

    WRITE_ONCE(x, 1);             [ "work" is being executed ]
    r0 = queue_work(wq, work);    r1 = READ_ONCE(x);

    Forbids: r0 == true && r1 == 0

    The current implementation of queue_work() provides this memory-ordering
    property:

    - In __queue_work(), the ->lock spinlock is acquired.

    - On the other side, in worker_thread(), this same ->lock is held
    when dequeueing work.

    The lock acquire/release ordering therefore provides the required
    visibility guarantee (see the sketch at the end of this entry).

    Add this property to the DocBook headers of {queue,schedule}_work().

    Suggested-by: Paul E. McKenney
    Signed-off-by: Andrea Parri
    Acked-by: Paul E. McKenney
    Signed-off-by: Tejun Heo

    Andrea Parri
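
    A minimal sketch of relying on the guarantee documented above; the
    workqueue, work item, function and shared variable names are hypothetical:

    static int shared_flag;                 /* "x" in the litmus test above */
    static struct workqueue_struct *example_wq;

    static void example_workfn(struct work_struct *work)
    {
            /*
             * For the execution instance triggered by the successful
             * queue_work() below, observing shared_flag == 0 is forbidden.
             */
            int r1 = READ_ONCE(shared_flag);

            pr_debug("observed %d\n", r1);
    }
    static DECLARE_WORK(example_work, example_workfn);

    static void producer(void)
    {
            WRITE_ONCE(shared_flag, 1);
            /*
             * If this returns true, the WRITE_ONCE() above is visible to
             * the CPU that will execute example_workfn().
             */
            queue_work(example_wq, &example_work);
    }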
     

07 Mar, 2019

1 commit

  • Pull driver core updates from Greg KH:
    "Here is the big driver core patchset for 5.1-rc1

    More patches than "normal" here this merge window, due to some work in
    the driver core by Alexander Duyck to rework the async probe
    functionality to work better for a number of devices, and independent
    work from Rafael for the device link functionality to make it work
    "correctly".

    Also in here is:

    - lots of BUS_ATTR() removals, the macro is about to go away

    - firmware test fixups

    - ihex fixups and simplification

    - component additions (also includes i915 patches)

    - lots of minor coding style fixups and cleanups.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits)
    driver core: platform: remove misleading err_alloc label
    platform: set of_node in platform_device_register_full()
    firmware: hardcode the debug message for -ENOENT
    driver core: Add missing description of new struct device_link field
    driver core: Fix PM-runtime for links added during consumer probe
    drivers/component: kerneldoc polish
    async: Add cmdline option to specify drivers to be async probed
    driver core: Fix possible supplier PM-usage counter imbalance
    PM-runtime: Fix __pm_runtime_set_status() race with runtime resume
    driver: platform: Support parsing GpioInt 0 in platform_get_irq()
    selftests: firmware: fix verify_reqs() return value
    Revert "selftests: firmware: remove use of non-standard diff -Z option"
    Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config"
    device: Fix comment for driver_data in struct device
    kernfs: Allocating memory for kernfs_iattrs with kmem_cache.
    sysfs: remove unused include of kernfs-internal.h
    driver core: Postpone DMA tear-down until after devres release
    driver core: Document limitation related to DL_FLAG_RPM_ACTIVE
    PM-runtime: Take suppliers into account in __pm_runtime_set_status()
    device.h: Add __cold to dev_ logging functions
    ...

    Linus Torvalds
     

28 Feb, 2019

1 commit

  • The following commit:

    87915adc3f0a ("workqueue: re-add lockdep dependencies for flushing")

    improved deadlock checking in the workqueue implementation. Unfortunately
    that patch also introduced a few false positive lockdep complaints.

    This patch suppresses these false positives by allocating the workqueue mutex
    lockdep key dynamically.

    An example of a false positive lockdep complaint suppressed by this patch
    can be found below. The root cause of the lockdep complaint shown below
    is that the direct I/O code can call alloc_workqueue() from inside a work
    item created by another alloc_workqueue() call and that both workqueues
    share the same lockdep key. This patch avoids triggering that lockdep
    complaint by allocating the workqueue lockdep keys dynamically (see the
    sketch at the end of this entry).

    In other words, this patch guarantees that a unique lockdep key is
    associated with each work queue mutex.

    ======================================================
    WARNING: possible circular locking dependency detected
    4.19.0-dbg+ #1 Not tainted
    fio/4129 is trying to acquire lock:
    00000000a01cfe1a ((wq_completion)"dio/%s"sb->s_id){+.+.}, at: flush_workqueue+0xd0/0x970

    but task is already holding lock:
    00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #2 (&sb->s_type->i_mutex_key#14){+.+.}:
    down_write+0x3d/0x80
    __generic_file_fsync+0x77/0xf0
    ext4_sync_file+0x3c9/0x780
    vfs_fsync_range+0x66/0x100
    dio_complete+0x2f5/0x360
    dio_aio_complete_work+0x1c/0x20
    process_one_work+0x481/0x9f0
    worker_thread+0x63/0x5a0
    kthread+0x1cf/0x1f0
    ret_from_fork+0x24/0x30

    -> #1 ((work_completion)(&dio->complete_work)){+.+.}:
    process_one_work+0x447/0x9f0
    worker_thread+0x63/0x5a0
    kthread+0x1cf/0x1f0
    ret_from_fork+0x24/0x30

    -> #0 ((wq_completion)"dio/%s"sb->s_id){+.+.}:
    lock_acquire+0xc5/0x200
    flush_workqueue+0xf3/0x970
    drain_workqueue+0xec/0x220
    destroy_workqueue+0x23/0x350
    sb_init_dio_done_wq+0x6a/0x80
    do_blockdev_direct_IO+0x1f33/0x4be0
    __blockdev_direct_IO+0x79/0x86
    ext4_direct_IO+0x5df/0xbb0
    generic_file_direct_write+0x119/0x220
    __generic_file_write_iter+0x131/0x2d0
    ext4_file_write_iter+0x3fa/0x710
    aio_write+0x235/0x330
    io_submit_one+0x510/0xeb0
    __x64_sys_io_submit+0x122/0x340
    do_syscall_64+0x71/0x220
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    other info that might help us debug this:

    Chain exists of:
    (wq_completion)"dio/%s"sb->s_id --> (work_completion)(&dio->complete_work) --> &sb->s_type->i_mutex_key#14

    Possible unsafe locking scenario:

    CPU0                                      CPU1
    ----                                      ----
    lock(&sb->s_type->i_mutex_key#14);
                                              lock((work_completion)(&dio->complete_work));
                                              lock(&sb->s_type->i_mutex_key#14);
    lock((wq_completion)"dio/%s"sb->s_id);

    *** DEADLOCK ***

    1 lock held by fio/4129:
    #0: 00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710

    stack backtrace:
    CPU: 3 PID: 4129 Comm: fio Not tainted 4.19.0-dbg+ #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    Call Trace:
    dump_stack+0x86/0xc5
    print_circular_bug.isra.32+0x20a/0x218
    __lock_acquire+0x1c68/0x1cf0
    lock_acquire+0xc5/0x200
    flush_workqueue+0xf3/0x970
    drain_workqueue+0xec/0x220
    destroy_workqueue+0x23/0x350
    sb_init_dio_done_wq+0x6a/0x80
    do_blockdev_direct_IO+0x1f33/0x4be0
    __blockdev_direct_IO+0x79/0x86
    ext4_direct_IO+0x5df/0xbb0
    generic_file_direct_write+0x119/0x220
    __generic_file_write_iter+0x131/0x2d0
    ext4_file_write_iter+0x3fa/0x710
    aio_write+0x235/0x330
    io_submit_one+0x510/0xeb0
    __x64_sys_io_submit+0x122/0x340
    do_syscall_64+0x71/0x220
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/20190214230058.196511-20-bvanassche@acm.org
    [ Reworked the changelog a bit. ]
    Signed-off-by: Ingo Molnar

    Bart Van Assche
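
    A minimal sketch of the pattern behind the false positive described above:
    a work item running on one workqueue creates and destroys another
    workqueue. With a single static lockdep key both instances fell into the
    same lock class, so the flush performed by destroy_workqueue() looked like
    a self-deadlock; with dynamically allocated keys each workqueue gets its
    own class. Names are hypothetical:

    static void outer_workfn(struct work_struct *work)
    {
            struct workqueue_struct *inner;

            /* Runs on a workqueue that was itself created by alloc_workqueue(). */
            inner = alloc_workqueue("inner_wq", 0, 0);
            if (!inner)
                    return;

            /* ... queue and run work items on inner ... */

            /* Flushes inner_wq; it previously shared a lockdep class with the outer wq. */
            destroy_workqueue(inner);
    }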
     

31 Jan, 2019

1 commit

  • Provide a new function, queue_work_node, which is meant to schedule work on
    a "random" CPU of the requested NUMA node. The main motivation for this is
    to help asynchronous init improve boot times for devices that are local
    to a specific node (see the usage sketch at the end of this entry).

    For now we just default to the first CPU that is in the intersection of the
    cpumask of the node and the online cpumask. The only exception is that if
    the current CPU is local to the node, we just use the current CPU. This
    should work for our purposes as we are currently only using this for
    unbound work, so the CPU will be translated to a node anyway instead of
    being directly used.

    As we are only using the first CPU to represent the NUMA node for now I am
    limiting the scope of the function so that it can only be used with unbound
    workqueues.

    Acked-by: Tejun Heo
    Reviewed-by: Bart Van Assche
    Acked-by: Dan Williams
    Signed-off-by: Alexander Duyck
    Signed-off-by: Greg Kroah-Hartman

    Alexander Duyck
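
    A hedged usage sketch for queue_work_node(); the workqueue (created with
    WQ_UNBOUND, per the restriction above), work item and device are
    hypothetical:

    static struct workqueue_struct *probe_wq;   /* alloc_workqueue(..., WQ_UNBOUND, 0) */
    static struct work_struct probe_work;       /* INIT_WORK()'d elsewhere */

    static void queue_probe_near(struct device *dev)
    {
            int node = dev_to_node(dev);

            /* Picks a CPU from the intersection of the node's and online cpumasks. */
            if (!queue_work_node(node, probe_wq, &probe_work))
                    pr_debug("probe work already pending\n");
    }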
     

18 May, 2018

1 commit

  • There can be a lot of workqueue workers and they all show up with the
    cryptic kworker/* names making it difficult to understand which is
    doing what and how they came to be.

    # ps -ef | grep kworker
    root 4 2 0 Feb25 ? 00:00:00 [kworker/0:0H]
    root 6 2 0 Feb25 ? 00:00:00 [kworker/u112:0]
    root 19 2 0 Feb25 ? 00:00:00 [kworker/1:0H]
    root 25 2 0 Feb25 ? 00:00:00 [kworker/2:0H]
    root 31 2 0 Feb25 ? 00:00:00 [kworker/3:0H]
    ...

    This patch makes workqueue workers report the latest workqueue they were
    executing for through /proc/PID/{comm,stat,status}. The extra
    information is appended to the kthread name with intervening '+' if
    currently executing, otherwise '-'.

    # cat /proc/25/comm
    kworker/2:0-events_power_efficient
    # cat /proc/25/stat
    25 (kworker/2:0-events_power_efficient) I 2 0 0 0 -1 69238880 0 0...
    # grep Name /proc/25/status
    Name: kworker/2:0-events_power_efficient

    Unfortunately, ps(1) truncates comm to 15 characters,

    # ps 25
    PID TTY STAT TIME COMMAND
    25 ? I 0:00 [kworker/2:0-eve]

    making it a lot less useful; however, this should be an easy fix from
    ps(1) side.

    Signed-off-by: Tejun Heo
    Suggested-by: Linus Torvalds
    Cc: Craig Small

    Tejun Heo
     

04 Apr, 2018

1 commit

  • Pull workqueue updates from Tejun Heo:
    "rcu_work addition and a couple trivial changes"

    * 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: remove the comment about the old manager_arb mutex
    workqueue: fix the comments of nr_idle
    fs/aio: Use rcu_work instead of explicit rcu and work item
    cgroup: Use rcu_work instead of explicit rcu and work item
    RCU, workqueue: Implement rcu_work

    Linus Torvalds
     

20 Mar, 2018

1 commit

  • There are cases where an RCU callback needs to be bounced to a sleepable
    context. This is currently done by the RCU callback queueing a work
    item, which can be cumbersome to write and confusing to read.

    This patch introduces rcu_work, a workqueue work variant which gets
    executed after an RCU grace period, and converts the open-coded
    bouncing in fs/aio and kernel/cgroup (see the sketch at the end of this
    entry).

    v3: Dropped queue_rcu_work_on(). Documented rcu grace period behavior
    after queue_rcu_work().

    v2: Use rcu_barrier() instead of synchronize_rcu() to wait for
    completion of previously queued rcu callback as per Paul.

    Signed-off-by: Tejun Heo
    Acked-by: "Paul E. McKenney"
    Cc: Linus Torvalds

    Tejun Heo
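
    A hedged sketch of the rcu_work usage pattern described above; the
    structure and function names are hypothetical:

    struct foo {
            struct rcu_work free_rwork;
            /* ... */
    };

    static void foo_free_workfn(struct work_struct *work)
    {
            struct foo *f = container_of(to_rcu_work(work), struct foo, free_rwork);

            /* Runs in process (workqueue) context, after an RCU grace period. */
            kfree(f);
    }

    static void foo_release(struct foo *f)
    {
            INIT_RCU_WORK(&f->free_rwork, foo_free_workfn);
            /* Queues the work for execution once a grace period has elapsed. */
            queue_rcu_work(system_wq, &f->free_rwork);
    }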
     

17 Feb, 2018

1 commit

  • Introduce a helper to retrieve the current task's work struct if it is
    a workqueue worker.

    This allows us to fix a long-standing deadlock in several DRM drivers
    wherein the ->runtime_suspend callback waits for a specific worker to
    finish and that worker in turn calls a function which waits for runtime
    suspend to finish. That function is invoked from multiple call sites
    and waiting for runtime suspend to finish is the correct thing to do,
    except when it is executing in the context of that worker.

    Cc: Lai Jiangshan
    Cc: Dave Airlie
    Cc: Ben Skeggs
    Cc: Alex Deucher
    Acked-by: Tejun Heo
    Reviewed-by: Lyude Paul
    Signed-off-by: Lukas Wunner
    Link: https://patchwork.freedesktop.org/patch/msgid/2d8f603074131eb87e588d2b803a71765bd3a2fd.1518338788.git.lukas@wunner.de

    Lukas Wunner
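
    A hedged sketch of how the helper breaks the deadlock described above;
    the work item and function names are hypothetical:

    static struct work_struct hpd_work;     /* flushed from ->runtime_suspend */

    static void wait_for_runtime_suspend(struct device *dev)
    {
            /*
             * Called from several paths, including from hpd_work itself.
             * If we are the worker executing hpd_work, waiting here would
             * deadlock with ->runtime_suspend flushing hpd_work, so skip it.
             */
            if (current_work() == &hpd_work)
                    return;

            pm_runtime_barrier(dev);
    }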
     

22 Nov, 2017

3 commits

  • With all callbacks converted, and the timer callback prototype
    switched over, the TIMER_FUNC_TYPE cast is no longer needed,
    so remove it. Conversion was done with the following scripts:

    perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
    $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)

    perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
    $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)

    The now unused macros are also dropped from include/linux/timer.h.

    Signed-off-by: Kees Cook

    Kees Cook
     
  • With __init_timer*() now matching __setup_timer*(), remove the redundant
    internal interface, clean up the resulting definitions and add more
    documentation.

    Cc: Thomas Gleixner
    Cc: Tejun Heo
    Cc: Lai Jiangshan
    Cc: Shaohua Li
    Cc: Jens Axboe
    Cc: Andrew Morton
    Signed-off-by: Kees Cook

    Kees Cook
     
  • With the .data field removed, the ignored data arguments in timer macros
    can be removed.

    Cc: Thomas Gleixner
    Cc: Tejun Heo
    Cc: Lai Jiangshan
    Cc: Jens Axboe
    Cc: Andrew Morton
    Cc: Shaohua Li
    Signed-off-by: Kees Cook

    Kees Cook
     

14 Nov, 2017

1 commit

  • Pull timer updates from Thomas Gleixner:
    "Yet another big pile of changes:

    - More year 2038 work from Arnd slowly reaching the point where we
    need to think about the syscalls themselves.

    - A new timer function which allows conditionally (re)arming a timer
    only when it's either not running or the new expiry time is sooner
    than the armed expiry time. This allows using a single timer for
    multiple timeout requirements w/o caring about the first expiry
    time at the call site.

    - A new NMI safe accessor to clock real time for the printk timestamp
    work. Can be used by tracing, perf as well if required.

    - A large number of timer setup conversions from Kees which got
    collected here because either maintainers requested so or they
    simply got ignored. As Kees pointed out already there are a few
    trivial merge conflicts and some redundant commits which was
    unavoidable due to the size of this conversion effort.

    - Avoid a redundant iteration in the timer wheel softirq processing.

    - Provide a mechanism to treat RTC implementations depending on their
    hardware properties, i.e. don't inflict the write at the 0.5
    second boundary, which originates from the PC CMOS RTC, on all RTCs.
    No functional change as drivers need to be updated separately.

    - The usual small updates to core code and clocksource drivers. Nothing
    really exciting"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits)
    timers: Add a function to start/reduce a timer
    pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday()
    timer: Prepare to change all DEFINE_TIMER() callbacks
    netfilter: ipvs: Convert timers to use timer_setup()
    scsi: qla2xxx: Convert timers to use timer_setup()
    block/aoe: discover_timer: Convert timers to use timer_setup()
    ide: Convert timers to use timer_setup()
    drbd: Convert timers to use timer_setup()
    mailbox: Convert timers to use timer_setup()
    crypto: Convert timers to use timer_setup()
    drivers/pcmcia: omap1: Fix error in automated timer conversion
    ARM: footbridge: Fix typo in timer conversion
    drivers/sgi-xp: Convert timers to use timer_setup()
    drivers/pcmcia: Convert timers to use timer_setup()
    drivers/memstick: Convert timers to use timer_setup()
    drivers/macintosh: Convert timers to use timer_setup()
    hwrng/xgene-rng: Convert timers to use timer_setup()
    auxdisplay: Convert timers to use timer_setup()
    sparc/led: Convert timers to use timer_setup()
    mips: ip22/32: Convert timers to use timer_setup()
    ...

    Linus Torvalds
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

25 Oct, 2017

1 commit

  • The workqueue code added manual lock acquisition annotations to catch
    deadlocks.

    After the lockdep crossrelease feature was introduced, some of those
    became redundant, since wait_for_completion() already does the
    acquisition and tracking.

    Remove the duplicate annotations.

    Signed-off-by: Byungchul Park
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: amir73il@gmail.com
    Cc: axboe@kernel.dk
    Cc: darrick.wong@oracle.com
    Cc: david@fromorbit.com
    Cc: hch@infradead.org
    Cc: idryomov@gmail.com
    Cc: johan@kernel.org
    Cc: johannes.berg@intel.com
    Cc: kernel-team@lge.com
    Cc: linux-block@vger.kernel.org
    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-mm@kvack.org
    Cc: linux-xfs@vger.kernel.org
    Cc: oleg@redhat.com
    Cc: tj@kernel.org
    Link: http://lkml.kernel.org/r/1508921765-15396-9-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     

05 Oct, 2017

2 commits

  • In preparation for unconditionally passing the struct timer_list pointer
    to all timer callbacks, switch workqueue to use from_timer() and pass the
    timer pointer explicitly.

    Signed-off-by: Kees Cook
    Signed-off-by: Thomas Gleixner
    Cc: linux-mips@linux-mips.org
    Cc: Petr Mladek
    Cc: Benjamin Herrenschmidt
    Cc: Lai Jiangshan
    Cc: Sebastian Reichel
    Cc: Kalle Valo
    Cc: Paul Mackerras
    Cc: Pavel Machek
    Cc: linux1394-devel@lists.sourceforge.net
    Cc: Chris Metcalf
    Cc: linux-s390@vger.kernel.org
    Cc: linux-wireless@vger.kernel.org
    Cc: "James E.J. Bottomley"
    Cc: Wim Van Sebroeck
    Cc: Michael Ellerman
    Cc: Ursula Braun
    Cc: Geert Uytterhoeven
    Cc: Viresh Kumar
    Cc: Harish Patil
    Cc: Stephen Boyd
    Cc: Guenter Roeck
    Cc: Manish Chopra
    Cc: Len Brown
    Cc: Arnd Bergmann
    Cc: linux-pm@vger.kernel.org
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Julian Wiedmann
    Cc: John Stultz
    Cc: Mark Gross
    Cc: linux-watchdog@vger.kernel.org
    Cc: linux-scsi@vger.kernel.org
    Cc: "Martin K. Petersen"
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: Oleg Nesterov
    Cc: Ralf Baechle
    Cc: Stefan Richter
    Cc: Michael Reed
    Cc: netdev@vger.kernel.org
    Cc: Tejun Heo
    Cc: Andrew Morton
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: Sudip Mukherjee
    Link: https://lkml.kernel.org/r/1507159627-127660-14-git-send-email-keescook@chromium.org

    Kees Cook
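
    A hedged illustration of the from_timer() pattern that this conversion
    applies inside the workqueue code, shown here with a hypothetical
    structure rather than the actual delayed_work internals:

    struct poll_state {
            struct timer_list timer;
            int missed;
    };

    static void poll_timeout(struct timer_list *t)
    {
            /* Recover the containing structure from the timer pointer. */
            struct poll_state *ps = from_timer(ps, t, timer);

            ps->missed++;
    }

    static void poll_init(struct poll_state *ps)
    {
            timer_setup(&ps->timer, poll_timeout, 0);
            mod_timer(&ps->timer, jiffies + HZ);
    }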
     
  • The expires field is normally initialized during the first mod_timer()
    call. It was unused by all callers, so remove it from the macro.

    Signed-off-by: Kees Cook
    Cc: linux-mips@linux-mips.org
    Cc: Petr Mladek
    Cc: Benjamin Herrenschmidt
    Cc: Lai Jiangshan
    Cc: Sebastian Reichel
    Cc: Kalle Valo
    Cc: Paul Mackerras
    Cc: Pavel Machek
    Cc: linux1394-devel@lists.sourceforge.net
    Cc: Chris Metcalf
    Cc: linux-s390@vger.kernel.org
    Cc: linux-wireless@vger.kernel.org
    Cc: "James E.J. Bottomley"
    Cc: Wim Van Sebroeck
    Cc: Michael Ellerman
    Cc: Ursula Braun
    Cc: Geert Uytterhoeven
    Cc: Viresh Kumar
    Cc: Harish Patil
    Cc: Stephen Boyd
    Cc: Michael Reed
    Cc: Manish Chopra
    Cc: Len Brown
    Cc: Arnd Bergmann
    Cc: linux-pm@vger.kernel.org
    Cc: Heiko Carstens
    Cc: Tejun Heo
    Cc: Julian Wiedmann
    Cc: John Stultz
    Cc: Mark Gross
    Cc: linux-watchdog@vger.kernel.org
    Cc: linux-scsi@vger.kernel.org
    Cc: "Martin K. Petersen"
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: Oleg Nesterov
    Cc: Ralf Baechle
    Cc: Stefan Richter
    Cc: Guenter Roeck
    Cc: netdev@vger.kernel.org
    Cc: Martin Schwidefsky
    Cc: Andrew Morton
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: Sudip Mukherjee
    Link: https://lkml.kernel.org/r/1507159627-127660-12-git-send-email-keescook@chromium.org
    Signed-off-by: Thomas Gleixner

    Kees Cook
     

05 Sep, 2017

1 commit

  • Commit 0a94efb5acbb ("workqueue: implicit ordered attribute should be
    overridable") introduced a __WQ_ORDERED_EXPLICIT flag but gave it the
    same value as __WQ_LEGACY. I don't believe these were intended to
    mean the same thing, so renumber __WQ_ORDERED_EXPLICIT.

    Fixes: 0a94efb5acbb ("workqueue: implicit ordered attribute should be ...")
    Signed-off-by: Ben Hutchings
    Cc: stable@vger.kernel.org # v4.13
    Signed-off-by: Tejun Heo

    Ben Hutchings
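
    A hedged sketch of the flag collision being fixed; the bit positions shown
    are illustrative, not copied from workqueue.h:

    enum {
            __WQ_DRAINING           = 1 << 16,      /* internal: workqueue is draining */
            __WQ_ORDERED            = 1 << 17,      /* internal: workqueue is ordered */
            __WQ_LEGACY             = 1 << 18,      /* internal: create*_workqueue() */
            /*
             * Before this fix, __WQ_ORDERED_EXPLICIT shared the value of
             * __WQ_LEGACY; it now gets a bit of its own.
             */
            __WQ_ORDERED_EXPLICIT   = 1 << 19,      /* internal: alloc_ordered_workqueue() */
    };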
     

26 Jul, 2017

1 commit

  • 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
    ordered") automatically enabled ordered attribute for unbound
    workqueues w/ max_active == 1. Because ordered workqueues reject
    max_active and some attribute changes, this implicit ordered mode
    broke cases where the user creates an unbound workqueue w/ max_active
    == 1 and later explicitly changes the related attributes.

    This patch distinguishes explicit from implicit ordered settings and lets
    attribute changes override the ordered attribute when it was set
    implicitly.

    Signed-off-by: Tejun Heo
    Fixes: 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")

    Tejun Heo
     

15 Apr, 2017

1 commit

  • work_on_cpu() is not protected against CPU hotplug. For code which must
    either execute on an online CPU or fail if the CPU is not available, the
    call site would have to protect against CPU hotplug itself.

    Provide a function which does get/put_online_cpus() around the call to
    work_on_cpu() and fails the call with -ENODEV if the target CPU is not
    online.

    Preparatory patch to convert several racy task affinity manipulations.

    Signed-off-by: Thomas Gleixner
    Acked-by: Tejun Heo
    Cc: Fenghua Yu
    Cc: Tony Luck
    Cc: Herbert Xu
    Cc: "Rafael J. Wysocki"
    Cc: Peter Zijlstra
    Cc: Benjamin Herrenschmidt
    Cc: Sebastian Siewior
    Cc: Lai Jiangshan
    Cc: Viresh Kumar
    Cc: Michael Ellerman
    Cc: "David S. Miller"
    Cc: Len Brown
    Link: http://lkml.kernel.org/r/20170412201042.262610721@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
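
    A hedged usage sketch for the new helper; the callback and its use are
    hypothetical:

    static long read_cpu_state(void *arg)
    {
            /* Runs in a workqueue worker bound to the requested CPU. */
            return 0;
    }

    static int query_cpu(int cpu)
    {
            long ret;

            /*
             * Unlike bare work_on_cpu(), this holds get_online_cpus() around
             * the call and returns -ENODEV if the target CPU is not online.
             */
            ret = work_on_cpu_safe(cpu, read_cpu_state, NULL);
            return ret < 0 ? (int)ret : 0;
    }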
     

03 Feb, 2017

1 commit

  • Building with clang shows lots of warnings like:

    drivers/amba/bus.c:447:8: warning: implicit conversion from 'long long' to 'int' changes value from 4294967248 to -48
    [-Wconstant-conversion]
    static DECLARE_DELAYED_WORK(deferred_retry_work, amba_deferred_retry_func);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    include/linux/workqueue.h:187:26: note: expanded from macro 'DECLARE_DELAYED_WORK'
    struct delayed_work n = __DELAYED_WORK_INITIALIZER(n, f, 0)
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    include/linux/workqueue.h:177:10: note: expanded from macro '__DELAYED_WORK_INITIALIZER'
    .work = __WORK_INITIALIZER((n).work, (f)), \
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    include/linux/workqueue.h:170:10: note: expanded from macro '__WORK_INITIALIZER'
    .data = WORK_DATA_STATIC_INIT(), \
    ^~~~~~~~~~~~~~~~~~~~~~~
    include/linux/workqueue.h:111:39: note: expanded from macro 'WORK_DATA_STATIC_INIT'
    ATOMIC_LONG_INIT(WORK_STRUCT_NO_POOL | WORK_STRUCT_STATIC)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
    include/asm-generic/atomic-long.h:32:41: note: expanded from macro 'ATOMIC_LONG_INIT'
    #define ATOMIC_LONG_INIT(i) ATOMIC_INIT(i)
    ~~~~~~~~~~~~^~
    arch/arm/include/asm/atomic.h:21:27: note: expanded from macro 'ATOMIC_INIT'
    #define ATOMIC_INIT(i) { (i) }
    ~ ^

    This makes the type cast explicit, which shuts up the warning.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Tejun Heo

    Arnd Bergmann
     

14 Dec, 2016

1 commit

  • Pull workqueue updates from Tejun Heo:
    "Mostly patches to initialize workqueue subsystem earlier and get rid
    of keventd_up().

    The patches were headed for the last merge cycle but got delayed due
    to a bug found at the last minute, which is fixed now.

    Also, to help debugging, destroy_workqueue() is more chatty now on a
    sanity check failure."

    * 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: move wq_numa_init() to workqueue_init()
    workqueue: remove keventd_up()
    debugobj, workqueue: remove keventd_up() usage
    slab, workqueue: remove keventd_up() usage
    power, workqueue: remove keventd_up() usage
    tty, workqueue: remove keventd_up() usage
    mce, workqueue: remove keventd_up() usage
    workqueue: make workqueue available early during boot
    workqueue: dump workqueue state on sanity check failures in destroy_workqueue()

    Linus Torvalds
     

18 Sep, 2016

2 commits

  • keventd_up() no longer has in-kernel users. Remove it and make
    wq_online static.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Workqueue is currently initialized in an early init call; however,
    there are cases where early boot code has to be split and reordered to
    come after workqueue initialization, or the same code path which makes
    use of workqueues is used both before and after workqueue
    initialization. The latter cases have to gate workqueue usage with
    keventd_up() tests, which is nasty and easy to get wrong.

    Workqueue usage has become widespread and it'd be a lot more
    convenient if it could be used very early during boot. This patch splits
    workqueue initialization into two steps. workqueue_init_early() which
    sets up the basic data structures so that workqueues can be created
    and work items queued, and workqueue_init() which actually brings up
    workqueues online and starts executing queued work items. The former
    step can be done very early during boot once memory allocation,
    cpumasks and idr are initialized. The latter right after kthreads
    become available.

    This allows work item queueing and canceling from very early boot
    which is what most of these use cases want.

    * As system_wq being initialized doesn't indicate that workqueue is
    fully online anymore, update keventd_up() to test wq_online instead.
    The follow-up patches will get rid of all its usages and the
    function itself.

    * Flushing doesn't make sense before workqueue is fully initialized.
    The flush functions trigger WARN and return immediately before fully
    online.

    * Work items are never in-flight before fully online. Canceling can
    always succeed by skipping the flush step.

    * Some code paths can no longer assume to be called with irq enabled
    as irq is disabled during early boot. Use irqsave/restore
    operations instead.

    v2: Watchdog init, which requires timer to be running, moved from
    workqueue_init_early() to workqueue_init().

    Signed-off-by: Tejun Heo
    Suggested-by: Linus Torvalds
    Link: http://lkml.kernel.org/r/CA+55aFx0vPuMuxn00rBSM192n-Du5uxy+4AvKa0SBSOVJeuCGg@mail.gmail.com

    Tejun Heo
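
    A hedged sketch of what the two-phase bring-up allows: early boot code
    may queue work before workers exist, and the items simply run once
    workqueue_init() has started them. The work item and hook are
    hypothetical:

    static void deferred_setup_fn(struct work_struct *work)
    {
            /* Executes only after workqueue_init() has brought workers online. */
    }
    static DECLARE_WORK(deferred_setup_work, deferred_setup_fn);

    /* Called from early boot, after workqueue_init_early() has run. */
    static void early_boot_hook(void)
    {
            /*
             * Queueing (and canceling) is already legal here; flushing before
             * the workqueue is fully online would WARN and return immediately.
             */
            schedule_work(&deferred_setup_work);
    }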
     

14 Jul, 2016

1 commit

  • Get rid of the prio ordering of the separate notifiers and use a proper state
    callback pair.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Anna-Maria Gleixner
    Reviewed-by: Sebastian Andrzej Siewior
    Acked-by: Tejun Heo
    Cc: Andrew Morton
    Cc: Lai Jiangshan
    Cc: Linus Torvalds
    Cc: Nicolas Iooss
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rasmus Villemoes
    Cc: Rusty Russell
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160713153335.197083890@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

30 Jan, 2016

1 commit

  • fca839c00a12 ("workqueue: warn if memory reclaim tries to flush
    !WQ_MEM_RECLAIM workqueue") implemented flush dependency warning which
    triggers if a PF_MEMALLOC task or WQ_MEM_RECLAIM workqueue tries to
    flush a !WQ_MEM_RECLAIM workquee.

    This assumes that workqueues marked with WQ_MEM_RECLAIM sit in the memory
    reclaim path, and that making such a workqueue depend on something which
    may need more memory to make forward progress can lead to deadlocks.
    Unfortunately, workqueues created with the legacy create*_workqueue()
    interface always have WQ_MEM_RECLAIM regardless of whether memory
    reclaim actually depends on them. These spurious WQ_MEM_RECLAIM markings
    cause spurious triggering of the flush dependency checks.

    WARNING: CPU: 0 PID: 6 at kernel/workqueue.c:2361 check_flush_dependency+0x138/0x144()
    workqueue: WQ_MEM_RECLAIM deferwq:deferred_probe_work_func is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu
    ...
    Workqueue: deferwq deferred_probe_work_func
    [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
    [] (show_stack) from [] (dump_stack+0x94/0xd4)
    [] (dump_stack) from [] (warn_slowpath_common+0x80/0xb0)
    [] (warn_slowpath_common) from [] (warn_slowpath_fmt+0x30/0x40)
    [] (warn_slowpath_fmt) from [] (check_flush_dependency+0x138/0x144)
    [] (check_flush_dependency) from [] (flush_work+0x50/0x15c)
    [] (flush_work) from [] (lru_add_drain_all+0x130/0x180)
    [] (lru_add_drain_all) from [] (migrate_prep+0x8/0x10)
    [] (migrate_prep) from [] (alloc_contig_range+0xd8/0x338)
    [] (alloc_contig_range) from [] (cma_alloc+0xe0/0x1ac)
    [] (cma_alloc) from [] (__alloc_from_contiguous+0x38/0xd8)
    [] (__alloc_from_contiguous) from [] (__dma_alloc+0x240/0x278)
    [] (__dma_alloc) from [] (arm_dma_alloc+0x54/0x5c)
    [] (arm_dma_alloc) from [] (dmam_alloc_coherent+0xc0/0xec)
    [] (dmam_alloc_coherent) from [] (ahci_port_start+0x150/0x1dc)
    [] (ahci_port_start) from [] (ata_host_start.part.3+0xc8/0x1c8)
    [] (ata_host_start.part.3) from [] (ata_host_activate+0x50/0x148)
    [] (ata_host_activate) from [] (ahci_host_activate+0x44/0x114)
    [] (ahci_host_activate) from [] (ahci_platform_init_host+0x1d8/0x3c8)
    [] (ahci_platform_init_host) from [] (tegra_ahci_probe+0x448/0x4e8)
    [] (tegra_ahci_probe) from [] (platform_drv_probe+0x50/0xac)
    [] (platform_drv_probe) from [] (driver_probe_device+0x214/0x2c0)
    [] (driver_probe_device) from [] (bus_for_each_drv+0x60/0x94)
    [] (bus_for_each_drv) from [] (__device_attach+0xb0/0x114)
    [] (__device_attach) from [] (bus_probe_device+0x84/0x8c)
    [] (bus_probe_device) from [] (deferred_probe_work_func+0x68/0x98)
    [] (deferred_probe_work_func) from [] (process_one_work+0x120/0x3f8)
    [] (process_one_work) from [] (worker_thread+0x38/0x55c)
    [] (worker_thread) from [] (kthread+0xdc/0xf4)
    [] (kthread) from [] (ret_from_fork+0x14/0x3c)

    Fix it by marking workqueues created via create*_workqueue() with
    __WQ_LEGACY and disabling flush dependency checks on them.

    Signed-off-by: Tejun Heo
    Reported-and-tested-by: Thierry Reding
    Link: http://lkml.kernel.org/g/20160126173843.GA11115@ulmo.nvidia.com
    Fixes: fca839c00a12 ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue")

    Tejun Heo
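
    A hedged illustration of the distinction above; workqueue names are
    hypothetical:

    static struct workqueue_struct *legacy_wq, *modern_wq;

    static int __init example_init(void)
    {
            /* Legacy interface: always implies WQ_MEM_RECLAIM (and now __WQ_LEGACY). */
            legacy_wq = create_workqueue("legacy");

            /*
             * Preferred interface: pass WQ_MEM_RECLAIM only when the queue really
             * sits in the memory reclaim path, so the flush dependency check
             * stays meaningful.
             */
            modern_wq = alloc_workqueue("modern", WQ_UNBOUND, 0);

            return (legacy_wq && modern_wq) ? 0 : -ENOMEM;
    }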
     

09 Dec, 2015

1 commit

  • Workqueue stalls can happen from a variety of usage bugs such as a
    missing WQ_MEM_RECLAIM flag or a concurrency-managed work item staying
    RUNNING indefinitely. These stalls can be extremely difficult to hunt
    down because the usual warning mechanisms can't detect workqueue stalls
    and the internal state is pretty opaque.

    To alleviate the situation, this patch implements a workqueue lockup
    detector. It monitors all worker_pools periodically and, if any pool
    fails to make forward progress for longer than the threshold duration,
    triggers a warning and dumps the workqueue state as follows.

    BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s!
    Showing busy workqueues and worker pools:
    workqueue events: flags=0x0
    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=17/256
    pending: monkey_wrench_fn, e1000_watchdog, cache_reap, vmstat_shepherd, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, cgroup_release_agent
    workqueue events_power_efficient: flags=0x80
    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
    pending: check_lifetime, neigh_periodic_work
    workqueue cgroup_pidlist_destroy: flags=0x0
    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
    pending: cgroup_pidlist_destroy_work_fn
    ...

    The detection mechanism is controlled through the kernel parameter
    workqueue.watchdog_thresh and can be updated at runtime through the
    sysfs module parameter file.

    v2: Decoupled from softlockup control knobs.

    Signed-off-by: Tejun Heo
    Acked-by: Don Zickus
    Cc: Ulrich Obergfell
    Cc: Michal Hocko
    Cc: Chris Mason
    Cc: Andrew Morton

    Tejun Heo
     

18 Aug, 2015

1 commit

  • There are some errors in the docbook comments in workqueue.h that cause
    warnings when the docs are built; this only recently came to light because
    these comments were not used until now. Fix the comments to make the
    warnings go away.

    The "args..." "fix" is a hack. kerneldoc doesn't deal properly with named
    variadic arguments in macros, so all I've really achieved here is to make
    it shut up. Fixing kerneldoc will have to wait for more time.

    Signed-off-by: Jonathan Corbet
    Signed-off-by: Tejun Heo

    Jonathan Corbet
     

30 Apr, 2015

1 commit

  • Allow modifying the low-level unbound workqueue cpumask through
    sysfs. This is performed by traversing the entire workqueue list
    and calling apply_wqattrs_prepare() on the unbound workqueues
    with the new low-level mask. Only after all the preparations are done
    do we commit them all together.

    Ordered workqueues are excluded from the low-level unbound workqueue
    cpumask; they will be handled in the near future.

    All the (default & per-node) pwqs are mandatorily controlled by
    the low level cpumask. If the user configured cpumask doesn't overlap
    with the low level cpumask, the low level cpumask will be used for the
    wq instead.

    The comment of wq_calc_node_cpumask() is updated and explicitly
    requires that its first argument should be the attrs of the default
    pwq.

    The default wq_unbound_cpumask is cpu_possible_mask. The workqueue
    subsystem doesn't know its best default value, let the system manager
    or the other subsystem set it when needed.

    Changes from v8:
    merged the code calculating the attrs of the default pwq.
    minor changes to the code & comments for saving the user-configured attrs.
    removed an unnecessary list_del().
    minor update to the comment of wq_calc_node_cpumask().
    updated the comment of workqueue_set_unbound_cpumask().

    Cc: Christoph Lameter
    Cc: Kevin Hilman
    Cc: Lai Jiangshan
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Tejun Heo
    Cc: Viresh Kumar
    Cc: Frederic Weisbecker
    Original-patch-by: Frederic Weisbecker
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     

09 Mar, 2015

1 commit

  • Workqueues are used extensively throughout the kernel but sometimes
    it's difficult to debug stalls involving work items because visibility
    into their inner workings is fairly limited. Although sysrq-t task dump
    annotates each active worker task with the information on the work
    item being executed, it is challenging to find out which work items
    are pending or delayed on which queues and how pools are being
    managed.

    This patch implements show_workqueue_state() which dumps all busy
    workqueues and pools and is called from the sysrq-t handler. At the
    end of sysrq-t dump, something like the following is printed.

    Showing busy workqueues and worker pools:
    ...
    workqueue filler_wq: flags=0x0
    pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
    in-flight: 491:filler_workfn, 507:filler_workfn
    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
    in-flight: 501:filler_workfn
    pending: filler_workfn
    ...
    workqueue test_wq: flags=0x8
    pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
    in-flight: 510(RESCUER):test_workfn BAR(69) BAR(500)
    delayed: test_workfn1 BAR(492), test_workfn2
    ...
    pool 0: cpus=0 node=0 flags=0x0 nice=0 workers=2 manager: 137
    pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=3 manager: 469
    pool 3: cpus=1 node=0 flags=0x0 nice=-20 workers=2 idle: 16
    pool 8: cpus=0-3 flags=0x4 nice=0 workers=2 manager: 62

    The above shows that test_wq is executing test_workfn() on pid 510
    which is the rescuer and also that there are two tasks 69 and 500
    waiting for the work item to finish in flush_work(). As test_wq has
    max_active of 1, there are two work items for test_workfn1() and
    test_workfn2() which are delayed till the current work item is
    finished. In addition, pid 492 is flushing test_workfn1().

    The work item for test_workfn() is being executed on pwq of pool 2
    which is the normal priority per-cpu pool for CPU 1. The pool has
    three workers, two of which are executing filler_workfn() for
    filler_wq and the last one is assuming the manager role trying to
    create more workers.

    This extra workqueue state dump will hopefully help chasing down hangs
    involving workqueues.

    v3: cpulist_pr_cont() replaced with "%*pbl" printf formatting.

    v2: As suggested by Andrew, minor formatting change in pr_cont_work(),
    printk()'s replaced with pr_info()'s, and cpumask printing now
    uses cpulist_pr_cont().

    Signed-off-by: Tejun Heo
    Cc: Lai Jiangshan
    Cc: Linus Torvalds
    Cc: Andrew Morton
    CC: Ingo Molnar

    Tejun Heo