14 Jul, 2016

1 commit

  • Get rid of the prio ordering of the separate notifiers and use a proper state
    callback pair.
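
    For context, a minimal sketch of the state-callback pattern this
    conversion targets, using the generic cpuhp_setup_state() API with
    hypothetical callback names (the real workqueue callbacks live in
    kernel/workqueue.c):

    #include <linux/cpuhotplug.h>

    static int example_cpu_online(unsigned int cpu)
    {
            /* bring per-cpu state up for @cpu */
            return 0;
    }

    static int example_cpu_offline(unsigned int cpu)
    {
            /* tear per-cpu state down for @cpu */
            return 0;
    }

    static int __init example_init(void)
    {
            /* one registration replaces a pair of prio-ordered notifiers */
            return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "example:online",
                                     example_cpu_online, example_cpu_offline);
    }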

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Anna-Maria Gleixner
    Reviewed-by: Sebastian Andrzej Siewior
    Acked-by: Tejun Heo
    Cc: Andrew Morton
    Cc: Lai Jiangshan
    Cc: Linus Torvalds
    Cc: Nicolas Iooss
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rasmus Villemoes
    Cc: Rusty Russell
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160713153335.197083890@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

30 Jan, 2016

1 commit

  • fca839c00a12 ("workqueue: warn if memory reclaim tries to flush
    !WQ_MEM_RECLAIM workqueue") implemented a flush dependency warning which
    triggers if a PF_MEMALLOC task or a WQ_MEM_RECLAIM workqueue tries to
    flush a !WQ_MEM_RECLAIM workqueue.

    This assumes that workqueues marked with WQ_MEM_RECLAIM sit in the
    memory reclaim path, where making them depend on something which may
    need more memory to make forward progress can lead to deadlocks.
    Unfortunately, workqueues created with the legacy create*_workqueue()
    interface always have WQ_MEM_RECLAIM regardless of whether they are
    depended upon for memory reclaim or not. These spurious WQ_MEM_RECLAIM
    markings cause spurious triggering of the flush dependency checks.

    WARNING: CPU: 0 PID: 6 at kernel/workqueue.c:2361 check_flush_dependency+0x138/0x144()
    workqueue: WQ_MEM_RECLAIM deferwq:deferred_probe_work_func is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu
    ...
    Workqueue: deferwq deferred_probe_work_func
    [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
    [] (show_stack) from [] (dump_stack+0x94/0xd4)
    [] (dump_stack) from [] (warn_slowpath_common+0x80/0xb0)
    [] (warn_slowpath_common) from [] (warn_slowpath_fmt+0x30/0x40)
    [] (warn_slowpath_fmt) from [] (check_flush_dependency+0x138/0x144)
    [] (check_flush_dependency) from [] (flush_work+0x50/0x15c)
    [] (flush_work) from [] (lru_add_drain_all+0x130/0x180)
    [] (lru_add_drain_all) from [] (migrate_prep+0x8/0x10)
    [] (migrate_prep) from [] (alloc_contig_range+0xd8/0x338)
    [] (alloc_contig_range) from [] (cma_alloc+0xe0/0x1ac)
    [] (cma_alloc) from [] (__alloc_from_contiguous+0x38/0xd8)
    [] (__alloc_from_contiguous) from [] (__dma_alloc+0x240/0x278)
    [] (__dma_alloc) from [] (arm_dma_alloc+0x54/0x5c)
    [] (arm_dma_alloc) from [] (dmam_alloc_coherent+0xc0/0xec)
    [] (dmam_alloc_coherent) from [] (ahci_port_start+0x150/0x1dc)
    [] (ahci_port_start) from [] (ata_host_start.part.3+0xc8/0x1c8)
    [] (ata_host_start.part.3) from [] (ata_host_activate+0x50/0x148)
    [] (ata_host_activate) from [] (ahci_host_activate+0x44/0x114)
    [] (ahci_host_activate) from [] (ahci_platform_init_host+0x1d8/0x3c8)
    [] (ahci_platform_init_host) from [] (tegra_ahci_probe+0x448/0x4e8)
    [] (tegra_ahci_probe) from [] (platform_drv_probe+0x50/0xac)
    [] (platform_drv_probe) from [] (driver_probe_device+0x214/0x2c0)
    [] (driver_probe_device) from [] (bus_for_each_drv+0x60/0x94)
    [] (bus_for_each_drv) from [] (__device_attach+0xb0/0x114)
    [] (__device_attach) from [] (bus_probe_device+0x84/0x8c)
    [] (bus_probe_device) from [] (deferred_probe_work_func+0x68/0x98)
    [] (deferred_probe_work_func) from [] (process_one_work+0x120/0x3f8)
    [] (process_one_work) from [] (worker_thread+0x38/0x55c)
    [] (worker_thread) from [] (kthread+0xdc/0xf4)
    [] (kthread) from [] (ret_from_fork+0x14/0x3c)

    Fix it by marking workqueues created via create*_workqueue() with
    __WQ_LEGACY and disabling flush dependency checks on them.
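
    Condensed from the patch (include/linux/workqueue.h), the legacy
    wrappers now carry the new flag, which check_flush_dependency() treats
    as an exemption:

    #define create_workqueue(name)                                       \
            alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
    #define create_freezable_workqueue(name)                             \
            alloc_workqueue("%s", __WQ_LEGACY | WQ_FREEZABLE |           \
                            WQ_UNBOUND | WQ_MEM_RECLAIM, 1, (name))
    #define create_singlethread_workqueue(name)                          \
            alloc_ordered_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, name)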

    Signed-off-by: Tejun Heo
    Reported-and-tested-by: Thierry Reding
    Link: http://lkml.kernel.org/g/20160126173843.GA11115@ulmo.nvidia.com
    Fixes: fca839c00a12 ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue")

    Tejun Heo
     

09 Dec, 2015

1 commit

  • Workqueue stalls can happen from a variety of usage bugs such as a
    missing WQ_MEM_RECLAIM flag or a concurrency-managed work item staying
    RUNNING indefinitely. These stalls can be extremely difficult to hunt
    down because the usual warning mechanisms can't detect workqueue
    stalls and the internal state is pretty opaque.

    To alleviate the situation, this patch implements a workqueue lockup
    detector. It periodically monitors all worker_pools and, if any pool
    fails to make forward progress for longer than the threshold duration,
    triggers a warning and dumps the workqueue state as follows.

    BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s!
    Showing busy workqueues and worker pools:
    workqueue events: flags=0x0
    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=17/256
    pending: monkey_wrench_fn, e1000_watchdog, cache_reap, vmstat_shepherd, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, cgroup_release_agent
    workqueue events_power_efficient: flags=0x80
    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
    pending: check_lifetime, neigh_periodic_work
    workqueue cgroup_pidlist_destroy: flags=0x0
    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
    pending: cgroup_pidlist_destroy_work_fn
    ...

    The detection mechanism is controlled through the kernel parameter
    workqueue.watchdog_thresh and can be updated at runtime through the
    sysfs module parameter file.
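
    A heavily simplified sketch of the detection loop, assuming the
    modern timer API and simplified pool fields (the real code is
    wq_watchdog_timer_fn() in kernel/workqueue.c):

    static void wq_watchdog_timer_fn(struct timer_list *unused)
    {
            unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
            struct worker_pool *pool;
            int pi;

            rcu_read_lock();
            for_each_pool(pool, pi) {
                    /* pools touch their timestamp when they make progress */
                    unsigned long ts = READ_ONCE(pool->watchdog_ts);

                    if (time_after(jiffies, ts + thresh)) {
                            pr_emerg("BUG: workqueue lockup - pool stuck for %us!\n",
                                     jiffies_to_msecs(jiffies - ts) / 1000);
                            show_workqueue_state();
                    }
            }
            rcu_read_unlock();
    }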

    v2: Decoupled from softlockup control knobs.

    Signed-off-by: Tejun Heo
    Acked-by: Don Zickus
    Cc: Ulrich Obergfell
    Cc: Michal Hocko
    Cc: Chris Mason
    Cc: Andrew Morton

    Tejun Heo
     

18 Aug, 2015

1 commit

  • There are some errors in the docbook comments in workqueue.h that cause
    warnings when the docs are built; this only recently came to light because
    these comments were not used until now. Fix the comments to make the
    warnings go away.

    The "args..." "fix" is a hack. kerneldoc doesn't deal properly with named
    variadic arguments in macros, so all I've really achieved here is to make
    it shut up. Fixing kerneldoc will have to wait for more time.
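
    For illustration, the workaround leaves comments shaped like this
    (the "@args...:" line is the hack in question):

    /**
     * alloc_workqueue - allocate a workqueue
     * @fmt: printf format for the name of the workqueue
     * @flags: WQ_* flags
     * @max_active: max in-flight work items, 0 for default
     * @args...: args for @fmt
     */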

    Signed-off-by: Jonathan Corbet
    Signed-off-by: Tejun Heo

    Jonathan Corbet
     

30 Apr, 2015

1 commit

  • Allow modifying the low-level unbound workqueues' cpumask through
    sysfs. This is performed by traversing the entire workqueue list
    and calling apply_wqattrs_prepare() on the unbound workqueues
    with the new low-level mask. Only after all the preparations are
    done are they all committed together.

    Ordered workqueues are ignored by the low-level unbound workqueue
    cpumask for now; they will be handled in the near future.

    All the (default & per-node) pwqs are mandatorily controlled by
    the low-level cpumask. If the user-configured cpumask doesn't overlap
    with the low-level cpumask, the low-level cpumask is used for the
    wq instead.

    The comment of wq_calc_node_cpumask() is updated and explicitly
    requires that its first argument should be the attrs of the default
    pwq.

    The default wq_unbound_cpumask is cpu_possible_mask. The workqueue
    subsystem doesn't know its best default value, let the system manager
    or the other subsystem set it when needed.
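
    A condensed sketch of the prepare/commit pattern described above,
    with error handling simplified (the real code is
    workqueue_set_unbound_cpumask() and its apply helpers):

    static int apply_unbound_cpumask(cpumask_var_t cpumask)
    {
            struct apply_wqattrs_ctx *ctx, *n;
            struct workqueue_struct *wq;
            LIST_HEAD(ctxs);
            int ret = 0;

            list_for_each_entry(wq, &workqueues, list) {
                    /* ordered workqueues are skipped for now */
                    if (!(wq->flags & WQ_UNBOUND) || (wq->flags & __WQ_ORDERED))
                            continue;

                    ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs);
                    if (!ctx) {
                            ret = -ENOMEM;
                            break;
                    }
                    list_add_tail(&ctx->list, &ctxs);
            }

            /* only after all preparations succeed, commit them together */
            list_for_each_entry_safe(ctx, n, &ctxs, list) {
                    if (!ret)
                            apply_wqattrs_commit(ctx);
                    apply_wqattrs_cleanup(ctx);
            }
            return ret;
    }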

    Changes from V8:
    merged the code calculating the attrs of the default pwq
    minor changes to the code & comments for saving the user-configured attrs
    removed an unnecessary list_del()
    minor update to the comment of wq_calc_node_cpumask()
    updated the comment of workqueue_set_unbound_cpumask()

    Cc: Christoph Lameter
    Cc: Kevin Hilman
    Cc: Lai Jiangshan
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Tejun Heo
    Cc: Viresh Kumar
    Cc: Frederic Weisbecker
    Original-patch-by: Frederic Weisbecker
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     

09 Mar, 2015

1 commit

  • Workqueues are used extensively throughout the kernel but sometimes
    it's difficult to debug stalls involving work items because visibility
    into their inner workings is fairly limited. Although the sysrq-t task
    dump annotates each active worker task with information on the work
    item being executed, it is challenging to find out which work items
    are pending or delayed on which queues and how pools are being
    managed.

    This patch implements show_workqueue_state() which dumps all busy
    workqueues and pools and is called from the sysrq-t handler. At the
    end of sysrq-t dump, something like the following is printed.

    Showing busy workqueues and worker pools:
    ...
    workqueue filler_wq: flags=0x0
    pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
    in-flight: 491:filler_workfn, 507:filler_workfn
    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
    in-flight: 501:filler_workfn
    pending: filler_workfn
    ...
    workqueue test_wq: flags=0x8
    pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
    in-flight: 510(RESCUER):test_workfn BAR(69) BAR(500)
    delayed: test_workfn1 BAR(492), test_workfn2
    ...
    pool 0: cpus=0 node=0 flags=0x0 nice=0 workers=2 manager: 137
    pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=3 manager: 469
    pool 3: cpus=1 node=0 flags=0x0 nice=-20 workers=2 idle: 16
    pool 8: cpus=0-3 flags=0x4 nice=0 workers=2 manager: 62

    The above shows that test_wq is executing test_workfn() on pid 510
    which is the rescuer and also that there are two tasks 69 and 500
    waiting for the work item to finish in flush_work(). As test_wq has
    max_active of 1, there are two work items for test_workfn1() and
    test_workfn2() which are delayed till the current work item is
    finished. In addition, pid 492 is flushing test_workfn1().

    The work item for test_workfn() is being executed on pwq of pool 2
    which is the normal priority per-cpu pool for CPU 1. The pool has
    three workers, two of which are executing filler_workfn() for
    filler_wq and the last one is assuming the manager role trying to
    create more workers.

    This extra workqueue state dump will hopefully help chasing down hangs
    involving workqueues.

    v3: cpulist_pr_cont() replaced with "%*pbl" printf formatting.

    v2: As suggested by Andrew, minor formatting change in pr_cont_work(),
    printk()'s replaced with pr_info()'s, and cpumask printing now
    uses cpulist_pr_cont().

    Signed-off-by: Tejun Heo
    Cc: Lai Jiangshan
    Cc: Linus Torvalds
    Cc: Andrew Morton
    CC: Ingo Molnar

    Tejun Heo
     

05 Mar, 2015

1 commit

  • cancel[_delayed]_work_sync() are implemented using
    __cancel_work_timer() which grabs the PENDING bit using
    try_to_grab_pending() and then flushes the work item with PENDING set
    to prevent the on-going execution of the work item from requeueing
    itself.

    try_to_grab_pending() can always grab the PENDING bit without blocking
    except when someone else is doing the above flushing during
    cancelation. In that case, try_to_grab_pending() returns -ENOENT and
    __cancel_work_timer() currently invokes flush_work(). The
    assumption is that the completion of the work item is what the other
    canceling task would be waiting for too, and thus waiting for the same
    condition and retrying should allow forward progress without excessive
    busy looping.

    Unfortunately, this doesn't work if preemption is disabled or the
    latter task has real time priority. Let's say task A just got woken
    up from flush_work() by the completion of the target work item. If,
    before task A starts executing, task B gets scheduled and invokes
    __cancel_work_timer() on the same work item, its try_to_grab_pending()
    will return -ENOENT as the work item is still being canceled by task A
    and flush_work() will also immediately return false as the work item
    is no longer executing. This puts task B in a busy loop possibly
    preventing task A from executing and clearing the canceling state on
    the work item leading to a hang.

    task A                  task B                  worker

                                                    executing work
    __cancel_work_timer()
      try_to_grab_pending()
      set work CANCELING
      flush_work()
        block for work completion
                                                    completion, wakes up A
                            __cancel_work_timer()
                            while (forever) {
                              try_to_grab_pending()
                                -ENOENT as work is being canceled
                              flush_work()
                                false as work is no longer executing
                            }

    This patch removes the possible hang by updating __cancel_work_timer()
    to explicitly wait for clearing of CANCELING rather than invoking
    flush_work() after try_to_grab_pending() fails with -ENOENT.
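
    Condensed from the patch, the -ENOENT branch now sleeps on a dedicated
    waitqueue until CANCELING clears (the custom wake function and most
    error paths are elided in this sketch):

    static bool __cancel_work_timer(struct work_struct *work, bool is_dwork)
    {
            unsigned long flags;
            int ret;

            do {
                    ret = try_to_grab_pending(work, is_dwork, &flags);
                    /*
                     * If someone else is canceling, sleep until the
                     * CANCELING bit clears instead of busy-looping on
                     * flush_work(), which may return false right away.
                     */
                    if (unlikely(ret == -ENOENT)) {
                            struct cwt_wait cwait;

                            init_wait(&cwait.wait);
                            cwait.wait.func = cwt_wakefn;
                            cwait.work = work;

                            prepare_to_wait_exclusive(&cancel_waitq,
                                                      &cwait.wait,
                                                      TASK_UNINTERRUPTIBLE);
                            if (work_is_canceling(work))
                                    schedule();
                            finish_wait(&cancel_waitq, &cwait.wait);
                    }
            } while (unlikely(ret < 0));

            /* ... flush work, clear CANCELING, wake other cancel waiters ... */
            return ret;
    }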

    Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com

    v3: bit_waitqueue() can't be used for work items defined in vmalloc
    area. Switched to custom wake function which matches the target
    work item and exclusive wait and wakeup.

    v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
    the target bit waitqueue has wait_bit_queue's on it. Use
    DEFINE_WAIT_BIT() and __wake_up_bit() instead. Reported by Tomeu
    Vizoso.

    Signed-off-by: Tejun Heo
    Reported-by: Rabin Vincent
    Cc: Tomeu Vizoso
    Cc: stable@vger.kernel.org
    Tested-by: Jesper Nilsson
    Tested-by: Rabin Vincent

    Tejun Heo
     

13 Sep, 2014

1 commit

  • create_singlethread_workqueue() is a compat interface for single
    threaded workqueues which maps to an ordered workqueue w/ rescuer in
    the current implementation. create_singlethread_workqueue() is
    currently implemented by invoking alloc_workqueue() w/ the appropriate
    parameters.

    8719dceae2f9 ("workqueue: reject adjusting max_active or applying
    attrs to ordered workqueues") introduced __WQ_ORDERED to protect
    ordered workqueues against dynamic attribute changes which can break
    ordering guarantees but forgot to apply it to
    create_singlethread_workqueue(). This in itself is okay as nobody
    currently uses dynamic attribute change on workqueues created with
    create_singlethread_workqueue().

    However, 4c16bd327c ("workqueue: implement NUMA affinity for unbound
    workqueues") broke the single-threaded guarantee for ordered
    workqueues by allocating a separate pool_workqueue on each NUMA node
    by default. A later change, 8a2b75384444 ("workqueue: fix ordered
    workqueues in NUMA setups"), fixed it by allocating only one global
    pool_workqueue if __WQ_ORDERED is set.

    Combined, the __WQ_ORDERED omission in create_singlethread_workqueue()
    became critical, breaking its single-threadedness and ordering
    guarantee.

    Let's make create_singlethread_workqueue() wrap
    alloc_ordered_workqueue() instead so that it inherits __WQ_ORDERED and
    can implicitly track future ordered_workqueue changes.
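
    The resulting wrapper, condensed from the patch (the __WQ_LEGACY flag
    shown in the Jan 2016 entry earlier in this log came later):

    #define create_singlethread_workqueue(name)                          \
            alloc_ordered_workqueue("%s", WQ_MEM_RECLAIM, name)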

    v2: I missed that __WQ_ORDERED now protects against pwq splitting
    across NUMA nodes and incorrectly described the patch as a
    nice-to-have fix to protect against future dynamic attribute
    usages. Oleg pointed out that this is actually a critical
    breakage due to 8a2b75384444 ("workqueue: fix ordered workqueues
    in NUMA setups").

    Signed-off-by: Tejun Heo
    Reported-by: Mike Anderson
    Cc: Oleg Nesterov
    Cc: Gustavo Luiz Duarte
    Cc: Tomas Henzl
    Cc: stable@vger.kernel.org
    Fixes: 4c16bd327c ("workqueue: implement NUMA affinity for unbound workqueues")

    Tejun Heo
     

22 May, 2014

3 commits

  • In 8930caba3dbd ("workqueue: disable irq while manipulating PENDING"),
    setting the last CPU and clearing PENDING were merged into a single
    operation (set_work_cpu_and_clear_pending()), with the result that the
    internal routine work_clear_pending() is no longer used.

    tj: Minor description tweak.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • WORK_CPU_END is totally unused since 4e8b22bd1a37 ("workqueue: fix
    pool ID allocation leakage and remove BUILD_BUG_ON() in
    init_workqueues"). It should be removed.

    After it is removed, the comment "special cpu IDs" is no longer
    precise because only one special CPU ID (WORK_CPU_UNBOUND) is left, so
    we also change this comment to a description of WORK_CPU_UNBOUND.

    tj: Minor description and comment tweaks.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • system_highpri_wq is exported to modules via EXPORT_SYMBOL_GPL(),
    but its declaration in workqueue.h was forgotten. Add the declaration
    and a short description for it.
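
    The added declaration, as a sketch:

    /* high-priority system-wide workqueue, served by dedicated pools */
    extern struct workqueue_struct *system_highpri_wq;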

    tj: Minor comment tweak.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     

02 Apr, 2014

1 commit

  • Pull timer changes from Thomas Gleixner:
    "This assorted collection provides:

    - A new timer based timer broadcast feature for systems which do not
    provide a global accessible timer device. That allows those
    systems to put CPUs into deep idle states where the per cpu timer
    device stops.

    - A few NOHZ_FULL related improvements to the timer wheel

    - The usual updates to timer devices found in ARM SoCs

    - Small improvements and updates all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
    tick: Remove code duplication in tick_handle_periodic()
    tick: Fix spelling mistake in tick_handle_periodic()
    x86: hpet: Use proper destructor for delayed work
    workqueue: Provide destroy_delayed_work_on_stack()
    clocksource: CMT, MTU2, TMU and STI should depend on GENERIC_CLOCKEVENTS
    timer: Remove code redundancy while calling get_nohz_timer_target()
    hrtimer: Rearrange comments in the order struct members are declared
    timer: Use variable head instead of &work_list in __run_timers()
    clocksource: exynos_mct: silence a static checker warning
    arm: zynq: Add support for cpufreq
    arm: zynq: Don't use arm_global_timer with cpufreq
    clocksource/cadence_ttc: Overhaul clocksource frequency adjustment
    clocksource/cadence_ttc: Call clockevents_update_freq() with IRQs enabled
    clocksource: Add Kconfig entries for CMT, MTU2, TMU and STI
    sh: Remove Kconfig entries for TMU, CMT and MTU2
    ARM: shmobile: Remove CMT, TMU and STI Kconfig entries
    clocksource: armada-370-xp: Use atomic access for shared registers
    clocksource: orion: Use atomic access for shared registers
    clocksource: timer-keystone: Delete unnecessary variable
    clocksource: timer-keystone: introduce clocksource driver for Keystone
    ...

    Linus Torvalds
     

29 Mar, 2014

1 commit

  • Tejun Heo made WQ_NON_REENTRANT useless in dbf2576e37
    ("workqueue: make all workqueues non-reentrant"). So remove its
    usages and definition.

    This patch doesn't introduce any behavior changes.

    tj: minor description updates.

    Signed-off-by: ZhangZhen
    Signed-off-by: Tejun Heo
    Acked-by: James Chapman
    Acked-by: Ulf Hansson

    ZhangZhen
     

26 Mar, 2014

1 commit

  • If a delayed or deferrable work item is on the stack, we need to tell
    debug objects that we are destroying the timer and the work item.
    Otherwise we leak the tracking object.
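
    A hedged usage sketch with a hypothetical work function; the on-stack
    variants must be paired with the destructor so debugobjects drops its
    tracking object:

    void example(void)
    {
            struct delayed_work dwork;

            INIT_DELAYED_WORK_ONSTACK(&dwork, example_workfn);
            schedule_delayed_work(&dwork, HZ);
            flush_delayed_work(&dwork);
            destroy_delayed_work_on_stack(&dwork); /* added by this patch */
    }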

    Signed-off-by: Thomas Gleixner
    Cc: Vince Weaver
    Acked-by: Tejun Heo
    Link: http://lkml.kernel.org/r/20140323141939.911487677@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

07 Mar, 2014

2 commits

  • Peter Hurley noticed that since a2c1c57be8d9 ("workqueue: consider
    work function when searching for busy work items"), a work item which
    gets assigned a different work function would break out of the
    non-reentrancy guarantee as workqueue would consider it a different
    work item.

    This is fragile and extremely subtle. PREPARE_[DELAYED_]WORK() has
    never been used widely and its semantics have always been somewhat
    iffy. If the work item is known not to be on queue when
    PREPARE_WORK() is called, there's no difference from using
    INIT_WORK(). If the work item may be queued at the time of
    PREPARE_WORK(), we can't really tell whether the old or new function
    will be executed the next time.

    We really don't want this level of subtlety in workqueue interface for
    such marginal use cases. The previous patches converted all existing
    users away from PREPARE_[DELAYED_]WORK(). Let's remove them.

    Signed-off-by: Tejun Heo
    Cc: Peter Hurley
    Link: http://lkml.kernel.org/g/1392493119-9277-1-git-send-email-peter@hurleysoftware.com

    Tejun Heo
     
  • To receive 70044d71d31d ("firewire: don't use PREPARE_DELAYED_WORK").
    There will be further related updates in the for-3.15 branch.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

19 Feb, 2014

1 commit

  • __cancel_delayed_work() was deprecated by 136b5721d75a ("workqueue:
    deprecate __cancel_delayed_work()") as cancel_delayed_work() was
    updated so that it could be used from all contexts. Enough time has
    passed since the deprecation. Let's remove it.

    tj: description update

    Signed-off-by: Tan Xiaojun
    Signed-off-by: Tejun Heo

    Tan Xiaojun
     

14 Feb, 2014

1 commit

  • Tommi noticed a 'funny' lock class name, "%s#5", from a lock acquired
    in process_one_work().

    Using #fmt plus #args as the lock_name gives more information for a
    format string like the one above.

    The __builtin_constant_p() check is removed (as there seems to be no
    good way to check all the variables in the args list). Removing the
    check only adds two extra double-quote characters for those constants.

    Some lockdep name examples printed out after the change:

    lockdep name                    wq->name

    "events_long"                   events_long
    "%s" ("khelper")                khelper
    "xfs-data/%s" mp->m_fsname      xfs-data/dm-3

    Signed-off-by: Li Zhong
    Signed-off-by: Tejun Heo

    Li Zhong
     

30 Jul, 2013

1 commit

  • dbf2576e37 ("workqueue: make all workqueues non-reentrant") made
    WQ_NON_REENTRANT a no-op, but the following patches didn't remove the
    flag or update the documentation. Let's mark the flag deprecated and
    update the documentation accordingly.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

04 Jul, 2013

1 commit

  • For the workqueue creation interfaces that do not expect format
    strings, make sure they cannot accidentally be parsed that way.
    Additionally, clean up calls made with a single parameter that would
    be handled as a format string. Many callers are passing potentially
    dynamic string content, so use "%s" in those cases to avoid any
    potential accidents.

    Signed-off-by: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

15 May, 2013

2 commits

  • This patch adds system-wide workqueues aligned towards power saving.
    This is done by allocating them with the WQ_UNBOUND flag if
    'wq_power_efficient' is set to 'true'.
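
    Condensed from the patch, the new system workqueues are allocated
    with the flag introduced in the commit below:

    system_power_efficient_wq =
            alloc_workqueue("events_power_efficient",
                            WQ_POWER_EFFICIENT, 0);
    system_freezable_power_efficient_wq =
            alloc_workqueue("events_freezable_power_efficient",
                            WQ_FREEZABLE | WQ_POWER_EFFICIENT, 0);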

    tj: updated comments a bit.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Tejun Heo

    Viresh Kumar
     
  • Workqueues can be performance- or power-oriented. Currently, most
    workqueues are bound to the CPU they were created on. This gives good
    performance (due to cache effects) at the cost of potentially waking
    up otherwise idle cores (idle from the scheduler's perspective, which
    may or may not be physically idle) just to process some work. To save
    power, we can allow the work to be rescheduled on a core that is
    already awake.

    Workqueues created with the WQ_UNBOUND flag will allow some power savings.
    However, we don't change the default behaviour of the system. To enable
    power-saving behaviour, a new config option CONFIG_WQ_POWER_EFFICIENT needs to
    be turned on. This option can also be overridden by the
    workqueue.power_efficient boot parameter.
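
    The run-time resolution is a one-liner in __alloc_workqueue_key(),
    condensed here:

    /* unbind power-efficient workqueues if requested and enabled */
    if ((flags & WQ_POWER_EFFICIENT) && wq_power_efficient)
            flags |= WQ_UNBOUND;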

    tj: Updated config description and comments. Renamed
    CONFIG_WQ_POWER_EFFICIENT to CONFIG_WQ_POWER_EFFICIENT_DEFAULT.

    Signed-off-by: Viresh Kumar
    Reviewed-by: Amit Kucheria
    Signed-off-by: Tejun Heo

    Viresh Kumar
     

01 May, 2013

1 commit

  • One of the problems that arise when converting a dedicated custom
    threadpool to workqueue is that the shared worker pool used by
    workqueue anonymizes each worker, making it more difficult to identify
    what the worker was doing on which target from the output of sysrq-t
    or a debug dump from oops, BUG() and friends.

    This patch implements set_worker_desc() which can be called from any
    workqueue work function to set its description. When the worker task is
    dumped for whatever reason - sysrq-t, WARN, BUG, oops, lockdep assertion
    and so on - the description will be printed out together with the
    workqueue name and the worker function pointer.

    The printing side is implemented by print_worker_info() which is called
    from functions in task dump paths - sched_show_task() and
    dump_stack_print_info(). print_worker_info() can be safely called on
    any task in any state as long as the task struct itself is accessible.
    It uses probe_*() functions to access worker fields. It may print
    garbage if something went very wrong, but it wouldn't cause (another)
    oops.

    The description is currently limited to 24 bytes including the
    terminating \0. worker->desc_valid and worker->desc[] are added, and
    the 64-byte marker, which was already incorrect before adding the new
    fields, is moved to the correct position.
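
    A hedged usage sketch with a hypothetical work function:

    static void example_workfn(struct work_struct *work)
    {
            struct example *ex = container_of(work, struct example, work);

            /* shows up next to the wq name in sysrq-t / WARN / oops dumps */
            set_worker_desc("example-%s", ex->name);

            /* ... actual work ... */
    }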

    Here's an example dump with writeback updated to set the bdi name as
    worker desc.

    Hardware name: Bochs
    Modules linked in:
    Pid: 7, comm: kworker/u9:0 Not tainted 3.9.0-rc1-work+ #1
    Workqueue: writeback bdi_writeback_workfn (flush-8:0)
    ffffffff820a3ab0 ffff88000f6e9cb8 ffffffff81c61845 ffff88000f6e9cf8
    ffffffff8108f50f 0000000000000000 0000000000000000 ffff88000cde16b0
    ffff88000cde1aa8 ffff88001ee19240 ffff88000f6e9fd8 ffff88000f6e9d08
    Call Trace:
    [] dump_stack+0x19/0x1b
    [] warn_slowpath_common+0x7f/0xc0
    [] warn_slowpath_null+0x1a/0x20
    [] bdi_writeback_workfn+0x2a0/0x3b0
    ...

    Signed-off-by: Tejun Heo
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Acked-by: Jan Kara
    Cc: Oleg Nesterov
    Cc: Jens Axboe
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

02 Apr, 2013

1 commit

  • …o disable NUMA affinity

    Unbound workqueues are now NUMA aware. Let's add some control knobs
    and update sysfs interface accordingly.

    * Add kernel param workqueue.numa_disable which disables NUMA affinity
    globally.

    * Replace sysfs file "pool_id" with "pool_ids" which contain
    node:pool_id pairs. This change is userland-visible but "pool_id"
    hasn't seen a release yet, so this is okay.

    * Add a new sysfs file "numa" which can toggle NUMA affinity on
    individual workqueues. This is implemented as attrs->no_numa, which
    is special in that it isn't part of a pool's attributes. It only
    affects how apply_workqueue_attrs() picks which pools to use.

    After the "pool_ids" change, first_pwq() doesn't have any user left.
    Remove it.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>

    Tejun Heo
     

14 Mar, 2013

1 commit

  • There's no reason to make these trivial wrappers full (exported)
    functions. Inline the following (see the sketch after the list):

    queue_work()
    queue_delayed_work()
    mod_delayed_work()
    schedule_work_on()
    schedule_work()
    schedule_delayed_work_on()
    schedule_delayed_work()
    keventd_up()
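
    What the conversion looks like for one of them, condensed from the
    patch:

    static inline bool schedule_work(struct work_struct *work)
    {
            return queue_work(system_wq, work);
    }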

    Signed-off-by: Tejun Heo

    Tejun Heo
     

13 Mar, 2013

8 commits

  • Implement a function which queries whether the current task is
    running off a workqueue rescuer. This will be used to convert
    writeback to workqueue.
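
    Its implementation is a short predicate; condensed as a sketch:

    bool current_is_workqueue_rescuer(void)
    {
            struct worker *worker = current_wq_worker();

            return worker && worker->rescue_wq;
    }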

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • There are cases where workqueue users want to expose control knobs to
    userland. e.g. Unbound workqueues with custom attributes are
    scheduled to be used for writeback workers and depending on
    configuration it can be useful to allow admins to tinker with the
    priority or allowed CPUs.

    This patch implements workqueue_sysfs_register(), which makes the
    workqueue visible under /sys/bus/workqueue/devices/WQ_NAME. There
    currently are two attributes common to both per-cpu and unbound pools
    and extra attributes for unbound pools including nice level and
    cpumask.

    If alloc_workqueue*() is called with WQ_SYSFS,
    workqueue_sysfs_register() is called automatically as part of
    workqueue creation. This is the preferred method unless the workqueue
    user wants to apply workqueue_attrs before making the workqueue
    visible to userland.
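
    Usage sketch with a hypothetical workqueue name:

    /* becomes visible at /sys/bus/workqueue/devices/example_wq */
    wq = alloc_workqueue("example_wq", WQ_UNBOUND | WQ_SYSFS, 0);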

    v2: Disallow exposing ordered workqueues as ordered workqueues can't
    be tuned in any way.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Adjusting max_active of or applying new workqueue_attrs to an ordered
    workqueue breaks its ordering guarantee. The former is obvious. The
    latter is because applying attrs creates a new pwq (pool_workqueue)
    and there is no ordering constraint between the old and new pwqs.

    Make apply_workqueue_attrs() and workqueue_set_max_active() trigger
    WARN_ON() if those operations are requested on an ordered workqueue
    and fail / ignore respectively.

    Signed-off-by: Tejun Heo
    Reviewed-by: Lai Jiangshan

    Tejun Heo
     
  • We're gonna add another internal WQ flag. Let's make the distinction
    clear. Prefix WQ_DRAINING with __ and move it to bit 16.

    Signed-off-by: Tejun Heo
    Reviewed-by: Lai Jiangshan

    Tejun Heo
     
  • Implement apply_workqueue_attrs() which applies workqueue_attrs to the
    specified unbound workqueue by creating a new pwq (pool_workqueue)
    linked to worker_pool with the specified attributes.

    A new pwq is linked at the head of wq->pwqs instead of tail and
    __queue_work() verifies that the first unbound pwq has positive refcnt
    before choosing it for the actual queueing. This is to cover the case
    where creation of a new pwq races with queueing. As base ref on a pwq
    won't be dropped without making another pwq the first one,
    __queue_work() is guaranteed to make progress and not add work item to
    a dead pwq.

    init_and_link_pwq() is updated to return the previous first pwq that
    the new pwq replaced, which apply_workqueue_attrs() then puts.

    Note that apply_workqueue_attrs() is almost identical to unbound pwq
    part of alloc_and_link_pwqs(). The only difference is that there is
    no previous first pwq. apply_workqueue_attrs() is implemented to
    handle such cases and replaces unbound pwq handling in
    alloc_and_link_pwqs().
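
    A usage sketch under the API of the time (alloc_workqueue_attrs()
    still took a gfp_t; the workqueue name is illustrative):

    struct workqueue_attrs *attrs;
    int ret;

    attrs = alloc_workqueue_attrs(GFP_KERNEL);
    if (!attrs)
            return -ENOMEM;

    attrs->nice = -5;
    cpumask_copy(attrs->cpumask, cpumask_of(1));
    ret = apply_workqueue_attrs(unbound_wq, attrs);
    free_workqueue_attrs(attrs);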

    Signed-off-by: Tejun Heo
    Reviewed-by: Lai Jiangshan

    Tejun Heo
     
  • WQ_RESCUER is superfluous. WQ_MEM_RECLAIM indicates that the user
    wants a rescuer, and testing wq->rescuer for NULL can answer whether a
    given workqueue has a rescuer or not. Drop WQ_RESCUER and test
    wq->rescuer directly.

    This will help simplify the __alloc_workqueue_key() failure path by
    allowing it to use destroy_workqueue() on a partially constructed
    workqueue, which in turn will help implement dynamic management of
    pool_workqueues.

    While at it, clear wq->rescuer after freeing it in
    destroy_workqueue(). This is a precaution as scheduled changes will
    make destruction more complex.

    This patch doesn't introduce any functional changes.

    Signed-off-by: Tejun Heo
    Reviewed-by: Lai Jiangshan

    Tejun Heo
     
  • Introduce struct workqueue_attrs which carries worker attributes -
    currently the nice level and allowed cpumask along with helper
    routines alloc_workqueue_attrs() and free_workqueue_attrs().
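
    The new structure, condensed from the patch:

    struct workqueue_attrs {
            int             nice;           /* nice level */
            cpumask_var_t   cpumask;        /* allowed CPUs */
    };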

    Each worker_pool now carries ->attrs describing the attributes of its
    workers. All functions dealing with cpumask and nice level of workers
    are updated to follow worker_pool->attrs instead of determining them
    from other characteristics of the worker_pool, and init_workqueues()
    is updated to set worker_pool->attrs appropriately for all standard
    pools.

    Note that create_worker() is updated to always perform set_user_nice()
    and use set_cpus_allowed_ptr() combined with manual assertion of
    PF_THREAD_BOUND instead of kthread_bind(). This simplifies handling
    random attributes without affecting the outcome.

    This patch doesn't introduce any behavior changes.

    v2: Missing cpumask_var_t definition caused build failure on some
    archs. linux/cpumask.h included.

    Signed-off-by: Tejun Heo
    Reported-by: kbuild test robot
    Reviewed-by: Lai Jiangshan

    Tejun Heo
     
  • Workqueue is mixing unsigned int and int for @cpu variables. There's
    no point in using unsigned int for cpus - many of the CPU-related
    APIs take int anyway. Consistently use int for @cpu variables so that
    we can use negative values to mark special ones.

    This patch doesn't introduce any visible behavior changes.

    Signed-off-by: Tejun Heo
    Reviewed-by: Lai Jiangshan

    Tejun Heo
     

05 Mar, 2013

1 commit

  • When a work item is off-queue, its work->data contains WORK_STRUCT_*
    and WORK_OFFQ_* flags. As WORK_OFFQ_* flags are used only while a
    work item is off-queue, they can occupy bits of work->data which
    aren't used while off-queue. WORK_OFFQ_* currently only use the bits
    used by the on-queue CWQ pointer. As the color bits aren't used while
    off-queue, there's no reason not to use them.

    Lower WORK_OFFQ_FLAG_BASE from WORK_STRUCT_FLAG_BITS to
    WORK_STRUCT_COLOR_SHIFT thus giving 4 more bits to off-queue flag
    space which is also used to record worker_pool ID while off-queue.
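
    After the change, the off-queue constants are based at the color
    shift; condensed from include/linux/workqueue.h:

    enum {
            /* color bits are unused while off-queue, so reuse their space */
            WORK_OFFQ_FLAG_BASE     = WORK_STRUCT_COLOR_SHIFT,
            WORK_OFFQ_CANCELING     = (1 << WORK_OFFQ_FLAG_BASE),
            WORK_OFFQ_FLAG_BITS     = 1,
            WORK_OFFQ_POOL_SHIFT    = WORK_OFFQ_FLAG_BASE + WORK_OFFQ_FLAG_BITS,
    };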

    This doesn't introduce any visible behavior difference.

    tj: Rewrote the description.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan