11 Jan, 2012

1 commit

  • alloc_workqueue() currently expects the passed in @name pointer to remain
    accessible. This is inconvenient and a bit silly given that the whole wq
    is being dynamically allocated. This patch updates alloc_workqueue() and
    friends to take printf format string instead of opaque string and matching
    varargs at the end. The name is allocated together with the wq and
    formatted.

    alloc_ordered_workqueue() is converted to a macro to unify varargs
    handling with alloc_workqueue(), and, while at it, add comment to
    alloc_workqueue().

    None of the current in-kernel users pass in string with '%' as constant
    name and this change shouldn't cause any problem.

    [akpm@linux-foundation.org: use __printf]
    Signed-off-by: Tejun Heo
    Suggested-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include
    +#include

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

15 Sep, 2011

1 commit

  • Take cwq->gcwq->lock to avoid racing between drain_workqueue checking to
    make sure the workqueues are empty and cwq_dec_nr_in_flight decrementing
    and then incrementing nr_active when it activates a delayed work.

    We discovered this when a corner case in one of our drivers resulted in
    us trying to destroy a workqueue in which the remaining work would
    always requeue itself again in the same workqueue. We would hit this
    race condition and trip the BUG_ON on workqueue.c:3080.

    Signed-off-by: Thomas Tuttle
    Acked-by: Tejun Heo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Tuttle
     

23 Jul, 2011

1 commit


25 May, 2011

1 commit


20 May, 2011

1 commit

  • There are users which want to drain workqueues without destroying it.
    Separate out drain functionality from destroy_workqueue() into
    drain_workqueue() and make it accessible to workqueue users.

    To guarantee forward-progress, only chain queueing is allowed while
    drain is in progress. If a new work item which isn't chained from the
    running or pending work items is queued while draining is in progress,
    WARN_ON_ONCE() is triggered.

    Signed-off-by: Tejun Heo
    Cc: James Bottomley

    Tejun Heo
     

30 Apr, 2011

1 commit

  • If a rescuer and stop_machine() bringing down a CPU race with each
    other, they may deadlock on non-preemptive kernel. The CPU won't
    accept a new task, so the rescuer can't migrate to the target CPU,
    while stop_machine() can't proceed because the rescuer is holding one
    of the CPU retrying migration. GCWQ_DISASSOCIATED is never cleared
    and worker_maybe_bind_and_lock() retries indefinitely.

    This problem can be reproduced semi reliably while the system is
    entering suspend.

    http://thread.gmane.org/gmane.linux.kernel/1122051

    A lot of kudos to Thilo-Alexander for reporting this tricky issue and
    painstaking testing.

    stable: This affects all kernels with cmwq, so all kernels since and
    including v2.6.36 need this fix.

    Signed-off-by: Tejun Heo
    Reported-by: Thilo-Alexander Ginkel
    Tested-by: Thilo-Alexander Ginkel
    Cc: stable@kernel.org

    Tejun Heo
     

25 Mar, 2011

1 commit

  • Percpu allocator honors alignment request upto PAGE_SIZE and both the
    percpu addresses in the percpu address space and the translated kernel
    addresses should be aligned accordingly. The calculation of the
    former depends on the alignment of percpu output section in the kernel
    image.

    The linker script macros PERCPU_VADDR() and PERCPU() are used to
    define this output section and the latter takes @align parameter.
    Several architectures are using @align smaller than PAGE_SIZE breaking
    percpu memory alignment.

    This patch removes @align parameter from PERCPU(), renames it to
    PERCPU_SECTION() and makes it always align to PAGE_SIZE. While at it,
    add PCPU_SETUP_BUG_ON() checks such that alignment problems are
    reliably detected and remove percpu alignment comment recently added
    in workqueue.c as the condition would trigger BUG way before reaching
    there.

    For um, this patch raises the alignment of percpu area. As the area
    is in .init, there shouldn't be any noticeable difference.

    This problem was discovered by David Howells while debugging boot
    failure on mn10300.

    Signed-off-by: Tejun Heo
    Acked-by: Mike Frysinger
    Cc: uclinux-dist-devel@blackfin.uclinux.org
    Cc: David Howells
    Cc: Jeff Dike
    Cc: user-mode-linux-devel@lists.sourceforge.net

    Tejun Heo
     

23 Mar, 2011

1 commit

  • ksoftirqd, kworker, migration, and pktgend kthreads can be created with
    kthread_create_on_node(), to get proper NUMA affinities for their stack and
    task_struct.

    Signed-off-by: Eric Dumazet
    Acked-by: David S. Miller
    Reviewed-by: Andi Kleen
    Acked-by: Rusty Russell
    Acked-by: Tejun Heo
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: David Howells
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

16 Mar, 2011

1 commit

  • * 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: fix build failure introduced by s/freezeable/freezable/
    workqueue: add system_freezeable_wq
    rds/ib: use system_wq instead of rds_ib_fmr_wq
    net/9p: replace p9_poll_task with a work
    net/9p: use system_wq instead of p9_mux_wq
    xfs: convert to alloc_workqueue()
    reiserfs: make commit_wq use the default concurrency level
    ocfs2: use system_wq instead of ocfs2_quota_wq
    ext4: convert to alloc_workqueue()
    scsi/scsi_tgt_lib: scsi_tgtd isn't used in memory reclaim path
    scsi/be2iscsi,qla2xxx: convert to alloc_workqueue()
    misc/iwmc3200top: use system_wq instead of dedicated workqueues
    i2o: use alloc_workqueue() instead of create_workqueue()
    acpi: kacpi*_wq don't need WQ_MEM_RECLAIM
    fs/aio: aio_wq isn't used in memory reclaim path
    input/tps6507x-ts: use system_wq instead of dedicated workqueue
    cpufreq: use system_wq instead of dedicated workqueues
    wireless/ipw2x00: use system_wq instead of dedicated workqueues
    arm/omap: use system_wq in mailbox
    workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER

    Linus Torvalds
     

08 Mar, 2011

1 commit

  • In complex subsystems like mac80211 structures can contain several
    timers and work structs, so identifying a specific instance from the
    call trace and object type output of debugobjects can be hard.

    Allow the subsystems which support debugobjects to provide a hint
    function. This function returns a pointer to a kernel address
    (preferrably the objects callback function) which is printed along
    with the debugobjects type.

    Add hint methods for timer_list, work_struct and hrtimer.

    [ tglx: Massaged changelog, made it compile ]

    Signed-off-by: Stanislaw Gruszka
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
     

21 Feb, 2011

2 commits


17 Feb, 2011

2 commits

  • MAYDAY_INITIAL_TIMEOUT is defined as HZ / 100 and depending on
    configuration may end up 0 or 1. Even when it's 1, depending on when
    the mayday timer is added in the current jiffy interval, it may expire
    way before a jiffy has passed.

    Make sure MAYDAY_INITIAL_TIMEOUT is at least two to guarantee that at
    least a full jiffy has passed before calling rescuers.

    Signed-off-by: Tejun Heo
    Reported-by: Ray Jui
    Cc: stable@kernel.org

    Tejun Heo
     
  • There are two spellings in use for 'freeze' + 'able' - 'freezable' and
    'freezeable'. The former is the more prominent one. The latter is
    mostly used by workqueue and in a few other odd places. Unify the
    spelling to 'freezable'.

    Signed-off-by: Tejun Heo
    Reported-by: Alan Stern
    Acked-by: "Rafael J. Wysocki"
    Acked-by: Greg Kroah-Hartman
    Acked-by: Dmitry Torokhov
    Cc: David Woodhouse
    Cc: Alex Dubov
    Cc: "David S. Miller"
    Cc: Steven Whitehouse

    Tejun Heo
     

14 Feb, 2011

1 commit

  • After executing the matching works, a rescuer leaves the gcwq whether
    there are more pending works or not. This may decrease the
    concurrency level to zero and stall execution until a new work item is
    queued on the gcwq.

    Make rescuer wake up a regular worker when it leaves a gcwq if there
    are more works to execute, so that execution isn't stalled.

    Signed-off-by: Tejun Heo
    Reported-by: Ray Jui
    Cc: stable@kernel.org

    Tejun Heo
     

09 Feb, 2011

1 commit


11 Jan, 2011

2 commits

  • The nested NOT_RUNNING test in worker_clr_flags() is slightly
    misleading in that if NOT_RUNNING were a single flag the nested test
    would be always %true and thus noop. Add a comment noting that the
    test isn't a noop.

    Signed-off-by: Tejun Heo
    Cc: Hillf Danton
    Cc: Andrew Morton

    Tejun Heo
     
  • Currently, the lockdep annotation in flush_work() requires exclusive
    access on the workqueue the target work is queued on and triggers
    warning if a work is trying to flush another work on the same
    workqueue; however, this is no longer true as workqueues can now
    execute multiple works concurrently.

    This patch adds lock_map_acquire_read() and make process_one_work()
    hold read access to the workqueue while executing a work and
    start_flush_work() check for write access if concurrnecy level is one
    or the workqueue has a rescuer (as only one execution resource - the
    rescuer - is guaranteed to be available under memory pressure), and
    read access if higher.

    This better represents what's going on and removes spurious lockdep
    warnings which are triggered by fake dependency chain created through
    flush_work().

    * Peter pointed out that flushing another work from a WQ_MEM_RECLAIM
    wq breaks forward progress guarantee under memory pressure.
    Condition check accordingly updated.

    Signed-off-by: Tejun Heo
    Reported-by: "Rafael J. Wysocki"
    Tested-by: "Rafael J. Wysocki"
    Cc: Peter Zijlstra
    Cc: stable@kernel.org

    Tejun Heo
     

21 Dec, 2010

1 commit

  • Currently, destroy_workqueue() makes the workqueue deny all new
    queueing by setting WQ_DYING and flushes the workqueue once before
    proceeding with destruction; however, there are cases where work items
    queue more related work items. Currently, such users need to
    explicitly flush the workqueue multiple times depending on the
    possible depth of such chained queueing.

    This patch updates the queueing path such that a work item can queue
    further work items on the same workqueue even when WQ_DYING is set.
    The flush on destruction is automatically retried until the workqueue
    is empty. This guarantees that the workqueue is empty on destruction
    while allowing chained queueing.

    The flush retry logic whines if it takes too many retries to drain the
    workqueue.

    Signed-off-by: Tejun Heo
    Cc: James Bottomley

    Tejun Heo
     

14 Dec, 2010

1 commit

  • Running the annotate branch profiler on three boxes, including my
    main box that runs firefox, evolution, xchat, and is part of the distcc farm,
    showed this with the likelys in the workqueue code:

    correct incorrect % Function File Line
    ------- --------- - -------- ---- ----
    96 996253 99 wq_worker_sleeping workqueue.c 703
    96 996247 99 wq_worker_waking_up workqueue.c 677

    The likely()s in this case were assuming that WORKER_NOT_RUNNING will
    most likely be false. But this is not the case. The reason is
    (and shown by adding trace_printks and testing it) that most of the time
    WORKER_PREP is set.

    In worker_thread() we have:

    worker_clr_flags(worker, WORKER_PREP);

    [ do work stuff ]

    worker_set_flags(worker, WORKER_PREP, false);

    (that 'false' means not to wake up an idle worker)

    The wq_worker_sleeping() is called from schedule when a worker thread
    is putting itself to sleep. Which happens most of the time outside
    of that [ do work stuff ].

    The wq_worker_waking_up is called by the wakeup worker code, which
    is also callod outside that [ do work stuff ].

    Thus, the likely and unlikely used by those two functions are actually
    backwards.

    Remove the annotation and let gcc figure it out.

    Acked-by: Tejun Heo
    Signed-off-by: Steven Rostedt
    Signed-off-by: Tejun Heo

    Steven Rostedt
     

26 Nov, 2010

1 commit


27 Oct, 2010

1 commit

  • Silly though it is, completions and wait_queue_heads use foo_ONSTACK
    (COMPLETION_INITIALIZER_ONSTACK, DECLARE_COMPLETION_ONSTACK,
    __WAIT_QUEUE_HEAD_INIT_ONSTACK and DECLARE_WAIT_QUEUE_HEAD_ONSTACK) so I
    guess workqueues should do the same thing.

    s/INIT_WORK_ON_STACK/INIT_WORK_ONSTACK/
    s/INIT_DELAYED_WORK_ON_STACK/INIT_DELAYED_WORK_ONSTACK/

    Cc: Peter Zijlstra
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

26 Oct, 2010

1 commit

  • In the MN10300 arch, we occasionally see an assertion being tripped in
    alloc_cwqs() at the following line:

    /* just in case, make sure it's actually aligned */
    ---> BUG_ON(!IS_ALIGNED(wq->cpu_wq.v, align));
    return wq->cpu_wq.v ? 0 : -ENOMEM;

    The values are:

    wa->cpu_wq.v => 0x902776e0
    align => 0x100

    and align is calculated by the following:

    const size_t align = max_t(size_t, 1 << WORK_STRUCT_FLAG_BITS,
    __alignof__(unsigned long long));

    This is because the pointer in question (wq->cpu_wq.v) loses some of its
    lower bits to control flags, and so the object it points to must be
    sufficiently aligned to avoid the need to use those bits for pointing to
    things.

    Currently, 4 control bits and 4 colour bits are used in normal
    circumstances, plus a debugging bit if debugging is set. This requires
    the cpu_workqueue_struct struct to be at least 256 bytes aligned (or 512
    bytes aligned with debugging).

    PERCPU() alignment on MN13000, however, is only 32 bytes as set in
    vmlinux.lds.S. So we set this to PAGE_SIZE (4096) to match most other
    arches and stick a comment in alloc_cwqs() for anyone else who triggers
    the assertion.

    Reported-by: Akira Takeuchi
    Signed-off-by: David Howells
    Acked-by: Mark Salter
    Cc: Tejun Heo
    Signed-off-by: Linus Torvalds

    David Howells
     

19 Oct, 2010

2 commits

  • Commit a25909a4 (lockdep: Add an in_workqueue_context() lockdep-based
    test function) added in_workqueue_context() but there hasn't been any
    in-kernel user and the lockdep annotation in workqueue is scheduled to
    change. Remove the unused function.

    Signed-off-by: Tejun Heo
    Cc: Paul E. McKenney

    Tejun Heo
     
  • The documentation for schedule_on_each_cpu() states that it calls a
    function on each online CPU from keventd. This can easily be
    interpreted as an asyncronous call because the description does not
    mention that flush_work is called. Clarify that it is synchronous.

    tj: rephrased a bit

    Signed-off-by: Mel Gorman
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Tejun Heo

    Tejun Heo
     

11 Oct, 2010

2 commits

  • Add WQ_MEM_RECLAIM flag which currently maps to WQ_RESCUER, mark
    WQ_RESCUER as internal and replace all external WQ_RESCUER usages to
    WQ_MEM_RECLAIM.

    This makes the API users express the intent of the workqueue instead
    of indicating the internal mechanism used to guarantee forward
    progress. This is also to make it cleaner to add more semantics to
    WQ_MEM_RECLAIM. For example, if deemed necessary, memory reclaim
    workqueues can be made highpri.

    This patch doesn't introduce any functional change.

    Signed-off-by: Tejun Heo
    Cc: Jeff Garzik
    Cc: Dave Chinner
    Cc: Steven Whitehouse

    Tejun Heo
     
  • The policy function keep_working() didn't check GCWQ_HIGHPRI_PENDING
    and could return %false with highpri work pending. This could lead to
    late execution of a highpri work which was delayed due to @max_active
    throttling if other works are actively consuming CPU cycles.

    For example, the following could happen.

    1. Work W0 which burns CPU cycles.

    2. Two works W1 and W2 are queued to a highpri wq w/ @max_active of 1.

    3. W1 starts executing and W2 is put to delayed queue. W0 and W1 are
    both runnable.

    4. W1 finishes which puts W2 to pending queue but keep_working()
    incorrectly returns %false and the worker goes to sleep.

    5. W0 finishes and W2 starts execution.

    With this patch applied, W2 starts execution as soon as W1 finishes.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

05 Oct, 2010

2 commits


19 Sep, 2010

3 commits

  • Implement flush[_delayed]_work_sync(). These are flush functions
    which also make sure no CPU is still executing the target work from
    earlier queueing instances. These are similar to
    cancel[_delayed]_work_sync() except that the target work item is
    flushed instead of cancelled.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Factor out start_flush_work() from flush_work(). start_flush_work()
    has @wait_executing argument which controls whether the barrier is
    queued only if the work is pending or also if executing. As
    flush_work() needs to wait for execution too, it uses %true.

    This commit doesn't cause any behavior difference. start_flush_work()
    will be used to implement flush_work_sync().

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Make the following cleanup changes.

    * Relocate flush/cancel function prototypes and definitions.

    * Relocate wait_on_cpu_work() and wait_on_work() before
    try_to_grab_pending(). These will be used to implement
    flush_work_sync().

    * Make all flush/cancel functions return bool instead of int.

    * Update wait_on_cpu_work() and wait_on_work() to return %true if they
    actually waited.

    * Add / update comments.

    This patch doesn't cause any functional changes.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

13 Sep, 2010

1 commit

  • Update copyright notice and add Documentation/workqueue.txt.

    Randy Dunlap, Dave Chinner: misc fixes.

    Signed-off-by: Tejun Heo
    Reviewed-By: Florian Mickler
    Cc: Ingo Molnar
    Cc: Christoph Lameter
    Cc: Randy Dunlap
    Cc: Dave Chinner

    Tejun Heo
     

08 Sep, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: use zalloc_cpumask_var() for gcwq->mayday_mask
    workqueue: fix GCWQ_DISASSOCIATED initialization
    workqueue: Add a workqueue chapter to the tracepoint docbook
    workqueue: fix cwq->nr_active underflow
    workqueue: improve destroy_workqueue() debuggability
    workqueue: mark lock acquisition on worker_maybe_bind_and_lock()
    workqueue: annotate lock context change
    workqueue: free rescuer on destroy_workqueue

    Linus Torvalds
     

31 Aug, 2010

2 commits

  • alloc_mayday_mask() was using alloc_cpumask_var() making
    gcwq->mayday_mask contain garbage after initialization on
    CONFIG_CPUMASK_OFFSTACK=y configurations. This combined with the
    previously fixed GCWQ_DISASSOCIATED initialization bug could make
    rescuers fall into infinite loop trying to bind to an offline cpu.

    Signed-off-by: Tejun Heo
    Reported-by: CAI Qian

    Tejun Heo
     
  • init_workqueues() incorrectly marks workqueues for all possible CPUs
    associated. Combined with mayday_mask initialization bug, this can
    make rescuers keep trying to bind to an offline gcwq indefinitely.
    Fix init_workqueues() such that only online CPUs have their gcwqs have
    GCWQ_DISASSOCIATED cleared.

    Signed-off-by: Tejun Heo
    Reported-by: CAI Qian

    Tejun Heo
     

25 Aug, 2010

2 commits

  • cwq->nr_active is used to keep track of how many work items are active
    for the cpu workqueue, where 'active' is defined as either pending on
    global worklist or executing. This is used to implement the
    max_active limit and workqueue freezing. If a work item is queued
    after nr_active has already reached max_active, the work item doesn't
    increment nr_active and is put on the delayed queue and gets activated
    later as previous active work items retire.

    try_to_grab_pending() which is used in the cancellation path
    unconditionally decremented nr_active whether the work item being
    cancelled is currently active or delayed, so cancelling a delayed work
    item makes nr_active underflow. This breaks max_active enforcement
    and triggers BUG_ON() in destroy_workqueue() later on.

    This patch fixes this bug by adding a flag WORK_STRUCT_DELAYED, which
    is set while a work item in on the delayed list and making
    try_to_grab_pending() decrement nr_active iff the work item is
    currently active.

    The addition of the flag enlarges cwq alignment to 256 bytes which is
    getting a bit too large. It's scheduled to be reduced back to 128
    bytes by merging WORK_STRUCT_PENDING and WORK_STRUCT_CWQ in the next
    devel cycle.

    Signed-off-by: Tejun Heo
    Reported-by: Johannes Berg

    Tejun Heo
     
  • Now that the worklist is global, having works pending after wq
    destruction can easily lead to oops and destroy_workqueue() have
    several BUG_ON()s to catch these cases. Unfortunately, BUG_ON()
    doesn't tell much about how the work became pending after the final
    flush_workqueue().

    This patch adds WQ_DYING which is set before the final flush begins.
    If a work is requested to be queued on a dying workqueue,
    WARN_ON_ONCE() is triggered and the request is ignored. This clearly
    indicates which caller is trying to queue a work on a dying workqueue
    and keeps the system working in most cases.

    Locking rule comment is updated such that the 'I' rule includes
    modifying the field from destruction path.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

23 Aug, 2010

1 commit

  • worker_maybe_bind_and_lock() actually grabs gcwq->lock but was missing proper
    annotation. Add it. So this patch will remove following sparse warnings:

    kernel/workqueue.c:1214:13: warning: context imbalance in 'worker_maybe_bind_and_lock' - wrong count at exit
    arch/x86/include/asm/irqflags.h:44:9: warning: context imbalance in 'worker_rebind_fn' - unexpected unlock
    kernel/workqueue.c:1991:17: warning: context imbalance in 'rescuer_thread' - unexpected unlock

    Signed-off-by: Namhyung Kim
    Signed-off-by: Tejun Heo

    Namhyung Kim