11 Jan, 2011

2 commits

  • The nested NOT_RUNNING test in worker_clr_flags() is slightly
    misleading in that if NOT_RUNNING were a single flag the nested test
    would always be %true and thus a noop. Add a comment noting that the
    test isn't a noop.
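
    For reference, a sketch of the logic being commented (helper names
    follow this era's kernel/workqueue.c and may not match exactly):

    static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
    {
            unsigned int oflags = worker->flags;

            worker->flags &= ~flags;

            /*
             * If transitioning out of NOT_RUNNING, increment nr_running.
             * The nested test is not a noop: NOT_RUNNING is a mask of
             * multiple flags, so clearing one of them can still leave
             * another NOT_RUNNING flag set.
             */
            if ((flags & WORKER_NOT_RUNNING) && (oflags & WORKER_NOT_RUNNING))
                    if (!(worker->flags & WORKER_NOT_RUNNING))
                            atomic_inc(get_gcwq_nr_running(worker->gcwq->cpu));
    }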

    Signed-off-by: Tejun Heo
    Cc: Hillf Danton
    Cc: Andrew Morton

    Tejun Heo
     
  • Currently, the lockdep annotation in flush_work() requires exclusive
    access on the workqueue the target work is queued on and triggers a
    warning if a work tries to flush another work on the same workqueue;
    however, this is no longer true as workqueues can now execute
    multiple works concurrently.

    This patch adds lock_map_acquire_read() and makes process_one_work()
    hold read access to the workqueue while executing a work;
    start_flush_work() now checks for write access if the concurrency
    level is one or the workqueue has a rescuer (as only one execution
    resource - the rescuer - is guaranteed to be available under memory
    pressure), and for read access if the concurrency level is higher.

    This better represents what's going on and removes spurious lockdep
    warnings which are triggered by the fake dependency chain created
    through flush_work().

    * Peter pointed out that flushing another work from a WQ_MEM_RECLAIM
    wq breaks the forward-progress guarantee under memory pressure. The
    condition check has been updated accordingly.
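
    The resulting check in start_flush_work() is roughly (a sketch; field
    names follow the description above):

    if (cwq->wq->saved_max_active == 1 || cwq->wq->flags & WQ_RESCUER)
            lock_map_acquire(&cwq->wq->lockdep_map);
    else
            lock_map_acquire_read(&cwq->wq->lockdep_map);
    lock_map_release(&cwq->wq->lockdep_map);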

    Signed-off-by: Tejun Heo
    Reported-by: "Rafael J. Wysocki"
    Tested-by: "Rafael J. Wysocki"
    Cc: Peter Zijlstra
    Cc: stable@kernel.org

    Tejun Heo
     

21 Dec, 2010

1 commit

  • Currently, destroy_workqueue() makes the workqueue deny all new
    queueing by setting WQ_DYING and flushes the workqueue once before
    proceeding with destruction; however, there are cases where work items
    queue more related work items. Currently, such users need to
    explicitly flush the workqueue multiple times depending on the
    possible depth of such chained queueing.

    This patch updates the queueing path such that a work item can queue
    further work items on the same workqueue even when WQ_DYING is set.
    The flush on destruction is automatically retried until the workqueue
    is empty. This guarantees that the workqueue is empty on destruction
    while allowing chained queueing.

    The flush retry logic whines if it takes too many retries to drain the
    workqueue.
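
    The destruction path now drains roughly like this (a simplified
    sketch; locking around the cwq checks and the exact whine thresholds
    are omitted/approximate):

    reflush:
            flush_workqueue(wq);

            for_each_cwq_cpu(cpu, wq) {
                    struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq);

                    if (!cwq->nr_active && list_empty(&cwq->delayed_works))
                            continue;       /* this cwq is drained */

                    if (++flush_cnt == 10 ||
                        (flush_cnt % 100 == 0 && flush_cnt <= 1000))
                            printk(KERN_WARNING "workqueue %s: flush on "
                                   "destruction isn't complete after %u tries\n",
                                   wq->name, flush_cnt);
                    goto reflush;
            }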

    Signed-off-by: Tejun Heo
    Cc: James Bottomley

    Tejun Heo
     

14 Dec, 2010

1 commit

  • Running the annotate branch profiler on three boxes, including my
    main box that runs firefox, evolution, xchat, and is part of the distcc farm,
    showed this with the likelys in the workqueue code:

    correct   incorrect   %  Function             File         Line
    -------   ---------   -  --------             ----         ----
         96      996253  99  wq_worker_sleeping   workqueue.c   703
         96      996247  99  wq_worker_waking_up  workqueue.c   677

    The likely()s in this case were assuming that WORKER_NOT_RUNNING will
    most likely be false. But this is not the case. The reason (shown by
    adding trace_printks and testing) is that most of the time WORKER_PREP
    is set.

    In worker_thread() we have:

    worker_clr_flags(worker, WORKER_PREP);

    [ do work stuff ]

    worker_set_flags(worker, WORKER_PREP, false);

    (that 'false' means not to wake up an idle worker)

    wq_worker_sleeping() is called from schedule() when a worker thread
    is putting itself to sleep, which happens most of the time outside
    of that [ do work stuff ].

    wq_worker_waking_up() is called by the wakeup worker code, which
    is also called outside that [ do work stuff ].

    Thus, the likely and unlikely used by those two functions are actually
    backwards.

    Remove the annotation and let gcc figure it out.
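
    In wq_worker_waking_up() the change is essentially (sketch):

    /* before: annotated as the common case, which it isn't */
    if (likely(!(worker->flags & WORKER_NOT_RUNNING)))
            atomic_inc(get_gcwq_nr_running(cpu));

    /* after: no annotation, let gcc decide */
    if (!(worker->flags & WORKER_NOT_RUNNING))
            atomic_inc(get_gcwq_nr_running(cpu));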

    Acked-by: Tejun Heo
    Signed-off-by: Steven Rostedt
    Signed-off-by: Tejun Heo

    Steven Rostedt
     

26 Nov, 2010

1 commit


27 Oct, 2010

1 commit

  • Silly though it is, completions and wait_queue_heads use foo_ONSTACK
    (COMPLETION_INITIALIZER_ONSTACK, DECLARE_COMPLETION_ONSTACK,
    __WAIT_QUEUE_HEAD_INIT_ONSTACK and DECLARE_WAIT_QUEUE_HEAD_ONSTACK) so I
    guess workqueues should do the same thing.

    s/INIT_WORK_ON_STACK/INIT_WORK_ONSTACK/
    s/INIT_DELAYED_WORK_ON_STACK/INIT_DELAYED_WORK_ONSTACK/

    Cc: Peter Zijlstra
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

26 Oct, 2010

1 commit

  • In the MN10300 arch, we occasionally see an assertion being tripped in
    alloc_cwqs() at the following line:

    /* just in case, make sure it's actually aligned */
    ---> BUG_ON(!IS_ALIGNED(wq->cpu_wq.v, align));
    return wq->cpu_wq.v ? 0 : -ENOMEM;

    The values are:

    wq->cpu_wq.v => 0x902776e0
    align => 0x100

    and align is calculated by the following:

    const size_t align = max_t(size_t, 1 << WORK_STRUCT_FLAG_BITS,
                               __alignof__(unsigned long long));

    This is because the pointer in question (wq->cpu_wq.v) loses some of its
    lower bits to control flags, and so the object it points to must be
    sufficiently aligned to avoid the need to use those bits for pointing to
    things.

    Currently, 4 control bits and 4 colour bits are used in normal
    circumstances, plus a debugging bit if debugging is set. This requires
    the cpu_workqueue_struct struct to be at least 256 bytes aligned (or 512
    bytes aligned with debugging).
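
    A self-contained illustration of why the alignment is needed (not
    kernel code; constants are examples only): with 8 low bits borrowed
    for flags, the pointed-to object must be at least 256-byte aligned so
    that pointer and flags can share one word:

    #include <assert.h>
    #include <stdint.h>

    #define FLAG_BITS 8                             /* 4 control + 4 colour */
    #define FLAG_MASK ((1UL << FLAG_BITS) - 1)

    static uintptr_t pack(void *ptr, unsigned long flags)
    {
            /* only valid if the low FLAG_BITS of the pointer are zero */
            assert(((uintptr_t)ptr & FLAG_MASK) == 0);
            return (uintptr_t)ptr | (flags & FLAG_MASK);
    }

    static void *unpack_ptr(uintptr_t v)            { return (void *)(v & ~FLAG_MASK); }
    static unsigned long unpack_flags(uintptr_t v)  { return v & FLAG_MASK; }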

    PERCPU() alignment on MN10300, however, is only 32 bytes as set in
    vmlinux.lds.S. So we set this to PAGE_SIZE (4096) to match most other
    arches and stick a comment in alloc_cwqs() for anyone else who triggers
    the assertion.

    Reported-by: Akira Takeuchi
    Signed-off-by: David Howells
    Acked-by: Mark Salter
    Cc: Tejun Heo
    Signed-off-by: Linus Torvalds

    David Howells
     

19 Oct, 2010

2 commits

  • Commit a25909a4 (lockdep: Add an in_workqueue_context() lockdep-based
    test function) added in_workqueue_context() but there hasn't been any
    in-kernel user and the lockdep annotation in workqueue is scheduled to
    change. Remove the unused function.

    Signed-off-by: Tejun Heo
    Cc: Paul E. McKenney

    Tejun Heo
     
  • The documentation for schedule_on_each_cpu() states that it calls a
    function on each online CPU from keventd. This can easily be
    interpreted as an asynchronous call because the description does not
    mention that flush_work is called. Clarify that it is synchronous.
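
    A minimal usage sketch (my_percpu_func is a made-up name); the call
    only returns after the function has finished on every online CPU:

    static void my_percpu_func(struct work_struct *unused)
    {
            /* runs once on each online CPU from workqueue context */
    }

    static int example(void)
    {
            /* synchronous: blocks until all per-cpu instances complete */
            return schedule_on_each_cpu(my_percpu_func);
    }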

    tj: rephrased a bit

    Signed-off-by: Mel Gorman
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Tejun Heo

    Tejun Heo
     

11 Oct, 2010

2 commits

  • Add WQ_MEM_RECLAIM flag which currently maps to WQ_RESCUER, mark
    WQ_RESCUER as internal and replace all external WQ_RESCUER usages
    with WQ_MEM_RECLAIM.

    This makes the API users express the intent of the workqueue instead
    of indicating the internal mechanism used to guarantee forward
    progress. This is also to make it cleaner to add more semantics to
    WQ_MEM_RECLAIM. For example, if deemed necessary, memory reclaim
    workqueues can be made highpri.

    This patch doesn't introduce any functional change.
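
    Typical usage after this change looks like (a sketch; the workqueue
    name is arbitrary):

    /* this workqueue may be needed to make forward progress during
     * memory reclaim, so ask for a guaranteed execution context */
    wq = alloc_workqueue("myfs_io", WQ_MEM_RECLAIM, 1);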

    Signed-off-by: Tejun Heo
    Cc: Jeff Garzik
    Cc: Dave Chinner
    Cc: Steven Whitehouse

    Tejun Heo
     
  • The policy function keep_working() didn't check GCWQ_HIGHPRI_PENDING
    and could return %false with highpri work pending. This could lead to
    late execution of a highpri work which was delayed due to @max_active
    throttling if other works are actively consuming CPU cycles.

    For example, the following could happen.

    1. Work W0 which burns CPU cycles.

    2. Two works W1 and W2 are queued to a highpri wq w/ @max_active of 1.

    3. W1 starts executing and W2 is put to delayed queue. W0 and W1 are
    both runnable.

    4. W1 finishes which puts W2 to pending queue but keep_working()
    incorrectly returns %false and the worker goes to sleep.

    5. W0 finishes and W2 starts execution.

    With this patch applied, W2 starts execution as soon as W1 finishes.
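
    The fixed policy function roughly becomes (a sketch based on the
    description above):

    static bool keep_working(struct global_cwq *gcwq)
    {
            atomic_t *nr_running = get_gcwq_nr_running(gcwq->cpu);

            return !list_empty(&gcwq->worklist) &&
                    (atomic_read(nr_running) <= 1 ||
                     gcwq->flags & GCWQ_HIGHPRI_PENDING);
    }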

    Signed-off-by: Tejun Heo

    Tejun Heo
     

05 Oct, 2010

2 commits


19 Sep, 2010

3 commits

  • Implement flush[_delayed]_work_sync(). These are flush functions
    which also make sure no CPU is still executing the target work from
    earlier queueing instances. These are similar to
    cancel[_delayed]_work_sync() except that the target work item is
    flushed instead of cancelled.
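
    The new interfaces, as declared around this time (sketch):

    extern bool flush_work_sync(struct work_struct *work);
    extern bool flush_delayed_work_sync(struct delayed_work *dwork);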

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Factor out start_flush_work() from flush_work(). start_flush_work()
    has @wait_executing argument which controls whether the barrier is
    queued only if the work is pending or also if executing. As
    flush_work() needs to wait for execution too, it uses %true.

    This commit doesn't cause any behavior difference. start_flush_work()
    will be used to implement flush_work_sync().
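
    The factored-out helper has roughly this shape (sketch):

    static bool start_flush_work(struct work_struct *work,
                                 struct wq_barrier *barr, bool wait_executing);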

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Make the following cleanup changes.

    * Relocate flush/cancel function prototypes and definitions.

    * Relocate wait_on_cpu_work() and wait_on_work() before
    try_to_grab_pending(). These will be used to implement
    flush_work_sync().

    * Make all flush/cancel functions return bool instead of int.

    * Update wait_on_cpu_work() and wait_on_work() to return %true if they
    actually waited.

    * Add / update comments.

    This patch doesn't cause any functional changes.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

13 Sep, 2010

1 commit

  • Update copyright notice and add Documentation/workqueue.txt.

    Randy Dunlap, Dave Chinner: misc fixes.

    Signed-off-by: Tejun Heo
    Reviewed-By: Florian Mickler
    Cc: Ingo Molnar
    Cc: Christoph Lameter
    Cc: Randy Dunlap
    Cc: Dave Chinner

    Tejun Heo
     

08 Sep, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: use zalloc_cpumask_var() for gcwq->mayday_mask
    workqueue: fix GCWQ_DISASSOCIATED initialization
    workqueue: Add a workqueue chapter to the tracepoint docbook
    workqueue: fix cwq->nr_active underflow
    workqueue: improve destroy_workqueue() debuggability
    workqueue: mark lock acquisition on worker_maybe_bind_and_lock()
    workqueue: annotate lock context change
    workqueue: free rescuer on destroy_workqueue

    Linus Torvalds
     

31 Aug, 2010

2 commits

  • alloc_mayday_mask() was using alloc_cpumask_var(), making
    gcwq->mayday_mask contain garbage after initialization on
    CONFIG_CPUMASK_OFFSTACK=y configurations. Combined with the
    previously fixed GCWQ_DISASSOCIATED initialization bug, this could
    make rescuers fall into an infinite loop trying to bind to an
    offline cpu.
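
    The fix is essentially a one-line change to the mask allocation
    helper (sketch):

    -#define alloc_mayday_mask(maskp, gfp)  alloc_cpumask_var((maskp), (gfp))
    +#define alloc_mayday_mask(maskp, gfp)  zalloc_cpumask_var((maskp), (gfp))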

    Signed-off-by: Tejun Heo
    Reported-by: CAI Qian

    Tejun Heo
     
  • init_workqueues() incorrectly marks the gcwqs of all possible CPUs
    as associated. Combined with the mayday_mask initialization bug, this
    can make rescuers keep trying to bind to an offline gcwq
    indefinitely. Fix init_workqueues() such that GCWQ_DISASSOCIATED is
    cleared only for online CPUs' gcwqs.

    Signed-off-by: Tejun Heo
    Reported-by: CAI Qian

    Tejun Heo
     

25 Aug, 2010

2 commits

  • cwq->nr_active is used to keep track of how many work items are active
    for the cpu workqueue, where 'active' is defined as either pending on
    global worklist or executing. This is used to implement the
    max_active limit and workqueue freezing. If a work item is queued
    after nr_active has already reached max_active, the work item doesn't
    increment nr_active and is put on the delayed queue and gets activated
    later as previous active work items retire.

    try_to_grab_pending(), which is used in the cancellation path,
    unconditionally decremented nr_active whether the work item being
    cancelled was currently active or delayed, so cancelling a delayed
    work item made nr_active underflow. This breaks max_active
    enforcement and triggers BUG_ON() in destroy_workqueue() later on.

    This patch fixes the bug by adding a flag, WORK_STRUCT_DELAYED, which
    is set while a work item is on the delayed list, and by making
    try_to_grab_pending() decrement nr_active iff the work item is
    currently active.

    The addition of the flag enlarges cwq alignment to 256 bytes which is
    getting a bit too large. It's scheduled to be reduced back to 128
    bytes by merging WORK_STRUCT_PENDING and WORK_STRUCT_CWQ in the next
    devel cycle.
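
    The core of the fix in try_to_grab_pending() is to pass down whether
    the grabbed item was on the delayed list (a sketch; helper signatures
    may differ slightly):

    /* only decrement nr_active for items that were actually active */
    cwq_dec_nr_in_flight(get_work_cwq(work), get_work_color(work),
                         *work_data_bits(work) & WORK_STRUCT_DELAYED);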

    Signed-off-by: Tejun Heo
    Reported-by: Johannes Berg

    Tejun Heo
     
  • Now that the worklist is global, having works pending after wq
    destruction can easily lead to an oops, and destroy_workqueue() has
    several BUG_ON()s to catch these cases. Unfortunately, BUG_ON()
    doesn't tell much about how the work became pending after the final
    flush_workqueue().

    This patch adds WQ_DYING which is set before the final flush begins.
    If a work is requested to be queued on a dying workqueue,
    WARN_ON_ONCE() is triggered and the request is ignored. This clearly
    indicates which caller is trying to queue a work on a dying workqueue
    and keeps the system working in most cases.

    The locking rule comment is updated such that the 'I' rule includes
    modifying the field from the destruction path.
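
    The check added to the queueing path is essentially (sketch):

    /* in __queue_work(): complain once and ignore the request */
    if (WARN_ON_ONCE(wq->flags & WQ_DYING))
            return;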

    Signed-off-by: Tejun Heo

    Tejun Heo
     

23 Aug, 2010

2 commits

  • worker_maybe_bind_and_lock() actually grabs gcwq->lock but was
    missing the proper annotation. Add it; this removes the following
    sparse warnings:

    kernel/workqueue.c:1214:13: warning: context imbalance in 'worker_maybe_bind_and_lock' - wrong count at exit
    arch/x86/include/asm/irqflags.h:44:9: warning: context imbalance in 'worker_rebind_fn' - unexpected unlock
    kernel/workqueue.c:1991:17: warning: context imbalance in 'rescuer_thread' - unexpected unlock
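
    The missing annotation is the usual sparse context-tracking attribute,
    e.g. (sketch):

    /* tells sparse that the function returns with gcwq->lock held */
    static bool worker_maybe_bind_and_lock(struct worker *worker)
    __acquires(&gcwq->lock)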

    Signed-off-by: Namhyung Kim
    Signed-off-by: Tejun Heo

    Namhyung Kim
     
  • Some internal functions called within gcwq->lock context release and
    re-grab the lock but were missing proper annotations. Add them.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Tejun Heo

    Namhyung Kim
     

22 Aug, 2010

1 commit

  • With the introduction of the new unified work queue thread pools,
    we lost one feature: It's no longer possible to know which worker
    is causing the CPU to wake out of idle. The result is that PowerTOP
    now reports a lot of "kworker/a:b" instead of more readable results.

    This patch adds a pair of tracepoints to the new workqueue code,
    similar in style to the timer/hrtimer tracepoints.

    With this pair of tracepoints, the next PowerTOP can correctly
    report which work item caused the wakeup (and how long it took):

    Interrupt (43) i915 time 3.51ms wakeups 141
    Work ieee80211_iface_work time 0.81ms wakeups 29
    Work do_dbs_timer time 0.55ms wakeups 24
    Process Xorg time 21.36ms wakeups 4
    Timer sched_rt_period_timer time 0.01ms wakeups 1
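
    The tracepoints bracket work execution in process_one_work(), roughly
    (sketch; f is the work function pointer):

    trace_workqueue_execute_start(work);
    f(work);
    trace_workqueue_execute_end(work);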

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

16 Aug, 2010

1 commit


09 Aug, 2010

1 commit

  • Commit 6ee0578b (workqueue: mark init_workqueues as early_initcall)
    made workqueue SMP initialization depend on workqueue_cpu_callback(),
    which however was registered via hotcpu_notifier() and didn't get
    called if CONFIG_HOTPLUG_CPU is not set. This made gcwqs on non-boot
    CPUs not create their initial workers, leading to boot failures. Fix
    it by making it a cpu_notifier.
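
    The fix is essentially (sketch; the priority argument shown is an
    assumption):

    -       hotcpu_notifier(workqueue_cpu_callback, CPU_PRI_WORKQUEUE);
    +       cpu_notifier(workqueue_cpu_callback, CPU_PRI_WORKQUEUE);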

    Signed-off-by: Tejun Heo
    Reported-and-bisected-by: walt
    Tested-by: Markus Trippelsdorf

    Tejun Heo
     

08 Aug, 2010

2 commits

  • The works variable in schedule_on_each_cpu() is a percpu pointer but
    was missing the __percpu markup. Add it.
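
    The change amounts to (sketch):

    -       struct work_struct *works;
    +       struct work_struct __percpu *works;

            works = alloc_percpu(struct work_struct);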

    Signed-off-by: Namhyung Kim
    Signed-off-by: Tejun Heo

    Namhyung Kim
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
    workqueue: mark init_workqueues() as early_initcall()
    workqueue: explain for_each_*cwq_cpu() iterators
    fscache: fix build on !CONFIG_SYSCTL
    slow-work: kill it
    gfs2: use workqueue instead of slow-work
    drm: use workqueue instead of slow-work
    cifs: use workqueue instead of slow-work
    fscache: drop references to slow-work
    fscache: convert operation to use workqueue instead of slow-work
    fscache: convert object to use workqueue instead of slow-work
    workqueue: fix how cpu number is stored in work->data
    workqueue: fix mayday_mask handling on UP
    workqueue: fix build problem on !CONFIG_SMP
    workqueue: fix locking in retry path of maybe_create_worker()
    async: use workqueue for worker pool
    workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
    workqueue: implement unbound workqueue
    workqueue: prepare for WQ_UNBOUND implementation
    libata: take advantage of cmwq and remove concurrency limitations
    workqueue: fix worker management invocation without pending works
    ...

    Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
    include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c

    Linus Torvalds
     

01 Aug, 2010

2 commits

  • Mark init_workqueues() as early_initcall() so that it is initialized
    before SMP bringup. init_workqueues() registers a hotcpu notifier,
    so it copes with processors that are brought online after the
    workqueues are initialized.

    x86 smp bringup code uses workqueues and uses a workaround for the
    cold boot process (as the workqueues are initialized post smp_init()).
    Marking init_workqueues() as early_initcall() will pave the way for
    cleaning up this code.
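
    In kernel/workqueue.c this boils down to (sketch):

    early_initcall(init_workqueues);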

    Signed-off-by: Suresh Siddha
    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Andrew Morton

    Suresh Siddha
     
  • for_each_*cwq_cpu() are similar to the regular CPU iterators except
    that they also consider the pseudo CPU number used for unbound
    workqueues. Explain them.

    Signed-off-by: Tejun Heo
    Cc: Andrew Morton

    Tejun Heo
     

23 Jul, 2010

1 commit

  • Once a work starts execution, its data contains the cpu number it
    was on instead of pointing to the cwq. This was added by commit
    7a22ad75 (workqueue: carry cpu number in work data once execution
    starts) to reliably determine the cpu the work was last on even if
    the workqueue itself was destroyed in between.

    Whether data points to a cwq or contains a cpu number was
    distinguished by comparing the value against PAGE_OFFSET. The
    assumption was that a cpu number should be below PAGE_OFFSET while a
    pointer to cwq should be above it. However, on architectures which
    use separate address spaces for user and kernel spaces, this doesn't
    hold as PAGE_OFFSET is zero.

    Fix it by using an explicit flag, WORK_STRUCT_CWQ, to mark what the
    data field contains. If the flag is set, it's pointing to a cwq;
    otherwise, it contains a cpu number.
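
    With the flag, decoding the data field becomes unambiguous; the
    resulting helper is roughly (sketch; mask names per this era):

    static struct cpu_workqueue_struct *get_work_cwq(struct work_struct *work)
    {
            unsigned long data = atomic_long_read(&work->data);

            if (data & WORK_STRUCT_CWQ)
                    return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
            else
                    return NULL;
    }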

    Reported on s390 and microblaze during linux-next testing.

    Signed-off-by: Tejun Heo
    Reported-by: Sachin Sant
    Reported-by: Michal Simek
    Reported-by: Martin Schwidefsky
    Tested-by: Martin Schwidefsky
    Tested-by: Michal Simek

    Tejun Heo
     

20 Jul, 2010

2 commits

  • All cpumasks are assumed to have cpu 0 permanently set on UP, so a
    cpumask can't be used to signify whether there's something to be done
    for the CPU. workqueue was using a cpumask to track which CPU
    requested rescuer assistance, and this led the rescuer thread to
    think there always are pending mayday requests on UP, which resulted
    in infinite busy loops.

    This patch fixes the problem by introducing mayday_mask_t and
    associated helpers which wrap cpumask on SMP and emulates its behavior
    using bitops and unsigned long on UP.
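
    The wrapper boils down to something like this (a trimmed sketch; the
    SMP side maps straight to cpumask operations, the UP side to bit 0 of
    an unsigned long):

    #ifdef CONFIG_SMP
    typedef cpumask_var_t mayday_mask_t;
    #define mayday_test_and_set_cpu(cpu, mask)  cpumask_test_and_set_cpu((cpu), (mask))
    #define mayday_clear_cpu(cpu, mask)         cpumask_clear_cpu((cpu), (mask))
    #define for_each_mayday_cpu(cpu, mask)      for_each_cpu((cpu), (mask))
    #else
    typedef unsigned long mayday_mask_t;
    #define mayday_test_and_set_cpu(cpu, mask)  test_and_set_bit(0, &(mask))
    #define mayday_clear_cpu(cpu, mask)         clear_bit(0, &(mask))
    #define for_each_mayday_cpu(cpu, mask)      if ((cpu) = 0, (mask))
    #endif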

    Signed-off-by: Tejun Heo
    Cc: Rusty Russell

    Tejun Heo
     
  • Commit f3421797 (workqueue: implement unbound workqueue) incorrectly
    tested CONFIG_SMP as part of a C expression in alloc/free_cwqs(). As
    CONFIG_SMP is not defined in UP, this breaks the build. Fix it by
    using #ifdef instead.
    Found during linux-next build test.

    Signed-off-by: Tejun Heo
    Reported-by: Stephen Rothwell

    Tejun Heo
     

14 Jul, 2010

1 commit


02 Jul, 2010

5 commits

  • WQ_SINGLE_CPU combined with @max_active of 1 is used to achieve full
    ordering among works queued to a workqueue. The same can be achieved
    using WQ_UNBOUND as unbound workqueues always use the gcwq for
    WORK_CPU_UNBOUND. As @max_active is always one and the benefit of
    cpu locality isn't accessible anyway, serving them with unbound
    workqueues should be fine.

    Drop WQ_SINGLE_CPU support and use WQ_UNBOUND instead. Note that most
    single thread workqueue users will be converted to use multithread or
    non-reentrant instead and only the ones which require strict ordering
    will keep using WQ_UNBOUND + @max_active of 1.
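
    Strict ordering is then requested like this (a sketch; the name is
    arbitrary):

    /* one unbound execution context => works run strictly in queueing order */
    wq = alloc_workqueue("my_ordered", WQ_UNBOUND, 1);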

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • This patch implements unbound workqueue which can be specified with
    WQ_UNBOUND flag on creation. An unbound workqueue has the following
    properties.

    * It uses a dedicated gcwq with a pseudo CPU number WORK_CPU_UNBOUND.
    This gcwq is always online and disassociated.

    * Workers are not bound to any CPU and not concurrency managed. Works
    are dispatched to workers as soon as possible and the only applied
    limitation is @max_active. IOW, all unbound workqueues are
    implicitly high priority.

    Unbound workqueues can be used as a simple execution context
    provider. Contexts unbound to any cpu are served as soon as possible.

    Signed-off-by: Tejun Heo
    Cc: Arjan van de Ven
    Cc: David Howells

    Tejun Heo
     
  • In preparation of WQ_UNBOUND addition, make the following changes.

    * Add WORK_CPU_* constants for pseudo cpu id numbers used (currently
    only WORK_CPU_NONE) and use them instead of NR_CPUS. This is to
    allow another pseudo cpu id for unbound cpu.

    * Reorder WQ_* flags.

    * Make workqueue_struct->cpu_wq a union which contains a percpu
    pointer, regular pointer and an unsigned long value, and use
    kzalloc/kfree() in the UP allocation path (see the sketch after this
    list). This will be used to implement unbound workqueues which will
    use only one cwq on SMPs.

    * Move alloc_cwqs() allocation after initialization of wq fields, so
    that alloc_cwqs() has access to wq->flags.

    * Trivial relocation of wq local variables in freeze functions.

    These changes don't cause any functional change.
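
    The cpu_wq union mentioned in the list above looks roughly like this
    inside workqueue_struct (sketch):

    union {
            struct cpu_workqueue_struct __percpu    *pcpu;
            struct cpu_workqueue_struct             *single;
            unsigned long                            v;
    } cpu_wq;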

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • When there's no pending work to do, worker_thread() goes back to sleep
    after waking up without checking whether worker management is
    necessary. This means that idle worker exit requests can be ignored
    if the gcwq stays empty.

    Fix it by making worker_thread() always check whether worker
    management is necessary before going to sleep.
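
    The fix adds the management check to the sleep path of
    worker_thread(), roughly (sketch):

    sleep:
            if (unlikely(need_to_manage_workers(gcwq)) && manage_workers(worker))
                    goto recheck;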

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • get_work_gcwq() was incorrectly triggering BUG_ON() if cpu number is
    equal to or higher than num_possible_cpus() instead of nr_cpu_ids.
    Fix it.

    Signed-off-by: Tejun Heo

    Tejun Heo