01 Aug, 2010

2 commits

  • Mark init_workqueues() as early_initcall() and thus it will be initialized
    before smp bringup. init_workqueues() registers for the hotcpu notifier
    and thus it should cope with the processors that are brought online after
    the workqueues are initialized.

    x86 smp bringup code uses workqueues and uses a workaround for the
    cold boot process (as the workqueues are initialized post smp_init()).
    Marking init_workqueues() as early_initcall() will pave the way for
    cleaning up this code.

    Signed-off-by: Suresh Siddha
    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Andrew Morton

    Suresh Siddha
     
  • for_each_*cwq_cpu() are similar to regular CPU iterators except that
    they also consider the pseudo CPU number used for unbound workqueues.
    Explain them.

    Signed-off-by: Tejun Heo
    Cc: Andrew Morton

    Tejun Heo
     

24 Jul, 2010

1 commit

  • Commit 8b8edefa (fscache: convert object to use workqueue instead of
    slow-work) made fscache_exit() call unregister_sysctl_table()
    unconditionally, breaking the build when sysctl is disabled. Fix it by
    putting it inside CONFIG_SYSCTL.

    Signed-off-by: Tejun Heo
    Reported-by: Randy Dunlap
    Cc: David Howells

    Tejun Heo
     

23 Jul, 2010

8 commits

  • slow-work doesn't have any user left. Kill it.

    Signed-off-by: Tejun Heo
    Acked-by: David Howells

    Tejun Heo
     
  • Workqueue can now handle high concurrency. Convert gfs to use
    workqueue instead of slow-work.

    * Steven pointed out that recovery path might be run from allocation
    path and thus requires forward progress guarantee without memory
    allocation. Create and use gfs_recovery_wq with rescuer. Please
    note that forward progress wasn't guaranteed with slow-work.

    * Updated to use non-reentrant workqueue.

    Signed-off-by: Tejun Heo
    Acked-by: Steven Whitehouse

    Tejun Heo
     
  • Workqueue can now handle high concurrency. Convert drm_crtc_helper to
    use system_nrt_wq instead of slow-work. The conversion is mostly
    straightforward. One difference is that drm_helper_hpd_irq_event()
    no longer blocks and can be called from any context.

    Signed-off-by: Tejun Heo
    Acked-by: David Airlie
    Cc: dri-devel@lists.freedesktop.org

    Tejun Heo
     
  • Workqueue can now handle high concurrency. Use system_nrt_wq
    instead of slow-work.

    * Updated is_valid_oplock_break() to not call cifs_oplock_break_put()
    as advised by Steve French. It might cause deadlock. Instead,
    reference is increased after queueing succeeded and
    cifs_oplock_break() briefly grabs GlobalSMBSeslock before putting
    the cfile to make sure it doesn't put before the matching get is
    finished.

    * Anton Blanchard reported that the cifs conversion was using the
    now-gone system_single_wq. Use system_nrt_wq, which provides a
    non-reentrance guarantee; that is enough and much better.

    Signed-off-by: Tejun Heo
    Acked-by: Steve French
    Cc: Anton Blanchard

    Tejun Heo
     
  • fscache no longer uses slow-work. Drop references to it.

    Signed-off-by: Tejun Heo
    Acked-by: David Howells

    Tejun Heo
     
  • Make fscache operations use only workqueue instead of a combination of
    workqueue and slow-work. FSCACHE_OP_SLOW is dropped and
    FSCACHE_OP_FAST is renamed to FSCACHE_OP_ASYNC and uses newly added
    fscache_op_wq workqueue to execute op->processor().
    fscache_operation_init_slow() is dropped and fscache_operation_init()
    now takes @processor argument directly.

    * Unbound workqueue is used.

    * fscache_retrieval_work() is no longer necessary as OP_ASYNC now does
    the equivalent thing.

    * sysctl fscache.operation_max_active added to control concurrency.
    The default value is nr_cpus clamped between 2 and
    WQ_UNBOUND_MAX_ACTIVE.

    * debugfs support is dropped for now. Tracing API based debug
    facility is planned to be added.
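
    A minimal user-space sketch of the default described above (nr_cpus
    clamped between 2 and WQ_UNBOUND_MAX_ACTIVE). The cap value and the
    helper names are hypothetical stand-ins; the real constant is defined
    by the workqueue code and is not reproduced here.

```c
#include <assert.h>

/* Hypothetical stand-in for WQ_UNBOUND_MAX_ACTIVE; the real value may differ. */
#define EXAMPLE_UNBOUND_MAX_ACTIVE 512

/* Clamp val into [lo, hi]; lo must not exceed hi. */
static int clamp_int(int val, int lo, int hi)
{
	assert(lo <= hi);
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/* Default for fscache.operation_max_active: nr_cpus clamped to [2, cap]. */
static int default_operation_max_active(int nr_cpus)
{
	return clamp_int(nr_cpus, 2, EXAMPLE_UNBOUND_MAX_ACTIVE);
}
```

    With these assumptions a single-CPU machine still gets two concurrent
    operations, and a very large machine is held at the cap.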

    Signed-off-by: Tejun Heo
    Acked-by: David Howells

    Tejun Heo
     
  • Make fscache object state transition callbacks use workqueue instead
    of slow-work. New dedicated unbound CPU workqueue fscache_object_wq
    is created. get/put callbacks are renamed and modified to take
    @object and called directly from the enqueue wrapper and the work
    function. While at it, make all open coded instances of get/put
    use fscache_get/put_object().

    * Unbound workqueue is used.

    * work_busy() output is printed instead of slow-work flags in object
    debugging outputs. They mean basically the same thing bit-for-bit.

    * sysctl fscache.object_max_active added to control concurrency. The
    default value is nr_cpus clamped between 4 and
    WQ_UNBOUND_MAX_ACTIVE.

    * slow_work_sleep_till_thread_needed() is replaced with fscache
    private implementation fscache_object_sleep_till_congested() which
    waits on fscache_object_wq congestion.

    * debugfs support is dropped for now. Tracing API based debug
    facility is planned to be added.

    Signed-off-by: Tejun Heo
    Acked-by: David Howells

    Tejun Heo
     
  • Once a work starts execution, its data contains the cpu number it was
    on instead of pointing to cwq. This was added by commit 7a22ad75
    (workqueue: carry cpu number in work data once execution starts) to
    reliably determine which cpu the work was last on even if the
    workqueue itself was destroyed in between.

    Whether data points to a cwq or contains a cpu number was
    distinguished by comparing the value against PAGE_OFFSET. The
    assumption was that a cpu number should be below PAGE_OFFSET while a
    pointer to cwq should be above it. However, on architectures which
    use separate address spaces for user and kernel spaces, this doesn't
    hold as PAGE_OFFSET is zero.

    Fix it by using an explicit flag, WORK_STRUCT_CWQ, to mark what the
    data field contains. If the flag is set, it's pointing to a cwq;
    otherwise, it contains a cpu number.

    Reported on s390 and microblaze during linux-next testing.
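
    The explicit-flag approach can be sketched in user-space C as below.
    The bit layout, the EX_* names and the helper functions are
    illustrative assumptions, not the kernel's actual definitions; the
    point is only that a flag bit, unlike a comparison against
    PAGE_OFFSET, does not depend on the address space layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout: the low bits of the data word carry flags. */
#define EX_FLAG_BITS 2u
#define EX_FLAG_CWQ  (1u << 1)                  /* data holds a cwq pointer */
#define EX_FLAG_MASK ((1u << EX_FLAG_BITS) - 1)

struct ex_cwq { int dummy; };

/* Store a cwq pointer; its low flag bits must be free (aligned pointer). */
static uintptr_t ex_pack_cwq(struct ex_cwq *cwq)
{
	uintptr_t addr = (uintptr_t)cwq;

	assert((addr & EX_FLAG_MASK) == 0);
	return addr | EX_FLAG_CWQ;
}

/* Store a cpu number above the flag bits, leaving the CWQ flag clear. */
static uintptr_t ex_pack_cpu(unsigned int cpu)
{
	return (uintptr_t)cpu << EX_FLAG_BITS;
}

static bool ex_data_is_cwq(uintptr_t data)
{
	return (data & EX_FLAG_CWQ) != 0;
}

/* Returns the cwq if the flag is set, NULL if data holds a cpu number. */
static struct ex_cwq *ex_data_cwq(uintptr_t data)
{
	if (!ex_data_is_cwq(data))
		return NULL;
	return (struct ex_cwq *)(data & ~(uintptr_t)EX_FLAG_MASK);
}

static unsigned int ex_data_cpu(uintptr_t data)
{
	assert(!ex_data_is_cwq(data));
	return (unsigned int)(data >> EX_FLAG_BITS);
}
```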

    Signed-off-by: Tejun Heo
    Reported-by: Sachin Sant
    Reported-by: Michal Simek
    Reported-by: Martin Schwidefsky
    Tested-by: Martin Schwidefsky
    Tested-by: Michal Simek

    Tejun Heo
     

20 Jul, 2010

2 commits

  • All cpumasks are assumed to have cpu 0 permanently set on UP, so it
    can't be used to signify whether there's something to be done for the
    CPU. workqueue was using cpumask to track which CPU requested rescuer
    assistance and this led rescuer thread to think there always are
    pending mayday requests on UP, which resulted in infinite busy loops.

    This patch fixes the problem by introducing mayday_mask_t and
    associated helpers which wrap cpumask on SMP and emulate its behavior
    using bitops and unsigned long on UP.
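
    A rough sketch of the UP side of such a wrapper, in user-space C. The
    ex_* names are hypothetical and the operations here are not atomic
    (kernel code would use atomic bitops); it only shows that a plain
    unsigned long starts out empty, so "no pending request" is
    representable.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical UP flavour: one unsigned long instead of a cpumask. */
typedef unsigned long ex_mayday_mask_t;

#define EX_MASK_BITS (sizeof(ex_mayday_mask_t) * CHAR_BIT)

/* Set the bit for cpu and report whether it was already set. */
static bool ex_mayday_test_and_set_cpu(unsigned int cpu, ex_mayday_mask_t *mask)
{
	bool was_set;

	assert(cpu < EX_MASK_BITS);
	was_set = ((*mask >> cpu) & 1UL) != 0;
	*mask |= 1UL << cpu;
	return was_set;
}

/* Clear the bit for cpu once its request has been served. */
static void ex_mayday_clear_cpu(unsigned int cpu, ex_mayday_mask_t *mask)
{
	assert(cpu < EX_MASK_BITS);
	*mask &= ~(1UL << cpu);
}

/* True while at least one cpu still has a pending request. */
static bool ex_mayday_pending(const ex_mayday_mask_t *mask)
{
	return *mask != 0;
}
```

    A rescuer loop built on this stops once every bit has been cleared,
    which is the property the UP cpumask could not provide.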

    Signed-off-by: Tejun Heo
    Cc: Rusty Russell

    Tejun Heo
     
  • Commit f3421797 (workqueue: implement unbound workqueue) incorrectly
    tested CONFIG_SMP as part of a C expression in alloc/free_cwqs(). As
    CONFIG_SMP is not defined in UP, this breaks the build. Fix it by using
    a preprocessor conditional (#ifdef) instead.

    Found during linux-next build test.
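
    The failure mode can be illustrated outside the kernel. In this
    sketch EX_CONFIG_SMP is a made-up macro that is deliberately left
    undefined, the way CONFIG_SMP is on UP builds, and
    ex_nr_cwqs_to_alloc() is a hypothetical helper, not the real
    alloc_cwqs().

```c
#include <assert.h>

/*
 * Broken form: an undefined macro used inside a C expression is just an
 * undeclared identifier, so a line such as
 *
 *     int n = EX_CONFIG_SMP ? nr_possible_cpus : 1;
 *
 * fails to compile whenever EX_CONFIG_SMP is not defined.
 */

/* Working form: let the preprocessor pick the branch. */
static int ex_nr_cwqs_to_alloc(int nr_possible_cpus)
{
	assert(nr_possible_cpus >= 1);
#ifdef EX_CONFIG_SMP
	return nr_possible_cpus;	/* one cwq per possible cpu */
#else
	return 1;			/* UP: a single cwq is enough */
#endif
}
```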

    Signed-off-by: Tejun Heo
    Reported-by: Stephen Rothwell

    Tejun Heo
     

14 Jul, 2010

2 commits


02 Jul, 2010

8 commits

  • WQ_SINGLE_CPU combined with @max_active of 1 is used to achieve full
    ordering among works queued to a workqueue. The same can be achieved
    using WQ_UNBOUND as unbound workqueues always use the gcwq for
    WORK_CPU_UNBOUND. As @max_active is always one and benefits from cpu
    locality aren't accessible anyway, serving them with unbound workqueues
    should be fine.

    Drop WQ_SINGLE_CPU support and use WQ_UNBOUND instead. Note that most
    single thread workqueue users will be converted to use multithread or
    non-reentrant instead and only the ones which require strict ordering
    will keep using WQ_UNBOUND + @max_active of 1.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • This patch implements unbound workqueue which can be specified with
    WQ_UNBOUND flag on creation. An unbound workqueue has the following
    properties.

    * It uses a dedicated gcwq with a pseudo CPU number WORK_CPU_UNBOUND.
    This gcwq is always online and disassociated.

    * Workers are not bound to any CPU and not concurrency managed. Works
    are dispatched to workers as soon as possible and the only applied
    limitation is @max_active. IOW, all unbound workqueues are
    implicitly high priority.

    Unbound workqueues can be used as a simple execution context provider.
    Contexts unbound to any cpu are served as soon as possible.

    Signed-off-by: Tejun Heo
    Cc: Arjan van de Ven
    Cc: David Howells

    Tejun Heo
     
  • In preparation of WQ_UNBOUND addition, make the following changes.

    * Add WORK_CPU_* constants for pseudo cpu id numbers used (currently
    only WORK_CPU_NONE) and use them instead of NR_CPUS. This is to
    allow another pseudo cpu id for unbound cpu.

    * Reorder WQ_* flags.

    * Make workqueue_struct->cpu_wq a union which contains a percpu
    pointer, regular pointer and an unsigned long value and use
    kzalloc/kfree() in UP allocation path. This will be used to
    implement unbound workqueues which will use only one cwq on SMPs.

    * Move alloc_cwqs() allocation after initialization of wq fields, so
    that alloc_cwqs() has access to wq->flags.

    * Trivial relocation of wq local variables in freeze functions.

    These changes don't cause any functional change.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • libata has two concurrency related limitations.

    a. ata_wq which is used for polling PIO has a single thread per CPU. If
    there are multiple devices doing polling PIO on the same CPU, they
    can't be executed simultaneously.

    b. ata_aux_wq which is used for SCSI probing has a single thread. In
    cases where SCSI probing is stalled for an extended period of time,
    which is possible for ATAPI devices, this will stall all probing.

    #a is solved by increasing maximum concurrency of ata_wq. Please note
    that polling PIO might be used under allocation path and thus needs to
    be served by a separate wq with a rescuer.

    #b is solved by using the default wq instead and achieving exclusion
    via per-port mutex.

    Signed-off-by: Tejun Heo
    Acked-by: Jeff Garzik

    Tejun Heo
     
  • When there's no pending work to do, worker_thread() goes back to sleep
    after waking up without checking whether worker management is
    necessary. This means that idle worker exit requests can be ignored
    if the gcwq stays empty.

    Fix it by making worker_thread() always check whether worker
    management is necessary before going to sleep.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • get_work_gcwq() was incorrectly triggering BUG_ON() if cpu number is
    equal to or higher than num_possible_cpus() instead of nr_cpu_ids.
    Fix it.
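
    Why the two bounds differ can be shown with a sparse possible mask.
    The sketch below models the mask as a plain unsigned long; the ex_*
    helpers are illustrative analogues of num_possible_cpus() (number of
    set bits) and nr_cpu_ids (highest possible cpu id plus one), not the
    kernel implementations.

```c
#include <assert.h>

/* Analogue of num_possible_cpus(): how many bits are set in the mask. */
static unsigned int ex_num_possible(unsigned long possible_mask)
{
	unsigned int n = 0;

	for (; possible_mask; possible_mask &= possible_mask - 1)
		n++;
	return n;
}

/* Analogue of nr_cpu_ids: index of the highest set bit, plus one. */
static unsigned int ex_nr_cpu_ids(unsigned long possible_mask)
{
	unsigned int ids = 0;

	for (; possible_mask; possible_mask >>= 1)
		ids++;
	return ids;
}

/* A cpu id is in range if it is below nr_cpu_ids, not below the count. */
static int ex_cpu_id_valid(unsigned int cpu, unsigned long possible_mask)
{
	assert(possible_mask != 0);	/* at least one cpu must be possible */
	return cpu < ex_nr_cpu_ids(possible_mask);
}
```

    With cpus 0, 2 and 4 possible, the count is 3 but cpu 4 is a valid id,
    so a check against the count would fire on a legitimate cpu.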

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • When one flusher is cascading to the next flusher, it first sets
    wq->first_flusher to the next one and sets up the next flush cycle.
    If there's nothing to do for the next cycle, it clears
    wq->first_flusher and proceeds to the one after that.

    If the woken up flusher checks wq->first_flusher before it gets
    cleared, it will incorrectly assume the role of the first flusher,
    which triggers BUG_ON() sanity check.

    Fix it by checking wq->first_flusher again after grabbing the mutex.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • worker_set/clr_flags() assume that if none of NOT_RUNNING flags is set
    the worker must be contributing to nr_running which is only true if
    the worker is actually running.

    When called from the worker itself, the worker is guaranteed to be
    running, so those functions can be safely used from the worker itself,
    and they aren't necessary from other places anyway. Make the following
    changes to fix the bug.

    * Make worker_set/clr_flags() whine if not called from self.

    * Convert all places which called those functions from other tasks to
    manipulate flags directly.

    * Make trustee_thread() directly clear nr_running after setting
    WORKER_ROGUE on all workers. This is the only place where
    nr_running manipulation is necessary outside of workers themselves.

    * While at it, add sanity check for nr_running in worker_enter_idle().

    Signed-off-by: Tejun Heo

    Tejun Heo
     

29 Jun, 2010

17 commits

  • This patch implements cpu intensive workqueue which can be specified
    with WQ_CPU_INTENSIVE flag on creation. Works queued to a cpu
    intensive workqueue don't participate in concurrency management. IOW,
    they don't contribute to gcwq->nr_running and thus don't delay
    execution of other works.

    Note that although cpu intensive works won't delay other works, they
    can be delayed by other works. Combine with WQ_HIGHPRI to avoid being
    delayed by other works too.

    As the name suggests this is useful when using workqueue for cpu
    intensive works. Workers executing cpu intensive works are not
    considered for workqueue concurrency management and left for the
    scheduler to manage.

    Signed-off-by: Tejun Heo
    Cc: Andrew Morton

    Tejun Heo
     
  • This patch implements high priority workqueue which can be specified
    with WQ_HIGHPRI flag on creation. A high priority workqueue has the
    following properties.

    * A work queued to it is queued at the head of the worklist of the
    respective gcwq after other highpri works, while normal works are
    always appended at the end.

    * As long as there are highpri works on gcwq->worklist,
    [__]need_more_worker() remains %true and process_one_work() wakes up
    another worker before it starts executing a work.

    The above two properties guarantee that works queued to high priority
    workqueues are dispatched to workers and start execution as soon as
    possible regardless of the state of other works.
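
    The queueing rule in the first property can be sketched with a small
    array-backed worklist. This is a user-space illustration with
    hypothetical ex_* names; the kernel uses linked lists and its own
    data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EX_MAX_WORKS 16

struct ex_work {
	int id;
	bool highpri;
};

struct ex_worklist {
	struct ex_work items[EX_MAX_WORKS];
	size_t len;
};

/*
 * Normal works are appended at the tail. A highpri work goes to the head
 * of the list, but after any highpri works that are already queued.
 */
static void ex_queue(struct ex_worklist *wl, int id, bool highpri)
{
	size_t pos = wl->len;
	size_t i;

	assert(wl->len < EX_MAX_WORKS);
	if (highpri) {
		/* Skip the leading run of highpri works. */
		pos = 0;
		while (pos < wl->len && wl->items[pos].highpri)
			pos++;
	}
	/* Shift everything from pos onward one slot towards the tail. */
	for (i = wl->len; i > pos; i--)
		wl->items[i] = wl->items[i - 1];
	wl->items[pos].id = id;
	wl->items[pos].highpri = highpri;
	wl->len++;
}
```

    Highpri works therefore keep FIFO order among themselves while always
    running ahead of normal works queued earlier.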

    Signed-off-by: Tejun Heo
    Cc: Andi Kleen
    Cc: Andrew Morton

    Tejun Heo
     
  • Implement the following utility APIs.

    workqueue_set_max_active() : adjust max_active of a wq
    workqueue_congested() : test whether a wq is congested
    work_cpu() : determine the last / current cpu of a work
    work_busy() : query whether a work is busy

    * Anton Blanchard fixed missing ret initialization in work_busy().

    Signed-off-by: Tejun Heo
    Cc: Anton Blanchard

    Tejun Heo
     
  • This patch makes changes to make new workqueue features available to
    its users.

    * Now that workqueue is more featureful, there should be a public
    workqueue creation function which takes parameters to control them.
    Rename __create_workqueue() to alloc_workqueue() and make 0
    max_active mean WQ_DFL_ACTIVE. In the long run, all
    create_workqueue_*() will be converted over to alloc_workqueue().

    * To further unify access interface, rename keventd_wq to system_wq
    and export it.

    * Add system_long_wq and system_nrt_wq. The former is to host long
    running works separately (so that flush_scheduled_work() doesn't
    take so long) and the latter guarantees any queued work item is
    never executed in parallel by multiple CPUs. These will be used by
    future patches to update workqueue users.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Define WQ_MAX_ACTIVE and create keventd with max_active set to half of
    it, which means that keventd can now process up to WQ_MAX_ACTIVE / 2 - 1
    works concurrently. Unless some combination can result in dependency
    loop longer than max_active, deadlock won't happen and thus it's
    unnecessary to check whether current_is_keventd() before trying to
    schedule a work. Kill current_is_keventd().

    (Lockdep annotations are broken. We need lock_map_acquire_read_norecurse())

    Signed-off-by: Tejun Heo
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Christoph Lameter
    Cc: Tony Luck
    Cc: Andi Kleen
    Cc: Oleg Nesterov

    Tejun Heo
     
  • Instead of creating a worker for each cwq and putting it into the
    shared pool, manage per-cpu workers dynamically.

    Works aren't supposed to be cpu cycle hogs and maintaining just enough
    concurrency to prevent work processing from stalling due to lack of
    processing context is optimal. gcwq keeps the number of concurrent
    active workers to minimum but no less. As long as there's one or more
    running workers on the cpu, no new worker is scheduled so that works
    can be processed in batch as much as possible but when the last
    running worker blocks, gcwq immediately schedules new worker so that
    the cpu doesn't sit idle while there are works to be processed.
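
    The wake-up rule in the paragraph above can be modeled with a couple
    of counters. This is a simplified user-space sketch; the struct, the
    ex_* names and the assumption that a woken worker immediately counts
    as running and takes one pending work are all illustrative.

```c
#include <assert.h>
#include <stdbool.h>

struct ex_gcwq {
	int nr_running;	/* workers currently running on this cpu */
	int nr_pending;	/* works waiting on the worklist */
	int nr_wakeups;	/* times an additional worker had to be woken */
};

/* Another worker is needed only if work is pending and nobody is running. */
static bool ex_need_more_worker(const struct ex_gcwq *g)
{
	return g->nr_pending > 0 && g->nr_running == 0;
}

/* Called when a running worker is about to block. */
static void ex_worker_blocks(struct ex_gcwq *g)
{
	assert(g->nr_running > 0);
	g->nr_running--;
	if (ex_need_more_worker(g)) {
		/* Last running worker blocked: wake one more right away. */
		g->nr_wakeups++;
		g->nr_running++;
		g->nr_pending--;
	}
}
```

    As long as one worker keeps running no wake-up happens, so works are
    batched; the wake-up fires only when the cpu would otherwise go idle
    with works still queued.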

    gcwq always keeps at least a single idle worker around. When a new
    worker is necessary and the worker is the last idle one, the worker
    assumes the role of "manager" and manages the worker pool -
    i.e. creates another worker. Forward-progress is guaranteed by having
    dedicated rescue workers for workqueues which may be necessary while
    creating a new worker. When the manager is having problem creating a
    new worker, mayday timer activates and rescue workers are summoned to
    the cpu and execute works which might be necessary to create new
    workers.

    Trustee is expanded to serve the role of manager while a CPU is being
    taken down and stays down. As no new works are supposed to be queued
    on a dead cpu, it just needs to drain all the existing ones. Trustee
    continues to try to create new workers and summon rescuers as long as
    there are pending works. If the CPU is brought back up while the
    trustee is still trying to drain the gcwq from the previous offlining,
    the trustee will kill all idle ones and tell workers which are still
    busy to rebind to the cpu, and pass control over to gcwq which assumes
    the manager role as necessary.

    Concurrency managed worker pool reduces the number of workers
    drastically. Only workers which are necessary to keep the processing
    going are created and kept. Also, it reduces cache footprint by
    avoiding unnecessarily switching contexts between different workers.

    Please note that this patch does not increase max_active of any
    workqueue. All workqueues can still only process one work per cpu.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Implement worker_{set|clr}_flags() to manipulate worker flags. These
    are currently simple wrappers but logics to track the current worker
    state and the current level of concurrency will be added.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Use gcwq->worklist instead of cwq->worklist and break the strict
    association between a cwq and its worker. All works queued on a cpu
    are queued on gcwq->worklist and processed by any available worker on
    the gcwq.

    As there no longer is strict association between a cwq and its worker,
    whether a work is executing can now only be determined by calling
    [__]find_worker_executing_work().

    After this change, the only association between a cwq and its worker
    is that a cwq puts a worker into shared worker pool on creation and
    kills it on destruction. As all workqueues are still limited to
    max_active of one, this means that there are always at least as many
    workers as active works and thus there's no danger for deadlock.

    The break of strong association between cwqs and workers requires
    somewhat clumsy changes to current_is_keventd() and
    destroy_workqueue(). Dynamic worker pool management will remove both
    clumsy changes. current_is_keventd() won't be necessary at all as the
    only reason it exists is to avoid queueing a work from a work which
    will be allowed just fine. The clumsy part of destroy_workqueue() is
    added because a worker can only be destroyed while idle and there's no
    guarantee a worker is idle when its wq is going down. With dynamic
    pool management, workers are not associated with workqueues at all and
    only idle ones will be submitted to destroy_workqueue() so the code
    won't be necessary anymore.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • With gcwq managing all the workers and work->data pointing to the last
    gcwq it was on, non-reentrance can be easily implemented by checking
    whether the work is still running on the previous gcwq on queueing.
    Implement it.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • To implement non-reentrant workqueue, the last gcwq a work was
    executed on must be reliably obtainable as long as the work structure
    is valid even if the previous workqueue has been destroyed.

    To achieve this, work->data will be overloaded to carry the last cpu
    number once execution starts so that the previous gcwq can be located
    reliably. This means that cwq can't be obtained from work after
    execution starts but only gcwq.

    Implement set_work_{cwq|cpu}(), get_work_[g]cwq() and
    clear_work_data() to set work data to the cpu number when starting
    execution, access the overloaded work data and clear it after
    cancellation.

    queue_delayed_work_on() is updated to preserve the last cpu while
    in-flight in timer and other callers which depended on getting cwq
    from work after execution starts are converted to depend on gcwq
    instead.

    * Anton Blanchard fixed compile error on powerpc due to missing
    linux/threads.h include.

    Signed-off-by: Tejun Heo
    Cc: Anton Blanchard

    Tejun Heo
     
  • Now that all the workers are tracked by gcwq, we can find which worker
    is executing a work from gcwq. Implement find_worker_executing_work()
    and make worker track its current_cwq so that we can find things the
    other way around. This will be used to implement non-reentrant wqs.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Reimplement st (single thread) workqueue so that it's friendly to
    shared worker pool. It was originally implemented by confining st
    workqueues to use cwq of a fixed cpu and always having a worker for
    the cpu. This implementation isn't very friendly to shared worker
    pool and suboptimal in that it ends up crossing cpu boundaries often.

    Reimplement st workqueue using dynamic single cpu binding and
    cwq->limit. WQ_SINGLE_THREAD is replaced with WQ_SINGLE_CPU. In a
    single cpu workqueue, at most single cwq is bound to the wq at any
    given time. Arbitration is done using atomic accesses to
    wq->single_cpu when queueing a work. Once bound, the binding stays
    till the workqueue is drained.

    Note that the binding is never broken while a workqueue is frozen.
    This is because idle cwqs may have works waiting in delayed_works
    queue while frozen. On thaw, the cwq is restarted if there are any
    delayed works or unbound otherwise.

    When combined with max_active limit of 1, single cpu workqueue has
    exactly the same execution properties as the original single thread
    workqueue while allowing sharing of per-cpu workers.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Reimplement CPU hotplugging support using trustee thread. On CPU
    down, a trustee thread is created and each step of CPU down is
    executed by the trustee and workqueue_cpu_callback() simply drives and
    waits for trustee state transitions.

    CPU down operation no longer waits for works to be drained but trustee
    sticks around till all pending works have been completed. If CPU is
    brought back up while works are still draining,
    workqueue_cpu_callback() tells trustee to step down and tell workers
    to rebind to the cpu.

    As it's difficult to tell whether cwqs are empty if it's freezing or
    frozen, trustee doesn't consider draining to be complete while a gcwq
    is freezing or frozen (tracked by new GCWQ_FREEZING flag). Also,
    workers which get unbound from their cpu are marked with WORKER_ROGUE.

    Trustee based implementation doesn't bring any new feature at this
    point but it will be used to manage worker pool when dynamic shared
    worker pool is implemented.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Implement worker states. Once created, a worker is STARTED. While a
    worker isn't processing a work, it's IDLE and chained on
    gcwq->idle_list. While processing a work, a worker is BUSY and
    chained on gcwq->busy_hash. Also, gcwq now counts the number of all
    workers and idle ones.

    worker_thread() is restructured to reflect state transitions.
    cwq->more_work is removed and waking up a worker makes it check for
    events. A worker is killed by setting DIE flag while it's IDLE and
    waking it up.

    This gives gcwq better visibility of what's going on and allows it to
    find out whether a work is executing quickly which is necessary to
    have multiple workers processing the same cwq.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • There is one gcwq (global cwq) per cpu and all cwqs on a cpu
    point to it. A gcwq contains a lock to be used by all cwqs on the cpu
    and an ida to give IDs to workers belonging to the cpu.

    This patch introduces gcwq, moves worker_ida into gcwq and makes all
    cwqs on the same cpu use the cpu's gcwq->lock instead of separate
    locks. gcwq->ida is now protected by gcwq->lock too.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Currently, workqueue freezing is implemented by marking the worker
    freezeable and calling try_to_freeze() from dispatch loop.
    Reimplement it using cwq->limit so that the workqueue is frozen
    instead of the worker.

    * workqueue_struct->saved_max_active is added which stores the
    specified max_active on initialization.

    * On freeze, all cwq->max_active's are quenched to zero. Freezing is
    complete when nr_active on all cwqs reach zero.

    * On thaw, all cwq->max_active's are restored to wq->saved_max_active
    and the worklist is repopulated.

    This new implementation allows having single shared pool of workers
    per cpu.
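
    The freeze/thaw steps listed above can be sketched with counters
    only. This is a user-space model with hypothetical fz_* names and a
    single cwq per workqueue; real works and lists are replaced by counts.

```c
#include <assert.h>
#include <stdbool.h>

struct fz_wq {
	int saved_max_active;	/* max_active given at creation */
	int max_active;		/* current limit; zero while frozen */
	int nr_active;		/* works currently on the worklist */
	int nr_delayed;		/* works parked until there is room */
};

static void fz_queue(struct fz_wq *wq)
{
	if (wq->nr_active < wq->max_active)
		wq->nr_active++;
	else
		wq->nr_delayed++;	/* always taken while frozen */
}

static void fz_complete(struct fz_wq *wq)
{
	assert(wq->nr_active > 0);
	wq->nr_active--;
	if (wq->nr_delayed > 0 && wq->nr_active < wq->max_active) {
		wq->nr_delayed--;
		wq->nr_active++;
	}
}

/* Freeze: quench max_active so nothing new becomes active. */
static void fz_freeze_begin(struct fz_wq *wq)
{
	wq->max_active = 0;
}

/* Freezing is complete once the already active works have drained. */
static bool fz_frozen(const struct fz_wq *wq)
{
	return wq->max_active == 0 && wq->nr_active == 0;
}

/* Thaw: restore the limit and repopulate the worklist from delayed works. */
static void fz_thaw(struct fz_wq *wq)
{
	wq->max_active = wq->saved_max_active;
	while (wq->nr_delayed > 0 && wq->nr_active < wq->max_active) {
		wq->nr_delayed--;
		wq->nr_active++;
	}
}
```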

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Add cwq->nr_active, cwq->max_active and cwq->delayed_work. nr_active
    counts the number of active works per cwq. A work is active if it's
    flushable (colored) and is on cwq's worklist. If nr_active reaches
    max_active, new works are queued on cwq->delayed_work and activated
    later as works on the cwq complete and decrement nr_active.

    cwq->max_active can be specified via the new @max_active parameter to
    __create_workqueue() and is set to 1 for all workqueues for now. As
    each cwq has only single worker now, this double queueing doesn't
    cause any behavior difference visible to its users.

    This will be used to reimplement freeze/thaw and implement shared
    worker pool.
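
    A counter-only sketch of the accounting described above, in
    user-space C. The ex_* names are hypothetical and works are
    represented by counts rather than list entries.

```c
#include <assert.h>

struct ex_cwq {
	int nr_active;	/* works on the cwq's worklist */
	int max_active;	/* upper bound for nr_active */
	int nr_delayed;	/* works waiting on the delayed list */
};

/* Queue one work: activate it if there is room, otherwise delay it. */
static void ex_queue_work(struct ex_cwq *cwq)
{
	if (cwq->nr_active < cwq->max_active)
		cwq->nr_active++;
	else
		cwq->nr_delayed++;
}

/* One active work finished: drop the count and activate a delayed one. */
static void ex_work_done(struct ex_cwq *cwq)
{
	assert(cwq->nr_active > 0);
	cwq->nr_active--;
	if (cwq->nr_delayed > 0 && cwq->nr_active < cwq->max_active) {
		cwq->nr_delayed--;
		cwq->nr_active++;
	}
}
```

    With max_active set to 1, as in this commit, three queued works run
    strictly one after another, which matches the single-worker behavior
    users already see.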

    Signed-off-by: Tejun Heo

    Tejun Heo