06 Oct, 2014

2 commits

  • Tidy up and use cond_resched_rcu_qs() when calling cond_resched() and
    reporting a potential quiescent state to RCU. Splitting the change
    this way allows easy backporting to -stable for kernel versions that
    lack cond_resched_rcu_qs().

    Signed-off-by: Joe Lawrence
    Acked-by: Tejun Heo
    Signed-off-by: Paul E. McKenney

    Joe Lawrence
     
  • Similar to the stop_machine deadlock scenario on !PREEMPT kernels
    addressed in b22ce2785d97 ("workqueue: cond_resched() after processing
    each work item"), kworker threads requeueing back-to-back with zero
    jiffy delay can stall RCU. The cond_resched() call introduced in that
    fix yields only if there are other higher-priority tasks to run, so
    force a quiescent RCU state between work items.
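
    The shape of the fix can be sketched in plain C (a userspace model,
    not the kernel code; process_work_items() and the stub are
    hypothetical stand-ins for the kworker loop and cond_resched_rcu_qs()):

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Stub for the kernel's cond_resched_rcu_qs(); in the kernel it offers
     * to reschedule and reports a quiescent state to RCU even when no
     * reschedule happens. Here we only count the reports. */
    static int qs_reports;

    static void cond_resched_rcu_qs(void)
    {
            qs_reports++;
    }

    typedef void (*work_func_t)(void *arg);

    /* Simplified model of the kworker loop: even when work items requeue
     * back-to-back with zero jiffy delay, RCU sees one quiescent state per
     * processed item, so a busy worker can no longer stall it. */
    static void process_work_items(work_func_t fn, void *arg, size_t nr)
    {
            for (size_t i = 0; i < nr; i++) {
                    fn(arg);
                    cond_resched_rcu_qs();
            }
    }

    static void bump(void *arg)
    {
            ++*(int *)arg;
    }

    int main(void)
    {
            int done = 0;

            process_work_items(bump, &done, 3);
            assert(done == 3 && qs_reports == 3);
            return 0;
    }
    ```

    The point of the sketch is only the placement: one quiescent-state
    report per work item, independent of whether a reschedule occurs.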

    Signed-off-by: Joe Lawrence
    Link: https://lkml.kernel.org/r/20140926105227.01325697@jlaw-desktop.mno.stratus.com
    Link: https://lkml.kernel.org/r/20140929115445.40221d8e@jlaw-desktop.mno.stratus.com
    Fixes: b22ce2785d97 ("workqueue: cond_resched() after processing each work item")
    Cc:
    Acked-by: Tejun Heo
    Signed-off-by: Paul E. McKenney

    Joe Lawrence
     

05 Aug, 2014

2 commits

  • Pull percpu updates from Tejun Heo:

    - Major reorganization of percpu header files which I think makes
    things a lot more readable and logical than before.

    - percpu-refcount is updated so that it requires explicit destruction
    and can be reinitialized if necessary. This was pulled into the
    block tree to replace the custom percpu refcnting implemented in
    blk-mq.

    - In the process, percpu and percpu-refcount got cleaned up a bit

    * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (21 commits)
    percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero()
    percpu-refcount: require percpu_ref to be exited explicitly
    percpu-refcount: use unsigned long for pcpu_count pointer
    percpu-refcount: add helpers for ->percpu_count accesses
    percpu-refcount: one bit is enough for REF_STATUS
    percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc()
    workqueue: stronger test in process_one_work()
    workqueue: clear POOL_DISASSOCIATED in rebind_workers()
    percpu: Use ALIGN macro instead of hand coding alignment calculation
    percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations
    percpu: preffity percpu header files
    percpu: use raw_cpu_*() to define __this_cpu_*()
    percpu: reorder macros in percpu header files
    percpu: move {raw|this}_cpu_*() definitions to include/linux/percpu-defs.h
    percpu: move generic {raw|this}_cpu_*_N() definitions to include/asm-generic/percpu.h
    percpu: only allow sized arch overrides for {raw|this}_cpu_*() ops
    percpu: reorganize include/linux/percpu-defs.h
    percpu: move accessors from include/linux/percpu.h to percpu-defs.h
    percpu: include/asm-generic/percpu.h should contain only arch-overridable parts
    percpu: introduce arch_raw_cpu_ptr()
    ...

    Linus Torvalds
     
  • Pull workqueue updates from Tejun Heo:
    "Lai has been doing a lot of cleanups of workqueue and kthread_work.
    No significant behavior change. Just a lot of cleanups all over the
    place. Some are a bit invasive but overall nothing too dangerous"

    * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    kthread_work: remove the unused wait_queue_head
    kthread_work: wake up worker only when the worker is idle
    workqueue: use nr_node_ids instead of wq_numa_tbl_len
    workqueue: remove the misnamed out_unlock label in get_unbound_pool()
    workqueue: remove the stale comment in pwq_unbound_release_workfn()
    workqueue: move rescuer pool detachment to the end
    workqueue: unfold start_worker() into create_worker()
    workqueue: remove @wakeup from worker_set_flags()
    workqueue: remove an unneeded UNBOUND test before waking up the next worker
    workqueue: wake regular worker if need_more_worker() when rescuer leave the pool
    workqueue: alloc struct worker on its local node
    workqueue: reuse the already calculated pwq in try_to_grab_pending()
    workqueue: stronger test in process_one_work()
    workqueue: clear POOL_DISASSOCIATED in rebind_workers()
    workqueue: sanity check pool->cpu in wq_worker_sleeping()
    workqueue: clear leftover flags when detached
    workqueue: remove useless WARN_ON_ONCE()
    workqueue: use schedule_timeout_interruptible() instead of open code
    workqueue: remove the empty check in too_many_workers()
    workqueue: use "pool->cpu < 0" to stand for an unbound pool

    Linus Torvalds
     

23 Jul, 2014

6 commits

  • They are the same and nr_node_ids is provided by the memory subsystem.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • After the locking was moved up to the caller of get_unbound_pool(),
    the out_unlock label no longer performs any unlock operation and its
    name became misleading, so we just remove the label and substitute
    "return pool" for its only use site, "goto out_unlock".

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • In 75ccf5950f82 ("workqueue: prepare flush_workqueue() for dynamic
    creation and destrucion of unbound pool_workqueues"), a comment about
    the synchronization for the pwq in pwq_unbound_release_workfn() was
    added. The comment claimed that the flush_mutex wasn't strictly
    necessary, which was correct at the time because the pwq was
    protected by workqueue_lock.

    It is incorrect now: wq->flush_mutex was renamed to wq->mutex and
    workqueue_lock was removed, so wq->mutex is strictly needed. The
    comment was not updated when the synchronization changed.

    This patch removes the incorrect comment and doesn't add a new one to
    explain why wq->mutex is needed here, which is obvious; wq->pwqs_node
    carries the "WQ" annotation in its definition, which documents it
    better.

    The old commit mentioned above also introduced a comment in link_pwq()
    about the synchronization. That comment is also removed in this patch
    since the whole of link_pwq() is protected by wq->mutex.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • In 51697d393922 ("workqueue: use generic attach/detach routine for
    rescuers"), the rescuer detaches itself from the pool before put_pwq()
    so that put_unbound_pool() will not destroy the rescuer-attached pool.

    This is unnecessary. worker_detach_from_pool() can be the last access
    to the pool, just as for the regular workers; put_unbound_pool() will
    wait for the rescuer to detach and then free the pool.

    So we move worker_detach_from_pool() down to match the regular
    workers.

    tj: Minor description update.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • Simply unfold the code of start_worker() into create_worker() and
    remove the original start_worker() and create_and_start_worker().

    The only trade-off is the introduced overhead of releasing and
    regrabbing pool->lock after the new worker is started. The overhead
    is acceptable since the manager is a slow path.

    Because of this new locking behavior, the newly created worker may
    grab the lock earlier than the manager and go on to process work
    items. In this case, the recheck of need_to_create_worker() may be
    true, as expected, and the manager goes on to restart, which is the
    correct behavior.

    tj: Minor updates to description and comments.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • worker_set_flags() has only two callers, each specifying %true and
    %false for @wakeup. Let's push the wake up to the caller and remove
    @wakeup from worker_set_flags(). The caller can use the following
    instead if wakeup is necessary:

    worker_set_flags();
    if (need_more_worker(pool))
            wake_up_worker(pool);

    This makes the code simpler. This patch doesn't introduce behavior
    changes.

    tj: Updated description and comments.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     

22 Jul, 2014

1 commit

  • In process_one_work():

    if ((worker->flags & WORKER_UNBOUND) && need_more_worker(pool))
            wake_up_worker(pool);

    the first test is unneeded. Even if the first test is removed, it
    doesn't affect the wake-up logic for WORKER_UNBOUND, and it will not
    introduce any useless wake-ups for normal per-cpu workers since
    nr_running is always >= 1. It will introduce useless/redundant
    wake-ups for CPU_INTENSIVE, but this case is rare and the next patch
    will also remove this redundant wake-up.
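
    The argument can be checked with a small userspace model
    (need_more_worker() here is a hypothetical reduction of the kernel
    helper to the two facts used above: work is pending and nobody is
    running):

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Reduced model of need_more_worker(): a wake-up is useful only when
     * work is pending and no worker in the pool is running. */
    static bool need_more_worker(bool work_pending, int nr_running)
    {
            return work_pending && nr_running == 0;
    }

    int main(void)
    {
            /* Normal per-cpu worker inside process_one_work(): nr_running
             * is always >= 1 (the worker itself), so even without the
             * WORKER_UNBOUND test no wake-up is issued. */
            assert(!need_more_worker(true, 1));

            /* Unbound worker: it doesn't contribute to nr_running, so
             * both the old compound test and the simplified one wake. */
            assert(need_more_worker(true, 0));
            return 0;
    }
    ```

    This is why dropping the first test changes nothing for bound
    workers: the second test is already false whenever the first was.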

    tj: Minor updates to the description and comment.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     

19 Jul, 2014

1 commit


15 Jul, 2014

1 commit

  • When create_worker() is called from a non-manager context, the struct
    worker is allocated from the node of the caller, which may differ
    from pool->node.

    So we add a node ID argument to alloc_worker() to ensure the struct
    worker is allocated from the preferred node.

    tj: @nid renamed to @node for consistency.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     

11 Jul, 2014

1 commit


07 Jul, 2014

1 commit

  • When hot-adding and onlining a CPU, a kernel panic occurs with the
    following call trace.

    BUG: unable to handle kernel paging request at 0000000000001d08
    IP: [] __alloc_pages_nodemask+0x9d/0xb10
    PGD 0
    Oops: 0000 [#1] SMP
    ...
    Call Trace:
    [] ? cpumask_next_and+0x35/0x50
    [] ? find_busiest_group+0x113/0x8f0
    [] ? deactivate_slab+0x349/0x3c0
    [] new_slab+0x91/0x300
    [] __slab_alloc+0x2bb/0x482
    [] ? copy_process.part.25+0xfc/0x14c0
    [] ? load_balance+0x218/0x890
    [] ? sched_clock+0x9/0x10
    [] ? trace_clock_local+0x9/0x10
    [] kmem_cache_alloc_node+0x8c/0x200
    [] copy_process.part.25+0xfc/0x14c0
    [] ? trace_buffer_unlock_commit+0x4d/0x60
    [] ? kthread_create_on_node+0x140/0x140
    [] do_fork+0xbc/0x360
    [] kernel_thread+0x26/0x30
    [] kthreadd+0x2c2/0x300
    [] ? kthread_create_on_cpu+0x60/0x60
    [] ret_from_fork+0x7c/0xb0
    [] ? kthread_create_on_cpu+0x60/0x60

    In my investigation, I found the root cause to be
    wq_numa_possible_cpumask. All entries of wq_numa_possible_cpumask are
    allocated by alloc_cpumask_var_node(), and these entries are used
    without being initialized, so they hold garbage values.

    When hot-adding and onlining a CPU, wq_update_unbound_numa() is
    called. wq_update_unbound_numa() calls alloc_unbound_pwq(), which
    calls get_unbound_pool(). In get_unbound_pool(), worker_pool->node is
    set as follows:

    /* if cpumask is contained inside a NUMA node, we belong to that node */
    if (wq_numa_enabled) {
            for_each_node(node) {
                    if (cpumask_subset(pool->attrs->cpumask,
                                       wq_numa_possible_cpumask[node])) {
                            pool->node = node;
                            break;
                    }
            }
    }

    But wq_numa_possible_cpumask[node] does not hold a correct cpumask,
    so the wrong node is selected and the kernel panic occurs.

    This patch allocates all entries of wq_numa_possible_cpumask with
    zalloc_cpumask_var_node() so that they are initialized, and the panic
    disappears.
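
    The nature of the fix can be illustrated with a userspace analogy
    (not the kernel API): zalloc_cpumask_var_node() relates to
    alloc_cpumask_var_node() the way calloc() relates to malloc(),
    handing back an all-clear mask instead of indeterminate bits, so
    subset tests operate on a valid value:

    ```c
    #include <assert.h>
    #include <stdlib.h>

    /* Toy mask of MASK_WORDS machine words, standing in for a cpumask. */
    #define MASK_WORDS 2

    /* Zeroed allocation, the moral equivalent of zalloc_cpumask_var_node():
     * every bit of the mask starts cleared rather than indeterminate. */
    static unsigned long *zalloc_mask(void)
    {
            return calloc(MASK_WORDS, sizeof(unsigned long));
    }

    int main(void)
    {
            unsigned long *mask = zalloc_mask();

            assert(mask != NULL);
            for (int i = 0; i < MASK_WORDS; i++)
                    assert(mask[i] == 0);   /* defined, all-clear contents */
            free(mask);
            return 0;
    }
    ```

    Reading a malloc()-style allocation before writing it would be the
    userspace counterpart of the bug: the subset test would consult
    whatever garbage happened to be in memory.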

    Signed-off-by: Yasuaki Ishimatsu
    Reviewed-by: Lai Jiangshan
    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org
    Fixes: bce903809ab3 ("workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]")

    Yasuaki Ishimatsu
     

02 Jul, 2014

2 commits

  • When POOL_DISASSOCIATED is cleared, the running worker's local CPU
    should be the same as pool->cpu without any exception, even during
    cpu-hotplug.

    This patch changes "(proposition_A && proposition_B && proposition_C)"
    to "(proposition_B && proposition_C)": if the old compound proposition
    is true, the new one must be true too, so this won't hide any bug
    that the old test could catch.

    tj: Minor description update and dropped the obvious comment.

    CC: Jason J. Herne
    CC: Sasha Levin
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • a9ab775bcadf ("workqueue: directly restore CPU affinity of workers
    from CPU_ONLINE") moved pool locking into rebind_workers() but left
    "pool->flags &= ~POOL_DISASSOCIATED" in workqueue_cpu_up_callback().

    There is nothing necessarily wrong with it, but there is no benefit
    either. Let's move it into rebind_workers() and achieve the following
    benefits:

    1) better readability, POOL_DISASSOCIATED is cleared in rebind_workers()
    as expected.

    2) we can guarantee that, when POOL_DISASSOCIATED is clear, the
    running workers of the pool are on the local CPU (pool->cpu).

    tj: Minor description update.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     

24 Jun, 2014

1 commit


20 Jun, 2014

8 commits

  • After the recent changes, when POOL_DISASSOCIATED is cleared, the
    running worker's local CPU should be the same as pool->cpu without
    any exception, even during cpu-hotplug. Update the sanity check in
    process_one_work() accordingly.

    This patch changes "(proposition_A && proposition_B && proposition_C)"
    to "(proposition_B && proposition_C)": if the old compound proposition
    is true, the new one must be true too, so this will not hide any bug
    that the old test could catch.

    tj: Minor updates to the description.

    CC: Jason J. Herne
    CC: Sasha Levin
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • The commit a9ab775bcadf ("workqueue: directly restore CPU affinity of
    workers from CPU_ONLINE") moved the pool->lock into rebind_workers()
    without also moving "pool->flags &= ~POOL_DISASSOCIATED".

    There is nothing wrong with "pool->flags &= ~POOL_DISASSOCIATED" not
    being moved together, but there isn't any benefit either. We move it
    into rebind_workers() and achieve these benefits:

    1) Better readability. POOL_DISASSOCIATED is cleared in
    rebind_workers() as expected.

    2) When POOL_DISASSOCIATED is cleared, we can ensure that all the
    running workers of the pool are on the local CPU (pool->cpu).

    tj: Cosmetic updates to the code and description.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • In theory, pool->cpu equals @cpu in wq_worker_sleeping() once
    worker->flags has been checked.

    A "pool->cpu != cpu" sanity check will help us if something goes
    wrong.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • When a worker is detached, worker->flags may still contain
    WORKER_UNBOUND or WORKER_REBOUND. This is OK in all cases:
    1) if it is a normal worker, the worker will be dead, so it is OK;
    2) if it is a rescuer, it may re-attach to a pool with these leftover
    flags, which is still correct except that it may cause unneeded
    wakeups.

    Correct but not clean, so we just clear the leftover flags.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • The @cpu is fetched via smp_processor_id() in this function,
    so the check is useless.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • schedule_timeout_interruptible(CREATE_COOLDOWN) is exactly the same as
    the original code.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • The commit ea1abd6197d5 ("workqueue: reimplement idle worker rebinding")
    used a trick that simply removes all to-be-bound idle workers from the
    idle list and lets them add themselves back after completing rebinding.

    This trick allowed @worker_pool->nr_idle to deviate from the actual
    number of idle workers on @worker_pool->idle_list. More specifically,
    nr_idle may be non-zero while ->idle_list is empty. All users of
    ->nr_idle and ->idle_list were audited; the only affected one was
    too_many_workers(), which was updated to return %false if ->idle_list
    is empty regardless of ->nr_idle.

    The trick was complicated because it tried to simplify an even more
    complicated problem (workers had to rebind themselves). But the commit
    a9ab775bcadf ("workqueue: directly restore CPU affinity of workers
    from CPU_ONLINE") fixed all these problems, so the trick became
    useless and is gone.

    Now @worker_pool->nr_idle is exactly the actual number of workers on
    @worker_pool->idle_list, and too_many_workers() can revert to its
    pre-trick form. So we remove the empty check.
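
    With the empty check gone, too_many_workers() reduces to its ratio
    heuristic. A userspace sketch (the constant mirrors the kernel's
    MAX_IDLE_WORKERS_RATIO, assumed here to be 4; the managing-worker
    adjustment is omitted):

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Assumed value of the kernel's MAX_IDLE_WORKERS_RATIO: tolerate one
     * spare idle worker per four busy ones. */
    #define MAX_IDLE_WORKERS_RATIO 4

    /* A pool has too many workers when more than two are idle and the
     * excess idle workers cover at least a quarter of the busy ones. */
    static bool too_many_workers(int nr_idle, int nr_busy)
    {
            return nr_idle > 2 &&
                   (nr_idle - 2) * MAX_IDLE_WORKERS_RATIO >= nr_busy;
    }

    int main(void)
    {
            assert(!too_many_workers(2, 100)); /* two idle are always kept */
            assert(too_many_workers(3, 4));    /* 1 excess idle covers 4 busy */
            assert(!too_many_workers(3, 5));   /* 5 busy justify the 3rd idle */
            return 0;
    }
    ```

    The point of reverting is that nr_idle alone is again trustworthy:
    no separate check of ->idle_list emptiness is needed.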

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • There is a piece of sanity-check code in put_unbound_pool(). Its
    meaning is "if it is not an unbound pool, complain and return". But
    the code uses "pool->flags & POOL_DISASSOCIATED" imprecisely, since a
    bound pool may also have this flag set.

    We should use "pool->cpu < 0" to stand for an unbound pool, so we
    convert the code to that.

    Keeping "pool->flags & POOL_DISASSOCIATED" here is not strictly
    wrong, but it is just noise:
    1) we focus on "unbound" here, not "[dis]association";
    2) "pool->cpu < 0" already implies "pool->flags & POOL_DISASSOCIATED".

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     

10 Jun, 2014

1 commit

  • Pull workqueue updates from Tejun Heo:
    "Lai simplified worker destruction path and internal workqueue locking
    and there are some other minor changes.

    Except for the removal of some long-deprecated interfaces which
    haven't had any in-kernel user for quite a while, there shouldn't be
    any difference to workqueue users"

    * 'for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    kernel/workqueue.c: pr_warning/pr_warn & printk/pr_info
    workqueue: remove the confusing POOL_FREEZING
    workqueue: rename first_worker() to first_idle_worker()
    workqueue: remove unused work_clear_pending()
    workqueue: remove unused WORK_CPU_END
    workqueue: declare system_highpri_wq
    workqueue: use generic attach/detach routine for rescuers
    workqueue: separate pool-attaching code out from create_worker()
    workqueue: rename manager_mutex to attach_mutex
    workqueue: narrow the protection range of manager_mutex
    workqueue: convert worker_idr to worker_ida
    workqueue: separate iteration role from worker_idr
    workqueue: destroy worker directly in the idle timeout handler
    workqueue: async worker destruction
    workqueue: destroy_worker() should destroy idle workers only
    workqueue: use manager lock only to protect worker_idr
    workqueue: Remove deprecated system_nrt[_freezable]_wq
    workqueue: Remove deprecated flush[_delayed]_work_sync()
    kernel/workqueue.c: pr_warning/pr_warn & printk/pr_info
    workqueue: simplify wq_update_unbound_numa() by jumping to use_dfl_pwq if the target cpumask equals wq's

    Linus Torvalds
     

28 May, 2014

1 commit

  • This commit did an incorrect printk->pr_info conversion. If we were
    converting to pr_info(), we should lose the log_level parameter. The
    problem is that this is called (indirectly) by show_regs_print_info(),
    which is called with various log levels (from _INFO clear up to
    _EMERG). So we leave it as a printk() call so the desired log level
    is applied.

    This is not a full revert, as the other half of the patch is correct.

    Signed-off-by: Valdis Kletnieks
    Signed-off-by: Tejun Heo

    Valdis Kletnieks
     

22 May, 2014

3 commits

  • Currently, the global freezing state is propagated to worker_pools via
    POOL_FREEZING and then to each workqueue; however, the middle step -
    propagation through worker_pools - can be skipped as long as one or
    more max_active adjustments happens for each workqueue after the
    update to the global state is visible. The global workqueue freezing
    state and the max_active adjustments during workqueue creation and
    [un]freezing are serialized with wq_pool_mutex, so it's trivial to
    guarantee that max_actives stay in sync with global freezing state.

    POOL_FREEZING is unnecessary and makes the code more confusing and
    complicates freeze_workqueues_begin() and thaw_workqueues() by
    requiring them to walk through all pools.

    Remove POOL_FREEZING and use workqueue_freezing directly instead.

    tj: Description and comment updates.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • first_worker() actually returns the first idle worker; the
    self-commenting name first_idle_worker() is better.

    All callers of first_worker() expect it to return an idle worker; the
    name first_idle_worker() with the "idle" notation makes reviewers
    happier.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     

20 May, 2014

9 commits

  • There are several problems with the code that rescuers use to bind
    themselves to the target pool's cpumask.

    1) It is very different from how the normal workers bind to cpumask,
    increasing code complexity and maintenance overhead.

    2) The code of cpu-binding for rescuers is complicated.

    3) If one or more cpu hotplugs happen while a rescuer is processing
    its scheduled work items, the rescuer may not stay bound to the
    cpumask of the pool. This is an allowed behavior, but is still
    hairy. It will be better if the cpumask of the rescuer is always
    kept synchronized with the pool across cpu hotplugs.

    Using the generic attach/detach routines solves the above problems
    and results in much simpler code.

    tj: Minor description updates.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • Currently, the code to attach a new worker to its pool is embedded in
    create_worker(). Separating this code out will make it clearer and
    will allow rescuers to share the code path later.

    tj: Description and comment updates.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • manager_mutex is only used to protect attaching to the pool and the
    pool->workers list. It protects pool->workers and operations based
    on this list, such as:

    cpu-binding for the workers on pool->workers
    the operations that set/clear WORKER_UNBOUND

    So let's rename manager_mutex to attach_mutex to better reflect its
    role. This patch is a pure rename.

    tj: Minor command and description updates.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • In create_worker(), pool->worker_ida now uses
    ida_simple_get()/ida_simple_put() and doesn't require external
    synchronization, so it doesn't need manager_mutex.

    The struct worker allocation and kthread allocation are not visible
    to anyone before the worker is attached, so they don't need
    manager_mutex either.

    The above operations are before the attaching operation which attaches
    the worker to the pool. Between attaching and starting the worker, the
    worker is already attached to the pool, so the cpu hotplug will handle
    cpu-binding for the worker correctly and we don't need the
    manager_mutex after attaching.

    The conclusion is that only the attaching operation needs manager_mutex,
    so we narrow the protection section of manager_mutex in create_worker().

    Some comments about manager_mutex are removed, because we will rename
    it to attach_mutex and add worker_attach_to_pool() later which will be
    self-explanatory.

    tj: Minor description updates.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • We no longer iterate workers via worker_idr; worker_idr is used only
    for allocating/freeing IDs, so we can convert it to worker_ida.

    By using ida_simple_get/remove(), worker_ida doesn't require external
    synchronization, so we don't need manager_mutex to protect it, and
    the ID-removal code can be moved out of worker_detach_from_pool().

    In a later patch, worker_detach_from_pool() will be used by rescuers,
    which don't have IDs, so we move the ID-removal code out of
    worker_detach_from_pool() into worker_thread().

    tj: Minor description updates.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • worker_idr has two duties: iterating the attached workers and worker
    ID allocation. These duties don't have to be tied together; we can
    separate them and use a list for tracking attached workers and for
    iteration.

    Before this separation, rescuer workers couldn't be added to
    worker_idr because they can't allocate an ID dynamically: ID
    allocation depends on memory allocation, which rescuers can't rely
    on.

    After the separation, we can easily add the rescuer workers to the
    list for iteration without any memory allocation. This is required
    when we attach the rescuer worker to the pool in a later patch.

    tj: Minor description updates.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • Since destroy_worker() doesn't need to sleep and doesn't require
    manager_mutex, it can be called directly from the idle timeout
    handler. This lets us remove POOL_MANAGE_WORKERS and
    maybe_destroy_worker() and simplify manage_workers().

    After POOL_MANAGE_WORKERS is removed, worker_thread() doesn't need to
    test whether it needs to manage after processing work items, so we
    can remove that test branch.

    Signed-off-by: Lai Jiangshan

    Lai Jiangshan
     
  • Worker destruction includes these parts of code:
    adjust the pool's stats
    remove the worker from the idle list
    detach the worker from the pool
    kthread_stop() to wait for the worker's task to exit
    free the worker struct

    There is no essential work to do after kthread_stop(), which means
    destroy_worker() doesn't need to wait for the worker's task to exit,
    so we can remove kthread_stop() and free the worker struct in the
    worker's exit path.

    However, put_unbound_pool() still needs to sync all the workers'
    destruction before destroying the pool; otherwise, the workers may
    access the invalid pool while they are exiting.

    So we also move the "detach the worker" code to the exit path and let
    put_unbound_pool() sync with it via detach_completion.

    The "detach the worker" code is wrapped in a new function,
    worker_detach_from_pool(). Although worker_detach_from_pool() is
    called only once (in worker_thread()) after this patch, we wrap it
    for these reasons:

    1) The "detach the worker" code is not short enough to unfold into
    worker_thread().
    2) The name worker_detach_from_pool() is self-commenting, and we add
    some comments above the function.
    3) It will be shared by rescuers in a later patch, which allows
    rescuers and normal workers to use the same attach/detach framework.

    The worker ID is freed when detaching, which happens before the
    worker is fully dead, so the dying worker's ID may be reused for a
    new worker. The dying worker's task name is therefore changed to
    "worker/dying" to avoid two or more workers having the same name.

    Since "detach the worker" is moved out of destroy_worker(),
    destroy_worker() no longer requires manager_mutex, so the
    "lockdep_assert_held(&pool->manager_mutex)" in destroy_worker() is
    removed, and destroy_worker() is not protected by manager_mutex in
    put_unbound_pool().

    tj: Minor description updates.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan
     
  • We used to have a CPU-online failure path where a worker was created
    and then destroyed without being started: a worker was created for
    the CPU coming online, and if the online operation failed, the
    created worker was shut down without being started. But this behavior
    was changed: the first worker is now created and started at the same
    time for the CPU coming online.

    This means the code already ensures that destroy_worker() destroys
    only idle workers, and we don't want to allow it to destroy any
    non-idle worker in the future; otherwise, it may be buggy and
    extremely hard to check. We should force destroy_worker() to destroy
    only idle workers explicitly.

    Since destroy_worker() already destroys only idle workers, this patch
    does not change any functionality. We just need to update the
    comments and the sanity-check code.

    In the sanity-check code, we refuse to destroy the worker if
    !(worker->flags & WORKER_IDLE).

    A worker that has entered idle is already started, so we remove the
    "worker->flags & WORKER_STARTED" check. After this removal,
    WORKER_STARTED is totally unneeded, so we remove WORKER_STARTED too.

    In the comments for create_worker(), "Create a new worker which is
    bound..." is changed to "... which is attached..." because we renamed
    this behavior to attaching.

    tj: Minor description / comment updates.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Tejun Heo

    Lai Jiangshan