06 Nov, 2008

1 commit

  • Impact: introduce new APIs

    We want to deprecate cpumasks on the stack, as we are headed for
    ginormous numbers of CPUs. Eventually, we want to head towards an
    undefined 'struct cpumask' so they can never be declared on the stack.

    1) New cpumask functions which take pointers instead of copies.
    (cpus_* -> cpumask_*)

    2) Several new helpers to reduce requirements for temporary cpumasks
    (cpumask_first_and, cpumask_next_and, cpumask_any_and)

    3) Helpers for declaring cpumasks on or offstack for large NR_CPUS
    (cpumask_var_t, alloc_cpumask_var and free_cpumask_var)

    4) 'struct cpumask' for explicitness and to mark new-style code.

    5) Make iterator functions stop at nr_cpu_ids (a runtime constant),
    not NR_CPUS for time efficiency and for smaller dynamic allocations
    in future.

    6) cpumask_copy() so we can allocate less than a full cpumask eventually
    (for alloc_cpumask_var), and so we can eventually eliminate the
    'struct cpumask' definition.

    7) work_on_cpu() helper for doing a task on a CPU, rather than saving
    the old cpumask for the current thread and manipulating it.

    8) smp_call_function_many(), which is smp_call_function_mask() except
    that it takes a cpumask pointer.

    Note that this patch simply introduces the new functions and leaves
    the obsolescent ones in place. This is to simplify the transition
    patches.
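
    As a rough usage sketch (not from this commit; pick_first_allowed() is a
    hypothetical caller, and cpu_online_mask is the pointer form of the
    online map from the same conversion series), the new style looks like:

        #include <linux/cpumask.h>
        #include <linux/errno.h>
        #include <linux/slab.h>

        /* Find an online CPU in 'allowed' without a cpumask_t on the stack. */
        static int pick_first_allowed(const struct cpumask *allowed)
        {
                cpumask_var_t tmp;
                unsigned int cpu;

                /* no-op stack mask for small NR_CPUS, a real allocation
                 * for large NR_CPUS */
                if (!alloc_cpumask_var(&tmp, GFP_KERNEL))
                        return -ENOMEM;

                cpumask_and(tmp, allowed, cpu_online_mask);
                cpu = cpumask_first(tmp);       /* stops at nr_cpu_ids */

                free_cpumask_var(tmp);
                return cpu < nr_cpu_ids ? cpu : -ENODEV;
        }

    Note that cpumask_first_and(allowed, cpu_online_mask) would avoid the
    temporary mask here entirely; that is exactly what the new *_and
    helpers are for.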

    Signed-off-by: Rusty Russell
    Signed-off-by: Ingo Molnar


22 Oct, 2008

1 commit

  • create_rt_workqueue() creates a real-time prioritized workqueue.
    This is needed for the conversion of stop_machine to a workqueue-based
    implementation.
    This patch adds yet another parameter to __create_workqueue_key() to
    tell it that we want an rt workqueue.
    However, it looks like we should rather have something like an
    "int type" parameter instead of separate singlethread, freezable and rt
    flags.
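
    A minimal usage sketch (hypothetical caller; the name string is made up):

        static struct workqueue_struct *my_rt_wq;

        static int __init my_init(void)
        {
                /* like create_workqueue(), but the worker threads run
                 * with real-time priority */
                my_rt_wq = create_rt_workqueue("my_rt_wq");
                return my_rt_wq ? 0 : -ENOMEM;
        }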

    Signed-off-by: Heiko Carstens
    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar


26 Jul, 2008

8 commits

  • The bug was pointed out by Akinobu Mita, and this patch is based on his
    original patch.

    workqueue_cpu_callback(CPU_UP_PREPARE) expects that if it returns
    NOTIFY_BAD, _cpu_up() will send CPU_UP_CANCELED then.

    However, this is not true since

    "cpu hotplug: cpu: deliver CPU_UP_CANCELED only to NOTIFY_OKed callbacks with CPU_UP_PREPARE"
    commit: a0d8cdb652d35af9319a9e0fb7134de2a276c636

    The callback which has returned NOTIFY_BAD will not receive
    CPU_UP_CANCELED. Change the code to fulfil the CPU_UP_CANCELED logic if
    CPU_UP_PREPARE fails.
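
    In sketch form (my_prepare()/my_undo_prepare() are hypothetical
    stand-ins for the real thread-creation and cleanup steps), the fixed
    pattern is:

        static int my_cpu_callback(struct notifier_block *nb,
                                   unsigned long action, void *hcpu)
        {
                unsigned int cpu = (unsigned long)hcpu;

                switch (action) {
                case CPU_UP_PREPARE:
                        if (my_prepare(cpu) == 0)
                                break;
                        /* Since a0d8cdb652d3, a callback that returns
                         * NOTIFY_BAD never sees CPU_UP_CANCELED, so it
                         * must undo its own CPU_UP_PREPARE work here. */
                        my_undo_prepare(cpu);
                        return NOTIFY_BAD;
                case CPU_UP_CANCELED:
                        /* still delivered when a LATER callback failed */
                        my_undo_prepare(cpu);
                        break;
                }
                return NOTIFY_OK;
        }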

    Signed-off-by: Oleg Nesterov
    Reported-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • schedule_on_each_cpu() can use schedule_work_on() to avoid the code
    duplication.
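
    Roughly, the deduplicated function then looks like this (a sketch that
    also folds in the flush_work() conversion from the entry below):

        int schedule_on_each_cpu(work_func_t func)
        {
                int cpu;
                struct work_struct *works;

                works = alloc_percpu(struct work_struct);
                if (!works)
                        return -ENOMEM;

                get_online_cpus();
                for_each_online_cpu(cpu) {
                        struct work_struct *work = per_cpu_ptr(works, cpu);

                        INIT_WORK(work, func);
                        schedule_work_on(cpu, work);    /* shared helper */
                }
                for_each_online_cpu(cpu)
                        flush_work(per_cpu_ptr(works, cpu));
                put_online_cpus();
                free_percpu(works);
                return 0;
        }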

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • queue_work() can use queue_work_on() to avoid the code duplication.
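
    The resulting wrapper is essentially (sketch):

        int queue_work(struct workqueue_struct *wq, struct work_struct *work)
        {
                int ret;

                /* queue on the local CPU; get_cpu() disables preemption
                 * so the CPU number stays valid across the call */
                ret = queue_work_on(get_cpu(), wq, work);
                put_cpu();

                return ret;
        }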

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • Add lockdep annotations to flush_work() and update the comment.

    Signed-off-by: Oleg Nesterov
    Cc: Jarek Poplawski
    Acked-by: Johannes Berg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • workqueue_cpu_callback(CPU_DEAD) flushes cwq->thread under
    cpu_maps_update_begin(). This means that multithreaded workqueues
    can't use get_online_cpus() due to a possible deadlock, a very bad
    and very old problem.

    Introduce a new state, CPU_POST_DEAD, which is sent after
    cpu_hotplug_done() but before cpu_maps_update_done().

    Change workqueue_cpu_callback() to use CPU_POST_DEAD instead of CPU_DEAD.
    This means that create/destroy functions can't rely on get_online_cpus()
    any longer and should take cpu_add_remove_lock instead.
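
    Sketch of the relevant notifier case after the change (simplified from
    workqueue_cpu_callback()):

        case CPU_POST_DEAD:
                /* runs after cpu_hotplug_done(), so flushing and
                 * stopping cwq->thread here cannot deadlock with work
                 * items that take get_online_cpus() */
                cleanup_workqueue_thread(cwq);
                break;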

    [akpm@linux-foundation.org: fix CONFIG_SMP=n]
    Signed-off-by: Oleg Nesterov
    Acked-by: Gautham R Shenoy
    Cc: Heiko Carstens
    Cc: Max Krasnyansky
    Cc: Paul Jackson
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: Vegard Nossum
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • Change schedule_on_each_cpu() to use flush_work() instead of
    flush_workqueue(); this way we don't wait for unrelated work_structs
    which may be queued in the meantime, as shown in the fragment below.
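
    The difference, as a fragment (names as in kernel/workqueue.c of this
    era):

        /* before: waits for ALL work on keventd, including unrelated items */
        flush_workqueue(keventd_wq);

        /* after: waits only for the per-cpu works we actually queued */
        for_each_online_cpu(cpu)
                flush_work(per_cpu_ptr(works, cpu));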

    Signed-off-by: Oleg Nesterov
    Cc: Jarek Poplawski
    Cc: Max Krasnyansky
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • Most users of flush_workqueue() can be changed to use cancel_work_sync(),
    but sometimes we really need to wait for the completion and cancelling
    is not an option. schedule_on_each_cpu() is a good example.

    Add the new helper, flush_work(work), which waits for the completion of
    the specific work_struct. More precisely, it "flushes" the result of the
    last queue_work() which is visible to the caller.

    For example, this code

    queue_work(wq, work);
    /* WINDOW */
    queue_work(wq, work);

    flush_work(work);

    doesn't necessarily work "as expected". What can happen in the WINDOW
    above is:

    - wq starts the execution of work->func()

    - the caller migrates to another CPU

    Now, after the 2nd queue_work(), this work is active on the previous
    CPU and at the same time queued on another. In this case
    flush_work(work) may return before the first work->func() completes.

    It is trivial to add another helper,

        int flush_work_sync(struct work_struct *work)
        {
                return flush_work(work) || wait_on_work(work);
        }

    which works "more correctly", but it has to iterate over all CPUs and thus
    it much slower than flush_work().

    Signed-off-by: Oleg Nesterov
    Acked-by: Max Krasnyansky
    Acked-by: Jarek Poplawski
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • insert_work() inserts the new work_struct before or after cwq->worklist,
    depending on the "int tail" parameter. Change it to accept a
    "list_head *" instead; this shrinks .text a bit and allows us to insert
    the barrier after a specific work_struct, as sketched below.
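
    In sketch form, the signature change and what it enables (the barrier
    call site is hypothetical):

        /* before */
        static void insert_work(struct cpu_workqueue_struct *cwq,
                                struct work_struct *work, int tail);

        /* after: the caller names the exact list position */
        static void insert_work(struct cpu_workqueue_struct *cwq,
                                struct work_struct *work,
                                struct list_head *head);

        /* e.g. insert a flush barrier directly after a given work item,
         * since list_add_tail(&new->entry, head) places 'new' just
         * before 'head': */
        insert_work(cwq, &barr->work, work->entry.next);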

    Signed-off-by: Oleg Nesterov
    Cc: Jarek Poplawski
    Cc: Max Krasnyansky
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds


25 Jul, 2008

1 commit

  • This interface allows adding a job on a specific cpu.

    Although a work struct queued on a cpu will be rescheduled to another
    cpu if that cpu dies, there is a recursion problem if a work task tries
    to offline the cpu it is running on; we need to be able to schedule the
    task on a specific cpu in that case.
    http://bugzilla.kernel.org/show_bug.cgi?id=10897
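
    Usage sketch (my_work_fn and my_wq are hypothetical):

        static DECLARE_WORK(my_work, my_work_fn);

        /* run on CPU 0 via the shared keventd workqueue */
        schedule_work_on(0, &my_work);

        /* or on a specific workqueue */
        queue_work_on(0, my_wq, &my_work);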

    [oleg@tv-sign.ru: cleanups]
    Signed-off-by: Zhang Rui
    Tested-by: Rus
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds


05 Jul, 2008

1 commit

  • Remove all clameter@sgi.com addresses from the kernel tree since they will
    become invalid on June 27th. Change my maintainer email address for the
    slab allocators to cl@linux-foundation.org (which will be the new email
    address for the future).

    Signed-off-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Stephen Rothwell
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds


01 May, 2008

1 commit

  • timer_stats_timer_set_start_info() is invoked twice; additionally, the
    invocation of this function can be moved so that it is only called when
    a delay is really required.

    Signed-off-by: Andrew Liu
    Cc: Pavel Machek
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds


29 Apr, 2008

2 commits

  • cleanup_workqueue_thread() doesn't need the second argument, remove it.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • When cpu_populated_map was introduced, it was assumed that cwq->thread
    could survive after CPU_DEAD; that is why we never shrink
    cpu_populated_map.

    This is not very nice; we can safely remove an already dead CPU from the
    map. The only required change is that destroy_workqueue() must hold the
    hotplug lock until it destroys all cwq->threads, to protect
    cpu_populated_map. We could make a local copy of the cpu mask and drop
    the lock, but sizeof(cpumask_t) may be very large.

    Also, fix the comment near queue_work(). Unless _cpu_down() happens we do
    guarantee the cpu-affinity of the work_struct, and we have users which rely on
    this.

    [akpm@linux-foundation.org: repair comment]
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds


16 Jan, 2008

1 commit

  • Dave Young reported warnings from lockdep that the workqueue API
    can sometimes try to register lockdep classes with the same key
    but different names. This is not permitted in lockdep.

    Unfortunately, I was unaware of that restriction when I wrote
    the code to debug workqueue problems with lockdep and used the
    workqueue name as the lockdep class name. This obviously leads to
    problems if the workqueue name is dynamic.

    This patch solves the problem by always using a constant name
    for the workqueue's lockdep class, namely either the constant
    name that was passed in or a string consisting of the variable
    name.
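
    The trick looks roughly like this (a sketch of the __create_workqueue()
    macro; details may differ):

        #define __create_workqueue(name, singlethread, freezeable)        \
        ({                                                                \
                static struct lock_class_key __key;                       \
                const char *__lock_name;                                  \
                                                                          \
                if (__builtin_constant_p(name))                           \
                        __lock_name = (name); /* constant: use as-is */   \
                else                                                      \
                        __lock_name = #name;  /* else: variable's name */ \
                __create_workqueue_key((name), (singlethread),            \
                                       (freezeable), &__key,              \
                                       __lock_name);                      \
        })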

    Signed-off-by: Johannes Berg
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra


20 Oct, 2007

2 commits

  • The task_struct->pid member is going to be deprecated, so start
    using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
    the kernel.

    The first thing to start with is the pid printed to dmesg; in this case
    we may safely use task_pid_nr(). Besides, printks account for more
    (much more) than half of all explicit pid usage.

    [akpm@linux-foundation.org: git-drm went and changed lots of stuff]
    Signed-off-by: Pavel Emelyanov
    Cc: Dave Airlie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • In the following scenario:

    code path 1:
    my_function() -> lock(L1); ...; flush_workqueue(); ...

    code path 2:
    run_workqueue() -> my_work() -> ...; lock(L1); ...

    you can get a deadlock when my_work() is queued or running
    but my_function() has acquired L1 already.

    This patch adds a pseudo-lock to each workqueue to make lockdep
    warn about this scenario.
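
    Conceptually (a sketch; lock_map_acquire()/lock_map_release() are the
    later spellings of the raw lock_acquire()/lock_release() calls the
    patch used):

        struct workqueue_struct {
                /* ... */
                struct lockdep_map lockdep_map;         /* the pseudo-lock */
        };

        /* flush_workqueue() "takes and drops" the pseudo-lock ... */
        lock_map_acquire(&wq->lockdep_map);
        lock_map_release(&wq->lockdep_map);

        /* ... and run_workqueue() does the same around each work->func(),
         * so lockdep sees "flush while holding L1" together with
         * "work->func() takes L1" as a classic inversion. */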

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Johannes Berg
    Acked-by: Oleg Nesterov
    Acked-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds


28 Aug, 2007

1 commit

  • Fix a bogus DEBUG_PREEMPT warning on x86_64 when a cpu is brought online
    after bootup: current_is_keventd() is right to note that its use of
    smp_processor_id() is preempt-safe, but it should use
    raw_smp_processor_id() to avoid the warning.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds


18 Jul, 2007

2 commits

  • Pointed out by Michal Schmidt.

    The bug was introduced in 2.6.22 by me.

    cleanup_workqueue_thread() does flush_cpu_workqueue(cwq) in a loop until
    ->worklist becomes empty. This is live-lockable: a re-niced caller can
    get the CPU after wake_up() and insert a new barrier before the
    lower-priority cwq->thread has a chance to clear ->current_work.

    Change cleanup_workqueue_thread() to do flush_cpu_workqueue(cwq) only once.
    We can rely on the fact that run_workqueue() won't return until it flushes
    all works. So it is safe to call kthread_stop() after that, the "should
    stop" request won't be noticed until run_workqueue() returns.

    Signed-off-by: Oleg Nesterov
    Cc: Michal Schmidt
    Cc: Srivatsa Vaddagiri
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • Currently, the freezer treats all tasks as freezable, except for the kernel
    threads that explicitly set the PF_NOFREEZE flag for themselves. This
    approach is problematic, since it requires every kernel thread to either
    set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
    care for the freezing of tasks at all.

    It seems better to only require the kernel threads that want to or need to
    be frozen to use some freezer-related code and to remove any
    freezer-related code from the other (nonfreezable) kernel threads, which is
    done in this patch.

    The patch causes all kernel threads to be nonfreezable by default (i.e. to
    have PF_NOFREEZE set by default) and introduces the set_freezable()
    function that should be called by the freezable kernel threads in order to
    unset PF_NOFREEZE. It also makes all of the currently freezable kernel
    threads call set_freezable(), so it shouldn't cause any (intentional)
    change of behaviour to appear. Additionally, it updates documentation to
    describe the freezing of tasks more accurately.
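
    A minimal sketch of a freezable kernel thread after this change
    (do_the_work() is hypothetical):

        #include <linux/freezer.h>
        #include <linux/kthread.h>

        static int my_thread(void *unused)
        {
                set_freezable();        /* opt in: clears PF_NOFREEZE */

                while (!kthread_should_stop()) {
                        try_to_freeze();        /* parks here during suspend */
                        do_the_work();
                }
                return 0;
        }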

    [akpm@linux-foundation.org: build fixes]
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Nigel Cunningham
    Cc: Pavel Machek
    Cc: Oleg Nesterov
    Cc: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds


17 Jul, 2007

2 commits

  • Change cancel_work_sync() and cancel_delayed_work_sync() to return a boolean
    indicating whether the work was actually cancelled. A zero return value means
    that the work was not pending/queued.

    Without that kind of change it is not possible to avoid flush_workqueue()
    sometimes, see the next patch as an example.

    Also, this patch unifies both functions and kills the (unlikely) busy-wait
    loop.
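
    Usage sketch of the new return value (hypothetical caller):

        /* old pattern: no way to know whether dwork was pending, so a
         * costly flush of the whole queue was needed */
        cancel_delayed_work(&dwork);
        flush_workqueue(wq);

        /* new pattern: the return value says whether dwork was pending,
         * and the _sync variant has already waited for a running callback */
        if (cancel_delayed_work_sync(&dwork))
                pr_debug("dwork was still pending\n");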

    Signed-off-by: Oleg Nesterov
    Acked-by: Jarek Poplawski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • Imho, the current naming of cancel_xxx workqueue functions is very confusing.

    cancel_delayed_work()
    cancel_rearming_delayed_work()
    cancel_rearming_delayed_workqueue() // obsolete

    cancel_work_sync()

    This looks as if the first two functions differ in the "type" of their
    argument, which is no longer true; nowadays the difference is the
    behaviour.

    The semantics of cancel_rearming_delayed_work(dwork) was changed
    significantly, it doesn't require that dwork rearms itself, and cancels dwork
    synchronously.

    Rename it to cancel_delayed_work_sync(). This matches cancel_delayed_work()
    and cancel_work_sync(). Re-create cancel_rearming_delayed_work() as a simple
    inline obsolete wrapper, like cancel_rearming_delayed_workqueue().

    Signed-off-by: Oleg Nesterov
    Acked-by: Jarek Poplawski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds


24 May, 2007

1 commit

  • cleanup_workqueue_thread() and cwq_should_stop() are overcomplicated.

    Convert the code to use kthread_should_stop/kthread_stop as was
    suggested by Gautham and Srivatsa.

    In particular this patch removes the (unlikely) busy-wait loop from the
    exit path; it was a temporary and ugly kludge (if not a bug).
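
    The resulting worker loop, in sketch form (simplified from
    kernel/workqueue.c; the freezer interaction is omitted):

        static int worker_thread(void *__cwq)
        {
                struct cpu_workqueue_struct *cwq = __cwq;
                DEFINE_WAIT(wait);

                for (;;) {
                        prepare_to_wait(&cwq->more_work, &wait,
                                        TASK_INTERRUPTIBLE);
                        if (!kthread_should_stop() &&
                            list_empty(&cwq->worklist))
                                schedule();
                        finish_wait(&cwq->more_work, &wait);

                        if (kthread_should_stop())
                                break;          /* no busy-wait on exit */

                        run_workqueue(cwq);
                }
                return 0;
        }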

    Note: the current code was designed to solve another old problem:
    work->func can't share locks with hotplug callbacks. I think this could
    be done, see

    http://marc.info/?l=linux-kernel&m=116905366428633

    but this needs some more complications to preserve CPU affinity of
    cwq->thread during cpu_up(). A freezer-based hotplug looks more
    appealing.

    [akpm@linux-foundation.org: make it more tolerant of gcc borkenness]
    Signed-off-by: Oleg Nesterov
    Cc: Zilvinas Valinskas
    Cc: Gautham R Shenoy
    Cc: Srivatsa Vaddagiri
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds


10 May, 2007

2 commits

  • Since nonboot CPUs are now disabled after tasks and devices have been
    frozen and the CPU hotplug infrastructure is used for this purpose, we need
    special CPU hotplug notifications that will help the CPU-hotplug-aware
    subsystems distinguish normal CPU hotplug events from CPU hotplug events
    related to a system-wide suspend or resume operation in progress. This
    patch introduces such notifications and causes them to be used during
    suspend and resume transitions. It also changes all of the
    CPU-hotplug-aware subsystems to take these notifications into consideration
    (for now they are handled in the same way as the corresponding "normal"
    ones).
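
    Sketch of how a notifier treats the new events; for now the _FROZEN
    variants are handled identically (prepare_cpu()/cleanup_cpu() are
    hypothetical):

        switch (action) {
        case CPU_UP_PREPARE:
        case CPU_UP_PREPARE_FROZEN:   /* CPU_UP_PREPARE | CPU_TASKS_FROZEN */
                prepare_cpu(cpu);
                break;
        case CPU_DEAD:
        case CPU_DEAD_FROZEN:
                cleanup_cpu(cpu);
                break;
        }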

    [oleg@tv-sign.ru: cleanups]
    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Pavel Machek
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • Thanks to Jarek Poplawski for the ideas and for spotting the bug in the
    initial draft patch.

    cancel_rearming_delayed_work() currently has many limitations, because
    it requires that dwork always re-arms itself via queue_delayed_work().
    So it hangs forever if dwork doesn't do this, or if
    cancel_rearming_delayed_work()/cancel_delayed_work() was already called.
    It uses flush_workqueue() in a loop, so it can't be used if the
    workqueue was frozen, and it is potentially live-lockable on a busy
    system if the delay is small.

    With this patch cancel_rearming_delayed_work() doesn't make any
    assumptions about dwork: it can re-arm itself via queue_delayed_work()
    or queue_work(), or do nothing.

    As a "side effect", cancel_work_sync() was changed to handle re-arming works
    as well.

    Disadvantages:

    - this patch adds a wmb() to insert_work().

    - it slows down the fast path (when del_timer() succeeds on entry) of
    cancel_rearming_delayed_work(), because wait_on_work() is called
    unconditionally. In that case, compared to the old version, we are
    doing "unneeded" lock/unlock for each online CPU.

    On the other hand, this means we don't need to use cancel_work_sync()
    after cancel_rearming_delayed_work().

    - complicates the code (.text grows by 130 bytes).

    [akpm@linux-foundation.org: fix speling]
    Signed-off-by: Oleg Nesterov
    Cc: David Chinner
    Cc: David Howells
    Cc: Gautham Shenoy
    Acked-by: Jarek Poplawski
    Cc: Srivatsa Vaddagiri
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
