18 Nov, 2009

1 commit

  • Commit 65a64464349883891e21e74af16c05d6e1eeb4e9 ("HWPOISON: Allow
    schedule_on_each_cpu() from keventd"), which allows schedule_on_each_cpu()
    to be called from keventd, added a race condition: schedule_on_each_cpu()
    may race with cpu hotplug and end up executing the function twice on a
    cpu.

    Fix it by moving direct execution into the section protected with
    get/put_online_cpus(). While at it, update the code so that direct
    execution is done after works have been scheduled for all other cpus, and
    drop the unnecessary cpu != orig test from the flush loop.
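
    A minimal sketch of the fixed flow (assuming the era's helpers, e.g.
    current_is_keventd() and schedule_work_on(); a sketch, not the verbatim
    patch):

    int schedule_on_each_cpu(work_func_t func)
    {
            int cpu;
            int orig = -1;
            struct work_struct *works;

            works = alloc_percpu(struct work_struct);
            if (!works)
                    return -ENOMEM;

            get_online_cpus();
            /* when called from keventd, run func() directly on this
             * cpu instead of queueing it behind ourselves */
            if (current_is_keventd())
                    orig = raw_smp_processor_id();

            for_each_online_cpu(cpu) {
                    struct work_struct *work = per_cpu_ptr(works, cpu);

                    INIT_WORK(work, func);
                    if (cpu != orig)
                            schedule_work_on(cpu, work);
            }
            /* direct execution comes last and stays inside the
             * get/put_online_cpus() section, so cpu hotplug cannot
             * make func() run twice on one cpu */
            if (orig >= 0)
                    func(per_cpu_ptr(works, orig));

            for_each_online_cpu(cpu)
                    flush_work(per_cpu_ptr(works, cpu));

            put_online_cpus();
            free_percpu(works);
            return 0;
    }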

    Signed-off-by: Tejun Heo
    Cc: Andi Kleen
    Acked-by: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

19 Oct, 2009

1 commit

  • Right now, when calling schedule_on_each_cpu() from keventd, there is a
    deadlock because it tries to schedule a work item on the current CPU
    too. This happens via lru_add_drain_all() in hwpoison.

    Just call the function directly for the current CPU in this case. This is
    actually faster too.
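
    The deadlock, sketched as a call chain (the exact path is abridged;
    lru_add_drain_per_cpu is the per-cpu callback in mm/swap.c):

    hwpoison work, running inside keventd
      -> lru_add_drain_all()
        -> schedule_on_each_cpu(lru_add_drain_per_cpu)
             queues a work item on every CPU, including the current one,
             then flushes them all; the local item sits behind the keventd
             callback that is doing the flushing, so the flush never
             completes: deadlock.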

    Debugging with Fengguang Wu & Max Asbock

    Signed-off-by: Andi Kleen

    Andi Kleen
     

12 Sep, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits)
    sched: Fix sched::sched_stat_wait tracepoint field
    sched: Disable NEW_FAIR_SLEEPERS for now
    sched: Keep kthreads at default priority
    sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
    sched: Turn off child_runs_first
    sched: Ensure that a child can't gain time over it's parent after fork()
    sched: enable SD_WAKE_IDLE
    sched: Deal with low-load in wake_affine()
    sched: Remove short cut from select_task_rq_fair()
    sched: Turn on SD_BALANCE_NEWIDLE
    sched: Clean up topology.h
    sched: Fix dynamic power-balancing crash
    sched: Remove reciprocal for cpu_power
    sched: Try to deal with low capacity, fix update_sd_power_savings_stats()
    sched: Try to deal with low capacity
    sched: Scale down cpu_power due to RT tasks
    sched: Implement dynamic cpu_power
    sched: Add smt_gain
    sched: Update the cpu_power sum during load-balance
    sched: Add SD_PREFER_SIBLING
    ...

    Linus Torvalds
     

04 Aug, 2009

1 commit

  • Two important aspects of the schedule_work() function are not
    yet documented:

    - that it is allowed to pass a struct work_struct * to this
    function that is already on the kernel-global workqueue;

    - the meaning of its return value.

    The patch below documents both aspects.
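
    For reference, a hedged usage sketch of the return-value semantics (the
    device and work names are hypothetical):

    /* schedule_work() returns non-zero if it queued @work now, and zero
     * if @work was already on the kernel-global workqueue. */
    if (!schedule_work(&dev->reset_work))
            pr_debug("reset already pending\n");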

    Signed-off-by: Bart Van Assche
    Cc: "Greg Kroah-Hartman"
    Cc: Andrew Morton
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     

02 Jun, 2009

1 commit

  • v3: zhaolei@cn.fujitsu.com: Change TRACE_EVENT definition to new format
    introduced by Steven Rostedt: consolidate trace and trace_event headers
    v2: kosaki@jp.fujitsu.com: print the function names instead of addr, and zap
    the work addr
    v1: zhaolei@cn.fujitsu.com: Make workqueue tracepoints use TRACE_EVENT macro

    TRACE_EVENT is a more generic way to define tracepoints.
    Doing so adds these new capabilities to the tracepoints:

    - zero-copy and per-cpu splice() tracing
    - binary tracing without printf overhead
    - structured logging records exposed under /debug/tracing/events
    - trace events embedded in function tracer output and other plugins
    - user-defined, per tracepoint filter expressions

    Then, this patch converts DEFINE_TRACE to TRACE_EVENT in workqueue-related
    tracepoints.

    [ Impact: expand workqueue tracer to events tracing ]
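
    As an illustration, a sketch of what a workqueue tracepoint looks like in
    the TRACE_EVENT format (the field layout is plausible, not quoted from
    the patch):

    TRACE_EVENT(workqueue_execution,

            TP_PROTO(struct task_struct *wq_thread, struct work_struct *work),

            TP_ARGS(wq_thread, work),

            TP_STRUCT__entry(
                    __array(char, thread_comm, TASK_COMM_LEN)
                    __field(pid_t, thread_pid)
                    __field(work_func_t, func)
            ),

            TP_fast_assign(
                    memcpy(__entry->thread_comm, wq_thread->comm, TASK_COMM_LEN);
                    __entry->thread_pid = wq_thread->pid;
                    __entry->func = work->func;
            ),

            TP_printk("thread=%s:%d func=%pF", __entry->thread_comm,
                      __entry->thread_pid, __entry->func)
    );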

    Signed-off-by: Zhao Lei
    Cc: Steven Rostedt
    Cc: Tom Zanussi
    Cc: Oleg Nesterov
    Cc: Andrew Morton
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Frederic Weisbecker

    Zhaolei
     

09 Apr, 2009

1 commit

  • Impact: circular locking bugfix

    The various implementations and proposed implementations of work_on_cpu()
    are vulnerable to various deadlocks because they all used queues of some
    form.

    Unrelated pieces of kernel code thus gained dependencies wherein if one
    work_on_cpu() caller holds a lock which some other work_on_cpu() callback
    also takes, the kernel could rarely deadlock.

    Fix this by creating a short-lived kernel thread for each work_on_cpu()
    invocation.

    This is not terribly fast, but the only current caller of work_on_cpu() is
    pci_call_probe().

    It would be nice to find some other way of doing the node-local
    allocations in the PCI probe code so that we can zap work_on_cpu()
    altogether. The code there is rather nasty. I can't think of anything
    simple at this time...
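
    A minimal sketch of the kthread-based approach (the struct and helper
    names are assumptions, not quoted from the patch):

    struct work_for_cpu {
            struct completion completion;
            long (*fn)(void *);
            void *arg;
            long ret;
    };

    static int do_work_for_cpu(void *_wfc)
    {
            struct work_for_cpu *wfc = _wfc;

            wfc->ret = wfc->fn(wfc->arg);
            complete(&wfc->completion);
            return 0;
    }

    long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
    {
            struct task_struct *sub_thread;
            struct work_for_cpu wfc = { .fn = fn, .arg = arg };

            init_completion(&wfc.completion);
            /* a fresh kthread per call: no shared queue, hence no lock
             * dependencies between unrelated work_on_cpu() callers */
            sub_thread = kthread_create(do_work_for_cpu, &wfc,
                                        "work_on_cpu/%u", cpu);
            if (IS_ERR(sub_thread))
                    return PTR_ERR(sub_thread);
            kthread_bind(sub_thread, cpu);
            wake_up_process(sub_thread);
            wait_for_completion(&wfc.completion);
            return wfc.ret;
    }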

    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Rusty Russell

    Andrew Morton
     

06 Apr, 2009

1 commit

  • * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
    tracing, net: fix net tree and tracing tree merge interaction
    tracing, powerpc: fix powerpc tree and tracing tree interaction
    ring-buffer: do not remove reader page from list on ring buffer free
    function-graph: allow unregistering twice
    trace: make argument 'mem' of trace_seq_putmem() const
    tracing: add missing 'extern' keywords to trace_output.h
    tracing: provide trace_seq_reserve()
    blktrace: print out BLK_TN_MESSAGE properly
    blktrace: extract duplidate code
    blktrace: fix memory leak when freeing struct blk_io_trace
    blktrace: fix blk_probes_ref chaos
    blktrace: make classic output more classic
    blktrace: fix off-by-one bug
    blktrace: fix the original blktrace
    blktrace: fix a race when creating blk_tree_root in debugfs
    blktrace: fix timestamp in binary output
    tracing, Text Edit Lock: cleanup
    tracing: filter fix for TRACE_EVENT_FORMAT events
    ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
    x86: kretprobe-booster interrupt emulation code fix
    ...

    Fix up trivial conflicts in
    arch/parisc/include/asm/ftrace.h
    include/linux/memory.h
    kernel/extable.c
    kernel/module.c

    Linus Torvalds
     

03 Apr, 2009

1 commit

  • 1) lockdep will complain when run_workqueue() performs recursion.

    2) The recursive implementation of run_workqueue() means that
    flush_workqueue() and its documentation are inconsistent. This may
    hide deadlocks and other bugs.

    3) The recursion in run_workqueue() will poison cwq->current_work, but
    flush_work() and __cancel_work_timer(), etcetera need a reliable
    cwq->current_work.

    Signed-off-by: Lai Jiangshan
    Acked-by: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Frederic Weisbecker
    Cc: Eric Dumazet
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     

20 Jan, 2009

2 commits

  • Impact: remove potential clashes with generic kevent workqueue

    Annoyingly, some places where we want to use work_on_cpu() are already
    running in workqueues. As per Ingo's suggestion, we create a separate
    workqueue for work_on_cpu(), as sketched below.
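
    A sketch of the idea (the queue name and the init hook are assumptions):

    static struct workqueue_struct *work_on_cpu_wq __read_mostly;

    static int __init init_work_on_cpu_wq(void)
    {
            /* a dedicated queue, so that callers already running on the
             * generic kevent workqueue cannot deadlock against it */
            work_on_cpu_wq = create_workqueue("work_on_cpu");
            return work_on_cpu_wq ? 0 : -ENOMEM;
    }
    early_initcall(init_work_on_cpu_wq);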

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Rusty Russell
     
  • Impact: remove potential circular lock dependency with cpu hotplug lock

    This has caused more problems than it solved, with a pile of cpu
    hotplug locking issues.

    Follow-up patches will call get_online_cpus() in callers that need it;
    callers that don't are no worse off than before, when they used
    set_cpus_allowed() without locking.

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Rusty Russell
     

14 Jan, 2009

1 commit

  • Impact: new tracer

    The workqueue tracer provides some statistical information about each
    cpu workqueue thread, such as the number of works inserted and executed
    since its creation. It can help to evaluate the amount of work each of
    them has to perform. For example, it can help a developer decide whether
    to choose a per cpu workqueue instead of a singlethreaded one.

    It only traces statistical information for now, but it will probably
    provide event tracing too later.

    Such a tracer could also help, and be improved, to support the
    development of rt-priority-sorted workqueues.

    To get a snapshot of the workqueues' state at any time, just do

    cat /debugfs/tracing/trace_stat/workqueues

    For example (the columns are the CPU, the works inserted, the works
    executed, and the workqueue name):

    1     125     125  reiserfs/1
    1       0       0  scsi_tgtd/1
    1       0       0  aio/1
    1       0       0  ata/1
    1     114     114  kblockd/1
    1       0       0  kintegrityd/1
    1    2147    2147  events/1

    0       0       0  kpsmoused
    0     105     105  reiserfs/0
    0       0       0  scsi_tgtd/0
    0       0       0  aio/0
    0       0       0  ata_aux
    0       0       0  ata/0
    0       0       0  cqueue
    0       0       0  kacpi_notify
    0       0       0  kacpid
    0     149     149  kblockd/0
    0       0       0  kintegrityd/0
    0    1000    1000  khelper
    0    2270    2270  events/0

    Changes in V2:

    - Drop the static array based on NR_CPUS and dynamically allocate the
      stat array with num_possible_cpus() and other cpu mask facilities....
    - Trace workqueue insertion at a slightly lower level (insert_work
      instead of queue_work) to handle even the workqueue barriers.

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

01 Jan, 2009

1 commit

  • Impact: Reduce memory usage, use new cpumask API.

    cpu_populated_map becomes a cpumask_var_t, and cpu_singlethread_map
    becomes a plain cpumask pointer: it is simply the cpumask containing the
    first possible CPU anyway.
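
    A sketch of the resulting declarations (assumed; at init the pointer
    would be set with cpumask_of(singlethread_cpu)):

    static cpumask_var_t cpu_populated_map __read_mostly;
    /* no allocation needed: points at an existing constant one-cpu mask */
    static const struct cpumask *cpu_singlethread_map __read_mostly;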

    Signed-off-by: Rusty Russell

    Rusty Russell
     

06 Nov, 2008

1 commit

  • Impact: introduce new APIs

    We want to deprecate cpumasks on the stack, as we are headed for
    ginormous numbers of CPUs. Eventually, we want to head towards an
    undefined 'struct cpumask' so they can never be declared on the stack.

    1) New cpumask functions which take pointers instead of copies.
    (cpus_* -> cpumask_*)

    2) Several new helpers to reduce requirements for temporary cpumasks
    (cpumask_first_and, cpumask_next_and, cpumask_any_and)

    3) Helpers for declaring cpumasks on or off the stack for large NR_CPUS
    (cpumask_var_t, alloc_cpumask_var and free_cpumask_var)

    4) 'struct cpumask' for explicitness and to mark new-style code.

    5) Make iterator functions stop at nr_cpu_ids (a runtime constant),
    not NR_CPUS, for time efficiency and for smaller dynamic allocations
    in the future.

    6) cpumask_copy() so we can allocate less than a full cpumask eventually
    (for alloc_cpumask_var), and so we can eliminate the 'struct cpumask'
    definition eventually.

    7) work_on_cpu() helper for doing a task on a CPU, rather than saving the
    old cpumask for the current thread and manipulating it.

    8) smp_call_function_many() which is smp_call_function_mask() except
    taking a cpumask pointer.

    Note that this patch simply introduces the new functions and leaves
    the obsolescent ones in place. This is to simplify the transition
    patches.
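
    A hedged usage sketch of the new style (ping_others() and remote_fn are
    hypothetical, and the exact mask symbol names may differ at this point
    in the series):

    static int ping_others(void)
    {
            cpumask_var_t mask;
            int cpu;

            /* off-stack allocation instead of a stack cpumask */
            if (!alloc_cpumask_var(&mask, GFP_KERNEL))
                    return -ENOMEM;

            cpu = get_cpu();  /* smp_call_function_many() requires
                               * preemption disabled */
            cpumask_copy(mask, cpu_online_mask);
            cpumask_clear_cpu(cpu, mask);
            smp_call_function_many(mask, remote_fn, NULL, true);
            put_cpu();

            free_cpumask_var(mask);
            return 0;
    }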

    Signed-off-by: Rusty Russell
    Signed-off-by: Ingo Molnar

    Rusty Russell
     

22 Oct, 2008

1 commit

  • create_rt_workqueue() will create a realtime-prioritized workqueue.
    This is needed for the conversion of stop_machine to a workqueue-based
    implementation.

    This patch adds yet another parameter to __create_workqueue_key to tell
    it that we want an rt workqueue. However, it looks like we should rather
    have something like "int type" instead of separate singlethread,
    freezable and rt parameters.
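
    The wrapper plausibly sits next to the existing variants like this (a
    sketch, assuming the flag order singlethread, freezable, rt):

    #define create_workqueue(name)                  \
            __create_workqueue((name), 0, 0, 0)
    #define create_rt_workqueue(name)               \
            __create_workqueue((name), 0, 0, 1)
    #define create_freezeable_workqueue(name)       \
            __create_workqueue((name), 1, 1, 0)
    #define create_singlethread_workqueue(name)     \
            __create_workqueue((name), 1, 0, 0)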

    Signed-off-by: Heiko Carstens
    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar

    Heiko Carstens
     

26 Jul, 2008

8 commits

  • The bug was pointed out by Akinobu Mita, and this patch is based on his
    original patch.

    workqueue_cpu_callback(CPU_UP_PREPARE) expects that if it returns
    NOTIFY_BAD, _cpu_up() will send CPU_UP_CANCELED then.

    However, this has not been true since

    "cpu hotplug: cpu: deliver CPU_UP_CANCELED only to NOTIFY_OKed callbacks with CPU_UP_PREPARE"
    commit: a0d8cdb652d35af9319a9e0fb7134de2a276c636

    A callback which has returned NOTIFY_BAD will not receive
    CPU_UP_CANCELED. Change the code to carry out the CPU_UP_CANCELED logic
    itself if CPU_UP_PREPARE fails.
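
    Sketched as a pattern (helper names follow the workqueue code of that
    era; abridged, not the verbatim patch):

    switch (action) {
    case CPU_UP_PREPARE:
            if (!create_workqueue_thread(cwq, cpu))
                    break;
            /* _cpu_up() will not send CPU_UP_CANCELED back to a
             * callback that returned NOTIFY_BAD, so perform the
             * cancel path ourselves before returning */
            action = CPU_UP_CANCELED;
            ret = NOTIFY_BAD;
            /* fall through */
    case CPU_UP_CANCELED:
            start_workqueue_thread(cwq, -1);
            cleanup_workqueue_thread(cwq);
            break;
    }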

    Signed-off-by: Oleg Nesterov
    Reported-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • schedule_on_each_cpu() can use schedule_work_on() to avoid the code
    duplication.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • queue_work() can use queue_work_on() to avoid the code duplication.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Add lockdep annotations to flush_work() and update the comment.

    Signed-off-by: Oleg Nesterov
    Cc: Jarek Poplawski
    Acked-by: Johannes Berg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • workqueue_cpu_callback(CPU_DEAD) flushes cwq->thread under
    cpu_maps_update_begin(). This means that multithreaded workqueues
    can't use get_online_cpus() due to the possible deadlock, a very bad
    and very old problem.

    Introduce the new state, CPU_POST_DEAD, which is called after
    cpu_hotplug_done() but before cpu_maps_update_done().

    Change workqueue_cpu_callback() to use CPU_POST_DEAD instead of CPU_DEAD.
    This means that create/destroy functions can't rely on get_online_cpus()
    any longer and should take cpu_add_remove_lock instead.
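
    The resulting ordering during cpu_down(), per the changelog (a sketch,
    not verbatim code):

    cpu_maps_update_begin();        /* takes cpu_add_remove_lock */
            cpu_hotplug_begin();
                    /* CPU_DOWN_PREPARE, __cpu_disable(), CPU_DEAD ... */
            cpu_hotplug_done();
            /* new: CPU_POST_DEAD is notified here, outside the hotplug
             * lock, so the callback may flush cwq->thread and the works
             * being flushed can use get_online_cpus() safely */
    cpu_maps_update_done();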

    [akpm@linux-foundation.org: fix CONFIG_SMP=n]
    Signed-off-by: Oleg Nesterov
    Acked-by: Gautham R Shenoy
    Cc: Heiko Carstens
    Cc: Max Krasnyansky
    Cc: Paul Jackson
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: Vegard Nossum
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Change schedule_on_each_cpu() to use flush_work() instead of
    flush_workqueue(); this way we don't wait for other work_structs which
    can be queued meanwhile.

    Signed-off-by: Oleg Nesterov
    Cc: Jarek Poplawski
    Cc: Max Krasnyansky
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Most users of flush_workqueue() can be changed to use cancel_work_sync(),
    but sometimes we really need to wait for the completion and cancelling is
    not an option. schedule_on_each_cpu() is a good example.

    Add the new helper, flush_work(work), which waits for the completion of
    the specific work_struct. More precisely, it "flushes" the result of the
    last queue_work() which is visible to the caller.

    For example, this code

    queue_work(wq, work);
    /* WINDOW */
    queue_work(wq, work);

    flush_work(work);

    doesn't necessarily work "as expected". What can happen in the WINDOW above is

    - wq starts the execution of work->func()

    - the caller migrates to another CPU

    now, after the 2nd queue_work() this work is active on the previous CPU, and
    at the same time it is queued on another. In this case flush_work(work) may
    return before the first work->func() completes.

    It is trivial to add another helper

    int flush_work_sync(struct work_struct *work)
    {
            return flush_work(work) || wait_on_work(work);
    }

    which works "more correctly", but it has to iterate over all CPUs and is
    thus much slower than flush_work().

    Signed-off-by: Oleg Nesterov
    Acked-by: Max Krasnyansky
    Acked-by: Jarek Poplawski
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • insert_work() inserts the new work_struct before or after cwq->worklist,
    depending on the "int tail" parameter. Change it to accept a "list_head *"
    instead; this shrinks .text a bit and allows us to insert the barrier
    after a specific work_struct.
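
    A sketch of the new shape (the body is abridged to the essentials and
    follows the surrounding code of that era, not the verbatim patch):

    static void insert_work(struct cpu_workqueue_struct *cwq,
                            struct work_struct *work,
                            struct list_head *head)
    {
            set_wq_data(work, cwq);
            /* publish the wq data before the work becomes visible on
             * the list */
            smp_wmb();
            /* the caller now picks the exact insertion point */
            list_add_tail(&work->entry, head);
            wake_up(&cwq->more_work);
    }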

    Signed-off-by: Oleg Nesterov
    Cc: Jarek Poplawski
    Cc: Max Krasnyansky
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

25 Jul, 2008

1 commit

  • This interface allows adding a job on a specific cpu.

    Although a work struct on a cpu will be scheduled to another cpu if that
    cpu dies, there is a recursion if a work task tries to offline the cpu
    it's running on. We need to schedule the task to a specific cpu in this
    case. http://bugzilla.kernel.org/show_bug.cgi?id=10897
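
    A hedged usage sketch (the callback and the choice of cpu 0 are
    illustrative only):

    static struct work_struct offline_helper;

    static void request_offline(void)
    {
            INIT_WORK(&offline_helper, do_cpu_offline); /* hypothetical */
            /* pin the work to CPU 0 so it cannot run on the cpu that
             * is about to be taken down */
            schedule_work_on(0, &offline_helper);
    }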

    [oleg@tv-sign.ru: cleanups]
    Signed-off-by: Zhang Rui
    Tested-by: Rus
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Rui
     

05 Jul, 2008

1 commit

  • Remove all clameter@sgi.com addresses from the kernel tree since they will
    become invalid on June 27th. Change my maintainer email address for the
    slab allocators to cl@linux-foundation.org (which will be the new email
    address for the future).

    Signed-off-by: Christoph Lameter
    Signed-off-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Stephen Rothwell
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter