12 Dec, 2011

6 commits

  • Tyler Hicks pointed me at an additional article on RCU and I figured
    it should probably be mentioned with the others.

    Signed-off-by: Kees Cook
    Signed-off-by: Paul E. McKenney

    Kees Cook
     
  • Running CPU-hotplug operations concurrently with rcutorture has
    historically been a good way to find bugs in both RCU and CPU hotplug.
    This commit therefore adds an rcutorture module parameter called
    "onoff_interval" that causes a randomly selected CPU-hotplug operation to
    be executed at the specified interval, in seconds. The default value of
    "onoff_interval" is zero, which disables rcutorture-instigated CPU-hotplug
    operations.
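
    As a rough sketch (not the actual rcutorture source; names and types are
    assumed), such a module parameter is typically declared and described
    along these lines:

        #include <linux/module.h>

        /* Hypothetical declaration; the real rcutorture code may differ. */
        static int onoff_interval;  /* 0 = no rcutorture-driven CPU hotplug */
        module_param(onoff_interval, int, 0444);
        MODULE_PARM_DESC(onoff_interval,
                         "Seconds between CPU-hotplug operations, 0=disable");

    It would then be set at module-load time, or via the corresponding
    rcutorture.onoff_interval= kernel boot parameter when built in.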

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Although it is easy to run rcutorture tests under KVM, there is currently
    no nice way to run such a test for a fixed time period, collect all of
    the rcutorture data, and then shut the system down cleanly. This commit
    therefore adds an rcutorture module parameter named "shutdown_secs" that
    specifies the run duration in seconds, after which rcutorture terminates
    the test and powers the system down. The default value for "shutdown_secs"
    is zero, which disables shutdown.
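
    A minimal sketch of the idea (hypothetical code, not the actual
    rcutorture shutdown path): a kthread sleeps for shutdown_secs and then
    powers the system off.

        #include <linux/delay.h>
        #include <linux/kthread.h>
        #include <linux/reboot.h>

        static int shutdown_secs;  /* 0 = never shut the system down */

        /* Hypothetical shutdown thread: wait, then power off cleanly. */
        static int torture_shutdown_sketch(void *arg)
        {
                ssleep(shutdown_secs);  /* let the torture test run */
                kernel_power_off();     /* terminate test, power system down */
                return 0;
        }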

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Update various files in Documentation/RCU to reflect srcu_read_lock_raw()
    and srcu_read_unlock_raw(). Credit to Peter Zijlstra for suggesting
    use of the existing _raw suffix instead of the earlier bulkref names.
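
    For reference, a reader using the _raw variants looks much like an
    ordinary SRCU reader; struct foo, my_srcu, and gp below are hypothetical
    names used only for illustration.

        #include <linux/srcu.h>

        struct foo { int a; };

        static struct srcu_struct my_srcu;   /* hypothetical SRCU domain */
        static struct foo __rcu *gp;         /* SRCU-protected pointer */

        static void reader_sketch(void)
        {
                struct foo *p;
                int idx;

                idx = srcu_read_lock_raw(&my_srcu);
                p = srcu_dereference(gp, &my_srcu);
                /* ... use p ... */
                srcu_read_unlock_raw(&my_srcu, idx);
        }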

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • One of lclaudio's systems was seeing RCU CPU stall warnings from idle.
    These turned out to be caused by a bug that stopped scheduling-clock
    tick interrupts from being sent to a given CPU for several hundred seconds.
    This commit therefore updates the documentation to call this out as a
    possible cause for RCU CPU stall warnings.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Earlier versions of RCU used the scheduling-clock tick to detect idleness
    by checking for the idle task, but handled idleness differently for
    CONFIG_NO_HZ=y. But there are now a number of uses of RCU read-side
    critical sections in the idle task, for example, for tracing. A more
    fine-grained detection of idleness is therefore required.

    This commit presses the old dyntick-idle code into full-time service,
    so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
    always invoked at the beginning of an idle loop iteration. Similarly,
    rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
    at the end of an idle-loop iteration. This allows the idle task to
    use RCU everywhere except between consecutive rcu_idle_enter() and
    rcu_idle_exit() calls, in turn allowing architecture maintainers to
    specify exactly where in the idle loop that RCU may be used.
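
    In other words, an architecture's idle loop is expected to look roughly
    like the following simplified sketch, where low_power_wait() is a
    hypothetical stand-in for the architecture-specific wait-for-interrupt
    instruction:

        #include <linux/sched.h>
        #include <linux/rcupdate.h>

        static void low_power_wait(void);  /* hypothetical arch wfi/hlt helper */

        static void cpu_idle_sketch(void)
        {
                while (!need_resched()) {
                        rcu_idle_enter();  /* RCU must not be used ...       */
                        low_power_wait();  /* ... while the CPU is idle ...  */
                        rcu_idle_exit();   /* ... but may be used again here. */
                }
                schedule();
        }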

    Because some of the userspace upcall uses can result in what looks
    to RCU like half of an interrupt, it is not possible to expect that
    the irq_enter() and irq_exit() hooks will give exact counts. This
    patch therefore expands the ->dynticks_nesting counter to 64 bits
    and uses two separate bitfields to count process/idle transitions
    and interrupt entry/exit transitions. It is presumed that userspace
    upcalls do not happen in the idle loop or from usermode execution
    (though usermode might do a system call that results in an upcall).
    The counter is hard-reset on each process/idle transition, which
    avoids the interrupt entry/exit error from accumulating. Overflow
    is avoided by the 64-bitness of the ->dynticks_nesting counter.

    This commit also adds warnings if a non-idle task asks RCU to enter
    idle state, and these checks will need some adjustment before applying
    Frederic's OS-jitter patches (http://lkml.org/lkml/2011/10/7/246).
    In addition, validation of ->dynticks and ->dynticks_nesting is added.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

29 Sep, 2011

8 commits

  • There has been quite a bit of confusion about what RCU-lockdep splats
    mean, so this commit adds some documentation describing how to
    interpret them.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Add documentation for rcu_dereference_bh_check(),
    rcu_dereference_sched_check(), srcu_dereference_check(), and
    rcu_dereference_index_check().
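
    Each of these takes the usual RCU-protected pointer plus a lockdep
    expression giving the other conditions under which the access is legal.
    For example (struct foo, gp, mylock, and my_srcu are hypothetical names):

        #include <linux/rcupdate.h>
        #include <linux/srcu.h>
        #include <linux/spinlock.h>

        struct foo { int a; };

        static DEFINE_SPINLOCK(mylock);      /* hypothetical update-side lock */
        static struct srcu_struct my_srcu;   /* hypothetical SRCU domain */
        static struct foo __rcu *gp;         /* hypothetical protected pointer */

        static void deref_examples(void)
        {
                struct foo *p;

                /* OK under rcu_read_lock_bh() or while holding mylock. */
                p = rcu_dereference_bh_check(gp, lockdep_is_held(&mylock));

                /* OK with preemption disabled (RCU-sched) or holding mylock. */
                p = rcu_dereference_sched_check(gp, lockdep_is_held(&mylock));

                /* OK in an SRCU read-side section on my_srcu or holding mylock. */
                p = srcu_dereference_check(gp, &my_srcu, lockdep_is_held(&mylock));
        }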

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
    Since commit ca5ecddf (rcu: define __rcu address space modifier for
    sparse), rcu_dereference_check() has automatically included
    rcu_read_lock_held() as part of its condition. Callers of
    rcu_dereference_check() therefore no longer need to pass
    rcu_read_lock_held() explicitly.
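
    For example (hypothetical names), a call that used to be written

        p = rcu_dereference_check(gp, rcu_read_lock_held() ||
                                      lockdep_is_held(&mylock));

    can now simply be written

        p = rcu_dereference_check(gp, lockdep_is_held(&mylock));

    because rcu_read_lock_held() is part of the check automatically.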

    Signed-off-by: Michal Hocko
    Signed-off-by: Paul E. McKenney

    Michal Hocko
     
  • There is often a delay between the time that a CPU passes through a
    quiescent state and the time that this quiescent state is reported to the
    RCU core. It is quite possible that the grace period ended before the
    quiescent state could be reported, for example, some other CPU might have
    deduced that this CPU passed through dyntick-idle mode. It is critically
    important that the quiescent state be counted only against the grace period
    that was in effect at the time that the quiescent state was detected.

    Previously, this was handled by recording the number of the last grace
    period to complete when passing through a quiescent state. The RCU
    core then checks this number against the current value, and rejects
    the quiescent state if there is a mismatch. However, one additional
    possibility must be accounted for, namely that the quiescent state was
    recorded after the prior grace period completed but before the current
    grace period started. In this case, the RCU core must reject the
    quiescent state, but the recorded number will match. This is handled
    when the CPU becomes aware of a new grace period -- at that point,
    it invalidates any prior quiescent state.

    This works, but is a bit indirect. The new approach records the current
    grace period, and the RCU core checks to see (1) that this is still the
    current grace period and (2) that this grace period has not yet ended.
    This approach simplifies reasoning about correctness, and this commit
    changes over to this new approach.
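
    A toy sketch of the new approach (illustrative only, not the in-kernel
    data structures): record the grace-period number in effect when the
    quiescent state is observed, and accept the report only if that grace
    period is both still current and still in progress.

        #include <linux/types.h>

        struct qs_record {
                unsigned long gpnum;  /* grace period in effect at QS time */
                bool pending;         /* QS seen but not yet reported */
        };

        static bool qs_still_valid(struct qs_record *qs,
                                   unsigned long cur_gpnum,
                                   unsigned long completed)
        {
                /* (1) same grace period and (2) that grace period not ended. */
                return qs->pending && qs->gpnum == cur_gpnum &&
                       completed < cur_gpnum;
        }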

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • It has long been the case that the architecture must call nmi_enter()
    and nmi_exit() rather than irq_enter() and irq_exit() in order to
    permit RCU read-side critical sections in NMIs. Catch the documentation
    up with reality.
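
    That is, an architecture's NMI path needs to be bracketed roughly as
    follows for RCU readers to be legal inside the handler (simplified
    sketch; arch_handle_nmi_sketch() is a hypothetical name):

        #include <linux/hardirq.h>
        #include <linux/rcupdate.h>

        void arch_handle_nmi_sketch(void)
        {
                nmi_enter();        /* not irq_enter() */
                rcu_read_lock();    /* RCU readers now legal in this NMI */
                /* ... inspect RCU-protected state ... */
                rcu_read_unlock();
                nmi_exit();         /* not irq_exit() */
        }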

    Signed-off-by: Paul E. McKenney
    Acked-by: Mathieu Desnoyers

    Paul E. McKenney
     
  • Now that the RCU API contains synchronize_rcu_bh(), synchronize_sched(),
    call_rcu_sched(), and rcu_bh_expedited()...

    Make rcutorture test synchronize_rcu_bh(), getting rid of the old
    rcu_bh_torture_synchronize() workaround. Similarly, make rcutorture test
    synchronize_sched(), getting rid of the old sched_torture_synchronize()
    workaround. Make rcutorture test call_rcu_sched() instead of wrapping
    synchronize_sched(). Also add testing of rcu_bh_expedited().
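
    For reference, the call_rcu_sched() interface being exercised is used
    from caller code along these lines (struct foo, foo_reclaim(), and
    foo_retire() are hypothetical):

        #include <linux/rcupdate.h>
        #include <linux/slab.h>

        struct foo {
                struct rcu_head rcu;
                /* ... payload ... */
        };

        static void foo_reclaim(struct rcu_head *head)
        {
                kfree(container_of(head, struct foo, rcu));
        }

        /* Free @old_fp once all preempt-disabled regions have completed. */
        static void foo_retire(struct foo *old_fp)
        {
                call_rcu_sched(&old_fp->rcu, foo_reclaim);
        }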

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Update rcutorture documentation to account for boosting, new types of
    RCU torture testing that have been added over the past few years, and
    the memory-barrier testing that was added an embarrassingly long time
    ago.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Call out the RCU_TRACE information that is provided only in kernels
    built with RCU_BOOST.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

13 Jun, 2011

1 commit

  • Change all "arch/i386" to "arch/x86" in Documentaion/,
    since the directory has changed.

    Also update the files which have changed their filename
    in the meantime accordingly.

    Signed-off-by: Wanlong Gao
    [jkosina@suse.cz: reword changelog]
    Signed-off-by: Jiri Kosina

    Wanlong Gao
     

27 May, 2011

1 commit

  • (Note: this was reverted, and is now being re-applied in pieces, with
    this being the fifth and final piece. See below for the reason that
    it is now felt to be safe to re-apply this.)

    Commit d09b62d fixed grace-period synchronization, but left some smp_mb()
    invocations in rcu_process_callbacks() that are no longer needed, but
    sheer paranoia prevented them from being removed. This commit removes
    them and provides a proof of correctness in their absence. It also adds
    a memory barrier to rcu_report_qs_rsp() immediately before the update to
    rsp->completed in order to handle the theoretical possibility that the
    compiler or CPU might move massive quantities of code into a lock-based
    critical section. This also proves that the sheer paranoia was not
    entirely unjustified, at least from a theoretical point of view.

    In addition, the old dyntick-idle synchronization depended on the fact
    that grace periods were many milliseconds in duration, so that it could
    be assumed that no dyntick-idle CPU could reorder a memory reference
    across an entire grace period. Unfortunately for this design, the
    addition of expedited grace periods breaks this assumption, which has
    the unfortunate side-effect of requiring atomic operations in the
    functions that track dyntick-idle state for RCU. (There is some hope
    that the algorithms used in user-level RCU might be applied here, but
    some work is required to handle the NMIs that user-space applications
    can happily ignore. For the short term, better safe than sorry.)

    This proof assumes that neither compiler nor CPU will allow a lock
    acquisition and release to be reordered, as doing so can result in
    deadlock. The proof is as follows:

    1. A given CPU declares a quiescent state under the protection of
    its leaf rcu_node's lock.

    2. If there is more than one level of rcu_node hierarchy, the
    last CPU to declare a quiescent state will also acquire the
    ->lock of the next rcu_node up in the hierarchy, but only
    after releasing the lower level's lock. The acquisition of this
    lock clearly cannot occur prior to the acquisition of the leaf
    node's lock.

    3. Step 2 repeats until we reach the root rcu_node structure.
    Please note again that only one lock is held at a time through
    this process. The acquisition of the root rcu_node's ->lock
    must occur after the release of that of the leaf rcu_node.

    4. At this point, we set the ->completed field in the rcu_state
    structure in rcu_report_qs_rsp(). However, if the rcu_node
    hierarchy contains only one rcu_node, then in theory the code
    preceding the quiescent state could leak into the critical
    section. We therefore precede the update of ->completed with a
    memory barrier. All CPUs will therefore agree that any updates
    preceding any report of a quiescent state will have happened
    before the update of ->completed.

    5. Regardless of whether a new grace period is needed, rcu_start_gp()
    will propagate the new value of ->completed to all of the leaf
    rcu_node structures, under the protection of each rcu_node's ->lock.
    If a new grace period is needed immediately, this propagation
    will occur in the same critical section that ->completed was
    set in, but courtesy of the memory barrier in #4 above, is still
    seen to follow any pre-quiescent-state activity.

    6. When a given CPU invokes __rcu_process_gp_end(), it becomes
    aware of the end of the old grace period and therefore makes
    any RCU callbacks that were waiting on that grace period eligible
    for invocation.

    If this CPU is the same one that detected the end of the grace
    period, and if there is but a single rcu_node in the hierarchy,
    we will still be in the single critical section. In this case,
    the memory barrier in step #4 guarantees that all callbacks will
    be seen to execute after each CPU's quiescent state.

    On the other hand, if this is a different CPU, it will acquire
    the leaf rcu_node's ->lock, and will again be serialized after
    each CPU's quiescent state for the old grace period.

    On the strength of this proof, this commit therefore removes the memory
    barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp().
    The effect is to reduce the number of memory barriers by one and to
    reduce the frequency of execution from about once per scheduling tick
    per CPU to once per grace period.

    This was reverted due to hangs found during testing by Yinghai Lu and
    Ingo Molnar. Frederic Weisbecker supplied Yinghai with tracing that
    located the underlying problem, and Frederic also provided the fix.

    The underlying problem was that the HARDIRQ_ENTER() macro from
    lib/locking-selftest.c invoked irq_enter(), which in turn invokes
    rcu_irq_enter(), but HARDIRQ_EXIT() invoked __irq_exit(), which
    does not invoke rcu_irq_exit(). This situation resulted in calls
    to rcu_irq_enter() that were not balanced by the required calls to
    rcu_irq_exit(). Therefore, after these locking selftests completed,
    RCU's dyntick-idle nesting count was a large number (for example,
    72), which caused RCU to conclude that the affected CPU was not in
    dyntick-idle mode when in fact it was.

    RCU would therefore incorrectly wait for this dyntick-idle CPU, resulting
    in hangs.

    In contrast, with Frederic's patch, which replaces the irq_enter()
    in HARDIRQ_ENTER() with an __irq_enter(), these tests don't ever call
    either rcu_irq_enter() or rcu_irq_exit(), which works because the CPU
    running the test is already marked as not being in dyntick-idle mode.
    This keeps the rcu_irq_enter() and rcu_irq_exit() calls balanced, so
    RCU has no problem working out which CPUs are in dyntick-idle mode and
    which are not.

    The reason that the imbalance was not noticed before the barrier patch
    was applied is that the old implementation of rcu_enter_nohz() ignored
    the nesting depth. This could still result in delays, but much shorter
    ones. Whenever there was a delay, RCU would IPI the CPU with the
    unbalanced nesting level, which would eventually result in rcu_enter_nohz()
    being called, which in turn would force RCU to see that the CPU was in
    dyntick-idle mode.

    The reason that very few people noticed the problem is that the mismatched
    irq_enter() vs. __irq_exit() occurred only when the kernel was built with
    CONFIG_DEBUG_LOCKING_API_SELFTESTS.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

20 May, 2011

1 commit


06 May, 2011

8 commits

  • Increment a per-CPU counter on each pass through rcu_cpu_kthread()'s
    service loop, and add it to the rcudata trace output.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This commit adds the age in jiffies of the current grace period along
    with the duration in jiffies of the longest grace period since boot
    to the rcu/rcugp debugfs file. It also adds an additional "O" state
    to kthread tracing to differentiate between the kthread waiting due to
    having nothing to do on the one hand and waiting due to being on the
    wrong CPU on the other hand.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit documents the new debugfs rcu/rcutorture and rcu/rcuboost
    trace files. The description has been updated as suggested by Josh
    Triplett.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit adds an indication of the state of the callback queue using
    a string of four characters following the "ql=" integer queue length.
    The first character is "N" if there are callbacks that have been
    queued that are not yet ready to be handled by the next grace period, or
    "." otherwise. The second character is "R" if there are callbacks queued
    that are ready to be handled by the next grace period, or "." otherwise.
    The third character is "W" if there are callbacks waiting for the current
    grace period, or "." otherwise. Finally, the fourth character is "D"
    if there are callbacks that have been handled by a prior grace period
    and are waiting to be invoked, or ".".
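
    For illustration, a made-up entry might read "ql=3 N.W.", meaning three
    callbacks are queued: some too new for the next grace period ("N"),
    none ready for the next grace period ("."), some waiting on the current
    grace period ("W"), and none ready to invoke (".").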

    Note that callbacks that are in the process of being invoked are
    not shown. These callbacks would have been removed from the rcu_data
    structure's list by rcu_do_batch() prior to being executed. (These
    callbacks are also not reflected in the "ql=" total, FWIW.)

    Also, document the new callback-queue trace information.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The trace.txt file had obsolete output for the debugfs rcu/rcudata
    file, so update it.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Combine the current TREE_PREEMPT_RCU ->blocked_tasks[] lists in the
    rcu_node structure into a single ->blkd_tasks list with ->gp_tasks
    and ->exp_tasks tail pointers. This is in preparation for RCU priority
    boosting, which will add a third dimension to the combinatorial explosion
    in the ->blocked_tasks[] case, but simply a third pointer in the new
    ->blkd_tasks case.

    Also update the documentation to reflect the ->blocked_tasks[] merge.
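
    For illustration only (simplified, not the full rcu_node definition),
    the merged bookkeeping looks roughly like this:

        #include <linux/list.h>

        struct rcu_node_sketch {
                struct list_head blkd_tasks;  /* all blocked RCU readers */
                struct list_head *gp_tasks;   /* tail ptr: current grace period */
                struct list_head *exp_tasks;  /* tail ptr: expedited grace period */
        };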

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Commit d09b62d fixed grace-period synchronization, but left some smp_mb()
    invocations in rcu_process_callbacks() that are no longer needed, but
    sheer paranoia prevented them from being removed. This commit removes
    them and provides a proof of correctness in their absence. It also adds
    a memory barrier to rcu_report_qs_rsp() immediately before the update to
    rsp->completed in order to handle the theoretical possibility that the
    compiler or CPU might move massive quantities of code into a lock-based
    critical section. This also proves that the sheer paranoia was not
    entirely unjustified, at least from a theoretical point of view.

    In addition, the old dyntick-idle synchronization depended on the fact
    that grace periods were many milliseconds in duration, so that it could
    be assumed that no dyntick-idle CPU could reorder a memory reference
    across an entire grace period. Unfortunately for this design, the
    addition of expedited grace periods breaks this assumption, which has
    the unfortunate side-effect of requiring atomic operations in the
    functions that track dyntick-idle state for RCU. (There is some hope
    that the algorithms used in user-level RCU might be applied here, but
    some work is required to handle the NMIs that user-space applications
    can happily ignore. For the short term, better safe than sorry.)

    This proof assumes that neither compiler nor CPU will allow a lock
    acquisition and release to be reordered, as doing so can result in
    deadlock. The proof is as follows:

    1. A given CPU declares a quiescent state under the protection of
    its leaf rcu_node's lock.

    2. If there is more than one level of rcu_node hierarchy, the
    last CPU to declare a quiescent state will also acquire the
    ->lock of the next rcu_node up in the hierarchy, but only
    after releasing the lower level's lock. The acquisition of this
    lock clearly cannot occur prior to the acquisition of the leaf
    node's lock.

    3. Step 2 repeats until we reach the root rcu_node structure.
    Please note again that only one lock is held at a time through
    this process. The acquisition of the root rcu_node's ->lock
    must occur after the release of that of the leaf rcu_node.

    4. At this point, we set the ->completed field in the rcu_state
    structure in rcu_report_qs_rsp(). However, if the rcu_node
    hierarchy contains only one rcu_node, then in theory the code
    preceding the quiescent state could leak into the critical
    section. We therefore precede the update of ->completed with a
    memory barrier. All CPUs will therefore agree that any updates
    preceding any report of a quiescent state will have happened
    before the update of ->completed.

    5. Regardless of whether a new grace period is needed, rcu_start_gp()
    will propagate the new value of ->completed to all of the leaf
    rcu_node structures, under the protection of each rcu_node's ->lock.
    If a new grace period is needed immediately, this propagation
    will occur in the same critical section that ->completed was
    set in, but courtesy of the memory barrier in #4 above, is still
    seen to follow any pre-quiescent-state activity.

    6. When a given CPU invokes __rcu_process_gp_end(), it becomes
    aware of the end of the old grace period and therefore makes
    any RCU callbacks that were waiting on that grace period eligible
    for invocation.

    If this CPU is the same one that detected the end of the grace
    period, and if there is but a single rcu_node in the hierarchy,
    we will still be in the single critical section. In this case,
    the memory barrier in step #4 guarantees that all callbacks will
    be seen to execute after each CPU's quiescent state.

    On the other hand, if this is a different CPU, it will acquire
    the leaf rcu_node's ->lock, and will again be serialized after
    each CPU's quiescent state for the old grace period.

    On the strength of this proof, this commit therefore removes the memory
    barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp().
    The effect is to reduce the number of memory barriers by one and to
    reduce the frequency of execution from about once per scheduling tick
    per CPU to once per grace period.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The RCU CPU stall warnings can now be controlled using the
    rcu_cpu_stall_suppress boot-time parameter or via the same parameter
    from sysfs. There is therefore no longer any reason to have
    kernel config parameters for this feature. This commit therefore
    removes the RCU_CPU_STALL_DETECTOR and RCU_CPU_STALL_DETECTOR_RUNNABLE
    kernel config parameters. The RCU_CPU_STALL_TIMEOUT parameter remains
    to allow the timeout to be tuned and the RCU_CPU_STALL_VERBOSE parameter
    remains to allow task-stall information to be suppressed if desired.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

05 Mar, 2011

1 commit


30 Nov, 2010

2 commits


24 Sep, 2010

1 commit

  • The current tracing data is not sufficient to deduce the average time
    that a callback spends waiting for a grace period to end. Add three
    per-CPU counters recording the number of callbacks invoked (ci), the
    number of callbacks orphaned (co), and the number of callbacks adopted
    (ca). Given the existing callback queue length (ql), the average wait
    time in the absence of CPU hotplug operations is ql/ci. The units of wait
    time will be in terms of the duration over which ci was measured.

    In the presence of CPU hotplug operations, there is room for argument,
    but ql/(ci-co+ca) won't steer you too far wrong.
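
    As a made-up worked example, if a given tracing interval shows ql=1000,
    ci=500, co=10, and ca=20, then:

        average wait ~ ql/ci             = 1000/500            = 2.0 intervals
        with hotplug ~ ql/(ci - co + ca) = 1000/(500 - 10 + 20) ~ 1.96 intervals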

    Also fixes a typo called out by Lucas De Marchi.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

24 Aug, 2010

1 commit


21 Aug, 2010

1 commit


20 Aug, 2010

1 commit


04 Aug, 2010

1 commit

    Below is an updated version of the original series, bunching all of the
    patches into one big patch that updates broken web addresses located in
    Documentation/*. Some of the addresses date as far back as 1995, so
    searching became a bit difficult; the best way to deal with these is to
    use web.archive.org to locate the outdated addresses. Some addresses
    also point to .spec files: a few could be located, but others (even
    after searching the company's site) were nowhere to be found. In those
    cases the address was simply changed to the company site, so that users
    can contact the company and have it locate the files for them.

    Signed-off-by: Justin P. Mattock
    Signed-off-by: Thomas Weber
    Signed-off-by: Mike Frysinger
    Cc: Paulo Marques
    Cc: Randy Dunlap
    Cc: Michael Neuling
    Signed-off-by: Jiri Kosina

    Justin P. Mattock
     

18 May, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
    stop_machine: Move local variable closer to the usage site in cpu_stop_cpu_callback()
    sched, wait: Use wrapper functions
    sched: Remove a stale comment
    ondemand: Make the iowait-is-busy time a sysfs tunable
    ondemand: Solve a big performance issue by counting IOWAIT time as busy
    sched: Intoduce get_cpu_iowait_time_us()
    sched: Eliminate the ts->idle_lastupdate field
    sched: Fold updating of the last_update_time_info into update_ts_time_stats()
    sched: Update the idle statistics in get_cpu_idle_time_us()
    sched: Introduce a function to update the idle statistics
    sched: Add a comment to get_cpu_idle_time_us()
    cpu_stop: add dummy implementation for UP
    sched: Remove rq argument to the tracepoints
    rcu: need barrier() in UP synchronize_sched_expedited()
    sched: correctly place paranioa memory barriers in synchronize_sched_expedited()
    sched: kill paranoia check in synchronize_sched_expedited()
    sched: replace migration_thread with cpu_stop
    stop_machine: reimplement using cpu_stop
    cpu_stop: implement stop_cpu[s]()
    sched: Fix select_idle_sibling() logic in select_task_rq_fair()
    ...

    Linus Torvalds
     

11 May, 2010

2 commits

  • The existing Documentation/RCU/stallwarn.txt has proven unhelpful, so
    rework it a bit. In particular, show how to interpret the stall-warning
    messages.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Lai Jiangshan noted that up to 10% of the RCU_SOFTIRQ are spurious, and
    traced this down to the fact that the current grace-period machinery
    will uselessly raise RCU_SOFTIRQ when a given CPU needs to go through
    a quiescent state, but has not yet done so. In this situation, there
    might well be nothing that RCU_SOFTIRQ can do, and the overhead can be
    worth worrying about in the ksoftirqd case. This patch therefore avoids
    raising RCU_SOFTIRQ in this situation.

    Changes since v1 (http://lkml.org/lkml/2010/3/30/122 from Lai Jiangshan):

    o Omit the rcu_qs_pending() prechecks, as they aren't that
    much less expensive than the quiescent-state checks.

    o Merge with the set_need_resched() patch that reduces IPIs.

    o Add the new n_rp_report_qs field to the rcu_pending tracing output.

    o Update the tracing documentation accordingly.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

09 May, 2010

1 commit


07 May, 2010

1 commit

  • Currently migration_thread is serving three purposes - migration
    pusher, context to execute active_load_balance() and forced context
    switcher for expedited RCU synchronize_sched. All three roles are
    hardcoded into migration_thread() and determining which job is
    scheduled is slightly messy.

    This patch kills migration_thread and replaces all three uses with
    cpu_stop. The three different roles of migration_thread() are
    split into three separate cpu_stop callbacks -
    migration_cpu_stop(), active_load_balance_cpu_stop() and
    synchronize_sched_expedited_cpu_stop() - and each use case now simply
    asks cpu_stop to execute the callback as necessary.
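
    For reference, a cpu_stop user asks for work to run on a given CPU
    roughly as follows (poke_cpu_stop() and poke_cpu() are hypothetical):

        #include <linux/stop_machine.h>

        /* Hypothetical callback: runs on the target CPU's stopper thread. */
        static int poke_cpu_stop(void *arg)
        {
                /* forced context switch / migration work goes here */
                return 0;
        }

        static void poke_cpu(unsigned int cpu)
        {
                /* Run poke_cpu_stop() on @cpu and wait for it to finish. */
                stop_one_cpu(cpu, poke_cpu_stop, NULL);
        }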

    synchronize_sched_expedited() was implemented with private
    preallocated resources and custom multi-cpu queueing and waiting
    logic, both of which are provided by cpu_stop.
    synchronize_sched_expedited_count is made atomic and all other shared
    resources along with the mutex are dropped.

    synchronize_sched_expedited() also implemented a check to detect cases
    where not all the callbacks got executed on their assigned CPUs, falling
    back to synchronize_sched() in that case. If called with cpu hotplug blocked,
    cpu_stop already guarantees that and the condition cannot happen;
    otherwise, stop_machine() would break. However, this patch preserves
    the paranoid check using a cpumask to record on which cpus the stopper
    ran so that it can serve as a bisection point if something actually
    goes wrong there.

    Because the internal execution state is no longer visible,
    rcu_expedited_torture_stats() is removed.

    This patch also renames cpu_stop threads from "stopper/%d" to
    "migration/%d". The names of these threads ultimately don't matter
    and there's no reason to make unnecessary userland visible changes.

    With this patch applied, stop_machine() and sched now share the same
    resources. stop_machine() is faster without wasting any resources and
    sched migration users are much cleaner.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Dipankar Sarma
    Cc: Josh Triplett
    Cc: Paul E. McKenney
    Cc: Oleg Nesterov
    Cc: Dimitri Sivanich

    Tejun Heo
     

14 Apr, 2010

1 commit

  • Update examples and lists of APIs to include these new
    primitives.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: eric.dumazet@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

25 Feb, 2010

1 commit

  • The version numbers change too quickly, so use a canonical URL
    that represents the most recent version.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney