26 Mar, 2013

1 commit

  • Because RCU callbacks are now associated with the number of the grace
    period that they must wait for, CPUs can now advance callbacks
    corresponding to grace periods that ended while a given CPU was in
    dyntick-idle mode. This eliminates the need to try forcing the RCU
    state machine while entering idle, thus reducing the CPU intensiveness
    of RCU_FAST_NO_HZ, which should increase its energy efficiency.
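
    As a minimal standalone sketch of the idea (the types and names are
    illustrative, not the kernel's actual code), a callback tagged with
    the grace-period number it waits for can be released as soon as the
    recorded completed-grace-period counter catches up:

        #include <stddef.h>

        struct tagged_cb {
                unsigned long gp_num;             /* GP this callback waits for */
                void (*func)(struct tagged_cb *); /* invoked once that GP ends */
                struct tagged_cb *next;
        };

        /* Release every callback whose grace period has already completed;
         * nothing needs to force the grace-period state machine at idle
         * entry for these callbacks to make progress. */
        static void advance_cbs(struct tagged_cb **list, unsigned long completed)
        {
                struct tagged_cb *cb;

                while ((cb = *list) != NULL && cb->gp_num <= completed) {
                        *list = cb->next;
                        cb->func(cb);
                }
        }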

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

17 Nov, 2012

2 commits

  • Currently, callback invocations from callback-free CPUs are accounted to
    the CPU that registered the callback, but using the same field that is
    used for normal callbacks. This makes it impossible to determine from
    debugfs output whether callbacks are in fact being diverted. This commit
    therefore adds a separate ->n_nocbs_invoked field in the rcu_data structure
    in which diverted callback invocations are counted. RCU's debugfs tracing
    still displays normal callback invocations using ci=, and displays
    diverted callback invocations using nci=.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

  • RCU callback execution can add significant OS jitter and also can
    degrade both scheduling latency and, in asymmetric multiprocessors,
    energy efficiency. This commit therefore adds the ability for selected
    CPUs ("rcu_nocbs=" boot parameter) to have their callbacks offloaded
    to kthreads. If the "rcu_nocb_poll" boot parameter is also specified,
    these kthreads will do polling, removing the need for the offloaded
    CPUs to do wakeups. At least one CPU must be doing normal callback
    processing: currently CPU 0 cannot be selected as a no-CBs CPU.
    In addition, attempts to offline the last normal-CBs CPU will fail.

    This feature was inspired by Jim Houston's and Joe Korty's JRCU, and
    this commit includes fixes to problems located by Fengguang Wu's
    kbuild test robot.

    [ paulmck: Added gfp.h include file as suggested by Fengguang Wu. ]
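
    As a usage illustration (the CPU list is a hypothetical example;
    both parameter names appear in the text above), a kernel booted with
    the following command-line fragment would offload callbacks from
    CPUs 1-7 to polling kthreads:

        rcu_nocbs=1-7 rcu_nocb_poll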

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney


25 Sep, 2012

1 commit

  • …', 'hotplug.2012.09.23a' and 'idlechop.2012.09.23a' into HEAD

    bigrt.2012.09.23a contains additional commits to reduce scheduling latency
    from RCU on huge systems (many hundreds or thousands of CPUs).

    doctorture.2012.09.23a contains documentation changes and rcutorture fixes.

    fixes.2012.09.23a contains miscellaneous fixes.

    hotplug.2012.09.23a contains CPU-hotplug-related changes.

    idlechop.2012.09.23a fixes architectures for which RCU no longer
    considered the idle loop to be a quiescent state due to earlier
    adaptive-dynticks changes. Affected architectures are alpha, cris,
    frv, h8300, m32r, m68k, mn10300, parisc, score, xtensa, and ia64.

    Paul E. McKenney

23 Sep, 2012

3 commits

  • Currently, _rcu_barrier() relies on preempt_disable() to prevent
    any CPU from going offline, which in turn depends on CPU hotplug's
    use of __stop_machine().

    This patch therefore makes _rcu_barrier() use get_online_cpus() to
    block CPU-hotplug operations. This has the added benefit of removing
    the need for _rcu_barrier() to adopt callbacks: Because CPU-hotplug
    operations are excluded, there can be no callbacks to adopt. This
    commit simplifies the code accordingly.
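
    The exclusion pattern the commit relies on, as a hedged sketch
    (barrier_like_operation() is an illustrative name, not a kernel
    function; get_online_cpus()/put_online_cpus() are the hotplug
    exclusion API of that era):

        #include <linux/cpu.h>

        static void barrier_like_operation(void)
        {
                get_online_cpus();      /* block CPU-hotplug operations */
                /* No CPU can go offline here, so no callbacks can be
                 * orphaned while the barrier machinery runs. */
                put_online_cpus();      /* allow CPU hotplug again */
        }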

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney

  • The current quiescent-state detection algorithm is needlessly
    complex. It records the grace-period number corresponding to
    the quiescent state at the time of the quiescent state, which
    works, but it seems better to simply erase any record of previous
    quiescent states at the time that the CPU notices the new grace
    period. This has the further advantage of removing another piece
    of RCU for which lockless reasoning is required.

    Therefore, this commit makes this change.
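
    A standalone sketch of the simplified rule (the field names are
    illustrative, not the actual rcu_data fields):

        #include <stdbool.h>

        struct cpu_state {
                unsigned long gp_seen;   /* most recent GP this CPU noticed */
                bool passed_quiesce;     /* QS recorded against gp_seen */
        };

        /* On noticing a new grace period, erase any earlier quiescent-state
         * record so that it cannot be counted against the new grace period. */
        static void note_new_gp(struct cpu_state *cs, unsigned long current_gp)
        {
                if (cs->gp_seen != current_gp) {
                        cs->gp_seen = current_gp;
                        cs->passed_quiesce = false;
                }
        }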

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney

  • Moving quiescent-state forcing into a kthread dispenses with the need
    for the ->n_rp_need_fqs field, so this commit removes it.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney

13 Aug, 2012

1 commit

  • Bring RCU into the new-age CPU-hotplug fold by modifying RCU's per-CPU
    kthread code to use the new smp_hotplug_thread facility.

    [ tglx: Adapted it to use callbacks and to the simplified rcu yield ]
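
    A hedged sketch of how a client registers with the facility (the
    demo_* names are illustrative; struct smp_hotplug_thread and
    smpboot_register_percpu_thread() come from <linux/smpboot.h>):

        #include <linux/smpboot.h>
        #include <linux/percpu.h>
        #include <linux/init.h>

        static DEFINE_PER_CPU(struct task_struct *, demo_task);
        static DEFINE_PER_CPU(unsigned int, demo_work_pending);

        static int demo_should_run(unsigned int cpu)
        {
                return per_cpu(demo_work_pending, cpu);
        }

        static void demo_thread_fn(unsigned int cpu)
        {
                per_cpu(demo_work_pending, cpu) = 0;
                /* ... process this CPU's work ... */
        }

        static struct smp_hotplug_thread demo_threads = {
                .store             = &demo_task,
                .thread_should_run = demo_should_run,
                .thread_fn         = demo_thread_fn,
                .thread_comm       = "demo/%u",
        };

        static int __init demo_init(void)
        {
                /* Spawns one kthread per online CPU; the facility parks
                 * and unparks them automatically across hotplug events. */
                return smpboot_register_percpu_thread(&demo_threads);
        }
        early_initcall(demo_init);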

    Signed-off-by: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Srivatsa S. Bhat
    Cc: Rusty Russell
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/20120716103948.673354828@linutronix.de
    Signed-off-by: Thomas Gleixner

    Paul E. McKenney

06 Jul, 2012

1 commit

  • Although the C language allows you to break strings across lines, doing
    this makes it hard for people to find the Linux kernel code corresponding
    to a given console message. This commit therefore fixes broken strings
    throughout RCU's source code.
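
    A before-and-after illustration (the message text is hypothetical):

        /* Hard to find: grepping for the console output fails because
         * the string is split across source lines. */
        pr_err("rcu_preempt detected stall "
               "on CPU %d\n", cpu);

        /* Greppable, even though the source line runs long: */
        pr_err("rcu_preempt detected stall on CPU %d\n", cpu);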

    Suggested-by: Josh Triplett
    Suggested-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

10 May, 2012

1 commit

  • The rcu_barrier() primitive interrupts each and every CPU, registering
    a callback on every CPU. Once all of these callbacks have been invoked,
    rcu_barrier() knows that every callback that was registered before
    the call to rcu_barrier() has also been invoked.

    However, there is no point in registering a callback on a CPU that
    currently has no callbacks, most especially if that CPU is in a
    deep idle state. This commit therefore makes rcu_barrier() avoid
    interrupting CPUs that have no callbacks. Doing this requires reworking
    the handling of orphaned callbacks, otherwise callbacks could slip through
    rcu_barrier()'s net by being orphaned from a CPU that rcu_barrier() had
    not yet interrupted to a CPU that rcu_barrier() had already interrupted.
    This reworking was needed anyway to take a first step towards weaning
    RCU from the CPU_DYING notifier's use of stop_cpu().
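
    A standalone sketch of the resulting logic (illustrative, not the
    kernel's code):

        #define NR_DEMO_CPUS 8

        static int cb_count[NR_DEMO_CPUS];   /* callbacks queued per CPU */

        /* Post a barrier callback only on CPUs that actually have
         * callbacks queued; CPUs with empty lists, possibly deep in
         * idle, are left undisturbed. */
        static int post_barrier_callbacks(void (*post)(int cpu))
        {
                int cpu, posted = 0;

                for (cpu = 0; cpu < NR_DEMO_CPUS; cpu++) {
                        if (cb_count[cpu] == 0)
                                continue;
                        post(cpu);
                        posted++;
                }
                return posted;
        }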

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

22 Feb, 2012

2 commits

  • Because newly offlined CPUs continue executing after completing the
    CPU_DYING notifiers, they legitimately enter the scheduler and use
    RCU while appearing to be offline. This calls for a more sophisticated
    approach as follows:

    1. RCU marks the CPU online during the CPU_UP_PREPARE phase.

    2. RCU marks the CPU offline during the CPU_DEAD phase.

    3. Diagnostics regarding use of read-side RCU by offline CPUs use
    RCU's accounting rather than the cpu_online_map. (Note that
    __call_rcu() still uses cpu_online_map to detect illegal
    invocations within CPU_DYING notifiers.)

    4. Offline CPUs are prevented from hanging the system by
    force_quiescent_state(), which pays attention to cpu_online_map.
    Some additional work (in a later commit) will be needed to
    guarantee that force_quiescent_state() waits a full jiffy before
    assuming that a CPU is offline, for example, when called from
    idle entry. (This commit also makes the one-jiffy wait
    explicit, since the old-style implicit wait can now be defeated
    by RCU_FAST_NO_HZ and by rcutorture.)

    This approach avoids the false positives encountered when attempting to
    use more exact classification of CPU online/offline state.
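
    A hedged sketch of points 1 and 2 above, in the hotplug-notifier
    style of that era (mark_rcu_online() and mark_rcu_offline() are
    illustrative stand-ins for RCU's internal accounting):

        #include <linux/cpu.h>
        #include <linux/notifier.h>

        static int demo_cpu_notify(struct notifier_block *nb,
                                   unsigned long action, void *hcpu)
        {
                long cpu = (long)hcpu;

                switch (action) {
                case CPU_UP_PREPARE:
                        mark_rcu_online(cpu);    /* before the CPU runs */
                        break;
                case CPU_DEAD:
                        mark_rcu_offline(cpu);   /* after it is fully gone */
                        break;
                }
                return NOTIFY_OK;
        }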

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

  • When CONFIG_RCU_FAST_NO_HZ is enabled, RCU will allow a given CPU to
    enter dyntick-idle mode even if it still has RCU callbacks queued.
    RCU avoids system hangs in this case by scheduling a timer for several
    jiffies in the future. However, if all of the callbacks on that CPU
    are from kfree_rcu(), there is no reason to wake the CPU up, as it is
    not a problem to defer freeing of memory.

    This commit therefore tracks the number of callbacks on a given CPU
    that are from kfree_rcu(), and avoids scheduling the timer if all of
    a given CPU's callbacks are from kfree_rcu().
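
    A standalone sketch of the bookkeeping (the names are illustrative;
    the kernel keeps similar per-CPU counts):

        #include <stdbool.h>

        struct cb_counts {
                long qlen;        /* total callbacks queued on this CPU */
                long qlen_lazy;   /* of which, kfree_rcu() callbacks */
        };

        /* The wakeup timer is needed only if some queued callback does
         * more than just free memory. */
        static bool need_wake_timer(const struct cb_counts *c)
        {
                return c->qlen != c->qlen_lazy;
        }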

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

12 Dec, 2011

2 commits

  • Earlier versions of RCU used the scheduling-clock tick to detect idleness
    by checking for the idle task, but handled idleness differently for
    CONFIG_NO_HZ=y. But there are now a number of uses of RCU read-side
    critical sections in the idle task, for example, for tracing. A more
    fine-grained detection of idleness is therefore required.

    This commit presses the old dyntick-idle code into full-time service,
    so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
    always invoked at the beginning of an idle loop iteration. Similarly,
    rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
    at the end of an idle-loop iteration. This allows the idle task to
    use RCU everywhere except between consecutive rcu_idle_enter() and
    rcu_idle_exit() calls, in turn allowing architecture maintainers to
    specify exactly where in the idle loop that RCU may be used.

    Because some of the userspace upcall uses can result in what looks
    to RCU like half of an interrupt, it is not possible to expect that
    the irq_enter() and irq_exit() hooks will give exact counts. This
    patch therefore expands the ->dynticks_nesting counter to 64 bits
    and uses two separate bitfields to count process/idle transitions
    and interrupt entry/exit transitions. It is presumed that userspace
    upcalls do not happen in the idle loop or from usermode execution
    (though usermode might do a system call that results in an upcall).
    The counter is hard-reset on each process/idle transition, which
    avoids the interrupt entry/exit error from accumulating. Overflow
    is avoided by the 64-bitness of the ->dynticks_nesting counter.

    This commit also adds warnings if a non-idle task asks RCU to enter
    idle state (and these checks will need some adjustment before applying
    Frederic's OS-jitter patches: http://lkml.org/lkml/2011/10/7/246).
    In addition, validation of ->dynticks and ->dynticks_nesting is added.
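
    A hedged sketch of the split counter (the reset value is
    illustrative, not the kernel's exact constant):

        /* Low-order bits count irq entry/exit nesting; high-order bits
         * count process/idle nesting, so accumulated irq miscounts are
         * wiped out by the hard reset at each process/idle transition. */
        #define TASK_NEST_UNIT  0x0000000100000000ULL

        static unsigned long long dynticks_nesting;

        static void sketch_irq_enter(void)  { dynticks_nesting += 1; }
        static void sketch_irq_exit(void)   { dynticks_nesting -= 1; }

        /* Process enters/leaves idle: hard reset discards any irq error. */
        static void sketch_idle_enter(void) { dynticks_nesting = 0; }
        static void sketch_idle_exit(void)  { dynticks_nesting = TASK_NEST_UNIT; }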

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney

  • The ->signaled field was named before complications in the form of
    dyntick-idle mode and offlined CPUs. These complications have required
    that force_quiescent_state() be implemented as a state machine, instead
    of simply unconditionally sending reschedule IPIs. Therefore, this
    commit renames ->signaled to ->fqs_state to catch up with the new
    force_quiescent_state() reality.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney

29 Sep, 2011

3 commits

  • There is often a delay between the time that a CPU passes through a
    quiescent state and the time that this quiescent state is reported to the
    RCU core. It is quite possible that the grace period ended before the
    quiescent state could be reported, for example, some other CPU might have
    deduced that this CPU passed through dyntick-idle mode. It is critically
    important that the quiescent state be counted only against the grace period
    that was in effect at the time that the quiescent state was detected.

    Previously, this was handled by recording the number of the last grace
    period to complete when passing through a quiescent state. The RCU
    core then checks this number against the current value, and rejects
    the quiescent state if there is a mismatch. However, one additional
    possibility must be accounted for, namely that the quiescent state was
    recorded after the prior grace period completed but before the current
    grace period started. In this case, the RCU core must reject the
    quiescent state, but the recorded number will match. This is handled
    when the CPU becomes aware of a new grace period -- at that point,
    it invalidates any prior quiescent state.

    This works, but is a bit indirect. The new approach records the current
    grace period, and the RCU core checks to see (1) that this is still the
    current grace period and (2) that this grace period has not yet ended.
    This approach simplifies reasoning about correctness, and this commit
    changes over to this new approach.
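
    A standalone sketch of the new check (the names are illustrative):

        #include <stdbool.h>

        /* A quiescent state recorded under grace period qs_gp counts only
         * if qs_gp is still the current grace period and that grace period
         * has not already been marked completed. */
        static bool qs_still_valid(unsigned long qs_gp,
                                   unsigned long current_gp,
                                   unsigned long completed_gp)
        {
                return qs_gp == current_gp && completed_gp != current_gp;
        }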

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

  • Andi Kleen noticed that one of the RCU_BOOST data declarations was
    out of sync with the definition. Move the declarations so that the
    compiler can do the checking in the future.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

  • rcutree.c defines rcu_cpu_kthread_cpu as int, not unsigned int,
    so the extern has to follow that.
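
    This and the preceding commit guard against the same hazard; a
    minimal illustration (the file names are hypothetical):

        /* demo.h */
        extern int rcu_cpu_kthread_cpu;

        /* demo.c -- because it includes demo.h, a definition such as
         * "unsigned int rcu_cpu_kthread_cpu;" would now draw a compiler
         * error instead of silently mismatching the declaration. */
        #include "demo.h"
        int rcu_cpu_kthread_cpu;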

    Signed-off-by: Andi Kleen
    Signed-off-by: Paul E. McKenney

    Andi Kleen

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma

27 May, 2011

1 commit

  • (Note: this was reverted, and is now being re-applied in pieces, with
    this being the fifth and final piece. See below for the reason that
    it is now felt to be safe to re-apply this.)

    Commit d09b62d fixed grace-period synchronization, but left some smp_mb()
    invocations in rcu_process_callbacks() that are no longer needed; sheer
    paranoia had prevented them from being removed. This commit removes
    them and provides a proof of correctness in their absence. It also adds
    a memory barrier to rcu_report_qs_rsp() immediately before the update to
    rsp->completed in order to handle the theoretical possibility that the
    compiler or CPU might move massive quantities of code into a lock-based
    critical section. This also proves that the sheer paranoia was not
    entirely unjustified, at least from a theoretical point of view.

    In addition, the old dyntick-idle synchronization depended on the fact
    that grace periods were many milliseconds in duration, so that it could
    be assumed that no dyntick-idle CPU could reorder a memory reference
    across an entire grace period. Unfortunately for this design, the
    addition of expedited grace periods breaks this assumption, which has
    the unfortunate side-effect of requiring atomic operations in the
    functions that track dyntick-idle state for RCU. (There is some hope
    that the algorithms used in user-level RCU might be applied here, but
    some work is required to handle the NMIs that user-space applications
    can happily ignore. For the short term, better safe than sorry.)

    This proof assumes that neither compiler nor CPU will allow a lock
    acquisition and release to be reordered, as doing so can result in
    deadlock. The proof is as follows:

    1. A given CPU declares a quiescent state under the protection of
    its leaf rcu_node's lock.

    2. If there is more than one level of rcu_node hierarchy, the
    last CPU to declare a quiescent state will also acquire the
    ->lock of the next rcu_node up in the hierarchy, but only
    after releasing the lower level's lock. The acquisition of this
    lock clearly cannot occur prior to the acquisition of the leaf
    node's lock.

    3. Step 2 repeats until we reach the root rcu_node structure.
    Please note again that only one lock is held at a time through
    this process. The acquisition of the root rcu_node's ->lock
    must occur after the release of that of the leaf rcu_node.

    4. At this point, we set the ->completed field in the rcu_state
    structure in rcu_report_qs_rsp(). However, if the rcu_node
    hierarchy contains only one rcu_node, then in theory the code
    preceding the quiescent state could leak into the critical
    section. We therefore precede the update of ->completed with a
    memory barrier. All CPUs will therefore agree that any updates
    preceding any report of a quiescent state will have happened
    before the update of ->completed.

    5. Regardless of whether a new grace period is needed, rcu_start_gp()
    will propagate the new value of ->completed to all of the leaf
    rcu_node structures, under the protection of each rcu_node's ->lock.
    If a new grace period is needed immediately, this propagation
    will occur in the same critical section that ->completed was
    set in, but courtesy of the memory barrier in #4 above, is still
    seen to follow any pre-quiescent-state activity.

    6. When a given CPU invokes __rcu_process_gp_end(), it becomes
    aware of the end of the old grace period and therefore makes
    any RCU callbacks that were waiting on that grace period eligible
    for invocation.

    If this CPU is the same one that detected the end of the grace
    period, and if there is but a single rcu_node in the hierarchy,
    we will still be in the single critical section. In this case,
    the memory barrier in step #4 guarantees that all callbacks will
    be seen to execute after each CPU's quiescent state.

    On the other hand, if this is a different CPU, it will acquire
    the leaf rcu_node's ->lock, and will again be serialized after
    each CPU's quiescent state for the old grace period.

    On the strength of this proof, this commit therefore removes the memory
    barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp().
    The effect is to reduce the number of memory barriers by one and to
    reduce the frequency of execution from about once per scheduling tick
    per CPU to once per grace period.
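
    A standalone model of the barrier added in step 4 (illustrative;
    __sync_synchronize() stands in for the kernel's smp_mb()):

        struct gp_state {
                unsigned long gpnum;      /* current grace period */
                unsigned long completed;  /* last completed grace period */
        };

        /* Publish completion only after a full memory barrier, so that
         * all pre-quiescent-state accesses are seen to happen first. */
        static void report_qs_done(struct gp_state *rsp)
        {
                __sync_synchronize();
                rsp->completed = rsp->gpnum;
        }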

    This was reverted due to hangs found during testing by Yinghai Lu and
    Ingo Molnar. Frederic Weisbecker supplied Yinghai with tracing that
    located the underlying problem, and Frederic also provided the fix.

    The underlying problem was that the HARDIRQ_ENTER() macro from
    lib/locking-selftest.c invoked irq_enter(), which in turn invokes
    rcu_irq_enter(), but HARDIRQ_EXIT() invoked __irq_exit(), which
    does not invoke rcu_irq_exit(). This situation resulted in calls
    to rcu_irq_enter() that were not balanced by the required calls to
    rcu_irq_exit(). Therefore, after these locking selftests completed,
    RCU's dyntick-idle nesting count was a large number (for example,
    72), which caused RCU to conclude that the affected CPU was not in
    dyntick-idle mode when in fact it was.

    RCU would therefore incorrectly wait for this dyntick-idle CPU, resulting
    in hangs.

    In contrast, with Frederic's patch, which replaces the irq_enter()
    in HARDIRQ_ENTER() with an __irq_enter(), these tests don't ever call
    either rcu_irq_enter() or rcu_irq_exit(), which works because the CPU
    running the test is already marked as not being in dyntick-idle mode.
    The rcu_irq_enter() and rcu_irq_exit() counts therefore remain
    balanced, and RCU then has no problem working out which CPUs are in
    dyntick-idle mode and which are not.

    The reason that the imbalance was not noticed before the barrier patch
    was applied is that the old implementation of rcu_enter_nohz() ignored
    the nesting depth. This could still result in delays, but much shorter
    ones. Whenever there was a delay, RCU would IPI the CPU with the
    unbalanced nesting level, which would eventually result in rcu_enter_nohz()
    being called, which in turn would force RCU to see that the CPU was in
    dyntick-idle mode.

    The reason that very few people noticed the problem is that the mismatched
    irq_enter() vs. __irq_exit() occurred only when the kernel was built with
    CONFIG_DEBUG_LOCKING_API_SELFTESTS.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
