12 Dec, 2011

19 commits

  • The current rcu_batch_end event trace records only the name of the RCU
    flavor and the total number of callbacks that remain queued on the
    current CPU. This is insufficient for testing and tuning the new
    dyntick-idle RCU_FAST_NO_HZ code, so this commit adds idle state along
    with whether or not any of the callbacks that were ready to invoke
    at the beginning of rcu_do_batch() are still queued.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • No point in having two identical rcu_cpu_stall_suppress declarations,
    so remove the more obscure of the two.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_do_batch() function that invokes callbacks for TREE_RCU and
    TREE_PREEMPT_RCU normally throttles callback invocation to avoid degrading
    scheduling latency. However, as long as the CPU would otherwise be idle,
    there is no downside to continuing to invoke any callbacks that have passed
    through their grace periods. In fact, processing such callbacks in a
    timely manner has the benefit of increasing the probability that the
    CPU can enter the power-saving dyntick-idle mode.

    Therefore, this commit allows callback invocation to continue beyond the
    preset limit as long as the scheduler does not have some other task to
    run and as long as context is that of the idle task or the relevant
    RCU kthread.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
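
    The rule described in this commit can be sketched in user-space C. This is a minimal model, not the kernel's code: struct cpu_model and do_batch() are hypothetical names introduced only for illustration. Invocation stops at the preset limit unless the CPU would otherwise be idle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one CPU's state; none of these names are kernel APIs. */
struct cpu_model {
    size_t ready_cbs;       /* callbacks whose grace period has ended */
    size_t blimit;          /* preset per-batch invocation limit */
    bool other_task_ready;  /* the scheduler has something else to run */
    bool idle_or_kthread;   /* running as the idle task or an RCU kthread */
};

/* Invoke ready callbacks; return how many were invoked in this batch. */
static size_t do_batch(struct cpu_model *c)
{
    size_t invoked = 0;

    assert(c != NULL);
    while (c->ready_cbs > 0) {
        c->ready_cbs--;     /* stands in for calling the callback function */
        invoked++;
        /* Past the limit, keep going only if the CPU would otherwise idle. */
        if (invoked >= c->blimit &&
            (c->other_task_ready || !c->idle_or_kthread))
            break;
    }
    return invoked;
}
```

    With 100 ready callbacks and a limit of 10, this model drains all 100 when running as the idle task with nothing else runnable, and stops after 10 otherwise.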
     
  • Because tasks don't nest, the ->dynticks_nesting must always be zero upon
    entry to rcu_idle_enter_common(). Therefore, pass "0" rather than the
    counter itself.

    Signed-off-by: Frederic Weisbecker
    Cc: Josh Triplett
    Signed-off-by: Paul E. McKenney

    Frederic Weisbecker
     
  • Because tasks do not nest, rcu_idle_enter() and rcu_idle_exit() do
    not need to check for nesting. This commit therefore moves nesting
    checks from rcu_idle_enter_common() to rcu_irq_exit() and from
    rcu_idle_exit_common() to rcu_irq_enter().

    Signed-off-by: Frederic Weisbecker
    Cc: Josh Triplett
    Signed-off-by: Paul E. McKenney

    Frederic Weisbecker
     
  • The current implementation of RCU_FAST_NO_HZ prevents CPUs from entering
    dyntick-idle state if they have RCU callbacks pending. Unfortunately,
    this has the side-effect of often preventing them from entering this
    state, especially if at least one other CPU is not in dyntick-idle state.
    However, the resulting per-tick wakeup is wasteful in many cases: if the
    CPU has already fully responded to the current RCU grace period, there
    will be nothing for it to do until this grace period ends, which will
    frequently take several jiffies.

    This commit therefore permits a CPU that has done everything that the
    current grace period has asked of it (rcu_pending() == 0) to enter
    dyntick-idle state even if it still has RCU callbacks pending. However,
    such a CPU posts a timer to
    wake it up several jiffies later (6 jiffies, based on experience with
    grace-period lengths). This wakeup is required to handle situations
    that can result in all CPUs being in dyntick-idle mode, thus failing
    to ever complete the current grace period. If a CPU wakes up before
    the timer goes off, then it cancels that timer, thus avoiding spurious
    wakeups.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
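
    The idle-entry decision and the wakeup timer from this commit can be modeled as follows. This is a sketch under stated assumptions: struct idle_model, try_enter_dyntick_idle(), and exit_dyntick_idle() are hypothetical names, and only the 6-jiffy delay comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

#define IDLE_GP_DELAY 6  /* jiffies; value taken from the commit message */

/* Hypothetical per-CPU model; field names are illustrative only. */
struct idle_model {
    bool rcu_pending;            /* does RCU need anything from this CPU now? */
    unsigned int callbacks;      /* callbacks still waiting for a grace period */
    bool timer_armed;
    unsigned long timer_expires; /* in jiffies */
};

/* Return true if the CPU may enter dyntick-idle at time "now" (in jiffies). */
static bool try_enter_dyntick_idle(struct idle_model *c, unsigned long now)
{
    if (c->rcu_pending)
        return false;            /* RCU still needs this CPU */
    if (c->callbacks > 0) {
        /* Nothing to do until the grace period ends: sleep, but set a wakeup. */
        c->timer_armed = true;
        c->timer_expires = now + IDLE_GP_DELAY;
    }
    return true;
}

/* Called when the CPU leaves idle for any reason before the timer fires. */
static void exit_dyntick_idle(struct idle_model *c)
{
    c->timer_armed = false;      /* cancel to avoid a spurious wakeup */
}
```

    A CPU with no callbacks enters idle without arming the timer, so only CPUs that are actually waiting on a grace period pay for the extra wakeup.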
     
  • Fixes and workarounds for a number of issues (for example, that in
    df4012edc) make it safe to once again detect dyntick-idle CPUs on the
    first pass of force_quiescent_state(), so this commit makes that change.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Assertions in rcu_init_percpu_data() unknowingly relied on outgoing
    CPUs being turned off before reaching the idle loop. Unfortunately,
    when running under kvm/qemu on x86, CPUs really can get to idle before
    being shut off. These CPUs are then born in dyntick-idle mode from an
    RCU perspective, which results in splats in rcu_init_percpu_data() and
    in RCU wrongly ignoring those CPUs despite them being active. This in
    turn can cause RCU to end grace periods prematurely, potentially freeing
    up memory that the newly onlined CPUs were still using. This is most
    decidedly not what we need to see in an RCU implementation.

    This commit therefore replaces the assertions in rcu_init_percpu_data()
    with code that forces RCU's dyntick-idle view of newly onlined CPUs to
    match reality.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • With the new implementation of RCU_FAST_NO_HZ, it was possible to hang
    RCU grace periods as follows:

    o CPU 0 attempts to go idle, cycles several times through the
    rcu_prepare_for_idle() loop, then goes dyntick-idle when
    RCU needs nothing more from it, while still having at least
    one RCU callback pending.

    o CPU 1 goes idle with no callbacks.

    Both CPUs can then stay in dyntick-idle mode indefinitely, preventing
    the RCU grace period from ever completing, possibly hanging the system.

    This commit therefore prevents CPUs that have RCU callbacks from entering
    dyntick-idle mode. This approach also eliminates the need for the
    end-of-grace-period IPIs used previously.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Change from direct comparison of ->pid with zero to is_idle_task().

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • RCU has traditionally relied on idle_cpu() to determine whether a given
    CPU is running in the context of an idle task, but commit 908a3283
    (Fix idle_cpu()) has invalidated this approach. After commit 908a3283,
    idle_cpu() will return true if the current CPU is currently running the
    idle task, and will be doing so for the foreseeable future. RCU instead
    needs to know whether or not the current CPU is currently running the
    idle task, regardless of what the near future might bring.

    This commit therefore switches from idle_cpu() to "current->pid != 0".

    Reported-by: Wu Fengguang
    Suggested-by: Carsten Emde
    Signed-off-by: Paul E. McKenney
    Acked-by: Steven Rostedt
    Tested-by: Wu Fengguang
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
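
    The difference between the two questions can be illustrated with a small model. This is a sketch, not the scheduler's implementation: struct rq_model and both functions are hypothetical, and the real idle_cpu() may check more than the two fields shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical runqueue model; pid 0 stands for the idle task. */
struct rq_model {
    int curr_pid;             /* task currently running on this CPU */
    unsigned int nr_running;  /* runnable tasks queued on this CPU */
};

/* Roughly the scheduler's question after the fix: idle now, and staying idle? */
static bool idle_cpu_model(const struct rq_model *rq)
{
    return rq->curr_pid == 0 && rq->nr_running == 0;
}

/* RCU's question: is the idle task running right now? */
static bool running_idle_task(const struct rq_model *rq)
{
    return rq->curr_pid == 0;
}
```

    When the idle task is running but a wakeup has already queued another task, the two functions disagree, which is the case that invalidated RCU's earlier use of idle_cpu().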
     
  • Currently, RCU does not permit a CPU to enter dyntick-idle mode if that
    CPU has any RCU callbacks queued. This means that workloads for which
    each CPU wakes up and does some RCU updates every few ticks will never
    enter dyntick-idle mode. This can result in significant unnecessary power
    consumption, so this patch permits a given CPU to enter dyntick-idle mode
    if it has callbacks, but only if that same CPU has completed all current
    work for the RCU core. We use rcu_pending() to determine
    whether a given CPU has completed all current work for the RCU core.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The current code just complains if the current task is not the idle task.
    This commit therefore adds printing of the identity of the idle task.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The trace_rcu_dyntick() trace event did not print both the old and
    the new value of the nesting level, and furthermore printed only
    the low-order 32 bits of it. This could result in some confusion
    when interpreting trace-event dumps, so this commit prints both
    the old and the new value, prints the full 64 bits, and also selects
    the process-entry/exit increment to print nicely in hexadecimal.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Report that none of the rcu read lock maps are held while in an RCU
    extended quiescent state (the section between rcu_idle_enter()
    and rcu_idle_exit()). This helps detect any use of rcu_dereference()
    and friends from within the section in idle where RCU is not allowed.

    This way we can guarantee an extended quiescent window where the CPU
    can be put in dyntick idle mode or can simply avoid being part of any
    global grace period completion while in the idle loop.

    Uses of RCU from such mode are totally ignored by RCU, hence the
    importance of these checks.

    Signed-off-by: Frederic Weisbecker
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Lai Jiangshan
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • When setting up an expedited grace period, if there were no readers, the
    task will awaken itself. This commit removes this useless self-awakening.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Paul E. McKenney

    Thomas Gleixner
     
  • Because rcu_is_cpu_idle() is to be used to check for extended quiescent
    states in RCU-preempt read-side critical sections, it cannot assume that
    preemption is disabled. And preemption must be disabled when accessing
    the dyntick-idle state, because otherwise the following sequence of events
    could occur:

    1. Task A on CPU 1 enters rcu_is_cpu_idle() and picks up the pointer
    to CPU 1's per-CPU variables.

    2. Task B preempts Task A and starts running on CPU 1.

    3. Task A migrates to CPU 2.

    4. Task B blocks, leaving CPU 1 idle.

    5. Task A continues execution on CPU 2, accessing CPU 1's dyntick-idle
    information using the pointer fetched in step 1 above, and finds
    that CPU 1 is idle.

    6. Task A therefore incorrectly concludes that it is executing in
    an extended quiescent state, possibly issuing a spurious splat.

    Therefore, this commit disables preemption within the rcu_is_cpu_idle()
    function.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
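
    The six-step sequence in this commit can be replayed in a single-threaded model. This is an illustrative sketch: the array, struct task_model, and all three functions are hypothetical, and preemption is simulated by an explicit call placed between the pointer fetch and the read.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 3

static bool cpu_is_idle[NR_CPUS_MODEL];   /* per-CPU dyntick-idle state */

struct task_model {
    int cpu;                              /* CPU the task is running on */
};

/* Models steps 2-4: the task is preempted, migrates, and its old CPU idles. */
static void preempt_and_migrate(struct task_model *t, int new_cpu)
{
    assert(new_cpu >= 0 && new_cpu < NR_CPUS_MODEL);
    cpu_is_idle[t->cpu] = true;
    t->cpu = new_cpu;
}

/* Racy version: the per-CPU pointer is fetched before the preemption point. */
static bool is_cpu_idle_racy(struct task_model *t, int migrate_to)
{
    const bool *state = &cpu_is_idle[t->cpu];   /* step 1 */

    preempt_and_migrate(t, migrate_to);         /* steps 2-4 */
    return *state;                              /* step 5: old CPU's state */
}

/* Fixed version: with preemption disabled, no migration can intervene. */
static bool is_cpu_idle_pinned(const struct task_model *t)
{
    /* preempt_disable() would go here in the kernel */
    bool idle = cpu_is_idle[t->cpu];
    /* preempt_enable() would go here in the kernel */

    return idle;
}
```

    Starting with Task A on CPU 1 and all CPUs busy, the racy version reports "idle" after A has migrated to CPU 2, while the pinned version reports the state of the CPU that A is actually running on.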
     
  • Earlier versions of RCU used the scheduling-clock tick to detect idleness
    by checking for the idle task, but handled idleness differently for
    CONFIG_NO_HZ=y. But there are now a number of uses of RCU read-side
    critical sections in the idle task, for example, for tracing. A more
    fine-grained detection of idleness is therefore required.

    This commit presses the old dyntick-idle code into full-time service,
    so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
    always invoked at the beginning of an idle loop iteration. Similarly,
    rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
    at the end of an idle-loop iteration. This allows the idle task to
    use RCU everywhere except between consecutive rcu_idle_enter() and
    rcu_idle_exit() calls, in turn allowing architecture maintainers to
    specify exactly where in the idle loop RCU may be used.

    Because some of the userspace upcall uses can result in what looks
    to RCU like half of an interrupt, it is not possible to expect that
    the irq_enter() and irq_exit() hooks will give exact counts. This
    patch therefore expands the ->dynticks_nesting counter to 64 bits
    and uses two separate bitfields to count process/idle transitions
    and interrupt entry/exit transitions. It is presumed that userspace
    upcalls do not happen in the idle loop or from usermode execution
    (though usermode might do a system call that results in an upcall).
    The counter is hard-reset on each process/idle transition, which
    prevents the interrupt entry/exit error from accumulating. Overflow
    is avoided by the 64-bitness of the ->dynticks_nesting counter.

    This commit also adds warnings if a non-idle task asks RCU to enter
    idle state (and these checks will need some adjustment before applying
    Frederic's OS-jitter patches (http://lkml.org/lkml/2011/10/7/246)).
    In addition, validation of ->dynticks and ->dynticks_nesting is added.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
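
    The hard-reset behavior of the nesting counter can be sketched as below. This is a model under assumptions: TASK_NESTING_MODEL and the model_* functions are hypothetical, and the specific "task is running" value is chosen here only so that interrupt counts stay far below it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "task is running" value: large, so irq counts fit far below it. */
#define TASK_NESTING_MODEL (INT64_MAX / 2 - 1)

struct dynticks_model {
    int64_t nesting;   /* 0 means idle from RCU's point of view */
};

/* Process/idle transitions hard-reset the counter, discarding irq miscounts. */
static void model_idle_enter(struct dynticks_model *d)
{
    d->nesting = 0;
}

static void model_idle_exit(struct dynticks_model *d)
{
    d->nesting = TASK_NESTING_MODEL;
}

/* Interrupt entry/exit only adjust the low-order part of the counter. */
static void model_irq_enter(struct dynticks_model *d)
{
    d->nesting++;
}

static void model_irq_exit(struct dynticks_model *d)
{
    d->nesting--;
}

static bool model_is_idle(const struct dynticks_model *d)
{
    return d->nesting == 0;
}
```

    An unmatched model_irq_enter() while a task is running (the "half of an interrupt" case) leaves the counter off by one, but the next model_idle_enter() discards that error instead of carrying it forward.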
     
  • The ->signaled field was named before complications in the form of
    dyntick-idle mode and offlined CPUs. These complications have required
    that force_quiescent_state() be implemented as a state machine, instead
    of simply unconditionally sending reschedule IPIs. Therefore, this
    commit renames ->signaled to ->fqs_state to catch up with the new
    force_quiescent_state() reality.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

29 Sep, 2011

18 commits

  • It is possible for the CPU that noted the end of the prior grace period
    to not need a new one, and therefore to decide to propagate ->completed
    throughout the rcu_node tree without starting another grace period.
    However, in so doing, it releases the root rcu_node structure's lock,
    which can allow some other CPU to start another grace period. The first
    CPU will be propagating ->completed in parallel with the second CPU
    initializing the rcu_node tree for the new grace period. In theory
    this is harmless, but in practice we need to keep things simple.

    This commit therefore moves the propagation of ->completed to
    rcu_report_qs_rsp(), and refrains from marking the old grace period
    as having been completed until it has finished doing this. This
    prevents anyone from starting a new grace period concurrently with
    marking the old grace period as having been completed.

    Of course, the optimization where a CPU needing a new grace period
    doesn't bother marking the old one completed is still in effect:
    In that case, the marking happens implicitly as part of initializing
    the new grace period.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The purpose of rcu_needs_cpu_flush() was to iterate on pushing the
    current grace period in order to help the current CPU enter dyntick-idle
    mode. However, this can result in failures if the CPU starts entering
    dyntick-idle mode, but then backs out. In this case, the call to
    rcu_pending() from rcu_needs_cpu_flush() might end up announcing a
    non-existing quiescent state.

    This commit therefore removes rcu_needs_cpu_flush() in favor of letting
    the dyntick-idle machinery at the end of the softirq handler push the
    loop along via its call to rcu_pending().

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • RCU boost threads start life at RCU_BOOST_PRIO, while others remain
    at RCU_KTHREAD_PRIO. While here, change thread names to match other
    kthreads, and adjust rcu_yield() to not override the priority set by
    the user. This last change sets the stage for runtime changes to
    priority in the -rt tree.

    Signed-off-by: Mike Galbraith
    Signed-off-by: Paul E. McKenney

    Mike Galbraith
     
  • CPUs set rdp->qs_pending when coming online to resolve races with
    grace-period start. However, this means that if RCU is idle, the
    just-onlined CPU might needlessly send itself resched IPIs. Adjust
    the online-CPU initialization to avoid this, and also to correctly
    cause the CPU to respond to the current grace period if needed.

    Signed-off-by: Paul E. McKenney
    Tested-by: Josh Boyer
    Tested-by: Christian Hoffmann

    Paul E. McKenney
     
  • It is possible for an RCU CPU stall to end just as it is detected, in
    which case the current code will uselessly dump all CPUs' stacks.
    This commit therefore checks for this condition and refrains from
    sending needless NMIs.

    And yes, the stall might also end just after we checked all CPUs and
    tasks, but in that case we would at least have given some clue as
    to which CPU/task was at fault.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Greater use of RCU during early boot (before the scheduler is operating)
    is causing RCU to attempt to start grace periods during that time, which
    in turn is resulting in both RCU and the callback functions attempting
    to use the scheduler before it is ready.

    This commit prevents these problems by prohibiting RCU grace periods
    until after the scheduler has spawned the first non-idle task.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • There isn't a whole lot of point in poking the scheduler before there
    are other tasks to switch to. This commit therefore adds a check
    for rcu_scheduler_fully_active in __rcu_pending() to suppress any
    pre-scheduler calls to set_need_resched(). The downside of this approach
    is additional runtime overhead in a reasonably hot code path.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The trigger_all_cpu_backtrace() function is a no-op in architectures that
    do not define arch_trigger_all_cpu_backtrace. On such architectures, RCU
    CPU stall warning messages contain no stack trace information, which makes
    debugging quite difficult. This commit therefore substitutes dump_stack()
    for architectures that do not define arch_trigger_all_cpu_backtrace,
    so that at least the local CPU's stack is dumped as part of the RCU CPU
    stall warning message.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
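
    The compile-time fallback described in this commit follows a common preprocessor pattern, sketched below. The names dump_stack_model() and stall_dump() are hypothetical stand-ins, and the sketch deliberately leaves arch_trigger_all_cpu_backtrace undefined so that the fallback path is the one compiled.

```c
#include <assert.h>
#include <stdio.h>

enum dump_kind { DUMP_ALL_CPUS, DUMP_LOCAL_CPU };

/* Stand-in for dump_stack(): report only the calling CPU. */
static void dump_stack_model(void)
{
    fprintf(stderr, "stack of local CPU\n");
}

/*
 * An architecture that supports all-CPU backtraces would define
 * arch_trigger_all_cpu_backtrace; this sketch leaves it undefined.
 */
static enum dump_kind stall_dump(void)
{
#ifdef arch_trigger_all_cpu_backtrace
    arch_trigger_all_cpu_backtrace();
    return DUMP_ALL_CPUS;
#else
    dump_stack_model();
    return DUMP_LOCAL_CPU;
#endif
}
```

    Because the choice is made by the preprocessor, architectures that do provide the hook pay nothing for the fallback.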
     
  • When the ->dynticks field in the rcu_dynticks structure changed to an
    atomic_t, its size on 64-bit systems changed from 64 bits to 32 bits.
    The local variables in rcu_implicit_dynticks_qs() need to change as
    well, hence this commit.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
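
    One way a width mismatch of this kind can cause trouble is shown below. This is an illustration of the general hazard, not the kernel's code: both function names are hypothetical. A wraparound-tolerant comparison gives the right answer only when the arithmetic is done at the counter's own width.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Has a 32-bit counter advanced by at least 2 since "snap", allowing wrap? */
static bool advanced_by_two_32(uint32_t curr, uint32_t snap)
{
    return (uint32_t)(curr - (uint32_t)(snap + 2)) <= UINT32_MAX / 2;
}

/* Same test done in 64-bit locals: wrong once the 32-bit counter wraps. */
static bool advanced_by_two_64(uint64_t curr, uint64_t snap)
{
    return curr - (snap + 2) <= UINT64_MAX / 2;
}
```

    With a snapshot of 0xFFFFFFFF and a current value of 1 (the counter wrapped while advancing by two), the 32-bit version reports progress and the 64-bit version does not.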
     
  • The in_irq() check in rcu_enter_nohz() is redundant because if we really
    are in an interrupt, the attempt to re-enter dyntick-idle mode will invoke
    rcu_needs_cpu() in any case, which will force the check for RCU callbacks.
    So this commit removes the check along with the set_need_resched().

    Suggested-by: Frederic Weisbecker
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • There is often a delay between the time that a CPU passes through a
    quiescent state and the time that this quiescent state is reported to the
    RCU core. It is quite possible that the grace period ended before the
    quiescent state could be reported, for example, some other CPU might have
    deduced that this CPU passed through dyntick-idle mode. It is critically
    important that a quiescent state be counted only against the grace period
    that was in effect at the time that the quiescent state was detected.

    Previously, this was handled by recording the number of the last grace
    period to complete when passing through a quiescent state. The RCU
    core then checks this number against the current value, and rejects
    the quiescent state if there is a mismatch. However, one additional
    possibility must be accounted for, namely that the quiescent state was
    recorded after the prior grace period completed but before the current
    grace period started. In this case, the RCU core must reject the
    quiescent state, but the recorded number will match. This is handled
    when the CPU becomes aware of a new grace period -- at that point,
    it invalidates any prior quiescent state.

    This works, but is a bit indirect. The new approach records the current
    grace period, and the RCU core checks to see (1) that this is still the
    current grace period and (2) that this grace period has not yet ended.
    This approach simplifies reasoning about correctness, and this commit
    changes over to this new approach.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
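
    The two-part check of the new approach can be sketched as follows. This is a minimal model: the struct and function names are hypothetical, and it assumes the convention that a grace period is in progress exactly when the "completed" number differs from the current grace-period number.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical global state: a grace period is in progress iff these differ. */
struct gp_state_model {
    unsigned long gpnum;      /* number of the current grace period */
    unsigned long completed;  /* number of the last completed grace period */
};

/* Hypothetical per-CPU record of the most recent quiescent state. */
struct cpu_qs_model {
    bool passed_quiesce;
    unsigned long passed_quiesce_gpnum;
};

/* Record a quiescent state against the grace period current at that time. */
static void note_quiescent_state(struct cpu_qs_model *cpu,
                                 const struct gp_state_model *gp)
{
    cpu->passed_quiesce = true;
    cpu->passed_quiesce_gpnum = gp->gpnum;
}

/* The RCU core accepts the quiescent state only if both checks pass. */
static bool quiescent_state_counts(const struct cpu_qs_model *cpu,
                                   const struct gp_state_model *gp)
{
    return cpu->passed_quiesce &&
           cpu->passed_quiesce_gpnum == gp->gpnum &&  /* (1) still current */
           gp->completed != gp->gpnum;                /* (2) not yet ended */
}
```

    A quiescent state recorded between grace periods carries the old grace-period number, so it fails check (1) once a new grace period starts, without any separate invalidation step.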
     
  • Add trace events to record grace-period start and end, quiescent states,
    CPUs noticing grace-period start and end, grace-period initialization,
    call_rcu() invocation, tasks blocking in RCU read-side critical sections,
    tasks exiting those same critical sections, force_quiescent_state()
    detection of dyntick-idle and offline CPUs, CPUs entering and leaving
    dyntick-idle mode (except from NMIs), CPUs coming online and going
    offline, and CPUs being kicked for staying in dyntick-idle mode for too
    long (as in many weeks, even on 32-bit systems).

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    rcu: Add the rcu flavor to callback trace events

    The earlier trace events for registering RCU callbacks and for invoking
    them did not include the RCU flavor (rcu_bh, rcu_preempt, or rcu_sched).
    This commit adds the RCU flavor to those trace events.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • We now have kthreads only for flavors of RCU that support boosting,
    so update the now-misleading comments accordingly.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Add a string to the rcu_batch_start() and rcu_batch_end() trace
    messages that indicates the RCU type ("rcu_sched", "rcu_bh", or
    "rcu_preempt"). The trace messages for the actual invocations
    themselves are not marked, as it should be clear from the
    rcu_batch_start() and rcu_batch_end() events before and after.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • In order to allow event tracing to distinguish between flavors of
    RCU, we need those names in the relevant RCU data structures. TINY_RCU
    has avoided them for memory-footprint reasons, so add them only if
    CONFIG_RCU_TRACE=y.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit adds the trace_rcu_utilization() marker that is to be
    used to allow postprocessing scripts to compute RCU's CPU utilization,
    give or take event-trace overhead. Note that we do not include RCU's
    dyntick-idle interface because event tracing requires RCU protection,
    which is not available in dyntick-idle mode.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • There was recently some controversy about the overhead of invoking RCU
    callbacks. Add TRACE_EVENT()s to obtain fine-grained timings for the
    start and stop of a batch of callbacks and also for each callback invoked.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Pull the code that waits for an RCU grace period into a single function,
    which is then called by synchronize_rcu() and friends in the case of
    TREE_RCU and TREE_PREEMPT_RCU, and from rcu_barrier() and friends in
    the case of TINY_RCU and TINY_PREEMPT_RCU.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
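
    The shape of such a shared wait routine can be sketched as a function that takes the flavor's callback-registration function as a parameter. Everything here is hypothetical and simplified: the type and function names are invented for illustration, and the toy flavor pretends that the grace period ends immediately, so no blocking is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct head_model {
    void (*func)(struct head_model *head);
};

typedef void (*call_rcu_func_t)(struct head_model *head,
                                void (*func)(struct head_model *head));

/* On-stack waiter: the callback marks it done once the grace period ends. */
struct waiter_model {
    struct head_model head;   /* must stay first for the cast below */
    bool done;
};

static void wake_waiter(struct head_model *head)
{
    ((struct waiter_model *)head)->done = true;
}

/* Toy flavor: pretends the grace period ends immediately. */
static int toy_flavor_calls;

static void toy_call_rcu(struct head_model *head,
                         void (*func)(struct head_model *head))
{
    toy_flavor_calls++;
    head->func = func;
    head->func(head);
}

/* Common wait routine shared by every flavor's synchronous API. */
static bool wait_gp_model(call_rcu_func_t crf)
{
    struct waiter_model w = { .head = { NULL }, .done = false };

    crf(&w.head, wake_waiter);
    /* A real implementation would block here until the callback runs. */
    return w.done;
}
```

    Each flavor then reduces to a one-line wrapper that passes its own registration function, for example wait_gp_model(toy_call_rcu) in this model.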
     

13 Jul, 2011

1 commit

  • Under some rare but real combinations of configuration parameters, RCU
    callbacks are posted during early boot that use kernel facilities that
    are not yet initialized. Therefore, when these callbacks are invoked,
    hard hangs and crashes ensue. This commit therefore prevents RCU
    callbacks from being invoked until after the scheduler is fully up and
    running, as in after multiple tasks have been spawned.

    It might well turn out that a better approach is to identify the specific
    RCU callbacks that are causing this problem, but that discussion will
    wait until such time as someone really needs an RCU callback to be invoked
    (as opposed to merely registered) during early boot.

    Reported-by: julie Sullivan
    Reported-by: RKK
    Signed-off-by: Paul E. McKenney
    Tested-by: Konrad Rzeszutek Wilk
    Tested-by: julie Sullivan
    Tested-by: RKK

    Paul E. McKenney
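
    The distinction between registering and invoking callbacks during early boot can be modeled as below. This is a sketch with hypothetical names throughout (the flag, the queue, and the sample callback count_cb() are not kernel identifiers): registration always succeeds while there is room, but invocation is a no-op until the scheduler is marked fully active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CBS 8

typedef void (*cb_func_t)(void);

static cb_func_t pending[MAX_CBS];
static size_t nr_pending;
static bool scheduler_fully_active;   /* set once multiple tasks exist */

/* Sample callback used to observe whether invocation happened. */
static int cb_runs;

static void count_cb(void)
{
    cb_runs++;
}

/* Registration is allowed at any time, even during early boot. */
static bool register_callback(cb_func_t f)
{
    if (nr_pending == MAX_CBS)
        return false;
    pending[nr_pending++] = f;
    return true;
}

/* Invocation is deferred until the scheduler is up; returns number invoked. */
static size_t invoke_callbacks(void)
{
    size_t i, n;

    if (!scheduler_fully_active)
        return 0;
    n = nr_pending;
    for (i = 0; i < n; i++)
        pending[i]();
    nr_pending = 0;
    return n;
}
```

    Callbacks registered early simply stay queued and run on the first invocation pass after the flag is set.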
     

17 Jun, 2011

1 commit

  • The commit "use softirq instead of kthreads except when RCU_BOOST=y"
    just applied #ifdef in place. This commit is a cleanup that moves
    the newly #ifdef'ed code to the header file kernel/rcutree_plugin.h.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney