06 Jul, 2012

2 commits

  • The Linux kernel coding style says that single-statement blocks should
    omit curly braces unless the other leg of the "if" statement has
    multiple statements, in which case the curly braces should be included.
    This commit fixes RCU's violations of this rule.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • …a' and 'fnh.2012.07.02a' into HEAD

    bigrtm: First steps towards getting RCU out of the way of
    tens-of-microseconds real-time response on systems compiled
    with NR_CPUS=4096. Also cleanups for and increased concurrency
    of rcu_barrier() family of primitives.
    doctorture: rcutorture and documentation improvements.
    fixes: Miscellaneous fixes.
    fnh: RCU_FAST_NO_HZ fixes and improvements.

    Paul E. McKenney
     

03 Jul, 2012

11 commits

  • If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to
    also disable itself. This commit therefore checks for tick_nohz_enabled
    being zero, disabling rcu_prepare_for_idle() if so. This commit assumes
    that tick_nohz_enabled can change at runtime; if this is not the case,
    then a simpler approach suffices.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, if several CPUs in the same package have only lazy RCU
    callbacks, their wakeups will be uncorrelated. If all the CPUs are in the
    same power domain (as is often the case), this will result in unnecessary
    power-ups of the package. This commit therefore uses round_jiffies()
    to round the timeouts to a second boundary, increasing the odds that
    they can be coalesced with each other or with other timeouts.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • An uninitialized string may be displayed at the end of the rcu_preempt
    detected stall info such as

    0: (1 GPs behind) idle=075/140000000000000/0 =8?^D=8?^D
    ^^^^^^^^^^
    if CONFIG_RCU_FAST_NO_HZ is not defined.

    This trivial patch clears the string in this case.

    Signed-off-by: Carsten Emde
    Signed-off-by: Paul E. McKenney

    Carsten Emde
     
  • The CONFIG_TREE_PREEMPT_RCU and CONFIG_TINY_PREEMPT_RCU versions of
    __rcu_read_lock() and __rcu_read_unlock() are identical, so this commit
    consolidates them into kernel/rcupdate.h.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The arrival of TREE_PREEMPT_RCU some years back included some ugly
    code involving either #ifdef or #ifdef'ed wrapper functions to iterate
    over all non-SRCU flavors of RCU. This commit therefore introduces
    a for_each_rcu_flavor() iterator over the rcu_state structures for each
    flavor of RCU to clean up a bit of the ugliness.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • With the advent of __this_cpu_ptr(), it is no longer necessary to pass
    both the rcu_state and rcu_data structures into __rcu_process_callbacks().
    This commit therefore computes the rcu_data pointer from the rcu_state
    pointer within __rcu_process_callbacks() so that callers can pass in
    only the pointer to the rcu_state structure. This paves the way for
    linking the rcu_state structures together and iterating over them.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This is a preparatory commit for increasing rcu_barrier()'s concurrency.
    It adds a pointer in the rcu_data structure to the corresponding call_rcu()
    function. This allows a pointer to the rcu_data structure to imply the
    function pointer, which allows _rcu_barrier() state to be placed in the
    rcu_state structure.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The rcu_node tree array is sized based on compile-time constants,
    including NR_CPUS. Although this approach has worked well in the past,
    the recent trend by many distros to define NR_CPUS=4096 results in
    excessive grace-period-initialization latencies.

    This commit therefore substitutes the run-time computed nr_cpu_ids for
    the compile-time NR_CPUS when building the tree. This can result in
    much of the compile-time-allocated rcu_node array being unused. If
    this is a major problem, you are in a specialized situation anyway,
    so you can manually adjust the NR_CPUS, RCU_FANOUT, and RCU_FANOUT_LEAF
    kernel config parameters.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Time to make the four-level-hierarchy setting less scary, so this
    commit removes "Experimental" from the boot-time message. The message
    itself is left in place in order to provide a heads-up on any possible
    need to expand to a five-level hierarchy.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Although making RCU_FANOUT_LEAF a kernel configuration parameter rather
    than a fixed constant makes it easier for people to decrease cache-miss
    overhead for large systems, it is of little help for people who must
    run a single pre-built kernel binary.

    This commit therefore allows the value of RCU_FANOUT_LEAF to be
    increased (but not decreased!) via a boot-time parameter named
    rcutree.rcu_fanout_leaf.

    Reported-by: Mike Galbraith
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This reverts commit 616c310e83b872024271c915c1b9ab505b9efad9.
    (Move PREEMPT_RCU preemption to switch_to() invocation).
    Testing by Sasha Levin showed that this
    can result in deadlock due to invoking the scheduler when one of
    the runqueue locks is held. Because this commit was simply a
    performance optimization, revert it.

    Reported-by: Sasha Levin
    Signed-off-by: Paul E. McKenney
    Tested-by: Sasha Levin

    Paul E. McKenney
     

07 Jun, 2012

3 commits

  • When a CPU is entering dyntick-idle mode, tick_nohz_stop_sched_tick()
    calls rcu_needs_cpu() to see if RCU needs that CPU and, if not, computes the
    next wakeup time based on the timer wheels. Only later, when actually
    entering the idle loop, rcu_prepare_for_idle() will be invoked. In some
    cases, rcu_prepare_for_idle() will post timers to wake the CPU back up.
    But all for naught: The next wakeup time for the CPU has already been
    computed, and posting a timer afterwards does not force that wakeup
    time to be recomputed. This means that rcu_prepare_for_idle()'s timers
    have no effect.

    This is not a problem on a busy system because something else will wake
    up the CPU soon enough. However, on lightly loaded systems, the CPU
    might stay asleep for a considerable length of time. If that CPU has
    a callback that the rest of the system is waiting on, the system might
    run very slowly or (in theory) even hang.

    This commit avoids this problem by having rcu_needs_cpu() give
    tick_nohz_stop_sched_tick() an estimate of when RCU will need the CPU
    to wake back up, which tick_nohz_stop_sched_tick() takes into account
    when programming the CPU's wakeup time. An alternative approach is
    for rcu_prepare_for_idle() to use hrtimers instead of normal timers,
    but timers are much more efficient than are hrtimers for frequently
    and repeatedly posting and cancelling a given timer, which is exactly
    what RCU_FAST_NO_HZ does.

    Reported-by: Pascal Chapperon
    Reported-by: Heiko Carstens
    Signed-off-by: Paul E. McKenney
    Tested-by: Heiko Carstens
    Tested-by: Pascal Chapperon

    Paul E. McKenney
     
  • The RCU_FAST_NO_HZ code relies on a number of per-CPU variables.
    This works, but is hidden from someone scanning the data structures
    in rcutree.h. This commit therefore converts these per-CPU variables
    to fields in the per-CPU rcu_dynticks structures.

    Suggested-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    Tested-by: Heiko Carstens
    Tested-by: Pascal Chapperon

    Paul E. McKenney
     
  • In the current code, a short dyntick-idle interval (where there is
    at least one non-lazy callback on the CPU) and a long dyntick-idle
    interval (where there are only lazy callbacks on the CPU) are traced
    identically, which can be less than helpful. This commit therefore
    emits different event traces in these two cases.

    Signed-off-by: Paul E. McKenney
    Tested-by: Heiko Carstens
    Tested-by: Pascal Chapperon

    Paul E. McKenney
     

12 May, 2012

1 commit

  • …and 'srcu.2012.05.07b' into HEAD

    barrier: Reduce the amount of disturbance by rcu_barrier() to the rest of
    the system. This branch also includes improvements to
    RCU_FAST_NO_HZ, which are included here due to conflicts.
    fixes: Miscellaneous fixes.
    inline: Remaining changes from an abortive attempt to inline
    preemptible RCU's __rcu_read_lock(). These are (1) making
    exit_rcu() avoid unnecessary work and (2) avoiding having
    preemptible RCU record a blocked thread when the scheduler
    declines to do a context switch.
    srcu: Lai Jiangshan's algorithmic implementation of SRCU, including
    call_srcu().

    Paul E. McKenney
     

10 May, 2012

2 commits

  • The current initialization of the RCU_FAST_NO_HZ per-CPU variables makes
    needless and fragile assumptions about the initial value of things like
    the jiffies counter. This commit therefore explicitly initializes all of
    them that are better started with a non-zero value. It also adds some
    comments describing the per-CPU state variables.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The current RCU_FAST_NO_HZ assumes that timers do not migrate unless a
    CPU goes offline, in which case it assumes that the CPU will have to come
    out of dyntick-idle mode (cancelling the timer) in order to go offline.
    This is important because when RCU_FAST_NO_HZ permits a CPU to enter
    dyntick-idle mode despite having RCU callbacks pending, it posts a timer
    on that CPU to force a wakeup on that CPU. This wakeup ensures that the
    CPU will eventually handle the end of the grace period, including invoking
    its RCU callbacks.

    However, Pascal Chapperon's test setup shows that the timer handler
    rcu_idle_gp_timer_func() really does get invoked in some cases. This is
    problematic because this can cause the CPU that entered dyntick-idle
    mode despite still having RCU callbacks pending to remain in
    dyntick-idle mode indefinitely, which means that its RCU callbacks might
    never be invoked. This situation can result in grace-period delays or
    even system hangs, which matches Pascal's observations of slow boot-up
    and shutdown (https://lkml.org/lkml/2012/4/5/142). See also the bugzilla:

    https://bugzilla.redhat.com/show_bug.cgi?id=806548

    This commit therefore causes the "should never be invoked" timer handler
    rcu_idle_gp_timer_func() to use smp_call_function_single() to wake up
    the CPU for which the timer was intended, allowing that CPU to invoke
    its RCU callbacks in a timely manner.

    Reported-by: Pascal Chapperon
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

03 May, 2012

2 commits

  • When running preemptible RCU, if a task exits in an RCU read-side
    critical section having blocked within that same RCU read-side critical
    section, the task must be removed from the list of tasks blocking a
    grace period (perhaps the current grace period, perhaps the next grace
    period, depending on timing). The exit() path invokes exit_rcu() to
    do this cleanup.

    However, the current implementation of exit_rcu() needlessly does the
    cleanup even if the task did not block within the current RCU read-side
    critical section, which wastes time and needlessly increases the size
    of the state space. Fix this by only doing the cleanup if the current
    task is actually on the list of tasks blocking some grace period.

    While we are at it, consolidate the two identical exit_rcu() functions
    into a single function.

    Signed-off-by: Paul E. McKenney
    Tested-by: Linus Torvalds

    Conflicts:

    kernel/rcupdate.c

    Paul E. McKenney
     
  • Currently, PREEMPT_RCU readers are enqueued upon entry to the scheduler.
    This is inefficient because enqueuing is required only if there is a
    context switch, and entry to the scheduler does not guarantee a context
    switch.

    This commit therefore moves the enqueuing to immediately precede the
    call to switch_to() from the scheduler.

    Signed-off-by: Paul E. McKenney
    Tested-by: Linus Torvalds

    Paul E. McKenney
     

01 May, 2012

1 commit

  • Timers are subject to migration, which can lead to the following
    system-hang scenario when CONFIG_RCU_FAST_NO_HZ=y:

    1. CPU 0 executes synchronize_rcu(), which posts an RCU callback.

    2. CPU 0 then goes idle. It cannot immediately invoke the callback,
    but there is nothing RCU needs from it, so it enters dyntick-idle
    mode after posting a timer.

    3. The timer gets migrated to CPU 1.

    4. CPU 0 never wakes up, so the synchronize_rcu() never returns, so
    the system hangs.

    This commit fixes this problem by using mod_timer_pinned(), as suggested
    by Peter Zijlstra, to ensure that the timer is actually posted on the
    running CPU.

    Reported-by: Dipankar Sarma
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

26 Apr, 2012

1 commit

  • RCU_FAST_NO_HZ uses a timer to limit the time that a CPU with callbacks
    can remain in dyntick-idle mode. This timer is cancelled when the CPU
    exits idle, and therefore should never fire. However, if the timer
    were migrated to some other CPU for whatever reason (1) the timer could
    actually fire and (2) firing on some other CPU would fail to wake up the
    CPU with callbacks, possibly resulting in sluggishness or a system hang.

    This commit therefore adds a WARN_ON_ONCE() to the timer handler in order
    to detect this condition.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

25 Apr, 2012

3 commits

  • Both Steven Rostedt's new idle-capable trace macros and the RCU_NONIDLE()
    macro can cause RCU to momentarily pause out of idle without the rest
    of the system being involved. This can cause rcu_prepare_for_idle()
    to run through its state machine too quickly, which can in turn result
    in needless scheduling-clock interrupts.

    This commit therefore adds code to enable rcu_prepare_for_idle() to
    distinguish between an initial entry to idle on the one hand (which needs
    to advance the rcu_prepare_for_idle() state machine) and an idle reentry
    due to idle-capable trace macros and RCU_NONIDLE() on the other hand
    (which should avoid advancing the rcu_prepare_for_idle() state machine).
    Additional state is maintained to allow the timer to be correctly reposted
    when returning after a momentary pause out of idle, and even more state
    is maintained to detect when new non-lazy callbacks have been enqueued
    (which may require re-evaluation of the approach to idleness).

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The RCU_FAST_NO_HZ facility uses an hrtimer to wake up a CPU when
    it is allowed to go into dyntick-idle mode, which is almost always
    cancelled soon after. This is not what hrtimers are good at, so
    this commit switches to the timer wheel.

    Reported-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Traces of rcu_prep_idle events can be confusing because
    rcu_cleanup_after_idle() does no tracing. This commit therefore adds
    this tracing.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

22 Feb, 2012

14 commits

  • This commit handles workloads that transition quickly between idle and
    non-idle, and where the CPU's callbacks cannot be invoked, but where
    RCU does not have anything immediate for the CPU to do. Without this
    patch, the RCU_FAST_NO_HZ code can be invoked repeatedly on each entry
    to idle. The commit sets the per-CPU rcu_dyntick_holdoff variable to
    hold off further attempts for a tick.

    Reported-by: "Abou Gazala, Neven M"
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • If a softirq is pending, the current CPU has RCU callbacks pending,
    and RCU does not immediately need anything from this CPU, then the
    current code resets the RCU_FAST_NO_HZ state machine. This means that
    upon exit from the subsequent softirq handler, RCU_FAST_NO_HZ will
    try really hard to force RCU into dyntick-idle mode. And if the same
    conditions hold after a few tries (determined by RCU_IDLE_OPT_FLUSHES),
    the same situation can repeat, possibly endlessly. This scenario is
    not particularly good for battery lifetime.

    This commit therefore suppresses the early exit from the RCU_FAST_NO_HZ
    state machine in the case where there is a softirq pending. This change
    forces the state machine to retain its memory, and to enter holdoff if
    this condition persists.

    Reported-by: "Abou Gazala, Neven M"
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The expedited RCU primitives can be quite useful, but they have some
    high costs as well. This commit updates and creates docbook comments
    calling out the costs, and updates the RCU documentation as well.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Because newly offlined CPUs continue executing after completing the
    CPU_DYING notifiers, they legitimately enter the scheduler and use
    RCU while appearing to be offline. This calls for a more sophisticated
    approach as follows:

    1. RCU marks the CPU online during the CPU_UP_PREPARE phase.

    2. RCU marks the CPU offline during the CPU_DEAD phase.

    3. Diagnostics regarding use of read-side RCU by offline CPUs use
    RCU's accounting rather than the cpu_online_map. (Note that
    __call_rcu() still uses cpu_online_map to detect illegal
    invocations within CPU_DYING notifiers.)

    4. Offline CPUs are prevented from hanging the system by
    force_quiescent_state(), which pays attention to cpu_online_map.
    Some additional work (in a later commit) will be needed to
    guarantee that force_quiescent_state() waits a full jiffy before
    assuming that a CPU is offline, for example, when called from
    idle entry. (This commit also makes the one-jiffy wait
    explicit, since the old-style implicit wait can now be defeated
    by RCU_FAST_NO_HZ and by rcutorture.)

    This approach avoids the false positives encountered when attempting to
    use more exact classification of CPU online/offline state.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_prepare_for_idle() function is always called with interrupts
    disabled, so there is no reason to disable interrupts again within
    rcu_prepare_for_idle(). Therefore, this commit removes all of the
    interrupt disabling, also removing a latent disabling-unbalance bug.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Now that TREE_RCU and TREE_PREEMPT_RCU no longer do anything different
    for the single-CPU case, there is no need for multiple definitions of
    synchronize_sched_expedited(). It is no longer in any sense a plug-in,
    so move it from kernel/rcutree_plugin.h to kernel/rcutree.c.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Although it is legal to use RCU during early boot, it is anything
    but legal to use RCU at runtime from an offlined CPU. After all, RCU
    explicitly ignores offlined CPUs. This commit therefore adds checks
    for runtime use of RCU from offlined CPUs.

    These checks are not perfect, in particular, they can be subverted
    through use of things like rcu_dereference_raw(). Note that it is not
    possible to put checks in rcu_read_lock() and friends due to the fact
    that these primitives are used in code that might be used under either
    RCU or lock-based protection, which means that checking rcu_read_lock()
    gets you fat piles of false positives.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • There have been situations where RCU CPU stall warnings were caused by
    issues in scheduling-clock timer initialization. To make it easier to
    track these down, this commit causes the RCU CPU stall-warning messages
    to print out the number of scheduling-clock interrupts taken in the
    current grace period for each stalled CPU.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Now that both TINY_RCU and TINY_PREEMPT_RCU have been in place for a while,
    it is time to remove UP support from TREE_RCU, which is what this commit
    does.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The recent updates to RCU_FAST_NO_HZ include an rcu_needs_cpu() that
    does more than just check for callbacks, so this commit makes the name
    of rcu_preempt_needs_cpu() consistent with that change by renaming it
    to rcu_preempt_cpu_has_callbacks().

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, a given CPU is permitted to remain in dyntick-idle mode
    indefinitely if it has only lazy RCU callbacks queued. This is vulnerable
    to corner cases in NUMA systems, so limit the time to six seconds by
    default. (Currently controlled by a cpp macro.)

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Move ->qsmaskinit and blkd_tasks[] manipulation to the CPU_DYING
    notifier. This simplifies the code by eliminating a potential
    deadlock and by reducing the responsibilities of force_quiescent_state().
    Also rename functions to make their connection to the CPU-hotplug
    stages explicit.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • When CONFIG_RCU_FAST_NO_HZ is enabled, RCU will allow a given CPU to
    enter dyntick-idle mode even if it still has RCU callbacks queued.
    RCU avoids system hangs in this case by scheduling a timer for several
    jiffies in the future. However, if all of the callbacks on that CPU
    are from kfree_rcu(), there is no reason to wake the CPU up, as it is
    not a problem to defer freeing of memory.

    This commit therefore tracks the number of callbacks on a given CPU
    that are from kfree_rcu(), and avoids scheduling the timer if all of
    a given CPU's callbacks are from kfree_rcu().

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • It is illegal to have a grace period within a same-flavor RCU read-side
    critical section, so this commit adds lockdep-RCU checks to splat when
    such abuse is encountered. This commit does not detect more elaborate
    RCU deadlock situations. These situations might be a job for lockdep
    enhancements.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney