29 Sep, 2011

40 commits

  • The purpose of rcu_needs_cpu_flush() was to iterate on pushing the
    current grace period in order to help the current CPU enter dyntick-idle
    mode. However, this can result in failures if the CPU starts entering
    dyntick-idle mode, but then backs out. In this case, the call to
    rcu_pending() from rcu_needs_cpu_flush() might end up announcing a
    nonexistent quiescent state.

    This commit therefore removes rcu_needs_cpu_flush() in favor of letting
    the dyntick-idle machinery at the end of the softirq handler push the
    loop along via its call to rcu_pending().

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • RCU boost threads start life at RCU_BOOST_PRIO, while others remain
    at RCU_KTHREAD_PRIO. While here, change thread names to match other
    kthreads, and adjust rcu_yield() to not override the priority set by
    the user. This last change sets the stage for runtime changes to
    priority in the -rt tree.

    Signed-off-by: Mike Galbraith
    Signed-off-by: Paul E. McKenney

    Mike Galbraith
     
  • One of the loops in rcu_torture_boost() fails to check kthread_should_stop(),
    and thus might be slowing or even stopping completion of rcutorture tests
    at rmmod time. This commit adds the kthread_should_stop() check to the
    offending loop.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_torture_fqs() function can prevent the rcutorture tests from
    completing, resulting in a hang. This commit therefore ensures that
    rcu_torture_fqs() will exit its inner loops at the end of the test,
    and also applies the newish ULONG_CMP_LT() macro to time comparisons.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Create a separate lockdep class for the rt_mutex used for RCU priority
    boosting and enable use of rt_mutex_lock() with irqs disabled. This
    prevents RCU priority boosting from falling prey to deadlocks when
    someone begins an RCU read-side critical section in preemptible state,
    but releases it with an irq-disabled lock held.

    Unfortunately, the scheduler's runqueue and priority-inheritance locks
    still must either completely enclose or be completely enclosed by any
    overlapping RCU read-side critical section.

    This version removes a redundant local_irq_restore() noted by
    Yong Zhang.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • CPUs set rdp->qs_pending when coming online to resolve races with
    grace-period start. However, this means that if RCU is idle, the
    just-onlined CPU might needlessly send itself resched IPIs. Adjust
    the online-CPU initialization to avoid this, and also to correctly
    cause the CPU to respond to the current grace period if needed.

    Signed-off-by: Paul E. McKenney
    Tested-by: Josh Boyer
    Tested-by: Christian Hoffmann

    Paul E. McKenney
     
  • It is possible for an RCU CPU stall to end just as it is detected, in
    which case the current code will uselessly dump all CPUs' stacks.
    This commit therefore checks for this condition and refrains from
    sending needless NMIs.

    And yes, the stall might also end just after we checked all CPUs and
    tasks, but in that case we would at least have given some clue as
    to which CPU/task was at fault.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Greater use of RCU during early boot (before the scheduler is operating)
    is causing RCU to attempt to start grace periods during that time, which
    in turn is resulting in both RCU and the callback functions attempting
    to use the scheduler before it is ready.

    This commit prevents these problems by prohibiting RCU grace periods
    until after the scheduler has spawned the first non-idle task.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Commit 7765be (Fix RCU_BOOST race handling current->rcu_read_unlock_special)
    introduced a new ->rcu_boosted field in the task structure. This is
    redundant because the existing ->rcu_boost_mutex will be non-NULL at
    any time that ->rcu_boosted is nonzero. Therefore, this commit removes
    ->rcu_boosted and tests ->rcu_boost_mutex instead.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • There isn't a whole lot of point in poking the scheduler before there
    are other tasks to switch to. This commit therefore adds a check
    for rcu_scheduler_fully_active in __rcu_pending() to suppress any
    pre-scheduler calls to set_need_resched(). The downside of this approach
    is additional runtime overhead in a reasonably hot code path.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The trigger_all_cpu_backtrace() function is a no-op in architectures that
    do not define arch_trigger_all_cpu_backtrace. On such architectures, RCU
    CPU stall warning messages contain no stack trace information, which makes
    debugging quite difficult. This commit therefore substitutes dump_stack()
    for architectures that do not define arch_trigger_all_cpu_backtrace,
    so that at least the local CPU's stack is dumped as part of the RCU CPU
    stall warning message.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • We only need to constrain the compiler if we are actually exiting
    the top-level RCU read-side critical section. This commit therefore
    moves the first barrier() call in __rcu_read_unlock() to inside the
    "if" statement, thus avoiding needless register flushes for inner
    rcu_read_unlock() calls.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The differences between rcu_assign_pointer() and RCU_INIT_POINTER() are
    subtle, and it is easy to use the cheaper RCU_INIT_POINTER() when
    the more-expensive rcu_assign_pointer() should have been used instead.
    The consequences of this mistake are quite severe.

    This commit therefore carefully lays out the situations in which it is
    permissible to use RCU_INIT_POINTER().

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Recent changes to gcc give warning messages on rcu_assign_pointer()'s
    checks that allow it to determine when it is OK to omit the memory
    barrier. Stephen Hemminger tried a number of gcc tricks to silence
    this warning, but #pragmas and CPP macros do not work together in the
    way that would be required to make this work.

    However, we now have RCU_INIT_POINTER(), which already omits this
    memory barrier, and which therefore may be used when assigning NULL to
    an RCU-protected pointer that is accessible to readers. This commit
    therefore makes rcu_assign_pointer() unconditionally emit the memory
    barrier.

    Reported-by: Stephen Hemminger
    Signed-off-by: Eric Dumazet
    Acked-by: David S. Miller
    Signed-off-by: Paul E. McKenney

    Eric Dumazet
     
  • When the ->dynticks field in the rcu_dynticks structure changed to an
    atomic_t, its size on 64-bit systems changed from 64 bits to 32 bits.
    The local variables in rcu_implicit_dynticks_qs() need to change as
    well, hence this commit.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The in_irq() check in rcu_enter_nohz() is redundant because if we really
    are in an interrupt, the attempt to re-enter dyntick-idle mode will invoke
    rcu_needs_cpu() in any case, which will force the check for RCU callbacks.
    So this commit removes the check along with the set_need_resched().

    Suggested-by: Frederic Weisbecker
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • RCU no longer uses the nohz_cpu_mask global variable, nor does anyone
    else, so this commit removes it. This reduces memory footprint and
    also removes some atomic instructions and memory barriers from the
    dyntick-idle path.

    Signed-off-by: Alex Shi
    Signed-off-by: Paul E. McKenney

    Alex Shi
     
  • There has been quite a bit of confusion about what RCU-lockdep splats
    mean, so this commit adds some documentation describing how to
    interpret them.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • When rcutorture is compiled directly into the kernel
    (instead of separately as a module), it is necessary to specify
    rcutorture.stat_interval as a kernel command-line parameter; otherwise,
    the rcu_torture_stats kthread is never started. However, when working
    with the system after it has booted, it is convenient to be able to
    change the time between statistic printing, particularly when logged
    into the console.

    This commit therefore allows the stat_interval parameter to be changed
    at runtime.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_dereference_bh_protected() and rcu_dereference_sched_protected()
    macros are synonyms for rcu_dereference_protected() and are not used
    anywhere in mainline. This commit therefore removes them.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Add documentation for rcu_dereference_bh_check(),
    rcu_dereference_sched_check(), srcu_dereference_check(), and
    rcu_dereference_index_check().

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Since ca5ecddf (rcu: define __rcu address space modifier for sparse),
    rcu_dereference_check() automatically uses rcu_read_lock_held() as
    part of its condition. Therefore, callers of rcu_dereference_check()
    no longer need to pass rcu_read_lock_held() explicitly.

    Signed-off-by: Michal Hocko
    Signed-off-by: Paul E. McKenney

    Michal Hocko
     
  • There is often a delay between the time that a CPU passes through a
    quiescent state and the time that this quiescent state is reported to the
    RCU core. It is quite possible that the grace period ended before the
    quiescent state could be reported, for example, some other CPU might have
    deduced that this CPU passed through dyntick-idle mode. It is critically
    important that each quiescent state be counted only against the grace
    period that was in effect at the time that it was detected.

    Previously, this was handled by recording the number of the last grace
    period to complete when passing through a quiescent state. The RCU
    core then checked this number against the current value, rejecting
    the quiescent state if there was a mismatch. However, one additional
    possibility had to be accounted for, namely that the quiescent state
    was recorded after the prior grace period completed but before the
    current grace period started. In this case, the RCU core had to reject
    the quiescent state even though the recorded number matched. This was
    handled when the CPU became aware of a new grace period: at that
    point, it invalidated any prior quiescent state.

    This works, but is a bit indirect. The new approach records the current
    grace period, and the RCU core checks to see (1) that this is still the
    current grace period and (2) that this grace period has not yet ended.
    This approach simplifies reasoning about correctness, and this commit
    changes over to this new approach.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Add trace events to record grace-period start and end, quiescent states,
    CPUs noticing grace-period start and end, grace-period initialization,
    call_rcu() invocation, tasks blocking in RCU read-side critical sections,
    tasks exiting those same critical sections, force_quiescent_state()
    detection of dyntick-idle and offline CPUs, CPUs entering and leaving
    dyntick-idle mode (except from NMIs), CPUs coming online and going
    offline, and CPUs being kicked for staying in dyntick-idle mode for too
    long (as in many weeks, even on 32-bit systems).

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The earlier trace events for registering RCU callbacks and for invoking
    them did not include the RCU flavor (rcu_bh, rcu_preempt, or rcu_sched).
    This commit adds the RCU flavor to those trace events.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This patch #ifdefs TINY_RCU kthreads out of the kernel unless RCU_BOOST=y,
    thus eliminating context-switch overhead if RCU priority boosting has
    not been configured.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Add event-trace markers to TREE_RCU kthreads to allow including these
    kthreads' CPU time in the utilization calculations.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Andi Kleen noticed that one of the RCU_BOOST data declarations was
    out of sync with the definition. Move the declarations so that the
    compiler can do the checking in the future.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • We now have kthreads only for flavors of RCU that support boosting,
    so update the now-misleading comments accordingly.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Add a string to the rcu_batch_start() and rcu_batch_end() trace
    messages that indicates the RCU type ("rcu_sched", "rcu_bh", or
    "rcu_preempt"). The trace messages for the actual invocations
    themselves are not marked, as it should be clear from the
    rcu_batch_start() and rcu_batch_end() events before and after.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • In order to allow event tracing to distinguish between flavors of
    RCU, we need those names in the relevant RCU data structures. TINY_RCU
    has avoided them for memory-footprint reasons, so add them only if
    CONFIG_RCU_TRACE=y.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit adds the trace_rcu_utilization() marker, which allows
    postprocessing scripts to compute RCU's CPU utilization, give or
    take event-trace overhead. Note that we do not include RCU's
    dyntick-idle interface because event tracing requires RCU protection,
    which is not available in dyntick-idle mode.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • There was recently some controversy about the overhead of invoking RCU
    callbacks. Add TRACE_EVENT()s to obtain fine-grained timings for the
    start and stop of a batch of callbacks and also for each callback invoked.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_torture_boost() cleanup code destroyed debug-objects state before
    waiting for the last RCU callback to be invoked, resulting in rare but
    very real debug-objects warnings. Move the destruction to after the
    waiting to fix this problem.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit eliminates the possibility of running TREE_PREEMPT_RCU
    when SMP=n and of running TINY_RCU when PREEMPT=y. People who really
    want these combinations can hand-edit init/Kconfig, but eliminating
    them as choices for production systems reduces the amount of testing
    required. It will also allow cutting out a few #ifdefs.

    Note that running TREE_RCU and TINY_RCU on single-CPU systems using
    SMP-built kernels is still supported.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • It has long been the case that the architecture must call nmi_enter()
    and nmi_exit() rather than irq_enter() and irq_exit() in order to
    permit RCU read-side critical sections in NMIs. Catch the documentation
    up with reality.

    Signed-off-by: Paul E. McKenney
    Acked-by: Mathieu Desnoyers

    Paul E. McKenney
     
  • Now that the RCU API contains synchronize_rcu_bh(), synchronize_sched(),
    call_rcu_sched(), and rcu_bh_expedited()...

    Make rcutorture test synchronize_rcu_bh(), getting rid of the old
    rcu_bh_torture_synchronize() workaround. Similarly, make rcutorture test
    synchronize_sched(), getting rid of the old sched_torture_synchronize()
    workaround. Make rcutorture test call_rcu_sched() instead of wrappering
    synchronize_sched(). Also add testing of rcu_bh_expedited().

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Pull the code that waits for an RCU grace period into a single function,
    which is then called by synchronize_rcu() and friends in the case of
    TREE_RCU and TREE_PREEMPT_RCU, and from rcu_barrier() and friends in
    the case of TINY_RCU and TINY_PREEMPT_RCU.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • rcutree.c defines rcu_cpu_kthread_cpu as int, not unsigned int,
    so the extern has to follow that.

    Signed-off-by: Andi Kleen
    Signed-off-by: Paul E. McKenney

    Andi Kleen
     
  • Update rcutorture documentation to account for boosting, new types of
    RCU torture testing that have been added over the past few years, and
    the memory-barrier testing that was added an embarrassingly long time
    ago.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Take a first step towards untangling Linux kernel header files by
    placing the struct rcu_head definition into include/linux/types.h
    and including include/linux/types.h in include/linux/rcupdate.h
    where struct rcu_head used to be defined. The actual inclusion point
    for include/linux/types.h is with the rest of the #include directives
    rather than at the point where struct rcu_head used to be defined,
    as suggested by Mathieu Desnoyers.

    Once this is in place, then header files that need only rcu_head
    can include types.h rather than rcupdate.h.

    Signed-off-by: Paul E. McKenney
    Cc: Paul Gortmaker
    Acked-by: Mathieu Desnoyers

    Paul E. McKenney