12 Dec, 2011

40 commits

  • Both TINY_RCU's and TREE_RCU's implementations of rcu_boost() access
    the ->boost_tasks and ->exp_tasks fields without preventing concurrent
    changes to these fields. This commit therefore applies ACCESS_ONCE()
    in order to prevent compiler mischief.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This reverts commit 5342e269b2b58ee0b0b4168a94087faaa60d0567.

    The approach taken in this patch was deemed too abusive to mutexes,
    and thus too likely to result in maintenance problems in the future.
    Instead, we will disallow RCU read-side critical sections that partially
    overlap with interrupt-disabled code segments.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Tyler Hicks pointed me at an additional article on RCU and I figured
    it should probably be mentioned with the others.

    Signed-off-by: Kees Cook
    Signed-off-by: Paul E. McKenney

    Kees Cook
     
  • The current rcu_batch_end event trace records only the name of the RCU
    flavor and the total number of callbacks that remain queued on the
    current CPU. This is insufficient for testing and tuning the new
    dyntick-idle RCU_FAST_NO_HZ code, so this commit adds idle state along
    with whether or not any of the callbacks that were ready to invoke
    at the beginning of rcu_do_batch() are still queued.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit adds simple rcutorture tests for srcu_read_lock_raw() and
    srcu_read_unlock_raw(). It does not test doing srcu_read_lock_raw()
    in an exception handler and releasing it in the corresponding process
    context.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcutorture test can now automatically exercise CPU hotplug and
    collect success statistics, which can be correlated with other rcutorture
    activity. This permits rcutorture to completely exercise RCU regardless
    of what sort of userspace and filesystem layout is in use. Unfortunately,
    rcutorture is happy to attempt to offline CPUs that cannot be offlined,
    for example, CPU 0 in both the x86 and ARM architectures. Although this
    allows rcutorture testing to proceed normally, it confounds attempts at
    error analysis due to the resulting flood of spurious CPU-hotplug errors.

    Therefore, this commit uses the new cpu_is_hotpluggable() function to
    avoid attempting to offline CPUs that are not hotpluggable, which in
    turn avoids spurious CPU-hotplug errors.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • When architectures register CPUs, they indicate whether the CPU allows
    hotplugging; notably, x86 and ARM don't allow hotplugging CPU 0.
    Userspace can easily query the hotpluggability of a CPU via sysfs;
    however, the kernel has no convenient way of accessing that property in
    an architecture-independent way. While the kernel can simply try it and
    see, some code needs to distinguish between "hotplug failed" and
    "hotplug has no hope of working on this CPU"; for example, rcutorture's
    CPU hotplug tests want to avoid drowning out real hotplug failures with
    expected failures.

    Expose this property via a new cpu_is_hotpluggable function, so that the
    rest of the kernel can access it in an architecture-independent way.

    Signed-off-by: Josh Triplett
    Signed-off-by: Paul E. McKenney

    Josh Triplett
     
  • No point in having two identical rcu_cpu_stall_suppress declarations,
    so remove the more obscure of the two.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • If there are other CPUs active at a given point in time, then there is a
    limit to what a given CPU can do to advance the current RCU grace period.
    Beyond this limit, attempting to force the RCU grace period forward will
    do nothing but consume energy burning CPU cycles.

    Therefore, this commit takes an adaptive approach to RCU_FAST_NO_HZ
    preparations for idle. It pushes the RCU core state machine for
    two cycles unconditionally, and then it will push from zero to three
    additional cycles, but only as long as the RCU core has work for this
    CPU to do immediately. The rcu_pending() function is used to check
    whether the RCU core has such work.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_do_batch() function that invokes callbacks for TREE_RCU and
    TREE_PREEMPT_RCU normally throttles callback invocation to avoid degrading
    scheduling latency. However, as long as the CPU would otherwise be idle,
    there is no downside to continuing to invoke any callbacks that have passed
    through their grace periods. In fact, processing such callbacks in a
    timely manner has the benefit of increasing the probability that the
    CPU can enter the power-saving dyntick-idle mode.

    Therefore, this commit allows callback invocation to continue beyond the
    preset limit as long as the scheduler does not have some other task to
    run and as long as context is that of the idle task or the relevant
    RCU kthread.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Because tasks don't nest, the ->dynticks_nesting count must always be zero upon
    entry to rcu_idle_enter_common(). Therefore, pass "0" rather than the
    counter itself.

    Signed-off-by: Frederic Weisbecker
    Cc: Josh Triplett
    Signed-off-by: Paul E. McKenney

    Frederic Weisbecker
     
  • Because tasks do not nest, rcu_idle_enter() and rcu_idle_exit() do
    not need to check for nesting. This commit therefore moves nesting
    checks from rcu_idle_enter_common() to rcu_irq_exit() and from
    rcu_idle_exit_common() to rcu_irq_enter().

    Signed-off-by: Frederic Weisbecker
    Cc: Josh Triplett
    Signed-off-by: Paul E. McKenney

    Frederic Weisbecker
     
  • The current implementation of RCU_FAST_NO_HZ prevents CPUs from entering
    dyntick-idle state if they have RCU callbacks pending. Unfortunately,
    this has the side-effect of often preventing them from entering this
    state, especially if at least one other CPU is not in dyntick-idle state.
    However, the resulting per-tick wakeup is wasteful in many cases: if the
    CPU has already fully responded to the current RCU grace period, there
    will be nothing for it to do until this grace period ends, which will
    frequently take several jiffies.

    This commit therefore permits a CPU that has done everything that the
    current grace period has asked of it (rcu_pending() == 0) to enter
    dyntick-idle mode even if it still has RCU callbacks pending. However,
    such a CPU posts a timer to
    wake it up several jiffies later (6 jiffies, based on experience with
    grace-period lengths). This wakeup is required to handle situations
    that can result in all CPUs being in dyntick-idle mode, thus failing
    to ever complete the current grace period. If a CPU wakes up before
    the timer goes off, then it cancels that timer, thus avoiding spurious
    wakeups.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The intent is that a given RCU read-side critical section be confined
    to a single context. For example, it is illegal to invoke rcu_read_lock()
    in an exception handler and then invoke rcu_read_unlock() from the
    context of the task that received the exception.

    Suggested-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Fixes and workarounds for a number of issues (for example, that in
    df4012edc) make it safe to once again detect dyntick-idle CPUs on the
    first pass of force_quiescent_state(), so this commit makes that change.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Assertions in rcu_init_percpu_data() unknowingly relied on outgoing
    CPUs being turned off before reaching the idle loop. Unfortunately,
    when running under kvm/qemu on x86, CPUs really can get to idle before
    being shut off. These CPUs are then born in dyntick-idle mode from an
    RCU perspective, which results in splats in rcu_init_percpu_data() and
    in RCU wrongly ignoring those CPUs despite them being active. This in
    turn can cause RCU to end grace periods prematurely, potentially freeing
    up memory that the newly onlined CPUs were still using. This is most
    decidedly not what we need to see in an RCU implementation.

    This commit therefore replaces the assertions in rcu_init_percpu_data()
    with code that forces RCU's dyntick-idle view of newly onlined CPUs to
    match reality.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Re-enable interrupts across calls to quiescent-state functions and
    also across force_quiescent_state() to reduce latency.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • With the new implementation of RCU_FAST_NO_HZ, it was possible to hang
    RCU grace periods as follows:

    o CPU 0 attempts to go idle, cycles several times through the
    rcu_prepare_for_idle() loop, then goes dyntick-idle when
    RCU needs nothing more from it, while still having at least
    one RCU callback pending.

    o CPU 1 goes idle with no callbacks.

    Both CPUs can then stay in dyntick-idle mode indefinitely, preventing
    the RCU grace period from ever completing, possibly hanging the system.

    This commit therefore prevents CPUs that have RCU callbacks from entering
    dyntick-idle mode. This approach also eliminates the need for the
    end-of-grace-period IPIs used previously.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • If a CPU enters dyntick-idle mode with callbacks pending, it will need
    an IPI at the end of the grace period. However, if it exits dyntick-idle
    mode before the grace period ends, it will be needlessly IPIed at the
    end of the grace period.

    Therefore, this commit clears the per-CPU rcu_awake_at_gp_end flag
    when a CPU determines that it does not need it. This in turn requires
    disabling interrupts across much of rcu_prepare_for_idle() in order to
    avoid having nested interrupts clearing this state out from under us.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The earlier version would attempt to push callbacks through five times
    before going into dyntick-idle mode if callbacks remained, but the CPU
    had done all that it needed to do for the current RCU grace periods.
    This is wasteful: In most cases, once the CPU has done all that it
    needs to for the current RCU grace periods, it will make no further
    progress on the callbacks no matter how many times it loops through
    the RCU core processing and the idle-entry code.

    This commit therefore goes to dyntick-idle mode whenever the current
    CPU has done all it can for the current grace period.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit adds trace_rcu_prep_idle(), which is invoked from
    rcu_prepare_for_idle() and rcu_wake_cpu() to trace attempts on
    the part of RCU to force CPUs into dyntick-idle mode.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit updates the trace_rcu_dyntick() header comment to reflect
    events added by commit 4b4f421.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • An IRC discussion uncovered many conflicting opinions on what types
    of data may be atomically loaded and stored. This commit therefore
    calls out the official set: pointers, longs, ints, and chars (but
    not shorts). This commit also gives some examples of compiler mischief
    that can thwart atomicity.

    Please note that this discussion is relevant to !SMP kernels if
    CONFIG_PREEMPT=y: preemption can cause almost as much trouble as can SMP.

    Signed-off-by: Paul E. McKenney
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Russell King
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Mike Frysinger
    Cc: Mikael Starvik
    Cc: Jesper Nilsson
    Cc: David Howells
    Cc: Yoshinori Sato
    Cc: Richard Kuo
    Cc: Jes Sorensen
    Cc: Hirokazu Takata
    Cc: Geert Uytterhoeven
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Koichi Yasutake
    Cc: Jonas Bonn
    Cc: Kyle McMartin
    Cc: Helge Deller
    Cc: "James E.J. Bottomley"
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Chen Liqin
    Cc: Lennox Wu
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Guan Xuetao
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Chris Zankel

    Paul E. McKenney
     
  • Those two APIs were provided to optimize the calls of
    tick_nohz_idle_enter() and rcu_idle_enter() into a single
    irq disabled section. This way no interrupt happening in-between would
    needlessly process any RCU job.

    Now we are talking about an optimization whose benefits have yet
    to be measured. Let's start simple and completely decouple the RCU
    idle and dyntick-idle logics to simplify matters.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Reviewed-by: Josh Triplett
    Signed-off-by: Paul E. McKenney

    Frederic Weisbecker
     
  • Running CPU-hotplug operations concurrently with rcutorture has
    historically been a good way to find bugs in both RCU and CPU hotplug.
    This commit therefore adds an rcutorture module parameter called
    "onoff_interval" that causes a randomly selected CPU-hotplug operation to
    be executed at the specified interval, in seconds. The default value of
    "onoff_interval" is zero, which disables rcutorture-instigated CPU-hotplug
    operations.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Change from direct comparison of ->pid with zero to is_idle_task().

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett
    Acked-by: Chris Metcalf

    Paul E. McKenney
     
  • Change from direct comparison of ->pid with zero to is_idle_task().

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Change from direct comparison of ->pid with zero to is_idle_task().

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Cc: Jason Wessel
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Change from direct comparison of ->pid with zero to is_idle_task().

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Acked-by: David S. Miller
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Change from direct comparison of ->pid with zero to is_idle_task().

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Commit 908a3283 (Fix idle_cpu()) invalidated some uses of idle_cpu(),
    which used to say whether or not the CPU was running the idle task,
    but now instead says whether or not the CPU is running the idle task
    in the absence of pending wakeups. Although this new implementation
    gives a better answer to the question "is this CPU idle?", it also
    invalidates other uses that were made of idle_cpu().

    This commit therefore introduces a new is_idle_task() API member
    that determines whether or not the specified task is one of the
    idle tasks, allowing open-coded "->pid == 0" sequences to be replaced
    by something more meaningful.

    Suggested-by: Josh Triplett
    Suggested-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, if rcutorture is built into the kernel, it must be manually
    started or started from an init script. This is inconvenient for
    automated KVM testing, where it is good to be able to fully control
    rcutorture execution from the kernel parameters. This patch therefore
    adds a module parameter named "rcutorture_runnable" that defaults
    to zero ("don't start automatically"), but which can be set to one
    to cause rcutorture to start up immediately during boot.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Although it is easy to run rcutorture tests under KVM, there is currently
    no nice way to run such a test for a fixed time period, collect all of
    the rcutorture data, and then shut the system down cleanly. This commit
    therefore adds an rcutorture module parameter named "shutdown_secs" that
    specifies the run duration in seconds, after which rcutorture terminates
    the test and powers the system down. The default value for "shutdown_secs"
    is zero, which disables shutdown.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The new implementation of RCU_FAST_NO_HZ is compatible with preemptible
    RCU, so this commit removes the Kconfig restriction that previously
    prohibited this.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • RCU has traditionally relied on idle_cpu() to determine whether a given
    CPU is running in the context of an idle task, but commit 908a3283
    (Fix idle_cpu()) has invalidated this approach. After commit 908a3283,
    idle_cpu() will return true if the current CPU is currently running the
    idle task, and will be doing so for the foreseeable future. RCU instead
    needs to know whether or not the current CPU is currently running the
    idle task, regardless of what the near future might bring.

    This commit therefore switches from idle_cpu() to "current->pid != 0".

    Reported-by: Wu Fengguang
    Suggested-by: Carsten Emde
    Signed-off-by: Paul E. McKenney
    Acked-by: Steven Rostedt
    Tested-by: Wu Fengguang
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, RCU does not permit a CPU to enter dyntick-idle mode if that
    CPU has any RCU callbacks queued. This means that workloads for which
    each CPU wakes up and does some RCU updates every few ticks will never
    enter dyntick-idle mode. This can result in significant unnecessary power
    consumption, so this patch permits a given CPU to enter dyntick-idle
    mode if it has callbacks, but only if that same CPU has completed all
    current work for the RCU core. We use rcu_pending() to determine
    whether a given CPU has completed all current work for the RCU core.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The current code just complains if the current task is not the idle task.
    This commit therefore adds printing of the identity of the idle task.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The trace_rcu_dyntick() trace event did not print both the old and
    the new value of the nesting level, and furthermore printed only
    the low-order 32 bits of it. This could result in some confusion
    when interpreting trace-event dumps, so this commit prints both
    the old and the new value, prints the full 64 bits, and also selects
    the process-entry/exit increment to print nicely in hexadecimal.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Update various files in Documentation/RCU to reflect srcu_read_lock_raw()
    and srcu_read_unlock_raw(). Credit to Peter Zijlstra for suggesting
    use of the existing _raw suffix instead of the earlier bulkref names.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The RCU implementations, including SRCU, are designed to be used in a
    lock-like fashion, so that the read-side lock and unlock primitives must
    execute in the same context for any given read-side critical section.
    This constraint is enforced by lockdep-RCU. However, there is a need
    to enter an SRCU read-side critical section within the context of an
    exception and then exit in the context of the task that encountered the
    exception. The cost of this capability is that the read-side operations
    incur the overhead of disabling interrupts.

    Note that although the current implementation allows a given read-side
    critical section to be entered by one task and then exited by another, all
    known possible implementations that allow this have scalability problems.
    Therefore, a given read-side critical section must be exited by the same
    task that entered it, though perhaps from an interrupt or exception
    handler running within that task's context. But if you are thinking
    in terms of interrupt handlers, make sure that you have considered the
    possibility of threaded interrupt handlers.

    Credit goes to Peter Zijlstra for suggesting use of the existing _raw
    suffix to indicate disabling lockdep over the earlier "bulkref" names.

    Requested-by: Srikar Dronamraju
    Signed-off-by: Paul E. McKenney
    Tested-by: Srikar Dronamraju

    Paul E. McKenney