26 Sep, 2012

2 commits

  • The current implementation of RCU_FAST_NO_HZ tries reasonably hard to rid
    the current CPU of RCU callbacks. This is appropriate when the CPU is
    entering idle, where it doesn't have much else to do anyway, but is most
    definitely not what you want when transitioning to user-mode execution.
    This commit therefore detects the adaptive-tick case, and refrains from
    burning CPU time getting rid of RCU callbacks in that case.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The conflicts between kernel/rcutree.h and kernel/rcutree_plugin.h
    were due to adjacent insertions and deletions, which were resolved
    by simply accepting the changes on both branches.

    Paul E. McKenney
     

25 Sep, 2012

1 commit

  • …', 'hotplug.2012.09.23a' and 'idlechop.2012.09.23a' into HEAD

    bigrt.2012.09.23a contains additional commits to reduce scheduling latency
    from RCU on huge systems (many hundreds or thousands of CPUs).

    doctorture.2012.09.23a contains documentation changes and rcutorture fixes.

    fixes.2012.09.23a contains miscellaneous fixes.

    hotplug.2012.09.23a contains CPU-hotplug-related changes.

    idle.2012.09.23a fixes architectures for which RCU no longer considered
    the idle loop to be a quiescent state due to earlier
    adaptive-dynticks changes. Affected architectures are alpha,
    cris, frv, h8300, m32r, m68k, mn10300, parisc, score, xtensa,
    and ia64.

    Paul E. McKenney
     

23 Sep, 2012

11 commits

  • The print_cpu_stall_fast_no_hz() function attempts to print -1 when
    the ->idle_gp_timer is not pending, but unsigned arithmetic causes it
    to instead print ULONG_MAX, which is 4294967295 on 32-bit systems and
    18446744073709551615 on 64-bit systems. Neither of these is the most
    reader-friendly values, so this commit instead causes "timer not pending"
    to be printed when ->idle_gp_timer is not pending.
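
    For illustration, here is a minimal user-space sketch (with a
    hypothetical variable name) of the arithmetic at issue: assigning -1
    to an unsigned long and printing it yields ULONG_MAX.

        #include <stdio.h>
        #include <limits.h>

        int main(void)
        {
                unsigned long timer_expiry_delta = -1; /* "not pending" sentinel */

                /* Unsigned arithmetic wraps, so -1 becomes ULONG_MAX: */
                printf("%lu\n", timer_expiry_delta);
                printf("%d\n", timer_expiry_delta == ULONG_MAX); /* prints 1 */
                return 0;
        }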

    Reported-by: Paul Walmsley
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_print_detail_task_stall_rnp() function invokes
    rcu_preempt_blocked_readers_cgp(), outside of the protection of the
    rcu_node structure's ->lock, to verify that some preempted RCU readers
    are blocking the current grace period. This means that the last blocked
    reader might exit its RCU read-side critical section and remove itself
    from the ->blkd_tasks list before the ->lock is acquired, resulting in
    a segmentation fault when the subsequent code attempts to dereference
    the now-NULL gp_tasks pointer.

    This commit therefore moves the test under the lock. This will not
    have measurable effect on lock contention because this code is invoked
    only when printing RCU CPU stall warnings, in other words, in the common
    case, never.
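
    In outline, the fixed function might look as follows (a sketch: the
    function and field names follow the commit text, and the per-task
    printing is reduced to sched_show_task()):

        static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
        {
                unsigned long flags;
                struct task_struct *t;

                raw_spin_lock_irqsave(&rnp->lock, flags);
                if (!rcu_preempt_blocked_readers_cgp(rnp)) {
                        /* No readers blocking this grace period. */
                        raw_spin_unlock_irqrestore(&rnp->lock, flags);
                        return;
                }
                /* ->gp_tasks is now stable: walk ->blkd_tasks under ->lock. */
                t = list_entry(rnp->gp_tasks, struct task_struct, rcu_node_entry);
                list_for_each_entry_continue(t, &rnp->blkd_tasks, rcu_node_entry)
                        sched_show_task(t);
                raw_spin_unlock_irqrestore(&rnp->lock, flags);
        }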

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The increment_cpu_stall_ticks() function listed each RCU flavor
    explicitly, with an ifdef to handle preemptible RCU. This commit
    therefore applies for_each_rcu_flavor() to save a line of code.

    Because this commit switches from a code-based enumeration of the
    flavors of RCU to an rcu_state-list-based enumeration, it is no longer
    possible to apply __get_cpu_var() to the per-CPU rcu_data structures.
    This commit instead applies __this_cpu_ptr() to the rcu_state
    structure's ->rda field, which references the corresponding rcu_data
    structures.
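
    The resulting function might then reduce to the following (a sketch,
    assuming the ->ticks_this_gp stall-warning counter in the rcu_data
    structure):

        static void increment_cpu_stall_ticks(void)
        {
                struct rcu_state *rsp;

                /* One statement covers rcu_sched, rcu_bh, and (where
                 * configured) rcu_preempt, with no #ifdef required. */
                for_each_rcu_flavor(rsp)
                        __this_cpu_ptr(rsp->rda)->ticks_this_gp++;
        }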

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Commit 1217ed1b (rcu: permit rcu_read_unlock() to be called while holding
    runqueue locks) made rcu_initiate_boost() restore irq state when releasing
    the rcu_node structure's ->lock, but failed to update the header comment
    accordingly. This commit therefore brings the header comment up to date.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The rcu_preempt_offline_tasks() function moves all tasks queued on a given
    leaf rcu_node structure to the root rcu_node structure, which is done when
    the last CPU corresponding to the leaf rcu_node structure goes offline. Now
    that
    RCU-preempt's synchronize_rcu_expedited() implementation blocks CPU-hotplug
    operations during the initialization of each rcu_node structure's
    ->boost_tasks pointer, rcu_preempt_offline_tasks() can do a better job
    of setting the root rcu_node's ->boost_tasks pointer.

    The key point is that rcu_preempt_offline_tasks() runs as part of the
    CPU-hotplug process, so that a concurrent synchronize_rcu_expedited()
    is guaranteed to either have not started on the one hand (in which case
    there is no boosting on behalf of the expedited grace period) or to be
    completely initialized on the other (in which case, in the absence of
    other priority boosting, all ->boost_tasks pointers will be initialized).
    Therefore, if rcu_preempt_offline_tasks() finds that the ->boost_tasks
    pointer is equal to the ->exp_tasks pointer, it can be sure that it is
    correctly placed.

    In the case where there was boosting ongoing at the time that the
    synchronize_rcu_expedited() function started, different nodes might start
    boosting the tasks blocking the expedited grace period at different times.
    In this mixed case, the root node will either be boosting tasks for
    the expedited grace period already, or it will start as soon as it gets
    done boosting for the normal grace period -- but in this latter case,
    the root node's tasks needed to be boosted in any case.

    This commit therefore adds a check of the ->boost_tasks pointer against
    the ->exp_tasks pointer to the list of conditions that prevent updating
    ->boost_tasks.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • When rcu_preempt_offline_tasks() clears tasks from a leaf rcu_node
    structure, it does not NULL out the structure's ->boost_tasks field.
    This commit therefore NULLs out ->boost_tasks at that point.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The current quiescent-state detection algorithm is needlessly
    complex. It records the grace-period number corresponding to
    the quiescent state at the time of the quiescent state, which
    works, but it seems better to simply erase any record of previous
    quiescent states at the time that the CPU notices the new grace
    period. This has the further advantage of removing another piece
    of RCU for which lockless reasoning is required.

    Therefore, this commit makes this change.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The synchronize_rcu_expedited() function disables interrupts across a
    scan of all leaf rcu_node structures, which is not good for real-time
    scheduling latency on large systems (hundreds or especially thousands
    of CPUs). This commit therefore holds off CPU-hotplug operations using
    get_online_cpus(), and removes the prior acquisition of the ->onofflock
    (which required disabling interrupts).
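
    In outline (a sketch; the per-node initialization is elided):

        get_online_cpus();      /* Sleepably exclude CPU-hotplug operations. */
        rcu_for_each_leaf_node(rsp, rnp) {
                raw_spin_lock_irqsave(&rnp->lock, flags);  /* Short, per-node. */
                /* ... set up this node's expedited grace-period state ... */
                raw_spin_unlock_irqrestore(&rnp->lock, flags);
        }
        put_online_cpus();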

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • In the C language, signed integer overflow is undefined. It is true
    that two's-complement arithmetic normally comes to the rescue, but the
    compiler can subvert this any time it has information about the values
    being compared. For example, given "if (a - b > 0)", if the compiler
    has enough information to realize that (for example) the value of "a"
    is positive and that of "b" is negative, the compiler is within its
    rights to optimize this to a simple "if (1)", which might not be what
    you want.
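
    As a user-space illustration (hypothetical names): with unsigned types
    the subtraction wraps modulo 2^N, so the comparison remains
    well-defined even across counter wraparound.

        #include <stdio.h>
        #include <limits.h>

        int main(void)
        {
                unsigned long snap = ULONG_MAX; /* snapshot taken just before wrap */
                unsigned long cur = snap + 2;   /* counter has since wrapped to 1 */

                /* cur - snap wraps to 2, so progress is still detected: */
                if (cur - snap > 0)
                        printf("work done since snapshot\n");
                return 0;
        }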

    This commit therefore converts synchronize_rcu_expedited()'s work-done
    detection counter from signed to unsigned.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • As the first step towards allowing quiescent-state forcing to be
    preemptible, this commit moves RCU quiescent-state forcing into the
    same kthread that is now used to initialize and clean up after grace
    periods. This is yet another step towards keeping scheduling
    latency down to a dull roar.

    Updated to change from raw_spin_lock_irqsave() to raw_spin_lock_irq()
    and to remove the now-unused rcu_state structure fields as suggested by
    Peter Zijlstra.

    Reported-by: Mike Galbraith
    Reported-by: Dimitri Sivanich
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • In kernels built with CONFIG_RCU_FAST_NO_HZ=y, CPUs can accumulate a
    large number of lazy callbacks, which as the name implies will be slow
    to be invoked. This can be a problem on small-memory systems, where the
    default 6-second sleep for CPUs having only lazy RCU callbacks could well
    be fatal. This commit therefore installs an OOM handler that ensures that
    every CPU with lazy callbacks has at least one non-lazy callback, in turn
    ensuring timely advancement for these callbacks.

    Updated to fix a bug that disabled OOM killing, noted by Lai Jiangshan.

    Updated to push the for_each_rcu_flavor() loop into rcu_oom_notify_cpu(),
    thus reducing the number of IPIs, as suggested by Steven Rostedt. Also
    to make the for_each_online_cpu() loop be preemptible. (Later, it might
    be good to use smp_call_function(), as suggested by Peter Zijlstra.)
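
    The overall shape might be as follows (a sketch: the function names
    follow the commit text, but the bodies and hook points are assumed):

        #include <linux/notifier.h>
        #include <linux/oom.h>
        #include <linux/cpu.h>
        #include <linux/smp.h>
        #include <linux/sched.h>
        #include <linux/init.h>

        /* Runs on the target CPU: for each flavor that has only lazy
         * callbacks queued, post one non-lazy callback. */
        static void rcu_oom_notify_cpu(void *unused)
        {
                /* for_each_rcu_flavor(rsp) { ... enqueue non-lazy CB ... } */
        }

        static int rcu_oom_notify(struct notifier_block *self,
                                  unsigned long notused, void *nfreed)
        {
                int cpu;

                get_online_cpus();
                for_each_online_cpu(cpu) {
                        smp_call_function_single(cpu, rcu_oom_notify_cpu,
                                                 NULL, 1);
                        cond_resched();  /* Keep the scan preemptible. */
                }
                put_online_cpus();
                return NOTIFY_OK;
        }

        static struct notifier_block rcu_oom_nb = {
                .notifier_call = rcu_oom_notify,
        };

        static int __init rcu_register_oom_notifier(void)
        {
                register_oom_notifier(&rcu_oom_nb);
                return 0;
        }
        early_initcall(rcu_register_oom_notifier);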

    Signed-off-by: Paul E. McKenney
    Tested-by: Sasha Levin
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

13 Aug, 2012

2 commits

  • Bring RCU into the new-age CPU-hotplug fold by modifying RCU's per-CPU
    kthread code to use the new smp_hotplug_thread facility.

    [ tglx: Adapted it to use callbacks and to the simplified rcu yield ]
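
    With the smp_hotplug_thread facility, per-CPU kthread management
    reduces to filling in a descriptor and registering it once (a sketch;
    the rcuc callback names are assumptions based on the commit's context):

        static struct smp_hotplug_thread rcu_cpu_thread_spec = {
                .store             = &rcu_cpu_kthread_task,
                .thread_should_run = rcu_cpu_kthread_should_run,
                .thread_fn         = rcu_cpu_kthread,
                .thread_comm       = "rcuc/%u",
                .setup             = rcu_cpu_kthread_setup,
                .park              = rcu_cpu_kthread_park,
        };

        static int __init rcu_spawn_kthreads(void)
        {
                /* One call replaces the old open-coded hotplug notifiers. */
                return smpboot_register_percpu_thread(&rcu_cpu_thread_spec);
        }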

    Signed-off-by: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Srivatsa S. Bhat
    Cc: Rusty Russell
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/20120716103948.673354828@linutronix.de
    Signed-off-by: Thomas Gleixner

    Paul E. McKenney
     
  • The rcu_yield() code is amazing. It's there to avoid starvation of the
    system when lots of (boosting) work is to be done.

    Now looking at the code, its functionality is:

    Make the thread SCHED_OTHER and very nice, i.e. get it out of the way
    Arm a timer with 2 ticks
    schedule()

    Now if the system goes idle the rcu task returns, regains SCHED_FIFO
    and plugs on. If the system stays busy the timer fires and wakes a
    per-node kthread which in turn makes the per-CPU thread SCHED_FIFO and
    brings it back on the cpu. For the boosting thread the "make it FIFO"
    bit is missing and it just runs some magic boost checks. Now this is a
    lot of code with extra threads and complexity.

    It's way simpler to let the tasks, when they detect overload, schedule
    away for 2 ticks and defer the normal wakeup as long as they are in
    yielded state and the cpu is not idle.

    That solves the same problem and the only difference is that when the
    cpu goes idle it's not guaranteed that the thread returns right away,
    but it won't be longer out than two ticks, so no harm is done. If
    that's an issue then it is way simpler just to wake the task from
    idle, as RCU has callbacks there anyway.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Srivatsa S. Bhat
    Cc: Rusty Russell
    Cc: Namhyung Kim
    Reviewed-by: Paul E. McKenney
    Link: http://lkml.kernel.org/r/20120716103948.131256723@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

06 Jul, 2012

2 commits

  • The Linux kernel coding style says that single-statement blocks should
    omit curly braces unless the other leg of the "if" statement has
    multiple statements, in which case the curly braces should be included.
    This commit fixes RCU's violations of this rule.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • …a' and 'fnh.2012.07.02a' into HEAD

    bigrtm: First steps towards getting RCU out of the way of
    tens-of-microseconds real-time response on systems compiled
    with NR_CPUS=4096. Also cleanups for and increased concurrency
    of rcu_barrier() family of primitives.
    doctorture: rcutorture and documentation improvements.
    fixes: Miscellaneous fixes.
    fnh: RCU_FAST_NO_HZ fixes and improvements.

    Paul E. McKenney
     

03 Jul, 2012

11 commits

  • If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to
    also disable itself. This commit therefore checks for tick_nohz_enabled
    being zero, disabling rcu_prepare_for_idle() if so. This commit assumes
    that tick_nohz_enabled can change at runtime: If this is not the case,
    then a simpler approach suffices.
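
    In outline (a sketch; ACCESS_ONCE() is used because the flag can
    change at runtime):

        static void rcu_prepare_for_idle(int cpu)
        {
                /* If nohz is disabled, the tick keeps running, so there
                 * is no point in hurrying the callbacks along. */
                if (!ACCESS_ONCE(tick_nohz_enabled))
                        return;
                /* ... otherwise, try to get callbacks out of the way ... */
        }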

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, if several CPUs in the same package have all lazy RCU
    callbacks, their wakeups will be uncorrelated. If all the CPUs are in the
    same power domain (as is often the case), this will result in unnecessary
    power-ups of the package. This commit therefore uses round_jiffies()
    to round the timeouts to a second boundary, increasing the odds that
    they can be coalesced with each other or with other timeouts.
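
    For example, a wakeup roughly six seconds out might be posted as
    follows (a sketch; the timer and delay names are assumed):

        /* Round to a second boundary so that wakeups of CPUs in the same
         * power domain can be coalesced. */
        mod_timer(&rdtp->idle_gp_timer,
                  round_jiffies(jiffies + RCU_IDLE_LAZY_GP_DELAY));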

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • An uninitialized string may be displayed at the end of the rcu_preempt
    detected stall info such as

    0: (1 GPs behind) idle=075/140000000000000/0 =8?^D=8?^D
    ^^^^^^^^^^
    if CONFIG_RCU_FAST_NO_HZ is not defined.

    This trivial patch clears the string in this case.

    Signed-off-by: Carsten Emde
    Signed-off-by: Paul E. McKenney

    Carsten Emde
     
  • The CONFIG_TREE_PREEMPT_RCU and CONFIG_TINY_PREEMPT_RCU versions of
    __rcu_read_lock() and __rcu_read_unlock() are identical, so this commit
    consolidates them into kernel/rcupdate.h.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The arrival of TREE_PREEMPT_RCU some years back included some ugly
    code involving either #ifdef or #ifdef'ed wrapper functions to iterate
    over all non-SRCU flavors of RCU. This commit therefore introduces
    a for_each_rcu_flavor() iterator over the rcu_state structures for each
    flavor of RCU to clean up a bit of the ugliness.
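
    The iterator itself can be a simple list walk over the flavors'
    rcu_state structures (a sketch; the list and field names are
    assumptions):

        /* Each flavor's rcu_state structure is chained onto one list. */
        #define for_each_rcu_flavor(rsp) \
                list_for_each_entry((rsp), &rcu_struct_flavors, flavor)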

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • With the advent of __this_cpu_ptr(), it is no longer necessary to pass
    both the rcu_state and rcu_data structures into __rcu_process_callbacks().
    This commit therefore computes the rcu_data pointer from the rcu_state
    pointer within __rcu_process_callbacks() so that callers can pass in
    only the pointer to the rcu_state structure. This paves the way for
    linking the rcu_state structures together and iterating over them.
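
    The change might look as follows (a sketch):

        static void __rcu_process_callbacks(struct rcu_state *rsp)
        {
                /* Derive the per-CPU rcu_data structure from the rcu_state
                 * pointer rather than making the caller pass both. */
                struct rcu_data *rdp = __this_cpu_ptr(rsp->rda);

                /* ... process callbacks using rsp and rdp ... */
        }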

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This is a preparatory commit for increasing rcu_barrier()'s concurrency.
    It adds a pointer in the rcu_data structure to the corresponding call_rcu()
    function. This allows a pointer to the rcu_data structure to imply the
    function pointer, which allows _rcu_barrier() state to be placed in the
    rcu_state structure.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The rcu_node tree array is sized based on compile-time constants,
    including NR_CPUS. Although this approach has worked well in the past,
    the recent trend by many distros to define NR_CPUS=4096 results in
    excessive grace-period-initialization latencies.

    This commit therefore substitutes the run-time computed nr_cpu_ids for
    the compile-time NR_CPUS when building the tree. This can result in
    much of the compile-time-allocated rcu_node array being unused. If
    this is a major problem, you are in a specialized situation anyway,
    so you can manually adjust the NR_CPUS, RCU_FANOUT, and RCU_FANOUT_LEAF
    kernel config parameters.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Time to make the four-level-hierarchy setting less scary, so this
    commit removes "Experimental" from the boot-time message. Leave the
    message in order to get a heads-up on any possible need to expand to
    a five-level hierarchy.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Although making RCU_FANOUT_LEAF a kernel configuration parameter rather
    than a fixed constant makes it easier for people to decrease cache-miss
    overhead for large systems, it is of little help for people who must
    run a single pre-built kernel binary.

    This commit therefore allows the value of RCU_FANOUT_LEAF to be
    increased (but not decreased!) via a boot-time parameter named
    rcutree.rcu_fanout_leaf.
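
    A module_param() declaration in rcutree's namespace provides the
    boot-line plumbing (a sketch; the sanity checking against the
    compile-time value is elided):

        static int rcu_fanout_leaf = CONFIG_RCU_FANOUT_LEAF;
        module_param(rcu_fanout_leaf, int, 0444); /* rcutree.rcu_fanout_leaf=N */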

    Reported-by: Mike Galbraith
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This reverts commit 616c310e83b872024271c915c1b9ab505b9efad9.
    (Move PREEMPT_RCU preemption to switch_to() invocation).
    Testing by Sasha Levin showed that this
    can result in deadlock due to invoking the scheduler when one of
    the runqueue locks is held. Because this commit was simply a
    performance optimization, revert it.

    Reported-by: Sasha Levin
    Signed-off-by: Paul E. McKenney
    Tested-by: Sasha Levin

    Paul E. McKenney
     

07 Jun, 2012

3 commits

  • When a CPU is entering dyntick-idle mode, tick_nohz_stop_sched_tick()
    calls rcu_needs_cpu() to see if RCU needs that CPU, and, if not, computes the
    next wakeup time based on the timer wheels. Only later, when actually
    entering the idle loop, rcu_prepare_for_idle() will be invoked. In some
    cases, rcu_prepare_for_idle() will post timers to wake the CPU back up.
    But all for naught: The next wakeup time for the CPU has already been
    computed, and posting a timer afterwards does not force that wakeup
    time to be recomputed. This means that the timers posted by
    rcu_prepare_for_idle() have no effect.

    This is not a problem on a busy system because something else will wake
    up the CPU soon enough. However, on lightly loaded systems, the CPU
    might stay asleep for a considerable length of time. If that CPU has
    a callback that the rest of the system is waiting on, the system might
    run very slowly or (in theory) even hang.

    This commit avoids this problem by having rcu_needs_cpu() give
    tick_nohz_stop_sched_tick() an estimate of when RCU will need the CPU
    to wake back up, which tick_nohz_stop_sched_tick() takes into account
    when programming the CPU's wakeup time. An alternative approach is
    for rcu_prepare_for_idle() to use hrtimers instead of normal timers,
    but timers are much more efficient than are hrtimers for frequently
    and repeatedly posting and cancelling a given timer, which is exactly
    what RCU_FAST_NO_HZ does.
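
    From tick_nohz_stop_sched_tick()'s point of view, the handshake might
    look as follows (a sketch with assumed variable names):

        unsigned long rcu_delta_jiffies, next_jiffies;
        unsigned long last_jiffies = jiffies;

        if (rcu_needs_cpu(cpu, &rcu_delta_jiffies)) {
                /* RCU needs this CPU right now: keep the tick running. */
                next_jiffies = last_jiffies + 1;
        } else {
                /* RCU can wait, but only so long: bound the sleep by the
                 * estimate that RCU handed back. */
                next_jiffies = get_next_timer_interrupt(last_jiffies);
                if (next_jiffies - last_jiffies > rcu_delta_jiffies)
                        next_jiffies = last_jiffies + rcu_delta_jiffies;
        }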

    Reported-by: Pascal Chapperon
    Reported-by: Heiko Carstens
    Signed-off-by: Paul E. McKenney
    Tested-by: Heiko Carstens
    Tested-by: Pascal Chapperon

    Paul E. McKenney
     
  • The RCU_FAST_NO_HZ code relies on a number of per-CPU variables.
    This works, but is hidden from someone scanning the data structures
    in rcutree.h. This commit therefore converts these per-CPU variables
    to fields in the per-CPU rcu_dynticks structures.

    Suggested-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    Tested-by: Heiko Carstens
    Tested-by: Pascal Chapperon

    Paul E. McKenney
     
  • In the current code, a short dyntick-idle interval (where there is
    at least one non-lazy callback on the CPU) and a long dyntick-idle
    interval (where there are only lazy callbacks on the CPU) are traced
    identically, which can be less than helpful. This commit therefore
    emits different event traces in these two cases.

    Signed-off-by: Paul E. McKenney
    Tested-by: Heiko Carstens
    Tested-by: Pascal Chapperon

    Paul E. McKenney
     

12 May, 2012

1 commit

  • …and 'srcu.2012.05.07b' into HEAD

    barrier: Reduce the amount of disturbance by rcu_barrier() to the rest of
    the system. This branch also includes improvements to
    RCU_FAST_NO_HZ, which are included here due to conflicts.
    fixes: Miscellaneous fixes.
    inline: Remaining changes from an abortive attempt to inline
    preemptible RCU's __rcu_read_lock(). These are (1) making
    exit_rcu() avoid unnecessary work and (2) avoiding having
    preemptible RCU record a blocked thread when the scheduler
    declines to do a context switch.
    srcu: Lai Jiangshan's algorithmic implementation of SRCU, including
    call_srcu().

    Paul E. McKenney
     

10 May, 2012

2 commits

  • The current initialization of the RCU_FAST_NO_HZ per-CPU variables makes
    needless and fragile assumptions about the initial value of things like
    the jiffies counter. This commit therefore explicitly initializes those
    variables that are better started with a non-zero value. It also adds some
    comments describing the per-CPU state variables.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The current RCU_FAST_NO_HZ assumes that timers do not migrate unless a
    CPU goes offline, in which case it assumes that the CPU will have to come
    out of dyntick-idle mode (cancelling the timer) in order to go offline.
    This is important because when RCU_FAST_NO_HZ permits a CPU to enter
    dyntick-idle mode despite having RCU callbacks pending, it posts a timer
    on that CPU to force a wakeup on that CPU. This wakeup ensures that the
    CPU will eventually handle the end of the grace period, including invoking
    its RCU callbacks.

    However, Pascal Chapperon's test setup shows that the timer handler
    rcu_idle_gp_timer_func() really does get invoked in some cases. This is
    problematic because this can cause the CPU that entered dyntick-idle
    mode despite still having RCU callbacks pending to remain in
    dyntick-idle mode indefinitely, which means that its RCU callbacks might
    never be invoked. This situation can result in grace-period delays or
    even system hangs, which matches Pascal's observations of slow boot-up
    and shutdown (https://lkml.org/lkml/2012/4/5/142). See also the bugzilla:

    https://bugzilla.redhat.com/show_bug.cgi?id=806548

    This commit therefore causes the "should never be invoked" timer handler
    rcu_idle_gp_timer_func() to use smp_call_function_single() to wake up
    the CPU for which the timer was intended, allowing that CPU to invoke
    its RCU callbacks in a timely manner.
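
    The handler's job thus reduces to kicking the CPU the timer was meant
    for (a sketch; rcu_idle_demigrate is an assumed empty IPI handler,
    since the IPI itself brings the target CPU out of dyntick-idle):

        static void rcu_idle_demigrate(void *unused)
        {
                /* The wakeup is a side effect of the IPI itself. */
        }

        static void rcu_idle_gp_timer_func(unsigned long cpu_in)
        {
                int cpu = (int)cpu_in;

                /* Wake the intended CPU, wherever this timer fired. */
                smp_call_function_single(cpu, rcu_idle_demigrate, NULL, 0);
        }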

    Reported-by: Pascal Chapperon
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

03 May, 2012

2 commits

  • When running preemptible RCU, if a task exits in an RCU read-side
    critical section having blocked within that same RCU read-side critical
    section, the task must be removed from the list of tasks blocking a
    grace period (perhaps the current grace period, perhaps the next grace
    period, depending on timing). The exit() path invokes exit_rcu() to
    do this cleanup.

    However, the current implementation of exit_rcu() needlessly does the
    cleanup even if the task did not block within the current RCU read-side
    critical section, which wastes time and needlessly increases the size
    of the state space. Fix this by only doing the cleanup if the current
    task is actually on the list of tasks blocking some grace period.

    While we are at it, consolidate the two identical exit_rcu() functions
    into a single function.

    Signed-off-by: Paul E. McKenney
    Tested-by: Linus Torvalds

    Conflicts:

    kernel/rcupdate.c

    Paul E. McKenney
     
  • Currently, PREEMPT_RCU readers are enqueued upon entry to the scheduler.
    This is inefficient because enqueuing is required only if there is a
    context switch, and entry to the scheduler does not guarantee a context
    switch.

    This commit therefore moves the enqueuing to immediately precede the
    call to switch_to() from the scheduler.

    Signed-off-by: Paul E. McKenney
    Tested-by: Linus Torvalds

    Paul E. McKenney
     

01 May, 2012

1 commit

  • Timers are subject to migration, which can lead to the following
    system-hang scenario when CONFIG_RCU_FAST_NO_HZ=y:

    1. CPU 0 executes synchronize_rcu(), which posts an RCU callback.

    2. CPU 0 then goes idle. It cannot immediately invoke the callback,
    but there is nothing RCU needs from it, so it enters dyntick-idle
    mode after posting a timer.

    3. The timer gets migrated to CPU 1.

    4. CPU 0 never wakes up, so the synchronize_rcu() never returns, so
    the system hangs.

    This commit fixes this problem by using mod_timer_pinned(), as suggested
    by Peter Zijlstra, to ensure that the timer is actually posted on the
    running CPU.
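
    The posting then becomes (a sketch; the per-CPU timer name and the
    six-second delay follow the surrounding text):

        /* Pin the wakeup timer to this CPU so migration cannot strand it. */
        mod_timer_pinned(&rdtp->idle_gp_timer, jiffies + 6 * HZ);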

    Reported-by: Dipankar Sarma
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

26 Apr, 2012

1 commit

  • RCU_FAST_NO_HZ uses a timer to limit the time that a CPU with callbacks
    can remain in dyntick-idle mode. This timer is cancelled when the CPU
    exits idle, and therefore should never fire. However, if the timer
    were migrated to some other CPU for whatever reason (1) the timer could
    actually fire and (2) firing on some other CPU would fail to wake up the
    CPU with callbacks, possibly resulting in sluggishness or a system hang.

    This commit therefore adds a WARN_ON_ONCE() to the timer handler in order
    to detect this condition.
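
    A sketch of such a handler:

        static void rcu_idle_gp_timer_func(unsigned long unused)
        {
                /* Cancelled on idle exit, so this should never fire; if
                 * it does, the timer was most likely migrated here. */
                WARN_ON_ONCE(1);
        }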

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

25 Apr, 2012

1 commit

  • Both Steven Rostedt's new idle-capable trace macros and the RCU_NONIDLE()
    macro can cause RCU to momentarily pause out of idle without the rest
    of the system being involved. This can cause rcu_prepare_for_idle()
    to run through its state machine too quickly, which can in turn result
    in needless scheduling-clock interrupts.

    This commit therefore adds code to enable rcu_prepare_for_idle() to
    distinguish between an initial entry to idle on the one hand (which needs
    to advance the rcu_prepare_for_idle() state machine) and an idle reentry
    due to idle-capable trace macros and RCU_NONIDLE() on the other hand
    (which should avoid advancing the rcu_prepare_for_idle() state machine).
    Additional state is maintained to allow the timer to be correctly reposted
    when returning after a momentary pause out of idle, and even more state
    is maintained to detect when new non-lazy callbacks have been enqueued
    (which may require re-evaluation of the approach to idleness).

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney