26 Sep, 2012

2 commits

  • The current implementation of RCU_FAST_NO_HZ tries reasonably hard to rid
    the current CPU of RCU callbacks. This is appropriate when the CPU is
    entering idle, where it doesn't have much else to do anyway, but is most
    definitely not what you want when transitioning to user-mode execution.
    This commit therefore detects the adaptive-tick case, and refrains from
    burning CPU time getting rid of RCU callbacks in that case.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The conflicts between kernel/rcutree.h and kernel/rcutree_plugin.h
    were due to adjacent insertions and deletions, which were resolved
    by simply accepting the changes on both branches.

    Paul E. McKenney
     

25 Sep, 2012

1 commit

  • …', 'hotplug.2012.09.23a' and 'idlechop.2012.09.23a' into HEAD

    bigrt.2012.09.23a contains additional commits to reduce scheduling latency
    from RCU on huge systems (many hundreds or thousands of CPUs).

    doctorture.2012.09.23a contains documentation changes and rcutorture fixes.

    fixes.2012.09.23a contains miscellaneous fixes.

    hotplug.2012.09.23a contains CPU-hotplug-related changes.

    idle.2012.09.23a fixes architectures for which RCU no longer considered
    the idle loop to be a quiescent state due to earlier
    adaptive-dynticks changes. Affected architectures are alpha,
    cris, frv, h8300, m32r, m68k, mn10300, parisc, score, xtensa,
    and ia64.

    Paul E. McKenney
     

23 Sep, 2012

11 commits

  • The print_cpu_stall_fast_no_hz() function attempts to print -1 when
    the ->idle_gp_timer is not pending, but unsigned arithmetic causes it
    to instead print ULONG_MAX, which is 4294967295 on 32-bit systems and
    18446744073709551615 on 64-bit systems. Neither of these is the most
    reader-friendly values, so this commit instead causes "timer not pending"
    to be printed when ->idle_gp_timer is not pending.
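
    For illustration, here is a minimal user-space sketch (with a
    hypothetical variable name) of the arithmetic at issue: assigning -1
    to an unsigned long and printing it yields ULONG_MAX.

        #include <stdio.h>
        #include <limits.h>

        int main(void)
        {
                unsigned long timer_expiry_delta = -1; /* "not pending" sentinel */

                /* Unsigned arithmetic wraps, so -1 becomes ULONG_MAX: */
                printf("%lu\n", timer_expiry_delta);
                printf("%d\n", timer_expiry_delta == ULONG_MAX); /* prints 1 */
                return 0;
        }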

    Reported-by: Paul Walmsley
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_print_detail_task_stall_rnp() function invokes
    rcu_preempt_blocked_readers_cgp(), outside of the protection of the
    rcu_node structure's ->lock, to verify that some preempted RCU readers
    are blocking the current grace period. This means that the last blocked
    reader might exit its RCU read-side critical section and remove itself
    from the ->blkd_tasks list before the ->lock is acquired, resulting in
    a segmentation fault when the subsequent code attempts to dereference
    the now-NULL gp_tasks pointer.

    This commit therefore moves the test under the lock. This will not
    have measurable effect on lock contention because this code is invoked
    only when printing RCU CPU stall warnings, in other words, in the common
    case, never.
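
    In outline, the fixed function might look as follows (a sketch: the
    function and field names follow the commit text, and the per-task
    printing is reduced to sched_show_task()):

        static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
        {
                unsigned long flags;
                struct task_struct *t;

                raw_spin_lock_irqsave(&rnp->lock, flags);
                if (!rcu_preempt_blocked_readers_cgp(rnp)) {
                        /* No readers blocking this grace period. */
                        raw_spin_unlock_irqrestore(&rnp->lock, flags);
                        return;
                }
                /* ->gp_tasks is now stable: walk ->blkd_tasks under ->lock. */
                t = list_entry(rnp->gp_tasks, struct task_struct, rcu_node_entry);
                list_for_each_entry_continue(t, &rnp->blkd_tasks, rcu_node_entry)
                        sched_show_task(t);
                raw_spin_unlock_irqrestore(&rnp->lock, flags);
        }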

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The increment_cpu_stall_ticks() function listed each RCU flavor
    explicitly, with an ifdef to handle preemptible RCU. This commit
    therefore applies for_each_rcu_flavor() to save a line of code.

    Because this commit switches from a code-based enumeration of the
    flavors of RCU to an rcu_state-list-based enumeration, it is no longer
    possible to apply __get_cpu_var() to the per-CPU rcu_data structures.
    This commit instead applies __this_cpu_ptr() to the rcu_state
    structure's ->rda field, which references the corresponding rcu_data
    structures.
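
    The resulting function might then reduce to the following (a sketch,
    assuming the ->ticks_this_gp stall-warning counter in the rcu_data
    structure):

        static void increment_cpu_stall_ticks(void)
        {
                struct rcu_state *rsp;

                /* One statement covers rcu_sched, rcu_bh, and (where
                 * configured) rcu_preempt, with no #ifdef required. */
                for_each_rcu_flavor(rsp)
                        __this_cpu_ptr(rsp->rda)->ticks_this_gp++;
        }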

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Commit 1217ed1b (rcu: permit rcu_read_unlock() to be called while holding
    runqueue locks) made rcu_initiate_boost() restore irq state when releasing
    the rcu_node structure's ->lock, but failed to update the header comment
    accordingly. This commit therefore brings the header comment up to date.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The rcu_preempt_offline_tasks() function moves all tasks queued on a given
    leaf rcu_node structure to the root rcu_node structure, which is done when
    the last CPU corresponding to the leaf rcu_node structure goes offline. Now
    that
    RCU-preempt's synchronize_rcu_expedited() implementation blocks CPU-hotplug
    operations during the initialization of each rcu_node structure's
    ->boost_tasks pointer, rcu_preempt_offline_tasks() can do a better job
    of setting the root rcu_node's ->boost_tasks pointer.

    The key point is that rcu_preempt_offline_tasks() runs as part of the
    CPU-hotplug process, so that a concurrent synchronize_rcu_expedited()
    is guaranteed to either have not started on the one hand (in which case
    there is no boosting on behalf of the expedited grace period) or to be
    completely initialized on the other (in which case, in the absence of
    other priority boosting, all ->boost_tasks pointers will be initialized).
    Therefore, if rcu_preempt_offline_tasks() finds that the ->boost_tasks
    pointer is equal to the ->exp_tasks pointer, it can be sure that it is
    correctly placed.

    In the case where there was boosting ongoing at the time that the
    synchronize_rcu_expedited() function started, different nodes might start
    boosting the tasks blocking the expedited grace period at different times.
    In this mixed case, the root node will either be boosting tasks for
    the expedited grace period already, or it will start as soon as it gets
    done boosting for the normal grace period -- but in this latter case,
    the root node's tasks needed to be boosted in any case.

    This commit therefore adds a check of the ->boost_tasks pointer against
    the ->exp_tasks pointer to the list of conditions that prevent updating
    ->boost_tasks.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • When rcu_preempt_offline_tasks() clears tasks from a leaf rcu_node
    structure, it does not NULL out the structure's ->boost_tasks field.
    This commit therefore NULLs out ->boost_tasks at that point.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The current quiescent-state detection algorithm is needlessly
    complex. It records the grace-period number corresponding to
    the quiescent state at the time of the quiescent state, which
    works, but it seems better to simply erase any record of previous
    quiescent states at the time that the CPU notices the new grace
    period. This has the further advantage of removing another piece
    of RCU for which lockless reasoning is required.

    Therefore, this commit makes this change.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The synchronize_rcu_expedited() function disables interrupts across a
    scan of all leaf rcu_node structures, which is not good for real-time
    scheduling latency on large systems (hundreds or especially thousands
    of CPUs). This commit therefore holds off CPU-hotplug operations using
    get_online_cpus(), and removes the prior acquisition of the ->onofflock
    (which required disabling interrupts).
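
    In outline (a sketch; the per-node initialization is elided):

        get_online_cpus();      /* Sleepably exclude CPU-hotplug operations. */
        rcu_for_each_leaf_node(rsp, rnp) {
                raw_spin_lock_irqsave(&rnp->lock, flags);  /* Short, per-node. */
                /* ... set up this node's expedited grace-period state ... */
                raw_spin_unlock_irqrestore(&rnp->lock, flags);
        }
        put_online_cpus();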

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • In the C language, signed integer overflow is undefined. It is true
    that two's-complement arithmetic normally comes to the rescue, but the
    compiler can subvert this any time it has information about the values
    being compared. For example, given "if (a - b > 0)", if the compiler
    has enough information to realize that (for example) the value of "a"
    is positive and that of "b" is negative, the compiler is within its
    rights to optimize this to a simple "if (1)", which might not be what
    you want.
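
    As a user-space illustration (hypothetical names): with unsigned types
    the subtraction wraps modulo 2^N, so the comparison remains
    well-defined even across counter wraparound.

        #include <stdio.h>
        #include <limits.h>

        int main(void)
        {
                unsigned long snap = ULONG_MAX; /* snapshot taken just before wrap */
                unsigned long cur = snap + 2;   /* counter has since wrapped to 1 */

                /* cur - snap wraps to 2, so progress is still detected: */
                if (cur - snap > 0)
                        printf("work done since snapshot\n");
                return 0;
        }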

    This commit therefore converts synchronize_rcu_expedited()'s work-done
    detection counter from signed to unsigned.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • As the first step towards allowing quiescent-state forcing to be
    preemptible, this commit moves RCU quiescent-state forcing into the
    same kthread that is now used to initialize and clean up after grace
    periods. This is yet another step towards keeping scheduling
    latency down to a dull roar.

    Updated to change from raw_spin_lock_irqsave() to raw_spin_lock_irq()
    and to remove the now-unused rcu_state structure fields as suggested by
    Peter Zijlstra.

    Reported-by: Mike Galbraith
    Reported-by: Dimitri Sivanich
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • In kernels built with CONFIG_RCU_FAST_NO_HZ=y, CPUs can accumulate a
    large number of lazy callbacks, which as the name implies will be slow
    to be invoked. This can be a problem on small-memory systems, where the
    default 6-second sleep for CPUs having only lazy RCU callbacks could well
    be fatal. This commit therefore installs an OOM handler that ensures that
    every CPU with lazy callbacks has at least one non-lazy callback, in turn
    ensuring timely advancement for these callbacks.

    Updated to fix a bug that disabled OOM killing, noted by Lai Jiangshan.

    Updated to push the for_each_rcu_flavor() loop into rcu_oom_notify_cpu(),
    thus reducing the number of IPIs, as suggested by Steven Rostedt. Also
    to make the for_each_online_cpu() loop be preemptible. (Later, it might
    be good to use smp_call_function(), as suggested by Peter Zijlstra.)
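
    The overall shape might be as follows (a sketch: the function names
    follow the commit text, but the bodies and hook points are assumed):

        #include <linux/notifier.h>
        #include <linux/oom.h>
        #include <linux/cpu.h>
        #include <linux/smp.h>
        #include <linux/sched.h>
        #include <linux/init.h>

        /* Runs on the target CPU: for each flavor that has only lazy
         * callbacks queued, post one non-lazy callback. */
        static void rcu_oom_notify_cpu(void *unused)
        {
                /* for_each_rcu_flavor(rsp) { ... enqueue non-lazy CB ... } */
        }

        static int rcu_oom_notify(struct notifier_block *self,
                                  unsigned long notused, void *nfreed)
        {
                int cpu;

                get_online_cpus();
                for_each_online_cpu(cpu) {
                        smp_call_function_single(cpu, rcu_oom_notify_cpu,
                                                 NULL, 1);
                        cond_resched();  /* Keep the scan preemptible. */
                }
                put_online_cpus();
                return NOTIFY_OK;
        }

        static struct notifier_block rcu_oom_nb = {
                .notifier_call = rcu_oom_notify,
        };

        static int __init rcu_register_oom_notifier(void)
        {
                register_oom_notifier(&rcu_oom_nb);
                return 0;
        }
        early_initcall(rcu_register_oom_notifier);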

    Signed-off-by: Paul E. McKenney
    Tested-by: Sasha Levin
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

13 Aug, 2012

2 commits

  • Bring RCU into the new-age CPU-hotplug fold by modifying RCU's per-CPU
    kthread code to use the new smp_hotplug_thread facility.

    [ tglx: Adapted it to use callbacks and to the simplified rcu yield ]
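
    With the smp_hotplug_thread facility, per-CPU kthread management
    reduces to filling in a descriptor and registering it once (a sketch;
    the rcuc callback names are assumptions based on the commit's context):

        static struct smp_hotplug_thread rcu_cpu_thread_spec = {
                .store             = &rcu_cpu_kthread_task,
                .thread_should_run = rcu_cpu_kthread_should_run,
                .thread_fn         = rcu_cpu_kthread,
                .thread_comm       = "rcuc/%u",
                .setup             = rcu_cpu_kthread_setup,
                .park              = rcu_cpu_kthread_park,
        };

        static int __init rcu_spawn_kthreads(void)
        {
                /* One call replaces the old open-coded hotplug notifiers. */
                return smpboot_register_percpu_thread(&rcu_cpu_thread_spec);
        }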

    Signed-off-by: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Srivatsa S. Bhat
    Cc: Rusty Russell
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/20120716103948.673354828@linutronix.de
    Signed-off-by: Thomas Gleixner

    Paul E. McKenney
     
  • The rcu_yield() code is amazing. It's there to avoid starvation of the
    system when lots of (boosting) work is to be done.

    Now looking at the code, its functionality is:

    Make the thread SCHED_OTHER and very nice, i.e. get it out of the way
    Arm a timer with 2 ticks
    schedule()

    Now if the system goes idle the rcu task returns, regains SCHED_FIFO
    and plugs on. If the system stays busy the timer fires and wakes a
    per-node kthread which in turn makes the per-CPU thread SCHED_FIFO and
    brings it back on the cpu. For the boosting thread the "make it FIFO"
    bit is missing and it just runs some magic boost checks. Now this is a
    lot of code with extra threads and complexity.

    It's way simpler to let the tasks, when they detect overload, schedule
    away for 2 ticks and defer the normal wakeup as long as they are in
    yielded state and the cpu is not idle.

    That solves the same problem and the only difference is that when the
    cpu goes idle it's not guaranteed that the thread returns right away,
    but it won't be longer out than two ticks, so no harm is done. If
    that's an issue then it is way simpler just to wake the task from
    idle, as RCU has callbacks there anyway.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Srivatsa S. Bhat
    Cc: Rusty Russell
    Cc: Namhyung Kim
    Reviewed-by: Paul E. McKenney
    Link: http://lkml.kernel.org/r/20120716103948.131256723@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

06 Jul, 2012

2 commits

  • The Linux kernel coding style says that single-statement blocks should
    omit curly braces unless the other leg of the "if" statement has
    multiple statements, in which case the curly braces should be included.
    This commit fixes RCU's violations of this rule.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • …a' and 'fnh.2012.07.02a' into HEAD

    bigrtm: First steps towards getting RCU out of the way of
    tens-of-microseconds real-time response on systems compiled
    with NR_CPUS=4096. Also cleanups for and increased concurrency
    of rcu_barrier() family of primitives.
    doctorture: rcutorture and documentation improvements.
    fixes: Miscellaneous fixes.
    fnh: RCU_FAST_NO_HZ fixes and improvements.

    Paul E. McKenney
     

03 Jul, 2012

11 commits

  • If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to
    also disable itself. This commit therefore checks for tick_nohz_enabled
    being zero, disabling rcu_prepare_for_idle() if so. This commit assumes
    that tick_nohz_enabled can change at runtime: If this is not the case,
    then a simpler approach suffices.
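
    In outline (a sketch; ACCESS_ONCE() is used because the flag can
    change at runtime):

        static void rcu_prepare_for_idle(int cpu)
        {
                /* If nohz is disabled, the tick keeps running, so there
                 * is no point in hurrying the callbacks along. */
                if (!ACCESS_ONCE(tick_nohz_enabled))
                        return;
                /* ... otherwise, try to get callbacks out of the way ... */
        }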

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, if several CPUs in the same package have all lazy RCU
    callbacks, their wakeups will be uncorrelated. If all the CPUs are in the
    same power domain (as is often the case), this will result in unnecessary
    power-ups of the package. This commit therefore uses round_jiffies()
    to round the timeouts to a second boundary, increasing the odds that
    they can be coalesced with each other or with other timeouts.
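
    For example, a wakeup roughly six seconds out might be posted as
    follows (a sketch; the timer and delay names are assumed):

        /* Round to a second boundary so that wakeups of CPUs in the same
         * power domain can be coalesced. */
        mod_timer(&rdtp->idle_gp_timer,
                  round_jiffies(jiffies + RCU_IDLE_LAZY_GP_DELAY));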

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • An uninitialized string may be displayed at the end of the rcu_preempt
    detected stall info such as

    0: (1 GPs behind) idle=075/140000000000000/0 =8?^D=8?^D
    ^^^^^^^^^^
    if CONFIG_RCU_FAST_NO_HZ is not defined.

    This trivial patch clears the string in this case.

    Signed-off-by: Carsten Emde
    Signed-off-by: Paul E. McKenney

    Carsten Emde
     
  • The CONFIG_TREE_PREEMPT_RCU and CONFIG_TINY_PREEMPT_RCU versions of
    __rcu_read_lock() and __rcu_read_unlock() are identical, so this commit
    consolidates them into kernel/rcupdate.h.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The arrival of TREE_PREEMPT_RCU some years back included some ugly
    code involving either #ifdef or #ifdef'ed wrapper functions to iterate
    over all non-SRCU flavors of RCU. This commit therefore introduces
    a for_each_rcu_flavor() iterator over the rcu_state structures for each
    flavor of RCU to clean up a bit of the ugliness.
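
    The iterator itself can be a simple list walk over the flavors'
    rcu_state structures (a sketch; the list and field names are
    assumptions):

        /* Each flavor's rcu_state structure is chained onto one list. */
        #define for_each_rcu_flavor(rsp) \
                list_for_each_entry((rsp), &rcu_struct_flavors, flavor)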

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • With the advent of __this_cpu_ptr(), it is no longer necessary to pass
    both the rcu_state and rcu_data structures into __rcu_process_callbacks().
    This commit therefore computes the rcu_data pointer from the rcu_state
    pointer within __rcu_process_callbacks() so that callers can pass in
    only the pointer to the rcu_state structure. This paves the way for
    linking the rcu_state structures together and iterating over them.
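
    The change might look as follows (a sketch):

        static void __rcu_process_callbacks(struct rcu_state *rsp)
        {
                /* Derive the per-CPU rcu_data structure from the rcu_state
                 * pointer rather than making the caller pass both. */
                struct rcu_data *rdp = __this_cpu_ptr(rsp->rda);

                /* ... process callbacks using rsp and rdp ... */
        }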

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This is a preparatory commit for increasing rcu_barrier()'s concurrency.
    It adds a pointer in the rcu_data structure to the corresponding call_rcu()
    function. This allows a pointer to the rcu_data structure to imply the
    function pointer, which allows _rcu_barrier() state to be placed in the
    rcu_state structure.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The rcu_node tree array is sized based on compile-time constants,
    including NR_CPUS. Although this approach has worked well in the past,
    the recent trend by many distros to define NR_CPUS=4096 results in
    excessive grace-period-initialization latencies.

    This commit therefore substitutes the run-time computed nr_cpu_ids for
    the compile-time NR_CPUS when building the tree. This can result in
    much of the compile-time-allocated rcu_node array being unused. If
    this is a major problem, you are in a specialized situation anyway,
    so you can manually adjust the NR_CPUS, RCU_FANOUT, and RCU_FANOUT_LEAF
    kernel config parameters.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Time to make the four-level-hierarchy setting less scary, so this
    commit removes "Experimental" from the boot-time message. Leave the
    message in order to get a heads-up on any possible need to expand to
    a five-level hierarchy.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Although making RCU_FANOUT_LEAF a kernel configuration parameter rather
    than a fixed constant makes it easier for people to decrease cache-miss
    overhead for large systems, it is of little help for people who must
    run a single pre-built kernel binary.

    This commit therefore allows the value of RCU_FANOUT_LEAF to be
    increased (but not decreased!) via a boot-time parameter named
    rcutree.rcu_fanout_leaf.
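
    A module_param() declaration in rcutree's namespace provides the
    boot-line plumbing (a sketch; the sanity checking against the
    compile-time value is elided):

        static int rcu_fanout_leaf = CONFIG_RCU_FANOUT_LEAF;
        module_param(rcu_fanout_leaf, int, 0444); /* rcutree.rcu_fanout_leaf=N */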

    Reported-by: Mike Galbraith
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This reverts commit 616c310e83b872024271c915c1b9ab505b9efad9.
    (Move PREEMPT_RCU preemption to switch_to() invocation).
    Testing by Sasha Levin showed that this
    can result in deadlock due to invoking the scheduler when one of
    the runqueue locks is held. Because this commit was simply a
    performance optimization, revert it.

    Reported-by: Sasha Levin
    Signed-off-by: Paul E. McKenney
    Tested-by: Sasha Levin

    Paul E. McKenney
     

07 Jun, 2012

3 commits

  • When a CPU is entering dyntick-idle mode, tick_nohz_stop_sched_tick()
    calls rcu_needs_cpu() to see if RCU needs that CPU, and, if not, computes the
    next wakeup time based on the timer wheels. Only later, when actually
    entering the idle loop, rcu_prepare_for_idle() will be invoked. In some
    cases, rcu_prepare_for_idle() will post timers to wake the CPU back up.
    But all for naught: The next wakeup time for the CPU has already been
    computed, and posting a timer afterwards does not force that wakeup
    time to be recomputed. This means that the timers posted by
    rcu_prepare_for_idle() have no effect.

    This is not a problem on a busy system because something else will wake
    up the CPU soon enough. However, on lightly loaded systems, the CPU
    might stay asleep for a considerable length of time. If that CPU has
    a callback that the rest of the system is waiting on, the system might
    run very slowly or (in theory) even hang.

    This commit avoids this problem by having rcu_needs_cpu() give
    tick_nohz_stop_sched_tick() an estimate of when RCU will need the CPU
    to wake back up, which tick_nohz_stop_sched_tick() takes into account
    when programming the CPU's wakeup time. An alternative approach is
    for rcu_prepare_for_idle() to use hrtimers instead of normal timers,
    but timers are much more efficient than are hrtimers for frequently
    and repeatedly posting and cancelling a given timer, which is exactly
    what RCU_FAST_NO_HZ does.
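
    From tick_nohz_stop_sched_tick()'s point of view, the handshake might
    look as follows (a sketch with assumed variable names):

        unsigned long rcu_delta_jiffies, next_jiffies;
        unsigned long last_jiffies = jiffies;

        if (rcu_needs_cpu(cpu, &rcu_delta_jiffies)) {
                /* RCU needs this CPU right now: keep the tick running. */
                next_jiffies = last_jiffies + 1;
        } else {
                /* RCU can wait, but only so long: bound the sleep by the
                 * estimate that RCU handed back. */
                next_jiffies = get_next_timer_interrupt(last_jiffies);
                if (next_jiffies - last_jiffies > rcu_delta_jiffies)
                        next_jiffies = last_jiffies + rcu_delta_jiffies;
        }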

    Reported-by: Pascal Chapperon
    Reported-by: Heiko Carstens
    Signed-off-by: Paul E. McKenney
    Tested-by: Heiko Carstens
    Tested-by: Pascal Chapperon

    Paul E. McKenney
     
  • The RCU_FAST_NO_HZ code relies on a number of per-CPU variables.
    This works, but is hidden from someone scanning the data structures
    in rcutree.h. This commit therefore converts these per-CPU variables
    to fields in the per-CPU rcu_dynticks structures.

    Suggested-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    Tested-by: Heiko Carstens
    Tested-by: Pascal Chapperon

    Paul E. McKenney
     
  • In the current code, a short dyntick-idle interval (where there is
    at least one non-lazy callback on the CPU) and a long dyntick-idle
    interval (where there are only lazy callbacks on the CPU) are traced
    identically, which can be less than helpful. This commit therefore
    emits different event traces in these two cases.

    Signed-off-by: Paul E. McKenney
    Tested-by: Heiko Carstens
    Tested-by: Pascal Chapperon

    Paul E. McKenney
     

12 May, 2012

1 commit

  • …and 'srcu.2012.05.07b' into HEAD

    barrier: Reduce the amount of disturbance by rcu_barrier() to the rest of
    the system. This branch also includes improvements to
    RCU_FAST_NO_HZ, which are included here due to conflicts.
    fixes: Miscellaneous fixes.
    inline: Remaining changes from an abortive attempt to inline
    preemptible RCU's __rcu_read_lock(). These are (1) making
    exit_rcu() avoid unnecessary work and (2) avoiding having
    preemptible RCU record a blocked thread when the scheduler
    declines to do a context switch.
    srcu: Lai Jiangshan's algorithmic implementation of SRCU, including
    call_srcu().

    Paul E. McKenney
     

10 May, 2012

2 commits

  • The current initialization of the RCU_FAST_NO_HZ per-CPU variables makes
    needless and fragile assumptions about the initial value of things like
    the jiffies counter. This commit therefore explicitly initializes those
    variables that are better started with a non-zero value. It also adds some
    comments describing the per-CPU state variables.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The current RCU_FAST_NO_HZ assumes that timers do not migrate unless a
    CPU goes offline, in which case it assumes that the CPU will have to come
    out of dyntick-idle mode (cancelling the timer) in order to go offline.
    This is important because when RCU_FAST_NO_HZ permits a CPU to enter
    dyntick-idle mode despite having RCU callbacks pending, it posts a timer
    on that CPU to force a wakeup on that CPU. This wakeup ensures that the
    CPU will eventually handle the end of the grace period, including invoking
    its RCU callbacks.

    However, Pascal Chapperon's test setup shows that the timer handler
    rcu_idle_gp_timer_func() really does get invoked in some cases. This is
    problematic because this can cause the CPU that entered dyntick-idle
    mode despite still having RCU callbacks pending to remain in
    dyntick-idle mode indefinitely, which means that its RCU callbacks might
    never be invoked. This situation can result in grace-period delays or
    even system hangs, which matches Pascal's observations of slow boot-up
    and shutdown (https://lkml.org/lkml/2012/4/5/142). See also the bugzilla:

    https://bugzilla.redhat.com/show_bug.cgi?id=806548

    This commit therefore causes the "should never be invoked" timer handler
    rcu_idle_gp_timer_func() to use smp_call_function_single() to wake up
    the CPU for which the timer was intended, allowing that CPU to invoke
    its RCU callbacks in a timely manner.
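
    The handler's job thus reduces to kicking the CPU the timer was meant
    for (a sketch; rcu_idle_demigrate is an assumed empty IPI handler,
    since the IPI itself brings the target CPU out of dyntick-idle):

        static void rcu_idle_demigrate(void *unused)
        {
                /* The wakeup is a side effect of the IPI itself. */
        }

        static void rcu_idle_gp_timer_func(unsigned long cpu_in)
        {
                int cpu = (int)cpu_in;

                /* Wake the intended CPU, wherever this timer fired. */
                smp_call_function_single(cpu, rcu_idle_demigrate, NULL, 0);
        }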

    Reported-by: Pascal Chapperon
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

03 May, 2012

2 commits

  • When running preemptible RCU, if a task exits in an RCU read-side
    critical section having blocked within that same RCU read-side critical
    section, the task must be removed from the list of tasks blocking a
    grace period (perhaps the current grace period, perhaps the next grace
    period, depending on timing). The exit() path invokes exit_rcu() to
    do this cleanup.

    However, the current implementation of exit_rcu() needlessly does the
    cleanup even if the task did not block within the current RCU read-side
    critical section, which wastes time and needlessly increases the size
    of the state space. Fix this by only doing the cleanup if the current
    task is actually on the list of tasks blocking some grace period.

    While we are at it, consolidate the two identical exit_rcu() functions
    into a single function.

    Signed-off-by: Paul E. McKenney
    Tested-by: Linus Torvalds

    Conflicts:

    kernel/rcupdate.c

    Paul E. McKenney
     
  • Currently, PREEMPT_RCU readers are enqueued upon entry to the scheduler.
    This is inefficient because enqueuing is required only if there is a
    context switch, and entry to the scheduler does not guarantee a context
    switch.

    This commit therefore moves the enqueuing to immediately precede the
    call to switch_to() from the scheduler.

    Signed-off-by: Paul E. McKenney
    Tested-by: Linus Torvalds

    Paul E. McKenney
     

01 May, 2012

1 commit

  • Timers are subject to migration, which can lead to the following
    system-hang scenario when CONFIG_RCU_FAST_NO_HZ=y:

    1. CPU 0 executes synchronize_rcu(), which posts an RCU callback.

    2. CPU 0 then goes idle. It cannot immediately invoke the callback,
    but there is nothing RCU needs from it, so it enters dyntick-idle
    mode after posting a timer.

    3. The timer gets migrated to CPU 1.

    4. CPU 0 never wakes up, so the synchronize_rcu() never returns, so
    the system hangs.

    This commit fixes this problem by using mod_timer_pinned(), as suggested
    by Peter Zijlstra, to ensure that the timer is actually posted on the
    running CPU.
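
    The posting then becomes (a sketch; the per-CPU timer name and the
    six-second delay follow the surrounding text):

        /* Pin the wakeup timer to this CPU so migration cannot strand it. */
        mod_timer_pinned(&rdtp->idle_gp_timer, jiffies + 6 * HZ);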

    Reported-by: Dipankar Sarma
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

26 Apr, 2012

1 commit

  • RCU_FAST_NO_HZ uses a timer to limit the time that a CPU with callbacks
    can remain in dyntick-idle mode. This timer is cancelled when the CPU
    exits idle, and therefore should never fire. However, if the timer
    were migrated to some other CPU for whatever reason (1) the timer could
    actually fire and (2) firing on some other CPU would fail to wake up the
    CPU with callbacks, possibly resulting in sluggishness or a system hang.

    This commit therefore adds a WARN_ON_ONCE() to the timer handler in order
    to detect this condition.
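
    A sketch of such a handler:

        static void rcu_idle_gp_timer_func(unsigned long unused)
        {
                /* Cancelled on idle exit, so this should never fire; if
                 * it does, the timer was most likely migrated here. */
                WARN_ON_ONCE(1);
        }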

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

25 Apr, 2012

1 commit

  • Both Steven Rostedt's new idle-capable trace macros and the RCU_NONIDLE()
    macro can cause RCU to momentarily pause out of idle without the rest
    of the system being involved. This can cause rcu_prepare_for_idle()
    to run through its state machine too quickly, which can in turn result
    in needless scheduling-clock interrupts.

    This commit therefore adds code to enable rcu_prepare_for_idle() to
    distinguish between an initial entry to idle on the one hand (which needs
    to advance the rcu_prepare_for_idle() state machine) and an idle reentry
    due to idle-capable trace macros and RCU_NONIDLE() on the other hand
    (which should avoid advancing the rcu_prepare_for_idle() state machine).
    Additional state is maintained to allow the timer to be correctly reposted
    when returning after a momentary pause out of idle, and even more state
    is maintained to detect when new non-lazy callbacks have been enqueued
    (which may require re-evaluation of the approach to idleness).

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney