26 Mar, 2013

1 commit

  • Because RCU callbacks are now associated with the number of the grace
    period that they must wait for, CPUs can now advance callbacks
    corresponding to grace periods that ended while a given CPU was in
    dyntick-idle mode. This eliminates the need to try forcing the RCU
    state machine while entering idle, thus reducing the CPU intensiveness
    of RCU_FAST_NO_HZ, which should increase its energy efficiency.
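
    As a minimal standalone sketch of the idea (the types and names are
    illustrative, not the kernel's actual code), a callback tagged with
    the grace-period number it waits for can be released as soon as the
    recorded completed-grace-period counter catches up:

        #include <stddef.h>

        struct tagged_cb {
                unsigned long gp_num;             /* GP this callback waits for */
                void (*func)(struct tagged_cb *); /* invoked once that GP ends */
                struct tagged_cb *next;
        };

        /* Release every callback whose grace period has already completed;
         * nothing needs to force the grace-period state machine at idle
         * entry for these callbacks to make progress. */
        static void advance_cbs(struct tagged_cb **list, unsigned long completed)
        {
                struct tagged_cb *cb;

                while ((cb = *list) != NULL && cb->gp_num <= completed) {
                        *list = cb->next;
                        cb->func(cb);
                }
        }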

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

17 Nov, 2012

2 commits

  • Currently, callback invocations from callback-free CPUs are accounted to
    the CPU that registered the callback, but using the same field that is
    used for normal callbacks. This makes it impossible to determine from
    debugfs output whether callbacks are in fact being diverted. This commit
    therefore adds a separate ->n_nocbs_invoked field in the rcu_data structure
    in which diverted callback invocations are counted. RCU's debugfs tracing
    still displays normal callback invocations using ci=, and displays
    diverted callback invocations using nci=.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

  • RCU callback execution can add significant OS jitter and also can
    degrade both scheduling latency and, in asymmetric multiprocessors,
    energy efficiency. This commit therefore adds the ability for selected
    CPUs ("rcu_nocbs=" boot parameter) to have their callbacks offloaded
    to kthreads. If the "rcu_nocb_poll" boot parameter is also specified,
    these kthreads will do polling, removing the need for the offloaded
    CPUs to do wakeups. At least one CPU must be doing normal callback
    processing: currently CPU 0 cannot be selected as a no-CBs CPU.
    In addition, attempts to offline the last normal-CBs CPU will fail.

    This feature was inspired by Jim Houston's and Joe Korty's JRCU, and
    this commit includes fixes to problems located by Fengguang Wu's
    kbuild test robot.

    [ paulmck: Added gfp.h include file as suggested by Fengguang Wu. ]
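
    As a usage illustration (the CPU list is a hypothetical example;
    both parameter names appear in the text above), a kernel booted with
    the following command-line fragment would offload callbacks from
    CPUs 1-7 to polling kthreads:

        rcu_nocbs=1-7 rcu_nocb_poll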

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney


25 Sep, 2012

1 commit

  • …', 'hotplug.2012.09.23a' and 'idlechop.2012.09.23a' into HEAD

    bigrt.2012.09.23a contains additional commits to reduce scheduling latency
    from RCU on huge systems (many hundreds or thousands of CPUs).

    doctorture.2012.09.23a contains documentation changes and rcutorture fixes.

    fixes.2012.09.23a contains miscellaneous fixes.

    hotplug.2012.09.23a contains CPU-hotplug-related changes.

    idlechop.2012.09.23a fixes architectures for which RCU no longer
    considered the idle loop to be a quiescent state due to earlier
    adaptive-dynticks changes. Affected architectures are alpha, cris,
    frv, h8300, m32r, m68k, mn10300, parisc, score, xtensa, and ia64.

    Paul E. McKenney

23 Sep, 2012

3 commits

  • Currently, _rcu_barrier() relies on preempt_disable() to prevent
    any CPU from going offline, which in turn depends on CPU hotplug's
    use of __stop_machine().

    This patch therefore makes _rcu_barrier() use get_online_cpus() to
    block CPU-hotplug operations. This has the added benefit of removing
    the need for _rcu_barrier() to adopt callbacks: Because CPU-hotplug
    operations are excluded, there can be no callbacks to adopt. This
    commit simplifies the code accordingly.
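
    The exclusion pattern the commit relies on, as a hedged sketch
    (barrier_like_operation() is an illustrative name, not a kernel
    function; get_online_cpus()/put_online_cpus() are the hotplug
    exclusion API of that era):

        #include <linux/cpu.h>

        static void barrier_like_operation(void)
        {
                get_online_cpus();      /* block CPU-hotplug operations */
                /* No CPU can go offline here, so no callbacks can be
                 * orphaned while the barrier machinery runs. */
                put_online_cpus();      /* allow CPU hotplug again */
        }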

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney

  • The current quiescent-state detection algorithm is needlessly
    complex. It records the grace-period number corresponding to
    the quiescent state at the time of the quiescent state, which
    works, but it seems better to simply erase any record of previous
    quiescent states at the time that the CPU notices the new grace
    period. This has the further advantage of removing another piece
    of RCU for which lockless reasoning is required.

    Therefore, this commit makes this change.
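
    A standalone sketch of the simplified rule (the field names are
    illustrative, not the actual rcu_data fields):

        #include <stdbool.h>

        struct cpu_state {
                unsigned long gp_seen;   /* most recent GP this CPU noticed */
                bool passed_quiesce;     /* QS recorded against gp_seen */
        };

        /* On noticing a new grace period, erase any earlier quiescent-state
         * record so that it cannot be counted against the new grace period. */
        static void note_new_gp(struct cpu_state *cs, unsigned long current_gp)
        {
                if (cs->gp_seen != current_gp) {
                        cs->gp_seen = current_gp;
                        cs->passed_quiesce = false;
                }
        }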

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney

  • Moving quiescent-state forcing into a kthread dispenses with the need
    for the ->n_rp_need_fqs field, so this commit removes it.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney

13 Aug, 2012

1 commit

  • Bring RCU into the new-age CPU-hotplug fold by modifying RCU's per-CPU
    kthread code to use the new smp_hotplug_thread facility.

    [ tglx: Adapted it to use callbacks and to the simplified rcu yield ]
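
    A hedged sketch of how a client registers with the facility (the
    demo_* names are illustrative; struct smp_hotplug_thread and
    smpboot_register_percpu_thread() come from <linux/smpboot.h>):

        #include <linux/smpboot.h>
        #include <linux/percpu.h>
        #include <linux/init.h>

        static DEFINE_PER_CPU(struct task_struct *, demo_task);
        static DEFINE_PER_CPU(unsigned int, demo_work_pending);

        static int demo_should_run(unsigned int cpu)
        {
                return per_cpu(demo_work_pending, cpu);
        }

        static void demo_thread_fn(unsigned int cpu)
        {
                per_cpu(demo_work_pending, cpu) = 0;
                /* ... process this CPU's work ... */
        }

        static struct smp_hotplug_thread demo_threads = {
                .store             = &demo_task,
                .thread_should_run = demo_should_run,
                .thread_fn         = demo_thread_fn,
                .thread_comm       = "demo/%u",
        };

        static int __init demo_init(void)
        {
                /* Spawns one kthread per online CPU; the facility parks
                 * and unparks them automatically across hotplug events. */
                return smpboot_register_percpu_thread(&demo_threads);
        }
        early_initcall(demo_init);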

    Signed-off-by: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Srivatsa S. Bhat
    Cc: Rusty Russell
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/20120716103948.673354828@linutronix.de
    Signed-off-by: Thomas Gleixner

    Paul E. McKenney

06 Jul, 2012

1 commit

  • Although the C language allows you to break strings across lines, doing
    this makes it hard for people to find the Linux kernel code corresponding
    to a given console message. This commit therefore fixes broken strings
    throughout RCU's source code.
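
    A before-and-after illustration (the message text is hypothetical):

        /* Hard to find: grepping for the console output fails because
         * the string is split across source lines. */
        pr_err("rcu_preempt detected stall "
               "on CPU %d\n", cpu);

        /* Greppable, even though the source line runs long: */
        pr_err("rcu_preempt detected stall on CPU %d\n", cpu);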

    Suggested-by: Josh Triplett
    Suggested-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

10 May, 2012

1 commit

  • The rcu_barrier() primitive interrupts each and every CPU, registering
    a callback on every CPU. Once all of these callbacks have been invoked,
    rcu_barrier() knows that every callback that was registered before
    the call to rcu_barrier() has also been invoked.

    However, there is no point in registering a callback on a CPU that
    currently has no callbacks, most especially if that CPU is in a
    deep idle state. This commit therefore makes rcu_barrier() avoid
    interrupting CPUs that have no callbacks. Doing this requires reworking
    the handling of orphaned callbacks, otherwise callbacks could slip through
    rcu_barrier()'s net by being orphaned from a CPU that rcu_barrier() had
    not yet interrupted to a CPU that rcu_barrier() had already interrupted.
    This reworking was needed anyway to take a first step towards weaning
    RCU from the CPU_DYING notifier's use of stop_cpu().
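
    A standalone sketch of the resulting logic (illustrative, not the
    kernel's code):

        #define NR_DEMO_CPUS 8

        static int cb_count[NR_DEMO_CPUS];   /* callbacks queued per CPU */

        /* Post a barrier callback only on CPUs that actually have
         * callbacks queued; CPUs with empty lists, possibly deep in
         * idle, are left undisturbed. */
        static int post_barrier_callbacks(void (*post)(int cpu))
        {
                int cpu, posted = 0;

                for (cpu = 0; cpu < NR_DEMO_CPUS; cpu++) {
                        if (cb_count[cpu] == 0)
                                continue;
                        post(cpu);
                        posted++;
                }
                return posted;
        }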

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

22 Feb, 2012

2 commits

  • Because newly offlined CPUs continue executing after completing the
    CPU_DYING notifiers, they legitimately enter the scheduler and use
    RCU while appearing to be offline. This calls for a more sophisticated
    approach as follows:

    1. RCU marks the CPU online during the CPU_UP_PREPARE phase.

    2. RCU marks the CPU offline during the CPU_DEAD phase.

    3. Diagnostics regarding use of read-side RCU by offline CPUs use
    RCU's accounting rather than the cpu_online_map. (Note that
    __call_rcu() still uses cpu_online_map to detect illegal
    invocations within CPU_DYING notifiers.)

    4. Offline CPUs are prevented from hanging the system by
    force_quiescent_state(), which pays attention to cpu_online_map.
    Some additional work (in a later commit) will be needed to
    guarantee that force_quiescent_state() waits a full jiffy before
    assuming that a CPU is offline, for example, when called from
    idle entry. (This commit also makes the one-jiffy wait
    explicit, since the old-style implicit wait can now be defeated
    by RCU_FAST_NO_HZ and by rcutorture.)

    This approach avoids the false positives encountered when attempting to
    use more exact classification of CPU online/offline state.
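
    A hedged sketch of points 1 and 2 above, in the hotplug-notifier
    style of that era (mark_rcu_online() and mark_rcu_offline() are
    illustrative stand-ins for RCU's internal accounting):

        #include <linux/cpu.h>
        #include <linux/notifier.h>

        static int demo_cpu_notify(struct notifier_block *nb,
                                   unsigned long action, void *hcpu)
        {
                long cpu = (long)hcpu;

                switch (action) {
                case CPU_UP_PREPARE:
                        mark_rcu_online(cpu);    /* before the CPU runs */
                        break;
                case CPU_DEAD:
                        mark_rcu_offline(cpu);   /* after it is fully gone */
                        break;
                }
                return NOTIFY_OK;
        }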

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

  • When CONFIG_RCU_FAST_NO_HZ is enabled, RCU will allow a given CPU to
    enter dyntick-idle mode even if it still has RCU callbacks queued.
    RCU avoids system hangs in this case by scheduling a timer for several
    jiffies in the future. However, if all of the callbacks on that CPU
    are from kfree_rcu(), there is no reason to wake the CPU up, as it is
    not a problem to defer freeing of memory.

    This commit therefore tracks the number of callbacks on a given CPU
    that are from kfree_rcu(), and avoids scheduling the timer if all of
    a given CPU's callbacks are from kfree_rcu().
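
    A standalone sketch of the bookkeeping (the names are illustrative;
    the kernel keeps similar per-CPU counts):

        #include <stdbool.h>

        struct cb_counts {
                long qlen;        /* total callbacks queued on this CPU */
                long qlen_lazy;   /* of which, kfree_rcu() callbacks */
        };

        /* The wakeup timer is needed only if some queued callback does
         * more than just free memory. */
        static bool need_wake_timer(const struct cb_counts *c)
        {
                return c->qlen != c->qlen_lazy;
        }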

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

12 Dec, 2011

2 commits

  • Earlier versions of RCU used the scheduling-clock tick to detect idleness
    by checking for the idle task, but handled idleness differently for
    CONFIG_NO_HZ=y. But there are now a number of uses of RCU read-side
    critical sections in the idle task, for example, for tracing. A more
    fine-grained detection of idleness is therefore required.

    This commit presses the old dyntick-idle code into full-time service,
    so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
    always invoked at the beginning of an idle loop iteration. Similarly,
    rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
    at the end of an idle-loop iteration. This allows the idle task to
    use RCU everywhere except between consecutive rcu_idle_enter() and
    rcu_idle_exit() calls, in turn allowing architecture maintainers to
    specify exactly where in the idle loop that RCU may be used.

    Because some of the userspace upcall uses can result in what looks
    to RCU like half of an interrupt, it is not possible to expect that
    the irq_enter() and irq_exit() hooks will give exact counts. This
    patch therefore expands the ->dynticks_nesting counter to 64 bits
    and uses two separate bitfields to count process/idle transitions
    and interrupt entry/exit transitions. It is presumed that userspace
    upcalls do not happen in the idle loop or from usermode execution
    (though usermode might do a system call that results in an upcall).
    The counter is hard-reset on each process/idle transition, which
    avoids the interrupt entry/exit error from accumulating. Overflow
    is avoided by the 64-bitness of the ->dynticks_nesting counter.

    This commit also adds warnings if a non-idle task asks RCU to enter
    idle state (and these checks will need some adjustment before applying
    Frederic's OS-jitter patches: http://lkml.org/lkml/2011/10/7/246).
    In addition, validation of ->dynticks and ->dynticks_nesting is added.
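
    A hedged sketch of the split counter (the reset value is
    illustrative, not the kernel's exact constant):

        /* Low-order bits count irq entry/exit nesting; high-order bits
         * count process/idle nesting, so accumulated irq miscounts are
         * wiped out by the hard reset at each process/idle transition. */
        #define TASK_NEST_UNIT  0x0000000100000000ULL

        static unsigned long long dynticks_nesting;

        static void sketch_irq_enter(void)  { dynticks_nesting += 1; }
        static void sketch_irq_exit(void)   { dynticks_nesting -= 1; }

        /* Process enters/leaves idle: hard reset discards any irq error. */
        static void sketch_idle_enter(void) { dynticks_nesting = 0; }
        static void sketch_idle_exit(void)  { dynticks_nesting = TASK_NEST_UNIT; }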

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney

  • The ->signaled field was named before complications in the form of
    dyntick-idle mode and offlined CPUs. These complications have required
    that force_quiescent_state() be implemented as a state machine, instead
    of simply unconditionally sending reschedule IPIs. Therefore, this
    commit renames ->signaled to ->fqs_state to catch up with the new
    force_quiescent_state() reality.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney

29 Sep, 2011

3 commits

  • There is often a delay between the time that a CPU passes through a
    quiescent state and the time that this quiescent state is reported to the
    RCU core. It is quite possible that the grace period ended before the
    quiescent state could be reported, for example, some other CPU might have
    deduced that this CPU passed through dyntick-idle mode. It is critically
    important that the quiescent state be counted only against the grace period
    that was in effect at the time that the quiescent state was detected.

    Previously, this was handled by recording the number of the last grace
    period to complete when passing through a quiescent state. The RCU
    core then checks this number against the current value, and rejects
    the quiescent state if there is a mismatch. However, one additional
    possibility must be accounted for, namely that the quiescent state was
    recorded after the prior grace period completed but before the current
    grace period started. In this case, the RCU core must reject the
    quiescent state, but the recorded number will match. This is handled
    when the CPU becomes aware of a new grace period -- at that point,
    it invalidates any prior quiescent state.

    This works, but is a bit indirect. The new approach records the current
    grace period, and the RCU core checks to see (1) that this is still the
    current grace period and (2) that this grace period has not yet ended.
    This approach simplifies reasoning about correctness, and this commit
    changes over to this new approach.
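
    A standalone sketch of the new check (the names are illustrative):

        #include <stdbool.h>

        /* A quiescent state recorded under grace period qs_gp counts only
         * if qs_gp is still the current grace period and that grace period
         * has not already been marked completed. */
        static bool qs_still_valid(unsigned long qs_gp,
                                   unsigned long current_gp,
                                   unsigned long completed_gp)
        {
                return qs_gp == current_gp && completed_gp != current_gp;
        }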

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

  • Andi Kleen noticed that one of the RCU_BOOST data declarations was
    out of sync with the definition. Move the declarations so that the
    compiler can do the checking in the future.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney

  • rcutree.c defines rcu_cpu_kthread_cpu as int, not unsigned int,
    so the extern has to follow that.
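
    This and the preceding commit guard against the same hazard; a
    minimal illustration (the file names are hypothetical):

        /* demo.h */
        extern int rcu_cpu_kthread_cpu;

        /* demo.c -- because it includes demo.h, a definition such as
         * "unsigned int rcu_cpu_kthread_cpu;" would now draw a compiler
         * error instead of silently mismatching the declaration. */
        #include "demo.h"
        int rcu_cpu_kthread_cpu;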

    Signed-off-by: Andi Kleen
    Signed-off-by: Paul E. McKenney

    Andi Kleen

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma

27 May, 2011

1 commit

  • (Note: this was reverted, and is now being re-applied in pieces, with
    this being the fifth and final piece. See below for the reason that
    it is now felt to be safe to re-apply this.)

    Commit d09b62d fixed grace-period synchronization, but left some smp_mb()
    invocations in rcu_process_callbacks() that are no longer needed; sheer
    paranoia had prevented them from being removed. This commit removes
    them and provides a proof of correctness in their absence. It also adds
    a memory barrier to rcu_report_qs_rsp() immediately before the update to
    rsp->completed in order to handle the theoretical possibility that the
    compiler or CPU might move massive quantities of code into a lock-based
    critical section. This also proves that the sheer paranoia was not
    entirely unjustified, at least from a theoretical point of view.

    In addition, the old dyntick-idle synchronization depended on the fact
    that grace periods were many milliseconds in duration, so that it could
    be assumed that no dyntick-idle CPU could reorder a memory reference
    across an entire grace period. Unfortunately for this design, the
    addition of expedited grace periods breaks this assumption, which has
    the unfortunate side-effect of requiring atomic operations in the
    functions that track dyntick-idle state for RCU. (There is some hope
    that the algorithms used in user-level RCU might be applied here, but
    some work is required to handle the NMIs that user-space applications
    can happily ignore. For the short term, better safe than sorry.)

    This proof assumes that neither compiler nor CPU will allow a lock
    acquisition and release to be reordered, as doing so can result in
    deadlock. The proof is as follows:

    1. A given CPU declares a quiescent state under the protection of
    its leaf rcu_node's lock.

    2. If there is more than one level of rcu_node hierarchy, the
    last CPU to declare a quiescent state will also acquire the
    ->lock of the next rcu_node up in the hierarchy, but only
    after releasing the lower level's lock. The acquisition of this
    lock clearly cannot occur prior to the acquisition of the leaf
    node's lock.

    3. Step 2 repeats until we reach the root rcu_node structure.
    Please note again that only one lock is held at a time through
    this process. The acquisition of the root rcu_node's ->lock
    must occur after the release of that of the leaf rcu_node.

    4. At this point, we set the ->completed field in the rcu_state
    structure in rcu_report_qs_rsp(). However, if the rcu_node
    hierarchy contains only one rcu_node, then in theory the code
    preceding the quiescent state could leak into the critical
    section. We therefore precede the update of ->completed with a
    memory barrier. All CPUs will therefore agree that any updates
    preceding any report of a quiescent state will have happened
    before the update of ->completed.

    5. Regardless of whether a new grace period is needed, rcu_start_gp()
    will propagate the new value of ->completed to all of the leaf
    rcu_node structures, under the protection of each rcu_node's ->lock.
    If a new grace period is needed immediately, this propagation
    will occur in the same critical section that ->completed was
    set in, but courtesy of the memory barrier in #4 above, is still
    seen to follow any pre-quiescent-state activity.

    6. When a given CPU invokes __rcu_process_gp_end(), it becomes
    aware of the end of the old grace period and therefore makes
    any RCU callbacks that were waiting on that grace period eligible
    for invocation.

    If this CPU is the same one that detected the end of the grace
    period, and if there is but a single rcu_node in the hierarchy,
    we will still be in the single critical section. In this case,
    the memory barrier in step #4 guarantees that all callbacks will
    be seen to execute after each CPU's quiescent state.

    On the other hand, if this is a different CPU, it will acquire
    the leaf rcu_node's ->lock, and will again be serialized after
    each CPU's quiescent state for the old grace period.

    On the strength of this proof, this commit therefore removes the memory
    barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp().
    The effect is to reduce the number of memory barriers by one and to
    reduce the frequency of execution from about once per scheduling tick
    per CPU to once per grace period.
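
    A standalone model of the barrier added in step 4 (illustrative;
    __sync_synchronize() stands in for the kernel's smp_mb()):

        struct gp_state {
                unsigned long gpnum;      /* current grace period */
                unsigned long completed;  /* last completed grace period */
        };

        /* Publish completion only after a full memory barrier, so that
         * all pre-quiescent-state accesses are seen to happen first. */
        static void report_qs_done(struct gp_state *rsp)
        {
                __sync_synchronize();
                rsp->completed = rsp->gpnum;
        }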

    This was reverted due to hangs found during testing by Yinghai Lu and
    Ingo Molnar. Frederic Weisbecker supplied Yinghai with tracing that
    located the underlying problem, and Frederic also provided the fix.

    The underlying problem was that the HARDIRQ_ENTER() macro from
    lib/locking-selftest.c invoked irq_enter(), which in turn invokes
    rcu_irq_enter(), but HARDIRQ_EXIT() invoked __irq_exit(), which
    does not invoke rcu_irq_exit(). This situation resulted in calls
    to rcu_irq_enter() that were not balanced by the required calls to
    rcu_irq_exit(). Therefore, after these locking selftests completed,
    RCU's dyntick-idle nesting count was a large number (for example,
    72), which caused RCU to conclude that the affected CPU was not in
    dyntick-idle mode when in fact it was.

    RCU would therefore incorrectly wait for this dyntick-idle CPU, resulting
    in hangs.

    In contrast, with Frederic's patch, which replaces the irq_enter()
    in HARDIRQ_ENTER() with an __irq_enter(), these tests don't ever call
    either rcu_irq_enter() or rcu_irq_exit(), which works because the CPU
    running the test is already marked as not being in dyntick-idle mode.
    The rcu_irq_enter() and rcu_irq_exit() counts therefore remain
    balanced, and RCU then has no problem working out which CPUs are in
    dyntick-idle mode and which are not.

    The reason that the imbalance was not noticed before the barrier patch
    was applied is that the old implementation of rcu_enter_nohz() ignored
    the nesting depth. This could still result in delays, but much shorter
    ones. Whenever there was a delay, RCU would IPI the CPU with the
    unbalanced nesting level, which would eventually result in rcu_enter_nohz()
    being called, which in turn would force RCU to see that the CPU was in
    dyntick-idle mode.

    The reason that very few people noticed the problem is that the mismatched
    irq_enter() vs. __irq_exit() occurred only when the kernel was built with
    CONFIG_DEBUG_LOCKING_API_SELFTESTS.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
