16 Jun, 2016

2 commits

  • In many cases in the RCU tree code, we iterate over the set of cpus for
    a leaf node described by rcu_node::grplo and rcu_node::grphi, checking
    per-cpu data for each cpu in this range. However, if the set of possible
    cpus is sparse, some cpus described in this range are not possible, and
    thus no per-cpu region will have been allocated (or initialised) for
    them by the generic percpu code.

    Erroneous accesses to a per-cpu area for these !possible cpus may fault
    or may hit other data, depending on the address generated when the
    erroneous per-cpu offset is applied. In practice, both cases have been
    observed on arm64 hardware (the former being silent, but detectable with
    additional patches).

    To avoid issues resulting from this, we must iterate over the set of
    *possible* cpus for a given leaf node. This patch adds a new helper,
    for_each_leaf_node_possible_cpu, to enable this. As iteration is often
    intertwined with rcu_node local bitmask manipulation, a new
    leaf_node_cpu_bit helper is added to make this simpler and more
    consistent. The RCU tree code is made to use both of these where
    appropriate; a standalone sketch of the iteration pattern follows at
    the end of this entry.

    Without this patch, running reboot at a shell can result in an oops
    like:

    [ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c
    [ 3369.083881] pgd = ffffffc3ecdda000
    [ 3369.087270] [ffffff8008b21b4c] *pgd=00000083eca48003, *pud=00000083eca48003, *pmd=0000000000000000
    [ 3369.096222] Internal error: Oops: 96000007 [#1] PREEMPT SMP
    [ 3369.101781] Modules linked in:
    [ 3369.104825] CPU: 2 PID: 1817 Comm: NetworkManager Tainted: G W 4.6.0+ #3
    [ 3369.121239] task: ffffffc0fa13e000 ti: ffffffc3eb940000 task.ti: ffffffc3eb940000
    [ 3369.128708] PC is at sync_rcu_exp_select_cpus+0x188/0x510
    [ 3369.134094] LR is at sync_rcu_exp_select_cpus+0x104/0x510
    [ 3369.139479] pc : [] lr : [] pstate: 200001c5
    [ 3369.146860] sp : ffffffc3eb9435a0
    [ 3369.150162] x29: ffffffc3eb9435a0 x28: ffffff8008be4f88
    [ 3369.155465] x27: ffffff8008b66c80 x26: ffffffc3eceb2600
    [ 3369.160767] x25: 0000000000000001 x24: ffffff8008be4f88
    [ 3369.166070] x23: ffffff8008b51c3c x22: ffffff8008b66c80
    [ 3369.171371] x21: 0000000000000001 x20: ffffff8008b21b40
    [ 3369.176673] x19: ffffff8008b66c80 x18: 0000000000000000
    [ 3369.181975] x17: 0000007fa951a010 x16: ffffff80086a30f0
    [ 3369.187278] x15: 0000007fa9505590 x14: 0000000000000000
    [ 3369.192580] x13: ffffff8008b51000 x12: ffffffc3eb940000
    [ 3369.197882] x11: 0000000000000006 x10: ffffff8008b51b78
    [ 3369.203184] x9 : 0000000000000001 x8 : ffffff8008be4000
    [ 3369.208486] x7 : ffffff8008b21b40 x6 : 0000000000001003
    [ 3369.213788] x5 : 0000000000000000 x4 : ffffff8008b27280
    [ 3369.219090] x3 : ffffff8008b21b4c x2 : 0000000000000001
    [ 3369.224406] x1 : 0000000000000001 x0 : 0000000000000140
    ...
    [ 3369.972257] [] sync_rcu_exp_select_cpus+0x188/0x510
    [ 3369.978685] [] synchronize_rcu_expedited+0x64/0xa8
    [ 3369.985026] [] synchronize_net+0x24/0x30
    [ 3369.990499] [] dev_deactivate_many+0x28c/0x298
    [ 3369.996493] [] __dev_close_many+0x60/0xd0
    [ 3370.002052] [] __dev_close+0x28/0x40
    [ 3370.007178] [] __dev_change_flags+0x8c/0x158
    [ 3370.012999] [] dev_change_flags+0x20/0x60
    [ 3370.018558] [] do_setlink+0x288/0x918
    [ 3370.023771] [] rtnl_newlink+0x398/0x6a8
    [ 3370.029158] [] rtnetlink_rcv_msg+0xe4/0x220
    [ 3370.034891] [] netlink_rcv_skb+0xc4/0xf8
    [ 3370.040364] [] rtnetlink_rcv+0x2c/0x40
    [ 3370.045663] [] netlink_unicast+0x160/0x238
    [ 3370.051309] [] netlink_sendmsg+0x2f0/0x358
    [ 3370.056956] [] sock_sendmsg+0x18/0x30
    [ 3370.062168] [] ___sys_sendmsg+0x26c/0x280
    [ 3370.067728] [] __sys_sendmsg+0x44/0x88
    [ 3370.073027] [] SyS_sendmsg+0x10/0x20
    [ 3370.078153] [] el0_svc_naked+0x24/0x28

    Signed-off-by: Mark Rutland
    Reported-by: Dennis Chen
    Cc: Catalin Marinas
    Cc: Josh Triplett
    Cc: Lai Jiangshan
    Cc: Mathieu Desnoyers
    Cc: Steve Capper
    Cc: Steven Rostedt
    Cc: Will Deacon
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Paul E. McKenney

    Mark Rutland
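
    The following is a minimal, self-contained userspace sketch of the
    iteration pattern described in this entry. The structure, the
    possible-CPU mask, and the helper names mirror the
    for_each_leaf_node_possible_cpu() and leaf_node_cpu_bit() helpers
    described above, but they are stand-ins, not the kernel definitions.

    #include <stdio.h>

    struct leaf_node {
        int grplo;                 /* first CPU covered by this leaf node */
        int grphi;                 /* last CPU covered by this leaf node  */
        unsigned long qsmask;      /* per-leaf bitmask, bit 0 == grplo    */
    };

    /* Sparse set of possible CPUs: only CPUs 0 and 2 exist. */
    static const unsigned long possible_mask = 0x5UL;

    static int cpu_possible(int cpu)
    {
        return (possible_mask >> cpu) & 1UL;
    }

    /* Bit for @cpu within @rnp's bitmask fields, like leaf_node_cpu_bit(). */
    static unsigned long leaf_node_cpu_bit(const struct leaf_node *rnp, int cpu)
    {
        return 1UL << (cpu - rnp->grplo);
    }

    /* Visit only the *possible* CPUs in [grplo, grphi], skipping holes. */
    #define for_each_leaf_node_possible_cpu(rnp, cpu)                  \
        for ((cpu) = (rnp)->grplo; (cpu) <= (rnp)->grphi; (cpu)++)     \
            if (!cpu_possible(cpu)) {} else

    int main(void)
    {
        struct leaf_node rnp = { .grplo = 0, .grphi = 3, .qsmask = 0 };
        int cpu;

        for_each_leaf_node_possible_cpu(&rnp, cpu) {
            rnp.qsmask |= leaf_node_cpu_bit(&rnp, cpu);
            printf("visiting possible cpu %d\n", cpu);   /* skips CPUs 1 and 3 */
        }
        printf("qsmask = 0x%lx\n", rnp.qsmask);          /* prints 0x5 */
        return 0;
    }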
     
  • It is not always easy to determine the cause of an RCU stall just by
    analysing the RCU stall messages, especially when the problem is caused
    by indirect starvation of RCU threads, for example when preempt_rcu
    is not awakened due to the starvation of a timer softirq.

    We have been hard-coding panic() in the RCU stall functions for
    some time while testing the kernel-rt, but this is not possible in
    some scenarios, such as when supporting customers.

    This patch implements the sysctl kernel.panic_on_rcu_stall. If
    set to 1, the system will panic() when an RCU stall takes place,
    enabling the capture of a vmcore. The vmcore provides a way to analyze
    the state of the kernel and of all tasks, helping to pinpoint the
    culprit and the solution for the stall.

    The kernel.panic_on_rcu_stall sysctl is disabled by default; a
    standalone sketch of the check follows at the end of this entry.

    Changes from v1:
    - Fixed a typo in the git log
    - The if(sysctl_panic_on_rcu_stall) panic() is in a static function
    - Fixed the CONFIG_TINY_RCU compilation issue
    - The var sysctl_panic_on_rcu_stall is now __read_mostly

    Cc: Jonathan Corbet
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Steven Rostedt
    Cc: Mathieu Desnoyers
    Cc: Lai Jiangshan
    Acked-by: Christian Borntraeger
    Reviewed-by: Josh Triplett
    Reviewed-by: Arnaldo Carvalho de Melo
    Tested-by: "Luis Claudio R. Goncalves"
    Signed-off-by: Daniel Bristot de Oliveira
    Signed-off-by: Paul E. McKenney

    Daniel Bristot de Oliveira
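
    As a rough illustration of the knob described in this entry, here is a
    self-contained userspace sketch. The variable and function names echo
    the patch description, but the bodies are invented and abort() merely
    stands in for panic().

    #include <stdio.h>
    #include <stdlib.h>

    static int sysctl_panic_on_rcu_stall;    /* 0 = disabled (default) */

    static void panic_on_rcu_stall(void)
    {
        if (sysctl_panic_on_rcu_stall) {
            fprintf(stderr, "RCU stall: panicking to capture a vmcore\n");
            abort();                         /* stands in for panic() */
        }
    }

    static void report_rcu_stall(const char *reason)
    {
        fprintf(stderr, "INFO: rcu detected stall (%s)\n", reason);
        panic_on_rcu_stall();                /* escalate only if the knob is set */
    }

    int main(void)
    {
        report_rcu_stall("demo, knob clear");    /* warning only */
        sysctl_panic_on_rcu_stall = 1;
        report_rcu_stall("demo, knob set");      /* aborts here  */
        return 0;
    }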
     

15 Jun, 2016

4 commits

  • People have been having some difficulty finding their way around the
    RCU code. This commit therefore pulls some of the expedited grace-period
    code from tree.c to a new tree_exp.h file. This commit is strictly code
    movement, with the exception of a forward declaration that was added
    for the sync_sched_exp_online_cleanup() function.

    A subsequent commit will move the remaining expedited grace-period code
    from tree_plugin.h to tree_exp.h.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • I think you'll find this condition is superfluous, as the whole function
    is already under an #ifdef of that same symbol.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Paul E. McKenney

    Peter Zijlstra
     
  • In the past, RCU grace-period initialization excluded CPU-hotplug
    operations, but this is no longer the case. This commit therefore
    removes an outdated comment in rcu_gp_init() claiming that these
    operations are excluded.

    Reported-by: Lihao Liang
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The comment header for rcu_scheduler_active states that it is used
    to optimize synchronize_sched() at early boot. This is incorrect.
    The synchronize_sched() function instead checks the number of online
    CPUs. This commit therefore replaces the comment's synchronize_sched()
    with synchronize_rcu(), which really does use rcu_scheduler_active for
    this purpose.

    Reported-by: Lihao Liang
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

22 Apr, 2016

1 commit


01 Apr, 2016

20 commits

  • This commit provides rcu_exp_batches_completed() and
    rcu_exp_batches_completed_sched() functions to allow torture-test modules
    to check how many expedited grace period batches have completed.
    These are analogous to the existing rcu_batches_completed(),
    rcu_batches_completed_bh(), and rcu_batches_completed_sched() functions.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • If it is necessary to kick the grace-period kthread, that is a good
    time to dump the trace buffer in order to learn why kicking was needed.
    This commit therefore does the dump.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Recent kernels can fail to awaken the grace-period kthread for
    quiescent-state forcing. This commit is a crude hack that does
    a wakeup if a scheduling-clock interrupt sees that it has been
    too long since force-quiescent-state (FQS) processing.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, the force-quiescent-state (FQS) code in rcu_gp_kthread() can
    advance the next FQS even if one was not executed last time. This can
    happen due to timeout-duration uncertainty. This commit therefore avoids
    advancing the FQS schedule unless an FQS was just executed. In the
    corner case where an FQS was not executed but is due now, the code does
    a one-jiffy wait; a sketch of the resulting wait computation follows at
    the end of this entry.

    This change prepares for kthread kicking.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
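
    A minimal sketch, with invented names and values, of the wait-time rule
    described in this entry: the FQS deadline advances by a full interval
    only when an FQS actually ran, and an FQS that is due but did not run
    leads to a one-jiffy re-check.

    #include <stdio.h>

    static long next_fqs_wait(long now, long deadline, long interval,
                              int fqs_ran)
    {
        if (fqs_ran)
            return interval;           /* FQS just ran: wait a full interval   */
        if (now >= deadline)
            return 1;                  /* due but not run: re-check in 1 jiffy */
        return deadline - now;         /* not yet due: sleep until deadline    */
    }

    int main(void)
    {
        printf("%ld\n", next_fqs_wait(100, 103, 3, 1));   /* 3: full interval  */
        printf("%ld\n", next_fqs_wait(104, 103, 3, 0));   /* 1: overdue        */
        printf("%ld\n", next_fqs_wait(101, 103, 3, 0));   /* 2: until deadline */
        return 0;
    }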
     
  • Recent kernels can fail to awaken the grace-period kthread for
    quiescent-state forcing. This commit is a crude hack that does
    a wakeup any time a stall is detected.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The current expedited grace-period implementation makes subsequent grace
    periods wait on wakeups for the prior grace period. This does not fit
    the dictionary definition of "expedited", so this commit allows these two
    phases to overlap. Doing this requires four waitqueues rather than two
    because tasks can now be waiting on the previous, current, and next grace
    periods. The fourth waitqueue makes the bit masking work out nicely.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
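
    The following toy program (invented names, no real wait queues) only
    illustrates the indexing idea suggested by the entry above: waiters for
    expedited grace period number g park on queue g & 0x3, and because only
    the previous, current, and next grace periods can have waiters at any
    instant, no two of them map to the same queue.

    #include <stdio.h>

    #define NR_EXP_WQ 4    /* four queues cover three overlapping grace periods */

    int main(void)
    {
        unsigned long g;

        /* Previous (10), current (11), and next (12) land on distinct queues. */
        for (g = 10; g <= 12; g++)
            printf("waiters for expedited GP %lu park on wq[%lu]\n",
                   g, g & (NR_EXP_WQ - 1));
        return 0;
    }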
     
  • This commit pulls the grace-period-start counter adjustment and tracing
    from synchronize_rcu_expedited() and synchronize_sched_expedited()
    into exp_funnel_lock(), thus eliminating some code duplication.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit moves some duplicate code from synchronize_rcu_expedited()
    and synchronize_sched_expedited() into rcu_exp_gp_seq_snap(). This
    doesn't save lines of code, but does eliminate a "tell me twice" issue.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, synchronize_rcu_expedited() and synchronize_sched_expedited()
    have significant duplicate code. This commit therefore consolidates some
    of this code into rcu_exp_wake(), which is now renamed to
    rcu_exp_wait_wake() in recognition of its added responsibilities.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit speeds up the low-contention case, especially for systems
    with large rcu_node trees, by attempting to directly acquire the
    ->exp_mutex. This fastpath checks the leaves and root first in
    order to avoid excessive memory contention on the mutex itself.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The current mutex-based funnel-locking approach used by expedited grace
    periods is subject to severe unfairness. The problem arises when a
    few tasks, making a path from leaves to root, all wake up before other
    tasks do. A new task can then follow this path all the way to the root,
    which needlessly delays tasks whose grace period is done, but who do
    not happen to acquire the lock quickly enough.

    This commit avoids this problem by maintaining per-rcu_node wait queues,
    along with a per-rcu_node counter that tracks the latest grace period
    sought by an earlier task to visit this node. If that grace period
    would satisfy the current task, instead of proceeding up the tree,
    it waits on the current rcu_node structure using a pair of wait queues
    provided for that purpose. This decouples awakening of old tasks from
    the arrival of new tasks.

    If the wakeups prove to be a bottleneck, additional kthreads can be
    brought to bear for that purpose.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
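
    Below is a simplified, single-threaded sketch of the per-node decision
    described in this entry, with invented structure and field names and no
    locking or real waiting: each node records the newest grace period
    already requested through it, and a task whose needed grace period is
    already covered stops at that node instead of climbing toward the root.

    #include <stdio.h>

    struct node {
        struct node *parent;
        unsigned long exp_seq_rq;    /* newest GP requested via this node */
        const char *name;
    };

    /* Returns the node at which the caller should wait. */
    static struct node *funnel_climb(struct node *start, unsigned long need)
    {
        struct node *np;

        for (np = start; np; np = np->parent) {
            if (np->exp_seq_rq >= need)
                return np;           /* already requested here: wait here */
            np->exp_seq_rq = need;   /* record our request and keep going */
            if (!np->parent)
                return np;           /* reached the root: drive the GP    */
        }
        return start;                /* not reached */
    }

    int main(void)
    {
        struct node root = { .parent = NULL, .exp_seq_rq = 0, .name = "root" };
        struct node leaf = { .parent = &root, .exp_seq_rq = 0, .name = "leaf" };

        printf("first task waits at %s\n", funnel_climb(&leaf, 4)->name);
        printf("second task waits at %s\n", funnel_climb(&leaf, 4)->name);
        return 0;
    }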
     
  • Just a name change to save a few lines and a bit of typing.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The cpu_online() function can return values other than 0 and 1, which
    can result in subscript overflow when applied to a two-element array.
    This commit allows for this behavior by using "!!" on the return value
    from cpu_online() when used as a subscript.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
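
    A small sketch of the "!!" normalization mentioned in this entry; the
    fake predicate stands in for any test that returns the raw tested bit
    rather than exactly 0 or 1.

    #include <stdio.h>

    static const char *const state_name[2] = { "offline", "online" };

    /* Pretend the predicate returns the raw tested bit, not 0/1. */
    static int fake_cpu_online(int cpu)
    {
        return (cpu == 2) ? 0x4 : 0;
    }

    int main(void)
    {
        int cpu = 2;

        /* Without "!!", the value 0x4 would index past the array. */
        printf("cpu %d is %s\n", cpu, state_name[!!fake_cpu_online(cpu)]);
        return 0;
    }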
     
  • Commit cdacbe1f91264 ("rcu: Add fastpath bypassing funnel locking")
    turns out to be a pessimization at high load because it forces a tree
    full of tasks to wait for an expedited grace period that they probably
    do not need. This commit therefore removes this optimization.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Although cond_resched_rcu_qs() supplies quiescent states to all flavors
    of normal RCU grace periods, it does nothing for expedited RCU-sched
    grace periods. This commit therefore adds a check of whether an
    expedited RCU-sched grace period needs a quiescent state from the
    current CPU, and invokes rcu_sched_qs() to supply that quiescent state
    if so.

    Note that the check is racy in that we might be migrated to some other
    CPU just after checking the per-CPU variable. This is OK because the
    act of migration will do a context switch, which will supply the needed
    quiescent state. The only downside is that we might do an unnecessary
    call to rcu_sched_qs(), but the probability is low and the overhead
    is small.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, synchronize_sched_expedited_wait() simply sets the ndetected
    variable to the rcu_print_task_exp_stall() return value. This means
    that if the last rcu_node structure has no stalled tasks, the record of
    any stalled tasks in previous rcu_node structures is lost, which can
    in turn result in failure to dump out the blocking rcu_node structures.
    Or could, had the test been correct.

    This commit therefore adds the return value of rcu_print_task_exp_stall()
    to ndetected and corrects the later test for ndetected.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
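
    A trivial illustration of the accumulation fix described in this entry,
    using made-up per-node counts: summing with "+=" preserves detections
    from earlier rcu_node structures even when the last one reports zero,
    whereas "=" would discard them.

    #include <stdio.h>

    int main(void)
    {
        int per_node_stalls[] = { 2, 0, 1, 0 };   /* last node reports none */
        int i, ndetected = 0;

        for (i = 0; i < 4; i++)
            ndetected += per_node_stalls[i];      /* "=" would end up as 0  */

        printf("ndetected = %d\n", ndetected);    /* prints 3               */
        return 0;
    }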
     
  • Currently, sync_sched_exp_handler() will force a reschedule unless
    this CPU has already checked in or unless a reschedule has already
    been called for. This is clearly wasteful if sync_sched_exp_handler()
    interrupted an idle CPU, so this commit immediately reports the
    quiescent state in that case.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit consolidates a couple of definitions and several calls for
    single-shot ftrace-buffer dumping.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

16 Mar, 2016

1 commit

  • Pull cpu hotplug updates from Thomas Gleixner:
    "This is the first part of the ongoing cpu hotplug rework:

    - Initial implementation of the state machine

    - Runs all online and prepare down callbacks on the plugged cpu and
    not on some random processor

    - Replaces busy loop waiting with completions

    - Adds tracepoints so the states can be followed"

    More detailed commentary on this work from an earlier email:
    "What's wrong with the current cpu hotplug infrastructure?

    - Asymmetry

    The hotplug notifier mechanism is asymmetric versus the bringup and
    teardown. This is mostly caused by the notifier mechanism.

    - Largely undocumented dependencies

    While some notifiers use explicitly defined notifier priorities,
    we have quite a few notifiers which use numerical priorities to
    express dependencies without any documentation of why.

    - Control processor driven

    Most of the bringup/teardown of a cpu is driven by a control
    processor. While it is understandable that preparatory steps,
    like idle thread creation and memory allocation for and initialization
    of essential facilities, need to be done before a cpu can boot,
    there is no reason why everything else must run on a control
    processor. Before this patch series, bringup looks like this:

    Control CPU                          Booting CPU

    do preparatory steps
    kick cpu into life

                                         do low level init

    sync with booting cpu                sync with control cpu

    bring the rest up

    - All or nothing approach

    There is no way to do partial bringups. That's something which is
    really desired, because we waste, e.g. at boot, a substantial amount
    of time just busy-waiting for the cpu to come to life. That's stupid,
    as we could very well do the preparatory steps and the initial IPI
    for other cpus and then go back and do the necessary low level
    synchronization with the freshly booted cpu.

    - Minimal debuggability

    Due to the notifier based design, it's impossible to switch between
    two stages of the bringup/teardown back and forth in order to test
    the correctness. So in many hotplug notifiers the cancel
    mechanisms are either nonexistent or completely untested.

    - Notifier [un]registering is tedious

    To [un]register notifiers we need to protect against hotplug at
    every callsite. There is no mechanism to ensure that bringup/teardown
    callbacks are issued on the online cpus, so every caller needs to
    do it itself. That also includes error rollback.

    What's the new design?

    The base of the new design is a symmetric state machine, where both
    the control processor and the booting/dying cpu execute a well
    defined set of states. Each state is symmetric in the end, except
    for some well defined exceptions, and the bringup/teardown can be
    stopped and reversed at almost all states.

    So the bringup of a cpu will look like this in the future:

    Control CPU                          Booting CPU

    do preparatory steps
    kick cpu into life

                                         do low level init

    sync with booting cpu                sync with control cpu

                                         bring itself up

    The synchronization step does not require the control cpu to wait.
    That mechanism can be done asynchronously via a worker or some
    other mechanism.

    The teardown can be made very similar, so that the dying cpu cleans
    up and brings itself down. Cleanups which need to be done after
    the cpu is gone, can be scheduled asynchronously as well.

    There is a long way to go to get there, as we need to refactor the
    notion of when a cpu is available. Today we set the cpu online right
    after it comes out of the low level bringup, which is not really
    correct.

    The proper mechanism is to set it to available, i.e. cpu-local
    threads, like softirqd, the hotplug thread, etc., can be scheduled on
    that cpu, and once it has finished all booting steps, it is set to
    online, so general workloads can be scheduled on it. The reverse
    happens on teardown. The first thing to do is to forbid scheduling of
    general workloads, then tear down all the per-cpu resources and
    finally shut it off completely.

    This patch series implements the basic infrastructure for this at the
    core level. This includes the following:

    - Basic state machine implementation with well defined states, so
    ordering and prioritization can be expressed.

    - Interfaces to [un]register state callbacks

    This invokes the bringup/teardown callback on all online cpus with
    the proper protection in place and [un]installs the callbacks in
    the state machine array.

    For callbacks which have no particular ordering requirement we have
    a dynamic state space, so that drivers don't have to register an
    explicit hotplug state.

    If a callback fails, the code automatically does a rollback to the
    previous state.

    - Sysfs interface to drive the state machine to a particular step.

    This is only partially functional today. Full functionality and
    therefore testability will be achieved once we have converted all
    existing hotplug notifiers over to the new scheme.

    - Run all CPU_ONLINE/DOWN_PREPARE notifiers on the booting/dying
    processor:

    Control CPU                          Booting CPU

    do preparatory steps
    kick cpu into life

                                         do low level init

    sync with booting cpu                sync with control cpu
    wait for boot
                                         bring itself up

                                         Signal completion to control cpu

    In a previous step of this work we've done a full tree mechanical
    conversion of all hotplug notifiers to the new scheme. The balance
    is a net removal of about 4000 lines of code.

    This is not included in this series, as we decided to take a
    different approach. Instead of mechanically converting everything
    over, we will do a proper overhaul of the usage sites one by one so
    they nicely fit into the symmetric callback scheme.

    I decided to do that after I looked at the ugliness of some of the
    converted sites and figured out that their hotplug mechanism is
    completely buggered anyway. So there is no point in doing a
    mechanical conversion first, as we need to go through the usage
    sites one by one again in order to achieve fully symmetric and
    testable behaviour"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    cpu/hotplug: Document states better
    cpu/hotplug: Fix smpboot thread ordering
    cpu/hotplug: Remove redundant state check
    cpu/hotplug: Plug death reporting race
    rcu: Make CPU_DYING_IDLE an explicit call
    cpu/hotplug: Make wait for dead cpu completion based
    cpu/hotplug: Let upcoming cpu bring itself fully up
    arch/hotplug: Call into idle with a proper state
    cpu/hotplug: Move online calls to hotplugged cpu
    cpu/hotplug: Create hotplug threads
    cpu/hotplug: Split out the state walk into functions
    cpu/hotplug: Unpark smpboot threads from the state machine
    cpu/hotplug: Move scheduler cpu_online notifier to hotplug core
    cpu/hotplug: Implement setup/removal interface
    cpu/hotplug: Make target state writeable
    cpu/hotplug: Add sysfs state interface
    cpu/hotplug: Hand in target state to _cpu_up/down
    cpu/hotplug: Convert the hotplugged cpu work to a state machine
    cpu/hotplug: Convert to a state machine for the control processor
    cpu/hotplug: Add tracepoints
    ...

    Linus Torvalds
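
    As a very rough, self-contained sketch of the state-machine idea
    described above (invented states and names, nothing taken from the
    actual cpu/hotplug code): an ordered table of states, each with a
    bringup and a teardown callback, is walked upward on bringup and rolled
    back automatically in reverse order when a callback fails.

    #include <stdio.h>

    struct hp_step {
        const char *name;
        int (*bringup)(void);     /* returns 0 on success */
        int (*teardown)(void);
    };

    static int ok(void)   { return 0; }
    static int fail(void) { return -1; }

    static struct hp_step steps[] = {
        { "prepare",       ok,   ok },
        { "bring core up", ok,   ok },
        { "online",        fail, ok },    /* pretend this step fails */
    };

    static const int nr_steps = (int)(sizeof(steps) / sizeof(steps[0]));

    static int cpu_up_sim(void)
    {
        int i;

        for (i = 0; i < nr_steps; i++) {
            printf("bringup: %s\n", steps[i].name);
            if (steps[i].bringup() != 0) {
                /* Roll back the already-completed steps, in reverse. */
                for (i--; i >= 0; i--) {
                    printf("rollback: %s\n", steps[i].name);
                    steps[i].teardown();
                }
                return -1;
            }
        }
        return 0;
    }

    int main(void)
    {
        if (cpu_up_sim() != 0)
            printf("bringup failed, completed steps were rolled back\n");
        return 0;
    }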
     

15 Mar, 2016

1 commit


02 Mar, 2016

1 commit

  • Make the RCU CPU_DYING_IDLE callback an explicit function call, so it gets
    invoked at the proper place.

    Signed-off-by: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: Rik van Riel
    Cc: Rafael Wysocki
    Cc: "Srivatsa S. Bhat"
    Cc: Peter Zijlstra
    Cc: Arjan van de Ven
    Cc: Sebastian Siewior
    Cc: Rusty Russell
    Cc: Steven Rostedt
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Andrew Morton
    Cc: Paul McKenney
    Cc: Linus Torvalds
    Cc: Paul Turner
    Link: http://lkml.kernel.org/r/20160226182341.870167933@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

25 Feb, 2016

2 commits

  • As of commit dae6e64d2bcfd ("rcu: Introduce proper blocking to no-CBs kthreads
    GP waits") the RCU subsystem started making use of wait queues.

    Here we convert all additions of RCU wait queues to use simple wait queues,
    since they don't need the extra overhead of the full wait queue features.

    Originally this was done for RT kernels[1], since we would get things like...

    BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
    in_atomic(): 1, irqs_disabled(): 1, pid: 8, name: rcu_preempt
    Pid: 8, comm: rcu_preempt Not tainted
    Call Trace:
    [] __might_sleep+0xd0/0xf0
    [] rt_spin_lock+0x24/0x50
    [] __wake_up+0x36/0x70
    [] rcu_gp_kthread+0x4d2/0x680
    [] ? __init_waitqueue_head+0x50/0x50
    [] ? rcu_gp_fqs+0x80/0x80
    [] kthread+0xdb/0xe0
    [] ? finish_task_switch+0x52/0x100
    [] kernel_thread_helper+0x4/0x10
    [] ? __init_kthread_worker+0x60/0x60
    [] ? gs_change+0xb/0xb

    ...and hence simple wait queues were deployed on RT out of necessity
    (as simple wait uses a raw lock), but mainline might as well take
    advantage of the more streamlined support as well.

    [1] This is a carry forward of work from v3.10-rt; the original conversion
    was by Thomas on an earlier -rt version, and Sebastian extended it to
    additional post-3.10 added RCU waiters; here I've added a commit log and
    unified the RCU changes into one, and uprev'd it to match mainline RCU.

    Signed-off-by: Daniel Wagner
    Acked-by: Peter Zijlstra (Intel)
    Cc: linux-rt-users@vger.kernel.org
    Cc: Boqun Feng
    Cc: Marcelo Tosatti
    Cc: Steven Rostedt
    Cc: Paul Gortmaker
    Cc: Paolo Bonzini
    Cc: "Paul E. McKenney"
    Link: http://lkml.kernel.org/r/1455871601-27484-6-git-send-email-wagi@monom.org
    Signed-off-by: Thomas Gleixner

    Paul Gortmaker
     
  • rcu_nocb_gp_cleanup() is called while holding rnp->lock. Currently,
    this is okay because the wake_up_all() in rcu_nocb_gp_cleanup() will
    not enable the IRQs. lockdep is happy.

    By switching over to swait this is no longer true. swake_up_all()
    enables the IRQs while processing the waiters. __do_softirq() can now
    run and will eventually call rcu_process_callbacks(), which wants to
    grab rnp->lock.

    Let's move the rcu_nocb_gp_cleanup() call outside the lock before we
    switch over to swait; a sketch of this pattern follows at the end of
    this entry.

    If we would hold the rnp->lock and use swait, lockdep reports the
    following:

    =================================
    [ INFO: inconsistent lock state ]
    4.2.0-rc5-00025-g9a73ba0 #136 Not tainted
    ---------------------------------
    inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
    rcu_preempt/8 [HC0[0]:SC0[0]:HE1:SE1] takes:
    (rcu_node_1){+.?...}, at: [] rcu_gp_kthread+0xb97/0xeb0
    {IN-SOFTIRQ-W} state was registered at:
    [] __lock_acquire+0xd5f/0x21e0
    [] lock_acquire+0xdf/0x2b0
    [] _raw_spin_lock_irqsave+0x59/0xa0
    [] rcu_process_callbacks+0x141/0x3c0
    [] __do_softirq+0x14d/0x670
    [] irq_exit+0x104/0x110
    [] smp_apic_timer_interrupt+0x46/0x60
    [] apic_timer_interrupt+0x70/0x80
    [] rq_attach_root+0xa6/0x100
    [] cpu_attach_domain+0x16d/0x650
    [] build_sched_domains+0x942/0xb00
    [] sched_init_smp+0x509/0x5c1
    [] kernel_init_freeable+0x172/0x28f
    [] kernel_init+0xe/0xe0
    [] ret_from_fork+0x3f/0x70
    irq event stamp: 76
    hardirqs last enabled at (75): [] _raw_spin_unlock_irq+0x30/0x60
    hardirqs last disabled at (76): [] _raw_spin_lock_irq+0x1f/0x90
    softirqs last enabled at (0): [] copy_process.part.26+0x602/0x1cf0
    softirqs last disabled at (0): [< (null)>] (null)
    other info that might help us debug this:
    Possible unsafe locking scenario:
    CPU0
    ----
    lock(rcu_node_1);

    lock(rcu_node_1);
    *** DEADLOCK ***
    1 lock held by rcu_preempt/8:
    #0: (rcu_node_1){+.?...}, at: [] rcu_gp_kthread+0xb97/0xeb0
    stack backtrace:
    CPU: 0 PID: 8 Comm: rcu_preempt Not tainted 4.2.0-rc5-00025-g9a73ba0 #136
    Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014
    0000000000000000 000000006d7e67d8 ffff881fb081fbd8 ffffffff818379e0
    0000000000000000 ffff881fb0812a00 ffff881fb081fc38 ffffffff8110813b
    0000000000000000 0000000000000001 ffff881f00000001 ffffffff8102fa4f
    Call Trace:
    [] dump_stack+0x4f/0x7b
    [] print_usage_bug+0x1db/0x1e0
    [] ? save_stack_trace+0x2f/0x50
    [] mark_lock+0x66d/0x6e0
    [] ? check_usage_forwards+0x150/0x150
    [] mark_held_locks+0x78/0xa0
    [] ? _raw_spin_unlock_irq+0x30/0x60
    [] trace_hardirqs_on_caller+0x168/0x220
    [] trace_hardirqs_on+0xd/0x10
    [] _raw_spin_unlock_irq+0x30/0x60
    [] swake_up_all+0xb7/0xe0
    [] rcu_gp_kthread+0xab1/0xeb0
    [] ? trace_hardirqs_on_caller+0xff/0x220
    [] ? _raw_spin_unlock_irq+0x41/0x60
    [] ? rcu_barrier+0x20/0x20
    [] kthread+0x104/0x120
    [] ? _raw_spin_unlock_irq+0x30/0x60
    [] ? kthread_create_on_node+0x260/0x260
    [] ret_from_fork+0x3f/0x70
    [] ? kthread_create_on_node+0x260/0x260

    Signed-off-by: Daniel Wagner
    Acked-by: Peter Zijlstra (Intel)
    Cc: linux-rt-users@vger.kernel.org
    Cc: Boqun Feng
    Cc: Marcelo Tosatti
    Cc: Steven Rostedt
    Cc: Paul Gortmaker
    Cc: Paolo Bonzini
    Cc: "Paul E. McKenney"
    Link: http://lkml.kernel.org/r/1455871601-27484-5-git-send-email-wagi@monom.org
    Signed-off-by: Thomas Gleixner

    Daniel Wagner
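
    A hedged userspace sketch of the resulting pattern, with pthreads
    standing in for the kernel primitives and invented names: the state
    update stays under the lock, but the wakeup itself is issued only after
    the lock has been dropped.

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t rnp_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  nocb_wq  = PTHREAD_COND_INITIALIZER;
    static int gp_completed;

    static void gp_cleanup_and_wake(void)
    {
        pthread_cond_t *wq_to_wake;

        pthread_mutex_lock(&rnp_lock);
        gp_completed = 1;               /* state update that needs the lock */
        wq_to_wake = &nocb_wq;          /* note who must be woken ...       */
        pthread_mutex_unlock(&rnp_lock);

        /* ... and wake them only once the lock has been released. */
        pthread_cond_broadcast(wq_to_wake);
    }

    int main(void)
    {
        gp_cleanup_and_wake();
        printf("gp_completed = %d\n", gp_completed);
        return 0;
    }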
     

24 Feb, 2016

7 commits

  • Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • In the patch:

    "rcu: Add transitivity to remaining rcu_node ->lock acquisitions"

    all locking operations on rcu_node::lock were replaced with wrappers
    because of the need for transitivity, which indicates we should never
    write code using LOCK primitives alone (i.e., without a proper barrier
    following) on rcu_node::lock outside those wrappers. We could detect
    this kind of misuse of rcu_node::lock in the future by adding the
    __private modifier to rcu_node::lock.

    To privatize rcu_node::lock, unlock wrappers are also needed. Replacing
    spinlock unlocks with these wrappers not only privatizes rcu_node::lock
    but also makes it easier to figure out the critical sections of rcu_node.

    This patch adds the __private modifier to rcu_node::lock and wraps every
    access to it in ACCESS_PRIVATE(). In addition, unlock wrappers are
    added, and raw_spin_unlock(&rnp->lock) and its friends are replaced with
    those wrappers. A toy illustration of the wrapper pattern follows at the
    end of this entry.

    Signed-off-by: Boqun Feng
    Signed-off-by: Paul E. McKenney

    Boqun Feng
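
    A toy userspace illustration of the privatization pattern described in
    this entry, with a pthread mutex instead of a raw spinlock and invented
    names; the kernel version uses a __private annotation checked by sparse
    together with ACCESS_PRIVATE() and lock/unlock wrappers.

    #include <pthread.h>
    #include <stdio.h>

    struct rnp {
        pthread_mutex_t private_lock;    /* do not use directly */
        int qsmask;
    };

    static void rnp_lock(struct rnp *rnp)
    {
        pthread_mutex_lock(&rnp->private_lock);
        /* The kernel wrappers also add the required ordering barrier here. */
    }

    static void rnp_unlock(struct rnp *rnp)
    {
        pthread_mutex_unlock(&rnp->private_lock);
    }

    int main(void)
    {
        struct rnp rnp = { PTHREAD_MUTEX_INITIALIZER, 0 };

        rnp_lock(&rnp);
        rnp.qsmask |= 1;                 /* critical section is easy to spot */
        rnp_unlock(&rnp);
        printf("qsmask = %d\n", rnp.qsmask);
        return 0;
    }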
     
  • The related warning from gcc 6.0:

    In file included from kernel/rcu/tree.c:4630:0:
    kernel/rcu/tree_plugin.h:810:40: warning: ‘rcu_data_p’ defined but not used [-Wunused-const-variable]
    static struct rcu_data __percpu *const rcu_data_p = &rcu_sched_data;
    ^~~~~~~~~~

    Also remove the always-redundant rcu_data_p in tree.c.

    Signed-off-by: Chen Gang
    Signed-off-by: Paul E. McKenney

    Chen Gang
     
  • Commit e3663b1024d1 ("rcu: Handle gpnum/completed wrap while dyntick
    idle") sets rdp->gpwrap on the wrong side of the "if" statement in
    dyntick_save_progress_counter(), that is, it sets it when the CPU is
    not idle instead of when it is idle. Of course, if the CPU is not idle,
    its rdp->gpnum won't be lagging behind the global rsp->gpnum, which
    means that rdp->gpwrap will never be set.

    This commit therefore moves this code to the proper leg of that "if"
    statement. This change means that the "else" clause is just "return 0"
    and the "then" clause ends with "return 1", so also move the "return 0"
    to follow the "if", dropping the "else" clause.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Commit 4a81e8328d379 ("Reduce overhead of cond_resched() checks for RCU")
    handles the error case where a nohz_full CPU loops indefinitely in the
    kernel with the scheduling-clock interrupt disabled. However, this
    handling includes IPIing the CPU running the offending loop, which is
    not what we want for real-time workloads. And there are starting to be
    real-time CPU-bound in-kernel workloads, and these must be handled
    without IPIing the CPU, at least not in the common case. Therefore,
    this situation can no longer be dismissed as an error case.

    This commit therefore splits the handling out, so that the setting of
    bits in the per-CPU rcu_sched_qs_mask variable is done relatively early,
    but if the problem persists, resched_cpu() is eventually used to IPI the
    CPU containing the offending loop. Assuming that in-kernel CPU-bound
    loops used by real-time tasks contain frequent calls to
    cond_resched_rcu_qs() (as in more than once per few tens of
    milliseconds), the real-time tasks will never be IPIed.

    Signed-off-by: Paul E. McKenney
    Cc: Steven Rostedt

    Paul E. McKenney
     
  • The header comment for rcu_report_qs_rsp() was obsolete, dating well
    before the advent of RCU grace-period kthreads. This commit therefore
    brings this comment back into alignment with current reality.

    Reported-by: Lihao Liang
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • A zero seems to have escaped earlier true/false substitution efforts,
    so this commit changes 0 to false for the ->core_needs_qs boolean field.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

08 Dec, 2015

1 commit