23 Aug, 2016

1 commit

  • The current implementation of expedited grace periods has the user
    task drive the grace period. This works, but has downsides: (1) The
    user task must awaken tasks piggybacking on this grace period, which
    can result in latencies rivaling that of the grace period itself, and
    (2) User tasks can receive signals, which interfere with RCU CPU stall
    warnings.

    This commit therefore uses workqueues to drive the grace periods, so
    that the user task need not do the awakening. A subsequent commit
    will remove the now-unnecessary code allowing for signals.
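
    As a rough sketch of the idea (not the actual patch; the structure and
    function names below are illustrative), the requesting task can queue a
    work item that drives the grace period and then simply sleep until that
    work completes, so the wakeups come from a workqueue worker rather than
    from the requesting user task:

        #include <linux/kernel.h>
        #include <linux/workqueue.h>
        #include <linux/completion.h>

        struct exp_gp_work {
                struct work_struct work;
                struct completion done;
        };

        static void exp_gp_workfn(struct work_struct *wp)
        {
                struct exp_gp_work *egw =
                        container_of(wp, struct exp_gp_work, work);

                /* ... drive the expedited grace period to completion ... */
                complete(&egw->done);            /* awaken the requester */
        }

        static void synchronize_exp_sketch(void)
        {
                struct exp_gp_work egw;

                INIT_WORK_ONSTACK(&egw.work, exp_gp_workfn);
                init_completion(&egw.done);
                queue_work(system_unbound_wq, &egw.work);
                wait_for_completion(&egw.done);  /* uninterruptible: no signals */
                destroy_work_on_stack(&egw.work);
        }

    Because the wait here is uninterruptible, the signal handling that the
    user-task-driven scheme required becomes unnecessary, which is what the
    subsequent commit mentioned above removes.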

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

01 Apr, 2016

2 commits


08 Dec, 2015

1 commit


05 Dec, 2015

2 commits

  • The Kconfig currently controlling compilation of this code is:

    init/Kconfig:config TREE_RCU_TRACE
    init/Kconfig: def_bool RCU_TRACE && ( TREE_RCU || PREEMPT_RCU )

    ...meaning that it currently is not being built as a module by anyone.

    Let's remove the modular code that is essentially orphaned, so that
    when reading the file there is no doubt it is builtin-only.

    Since module_init translates to device_initcall in the non-modular
    case, the init ordering remains unchanged with this commit. We could
    consider moving this to an earlier initcall if desired.

    We don't replace module.h with init.h since the file already has that.
    We also delete the moduleparam.h include that is left over from
    commit 64db4cfff99c04cd5f550357edcc8780f96b54a2 (""Tree RCU": scalable
    classic RCU implementation") since it is not needed here either.

    We morph some tags like MODULE_AUTHOR into the comments at the top of
    the file for documentation purposes.
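
    As a sketch of what such a demodularization typically looks like (the
    function name below is illustrative, not quoted from the patch), the
    module-specific constructs are replaced by their built-in equivalents:

        #include <linux/init.h>

        /* Formerly reachable via module_init(tree_trace_init), with
         * MODULE_AUTHOR() and friends; the authorship information now
         * lives in the comment block at the top of the file. */
        static int __init tree_trace_init(void)
        {
                /* ... create the debugfs tracing files ... */
                return 0;
        }

        /* module_init() maps to device_initcall() in the built-in case,
         * so this keeps the initialization ordering unchanged. */
        device_initcall(tree_trace_init);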

    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Reviewed-by: Josh Triplett
    Cc: Steven Rostedt
    Cc: Mathieu Desnoyers
    Cc: Lai Jiangshan
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Paul E. McKenney

    Paul Gortmaker
     
  • Currently, the piggybacked-work checks carried out by sync_exp_work_done()
    atomically increment a small set of variables (the ->expedited_workdone0,
    ->expedited_workdone1, ->expedited_workdone2, ->expedited_workdone3
    fields in the rcu_state structure), which will form a memory-contention
    bottleneck given a sufficiently large number of CPUs concurrently invoking
    either synchronize_rcu_expedited() or synchronize_sched_expedited().

    This commit therefore moves these four fields to the per-CPU rcu_data
    structure, eliminating the memory contention. The show_rcuexp() function
    is changed accordingly to sum each field across the rcu_data structures.
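
    The shape of the change is roughly as follows (a sketch only; the field
    and function names here are illustrative): each CPU bumps its own
    counter on the hot path, and the statistics code sums across CPUs only
    when the counts are displayed:

        #include <linux/percpu.h>
        #include <linux/atomic.h>
        #include <linux/cpumask.h>

        struct exp_stats {                       /* per-CPU: no contention */
                atomic_long_t exp_workdone1;
                atomic_long_t exp_workdone2;
        };
        static DEFINE_PER_CPU(struct exp_stats, exp_stats);

        static void record_workdone1(void)       /* hot path: local increment */
        {
                atomic_long_inc(&this_cpu_ptr(&exp_stats)->exp_workdone1);
        }

        static unsigned long sum_workdone1(void) /* show_rcuexp() analog */
        {
                unsigned long sum = 0;
                int cpu;

                for_each_possible_cpu(cpu)
                        sum += atomic_long_read(
                                &per_cpu_ptr(&exp_stats, cpu)->exp_workdone1);
                return sum;
        }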

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

24 Nov, 2015

1 commit

  • The rule is that all acquisitions of the rcu_node structure's ->lock
    must provide transitivity: The lock is not acquired that frequently,
    and sorting out exactly which acquisitions required it and which did not
    would be a maintenance nightmare. This commit therefore supplies the needed
    transitivity to the remaining ->lock acquisitions.
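
    Concretely, the idiom is to follow the lock acquisition with
    smp_mb__after_unlock_lock(), which upgrades it to provide full
    (transitive) ordering; a minimal sketch, along the lines of what the
    kernel later wrapped up in raw_spin_lock_rcu_node() and friends:

        static inline void rcu_node_lock_sketch(struct rcu_node *rnp)
        {
                raw_spin_lock(&rnp->lock);
                smp_mb__after_unlock_lock();    /* Provide transitivity. */
        }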

    Reported-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

08 Oct, 2015

1 commit


07 Oct, 2015

1 commit

  • Commit 4cdfc175c25c89ee ("rcu: Move quiescent-state forcing
    into kthread") started the process of folding the old ->fqs_state into
    ->gp_state, but did not complete it. This situation does not cause
    any malfunction, but can result in extremely confusing trace output.
    This commit completes this task of eliminating ->fqs_state in favor
    of ->gp_state.

    The old ->fqs_state was also used to decide when to collect dyntick-idle
    snapshots. For this purpose, this commit adds a boolean variable to the
    grace-period kthread, which is set for the first call to rcu_gp_fqs() in
    a given grace period and cleared otherwise, as in the sketch below.
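
    A minimal sketch of that control flow (the names here are illustrative,
    not the kernel's own):

        #include <linux/types.h>

        static void gp_fqs_pass(bool first_time)
        {
                if (first_time) {
                        /* Snapshot each CPU's dyntick-idle state. */
                } else {
                        /* Recheck the snapshots and force quiescent states. */
                }
        }

        static void gp_fqs_loop_sketch(int fqs_passes)
        {
                bool first_gp_fqs = true;       /* first FQS pass of this GP? */

                while (fqs_passes-- > 0) {      /* until the GP completes */
                        gp_fqs_pass(first_gp_fqs);
                        first_gp_fqs = false;
                }
        }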

    Signed-off-by: Petr Mladek
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Petr Mladek
     

21 Sep, 2015

3 commits

  • This commit converts the rcu_data structure's ->cpu_no_qs field
    to a union. The bytewise side of this union allows individual access
    to indications as to whether this CPU needs to find a quiescent state
    for a normal (.norm) and/or expedited (.exp) grace period. The setwise
    side of the union allows testing whether or not a quiescent state is
    needed at all, for either type of grace period.

    For now, only .norm is used. A later commit will introduce the expedited
    usage.
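
    The definition is along the following lines (a sketch that follows the
    commit text; see kernel/rcu/tree.h for the authoritative version):

        #include <linux/types.h>

        union rcu_noqs {
                struct {
                        u8 norm;        /* QS still needed for a normal GP? */
                        u8 exp;         /* QS still needed for an expedited GP? */
                } b;                    /* Bytewise access. */
                u16 s;                  /* Setwise access. */
        };

    With this layout, rdp->cpu_no_qs.b.norm addresses the normal-grace-period
    indication by itself, while a single test of rdp->cpu_no_qs.s answers
    "is any quiescent state needed at all?"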

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit inverts the sense of the rcu_data structure's ->passed_quiesce
    field and renames it to ->cpu_no_qs. This will allow a later commit to
    use an "aggregate OR" operation to test expedited as well as normal grace
    periods without added overhead.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • An upcoming commit needs to invert the sense of the ->passed_quiesce
    rcu_data structure field, so this commit is taking this opportunity
    to clarify things a bit by renaming ->qs_pending to ->core_needs_qs.

    So if !rdp->core_needs_qs, then this CPU need not concern itself with
    quiescent states; in particular, it need not acquire its leaf rcu_node
    structure's ->lock to check. Otherwise, it needs to report the next
    quiescent state.
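
    In other words, the quiescent-state bookkeeping can begin with a cheap
    lockless check along these lines (a sketch, not the literal kernel code):

        if (!rdp->core_needs_qs)
                return;         /* RCU core needs no QS from this CPU. */

        /* Otherwise, report the next quiescent state, which may involve
         * acquiring the leaf rcu_node structure's ->lock. */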

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

18 Jul, 2015

6 commits

  • In the common case, there will be only one expedited grace period in
    the system at a given time, in which case it is not helpful to use
    funnel locking. This commit therefore adds a fastpath that bypasses
    funnel locking when the root ->exp_funnel_mutex is not held.
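
    A sketch of such a fastpath (simplified; rnp_root here names the root
    rcu_node structure, and the real code also rechecks whether the needed
    grace period has already completed):

        /* Uncontended case: if nobody holds the root-level mutex, take it
         * directly and skip the leaf-to-root funnel walk. */
        if (mutex_trylock(&rnp_root->exp_funnel_mutex))
                return rnp_root;        /* caller now holds the root mutex */

        /* Contended case: fall back to funnel locking from this CPU's leaf. */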

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The strictly rcu_node based funnel-locking scheme works well in many
    cases, but systems with CONFIG_RCU_FANOUT_LEAF=64 won't necessarily get
    all that much concurrency. This commit therefore extends the funnel
    locking into the per-CPU rcu_data structure, providing concurrency equal
    to the number of CPUs.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_seq operations were open-coded in _rcu_barrier(), so this commit
    replaces the open-coding with the shiny new rcu_seq operations.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Sequentially stopping the CPUs slows down expedited grace periods by
    at least a factor of two, based on rcutorture's grace-period-per-second
    rate. This is a conservative measure because rcutorture uses unusually
    long RCU read-side critical sections and because rcutorture periodically
    quiesces the system in order to test RCU's ability to ramp down to and
    up from the idle state. This commit therefore replaces stop_one_cpu()
    with stop_one_cpu_nowait(), using an atomic-counter scheme to determine
    when all CPUs have passed through the stopped state.
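
    The completion-detection scheme can be sketched as follows (illustrative
    only): initialize an atomic counter to the number of CPUs being stopped,
    have each stopper callback decrement it, and have the last CPU to finish
    wake up the waiting task:

        #include <linux/atomic.h>
        #include <linux/wait.h>

        static atomic_t cpus_remaining;         /* CPUs not yet stopped */
        static DECLARE_WAIT_QUEUE_HEAD(exp_stop_wq);

        static int exp_stop_cpu(void *arg)      /* runs on each stopped CPU */
        {
                /* ... per-CPU expedited quiescent-state work ... */
                if (atomic_dec_and_test(&cpus_remaining))
                        wake_up(&exp_stop_wq);  /* last CPU reports in */
                return 0;
        }

        static void wait_for_all_cpus_sketch(int ncpus)
        {
                atomic_set(&cpus_remaining, ncpus);
                /* ... stop_one_cpu_nowait(cpu, exp_stop_cpu, NULL, &work)
                 *     for each CPU, rather than sequential stop_one_cpu() ... */
                wait_event(exp_stop_wq, !atomic_read(&cpus_remaining));
        }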

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney

    Peter Zijlstra
     
  • This commit gets rid of synchronize_sched_expedited()'s mutex_trylock()
    polling loop in favor of a funnel-locking scheme based on the rcu_node
    tree. The work-done check is done at each level of the tree, allowing
    high-contention situations to be resolved quickly with reasonable levels
    of mutex contention.
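
    Funnel locking here means, roughly, that each task starts at its leaf
    rcu_node structure and acquires ->exp_funnel_mutex level by level toward
    the root, releasing the lower-level mutex once the next level up is held,
    and abandoning the climb as soon as the work-done check shows that some
    other task's grace period already covers it. A sketch (simplified, with
    a hypothetical work_already_done() standing in for that check):

        static bool work_already_done(unsigned long snap);      /* hypothetical */

        /* Returns with the root-level mutex held, or NULL if the needed
         * grace period completed while climbing the tree. */
        static struct rcu_node *exp_funnel_lock_sketch(struct rcu_node *rnp_leaf,
                                                       unsigned long snap)
        {
                struct rcu_node *rnp = rnp_leaf;

                mutex_lock(&rnp->exp_funnel_mutex);
                for (;;) {
                        if (work_already_done(snap)) {
                                mutex_unlock(&rnp->exp_funnel_mutex);
                                return NULL;    /* piggyback on the other GP */
                        }
                        if (!rnp->parent)
                                return rnp;     /* at the root: we drive the GP */
                        mutex_lock(&rnp->parent->exp_funnel_mutex);
                        mutex_unlock(&rnp->exp_funnel_mutex);
                        rnp = rnp->parent;
                }
        }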

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Now that synchronize_sched_expedited() has a mutex, it can use a simpler
    work-already-done detection scheme. This commit simplifies this scheme
    by using something similar to the sequence-locking counter scheme.
    A counter is incremented before and after each grace period, so that
    the counter is odd in the midst of the grace period and even otherwise.
    So if the counter has advanced to the second even number that is
    greater than or equal to the snapshot, the required grace period has
    already happened.
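
    The arithmetic can be sketched as follows (this mirrors the rcu_seq
    helpers in spirit; the kernel's versions also handle counter wrap):

        #include <linux/types.h>

        static unsigned long gpseq;     /* odd: GP in progress; even: idle */

        static void seq_start(void) { gpseq++; /* now odd  */ }
        static void seq_end(void)   { gpseq++; /* now even */ }

        /* Value the counter must reach before the caller's need is met:
         * the second even number greater than or equal to the current
         * counter value. */
        static unsigned long seq_snap(void)
        {
                return (gpseq + 3) & ~0x1UL;
        }

        /* Has a sufficient grace period completed since seq_snap()? */
        static bool seq_done(unsigned long s)
        {
                return gpseq >= s;      /* kernel compares wrap-safely */
        }

    For example, if gpseq is 4 (idle), seq_snap() returns 6: the grace period
    that ends by advancing the counter from 5 to 6 must start after the
    snapshot, so it suffices. If gpseq is 5 (a grace period already in
    flight), seq_snap() returns 8, skipping the in-flight grace period, which
    may have begun too early to cover the caller's updates.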

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

28 May, 2015

1 commit


13 Mar, 2015

1 commit

  • Races between CPU hotplug and grace periods can be difficult to resolve,
    so the ->onoff_mutex is used to exclude the two events. Unfortunately,
    this means that it is impossible for an outgoing CPU to perform the
    last bits of its offlining from its last pass through the idle loop,
    because sleeplocks cannot be acquired in that context.

    This commit avoids these problems by buffering online and offline events
    in a new ->qsmaskinitnext field in the leaf rcu_node structures. When a
    grace period starts, the events accumulated in this mask are applied to
    the ->qsmaskinit field, and, if needed, up the rcu_node tree. The special
    case of all CPUs corresponding to a given leaf rcu_node structure being
    offline while there are still elements in that structure's ->blkd_tasks
    list is handled using a new ->wait_blkd_tasks field. In this case,
    propagating the offline bits up the tree is deferred until the beginning
    of the grace period after all of the tasks have exited their RCU read-side
    critical sections and removed themselves from the list, at which point
    the ->wait_blkd_tasks flag is cleared. If one of that leaf rcu_node
    structure's CPUs comes back online before the list empties, then the
    ->wait_blkd_tasks flag is simply cleared.
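
    Stripped of the ->blkd_tasks special case and of propagation up the
    rcu_node tree (and with locking omitted), the buffering amounts to
    something like this sketch, with field names following the commit text:

        struct leaf_rnp_sketch {
                unsigned long qsmaskinit;      /* CPUs this GP must wait for   */
                unsigned long qsmaskinitnext;  /* buffered online/offline state */
        };

        /* CPU-hotplug path: no grace-period exclusion, no sleeplocks. */
        static void sketch_cpu_online(struct leaf_rnp_sketch *rnp,
                                      unsigned long mask)
        {
                rnp->qsmaskinitnext |= mask;
        }

        static void sketch_cpu_offline(struct leaf_rnp_sketch *rnp,
                                       unsigned long mask)
        {
                rnp->qsmaskinitnext &= ~mask;
        }

        /* Grace-period initialization: apply the buffered events. */
        static void sketch_gp_start(struct leaf_rnp_sketch *rnp)
        {
                rnp->qsmaskinit = rnp->qsmaskinitnext;
        }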

    This of course means that RCU's notion of which CPUs are offline can be
    out of date. This is OK because RCU need only wait on CPUs that were
    online at the time that the grace period started. In addition, RCU's
    force-quiescent-state actions will handle the case where a CPU goes
    offline after the grace period starts.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

16 Jan, 2015

1 commit

  • Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
    in places where it would be useful for it to apply to the normal RCU
    flavors, rcu_preempt, rcu_sched, and rcu_bh. This is especially the
    case for workloads that aggressively overload the system, particularly
    those that generate large numbers of RCU updates on systems running
    NO_HZ_FULL CPUs. This commit therefore communicates quiescent states
    from cond_resched_rcu_qs() to the normal RCU flavors.
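
    In outline, the mechanism is a per-CPU counter that cond_resched_rcu_qs()
    increments and that the grace-period machinery snapshots at the start of
    each grace period; any change relative to the snapshot proves that the
    CPU passed through the quiescent point. A sketch (names approximate):

        #include <linux/types.h>
        #include <linux/percpu.h>

        static DEFINE_PER_CPU(unsigned long, qs_ctr);   /* bumped at each QS */

        /* Called from cond_resched_rcu_qs(): note a quiescent state cheaply. */
        static inline void note_quiescent_state(void)
        {
                this_cpu_inc(qs_ctr);
        }

        /* Grace-period side: has this CPU passed a quiescent state since
         * the snapshot (cf. ->rcu_qs_ctr_snap below) was taken? */
        static inline bool cpu_passed_qs(int cpu, unsigned long snap)
        {
                return per_cpu(qs_ctr, cpu) != snap;
        }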

    Note that it is unfortunately necessary to leave the old ->passed_quiesce
    mechanism in place to allow quiescent states that apply to only one
    flavor to be recorded. (Yes, we could decrement ->rcu_qs_ctr_snap in
    that case, but that is not so good for debugging of RCU internals.)
    In addition, if one of the RCU flavors' grace periods has stalled, this
    will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight
    quiescent state visible from other CPUs.

    Reported-by: Sasha Levin
    Reported-by: Dave Jones
    Signed-off-by: Paul E. McKenney
    [ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu()
    was used in preemptible code. ]

    Paul E. McKenney
     

18 Feb, 2014

2 commits

  • All of the RCU source files have the usual GPL header, which contains a
    long-obsolete postal address for FSF. To avoid the need to track the
    FSF office's movements, this commit substitutes the URL where GPL may
    be found.

    Reported-by: Greg KH
    Reported-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The ->n_force_qs_lh field is accessed without the benefit of any
    synchronization, so this commit adds the needed ACCESS_ONCE() wrappers.
    Yes, increments to ->n_force_qs_lh can be lost, but contention should
    be low and the field is strictly statistical in nature, so this is not
    a problem.
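
    The change amounts to converting plain accesses into marked ones, along
    these lines (illustrative; today's kernels would spell this with
    READ_ONCE()/WRITE_ONCE()):

        /* Before: a plain, unmarked read-modify-write. */
        rsp->n_force_qs_lh++;

        /* After: tell the compiler that this statistical counter is
         * accessed without synchronization; lost updates are acceptable. */
        ACCESS_ONCE(rsp->n_force_qs_lh)++;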

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

04 Dec, 2013

1 commit

  • Dave Jones got the following lockdep splat:

    > ======================================================
    > [ INFO: possible circular locking dependency detected ]
    > 3.12.0-rc3+ #92 Not tainted
    > -------------------------------------------------------
    > trinity-child2/15191 is trying to acquire lock:
    > (&rdp->nocb_wq){......}, at: [] __wake_up+0x23/0x50
    >
    > but task is already holding lock:
    > (&ctx->lock){-.-...}, at: [] perf_event_exit_task+0x109/0x230
    >
    > which lock already depends on the new lock.
    >
    >
    > the existing dependency chain (in reverse order) is:
    >
    > -> #3 (&ctx->lock){-.-...}:
    > [] lock_acquire+0x93/0x200
    > [] _raw_spin_lock+0x40/0x80
    > [] __perf_event_task_sched_out+0x2df/0x5e0
    > [] perf_event_task_sched_out+0x93/0xa0
    > [] __schedule+0x1d2/0xa20
    > [] preempt_schedule_irq+0x50/0xb0
    > [] retint_kernel+0x26/0x30
    > [] tty_flip_buffer_push+0x34/0x50
    > [] pty_write+0x54/0x60
    > [] n_tty_write+0x32d/0x4e0
    > [] tty_write+0x158/0x2d0
    > [] vfs_write+0xc0/0x1f0
    > [] SyS_write+0x4c/0xa0
    > [] tracesys+0xdd/0xe2
    >
    > -> #2 (&rq->lock){-.-.-.}:
    > [] lock_acquire+0x93/0x200
    > [] _raw_spin_lock+0x40/0x80
    > [] wake_up_new_task+0xc2/0x2e0
    > [] do_fork+0x126/0x460
    > [] kernel_thread+0x26/0x30
    > [] rest_init+0x23/0x140
    > [] start_kernel+0x3f6/0x403
    > [] x86_64_start_reservations+0x2a/0x2c
    > [] x86_64_start_kernel+0xf1/0xf4
    >
    > -> #1 (&p->pi_lock){-.-.-.}:
    > [] lock_acquire+0x93/0x200
    > [] _raw_spin_lock_irqsave+0x4b/0x90
    > [] try_to_wake_up+0x31/0x350
    > [] default_wake_function+0x12/0x20
    > [] autoremove_wake_function+0x18/0x40
    > [] __wake_up_common+0x58/0x90
    > [] __wake_up+0x39/0x50
    > [] __call_rcu_nocb_enqueue+0xa8/0xc0
    > [] __call_rcu+0x140/0x820
    > [] call_rcu+0x1d/0x20
    > [] cpu_attach_domain+0x287/0x360
    > [] build_sched_domains+0xe5e/0x10a0
    > [] sched_init_smp+0x3b7/0x47a
    > [] kernel_init_freeable+0xf6/0x202
    > [] kernel_init+0xe/0x190
    > [] ret_from_fork+0x7c/0xb0
    >
    > -> #0 (&rdp->nocb_wq){......}:
    > [] __lock_acquire+0x191a/0x1be0
    > [] lock_acquire+0x93/0x200
    > [] _raw_spin_lock_irqsave+0x4b/0x90
    > [] __wake_up+0x23/0x50
    > [] __call_rcu_nocb_enqueue+0xa8/0xc0
    > [] __call_rcu+0x140/0x820
    > [] kfree_call_rcu+0x20/0x30
    > [] put_ctx+0x4f/0x70
    > [] perf_event_exit_task+0x12e/0x230
    > [] do_exit+0x30d/0xcc0
    > [] do_group_exit+0x4c/0xc0
    > [] SyS_exit_group+0x14/0x20
    > [] tracesys+0xdd/0xe2
    >
    > other info that might help us debug this:
    >
    > Chain exists of:
    > &rdp->nocb_wq --> &rq->lock --> &ctx->lock
    >
    > Possible unsafe locking scenario:
    >
    >        CPU0                    CPU1
    >        ----                    ----
    >   lock(&ctx->lock);
    >                                lock(&rq->lock);
    >                                lock(&ctx->lock);
    >   lock(&rdp->nocb_wq);
    >
    > *** DEADLOCK ***
    >
    > 1 lock held by trinity-child2/15191:
    > #0: (&ctx->lock){-.-...}, at: [] perf_event_exit_task+0x109/0x230
    >
    > stack backtrace:
    > CPU: 2 PID: 15191 Comm: trinity-child2 Not tainted 3.12.0-rc3+ #92
    > ffffffff82565b70 ffff880070c2dbf8 ffffffff8172a363 ffffffff824edf40
    > ffff880070c2dc38 ffffffff81726741 ffff880070c2dc90 ffff88022383b1c0
    > ffff88022383aac0 0000000000000000 ffff88022383b188 ffff88022383b1c0
    > Call Trace:
    > [] dump_stack+0x4e/0x82
    > [] print_circular_bug+0x200/0x20f
    > [] __lock_acquire+0x191a/0x1be0
    > [] ? get_lock_stats+0x19/0x60
    > [] ? native_sched_clock+0x24/0x80
    > [] lock_acquire+0x93/0x200
    > [] ? __wake_up+0x23/0x50
    > [] _raw_spin_lock_irqsave+0x4b/0x90
    > [] ? __wake_up+0x23/0x50
    > [] __wake_up+0x23/0x50
    > [] __call_rcu_nocb_enqueue+0xa8/0xc0
    > [] __call_rcu+0x140/0x820
    > [] ? local_clock+0x3f/0x50
    > [] kfree_call_rcu+0x20/0x30
    > [] put_ctx+0x4f/0x70
    > [] perf_event_exit_task+0x12e/0x230
    > [] do_exit+0x30d/0xcc0
    > [] ? trace_hardirqs_on_caller+0x115/0x1e0
    > [] ? trace_hardirqs_on+0xd/0x10
    > [] do_group_exit+0x4c/0xc0
    > [] SyS_exit_group+0x14/0x20
    > [] tracesys+0xdd/0xe2

    The underlying problem is that perf is invoking call_rcu() with the
    scheduler locks held, but in NOCB mode, call_rcu() will with high
    probability invoke the scheduler -- which just might want to use its
    locks. The reason that call_rcu() needs to invoke the scheduler is
    to wake up the corresponding rcuo callback-offload kthread, which
    does the job of starting up a grace period and invoking the callbacks
    afterwards.

    One solution (championed on a related problem by Lai Jiangshan) is to
    simply defer the wakeup to some point where scheduler locks are no longer
    held. Since we don't want to unnecessarily incur the cost of such
    deferral, the task before us is threefold:

    1. Determine when it is likely that a relevant scheduler lock is held.

    2. Defer the wakeup in such cases.

    3. Ensure that all deferred wakeups eventually happen, preferably
    sooner rather than later.

    We use irqs_disabled_flags() as a proxy for relevant scheduler locks
    being held. This works because the relevant locks are always acquired
    with interrupts disabled. We may defer more often than needed, but that
    is at least safe.

    The wakeup deferral is tracked via a new field in the per-CPU and
    per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup.

    This flag is checked by the RCU core processing. The __rcu_pending()
    function now checks this flag, which causes rcu_check_callbacks()
    to initiate RCU core processing at each scheduling-clock interrupt
    where this flag is set. Of course this is not sufficient because
    scheduling-clock interrupts are often turned off (the things we used to
    be able to count on!). So the flags are also checked on entry to any
    state that RCU considers to be idle, which includes both NO_HZ_IDLE idle
    state and NO_HZ_FULL user-mode-execution state.
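
    Putting the pieces together, the enqueue-time decision is roughly the
    following (a sketch, not the literal kernel code):

        /* call_rcu() path, after queuing the callback for the rcuo kthread;
         * interrupts are disabled here and flags holds the caller's state. */
        if (irqs_disabled_flags(flags)) {
                rdp->nocb_defer_wakeup = true;  /* scheduler locks may be held */
        } else {
                wake_up(&rdp->nocb_wq);         /* safe to wake immediately */
        }

        /* Later, from RCU core processing or on entry to an idle state: */
        if (rdp->nocb_defer_wakeup) {
                rdp->nocb_defer_wakeup = false;
                wake_up(&rdp->nocb_wq);
        }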

    This approach should allow call_rcu() to be invoked regardless of what
    locks you might be holding, the key word being "should".

    Reported-by: Dave Jones
    Signed-off-by: Paul E. McKenney
    Cc: Peter Zijlstra

    Paul E. McKenney
     

16 Oct, 2013

1 commit