15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. The fix in commit
    5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a
    good example of the nasty type of bugs that can be created with
    improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the drivers/rcu uses of the __cpuinit macros
    from all C files.
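
    Such a conversion is typically a one-line change per function; an
    illustrative example (not taken verbatim from the patch) is:

        -static int __cpuinit rcu_cpu_notify(struct notifier_block *self,
        +static int rcu_cpu_notify(struct notifier_block *self,
                                   unsigned long action, void *hcpu)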

    [1] https://lkml.org/lkml/2013/5/20/589

    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Dipankar Sarma
    Reviewed-by: Josh Triplett
    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

04 Jul, 2013

1 commit


11 Jun, 2013

13 commits

  • …u.2013.06.10a' and 'tiny.2013.06.10a' into HEAD

    cbnum.2013.06.10a: Apply simplifications stemming from the new callback
    numbering.

    doc.2013.06.10a: Documentation updates.

    fixes.2013.06.10a: Miscellaneous fixes.

    srcu.2013.06.10a: Updates to SRCU.

    tiny.2013.06.10a: Eliminate TINY_PREEMPT_RCU.

    Paul E. McKenney
     
  • Systems with HZ=100 can have slow bootup times due to the default
    three-jiffy delays between quiescent-state forcing attempts. This
    commit therefore auto-tunes the RCU_JIFFIES_TILL_FORCE_QS value based
    on the value of HZ. However, this alone would break very large systems
    that require more time between quiescent-state forcing attempts. This
    commit therefore also increases the default delay by one jiffy for
    each 256 CPUs that might be on the system (based on nr_cpu_ids at
    runtime, -not- NR_CPUS at build time).

    Updated to collapse #ifdefs for RCU_JIFFIES_TILL_FORCE_QS into a
    step-function definition as suggested by Josh Triplett.
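
    A minimal sketch of what such a step-function definition might look
    like (the thresholds and values below are illustrative assumptions,
    not necessarily the merged code):

        #if HZ > 500
        #define RCU_JIFFIES_TILL_FORCE_QS 3   /* fast HZ: keep ~3-jiffy delay */
        #elif HZ > 250
        #define RCU_JIFFIES_TILL_FORCE_QS 2
        #else
        #define RCU_JIFFIES_TILL_FORCE_QS 1   /* HZ=100: force QS sooner */
        #endif

        /* Plus roughly one extra jiffy per 256 possible CPUs, computed
         * from nr_cpu_ids at boot time rather than from NR_CPUS. */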

    Reported-by: Paul Mackerras
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • __rcu_process_callbacks() invokes note_gp_changes() immediately
    before invoking rcu_check_quiescent_state(), which conditionally
    invokes that same function. This commit therefore eliminates the
    call to note_gp_changes() in __rcu_process_callbacks() in favor of
    making the call from rcu_check_quiescent_state() to note_gp_changes()
    unconditional.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Given the changes that introduce note_gp_changes(), rcu_start_gp_per_cpu()
    is now a trivial wrapper function with only one caller. This commit
    therefore inlines it into its sole call site.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • One of the calls to check_for_new_grace_period() is now redundant due to
    an immediately preceding call to note_gp_changes(). Eliminating this
    redundant call leaves a single caller, which is simpler if inlined.
    This commit therefore eliminates the redundant call and inlines the
    body of check_for_new_grace_period() into the single remaining call site.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This commit eliminates some duplicated code by merging
    __rcu_process_gp_end() into __note_gp_changes().

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Because note_gp_changes() now incorporates the functionality of
    rcu_process_gp_end(), this commit switches callers to the former and
    eliminates the latter. In addition, this commit changes external
    calls from __rcu_process_gp_end() to __note_gp_changes().

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Because note_new_gpnum() now also checks for the ends of old grace periods,
    this commit changes its name to note_gp_changes(). Later commits will merge
    rcu_process_gp_end() into note_gp_changes().

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The current implementation can detect the beginning of a new grace period
    before noting the end of a previous grace period. Although the current
    implementation correctly handles this sort of nonsense, it would be
    good to reduce RCU's state space by making such nonsense unnecessary,
    which is now possible thanks to the fact that RCU's callback groups are
    now numbered.

    This commit therefore makes __note_new_gpnum() invoke
    __rcu_process_gp_end() in order to note the ends of prior grace
    periods before noting the beginnings of new grace periods.
    Of course, this now means that note_new_gpnum() notes both the
    beginnings and ends of grace periods, and could therefore be
    used in place of rcu_process_gp_end(). But that is a job for
    later commits.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The addition of callback numbering allows combining the detection of the
    ends of old grace periods and the beginnings of new grace periods. This
    commit moves code to set the stage for this combining.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This commit converts printk() calls to the corresponding pr_*() calls.
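
    An illustrative example of this kind of conversion (the message text
    here is made up):

        -    printk(KERN_ERR "rcu: grace period stall detected\n");
        +    pr_err("rcu: grace period stall detected\n");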

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • In Steven Rostedt's words:

    > I've been debugging the last couple of days why my tests have been
    > locking up. One of my tracing tests runs all available tracers. The
    > lockup always happened with the mmiotrace, which is used to trace
    > interactions between priority drivers and the kernel. But to do this
    > easily, when the tracer gets registered, it disables all but the boot
    > CPUs. The lockup always happened after it got done disabling the CPUs.
    >
    > Then I decided to try this:
    >
    > while :; do
    > for i in 1 2 3; do
    > echo 0 > /sys/devices/system/cpu/cpu$i/online
    > done
    > for i in 1 2 3; do
    > echo 1 > /sys/devices/system/cpu/cpu$i/online
    > done
    > done
    >
    > Well, sure enough, that locked up too, with the same users. Doing a
    > sysrq-w (showing all blocked tasks):
    >
    > [ 2991.344562] task PC stack pid father
    > [ 2991.344562] rcu_preempt D ffff88007986fdf8 0 10 2 0x00000000
    > [ 2991.344562] ffff88007986fc98 0000000000000002 ffff88007986fc48 0000000000000908
    > [ 2991.344562] ffff88007986c280 ffff88007986ffd8 ffff88007986ffd8 00000000001d3c80
    > [ 2991.344562] ffff880079248a40 ffff88007986c280 0000000000000000 00000000fffd4295
    > [ 2991.344562] Call Trace:
    > [ 2991.344562] [] schedule+0x64/0x66
    > [ 2991.344562] [] schedule_timeout+0xbc/0xf9
    > [ 2991.344562] [] ? ftrace_call+0x5/0x2f
    > [ 2991.344562] [] ? cascade+0xa8/0xa8
    > [ 2991.344562] [] schedule_timeout_uninterruptible+0x1e/0x20
    > [ 2991.344562] [] rcu_gp_kthread+0x502/0x94b
    > [ 2991.344562] [] ? __init_waitqueue_head+0x50/0x50
    > [ 2991.344562] [] ? rcu_gp_fqs+0x64/0x64
    > [ 2991.344562] [] kthread+0xb1/0xb9
    > [ 2991.344562] [] ? lock_release_holdtime.part.23+0x4e/0x55
    > [ 2991.344562] [] ? __init_kthread_worker+0x58/0x58
    > [ 2991.344562] [] ret_from_fork+0x7c/0xb0
    > [ 2991.344562] [] ? __init_kthread_worker+0x58/0x58
    > [ 2991.344562] kworker/0:1 D ffffffff81a30680 0 47 2 0x00000000
    > [ 2991.344562] Workqueue: events cpuset_hotplug_workfn
    > [ 2991.344562] ffff880078dbbb58 0000000000000002 0000000000000006 00000000000000d8
    > [ 2991.344562] ffff880078db8100 ffff880078dbbfd8 ffff880078dbbfd8 00000000001d3c80
    > [ 2991.344562] ffff8800779ca5c0 ffff880078db8100 ffffffff81541fcf 0000000000000000
    > [ 2991.344562] Call Trace:
    > [ 2991.344562] [] ? __mutex_lock_common+0x3d4/0x609
    > [ 2991.344562] [] schedule+0x64/0x66
    > [ 2991.344562] [] schedule_preempt_disabled+0x18/0x24
    > [ 2991.344562] [] __mutex_lock_common+0x3d4/0x609
    > [ 2991.344562] [] ? get_online_cpus+0x3c/0x50
    > [ 2991.344562] [] ? get_online_cpus+0x3c/0x50
    > [ 2991.344562] [] mutex_lock_nested+0x3b/0x40
    > [ 2991.344562] [] get_online_cpus+0x3c/0x50
    > [ 2991.344562] [] rebuild_sched_domains_locked+0x6e/0x3a8
    > [ 2991.344562] [] rebuild_sched_domains+0x1c/0x2a
    > [ 2991.344562] [] cpuset_hotplug_workfn+0x1c7/0x1d3
    > [ 2991.344562] [] ? cpuset_hotplug_workfn+0x5/0x1d3
    > [ 2991.344562] [] process_one_work+0x2d4/0x4d1
    > [ 2991.344562] [] ? process_one_work+0x207/0x4d1
    > [ 2991.344562] [] worker_thread+0x2e7/0x3b5
    > [ 2991.344562] [] ? rescuer_thread+0x332/0x332
    > [ 2991.344562] [] kthread+0xb1/0xb9
    > [ 2991.344562] [] ? __init_kthread_worker+0x58/0x58
    > [ 2991.344562] [] ret_from_fork+0x7c/0xb0
    > [ 2991.344562] [] ? __init_kthread_worker+0x58/0x58
    > [ 2991.344562] bash D ffffffff81a4aa80 0 2618 2612 0x10000000
    > [ 2991.344562] ffff8800379abb58 0000000000000002 0000000000000006 0000000000000c2c
    > [ 2991.344562] ffff880077fea140 ffff8800379abfd8 ffff8800379abfd8 00000000001d3c80
    > [ 2991.344562] ffff8800779ca5c0 ffff880077fea140 ffffffff81541fcf 0000000000000000
    > [ 2991.344562] Call Trace:
    > [ 2991.344562] [] ? __mutex_lock_common+0x3d4/0x609
    > [ 2991.344562] [] schedule+0x64/0x66
    > [ 2991.344562] [] schedule_preempt_disabled+0x18/0x24
    > [ 2991.344562] [] __mutex_lock_common+0x3d4/0x609
    > [ 2991.344562] [] ? rcu_cpu_notify+0x2f5/0x86e
    > [ 2991.344562] [] ? rcu_cpu_notify+0x2f5/0x86e
    > [ 2991.344562] [] mutex_lock_nested+0x3b/0x40
    > [ 2991.344562] [] rcu_cpu_notify+0x2f5/0x86e
    > [ 2991.344562] [] ? __lock_is_held+0x32/0x53
    > [ 2991.344562] [] notifier_call_chain+0x6b/0x98
    > [ 2991.344562] [] __raw_notifier_call_chain+0xe/0x10
    > [ 2991.344562] [] __cpu_notify+0x20/0x32
    > [ 2991.344562] [] cpu_notify_nofail+0x17/0x36
    > [ 2991.344562] [] _cpu_down+0x154/0x259
    > [ 2991.344562] [] cpu_down+0x2d/0x3a
    > [ 2991.344562] [] store_online+0x4e/0xe7
    > [ 2991.344562] [] dev_attr_store+0x20/0x22
    > [ 2991.344562] [] sysfs_write_file+0x108/0x144
    > [ 2991.344562] [] vfs_write+0xfd/0x158
    > [ 2991.344562] [] SyS_write+0x5c/0x83
    > [ 2991.344562] [] tracesys+0xdd/0xe2
    >
    > As well as held locks:
    >
    > [ 3034.728033] Showing all locks held in the system:
    > [ 3034.728033] 1 lock held by rcu_preempt/10:
    > [ 3034.728033] #0: (rcu_preempt_state.onoff_mutex){+.+...}, at: [] rcu_gp_kthread+0x167/0x94b
    > [ 3034.728033] 4 locks held by kworker/0:1/47:
    > [ 3034.728033] #0: (events){.+.+.+}, at: [] process_one_work+0x207/0x4d1
    > [ 3034.728033] #1: (cpuset_hotplug_work){+.+.+.}, at: [] process_one_work+0x207/0x4d1
    > [ 3034.728033] #2: (cpuset_mutex){+.+.+.}, at: [] rebuild_sched_domains+0x17/0x2a
    > [ 3034.728033] #3: (cpu_hotplug.lock){+.+.+.}, at: [] get_online_cpus+0x3c/0x50
    > [ 3034.728033] 1 lock held by mingetty/2563:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    > [ 3034.728033] 1 lock held by mingetty/2565:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    > [ 3034.728033] 1 lock held by mingetty/2569:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    > [ 3034.728033] 1 lock held by mingetty/2572:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    > [ 3034.728033] 1 lock held by mingetty/2575:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    > [ 3034.728033] 7 locks held by bash/2618:
    > [ 3034.728033] #0: (sb_writers#5){.+.+.+}, at: [] file_start_write+0x2a/0x2c
    > [ 3034.728033] #1: (&buffer->mutex#2){+.+.+.}, at: [] sysfs_write_file+0x3c/0x144
    > [ 3034.728033] #2: (s_active#54){.+.+.+}, at: [] sysfs_write_file+0xe7/0x144
    > [ 3034.728033] #3: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [] cpu_hotplug_driver_lock+0x17/0x19
    > [ 3034.728033] #4: (cpu_add_remove_lock){+.+.+.}, at: [] cpu_maps_update_begin+0x17/0x19
    > [ 3034.728033] #5: (cpu_hotplug.lock){+.+.+.}, at: [] cpu_hotplug_begin+0x2c/0x6d
    > [ 3034.728033] #6: (rcu_preempt_state.onoff_mutex){+.+...}, at: [] rcu_cpu_notify+0x2f5/0x86e
    > [ 3034.728033] 1 lock held by bash/2980:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    >
    > Things looked a little weird. Also, this is a deadlock that lockdep did
    > not catch. But what we have here does not look like a circular lock
    > issue:
    >
    > Bash is blocked in rcu_cpu_notify():
    >
    > 1961 /* Exclude any attempts to start a new grace period. */
    > 1962 mutex_lock(&rsp->onoff_mutex);
    >
    >
    > kworker is blocked in get_online_cpus(), which makes sense as we are
    > currently taking down a CPU.
    >
    > But rcu_preempt is not blocked on anything. It is simply sleeping in
    > rcu_gp_kthread (really rcu_gp_init) here:
    >
    > 1453 #ifdef CONFIG_PROVE_RCU_DELAY
    > 1454 if ((prandom_u32() % (rcu_num_nodes * 8)) == 0 &&
    > 1455 system_state == SYSTEM_RUNNING)
    > 1456 schedule_timeout_uninterruptible(2);
    > 1457 #endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
    >
    > And it does this while holding the onoff_mutex that bash is waiting for.
    >
    > Doing a function trace, it showed me where it happened:
    >
    > [ 125.940066] rcu_pree-10 3.... 28384115273: schedule_timeout_uninterruptible [...]
    > [ 125.940066] rcu_pree-10 3d..3 28384202439: sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=120 prev_state=D ==> next_comm=watchdog/3 next_pid=38 next_prio=120
    >
    > The watchdog ran, and then:
    >
    > [ 125.940066] watchdog-38 3d..3 28384692863: sched_switch: prev_comm=watchdog/3 prev_pid=38 prev_prio=120 prev_state=P ==> next_comm=modprobe next_pid=2848 next_prio=118
    >
    > Not sure what modprobe was doing, but shortly after that:
    >
    > [ 125.940066] modprobe-2848 3d..3 28385041749: sched_switch: prev_comm=modprobe prev_pid=2848 prev_prio=118 prev_state=R+ ==> next_comm=migration/3 next_pid=40 next_prio=0
    >
    > Where the migration thread took down the CPU:
    >
    > [ 125.940066] migratio-40 3d..3 28389148276: sched_switch: prev_comm=migration/3 prev_pid=40 prev_prio=0 prev_state=P ==> next_comm=swapper/3 next_pid=0 next_prio=120
    >
    > which finally did:
    >
    > [ 125.940066] -0 3...1 28389282142: arch_cpu_idle_dead
    > [ 125.940066] -0 3...1 28389282548: native_play_dead
    > [ 125.940066] -0 3...1 28389282924: play_dead_common
    > [ 125.940066] -0 3...1 28389283468: idle_task_exit
    > [ 125.940066] -0 3...1 28389284644: amd_e400_remove_cpu
    >
    > CPU 3 is now offline, the rcu_preempt thread that ran on CPU 3 is still
    > doing a schedule_timeout_uninterruptible() and it registered its
    > timeout to the timer base for CPU 3. You would think that it would get
    > migrated, right? The issue here is that the timer migration happens at
    > the CPU notifier for CPU_DEAD. The problem is that the rcu notifier for
    > CPU_DOWN is blocked waiting for the onoff_mutex to be released, which is
    > held by the thread that just put itself into an uninterruptible sleep,
    > that won't wake up until the CPU_DEAD notifier of the timer
    > infrastructure is called, which won't happen until the rcu notifier
    > finishes. Here's our deadlock!

    This commit breaks this deadlock cycle by substituting a shorter udelay()
    for the previous schedule_timeout_uninterruptible(), while at the same
    time increasing the probability of the delay. This maintains the intensity
    of the testing.
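
    A sketch of the shape of the resulting code, assuming the
    CONFIG_PROVE_RCU_DELAY block quoted above (the exact modulus and
    delay length here are illustrative assumptions):

        #ifdef CONFIG_PROVE_RCU_DELAY
            if ((prandom_u32() % (rcu_num_nodes * 4)) == 0 &&  /* more likely */
                system_state == SYSTEM_RUNNING)
                udelay(200);  /* busy-wait: no timer, so no CPU_DEAD dependency */
        #endif /* #ifdef CONFIG_PROVE_RCU_DELAY */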

    Reported-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Tested-by: Steven Rostedt

    Paul E. McKenney
     
  • This commit fixes a lockdep-detected deadlock by moving a wake_up()
    call out from a rnp->lock critical section. Please see below for
    the long version of this story.

    On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:

    > [12572.705832] ======================================================
    > [12572.750317] [ INFO: possible circular locking dependency detected ]
    > [12572.796978] 3.10.0-rc3+ #39 Not tainted
    > [12572.833381] -------------------------------------------------------
    > [12572.862233] trinity-child17/31341 is trying to acquire lock:
    > [12572.870390] (rcu_node_0){..-.-.}, at: [] rcu_read_unlock_special+0x9f/0x4c0
    > [12572.878859]
    > but task is already holding lock:
    > [12572.894894] (&ctx->lock){-.-...}, at: [] perf_lock_task_context+0x7d/0x2d0
    > [12572.903381]
    > which lock already depends on the new lock.
    >
    > [12572.927541]
    > the existing dependency chain (in reverse order) is:
    > [12572.943736]
    > -> #4 (&ctx->lock){-.-...}:
    > [12572.960032] [] lock_acquire+0x91/0x1f0
    > [12572.968337] [] _raw_spin_lock+0x40/0x80
    > [12572.976633] [] __perf_event_task_sched_out+0x2e7/0x5e0
    > [12572.984969] [] perf_event_task_sched_out+0x93/0xa0
    > [12572.993326] [] __schedule+0x2cf/0x9c0
    > [12573.001652] [] schedule_user+0x2e/0x70
    > [12573.009998] [] retint_careful+0x12/0x2e
    > [12573.018321]
    > -> #3 (&rq->lock){-.-.-.}:
    > [12573.034628] [] lock_acquire+0x91/0x1f0
    > [12573.042930] [] _raw_spin_lock+0x40/0x80
    > [12573.051248] [] wake_up_new_task+0xb7/0x260
    > [12573.059579] [] do_fork+0x105/0x470
    > [12573.067880] [] kernel_thread+0x26/0x30
    > [12573.076202] [] rest_init+0x23/0x140
    > [12573.084508] [] start_kernel+0x3f1/0x3fe
    > [12573.092852] [] x86_64_start_reservations+0x2a/0x2c
    > [12573.101233] [] x86_64_start_kernel+0xcc/0xcf
    > [12573.109528]
    > -> #2 (&p->pi_lock){-.-.-.}:
    > [12573.125675] [] lock_acquire+0x91/0x1f0
    > [12573.133829] [] _raw_spin_lock_irqsave+0x4b/0x90
    > [12573.141964] [] try_to_wake_up+0x31/0x320
    > [12573.150065] [] default_wake_function+0x12/0x20
    > [12573.158151] [] autoremove_wake_function+0x18/0x40
    > [12573.166195] [] __wake_up_common+0x58/0x90
    > [12573.174215] [] __wake_up+0x39/0x50
    > [12573.182146] [] rcu_start_gp_advanced.isra.11+0x4a/0x50
    > [12573.190119] [] rcu_start_future_gp+0x1c9/0x1f0
    > [12573.198023] [] rcu_nocb_kthread+0x114/0x930
    > [12573.205860] [] kthread+0xed/0x100
    > [12573.213656] [] ret_from_fork+0x7c/0xb0
    > [12573.221379]
    > -> #1 (&rsp->gp_wq){..-.-.}:
    > [12573.236329] [] lock_acquire+0x91/0x1f0
    > [12573.243783] [] _raw_spin_lock_irqsave+0x4b/0x90
    > [12573.251178] [] __wake_up+0x23/0x50
    > [12573.258505] [] rcu_start_gp_advanced.isra.11+0x4a/0x50
    > [12573.265891] [] rcu_start_future_gp+0x1c9/0x1f0
    > [12573.273248] [] rcu_nocb_kthread+0x114/0x930
    > [12573.280564] [] kthread+0xed/0x100
    > [12573.287807] [] ret_from_fork+0x7c/0xb0

    Notice the above call chain.

    rcu_start_future_gp() is called with the rnp->lock held. Then it calls
    rcu_start_gp_advanced(), which does a wakeup.

    You can't do wakeups while holding the rnp->lock, as that would mean
    that you could not do a rcu_read_unlock() while holding the rq lock, or
    any lock that was taken while holding the rq lock. This is because...
    (See below).

    > [12573.295067]
    > -> #0 (rcu_node_0){..-.-.}:
    > [12573.309293] [] __lock_acquire+0x1786/0x1af0
    > [12573.316568] [] lock_acquire+0x91/0x1f0
    > [12573.323825] [] _raw_spin_lock+0x40/0x80
    > [12573.331081] [] rcu_read_unlock_special+0x9f/0x4c0
    > [12573.338377] [] __rcu_read_unlock+0x96/0xa0
    > [12573.345648] [] perf_lock_task_context+0x143/0x2d0
    > [12573.352942] [] find_get_context+0x4e/0x1f0
    > [12573.360211] [] SYSC_perf_event_open+0x514/0xbd0
    > [12573.367514] [] SyS_perf_event_open+0x9/0x10
    > [12573.374816] [] tracesys+0xdd/0xe2

    Notice the above trace.

    perf took its own ctx->lock, which can be taken while holding the rq
    lock. While holding this lock, it did a rcu_read_unlock(). The
    perf_lock_task_context() basically looks like:

    rcu_read_lock();
    raw_spin_lock(ctx->lock);
    rcu_read_unlock();

    Now, what looks to have happened is that we scheduled after taking that
    first rcu_read_lock() but before taking the spin lock. When we scheduled
    back in and took the ctx->lock, the following rcu_read_unlock()
    triggered the "special" code.

    The rcu_read_unlock_special() takes the rnp->lock, which gives us a
    possible deadlock scenario.

     CPU0                   CPU1                          CPU2
     ----                   ----                          ----
                                                          rcu_nocb_kthread()
     lock(rq->lock);
                            lock(ctx->lock);
                                                          lock(rnp->lock);
                                                          wake_up();
                                                          lock(rq->lock);
                            rcu_read_unlock();
                            rcu_read_unlock_special();
                            lock(rnp->lock);
     lock(ctx->lock);

     **** DEADLOCK ****

    > [12573.382068]
    > other info that might help us debug this:
    >
    > [12573.403229] Chain exists of:
    > rcu_node_0 --> &rq->lock --> &ctx->lock
    >
    > [12573.424471] Possible unsafe locking scenario:
    >
    > [12573.438499] CPU0 CPU1
    > [12573.445599] ---- ----
    > [12573.452691] lock(&ctx->lock);
    > [12573.459799] lock(&rq->lock);
    > [12573.467010] lock(&ctx->lock);
    > [12573.474192] lock(rcu_node_0);
    > [12573.481262]
    > *** DEADLOCK ***
    >
    > [12573.501931] 1 lock held by trinity-child17/31341:
    > [12573.508990] #0: (&ctx->lock){-.-...}, at: [] perf_lock_task_context+0x7d/0x2d0
    > [12573.516475]
    > stack backtrace:
    > [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
    > [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
    > [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
    > [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
    > [12573.567856] Call Trace:
    > [12573.575011] [] dump_stack+0x19/0x1b
    > [12573.582284] [] print_circular_bug+0x200/0x20f
    > [12573.589637] [] __lock_acquire+0x1786/0x1af0
    > [12573.596982] [] ? sched_clock_cpu+0xb5/0x100
    > [12573.604344] [] lock_acquire+0x91/0x1f0
    > [12573.611652] [] ? rcu_read_unlock_special+0x9f/0x4c0
    > [12573.619030] [] _raw_spin_lock+0x40/0x80
    > [12573.626331] [] ? rcu_read_unlock_special+0x9f/0x4c0
    > [12573.633671] [] rcu_read_unlock_special+0x9f/0x4c0
    > [12573.640992] [] ? perf_lock_task_context+0x7d/0x2d0
    > [12573.648330] [] ? put_lock_stats.isra.29+0xe/0x40
    > [12573.655662] [] ? delay_tsc+0x90/0xe0
    > [12573.662964] [] __rcu_read_unlock+0x96/0xa0
    > [12573.670276] [] perf_lock_task_context+0x143/0x2d0
    > [12573.677622] [] ? __perf_event_enable+0x370/0x370
    > [12573.684981] [] find_get_context+0x4e/0x1f0
    > [12573.692358] [] SYSC_perf_event_open+0x514/0xbd0
    > [12573.699753] [] ? get_parent_ip+0xd/0x50
    > [12573.707135] [] ? trace_hardirqs_on_caller+0xfd/0x1c0
    > [12573.714599] [] SyS_perf_event_open+0x9/0x10
    > [12573.721996] [] tracesys+0xdd/0xe2

    This commit delays the wakeup via irq_work(), which is what
    perf and ftrace use to perform wakeups in critical sections.
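
    A minimal sketch of this irq_work pattern (names and placement here
    are illustrative, not the exact patch):

        #include <linux/irq_work.h>
        #include <linux/wait.h>

        static DECLARE_WAIT_QUEUE_HEAD(gp_wq);
        static struct irq_work gp_wakeup_work;

        /* Runs later from hard-IRQ context, with rnp->lock no longer held. */
        static void gp_wakeup_func(struct irq_work *work)
        {
                wake_up(&gp_wq);
        }

        static void example_init(void)
        {
                init_irq_work(&gp_wakeup_work, gp_wakeup_func);
        }

        static void example_start_gp_locked(void)
        {
                /* ...rnp->lock is held here, so no direct wake_up()... */
                irq_work_queue(&gp_wakeup_work);  /* defer the wakeup instead */
        }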

    Reported-by: Dave Jones
    Signed-off-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney

    Steven Rostedt
     

02 May, 2013

1 commit


30 Apr, 2013

2 commits

  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle are mostly related to preparatory work
    for the full-dynticks work:

    - Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ take
    advantage of numbered callbacks, do callback accelerations based on
    numbered callbacks. Posted to LKML at
    https://lkml.org/lkml/2013/3/18/960

    - RCU documentation updates. Posted to LKML at
    https://lkml.org/lkml/2013/3/18/570

    - Miscellaneous fixes. Posted to LKML at
    https://lkml.org/lkml/2013/3/18/594"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    rcu: Make rcu_accelerate_cbs() note need for future grace periods
    rcu: Abstract rcu_start_future_gp() from rcu_nocb_wait_gp()
    rcu: Rename n_nocb_gp_requests to need_future_gp
    rcu: Push lock release to rcu_start_gp()'s callers
    rcu: Repurpose no-CBs event tracing to future-GP events
    rcu: Rearrange locking in rcu_start_gp()
    rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks
    rcu: Accelerate RCU callbacks at grace-period end
    rcu: Export RCU_FAST_NO_HZ parameters to sysfs
    rcu: Distinguish "rcuo" kthreads by RCU flavor
    rcu: Add event tracing for no-CBs CPUs' grace periods
    rcu: Add event tracing for no-CBs CPUs' callback registration
    rcu: Introduce proper blocking to no-CBs kthreads GP waits
    rcu: Provide compile-time control for no-CBs CPUs
    rcu: Tone down debugging during boot-up and shutdown.
    rcu: Add softirq-stall indications to stall-warning messages
    rcu: Documentation update
    rcu: Make bugginess of code sample more evident
    rcu: Fix hlist_bl_set_first_rcu() annotation
    rcu: Delete unused rcu_node "wakemask" field
    ...

    Linus Torvalds
     
  • Use the preferable function name, which makes explicit that a
    pseudo-random number generator is being used.
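
    An illustrative example of the rename (the surrounding expression is
    made up):

        -    delay = random32() % max_delay_us;
        +    delay = prandom_u32() % max_delay_us;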

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

19 Apr, 2013

1 commit

  • We need full-dynticks CPUs to also be RCU no-CBs CPUs so
    that we don't have to keep the tick around to handle RCU
    callbacks.

    Make sure the range passed to the nohz_full= boot
    parameter is a subset of rcu_nocbs=.

    The CPUs that fail to meet this requirement will be
    excluded from the nohz_full range. This is checked
    early at boot time, before any CPU has the opportunity
    to stop its tick.
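
    For example (CPU ranges chosen purely for illustration), a consistent
    setup on an eight-CPU box that keeps CPU 0 for housekeeping would be:

        nohz_full=1-7 rcu_nocbs=1-7

    A CPU named in nohz_full= but absent from rcu_nocbs= would instead be
    dropped from the nohz_full range by the boot-time check described above.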

    Suggested-by: Steven Rostedt
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

16 Apr, 2013

1 commit

  • Adaptive-ticks CPUs inform RCU when they enter kernel mode, but they do
    not necessarily turn the scheduler-clock tick back on. This state of
    affairs could result in RCU waiting on an adaptive-ticks CPU running
    for an extended period in kernel mode. Such a CPU will never run the
    RCU state machine, and could therefore indefinitely extend the RCU state
    machine, sooner or later resulting in an OOM condition.

    This patch, inspired by an earlier patch by Frederic Weisbecker, therefore
    causes RCU's force-quiescent-state processing to check for this condition
    and to send an IPI to CPUs that remain in that state for too long.
    "Too long" currently means about three jiffies by default, which is
    quite some time for a CPU to remain in the kernel without blocking.
    The rcutree.jiffies_till_first_fqs and rcutree.jiffies_till_next_fqs
    sysfs variables may be used to tune "too long" if needed.
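
    For example, both delays could be set to six jiffies from the kernel
    command line (values chosen purely for illustration):

        rcutree.jiffies_till_first_fqs=6 rcutree.jiffies_till_next_fqs=6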

    Reported-by: Frederic Weisbecker
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett
    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Paul E. McKenney
     

26 Mar, 2013

9 commits

  • doc.2013.03.12a: Documentation changes.

    fixes.2013.03.13a: Miscellaneous fixes.

    idlenocb.2013.03.26b: Remove restrictions on no-CBs CPUs, make
    RCU_FAST_NO_HZ take advantage of numbered callbacks, add
    callback acceleration based on numbered callbacks.

    Paul E. McKenney
     
  • Now that rcu_start_future_gp() has been abstracted from
    rcu_nocb_wait_gp(), rcu_accelerate_cbs() can invoke rcu_start_future_gp()
    so as to register the need for any future grace periods needed by a
    CPU about to enter dyntick-idle mode. This commit makes this change.
    Note that some refactoring of rcu_start_gp() is carried out to avoid
    recursion and subsequent self-deadlocks.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • CPUs going idle will need to record the need for a future grace
    period, but won't actually need to block waiting on it. This commit
    therefore splits rcu_start_future_gp(), which does the recording, from
    rcu_nocb_wait_gp(), which now invokes rcu_start_future_gp() to do the
    recording, after which rcu_nocb_wait_gp() does the waiting.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • If CPUs are to give prior notice of needed grace periods, it will be
    necessary to invoke rcu_start_gp() without dropping the root rcu_node
    structure's ->lock. This commit takes a second step in this direction
    by moving the release of this lock to rcu_start_gp()'s callers.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • If CPUs are to give prior notice of needed grace periods, it will be
    necessary to invoke rcu_start_gp() without dropping the root rcu_node
    structure's ->lock. This commit takes a first step in this direction
    by moving the release of this lock to the end of rcu_start_gp().

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Because RCU callbacks are now associated with the number of the grace
    period that they must wait for, CPUs can now take advance callbacks
    corresponding to grace periods that ended while a given CPU was in
    dyntick-idle mode. This eliminates the need to try forcing the RCU
    state machine while entering idle, thus reducing the CPU intensiveness
    of RCU_FAST_NO_HZ, which should increase its energy efficiency.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Now that callback acceleration is idempotent, it is safe to accelerate
    callbacks during grace-period cleanup on any CPUs that the kthread happens
    to be running on. This commit therefore propagates the completion
    of the grace period to the per-CPU data structures, and also adds an
    rcu_advance_cbs() just before the cpu_needs_another_gp() check in order
    to reduce false-positive grace periods.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, the per-no-CBs-CPU kthreads are named "rcuo" followed by
    the CPU number, for example, "rcuo". This is problematic given that
    there are either two or three RCU flavors, each of which gets a per-CPU
    kthread with exactly the same name. This commit therefore introduces
    a one-letter abbreviation for each RCU flavor, namely 'b' for RCU-bh,
    'p' for RCU-preempt, and 's' for RCU-sched. This abbreviation is used
    to distinguish the "rcuo" kthreads, for example, for CPU 0 we would have
    "rcuob/0", "rcuop/0", and "rcuos/0".

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Tested-by: Dietmar Eggemann

    Paul E. McKenney
     
  • Currently, the no-CBs kthreads do repeated timed waits for grace periods
    to elapse. This is crude and energy inefficient, so this commit allows
    no-CBs kthreads to specify exactly which grace period they are waiting
    for and also allows them to block for the entire duration until the
    desired grace period completes.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

14 Mar, 2013

1 commit


13 Mar, 2013

4 commits

  • Although it used to be that CPU_DYING notifiers executed on the outgoing
    CPU with interrupts disabled and with all other CPUs spinning, this is
    no longer the case. This commit therefore removes this obsolete comment.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Paul E. McKenney

    Srivatsa S. Bhat
     
  • Offline CPUs transition through the scheduler to the idle loop one
    last time before being shut down. This can result in RCU raising
    softirq on this CPU, which is at best useless given that the CPU's
    callbacks will be offloaded at CPU_DEAD time. This commit therefore
    avoids raising softirq on offline CPUs.
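
    Conceptually, the guard is of the following shape (the exact location
    and condition used by the patch may differ):

        /* Do not ask an offline CPU to run the RCU core machinery. */
        if (cpu_is_offline(smp_processor_id()))
                return;
        invoke_rcu_core();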

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Signed-off-by: Jiang Fang
    Signed-off-by: Paul E. McKenney

    Jiang Fang
     
  • Currently, CPU 0 is constrained to not be a no-CBs CPU, and furthermore
    at least one no-CBs CPU must remain online at any given time. These
    restrictions are problematic in some situations, such as cases where
    all CPUs must run a real-time workload that needs to be insulated from
    OS jitter and latencies due to RCU callback invocation. This commit
    therefore provides no-CBs CPUs a (very crude and energy-inefficient)
    way to start and to wait for grace periods independently of the normal
    RCU callback mechanisms. This approach allows any or all of the CPUs to
    be designated as no-CBs CPUs, and allows any proper subset of the CPUs
    (whether no-CBs CPUs or not) to be offlined.

    This commit also provides a fix for a locking bug spotted by
    Xie ChanglongX.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

29 Jan, 2013

2 commits

  • …' and 'tiny.2013.01.29b' into HEAD

    doctorture.2013.01.11a: Changes to rcutorture and to RCU documentation.

    fixes.2013.01.26a: Miscellaneous fixes.

    tagcb.2013.01.24a: Tag RCU callbacks with grace-period number to
    simplify callback advancement.

    tiny.2013.01.29b: Enhancements to uniprocessor handling in tiny RCU.

    Paul E. McKenney
     
  • Tiny RCU has historically omitted RCU CPU stall warnings in order to
    reduce memory requirements; however, the lack of these warnings caused
    Thomas Gleixner some debugging pain recently. Therefore, this commit
    adds RCU CPU stall warnings to tiny RCU if RCU_TRACE=y. This keeps
    the memory footprint small, while still enabling CPU stall warnings
    in kernels built to enable them.

    Updated to include Josh Triplett's suggested use of RCU_STALL_COMMON
    config variable to simplify #if expressions.
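
    The resulting structure is roughly the following (a sketch; the exact
    symbols guarded by the new option are assumptions):

        #ifdef CONFIG_RCU_STALL_COMMON
        int rcu_cpu_stall_suppress;                       /* shared stall-warning state */
        module_param(rcu_cpu_stall_suppress, int, 0644);
        #endif /* #ifdef CONFIG_RCU_STALL_COMMON */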

    Reported-by: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

27 Jan, 2013

2 commits


09 Jan, 2013

2 commits

  • This commit adds event tracing for callback acceleration to allow better
    tracking of callbacks through the system.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, callbacks are advanced each time the corresponding CPU
    notices a change in its leaf rcu_node structure's ->completed value
    (this value counts grace-period completions). This approach has worked
    quite well, but with the advent of RCU_FAST_NO_HZ, we cannot count on
    a given CPU seeing all the grace-period completions. When a CPU misses
    a grace-period completion that occurs while it is in dyntick-idle mode,
    this will delay invocation of its callbacks.

    In addition, acceleration of callbacks (when RCU realizes that a given
    callback need only wait until the end of the next grace period, rather
    than having to wait for a partial grace period followed by a full
    grace period) must be carried out extremely carefully. Insufficient
    acceleration will result in unnecessarily long grace-period latencies,
    while excessive acceleration will result in premature callback invocation.
    Changes that involve this tradeoff are therefore among the most
    nerve-wracking changes to RCU.

    This commit therefore explicitly tags groups of callbacks with the
    number of the grace period that they are waiting for. This means that
    callback-advancement and callback-acceleration functions are idempotent,
    so that excessive acceleration will merely waste a few CPU cycles. This
    also allows a CPU to take full advantage of any grace periods that have
    elapsed while it has been in dyntick-idle mode. It should also enable
    simultaneous simplifications to and optimizations of RCU_FAST_NO_HZ.
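
    A toy illustration of the tagging idea (not the actual rcu_data
    layout; field and function names are invented):

        /* Each group of callbacks records the grace period it waits for. */
        struct cb_group {
                unsigned long wait_for_gp;  /* ready once completed >= this */
        };

        /* Idempotent: re-tagging with the same number changes nothing. */
        static void accelerate(struct cb_group *g, unsigned long next_full_gp)
        {
                if (g->wait_for_gp > next_full_gp)
                        g->wait_for_gp = next_full_gp;
        }

        /* Works even if this CPU slept through several grace periods. */
        static int callbacks_ready(struct cb_group *g, unsigned long completed)
        {
                return completed >= g->wait_for_gp;
        }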

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney