11 Jun, 2013

13 commits

  • Now that TINY_PREEMPT_RCU is no more, exit_rcu() is always an empty
    function. But if TINY_RCU is going to have an empty function, it should
    be in include/linux/rcutiny.h, where it does not bloat the kernel.
    This commit therefore moves exit_rcu() out of kernel/rcupdate.c to
    kernel/rcutree_plugin.h, and places a static inline empty function in
    include/linux/rcutiny.h in order to shrink TINY_RCU a bit.
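
    For reference, a minimal sketch of what the TINY_RCU stub in
    include/linux/rcutiny.h plausibly looks like (exact body assumed); a
    static inline empty function compiles away entirely at each call site:

        /* Sketch: TINY_RCU needs no exit-time cleanup, so the stub is empty
         * and the compiler inlines it to nothing. */
        static inline void exit_rcu(void)
        {
        }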

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This commit rearranges code in order to allow ifdefs to be consolidated
    in kernel/rcutiny_plugin.h, simplifying the code.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • With the removal of CONFIG_TINY_PREEMPT_RCU, check_cpu_stall_preempt()
    is now an empty function. This commit therefore eliminates it by
    inlining it.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • TINY_PREEMPT_RCU could use a kthread to handle RCU callback invocation,
    which required an API to abstract kthread vs. softirq invocation.
    Now that TINY_PREEMPT_RCU is no longer with us, this commit retires
    this API in favor of direct use of the relevant softirq primitives.
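
    As a sketch of what "direct use of the relevant softirq primitives"
    means here (the wrapper name is illustrative), callers simply raise
    RCU_SOFTIRQ instead of going through the kthread/softirq abstraction:

        /* Sketch: queue callback invocation by raising the RCU softirq. */
        static void invoke_rcu_callbacks(void)
        {
                raise_softirq(RCU_SOFTIRQ);
        }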

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • With the removal of CONFIG_TINY_PREEMPT_RCU, rcu_preempt_process_callbacks()
    is now an empty function. This commit therefore eliminates it by
    inlining it.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • With the removal of CONFIG_TINY_PREEMPT_RCU, rcu_preempt_remove_callbacks()
    is now an empty function. This commit therefore eliminates it by
    inlining it.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • With the removal of CONFIG_TINY_PREEMPT_RCU, rcu_preempt_check_callbacks()
    is now an empty function. This commit therefore eliminates it by
    inlining it.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • With the removal of CONFIG_TINY_PREEMPT_RCU, show_tiny_preempt_stats()
    is now an empty function. This commit therefore eliminates it by
    inlining it.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • TINY_PREEMPT_RCU adds significant code and complexity, but does not
    offer commensurate benefits. People currently using TINY_PREEMPT_RCU
    can get much better memory footprint with TINY_RCU, or, if they really
    need preemptible RCU, they can use TREE_PREEMPT_RCU with a relatively
    minor degradation in memory footprint. Please note that this move
    has been widely publicized on LKML (https://lkml.org/lkml/2012/11/12/545)
    and on LWN (http://lwn.net/Articles/541037/).

    This commit therefore removes TINY_PREEMPT_RCU.

    Signed-off-by: Paul E. McKenney
    [ paulmck: Updated to eliminate #else in rcutiny.h as suggested by Josh ]
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This commit converts printk() calls to the corresponding pr_*() calls.
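
    The conversion is mechanical; a hedged example (the message text is
    made up for illustration):

        /* Before: explicit log level in the format string. */
        printk(KERN_INFO "rcu: grace period started\n");

        /* After: the pr_*() helper encodes the level. */
        pr_info("rcu: grace period started\n");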

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This commit converts printk() calls to the corresponding pr_*() calls.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • In Steven Rostedt's words:

    > I've been debugging the last couple of days why my tests have been
    > locking up. One of my tracing tests runs all available tracers. The
    > lockup always happened with the mmiotrace, which is used to trace
    > interactions between priority drivers and the kernel. But to do this
    > easily, when the tracer gets registered, it disables all but the boot
    > CPUs. The lockup always happened after it got done disabling the CPUs.
    >
    > Then I decided to try this:
    >
    > while :; do
    > for i in 1 2 3; do
    > echo 0 > /sys/devices/system/cpu/cpu$i/online
    > done
    > for i in 1 2 3; do
    > echo 1 > /sys/devices/system/cpu/cpu$i/online
    > done
    > done
    >
    > Well, sure enough, that locked up too, with the same users. Doing a
    > sysrq-w (showing all blocked tasks):
    >
    > [ 2991.344562] task PC stack pid father
    > [ 2991.344562] rcu_preempt D ffff88007986fdf8 0 10 2 0x00000000
    > [ 2991.344562] ffff88007986fc98 0000000000000002 ffff88007986fc48 0000000000000908
    > [ 2991.344562] ffff88007986c280 ffff88007986ffd8 ffff88007986ffd8 00000000001d3c80
    > [ 2991.344562] ffff880079248a40 ffff88007986c280 0000000000000000 00000000fffd4295
    > [ 2991.344562] Call Trace:
    > [ 2991.344562] [] schedule+0x64/0x66
    > [ 2991.344562] [] schedule_timeout+0xbc/0xf9
    > [ 2991.344562] [] ? ftrace_call+0x5/0x2f
    > [ 2991.344562] [] ? cascade+0xa8/0xa8
    > [ 2991.344562] [] schedule_timeout_uninterruptible+0x1e/0x20
    > [ 2991.344562] [] rcu_gp_kthread+0x502/0x94b
    > [ 2991.344562] [] ? __init_waitqueue_head+0x50/0x50
    > [ 2991.344562] [] ? rcu_gp_fqs+0x64/0x64
    > [ 2991.344562] [] kthread+0xb1/0xb9
    > [ 2991.344562] [] ? lock_release_holdtime.part.23+0x4e/0x55
    > [ 2991.344562] [] ? __init_kthread_worker+0x58/0x58
    > [ 2991.344562] [] ret_from_fork+0x7c/0xb0
    > [ 2991.344562] [] ? __init_kthread_worker+0x58/0x58
    > [ 2991.344562] kworker/0:1 D ffffffff81a30680 0 47 2 0x00000000
    > [ 2991.344562] Workqueue: events cpuset_hotplug_workfn
    > [ 2991.344562] ffff880078dbbb58 0000000000000002 0000000000000006 00000000000000d8
    > [ 2991.344562] ffff880078db8100 ffff880078dbbfd8 ffff880078dbbfd8 00000000001d3c80
    > [ 2991.344562] ffff8800779ca5c0 ffff880078db8100 ffffffff81541fcf 0000000000000000
    > [ 2991.344562] Call Trace:
    > [ 2991.344562] [] ? __mutex_lock_common+0x3d4/0x609
    > [ 2991.344562] [] schedule+0x64/0x66
    > [ 2991.344562] [] schedule_preempt_disabled+0x18/0x24
    > [ 2991.344562] [] __mutex_lock_common+0x3d4/0x609
    > [ 2991.344562] [] ? get_online_cpus+0x3c/0x50
    > [ 2991.344562] [] ? get_online_cpus+0x3c/0x50
    > [ 2991.344562] [] mutex_lock_nested+0x3b/0x40
    > [ 2991.344562] [] get_online_cpus+0x3c/0x50
    > [ 2991.344562] [] rebuild_sched_domains_locked+0x6e/0x3a8
    > [ 2991.344562] [] rebuild_sched_domains+0x1c/0x2a
    > [ 2991.344562] [] cpuset_hotplug_workfn+0x1c7/0x1d3
    > [ 2991.344562] [] ? cpuset_hotplug_workfn+0x5/0x1d3
    > [ 2991.344562] [] process_one_work+0x2d4/0x4d1
    > [ 2991.344562] [] ? process_one_work+0x207/0x4d1
    > [ 2991.344562] [] worker_thread+0x2e7/0x3b5
    > [ 2991.344562] [] ? rescuer_thread+0x332/0x332
    > [ 2991.344562] [] kthread+0xb1/0xb9
    > [ 2991.344562] [] ? __init_kthread_worker+0x58/0x58
    > [ 2991.344562] [] ret_from_fork+0x7c/0xb0
    > [ 2991.344562] [] ? __init_kthread_worker+0x58/0x58
    > [ 2991.344562] bash D ffffffff81a4aa80 0 2618 2612 0x10000000
    > [ 2991.344562] ffff8800379abb58 0000000000000002 0000000000000006 0000000000000c2c
    > [ 2991.344562] ffff880077fea140 ffff8800379abfd8 ffff8800379abfd8 00000000001d3c80
    > [ 2991.344562] ffff8800779ca5c0 ffff880077fea140 ffffffff81541fcf 0000000000000000
    > [ 2991.344562] Call Trace:
    > [ 2991.344562] [] ? __mutex_lock_common+0x3d4/0x609
    > [ 2991.344562] [] schedule+0x64/0x66
    > [ 2991.344562] [] schedule_preempt_disabled+0x18/0x24
    > [ 2991.344562] [] __mutex_lock_common+0x3d4/0x609
    > [ 2991.344562] [] ? rcu_cpu_notify+0x2f5/0x86e
    > [ 2991.344562] [] ? rcu_cpu_notify+0x2f5/0x86e
    > [ 2991.344562] [] mutex_lock_nested+0x3b/0x40
    > [ 2991.344562] [] rcu_cpu_notify+0x2f5/0x86e
    > [ 2991.344562] [] ? __lock_is_held+0x32/0x53
    > [ 2991.344562] [] notifier_call_chain+0x6b/0x98
    > [ 2991.344562] [] __raw_notifier_call_chain+0xe/0x10
    > [ 2991.344562] [] __cpu_notify+0x20/0x32
    > [ 2991.344562] [] cpu_notify_nofail+0x17/0x36
    > [ 2991.344562] [] _cpu_down+0x154/0x259
    > [ 2991.344562] [] cpu_down+0x2d/0x3a
    > [ 2991.344562] [] store_online+0x4e/0xe7
    > [ 2991.344562] [] dev_attr_store+0x20/0x22
    > [ 2991.344562] [] sysfs_write_file+0x108/0x144
    > [ 2991.344562] [] vfs_write+0xfd/0x158
    > [ 2991.344562] [] SyS_write+0x5c/0x83
    > [ 2991.344562] [] tracesys+0xdd/0xe2
    >
    > As well as held locks:
    >
    > [ 3034.728033] Showing all locks held in the system:
    > [ 3034.728033] 1 lock held by rcu_preempt/10:
    > [ 3034.728033] #0: (rcu_preempt_state.onoff_mutex){+.+...}, at: [] rcu_gp_kthread+0x167/0x94b
    > [ 3034.728033] 4 locks held by kworker/0:1/47:
    > [ 3034.728033] #0: (events){.+.+.+}, at: [] process_one_work+0x207/0x4d1
    > [ 3034.728033] #1: (cpuset_hotplug_work){+.+.+.}, at: [] process_one_work+0x207/0x4d1
    > [ 3034.728033] #2: (cpuset_mutex){+.+.+.}, at: [] rebuild_sched_domains+0x17/0x2a
    > [ 3034.728033] #3: (cpu_hotplug.lock){+.+.+.}, at: [] get_online_cpus+0x3c/0x50
    > [ 3034.728033] 1 lock held by mingetty/2563:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    > [ 3034.728033] 1 lock held by mingetty/2565:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    > [ 3034.728033] 1 lock held by mingetty/2569:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    > [ 3034.728033] 1 lock held by mingetty/2572:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    > [ 3034.728033] 1 lock held by mingetty/2575:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    > [ 3034.728033] 7 locks held by bash/2618:
    > [ 3034.728033] #0: (sb_writers#5){.+.+.+}, at: [] file_start_write+0x2a/0x2c
    > [ 3034.728033] #1: (&buffer->mutex#2){+.+.+.}, at: [] sysfs_write_file+0x3c/0x144
    > [ 3034.728033] #2: (s_active#54){.+.+.+}, at: [] sysfs_write_file+0xe7/0x144
    > [ 3034.728033] #3: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [] cpu_hotplug_driver_lock+0x17/0x19
    > [ 3034.728033] #4: (cpu_add_remove_lock){+.+.+.}, at: [] cpu_maps_update_begin+0x17/0x19
    > [ 3034.728033] #5: (cpu_hotplug.lock){+.+.+.}, at: [] cpu_hotplug_begin+0x2c/0x6d
    > [ 3034.728033] #6: (rcu_preempt_state.onoff_mutex){+.+...}, at: [] rcu_cpu_notify+0x2f5/0x86e
    > [ 3034.728033] 1 lock held by bash/2980:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    >
    > Things looked a little weird. Also, this is a deadlock that lockdep did
    > not catch. But what we have here does not look like a circular lock
    > issue:
    >
    > Bash is blocked in rcu_cpu_notify():
    >
    > 1961 /* Exclude any attempts to start a new grace period. */
    > 1962 mutex_lock(&rsp->onoff_mutex);
    >
    >
    > kworker is blocked in get_online_cpus(), which makes sense as we are
    > currently taking down a CPU.
    >
    > But rcu_preempt is not blocked on anything. It is simply sleeping in
    > rcu_gp_kthread (really rcu_gp_init) here:
    >
    > 1453 #ifdef CONFIG_PROVE_RCU_DELAY
    > 1454 if ((prandom_u32() % (rcu_num_nodes * 8)) == 0 &&
    > 1455 system_state == SYSTEM_RUNNING)
    > 1456 schedule_timeout_uninterruptible(2);
    > 1457 #endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
    >
    > And it does this while holding the onoff_mutex that bash is waiting for.
    >
    > Doing a function trace, it showed me where it happened:
    >
    > [ 125.940066] rcu_pree-10 3.... 28384115273: schedule_timeout_uninterruptible [...]
    > [ 125.940066] rcu_pree-10 3d..3 28384202439: sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=120 prev_state=D ==> next_comm=watchdog/3 next_pid=38 next_prio=120
    >
    > The watchdog ran, and then:
    >
    > [ 125.940066] watchdog-38 3d..3 28384692863: sched_switch: prev_comm=watchdog/3 prev_pid=38 prev_prio=120 prev_state=P ==> next_comm=modprobe next_pid=2848 next_prio=118
    >
    > Not sure what modprobe was doing, but shortly after that:
    >
    > [ 125.940066] modprobe-2848 3d..3 28385041749: sched_switch: prev_comm=modprobe prev_pid=2848 prev_prio=118 prev_state=R+ ==> next_comm=migration/3 next_pid=40 next_prio=0
    >
    > Where the migration thread took down the CPU:
    >
    > [ 125.940066] migratio-40 3d..3 28389148276: sched_switch: prev_comm=migration/3 prev_pid=40 prev_prio=0 prev_state=P ==> next_comm=swapper/3 next_pid=0 next_prio=120
    >
    > which finally did:
    >
    > [ 125.940066] -0 3...1 28389282142: arch_cpu_idle_dead
    > [ 125.940066] -0 3...1 28389282548: native_play_dead
    > [ 125.940066] -0 3...1 28389282924: play_dead_common
    > [ 125.940066] -0 3...1 28389283468: idle_task_exit
    > [ 125.940066] -0 3...1 28389284644: amd_e400_remove_cpu
    >
    > CPU 3 is now offline, but the rcu_preempt thread that ran on CPU 3 is
    > still doing a schedule_timeout_uninterruptible() and it registered its
    > timeout on the timer base for CPU 3. You would think that it would get
    > migrated, right? The issue here is that the timer migration happens at
    > the CPU notifier for CPU_DEAD. The problem is that the rcu notifier for
    > CPU_DOWN is blocked waiting for the onoff_mutex to be released, which is
    > held by the thread that just put itself into an uninterruptible sleep
    > that won't wake up until the CPU_DEAD notifier of the timer
    > infrastructure is called, which won't happen until the rcu notifier
    > finishes. Here's our deadlock!

    This commit breaks this deadlock cycle by substituting a shorter udelay()
    for the previous schedule_timeout_uninterruptible(), while at the same
    time increasing the probability of the delay. This maintains the intensity
    of the testing.
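
    A sketch of the resulting delay injection (the delay length and the
    probability divisor shown here are assumptions, not the committed
    values); udelay() busy-waits, so no timer gets queued on a CPU that is
    about to go offline:

        #ifdef CONFIG_PROVE_RCU_DELAY
                if ((prandom_u32() % (rcu_num_nodes * 4)) == 0 &&
                    system_state == SYSTEM_RUNNING)
                        udelay(200);
        #endif /* #ifdef CONFIG_PROVE_RCU_DELAY */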

    Reported-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Tested-by: Steven Rostedt

    Paul E. McKenney
     
  • This commit fixes a lockdep-detected deadlock by moving a wake_up()
    call out from a rnp->lock critical section. Please see below for
    the long version of this story.

    On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:

    > [12572.705832] ======================================================
    > [12572.750317] [ INFO: possible circular locking dependency detected ]
    > [12572.796978] 3.10.0-rc3+ #39 Not tainted
    > [12572.833381] -------------------------------------------------------
    > [12572.862233] trinity-child17/31341 is trying to acquire lock:
    > [12572.870390] (rcu_node_0){..-.-.}, at: [] rcu_read_unlock_special+0x9f/0x4c0
    > [12572.878859]
    > but task is already holding lock:
    > [12572.894894] (&ctx->lock){-.-...}, at: [] perf_lock_task_context+0x7d/0x2d0
    > [12572.903381]
    > which lock already depends on the new lock.
    >
    > [12572.927541]
    > the existing dependency chain (in reverse order) is:
    > [12572.943736]
    > -> #4 (&ctx->lock){-.-...}:
    > [12572.960032] [] lock_acquire+0x91/0x1f0
    > [12572.968337] [] _raw_spin_lock+0x40/0x80
    > [12572.976633] [] __perf_event_task_sched_out+0x2e7/0x5e0
    > [12572.984969] [] perf_event_task_sched_out+0x93/0xa0
    > [12572.993326] [] __schedule+0x2cf/0x9c0
    > [12573.001652] [] schedule_user+0x2e/0x70
    > [12573.009998] [] retint_careful+0x12/0x2e
    > [12573.018321]
    > -> #3 (&rq->lock){-.-.-.}:
    > [12573.034628] [] lock_acquire+0x91/0x1f0
    > [12573.042930] [] _raw_spin_lock+0x40/0x80
    > [12573.051248] [] wake_up_new_task+0xb7/0x260
    > [12573.059579] [] do_fork+0x105/0x470
    > [12573.067880] [] kernel_thread+0x26/0x30
    > [12573.076202] [] rest_init+0x23/0x140
    > [12573.084508] [] start_kernel+0x3f1/0x3fe
    > [12573.092852] [] x86_64_start_reservations+0x2a/0x2c
    > [12573.101233] [] x86_64_start_kernel+0xcc/0xcf
    > [12573.109528]
    > -> #2 (&p->pi_lock){-.-.-.}:
    > [12573.125675] [] lock_acquire+0x91/0x1f0
    > [12573.133829] [] _raw_spin_lock_irqsave+0x4b/0x90
    > [12573.141964] [] try_to_wake_up+0x31/0x320
    > [12573.150065] [] default_wake_function+0x12/0x20
    > [12573.158151] [] autoremove_wake_function+0x18/0x40
    > [12573.166195] [] __wake_up_common+0x58/0x90
    > [12573.174215] [] __wake_up+0x39/0x50
    > [12573.182146] [] rcu_start_gp_advanced.isra.11+0x4a/0x50
    > [12573.190119] [] rcu_start_future_gp+0x1c9/0x1f0
    > [12573.198023] [] rcu_nocb_kthread+0x114/0x930
    > [12573.205860] [] kthread+0xed/0x100
    > [12573.213656] [] ret_from_fork+0x7c/0xb0
    > [12573.221379]
    > -> #1 (&rsp->gp_wq){..-.-.}:
    > [12573.236329] [] lock_acquire+0x91/0x1f0
    > [12573.243783] [] _raw_spin_lock_irqsave+0x4b/0x90
    > [12573.251178] [] __wake_up+0x23/0x50
    > [12573.258505] [] rcu_start_gp_advanced.isra.11+0x4a/0x50
    > [12573.265891] [] rcu_start_future_gp+0x1c9/0x1f0
    > [12573.273248] [] rcu_nocb_kthread+0x114/0x930
    > [12573.280564] [] kthread+0xed/0x100
    > [12573.287807] [] ret_from_fork+0x7c/0xb0

    Notice the above call chain.

    rcu_start_future_gp() is called with the rnp->lock held. Then it calls
    rcu_start_gp_advanced(), which does a wakeup.

    You can't do wakeups while holding the rnp->lock, as that would mean
    that you could not do a rcu_read_unlock() while holding the rq lock, or
    any lock that was taken while holding the rq lock. This is because...
    (See below).

    > [12573.295067]
    > -> #0 (rcu_node_0){..-.-.}:
    > [12573.309293] [] __lock_acquire+0x1786/0x1af0
    > [12573.316568] [] lock_acquire+0x91/0x1f0
    > [12573.323825] [] _raw_spin_lock+0x40/0x80
    > [12573.331081] [] rcu_read_unlock_special+0x9f/0x4c0
    > [12573.338377] [] __rcu_read_unlock+0x96/0xa0
    > [12573.345648] [] perf_lock_task_context+0x143/0x2d0
    > [12573.352942] [] find_get_context+0x4e/0x1f0
    > [12573.360211] [] SYSC_perf_event_open+0x514/0xbd0
    > [12573.367514] [] SyS_perf_event_open+0x9/0x10
    > [12573.374816] [] tracesys+0xdd/0xe2

    Notice the above trace.

    perf took its own ctx->lock, which can be taken while holding the rq
    lock. While holding this lock, it did a rcu_read_unlock(). The
    perf_lock_task_context() basically looks like:

    rcu_read_lock();
    raw_spin_lock(ctx->lock);
    rcu_read_unlock();

    Now, what looks to have happened, is that we scheduled after taking that
    first rcu_read_lock() but before taking the spin lock. When we scheduled
    back in and took the ctx->lock, the following rcu_read_unlock()
    triggered the "special" code.

    The rcu_read_unlock_special() takes the rnp->lock, which gives us a
    possible deadlock scenario.

    CPU0                      CPU1                            CPU2
    ----                      ----                            ----
                                                              rcu_nocb_kthread()
    lock(rq->lock);
                              lock(ctx->lock);
                                                              lock(rnp->lock);

                                                              wake_up();

                                                              lock(rq->lock);

                              rcu_read_unlock();

                              rcu_read_unlock_special();

                              lock(rnp->lock);
    lock(ctx->lock);

    **** DEADLOCK ****

    > [12573.382068]
    > other info that might help us debug this:
    >
    > [12573.403229] Chain exists of:
    > rcu_node_0 --> &rq->lock --> &ctx->lock
    >
    > [12573.424471] Possible unsafe locking scenario:
    >
    > [12573.438499] CPU0 CPU1
    > [12573.445599] ---- ----
    > [12573.452691] lock(&ctx->lock);
    > [12573.459799] lock(&rq->lock);
    > [12573.467010] lock(&ctx->lock);
    > [12573.474192] lock(rcu_node_0);
    > [12573.481262]
    > *** DEADLOCK ***
    >
    > [12573.501931] 1 lock held by trinity-child17/31341:
    > [12573.508990] #0: (&ctx->lock){-.-...}, at: [] perf_lock_task_context+0x7d/0x2d0
    > [12573.516475]
    > stack backtrace:
    > [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
    > [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
    > [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
    > [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
    > [12573.567856] Call Trace:
    > [12573.575011] [] dump_stack+0x19/0x1b
    > [12573.582284] [] print_circular_bug+0x200/0x20f
    > [12573.589637] [] __lock_acquire+0x1786/0x1af0
    > [12573.596982] [] ? sched_clock_cpu+0xb5/0x100
    > [12573.604344] [] lock_acquire+0x91/0x1f0
    > [12573.611652] [] ? rcu_read_unlock_special+0x9f/0x4c0
    > [12573.619030] [] _raw_spin_lock+0x40/0x80
    > [12573.626331] [] ? rcu_read_unlock_special+0x9f/0x4c0
    > [12573.633671] [] rcu_read_unlock_special+0x9f/0x4c0
    > [12573.640992] [] ? perf_lock_task_context+0x7d/0x2d0
    > [12573.648330] [] ? put_lock_stats.isra.29+0xe/0x40
    > [12573.655662] [] ? delay_tsc+0x90/0xe0
    > [12573.662964] [] __rcu_read_unlock+0x96/0xa0
    > [12573.670276] [] perf_lock_task_context+0x143/0x2d0
    > [12573.677622] [] ? __perf_event_enable+0x370/0x370
    > [12573.684981] [] find_get_context+0x4e/0x1f0
    > [12573.692358] [] SYSC_perf_event_open+0x514/0xbd0
    > [12573.699753] [] ? get_parent_ip+0xd/0x50
    > [12573.707135] [] ? trace_hardirqs_on_caller+0xfd/0x1c0
    > [12573.714599] [] SyS_perf_event_open+0x9/0x10
    > [12573.721996] [] tracesys+0xdd/0xe2

    This commit delays the wakeup via irq_work(), which is what
    perf and ftrace use to perform wakeups in critical sections.
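
    A sketch of the deferred-wakeup pattern (the irq_work member and
    handler names are illustrative, assuming an irq_work field added to
    rcu_state); the lock holder only queues the work, and the handler does
    the wake_up() later from a clean context:

        /* Runs from the irq_work path, with no rcu_node ->lock held. */
        static void rsp_wakeup(struct irq_work *work)
        {
                struct rcu_state *rsp = container_of(work, struct rcu_state,
                                                     wakeup_work);

                wake_up(&rsp->gp_wq);
        }

        /* ...and at the former wake_up() call site, with rnp->lock held:
         *         irq_work_queue(&rsp->wakeup_work);
         */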

    Reported-by: Dave Jones
    Signed-off-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney

    Steven Rostedt
     

09 Jun, 2013

5 commits

  • Pull timer fixes from Thomas Gleixner:

    - Trivial: unused variable removal

    - Posix-timers: Add the clock ID to the new proc interface to make it
    useful. The interface is new and should be functional when we reach
    the final 3.10 release.

    - Cure a false positive warning in the tick code introduced by the
    overhaul in 3.10

    - Fix for a persistent clock detection regression introduced in this
    cycle

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    timekeeping: Correct run-time detection of persistent_clock.
    ntp: Remove unused variable flags in __hardpps
    posix-timers: Show clock ID in proc file
    tick: Cure broadcast false positive pending bit warning

    Linus Torvalds
     
  • Pull irqdomain bug fixes from Grant Likely:
    "This branch contains a set of straight forward bug fixes to the
    irqdomain code and to a couple of drivers that make use of it."

    * tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux:
    irqchip: Return -EPERM for reserved IRQs
    irqdomain: document the simple domain first_irq
    kernel/irq/irqdomain.c: before use 'irq_data', need check it whether valid.
    irqdomain: export irq_domain_add_simple

    Linus Torvalds
     
  • The first_irq needs to be zero to get a linear domain and that
    comes with special semantics. We want to simplify this going
    forward but some documentation never hurts.
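
    For context, first_irq is the third argument of irq_domain_add_simple();
    a hedged usage sketch (the device-tree node, domain size and ops are
    placeholders):

        /* first_irq == 0 requests a linear domain, so Linux interrupt
         * numbers are allocated on demand rather than preassigned. */
        domain = irq_domain_add_simple(np, 32, 0, &my_irqdomain_ops, NULL);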

    Signed-off-by: Linus Walleij
    Signed-off-by: Grant Likely

    Linus Walleij
     
    irq_data may be NULL; if it is, we WARN_ON() and continue. In that case
    'hwirq', which is derived from 'irq_data', has to be initialized later,
    or it will cause problems.

    Signed-off-by: Chen Gang
    Signed-off-by: Grant Likely

    Chen Gang
     
  • All other irq_domain_add_* functions are exported already, and apparently
    this one got left out by mistake, which causes build errors for ARM
    allmodconfig kernels:

    ERROR: "irq_domain_add_simple" [drivers/gpio/gpio-rcar.ko] undefined!
    ERROR: "irq_domain_add_simple" [drivers/gpio/gpio-em.ko] undefined!

    Signed-off-by: Arnd Bergmann
    Acked-by: Simon Horman
    Signed-off-by: Grant Likely

    Arnd Bergmann
     

08 Jun, 2013

1 commit

  • …l/git/rostedt/linux-trace

    Pull tracing fixes from Steven Rostedt:
    "This contains 4 fixes.

    The first two fix the case where full RCU debugging is enabled,
    enabling function tracing causes a live lock of the system. This is
    due to the added debug checks in rcu_dereference_raw() that is used by
    the function tracer. These checks are themselves traced by the function
    tracer and add enough overhead that the time needed to finish an
    interrupt can exceed the time until the next interrupt is triggered,
    causing a live lock driven by the timer interrupt.

    Talking this over with Paul McKenney, we came up with a fix that adds
    a new rcu_dereference_raw_notrace() that does not perform these added
    checks, and let the function tracer use that.

    The third commit fixes a failed compile when branch tracing is
    enabled, due to the conversion of the trace_test_buffer() selftest
    that the branch trace wasn't converted for.

    The fourth patch fixes a bug caught by the RCU lockdep code where an
    rcu_read_lock() is performed while RCU is disabled (either going to or
    from idle, or in user space). This happened in the irqsoff tracer as it
    calls task_uid(). The fix here was to use current_uid() when possible,
    which does not use RCU locking and which, luckily, is always applicable
    when irqsoff calls this code."

    * tag 'trace-fixes-v3.10-rc3-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Use current_uid() for critical time tracing
    tracing: Fix bad parameter passed in branch selftest
    ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends
    rcu: Add _notrace variation of rcu_dereference_raw() and hlist_for_each_entry_rcu()

    Linus Torvalds
     

07 Jun, 2013

1 commit

  • The irqsoff tracer records the max time that interrupts are disabled.
    There are hooks in the assembly code that call back into the tracer when
    interrupts are disabled or enabled.

    When they are enabled, the tracer checks if the amount of time they
    were disabled is larger than the previous recorded max interrupts off
    time. If it is, it creates a snapshot of the currently running trace
    to store where the last largest interrupts off time was held and how
    it happened.

    During testing, this RCU lockdep dump appeared:

    [ 1257.829021] ===============================
    [ 1257.829021] [ INFO: suspicious RCU usage. ]
    [ 1257.829021] 3.10.0-rc1-test+ #171 Tainted: G W
    [ 1257.829021] -------------------------------
    [ 1257.829021] /home/rostedt/work/git/linux-trace.git/include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
    [ 1257.829021]
    [ 1257.829021] other info that might help us debug this:
    [ 1257.829021]
    [ 1257.829021]
    [ 1257.829021] RCU used illegally from idle CPU!
    [ 1257.829021] rcu_scheduler_active = 1, debug_locks = 0
    [ 1257.829021] RCU used illegally from extended quiescent state!
    [ 1257.829021] 2 locks held by trace-cmd/4831:
    [ 1257.829021] #0: (max_trace_lock){......}, at: [] stop_critical_timing+0x1a3/0x209
    [ 1257.829021] #1: (rcu_read_lock){.+.+..}, at: [] __update_max_tr+0x88/0x1ee
    [ 1257.829021]
    [ 1257.829021] stack backtrace:
    [ 1257.829021] CPU: 3 PID: 4831 Comm: trace-cmd Tainted: G W 3.10.0-rc1-test+ #171
    [ 1257.829021] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
    [ 1257.829021] 0000000000000001 ffff880065f49da8 ffffffff8153dd2b ffff880065f49dd8
    [ 1257.829021] ffffffff81092a00 ffff88006bd78680 ffff88007add7500 0000000000000003
    [ 1257.829021] ffff88006bd78680 ffff880065f49e18 ffffffff810daebf ffffffff810dae5a
    [ 1257.829021] Call Trace:
    [ 1257.829021] [] dump_stack+0x19/0x1b
    [ 1257.829021] [] lockdep_rcu_suspicious+0x109/0x112
    [ 1257.829021] [] __update_max_tr+0xed/0x1ee
    [ 1257.829021] [] ? __update_max_tr+0x88/0x1ee
    [ 1257.829021] [] ? user_enter+0xfd/0x107
    [ 1257.829021] [] update_max_tr_single+0x11d/0x12d
    [ 1257.829021] [] ? user_enter+0xfd/0x107
    [ 1257.829021] [] stop_critical_timing+0x141/0x209
    [ 1257.829021] [] ? trace_hardirqs_on+0xd/0xf
    [ 1257.829021] [] ? user_enter+0xfd/0x107
    [ 1257.829021] [] time_hardirqs_on+0x2a/0x2f
    [ 1257.829021] [] ? user_enter+0xfd/0x107
    [ 1257.829021] [] trace_hardirqs_on_caller+0x16/0x197
    [ 1257.829021] [] trace_hardirqs_on+0xd/0xf
    [ 1257.829021] [] user_enter+0xfd/0x107
    [ 1257.829021] [] do_notify_resume+0x92/0x97
    [ 1257.829021] [] int_signal+0x12/0x17

    What happened was that on entering user code, interrupts were enabled
    and a new maximum interrupts-off time was recorded. The trace buffer was
    saved along with various information about the task: comm, pid, uid,
    priority, etc.

    The uid is recorded with task_uid(tsk). But that is a macro that uses
    rcu_read_lock() to retrieve the data, and here it happened to run where
    RCU is blind (user_enter).

    As only the preempt and irqs-off tracers can trigger this, and for both
    of them tsk == current, use current_uid() instead of task_uid() when
    tsk == current; current_uid() does not use RCU, since only current can
    change its own uid.

    This fixes the RCU suspicious splat.
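
    A sketch of the resulting logic (the destination field name is
    illustrative):

        /* current_uid() reads current's credentials without rcu_read_lock(),
         * which is safe because only current can change its own uid. */
        if (tsk == current)
                max_data->uid = current_uid();
        else
                max_data->uid = task_uid(tsk);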

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

03 Jun, 2013

1 commit

  • Pull cgroup fixes from Tejun Heo:

    - Fix for yet another xattr bug which may lead to NULL deref.

    - A subtle bug in for_each_descendant_pre(). This bug requires quite
    specific conditions to trigger and isn't too likely to actually
    happen in the wild, but maybe that just makes it that much more
    nastier.

    - A warning message added for silly cgroup re-mount (not -o remount,
    but unmount followed by mount) behavior.

    * 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: warn about mismatching options of a new mount of an existing hierarchy
    cgroup: fix a subtle bug in descendant pre-order walk
    cgroup: initialize xattr before calling d_instantiate()

    Linus Torvalds
     

31 May, 2013

1 commit

  • Pull x86 fixes from Peter Anvin:

    - Three EFI-related fixes

    - Two early memory initialization fixes

    - build fix for older binutils

    - fix for an eager FPU performance regression -- currently we don't
    allow the use of the FPU at interrupt time *at all* in eager mode,
    which is clearly wrong.

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86: Allow FPU to be used at interrupt time even with eagerfpu
    x86, crc32-pclmul: Fix build with older binutils
    x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
    x86, range: fix missing merge during add range
    x86, efi: initial the local variable of DataSize to zero
    efivar: fix oops in efivar_update_sysfs_entries() caused by memory reuse
    efivarfs: Never return ENOENT from firmware again

    Linus Torvalds
     

30 May, 2013

1 commit

  • The branch selftest calls trace_test_buffer(), but with the new code
    it expects the first parameter to be a pointer to a struct trace_buffer.
    All self tests were changed but the branch selftest was missed.

    This caused either a crash or failed test when the branch selftest was
    enabled.

    Link: http://lkml.kernel.org/r/20130529141333.GA24064@localhost

    Reported-by: Fengguang Wu
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

29 May, 2013

6 commits

  • Thomas Gleixner
     
    Under RCU debug config options, rcu_dereference_raw() can add quite a
    few checks, and since tracing uses rcu_dereference_raw(), these checks
    run in the function tracer. The function tracer also happens to trace
    these debug checks themselves. This added overhead can livelock the system.

    Have the function tracer use the new RCU _notrace equivalents that do
    not do the debug checks for RCU.
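
    A sketch of the change in an ftrace hash-lookup path (the surrounding
    variables are assumptions); the only difference is the _notrace suffix,
    which omits the RCU debug checks that would otherwise be traced
    recursively:

        /* Was: hlist_for_each_entry_rcu(entry, hhd, hlist) */
        hlist_for_each_entry_rcu_notrace(entry, hhd, hlist) {
                if (entry->ip == ip)
                        return entry;
        }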

    Link: http://lkml.kernel.org/r/20130528184209.467603904@goodmis.org

    Acked-by: Paul E. McKenney
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
    With the newly introduced __DEVEL__sane_behavior mount option,
    if the root cgroup is already mounted without xattr support,
    mounting a new cgroup hierarchy with xattr is rejected by design,
    which is just fine. However, if the root cgroup was not mounted
    with __DEVEL__sane_behavior, creating a new cgroup mount with the
    xattr option will succeed, although the EA functionality does not
    work as expected afterwards: setting attributes under either
    cgroup returns ENOTSUPP. e.g.

    setfattr: /cgroup2/test: Operation not supported

    Instead of staying silent in this case, it's better to emit a log
    entry at warning level. That helps the user understand what is
    going on behind the scenes, and it is essentially an improvement
    that does not break backward compatibility.

    With this fix, the mount attempt above still works as usual, but
    the following line can be found in the system log:

    [ ...] cgroup: new mount options do not match the existing superblock

    tj: minor formatting / message updates.

    Signed-off-by: Jie Liu
    Reported-by: Alexey Kodanev
    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org

    Jeff Liu
     
    Since commit 31ade30692dc9680bfc95700d794818fa3f754ac, timekeeping_init()
    checks for the presence of a persistent clock by attempting to read a
    non-zero time value. This is an issue on platforms where the persistent
    clock is implemented as a free-running counter (instead of an RTC) that
    starts from zero on each boot and runs during suspend. Examples are some
    ARM platforms (e.g. PandaBoard).

    An attempt to read such a clock during timekeeping_init() may return zero
    value and falsely declare persistent clock as missing. Additionally, in
    the above case suspend times may be accounted twice (once from
    timekeeping_resume() and once from rtc_resume()), resulting in a gradual
    drift of system time.

    This patch does a run-time correction of the issue by doing the same check
    during timekeeping_suspend().
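
    A sketch of that run-time correction (the flag name is an assumption
    based on the earlier commit); by suspend time a functional free-running
    persistent clock reads non-zero, so the earlier false negative can be
    corrected:

        /* In timekeeping_suspend(): */
        read_persistent_clock(&timekeeping_suspend_time);
        if (timekeeping_suspend_time.tv_sec || timekeeping_suspend_time.tv_nsec)
                persistent_clock_exist = true;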

    A better long-term solution would have to return error when trying to read
    non-existing clock and zero when trying to read an uninitialized clock, but
    that would require changing all persistent_clock implementations.

    This patch addresses the immediate breakage, for now.

    Cc: John Stultz
    Cc: Thomas Gleixner
    Cc: Feng Tang
    Cc: stable@vger.kernel.org
    Signed-off-by: Zoran Markovic
    [jstultz: Tweaked commit message and subject]
    Signed-off-by: John Stultz

    Zoran Markovic
     
  • kernel/time/ntp.c: In function ‘__hardpps’:
    kernel/time/ntp.c:877: warning: unused variable ‘flags’

    commit a076b2146fabb0894cae5e0189a8ba3f1502d737 ("ntp: Remove ntp_lock,
    using the timekeeping locks to protect ntp state") removed its users,
    but not the actual variable.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: John Stultz

    Geert Uytterhoeven
     
  • …it/rostedt/linux-trace

    Pull tracing fixes from Steven Rostedt:
    "Two more fixes:

    The first one was reported by Mauro Carvalho Chehab, where if a poll()
    is done against a trace buffer for a CPU that has never been online,
    it will crash the kernel, as buffers are only created when a CPU comes
    online, but the trace files are for all possible CPUs.

    This fix is to check if the buffer was allocated and if not return
    -EINVAL.

    That was the simple fix, the real fix is a bit more complex and not
    for a -rc release. We could have the files created when the CPUs come
    online. That would require some design changes.

    The second one was reported by Peter Zijlstra. If the kernel command
    line has ftrace=nop, it will lock up the system on boot up. This is
    because the new design for 3.10 has the nop tracer bootstrap the
    tracing subsystem. When ftrace=<trace> is defined, when a that tracer
    is registered, it starts the tracing, but uses the nop tracer to clear
    things out. What happened here was that ftrace=nop caused the
    registering of nop to start it and use nop before it was initialized.

    The only thing nop needs to have done to initialize it is to have the
    tracer point its current_tracer structure member to the nop tracer.
    Doing that before registering the nop tracer makes everything work."

    * tag 'trace-fixes-v3.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    ring-buffer: Do not poll non allocated cpu buffers
    tracing: Fix crash when ftrace=nop on the kernel command line

    Linus Torvalds
     

28 May, 2013

2 commits

    The tracing infrastructure sets up files for all possible CPUs, but
    because it uses the ring-buffer polling code, it is possible to call
    that code for a CPU whose buffer hasn't been allocated. This will cause
    a kernel oops when it accesses a ring-buffer cpu buffer that is part of
    the possible cpus but hasn't been allocated yet because the CPU has
    never been online.
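
    A sketch of the check this implies in the ring-buffer poll path (exact
    placement assumed):

        /* Refuse to poll a per-CPU buffer that was never allocated because
         * the CPU has never been online. */
        if (cpu != RING_BUFFER_ALL_CPUS &&
            !cpumask_test_cpu(cpu, buffer->cpumask))
                return -EINVAL;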

    Reported-by: Mauro Carvalho Chehab
    Tested-by: Mauro Carvalho Chehab
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • commit 26517f3e (tick: Avoid programming the local cpu timer if
    broadcast pending) added a warning if the cpu enters broadcast mode
    again while the pending bit is still set. Meelis reported that the
    warning triggers. There are two corner cases which have been not
    considered:

    1) cpuidle calls clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
    twice. That can result in the following scenario

    CPU0                          CPU1
                                  cpuidle_idle_call()
                                    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                                      set cpu in tick_broadcast_oneshot_mask

    broadcast interrupt
      event expired for cpu1
      set pending bit

                                  acpi_idle_enter_simple()
                                    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                                      WARN_ON(pending bit)

    Move the WARN_ON into the section where we enter broadcast mode so
    it won't provide false positives on the second call.

    2) safe_halt() enables interrupts, so a broadcast interrupt can be
    delivered before the broadcast mode is disabled. That sets the
    pending bit for the CPU which receives the broadcast
    interrupt. The interrupt is delivered right away from the
    broadcast handler, though, which leaves the pending bit stale.

    Clear the pending bit for the current cpu in the broadcast handler.
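
    A sketch of the two changes (mask and helper names as in the 3.10
    tick-broadcast code, to the best of my knowledge):

        /* 1) Warn only when actually entering broadcast mode, so a nested
         *    BROADCAST_ENTER call cannot trip the check. */
        if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
                WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask));
                /* program or shut down the local device as before */
        }

        /* 2) In the broadcast handler: the interrupt was just delivered to
         *    this CPU, so any pending bit for it is stale. */
        cpumask_clear_cpu(smp_processor_id(), tick_broadcast_pending_mask);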

    Reported-and-tested-by: Meelis Roos
    Cc: Len Brown
    Cc: Frederic Weisbecker
    Cc: Borislav Petkov
    Cc: Rafael J. Wysocki
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305271841130.4220@ionos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

25 May, 2013

2 commits

  • Fix kernel-doc warnings in kernel/auditfilter.c:

    Warning(kernel/auditfilter.c:1029): Excess function parameter 'loginuid' description in 'audit_receive_filter'
    Warning(kernel/auditfilter.c:1029): Excess function parameter 'sessionid' description in 'audit_receive_filter'
    Warning(kernel/auditfilter.c:1029): Excess function parameter 'sid' description in 'audit_receive_filter'

    Signed-off-by: Randy Dunlap
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • …it/rostedt/linux-trace

    Pull tracing fix from Steven Rostedt:
    "Masami Hiramatsu fixed another bug. This time returning a proper
    result in event_enable_func(). After checking the return status of
    try_module_get(), it returned the status of try_module_get().

    But try_module_get() returns 0 on failure, which is success for
    event_enable_func()"

    * tag 'trace-fixes-v3.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Return -EBUSY when event_enable_func() fails to get module

    Linus Torvalds
     

24 May, 2013

1 commit

  • When cgroup_next_descendant_pre() initiates a walk, it checks whether
    the subtree root doesn't have any children and if not returns NULL.
    Later code assumes that the subtree isn't empty. This is broken
    because the subtree may become empty in the meantime, which can lead
    to the traversal escaping the subtree by walking to the sibling of
    the subtree root.

    There's no reason to have the early exit path. Remove it along with
    the later assumption that the subtree isn't empty. This simplifies
    the code a bit and fixes the subtle bug.

    While at it, fix the comment of cgroup_for_each_descendant_pre() which
    was incorrectly referring to ->css_offline() instead of
    ->css_online().

    Signed-off-by: Tejun Heo
    Reviewed-by: Michal Hocko
    Cc: stable@vger.kernel.org

    Tejun Heo
     

23 May, 2013

1 commit

    If ftrace= is set on the kernel command line, then when the named tracer
    is registered, tracing_set_tracer() is called to start executing that
    tracer.

    The nop tracer is just a stub tracer that is used to have no tracer
    enabled. It is assigned at early bootup as it is the default tracer.

    But if ftrace=nop is on the kernel command line, the registering of the
    nop tracer will call tracing_set_tracer() which will try to execute
    the nop tracer. But it expects tr->current_trace to be assigned something
    as it usually is assigned to the nop tracer. As it hasn't been assigned
    to anything yet, it causes the system to crash.

    The simple fix is to move the tr->current_trace = nop before registering
    the nop tracer. The functionality is still the same as the nop tracer
    doesn't do anything anyway.
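
    A sketch of the fix in the tracer bootstrap path (assuming the 3.10
    names global_trace, nop_trace and register_tracer()):

        /* Point current_trace at nop *before* registering it, so that
         * ftrace=nop cannot run tracing_set_tracer() against an
         * unassigned current_trace. */
        global_trace.current_trace = &nop_trace;

        register_tracer(&nop_trace);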

    Reported-by: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

19 May, 2013

1 commit


18 May, 2013

1 commit

    Christian found that v3.9 does not work on an E350 with EFI enabled.

    [ 1.658832] Trying to unpack rootfs image as initramfs...
    [ 1.679935] BUG: unable to handle kernel paging request at ffff88006e3fd000
    [ 1.686940] IP: [] memset+0x1f/0xb0
    [ 1.692010] PGD 1f77067 PUD 1f7a067 PMD 61420067 PTE 0

    but early memtest reports that all memory can be accessed without problems.

    The early page table is set up in the following sequence:
    [ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
    [ 0.000000] init_memory_mapping: [mem 0x6e600000-0x6e7fffff]
    [ 0.000000] init_memory_mapping: [mem 0x6c000000-0x6e5fffff]
    [ 0.000000] init_memory_mapping: [mem 0x00100000-0x6bffffff]
    [ 0.000000] init_memory_mapping: [mem 0x6e800000-0x6ea07fff]
    but later efi_enter_virtual_mode() wrongly tries to set up the mapping again:
    [ 0.010644] pid_max: default: 32768 minimum: 301
    [ 0.015302] init_memory_mapping: [mem 0x640c5000-0x6e3fcfff]
    which means the pfn_range_is_mapped() check fails.

    It turns out that we have a bug in add_range_with_merge(): it does not
    merge ranges properly when a newly added range fills the hole between two
    existing ranges. In this case, [mem 0x00100000-0x6bffffff] is the hole
    between [mem 0x00000000-0x000fffff] and [mem 0x6c000000-0x6e7fffff].

    Fix add_range_with_merge() by having it call itself recursively.

    Reported-by: "Christian König"
    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/CAE9FiQVofGoSk7q5-0irjkBxemqK729cND4hov-1QCBJDhxpgQ@mail.gmail.com
    Cc: v3.9
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

17 May, 2013

3 commits

    As kmemleak now scans all module sections that are allocated, writable
    and non-executable, there's no need to scan individual sections that
    might reference data.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Catalin Marinas
    Acked-by: Rusty Russell

    Steven Rostedt
     
  • Instead of just picking data sections by name (names that start
    with .data, .bss or .ref.data), use the section flags and scan all
    sections that are allocated, writable and not executable. Which should
    cover all sections of a module that might reference data.
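
    A sketch of the selection criterion (kmemleak_scan_area() is the real
    API; the surrounding section loop is assumed):

        /* Scan any module section that is allocated, writable and not
         * executable, i.e. anything that could legitimately hold pointers. */
        if ((shdr->sh_flags & SHF_ALLOC) &&
            (shdr->sh_flags & SHF_WRITE) &&
            !(shdr->sh_flags & SHF_EXECINSTR))
                kmemleak_scan_area((void *)shdr->sh_addr, shdr->sh_size,
                                   GFP_KERNEL);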

    Signed-off-by: Steven Rostedt
    [catalin.marinas@arm.com: removed unused 'name' variable]
    [catalin.marinas@arm.com: collapsed 'if' blocks]
    Signed-off-by: Catalin Marinas
    Acked-by: Rusty Russell

    Steven Rostedt
     
  • Pull workqueue fixes from Tejun Heo:
    "Three more workqueue regression fixes.

    - Fix unbalanced unlock in trylock failure path of manage_workers().
    This shouldn't happen often in the wild but is possible.

    - While making schedule_work() and friends inline, they become
    unavailable to !GPL modules. Allow !GPL modules to access basic
    stuff - system_wq and queue_*work_on() - so that schedule_work()
    and friends can be used.

    - During boot, the unbound NUMA support code allocates a cpumask for
    each possible node using alloc_cpumask_var_node(), which ends up
    trying to allocate node-specific memory even for offline nodes
    triggering BUG in the memory alloc code. Use NUMA_NO_NODE for
    offline nodes."

    * 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: don't perform NUMA-aware allocations on offline nodes in wq_numa_init()
    workqueue: Make schedule_work() available again to non GPL modules
    workqueue: correct handling of the pool spin_lock

    Linus Torvalds