11 Jun, 2013

13 commits

  • Now that TINY_PREEMPT_RCU is no more, exit_rcu() is always an empty
    function. But if TINY_RCU is going to have an empty function, it should
    be in include/linux/rcutiny.h, where it does not bloat the kernel.
    This commit therefore moves exit_rcu() out of kernel/rcupdate.c to
    kernel/rcutree_plugin.h, and places a static inline empty function in
    include/linux/rcutiny.h in order to shrink TINY_RCU a bit.
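
    For reference, a minimal sketch of what the TINY_RCU stub in
    include/linux/rcutiny.h plausibly looks like (exact body assumed); a
    static inline empty function compiles away entirely at each call site:

        /* Sketch: TINY_RCU needs no exit-time cleanup, so the stub is empty
         * and the compiler inlines it to nothing. */
        static inline void exit_rcu(void)
        {
        }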

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This commit rearranges code in order to allow ifdefs to be consolidated
    in kernel/rcutiny_plugin.h, simplifying the code.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • With the removal of CONFIG_TINY_PREEMPT_RCU, check_cpu_stall_preempt()
    is now an empty function. This commit therefore eliminates it by
    inlining it.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • TINY_PREEMPT_RCU could use a kthread to handle RCU callback invocation,
    which required an API to abstract kthread vs. softirq invocation.
    Now that TINY_PREEMPT_RCU is no longer with us, this commit retires
    this API in favor of direct use of the relevant softirq primitives.
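
    As a sketch of what "direct use of the relevant softirq primitives"
    means here (the wrapper name is illustrative), callers simply raise
    RCU_SOFTIRQ instead of going through the kthread/softirq abstraction:

        /* Sketch: queue callback invocation by raising the RCU softirq. */
        static void invoke_rcu_callbacks(void)
        {
                raise_softirq(RCU_SOFTIRQ);
        }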

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • With the removal of CONFIG_TINY_PREEMPT_RCU, rcu_preempt_process_callbacks()
    is now an empty function. This commit therefore eliminates it by
    inlining it.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • With the removal of CONFIG_TINY_PREEMPT_RCU, rcu_preempt_remove_callbacks()
    is now an empty function. This commit therefore eliminates it by
    inlining it.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • With the removal of CONFIG_TINY_PREEMPT_RCU, rcu_preempt_check_callbacks()
    is now an empty function. This commit therefore eliminates it by
    inlining it.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • With the removal of CONFIG_TINY_PREEMPT_RCU, show_tiny_preempt_stats()
    is now an empty function. This commit therefore eliminates it by
    inlining it.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • TINY_PREEMPT_RCU adds significant code and complexity, but does not
    offer commensurate benefits. People currently using TINY_PREEMPT_RCU
    can get much better memory footprint with TINY_RCU, or, if they really
    need preemptible RCU, they can use TREE_PREEMPT_RCU with a relatively
    minor degradation in memory footprint. Please note that this move
    has been widely publicized on LKML (https://lkml.org/lkml/2012/11/12/545)
    and on LWN (http://lwn.net/Articles/541037/).

    This commit therefore removes TINY_PREEMPT_RCU.

    Signed-off-by: Paul E. McKenney
    [ paulmck: Updated to eliminate #else in rcutiny.h as suggested by Josh ]
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This commit converts printk() calls to the corresponding pr_*() calls.
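
    The conversion is mechanical; a hedged example (the message text is
    made up for illustration):

        /* Before: explicit log level in the format string. */
        printk(KERN_INFO "rcu: grace period started\n");

        /* After: the pr_*() helper encodes the level. */
        pr_info("rcu: grace period started\n");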

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This commit converts printk() calls to the corresponding pr_*() calls.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • In Steven Rostedt's words:

    > I've been debugging the last couple of days why my tests have been
    > locking up. One of my tracing tests runs all available tracers. The
    > lockup always happened with the mmiotrace, which is used to trace
    > interactions between priority drivers and the kernel. But to do this
    > easily, when the tracer gets registered, it disables all but the boot
    > CPUs. The lockup always happened after it got done disabling the CPUs.
    >
    > Then I decided to try this:
    >
    > while :; do
    > for i in 1 2 3; do
    > echo 0 > /sys/devices/system/cpu/cpu$i/online
    > done
    > for i in 1 2 3; do
    > echo 1 > /sys/devices/system/cpu/cpu$i/online
    > done
    > done
    >
    > Well, sure enough, that locked up too, with the same users. Doing a
    > sysrq-w (showing all blocked tasks):
    >
    > [ 2991.344562] task PC stack pid father
    > [ 2991.344562] rcu_preempt D ffff88007986fdf8 0 10 2 0x00000000
    > [ 2991.344562] ffff88007986fc98 0000000000000002 ffff88007986fc48 0000000000000908
    > [ 2991.344562] ffff88007986c280 ffff88007986ffd8 ffff88007986ffd8 00000000001d3c80
    > [ 2991.344562] ffff880079248a40 ffff88007986c280 0000000000000000 00000000fffd4295
    > [ 2991.344562] Call Trace:
    > [ 2991.344562] [] schedule+0x64/0x66
    > [ 2991.344562] [] schedule_timeout+0xbc/0xf9
    > [ 2991.344562] [] ? ftrace_call+0x5/0x2f
    > [ 2991.344562] [] ? cascade+0xa8/0xa8
    > [ 2991.344562] [] schedule_timeout_uninterruptible+0x1e/0x20
    > [ 2991.344562] [] rcu_gp_kthread+0x502/0x94b
    > [ 2991.344562] [] ? __init_waitqueue_head+0x50/0x50
    > [ 2991.344562] [] ? rcu_gp_fqs+0x64/0x64
    > [ 2991.344562] [] kthread+0xb1/0xb9
    > [ 2991.344562] [] ? lock_release_holdtime.part.23+0x4e/0x55
    > [ 2991.344562] [] ? __init_kthread_worker+0x58/0x58
    > [ 2991.344562] [] ret_from_fork+0x7c/0xb0
    > [ 2991.344562] [] ? __init_kthread_worker+0x58/0x58
    > [ 2991.344562] kworker/0:1 D ffffffff81a30680 0 47 2 0x00000000
    > [ 2991.344562] Workqueue: events cpuset_hotplug_workfn
    > [ 2991.344562] ffff880078dbbb58 0000000000000002 0000000000000006 00000000000000d8
    > [ 2991.344562] ffff880078db8100 ffff880078dbbfd8 ffff880078dbbfd8 00000000001d3c80
    > [ 2991.344562] ffff8800779ca5c0 ffff880078db8100 ffffffff81541fcf 0000000000000000
    > [ 2991.344562] Call Trace:
    > [ 2991.344562] [] ? __mutex_lock_common+0x3d4/0x609
    > [ 2991.344562] [] schedule+0x64/0x66
    > [ 2991.344562] [] schedule_preempt_disabled+0x18/0x24
    > [ 2991.344562] [] __mutex_lock_common+0x3d4/0x609
    > [ 2991.344562] [] ? get_online_cpus+0x3c/0x50
    > [ 2991.344562] [] ? get_online_cpus+0x3c/0x50
    > [ 2991.344562] [] mutex_lock_nested+0x3b/0x40
    > [ 2991.344562] [] get_online_cpus+0x3c/0x50
    > [ 2991.344562] [] rebuild_sched_domains_locked+0x6e/0x3a8
    > [ 2991.344562] [] rebuild_sched_domains+0x1c/0x2a
    > [ 2991.344562] [] cpuset_hotplug_workfn+0x1c7/0x1d3
    > [ 2991.344562] [] ? cpuset_hotplug_workfn+0x5/0x1d3
    > [ 2991.344562] [] process_one_work+0x2d4/0x4d1
    > [ 2991.344562] [] ? process_one_work+0x207/0x4d1
    > [ 2991.344562] [] worker_thread+0x2e7/0x3b5
    > [ 2991.344562] [] ? rescuer_thread+0x332/0x332
    > [ 2991.344562] [] kthread+0xb1/0xb9
    > [ 2991.344562] [] ? __init_kthread_worker+0x58/0x58
    > [ 2991.344562] [] ret_from_fork+0x7c/0xb0
    > [ 2991.344562] [] ? __init_kthread_worker+0x58/0x58
    > [ 2991.344562] bash D ffffffff81a4aa80 0 2618 2612 0x10000000
    > [ 2991.344562] ffff8800379abb58 0000000000000002 0000000000000006 0000000000000c2c
    > [ 2991.344562] ffff880077fea140 ffff8800379abfd8 ffff8800379abfd8 00000000001d3c80
    > [ 2991.344562] ffff8800779ca5c0 ffff880077fea140 ffffffff81541fcf 0000000000000000
    > [ 2991.344562] Call Trace:
    > [ 2991.344562] [] ? __mutex_lock_common+0x3d4/0x609
    > [ 2991.344562] [] schedule+0x64/0x66
    > [ 2991.344562] [] schedule_preempt_disabled+0x18/0x24
    > [ 2991.344562] [] __mutex_lock_common+0x3d4/0x609
    > [ 2991.344562] [] ? rcu_cpu_notify+0x2f5/0x86e
    > [ 2991.344562] [] ? rcu_cpu_notify+0x2f5/0x86e
    > [ 2991.344562] [] mutex_lock_nested+0x3b/0x40
    > [ 2991.344562] [] rcu_cpu_notify+0x2f5/0x86e
    > [ 2991.344562] [] ? __lock_is_held+0x32/0x53
    > [ 2991.344562] [] notifier_call_chain+0x6b/0x98
    > [ 2991.344562] [] __raw_notifier_call_chain+0xe/0x10
    > [ 2991.344562] [] __cpu_notify+0x20/0x32
    > [ 2991.344562] [] cpu_notify_nofail+0x17/0x36
    > [ 2991.344562] [] _cpu_down+0x154/0x259
    > [ 2991.344562] [] cpu_down+0x2d/0x3a
    > [ 2991.344562] [] store_online+0x4e/0xe7
    > [ 2991.344562] [] dev_attr_store+0x20/0x22
    > [ 2991.344562] [] sysfs_write_file+0x108/0x144
    > [ 2991.344562] [] vfs_write+0xfd/0x158
    > [ 2991.344562] [] SyS_write+0x5c/0x83
    > [ 2991.344562] [] tracesys+0xdd/0xe2
    >
    > As well as held locks:
    >
    > [ 3034.728033] Showing all locks held in the system:
    > [ 3034.728033] 1 lock held by rcu_preempt/10:
    > [ 3034.728033] #0: (rcu_preempt_state.onoff_mutex){+.+...}, at: [] rcu_gp_kthread+0x167/0x94b
    > [ 3034.728033] 4 locks held by kworker/0:1/47:
    > [ 3034.728033] #0: (events){.+.+.+}, at: [] process_one_work+0x207/0x4d1
    > [ 3034.728033] #1: (cpuset_hotplug_work){+.+.+.}, at: [] process_one_work+0x207/0x4d1
    > [ 3034.728033] #2: (cpuset_mutex){+.+.+.}, at: [] rebuild_sched_domains+0x17/0x2a
    > [ 3034.728033] #3: (cpu_hotplug.lock){+.+.+.}, at: [] get_online_cpus+0x3c/0x50
    > [ 3034.728033] 1 lock held by mingetty/2563:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    > [ 3034.728033] 1 lock held by mingetty/2565:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    > [ 3034.728033] 1 lock held by mingetty/2569:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    > [ 3034.728033] 1 lock held by mingetty/2572:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    > [ 3034.728033] 1 lock held by mingetty/2575:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    > [ 3034.728033] 7 locks held by bash/2618:
    > [ 3034.728033] #0: (sb_writers#5){.+.+.+}, at: [] file_start_write+0x2a/0x2c
    > [ 3034.728033] #1: (&buffer->mutex#2){+.+.+.}, at: [] sysfs_write_file+0x3c/0x144
    > [ 3034.728033] #2: (s_active#54){.+.+.+}, at: [] sysfs_write_file+0xe7/0x144
    > [ 3034.728033] #3: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [] cpu_hotplug_driver_lock+0x17/0x19
    > [ 3034.728033] #4: (cpu_add_remove_lock){+.+.+.}, at: [] cpu_maps_update_begin+0x17/0x19
    > [ 3034.728033] #5: (cpu_hotplug.lock){+.+.+.}, at: [] cpu_hotplug_begin+0x2c/0x6d
    > [ 3034.728033] #6: (rcu_preempt_state.onoff_mutex){+.+...}, at: [] rcu_cpu_notify+0x2f5/0x86e
    > [ 3034.728033] 1 lock held by bash/2980:
    > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x252/0x7e8
    >
    > Things looked a little weird. Also, this is a deadlock that lockdep did
    > not catch. But what we have here does not look like a circular lock
    > issue:
    >
    > Bash is blocked in rcu_cpu_notify():
    >
    > 1961 /* Exclude any attempts to start a new grace period. */
    > 1962 mutex_lock(&rsp->onoff_mutex);
    >
    >
    > kworker is blocked in get_online_cpus(), which makes sense as we are
    > currently taking down a CPU.
    >
    > But rcu_preempt is not blocked on anything. It is simply sleeping in
    > rcu_gp_kthread (really rcu_gp_init) here:
    >
    > 1453 #ifdef CONFIG_PROVE_RCU_DELAY
    > 1454 if ((prandom_u32() % (rcu_num_nodes * 8)) == 0 &&
    > 1455 system_state == SYSTEM_RUNNING)
    > 1456 schedule_timeout_uninterruptible(2);
    > 1457 #endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
    >
    > And it does this while holding the onoff_mutex that bash is waiting for.
    >
    > Doing a function trace, it showed me where it happened:
    >
    > [ 125.940066] rcu_pree-10 3.... 28384115273: schedule_timeout_uninterruptible [...]
    > [ 125.940066] rcu_pree-10 3d..3 28384202439: sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=120 prev_state=D ==> next_comm=watchdog/3 next_pid=38 next_prio=120
    >
    > The watchdog ran, and then:
    >
    > [ 125.940066] watchdog-38 3d..3 28384692863: sched_switch: prev_comm=watchdog/3 prev_pid=38 prev_prio=120 prev_state=P ==> next_comm=modprobe next_pid=2848 next_prio=118
    >
    > Not sure what modprobe was doing, but shortly after that:
    >
    > [ 125.940066] modprobe-2848 3d..3 28385041749: sched_switch: prev_comm=modprobe prev_pid=2848 prev_prio=118 prev_state=R+ ==> next_comm=migration/3 next_pid=40 next_prio=0
    >
    > Where the migration thread took down the CPU:
    >
    > [ 125.940066] migratio-40 3d..3 28389148276: sched_switch: prev_comm=migration/3 prev_pid=40 prev_prio=0 prev_state=P ==> next_comm=swapper/3 next_pid=0 next_prio=120
    >
    > which finally did:
    >
    > [ 125.940066] -0 3...1 28389282142: arch_cpu_idle_dead
    > [ 125.940066] -0 3...1 28389282548: native_play_dead
    > [ 125.940066] -0 3...1 28389282924: play_dead_common
    > [ 125.940066] -0 3...1 28389283468: idle_task_exit
    > [ 125.940066] -0 3...1 28389284644: amd_e400_remove_cpu
    >
    > CPU 3 is now offline, but the rcu_preempt thread that ran on CPU 3 is
    > still doing a schedule_timeout_uninterruptible() and it registered its
    > timeout on the timer base for CPU 3. You would think that it would get
    > migrated, right? The issue here is that the timer migration happens at
    > the CPU notifier for CPU_DEAD. The problem is that the rcu notifier for
    > CPU_DOWN is blocked waiting for the onoff_mutex to be released, which is
    > held by the thread that just put itself into an uninterruptible sleep
    > that won't wake up until the CPU_DEAD notifier of the timer
    > infrastructure is called, which won't happen until the rcu notifier
    > finishes. Here's our deadlock!

    This commit breaks this deadlock cycle by substituting a shorter udelay()
    for the previous schedule_timeout_uninterruptible(), while at the same
    time increasing the probability of the delay. This maintains the intensity
    of the testing.
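
    A sketch of the resulting delay injection (the delay length and the
    probability divisor shown here are assumptions, not the committed
    values); udelay() busy-waits, so no timer gets queued on a CPU that is
    about to go offline:

        #ifdef CONFIG_PROVE_RCU_DELAY
                if ((prandom_u32() % (rcu_num_nodes * 4)) == 0 &&
                    system_state == SYSTEM_RUNNING)
                        udelay(200);
        #endif /* #ifdef CONFIG_PROVE_RCU_DELAY */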

    Reported-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Tested-by: Steven Rostedt

    Paul E. McKenney
     
  • This commit fixes a lockdep-detected deadlock by moving a wake_up()
    call out from a rnp->lock critical section. Please see below for
    the long version of this story.

    On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:

    > [12572.705832] ======================================================
    > [12572.750317] [ INFO: possible circular locking dependency detected ]
    > [12572.796978] 3.10.0-rc3+ #39 Not tainted
    > [12572.833381] -------------------------------------------------------
    > [12572.862233] trinity-child17/31341 is trying to acquire lock:
    > [12572.870390] (rcu_node_0){..-.-.}, at: [] rcu_read_unlock_special+0x9f/0x4c0
    > [12572.878859]
    > but task is already holding lock:
    > [12572.894894] (&ctx->lock){-.-...}, at: [] perf_lock_task_context+0x7d/0x2d0
    > [12572.903381]
    > which lock already depends on the new lock.
    >
    > [12572.927541]
    > the existing dependency chain (in reverse order) is:
    > [12572.943736]
    > -> #4 (&ctx->lock){-.-...}:
    > [12572.960032] [] lock_acquire+0x91/0x1f0
    > [12572.968337] [] _raw_spin_lock+0x40/0x80
    > [12572.976633] [] __perf_event_task_sched_out+0x2e7/0x5e0
    > [12572.984969] [] perf_event_task_sched_out+0x93/0xa0
    > [12572.993326] [] __schedule+0x2cf/0x9c0
    > [12573.001652] [] schedule_user+0x2e/0x70
    > [12573.009998] [] retint_careful+0x12/0x2e
    > [12573.018321]
    > -> #3 (&rq->lock){-.-.-.}:
    > [12573.034628] [] lock_acquire+0x91/0x1f0
    > [12573.042930] [] _raw_spin_lock+0x40/0x80
    > [12573.051248] [] wake_up_new_task+0xb7/0x260
    > [12573.059579] [] do_fork+0x105/0x470
    > [12573.067880] [] kernel_thread+0x26/0x30
    > [12573.076202] [] rest_init+0x23/0x140
    > [12573.084508] [] start_kernel+0x3f1/0x3fe
    > [12573.092852] [] x86_64_start_reservations+0x2a/0x2c
    > [12573.101233] [] x86_64_start_kernel+0xcc/0xcf
    > [12573.109528]
    > -> #2 (&p->pi_lock){-.-.-.}:
    > [12573.125675] [] lock_acquire+0x91/0x1f0
    > [12573.133829] [] _raw_spin_lock_irqsave+0x4b/0x90
    > [12573.141964] [] try_to_wake_up+0x31/0x320
    > [12573.150065] [] default_wake_function+0x12/0x20
    > [12573.158151] [] autoremove_wake_function+0x18/0x40
    > [12573.166195] [] __wake_up_common+0x58/0x90
    > [12573.174215] [] __wake_up+0x39/0x50
    > [12573.182146] [] rcu_start_gp_advanced.isra.11+0x4a/0x50
    > [12573.190119] [] rcu_start_future_gp+0x1c9/0x1f0
    > [12573.198023] [] rcu_nocb_kthread+0x114/0x930
    > [12573.205860] [] kthread+0xed/0x100
    > [12573.213656] [] ret_from_fork+0x7c/0xb0
    > [12573.221379]
    > -> #1 (&rsp->gp_wq){..-.-.}:
    > [12573.236329] [] lock_acquire+0x91/0x1f0
    > [12573.243783] [] _raw_spin_lock_irqsave+0x4b/0x90
    > [12573.251178] [] __wake_up+0x23/0x50
    > [12573.258505] [] rcu_start_gp_advanced.isra.11+0x4a/0x50
    > [12573.265891] [] rcu_start_future_gp+0x1c9/0x1f0
    > [12573.273248] [] rcu_nocb_kthread+0x114/0x930
    > [12573.280564] [] kthread+0xed/0x100
    > [12573.287807] [] ret_from_fork+0x7c/0xb0

    Notice the above call chain.

    rcu_start_future_gp() is called with the rnp->lock held. Then it calls
    rcu_start_gp_advanced(), which does a wakeup.

    You can't do wakeups while holding the rnp->lock, as that would mean
    that you could not do a rcu_read_unlock() while holding the rq lock, or
    any lock that was taken while holding the rq lock. This is because...
    (See below).

    > [12573.295067]
    > -> #0 (rcu_node_0){..-.-.}:
    > [12573.309293] [] __lock_acquire+0x1786/0x1af0
    > [12573.316568] [] lock_acquire+0x91/0x1f0
    > [12573.323825] [] _raw_spin_lock+0x40/0x80
    > [12573.331081] [] rcu_read_unlock_special+0x9f/0x4c0
    > [12573.338377] [] __rcu_read_unlock+0x96/0xa0
    > [12573.345648] [] perf_lock_task_context+0x143/0x2d0
    > [12573.352942] [] find_get_context+0x4e/0x1f0
    > [12573.360211] [] SYSC_perf_event_open+0x514/0xbd0
    > [12573.367514] [] SyS_perf_event_open+0x9/0x10
    > [12573.374816] [] tracesys+0xdd/0xe2

    Notice the above trace.

    perf took its own ctx->lock, which can be taken while holding the rq
    lock. While holding this lock, it did a rcu_read_unlock(). The
    perf_lock_task_context() basically looks like:

    rcu_read_lock();
    raw_spin_lock(ctx->lock);
    rcu_read_unlock();

    Now, what looks to have happened, is that we scheduled after taking that
    first rcu_read_lock() but before taking the spin lock. When we scheduled
    back in and took the ctx->lock, the following rcu_read_unlock()
    triggered the "special" code.

    The rcu_read_unlock_special() takes the rnp->lock, which gives us a
    possible deadlock scenario.

    CPU0                      CPU1                            CPU2
    ----                      ----                            ----
                                                              rcu_nocb_kthread()
    lock(rq->lock);
                              lock(ctx->lock);
                                                              lock(rnp->lock);

                                                              wake_up();

                                                              lock(rq->lock);

                              rcu_read_unlock();

                              rcu_read_unlock_special();

                              lock(rnp->lock);
    lock(ctx->lock);

    **** DEADLOCK ****

    > [12573.382068]
    > other info that might help us debug this:
    >
    > [12573.403229] Chain exists of:
    > rcu_node_0 --> &rq->lock --> &ctx->lock
    >
    > [12573.424471] Possible unsafe locking scenario:
    >
    > [12573.438499] CPU0 CPU1
    > [12573.445599] ---- ----
    > [12573.452691] lock(&ctx->lock);
    > [12573.459799] lock(&rq->lock);
    > [12573.467010] lock(&ctx->lock);
    > [12573.474192] lock(rcu_node_0);
    > [12573.481262]
    > *** DEADLOCK ***
    >
    > [12573.501931] 1 lock held by trinity-child17/31341:
    > [12573.508990] #0: (&ctx->lock){-.-...}, at: [] perf_lock_task_context+0x7d/0x2d0
    > [12573.516475]
    > stack backtrace:
    > [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
    > [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
    > [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
    > [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
    > [12573.567856] Call Trace:
    > [12573.575011] [] dump_stack+0x19/0x1b
    > [12573.582284] [] print_circular_bug+0x200/0x20f
    > [12573.589637] [] __lock_acquire+0x1786/0x1af0
    > [12573.596982] [] ? sched_clock_cpu+0xb5/0x100
    > [12573.604344] [] lock_acquire+0x91/0x1f0
    > [12573.611652] [] ? rcu_read_unlock_special+0x9f/0x4c0
    > [12573.619030] [] _raw_spin_lock+0x40/0x80
    > [12573.626331] [] ? rcu_read_unlock_special+0x9f/0x4c0
    > [12573.633671] [] rcu_read_unlock_special+0x9f/0x4c0
    > [12573.640992] [] ? perf_lock_task_context+0x7d/0x2d0
    > [12573.648330] [] ? put_lock_stats.isra.29+0xe/0x40
    > [12573.655662] [] ? delay_tsc+0x90/0xe0
    > [12573.662964] [] __rcu_read_unlock+0x96/0xa0
    > [12573.670276] [] perf_lock_task_context+0x143/0x2d0
    > [12573.677622] [] ? __perf_event_enable+0x370/0x370
    > [12573.684981] [] find_get_context+0x4e/0x1f0
    > [12573.692358] [] SYSC_perf_event_open+0x514/0xbd0
    > [12573.699753] [] ? get_parent_ip+0xd/0x50
    > [12573.707135] [] ? trace_hardirqs_on_caller+0xfd/0x1c0
    > [12573.714599] [] SyS_perf_event_open+0x9/0x10
    > [12573.721996] [] tracesys+0xdd/0xe2

    This commit delays the wakeup via irq_work(), which is what
    perf and ftrace use to perform wakeups in critical sections.
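
    A sketch of the deferred-wakeup pattern (the irq_work member and
    handler names are illustrative, assuming an irq_work field added to
    rcu_state); the lock holder only queues the work, and the handler does
    the wake_up() later from a clean context:

        /* Runs from the irq_work path, with no rcu_node ->lock held. */
        static void rsp_wakeup(struct irq_work *work)
        {
                struct rcu_state *rsp = container_of(work, struct rcu_state,
                                                     wakeup_work);

                wake_up(&rsp->gp_wq);
        }

        /* ...and at the former wake_up() call site, with rnp->lock held:
         *         irq_work_queue(&rsp->wakeup_work);
         */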

    Reported-by: Dave Jones
    Signed-off-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney

    Steven Rostedt
     

09 Jun, 2013

5 commits

  • Pull timer fixes from Thomas Gleixner:

    - Trivial: unused variable removal

    - Posix-timers: Add the clock ID to the new proc interface to make it
    useful. The interface is new and should be functional when we reach
    the final 3.10 release.

    - Cure a false positive warning in the tick code introduced by the
    overhaul in 3.10

    - Fix for a persistent clock detection regression introduced in this
    cycle

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    timekeeping: Correct run-time detection of persistent_clock.
    ntp: Remove unused variable flags in __hardpps
    posix-timers: Show clock ID in proc file
    tick: Cure broadcast false positive pending bit warning

    Linus Torvalds
     
  • Pull irqdomain bug fixes from Grant Likely:
    "This branch contains a set of straight forward bug fixes to the
    irqdomain code and to a couple of drivers that make use of it."

    * tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux:
    irqchip: Return -EPERM for reserved IRQs
    irqdomain: document the simple domain first_irq
    kernel/irq/irqdomain.c: before use 'irq_data', need check it whether valid.
    irqdomain: export irq_domain_add_simple

    Linus Torvalds
     
  • The first_irq needs to be zero to get a linear domain and that
    comes with special semantics. We want to simplify this going
    forward but some documentation never hurts.
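
    For context, first_irq is the third argument of irq_domain_add_simple();
    a hedged usage sketch (the device-tree node, domain size and ops are
    placeholders):

        /* first_irq == 0 requests a linear domain, so Linux interrupt
         * numbers are allocated on demand rather than preassigned. */
        domain = irq_domain_add_simple(np, 32, 0, &my_irqdomain_ops, NULL);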

    Signed-off-by: Linus Walleij
    Signed-off-by: Grant Likely

    Linus Walleij
     
    irq_data may be NULL; if it is, we WARN_ON() and continue. In that case
    'hwirq', which is derived from 'irq_data', has to be initialized later,
    or it will cause problems.

    Signed-off-by: Chen Gang
    Signed-off-by: Grant Likely

    Chen Gang
     
  • All other irq_domain_add_* functions are exported already, and apparently
    this one got left out by mistake, which causes build errors for ARM
    allmodconfig kernels:

    ERROR: "irq_domain_add_simple" [drivers/gpio/gpio-rcar.ko] undefined!
    ERROR: "irq_domain_add_simple" [drivers/gpio/gpio-em.ko] undefined!

    Signed-off-by: Arnd Bergmann
    Acked-by: Simon Horman
    Signed-off-by: Grant Likely

    Arnd Bergmann
     

08 Jun, 2013

1 commit

  • …l/git/rostedt/linux-trace

    Pull tracing fixes from Steven Rostedt:
    "This contains 4 fixes.

    The first two fix the case where full RCU debugging is enabled,
    enabling function tracing causes a live lock of the system. This is
    due to the added debug checks in rcu_dereference_raw() that is used by
    the function tracer. These checks are themselves traced by the function
    tracer and add enough overhead that the time needed to finish an
    interrupt can exceed the time until the next interrupt is triggered,
    causing a live lock driven by the timer interrupt.

    Talking this over with Paul McKenney, we came up with a fix that adds
    a new rcu_dereference_raw_notrace() that does not perform these added
    checks, and let the function tracer use that.

    The third commit fixes a failed compile when branch tracing is
    enabled, due to the conversion of the trace_test_buffer() selftest
    that the branch trace wasn't converted for.

    The fourth patch fixes a bug caught by the RCU lockdep code where an
    rcu_read_lock() is performed while RCU is disabled (either going to or
    from idle, or in user space). This happened in the irqsoff tracer as it
    calls task_uid(). The fix here was to use current_uid() when possible,
    which does not use RCU locking and which, luckily, is always applicable
    when irqsoff calls this code."

    * tag 'trace-fixes-v3.10-rc3-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Use current_uid() for critical time tracing
    tracing: Fix bad parameter passed in branch selftest
    ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends
    rcu: Add _notrace variation of rcu_dereference_raw() and hlist_for_each_entry_rcu()

    Linus Torvalds
     

07 Jun, 2013

1 commit

  • The irqsoff tracer records the max time that interrupts are disabled.
    There are hooks in the assembly code that call back into the tracer when
    interrupts are disabled or enabled.

    When they are enabled, the tracer checks if the amount of time they
    were disabled is larger than the previous recorded max interrupts off
    time. If it is, it creates a snapshot of the currently running trace
    to store where the last largest interrupts off time was held and how
    it happened.

    During testing, this RCU lockdep dump appeared:

    [ 1257.829021] ===============================
    [ 1257.829021] [ INFO: suspicious RCU usage. ]
    [ 1257.829021] 3.10.0-rc1-test+ #171 Tainted: G W
    [ 1257.829021] -------------------------------
    [ 1257.829021] /home/rostedt/work/git/linux-trace.git/include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
    [ 1257.829021]
    [ 1257.829021] other info that might help us debug this:
    [ 1257.829021]
    [ 1257.829021]
    [ 1257.829021] RCU used illegally from idle CPU!
    [ 1257.829021] rcu_scheduler_active = 1, debug_locks = 0
    [ 1257.829021] RCU used illegally from extended quiescent state!
    [ 1257.829021] 2 locks held by trace-cmd/4831:
    [ 1257.829021] #0: (max_trace_lock){......}, at: [] stop_critical_timing+0x1a3/0x209
    [ 1257.829021] #1: (rcu_read_lock){.+.+..}, at: [] __update_max_tr+0x88/0x1ee
    [ 1257.829021]
    [ 1257.829021] stack backtrace:
    [ 1257.829021] CPU: 3 PID: 4831 Comm: trace-cmd Tainted: G W 3.10.0-rc1-test+ #171
    [ 1257.829021] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
    [ 1257.829021] 0000000000000001 ffff880065f49da8 ffffffff8153dd2b ffff880065f49dd8
    [ 1257.829021] ffffffff81092a00 ffff88006bd78680 ffff88007add7500 0000000000000003
    [ 1257.829021] ffff88006bd78680 ffff880065f49e18 ffffffff810daebf ffffffff810dae5a
    [ 1257.829021] Call Trace:
    [ 1257.829021] [] dump_stack+0x19/0x1b
    [ 1257.829021] [] lockdep_rcu_suspicious+0x109/0x112
    [ 1257.829021] [] __update_max_tr+0xed/0x1ee
    [ 1257.829021] [] ? __update_max_tr+0x88/0x1ee
    [ 1257.829021] [] ? user_enter+0xfd/0x107
    [ 1257.829021] [] update_max_tr_single+0x11d/0x12d
    [ 1257.829021] [] ? user_enter+0xfd/0x107
    [ 1257.829021] [] stop_critical_timing+0x141/0x209
    [ 1257.829021] [] ? trace_hardirqs_on+0xd/0xf
    [ 1257.829021] [] ? user_enter+0xfd/0x107
    [ 1257.829021] [] time_hardirqs_on+0x2a/0x2f
    [ 1257.829021] [] ? user_enter+0xfd/0x107
    [ 1257.829021] [] trace_hardirqs_on_caller+0x16/0x197
    [ 1257.829021] [] trace_hardirqs_on+0xd/0xf
    [ 1257.829021] [] user_enter+0xfd/0x107
    [ 1257.829021] [] do_notify_resume+0x92/0x97
    [ 1257.829021] [] int_signal+0x12/0x17

    What happened was that on entering user code, interrupts were enabled
    and a new maximum interrupts-off time was recorded. The trace buffer was
    saved along with various information about the task: comm, pid, uid,
    priority, etc.

    The uid is recorded with task_uid(tsk). But that is a macro that uses
    rcu_read_lock() to retrieve the data, and here it happened to run where
    RCU is blind (user_enter).

    As only the preempt and irqs-off tracers can trigger this, and for both
    of them tsk == current, use current_uid() instead of task_uid() when
    tsk == current; current_uid() does not use RCU, since only current can
    change its own uid.

    This fixes the RCU suspicious splat.
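
    A sketch of the resulting logic (the destination field name is
    illustrative):

        /* current_uid() reads current's credentials without rcu_read_lock(),
         * which is safe because only current can change its own uid. */
        if (tsk == current)
                max_data->uid = current_uid();
        else
                max_data->uid = task_uid(tsk);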

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

03 Jun, 2013

1 commit

  • Pull cgroup fixes from Tejun Heo:

    - Fix for yet another xattr bug which may lead to NULL deref.

    - A subtle bug in for_each_descendant_pre(). This bug requires quite
    specific conditions to trigger and isn't too likely to actually
    happen in the wild, but maybe that just makes it that much more
    nastier.

    - A warning message added for silly cgroup re-mount (not -o remount,
    but unmount followed by mount) behavior.

    * 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: warn about mismatching options of a new mount of an existing hierarchy
    cgroup: fix a subtle bug in descendant pre-order walk
    cgroup: initialize xattr before calling d_instantiate()

    Linus Torvalds
     

31 May, 2013

1 commit

  • Pull x86 fixes from Peter Anvin:

    - Three EFI-related fixes

    - Two early memory initialization fixes

    - build fix for older binutils

    - fix for an eager FPU performance regression -- currently we don't
    allow the use of the FPU at interrupt time *at all* in eager mode,
    which is clearly wrong.

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86: Allow FPU to be used at interrupt time even with eagerfpu
    x86, crc32-pclmul: Fix build with older binutils
    x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
    x86, range: fix missing merge during add range
    x86, efi: initial the local variable of DataSize to zero
    efivar: fix oops in efivar_update_sysfs_entries() caused by memory reuse
    efivarfs: Never return ENOENT from firmware again

    Linus Torvalds
     

30 May, 2013

1 commit

  • The branch selftest calls trace_test_buffer(), but with the new code
    it expects the first parameter to be a pointer to a struct trace_buffer.
    All self tests were changed but the branch selftest was missed.

    This caused either a crash or failed test when the branch selftest was
    enabled.

    Link: http://lkml.kernel.org/r/20130529141333.GA24064@localhost

    Reported-by: Fengguang Wu
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

29 May, 2013

6 commits

  • Thomas Gleixner
     
    Under RCU debug config options, rcu_dereference_raw() can add quite a
    few checks, and since tracing uses rcu_dereference_raw(), these checks
    run in the function tracer. The function tracer also happens to trace
    these debug checks themselves. This added overhead can livelock the system.

    Have the function tracer use the new RCU _notrace equivalents that do
    not do the debug checks for RCU.
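
    A sketch of the change in an ftrace hash-lookup path (the surrounding
    variables are assumptions); the only difference is the _notrace suffix,
    which omits the RCU debug checks that would otherwise be traced
    recursively:

        /* Was: hlist_for_each_entry_rcu(entry, hhd, hlist) */
        hlist_for_each_entry_rcu_notrace(entry, hhd, hlist) {
                if (entry->ip == ip)
                        return entry;
        }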

    Link: http://lkml.kernel.org/r/20130528184209.467603904@goodmis.org

    Acked-by: Paul E. McKenney
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
    With the newly introduced __DEVEL__sane_behavior mount option,
    if the root cgroup is already mounted without xattr support,
    mounting a new cgroup hierarchy with xattr is rejected by design,
    which is just fine. However, if the root cgroup was not mounted
    with __DEVEL__sane_behavior, creating a new cgroup mount with the
    xattr option will succeed, although the EA functionality does not
    work as expected afterwards: setting attributes under either
    cgroup returns ENOTSUPP. e.g.

    setfattr: /cgroup2/test: Operation not supported

    Instead of staying silent in this case, it's better to emit a log
    entry at warning level. That helps the user understand what is
    going on behind the scenes, and it is essentially an improvement
    that does not break backward compatibility.

    With this fix, the mount attempt above still works as usual, but
    the following line can be found in the system log:

    [ ...] cgroup: new mount options do not match the existing superblock

    tj: minor formatting / message updates.

    Signed-off-by: Jie Liu
    Reported-by: Alexey Kodanev
    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org

    Jeff Liu
     
    Since commit 31ade30692dc9680bfc95700d794818fa3f754ac, timekeeping_init()
    checks for the presence of a persistent clock by attempting to read a
    non-zero time value. This is an issue on platforms where the persistent
    clock is implemented as a free-running counter (instead of an RTC) that
    starts from zero on each boot and runs during suspend. Examples are some
    ARM platforms (e.g. PandaBoard).

    An attempt to read such a clock during timekeeping_init() may return zero
    value and falsely declare persistent clock as missing. Additionally, in
    the above case suspend times may be accounted twice (once from
    timekeeping_resume() and once from rtc_resume()), resulting in a gradual
    drift of system time.

    This patch does a run-time correction of the issue by doing the same check
    during timekeeping_suspend().
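
    A sketch of that run-time correction (the flag name is an assumption
    based on the earlier commit); by suspend time a functional free-running
    persistent clock reads non-zero, so the earlier false negative can be
    corrected:

        /* In timekeeping_suspend(): */
        read_persistent_clock(&timekeeping_suspend_time);
        if (timekeeping_suspend_time.tv_sec || timekeeping_suspend_time.tv_nsec)
                persistent_clock_exist = true;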

    A better long-term solution would have to return error when trying to read
    non-existing clock and zero when trying to read an uninitialized clock, but
    that would require changing all persistent_clock implementations.

    This patch addresses the immediate breakage, for now.

    Cc: John Stultz
    Cc: Thomas Gleixner
    Cc: Feng Tang
    Cc: stable@vger.kernel.org
    Signed-off-by: Zoran Markovic
    [jstultz: Tweaked commit message and subject]
    Signed-off-by: John Stultz

    Zoran Markovic
     
  • kernel/time/ntp.c: In function ‘__hardpps’:
    kernel/time/ntp.c:877: warning: unused variable ‘flags’

    commit a076b2146fabb0894cae5e0189a8ba3f1502d737 ("ntp: Remove ntp_lock,
    using the timekeeping locks to protect ntp state") removed its users,
    but not the actual variable.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: John Stultz

    Geert Uytterhoeven
     
  • …it/rostedt/linux-trace

    Pull tracing fixes from Steven Rostedt:
    "Two more fixes:

    The first one was reported by Mauro Carvalho Chehab, where if a poll()
    is done against a trace buffer for a CPU that has never been online,
    it will crash the kernel, as buffers are only created when a CPU comes
    online, but the trace files are for all possible CPUs.

    This fix is to check if the buffer was allocated and if not return
    -EINVAL.

    That was the simple fix, the real fix is a bit more complex and not
    for a -rc release. We could have the files created when the CPUs come
    online. That would require some design changes.

    The second one was reported by Peter Zijlstra. If the kernel command
    line has ftrace=nop, it will lock up the system on boot up. This is
    because the new design for 3.10 has the nop tracer bootstrap the
    tracing subsystem. When ftrace=<trace> is defined, when a that tracer
    is registered, it starts the tracing, but uses the nop tracer to clear
    things out. What happened here was that ftrace=nop caused the
    registering of nop to start it and use nop before it was initialized.

    The only thing nop needs to have done to initialize it is to have the
    tracer point its current_tracer structure member to the nop tracer.
    Doing that before registering the nop tracer makes everything work."

    * tag 'trace-fixes-v3.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    ring-buffer: Do not poll non allocated cpu buffers
    tracing: Fix crash when ftrace=nop on the kernel command line

    Linus Torvalds
     

28 May, 2013

2 commits

    The tracing infrastructure sets up files for all possible CPUs, but
    because it uses the ring-buffer polling code, it is possible to call
    that code for a CPU whose buffer hasn't been allocated. This will cause
    a kernel oops when it accesses a ring-buffer cpu buffer that is part of
    the possible cpus but hasn't been allocated yet because the CPU has
    never been online.
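
    A sketch of the check this implies in the ring-buffer poll path (exact
    placement assumed):

        /* Refuse to poll a per-CPU buffer that was never allocated because
         * the CPU has never been online. */
        if (cpu != RING_BUFFER_ALL_CPUS &&
            !cpumask_test_cpu(cpu, buffer->cpumask))
                return -EINVAL;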

    Reported-by: Mauro Carvalho Chehab
    Tested-by: Mauro Carvalho Chehab
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • commit 26517f3e (tick: Avoid programming the local cpu timer if
    broadcast pending) added a warning if the cpu enters broadcast mode
    again while the pending bit is still set. Meelis reported that the
    warning triggers. There are two corner cases which have been not
    considered:

    1) cpuidle calls clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
    twice. That can result in the following scenario

    CPU0                          CPU1
                                  cpuidle_idle_call()
                                    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                                      set cpu in tick_broadcast_oneshot_mask

    broadcast interrupt
      event expired for cpu1
      set pending bit

                                  acpi_idle_enter_simple()
                                    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                                      WARN_ON(pending bit)

    Move the WARN_ON into the section where we enter broadcast mode so
    it won't provide false positives on the second call.

    2) safe_halt() enables interrupts, so a broadcast interrupt can be
    delivered before the broadcast mode is disabled. That sets the
    pending bit for the CPU which receives the broadcast
    interrupt. The interrupt is delivered right away from the
    broadcast handler, though, which leaves the pending bit stale.

    Clear the pending bit for the current cpu in the broadcast handler.
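
    A sketch of the two changes (mask and helper names as in the 3.10
    tick-broadcast code, to the best of my knowledge):

        /* 1) Warn only when actually entering broadcast mode, so a nested
         *    BROADCAST_ENTER call cannot trip the check. */
        if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
                WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask));
                /* program or shut down the local device as before */
        }

        /* 2) In the broadcast handler: the interrupt was just delivered to
         *    this CPU, so any pending bit for it is stale. */
        cpumask_clear_cpu(smp_processor_id(), tick_broadcast_pending_mask);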

    Reported-and-tested-by: Meelis Roos
    Cc: Len Brown
    Cc: Frederic Weisbecker
    Cc: Borislav Petkov
    Cc: Rafael J. Wysocki
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305271841130.4220@ionos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

25 May, 2013

2 commits

  • Fix kernel-doc warnings in kernel/auditfilter.c:

    Warning(kernel/auditfilter.c:1029): Excess function parameter 'loginuid' description in 'audit_receive_filter'
    Warning(kernel/auditfilter.c:1029): Excess function parameter 'sessionid' description in 'audit_receive_filter'
    Warning(kernel/auditfilter.c:1029): Excess function parameter 'sid' description in 'audit_receive_filter'

    Signed-off-by: Randy Dunlap
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • …it/rostedt/linux-trace

    Pull tracing fix from Steven Rostedt:
    "Masami Hiramatsu fixed another bug. This time returning a proper
    result in event_enable_func(). After checking the return status of
    try_module_get(), it returned the status of try_module_get().

    But try_module_get() returns 0 on failure, which is success for
    event_enable_func()"

    * tag 'trace-fixes-v3.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Return -EBUSY when event_enable_func() fails to get module

    Linus Torvalds
     

24 May, 2013

1 commit

  • When cgroup_next_descendant_pre() initiates a walk, it checks whether
    the subtree root doesn't have any children and if not returns NULL.
    Later code assumes that the subtree isn't empty. This is broken
    because the subtree may become empty in the meantime, which can lead
    to the traversal escaping the subtree by walking to the sibling of
    the subtree root.

    There's no reason to have the early exit path. Remove it along with
    the later assumption that the subtree isn't empty. This simplifies
    the code a bit and fixes the subtle bug.

    While at it, fix the comment of cgroup_for_each_descendant_pre() which
    was incorrectly referring to ->css_offline() instead of
    ->css_online().

    Signed-off-by: Tejun Heo
    Reviewed-by: Michal Hocko
    Cc: stable@vger.kernel.org

    Tejun Heo
     

23 May, 2013

1 commit

    If ftrace= is set on the kernel command line, then when the named tracer
    is registered, tracing_set_tracer() is called to start executing that
    tracer.

    The nop tracer is just a stub tracer that is used to have no tracer
    enabled. It is assigned at early bootup as it is the default tracer.

    But if ftrace=nop is on the kernel command line, the registering of the
    nop tracer will call tracing_set_tracer() which will try to execute
    the nop tracer. But it expects tr->current_trace to be assigned something
    as it usually is assigned to the nop tracer. As it hasn't been assigned
    to anything yet, it causes the system to crash.

    The simple fix is to move the tr->current_trace = nop before registering
    the nop tracer. The functionality is still the same as the nop tracer
    doesn't do anything anyway.
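
    A sketch of the fix in the tracer bootstrap path (assuming the 3.10
    names global_trace, nop_trace and register_tracer()):

        /* Point current_trace at nop *before* registering it, so that
         * ftrace=nop cannot run tracing_set_tracer() against an
         * unassigned current_trace. */
        global_trace.current_trace = &nop_trace;

        register_tracer(&nop_trace);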

    Reported-by: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

19 May, 2013

1 commit


18 May, 2013

1 commit

    Christian found that v3.9 does not work on an E350 with EFI enabled.

    [ 1.658832] Trying to unpack rootfs image as initramfs...
    [ 1.679935] BUG: unable to handle kernel paging request at ffff88006e3fd000
    [ 1.686940] IP: [] memset+0x1f/0xb0
    [ 1.692010] PGD 1f77067 PUD 1f7a067 PMD 61420067 PTE 0

    but early memtest reports that all memory can be accessed without problems.

    The early page table is set up in the following sequence:
    [ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
    [ 0.000000] init_memory_mapping: [mem 0x6e600000-0x6e7fffff]
    [ 0.000000] init_memory_mapping: [mem 0x6c000000-0x6e5fffff]
    [ 0.000000] init_memory_mapping: [mem 0x00100000-0x6bffffff]
    [ 0.000000] init_memory_mapping: [mem 0x6e800000-0x6ea07fff]
    but later efi_enter_virtual_mode() wrongly tries to set up the mapping again:
    [ 0.010644] pid_max: default: 32768 minimum: 301
    [ 0.015302] init_memory_mapping: [mem 0x640c5000-0x6e3fcfff]
    which means the pfn_range_is_mapped() check fails.

    It turns out that we have a bug in add_range_with_merge(): it does not
    merge ranges properly when a newly added range fills the hole between two
    existing ranges. In this case, [mem 0x00100000-0x6bffffff] is the hole
    between [mem 0x00000000-0x000fffff] and [mem 0x6c000000-0x6e7fffff].

    Fix add_range_with_merge() by having it call itself recursively.

    Reported-by: "Christian König"
    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/CAE9FiQVofGoSk7q5-0irjkBxemqK729cND4hov-1QCBJDhxpgQ@mail.gmail.com
    Cc: v3.9
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

17 May, 2013

3 commits

    As kmemleak now scans all module sections that are allocated, writable
    and non-executable, there's no need to scan individual sections that
    might reference data.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Catalin Marinas
    Acked-by: Rusty Russell

    Steven Rostedt
     
  • Instead of just picking data sections by name (names that start
    with .data, .bss or .ref.data), use the section flags and scan all
    sections that are allocated, writable and not executable. Which should
    cover all sections of a module that might reference data.
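
    A sketch of the selection criterion (kmemleak_scan_area() is the real
    API; the surrounding section loop is assumed):

        /* Scan any module section that is allocated, writable and not
         * executable, i.e. anything that could legitimately hold pointers. */
        if ((shdr->sh_flags & SHF_ALLOC) &&
            (shdr->sh_flags & SHF_WRITE) &&
            !(shdr->sh_flags & SHF_EXECINSTR))
                kmemleak_scan_area((void *)shdr->sh_addr, shdr->sh_size,
                                   GFP_KERNEL);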

    Signed-off-by: Steven Rostedt
    [catalin.marinas@arm.com: removed unused 'name' variable]
    [catalin.marinas@arm.com: collapsed 'if' blocks]
    Signed-off-by: Catalin Marinas
    Acked-by: Rusty Russell

    Steven Rostedt
     
  • Pull workqueue fixes from Tejun Heo:
    "Three more workqueue regression fixes.

    - Fix unbalanced unlock in trylock failure path of manage_workers().
    This shouldn't happen often in the wild but is possible.

    - While making schedule_work() and friends inline, they become
    unavailable to !GPL modules. Allow !GPL modules to access basic
    stuff - system_wq and queue_*work_on() - so that schedule_work()
    and friends can be used.

    - During boot, the unbound NUMA support code allocates a cpumask for
    each possible node using alloc_cpumask_var_node(), which ends up
    trying to allocate node-specific memory even for offline nodes
    triggering BUG in the memory alloc code. Use NUMA_NO_NODE for
    offline nodes."

    * 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: don't perform NUMA-aware allocations on offline nodes in wq_numa_init()
    workqueue: Make schedule_work() available again to non GPL modules
    workqueue: correct handling of the pool spin_lock

    Linus Torvalds