12 Dec, 2011

25 commits

  • Those two APIs (tick_nohz_idle_enter_norcu() and
    tick_nohz_idle_exit_norcu(), introduced further below) were provided
    to combine the calls to tick_nohz_idle_enter() and rcu_idle_enter()
    into a single irq-disabled section, so that no interrupt happening
    in between would needlessly process any RCU work.

    Now we are talking about an optimization whose benefits have yet to
    be measured. Let's start simple and completely decouple the RCU idle
    and dyntick-idle logic to simplify things.
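
    For illustration, a minimal sketch of the decoupled arch-side pattern
    this leads to (the loop shape and cpu_sleep() are hypothetical
    stand-ins; only the four API calls are named in this series):

        static void arch_idle_loop_iteration(void)
        {
                tick_nohz_idle_enter();  /* dyntick-idle bookkeeping only */
                rcu_idle_enter();        /* separately enter RCU extended QS */
                while (!need_resched())
                        cpu_sleep();     /* hypothetical arch sleep primitive */
                rcu_idle_exit();         /* RCU usable again */
                tick_nohz_idle_exit();
        }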

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Reviewed-by: Josh Triplett
    Signed-off-by: Paul E. McKenney

    Frederic Weisbecker
     
  • Running CPU-hotplug operations concurrently with rcutorture has
    historically been a good way to find bugs in both RCU and CPU hotplug.
    This commit therefore adds an rcutorture module parameter called
    "onoff_interval" that causes a randomly selected CPU-hotplug operation to
    be executed at the specified interval, in seconds. The default value of
    "onoff_interval" is zero, which disables rcutorture-instigated CPU-hotplug
    operations.
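
    A sketch of how such a knob is typically wired up as a module
    parameter (the name comes from the text above; the exact declaration
    in rcutorture.c is assumed):

        static int onoff_interval;  /* seconds between hotplug ops; 0 = disabled */
        module_param(onoff_interval, int, 0444);
        MODULE_PARM_DESC(onoff_interval, "Time between CPU hotplug ops (s), 0 = disable");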

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Change from direct comparison of ->pid with zero to is_idle_task().
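
    At the time, is_idle_task() reduced to exactly the open-coded test it
    replaces, so this is purely an abstraction cleanup; a sketch:

        /* include/linux/sched.h (shape at the time of this series) */
        static inline bool is_idle_task(const struct task_struct *p)
        {
                return p->pid == 0;
        }

        /* before: if (task->pid == 0) ...    after: if (is_idle_task(task)) ... */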

    Signed-off-by: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Change from direct comparison of ->pid with zero to is_idle_task().

    Signed-off-by: Paul E. McKenney
    Cc: Jason Wessel
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Change from direct comparison of ->pid with zero to is_idle_task().

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Currently, if rcutorture is built into the kernel, it must be manually
    started or started from an init script. This is inconvenient for
    automated KVM testing, where it is good to be able to fully control
    rcutorture execution from the kernel parameters. This patch therefore
    adds a module parameter named "rcutorture_runnable" that defaults
    to zero ("don't start automatically"), but which can be set to one
    to cause rcutorture to start up immediately during boot.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Although it is easy to run rcutorture tests under KVM, there is currently
    no nice way to run such a test for a fixed time period, collect all of
    the rcutorture data, and then shut the system down cleanly. This commit
    therefore adds an rcutorture module parameter named "shutdown_secs" that
    specified the run duration in seconds, after which rcutorture terminates
    the test and powers the system down. The default value for "shutdown_secs"
    is zero, which disables shutdown.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • RCU has traditionally relied on idle_cpu() to determine whether a given
    CPU is running in the context of an idle task, but commit 908a3283
    (Fix idle_cpu()) has invalidated this approach. After commit 908a3283,
    idle_cpu() will return true only if the current CPU is currently running
    the idle task and will be doing so for the foreseeable future. RCU instead
    needs to know whether or not the current CPU is currently running the
    idle task, regardless of what the near future might bring.

    This commit therefore switches from idle_cpu() to "current->pid != 0".

    Reported-by: Wu Fengguang
    Suggested-by: Carsten Emde
    Signed-off-by: Paul E. McKenney
    Acked-by: Steven Rostedt
    Tested-by: Wu Fengguang
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, RCU does not permit a CPU to enter dyntick-idle mode if that
    CPU has any RCU callbacks queued. This means that workloads for which
    each CPU wakes up and does some RCU updates every few ticks will never
    enter dyntick-idle mode. This can result in significant unnecessary power
    consumption, so this patch permits a given CPU to enter dyntick-idle
    mode if it has callbacks, but only if that same CPU has completed all
    current work for the RCU core. We use rcu_pending() to determine
    whether a given CPU has completed all such work.
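
    A sketch of the new entry criterion (rcu_pending() is named above;
    the wrapper and its name are hypothetical):

        /* May this CPU enter dyntick-idle right now? */
        static bool rcu_cpu_may_enter_dyntick_idle(int cpu)
        {
                /* Queued callbacks no longer disqualify the CPU;
                 * only outstanding RCU core work does. */
                return !rcu_pending(cpu);
        }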

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The current code just complains if the current task is not the idle task.
    This commit therefore adds printing of the identity of the idle task.
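
    A sketch of the expanded complaint (is_idle_task(), idle_task() and
    the task_struct fields are existing kernel symbols; the exact message
    format is an assumption):

        if (!is_idle_task(current)) {
                struct task_struct *idle = idle_task(smp_processor_id());

                WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
                          current->pid, current->comm, idle->pid, idle->comm);
        }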

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The trace_rcu_dyntick() trace event did not print both the old and
    the new value of the nesting level, and furthermore printed only
    the low-order 32 bits of it. This could result in some confusion
    when interpreting trace-event dumps, so this commit prints both
    the old and the new value, prints the full 64 bits, and also selects
    the process-entry/exit increment to print nicely in hexadecimal.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • On the irq exit path, tick_nohz_irq_exit() may raise a softirq,
    which leads into the wakeup path and select_task_rq_fair(), which
    in turn uses RCU to iterate the scheduler domains.

    This is an illegal use of RCU because we may be in RCU
    extended quiescent state if we interrupted an RCU-idle
    window in the idle loop:

    [ 132.978883] ===============================
    [ 132.978883] [ INFO: suspicious RCU usage. ]
    [ 132.978883] -------------------------------
    [ 132.978883] kernel/sched_fair.c:1707 suspicious rcu_dereference_check() usage!
    [ 132.978883]
    [ 132.978883] other info that might help us debug this:
    [ 132.978883]
    [ 132.978883]
    [ 132.978883] rcu_scheduler_active = 1, debug_locks = 0
    [ 132.978883] RCU used illegally from extended quiescent state!
    [ 132.978883] 2 locks held by swapper/0:
    [ 132.978883] #0: (&p->pi_lock){-.-.-.}, at: [] try_to_wake_up+0x39/0x2f0
    [ 132.978883] #1: (rcu_read_lock){.+.+..}, at: [] select_task_rq_fair+0x6a/0xec0
    [ 132.978883]
    [ 132.978883] stack backtrace:
    [ 132.978883] Pid: 0, comm: swapper Tainted: G W 3.0.0+ #178
    [ 132.978883] Call Trace:
    [ 132.978883] [] lockdep_rcu_suspicious+0xe6/0x100
    [ 132.978883] [] select_task_rq_fair+0x749/0xec0
    [ 132.978883] [] ? select_task_rq_fair+0x6a/0xec0
    [ 132.978883] [] ? do_raw_spin_lock+0x54/0x150
    [ 132.978883] [] ? trace_hardirqs_on+0xd/0x10
    [ 132.978883] [] try_to_wake_up+0xd3/0x2f0
    [ 132.978883] [] ? ktime_get+0x68/0xf0
    [ 132.978883] [] wake_up_process+0x15/0x20
    [ 132.978883] [] raise_softirq_irqoff+0x65/0x110
    [ 132.978883] [] __hrtimer_start_range_ns+0x415/0x5a0
    [ 132.978883] [] ? do_raw_spin_unlock+0x5e/0xb0
    [ 132.978883] [] hrtimer_start+0x18/0x20
    [ 132.978883] [] tick_nohz_stop_sched_tick+0x393/0x450
    [ 132.978883] [] irq_exit+0xd2/0x100
    [ 132.978883] [] do_IRQ+0x66/0xe0
    [ 132.978883] [] common_interrupt+0x13/0x13
    [ 132.978883] [] ? native_safe_halt+0xb/0x10
    [ 132.978883] [] ? trace_hardirqs_on+0xd/0x10
    [ 132.978883] [] default_idle+0xba/0x370
    [ 132.978883] [] amd_e400_idle+0x5e/0x130
    [ 132.978883] [] cpu_idle+0xb6/0x120
    [ 132.978883] [] rest_init+0xef/0x150
    [ 132.978883] [] ? rest_init+0x52/0x150
    [ 132.978883] [] start_kernel+0x3da/0x3e5
    [ 132.978883] [] x86_64_start_reservations+0x131/0x135
    [ 132.978883] [] x86_64_start_kernel+0x103/0x112

    Fix this by calling rcu_idle_enter() after tick_nohz_irq_exit().
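
    A sketch of the corrected ordering on the irq exit path (an assumed
    shape, not the literal diff):

        void irq_exit(void)
        {
                /* ... softirq processing, nohz checks ... */
                tick_nohz_irq_exit();   /* may raise a softirq and wake a
                                         * task, so may legitimately use RCU */
                rcu_idle_enter();       /* only now re-enter the extended QS
                                         * when going back to the idle loop */
        }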

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • It is assumed that RCU won't be used once we switch to tickless
    mode and until we restart the tick. However this is not always
    true, as on x86-64, where we dereference the idle notifiers after
    the tick is stopped.

    To prepare for fixing this, add two new APIs:
    tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().

    If no use of RCU is made in the idle loop between the
    tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, the arch can
    call the new *_norcu() versions instead, so that it doesn't need to
    call rcu_idle_enter() and rcu_idle_exit() itself.

    Otherwise the arch must call tick_nohz_idle_enter() and
    tick_nohz_idle_exit() and also explicitly call (both patterns are
    sketched below):

    - rcu_idle_enter() after its last use of RCU before the CPU is put
    to sleep.
    - rcu_idle_exit() before the first use of RCU after the CPU is woken
    up.
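
    Both arch-side patterns, sketched (cpu_sleep() is a hypothetical
    stand-in for the arch's low-power wait):

        /* Pattern 1: no RCU use inside the idle loop. */
        static void arch_idle_norcu(void)
        {
                tick_nohz_idle_enter_norcu();   /* also enters RCU extended QS */
                cpu_sleep();
                tick_nohz_idle_exit_norcu();    /* also exits RCU extended QS */
        }

        /* Pattern 2: the arch needs RCU between the tick calls. */
        static void arch_idle_with_rcu(void)
        {
                tick_nohz_idle_enter();
                /* ... last use of RCU ... */
                rcu_idle_enter();
                cpu_sleep();
                rcu_idle_exit();
                /* ... RCU may be used again ... */
                tick_nohz_idle_exit();
        }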

    Signed-off-by: Frederic Weisbecker
    Cc: Mike Frysinger
    Cc: Guan Xuetao
    Cc: David Miller
    Cc: Chris Metcalf
    Cc: Hans-Christian Egtvedt
    Cc: Ralf Baechle
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Russell King
    Cc: Paul Mackerras
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Signed-off-by: Paul E. McKenney

    Frederic Weisbecker
     
  • The tick_nohz_stop_sched_tick() function, which tries to delay
    the next timer tick as long as possible, can be called from two
    places:

    - From the idle loop, to start dyntick-idle mode
    - From interrupt exit, if we have interrupted dyntick-idle
    mode, so that we reprogram the next tick event in case the irq
    changed some internal state that requires this action.

    There are only a few minor differences between the two cases,
    handled by that function and driven by the per-CPU ts->inidle
    variable and the inidle parameter. Together these guarantee that
    we only update the dyntick mode on irq exit if we actually
    interrupted dyntick-idle mode, and that we enter the RCU extended
    quiescent state from idle-loop entry only.

    Split this function into:

    - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
    dynticks idle mode unconditionally if it can, and enters into RCU
    extended quiescent state.

    - tick_nohz_irq_exit() which only updates the dynticks idle mode
    when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).

    To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
    into tick_nohz_idle_exit().

    This simplifies the code and micro-optimizes the irq exit path (no
    need for local_irq_save() there). It also prepares for the split
    between the dynticks and RCU extended quiescent state logic, which
    we'll need in order to further fix illegal uses of RCU in extended
    quiescent states in the idle loop.
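
    A sketch of the resulting split (assumed shapes; tick_cpu_sched is
    the kernel's per-CPU struct tick_sched variable):

        void tick_nohz_idle_enter(void)
        {
                local_irq_disable();
                __get_cpu_var(tick_cpu_sched).inidle = 1;
                tick_nohz_stop_sched_tick();    /* enter dynticks if possible */
                rcu_idle_enter();               /* enter RCU extended QS */
                local_irq_enable();
        }

        void tick_nohz_irq_exit(void)
        {
                struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);

                if (!ts->inidle)
                        return;         /* we did not interrupt dyntick-idle */
                tick_nohz_stop_sched_tick();    /* just reprogram the next tick */
        }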

    Signed-off-by: Frederic Weisbecker
    Cc: Mike Frysinger
    Cc: Guan Xuetao
    Cc: David Miller
    Cc: Chris Metcalf
    Cc: Hans-Christian Egtvedt
    Cc: Ralf Baechle
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Russell King
    Cc: Paul Mackerras
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • Inform the user if an RCU usage error is detected by lockdep while in
    an extended quiescent state (in this case, the RCU-free window in idle).
    This is accomplished by adding a line to the RCU lockdep splat indicating
    whether or not the splat occurred in extended quiescent state.

    Uses of RCU from within extended quiescent state mode are totally ignored
    by RCU, hence the importance of this diagnostic.

    Signed-off-by: Frederic Weisbecker
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Lai Jiangshan
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • Report that none of the rcu read lock maps are held while in an RCU
    extended quiescent state (the section between rcu_idle_enter()
    and rcu_idle_exit()). This helps detect any use of rcu_dereference()
    and friends from within the section in idle where RCU is not allowed.

    This way we can guarantee an extended quiescent window where the
    CPU can be put in dyntick-idle mode or can simply avoid being part
    of any global grace-period completion while in the idle loop.

    Uses of RCU from such mode are totally ignored by RCU, hence the
    importance of these checks.
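
    A sketch close to the upstream shape of one such check
    (debug_lockdep_rcu_enabled(), lock_is_held() and rcu_lock_map are
    existing lockdep/RCU symbols):

        int rcu_read_lock_held(void)
        {
                if (!debug_lockdep_rcu_enabled())
                        return 1;
                if (rcu_is_cpu_idle())
                        return 0;       /* extended QS: report not held */
                return lock_is_held(&rcu_lock_map);
        }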

    Signed-off-by: Frederic Weisbecker
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Lai Jiangshan
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • Empty void functions do not need "return", so this commit removes it
    from rcu_report_exp_rnp().

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Paul E. McKenney

    Thomas Gleixner
     
  • When setting up an expedited grace period, if there were no readers, the
    task will awaken itself. This commit removes this useless self-awakening.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Paul E. McKenney

    Thomas Gleixner
     
  • Because rcu_is_cpu_idle() is to be used to check for extended quiescent
    states in RCU-preempt read-side critical sections, it cannot assume that
    preemption is disabled. And preemption must be disabled when accessing
    the dyntick-idle state, because otherwise the following sequence of events
    could occur:

    1. Task A on CPU 1 enters rcu_is_cpu_idle() and picks up the pointer
    to CPU 1's per-CPU variables.

    2. Task B preempts Task A and starts running on CPU 1.

    3. Task A migrates to CPU 2.

    4. Task B blocks, leaving CPU 1 idle.

    5. Task A continues execution on CPU 2, accessing CPU 1's dyntick-idle
    information using the pointer fetched in step 1 above, and finds
    that CPU 1 is idle.

    6. Task A therefore incorrectly concludes that it is executing in
    an extended quiescent state, possibly issuing a spurious splat.

    Therefore, this commit disables preemption within the rcu_is_cpu_idle()
    function.
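
    The resulting function, essentially as described above (a sketch
    close to the upstream shape):

        int rcu_is_cpu_idle(void)
        {
                int ret;

                preempt_disable();      /* pin us to the CPU whose state we read */
                ret = (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
                preempt_enable();
                return ret;
        }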

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Trace the rcutorture RCU accesses and dump the trace buffer when the
    first failure is detected.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Add an EXPORT_SYMBOL_GPL() so that rcutorture can dump the trace buffer
    upon detection of an RCU error.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Earlier versions of RCU used the scheduling-clock tick to detect idleness
    by checking for the idle task, but handled idleness differently for
    CONFIG_NO_HZ=y. But there are now a number of uses of RCU read-side
    critical sections in the idle task, for example, for tracing. A more
    fine-grained detection of idleness is therefore required.

    This commit presses the old dyntick-idle code into full-time service,
    so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
    always invoked at the beginning of an idle loop iteration. Similarly,
    rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
    at the end of an idle-loop iteration. This allows the idle task to
    use RCU everywhere except between consecutive rcu_idle_enter() and
    rcu_idle_exit() calls, in turn allowing architecture maintainers to
    specify exactly where in the idle loop that RCU may be used.

    Because some of the userspace upcall uses can result in what looks
    to RCU like half of an interrupt, it is not possible to expect that
    the irq_enter() and irq_exit() hooks will give exact counts. This
    patch therefore expands the ->dynticks_nesting counter to 64 bits
    and uses two separate bitfields to count process/idle transitions
    and interrupt entry/exit transitions. It is presumed that userspace
    upcalls do not happen in the idle loop or from usermode execution
    (though usermode might do a system call that results in an upcall).
    The counter is hard-reset on each process/idle transition, which
    keeps the interrupt entry/exit error from accumulating. Overflow
    is avoided by the 64-bitness of the ->dynticks_nesting counter.

    This commit also adds warnings if a non-idle task asks RCU to enter
    idle state (these checks will need some adjustment before applying
    Frederic's OS-jitter patches, http://lkml.org/lkml/2011/10/7/246).
    In addition, validation of ->dynticks and ->dynticks_nesting is added.
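
    A sketch of the hard-reset idea (the constant's exact name and value
    are assumptions):

        #define DYNTICK_TASK_NESTING    (LLONG_MAX / 2 - 1)

        void rcu_idle_exit(void)
        {
                struct rcu_dynticks *rdtp = &__get_cpu_var(rcu_dynticks);

                WARN_ON_ONCE(rdtp->dynticks_nesting != 0);      /* validation */
                /* Hard reset: accumulated irq-entry/exit error vanishes. */
                rdtp->dynticks_nesting = DYNTICK_TASK_NESTING;
                /* ... */
        }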

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • When synchronize_sched_expedited() takes its second and subsequent
    snapshots of sync_sched_expedited_started, it subtracts 1. This
    means that even if the concurrent caller of synchronize_sched_expedited()
    that incremented to that value sees our successful completion, it
    will not be able to take advantage of it. This restriction is
    pointless, given that our full expedited grace period happened after
    that other caller started, and thus should be able to serve as a
    proxy for that caller successfully executing try_stop_cpus().

    This commit therefore removes the subtraction of 1.
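
    The change itself, sketched (surrounding context assumed):

        /* before: */
        snap = atomic_read(&sync_sched_expedited_started) - 1;
        /* after: our grace period started after the concurrent caller
         * incremented the counter, so it can serve as that caller's proxy */
        snap = atomic_read(&sync_sched_expedited_started);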

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Because rcu_read_unlock_special() samples rcu_preempted_readers_exp(rnp)
    after dropping rnp->lock, the following sequence of events is possible:

    1. Task A exits its RCU read-side critical section, and removes
    itself from the ->blkd_tasks list, releases rnp->lock, and is
    then preempted. Task B remains on the ->blkd_tasks list, and
    blocks the current expedited grace period.

    2. Task B exits from its RCU read-side critical section and removes
    itself from the ->blkd_tasks list. Because it is the last task
    blocking the current expedited grace period, it ends that
    expedited grace period.

    3. Task A resumes, and samples rcu_preempted_readers_exp(rnp) which
    of course indicates that nothing is blocking the nonexistent
    expedited grace period. Task A is again preempted.

    4. Some other CPU starts an expedited grace period. There are several
    tasks blocking this expedited grace period queued on the
    same rcu_node structure that Task A was using in step 1 above.

    5. Task A examines its state and incorrectly concludes that it was
    the last task blocking the expedited grace period on the current
    rcu_node structure. It therefore reports completion up the
    rcu_node tree.

    6. The expedited grace period can then incorrectly complete before
    the tasks blocked on this same rcu_node structure exit their
    RCU read-side critical sections. Arbitrarily bad things happen.

    This commit therefore takes a snapshot of rcu_preempted_readers_exp(rnp)
    prior to dropping the lock, so that only the last task thinks that it is
    the last task, thus avoiding the failure scenario laid out above.
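
    A sketch of the fix in rcu_read_unlock_special() (an assumed shape
    based on the description above):

        raw_spin_lock_irqsave(&rnp->lock, flags);
        empty_exp = !rcu_preempted_readers_exp(rnp);
        /* ... remove the task from rnp->blkd_tasks ... */
        empty_exp_now = !rcu_preempted_readers_exp(rnp);  /* snapshot under lock */
        raw_spin_unlock_irqrestore(&rnp->lock, flags);

        /* Only the task whose removal emptied the list reports completion. */
        if (!empty_exp && empty_exp_now)
                rcu_report_exp_rnp(rsp, rnp);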

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The ->signaled field was named before complications in the form of
    dyntick-idle mode and offlined CPUs. These complications have required
    that force_quiescent_state() be implemented as a state machine, instead
    of simply unconditionally sending reschedule IPIs. Therefore, this
    commit renames ->signaled to ->fqs_state to catch up with the new
    force_quiescent_state() reality.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

10 Dec, 2011

1 commit


09 Dec, 2011

3 commits


07 Dec, 2011

3 commits

  • perf_event_sched_in() shouldn't try to schedule task events if there
    are none; otherwise the task's ctx->is_active will be set and will
    not be cleared during sched_out. This will prevent newly added
    events from being scheduled into the task context.

    Fixes a boo-boo in commit 1d5f003f5a9 ("perf: Do not set task_ctx
    pointer in cpuctx if there are no events in the context").
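
    A sketch of the guard (ctx->nr_events is a real perf_event_context
    field; the call-site shape is assumed):

        /* Only schedule the task context in if it actually has events;
         * otherwise ctx->is_active would be set with nothing to clear it. */
        if (ctx && ctx->nr_events)
                perf_event_sched_in(cpuctx, ctx, task);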

    Signed-off-by: Gleb Natapov
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111122140821.GF2557@redhat.com
    Signed-off-by: Ingo Molnar

    Gleb Natapov
     
  • * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    ftrace: Fix hash record accounting bug
    perf: Fix parsing of __print_flags() in TP_printk()
    jump_label: jump_label_inc may return before the code is patched
    ftrace: Remove force undef config value left for testing
    tracing: Restore system filter behavior
    tracing: fix event_subsystem ref counting

    Linus Torvalds
     
  • Since commit f59de89 ("lockdep: Clear whole lockdep_map on initialization"),
    lockdep_init_map() clears the whole struct. But this breaks
    lock_set_class()/lock_set_subclass(). A typical race condition
    looks like this:

         CPU A                               CPU B
    lock_set_subclass(lockA);
     lock_set_class(lockA);
      lockdep_init_map(lockA);
        /* lockA->name is cleared */
        memset(lockA);
                                        __lock_acquire(lockA);
                                          /* lockA->class_cache[] is cleared */
                                          register_lock_class(lockA);
                                            look_up_lock_class(lockA);
                                              WARN_ON_ONCE(class->name !=
                                                           lock->name);

        lock->name = name;

    So restore what we did before commit f59de89, but annotate ->lock
    with kmemcheck_mark_initialized() to suppress the kmemcheck warning
    that commit f59de89 was addressing.
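
    A sketch close to the described fix:

        void lockdep_init_map(struct lockdep_map *lock, const char *name,
                              struct lock_class_key *key, int subclass)
        {
                int i;

                /* No whole-struct memset anymore; just tell kmemcheck
                 * the map is initialized so it stays quiet. */
                kmemcheck_mark_initialized(lock, sizeof(*lock));

                for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
                        lock->class_cache[i] = NULL;
                /* ... name, key and subclass set up as before ... */
        }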

    Reported-by: Sergey Senozhatsky
    Reported-by: Borislav Petkov
    Suggested-by: Vegard Nossum
    Signed-off-by: Yong Zhang
    Cc: Tejun Heo
    Cc: David Rientjes
    Cc:
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111109080451.GB8124@zhy
    Signed-off-by: Ingo Molnar

    Yong Zhang
     

06 Dec, 2011

8 commits

  • The expiry function compares the timer against current time and does
    not expire the timer when the expiry time is >= now. That's wrong. If
    the timer is set for now, then it must expire.

    Make the condition expiry > now for breaking out of the loop.
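
    The corrected comparison, sketched (the timerqueue walk is an assumed
    shape; expires.tv64 follows the ktime_t layout of the era):

        while ((next = timerqueue_getnext(&base->timerqueue))) {
                ktime_t expired = next->expires;

                /* A timer set for exactly `now` must fire, so break
                 * only for strictly-future expiries (was: >=). */
                if (expired.tv64 > now.tv64)
                        break;
                /* ... expire the timer ... */
        }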

    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: stable@kernel.org

    Thomas Gleixner
     
  • * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf: Fix loss of notification with multi-event
    perf, x86: Force IBS LVT offset assignment for family 10h
    perf, x86: Disable PEBS on SandyBridge chips
    trace_events_filter: Use rcu_assign_pointer() when setting ftrace_event_call->filter
    perf session: Fix crash with invalid CPU list
    perf python: Fix undefined symbol problem
    perf/x86: Enable raw event access to Intel offcore events
    perf: Don't use -ENOSPC for out of PMU resources
    perf: Do not set task_ctx pointer in cpuctx if there are no events in the context
    perf/x86: Fix PEBS instruction unwind
    oprofile, x86: Fix crash when unloading module (nmi timer mode)
    oprofile: Fix crash when unloading module (hr timer mode)

    Linus Torvalds
     
  • * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    clockevents: Set noop handler in clockevents_exchange_device()
    tick-broadcast: Stop active broadcast device when replacing it
    clocksource: Fix bug with max_deferment margin calculation
    rtc: Fix some bugs that allowed accumulating time drift in suspend/resume
    rtc: Disable the alarm in the hardware

    Linus Torvalds
     
  • …ernel.org/pub/scm/linux/kernel/git/tip/tip

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    slab, lockdep: Fix silly bug

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq: Fix race condition when stopping the irq thread

    Linus Torvalds
     
  • * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched, x86: Avoid unnecessary overflow in sched_clock
    sched: Fix buglet in return_cfs_rq_runtime()
    sched: Avoid SMT siblings in select_idle_sibling() if possible
    sched: Set the command name of the idle tasks in SMP kernels
    sched, rt: Provide means of disabling cross-cpu bandwidth sharing
    sched: Document wait_for_completion_*() return values
    sched_fair: Fix a typo in the comment describing update_sd_lb_stats
    sched: Add a comment to effective_load() since it's a pain

    Linus Torvalds
     
  • If the set_ftrace_filter is cleared by writing just whitespace to
    it, then the filter hash refcounts will be decremented but not
    updated. This causes two bugs:

    1) No functions will be enabled for tracing when they all should be

    2) If the user clears the set_ftrace_filter twice, it will crash ftrace:

    ------------[ cut here ]------------
    WARNING: at /home/rostedt/work/git/linux-trace.git/kernel/trace/ftrace.c:1384 __ftrace_hash_rec_update.part.27+0x157/0x1a7()
    Modules linked in:
    Pid: 2330, comm: bash Not tainted 3.1.0-test+ #32
    Call Trace:
    [] warn_slowpath_common+0x83/0x9b
    [] warn_slowpath_null+0x1a/0x1c
    [] __ftrace_hash_rec_update.part.27+0x157/0x1a7
    [] ? ftrace_regex_release+0xa7/0x10f
    [] ? kfree+0xe5/0x115
    [] ftrace_hash_move+0x2e/0x151
    [] ftrace_regex_release+0xba/0x10f
    [] fput+0xfd/0x1c2
    [] filp_close+0x6d/0x78
    [] sys_dup3+0x197/0x1c1
    [] sys_dup2+0x4f/0x54
    [] system_call_fastpath+0x16/0x1b
    ---[ end trace 77a3a7ee73794a02 ]---

    Link: http://lkml.kernel.org/r/20111101141420.GA4918@debian

    Reported-by: Rabin Vincent
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • If cpu A calls jump_label_inc() just after atomic_add_return() is
    called by cpu B, atomic_inc_not_zero() will return a value greater
    than zero and jump_label_inc() will return to its caller before
    jump_label_update() finishes its job on cpu B.
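
    A sketch close to the upstream fix: the fast path succeeds only for
    an already-nonzero count, and the slow path publishes its increment
    only after jump_label_update() has completed:

        void jump_label_inc(struct jump_label_key *key)
        {
                if (atomic_inc_not_zero(&key->enabled))
                        return;         /* already enabled and patched */

                jump_label_lock();
                if (atomic_read(&key->enabled) == 0)
                        jump_label_update(key, JUMP_LABEL_ENABLE);
                atomic_inc(&key->enabled);      /* publish after patching */
                jump_label_unlock();
        }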

    Link: http://lkml.kernel.org/r/20111018175551.GH17571@redhat.com

    Cc: stable@vger.kernel.org
    Cc: Peter Zijlstra
    Acked-by: Jason Baron
    Signed-off-by: Gleb Natapov
    Signed-off-by: Steven Rostedt

    Gleb Natapov
     
  • A forced undef of a config value was used for testing and was
    accidentally left in during the final commit. This causes x86 to
    run slower than needed while running function tracing, and also
    causes the function graph selftest to fail when DYNAMIC_FTRACE
    is not set. This is because the code in MCOUNT expects the ftrace
    code to be processed with the config value set that happened to
    be forced not set.

    The forced config option was left in by:
    commit 6331c28c962561aee59e5a493b7556a4bb585957
    ftrace: Fix dynamic selftest failure on some archs

    Link: http://lkml.kernel.org/r/20111102150255.GA6973@debian

    Cc: stable@vger.kernel.org
    Reported-by: Rabin Vincent
    Signed-off-by: Steven Rostedt

    Steven Rostedt