10 Nov, 2009

3 commits

  • Impose a clear locking design on the rcu_process_gp_end()
    function's use of the ->completed counter. This is done by
    creating a ->completed field in the rcu_node structure, which
    can safely be accessed under the protection of that structure's
    lock. Performance and scalability are maintained by using a
    form of double-checked locking, so that rcu_process_gp_end()
    only acquires the leaf rcu_node structure's ->lock if a grace
    period has recently ended.

    This fix reduces rcutorture failure rate by at least two orders
    of magnitude under heavy stress with force_quiescent_state()
    being invoked artificially often. Without this fix,
    unsynchronized access to the ->completed field can cause
    rcu_process_gp_end() to advance callbacks whose grace period has
    not yet expired. (Bad idea!)

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: # .32.x
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Impose a clear locking design on non-NO_HZ handling of the
    ->completed counter. This increases the distance between the
    RCU and the CPU-hotplug mechanisms.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: # .32.x
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Merge reason: Pick up RCU fixlet to base further commits on.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

02 Nov, 2009

3 commits

  • Currently, rcu_irq_exit() is invoked only for CONFIG_NO_HZ,
    while rcu_irq_enter() is invoked unconditionally. This patch
    moves rcu_irq_exit() out from under CONFIG_NO_HZ so that the
    calls are balanced.

    This patch has no effect on the behavior of the kernel because
    both rcu_irq_enter() and rcu_irq_exit() are empty for
    !CONFIG_NO_HZ, but the code is easier to understand if the calls
    are obviously balanced in all cases.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     
  • Very long RCU read-side critical sections (50 milliseconds or
    so) can cause a race between force_quiescent_state() and
    rcu_start_gp() as follows on kernel builds with multi-level
    rcu_node hierarchies:

    1. CPU 0 calls force_quiescent_state(), sees that there is a
    grace period in progress, and acquires ->fsqlock.

    2. CPU 1 detects the end of the grace period, and so
    cpu_quiet_msk_finish() sets rsp->completed to rsp->gpnum.
    This operation is carried out under the root rnp->lock,
    but CPU 0 has not yet acquired that lock. Note that
    rsp->signaled is still RCU_SAVE_DYNTICK from the last
    grace period.

    3. CPU 1 calls rcu_start_gp(), but no one wants a new grace
    period, so it drops the root rnp->lock and returns.

    4. CPU 0 acquires the root rnp->lock and picks up rsp->completed
    and rsp->signaled, then drops rnp->lock. It then enters the
    RCU_SAVE_DYNTICK leg of the switch statement.

    5. CPU 2 invokes call_rcu(), and now needs a new grace period.
    It calls rcu_start_gp(), which acquires the root rnp->lock, sets
    rsp->signaled to RCU_GP_INIT (too bad that CPU 0 is already in
    the RCU_SAVE_DYNTICK leg of the switch statement!) and starts
    initializing the rcu_node hierarchy. If there are multiple
    levels to the hierarchy, it will drop the root rnp->lock and
    initialize the lower levels of the hierarchy.

    6. CPU 0 notes that rsp->completed has not changed, which permits
    both CPU 2 and CPU 0 to try updating it concurrently. If CPU 0's
    update prevails, later calls to force_quiescent_state() can
    count old quiescent states against the new grace period, which
    can in turn result in premature ending of grace periods.

    Not good.

    This patch adds an RCU_GP_IDLE state for rsp->signaled that is
    set initially at boot time and any time a grace period ends.
    This prevents CPU 0 from getting into the workings of
    force_quiescent_state() in step 4. Additional locking and
    checks prevent the concurrent update of rsp->signaled in step 6.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Ingo triggered the following warning:

    WARNING: at lib/debugobjects.c:255 debug_print_object+0x42/0x50()
    Hardware name: System Product Name
    ODEBUG: init active object type: timer_list
    Modules linked in:
    Pid: 2619, comm: dmesg Tainted: G W 2.6.32-rc5-tip+ #5298
    Call Trace:
    [] warn_slowpath_common+0x6a/0x81
    [] ? debug_print_object+0x42/0x50
    [] warn_slowpath_fmt+0x29/0x2c
    [] debug_print_object+0x42/0x50
    [] __debug_object_init+0x279/0x2d7
    [] debug_object_init+0x13/0x18
    [] init_timer_key+0x17/0x6f
    [] free_uid+0x50/0x6c
    [] put_cred_rcu+0x61/0x72
    [] rcu_do_batch+0x70/0x121

    debugobjects warns about an enqueued timer being initialized. If
    CONFIG_USER_SCHED=y the user management code uses delayed work to
    remove the user from the hash table and tear down the sysfs objects.

    free_uid is called from RCU and initializes/schedules delayed work if
    the usage count of the user_struct is 0. The init/schedule happens
    outside of the uidhash_lock protected region which allows a concurrent
    caller of find_user() to reference the about to be destroyed
    user_struct w/o preventing the work from being scheduled. If the next
    free_uid call happens before the work timer expired then the active
    timer is initialized and the work scheduled again.

    The race was introduced in commit 5cb350ba (sched: group scheduling,
    sysfs tunables) and made more prominent by commit 3959214f (sched:
    delayed cleanup of user_struct)

    Move the init/schedule_delayed_work inside of the uidhash_lock
    protected region to prevent the race.

    Signed-off-by: Thomas Gleixner
    Acked-by: Dhaval Giani
    Cc: Paul E. McKenney
    Cc: Kay Sievers
    Cc: stable@kernel.org

    Thomas Gleixner
     

29 Oct, 2009

1 commit

  • The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
    NULL test) nor does it use the wake_list of futex_wake() which where
    the reason for commit 41890f2 (futex: Handle spurious wake up)

    See debugging discussing on LKML Message-ID:

    The changes in this fix to the wait_requeue_pi path were considered to
    be a likely unecessary, but harmless safety net. But it turns out that
    due to the fact that for unknown $@#!*( reasons EWOULDBLOCK is defined
    as EAGAIN we built an endless loop in the code path which returns
    correctly EWOULDBLOCK.

    Spurious wakeups in wait_requeue_pi code path are unlikely so we do
    the easy solution and return EWOULDBLOCK^WEAGAIN to user space and let
    it deal with the spurious wakeup.

    Cc: Darren Hart
    Cc: Peter Zijlstra
    Cc: Eric Dumazet
    Cc: John Stultz
    Cc: Dinakar Guniguntala
    LKML-Reference:
    Cc: stable@kernel.org
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

27 Oct, 2009

2 commits

  • Some compilers are happy with "#elif CONFIG_RCU_TINY", while
    others strongly prefer "#elif defined(CONFIG_RCU_TINY)". Change
    to the latter to make more compilers happy.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Use lockdep_set_class() to simplify the code and to avoid any
    additional overhead in the !LOCKDEP case. Also move the
    definition of rcu_root_class into kernel/rcutree.c, as suggested
    by Lai Jiangshan.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

26 Oct, 2009

6 commits

  • No change in functionality - just straighten out a few small
    stylistic details.

    Cc: Paul E. McKenney
    Cc: David Howells
    Cc: Josh Triplett
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: avi@redhat.com
    Cc: mtosatti@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Make rcutorture list the available torture_type values when it
    doesn't like the one specified.

    Signed-off-by: Paul E. McKenney
    Acked-by: Josh Triplett
    Reviewed-by: Lai Jiangshan
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: avi@redhat.com
    Cc: mtosatti@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Signed-off-by: Paul E. McKenney
    Acked-by: Josh Triplett
    Reviewed-by: Lai Jiangshan
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: avi@redhat.com
    Cc: mtosatti@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Adds the "srcu_expedited" torture type, and also renames
    sched_ops_sync to sched_sync_ops for consistency while we are in
    this file.

    Signed-off-by: Paul E. McKenney
    Acked-by: Josh Triplett
    Reviewed-by: Lai Jiangshan
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: avi@redhat.com
    Cc: mtosatti@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • This patch creates a synchronize_srcu_expedited() that uses
    synchronize_sched_expedited() where synchronize_srcu()
    uses synchronize_sched(). The synchronize_srcu() and
    synchronize_srcu_expedited() functions become one-liners that
    pass synchronize_sched() or synchronize_sched_expedited(),
    repectively, to a new __synchronize_srcu() function.

    While in the file, move the EXPORT_SYMBOL_GPL()s to immediately
    follow the corresponding functions.

    Requested-by: Avi Kivity
    Tested-by: Marcelo Tosatti
    Signed-off-by: Paul E. McKenney
    Acked-by: Josh Triplett
    Reviewed-by: Lai Jiangshan
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: avi@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • This patch is a version of RCU designed for !SMP provided for a
    small-footprint RCU implementation. In particular, the
    implementation of synchronize_rcu() is extremely lightweight and
    high performance. It passes rcutorture testing in each of the
    four relevant configurations (combinations of NO_HZ and PREEMPT)
    on x86. This saves about 1K bytes compared to old Classic RCU
    (which is no longer in mainline), and more than three kilobytes
    compared to Hierarchical RCU (updated to 2.6.30):

    CONFIG_TREE_RCU:

    text data bss dec filename
    183 4 0 187 kernel/rcupdate.o
    2783 520 36 3339 kernel/rcutree.o
    3526 Total (vs 4565 for v7)

    CONFIG_TREE_PREEMPT_RCU:

    text data bss dec filename
    263 4 0 267 kernel/rcupdate.o
    4594 776 52 5422 kernel/rcutree.o
    5689 Total (6155 for v7)

    CONFIG_TINY_RCU:

    text data bss dec filename
    96 4 0 100 kernel/rcupdate.o
    734 24 0 758 kernel/rcutiny.o
    858 Total (vs 848 for v7)

    The above is for x86. Your mileage may vary on other platforms.
    Further compression is possible, but is being procrastinated.

    Changes from v7 (http://lkml.org/lkml/2009/10/9/388)

    o Apply Lai Jiangshan's review comments (aside from
    might_sleep() in synchronize_sched(), which is covered by SMP builds).

    o Fix up expedited primitives.

    Changes from v6 (http://lkml.org/lkml/2009/9/23/293).

    o Forward ported to put it into the 2.6.33 stream.

    o Added lockdep support.

    o Make lightweight rcu_barrier.

    Changes from v5 (http://lkml.org/lkml/2009/6/23/12).

    o Ported to latest pre-2.6.32 merge window kernel.

    - Renamed rcu_qsctr_inc() to rcu_sched_qs().
    - Renamed rcu_bh_qsctr_inc() to rcu_bh_qs().
    - Provided trivial rcu_cpu_notify().
    - Provided trivial exit_rcu().
    - Provided trivial rcu_needs_cpu().
    - Fixed up the rcu_*_enter/exit() functions in linux/hardirq.h.

    o Removed the dependence on EMBEDDED, with a view to making
    TINY_RCU default for !SMP at some time in the future.

    o Added (trivial) support for expedited grace periods.

    Changes from v4 (http://lkml.org/lkml/2009/5/2/91) include:

    o Squeeze the size down a bit further by removing the
    ->completed field from struct rcu_ctrlblk.

    o This permits synchronize_rcu() to become the empty function.
    Previous concerns about rcutorture were unfounded, as
    rcutorture correctly handles a constant value from
    rcu_batches_completed() and rcu_batches_completed_bh().

    Changes from v3 (http://lkml.org/lkml/2009/3/29/221) include:

    o Changed rcu_batches_completed(), rcu_batches_completed_bh()
    rcu_enter_nohz(), rcu_exit_nohz(), rcu_nmi_enter(), and
    rcu_nmi_exit(), to be static inlines, as suggested by David
    Howells. Doing this saves about 100 bytes from rcutiny.o.
    (The numbers between v3 and this v4 of the patch are not directly
    comparable, since they are against different versions of Linux.)

    Changes from v2 (http://lkml.org/lkml/2009/2/3/333) include:

    o Fix whitespace issues.

    o Change short-circuit "||" operator to instead be "+" in order
    to fix performance bug noted by "kraai" on LWN.

    (http://lwn.net/Articles/324348/)

    Changes from v1 (http://lkml.org/lkml/2009/1/13/440) include:

    o This version depends on EMBEDDED as well as !SMP, as suggested
    by Ingo.

    o Updated rcu_needs_cpu() to unconditionally return zero,
    permitting the CPU to enter dynticks-idle mode at any time.
    This works because callbacks can be invoked upon entry to
    dynticks-idle mode.

    o Paul is now OK with this being included, based on a poll at
    the Kernel Miniconf at linux.conf.au, where about ten people said
    that they cared about saving 900 bytes on single-CPU systems.

    o Applies to both mainline and tip/core/rcu.

    Signed-off-by: Paul E. McKenney
    Acked-by: David Howells
    Acked-by: Josh Triplett
    Reviewed-by: Lai Jiangshan
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: avi@redhat.com
    Cc: mtosatti@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

16 Oct, 2009

2 commits

  • When requeuing tasks from one futex to another, the reference held
    by the requeued task to the original futex location needs to be
    dropped eventually.

    Dropping the reference may ultimately lead to a call to
    "iput_final" and subsequently call into filesystem- specific code -
    which may be non-atomic.

    It is therefore safer to defer this drop operation until after the
    futex_hash_bucket spinlock has been dropped.

    Originally-From: Helge Bahmann
    Signed-off-by: Darren Hart
    Cc:
    Cc: Peter Zijlstra
    Cc: Eric Dumazet
    Cc: Dinakar Guniguntala
    Cc: John Stultz
    Cc: Sven-Thorsten Dietrich
    Cc: John Kacur
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     
  • If the following sequence of events occurs, then
    TREE_PREEMPT_RCU will hang waiting for a grace period to
    complete, eventually OOMing the system:

    o A TREE_PREEMPT_RCU build of the kernel is booted on a system
    with more than 64 physical CPUs present (32 on a 32-bit system).
    Alternatively, a TREE_PREEMPT_RCU build of the kernel is booted
    with RCU_FANOUT set to a sufficiently small value that the
    physical CPUs populate two or more leaf rcu_node structures.

    o A task is preempted in an RCU read-side critical section
    while running on a CPU corresponding to a given leaf rcu_node
    structure.

    o All CPUs corresponding to this same leaf rcu_node structure
    record quiescent states for the current grace period.

    o All of these same CPUs go offline (hence the need for enough
    physical CPUs to populate more than one leaf rcu_node structure).
    This causes the preempted task to be moved to the root rcu_node
    structure.

    At this point, there is nothing left to cause the quiescent
    state to be propagated up the rcu_node tree, so the current
    grace period never completes.

    The simplest fix, especially after considering the deadlock
    possibilities, is to detect this situation when the last CPU is
    offlined, and to set that CPU's ->qsmask bit in its leaf
    rcu_node structure. This will cause the next invocation of
    force_quiescent_state() to end the grace period.

    Without this fix, this hang can be triggered in an hour or so on
    some machines with rcutorture and random CPU onlining/offlining.
    With this fix, these same machines pass a full 10 hours of this
    sort of abuse.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

15 Oct, 2009

6 commits

  • Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: npiggin@suse.de
    Cc: jens.axboe@oracle.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • o Remove the CONFIG_PREEMPT_RCU documentation since this
    config option has now been removed.

    o Change the now-incorrect references to "rcu" labels to
    instead be "rcu_sched".

    o Add notes stating that CONFIG_TREE_PREEMPT_RCU kernels will
    have additional "rcu_preempt" output.

    o Note the new "oqlen" field in the rcuhier output (for
    RCU callbacks orphaned by an offlined CPU).

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: npiggin@suse.de
    Cc: jens.axboe@oracle.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: npiggin@suse.de
    Cc: jens.axboe@oracle.com
    Cc: Josh Triplett
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    kernel/rcutree_trace.c | 8 ++++++--
    1 file changed, 6 insertions(+), 2 deletions(-)

    Paul E. McKenney
     
  • For the short term, map synchronize_rcu_expedited() to
    synchronize_rcu() for TREE_PREEMPT_RCU and to
    synchronize_sched_expedited() for TREE_RCU.

    Longer term, there needs to be a real expedited grace period for
    TREE_PREEMPT_RCU, but candidate patches to date are considerably
    more complex and intrusive.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: npiggin@suse.de
    Cc: jens.axboe@oracle.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • As the number of callbacks on a given CPU rises, invoke
    force_quiescent_state() only every blimit number of callbacks
    (defaults to 10,000), and even then only if no other CPU has
    invoked force_quiescent_state() in the meantime.

    This should fix the performance regression reported by Nick.

    Reported-by: Nick Piggin
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: jens.axboe@oracle.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • If userspace tries to perform a requeue_pi on a non-requeue_pi waiter,
    it will find the futex_q->requeue_pi_key to be NULL and OOPS.

    Check for NULL in match_futex() instead of doing explicit NULL pointer
    checks on all call sites. While match_futex(NULL, NULL) returning
    false is a little odd, it's still correct as we expect valid key
    references.

    Signed-off-by: Darren Hart
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    CC: Eric Dumazet
    CC: Dinakar Guniguntala
    CC: John Stultz
    Cc: stable@kernel.org
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Darren Hart
     

14 Oct, 2009

1 commit

  • The futex code does not handle spurious wake up in futex_wait and
    futex_wait_requeue_pi.

    The code assumes that any wake up which was not caused by futex_wake /
    requeue or by a timeout was caused by a signal wake up and returns one
    of the syscall restart error codes.

    In case of a spurious wake up the signal delivery code which deals
    with the restart error codes is not invoked and we return that error
    code to user space. That causes applications which actually check the
    return codes to fail. Blaise reported that on preempt-rt a python test
    program run into a exception trap. -rt exposed that due to a built in
    spurious wake up accelerator :)

    Solve this by checking signal_pending(current) in the wake up path and
    handle the spurious wake up case w/o returning to user space.

    Reported-by: Blaise Gassend
    Debugged-by: Darren Hart
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: stable@kernel.org
    LKML-Reference:

    Thomas Gleixner
     

13 Oct, 2009

1 commit


10 Oct, 2009

2 commits

  • A race shouldn't happen since all workqueues or handlers are canceled
    or flushed before the event buffer is freed. A warning is triggered
    now if the buffer is freed too early.

    Also, this patch adds some comments about event buffer protection,
    reworks some code and adds code to clear buffer_pos during alloc and
    free of the event buffer.

    Cc: David Rientjes
    Cc: Stephane Eranian
    Signed-off-by: Robert Richter

    Robert Richter
     
  • Looking at the 2.6.31-rc9 code, it appears there is a race condition
    in the event_buffer cleanup code path (shutdown). This could lead to
    kernel panic as some CPUs may be operating on the event buffer AFTER
    it has been freed. The attached patch solves the problem and makes
    sure CPUs check if the buffer is not NULL before they access it as
    some may have been spinning on the mutex while the buffer was being
    freed.

    The race may happen if the buffer is freed during pending reads. But
    it is not clear why there are races in add_event_entry() since all
    workqueues or handlers are canceled or flushed before the event buffer
    is freed.

    Signed-off-by: David Rientjes
    Signed-off-by: Stephane Eranian
    Signed-off-by: Robert Richter

    David Rientjes
     

09 Oct, 2009

13 commits

  • Some tracepoint magic (TRACE_EVENT(lock_acquired)) relies on
    the fact that lock hold times are positive and uses div64 on
    that. That triggered a build warning on MIPS, and probably
    causes bad output in certain circumstances as well.

    Make it truly positive.

    Reported-by: Andrew Morton
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
    pata_atp867x: add Power Management support
    pata_atp867x: PIO support fixes
    pata_atp867x: clarifications in timings calculations and cable detection
    pata_atp867x: fix it to not claim MWDMA support
    libata: fix incorrect link online check during probe
    ahci: filter FPDMA non-zero offset enable for Aspire 3810T
    libata: make gtf_filter per-dev
    libata: implement more acpi filtering options
    libata: cosmetic updates
    ahci: display all AHCI 1.3 HBA capability flags (v2)
    pata_ali: trivial fix of a very frequent spelling mistake
    ahci: disable 64bit DMA by default on SB600s

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    futex: fix requeue_pi key imbalance
    futex: Fix typo in FUTEX_WAIT/WAKE_BITSET_PRIVATE definitions
    rcu: Place root rcu_node structure in separate lockdep class
    rcu: Make hot-unplugged CPU relinquish its own RCU callbacks
    rcu: Move rcu_barrier() to rcutree
    futex: Move exit_pi_state() call to release_mm()
    futex: Nullify robust lists after cleanup
    futex: Fix locking imbalance
    panic: Fix panic message visibility by calling bust_spinlocks(0) before dying
    rcu: Replace the rcu_barrier enum with pointer to call_rcu*() function
    rcu: Clean up code based on review feedback from Josh Triplett, part 4
    rcu: Clean up code based on review feedback from Josh Triplett, part 3
    rcu: Fix rcu_lock_map build failure on CONFIG_PROVE_LOCKING=y
    rcu: Clean up code to address Ingo's checkpatch feedback
    rcu: Clean up code based on review feedback from Josh Triplett, part 2
    rcu: Clean up code based on review feedback from Josh Triplett

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Set correct normal_prio and prio values in sched_fork()

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, pci: Correct spelling in a comment
    x86: Simplify bound checks in the MTRR code
    x86: EDAC: carve out AMD MCE decoding logic
    initcalls: Add early_initcall() for modules
    x86: EDAC: MCE: Fix MCE decoding callback logic

    Linus Torvalds
     
  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing: user local buffer variable for trace branch tracer
    tracing: fix warning on kernel/trace/trace_branch.c andtrace_hw_branches.c
    ftrace: check for failure for all conversions
    tracing: correct module boundaries for ftrace_release
    tracing: fix transposed numbers of lock_depth and preempt_count
    trace: Fix missing assignment in trace_ctxwake_*
    tracing: Use free_percpu instead of kfree
    tracing: Check total refcount before releasing bufs in profile_enable failure

    Linus Torvalds
     
  • …/linux/kernel/git/tip/linux-2.6-tip

    * 'sparc-perf-events-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    mm, perf_event: Make vmalloc_user() align base kernel virtual address to SHMLBA
    perf_event: Provide vmalloc() based mmap() backing

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf_events: Make ABI definitions available to userspace
    perf tools: elf_sym__is_function() should accept "zero" sized functions
    tracing/syscalls: Use long for syscall ret format and field definitions
    perf trace: Update eval_flag() flags array to match interrupt.h
    perf trace: Remove unused code in builtin-trace.c
    perf: Propagate term signal to child

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, timers: Check for pending timers after (device) interrupts
    NOHZ: update idle state also when NOHZ is inactive

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
    ALSA: ice1724: increase SPDIF and independent stereo buffer sizes
    ALSA: opl3: circular locking in the snd_opl3_note_on() and snd_opl3_note_off()
    ALSA: ICE1712/24 - Change the Multi Track Peak control (level meters) from MIXER to PCM type
    ALSA: hda - Fix yet another auto-mic bug in ALC268
    ASoC: WM8350 capture PGA mutes are inverted
    ASoC: Remove absent SYNC and TDM DAI format options from i.MX SSI
    sound: via82xx: move DXS volume controls to PCM interface
    ALSA: hda - Don't pick up invalid HP pins in alc_subsystem_id()
    ALSA: hda - Add a workaround for ASUS A7K
    ALSA: hda - Fix invalid initializations for ALC861 auto mode
    ASoC: wm8940: Fix check on error code form snd_soc_codec_set_cache_io
    ASoC: Fix SND_SOC_DAPM_LINE handling

    Linus Torvalds
     
  • * 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (24 commits)
    drm/radeon/kms: fix vline register for second head.
    drm/r600: avoid assigning vb twice in blit code
    drm/radeon: use list_for_each_entry instead of list_for_each
    drm/radeon/kms: Fix AGP support for R600/RV770 family (v2)
    drm/radeon/kms: Fallback to non AGP when acceleration fails to initialize (v2)
    drm/radeon/kms: Fix RS600/RV515/R520/RS690 IRQ
    drm/radeon: Fix setting of bits
    drm/ttm: fix refcounting in ttm global code.
    drm/fb: add more correct 8/16/24/32 bpp fb support.
    drm/fb: add setcmap and fix 8-bit support.
    drm/radeon/kms: respect single crtc cards, only create one crtc. (v2)
    drm: Delete the DRM_DEBUG_KMS in drm_mode_cursor_ioctl
    drm/radeon/kms: add support for "Surround View"
    drm/radeon/kms: Fix irq handling on AVIVO hw
    drm/radeon/kms: R600/RV770 remove dead code and print message for wrong BIOS
    drm/radeon/kms: Fix R600/RV770 disable acceleration path
    drm/radeon/kms: Fix R600/RV770 startup path & reset
    drm/radeon/kms: Fix R600 write back buffer
    drm/radeon/kms: Remove old init path as no hw use it anymore
    drm/radeon/kms: Convert RS600 to new init path
    ...

    Linus Torvalds
     
  • …/git/tmlind/linux-omap-2.6

    * 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
    omapfb: Blizzard: constify register address tables
    omapfb: Blizzard: fix pointer to be const
    omapfb: Condition mutex acquisition
    omap: iovmm: Add missing mutex_unlock
    omap: iovmm: Fix incorrect spelling
    omap: SRAM: flush the right address after memcpy in omap_sram_push
    omap: Lock DPLL5 at boot
    omap: Fix incorrect 730 vs 850 detection
    OMAP3: PM: introduce a new powerdomain walk helper
    OMAP3: PM: Enable GPIO module-level wakeups
    OMAP3: PM: USBHOST: clear wakeup events on both hosts
    OMAP3: PM: PRCM interrupt: only handle selected PRCM interrupts
    OMAP3: PM: PRCM interrupt: check MPUGRPSEL register
    OMAP3: PM: Prevent hang in prcm_interrupt_handler

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
    amd64_edac: beef up DRAM error injection
    amd64_edac: fix DRAM base and limit extraction
    amd64_edac: fix chip select handling
    amd64_edac: simple fix to allow reporting of CECC errors
    amd64_edac: fix K8 intlv_sel check
    amd64_edac: fix interleave enable tests
    amd64_edac: fix DRAM base and limit address extraction
    amd64_edac: fix driver instance lookup table allocation

    Linus Torvalds