03 Dec, 2009

3 commits

  • Implement a synchronize_rcu_expedited() for preemptible RCU
    that actually is expedited. This uses
    synchronize_sched_expedited() to force all threads currently
    running in a preemptible-RCU read-side critical section onto the
    appropriate ->blocked_tasks[] list, then takes a snapshot of all
    of these lists and waits for them to drain.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Enable a fourth level of rcu_node hierarchy for TREE_RCU and
    TREE_PREEMPT_RCU. This is for stress-testing and experimental
    purposes only, although in theory this would enable 16,777,216
    CPUs on 64-bit systems (1,048,576 CPUs on 32-bit systems).
    Normal experimental use of this fourth level will set
    CONFIG_RCU_FANOUT=2, requiring a 16-CPU system, though the more
    adventurous (and more fortunate) experimenters may wish to
    choose CONFIG_RCU_FANOUT=3 for 81-CPU systems or even
    CONFIG_RCU_FANOUT=4 for 256-CPU systems.

    Signed-off-by: Paul E. McKenney
    Acked-by: Josh Triplett
    Acked-by: Lai Jiangshan
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • The number of "quiet" functions has grown recently, and the
    names are no longer very descriptive. The point of all of these
    functions is to do some portion of the task of reporting a
    quiescent state, so rename them accordingly:

    o cpu_quiet() becomes rcu_report_qs_rdp(), which reports a
    quiescent state to the per-CPU rcu_data structure. If this
    turns out to be a new quiescent state for this grace period,
    then rcu_report_qs_rnp() will be invoked to propagate the
    quiescent state up the rcu_node hierarchy.

    o cpu_quiet_msk() becomes rcu_report_qs_rnp(), which reports
    a quiescent state for a given CPU (or possibly a set of CPUs)
    up the rcu_node hierarchy.

    o cpu_quiet_msk_finish() becomes rcu_report_qs_rsp(), which
    reports a full set of quiescent states to the global rcu_state
    structure.

    o task_quiet() becomes rcu_report_unblock_qs_rnp(), which reports
    a quiescent state due to a task exiting an RCU read-side critical
    section that had previously blocked in that same critical section.
    As indicated by the new name, this type of quiescent state is
    reported up the rcu_node hierarchy (using rcu_report_qs_rnp()
    to do so).

    Signed-off-by: Paul E. McKenney
    Acked-by: Josh Triplett
    Acked-by: Lai Jiangshan
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

23 Nov, 2009

3 commits

  • Remove #ifdefs from kernel/rcupdate.c and
    include/linux/rcupdate.h by moving code to
    include/linux/rcutiny.h, include/linux/rcutree.h, and
    kernel/rcutree.c.

    Also remove some definitions that are no longer used.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • The functions rcu_init() is a wrapper for __rcu_init(), and also
    sets up the CPU-hotplug notifier for rcu_barrier_cpu_hotplug().
    But TINY_RCU doesn't need CPU-hotplug notification, and the
    rcu_barrier_cpu_hotplug() is a simple wrapper for
    rcu_cpu_notify().

    So push rcu_init() out to kernel/rcutree.c and kernel/rcutiny.c
    and get rid of the wrapper function rcu_barrier_cpu_hotplug().

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • When the last CPU of a given leaf rcu_node structure goes
    offline, all of the tasks queued on that leaf rcu_node structure
    (due to having blocked in their current RCU read-side critical
    sections) are requeued onto the root rcu_node structure. This
    requeuing is carried out by rcu_preempt_offline_tasks().
    However, it is possible that these queued tasks are the only
    thing preventing the leaf rcu_node structure from reporting a
    quiescent state up the rcu_node hierarchy. Unfortunately, the
    old code would fail to do this reporting, resulting in a
    grace-period stall given the following sequence of events:

    1. Kernel built for more than 32 CPUs on 32-bit systems or for more
    than 64 CPUs on 64-bit systems, so that there is more than one
    rcu_node structure. (Or CONFIG_RCU_FANOUT is artificially set
    to a number smaller than CONFIG_NR_CPUS.)

    2. The kernel is built with CONFIG_TREE_PREEMPT_RCU.

    3. A task running on a CPU associated with a given leaf rcu_node
    structure blocks while in an RCU read-side critical section
    -and- that CPU has not yet passed through a quiescent state
    for the current RCU grace period. This will cause the task
    to be queued on the leaf rcu_node's blocked_tasks[] array, in
    particular, on the element of this array corresponding to the
    current grace period.

    4. Each of the remaining CPUs corresponding to this same leaf rcu_node
    structure passes through a quiescent state. However, the task is
    still in its RCU read-side critical section, so these quiescent
    states cannot be reported further up the rcu_node hierarchy.
    Nevertheless, all bits in the leaf rcu_node structure's ->qsmask
    field are now zero.

    5. Each of the remaining CPUs goes offline. (The events in steps
    #4 and #5 can happen in any order as long as each CPU passes
    through a quiescent state before going offline.)

    6. When the last CPU goes offline, __rcu_offline_cpu() will invoke
    rcu_preempt_offline_tasks(), which will move the task to the
    root rcu_node structure, but without reporting a quiescent state
    up the rcu_node hierarchy (and this failure to report a quiescent
    state is the bug).

    But because this leaf rcu_node structure's ->qsmask field is
    already zero and its ->blocked_tasks[] entries are all empty,
    force_quiescent_state() will skip this rcu_node structure.

    Therefore, grace periods are now hung.

    This patch abstracts some code out of rcu_read_unlock_special(),
    calling the result task_quiet() by analogy with cpu_quiet(), and
    invokes task_quiet() from both rcu_read_unlock_special() and
    __rcu_offline_cpu(). Invoking task_quiet() from
    __rcu_offline_cpu() reports the quiescent state up the rcu_node
    hierarchy, fixing the bug. This ends up requiring a separate
    lock_class_key per level of the rcu_node hierarchy, which this
    patch also provides.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

14 Nov, 2009

2 commits

  • Now that there are both ->gpnum and ->completed fields in the
    rcu_node structure, __rcu_pending() should check rdp->gpnum and
    rdp->completed against rnp->gpnum and rnp->completed, respectively,
    instead of the prior comparison against the rcu_state fields
    rsp->gpnum and rsp->completed.

    Given the old comparison, __rcu_pending() could return 1, resulting
    in a needless raise_softirq(RCU_SOFTIRQ). This useless work would
    happen if RCU responded to a scheduling-clock interrupt after the
    rcu_state fields had been updated, but before the rcu_node fields
    had been updated.

    Changing the comparison from the rcu_state fields to the rcu_node
    fields prevents this useless work from happening.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Now that a copy of the rsp->completed value is available in all
    rcu_node structures, make full use of it. It is still
    legitimate to access rsp->completed while holding the root
    rcu_node structure's lock, however.

    Also, tighten up force_quiescent_state()'s checks for end of
    current grace period.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

13 Nov, 2009

2 commits

  • The force_quiescent_state() function also took a snapshot
    of the ->completed field, which was as obnoxious as it was in
    rcu_sched_qs() and friends. So snapshot ->gpnum - 1 instead.

    Also, since the dyntick_record_completed() and
    dyntick_recall_completed() functions are now simple assignments
    that are independent of CONFIG_NO_HZ, and since their names are
    now misleading, get rid of them.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • An earlier fix for a race resulted in a situation where the CPUs
    other than the CPU that detected the end of the grace period would
    not process their callbacks until the next grace period started.

    This means that these other CPUs would unnecessarily demand that an
    extra grace period be started.

    This patch eliminates this extra grace period and speeds callback
    processing by propagating rsp->completed to the rcu_node structures
    in the case where the CPU detecting the end of the grace period
    sees no reason to start a new grace period.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

11 Nov, 2009

2 commits

  • The rdp->passed_quiesc_completed fields are used to properly
    associate the recorded quiescent state with a grace period. It
    is OK to wrongly associate a given quiescent state with a
    preceding grace period, but it is fatal to associate a given
    quiescent state with a grace period that begins after the
    quiescent state occurred. Grace periods are numbered, and the
    following fields track them:

    o ->gpnum is the number of the grace period currently in
    progress, or the number of the last grace period to
    complete if no grace period is currently in progress.

    o ->completed is the number of the last grace period to
    have completed.

    These two fields are equal if there is no grace period in
    progress, otherwise ->gpnum is one greater than ->completed.
    But the rdp->passed_quiesc_completed field is compared against
    ->completed, and if equal, the quiescent state is presumed to
    count against the current grace period.

    The earlier code copied rdp->completed to
    rdp->passed_quiesc_completed, which has been made to work, but
    is error-prone. In contrast, copying one less than rdp->gpnum
    is guaranteed safe, because rdp->gpnum is not incremented until
    after the start of the corresponding grace period. At the end of
    the grace period, when ->completed is incremented, any
    quiescent states recorded previously will be discarded.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • This field is used whether or not CONFIG_NO_HZ is set, so the
    old name of ->dynticks_completed is quite misleading.

    Change to ->completed_fqs, given that it is the value that
    force_quiescent_state() is trying to drive the ->completed field
    away from.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

10 Nov, 2009

4 commits

  • Impose a clear locking design on the note_new_gpnum()
    function's use of the ->gpnum counter. This is done by updating
    rdp->gpnum only from the corresponding leaf rcu_node structure's
    rnp->gpnum field, and even then only under the protection of
    that same rcu_node structure's ->lock field. Performance and
    scalability are maintained using a form of double-checked
    locking, and excessive spinning is avoided by use of the
    spin_trylock() function. The use of spin_trylock() is safe
    because CPUs that fail to acquire this lock will try
    again later. The hierarchical nature of the rcu_node data
    structure limits contention (which could be limited further if
    need be using the RCU_FANOUT kernel parameter).

    Without this patch, obscure but quite possible races could
    result in a quiescent state that occurred during one grace
    period to be accounted to the following grace period, causing
    this following grace period to end prematurely. Not good!

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: # .32.x
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Impose a clear locking design on the rcu_process_gp_end()
    function's use of the ->completed counter. This is done by
    creating a ->completed field in the rcu_node structure, which
    can safely be accessed under the protection of that structure's
    lock. Performance and scalability are maintained by using a
    form of double-checked locking, so that rcu_process_gp_end()
    only acquires the leaf rcu_node structure's ->lock if a grace
    period has recently ended.

    This fix reduces rcutorture failure rate by at least two orders
    of magnitude under heavy stress with force_quiescent_state()
    being invoked artificially often. Without this fix,
    unsynchronized access to the ->completed field can cause
    rcu_process_gp_end() to advance callbacks whose grace period has
    not yet expired. (Bad idea!)

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: # .32.x
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Impose a clear locking design on non-NO_HZ handling of the
    ->completed counter. This increases the distance between the
    RCU and the CPU-hotplug mechanisms.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: # .32.x
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Merge reason: Pick up RCU fixlet to base further commits on.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

02 Nov, 2009

1 commit

  • Very long RCU read-side critical sections (50 milliseconds or
    so) can cause a race between force_quiescent_state() and
    rcu_start_gp() as follows on kernel builds with multi-level
    rcu_node hierarchies:

    1. CPU 0 calls force_quiescent_state(), sees that there is a
    grace period in progress, and acquires ->fsqlock.

    2. CPU 1 detects the end of the grace period, and so
    cpu_quiet_msk_finish() sets rsp->completed to rsp->gpnum.
    This operation is carried out under the root rnp->lock,
    but CPU 0 has not yet acquired that lock. Note that
    rsp->signaled is still RCU_SAVE_DYNTICK from the last
    grace period.

    3. CPU 1 calls rcu_start_gp(), but no one wants a new grace
    period, so it drops the root rnp->lock and returns.

    4. CPU 0 acquires the root rnp->lock and picks up rsp->completed
    and rsp->signaled, then drops rnp->lock. It then enters the
    RCU_SAVE_DYNTICK leg of the switch statement.

    5. CPU 2 invokes call_rcu(), and now needs a new grace period.
    It calls rcu_start_gp(), which acquires the root rnp->lock, sets
    rsp->signaled to RCU_GP_INIT (too bad that CPU 0 is already in
    the RCU_SAVE_DYNTICK leg of the switch statement!) and starts
    initializing the rcu_node hierarchy. If there are multiple
    levels to the hierarchy, it will drop the root rnp->lock and
    initialize the lower levels of the hierarchy.

    6. CPU 0 notes that rsp->completed has not changed, which permits
    both CPU 2 and CPU 0 to try updating it concurrently. If CPU 0's
    update prevails, later calls to force_quiescent_state() can
    count old quiescent states against the new grace period, which
    can in turn result in premature ending of grace periods.

    Not good.

    This patch adds an RCU_GP_IDLE state for rsp->signaled that is
    set initially at boot time and any time a grace period ends.
    This prevents CPU 0 from getting into the workings of
    force_quiescent_state() in step 4. Additional locking and
    checks prevent the concurrent update of rsp->signaled in step 6.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

27 Oct, 2009

1 commit

  • Use lockdep_set_class() to simplify the code and to avoid any
    additional overhead in the !LOCKDEP case. Also move the
    definition of rcu_root_class into kernel/rcutree.c, as suggested
    by Lai Jiangshan.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

16 Oct, 2009

1 commit

  • If the following sequence of events occurs, then
    TREE_PREEMPT_RCU will hang waiting for a grace period to
    complete, eventually OOMing the system:

    o A TREE_PREEMPT_RCU build of the kernel is booted on a system
    with more than 64 physical CPUs present (32 on a 32-bit system).
    Alternatively, a TREE_PREEMPT_RCU build of the kernel is booted
    with RCU_FANOUT set to a sufficiently small value that the
    physical CPUs populate two or more leaf rcu_node structures.

    o A task is preempted in an RCU read-side critical section
    while running on a CPU corresponding to a given leaf rcu_node
    structure.

    o All CPUs corresponding to this same leaf rcu_node structure
    record quiescent states for the current grace period.

    o All of these same CPUs go offline (hence the need for enough
    physical CPUs to populate more than one leaf rcu_node structure).
    This causes the preempted task to be moved to the root rcu_node
    structure.

    At this point, there is nothing left to cause the quiescent
    state to be propagated up the rcu_node tree, so the current
    grace period never completes.

    The simplest fix, especially after considering the deadlock
    possibilities, is to detect this situation when the last CPU is
    offlined, and to set that CPU's ->qsmask bit in its leaf
    rcu_node structure. This will cause the next invocation of
    force_quiescent_state() to end the grace period.

    Without this fix, this hang can be triggered in an hour or so on
    some machines with rcutorture and random CPU onlining/offlining.
    With this fix, these same machines pass a full 10 hours of this
    sort of abuse.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

15 Oct, 2009

1 commit

  • As the number of callbacks on a given CPU rises, invoke
    force_quiescent_state() only every blimit number of callbacks
    (defaults to 10,000), and even then only if no other CPU has
    invoked force_quiescent_state() in the meantime.

    This should fix the performance regression reported by Nick.

    Reported-by: Nick Piggin
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: jens.axboe@oracle.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

07 Oct, 2009

3 commits

  • Before this patch, all of the rcu_node structures were in the same lockdep
    class, so that lockdep would complain when rcu_preempt_offline_tasks()
    acquired the root rcu_node structure's lock while holding one of the leaf
    rcu_nodes' locks.

    This patch changes rcu_init_one() to use a separate
    spin_lock_init() for the root rcu_node structure's lock than is
    used for that of all of the rest of the rcu_node structures, which
    puts the root rcu_node structure's lock in its own lockdep class.

    Suggested-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • The current interaction between RCU and CPU hotplug requires that
    RCU block in CPU notifiers waiting for callbacks to drain.

    This can be greatly simplified by having each CPU relinquish its
    own callbacks, and for both _rcu_barrier() and CPU_DEAD notifiers
    to adopt all callbacks that were previously relinquished.

    This change also eliminates the possibility of certain types of
    hangs due to the previous practice of waiting for callbacks to be
    invoked from within CPU notifiers. If you don't ever wait, you
    cannot hang.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Move the existing rcu_barrier() implementation to rcutree.c,
    consistent with the fact that the rcu_barrier() implementation is
    tied quite tightly to the RCU implementation.

    This opens the way to simplify and fix rcutree.c's rcu_barrier()
    implementation in a later patch.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

06 Oct, 2009

3 commits

  • These issues were identified during an old-fashioned face-to-face code
    review extending over many hours. This group improves an existing
    abstraction and introduces two new ones. It also fixes an RCU
    stall-warning bug found while making the other changes.

    o Make RCU_INIT_FLAVOR() declare its own variables, removing
    the need to declare them at each call site.

    o Create an rcu_for_each_leaf() macro that scans the leaf
    nodes of the rcu_node tree.

    o Create an rcu_for_each_node_breadth_first() macro that does
    a breadth-first traversal of the rcu_node tree, AKA
    stepping through the array in index-number order.

    o If all CPUs corresponding to a given leaf rcu_node
    structure go offline, then any tasks queued on that leaf
    will be moved to the root rcu_node structure. Therefore,
    the stall-warning code must dump out tasks queued on the
    root rcu_node structure as well as those queued on the leaf
    rcu_node structures.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Whitespace fixes, updated comments, and trivial code movement.

    o Fix whitespace error in RCU_HEAD_INIT()

    o Move "So where is rcu_write_lock()" comment so that it does
    not come between the rcu_read_unlock() header comment and
    the rcu_read_unlock() definition.

    o Move the module_param statements for blimit, qhimark, and
    qlowmark to immediately follow the corresponding
    definitions.

    o In __rcu_offline_cpu(), move the assignment to rdp_me
    inside the "if" statement, given that rdp_me is not used
    outside of that "if" statement.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Move the rcu_lock_map definition from rcutree.c to rcupdate.c so that
    TINY_RCU can use lockdep.

    Reported-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

24 Sep, 2009

3 commits

  • Move declarations and update storage classes to make checkpatch happy.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • These issues were identified during an old-fashioned face-to-face code
    review extending over many hours.

    o Add comments for tricky parts of code, and correct comments
    that have passed their sell-by date.

    o Get rid of the vestiges of rcu_init_sched(), which is no
    longer needed now that PREEMPT_RCU is gone.

    o Move the #include of rcutree_plugin.h to the end of
    rcutree.c, which means that, rather than having a random
    collection of forward declarations, the new set of forward
    declarations document the set of plugins. The new home for
    this #include also allows __rcu_init_preempt() to move into
    rcutree_plugin.h.

    o Fix rcu_preempt_check_callbacks() to be static.

    Suggested-by: Josh Triplett
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Peter Zijlstra

    Paul E. McKenney
     
  • These issues were identified during an old-fashioned face-to-face
    code review extending over many hours.

    o Bury various forms of the "rsp->completed == rsp->gpnum"
    comparison into an rcu_gp_in_progress() function, which has
    the beneficial side-effect of forcing consistent use of
    ACCESS_ONCE().

    o Replace hand-coded arithmetic with DIV_ROUND_UP().

    o Bury several "!list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x01])"
    instances into an rcu_preempted_readers() function, as this
    expression indicates that there are readers blocked
    within RCU read-side critical sections blocking the current
    grace period. (Though there might well be similar readers
    blocking the next grace period.)

    o Remove a dangling rcu_restart_cpu() declaration that has
    been dangling for almost 20 minor releases of the kernel.

    Signed-off-by: Paul E. McKenney
    Acked-by: Peter Zijlstra
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

19 Sep, 2009

4 commits

  • Fix a number of whitespace ^Ierrors in the include/linux/rcu*
    and the kernel/rcu* files.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    LKML-Reference:
    [ did more checkpatch fixlets ]
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Commit de078d8 ("rcu: Need to update rnp->gpnum if preemptable RCU
    is to be reliable") repeatedly and incorrectly initializes the root
    rcu_node structure's ->gpnum field rather than initializing the
    ->gpnum field of each node in the tree. Fix this. Also add an
    additional consistency check to catch this in the future.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • o Drop the calls to cpu_quiet() from the online/offline code.
    These are unnecessary, since force_quiescent_state() will
    clean up, and removing them simplifies the code a bit.

    o Add a warning to check that we don't enqueue the same blocked
    task twice onto the ->blocked_tasks[] lists.

    o Rework the phase computation in rcu_preempt_note_context_switch()
    to be more readable, as suggested by Josh Triplett.

    o Disable irqs to close a race between the scheduling clock
    interrupt and rcu_preempt_note_context_switch() WRT the
    ->rcu_read_unlock_special field.

    o Add comments to rnp->lock acquisition and release within
    rcu_read_unlock_special() noting that irqs are already
    disabled.
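    The reworked phase computation can be sketched as follows; the
    field names mirror the kernel's, but the structures are simplified
    stand-ins and the helper name is hypothetical:

    ```c
    #include <assert.h>

    /* Simplified stand-ins: qsmask tracks CPUs yet to pass a quiescent state */
    struct rcu_node { unsigned long gpnum; unsigned long qsmask; };
    struct rcu_data { unsigned long grpmask; };

    /*
     * Choose which ->blocked_tasks[] list a preempted reader belongs on.
     * If this CPU's bit is still set in qsmask, it has not yet reported a
     * quiescent state for the current grace period, so the reader blocks
     * that grace period; otherwise it can only block the next one.
     */
    static int blocked_tasks_phase(struct rcu_node *rnp, struct rcu_data *rdp)
    {
        return (rnp->gpnum + !(rnp->qsmask & rdp->grpmask)) & 0x1;
    }
    ```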

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • o Verify that qsmask bits stay clear through GP
    initialization.

    o Verify that cpu_quiet_msk_finish() is never invoked unless
    there actually is an RCU grace period in progress.

    o Verify that all internal-node rcu_node structures have empty
    blocked_tasks[] lists.

    o Verify that child rcu_node structure's bits remain clear after
    acquiring parent's lock.
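    Checks of this kind follow the usual kernel warn-once pattern; a
    minimal user-space sketch, with a hypothetical warn_on_once()
    standing in for the kernel's WARN_ON_ONCE():

    ```c
    #include <assert.h>
    #include <stdio.h>

    static int warn_count;

    /* Hypothetical stand-in for the kernel's WARN_ON_ONCE() */
    #define warn_on_once(cond)                                      \
        ({                                                          \
            static int warned;                                      \
            int _c = !!(cond);                                      \
            if (_c && !warned) {                                    \
                warned = 1;                                         \
                warn_count++;                                       \
                fprintf(stderr, "warning: %s\n", #cond);            \
            }                                                       \
            _c;                                                     \
        })

    struct rcu_state { long completed; long gpnum; };

    /* E.g.: never finish a grace period that is not actually in progress */
    static void check_gp_actually_in_progress(struct rcu_state *rsp)
    {
        warn_on_once(rsp->completed == rsp->gpnum);
    }
    ```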

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

18 Sep, 2009

4 commits

  • The earlier approach required two scheduling-clock ticks to note a
    preemptable-RCU quiescent state in the situation in which the
    scheduling-clock interrupt is unlucky enough to always interrupt an
    RCU read-side critical section.

    With this change, the quiescent state is instead noted by the
    outermost rcu_read_unlock() immediately following the first
    scheduling-clock tick, or, alternatively, by the first subsequent
    context switch. Therefore, this change also speeds up grace
    periods.
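    The mechanism can be sketched with a per-task nesting counter and a
    flag set by the scheduling-clock tick; these are simplified global
    stand-ins for state the kernel keeps in task_struct, and the
    function names are illustrative:

    ```c
    #include <assert.h>

    /* Simplified per-task state: read-side nesting depth and a "QS needed" flag */
    static int rcu_read_lock_nesting;
    static int rcu_qs_pending;   /* set by the scheduling-clock tick */
    static int qs_reported;

    static void rcu_read_lock(void)  { rcu_read_lock_nesting++; }

    static void rcu_read_unlock(void)
    {
        /* The outermost unlock reports any quiescent state the tick requested */
        if (--rcu_read_lock_nesting == 0 && rcu_qs_pending) {
            rcu_qs_pending = 0;
            qs_reported++;
        }
    }

    /* Scheduling-clock tick; may land inside a read-side critical section */
    static void scheduler_tick(void)
    {
        if (rcu_read_lock_nesting > 0)
            rcu_qs_pending = 1;   /* defer to the outermost unlock */
        else
            qs_reported++;        /* not in a critical section: QS now */
    }
    ```

    With this structure, a single unlucky tick no longer costs a second
    tick: the quiescent state is reported as soon as the task leaves its
    outermost critical section.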

    Suggested-by: Josh Triplett
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Check to make sure that there are no blocked tasks for the previous
    grace period while initializing for the next grace period, and
    verify that rcu_preempt_qs() is given the correct CPU number and is
    never called for an offline CPU.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Prior implementations initialized the root and any internal
    nodes without holding locks, then initialized the leaves
    holding locks.

    This is a false economy, as the leaf nodes will usually greatly
    outnumber the root and internal nodes. Acquiring locks on all
    nodes is conceptually much simpler as well.

    Signed-off-by: Paul E. McKenney
    Acked-by: Steven Rostedt
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Without this patch, tasks preempted in RCU read-side critical
    sections can fail to block the grace period, given that
    rnp->gpnum is used to determine which rnp->blocked_tasks[]
    element the preempted task is enqueued on.

    Before the patch, rnp->gpnum is always zero, so preempted tasks
    are always enqueued on rnp->blocked_tasks[0], which is correct
    only when the current CPU has not checked into the current
    grace period and the grace-period number is even, or,
    similarly, if the current CPU -has- checked into the current
    grace period and the grace-period number is odd.
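    The bug can be illustrated directly: the enqueue phase depends on
    the rcu_node's current grace-period number, so a gpnum stuck at zero
    lands every preempted task on blocked_tasks[0] regardless of the
    actual grace-period parity (simplified stand-in structure, with
    counters in place of the kernel's list_heads):

    ```c
    #include <assert.h>

    /* Simplified stand-in: count tasks enqueued on each phase's list */
    struct rcu_node { unsigned long gpnum; int blocked_count[2]; };

    /* A preempted task is enqueued on the list chosen by gpnum's parity */
    static void enqueue_blocked(struct rcu_node *rnp)
    {
        rnp->blocked_count[rnp->gpnum & 0x1]++;
    }
    ```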

    Signed-off-by: Paul E. McKenney
    Acked-by: Steven Rostedt
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

12 Sep, 2009

1 commit

  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (28 commits)
    rcu: Move end of special early-boot RCU operation earlier
    rcu: Changes from reviews: avoid casts, fix/add warnings, improve comments
    rcu: Create rcutree plugins to handle hotplug CPU for multi-level trees
    rcu: Remove lockdep annotations from RCU's _notrace() API members
    rcu: Add #ifdef to suppress __rcu_offline_cpu() warning in !HOTPLUG_CPU builds
    rcu: Add CPU-offline processing for single-node configurations
    rcu: Add "notrace" to RCU function headers used by ftrace
    rcu: Remove CONFIG_PREEMPT_RCU
    rcu: Merge preemptable-RCU functionality into hierarchical RCU
    rcu: Simplify rcu_pending()/rcu_check_callbacks() API
    rcu: Use debugfs_remove_recursive() to simplify code.
    rcu: Merge per-RCU-flavor initialization into pre-existing macro
    rcu: Fix online/offline indication for rcudata.csv trace file
    rcu: Consolidate sparse and lockdep declarations in include/linux/rcupdate.h
    rcu: Renamings to increase RCU clarity
    rcu: Move private definitions from include/linux/rcutree.h to kernel/rcutree.h
    rcu: Expunge lingering references to CONFIG_CLASSIC_RCU, optimize on !SMP
    rcu: Delay rcu_barrier() wait until beginning of next CPU-hotunplug operation.
    rcu: Fix typo in rcu_irq_exit() comment header
    rcu: Make rcupreempt_trace.c look at offline CPUs
    ...

    Linus Torvalds
     

29 Aug, 2009

2 commits

  • Changes suggested by review comments from Josh Triplett and
    Mathieu Desnoyers.

    Signed-off-by: Paul E. McKenney
    Acked-by: Josh Triplett
    Acked-by: Mathieu Desnoyers
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • When offlining CPUs from a multi-level tree, there is the
    possibility of offlining the last CPU from a given node when
    there are preempted RCU read-side critical sections that
    started life on one of the CPUs on that node.

    In this case, the corresponding tasks will be enqueued via the
    task_struct's rcu_node_entry list_head onto one of the
    rcu_node's blocked_tasks[] lists. These tasks need to be moved
    somewhere else so that they will prevent the current grace
    period from ending. That somewhere is the root rcu_node.
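    The splice can be sketched with a minimal singly linked list in
    place of the kernel's list_head machinery (types and the helper name
    below are illustrative stand-ins, not the kernel's API):

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Minimal stand-ins: each rcu_node keeps a list of blocked tasks */
    struct task { struct task *next; };
    struct rcu_node { struct task *blocked; };

    /*
     * Splice a leaf node's blocked-task list onto the root's list, so
     * the tasks keep blocking the current grace period after the leaf's
     * last CPU goes offline.
     */
    static void move_blocked_to_root(struct rcu_node *leaf, struct rcu_node *root)
    {
        struct task *t = leaf->blocked;

        if (!t)
            return;
        while (t->next)              /* find the tail of the leaf's list */
            t = t->next;
        t->next = root->blocked;     /* splice onto the root's list */
        root->blocked = leaf->blocked;
        leaf->blocked = NULL;
    }
    ```

    Using the root rcu_node works because the root is never offlined:
    the grace period cannot end until the root's list drains.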

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney