15 Oct, 2009

1 commit

  • As the number of callbacks on a given CPU rises, invoke
    force_quiescent_state() only every blimit number of callbacks
    (defaults to 10,000), and even then only if no other CPU has
    invoked force_quiescent_state() in the meantime.

    This should fix the performance regression reported by Nick.

    Reported-by: Nick Piggin
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: jens.axboe@oracle.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

07 Oct, 2009

3 commits

  • Before this patch, all of the rcu_node structures were in the same lockdep
    class, so that lockdep would complain when rcu_preempt_offline_tasks()
    acquired the root rcu_node structure's lock while holding one of the leaf
    rcu_nodes' locks.

    This patch changes rcu_init_one() to use a separate
    spin_lock_init() for the root rcu_node structure's lock than is
    used for that of all of the rest of the rcu_node structures, which
    puts the root rcu_node structure's lock in its own lockdep class.

    Suggested-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • The current interaction between RCU and CPU hotplug requires that
    RCU block in CPU notifiers waiting for callbacks to drain.

    This can be greatly simplified by having each CPU relinquish its
    own callbacks, and for both _rcu_barrier() and CPU_DEAD notifiers
    to adopt all callbacks that were previously relinquished.

    This change also eliminates the possibility of certain types of
    hangs due to the previous practice of waiting for callbacks to be
    invoked from within CPU notifiers. If you don't every wait, you
    cannot hang.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Move the existing rcu_barrier() implementation to rcutree.c,
    consistent with the fact that the rcu_barrier() implementation is
    tied quite tightly to the RCU implementation.

    This opens the way to simplify and fix rcutree.c's rcu_barrier()
    implementation in a later patch.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

06 Oct, 2009

3 commits

  • These issues identified during an old-fashioned face-to-face code
    review extending over many hours. This group improves an existing
    abstraction and introduces two new ones. It also fixes an RCU
    stall-warning bug found while making the other changes.

    o Make RCU_INIT_FLAVOR() declare its own variables, removing
    the need to declare them at each call site.

    o Create an rcu_for_each_leaf() macro that scans the leaf
    nodes of the rcu_node tree.

    o Create an rcu_for_each_node_breadth_first() macro that does
    a breadth-first traversal of the rcu_node tree, AKA
    stepping through the array in index-number order.

    o If all CPUs corresponding to a given leaf rcu_node
    structure go offline, then any tasks queued on that leaf
    will be moved to the root rcu_node structure. Therefore,
    the stall-warning code must dump out tasks queued on the
    root rcu_node structure as well as those queued on the leaf
    rcu_node structures.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Whitespace fixes, updated comments, and trivial code movement.

    o Fix whitespace error in RCU_HEAD_INIT()

    o Move "So where is rcu_write_lock()" comment so that it does
    not come between the rcu_read_unlock() header comment and
    the rcu_read_unlock() definition.

    o Move the module_param statements for blimit, qhimark, and
    qlowmark to immediately follow the corresponding
    definitions.

    o In __rcu_offline_cpu(), move the assignment to rdp_me
    inside the "if" statement, given that rdp_me is not used
    outside of that "if" statement.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Move the rcu_lock_map definition from rcutree.c to rcupdate.c so that
    TINY_RCU can use lockdep.

    Reported-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

24 Sep, 2009

3 commits

  • Move declarations and update storage classes to make checkpatch happy.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • These issues identified during an old-fashioned face-to-face code
    review extending over many hours.

    o Add comments for tricky parts of code, and correct comments
    that have passed their sell-by date.

    o Get rid of the vestiges of rcu_init_sched(), which is no
    longer needed now that PREEMPT_RCU is gone.

    o Move the #include of rcutree_plugin.h to the end of
    rcutree.c, which means that, rather than having a random
    collection of forward declarations, the new set of forward
    declarations document the set of plugins. The new home for
    this #include also allows __rcu_init_preempt() to move into
    rcutree_plugin.h.

    o Fix rcu_preempt_check_callbacks() to be static.

    Suggested-by: Josh Triplett
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Peter Zijlstra

    Paul E. McKenney
     
  • These issues identified during an old-fashioned face-to-face code
    review extended over many hours.

    o Bury various forms of the "rsp->completed == rsp->gpnum"
    comparison into an rcu_gp_in_progress() function, which has
    the beneficial side-effect of forcing consistent use of
    ACCESS_ONCE().

    o Replace hand-coded arithmetic with DIV_ROUND_UP().

    o Bury several "!list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x01])"
    instances into an rcu_preempted_readers() function, as this
    expression indicates that there are no readers blocked
    within RCU read-side critical sections blocking the current
    grace period. (Though there might well be similar readers
    blocking the next grace period.)

    o Remove a dangling rcu_restart_cpu() declaration that has
    been dangling for almost 20 minor releases of the kernel.

    Signed-off-by: Paul E. McKenney
    Acked-by: Peter Zijlstra
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

19 Sep, 2009

4 commits

  • Fix a number of whitespace ^Ierrors in the include/linux/rcu*
    and the kernel/rcu* files.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    LKML-Reference:
    [ did more checkpatch fixlets ]
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Commit de078d8 ("rcu: Need to update rnp->gpnum if preemptable RCU
    is to be reliable") repeatedly and incorrectly initializes the root
    rcu_node structure's ->gpnum field rather than initializing the
    ->gpnum field of each node in the tree. Fix this. Also add an
    additional consistency check to catch this in the future.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • o Drop the calls to cpu_quiet() from the online/offline code.
    These are unnecessary, since force_quiescent_state() will
    clean up, and removing them simplifies the code a bit.

    o Add a warning to check that we don't enqueue the same blocked
    task twice onto the ->blocked_tasks[] lists.

    o Rework the phase computation in rcu_preempt_note_context_switch()
    to be more readable, as suggested by Josh Triplett.

    o Disable irqs to close a race between the scheduling clock
    interrupt and rcu_preempt_note_context_switch() WRT the
    ->rcu_read_unlock_special field.

    o Add comments to rnp->lock acquisition and release within
    rcu_read_unlock_special() noting that irqs are already
    disabled.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • o Verify that qsmask bits stay clear through GP
    initialization.

    o Verify that cpu_quiet_msk_finish() is never invoked unless
    there actually is an RCU grace period in progress.

    o Verify that all internal-node rcu_node structures have empty
    blocked_tasks[] lists.

    o Verify that child rcu_node structure's bits remain clear after
    acquiring parent's lock.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

18 Sep, 2009

4 commits

  • The earlier approach required two scheduling-clock ticks to note an
    preemptable-RCU quiescent state in the situation in which the
    scheduling-clock interrupt is unlucky enough to always interrupt an
    RCU read-side critical section.

    With this change, the quiescent state is instead noted by the
    outermost rcu_read_unlock() immediately following the first
    scheduling-clock tick, or, alternatively, by the first subsequent
    context switch. Therefore, this change also speeds up grace
    periods.

    Suggested-by: Josh Triplett
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Check to make sure that there are no blocked tasks for the previous
    grace period while initializing for the next grace period, verify
    that rcu_preempt_qs() is given the correct CPU number and is never
    called for an offline CPU.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Prior implementations initialized the root and any internal
    nodes without holding locks, then initialized the leaves
    holding locks.

    This is a false economy, as the leaf nodes will usually greatly
    outnumber the root and internal nodes. Acquiring locks on all
    nodes is conceptually much simpler as well.

    Signed-off-by: Paul E. McKenney
    Acked-by: Steven Rostedt
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Without this patch, tasks preempted in RCU read-side critical
    sections can fail to block the grace period, given that
    rnp->gpnum is used to determine which rnp->blocked_tasks[]
    element the preempted task is enqueued on.

    Before the patch, rnp->gpnum is always zero, so preempted tasks
    are always enqueued on rnp->blocked_tasks[0], which is correct
    only when the current CPU has not checked into the current
    grace period and the grace-period number is even, or,
    similarly, if the current CPU -has- checked into the current
    grace period and the grace-period number is odd.

    Signed-off-by: Paul E. McKenney
    Acked-by: Steven Rostedt
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

12 Sep, 2009

1 commit

  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (28 commits)
    rcu: Move end of special early-boot RCU operation earlier
    rcu: Changes from reviews: avoid casts, fix/add warnings, improve comments
    rcu: Create rcutree plugins to handle hotplug CPU for multi-level trees
    rcu: Remove lockdep annotations from RCU's _notrace() API members
    rcu: Add #ifdef to suppress __rcu_offline_cpu() warning in !HOTPLUG_CPU builds
    rcu: Add CPU-offline processing for single-node configurations
    rcu: Add "notrace" to RCU function headers used by ftrace
    rcu: Remove CONFIG_PREEMPT_RCU
    rcu: Merge preemptable-RCU functionality into hierarchical RCU
    rcu: Simplify rcu_pending()/rcu_check_callbacks() API
    rcu: Use debugfs_remove_recursive() simplify code.
    rcu: Merge per-RCU-flavor initialization into pre-existing macro
    rcu: Fix online/offline indication for rcudata.csv trace file
    rcu: Consolidate sparse and lockdep declarations in include/linux/rcupdate.h
    rcu: Renamings to increase RCU clarity
    rcu: Move private definitions from include/linux/rcutree.h to kernel/rcutree.h
    rcu: Expunge lingering references to CONFIG_CLASSIC_RCU, optimize on !SMP
    rcu: Delay rcu_barrier() wait until beginning of next CPU-hotunplug operation.
    rcu: Fix typo in rcu_irq_exit() comment header
    rcu: Make rcupreempt_trace.c look at offline CPUs
    ...

    Linus Torvalds
     

29 Aug, 2009

2 commits

  • Changes suggested by review comments from Josh Triplett and
    Mathieu Desnoyers.

    Signed-off-by: Paul E. McKenney
    Acked-by: Josh Triplett
    Acked-by: Mathieu Desnoyers
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • When offlining CPUs from a multi-level tree, there is the
    possibility of offlining the last CPU from a given node when
    there are preempted RCU read-side critical sections that
    started life on one of the CPUs on that node.

    In this case, the corresponding tasks will be enqueued via the
    task_struct's rcu_node_entry list_head onto one of the
    rcu_node's blocked_tasks[] lists. These tasks need to be moved
    somewhere else so that they will prevent the current grace
    period from ending. That somewhere is the root rcu_node.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

26 Aug, 2009

1 commit


25 Aug, 2009

1 commit

  • Add preemptable-RCU plugin to handle the CPU-offline
    processing.

    An additional plugin is forthcoming to handle multinode RCU
    trees, but this current plugin works for configurations up to
    32 CPUs (64 CPUs for 64-bit kernels).

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

23 Aug, 2009

5 commits

  • Create a kernel/rcutree_plugin.h file that contains definitions
    for preemptable RCU (or, under the #else branch of the #ifdef,
    empty definitions for the classic non-preemptable semantics).
    These definitions fit into plugins defined in kernel/rcutree.c
    for this purpose.

    This variant of preemptable RCU uses a new algorithm whose
    read-side expense is roughly that of classic hierarchical RCU
    under CONFIG_PREEMPT. This new algorithm's update-side expense
    is similar to that of classic hierarchical RCU, and, in absence
    of read-side preemption or blocking, is exactly that of classic
    hierarchical RCU. Perhaps more important, this new algorithm
    has a much simpler implementation, saving well over 1,000 lines
    of code compared to mainline's implementation of preemptable
    RCU, which will hopefully be retired in favor of this new
    algorithm.

    The simplifications are obtained by maintaining per-task
    nesting state for running tasks, and using a simple
    lock-protected algorithm to handle accounting when tasks block
    within RCU read-side critical sections, making use of lessons
    learned while creating numerous user-level RCU implementations
    over the past 18 months.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • All calls from outside RCU are of the form:

    if (rcu_pending(cpu))
    rcu_check_callbacks(cpu, user);

    This is silly, instead we put a call to rcu_pending() in
    rcu_check_callbacks(), and then make the outside calls be to
    rcu_check_callbacks(). This cuts down on the code a bit and
    also gives the compiler a better chance of optimizing.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Rename the RCU_DATA_PTR_INIT() macro to RCU_INIT_FLAVOR() and
    make it do the rcu_init_one() and rcu_boot_init_percpu_data()
    calls. Merge the loop that was in the original macro with the
    loops that were in __rcu_init().

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Make RCU-sched, RCU-bh, and RCU-preempt be underlying
    implementations, with "RCU" defined in terms of one of the
    three. Update the outdated rcu_qsctr_inc() names, as these
    functions no longer increment anything.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Some information hiding that makes it easier to merge
    preemptability into rcutree without descending into #include
    hell.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

16 Aug, 2009

2 commits

  • Use the new cpu_notifier() API to simplify RCU's CPU-hotplug
    notifiers, collapsing down to a single such notifier.

    This makes it trivial to provide the notifier-ordering
    guarantee that rcu_barrier() depends on.

    Also remove redundant open_softirq() calls from Hierarchical
    RCU notifier.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: josht@linux.vnet.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: hugh.dickins@tiscali.co.uk
    Cc: benh@kernel.crashing.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • This patch divides the rcutree initialization into boot-time
    and hotplug-time components, so that the tree data structures
    are guaranteed to be fully linked at boot time regardless of
    what might happen in CPU hotplug operations.

    This makes RCU more resilient against CPU hotplug misbehavior
    (and vice versa), but more importantly, does a better job of
    compartmentalizing the code.

    Reported-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: josht@linux.vnet.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: hugh.dickins@tiscali.co.uk
    Cc: benh@kernel.crashing.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

02 Aug, 2009

1 commit

  • When debugging a recent lockup bug i found various deficiencies
    in how our current lockup detection helpers work:

    - SysRq-L is not very efficient as it uses a workqueue, hence
    it cannot punch through hard lockups and cannot see through
    most soft lockups either.

    - The SysRq-L code depends on the NMI watchdog - which is off
    by default.

    - We dont print backtraces from the RCU code's built-in
    'RCU state machine is stuck' debug code. This debug
    code tends to be one of the first (and only) mechanisms
    that show that a lockup has occured.

    This patch changes the code so taht we:

    - Trigger the NMI backtrace code from SysRq-L instead of using
    a workqueue (which cannot punch through hard lockups)

    - Trigger print-all-CPU-backtraces from the RCU lockup detection
    code

    Also decouple the backtrace printing code from the NMI watchdog:

    - Dont use variable size cpumasks (it might not be initialized
    and they are a bit more fragile anyway)

    - Trigger an NMI immediately via an IPI, instead of waiting
    for the NMI tick to occur. This is a lot faster and can
    produce more relevant backtraces. It will also work if the
    NMI watchdog is disabled.

    - Dont print the 'dazed and confused' message when we print
    a backtrace from the NMI

    - Do a show_regs() plus a dump_stack() to get maximum info
    out of the dump. Worst-case we get two stacktraces - which
    is not a big deal. Sometimes, if register content is
    corrupted, the precise stack walker in show_regs() wont
    give us a full backtrace - in this case dump_stack() will
    do it.

    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Linus Torvalds
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

24 Jun, 2009

1 commit

  • Removes the warnings about Hierarchical RCU being experimental,
    given that it has gone through almost six months of being the
    default RCU in mainline for the x86 with very little trouble.

    This makes hierarchical-RCU bootup look less scary.

    Signed-off-by: Paul E. McKenney
    Cc: akpm@linux-foundation.org
    Cc: niv@us.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: dipankar@in.ibm.com
    Cc: dhowells@redhat.com
    Cc: lethal@linux-sh.org
    Cc: kernel@wantstofly.org
    Cc: cl@linux-foundation.org
    Cc: schamp@sgi.com
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

14 Apr, 2009

2 commits

  • Add tracing to __rcu_pending() to provide information on why RCU
    processing was kicked off. This is helpful for debugging hierarchical
    RCU, and might also be helpful in learning how hierarchical RCU operates.

    Located-by: Anton Blanchard
    Tested-by: Anton Blanchard
    Signed-off-by: Paul E. McKenney
    Cc: anton@samba.org
    Cc: akpm@linux-foundation.org
    Cc: dipankar@in.ibm.com
    Cc: manfred@colorfullife.com
    Cc: cl@linux-foundation.org
    Cc: josht@linux.vnet.ibm.com
    Cc: schamp@sgi.com
    Cc: niv@us.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: ego@in.ibm.com
    Cc: laijs@cn.fujitsu.com
    Cc: rostedt@goodmis.org
    Cc: peterz@infradead.org
    Cc: penberg@cs.helsinki.fi
    Cc: andi@firstfloor.org
    Cc: "Paul E. McKenney"
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • This patch fixes a hierarchical-RCU performance bug located by Anton
    Blanchard. The problem stems from a misguided attempt to provide a
    work-around for jiffies-counter failure. This work-around uses a per-CPU
    n_rcu_pending counter, which is incremented on each call to rcu_pending(),
    which in turn is called from each scheduling-clock interrupt. Each CPU
    then treats this counter as a surrogate for the jiffies counter, so
    that if the jiffies counter fails to advance, the per-CPU n_rcu_pending
    counter will cause RCU to invoke force_quiescent_state(), which in turn
    will (among other things) send resched IPIs to CPUs that have thus far
    failed to pass through an RCU quiescent state.

    Unfortunately, each CPU resets only its own counter after sending a
    batch of IPIs. This means that the other CPUs will also (needlessly)
    send -another- round of IPIs, for a full N-squared set of IPIs in the
    worst case every three scheduler-clock ticks until the grace period
    finally ends. It is not reasonable for a given CPU to reset each and
    every n_rcu_pending for all the other CPUs, so this patch instead simply
    disables the jiffies-counter "training wheels", thus eliminating the
    excessive IPIs.

    Note that the jiffies-counter IPIs do not have this problem due to
    the fact that the jiffies counter is global, so that the CPU sending
    the IPIs can easily reset things, thus preventing the other CPUs from
    sending redundant IPIs.

    Note also that the n_rcu_pending counter remains, as it will continue to
    be used for tracing. It may also see use to update the jiffies counter,
    should an appropriate kick-the-jiffies-counter API appear.

    Located-by: Anton Blanchard
    Tested-by: Anton Blanchard
    Signed-off-by: Paul E. McKenney
    Cc: anton@samba.org
    Cc: akpm@linux-foundation.org
    Cc: dipankar@in.ibm.com
    Cc: manfred@colorfullife.com
    Cc: cl@linux-foundation.org
    Cc: josht@linux.vnet.ibm.com
    Cc: schamp@sgi.com
    Cc: niv@us.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: ego@in.ibm.com
    Cc: laijs@cn.fujitsu.com
    Cc: rostedt@goodmis.org
    Cc: peterz@infradead.org
    Cc: penberg@cs.helsinki.fi
    Cc: andi@firstfloor.org
    Cc: "Paul E. McKenney"
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

03 Apr, 2009

2 commits

  • Impact: cleanup

    We want to remove rcutree internals from the public rcutree.h file for
    upcoming kmemtrace changes - but kernel/rcutree_trace.c depends on them.

    Introduce kernel/rcutree.h for internal definitions. (Probably all
    the other data types from include/linux/rcutree.h could be
    moved here too - except rcu_data.)

    Cc: Pekka Enberg
    Cc: Eduard - Gabriel Munteanu
    Cc: paulmck@linux.vnet.ibm.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Impact: build fix for all non-x86 architectures

    We want to remove percpu.h from rcuclassic.h/rcutree.h (for upcoming
    kmemtrace changes) but that would break the DECLARE_PER_CPU based
    declarations in these files.

    Move the quiescent counter management functions to their respective
    RCU implementation .c files - they were slightly above the inlining
    limit anyway.

    Cc: Pekka Enberg
    Cc: Eduard - Gabriel Munteanu
    Cc: paulmck@linux.vnet.ibm.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

26 Feb, 2009

1 commit

  • This patch fixes a bug located by Vegard Nossum with the aid of
    kmemcheck, updated based on review comments from Nick Piggin,
    Ingo Molnar, and Andrew Morton. And cleans up the variable-name
    and function-name language. ;-)

    The boot CPU runs in the context of its idle thread during boot-up.
    During this time, idle_cpu(0) will always return nonzero, which will
    fool Classic and Hierarchical RCU into deciding that a large chunk of
    the boot-up sequence is a big long quiescent state. This in turn causes
    RCU to prematurely end grace periods during this time.

    This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks()
    function to ignore the idle task as a quiescent state until the
    system has started up the scheduler in rest_init(), introducing a
    new non-API function rcu_idle_now_means_idle() to inform RCU of this
    transition. RCU maintains an internal rcu_idle_cpu_truthful variable
    to track this state, which is then used by rcu_check_callback() to
    determine if it should believe idle_cpu().

    Because this patch has the effect of disallowing RCU grace periods
    during long stretches of the boot-up sequence, this patch also introduces
    Josh Triplett's UP-only optimization that makes synchronize_rcu() be a
    no-op if num_online_cpus() returns 1. This allows boot-time code that
    calls synchronize_rcu() to proceed normally. Note, however, that RCU
    callbacks registered by call_rcu() will likely queue up until later in
    the boot sequence. Although rcuclassic and rcutree can also use this
    same optimization after boot completes, rcupreempt must restrict its
    use of this optimization to the portion of the boot sequence before the
    scheduler starts up, given that an rcupreempt RCU read-side critical
    section may be preeempted.

    In addition, this patch takes Nick Piggin's suggestion to make the
    system_state global variable be __read_mostly.

    Changes since v4:

    o Changes the name of the introduced function and variable to
    be less emotional. ;-)

    Changes since v3:

    o WARN_ON(nr_context_switches() > 0) to verify that RCU
    switches out of boot-time mode before the first context
    switch, as suggested by Nick Piggin.

    Changes since v2:

    o Created rcu_blocking_is_gp() internal-to-RCU API that
    determines whether a call to synchronize_rcu() is itself
    a grace period.

    o The definition of rcu_blocking_is_gp() for rcuclassic and
    rcutree checks to see if but a single CPU is online.

    o The definition of rcu_blocking_is_gp() for rcupreempt
    checks to see both if but a single CPU is online and if
    the system is still in early boot.

    This allows rcupreempt to again work correctly if running
    on a single CPU after booting is complete.

    o Added check to rcupreempt's synchronize_sched() for there
    being but one online CPU.

    Tested all three variants both SMP and !SMP, booted fine, passed a short
    rcutorture test on both x86 and Power.

    Located-by: Vegard Nossum
    Tested-by: Vegard Nossum
    Tested-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

14 Jan, 2009

1 commit


05 Jan, 2009

2 commits

  • Impact: fix kernel warnings [and potential crash] during suspend+resume

    Kudos to both Dhaval Giani and Jens Axboe for finding a bug in treercu
    that causes warnings after suspend-resume cycles in Dhaval's case and
    during stress tests in Jens's case. It would also probably cause failures
    if heavily stressed. The solution, ironically enough, is to revert to
    rcupreempt's code for initializing the dynticks state. And the patch
    even results in smaller code -- so what was I thinking???

    This is 2.6.29 material, given that people really do suspend and resume
    Linux these days. ;-)

    Reported-by: Dhaval Giani
    Reported-by: Jens Axboe
    Tested-by: Dhaval Giani
    Tested-by: Jens Axboe
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Impact: fix delays during bootup

    Kudos to Andi Kleen for finding a grace-period-latency problem! The
    problem was that the special-case code for small machines never updated
    the ->signaled field to indicate that grace-period initialization had
    completed, which prevented force_quiescent_state() from ever expediting
    grace periods. This problem resulted in grace periods extending for more
    than 20 seconds. Not subtle. I introduced this bug during my inspection
    process when I fixed a race between grace-period initialization and
    force_quiescent_state() execution.

    The following patch properly updates the ->signaled field for the
    "small"-system case (no more than 32 CPUs for 32-bit kernels and no more
    than 64 CPUs for 64-bit kernels).

    Reported-by: Andi Kleen
    Tested-by: Andi Kleen
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Paul E. McKenney