18 Dec, 2010

1 commit

  • Some recent benchmarks have indicated possible lock contention on the
    leaf-level rcu_node locks. This commit therefore limits the number of
    CPUs per leaf-level rcu_node structure to 16; in other words, there
    can be at most 16 rcu_data structures fanning into a given rcu_node
    structure. Prior to this, the limit was 32 on 32-bit systems and 64 on
    64-bit systems.

    Note that the fanout of non-leaf rcu_node structures is unchanged. The
    organization of accesses to the rcu_node tree is such that references
    to non-leaf rcu_node structures are much less frequent than to the
    leaf structures.
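
    As a rough illustration of the effect on tree shape, the sketch below
    counts the leaf rcu_node structures needed for a given number of CPUs.
    The helper leaf_nodes() is hypothetical and exists only for this
    example; DIV_ROUND_UP() is redefined locally so the fragment stands
    alone.

```c
#include <assert.h>

/* Local copy of a rounding-up division macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: number of leaf rcu_node structures needed
 * when at most leaf_fanout CPUs may share one leaf.
 */
static unsigned int leaf_nodes(unsigned int nr_cpus, unsigned int leaf_fanout)
{
	return DIV_ROUND_UP(nr_cpus, leaf_fanout);
}
```

    On a 64-CPU system, leaf_nodes(64, 64) is 1 while leaf_nodes(64, 16)
    is 4, so the same CPUs are spread over four leaf locks instead of one.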

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

30 Nov, 2010

1 commit

  • When we handle the CPU_DYING notifier, the whole system is stopped except
    for the current CPU. We therefore need no synchronization with the other
    CPUs. This allows us to move any orphaned RCU callbacks directly to the
    list of any online CPU without needing to run them through the global
    orphan lists. These global orphan lists can therefore be dispensed with.
    This commit makes these changes, though it currently victimizes CPU 0 @@@.
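
    The direct hand-off can be pictured with a simplified tail-pointer
    list. This is only a sketch: struct cb, struct cblist, and the
    cblist_*() helpers are hypothetical names, not the kernel's data
    structures, and no locking appears because the other CPUs are assumed
    to be stopped.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified callback and per-CPU callback list (hypothetical names). */
struct cb {
	struct cb *next;
};

struct cblist {
	struct cb *head;	/* first callback, or NULL if empty */
	struct cb **tail;	/* points at the final ->next (or at head) */
	long qlen;		/* number of callbacks queued */
};

static void cblist_init(struct cblist *l)
{
	l->head = NULL;
	l->tail = &l->head;
	l->qlen = 0;
}

static void cblist_enqueue(struct cblist *l, struct cb *c)
{
	c->next = NULL;
	*l->tail = c;
	l->tail = &c->next;
	l->qlen++;
}

/*
 * Append everything on the dying CPU's list to the receiving CPU's
 * list, then leave the dying CPU's list empty.
 */
static void cblist_adopt(struct cblist *dying, struct cblist *receiver)
{
	if (dying->head == NULL)
		return;
	*receiver->tail = dying->head;
	receiver->tail = dying->tail;
	receiver->qlen += dying->qlen;
	cblist_init(dying);
}
```

    Because the splice only rewrites two pointers and a count, its cost
    does not depend on how many callbacks were orphaned.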

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     

24 Sep, 2010

1 commit

  • The current tracing data is not sufficient to deduce the average time
    that a callback spends waiting for a grace period to end. Add three
    per-CPU counters recording the number of callbacks invoked (ci), the
    number of callbacks orphaned (co), and the number of callbacks adopted
    (ca). Given the existing callback queue length (ql), the average wait
    time in the absence of CPU hotplug operations is ql/ci. The units of wait
    time will be in terms of the duration over which ci was measured.

    In the presence of CPU hotplug operations, there is room for argument,
    but ql/(ci-co+ca) won't steer you too far wrong.
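
    The two formulas can be folded into one helper, since the hotplug
    form reduces to ql/ci when co and ca are both zero. The struct and
    function names below are hypothetical, and the -1.0 return for a
    non-positive denominator is a choice made for this sketch only.

```c
#include <assert.h>

/* Snapshot of the per-CPU counters named in the text above. */
struct cb_counts {
	long ql;	/* current callback queue length */
	long ci;	/* callbacks invoked during the interval */
	long co;	/* callbacks orphaned during the interval */
	long ca;	/* callbacks adopted during the interval */
};

/*
 * Hypothetical helper: estimated average wait, in units of the
 * measurement interval. Returns -1.0 if the denominator is not
 * positive, in which case no estimate is available.
 */
static double avg_cb_wait(const struct cb_counts *c)
{
	long denom = c->ci - c->co + c->ca;

	if (denom <= 0)
		return -1.0;
	return (double)c->ql / (double)denom;
}
```

    With ql = 500 and ci = 1000 sampled over one second, the estimate is
    0.5, that is, about half a second of waiting per callback.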

    Also fixes a typo called out by Lucas De Marchi.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

21 Aug, 2010

2 commits

  • Combine the duplicate definitions of ULONG_CMP_GE(), ULONG_CMP_LT(),
    and rcu_preempt_depth() into include/linux/rcupdate.h.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • When using a kernel debugger, a long sojourn in the debugger can get
    you lots of RCU CPU stall warnings once you resume. This might not be
    helpful, especially if you are using the system console. This patch
    therefore allows RCU CPU stall warnings to be suppressed, but only for
    the duration of the current set of grace periods.

    This differs from Jason's original patch in that it adds support for
    tiny RCU and preemptible RCU, and uses a slightly different method for
    suppressing the RCU CPU stall warning messages.

    Signed-off-by: Jason Wessel
    Signed-off-by: Paul E. McKenney
    Tested-by: Jason Wessel

    Paul E. McKenney
     

20 Aug, 2010

3 commits

  • Currently, if RCU CPU stall warnings are enabled, they are enabled
    immediately upon boot. They can be manually disabled via /sys (and
    also re-enabled via /sys), and are automatically disabled upon panic.
    However, some users need RCU CPU stalls to be disabled at boot time,
    but to be enabled without rebuilding/rebooting. For example, someone
    running a real-time application in production might not want the
    additional latency of RCU CPU stall detection in normal operation, but
    might need to enable it at any point for fault isolation purposes.

    This commit therefore provides a new CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE
    kernel configuration parameter that maintains the current behavior
    (enable at boot) by default, but allows a kernel to be configured
    with RCU CPU stall detection built into the kernel, but disabled at
    boot time.

    Requested-by: Clark Williams
    Requested-by: John Kacur
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Also set the default to 60 seconds, up from the previous hard-coded timeout
    of 10 seconds. This allows people who care to set short timeouts, while
    keeping people with unusual configurations (make randconfig!!!) from being
    bothered by spurious CPU stall warnings.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • &percpu_data is compatible with allocated percpu data.

    We therefore use it and remove the "->rda[NR_CPUS]" array, saving significant
    storage on systems with large numbers of CPUs. This does add an additional
    level of indirection and thus an additional cache line referenced, but
    because ->rda is not used on the read side, this is OK.

    Signed-off-by: Lai Jiangshan
    Reviewed-by: Tejun Heo
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Lai Jiangshan
     

11 May, 2010

2 commits

  • Lai Jiangshan noted that up to 10% of RCU_SOFTIRQ invocations are spurious, and
    traced this down to the fact that the current grace-period machinery
    will uselessly raise RCU_SOFTIRQ when a given CPU needs to go through
    a quiescent state, but has not yet done so. In this situation, there
    might well be nothing that RCU_SOFTIRQ can do, and the overhead can be
    worth worrying about in the ksoftirqd case. This patch therefore avoids
    raising RCU_SOFTIRQ in this situation.

    Changes since v1 (http://lkml.org/lkml/2010/3/30/122 from Lai Jiangshan):

    o Omit the rcu_qs_pending() prechecks, as they aren't that
    much less expensive than the quiescent-state checks.

    o Merge with the set_need_resched() patch that reduces IPIs.

    o Add the new n_rp_report_qs field to the rcu_pending tracing output.

    o Update the tracing documentation accordingly.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The existing RCU CPU stall-warning messages can be confusing, especially
    in the case where one CPU detects a single other stalled CPU. In addition,
    the console messages do not say which flavor of RCU detected the stall,
    which can make it difficult to work out exactly what is causing the stall.
    This commit improves these messages.

    Requested-by: Dhaval Giani
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

11 Mar, 2010

1 commit

  • CONFIG_PROVE_RCU imposes additional overhead on the kernel, so
    increase the RCU CPU stall timeouts in an attempt to allow for
    this effect.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

27 Feb, 2010

1 commit

  • It is invalid to invoke __rcu_process_callbacks() with irqs
    disabled, so do it indirectly via raise_softirq(). This
    requires a state-machine implementation to cycle through the
    grace-period machinery the required number of times.

    Located-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

25 Feb, 2010

3 commits

  • When RCU detects a grace-period stall, it currently just prints
    out the PID of any tasks doing the stalling. This patch adds
    RCU_CPU_STALL_VERBOSE, which enables the more-verbose reporting
    from sched_show_task().

    Suggested-by: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • The spinlocks in rcutree need to be real spinlocks in
    preempt-rt. Convert them to raw_spinlocks.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • The C standard does not specify the result of an operation that
    overflows a signed integer, so such operations need to be
    avoided. This patch changes the type of several fields from
    "long" to "unsigned long" and adjusts operations as needed.
    ULONG_CMP_GE() and ULONG_CMP_LT() macros are introduced to do
    the modular comparisons that are appropriate given that overflow
    is an expected event.
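
    A sketch of how such modular comparisons can be written in portable
    C follows. The macro bodies are one reasonable formulation that
    relies only on well-defined unsigned wraparound; counter_reached() is
    a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <limits.h>

/*
 * Modular comparisons for counters that are expected to wrap.
 * "a >= b" holds when a is at most ULONG_MAX / 2 ahead of b.
 */
#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))

/* Hypothetical helper: has the counter reached the target yet? */
static int counter_reached(unsigned long counter, unsigned long target)
{
	return ULONG_CMP_GE(counter, target);
}
```

    A counter that has wrapped around to 2 still compares as "at or
    beyond" a target of ULONG_MAX - 1, which a plain >= would get wrong.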

    Acked-by: Mathieu Desnoyers
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

16 Jan, 2010

1 commit

  • Rename local variable "i" in rcu_init() to avoid conflict with
    RCU_INIT_FLAVOR(), restrict the scope of RCU_TREE_NONCORE, and
    make __synchronize_srcu() static.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

13 Jan, 2010

4 commits

  • Grace periods cannot be started while force_quiescent_state() is
    active. This is OK in that the affected CPUs will try again
    later, but it does induce needless grace-period delays. This
    patch causes rcu_start_gp() to record a failed attempt to start
    a grace period. When force_quiescent_state() prepares to return,
    it then starts the grace period if there was such a failed
    attempt.
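
    The record-and-retry logic can be sketched as follows. This is a
    single-threaded model with all locking omitted; struct gp_state, its
    field names, and the helper functions are assumptions made for the
    example rather than the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the state involved (names assumed). */
struct gp_state {
	unsigned long gpnum;	/* number of the most recently started GP */
	bool fqs_active;	/* force_quiescent_state() is running */
	bool fqs_need_gp;	/* a GP start was refused in the meantime */
};

/* Try to start a grace period; record the attempt if it must wait. */
static bool start_gp(struct gp_state *s)
{
	if (s->fqs_active) {
		s->fqs_need_gp = true;
		return false;
	}
	s->fqs_need_gp = false;
	s->gpnum++;
	return true;
}

static void fqs_begin(struct gp_state *s)
{
	s->fqs_active = true;
}

/* On the way out, start any grace period that was refused earlier. */
static void fqs_end(struct gp_state *s)
{
	s->fqs_active = false;
	if (s->fqs_need_gp)
		start_gp(s);
}
```

    The refused request is thus honored as soon as forcing finishes,
    rather than waiting for the requesting CPU to try again later.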

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • The comparisons of rsp->gpnum and rsp->completed in
    rcu_process_dyntick() and force_quiescent_state() can be
    replaced by the much clearer rcu_gp_in_progress() predicate
    function. After doing this, it becomes clear that the
    RCU_SAVE_COMPLETED leg of the force_quiescent_state() function's
    switch statement is almost completely a no-op. A small change
    to the RCU_SAVE_DYNTICK leg renders it a complete no-op, after
    which it can be removed. Doing so also eliminates the forcenow
    local variable from force_quiescent_state().
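
    A minimal sketch of such a predicate is shown below. The struct is a
    cut-down stand-in holding only the two counters, the function name
    gp_in_progress() is hypothetical, and the volatile qualifiers merely
    approximate whatever access discipline the real code applies.

```c
#include <assert.h>
#include <stdbool.h>

/* Only the two counters the predicate needs. */
struct rcu_state_sketch {
	volatile unsigned long completed;	/* last completed GP */
	volatile unsigned long gpnum;		/* most recently started GP */
};

/* A grace period is in progress when one has started but not completed. */
static bool gp_in_progress(const struct rcu_state_sketch *rsp)
{
	return rsp->completed != rsp->gpnum;
}
```

    Naming the comparison makes each call site state its intent directly
    instead of repeating the two-counter test.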

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Because rsp->fqs_active is set to 1 across
    force_quiescent_state()'s switch statement, rcu_start_gp() will
    refrain from starting a new grace period during this time.
    Therefore, rsp->gpnum is constant, and can be propagated to all
    uses of lastcomp, eliminating this local variable.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Reduce the number and variety of race conditions by prohibiting
    the start of a new grace period while force_quiescent_state() is
    active. A new fqs_active flag in the rcu_state structure is used
    to track whether or not force_quiescent_state() is active, and
    this new flag is tested by rcu_start_gp(). If the CPU that
    closed out the last grace period needs another grace period,
    this new grace period may be delayed up to one scheduling-clock
    tick, but it will eventually get started.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

03 Dec, 2009

3 commits

  • Implement a synchronize_rcu_expedited() for preemptible RCU
    that actually is expedited. This uses
    synchronize_sched_expedited() to force all threads currently
    running in a preemptible-RCU read-side critical section onto the
    appropriate ->blocked_tasks[] list, then takes a snapshot of all
    of these lists and waits for them to drain.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Enable a fourth level of rcu_node hierarchy for TREE_RCU and
    TREE_PREEMPT_RCU. This is for stress-testing and experimental
    purposes only, although in theory this would enable 16,777,216
    CPUs on 64-bit systems, though only 1,048,576 CPUs on 32-bit
    systems. Experimental use of this fourth level will normally
    set CONFIG_RCU_FANOUT=2, requiring a 16-CPU system, though the
    more adventurous (and more fortunate) experimenters may wish to
    choose CONFIG_RCU_FANOUT=3 for 81-CPU systems or even
    CONFIG_RCU_FANOUT=4 for 256-CPU systems.
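
    The CPU counts quoted above are simply the fanout raised to the
    number of levels. The helper max_cpus() below is hypothetical and
    assumes the same fanout at every level of the tree.

```c
#include <assert.h>

/*
 * Hypothetical helper: maximum CPUs covered by an rcu_node tree with
 * the given number of levels and the same fanout at every level.
 */
static unsigned long max_cpus(unsigned long fanout, unsigned int levels)
{
	unsigned long n = 1;

	while (levels--)
		n *= fanout;
	return n;
}
```

    For four levels, fanouts of 64 and 32 give 16,777,216 and 1,048,576
    CPUs respectively, while fanouts of 2, 3, and 4 give 16, 81, and 256.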

    Signed-off-by: Paul E. McKenney
    Acked-by: Josh Triplett
    Acked-by: Lai Jiangshan
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • The number of "quiet" functions has grown recently, and the
    names are no longer very descriptive. The point of all of these
    functions is to do some portion of the task of reporting a
    quiescent state, so rename them accordingly:

    o cpu_quiet() becomes rcu_report_qs_rdp(), which reports a
    quiescent state to the per-CPU rcu_data structure. If this
    turns out to be a new quiescent state for this grace period,
    then rcu_report_qs_rnp() will be invoked to propagate the
    quiescent state up the rcu_node hierarchy.

    o cpu_quiet_msk() becomes rcu_report_qs_rnp(), which reports
    a quiescent state for a given CPU (or possibly a set of CPUs)
    up the rcu_node hierarchy.

    o cpu_quiet_msk_finish() becomes rcu_report_qs_rsp(), which
    reports a full set of quiescent states to the global rcu_state
    structure.

    o task_quiet() becomes rcu_report_unblock_qs_rnp(), which reports
    a quiescent state due to a task exiting an RCU read-side critical
    section that had previously blocked in that same critical section.
    As indicated by the new name, this type of quiescent state is
    reported up the rcu_node hierarchy (using rcu_report_qs_rnp()
    to do so).

    Signed-off-by: Paul E. McKenney
    Acked-by: Josh Triplett
    Acked-by: Lai Jiangshan
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

23 Nov, 2009

1 commit

  • When the last CPU of a given leaf rcu_node structure goes
    offline, all of the tasks queued on that leaf rcu_node structure
    (due to having blocked in their current RCU read-side critical
    sections) are requeued onto the root rcu_node structure. This
    requeuing is carried out by rcu_preempt_offline_tasks().
    However, it is possible that these queued tasks are the only
    thing preventing the leaf rcu_node structure from reporting a
    quiescent state up the rcu_node hierarchy. Unfortunately, the
    old code would fail to do this reporting, resulting in a
    grace-period stall given the following sequence of events:

    1. Kernel built for more than 32 CPUs on 32-bit systems or for more
    than 64 CPUs on 64-bit systems, so that there is more than one
    rcu_node structure. (Or CONFIG_RCU_FANOUT is artificially set
    to a number smaller than CONFIG_NR_CPUS.)

    2. The kernel is built with CONFIG_TREE_PREEMPT_RCU.

    3. A task running on a CPU associated with a given leaf rcu_node
    structure blocks while in an RCU read-side critical section
    -and- that CPU has not yet passed through a quiescent state
    for the current RCU grace period. This will cause the task
    to be queued on the leaf rcu_node's blocked_tasks[] array, in
    particular, on the element of this array corresponding to the
    current grace period.

    4. Each of the remaining CPUs corresponding to this same leaf rcu_node
    structure passes through a quiescent state. However, the task is
    still in its RCU read-side critical section, so these quiescent
    states cannot be reported further up the rcu_node hierarchy.
    Nevertheless, all bits in the leaf rcu_node structure's ->qsmask
    field are now zero.

    5. Each of the remaining CPUs goes offline. (The events in steps
    4 and 5 can happen in any order as long as each CPU passes
    through a quiescent state before going offline.)

    6. When the last CPU goes offline, __rcu_offline_cpu() will invoke
    rcu_preempt_offline_tasks(), which will move the task to the
    root rcu_node structure, but without reporting a quiescent state
    up the rcu_node hierarchy (and this failure to report a quiescent
    state is the bug).

    But because this leaf rcu_node structure's ->qsmask field is
    already zero and its ->blocked_tasks[] entries are all empty,
    force_quiescent_state() will skip this rcu_node structure.

    Therefore, grace periods are now hung.

    This patch abstracts some code out of rcu_read_unlock_special(),
    calling the result task_quiet() by analogy with cpu_quiet(), and
    invokes task_quiet() from both rcu_read_unlock_special() and
    __rcu_offline_cpu(). Invoking task_quiet() from
    __rcu_offline_cpu() reports the quiescent state up the rcu_node
    hierarchy, fixing the bug. This ends up requiring a separate
    lock_class_key per level of the rcu_node hierarchy, which this
    patch also provides.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

11 Nov, 2009

2 commits

  • This field is used whether or not CONFIG_NO_HZ is set, so the
    old name of ->dynticks_completed is quite misleading.

    Change to ->completed_fqs, given that it is the value that
    force_quiescent_state() is trying to drive the ->completed field
    away from.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Some variants of gcc are reputed to dislike forward references
    to functions declared "inline". Remove the "inline" keyword
    from such functions.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

10 Nov, 2009

2 commits

  • Impose a clear locking design on the rcu_process_gp_end()
    function's use of the ->completed counter. This is done by
    creating a ->completed field in the rcu_node structure, which
    can safely be accessed under the protection of that structure's
    lock. Performance and scalability are maintained by using a
    form of double-checked locking, so that rcu_process_gp_end()
    only acquires the leaf rcu_node structure's ->lock if a grace
    period has recently ended.

    This fix reduces the rcutorture failure rate by at least two orders
    of magnitude under heavy stress with force_quiescent_state()
    being invoked artificially often. Without this fix,
    unsynchronized access to the ->completed field can cause
    rcu_process_gp_end() to advance callbacks whose grace period has
    not yet expired. (Bad idea!)
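
    The double-checked pattern described above can be sketched in
    userspace C11 as follows. Everything here is an assumption made for
    illustration: the struct and field names, the toy spinlock built on
    atomic_flag, and the lock_acquisitions counter, which exists only so
    that the fast path can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-ins for rcu_node and rcu_data; field names are assumptions. */
struct node_sketch {
	atomic_flag lock;		/* toy spinlock guarding ->completed */
	_Atomic unsigned long completed;
};

struct data_sketch {
	unsigned long completed;	 /* this CPU's view of ->completed */
	unsigned long lock_acquisitions; /* instrumentation for the example */
};

static void node_lock(struct node_sketch *rnp)
{
	while (atomic_flag_test_and_set(&rnp->lock))
		;	/* spin until the flag was previously clear */
}

static void node_unlock(struct node_sketch *rnp)
{
	atomic_flag_clear(&rnp->lock);
}

/* Returns true if this CPU noticed a newly ended grace period. */
static bool process_gp_end(struct node_sketch *rnp, struct data_sketch *rdp)
{
	bool advanced = false;

	/* First check, without the lock: the common case exits here. */
	if (rdp->completed == atomic_load(&rnp->completed))
		return false;

	/* Second check, under the lock, before acting on the value. */
	node_lock(rnp);
	rdp->lock_acquisitions++;
	if (rdp->completed != atomic_load(&rnp->completed)) {
		rdp->completed = atomic_load(&rnp->completed);
		advanced = true;	/* callbacks would be advanced here */
	}
	node_unlock(rnp);
	return advanced;
}
```

    The unlocked comparison keeps the lock off the fast path, while the
    locked comparison ensures that callbacks are advanced only against a
    value read under the lock.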

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: # .32.x
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Impose a clear locking design on non-NO_HZ handling of the
    ->completed counter. This increases the distance between the
    RCU and the CPU-hotplug mechanisms.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: # .32.x
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

02 Nov, 2009

1 commit

  • Very long RCU read-side critical sections (50 milliseconds or
    so) can cause a race between force_quiescent_state() and
    rcu_start_gp() as follows on kernel builds with multi-level
    rcu_node hierarchies:

    1. CPU 0 calls force_quiescent_state(), sees that there is a
    grace period in progress, and acquires ->fqslock.

    2. CPU 1 detects the end of the grace period, and so
    cpu_quiet_msk_finish() sets rsp->completed to rsp->gpnum.
    This operation is carried out under the root rnp->lock,
    but CPU 0 has not yet acquired that lock. Note that
    rsp->signaled is still RCU_SAVE_DYNTICK from the last
    grace period.

    3. CPU 1 calls rcu_start_gp(), but no one wants a new grace
    period, so it drops the root rnp->lock and returns.

    4. CPU 0 acquires the root rnp->lock and picks up rsp->completed
    and rsp->signaled, then drops rnp->lock. It then enters the
    RCU_SAVE_DYNTICK leg of the switch statement.

    5. CPU 2 invokes call_rcu(), and now needs a new grace period.
    It calls rcu_start_gp(), which acquires the root rnp->lock, sets
    rsp->signaled to RCU_GP_INIT (too bad that CPU 0 is already in
    the RCU_SAVE_DYNTICK leg of the switch statement!) and starts
    initializing the rcu_node hierarchy. If there are multiple
    levels to the hierarchy, it will drop the root rnp->lock and
    initialize the lower levels of the hierarchy.

    6. CPU 0 notes that rsp->completed has not changed, which permits
    both CPU 2 and CPU 0 to try updating rsp->signaled concurrently. If
    CPU 0's update prevails, later calls to force_quiescent_state() can
    count old quiescent states against the new grace period, which
    can in turn result in premature ending of grace periods.

    Not good.

    This patch adds an RCU_GP_IDLE state for rsp->signaled that is
    set initially at boot time and any time a grace period ends.
    This prevents CPU 0 from getting into the workings of
    force_quiescent_state() in step 4. Additional locking and
    checks prevent the concurrent update of rsp->signaled in step 6.
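
    The role of the idle state can be sketched as a small state check.
    The enumerator names follow the text above, but their numeric values,
    the struct, and the two helper functions are assumptions made for
    this example, and the additional locking is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* States for ->signaled; the numeric values here are assumptions. */
enum fqs_state {
	RCU_GP_IDLE,		/* no grace period in progress */
	RCU_GP_INIT,		/* grace period being initialized */
	RCU_SAVE_DYNTICK,	/* need to snapshot dyntick state */
	RCU_FORCE_QS,		/* need to force quiescent states */
};

struct state_sketch {
	enum fqs_state signaled;
};

/* Called when a grace period ends: park the state machine. */
static void gp_end(struct state_sketch *rsp)
{
	rsp->signaled = RCU_GP_IDLE;
}

/* Decide whether force_quiescent_state() has any work to do. */
static bool fqs_should_run(const struct state_sketch *rsp)
{
	return rsp->signaled != RCU_GP_IDLE &&
	       rsp->signaled != RCU_GP_INIT;
}
```

    With the state parked at RCU_GP_IDLE when the grace period ends in
    step 2, CPU 0 in step 4 would see no work and return instead of
    entering the RCU_SAVE_DYNTICK leg with stale state.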

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

16 Oct, 2009

1 commit

  • If the following sequence of events occurs, then
    TREE_PREEMPT_RCU will hang waiting for a grace period to
    complete, eventually OOMing the system:

    o A TREE_PREEMPT_RCU build of the kernel is booted on a system
    with more than 64 physical CPUs present (32 on a 32-bit system).
    Alternatively, a TREE_PREEMPT_RCU build of the kernel is booted
    with RCU_FANOUT set to a sufficiently small value that the
    physical CPUs populate two or more leaf rcu_node structures.

    o A task is preempted in an RCU read-side critical section
    while running on a CPU corresponding to a given leaf rcu_node
    structure.

    o All CPUs corresponding to this same leaf rcu_node structure
    record quiescent states for the current grace period.

    o All of these same CPUs go offline (hence the need for enough
    physical CPUs to populate more than one leaf rcu_node structure).
    This causes the preempted task to be moved to the root rcu_node
    structure.

    At this point, there is nothing left to cause the quiescent
    state to be propagated up the rcu_node tree, so the current
    grace period never completes.

    The simplest fix, especially after considering the deadlock
    possibilities, is to detect this situation when the last CPU is
    offlined, and to set that CPU's ->qsmask bit in its leaf
    rcu_node structure. This will cause the next invocation of
    force_quiescent_state() to end the grace period.

    Without this fix, this hang can be triggered in an hour or so on
    some machines with rcutorture and random CPU onlining/offlining.
    With this fix, these same machines pass a full 10 hours of this
    sort of abuse.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

15 Oct, 2009

1 commit

  • As the number of callbacks on a given CPU rises, invoke
    force_quiescent_state() only every blimit number of callbacks
    (defaults to 10,000), and even then only if no other CPU has
    invoked force_quiescent_state() in the meantime.

    This should fix the performance regression reported by Nick.
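
    The throttling rule can be sketched as below. The struct and field
    names and the enqueue_and_maybe_force() helper are hypothetical, the
    threshold is passed in as a parameter rather than read from a module
    parameter, and incrementing n_force_qs stands in for the real call
    to force_quiescent_state().

```c
#include <assert.h>

struct fqs_global {
	unsigned long n_force_qs;	/* forcing attempts, all CPUs */
};

struct fqs_percpu {
	long qlen;			/* callbacks queued on this CPU */
	long qlen_last_check;		/* qlen at the previous check */
	unsigned long n_force_qs_snap;	/* global count seen at that check */
};

/*
 * Account for one newly queued callback. Returns 1 if this call
 * forced quiescent states, 0 otherwise.
 */
static int enqueue_and_maybe_force(struct fqs_global *g,
				   struct fqs_percpu *p, long threshold)
{
	int forced = 0;

	/* Check only once per "threshold" additional callbacks. */
	if (++p->qlen <= p->qlen_last_check + threshold)
		return 0;

	/* Force only if no other CPU has done so since our last check. */
	if (g->n_force_qs == p->n_force_qs_snap) {
		g->n_force_qs++;	/* stands in for the real forcing */
		forced = 1;
	}
	p->n_force_qs_snap = g->n_force_qs;
	p->qlen_last_check = p->qlen;
	return forced;
}
```

    Both conditions bound the forcing rate: one check per threshold
    callbacks per CPU, and no forcing at all if another CPU got there
    first.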

    Reported-by: Nick Piggin
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: jens.axboe@oracle.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

07 Oct, 2009

1 commit

  • The current interaction between RCU and CPU hotplug requires that
    RCU block in CPU notifiers waiting for callbacks to drain.

    This can be greatly simplified by having each CPU relinquish its
    own callbacks, and for both _rcu_barrier() and CPU_DEAD notifiers
    to adopt all callbacks that were previously relinquished.

    This change also eliminates the possibility of certain types of
    hangs due to the previous practice of waiting for callbacks to be
    invoked from within CPU notifiers. If you don't ever wait, you
    cannot hang.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

06 Oct, 2009

1 commit

  • These issues were identified during an old-fashioned face-to-face code
    review extending over many hours. This group improves an existing
    abstraction and introduces two new ones. It also fixes an RCU
    stall-warning bug found while making the other changes.

    o Make RCU_INIT_FLAVOR() declare its own variables, removing
    the need to declare them at each call site.

    o Create an rcu_for_each_leaf() macro that scans the leaf
    nodes of the rcu_node tree.

    o Create an rcu_for_each_node_breadth_first() macro that does
    a breadth-first traversal of the rcu_node tree, AKA
    stepping through the array in index-number order.

    o If all CPUs corresponding to a given leaf rcu_node
    structure go offline, then any tasks queued on that leaf
    will be moved to the root rcu_node structure. Therefore,
    the stall-warning code must dump out tasks queued on the
    root rcu_node structure as well as those queued on the leaf
    rcu_node structures.
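
    The two traversal macros can be sketched over a tree stored level by
    level in a flat array, which is what makes breadth-first order the
    same as index order. The macro and struct names, the fixed five-node
    tree, and count_leaves() are all hypothetical simplifications.

```c
#include <assert.h>

#define NUM_NODES	5	/* example tree: 1 root + 4 leaves */
#define FIRST_LEAF	1	/* index of the first leaf in node[] */

struct node {
	int level;		/* 0 for the root, 1 for leaves */
};

struct tree {
	/* Nodes stored level by level, root first. */
	struct node node[NUM_NODES];
};

/* Breadth-first traversal is simply array order. */
#define for_each_node_breadth_first(t, n) \
	for ((n) = &(t)->node[0]; (n) < &(t)->node[NUM_NODES]; (n)++)

/* Leaves occupy the tail of the array. */
#define for_each_leaf(t, n) \
	for ((n) = &(t)->node[FIRST_LEAF]; (n) < &(t)->node[NUM_NODES]; (n)++)

/* Hypothetical helper: count leaves by walking them. */
static int count_leaves(struct tree *t)
{
	struct node *n;
	int count = 0;

	for_each_leaf(t, n)
		count++;
	return count;
}
```

    Hiding the pointer arithmetic behind macros keeps each call site to
    a single line and makes the intended traversal explicit.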

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

24 Sep, 2009

3 commits

  • Move declarations and update storage classes to make checkpatch happy.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • These issues were identified during an old-fashioned face-to-face code
    review extending over many hours.

    o Add comments for tricky parts of code, and correct comments
    that have passed their sell-by date.

    o Get rid of the vestiges of rcu_init_sched(), which is no
    longer needed now that PREEMPT_RCU is gone.

    o Move the #include of rcutree_plugin.h to the end of
    rcutree.c, which means that, rather than having a random
    collection of forward declarations, the new set of forward
    declarations documents the set of plugins. The new home for
    this #include also allows __rcu_init_preempt() to move into
    rcutree_plugin.h.

    o Fix rcu_preempt_check_callbacks() to be static.

    Suggested-by: Josh Triplett
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Peter Zijlstra

    Paul E. McKenney
     
  • These issues were identified during an old-fashioned face-to-face code
    review extending over many hours.

    o Bury various forms of the "rsp->completed == rsp->gpnum"
    comparison into an rcu_gp_in_progress() function, which has
    the beneficial side-effect of forcing consistent use of
    ACCESS_ONCE().

    o Replace hand-coded arithmetic with DIV_ROUND_UP().

    o Bury several "!list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x01])"
    instances into an rcu_preempted_readers() function, as this
    expression indicates that there are readers blocked within
    RCU read-side critical sections that are blocking the current
    grace period. (Though there might well be similar readers
    blocking the next grace period.)

    o Remove an rcu_restart_cpu() declaration that has been
    dangling for almost 20 minor releases of the kernel.
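
    The helpers named in this list can be sketched in user space as
    follows. This is an illustration under stated assumptions, not the
    kernel source: the toy_* structures are hypothetical, per-list
    counters stand in for the kernel's blocked_tasks list heads, and
    TOY_ACCESS_ONCE() is a simplified stand-in for ACCESS_ONCE() that
    relies on the GCC/Clang __typeof__ extension.

```c
#include <assert.h>
#include <stdbool.h>

/* Round-up integer division, replacing hand-coded "(n + d - 1) / d". */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Simplified stand-in for ACCESS_ONCE(): force a single volatile load. */
#define TOY_ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct toy_rcu_state {
	long gpnum;	/* most recently started grace period */
	long completed;	/* most recently completed grace period */
};

/*
 * A grace period is in progress whenever the started and completed
 * numbers differ.  Wrapping the comparison in one function means every
 * caller gets the same once-only loads.
 */
static inline bool toy_gp_in_progress(struct toy_rcu_state *rsp)
{
	return TOY_ACCESS_ONCE(rsp->completed) != TOY_ACCESS_ONCE(rsp->gpnum);
}

struct toy_rcu_node {
	long gpnum;
	int nblocked[2];	/* counters standing in for two task lists */
};

/*
 * Nonzero if readers are queued on the slot selected by the low-order
 * bit of the current grace-period number, that is, if blocked readers
 * are holding up the current grace period.
 */
static inline bool toy_preempted_readers(struct toy_rcu_node *rnp)
{
	return rnp->nblocked[rnp->gpnum & 0x01] != 0;
}
```

    For example, DIV_ROUND_UP(33, 16) evaluates to 3, the number of
    16-CPU groups needed to cover 33 CPUs.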

    Signed-off-by: Paul E. McKenney
    Acked-by: Peter Zijlstra
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

19 Sep, 2009

1 commit

  • Fix a number of whitespace errors in the include/linux/rcu*
    and the kernel/rcu* files.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    LKML-Reference:
    [ did more checkpatch fixlets ]
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

29 Aug, 2009

1 commit


23 Aug, 2009

2 commits

  • Create a kernel/rcutree_plugin.h file that contains definitions
    for preemptable RCU (or, under the #else branch of the #ifdef,
    empty definitions for the classic non-preemptable semantics).
    These definitions fit into plugins defined in kernel/rcutree.c
    for this purpose.

    This variant of preemptable RCU uses a new algorithm whose
    read-side expense is roughly that of classic hierarchical RCU
    under CONFIG_PREEMPT. This new algorithm's update-side expense
    is similar to that of classic hierarchical RCU, and, in the absence
    of read-side preemption or blocking, is exactly that of classic
    hierarchical RCU. Perhaps more important, this new algorithm
    has a much simpler implementation, saving well over 1,000 lines
    of code compared to mainline's implementation of preemptable
    RCU, which will hopefully be retired in favor of this new
    algorithm.

    The simplifications are obtained by maintaining per-task
    nesting state for running tasks, and using a simple
    lock-protected algorithm to handle accounting when tasks block
    within RCU read-side critical sections, making use of lessons
    learned while creating numerous user-level RCU implementations
    over the past 18 months.
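
    The approach described above (per-task nesting state, plus accounting
    only when a task blocks inside a read-side critical section) can be
    sketched as a single-threaded toy. Everything here is an assumption
    made for illustration: the toy_* names, the use of counters instead
    of task lists, and the rule for choosing which slot a blocked task is
    charged to. The real code would do the slow-path updates under a
    lock, which the comments mark but the toy omits.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_task {
	int nesting;	/* per-task read-side nesting depth */
	bool queued;	/* did the task block inside a critical section? */
	int queued_idx;	/* which slot the task was charged to */
};

struct toy_node {
	long gpnum;		/* current grace-period number */
	int nblocked[2];	/* blocked readers, indexed by gpnum & 1 */
};

/* Fast path: entering a critical section touches only per-task state. */
static inline void toy_read_lock(struct toy_task *t)
{
	t->nesting++;
}

/* Slow path: the task is blocking while inside a critical section. */
static inline void toy_context_switch(struct toy_task *t,
				      struct toy_node *rnp)
{
	if (t->nesting > 0 && !t->queued) {
		/* Real code would hold the node's lock here. */
		t->queued_idx = rnp->gpnum & 0x01;
		rnp->nblocked[t->queued_idx]++;
		t->queued = true;
	}
}

/* Shared state is touched only if the task was queued while blocked. */
static inline void toy_read_unlock(struct toy_task *t,
				   struct toy_node *rnp)
{
	assert(t->nesting > 0);
	if (--t->nesting == 0 && t->queued) {
		/* Real code would hold the node's lock here. */
		rnp->nblocked[t->queued_idx]--;
		t->queued = false;
	}
}
```

    A task that never blocks only increments and decrements its own
    counter, which is how the sketch mirrors the claim that the cost
    matches classic hierarchical RCU when there is no read-side
    preemption or blocking.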

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Make RCU-sched, RCU-bh, and RCU-preempt be underlying
    implementations, with "RCU" defined in terms of one of the
    three. Update the outdated rcu_qsctr_inc() names, as these
    functions no longer increment anything.
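
    One way to picture "RCU" being defined in terms of an underlying
    implementation is a compile-time mapping like the sketch below. It is
    hypothetical throughout: the toy_* functions, the call counters, and
    the TOY_PREEMPT_RCU switch are invented for illustration and are not
    the kernel's actual names or configuration symbols.

```c
#include <assert.h>

/* Call counters so the effect of the mapping can be observed. */
static int toy_sched_calls, toy_bh_calls, toy_preempt_calls;

/* The three underlying flavors, each with its own entry point. */
static inline void toy_synchronize_sched(void)   { toy_sched_calls++; }
static inline void toy_synchronize_bh(void)      { toy_bh_calls++; }
static inline void toy_synchronize_preempt(void) { toy_preempt_calls++; }

/*
 * The generic name is an alias for exactly one flavor, chosen at
 * build time.  The other flavors stay reachable by their own names.
 */
#ifdef TOY_PREEMPT_RCU
#define toy_synchronize_rcu() toy_synchronize_preempt()
#else
#define toy_synchronize_rcu() toy_synchronize_sched()
#endif
```

    Built without TOY_PREEMPT_RCU, a call to toy_synchronize_rcu() lands
    in the sched flavor, while callers that specifically need the bh
    flavor still call it by name.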

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney