07 Oct, 2015

2 commits

  • Because preempt_disable() maps to barrier() for non-debug builds,
    it forces the compiler to spill and reload registers. Because Tree
    RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these
    barrier() instances generate needless extra code for each instance of
    rcu_read_lock() and rcu_read_unlock(). This extra code slows down Tree
    RCU and bloats Tiny RCU.

    This commit therefore removes the preempt_disable() and preempt_enable()
    from the non-preemptible implementations of __rcu_read_lock() and
    __rcu_read_unlock(), respectively. However, for debug purposes,
    preempt_disable() and preempt_enable() are still invoked if
    CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside
    atomic sections in non-preemptible kernels.

    However, Tiny and Tree RCU operate by coalescing all RCU read-side
    critical sections on a given CPU that lie between successive quiescent
    states. It is therefore necessary to compensate for removing barriers
    from __rcu_read_lock() and __rcu_read_unlock() by adding them to a
    couple of the RCU functions invoked during quiescent states, namely
    rcu_all_qs() and rcu_note_context_switch(). Note that the latter is
    more paranoia than necessity, at least until link-time optimizations
    become more aggressive.
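
    A minimal sketch of the resulting non-preemptible reader primitives
    (illustrative only; the in-tree definitions differ in detail):

        /* Sketch: with CONFIG_PREEMPT_COUNT=y the preempt-count
         * manipulation is kept so that sleeping inside an RCU reader
         * can be detected; otherwise both functions compile to nothing,
         * so no barrier() constrains the optimizer. */
        static inline void __rcu_read_lock(void)
        {
        #ifdef CONFIG_PREEMPT_COUNT
                preempt_disable();
        #endif
        }

        static inline void __rcu_read_unlock(void)
        {
        #ifdef CONFIG_PREEMPT_COUNT
                preempt_enable();
        #endif
        }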

    This is based on an earlier patch by Paul E. McKenney, fixing
    a bug encountered in kernels built with CONFIG_PREEMPT=n and
    CONFIG_PREEMPT_COUNT=y.

    Signed-off-by: Boqun Feng
    Signed-off-by: Paul E. McKenney

    Boqun Feng
     
  • As we now have the rcu_callback_t typedef as the type of RCU callbacks,
    we should use it in call_rcu*() and friends as the type of the callback
    parameter. This saves a few lines of code and makes it clear which
    functions require an RCU callback rather than some other callback as
    their argument.

    This also helps cscope generate a better database for code reading.
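
    For reference, the typedef and one converted declaration (as in
    include/linux/types.h and include/linux/rcupdate.h):

        /* include/linux/types.h */
        typedef void (*rcu_callback_t)(struct rcu_head *head);

        /* include/linux/rcupdate.h: one converted declaration */
        void call_rcu(struct rcu_head *head, rcu_callback_t func);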

    Signed-off-by: Boqun Feng
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Boqun Feng
     

23 Jul, 2015

1 commit

  • The get_state_synchronize_rcu() and cond_synchronize_rcu() functions
    allow polling for grace-period completion, with an actual wait for a
    grace period occurring only when cond_synchronize_rcu() is called too
    soon after the corresponding get_state_synchronize_rcu(). However,
    these functions work only for vanilla RCU. This commit therefore adds
    get_state_synchronize_sched() and cond_synchronize_sched(), which
    provide the same capability for RCU-sched.
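
    Usage mirrors the vanilla-RCU pair; a minimal sketch:

        unsigned long oldstate;

        oldstate = get_state_synchronize_sched();
        /* ... work that may or may not span an RCU-sched grace period ... */
        cond_synchronize_sched(oldstate);  /* waits only if no GP elapsed */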

    Reported-by: Peter Zijlstra (Intel)
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

28 May, 2015

1 commit

  • The Tiny RCU counterparts to rcu_idle_enter(), rcu_idle_exit(),
    rcu_irq_enter(), and rcu_irq_exit() are empty functions, but each has
    EXPORT_SYMBOL_GPL(), which needlessly consumes extra memory, especially
    in kernels built with module support. This commit therefore moves these
    functions to static inlines in rcutiny.h, removing the need for exports.

    This won't affect the size of the tiniest kernels, which are likely
    built without module support, but might help semi-tiny kernels that
    do include module support.
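
    After this change, the Tiny RCU versions are empty static inlines in
    rcutiny.h, along these lines:

        static inline void rcu_idle_enter(void) { }
        static inline void rcu_idle_exit(void) { }
        static inline void rcu_irq_enter(void) { }
        static inline void rcu_irq_exit(void) { }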

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

16 Jan, 2015

2 commits

  • …rcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD

    doc.2015.01.07a: Documentation updates.
    fixes.2015.01.15a: Miscellaneous fixes.
    preempt.2015.01.06a: Changes to handling of lists of preempted tasks.
    srcu.2015.01.06a: SRCU updates.
    stall.2015.01.16a: RCU CPU stall-warning updates and fixes.
    torture.2015.01.11a: RCU torture-test updates and fixes.

    Paul E. McKenney
     
  • Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
    in places where it would be useful for it to apply to the normal RCU
    flavors, rcu_preempt, rcu_sched, and rcu_bh. This is especially the
    case for workloads that aggressively overload the system, particularly
    those that generate large numbers of RCU updates on systems running
    NO_HZ_FULL CPUs. This commit therefore communicates quiescent states
    from cond_resched_rcu_qs() to the normal RCU flavors.

    Note that it is unfortunately necessary to leave the old ->passed_quiesce
    mechanism in place to allow quiescent states that apply to only one
    flavor to be recorded. (Yes, we could decrement ->rcu_qs_ctr_snap in
    that case, but that is not so good for debugging of RCU internals.)
    In addition, if one of the RCU flavors' grace periods has stalled,
    this will invoke rcu_momentary_dyntick_idle(), resulting in a
    heavy-weight quiescent state visible from other CPUs.
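
    Typical callers are long-running kernel loops; an illustrative sketch
    (process_item() and nr_items are hypothetical):

        int i;

        for (i = 0; i < nr_items; i++) {
                process_item(i);
                /* Report a quiescent state; with this commit it now
                 * counts for the normal RCU flavors, not just TASKS_RCU. */
                cond_resched_rcu_qs();
        }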

    Reported-by: Sasha Levin
    Reported-by: Dave Jones
    Signed-off-by: Paul E. McKenney
    [ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu()
    was used in preemptible code. ]

    Paul E. McKenney
     

11 Jan, 2015

3 commits

  • Currently, rcutorture's Reader Batch checks measure from the end of
    the previous grace period to the end of the current one. This commit
    tightens up these checks by measuring from the start and end of the same
    grace period. This involves adding rcu_batches_started() and friends
    corresponding to the existing rcu_batches_completed() and friends.

    We leave SRCU alone for the moment, as it does not yet have a way of
    tracking both ends of its grace periods.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • A bug in rcutorture has caused it to ignore completed batches.
    In preparation for fixing that bug, this commit provides TINY_RCU with
    the required rcu_batches_completed_sched().

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Long ago, the various ->completed fields were of type long, but now are
    unsigned long due to signed-integer-overflow concerns. However, the
    various _batches_completed() functions remained of type long, even though
    their only purpose in life is to return the corresponding ->completed
    field. This patch cleans this up by changing these functions' return
    types to unsigned long.
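
    One representative declaration, before and after:

        /* Before */
        long rcu_batches_completed(void);

        /* After */
        unsigned long rcu_batches_completed(void);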

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

04 Nov, 2014

1 commit

  • The "cpu" argument to rcu_note_context_switch() is always the current
    CPU, so drop it. This in turn allows the "cpu" argument to
    rcu_preempt_note_context_switch() to be removed, which allows the sole
    use of "cpu" in both functions to be replaced with a this_cpu_ptr().
    Again, the anticipated cross-CPU uses of these functions have been
    replaced by NO_HZ_FULL.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Pranith Kumar

    Paul E. McKenney
     

08 Sep, 2014

1 commit

  • The rcu_bh_qs(), rcu_preempt_qs(), and rcu_sched_qs() functions use
    old-style per-CPU variable access and write to ->passed_quiesce even
    if it is already set. This commit therefore updates them to use the
    new-style per-CPU variable access functions and avoids the spurious
    writes.
    This commit also eliminates the "cpu" argument to these functions because
    they are always invoked on the indicated CPU.
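
    The shape of the change, sketched for the rcu_sched flavor
    (illustrative; names as in kernel/rcu/tree.c of that era):

        /* Old style: per_cpu() indexing plus an unconditional write. */
        per_cpu(rcu_sched_data, cpu).passed_quiesce = 1;

        /* New style: this-CPU accessors, writing only when needed. */
        if (!__this_cpu_read(rcu_sched_data.passed_quiesce))
                __this_cpu_write(rcu_sched_data.passed_quiesce, 1);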

    Reported-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

21 Mar, 2014

1 commit

  • The following pattern is currently not well supported by RCU:

    1. Make data element inaccessible to RCU readers.

    2. Do work that probably lasts for more than one grace period.

    3. Do something to make sure RCU readers in flight before #1 above
    have completed.

    Here are some things that could currently be done:

    a. Do a synchronize_rcu() unconditionally at either #1 or #3 above.
    This works, but imposes needless work and latency.

    b. Post an RCU callback at #1 above that does a wakeup, then
    wait for the wakeup at #3. This works well, but likely results
    in an extra unneeded grace period. Open-coding this is also
    a bit trickier than would be good.

    This commit therefore adds get_state_synchronize_rcu() and
    cond_synchronize_rcu() APIs. Call get_state_synchronize_rcu() at #1
    above and pass its return value to cond_synchronize_rcu() at #3 above.
    This results in a call to synchronize_rcu() if no grace period has
    elapsed between #1 and #3, but requires only a load, comparison, and
    memory barrier if a full grace period did elapse.
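
    A sketch of the resulting pattern; everything here other than the two
    new APIs is hypothetical:

        unsigned long gp_state;

        remove_element(p);                       /* #1: unreachable to new readers */
        gp_state = get_state_synchronize_rcu();  /* snapshot GP state */

        do_long_running_work();                  /* #2: hopefully spans a GP */

        cond_synchronize_rcu(gp_state);          /* #3: waits only if needed */
        kfree(p);                                /* no reader can still hold p */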

    Requested-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    Acked-by: Peter Zijlstra

    Paul E. McKenney
     

18 Feb, 2014

2 commits

  • If CONFIG_RCU_NOCB_CPU_ALL=y, then rcu_needs_cpu() should always
    return false; however, the current version nevertheless checks for
    RCU callbacks. This commit therefore creates a static inline
    implementation of rcu_needs_cpu() that unconditionally returns false
    when CONFIG_RCU_NOCB_CPU_ALL=y.
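
    A sketch of the static inline, using the rcu_needs_cpu() signature of
    that era (details may differ from the in-tree version):

        #ifdef CONFIG_RCU_NOCB_CPU_ALL
        static inline int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies)
        {
                *delta_jiffies = ULONG_MAX;  /* no RCU-imposed wakeup */
                return 0;                    /* CPU may enter dyntick-idle */
        }
        #endif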

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • All of the RCU source files have the usual GPL header, which contains
    a long-obsolete postal address for the FSF. To avoid the need to track
    the FSF office's movements, this commit substitutes the URL where the
    GPL may be found.

    Reported-by: Greg KH
    Reported-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

25 Sep, 2013

2 commits

  • The old rcu_is_cpu_idle() function is just __rcu_is_watching() with
    preemption disabled. This commit therefore renames rcu_is_cpu_idle()
    to rcu_is_watching().

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • There is currently no way for kernel code to determine whether it
    is safe to enter an RCU read-side critical section, in other words,
    whether or not RCU is paying attention to the currently running CPU.
    Given the large and increasing quantity of code shared by the idle loop
    and non-idle code, this shortcoming is becoming increasingly painful.

    This commit therefore adds __rcu_is_watching(), which returns true if
    it is safe to enter an RCU read-side critical section on the currently
    running CPU. This function is quite fast, using only a __this_cpu_read().
    However, the caller must disable preemption.
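
    Because the caller must disable preemption, a typical use looks like
    this sketch:

        bool watching;

        preempt_disable();
        watching = __rcu_is_watching();  /* this-CPU check; CPU pinned */
        preempt_enable();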

    Reported-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

11 Jun, 2013

5 commits

  • Now that TINY_PREEMPT_RCU is no more, exit_rcu() is always an empty
    function. But if TINY_RCU is going to have an empty function, it should
    be in include/linux/rcutiny.h, where it does not bloat the kernel.
    This commit therefore moves exit_rcu() out of kernel/rcupdate.c to
    kernel/rcutree_plugin.h, and places a static inline empty function in
    include/linux/rcutiny.h in order to shrink TINY_RCU a bit.
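
    The rcutiny.h version then reduces to an empty static inline:

        static inline void exit_rcu(void)
        {
        }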

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • With the removal of CONFIG_TINY_PREEMPT_RCU, rcu_preempt_note_context_switch()
    is now an empty function. This commit therefore eliminates it by inlining it.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Now that CONFIG_TINY_PREEMPT_RCU is no more, this commit removes
    the CONFIG_TINY_RCU ifdefs from include/linux/rcutiny.h in favor of
    unconditionally compiling the CONFIG_TINY_RCU legs of those ifdefs.

    Signed-off-by: Paul E. McKenney
    [ paulmck: Moved removal of #else to "Remove TINY_PREEMPT_RCU" as
    suggested by Josh Triplett. ]
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • TINY_PREEMPT_RCU could use a kthread to handle RCU callback invocation,
    which required an API to abstract kthread vs. softirq invocation.
    Now that TINY_PREEMPT_RCU is no longer with us, this commit retires
    this API in favor of direct use of the relevant softirq primitives.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • TINY_PREEMPT_RCU adds significant code and complexity, but does not
    offer commensurate benefits. People currently using TINY_PREEMPT_RCU
    can get much better memory footprint with TINY_RCU, or, if they really
    need preemptible RCU, they can use TREE_PREEMPT_RCU with a relatively
    minor degradation in memory footprint. Please note that this move
    has been widely publicized on LKML (https://lkml.org/lkml/2012/11/12/545)
    and on LWN (http://lwn.net/Articles/541037/).

    This commit therefore removes TINY_PREEMPT_RCU.

    Signed-off-by: Paul E. McKenney
    [ paulmck: Updated to eliminate #else in rcutiny.h as suggested by Josh ]
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

03 Jul, 2012

1 commit

  • This reverts commit 616c310e83b872024271c915c1b9ab505b9efad9
    ("Move PREEMPT_RCU preemption to switch_to() invocation").
    Testing by Sasha Levin showed that the change can result in deadlock
    due to invoking the scheduler when one of the runqueue locks is held.
    Because that commit was simply a performance optimization, revert it.

    Reported-by: Sasha Levin
    Signed-off-by: Paul E. McKenney
    Tested-by: Sasha Levin

    Paul E. McKenney
     

07 Jun, 2012

1 commit

  • When a CPU is entering dyntick-idle mode, tick_nohz_stop_sched_tick()
    calls rcu_needs_cpu() to see if RCU needs that CPU and, if not, computes
    the next wakeup time based on the timer wheels. Only later, when
    actually entering the idle loop, will rcu_prepare_for_idle() be invoked.
    In some cases, rcu_prepare_for_idle() will post timers to wake the CPU
    back up. But all for naught: the next wakeup time for the CPU has
    already been computed, and posting a timer afterwards does not force
    that wakeup time to be recomputed. This means that
    rcu_prepare_for_idle()'s timers have no effect.

    This is not a problem on a busy system because something else will wake
    up the CPU soon enough. However, on lightly loaded systems, the CPU
    might stay asleep for a considerable length of time. If that CPU has
    a callback that the rest of the system is waiting on, the system might
    run very slowly or (in theory) even hang.

    This commit avoids this problem by having rcu_needs_cpu() give
    tick_nohz_stop_sched_tick() an estimate of when RCU will need the CPU
    to wake back up, which tick_nohz_stop_sched_tick() takes into account
    when programming the CPU's wakeup time. An alternative approach would
    be for rcu_prepare_for_idle() to use hrtimers instead of normal timers,
    but normal timers are much more efficient than hrtimers for frequently
    and repeatedly posting and cancelling a given timer, which is exactly
    what RCU_FAST_NO_HZ does.
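
    Sketched caller logic, assuming an out-parameter interface of roughly
    this shape (simplified relative to the in-tree code):

        int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies);

        /* In tick_nohz_stop_sched_tick(), roughly: */
        unsigned long rcu_delta_jiffies;

        if (rcu_needs_cpu(cpu, &rcu_delta_jiffies))
                delta_jiffies = min(delta_jiffies, rcu_delta_jiffies);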

    Reported-by: Pascal Chapperon
    Reported-by: Heiko Carstens
    Signed-off-by: Paul E. McKenney
    Tested-by: Heiko Carstens
    Tested-by: Pascal Chapperon

    Paul E. McKenney
     

03 May, 2012

2 commits

  • When running preemptible RCU, if a task exits in an RCU read-side
    critical section having blocked within that same RCU read-side critical
    section, the task must be removed from the list of tasks blocking a
    grace period (perhaps the current grace period, perhaps the next grace
    period, depending on timing). The exit() path invokes exit_rcu() to
    do this cleanup.

    However, the current implementation of exit_rcu() needlessly does the
    cleanup even if the task did not block within the current RCU read-side
    critical section, which wastes time and needlessly increases the size
    of the state space. Fix this by only doing the cleanup if the current
    task is actually on the list of tasks blocking some grace period.

    While we are at it, consolidate the two identical exit_rcu() functions
    into a single function.

    Signed-off-by: Paul E. McKenney
    Tested-by: Linus Torvalds

    Conflicts:

    kernel/rcupdate.c

    Paul E. McKenney
     
  • Currently, PREEMPT_RCU readers are enqueued upon entry to the scheduler.
    This is inefficient because enqueuing is required only if there is a
    context switch, and entry to the scheduler does not guarantee a context
    switch.

    This commit therefore moves the enqueuing to immediately precede the
    call to switch_to() from the scheduler.

    Signed-off-by: Paul E. McKenney
    Tested-by: Linus Torvalds

    Paul E. McKenney
     

22 Feb, 2012

2 commits

  • This is a port of commit #b0d3041 from TREE_RCU to TREE_PREEMPT_RCU.

    Under some rare but real combinations of configuration parameters, RCU
    callbacks are posted during early boot that use kernel facilities that are
    not yet initialized. Therefore, when these callbacks are invoked, hard
    hangs and crashes ensue. This commit therefore prevents RCU callbacks
    from being invoked until after the scheduler is fully up and running,
    as in after multiple tasks have been spawned.

    It might well turn out that a better approach is to identify the specific
    RCU callbacks that are causing this problem, but that discussion will
    wait until such time as someone really needs an RCU callback to be invoked
    (as opposed to merely registered) during early boot.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • When CONFIG_RCU_FAST_NO_HZ is enabled, RCU will allow a given CPU to
    enter dyntick-idle mode even if it still has RCU callbacks queued.
    RCU avoids system hangs in this case by scheduling a timer for several
    jiffies in the future. However, if all of the callbacks on that CPU
    are from kfree_rcu(), there is no reason to wake the CPU up, as it is
    not a problem to defer freeing of memory.

    This commit therefore tracks the number of callbacks on a given CPU
    that are from kfree_rcu(), and avoids scheduling the timer if all of
    a given CPU's callbacks are from kfree_rcu().
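
    Conceptually, the timer-avoidance check becomes the following sketch;
    the in-tree code tracks the lazy count in the per-CPU rcu_data
    structure:

        /* If every queued callback merely frees memory, skip the timer. */
        if (rdp->qlen_lazy == rdp->qlen)
                return;  /* all callbacks from kfree_rcu(); let CPU sleep */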

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     


30 Nov, 2010

1 commit

  • The first version of synchronize_sched_expedited() used the migration
    code in the scheduler, and was therefore implemented in kernel/sched.c.
    However, the more recent version of this code no longer uses the
    migration code, so this commit moves it to the main RCU source files.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     

18 Nov, 2010

1 commit

  • If RCU priority boosting is to be meaningful, callback invocation must
    be boosted in addition to preempted RCU readers. Otherwise, in the
    presence of CPU-bound real-time threads, the grace period ends, but
    the callbacks don't get invoked. If the callbacks don't get invoked,
    the associated memory doesn't get freed, so the system is still
    subject to OOM.

    But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
    moves the callback invocations to a kthread, which can be boosted easily.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

21 Aug, 2010

3 commits

  • The CONFIG_PREEMPT_RCU kernel configuration parameter was recently
    re-introduced, but as an indication of the type of RCU (preemptible
    vs. non-preemptible) instead of as selecting a given implementation.
    This commit uses CONFIG_PREEMPT_RCU to combine duplicate code
    from include/linux/rcutiny.h and include/linux/rcutree.h into
    include/linux/rcupdate.h. This commit also combines a few other pieces
    of duplicate code that have accumulated.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Combine the duplicate definitions of ULONG_CMP_GE(), ULONG_CMP_LT(),
    and rcu_preempt_depth() into include/linux/rcupdate.h.
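
    For reference, the combined comparison macros, which compare unsigned
    grace-period counters safely across wraparound:

        #define ULONG_CMP_GE(a, b)  (ULONG_MAX / 2 >= (a) - (b))
        #define ULONG_CMP_LT(a, b)  (ULONG_MAX / 2 < (a) - (b))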

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • When using a kernel debugger, a long sojourn in the debugger can get
    you lots of RCU CPU stall warnings once you resume. This might not be
    helpful, especially if you are using the system console. This patch
    therefore allows RCU CPU stall warnings to be suppressed, but only for
    the duration of the current set of grace periods.

    This differs from Jason's original patch in that it adds support for
    tiny RCU and preemptible RCU, and uses a slightly different method for
    suppressing the RCU CPU stall warning messages.

    Signed-off-by: Jason Wessel
    Signed-off-by: Paul E. McKenney
    Tested-by: Jason Wessel

    Paul E. McKenney
     

20 Aug, 2010

1 commit

  • Implement a small-memory-footprint uniprocessor-only implementation of
    preemptible RCU. This implementation uses but a single blocked-tasks
    list rather than the combinatorial number used per leaf rcu_node by
    TREE_PREEMPT_RCU, which reduces memory consumption and greatly simplifies
    processing. This version also takes advantage of uniprocessor execution
    to accelerate grace periods in the case where there are no readers.

    The general design is otherwise broadly similar to that of TREE_PREEMPT_RCU.

    This implementation is a step towards having RCU implementation driven
    off of the SMP and PREEMPT kernel configuration variables, which can
    happen once this implementation has accumulated sufficient experience.

    Removed ACCESS_ONCE() from __rcu_read_unlock() and added barrier() as
    suggested by Steve Rostedt in order to avoid the compiler-reordering
    issue noted by Mathieu Desnoyers (http://lkml.org/lkml/2010/8/16/183).

    As can be seen below, CONFIG_TINY_PREEMPT_RCU represents almost 5Kbyte
    savings compared to CONFIG_TREE_PREEMPT_RCU. Of course, for non-real-time
    workloads, CONFIG_TINY_RCU is even better.

    CONFIG_TREE_PREEMPT_RCU

       text   data    bss    dec  filename
         13      0      0     13  kernel/rcupdate.o
       6170    825     28   7023  kernel/rcutree.o
                            ----
                            7036  Total

    CONFIG_TINY_PREEMPT_RCU

       text   data    bss    dec  filename
         13      0      0     13  kernel/rcupdate.o
       2081     81      8   2170  kernel/rcutiny.o
                            ----
                            2183  Total

    CONFIG_TINY_RCU (non-preemptible)

       text   data    bss    dec  filename
         13      0      0     13  kernel/rcupdate.o
        719     25      0    744  kernel/rcutiny.o
                            ----
                             757  Total

    Requested-by: Loïc Minier
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

18 May, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
    stop_machine: Move local variable closer to the usage site in cpu_stop_cpu_callback()
    sched, wait: Use wrapper functions
    sched: Remove a stale comment
    ondemand: Make the iowait-is-busy time a sysfs tunable
    ondemand: Solve a big performance issue by counting IOWAIT time as busy
    sched: Intoduce get_cpu_iowait_time_us()
    sched: Eliminate the ts->idle_lastupdate field
    sched: Fold updating of the last_update_time_info into update_ts_time_stats()
    sched: Update the idle statistics in get_cpu_idle_time_us()
    sched: Introduce a function to update the idle statistics
    sched: Add a comment to get_cpu_idle_time_us()
    cpu_stop: add dummy implementation for UP
    sched: Remove rq argument to the tracepoints
    rcu: need barrier() in UP synchronize_sched_expedited()
    sched: correctly place paranioa memory barriers in synchronize_sched_expedited()
    sched: kill paranoia check in synchronize_sched_expedited()
    sched: replace migration_thread with cpu_stop
    stop_machine: reimplement using cpu_stop
    cpu_stop: implement stop_cpu[s]()
    sched: Fix select_idle_sibling() logic in select_task_rq_fair()
    ...

    Linus Torvalds