08 Feb, 2013

6 commits

  • The old SRCU implementation loads sp->completed within an
    RCU-sched section, courtesy of preempt_disable(). This was required
    due to the use of synchronize_sched() in the old implementation's
    synchronize_srcu(). However, the new implementation does not rely
    on synchronize_sched(), so it in turn does not require the load of
    sp->completed and the ->c[] counter to be in a single preempt-disabled
    region of code. This commit therefore moves the sp->completed access
    outside of the preempt-disabled region and applies ACCESS_ONCE().

    The resulting code is almost the same as before, but it removes the
    now-misleading rcu_dereference_index_check() call.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • Because synchronize_srcu_expedited() no longer uses
    synchronize_rcu_sched_expedited(), synchronize_srcu_expedited() no longer
    indirectly acquires any CPU-hotplug-related locks. This commit therefore
    updates the comments accordingly.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • The core of SRCU has changed, but synchronize_srcu()'s comments still
    describe the old algorithm. This commit therefore updates them to
    match the new algorithm.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • Pack six lines of code into two lines.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • Although synchronize_srcu() can sleep, it will not sleep if the fast
    path succeeds, which means that illegal use of synchronize_srcu()
    might go unnoticed. This commit therefore adds might_sleep(), which
    unconditionally catches illegal use of synchronize_srcu() from atomic
    context.
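
    A minimal sketch of the change, assuming the annotation sits at the
    top of synchronize_srcu() (the grace-period body is elided; only the
    added line matters here):

        void synchronize_srcu(struct srcu_struct *sp)
        {
                might_sleep();  /* Complain (under CONFIG_DEBUG_ATOMIC_SLEEP)
                                 * if called from atomic context, even when
                                 * the fast path would not actually sleep. */
                /* ... wait for an SRCU grace period as before ... */
        }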

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • This commit replaces disabling of preemption and decrement of a per-CPU
    variable with this_cpu_dec(), which avoids preemption disabling on x86
    and shortens the code on all platforms.
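
    The shape of the conversion, sketched against the then-current
    __srcu_read_unlock() (field names are from that era's srcu_struct
    and should be treated as illustrative):

        /* Before: */
        preempt_disable();
        ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) -= 1;
        preempt_enable();

        /* After: a single operation that is preemption-safe without
         * disabling preemption on x86: */
        this_cpu_dec(sp->per_cpu_ref->c[idx]);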

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     

17 Nov, 2012

1 commit

  • …cu.2012.10.27a', 'stall.2012.11.13a', 'tracing.2012.11.08a' and 'idle.2012.10.24a' into HEAD

    urgent.2012.10.27a: Fix for RCU user-mode transition (already in -tip).

    doc.2012.11.08a: Documentation updates, most notably codifying the
    memory-barrier guarantees inherent to grace periods.

    fixes.2012.11.13a: Miscellaneous fixes.

    srcu.2012.10.27a: Allow statically allocated and initialized srcu_struct
    structures (courtesy of Lai Jiangshan).

    stall.2012.11.13a: Add more diagnostic information to RCU CPU stall
    warnings, and also decrease the stall-warning timeout from 60 seconds
    to 21 seconds.

    hotplug.2012.11.08a: Minor updates to CPU hotplug handling.

    tracing.2012.11.08a: Improved debugfs tracing, courtesy of Michael Wang.

    idle.2012.10.24a: Updates to RCU idle/adaptive-idle handling, including
    a boot parameter that maps normal grace periods to expedited.

    Resolved conflict in kernel/rcutree.c due to side-by-side change.

    Paul E. McKenney
     

24 Oct, 2012

3 commits

  • Because process_srcu() will be used in DEFINE_SRCU(), which is a macro
    that could be expanded pretty much anywhere, it can no longer be static.
    Note that process_srcu() is still internal to srcu.h.
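
    A simplified sketch of why external linkage is required; the
    initializer fragment below is illustrative of the mechanism rather
    than a verbatim copy of srcu.h:

        /* DEFINE_SRCU() can be expanded in any translation unit, and its
         * static initializer names process_srcu() as the delayed-work
         * handler, so the symbol cannot be static in kernel/srcu.c. */
        #define __SRCU_STRUCT_INIT(name) {                              \
                .per_cpu_ref = &name##_srcu_array,                      \
                .queue_lock = __SPIN_LOCK_UNLOCKED(name.queue_lock),    \
                .work = __DELAYED_WORK_INITIALIZER(name.work,           \
                                                   process_srcu, 0),    \
        }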

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • Lai Jiangshan rewrote SRCU, so this commit ensures that he gets his
    proper share of blame^Wcredit.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • There have been some embedded applications that would benefit from
    use of expedited grace-period primitives. In some ways, this is
    similar to synchronize_net() doing either a normal or an expedited
    grace period depending on lock state, but with control outside of
    the kernel.

    This commit therefore adds rcu_expedited boot and sysfs parameters
    that cause the kernel to substitute expedited primitives for the
    normal grace-period primitives.
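
    A hedged sketch of the substitution as it might appear in
    synchronize_rcu(), assuming the rcu_expedited variable that this
    commit introduces:

        void synchronize_rcu(void)
        {
                if (rcu_expedited)
                        synchronize_rcu_expedited(); /* force the fast path */
                else
                        wait_rcu_gp(call_rcu);       /* normal grace period */
        }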

    [ paulmck: Add trace/event/rcu.h to kernel/srcu.c to avoid build error.
    Get rid of infinite loop through contention path.]

    Signed-off-by: Antti P Miettinen
    Signed-off-by: Paul E. McKenney

    Antti P Miettinen
     

21 Aug, 2012

1 commit

  • system_nrt[_freezable]_wq are now spurious. Mark them deprecated and
    convert all users to system[_freezable]_wq.

    If you're cc'd and wondering what's going on: Now all workqueues are
    non-reentrant, so there's no reason to use system_nrt[_freezable]_wq.
    Please use system[_freezable]_wq instead.
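
    The conversion itself is mechanical; a representative hunk would look
    like this (my_work is an illustrative work item):

        -       queue_work(system_nrt_wq, &my_work);
        +       queue_work(system_wq, &my_work);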

    This patch doesn't make any functional difference.

    Signed-off-by: Tejun Heo
    Acked-By: Lai Jiangshan

    Cc: Jens Axboe
    Cc: David Airlie
    Cc: Jiri Kosina
    Cc: "David S. Miller"
    Cc: Rusty Russell
    Cc: "Paul E. McKenney"
    Cc: David Howells

    Tejun Heo
     

01 May, 2012

9 commits

  • This commit implements an SRCU state machine in support of call_srcu().
    The state machine is preemptible, light-weight, and single-threaded,
    minimizing synchronization overhead. In particular, there is no longer
    any need for synchronize_srcu() to be guarded by a mutex.

    Expedited processing is handled, at least in the absence of concurrent
    grace-period operations on that same srcu_struct structure, by having
    the synchronize_srcu_expedited() thread take on the role of the
    workqueue thread for one iteration.

    There is a reasonable probability that a given SRCU callback will
    be invoked on the same CPU that registered it; however, there is no
    guarantee. Concurrent SRCU grace-period primitives can cause callbacks
    to be executed elsewhere, even in the absence of CPU-hotplug
    operations.

    Callbacks execute in process context, but under the influence of
    local_bh_disable(), so it is illegal to sleep in an SRCU callback
    function.
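
    For reference, a hedged usage sketch of the call_srcu() interface
    that this state machine supports (my_srcu and my_head are
    illustrative names for a previously initialized srcu_struct and an
    embedded rcu_head):

        static void my_cb(struct rcu_head *head)
        {
                /* Invoked after an SRCU grace period; runs in process
                 * context under local_bh_disable(), so must not sleep. */
        }

        /* Queue the callback; returns immediately: */
        call_srcu(&my_srcu, &my_head, my_cb);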

    Signed-off-by: Lai Jiangshan
    Acked-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • The earlier algorithm used an "expedited" flag combined with a "trycount"
    counter to differentiate between normal and expedited SRCU grace periods.
    However, the difference can be encoded into a single counter with a cutoff
    value and different initial values for expedited and normal SRCU grace
    periods. This commit makes that change.
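
    A hedged sketch of the encoding (constant and helper names follow the
    then-current kernel/srcu.c but should be treated as illustrative):

        #define SYNCHRONIZE_SRCU_TRYCOUNT      2    /* normal GP */
        #define SYNCHRONIZE_SRCU_EXP_TRYCOUNT  12   /* expedited GP */

        /* In the grace-period wait loop, one counter replaces the old
         * flag-plus-counter pair: expedited callers simply start with a
         * larger spin budget before falling back to sleeping. */
        while (!srcu_readers_active_idx_check(sp, idx)) {
                if (trycount > 0) {
                        trycount--;
                        udelay(SYNCHRONIZE_SRCU_READER_DELAY);
                } else {
                        schedule_timeout_interruptible(1);
                }
        }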

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Conflicts:

    kernel/srcu.c

    Lai Jiangshan
     
  • Expand the calls to srcu_readers_active_idx() from srcu_readers_active()
    inline. This change improves cache locality by iterating over the CPUs
    once rather than twice.
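
    The resulting shape, sketched under the assumption of the
    then-current two-counters-per-CPU layout:

        static bool srcu_readers_active(struct srcu_struct *sp)
        {
                int cpu;
                unsigned long sum = 0;

                /* A single pass over the CPUs reads both counter sets. */
                for_each_possible_cpu(cpu) {
                        struct srcu_struct_array *cpuc =
                                per_cpu_ptr(sp->per_cpu_ref, cpu);
                        sum += ACCESS_ONCE(cpuc->c[0]);
                        sum += ACCESS_ONCE(cpuc->c[1]);
                }
                return sum != 0;
        }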

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • This commit implements a variant of Peter's algorithm, which may be found
    at https://lkml.org/lkml/2012/2/1/119.

    o Make the checking lock-free to enable parallel checking.
    Parallel checking is required when (1) the original checking
    task is preempted for a long time, (2) synchronize_srcu_expedited()
    starts during an ongoing SRCU grace period, or (3) we wish to
    avoid acquiring a lock.

    o Since the checking is lock-free, we avoid the need for a mutex in
    the state machine for call_srcu().

    o Remove the SRCU_REF_MASK and remove the coupling with the flipping.
    This might allow us to remove the preempt_disable() in future
    versions, though such removal will need great care because it
    rescinds the one-old-reader-per-CPU guarantee.

    o Remove a smp_mb(), simplify the comments and make the smp_mb() pairs
    more intuitive.

    Inspired-by: Peter Zijlstra
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • The safety of SRCU is provided by wait_idx() rather than by flipping.
    The flipping actually prevents starvation.

    This commit therefore updates the comments to more accurately and
    precisely describe what is going on.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • This is an optimization of the SRCU grace period. To guard against
    preempted readers with old values of the counter, it suffices to scan the
    old counters once more, then flip ->completed only one time. The reason
    this works is that the old readers must have incremented the old set of
    counters (if they have not yet incremented, then their critical section
    starts after this grace period, so they may be safely ignored).

    This commit therefore optimizes the second flip out in favor of a simple
    rescan.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • The purpose of the upper bit of SRCU's per-CPU counters is to guarantee
    that no reasonable series of srcu_read_lock() and srcu_read_unlock()
    operations can return the value of the counter to its original value.
    This guarantee is required only after the index has been switched to
    the other set of counters, so at most one srcu_read_lock() can affect
    a given CPU's counter. The number of srcu_read_unlock() operations
    on a given counter is limited to the number of tasks in the system,
    which given the Linux kernel's current structure is limited to far less
    than 2^30 on 32-bit systems and far less than 2^62 on 64-bit systems.
    (Something about a limited number of bytes in the kernel's address space.)

    Therefore, if srcu_read_lock() increments the upper bits, then
    srcu_read_unlock() need not do so. In this case, an srcu_read_lock() and
    an srcu_read_unlock() will flip the lower bit of the upper field of the
    counter. An unreasonably large additional number of srcu_read_unlock()
    operations would be required to return the counter to its initial value,
    thus preserving the guarantee.

    This commit takes this approach, which further allows it to shrink
    the size of the upper field to one bit, making the number of
    srcu_read_unlock() operations required to return the counter to its
    initial value even more unreasonable than before.
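
    A sketch of the resulting counter layout (macro names follow the
    then-current kernel/srcu.c; treat the details as hedged):

        #define SRCU_USAGE_BITS   1
        #define SRCU_REF_MASK     (ULONG_MAX >> SRCU_USAGE_BITS)
        #define SRCU_USAGE_COUNT  (SRCU_REF_MASK + 1)

        /* srcu_read_lock() adds SRCU_USAGE_COUNT + 1, flipping the top
         * bit while taking a reference; srcu_read_unlock() subtracts
         * only 1, so an absurd number of unlock operations would be
         * needed to return the counter to its starting value. */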

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • The fastpath in __synchronize_srcu() is designed to handle cases where
    there are a large number of concurrent calls for the same srcu_struct
    structure. However, the Linux kernel currently does not use SRCU in
    this manner, so remove the fastpath checks for simplicity.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • The current implementation of synchronize_srcu_expedited() can cause
    severe OS jitter due to its use of synchronize_sched(), which in turn
    invokes try_stop_cpus(), which causes each CPU to be sent an IPI.
    This can result in severe performance degradation for real-time workloads
    and especially for short-iteration-length HPC workloads. Furthermore,
    because only one instance of try_stop_cpus() can be making forward progress
    at a given time, only one instance of synchronize_srcu_expedited() can
    make forward progress at a time, even if they are all operating on
    distinct srcu_struct structures.

    This commit, inspired by an earlier implementation by Peter Zijlstra
    (https://lkml.org/lkml/2012/1/31/211) and by further offline discussions,
    takes a strictly algorithmic bits-in-memory approach. This has the
    disadvantage of requiring one explicit memory-barrier instruction in
    each of srcu_read_lock() and srcu_read_unlock(), but on the other hand
    completely dispenses with OS jitter and furthermore allows SRCU to be
    used freely by CPUs that RCU believes to be idle or offline.
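
    A simplified, hedged sketch of the read-side cost (the full
    implementation also encodes reference and usage bits in the counters,
    omitted here):

        int __srcu_read_lock(struct srcu_struct *sp)
        {
                int idx;

                preempt_disable();
                idx = ACCESS_ONCE(sp->completed) & 0x1;
                ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) += 1;
                smp_mb();  /* The one explicit read-side barrier. */
                preempt_enable();
                return idx;
        }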

    The update-side implementation handles the single read-side memory
    barrier by rechecking the per-CPU counters after summing them and
    by running through the update-side state machine twice.

    This implementation has passed moderate rcutorture testing on both
    x86 and Power. Also updated to use this_cpu_ptr() instead of per_cpu_ptr(),
    as suggested by Peter Zijlstra.

    Reported-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    Acked-by: Peter Zijlstra
    Reviewed-by: Lai Jiangshan

    Paul E. McKenney
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

14 Jan, 2011

1 commit

  • Because the adaptive synchronize_srcu_expedited() approach has
    worked very well in testing, remove the kernel parameter and
    replace it by a C-preprocessor macro. If someone finds problems
    with this approach, a more complex and aggressively adaptive
    approach might be required.

    Longer term, SRCU will be merged with the other RCU implementations,
    at which point synchronize_srcu_expedited() will be event driven,
    just as synchronize_sched_expedited() currently is. At that point,
    there will be no need for this adaptive approach.

    Reported-by: Linus Torvalds
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

30 Nov, 2010

1 commit

  • The synchronize_srcu_expedited() function is currently quick if there
    are no active readers, but will delay a full jiffy if there are any.
    If these readers leave their SRCU read-side critical sections quickly,
    this is way too long to wait. So this commit first waits ten microseconds,
    and only then falls back to jiffy-at-a-time waiting.
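
    A hedged sketch of the resulting wait loop (the retry cutoff shown is
    illustrative):

        /* Spin briefly for readers that exit quickly, then sleep. */
        while (srcu_readers_active_idx(sp, idx))
                if (trycount++ < 10)
                        udelay(10);                        /* microseconds */
                else
                        schedule_timeout_interruptible(1); /* jiffies */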

    Reported-by: Avi Kivity
    Reported-by: Marcelo Tosatti
    Tested-by: Takuya Yoshikawa
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming their availability. As this
    conversion needs to touch a large number of source files, the following
    script was used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following:

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered:
    alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, and for others it was more appropriate
    to add the include to an implementation .h or embedding .c file.
    This step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that they could be applied
    as a separate patch and serve as a bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers, which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

25 Feb, 2010

1 commit

  • Inspection is proving insufficient to catch all RCU misuses,
    which is understandable given that rcu_dereference() might be
    protected by any of four different flavors of RCU (RCU, RCU-bh,
    RCU-sched, and SRCU), and might also/instead be protected by any
    of a number of locking primitives. It is therefore time to
    enlist the aid of lockdep.

    This set of patches is inspired by earlier work by Peter
    Zijlstra and Thomas Gleixner, and takes the following approach:

    o Set up separate lockdep classes for RCU, RCU-bh, and RCU-sched.

    o Set up separate lockdep classes for each instance of SRCU.

    o Create primitives that check for being in an RCU read-side
    critical section. These return exact answers if lockdep is
    fully enabled, but if unsure, report being in an RCU read-side
    critical section. (We want to avoid false positives!)
    The primitives are:

    For RCU: rcu_read_lock_held(void)

    For RCU-bh: rcu_read_lock_bh_held(void)

    For RCU-sched: rcu_read_lock_sched_held(void)

    For SRCU: srcu_read_lock_held(struct srcu_struct *sp)

    o Add rcu_dereference_check(), which takes a second argument
    in which one places a boolean expression based on the above
    primitives and/or lockdep_is_held(); see the example at the end
    of this entry.

    o A new kernel configuration parameter, CONFIG_PROVE_RCU, enables
    rcu_dereference_check(). This depends on CONFIG_PROVE_LOCKING,
    and should be quite helpful during the transition period while
    CONFIG_PROVE_RCU-unaware patches are in flight.

    The existing rcu_dereference() primitive does no checking, but
    upcoming patches will change that.
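
    For example, an access that is legal under either RCU or a lock might
    be written as follows (gp and my_lock are illustrative names):

        /* OK within rcu_read_lock() or while holding my_lock; under
         * CONFIG_PROVE_RCU, lockdep complains about anything else. */
        p = rcu_dereference_check(gp,
                                  rcu_read_lock_held() ||
                                  lockdep_is_held(&my_lock));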

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

16 Jan, 2010

1 commit

  • Rename local variable "i" in rcu_init() to avoid conflict with
    RCU_INIT_FLAVOR(), restrict the scope of RCU_TREE_NONCORE, and
    make __synchronize_srcu() static.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

26 Oct, 2009

1 commit

  • This patch creates a synchronize_srcu_expedited() that uses
    synchronize_sched_expedited() where synchronize_srcu()
    uses synchronize_sched(). The synchronize_srcu() and
    synchronize_srcu_expedited() functions become one-liners that
    pass synchronize_sched() or synchronize_sched_expedited(),
    respectively, to a new __synchronize_srcu() function.
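
    The resulting wrappers, sketched on the assumption (as described
    above) that __synchronize_srcu() takes the grace-period function as
    its second argument:

        void synchronize_srcu(struct srcu_struct *sp)
        {
                __synchronize_srcu(sp, synchronize_sched);
        }
        EXPORT_SYMBOL_GPL(synchronize_srcu);

        void synchronize_srcu_expedited(struct srcu_struct *sp)
        {
                __synchronize_srcu(sp, synchronize_sched_expedited);
        }
        EXPORT_SYMBOL_GPL(synchronize_srcu_expedited);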

    While in the file, move the EXPORT_SYMBOL_GPL()s to immediately
    follow the corresponding functions.

    Requested-by: Avi Kivity
    Tested-by: Marcelo Tosatti
    Signed-off-by: Paul E. McKenney
    Acked-by: Josh Triplett
    Reviewed-by: Lai Jiangshan
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: avi@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

04 Oct, 2006

2 commits

  • Currently the init_srcu_struct() routine has no way to report out-of-memory
    errors. This patch (as761) makes it return -ENOMEM when the per-cpu data
    allocation fails.
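
    A hedged sketch of the change (field names per the then-current
    srcu_struct):

        int init_srcu_struct(struct srcu_struct *sp)
        {
                sp->completed = 0;
                mutex_init(&sp->mutex);
                sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
                return sp->per_cpu_ref ? 0 : -ENOMEM; /* was a void return */
        }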

    The patch also makes srcu_init_notifier_head() report a BUG if a notifier
    head can't be initialized. Perhaps it should return -ENOMEM instead, but
    in the most likely cases where this might occur I don't think any recovery
    is possible. Notifier chains generally are not created dynamically.

    [akpm@osdl.org: avoid statement-with-side-effect in macro]
    Signed-off-by: Alan Stern
    Acked-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Stern
     
  • Updated patch adding a variant of RCU that permits sleeping in read-side
    critical sections. SRCU is as follows:

    o Each use of SRCU creates its own srcu_struct, and each
    srcu_struct has its own set of grace periods. This is
    critical, as it prevents one subsystem with a blocking
    reader from holding up SRCU grace periods for other
    subsystems.

    o The SRCU primitives (srcu_read_lock(), srcu_read_unlock(),
    and synchronize_srcu()) all take a pointer to a srcu_struct.

    o The SRCU primitives must be called from process context.

    o srcu_read_lock() returns an int that must be passed to
    the matching srcu_read_unlock(), as shown in the sketch
    after this list. Realtime RCU avoids the
    need for this by storing the state in the task struct,
    but SRCU needs to allow a given code path to pass through
    multiple SRCU domains -- storing state in the task struct
    would therefore require either arbitrary space in the
    task struct or arbitrary limits on SRCU nesting. So I
    kicked the state-storage problem up to the caller.

    Of course, it is not permitted to call synchronize_srcu()
    while in an SRCU read-side critical section.

    o There is no call_srcu(). It would not be hard to implement
    one, but it seems like too easy a way to OOM the system.
    (Hey, we have enough trouble with call_rcu(), which does
    -not- permit readers to sleep!!!) So, if you want it,
    please tell me why...
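
    Putting the interface together, a minimal usage sketch (my_srcu is an
    illustrative, already-initialized srcu_struct):

        static struct srcu_struct my_srcu;  /* one domain per subsystem */

        /* Reader: may sleep inside the critical section. */
        int idx = srcu_read_lock(&my_srcu);
        /* ... read-side critical section, may block ... */
        srcu_read_unlock(&my_srcu, idx);

        /* Updater, from process context, never from within a reader: */
        synchronize_srcu(&my_srcu);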

    [josht@us.ibm.com: sparse notation]
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Josh Triplett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney