27 Mar, 2019

1 commit

  • The cleanup_srcu_struct_quiesced() function was added because NVME
    used WQ_MEM_RECLAIM workqueues and SRCU did not, which meant that
    NVME workqueues waiting on SRCU workqueues could result in deadlocks
    during low-memory conditions. However, SRCU now also has WQ_MEM_RECLAIM
    workqueues, so there is no longer a potential for deadlock. Furthermore,
    it turns out to be extremely hard to use cleanup_srcu_struct_quiesced()
    correctly due to the fact that SRCU callback invocation accesses the
    srcu_struct structure's per-CPU data area just after callbacks are
    invoked. Therefore, the usual practice of using srcu_barrier() to wait
    for callbacks to be invoked before invoking cleanup_srcu_struct_quiesced()
    fails because SRCU's callback-invocation workqueue handler might be
    delayed, which can result in cleanup_srcu_struct_quiesced() being invoked
    (and thus freeing the per-CPU data) before SRCU's callback-invocation
    workqueue handler is finished using that per-CPU data. Nor is this a
    theoretical problem: KASAN emitted use-after-free warnings because of
    this problem on actual runs.

    In short, NVME can now safely invoke cleanup_srcu_struct(), which
    avoids the use-after-free scenario. And cleanup_srcu_struct_quiesced()
    is quite difficult to use safely. This commit therefore removes
    cleanup_srcu_struct_quiesced(), switching its sole user back to
    cleanup_srcu_struct(). This effectively reverts the following pair
    of commits:

    f7194ac32ca2 ("srcu: Add cleanup_srcu_struct_quiesced()")
    4317228ad9b8 ("nvme: Avoid flush dependency in delete controller flow")

    Reported-by: Bart Van Assche
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Bart Van Assche
    Tested-by: Bart Van Assche

    Paul E. McKenney
     

10 Feb, 2019

2 commits


26 Jan, 2019

1 commit


28 Nov, 2018

1 commit

  • In RCU, the distinction between "rsp", "rnp", and "rdp" has served well
    for a great many years, but in SRCU, "sp" vs. "sdp" has proven confusing.
    This commit therefore renames SRCU's "sp" pointers to "ssp", so that there
    is "ssp" for srcu_struct pointer, "snp" for srcu_node pointer, and "sdp"
    for srcu_data pointer.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

09 Nov, 2018

1 commit

  • Fix kernel-doc warnings for missing parameter descriptions:

    ../include/linux/srcu.h:175: warning: Function parameter or member 'p' not described in 'srcu_dereference_notrace'
    ../include/linux/srcu.h:175: warning: Function parameter or member 'sp' not described in 'srcu_dereference_notrace'
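
    For reference, a kernel-doc comment of roughly this shape would silence
    the warnings (description wording is illustrative, not quoted from the
    actual fix):

        /**
         * srcu_dereference_notrace - no-tracing version of srcu_dereference
         * @p: the pointer to fetch and protect for later dereferencing
         * @sp: pointer to the srcu_struct, which is used to check that we
         *      really are in an SRCU read-side critical section.
         */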

    Fixes: 0b764a6e4e19d ("srcu: Add notrace variant of srcu_dereference")

    Signed-off-by: Randy Dunlap
    Cc: Lai Jiangshan
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Steven Rostedt
    Cc: Mathieu Desnoyers
    Cc: Joel Fernandes (Google)
    Signed-off-by: Paul E. McKenney

    Randy Dunlap
     

26 Jul, 2018

2 commits

  • In the last patch in this series, we are making lockdep register hooks
    onto the irq_{disable,enable} tracepoints. These tracepoints use the
    _rcuidle tracepoint variant. In this series we switch the _rcuidle
    tracepoint callers to use SRCU instead of sched-RCU. In order to
    dereference the pointer to the probe functions, we could call
    srcu_dereference, however this API will call back into lockdep to check
    if the lock is held *before* the lockdep probe hooks have a chance to
    run and annotate the IRQ enabled/disabled state.

    For this reason we need a notrace variant of srcu_dereference since
    otherwise we get lockdep splats. This patch adds the needed
    srcu_dereference_notrace variant.
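
    A minimal sketch of such a variant: it reuses srcu_dereference_check()
    with an always-true condition, so lockdep is never consulted (close to,
    though not necessarily identical to, the macro added here):

        /* Like srcu_dereference(), but skip the lockdep-held check. */
        #define srcu_dereference_notrace(p, sp) \
                srcu_dereference_check((p), (sp), 1)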

    Link: http://lkml.kernel.org/r/20180628182149.226164-3-joel@joelfernandes.org

    Reviewed-by: Paul E. McKenney
    Signed-off-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)

    Joel Fernandes (Google)
     
  • This is needed for a future tracepoint patch that uses srcu, and to make
    sure it doesn't call into lockdep.

    Tracepoint code already calls the notrace variants for
    rcu_read_lock_sched(), so this patch does the same for SRCU, which will
    be used in a later patch. This keeps it consistent with rcu-sched.
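
    A minimal sketch of what such notrace variants might look like: thin
    wrappers around __srcu_read_lock()/__srcu_read_unlock() that skip the
    lockdep/RCU annotations (names mirror the rcu-sched notrace helpers;
    details may differ from the actual patch):

        static inline notrace int
        srcu_read_lock_notrace(struct srcu_struct *sp) __acquires(sp)
        {
                return __srcu_read_lock(sp);    /* no rcu_lock_acquire() */
        }

        static inline notrace void
        srcu_read_unlock_notrace(struct srcu_struct *sp, int idx) __releases(sp)
        {
                __srcu_read_unlock(sp, idx);    /* no rcu_lock_release() */
        }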

    [Joel: Added commit message]
    Link: http://lkml.kernel.org/r/20180628182149.226164-2-joel@joelfernandes.org

    Reviewed-by: Steven Rostedt (VMware)
    Signed-off-by: Paul McKenney
    Signed-off-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)

    Paul McKenney
     

16 May, 2018

1 commit

  • The current cleanup_srcu_struct() flushes work, which prevents it
    from being invoked from some workqueue contexts, as well as from
    atomic (non-blocking) contexts. This patch therefore introduces a
    cleanup_srcu_struct_quiesced(), which can be invoked only after all
    activity on the specified srcu_struct has completed. This restriction
    allows cleanup_srcu_struct_quiesced() to be invoked from workqueue
    contexts as well as from atomic contexts.

    Suggested-by: Christoph Hellwig
    Signed-off-by: Paul E. McKenney
    Tested-by: Nitzan Carmi
    Tested-by: Nicholas Piggin

    Paul E. McKenney
     

18 Jan, 2018

1 commit

  • These users of lockdep_is_held() either wanted lockdep_is_held to
    take a const pointer, or would benefit from providing a const pointer.

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: "David S. Miller"
    Link: https://lkml.kernel.org/r/20180117151414.23686-4-willy@infradead.org

    Matthew Wilcox
     

20 Oct, 2017

1 commit


09 Jun, 2017

2 commits

  • Classic SRCU was only ever intended to be a fallback in case of issues
    with Tree/Tiny SRCU, and the latter two are doing quite well in testing.
    This commit therefore removes Classic SRCU.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The call_srcu() docbook entry is currently in include/linux/srcu.h,
    which causes needless processing for each include point. This commit
    therefore moves this entry to kernel/rcu/srcutree.c, which the compiler
    reads only once. In addition, the srcu_batches_completed() function is
    used only within RCU and its torture-test suites. This commit therefore
    also moves this function's declaration from include/linux/srcutiny.h,
    include/linux/srcutree.h, and include/linux/srcuclassic.h to
    kernel/rcu/rcu.h.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

08 Jun, 2017

2 commits

  • Commit d160a727c40e ("srcu: Make SRCU be built by default") made SRCU
    be built unconditionally in response to build errors, which were caused
    by code that included srcu.h despite !SRCU. However, srcutiny.o is
    almost 2K of code, which is not
    insignificant for those attempting to run the Linux kernel on IoT devices.
    This commit therefore makes SRCU be once again optional, and adjusts
    srcu.h to allow error-free inclusion in !SRCU kernel builds.

    Signed-off-by: Paul E. McKenney
    Acked-by: Nicolas Pitre

    Paul E. McKenney
     
  • Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
    down a guest running iperf on a VFIO assigned device. This happens
    because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
    context, while a worker thread does the same inside kvm_set_irq(). If the
    interrupt happens while the worker thread is executing __srcu_read_lock(),
    updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
    ->srcu_lock_count[] field can be lost.

    The docs say you are not supposed to call srcu_read_lock() and
    srcu_read_unlock() from irq context, but KVM interrupt injection happens
    from (host) interrupt context and it would be nice if SRCU supported the
    use case. KVM is using SRCU here not really for the "sleepable" part,
    but rather due to its IPI-free fast detection of grace periods. It is
    therefore not desirable to switch back to RCU, which would effectively
    revert commit 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
    2014-01-16).

    However, the docs are overly conservative. You can have an SRCU instance
    that has users only in irq context, and you can mix process and irq context
    as long as process context users disable interrupts. In addition,
    __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
    Classic SRCU. For those two implementations, only srcu_read_lock()
    is unsafe.

    When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
    in commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via
    this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
    Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
    the caller. Tree SRCU however only does one increment, so on most
    architectures it is more efficient for __srcu_read_lock() to use
    this_cpu_inc(), and any performance differences appear to be down in
    the noise.
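
    A sketch of the Tree SRCU reader fast path after this change (field and
    helper names approximate the code of that era):

        int __srcu_read_lock(struct srcu_struct *sp)
        {
                int idx;

                idx = READ_ONCE(sp->srcu_idx) & 0x1;
                this_cpu_inc(sp->sda->srcu_lock_count[idx]); /* irq-safe on most arches */
                smp_mb(); /* B */  /* Keep the critical section after the increment. */
                return idx;
        }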

    Cc: stable@vger.kernel.org
    Fixes: 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
    Reported-by: Linu Cherian
    Suggested-by: Linu Cherian
    Cc: kvm@vger.kernel.org
    Signed-off-by: Paolo Bonzini
    Cc: Linus Torvalds
    Signed-off-by: Paul E. McKenney

    Paolo Bonzini
     

19 Apr, 2017

7 commits

  • The TREE_SRCU rewrite is large and a bit on the non-simple side, so
    this commit helps reduce risk by allowing the old v4.11 SRCU algorithm
    to be selected using a new CLASSIC_SRCU Kconfig option that depends
    on RCU_EXPERT. The default is to use the new TREE_SRCU and TINY_SRCU
    algorithms, in order to help get these the testing that they need.
    However, if your users do not require the update-side scalability that
    is to be provided by TREE_SRCU, select RCU_EXPERT and then CLASSIC_SRCU
    to revert back to the old classic SRCU algorithm.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • In response to automated complaints about modifications to SRCU
    increasing its size, this commit creates a tiny SRCU that is
    used in SMP=n && PREEMPT=n builds.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • SRCU's implementation of expedited grace periods has always assumed
    that the SRCU instance is idle when the expedited request arrives.
    This commit improves this a bit by maintaining a count of the number
    of outstanding expedited requests, thus allowing prior non-expedited
    grace periods to accommodate these requests by shifting to expedited mode.
    However, any non-expedited wait already in progress will still wait for
    the full duration.

    Improved control of expedited grace periods is planned, but one step
    at a time.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Updating ->srcu_state and ->srcu_gp_seq will lead to extremely complex
    race conditions given multiple callback queues, so this commit takes
    advantage of the two-bit state now available in rcu_seq counters to
    store the state in the bottom two bits of ->srcu_gp_seq.
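
    A sketch of the resulting layout, along the lines of the rcu_seq helpers
    in kernel/rcu/rcu.h (exact names assumed from mainline, not quoted from
    this commit):

        #define RCU_SEQ_CTR_SHIFT   2
        #define RCU_SEQ_STATE_MASK  ((1 << RCU_SEQ_CTR_SHIFT) - 1)

        /* Grace-period counter: upper bits of ->srcu_gp_seq. */
        static inline unsigned long rcu_seq_ctr(unsigned long s)
        {
                return s >> RCU_SEQ_CTR_SHIFT;
        }

        /* Grace-period state: bottom two bits of ->srcu_gp_seq. */
        static inline int rcu_seq_state(unsigned long s)
        {
                return s & RCU_SEQ_STATE_MASK;
        }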

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit switches SRCU from custom-built callback queues to the new
    rcu_segcblist structure. This change associates grace-period sequence
    numbers with groups of callbacks, which will be needed for efficient
    processing of per-CPU callbacks.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit adds grace-period sequence numbers, which will be used to
    handle mid-boot grace periods and per-CPU callback lists.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The current SRCU grace-period processing might never reach the last
    portion of srcu_advance_batches(). This is OK given the current
    implementation, as the first portion, up to the try_check_zero()
    following the srcu_flip(), is sufficient to drive grace periods forward.
    However, it has the unfortunate side-effect of making it impossible to
    determine when a given grace period has ended, and it will be necessary
    to efficiently trace ends of grace periods in order to efficiently handle
    per-CPU SRCU callback lists.

    This commit therefore adds states to the SRCU grace-period processing,
    so that the end of a given SRCU grace period is marked by the transition
    to the SRCU_STATE_DONE state.
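
    For reference, a sketch of the grace-period states involved (values
    illustrative; SRCU_STATE_DONE is named in the text above, the others
    are assumed from the SRCU code of that era):

        #define SRCU_STATE_IDLE   0   /* No grace period in progress. */
        #define SRCU_STATE_SCAN1  1   /* Scanning readers on the old index. */
        #define SRCU_STATE_SCAN2  2   /* Scanning readers after the flip. */
        #define SRCU_STATE_DONE   3   /* Grace period has ended. */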

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

26 Jan, 2017

1 commit

  • SRCU uses two per-cpu counters: a nesting counter to count the number of
    active critical sections, and a sequence counter to ensure that the nesting
    counters don't change while they are being added together in
    srcu_readers_active_idx_check().

    This patch instead uses per-cpu lock and unlock counters. Because both
    counters only increase and srcu_readers_active_idx_check() reads the unlock
    counter before the lock counter, this achieves the same end without having
    to increment two different counters in srcu_read_lock(). This also saves a
    smp_mb() in srcu_readers_active_idx_check().

    Possible bug: There is no guarantee that the lock counter won't overflow
    during srcu_readers_active_idx_check(), as there are no memory barriers
    around srcu_flip() (see comment in srcu_readers_active_idx_check() for
    details). However, this problem was already present before this patch.
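
    A simplified sketch of the check described above (helper names follow
    the patch's lock/unlock-counter naming; details are approximate):

        static bool srcu_readers_active_idx_check(struct srcu_struct *sp, int idx)
        {
                unsigned long unlocks;

                unlocks = srcu_readers_unlock_idx(sp, idx); /* sum ->unlock_count[idx] */

                /*
                 * Read the unlock counts before the lock counts: a reader
                 * counted in the lock sum but not in the unlock sum is then
                 * known to still be active.
                 */
                smp_mb();
                return srcu_readers_lock_idx(sp, idx) == unlocks;
        }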

    Suggested-by: Mathieu Desnoyers
    Signed-off-by: Lance Roy
    Cc: Paul E. McKenney
    Cc: Lai Jiangshan
    Cc: Peter Zijlstra
    Signed-off-by: Paul E. McKenney

    Lance Roy
     

24 Feb, 2016

1 commit

  • SRCU uses per-CPU variables, and DEFINE_STATIC_SRCU() uses a static
    per-CPU variable. However, per-CPU variables have significant
    restrictions, for example, names of per-CPU variables must be globally
    unique, even if declared static. These restrictions carry over to
    DEFINE_STATIC_SRCU(), and this commit therefore documents these
    restrictions.

    Reported-by: Stephen Rothwell
    Reported-by: kbuild test robot
    Suggested-by: Boqun Feng
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Tejun Heo

    Paul E. McKenney
     

07 Oct, 2015

1 commit

  • Currently, __srcu_read_lock() cannot be invoked from restricted
    environments because it contains calls to preempt_disable() and
    preempt_enable(), both of which can invoke lockdep, which is a bad
    idea in some restricted execution modes. This commit therefore moves
    the preempt_disable() and preempt_enable() from __srcu_read_lock()
    to srcu_read_lock(). It also inserts the preempt_disable() and
    preempt_enable() around the call to __srcu_read_lock() in do_exit().
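
    A sketch of the resulting structure (simplified; the lockdep annotation
    in srcu_read_lock() is omitted):

        static inline int srcu_read_lock(struct srcu_struct *sp)
        {
                int retval;

                preempt_disable();              /* moved out of __srcu_read_lock() */
                retval = __srcu_read_lock(sp);  /* no preempt or lockdep calls inside */
                preempt_enable();
                return retval;
        }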

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

04 Mar, 2015

1 commit

  • The rcu_dereference_check() family of primitives evaluates the RCU
    lockdep expression first, and only then evaluates the expression passed
    in. This works fine normally, but can potentially fail in environments
    (such as NMI handlers) where lockdep cannot be invoked. The problem is
    that even if the expression passed in is "1", the compiler would need to
    prove that the RCU lockdep expression (rcu_read_lock_held(), for example)
    is free of side effects in order to be able to elide it. Given that
    rcu_read_lock_held() is sometimes separately compiled, the compiler cannot
    always use this optimization.

    This commit therefore reverses the order of evaluation, so that the
    expression passed in is evaluated first, and the RCU lockdep expression is
    evaluated only if the passed-in expression evaluated to false, courtesy
    of the C-language short-circuit boolean evaluation rules. This compels
    the compiler to forego executing the RCU lockdep expression in cases
    where the passed-in expression evaluates to "1" at compile time, so that
    (for example) rcu_dereference_raw() can be guaranteed to execute safely
    within an NMI handler.
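
    A sketch of the reordering for the SRCU variant (close to the mainline
    macros; the rest of the rcu_dereference_check() family follows the same
    pattern):

        /* Before: the lockdep expression is evaluated unconditionally. */
        #define srcu_dereference_check(p, sp, c) \
                __rcu_dereference_check((p), srcu_read_lock_held(sp) || (c), __rcu)

        /* After: the caller's condition short-circuits the lockdep call. */
        #define srcu_dereference_check(p, sp, c) \
                __rcu_dereference_check((p), (c) || srcu_read_lock_held(sp), __rcu)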

    Signed-off-by: Paul E. McKenney
    Acked-by: Peter Zijlstra (Intel)

    Paul E. McKenney
     

07 Jan, 2015

2 commits


18 Feb, 2014

1 commit

  • All of the RCU source files have the usual GPL header, which contains a
    long-obsolete postal address for FSF. To avoid the need to track the
    FSF office's movements, this commit substitutes the URL where GPL may
    be found.

    Reported-by: Greg KH
    Reported-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

06 Nov, 2013

1 commit

  • SRCU read lock/unlock include a full memory barrier, but that is an
    implementation detail. Add an API that makes the memory fencing explicit
    for users that need this barrier, so that we can change the
    implementation as needed without breaking all users.
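
    A sketch of such an API, assuming the helper added here is
    smp_mb__after_srcu_read_unlock() as in mainline (name and body reflect
    my reading of the mainline code, not text from this patch):

        /* Provide a full barrier after srcu_read_unlock(); currently a
         * no-op because __srcu_read_unlock() already implies smp_mb(). */
        static inline void smp_mb__after_srcu_read_unlock(void)
        {
                /* __srcu_read_unlock() has smp_mb(). */
        }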

    Acked-by: "Paul E. McKenney"
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Gleb Natapov

    Michael S. Tsirkin
     

11 Jun, 2013

1 commit


08 Feb, 2013

2 commits

  • SRCU has its own statemachine and no longer relies on normal RCU.
    Its read-side critical section can now be used by an offline CPU, so this
    commit removes the check and the comments, reverting the SRCU portion
    of ff195cb6 (rcu: Warn when srcu_read_lock() is used in an extended
    quiescent state).

    It also makes the code match the comments in whatisRCU.txt:

    g. Do you need read-side critical sections that are respected
    even though they are in the middle of the idle loop, during
    user-mode execution, or on an offlined CPU? If so, SRCU is the
    only choice that will work for you.

    [ paulmck: There is at least one remaining issue, namely use of lockdep
    with tracing enabled. ]

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • SRCU has its own statemachine and no longer relies on normal RCU.
    Its read-side critical section can now be used by an offline CPU, so this
    commit removes the check and the comments, reverting the SRCU portion
    of c0d6d01b (rcu: Check for illegal use of RCU from offlined CPUs).

    It also makes the code match the comments in whatisRCU.txt:

    g. Do you need read-side critical sections that are respected
    even though they are in the middle of the idle loop, during
    user-mode execution, or on an offlined CPU? If so, SRCU is the
    only choice that will work for you.

    [ paulmck: There is at least one remaining issue, namely use of lockdep
    with tracing enabled. ]

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     

28 Oct, 2012

1 commit

  • In the old days, we had two different API sets for dynamically allocated
    per-CPU data and DEFINE_PER_CPU()-defined per-CPU data, and because SRCU
    used dynamically allocated per-CPU data, its srcu_struct structures could
    not be declared statically. This commit therefore introduces DEFINE_SRCU()
    and DEFINE_STATIC_SRCU() to allow statically declared SRCU structures,
    using the new static per-CPU interfaces.
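
    A brief usage sketch (identifiers hypothetical):

        DEFINE_STATIC_SRCU(my_srcu);    /* file scope, statically initialized */

        static void reader(void)
        {
                int idx = srcu_read_lock(&my_srcu);
                /* ... read-side critical section ... */
                srcu_read_unlock(&my_srcu, idx);
        }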

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney
    [ paulmck: Updated for __DELAYED_WORK_INITIALIZER() added argument,
    fixed whitespace issue. ]

    Lai Jiangshan
     

24 Oct, 2012

2 commits


01 May, 2012

4 commits

  • This commit implements an SRCU state machine in support of call_srcu().
    The state machine is preemptible, light-weight, and single-threaded,
    minimizing synchronization overhead. In particular, there is no longer
    any need for synchronize_srcu() to be guarded by a mutex.

    Expedited processing is handled, at least in the absence of concurrent
    grace-period operations on that same srcu_struct structure, by having
    the synchronize_srcu_expedited() thread take on the role of the
    workqueue thread for one iteration.

    There is a reasonable probability that a given SRCU callback will
    be invoked on the same CPU that registered it, however, there is no
    guarantee. Concurrent SRCU grace-period primitives can cause callbacks
    to be executed elsewhere, even in the absence of CPU-hotplug operations.

    Callbacks execute in process context, but under the influence of
    local_bh_disable(), so it is illegal to sleep in an SRCU callback
    function.
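
    A brief usage sketch of call_srcu() under that constraint (identifiers
    hypothetical; my_srcu is assumed to be defined elsewhere):

        struct my_obj {
                struct rcu_head rh;
                /* ... */
        };

        static void my_obj_free_cb(struct rcu_head *rh)
        {
                /* Runs in process context under local_bh_disable(): no sleeping. */
                kfree(container_of(rh, struct my_obj, rh));
        }

        static void my_obj_destroy(struct my_obj *obj)
        {
                /* After removing all paths by which readers could reach obj: */
                call_srcu(&my_srcu, &obj->rh, my_obj_free_cb);
        }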

    Signed-off-by: Lai Jiangshan
    Acked-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • The old srcu_barrier() macro is now unused. This commit removes it so
    that it may be used for the SRCU flavor of rcu_barrier(), which will in
    turn be needed to allow the upcoming call_srcu() to be used from within
    modules.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • This commit implements a variant of Peter's algorithm, which may be found
    at https://lkml.org/lkml/2012/2/1/119.

    o Make the checking lock-free to enable parallel checking.
    Parallel checking is required when (1) the original checking
    task is preempted for a long time, (2) synchronize_srcu_expedited()
    starts during an ongoing SRCU grace period, or (3) we wish to
    avoid acquiring a lock.

    o Since the checking is lock-free, we avoid a mutex in state machine
    for call_srcu().

    o Remove the SRCU_REF_MASK and remove the coupling with the flipping.
    This might allow us to remove the preempt_disable() in future
    versions, though such removal will need great care because it
    rescinds the one-old-reader-per-CPU guarantee.

    o Remove a smp_mb(), simplify the comments and make the smp_mb() pairs
    more intuitive.

    Inspired-by: Peter Zijlstra
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     
  • The purpose of the upper bit of SRCU's per-CPU counters is to guarantee
    that no reasonable series of srcu_read_lock() and srcu_read_unlock()
    operations can return the value of the counter to its original value.
    This guarantee is required only after the index has been switched to
    the other set of counters, so at most one srcu_read_lock() can affect
    a given CPU's counter. The number of srcu_read_unlock() operations
    on a given counter is limited to the number of tasks in the system,
    which given the Linux kernel's current structure is limited to far less
    than 2^30 on 32-bit systems and far less than 2^62 on 64-bit systems.
    (Something about a limited number of bytes in the kernel's address space.)

    Therefore, if srcu_read_lock() increments the upper bits, then
    srcu_read_unlock() need not do so. In this case, an srcu_read_lock() and
    an srcu_read_unlock() will flip the lower bit of the upper field of the
    counter. An unreasonably large additional number of srcu_read_unlock()
    operations would be required to return the counter to its initial value,
    thus preserving the guarantee.

    This commit takes this approach, which further allows it to shrink
    the size of the upper field to one bit, making the number of
    srcu_read_unlock() operations required to return the counter to its
    initial value even more unreasonable than before.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan