27 Mar, 2019

1 commit

  • The cleanup_srcu_struct_quiesced() function was added because NVME
    used WQ_MEM_RECLAIM workqueues and SRCU did not, which meant that
    NVME workqueues waiting on SRCU workqueues could result in deadlocks
    during low-memory conditions. However, SRCU now also has WQ_MEM_RECLAIM
    workqueues, so there is no longer a potential for deadlock. Furthermore,
    it turns out to be extremely hard to use cleanup_srcu_struct_quiesced()
    correctly due to the fact that SRCU callback invocation accesses the
    srcu_struct structure's per-CPU data area just after callbacks are
    invoked. Therefore, the usual practice of using srcu_barrier() to wait
    for callbacks to be invoked before invoking cleanup_srcu_struct_quiesced()
    fails because SRCU's callback-invocation workqueue handler might be
    delayed, which can result in cleanup_srcu_struct_quiesced() being invoked
    (and thus freeing the per-CPU data) before the SRCU's callback-invocation
    workqueue handler is finished using that per-CPU data. Nor is this a
    theoretical problem: KASAN emitted use-after-free warnings because of
    this problem on actual runs.

    In short, NVME can now safely invoke cleanup_srcu_struct(), which
    avoids the use-after-free scenario. And cleanup_srcu_struct_quiesced()
    is quite difficult to use safely. This commit therefore removes
    cleanup_srcu_struct_quiesced(), switching its sole user back to
    cleanup_srcu_struct(). This effectively reverts the following pair
    of commits:

    f7194ac32ca2 ("srcu: Add cleanup_srcu_struct_quiesced()")
    4317228ad9b8 ("nvme: Avoid flush dependency in delete controller flow")

    Reported-by: Bart Van Assche
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Bart Van Assche
    Tested-by: Bart Van Assche

    Paul E. McKenney
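
The ordering problem described in the commit above can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions: every `demo_` name is hypothetical, the "free" is a flag rather than a real deallocation, and the workqueue handler is split by hand into the part that invokes callbacks and the part that touches the per-CPU data afterwards. It is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU srcu_data area. */
struct demo_sdp {
	bool freed;		/* set once cleanup has released the area */
	int uaf_hits;		/* accesses made after the area was released */
};

/* Hypothetical stand-in for srcu_struct. */
struct demo_ssp {
	struct demo_sdp sda;
	bool callbacks_done;	/* the condition an srcu_barrier()-style wait observes */
	bool handler_done;	/* the condition a work flush observes */
};

/* First half of the handler: invoke the callbacks. */
static void demo_handler_invoke(struct demo_ssp *ssp)
{
	ssp->callbacks_done = true;
}

/* Second half of the handler: bookkeeping that still touches per-CPU data. */
static void demo_handler_finish(struct demo_ssp *ssp)
{
	if (ssp->sda.freed)
		ssp->sda.uaf_hits++;	/* the access KASAN would flag */
	ssp->handler_done = true;
}

/* Models the quiesced variant: relies only on callbacks having been invoked. */
static bool demo_cleanup_quiesced(struct demo_ssp *ssp)
{
	if (!ssp->callbacks_done)
		return false;
	ssp->sda.freed = true;
	return true;
}

/* Models the flushing variant: lets a pending handler finish before freeing. */
static void demo_cleanup_flushing(struct demo_ssp *ssp)
{
	if (!ssp->handler_done)
		demo_handler_finish(ssp);	/* stand-in for flushing the work */
	ssp->sda.freed = true;
}
```

In this model, calling `demo_handler_invoke()`, then `demo_cleanup_quiesced()`, then `demo_handler_finish()` records one access to released data, whereas replacing the middle step with `demo_cleanup_flushing()` records none.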
     

10 Feb, 2019

1 commit


28 Nov, 2018

1 commit

  • In RCU, the distinction between "rsp", "rnp", and "rdp" has served well
    for a great many years, but in SRCU, "sp" vs. "sdp" has proven confusing.
    This commit therefore renames SRCU's "sp" pointers to "ssp", so that there
    is "ssp" for srcu_struct pointer, "snp" for srcu_node pointer, and "sdp"
    for srcu_data pointer.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

31 Aug, 2018

2 commits

  • Allocating a list_head structure that is almost never used, and, when
    used, is used only during early boot (rcu_init() and earlier), is a bit
    wasteful. This commit therefore eliminates that list_head in favor of
    the one in the work_struct structure. This is safe because the work_struct
    structure cannot be used until after rcu_init() returns.

    Reported-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Cc: Tejun Heo
    Cc: Lai Jiangshan
    Tested-by: Steven Rostedt (VMware)

    Paul E. McKenney
     
  • Event tracing is moving to SRCU in order to take advantage of the fact
    that SRCU may be safely used from idle and even offline CPUs. However,
    event tracing can invoke call_srcu() very early in the boot process,
    even before workqueue_init_early() is invoked (let alone rcu_init()).
    Therefore, call_srcu()'s attempts to queue work fail miserably.

    This commit therefore detects this situation, and refrains from attempting
    to queue work before rcu_init() time, but does everything else that it
    would have done, and in addition, adds the srcu_struct to a global list.
    The rcu_init() function now invokes a new srcu_init() function, which
    is empty if CONFIG_SRCU=n. Otherwise, srcu_init() queues work for
    each srcu_struct on the list. This all happens early enough in boot
    that there is but a single CPU with interrupts disabled, which allows
    synchronization to be dispensed with.

    Of course, the queued work won't actually be invoked until after
    workqueue_init() is invoked, which happens shortly after the scheduler
    is up and running. This means that although call_srcu() may be invoked
    any time after per-CPU variables have been set up, there is still a very
    narrow window when synchronize_srcu() won't work, and this window
    extends from the time that the scheduler starts until the time that
    workqueue_init() returns. This can be fixed in a manner similar to
    the fix for synchronize_rcu_expedited() and friends, but until someone
    actually needs to use synchronize_srcu() during this window, this fix
    is added churn for no benefit.

    Finally, note that Tree SRCU's new srcu_init() function invokes
    queue_work() rather than the queue_delayed_work() function that is
    invoked post-boot. The reason is that queue_delayed_work() will (as you
    would expect) post a timer, and timers have not yet been initialized.
    So use of queue_work() avoids the complaints about use of uninitialized
    spinlocks that would otherwise result. Besides, some delay is already
    provided by the aforementioned fact that the queued work won't actually
    be invoked until after the scheduler is up and running.

    Requested-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Tested-by: Steven Rostedt (VMware)

    Paul E. McKenney
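
The deferral scheme described in the commit above might be sketched as follows. This is a userspace model, not kernel code: all `demo_` names are hypothetical, `demo_queue_work()` merely counts invocations, and the absence of locking mirrors the commit's observation that only a single CPU is running at this point in boot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for srcu_struct; not the kernel's layout. */
struct demo_srcu {
	struct demo_srcu *boot_next;	/* link on the early-boot list */
	bool on_boot_list;		/* already recorded for demo_srcu_init()? */
	int callbacks_queued;		/* callbacks accepted so far */
	int work_queued;		/* times work was handed to the "workqueue" */
};

static bool demo_init_done;			/* models "rcu_init() has run" */
static struct demo_srcu *demo_boot_list;	/* structs awaiting demo_srcu_init() */

/* Stand-in for queue_work(); only legal once initialization has happened. */
static void demo_queue_work(struct demo_srcu *ssp)
{
	assert(demo_init_done);
	ssp->work_queued++;
}

/* Models call_srcu(): always accept the callback, defer the work if too early. */
static void demo_call_srcu(struct demo_srcu *ssp)
{
	ssp->callbacks_queued++;
	if (demo_init_done) {
		demo_queue_work(ssp);
	} else if (!ssp->on_boot_list) {
		ssp->on_boot_list = true;
		ssp->boot_next = demo_boot_list;
		demo_boot_list = ssp;
	}
}

/* Models srcu_init(): queue work once per struct recorded during early boot. */
static void demo_srcu_init(void)
{
	demo_init_done = true;
	while (demo_boot_list) {
		struct demo_srcu *ssp = demo_boot_list;

		demo_boot_list = ssp->boot_next;
		ssp->on_boot_list = false;
		demo_queue_work(ssp);
	}
}
```

Callbacks posted before `demo_srcu_init()` are accepted but queue no work; the init function then queues work exactly once for each recorded structure, and later calls queue work directly.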
     

20 Jun, 2018

1 commit

  • Since swait basically implemented exclusive waits only, make sure
    the API reflects that.

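    $ git grep -l -e "\<swake_up\>" \
                  -e "\<swait_event[^ (]*" \
                  -e "\<prepare_to_swait\>" | while read file;
    do
        sed -i -e 's/\<swake_up\>/&_one/g' \
               -e 's/\<swait_event[^ (]*/&_exclusive/g' \
               -e 's/\<prepare_to_swait\>/&_exclusive/g' $file;
    done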

    With a few manual touch-ups.

    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Acked-by: Linus Torvalds
    Cc: bigeasy@linutronix.de
    Cc: oleg@redhat.com
    Cc: paulmck@linux.vnet.ibm.com
    Cc: pbonzini@redhat.com
    Link: https://lkml.kernel.org/r/20180612083909.261946548@infradead.org

    Peter Zijlstra
     

16 May, 2018

1 commit

  • The current cleanup_srcu_struct() flushes work, which prevents it
    from being invoked from some workqueue contexts, as well as from
    atomic (non-blocking) contexts. This patch therefore introduces
    cleanup_srcu_struct_quiesced(), which can be invoked only after all
    activity on the specified srcu_struct has completed. This restriction
    allows cleanup_srcu_struct_quiesced() to be invoked from workqueue
    contexts as well as from atomic contexts.

    Suggested-by: Christoph Hellwig
    Signed-off-by: Paul E. McKenney
    Tested-by: Nitzan Carmi
    Tested-by: Nicholas Piggin

    Paul E. McKenney
     

25 Jul, 2017

1 commit

  • Other than lockdep support, Tiny RCU has no need for the
    scheduler status. However, Tiny SRCU will need this to control
    boot-time behavior independent of lockdep. Therefore, this commit
    moves rcu_scheduler_starting() from kernel/rcu/tiny_plugin.h to
    kernel/rcu/srcutiny.c. This in turn allows the complete removal of
    kernel/rcu/tiny_plugin.h.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

09 Jun, 2017

1 commit

  • The rcu_segcblist structure provides quite a bit of functionality, and
    Tiny SRCU needs almost none of it. So this commit replaces Tiny SRCU's
    uses of rcu_segcblist with a simple singly linked list with tail pointer.
    This change significantly reduces Tiny SRCU's memory footprint, more
    than making up for the growth caused by the creation of rcu_segcblist.c.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
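
A singly linked list with a tail pointer, as mentioned in the commit above, can be sketched in userspace C as shown here. The `demo_` names and the counting callback are hypothetical and exist only for illustration; the point is that the tail is kept as a pointer to the last `->next` field, so appending needs no list walk and no special case for an empty list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback node, loosely in the spirit of rcu_head. */
struct demo_cb {
	struct demo_cb *next;
	void (*func)(struct demo_cb *cb);
};

/* Callback list: head pointer plus a pointer to the last ->next field. */
struct demo_cblist {
	struct demo_cb *head;
	struct demo_cb **tail;	/* &head when empty, else &last->next */
};

static int demo_cb_invoked;		/* number of callbacks run so far */
static struct demo_cb *demo_cb_last;	/* most recently invoked callback */

/* Illustrative callback that just records that it ran. */
static void demo_count_cb(struct demo_cb *cb)
{
	demo_cb_invoked++;
	demo_cb_last = cb;
}

static void demo_cblist_init(struct demo_cblist *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

/* Append in constant time by writing through the tail pointer. */
static void demo_cblist_enqueue(struct demo_cblist *l, struct demo_cb *cb,
				void (*func)(struct demo_cb *cb))
{
	cb->next = NULL;
	cb->func = func;
	*l->tail = cb;
	l->tail = &cb->next;
}

/* Detach the whole list, then invoke each callback in queue order. */
static int demo_cblist_invoke_all(struct demo_cblist *l)
{
	struct demo_cb *cb = l->head;
	int n = 0;

	demo_cblist_init(l);
	while (cb) {
		struct demo_cb *next = cb->next;	/* read before func may reuse cb */

		cb->func(cb);
		cb = next;
		n++;
	}
	return n;
}
```

Detaching the list before invoking anything means a callback that enqueues a new callback adds it to the fresh, empty list rather than to the one being walked.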
     

08 Jun, 2017

2 commits

  • In Tiny SRCU, __srcu_read_lock() is a trivial function, outweighed by
    its EXPORT_SYMBOL_GPL(), and on many architectures, its call sequence.
    This commit therefore moves it to srcutiny.h so that it can be inlined.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
    down a guest running iperf on a VFIO assigned device. This happens
    because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
    context, while a worker thread does the same inside kvm_set_irq(). If the
    interrupt happens while the worker thread is executing __srcu_read_lock(),
    updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
    ->srcu_lock_count[] field can be lost.

    The docs say you are not supposed to call srcu_read_lock() and
    srcu_read_unlock() from irq context, but KVM interrupt injection happens
    from (host) interrupt context and it would be nice if SRCU supported the
    use case. KVM is using SRCU here not really for the "sleepable" part,
    but rather due to its IPI-free fast detection of grace periods. It is
    therefore not desirable to switch back to RCU, which would effectively
    revert commit 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
    2014-01-16).

    However, the docs are overly conservative. You can have an SRCU instance
    that only has users in irq context, and you can mix process and irq context
    as long as process context users disable interrupts. In addition,
    __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
    Classic SRCU. For those two implementations, only srcu_read_lock()
    is unsafe.

    When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
    in commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via
    this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
    Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
    the caller. Tree SRCU however only does one increment, so on most
    architectures it is more efficient for __srcu_read_lock() to use
    this_cpu_inc(), and any performance differences appear to be down in
    the noise.

    Unlike Classic and Tree SRCU, Tiny SRCU does increments and decrements on
    a single variable. Therefore, as Peter Zijlstra pointed out, Tiny SRCU's
    implementation already supports mixed-context use of srcu_read_lock()
    and srcu_read_unlock(), at least as long as uses of srcu_read_lock()
    and srcu_read_unlock() in each handler are nested and paired properly.
    In other words, it is still illegal to (say) invoke srcu_read_lock()
    in an interrupt handler and to invoke the matching srcu_read_unlock()
    in a softirq handler. Therefore, the only change required for Tiny SRCU
    is to its comments.

    Fixes: 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
    Reported-by: Linu Cherian
    Suggested-by: Linu Cherian
    Cc: kvm@vger.kernel.org
    Signed-off-by: Paolo Bonzini
    Cc: Linus Torvalds
    Signed-off-by: Paul E. McKenney
    Tested-by: Paolo Bonzini

    Paolo Bonzini
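
The lost update described in the commit above can be reproduced deterministically in a userspace model. This is a sketch under stated assumptions: the `demo_` names are hypothetical, the "interrupt" is an ordinary function call placed between the load and the store, and the non-interrupt-safe increment is assumed to compile to separate load, add, and store steps, as it can on some architectures.

```c
#include <assert.h>

/* Hypothetical stand-in for one CPU's reader-entry counter. */
static unsigned long demo_lock_count;

/* Reader entry from the interrupt handler: one complete increment. */
static void demo_irq_read_lock(void)
{
	demo_lock_count++;
}

/*
 * Models a non-interrupt-safe increment as load, add, store.
 * If irq_in_middle is nonzero, the handler runs between load and store.
 */
static void demo_split_inc(int irq_in_middle)
{
	unsigned long tmp = demo_lock_count;	/* load */

	if (irq_in_middle)
		demo_irq_read_lock();
	demo_lock_count = tmp + 1;		/* store overwrites the handler's update */
}

/*
 * Models an interrupt-safe increment: the update is indivisible with
 * respect to interrupts on this CPU, so the handler can only run
 * before or after it (here, after).
 */
static void demo_irqsafe_inc(int irq_pending)
{
	demo_lock_count++;
	if (irq_pending)
		demo_irq_read_lock();
}
```

Starting from zero, `demo_split_inc(1)` leaves the counter at 1 even though two readers entered, while `demo_irqsafe_inc(1)` leaves it at 2.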
     

02 May, 2017

1 commit

  • Linus noticed that the <linux/rcu_segcblist.h> header has huge inline
    functions which should not be inline at all.

    As a first step in cleaning this up, move them all to kernel/rcu/ and
    only keep an absolute minimum of data type defines in the header:

    before: -rw-r--r-- 1 mingo mingo 22284 May 2 10:25 include/linux/rcu_segcblist.h
    after: -rw-r--r-- 1 mingo mingo 3180 May 2 10:22 include/linux/rcu_segcblist.h

    More can be done, such as uninlining the large functions, whose inlining
    is unjustified even if it's an RCU internal matter.

    Reported-by: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney

    Ingo Molnar
     

19 Apr, 2017

1 commit