16 Jun, 2016

2 commits

  • doc.2016.06.15a: Documentation updates
    fixes.2016.06.15b: Documentation updates
    torture.2016.06.14a: Documentation updates

    Paul E. McKenney
     
  • In many cases in the RCU tree code, we iterate over the set of cpus for
    a leaf node described by rcu_node::grplo and rcu_node::grphi, checking
    per-cpu data for each cpu in this range. However, if the set of possible
    cpus is sparse, some cpus described in this range are not possible, and
    thus no per-cpu region will have been allocated (or initialised) for
    them by the generic percpu code.

    Erroneous accesses to a per-cpu area for these !possible cpus may fault
    or may hit other data, depending on the address generated when the
    erroneous per-cpu offset is applied. In practice, both cases have been
    observed on arm64 hardware (the former being silent, but detectable with
    additional patches).

    To avoid issues resulting from this, we must iterate over the set of
    *possible* cpus for a given leaf node. This patch adds a new helper,
    for_each_leaf_node_possible_cpu, to enable this. As iteration is often
    intertwined with rcu_node local bitmask manipulation, a new
    leaf_node_cpu_bit helper is added to make this simpler and more
    consistent. The RCU tree code is made to use both of these where
    appropriate (a sketch of the two helpers appears after this entry).

    Without this patch, running reboot at a shell can result in an oops
    like:

    [ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c
    [ 3369.083881] pgd = ffffffc3ecdda000
    [ 3369.087270] [ffffff8008b21b4c] *pgd=00000083eca48003, *pud=00000083eca48003, *pmd=0000000000000000
    [ 3369.096222] Internal error: Oops: 96000007 [#1] PREEMPT SMP
    [ 3369.101781] Modules linked in:
    [ 3369.104825] CPU: 2 PID: 1817 Comm: NetworkManager Tainted: G W 4.6.0+ #3
    [ 3369.121239] task: ffffffc0fa13e000 ti: ffffffc3eb940000 task.ti: ffffffc3eb940000
    [ 3369.128708] PC is at sync_rcu_exp_select_cpus+0x188/0x510
    [ 3369.134094] LR is at sync_rcu_exp_select_cpus+0x104/0x510
    [ 3369.139479] pc : [] lr : [] pstate: 200001c5
    [ 3369.146860] sp : ffffffc3eb9435a0
    [ 3369.150162] x29: ffffffc3eb9435a0 x28: ffffff8008be4f88
    [ 3369.155465] x27: ffffff8008b66c80 x26: ffffffc3eceb2600
    [ 3369.160767] x25: 0000000000000001 x24: ffffff8008be4f88
    [ 3369.166070] x23: ffffff8008b51c3c x22: ffffff8008b66c80
    [ 3369.171371] x21: 0000000000000001 x20: ffffff8008b21b40
    [ 3369.176673] x19: ffffff8008b66c80 x18: 0000000000000000
    [ 3369.181975] x17: 0000007fa951a010 x16: ffffff80086a30f0
    [ 3369.187278] x15: 0000007fa9505590 x14: 0000000000000000
    [ 3369.192580] x13: ffffff8008b51000 x12: ffffffc3eb940000
    [ 3369.197882] x11: 0000000000000006 x10: ffffff8008b51b78
    [ 3369.203184] x9 : 0000000000000001 x8 : ffffff8008be4000
    [ 3369.208486] x7 : ffffff8008b21b40 x6 : 0000000000001003
    [ 3369.213788] x5 : 0000000000000000 x4 : ffffff8008b27280
    [ 3369.219090] x3 : ffffff8008b21b4c x2 : 0000000000000001
    [ 3369.224406] x1 : 0000000000000001 x0 : 0000000000000140
    ...
    [ 3369.972257] [] sync_rcu_exp_select_cpus+0x188/0x510
    [ 3369.978685] [] synchronize_rcu_expedited+0x64/0xa8
    [ 3369.985026] [] synchronize_net+0x24/0x30
    [ 3369.990499] [] dev_deactivate_many+0x28c/0x298
    [ 3369.996493] [] __dev_close_many+0x60/0xd0
    [ 3370.002052] [] __dev_close+0x28/0x40
    [ 3370.007178] [] __dev_change_flags+0x8c/0x158
    [ 3370.012999] [] dev_change_flags+0x20/0x60
    [ 3370.018558] [] do_setlink+0x288/0x918
    [ 3370.023771] [] rtnl_newlink+0x398/0x6a8
    [ 3370.029158] [] rtnetlink_rcv_msg+0xe4/0x220
    [ 3370.034891] [] netlink_rcv_skb+0xc4/0xf8
    [ 3370.040364] [] rtnetlink_rcv+0x2c/0x40
    [ 3370.045663] [] netlink_unicast+0x160/0x238
    [ 3370.051309] [] netlink_sendmsg+0x2f0/0x358
    [ 3370.056956] [] sock_sendmsg+0x18/0x30
    [ 3370.062168] [] ___sys_sendmsg+0x26c/0x280
    [ 3370.067728] [] __sys_sendmsg+0x44/0x88
    [ 3370.073027] [] SyS_sendmsg+0x10/0x20
    [ 3370.078153] [] el0_svc_naked+0x24/0x28

    Signed-off-by: Mark Rutland
    Reported-by: Dennis Chen
    Cc: Catalin Marinas
    Cc: Josh Triplett
    Cc: Lai Jiangshan
    Cc: Mathieu Desnoyers
    Cc: Steve Capper
    Cc: Steven Rostedt
    Cc: Will Deacon
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Paul E. McKenney

    Mark Rutland
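
    For reference, the two helpers described above look roughly like the
    following (a minimal sketch, assuming the rcu_node ->grplo/->grphi
    fields named above and iteration over cpu_possible_mask; the exact
    definitions in the patch may differ slightly):

        /*
         * Iterate only over *possible* CPUs in a leaf rcu_node's
         * [grplo, grphi] range, skipping CPUs with no per-cpu area.
         */
        #define for_each_leaf_node_possible_cpu(rnp, cpu) \
                for ((cpu) = cpumask_next((rnp)->grplo - 1, cpu_possible_mask); \
                     (cpu) <= (rnp)->grphi; \
                     (cpu) = cpumask_next((cpu), cpu_possible_mask))

        /* Bit for @cpu in the leaf node's CPU bitmasks (e.g. ->qsmask). */
        #define leaf_node_cpu_bit(rnp, cpu) (1UL << ((cpu) - (rnp)->grplo))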
     

15 Jun, 2016

2 commits


01 Apr, 2016

7 commits


15 Mar, 2016

1 commit


25 Feb, 2016

2 commits

  • As of commit dae6e64d2bcfd ("rcu: Introduce proper blocking to no-CBs kthreads
    GP waits") the RCU subsystem started making use of wait queues.

    Here we convert all additions of RCU wait queues to use simple wait queues,
    since they don't need the extra overhead of the full wait queue features.

    Originally this was done for RT kernels[1], since we would get things like...

    BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
    in_atomic(): 1, irqs_disabled(): 1, pid: 8, name: rcu_preempt
    Pid: 8, comm: rcu_preempt Not tainted
    Call Trace:
    [] __might_sleep+0xd0/0xf0
    [] rt_spin_lock+0x24/0x50
    [] __wake_up+0x36/0x70
    [] rcu_gp_kthread+0x4d2/0x680
    [] ? __init_waitqueue_head+0x50/0x50
    [] ? rcu_gp_fqs+0x80/0x80
    [] kthread+0xdb/0xe0
    [] ? finish_task_switch+0x52/0x100
    [] kernel_thread_helper+0x4/0x10
    [] ? __init_kthread_worker+0x60/0x60
    [] ? gs_change+0xb/0xb

    ...and hence simple wait queues were deployed on RT out of necessity
    (as simple wait uses a raw lock), but mainline might as well take
    advantage of the more streamlined support as well (a conversion sketch
    follows this entry).

    [1] This is a carry forward of work from v3.10-rt; the original conversion
    was by Thomas on an earlier -rt version, and Sebastian extended it to
    additional post-3.10 added RCU waiters; here I've added a commit log and
    unified the RCU changes into one, and uprev'd it to match mainline RCU.

    Signed-off-by: Daniel Wagner
    Acked-by: Peter Zijlstra (Intel)
    Cc: linux-rt-users@vger.kernel.org
    Cc: Boqun Feng
    Cc: Marcelo Tosatti
    Cc: Steven Rostedt
    Cc: Paul Gortmaker
    Cc: Paolo Bonzini
    Cc: "Paul E. McKenney"
    Link: http://lkml.kernel.org/r/1455871601-27484-6-git-send-email-wagi@monom.org
    Signed-off-by: Thomas Gleixner

    Paul Gortmaker
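
    Roughly, the conversion pattern looks like the following (a hedged
    sketch using the simple-wait API from <linux/swait.h>; the queue and
    condition names are illustrative, not the actual RCU field names):

        #include <linux/swait.h>

        /*
         * wait_queue_head_t          ->  struct swait_queue_head
         * init_waitqueue_head()      ->  init_swait_queue_head()
         * wait_event()               ->  swait_event()
         * wake_up() / wake_up_all()  ->  swake_up() / swake_up_all()
         */
        static struct swait_queue_head example_gp_wq;
        static bool example_gp_ready;

        static void example_init(void)
        {
                init_swait_queue_head(&example_gp_wq);
        }

        static void example_waiter(void)
        {
                /* Sleep until the condition becomes true. */
                swait_event(example_gp_wq, READ_ONCE(example_gp_ready));
        }

        static void example_waker(void)
        {
                WRITE_ONCE(example_gp_ready, true);
                /* Uses a raw lock internally, so usable where a sleeping
                 * lock (as on RT) would not be. */
                swake_up_all(&example_gp_wq);
        }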
     
  • rcu_nocb_gp_cleanup() is called while holding rnp->lock. Currently,
    this is okay because the wake_up_all() in rcu_nocb_gp_cleanup() will
    not enable the IRQs. lockdep is happy.

    By switching over to swait, this is no longer true. swake_up_all()
    enables the IRQs while processing the waiters. __do_softirq() can now
    run and will eventually call rcu_process_callbacks(), which wants to
    grab rnp->lock.

    Let's move the rcu_nocb_gp_cleanup() call outside the lock before we
    switch over to swait (a sketch of the reordering follows this entry).

    If we were to hold rnp->lock and use swait, lockdep reports the
    following:

    =================================
    [ INFO: inconsistent lock state ]
    4.2.0-rc5-00025-g9a73ba0 #136 Not tainted
    ---------------------------------
    inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
    rcu_preempt/8 [HC0[0]:SC0[0]:HE1:SE1] takes:
    (rcu_node_1){+.?...}, at: [] rcu_gp_kthread+0xb97/0xeb0
    {IN-SOFTIRQ-W} state was registered at:
    [] __lock_acquire+0xd5f/0x21e0
    [] lock_acquire+0xdf/0x2b0
    [] _raw_spin_lock_irqsave+0x59/0xa0
    [] rcu_process_callbacks+0x141/0x3c0
    [] __do_softirq+0x14d/0x670
    [] irq_exit+0x104/0x110
    [] smp_apic_timer_interrupt+0x46/0x60
    [] apic_timer_interrupt+0x70/0x80
    [] rq_attach_root+0xa6/0x100
    [] cpu_attach_domain+0x16d/0x650
    [] build_sched_domains+0x942/0xb00
    [] sched_init_smp+0x509/0x5c1
    [] kernel_init_freeable+0x172/0x28f
    [] kernel_init+0xe/0xe0
    [] ret_from_fork+0x3f/0x70
    irq event stamp: 76
    hardirqs last enabled at (75): [] _raw_spin_unlock_irq+0x30/0x60
    hardirqs last disabled at (76): [] _raw_spin_lock_irq+0x1f/0x90
    softirqs last enabled at (0): [] copy_process.part.26+0x602/0x1cf0
    softirqs last disabled at (0): [< (null)>] (null)
    other info that might help us debug this:
    Possible unsafe locking scenario:
    CPU0
    ----
    lock(rcu_node_1);

    lock(rcu_node_1);
    *** DEADLOCK ***
    1 lock held by rcu_preempt/8:
    #0: (rcu_node_1){+.?...}, at: [] rcu_gp_kthread+0xb97/0xeb0
    stack backtrace:
    CPU: 0 PID: 8 Comm: rcu_preempt Not tainted 4.2.0-rc5-00025-g9a73ba0 #136
    Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014
    0000000000000000 000000006d7e67d8 ffff881fb081fbd8 ffffffff818379e0
    0000000000000000 ffff881fb0812a00 ffff881fb081fc38 ffffffff8110813b
    0000000000000000 0000000000000001 ffff881f00000001 ffffffff8102fa4f
    Call Trace:
    [] dump_stack+0x4f/0x7b
    [] print_usage_bug+0x1db/0x1e0
    [] ? save_stack_trace+0x2f/0x50
    [] mark_lock+0x66d/0x6e0
    [] ? check_usage_forwards+0x150/0x150
    [] mark_held_locks+0x78/0xa0
    [] ? _raw_spin_unlock_irq+0x30/0x60
    [] trace_hardirqs_on_caller+0x168/0x220
    [] trace_hardirqs_on+0xd/0x10
    [] _raw_spin_unlock_irq+0x30/0x60
    [] swake_up_all+0xb7/0xe0
    [] rcu_gp_kthread+0xab1/0xeb0
    [] ? trace_hardirqs_on_caller+0xff/0x220
    [] ? _raw_spin_unlock_irq+0x41/0x60
    [] ? rcu_barrier+0x20/0x20
    [] kthread+0x104/0x120
    [] ? _raw_spin_unlock_irq+0x30/0x60
    [] ? kthread_create_on_node+0x260/0x260
    [] ret_from_fork+0x3f/0x70
    [] ? kthread_create_on_node+0x260/0x260

    Signed-off-by: Daniel Wagner
    Acked-by: Peter Zijlstra (Intel)
    Cc: linux-rt-users@vger.kernel.org
    Cc: Boqun Feng
    Cc: Marcelo Tosatti
    Cc: Steven Rostedt
    Cc: Paul Gortmaker
    Cc: Paolo Bonzini
    Cc: "Paul E. McKenney"
    Link: http://lkml.kernel.org/r/1455871601-27484-5-git-send-email-wagi@monom.org
    Signed-off-by: Thomas Gleixner

    Daniel Wagner
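
    The shape of the reordering is roughly as follows (a sketch based on
    the description above, assuming a small helper, here called
    rcu_nocb_gp_get(), that snapshots the no-CBs wait queue while the lock
    is still held; treat the details as illustrative):

        struct swait_queue_head *sq;

        raw_spin_lock_irq(&rnp->lock);
        /* ... grace-period cleanup work that really needs rnp->lock ... */
        sq = rcu_nocb_gp_get(rnp);      /* remember whom to wake */
        raw_spin_unlock_irq(&rnp->lock);
        rcu_nocb_gp_cleanup(sq);        /* swake_up_all() outside the lock */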
     

24 Feb, 2016

2 commits

  • In patch:

    "rcu: Add transitivity to remaining rcu_node ->lock acquisitions"

    All locking operations on rcu_node::lock are replaced with the wrappers
    because of the need for transitivity, which indicates we should never
    write code using LOCK primitives alone (i.e. without a proper barrier
    following) on rcu_node::lock outside those wrappers. We could detect
    this kind of misuse of rcu_node::lock in the future by adding the
    __private modifier to rcu_node::lock.

    To privatize rcu_node::lock, unlock wrappers are also needed. Replacing
    spinlock unlocks with these wrappers not only privatizes rcu_node::lock
    but also makes it easier to figure out critical sections of rcu_node.

    This patch adds the __private modifier to rcu_node::lock and makes every
    access to it wrapped by ACCESS_PRIVATE(). In addition, unlock wrappers
    are added, and raw_spin_unlock(&rnp->lock) and its friends are replaced
    with those wrappers (a sketch of the mechanism follows this entry).

    Signed-off-by: Boqun Feng
    Signed-off-by: Paul E. McKenney

    Boqun Feng
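
    For reference, a __private/ACCESS_PRIVATE mechanism of this kind can be
    built on sparse annotations roughly as follows (a sketch; the real
    definitions live in the compiler headers and may differ in detail):

        #ifdef __CHECKER__
        # define __private      __attribute__((noderef))
        # define ACCESS_PRIVATE(p, member) \
                (*((typeof((p)->member) __force *)&(p)->member))
        #else
        # define __private
        # define ACCESS_PRIVATE(p, member) ((p)->member)
        #endif

        /* Unlock wrapper in the spirit of this patch (illustrative): */
        #define raw_spin_unlock_rcu_node(p) \
                raw_spin_unlock(&ACCESS_PRIVATE(p, lock))

    With ->lock marked __private, any direct rnp->lock access outside the
    wrappers is flagged by sparse, which is how the misuse described above
    can be detected.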
     
  • The related warning from gcc 6.0:

    In file included from kernel/rcu/tree.c:4630:0:
    kernel/rcu/tree_plugin.h:810:40: warning: ‘rcu_data_p’ defined but not used [-Wunused-const-variable]
    static struct rcu_data __percpu *const rcu_data_p = &rcu_sched_data;
    ^~~~~~~~~~

    Also remove the always-redundant rcu_data_p in tree.c.

    Signed-off-by: Chen Gang
    Signed-off-by: Paul E. McKenney

    Chen Gang
     

08 Dec, 2015

2 commits


05 Dec, 2015

5 commits

    We need the scheduler's fastpaths to be, well, fast, and unnecessarily
    disabling and re-enabling interrupts is not necessarily consistent with
    this goal, especially given that there are regions of the scheduler that
    already have interrupts disabled.

    This commit therefore moves the call to rcu_note_context_switch()
    to one of the interrupts-disabled regions of the scheduler, and
    removes the now-redundant disabling and re-enabling of interrupts from
    rcu_note_context_switch() and the functions it calls.

    Reported-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    [ paulmck: Shift rcu_note_context_switch() to avoid deadlock, as suggested
    by Peter Zijlstra. ]

    Paul E. McKenney
     
  • Currently, rcu_prepare_for_idle() checks for tick_nohz_active, even on
    individual NOCBs CPUs, unless all CPUs are marked as NOCBs CPUs at build
    time. This check is pointless on NOCBs CPUs because they never have any
    callbacks posted, given that all of their callbacks are handed off to the
    corresponding rcuo kthread. There is a check for individually designated
    NOCBs CPUs, but it pointlessly follows the check for tick_nohz_active.

    This commit therefore moves the check for individually designated NOCBs
    CPUs up with the check for CONFIG_RCU_NOCB_CPU_ALL.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This function no longer has #ifdefs, so this commit removes the
    header comment calling them out.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Several releases have come and gone without the warning triggering,
    so remove the lock-acquisition loop. Retain the WARN_ON_ONCE()
    out of sheer paranoia.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Although expedited grace periods can be quite useful, and although their
    OS jitter has been greatly reduced, they can still pose problems for
    extreme real-time workloads. This commit therefore adds a rcu_normal
    kernel boot parameter (which can also be manipulated via sysfs)
    to suppress expedited grace periods, that is, to treat requests for
    expedited grace periods as if they were requests for normal grace periods.
    If both rcu_expedited and rcu_normal are specified, rcu_normal wins.
    This means that if you are relying on expedited grace periods to speed up
    boot, you will want to specify rcu_expedited on the kernel command line,
    and then specify rcu_normal via sysfs once boot completes.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
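
    The gating is roughly as follows (a sketch based on the description
    above; rcu_gp_is_normal() is shown as an illustrative helper name):

        static int rcu_normal;          /* boot parameter, also via sysfs */

        static inline bool rcu_gp_is_normal(void)
        {
                return READ_ONCE(rcu_normal);
        }

        void synchronize_rcu_expedited(void)
        {
                /* rcu_normal wins over rcu_expedited: use a normal GP. */
                if (rcu_gp_is_normal()) {
                        wait_rcu_gp(call_rcu);
                        return;
                }
                /* ... expedited grace-period machinery ... */
        }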
     

24 Nov, 2015

2 commits

  • The rule is that all acquisitions of the rcu_node structure's ->lock
    must provide transitivity: The lock is not acquired that frequently,
    and sorting out exactly which required it and which did not would be
    a maintenance nightmare. This commit therefore supplies the needed
    transitivity to the remaining ->lock acquisitions.

    Reported-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Providing RCU's memory-ordering guarantees requires that the rcu_node
    tree's locking provide transitive memory ordering, which the Linux kernel's
    spinlocks currently do not provide unless smp_mb__after_unlock_lock()
    is used. Having a separate smp_mb__after_unlock_lock() after each and
    every lock acquisition is error-prone, hard to read, and a bit annoying,
    so this commit provides wrapper functions that pull in the
    smp_mb__after_unlock_lock() invocations.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Paul E. McKenney

    Peter Zijlstra
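
    The wrappers are conceptually as follows (a sketch; the actual helpers
    cover the irq/irqsave variants as well):

        /* Acquire an rcu_node's ->lock with transitive (full) ordering. */
        static inline void raw_spin_lock_rcu_node(struct rcu_node *rnp)
        {
                raw_spin_lock(&rnp->lock);
                smp_mb__after_unlock_lock();
        }

        static inline void raw_spin_lock_irq_rcu_node(struct rcu_node *rnp)
        {
                raw_spin_lock_irq(&rnp->lock);
                smp_mb__after_unlock_lock();
        }

    Callers then never need to remember the smp_mb__after_unlock_lock()
    themselves.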
     

08 Oct, 2015

4 commits


07 Oct, 2015

2 commits

  • This commit makes the RCU CPU stall warning message print online/offline
    indications immediately after a hyphen following the CPU number. An "O"
    indicates that the global CPU-hotplug system believes that the CPU is
    online, an "o" that RCU perceived the CPU to be online at the beginning
    of the current expedited grace period, and an "N" that RCU currently
    believes that it will perceive the CPU as being online at the beginning
    of the next expedited grace period, with "." otherwise for all three
    indications. So for CPU 10, you would normally see "10-OoN:" indicating
    that everything believes that the CPU is online.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
    As we now have the rcu_callback_t typedef as the type of rcu callbacks,
    we should use it in call_rcu*() and friends as the type of parameters.
    This could save us a few lines of code and make it clear which functions
    require an rcu callback rather than some other callback as their argument.

    Besides, this can also help cscope to generate a better database for
    code reading.

    Signed-off-by: Boqun Feng
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Boqun Feng
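
    For reference, the typedef and its use in a call_rcu()-style prototype
    (the struct and callback below are a hypothetical usage example):

        typedef void (*rcu_callback_t)(struct rcu_head *head);

        void call_rcu(struct rcu_head *head, rcu_callback_t func);

        /* Hypothetical user: free an object after a grace period. */
        struct my_obj {
                int data;
                struct rcu_head rcu;
        };

        static void my_free_cb(struct rcu_head *head)
        {
                struct my_obj *p = container_of(head, struct my_obj, rcu);

                kfree(p);
        }

        /* ... call_rcu(&obj->rcu, my_free_cb); ... */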
     

21 Sep, 2015

7 commits

  • This commit converts the rcu_data structure's ->cpu_no_qs field
    to a union. The bytewise side of this union allows individual access
    to indications as to whether this CPU needs to find a quiescent state
    for a normal (.norm) and/or expedited (.exp) grace period. The setwise
    side of the union allows testing whether or not a quiescent state is
    needed at all, for either type of grace period.

    For now, only .norm is used. A later commit will introduce the expedited
    usage.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
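
    The union is roughly the following (a sketch using the field names from
    the description above; the union tag and exact types are illustrative):

        union rcu_noqs {
                struct {
                        u8 norm;        /* QS still needed for normal GP. */
                        u8 exp;         /* QS still needed for expedited GP. */
                } b;                    /* Bytewise access. */
                u16 s;                  /* Setwise: nonzero if either is set. */
        };

        /* In struct rcu_data:  union rcu_noqs cpu_no_qs; */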
     
  • This commit inverts the sense of the rcu_data structure's ->passed_quiesce
    field and renames it to ->cpu_no_qs. This will allow a later commit to
    use an "aggregate OR" operation to test expedited as well as normal grace
    periods without added overhead.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • An upcoming commit needs to invert the sense of the ->passed_quiesce
    rcu_data structure field, so this commit is taking this opportunity
    to clarify things a bit by renaming ->qs_pending to ->core_needs_qs.

    So if !rdp->core_needs_qs, then this CPU need not concern itself with
    quiescent states; in particular, it need not acquire its leaf rcu_node
    structure's ->lock to check. Otherwise, it needs to report the next
    quiescent state.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The current preemptible-RCU expedited grace-period algorithm invokes
    synchronize_sched_expedited() to enqueue all tasks currently running
    in a preemptible-RCU read-side critical section, then waits for all the
    ->blkd_tasks lists to drain. This works, but results in both an IPI and
    a double context switch even on CPUs that do not happen to be running
    in a preemptible RCU read-side critical section.

    This commit implements a new algorithm that causes less OS jitter.
    This new algorithm IPIs all online CPUs that are not idle (from an
    RCU perspective), but refrains from self-IPIs. If a CPU receiving
    this IPI is not in a preemptible RCU read-side critical section (or
    is just now exiting one), it pushes quiescence up the rcu_node tree;
    otherwise, it sets a flag that will be handled by the upcoming outermost
    rcu_read_unlock(), which will then push quiescence up the tree (the
    handler's decision logic is sketched after this entry).

    The expedited grace period must of course wait on any pre-existing blocked
    readers, and newly blocked readers must be queued carefully based on
    the state of both the normal and the expedited grace periods. This
    new queueing approach also avoids the need to update boost state,
    courtesy of the fact that blocked tasks are no longer ever migrated to
    the root rcu_node structure.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
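
    The IPI handler's decision logic is roughly the following (a sketch
    reconstructed from the description above; the handler, field, and helper
    names such as exp_need_qs and rcu_report_exp_rdp() should be treated as
    illustrative):

        static void sync_rcu_exp_handler(void *info)
        {
                struct rcu_state *rsp = info;
                struct task_struct *t = current;

                /*
                 * Inside a preemptible-RCU read-side critical section that
                 * has not blocked: ask the outermost rcu_read_unlock() to
                 * report the quiescent state on our behalf.
                 */
                if (t->rcu_read_lock_nesting > 0 &&
                    !t->rcu_read_unlock_special.b.blocked) {
                        t->rcu_read_unlock_special.b.exp_need_qs = true;
                        return;
                }

                /* Otherwise report the QS up the rcu_node tree right away. */
                rcu_report_exp_rdp(rsp, this_cpu_ptr(rsp->rda), true);
        }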
     
    This commit replaces sync_rcu_preempt_exp_init1() and
    sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug()
    and sync_exp_reset_tree(), which will also be used by
    synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which
    contains code specific to synchronize_rcu_expedited().

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This is a nearly pure code-movement commit, moving rcu_report_exp_rnp(),
    sync_rcu_preempt_exp_done(), and rcu_preempted_readers_exp() so
    that later commits can make synchronize_sched_expedited() use them.
    The non-code-movement portion of this commit tags rcu_report_exp_rnp()
    as __maybe_unused to avoid build errors when CONFIG_PREEMPT=n.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Now that there is an ->expedited_wq waitqueue in each rcu_state structure,
    there is no need for the sync_rcu_preempt_exp_wq global variable. This
    commit therefore substitutes ->expedited_wq for sync_rcu_preempt_exp_wq.
    It also initializes ->expedited_wq only once at boot instead of at the
    start of each expedited grace period.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

04 Aug, 2015

1 commit


23 Jul, 2015

1 commit

  • RCU's rcu_oom_notify() disables CPU hotplug in order to stabilize the
    list of online CPUs, which it traverses. However, this is completely
    pointless because smp_call_function_single() will quietly fail if invoked
    on an offline CPU. Because the count of requests is incremented in the
    rcu_oom_notify_cpu() function that is remotely invoked, everything works
    nicely even in the face of concurrent CPU-hotplug operations.

    Furthermore, in recent kernels, invoking get_online_cpus() from an OOM
    notifier can result in deadlock. This commit therefore removes the
    call to get_online_cpus() and put_online_cpus() from rcu_oom_notify().

    Reported-by: Marcin Ślusarz
    Reported-by: David Rientjes
    Signed-off-by: Paul E. McKenney
    Acked-by: David Rientjes
    Tested-by: Marcin Ślusarz

    Paul E. McKenney