21 Feb, 2018

5 commits

  • RCU's nxttail has been optimized to be an rcu_segcblist, which is
    a multi-tailed linked list with macros defining the index of each
    tail. The indexes are defined in linux/rcu_segcblist.h,
    so this commit removes the redundant definitions in kernel/rcu/tree.h.

    Signed-off-by: Liu Changcheng
    Signed-off-by: Paul E. McKenney

    Liu Changcheng
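
    For reference, the tail indexes, and the rcu_segcblist structure they
    index, look roughly like this (simplified from the
    include/linux/rcu_segcblist.h of that era):

        #define RCU_DONE_TAIL        0  /* callbacks whose grace period ended */
        #define RCU_WAIT_TAIL        1  /* waiting for current grace period */
        #define RCU_NEXT_READY_TAIL  2  /* waiting for next grace period */
        #define RCU_NEXT_TAIL        3  /* not yet assigned a grace period */
        #define RCU_CBLIST_NSEGS     4

        struct rcu_segcblist {
                struct rcu_head *head;                     /* singly linked list */
                struct rcu_head **tails[RCU_CBLIST_NSEGS]; /* segment boundaries */
                unsigned long gp_seq[RCU_CBLIST_NSEGS];    /* grace-period numbers */
                long len;
                long len_lazy;
        };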
     
  • The debugfs interface displayed statistics on RCU-pending checks but
    this interface has since been removed. This commit therefore removes the
    no-longer-used rcu_state structure's ->n_force_qs_lh and ->n_force_qs_ngp
    fields along with their updates. (Though the ->n_force_qs_ngp field
    was actually not used at all, embarrassingly enough.)

    If this information proves necessary in the future, the corresponding
    event traces will be added.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The debugfs interface displayed statistics on RCU-pending checks
    but this interface has since been removed. This commit therefore
    removes the no-longer-used rcu_data structure's ->n_rcu_pending,
    ->n_rp_core_needs_qs, ->n_rp_report_qs, ->n_rp_cb_ready,
    ->n_rp_cpu_needs_gp, ->n_rp_gp_completed, ->n_rp_gp_started,
    ->n_rp_nocb_defer_wakeup, and ->n_rp_need_nothing fields along with
    their updates.

    If this information proves necessary in the future, the corresponding
    event traces will be added.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The debugfs interface displayed statistics on RCU callback invocation but
    this interface has since been removed. This commit therefore removes the
    no-longer-used rcu_data structure's ->n_cbs_invoked and ->n_nocbs_invoked
    fields along with their updates.

    If this information proves necessary in the future, the corresponding
    event traces will be added.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The debugfs interface displayed statistics on RCU priority boosting,
    but this interface has since been removed. This commit therefore
    removes the no-longer-used rcu_data structure's ->n_tasks_boosted,
    ->n_exp_boosts, and ->n_normal_boosts fields along with their updates.

    If this information proves necessary in the future, the corresponding
    event traces will be added.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

29 Nov, 2017

1 commit

  • Because the ->dynticks_nesting field now only contains the process-based
    nesting level instead of a value encoding both the process nesting level
    and the irq "nesting" level, we no longer need a long long, even on
    32-bit systems. This commit therefore changes both the ->dynticks_nesting
    and ->dynticks_nmi_nesting fields to long.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
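
    In other words, the fields shrink roughly as follows (a sketch; the
    surrounding fields are elided):

        struct rcu_dynticks {
                long dynticks_nesting;         /* was: long long */
                long dynticks_nmi_nesting;     /* was: long long */
                atomic_t dynticks;             /* EQS tracking counter */
                /* ... */
        };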
     

10 Oct, 2017

1 commit

  • One common question upon seeing an RCU CPU stall warning is "did
    the stalled CPUs have interrupts disabled?" However, the current
    stall warnings are silent on this point. This commit therefore
    uses irq_work to check whether stalled CPUs still respond to IPIs,
    and flags this state in the RCU CPU stall warning console messages.

    Reported-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
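
    A minimal sketch of the probing pattern (hypothetical names; the real
    code lives in kernel/rcu/tree.c):

        #include <linux/irq_work.h>

        struct stall_probe {
                struct irq_work iw;     /* posted to the stalled CPU */
                bool pending;           /* still unanswered? */
        };

        static void stall_probe_handler(struct irq_work *iwp)
        {
                struct stall_probe *sp = container_of(iwp, struct stall_probe, iw);

                WRITE_ONCE(sp->pending, false); /* CPU took the IPI: irqs are on */
        }

        static void probe_cpu(struct stall_probe *sp, int cpu)
        {
                init_irq_work(&sp->iw, stall_probe_handler);
                WRITE_ONCE(sp->pending, true);
                /* If ->pending is still true at warning time, flag the CPU. */
                irq_work_queue_on(&sp->iw, cpu);
        }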
     

17 Aug, 2017

1 commit

  • …isc.2017.08.17a', 'spin_unlock_wait_no.2017.08.17a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD

    doc.2017.08.17a: Documentation updates.
    fixes.2017.08.17a: RCU fixes.
    hotplug.2017.07.25b: CPU-hotplug updates.
    misc.2017.08.17a: Miscellaneous fixes outside of RCU (give or take conflicts).
    spin_unlock_wait_no.2017.08.17a: Remove spin_unlock_wait().
    srcu.2017.07.27c: SRCU updates.
    torture.2017.07.24c: Torture-test updates.

    Paul E. McKenney
     

26 Jul, 2017

5 commits

  • Given that the rcu_state structure's ->orphan_pend and ->orphan_done
    fields are used only during migration of callbacks from the recently
    offlined CPU to a surviving CPU, if rcu_send_cbs_to_orphanage() and
    rcu_adopt_orphan_cbs() are combined, these fields can become local
    variables in the combined function. This commit therefore combines
    rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() into a new
    rcu_segcblist_merge() function and removes the ->orphan_pend and
    ->orphan_done fields.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
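
    A rough sketch of the shape of the change; the helper names loosely
    follow the rcu_segcblist API, but treat the exact signatures as
    assumptions, and note that callback-count bookkeeping is omitted:

        /* On-stack locals replace the ->orphan_done/->orphan_pend fields. */
        static void my_segcblist_merge(struct rcu_segcblist *dst,
                                       struct rcu_segcblist *src)
        {
                struct rcu_cblist donecbs, pendcbs;

                rcu_cblist_init(&donecbs);
                rcu_cblist_init(&pendcbs);
                /* Drain src into the local staging lists... */
                rcu_segcblist_extract_done_cbs(src, &donecbs);
                rcu_segcblist_extract_pend_cbs(src, &pendcbs);
                /* ...then splice the staged callbacks into dst. */
                rcu_segcblist_insert_done_cbs(dst, &donecbs);
                rcu_segcblist_insert_pend_cbs(dst, &pendcbs);
        }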
     
  • The ->orphan_lock is acquired and released only within the
    rcu_migrate_callbacks() function, which now acquires the root rcu_node
    structure's ->lock. This commit therefore eliminates the ->orphan_lock
    in favor of the root rcu_node structure's ->lock.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • RCU's CPU-hotplug callback-migration code first moves the outgoing
    CPU's callbacks to ->orphan_done and ->orphan_pend, and only then
    moves them to the NOCB callback list. This commit avoids the
    extra step (and simplifies the code) by moving the callbacks directly
    from the outgoing CPU's callback list to the NOCB callback list.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_node structure's ->n_cbs_orphaned and ->n_cbs_adopted fields
    are updated, but never read. This commit therefore removes them.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The handling of RCU's no-CBs CPUs has a maintenance headache, namely
    that if call_rcu() is invoked with interrupts disabled, the rcuo kthread
    wakeup must be deferred to a point where we can be sure that scheduler
    locks are not held. Of course, there are a lot of code paths leading
    from an interrupts-disabled invocation of call_rcu(), and missing any
    one of these can result in excessive callback-invocation latency, and
    potentially even system hangs.

    This commit therefore uses a timer to guarantee that the wakeup will
    eventually occur. If one of the deferred-wakeup points kicks in, then
    the timer is simply cancelled.

    This commit also fixes up an incomplete removal of commits that were
    intended to plug remaining exit paths, which should have the added
    benefit of reducing the overhead of RCU's context-switch hooks. In
    addition, it simplifies leader-to-follower callback-list handoff by
    introducing locking. The call_rcu()-to-leader handoff continues to
    use atomic operations in order to maintain good real-time latency for
    common-case use of call_rcu().

    Signed-off-by: Paul E. McKenney
    [ paulmck: Dan Carpenter fix for mod_timer() usage bug found by smatch. ]

    Paul E. McKenney
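
    A minimal sketch of the backstop pattern, using the timer API of that
    era and hypothetical names:

        static struct timer_list nocb_timer;    /* set up with setup_timer() */
        static wait_queue_head_t nocb_wq;       /* rcuo kthread sleeps here */

        static void nocb_backstop_fn(unsigned long data)
        {
                wake_up(&nocb_wq);      /* timer fired: do the wakeup ourselves */
        }

        static void defer_nocb_wakeup(void)
        {
                /* call_rcu() ran with irqs disabled: arm a one-jiffy backstop. */
                mod_timer(&nocb_timer, jiffies + 1);
        }

        static void do_deferred_nocb_wakeup(void)
        {
                /* A safe deferred-wakeup point ran first: cancel, then wake. */
                if (del_timer(&nocb_timer))
                        wake_up(&nocb_wq);
        }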
     

09 Jun, 2017

4 commits

  • RCU's debugfs tracing used to be the only reasonable low-level debug
    information available, but ftrace and event tracing have since surpassed
    the RCU debugfs level of usefulness. This commit therefore removes
    RCU's debugfs tracing.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The NO_HZ_FULL_SYSIDLE full-system-idle capability was added in 2013
    by commit 0edd1b1784cb ("nohz_full: Add full-system-idle state machine"),
    but has not been used. This commit therefore removes it.

    If it turns out to be needed later, this commit can always be reverted.

    Signed-off-by: Paul E. McKenney
    Cc: Frederic Weisbecker
    Cc: Rik van Riel
    Cc: Ingo Molnar
    Acked-by: Linus Torvalds

    Paul E. McKenney
     
  • This commit moves the now-generic rnp->lock wrapper macros from
    kernel/rcu/tree.h to kernel/rcu/rcu.h, thus allowing SRCU to use them.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Use of smp_mb__after_unlock_lock() would allow SRCU to omit a full
    memory barrier during callback execution, so this commit converts the
    raw_spin_lock_rcu_node() family from inline functions to type-generic
    macros, allowing them to handle locks in srcu_node structures as well
    as rcu_node structures.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
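
    The resulting wrapper looks roughly like this in kernel/rcu/rcu.h (the
    irq-disabling and irqsave variants follow the same pattern):

        #define raw_spin_lock_rcu_node(p)                                   \
        do {                                                                \
                raw_spin_lock(&ACCESS_PRIVATE(p, lock));                    \
                smp_mb__after_unlock_lock();                                \
        } while (0)

        #define raw_spin_unlock_rcu_node(p)                                 \
                raw_spin_unlock(&ACCESS_PRIVATE(p, lock))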
     

08 Jun, 2017

2 commits

  • The RCU_NOGP_WAKE_NOT, RCU_NOGP_WAKE, and RCU_NOGP_WAKE_FORCE flags
    are used to mediate wakeups for the no-CBs CPU kthreads. The "NOGP"
    really doesn't make any sense, so this commit does s/NOGP/NOCB/.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Although preemptible RCU allows its read-side critical sections to be
    preempted, general blocking is forbidden. The reason for this is that
    excessive preemption times can be handled by CONFIG_RCU_BOOST=y, but a
    voluntarily blocked task doesn't care how high you boost its priority.
    Because preemptible RCU is a global mechanism, one ill-behaved reader
    hurts everyone. Hence the prohibition against general blocking in
    RCU-preempt read-side critical sections. Preemption yes, blocking no.

    This commit enforces this prohibition.

    There is a special exception for the -rt patchset (which they kindly
    volunteered to implement): It is OK to block (as opposed to merely being
    preempted) within an RCU-preempt read-side critical section, but only if
    the blocking is subject to priority inheritance. This exception permits
    CONFIG_RCU_BOOST=y to get -rt RCU readers out of trouble.

    Why doesn't this exception also apply to mainline's rt_mutex? Because
    of the possibility that someone does general blocking while holding
    an rt_mutex. Yes, the priority boosting will affect the rt_mutex,
    but it won't help with the task doing general blocking while holding
    that rt_mutex.

    Reported-by: Thomas Gleixner
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
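
    A sketch of the check that enforces the prohibition at context-switch
    time (assuming the rcu_read_lock_nesting bookkeeping of that era):

        /*
         * In the RCU-preempt context-switch hook, a voluntary context
         * switch (preempt == false) inside a read-side critical section
         * indicates forbidden blocking.
         */
        static void my_preempt_note_context_switch(bool preempt)
        {
                struct task_struct *t = current;

                WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);
                /* ... normal quiescent-state processing ... */
        }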
     

02 May, 2017

1 commit

  • Linus noticed that <linux/rcu_segcblist.h> has huge inline functions
    which should not be inline at all.

    As a first step in cleaning this up, move them all to kernel/rcu/ and
    only keep an absolute minimum of data type defines in the header:

    before: -rw-r--r-- 1 mingo mingo 22284 May 2 10:25 include/linux/rcu_segcblist.h
    after: -rw-r--r-- 1 mingo mingo 3180 May 2 10:22 include/linux/rcu_segcblist.h

    More can be done, such as uninlining the large functions, whose
    inlining is unjustified even if it's an RCU-internal matter.

    Reported-by: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney

    Ingo Molnar
     

21 Apr, 2017

1 commit

  • Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1,2],
    however, there are workloads that could result in a high volume of
    concurrent invocations of call_srcu(), which with current SRCU would
    result in excessive lock contention on the srcu_struct structure's
    ->queue_lock, which protects SRCU's callback lists. This commit therefore
    moves SRCU to per-CPU callback lists, thus greatly reducing contention.

    Because a given SRCU instance no longer has a single centralized callback
    list, starting grace periods and invoking callbacks are both more complex
    than in the single-list Classic SRCU implementation. Starting grace
    periods and handling callbacks are now handled using an srcu_node tree
    that is in some ways similar to the rcu_node trees used by RCU-bh,
    RCU-preempt, and RCU-sched (for example, the srcu_node tree shape is
    controlled by exactly the same Kconfig options and boot parameters that
    control the shape of the rcu_node tree).

    In addition, the old per-CPU srcu_array structure is now named srcu_data
    and contains an rcu_segcblist structure named ->srcu_cblist for its
    callbacks (and a spinlock to protect this). The srcu_struct gets
    an srcu_gp_seq that is used to associate callback segments with the
    corresponding completion-time grace-period number. These completion-time
    grace-period numbers are propagated up the srcu_node tree so that the
    grace-period workqueue handler can determine whether additional grace
    periods are needed on the one hand and where to look for callbacks that
    are ready to be invoked.

    The srcu_barrier() function must now wait on all instances of the per-CPU
    ->srcu_cblist. Because each ->srcu_cblist is protected by ->lock,
    srcu_barrier() can remotely add the needed callbacks. In theory,
    it could also remotely start grace periods, but in practice doing so
    is complex and racy. And interestingly enough, it is never necessary
    for srcu_barrier() to start a grace period because srcu_barrier() only
    enqueues a callback when a callback is already present--and it turns out
    that a grace period has to have already been started for this pre-existing
    callback. Furthermore, it is only the callback that srcu_barrier()
    needs to wait on, not any particular grace period. Therefore, a new
    rcu_segcblist_entrain() function enqueues the srcu_barrier() function's
    callback into the same segment occupied by the last pre-existing callback
    in the list. The special case where all the pre-existing callbacks are
    on a different list (because they are in the process of being invoked)
    is handled by enqueuing srcu_barrier()'s callback into the RCU_DONE_TAIL
    segment, relying on the done-callbacks check that takes place after all
    callbacks are invoked.

    Note that the readers use the same algorithm as before. Note that there
    is a separate srcu_idx that tells the readers what counter to increment.
    This unfortunately cannot be combined with srcu_gp_seq because they
    need to be incremented at different times.

    This commit introduces some ugly #ifdefs in rcutorture. These will go
    away when I feel good enough about Tree SRCU to ditch Classic SRCU.

    Some crude performance comparisons, courtesy of a quickly hacked rcuperf
    asynchronous-grace-period capability:

    Callback Queuing Overhead
    -------------------------
    # CPUS    Classic SRCU    Tree SRCU
    ------    ------------    ---------
         2        0.349 us     0.342 us
        16        31.66 us       0.4 us
        41       ---------     0.417 us

    The times are the 90th percentiles, a statistic that was chosen to reject
    the overheads of the occasional srcu_barrier() call needed to avoid OOMing
    the test machine. The rcuperf test hangs when running Classic SRCU at 41
    CPUs, hence the line of dashes. Despite the hacks to both the rcuperf
    code and the statistics, this is a convincing demonstration of Tree
    SRCU's performance and scalability advantages.

    [1] https://lwn.net/Articles/309030/
    [2] https://patchwork.kernel.org/patch/5108281/

    Signed-off-by: Paul E. McKenney
    [ paulmck: Fix initialization if synchronize_srcu_expedited() called first. ]

    Paul E. McKenney
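
    A rough sketch of the per-CPU structure described above (field names
    follow the commit text; everything else is abbreviated):

        struct srcu_data {
                /* Read-side state, indexed by the srcu_idx the readers see. */
                unsigned long srcu_lock_count[2];
                unsigned long srcu_unlock_count[2];

                /* Update-side state. */
                spinlock_t lock;                  /* protects ->srcu_cblist */
                struct rcu_segcblist srcu_cblist; /* this CPU's callbacks */
                /* ... grace-period and srcu_node linkage ... */
        };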
     

19 Apr, 2017

10 commits

  • This commit moves rcu_for_each_node_breadth_first(),
    rcu_for_each_nonleaf_node_breadth_first(), and
    rcu_for_each_leaf_node() from kernel/rcu/tree.h to
    kernel/rcu/rcu.h so that SRCU can access them.
    This commit is code-movement only.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
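
    For reference, the leaf-node iterator being moved looks roughly like:

        #define rcu_for_each_leaf_node(rsp, rnp)                            \
                for ((rnp) = (rsp)->level[rcu_num_lvls - 1];                \
                     (rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++)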
     
  • This commit moves the C preprocessor code that defines the default shape
    of the rcu_node combining tree to a new include/linux/rcu_node_tree.h
    file as a first step towards enabling SRCU to create its own combining
    tree, which in turn enables SRCU to implement per-CPU callback handling,
    thus avoiding contention on the lock currently guarding the single list
    of callbacks. Note that users of SRCU still need to know the size of
    the srcu_struct structure, hence include/linux rather than kernel/rcu.

    This commit is code-movement only.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit switches SRCU from custom-built callback queues to the new
    rcu_segcblist structure. This change associates grace-period sequence
    numbers with groups of callbacks, which will be needed for efficient
    processing of per-CPU callbacks.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • RCU has only one multi-tail callback list, which is implemented via
    the nxtlist, nxttail, nxtcompleted, qlen_lazy, and qlen fields in the
    rcu_data structure, and whose operations are open-coded throughout the
    Tree RCU implementation. This has been more or less OK in the past,
    but upcoming callback-list optimizations in SRCU could really use
    a multi-tail callback list there as well.

    This commit therefore abstracts the multi-tail callback list handling
    into a new kernel/rcu/rcu_segcblist.h file, and uses this new API.
    The simple head-and-tail pointer callback list is also abstracted and
    applied everywhere except for the NOCB callback-offload lists. (Yes,
    the plan is to apply them there as well, but this commit is already
    bigger than would be good.)

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
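
    The simple head-and-tail list mentioned above is roughly:

        struct rcu_cblist {
                struct rcu_head *head;  /* first callback, or NULL if empty */
                struct rcu_head **tail; /* points at last ->next (or &head) */
                long len;
                long len_lazy;
        };

    (The segmented variant is sketched under the 21 Feb, 2018 entry above.)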
     
  • If the RCU_EXPERT Kconfig option is not set (the default), then the
    RCU_FANOUT_LEAF Kconfig option will not be defined, which will cause
    the leaf-level rcu_node tree fanout to default to 32 on 32-bit systems
    and 64 on 64-bit systems. This can result in excessive lock contention.
    This commit therefore changes the computation of the leaf-level rcu_node
    tree fanout so that the result will be 16 unless an explicit Kconfig or
    kernel-boot setting says otherwise.

    Reported-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
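
    Roughly, the fallback computation becomes (a sketch; the exact cpp
    layout may differ):

        #ifdef CONFIG_RCU_FANOUT_LEAF
        #define RCU_FANOUT_LEAF CONFIG_RCU_FANOUT_LEAF
        #else
        #define RCU_FANOUT_LEAF 16      /* was 32 on 32-bit, 64 on 64-bit */
        #endif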
     
  • The rcu_all_qs() and rcu_note_context_switch() functions do a series of checks,
    taking various actions to supply RCU with quiescent states, depending
    on the outcomes of the various checks. This is a bit much for scheduling
    fastpaths, so this commit creates a separate ->rcu_urgent_qs field in
    the rcu_dynticks structure that acts as a global guard for these checks.
    Thus, in the common case, rcu_all_qs() and rcu_note_context_switch()
    check the ->rcu_urgent_qs field, find it false, and simply return.

    Signed-off-by: Paul E. McKenney
    Cc: Peter Zijlstra

    Paul E. McKenney
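
    A sketch of the resulting fastpath (per-CPU access idioms approximate):

        void rcu_all_qs(void)
        {
                /* Common case: nothing urgent, so return immediately. */
                if (!raw_cpu_read(rcu_dynticks.rcu_urgent_qs))
                        return;
                /* Slow path: run the individual quiescent-state checks. */
                /* ... */
        }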
     
  • The rcu_momentary_dyntick_idle() function scans the RCU flavors, checking
    that one of them still needs a quiescent state before doing an expensive
    atomic operation on the ->dynticks counter. However, this check reduces
    overhead only after a rare race condition, and increases complexity. This
    commit therefore removes the scan and the mechanism enabling the scan.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_qs_ctr variable is yet another isolated per-CPU variable,
    so this commit pulls it into the pre-existing rcu_dynticks per-CPU
    structure.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_sched_qs_mask variable is yet another isolated per-CPU variable,
    so this commit pulls it into the pre-existing rcu_dynticks per-CPU
    structure.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, IPIs are used to force other CPUs to invalidate their TLBs
    in response to a kernel virtual-memory mapping change. This works, but
    degrades both battery lifetime (for idle CPUs) and real-time response
    (for nohz_full CPUs), and in addition results in unnecessary IPIs due to
    the fact that CPUs executing in usermode are unaffected by stale kernel
    mappings. It would be better to cause a CPU executing in usermode to
    wait until it is entering kernel mode to do the flush, first to avoid
    interrupting usermode tasks and second to handle multiple flush requests
    with a single flush in the case of a long-running user task.

    This commit therefore reserves a bit at the bottom of the ->dynticks
    counter, which is checked upon exit from extended quiescent states.
    If it is set, it is cleared and then a new rcu_eqs_special_exit() macro is
    invoked, which, if not supplied, is an empty single-pass do-while loop.
    If this bottom bit is set on -entry- to an extended quiescent state,
    then a WARN_ON_ONCE() triggers.

    This bottom bit may be set using a new rcu_eqs_special_set() function,
    which returns true if the bit was set, or false if the CPU turned
    out to not be in an extended quiescent state. Please note that this
    function refuses to set the bit for a non-nohz_full CPU when that CPU
    is executing in usermode because usermode execution is tracked by RCU
    as a dyntick-idle extended quiescent state only for nohz_full CPUs.

    Reported-by: Andy Lutomirski
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
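
    A sketch of the bit-setting function along the lines described (the
    bit names and the per-CPU accessor are assumptions matching the
    encoding above):

        #define MY_EQS_SPECIAL_BIT  0x1 /* bottom bit: request special exit */
        #define MY_EQS_COUNTER_LSB  0x2 /* counter now advances by twos */

        /* Returns true if the bit was set, false if @cpu was not in an EQS. */
        bool my_eqs_special_set(int cpu)
        {
                int old, new;
                atomic_t *dynticks = per_cpu_dynticks(cpu); /* hypothetical */

                do {
                        old = atomic_read(dynticks);
                        if (old & MY_EQS_COUNTER_LSB)
                                return false;   /* not in an EQS */
                        new = old | MY_EQS_SPECIAL_BIT;
                } while (atomic_cmpxchg(dynticks, old, new) != old);
                return true;
        }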
     

26 Jan, 2017

2 commits

  • …rcu.2017.01.25a' and 'torture.2017.01.15b' into HEAD

    doc.2017.01.15b: Documentation updates
    dyntick.2017.01.23a: Dyntick tracking consolidation
    fixes.2017.01.23a: Miscellaneous fixes
    srcu.2017.01.25a: SRCU rewrite, fixes, and verification
    torture.2017.01.15b: Torture-test updates

    Paul E. McKenney
     
  • If a process invokes synchronize_srcu(), is delayed just the right amount
    of time, and thus does not sleep when waiting for the grace period to
    complete, there is no ordering between the end of the grace period and
    the code following the synchronize_srcu(). Similarly, there can be a
    lack of ordering between the end of the SRCU grace period and callback
    invocation.

    This commit adds the necessary ordering.

    Reported-by: Lance Roy
    Signed-off-by: Paul E. McKenney
    [ paulmck: Further smp_mb() adjustment per email with Lance Roy. ]

    Paul E. McKenney
     

24 Jan, 2017

2 commits

  • This commit is the fourth step towards full abstraction of all accesses
    to the ->dynticks counter, implementing previously open-coded checks and
    comparisons in new rcu_dynticks_in_eqs() and rcu_dynticks_in_eqs_since()
    functions. This abstraction will ease changes to the ->dynticks counter
    operation.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
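
    A sketch of the two predicates (the parity encoding matches that era's
    ->dynticks counter; rcu_dynticks_snap() is the pre-existing snapshot
    helper):

        static bool my_dynticks_in_eqs(int snap)
        {
                return !(snap & 0x1);   /* even counter value: in an EQS */
        }

        static bool my_dynticks_in_eqs_since(struct rcu_dynticks *rdtp,
                                             int snap)
        {
                /* Any counter movement implies an intervening EQS. */
                return snap != rcu_dynticks_snap(rdtp);
        }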
     
  • Expedited grace periods no longer fall back to normal grace periods
    in response to lock contention, given that expedited grace periods
    now use the rcu_node tree so as to avoid contention. This commit
    therefore removes the expedited_normal counter.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

15 Nov, 2016

1 commit

  • Expedited grace periods check dyntick-idle state, and avoid sending
    IPIs to idle CPUs, including those running guest OSes, and, on NOHZ_FULL
    kernels, nohz_full CPUs. However, the kernel has been observed checking
    a CPU while it was non-idle, but sending the IPI after it has gone
    idle. This commit therefore rechecks idle state immediately before
    sending the IPI, refraining from IPIing CPUs that have since gone idle.

    Reported-by: Rik van Riel
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
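
    The recheck pattern, roughly (borrowing the later rcu_dynticks_in_eqs()
    helpers for clarity; the handler name and exact signatures are
    assumptions):

        for_each_leaf_node_possible_cpu(rnp, cpu) {
                struct rcu_dynticks *rdtp = per_cpu_ptr(&rcu_dynticks, cpu);
                int snap = rcu_dynticks_snap(rdtp);     /* fresh snapshot */

                if (rcu_dynticks_in_eqs(snap))
                        continue;               /* went idle: skip the IPI */
                smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
        }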
     

23 Aug, 2016

1 commit

  • The current implementation of expedited grace periods has the user
    task drive the grace period. This works, but has downsides: (1) The
    user task must awaken tasks piggybacking on this grace period, which
    can result in latencies rivaling that of the grace period itself, and
    (2) User tasks can receive signals, which interfere with RCU CPU stall
    warnings.

    This commit therefore uses workqueues to drive the grace periods, so
    that the user task need not do the awakening. A subsequent commit
    will remove the now-unnecessary code allowing for signals.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
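
    A sketch of the workqueue handoff (hypothetical names; the wait/wake
    plumbing is elided):

        #include <linux/workqueue.h>

        struct exp_gp_work {
                struct work_struct work;
                /* ... identifies the requested grace period ... */
        };

        static void wait_exp_gp_fn(struct work_struct *wp)
        {
                /* Workqueue kthread drives the expedited grace period and
                 * awakens all tasks piggybacking on it. */
        }

        void my_synchronize_expedited(void)
        {
                struct exp_gp_work rew;

                INIT_WORK_ONSTACK(&rew.work, wait_exp_gp_fn);
                schedule_work(&rew.work);
                /* wait_event(...) until this grace period completes. */
                destroy_work_on_stack(&rew.work);
        }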
     

16 Jun, 2016

1 commit

  • In many cases in the RCU tree code, we iterate over the set of cpus for
    a leaf node described by rcu_node::grplo and rcu_node::grphi, checking
    per-cpu data for each cpu in this range. However, if the set of possible
    cpus is sparse, some cpus described in this range are not possible, and
    thus no per-cpu region will have been allocated (or initialised) for
    them by the generic percpu code.

    Erroneous accesses to a per-cpu area for these !possible cpus may fault
    or may hit other data, depending on the address generated when the
    erroneous per-CPU offset is applied. In practice, both cases have been
    observed on arm64 hardware (the former being silent, but detectable with
    additional patches).

    To avoid issues resulting from this, we must iterate over the set of
    *possible* cpus for a given leaf node. This patch adds a new helper,
    for_each_leaf_node_possible_cpu, to enable this. As iteration is often
    intertwined with rcu_node local bitmask manipulation, a new
    leaf_node_cpu_bit helper is added to make this simpler and more
    consistent. The RCU tree code is made to use both of these where
    appropriate.

    Without this patch, running reboot at a shell can result in an oops
    like:

    [ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c
    [ 3369.083881] pgd = ffffffc3ecdda000
    [ 3369.087270] [ffffff8008b21b4c] *pgd=00000083eca48003, *pud=00000083eca48003, *pmd=0000000000000000
    [ 3369.096222] Internal error: Oops: 96000007 [#1] PREEMPT SMP
    [ 3369.101781] Modules linked in:
    [ 3369.104825] CPU: 2 PID: 1817 Comm: NetworkManager Tainted: G W 4.6.0+ #3
    [ 3369.121239] task: ffffffc0fa13e000 ti: ffffffc3eb940000 task.ti: ffffffc3eb940000
    [ 3369.128708] PC is at sync_rcu_exp_select_cpus+0x188/0x510
    [ 3369.134094] LR is at sync_rcu_exp_select_cpus+0x104/0x510
    [ 3369.139479] pc : [] lr : [] pstate: 200001c5
    [ 3369.146860] sp : ffffffc3eb9435a0
    [ 3369.150162] x29: ffffffc3eb9435a0 x28: ffffff8008be4f88
    [ 3369.155465] x27: ffffff8008b66c80 x26: ffffffc3eceb2600
    [ 3369.160767] x25: 0000000000000001 x24: ffffff8008be4f88
    [ 3369.166070] x23: ffffff8008b51c3c x22: ffffff8008b66c80
    [ 3369.171371] x21: 0000000000000001 x20: ffffff8008b21b40
    [ 3369.176673] x19: ffffff8008b66c80 x18: 0000000000000000
    [ 3369.181975] x17: 0000007fa951a010 x16: ffffff80086a30f0
    [ 3369.187278] x15: 0000007fa9505590 x14: 0000000000000000
    [ 3369.192580] x13: ffffff8008b51000 x12: ffffffc3eb940000
    [ 3369.197882] x11: 0000000000000006 x10: ffffff8008b51b78
    [ 3369.203184] x9 : 0000000000000001 x8 : ffffff8008be4000
    [ 3369.208486] x7 : ffffff8008b21b40 x6 : 0000000000001003
    [ 3369.213788] x5 : 0000000000000000 x4 : ffffff8008b27280
    [ 3369.219090] x3 : ffffff8008b21b4c x2 : 0000000000000001
    [ 3369.224406] x1 : 0000000000000001 x0 : 0000000000000140
    ...
    [ 3369.972257] [] sync_rcu_exp_select_cpus+0x188/0x510
    [ 3369.978685] [] synchronize_rcu_expedited+0x64/0xa8
    [ 3369.985026] [] synchronize_net+0x24/0x30
    [ 3369.990499] [] dev_deactivate_many+0x28c/0x298
    [ 3369.996493] [] __dev_close_many+0x60/0xd0
    [ 3370.002052] [] __dev_close+0x28/0x40
    [ 3370.007178] [] __dev_change_flags+0x8c/0x158
    [ 3370.012999] [] dev_change_flags+0x20/0x60
    [ 3370.018558] [] do_setlink+0x288/0x918
    [ 3370.023771] [] rtnl_newlink+0x398/0x6a8
    [ 3370.029158] [] rtnetlink_rcv_msg+0xe4/0x220
    [ 3370.034891] [] netlink_rcv_skb+0xc4/0xf8
    [ 3370.040364] [] rtnetlink_rcv+0x2c/0x40
    [ 3370.045663] [] netlink_unicast+0x160/0x238
    [ 3370.051309] [] netlink_sendmsg+0x2f0/0x358
    [ 3370.056956] [] sock_sendmsg+0x18/0x30
    [ 3370.062168] [] ___sys_sendmsg+0x26c/0x280
    [ 3370.067728] [] __sys_sendmsg+0x44/0x88
    [ 3370.073027] [] SyS_sendmsg+0x10/0x20
    [ 3370.078153] [] el0_svc_naked+0x24/0x28

    Signed-off-by: Mark Rutland
    Reported-by: Dennis Chen
    Cc: Catalin Marinas
    Cc: Josh Triplett
    Cc: Lai Jiangshan
    Cc: Mathieu Desnoyers
    Cc: Steve Capper
    Cc: Steven Rostedt
    Cc: Will Deacon
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Paul E. McKenney

    Mark Rutland
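
    The two helpers, roughly as added (iteration bounds and bit math
    follow the description above):

        #define for_each_leaf_node_possible_cpu(rnp, cpu)                   \
                for ((cpu) = cpumask_next((rnp)->grplo - 1,                 \
                                          cpu_possible_mask);               \
                     (cpu) <= (rnp)->grphi;                                 \
                     (cpu) = cpumask_next((cpu), cpu_possible_mask))

        static inline unsigned long leaf_node_cpu_bit(struct rcu_node *rnp,
                                                      int cpu)
        {
                return 1UL << (cpu - rnp->grplo);
        }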