01 Dec, 2018

1 commit

  • commit 92aa39e9dc77481b90cbef25e547d66cab901496 upstream.

    The per-CPU rcu_dynticks.rcu_urgent_qs variable communicates an urgent
    need for an RCU quiescent state from the force-quiescent-state processing
    within the grace-period kthread to context switches and to cond_resched().
    Unfortunately, such urgent needs are not communicated to need_resched(),
    which is sometimes used to decide when to invoke cond_resched(), for
    but one example, within the KVM vcpu_run() function. As of v4.15, this
    can result in synchronize_sched() being delayed by up to ten seconds,
    which can be problematic, to say nothing of annoying.

    This commit therefore checks rcu_dynticks.rcu_urgent_qs from within
    rcu_check_callbacks(), which is invoked from the scheduling-clock
    interrupt handler. If the current task is not an idle task and is
    not executing in usermode, a context switch is forced, and either way,
    the rcu_dynticks.rcu_urgent_qs variable is set to false. If the current
    task is an idle task, then RCU's dyntick-idle code will detect the
    quiescent state, so no further action is required. Similarly, if the
    task is executing in usermode, other code in rcu_check_callbacks() and
    its called functions will report the corresponding quiescent state.
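
    A minimal sketch of the added check, based on the commit text (field and
    helper names follow the backported rcu_dynticks layout; exact placement
    within rcu_check_callbacks() may differ across stable trees):

        void rcu_check_callbacks(int user)
        {
                /* Pairs with the store-release that sets ->rcu_urgent_qs. */
                if (smp_load_acquire(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs))) {
                        /* Idle tasks and usermode are quiescent already. */
                        if (!is_idle_task(current) && !user) {
                                set_tsk_need_resched(current);
                                set_preempt_need_resched();
                        }
                        __this_cpu_write(rcu_dynticks.rcu_urgent_qs, false);
                }
                /* ... remainder of the scheduling-clock processing ... */
        }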

    Reported-by: Marius Hillenbrand
    Reported-by: David Woodhouse
    Suggested-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    [ paulmck: Backported to make patch apply cleanly on older versions. ]
    Tested-by: Marius Hillenbrand
Cc: stable@vger.kernel.org # 4.12.x - 4.19.x
    Signed-off-by: Greg Kroah-Hartman

    Paul E. McKenney
     

17 Feb, 2018

1 commit

  • commit 156baec39732f025dc778e00da95fc10d6e45885 upstream.

    Use of init_rcu_head() and destroy_rcu_head() from modules results in
    the following build-time error with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y:

    ERROR: "init_rcu_head" [drivers/scsi/scsi_mod.ko] undefined!
    ERROR: "destroy_rcu_head" [drivers/scsi/scsi_mod.ko] undefined!

    This commit therefore adds EXPORT_SYMBOL_GPL() for each to allow them to
    be used by GPL-licensed kernel modules.
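
    The fix amounts to adding an export next to each definition, roughly (the
    bodies shown follow the existing CONFIG_DEBUG_OBJECTS_RCU_HEAD helpers):

        void init_rcu_head(struct rcu_head *head)
        {
                debug_object_init(head, &rcuhead_debug_descr);
        }
        EXPORT_SYMBOL_GPL(init_rcu_head);

        void destroy_rcu_head(struct rcu_head *head)
        {
                debug_object_free(head, &rcuhead_debug_descr);
        }
        EXPORT_SYMBOL_GPL(destroy_rcu_head);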

    Reported-by: Bart Van Assche
    Reported-by: Stephen Rothwell
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Greg Kroah-Hartman

    Paul E. McKenney
     

24 Nov, 2017

1 commit

  • commit 135bd1a230bb69a68c9808a7d25467318900b80a upstream.

    The pending-callbacks check in rcu_prepare_for_idle() is backwards.
    It should accelerate if there are pending callbacks, but the check
    rather uselessly accelerates only if there are no callbacks. This commit
    therefore inverts this check.
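
    In effect, the fix is a one-character inversion, roughly:

        for_each_rcu_flavor(rsp) {
                rdp = this_cpu_ptr(rsp->rda);
                /* Was: if (rcu_segcblist_pend_cbs(&rdp->cblist)) -- which
                 * skipped acceleration exactly when there was work to do. */
                if (!rcu_segcblist_pend_cbs(&rdp->cblist))
                        continue;
                /* ... accelerate the pending callbacks ... */
        }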

    Fixes: 15fecf89e46a ("srcu: Abstract multi-tail callback list handling")
    Signed-off-by: Neeraj Upadhyay
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Greg Kroah-Hartman

    Neeraj Upadhyay
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.
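
    Concretely, a tagged C source file now starts with a line such as:

        // SPDX-License-Identifier: GPL-2.0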

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier should be applied
    to a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file-by-file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source.
    - File already had some variant of a license header in it (even if <5
    lines).

    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

04 Oct, 2017

1 commit

  • Pull tracing fixlets from Steven Rostedt:
    "Two updates:

    - A memory fix for leftover code from splitting out ftrace_ops and the
    function graph tracer, where the function graph tracer could reset
    the trampoline pointer, leaving the old trampoline unfreed (a memory
    leak).

    - The update to Paul's patch that added the unnecessary READ_ONCE().
    This removes the unnecessary READ_ONCE() instead of having to
    rebase the branch to update the patch that added it"

    * tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}()
    ftrace: Fix kmemleak in unregister_ftrace_graph

    Linus Torvalds
     

03 Oct, 2017

1 commit

  • The read of ->dynticks_nmi_nesting in rcu_irq_enter() and rcu_irq_exit()
    is currently protected with READ_ONCE(). However, this protection is
    unnecessary because (1) ->dynticks_nmi_nesting is updated only by the
    current CPU, (2) although NMI handlers can update this field, they reset
    it back to its old value before returning, and (3) interrupts are disabled,
    so nothing else can modify it. The value of ->dynticks_nmi_nesting is
    thus effectively constant, and so no protection is required.

    This commit therefore removes the READ_ONCE() protection from these
    two accesses.
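
    The change is simply to drop the wrapper, roughly:

        /* Before, in both rcu_irq_enter() and rcu_irq_exit(): */
        if (READ_ONCE(rdtp->dynticks_nmi_nesting))
                return;

        /* After: interrupts are off and only this CPU writes the field,
         * so a plain read suffices. */
        if (rdtp->dynticks_nmi_nesting)
                return;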

    Link: http://lkml.kernel.org/r/20170926031902.GA2074@linux.vnet.ibm.com

    Reported-by: Linus Torvalds
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Steven Rostedt (VMware)

    Paul E. McKenney
     

26 Sep, 2017

1 commit

  • Pull tracing fixes from Steven Rostedt:
    "Stack tracing and RCU has been having issues with each other and
    lockdep has been pointing out constant problems.

    The changes have been going into the stack tracer, but it has been
    discovered that the problem isn't with the stack tracer itself, but
    with calling save_stack_trace() from within the internals of RCU.

    The stack tracer is the one that can trigger the issue most easily,
    but examining the problem further, it could also happen from a WARN()
    in the wrong place, or even if an NMI happened in this area and it did
    an rcu_read_lock().

    The critical area is where RCU is not watching, which can happen while
    going to and from idle, or while bringing up or taking down a CPU.

    The final fix was to put the protection in kernel_text_address() as it
    is the one that requires RCU to be watching while doing the stack
    trace.

    To make this work properly, Paul had to allow rcu_irq_enter() to happen
    after rcu_nmi_enter(). This should have been done anyway, since an NMI
    can page fault (reading vmalloc area), and a page fault triggers
    rcu_irq_enter().

    One patch is just a consolidation of code so that the fix only needed
    to be done in one location"

    * tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Remove RCU work arounds from stack tracer
    extable: Enable RCU if it is not watching in kernel_text_address()
    extable: Consolidate *kernel_text_address() functions
    rcu: Allow for page faults in NMI handlers
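
    The kernel_text_address() fix follows this pattern (a simplified sketch of
    the approach described above; the real function also checks ftrace and
    other kernel text ranges):

        int kernel_text_address(unsigned long addr)
        {
                bool no_rcu;
                int ret;

                if (core_kernel_text(addr))
                        return 1;

                /* The lookups below use RCU; make RCU watch if needed. */
                no_rcu = !rcu_is_watching();
                if (no_rcu)
                        rcu_nmi_enter();

                ret = is_module_text_address(addr) ? 1 : 0;

                if (no_rcu)
                        rcu_nmi_exit();
                return ret;
        }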

    Linus Torvalds
     

24 Sep, 2017

1 commit

  • A number of architectures invoke rcu_irq_enter() on exception entry in
    order to allow RCU read-side critical sections in the exception handler
    when the exception is from an idle or nohz_full CPU. This works, at
    least unless the exception happens in an NMI handler. In that case,
    rcu_nmi_enter() would already have exited the extended quiescent state,
    which would mean that rcu_irq_enter() would (incorrectly) cause RCU
    to think that it is again in an extended quiescent state. This will
    in turn result in lockdep splats in response to later RCU read-side
    critical sections.

    This commit therefore causes rcu_irq_enter() and rcu_irq_exit() to
    take no action if there is an rcu_nmi_enter() in effect, thus avoiding
    the unscheduled return to RCU quiescent state. This in turn should
    make the kernel safe for on-demand RCU voyeurism.
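
    The guard amounts to an early return while NMI nesting is in effect,
    roughly (a follow-up commit, shown earlier in this log, later drops the
    READ_ONCE()):

        void rcu_irq_enter(void)
        {
                struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);

                /* An NMI handler has already exited the extended quiescent
                 * state, so taking further action here would confuse RCU. */
                if (READ_ONCE(rdtp->dynticks_nmi_nesting))
                        return;

                /* ... existing rcu_irq_enter() processing ... */
        }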

    Link: http://lkml.kernel.org/r/20170922211022.GA18084@linux.vnet.ibm.com

    Cc: stable@vger.kernel.org
    Fixes: 0be964be0 ("module: Sanitize RCU usage and locking")
    Reported-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Steven Rostedt (VMware)

    Paul E. McKenney
     

09 Sep, 2017

1 commit

  • First, the number of CPUs can't be negative.

    Second, different signedness leads to suboptimal code in the following
    cases:

    1)
    kmalloc(nr_cpu_ids * sizeof(X));

    "int" has to be sign extended to size_t.

    2)
    while (*pos < nr_cpu_ids)    /* where pos is a loff_t * */

    MOVSXD is 1 byte longer than the same MOV.

    Other cases exist as well. Basically, the compiler is told that nr_cpu_ids
    can't be negative, which it can't deduce when the type is "int".
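
    The core of the change is a one-word type change:

        /* Before: */
        extern int nr_cpu_ids;

        /* After: */
        extern unsigned int nr_cpu_ids;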

    Code savings on allyesconfig kernel: -3KB

    add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
    function                     old    new   delta
    coretemp_cpu_online          450    512     +62
    rcu_init_one                1234   1272     +38
    pci_device_probe             374    399     +25

    ...

    pgdat_reclaimable_pages      628    556     -72
    select_fallback_rq           446    369     -77
    task_numa_find_cpu          1923   1807    -116

    Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

17 Aug, 2017

13 commits

  • Merge branches 'doc.2017.08.17a', 'fixes.2017.08.17a',
    'hotplug.2017.07.25b', 'misc.2017.08.17a',
    'spin_unlock_wait_no.2017.08.17a', 'srcu.2017.07.27c' and
    'torture.2017.07.24c' into HEAD

    doc.2017.08.17a: Documentation updates.
    fixes.2017.08.17a: RCU fixes.
    hotplug.2017.07.25b: CPU-hotplug updates.
    misc.2017.08.17a: Miscellaneous fixes outside of RCU (give or take conflicts).
    spin_unlock_wait_no.2017.08.17a: Remove spin_unlock_wait().
    srcu.2017.07.27c: SRCU updates.
    torture.2017.07.24c: Torture-test updates.

    Paul E. McKenney
     
  • The rcu_idle_exit() and rcu_idle_enter() functions are exported because
    they were originally used by RCU_NONIDLE(), which was intended to
    be usable from modules. However, RCU_NONIDLE() now instead uses
    rcu_irq_enter_irqson() and rcu_irq_exit_irqson(), which are not
    exported, and there have been no complaints.

    This commit therefore removes the exports from rcu_idle_exit() and
    rcu_idle_enter().

    Reported-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • All current callers of rcu_idle_enter() have irqs disabled, and
    rcu_idle_enter() relies on this, but doesn't check. This commit
    therefore adds an RCU_LOCKDEP_WARN() to add some verification to the trust.
    While we are there, pass "true" rather than "1" to rcu_eqs_enter().
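
    A sketch of the result (the warning string is illustrative; the "true"
    argument follows the commit text):

        void rcu_idle_enter(void)
        {
                RCU_LOCKDEP_WARN(!irqs_disabled(),
                                 "rcu_idle_enter() invoked with irqs enabled!");
                rcu_eqs_enter(true);    /* Was rcu_eqs_enter(1). */
        }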

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • All callers to rcu_idle_enter() have irqs disabled, so there is no
    point in rcu_idle_enter() disabling them again. This commit therefore
    replaces the irq disabling with an RCU_LOCKDEP_WARN().

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Paul E. McKenney

    Peter Zijlstra (Intel)
     
  • This commit adds assertions verifying the consistency of the rcu_node
    structure's ->blkd_tasks list and its ->gp_tasks, ->exp_tasks, and
    ->boost_tasks pointers. In particular, the ->blkd_tasks lists must be
    empty except for leaf rcu_node structures.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Set disable_rcu_irq_enter not only in rcu_eqs_enter_common() but also in
    rcu_eqs_exit(), since rcu_eqs_exit() suffers from the same issue as was
    fixed for rcu_eqs_enter_common() by commit 03ecd3f48e57 ("rcu/tracing:
    Add rcu_disabled to denote when rcu_irq_enter() will not work").

    Signed-off-by: Masami Hiramatsu
    Acked-by: Steven Rostedt (VMware)
    Signed-off-by: Paul E. McKenney

    Masami Hiramatsu
     
  • The _rcu_barrier_trace() function is a wrapper for trace_rcu_barrier(),
    which needs TPS() protection for strings passed through the second
    argument. However, it has escaped prior TPS()-ification efforts because
    _rcu_barrier_trace() does not start with "trace_". This commit
    therefore adds the needed TPS() protection.
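
    A representative call site (not the full diff):

        /* Before: */
        _rcu_barrier_trace(rsp, "Begin", -1, rsp->barrier_sequence);

        /* After: TPS() keeps the string resolvable in crash-dump traces. */
        _rcu_barrier_trace(rsp, TPS("Begin"), -1, rsp->barrier_sequence);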

    Signed-off-by: Paul E. McKenney
    Acked-by: Steven Rostedt (VMware)

    Paul E. McKenney
     
  • These RCU waits were set to use interruptible waits to avoid the kthreads
    contributing to the system load average, even though they are not
    interruptible, as they are spawned from a kthread. Use the new TASK_IDLE
    swaits, which make our goal clear and remove confusion about these paths
    possibly being interruptible -- they are not.

    When the system is idle the RCU grace-period kthread will spend all its time
    blocked inside the swait_event_interruptible(). If a non-interruptible wait
    were used instead, this kthread would contribute to the load average. This means
    that an idle system would have a load average of 2 (or 3 if PREEMPT=y),
    rather than the load average of 0 that almost fifty years of UNIX has
    conditioned sysadmins to expect.

    The same argument applies to swait_event_interruptible_timeout() use. The
    RCU grace-period kthread spends its time blocked inside this call while
    waiting for grace periods to complete. In particular, if there was only one
    busy CPU, but that CPU was frequently invoking call_rcu(), then the RCU
    grace-period kthread would spend almost all its time blocked inside the
    swait_event_interruptible_timeout(). This would mean that the load average
    would be 2 rather than the expected 1 for the single busy CPU.
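
    A sketch of the substitution, using the grace-period kthread's gp_wq wait
    as the example (swait_event_idle() is the TASK_IDLE variant introduced
    alongside this change):

        /* Before: "interruptible" only to keep the load average at 0. */
        swait_event_interruptible(rsp->gp_wq,
                        READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_INIT);

        /* After: TASK_IDLE makes the intent explicit, with the same
         * load-average behavior. */
        swait_event_idle(rsp->gp_wq,
                        READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_INIT);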

    Acked-by: "Eric W. Biederman"
    Tested-by: Paul E. McKenney
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: Paul E. McKenney

    Luis R. Rodriguez
     
  • There is currently event tracing to track when a task is preempted
    within a preemptible RCU read-side critical section, and also when that
    task subsequently reaches its outermost rcu_read_unlock(), but none
    indicating when a new grace period starts when that grace period must
    wait on pre-existing readers that have been preempted at least once
    since the beginning of their current RCU read-side critical sections.

    This commit therefore adds an event trace at grace-period start in
    the case where there are such readers. Note that only the first
    reader in the list is traced.

    Signed-off-by: Paul E. McKenney
    Acked-by: Steven Rostedt (VMware)

    Paul E. McKenney
     
  • This commit saves a few lines in kernel/rcu/rcu.h by moving to single-line
    definitions for trivial functions, instead of the old style where the
    two curly braces each get their own line.
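
    That is, trivial definitions such as the following (function name
    hypothetical):

        /* Old style: */
        static inline void rcu_example_nop(void)
        {
        }

        /* New single-line style: */
        static inline void rcu_example_nop(void) { }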

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Strings used in event tracing need to be specially handled, for example,
    using the TPS() macro. Without the TPS() macro, although output looks
    fine from within a running kernel, extracting traces from a crash dump
    produces garbage instead of strings. This commit therefore adds the TPS()
    macro to some unadorned strings that were passed to event-tracing macros.
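
    A representative call site (one of several touched by the commit):

        /* Before: */
        trace_rcu_grace_period(rsp->name, rsp->gpnum, "newreq");

        /* After: */
        trace_rcu_grace_period(rsp->name, rsp->gpnum, TPS("newreq"));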

    Signed-off-by: Paul E. McKenney
    Acked-by: Steven Rostedt (VMware)

    Paul E. McKenney
     
  • Currently, the exit-time support for TASKS_RCU is open-coded in do_exit().
    This commit creates exit_tasks_rcu_start() and exit_tasks_rcu_finish()
    APIs for do_exit() use. This has the benefit of confining the use of the
    tasks_rcu_exit_srcu variable to one file, allowing it to become static.
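
    A sketch of the resulting do_exit() usage (placement within do_exit() is
    illustrative):

        void do_exit(long code)
        {
                struct task_struct *tsk = current;
                /* ... */
                exit_tasks_rcu_start();
                exit_notify(tsk, group_dead);
                exit_tasks_rcu_finish();
                /* ... */
        }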

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The actual use of TASKS_RCU is only when PREEMPT, otherwise RCU-sched
    is used instead. This commit therefore makes synchronize_rcu_tasks()
    and call_rcu_tasks() available always, but mapped to synchronize_sched()
    and call_rcu_sched(), respectively, when !PREEMPT. This approach also
    allows some #ifdefs to be removed from rcutorture.
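
    Roughly, the resulting mapping in the !PREEMPT case looks like this:

        #ifdef CONFIG_TASKS_RCU
        /* Real TASKS_RCU implementation (PREEMPT kernels). */
        void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func);
        void synchronize_rcu_tasks(void);
        #else /* !CONFIG_TASKS_RCU */
        /* Without preemption, voluntary context switches are the needed
         * quiescent states, so RCU-sched provides the same guarantee. */
        #define call_rcu_tasks          call_rcu_sched
        #define synchronize_rcu_tasks   synchronize_sched
        #endif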

    Reported-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Masami Hiramatsu
    Acked-by: Ingo Molnar

    Paul E. McKenney
     

28 Jul, 2017

1 commit

  • Tree RCU guarantees that every online CPU has a memory barrier between
    any given grace period and any of that CPU's RCU read-side sections that
    must be ordered against that grace period. Since RCU doesn't always
    know where read-side critical sections are, the actual implementation
    guarantees order against prior and subsequent non-idle non-offline code,
    whether in an RCU read-side critical section or not. As a result, there
    does not need to be a memory barrier at the end of synchronize_rcu()
    and friends because the ordering internal to the grace period has
    ordered every CPU's post-grace-period execution against each CPU's
    pre-grace-period execution, again for all non-idle online CPUs.

    In contrast, SRCU can have non-idle online CPUs that are completely
    uninvolved in a given SRCU grace period, for example, a CPU that
    never runs any SRCU read-side critical sections and took no part in
    the grace-period processing. It is in theory possible for a given
    synchronize_srcu()'s wakeup to be delivered to a CPU that was completely
    uninvolved in the prior SRCU grace period, which could mean that the
    code following that synchronize_srcu() would end up being unordered with
    respect to both the grace period and any pre-existing SRCU read-side
    critical sections.

    This commit therefore adds an smp_mb() to the end of __synchronize_srcu(),
    which prevents this scenario from occurring.
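
    The barrier sits at the very end of __synchronize_srcu(), roughly:

        static void __synchronize_srcu(struct srcu_struct *sp, bool do_norm)
        {
                /* ... queue the callback and wait for the grace period ... */
                wait_for_completion(&rcu.completion);
                destroy_rcu_head_on_stack(&rcu.head);

                /* Order subsequent code after the SRCU grace period, even on
                 * CPUs that took no part in grace-period processing. */
                smp_mb();
        }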

    Reported-by: Lance Roy
    Signed-off-by: Paul E. McKenney
    Acked-by: Lance Roy
    Cc: stable@vger.kernel.org # 4.12.x

    Paul E. McKenney
     

26 Jul, 2017

12 commits

  • After adopting callbacks from a newly offlined CPU, the adopting CPU
    checks that its callback list's count is zero if and only if the
    list has no callbacks. Unfortunately, it does so after
    enabling interrupts, which means that false positives are possible due to
    interrupt handlers invoking call_rcu(). Although these false positives
    are improbable, rcutorture did make it happen once.

    This commit therefore moves this check to an irq-disabled region of code,
    thus suppressing the false positive.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Given changes to callback migration, rcu_cblist_head(),
    rcu_cblist_tail(), rcu_cblist_count_cbs(), rcu_segcblist_segempty(),
    rcu_segcblist_dequeued_lazy(), and rcu_segcblist_new_cbs() are
    no longer used. This commit therefore removes them.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Given that the rcu_state structure's ->orphan_pend and ->orphan_done
    fields are used only during migration of callbacks from the recently
    offlined CPU to a surviving CPU, if rcu_send_cbs_to_orphanage() and
    rcu_adopt_orphan_cbs() are combined, these fields can become local
    variables in the combined function. This commit therefore combines
    rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() into a new
    rcu_segcblist_merge() function and removes the ->orphan_pend and
    ->orphan_done fields.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • When migrating callbacks from a newly offlined CPU, we are already
    holding the root rcu_node structure's lock, so it costs almost nothing
    to advance and accelerate the newly migrated callbacks. This patch
    therefore makes this advancing and acceleration happen.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The ->orphan_lock is acquired and released only within the
    rcu_migrate_callbacks() function, which now acquires the root rcu_node
    structure's ->lock. This commit therefore eliminates the ->orphan_lock
    in favor of the root rcu_node structure's ->lock.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • It is possible that the outgoing CPU is unaware of recent grace periods,
    and so it is also possible that some of its pending callbacks are actually
    ready to be invoked. The current callback-migration code would needlessly
    force these callbacks to pass through another grace period. This commit
    therefore invokes rcu_advance_cbs() on the outgoing CPU's callbacks in
    order to give them full credit for having passed through any recent
    grace periods.
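
    In outline, combined with the neighboring migration changes (a sketch;
    locking elided):

        /* In the callback-migration path, root rcu_node ->lock held: */
        rcu_advance_cbs(rsp, rnp_root, rdp);    /* Outgoing CPU: credit recent GPs. */
        rcu_advance_cbs(rsp, rnp_root, my_rdp); /* Surviving CPU: number its CBs. */
        rcu_segcblist_merge(&my_rdp->cblist, &rdp->cblist);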

    This also fixes an odd theoretical bug where there are no callbacks in
    the system except for those on the outgoing CPU, none of those callbacks
    have yet been associated with a grace-period number, there is never again
    another callback registered, and the surviving CPU never again takes a
    scheduling-clock interrupt, never goes idle, and never enters nohz_full
    userspace execution. Yes, this is (just barely) possible. It requires
    that the surviving CPU be a nohz_full CPU, that its scheduler-clock
    interrupt be shut off, and that it loop forever in the kernel. You get
    bonus points if you can make this one happen! ;-)

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • RCU's CPU-hotplug callback-migration code first moves the outgoing
    CPU's callbacks to ->orphan_done and ->orphan_pend, and only then
    moves them to the NOCB callback list. This commit avoids the
    extra step (and simplifies the code) by moving the callbacks directly
    from the outgoing CPU's callback list to the NOCB callback list.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The current CPU-hotplug RCU-callback-migration code checks
    for the source (newly offlined) CPU being a NOCBs CPU down in
    rcu_send_cbs_to_orphanage(). This commit simplifies callback migration a
    bit by moving this check up to rcu_migrate_callbacks(). This commit also
    adds a check for the source CPU having no callbacks, which eases analysis
    of the rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() functions.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_node structure's ->n_cbs_orphaned and ->n_cbs_adopted fields
    are updated, but never read. This commit therefore removes them.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The updates of ->expmaskinitnext and ->ncpus are unsynchronized,
    with the value of ->ncpus being incremented long before the corresponding
    ->expmaskinitnext mask is updated. If an RCU expedited grace period
    sees ->ncpus change, it will update the ->expmaskinit masks from the new
    ->expmaskinitnext masks. But it is possible that ->ncpus has already
    been updated, but the ->expmaskinitnext masks still have their old values.
    For the current expedited grace period, no harm done. The CPU could not
    have been online before the grace period started, so there is no need to
    wait for its non-existent pre-existing readers.

    But the next RCU expedited grace period is in a world of hurt. The value
    of ->ncpus has already been updated, so this grace period will assume
    that the ->expmaskinitnext masks have not changed. But they have, and
    they won't be taken into account until the next never-been-online CPU
    comes online. This means that RCU will be ignoring some CPUs that it
    should be paying attention to.

    The solution is to update ->ncpus and ->expmaskinitnext while holding
    the ->lock for the rcu_node structure containing the ->expmaskinitnext
    mask. Because smp_store_release() is now used to update ->ncpus and
    smp_load_acquire() is now used to locklessly read it, if the expedited
    grace period sees ->ncpus change, then the updating CPU has to
    already be holding the corresponding ->lock. Therefore, when the
    expedited grace period later acquires that ->lock, it is guaranteed
    to see the new value of ->expmaskinitnext.

    On the other hand, if the expedited grace period loads ->ncpus just
    before an update, earlier full memory barriers guarantee that
    the incoming CPU isn't far enough along to be running any RCU readers.

    This commit therefore makes the required change.
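
    In outline (a sketch of the pairing; the ->ncpus_snap bookkeeping shown
    here is illustrative):

        /* CPU-onlining path, leaf rcu_node ->lock held: */
        rnp->expmaskinitnext |= mask;
        smp_store_release(&rsp->ncpus, rsp->ncpus + nbits);

        /* Expedited grace period, lockless check: */
        if (smp_load_acquire(&rsp->ncpus) == rsp->ncpus_snap)
                return; /* No new CPUs, so the masks are unchanged. */
        /* Otherwise, acquiring each rnp->lock below guarantees that the
         * updated ->expmaskinitnext values are seen. */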

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • RCU callbacks must be migrated away from an outgoing CPU, and this is
    done near the end of the CPU-hotplug operation, after the outgoing CPU is
    long gone. Unfortunately, this means that other CPU-hotplug callbacks
    can execute while the outgoing CPU's callbacks are still immobilized
    on the long-gone CPU's callback lists. If any of these CPU-hotplug
    callbacks must wait, either directly or indirectly, for the invocation
    of any of the immobilized RCU callbacks, the system will hang.

    This commit avoids such hangs by migrating the callbacks away from the
    outgoing CPU immediately upon its departure, shortly after the return
    from __cpu_die() in takedown_cpu(). Thus, RCU is able to advance these
    callbacks and invoke them, which allows all the after-the-fact CPU-hotplug
    callbacks to wait on these RCU callbacks without risk of a hang.
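
    A sketch of the new call point (per the commit text):

        static int takedown_cpu(unsigned int cpu)
        {
                /* ... */
                __cpu_die(cpu);                 /* Outgoing CPU is gone. */
                rcutree_migrate_callbacks(cpu); /* Adopt its callbacks now, so
                                                 * later hotplug callbacks can
                                                 * safely wait on RCU. */
                return 0;
        }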

    While in the neighborhood, this commit also moves rcu_send_cbs_to_orphanage()
    and rcu_adopt_orphan_cbs() under a pre-existing #ifdef to avoid including
    dead code on the one hand and to avoid define-without-use warnings on the
    other hand.

    Reported-by: Jeffrey Hugo
    Link: http://lkml.kernel.org/r/db9c91f6-1b17-6136-84f0-03c3c2581ab4@codeaurora.org
    Signed-off-by: Paul E. McKenney
    Cc: Thomas Gleixner
    Cc: Sebastian Andrzej Siewior
    Cc: Ingo Molnar
    Cc: Anna-Maria Gleixner
    Cc: Boris Ostrovsky
    Cc: Richard Weinberger

    Paul E. McKenney
     
  • The handling of RCU's no-CBs CPUs has a maintenance headache, namely
    that if call_rcu() is invoked with interrupts disabled, the rcuo kthread
    wakeup must be deferred to a point where we can be sure that scheduler
    locks are not held. Of course, there are a lot of code paths leading
    from an interrupts-disabled invocation of call_rcu(), and missing any
    one of these can result in excessive callback-invocation latency, and
    potentially even system hangs.

    This commit therefore uses a timer to guarantee that the wakeup will
    eventually occur. If one of the deferred-wakeup points kicks in, then
    the timer is simply cancelled.
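
    In outline (helper names are illustrative except where given above):

        /* call_rcu() with irqs disabled: defer the rcuo kthread wakeup,
         * arming a one-jiffy timer as a backstop. */
        static void wake_nocb_leader_defer(struct rcu_data *rdp, int waketype)
        {
                WRITE_ONCE(rdp->nocb_defer_wakeup, waketype);
                mod_timer(&rdp->nocb_timer, jiffies + 1);
        }

        /* Whichever deferred-wakeup point runs first performs the wakeup
         * and cancels the backstop timer. */
        static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
        {
                del_timer(&rdp->nocb_timer);
                /* ... wake the no-CBs leader kthread ... */
        }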

    This commit also fixes up an incomplete removal of commits that were
    intended to plug remaining exit paths, which should have the added
    benefit of reducing the overhead of RCU's context-switch hooks. In
    addition, it simplifies leader-to-follower callback-list handoff by
    introducing locking. The call_rcu()-to-leader handoff continues to
    use atomic operations in order to maintain good real-time latency for
    common-case use of call_rcu().

    Signed-off-by: Paul E. McKenney
    [ paulmck: Dan Carpenter fix for mod_timer() usage bug found by smatch. ]

    Paul E. McKenney
     

25 Jul, 2017

3 commits