16 May, 2013

1 commit

  • When rcu_init() is called, slab is already working; allocating
    bootmem at that point results in warnings and an allocation from
    slab. This commit therefore changes alloc_bootmem_cpumask_var() to
    alloc_cpumask_var() in rcu_bootup_announce_oddness(), which is called
    from rcu_init().
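
    A minimal sketch of the final form (zalloc_cpumask_var() per the note
    below; surrounding context and error handling are illustrative):

        static cpumask_var_t rcu_nocb_mask;

        static void __init rcu_bootup_announce_oddness(void)
        {
                /* slab is up by rcu_init() time, so allocate the zeroed
                 * cpumask from slab rather than from bootmem. */
                if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL))
                        pr_warn("Failed to allocate rcu_nocb_mask\n");
        }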

    Signed-off-by: Sasha Levin
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett
    Tested-by: Robin Holt

    [paulmck: convert to zalloc_cpumask_var(), as suggested by Yinghai Lu.]

    Sasha Levin
     

15 May, 2013

1 commit

  • Commit c0f4dfd4f (rcu: Make RCU_FAST_NO_HZ take advantage of numbered
    callbacks) introduced a bug that can result in excessively long grace
    periods. The bug reverses the sense of the "if" statement checking
    for lazy callbacks, so that RCU takes a lazy approach when there are
    in fact non-lazy callbacks. This can result in excessive boot, suspend,
    and resume times.

    This commit therefore fixes the sense of this "if" statement.
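
    The shape of the corrected test, sketched with this era's field and
    macro names (fragment illustrative, not the literal diff):

        unsigned long dj;  /* delay until the next RCU check, in jiffies */

        /* Take the lazy (long) timeout only when ALL callbacks are lazy. */
        if (rdtp->all_lazy)
                dj = round_jiffies(RCU_IDLE_LAZY_GP_DELAY + jiffies) - jiffies;
        else
                dj = round_jiffies(RCU_IDLE_GP_DELAY + jiffies) - jiffies;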

    Reported-by: Borislav Petkov
    Reported-by: Bjørn Mork
    Reported-by: Joerg Roedel
    Signed-off-by: Paul E. McKenney
    Tested-by: Bjørn Mork
    Tested-by: Joerg Roedel

    Paul E. McKenney
     

06 May, 2013

1 commit

  • Pull 'full dynticks' support from Ingo Molnar:
    "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
    kernel feature to the timer and scheduler subsystems: 'full dynticks',
    or CONFIG_NO_HZ_FULL=y.

    This feature extends the nohz variable-size timer tick feature from
    idle to busy CPUs (running at most one task) as well, potentially
    reducing the number of timer interrupts significantly.

    This feature got motivated by real-time folks and the -rt tree, but
    the general utility and motivation of full-dynticks runs wider than
    that:

    - HPC workloads get faster: CPUs running a single task should be able
    to utilize a maximum amount of CPU power. A periodic timer tick at
    HZ=1000 can cause a constant overhead of up to 1.0%. This feature
    removes that overhead - and speeds up the system by 0.5%-1.0% on
    typical distro configs even on modern systems.

    - Real-time workload latency reduction: CPUs running critical tasks
    should experience as little jitter as possible. The last remaining
    source of kernel-related jitter was the periodic timer tick.

    - A single task executing on a CPU is a pretty common situation,
    especially with an increasing number of cores/CPUs, so this feature
    helps desktop and mobile workloads as well.

    The cost of the feature is mainly related to increased timer
    reprogramming overhead when a CPU switches its tick period, and thus
    slightly longer to-idle and from-idle latency.

    Configuration-wise a third mode of operation is added to the existing
    two NOHZ kconfig modes:

    - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
    as a config option. This is the traditional Linux periodic tick
    design: there's a HZ tick going on all the time, regardless of
    whether a CPU is idle or not.

    - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
    periodic tick when a CPU enters idle mode.

    - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
    tick when a CPU is idle, also slows the tick down to 1 Hz (one
    timer interrupt per second) when only a single task is running on a
    CPU.

    The .config behavior is compatible: existing !CONFIG_NO_HZ and
    CONFIG_NO_HZ=y settings get translated to the new values, without the
    user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
    default.

    This feature is based on a lot of infrastructure work that has been
    steadily going upstream in the last 2-3 cycles: related RCU support
    and non-periodic cputime support in particular is upstream already.

    This tree adds the final pieces and activates the feature. The pull
    request is marked RFC because:

    - it's marked 64-bit only at the moment - the 32-bit support patch is
    small but did not get ready in time.

    - it has a number of fresh commits that came in after the merge
    window. The overwhelming majority of commits are from before the
    merge window, but still some aspects of the tree are fresh and so I
    marked it RFC.

    - it's a pretty wide-reaching feature with lots of effects - and
    while the components have been in testing for some time, the full
    combination is still not very widely used. That it's default-off
    should reduce its regression abilities and obviously there are no
    known regressions with CONFIG_NO_HZ_FULL=y enabled either.

    - the feature is not completely idempotent: there is no 100%
    equivalent replacement for a periodic scheduler/timer tick. In
    particular there's ongoing work to map out and reduce its effects
    on scheduler load-balancing and statistics. This should not impact
    correctness though, there are no known regressions related to this
    feature at this point.

    - it's a pretty ambitious feature that with time will likely be
    enabled by most Linux distros, and we'd like you to make input on
    its design/implementation, if you dislike some aspect we missed.
    Without flaming us to crisp! :-)

    Future plans:

    - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
    the periodic tick altogether when there's a single busy task on a
    CPU. We'd first like 1 Hz to be exposed more widely before we go
    for the 0 Hz target though.

    - once we reach 0 Hz we can remove the periodic tick assumption from
    nr_running>=2 as well, by essentially interrupting busy tasks only
    as frequently as the sched_latency constraints require us to do -
    once every 4-40 msecs, depending on nr_running.

    I am personally leaning towards biting the bullet and doing this in
    v3.10, like the -rt tree this effort has been going on for too long -
    but the final word is up to you as usual.

    More technical details can be found in Documentation/timers/NO_HZ.txt"
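
    For reference, a sketch of how the new mode is selected (fragment
    illustrative; the CPU list is an example):

        # .config: exactly one of the three tick modes
        # CONFIG_HZ_PERIODIC=y    # tick always runs
        # CONFIG_NO_HZ_IDLE=y     # tick stops on idle CPUs (old CONFIG_NO_HZ=y)
        CONFIG_NO_HZ_FULL=y       # tick also drops to 1 Hz on single-task CPUs

        # Boot parameter naming the full-dynticks CPUs (CPU 0 stays periodic):
        #   nohz_full=1-3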

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    sched: Keep at least 1 tick per second for active dynticks tasks
    rcu: Fix full dynticks' dependency on wide RCU nocb mode
    nohz: Protect smp_processor_id() in tick_nohz_task_switch()
    nohz_full: Add documentation.
    cputime_nsecs: use math64.h for nsec resolution conversion helpers
    nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
    nohz: Reduce overhead under high-freq idling patterns
    nohz: Remove full dynticks' superfluous dependency on RCU tree
    nohz: Fix unavailable tick_stop tracepoint in dynticks idle
    nohz: Add basic tracing
    nohz: Select wide RCU nocb for full dynticks
    nohz: Disable the tick when irq resume in full dynticks CPU
    nohz: Re-evaluate the tick for the new task after a context switch
    nohz: Prepare to stop the tick on irq exit
    nohz: Implement full dynticks kick
    nohz: Re-evaluate the tick from the scheduler IPI
    sched: New helper to prevent from stopping the tick in full dynticks
    sched: Kick full dynticks CPU that have more than one task enqueued.
    perf: New helper to prevent full dynticks CPUs from stopping tick
    perf: Kick full dynticks CPU if events rotation is needed
    ...

    Linus Torvalds
     

19 Apr, 2013

1 commit

  • We need full dynticks CPU to also be RCU nocb so
    that we don't have to keep the tick to handle RCU
    callbacks.

    Make sure the range passed to the nohz_full= boot
    parameter is a subset of rcu_nocbs=.

    The CPUs that fail to meet this requirement will be
    excluded from the nohz_full range. This is checked
    early in boot time, before any CPU has the opportunity
    to stop its tick.
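
    A sketch of the containment check (mask names as in this era's code;
    details illustrative):

        /* Every nohz_full= CPU must also be an rcu_nocbs= CPU. */
        if (!cpumask_subset(tick_nohz_full_mask, rcu_nocb_mask)) {
                /* Trim the offenders out of the full-dynticks set. */
                cpumask_and(tick_nohz_full_mask,
                            tick_nohz_full_mask, rcu_nocb_mask);
                pr_warn("NO_HZ: excluding nohz_full CPUs lacking rcu_nocbs\n");
        }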

    Suggested-by: Steven Rostedt
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

16 Apr, 2013

1 commit

  • Adaptive-ticks CPUs inform RCU when they enter kernel mode, but they do
    not necessarily turn the scheduler-clock tick back on. This state of
    affairs could result in RCU waiting on an adaptive-ticks CPU running
    for an extended period in kernel mode. Such a CPU will never run the
    RCU state machine, and could therefore indefinitely extend the RCU state
    machine, sooner or later resulting in an OOM condition.

    This patch, inspired by an earlier patch by Frederic Weisbecker, therefore
    causes RCU's force-quiescent-state processing to check for this condition
    and to send an IPI to CPUs that remain in that state for too long.
    "Too long" currently means about three jiffies by default, which is
    quite some time for a CPU to remain in the kernel without blocking.
    The rcutree.jiffies_till_first_fqs and rcutree.jiffies_till_next_fqs
    sysfs variables may be used to tune "too long" if needed.
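
    For example, both timeouts can be set from the kernel command line
    (values illustrative):

        rcutree.jiffies_till_first_fqs=3 rcutree.jiffies_till_next_fqs=3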

    Reported-by: Frederic Weisbecker
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett
    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Paul E. McKenney
     

26 Mar, 2013

12 commits

  • doc.2013.03.12a: Documentation changes.

    fixes.2013.03.13a: Miscellaneous fixes.

    idlenocb.2013.03.26b: Remove restrictions on no-CBs CPUs, make
    RCU_FAST_NO_HZ take advantage of numbered callbacks, add
    callback acceleration based on numbered callbacks.

    Paul E. McKenney
     
  • CPUs going idle will need to record the need for a future grace
    period, but won't actually need to block waiting on it. This commit
    therefore splits rcu_start_future_gp(), which does the recording, from
    rcu_nocb_wait_gp(), which now invokes rcu_start_future_gp() to do the
    recording, after which rcu_nocb_wait_gp() does the waiting.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • CPUs going idle need to be able to indicate their need for future grace
    periods. A mechanism for doing this already exists for no-callbacks
    CPUs, so the idea is to re-use that mechanism. This commit therefore
    moves the ->n_nocb_gp_requests field of the rcu_node structure out from
    under the CONFIG_RCU_NOCB_CPU #ifdef and renames it to ->need_future_gp.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • If CPUs are to give prior notice of needed grace periods, it will be
    necessary to invoke rcu_start_gp() without dropping the root rcu_node
    structure's ->lock. This commit takes a second step in this direction
    by moving the release of this lock to rcu_start_gp()'s callers.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Dyntick-idle CPUs need to be able to pre-announce their need for grace
    periods. This can be done using something similar to the mechanism used
    by no-CB CPUs to announce their need for grace periods. This commit
    moves in this direction by renaming the no-CBs grace-period event tracing
    to suit the new future-grace-period needs.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Because RCU callbacks are now associated with the number of the grace
    period that they must wait for, CPUs can now take advance callbacks
    corresponding to grace periods that ended while a given CPU was in
    dyntick-idle mode. This eliminates the need to try forcing the RCU
    state machine while entering idle, thus reducing the CPU intensiveness
    of RCU_FAST_NO_HZ, which should increase its energy efficiency.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • RCU_FAST_NO_HZ operation is controlled by four compile-time C-preprocessor
    macros, but some use cases benefit greatly from runtime adjustment,
    particularly when tuning devices. This commit therefore creates the
    corresponding sysfs entries.
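
    Illustrative runtime access (the parameter name shown is an assumption
    derived from the corresponding compile-time macro):

        ls /sys/module/rcutree/parameters/
        echo 6 > /sys/module/rcutree/parameters/rcu_idle_gp_delay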

    Reported-by: Robin Randhawa
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, the per-no-CBs-CPU kthreads are named "rcuo" followed by
    the CPU number, for example, "rcuo0" for CPU 0. This is problematic given that
    there are either two or three RCU flavors, each of which gets a per-CPU
    kthread with exactly the same name. This commit therefore introduces
    a one-letter abbreviation for each RCU flavor, namely 'b' for RCU-bh,
    'p' for RCU-preempt, and 's' for RCU-sched. This abbreviation is used
    to distinguish the "rcuo" kthreads, for example, for CPU 0 we would have
    "rcuob/0", "rcuop/0", and "rcuos/0".

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Tested-by: Dietmar Eggemann

    Paul E. McKenney
     
  • Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, the no-CBs kthreads do repeated timed waits for grace periods
    to elapse. This is crude and energy inefficient, so this commit allows
    no-CBs kthreads to specify exactly which grace period they are waiting
    for and also allows them to block for the entire duration until the
    desired grace period completes.
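
    A sketch of the resulting wait, using field names from this period's
    no-CBs code (treat details as illustrative):

        /* Block until grace period "c" completes; the wait queue is
         * indexed by the low-order bit of the awaited GP number. */
        wait_event_interruptible(rnp->nocb_gp_wq[c & 0x1],
                                 ULONG_CMP_GE(ACCESS_ONCE(rnp->completed), c));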

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, the only way to specify no-CBs CPUs is via the rcu_nocbs
    kernel command-line parameter. This is inconvenient in some cases,
    particularly for randconfig testing, so this commit adds a new set of
    kernel configuration parameters. CONFIG_RCU_NOCB_CPU_NONE (the default)
    retains the old behavior, CONFIG_RCU_NOCB_CPU_ZERO offloads callback
    processing from CPU 0 (along with any other CPUs specified by the
    rcu_nocbs boot-time parameter), and CONFIG_RCU_NOCB_CPU_ALL offloads
    callback processing from all CPUs.
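
    Illustrative Kconfig selection (default shown enabled):

        CONFIG_RCU_NOCB_CPU=y
        CONFIG_RCU_NOCB_CPU_NONE=y    # offload only CPUs given by rcu_nocbs=
        # CONFIG_RCU_NOCB_CPU_ZERO=y  # additionally offload CPU 0
        # CONFIG_RCU_NOCB_CPU_ALL=y   # offload every CPU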

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

14 Mar, 2013

1 commit

  • If RCU's softirq handler is prevented from executing, an RCU CPU stall
    warning can result. Ways to prevent RCU's softirq handler from executing
    include: (1) CPU spinning with interrupts disabled, (2) infinite loop
    in some softirq handler, and (3) in -rt kernels, an infinite loop in a
    set of real-time threads running at priorities higher than that of RCU's
    softirq handler.

    Because this situation can be difficult to track down, this commit causes
    the count of RCU softirq handler invocations to be printed with RCU
    CPU stall warnings. This information does require some interpretation,
    as now documented in Documentation/RCU/stallwarn.txt.

    Reported-by: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Tested-by: Paul Gortmaker

    Paul E. McKenney
     

13 Mar, 2013

1 commit

  • Currently, CPU 0 is constrained to not be a no-CBs CPU, and furthermore
    at least one no-CBs CPU must remain online at any given time. These
    restrictions are problematic in some situations, such as cases where
    all CPUs must run a real-time workload that needs to be insulated from
    OS jitter and latencies due to RCU callback invocation. This commit
    therefore provides no-CBs CPUs a (very crude and energy-inefficient)
    way to start and to wait for grace periods independently of the normal
    RCU callback mechanisms. This approach allows any or all of the CPUs to
    be designated as no-CBs CPUs, and allows any proper subset of the CPUs
    (whether no-CBs CPUs or not) to be offlined.

    This commit also provides a fix for a locking bug spotted by Xie
    Changlong.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

09 Jan, 2013

2 commits

  • The as-documented rcu_nocb_poll will fail to enable this feature
    for two reasons. (1) there is an extra "s" in the documented
    name which is not in the code, and (2) since it uses module_param,
    it really is expecting a prefix, akin to "rcutree.fanout_leaf"
    and the prefix isn't documented.

    However, there are several reasons why we might not want to
    simply fix the typo and add the prefix:

    1) we'd end up with rcutree.rcu_nocb_poll, and rather probably make
    a change to rcutree.nocb_poll

    2) if we did #1, then the prefix wouldn't be consistent with the
    rcu_nocbs= parameter (i.e. one with, one without prefix)

    3) the use of module_param in a header file is less than desired,
    since it isn't immediately obvious that it will get processed
    via rcutree.c and get the prefix from that (although use of
    module_param_named() could clarify that.)

    4) the implied export of /sys/module/rcutree/parameters/rcu_nocb_poll
    data to userspace via module_param() doesn't really buy us anything,
    as it is read-only and we can tell if it is enabled already without
    it, since there is a printk at early boot telling us so.

    In light of all that, just change it from a module_param() to an
    early_param() call, and worry about adding it to /sys later on if
    we decide to allow a dynamic setting of it.

    Also change the variable to be tagged as __read_mostly, since it
    will only ever be fiddled with, at most, once at boot.
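
    A sketch of the conversion (close to, though not necessarily identical
    with, the actual change):

        static bool __read_mostly rcu_nocb_poll;  /* now tagged __read_mostly */

        static int __init parse_rcu_nocb_poll(char *arg)
        {
                rcu_nocb_poll = true;
                return 0;
        }
        early_param("rcu_nocb_poll", parse_rcu_nocb_poll);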

    Signed-off-by: Paul Gortmaker
    Signed-off-by: Paul E. McKenney

    Paul Gortmaker
     
  • The wait_event() at the head of the rcu_nocb_kthread() can result in
    soft-lockup complaints if the CPU in question does not register RCU
    callbacks for an extended period. This commit therefore changes
    the wait_event() to a wait_event_interruptible().
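
    The changed wait, sketched with this code's field names (illustrative):

        /* An interruptible sleep is ignored by the soft-lockup detector,
         * so a long wait for the first callback no longer complains. */
        wait_event_interruptible(rdp->nocb_wq, rdp->nocb_head != NULL);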

    Reported-by: Frederic Weisbecker
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Paul E. McKenney

    Paul Gortmaker
     

17 Nov, 2012

3 commits

  • Currently, callback invocations from callback-free CPUs are accounted to
    the CPU that registered the callback, but using the same field that is
    used for normal callbacks. This makes it impossible to determine from
    debugfs output whether callbacks are in fact being diverted. This commit
    therefore adds a separate ->n_nocbs_invoked field in the rcu_data structure
    in which diverted callback invocations are counted. RCU's debugfs tracing
    still displays normal callback invocations using ci=, but displays
    diverted callbacks with nci=.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • RCU callback execution can add significant OS jitter and also can
    degrade both scheduling latency and, in asymmetric multiprocessors,
    energy efficiency. This commit therefore adds the ability for selected
    CPUs ("rcu_nocbs=" boot parameter) to have their callbacks offloaded
    to kthreads. If the "rcu_nocb_poll" boot parameter is also specified,
    these kthreads will do polling, removing the need for the offloaded
    CPUs to do wakeups. At least one CPU must be doing normal callback
    processing: currently CPU 0 cannot be selected as a no-CBs CPU.
    In addition, attempts to offline the last normal-CBs CPU will fail.
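
    Example invocation (CPU list illustrative):

        rcu_nocbs=1-7 rcu_nocb_poll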

    This feature was inspired by Jim Houston's and Joe Korty's JRCU, and
    this commit includes fixes to problems located by Fengguang Wu's
    kbuild test robot.

    [ paulmck: Added gfp.h include file as suggested by Fengguang Wu. ]

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • …cu.2012.10.27a', 'stall.2012.11.13a', 'tracing.2012.11.08a' and 'idle.2012.10.24a' into HEAD

    urgent.2012.10.27a: Fix for RCU user-mode transition (already in -tip).

    doc.2012.11.08a: Documentation updates, most notably codifying the
    memory-barrier guarantees inherent to grace periods.

    fixes.2012.11.13a: Miscellaneous fixes.

    srcu.2012.10.27a: Allow statically allocated and initialized srcu_struct
    structures (courtesy of Lai Jiangshan).

    stall.2012.11.13a: Add more diagnostic information to RCU CPU stall
    warnings, and decrease the stall timeout from 60 seconds to 21 seconds.

    hotplug.2012.11.08a: Minor updates to CPU hotplug handling.

    tracing.2012.11.08a: Improved debugfs tracing, courtesy of Michael Wang.

    idle.2012.10.24a: Updates to RCU idle/adaptive-idle handling, including
    a boot parameter that maps normal grace periods to expedited.

    Resolved conflict in kernel/rcutree.c due to side-by-side change.

    Paul E. McKenney
     

14 Nov, 2012

1 commit

  • This commit explicitly states the memory-ordering properties of the
    RCU grace-period primitives. Although these properties were in some
    sense implied by the fundamental property of RCU ("a grace period must
    wait for all pre-existing RCU read-side critical sections to complete"),
    stating it explicitly will be a great labor-saving device.

    Reported-by: Oleg Nesterov
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Oleg Nesterov

    Paul E. McKenney
     

09 Nov, 2012

1 commit

  • The ->onofflock field in the rcu_state structure at one time synchronized
    CPU-hotplug operations for RCU. However, its scope has decreased over time
    so that it now only protects the lists of orphaned RCU callbacks. This
    commit therefore renames it to ->orphan_lock to reflect its current use.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

24 Oct, 2012

1 commit

  • There have been some embedded applications that would benefit from
    use of expedited grace-period primitives. In some ways, this is
    similar to synchronize_net() doing either a normal or an expedited
    grace period depending on lock state, but with control outside of
    the kernel.

    This commit therefore adds rcu_expedited boot and sysfs parameters
    that cause the kernel to substitute expedited primitives for the
    normal grace-period primitives.

    [ paulmck: Add trace/event/rcu.h to kernel/srcu.c to avoid build error.
    Get rid of infinite loop through contention path.]
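
    Illustrative usage of the new knobs (parameter prefix and sysfs path
    are assumptions, not taken from this changelog):

        rcupdate.rcu_expedited=1              # boot-time module parameter
        echo 1 > /sys/kernel/rcu_expedited    # runtime toggle via sysfs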

    Signed-off-by: Antti P Miettinen
    Signed-off-by: Paul E. McKenney

    Antti P Miettinen
     

26 Sep, 2012

2 commits

  • The current implementation of RCU_FAST_NO_HZ tries reasonably hard to rid
    the current CPU of RCU callbacks. This is appropriate when the CPU is
    entering idle, where it doesn't have much useful to do anyway, but is most
    definitely not what you want when transitioning to user-mode execution.
    This commit therefore detects the adaptive-tick case, and refrains from
    burning CPU time getting rid of RCU callbacks in that case.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The conflicts between kernel/rcutree.h and kernel/rcutree_plugin.h
    were due to adjacent insertions and deletions, which were resolved
    by simply accepting the changes on both branches.

    Paul E. McKenney
     

25 Sep, 2012

1 commit

  • …', 'hotplug.2012.09.23a' and 'idlechop.2012.09.23a' into HEAD

    bigrt.2012.09.23a contains additional commits to reduce scheduling latency
    from RCU on huge systems (many hundreds or thousands of CPUs).

    doctorture.2012.09.23a contains documentation changes and rcutorture fixes.

    fixes.2012.09.23a contains miscellaneous fixes.

    hotplug.2012.09.23a contains CPU-hotplug-related changes.

    idlechop.2012.09.23a fixes architectures for which RCU no longer considered
    the idle loop to be a quiescent state due to earlier
    adaptive-dynticks changes. Affected architectures are alpha,
    cris, frv, h8300, m32r, m68k, mn10300, parisc, score, xtensa,
    and ia64.

    Paul E. McKenney
     

23 Sep, 2012

9 commits

  • The print_cpu_stall_fast_no_hz() function attempts to print -1 when
    the ->idle_gp_timer is not pending, but unsigned arithmetic causes it
    to instead print ULONG_MAX, which is 4294967295 on 32-bit systems and
    18446744073709551615 on 64-bit systems. Neither of these are the most
    reader-friendly values, so this commit instead causes "timer not pending"
    to be printed when ->idle_gp_timer is not pending.

    Reported-by: Paul Walmsley
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_print_detail_task_stall_rnp() function invokes
    rcu_preempt_blocked_readers_cgp() to verify that there are some preempted
    RCU readers blocking the current grace period outside of the protection
    of the rcu_node structure's ->lock. This means that the last blocked
    reader might exit its RCU read-side critical section and remove itself
    from the ->blkd_tasks list before the ->lock is acquired, resulting in
    a segmentation fault when the subsequent code attempts to dereference
    the now-NULL gp_tasks pointer.

    This commit therefore moves the test under the lock. This will not
    have measurable effect on lock contention because this code is invoked
    only when printing RCU CPU stall warnings, in other words, in the common
    case, never.
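
    The reordering, sketched with the lock and predicate named in the text
    (fragment illustrative):

        raw_spin_lock_irqsave(&rnp->lock, flags);
        if (!rcu_preempt_blocked_readers_cgp(rnp)) {
                /* No blocked readers: nothing to print for this rcu_node. */
                raw_spin_unlock_irqrestore(&rnp->lock, flags);
                return;
        }
        /* ->gp_tasks cannot go NULL while ->lock is held: safe to walk. */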

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The increment_cpu_stall_ticks() function listed each RCU flavor
    explicitly, with an ifdef to handle preemptible RCU. This commit
    therefore applies for_each_rcu_flavor() to save a line of code.

    Because this commit switches from a code-based enumeration of the
    flavors of RCU to an rcu_state-list-based enumeration, it is no longer
    possible to apply __get_cpu_var() to the per-CPU rcu_data structures.
    We instead use __this_cpu_ptr() on the rcu_state structure's ->rda field
    that references the corresponding rcu_data structures.
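
    A sketch of the result (per-CPU field name illustrative):

        static void increment_cpu_stall_ticks(void)
        {
                struct rcu_state *rsp;

                for_each_rcu_flavor(rsp)
                        __this_cpu_ptr(rsp->rda)->ticks_this_gp++;
        }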

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Commit 1217ed1b (rcu: permit rcu_read_unlock() to be called while holding
    runqueue locks) made rcu_initiate_boost() restore irq state when releasing
    the rcu_node structure's ->lock, but failed to update the header comment
    accordingly. This commit therefore brings the header comment up to date.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The rcu_preempt_offline_tasks() function moves all tasks queued on a
    given leaf rcu_node structure to the root rcu_node, which is done when
    the last CPU corresponding to the leaf rcu_node structure goes offline.
    Now that
    RCU-preempt's synchronize_rcu_expedited() implementation blocks CPU-hotplug
    operations during the initialization of each rcu_node structure's
    ->boost_tasks pointer, rcu_preempt_offline_tasks() can do a better job
    of setting the root rcu_node's ->boost_tasks pointer.

    The key point is that rcu_preempt_offline_tasks() runs as part of the
    CPU-hotplug process, so that a concurrent synchronize_rcu_expedited()
    is guaranteed to either have not started on the one hand (in which case
    there is no boosting on behalf of the expedited grace period) or to be
    completely initialized on the other (in which case, in the absence of
    other priority boosting, all ->boost_tasks pointers will be initialized).
    Therefore, if rcu_preempt_offline_tasks() finds that the ->boost_tasks
    pointer is equal to the ->exp_tasks pointer, it can be sure that it is
    correctly placed.

    In the case where there was boosting ongoing at the time that the
    synchronize_rcu_expedited() function started, different nodes might start
    boosting the tasks blocking the expedited grace period at different times.
    In this mixed case, the root node will either be boosting tasks for
    the expedited grace period already, or it will start as soon as it gets
    done boosting for the normal grace period -- but in this latter case,
    the root node's tasks needed to be boosted in any case.

    This commit therefore adds a comparison of the ->boost_tasks pointer
    against the ->exp_tasks pointer to the list of conditions that prevent
    updating ->boost_tasks.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • When rcu_preempt_offline_tasks() clears tasks from a leaf rcu_node
    structure, it does not NULL out the structure's ->boost_tasks field.
    This commit therefore fixes this issue.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The current quiescent-state detection algorithm is needlessly
    complex. It records the grace-period number corresponding to
    the quiescent state at the time of the quiescent state, which
    works, but it seems better to simply erase any record of previous
    quiescent states at the time that the CPU notices the new grace
    period. This has the further advantage of removing another piece
    of RCU for which lockless reasoning is required.

    Therefore, this commit makes this change.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The synchronize_rcu_expedited() function disables interrupts across a
    scan of all leaf rcu_node structures, which is not good for real-time
    scheduling latency on large systems (hundreds or especially thousands
    of CPUs). This commit therefore holds off CPU-hotplug operations using
    get_online_cpus(), and removes the prior acquisition of the ->onofflock
    (which required disabling interrupts).
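
    The structural change, sketched:

        /* Exclude CPU-hotplug operations without disabling interrupts. */
        get_online_cpus();
        /* ... scan the leaf rcu_node structures, preemptibly ... */
        put_online_cpus();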

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • In the C language, signed overflow is undefined. It is true that
    two's-complement arithmetic normally comes to the rescue, but the
    compiler can subvert this any time it has any information about the
    values being compared. For example, given "if (a - b > 0)", if the
    compiler has enough information to realize that (for example) the
    value of "a" is positive and that of "b" is negative, the compiler
    is within its rights to optimize to a simple "if (1)", which might
    not be what you want.

    This commit therefore converts synchronize_rcu_expedited()'s work-done
    detection counter from signed to unsigned.
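
    A sketch of the hazard and the change (variable name as suggested by
    this commit's context; comparison macro from rcupdate.h):

        /* Unsigned arithmetic wraps with defined behavior, so "has the
         * counter advanced past my snapshot?" stays meaningful. */
        static unsigned long sync_rcu_preempt_exp_count;  /* was signed */
        unsigned long snap;

        snap = ACCESS_ONCE(sync_rcu_preempt_exp_count) + 1;
        /* ... later, detect that someone else did our work: */
        if (ULONG_CMP_LT(snap, ACCESS_ONCE(sync_rcu_preempt_exp_count)))
                return;  /* the needed grace period already elapsed */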

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney