22 Jul, 2015

1 commit

  • commit 6e91f8cb138625be96070b778d9ba71ce520ea7e upstream.

    If, at the time __rcu_process_callbacks() is invoked, there are callbacks
    in Tiny RCU's callback list, but none of them are ready to be invoked,
    the current list-management code will knit the non-ready callbacks out
    of the list. This can result in hangs and possibly worse. This commit
    therefore inserts a check for there being no callbacks that can be
    invoked immediately.

    This bug is unlikely to occur -- you have to get a new callback after
    rcu_sched_qs() or rcu_bh_qs() is called, but before we get to
    __rcu_process_callbacks(). It was detected by the addition of RCU-bh
    testing to rcutorture, which in turn was instigated by Iftekhar Ahmed's
    mutation testing. Although this bug was made much more likely by
    915e8a4fe45e (rcu: Remove fastpath from __rcu_process_callbacks()), this
    did not cause the bug, but rather made it much more probable. That
    said, it takes more than 40 hours of rcutorture testing, on average,
    for this bug to appear, so this fix cannot be considered an emergency.
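
    The fix amounts to a guard before the list is spliced. A minimal sketch,
    using Tiny RCU's rcu_ctrlblk field names and eliding the surrounding
    function (details are illustrative, not the literal patch):

        /* If no callbacks are ready to invoke, leave the list alone. */
        if (&rcp->rcucblist == rcp->donetail) {
                local_irq_restore(flags);
                return;
        }

        /* Otherwise, splice only the ready callbacks onto a local list. */
        list = rcp->rcucblist;
        rcp->rcucblist = *rcp->donetail;
        *rcp->donetail = NULL;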

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett
    Signed-off-by: Greg Kroah-Hartman

    Paul E. McKenney
     

15 Apr, 2015

1 commit

  • In a misguided attempt to avoid an #ifdef, the use of the
    gp_init_delay module parameter was conditioned on the corresponding
    RCU_TORTURE_TEST_SLOW_INIT Kconfig variable, using IS_ENABLED() at
    the point of use in the code. This meant that the compiler always saw
    the delay, which meant that RCU_TORTURE_TEST_SLOW_INIT_DELAY had to be
    unconditionally defined. This in turn caused "make oldconfig" to ask
    pointless questions about the value of RCU_TORTURE_TEST_SLOW_INIT_DELAY
    in cases where it was not even used.

    This commit avoids these pointless questions by defining gp_init_delay
    under #ifdef. In one branch, gp_init_delay is initialized to
    RCU_TORTURE_TEST_SLOW_INIT_DELAY and is also a module parameter (thus
    allowing boot-time modification), and in the other branch gp_init_delay
    is a const variable initialized by default to zero.

    This approach also simplifies the code at the delay point by eliminating
    the IS_ENABLED(). Because gp_init_delay is constant zero in the no-delay
    case intended for production use, the "gp_init_delay > 0" check causes
    the delay to become dead code, as desired in this case. In addition,
    this commit replaces magic constant "10" with the preprocessor variable
    PER_RCU_NODE_PERIOD, which controls the number of grace periods that
    are allowed to elapse at full speed before a delay is inserted.
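
    The resulting shape is roughly the following sketch (Kconfig and macro
    names as above; the exact delay-point condition is an assumption):

        #ifdef CONFIG_RCU_TORTURE_TEST_SLOW_INIT
        static int gp_init_delay = CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY;
        module_param(gp_init_delay, int, 0644);
        #else
        static const int gp_init_delay;  /* Constant zero for production. */
        #endif

        #define PER_RCU_NODE_PERIOD 10   /* Full-speed GPs between delays. */

        /* At the delay point; dead code when gp_init_delay is const zero. */
        if (gp_init_delay > 0 &&
            !(rsp->gpnum % (rcu_num_nodes * PER_RCU_NODE_PERIOD)))
                schedule_timeout_uninterruptible(gp_init_delay);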

    Reported-by: Linus Torvalds
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

20 Mar, 2015

3 commits

  • Merge branches 'doc.2015.02.26a', 'earlycb.2015.03.03a',
    'fixes.2015.03.03a', 'gpexp.2015.02.26a', 'hotplug.2015.03.20a',
    'sysidle.2015.02.26b' and 'tiny.2015.02.26a' into HEAD

    doc.2015.02.26a: Documentation changes
    earlycb.2015.03.03a: Permit early-boot RCU callbacks
    fixes.2015.03.03a: Miscellaneous fixes
    gpexp.2015.02.26a: In-kernel expediting of normal grace periods
    hotplug.2015.03.20a: CPU hotplug fixes
    sysidle.2015.02.26b: NO_HZ_FULL_SYSIDLE fixes
    tiny.2015.02.26a: TINY_RCU fixes

    Paul E. McKenney
     
  • As noted in earlier commit logs, CPU hotplug operations running
    concurrently with grace-period initialization can result in a given
    leaf rcu_node structure having all CPUs offline and no blocked readers,
    but with this rcu_node structure nevertheless blocking the current
    grace period. Therefore, the quiescent-state forcing code now checks
    for this situation and repairs it.

    Unfortunately, this checking can result in false positives, for example,
    when the last task has just removed itself from this leaf rcu_node
    structure, but has not yet started clearing the ->qsmask bits further
    up the structure. This means that the grace-period kthread (which
    forces quiescent states) and some other task might be attempting to
    concurrently clear these ->qsmask bits. This is usually not a problem:
    One of these tasks will be the first to acquire the upper-level rcu_node
    structure's lock and will therefore clear the bit, and the other task,
    seeing the bit already cleared, will stop trying to clear bits.

    Sadly, this means that the following unusual sequence of events -can-
    result in a problem:

    1. The grace-period kthread wins, and clears the ->qsmask bits.

    2. This is the last thing blocking the current grace period, so
    that the grace-period kthread clears ->qsmask bits all the way
    to the root and finds that the root ->qsmask field is now zero.

    3. Another grace period is required, so that the grace-period kthread
    initializes it, including setting all the needed qsmask bits.

    4. The leaf rcu_node structure (the one that started this whole
    mess) is blocking this new grace period, either because it
    has at least one online CPU or because there is at least one
    task that had blocked within an RCU read-side critical section
    while running on one of this leaf rcu_node structure's CPUs.
    (And yes, that CPU might well have gone offline before the
    grace period in step (3) above started, which can mean that
    there is a task on the leaf rcu_node structure's ->blkd_tasks
    list, but ->qsmask equal to zero.)

    5. The other kthread didn't get around to trying to clear the upper
    level ->qsmask bits until all the above had happened. This means
    that it now sees bits set in the upper-level ->qsmask field, so it
    proceeds to clear them. Too bad that it is doing so on behalf of
    a quiescent state that does not apply to the current grace period!

    This sequence of events can result in the new grace period being too
    short. It can also result in the new grace period ending before the
    leaf rcu_node structure's ->qsmask bits have been cleared, which will
    result in splats during initialization of the next grace period. In
    addition, it can result in tasks blocking the new grace period still
    being queued at the start of the next grace period, which will result
    in other splats. Sasha's testing turned up another of these splats,
    as did rcutorture testing. (And yes, rcutorture is being adjusted to
    make these splats show up more quickly. Which probably is having the
    undesirable side effect of making other problems show up less quickly.
    Can't have everything!)
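
    One way to render such stale reports harmless -- a sketch of the general
    technique, not necessarily the literal fix -- is to tag each
    quiescent-state report with the grace-period number it applies to and
    discard mismatches:

        /* "gps" is the grace-period number snapshotted by the caller. */
        static void rcu_report_qs_rnp(unsigned long mask, struct rcu_state *rsp,
                                      struct rcu_node *rnp, unsigned long gps,
                                      unsigned long flags)
        {
                if (rnp->gpnum != gps || !(rnp->qsmask & mask)) {
                        /* Report is for an earlier grace period; ignore it. */
                        raw_spin_unlock_irqrestore(&rnp->lock, flags);
                        return;
                }
                /* ... otherwise clear bits and propagate up the tree ... */
        }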

    Reported-by: Sasha Levin
    Signed-off-by: Paul E. McKenney
    Cc: # 4.0.x
    Tested-by: Sasha Levin

    Paul E. McKenney
     
  • As noted earlier, the following sequence of events can occur when
    running PREEMPT_RCU and HOTPLUG_CPU on a system with a multi-level
    rcu_node combining tree:

    1. A group of tasks block on CPUs corresponding to a given leaf
    rcu_node structure while within RCU read-side critical sections.
    2. All CPUs corresponding to that rcu_node structure go offline.
    3. The next grace period starts, but because there are still tasks
    blocked, the upper-level bits corresponding to this leaf rcu_node
    structure remain set.
    4. All the tasks exit their RCU read-side critical sections and
    remove themselves from the leaf rcu_node structure's list,
    leaving it empty.
    5. But because there now is code to check for this condition at
    force-quiescent-state time, the upper bits are cleared and the
    grace period completes.

    However, there is another complication that can occur following step 4 above:

    4a. The grace period starts, and the leaf rcu_node structure's
    ->gp_tasks pointer is set to NULL because there are no tasks
    blocked on this structure.
    4b. One of the CPUs corresponding to the leaf rcu_node structure
    comes back online.
    4c. An endless stream of tasks is preempted within RCU read-side
    critical sections on this CPU, such that the ->blkd_tasks
    list is always non-empty.

    The grace period will never end.

    This commit therefore makes the force-quiescent-state processing check only
    for absence of tasks blocking the current grace period rather than absence
    of tasks altogether. This will cause a quiescent state to be reported if
    the current leaf rcu_node structure is not blocking the current grace period
    and its parent thinks that it is, regardless of how RCU managed to get
    itself into this state.
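
    In the rcu_node structure, tasks blocking the current grace period are
    those reachable from the ->gp_tasks pointer. A sketch of the distinction,
    assuming a helper along the lines of the kernel's
    rcu_preempt_blocked_readers_cgp():

        /* Are there tasks blocking the *current* grace period? */
        static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
        {
                return rnp->gp_tasks != NULL;
        }

        /* Force-quiescent-state processing, roughly: */
        if (!rnp->qsmask && !rcu_preempt_blocked_readers_cgp(rnp))
                rcu_report_unblock_qs_rnp(rsp, rnp, flags);  /* Repair. */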

    Signed-off-by: Paul E. McKenney
    Cc: # 4.0.x
    Tested-by: Sasha Levin

    Paul E. McKenney
     

13 Mar, 2015

6 commits

  • At grace-period initialization time, RCU checks that all quiescent
    states were really reported for the previous grace period. Now that
    grace-period cleanup has been split out of grace-period initialization,
    this commit also performs those checks at grace-period cleanup time.
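
    A sketch of the kind of check involved, placed in the grace-period
    cleanup loop (placement and surrounding code are from memory):

        rcu_for_each_node_breadth_first(rsp, rnp) {
                raw_spin_lock_irq(&rnp->lock);
                WARN_ON_ONCE(rnp->qsmask);  /* Unreported quiescent state? */
                /* ... remainder of per-node cleanup ... */
                raw_spin_unlock_irq(&rnp->lock);
        }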

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit informs RCU of an outgoing CPU just before that CPU invokes
    arch_cpu_idle_dead() during its last pass through the idle loop (via a
    new CPU_DYING_IDLE notifier value). This change means that RCU need not
    deal with outgoing CPUs passing through the scheduler after informing
    RCU that they are no longer online. Note that removing the CPU from
    the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
    and orphaning callbacks is still done at CPU_DEAD time, the reason being
    that at CPU_DEAD time we have another CPU that can adopt them.
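
    A sketch of the idle-loop hook on the outgoing CPU (the notifier
    plumbing is simplified and partly assumed):

        /* Final pass through the idle loop on an outgoing CPU: */
        if (cpu_is_offline(smp_processor_id())) {
                rcu_cpu_notify(NULL, CPU_DYING_IDLE,
                               (void *)(long)smp_processor_id());
                arch_cpu_idle_dead();  /* Does not return. */
        }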

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Because RCU grace-period initialization need no longer exclude
    CPU-hotplug operations, this commit eliminates the ->onoff_mutex and
    its uses.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Races between CPU hotplug and grace periods can be difficult to resolve,
    so the ->onoff_mutex is used to exclude the two events. Unfortunately,
    this means that it is impossible for an outgoing CPU to perform the
    last bits of its offlining from its last pass through the idle loop,
    because sleeplocks cannot be acquired in that context.

    This commit avoids these problems by buffering online and offline events
    in a new ->qsmaskinitnext field in the leaf rcu_node structures. When a
    grace period starts, the events accumulated in this mask are applied to
    the ->qsmaskinit field, and, if needed, up the rcu_node tree. The special
    case of all CPUs corresponding to a given leaf rcu_node structure being
    offline while there are still elements in that structure's ->blkd_tasks
    list is handled using a new ->wait_blkd_tasks field. In this case,
    propagating the offline bits up the tree is deferred until the beginning
    of the grace period after all of the tasks have exited their RCU read-side
    critical sections and removed themselves from the list, at which point
    the ->wait_blkd_tasks flag is cleared. If one of that leaf rcu_node
    structure's CPUs comes back online before the list empties, then the
    ->wait_blkd_tasks flag is simply cleared.

    This of course means that RCU's notion of which CPUs are offline can be
    out of date. This is OK because RCU need only wait on CPUs that were
    online at the time that the grace period started. In addition, RCU's
    force-quiescent-state actions will handle the case where a CPU goes
    offline after the grace period starts.
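
    At grace-period start, the buffered events are folded in roughly as in
    this sketch (locking and the propagation helpers are simplified):

        rcu_for_each_leaf_node(rsp, rnp) {
                raw_spin_lock_irq(&rnp->lock);
                if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
                    !rnp->wait_blkd_tasks) {
                        raw_spin_unlock_irq(&rnp->lock);
                        continue;  /* No hotplug events on this node. */
                }
                oldmask = rnp->qsmaskinit;
                rnp->qsmaskinit = rnp->qsmaskinitnext;  /* Apply events. */
                /* ... propagate online/offline changes up the tree,
                   deferring offline propagation (and setting
                   ->wait_blkd_tasks) while ->blkd_tasks is non-empty ... */
                raw_spin_unlock_irq(&rnp->lock);
        }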

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_report_unblock_qs_rnp() function is invoked when the
    last task blocking the current grace period exits its outermost
    RCU read-side critical section. Previously, this was called only
    from rcu_read_unlock_special(), and was therefore defined only when
    CONFIG_RCU_PREEMPT=y. However, this function will be invoked even when
    CONFIG_RCU_PREEMPT=n once CPU-hotplug operations are processed only at
    the beginnings of RCU grace periods. The reason for this change is that
    the last task on a given leaf rcu_node structure's ->blkd_tasks list
    might well exit its RCU read-side critical section between the time that
    recent CPU-hotplug operations were applied and when the new grace period
    was initialized. This situation could result in RCU waiting forever on
    that leaf rcu_node structure, because if all that structure's CPUs were
    already offline, there would be no quiescent-state events to drive that
    structure's part of the grace period.

    This commit therefore moves rcu_report_unblock_qs_rnp() to common code
    that is built unconditionally so that the quiescent-state-forcing code
    can clean up after this situation, avoiding the grace-period stall.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, the rcu_node tree ->expmask bitmasks are initially set to
    reflect the online CPUs. This is pointless, because only the tasks
    preempted within RCU read-side critical sections by the preceding
    synchronize_sched_expedited() need to be tracked. This commit therefore
    instead sets up these bitmasks based on the state of the ->blkd_tasks
    lists.
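
    A sketch of initializing from the ->blkd_tasks lists instead of from
    the set of online CPUs (helper names from memory; treat as
    illustrative):

        /* Must this expedited grace period wait on this leaf's readers? */
        raw_spin_lock_irqsave(&rnp->lock, flags);
        if (!list_empty(&rnp->blkd_tasks)) {
                rnp->exp_tasks = rnp->blkd_tasks.next;  /* Track them. */
                must_wait = true;
        }
        raw_spin_unlock_irqrestore(&rnp->lock, flags);
        if (!must_wait)
                rcu_report_exp_rnp(rsp, rnp, false);  /* Nothing to wait on. */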

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

12 Mar, 2015

7 commits


04 Mar, 2015

8 commits


27 Feb, 2015

14 commits

  • If the RCU grace-period kthread invoking rcu_sysidle_check_cpu()
    happens to be running on the tick_do_timer_cpu initially,
    then rcu_bind_gp_kthread() won't bind it. This kthread might
    then migrate before invoking rcu_gp_fqs(), which will trigger the
    WARN_ON_ONCE() in rcu_sysidle_check_cpu(). This commit therefore makes
    rcu_bind_gp_kthread() do the binding even if the kthread is currently
    on the same CPU. Because this incurs added overhead, this commit also
    causes each RCU grace-period kthread to invoke rcu_bind_gp_kthread()
    once at boot rather than at the beginning of each grace period.
    And as long as rcu_bind_gp_kthread() is being modified, this commit
    eliminates its #ifdef.
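
    One plausible shape for the simplified function, now invoked once when
    the grace-period kthread starts (details are assumptions):

        static void rcu_bind_gp_kthread(void)
        {
                int cpu = tick_do_timer_cpu;

                if (!tick_nohz_full_enabled())
                        return;
                /* Bind unconditionally, even if already on that CPU. */
                if (cpu >= 0 && cpu < nr_cpu_ids)
                        set_cpus_allowed_ptr(current, cpumask_of(cpu));
        }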

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The standard code path already handles the case where no RCU
    callbacks are ready to invoke. Because code size is a priority for
    Tiny RCU, this commit removes the fast path.

    Cc: "Paul E. McKenney"
    Signed-off-by: Alexander Gordeev
    Signed-off-by: Paul E. McKenney

    Alexander Gordeev
     
  • When the ->curtail and ->donetail pointers differ, ->rcucblist
    always points to the beginning of the current list and thus
    cannot be NULL. Therefore, the check ->rcucblist != NULL is
    redundant and this commit removes it.
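
    A sketch of the invariant at work (function and field names follow
    Tiny RCU, reconstructed from memory):

        /* Note a quiescent state: mark all current callbacks as done. */
        static int rcu_qsctr_help(struct rcu_ctrlblk *rcp)
        {
                if (rcp->donetail != rcp->curtail) {
                        /* Tails differ, so ->rcucblist cannot be NULL. */
                        rcp->donetail = rcp->curtail;
                        return 1;
                }
                return 0;
        }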

    Cc: "Paul E. McKenney"
    Signed-off-by: Alexander Gordeev
    Signed-off-by: Paul E. McKenney

    Alexander Gordeev
     
  • On second and subsequent passes through quiescent-state forcing, the
    isidle variable was initialized to false, which would prevent full sysidle
    state from being reached if a grace period needed more than one round
    of quiescent-state forcing (which most should not). However, the check
    for offline CPUs in the quiescent-state forcing main loop had the wrong
    sense, which could prevent CPUs from ever entering full sysidle state.

    This commit fixes both of these bugs. Given that sysidle is not yet
    wired up, this has no effect in current kernels, but might have proven
    frustrating had anyone attempted to wire it up.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The "if" statement at the beginning of rcu_torture_writer() should
    use the same set of variables. In theory, this does not matter because
    the corresponding variables (gp_sync and gp_sync1) have the same value
    at this point in the code, but in practice such puzzles should be
    removed. This commit therefore makes the use of variables consistent.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit adds a CONFIG_RCU_EXPEDITE_BOOT Kconfig parameter
    that emulates a very early boot rcu_expedite_gp(). A late-boot
    call to rcu_end_inkernel_boot() will provide the corresponding
    rcu_unexpedite_gp(). The late-boot call to rcu_end_inkernel_boot()
    should be made just before init is spawned.

    According to Arjan:

    > To show the boot time, I'm using the timestamp of the "Write protecting"
    > line, that's pretty much the last thing we print prior to ring 3 execution.
    >
    > A kernel with default RCU behavior (inside KVM, only virtual devices)
    > looks like this:
    >
    > [ 0.038724] Write protecting the kernel read-only data: 10240k
    >
    > a kernel with expedited RCU (using the command line option, so that I
    > don't have to recompile between measurements and thus am completely
    > oranges-to-oranges)
    >
    > [ 0.031768] Write protecting the kernel read-only data: 10240k
    >
    > which, in percentage, is an 18% improvement.
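
    A sketch of how the boot-time expediting can be wired up (names follow
    the log text above; the initializer trick is an assumption):

        /* Begin boot holding one level of expediting, if so configured. */
        static atomic_t rcu_expedited_nesting =
                ATOMIC_INIT(IS_ENABLED(CONFIG_RCU_EXPEDITE_BOOT) ? 1 : 0);

        /* To be called just before init is spawned. */
        void rcu_end_inkernel_boot(void)
        {
                if (IS_ENABLED(CONFIG_RCU_EXPEDITE_BOOT))
                        rcu_unexpedite_gp();  /* Drop the boot-time level. */
        }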

    Reported-by: Arjan van de Ven
    Signed-off-by: Paul E. McKenney
    Tested-by: Arjan van de Ven

    Paul E. McKenney
     
  • This commit updates open-coded tests of the rcu_expedited variable
    to instead use rcu_gp_is_expedited().

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, expediting of normal synchronous grace-period primitives
    (synchronize_rcu() and friends) is controlled by the rcu_expedited
    boot/sysfs parameter. This works well, but does not handle nesting.
    This commit therefore provides rcu_expedite_gp() to enable expediting
    and rcu_unexpedite_gp() to cancel a prior rcu_expedite_gp(), both of
    which support nesting.
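
    A sketch of a nestable interface of this kind, using the same
    rcu_expedited_nesting counter as in the earlier sketch (the kernel's
    actual definitions may differ in detail):

        void rcu_expedite_gp(void)
        {
                atomic_inc(&rcu_expedited_nesting);
        }
        EXPORT_SYMBOL_GPL(rcu_expedite_gp);

        void rcu_unexpedite_gp(void)
        {
                atomic_dec(&rcu_expedited_nesting);
        }
        EXPORT_SYMBOL_GPL(rcu_unexpedite_gp);

        /* Expedite if either the boot parameter or a nested request asks. */
        bool rcu_gp_is_expedited(void)
        {
                return rcu_expedited || atomic_read(&rcu_expedited_nesting);
        }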

    Reported-by: Arjan van de Ven
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • When a CPU comes online, it initializes its callback list. This
    is a bad thing if this is the first time that the CPU has come
    online and if that CPU has early-boot callbacks. This commit therefore
    avoids initializing the callback list if there are callbacks present,
    in which case the initial call_rcu() did the initialization for us.
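
    A sketch of the guard in the CPU-online path (function and field names
    are Tree RCU's, from memory):

        /* In rcu_init_percpu_data(), as the CPU comes online: */
        if (!rdp->nxtlist)
                init_callback_list(rdp);  /* Keep early-boot callbacks. */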

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Some diagnostics under CONFIG_PROVE_RCU in rcu_nocb_cpu_needs_barrier()
    assume that there can be no early-boot callbacks. This commit therefore
    qualifies the diagnostic with rcu_scheduler_fully_active so that
    early-boot callbacks no longer trigger this splat.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, a call_rcu() that precedes rcu_init() will splat due to the
    callback lists not having yet been initialized. This commit causes the
    first such callback to initialize the boot CPU's RCU callback list.

    Note that this commit does not change rcu_init()-time initialization,
    which means that the callback will be discarded at rcu_init() time.
    Fixing this is the job of later commits.
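
    The same kind of guard as sketched for the CPU-online path above, here
    applied on the callback-posting path (placement is illustrative):

        /* In __call_rcu(), before enqueuing the callback: */
        if (unlikely(rdp->nxtlist == NULL))
                init_callback_list(rdp);  /* Very early boot: set up list. */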

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit wires up the rcu_state structures' ->rda pointers to the
    per-CPU rcu_data structures at compile time, thus ensuring that this
    linkage is present at early boot, in turn allowing posting of callbacks
    before rcu_init() is executed.
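
    A sketch of the compile-time wiring (the kernel does this inside an
    initializer macro; this is an expanded, illustrative form):

        static DEFINE_PER_CPU(struct rcu_data, rcu_sched_data);

        struct rcu_state rcu_sched_state = {
                /* ... other field initializers ... */
                .rda = &rcu_sched_data,  /* Formerly assigned in rcu_init(). */
        };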

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney