07 Oct, 2015

16 commits

  • The torturing_tasks() function is used only in kernels built with
    CONFIG_PROVE_RCU=y, so the second definition can result in unused-function
    compiler warnings. This commit adds __maybe_unused to suppress these
    warnings.
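
    As a rough standalone illustration (plain C, not the rcutorture code
    itself; the function name below is hypothetical), __maybe_unused maps
    to the compiler's unused attribute and keeps -Wunused-function quiet
    for definitions that are referenced only in some configurations:

    #include <stdio.h>

    /* In the kernel this comes from the compiler-attribute headers. */
    #define __maybe_unused __attribute__((__unused__))

    /* Referenced only under certain #ifdefs in the real code, so the
     * annotation suppresses the warning in the other configurations. */
    static int __maybe_unused demo_torturing_tasks(void)
    {
            return 0;
    }

    int main(void)
    {
            printf("builds cleanly with -Wall -Wunused-function\n");
            return 0;
    }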

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The rcutorture module has a list of torture types, and specifying a
    type not on this list is supposed to cleanly fail the module load.
    Unfortunately, the "fail" happens without the "cleanly". This commit
    therefore adds the needed clean-up after an incorrect torture_type.
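
    A minimal userspace analog of the pattern (hypothetical names, not the
    rcutorture code): on an unrecognized type, the init path jumps to the
    same unwind label used by other failure paths, so partially initialized
    state is torn down before the error is returned:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static const char *const torture_types[] = { "rcu", "rcu_bh", "srcu" };

    static int demo_module_init(const char *type)
    {
            int err = 0;
            size_t i;
            char *state = malloc(16);        /* stands in for partial setup */

            if (!state)
                    return -1;
            for (i = 0; i < sizeof(torture_types) / sizeof(torture_types[0]); i++)
                    if (!strcmp(type, torture_types[i]))
                            goto found;
            fprintf(stderr, "invalid torture type: %s\n", type);
            err = -1;
            goto unwind;                     /* fail, but cleanly */
    found:
            printf("torture type %s selected\n", type);
    unwind:
            free(state);                     /* tear down the partial setup */
            return err;
    }

    int main(void)
    {
            demo_module_init("nonexistent"); /* exercises the clean failure path */
            return 0;
    }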

    Reported-by: David Miller
    Signed-off-by: Paul E. McKenney
    Acked-by: David Miller
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • 1. Rename __rcu_sync_is_idle() to rcu_sync_lockdep_assert() and
    change it to use rcu_lockdep_assert().

    2. Change rcu_sync_is_idle() to return rsp->gp_state == GP_IDLE
    unconditionally; this way we can remove the same check from
    rcu_sync_lockdep_assert() and clearly isolate the debugging
    code.

    Note: rcu_sync_enter()->wait_event(gp_state == GP_PASSED) needs
    another CONFIG_PROVE_RCU check, the same as is done in ->sync(); but
    this needs some simple preparations in the core RCU code to avoid the
    code duplication.
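
    A rough sketch of the resulting split (simplified; the kernel sources
    are authoritative), with the debug helper compiled away when
    CONFIG_PROVE_RCU is not set:

    #ifdef CONFIG_PROVE_RCU
    extern void rcu_sync_lockdep_assert(struct rcu_sync *rsp);
    #else
    static inline void rcu_sync_lockdep_assert(struct rcu_sync *rsp) { }
    #endif

    static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
    {
            rcu_sync_lockdep_assert(rsp);
            return rsp->gp_state == GP_IDLE;        /* now unconditional */
    }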

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Oleg Nesterov
     
  • This commit allows rcu_sync structures to be safely deallocated.
    The trick is to add a new ->wait field to the gp_ops array.
    This field is a pointer to the rcu_barrier() function corresponding
    to the flavor of RCU in question. This allows a new rcu_sync_dtor()
    to wait for any outstanding callbacks before freeing the rcu_sync
    structure.
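
    The idea, in sketch form (field and array names follow the description
    above; gp_type as the per-structure flavor index is assumed, and the
    actual patch may differ in detail):

    struct rcu_sync_ops {
            void (*sync)(void);
            void (*call)(struct rcu_head *head, void (*func)(struct rcu_head *head));
            void (*wait)(void);     /* rcu_barrier(), rcu_barrier_sched(), ... */
    };

    void rcu_sync_dtor(struct rcu_sync *rsp)
    {
            /* Wait out any callback still queued for this flavor of RCU. */
            gp_ops[rsp->gp_type].wait();
            /* Now the caller may safely free *rsp. */
    }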

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Oleg Nesterov
     
  • This commit validates that the caller of rcu_sync_is_idle() holds the
    corresponding type of RCU read-side lock, but only in kernels built
    with CONFIG_PROVE_RCU=y. This validation is carried out via a new
    rcu_sync_ops->held() method that is checked within rcu_sync_is_idle().

    Note that although this does add code to the fast path, it only does so
    in kernels built with CONFIG_PROVE_RCU=y.
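
    Roughly (a sketch, not the verbatim patch), each flavor supplies a
    predicate such as rcu_read_lock_held() or rcu_read_lock_sched_held(),
    and the CONFIG_PROVE_RCU=y build checks it on the fast path:

    #ifdef CONFIG_PROVE_RCU
    void rcu_sync_lockdep_assert(struct rcu_sync *rsp)
    {
            rcu_lockdep_assert(gp_ops[rsp->gp_type].held(),
                               "suspicious rcu_sync_is_idle() usage");
    }
    #endif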

    Suggested-by: "Paul E. McKenney"
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Oleg Nesterov
     
  • This commit adds the new struct rcu_sync_ops which holds sync/call
    methods, and turns the function pointers in rcu_sync_struct into an array
    of struct rcu_sync_ops. This simplifies the "init" helpers by collapsing
    a switch statement and explicit multiple definitions into a simple
    assignment and a helper macro, respectively.
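
    Something along these lines (illustrative; the exact initializers in
    the patch may differ):

    static const struct rcu_sync_ops gp_ops[] = {
            [RCU_SYNC] = {
                    .sync = synchronize_rcu,
                    .call = call_rcu,
            },
            [RCU_SCHED_SYNC] = {
                    .sync = synchronize_sched,
                    .call = call_rcu_sched,
            },
            [RCU_BH_SYNC] = {
                    .sync = synchronize_rcu_bh,
                    .call = call_rcu_bh,
            },
    };

    void rcu_sync_init(struct rcu_sync *rsp, enum rcu_sync_type type)
    {
            memset(rsp, 0, sizeof(*rsp));
            rsp->gp_type = type;    /* indexes gp_ops[]; no switch needed */
    }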

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Oleg Nesterov
     
  • The rcu_sync infrastructure provides a basis for implementing
    reader-writer primitives that have extremely lightweight readers
    during times when there are no writers. The first use is in
    the percpu_rwsem used by the VFS subsystem.

    This infrastructure is functionally equivalent to

    struct rcu_sync_struct {
            atomic_t counter;
    };

    /* Check possibility of fast-path read-side operations. */
    static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
    {
            return atomic_read(&rss->counter) == 0;
    }

    /* Tell readers to use slowpaths. */
    static inline void rcu_sync_enter(struct rcu_sync_struct *rss)
    {
            atomic_inc(&rss->counter);
            synchronize_sched();
    }

    /* Allow readers to once again use fastpaths. */
    static inline void rcu_sync_exit(struct rcu_sync_struct *rss)
    {
            synchronize_sched();
            atomic_dec(&rss->counter);
    }

    The main difference is that it records the state and only calls
    synchronize_sched() if required. At least some of the calls to
    synchronize_sched() will be optimized away when rcu_sync_enter() and
    rcu_sync_exit() are invoked repeatedly in quick succession.
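
    A sketch of the intended usage (fast_path() and slow_path() are
    hypothetical stand-ins):

    /* Reader: inside an RCU-sched read-side critical section. */
    rcu_read_lock_sched();
    if (rcu_sync_is_idle(&rss))
            fast_path();            /* no writers anywhere */
    else
            slow_path();            /* a writer is, or recently was, active */
    rcu_read_unlock_sched();

    /* Writer: force readers onto the slow path, then release them. */
    rcu_sync_enter(&rss);
    /* ... exclusive writer-side work ... */
    rcu_sync_exit(&rss);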

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Oleg Nesterov
     
  • This commit moves cond_resched_rcu_qs() into stutter_wait(), saving
    a line and also avoiding RCU CPU stall warnings from all torture
    loops containing a stutter_wait().

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This commit corrects the comment for the values of the ->gp_state field,
    which previously incorrectly said that these were for the ->gp_flags
    field.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Commit 4cdfc175c25c89ee ("rcu: Move quiescent-state forcing
    into kthread") started the process of folding the old ->fqs_state into
    ->gp_state, but did not complete it. This situation does not cause
    any malfunction, but can result in extremely confusing trace output.
    This commit completes this task of eliminating ->fqs_state in favor
    of ->gp_state.

    The old ->fqs_state was also used to decide when to collect dyntick-idle
    snapshots. For this purpose, this commit adds a boolean variable to the
    grace-period kthread that is set on the first call to rcu_gp_fqs() for a
    given grace period and cleared otherwise.
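
    Schematically (heavily simplified from the actual grace-period kthread
    loop; the flag name is illustrative):

    bool first_gp_fqs = true;       /* reset at the start of each grace period */

    /* ... each pass of the kthread's force-quiescent-state loop ... */
    rcu_gp_fqs(rsp, first_gp_fqs);
    first_gp_fqs = false;           /* later passes recheck rather than re-snapshot */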

    Signed-off-by: Petr Mladek
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Petr Mladek
     
  • Currently, __srcu_read_lock() cannot be invoked from restricted
    environments because it contains calls to preempt_disable() and
    preempt_enable(), both of which can invoke lockdep, which is a bad
    idea in some restricted execution modes. This commit therefore moves
    the preempt_disable() and preempt_enable() from __srcu_read_lock()
    to srcu_read_lock(). It also inserts the preempt_disable() and
    preempt_enable() around the call to __srcu_read_lock() in do_exit().
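
    The resulting wrapper looks roughly like this (a sketch; see
    include/linux/srcu.h for the authoritative version):

    static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
    {
            int retval;

            preempt_disable();
            retval = __srcu_read_lock(sp);
            preempt_enable();
            rcu_lock_acquire(&(sp)->dep_map);
            return retval;
    }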

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This commit makes the RCU CPU stall warning message print online/offline
    indications immediately after a hyphen following the CPU number. An "O"
    indicates that the global CPU-hotplug system believes that the CPU is
    online, an "o" that RCU perceived the CPU to be online at the beginning
    of the current expedited grace period, and an "N" that RCU currently
    believes that it will perceive the CPU as being online at the beginning
    of the next expedited grace period, with "." otherwise for all three
    indications. So for CPU 10, you would normally see "10-OoN:" indicating
    that everything believes that the CPU is online.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit loosens rcutree.rcu_fanout_leaf range checks
    and replaces a panic() with a fallback to compile-time values.
    This fallback is accompanied by a WARN_ON(), and both occur when the
    rcutree.rcu_fanout_leaf value is too small to accommodate the number of
    CPUs. For example, given the current four-level limit for the rcu_node
    tree, a system with more than 16 CPUs built with CONFIG_RCU_FANOUT=2 must
    have rcutree.rcu_fanout_leaf larger than 2.
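
    For reference, a quick back-of-the-envelope check of that capacity
    limit (ordinary userspace C with illustrative values):

    #include <stdio.h>

    int main(void)
    {
            int rcu_fanout = 2;        /* CONFIG_RCU_FANOUT */
            int rcu_fanout_leaf = 2;   /* rcutree.rcu_fanout_leaf */
            int levels = 4;            /* current rcu_node tree depth limit */
            long capacity = rcu_fanout_leaf;

            for (int i = 1; i < levels; i++)
                    capacity *= rcu_fanout;
            printf("max CPUs = %ld\n", capacity);   /* prints 16 */
            return 0;
    }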

    Reported-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Because preempt_disable() maps to barrier() for non-debug builds,
    it forces the compiler to spill and reload registers. Because Tree
    RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these
    barrier() instances generate needless extra code for each instance of
    rcu_read_lock() and rcu_read_unlock(). This extra code slows down Tree
    RCU and bloats Tiny RCU.

    This commit therefore removes the preempt_disable() and preempt_enable()
    from the non-preemptible implementations of __rcu_read_lock() and
    __rcu_read_unlock(), respectively. However, for debug purposes,
    preempt_disable() and preempt_enable() are still invoked if
    CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside
    atomic sections in non-preemptible kernels.

    However, Tiny and Tree RCU operate by coalescing all RCU read-side
    critical sections on a given CPU that lie between successive quiescent
    states. It is therefore necessary to compensate for removing barriers
    from __rcu_read_lock() and __rcu_read_unlock() by adding them to a
    couple of the RCU functions invoked during quiescent states, namely to
    rcu_all_qs() and rcu_note_context_switch(). However, note that the latter
    is more paranoia than necessity, at least until link-time optimizations
    become more aggressive.

    This is based on an earlier patch by Paul E. McKenney, fixing
    a bug encountered in kernels built with CONFIG_PREEMPT=n and
    CONFIG_PREEMPT_COUNT=y.
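
    For the non-preemptible flavors the result is roughly as follows
    (a sketch of the include/linux/rcupdate.h definitions; debug-only
    preemption accounting remains under CONFIG_PREEMPT_COUNT):

    static inline void __rcu_read_lock(void)
    {
            if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
                    preempt_disable();      /* keeps sleep-in-atomic checks */
    }

    static inline void __rcu_read_unlock(void)
    {
            if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
                    preempt_enable();
    }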

    Signed-off-by: Boqun Feng
    Signed-off-by: Paul E. McKenney

    Boqun Feng
     
  • We have had the call_rcu_func_t typedef for quite a while, but we still
    use explicit function pointer types in some places. These types can
    confuse cscope and can be hard to read. This patch therefore replaces
    these types with the call_rcu_func_t typedef.
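
    The typedefs in question, and the kind of spelled-out pointer type they
    replace (the variable names here are illustrative, and the before/after
    lines are not meant as a single compilation unit):

    typedef void (*rcu_callback_t)(struct rcu_head *head);
    typedef void (*call_rcu_func_t)(struct rcu_head *head, rcu_callback_t func);

    /* Before: */
    void (*crf)(struct rcu_head *, void (*)(struct rcu_head *)) = call_rcu;
    /* After: */
    call_rcu_func_t crf = call_rcu;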

    Signed-off-by: Boqun Feng
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Boqun Feng
     
  • Now that the rcu_callback_t typedef exists as the type of RCU callbacks,
    it should be used in call_rcu*() and friends as the parameter type. This
    saves a few lines of code and makes it clear which functions require an
    RCU callback rather than some other kind of callback as their argument.

    In addition, this helps cscope to generate a better database for
    code reading.
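
    Concretely, the prototypes change along these lines (using the
    rcu_callback_t typedef shown in the previous entry's sketch):

    /* Before: */
    void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *head));
    /* After: */
    void call_rcu(struct rcu_head *head, rcu_callback_t func);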

    Signed-off-by: Boqun Feng
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Boqun Feng
     

21 Sep, 2015

9 commits

  • This commit converts the rcu_data structure's ->cpu_no_qs field
    to a union. The bytewise side of this union allows individual access
    to indications as to whether this CPU needs to find a quiescent state
    for a normal (.norm) and/or expedited (.exp) grace period. The setwise
    side of the union allows testing whether or not a quiescent state is
    needed at all, for either type of grace period.

    For now, only .norm is used. A later commit will introduce the expedited
    usage.
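
    A sketch of such a union (field names follow the commit description;
    the exact layout in the patch may differ):

    union rcu_noqs {
            struct {
                    u8 norm;        /* QS still needed for the normal GP? */
                    u8 exp;         /* QS still needed for the expedited GP? */
            } b;                    /* bytewise access */
            u16 s;                  /* setwise access: any QS needed at all? */
    };

    /* Used in the rcu_data structure as:  union rcu_noqs cpu_no_qs; */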

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit inverts the sense of the rcu_data structure's ->passed_quiesce
    field and renames it to ->cpu_no_qs. This will allow a later commit to
    use an "aggregate OR" operation to test expedited as well as normal grace
    periods without added overhead.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • An upcoming commit needs to invert the sense of the ->passed_quiesce
    rcu_data structure field, so this commit takes the opportunity
    to clarify things a bit by renaming ->qs_pending to ->core_needs_qs.

    So if !rdp->core_needs_qs, then this CPU need not concern itself with
    quiescent states; in particular, it need not acquire its leaf rcu_node
    structure's ->lock to check. Otherwise, it needs to report the next
    quiescent state.
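
    In other words, the quiescent-state checking path can begin with a
    cheap early exit (schematic):

    if (!rdp->core_needs_qs)
            return;         /* no QS needed; no need to take the leaf ->lock */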

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, synchronize_sched_expedited() uses a single global counter
    to track the number of remaining context switches that the current
    expedited grace period must wait on. This is problematic on large
    systems, where the resulting memory contention can be pathological.
    This commit therefore makes synchronize_sched_expedited() instead use
    the combining tree in the same manner as synchronize_rcu_expedited(),
    keeping memory contention down to a dull roar.

    This commit creates a temporary function sync_sched_exp_select_cpus()
    that is very similar to sync_rcu_exp_select_cpus(). A later commit
    will consolidate these two functions, which becomes possible when
    synchronize_sched_expedited() switches from stop_one_cpu_nowait() to
    smp_call_function_single().

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The current preemptible-RCU expedited grace-period algorithm invokes
    synchronize_sched_expedited() to enqueue all tasks currently running
    in a preemptible-RCU read-side critical section, then waits for all the
    ->blkd_tasks lists to drain. This works, but results in both an IPI and
    a double context switch even on CPUs that do not happen to be running
    in a preemptible RCU read-side critical section.

    This commit implements a new algorithm that causes less OS jitter.
    This new algorithm IPIs all online CPUs that are not idle (from an
    RCU perspective), but refrains from self-IPIs. If a CPU receiving
    this IPI is not in a preemptible RCU read-side critical section (or
    is just now exiting one), it pushes quiescence up the rcu_node tree;
    otherwise, it sets a flag that will be handled by the upcoming outermost
    rcu_read_unlock(), which will then push quiescence up the tree.

    The expedited grace period must of course wait on any pre-existing blocked
    readers, and newly blocked readers must be queued carefully based on
    the state of both the normal and the expedited grace periods. This
    new queueing approach also avoids the need to update boost state,
    courtesy of the fact that blocked tasks are no longer ever migrated to
    the root rcu_node structure.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit replaces sync_rcu_preempt_exp_init1() and
    sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug()
    and sync_exp_reset_tree(), which will also be used by
    synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which
    contains code specific to synchronize_rcu_expedited().

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This is a nearly pure code-movement commit, moving rcu_report_exp_rnp(),
    sync_rcu_preempt_exp_done(), and rcu_preempted_readers_exp() so
    that later commits can make synchronize_sched_expedited() use them.
    The non-code-movement portion of this commit tags rcu_report_exp_rnp()
    as __maybe_unused to avoid build errors when CONFIG_PREEMPT=n.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Now that there is an ->expedited_wq waitqueue in each rcu_state structure,
    there is no need for the sync_rcu_preempt_exp_wq global variable. This
    commit therefore substitutes ->expedited_wq for sync_rcu_preempt_exp_wq.
    It also initializes ->expedited_wq only once at boot instead of at the
    start of each expedited grace period.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • In kernels built with CONFIG_PREEMPT=y, synchronize_rcu_expedited()
    invokes synchronize_sched_expedited() while holding RCU-preempt's
    root rcu_node structure's ->exp_funnel_mutex, which is acquired after
    the rcu_data structure's ->exp_funnel_mutex. The first thing that
    synchronize_sched_expedited() will do is acquire RCU-sched's rcu_data
    structure's ->exp_funnel_mutex. There is no danger of an actual deadlock
    because the locking order is always from RCU-preempt's expedited mutexes
    to those of RCU-sched. Unfortunately, lockdep considers both rcu_data
    structures' ->exp_funnel_mutex to be in the same lock class and therefore
    reports a deadlock cycle.

    This commit silences this false positive by placing RCU-sched's rcu_data
    structures' ->exp_funnel_mutex locks into their own lock class.
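
    The usual way to do this (a sketch; the key and name below are
    illustrative) is to give those mutexes their own lock_class_key at
    initialization time:

    static struct lock_class_key rcu_exp_sched_rdp_class;

    /* During per-CPU initialization, for the RCU-sched flavor only: */
    if (rsp == &rcu_sched_state)
            lockdep_set_class_and_name(&rdp->exp_funnel_mutex,
                                       &rcu_exp_sched_rdp_class,
                                       "rcu_data_exp_sched");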

    Reported-by: Sasha Levin
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

04 Aug, 2015

4 commits