09 Apr, 2019

1 commit


27 Mar, 2019

9 commits

  • This commit further consolidates the stall-warning code by moving
    print_cpu_stall_info() and its helper functions along with
    zero_cpu_stall_ticks() to kernel/rcu/tree_stall.h.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The print_cpu_stall_info_begin() and print_cpu_stall_info_end() functions
    each print a single character onto the console, and are a holdover from a
    time when RCU CPU stall warning messages could be abbreviated using a long-gone
    Kconfig option. This commit therefore adds these single characters to
    already-printed strings in the calling functions, and then eliminates
    both print_cpu_stall_info_begin() and print_cpu_stall_info_end().

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Because expedited CPU stall warnings are contained within the
    kernel/rcu/tree_exp.h file, rcu_print_task_exp_stall() should live
    there too. This commit carries out the required code motion.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The RCU CPU stall-warning code for normal grace periods is currently
    scattered across two files, due to earlier Tiny RCU support for RCU
    CPU stall warnings and for old Kconfig options that have long since
    been retired. Given that it is hard for the lead RCU maintainer to
    find relevant stall-warning code, it would be good to consolidate it.
    This commit continues this process by moving stall-warning code from
    kernel/rcu/tree_plugin.c to a new kernel/rcu/tree_stall.h file.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The task_struct structure's ->rcu_read_unlock_special field is only ever
    read or written by the owning task, but it is accessed both at process
    and interrupt levels. It may therefore be accessed using plain reads
    and writes while interrupts are disabled, but must be accessed using
    READ_ONCE() and WRITE_ONCE() or better otherwise. This commit makes a
    few adjustments to align with this discipline.
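    As a rough userspace sketch of this access discipline (the union layout
    and the READ_ONCE()/WRITE_ONCE() definitions below are simplified
    stand-ins, not the kernel's actual code):

```c
#include <assert.h>
#include <stdint.h>

/* Standalone model of the kernel's union rcu_special; the real
 * definition lives in include/linux/sched.h. */
union rcu_special {
	struct {
		uint8_t blocked;
		uint8_t need_qs;
		uint8_t exp_hint;
		uint8_t deferred_qs;
	} b;
	uint32_t s;
};

/* Minimal stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(). */
#define READ_ONCE(x)     (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static union rcu_special rcu_read_unlock_special;

/* With interrupts disabled, plain accesses suffice: no interrupt
 * handler can race with the owning task. */
static void set_blocked_irqs_disabled(void)
{
	rcu_read_unlock_special.b.blocked = 1;
}

/* Otherwise an interrupt may intervene between the load and store of
 * a read-modify-write, so each access must be a single READ_ONCE()
 * or WRITE_ONCE() of the whole word. */
static int test_and_clear_special(void)
{
	union rcu_special special;

	special.s = READ_ONCE(rcu_read_unlock_special.s);
	if (!special.s)
		return 0;
	WRITE_ONCE(rcu_read_unlock_special.s, 0);
	return 1;
}
```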

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Because rcu_wake_cond() checks for a null task_struct pointer, there is
    no need for its callers to do so. This commit eliminates the redundant
    check.
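    A minimal model of the resulting contract (struct and condition are
    illustrative, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

struct task_struct { int woken; };

/* Model of rcu_wake_cond(): the NULL check lives in the callee, so
 * callers may pass a possibly-NULL kthread pointer without checking. */
static void rcu_wake_cond(struct task_struct *t, int status)
{
	(void)status;	/* real code also consults the kthread status */
	if (t)
		t->woken = 1;	/* stands in for wake_up_process(t) */
}
```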

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit prints a console message when cpulist_parse() reports a
    bad list of CPUs, and sets all CPUs' bits in that case. The reason for
    setting all CPUs' bits is that this is the safe(r) choice for real-time
    workloads, which would normally be the ones using the rcu_nocbs= kernel
    boot parameter. Either way, later RCU console log messages list the
    actual set of CPUs whose RCU callbacks will be offloaded.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, the rcu_nocbs= kernel boot parameter requires that a specific
    list of CPUs be specified, and has no way to say "all of them".
    As noted by user RavFX in a comment to Phoronix topic 1002538, this
    is an inconvenient side effect of the removal of the RCU_NOCB_CPU_ALL
    Kconfig option. This commit therefore enables the rcu_nocbs= kernel boot
    parameter to be given the string "all", as in "rcu_nocbs=all" to specify
    that all CPUs on the system are to have their RCU callbacks offloaded.

    Another approach would be to make cpulist_parse() check for "all", but
    there are uses of cpulist_parse() that do other checking, which could
    conflict with an "all". This commit therefore focuses on the specific
    use of cpulist_parse() in rcu_nocb_setup().
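    A toy userspace model of the resulting rcu_nocb_setup() logic, with a
    64-bit mask standing in for the kernel's cpumask and a tiny
    parse_cpulist() standing in for cpulist_parse() (all names and details
    here are illustrative):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Minimal stand-in for cpulist_parse(): accepts "N" and "N-M" tokens
 * separated by commas; returns nonzero on a bad list. */
static int parse_cpulist(const char *str, uint64_t *mask)
{
	*mask = 0;
	while (*str) {
		int lo, hi, n = 0;

		if (sscanf(str, "%d-%d%n", &lo, &hi, &n) != 2 || n == 0) {
			n = 0;
			if (sscanf(str, "%d%n", &lo, &n) != 1 || n == 0)
				return -1;	/* bad token */
			hi = lo;
		}
		if (lo < 0 || hi > 63 || lo > hi)
			return -1;
		for (int cpu = lo; cpu <= hi; cpu++)
			*mask |= 1ULL << cpu;
		str += n;
		if (*str == ',')
			str++;
		else if (*str)
			return -1;
	}
	return 0;
}

/* Model of rcu_nocb_setup(): "all" offloads every CPU, and so does a
 * bad list, the safe(r) choice for real-time workloads. */
static uint64_t rcu_nocb_setup(const char *str)
{
	uint64_t mask;

	if (!strcmp(str, "all")) {
		mask = ~0ULL;
	} else if (parse_cpulist(str, &mask)) {
		printf("rcu_nocbs= bad CPU list, offloading all CPUs\n");
		mask = ~0ULL;
	}
	return mask;
}
```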

    Just a note to other people who would like changes to Linux-kernel RCU:
    If you send your requests to me directly, they might get fixed somewhat
    faster. RavFX's comment was posted on January 22, 2018 and I first saw
    it on March 5, 2019. And the only reason that I found it -at- -all- was
    that I was looking for projects using RCU, and my search engine showed
    me that Phoronix comment quite by accident. Your choice, though! ;-)

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The purpose of exit_rcu() is to handle cases where buggy code causes a
    task to exit within an RCU read-side critical section. It currently
    does that in the case where said RCU read-side critical section was
    preempted at least once, but fails to handle cases where preemption did
    not occur. This case needs to be handled because otherwise the final
    context switch away from the exiting task will incorrectly behave as if
    task exit were instead a preemption of an RCU read-side critical section,
    and will therefore queue the exiting task. The exiting task will have
    exited, and thus won't ever execute rcu_read_unlock(), which means that
    it will remain queued forever, blocking all subsequent grace periods,
    and eventually resulting in OOM.

    Although this is arguably better than letting grace periods proceed
    and having a later rcu_read_unlock() access the now-freed task
    structure that once belonged to the exiting task, it would obviously
    be better to correctly handle this case. This commit therefore sets
    ->rcu_read_lock_nesting to 1 in that case, so that the subsequent call
    to __rcu_read_unlock() causes the exiting task to exit its dangling RCU
    read-side critical section.

    Note that deferred quiescent states need not be considered. The reason
    is that removing the task from the ->blkd_tasks[] list in the call to
    rcu_preempt_deferred_qs() handles the per-task component of any deferred
    quiescent state, and all other components of any deferred quiescent state
    are associated with the CPU, which isn't going anywhere until some later
    CPU-hotplug operation, which will report any remaining deferred quiescent
    states from within the rcu_report_dead() function.

    Note also that negative values of ->rcu_read_lock_nesting need not be
    considered. First, these won't show up in exit_rcu() unless there is
    a serious bug in RCU, and second, setting ->rcu_read_lock_nesting sets
    the state so that the RCU read-side critical section will be exited
    normally.

    Again, this code has no effect unless there has been some prior bug
    that prevents a task from leaving an RCU read-side critical section
    before exiting. Furthermore, there have been no reports of the bug
    fixed by this commit appearing in production. This commit is therefore
    absolutely -not- recommended for backporting to -stable.
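    The control flow of the fix can be sketched with a toy userspace model
    (the struct and helpers below model kernel state; names and the
    one-line "report" are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

struct task { int rcu_read_lock_nesting; bool blocked; bool exited_cleanly; };

/* Models __rcu_read_unlock(): reaching zero nesting reports the QS,
 * dequeues the task, and so on. */
static void __rcu_read_unlock(struct task *t)
{
	if (--t->rcu_read_lock_nesting == 0)
		t->exited_cleanly = true;
}

/* Models the fixed exit_rcu(): whether or not the dangling reader was
 * ever preempted, force nesting to 1 and exit the critical section. */
static void exit_rcu(struct task *t)
{
	if (t->blocked) {
		/* Previously handled: reader was preempted at least once. */
		t->rcu_read_lock_nesting = 1;
	} else if (t->rcu_read_lock_nesting) {
		/* The fix: never-preempted reader exiting mid-critical-section. */
		t->rcu_read_lock_nesting = 1;
	} else {
		return;	/* not in a critical section: nothing to do */
	}
	__rcu_read_unlock(t);
}
```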

    Reported-by: ABHISHEK DUBEY
    Reported-by: BHARATH Y MOURYA
    Reported-by: Aravinda Prasad
    Signed-off-by: Paul E. McKenney
    Tested-by: ABHISHEK DUBEY

    Paul E. McKenney
     

10 Feb, 2019

2 commits


26 Jan, 2019

11 commits

  • The name rcu_check_callbacks() arguably made sense back in the early
    2000s when RCU was quite a bit simpler than it is today, but it has
    become quite misleading, especially with the advent of dyntick-idle
    and NO_HZ_FULL. The rcu_check_callbacks() function is RCU's hook into
    the scheduling-clock interrupt, and is now but one of many ways that
    callbacks get promoted to invocable state.

    This commit therefore changes the name to rcu_sched_clock_irq(),
    which is the same number of characters and clearly indicates this
    function's relation to the rest of the Linux kernel. In addition, for
    the sake of consistency, rcu_flavor_check_callbacks() is also renamed
    to rcu_flavor_sched_clock_irq().

    While in the area, the header comments for both functions are reworked.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • consolidate.2019.01.26a: RCU flavor consolidation cleanups.
    fwd.2019.01.26a: RCU grace-period forward-progress fixes.

    Paul E. McKenney
     
  • This commit updates a few obsolete comments in the RCU callback-offload
    code.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Given that RCU has a perfectly good per-CPU rcu_data structure, most
    per-CPU quantities should be stored there.

    This commit therefore moves the rcu_cpu_has_work per-CPU variable to
    the rcu_data structure. This also makes this variable unconditionally
    present, which should be acceptable given the memory reduction due to the
    RCU flavor consolidation and also due to simplifications this will enable.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_cpu_kthread_loops variable used to provide debugfs information,
    but is no longer used. This commit therefore removes it.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Given that RCU has a perfectly good per-CPU rcu_data structure, most
    per-CPU quantities should be stored there.

    This commit therefore moves the rcu_cpu_kthread_status per-CPU variable
    to the rcu_data structure. This also makes this variable unconditionally
    present, which should be acceptable given the memory reduction due to the
    RCU flavor consolidation and also due to simplifications this will enable.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Given that RCU has a perfectly good per-CPU rcu_data structure, most
    per-CPU quantities should be stored there.

    This commit therefore moves the rcu_cpu_kthread_task per-CPU variable to
    the rcu_data structure. This also makes this variable unconditionally
    present, which should be acceptable given the memory reduction due to the
    RCU flavor consolidation and also due to simplifications this will enable.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Back when there were multiple flavors of RCU, it was necessary to
    separately count lazy and non-lazy callbacks for each CPU. These counts
    were used in CONFIG_RCU_FAST_NO_HZ kernels to determine how long a newly
    idle CPU should be allowed to sleep before handling its RCU callbacks.
    But now that there is only one flavor, the callback counts for a given
    CPU's sole rcu_data structure are the counts for that CPU.

    This commit therefore removes the rcu_data structure's ->nonlazy_posted
    and ->nonlazy_posted_snap fields, the rcu_idle_count_callbacks_posted()
    and rcu_cpu_has_callbacks() functions, repurposes the rcu_data structure's
    ->all_lazy field to record the laziness state at the beginning of the
    latest idle sojourn, and modifies CONFIG_RCU_FAST_NO_HZ RCU CPU stall
    warnings accordingly.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Now that rcu_blocking_is_gp() makes the correct immediate-return
    decision for both PREEMPT and !PREEMPT, a single implementation of
    synchronize_rcu() will work correctly under both configurations.
    This commit therefore eliminates a few lines of code by consolidating
    the two implementations of synchronize_rcu().

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The rcu_kthread_do_work() function has a single-line body and only one
    remaining caller. This commit therefore saves a few lines of code by
    inlining rcu_kthread_do_work() into its sole remaining caller.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Given RCU flavor consolidation, the name rcu_spawn_all_nocb_kthreads()
    is quite misleading. It no longer ever creates more than one kthread,
    and it does so only for the specified CPU. This commit therefore changes
    this name to the more descriptive rcu_spawn_cpu_nocb_kthread(), and also
    fixes up a similar issue in its header comment while in the area.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

02 Dec, 2018

3 commits

  • The RCU CPU stall warnings print an estimate of the total number of
    RCU callbacks queued in the system, but this estimate leaves out
    the callbacks queued for nocbs CPUs. This commit therefore introduces
    rcu_get_n_cbs_cpu(), which gives an accurate callback estimate for
    both nocbs and normal CPUs, and uses this new function as needed.

    This commit also introduces an rcu_get_n_cbs_nocb_cpu() helper function
    that returns the number of callbacks for nocbs CPUs or zero otherwise,
    and also uses this function in place of direct access to ->nocb_q_count
    while in the area (fewer characters, you see).
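    The split between the two helpers can be modeled in a few lines of
    standalone C (field names below are simplified stand-ins for the
    rcu_data structure's actual members):

```c
#include <assert.h>
#include <stdbool.h>

struct rcu_data {
	bool offloaded;		/* is this a nocbs CPU? */
	long n_cblist_cbs;	/* models rcu_segcblist_n_cbs(&rdp->cblist) */
	long nocb_q_count;	/* models the no-CBs callback queue count */
};

/* Models rcu_get_n_cbs_nocb_cpu(): nocbs count, or zero otherwise. */
static long rcu_get_n_cbs_nocb_cpu(struct rcu_data *rdp)
{
	return rdp->offloaded ? rdp->nocb_q_count : 0;
}

/* Models rcu_get_n_cbs_cpu(): accurate for nocbs and normal CPUs alike,
 * so stall warnings no longer undercount queued callbacks. */
static long rcu_get_n_cbs_cpu(struct rcu_data *rdp)
{
	if (!rdp->offloaded)
		return rdp->n_cblist_cbs;
	return rcu_get_n_cbs_nocb_cpu(rdp);
}
```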

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit affinitizes the forward-progress tests to avoid hogging a
    housekeeping CPU on the theory that the offloaded callbacks will be
    running on those housekeeping CPUs.

    Signed-off-by: Paul E. McKenney
    [ paulmck: Fix NULL-pointer issue located by kbuild test robot. ]
    Tested-by: Rong Chen

    Paul E. McKenney
     
  • …'fixes.2018.11.12a', 'initrd.2018.11.08b', 'sil.2018.11.12a' and 'srcu.2018.11.27a' into HEAD

    bug.2018.11.12a: Get rid of BUG_ON() and friends
    consolidate.2018.12.01a: Continued RCU flavor-consolidation cleanup
    doc.2018.11.12a: Documentation updates
    fixes.2018.11.12a: Miscellaneous fixes
    initrd.2018.11.08b: Automate creation of rcutorture initrd
    sil.2018.11.12a: Remove more spin_unlock_wait() calls

    Paul E. McKenney
     

13 Nov, 2018

4 commits

  • Subtracting INT_MIN can be interpreted as unconditional signed integer
    overflow, which according to the C standard is undefined behavior.
    Therefore, kernel build arguments notwithstanding, it would be good to
    future-proof the code. This commit therefore substitutes INT_MAX for
    INT_MIN in order to avoid undefined behavior.

    While in the neighborhood, this commit also creates some meaningful names
    for INT_MAX and friends in order to improve readability, as suggested
    by Joel Fernandes.
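    The point is that for a small positive nesting value n, "n - INT_MIN"
    overflows (INT_MIN has no positive counterpart in two's complement) and
    is undefined behavior in C, while "n - INT_MAX" stays within range and
    is fully defined. The constant names below are illustrative, in the
    spirit of the meaningful names the commit introduces:

```c
#include <assert.h>
#include <limits.h>

#define RCU_NEST_BIAS INT_MAX		/* bias marking "unlock in progress" */
#define RCU_NEST_PMAX (INT_MAX / 2)	/* sanity limit on nesting depth */

static int bias_nesting(int nesting)
{
	/* Defined for any nesting in [0, RCU_NEST_PMAX]; the INT_MIN
	 * version of this subtraction would be undefined behavior. */
	return nesting - RCU_NEST_BIAS;
}

static int unbias_nesting(int biased)
{
	/* Adding the bias back is likewise overflow-free here. */
	return biased + RCU_NEST_BIAS;
}
```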

    Reported-by: Ran Rozenstein
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Because __this_cpu_read() can be lighter weight than equivalent uses of
    this_cpu_ptr(), this commit replaces the latter with the former.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • In PREEMPT kernels, an expedited grace period might send an IPI to a
    CPU that is executing an RCU read-side critical section. In that case,
    it would be nice if the rcu_read_unlock() directly interacted with the
    RCU core code to immediately report the quiescent state. And this does
    happen in the case where the reader has been preempted. But it would
    also be a nice performance optimization if immediate reporting also
    happened in the preemption-free case.

    This commit therefore adds an ->exp_hint field to the task_struct structure's
    ->rcu_read_unlock_special field. The IPI handler sets this hint when
    it has interrupted an RCU read-side critical section, and this causes
    the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
    which, if preemption is enabled, reports the quiescent state immediately.
    If preemption is disabled, then the report is required to be deferred
    until preemption (or bottom halves or interrupts or whatever) is re-enabled.

    Because this is a hint, it does nothing for more complicated cases. For
    example, if the IPI interrupts an RCU reader, but interrupts are disabled
    across the rcu_read_unlock(), but another rcu_read_lock() is executed
    before interrupts are re-enabled, the hint will already have been cleared.
    If you do crazy things like this, reporting will be deferred until some
    later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.
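    The hint's happy path can be sketched as a toy state machine (all
    state below is a simplified stand-in for the kernel's; this models
    only the simple case, not the deferred ones described above):

```c
#include <assert.h>
#include <stdbool.h>

struct cpu_state {
	int  nesting;		/* rcu_read_lock() depth */
	bool exp_hint;		/* models ->rcu_read_unlock_special.b.exp_hint */
	bool preempt_enabled;
	bool qs_reported;
};

/* Models the expedited IPI handler: set the hint only when an RCU
 * read-side critical section was interrupted. */
static void exp_ipi_handler(struct cpu_state *c)
{
	if (c->nesting > 0)
		c->exp_hint = true;
}

static void rcu_read_lock(struct cpu_state *c)
{
	c->nesting++;
}

/* Models the outermost rcu_read_unlock(): consume the hint and, if
 * preemption is enabled, report the quiescent state immediately. */
static void rcu_read_unlock(struct cpu_state *c)
{
	if (--c->nesting == 0 && c->exp_hint) {
		c->exp_hint = false;
		if (c->preempt_enabled)
			c->qs_reported = true;
		/* else: deferred until preemption etc. is re-enabled */
	}
}
```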

    Reported-by: Joel Fernandes
    Signed-off-by: Paul E. McKenney
    Acked-by: Joel Fernandes (Google)

    Paul E. McKenney
     
  • The tree_plugin.h file has a number of calls to BUG_ON(), which panics
    the kernel, which is not a good strategy for devices (like embedded)
    that don't have a way to capture console output. This commit therefore
    converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE().
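    The difference in behavior can be illustrated with a userspace
    stand-in for WARN_ON_ONCE() (this is not the kernel macro, just a
    sketch of its complain-once-and-keep-running semantics):

```c
#include <assert.h>
#include <stdio.h>
#include <stdbool.h>

/* Unlike BUG_ON(), which panics, this warns on the first failure at
 * each call site and lets execution continue; the condition's value
 * is returned so it can be used as "if (WARN_ON_ONCE(...))". */
#define WARN_ON_ONCE(cond)						\
	({								\
		static bool __warned;					\
		bool __c = (cond);					\
		if (__c && !__warned) {					\
			__warned = true;				\
			fprintf(stderr, "WARNING: %s:%d: %s\n",		\
				__FILE__, __LINE__, #cond);		\
		}							\
		__c;							\
	})
```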

    Reported-by: Linus Torvalds
    Signed-off-by: Paul E. McKenney
    [ paulmck: Fix typo: s/rcuo/rcub/. ]

    Paul E. McKenney
     

31 Aug, 2018

10 commits

  • This commit moves ->dynticks from the rcu_dynticks structure to the
    rcu_data structure, replacing the field of the same name. It also updates
    the code to access ->dynticks from the rcu_data structure and to use the
    rcu_data structure directly rather than following the now-gone ->dynticks
    field into the now-gone rcu_dynticks structure. While in the area, this commit
    also fixes up comments.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit removes ->dynticks_nesting and ->dynticks_nmi_nesting from
    the rcu_dynticks structure and updates the code to access them from the
    rcu_data structure.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit removes ->rcu_need_heavy_qs and ->rcu_urgent_qs from the
    rcu_dynticks structure and updates the code to access them from the
    rcu_data structure.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit removes ->all_lazy, ->nonlazy_posted and ->nonlazy_posted_snap
    from the rcu_dynticks structure and updates the code to access them from
    the rcu_data structure.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit removes ->last_accelerate and ->last_advance_all from the
    rcu_dynticks structure and updates the code to access them from the
    rcu_data structure.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit removes ->tick_nohz_enabled_snap from the rcu_dynticks
    structure and updates the code to access it from the rcu_data
    structure.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The resched_cpu() interface is quite handy, but it does acquire the
    specified CPU's runqueue lock, which does not come for free. This
    commit therefore substitutes the following when directing resched_cpu()
    at the current CPU:

    set_tsk_need_resched(current);
    set_preempt_need_resched();

    Signed-off-by: Paul E. McKenney
    Cc: Peter Zijlstra

    Paul E. McKenney
     
  • Because nohz_full CPUs can leave the scheduler-clock interrupt disabled
    even when in kernel mode, RCU cannot rely on rcu_check_callbacks() to
    enlist the scheduler's aid in extracting a quiescent state from such CPUs.
    This commit therefore more aggressively uses resched_cpu() on nohz_full
    CPUs that fail to pass through a quiescent state in a timely manner.
    By default, the resched_cpu() beating starts 300 milliseconds into the
    quiescent state.

    While in the neighborhood, add a ->last_fqs_resched field to the rcu_data
    structure in order to rate-limit resched_cpu() calls from the RCU
    grace-period kthread.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The jiffies_till_sched_qs value is used to determine how old a grace period
    must be before RCU enlists the help of the scheduler to force a quiescent
    state on the holdout CPU. Currently, this defaults to HZ/10 regardless of
    system size and may be set only at boot time. This can be a problem for
    very large systems, because if the values of the jiffies_till_first_fqs
    and jiffies_till_next_fqs kernel parameters are left at their defaults,
    they are calculated to increase as the number of CPUs actually configured
    on the system increases. Thus, on a sufficiently large system, RCU would
    enlist the help of the scheduler before the grace-period kthread had a
    chance to scan for idle CPUs, which wastes CPU time.

    This commit therefore allows jiffies_till_sched_qs to be set, if desired,
    but if left as default, computes it as jiffies_till_first_fqs plus twice
    jiffies_till_next_fqs, thus allowing three force-quiescent-state scans
    for idle CPUs. This scales with the number of CPUs, providing sensible
    default values.
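    The new default computation reduces to one line of arithmetic; the
    sketch below models it (the function name mirrors the kernel's
    adjust_jiffies_till_sched_qs(), but the signature is illustrative):

```c
#include <assert.h>

/* If the user set jiffies_till_sched_qs at boot, honor it; otherwise
 * compute first_fqs + 2 * next_fqs, allowing three force-quiescent-state
 * scans for idle CPUs before the scheduler is enlisted. Because the
 * fqs values scale with the number of CPUs, so does this default. */
static unsigned long adjust_jiffies_till_sched_qs(long user_set,
						  unsigned long first_fqs,
						  unsigned long next_fqs)
{
	if (user_set >= 0)
		return (unsigned long)user_set;
	return first_fqs + 2 * next_fqs;
}
```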

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The ->rcu_qs_ctr counter was intended to allow providing a lightweight
    report of a quiescent state to all RCU flavors. But now that there is
    only one flavor of RCU in any one running kernel, there is no point in
    having this feature. This commit therefore removes the ->rcu_qs_ctr
    field from the rcu_dynticks structure and the ->rcu_qs_ctr_snap field
    from the rcu_data structure. This results in the "rqc" option to the
    rcu_fqs trace event no longer being used, so this commit also removes the
    "rqc" description from the header comment.

    While in the neighborhood, this commit also causes the forward-progress
    request .rcu_need_heavy_qs to be set one jiffies_till_sched_qs interval
    later in the grace period than the first setting of .rcu_urgent_qs.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney