30 Jun, 2020

1 commit

  • If there is a large number of torture tests running concurrently,
    all of which are dumping large ftrace buffers at shutdown time, the
    resulting dumping can take a very long time, particularly on systems
    with rotating-rust storage. This commit therefore adds a default-off
    torture.ftrace_dump_at_shutdown module parameter that enables
    shutdown-time ftrace-buffer dumping.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

31 Mar, 2020

1 commit

  • Pull core SMP updates from Thomas Gleixner:
    "CPU (hotplug) updates:

    - Support for locked CSD objects in smp_call_function_single_async()
    which allows simplifying callsites in the scheduler core and MIPS

    - Treewide consolidation of CPU hotplug functions which ensures the
    consistency between the sysfs interface and kernel state. The low
    level functions cpu_up/down() are now confined to the core code and
    no longer accessible from random code"

    * tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
    cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()
    cpu/hotplug: Hide cpu_up/down()
    cpu/hotplug: Move bringup of secondary CPUs out of smp_init()
    torture: Replace cpu_up/down() with add/remove_cpu()
    firmware: psci: Replace cpu_up/down() with add/remove_cpu()
    xen/cpuhotplug: Replace cpu_up/down() with device_online/offline()
    parisc: Replace cpu_up/down() with add/remove_cpu()
    sparc: Replace cpu_up/down() with add/remove_cpu()
    powerpc: Replace cpu_up/down() with add/remove_cpu()
    x86/smp: Replace cpu_up/down() with add/remove_cpu()
    arm64: hibernate: Use bringup_hibernate_cpu()
    cpu/hotplug: Provide bringup_hibernate_cpu()
    arm64: Use reboot_cpu instead of hardconding it to 0
    arm64: Don't use disable_nonboot_cpus()
    ARM: Use reboot_cpu instead of hardcoding it to 0
    ARM: Don't use disable_nonboot_cpus()
    ia64: Replace cpu_down() with smp_shutdown_nonboot_cpus()
    cpu/hotplug: Create a new function to shutdown nonboot cpus
    cpu/hotplug: Add new {add,remove}_cpu() functions
    sched/core: Remove rq.hrtick_csd_pending
    ...

    Linus Torvalds
     

25 Mar, 2020

1 commit

  • The core device API performs extra housekeeping bits that are missing
    from directly calling cpu_up/down().

    See commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and
    serialization during LPM") for an example description of what might go
    wrong.

    This also prepares to make cpu_up/down() a private interface of the CPU
    subsystem.

    Signed-off-by: Qais Yousef
    Signed-off-by: Thomas Gleixner
    Acked-by: "Paul E. McKenney"
    Link: https://lkml.kernel.org/r/20200323135110.30522-16-qais.yousef@arm.com

    Qais Yousef
     

21 Feb, 2020

2 commits

  • In theory, RCU-hotplug operations are supposed to work as soon as there
    is more than one CPU online. However, in practice, in normal production
    there is no way to make them happen until userspace is up and running.
    Besides which, on smaller systems, rcutorture doesn't start doing hotplug
    operations until 30 seconds after the start of boot, which on most
    systems also means the better part of 30 seconds after the end of boot.
    This commit therefore provides a new torture.disable_onoff_at_boot kernel
    boot parameter that suppresses CPU-hotplug torture operations until
    about the time that init is spawned.

    Of course, if you know of a need for boottime CPU-hotplug operations,
    then you should avoid passing this argument to any of the torture tests.
    You might also want to look at the splats linked to below.

    Link: https://lore.kernel.org/lkml/20191206185208.GA25636@paulmck-ThinkPad-P72/
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • During boot, CPU hotplug is often disabled, for example by PCI probing.
    On large systems that take substantial time to boot, this can result
    in spurious RCU_HOTPLUG errors. This commit therefore forgives any
    boottime -EBUSY CPU-hotplug failures by adjusting counters to pretend
    that the corresponding attempt never happened. A non-splat record
    of the failed attempt is emitted to the console with the added string
    "(-EBUSY forgiven during boot)".

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
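
The counter adjustment described in the -EBUSY commit above can be illustrated with a small userspace sketch. This is not the kernel code: the `onoff_stats` structure, the `record_attempt()` function, and the `booting` flag are hypothetical names chosen for illustration, and the real implementation keeps more counters than shown here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical statistics block, assumed for illustration only. */
struct onoff_stats {
	long attempts;
	long successes;
};

/*
 * Record the outcome of one CPU-hotplug attempt.  A -EBUSY failure that
 * happens during boot is "forgiven" by rolling the attempt counter back,
 * so the statistics look as if the attempt never happened.
 * Returns true if the failure should be reported as an error.
 */
static bool record_attempt(struct onoff_stats *s, int ret, bool booting)
{
	s->attempts++;
	if (ret == 0) {
		s->successes++;
		return false;
	}
	if (ret == -EBUSY && booting) {
		s->attempts--;	/* pretend the attempt never happened */
		return false;
	}
	return true;
}
```

Calling `record_attempt(&s, -EBUSY, true)` leaves both counters unchanged, whereas the same failure after boot counts as a failed attempt.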
     

02 Aug, 2019

1 commit

  • The functions torture_onoff_cleanup() and torture_shuffle_cleanup()
    are declared static and marked EXPORT_SYMBOL_GPL(), which is at best an
    odd combination. Because these functions are not used outside of the
    kernel/torture.c file they are defined in, this commit removes their
    EXPORT_SYMBOL_GPL() marking.

    Fixes: cc47ae083026 ("rcutorture: Abstract torture-test cleanup")
    Signed-off-by: Denis Efremov
    Signed-off-by: Paul E. McKenney

    Denis Efremov
     

29 May, 2019

2 commits

  • Currently, the inter-stutter interval is the same as the stutter duration,
    that is, whatever number of jiffies is passed into torture_stutter_init().
    This has worked well for quite some time, but the addition of
    forward-progress testing to rcutorture can delay processes for several
    seconds, which can triple the time that they are stuttered.

    This commit therefore adds a second argument to torture_stutter_init()
    that specifies the inter-stutter interval. While locktorture preserves
    the current behavior, rcutorture uses the RCU CPU stall warning interval
    to provide a wider inter-stutter interval.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The stutter_wait() function is supposed to return true if it actually
    waits and false otherwise, but it instead unconditionally returns false.
    This hides a bug in rcu_torture_writer(), which fails to account for
    the fact that one of the rcu_tortures[] array elements will normally be
    referenced by rcu_torture_current, and thus not be on the freelist.

    This commit therefore corrects the stutter_wait() return value and adds a
    check for rcu_torture_current to rcu_torture_writer()'s check that things
    get freed after everything goes quiescent. In addition, this commit
    causes torture_stutter() to give a bit more than one second (instead of
    only one jiffy) warning of the end of the stutter interval. Finally,
    this commit disables long-delay readers and aggressive update-side
    forward-progress checks while forward-progress testing is in flight.

    Reported-by: Sebastian Andrzej Siewior
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
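
The stutter_wait() return-value bug described above follows a common pattern, sketched here in userspace C. The functions below are hypothetical stand-ins, not the kernel's stutter_wait(): the `pause_ticks` counter is an assumption that replaces the kernel's stutter state, and decrementing it stands in for sleeping.

```c
#include <stdbool.h>

/* Buggy form: waits while paused but always reports "did not wait". */
static bool stutter_wait_buggy(int *pause_ticks)
{
	while (*pause_ticks > 0)
		(*pause_ticks)--;	/* stand-in for sleeping one tick */
	return false;
}

/* Fixed form: remembers whether the loop body ever ran. */
static bool stutter_wait_fixed(int *pause_ticks)
{
	bool waited = false;

	while (*pause_ticks > 0) {
		(*pause_ticks)--;	/* stand-in for sleeping one tick */
		waited = true;
	}
	return waited;
}
```

A caller that runs a post-wait check only when the function returns true never runs that check with the buggy form, which is how a latent bug in the check itself can stay hidden.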
     

27 Mar, 2019

1 commit

  • If there is only one online CPU, it doesn't make sense to try to offline
    it, as any such attempt is guaranteed to fail. This commit therefore
    checks for this condition and refuses to attempt the nonsensical.

    Reported-by: Su Yue
    Signed-off-by: Paul E. McKenney
    Tested-By: Su Yue

    Paul E. McKenney
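
The guard described above amounts to a single comparison. A minimal sketch, assuming a hypothetical helper name and a caller that already knows the number of online CPUs:

```c
#include <stdbool.h>

/*
 * Hypothetical helper: an offline attempt is worth making only if at
 * least one other CPU would remain online afterwards.
 */
static bool offline_attempt_makes_sense(unsigned int online_cpus)
{
	return online_cpus > 1;
}
```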
     

10 Feb, 2019

2 commits


26 Jan, 2019

1 commit

  • Beyond a certain point in the CPU-hotplug offline process, timers get
    stranded on the outgoing CPU, and won't fire until that CPU comes back
    online, which might well be never. This commit therefore adds a hook
    in torture_onoff_init() that is invoked from torture_offline(), which
    rcutorture uses to occasionally wait for a grace period. This should
    result in failures for RCU implementations that rely on stranded timers
    eventually firing in the absence of the CPU coming back online.

    Reported-by: Sebastian Andrzej Siewior
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

02 Dec, 2018

2 commits

  • Remove return variables (declared as "ret") in cases where,
    depending on whether a condition evaluates as true, the result of a
    function call can be immediately returned instead of storing the result in
    the return variable. When the condition evaluates as false, the constant
    initially stored in the return variable at declaration is returned instead.

    Signed-off-by: Pierce Griffiths
    Signed-off-by: Paul E. McKenney

    Pierce Griffiths
     
  • Currently, the torture scripts rely on the initrd/init script to bring
    any extra CPUs online, for example, in the case where the kernel and
    qemu have different ideas about how many CPUs are present. This works,
    but is an unnecessary dependency on initrd, which needs to vary depending
    on the distro. This commit therefore causes torture_onoff() to check
    for additional CPUs, attempting to bring any found online. Errors are
    ignored, just as they are by the initrd/init script.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
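
The return-variable cleanup described in the first commit under this date can be sketched as a before-and-after pair. The function names and the constant -1 are hypothetical; the sketch only shows the shape of the transformation, not any specific function touched by that commit.

```c
/* Hypothetical helper standing in for any call whose result is returned. */
static int do_work(int x)
{
	return x * 2;
}

/* Before: the result is staged in a "ret" variable. */
static int lookup_before(int cond, int x)
{
	int ret = -1;

	if (cond)
		ret = do_work(x);
	return ret;
}

/* After: the call result is returned directly, the constant otherwise. */
static int lookup_after(int cond, int x)
{
	if (cond)
		return do_work(x);
	return -1;
}
```

Both forms return the same value for every input; the second simply removes a variable that carried no extra information.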
     

30 Aug, 2018

1 commit

  • The rcu_torture_writer() function invokes stutter_wait() at the end of
    each writer pass, which occasionally blocks for an extended time period
    in order to ensure that RCU can handle intermittent loads. But part of
    handling a busy period is invoking all the callbacks before the end of
    the idle period induced by stutter_wait().

    This commit therefore adds a return value to stutter_wait() indicating
    whether stutter_wait() actually waited. In addition, this commit causes
    rcu_torture_writer() to test this value and if set, checks that all the
    elements of the rcu_tortures[] array have been freed up.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

26 Jun, 2018

2 commits

  • This commit adds "#define pr_fmt(fmt) fmt" to the torture-test files
    in order to keep the current dmesg format. Once Joe's commits have
    hit mainline, these definitions will be changed in order to automatically
    generate the dmesg line prefix that the scripts expect. This will have
    the beneficial side-effect of allowing printk() formats to be used more
    widely and of shortening some pr_*() lines.

    Signed-off-by: Paul E. McKenney
    Cc: Joe Perches

    Paul E. McKenney
     
  • Some bugs reproduce quickly only at high CPU-hotplug rates, so the
    rcutorture TREE03 scenario now has only 200 milliseconds spacing between
    CPU-hotplug operations. At this rate, the torture-test pair of console
    messages per operation becomes a bit voluminous. This commit therefore
    converts the torture-test set of "verbose" kernel-boot arguments from
    bool to int, and prints the extra console messages only when verbose=2.
    The default is still verbose=1.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
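
The effect of turning "verbose" from a bool into an int can be sketched as two threshold checks. The helper names are hypothetical, and treating every value above 1 like verbose=2 is an assumption, since the commit message only mentions the values 1 and 2.

```c
#include <stdbool.h>

/* Ordinary torture-test messages: printed at the default verbose=1. */
static bool print_general_msgs(int verbose)
{
	return verbose >= 1;
}

/* Per-operation CPU-hotplug messages: printed only at verbose=2
 * (assumed here to also cover any larger value). */
static bool print_hotplug_msgs(int verbose)
{
	return verbose >= 2;
}
```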
     

16 May, 2018

1 commit

  • Commit e31d28b6ab8f ("trace: Eliminate cond_resched_rcu_qs() in favor
    of cond_resched()") substituted cond_resched() for the earlier call
    to cond_resched_rcu_qs(). However, the new-age cond_resched() does
    not do anything to help RCU-tasks grace periods because (1) RCU-tasks
    is only enabled when CONFIG_PREEMPT=y and (2) cond_resched() is a
    complete no-op when preemption is enabled. This situation results
    in hangs when running the trace benchmarks.

    A number of potential fixes were discussed on LKML
    (https://lkml.kernel.org/r/20180224151240.0d63a059@vmware.local.home),
    including making cond_resched() not be a no-op; making cond_resched()
    not be a no-op, but only when running tracing benchmarks; reverting
    the aforementioned commit (which works because cond_resched_rcu_qs()
    does provide an RCU-tasks quiescent state); and adding a call to the
    scheduler/RCU rcu_note_voluntary_context_switch() function. All were
    deemed unsatisfactory, either due to added cond_resched() overhead or
    due to magic functions inviting cargo culting.

    This commit renames cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs(),
    which provides a clear hint as to what this function is doing and
    why and where it should be used, and then replaces the call to
    cond_resched() with cond_resched_tasks_rcu_qs() in the trace benchmark's
    benchmark_event_kthread() function.

    Reported-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Tested-by: Nicholas Piggin

    Paul E. McKenney
     

12 Dec, 2017

3 commits

  • Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The purpose of torture_runnable is to allow rcutorture and locktorture
    to be started and stopped via sysfs when they are built into the kernel
    (as in not compiled as loadable modules). However, the 0444 permissions
    for both instances of torture_runnable prevent this use case from ever
    being put into practice. Given that there have been no complaints
    about this deficiency, it is reasonable to conclude that no one actually
    makes use of this sysfs capability. The perf_runnable module parameter
    for rcuperf is in the same situation.

    This commit therefore removes both torture_runnable instances as well
    as perf_runnable.

    Reported-by: Thomas Gleixner
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The stutter_wait() function repeatedly fetched stutter_pause_test, and
    should really just fetch it once on each pass. The races should be
    harmless, but why have the races? Also, the whole point of the value
    "2" for stutter_pause_test is to get everyone to start at very nearly
    the same time, but the value "2" was the first jiffy of the stutter
    rather than the last jiffy of the stutter.

    This commit rearranges the code to be more sensible.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

29 Nov, 2017

1 commit


26 Jul, 2017

1 commit

  • The torture status line contains a series of values preceded by "onoff:".
    The last value in that line, the one preceding the "HZ=" string, is
    always zero. The reason that it is always zero is that torture_offline()
    was incrementing the sum_offl pointer instead of the value that this
    pointer referenced. This commit therefore makes this increment operate
    on the statistic rather than the pointer to the statistic.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
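
The class of bug described above is easy to reproduce outside the kernel. The following is a minimal userspace sketch, not the kernel code: only the `sum_offl` name comes from the commit message, while the function names and the plain increment are assumptions made for illustration.

```c
/*
 * Buggy form: advances the local copy of the pointer.  The statistic
 * that the caller passed in is never modified, so it stays at zero.
 */
static void count_buggy(long *sum_offl)
{
	sum_offl++;		/* increments the pointer, not *sum_offl */
	(void)sum_offl;		/* the new pointer value is simply dropped */
}

/* Fixed form: increments the statistic that the pointer references. */
static void count_fixed(long *sum_offl)
{
	(*sum_offl)++;
}
```

Because both forms compile without error, the only visible symptom of the buggy one is a statistic that never changes.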
     

02 Mar, 2017

1 commit


28 Feb, 2017

1 commit

  • Fix typos and add the following to the scripts/spelling.txt:

    varible||variable

    While we are here, tidy up the comment blocks that fit in a single line
    for drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c and
    net/sctp/transport.c.

    Link: http://lkml.kernel.org/r/1481573103-11329-11-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     

23 Aug, 2016

1 commit

  • Upcoming changes to the timer wheel introduce significant inaccuracy
    and possibly also an ultimate limit on timeout duration. This is a
    problem for the current implementation of torture_shutdown() because
    (1) shutdown times are user-specified, and can therefore be quite long,
    and (2) the torture scripting will kill a test instance that runs for
    more than a few minutes longer than scheduled. This commit therefore
    converts the torture_shutdown() timed waits to an hrtimer, thus avoiding
    too-short torture test runs as well as death by scripting.

    Signed-off-by: Paul E. McKenney
    Acked-by: Arnd Bergmann

    Paul E. McKenney
     

15 Jun, 2016

2 commits


22 Apr, 2016

1 commit

  • When running from the scripts, rcutorture is completely headless,
    so there is no way to manually dump the trace buffer. This commit
    therefore unconditionally dumps the trace buffer upon timed shutdown.
    However, if you are using rmmod to end the test, it is still up to you
    to manually dump the trace buffer.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

01 Apr, 2016

1 commit


07 Oct, 2015

1 commit


28 May, 2015

1 commit


17 Sep, 2014

1 commit

  • When performing module cleanups by calling torture_cleanup() the
    'torture_type' string is nullified. However, callers are not necessarily
    done, and might still need to reference the variable. This impacts
    both rcutorture and locktorture, causing output such as:

    [ 94.226618] (null)-torture: Stopping lock_torture_writer task
    [ 94.226624] (null)-torture: Stopping lock_torture_stats task

    Thus delay this operation until the very end of the cleanup process.
    The consequence (which shouldn't matter for this kind of program) is,
    of course, that we delay the window between rmmod and modprobing,
    for instance in module_torture_begin().

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Paul E. McKenney

    Davidlohr Bueso
     

08 Sep, 2014

1 commit

  • Use pr_alert/pr_cont to print the logs from the rcutorture module directly
    instead of writing them to a buffer and then printing it. This avoids
    having to allocate such buffers. Also remove a resulting empty function.

    I tested this using the parse-torture.sh script as follows:

    $ dmesg | grep torture > log.txt
    $ bash parse-torture.sh log.txt test
    $

    There were no warnings which means that parsing went fine.

    Signed-off-by: Joe Perches
    Signed-off-by: Pranith Kumar
    Signed-off-by: Paul E. McKenney

    Joe Perches
     

08 Jul, 2014

1 commit

  • Since the torture-test thread creation interface does not include
    format string arguments, this commit makes sure the name can never be
    accidentally processed as a format string.

    Signed-off-by: Kees Cook
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Kees Cook
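
The hazard addressed above is the usual one for printf-style interfaces: a caller-supplied string passed in the format position has its '%' sequences interpreted. A minimal userspace sketch of the safe pattern follows, using snprintf() as a stand-in for the kernel's thread-creation interface; the function name `set_thread_name()` is hypothetical.

```c
#include <stdio.h>

/*
 * Copy name into buf verbatim.  Passing a fixed "%s" format means that
 * any '%' characters inside name are treated as data, not as
 * conversion specifiers.  Returns the snprintf() result, that is, the
 * length of name when it fits in buf.
 */
static int set_thread_name(char *buf, size_t len, const char *name)
{
	return snprintf(buf, len, "%s", name);
}
```

The unsafe variant would be `snprintf(buf, len, name)`, which would try to consume arguments that were never supplied if name contained something like "%d".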
     

15 May, 2014

5 commits

  • Loading rcutorture as a module (as opposed to building it directly into
    the kernel) results in the following splat:

    [Wed Apr 16 15:29:33 2014] BUG: unable to handle kernel paging request at ffffffffa0003000
    [Wed Apr 16 15:29:33 2014] IP: [] 0xffffffffa0003000
    [Wed Apr 16 15:29:33 2014] PGD 1c0f067 PUD 1c10063 PMD 378a6067 PTE 0
    [Wed Apr 16 15:29:33 2014] Oops: 0010 [#1] SMP
    [Wed Apr 16 15:29:33 2014] Modules linked in: rcutorture(+) torture
    [Wed Apr 16 15:29:33 2014] CPU: 0 PID: 4257 Comm: modprobe Not tainted 3.15.0-rc1 #10
    [Wed Apr 16 15:29:33 2014] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006
    [Wed Apr 16 15:29:33 2014] task: ffff8800db1e88d0 ti: ffff8800db25c000 task.ti: ffff8800db25c000
    [Wed Apr 16 15:29:33 2014] RIP: 0010:[] [] 0xffffffffa0003000
    [Wed Apr 16 15:29:33 2014] RSP: 0018:ffff8800db25dca0 EFLAGS: 00010282
    [Wed Apr 16 15:29:33 2014] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
    [Wed Apr 16 15:29:33 2014] RDX: ffffffffa00090a8 RSI: 0000000000000001 RDI: ffffffffa0008337
    [Wed Apr 16 15:29:33 2014] RBP: ffff8800db25dd50 R08: 0000000000000000 R09: 0000000000000000
    [Wed Apr 16 15:29:33 2014] R10: ffffea000357b680 R11: ffffffff8113257a R12: ffffffffa000d000
    [Wed Apr 16 15:29:33 2014] R13: ffffffffa00094c0 R14: ffffffffa0009510 R15: 0000000000000001
    [Wed Apr 16 15:29:33 2014] FS: 00007fee30ce5700(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
    [Wed Apr 16 15:29:33 2014] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [Wed Apr 16 15:29:33 2014] CR2: ffffffffa0003000 CR3: 00000000d5eb1000 CR4: 00000000000006f0
    [Wed Apr 16 15:29:33 2014] Stack:
    [Wed Apr 16 15:29:33 2014] ffffffffa000d02c 0000000000000000 ffff88021700d400 0000000000000000
    [Wed Apr 16 15:29:33 2014] ffff8800db25dd40 ffffffff81647951 ffff8802162bd000 ffff88021541846c
    [Wed Apr 16 15:29:33 2014] 0000000000000000 ffffffff817dbe2d ffffffff817dbe2d 0000000000000001
    [Wed Apr 16 15:29:33 2014] Call Trace:
    [Wed Apr 16 15:29:33 2014] [] ? rcu_torture_init+0x2c/0x8b4 [rcutorture]
    [Wed Apr 16 15:29:33 2014] [] ? netlink_broadcast_filtered+0x121/0x3a0
    [Wed Apr 16 15:29:33 2014] [] ? mutex_lock+0xd/0x2a
    [Wed Apr 16 15:29:33 2014] [] ? mutex_lock+0xd/0x2a
    [Wed Apr 16 15:29:33 2014] [] ? trace_module_notify+0x62/0x1d0
    [Wed Apr 16 15:29:33 2014] [] ? 0xffffffffa000cfff
    [Wed Apr 16 15:29:33 2014] [] do_one_initcall+0xfa/0x140
    [Wed Apr 16 15:29:33 2014] [] ? __blocking_notifier_call_chain+0x5e/0x80
    [Wed Apr 16 15:29:33 2014] [] load_module+0x1931/0x21b0
    [Wed Apr 16 15:29:33 2014] [] ? show_initstate+0x50/0x50
    [Wed Apr 16 15:29:33 2014] [] SyS_init_module+0x9e/0xc0
    [Wed Apr 16 15:29:33 2014] [] system_call_fastpath+0x16/0x1b
    [Wed Apr 16 15:29:33 2014] Code: Bad RIP value.
    [Wed Apr 16 15:29:33 2014] RIP [] 0xffffffffa0003000
    [Wed Apr 16 15:29:33 2014] RSP
    [Wed Apr 16 15:29:33 2014] CR2: ffffffffa0003000
    [Wed Apr 16 15:29:33 2014] ---[ end trace 3e88c173037af84b ]---

    This splat is due to the fact that torture_init_begin() and
    torture_init_end() are both marked with __init, despite their use
    at runtime. This commit therefore removes __init from both functions.

    Signed-off-by: Pranith Kumar
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Pranith Kumar
     
  • The torture tests are designed to run in isolation, but do not enforce
    this isolation. This commit therefore checks for concurrent torture
    tests, and refuses to start new tests while old tests are running.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • In the torture_shuffle_tasks() function, the check for whether an all-zero
    mask would be passed to set_cpus_allowed_ptr() is redundant after clearing the
    shuffle_idle_cpu bit. If the mask had more than one bit set, after
    clearing a bit it has at least one bit set. If the mask had only
    one bit set, a check is made at the beginning, where the function
    returns, as there is no need to shuffle only one cpu.

    Also, this code is executed inside a critical section, delimited by
    get_online_cpus() and put_online_cpus(), preventing CPUs from leaving between
    the check of num_online_cpus() and the calls to the set_cpus_allowed_ptr() function.

    Signed-off-by: Iulia Manda
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Iulia Manda
     
  • Currently, all stuttered kthreads block a jiffy at a time, which can
    result in them starting at different times. (Note: This is not an
    energy-efficiency problem unless you run torture tests in production,
    in which case you have other problems!) This commit increases the
    intensity of the restart event by causing kthreads to spin through the
    last jiffy, restarting when they see the variable change.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Currently, torture_kthread_stopping() prints only the name of the
    kthread that is stopping, which can be unedifying. This commit therefore
    adds "Stopping" to make things more evident.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney