09 Dec, 2020

1 commit

  • This new interface allows triggering a stopper on a given CPU and then
    waiting for the work to finish in a separate function,
    cpu_stop_work_wait().

    This differs from stop_one_cpu_nowait() in that it allows use of the
    cpu_stop completion mechanism (a usage sketch follows below).

    Bug: 161210528
    Change-Id: Ida51371e32897d008ece0639190fc21feabb0f28
    Signed-off-by: Vincent Donnefort

    Vincent Donnefort
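
    A rough usage sketch of the queue-then-wait pattern this enables. Only
    cpu_stop_work_wait() is named above; the async queueing helper and the
    exact argument are assumptions for illustration:

    #include <linux/stop_machine.h>

    static int my_stop_fn(void *arg)
    {
        /* runs on the target CPU in stopper context */
        return 0;
    }

    static void example(unsigned int cpu)
    {
        struct cpu_stop_work work;

        /* hypothetical helper: trigger the stopper on @cpu without blocking */
        stop_one_cpu_async(cpu, my_stop_fn, NULL, &work);

        /* ... do something else while the stopper runs ... */

        /* later, wait for the queued work via its cpu_stop completion */
        cpu_stop_work_wait(&work);
    }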
     

02 Nov, 2020

1 commit


26 Oct, 2020

1 commit

  • Some architectures assume that the stopped CPUs don't make function calls
    to traceable functions when they are in the stopped state. See also commit
    cb9d7fd51d9f ("watchdog: Mark watchdog touch functions as notrace").

    Violating this assumption causes kernel crashes when switching tracer on
    RISC-V.

    Mark rcu_momentary_dyntick_idle() and stop_machine_yield() notrace to
    prevent this.

    Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
    Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in multi_cpu_stop()")
    Signed-off-by: Zong Li
    Signed-off-by: Thomas Gleixner
    Tested-by: Atish Patra
    Tested-by: Colin Ian King
    Acked-by: Steven Rostedt (VMware)
    Acked-by: Paul E. McKenney
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20201021073839.43935-1-zong.li@sifive.com

    Zong Li
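
    A minimal sketch of the kind of annotation involved, here on the weak
    stop_machine_yield() helper (the patch also marks
    rcu_momentary_dyntick_idle(); treat this as an outline, not the exact
    diff):

    #include <linux/stop_machine.h>

    /*
     * Runs while CPUs are being stopped; notrace keeps ftrace from
     * inserting a traceable call into this path.
     */
    void __weak notrace stop_machine_yield(const struct cpumask *cpumask)
    {
        cpu_relax();
    }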
     

28 Jul, 2020

1 commit

  • Export some scheduler APIs so that vendor modules can use them. The
    modules need these in order to migrate tasks as they want (a sketch of
    the exports follows below).

    activate_task:
    To make an inactive, migrated task runnable.

    deactivate_task:
    To make an active task migratable.

    check_preempt_curr:
    To check whether a migrated task needs to preempt the
    current task and, if so, to do it.

    set_task_cpu:
    To set a CPU for a migratable task and force
    the task to be migrated.

    stop_one_cpu_nowait:
    A stopper must be used to move a queued task.

    Bug: 155241766

    Signed-off-by: Choonghoon Park
    Change-Id: Ied940640525101efbbcef6eca0c39f15eb580007

    Choonghoon Park
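
    A sketch of what the exports look like; the message does not say whether
    EXPORT_SYMBOL or EXPORT_SYMBOL_GPL was used, so GPL-only exports are
    assumed here:

    /* kernel/sched/core.c */
    EXPORT_SYMBOL_GPL(activate_task);
    EXPORT_SYMBOL_GPL(deactivate_task);
    EXPORT_SYMBOL_GPL(check_preempt_curr);
    EXPORT_SYMBOL_GPL(set_task_cpu);

    /* kernel/stop_machine.c */
    EXPORT_SYMBOL_GPL(stop_one_cpu_nowait);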
     

17 Jan, 2020

1 commit

  • The function stop_cpus() is only used internally by the
    stop_machine code to stop multiple CPUs.

    Make it static.

    Signed-off-by: Yangtao Li
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20191228161912.24082-1-tiny.windzz@gmail.com

    Yangtao Li
     

17 Dec, 2019

1 commit

  • try_stop_cpus() has been unused since:

    commit c190c3b16c0f ("rcu: Switch synchronize_sched_expedited() to
    stop_one_cpu()")

    So remove it.

    Signed-off-by: Yangtao Li
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20191214195107.26480-1-tiny.windzz@gmail.com

    Yangtao Li
     

31 Oct, 2019

1 commit

  • …k/linux-rcu into core/rcu

    Pull RCU and LKMM changes from Paul E. McKenney:

    - Documentation updates.

    - Miscellaneous fixes.

    - Dynamic tick (nohz) updates, perhaps most notably changes to
    force the tick on when needed due to lengthy in-kernel execution
    on CPUs on which RCU is waiting.

    - Replace rcu_swap_protected() with rcu_replace_pointer().

    - Torture-test updates.

    - Linux-kernel memory consistency model updates.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

17 Oct, 2019

1 commit

  • Both multi_cpu_stop() and set_state() access multi_stop_data::state
    racily using plain accesses. These are subject to compiler
    transformations which could break the intended behaviour of the code,
    and this situation is detected by KCSAN on both arm64 and x86 (splats
    below).

    Improve matters by using READ_ONCE() and WRITE_ONCE() to ensure that the
    compiler cannot elide, replay, or tear loads and stores.

    In multi_cpu_stop() the two loads of multi_stop_data::state are expected to
    be a consistent value, so snapshot the value into a temporary variable to
    ensure this.

    The state transitions are serialized by atomic manipulation of
    multi_stop_data::num_threads, and other fields in multi_stop_data are not
    modified while subject to concurrent reads.

    KCSAN splat on arm64:

    | BUG: KCSAN: data-race in multi_cpu_stop+0xa8/0x198 and set_state+0x80/0xb0
    |
    | write to 0xffff00001003bd00 of 4 bytes by task 24 on cpu 3:
    | set_state+0x80/0xb0
    | multi_cpu_stop+0x16c/0x198
    | cpu_stopper_thread+0x170/0x298
    | smpboot_thread_fn+0x40c/0x560
    | kthread+0x1a8/0x1b0
    | ret_from_fork+0x10/0x18
    |
    | read to 0xffff00001003bd00 of 4 bytes by task 14 on cpu 1:
    | multi_cpu_stop+0xa8/0x198
    | cpu_stopper_thread+0x170/0x298
    | smpboot_thread_fn+0x40c/0x560
    | kthread+0x1a8/0x1b0
    | ret_from_fork+0x10/0x18
    |
    | Reported by Kernel Concurrency Sanitizer on:
    | CPU: 1 PID: 14 Comm: migration/1 Not tainted 5.3.0-00007-g67ab35a199f4-dirty #3
    | Hardware name: linux,dummy-virt (DT)

    KCSAN splat on x86:

    | write to 0xffffb0bac0013e18 of 4 bytes by task 19 on cpu 2:
    | set_state kernel/stop_machine.c:170 [inline]
    | ack_state kernel/stop_machine.c:177 [inline]
    | multi_cpu_stop+0x1a4/0x220 kernel/stop_machine.c:227
    | cpu_stopper_thread+0x19e/0x280 kernel/stop_machine.c:516
    | smpboot_thread_fn+0x1a8/0x300 kernel/smpboot.c:165
    | kthread+0x1b5/0x200 kernel/kthread.c:255
    | ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:352
    |
    | read to 0xffffb0bac0013e18 of 4 bytes by task 44 on cpu 7:
    | multi_cpu_stop+0xb4/0x220 kernel/stop_machine.c:213
    | cpu_stopper_thread+0x19e/0x280 kernel/stop_machine.c:516
    | smpboot_thread_fn+0x1a8/0x300 kernel/smpboot.c:165
    | kthread+0x1b5/0x200 kernel/kthread.c:255
    | ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:352
    |
    | Reported by Kernel Concurrency Sanitizer on:
    | CPU: 7 PID: 44 Comm: migration/7 Not tainted 5.3.0+ #1
    | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014

    Signed-off-by: Mark Rutland
    Signed-off-by: Thomas Gleixner
    Acked-by: Marco Elver
    Link: https://lkml.kernel.org/r/20191007104536.27276-1-mark.rutland@arm.com

    Mark Rutland
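
    A condensed sketch of the resulting access pattern, based on the
    description above (the surrounding state machine is elided):

    /* writer: publish the next state with a single, non-torn store */
    static void set_state(struct multi_stop_data *msdata,
                          enum multi_stop_state newstate)
    {
        /* Reset ack counter. */
        atomic_set(&msdata->thread_ack, msdata->num_threads);
        smp_wmb();
        WRITE_ONCE(msdata->state, newstate);
    }

    /*
     * reader, in multi_cpu_stop(): one racy load per iteration, then only
     * the snapshot is used
     */
    do {
        cpu_relax();
        newstate = READ_ONCE(msdata->state);
        if (newstate != curstate) {
            curstate = newstate;
            switch (curstate) {
            /* ... per-state work ... */
            }
            ack_state(msdata);
        }
    } while (curstate != MULTI_STOP_EXIT);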
     

06 Oct, 2019

1 commit


08 Aug, 2019

1 commit

  • Make sure the entire for loop has stop_cpus_in_progress set.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Aaron Lu
    Cc: Valentin Schneider
    Cc: mingo@kernel.org
    Cc: Phil Auld
    Cc: Julien Desfossez
    Cc: Nishanth Aravamudan
    Link: https://lkml.kernel.org/r/0fd8fd4b99b9b9aa88d8b2dff897f7fd0d88f72c.1559129225.git.vpillai@digitalocean.com

    Peter Zijlstra
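
    A sketch of the intended shape of queue_stop_cpus_work() after this
    change; the explicit barrier() calls are an assumption about how the
    flag is kept set across the whole loop:

    static bool queue_stop_cpus_work(const struct cpumask *cpumask,
                                     cpu_stop_fn_t fn, void *arg,
                                     struct cpu_stop_done *done)
    {
        struct cpu_stop_work *work;
        unsigned int cpu;
        bool queued = false;

        /*
         * Disable preemption while queueing to avoid getting preempted by
         * a stopper which might wait for other stoppers to enter @fn,
         * which can lead to deadlock.
         */
        preempt_disable();
        stop_cpus_in_progress = true;
        barrier();                      /* flag must cover the entire loop */
        for_each_cpu(cpu, cpumask) {
            work = &per_cpu(cpu_stopper.stop_work, cpu);
            work->fn = fn;
            work->arg = arg;
            work->done = done;
            if (cpu_stop_queue_work(cpu, work))
                queued = true;
        }
        barrier();
        stop_cpus_in_progress = false;
        preempt_enable();

        return queued;
    }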
     

15 Jun, 2019

2 commits

  • stop_machine is the only user left of cpu_relax_yield. Given that it
    now has special semantics which are tied to stop_machine, introduce a
    weak stop_machine_yield() function which architectures can override, and
    get rid of the generic cpu_relax_yield implementation.

    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Thomas Gleixner
    Signed-off-by: Heiko Carstens

    Heiko Carstens
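
    The generic fallback then reduces to a weak function; a sketch:

    /* kernel/stop_machine.c: default, overridable per architecture */
    void __weak stop_machine_yield(const struct cpumask *cpumask)
    {
        cpu_relax();
    }

    multi_cpu_stop() calls this in its ack-wait loop where it previously
    called cpu_relax_yield().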
     
  • The stop_machine loop that advances the state machine and waits for all
    affected CPUs to check in calls cpu_relax_yield in a tight loop until
    the last missing CPU has acknowledged the state transition.

    On a virtual system where not all logical CPUs are backed by real CPUs
    all the time, it can take a while for all CPUs to check in. With the
    current definition of cpu_relax_yield, a diagnose 0x44 is issued, which
    tells the hypervisor to schedule *some* other CPU. That can be any
    CPU and not necessarily one of the CPUs that need to run in order to
    advance the state machine. This can lead to a pretty bad diagnose 0x44
    storm until the last missing CPU finally checks in.

    Replace the undirected cpu_relax_yield based on diagnose 0x44 with a
    directed yield. Each CPU in the wait loop will pick up the next CPU
    in the cpumask of stop_machine. The diagnose 0x9c is used to tell the
    hypervisor to run this next CPU instead of the current one. If there
    is only a limited number of real CPUs backing the virtual CPUs we
    end up with the real CPUs passed around in a round-robin fashion.

    [heiko.carstens@de.ibm.com]:
    Use cpumask_next_wrap as suggested by Peter Zijlstra.

    Signed-off-by: Martin Schwidefsky
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Thomas Gleixner
    Signed-off-by: Heiko Carstens

    Martin Schwidefsky
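
    A simplified sketch of such a directed yield; the s390 specifics (the
    smp_yield_cpu() helper and any retry throttling in the real patch) are
    assumptions:

    /* arch override: yield to the next CPU participating in stop_machine */
    void stop_machine_yield(const struct cpumask *cpumask)
    {
        int this_cpu = smp_processor_id();
        int cpu;

        cpu = cpumask_next_wrap(this_cpu, cpumask, this_cpu, false);
        if (cpu >= nr_cpu_ids)
            return;

        /* diagnose 0x9c: ask the hypervisor to run that specific CPU */
        smp_yield_cpu(cpu);
    }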
     

24 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this file is released under the gplv2 and any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 1 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Kate Stewart
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190520170857.732920462@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

09 Apr, 2019

1 commit

  • %pF and %pf are functionally equivalent to %pS and %ps conversion
    specifiers. The former are deprecated, therefore switch the current users
    to use the preferred variant.

    The changes have been produced by the following command:

    git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
    while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done

    And verifying the result.

    Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
    Cc: Andy Shevchenko
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: sparclinux@vger.kernel.org
    Cc: linux-um@lists.infradead.org
    Cc: xen-devel@lists.xenproject.org
    Cc: linux-acpi@vger.kernel.org
    Cc: linux-pm@vger.kernel.org
    Cc: drbd-dev@lists.linbit.com
    Cc: linux-block@vger.kernel.org
    Cc: linux-mmc@vger.kernel.org
    Cc: linux-nvdimm@lists.01.org
    Cc: linux-pci@vger.kernel.org
    Cc: linux-scsi@vger.kernel.org
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: linux-mm@kvack.org
    Cc: ceph-devel@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Sakari Ailus
    Acked-by: David Sterba (for btrfs)
    Acked-by: Mike Rapoport (for mm/memblock.c)
    Acked-by: Bjorn Helgaas (for drivers/pci)
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Petr Mladek

    Sakari Ailus
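
    For reference, a small example of the preferred specifiers (an
    illustration, not part of the patch):

    static void report_callback(cpu_stop_fn_t fn)
    {
        /* %ps prints just the symbol name, %pS adds offset/size info */
        pr_info("cpu_stop callback: %ps (%pS)\n", fn, fn);
    }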
     

14 Aug, 2018

1 commit

  • Pull scheduler updates from Thomas Gleixner:

    - Cleanup and improvement of NUMA balancing

    - Refactoring and improvements to the PELT (Per Entity Load Tracking)
    code

    - Watchdog simplification and related cleanups

    - The usual pile of small incremental fixes and improvements

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
    watchdog: Reduce message verbosity
    stop_machine: Reflow cpu_stop_queue_two_works()
    sched/numa: Move task_numa_placement() closer to numa_migrate_preferred()
    sched/numa: Use group_weights to identify if migration degrades locality
    sched/numa: Update the scan period without holding the numa_group lock
    sched/numa: Remove numa_has_capacity()
    sched/numa: Modify migrate_swap() to accept additional parameters
    sched/numa: Remove unused task_capacity from 'struct numa_stats'
    sched/numa: Skip nodes that are at 'hoplimit'
    sched/debug: Reverse the order of printing faults
    sched/numa: Use task faults only if numa_group is not yet set up
    sched/numa: Set preferred_node based on best_cpu
    sched/numa: Simplify load_too_imbalanced()
    sched/numa: Evaluate move once per node
    sched/numa: Remove redundant field
    sched/debug: Show the sum wait time of a task group
    sched/fair: Remove #ifdefs from scale_rt_capacity()
    sched/core: Remove get_cpu() from sched_fork()
    sched/cpufreq: Clarify sugov_get_util()
    sched/sysctl: Remove unused sched_time_avg_ms sysctl
    ...

    Linus Torvalds
     

06 Aug, 2018

1 commit

  • When cpu_stop_queue_work() releases the lock for the stopper
    thread that was queued into its wake queue, preemption is
    enabled, which leads to the following deadlock:

    CPU0: sched_setaffinity(0, ...)
            __set_cpus_allowed_ptr()
              stop_one_cpu(0, ...)
                cpu_stop_queue_work(0, ...)

    CPU1: stop_two_cpus(0, 1, ...)
            cpu_stop_queue_two_works(0, ..., 1, ...)

    CPU0: grabs the lock for migration/0.

    CPU1: spins with preemption disabled, waiting for migration/0's lock to
          be released.

    CPU0: adds work items for migration/0 and queues migration/0 to its
          wake_q.

    CPU0: releases the lock for migration/0; preemption is enabled.

    CPU0: the current thread is preempted, and __set_cpus_allowed_ptr() has
          changed the thread's cpu allowed mask to CPU1 only.

    CPU1: acquires migration/0's and migration/1's locks.

    CPU1: adds work for migration/0 but does not add migration/0 to the
          wake_q, since it is already in a wake_q.

    CPU1: adds work for migration/1 and adds migration/1 to its wake_q.

    CPU1: releases migration/0's and migration/1's locks, wakes migration/1,
          and enables preemption.

    CPU1: since migration/1 is requested to run, migration/1 begins to run
          and waits on migration/0, but migration/0 will never be able to
          run, since the thread that can wake it is affine to CPU1.

    Disable preemption in cpu_stop_queue_work() before queueing works for
    stopper threads, and queueing the stopper thread in the wake queue, to
    ensure that the operation of queueing the works and waking the stopper
    threads is atomic.

    Fixes: 0b26351b910f ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")
    Signed-off-by: Prasad Sodagudi
    Signed-off-by: Isaac J. Manjarres
    Signed-off-by: Thomas Gleixner
    Cc: peterz@infradead.org
    Cc: matt@codeblueprint.co.uk
    Cc: bigeasy@linutronix.de
    Cc: gregkh@linuxfoundation.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1533329766-4856-1-git-send-email-isaacm@codeaurora.org

    Co-Developed-by: Isaac J. Manjarres

    Prasad Sodagudi
     

02 Aug, 2018

1 commit

  • The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by
    lifting the preempt_disable() to the top to create more natural nesting wrt
    the spinlocks, and by making the wake_up_q() and preempt_enable()
    unconditional at the end (see the sketch below).

    Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait
    with preemption enabled.

    Suggested-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: Sebastian Andrzej Siewior
    Cc: isaacm@codeaurora.org
    Cc: matt@codeblueprint.co.uk
    Cc: psodagud@codeaurora.org
    Cc: gregkh@linuxfoundation.org
    Cc: pkondeti@codeaurora.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180730112140.GH2494@hirez.programming.kicks-ass.net

    Peter Zijlstra
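
    A sketch of the resulting control flow (simplified, and an outline
    rather than the exact patch; helper names follow kernel/stop_machine.c
    of that era):

    static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
                                        int cpu2, struct cpu_stop_work *work2)
    {
        struct cpu_stopper *stopper1 = per_cpu_ptr(&cpu_stopper, cpu1);
        struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
        DEFINE_WAKE_Q(wakeq);
        int err;

    retry:
        /* preemption disabled first: natural nesting wrt the locks below */
        preempt_disable();
        raw_spin_lock_irq(&stopper1->lock);
        raw_spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);

        if (!stopper1->enabled || !stopper2->enabled) {
            err = -ENOENT;
            goto unlock;
        }

        if (unlikely(stop_cpus_in_progress)) {
            err = -EDEADLK;
            goto unlock;
        }

        err = 0;
        __cpu_stop_queue_work(stopper1, work1, &wakeq);
        __cpu_stop_queue_work(stopper2, work2, &wakeq);

    unlock:
        raw_spin_unlock(&stopper2->lock);
        raw_spin_unlock_irq(&stopper1->lock);

        if (unlikely(err == -EDEADLK)) {
            /* spin-wait with preemption enabled, then try again */
            preempt_enable();
            while (stop_cpus_in_progress)
                cpu_relax();
            goto retry;
        }

        /* unconditional at the end */
        wake_up_q(&wakeq);
        preempt_enable();

        return err;
    }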
     

25 Jul, 2018

1 commit

  • This commit:

    9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")

    does not fully address the race condition that can occur
    as follows:

    On one CPU, call it CPU 3, thread 1 invokes
    cpu_stop_queue_two_works(2, 3,...), and the execution is such
    that thread 1 queues the works for migration/2 and migration/3,
    and is preempted after releasing the locks for migration/2 and
    migration/3, but before waking the threads.

    Then, on CPU 2, a kworker, call it thread 2, is running,
    and it invokes cpu_stop_queue_two_works(1, 2,...), such that
    thread 2 queues the works for migration/1 and migration/2.
    Meanwhile, on CPU 3, thread 1 resumes execution, and wakes
    migration/2 and migration/3. This means that when CPU 2
    releases the locks for migration/1 and migration/2, but before
    it wakes those threads, it can be preempted by migration/2.

    If thread 2 is preempted by migration/2, then migration/2 will
    execute the first work item successfully, since migration/3
    was woken up by CPU 3, but when it goes to execute the second
    work item, it disables preemption, calls multi_cpu_stop(),
    and thus, CPU 2 will wait forever for migration/1, which should
    have been woken up by thread 2. However migration/1 cannot be
    woken up by thread 2, since it is a kworker, so it is affine to
    CPU 2, but CPU 2 is running migration/2 with preemption
    disabled, so thread 2 will never run.

    Disable preemption after queueing works for stopper threads
    to ensure that the operation of queueing the works and waking
    the stopper threads is atomic.

    Co-Developed-by: Prasad Sodagudi
    Co-Developed-by: Pavankumar Kondeti
    Signed-off-by: Isaac J. Manjarres
    Signed-off-by: Prasad Sodagudi
    Signed-off-by: Pavankumar Kondeti
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: bigeasy@linutronix.de
    Cc: gregkh@linuxfoundation.org
    Cc: matt@codeblueprint.co.uk
    Fixes: 9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")
    Link: http://lkml.kernel.org/r/1531856129-9871-1-git-send-email-isaacm@codeaurora.org
    Signed-off-by: Ingo Molnar

    Isaac J. Manjarres
     

15 Jul, 2018

1 commit

  • When cpu_stop_queue_two_works() begins to wake the stopper threads, it does
    so without preemption disabled, which leads to the following race
    condition:

    The source CPU calls cpu_stop_queue_two_works(), with cpu1 as the source
    CPU, and cpu2 as the destination CPU. When adding the stopper threads to
    the wake queue used in this function, the source CPU stopper thread is
    added first, and the destination CPU stopper thread is added last.

    When wake_up_q() is invoked to wake the stopper threads, the threads are
    woken up in the order that they are queued in, so the source CPU's stopper
    thread is woken up first, and it preempts the thread running on the source
    CPU.

    The stopper thread will then execute on the source CPU, disable preemption,
    and begin executing multi_cpu_stop(), and wait for an ack from the
    destination CPU's stopper thread, with preemption still disabled. Since the
    worker thread that woke up the stopper thread on the source CPU is affine
    to the source CPU, and preemption is disabled on the source CPU, that
    thread will never run to dequeue the destination CPU's stopper thread from
    the wake queue, and thus, the destination CPU's stopper thread will never
    run, causing the source CPU's stopper thread to wait forever, and stall.

    Disable preemption when waking the stopper threads in
    cpu_stop_queue_two_works().

    Fixes: 0b26351b910f ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")
    Co-Developed-by: Prasad Sodagudi
    Signed-off-by: Prasad Sodagudi
    Co-Developed-by: Pavankumar Kondeti
    Signed-off-by: Pavankumar Kondeti
    Signed-off-by: Isaac J. Manjarres
    Signed-off-by: Thomas Gleixner
    Cc: peterz@infradead.org
    Cc: matt@codeblueprint.co.uk
    Cc: bigeasy@linutronix.de
    Cc: gregkh@linuxfoundation.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1530655334-4601-1-git-send-email-isaacm@codeaurora.org

    Isaac J. Manjarres
     

15 May, 2018

1 commit


03 May, 2018

1 commit

  • Matt reported the following deadlock:

    CPU0: schedule(.prev=migrate/0)
    CPU0:   pick_next_task()
    CPU0:     idle_balance()
    CPU0:       active_balance()
    CPU1: migrate_swap()
    CPU1:   stop_two_cpus()
    CPU1:     spin_lock(stopper0->lock)
    CPU1:     spin_lock(stopper1->lock)
    CPU1:     ttwu(migrate/0)
    CPU1:       smp_cond_load_acquire()     -- waits for schedule()
    CPU0:         stop_one_cpu(1)
    CPU0:           spin_lock(stopper1->lock) -- waits for stopper lock

    Fix this deadlock by taking the wakeups out from under stopper->lock.
    This allows the active_balance() to queue the stop work and finish the
    context switch, which in turn allows the wakeup from migrate_swap() to
    observe the context and complete the wakeup.

    Signed-off-by: Peter Zijlstra (Intel)
    Reported-by: Matt Fleming
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Matt Fleming
    Cc: Linus Torvalds
    Cc: Michal Hocko
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20180420095005.GH4064@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
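
    A sketch of the queueing path with the wakeup moved out from under the
    lock (simplified; the wake_q mechanism defers the wakeup):

    static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
    {
        struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
        DEFINE_WAKE_Q(wakeq);
        unsigned long flags;
        bool enabled;

        raw_spin_lock_irqsave(&stopper->lock, flags);
        enabled = stopper->enabled;
        if (enabled) {
            list_add_tail(&work->list, &stopper->works);
            wake_q_add(&wakeq, stopper->thread);    /* defer the wakeup */
        } else if (work->done) {
            cpu_stop_signal_done(work->done);
        }
        raw_spin_unlock_irqrestore(&stopper->lock, flags);

        /* the wakeup now happens after stopper->lock has been dropped */
        wake_up_q(&wakeq);

        return enabled;
    }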
     

27 Apr, 2018

1 commit

  • Use raw-locks in stop_machine() to allow locking in irq-off and
    preempt-disabled regions on -RT. This also documents the possible locking
    context in general.

    [bigeasy: update patch description.]
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sebastian Andrzej Siewior
    Link: https://lkml.kernel.org/r/20180423191635.6014-1-bigeasy@linutronix.de

    Thomas Gleixner
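
    A sketch of the type change on the per-CPU stopper (field list as in
    kernel/stop_machine.c of that era):

    struct cpu_stopper {
        struct task_struct      *thread;

        raw_spinlock_t          lock;       /* was: spinlock_t */
        bool                    enabled;    /* is this stopper enabled? */
        struct list_head        works;      /* list of pending works */

        struct cpu_stop_work    stop_work;  /* for stop_cpus */
    };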
     

26 May, 2017

1 commit

  • Some call sites of stop_machine() are within a get_online_cpus() protected
    region.

    stop_machine() calls get_online_cpus() as well, which is possible in the
    current implementation but prevents converting the hotplug locking to a
    percpu rwsem.

    Provide stop_machine_cpuslocked() to avoid nested calls to get_online_cpus().

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Tested-by: Paul E. McKenney
    Acked-by: Paul E. McKenney
    Acked-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20170524081547.400700852@linutronix.de

    Sebastian Andrzej Siewior
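
    A sketch of the resulting split; get/put_online_cpus() was the hotplug
    locking API at the time:

    /*
     * Same as stop_machine(), but the caller must already hold the CPU
     * hotplug lock.
     */
    int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data,
                                const struct cpumask *cpus);

    int stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus)
    {
        int ret;

        /* No CPUs can come up or down during this. */
        get_online_cpus();
        ret = stop_machine_cpuslocked(fn, data, cpus);
        put_online_cpus();
        return ret;
    }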
     

16 Nov, 2016

1 commit

  • Some time ago the following commit:

    57f2ffe14fd125c2 ("s390: remove diag 44 calls from cpu_relax()")

    ... stopped cpu_relax() on s390 yielding to the hypervisor.

    As it turns out, this made stop_machine() run really slowly on virtualized,
    overcommitted systems. For example, the kprobes test during bootup took
    several seconds instead of just running unnoticed with large guests.

    Therefore, yielding was reintroduced with commit:

    4d92f50249eb ("s390: reintroduce diag 44 calls for cpu_relax()")

    ... but in fact the stop machine code seems to be the only place where
    this yielding was really necessary. This place is probably the most
    important one as it makes all but one guest CPUs wait for one guest CPU.

    As we now have cpu_relax_yield(), we can use it in multi_cpu_stop().
    For now let's only add it here; we can add it in other places later
    when necessary.

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Catalin Marinas
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Nicholas Piggin
    Cc: Noam Camus
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: virtualization@lists.linux-foundation.org
    Cc: xen-devel@lists.xenproject.org
    Link: http://lkml.kernel.org/r/1477386195-32736-3-git-send-email-borntraeger@de.ibm.com
    Signed-off-by: Ingo Molnar

    Christian Borntraeger
     

04 Oct, 2016

1 commit

  • Pull scheduler changes from Ingo Molnar:
    "The main changes are:

    - irqtime accounting cleanups and enhancements. (Frederic Weisbecker)

    - schedstat debugging enhancements, make it more broadly runtime
    available. (Josh Poimboeuf)

    - More work on asymmetric topology/capacity scheduling. (Morten
    Rasmussen)

    - sched/wait fixes and cleanups. (Oleg Nesterov)

    - PELT (per entity load tracking) improvements. (Peter Zijlstra)

    - Rewrite and enhance select_idle_siblings(). (Peter Zijlstra)

    - sched/numa enhancements/fixes (Rik van Riel)

    - sched/cputime scalability improvements (Stanislaw Gruszka)

    - Load calculation arithmetics fixes. (Dietmar Eggemann)

    - sched/deadline enhancements (Tommaso Cucinotta)

    - Fix utilization accounting when switching to the SCHED_NORMAL
    policy. (Vincent Guittot)

    - ... plus misc cleanups and enhancements"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
    sched/irqtime: Consolidate irqtime flushing code
    sched/irqtime: Consolidate accounting synchronization with u64_stats API
    u64_stats: Introduce IRQs disabled helpers
    sched/irqtime: Remove needless IRQs disablement on kcpustat update
    sched/irqtime: No need for preempt-safe accessors
    sched/fair: Fix min_vruntime tracking
    sched/debug: Add SCHED_WARN_ON()
    sched/core: Fix set_user_nice()
    sched/fair: Introduce set_curr_task() helper
    sched/core, ia64: Rename set_curr_task()
    sched/core: Fix incorrect utilization accounting when switching to fair class
    sched/core: Optimize SCHED_SMT
    sched/core: Rewrite and improve select_idle_siblings()
    sched/core: Replace sd_busy/nr_busy_cpus with sched_domain_shared
    sched/core: Introduce 'struct sched_domain_shared'
    sched/core: Restructure destroy_sched_domain()
    sched/core: Remove unused @cpu argument from destroy_sched_domain*()
    sched/wait: Introduce init_wait_entry()
    sched/wait: Avoid abort_exclusive_wait() in __wait_on_bit_lock()
    sched/wait: Avoid abort_exclusive_wait() in ___wait_event()
    ...

    Linus Torvalds
     

22 Sep, 2016

2 commits

  • stop_two_cpus() and stop_cpus() use stop_cpus_lock to avoid the deadlock:
    we need to ensure that the stopper functions can't be queued "backwards"
    from one another. This doesn't look nice; if we use lglock then we do not
    really need stopper->lock, as cpu_stop_queue_work() could use
    lg_local_lock() under local_irq_save().

    OTOH it would be even better to avoid lglock in stop_machine.c and remove
    lg_double_lock(). This patch adds "bool stop_cpus_in_progress" set/cleared
    by queue_stop_cpus_work(), and changes cpu_stop_queue_two_works() to busy
    wait until it is cleared.

    queue_stop_cpus_work() sets stop_cpus_in_progress = T lockless, but after
    it queues a work on CPU1 it must be visible to stop_two_cpus(CPU1, CPU2)
    which checks it under the same lock. And since stop_two_cpus() holds the
    2nd lock too, queue_stop_cpus_work() can not clear stop_cpus_in_progress
    if it is also going to queue a work on CPU2; it needs to take that 2nd
    lock to do this.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151121181148.GA433@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • In case @cpu == smp_processor_id(), we can avoid a sleep+wakeup
    cycle by preempting directly.

    Callers such as sched_exec() can benefit from this change.

    Signed-off-by: Cheng Chao
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: chris@chris-wilson.co.uk
    Cc: tj@kernel.org
    Link: http://lkml.kernel.org/r/1473818510-6779-1-git-send-email-cs.os.kernel@gmail.com
    Signed-off-by: Ingo Molnar

    Cheng Chao
     

27 Jul, 2016

1 commit

  • Suppose that stop_machine(fn) hangs because fn() hangs. In this case NMI
    hard-lockup can be triggered on another CPU which does nothing wrong and
    the trace from nmi_panic() won't help to investigate the problem.

    And this change "fixes" the problem we (seem to) hit in practice.

    - stop_two_cpus(0, 1) races with show_state_filter() running on CPU_0.

    - CPU_1 already spins in MULTI_STOP_PREPARE state, it detects the soft
    lockup and tries to report the problem.

    - show_state_filter() enables preemption, CPU_0 calls multi_cpu_stop()
    which goes to MULTI_STOP_DISABLE_IRQ state and disables interrupts.

    - CPU_1 spends more than 10 seconds trying to flush the log buffer to
    the slow serial console.

    - NMI interrupt on CPU_0 (which now waits for CPU_1) calls nmi_panic().

    Reported-by: Wang Shu
    Signed-off-by: Oleg Nesterov
    Reviewed-by: Thomas Gleixner
    Cc: Andrew Morton
    Cc: Dave Anderson
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/20160726185736.GB4088@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

17 Jan, 2016

1 commit


06 Jan, 2016

1 commit


13 Dec, 2015

1 commit

  • Currently the full stop_machine() routine is only enabled on SMP if
    module unloading is enabled, or if the CPUs are hotpluggable. This
    leads to configurations where stop_machine() is broken as it will then
    only run the callback on the local CPU with irqs disabled, and not stop
    the other CPUs or run the callback on them.

    For example, this breaks MTRR setup on x86 in certain configs since
    ea8596bb2d8d379 ("kprobes/x86: Remove unused text_poke_smp() and
    text_poke_smp_batch() functions") as the MTRR is only established on the
    boot CPU.

    This patch removes the Kconfig option for STOP_MACHINE and uses the SMP
    and HOTPLUG_CPU config options to compile the correct stop_machine() for
    the architecture, removing the false dependency on MODULE_UNLOAD in the
    process.

    Link: https://lkml.org/lkml/2014/10/8/124
    References: https://bugs.freedesktop.org/show_bug.cgi?id=84794
    Signed-off-by: Chris Wilson
    Acked-by: Ingo Molnar
    Cc: "Paul E. McKenney"
    Cc: Pranith Kumar
    Cc: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: H. Peter Anvin
    Cc: Tejun Heo
    Cc: Iulia Manda
    Cc: Andy Lutomirski
    Cc: Rusty Russell
    Cc: Peter Zijlstra
    Cc: Chuck Ebbert
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Wilson
     

23 Nov, 2015

8 commits

  • 1. Change this code to use preempt_count_inc/preempt_count_dec; this way
    it works even if CONFIG_PREEMPT_COUNT=n, and we avoid the unnecessary
    __preempt_schedule() check (stop_sched_class is not preemptible).

    And this makes clear that we only want to make preempt_count() != 0
    for __might_sleep() / schedule_debug().

    2. Change WARN_ONCE() to use %pf to print the function name and remove
    kallsyms_lookup/ksym_buf.

    3. Move "int ret" into the "if (work)" block, this looks more consistent.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151115193332.GA8281@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
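
    A sketch of the callback invocation inside cpu_stopper_thread() after
    this series of cleanups:

    if (work) {
        cpu_stop_fn_t fn = work->fn;
        void *arg = work->arg;
        struct cpu_stop_done *done = work->done;
        int ret;

        /* cpu stop callbacks must not sleep, make in_atomic() == true */
        preempt_count_inc();
        ret = fn(arg);
        if (done) {
            if (ret)
                done->ret = ret;
            cpu_stop_signal_done(done);
        }
        preempt_count_dec();
        WARN_ONCE(preempt_count(),
                  "cpu_stop: %pf(%p) leaked preempt count\n", fn, arg);
        goto repeat;
    }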
     
  • Change cpu_stop_queue_work() and cpu_stopper_thread() to check done != NULL
    before cpu_stop_signal_done(done). This makes the code cleaner imo; note
    that cpu_stopper_thread() has to do this check anyway.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151115193329.GA8274@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Now that cpu_stop_done->executed becomes write-only (ignoring WARN_ON()
    checks) we can remove it.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151115193326.GA8269@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Change queue_stop_cpus_work() to return true if it queues at least one
    work, this means that the caller should wait.

    __stop_cpus() can check the value returned by queue_stop_cpus_work() and
    avoid done.executed, just like stop_one_cpu() does.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151115193323.GA8262@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Change stop_one_cpu() to return -ENOENT if cpu_stop_queue_work() fails.
    Otherwise we know that ->executed must be true after wait_for_completion()
    so we can just return done.ret.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151115193320.GA8259@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
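
    A sketch of stop_one_cpu() after the change:

    int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
    {
        struct cpu_stop_done done;
        struct cpu_stop_work work = { .fn = fn, .arg = arg, .done = &done };

        cpu_stop_init_done(&done, 1);
        if (!cpu_stop_queue_work(cpu, &work))
            return -ENOENT;
        wait_for_completion(&done.completion);
        return done.ret;
    }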
     
  • Change cpu_stop_queue_work() to return true if the work was queued and
    change stop_one_cpu_nowait() to return the result of cpu_stop_queue_work().
    This makes it more useful: for example, you can now allocate a cpu_stop_work
    for stop_one_cpu_nowait() and free it in the callback or when
    stop_one_cpu_nowait() fails. Currently this is impossible because you can't
    know whether @fn will be called or not (a usage sketch follows below).

    Also, this allows us to kill cpu_stop_done->executed; see the next changes.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151117170523.GA13955@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
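
    A usage sketch of the pattern described above (type and function names
    here are made up; the cpu_stop_work is embedded in a request that the
    callback frees):

    #include <linux/slab.h>
    #include <linux/stop_machine.h>

    struct async_stop_req {
        struct cpu_stop_work    work;
        int                     payload;
    };

    static int async_stop_fn(void *arg)
    {
        struct async_stop_req *req = arg;

        /* ... act on req->payload in stopper context ... */

        kfree(req);     /* queued => @fn runs, so the callback may free it */
        return 0;
    }

    static int queue_async_stop(unsigned int cpu, int payload)
    {
        struct async_stop_req *req = kzalloc(sizeof(*req), GFP_KERNEL);

        if (!req)
            return -ENOMEM;
        req->payload = payload;

        if (!stop_one_cpu_nowait(cpu, async_stop_fn, req, &req->work)) {
            kfree(req);     /* not queued: @fn will never be called */
            return -ENOENT;
        }
        return 0;
    }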
     
  • Now that the stop_two_cpus() path does not check cpu_active(), we can
    remove preempt_disable(); it was only needed to ensure that stop_machine() can
    not be called after we observe cpu_active() == T and before we queue the
    new work.

    Also, turn the pointless and confusing ->executed check into WARN_ON().
    We know that both works must be executed, otherwise we have a bug. And
    in fact I think that done->executed should die, see the next changes.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151115193314.GA8249@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • stop_one_cpu_nowait(fn) will crash the kernel if the callback returns
    nonzero, because work->done == NULL in this case.

    This needs more cleanups: cpu_stop_signal_done() is called right after
    we check done != NULL, and it does the same check.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151115193311.GA8242@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

20 Oct, 2015

1 commit