27 Jul, 2016

1 commit

  • Suppose that stop_machine(fn) hangs because fn() hangs. In this case an NMI
    hard-lockup can be triggered on another CPU which is doing nothing wrong, and
    the trace from nmi_panic() won't help to investigate the problem.

    And this change "fixes" the problem we (seem to) hit in practice.

    - stop_two_cpus(0, 1) races with show_state_filter() running on CPU_0.

    - CPU_1 already spins in MULTI_STOP_PREPARE state, it detects the soft
    lockup and tries to report the problem.

    - show_state_filter() enables preemption, CPU_0 calls multi_cpu_stop()
    which goes to MULTI_STOP_DISABLE_IRQ state and disables interrupts.

    - CPU_1 spends more than 10 seconds trying to flush the log buffer to
    the slow serial console.

    - NMI interrupt on CPU_0 (which now waits for CPU_1) calls nmi_panic().

    Reported-by: Wang Shu
    Signed-off-by: Oleg Nesterov
    Reviewed-by: Thomas Gleixner
    Cc: Andrew Morton
    Cc: Dave Anderson
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/20160726185736.GB4088@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

17 Jan, 2016

1 commit


06 Jan, 2016

1 commit


13 Dec, 2015

1 commit

  • Currently the full stop_machine() routine is only enabled on SMP if
    module unloading is enabled, or if the CPUs are hotpluggable. This
    leads to configurations where stop_machine() is broken as it will then
    only run the callback on the local CPU with irqs disabled, and not stop
    the other CPUs or run the callback on them.

    For example, this breaks MTRR setup on x86 in certain configs since
    ea8596bb2d8d379 ("kprobes/x86: Remove unused text_poke_smp() and
    text_poke_smp_batch() functions") as the MTRR is only established on the
    boot CPU.

    This patch removes the Kconfig option for STOP_MACHINE and uses the SMP
    and HOTPLUG_CPU config options to compile the correct stop_machine() for
    the architecture, removing the false dependency on MODULE_UNLOAD in the
    process.

    Link: https://lkml.org/lkml/2014/10/8/124
    References: https://bugs.freedesktop.org/show_bug.cgi?id=84794
    Signed-off-by: Chris Wilson
    Acked-by: Ingo Molnar
    Cc: "Paul E. McKenney"
    Cc: Pranith Kumar
    Cc: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: H. Peter Anvin
    Cc: Tejun Heo
    Cc: Iulia Manda
    Cc: Andy Lutomirski
    Cc: Rusty Russell
    Cc: Peter Zijlstra
    Cc: Chuck Ebbert
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Wilson
     

23 Nov, 2015

8 commits

  • 1. Change this code to use preempt_count_inc/preempt_count_dec; this way
    it works even if CONFIG_PREEMPT_COUNT=n, and we avoid the unnecessary
    __preempt_schedule() check (stop_sched_class is not preemptible).

    And this makes clear that we only want to make preempt_count() != 0
    for __might_sleep() / schedule_debug().

    2. Change WARN_ONCE() to use %pf to print the function name and remove
    kallsyms_lookup/ksym_buf.

    3. Move "int ret" into the "if (work)" block, this looks more consistent.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151115193332.GA8281@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Change cpu_stop_queue_work() and cpu_stopper_thread() to check done != NULL
    before calling cpu_stop_signal_done(done). This makes the code cleaner, IMO;
    note that cpu_stopper_thread() has to do this check anyway.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151115193329.GA8274@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Now that cpu_stop_done->executed is write-only (apart from the WARN_ON()
    checks), we can remove it.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151115193326.GA8269@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Change queue_stop_cpus_work() to return true if it queues at least one
    work; this means that the caller should wait.

    __stop_cpus() can check the value returned by queue_stop_cpus_work() and
    avoid done.executed, just like stop_one_cpu() does.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151115193323.GA8262@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Change stop_one_cpu() to return -ENOENT if cpu_stop_queue_work() fails.
    Otherwise we know that ->executed must be true after wait_for_completion()
    so we can just return done.ret.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151115193320.GA8259@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Change cpu_stop_queue_work() to return true if the work was queued and
    change stop_one_cpu_nowait() to return the result of cpu_stop_queue_work().
    This makes it more useful: for example, you can now allocate a cpu_stop_work
    for stop_one_cpu_nowait() and free it in the callback, or free it when
    stop_one_cpu_nowait() fails. Currently this is impossible because you can't
    know whether @fn will be called or not.

    Also, this allows us to kill cpu_stop_done->executed; see the next changes.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151117170523.GA13955@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
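
    The lifetime pattern this return value enables can be sketched in plain
    userspace C. All names below (sim_*, try_nowait_stop) are illustrative
    stand-ins for the idea, not the kernel API:

```c
#include <stdbool.h>
#include <stdlib.h>

typedef int (*stop_fn_t)(void *arg);

struct sim_stop_work {
    stop_fn_t fn;
    void *arg;
};

/* Simulated stopper state: pretend CPU 1 is offline, so queueing on it
 * fails and the callback is guaranteed never to run. */
static bool sim_cpu_online(int cpu) { return cpu != 1; }

static bool sim_queue_stop_work(int cpu, struct sim_stop_work *work)
{
    if (!sim_cpu_online(cpu))
        return false;      /* caller now knows fn() will never be called */
    work->fn(work->arg);   /* in the kernel this runs later, on @cpu */
    return true;
}

/* The callback owns and releases the work on the success path. */
static int free_work_fn(void *arg)
{
    free(arg);
    return 0;
}

/* Returns true if the work was queued (it frees itself in the callback),
 * false if queueing failed and the caller had to clean up. */
static bool try_nowait_stop(int cpu)
{
    struct sim_stop_work *work = malloc(sizeof(*work));

    work->fn = free_work_fn;
    work->arg = work;
    if (!sim_queue_stop_work(cpu, work)) {
        free(work);        /* queueing failed: safe to free immediately */
        return false;
    }
    return true;
}
```

    Without the boolean result, the caller could neither free the work itself
    (the callback might still run) nor rely on the callback (it might never
    run), which is exactly the problem the commit describes.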
     
  • Now that the stop_two_cpus() path does not check cpu_active(), we can
    remove preempt_disable(); it was only needed to ensure that stop_machine()
    could not be called after we observe cpu_active() == T and before we queue
    the new work.

    Also, turn the pointless and confusing ->executed check into WARN_ON().
    We know that both works must be executed, otherwise we have a bug. And
    in fact I think that done->executed should die, see the next changes.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151115193314.GA8249@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • stop_one_cpu_nowait(fn) will crash the kernel if the callback returns
    nonzero, since work->done == NULL in this case.

    This needs more cleanups: cpu_stop_signal_done() is called right after
    we check done != NULL, and it does the same check.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Milos Vyletel
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151115193311.GA8242@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

20 Oct, 2015

6 commits

  • The cpu_active() tests are not fundamentally part of stop_two_cpus();
    move them into the scheduler where they belong.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Now that we always use stop_machine_unpark() to wake the stopper
    threads up, we can kill ->setup() and fold cpu_stop_unpark() into
    stop_machine_unpark().

    And we do not need stopper->lock to set stopper->enabled = true.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: heiko.carstens@de.ibm.com
    Link: http://lkml.kernel.org/r/20151009160051.GA10169@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • 1. Change smpboot_unpark_thread() to check ->selfparking, just
    like smpboot_park_thread() does.

    2. Introduce stop_machine_unpark() which sets ->enabled and calls
    kthread_unpark().

    3. Change smpboot_thread_call() and cpu_stop_init() to call
    stop_machine_unpark() by hand.

    This way:

    - IMO the ->selfparking logic becomes more consistent.

    - We can kill the smp_hotplug_thread->pre_unpark() method.

    - We can easily unpark the stopper thread earlier. Say, we
    can move stop_machine_unpark() from smpboot_thread_call()
    to sched_cpu_active() as Peter suggests.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: heiko.carstens@de.ibm.com
    Link: http://lkml.kernel.org/r/20151009160049.GA10166@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Change cpu_stop_queue_two_works() to ensure that both CPU's have
    stopper->enabled == T or fail otherwise.

    This way stop_two_cpus() no longer needs to check cpu_active() to
    avoid the deadlock. This patch doesn't remove these checks, we will
    do this later.

    Note: we need to take both stopper->lock's at the same time, but this
    will also help to remove lglock from stop_machine.c, so I hope this
    is fine.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: heiko.carstens@de.ibm.com
    Link: http://lkml.kernel.org/r/20151008170141.GA25537@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Preparation to simplify the review of the next change. Add two simple
    helpers, __cpu_stop_queue_work() and cpu_stop_queue_two_works() which
    simply take a bit of code from their callers.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: heiko.carstens@de.ibm.com
    Link: http://lkml.kernel.org/r/20151008145134.GA18146@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • cpu_stop_queue_work() checks stopper->enabled before it queues the
    work, but ->enabled == T can only guarantee cpu_stop_signal_done()
    if we race with cpu_down().

    This is not enough for stop_two_cpus() or stop_machine(), they will
    deadlock if multi_cpu_stop() won't be called by one of the target
    CPU's. stop_machine/stop_cpus are fine, they rely on stop_cpus_mutex.
    But stop_two_cpus() has to check cpu_active() to avoid the same race
    with hotplug, and this check is very unobvious and probably not even
    correct if we race with cpu_up().

    Change the cpu_down() path to clear ->enabled before cpu_stopper_thread()
    flushes the pending ->works and returns with KTHREAD_SHOULD_PARK set.

    Note also that smpboot_thread_call() calls cpu_stop_unpark() which
    sets enabled == T at CPU_ONLINE stage, so this CPU can't go away until
    cpu_stopper_thread() is called at least once. This all means that if
    cpu_stop_queue_work() succeeds, we know that work->fn() will be called.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: heiko.carstens@de.ibm.com
    Link: http://lkml.kernel.org/r/20151008145131.GA18139@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

03 Aug, 2015

5 commits

  • cpu_stop_park() does cpu_stop_signal_done() but leaves the work on
    stopper->works. The owner of this work can free/reuse this memory
    right after that and corrupt the list, so if this CPU becomes online
    again cpu_stopper_thread() will crash.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: der.herr@hofr.at
    Cc: paulmck@linux.vnet.ibm.com
    Cc: riel@redhat.com
    Cc: viro@ZenIV.linux.org.uk
    Link: http://lkml.kernel.org/r/20150630012958.GA23944@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Cosmetic, but 'cpu_stop_fn_t' actually makes the code more readable and
    it doesn't break cscope. And most of the declarations already use it.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: der.herr@hofr.at
    Cc: paulmck@linux.vnet.ibm.com
    Cc: riel@redhat.com
    Cc: viro@ZenIV.linux.org.uk
    Link: http://lkml.kernel.org/r/20150630012955.GA23937@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
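
    For reference, the typedef in question is the kernel's
    `typedef int (*cpu_stop_fn_t)(void *arg);`. The small harness below is an
    illustrative userspace sketch of the callback shape (demo_stop_fn and
    run_stop_fn are made-up names standing in for the stopper thread):

```c
/* The typedef this commit standardizes on: */
typedef int (*cpu_stop_fn_t)(void *arg);

/* Illustrative callback: a stop callback receives an opaque argument and
 * returns an int that the waiter gets back (e.g. from stop_one_cpu()). */
static int demo_stop_fn(void *arg)
{
    int *counter = arg;

    return ++(*counter);
}

/* Illustrative dispatcher standing in for the stopper thread. */
static int run_stop_fn(cpu_stop_fn_t fn, void *arg)
{
    return fn(arg);
}
```

    Declarations such as `stop_one_cpu(cpu, fn, arg)` read the same before and
    after; only the spelled-out `int (*fn)(void *)` parameter becomes
    `cpu_stop_fn_t fn`.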
     
  • The only caller outside of stop_machine.c is _cpu_down(); it can use
    stop_machine(). get_online_cpus() is fine under cpu_hotplug_begin().

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: der.herr@hofr.at
    Cc: paulmck@linux.vnet.ibm.com
    Cc: riel@redhat.com
    Cc: viro@ZenIV.linux.org.uk
    Link: http://lkml.kernel.org/r/20150630012951.GA23934@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • queue_stop_cpus_work() can do everything in one for_each_cpu() loop.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: der.herr@hofr.at
    Cc: paulmck@linux.vnet.ibm.com
    Cc: riel@redhat.com
    Cc: viro@ZenIV.linux.org.uk
    Link: http://lkml.kernel.org/r/20150630012948.GA23927@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Multiple DEFINE_PER_CPU's do not make sense; move all the per-cpu
    variables into 'struct cpu_stopper'.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: der.herr@hofr.at
    Cc: paulmck@linux.vnet.ibm.com
    Cc: riel@redhat.com
    Cc: viro@ZenIV.linux.org.uk
    Link: http://lkml.kernel.org/r/20150630012944.GA23924@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
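
    A minimal sketch of the consolidation. Field names follow the commit's
    description, but this is an illustrative userspace model, not the kernel's
    actual definition (which also holds a spinlock and the thread pointer):

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_SIM_CPUS 4

/* Before: several independent DEFINE_PER_CPU variables (enabled flag,
 * work list, task pointer, ...). After: one per-CPU struct that owns all
 * stopper state. Per-CPU storage is simulated with a plain array. */
struct sim_stop_work {
    struct sim_stop_work *next;
};

struct cpu_stopper {
    bool enabled;                /* may new work be queued here? */
    struct sim_stop_work *works; /* pending work items */
    /* spinlock_t lock; struct task_struct *thread; ... in the kernel */
};

static struct cpu_stopper sim_stopper[NR_SIM_CPUS];

/* Stand-in for per_cpu_ptr(&cpu_stopper, cpu). */
static struct cpu_stopper *sim_stopper_for(int cpu)
{
    return &sim_stopper[cpu];
}
```

    Grouping the state this way means every helper takes one pointer instead
    of re-deriving several per-cpu variables for the same CPU.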
     

19 Jun, 2015

1 commit

  • Jiri reported a machine stuck in multi_cpu_stop() with
    migrate_swap_stop() as function and with the following src,dst cpu
    pairs: {11, 4} {13, 11} { 4, 13}

                            4       11      13

    cpuM: queue(4, 13)
                            *Ma

    cpuN: queue(13, 11)
                                    *N      Na
                            *M              Mb

    cpuO: queue(11, 4)
                            *O      Oa
                                    *Nb
                            *Ob

    Where *X denotes the cpu running the queueing of cpu-X and X[ab] denotes
    the first/second queued work.

    You'll observe the top of the workqueue for each cpu (4, 11, 13) to be
    work from cpus M, O, N respectively. IOW, deadlock.

    Do away with the queueing trickery and introduce lg_double_lock() to
    lock both CPUs and fully serialize the stop_two_cpus() callers instead
    of the partial (and buggy) serialization we have now.

    Reported-by: Jiri Olsa
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20150605153023.GH19282@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

05 Jun, 2014

1 commit


11 Mar, 2014

1 commit

  • We must use smp_call_function_single(.wait=1) for the
    irq_cpu_stop_queue_work() to ensure the queueing is actually done under
    stop_cpus_lock. Without this we could have dropped the lock by the time
    we do the queueing and get the race we tried to fix.

    Fixes: 7053ea1a34fa ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()")

    Signed-off-by: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Christoph Hellwig
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20140228123905.GK3104@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

11 Nov, 2013

1 commit

  • There is a race between stop_two_cpus, and the global stop_cpus.

    It is possible for two CPUs to get their stopper functions queued
    "backwards" from one another, resulting in the stopper threads
    getting stuck, and the system hanging. This can happen because
    queuing up stoppers is not synchronized.

    This patch adds synchronization between stop_cpus (a rare operation),
    and stop_two_cpus.

    Reported-and-Tested-by: Prarit Bhargava
    Signed-off-by: Rik van Riel
    Signed-off-by: Peter Zijlstra
    Acked-by: Mel Gorman
    Link: http://lkml.kernel.org/r/20131101104146.03d1e043@annuminas.surriel.com
    Signed-off-by: Ingo Molnar

    Rik van Riel
     

16 Oct, 2013

1 commit

  • Remove get_online_cpus() usage from the scheduler; there are 4 sites that
    use it:

    - sched_init_smp(); where it's completely superfluous since we're in
    'early' boot and there simply cannot be any hotplugging.

    - sched_getaffinity(); we already take a raw spinlock to protect the
    task cpus_allowed mask, this disables preemption and therefore
    also stabilizes cpu_online_mask as that's modified using
    stop_machine. However switch to active mask for symmetry with
    sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active
    mask stability by inserting sync_rcu/sched() into _cpu_down.

    - sched_setaffinity(); we don't appear to need get_online_cpus()
    either; there are two sites where hotplug appears relevant:
    * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
    for the cpuset case we hold task_lock, which is a spinlock and
    thus for mainline disables preemption (might cause pain on RT).
    * set_cpus_allowed_ptr(); Holds all scheduler locks and thus has
    preemption properly disabled; also it already deals with hotplug
    races explicitly where it releases them.

    - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
    us with a little trickery. By adding a sync_sched/rcu() after the
    CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for
    cpu_active_mask. Use these to validate that both our cpus are active
    when queueing the stop work before we queue the stop_machine works
    for take_cpu_down().

    Signed-off-by: Peter Zijlstra
    Cc: "Srivatsa S. Bhat"
    Cc: Paul McKenney
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Steven Rostedt
    Cc: Oleg Nesterov
    Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

09 Oct, 2013

1 commit

  • Introduce stop_two_cpus() in order to allow controlled swapping of two
    tasks. It repurposes the stop_machine() state machine but only stops
    the two cpus, which we can do with on-stack structures, avoiding
    machine-wide synchronization issues.

    The ordering of CPUs is important to avoid deadlocks. If unordered then
    two cpus calling stop_two_cpus on each other simultaneously would attempt
    to queue in the opposite order on each CPU causing an AB-BA style deadlock.
    By always having the lowest number CPU doing the queueing of works, we can
    guarantee that works are always queued in the same order, and deadlocks
    are avoided.

    Signed-off-by: Peter Zijlstra
    [ Implemented deadlock avoidance. ]
    Signed-off-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Srikar Dronamraju
    Signed-off-by: Mel Gorman
    Link: http://lkml.kernel.org/r/1381141781-10992-38-git-send-email-mgorman@suse.de
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
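
    The ordering rule above can be illustrated with a small userspace
    simulation that records the order in which the per-CPU queues would be
    touched. Everything here (sim_take, sim_queue_pair) is an illustrative
    stand-in, not the kernel's queueing code:

```c
/* Record the order in which per-CPU queues are claimed. */
#define SIM_MAX_ACQ 16

static int sim_acq[SIM_MAX_ACQ];
static int sim_nacq;

static void sim_take(int cpu)
{
    sim_acq[sim_nacq++] = cpu;
}

/* Queue the pair of works always starting with the lower-numbered CPU.
 * Two concurrent callers stop_two_cpus(A, B) and stop_two_cpus(B, A)
 * then contend in the same global order, so the AB-BA interleaving that
 * causes the deadlock cannot arise. */
static void sim_queue_pair(int cpu1, int cpu2)
{
    int lo = cpu1 < cpu2 ? cpu1 : cpu2;
    int hi = cpu1 < cpu2 ? cpu2 : cpu1;

    sim_take(lo);
    sim_take(hi);
}
```

    Regardless of argument order, the recorded sequence for a given pair is
    identical, which is the invariant the commit relies on.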
     

27 Feb, 2013

1 commit

  • commit 14e568e78 (stop_machine: Use smpboot threads) introduced the
    following regression:

    Before this commit the stopper enabled bit was set in the online
    notifier.

    CPU0                        CPU1
    cpu_up
                                cpu online
    hotplug_notifier(ONLINE)
      stopper(CPU1)->enabled = true;
    ...
    stop_machine()

    The conversion to smpboot threads moved the enablement to the wakeup
    path of the parked thread. The majority of users seem to have the
    following working order:

    CPU0                        CPU1
    cpu_up
                                cpu online
    unpark_threads()
      wakeup(stopper[CPU1])
                                ....
                                stopper thread runs
                                  stopper(CPU1)->enabled = true;
    stop_machine()

    But Konrad and Sander have observed:

    CPU0                        CPU1
    cpu_up
                                cpu online
    unpark_threads()
      wakeup(stopper[CPU1])
                                ....
    stop_machine()
                                stopper thread runs
                                  stopper(CPU1)->enabled = true;

    Now the stop machinery kicks CPU0 into the stop loop, where it gets
    stuck forever because the queue code saw stopper(CPU1)->enabled ==
    false, so CPU0 waits for CPU1 to enter stop_machine, but the CPU1
    stopper work got discarded due to enabled == false.

    Add a pre_unpark function to the smpboot thread descriptor and call it
    before waking the thread.

    This fixes the problem at hand, but the stop_machine code should be
    more robust. The stopper->enabled flag smells fishy at best.

    Thanks to Konrad for going through a loop of debug patches and
    providing the information to decode this issue.

    Reported-and-tested-by: Konrad Rzeszutek Wilk
    Reported-and-tested-by: Sander Eikelenboom
    Cc: Srivatsa S. Bhat
    Cc: Rusty Russell
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302261843240.22263@ionos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

14 Feb, 2013

2 commits

  • Use the smpboot thread infrastructure. Mark the stopper thread
    selfparking and park it after it has finished the take_cpu_down()
    work.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Paul McKenney
    Cc: Srivatsa S. Bhat
    Cc: Arjan van de Veen
    Cc: Paul Turner
    Cc: Richard Weinberger
    Cc: Magnus Damm
    Link: http://lkml.kernel.org/r/20130131120741.686315164@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • To allow the stopper thread to be managed by the smpboot thread
    infrastructure, separate out the task storage from the stopper data
    structure.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Paul McKenney
    Cc: Srivatsa S. Bhat
    Cc: Arjan van de Veen
    Cc: Paul Turner
    Cc: Richard Weinberger
    Cc: Magnus Damm
    Link: http://lkml.kernel.org/r/20130131120741.626690384@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of <linux/module.h>
    net: inet_timewait_sock doesnt need <linux/module.h>
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

01 Nov, 2011

1 commit

  • Make stop_machine() safe to call early in boot, before SMP has been set
    up, by simply calling the callback function directly if there's only one
    CPU online.

    [ Fixes from AKPM:
    - add comment
    - local_irq_flags, not save_flags
    - also call hard_irq_disable() for systems which need it

    Tejun suggested using an explicit flag rather than just looking at
    the online cpu count. ]

    Cc: Tejun Heo
    Acked-by: Rusty Russell
    Cc: Peter Zijlstra
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Acked-by: Tejun Heo
    Cc: Konrad Rzeszutek Wilk
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeremy Fitzhardinge
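
    The early-boot fast path amounts to: if SMP is not up yet, skip the
    stopper machinery, disable interrupts, and call the function directly. A
    hedged userspace sketch of that control flow; the sim_* flag and helpers
    are stand-ins for the kernel's explicit "stop machine initialized" flag
    and local_irq_save()/hard_irq_disable():

```c
#include <stdbool.h>

static bool sim_smp_ready;      /* false until "SMP" is set up */
static bool sim_irqs_disabled;

/* Demo callback: records that it ran. */
static int sim_touch(void *data)
{
    *(int *)data = 1;
    return 0;
}

static int sim_stop_machine(int (*fn)(void *), void *data)
{
    if (!sim_smp_ready) {
        int ret;

        /* Single CPU online: just shut off "interrupts" and run fn
         * right here, no cross-CPU synchronization needed. */
        sim_irqs_disabled = true;
        ret = fn(data);
        sim_irqs_disabled = false;
        return ret;
    }
    /* The normal multi-CPU path would queue stopper works here;
     * it is not modelled in this sketch. */
    return -1;
}
```

    Using an explicit flag (rather than checking the online-CPU count, as the
    log notes Tejun suggested) makes the fast path unambiguous during the
    window where CPUs are coming up.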
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

28 Jun, 2011

4 commits

  • The MTRR rendezvous sequence was not implemented using stop_machine()
    before, as it gets called both from process context as well as from the
    cpu online paths (where the cpu has not yet come online and interrupts
    are disabled, etc.).

    Now that we have a new stop_machine_from_inactive_cpu() API, use it for
    rendezvous during mtrr init of a logical processor that is coming online.

    For the rest (runtime MTRR modification, system boot, resume paths), use
    stop_machine() to implement the rendezvous sequence. This will consolidate
    and clean up the code.

    Signed-off-by: Suresh Siddha
    Link: http://lkml.kernel.org/r/20110623182057.076997177@sbsiddha-MOBL3.sc.intel.com
    Signed-off-by: H. Peter Anvin

    Suresh Siddha
     
  • Currently, mtrr wants stop_machine functionality while a CPU is being
    brought up. As stop_machine() requires the calling CPU to be active,
    mtrr implements its own stop_machine using stop_one_cpu() on each
    online CPU. This not only unnecessarily duplicates complex logic
    but also introduces the possibility of deadlock when it races against
    the generic stop_machine().

    This patch implements stop_machine_from_inactive_cpu() to serve such
    use cases. Its functionality is basically the same as stop_machine();
    however, it should be called from a CPU which isn't active and doesn't
    depend on working scheduling on the calling CPU.

    This is achieved by using busy loops for synchronization and by
    open-coding the stop_cpus queueing and waiting, with direct invocation
    of fn() for the local CPU in between.

    Signed-off-by: Tejun Heo
    Link: http://lkml.kernel.org/r/20110623182056.982526827@sbsiddha-MOBL3.sc.intel.com
    Signed-off-by: Suresh Siddha
    Cc: Ingo Molnar
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Signed-off-by: H. Peter Anvin

    Tejun Heo
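
    The busy-wait synchronization described above can be sketched with C11
    atomics. This is a single-threaded illustrative model (the kernel spins
    with cpu_relax() across real CPUs); busy_done and its helpers are
    made-up names:

```c
#include <stdatomic.h>

/* A CPU that isn't active yet cannot sleep on a completion (there is no
 * working scheduler for it), so stop_machine_from_inactive_cpu() must
 * synchronize by spinning on shared state instead. */
struct busy_done {
    atomic_int nr_todo;   /* participants that still have to finish fn() */
};

static void busy_done_init(struct busy_done *d, int nr_cpus)
{
    atomic_init(&d->nr_todo, nr_cpus);
}

/* Called by each participant once its fn() invocation is finished. */
static void busy_done_complete(struct busy_done *d)
{
    atomic_fetch_sub(&d->nr_todo, 1);
}

/* Busy-wait until every participant has completed. */
static void busy_wait_for_completion(struct busy_done *d)
{
    while (atomic_load(&d->nr_todo) > 0)
        ;   /* cpu_relax() in the kernel */
}
```

    The trade-off is exactly the one the commit states: busy-waiting is safe
    without a scheduler, at the cost of burning cycles while waiting.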
     
  • Refactor the queuing part of the stop cpus work from __stop_cpus() into
    queue_stop_cpus_work().

    The reorganization is to help future improvements to stop_machine()
    and doesn't introduce any behavior difference.

    Signed-off-by: Tejun Heo
    Link: http://lkml.kernel.org/r/20110623182056.897818337@sbsiddha-MOBL3.sc.intel.com
    Signed-off-by: Suresh Siddha
    Cc: Ingo Molnar
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Signed-off-by: H. Peter Anvin

    Tejun Heo
     
  • An MTRR rendezvous sequence using stop_one_cpu_nowait() can potentially
    happen in parallel with another system-wide rendezvous using
    stop_machine(). This can lead to deadlock: the order in which the
    works are queued can be different on different cpus, so some cpus
    will be running the first rendezvous handler and others will be running
    the second rendezvous handler, each set waiting for the other set to join
    the system-wide rendezvous.

    The MTRR rendezvous sequence is not implemented using stop_machine() as
    it gets called both from process context as well as from the cpu online
    paths (where the cpu has not yet come online and interrupts are disabled,
    etc.). stop_machine() works with only online cpus.

    For now, take the stop_machine mutex in the MTRR rendezvous sequence that
    gets called from an online cpu (here we are in process context
    and can potentially sleep while taking the mutex). The MTRR rendezvous
    that gets triggered during cpu online doesn't need to take this
    stop_machine lock, as stop_machine() already ensures that there is no cpu
    hotplug going on in parallel, by doing get_online_cpus().

    TBD: Pursue a cleaner solution of extending the stop_machine()
    infrastructure to handle the case where the calling cpu is
    still not online and use this for MTRR rendezvous sequence.

    fixes: https://bugzilla.novell.com/show_bug.cgi?id=672008

    Reported-by: Vadim Kotelnikov
    Signed-off-by: Suresh Siddha
    Link: http://lkml.kernel.org/r/20110623182056.807230326@sbsiddha-MOBL3.sc.intel.com
    Cc: stable@kernel.org # 2.6.35+, backport a week or two after this gets more testing in mainline
    Signed-off-by: H. Peter Anvin

    Suresh Siddha