15 Apr, 2015

1 commit

  • Pull RCU changes from Ingo Molnar:
    "The main changes in this cycle were:

    - changes permitting use of call_rcu() and friends very early in
    boot, for example, before rcu_init() is invoked.

    - add in-kernel API to enable and disable expediting of normal RCU
    grace periods.

    - improve RCU's handling of (hotplug-) outgoing CPUs.

    - NO_HZ_FULL_SYSIDLE fixes.

    - tiny-RCU updates to make it more tiny.

    - documentation updates.

    - miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
    cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well
    cpu: Defer smpboot kthread unparking until CPU known to scheduler
    rcu: Associate quiescent-state reports with grace period
    rcu: Yet another fix for preemption and CPU hotplug
    rcu: Add diagnostics to grace-period cleanup
    rcutorture: Default to grace-period-initialization delays
    rcu: Handle outgoing CPUs on exit from idle loop
    cpu: Make CPU-offline idle-loop transition point more precise
    rcu: Eliminate ->onoff_mutex from rcu_node structure
    rcu: Process offlining and onlining only at grace-period start
    rcu: Move rcu_report_unblock_qs_rnp() to common code
    rcu: Rework preemptible expedited bitmask handling
    rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs
    rcutorture: Enable slow grace-period initializations
    rcu: Provide diagnostic option to slow down grace-period initialization
    rcu: Detect stalls caused by failure to propagate up rcu_node tree
    rcu: Eliminate empty HOTPLUG_CPU ifdef
    rcu: Simplify sync_rcu_preempt_exp_init()
    rcu: Put all orphan-callback-related code under same comment
    rcu: Consolidate offline-CPU callback initialization
    ...

    Linus Torvalds
     

13 Apr, 2015

1 commit

  • Currently, smpboot_unpark_threads() is invoked before the incoming CPU
    has been added to the scheduler's runqueue structures. This can cause
    the unparked kthread to run on the wrong CPU, since the correct CPU
    isn't fully set up yet.

    That causes a sporadic, hard-to-debug boot crash triggering on some
    systems, reported by Borislav Petkov, and bisected down to:

    2a442c9c6453 ("x86: Use common outgoing-CPU-notification code")

    This patch places smpboot_unpark_threads() in a CPU hotplug
    notifier with priority set so that these kthreads are unparked just after
    the CPU has been added to the runqueues.

    Reported-and-tested-by: Borislav Petkov
    Signed-off-by: Paul E. McKenney
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
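
    A minimal sketch of the general pattern (added for illustration, not part
    of the commit): a CPU-hotplug notifier whose .priority field controls when
    it runs relative to other CPU_ONLINE callbacks, using the notifier API of
    that era. The example_* names are hypothetical.

        #include <linux/cpu.h>
        #include <linux/init.h>
        #include <linux/notifier.h>
        #include <linux/printk.h>

        /* Hypothetical callback: runs once the new CPU is known to the scheduler. */
        static int example_cpu_callback(struct notifier_block *nb,
                                        unsigned long action, void *hcpu)
        {
                unsigned int cpu = (unsigned long)hcpu;

                switch (action & ~CPU_TASKS_FROZEN) {
                case CPU_ONLINE:
                        pr_info("cpu%u online, unparking per-CPU helpers\n", cpu);
                        break;
                }
                return NOTIFY_OK;
        }

        static struct notifier_block example_cpu_notifier = {
                .notifier_call = example_cpu_callback,
                .priority      = 10,    /* higher value = called earlier on online */
        };

        static int __init example_init(void)
        {
                register_cpu_notifier(&example_cpu_notifier);
                return 0;
        }
        early_initcall(example_init);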
     

03 Apr, 2015

2 commits

  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off to have explicit
    calls instead of this monstrosity.

    Split out the cleanup function for a dead cpu and invoke it
    directly from the cpu down code. Make it conditional on
    CPU_HOTPLUG as well.

    Temporary change, will be refined in the future.

    Signed-off-by: Thomas Gleixner
    [ Rebased, added clockevents_notify() removal ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1735025.raBZdQHM3m@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
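
    The shape of the change, sketched (not the literal diff; the helper name
    follows the tick code introduced around this series and should be treated
    as illustrative):

        #include <linux/clockchips.h>
        #include <linux/tick.h>

        static void example_cpu_dead_cleanup(int cpu)
        {
                /*
                 * Before: funneled through the multiplex call:
                 *
                 *     clockevents_notify(CLOCK_EVT_NOTIFY_CPU_DEAD, &cpu);
                 *
                 * After: one explicit call from the cpu-down code, built in
                 * only when CPU hotplug is configured.
                 */
        #ifdef CONFIG_HOTPLUG_CPU
                tick_cleanup_dead_cpu(cpu);
        #endif
        }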
     
  • clockevents_notify() is a leftover from the early design of the
    clockevents facility. It's really not a notification mechanism,
    it's a multiplex call. We are way better off to have explicit
    calls instead of this monstrosity.

    Split out the tick_handover call and invoke it explicitly from
    the hotplug code. This is a temporary solution that will be cleaned
    up in later patches.

    Signed-off-by: Thomas Gleixner
    [ Rebase ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: John Stultz
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1658173.RkEEILFiQZ@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

02 Apr, 2015

1 commit

  • It was found, when doing a hotplug stress test on POWER, that the
    machine either hit softlockups or rcu_sched stall warnings. The
    issue was traced to commit:

    7cba160ad789 ("powernv/cpuidle: Redesign idle states management")

    which exposed the cpu_down() race with hrtimer based broadcast mode:

    5d1638acb9f6 ("tick: Introduce hrtimer based broadcast")

    The race is the following:

    Assume CPU1 is the CPU which holds the hrtimer broadcasting duty
    before it is taken down.

    CPU0                                    CPU1

    cpu_down()                              take_cpu_down()
                                            disable_interrupts()

    cpu_die()

     while (CPU1 != CPU_DEAD) {
       msleep(100);
         switch_to_idle();
           stop_cpu_timer();
             schedule_broadcast();
     }

    tick_cleanup_cpu_dead()
       take_over_broadcast()

    After CPU1 has disabled interrupts it cannot handle the broadcast
    hrtimer anymore, so CPU0 will be stuck forever.

    Fix this by explicitly taking over broadcast duty before cpu_die().

    This is a temporary workaround. What we really want is a callback
    in the clockevent device which allows us to do that from the dying
    CPU by pushing the hrtimer onto a different cpu. That might involve
    an IPI and is definitely more complex than this immediate fix.

    Changelog was picked up from:

    https://lkml.org/lkml/2015/2/16/213

    Suggested-by: Thomas Gleixner
    Tested-by: Nicolas Pitre
    Signed-off-by: Preeti U. Murthy
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: mpe@ellerman.id.au
    Cc: nicolas.pitre@linaro.org
    Cc: peterz@infradead.org
    Cc: rjw@rjwysocki.net
    Fixes: http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html
    Link: http://lkml.kernel.org/r/20150330092410.24979.59887.stgit@preeti.in.ibm.com
    [ Merged it to the latest timer tree, renamed the callback, tidied up the changelog. ]
    Signed-off-by: Ingo Molnar

    Preeti U Murthy
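
    A rough sketch of the idea, with a hypothetical helper name (the commit's
    actual function differs): before the CPU running cpu_down() starts waiting
    for the victim in cpu_die(), it explicitly pulls the hrtimer-broadcast duty
    away from the dying CPU, which can no longer take interrupts.

        /*
         * Hypothetical helper: assumed to move the broadcast hrtimer duty
         * from 'dying_cpu' to some other online CPU.
         */
        void example_tick_broadcast_takeover(unsigned int dying_cpu);

        /* Called on the CPU running cpu_down(), before it waits in cpu_die(). */
        static void example_prepare_cpu_die(unsigned int dying_cpu)
        {
                /*
                 * The dying CPU has already disabled interrupts and cannot
                 * fire the broadcast hrtimer anymore; take the duty over here
                 * so the msleep() loop waiting for CPU_DEAD can still be woken.
                 */
                example_tick_broadcast_takeover(dying_cpu);
        }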
     

13 Mar, 2015

1 commit

  • This commit uses a per-CPU variable to make the CPU-offline code path
    through the idle loop more precise, so that the outgoing CPU is
    guaranteed to make it into the idle loop before it is powered off.
    This commit is in preparation for putting the RCU offline-handling
    code on this code path, which will eliminate the magic one-jiffy
    wait that RCU uses as the maximum time for an outgoing CPU to get
    all the way through the scheduler.

    The magic one-jiffy wait for incoming CPUs remains a separate issue.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
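
    A minimal sketch of the mechanism, with hypothetical names (added for
    illustration; a real implementation also needs memory barriers and arch
    hooks): the outgoing CPU sets a per-CPU flag once it reaches the idle
    loop, and the CPU tearing it down waits on that flag instead of a
    one-jiffy guess.

        #include <linux/percpu.h>
        #include <linux/sched.h>

        /* Hypothetical per-CPU marker: set when the offline-bound CPU hits idle. */
        static DEFINE_PER_CPU(bool, example_cpu_in_idle);

        /* Called by the outgoing CPU from the idle loop. */
        void example_report_idle_entry(void)
        {
                this_cpu_write(example_cpu_in_idle, true);
        }

        /* Called by the CPU performing the teardown. */
        void example_wait_for_idle_entry(unsigned int cpu)
        {
                while (!per_cpu(example_cpu_in_idle, cpu))
                        cpu_relax();
        }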
     

07 Jan, 2015

1 commit

  • Commit b2c4623dcd07 ("rcu: More on deadlock between CPU hotplug and expedited
    grace periods") introduced another problem that can easily be reproduced by
    starting/stopping cpus in a loop.

    E.g.:
    for i in `seq 5000`; do
        echo 1 > /sys/devices/system/cpu/cpu1/online
        echo 0 > /sys/devices/system/cpu/cpu1/online
    done

    Will result in:
    INFO: task /cpu_start_stop:1 blocked for more than 120 seconds.
    Call Trace:
    ([] __schedule+0x406/0x91c)
    [] cpu_hotplug_begin+0xd0/0xd4
    [] _cpu_up+0x3e/0x1c4
    [] cpu_up+0xb6/0xd4
    [] device_online+0x80/0xc0
    [] online_store+0x90/0xb0
    ...

    And a deadlock.

    The problem is that if the last reference in put_online_cpus() can't get
    the cpu_hotplug.lock, the puts_pending count is incremented, but a sleeping
    active_writer might never be woken up and will therefore never exit the
    loop in cpu_hotplug_begin().

    This fix removes puts_pending and turns refcount into an atomic variable. We
    also introduce a wait queue for the active_writer, to avoid possible races and
    use-after-free. There is no need to take the lock in put_online_cpus() anymore.

    Can't reproduce it with this fix.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Paul E. McKenney

    David Hildenbrand
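
    A condensed sketch of the scheme, with simplified names (illustrative only;
    real code must also block new readers while a writer is active): an atomic
    reader count plus a waitqueue, so the last reader reliably wakes a writer
    without having to take cpu_hotplug.lock.

        #include <linux/atomic.h>
        #include <linux/sched.h>
        #include <linux/wait.h>

        static atomic_t example_refcount = ATOMIC_INIT(0);
        static DECLARE_WAIT_QUEUE_HEAD(example_writer_wq);

        void example_get_online_cpus(void)
        {
                atomic_inc(&example_refcount);          /* reader enters */
        }

        void example_put_online_cpus(void)
        {
                /* Last reader out wakes any writer sleeping in begin(). */
                if (atomic_dec_and_test(&example_refcount))
                        wake_up(&example_writer_wq);
        }

        void example_cpu_hotplug_begin(void)
        {
                /* The writer sleeps until all readers are gone, instead of
                 * looping on a lock it may never be handed. */
                wait_event(example_writer_wq, atomic_read(&example_refcount) == 0);
        }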
     

04 Nov, 2014

1 commit

  • A long string of get_online_cpus() with each followed by a
    put_online_cpu() that fails to acquire cpu_hotplug.lock can result in
    overflow of the cpu_hotplug.puts_pending counter. Although this is
    perhaps improbable, a system with absolutely no CPU-hotplug operations
    will have an arbitrarily long time in which this overflow could occur.
    This commit therefore adds overflow checks to get_online_cpus() and
    try_get_online_cpus().

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Pranith Kumar

    Paul E. McKenney
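
    A small sketch of the kind of guard described (field names simplified, not
    the actual kernel/cpu.c layout): fold puts_pending back into refcount
    before it can grow without bound, and warn if the counters ever look
    inconsistent.

        #include <linux/bug.h>

        struct example_hotplug_state {
                int refcount;
                int puts_pending;
        };

        static void example_apply_puts_pending(struct example_hotplug_state *st)
        {
                if (st->puts_pending) {
                        st->refcount -= st->puts_pending;
                        st->puts_pending = 0;
                }
                /* An overflow or unmatched put would show up as a negative count. */
                WARN_ON(st->refcount < 0);
        }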
     

23 Oct, 2014

1 commit

  • Commit dd56af42bd82 (rcu: Eliminate deadlock between CPU hotplug and
    expedited grace periods) was incomplete. Although it did eliminate
    deadlocks involving synchronize_sched_expedited()'s acquisition of
    cpu_hotplug.lock via get_online_cpus(), it did nothing about the similar
    deadlock involving acquisition of this same lock via put_online_cpus().
    This deadlock became apparent with testing involving hibernation.

    This commit therefore changes put_online_cpus() acquisition of this lock
    to be conditional, and increments a new cpu_hotplug.puts_pending field
    in case of acquisition failure. Then cpu_hotplug_begin() checks for this
    new field being non-zero, and applies any changes to cpu_hotplug.refcount.

    Reported-by: Jiri Kosina
    Signed-off-by: Paul E. McKenney
    Tested-by: Jiri Kosina
    Tested-by: Borislav Petkov

    Paul E. McKenney
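
    The deferral pattern described above, as a brief sketch with simplified
    names (illustrative, not the kernel/cpu.c source): if the lock cannot be
    taken, record the put and let the hotplug writer reconcile it later.

        #include <linux/atomic.h>
        #include <linux/mutex.h>

        static DEFINE_MUTEX(example_hotplug_lock);
        static int example_refcount;
        static atomic_t example_puts_pending = ATOMIC_INIT(0);

        void example_put_online_cpus(void)
        {
                if (mutex_trylock(&example_hotplug_lock)) {
                        example_refcount--;                     /* normal path */
                        mutex_unlock(&example_hotplug_lock);
                } else {
                        /* A writer holds the lock: defer the decrement. */
                        atomic_inc(&example_puts_pending);
                }
        }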
     

19 Sep, 2014

1 commit

  • Currently, the expedited grace-period primitives do get_online_cpus().
    This greatly simplifies their implementation, but means that calls
    to them holding locks that are acquired by CPU-hotplug notifiers (to
    say nothing of calls to these primitives from CPU-hotplug notifiers)
    can deadlock. But this is starting to become inconvenient, as can be
    seen here: https://lkml.org/lkml/2014/8/5/754. The problem in this
    case is that some developers need to acquire a mutex from a CPU-hotplug
    notifier, but also need to hold it across a synchronize_rcu_expedited().
    As noted above, this currently results in deadlock.

    This commit avoids the deadlock and retains the simplicity by creating
    a try_get_online_cpus(), which returns false if the get_online_cpus()
    reference count could not immediately be incremented. If a call to
    try_get_online_cpus() returns true, the expedited primitives operate as
    before. If a call returns false, the expedited primitives fall back to
    normal grace-period operations. This falling back of course results in
    increased grace-period latency, but only during times when CPU hotplug
    operations are actually in flight. The effect should therefore be
    negligible during normal operation.

    Signed-off-by: Paul E. McKenney
    Cc: Josh Triplett
    Cc: "Rafael J. Wysocki"
    Tested-by: Lan Tianyu

    Paul E. McKenney
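
    A sketch of the fallback shape (in the actual change the check lives inside
    the expedited primitives themselves; this caller-side version just
    illustrates the idea using the RCU APIs of that era):

        #include <linux/cpu.h>
        #include <linux/rcupdate.h>

        static void example_sync_sched(void)
        {
                if (try_get_online_cpus()) {
                        synchronize_sched_expedited();  /* fast path */
                        put_online_cpus();
                } else {
                        /* CPU hotplug in flight: fall back to a normal GP. */
                        synchronize_sched();
                }
        }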
     

05 Jul, 2014

1 commit

  • 1) Iterate through all threads in the system.
    Check all threads, not only group leaders.

    2) Check p->on_rq instead of p->state and cputime.
    A preempted task in a !TASK_RUNNING state or a
    just-created task may be queued; we want those to be
    reported too.

    3) Use read_lock() instead of write_lock().
    This function does not change any structures, and
    read_lock() is enough.

    Signed-off-by: Kirill Tkhai
    Reviewed-by: Srikar Dronamraju
    Cc: Andrew Morton
    Cc: Ben Segall
    Cc: Fabian Frederick
    Cc: Gautham R. Shenoy
    Cc: Konstantin Khorenko
    Cc: Linus Torvalds
    Cc: Michael wang
    Cc: Mike Galbraith
    Cc: Paul Gortmaker
    Cc: Paul Turner
    Cc: Rafael J. Wysocki
    Cc: Srivatsa S. Bhat
    Cc: Todd E Brandt
    Cc: Toshi Kani
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1403684395.3462.44.camel@tkhai
    Signed-off-by: Ingo Molnar

    Kirill Tkhai
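
    A sketch of the resulting loop shape (the reporting helper is hypothetical;
    the walk itself uses the standard tasklist APIs):

        #include <linux/sched.h>
        #include <linux/spinlock.h>

        static void example_report_queued_task(struct task_struct *p)
        {
                /* hypothetical: warn about a task still queued on a dead CPU */
        }

        static void example_scan_queued_tasks(void)
        {
                struct task_struct *g, *p;

                /* Read-only walk, so read_lock() is enough. */
                read_lock(&tasklist_lock);
                for_each_process_thread(g, p) {         /* all threads, not only leaders */
                        if (p->on_rq)                   /* queued, whatever p->state says */
                                example_report_queued_task(p);
                }
                read_unlock(&tasklist_lock);
        }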
     

13 Jun, 2014

1 commit

  • Pull more ACPI and power management updates from Rafael Wysocki:
    "These are fixups on top of the previous PM+ACPI pull request,
    regression fixes (ACPI hotplug, cpufreq ppc-corenet), other bug fixes
    (ACPI reset, cpufreq), new PM trace points for system suspend
    profiling and a copyright notice update.

    Specifics:

    - I didn't remember correctly that Hans de Goede's ACPI video
    patches actually didn't flip the video.use_native_backlight
    default, although we had discussed that and decided to do that.
    Since I said we would do that in the previous PM+ACPI pull request,
    make that change for real now.

    - ACPI bus check notifications for PCI host bridges don't cause the
    bus below the host bridge to be checked for changes as they should
    because of a mistake in the ACPI-based PCI hotplug (ACPIPHP)
    subsystem that forgets to add hotplug contexts to PCI host bridge
    ACPI device objects. Create hotplug contexts for PCI host bridges
    too as appropriate.

    - Revert recent cpufreq commit related to the big.LITTLE cpufreq
    driver that breaks arm64 builds.

    - Fix for a regression in the ppc-corenet cpufreq driver introduced
    during the 3.15 cycle and causing the driver to use the remainder
    from do_div instead of the quotient. From Ed Swarthout.

    - Resets triggered by panic activate a BUG_ON() in vmalloc.c on
    systems where the ACPI reset register is located in memory address
    space. Fix from Randy Wright.

    - Fix for a problem with cpufreq governors where the decisions they
    make may be suboptimal because they use deferrable timers for CPU
    load sampling. From Srivatsa S Bhat.

    - Fix for a problem with the Tegra cpufreq driver where the CPU
    frequency is temporarily switched to a "stable" level that is
    different from both the initial and target frequencies during
    transitions, which sometimes causes udelay() to expire earlier than
    it should. From Viresh Kumar.

    - New trace points and rework of some existing trace points for
    system suspend/resume profiling from Todd Brandt.

    - Assorted cpufreq fixes and cleanups from Stratos Karafotis and
    Viresh Kumar.

    - Copyright notice update for suspend-and-cpuhotplug.txt from
    Srivatsa S Bhat"

    * tag 'pm+acpi-3.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    ACPI / hotplug / PCI: Add hotplug contexts to PCI host bridges
    PM / sleep: trace events for device PM callbacks
    cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATOR
    cpufreq: tegra: update comment for clarity
    cpufreq: intel_pstate: Remove duplicate CPU ID check
    cpufreq: Mark CPU0 driver with CPUFREQ_NEED_INITIAL_FREQ_CHECK flag
    PM / Documentation: Update copyright in suspend-and-cpuhotplug.txt
    cpufreq: governor: remove copy_prev_load from 'struct cpu_dbs_common_info'
    cpufreq: governor: Be friendly towards latency-sensitive bursty workloads
    PM / sleep: trace events for suspend/resume
    cpufreq: ppc-corenet-cpu-freq: do_div use quotient
    Revert "cpufreq: Enable big.LITTLE cpufreq driver on arm64"
    cpufreq: Tegra: implement intermediate frequency callbacks
    cpufreq: add support for intermediate (stable) frequencies
    ACPI / video: Change the default for video.use_native_backlight to 1
    ACPI: Fix bug when ACPI reset register is implemented in system memory

    Linus Torvalds
     

07 Jun, 2014

1 commit

  • Adds trace events that give finer resolution into suspend/resume. These
    events are graphed in the timelines generated by the analyze_suspend.py
    script. They represent large areas of time consumed that are typical to
    suspend and resume.

    The event is triggered by calling the function "trace_suspend_resume"
    with three arguments: a string (the name of the event to be displayed
    in the timeline), an integer (case specific number, such as the power
    state or cpu number), and a boolean (where true is used to denote the start
    of the timeline event, and false to denote the end).

    The suspend_resume trace event reproduces the data that the machine_suspend
    trace event did, so the latter has been removed.

    Signed-off-by: Todd Brandt
    Acked-by: Steven Rostedt
    Signed-off-by: Rafael J. Wysocki

    Todd E Brandt
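
    Usage shape of the trace point (sketch; in the kernel the string is wrapped
    with the TPS()/tracepoint_string() helper, a plain literal is used here for
    brevity):

        #include <linux/suspend.h>
        #include <trace/events/power.h>

        static void example_enter_state(suspend_state_t state)
        {
                /* name shown in the timeline, phase-specific value, start flag */
                trace_suspend_resume("machine_suspend", state, true);

                /* ... the actual suspend work for this phase runs here ... */

                trace_suspend_resume("machine_suspend", state, false);
        }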
     

05 Jun, 2014

1 commit

  • printk() with no level converted to pr_warn() (error case);
    printk() with no level converted to pr_info() (disabling non-boot CPUs);
    other printk() calls converted to their respective levels.

    Signed-off-by: Fabian Frederick
    Cc: "Rafael J. Wysocki"
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
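
    The kind of conversion described, in sketch form (the message texts are
    illustrative, not the exact kernel/cpu.c strings):

        #include <linux/printk.h>

        static void example_report(int cpu, int err)
        {
                /* Before: no level, so the message got the default log level:
                 *
                 *     printk("Error taking CPU%d down: %d\n", cpu, err);
                 */
                if (err)
                        pr_warn("Error taking CPU%d down: %d\n", cpu, err);

                pr_info("Disabling non-boot CPUs ...\n");
        }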
     

22 May, 2014

1 commit

  • Lai found that:

    WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
    ...
    migration_cpu_stop+0x1d/0x22

    was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
    always a sub-set of cpu_online_mask.

    This isn't true since 5fbd036b552f ("sched: Cleanup cpu_active madness").

    So set active and online at the same time to avoid this particular
    problem.

    Fixes: 5fbd036b552f ("sched: Cleanup cpu_active madness")
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Gautham R. Shenoy
    Cc: Linus Torvalds
    Cc: Michael wang
    Cc: Paul Gortmaker
    Cc: Rafael J. Wysocki
    Cc: Srivatsa S. Bhat
    Cc: Toshi Kani
    Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     

20 Mar, 2014

2 commits

  • The following method of CPU hotplug callback registration is not safe
    due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock
    and the cpu_hotplug.lock.

    get_online_cpus();

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    register_cpu_notifier(&foobar_cpu_notifier);

    put_online_cpus();

    The deadlock is shown below:

    CPU 0                                   CPU 1
    -----                                   -----

    Acquire cpu_hotplug.lock
    [via get_online_cpus()]

                                            CPU online/offline operation
                                            takes cpu_add_remove_lock
                                            [via cpu_maps_update_begin()]

    Try to acquire
    cpu_add_remove_lock
    [via register_cpu_notifier()]

                                            CPU online/offline operation
                                            tries to acquire cpu_hotplug.lock
                                            [via cpu_hotplug_begin()]

                        *** DEADLOCK! ***

    The problem here is that callback registration takes the locks in one order
    whereas the CPU hotplug operations take the same locks in the opposite order.
    To avoid this issue and to provide a race-free method to register CPU hotplug
    callbacks (along with initialization of already online CPUs), introduce new
    variants of the callback registration APIs that simply register the callbacks
    without holding the cpu_add_remove_lock during the registration. That way,
    we can avoid the ABBA scenario. However, we will need to hold the
    cpu_add_remove_lock throughout the entire critical section, to protect updates
    to the callback/notifier chain.

    This can be achieved by writing the callback registration code as follows:

    cpu_maps_update_begin(); [ or cpu_notifier_register_begin(); see below ]

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    /* This doesn't take the cpu_add_remove_lock */
    __register_cpu_notifier(&foobar_cpu_notifier);

    cpu_maps_update_done(); [ or cpu_notifier_register_done(); see below ]

    Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin()
    because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD
    notifiers, and hence get_online_cpus() cannot provide the necessary
    synchronization to protect the callback/notifier chains against concurrent
    reads and writes. On the other hand, since the cpu_add_remove_lock protects
    the entire hotplug operation (including CPU_POST_DEAD), we can use
    cpu_maps_update_begin/done() to guarantee proper synchronization.

    Also, since cpu_maps_update_begin/done() is like a super-set of
    get/put_online_cpus(), the former naturally protects the critical sections
    from concurrent hotplug operations.

    Since the names cpu_maps_update_begin/done() don't make much sense in CPU
    hotplug callback registration scenarios, we'll introduce new APIs named
    cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done().

    In summary, introduce the lockless variants of un/register_cpu_notifier() and
    also export the cpu_notifier_register_begin/done() APIs for use by modules.
    This way, we provide a race-free way to register hotplug callbacks as well as
    perform initialization for the CPUs that are already online.

    Cc: Thomas Gleixner
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Acked-by: Oleg Nesterov
    Acked-by: Toshi Kani
    Reviewed-by: Gautham R. Shenoy
    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
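
    Put together as a self-contained sketch of the race-free pattern (the
    foobar_* names follow the commit's own example; init_cpu() stands in for
    driver-specific per-CPU setup):

        #include <linux/cpu.h>
        #include <linux/init.h>
        #include <linux/notifier.h>

        static void init_cpu(unsigned int cpu)
        {
                /* driver-specific per-CPU setup goes here */
        }

        static int foobar_cpu_callback(struct notifier_block *nb,
                                       unsigned long action, void *hcpu)
        {
                if ((action & ~CPU_TASKS_FROZEN) == CPU_ONLINE)
                        init_cpu((unsigned long)hcpu);
                return NOTIFY_OK;
        }

        static struct notifier_block foobar_cpu_notifier = {
                .notifier_call = foobar_cpu_callback,
        };

        static int __init foobar_init(void)
        {
                unsigned int cpu;

                cpu_notifier_register_begin();

                for_each_online_cpu(cpu)
                        init_cpu(cpu);

                /* Registers without taking the cpu_add_remove_lock itself. */
                __register_cpu_notifier(&foobar_cpu_notifier);

                cpu_notifier_register_done();
                return 0;
        }
        device_initcall(foobar_init);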
     
  • Add lockdep annotations for get/put_online_cpus() and
    cpu_hotplug_begin()/cpu_hotplug_end().

    Cc: Ingo Molnar
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Gautham R. Shenoy
    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Gautham R. Shenoy
     

13 Nov, 2013

2 commits

  • Commit 6acce3ef8:

    sched: Remove get_online_cpus() usage

    tries to do sync_sched/rcu() inside _cpu_down() but triggers:

    INFO: task swapper/0:1 blocked for more than 120 seconds.
    ...
    [] synchronize_rcu+0x2c/0x30
    [] _cpu_down+0x2b2/0x340
    ...

    It was caused by the fact that, in the rcu boost case, we rely on the
    smpboot thread to finish the rcu callback, but that thread has already
    been parked before the sync point here, which leads to the endless
    sync_sched/rcu().

    This patch swaps the order of smpboot_park_threads() and
    sync_sched/rcu() to fix the bug.

    Reported-by: Fengguang Wu
    Tested-by: Fengguang Wu
    Signed-off-by: Michael Wang
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/5282EDC0.6060003@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Michael wang
     
  • cpu_up() has #ifdef CONFIG_MEMORY_HOTPLUG code blocks, which call
    mem_online_node() to put its node online if offlined and then call
    build_all_zonelists() to initialize the zone list.

    These steps are specific to memory hotplug, and should be managed in
    mm/memory_hotplug.c. lock_memory_hotplug() should also be held across
    all of these steps.

    For this reason, this patch replaces mem_online_node() with
    try_online_node(), which performs all of these steps with
    lock_memory_hotplug() held. try_online_node() is named after
    try_offline_node() as they have similar purpose.

    There is no functional change in this patch.

    Signed-off-by: Toshi Kani
    Reviewed-by: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
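
    The shape of the change in cpu_up(), sketched rather than quoted:

        #include <linux/memory_hotplug.h>
        #include <linux/topology.h>

        static int example_prepare_node(unsigned int cpu)
        {
                /*
                 * Before (roughly): cpu_up() open-coded the memory-hotplug
                 * steps, calling mem_online_node() and then building the
                 * zonelists itself under CONFIG_MEMORY_HOTPLUG.
                 *
                 * After: a single call that does all of it with
                 * lock_memory_hotplug() held inside mm/memory_hotplug.c.
                 */
                return try_online_node(cpu_to_node(cpu));
        }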
     

16 Oct, 2013

1 commit

  • Remove get_online_cpus() usage from the scheduler; there's 4 sites that
    use it:

    - sched_init_smp(); where it's completely superfluous since we're in
    'early' boot and there simply cannot be any hotplugging.

    - sched_getaffinity(); we already take a raw spinlock to protect the
    task cpus_allowed mask, this disables preemption and therefore
    also stabilizes cpu_online_mask as that's modified using
    stop_machine. However switch to active mask for symmetry with
    sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active
    mask stability by inserting sync_rcu/sched() into _cpu_down.

    - sched_setaffinity(); we don't appear to need get_online_cpus()
    either, there's two sites where hotplug appears relevant:
    * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
    for the cpuset case we hold task_lock, which is a spinlock and
    thus for mainline disables preemption (might cause pain on RT).
    * set_cpus_allowed_ptr(); Holds all scheduler locks and thus has
    preemption properly disabled; also it already deals with hotplug
    races explicitly where it releases them.

    - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
    us with a little trickery. By adding a sync_sched/rcu() after the
    CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for
    cpu_active_mask. Use these to validate that both our cpus are active
    when queueing the stop work before we queue the stop_machine works
    for take_cpu_down().

    Signed-off-by: Peter Zijlstra
    Cc: "Srivatsa S. Bhat"
    Cc: Paul McKenney
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Steven Rostedt
    Cc: Oleg Nesterov
    Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

13 Aug, 2013

1 commit

  • CPU system maps are protected with reader/writer locks. The reader
    lock, get_online_cpus(), assures that the maps are not updated while
    holding the lock. The writer lock, cpu_hotplug_begin(), is used to
    update the cpu maps along with cpu_maps_update_begin().

    However, the ACPI processor handler updates the cpu maps without
    holding the writer lock.

    acpi_map_lsapic() is called from acpi_processor_hotadd_init() to
    update cpu_possible_mask and cpu_present_mask. acpi_unmap_lsapic()
    is called from acpi_processor_remove() to update cpu_possible_mask.
    Currently, they are either unprotected or protected with the reader
    lock, which is not correct.

    For example, the get_online_cpus() below is supposed to assure that
    cpu_possible_mask is not changed while the code is iterating with
    for_each_possible_cpu().

    get_online_cpus();
    for_each_possible_cpu(cpu) {
    :
    }
    put_online_cpus();

    However, this lock has no protection with CPU hotplug since the ACPI
    processor handler does not use the writer lock when it updates
    cpu_possible_mask. The reader lock does not serialize within the
    readers.

    This patch protects them with the writer lock with cpu_hotplug_begin()
    along with cpu_maps_update_begin(), which must be held before calling
    cpu_hotplug_begin(). It also protects arch_register_cpu() /
    arch_unregister_cpu(), which creates / deletes a sysfs cpu device
    interface. For this purpose it changes cpu_hotplug_begin() and
    cpu_hotplug_done() to global and exports them in cpu.h.

    Signed-off-by: Toshi Kani
    Signed-off-by: Rafael J. Wysocki

    Toshi Kani
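
    The writer-side locking pattern the commit establishes for code that
    modifies the CPU maps (sketch; the body is a placeholder for the
    arch/ACPI-specific map update):

        #include <linux/cpu.h>

        void example_update_cpu_maps(void)
        {
                /* cpu_maps_update_begin() must be held before cpu_hotplug_begin(),
                 * mirroring what _cpu_up()/_cpu_down() do. */
                cpu_maps_update_begin();
                cpu_hotplug_begin();

                /* ... update cpu_possible_mask / cpu_present_mask here ... */

                cpu_hotplug_done();
                cpu_maps_update_done();
        }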
     

15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the uses of the __cpuinit macros from C files in
    the core kernel directories (kernel, init, lib, mm, and include)
    that don't really have a specific maintainer.

    [1] https://lkml.org/lkml/2013/5/20/589

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

13 Jun, 2013

1 commit

  • There are instances in the kernel where we would like to disable CPU
    hotplug (from sysfs) during some important operation. Today the freezer
    code depends on this and the code to do it was kinda tailor-made for
    that.

    Restructure the code and make it generic enough to be useful for other
    usecases too.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Robin Holt
    Cc: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Russ Anderson
    Cc: Robin Holt
    Cc: Russell King
    Cc: Guan Xuetao
    Cc: Shawn Guo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srivatsa S. Bhat
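
    Sketch of the resulting generic usage (the critical-section body is
    illustrative):

        #include <linux/cpu.h>

        void example_critical_operation(void)
        {
                /* Keep sysfs-initiated CPU hotplug out while we work. */
                cpu_hotplug_disable();

                /* ... work that must not race with CPU hotplug ... */

                cpu_hotplug_enable();
        }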
     

14 Feb, 2013

1 commit

  • Use the smpboot thread infrastructure. Mark the stopper thread
    selfparking and park it after it has finished the take_cpu_down()
    work.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Paul McKenney
    Cc: Srivatsa S. Bhat
    Cc: Arjan van de Veen
    Cc: Paul Turner
    Cc: Richard Weinberger
    Cc: Magnus Damm
    Link: http://lkml.kernel.org/r/20130131120741.686315164@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

28 Jan, 2013

1 commit

  • This is in preparation for the full dynticks feature. While
    remotely reading the cputime of a task running in a full
    dynticks CPU, we'll need to do some extra computation. This
    way we can account the time it spent tickless in userspace
    since its last cputime snapshot.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

12 Dec, 2012

1 commit

  • Pull x86 BSP hotplug changes from Ingo Molnar:
    "This tree enables CPU#0 (the boot processor) to be onlined/offlined on
    x86, just like any other CPU. Enabled on Intel CPUs for now.

    Allowing this required the identification and fixing of latent CPU#0
    assumptions (such as CPU#0 initializations, etc.) in the x86
    architecture code, plus the identification of barriers to
    BSP-offlining, such as active PIC interrupts which can only be
    serviced on the BSP.

    It's behind a default-off option, and there's a debug option that
    allows the automatic testing of this feature.

    The motivation of this feature is to allow and prepare for true
    CPU-hotplug hardware support: recent changes to MCE support enable us
    to detect a deteriorating but not yet hard-failing L1/L2 cache on a
    CPU that could be soft-unplugged - or a failing L3 cache on a
    multi-socket system.

    Note that true hardware hot-plug is not yet fully enabled by this,
    because that requires a special platform wakeup sequence to be sent to
    the freshly powered up CPU#0. Future patches for this are planned,
    once such a platform exists. Chicken and egg"

    * 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, topology: Debug CPU0 hotplug
    x86/i387.c: Initialize thread xstate only on CPU0 only once
    x86, hotplug: Handle retrigger irq by the first available CPU
    x86, hotplug: The first online processor saves the MTRR state
    x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.
    x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
    x86-32, hotplug: Add start_cpu0() entry point to head_32.S
    x86-64, hotplug: Add start_cpu0() entry point to head_64.S
    kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback
    x86, hotplug, suspend: Online CPU0 for suspend or hibernate
    x86, hotplug: Support functions for CPU0 online/offline
    x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it
    x86, Kconfig: Add config switch for CPU0 hotplug
    doc: Add x86 CPU0 online/offline feature

    Linus Torvalds
     

15 Nov, 2012

2 commits

  • Even if acpi_processor_handle_eject() offlines a cpu, there is a chance
    that the cpu is onlined again afterwards. So the patch closes the window
    by using get/put_online_cpus().

    Why does the patch change _cpu_up() logic?

    The patch cares about the race between cpu hot-remove and _cpu_up(). If
    the patch does not change it, there is the following race:

    hot-remove cpu                       | _cpu_up()
    -------------------------------------+------------------------------------
    call acpi_processor_handle_eject()   |
    call cpu_down()                      |
    call get_online_cpus()               |
                                         | call cpu_hotplug_begin() and stop here
    call arch_unregister_cpu()           |
    call acpi_unmap_lsapic()             |
    call put_online_cpus()               |
                                         | start and continue _cpu_up()
    return acpi_processor_remove()       |
    continue hot-remove the cpu          |

    So _cpu_up() can continue, and cpu hot-remove can also continue. If the
    patch changes the _cpu_up() logic, the race disappears as below:

    hot-remove cpu                       | _cpu_up()
    -------------------------------------+------------------------------------
    call acpi_processor_handle_eject()   |
    call cpu_down()                      |
    call get_online_cpus()               |
                                         | call cpu_hotplug_begin() and stop here
    call arch_unregister_cpu()           |
    call acpi_unmap_lsapic()             |
      cpu's cpu_present is set           |
      to false by set_cpu_present()      |
    call put_online_cpus()               |
                                         | start _cpu_up()
                                         | check cpu_present() and return -EINVAL
    return acpi_processor_remove()       |
    continue hot-remove the cpu          |

    Signed-off-by: Yasuaki Ishimatsu
    Reviewed-by: Srivatsa S. Bhat
    Reviewed-by: Toshi Kani
    Signed-off-by: Rafael J. Wysocki

    Yasuaki Ishimatsu
     
  • cpu_hotplug_pm_callback should have a higher priority than
    bsp_pm_callback, which depends on cpu_hotplug_pm_callback to disable cpu
    hotplug in order to avoid a race during bsp online checking.

    This is to highlight the priorities between the two callbacks in case
    people may overlook the order.

    Ideally the priorities should be defined in a macro/enum instead of fixed
    values. To do that, a separate patchset may be pushed which will touch
    several other generic files and is out of scope of this patchset.

    Signed-off-by: Fenghua Yu
    Link: http://lkml.kernel.org/r/1352835171-3958-7-git-send-email-fenghua.yu@intel.com
    Reviewed-by: Srivatsa S. Bhat
    Acked-by: Rafael J. Wysocki
    Signed-off-by: H. Peter Anvin

    Fenghua Yu
     

09 Oct, 2012

1 commit

  • The synchronization between CPU hotplug readers and writers is achieved
    by means of refcounting, safeguarded by the cpu_hotplug.lock.

    get_online_cpus() increments the refcount, whereas put_online_cpus()
    decrements it. If we ever hit an imbalance between the two, we end up
    compromising the guarantees of the hotplug synchronization i.e, for
    example, an extra call to put_online_cpus() can end up allowing a
    hotplug reader to execute concurrently with a hotplug writer.

    So, add a WARN_ON() in put_online_cpus() to detect such cases where the
    refcount can go negative, and also attempt to fix it up, so that we can
    continue to run.

    Signed-off-by: Srivatsa S. Bhat
    Reviewed-by: Yasuaki Ishimatsu
    Cc: Jiri Kosina
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srivatsa S. Bhat
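
    A compact sketch of the defensive check described (names simplified, not
    the kernel/cpu.c source):

        #include <linux/bug.h>
        #include <linux/mutex.h>
        #include <linux/sched.h>

        struct example_hotplug {
                struct mutex lock;
                int refcount;
                struct task_struct *active_writer;
        };

        void example_put_online_cpus(struct example_hotplug *h)
        {
                mutex_lock(&h->lock);
                /* A negative count means an unmatched put: warn and fix up. */
                if (WARN_ON(--h->refcount < 0))
                        h->refcount++;
                if (!h->refcount && h->active_writer)
                        wake_up_process(h->active_writer);
                mutex_unlock(&h->lock);
        }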
     

02 Oct, 2012

1 commit

  • Pull x86/asm changes from Ingo Molnar:
    "The one change that stands out is the alternatives patching change
    that prevents us from ever patching back instructions from SMP to UP:
    this simplifies things and speeds up CPU hotplug.

    Other than that it's smaller fixes, cleanups and improvements."

    * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86: Unspaghettize do_trap()
    x86_64: Work around old GAS bug
    x86: Use REP BSF unconditionally
    x86: Prefer TZCNT over BFS
    x86/64: Adjust types of temporaries used by ffs()/fls()/fls64()
    x86: Drop unnecessary kernel_eflags variable on 64-bit
    x86/smp: Don't ever patch back to UP if we unplug cpus

    Linus Torvalds
     

23 Aug, 2012

1 commit

  • We still patch SMP instructions to UP variants if we boot with a
    single CPU, but not at any other time. In particular, not if we
    unplug CPUs to return to a single cpu.

    Paul McKenney points out:

    mean offline overhead is 6251/48=130.2 milliseconds.

    If I remove the alternatives_smp_switch() from the offline
    path [...] the mean offline overhead is 550/42=13.1 milliseconds

    Basically, we're never going to get those 120ms back, and the
    code is pretty messy.

    We get rid of:

    1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the
    documentation is wrong. It's now the default.

    2) The skip_smp_alternatives flag used by suspend.

    3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end()
    which were only used to set this one flag.

    Signed-off-by: Rusty Russell
    Cc: Paul McKenney
    Cc: Suresh Siddha
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.au
    Signed-off-by: Ingo Molnar

    Rusty Russell
     

13 Aug, 2012

1 commit

  • Provide a generic interface for setting up and tearing down percpu
    threads.

    On registration the threads for already online cpus are created and
    started. On deregistration (modules) the threads are stopped.

    During hotplug operations the threads are created, started, parked and
    unparked. The data structure for registration provides a pointer to
    percpu storage space and optional setup, cleanup, park, unpark
    functions. These functions are called when the thread state changes.

    Each implementation has to provide a function which is queried and
    returns whether the thread should run and the thread function itself.

    The core code handles all state transitions and avoids duplicated code
    in the call sites.

    [ paulmck: Preemption leak fix ]

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Reviewed-by: Srivatsa S. Bhat
    Cc: Rusty Russell
    Reviewed-by: Paul E. McKenney
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/20120716103948.352501068@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
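
    A sketch of how a client uses the interface (struct smp_hotplug_thread and
    smpboot_register_percpu_thread() are the resulting API; the example_*
    pieces are hypothetical):

        #include <linux/init.h>
        #include <linux/percpu.h>
        #include <linux/sched.h>
        #include <linux/smpboot.h>

        static DEFINE_PER_CPU(unsigned int, example_work_pending);
        static DEFINE_PER_CPU(struct task_struct *, example_thread);

        static int example_should_run(unsigned int cpu)
        {
                return per_cpu(example_work_pending, cpu);
        }

        static void example_thread_fn(unsigned int cpu)
        {
                /* Process one unit of per-CPU work, then return to the core loop. */
                per_cpu(example_work_pending, cpu) = 0;
        }

        static struct smp_hotplug_thread example_threads = {
                .store             = &example_thread,   /* per-CPU task pointers */
                .thread_should_run = example_should_run,
                .thread_fn         = example_thread_fn,
                .thread_comm       = "example/%u",
        };

        static int __init example_smpboot_init(void)
        {
                /* Creates and starts threads for online CPUs; the core parks
                 * and unparks them across hotplug operations. */
                return smpboot_register_percpu_thread(&example_threads);
        }
        early_initcall(example_smpboot_init);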
     

01 Aug, 2012

1 commit

  • When hotadd_new_pgdat() is called to create new pgdat for a new node, a
    fallback zonelist should be created for the new node. There's code to try
    to achieve that in hotadd_new_pgdat() as below:

    /*
     * The node we allocated has no zone fallback lists. For avoiding
     * to access not-initialized zonelist, build here.
     */
    mutex_lock(&zonelists_mutex);
    build_all_zonelists(pgdat, NULL);
    mutex_unlock(&zonelists_mutex);

    But it doesn't work as expected. When hotadd_new_pgdat() is called, the
    new node is still in offline state because node_set_online(nid) hasn't
    been called yet. And build_all_zonelists() only builds zonelists for
    online nodes as:

    for_each_online_node(nid) {
        pg_data_t *pgdat = NODE_DATA(nid);

        build_zonelists(pgdat);
        build_zonelist_cache(pgdat);
    }

    Though we hope to create a zonelist for the new pgdat, it doesn't get
    built. So add a new parameter "pgdat" to build_all_zonelists() so that
    the zonelists are built for the new pgdat too.

    Signed-off-by: Jiang Liu
    Signed-off-by: Xishi Qiu
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rusty Russell
    Cc: Yinghai Lu
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: David Rientjes
    Cc: Keping Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     

01 Jun, 2012

2 commits

  • Add more comments on clear_tasks_mm_cpumask(), plus add a runtime check:
    the function is only suitable for offlined CPUs, and if called
    inappropriately, the kernel should scream aloud.

    [akpm@linux-foundation.org: tweak comment: s/walks up/walks/, use 80 cols]
    Suggested-by: Andrew Morton
    Suggested-by: Peter Zijlstra
    Signed-off-by: Anton Vorontsov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Vorontsov
     
  • Many architectures clear tasks' mm_cpumask like this:

    read_lock(&tasklist_lock);
    for_each_process(p) {
        if (p->mm)
            cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
    }
    read_unlock(&tasklist_lock);

    Depending on the context, the code above may have several problems,
    such as:

    1. Working with task->mm without getting the mm or grabbing the task lock is
    dangerous as ->mm might disappear (exit_mm() assigns NULL under
    task_lock(), so tasklist lock is not enough).

    2. Checking for process->mm is not enough because process' main
    thread may exit or detach its mm via use_mm(), but other threads
    may still have a valid mm.

    This patch implements a small helper function that does things
    correctly, i.e.:

    1. We take the task's lock while we handle its mm (we can't use
    get_task_mm()/mmput() pair as mmput() might sleep);

    2. To catch exited main thread case, we use find_lock_task_mm(),
    which walks up all threads and returns an appropriate task
    (with task lock held).

    Also, per Peter Zijlstra's idea, now we don't grab tasklist_lock in
    the new helper, instead we take the rcu read lock. We can do this
    because the function is called after the cpu is taken down and marked
    offline, so no new tasks will get this cpu set in their mm mask.

    Signed-off-by: Anton Vorontsov
    Cc: Richard Weinberger
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Benjamin Herrenschmidt
    Cc: Mike Frysinger
    Cc: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Vorontsov
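
    A sketch of the helper's body as described (close to the approach in the
    commit, but simplified and not the literal mm source):

        #include <linux/cpumask.h>
        #include <linux/mm_types.h>
        #include <linux/oom.h>          /* find_lock_task_mm() */
        #include <linux/rcupdate.h>
        #include <linux/sched.h>

        void example_clear_tasks_mm_cpumask(int cpu)
        {
                struct task_struct *p;

                /*
                 * The cpu is already offline, so no new task can have it set
                 * in its mm mask; the RCU read lock is enough for the walk.
                 */
                rcu_read_lock();
                for_each_process(p) {
                        struct task_struct *t;

                        /* Finds a live thread with a valid ->mm, task-locked. */
                        t = find_lock_task_mm(p);
                        if (!t)
                                continue;
                        cpumask_clear_cpu(cpu, mm_cpumask(t->mm));
                        task_unlock(t);
                }
                rcu_read_unlock();
        }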