04 Nov, 2014
1 commit
-
A long string of get_online_cpus() with each followed by a
put_online_cpu() that fails to acquire cpu_hotplug.lock can result in
overflow of the cpu_hotplug.puts_pending counter. Although this is
perhaps improbably, a system with absolutely no CPU-hotplug operations
will have an arbitrarily long time in which this overflow could occur.
This commit therefore adds overflow checks to get_online_cpus() and
try_get_online_cpus().Signed-off-by: Paul E. McKenney
Reviewed-by: Pranith Kumar
23 Oct, 2014
1 commit
-
Commit dd56af42bd82 (rcu: Eliminate deadlock between CPU hotplug and
expedited grace periods) was incomplete. Although it did eliminate
deadlocks involving synchronize_sched_expedited()'s acquisition of
cpu_hotplug.lock via get_online_cpus(), it did nothing about the similar
deadlock involving acquisition of this same lock via put_online_cpus().
This deadlock became apparent with testing involving hibernation.This commit therefore changes put_online_cpus() acquisition of this lock
to be conditional, and increments a new cpu_hotplug.puts_pending field
in case of acquisition failure. Then cpu_hotplug_begin() checks for this
new field being non-zero, and applies any changes to cpu_hotplug.refcount.Reported-by: Jiri Kosina
Signed-off-by: Paul E. McKenney
Tested-by: Jiri Kosina
Tested-by: Borislav Petkov
19 Sep, 2014
1 commit
-
Currently, the expedited grace-period primitives do get_online_cpus().
This greatly simplifies their implementation, but means that calls
to them holding locks that are acquired by CPU-hotplug notifiers (to
say nothing of calls to these primitives from CPU-hotplug notifiers)
can deadlock. But this is starting to become inconvenient, as can be
seen here: https://lkml.org/lkml/2014/8/5/754. The problem in this
case is that some developers need to acquire a mutex from a CPU-hotplug
notifier, but also need to hold it across a synchronize_rcu_expedited().
As noted above, this currently results in deadlock.This commit avoids the deadlock and retains the simplicity by creating
a try_get_online_cpus(), which returns false if the get_online_cpus()
reference count could not immediately be incremented. If a call to
try_get_online_cpus() returns true, the expedited primitives operate as
before. If a call returns false, the expedited primitives fall back to
normal grace-period operations. This falling back of course results in
increased grace-period latency, but only during times when CPU hotplug
operations are actually in flight. The effect should therefore be
negligible during normal operation.Signed-off-by: Paul E. McKenney
Cc: Josh Triplett
Cc: "Rafael J. Wysocki"
Tested-by: Lan Tianyu
05 Jul, 2014
1 commit
-
1) Iterate thru all of threads in the system.
Check for all threads, not only for group leaders.2) Check for p->on_rq instead of p->state and cputime.
Preempted task in !TASK_RUNNING state OR just
created task may be queued, that we want to be
reported too.3) Use read_lock() instead of write_lock().
This function does not change any structures, and
read_lock() is enough.Signed-off-by: Kirill Tkhai
Reviewed-by: Srikar Dronamraju
Cc: Andrew Morton
Cc: Ben Segall
Cc: Fabian Frederick
Cc: Gautham R. Shenoy
Cc: Konstantin Khorenko
Cc: Linus Torvalds
Cc: Michael wang
Cc: Mike Galbraith
Cc: Paul Gortmaker
Cc: Paul Turner
Cc: Rafael J. Wysocki
Cc: Srivatsa S. Bhat
Cc: Todd E Brandt
Cc: Toshi Kani
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1403684395.3462.44.camel@tkhai
Signed-off-by: Ingo Molnar
13 Jun, 2014
1 commit
-
Pull more ACPI and power management updates from Rafael Wysocki:
"These are fixups on top of the previous PM+ACPI pull request,
regression fixes (ACPI hotplug, cpufreq ppc-corenet), other bug fixes
(ACPI reset, cpufreq), new PM trace points for system suspend
profiling and a copyright notice update.Specifics:
- I didn't remember correctly that the Hans de Goede's ACPI video
patches actually didn't flip the video.use_native_backlight
default, although we had discussed that and decided to do that.
Since I said we would do that in the previous PM+ACPI pull request,
make that change for real now.- ACPI bus check notifications for PCI host bridges don't cause the
bus below the host bridge to be checked for changes as they should
because of a mistake in the ACPI-based PCI hotplug (ACPIPHP)
subsystem that forgets to add hotplug contexts to PCI host bridge
ACPI device objects. Create hotplug contexts for PCI host bridges
too as appropriate.- Revert recent cpufreq commit related to the big.LITTLE cpufreq
driver that breaks arm64 builds.- Fix for a regression in the ppc-corenet cpufreq driver introduced
during the 3.15 cycle and causing the driver to use the remainder
from do_div instead of the quotient. From Ed Swarthout.- Resets triggered by panic activate a BUG_ON() in vmalloc.c on
systems where the ACPI reset register is located in memory address
space. Fix from Randy Wright.- Fix for a problem with cpufreq governors that decisions made by
them may be suboptimal due to the fact that deferrable timers are
used by them for CPU load sampling. From Srivatsa S Bhat.- Fix for a problem with the Tegra cpufreq driver where the CPU
frequency is temporarily switched to a "stable" level that is
different from both the initial and target frequencies during
transitions which causes udelay() to expire earlier than it should
sometimes. From Viresh Kumar.- New trace points and rework of some existing trace points for
system suspend/resume profiling from Todd Brandt.- Assorted cpufreq fixes and cleanups from Stratos Karafotis and
Viresh Kumar.- Copyright notice update for suspend-and-cpuhotplug.txt from
Srivatsa S Bhat"* tag 'pm+acpi-3.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / hotplug / PCI: Add hotplug contexts to PCI host bridges
PM / sleep: trace events for device PM callbacks
cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATOR
cpufreq: tegra: update comment for clarity
cpufreq: intel_pstate: Remove duplicate CPU ID check
cpufreq: Mark CPU0 driver with CPUFREQ_NEED_INITIAL_FREQ_CHECK flag
PM / Documentation: Update copyright in suspend-and-cpuhotplug.txt
cpufreq: governor: remove copy_prev_load from 'struct cpu_dbs_common_info'
cpufreq: governor: Be friendly towards latency-sensitive bursty workloads
PM / sleep: trace events for suspend/resume
cpufreq: ppc-corenet-cpu-freq: do_div use quotient
Revert "cpufreq: Enable big.LITTLE cpufreq driver on arm64"
cpufreq: Tegra: implement intermediate frequency callbacks
cpufreq: add support for intermediate (stable) frequencies
ACPI / video: Change the default for video.use_native_backlight to 1
ACPI: Fix bug when ACPI reset register is implemented in system memory
12 Jun, 2014
1 commit
-
* pm-sleep:
PM / sleep: trace events for device PM callbacks
PM / sleep: trace events for suspend/resume
07 Jun, 2014
1 commit
-
Adds trace events that give finer resolution into suspend/resume. These
events are graphed in the timelines generated by the analyze_suspend.py
script. They represent large areas of time consumed that are typical to
suspend and resume.The event is triggered by calling the function "trace_suspend_resume"
with three arguments: a string (the name of the event to be displayed
in the timeline), an integer (case specific number, such as the power
state or cpu number), and a boolean (where true is used to denote the start
of the timeline event, and false to denote the end).The suspend_resume trace event reproduces the data that the machine_suspend
trace event did, so the latter has been removed.Signed-off-by: Todd Brandt
Acked-by: Steven Rostedt
Signed-off-by: Rafael J. Wysocki
05 Jun, 2014
1 commit
-
no level printk converted to pr_warn (if err)
no level printk converted to pr_info (disabling non-boot cpus)
Other printk converted to respective level.Signed-off-by: Fabian Frederick
Cc: "Rafael J. Wysocki"
Cc: Peter Zijlstra
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 May, 2014
1 commit
-
Lai found that:
WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
...
migration_cpu_stop+0x1d/0x22was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
always a sub-set of cpu_online_mask.This isn't true since 5fbd036b552f ("sched: Cleanup cpu_active madness").
So set active and online at the same time to avoid this particular
problem.Fixes: 5fbd036b552f ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan
Signed-off-by: Peter Zijlstra
Cc: Andrew Morton
Cc: Gautham R. Shenoy
Cc: Linus Torvalds
Cc: Michael wang
Cc: Paul Gortmaker
Cc: Rafael J. Wysocki
Cc: Srivatsa S. Bhat
Cc: Toshi Kani
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
Signed-off-by: Ingo Molnar
20 Mar, 2014
2 commits
-
The following method of CPU hotplug callback registration is not safe
due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock
and the cpu_hotplug.lock.get_online_cpus();
for_each_online_cpu(cpu)
init_cpu(cpu);register_cpu_notifier(&foobar_cpu_notifier);
put_online_cpus();
The deadlock is shown below:
CPU 0 CPU 1
----- -----Acquire cpu_hotplug.lock
[via get_online_cpus()]CPU online/offline operation
takes cpu_add_remove_lock
[via cpu_maps_update_begin()]Try to acquire
cpu_add_remove_lock
[via register_cpu_notifier()]CPU online/offline operation
tries to acquire cpu_hotplug.lock
[via cpu_hotplug_begin()]*** DEADLOCK! ***
The problem here is that callback registration takes the locks in one order
whereas the CPU hotplug operations take the same locks in the opposite order.
To avoid this issue and to provide a race-free method to register CPU hotplug
callbacks (along with initialization of already online CPUs), introduce new
variants of the callback registration APIs that simply register the callbacks
without holding the cpu_add_remove_lock during the registration. That way,
we can avoid the ABBA scenario. However, we will need to hold the
cpu_add_remove_lock throughout the entire critical section, to protect updates
to the callback/notifier chain.This can be achieved by writing the callback registration code as follows:
cpu_maps_update_begin(); [ or cpu_notifier_register_begin(); see below ]
for_each_online_cpu(cpu)
init_cpu(cpu);/* This doesn't take the cpu_add_remove_lock */
__register_cpu_notifier(&foobar_cpu_notifier);cpu_maps_update_done(); [ or cpu_notifier_register_done(); see below ]
Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin()
because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD
notifiers, and hence get_online_cpus() cannot provide the necessary
synchronization to protect the callback/notifier chains against concurrent
reads and writes. On the other hand, since the cpu_add_remove_lock protects
the entire hotplug operation (including CPU_POST_DEAD), we can use
cpu_maps_update_begin/done() to guarantee proper synchronization.Also, since cpu_maps_update_begin/done() is like a super-set of
get/put_online_cpus(), the former naturally protects the critical sections
from concurrent hotplug operations.Since the names cpu_maps_update_begin/done() don't make much sense in CPU
hotplug callback registration scenarios, we'll introduce new APIs named
cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done().In summary, introduce the lockless variants of un/register_cpu_notifier() and
also export the cpu_notifier_register_begin/done() APIs for use by modules.
This way, we provide a race-free way to register hotplug callbacks as well as
perform initialization for the CPUs that are already online.Cc: Thomas Gleixner
Cc: Andrew Morton
Cc: Peter Zijlstra
Cc: Ingo Molnar
Acked-by: Oleg Nesterov
Acked-by: Toshi Kani
Reviewed-by: Gautham R. Shenoy
Signed-off-by: Srivatsa S. Bhat
Signed-off-by: Rafael J. Wysocki -
Add lockdep annotations for get/put_online_cpus() and
cpu_hotplug_begin()/cpu_hotplug_end().Cc: Ingo Molnar
Reviewed-by: Oleg Nesterov
Signed-off-by: Gautham R. Shenoy
Signed-off-by: Srivatsa S. Bhat
Signed-off-by: Rafael J. Wysocki
14 Nov, 2013
1 commit
-
Pull scheduler fixes from Ingo Molnar:
"Four bugfixes and one performance fix"* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Avoid integer overflow
sched: Optimize task_sched_runtime()
sched/numa: Cure update_numa_stats() vs. hotplug
sched/numa: Fix NULL pointer dereference in task_numa_migrate()
sched: Fix endless sync_sched/rcu() loop inside _cpu_down()
13 Nov, 2013
2 commits
-
Commit 6acce3ef8:
sched: Remove get_online_cpus() usage
tries to do sync_sched/rcu() inside _cpu_down() but triggers:
INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[] synchronize_rcu+0x2c/0x30
[] _cpu_down+0x2b2/0x340
...It was caused by that in the rcu boost case we rely on smpboot thread to
finish the rcu callback, which has already been parked before sync in here
and leads to the endless sync_sched/rcu().This patch exchanges the sequence of smpboot_park_threads() and
sync_sched/rcu() to fix the bug.Reported-by: Fengguang Wu
Tested-by: Fengguang Wu
Signed-off-by: Michael Wang
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/5282EDC0.6060003@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar -
cpu_up() has #ifdef CONFIG_MEMORY_HOTPLUG code blocks, which call
mem_online_node() to put its node online if offlined and then call
build_all_zonelists() to initialize the zone list.These steps are specific to memory hotplug, and should be managed in
mm/memory_hotplug.c. lock_memory_hotplug() should also be held for the
whole steps.For this reason, this patch replaces mem_online_node() with
try_online_node(), which performs the whole steps with
lock_memory_hotplug() held. try_online_node() is named after
try_offline_node() as they have similar purpose.There is no functional change in this patch.
Signed-off-by: Toshi Kani
Reviewed-by: Yasuaki Ishimatsu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Oct, 2013
1 commit
-
Remove get_online_cpus() usage from the scheduler; there's 4 sites that
use it:- sched_init_smp(); where its completely superfluous since we're in
'early' boot and there simply cannot be any hotplugging.- sched_getaffinity(); we already take a raw spinlock to protect the
task cpus_allowed mask, this disables preemption and therefore
also stabilizes cpu_online_mask as that's modified using
stop_machine. However switch to active mask for symmetry with
sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active
mask stability by inserting sync_rcu/sched() into _cpu_down.- sched_setaffinity(); we don't appear to need get_online_cpus()
either, there's two sites where hotplug appears relevant:
* cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
for the cpuset case we hold task_lock, which is a spinlock and
thus for mainline disables preemption (might cause pain on RT).
* set_cpus_allowed_ptr(); Holds all scheduler locks and thus has
preemption properly disabled; also it already deals with hotplug
races explicitly where it releases them.- migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
us with a little trickery. By adding a sync_sched/rcu() after the
CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for
cpu_active_mask. Use these to validate that both our cpus are active
when queueing the stop work before we queue the stop_machine works
for take_cpu_down().Signed-off-by: Peter Zijlstra
Cc: "Srivatsa S. Bhat"
Cc: Paul McKenney
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Srikar Dronamraju
Cc: Andrea Arcangeli
Cc: Johannes Weiner
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Steven Rostedt
Cc: Oleg Nesterov
Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar
13 Aug, 2013
1 commit
-
CPU system maps are protected with reader/writer locks. The reader
lock, get_online_cpus(), assures that the maps are not updated while
holding the lock. The writer lock, cpu_hotplug_begin(), is used to
udpate the cpu maps along with cpu_maps_update_begin().However, the ACPI processor handler updates the cpu maps without
holding the the writer lock.acpi_map_lsapic() is called from acpi_processor_hotadd_init() to
update cpu_possible_mask and cpu_present_mask. acpi_unmap_lsapic()
is called from acpi_processor_remove() to update cpu_possible_mask.
Currently, they are either unprotected or protected with the reader
lock, which is not correct.For example, the get_online_cpus() below is supposed to assure that
cpu_possible_mask is not changed while the code is iterating with
for_each_possible_cpu().get_online_cpus();
for_each_possible_cpu(cpu) {
:
}
put_online_cpus();However, this lock has no protection with CPU hotplug since the ACPI
processor handler does not use the writer lock when it updates
cpu_possible_mask. The reader lock does not serialize within the
readers.This patch protects them with the writer lock with cpu_hotplug_begin()
along with cpu_maps_update_begin(), which must be held before calling
cpu_hotplug_begin(). It also protects arch_register_cpu() /
arch_unregister_cpu(), which creates / deletes a sysfs cpu device
interface. For this purpose it changes cpu_hotplug_begin() and
cpu_hotplug_done() to global and exports them in cpu.h.Signed-off-by: Toshi Kani
Signed-off-by: Rafael J. Wysocki
15 Jul, 2013
1 commit
-
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.This removes all the uses of the __cpuinit macros from C files in
the core kernel directories (kernel, init, lib, mm, and include)
that don't really have a specific maintainer.[1] https://lkml.org/lkml/2013/5/20/589
Signed-off-by: Paul Gortmaker
13 Jun, 2013
1 commit
-
There are instances in the kernel where we would like to disable CPU
hotplug (from sysfs) during some important operation. Today the freezer
code depends on this and the code to do it was kinda tailor-made for
that.Restructure the code and make it generic enough to be useful for other
usecases too.Signed-off-by: Srivatsa S. Bhat
Signed-off-by: Robin Holt
Cc: H. Peter Anvin
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Russ Anderson
Cc: Robin Holt
Cc: Russell King
Cc: Guan Xuetao
Cc: Shawn Guo
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Feb, 2013
1 commit
-
Pull preparatory smp/hotplug patches from Ingo Molnar:
"Some early preparatory changes for the WIP hotplug rework by Thomas
Gleixner."* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stop_machine: Use smpboot threads
stop_machine: Store task reference in a separate per cpu variable
smpboot: Allow selfparking per cpu threads
14 Feb, 2013
1 commit
-
Use the smpboot thread infrastructure. Mark the stopper thread
selfparking and park it after it has finished the take_cpu_down()
work.Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Rusty Russell
Cc: Paul McKenney
Cc: Srivatsa S. Bhat
Cc: Arjan van de Veen
Cc: Paul Turner
Cc: Richard Weinberger
Cc: Magnus Damm
Link: http://lkml.kernel.org/r/20130131120741.686315164@linutronix.de
Signed-off-by: Thomas Gleixner
28 Jan, 2013
1 commit
-
This is in preparation for the full dynticks feature. While
remotely reading the cputime of a task running in a full
dynticks CPU, we'll need to do some extra-computation. This
way we can account the time it spent tickless in userspace
since its last cputime snapshot.Signed-off-by: Frederic Weisbecker
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Li Zhong
Cc: Namhyung Kim
Cc: Paul E. McKenney
Cc: Paul Gortmaker
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Thomas Gleixner
12 Dec, 2012
1 commit
-
Pull x86 BSP hotplug changes from Ingo Molnar:
"This tree enables CPU#0 (the boot processor) to be onlined/offlined on
x86, just like any other CPU. Enabled on Intel CPUs for now.Allowing this required the identification and fixing of latent CPU#0
assumptions (such as CPU#0 initializations, etc.) in the x86
architecture code, plus the identification of barriers to
BSP-offlining, such as active PIC interrupts which can only be
serviced on the BSP.It's behind a default-off option, and there's a debug option that
allows the automatic testing of this feature.The motivation of this feature is to allow and prepare for true
CPU-hotplug hardware support: recent changes to MCE support enable us
to detect a deteriorating but not yet hard-failing L1/L2 cache on a
CPU that could be soft-unplugged - or a failing L3 cache on a
multi-socket system.Note that true hardware hot-plug is not yet fully enabled by this,
because that requires a special platform wakeup sequence to be sent to
the freshly powered up CPU#0. Future patches for this are planned,
once such a platform exists. Chicken and egg"* 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, topology: Debug CPU0 hotplug
x86/i387.c: Initialize thread xstate only on CPU0 only once
x86, hotplug: Handle retrigger irq by the first available CPU
x86, hotplug: The first online processor saves the MTRR state
x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.
x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
x86-32, hotplug: Add start_cpu0() entry point to head_32.S
x86-64, hotplug: Add start_cpu0() entry point to head_64.S
kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback
x86, hotplug, suspend: Online CPU0 for suspend or hibernate
x86, hotplug: Support functions for CPU0 online/offline
x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it
x86, Kconfig: Add config switch for CPU0 hotplug
doc: Add x86 CPU0 online/offline feature
15 Nov, 2012
2 commits
-
Even if acpi_processor_handle_eject() offlines cpu, there is a chance
to online the cpu after that. So the patch closes the window by using
get/put_online_cpus().Why does the patch change _cpu_up() logic?
The patch cares the race of hot-remove cpu and _cpu_up(). If the patch
does not change it, there is the following race.hot-remove cpu | _cpu_up()
------------------------------------- ------------------------------------
call acpi_processor_handle_eject() |
call cpu_down() |
call get_online_cpus() |
| call cpu_hotplug_begin() and stop here
call arch_unregister_cpu() |
call acpi_unmap_lsapic() |
call put_online_cpus() |
| start and continue _cpu_up()
return acpi_processor_remove() |
continue hot-remove the cpu |So _cpu_up() can continue to itself. And hot-remove cpu can also continue
itself. If the patch changes _cpu_up() logic, the race disappears as below:hot-remove cpu | _cpu_up()
-----------------------------------------------------------------------
call acpi_processor_handle_eject() |
call cpu_down() |
call get_online_cpus() |
| call cpu_hotplug_begin() and stop here
call arch_unregister_cpu() |
call acpi_unmap_lsapic() |
cpu's cpu_present is set |
to false by set_cpu_present()|
call put_online_cpus() |
| start _cpu_up()
| check cpu_present() and return -EINVAL
return acpi_processor_remove() |
continue hot-remove the cpu |Signed-off-by: Yasuaki Ishimatsu
Reviewed-by: Srivatsa S. Bhat
Reviewed-by: Toshi Kani
Signed-off-by: Rafael J. Wysocki -
cpu_hotplug_pm_callback should have higher priority than
bsp_pm_callback which depends on cpu_hotplug_pm_callback to disable cpu hotplug
to avoid race during bsp online checking.This is to hightlight the priorities between the two callbacks in case people
may overlook the order.Ideally the priorities should be defined in macro/enum instead of fixed values.
To do that, a seperate patchset may be pushed which will touch serveral other
generic files and is out of scope of this patchset.Signed-off-by: Fenghua Yu
Link: http://lkml.kernel.org/r/1352835171-3958-7-git-send-email-fenghua.yu@intel.com
Reviewed-by: Srivatsa S. Bhat
Acked-by: Rafael J. Wysocki
Signed-off-by: H. Peter Anvin
09 Oct, 2012
1 commit
-
The synchronization between CPU hotplug readers and writers is achieved
by means of refcounting, safeguarded by the cpu_hotplug.lock.get_online_cpus() increments the refcount, whereas put_online_cpus()
decrements it. If we ever hit an imbalance between the two, we end up
compromising the guarantees of the hotplug synchronization i.e, for
example, an extra call to put_online_cpus() can end up allowing a
hotplug reader to execute concurrently with a hotplug writer.So, add a WARN_ON() in put_online_cpus() to detect such cases where the
refcount can go negative, and also attempt to fix it up, so that we can
continue to run.Signed-off-by: Srivatsa S. Bhat
Reviewed-by: Yasuaki Ishimatsu
Cc: Jiri Kosina
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Oct, 2012
1 commit
-
Pull x86/asm changes from Ingo Molnar:
"The one change that stands out is the alternatives patching change
that prevents us from ever patching back instructions from SMP to UP:
this simplifies things and speeds up CPU hotplug.Other than that it's smaller fixes, cleanups and improvements."
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Unspaghettize do_trap()
x86_64: Work around old GAS bug
x86: Use REP BSF unconditionally
x86: Prefer TZCNT over BFS
x86/64: Adjust types of temporaries used by ffs()/fls()/fls64()
x86: Drop unnecessary kernel_eflags variable on 64-bit
x86/smp: Don't ever patch back to UP if we unplug cpus
23 Aug, 2012
1 commit
-
We still patch SMP instructions to UP variants if we boot with a
single CPU, but not at any other time. In particular, not if we
unplug CPUs to return to a single cpu.Paul McKenney points out:
mean offline overhead is 6251/48=130.2 milliseconds.
If I remove the alternatives_smp_switch() from the offline
path [...] the mean offline overhead is 550/42=13.1 millisecondsBasically, we're never going to get those 120ms back, and the
code is pretty messy.We get rid of:
1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the
documentation is wrong. It's now the default.2) The skip_smp_alternatives flag used by suspend.
3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end()
which were only used to set this one flag.Signed-off-by: Rusty Russell
Cc: Paul McKenney
Cc: Suresh Siddha
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.au
Signed-off-by: Ingo Molnar
13 Aug, 2012
1 commit
-
Provide a generic interface for setting up and tearing down percpu
threads.On registration the threads for already online cpus are created and
started. On deregistration (modules) the threads are stoppped.During hotplug operations the threads are created, started, parked and
unparked. The datastructure for registration provides a pointer to
percpu storage space and optional setup, cleanup, park, unpark
functions. These functions are called when the thread state changes.Each implementation has to provide a function which is queried and
returns whether the thread should run and the thread function itself.The core code handles all state transitions and avoids duplicated code
in the call sites.[ paulmck: Preemption leak fix ]
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Reviewed-by: Srivatsa S. Bhat
Cc: Rusty Russell
Reviewed-by: Paul E. McKenney
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/20120716103948.352501068@linutronix.de
Signed-off-by: Thomas Gleixner
01 Aug, 2012
1 commit
-
When hotadd_new_pgdat() is called to create new pgdat for a new node, a
fallback zonelist should be created for the new node. There's code to try
to achieve that in hotadd_new_pgdat() as below:/*
* The node we allocated has no zone fallback lists. For avoiding
* to access not-initialized zonelist, build here.
*/
mutex_lock(&zonelists_mutex);
build_all_zonelists(pgdat, NULL);
mutex_unlock(&zonelists_mutex);But it doesn't work as expected. When hotadd_new_pgdat() is called, the
new node is still in offline state because node_set_online(nid) hasn't
been called yet. And build_all_zonelists() only builds zonelists for
online nodes as:for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);build_zonelists(pgdat);
build_zonelist_cache(pgdat);
}Though we hope to create zonelist for the new pgdat, but it doesn't. So
add a new parameter "pgdat" the build_all_zonelists() to build pgdat for
the new pgdat too.Signed-off-by: Jiang Liu
Signed-off-by: Xishi Qiu
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Minchan Kim
Cc: Rusty Russell
Cc: Yinghai Lu
Cc: Tony Luck
Cc: KAMEZAWA Hiroyuki
Cc: KOSAKI Motohiro
Cc: David Rientjes
Cc: Keping Chen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Jun, 2012
2 commits
-
Add more comments on clear_tasks_mm_cpumask, plus adds a runtime check:
the function is only suitable for offlined CPUs, and if called
inappropriately, the kernel should scream aloud.[akpm@linux-foundation.org: tweak comment: s/walks up/walks/, use 80 cols]
Suggested-by: Andrew Morton
Suggested-by: Peter Zijlstra
Signed-off-by: Anton Vorontsov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Many architectures clear tasks' mm_cpumask like this:
read_lock(&tasklist_lock);
for_each_process(p) {
if (p->mm)
cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
}
read_unlock(&tasklist_lock);Depending on the context, the code above may have several problems,
such as:1. Working with task->mm w/o getting mm or grabing the task lock is
dangerous as ->mm might disappear (exit_mm() assigns NULL under
task_lock(), so tasklist lock is not enough).2. Checking for process->mm is not enough because process' main
thread may exit or detach its mm via use_mm(), but other threads
may still have a valid mm.This patch implements a small helper function that does things
correctly, i.e.:1. We take the task's lock while whe handle its mm (we can't use
get_task_mm()/mmput() pair as mmput() might sleep);2. To catch exited main thread case, we use find_lock_task_mm(),
which walks up all threads and returns an appropriate task
(with task lock held).Also, Per Peter Zijlstra's idea, now we don't grab tasklist_lock in
the new helper, instead we take the rcu read lock. We can do this
because the function is called after the cpu is taken down and marked
offline, so no new tasks will get this cpu set in their mm mask.Signed-off-by: Anton Vorontsov
Cc: Richard Weinberger
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Russell King
Cc: Benjamin Herrenschmidt
Cc: Mike Frysinger
Cc: Paul Mundt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 May, 2012
1 commit
-
percpu areas are already allocated during boot for each possible cpu.
percpu idle threads can be considered as an extension of the percpu areas,
and allocate them for each possible cpu during boot.This will eliminate the need for workqueue based idle thread allocation.
In future we can move the idle thread area into the percpu area too.[ tglx: Moved the loop into smpboot.c and added an error check when
the init code failed to allocate an idle thread for a cpu which
should be onlined ]Signed-off-by: Suresh Siddha
Cc: Peter Zijlstra
Cc: Rusty Russell
Cc: Paul E. McKenney
Cc: Srivatsa S. Bhat
Cc: Tejun Heo
Cc: David Rientjes
Cc: venki@google.com
Link: http://lkml.kernel.org/r/1334966930.28674.245.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Thomas Gleixner
26 Apr, 2012
3 commits
-
All SMP architectures have magic to fork the idle task and to store it
for reusage when cpu hotplug is enabled. Provide a generic
infrastructure for it.Create/reinit the idle thread for the cpu which is brought up in the
generic code and hand the thread pointer to the architecture code via
__cpu_up().Note, that fork_idle() is called via a workqueue, because this
guarantees that the idle thread does not get a reference to a user
space VM. This can happen when the boot process did not bring up all
possible cpus and a later cpu_up() is initiated via the sysfs
interface. In that case fork_idle() would be called in the context of
the user space task and take a reference on the user space VM.Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Rusty Russell
Cc: Paul E. McKenney
Cc: Srivatsa S. Bhat
Cc: Matt Turner
Cc: Russell King
Cc: Mike Frysinger
Cc: Jesper Nilsson
Cc: Richard Kuo
Cc: Tony Luck
Cc: Hirokazu Takata
Cc: Ralf Baechle
Cc: David Howells
Cc: James E.J. Bottomley
Cc: Benjamin Herrenschmidt
Cc: Martin Schwidefsky
Cc: Paul Mundt
Cc: David S. Miller
Cc: Chris Metcalf
Cc: Richard Weinberger
Cc: x86@kernel.org
Acked-by: Venkatesh Pallipadi
Link: http://lkml.kernel.org/r/20120420124557.102478630@linutronix.de -
Start a new file, which will hold SMP and CPU hotplug related generic
infrastructure.Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Rusty Russell
Cc: Paul E. McKenney
Cc: Srivatsa S. Bhat
Cc: Matt Turner
Cc: Russell King
Cc: Mike Frysinger
Cc: Jesper Nilsson
Cc: Richard Kuo
Cc: Tony Luck
Cc: Hirokazu Takata
Cc: Ralf Baechle
Cc: David Howells
Cc: James E.J. Bottomley
Cc: Benjamin Herrenschmidt
Cc: Martin Schwidefsky
Cc: Paul Mundt
Cc: David S. Miller
Cc: Chris Metcalf
Cc: Richard Weinberger
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20120420124557.035417523@linutronix.de -
Preparatory patch to make the idle thread allocation for secondary
cpus generic.Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Rusty Russell
Cc: Paul E. McKenney
Cc: Srivatsa S. Bhat
Cc: Matt Turner
Cc: Russell King
Cc: Mike Frysinger
Cc: Jesper Nilsson
Cc: Richard Kuo
Cc: Tony Luck
Cc: Hirokazu Takata
Cc: Ralf Baechle
Cc: David Howells
Cc: James E.J. Bottomley
Cc: Benjamin Herrenschmidt
Cc: Martin Schwidefsky
Cc: Paul Mundt
Cc: David S. Miller
Cc: Chris Metcalf
Cc: Richard Weinberger
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20120420124556.964170564@linutronix.de
09 Jan, 2012
1 commit
-
* 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
PM / Hibernate: Implement compat_ioctl for /dev/snapshot
PM / Freezer: fix return value of freezable_schedule_timeout_killable()
PM / shmobile: Allow the A4R domain to be turned off at run time
PM / input / touchscreen: Make st1232 use device PM QoS constraints
PM / QoS: Introduce dev_pm_qos_add_ancestor_request()
PM / shmobile: Remove the stay_on flag from SH7372's PM domains
PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume
PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode
PM: Drop generic_subsys_pm_ops
PM / Sleep: Remove forward-only callbacks from AMBA bus type
PM / Sleep: Remove forward-only callbacks from platform bus type
PM: Run the driver callback directly if the subsystem one is not there
PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412.
PM / Sleep: Merge internal functions in generic_ops.c
PM / Sleep: Simplify generic system suspend callbacks
PM / Hibernate: Remove deprecated hibernation snapshot ioctls
PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
ARM: S3C64XX: Implement basic power domain support
PM / shmobile: Use common always on power domain governor
...Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused
XBT_FORCE_SLEEP bit
07 Jan, 2012
1 commit
-
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
sched/tracing: Add a new tracepoint for sleeptime
sched: Disable scheduler warnings during oopses
sched: Fix cgroup movement of waking process
sched: Fix cgroup movement of newly created process
sched: Fix cgroup movement of forking process
sched: Remove cfs bandwidth period check in tg_set_cfs_period()
sched: Fix load-balance lock-breaking
sched: Replace all_pinned with a generic flags field
sched: Only queue remote wakeups when crossing cache boundaries
sched: Add missing rcu_dereference() around ->real_parent usage
[S390] fix cputime overflow in uptime_proc_show
[S390] cputime: add sparse checking and cleanup
sched: Mark parent and real_parent as __rcu
sched, nohz: Fix missing RCU read lock
sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
sched, nohz: Fix the idle cpu check in nohz_idle_balance
sched: Use jump_labels for sched_feat
sched/accounting: Fix parameter passing in task_group_account_field
sched/accounting: Fix user/system tick double accounting
sched/accounting: Re-use scheduler statistics for the root cgroup
...Fix up conflicts in
- arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
usecs_to_cputime64() vs the sparse cleanups
- kernel/sched/fair.c, kernel/time/tick-sched.c
scheduler changes in multiple branches
15 Dec, 2011
1 commit
-
Make cputime_t and cputime64_t nocast to enable sparse checking to
detect incorrect use of cputime. Drop the cputime macros for simple
scalar operations. The conversion macros are still needed.Signed-off-by: Martin Schwidefsky
13 Dec, 2011
1 commit
-
Building rcutorture as a module requires cpu_up() as well as cpu_down()
exported, so apply EXPORT_SYMBOL_GPL().Signed-off-by: Paul E. McKenney
Signed-off-by: Paul E. McKenney
24 Nov, 2011
1 commit
-
Add __init for functions alloc_frozen_cpus() and cpu_hotplug_pm_sync_init()
because they are only called during boot time.Add static for function cpu_hotplug_pm_sync_init() because its scope is limited
in this file only.Signed-off-by: Fenghua Yu
Acked-by: Srivatsa S. Bhat
Signed-off-by: Rafael J. Wysocki