08 May, 2012
1 commit
-
kick_all_cpus_sync() is the core implementation of cpu_idle_wait()
which is copied all over the arch code.
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20120507175652.119842173@linutronix.de
07 Apr, 2012
2 commits
-
Fix a NULL pointer dereference panic in cpuidle_play_dead() during
CPU off-lining when no cpuidle driver is registered. A cpuidle
driver may be registered at boot-time based on CPU type. This patch
allows an off-lined CPU to enter HLT-based idle in this condition.
Signed-off-by: Toshi Kani
Cc: Boris Ostrovsky
Reviewed-by: Srivatsa S. Bhat
Tested-by: Srivatsa S. Bhat
Signed-off-by: Len Brown
31 Mar, 2012
1 commit
-
Pull ACPI & Power Management changes from Len Brown:
- ACPI 5.0 after-ripples, ACPICA/Linux divergence cleanup
- cpuidle evolving, more ARM use
- thermal sub-system evolving, ditto
- assorted other PM bits
Fix up conflicts in various cpuidle implementations due to ARM cpuidle
cleanups (ARM at91 self-refresh and cpu idle code rewritten into
"standby" in asm conflicting with the consolidation of cpuidle time
keeping), trivial SH include file context conflict and RCU tracing fixes
in generic code.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (77 commits)
ACPI throttling: fix endian bug in acpi_read_throttling_status()
Disable MCP limit exceeded messages from Intel IPS driver
ACPI video: Don't start video device until its associated input device has been allocated
ACPI video: Harden video bus adding.
ACPI: Add support for exposing BGRT data
ACPI: export acpi_kobj
ACPI: Fix logic for removing mappings in 'acpi_unmap'
CPER failed to handle generic error records with multiple sections
ACPI: Clean redundant codes in scan.c
ACPI: Fix unprotected smp_processor_id() in acpi_processor_cst_has_changed()
ACPI: consistently use should_use_kmap()
PNPACPI: Fix device ref leaking in acpi_pnp_match
ACPI: Fix use-after-free in acpi_map_lsapic
ACPI: processor_driver: add missing kfree
ACPI, APEI: Fix incorrect APEI register bit width check and usage
Update documentation for parameter *notrigger* in einj.txt
ACPI, APEI, EINJ, new parameter to control trigger action
ACPI, APEI, EINJ, limit the range of einj_param
ACPI, APEI, Fix ERST header length check
cpuidle: power_usage should be declared signed integer
...
30 Mar, 2012
4 commits
-
power_usage is always assigned a negative value and should be declared
as a signed integer.
Signed-off-by: Boris Ostrovsky
Signed-off-by: Len Brown
-
Currently when a CPU is off-lined it enters either MWAIT-based idle or,
if MWAIT is not desired or supported, HLT-based idle (which places the
processor in C1 state). This patch allows processors without MWAIT
support to stay in states deeper than C1.
Signed-off-by: Boris Ostrovsky
Signed-off-by: Len Brown
-
If the state_count is not initialized for the device, use
the driver's state count as the default. That removes the need
to set it manually in the cpuidle driver initialization
routine and saves a duplicated line of code.
Signed-off-by: Daniel Lezcano
Signed-off-by: Len Brown
-
Some C states on a new CPU may not work well; one reason is that the BIOS may
configure them incorrectly. To help developers find the root cause quickly, this
patch adds a new sysfs entry so that developers can disable a specific C state
manually.
In addition, C states can have a large impact on performance tuning, because
entering and exiting a C state takes time, which may delay interrupt
processing. With the new debug option, developers can check whether a deep C
state affects performance and by how much.
Also add this option in Documentation/cpuidle/sysfs.txt.
[akpm@linux-foundation.org: check kstrtol return value]
Signed-off-by: ShuoX Liu
Reviewed-by: Yanmin Zhang
Reviewed-and-Tested-by: Deepthi Dharwar
Signed-off-by: Andrew Morton
Signed-off-by: Len Brown
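A minimal userspace sketch of how such an entry could be driven, assuming the attribute is named "disable" and sits in the per-state cpuidle directory under /sys/devices/system/cpu/. The helper names cstate_disable_path() and cstate_set_disabled() are hypothetical and are not part of the patch:

```c
#include <stdio.h>

/* Build the path of the per-state attribute. The attribute name
 * "disable" is an assumption; the commit message does not name it. */
static int cstate_disable_path(char *buf, size_t len,
                               unsigned int cpu, unsigned int state)
{
    return snprintf(buf, len,
                    "/sys/devices/system/cpu/cpu%u/cpuidle/state%u/disable",
                    cpu, state);
}

/* Write 1 to disable the state or 0 to re-enable it.
 * Returns 0 on success and -1 on any error (usually needs root). */
static int cstate_set_disabled(unsigned int cpu, unsigned int state,
                               int disabled)
{
    char path[128];
    FILE *f;
    int n = cstate_disable_path(path, sizeof(path), cpu, state);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                      /* path did not fit */
    f = fopen(path, "w");
    if (!f)
        return -1;                      /* no such state, or no permission */
    if (fprintf(f, "%d\n", disabled ? 1 : 0) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;     /* write errors can surface here */
}
```

Calling cstate_set_disabled(0, 2, 1) would ask the kernel to stop using state2 on cpu0; measuring a workload before and after is one way to see how much a deep C state costs.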
21 Mar, 2012
1 commit
-
Make necessary changes to implement time keeping and irq enabling
in the core cpuidle code. This will allow the removal of these
functionalities from various platform cpuidle implementations whose
timekeeping and irq enabling follows the form in this common code.
Signed-off-by: Robert Lee
Tested-by: Jean Pihet
Tested-by: Amit Daniel
Tested-by: Robert Lee
Reviewed-by: Kevin Hilman
Reviewed-by: Daniel Lezcano
Reviewed-by: Deepthi Dharwar
Acked-by: Jean Pihet
Signed-off-by: Len Brown
05 Mar, 2012
1 commit
-
Conflicts:
tools/perf/builtin-record.c
tools/perf/builtin-top.c
tools/perf/perf.h
tools/perf/util/top.h
Merge reason: resolve these cherry-picking conflicts.
Signed-off-by: Ingo Molnar
22 Feb, 2012
1 commit
-
We moved all our pSeries idle loops to the cpu idle framework
so we really want it to come up by default.
Signed-off-by: Benjamin Herrenschmidt
13 Feb, 2012
1 commit
-
As the tracepoints in the cpuidle code are called when rcu_idle_exit() is in
effect, the _rcuidle() version must be used, otherwise the rcu_read_lock()s
that protect the tracepoint will not be honored.
Cc: Len Brown
Reviewed-by: Josh Triplett
Reviewed-by: Paul E. McKenney
Signed-off-by: Steven Rostedt
22 Dec, 2011
1 commit
-
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.
Userspace relies on events and generic sysfs subsystem infrastructure
from sysdev devices, which are made available with this conversion.
Cc: Haavard Skinnemoen
Cc: Hans-Christian Egtvedt
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Arnd Bergmann
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Paul Mundt
Cc: "David S. Miller"
Cc: Chris Metcalf
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Borislav Petkov
Cc: Tigran Aivazian
Cc: Len Brown
Cc: Zhang Rui
Cc: Dave Jones
Cc: Peter Zijlstra
Cc: Russell King
Cc: Andrew Morton
Cc: Arjan van de Ven
Cc: "Rafael J. Wysocki"
Cc: "Srivatsa S. Bhat"
Signed-off-by: Kay Sievers
Signed-off-by: Greg Kroah-Hartman
08 Nov, 2011
1 commit
-
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
cpuidle: Single/Global registration of idle states
cpuidle: Split cpuidle_state structure and move per-cpu statistics fields
cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare()
cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state
ACPI: Fix CONFIG_ACPI_DOCK=n compiler warning
ACPI: Export FADT pm_profile integer value to userspace
thermal: Prevent polling from happening during system suspend
ACPI: Drop ACPI_NO_HARDWARE_INIT
ACPI atomicio: Convert width in bits to bytes in __acpi_ioremap_fast()
PNPACPI: Simplify disabled resource registration
ACPI: Fix possible recursive locking in hwregs.c
ACPI: use kstrdup()
mrst pmu: update comment
tools/power turbostat: less verbose debugging
07 Nov, 2011
4 commits
-
This patch makes the cpuidle_states structure global (single copy)
instead of per-cpu. The statistics needed on per-cpu basis
by the governor are kept per-cpu. This simplifies the cpuidle
subsystem as state registration is done by single cpu only.
Having single copy of cpuidle_states saves memory. Rare case
of asymmetric C-states can be handled within the cpuidle driver
and architectures such as POWER do not have asymmetric C-states.
Having single/global registration of all the idle states,
dynamic C-state transitions on x86 are handled by
the boot cpu. Here, the boot cpu would disable all the devices,
re-populate the states and later enable all the devices,
irrespective of the cpu that would receive the notification first.
Reference:
https://lkml.org/lkml/2011/4/25/83
Signed-off-by: Deepthi Dharwar
Signed-off-by: Trinabh Gupta
Tested-by: Jean Pihet
Reviewed-by: Kevin Hilman
Acked-by: Arjan van de Ven
Acked-by: Kevin Hilman
Signed-off-by: Len Brown
-
This is the first step towards global registration of cpuidle
states. The statistics used primarily by the governor are per-cpu
and have to be split from rest of the fields inside cpuidle_state,
which would be made global i.e. single copy. The driver_data field
is also per-cpu and moved.
Signed-off-by: Deepthi Dharwar
Signed-off-by: Trinabh Gupta
Tested-by: Jean Pihet
Reviewed-by: Kevin Hilman
Acked-by: Arjan van de Ven
Acked-by: Kevin Hilman
Signed-off-by: Len Brown
-
The cpuidle_device->prepare() mechanism causes updates to the
cpuidle_state[].flags, setting and clearing CPUIDLE_FLAG_IGNORE
to tell the governor not to choose a state on a per-cpu basis at
run-time. State demotion is now handled by the driver and it returns
the actual state entered. Hence, this mechanism is not required.
Also this removes per-cpu flags from cpuidle_state enabling
it to be made global.
Reference:
https://lkml.org/lkml/2011/3/25/52
Signed-off-by: Deepthi Dharwar
Signed-off-by: Trinabh Gupta
Tested-by: Jean Pihet
Acked-by: Arjan van de Ven
Reviewed-by: Kevin Hilman
Signed-off-by: Len Brown
-
Cpuidle governor only suggests the state to enter using the
governor->select() interface, but allows the low level driver to
override the recommended state. The actual entered state
may be different because of software or hardware demotion. Software
demotion is done by the back-end cpuidle driver and can be accounted
correctly. Current cpuidle code uses last_state field to capture the
actual state entered and based on that updates the statistics for the
state entered.
Ideally the driver enter routine should update the counters,
and it should return the state actually entered rather than the time
spent there. The generic cpuidle code should simply handle where
the counters live in the sysfs namespace, not updating the counters.
Reference:
https://lkml.org/lkml/2011/3/25/52
Signed-off-by: Deepthi Dharwar
Signed-off-by: Trinabh Gupta
Tested-by: Jean Pihet
Reviewed-by: Kevin Hilman
Acked-by: Arjan van de Ven
Acked-by: Kevin Hilman
Signed-off-by: Len Brown
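The return convention described in this commit can be pictured with a small userspace model. This is only a sketch under stated assumptions: every sketch_* name is hypothetical, the residency value is a placeholder rather than a measurement, and the accounting is shown on the caller side purely to illustrate why the returned index matters:

```c
#include <stddef.h>

#define SKETCH_MAX_STATES 4

struct sketch_usage {
    unsigned long long usage;    /* number of times the state was entered */
    unsigned long long time_us;  /* total time spent in the state */
};

struct sketch_device {
    int last_residency_us;       /* written by the driver's enter routine */
    int deepest_allowed;         /* hypothetical software demotion limit */
    struct sketch_usage stats[SKETCH_MAX_STATES];
};

typedef int (*sketch_enter_fn)(struct sketch_device *dev, int index);

/* Hypothetical driver enter routine: it may demote the suggested state,
 * records the residency itself, and returns the index actually entered. */
static int sketch_driver_enter(struct sketch_device *dev, int index)
{
    if (index > dev->deepest_allowed)
        index = dev->deepest_allowed;            /* software demotion */
    dev->last_residency_us = 100 * (index + 1);  /* placeholder value */
    return index;
}

/* Caller side: statistics are charged to the state that was entered,
 * not to the state the governor suggested. */
static int sketch_idle_call(struct sketch_device *dev,
                            sketch_enter_fn enter, int suggested)
{
    int entered = enter(dev, suggested);

    if (entered < 0 || entered >= SKETCH_MAX_STATES)
        return -1;
    dev->stats[entered].usage++;
    dev->stats[entered].time_us +=
        (unsigned long long)dev->last_residency_us;
    return entered;
}
```

With deepest_allowed set to 1, a suggestion of state 3 is accounted against state 1, which is the case a separate last_state field previously had to capture.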
01 Nov, 2011
2 commits
-
This file has module_init/exit and MODULE_LICENSE, and so it
needs the full module.h header.
Signed-off-by: Paul Gortmaker
-
Signed-off-by: Paul Gortmaker
25 Aug, 2011
1 commit
-
The PM QoS implementation files are better named
kernel/power/qos.c and include/linux/pm_qos.h.
The PM QoS support is compiled under the CONFIG_PM option.
Signed-off-by: Jean Pihet
Acked-by: markgross
Reviewed-by: Kevin Hilman
Signed-off-by: Rafael J. Wysocki
04 Aug, 2011
3 commits
-
cpuidle users should call cpuidle_idle_call() directly
rather than via the (pm_idle)() function pointer.
An architecture may choose to continue using (pm_idle)(),
but cpuidle need not depend on it:
my_arch_cpu_idle()
	...
	if (cpuidle_idle_call())
		pm_idle();
cc: Kevin Hilman
cc: Paul Mundt
cc: x86@kernel.org
Acked-by: H. Peter Anvin
Signed-off-by: Len Brown
-
When a Xen Dom0 kernel boots on a hypervisor, it gets access
to the raw-hardware ACPI tables. While it parses the idle tables
for the hypervisor's benefit, it uses HLT for its own idle.
Rather than have xen scribble on pm_idle and access default_idle,
have it simply disable_cpuidle() so acpi_idle will not load and
architecture default HLT will be used.
cc: xen-devel@lists.xensource.com
Tested-by: Konrad Rzeszutek Wilk
Acked-by: H. Peter Anvin
Signed-off-by: Len Brown
-
useful for disabling cpuidle to fall back
to architecture-default idle loop
cpuidle drivers and governors will fail to register.
on x86 they'll say so:
intel_idle: intel_idle yielding to (null)
ACPI: acpi_idle yielding to (null)
Signed-off-by: Len Brown
30 May, 2011
1 commit
-
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param
x86 idle: deprecate "no-hlt" cmdline param
x86 idle APM: deprecate CONFIG_APM_CPU_IDLE
x86 idle floppy: deprecate disable_hlt()
x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it
x86 idle: clarify AMD erratum 400 workaround
idle governor: Avoid lock acquisition to read pm_qos before entering idle
cpuidle: menu: fixed wrapping timers at 4.294 seconds
29 May, 2011
1 commit
-
Cpuidle menu governor is using u32 as a temporary datatype for storing
nanosecond values which wrap around at 4.294 seconds. This causes errors
in predicted sleep times resulting in higher than should be C state
selection and increased power consumption. This also breaks cpuidle
state residency statistics.
cc: stable@kernel.org # .32.x through .39.x
Signed-off-by: Tero Kristo
Signed-off-by: Len Brown
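The 4.294 second figure follows from 2^32 ns = 4.294967296 s. A small userspace sketch of the truncation, not the governor's code; both helper names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: what survives when a nanosecond count is kept
 * in a u32 temporary. Only the low 32 bits remain (ns mod 2^32). */
static uint32_t ns_stored_in_u32(uint64_t ns)
{
    return (uint32_t)ns;
}

/* Hypothetical helper: the same count in a 64-bit temporary is intact. */
static uint64_t ns_stored_in_u64(uint64_t ns)
{
    return ns;
}
```

A 5 second sleep (5000000000 ns) passed through ns_stored_in_u32() comes back as 705032704 ns, roughly 0.705 s, so any prediction built on that temporary works from a sleep length that is far too short.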
15 Feb, 2011
1 commit
19 Jan, 2011
1 commit
-
Fix a bunch of
warning: ‘inline’ is not at beginning of declaration
messages when building a 'make allyesconfig' kernel with -Wextra.
These warnings are trivial to kill, yet rather annoying when building with
-Wextra.
The more we can cut down on pointless crap like this the better (IMHO).
A previous patch to do this for a 'allnoconfig' build has already been
merged. This just takes the cleanup a little further.
Signed-off-by: Jesper Juhl
Signed-off-by: Jiri Kosina
13 Jan, 2011
6 commits
-
… from the cpuidle layer
Currently the intel_idle and acpi_idle drivers show double cpu_idle "exit idle"
events -> this patch fixes it and makes throwing cpu_idle events less complex.
It also introduces cpu_idle events for all architectures which use
the cpuidle subsystem, namely:
- arch/arm/mach-at91/cpuidle.c
- arch/arm/mach-davinci/cpuidle.c
- arch/arm/mach-kirkwood/cpuidle.c
- arch/arm/mach-omap2/cpuidle34xx.c
- drivers/acpi/processor_idle.c (for all cases, not only mwait)
- arch/x86/kernel/process.c (did throw events before, but was a mess)
- drivers/idle/intel_idle.c (did throw events before)
Convention should be:
Fire cpu_idle events inside the current pm_idle function (not somewhere
down the callee tree) to keep things easy.
Current possible pm_idle functions in X86:
c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
-> this is really easy now.
This affects userspace:
The type field of the cpu_idle power event can now directly get
mapped to:
/sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
instead of throwing very CPU/mwait specific values.
This change is not visible for the intel_idle driver.
For the acpi_idle driver it should only be visible if the vendor
misses out C-states in his BIOS.
Another (perf timechart) patch reads out cpuidle info of cpu_idle
events from:
/sys/.../cpuidle/stateX/*, then the cpuidle events are mapped
to the correct C-/cpuidle state again, even if e.g. vendors miss
out C-states in their BIOS and for example only export C1 and C3.
-> everything is fine.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Jean Pihet <j-pihet@ti.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: linux-pm@lists.linux-foundation.org
CC: linux-acpi@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-perf-users@vger.kernel.org
CC: linux-omap@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
-
it serves no purpose
Signed-off-by: Len Brown
-
C0 means and is well known as "not idle".
All documentation out there uses this term as "running"/"not idle"
state. Also Linux userspace tools (e.g. cpufreq-aperf and turbostat)
show C0 residency, which is correct there, but means something totally
different from the cpuidle "POLL" state.
Signed-off-by: Thomas Renninger
Signed-off-by: Len Brown
-
The following scenario is possible with the current cpuidle code and
the ACPI cpuidle driver:
(1) acpi_processor_cst_has_changed() is called,
(2) cpuidle_disable_device() is called,
(3) cpuidle_remove_state_sysfs() is called to remove the (presumably
outdated) states info from sysfs,
(4) acpi_processor_get_power_info() is called, the first entry in the
pr->power.states[] table is filled with zeros,
(5) acpi_processor_setup_cpuidle() is called and it doesn't fill the
first entry in pr->power.states[],
(6) cpuidle_enable_device() is called,
(7) __cpuidle_register_device() is _not_ called, since the device has
already been registered,
(8) Consequently, poll_idle_init() is _not_ called either,
(9) cpuidle_add_state_sysfs() is called to create the sysfs attributes
for the new states and it uses the bogus first table entry from
acpi_processor_get_power_info() for creating state0.
This problem is avoided if cpuidle_enable_device()
unconditionally calls poll_idle_init().
Reported-by: Len Brown
Signed-off-by: Rafael J. Wysocki
Signed-off-by: Len Brown
cc: stable@kernel.org
08 Jan, 2011
1 commit
-
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
gameport: use this_cpu_read instead of lookup
x86: udelay: Use this_cpu_read to avoid address calculation
x86: Use this_cpu_inc_return for nmi counter
x86: Replace uses of current_cpu_data with this_cpu ops
x86: Use this_cpu_ops to optimize code
vmstat: User per cpu atomics to avoid interrupt disable / enable
irq_work: Use per cpu atomics instead of regular atomics
cpuops: Use cmpxchg for xchg to avoid lock semantics
x86: this_cpu_cmpxchg and this_cpu_xchg operations
percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
percpu,x86: relocate this_cpu_add_return() and friends
connector: Use this_cpu operations
xen: Use this_cpu_inc_return
taskstats: Use this_cpu_ops
random: Use this_cpu_inc_return
fs: Use this_cpu_inc_return in buffer.c
highmem: Use this_cpu_xx_return() operations
vmstat: Use this_cpu_inc_return for vm statistics
x86: Support for this_cpu_add, sub, dec, inc_return
percpu: Generic support for this_cpu_add, sub, dec, inc_return
...
Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
as per Tejun.
04 Jan, 2011
1 commit
-
Add these new power trace events:
power:cpu_idle
power:cpu_frequency
power:machine_suspend
The old C-state/idle accounting events:
power:power_start
power:power_end
now have a replacement (but we are still keeping the old
tracepoints for compatibility):
power:cpu_idle
and
power:power_frequency
is replaced with:
power:cpu_frequency
power:machine_suspend is newly introduced.
Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.
The type= field got removed from both; it was never
used and the type is distinguished by the event type itself.
The perf timechart userspace tool gets adjusted in a separate patch.
Signed-off-by: Thomas Renninger
Signed-off-by: Ingo Molnar
Acked-by: Arjan van de Ven
Acked-by: Jean Pihet
Cc: Arnaldo Carvalho de Melo
Cc: Peter Zijlstra
Cc: Linus Torvalds
Cc: rjw@sisk.pl
LKML-Reference:
Signed-off-by: Ingo Molnar
LKML-Reference:
17 Dec, 2010
1 commit
-
__get_cpu_var() can be replaced with this_cpu_read and will then use a single
read instruction with implied address calculation to access the correct per cpu
instance.
However, the address of a per cpu variable passed to __this_cpu_read() cannot be
determined (since it's an implied address conversion through segment prefixes).
Therefore apply this only to uses of __get_cpu_var where the address of the
variable is not used.
V3->V4:
- Move one instance of this_cpu_inc_return to a later patch
so that this one can go in without percpu infrastructure
changes.
Sedat: fixed compile failure caused by an extra ')'.
Cc: Neil Horman
Cc: Martin Schwidefsky
Cc: Sedat Dilek
Acked-by: H. Peter Anvin
Signed-off-by: Christoph Lameter
Signed-off-by: Tejun Heo
29 Sep, 2010
1 commit
-
Signed-off-by: Len Brown
10 Aug, 2010
1 commit
-
On some SoC chips, HW resources may be in use during any particular idle
period. As a consequence, the cpuidle states that the SoC is safe to
enter can change from idle period to idle period. In addition, the
latency and threshold of each cpuidle state can vary, depending on the
operating condition when the CPU becomes idle, e.g. the current cpu
frequency, the current state of the HW blocks, etc.
cpuidle core and the menu governor, in the current form, are geared
towards cpuidle states that are static, i.e. the availability of the
states, their latencies, their thresholds are non-changing during run
time. cpuidle does not provide any hook that cpuidle drivers can use to
adjust those values on the fly for the current idle period before the menu
governor selects the target cpuidle state.
This patch extends cpuidle core and the menu governor to handle states
that are dynamic. There are three additions in the patch and the patch
maintains backwards-compatibility with existing cpuidle drivers.
1) add prepare() to struct cpuidle_device. A cpuidle driver can hook
into the callback and cpuidle will call prepare() before calling the
governor's select function. The callback gives the cpuidle driver a
chance to update the dynamic information of the cpuidle states for the
current idle period, e.g. state availability, latencies, thresholds,
power values, etc.
2) add CPUIDLE_FLAG_IGNORE as one of the state flags. In the prepare()
function, a cpuidle driver can set/clear the flag to indicate to the
menu governor whether a cpuidle state should be ignored, i.e. not
available, during the current idle period.
3) add power_specified bit to struct cpuidle_device. The menu governor
currently assumes that the cpuidle states are arranged in the order of
increasing latency, threshold, and power savings. This is true or can
be made true for static states. Once the state parameters are dynamic,
the latencies, thresholds, and power savings for the cpuidle states can
increase or decrease by different amounts from idle period to idle
period. So the assumption of increasing latency, threshold, and power
savings from Cn to C(n+1) can no longer be guaranteed.
It can be straightforward to calculate the power consumption of each
available state and to specify it in power_usage for the idle period.
Using the power_usage fields, the menu governor then selects the state
that has the lowest power consumption and that still satisfies all other
criteria. The power_specified bit defaults to 0. For existing cpuidle
drivers, cpuidle detects that power_specified is 0 and fills in a dummy
set of power_usage values.
Signed-off-by: Ai Li
Cc: Len Brown
Acked-by: Arjan van de Ven
Cc: Ingo Molnar
Cc: Venkatesh Pallipadi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
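The selection rule described in this commit (skip ignored states, then take the lowest power_usage among the states that still meet the latency and threshold limits) can be sketched in userspace as follows. This is an illustration only: the sketch_* names and the struct layout are hypothetical, and the dummy fill of strictly decreasing negative values is an assumption about what "a dummy set of power_usage values" looks like:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_FLAG_IGNORE 0x1u  /* stands in for CPUIDLE_FLAG_IGNORE */

struct sketch_state {
    unsigned int flags;
    unsigned int exit_latency_us; /* latency of leaving the state */
    unsigned int threshold_us;    /* minimum idle time worth entering it */
    int power_usage;              /* lower value means less power drawn */
};

/* Assumed dummy fill for drivers that leave power_specified at 0:
 * each deeper state gets a strictly lower value than the previous one. */
static void sketch_fill_dummy_power(struct sketch_state *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        s[i].power_usage = -1 - (int)i;
}

/* Return the index of the lowest-power usable state, or -1 if none fits. */
static int sketch_select(const struct sketch_state *s, size_t n,
                         unsigned int predicted_us,
                         unsigned int latency_req_us)
{
    int best = -1;
    int best_power = INT_MAX;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (s[i].flags & SKETCH_FLAG_IGNORE)
            continue;             /* driver marked it unavailable */
        if (s[i].threshold_us > predicted_us)
            continue;             /* idle period too short to pay off */
        if (s[i].exit_latency_us > latency_req_us)
            continue;             /* would violate the latency limit */
        if (s[i].power_usage < best_power) {
            best_power = s[i].power_usage;
            best = (int)i;
        }
    }
    return best;
}
```

Because the loop compares power_usage rather than relying on array order, it keeps working when a prepare() hook has reshuffled latencies and power values for the current idle period.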
04 Aug, 2010
1 commit
-
and fix the broken case if a core's frequency depends on others.
trace_power_frequency was only implemented in a rather ungeneric way
in acpi-cpufreq driver's target() function only.
-> Move the call to trace_power_frequency to
cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE
notifier is triggered.
This will support power frequency tracing by all cpufreq drivers
trace_power_frequency did not trace frequency changes correctly when
the userspace governor was used or when CPU cores' frequency depend
on each other.
-> Moving this into the CPUFREQ_POSTCHANGE notifier and pass the cpu
which gets switched automatically fixes this.
Robert Schoene provided some important fixes on top of my initial
quick shot version which are integrated in this patch:
- Forgot some changes in power_end trace (TP_printk/variable names)
- Variable dummy in power_end must now be cpu_id
- Use static 64 bit variable instead of unsigned int for cpu_id
Signed-off-by: Thomas Renninger
CC: davej@redhat.com
CC: arjan@infradead.org
CC: linux-kernel@vger.kernel.org
CC: robert.schoene@tu-dresden.de
Tested-by: robert.schoene@tu-dresden.de
Signed-off-by: Dave Jones