21 Mar, 2016

1 commit

  • Commit a9ceb78bc75c (cpuidle,menu: use interactivity_req to disable
    polling) changed the behavior of the fallback state selection part
    of menu_select() so it looks at interactivity_req instead of
    data->next_timer_us when it makes its decision. That effectively
    caused polling to be used more often as the fallback idle state,
    which led to significant increases in energy consumption in some
    cases.

    Commit e132b9b3bc7f (cpuidle: menu: use high confidence factors
    only when considering polling) changed that logic again to be more
    predictable, but that didn't help with the increased energy
    consumption problem.

    For this reason, go back to making decisions on which state to fall
    back to based on data->next_timer_us, which is a time when we know
    for sure that something will happen, rather than on a prediction
    (which may be inaccurate, and turns out to be inaccurate often enough
    to be problematic). However, take the target residency of the first
    proper idle state (C1) into account, so that state is not used as
    the fallback one if its target residency is greater than
    data->next_timer_us (a simplified sketch of this check follows the
    entry).

    Fixes: a9ceb78bc75c (cpuidle,menu: use interactivity_req to disable polling)
    Signed-off-by: Rafael J. Wysocki
    Reported-and-tested-by: Doug Smythies

    Rafael J. Wysocki
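
    A minimal sketch of the fallback check described above, using
    simplified types and hypothetical names rather than the actual menu
    governor code:

      /* Base the fallback decision on next_timer_us (a known bound),
       * not on a prediction. */
      struct idle_state { unsigned int target_residency_us; };

      /* first_idx is the index of the first proper idle state (C1);
       * first_idx - 1 is the polling state. */
      static int pick_fallback_state(unsigned int next_timer_us,
                                     const struct idle_state *c1,
                                     int first_idx)
      {
              /* Use C1 only if we know we will stay idle at least as long
               * as its target residency; otherwise keep polling. */
              if (next_timer_us > c1->target_residency_us)
                      return first_idx;
              return first_idx - 1;
      }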
     

17 Mar, 2016

1 commit

  • The menu governor uses five different factors to pick the
    idle state:
    - the user configured latency_req
    - the time until the next timer (next_timer_us)
    - the typical sleep interval, as measured recently
    - an estimate of sleep time by dividing next_timer_us by an observed factor
    - a load corrected version of the above, divided again by load

    Only the first three items are known with enough confidence that
    we can use them to consider polling, instead of an actual CPU
    idle state, because the cost of being wrong about polling can be
    excessive power use.

    The latter two are used in the menu governor's main selection
    loop, and can result in choosing a shallower idle state when
    the system is expected to be busy again soon.

    This pushes a busy system in the "performance" direction of
    the performance<>power tradeoff when choosing between idle
    states, but stays more strictly on the "power" side when
    deciding between polling and C1 (a simplified sketch of that
    decision follows the entry).

    Signed-off-by: Rik van Riel
    Signed-off-by: Rafael J. Wysocki

    Rik van Riel
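
    A minimal sketch of the polling-vs-C1 decision using only the three
    high-confidence inputs named above; the names and the exact
    comparison are illustrative assumptions, not the kernel code:

      static int use_polling(unsigned int latency_req_us,
                             unsigned int next_timer_us,
                             unsigned int typical_interval_us,
                             unsigned int c1_exit_latency_us,
                             unsigned int c1_target_residency_us)
      {
              unsigned int expected_us = next_timer_us < typical_interval_us ?
                                         next_timer_us : typical_interval_us;

              /* The user's latency limit can rule C1 out entirely. */
              if (latency_req_us < c1_exit_latency_us)
                      return 1;

              /* Otherwise poll only when the high-confidence sleep-length
               * bounds say the sleep is shorter than C1 is worth. */
              return expected_us < c1_target_residency_us;
      }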
     

19 Jan, 2016

1 commit

  • If menu_select() cannot find a suitable state to return, it will
    return the state index stored in data->last_state_idx. This
    means that it is pointless to look at the states whose indices
    are less than or equal to data->last_state_idx in the main loop,
    so don't do that.

    Given that those checks are done on every idle state selection, this
    change can save quite a bit of completely unnecessary overhead (a
    simplified sketch of the resulting loop follows the entry).

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Ingo Molnar
    Tested-by: Sudeep Holla

    Rafael J. Wysocki
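
    A minimal sketch of the resulting selection loop; state_ok() is a
    hypothetical stand-in for the real residency and latency checks:

      static int select_state(int last_state_idx, int state_count,
                              int (*state_ok)(int idx))
      {
              int i, idx = last_state_idx;  /* returned if nothing deeper fits */

              /* States at or below last_state_idx cannot change the result,
               * so start scanning just above it. */
              for (i = last_state_idx + 1; i < state_count; i++)
                      if (state_ok(i))
                              idx = i;

              return idx;
      }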
     

15 Jan, 2016

1 commit

  • Commit a9ceb78bc75c (cpuidle,menu: use interactivity_req to disable
    polling) exposed a bug in menu_select() causing it to return -1
    on systems with CPUIDLE_DRIVER_STATE_START equal to zero, although
    it should have returned 0. As a result, idle states are not entered
    by CPUs on those systems.

    Namely, on the systems in question data->last_state_idx is initially
    equal to -1, and the above commit made the condition that would
    have caused it to be changed to 0 less likely to trigger, which
    exposed the problem. However, setting data->last_state_idx initially
    to -1 doesn't make sense at all and on the affected systems it should
    always be set to CPUIDLE_DRIVER_STATE_START (i.e. 0) unconditionally,
    so make that happen.

    Fixes: a9ceb78bc75c (cpuidle,menu: use interactivity_req to disable polling)
    Reported-and-tested-by: Sudeep Holla
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

17 Nov, 2015

3 commits

  • The cpuidle state tables contain the maximum exit latency for each
    cpuidle state. On x86, that is the exit latency for when the entire
    package goes into that same idle state.

    However, a lot of the time we only go into the core idle state,
    not the package idle state. This means we see a much smaller exit
    latency.

    We have no way to detect whether we went into the core or package
    idle state while idle, and that is ok.

    However, the current menu_update logic does have the potential to
    trip up the repeating pattern detection in get_typical_interval.
    If the system is experiencing an exit latency near the idle state's
    exit latency, some of the samples will have exit_us subtracted,
    while others will not. This turns a repeating pattern into mush,
    potentially breaking get_typical_interval.

    Furthermore, for smaller sleep intervals, we know the chance that
    all the cores in the package went to the same idle state is fairly
    small. Dividing measured_us by two, instead of subtracting the
    full exit latency when hitting a small measured_us, will reduce the
    error (a simplified sketch of this correction follows the entry).

    Signed-off-by: Rik van Riel
    Acked-by: Arjan van de Ven
    Signed-off-by: Rafael J. Wysocki

    Rik van Riel
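
    A minimal sketch of the correction described above (simplified types;
    close to, but not literally, the kernel code):

      static unsigned int correct_measured_us(unsigned int measured_us,
                                              unsigned int exit_latency_us)
      {
              /* A long sleep very likely paid the full package exit latency. */
              if (measured_us > 2 * exit_latency_us)
                      return measured_us - exit_latency_us;

              /* For short sleeps the package state was probably not reached;
               * halving the sample is a smaller error than subtracting the
               * full exit latency. */
              return measured_us / 2;
      }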
     
  • The menu governor carefully figures out how much time we typically
    sleep for an estimated sleep interval, or whether there is a repeating
    pattern going on, and corrects that estimate for the CPU load.

    Then it proceeds to ignore that information when determining whether
    or not to consider polling. This is not a big deal on most x86 CPUs,
    which have very low C1 latencies, and the patch should not have any
    effect on those CPUs.

    However, certain CPUs (eg. Atom) have much higher C1 latencies, and
    it would be good to not waste performance and power on those CPUs if
    we are expecting a very low wakeup latency.

    Disable polling based on the estimated interactivity requirement, not
    on the time to the next timer interrupt.

    Signed-off-by: Rik van Riel
    Acked-by: Arjan van de Ven
    Signed-off-by: Rafael J. Wysocki

    Rik van Riel
     
  • The cpuidle menu governor has a forced cut-off for polling at 5us,
    in order to deal with firmware that gives the OS bad information
    on cpuidle states, leading to the system spending way too much time
    in polling.

    However, at least one x86 CPU family (Atom) has chips that have
    a 20us break-even point for C1. Forcing the polling cut-off to
    less than that wastes performance and power.

    Increase the polling cut-off to 20us.

    Systems with a lower C1 latency will be found in the states table by
    the menu governor, which will pick those states as appropriate.

    Signed-off-by: Rik van Riel
    Acked-by: Arjan van de Ven
    Signed-off-by: Rafael J. Wysocki

    Rik van Riel
     

05 May, 2015

1 commit

  • Avoid calling the governor's ->reflect method if the state index
    passed to cpuidle_reflect() is negative.

    This allows the analogous check to be dropped from menu_reflect(),
    so do that too, and ensures that arbitrary error codes can be
    passed to cpuidle_reflect() as the index with no adverse
    consequences (a simplified sketch of the guard follows the entry).

    Signed-off-by: Rafael J. Wysocki
    Reviewed-by: Daniel Lezcano
    Acked-by: Peter Zijlstra (Intel)

    Rafael J. Wysocki
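
    A minimal sketch of the guard, with simplified types standing in for
    the real cpuidle structures:

      struct governor { void (*reflect)(void *dev, int index); };

      static void reflect_if_valid(const struct governor *gov, void *dev,
                                   int index)
      {
              /* A negative index is an error code, not a state index,
               * so do not call ->reflect() for it. */
              if (gov->reflect && index >= 0)
                      gov->reflect(dev, index);
      }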
     

17 Apr, 2015

1 commit

  • Now that the kernel provides DIV_ROUND_CLOSEST_ULL(), drop the internal
    implementation and use the kernel one.

    Signed-off-by: Javi Merino
    Acked-by: Rafael J. Wysocki
    Cc: Mel Gorman
    Cc: Stephen Hemminger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Javi Merino
     

17 Dec, 2014

1 commit

  • When menu sees CPUIDLE_FLAG_TIME_INVALID, it ignores its timestamps,
    and assumes that idle lasted as long as the time till next predicted
    timer expiration.

    But if an interrupt was seen and serviced before that duration,
    it would actually be more accurate to use the measured time
    rather than rounding up to the next predicted timer expiration.

    And if an interrupt is seen and serviced such that the measured time
    exceeds the time till next predicted timer expiration, then
    truncating to that expiration is the right thing to do --
    since we can never stay idle past that timer expiration.

    So the code can do a better job without checking for
    CPUIDLE_FLAG_TIME_INVALID (a simplified sketch of the resulting
    logic follows the entry).

    Signed-off-by: Len Brown
    Acked-by: Daniel Lezcano
    Reviewed-by: Tuukka Tikkanen
    Signed-off-by: Rafael J. Wysocki

    Len Brown
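
    A minimal sketch of the resulting logic (illustrative names, not the
    kernel code):

      /* Trust the measured idle time, but never report more than the time
       * that remained until the next predicted timer expiration. */
      static unsigned int idle_duration_us(unsigned int measured_us,
                                           unsigned int next_timer_us)
      {
              return measured_us < next_timer_us ? measured_us : next_timer_us;
      }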
     

13 Nov, 2014

1 commit

  • The only place where the time is invalid is when the ACPI_CSTATE_FFH entry
    method is not set. Otherwise for all the drivers, the time can be correctly
    measured.

    Instead of duplicating the CPUIDLE_FLAG_TIME_VALID flag in all the drivers
    for all the states, invert the logic and replace it with the flag
    CPUIDLE_FLAG_TIME_INVALID. That flag only needs to be set for the ACPI idle
    driver; the former flag can be removed from all the other drivers, and the
    governors can check the inverted flag instead.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     

15 Aug, 2014

1 commit

  • Pull more ACPI and power management updates from Rafael Wysocki:
    "These are a couple of regression fixes, cpuidle menu governor
    optimizations, fixes for ACPI processor and battery drivers,
    hibernation fix to avoid problems related to the e820 memory map,
    fixes for a few cpufreq drivers and a new version of the suspend
    profiling tool analyze_suspend.py.

    Specifics:

    - Fix for an ACPI-based device hotplug regression introduced in 3.14
    that causes a kernel panic to trigger when memory hot-remove is
    attempted with CONFIG_ACPI_HOTPLUG_MEMORY unset from Tang Chen

    - Fix for a cpufreq regression introduced in 3.16 that triggers a
    "sleeping function called from invalid context" bug in
    dev_pm_opp_init_cpufreq_table() from Stephen Boyd

    - ACPI battery driver fix for a warning message added in 3.16 that
    prints silly stuff sometimes from Mariusz Ceier

    - Hibernation fix for safer handling of mismatches in the e820 memory
    map between the configurations during image creation and during the
    subsequent restore from Chun-Yi Lee

    - ACPI processor driver fix to handle CPU hotplug notifications
    correctly during system suspend/resume from Lan Tianyu

    - Series of four cpuidle menu governor cleanups that also should
    speed it up a bit from Mel Gorman

    - Fixes for the speedstep-smi, integrator, cpu0 and arm_big_little
    cpufreq drivers from Hans Wennborg, Himangi Saraogi, Markus
    Pargmann and Uwe Kleine-König

    - Version 3.0 of the analyze_suspend.py suspend profiling tool from
    Todd E Brandt"

    * tag 'pm+acpi-3.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    ACPI / battery: Fix warning message in acpi_battery_get_state()
    PM / tools: analyze_suspend.py: update to v3.0
    cpufreq: arm_big_little: fix module license spec
    cpufreq: speedstep-smi: fix decimal printf specifiers
    ACPI / hotplug: Check scan handlers in acpi_scan_hot_remove()
    cpufreq: OPP: Avoid sleeping while atomic
    cpufreq: cpu0: Do not print error message when deferring
    cpufreq: integrator: Use set_cpus_allowed_ptr
    PM / hibernate: avoid unsafe pages in e820 reserved regions
    ACPI / processor: Make acpi_cpu_soft_notify() process CPU FROZEN events
    cpuidle: menu: Lookup CPU runqueues less
    cpuidle: menu: Call nr_iowait_cpu less times
    cpuidle: menu: Use ktime_to_us instead of reinventing the wheel
    cpuidle: menu: Use shifts when calculating averages where possible

    Linus Torvalds
     

07 Aug, 2014

5 commits

  • Pull trivial tree changes from Jiri Kosina:
    "Summer edition of trivial tree updates"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
    doc: fix two typos in watchdog-api.txt
    irq-gic: remove file name from heading comment
    MAINTAINERS: Add miscdevice.h to file list for char/misc drivers.
    scsi: mvsas: mv_sas.c: Fix for possible null pointer dereference
    doc: replace "practise" with "practice" in Documentation
    befs: remove check for CONFIG_BEFS_RW
    scsi: doc: fix 'SCSI_NCR_SETUP_MASTER_PARITY'
    drivers/usb/phy/phy.c: remove a leading space
    mfd: fix comment
    cpuidle: fix comment
    doc: hpfall.c: fix missing null-terminate after strncpy call
    usb: doc: hotplug.txt code typos
    kbuild: fix comment in Makefile.modinst
    SH: add proper prompt to SH_MAGIC_PANEL_R2_VERSION
    ARM: msm: Remove MSM_SCM
    crypto: Remove MPILIB_EXTRA
    doc: CN: remove dead link, kerneltrap.org no longer works
    media: update reference, kerneltrap.org no longer works
    hexagon: update reference, kerneltrap.org no longer works
    doc: LSM: update reference, kerneltrap.org no longer works
    ...

    Linus Torvalds
     
  • The menu governor makes separate lookups of the CPU runqueue to get
    the load and the number of IO waiters, but both can be obtained with
    a single lookup.

    Signed-off-by: Mel Gorman
    Signed-off-by: Rafael J. Wysocki

    Mel Gorman
     
  • menu_select(), via inline functions, calls nr_iowait_cpu() twice as
    often as necessary.

    Signed-off-by: Mel Gorman
    Signed-off-by: Rafael J. Wysocki

    Mel Gorman
     
  • The kernel's ktime_to_us() implementation is slightly better than the
    one open-coded in menu.c, so use it.

    Signed-off-by: Mel Gorman
    Acked-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Mel Gorman
     
  • We use do_div even though the divisor will usually be a power of two
    unless there are unusual outliers, so use shifts where possible (a
    simplified sketch follows the entry).

    Signed-off-by: Mel Gorman
    Signed-off-by: Rafael J. Wysocki

    Mel Gorman
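
    A minimal sketch of the idea (illustrative constants and names, not
    the kernel code): when no outliers were dropped, the divisor is the
    power-of-two number of recorded intervals and a shift suffices; only
    the trimmed case needs a real division.

      #define INTERVALS      8u
      #define INTERVAL_SHIFT 3u       /* INTERVALS == 1 << INTERVAL_SHIFT */

      static unsigned long long average(unsigned long long sum,
                                        unsigned int divisor)
      {
              if (divisor == INTERVALS)
                      return sum >> INTERVAL_SHIFT;   /* common case */
              return sum / divisor;                   /* outliers dropped */
      }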
     

06 Mar, 2014

5 commits

  • The menu governor performance multiplier defines a minimum predicted
    idle duration to latency ratio. Instead of checking this separately
    in every iteration of the state selection loop, adjust the overall
    latency restriction for the whole loop if this restriction is tighter
    than what is set by the QoS subsystem.

    The original test
    s->exit_latency * multiplier > data->predicted_us
    becomes
    s->exit_latency > data->predicted_us / multiplier
    by dividing both sides of the comparison by "multiplier".

    While division is likely to be several times slower than multiplication,
    the minor performance hit allows making a generic sleep state selection
    function based on a (sleep duration, maximum latency) tuple (a simplified
    sketch of the tightened latency limit follows the entry).

    Signed-off-by: Tuukka Tikkanen
    Signed-off-by: Rafael J. Wysocki

    tuukka.tikkanen@linaro.org
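
    A minimal sketch of the tightened latency limit described above
    (illustrative names); the selection loop then only compares each
    state's exit_latency against the returned value:

      static unsigned int effective_latency_req(unsigned int qos_latency_req,
                                                unsigned int predicted_us,
                                                unsigned int multiplier)
      {
              /* "s->exit_latency * multiplier > predicted_us" becomes
               * "s->exit_latency > predicted_us / multiplier", so the
               * division is done once, before the loop. */
              unsigned int perf_limit = predicted_us / multiplier;

              return qos_latency_req < perf_limit ? qos_latency_req : perf_limit;
      }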
     
  • The menu governor statistics update function tries to determine the
    amount of time between entry to low power state and the occurrence
    of the wakeup event. However, the time measured by the framework
    includes exit latency on top of the desired value. This exit latency
    is subtracted from the measured value to obtain the desired value.

    When measured value is not available, the menu governor assumes
    the wakeup was caused by the timer and the time is equal to remaining
    timer length. No exit latency should be subtracted from this value.

    This patch prevents the erroneous subtraction and clarifies the
    associated comment. It also removes one intermediate variable that
    serves no purpose.

    Signed-off-by: Tuukka Tikkanen
    Signed-off-by: Rafael J. Wysocki

    tuukka.tikkanen@linaro.org
     
  • The menu governor uses coefficients as one method of actual idle
    period length estimation. The coefficients are, as detailed below,
    multipliers giving expected idle period length from time until next
    timer expiry. The multipliers are supposed to have domain of (0..1].

    The coefficients are fractions where only the numerators are stored
    and denominators are a shared constant RESOLUTION*DECAY. Since the
    value of the coefficient should always be greater than 0 and less
    than or equal to 1, the numerator must have a value greater than
    0 and less than or equal to RESOLUTION*DECAY.

    If the coefficients are updated with measured idle durations exceeding
    the timer length, the multiplier may reach values exceeding unity (i.e.
    the stored numerator exceeds RESOLUTION*DECAY). This patch ensures that
    the multipliers are updated with durations capped to the timer length
    (a simplified sketch of the capped update follows the entry).

    Signed-off-by: Tuukka Tikkanen
    Acked-by: Nicolas Pitre
    Signed-off-by: Rafael J. Wysocki

    tuukka.tikkanen@linaro.org
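
    A minimal sketch of the capped update (illustrative, not the kernel
    implementation; the constants use the default values mentioned
    elsewhere in this log):

      #define RESOLUTION 1024u
      #define DECAY         8u

      static unsigned int update_correction_factor(unsigned int old_factor,
                                                   unsigned int measured_us,
                                                   unsigned int next_timer_us)
      {
              unsigned int new_factor;

              /* The cap added by this patch: a sample longer than the timer
               * would push the stored numerator above RESOLUTION*DECAY. */
              if (measured_us > next_timer_us)
                      measured_us = next_timer_us;

              /* Decay the old value, then add the new sample's contribution. */
              new_factor = old_factor - old_factor / DECAY;
              if (next_timer_us)
                      new_factor += RESOLUTION * measured_us / next_timer_us;

              return new_factor;
      }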
     
  • Currently menu governor records the exit latency of the state it has
    chosen for the idle period. The stored latency value is then later
    used to calculate the actual length of the idle period. This value
    may however be incorrect, as the entered state may not be the one
    chosen by the governor. The entered state information is available,
    so we can use that to obtain the real exit latency.

    Signed-off-by: Tuukka Tikkanen
    Acked-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    tuukka.tikkanen@linaro.org
     
  • The field expected_us is used to store the time remaining until next
    timer expiry. The name is inaccurate, as we really do not expect all
    wakeups to be caused by timers. In addition, another field with a very
    similar name (predicted_us) is used to store the predicted time
    remaining until any wakeup source being active.

    This patch renames expected_us to next_timer_us in order to better
    reflect the contained information.

    Signed-off-by: Tuukka Tikkanen
    Acked-by: Nicolas Pitre
    Acked-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    tuukka.tikkanen@linaro.org
     

23 Aug, 2013

8 commits

  • The value of field predicted_us can never exceed that of expected_us, but
    it has a potentially larger type. As there is no need for additional 32
    bits of zeroes on 32 bit platforms, change the type of predicted_us to
    match the type of expected_us.

    Field correction_factor is used to store a value that cannot exceed the
    product of RESOLUTION and DECAY (default 1024*8 = 8192). In practice the
    constants cannot be increased to values large enough to overflow unsigned
    int even on 32 bit systems, so the type is changed to avoid unnecessary
    64 bit arithmetic on 32 bit systems.

    One multiplication of (now) 32 bit values needs a cast to avoid truncation
    of the result, so one has been added.

    In order to avoid another multiplication from 32 bit domain to 64 bit
    domain, the new correction_factor calculation has been changed from
    new = old * (DECAY-1) / DECAY
    to
    new = old - old / DECAY,
    which with infinite precision would yield exactly the same result, but
    now changes the direction of rounding. The impact is not significant as
    the maximum accumulated difference cannot exceed the value of DECAY,
    which is relatively small compared to product of RESOLUTION and DECAY
    (8 / 8192).

    Signed-off-by: Tuukka Tikkanen
    Signed-off-by: Rafael J. Wysocki

    Tuukka Tikkanen
     
  • The menu governor has a number of tunable constants that may be changed
    in the source. If certain combination of values are chosen, an overflow
    is possible when the correction_factor is being recalculated.

    This patch adds a warning regarding this possibility and describes the
    change needed for fixing the issue. The change should not be permanently
    enabled, as it will hurt performance when it is not needed.

    Signed-off-by: Tuukka Tikkanen
    Signed-off-by: Rafael J. Wysocki

    Tuukka Tikkanen
     
  • The menu governor uses a static function get_typical_interval() to
    try to detect a repeating pattern of wakeups. The previous interval
    durations are stored as an array of unsigned ints, but the arithmetic
    in the function is performed exclusively as 64 bit values, even when
    the value stored in a variable is known not to exceed unsigned int,
    which may be smaller and more efficient on some platforms.

    This patch changes the types of the variables used to store some
    intermediates, the maximum, and the cutoff threshold to unsigned
    ints. Average and standard deviation are still treated as 64 bit values,
    even when the values are known to be within the domain of unsigned int,
    to avoid casts to ensure correct integer promotion for arithmetic
    operations.

    Signed-off-by: Tuukka Tikkanen
    Signed-off-by: Rafael J. Wysocki

    Tuukka Tikkanen
     
  • Struct menu_device member intervals is declared as u32, but the value
    stored is (unsigned) int. The type is changed to match the value being
    stored.

    Signed-off-by: Tuukka Tikkanen
    Signed-off-by: Rafael J. Wysocki

    Tuukka Tikkanen
     
  • The function get_typical_interval() initializes a number of variables
    that are assigned constant values immediately after their declarations.
    In addition, there are multiple assignments on a single line, which
    is explicitly forbidden by Documentation/CodingStyle.

    This patch removes redundant initial values for the variables and
    breaks up the multiple assignment line.

    Signed-off-by: Tuukka Tikkanen
    Signed-off-by: Rafael J. Wysocki

    Tuukka Tikkanen
     
  • get_typical_interval() uses int_sqrt() in calculation of standard
    deviation. The formal parameter of int_sqrt() is unsigned long, which
    may on some platforms be smaller than the 64 bit unsigned integer used
    as the actual parameter. The overflow can occur frequently when actual
    idle period lengths are in hundreds of milliseconds.

    This patch adds a check for such overflow and rejects the candidate
    average when an overflow would occur (a simplified sketch of the
    guard follows the entry).

    Signed-off-by: Tuukka Tikkanen
    Signed-off-by: Rafael J. Wysocki

    Tuukka Tikkanen
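
    A userspace sketch of the guard, where sqrt() stands in for the
    kernel's int_sqrt(), whose argument is an unsigned long:

      #include <limits.h>
      #include <math.h>

      static int usable_stddev(unsigned long long variance,
                               unsigned long *stddev)
      {
              /* Reject the candidate average if the 64-bit variance would
               * not fit in int_sqrt()'s unsigned long argument. */
              if (variance > ULONG_MAX)
                      return 0;

              *stddev = (unsigned long)sqrt((double)variance);
              return 1;
      }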
     
  • This patch rearranges a if-return-elsif-goto-fi-return sequence into
    if-return-fi-if-return-fi-goto sequence. The functionality remains the
    same. Also, a lengthy comment that did not describe the functionality
    in the order it occurs is split in half, and the top half is moved
    closer to the actual implementation it describes.

    Signed-off-by: Tuukka Tikkanen
    Signed-off-by: Rafael J. Wysocki

    Tuukka Tikkanen
     
  • This patch prevents cpuidle menu governor from using repeating interval
    prediction result if the idle period predicted is longer than the one
    allowed by shortest running timer.

    Signed-off-by: Tuukka Tikkanen
    Signed-off-by: Rafael J. Wysocki

    Tuukka Tikkanen
     

29 Jul, 2013

2 commits

  • Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
    repeat mode), because it has been identified as the source of a
    significant performance regression in v3.8 and later as explained by
    Jeremy Eder:

    We believe we've identified a particular commit to the cpuidle code
    that seems to be impacting the performance of a variety of workloads.
    The simplest way to reproduce it is using the netperf TCP_RR test, so
    we're using that, on a pair of Sandy Bridge based servers. We also
    have data from a large database setup where performance is also
    measurably/positively impacted, though that test data isn't easily
    share-able.

    Included below are test results from 3 test kernels:

    kernel reverts
    -----------------------------------------------------------
    1) vanilla upstream (no reverts)

    2) perfteam2 reverts e11538d1f03914eb92af5a1a378375c05ae8520c

    3) test reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4
    e11538d1f03914eb92af5a1a378375c05ae8520c

    In summary, netperf TCP_RR numbers improve by approximately 4%
    after reverting 69a37beabf1f0a6705c08e879bdd5d82ff6486c4. When
    69a37beabf1f0a6705c08e879bdd5d82ff6486c4 is included, C0 residency
    never seems to get above 40%. Taking that patch out gets C0 near
    100% quite often, and performance increases.

    The below data are histograms representing the %c0 residency @
    1-second sample rates (using turbostat), while under netperf test.

    - If you look at the first 4 histograms, you can see %c0 residency
    almost entirely in the 30,40% bin.
    - The last pair, which reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4,
    shows %c0 in the 80,90,100% bins.

    Below each kernel name are netperf TCP_RR trans/s numbers for the
    particular kernel that can be disclosed publicly, comparing the 3
    test kernels. We ran a 4th test with the vanilla kernel where
    we've also set /dev/cpu_dma_latency=0 to show overall impact
    boosting single-threaded TCP_RR performance over 11% above
    baseline.

    3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
    TCP_RR trans/s 54323.78

    -----------------------------------------------------------
    3.10-rc2 vanilla RX (no reverts)
    TCP_RR trans/s 48192.47

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 59]: ***********************************************************
    40.0000 - 50.0000 [ 1]: *
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    Sender %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 11]: ***********
    40.0000 - 50.0000 [ 49]: *************************************************
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    -----------------------------------------------------------
    3.10-rc2 perfteam2 RX (reverts commit
    e11538d1f03914eb92af5a1a378375c05ae8520c)
    TCP_RR trans/s 49698.69

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 1]: *
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 59]: ***********************************************************
    40.0000 - 50.0000 [ 0]:
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    Sender %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 2]: **
    40.0000 - 50.0000 [ 58]: **********************************************************
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    -----------------------------------------------------------
    3.10-rc2 test RX (reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4
    and e11538d1f03914eb92af5a1a378375c05ae8520c)
    TCP_RR trans/s 47766.95

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 1]: *
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 27]: ***************************
    40.0000 - 50.0000 [ 2]: **
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 2]: **
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 28]: ****************************

    Sender:
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 11]: ***********
    40.0000 - 50.0000 [ 0]:
    50.0000 - 60.0000 [ 1]: *
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 3]: ***
    80.0000 - 90.0000 [ 7]: *******
    90.0000 - 100.0000 [ 38]: **************************************

    These results demonstrate gaining back the tendency of the CPU to
    stay in more responsive, performant C-states (and thus yield
    measurably better performance), by reverting commit
    69a37beabf1f0a6705c08e879bdd5d82ff6486c4.

    Requested-by: Jeremy Eder
    Tested-by: Len Brown
    Cc: 3.8+
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • Revert commit e11538d1 (cpuidle: Quickly notice prediction failure in
    general case), since it depends on commit 69a37be (cpuidle: Quickly
    notice prediction failure for repeat mode) that has been identified
    as the source of a significant performance regression in v3.8 and
    later.

    Requested-by: Jeremy Eder
    Tested-by: Len Brown
    Cc: 3.8+
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki