Eric Lee / smarc-fsl-linux-kernel

12 Feb, 2019

1 commit

0f4892eae cpufreq: Move gov_attr_* macros to cpufreq.h

These macros can be reused by governors which don't use the common
governor code present in cpufreq_governor.c and should be moved to the
relevant header.

Now that they are getting moved to the right header file, reuse them in
schedutil governor as well (that required rename of show/store
routines).

Also create a gov_attr_wo() macro for write-only sysfs files; this will be
used by the Interactive governor in a later patch.
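
As a rough user-space illustration of how such attribute macros can work, the
sketch below pastes the attribute name onto show_/store_ prefixes. The
demo_attr struct, the demo_attr_* macros and the rate_limit_us example are
hypothetical stand-ins, not the definitions from cpufreq.h.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical attribute descriptor; not the kernel's struct governor_attr. */
struct demo_attr {
    const char *name;
    unsigned int mode;
    int (*show)(char *buf, size_t len);
    int (*store)(const char *buf, size_t len);
};

/* Each macro expects routines named show_<name> and/or store_<name>. */
#define demo_attr_ro(_name) \
    static struct demo_attr _name = { #_name, 0444, show_##_name, NULL }
#define demo_attr_rw(_name) \
    static struct demo_attr _name = { #_name, 0644, show_##_name, store_##_name }
#define demo_attr_wo(_name) \
    static struct demo_attr _name = { #_name, 0200, NULL, store_##_name }

static unsigned int rate_limit_value = 500;

/* Print the current value followed by a newline, sysfs style. */
static int show_rate_limit_us(char *buf, size_t len)
{
    return snprintf(buf, len, "%u\n", rate_limit_value);
}

/* Parse a decimal value and report the whole input as consumed. */
static int store_rate_limit_us(const char *buf, size_t len)
{
    rate_limit_value = (unsigned int)strtoul(buf, NULL, 10);
    return (int)len;
}

/* Expands to a read-write attribute wired to the two routines above. */
demo_attr_rw(rate_limit_us);
```

The naming convention is the reason a governor's show/store routines may need
renaming before it can use macros of this kind.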

Signed-off-by: Viresh Kumar

Viresh Kumar
2019-02-12 10:24:05 +0800

10 Jan, 2018

1 commit

22af48be8 x86 / CPU: Always show current CPU frequency in /proc/cpuinfo

commit 7d5905dc14a87805a59f3c5bf70173aac2bb18f8 upstream.

After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get()
for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
on x86 can be either the nominal CPU frequency (which is constant)
or the frequency most recently requested by a scaling governor in
cpufreq, depending on the cpufreq configuration. That is somewhat
inconsistent and is different from what it was before 4.13, so in
order to restore the previous behavior, make it report the current
CPU frequency like the scaling_cur_freq sysfs file in cpufreq.

To that end, modify the /proc/cpuinfo implementation on x86 to use
aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
registers, if available, and use their values to compute the CPU
frequency to be reported as "cpu MHz".

However, do that carefully enough to avoid accumulating delays that
lead to unacceptable access times for /proc/cpuinfo on systems with
many CPUs. Run aperfmperf_snapshot_khz() once on all CPUs
asynchronously at the /proc/cpuinfo open time, add a single delay
upfront (if necessary) at that point and simply compute the current
frequency while running show_cpuinfo() for each individual CPU.

Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce
the default delay between consecutive APERF and MPERF reads to 10 ms,
which should be sufficient to get large enough numbers for the
frequency computation in all cases.

Fixes: 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"")
Signed-off-by: Rafael J. Wysocki
Acked-by: Thomas Gleixner
Tested-by: Thomas Gleixner
Acked-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman

Rafael J. Wysocki
2018-01-10 16:31:20 +0800

04 Sep, 2017

1 commit

08a10002b Merge branch 'pm-cpufreq-sched'

* pm-cpufreq-sched:
cpufreq: schedutil: Always process remote callback with slow switching
cpufreq: schedutil: Don't restrict kthread to related_cpus unnecessarily
cpufreq: Return 0 from ->fast_switch() on errors
cpufreq: Simplify cpufreq_can_do_remote_dvfs()
cpufreq: Process remote callbacks from any CPU if the platform permits
sched: cpufreq: Allow remote cpufreq callbacks
cpufreq: schedutil: Use unsigned int for iowait boost
cpufreq: schedutil: Make iowait boost more energy efficient

Rafael J. Wysocki
2017-09-04 06:05:22 +0800

08 Aug, 2017

1 commit

d6344d4b5 cpufreq: Simplify cpufreq_can_do_remote_dvfs()

The if () in cpufreq_can_do_remote_dvfs() is superfluous, so drop
it and simply return the value of the expression under it.
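
The shape of this simplification can be sketched in stand-alone C. The
demo_policy struct, its fields and the explicit this_cpu argument are
hypothetical stand-ins chosen for illustration, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the policy state the check depends on. */
struct demo_policy {
    bool dvfs_possible_from_any_cpu; /* platform allows cross-policy DVFS */
    unsigned long cpus_mask;         /* bit n set: CPU n is in this policy */
};

/* Before: an if () whose only job is to return true or false. */
static bool can_do_remote_dvfs_verbose(const struct demo_policy *p, int this_cpu)
{
    if (p->dvfs_possible_from_any_cpu || (p->cpus_mask & (1UL << this_cpu)))
        return true;
    return false;
}

/* After: return the value of the expression directly. */
static bool can_do_remote_dvfs(const struct demo_policy *p, int this_cpu)
{
    return p->dvfs_possible_from_any_cpu || (p->cpus_mask & (1UL << this_cpu));
}
```

Both functions return the same result for every input; only the redundant
branch is gone.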

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Rafael J. Wysocki
2017-08-08 23:09:02 +0800

01 Aug, 2017

2 commits

99d14d0e1 cpufreq: Process remote callbacks from any CPU if the platform permits

On many platforms, CPUs can do DVFS across cpufreq policies, i.e. a CPU
from policy-A can change the frequency of CPUs belonging to policy-B.

This is quite common in case of ARM platforms where we don't
configure any per-cpu register.

Add a flag to identify such platforms and update
cpufreq_can_do_remote_dvfs() to allow remote callbacks if this flag is
set.

Also enable the flag for cpufreq-dt driver which is used only on ARM
platforms currently.

Signed-off-by: Viresh Kumar
Acked-by: Saravana Kannan
Acked-by: Peter Zijlstra (Intel)
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2017-08-01 20:24:54 +0800
674e75411 sched: cpufreq: Allow remote cpufreq callbacks

With Android UI and benchmarks the latency of cpufreq response to
certain scheduling events can become very critical. Currently, callbacks
into cpufreq governors are only made from the scheduler if the target
CPU of the event is the same as the current CPU. This means there are
certain situations where a target CPU may not run the cpufreq governor
for some time.

One testcase to show this behavior is where a task starts running on
CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the
system is configured such that the new tasks should receive maximum
demand initially, this should result in CPU0 increasing frequency
immediately. Because of the above-mentioned limitation, however, this
does not occur.

This patch updates the scheduler core to call the cpufreq callbacks for
remote CPUs as well.

The schedutil, ondemand and conservative governors are updated to
process cpufreq utilization update hooks called for remote CPUs where
the remote CPU is managed by the cpufreq policy of the local CPU.

The intel_pstate driver is updated to always reject remote callbacks.

This was tested with a couple of use cases (Android: hackbench, recentfling,
galleryfling, vellamo; Ubuntu: hackbench) on an ARM hikey board (64-bit
octa-core, single policy). Only galleryfling showed minor improvements,
while the others didn't show much deviation.

The reason is that this patch only targets a corner case, where all of
the following must be true to improve performance, and that
doesn't happen too often with these tests:

- Task is migrated to another CPU.
- The task has high demand, and should take the target CPU to higher
OPPs.
- And the target CPU doesn't call into the cpufreq governor until the
next tick.

Based on initial work from Steve Muckle.

Signed-off-by: Viresh Kumar
Acked-by: Saravana Kannan
Acked-by: Peter Zijlstra (Intel)
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2017-08-01 20:24:53 +0800

26 Jul, 2017

2 commits

fe829ed8e cpufreq: Add CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING cpufreq driver flag

The policy->transition_latency field is used for multiple purposes
today and it's not straightforward at all. This is how it is used:

A. Set the correct transition_latency value.

B. Set it to CPUFREQ_ETERNAL because:
1. We don't want automatic dynamic switching (with
ondemand/conservative) to happen at all.
2. We don't know the transition latency.

This patch handles the B.1. case in a more readable way. A new flag for
the cpufreq drivers is added to disallow use of cpufreq governors which
have dynamic_switching flag set.

All the current cpufreq drivers which are setting transition_latency
unconditionally to CPUFREQ_ETERNAL are updated to use it. They don't
need to set transition_latency anymore.

There shouldn't be any functional change after this patch.

Signed-off-by: Viresh Kumar
Reviewed-by: Dominik Brodowski
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2017-07-26 06:15:46 +0800
ed4676e25 cpufreq: Replace "max_transition_latency" with "dynamic_switching"

There is no limitation in the ondemand or conservative governors which
disallows the transition_latency to be greater than 10 ms.

The max_transition_latency field is rather used to disallow automatic
dynamic frequency switching for platforms which didn't want these
governors to run.

Replace max_transition_latency with a boolean (dynamic_switching) and
check for transition_latency == CPUFREQ_ETERNAL along with that. This
makes it pretty straightforward to read/understand now.

Signed-off-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2017-07-26 06:15:45 +0800

22 Jul, 2017

2 commits

aa7519af4 cpufreq: Use transition_delay_us for legacy governors as well

The policy->transition_delay_us field is used only by the schedutil
governor currently, and this field describes how fast the driver wants
the cpufreq governor to change the CPU's frequency. It should rather be a
common thing across all governors, as it doesn't have any schedutil
dependency here.

Create a new helper cpufreq_policy_transition_delay_us() to get the
transition delay across all governors.

Signed-off-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2017-07-22 08:25:20 +0800
2d0450363 cpufreq: governor: Drop min_sampling_rate

The cpufreq core and governors aren't supposed to set a limit on how
fast we want to try changing the frequency. This is currently done for
the legacy governors with help of min_sampling_rate.

At worst, we may end up setting the sampling rate to a value lower than
the rate at which frequency can be changed and then one of the CPUs in
the policy will do nothing but change frequency forever.

But that is something for the user to decide and there is no need to
have special handling for such cases in the core. Leave it for the user
to figure out.

Signed-off-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2017-07-22 08:25:20 +0800

15 Jul, 2017

1 commit

4d25ec196 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux

Pull thermal management updates from Zhang Rui:

- Improve thermal cpu_cooling interaction with cpufreq core.

The cpu_cooling driver is designed to use CPU frequency scaling to
avoid high thermal states for a platform. But it wasn't glued really
well with cpufreq core.

For example, clipped-cpus is copied from the policy structure and it's
much better to use the policy->cpus (or related_cpus) fields directly
as they may have been updated. Not that things were broken before this
series, but they can be optimized a bit more.

This series tries to improve interactions between cpufreq core and
cpu_cooling driver and does some fixes/cleanups to the cpu_cooling
driver. (Viresh Kumar)

- A couple of fixes and cleanups in thermal core and imx, hisilicon,
bcm_2835, int340x thermal drivers. (Arvind Yadav, Dan Carpenter,
Sumeet Pawnikar, Srinivas Pandruvada, Willy WOLFF)

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (24 commits)
thermal: bcm2835: fix an error code in probe()
thermal: hisilicon: Handle return value of clk_prepare_enable
thermal: imx: Handle return value of clk_prepare_enable
thermal: int340x: check for sensor when PTYP is missing
Thermal/int340x: Fix few typos and kernel-doc style
thermal: fix source code documentation for parameters
thermal: cpu_cooling: Replace kmalloc with kmalloc_array
thermal: cpu_cooling: Rearrange struct cpufreq_cooling_device
thermal: cpu_cooling: 'freq' can't be zero in cpufreq_state2power()
thermal: cpu_cooling: don't store cpu_dev in cpufreq_cdev
thermal: cpu_cooling: get_level() can't fail
thermal: cpu_cooling: create structure for idle time stats
thermal: cpu_cooling: merge frequency and power tables
thermal: cpu_cooling: get rid of 'allowed_cpus'
thermal: cpu_cooling: OPPs are registered for all CPUs
thermal: cpu_cooling: store cpufreq policy
cpufreq: create cpufreq_table_count_valid_entries()
thermal: cpu_cooling: use cpufreq_policy to register cooling device
thermal: cpu_cooling: get rid of a variable in cpufreq_set_cur_state()
thermal: cpu_cooling: remove cpufreq_cooling_get_level()
...

Linus Torvalds
2017-07-15 04:12:32 +0800

27 Jun, 2017

1 commit

f8475cef9 x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF

The goal of this change is to give users a uniform and meaningful
result when they read /sys/...cpufreq/scaling_cur_freq
on modern x86 hardware, as compared to what they get today.

Modern x86 processors include the hardware needed
to accurately calculate frequency over an interval --
APERF, MPERF, and the TSC.

Here we provide an x86 routine to make this calculation
on supported hardware, and use it in preference to any
driver-specific cpufreq_driver.get() routine.

MHz is computed like so:

MHz = base_MHz * delta_APERF / delta_MPERF

MHz is the average frequency of the busy processor
over a measurement interval. The interval is
defined to be the time between successive invocations
of aperfmperf_khz_on_cpu(), which are expected to
happen on-demand when users read sysfs attribute
cpufreq/scaling_cur_freq.

As with previous methods of calculating MHz,
idle time is excluded.

base_MHz above is from TSC calibration global "cpu_khz".
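
The formula above can be tried out with a small stand-alone helper. The
function name and the zero-delta guard are assumptions made for this sketch;
it works in kHz, like cpu_khz, and is not the kernel routine itself.

```c
#include <stdint.h>

/*
 * Average frequency in kHz over the interval between two counter samples:
 *   kHz = base_khz * delta_APERF / delta_MPERF
 * Returns 0 when MPERF did not advance, so there is nothing to divide by.
 * The multiplication is done in 64 bits; for very long intervals the
 * product could still overflow, which this sketch does not handle.
 */
static uint64_t demo_avg_khz(uint64_t base_khz,
                             uint64_t aperf_delta,
                             uint64_t mperf_delta)
{
    if (mperf_delta == 0)
        return 0;
    return base_khz * aperf_delta / mperf_delta;
}
```

For instance, with a 2400000 kHz base and APERF advancing 1.5 times as fast as
MPERF, demo_avg_khz(2400000, 150, 100) evaluates to 3600000 kHz.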

This x86 native method to calculate MHz returns a meaningful result
no matter if P-states are controlled by hardware or firmware
and/or if the Linux cpufreq sub-system is or is not installed.

When this routine is invoked more frequently, the measurement
interval becomes shorter. However, the code limits re-computation
to 10ms intervals so that average frequency remains meaningful.

Discerning users are encouraged to take advantage of
the turbostat(8) utility, which can gracefully handle
concurrent measurement intervals of arbitrary length.

Signed-off-by: Len Brown
Reviewed-by: Thomas Gleixner
Signed-off-by: Rafael J. Wysocki

Len Brown
2017-06-27 07:47:32 +0800

28 May, 2017

1 commit

55d852931 cpufreq: create cpufreq_table_count_valid_entries()

We need such a routine at two places already, let's create one.
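
A counting routine of this kind might look like the sketch below. The
DEMO_ENTRY_INVALID and DEMO_TABLE_END sentinels and the plain array of
frequencies are assumptions for illustration; the kernel's table entries and
constants are not reproduced here.

```c
/* Hypothetical sentinels marking a skipped entry and the end of the table. */
#define DEMO_ENTRY_INVALID (~0u)
#define DEMO_TABLE_END     (~1u)

/* Count the entries before the end marker that are not marked invalid. */
static int demo_count_valid_entries(const unsigned int *freqs)
{
    int count = 0;

    for (; *freqs != DEMO_TABLE_END; freqs++) {
        if (*freqs != DEMO_ENTRY_INVALID)
            count++;
    }
    return count;
}
```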

Signed-off-by: Viresh Kumar
Reviewed-by: Lukasz Luba
Tested-by: Lukasz Luba
Signed-off-by: Eduardo Valentin

Viresh Kumar
2017-05-28 08:32:28 +0800

18 Apr, 2017

1 commit

1b72e7fd3 cpufreq: schedutil: Use policy-dependent transition delays

Make the schedutil governor take the initial (default) value of the
rate_limit_us sysfs attribute from the (new) transition_delay_us
policy parameter (to be set by the scaling driver).

That will allow scaling drivers to make schedutil use smaller default
values of rate_limit_us and reduce the default average time interval
between consecutive frequency changes.
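
A minimal sketch of picking the default, assuming a zero transition_delay_us
means "not set by the driver". The demo_policy struct and the fallback
constant are hypothetical; the message does not say what the governor falls
back to.

```c
/* Hypothetical policy carrying the driver-provided delay (0 = not set). */
struct demo_policy {
    unsigned int transition_delay_us;
};

/* Hypothetical fallback used when the driver gives no hint. */
#define DEMO_FALLBACK_RATE_LIMIT_US 10000u

/* Initial rate_limit_us: prefer the driver's value when it is provided. */
static unsigned int demo_default_rate_limit_us(const struct demo_policy *p)
{
    return p->transition_delay_us ? p->transition_delay_us
                                  : DEMO_FALLBACK_RATE_LIMIT_US;
}
```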

Make intel_pstate set transition_delay_us to 500.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2017-04-18 00:37:27 +0800

04 Feb, 2017

3 commits

565ebe807 cpufreq: Fix typos in comments

- s/freqnency/frequency/
- s/accomodating/accommodating/

Signed-off-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2017-02-04 07:47:59 +0800
052f573f5 cpufreq: Remove CPUFREQ_START notifier event

It's not used anymore, remove it.

Signed-off-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2017-02-04 07:05:30 +0800
f9f41e3ef cpufreq: Remove policy create/remove notifiers

Those were added by:

commit fcd7af917abb ("cpufreq: stats: handle cpufreq_unregister_driver()
and suspend/resume properly")

but aren't used anymore since:

commit 1aefc75b2449 ("cpufreq: stats: Make the stats code non-modular").

Remove them. Also remove the redundant parameter to the respective
routines.

Signed-off-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2017-02-04 06:59:38 +0800

21 Nov, 2016

1 commit

30248feff cpufreq: Make cpufreq_update_policy() void

The return value of cpufreq_update_policy() is never used, so make
it void.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-11-21 21:35:43 +0800

11 Nov, 2016

1 commit

ee7930ee2 cpufreq: stats: New sysfs attribute for clearing statistics

Allow CPUfreq statistics to be cleared by writing anything to
/sys/.../cpufreq/stats/reset.

Signed-off-by: Markus Mayer
Acked-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Markus Mayer
2016-11-11 08:51:11 +0800

20 Oct, 2016

1 commit

c6fe46a79 cpufreq: fix overflow in cpufreq_table_find_index_dl()

'best' is always less than or equal to 'pos', so `best - pos' can yield
a negative value, which is then cast to `unsigned int'
and passed to __cpufreq_driver_target()->acpi_cpufreq_target()
for policy->freq_table selection. This results in

BUG: unable to handle kernel paging request at ffff881019b469f8
IP: [] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
PGD 267f067
PUD 0

Oops: 0000 [#1] PREEMPT SMP
CPU: 6 PID: 70 Comm: kworker/6:1 Not tainted 4.9.0-rc1-next-20161017-dbg-dirty
Workqueue: events dbs_work_handler
task: ffff88041b808000 task.stack: ffff88041b810000
RIP: 0010:[] [] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
RSP: 0018:ffff88041b813c60 EFLAGS: 00010282
RAX: ffff880419b46a00 RBX: ffff88041b848400 RCX: ffff880419b20f80
RDX: 00000000001dff38 RSI: 00000000ffffffff RDI: ffff88041b848400
RBP: ffff88041b813cb0 R08: 0000000000000006 R09: 0000000000000040
R10: ffffffff8207f9e0 R11: ffffffff8173595b R12: 0000000000000000
R13: ffff88041f1dff38 R14: 0000000000262900 R15: 0000000bfffffff4
FS: 0000000000000000(0000) GS:ffff88041f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff881019b469f8 CR3: 000000041a2d3000 CR4: 00000000001406e0
Stack:
ffff88041b813cb0 ffffffff813347f9 ffff88041b813ca0 ffffffff81334663
ffff88041f1d4bc0 ffff88041b848400 0000000000000000 0000000000000000
0000000000262900 0000000000000000 ffff88041b813d00 ffffffff813355dc
Call Trace:
[] ? cpufreq_freq_transition_begin+0xf1/0xfc
[] ? get_cpu_idle_time+0x97/0xa6
[] __cpufreq_driver_target+0x3b6/0x44e
[] cs_dbs_timer+0x11a/0x135
[] dbs_work_handler+0x39/0x62
[] process_one_work+0x280/0x4a5
[] worker_thread+0x24f/0x397
[] ? rescuer_thread+0x30b/0x30b
[] ? nl80211_get_key+0x29/0x36a
[] kthread+0xfc/0x104
[] ? put_lock_stats.isra.9+0xe/0x20
[] ? kthread_create_on_node+0x3f/0x3f
[] ret_from_fork+0x22/0x30
Code: 56 4d 6b ff 0c 41 55 41 54 53 48 83 ec 28 48 8b 15 ad 1e 00 00 44 8b 41
08 48 8b 87 c8 00 00 00 49 89 d5 4e 03 2c c5 80 b2 78 81 8b 74 38 04 45
3b 75 00 75 11 31 c0 83 39 00 0f 84 1c 01 00
RIP [] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
RSP
CR2: ffff881019b469f8
---[ end trace 16d9fc7a17897d37 ]---

[ rjw: In some cases this bug may also cause incorrect frequencies to
be selected by cpufreq governors. ]
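
The wraparound described at the top of this message can be reproduced in
isolation. In the sketch below the table is a plain array of frequencies and
both helper names are hypothetical. Indexing from the table base is shown only
as the natural alternative; the message itself does not spell out the patch.

```c
#include <stddef.h>

/* Index computed as "last match minus loop cursor", as in the report. */
static unsigned int demo_index_buggy(const unsigned int *best,
                                     const unsigned int *pos)
{
    ptrdiff_t diff = best - pos; /* <= 0, because best never passes pos */

    return (unsigned int)diff;   /* a negative diff wraps to a huge index */
}

/* Index measured from the start of the table stays inside the table. */
static unsigned int demo_index_from_base(const unsigned int *best,
                                         const unsigned int *table)
{
    return (unsigned int)(best - table);
}
```

With best two entries behind pos, demo_index_buggy() returns the unsigned
representation of -2, far beyond the end of any real frequency table.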

Fixes: 899bb6642f2a (cpufreq: skip invalid entries when searching the frequency)
Link: http://marc.info/?l=linux-kernel&m=147672030714331&w=2
Reported-and-tested-by: Sedat Dilek
Reported-and-tested-by: Jörg Otte
Signed-off-by: Sergey Senozhatsky
Acked-by: Viresh Kumar
Cc: 4.8+ # 4.8+
Signed-off-by: Rafael J. Wysocki

Sergey Senozhatsky
2016-10-20 22:35:50 +0800

13 Oct, 2016

1 commit

899bb6642 cpufreq: skip invalid entries when searching the frequency

Skip invalid entries when searching the frequency. This fixes cpufreq
at least on loongson2 MIPS board.

Fixes: da0c6dc00c69 (cpufreq: Handle sorted frequency tables more efficiently)
Signed-off-by: Aaro Koskinen
Signed-off-by: Viresh Kumar
Cc: 4.8+ # 4.8+
Signed-off-by: Rafael J. Wysocki

Aaro Koskinen
2016-10-13 03:01:18 +0800

21 Jul, 2016

1 commit

e3c062360 cpufreq: add cpufreq_driver_resolve_freq()

Cpufreq governors may need to know what a particular target frequency
maps to in the driver without necessarily wanting to set the frequency.
Support this operation via a new cpufreq API,
cpufreq_driver_resolve_freq(). This API returns the lowest driver
frequency equal to or greater than the target frequency
(CPUFREQ_RELATION_L), subject to any policy (min/max) or driver
limitations. The mapping is also cached in the policy so that a
subsequent fast_switch operation can avoid repeating the same lookup.

The API will call a new cpufreq driver callback, resolve_freq(), if it
has been registered by the driver. Otherwise the frequency is resolved
via cpufreq_frequency_table_target(). Rather than require ->target()
style drivers to provide a resolve_freq() callback it is left to the
caller to ensure that the driver implements this callback if necessary
to use cpufreq_driver_resolve_freq().
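
The lookup rule described above (lowest frequency at or above the target,
within the policy limits) can be sketched over a plain array. The function
name, the argument list and the fallback to the highest entry below the target
are assumptions for illustration; the caching of the result is left out.

```c
#include <stddef.h>

/*
 * Resolve target to a table frequency:
 *  - clamp target to [min, max],
 *  - pick the lowest entry >= target that lies within [min, max],
 *  - otherwise fall back to the highest entry within the limits.
 * Returns 0 if no entry lies within [min, max].
 */
static unsigned int demo_resolve_freq(const unsigned int *table, size_t n,
                                      unsigned int min, unsigned int max,
                                      unsigned int target)
{
    unsigned int best_above = 0; /* lowest entry >= target */
    unsigned int best_below = 0; /* highest entry < target */
    size_t i;

    if (target < min)
        target = min;
    if (target > max)
        target = max;

    for (i = 0; i < n; i++) {
        unsigned int f = table[i];

        if (f < min || f > max)
            continue;
        if (f >= target) {
            if (best_above == 0 || f < best_above)
                best_above = f;
        } else if (f > best_below) {
            best_below = f;
        }
    }
    return best_above ? best_above : best_below;
}
```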

Suggested-by: Rafael J. Wysocki
Signed-off-by: Steve Muckle
Signed-off-by: Rafael J. Wysocki

Steve Muckle
2016-07-21 20:46:08 +0800

07 Jul, 2016

1 commit

da0c6dc00 cpufreq: Handle sorted frequency tables more efficiently

cpufreq drivers aren't required to provide a sorted frequency table
today, and even the ones which provide a sorted table aren't handled
efficiently by cpufreq core.

This patch adds infrastructure to verify if the freq-table provided by
the drivers is sorted or not, and use efficient helpers if they are
sorted.
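
One way to verify sortedness is a single pass that tracks whether the table
could still be ascending or descending. The enum and function below are
hypothetical names for this sketch and operate on a plain array rather than
the kernel's table structure.

```c
#include <stddef.h>

enum demo_table_order {
    DEMO_TABLE_UNSORTED,
    DEMO_TABLE_ASCENDING,
    DEMO_TABLE_DESCENDING
};

/* Classify a frequency array in one pass; short tables count as ascending. */
static enum demo_table_order demo_classify_table(const unsigned int *freqs,
                                                 size_t n)
{
    int ascending = 1, descending = 1;
    size_t i;

    for (i = 1; i < n; i++) {
        if (freqs[i] < freqs[i - 1])
            ascending = 0;
        if (freqs[i] > freqs[i - 1])
            descending = 0;
    }
    if (ascending)
        return DEMO_TABLE_ASCENDING;
    if (descending)
        return DEMO_TABLE_DESCENDING;
    return DEMO_TABLE_UNSORTED;
}
```

Once the order is known, a lookup can stop at the first entry that crosses the
target instead of scanning the whole table.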

Signed-off-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2016-07-07 06:13:20 +0800

09 Jun, 2016

3 commits

d218ed773 cpufreq: Return index from cpufreq_frequency_table_target()

This routine can't fail unless the frequency table is invalid and
doesn't contain any valid entries.

Make it return the index and WARN() in case it is used for an invalid
table.

Signed-off-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2016-06-09 06:58:06 +0800
7ab4aabba cpufreq: Drop freq-table param to cpufreq_frequency_table_target()

The policy already has this pointer set, use it instead.

Signed-off-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2016-06-09 06:58:06 +0800
f8bfc116c cpufreq: Remove cpufreq_frequency_get_table()

Most of the callers of cpufreq_frequency_get_table() already have the
pointer to a valid 'policy' structure and they don't really need to go
through the per-cpu variable first and then a check to validate the
frequency, in order to find the freq-table for the policy.

Directly use the policy->freq_table field instead for them.

Only one user of that API is left after the above changes, cpu_cooling.c, and
it accesses the freq_table in a racy way as the policy can get freed in
between.

Fix it by using cpufreq_cpu_get() properly.

Since there are no more users of cpufreq_frequency_get_table() left, get
rid of it.

Signed-off-by: Viresh Kumar
Acked-by: Javi Merino (cpu_cooling.c)
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2016-06-09 06:58:05 +0800

03 Jun, 2016

4 commits

1aefc75b2 cpufreq: stats: Make the stats code non-modular

The modularity of cpufreq_stats is quite problematic.

First off, the usage of policy notifiers for the initialization
and cleanup in the cpufreq_stats module is inherently racy with
respect to CPU offline/online and the initialization and cleanup
of the cpufreq driver.

Second, fast frequency switching (used by the schedutil governor)
cannot be enabled if any transition notifiers are registered, so
if the cpufreq_stats module (that registers a transition notifier
for updating transition statistics) is loaded, the schedutil governor
cannot use fast frequency switching.

On the other hand, allowing cpufreq_stats to be built as a module
doesn't really add much value. Arguably, there's not much reason
for that code to be modular at all.

For the above reasons, make the cpufreq stats code non-modular,
modify the core to invoke functions provided by that code directly
and drop the notifiers from it.

Make the stats sysfs attributes appear empty if fast frequency
switching is enabled as the statistics will not be updated in that
case anyway (and returning -EBUSY from those attributes breaks
powertop).

While at it, clean up Kconfig help for the CPU_FREQ_STAT and
CPU_FREQ_STAT_DETAILS options.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-06-03 05:24:41 +0800
9a15fb2c7 cpufreq: Drop the 'initialized' field from struct cpufreq_governor

The 'initialized' field in struct cpufreq_governor is only used by
the conservative governor (as a usage counter) and the way that
happens is far from straightforward and arguably incorrect.

Namely, the value of 'initialized' is checked by
cpufreq_dbs_governor_init() and cpufreq_dbs_governor_exit() and
the results of those checks are passed (as the second argument) to
the ->init() and ->exit() callbacks in struct dbs_governor. Those
callbacks are only implemented by the ondemand and conservative
governors and ondemand doesn't use their second argument at all.
In turn, the conservative governor uses it to decide whether or not
to either register or unregister a transition notifier.

That whole mechanism is not only unnecessarily convoluted, but also
racy, because the 'initialized' field of struct cpufreq_governor is
updated in cpufreq_init_governor() and cpufreq_exit_governor() under
policy->rwsem which doesn't help if one of these functions is run
twice in parallel for different policies (which isn't impossible in
principle), for example.

Instead of it, add a proper usage counter to the conservative
governor and update it from cs_init() and cs_exit() which is
guaranteed to be non-racy, as those functions are only called
under gov_dbs_data_mutex which is global.

With that in place, drop the 'initialized' field from struct
cpufreq_governor as it is not used any more.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-06-03 05:24:39 +0800
bf2be2de8 cpufreq: governor: Create cpufreq_policy_apply_limits()

Create a new helper to avoid code duplication across governors.

Signed-off-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2016-06-03 05:24:39 +0800
e788892ba cpufreq: governor: Get rid of governor events

The design of the cpufreq governor API is not very straightforward,
as struct cpufreq_governor provides only one callback to be invoked
from different code paths for different purposes. The purpose it is
invoked for is determined by its second "event" argument, causing it
to act as a "callback multiplexer" of sorts.

Unfortunately, that leads to extra complexity in governors, some of
which implement the ->governor() callback as a switch statement
that simply checks the event argument and invokes a separate function
to handle that specific event.

That extra complexity can be eliminated by replacing the all-purpose
->governor() callback with a family of callbacks to carry out specific
governor operations: initialization and exit, start and stop and policy
limits updates. That also turns out to reduce the code size too, so
do it.
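
The difference between the two API shapes can be sketched with hypothetical
demo_* types. These are simplified stand-ins, not the real struct
cpufreq_governor, and the callback signatures are assumptions.

```c
#include <stddef.h>

struct demo_policy {
    int running;
};

enum demo_event {
    DEMO_GOV_INIT, DEMO_GOV_START, DEMO_GOV_STOP, DEMO_GOV_EXIT, DEMO_GOV_LIMITS
};

/* Old shape: one entry point that must switch on the event argument. */
struct demo_governor_old {
    int (*governor)(struct demo_policy *policy, enum demo_event event);
};

/* New shape: one callback per operation; unused ones may stay NULL. */
struct demo_governor {
    int  (*init)(struct demo_policy *policy);
    void (*exit)(struct demo_policy *policy);
    int  (*start)(struct demo_policy *policy);
    void (*stop)(struct demo_policy *policy);
    void (*limits)(struct demo_policy *policy);
};

/* The core calls the specific callback directly, with no dispatching. */
static int demo_start_governor(const struct demo_governor *gov,
                               struct demo_policy *policy)
{
    return gov->start ? gov->start(policy) : 0;
}

/* A tiny example governor that only implements start and stop. */
static int demo_start(struct demo_policy *policy)
{
    policy->running = 1;
    return 0;
}

static void demo_stop(struct demo_policy *policy)
{
    policy->running = 0;
}

static const struct demo_governor demo_gov = {
    .start = demo_start,
    .stop = demo_stop,
};
```

A governor that needs only some operations fills in only those fields, and no
switch statement is required on either side of the interface.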

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-06-03 05:24:15 +0800

09 Apr, 2016

1 commit

6c9d9c819 cpufreq: Call cpufreq_disable_fast_switch() in sugov_exit()

Due to differences in the cpufreq core's handling of runtime CPU
offline and nonboot CPUs disabling during system suspend-to-RAM,
fast frequency switching gets disabled after a suspend-to-RAM and
resume cycle on all of the nonboot CPUs.

To prevent that from happening, move the invocation of
cpufreq_disable_fast_switch() from cpufreq_exit_governor() to
sugov_exit(), as the schedutil governor is the only user of fast
frequency switching today anyway.

That simply prevents cpufreq_disable_fast_switch() from being called
without invoking the ->governor callback for the CPUFREQ_GOV_POLICY_EXIT
event (which happens during system suspend now).

Fixes: b7898fda5bc7 (cpufreq: Support for fast frequency switching)
Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-04-09 04:41:36 +0800

02 Apr, 2016

3 commits

b7898fda5 cpufreq: Support for fast frequency switching

Modify the ACPI cpufreq driver to provide a method for switching
CPU frequencies from interrupt context and update the cpufreq core
to support that method if available.

Introduce a new cpufreq driver callback, ->fast_switch, to be
invoked for frequency switching from interrupt context by (future)
governors supporting that feature via (new) helper function
cpufreq_driver_fast_switch().

Add two new policy flags, fast_switch_possible, to be set by the
cpufreq driver if fast frequency switching can be used for the
given policy and fast_switch_enabled, to be set by the governor
if it is going to use fast frequency switching for the given
policy. Also add a helper for setting the latter.

Since fast frequency switching is inherently incompatible with
cpufreq transition notifiers, make it possible to set the
fast_switch_enabled only if there are no transition notifiers
already registered and make the registration of new transition
notifiers fail if fast_switch_enabled is set for at least one
policy.

Implement the ->fast_switch callback in the ACPI cpufreq driver
and make it set fast_switch_possible during policy initialization
as appropriate.
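
The mutual exclusion between fast switching and transition notifiers can be
modelled with two counters. Everything below is a hypothetical single-threaded
sketch: the names are invented, the -1 error value is a placeholder, and the
locking a real implementation would need is omitted.

```c
#include <stdbool.h>

static int demo_transition_notifiers; /* registered transition notifiers */
static int demo_fast_switch_users;    /* policies with fast switching on */

/* Enable fast switching only if the driver allows it and no notifier exists. */
static bool demo_enable_fast_switch(bool fast_switch_possible)
{
    if (!fast_switch_possible || demo_transition_notifiers > 0)
        return false;
    demo_fast_switch_users++;
    return true;
}

/* Drop one fast-switching user, if there is any. */
static void demo_disable_fast_switch(void)
{
    if (demo_fast_switch_users > 0)
        demo_fast_switch_users--;
}

/* Registration fails while any policy has fast switching enabled. */
static int demo_register_transition_notifier(void)
{
    if (demo_fast_switch_users > 0)
        return -1; /* placeholder error value */
    demo_transition_notifiers++;
    return 0;
}
```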

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-04-02 07:09:03 +0800
379480d82 cpufreq: Move governor symbols to cpufreq.h

Move definitions of symbols related to transition latency and
sampling rate to include/linux/cpufreq.h so they can be used by
(future) governors located outside of drivers/cpufreq/.

No functional changes.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-04-02 07:09:02 +0800
66893b6ac cpufreq: Move governor attribute set headers to cpufreq.h

Move definitions and function headers related to struct gov_attr_set
to include/linux/cpufreq.h so they can be used by (future) governors
located outside of drivers/cpufreq/.

No functional changes.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-04-02 07:09:02 +0800

11 Mar, 2016

2 commits

a5acbfbd7 Merge branch 'pm-cpufreq-governor' into pm-cpufreq

Rafael J. Wysocki
2016-03-11 03:46:03 +0800
adaf9fcd1 cpufreq: Move scheduler-related code to the sched directory

Create cpufreq.c under kernel/sched/ and move the cpufreq code
related to the scheduler to that file and to sched.h.

Redefine cpufreq_update_util() as a static inline function to avoid
function calls at its call sites in the scheduler code (as suggested
by Peter Zijlstra).

Also move the definition of struct update_util_data and declaration
of cpufreq_set_update_util_data() from include/linux/cpufreq.h to
include/linux/sched.h.

Signed-off-by: Rafael J. Wysocki
Acked-by: Peter Zijlstra (Intel)

Rafael J. Wysocki
2016-03-11 03:44:47 +0800

09 Mar, 2016

3 commits

242aa883a cpufreq: Remove 'policy->governor_enabled'

The entire sequence of events (like INIT/START or STOP/EXIT) for which
cpufreq_governor() is called, is guaranteed to be protected by
policy->rwsem now.

The additional checks that were added earlier (as we were forced to drop
policy->rwsem before calling cpufreq_governor() for EXIT event), aren't
required anymore.

On top of that, they weren't really sufficient. They only took care of
START/STOP events, but not INIT/EXIT, and the state machine was never
maintained properly by them.

Kill the unnecessary checks and policy->governor_enabled field.

Signed-off-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2016-03-09 21:41:12 +0800
68e80dae0 Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT"

Earlier, when the struct freq-attr was used to represent governor
attributes, the standard cpufreq show/store sysfs attribute callbacks
were applied to the governor tunable attributes and they always acquire
the policy->rwsem lock before carrying out the operation. That could
have resulted in an ABBA deadlock if governor tunable attributes are
removed under policy->rwsem while one of them is being accessed
concurrently (if sysfs attributes removal wins the race, it will wait
for the access to complete with policy->rwsem held while the attribute
callback will block on policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of the
->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
in cpufreq_set_policy(), but that opened up race conditions that had not
been possible with policy->rwsem held all the time.

The previous commit, "cpufreq: governor: New sysfs show/store callbacks
for governor tunables", fixed the original ABBA deadlock by adding new
governor specific show/store callbacks.

We don't have to drop rwsem around invocations of governor event
CPUFREQ_GOV_POLICY_EXIT anymore, and original fix can be reverted now.

Fixes: 955ef4833574 (cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT)
Signed-off-by: Viresh Kumar
Reported-by: Juri Lelli
Tested-by: Juri Lelli
Tested-by: Shilpasri G Bhat
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2016-03-09 21:40:59 +0800
34e2c555f cpufreq: Add mechanism for registering utilization update callbacks

Introduce a mechanism by which parts of the cpufreq subsystem
("setpolicy" drivers or the core) can register callbacks to be
executed from cpufreq_update_util() which is invoked by the
scheduler's update_load_avg() on CPU utilization changes.

This allows the "setpolicy" drivers to dispense with their timers
and do all of the computations they need and frequency/voltage
adjustments in the update_load_avg() code path, among other things.
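
The registration mechanism can be pictured as one callback slot per CPU. The
sketch below uses hypothetical demo_* names, a fixed CPU count and a
simplified callback signature; it leaves out the synchronization that
publishing such pointers needs in a real kernel.

```c
#include <stddef.h>

#define DEMO_NR_CPUS 4

/* Hook object; a user would typically embed it in a larger structure. */
struct demo_update_util_data {
    void (*func)(struct demo_update_util_data *data,
                 unsigned long util, unsigned long max);
};

/* One slot per CPU; NULL means no callback is registered for that CPU. */
static struct demo_update_util_data *demo_hooks[DEMO_NR_CPUS];

/* Register (or, with NULL, unregister) the hook for one CPU. */
static void demo_set_update_util_data(int cpu,
                                      struct demo_update_util_data *data)
{
    demo_hooks[cpu] = data;
}

/* Called on utilization changes; forwards to the registered hook, if any. */
static void demo_update_util(int cpu, unsigned long util, unsigned long max)
{
    struct demo_update_util_data *data = demo_hooks[cpu];

    if (data)
        data->func(data, util, max);
}

/* Example hook that just remembers the last utilization it was given. */
static unsigned long demo_last_util;

static void demo_record_util(struct demo_update_util_data *data,
                             unsigned long util, unsigned long max)
{
    (void)data;
    (void)max;
    demo_last_util = util;
}
```

A user registers a hook for a CPU, receives every utilization update for it,
and unregisters by storing NULL in the slot.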

The update_load_avg() changes were suggested by Peter Zijlstra.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar
Acked-by: Peter Zijlstra (Intel)
Acked-by: Ingo Molnar

Rafael J. Wysocki
2016-03-09 21:39:19 +0800

27 Feb, 2016

1 commit

34b087051 cpufreq: Simplify the cpufreq_for_each_valid_entry()

That macro uses an internal static inline function that is first
totally unnecessary and second hard to read, so simplify it and
get rid of that monster.

No functional changes.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-02-27 05:11:56 +0800