Eric Lee / smarc-fsl-linux-kernel

02 Mar, 2017

1 commit

55687da16 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/cpufreq.h> ... Browse Code »

We are going to split out of , which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2017-03-02 15:42:30 +0800

01 Feb, 2017

1 commit

7fb1327ee sched/cputime: Convert kcpustat to nsecs ... Browse Code »

Kernel CPU stats are stored in cputime_t which is an architecture
defined type, and hence a bit opaque and requiring accessors and mutators
for any operation.

Converting them to nsecs simplifies the code and is one step toward
the removal of cputime_t in the core code.

Signed-off-by: Frederic Weisbecker
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Stanislaw Gruszka
Cc: Wanpeng Li
Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2017-02-01 16:13:47 +0800

17 Nov, 2016

1 commit

00bfe0588 cpufreq: conservative: Decrease frequency faster for deferred updates ... Browse Code »

Conservative governor changes the CPU frequency in steps.
That means that if a CPU runs at max frequency, it will need several
sampling periods to return to min frequency when the workload
is finished.

If the update function that calculates the load and target frequency
is deferred, the governor might need even more time to decrease the
frequency.

This may have impact to power consumption and after all conservative
should decrease the frequency if there is no workload at every sampling
rate.

To resolve the above issue calculate the number of sampling periods
that the update is deferred. Considering that for each sampling period
conservative should drop the frequency by a freq_step because the
CPU was idle apply the proper subtraction to requested frequency.

Below, the kernel trace with and without this patch. First an
intensive workload is applied on a specific CPU. Then the workload
is removed and the CPU goes to idle.

WITHOUT

-0 [007] dN.. 620.329153: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556 [007] .... 620.350857: cpu_frequency: state=1700000 cpu_id=7
kworker/7:2-556 [007] .... 620.370856: cpu_frequency: state=1900000 cpu_id=7
kworker/7:2-556 [007] .... 620.390854: cpu_frequency: state=2100000 cpu_id=7
kworker/7:2-556 [007] .... 620.411853: cpu_frequency: state=2200000 cpu_id=7
kworker/7:2-556 [007] .... 620.432854: cpu_frequency: state=2400000 cpu_id=7
kworker/7:2-556 [007] .... 620.453854: cpu_frequency: state=2600000 cpu_id=7
kworker/7:2-556 [007] .... 620.494856: cpu_frequency: state=2900000 cpu_id=7
kworker/7:2-556 [007] .... 620.515856: cpu_frequency: state=3100000 cpu_id=7
kworker/7:2-556 [007] .... 620.536858: cpu_frequency: state=3300000 cpu_id=7
kworker/7:2-556 [007] .... 620.557857: cpu_frequency: state=3401000 cpu_id=7
-0 [007] d... 669.591363: cpu_idle: state=4 cpu_id=7
-0 [007] d... 669.591939: cpu_idle: state=4294967295 cpu_id=7
-0 [007] d... 669.591980: cpu_idle: state=4 cpu_id=7
-0 [007] dN.. 669.591989: cpu_idle: state=4294967295 cpu_id=7
...
-0 [007] d... 670.201224: cpu_idle: state=4 cpu_id=7
-0 [007] d... 670.221975: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556 [007] .... 670.222016: cpu_frequency: state=3300000 cpu_id=7
-0 [007] d... 670.222026: cpu_idle: state=4 cpu_id=7
-0 [007] d... 670.234964: cpu_idle: state=4294967295 cpu_id=7
...
-0 [007] d... 670.801251: cpu_idle: state=4 cpu_id=7
-0 [007] d... 671.236046: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556 [007] .... 671.236073: cpu_frequency: state=3100000 cpu_id=7
-0 [007] d... 671.236112: cpu_idle: state=4 cpu_id=7
-0 [007] d... 671.393437: cpu_idle: state=4294967295 cpu_id=7
...
-0 [007] d... 671.401277: cpu_idle: state=4 cpu_id=7
-0 [007] d... 671.404083: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556 [007] .... 671.404111: cpu_frequency: state=2900000 cpu_id=7
-0 [007] d... 671.404125: cpu_idle: state=4 cpu_id=7
-0 [007] d... 671.404974: cpu_idle: state=4294967295 cpu_id=7
...
-0 [007] d... 671.501180: cpu_idle: state=4 cpu_id=7
-0 [007] d... 671.995414: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556 [007] .... 671.995459: cpu_frequency: state=2800000 cpu_id=7
-0 [007] d... 671.995469: cpu_idle: state=4 cpu_id=7
-0 [007] d... 671.996287: cpu_idle: state=4294967295 cpu_id=7
...
-0 [007] d... 672.001305: cpu_idle: state=4 cpu_id=7
-0 [007] d... 672.078374: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556 [007] .... 672.078410: cpu_frequency: state=2600000 cpu_id=7
-0 [007] d... 672.078419: cpu_idle: state=4 cpu_id=7
-0 [007] d... 672.158020: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556 [007] .... 672.158040: cpu_frequency: state=2400000 cpu_id=7
-0 [007] d... 672.158044: cpu_idle: state=4 cpu_id=7
-0 [007] d... 672.160038: cpu_idle: state=4294967295 cpu_id=7
...
-0 [007] d... 672.234557: cpu_idle: state=4 cpu_id=7
-0 [007] d... 672.237121: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556 [007] .... 672.237174: cpu_frequency: state=2100000 cpu_id=7
-0 [007] d... 672.237186: cpu_idle: state=4 cpu_id=7
-0 [007] d... 672.237778: cpu_idle: state=4294967295 cpu_id=7
...
-0 [007] d... 672.267902: cpu_idle: state=4 cpu_id=7
-0 [007] d... 672.269860: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556 [007] .... 672.269906: cpu_frequency: state=1900000 cpu_id=7
-0 [007] d... 672.269914: cpu_idle: state=4 cpu_id=7
-0 [007] d... 672.271902: cpu_idle: state=4294967295 cpu_id=7
...
-0 [007] d... 672.751342: cpu_idle: state=4 cpu_id=7
-0 [007] d... 672.823056: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556 [007] .... 672.823095: cpu_frequency: state=1600000 cpu_id=7

WITH

-0 [007] dN.. 4380.928009: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399 [007] .... 4380.949767: cpu_frequency: state=2000000 cpu_id=7
kworker/7:2-399 [007] .... 4380.969765: cpu_frequency: state=2200000 cpu_id=7
kworker/7:2-399 [007] .... 4381.009766: cpu_frequency: state=2500000 cpu_id=7
kworker/7:2-399 [007] .... 4381.029767: cpu_frequency: state=2600000 cpu_id=7
kworker/7:2-399 [007] .... 4381.049769: cpu_frequency: state=2800000 cpu_id=7
kworker/7:2-399 [007] .... 4381.069769: cpu_frequency: state=3000000 cpu_id=7
kworker/7:2-399 [007] .... 4381.089771: cpu_frequency: state=3100000 cpu_id=7
kworker/7:2-399 [007] .... 4381.109772: cpu_frequency: state=3400000 cpu_id=7
kworker/7:2-399 [007] .... 4381.129773: cpu_frequency: state=3401000 cpu_id=7
-0 [007] d... 4428.226159: cpu_idle: state=1 cpu_id=7
-0 [007] d... 4428.226176: cpu_idle: state=4294967295 cpu_id=7
-0 [007] d... 4428.226181: cpu_idle: state=4 cpu_id=7
-0 [007] d... 4428.227177: cpu_idle: state=4294967295 cpu_id=7
...
-0 [007] d... 4428.551640: cpu_idle: state=4 cpu_id=7
-0 [007] d... 4428.649239: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399 [007] .... 4428.649268: cpu_frequency: state=2800000 cpu_id=7
-0 [007] d... 4428.649278: cpu_idle: state=4 cpu_id=7
-0 [007] d... 4428.689856: cpu_idle: state=4294967295 cpu_id=7
...
-0 [007] d... 4428.799542: cpu_idle: state=4 cpu_id=7
-0 [007] d... 4428.801683: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399 [007] .... 4428.801748: cpu_frequency: state=1700000 cpu_id=7
-0 [007] d... 4428.801761: cpu_idle: state=4 cpu_id=7
-0 [007] d... 4428.806545: cpu_idle: state=4294967295 cpu_id=7
...
-0 [007] d... 4429.051880: cpu_idle: state=4 cpu_id=7
-0 [007] d... 4429.086240: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399 [007] .... 4429.086293: cpu_frequency: state=1600000 cpu_id=7

Without the patch the CPU dropped to min frequency after 3.2s
With the patch applied the CPU dropped to min frequency after 0.86s

Signed-off-by: Stratos Karafotis
Acked-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Stratos Karafotis
2016-11-17 06:15:56 +0800

11 Nov, 2016

1 commit

26f0dbc9a cpufreq: governor: Don't use 'timer' keyword ... Browse Code »

The earlier implementation of governors used background timers and so
functions, mutex, etc had 'timer' keyword in their names.

But that's not true anymore. Replace 'timer' with 'update', as those
functions, variables are based around updates to frequency.

Signed-off-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2016-11-11 08:48:33 +0800

17 Aug, 2016

1 commit

58919e83c cpufreq / sched: Pass flags to cpufreq_update_util() ... Browse Code »

It is useful to know the reason why cpufreq_update_util() has just
been called and that can be passed as flags to cpufreq_update_util()
and to the ->func() callback in struct update_util_data. However,
doing that in addition to passing the util and max arguments they
already take would be clumsy, so avoid it.

Instead, use the observation that the schedutil governor is part
of the scheduler proper, so it can access scheduler data directly.
This allows the util and max arguments of cpufreq_update_util()
and the ->func() callback in struct update_util_data to be replaced
with a flags one, but schedutil has to be modified to follow.

Thus make the schedutil governor obtain the CFS utilization
information from the scheduler and use the "RT" and "DL" flags
instead of the special utilization value of ULONG_MAX to track
updates from the RT and DL sched classes. Make it non-modular
too to avoid having to export scheduler variables to modules at
large.

Next, update all of the other users of cpufreq_update_util()
and the ->func() callback in struct update_util_data accordingly.

Suggested-by: Peter Zijlstra
Signed-off-by: Rafael J. Wysocki
Acked-by: Peter Zijlstra (Intel)
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-08-17 04:14:55 +0800

10 Jun, 2016

1 commit

f6709b8aa cpufreq: governor: Drop gov_cancel_work() ... Browse Code »

There's no reason for gov_cancel_work() to exist at all, as it only
has one caller and the only thing done by that caller is to invoke
gov_cancel_work().

Accordingly, drop gov_cancel_work() and move its contents to the
caller.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-06-10 05:39:29 +0800

03 Jun, 2016

4 commits

9a15fb2c7 cpufreq: Drop the 'initialized' field from struct cpufreq_governor ... Browse Code »

The 'initialized' field in struct cpufreq_governor is only used by
the conservative governor (as a usage counter) and the way that
happens is far from straightforward and arguably incorrect.

Namely, the value of 'initialized' is checked by
cpufreq_dbs_governor_init() and cpufreq_dbs_governor_exit() and
the results of those checks are passed (as the second argument) to
the ->init() and ->exit() callbacks in struct dbs_governor. Those
callbacks are only implemented by the ondemand and conservative
governors and ondemand doesn't use their second argument at all.
In turn, the conservative governor uses it to decide whether or not
to either register or unregister a transition notifier.

That whole mechanism is not only unnecessarily convoluted, but also
racy, because the 'initialized' field of struct cpufreq_governor is
updated in cpufreq_init_governor() and cpufreq_exit_governor() under
policy->rwsem which doesn't help if one of these functions is run
twice in parallel for different policies (which isn't impossible in
principle), for example.

Instead of it, add a proper usage counter to the conservative
governor and update it from cs_init() and cs_exit() which is
guaranteed to be non-racy, as those functions are only called
under gov_dbs_data_mutex which is global.

With that in place, drop the 'initialized' field from struct
cpufreq_governor as it is not used any more.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-06-03 05:24:39 +0800
bf2be2de8 cpufreq: governor: Create cpufreq_policy_apply_limits() ... Browse Code »

Create a new helper to avoid code duplication across governors.

Signed-off-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2016-06-03 05:24:39 +0800
666f4ccc5 cpufreq: governor: Remove unnecessary bits from print message ... Browse Code »

pr_*() helpers already prefix the print messages with
"cpufreq_governor:" and similar details aren't required in the actual
message.

For example, the print message getting fixed looks like this before this
patch:

cpufreq_governor: cpufreq: Governor initialization failed (dbs_data kobject init error 0)

Signed-off-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2016-06-03 05:24:38 +0800
e788892ba cpufreq: governor: Get rid of governor events ... Browse Code »

The design of the cpufreq governor API is not very straightforward,
as struct cpufreq_governor provides only one callback to be invoked
from different code paths for different purposes. The purpose it is
invoked for is determined by its second "event" argument, causing it
to act as a "callback multiplexer" of sorts.

Unfortunately, that leads to extra complexity in governors, some of
which implement the ->governor() callback as a switch statement
that simply checks the event argument and invokes a separate function
to handle that specific event.

That extra complexity can be eliminated by replacing the all-purpose
->governor() callback with a family of callbacks to carry out specific
governor operations: initialization and exit, start and stop and policy
limits updates. That also turns out to reduce the code size too, so
do it.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-06-03 05:24:15 +0800

06 May, 2016

1 commit

9485e4ca0 cpufreq: governor: Fix handling of special cases in dbs_update() ... Browse Code »

As reported in KBZ 69821:

"With CONFIG_HZ_PERIODIC=y cpu stays at the lowest frequcency 800MHz
even if usage goes to 100%, frequency does not scale up, the governor
in use is ondemand. Neither works conservative. Performance and
userspace governors work as expected.

With CONFIG_NO_HZ_IDLE or CONFIG_NO_HZ_FULL cpu scales up with ondemand
as expected."

Analysis carried out by Chen Yu leads to the conclusion that the
observed issue is due to idle_time in dbs_update() representing a
negative number in which case the function will return 0 as the load
(unless load is greater than 0 for another CPU sharing the policy),
although that need not be the right choice.

Indeed, idle_time representing a negative number means that during
the last sampling interval the CPU was almost 100% busy on the rough
average, so 100 should be returned as the load in that case.

Modify the code accordingly and rearrange it to clarify the handling
of all of the special cases in it. While at it, also avoid returning
zero as the load if time_elapsed is 0 (it doesn't really make sense
to return 0 then).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=69821
Tested-by: Chen Yu
Tested-by: Timo Valtoaho
Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-05-06 20:24:23 +0800

28 Apr, 2016

1 commit

b4f4b4b37 cpufreq: governor: Change confusing struct field and variable names ... Browse Code »

The name of the prev_cpu_wall field in struct cpu_dbs_info is
confusing, because it doesn't represent wall time, but the previous
update time as returned by get_cpu_idle_time() (that may be the
current value of jiffies_64 in some cases, for example).

Moreover, the names of some related variables in dbs_update() take
that confusion further.

Rename all of those things to make their names reflect the purpose
more accurately. While at it, drop unnecessary parens from one of
the updated expressions.

No functional changes.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar
Acked-by: Chen Yu

Rafael J. Wysocki
2016-04-28 21:10:08 +0800

25 Apr, 2016

3 commits

ba1ca654f cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start() ... Browse Code »

The way cpufreq_governor_start() initializes j_cdbs->prev_load is
questionable.

First off, j_cdbs->prev_cpu_wall used as a denominator in the
computation may be zero. The case this happens is when
get_cpu_idle_time_us() returns -1 and get_cpu_idle_time_jiffy()
used to return that number is called exactly at the jiffies_64
wrap time. It is rather hard to trigger that error, but it is not
impossible and it will just crash the kernel then.

Second, j_cdbs->prev_load is computed as the average load during
the entire time since the system started and it may not reflect the
load in the previous sampling period (as it is expected to).
That doesn't play well with the way dbs_update() uses that value.
Namely, if the update time delta (wall_time) happens do be greater
than twice the sampling rate on the first invocation of it, the
initial value of j_cdbs->prev_load (which may be completely off) will
be returned to the caller as the current load (unless it is equal to
zero and unless another CPU sharing the same policy object has a
greater load value).

For this reason, notice that the prev_load field of struct cpu_dbs_info
is only used by dbs_update() and only in that one place, so if
cpufreq_governor_start() is modified to always initialize it to 0,
it will make dbs_update() always compute the actual load first time
it checks the update time delta against the doubled sampling rate
(after initialization) and there won't be any side effects of it.

Consequently, modify cpufreq_governor_start() as described.

Fixes: 18b46abd0009 (cpufreq: governor: Be friendly towards latency-sensitive bursty workloads)
Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-04-25 22:21:34 +0800
1cbc99dfe Merge back cpufreq changes for v4.7. Browse Code »

Rafael J. Wysocki
2016-04-25 21:44:01 +0800
94862a62d Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC" ... Browse Code »

Revert commit 0df35026c6a5 (cpufreq: governor: Fix negative idle_time
when configured with CONFIG_HZ_PERIODIC) that introduced a regression
by causing the ondemand cpufreq governor to misbehave for
CONFIG_TICK_CPU_ACCOUNTING unset (the frequency goes up to the max at
one point and stays there indefinitely).

The revert takes subsequent modifications of the code in question into
account.

Fixes: 0df35026c6a5 (cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=115261
Reported-and-tested-by: Timo Valtoaho
Cc: 4.5+ # 4.5+
Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-04-25 08:39:20 +0800

02 Apr, 2016

3 commits

2d0c58ad6 cpufreq: governor: Move abstract gov_attr_set code to seperate file ... Browse Code »

Move abstract code related to struct gov_attr_set to a separate (new)
file so it can be shared with (future) goverernors that won't share
more code with "ondemand" and "conservative".

No intentional functional changes.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-04-02 07:09:01 +0800
0dd3c1d67 cpufreq: governor: New data type for management part of dbs_data ... Browse Code »

In addition to fields representing governor tunables, struct dbs_data
contains some fields needed for the management of objects of that
type. As it turns out, that part of struct dbs_data may be shared
with (future) governors that won't use the common code used by
"ondemand" and "conservative", so move it to a separate struct type
and modify the code using struct dbs_data to follow.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-04-02 07:09:00 +0800
0bed612be cpufreq: sched: Helpers to add and remove update_util hooks ... Browse Code »

Replace the single helper for adding and removing cpufreq utilization
update hooks, cpufreq_set_update_util_data(), with a pair of helpers,
cpufreq_add_update_util_hook() and cpufreq_remove_update_util_hook(),
and modify the users of cpufreq_set_update_util_data() accordingly.

With the new helpers, the code using them doesn't need to worry
about the internals of struct update_util_data and in particular
it doesn't need to worry about populating the func field in it
properly upfront.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar
Acked-by: Peter Zijlstra (Intel)

Rafael J. Wysocki
2016-04-02 07:08:43 +0800

23 Mar, 2016

1 commit

539a4c424 cpufreq: governor: Always schedule work on the CPU running update ... Browse Code »

Modify dbs_irq_work() to always schedule the process-context work
on the current CPU which also ran the dbs_update_util_handler()
that the irq_work being handled came from.

This causes the entire frequency update handling (involving the
"ondemand" or "conservative" governors) to be carried out by the
CPU whose frequency is to be updated and reduces the overall amount
of inter-CPU noise related to cpufreq.

Signed-off-by: Rafael J. Wysocki

Rafael J. Wysocki
2016-03-23 06:13:36 +0800

11 Mar, 2016

1 commit

adaf9fcd1 cpufreq: Move scheduler-related code to the sched directory ... Browse Code »

Create cpufreq.c under kernel/sched/ and move the cpufreq code
related to the scheduler to that file and to sched.h.

Redefine cpufreq_update_util() as a static inline function to avoid
function calls at its call sites in the scheduler code (as suggested
by Peter Zijlstra).

Also move the definition of struct update_util_data and declaration
of cpufreq_set_update_util_data() from include/linux/cpufreq.h to
include/linux/sched.h.

Signed-off-by: Rafael J. Wysocki
Acked-by: Peter Zijlstra (Intel)

Rafael J. Wysocki
2016-03-11 03:44:47 +0800

09 Mar, 2016

20 commits

08f511fd4 cpufreq: Reduce cpufreq_update_util() overhead a bit ... Browse Code »

Use the observation that cpufreq_update_util() is only called
by the scheduler with rq->lock held, so the callers of
cpufreq_set_update_util_data() can use synchronize_sched()
instead of synchronize_rcu() to wait for cpufreq_update_util()
to complete. Moreover, if they are updated to do that,
rcu_read_(un)lock() calls in cpufreq_update_util() might be
replaced with rcu_read_(un)lock_sched(), respectively, but
those aren't really necessary, because the scheduler calls
that function from RCU-sched read-side critical sections
already.

In addition to that, if cpufreq_set_update_util_data() checks
the func field in the struct update_util_data before setting
the per-CPU pointer to it, the data->func check may be dropped
from cpufreq_update_util() as well.

Make the above changes to reduce the overhead from
cpufreq_update_util() in the scheduler paths invoking it
and to make the cleanup after removing its callbacks less
heavy-weight somewhat.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar
Acked-by: Peter Zijlstra (Intel)

Rafael J. Wysocki
2016-03-09 22:07:58 +0800
f737236b1 cpufreq: governor: Drop unnecessary checks from show() and store() ... Browse Code »

The show() and store() routines in the cpufreq-governor core don't need
to check if the struct governor_attr they want to use really provides
the callbacks they need as expected (if that's not the case, it means a
bug in the code anyway), so change them to avoid doing that.

Also change the error value to -EBUSY, if the governor is getting
removed and we aren't allowed to store any more changes.

Signed-off-by: Viresh Kumar
Signed-off-by: Rafael J. Wysocki

Viresh Kumar
2016-03-09 21:41:11 +0800
27de34823 cpufreq: governor: Fix race in dbs_update_util_handler() ... Browse Code »

There is a scenario that may lead to undesired results in
dbs_update_util_handler(). Namely, if two CPUs sharing a policy
enter the funtion at the same time, pass the sample delay check
and then one of them is stalled until dbs_work_handler() (queued
up by the other CPU) clears the work counter, it may update the
work counter and queue up another work item prematurely.

To prevent that from happening, use the observation that the CPU
queuing up a work item in dbs_update_util_handler() updates the
last sample time. This means that if another CPU was stalling after
passing the sample delay check and now successfully updated the work
counter as a result of the race described above, it will see the new
value of the last sample time which is different from what it used in
the sample delay check before. If that happens, the sample delay
check passed previously is not valid any more, so the CPU should not
continue.

Fixes: f17cbb53783c (cpufreq: governor: Avoid atomic operations in hot paths)
Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:10 +0800
94ab5e030 cpufreq: governor: Make gov_set_update_util() static ... Browse Code »

The gov_set_update_util() routine is only used internally by the
common governor code and it doesn't need to be exported, so make
it static.

No functional changes.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:10 +0800
1112e9d83 cpufreq: governor: Narrow down the dbs_data_mutex coverage ... Browse Code »

Since cpufreq_governor_dbs() is now always called with policy->rwsem
held, it cannot be executed twice in parallel for the same policy.
Thus it is not necessary to hold dbs_data_mutex around the invocations
of cpufreq_governor_start/stop/limits() from it as those functions
never modify any data that can be shared between different policies.

However, cpufreq_governor_dbs() may be executed twice in parallal
for different policies using the same gov->gdbs_data object and
dbs_data_mutex is still necessary to protect that object against
concurrent updates.

For this reason, narrow down the dbs_data_mutex locking to
cpufreq_governor_init/exit() where it is needed and rename the
mutex to gov_dbs_data_mutex to reflect its purpose.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:10 +0800
e3f5ed939 cpufreq: governor: Make dbs_data_mutex static ... Browse Code »

That mutex is only used by cpufreq_governor_dbs() and it doesn't
need to be exported to modules, so make it static and drop the
export incantation.

No functional changes.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:09 +0800
8c8f77fd0 cpufreq: governor: Move per-CPU data to the common code ... Browse Code »

After previous changes there is only one piece of code in the
ondemand governor making references to per-CPU data structures,
but it can be easily modified to avoid doing that, so modify it
accordingly and move the definition of per-CPU data used by the
ondemand and conservative governors to the common code. Next,
change that code to access the per-CPU data structures directly
rather than via a governor callback.

This causes the ->get_cpu_cdbs governor callback to become
unnecessary, so drop it along with the macro and function
definitions related to it.

Finally, drop the definitions of struct od_cpu_dbs_info_s and
struct cs_cpu_dbs_info_s that aren't necessary any more.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:09 +0800
7d5a9956a cpufreq: governor: Make governor private data per-policy ... Browse Code »

Some fields in struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s
are only used for a limited set of CPUs. Namely, if a policy is
shared between multiple CPUs, those fields will only be used for one
of them (policy->cpu). This means that they really are per-policy
rather than per-CPU and holding room for them in per-CPU data
structures is generally wasteful. Also moving those fields into
per-policy data structures will allow some significant simplifications
to be made going forward.

For this reason, introduce struct cs_policy_dbs_info and
struct od_policy_dbs_info to hold those fields. Define each of the
new structures as an extension of struct policy_dbs_info (such that
struct policy_dbs_info is embedded in each of them) and introduce
new ->alloc and ->free governor callbacks to allocate and free
those structures, respectively, such that ->alloc() will return
a pointer to the struct policy_dbs_info embedded in the allocated
data structure and ->free() will take that pointer as its argument.

With that, modify the code accessing the data fields in question
in per-CPU data objects to look for them in the new structures
via the struct policy_dbs_info pointer available to it and drop
them from struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:08 +0800
a33cce1c6 cpufreq: governor: Fix CPU load information updates via ->store ... Browse Code »

The ->store() callbacks of some tunable sysfs attributes of the
ondemand and conservative governors trigger immediate updates of
the CPU load information for all CPUs "governed" by the given
dbs_data by walking the cpu_dbs_info structures for all online
CPUs in the system and updating them.

This is questionable for two reasons. First, it may lead to a lot of
extra overhead on a system with many CPUs if the given dbs_data is
only associated with a few of them. Second, if governor tunables are
per-policy, the CPUs associated with the other sets of governor
tunables should not be updated.

To address this issue, use the observation that in all of the places
in question the update operation may be carried out in the same way
(because all of the tunables involved are now located in struct
dbs_data and readily available to the common code) and make the
code in those places invoke the same (new) helper function that
will carry out the update correctly.

That new function always checks the ignore_nice_load tunable value
and updates the CPUs' prev_cpu_nice data fields if that's set, which
wasn't done by the original code in store_io_is_busy(), but it
should have been done in there too.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:08 +0800
702c9e542 cpufreq: governor: Add a ->start callback for governors ... Browse Code »

To avoid having to check the governor type explicitly in the common
code in order to initialize data structures specific to the governor
type properly, add a ->start callback to struct dbs_governor and
use it to initialize those data structures for the ondemand and
conservative governors.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:07 +0800
8847e038c cpufreq: governor: Move io_is_busy to struct dbs_data ... Browse Code »

The io_is_busy governor tunable is only used by the ondemand governor
and is located in the ondemand-specific data structure, but it is
looked at by the common governor code that has to do ugly things to
get to that value, so move it to struct dbs_data and modify ondemand
accordingly.

Since the conservative governor never touches that field, it will
be always 0 for that governor and it won't have any effect on the
results of computations in that case.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:06 +0800
574ef14d5 cpufreq: governor: Close dbs_data update race condition ... Browse Code »

It is possible for a dbs_data object to be updated after its
usage counter has become 0. That may happen if governor_store()
runs (via a govenor tunable sysfs attribute write) in parallel
with cpufreq_governor_exit() called for the last cpufreq policy
associated with the dbs_data in question. In that case, if
governor_store() acquires dbs_data->mutex right after
cpufreq_governor_exit() has released it, the ->store() callback
invoked by it may operate on dbs_data with no users. Although
sysfs will cause the kobject_put() in cpufreq_governor_exit() to
block until governor_store() has returned, that situation may
lead to some unexpected results, depending on the implementation
of the ->store callback, and therefore it should be avoided.

To that end, modify governor_store() to check the dbs_data's
usage count before invoking the ->store() callback and return
an error if it is 0 at that point.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:06 +0800
07aa4402a cpufreq: governor: Use microseconds in sample delay computations ... Browse Code »

Do not convert microseconds to jiffies and the other way around
in governor computations related to the sampling rate and sample
delay and drop delay_for_sampling_rate() which isn't of any use
then.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:05 +0800
57dc3bcd4 cpufreq: governor: Move rate_mult to struct policy_dbs ... Browse Code »

The rate_mult field in struct od_cpu_dbs_info_s is used by the code
shared with the conservative governor and to access it that code
has to do an ugly governor type check. However, first of all it
is ever only used for policy->cpu, so it is per-policy rather than
per-CPU and second, it is initialized to 1 by cpufreq_governor_start(),
so if the conservative governor never modifies it, it will have no
effect on the results of any computations.

For these reasons, move rate_mult to struct policy_dbs_info (as a
common field).

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:04 +0800
78347cdb8 cpufreq: governor: Reset sample delay in store_sampling_rate() ... Browse Code »

If store_sampling_rate() updates the sample delay when the ondemand
governor is in the middle of its high/low dance (OD_SUB_SAMPLE sample
type is set), the governor will still do the bottom half of the
previous sample which may take too much time.

To prevent that from happening, change store_sampling_rate() to always
reset the sample delay to 0 which also is consistent with the new
behavior of cpufreq_governor_limits().

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:04 +0800
4cccf7555 cpufreq: governor: Get rid of the ->gov_check_cpu callback ... Browse Code »

The way the ->gov_check_cpu governor callback is used by the ondemand
and conservative governors is not really straightforward. Namely, the
governor calls dbs_check_cpu() that updates the load information for
the policy and the invokes ->gov_check_cpu() for the governor.

To get rid of that entanglement, notice that cpufreq_governor_limits()
doesn't need to call dbs_check_cpu() directly. Instead, it can simply
reset the sample delay to 0 which will cause a sample to be taken
immediately. The result of that is practically equivalent to calling
dbs_check_cpu() except that it will trigger a full update of governor
internal state and not just the ->gov_check_cpu() part.

Following that observation, make cpufreq_governor_limits() reset
the sample delay and turn dbs_check_cpu() into a function that will
simply evaluate the load and return the result called dbs_update().

That function can now be called by governors from the routines that
previously were pointed to by ->gov_check_cpu and those routines
can be called directly by each governor instead of dbs_check_cpu().
This way ->gov_check_cpu becomes unnecessary, so drop it.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:04 +0800
57eb832f9 cpufreq: governor: Clean up load-related computations ... Browse Code »

Clean up some load-related computations in dbs_check_cpu() and
cpufreq_governor_start() to get rid of unnecessary operations and
type casts and make the code easier to read.

No functional changes.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:03 +0800
679b8fe43 cpufreq: governor: Fix nice contribution computation in dbs_check_cpu() ... Browse Code »

The contribution of the CPU nice time to the idle time in dbs_check_cpu()
is computed in a bogus way, as the code may subtract current and previous
nice values for different CPUs.

That doesn't matter for cases when cpufreq policies are not shared,
but may lead to problems otherwise.

Fix the computation and simplify it to avoid taking unnecessary steps.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:03 +0800
e4db2813d cpufreq: governor: Avoid atomic operations in hot paths ... Browse Code »

Rework the handling of work items by dbs_update_util_handler() and
dbs_work_handler() so the former (which is executed in scheduler
paths) only uses atomic operations when absolutely necessary. That
is, when the policy is shared and dbs_update_util_handler() has
already decided that this is the time to queue up a work item.

In particular, this avoids the atomic ops entirely on platforms where
policy objects are never shared.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:03 +0800
f62b93740 cpufreq: governor: Simplify gov_cancel_work() slightly ... Browse Code »

The atomic work counter incrementation in gov_cancel_work() is not
necessary any more, because work items won't be queued up after
gov_clear_update_util() anyway, so drop it along with the comment
about how it may be missed by the gov_clear_update_util().

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar

Rafael J. Wysocki
2016-03-09 21:41:02 +0800