10 Aug, 2019

1 commit

  • To avoid reducing the frequency of a CPU prematurely, we skip reducing
    the frequency if the CPU had been busy recently.

    This should not be done when the limits of the policy are changed, for
    example due to thermal throttling. We should always get the frequency
    within the new limits as soon as possible.

    Trying to fix this by using only one flag, i.e. need_freq_update, can
    lead to a race condition where the flag gets cleared without forcing us
    to change the frequency at least once. This patch therefore introduces
    a second flag to avoid that race condition.
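    A minimal sketch of the two-flag scheme (field and function names
    follow the schedutil sources, but treat this as an illustration rather
    than the exact patch): limits_changed is set from the cpufreq limits
    callback and folded into need_freq_update on the next update, which
    bypasses both the rate limit and the "CPU was recently busy" shortcut.

    static bool sugov_should_update_freq(struct sugov_policy *sg_policy,
                                         u64 time)
    {
            if (unlikely(sg_policy->limits_changed)) {
                    sg_policy->limits_changed = false;
                    sg_policy->need_freq_update = true;
                    return true;    /* bypass the rate limit */
            }

            return time - sg_policy->last_freq_update_time >=
                   sg_policy->freq_update_delay_ns;
    }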

    Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
    Cc: v4.18+ # v4.18+
    Reported-by: Doug Smythies
    Tested-by: Doug Smythies
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

25 Jun, 2019

3 commits

  • The Energy Aware Scheduler (EAS) estimates the energy impact of waking
    up a task on a given CPU. This estimation is based on:

    a) an (active) power consumption defined for each CPU frequency
    b) an estimation of which frequency will be used on each CPU
    c) an estimation of the busy time (utilization) of each CPU

    Utilization clamping can affect both b) and c).

    A CPU is expected to run:

    - at a higher-than-required frequency, but for a shorter time, in case
    its estimated utilization is smaller than the minimum utilization
    enforced by uclamp
    - at a lower-than-required frequency, but for a longer time, in case
    its estimated utilization is bigger than the maximum utilization
    enforced by uclamp

    While compute_energy() already accounts for clamping effects on busy
    time, the clamping effects on frequency selection are currently ignored.

    Fix it by considering how CPU clamp values will be affected by a
    task waking up and being RUNNABLE on that CPU.

    Do that by refactoring schedutil_freq_util() to take an additional
    task_struct* which allows EAS to evaluate the impact on clamp values of
    a task being eventually queued in a CPU. Clamp values are applied to the
    RT+CFS utilization only when a FREQUENCY_UTIL is required by
    compute_energy().

    Do note that switching from ENERGY_UTIL to FREQUENCY_UTIL in the
    computation of the cpu_util signal implies that we are more likely to
    estimate the highest OPP when a RT task is running in another CPU of
    the same performance domain. This can have an impact on energy
    estimation but:

    - it's not easy to say which approach is better, since it depends on
    the use case
    - the original approach could still be obtained by setting a smaller
    task-specific util_min whenever required

    While we are at it:

    - rename schedutil_freq_util() to schedutil_cpu_util(),
    since it's not only used for frequency selection.
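    The resulting interface looks roughly as follows (abbreviated sketch;
    the exact call sites live in kernel/sched/cpufreq_schedutil.c and
    kernel/sched/fair.c):

    unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs,
                                     unsigned long max,
                                     enum schedutil_type type,
                                     struct task_struct *p);

    /* schedutil: no task argument, only the CPU's own clamp values */
    util = schedutil_cpu_util(cpu, util_cfs, max, FREQUENCY_UTIL, NULL);

    /* EAS, from compute_energy(): include the waking task's clamps */
    util = schedutil_cpu_util(cpu, util_cfs, max, FREQUENCY_UTIL, tsk);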

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J. Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-12-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • Each time a frequency update is required via schedutil, a frequency is
    selected to (possibly) satisfy the utilization reported by each
    scheduling class and irqs. However, when utilization clamping is in use,
    the frequency selection should consider userspace utilization clamping
    hints. This will allow, for example, to:

    - boost tasks which are directly affecting the user experience
    by running them at least at a minimum "requested" frequency

    - cap low priority tasks not directly affecting the user experience
    by running them only up to a maximum "allowed" frequency

    These constraints are meant to support a per-task based tuning of the
    frequency selection thus supporting a fine grained definition of
    performance boosting vs energy saving strategies in kernel space.

    Add support to clamp the utilization of RUNNABLE FAIR and RT tasks
    within the boundaries defined by their aggregated utilization clamp
    constraints.

    Do that by considering the max(min_util, max_util) to give boosted tasks
    the performance they need even when they happen to be co-scheduled with
    other capped tasks.
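    A sketch of that aggregation rule (clamp_util() is a hypothetical
    helper name used here for illustration):

    static unsigned long clamp_util(unsigned long util,
                                    unsigned long min_util,
                                    unsigned long max_util)
    {
            /*
             * max(min_util, max_util): when a boosted task is
             * co-scheduled with capped tasks, the boost wins.
             */
            if (min_util > max_util)
                    max_util = min_util;

            return clamp(util, min_util, max_util);
    }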

    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alessio Balsini
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J. Wysocki
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: https://lkml.kernel.org/r/20190621084217.8167-10-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • The 'struct sched_domain *sd' parameter to arch_scale_cpu_capacity() is
    unused since commit:

    765d0af19f5f ("sched/topology: Remove the ::smt_gain field from 'struct sched_domain'")

    Remove it.
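    The prototype change, for reference:

    /* before */
    unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu);
    /* after */
    unsigned long arch_scale_cpu_capacity(int cpu);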

    Signed-off-by: Vincent Guittot
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Viresh Kumar
    Reviewed-by: Valentin Schneider
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: gregkh@linuxfoundation.org
    Cc: linux@armlinux.org.uk
    Cc: quentin.perret@arm.com
    Cc: rafael@kernel.org
    Link: https://lkml.kernel.org/r/1560783617-5827-1-git-send-email-vincent.guittot@linaro.org
    Signed-off-by: Ingo Molnar

    Vincent Guittot
     

08 May, 2019

1 commit

  • Pull driver core/kobject updates from Greg KH:
    "Here is the "big" set of driver core patches for 5.2-rc1

    There are a number of ACPI patches in here as well, as Rafael said
    they should go through this tree due to the driver core changes they
    required. They have all been acked by the ACPI developers.

    There are also a number of small subsystem-specific changes in here,
    due to some changes to the kobject core code. Those too have all been
    acked by the various subsystem maintainers.

    As for content, it's pretty boring outside of the ACPI changes:
    - spdx cleanups
    - kobject documentation updates
    - default attribute groups for kobjects
    - other minor kobject/driver core fixes

    All have been in linux-next for a while with no reported issues"

    * tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (47 commits)
    kobject: clean up the kobject add documentation a bit more
    kobject: Fix kernel-doc comment first line
    kobject: Remove docstring reference to kset
    firmware_loader: Fix a typo ("syfs" -> "sysfs")
    kobject: fix dereference before null check on kobj
    Revert "driver core: platform: Fix the usage of platform device name(pdev->name)"
    init/config: Do not select BUILD_BIN2C for IKCONFIG
    Provide in-kernel headers to make extending kernel easier
    kobject: Improve doc clarity kobject_init_and_add()
    kobject: Improve docs for kobject_add/del
    driver core: platform: Fix the usage of platform device name(pdev->name)
    livepatch: Replace klp_ktype_patch's default_attrs with groups
    cpufreq: schedutil: Replace default_attrs field with groups
    padata: Replace padata_attr_type default_attrs field with groups
    irqdesc: Replace irq_kobj_type's default_attrs field with groups
    net-sysfs: Replace ktype default_attrs field with groups
    block: Replace all ktype default_attrs with groups
    samples/kobject: Replace foo_ktype's default_attrs field with groups
    kobject: Add support for default attribute groups to kobj_type
    driver core: Postpone DMA tear-down until after devres release for probe failure
    ...

    Linus Torvalds
     

07 May, 2019

1 commit

  • Pull power management updates from Rafael Wysocki:
    "These fix the (Intel-specific) Performance and Energy Bias Hint (EPB)
    handling and expose it to user space via sysfs, fix and clean up
    several cpufreq drivers, add support for two new chips to the qoriq
    cpufreq driver, fix, simplify and clean up the cpufreq core and the
    schedutil governor, add support for "CPU" domains to the generic power
    domains (genpd) framework and provide low-level PSCI firmware support
    for that feature, fix the exynos cpuidle driver and fix a couple of
    issues in the devfreq subsystem and clean it up.

    Specifics:

    - Fix the handling of Performance and Energy Bias Hint (EPB) on Intel
    processors and expose it to user space via sysfs to avoid having to
    access it through the generic MSR I/F (Rafael Wysocki).

    - Improve the handling of global turbo changes made by the platform
    firmware in the intel_pstate driver (Rafael Wysocki).

    - Convert some slow-path static_cpu_has() callers to boot_cpu_has()
    in cpufreq (Borislav Petkov).

    - Fix the frequency calculation loop in the armada-37xx cpufreq
    driver (Gregory CLEMENT).

    - Fix possible object reference leaks in multiple cpufreq drivers
    (Wen Yang).

    - Fix kerneldoc comment in the centrino cpufreq driver (dongjian).

    - Clean up the ACPI and maple cpufreq drivers (Viresh Kumar, Mohan
    Kumar).

    - Add support for lx2160a and ls1028a to the qoriq cpufreq driver
    (Vabhav Sharma, Yuantian Tang).

    - Fix kobject memory leak in the cpufreq core (Viresh Kumar).

    - Simplify the IOwait boosting in the schedutil cpufreq governor and
    rework the TSC cpufreq notifier on x86 (Rafael Wysocki).

    - Clean up the cpufreq core and statistics code (Yue Hu, Kyle Lin).

    - Improve the cpufreq documentation, add SPDX license tags to some PM
    documentation files and unify copyright notices in them (Rafael
    Wysocki).

    - Add support for "CPU" domains to the generic power domains (genpd)
    framework and provide low-level PSCI firmware support for that
    feature (Ulf Hansson).

    - Rearrange the PSCI firmware support code and add support for
    SYSTEM_RESET2 to it (Ulf Hansson, Sudeep Holla).

    - Improve genpd support for devices in multiple power domains (Ulf
    Hansson).

    - Unify target residency for the AFTR and coupled AFTR states in the
    exynos cpuidle driver (Marek Szyprowski).

    - Introduce new helper routine in the operating performance points
    (OPP) framework (Andrew-sh.Cheng).

    - Add support for passing on-die termination (ODT) and auto power
    down parameters from the kernel to Trusted Firmware-A (TF-A) to the
    rk3399_dmc devfreq driver (Enric Balletbo i Serra).

    - Add tracing to devfreq (Lukasz Luba).

    - Make the exynos-bus devfreq driver suspend all devices on system
    shutdown (Marek Szyprowski).

    - Fix a few minor issues in the devfreq subsystem and clean it up
    somewhat (Enric Balletbo i Serra, MyungJoo Ham, Rob Herring,
    Saravana Kannan, Yangtao Li).

    - Improve system wakeup diagnostics (Stephen Boyd).

    - Rework filesystem sync messages emitted during system suspend and
    hibernation (Harry Pan)"

    * tag 'pm-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits)
    cpufreq: Fix kobject memleak
    cpufreq: armada-37xx: fix frequency calculation for opp
    cpufreq: centrino: Fix centrino_setpolicy() kerneldoc comment
    cpufreq: qoriq: add support for lx2160a
    x86: tsc: Rework time_cpufreq_notifier()
    PM / Domains: Allow to attach a CPU via genpd_dev_pm_attach_by_id|name()
    PM / Domains: Search for the CPU device outside the genpd lock
    PM / Domains: Drop unused in-parameter to some genpd functions
    PM / Domains: Use the base device for driver_deferred_probe_check_state()
    cpufreq: qoriq: Add ls1028a chip support
    PM / Domains: Enable genpd_dev_pm_attach_by_id|name() for single PM domain
    PM / Domains: Allow OF lookup for multi PM domain case from ->attach_dev()
    PM / Domains: Don't kfree() the virtual device in the error path
    cpufreq: Move ->get callback check outside of __cpufreq_get()
    PM / Domains: remove unnecessary unlikely()
    cpufreq: Remove needless bios_limit check in show_bios_limit()
    drivers/cpufreq/acpi-cpufreq.c: This fixes the following checkpatch warning
    firmware/psci: add support for SYSTEM_RESET2
    PM / devfreq: add tracing for scheduling work
    trace: events: add devfreq trace event file
    ...

    Linus Torvalds
     

30 Apr, 2019

1 commit

  • Currently the error return path from kobject_init_and_add() is not
    followed by a call to kobject_put() - which means we are leaking
    the kobject.

    Fix it by adding a call to kobject_put() in the error path of
    kobject_init_and_add().
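    The general pattern (not the exact schedutil hunk) looks like this:

    ret = kobject_init_and_add(kobj, ktype, parent, "%s", name);
    if (ret) {
            /*
             * kobject_init_and_add() leaves a reference held even on
             * failure; kobject_put() drops it so the ktype's release()
             * can free the object.
             */
            kobject_put(kobj);
            return ret;
    }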

    Signed-off-by: Tobin C. Harding
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Thomas Gleixner
    Cc: Tobin C. Harding
    Cc: Vincent Guittot
    Cc: Viresh Kumar
    Link: http://lkml.kernel.org/r/20190430001144.24890-1-tobin@kernel.org
    Signed-off-by: Ingo Molnar

    Tobin C. Harding
     

26 Apr, 2019

1 commit

  • The kobj_type default_attrs field is being replaced by the
    default_groups field. Replace sugov_tunables_ktype's default_attrs field
    with default groups. Change "sugov_attributes" to "sugov_attrs" and use
    the ATTRIBUTE_GROUPS macro to create sugov_groups.

    This patch was tested by setting the scaling governor to schedutil and
    verifying that the sysfs files for the attributes in the default groups
    were created.
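    The shape of the conversion (abbreviated sketch of the schedutil
    tunables code):

    static struct attribute *sugov_attrs[] = {
            &rate_limit_us.attr,
            NULL
    };
    ATTRIBUTE_GROUPS(sugov);        /* generates sugov_groups */

    static struct kobj_type sugov_tunables_ktype = {
            .default_groups = sugov_groups, /* was: .default_attrs */
            .sysfs_ops      = &governor_sysfs_ops,
    };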

    Signed-off-by: Kimberly Brown
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Kimberly Brown
     

08 Apr, 2019

1 commit

    There is no reason for the minimum iowait boost value in the
    schedutil cpufreq governor to depend on the available range of CPU
    frequencies. In fact, that dependency is generally confusing,
    because it causes the iowait boost to behave somewhat differently
    on CPUs with the same maximum frequency and different minimum
    frequencies, for example.

    For this reason, replace the min field in struct sugov_cpu
    with a constant and choose its value to be 1/8 of
    SCHED_CAPACITY_SCALE (for consistency with the intel_pstate
    driver's internal governor).

    [Note that policy->cpuinfo.max_freq will not be a constant any more
    after a subsequent change, so that change will depend on this one.]
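    The constant as merged:

    /* a fixed floor, independent of the CPU's frequency range */
    #define IOWAIT_BOOST_MIN        (SCHED_CAPACITY_SCALE / 8)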

    Link: https://lore.kernel.org/lkml/20190305083202.GU32494@hirez.programming.kicks-ass.net/T/#ee20bdc98b7d89f6110c0d00e5c3ee8c2ced93c3d
    Suggested-by: Peter Zijlstra
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     

25 Mar, 2019

1 commit

  • Pull scheduler updates from Thomas Gleixner:
    "Third more careful attempt for this set of fixes:

    - Prevent a 32bit math overflow in the cpufreq code

    - Fix a buffer overflow when scanning the cgroup2 cpu.max property

    - A set of fixes for the NOHZ scheduler logic to prevent waking up
    CPUs even if the capacity of the busy CPUs is sufficient along with
    other tweaks optimizing the behaviour for asymmetric systems
    (big/little)"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/fair: Skip LLC NOHZ logic for asymmetric systems
    sched/fair: Tune down misfit NOHZ kicks
    sched/fair: Comment some nohz_balancer_kick() kick conditions
    sched/core: Fix buffer overflow in cgroup2 property cpu.max
    sched/cpufreq: Fix 32-bit math overflow

    Linus Torvalds
     

19 Mar, 2019

1 commit

  • Vincent Wang reported that get_next_freq() has a mult overflow bug on
    32-bit platforms in the IOWAIT boost case, since in that case {util,max}
    are in freq units instead of capacity units.

    Solve this by moving the IOWAIT boost to capacity units. And since this
    means @max is constant, simplify the code.
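    A standalone illustration of the failure mode (sample numbers; the
    expression mirrors get_next_freq()'s freq * 1.25 * util / max):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint32_t max_freq = 2000000;    /* kHz, i.e. 2 GHz */

            /* before: util/max in freq units; the product overflows u32 */
            uint32_t util_f = 1800000, max_f = 2000000;
            uint32_t bad = (max_freq + (max_freq >> 2)) * util_f / max_f;

            /* after: util/max in capacity units; the product fits */
            uint32_t util_c = 922, max_c = 1024;    /* ~90% utilization */
            uint32_t good = (max_freq + (max_freq >> 2)) * util_c / max_c;

            printf("freq units (overflowed): %u kHz\n", bad);
            printf("capacity units: %u kHz\n", good);
            return 0;
    }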

    Reported-by: Vincent Wang
    Tested-by: Vincent Wang
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Rafael J. Wysocki
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Chunyan Zhang
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Rafael J. Wysocki
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190305083202.GU32494@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

26 Jan, 2019

1 commit

  • Now that synchronize_rcu() waits for preempt-disable regions of
    code as well as RCU read-side critical sections, synchronize_sched()
    can be replaced by synchronize_rcu(), in fact, synchronize_sched()
    is now completely equivalent to synchronize_rcu(). This commit
    therefore replaces synchronize_sched() with synchronize_rcu() so that
    synchronize_sched() can eventually be removed entirely.

    Signed-off-by: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Peter Zijlstra

    Paul E. McKenney
     

27 Dec, 2018

1 commit

  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Introduce "Energy Aware Scheduling" - by Quentin Perret.

    This is a coherent topology description of CPUs in cooperation with
    the PM subsystem, with the goal of scheduling more energy-efficiently
    on asymmetric SMP platforms - such as waking up tasks to the more
    energy-efficient CPUs first, as long as the system isn't
    oversubscribed.

    For details of the design, see:

    https://lore.kernel.org/lkml/20180724122521.22109-1-quentin.perret@arm.com/

    - Misc cleanups and smaller enhancements"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    sched/fair: Select an energy-efficient CPU on task wake-up
    sched/fair: Introduce an energy estimation helper function
    sched/fair: Add over-utilization/tipping point indicator
    sched/fair: Clean-up update_sg_lb_stats parameters
    sched/toplogy: Introduce the 'sched_energy_present' static key
    sched/topology: Make Energy Aware Scheduling depend on schedutil
    sched/topology: Disable EAS on inappropriate platforms
    sched/topology: Add lowest CPU asymmetry sched_domain level pointer
    sched/topology: Reference the Energy Model of CPUs when available
    PM: Introduce an Energy Model management framework
    sched/cpufreq: Prepare schedutil for Energy Aware Scheduling
    sched/topology: Relocate arch_scale_cpu_capacity() to the internal header
    sched/core: Remove unnecessary unlikely() in push_*_task()
    sched/topology: Remove the ::smt_gain field from 'struct sched_domain'
    sched: Fix various typos in comments
    sched/core: Clean up the #ifdef block in add_nr_running()
    sched/fair: Make some variables static
    sched/core: Create task_has_idle_policy() helper
    sched/fair: Add lsub_positive() and use it consistently
    sched/fair: Mask UTIL_AVG_UNCHANGED usages
    ...

    Linus Torvalds
     

11 Dec, 2018

3 commits

  • Energy Aware Scheduling (EAS) is designed with the assumption that
    frequencies of CPUs follow their utilization value. When using a CPUFreq
    governor other than schedutil, the chances of this assumption being true
    are small, if any. When schedutil is being used, EAS' predictions are at
    least consistent with the frequency requests. Although those requests
    have no guarantees to be honored by the hardware, they should at least
    guide DVFS in the right direction and provide some hope in regards to the
    EAS model being accurate.

    To make sure EAS is only used in a sane configuration, create a strong
    dependency on schedutil being used. Since having sugov compiled-in does
    not provide that guarantee, make CPUFreq call a scheduler function on
    governor changes, hence letting it rebuild the scheduling domains, check
    the governors of the online CPUs, and enable/disable EAS accordingly.
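    The hook is a small scheduler-side entry point, roughly:

    /* called by the cpufreq core when a policy's governor changes */
    void sched_cpufreq_governor_change(struct cpufreq_policy *policy,
                                       struct cpufreq_governor *old_gov);

    It schedules a rebuild of the scheduling domains, during which the
    governors of all online CPUs are re-checked before EAS is enabled.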

    Signed-off-by: Quentin Perret
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Thomas Gleixner
    Cc: adharmap@codeaurora.org
    Cc: chris.redpath@arm.com
    Cc: currojerez@riseup.net
    Cc: dietmar.eggemann@arm.com
    Cc: edubezval@gmail.com
    Cc: gregkh@linuxfoundation.org
    Cc: javi.merino@kernel.org
    Cc: joel@joelfernandes.org
    Cc: juri.lelli@redhat.com
    Cc: morten.rasmussen@arm.com
    Cc: patrick.bellasi@arm.com
    Cc: pkondeti@codeaurora.org
    Cc: skannan@codeaurora.org
    Cc: smuckle@google.com
    Cc: srinivas.pandruvada@linux.intel.com
    Cc: thara.gopinath@linaro.org
    Cc: tkjos@google.com
    Cc: valentin.schneider@arm.com
    Cc: vincent.guittot@linaro.org
    Cc: viresh.kumar@linaro.org
    Link: https://lkml.kernel.org/r/20181203095628.11858-9-quentin.perret@arm.com
    Signed-off-by: Ingo Molnar

    Quentin Perret
     
  • Schedutil requests frequency by aggregating utilization signals from
    the scheduler (CFS, RT, DL, IRQ) and applying a 25% margin on top of
    them. Since Energy Aware Scheduling (EAS) needs to be able to predict
    the frequency requests, it needs to forecast the decisions made by the
    governor.

    In order to prepare the introduction of EAS, introduce
    schedutil_freq_util() to centralize the aforementioned signal
    aggregation and make it available to both schedutil and EAS. Since
    frequency selection and energy estimation still need to deal with RT and
    DL signals slightly differently, schedutil_freq_util() is called with a
    different 'type' parameter in those two contexts, and returns an
    aggregated utilization signal accordingly. While at it, introduce the
    map_util_freq() function which is designed to make schedutil's 25%
    margin usable easily for both sugov and EAS.
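    The helper, as it appears in include/linux/sched/cpufreq.h, encodes the
    25% margin as next_freq = 1.25 * max_freq * util / max:

    static inline unsigned long map_util_freq(unsigned long util,
                                              unsigned long freq,
                                              unsigned long cap)
    {
            return (freq + (freq >> 2)) * util / cap;
    }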

    As EAS will be able to predict schedutil's frequency requests more
    accurately than any other governor by design, it'd be sensible to make
    sure EAS cannot be used without schedutil. This will be done later, once
    EAS has actually been introduced.

    Suggested-by: Peter Zijlstra
    Signed-off-by: Quentin Perret
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Thomas Gleixner
    Cc: adharmap@codeaurora.org
    Cc: chris.redpath@arm.com
    Cc: currojerez@riseup.net
    Cc: dietmar.eggemann@arm.com
    Cc: edubezval@gmail.com
    Cc: gregkh@linuxfoundation.org
    Cc: javi.merino@kernel.org
    Cc: joel@joelfernandes.org
    Cc: juri.lelli@redhat.com
    Cc: morten.rasmussen@arm.com
    Cc: patrick.bellasi@arm.com
    Cc: pkondeti@codeaurora.org
    Cc: rjw@rjwysocki.net
    Cc: skannan@codeaurora.org
    Cc: smuckle@google.com
    Cc: srinivas.pandruvada@linux.intel.com
    Cc: thara.gopinath@linaro.org
    Cc: tkjos@google.com
    Cc: valentin.schneider@arm.com
    Cc: vincent.guittot@linaro.org
    Cc: viresh.kumar@linaro.org
    Link: https://lkml.kernel.org/r/20181203095628.11858-3-quentin.perret@arm.com
    Signed-off-by: Ingo Molnar

    Quentin Perret
     
  • The SPDX tags are not present in cpufreq.c and cpufreq_schedutil.c.

    Add them and remove the license descriptions.

    Signed-off-by: Daniel Lezcano
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     

25 Jul, 2018

1 commit

  • Reuse cpu_util_irq() that has been defined for schedutil and set irq util
    to 0 when !CONFIG_IRQ_TIME_ACCOUNTING.

    But the compiler is not able to optimize the sequence (at least with
    aarch64 GCC 7.2.1):

    free *= (max - irq);
    free /= max;

    when irq is fixed to 0

    Add a new inline function scale_irq_capacity() that will scale the
    utilization when irq time is accounted. Reuse this function in
    schedutil, which applies a similar formula.
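    The helper, roughly as merged:

    #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
    static inline unsigned long scale_irq_capacity(unsigned long util,
                                                   unsigned long irq,
                                                   unsigned long max)
    {
            util *= (max - irq);
            util /= max;

            return util;
    }
    #else
    static inline unsigned long scale_irq_capacity(unsigned long util,
                                                   unsigned long irq,
                                                   unsigned long max)
    {
            return util;    /* irq is always 0 here */
    }
    #endif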

    Suggested-by: Ingo Molnar
    Signed-off-by: Vincent Guittot
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Viresh Kumar
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: rjw@rjwysocki.net
    Link: http://lkml.kernel.org/r/1532001606-6689-1-git-send-email-vincent.guittot@linaro.org
    Signed-off-by: Ingo Molnar

    Vincent Guittot
     

16 Jul, 2018

5 commits

    Add a few comments to (hopefully) clarify some of the magic in
    sugov_get_util().

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Viresh Kumar
    Cc: Linus Torvalds
    Cc: Morten.Rasmussen@arm.com
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vincent Guittot
    Cc: claudio@evidence.eu.com
    Cc: daniel.lezcano@linaro.org
    Cc: dietmar.eggemann@arm.com
    Cc: joel@joelfernandes.org
    Cc: juri.lelli@redhat.com
    Cc: luca.abeni@santannapisa.it
    Cc: patrick.bellasi@arm.com
    Cc: quentin.perret@arm.com
    Cc: rjw@rjwysocki.net
    Cc: valentin.schneider@arm.com
    Link: http://lkml.kernel.org/r/20180705123617.GM2458@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • There is no reason why sugov_get_util() and sugov_aggregate_util()
    were in fact separate functions.

    Signed-off-by: Vincent Guittot
    [ Rebased after adding irq tracking and fixed some compilation errors. ]
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Viresh Kumar
    Cc: Linus Torvalds
    Cc: Morten.Rasmussen@arm.com
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: claudio@evidence.eu.com
    Cc: daniel.lezcano@linaro.org
    Cc: dietmar.eggemann@arm.com
    Cc: joel@joelfernandes.org
    Cc: juri.lelli@redhat.com
    Cc: luca.abeni@santannapisa.it
    Cc: patrick.bellasi@arm.com
    Cc: quentin.perret@arm.com
    Cc: rjw@rjwysocki.net
    Cc: valentin.schneider@arm.com
    Link: http://lkml.kernel.org/r/1530200714-4504-9-git-send-email-vincent.guittot@linaro.org
    Signed-off-by: Ingo Molnar

    Vincent Guittot
     
    The time spent executing IRQ handlers can be significant, but it is not
    reflected in the utilization of a CPU when deciding to choose an OPP.
    Now that we have access to this metric, schedutil can take it into
    account when selecting the OPP for a CPU.

    Rq utilization signals don't see the time spent in interrupt context
    and report their values over the normal context time window. We need to
    compensate for this when adding the interrupt utilization.

    The CPU utilization is:

    IRQ util_avg + (1 - IRQ util_avg / max capacity) * \Sum rq util_avg
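    Worked with illustrative numbers on the 1024 capacity scale (standalone
    arithmetic, not kernel code):

    #include <stdio.h>

    int main(void)
    {
            const double max = 1024.0;      /* CPU capacity */
            double irq = 128.0;             /* IRQ util_avg */
            double rqs = 512.0;             /* \Sum rq util_avg */

            /* rq signals only see (max - irq) of wall time: scale them */
            double util = irq + (1.0 - irq / max) * rqs;

            printf("cpu util = %.0f of %.0f\n", util, max); /* 576 of 1024 */
            return 0;
    }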

    A test with iperf on hikey (octo arm64) gives the following speedup:

    iperf -c server_address -r -t 5

           w/o patch        w/ patch
    Tx     276 Mbits/sec    304 Mbits/sec   +10%
    Rx     299 Mbits/sec    328 Mbits/sec   +9%

    8 iterations
    stdev is lower than 1%

    Only WFI idle state is enabled (shallowest idle state).

    Signed-off-by: Vincent Guittot
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Viresh Kumar
    Cc: Linus Torvalds
    Cc: Morten.Rasmussen@arm.com
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: claudio@evidence.eu.com
    Cc: daniel.lezcano@linaro.org
    Cc: dietmar.eggemann@arm.com
    Cc: joel@joelfernandes.org
    Cc: juri.lelli@redhat.com
    Cc: luca.abeni@santannapisa.it
    Cc: patrick.bellasi@arm.com
    Cc: quentin.perret@arm.com
    Cc: rjw@rjwysocki.net
    Cc: valentin.schneider@arm.com
    Link: http://lkml.kernel.org/r/1530200714-4504-8-git-send-email-vincent.guittot@linaro.org
    Signed-off-by: Ingo Molnar

    Vincent Guittot
     
    Now that we have both the DL class bandwidth requirement and the DL class
    utilization, we can detect when a CPU is fully used, in which case we
    should run at max. Otherwise, we keep using the DL bandwidth requirement
    to define the utilization of the CPU.
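    A sketch of the intent (hypothetical helper name; not the exact hunk):

    static unsigned long aggregate_dl_util(unsigned long util_cfs,
                                           unsigned long util_rt,
                                           unsigned long util_dl,
                                           unsigned long bw_dl,
                                           unsigned long max)
    {
            unsigned long util = util_cfs + util_rt;

            /* DL util_avg says the CPU is fully used: run at max */
            if (util + util_dl >= max)
                    return max;

            /* otherwise the DL bandwidth requirement sizes the request */
            return min(max, util + bw_dl);
    }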

    Signed-off-by: Vincent Guittot
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Viresh Kumar
    Cc: Linus Torvalds
    Cc: Morten.Rasmussen@arm.com
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: claudio@evidence.eu.com
    Cc: daniel.lezcano@linaro.org
    Cc: dietmar.eggemann@arm.com
    Cc: joel@joelfernandes.org
    Cc: juri.lelli@redhat.com
    Cc: luca.abeni@santannapisa.it
    Cc: patrick.bellasi@arm.com
    Cc: quentin.perret@arm.com
    Cc: rjw@rjwysocki.net
    Cc: valentin.schneider@arm.com
    Link: http://lkml.kernel.org/r/1530200714-4504-6-git-send-email-vincent.guittot@linaro.org
    Signed-off-by: Ingo Molnar

    Vincent Guittot
     
    Add both CFS and RT utilization when selecting an OPP for CFS tasks, as
    RT can preempt and steal CFS's running time.

    RT util_avg is used to take into account the utilization of RT tasks
    on the CPU when selecting the OPP. If an RT task migrates, its
    utilization will not migrate with it but will decay over time. On an
    overloaded CPU, the CFS utilization reflects the remaining utilization
    available on the CPU. When an RT task migrates away, the CFS utilization
    will increase as tasks start to use the newly available capacity. At the
    same pace, the RT utilization will decay, and both variations will
    compensate each other to keep the overall utilization unchanged and
    prevent any OPP drop.

    Signed-off-by: Vincent Guittot
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Viresh Kumar
    Cc: Linus Torvalds
    Cc: Morten.Rasmussen@arm.com
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: claudio@evidence.eu.com
    Cc: daniel.lezcano@linaro.org
    Cc: dietmar.eggemann@arm.com
    Cc: joel@joelfernandes.org
    Cc: juri.lelli@redhat.com
    Cc: luca.abeni@santannapisa.it
    Cc: patrick.bellasi@arm.com
    Cc: quentin.perret@arm.com
    Cc: rjw@rjwysocki.net
    Cc: valentin.schneider@arm.com
    Link: http://lkml.kernel.org/r/1530200714-4504-4-git-send-email-vincent.guittot@linaro.org
    Signed-off-by: Ingo Molnar

    Vincent Guittot
     

03 Jul, 2018

1 commit

  • With commit:

    8f111bc357aa ("cpufreq/schedutil: Rewrite CPUFREQ_RT support")

    the schedutil governor uses rq->rt.rt_nr_running to detect whether an
    RT task is currently running on the CPU and to set frequency to max
    if necessary.

    cpufreq_update_util() is called in enqueue/dequeue_top_rt_rq() but
    rq->rt.rt_nr_running has not been updated yet when dequeue_top_rt_rq() is
    called so schedutil still considers that an RT task is running when the
    last task is dequeued. The update of rq->rt.rt_nr_running happens later
    in dequeue_rt_stack().

    In fact, we can take advantage of the fact that rt entities are dequeued
    and then re-enqueued whenever an rt task is enqueued or dequeued.
    As a result, enqueue_top_rt_rq() is always called when a task is
    enqueued or dequeued, and also when groups are throttled or unthrottled.
    The only case that does not use enqueue_top_rt_rq() is when the root
    rt_rq is throttled.

    Signed-off-by: Vincent Guittot
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: efault@gmx.de
    Cc: juri.lelli@redhat.com
    Cc: patrick.bellasi@arm.com
    Cc: viresh.kumar@linaro.org
    Fixes: 8f111bc357aa ('cpufreq/schedutil: Rewrite CPUFREQ_RT support')
    Link: http://lkml.kernel.org/r/1530021202-21695-1-git-send-email-vincent.guittot@linaro.org
    Signed-off-by: Ingo Molnar

    Vincent Guittot
     

06 Jun, 2018

1 commit

  • Pull power management updates from Rafael Wysocki:
    "These include a significant update of the generic power domains
    (genpd) and Operating Performance Points (OPP) frameworks, mostly
    related to the introduction of power domain performance levels,
    cpufreq updates (new driver for Qualcomm Kryo processors, updates of
    the existing drivers, some core fixes, schedutil governor
    improvements), PCI power management fixes, ACPI workaround for
    EC-based wakeup events handling on resume from suspend-to-idle, and
    major updates of the turbostat and pm-graph utilities.

    Specifics:

    - Introduce power domain performance levels into the generic
    power domains (genpd) and Operating Performance Points (OPP)
    frameworks (Viresh Kumar, Rajendra Nayak, Dan Carpenter).

    - Fix two issues in the runtime PM framework related to the
    initialization and removal of devices using device links (Ulf
    Hansson).

    - Clean up the initialization of drivers for devices in PM domains
    (Ulf Hansson, Geert Uytterhoeven).

    - Fix a cpufreq core issue related to the policy sysfs interface
    causing CPU online to fail for CPUs sharing one cpufreq policy in
    some situations (Tao Wang).

    - Make it possible to use platform-specific suspend/resume hooks in
    the cpufreq-dt driver and make the Armada 37xx DVFS use that
    feature (Viresh Kumar, Miquel Raynal).

    - Optimize policy transition notifications in cpufreq (Viresh Kumar).

    - Improve the iowait boost mechanism in the schedutil cpufreq
    governor (Patrick Bellasi).

    - Improve the handling of deferred frequency updates in the schedutil
    cpufreq governor (Joel Fernandes, Dietmar Eggemann, Rafael Wysocki,
    Viresh Kumar).

    - Add a new cpufreq driver for Qualcomm Kryo (Ilia Lin).

    - Fix and clean up some cpufreq drivers (Colin Ian King, Dmitry
    Osipenko, Doug Smythies, Luc Van Oostenryck, Simon Horman, Viresh
    Kumar).

    - Fix the handling of PCI devices with the DPM_SMART_SUSPEND flag set
    and update stale comments in the PCI core PM code (Rafael Wysocki).

    - Work around an issue related to the handling of EC-based wakeup
    events in the ACPI PM core during resume from suspend-to-idle if
    the EC has been put into the low-power mode (Rafael Wysocki).

    - Improve the handling of wakeup source objects in the PM core (Doug
    Berger, Mahendran Ganesh, Rafael Wysocki).

    - Update the driver core to prevent deferred probe from breaking
    suspend/resume ordering (Feng Kan).

    - Clean up the PM core somewhat (Bjorn Helgaas, Ulf Hansson, Rafael
    Wysocki).

    - Make the core suspend/resume code and cpufreq support the RT patch
    (Sebastian Andrzej Siewior, Thomas Gleixner).

    - Consolidate the PM QoS handling in cpuidle governors (Rafael
    Wysocki).

    - Fix a possible crash in the hibernation core (Tetsuo Handa).

    - Update the rockchip-io Adaptive Voltage Scaling (AVS) driver (David
    Wu).

    - Update the turbostat utility (fixes, cleanups, new CPU IDs, new
    command line options, built-in "Low Power Idle" counters support,
    new POLL and POLL% columns) and add an entry for it to MAINTAINERS
    (Len Brown, Artem Bityutskiy, Chen Yu, Laura Abbott, Matt Turner,
    Prarit Bhargava, Srinivas Pandruvada).

    - Update the pm-graph to version 5.1 (Todd Brandt).

    - Update the intel_pstate_tracer utility (Doug Smythies)"

    * tag 'pm-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (128 commits)
    tools/power turbostat: update version number
    tools/power turbostat: Add Node in output
    tools/power turbostat: add node information into turbostat calculations
    tools/power turbostat: remove num_ from cpu_topology struct
    tools/power turbostat: rename num_cores_per_pkg to num_cores_per_node
    tools/power turbostat: track thread ID in cpu_topology
    tools/power turbostat: Calculate additional node information for a package
    tools/power turbostat: Fix node and siblings lookup data
    tools/power turbostat: set max_num_cpus equal to the cpumask length
    tools/power turbostat: if --num_iterations, print for specific number of iterations
    tools/power turbostat: Add Cannon Lake support
    tools/power turbostat: delete duplicate #defines
    x86: msr-index.h: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
    tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
    tools/power turbostat: add POLL and POLL% column
    tools/power turbostat: Fix --hide Pk%pc10
    tools/power turbostat: Build-in "Low Power Idle" counters support
    tools/power turbostat: Don't make man pages executable
    tools/power turbostat: remove blank lines
    tools/power turbostat: a small C-states dump readability immprovement
    ...

    Linus Torvalds
     

25 May, 2018

1 commit

  • Since the refactoring introduced by:

    commit 8f111bc357aa ("cpufreq/schedutil: Rewrite CPUFREQ_RT support")

    we aggregate FAIR utilization only if this class has runnable tasks.

    This was mainly to avoid the risk of staying at a high frequency just
    because the blocked utilization of a CPU was not being properly decayed
    while the CPU was idle.

    However, since:

    commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")

    the FAIR blocked utilization is properly decayed also for IDLE CPUs.

    This allows us to use the FAIR blocked utilization as a safe mechanism
    to gracefully reduce the frequency only if no FAIR tasks show up on a
    CPU for a reasonable period of time.

    Moreover, this also reduces the frequency drops of CPUs running periodic
    tasks which, depending on the task periodicity and the time required
    for a frequency switch, increased the chances of introducing some
    undesirable performance variations.
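    The gist of the change in sugov's aggregation (sketch, field names
    approximate):

    /* before: FAIR contribution only while it has runnable tasks */
    if (rq->cfs.h_nr_running)
            util += sg_cpu->util_cfs;

    /* after: always include it; blocked utilization decays while the
     * CPU is idle, so the frequency ramps down gracefully instead of
     * dropping as soon as the last FAIR task is dequeued. */
    util += sg_cpu->util_cfs;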

    Reported-by: Vincent Guittot
    Tested-by: Vincent Guittot
    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Thomas Gleixner
    Acked-by: Viresh Kumar
    Acked-by: Vincent Guittot
    Cc: Dietmar Eggemann
    Cc: Joel Fernandes
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Steve Muckle
    Link: http://lkml.kernel.org/r/20180524141023.13765-2-patrick.bellasi@arm.com
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     

24 May, 2018

1 commit

  • Commit 152db033d775 (schedutil: Allow cpufreq requests to be made
    even when kthread kicked) made changes to prevent utilization updates
    from being discarded during processing a previous request, but it
    left a small window in which that still can happen in the one-CPU
    policy case. Namely, updates coming in after setting work_in_progress
    in sugov_update_commit() and clearing it in sugov_work() will still
    be dropped due to the work_in_progress check in sugov_update_single().

    To close that window, rearrange the code so as to acquire the update
    lock around the deferred update branch in sugov_update_single()
    and drop the work_in_progress check from it.
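    The rearranged single-CPU path, roughly (helper names as in the
    resulting code):

    if (sg_policy->policy->fast_switch_enabled) {
            sugov_fast_switch(sg_policy, time, next_f);
    } else {
            /* deferred (kthread) path: serialize against sugov_work() */
            raw_spin_lock(&sg_policy->update_lock);
            sugov_deferred_update(sg_policy, time, next_f);
            raw_spin_unlock(&sg_policy->update_lock);
    }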

    Signed-off-by: Rafael J. Wysocki
    Reviewed-by: Juri Lelli
    Acked-by: Viresh Kumar
    Reviewed-by: Joel Fernandes (Google)

    Rafael J. Wysocki
     

23 May, 2018

2 commits

    Currently there is a chance of a schedutil cpufreq update request being
    dropped if there is a pending update request. This pending request can
    be delayed by scheduling delays of the irq_work and of the wakeup of
    the schedutil governor kthread.

    A very bad scenario is when a schedutil request has just been made,
    such as to reduce the CPU frequency, and then a newer request to
    increase the CPU frequency (even an urgent sched deadline frequency
    increase request) can be dropped, even though the rate limits suggest
    that it's OK to process a request. This is because of the way the
    work_in_progress flag is used.

    This patch improves the situation by allowing new requests to happen
    even though the old one is still being processed. Note that in this
    approach, if an irq_work was already issued, we just update next_freq
    and don't bother to queue another request so there's no extra work being
    done to make this happen.
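    A sketch of the resulting kthread side: the latest next_freq is read
    under the lock, so updates that arrived after the kick are not lost.

    static void sugov_work(struct kthread_work *work)
    {
            struct sugov_policy *sg_policy =
                    container_of(work, struct sugov_policy, work);
            unsigned int freq;
            unsigned long flags;

            raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
            freq = sg_policy->next_freq;            /* freshest request */
            sg_policy->work_in_progress = false;    /* allow a new kick */
            raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);

            mutex_lock(&sg_policy->work_lock);
            __cpufreq_driver_target(sg_policy->policy, freq,
                                    CPUFREQ_RELATION_L);
            mutex_unlock(&sg_policy->work_lock);
    }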

    Acked-by: Viresh Kumar
    Acked-by: Juri Lelli
    Signed-off-by: Joel Fernandes (Google)
    Signed-off-by: Rafael J. Wysocki

    Joel Fernandes (Google)
     
    This routine checks whether the CPU running this code belongs to the
    policy of the target CPU or, if not, whether it can perform DVFS for it
    remotely. But its current name implies that it is only about doing
    remote updates.

    Rename it to make it more relevant.

    Suggested-by: Rafael J. Wysocki
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

22 May, 2018

2 commits

    The iowait boosting code has been recently updated to add a progressive
    boosting behavior which allows the governor to be less aggressive in
    boosting tasks doing only sporadic IO operations, thus being more
    energy efficient, for example on mobile platforms.

    The current code is now however a bit convoluted. Some functionalities
    (e.g. iowait boost reset) are replicated in different paths and their
    documentation is slightly misaligned.

    Let's clean up the code by consolidating all the IO wait boosting
    related functionality within a few dedicated functions and better
    define their role:

    - sugov_iowait_boost: set/increase the IO wait boost of a CPU
    - sugov_iowait_apply: apply/reduce the IO wait boost of a CPU

    Both of these functions are used at every sugov update and they make
    use of a unified IO wait boost reset policy provided by:

    - sugov_iowait_reset: reset/disable the IO wait boost of a CPU
    if a CPU is not updated for more than one tick

    This makes possible a cleaner and more self-contained design for the IO
    wait boosting code since the rest of the sugov update routines, both for
    single and shared frequency domains, follow the same template:

    /* Configure IO boost, if required */
    sugov_iowait_boost()

    /* Return here if freq change is in progress or throttled */

    /* Collect and aggregate utilization information */
    sugov_get_util()
    sugov_aggregate_util()

    /*
    * Add IO boost, if currently enabled, on top of the aggregated
    * utilization value
    */
    sugov_iowait_apply()

    As an extra bonus, let's also add documentation for the new
    functions and better align the in-code documentation.

    Signed-off-by: Patrick Bellasi
    Reviewed-by: Joel Fernandes (Google)
    Acked-by: Viresh Kumar
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Rafael J. Wysocki

    Patrick Bellasi
     
  • A more energy efficient update of the IO wait boosting mechanism has
    been introduced in:

    commit a5a0809bc58e ("cpufreq: schedutil: Make iowait boost more energy efficient")

    where the boost value is expected to be:

    - doubled at each successive wakeup from IO,
    starting from the minimum frequency supported by a CPU

    - reset when a CPU is not updated for more than one tick,
    by either disabling the IO wait boost or resetting its value to the
    minimum frequency if this new update requires an IO boost.

    This approach is supposed to "ignore" boosting for sporadic wakeups from
    IO, while still getting the frequency boosted to the maximum to benefit
    long sequences of wakeups from IO operations.

    However, these assumptions are not always satisfied.
    For example, when an IO boosted CPU enters idle for more than one tick
    and then wakes up after an IO wait, since in sugov_set_iowait_boost() we
    first check the IOWAIT flag, we keep doubling the iowait boost instead
    of restarting from the minimum frequency value.

    This misbehavior could happen mainly on non-shared frequency domains,
    thus defeating the energy efficiency optimization, but it can also
    happen on shared frequency domain systems.

    Let's fix this issue in sugov_set_iowait_boost() by:
    - first checking the IO wait boost reset conditions,
    possibly resetting the boost value
    - then applying the correct IO boost value,
    if required by the caller
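    A sketch of the reordered logic (clamping of the boost to the maximum
    frequency is omitted for brevity):

    static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
                                       unsigned int flags)
    {
            /* 1) reset first: idle for more than a tick clears the boost */
            if (sg_cpu->iowait_boost &&
                time - sg_cpu->last_update > TICK_NSEC)
                    sg_cpu->iowait_boost = 0;

            /* 2) only then apply/double the boost for an IO wakeup */
            if (flags & SCHED_CPUFREQ_IOWAIT) {
                    if (sg_cpu->iowait_boost)
                            sg_cpu->iowait_boost <<= 1;
                    else    /* start from the minimum frequency */
                            sg_cpu->iowait_boost =
                                    sg_cpu->sg_policy->policy->min;
            }
    }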

    Fixes: a5a0809bc58e (cpufreq: schedutil: Make iowait boost more energy efficient)
    Reported-by: Viresh Kumar
    Signed-off-by: Patrick Bellasi
    Reviewed-by: Joel Fernandes (Google)
    Acked-by: Viresh Kumar
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Rafael J. Wysocki

    Patrick Bellasi
     

15 May, 2018

2 commits

  • The schedutil driver sets sg_policy->next_freq to UINT_MAX on certain
    occasions to discard the cached value of next freq:
    - In sugov_start(), when the schedutil governor is started for a group
    of CPUs.
    - And whenever we need to force a freq update before rate-limit
    duration, which happens when:
    - there is an update in cpufreq policy limits.
    - Or when the utilization of DL scheduling class increases.

    In return, get_next_freq() doesn't return a cached next_freq value but
    recalculates the next frequency instead.

    But having special meaning for a particular value of frequency makes the
    code less readable and error prone. We recently fixed a bug where the
    UINT_MAX value was considered as valid frequency in
    sugov_update_single().

    All we need is a flag which can be used to discard the value of
    sg_policy->next_freq, and we already have need_freq_update for that.
    Let's reuse it instead of setting next_freq to UINT_MAX.
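    After the change, get_next_freq() keys off the flag instead of a magic
    next_freq value (abbreviated sketch; the frequency-invariance selection
    of the base frequency is omitted):

    freq = (freq + (freq >> 2)) * util / max;

    if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
            return sg_policy->next_freq;

    sg_policy->need_freq_update = false;
    sg_policy->cached_raw_freq = freq;
    return cpufreq_driver_resolve_freq(policy, freq);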

    Signed-off-by: Viresh Kumar
    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • This reverts commit e2cabe48c20efb174ce0c01190f8b9c5f3ea1d13.

    Lifting the restriction that the sugov kthread is bound to the
    policy->related_cpus for a system with a slow-switching cpufreq driver,
    which is able to perform DVFS from any cpu (e.g. cpufreq-dt), is not
    only not beneficial, it also harms Energy-Aware Scheduling (EAS) on
    systems with asymmetric cpu capacities (e.g. Arm big.LITTLE).

    The sugov kthread which does the update for the little cpus could
    potentially run on a big cpu. It could prevent the big cluster from
    going into deeper idle states although all the tasks are running on the
    little cluster.

    Example: hikey960 w/ 4.16.0-rc6-+
    Arm big.LITTLE with per-cluster DVFS

    root@h960:~# cat /proc/cpuinfo | grep "^CPU part"
    CPU part : 0xd03 (Cortex-A53, little cpu)
    CPU part : 0xd03
    CPU part : 0xd03
    CPU part : 0xd03
    CPU part : 0xd09 (Cortex-A73, big cpu)
    CPU part : 0xd09
    CPU part : 0xd09
    CPU part : 0xd09

    root@h960:/sys/devices/system/cpu/cpufreq# ls
    policy0 policy4 schedutil

    root@h960:/sys/devices/system/cpu/cpufreq# cat policy*/related_cpus
    0 1 2 3
    4 5 6 7

    (1) w/o the revert:

    root@h960:~# ps -eo pid,class,rtprio,pri,psr,comm | awk 'NR == 1 || /sugov/'
      PID CLS RTPRIO PRI PSR COMMAND
     1489 #6       0 140   1 sugov:0
     1490 #6       0 140   0 sugov:4

    The sugov kthread sugov:4 responsible for policy4 runs on cpu0. (In this
    case both sugov kthreads run on little cpus).

    cross policy (cluster) remote callback example:
    ...
    migration/1-14 [001] enqueue_task_fair: this_cpu=1 cpu_of(rq)=5
    migration/1-14 [001] sugov_update_shared: this_cpu=1 sg_cpu->cpu=5
    sg_cpu->sg_policy->policy->related_cpus=4-7
    sugov:4-1490 [000] sugov_work: this_cpu=0
    sg_cpu->sg_policy->policy->related_cpus=4-7
    ...

    The remote callback (this_cpu=1, target_cpu=5) is executed on cpu=0.

    (2) w/ the revert:

    root@h960:~# ps -eo pid,class,rtprio,pri,psr,comm | awk 'NR == 1 || /sugov/'
      PID CLS RTPRIO PRI PSR COMMAND
     1491 #6       0 140   2 sugov:0
     1492 #6       0 140   4 sugov:4

    The sugov kthread sugov:4 responsible for policy4 runs on cpu4.

    cross policy (cluster) remote callback example:
    ...
    migration/1-14 [001] enqueue_task_fair: this_cpu=1 cpu_of(rq)=7
    migration/1-14 [001] sugov_update_shared: this_cpu=1 sg_cpu->cpu=7
    sg_cpu->sg_policy->policy->related_cpus=4-7
    sugov:4-1492 [004] sugov_work: this_cpu=4
    sg_cpu->sg_policy->policy->related_cpus=4-7
    ...

    The remote callback (this_cpu=1, target_cpu=7) is executed on cpu=4.

    Now the sugov kthread executes again on the policy (cluster) for which
    the Operating Performance Point (OPP) should be changed.
    It avoids the problem that an otherwise idle policy (cluster) is running
    schedutil (the sugov kthread) for another one.
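    The revert restores the binding in sugov_kthread_create():

    /* bind the kthread to the CPUs of the policy it serves */
    kthread_bind_mask(thread, policy->related_cpus);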

    Signed-off-by: Dietmar Eggemann
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Dietmar Eggemann
     

09 May, 2018

2 commits

  • If the next_freq field of struct sugov_policy is set to UINT_MAX,
    it shouldn't be used for updating the CPU frequency (this is a
    special "invalid" value), but after commit b7eaf1aab9f8 (cpufreq:
    schedutil: Avoid reducing frequency of busy CPUs prematurely) it
    may be passed as the new frequency to sugov_update_commit() in
    sugov_update_single().

    Fix that by adding an extra check for the special UINT_MAX value
    of next_freq to sugov_update_single().
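    The fix boils down to one extra condition in sugov_update_single()
    (sketch):

    if (busy && next_f < sg_policy->next_freq &&
        sg_policy->next_freq != UINT_MAX)
            next_f = sg_policy->next_freq;  /* keep the last valid freq */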

    Fixes: b7eaf1aab9f8 (cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely)
    Reported-by: Viresh Kumar
    Cc: 4.12+ # 4.12+
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
    After commit 794a56ebd9a57 (sched/cpufreq: Change the worker kthread to
    SCHED_DEADLINE) schedutil kthreads are "ignored" from a clock frequency
    selection point of view, so the potential corner case for RT tasks is no
    longer possible at all.

    Remove the stale comment mentioning it.

    Signed-off-by: Juri Lelli
    Signed-off-by: Rafael J. Wysocki

    Juri Lelli
     

01 Apr, 2018

1 commit

  • This patch prevents the 'global_tunables_lock' mutex from being
    unlocked before being locked. This mutex is not locked if the
    sugov_kthread_create() function fails.
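    An illustrative layout of the corrected error paths (labels are
    hypothetical; the point is that only paths which took the mutex
    unlock it):

    ret = sugov_kthread_create(sg_policy);
    if (ret)
            goto free_sg_policy;            /* lock not held yet */

    mutex_lock(&global_tunables_lock);
    /* ... tunables setup; failures jump to fail ... */
    mutex_unlock(&global_tunables_lock);
    return 0;

    fail:
            mutex_unlock(&global_tunables_lock);
    free_sg_policy:
            sugov_policy_free(sg_policy);
            return ret;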

    Signed-off-by: Jules Maselbas
    Acked-by: Peter Zijlstra
    Cc: Chris Redpath
    Cc: Dietmar Eggermann
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Patrick Bellasi
    Cc: Stephen Kyle
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Cc: nd@arm.com
    Link: http://lkml.kernel.org/r/20180329144301.38419-1-jules.maselbas@arm.com
    Signed-off-by: Ingo Molnar

    Jules Maselbas
     

24 Mar, 2018

1 commit

    When the SCHED_DEADLINE scheduling class increases the CPU utilization,
    it should not wait for the rate limit, otherwise it may miss some
    deadlines.

    Tests using rt-app on Exynos5422 with up to 10 SCHED_DEADLINE tasks have
    shown reductions of even 10% of deadline misses with a negligible
    increase of energy consumption (measured through Baylibre Cape).
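    The check amounts to bypassing the rate limit when the DL requirement
    grows (sketch; helper and field names as in later kernels):

    static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu,
                                            struct sugov_policy *sg_policy)
    {
            if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl)
                    sg_policy->need_freq_update = true;
    }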

    Signed-off-by: Claudio Scordino
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar
    Cc: Juri Lelli
    Cc: Joel Fernandes
    Cc: Vincent Guittot
    Cc: linux-pm@vger.kernel.org
    Cc: Peter Zijlstra
    Cc: Morten Rasmussen
    Cc: Patrick Bellasi
    Cc: Todd Kjos
    Cc: Dietmar Eggemann
    Link: https://lkml.kernel.org/r/1520937340-2755-1-git-send-email-claudio@evidence.eu.com

    Claudio Scordino
     

09 Mar, 2018

1 commit

  • Instead of trying to duplicate scheduler state to track if an RT task
    is running, directly use the scheduler runqueue state for it.

    This vastly simplifies things and fixes a number of bugs related to
    sugov and the scheduler getting out of sync wrt this state.

    As a consequence, we now also update the remote cfs/dl state when
    iterating the shared mask.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Thomas Gleixner
    Cc: Viresh Kumar
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

04 Mar, 2018

1 commit

  • Do the following cleanups and simplifications:

    - sched/sched.h already includes that header, so there is no need to
    include it in sched/core.c again.

    - order the headers alphabetically

    - add all headers to kernel/sched/sched.h

    - remove all unnecessary includes from the .c files that
    are already included in kernel/sched/sched.h.

    Finally, make all scheduler .c files use a single common header:

    #include "sched.h"

    ... which now contains a union of the relied upon headers.

    This makes the various .c files easier to read and easier to handle.

    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

03 Mar, 2018

1 commit

  • A good number of small style inconsistencies have accumulated
    in the scheduler core, so do a pass over them to harmonize
    all these details:

    - fix spelling in comments,

    - use curly braces for multi-line statements,

    - remove unnecessary parentheses from integer literals,

    - capitalize consistently,

    - remove stray newlines,

    - add comments where necessary,

    - remove invalid/unnecessary comments,

    - align structure definitions and other data types vertically,

    - add missing newlines for increased readability,

    - fix vertical tabulation where it's misaligned,

    - harmonize preprocessor conditional block labeling
    and vertical alignment,

    - remove line-breaks where they uglify the code,

    - add newline after local variable definitions,

    No change in functionality:

    md5:
    1191fa0a890cfa8132156d2959d7e9e2 built-in.o.before.asm
    1191fa0a890cfa8132156d2959d7e9e2 built-in.o.after.asm

    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar