05 Jan, 2021
1 commit
-
When entering cluster-wide or system-wide power mode, the Exynos CPU
power management driver checks the next hrtimer events of the CPUs
composing the power domain to prevent unnecessary attempts to enter
the power mode. Since struct cpuidle_device has next_hrtimer, this
can be solved by passing the cpuidle device as a parameter of the
vendor hook.

In order to improve responsiveness, it is necessary to prevent
entering a deep idle state in boosting scenarios, so the vendor
driver should be able to control the idle state.

Due to the above requirements, the parameters required for idle enter
and exit are different, so the vendor hook is separated into
cpu_idle_enter and cpu_idle_exit.

Bug: 176198732
Change-Id: I2262ba1bae5e6622a8e76bc1d5d16fb27af0bb8a
Signed-off-by: Park Bumgyu
25 Oct, 2020
1 commit
-
…g/pub/scm/linux/kernel/git/konrad/swiotlb") into android-mainline
Resolves merge issues with:
drivers/cpufreq/cpufreq.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic2907cf71867dab7b8e1b5fcbd4c888fc01f8c22
28 Sep, 2020
1 commit
27 Sep, 2020
1 commit
-
…x/kernel/git/jejb/scsi") into 'android-mainline'
Fixes up a merge issue in:
net/ipv6/route.c
on the way to a 5.9-rc7 release.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4eb508eb3761b95ad8f39dd79f03b3352481ceaf
23 Sep, 2020
2 commits
-
CPUs may fail to enter the chosen idle state if there was a
pending interrupt, causing the cpuidle driver to return an error
value.

Record that and export it via sysfs along with the other idle state
statistics.

This could prove useful in understanding the behavior of the governor
and the system during usecases that involve multiple CPUs.

Signed-off-by: Lina Iyer
[ rjw: Changelog and documentation edits ]
Signed-off-by: Rafael J. Wysocki
-
Commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the
idle path") moved the rcu_idle_enter|exit() calls into the cpuidle
core. However, it forgot to remove a couple of comments in
enter_s2idle_proper() about why RCU_NONIDLE was needed earlier, so
let's drop them as they have become a bit misleading.

Fixes: 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the idle path")
Signed-off-by: Ulf Hansson
Signed-off-by: Rafael J. Wysocki
21 Sep, 2020
1 commit
-
Linux 5.9-rc6
Signed-off-by: Greg Kroah-Hartman
Change-Id: I3bccdbb773bfc2c604742e6ff5983bf0b61ba0b5
17 Sep, 2020
1 commit
-
Some drivers have to do significant work, some of which relies on RCU
still being active. Instead of using RCU_NONIDLE in the drivers and
flipping RCU back on, allow drivers to take over RCU-idle duty.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Ulf Hansson
Tested-by: Borislav Petkov
Signed-off-by: Rafael J. Wysocki
01 Sep, 2020
1 commit
-
Linux 5.9-rc3
Signed-off-by: Greg Kroah-Hartman
Change-Id: Ic7758bc57a7d91861657388ddd015db5c5db5480
26 Aug, 2020
3 commits
-
This allows moving the leave_mm() call into generic code before
rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Steven Rostedt (VMware)
Reviewed-by: Thomas Gleixner
Acked-by: Rafael J. Wysocki
Tested-by: Marco Elver
Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org
-
Lots of things take locks, due to a wee bug, rcu_lockdep didn't notice
that the locking tracepoints were using RCU.

Push rcu_idle_{enter,exit}() as deep as possible into the idle paths;
this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage.

Specifically, sched_clock_idle_wakeup_event() will use ktime, which
will use seqlocks, which will tickle lockdep, and
stop_critical_timings() uses lock.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Steven Rostedt (VMware)
Reviewed-by: Thomas Gleixner
Acked-by: Rafael J. Wysocki
Tested-by: Marco Elver
Link: https://lkml.kernel.org/r/20200821085348.310943801@infradead.org
-
Match the pattern elsewhere in this file.
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Steven Rostedt (VMware)
Reviewed-by: Thomas Gleixner
Acked-by: Rafael J. Wysocki
Tested-by: Marco Elver
Link: https://lkml.kernel.org/r/20200821085348.251340558@infradead.org
20 Aug, 2020
1 commit
-
Add an event that gathers the idle state the CPU attempted to enter
and the state it actually entered. Through this, the idle statistics
of the CPU can be obtained and used for vendor-specific algorithms or
for system analysis.

Bug: 162980647
Change-Id: I9c2491d524722042e881864488f7b3cf7e903d1e
Signed-off-by: Park Bumgyu
25 Jun, 2020
1 commit
-
Implement call_cpuidle_s2idle() in analogy with call_cpuidle()
for the s2idle-specific idle state entry and invoke it from
cpuidle_idle_call() to make the s2idle-specific idle entry code
path look more similar to the "regular" idle entry one.

No intentional functional impact.
Signed-off-by: Rafael J. Wysocki
Acked-by: Chen Yu
23 Jun, 2020
1 commit
-
Suspend to idle was found to not work on Goldmont CPU recently.
The issue happens due to:
1. On Goldmont the CPU in idle can only be woken up via IPIs,
   not POLLING mode, due to commit 08e237fa56a1 ("x86/cpu: Add
   workaround for MONITOR instruction erratum on Goldmont based
   CPUs").
2. When the CPU is entering the suspend to idle process, the
   _TIF_POLLING_NRFLAG remains on, because cpuidle_enter_s2idle()
   doesn't match call_cpuidle() exactly.
3. Commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
   makes use of _TIF_POLLING_NRFLAG to avoid sending IPIs to idle
   CPUs.
4. As a result, some IPI-related functions might not work
   well during suspend to idle on Goldmont. For example, one
   suspected victim:

   tick_unfreeze() -> timekeeping_resume() -> hrtimers_resume()
   -> clock_was_set() -> on_each_cpu()

   might wait forever, because the IPIs will not be sent to the CPUs
   which are sleeping with _TIF_POLLING_NRFLAG set, and a Goldmont CPU
   could not be woken up by only setting _TIF_NEED_RESCHED on the
   monitor address.

To avoid that, clear the _TIF_POLLING_NRFLAG flag before invoking
enter_s2idle_proper() in cpuidle_enter_s2idle(), in analogy with the
call_cpuidle() code flow.

Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
Suggested-by: Peter Zijlstra (Intel)
Suggested-by: Rafael J. Wysocki
Reported-by: kbuild test robot
Signed-off-by: Chen Yu
[ rjw: Subject / changelog ]
Signed-off-by: Rafael J. Wysocki
13 Feb, 2020
1 commit
-
Notice that pm_qos_remove_notifier() is not used at all and the only
caller of pm_qos_add_notifier() is the cpuidle core, which only needs
the PM_QOS_CPU_DMA_LATENCY notifier to invoke wake_up_all_idle_cpus()
upon changes of the PM_QOS_CPU_DMA_LATENCY target value.

First, to ensure that wake_up_all_idle_cpus() will be called
whenever the PM_QOS_CPU_DMA_LATENCY target value changes, modify the
pm_qos_add/update/remove_request() family of functions to check if
the effective constraint for PM_QOS_CPU_DMA_LATENCY has changed
and call wake_up_all_idle_cpus() directly in that case.

Next, drop the PM_QOS_CPU_DMA_LATENCY notifier from cpuidle as it is
not necessary any more.

Finally, drop both pm_qos_add_notifier() and pm_qos_remove_notifier(),
as they have no callers now, along with cpu_dma_lat_notifier, which is
only used by them.

Signed-off-by: Rafael J. Wysocki
Reviewed-by: Ulf Hansson
Reviewed-by: Amit Kucheria
Tested-by: Amit Kucheria
23 Jan, 2020
2 commits
-
Merge changes updating the ACPI processor driver in order to export
acpi_processor_evaluate_cst() to the code outside of it and adding
ACPI support to the intel_idle driver based on that.

* intel_idle+acpi:
Documentation: admin-guide: PM: Add intel_idle document
intel_idle: Use ACPI _CST on server systems
intel_idle: Add module parameter to prevent ACPI _CST from being used
intel_idle: Allow ACPI _CST to be used for selected known processors
cpuidle: Allow idle states to be disabled by default
intel_idle: Use ACPI _CST for processor models without C-state tables
intel_idle: Refactor intel_idle_cpuidle_driver_init()
ACPI: processor: Export acpi_processor_evaluate_cst()
ACPI: processor: Make ACPI_PROCESSOR_CSTATE depend on ACPI_PROCESSOR
ACPI: processor: Clean up acpi_processor_evaluate_cst()
ACPI: processor: Introduce acpi_processor_evaluate_cst()
ACPI: processor: Export function to claim _CST control
-
Fix cpuidle_find_deepest_state() kernel documentation to avoid
warnings when compiling with W=1.

Signed-off-by: Benjamin Gaignard
Acked-by: Randy Dunlap
Signed-off-by: Rafael J. Wysocki
27 Dec, 2019
1 commit
-
In certain situations it may be useful to prevent some idle states
from being used by default while allowing user space to enable them
later on.

For this purpose, introduce a new state flag, CPUIDLE_FLAG_OFF, to
mark idle states that should be disabled by default, make the core
set CPUIDLE_STATE_DISABLED_BY_USER for those states at the
initialization time and add a new state attribute in sysfs,
"default_status", to inform user space of the initial status of
the given idle state ("disabled" if CPUIDLE_FLAG_OFF is set for it,
"enabled" otherwise).Signed-off-by: Rafael J. Wysocki
13 Dec, 2019
1 commit
-
The data type of the target_residency_ns field in struct cpuidle_state
is u64, so it does not need to be cast into u64.

Get rid of the unnecessary type cast.
Signed-off-by: Rafael J. Wysocki
Acked-by: Daniel Lezcano
Signed-off-by: Rafael J. Wysocki
09 Dec, 2019
1 commit
-
Commit 259231a04561 ("cpuidle: add poll_limit_ns to cpuidle_device
structure") changed, by mistake, the target residency from the first
available sleep state to the last available sleep state (which should
be longer).

This might cause excessive polling.
Fixes: 259231a04561 ("cpuidle: add poll_limit_ns to cpuidle_device structure")
Signed-off-by: Marcelo Tosatti
Cc: 5.4+ # 5.4+
Signed-off-by: Rafael J. Wysocki
29 Nov, 2019
1 commit
-
After recent cpuidle updates the "disabled" field in struct
cpuidle_state is only used by two drivers (intel_idle and shmobile
cpuidle) for marking unusable idle states, but that may as well be
achieved with the help of a state flag, so define an "unusable" idle
state flag, CPUIDLE_FLAG_UNUSABLE, make the drivers in question use
it instead of the "disabled" field and make the core set
CPUIDLE_STATE_DISABLED_BY_DRIVER for the idle states with that flag
set.

After the above changes, the "disabled" field in struct cpuidle_state
is not used any more, so drop it.

No intentional functional impact.
Signed-off-by: Rafael J. Wysocki
20 Nov, 2019
2 commits
-
Modify cpuidle_use_deepest_state() to take an additional exit latency
limit argument to be passed to find_deepest_idle_state() and make
cpuidle_idle_call() pass dev->forced_idle_latency_limit_ns to it for
forced idle.

Suggested-by: Rafael J. Wysocki
Signed-off-by: Daniel Lezcano
[ rjw: Rebase and rearrange code, subject & changelog ]
Signed-off-by: Rafael J. Wysocki
-
In some cases it may be useful to specify an exit latency limit for
the idle state to be used during CPU idle time injection.

Instead of duplicating the information in struct cpuidle_device
or propagating the latency limit in the call stack, replace the
use_deepest_state field with forced_latency_limit_ns to represent
that limit, so that the deepest idle state with exit latency within
that limit is forced (i.e. no governors) when it is set.

A zero exit latency limit for forced idle means to use governors in
the usual way (analogous to use_deepest_state equal to "false" before
this change).

Additionally, add play_idle_precise() taking two arguments, the
duration of forced idle and the idle state exit latency limit, both
in nanoseconds, and redefine play_idle() as a wrapper around that
new function.

This change is preparatory; no functional impact is expected.
Suggested-by: Rafael J. Wysocki
Signed-off-by: Daniel Lezcano
[ rjw: Subject, changelog, cpuidle_use_deepest_state() kerneldoc, whitespace ]
Signed-off-by: Rafael J. Wysocki
12 Nov, 2019
1 commit
-
Currently, the cpuidle subsystem uses microseconds as the unit of
time, which (among other things) causes the idle loop to incur some
integer division overhead for no clear benefit.

In order to allow cpuidle to measure time in nanoseconds, add two
new fields, exit_latency_ns and target_residency_ns, to represent the
exit latency and target residency of an idle state in nanoseconds,
respectively, to struct cpuidle_state and initialize them with the
help of the corresponding values in microseconds provided by drivers.
Additionally, change cpuidle_governor_latency_req() to return the
idle state exit latency constraint in nanoseconds.

Also measure idle state residency (last_residency_ns in struct
cpuidle_device and time_ns in struct cpuidle_driver) in nanoseconds
and update the cpuidle core and governors accordingly.

However, the menu governor still computes typical intervals in
microseconds to avoid integer overflows.

Signed-off-by: Rafael J. Wysocki
Acked-by: Peter Zijlstra (Intel)
Acked-by: Doug Smythies
Tested-by: Doug Smythies
06 Nov, 2019
1 commit
-
There are two reasons why CPU idle states may be disabled: either
because the driver has disabled them or because they have been
disabled by user space via sysfs.

In the former case, the state's "disabled" flag is set once during
the initialization of the driver and it is never cleared later (it
is effectively read-only). In the latter case, the "disable" field
of the given state's cpuidle_state_usage struct is set and it may be
changed via sysfs. Thus checking whether or not an idle state has
been disabled involves reading these two flags every time.

In order to avoid the additional check of the state's "disabled" flag
(which is effectively read-only anyway), use its value at init time
to set a (new) flag in the "disable" field of that state's
cpuidle_state_usage structure, and use the sysfs interface to
manipulate another (new) flag in it. This way the state is disabled
whenever the "disable" field of its cpuidle_state_usage structure is
nonzero, whatever the reason, and that field is the only place to
look into to check whether or not the state has been disabled.

Signed-off-by: Rafael J. Wysocki
Acked-by: Daniel Lezcano
Acked-by: Peter Zijlstra (Intel)
30 Jul, 2019
1 commit
-
Add a poll_limit_ns variable to the cpuidle_device structure.
Calculate and configure it in the new cpuidle_poll_time
function in case it is zero.

Individual governors are allowed to override this value.
Signed-off-by: Marcelo Tosatti
Signed-off-by: Rafael J. Wysocki
10 Apr, 2019
1 commit
-
To be able to predict the sleep duration for a CPU entering idle, it
is essential to know the expiration time of the next timer. Both the
teo and the menu cpuidle governors already use this information for
CPU idle state selection.

Moving forward, a similar prediction needs to be made for a group of
idle CPUs rather than for a single one, and the following changes
implement a new genpd governor for that purpose.

In order to support that feature, add a new function called
tick_nohz_get_next_hrtimer() that will return the next hrtimer
expiration time of a given CPU, to be invoked after deciding
whether or not to stop the scheduler tick on that CPU.

Make the cpuidle core call tick_nohz_get_next_hrtimer() right
before invoking the ->enter() callback provided by the cpuidle
driver for the given state and store its return value in the
per-CPU struct cpuidle_device, so as to make it available to code
outside of cpuidle.

Note that at the point when cpuidle calls tick_nohz_get_next_hrtimer(),
the governor's ->select() callback has already returned and indicated
whether or not the tick should be stopped, so in fact the value
returned by tick_nohz_get_next_hrtimer() always is the next hrtimer
expiration time for the given CPU, possibly including the tick (if
it hasn't been stopped).

Co-developed-by: Lina Iyer
Co-developed-by: Daniel Lezcano
Acked-by: Daniel Lezcano
Signed-off-by: Ulf Hansson
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki
13 Dec, 2018
1 commit
-
Add two new metrics for CPU idle states, "above" and "below", to count
the number of times the given state had been asked for (or entered
from the kernel's perspective), but the observed idle duration turned
out to be too short or too long for it (respectively).

These metrics help to estimate the quality of the CPU idle governor
in use.

Signed-off-by: Rafael J. Wysocki
11 Dec, 2018
1 commit
-
Add cpuidle.governor= command line parameter to allow the default
cpuidle governor to be replaced.

That is useful, for example, if someone running a tickful kernel
wants to use the menu governor on it.

Signed-off-by: Rafael J. Wysocki
18 Sep, 2018
1 commit
-
Currently, ktime_us_delta() is invoked unconditionally to compute the
idle residency of the CPU, but it only makes sense to do that if a
valid idle state has been entered, so move the ktime_us_delta()
invocation after the entered_state >= 0 check.

While at it, merge two comment blocks in there into one and drop
a space in the type cast of diff.

This patch has no functional changes.
Signed-off-by: Fieah Lim
[ rjw: Changelog cleanup, comment format fix ]
Signed-off-by: Rafael J. Wysocki
06 Apr, 2018
1 commit
-
Add a new pointer argument to cpuidle_select() and to the ->select
cpuidle governor callback to allow a boolean value indicating
whether or not the tick should be stopped before entering the
selected state to be returned from there.

Make the ladder governor ignore that pointer (to preserve its
current behavior) and make the menu governor return "false" through
it if:
(1) the idle exit latency is constrained at 0, or
(2) the selected state is a polling one, or
(3) the expected idle period duration is within the tick period
range.

In addition to that, the correction factor computations in the menu
governor need to take the possibility that the tick may not be
stopped into account to avoid artificially small correction factor
values. To that end, add a mechanism to record tick wakeups, as
suggested by Peter Zijlstra, and use it to modify the menu_update()
behavior when tick wakeup occurs. Namely, if the CPU is woken up by
the tick and the return value of tick_nohz_get_sleep_length() is not
within the tick boundary, the predicted idle duration is likely too
short, so make menu_update() try to compensate for that by updating
the governor statistics as though the CPU was idle for a long time.

Since the value returned through the new argument pointer of
cpuidle_select() is not used by its caller yet, this change by
itself is not expected to alter the functionality of the code.Signed-off-by: Rafael J. Wysocki
Acked-by: Peter Zijlstra (Intel)
29 Mar, 2018
1 commit
-
Add a new attribute group called "s2idle" under the sysfs directory
of each cpuidle state that supports the ->enter_s2idle callback
and put two new attributes, "usage" and "time", into that group to
represent the number of times the given state was requested for
suspend-to-idle and the total time spent in suspend-to-idle after
requesting that state, respectively.

That will allow diagnostic information related to suspend-to-idle
to be collected without enabling advanced debug features and
analyzing dmesg output.

Signed-off-by: Rafael J. Wysocki
09 Nov, 2017
2 commits
-
Clean up cpuidle_enable_device() to avoid doing an assignment
in an expression evaluated as an argument of if (), which also
makes the code in question more readable.

Signed-off-by: Gaurav Jindal
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki
-
Do not fetch per CPU drv if cpuidle_curr_governor is NULL
to avoid useless per-CPU processing.

Signed-off-by: Gaurav Jindal
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki
28 Sep, 2017
1 commit
-
When failing to enter broadcast timer mode for an idle state that
requires it, a new state is selected that does not require broadcast,
but the broadcast variable remains set. This causes
tick_broadcast_exit to be called despite not having entered broadcast
mode.

This causes the WARN_ON_ONCE(!irqs_disabled()) to trigger in some
cases. It does not appear to cause problems for code today, but it
seems to violate the interface, so it should be fixed.

Signed-off-by: Nicholas Piggin
Reviewed-by: Thomas Gleixner
Signed-off-by: Rafael J. Wysocki
11 Aug, 2017
1 commit
-
Rename the ->enter_freeze cpuidle driver callback to ->enter_s2idle
to make it clear that it is used for entering suspend-to-idle and
rename the related functions, variables and so on accordingly.

Signed-off-by: Rafael J. Wysocki
15 May, 2017
1 commit
-
Ville reported that on his Core2, which has TSC stop in idle, we would
always report very short idle durations. He tracked this down to
commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()"),
which replaces ktime_get() with local_clock().

Add a sched_clock_idle_wakeup_event() call, which will re-sync the
clock with ktime_get_ns() when TSC is unstable and no-op otherwise.

Reported-by: Ville Syrjälä
Tested-by: Ville Syrjälä
Signed-off-by: Peter Zijlstra (Intel)
Cc: Daniel Lezcano
Cc: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Rafael J . Wysocki
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Fixes: e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
Signed-off-by: Ingo Molnar
02 May, 2017
1 commit
-
In case there are no cpuidle devices registered, dev will be NULL and
a panic will be triggered as below.

In this patch, add a check of dev before usage, as is done in
cpuidle_idle_call().

Panic without the fix:
[  184.961328] BUG: unable to handle kernel NULL pointer dereference at
(null)
[  184.961328] IP: cpuidle_use_deepest_state+0x30/0x60
...
[  184.961328] play_idle+0x8d/0x210
[  184.961328] ? __schedule+0x359/0x8e0
[  184.961328] ? _raw_spin_unlock_irqrestore+0x28/0x50
[  184.961328] ? kthread_queue_delayed_work+0x41/0x80
[  184.961328] clamp_idle_injection_func+0x64/0x1e0

Fixes: bb8313b603eb8 ("cpuidle: Allow enforcing deepest idle state selection")
Signed-off-by: Li, Fei
Tested-by: Shi, Feng
Reviewed-by: Andy Shevchenko
Cc: 4.10+ # 4.10+
Signed-off-by: Rafael J. Wysocki
02 Mar, 2017
1 commit
-
We are going to split out of , which
will have to be picked up from other headers and .c files.

Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar