05 Jan, 2021

1 commit

  • When entering a cluster-wide or system-wide power mode, the Exynos
    CPU power management driver checks the next hrtimer events of the
    CPUs composing the power domain to prevent unnecessary attempts to
    enter the power mode. Since struct cpuidle_device has next_hrtimer,
    this can be solved by passing the cpuidle device as a parameter of
    the vendor hook.

    In order to improve responsiveness, it is necessary to prevent
    entering deep idle states in boosting scenarios, so the vendor
    driver should be able to control the idle state.

    Because of the above requirements, the parameters required for idle
    entry and exit differ, so the vendor hook is separated into
    cpu_idle_enter and cpu_idle_exit.

    Bug: 176198732

    Change-Id: I2262ba1bae5e6622a8e76bc1d5d16fb27af0bb8a
    Signed-off-by: Park Bumgyu

    Park Bumgyu
     

25 Oct, 2020

1 commit


28 Sep, 2020

1 commit


27 Sep, 2020

1 commit


23 Sep, 2020

2 commits

  • A CPU may fail to enter the chosen idle state if there is a
    pending interrupt, causing the cpuidle driver to return an error
    value.

    Record that and export it via sysfs along with the other idle state
    statistics.

    This could prove useful in understanding the behavior of the governor
    and the system during use cases that involve multiple CPUs.

    Signed-off-by: Lina Iyer
    [ rjw: Changelog and documentation edits ]
    Signed-off-by: Rafael J. Wysocki

    Lina Iyer
     
  • Commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the
    idle path") moved the rcu_idle_enter|exit() calls into the cpuidle core.

    However, it forgot to remove a couple of comments in enter_s2idle_proper()
    about why RCU_NONIDLE earlier was needed. So, let's drop them as they have
    become a bit misleading.

    Fixes: 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the idle path")
    Signed-off-by: Ulf Hansson
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     

21 Sep, 2020

1 commit


17 Sep, 2020

1 commit

  • Some drivers have to do significant work, some of which relies on RCU
    still being active. Instead of using RCU_NONIDLE in the drivers and
    flipping RCU back on, allow drivers to take over RCU-idle duty.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Ulf Hansson
    Tested-by: Borislav Petkov
    Signed-off-by: Rafael J. Wysocki

    Peter Zijlstra
     

01 Sep, 2020

1 commit


26 Aug, 2020

3 commits

  • This allows moving the leave_mm() call into generic code before
    rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Thomas Gleixner
    Acked-by: Rafael J. Wysocki
    Tested-by: Marco Elver
    Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org

    Peter Zijlstra
     
  • Lots of things take locks; due to a wee bug, rcu_lockdep didn't notice
    that the locking tracepoints were using RCU.

    Push rcu_idle_{enter,exit}() as deep as possible into the idle paths,
    this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage.

    Specifically, sched_clock_idle_wakeup_event() will use ktime, which
    will use seqlocks, which will tickle lockdep, and
    stop_critical_timings() uses locks.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Thomas Gleixner
    Acked-by: Rafael J. Wysocki
    Tested-by: Marco Elver
    Link: https://lkml.kernel.org/r/20200821085348.310943801@infradead.org

    Peter Zijlstra
     
  • Match the pattern elsewhere in this file.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Thomas Gleixner
    Acked-by: Rafael J. Wysocki
    Tested-by: Marco Elver
    Link: https://lkml.kernel.org/r/20200821085348.251340558@infradead.org

    Peter Zijlstra
     

20 Aug, 2020

1 commit

  • Add an event that gathers the idle state the CPU attempted to enter
    and the state it actually entered. Through this, the idle statistics
    of the CPU can be obtained and used for vendor-specific algorithms
    or for system analysis.

    Bug: 162980647

    Change-Id: I9c2491d524722042e881864488f7b3cf7e903d1e
    Signed-off-by: Park Bumgyu

    Park Bumgyu
     

25 Jun, 2020

1 commit

  • Implement call_cpuidle_s2idle() in analogy with call_cpuidle()
    for the s2idle-specific idle state entry and invoke it from
    cpuidle_idle_call() to make the s2idle-specific idle entry code
    path look more similar to the "regular" idle entry one.

    No intentional functional impact.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Chen Yu

    Rafael J. Wysocki
     

23 Jun, 2020

1 commit

  • Suspend to idle was recently found not to work on Goldmont CPUs.

    The issue happens due to:

    1. On Goldmont the CPU in idle can only be woken up via IPIs,
    not POLLING mode, due to commit 08e237fa56a1 ("x86/cpu: Add
    workaround for MONITOR instruction erratum on Goldmont based
    CPUs")

    2. When the CPU enters the suspend-to-idle process, the
    _TIF_POLLING_NRFLAG remains set, because cpuidle_enter_s2idle()
    doesn't match call_cpuidle() exactly.

    3. Commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
    makes use of _TIF_POLLING_NRFLAG to avoid sending IPIs to idle
    CPUs.

    4. As a result, some IPIs related functions might not work
    well during suspend to idle on Goldmont. For example, one
    suspected victim:

    tick_unfreeze() -> timekeeping_resume() -> hrtimers_resume()
    -> clock_was_set() -> on_each_cpu() might wait forever,
    because the IPIs will not be sent to the CPUs which are
    sleeping with _TIF_POLLING_NRFLAG set, and Goldmont CPU
    could not be woken up by only setting _TIF_NEED_RESCHED
    on the monitor address.

    To avoid that, clear the _TIF_POLLING_NRFLAG flag before invoking
    enter_s2idle_proper() in cpuidle_enter_s2idle() in analogy with the
    call_cpuidle() code flow.

    Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
    Suggested-by: Peter Zijlstra (Intel)
    Suggested-by: Rafael J. Wysocki
    Reported-by: kbuild test robot
    Signed-off-by: Chen Yu
    [ rjw: Subject / changelog ]
    Signed-off-by: Rafael J. Wysocki

    Chen Yu
     

13 Feb, 2020

1 commit

  • Notice that pm_qos_remove_notifier() is not used at all and the only
    caller of pm_qos_add_notifier() is the cpuidle core, which only needs
    the PM_QOS_CPU_DMA_LATENCY notifier to invoke wake_up_all_idle_cpus()
    upon changes of the PM_QOS_CPU_DMA_LATENCY target value.

    First, to ensure that wake_up_all_idle_cpus() will be called
    whenever the PM_QOS_CPU_DMA_LATENCY target value changes, modify the
    pm_qos_add/update/remove_request() family of functions to check if
    the effective constraint for the PM_QOS_CPU_DMA_LATENCY has changed
    and call wake_up_all_idle_cpus() directly in that case.

    Next, drop the PM_QOS_CPU_DMA_LATENCY notifier from cpuidle as it is
    not necessary any more.

    Finally, drop both pm_qos_add_notifier() and pm_qos_remove_notifier(),
    as they have no callers now, along with cpu_dma_lat_notifier which is
    only used by them.

    Signed-off-by: Rafael J. Wysocki
    Reviewed-by: Ulf Hansson
    Reviewed-by: Amit Kucheria
    Tested-by: Amit Kucheria

    Rafael J. Wysocki
     

23 Jan, 2020

2 commits

  • Merge changes updating the ACPI processor driver in order to export
    acpi_processor_evaluate_cst() to the code outside of it and adding
    ACPI support to the intel_idle driver based on that.

    * intel_idle+acpi:
    Documentation: admin-guide: PM: Add intel_idle document
    intel_idle: Use ACPI _CST on server systems
    intel_idle: Add module parameter to prevent ACPI _CST from being used
    intel_idle: Allow ACPI _CST to be used for selected known processors
    cpuidle: Allow idle states to be disabled by default
    intel_idle: Use ACPI _CST for processor models without C-state tables
    intel_idle: Refactor intel_idle_cpuidle_driver_init()
    ACPI: processor: Export acpi_processor_evaluate_cst()
    ACPI: processor: Make ACPI_PROCESSOR_CSTATE depend on ACPI_PROCESSOR
    ACPI: processor: Clean up acpi_processor_evaluate_cst()
    ACPI: processor: Introduce acpi_processor_evaluate_cst()
    ACPI: processor: Export function to claim _CST control

    Rafael J. Wysocki
     
  • Fix cpuidle_find_deepest_state() kernel documentation to avoid
    warnings when compiling with W=1.

    Signed-off-by: Benjamin Gaignard
    Acked-by: Randy Dunlap
    Signed-off-by: Rafael J. Wysocki

    Benjamin Gaignard
     

27 Dec, 2019

1 commit

  • In certain situations it may be useful to prevent some idle states
    from being used by default while allowing user space to enable them
    later on.

    For this purpose, introduce a new state flag, CPUIDLE_FLAG_OFF, to
    mark idle states that should be disabled by default, make the core
    set CPUIDLE_STATE_DISABLED_BY_USER for those states at
    initialization time, and add a new state attribute in sysfs,
    "default_status", to inform user space of the initial status of
    the given idle state ("disabled" if CPUIDLE_FLAG_OFF is set for it,
    "enabled" otherwise).

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

13 Dec, 2019

1 commit


09 Dec, 2019

1 commit

  • Commit 259231a04561 ("cpuidle: add poll_limit_ns to cpuidle_device
    structure") changed, by mistake, the target residency from the first
    available sleep state to the last available sleep state (which should
    be longer).

    This might cause excessive polling.

    Fixes: 259231a04561 ("cpuidle: add poll_limit_ns to cpuidle_device structure")
    Signed-off-by: Marcelo Tosatti
    Cc: 5.4+ # 5.4+
    Signed-off-by: Rafael J. Wysocki

    Marcelo Tosatti
     

29 Nov, 2019

1 commit

  • After recent cpuidle updates the "disabled" field in struct
    cpuidle_state is only used by two drivers (intel_idle and shmobile
    cpuidle) for marking unusable idle states, but that may as well be
    achieved with the help of a state flag, so define an "unusable" idle
    state flag, CPUIDLE_FLAG_UNUSABLE, make the drivers in question use
    it instead of the "disabled" field and make the core set
    CPUIDLE_STATE_DISABLED_BY_DRIVER for the idle states with that flag
    set.

    After the above changes, the "disabled" field in struct cpuidle_state
    is not used any more, so drop it.

    No intentional functional impact.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

20 Nov, 2019

2 commits

  • Modify cpuidle_use_deepest_state() to take an additional exit latency
    limit argument to be passed to find_deepest_idle_state() and make
    cpuidle_idle_call() pass dev->forced_idle_latency_limit_ns to it for
    forced idle.

    Suggested-by: Rafael J. Wysocki
    Signed-off-by: Daniel Lezcano
    [ rjw: Rebase and rearrange code, subject & changelog ]
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     
  • In some cases it may be useful to specify an exit latency limit for
    the idle state to be used during CPU idle time injection.

    Instead of duplicating the information in struct cpuidle_device
    or propagating the latency limit in the call stack, replace the
    use_deepest_state field with forced_latency_limit_ns to represent
    that limit, so that the deepest idle state with exit latency within
    that limit is forced (i.e. no governors) when it is set.

    A zero exit latency limit for forced idle means to use governors in
    the usual way (analogous to use_deepest_state equal to "false" before
    this change).

    Additionally, add play_idle_precise() taking two arguments, the
    duration of forced idle and the idle state exit latency limit, both
    in nanoseconds, and redefine play_idle() as a wrapper around that
    new function.

    This change is preparatory, no functional impact is expected.

    Suggested-by: Rafael J. Wysocki
    Signed-off-by: Daniel Lezcano
    [ rjw: Subject, changelog, cpuidle_use_deepest_state() kerneldoc, whitespace ]
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     

12 Nov, 2019

1 commit

  • Currently, the cpuidle subsystem uses microseconds as the unit of
    time which (among other things) causes the idle loop to incur some
    integer division overhead for no clear benefit.

    In order to allow cpuidle to measure time in nanoseconds, add two
    new fields, exit_latency_ns and target_residency_ns, to represent the
    exit latency and target residency of an idle state in nanoseconds,
    respectively, to struct cpuidle_state and initialize them with the
    help of the corresponding values in microseconds provided by drivers.
    Additionally, change cpuidle_governor_latency_req() to return the
    idle state exit latency constraint in nanoseconds.

    Also measure idle state residency (last_residency_ns in struct
    cpuidle_device and time_ns in struct cpuidle_driver) in nanoseconds
    and update the cpuidle core and governors accordingly.

    However, the menu governor still computes typical intervals in
    microseconds to avoid integer overflows.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Doug Smythies
    Tested-by: Doug Smythies

    Rafael J. Wysocki
     

06 Nov, 2019

1 commit

  • There are two reasons why CPU idle states may be disabled: either
    because the driver has disabled them or because they have been
    disabled by user space via sysfs.

    In the former case, the state's "disabled" flag is set once during
    the initialization of the driver and it is never cleared later (it
    is read-only effectively). In the latter case, the "disable" field
    of the given state's cpuidle_state_usage struct is set and it may be
    changed via sysfs. Thus checking whether or not an idle state has
    been disabled involves reading these two flags every time.

    In order to avoid the additional check of the state's "disabled" flag
    (which is effectively read-only anyway), use the value of it at the
    init time to set a (new) flag in the "disable" field of that state's
    cpuidle_state_usage structure and use the sysfs interface to
    manipulate another (new) flag in it. This way the state is disabled
    whenever the "disable" field of its cpuidle_state_usage structure is
    nonzero, whatever the reason, and it is the only place to look into
    to check whether or not the state has been disabled.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Daniel Lezcano
    Acked-by: Peter Zijlstra (Intel)

    Rafael J. Wysocki
     

30 Jul, 2019

1 commit


10 Apr, 2019

1 commit

  • To be able to predict the sleep duration for a CPU entering idle, it
    is essential to know the expiration time of the next timer. Both the
    teo and the menu cpuidle governors already use this information for
    CPU idle state selection.

    Moving forward, a similar prediction needs to be made for a group of
    idle CPUs rather than for a single one and the following changes
    implement a new genpd governor for that purpose.

    In order to support that feature, add a new function called
    tick_nohz_get_next_hrtimer() that will return the next hrtimer
    expiration time of a given CPU to be invoked after deciding
    whether or not to stop the scheduler tick on that CPU.

    Make the cpuidle core call tick_nohz_get_next_hrtimer() right
    before invoking the ->enter() callback provided by the cpuidle
    driver for the given state and store its return value in the
    per-CPU struct cpuidle_device, so as to make it available to code
    outside of cpuidle.

    Note that at the point when cpuidle calls tick_nohz_get_next_hrtimer(),
    the governor's ->select() callback has already returned and indicated
    whether or not the tick should be stopped, so in fact the value
    returned by tick_nohz_get_next_hrtimer() always is the next hrtimer
    expiration time for the given CPU, possibly including the tick (if
    it hasn't been stopped).

    Co-developed-by: Lina Iyer
    Co-developed-by: Daniel Lezcano
    Acked-by: Daniel Lezcano
    Signed-off-by: Ulf Hansson
    [ rjw: Subject & changelog ]
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     

13 Dec, 2018

1 commit

  • Add two new metrics for CPU idle states, "above" and "below", to count
    the number of times the given state had been asked for (or entered
    from the kernel's perspective), but the observed idle duration turned
    out to be too short or too long for it (respectively).

    These metrics help to estimate the quality of the CPU idle governor
    in use.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

11 Dec, 2018

1 commit


18 Sep, 2018

1 commit

  • Currently, ktime_us_delta() is invoked unconditionally to compute the
    idle residency of the CPU, but it only makes sense to do that if a
    valid idle state has been entered, so move the ktime_us_delta()
    invocation after the entered_state >= 0 check.

    While at it, merge the two comment blocks in there into one and
    drop the space in the type cast of diff.

    This patch has no functional changes.

    Signed-off-by: Fieah Lim
    [ rjw: Changelog cleanup, comment format fix ]
    Signed-off-by: Rafael J. Wysocki

    Fieah Lim
     

06 Apr, 2018

1 commit

  • Add a new pointer argument to cpuidle_select() and to the ->select
    cpuidle governor callback to allow a boolean value indicating
    whether or not the tick should be stopped before entering the
    selected state to be returned from there.

    Make the ladder governor ignore that pointer (to preserve its
    current behavior) and make the menu governor return 'false' through
    it if:
    (1) the idle exit latency is constrained at 0, or
    (2) the selected state is a polling one, or
    (3) the expected idle period duration is within the tick period
    range.

    In addition to that, the correction factor computations in the menu
    governor need to take the possibility that the tick may not be
    stopped into account to avoid artificially small correction factor
    values. To that end, add a mechanism to record tick wakeups, as
    suggested by Peter Zijlstra, and use it to modify the menu_update()
    behavior when tick wakeup occurs. Namely, if the CPU is woken up by
    the tick and the return value of tick_nohz_get_sleep_length() is not
    within the tick boundary, the predicted idle duration is likely too
    short, so make menu_update() try to compensate for that by updating
    the governor statistics as though the CPU was idle for a long time.

    Since the value returned through the new argument pointer of
    cpuidle_select() is not used by its caller yet, this change by
    itself is not expected to alter the functionality of the code.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)

    Rafael J. Wysocki
     

29 Mar, 2018

1 commit

  • Add a new attribute group called "s2idle" under the sysfs directory
    of each cpuidle state that supports the ->enter_s2idle callback
    and put two new attributes, "usage" and "time", into that group to
    represent the number of times the given state was requested for
    suspend-to-idle and the total time spent in suspend-to-idle after
    requesting that state, respectively.

    That will allow diagnostic information related to suspend-to-idle
    to be collected without enabling advanced debug features and
    analyzing dmesg output.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

09 Nov, 2017

2 commits


28 Sep, 2017

1 commit

  • When failing to enter broadcast timer mode for an idle state that
    requires it, a new state is selected that does not require broadcast,
    but the broadcast variable remains set. This causes
    tick_broadcast_exit to be called despite not having entered broadcast
    mode.

    This causes the WARN_ON_ONCE(!irqs_disabled()) to trigger in some
    cases. It does not appear to cause problems for code today, but it
    seems to violate the interface, so it should be fixed.

    Signed-off-by: Nicholas Piggin
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki

    Nicholas Piggin
     

11 Aug, 2017

1 commit


15 May, 2017

1 commit

  • Ville reported that on his Core2, which has TSC stop in idle, we would
    always report very short idle durations. He tracked this down to
    commit:

    e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")

    which replaces ktime_get() with local_clock().

    Add a sched_clock_idle_wakeup_event() call, which will re-sync the
    clock with ktime_get_ns() when TSC is unstable and no-op otherwise.

    Reported-by: Ville Syrjälä
    Tested-by: Ville Syrjälä
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Daniel Lezcano
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Rafael J . Wysocki
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Fixes: e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

02 May, 2017

1 commit

  • If no cpuidle device is registered, dev will be NULL and a panic
    will be triggered as shown below. Add a check of dev before using
    it, as is done in cpuidle_idle_call().

    Panic without fix:
    [ 184.961328] BUG: unable to handle kernel NULL pointer dereference at
    (null)
    [ 184.961328] IP: cpuidle_use_deepest_state+0x30/0x60
    ...
    [ 184.961328] play_idle+0x8d/0x210
    [ 184.961328] ? __schedule+0x359/0x8e0
    [ 184.961328] ? _raw_spin_unlock_irqrestore+0x28/0x50
    [ 184.961328] ? kthread_queue_delayed_work+0x41/0x80
    [ 184.961328] clamp_idle_injection_func+0x64/0x1e0

    Fixes: bb8313b603eb8 ("cpuidle: Allow enforcing deepest idle state selection")
    Signed-off-by: Li, Fei
    Tested-by: Shi, Feng
    Reviewed-by: Andy Shevchenko
    Cc: 4.10+ # 4.10+
    Signed-off-by: Rafael J. Wysocki

    Li, Fei
     

02 Mar, 2017

1 commit