06 Feb, 2020

1 commit

  • [ Upstream commit 63f202e5edf161c2ccffa286a9a701e995427b15 ]

    If the current state with the maximum "early hits" metric in
    teo_select() is also the one "matching" the expected idle duration,
    it will be used as the candidate one for selection even if its
    "misses" metric is greater than its "hits" metric, which is not
    correct.

    In that case, the candidate state should be shallower than the
    current one and its "early hits" metric should be the maximum
    among the idle states shallower than the current one.

    To make that happen, modify teo_select() to save the index of
    the state whose "early hits" metric is the maximum for the
    range of states below the current one and go back to that state
    if it turns out that the current one should be rejected.

    Fixes: 159e48560f51 ("cpuidle: teo: Fix "early hits" handling for disabled idle states")
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Sasha Levin

    Rafael J. Wysocki
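
The fallback described in this entry can be sketched in user-space C. This is an illustration only, not the kernel's teo_select(): the struct, the function name and the exact acceptance test (hits at least as large as misses) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-bin statistics, named after the metrics in the text. */
struct bin_stats {
	unsigned int hits;
	unsigned int misses;
	unsigned int early_hits;
};

/*
 * Pick a candidate for a sleep length that "matches" state idx.
 * While scanning, remember the shallower state with the largest
 * "early hits" value so it can be used if idx has to be rejected.
 */
static int pick_candidate(const struct bin_stats *bins, int idx)
{
	int prev_max_early_idx = -1;
	unsigned int max_early = 0;
	int i;

	assert(bins != NULL && idx >= 0);

	for (i = 0; i < idx; i++) {
		if (bins[i].early_hits > max_early) {
			max_early = bins[i].early_hits;
			prev_max_early_idx = i;
		}
	}

	/* Assumed acceptance rule: keep idx when hits are not outnumbered. */
	if (bins[idx].hits >= bins[idx].misses)
		return idx;

	/* Otherwise go back to the best shallower state, if there is one. */
	return prev_max_early_idx >= 0 ? prev_max_early_idx : idx;
}
```

The point of saving the index during the scan is that the rejected state itself never competes for the "maximum early hits" slot.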
     

23 Jan, 2020

1 commit

  • commit 57388a2ccb6c2f554fee39772886c69b796dde53 upstream.

    Fix a simple bug in rotating array index.

    Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
    Signed-off-by: Ikjoon Jang
    Cc: 5.1+ # 5.1+
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Ikjoon Jang
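
The commit message does not show the code, so the following is only a generic illustration of this class of bug: a rotating (circular) index has to wrap before it can point past the end of the array. The names and the buffer size are hypothetical.

```c
#include <assert.h>

#define INTERVALS 8	/* hypothetical buffer size for this sketch */

struct recent_intervals {
	unsigned int values[INTERVALS];
	int next;	/* slot the next sample will be written to */
};

/* Store a sample and advance the index, wrapping before it leaves the array. */
static void record_interval(struct recent_intervals *r, unsigned int us)
{
	assert(r->next >= 0 && r->next < INTERVALS);

	r->values[r->next++] = us;
	if (r->next >= INTERVALS)	/* ">" here would allow one write past the end */
		r->next = 0;
}
```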
     

18 Dec, 2019

6 commits

  • commit 36fcb4292473cb9c9ce7706d038bcf0eda5cabeb upstream.

    Commit 259231a04561 ("cpuidle: add poll_limit_ns to cpuidle_device
    structure") changed, by mistake, the target residency from the first
    available sleep state to the last available sleep state (which should
    be longer).

    This might cause excessive polling.

    Fixes: 259231a04561 ("cpuidle: add poll_limit_ns to cpuidle_device structure")
    Signed-off-by: Marcelo Tosatti
    Cc: 5.4+ # 5.4+
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Marcelo Tosatti
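
A minimal sketch of the difference between "first" and "last" available sleep state, assuming a simplified state table. The struct and function below are hypothetical and are not the kernel's cpuidle_poll_time(); they only show how a loop that does not stop at the first match ends up with the deepest state's value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified idle state description. */
struct idle_state {
	bool polling;			/* busy-wait state, not a sleep state */
	bool disabled;
	uint64_t target_residency_ns;
};

/*
 * Return the polling limit: the target residency of the FIRST enabled
 * non-polling state. Without the break, the loop would keep going and
 * finish with the last (deepest) state's larger value.
 */
static uint64_t poll_limit_ns(const struct idle_state *states, int nr_states,
			      uint64_t fallback_ns)
{
	uint64_t limit_ns = fallback_ns;
	int i;

	assert(nr_states >= 0);

	for (i = 0; i < nr_states; i++) {
		if (states[i].polling || states[i].disabled)
			continue;
		limit_ns = states[i].target_residency_ns;
		break;
	}
	return limit_ns;
}
```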
     
  • commit 159e48560f51d9c2aa02d762a18cd24f7868ab27 upstream.

    The TEO governor uses idle duration "bins" defined in accordance with
    the CPU idle states table provided by the driver, so that each "bin"
    covers the idle duration range between the target residency of the
    idle state corresponding to it and the target residency of the closest
    deeper idle state. The governor collects statistics for each bin
    regardless of whether or not the idle state corresponding to it is
    currently enabled.

    In particular, the "early hits" metric measures the likelihood of a
    situation in which the idle duration measured after wakeup falls into
    the given bin, but the time till the next timer (sleep length) falls
    into a bin corresponding to one of the deeper idle states. It is
    used when the "hits" and "misses" metrics indicate that the state
    "matching" the sleep length should not be selected, so that the state
    with the maximum "early hits" value is selected instead of it.

    If the idle state corresponding to the given bin is disabled, it
    cannot be selected and if it turns out to be the one that should be
    selected, a shallower idle state needs to be used instead of it.
    Nevertheless, the metrics collected for the bin corresponding to it
    are still valid and need to be taken into account as though that
    state had not been disabled.

    As far as the "early hits" metric is concerned, teo_select() tries to
    take disabled states into account, but the state index corresponding
    to the maximum "early hits" value computed by it may be incorrect.
    Namely, it always uses the index of the previous maximum "early hits"
    state then, but there may be enabled idle states closer to the
    disabled one in question. In particular, if the current candidate
    state (whose index is the idx value) is closer to the disabled one
    and the "early hits" value of the disabled state is greater than the
    current maximum, the index of the current candidate state (idx)
    should replace the "maximum early hits state" index.

    Modify the code to handle that case correctly.

    Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
    Reported-by: Doug Smythies
    Signed-off-by: Rafael J. Wysocki
    Cc: 5.1+ # 5.1+
    Signed-off-by: Greg Kroah-Hartman

    Rafael J. Wysocki
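
The "bins" and the three metrics can be sketched as follows. This is a simplified model under stated assumptions: plain counters are used (how the real governor weights or decays its metrics is not described here), and all names are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-bin counters. */
struct bin_stats {
	unsigned int hits;
	unsigned int misses;
	unsigned int early_hits;
};

/*
 * Return the bin a duration falls into. Bin i covers the range from
 * target_residency[i] up to (not including) target_residency[i + 1];
 * the deepest bin is open-ended. The table must be sorted ascending.
 */
static int duration_to_bin(const unsigned int *target_residency, int nr_states,
			   unsigned int duration_us)
{
	int i, bin = 0;

	assert(nr_states > 0);

	for (i = 1; i < nr_states; i++) {
		if (duration_us < target_residency[i])
			break;
		bin = i;
	}
	return bin;
}

/* Account one wakeup; statistics are kept for every bin, enabled or not. */
static void account_wakeup(struct bin_stats *bins,
			   const unsigned int *target_residency, int nr_states,
			   unsigned int sleep_length_us, unsigned int measured_us)
{
	int sleep_bin, measured_bin;

	/* Assumption of this sketch: idle time cannot outlast the next timer. */
	if (measured_us > sleep_length_us)
		measured_us = sleep_length_us;

	sleep_bin = duration_to_bin(target_residency, nr_states, sleep_length_us);
	measured_bin = duration_to_bin(target_residency, nr_states, measured_us);

	if (measured_bin == sleep_bin) {
		bins[sleep_bin].hits++;
	} else {
		/* Woke up early: a miss for the sleep-length bin ... */
		bins[sleep_bin].misses++;
		/* ... and an early hit for the bin actually reached. */
		bins[measured_bin].early_hits++;
	}
}
```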
     
  • commit e43dcf20215f0287ea113102617ca04daa76b70e upstream.

    The TEO governor uses idle duration "bins" defined in accordance with
    the CPU idle states table provided by the driver, so that each "bin"
    covers the idle duration range between the target residency of the
    idle state corresponding to it and the target residency of the closest
    deeper idle state. The governor collects statistics for each bin
    regardless of whether or not the idle state corresponding to it is
    currently enabled.

    In particular, the "hits" and "misses" metrics measure the likelihood
    of a situation in which both the time till the next timer (sleep
    length) and the idle duration measured after wakeup fall into the
    given bin. Namely, if the "hits" value is greater than the "misses"
    one, that situation is more likely than the one in which the sleep
    length falls into the given bin, but the idle duration measured after
    wakeup falls into a bin corresponding to one of the shallower idle
    states.

    If the idle state corresponding to the given bin is disabled, it
    cannot be selected and if it turns out to be the one that should be
    selected, a shallower idle state needs to be used instead of it.
    Nevertheless, the metrics collected for the bin corresponding to it
    are still valid and need to be taken into account as though that
    state had not been disabled.

    For this reason, make teo_select() always use the "hits" and "misses"
    values of the idle duration range that the sleep length falls into
    even if the specific idle state corresponding to it is disabled and
    if the "hits" value is greater than the "misses" one, select the
    closest enabled shallower idle state in that case.

    Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
    Signed-off-by: Rafael J. Wysocki
    Cc: 5.1+ # 5.1+
    Signed-off-by: Greg Kroah-Hartman

    Rafael J. Wysocki
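
A rough sketch of the rule in the last paragraph of this entry: the matching bin's "hits" and "misses" are consulted even when its state is disabled, and the closest enabled shallower state is returned. The names are hypothetical and the rest of the selection logic is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-bin data for this sketch. */
struct bin {
	unsigned int hits;
	unsigned int misses;
	bool disabled;
};

/*
 * sleep_bin is the index of the bin the sleep length falls into.
 * Returns the state to use, or -1 when this rule does not pick one
 * and some other heuristic (not shown) would have to decide.
 */
static int select_for_sleep_bin(const struct bin *bins, int sleep_bin)
{
	int i;

	assert(sleep_bin >= 0);

	/* The bin's statistics count even if its state is disabled. */
	if (bins[sleep_bin].hits <= bins[sleep_bin].misses)
		return -1;

	/* Walk towards shallower states until an enabled one is found. */
	for (i = sleep_bin; i >= 0; i--) {
		if (!bins[i].disabled)
			return i;
	}
	return -1;
}
```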
     
  • commit 4f690bb8ce4cc5d3fabe3a8e9c2401de1554cdc1 upstream.

    Rename a local variable in teo_select() in preparation for subsequent
    code modifications, no intentional impact.

    Signed-off-by: Rafael J. Wysocki
    Cc: 5.1+ # 5.1+
    Signed-off-by: Greg Kroah-Hartman

    Rafael J. Wysocki
     
  • commit 069ce2ef1a6dd84cbd4d897b333e30f825e021f0 upstream.

    Prevent disabled CPU idle states with target residencies beyond the
    anticipated idle duration from being taken into account by the TEO
    governor.

    Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
    Signed-off-by: Rafael J. Wysocki
    Cc: 5.1+ # 5.1+
    Signed-off-by: Greg Kroah-Hartman

    Rafael J. Wysocki
     
  • commit 918c1fe9fbbe46fcf56837ff21f0ef96424e8b29 upstream.

    Fix __cpuidle_set_driver() to check if any of the CPUs in the mask has
    a driver different from drv already and, if so, return -EBUSY before
    updating any cpuidle_drivers per-CPU pointers.

    Fixes: 82467a5a885d ("cpuidle: simplify multiple driver support")
    Cc: 3.11+ # 3.11+
    Signed-off-by: Zhenzhong Duan
    [ rjw: Subject & changelog ]
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Zhenzhong Duan
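
The "check everything first, then update" pattern described in this entry can be sketched like this. It is a user-space model with hypothetical names and a plain bitmask in place of a cpumask, not the kernel function itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical, kept small for the sketch */

struct driver {
	const char *name;
	unsigned int cpumask;	/* bit n set: driver handles CPU n */
};

static struct driver *per_cpu_driver[NR_CPUS];

static int set_driver(struct driver *drv)
{
	int cpu;

	assert(drv != NULL);

	/* Pass 1: refuse before touching anything if a CPU has another driver. */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(drv->cpumask & (1u << cpu)))
			continue;
		if (per_cpu_driver[cpu] && per_cpu_driver[cpu] != drv)
			return -EBUSY;
	}

	/* Pass 2: every check passed, so update all the pointers. */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (drv->cpumask & (1u << cpu))
			per_cpu_driver[cpu] = drv;
	}
	return 0;
}
```

Doing the check in a separate pass means a failed call leaves no CPU pointing at the new driver.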
     

22 Oct, 2019

1 commit

  • Currently haltpoll isn't aware of the 'idle=' override; the priority is
    'idle=poll' > haltpoll > 'idle=halt'. When 'idle=poll' is used, cpuidle
    driver is bypassed but current_driver in sys still shows 'haltpoll'.

    When 'idle=halt' is used, haltpoll takes precedence and makes
    'idle=halt' have no effect.

    Add a check to prevent the haltpoll driver from loading if 'idle=' is
    present.

    Signed-off-by: Zhenzhong Duan
    Co-developed-by: Joao Martins
    [ rjw: Subject ]
    Signed-off-by: Rafael J. Wysocki

    Zhenzhong Duan
     

18 Sep, 2019

1 commit

  • Pull power management updates from Rafael Wysocki:
    "These include a rework of the main suspend-to-idle code flow (related
    to the handling of spurious wakeups), a switch over of several users
    of cpufreq notifiers to QoS-based limits, a new devfreq driver for
    Tegra20, a new cpuidle driver and governor for virtualized guests, an
    extension of the wakeup sources framework to expose wakeup sources as
    device objects in sysfs, and more.

    Specifics:

    - Rework the main suspend-to-idle control flow to avoid repeating
    "noirq" device resume and suspend operations in case of spurious
    wakeups from the ACPI EC and decouple the ACPI EC wakeups support
    from the LPS0 _DSM support (Rafael Wysocki).

    - Extend the wakeup sources framework to expose wakeup sources as
    device objects in sysfs (Tri Vo, Stephen Boyd).

    - Expose system suspend statistics in sysfs (Kalesh Singh).

    - Introduce a new haltpoll cpuidle driver and a new matching governor
    for virtualized guests wanting to do guest-side polling in the idle
    loop (Marcelo Tosatti, Joao Martins, Wanpeng Li, Stephen Rothwell).

    - Fix the menu and teo cpuidle governors to allow the scheduler tick
    to be stopped if PM QoS is used to limit the CPU idle state exit
    latency in some cases (Rafael Wysocki).

    - Increase the resolution of the play_idle() argument to microseconds
    for more fine-grained injection of CPU idle cycles (Daniel
    Lezcano).

    - Switch over some users of cpufreq notifiers to the new QoS-based
    frequency limits and drop the CPUFREQ_ADJUST and CPUFREQ_NOTIFY
    policy notifier events (Viresh Kumar).

    - Add new cpufreq driver based on nvmem for sun50i (Yangtao Li).

    - Add support for MT8183 and MT8516 to the mediatek cpufreq driver
    (Andrew-sh.Cheng, Fabien Parent).

    - Add i.MX8MN support to the imx-cpufreq-dt cpufreq driver (Anson
    Huang).

    - Add qcs404 to cpufreq-dt-platdev blacklist (Jorge Ramirez-Ortiz).

    - Update the qcom cpufreq driver (among other things, to make it
    easier to extend and to use kryo cpufreq for other nvmem-based
    SoCs) and add qcs404 support to it (Niklas Cassel, Douglas
    RAILLARD, Sibi Sankar, Sricharan R).

    - Fix assorted issues and make assorted minor improvements in the
    cpufreq code (Colin Ian King, Douglas RAILLARD, Florian Fainelli,
    Gustavo Silva, Hariprasad Kelam).

    - Add new devfreq driver for NVidia Tegra20 (Dmitry Osipenko, Arnd
    Bergmann).

    - Add new Exynos PPMU events to devfreq events and extend that
    mechanism (Lukasz Luba).

    - Fix and clean up the exynos-bus devfreq driver (Kamil Konieczny).

    - Improve devfreq documentation and governor code, fix spelling typos
    in devfreq (Ezequiel Garcia, Krzysztof Kozlowski, Leonard Crestez,
    MyungJoo Ham, Gaël PORTAY).

    - Add regulators enable and disable to the OPP (operating performance
    points) framework (Kamil Konieczny).

    - Update the OPP framework to support multiple opp-suspend properties
    (Anson Huang).

    - Fix assorted issues and make assorted minor improvements in the OPP
    code (Niklas Cassel, Viresh Kumar, Yue Hu).

    - Clean up the generic power domains (genpd) framework (Ulf Hansson).

    - Clean up assorted pieces of power management code and documentation
    (Akinobu Mita, Amit Kucheria, Chuhong Yuan).

    - Update the pm-graph tool to version 5.5 including multiple fixes
    and improvements (Todd Brandt).

    - Update the cpupower utility (Benjamin Weis, Geert Uytterhoeven,
    Sébastien Szymanski)"

    * tag 'pm-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (126 commits)
    cpuidle-haltpoll: Enable kvm guest polling when dedicated physical CPUs are available
    cpuidle-haltpoll: do not set an owner to allow modunload
    cpuidle-haltpoll: return -ENODEV on modinit failure
    cpuidle-haltpoll: set haltpoll as preferred governor
    cpuidle: allow governor switch on cpuidle_register_driver()
    PM: runtime: Documentation: add runtime_status ABI document
    pm-graph: make setVal unbuffered again for python2 and python3
    powercap: idle_inject: Use higher resolution for idle injection
    cpuidle: play_idle: Increase the resolution to usec
    cpuidle-haltpoll: vcpu hotplug support
    cpufreq: Add qcs404 to cpufreq-dt-platdev blacklist
    cpufreq: qcom: Add support for qcs404 on nvmem driver
    cpufreq: qcom: Refactor the driver to make it easier to extend
    cpufreq: qcom: Re-organise kryo cpufreq to use it for other nvmem based qcom socs
    dt-bindings: opp: Add qcom-opp bindings with properties needed for CPR
    dt-bindings: opp: qcom-nvmem: Support pstates provided by a power domain
    Documentation: cpufreq: Update policy notifier documentation
    cpufreq: Remove CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events
    PM / Domains: Verify PM domain type in dev_pm_genpd_set_performance_state()
    PM / Domains: Simplify genpd_lookup_dev()
    ...

    Linus Torvalds
     

11 Sep, 2019

5 commits

  • The downside of guest side polling is that polling is performed even
    with other runnable tasks in the host. However, even if polling in KVM
    can be made aware of whether other runnable tasks exist on the same
    pCPU, it can still incur extra overhead in an over-subscribed scenario.
    So just enable guest polling when dedicated pCPUs are available.

    Acked-by: Paolo Bonzini
    Signed-off-by: Wanpeng Li
    Signed-off-by: Rafael J. Wysocki

    Wanpeng Li
     
  • cpuidle-haltpoll can be built as a module to allow optional late load.
    Given we are setting @owner to THIS_MODULE, cpuidle will attempt to grab a
    module reference every time a cpuidle_device is registered -- so
    essentially all online cpus get a reference.

    This prevents the module from being unloaded later, which makes the
    module_exit callback entirely unused. Thus remove the @owner and allow
    the module to be unloaded.

    Fixes: fa86ee90eb11 ("add cpuidle-haltpoll driver")
    Signed-off-by: Joao Martins
    Signed-off-by: Rafael J. Wysocki

    Joao Martins
     
  • When a user loads cpuidle-haltpoll on a non KVM guest the module will
    successfully load, even though idle driver registration didn't take
    place.

    We should instead return -ENODEV signaling the user that the driver can't
    be loaded, like other error paths in haltpoll_init(). An example of such
    error paths is when we return -EBUSY when attempting to register an idle
    driver when it had one already (e.g. intel_idle loads at boot and then we
    attempt to insert module cpuidle-haltpoll).

    Fixes: fa86ee90eb11 ("add cpuidle-haltpoll driver")
    Signed-off-by: Joao Martins
    Signed-off-by: Rafael J. Wysocki

    Joao Martins
     
  • Right now, guest current governors have the following ratings:

    * ladder -> 10
    * teo -> 19
    * menu -> 20
    * haltpoll -> 21
    * ladder + nohz=off -> 25

    haltpoll governor got introduced and it is now the default governor given
    its highest rating -- with ladder+nohz being the exception -- regardless of
    idle driver in the guest. An example of an undesirable case is x86 KVM
    guests with MWAIT which have intel_idle registered first, and consequently
    will have haltpoll be used as governor which would get limited to a poll
    state and state 1 and the other states wouldn't get used.

    To keep the previous defaults we decrease rating of governor to 9 (below
    current lowest rating) and thus rely on @governor switch on
    cpuidle_register_driver() to tie in haltpoll idle driver and governor
    together.

    Signed-off-by: Joao Martins
    Signed-off-by: Rafael J. Wysocki

    Joao Martins
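
The effect of the rating change can be illustrated with a small sketch that picks the highest-rated governor as the default. The ratings come from the list above; the struct and function are hypothetical and do not reproduce the kernel's registration code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical governor descriptor for this sketch. */
struct governor {
	const char *name;
	int rating;
};

/* Return the governor with the highest rating, or NULL if there is none. */
static const struct governor *pick_default(const struct governor *govs, size_t n)
{
	const struct governor *best = NULL;
	size_t i;

	assert(govs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!best || govs[i].rating > best->rating)
			best = &govs[i];
	}
	return best;
}
```

With haltpoll rated 21 it wins over menu (20); rated 9 it no longer does, so it is only chosen when a driver asks for it through the @governor switch.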
     
  • The recently introduced haltpoll driver is largely only useful with
    haltpoll governor. To allow drivers to associate with a particular idle
    behaviour, add a @governor property to 'struct cpuidle_driver' and thus
    allow a cpuidle driver to switch to a *preferred* governor on idle driver
    registration. We save the previous governor, and when an idle driver is
    unregistered we switch back to that.

    The @governor can be overridden by cpuidle.governor= boot param or
    alternatively be ignored if the governor doesn't exist.

    Signed-off-by: Joao Martins
    Signed-off-by: Rafael J. Wysocki

    Joao Martins
     

03 Sep, 2019

1 commit

  • When cpus != maxcpus cpuidle-haltpoll will fail to register all vcpus
    past the online ones and thus fail to register the idle driver.
    This is because cpuidle_add_sysfs() will return -ENODEV as a
    consequence of get_cpu_device() returning no device for a non-existing
    CPU.

    Instead switch to cpuidle_register_driver() and manually register each
    of the present cpus through cpuhp_setup_state() callbacks and future
    ones that get onlined or offlined. This mimics similar logic that
    intel_idle does.

    Fixes: fa86ee90eb11 ("add cpuidle-haltpoll driver")
    Signed-off-by: Joao Martins
    Signed-off-by: Boris Ostrovsky
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Rafael J. Wysocki

    Joao Martins
     

10 Aug, 2019

6 commits

  • Notice that setting measured_us to UINT_MAX in teo_update() earlier
    doesn't change the behavior of the following code, so do that and
    eliminate a redundant check used for setting measured_us to UINT_MAX.

    This change is not expected to alter functionality.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • Current PSCI code handles idle state entry through the
    psci_cpu_suspend_enter() API, that takes an idle state index as a
    parameter and converts the index into a previously initialized
    power_state parameter before calling the PSCI.CPU_SUSPEND() with it.

    This is unwieldy, since it forces the PSCI firmware layer to keep track
    of power_state parameter for every idle state so that the
    index->power_state conversion can be made in the PSCI firmware layer
    instead of the CPUidle driver implementations.

    Move the power_state handling out of drivers/firmware/psci
    into the respective ACPI/DT PSCI CPUidle backends and convert
    the psci_cpu_suspend_enter() API to get the power_state
    parameter as input, which makes it closer to its firmware
    interface PSCI.CPU_SUSPEND() API.

    A notable side effect is that the PSCI ACPI/DT CPUidle backends
    now can directly handle (and if needed update) power_state
    parameters before handing them over to the PSCI firmware
    interface to trigger PSCI.CPU_SUSPEND() calls.

    Signed-off-by: Lorenzo Pieralisi
    Acked-by: Daniel Lezcano
    Reviewed-by: Ulf Hansson
    Reviewed-by: Sudeep Holla
    Cc: Will Deacon
    Cc: Ulf Hansson
    Cc: Sudeep Holla
    Cc: Daniel Lezcano
    Cc: Catalin Marinas
    Cc: Mark Rutland
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Will Deacon

    Lorenzo Pieralisi
     
  • Allow selection of the PSCI CPUidle in the kernel by updating
    the respective Kconfig entry.

    Remove PSCI callbacks from ARM/ARM64 generic CPU ops
    to prevent the PSCI idle driver from clashing with the generic
    ARM CPUidle driver initialization, that relies on CPU ops
    to initialize and enter idle states.

    Signed-off-by: Lorenzo Pieralisi
    Reviewed-by: Ulf Hansson
    Cc: Will Deacon
    Cc: Ulf Hansson
    Cc: Sudeep Holla
    Cc: Daniel Lezcano
    Cc: Catalin Marinas
    Cc: Mark Rutland
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Will Deacon

    Lorenzo Pieralisi
     
  • PSCI firmware is the standard power management control for
    all ARM64 based platforms and it is also deployed on some
    ARM 32 bit platforms to date.

    Idle state entry in PSCI is currently achieved by calling
    arm_cpuidle_init() and arm_cpuidle_suspend() in a generic
    idle driver, which in turn relies on ARM/ARM64 CPUidle back-end
    to relay the call into PSCI firmware if PSCI is the boot method.

    Given that PSCI is the standard idle entry method on ARM64 systems
    (which means that no other CPUidle drivers are expected on ARM64
    platforms - so PSCI is already a generic idle driver), in order to
    simplify idle entry and code maintenance, it makes sense to have a PSCI
    specific idle driver so that idle code that is currently living in
    drivers/firmware directory can be hoisted out of it and moved
    where it belongs, into a full-fledged PSCI driver, leaving PSCI code
    in drivers/firmware as a pure firmware interface, as it should be.

    Implement a PSCI CPUidle driver. By default it is a silent Kconfig entry
    which is left unselected, since its selection would clash with the
    generic ARM CPUidle driver that provides a PSCI based idle driver
    through the arm/arm64 arches back-ends CPU operations.

    Signed-off-by: Lorenzo Pieralisi
    Acked-by: Daniel Lezcano
    Reviewed-by: Ulf Hansson
    Reviewed-by: Sudeep Holla
    Cc: Ulf Hansson
    Cc: Sudeep Holla
    Cc: Daniel Lezcano
    Cc: Mark Rutland
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Will Deacon

    Lorenzo Pieralisi
     
  • CPUidle back-end operations are not implemented in some platforms
    but this should not be considered an error serious enough to be
    logged. Check the arm_cpuidle_init() return value to detect whether
    the failure must be reported or not in the kernel log and do
    not log it if the platform does not support CPUidle operations.

    Signed-off-by: Lorenzo Pieralisi
    Acked-by: Daniel Lezcano
    Reviewed-by: Ulf Hansson
    Reviewed-by: Sudeep Holla
    Cc: Ulf Hansson
    Cc: Sudeep Holla
    Cc: Daniel Lezcano
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Will Deacon

    Lorenzo Pieralisi
     
  • The generic ARM CPUidle driver includes the topology header by mistake.

    Remove the topology header include.

    Signed-off-by: Lorenzo Pieralisi
    Acked-by: Daniel Lezcano
    Reviewed-by: Ulf Hansson
    Reviewed-by: Sudeep Holla
    Cc: Ulf Hansson
    Cc: Sudeep Holla
    Cc: Daniel Lezcano
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Will Deacon

    Lorenzo Pieralisi
     

05 Aug, 2019

2 commits

  • The TEO governor prevents the scheduler tick from being stopped (unless
    stopped already) if there is a PM QoS latency constraint for the given
    CPU and the target residency of the deepest idle state matching that
    constraint is below the tick boundary.

    However, that is problematic if CPUs with PM QoS latency constraints
    are idle for long times, because it effectively causes the tick to
    run on them all the time which is wasteful. [It is also confusing
    and questionable if they are full dynticks CPUs.]

    To address that issue, modify the TEO governor to carry out the
    entire search for the most suitable idle state (from the target
    residency perspective) even if a latency constraint is present,
    to allow it to determine the expected idle duration in all cases.

    Also, when using the last several measured idle duration values
    to refine the idle state selection, make it compare those values
    with the current expected idle duration value (instead of
    comparing them with the target residency of the idle state
    selected so far) which should prevent the tick from being
    retained when it makes sense to stop it sometimes (especially
    in the presence of PM QoS latency constraints).

    Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • After commit 554c8aa8ecad ("sched: idle: Select idle state before
    stopping the tick") the menu governor prevents the scheduler tick from
    being stopped (unless stopped already) if there is a PM QoS latency
    constraint for the given CPU and the target residency of the deepest
    idle state matching that constraint is below the tick boundary.

    However, that is problematic if CPUs with PM QoS latency constraints
    are idle for long times, because it effectively causes the tick to
    run on them all the time which is wasteful. [It is also confusing
    and questionable if they are full dynticks CPUs.]

    To address that issue, make the menu governor allow the tick to be
    stopped only if the idle duration predicted by it is beyond the tick
    boundary, except when the shallowest idle state is selected upfront
    and it is not a "polling" one.

    Fixes: 554c8aa8ecad ("sched: idle: Select idle state before stopping the tick")
    Link: https://lore.kernel.org/lkml/79b247b3-e056-610e-9a07-e685dfdaa6c9@gmail.com/
    Reported-by: Thomas Lindroth
    Tested-by: Thomas Lindroth
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

30 Jul, 2019

5 commits

  • When performing guest side polling, it is not necessary to
    also perform host side polling.

    So disable host side polling, via the new MSR interface,
    when loading cpuidle-haltpoll driver.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Rafael J. Wysocki

    Marcelo Tosatti
     
  • The cpuidle_haltpoll governor, in conjunction with the haltpoll cpuidle
    driver, allows guest vcpus to poll for a specified amount of time before
    halting.
    This provides the following benefits to host side polling:

    1) The POLL flag is set while polling is performed, which allows
    a remote vCPU to avoid sending an IPI (and the associated
    cost of handling the IPI) when performing a wakeup.

    2) The VM-exit cost can be avoided.

    The downside of guest side polling is that polling is performed
    even with other runnable tasks in the host.

    Results comparing halt_poll_ns and server/client application
    where a small packet is ping-ponged:

    host --> 31.33
    halt_poll_ns=300000 / no guest busy spin --> 33.40 (93.8%)
    halt_poll_ns=0 / guest_halt_poll_ns=300000 --> 32.73 (95.7%)

    For the SAP HANA benchmarks (where idle_spin is a parameter
    of the previous version of the patch, results should be the
    same):

    hpns == halt_poll_ns

                                 idle_spin=0/   idle_spin=800/   idle_spin=0/
                                 hpns=200000    hpns=0           hpns=800000
    DeleteC06T03 (100 thread)    1.76           1.71 (-3%)       1.78 (+1%)
    InsertC16T02 (100 thread)    2.14           2.07 (-3%)       2.18 (+1.8%)
    DeleteC00T01 (1 thread)      1.34           1.28 (-4.5%)     1.29 (-3.7%)
    UpdateC00T03 (1 thread)      4.72           4.18 (-12%)      4.53 (-5%)

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Rafael J. Wysocki

    Marcelo Tosatti
     
  • Since this field is shared by all governors, move it to
    cpuidle device structure.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Rafael J. Wysocki

    Marcelo Tosatti
     
  • Add a poll_limit_ns variable to cpuidle_device structure.

    Calculate and configure it in the new cpuidle_poll_time
    function, in case it is zero.

    Individual governors are allowed to override this value.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Rafael J. Wysocki

    Marcelo Tosatti
     
  • Add a cpuidle driver that calls the architecture default_idle routine.

    To be used in conjunction with the haltpoll governor.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Rafael J. Wysocki

    Marcelo Tosatti
     

18 Jul, 2019

1 commit

  • * pm-cpufreq:
    cpufreq: Make cpufreq_generic_init() return void
    cpufreq: imx-cpufreq-dt: Add i.MX8MN support
    cpufreq: Add QoS requests for userspace constraints
    cpufreq: intel_pstate: Reuse refresh_frequency_limits()
    cpufreq: Register notifiers with the PM QoS framework
    PM / QoS: Add support for MIN/MAX frequency constraints
    PM / QOS: Pass request type to dev_pm_qos_read_value()
    PM / QOS: Rename __dev_pm_qos_read_value() and dev_pm_qos_raw_read_value()
    PM / QOS: Pass request type to dev_pm_qos_{add|remove}_notifier()

    Rafael J. Wysocki
     

04 Jul, 2019

1 commit

  • dev_pm_qos_read_value() will soon need to support more constraint types
    (min/max frequency) and will have another argument to it, i.e. type of
    the constraint. While that is fine for the existing users of
    dev_pm_qos_read_value(), it is not optimal for the callers of
    __dev_pm_qos_read_value() and dev_pm_qos_raw_read_value(), as all the
    callers of these two routines are only looking for the resume latency
    constraint.

    Let's make these two routines care only about the resume latency
    constraint and rename them to __dev_pm_qos_resume_latency() and
    dev_pm_qos_raw_resume_latency().

    Suggested-by: Rafael J. Wysocki
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation #

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 4122 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this file is released under the gplv2

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 68 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Armijn Hemel
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

31 May, 2019

4 commits

  • Based on 1 normalized pattern(s):

    this code is licenced under the gpl version 2 as described in the
    copying file that acompanies the linux kernel

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 1 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Reviewed-by: Steve Winslow
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190528171439.466585205@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms and conditions of the gnu general public license
    version 2 as published by the free software foundation this program
    is distributed in the hope it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details you should have received a copy of the gnu general
    public license along with this program if not see http www gnu org
    licenses

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 228 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Steve Winslow
    Reviewed-by: Richard Fontana
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190528171438.107155473@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Based on 3 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version this program is distributed in the
    hope that it will be useful but without any warranty without even
    the implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version [author] [kishon] [vijay] [abraham]
    [i] [kishon]@[ti] [com] this program is distributed in the hope that
    it will be useful but without any warranty without even the implied
    warranty of merchantability or fitness for a particular purpose see
    the gnu general public license for more details

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version [author] [graeme] [gregory]
    [gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i]
    [kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema]
    [hk] [hemahk]@[ti] [com] this program is distributed in the hope
    that it will be useful but without any warranty without even the
    implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 1105 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Richard Fontana
    Reviewed-by: Kate Stewart
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

1 commit


10 Apr, 2019

1 commit

  • To be able to predict the sleep duration for a CPU entering idle, it
    is essential to know the expiration time of the next timer. Both the
    teo and the menu cpuidle governors already use this information for
    CPU idle state selection.

    Moving forward, a similar prediction needs to be made for a group of
    idle CPUs rather than for a single one and the following changes
    implement a new genpd governor for that purpose.

    In order to support that feature, add a new function called
    tick_nohz_get_next_hrtimer() that returns the next hrtimer
    expiration time of a given CPU. It is to be invoked after deciding
    whether or not to stop the scheduler tick on that CPU.

    Make the cpuidle core call tick_nohz_get_next_hrtimer() right
    before invoking the ->enter() callback provided by the cpuidle
    driver for the given state and store its return value in the
    per-CPU struct cpuidle_device, so as to make it available to code
    outside of cpuidle.

    Note that at the point when cpuidle calls tick_nohz_get_next_hrtimer(),
    the governor's ->select() callback has already returned and indicated
    whether or not the tick should be stopped, so in fact the value
    returned by tick_nohz_get_next_hrtimer() always is the next hrtimer
    expiration time for the given CPU, possibly including the tick (if
    it hasn't been stopped).

    Co-developed-by: Lina Iyer
    Co-developed-by: Daniel Lezcano
    Acked-by: Daniel Lezcano
    Signed-off-by: Ulf Hansson
    [ rjw: Subject & changelog ]
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
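
The ordering described in the commit above can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation: `struct idle_dev_model`, `struct timer_model`, `model_next_hrtimer()`, `model_enter_idle()` and `model_enter_noop()` are hypothetical names invented for illustration. Only tick_nohz_get_next_hrtimer(), ->select(), ->enter() and struct cpuidle_device come from the changelog.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU struct cpuidle_device. */
struct idle_dev_model {
    int64_t next_hrtimer_ns;   /* value made available outside of cpuidle */
};

/* Hypothetical stand-in for the timer state of one CPU. */
struct timer_model {
    int64_t next_tick_ns;      /* next scheduler tick expiration */
    int64_t next_other_ns;     /* earliest non-tick hrtimer expiration */
    bool tick_stopped;         /* already decided by the governor's ->select() */
};

/*
 * Models what the changelog says tick_nohz_get_next_hrtimer() returns:
 * the next hrtimer expiration, including the tick only if the tick has
 * not been stopped.
 */
static int64_t model_next_hrtimer(const struct timer_model *t)
{
    if (!t->tick_stopped && t->next_tick_ns < t->next_other_ns)
        return t->next_tick_ns;
    return t->next_other_ns;
}

/* Hypothetical ->enter() callback that just reports the state it entered. */
static int model_enter_noop(int state)
{
    return state;
}

/*
 * Models the idle entry path: the tick decision is already final, so the
 * expiration time is recorded first and only then is the state entered.
 */
static int model_enter_idle(struct idle_dev_model *dev,
                            const struct timer_model *t,
                            int (*enter)(int), int state)
{
    dev->next_hrtimer_ns = model_next_hrtimer(t);
    return enter(state);
}
```

In this model, a consumer outside of cpuidle (such as the genpd governor mentioned in the changelog) would read `next_hrtimer_ns` for each CPU in a group. How the real governor combines those values is not described here and is left out of the sketch.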
     

02 Apr, 2019

1 commit

  • Since commit 45f1ff59e27c ("cpuidle: Return nohz hint from
    cpuidle_select()"), the Exynos CPUidle driver has stopped entering
    C1 (AFTR) mode on the Exynos4412-based Trats2 board.

    Further analysis revealed that the CPUidle framework changed the way
    it handles predicted timer ticks and reported target residency for the
    given idle states. As a result, the C1 (AFTR) state was no longer
    chosen on a completely idle device. The main issue was a too high
    target residency value. The similar C1 (AFTR) state in the 'coupled'
    CPUidle version used a 10 times lower value for the target residency,
    despite the fact that it is the same state from the hardware perspective.

    The 100000us value for standard C1 (AFTR) mode has been there since the
    beginning of the support for this idle state, added by commit 67173ca492ab
    ("ARM: EXYNOS: Add support AFTR mode on EXYNOS4210"). That commit doesn't
    give any reason for it; instead, it looks like it was blindly copied from
    the WFI/IDLE state of the same driver at that time. Back then, that value
    was probably not really used by the framework for any critical decision,
    so it didn't matter that much.

    Now it has turned out to be an issue, so unify the target residency with
    the 'coupled' version, which seems to better match the real use case
    values and restores the operation of the Exynos CPUidle driver on an
    idle device.

    Signed-off-by: Marek Szyprowski
    Reviewed-by: Krzysztof Kozlowski
    Acked-by: Daniel Lezcano
    Acked-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Rafael J. Wysocki

    Marek Szyprowski
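
Why a too high target residency keeps a state from being chosen can be shown with a deliberately simplified selection loop. This is a sketch under stated assumptions, not the cpuidle framework's logic: `struct idle_state_model` and `model_select()` are hypothetical, the 50000us predicted idle time and the 0us residency of the shallow state are made-up inputs, and real governors also weigh exit latency and the tick. Only the 100000us value and the "10 times lower" value (10000us) come from the changelog.

```c
/* Hypothetical, reduced description of one idle state. */
struct idle_state_model {
    const char *name;
    unsigned int target_residency_us;
};

/*
 * Pick the deepest state whose target residency fits in the predicted
 * idle time. State 0 is the fallback and is always allowed.
 */
static int model_select(const struct idle_state_model *states, int count,
                        unsigned int predicted_idle_us)
{
    int i, chosen = 0;

    for (i = 1; i < count; i++) {
        if (states[i].target_residency_us <= predicted_idle_us)
            chosen = i;
    }
    return chosen;
}

/* Table with the old C1 (AFTR) value; the WFI residency is assumed. */
static const struct idle_state_model model_before[] = {
    { "WFI", 0 },
    { "C1 (AFTR)", 100000 },
};

/* Same table with the value unified with the 'coupled' version. */
static const struct idle_state_model model_after[] = {
    { "WFI", 0 },
    { "C1 (AFTR)", 10000 },
};
```

With an assumed predicted idle time of 50000us, `model_select()` stays in state 0 for `model_before` and picks C1 (AFTR) for `model_after`, which mirrors the behavior change the commit describes.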