18 Jul, 2019

1 commit

  • * pm-cpufreq:
    cpufreq: Make cpufreq_generic_init() return void
    cpufreq: imx-cpufreq-dt: Add i.MX8MN support
    cpufreq: Add QoS requests for userspace constraints
    cpufreq: intel_pstate: Reuse refresh_frequency_limits()
    cpufreq: Register notifiers with the PM QoS framework
    PM / QoS: Add support for MIN/MAX frequency constraints
    PM / QOS: Pass request type to dev_pm_qos_read_value()
    PM / QOS: Rename __dev_pm_qos_read_value() and dev_pm_qos_raw_read_value()
    PM / QOS: Pass request type to dev_pm_qos_{add|remove}_notifier()

    Rafael J. Wysocki
     

04 Jul, 2019

1 commit

  • dev_pm_qos_read_value() will soon need to support more constraint types
    (min/max frequency) and will take an additional argument specifying the
    type of the constraint. While that is fine for the existing users of
    dev_pm_qos_read_value(), it is not optimal for the callers of
    __dev_pm_qos_read_value() and dev_pm_qos_raw_read_value(), as all the
    callers of these two routines are only interested in the resume latency
    constraint.

    Let's make these two routines care only about the resume latency
    constraint and rename them to __dev_pm_qos_resume_latency() and
    dev_pm_qos_raw_resume_latency().
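
    A minimal caller-side sketch of what the rename means in practice (hedged;
    the names come from the changelog above and the exact signatures are an
    assumption, not verified against a specific kernel tree):

    static inline s32 read_resume_latency(struct device *dev)
    {
            /* was: return __dev_pm_qos_read_value(dev); */
            return __dev_pm_qos_resume_latency(dev);
    }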

    Suggested-by: Rafael J. Wysocki
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation #

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 4122 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this file is released under the gplv2

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 68 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Armijn Hemel
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

31 May, 2019

4 commits

  • Based on 1 normalized pattern(s):

    this code is licenced under the gpl version 2 as described in the
    copying file that acompanies the linux kernel

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 1 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Reviewed-by: Steve Winslow
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190528171439.466585205@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms and conditions of the gnu general public license
    version 2 as published by the free software foundation this program
    is distributed in the hope it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details you should have received a copy of the gnu general
    public license along with this program if not see http www gnu org
    licenses

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 228 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Steve Winslow
    Reviewed-by: Richard Fontana
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190528171438.107155473@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Based on 3 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version this program is distributed in the
    hope that it will be useful but without any warranty without even
    the implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version [author] [kishon] [vijay] [abraham]
    [i] [kishon]@[ti] [com] this program is distributed in the hope that
    it will be useful but without any warranty without even the implied
    warranty of merchantability or fitness for a particular purpose see
    the gnu general public license for more details

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version [author] [graeme] [gregory]
    [gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i]
    [kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema]
    [hk] [hemahk]@[ti] [com] this program is distributed in the hope
    that it will be useful but without any warranty without even the
    implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 1105 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Richard Fontana
    Reviewed-by: Kate Stewart
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

1 commit


10 Apr, 2019

1 commit

  • To be able to predict the sleep duration for a CPU entering idle, it
    is essential to know the expiration time of the next timer. Both the
    teo and the menu cpuidle governors already use this information for
    CPU idle state selection.

    Moving forward, a similar prediction needs to be made for a group of
    idle CPUs rather than for a single one and the following changes
    implement a new genpd governor for that purpose.

    In order to support that feature, add a new function called
    tick_nohz_get_next_hrtimer() that will return the next hrtimer
    expiration time of a given CPU to be invoked after deciding
    whether or not to stop the scheduler tick on that CPU.

    Make the cpuidle core call tick_nohz_get_next_hrtimer() right
    before invoking the ->enter() callback provided by the cpuidle
    driver for the given state and store its return value in the
    per-CPU struct cpuidle_device, so as to make it available to code
    outside of cpuidle.

    Note that at the point when cpuidle calls tick_nohz_get_next_hrtimer(),
    the governor's ->select() callback has already returned and indicated
    whether or not the tick should be stopped, so in fact the value
    returned by tick_nohz_get_next_hrtimer() always is the next hrtimer
    expiration time for the given CPU, possibly including the tick (if
    it hasn't been stopped).
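
    A hedged sketch of the call flow described above (the next_hrtimer field
    name in struct cpuidle_device is an assumption based on this changelog):

    /* In the cpuidle core, right before calling the driver's ->enter(): */
    dev->next_hrtimer = tick_nohz_get_next_hrtimer();

    entered_state = target_state->enter(dev, drv, index);

    /* Clear the value once it can no longer be relied upon. */
    dev->next_hrtimer = 0;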

    Co-developed-by: Lina Iyer
    Co-developed-by: Daniel Lezcano
    Acked-by: Daniel Lezcano
    Signed-off-by: Ulf Hansson
    [ rjw: Subject & changelog ]
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     

02 Apr, 2019

1 commit

  • Since commit 45f1ff59e27c ("cpuidle: Return nohz hint from
    cpuidle_select()") the Exynos CPUidle driver stopped entering C1 (AFTR) mode
    on the Exynos4412-based Trats2 board.

    Further analysis revealed that the CPUidle framework changed the way
    it handles predicted timer ticks and reported target residency for the
    given idle states. As a result, the C1 (AFTR) state was no longer chosen
    on a completely idle device. The main issue was a too high target
    residency value. The analogous C1 (AFTR) state in the 'coupled' CPUidle
    version uses a 10 times lower value for the target residency, despite
    the fact that it is the same state from the hardware perspective.

    The 100000us value for the standard C1 (AFTR) mode has been there since
    the beginning of support for this idle state, added by commit 67173ca492ab
    ("ARM: EXYNOS: Add support AFTR mode on EXYNOS4210"). That commit doesn't
    give any reason for it; the value looks like it was blindly copied from
    the WFI/IDLE state of the same driver at that time. Back then, the value
    was probably not really used by the framework for any critical decision,
    so it didn't matter that much.

    Now it has turned out to be an issue, so unify the target residency with
    the 'coupled' version, as it better matches the real use case values
    and restores the operation of the Exynos CPUidle driver on an idle
    device.
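
    A hedged sketch of what the unification amounts to in the driver's state
    table (struct layout and the driver name are assumptions; the 10000us
    value follows from the "10 times lower" figure above):

    static struct cpuidle_driver exynos_idle_driver = {
            .name = "exynos_idle",
            .states[1] = {
                    .name                   = "C1",
                    .desc                   = "ARM power down",
                    .target_residency       = 10000,        /* was 100000 */
            },
    };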

    Signed-off-by: Marek Szyprowski
    Reviewed-by: Krzysztof Kozlowski
    Acked-by: Daniel Lezcano
    Acked-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Rafael J. Wysocki

    Marek Szyprowski
     

13 Mar, 2019

1 commit

  • After commit 61cb5758d3c4 ("cpuidle: Add cpuidle.governor= command
    line parameter") new cpuidle governors are not added to the list
    of available governors, so governor selection via sysfs doesn't
    work as expected (even though it is rarely used anyway).

    Fix that by making cpuidle_register_governor() add new governors to
    cpuidle_governors again.
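
    A hedged sketch of the essential part of the fix (helper names assumed,
    not verified): a newly registered governor is linked back into the global
    cpuidle_governors list, so it becomes visible for selection again.

    int cpuidle_register_governor(struct cpuidle_governor *gov)
    {
            int ret = -EEXIST;

            mutex_lock(&cpuidle_lock);
            if (!cpuidle_find_governor(gov->name)) {
                    list_add_tail(&gov->governor_list, &cpuidle_governors);
                    ret = 0;
            }
            mutex_unlock(&cpuidle_lock);

            return ret;
    }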

    Fixes: 61cb5758d3c4 ("cpuidle: Add cpuidle.governor= command line parameter")
    Reported-by: Kees Cook
    Cc: 5.0+ # 5.0+
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

07 Mar, 2019

1 commit

  • The variance computation in get_typical_interval() may overflow if
    the square of the value of diff exceeds the maximum value of the
    int64_t data type, which basically is the case when diff is of the
    order of UINT_MAX.

    However, data points so far in the future don't matter for idle
    state selection anyway, so change the initial threshold value in
    get_typical_interval() to INT_MAX which will cause more "outlying"
    data points to be discarded without affecting the selection result.
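
    A back-of-the-envelope check of the bound (assuming the variance is
    accumulated in a 64-bit signed integer):

    /*
     * diff ~ UINT_MAX (~4.3e9):  diff * diff ~ 1.8e19  >  INT64_MAX (~9.2e18)  -> overflow
     * diff ~ INT_MAX  (~2.1e9):  diff * diff ~ 4.6e18  <  INT64_MAX            -> safe
     */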

    Reported-by: Randy Dunlap
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

01 Feb, 2019

2 commits


31 Jan, 2019

1 commit

  • The default time is declared in units of microseconds,
    but is used as nanoseconds, resulting in significant
    accounting errors for idle state 0 time when all idle
    states deeper than 0 are disabled.

    Under these unusual conditions, we don't really care
    about the poll time limit anyhow.
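
    A hedged illustration of the unit mismatch (variable names hypothetical;
    this shows the mismatch rather than the actual patch):

    u64 limit = TICK_USEC;                  /* declared in microseconds ... */
    u64 limit_ns = limit * NSEC_PER_USEC;   /* ... but consumed on the nanosecond scale */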

    Fixes: 800fb34a99ce ("cpuidle: poll_state: Disregard disable idle states")
    Signed-off-by: Doug Smythies
    Signed-off-by: Rafael J. Wysocki

    Doug Smythies
     

17 Jan, 2019

1 commit

  • The venerable menu governor does some things that are quite
    questionable in my view.

    First, it includes timer wakeups in the pattern detection data and
    mixes them up with wakeups from other sources which in some cases
    causes it to expect what essentially would be a timer wakeup in a
    time frame in which no timer wakeups are possible (because it knows
    the time until the next timer event and that is later than the
    expected wakeup time).

    Second, it uses the extra exit latency limit based on the predicted
    idle duration and depending on the number of tasks waiting on I/O,
    even though those tasks may run on a different CPU when they are
    woken up. Moreover, the time ranges used by it for the sleep length
    correction factors depend on whether or not there are tasks waiting
    on I/O, which again doesn't imply anything in particular, and they
    are not correlated to the list of available idle states in any way
    whatever.

    Also, the pattern detection code in menu may end up considering
    values that are too large to matter at all, in which cases running
    it is a waste of time.

    A major rework of the menu governor would be required to address
    these issues and the performance of at least some workloads (tuned
    specifically to the current behavior of the menu governor) is likely
    to suffer from that. It is thus better to introduce an entirely new
    governor without them and let everybody use the governor that works
    better with their actual workloads.

    The new governor introduced here, the timer events oriented (TEO)
    governor, uses the same basic strategy as menu: it always tries to
    find the deepest idle state that can be used in the given conditions.
    However, it applies a different approach to that problem.

    First, it doesn't use "correction factors" for the time till the
    closest timer, but instead it tries to correlate the measured idle
    duration values with the available idle states and use that
    information to pick up the idle state that is most likely to "match"
    the upcoming CPU idle interval.

    Second, it doesn't take the number of "I/O waiters" into account at
    all and the pattern detection code in it avoids taking timer wakeups
    into account. It also only uses idle duration values less than the
    current time till the closest timer (with the tick excluded) for that
    purpose.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Daniel Lezcano

    Rafael J. Wysocki
     

28 Dec, 2018

1 commit

  • Pull powerpc updates from Michael Ellerman:
    "Notable changes:

    - Mitigations for Spectre v2 on some Freescale (NXP) CPUs.

    - A large series adding support for pass-through of Nvidia V100 GPUs
    to guests on Power9.

    - Another large series to enable hardware assistance for TLB table
    walk on MPC8xx CPUs.

    - Some preparatory changes to our DMA code, to make way for further
    cleanups from Christoph.

    - Several fixes for our Transactional Memory handling discovered by
    fuzzing the signal return path.

    - Support for generating our system call table(s) from a text file
    like other architectures.

    - A fix to our page fault handler so that instead of generating a
    WARN_ON_ONCE, user accesses of kernel addresses instead print a
    ratelimited and appropriately scary warning.

    - A cosmetic change to make our unhandled page fault messages more
    similar to other arches and also more compact and informative.

    - Freescale updates from Scott:
    "Highlights include elimination of legacy clock bindings use from
    dts files, an 83xx watchdog handler, fixes to old dts interrupt
    errors, and some minor cleanup."

    And many clean-ups, reworks and minor fixes etc.

    Thanks to: Alexandre Belloni, Alexey Kardashevskiy, Andrew Donnellan,
    Aneesh Kumar K.V, Arnd Bergmann, Benjamin Herrenschmidt, Breno Leitao,
    Christian Lamparter, Christophe Leroy, Christoph Hellwig, Daniel
    Axtens, Darren Stevens, David Gibson, Diana Craciun, Dmitry V. Levin,
    Firoz Khan, Geert Uytterhoeven, Greg Kurz, Gustavo Romero, Hari
    Bathini, Joel Stanley, Kees Cook, Madhavan Srinivasan, Mahesh
    Salgaonkar, Markus Elfring, Mathieu Malaterre, Michal Suchánek, Naveen
    N. Rao, Nick Desaulniers, Oliver O'Halloran, Paul Mackerras, Ram Pai,
    Ravi Bangoria, Rob Herring, Russell Currey, Sabyasachi Gupta, Sam
    Bobroff, Satheesh Rajendran, Scott Wood, Segher Boessenkool, Stephen
    Rothwell, Tang Yuantian, Thiago Jung Bauermann, Yangtao Li, Yuantian
    Tang, Yue Haibing"

    * tag 'powerpc-4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (201 commits)
    Revert "powerpc/fsl_pci: simplify fsl_pci_dma_set_mask"
    powerpc/zImage: Also check for stdout-path
    powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y
    macintosh: Use of_node_name_{eq, prefix} for node name comparisons
    ide: Use of_node_name_eq for node name comparisons
    powerpc: Use of_node_name_eq for node name comparisons
    powerpc/pseries/pmem: Convert to %pOFn instead of device_node.name
    powerpc/mm: Remove very old comment in hash-4k.h
    powerpc/pseries: Fix node leak in update_lmb_associativity_index()
    powerpc/configs/85xx: Enable CONFIG_DEBUG_KERNEL
    powerpc/dts/fsl: Fix dtc-flagged interrupt errors
    clk: qoriq: add more compatibles strings
    powerpc/fsl: Use new clockgen binding
    powerpc/83xx: handle machine check caused by watchdog timer
    powerpc/fsl-rio: fix spelling mistake "reserverd" -> "reserved"
    powerpc/fsl_pci: simplify fsl_pci_dma_set_mask
    arch/powerpc/fsl_rmu: Use dma_zalloc_coherent
    vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver
    vfio_pci: Allow regions to add own capabilities
    vfio_pci: Allow mapping extra regions
    ...

    Linus Torvalds
     

13 Dec, 2018

1 commit

  • Add two new metrics for CPU idle states, "above" and "below", to count
    the number of times the given state had been asked for (or entered
    from the kernel's perspective), but the observed idle duration turned
    out to be too short or too long for it (respectively).

    These metrics help to estimate the quality of the CPU idle governor
    in use.
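
    A hedged sketch of how such counters can be maintained (simplified; the
    handling of disabled states in the real accounting is omitted):

    /* After measuring the idle residency diff (in us) for entered_state: */
    if (diff < drv->states[entered_state].target_residency)
            dev->states_usage[entered_state].above++;       /* state was too deep */
    else if (entered_state + 1 < drv->state_count &&
             diff >= drv->states[entered_state + 1].target_residency)
            dev->states_usage[entered_state].below++;       /* state was too shallow */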

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

11 Dec, 2018

3 commits


04 Dec, 2018

1 commit

  • When booting a pseries kernel with PREEMPT enabled, it dumps the
    following warning:

    BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
    caller is pseries_processor_idle_init+0x5c/0x22c
    CPU: 13 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc3-00090-g12201a0128bc-dirty #828
    Call Trace:
    [c000000429437ab0] [c0000000009c8878] dump_stack+0xec/0x164 (unreliable)
    [c000000429437b00] [c0000000005f2f24] check_preemption_disabled+0x154/0x160
    [c000000429437b90] [c000000000cab8e8] pseries_processor_idle_init+0x5c/0x22c
    [c000000429437c10] [c000000000010ed4] do_one_initcall+0x64/0x300
    [c000000429437ce0] [c000000000c54500] kernel_init_freeable+0x3f0/0x500
    [c000000429437db0] [c0000000000112dc] kernel_init+0x2c/0x160
    [c000000429437e20] [c00000000000c1d0] ret_from_kernel_thread+0x5c/0x6c

    This happens because the code calls get_lppaca() which calls
    get_paca() and it checks if preemption is disabled through
    check_preemption_disabled().

    Preemption should be disabled because the per-CPU variable may make no
    sense if there is a preemption (and a CPU switch) between reading the
    per-CPU data and using it.

    In this device driver specifically, it is not a problem, because this
    code just needs to have access to one lppaca struct, and it does not
    matter if it is the current per CPU lppaca struct or not (i.e. when
    there is a preemption and a CPU migration).

    That said, the most appropriate fix seems to be related to avoiding
    the debug_smp_processor_id() call at get_paca(), instead of calling
    preempt_disable() before get_paca().

    Signed-off-by: Breno Leitao
    Signed-off-by: Michael Ellerman

    Breno Leitao
     

09 Nov, 2018

2 commits

  • The only remaining reason why the ARM cpuidle driver calls
    cpuidle_register_driver() is to avoid printing an error message in case
    another driver has already been registered for the CPU. This seems a bit
    silly, but more importantly, if that is a common scenario, perhaps we
    should change cpuidle_register() accordingly instead.

    In either case, let's consolidate the code, by converting to use
    cpuidle_register|unregister(), which also avoids the unnecessary allocation
    of the struct cpuidle_device.
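
    A hedged sketch of the consolidated registration (driver pointer name
    hypothetical): cpuidle_register() also allocates and registers the per-CPU
    struct cpuidle_device, so the explicit device allocation goes away.

    static int arm_idle_register(struct cpuidle_driver *drv)
    {
            return cpuidle_register(drv, NULL);
    }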

    Signed-off-by: Ulf Hansson
    Reviewed-by: Lorenzo Pieralisi
    Reviewed-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     
  • There's no point in registering the cpuidle driver for the current CPU when
    the initialization of the arch-specific back-end data fails by returning
    -ENXIO.

    Instead, let's re-order the sequence to its original flow, by first trying
    to initialize the back-end part and then acting accordingly on the returned
    error code. Additionally, let's print the error message regardless of which
    error code was returned.

    Fixes: a0d46a3dfdc3 (ARM: cpuidle: Register per cpuidle device)
    Signed-off-by: Ulf Hansson
    Reviewed-by: Daniel Lezcano
    Cc: 4.19+ # v4.19+
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     

31 Oct, 2018

1 commit

  • Pull more power management updates from Rafael Wysocki:
    "These remove a questionable heuristic from the menu cpuidle governor,
    fix a recent build regression in the intel_pstate driver, clean up ARM
    big.LITTLE support in cpufreq and fix up the hung task watchdog's
    interaction with system-wide power management transitions.

    Specifics:

    - Fix build regression in the intel_pstate driver that doesn't build
    without CONFIG_ACPI after recent changes (Dominik Brodowski).

    - One of the heuristics in the menu cpuidle governor is based on a
    function returning 0 most of the time, so drop it and clean up the
    scheduler code related to it (Daniel Lezcano).

    - Prevent the arm_big_little cpufreq driver from being used on ARM64
    which is not suitable for it and drop the arm_big_little_dt driver
    that is not used any more (Sudeep Holla).

    - Prevent the hung task watchdog from triggering during resume from
    system-wide sleep states by disabling it before freezing tasks and
    enabling it again after they have been thawed (Vitaly Kuznetsov)"

    * tag 'pm-4.20-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    kernel: hung_task.c: disable on suspend
    cpufreq: remove unused arm_big_little_dt driver
    cpufreq: drop ARM_BIG_LITTLE_CPUFREQ support for ARM64
    cpufreq: intel_pstate: Fix compilation for !CONFIG_ACPI
    cpuidle: menu: Remove get_loadavg() from the performance multiplier
    sched: Factor out nr_iowait and nr_iowait_cpu

    Linus Torvalds
     

27 Oct, 2018

1 commit

  • There are several definitions of those functions/macros in places that
    mess with fixed-point load averages. Provide an official version.
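
    For reference, the fixed-point helpers in question are typically defined
    as follows (quoted from memory, so treat this as a sketch):

    #define LOAD_INT(x)  ((x) >> FSHIFT)
    #define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1 - 1)) * 100)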

    [akpm@linux-foundation.org: fix missed conversion in block/blk-iolatency.c]
    Link: http://lkml.kernel.org/r/20180828172258.3185-5-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Acked-by: Peter Zijlstra (Intel)
    Tested-by: Suren Baghdasaryan
    Tested-by: Daniel Drake
    Cc: Christopher Lameter
    Cc: Ingo Molnar
    Cc: Johannes Weiner
    Cc: Mike Galbraith
    Cc: Peter Enderborg
    Cc: Randy Dunlap
    Cc: Shakeel Butt
    Cc: Tejun Heo
    Cc: Vinayak Menon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

25 Oct, 2018

1 commit

  • The function get_loadavg() almost always returns zero. To be more
    precise, statistically speaking, out of a total of 1023379 passes
    through the function, the load is equal to zero 1020728 times and
    greater than 100 only 610 times; the remaining values are between 0 and 5.

    In 2011, the get_loadavg() was removed from the Android tree because
    of the above [1]. At this time, the load was:

    unsigned long this_cpu_load(void)
    {
            struct rq *this = this_rq();
            return this->cpu_load[0];
    }

    In 2014, the code was changed by commit 372ba8cb46b2 (cpuidle: menu: Lookup CPU
    runqueues less) and the load is:

    void get_iowait_load(unsigned long *nr_waiters, unsigned long *load)
    {
            struct rq *rq = this_rq();
            *nr_waiters = atomic_read(&rq->nr_iowait);
            *load = rq->load.weight;
    }

    with the same result.

    Both measurements show that using the load in this code path no longer
    matters. Remove it.

    [1] https://android.googlesource.com/kernel/common/+/4dedd9f124703207895777ac6e91dacde0f7cc17

    Signed-off-by: Daniel Lezcano
    Acked-by: Mel Gorman
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     

18 Oct, 2018

2 commits

  • If the minimum interval taken into account in the average computation
    loop in get_typical_interval() is not below the expected idle
    duration determined so far, the resultant average cannot be below
    that value either, and the entire return value of the function
    is going to be discarded anyway going forward.

    In that case, it is a waste of time to carry out the remaining
    computations in get_typical_interval(), so avoid that by returning
    early if the minimum interval is not below the expected idle duration.

    No intentional changes of behavior.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • Since the correction factor cannot be greater than RESOLUTION * DECAY,
    the result of the predicted_us computation in menu_select() cannot be
    greater than data->next_timer_us, so it is not necessary to compare
    the "typical interval" value coming from get_typical_interval() with
    data->next_timer_us separately.
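
    In other words (a hedged sketch of the computation; the exact rounding
    helper used in menu.c may differ):

    /* correction_factor[bucket] <= RESOLUTION * DECAY, hence: */
    predicted_us = div_round64((u64)data->next_timer_us *
                               data->correction_factor[data->bucket],
                               RESOLUTION * DECAY);
    /* => predicted_us <= data->next_timer_us */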

    It is sufficient to compare predicted_us with the return value of
    get_typical_interval() directly, so do that and drop the now
    redundant expected_interval variable.

    No intentional changes of behavior.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

12 Oct, 2018

1 commit

  • After some recent menu governor changes, the promotion of the
    "polling" state to a physical one is mostly controlled by the
    latency limit (resulting from the "interactivity" factor) and
    not by the time to the closest timer event, so it should be
    sufficient to check the exit latency of that state for this
    purpose (of course, its target residency still needs to be
    within the next timer event range for energy-efficiency).

    Also, the physical state the "polling" one is promoted to need not
    be the next one in principle (in case the next state is disabled,
    for example).

    For these reasons, simplify the checks made to decide whether or
    not to promote the "polling" state to a physical one and update
    the target idle duration when it is promoted in case the residency
    of the new state turns out to be above the tick boundary (in which
    case there is no reason to stop the tick).

    Tested-by: Doug Smythies
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

05 Oct, 2018

6 commits

  • If need_resched() returns "false", breaking out of the loop in
    poll_idle() will cause a new idle state to be selected, so in fact
    it usually doesn't make sense to spin in it longer than the target
    residency of the second state. [Note that the "polling" state is
    used only if there is at least one "real" state defined in addition
    to it, so the second state is always there.] On the other hand,
    breaking out of it early (say in case the next state is disabled)
    shouldn't hurt as it is polling anyway.

    For this reason, make the loop in poll_idle() break if the CPU has
    been spinning longer than the target residency of the second state
    (the "polling" state can only be state[0]).

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • It is better to always update data->bucket before returning from
    menu_select() to avoid updating the correction factor for a stale
    bucket, so combine the latency_req == 0 special check with the more
    general check below.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)

    Rafael J. Wysocki
     
  • If the next timer event (not including the tick) is closer than the
    target residency of the second state or the PM QoS latency constraint
    is below its exit latency, state[0] will be used regardless of any
    other factors, so skip the computations in menu_select() then and
    return 0 straight away from it.

    Still, do that after the bucket has been determined to avoid
    updating the correction factor for a stale bucket.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)

    Rafael J. Wysocki
     
  • It is not necessary to update data->last_state_idx in menu_select()
    as it only is used in menu_update() which only runs when
    data->needs_update is set and that is set only when updating
    data->last_state_idx in menu_reflect().

    Accordingly, drop the update of data->last_state_idx from
    menu_select() and get rid of the (now redundant) "out" label
    from it.

    No intentional behavior changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)
    Reviewed-by: Daniel Lezcano

    Rafael J. Wysocki
     
  • Rearrange the code in menu_select() so that the loop over idle states
    always starts from 0 and get rid of the first_idx variable.

    While at it, add two empty lines to separate conditional statements
    from one another.

    No intentional behavior changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)
    Reviewed-by: Daniel Lezcano

    Rafael J. Wysocki
     
  • Since menu_select() can only set first_idx to 1 if the exit latency
    of the second state is not greater than the latency limit, it should
    first determine that limit. Thus first_idx should be computed after
    the "interactivity" factor has been taken into account.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)
    Reviewed-by: Daniel Lezcano

    Rafael J. Wysocki
     

04 Oct, 2018

1 commit

  • If the CPU exits the "polling" state due to the time limit in the
    loop in poll_idle(), this is not a real wakeup and it just means
    that the "polling" state selection was not adequate. The governor
    mispredicted short idle duration, but had a more suitable state been
    selected, the CPU might have spent more time in it. In fact, there
    is no reason to expect that there would have been a wakeup event
    earlier than the next timer in that case.

    Handling such cases as regular wakeups in menu_update() may cause the
    menu governor to make suboptimal decisions going forward, but ignoring
    them altogether would not be correct either, because every time
    menu_select() is invoked, it makes a separate new attempt to predict
    the idle duration taking distinct time to the closest timer event as
    input and the outcomes of all those attempts should be recorded.

    For this reason, make menu_update() always assume that if the
    "polling" state was exited due to the time limit, the next proper
    wakeup event for the CPU would be the next timer event (not
    including the tick).
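
    A hedged sketch of the resulting accounting in menu_update() (the
    poll_time_limit flag name is an assumption based on the poll_state code
    of that time):

    if (dev->poll_time_limit) {
            /*
             * The "polling" state was left because of the time limit, so
             * assume the proper wakeup would have been the next timer event.
             */
            measured_us = data->next_timer_us;
    } else {
            measured_us = cpuidle_get_last_residency(dev);
    }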

    Fixes: a37b969a61c1 "cpuidle: poll_state: Add time limit to poll_idle()"
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)
    Reviewed-by: Daniel Lezcano

    Rafael J. Wysocki
     

03 Oct, 2018

1 commit

  • The predicted_us field in struct menu_device is only accessed in
    menu_select(), so replace it with a local variable in that function.

    With that, stop using expected_interval instead of predicted_us to
    store the new predicted idle duration value when it is set to the
    selected state's target residency, which was quite confusing.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Daniel Lezcano

    Rafael J. Wysocki
     

18 Sep, 2018

1 commit

  • Currently, ktime_us_delta() is invoked unconditionally to compute the
    idle residency of the CPU, but it only makes sense to do that if a
    valid idle state has been entered, so move the ktime_us_delta()
    invocation after the entered_state >= 0 check.

    While at it, merge the two comment blocks in there into one and drop
    a stray space in the type cast of diff.

    This patch has no functional changes.
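
    A hedged sketch of the resulting order in cpuidle_enter_state()
    (simplified; surrounding bookkeeping omitted):

    entered_state = target_state->enter(dev, drv, index);
    time_end = ns_to_ktime(local_clock());

    if (entered_state >= 0) {
            /* Compute the idle residency only if a valid state was entered. */
            s64 diff = ktime_us_delta(time_end, time_start);

            dev->last_residency = (int)diff;
    } else {
            dev->last_residency = 0;
    }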

    Signed-off-by: Fieah Lim
    [ rjw: Changelog cleanup, comment format fix ]
    Signed-off-by: Rafael J. Wysocki

    Fieah Lim