18 May, 2016

1 commit

  • Commit 0b89e9aa2856 (cpuidle: delay enabling interrupts until all
    coupled CPUs leave idle) rightfully fixed a regression by letting
    the coupled idle state framework handle local interrupt enabling
    when the CPU is exiting an idle state.

    The current code checks if the idle state is coupled and, if so,
    lets the coupled code enable interrupts. This way, it can
    decrement the ready-count before handling the interrupt. This
    mechanism prevents the other CPUs from waiting for a CPU which is
    handling interrupts.

    But the check is done against the state index returned by the back
    end driver's ->enter function, which could be different from the
    initial index passed as a parameter to the cpuidle_enter_state()
    function.

    entered_state = target_state->enter(dev, drv, index);

    [ ... ]

    if (!cpuidle_state_is_coupled(drv, entered_state))
            local_irq_enable();

    [ ... ]

    If 'index' refers to a coupled idle state but 'entered_state' is
    *not* coupled, then interrupts are enabled again. All CPUs blocked
    on the sync barrier may busy loop longer if the CPU has interrupts
    to handle before decrementing the ready-count. That consumes more
    energy than it saves.

    Fixes: 0b89e9aa2856 (cpuidle: delay enabling interrupts until all coupled CPUs leave idle)
    Signed-off-by: Daniel Lezcano
    Cc: 3.15+ # 3.15+
    [ rjw: Subject & changelog ]
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
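
The mismatch described in this commit can be modelled in user space. The following is only a sketch: `struct fake_state`, `fake_enter()` and `core_enables_irqs()` are hypothetical stand-ins, not the kernel's `cpuidle_enter_state()`, and it assumes the fix is to test the requested `index` instead of `entered_state`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified idle state: only the "coupled" flag matters. */
struct fake_state {
    bool coupled;
};

/*
 * Stand-in for a driver's ->enter() callback. A non-negative demote_to
 * models a driver that enters (and reports) a different state than the
 * one requested.
 */
static int fake_enter(int index, int demote_to)
{
    return demote_to >= 0 ? demote_to : index;
}

/*
 * Returns true when the core code would enable interrupts itself.
 * check_requested selects which index is tested for coupling: the
 * requested one (assumed fix) or the one returned by ->enter().
 */
static bool core_enables_irqs(const struct fake_state *states, int index,
                              int demote_to, bool check_requested)
{
    int entered_state;

    assert(states && index >= 0);

    entered_state = fake_enter(index, demote_to);

    return !states[check_requested ? index : entered_state].coupled;
}
```

With state 1 coupled and the driver reporting state 0, testing `entered_state` makes the core enable interrupts even though a coupled state was requested; testing `index` leaves that to the coupled code.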
     

07 May, 2016

1 commit


28 Apr, 2016

1 commit

  • arm_cpuidle_suspend() may return -EOPNOTSUPP, or any value returned
    by the cpu_ops/cpuidle_ops suspend call. arm_enter_idle_state() doesn't
    update 'ret' with this value, meaning we always signal success to
    cpuidle_enter_state(), causing it to update the usage counters as if we
    succeeded.

    Fixes: 191de17aa3c1 ("ARM64: cpuidle: Replace cpu_suspend by the common ARM/ARM64 function")
    Signed-off-by: James Morse
    Acked-by: Lorenzo Pieralisi
    Acked-by: Daniel Lezcano
    Cc: 4.1+ # 4.1+
    Signed-off-by: Rafael J. Wysocki

    James Morse
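
A minimal user-space sketch of the pattern this commit describes: the result of the suspend call is kept in `ret` so that a failure reaches the caller. `fake_suspend()` and `enter_idle_state()` are hypothetical names, and reporting failure as -1 is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for arm_cpuidle_suspend(): returns 0 on success
 * or a negative errno such as -EOPNOTSUPP. The result is injected by the
 * caller so both paths can be exercised.
 */
static int fake_suspend(int idx, int result)
{
    (void)idx;
    return result;
}

/*
 * Sketch of an ->enter() style callback. Returns the entered state index
 * on success and -1 on failure, so the caller can skip its usage
 * accounting when the state was not actually entered.
 */
static int enter_idle_state(int idx, int suspend_result)
{
    int ret;

    assert(idx >= 0);

    if (idx == 0)
        return idx;     /* shallowest state: no suspend call involved */

    /* Keep the return value instead of discarding it. */
    ret = fake_suspend(idx, suspend_result);

    return ret ? -1 : idx;
}
```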
     

26 Apr, 2016

1 commit

  • ktime_get() can have a non-negligible overhead, so use local_clock()
    instead.

    In order to test the difference between ktime_get() and local_clock(),
    a quick hack was added to trigger, via debugfs, 10000 calls to
    ktime_get() and to local_clock() and measure the elapsed time.

    Then the average, minimum and maximum values are computed for each call.

    From userspace, the test above was run 100 times, at 2 second intervals.

    So, ktime_get() and local_clock() have each been called 1000000 times
    in total.

    The results are:

    ktime_get():
    ============
    * average: 101 ns (stddev: 27.4)
    * maximum: 38313 ns
    * minimum: 65 ns

    local_clock():
    ==============
    * average: 60 ns (stddev: 9.8)
    * maximum: 13487 ns
    * minimum: 46 ns

    local_clock() is faster and more stable.

    Even if it is a drop in the ocean, replacing ktime_get() with
    local_clock() saves about 80 ns per idle period (entry + exit). And
    in some circumstances, especially when there are several CPUs racing
    for the clock access, we save tens of microseconds.

    The idle duration resulting from a diff is converted from nanoseconds
    to microseconds. This could be done with an integer division (div 1000),
    which is an expensive operation, or with a 10-bit shift (div 1024),
    which is fast but imprecise.

    The following table gives some results at the limits.

    -------------------------------------
    | nsec | div(1000)    | div(1024)   |
    -------------------------------------
    | 1e3  | 1 usec       | 976 nsec    |
    -------------------------------------
    | 1e6  | 1000 usec    | 976 usec    |
    -------------------------------------
    | 1e9  | 1000000 usec | 976562 usec |
    -------------------------------------

    There is a linear deviation of 2.34%. This loss of precision is acceptable
    in the context of the resulting diff, which is used for statistics. These
    statistics are processed to estimate the duration of the next idle
    period, which in turn drives the idle state selection. The selection
    criteria take into account the next duration based on large intervals,
    represented by the idle state's target residency.

    The 2^10 division is enough because the error relative to the 1e3
    division is lost among all the approximations done for the next idle
    duration computation.

    Signed-off-by: Daniel Lezcano
    Acked-by: Peter Zijlstra (Intel)
    [ rjw: Subject ]
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
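
The trade-off between the two conversions can be checked with a few lines of user-space C. This is an illustrative sketch, not the kernel code; the helper names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Exact conversion: integer division by 1000. */
static uint64_t ns_to_us_div(uint64_t ns)
{
    return ns / 1000;
}

/* Approximate conversion: shift right by 10 bits, i.e. divide by 1024. */
static uint64_t ns_to_us_shift(uint64_t ns)
{
    return ns >> 10;
}

/*
 * Relative error of the shift compared with the exact division,
 * in hundredths of a percent (234 means 2.34%).
 */
static unsigned int shift_error_centipercent(uint64_t ns)
{
    uint64_t exact = ns_to_us_div(ns);

    assert(exact > 0);

    return (unsigned int)((exact - ns_to_us_shift(ns)) * 10000 / exact);
}
```

For 1e9 ns the shift yields 976562 where the division yields 1000000, which matches the 2.34% deviation quoted in the commit message (1 - 1000/1024).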
     

09 Apr, 2016

1 commit

  • Currently the 'registered' member of the cpuidle_device struct is set
    to 1 during cpuidle_register_device. In this same function there are
    checks to see if the device is already registered to prevent duplicate
    calls to register the device, but this value is never set to 0 even on
    unregister of the device. Because of this, any attempt to call
    cpuidle_register_device after a call to cpuidle_unregister_device will
    fail, which shouldn't be the case.

    To prevent this, set registered to 0 when the device is unregistered.

    Fixes: c878a52d3c7c (cpuidle: Check if device is already registered)
    Signed-off-by: Dave Gerlach
    Acked-by: Daniel Lezcano
    Cc: All applicable
    Signed-off-by: Rafael J. Wysocki

    Dave Gerlach
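
The register/unregister symmetry this commit restores can be sketched as follows. `struct fake_device`, `fake_register()` and `fake_unregister()` are hypothetical user-space stand-ins, and returning -EBUSY for a duplicate registration is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the 'registered' member of cpuidle_device. */
struct fake_device {
    bool registered;
};

/* Refuses a second registration of the same device. */
static int fake_register(struct fake_device *dev)
{
    assert(dev);

    if (dev->registered)
        return -EBUSY;

    dev->registered = true;
    return 0;
}

/* Clears the flag so that the device can be registered again later. */
static void fake_unregister(struct fake_device *dev)
{
    assert(dev);

    if (!dev->registered)
        return;

    dev->registered = false;
}
```

Without the reset in `fake_unregister()`, a register, unregister, register sequence would fail on the last step, which is the behaviour the commit message describes.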
     

21 Mar, 2016

1 commit

  • Commit a9ceb78bc75c (cpuidle,menu: use interactivity_req to disable
    polling) changed the behavior of the fallback state selection part
    of menu_select() so it looks at interactivity_req instead of
    data->next_timer_us when it makes its decision. That effectively
    caused polling to be used more often as the fallback idle state, which
    led to significant increases in energy consumption in some cases.

    Commit e132b9b3bc7f (cpuidle: menu: use high confidence factors
    only when considering polling) changed that logic again to be more
    predictable, but that didn't help with the increased energy
    consumption problem.

    For this reason, go back to making decisions on which state to fall
    back to based on data->next_timer_us which is the time we know for
    sure something will happen rather than a prediction (which may be
    inaccurate and turns out to be so often enough to be problematic).
    However, take the target residency of the first proper idle state
    (C1) into account, so that state is not used as the fallback one
    if its target residency is greater than data->next_timer_us.

    Fixes: a9ceb78bc75c (cpuidle,menu: use interactivity_req to disable polling)
    Signed-off-by: Rafael J. Wysocki
    Reported-and-tested-by: Doug Smythies

    Rafael J. Wysocki
     

17 Mar, 2016

1 commit

  • The menu governor uses five different factors to pick the
    idle state:
    - the user configured latency_req
    - the time until the next timer (next_timer_us)
    - the typical sleep interval, as measured recently
    - an estimate of sleep time by dividing next_timer_us by an observed factor
    - a load corrected version of the above, divided again by load

    Only the first three items are known with enough confidence that
    we can use them to consider polling, instead of an actual CPU
    idle state, because the cost of being wrong about polling can be
    excessive power use.

    The latter two are used in the menu governor's main selection
    loop, and can result in choosing a shallower idle state when
    the system is expected to be busy again soon.

    This pushes a busy system in the "performance" direction of
    the performance<>power tradeoff, when choosing between idle
    states, but stays more strictly on the "power" side when
    deciding between polling and C1.

    Signed-off-by: Rik van Riel
    Signed-off-by: Rafael J. Wysocki

    Rik van Riel
     

17 Feb, 2016

2 commits


30 Jan, 2016

1 commit

  • * pm-cpuidle:
    cpuidle: coupled: remove unused define cpuidle_coupled_lock
    cpuidle: fix fallback mechanism for suspend to idle in absence of enter_freeze

    * pm-cpufreq:
    cpufreq: cpufreq-dt: avoid uninitialized variable warnings:
    cpufreq: pxa2xx: fix pxa_cpufreq_change_voltage prototype
    cpufreq: Use list_is_last() to check last entry of the policy list
    cpufreq: Fix NULL reference crash while accessing policy->governor_data

    * pm-domains:
    PM / Domains: Fix typo in comment
    PM / Domains: Fix potential deadlock while adding/removing subdomains
    PM / domains: fix lockdep issue for all subdomains

    * pm-sleep:
    PM: APM_EMULATION does not depend on PM

    Rafael J. Wysocki
     

28 Jan, 2016

1 commit

  • This was found with the -RT patch enabled, but the fix should apply to
    non-RT also.

    Used multi_v7_defconfig+PREEMPT_RT_FULL=y and this caused a compilation
    warning without this fix:
    ../drivers/cpuidle/coupled.c:122:21: warning: 'cpuidle_coupled_lock'
    defined but not used [-Wunused-variable]

    Signed-off-by: Anders Roxell
    Signed-off-by: Rafael J. Wysocki

    Anders Roxell
     

22 Jan, 2016

1 commit

  • Commit 51164251f5c3 "sched / idle: Drop default_idle_call() fallback
    from call_cpuidle()" made find_deepest_state() return a non-negative
    value and check all the states with index > 0. As a result,
    find_deepest_state() returns 0 even when enter_freeze callbacks are not
    implemented, so enter_freeze_proper() is called, which ends up crashing
    the kernel.

    This patch updates the check to index > 0 in cpuidle_enter_freeze() and
    cpuidle_idle_call() (when idle_should_freeze is true) to restore the
    suspend-to-idle functionality in the absence of an enter_freeze callback.

    Fixes: 51164251f5c3 "sched / idle: Drop default_idle_call() fallback from call_cpuidle()"
    Signed-off-by: Sudeep Holla
    Signed-off-by: Rafael J. Wysocki

    Sudeep Holla
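
A simplified model of the check this commit describes. The structure and helpers below are hypothetical user-space stand-ins; the sketch assumes that state 0 never provides an enter_freeze callback, so only an index greater than 0 means a usable state was found.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced view of an idle state. */
struct fake_state {
    bool disabled;
    bool has_enter_freeze;
    unsigned int exit_latency_us;
};

/*
 * Returns the index of the deepest usable state, scanning from index 1.
 * When freeze is true, states without an enter_freeze callback are
 * skipped. Returns 0 when nothing beyond state 0 qualifies.
 */
static int find_deepest_state(const struct fake_state *states, int count,
                              bool freeze)
{
    unsigned int latency_us = 0;
    int ret = 0;

    assert(states && count > 0);

    for (int i = 1; i < count; i++) {
        const struct fake_state *s = &states[i];

        if (s->disabled || s->exit_latency_us <= latency_us ||
            (freeze && !s->has_enter_freeze))
            continue;

        latency_us = s->exit_latency_us;
        ret = i;
    }

    return ret;
}

/* The freeze path is only taken for index > 0, not for index >= 0. */
static bool can_use_enter_freeze(const struct fake_state *states, int count)
{
    return find_deepest_state(states, count, true) > 0;
}
```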
     

21 Jan, 2016

1 commit

  • Pull more power management and ACPI updates from Rafael Wysocki:
    "This includes fixes on top of the previous batch of PM+ACPI updates
    and some new material as well.

    From the new material perspective the most significant are the driver
    core changes that should allow USB devices to stay suspended over
    system suspend/resume cycles if they have been runtime-suspended
    already beforehand. Apart from that, ACPICA is updated to upstream
    revision 20160108 (cosmetic mostly, but including one fixup on top of
    the previous ACPICA update) and there are some devfreq updates that
    didn't make it before (due to timing).

    A few recent regressions are fixed, most importantly in the cpuidle
    menu governor and in the ACPI backlight driver and some x86 platform
    drivers depending on it.

    Some more bugs are fixed and cleanups are made on top of that.

    Specifics:

    - Modify the driver core and the USB subsystem to allow USB devices
    to stay suspended over system suspend/resume cycles if they have
    been runtime-suspended already beforehand and fix some bugs on top
    of these changes (Tomeu Vizoso, Rafael Wysocki).

    - Update ACPICA to upstream revision 20160108, including updates of
    the ACPICA's copyright notices, a code fixup resulting from a
    regression fix that was necessary in the upstream code only (the
    regression fixed by it has never been present in Linux) and a
    compiler warning fix (Bob Moore, Lv Zheng).

    - Fix a recent regression in the cpuidle menu governor that broke it
    on practically all architectures other than x86 and make a couple
    of optimizations on top of that fix (Rafael Wysocki).

    - Clean up the selection of cpuidle governors depending on whether or
    not the kernel is configured for tickless systems (Jean Delvare).

    - Revert a recent commit that introduced a regression in the ACPI
    backlight driver, address the problem it attempted to fix in a
    different way and revert one more cosmetic change depending on the
    problematic commit (Hans de Goede).

    - Add two more ACPI backlight quirks (Hans de Goede).

    - Fix a few minor problems in the core devfreq code, clean it up a
    bit and update the MAINTAINERS information related to it (Chanwoo
    Choi, MyungJoo Ham).

    - Improve an error message in the ACPI fan driver (Andy Lutomirski).

    - Fix a recent build regression in the cpupower tool (Shreyas
    Prabhu)"

    * tag 'pm+acpi-4.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits)
    cpuidle: menu: Avoid pointless checks in menu_select()
    sched / idle: Drop default_idle_call() fallback from call_cpuidle()
    cpupower: Fix build error in cpufreq-info
    cpuidle: Don't enable all governors by default
    cpuidle: Default to ladder governor on ticking systems
    time: nohz: Expose tick_nohz_enabled
    ACPICA: Update version to 20160108
    ACPICA: Silence a -Wbad-function-cast warning when acpi_uintptr_t is 'uintptr_t'
    ACPICA: Additional 2016 copyright changes
    ACPICA: Reduce regression fix divergence from upstream ACPICA
    ACPI / video: Add disable_backlight_sysfs_if quirk for the Toshiba Satellite R830
    ACPI / video: Revert "thinkpad_acpi: Use acpi_video_handles_brightness_key_presses()"
    ACPI / video: Document acpi_video_handles_brightness_key_presses() a bit
    ACPI / video: Fix using an uninitialized mutex / list_head in acpi_video_handles_brightness_key_presses()
    ACPI / video: Revert "ACPI / video: driver must be registered before checking for keypresses"
    ACPI / fan: Improve acpi_device_update_power error message
    ACPI / video: Add disable_backlight_sysfs_if quirk for the Toshiba Portege R700
    cpuidle: menu: Fix menu_select() for CPUIDLE_DRIVER_STATE_START == 0
    MAINTAINERS: Add devfreq-event entry
    MAINTAINERS: Add missing git repository and directory for devfreq
    ...

    Linus Torvalds
     

19 Jan, 2016

2 commits

  • If menu_select() cannot find a suitable state to return, it will
    return the state index stored in data->last_state_idx. This
    means that it is pointless to look at the states whose indices
    are less than or equal to data->last_state_idx in the main loop,
    so don't do that.

    Given that those checks are done on every idle state selection, this
    change can save quite a bit of completely unnecessary overhead.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Ingo Molnar
    Tested-by: Sudeep Holla

    Rafael J. Wysocki
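
The idea of starting the main loop just above the fallback index can be sketched as below. This is a reduced model, not menu_select() itself: it only compares target residencies with a predicted idle duration, and `select_state()` and its parameters are hypothetical.

```c
#include <assert.h>

/*
 * Picks the deepest state whose target residency fits the predicted idle
 * duration. fallback_idx plays the role of data->last_state_idx: it is
 * returned when no deeper state qualifies, so states at or below it
 * never need to be examined.
 */
static int select_state(const unsigned int *target_residency_us, int count,
                        int fallback_idx, unsigned int predicted_us)
{
    int idx = fallback_idx;

    assert(target_residency_us && fallback_idx >= 0 && fallback_idx < count);

    for (int i = fallback_idx + 1; i < count; i++) {
        if (target_residency_us[i] > predicted_us)
            continue;
        idx = i;
    }

    return idx;
}
```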
     
  • After commit 9c4b2867ed7c (cpuidle: menu: Fix menu_select() for
    CPUIDLE_DRIVER_STATE_START == 0) it is clear that menu_select()
    cannot return negative values. Moreover, ladder_select_state()
    will never return a negative value either, so make find_deepest_state()
    return non-negative values too and drop the default_idle_call()
    fallback from call_cpuidle().

    This eliminates one branch from the idle loop and makes the governors
    and find_deepest_state() handle the case when all states have been
    disabled from sysfs consistently.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Ingo Molnar
    Tested-by: Sudeep Holla

    Rafael J. Wysocki
     

16 Jan, 2016

3 commits

  • Since commit d6f346f2d2 (cpuidle: improve governor Kconfig options)
    the best cpuidle governor is selected automatically. There is little
    point in additionally selecting the other one as it will not be used,
    so don't select both governors by default.

    Being able to select more than one governor is still good for
    developers booting with cpuidle_sysfs_switch, though.

    This fixes the second half of kernel BZ #65531.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=65531
    Signed-off-by: Jean Delvare
    Signed-off-by: Rafael J. Wysocki

    Jean Delvare
     
  • The menu governor is currently the default on all systems. However the
    documentation claims that the ladder governor is preferred on ticking
    systems. So bump the rating of the ladder governor when NO_HZ is
    disabled, or when booting with nohz=off.

    This fixes the first half of kernel BZ #65531.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=65531
    Signed-off-by: Jean Delvare
    Signed-off-by: Rafael J. Wysocki

    Jean Delvare
     
  • Pull powerpc updates from Michael Ellerman:
    "Core:
    - Ground work for the new Power9 MMU from Aneesh Kumar K.V
    - Optimise FP/VMX/VSX context switching from Anton Blanchard

    Misc:
    - Various cleanups from Krzysztof Kozlowski, John Ogness, Rashmica
    Gupta, Russell Currey, Gavin Shan, Daniel Axtens, Michael Neuling,
    Andrew Donnellan
    - Allow wrapper to work on non-English system from Laurent Vivier
    - Add rN aliases to the pt_regs_offset table from Rashmica Gupta
    - Fix module autoload for rackmeter & axonram drivers from Luis de
    Bethencourt
    - Include KVM guest test in all interrupt vectors from Paul Mackerras
    - Fix DSCR inheritance over fork() from Anton Blanchard
    - Make value-returning atomics & {cmp}xchg* & their atomic_ versions
    fully ordered from Boqun Feng
    - Print MSR TM bits in oops messages from Michael Neuling
    - Add TM signal return & invalid stack selftests from Michael Neuling
    - Limit EPOW reset event warnings from Vipin K Parashar
    - Remove the Cell QPACE code from Rashmica Gupta
    - Append linux_banner to exception information in xmon from Rashmica
    Gupta
    - Add selftest to check if VSRs are corrupted from Rashmica Gupta
    - Remove broken GregorianDay() from Daniel Axtens
    - Import Anton's context_switch2 benchmark into selftests from
    Michael Ellerman
    - Add selftest script to test HMI functionality from Daniel Axtens
    - Remove obsolete OPAL v2 support from Stewart Smith
    - Make enter_rtas() private from Michael Ellerman
    - PPR exception cleanups from Michael Ellerman
    - Add page soft dirty tracking from Laurent Dufour
    - Add support for Nvlink NPUs from Alistair Popple
    - Add support for kexec on 476fpe from Alistair Popple
    - Enable kernel CPU dlpar from sysfs from Nathan Fontenot
    - Copy only required pieces of the mm_context_t to the paca from
    Michael Neuling
    - Add a kmsg_dumper that flushes OPAL console output on panic from
    Russell Currey
    - Implement save_stack_trace_regs() to enable kprobe stack tracing
    from Steven Rostedt
    - Add HWCAP bits for Power9 from Michael Ellerman
    - Fix _PAGE_PTE breaking swapoff from Aneesh Kumar K.V
    - Fix _PAGE_SWP_SOFT_DIRTY breaking swapoff from Hugh Dickins
    - scripts/recordmcount.pl: support data in text section on powerpc
    from Ulrich Weigand
    - Handle R_PPC64_ENTRY relocations in modules from Ulrich Weigand

    cxl:
    - cxl: Fix possible idr warning when contexts are released from
    Vaibhav Jain
    - cxl: use correct operator when writing pcie config space values
    from Andrew Donnellan
    - cxl: Fix DSI misses when the context owning task exits from Vaibhav
    Jain
    - cxl: fix build for GCC 4.6.x from Brian Norris
    - cxl: use -Werror only with CONFIG_PPC_WERROR from Brian Norris
    - cxl: Enable PCI device ID for future IBM CXL adapter from Uma
    Krishnan

    Freescale:
    - Freescale updates from Scott: Highlights include moving QE code out
    of arch/powerpc (to be shared with arm), device tree updates, and
    minor fixes"

    * tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (149 commits)
    powerpc/module: Handle R_PPC64_ENTRY relocations
    scripts/recordmcount.pl: support data in text section on powerpc
    powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages
    powerpc/mm: fix _PAGE_SWP_SOFT_DIRTY breaking swapoff
    powerpc/mm: Fix _PAGE_PTE breaking swapoff
    cxl: Enable PCI device ID for future IBM CXL adapter
    cxl: use -Werror only with CONFIG_PPC_WERROR
    cxl: fix build for GCC 4.6.x
    powerpc: Add HWCAP bits for Power9
    powerpc/powernv: Reserve PE#0 on NPU
    powerpc/powernv: Change NPU PE# assignment
    powerpc/powernv: Fix update of NVLink DMA mask
    powerpc/powernv: Remove misleading comment in pci.c
    powerpc: Implement save_stack_trace_regs() to enable kprobe stack tracing
    powerpc: Fix build break due to paca mm_context_t changes
    cxl: Fix DSI misses when the context owning task exits
    MAINTAINERS: Update Scott Wood's e-mail address
    powerpc/powernv: Fix minor off-by-one error in opal_mce_check_early_recovery()
    powerpc: Fix style of self-test config prompts
    powerpc/powernv: Only delay opal_rtc_read() retry when necessary
    ...

    Linus Torvalds
     

15 Jan, 2016

1 commit

  • Commit a9ceb78bc75c (cpuidle,menu: use interactivity_req to disable
    polling) exposed a bug in menu_select() causing it to return -1
    on systems with CPUIDLE_DRIVER_STATE_START equal to zero, although
    it should have returned 0. As a result, idle states are not entered
    by CPUs on those systems.

    Namely, on the systems in question data->last_state_idx is initially
    equal to -1 and the above commit modified the condition that would
    have caused it to be changed to 0 to be less likely to trigger which
    exposed the problem. However, setting data->last_state_idx initially
    to -1 doesn't make sense at all and on the affected systems it should
    always be set to CPUIDLE_DRIVER_STATE_START (ie. 0) unconditionally,
    so make that happen.

    Fixes: a9ceb78bc75c (cpuidle,menu: use interactivity_req to disable polling)
    Reported-and-tested-by: Sudeep Holla
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

17 Dec, 2015

1 commit


15 Dec, 2015

3 commits

  • The Kconfig currently controlling compilation of this code is:

    cpuidle/Kconfig.arm:config ARM_EXYNOS_CPUIDLE
    cpuidle/Kconfig.arm: bool "Cpu Idle Driver for the Exynos processors"

    ...meaning that it currently is not being built as a module by anyone.

    Let's remove the couple traces of modularity so that when reading the
    driver there is no doubt it is builtin-only.

    Since module_platform_driver() uses the same init level priority as
    builtin_platform_driver() the init ordering remains unchanged with
    this commit.

    Acked-by: Daniel Lezcano
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Rafael J. Wysocki

    Paul Gortmaker
     
  • The Kconfig currently controlling compilation of this code is:

    cpuidle/Kconfig.arm:config ARM_U8500_CPUIDLE
    cpuidle/Kconfig.arm: bool "Cpu Idle Driver for the ST-E u8500 processors"

    ...meaning that it currently is not being built as a module by anyone.

    Let's remove the couple traces of modularity so that when reading the
    driver there is no doubt it is builtin-only.

    Since module_platform_driver() uses the same init level priority as
    builtin_platform_driver() the init ordering remains unchanged with
    this commit.

    Acked-by: Daniel Lezcano
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Rafael J. Wysocki

    Paul Gortmaker
     
  • The Kconfig currently controlling compilation of this code is:

    drivers/cpuidle/Kconfig.arm:config ARM_CLPS711X_CPUIDLE
    drivers/cpuidle/Kconfig.arm: bool "CPU Idle Driver for CLPS711X processors"

    ...meaning that it currently is not being built as a module by anyone.

    Let's remove the modular code that is essentially orphaned, so that
    when reading the driver there is no doubt it is builtin-only.

    Since module_platform_driver() uses the same init level priority as
    builtin_platform_driver() the init ordering remains unchanged with
    this commit.

    Also note that MODULE_DEVICE_TABLE is a no-op for non-modular code.

    We also delete the MODULE_LICENSE tag etc. since all that information
    is already contained at the top of the file in the comments.

    Acked-by: Daniel Lezcano
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Rafael J. Wysocki

    Paul Gortmaker
     

17 Nov, 2015

3 commits

  • The cpuidle state tables contain the maximum exit latency for each
    cpuidle state. On x86, that is the exit latency for when the entire
    package goes into that same idle state.

    However, a lot of the time we only go into the core idle state,
    not the package idle state. This means we see a much smaller exit
    latency.

    We have no way to detect whether we went into the core or package
    idle state while idle, and that is ok.

    However, the current menu_update logic does have the potential to
    trip up the repeating pattern detection in get_typical_interval.
    If the system is experiencing an exit latency near the idle state's
    exit latency, some of the samples will have exit_us subtracted,
    while others will not. This turns a repeating pattern into mush,
    potentially breaking get_typical_interval.

    Furthermore, for smaller sleep intervals, we know the chance that
    all the cores in the package went to the same idle state is fairly
    small. Dividing the measured_us by two, instead of subtracting the
    full exit latency when hitting a small measured_us, will reduce the
    error.

    Signed-off-by: Rik van Riel
    Acked-by: Arjan van de Ven
    Signed-off-by: Rafael J. Wysocki

    Rik van Riel
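
The correction described in this commit can be sketched as a small helper. The commit message does not say what counts as a "small" measured_us; treating anything up to twice the exit latency as small is an assumption of this sketch, and `correct_measured_us()` is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>

/*
 * Adjusts a measured idle duration for exit latency.
 * Long intervals: subtract the full (package-level) exit latency.
 * Short intervals: halve the value instead, since the package most
 * likely did not reach the deep state and the full latency would
 * over-correct.
 */
static unsigned int correct_measured_us(unsigned int measured_us,
                                        unsigned int exit_latency_us)
{
    assert(exit_latency_us <= UINT_MAX / 2);    /* 2 * latency must not wrap */

    if (measured_us > 2 * exit_latency_us)
        measured_us -= exit_latency_us;
    else
        measured_us /= 2;

    return measured_us;
}
```

Unlike a plain subtraction, this keeps short samples from collapsing to zero or flipping between corrected and uncorrected values around the exit latency.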
     
  • The menu governor carefully figures out how much time we typically
    sleep for an estimated sleep interval, or whether there is a repeating
    pattern going on, and corrects that estimate for the CPU load.

    Then it proceeds to ignore that information when determining whether
    or not to consider polling. This is not a big deal on most x86 CPUs,
    which have very low C1 latencies, and the patch should not have any
    effect on those CPUs.

    However, certain CPUs (eg. Atom) have much higher C1 latencies, and
    it would be good to not waste performance and power on those CPUs if
    we are expecting a very low wakeup latency.

    Disable polling based on the estimated interactivity requirement, not
    on the time to the next timer interrupt.

    Signed-off-by: Rik van Riel
    Acked-by: Arjan van de Ven
    Signed-off-by: Rafael J. Wysocki

    Rik van Riel
     
  • The cpuidle menu governor has a forced cut-off for polling at 5us,
    in order to deal with firmware that gives the OS bad information
    on cpuidle states, leading to the system spending way too much time
    in polling.

    However, at least one x86 CPU family (Atom) has chips that have
    a 20us break-even point for C1. Forcing the polling cut-off to
    less than that wastes performance and power.

    Increase the polling cut-off to 20us.

    Systems with a lower C1 latency will be found in the states table by
    the menu governor, which will pick those states as appropriate.

    Signed-off-by: Rik van Riel
    Acked-by: Arjan van de Ven
    Signed-off-by: Rafael J. Wysocki

    Rik van Riel
     

23 Oct, 2015

2 commits

  • As the driver doesn't support unbinding, nor does it support arbitrary
    binding of devices, disable the bind/unbind attributes for this driver.
    Also, as the driver has no remove function, it can never be modular,
    so use builtin_platform_driver() to avoid the module exit boilerplate.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Lezcano

    Russell King
     
  • There's no need to use multiple platform drivers, especially when we
    want to do something different in the probe, but we still use a common
    probe function.

    We can use the platform ID system to only register one platform driver,
    but have it match several devices, and give us the CPU idle driver via
    the ID's driver_data.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Lezcano

    Russell King
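
The single-driver, ID-table approach described in this commit can be illustrated in user space. Everything below is a hypothetical model: the structure names, device names and `match_device()` are invented for the example and only mimic the way a platform ID entry carries per-device driver_data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a CPU idle driver description. */
struct fake_idle_driver {
    const char *name;
};

/* Hypothetical stand-in for a platform device ID entry. */
struct fake_device_id {
    const char *name;
    const struct fake_idle_driver *driver_data;
};

static const struct fake_idle_driver soc_a_idle = { "soc_a_idle" };
static const struct fake_idle_driver soc_b_idle = { "soc_b_idle" };

/* One table, several device names, each carrying its own idle driver. */
static const struct fake_device_id id_table[] = {
    { "cpuidle-soc-a", &soc_a_idle },
    { "cpuidle-soc-b", &soc_b_idle },
    { NULL, NULL },     /* sentinel */
};

/* A common probe would look up the idle driver for the matched device. */
static const struct fake_idle_driver *match_device(const char *dev_name)
{
    assert(dev_name);

    for (const struct fake_device_id *id = id_table; id->name; id++) {
        if (strcmp(id->name, dev_name) == 0)
            return id->driver_data;
    }

    return NULL;
}
```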
     

12 Sep, 2015

1 commit

  • Pull more power management and ACPI updates from Rafael Wysocki:
    "These are mostly fixes and cleanups on top of the previous PM+ACPI
    pull request (cpufreq core and drivers, cpuidle, generic power domains
    framework). Some of them didn't make it into that pull request and
    some fix issues introduced by it.

    The only really new thing is the support for suspend frequency in the
    cpufreq-dt driver, but it is needed to fix an issue with Exynos
    platforms.

    Specifics:

    - build fix for the new Mediatek MT8173 cpufreq driver (Guenter
    Roeck).

    - generic power domains framework fixes (power on error code path,
    subdomain removal) and cleanup of a deprecated API user (Geert
    Uytterhoeven, Jon Hunter, Ulf Hansson).

    - cpufreq-dt driver fixes including two fixes for bugs related to the
    new Operating Performance Points Device Tree bindings introduced
    recently (Viresh Kumar).

    - suspend frequency support for the cpufreq-dt driver (Bartlomiej
    Zolnierkiewicz, Viresh Kumar).

    - cpufreq core cleanups (Viresh Kumar).

    - intel_pstate driver fixes (Chen Yu, Kristen Carlson Accardi).

    - additional sanity check in the cpuidle core (Xunlei Pang).

    - fix for a comment related to CPU power management (Lina Iyer)"

    * tag 'pm+acpi-4.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    intel_pstate: fix PCT_TO_HWP macro
    intel_pstate: Fix user input of min/max to legal policy region
    PM / OPP: Return suspend_opp only if it is enabled
    cpufreq-dt: add suspend frequency support
    cpufreq: allow cpufreq_generic_suspend() to work without suspend frequency
    PM / OPP: add dev_pm_opp_get_suspend_opp() helper
    staging: board: Migrate away from __pm_genpd_name_add_device()
    cpufreq: Use __func__ to print function's name
    cpufreq: staticize cpufreq_cpu_get_raw()
    PM / Domains: Ensure subdomain is not in use before removing
    cpufreq: Add ARM_MT8173_CPUFREQ dependency on THERMAL
    cpuidle/coupled: Add sanity check for safe_state_index
    PM / Domains: Try power off masters in error path of __pm_genpd_poweron()
    cpufreq: dt: Tolerance applies on both sides of target voltage
    cpufreq: dt: Print error on failing to mark OPPs as shared
    cpufreq: dt: Check OPP count before marking them shared
    kernel/cpu_pm: fix cpu_cluster_pm_exit comment

    Linus Torvalds
     

11 Sep, 2015

1 commit

  • * pm-cpu:
    kernel/cpu_pm: fix cpu_cluster_pm_exit comment

    * pm-cpuidle:
    cpuidle/coupled: Add sanity check for safe_state_index

    * pm-domains:
    staging: board: Migrate away from __pm_genpd_name_add_device()
    PM / Domains: Ensure subdomain is not in use before removing
    PM / Domains: Try power off masters in error path of __pm_genpd_poweron()

    Rafael J. Wysocki
     

04 Sep, 2015

1 commit

  • Pull ARM development updates from Russell King:
    "Included in this update:

    - moving PSCI code from ARM64/ARM to drivers/

    - removal of some architecture internals from global kernel view

    - addition of software based "privileged no access" support using the
    old domains register to turn off the ability for kernel
    loads/stores to access userspace. Only the proper accessors will
    be usable.

    - addition of early fixup support for early console

    - re-addition (and reimplementation) of OMAP special interconnect
    barrier

    - removal of finish_arch_switch()

    - only expose cpuX/online in sysfs if hotpluggable

    - a number of code cleanups"

    * 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (41 commits)
    ARM: software-based priviledged-no-access support
    ARM: entry: provide uaccess assembly macro hooks
    ARM: entry: get rid of multiple macro definitions
    ARM: 8421/1: smp: Collapse arch_cpu_idle_dead() into cpu_die()
    ARM: uaccess: provide uaccess_save_and_enable() and uaccess_restore()
    ARM: mm: improve do_ldrd_abort macro
    ARM: entry: ensure that IRQs are enabled when calling syscall_trace_exit()
    ARM: entry: efficiency cleanups
    ARM: entry: get rid of asm_trace_hardirqs_on_cond
    ARM: uaccess: simplify user access assembly
    ARM: domains: remove DOMAIN_TABLE
    ARM: domains: keep vectors in separate domain
    ARM: domains: get rid of manager mode for user domain
    ARM: domains: move initial domain setting value to asm/domains.h
    ARM: domains: provide domain_mask()
    ARM: domains: switch to keeping domain value in register
    ARM: 8419/1: dma-mapping: harmonize definition of DMA_ERROR_CODE
    ARM: 8417/1: refactor bitops functions with BIT_MASK() and BIT_WORD()
    ARM: 8416/1: Feroceon: use of_iomap() to map register base
    ARM: 8415/1: early fixmap support for earlycon
    ...

    Linus Torvalds
     

03 Sep, 2015

1 commit


02 Sep, 2015

1 commit

  • Pull power management and ACPI updates from Rafael Wysocki:
    "From the number of commits perspective, the biggest items are ACPICA
    and cpufreq changes with the latter taking the lead (over 50 commits).

    On the cpufreq front, there are many cleanups and minor fixes in the
    core and governors, driver updates etc. We also have a new cpufreq
    driver for Mediatek MT8173 chips.

    ACPICA mostly updates its debug infrastructure and adds a number of
    fixes and cleanups for good measure.

    The Operating Performance Points (OPP) framework is updated with new
    DT bindings and support for them among other things.

    We have a few updates of the generic power domains framework and a
    reorganization of the ACPI device enumeration code and bus type
    operations.

    And a lot of fixes and cleanups all over.

    Included is one branch from the MFD tree as it contains some
    PM-related driver core and ACPI PM changes a few other commits are
    based on.

    Specifics:

    - ACPICA update to upstream revision 20150818 including method
    tracing extensions to allow more in-depth AML debugging in the
    kernel and a number of assorted fixes and cleanups (Bob Moore, Lv
    Zheng, Markus Elfring).

    - ACPI sysfs code updates and a documentation update related to AML
    method tracing (Lv Zheng).

    - ACPI EC driver fix related to serialized evaluations of _Qxx
    methods and ACPI tools updates allowing the EC userspace tool to be
    built from the kernel source (Lv Zheng).

    - ACPI processor driver updates preparing it for future introduction
    of CPPC support and ACPI PCC mailbox driver updates (Ashwin
    Chaugule).

    - ACPI interrupts enumeration fix for a regression related to the
    handling of IRQ attribute conflicts between MADT and the ACPI
    namespace (Jiang Liu).

    - Fixes related to ACPI device PM (Mika Westerberg, Srinidhi
    Kasagar).

    - ACPI device registration code reorganization to separate the
    sysfs-related code and bus type operations from the rest (Rafael J
    Wysocki).

    - Assorted cleanups in the ACPI core (Jarkko Nikula, Mathias Krause,
    Andy Shevchenko, Rafael J Wysocki, Nicolas Iooss).

    - ACPI cpufreq driver and ia64 cpufreq driver fixes and cleanups (Pan
    Xinhui, Rafael J Wysocki).

    - cpufreq core cleanups on top of the previous changes allowing it to
    preserve its sysfs directories over system suspend/resume (Viresh
    Kumar, Rafael J Wysocki, Sebastian Andrzej Siewior).

    - cpufreq fixes and cleanups related to governors (Viresh Kumar).

    - cpufreq updates (core and the cpufreq-dt driver) related to the
    turbo/boost mode support (Viresh Kumar, Bartlomiej Zolnierkiewicz).

    - New DT bindings for Operating Performance Points (OPP), support for
    them in the OPP framework and in the cpufreq-dt driver plus related
    OPP framework fixes and cleanups (Viresh Kumar).

    - cpufreq powernv driver updates (Shilpasri G Bhat).

    - New cpufreq driver for Mediatek MT8173 (Pi-Cheng Chen).

    - Assorted cpufreq driver (speedstep-lib, sfi, integrator) cleanups
    and fixes (Abhilash Jindal, Andrzej Hajda, Cristian Ardelean).

    - intel_pstate driver updates including Skylake-S support, support
    for enabling HW P-states per CPU and an additional vendor bypass
    list entry (Kristen Carlson Accardi, Chen Yu, Ethan Zhao).

    - cpuidle core fixes related to the handling of coupled idle states
    (Xunlei Pang).

    - intel_idle driver updates including Skylake Client support and
    support for freeze-mode-specific idle states (Len Brown).

    - Driver core updates related to power management (Andy Shevchenko,
    Rafael J Wysocki).

    - Generic power domains framework fixes and cleanups (Jon Hunter,
    Geert Uytterhoeven, Rajendra Nayak, Ulf Hansson).

    - Device PM QoS framework update to allow the latency tolerance
    setting to be exposed to user space via sysfs (Mika Westerberg).

    - devfreq support for PPMUv2 in Exynos5433 and a fix for an incorrect
    exynos-ppmu DT binding (Chanwoo Choi, Javier Martinez Canillas).

    - System sleep support updates (Alan Stern, Len Brown, SungEun Kim).

    - rockchip-io AVS support updates (Heiko Stuebner).

    - PM core clocks support fixup (Colin Ian King).

    - Power capping RAPL driver update including support for Skylake H/S
    and Broadwell-H (Radivoje Jovanovic, Seiichi Ikarashi).

    - Generic device properties framework fixes related to the handling
    of static (driver-provided) property sets (Andy Shevchenko).

    - turbostat and cpupower updates (Len Brown, Shilpasri G Bhat,
    Shreyas B Prabhu)"

    * tag 'pm+acpi-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (180 commits)
    cpufreq: speedstep-lib: Use monotonic clock
    cpufreq: powernv: Increase the verbosity of OCC console messages
    cpufreq: sfi: use kmemdup rather than duplicating its implementation
    cpufreq: drop !cpufreq_driver check from cpufreq_parse_governor()
    cpufreq: rename cpufreq_real_policy as cpufreq_user_policy
    cpufreq: remove redundant 'policy' field from user_policy
    cpufreq: remove redundant 'governor' field from user_policy
    cpufreq: update user_policy.* on success
    cpufreq: use memcpy() to copy policy
    cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event
    cpufreq: mediatek: Add MT8173 cpufreq driver
    dt-bindings: mediatek: Add MT8173 CPU DVFS clock bindings
    PM / Domains: Fix typo in description of genpd_dev_pm_detach()
    PM / Domains: Remove unusable governor dummies
    PM / Domains: Make pm_genpd_init() available to modules
    PM / domains: Align column headers and data in pm_genpd_summary output
    powercap / RAPL: disable the 2nd power limit properly
    tools: cpupower: Fix error when running cpupower monitor
    PM / OPP: Drop unlikely before IS_ERR(_OR_NULL)
    PM / OPP: Fix static checker warning (broken 64bit big endian systems)
    ...

    Linus Torvalds
     

01 Sep, 2015

1 commit

  • Pull scheduler updates from Ingo Molnar:
    "The biggest change in this cycle is the rewrite of the main SMP load
    balancing metric: the CPU load/utilization. The main goal was to make
    the metric more precise and more representative - see the changelog of
    this commit for the gory details:

    9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")

    It is done in a way that significantly reduces complexity of the code:

    5 files changed, 249 insertions(+), 494 deletions(-)

    and the performance testing results are encouraging. Nevertheless we
    need to keep an eye on potential regressions, since this potentially
    affects every SMP workload in existence.

    This work comes from Yuyang Du.

    Other changes:

    - SCHED_DL updates. (Andrea Parri)

    - Simplify architecture callbacks by removing finish_arch_switch().
    (Peter Zijlstra et al)

    - cputime accounting: guarantee stime + utime == rtime. (Peter
    Zijlstra)

    - optimize idle CPU wakeups some more - inspired by Facebook server
    loads. (Mike Galbraith)

    - stop_machine fixes and updates. (Oleg Nesterov)

    - Introduce the 'trace_sched_waking' tracepoint. (Peter Zijlstra)

    - sched/numa tweaks. (Srikar Dronamraju)

    - misc fixes and small cleanups"
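
    As a rough illustration of the "stime + utime == rtime" item above, one
    way to keep such an invariant is to scale only one component against the
    measured runtime and derive the other by subtraction, so rounding can
    never make the two drift away from the total. This is a minimal userspace
    sketch and not the kernel's implementation: the struct and function names
    are hypothetical, and the inputs are assumed small enough that
    stime * rtime fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the adjusted user and system times. */
struct demo_cputime {
	uint64_t utime;
	uint64_t stime;
};

/*
 * Split the precise runtime 'rtime' in the same ratio as the sampled
 * 'utime' and 'stime'. Only stime is scaled; utime is whatever remains,
 * so utime + stime == rtime holds by construction.
 *
 * Assumption: stime * rtime does not overflow 64 bits. Production code
 * would need an overflow-safe scaling step here.
 */
struct demo_cputime demo_cputime_adjust(uint64_t utime, uint64_t stime,
					uint64_t rtime)
{
	struct demo_cputime out;

	if (stime == 0) {
		/* No system time sampled: everything is user time. */
		out.stime = 0;
	} else if (utime == 0) {
		/* No user time sampled: everything is system time. */
		out.stime = rtime;
	} else {
		/* Integer division rounds down, so out.stime <= rtime. */
		out.stime = (stime * rtime) / (utime + stime);
	}

	/* Derive utime as the remainder instead of scaling it separately. */
	assert(out.stime <= rtime);
	out.utime = rtime - out.stime;
	return out;
}
```

    Scaling both components independently would round each one down and
    could leave the sum one tick short of rtime; deriving utime by
    subtraction avoids that.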

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
    sched/deadline: Fix comment in enqueue_task_dl()
    sched/deadline: Fix comment in push_dl_tasks()
    sched: Change the sched_class::set_cpus_allowed() calling context
    sched: Make sched_class::set_cpus_allowed() unconditional
    sched: Fix a race between __kthread_bind() and sched_setaffinity()
    sched: Ensure a task has a non-normalized vruntime when returning back to CFS
    sched/numa: Fix NUMA_DIRECT topology identification
    tile: Reorganize _switch_to()
    sched, sparc32: Update scheduler comments in copy_thread()
    sched: Remove finish_arch_switch()
    sched, tile: Remove finish_arch_switch
    sched, sh: Fold finish_arch_switch() into switch_to()
    sched, score: Remove finish_arch_switch()
    sched, avr32: Remove finish_arch_switch()
    sched, MIPS: Get rid of finish_arch_switch()
    sched, arm: Remove finish_arch_switch()
    sched/fair: Clean up load average references
    sched/fair: Provide runnable_load_avg back to cfs_rq
    sched/fair: Remove task and group entity load when they are dead
    sched/fair: Init cfs_rq's sched_entity load average
    ...

    Linus Torvalds
     

28 Aug, 2015

2 commits


03 Aug, 2015

1 commit

  • Now that the common PSCI client code has been factored out to
    drivers/firmware, and made safe for 32-bit use, move the 32-bit ARM code
    over to it. This results in a moderate reduction of duplicated lines,
    and will prevent further duplication as the PSCI client code is updated
    for PSCI 1.0 and beyond.

    The two legacy platform users of the PSCI invocation code are updated to
    account for interface changes. In both cases the power state parameter
    (which is constant) is now generated using macros, so that the
    pack/unpack logic can be killed in preparation for PSCI 1.0 power state
    changes.
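
    To illustrate what "generated using macros" can look like for a constant
    power state parameter, here is a minimal sketch that packs the PSCI 0.2
    power_state fields (StateID in bits [15:0], StateType in bit 16,
    AffinityLevel in bits [25:24]) at compile time. All DEMO_* names and the
    chosen field values are hypothetical and are not taken from this commit.

```c
#include <assert.h>
#include <stdint.h>

/* PSCI 0.2 power_state field positions (DEMO_* names are illustrative). */
#define DEMO_PSCI_ID_SHIFT	0
#define DEMO_PSCI_ID_MASK	(0xffffu << DEMO_PSCI_ID_SHIFT)
#define DEMO_PSCI_TYPE_SHIFT	16
#define DEMO_PSCI_TYPE_MASK	(0x1u << DEMO_PSCI_TYPE_SHIFT)
#define DEMO_PSCI_AFFL_SHIFT	24
#define DEMO_PSCI_AFFL_MASK	(0x3u << DEMO_PSCI_AFFL_SHIFT)

/* StateType values: 0 is standby, 1 is power down. */
#define DEMO_PSCI_TYPE_STANDBY		0u
#define DEMO_PSCI_TYPE_POWER_DOWN	1u

/* Pack the three fields into one 32-bit parameter, masking each field. */
#define DEMO_PSCI_POWER_STATE(id, type, affl)				\
	((((uint32_t)(id) << DEMO_PSCI_ID_SHIFT) & DEMO_PSCI_ID_MASK) |	\
	 (((uint32_t)(type) << DEMO_PSCI_TYPE_SHIFT) & DEMO_PSCI_TYPE_MASK) | \
	 (((uint32_t)(affl) << DEMO_PSCI_AFFL_SHIFT) & DEMO_PSCI_AFFL_MASK))

/* Hypothetical platform constant: power down at affinity level 1, id 0. */
#define DEMO_SUSPEND_PARAM \
	DEMO_PSCI_POWER_STATE(0, DEMO_PSCI_TYPE_POWER_DOWN, 1)

/* Accessor so callers never need runtime pack/unpack helpers. */
uint32_t demo_suspend_param(void)
{
	return DEMO_SUSPEND_PARAM;
}
```

    Because the value is a compile-time constant, a platform written this way
    has no need for runtime pack/unpack helpers, which is the kind of logic
    the message above says can be removed.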

    Signed-off-by: Mark Rutland
    Acked-by: Rob Herring
    Cc: Catalin Marinas
    Cc: Ashwin Chaugule
    Cc: Lorenzo Pieralisi
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Will Deacon

    Mark Rutland
     

21 Jul, 2015

1 commit

  • Make sure to stop tracing only once we are past a point where
    all latency tracing events have been processed (irqs are not
    enabled again). This has the slight advantage of capturing more
    latency related events in the idle path, but most importantly it
    makes sure that latency tracing doesn't get re-enabled
    inadvertently when new events are coming in.

    This makes the irqsoff latency tracer useful again, as we stop
    capturing CPU sleep time as IRQ latency.

    Signed-off-by: Lucas Stach
    Cc: Daniel Lezcano
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: kernel@pengutronix.de
    Cc: patchwork-lst@pengutronix.de
    Link: http://lkml.kernel.org/r/1437410090-3747-1-git-send-email-l.stach@pengutronix.de
    Signed-off-by: Ingo Molnar

    Lucas Stach
     

10 Jul, 2015

1 commit


03 Jul, 2015

1 commit

  • …/kernel/git/paulg/linux

    Pull module_platform_driver replacement from Paul Gortmaker:
    "Replace module_platform_driver with builtin_platform_driver in non
    modules.

    We see an increasing number of non-modular drivers using
    modular_driver() type register functions. There are several downsides
    to letting this continue unchecked:

    - The code can appear modular to a reader of the code, and they won't
    know if the code really is modular without checking the Makefile
    and Kconfig to see if compilation is governed by a bool or
    tristate.

    - Coders of drivers may be tempted to code up an __exit function that
    is never used, just in order to satisfy the required three args of
    the modular registration function.

    - Non-modular code ends up including the <module.h> which increases
    CPP overhead that they don't need.

    - It hinders us from performing better separation of the module init
    code and the generic init code.

    So here we introduce similar macros for builtin drivers. Then we
    convert builtin drivers (controlled by a bool Kconfig) by making the
    following type of mapping:

    module_platform_driver() ---> builtin_platform_driver()
    module_platform_driver_probe() ---> builtin_platform_driver_probe().

    The set of drivers that are converted here are just the ones that
    showed up as relying on an implicit include of <module.h> during a
    pending header cleanup. So we convert them here vs adding an include
    of <module.h> to non-modular code to avoid compile fails. Additional
    conversions can be done asynchronously at any time.

    Once again, an unused module_exit function that is removed here
    appears in the diffstat as an outlier wrt all the other changes"
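
    The difference between the two macro families described above can be
    sketched in userspace as follows. This is only an illustration under
    stated assumptions: the struct and the register/unregister functions are
    stand-in stubs rather than the kernel's definitions, the macros here just
    generate plain functions instead of hooking into initcalls, and the
    driver names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct platform_driver (illustrative only). */
struct platform_driver {
	const char *name;
	int registered;
};

/* Stub registration helper: just records that the driver was registered. */
int platform_driver_register(struct platform_driver *drv)
{
	assert(drv != NULL);
	drv->registered = 1;
	return 0;
}

/* Stub unregistration helper, needed only by the modular form. */
void platform_driver_unregister(struct platform_driver *drv)
{
	assert(drv != NULL);
	drv->registered = 0;
}

/* Modular form: generates an init function AND an exit function. */
#define module_platform_driver(drv)					\
	int drv##_init(void) { return platform_driver_register(&(drv)); } \
	void drv##_exit(void) { platform_driver_unregister(&(drv)); }

/* Built-in form: generates only an init function; there is no exit path. */
#define builtin_platform_driver(drv)					\
	int drv##_init(void) { return platform_driver_register(&(drv)); }

/* Two hypothetical drivers, one per registration style. */
struct platform_driver modular_demo = { "modular_demo", 0 };
struct platform_driver builtin_demo = { "builtin_demo", 0 };

module_platform_driver(modular_demo)
builtin_platform_driver(builtin_demo)
```

    In this sketch the built-in variant has no exit function to define, which
    mirrors the point above about code that can never be unloaded not needing
    an unused __exit function.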

    * tag 'module-builtin_driver-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
    drivers/clk: convert sunxi/clk-mod0.c to use builtin_platform_driver
    drivers/power: Convert non-modular syscon-reboot to use builtin_platform_driver
    drivers/soc: Convert non-modular soc-realview to use builtin_platform_driver
    drivers/soc: Convert non-modular tegra/pmc to use builtin_platform_driver
    drivers/cpufreq: Convert non-modular s5pv210-cpufreq.c to use builtin_platform_driver
    drivers/cpuidle: Convert non-modular drivers to use builtin_platform_driver
    drivers/platform: Convert non-modular pdev_bus to use builtin_platform_driver
    platform_device: better support builtin boilerplate avoidance

    Linus Torvalds