25 Dec, 2016

1 commit


22 Dec, 2016

1 commit

  • * pm-cpufreq:
    cpufreq: s3c64xx: remove incorrect __init annotation
    cpufreq: Remove CPU hotplug callbacks only if they were initialized
    CPU/hotplug: Clarify description of __cpuhp_setup_state() return value

    Rafael J. Wysocki
     

21 Dec, 2016

2 commits

  • s3c64xx_cpufreq_config_regulator() is incorrectly annotated
    as __init, since its caller is not an init function:

    WARNING: vmlinux.o(.text+0x92fe1c): Section mismatch in reference from the function s3c64xx_cpufreq_driver_init() to the function .init.text:s3c64xx_cpufreq_config_regulator()

    With modern gcc versions, the function gets inlined, so we don't
    see the warning; it only shows up with gcc-4.6 and older.

    Signed-off-by: Arnd Bergmann
    Reviewed-by: Krzysztof Kozlowski
    Signed-off-by: Rafael J. Wysocki

    Arnd Bergmann
     
  • Since CPU hotplug callbacks are requested for CPUHP_AP_ONLINE_DYN state,
    successful callback initialization will result in cpuhp_setup_state()
    returning a positive value. Therefore acpi_cpufreq_online being zero
    indicates that callbacks have not been installed.

    This means that acpi_cpufreq_boost_exit() should only remove them if
    acpi_cpufreq_online is positive. Trying to call
    cpuhp_remove_state_nocalls(0) will cause a BUG().
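
    The return-value convention above can be illustrated with a toy model.
    This is only a sketch, not kernel code: setup_state(), remove_state(),
    boost_exit() and the "dyn" marker are hypothetical stand-ins for the
    real hotplug API, and the exception stands in for the BUG().

```python
# Toy model of the convention described above; none of this is kernel code.
CPUHP_AP_ONLINE_DYN = "dyn"      # hypothetical stand-in for the dynamic state
_next_dynamic_slot = [1]         # dynamically allocated states are positive


def setup_state(state):
    """Return a positive number for a dynamic state, 0 for a static one."""
    if state == CPUHP_AP_ONLINE_DYN:
        slot = _next_dynamic_slot[0]
        _next_dynamic_slot[0] += 1
        return slot
    return 0


def remove_state(slot):
    """Stand-in for the removal call; raising mimics the BUG() on 0."""
    if slot <= 0:
        raise RuntimeError("BUG: removing invalid state %d" % slot)


def boost_exit(online_slot):
    """Remove the callbacks only if they were actually installed."""
    if online_slot > 0:
        remove_state(online_slot)
        return True
    return False
```

    With this guard, boost_exit(0) is a no-op, while an unguarded
    remove_state(0) raises.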

    Signed-off-by: Boris Ostrovsky
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki

    Boris Ostrovsky
     

14 Dec, 2016

1 commit

  • Pull power management updates from Rafael Wysocki:
    "Again, cpufreq gets more changes than the other parts this time (one
    new driver, one old driver less, a bunch of enhancements of the
    existing code, new CPU IDs, fixes, cleanups)

    There also are some changes in cpuidle (idle injection rework, a
    couple of new CPU IDs, online/offline rework in intel_idle, fixes and
    cleanups), in the generic power domains framework (mostly related to
    supporting power domains containing CPUs), and in the Operating
    Performance Points (OPP) library (mostly related to supporting devices
    with multiple voltage regulators)

    In addition to that, the system sleep state selection interface is
    modified to make it easier for distributions with unchanged user space
    to support suspend-to-idle as the default system suspend method, some
    issues are fixed in the PM core, the latency tolerance PM QoS
    framework is improved a bit, the Intel RAPL power capping driver is
    cleaned up and there are some fixes and cleanups in the devfreq
    subsystem

    Specifics:

    - New cpufreq driver for Broadcom STB SoCs and a Device Tree binding
    for it (Markus Mayer)

    - Support for ARM Integrator/AP and Integrator/CP in the generic DT
    cpufreq driver and elimination of the old Integrator cpufreq driver
    (Linus Walleij)

    - Support for the zx296718, r8a7743 and r8a7745, Socionext UniPhier,
    and PXA SoCs in the generic DT cpufreq driver (Baoyou Xie,
    Geert Uytterhoeven, Masahiro Yamada, Robert Jarzmik)

    - cpufreq core fix to eliminate races that may lead to using inactive
    policy objects and related cleanups (Rafael Wysocki)

    - cpufreq schedutil governor update to make it use SCHED_FIFO kernel
    threads (instead of regular workqueues) for doing delayed work (to
    reduce the response latency in some cases) and related cleanups
    (Viresh Kumar)

    - New cpufreq sysfs attribute for resetting statistics (Markus Mayer)

    - cpufreq governors fixes and cleanups (Chen Yu, Stratos Karafotis,
    Viresh Kumar)

    - Support for using generic cpufreq governors in the intel_pstate
    driver (Rafael Wysocki)

    - Support for per-logical-CPU P-state limits and the EPP/EPB (Energy
    Performance Preference/Energy Performance Bias) knobs in the
    intel_pstate driver (Srinivas Pandruvada)

    - New CPU ID for Knights Mill in intel_pstate (Piotr Luc)

    - intel_pstate driver modification to use the P-state selection
    algorithm based on CPU load on platforms with the system profile in
    the ACPI tables set to "mobile" (Srinivas Pandruvada)

    - intel_pstate driver cleanups (Arnd Bergmann, Rafael Wysocki,
    Srinivas Pandruvada)

    - cpufreq powernv driver updates including fast switching support
    (for the schedutil governor), fixes and cleanups (Akshay Adiga,
    Andrew Donnellan, Denis Kirjanov)

    - acpi-cpufreq driver rework to switch it over to the new CPU
    offline/online state machine (Sebastian Andrzej Siewior)

    - Assorted cleanups in cpufreq drivers (Wei Yongjun, Prashanth
    Prakash)

    - Idle injection rework (to make it use the regular idle path instead
    of a home-grown custom one) and related powerclamp thermal driver
    updates (Peter Zijlstra, Jacob Pan, Petr Mladek, Sebastian Andrzej
    Siewior)

    - New CPU IDs for Atom Z34xx and Knights Mill in intel_idle (Andy
    Shevchenko, Piotr Luc)

    - intel_idle driver cleanups and switch over to using the new CPU
    offline/online state machine (Anna-Maria Gleixner, Sebastian
    Andrzej Siewior)

    - cpuidle DT driver update to support suspend-to-idle properly
    (Sudeep Holla)

    - cpuidle core cleanups and misc updates (Daniel Lezcano, Pan Bian,
    Rafael Wysocki)

    - Preliminary support for power domains including CPUs in the generic
    power domains (genpd) framework and related DT bindings (Lina Iyer)

    - Assorted fixes and cleanups in the generic power domains (genpd)
    framework (Colin Ian King, Dan Carpenter, Geert Uytterhoeven)

    - Preliminary support for devices with multiple voltage regulators
    and related fixes and cleanups in the Operating Performance Points
    (OPP) library (Viresh Kumar, Masahiro Yamada, Stephen Boyd)

    - System sleep state selection interface rework to make it easier to
    support suspend-to-idle as the default system suspend method
    (Rafael Wysocki)

    - PM core fixes and cleanups, mostly related to the interactions
    between the system suspend and runtime PM frameworks (Ulf Hansson,
    Sahitya Tummala, Tony Lindgren)

    - Latency tolerance PM QoS framework improvements (Andrew Lutomirski)

    - New Knights Mill CPU ID for the Intel RAPL power capping driver
    (Piotr Luc)

    - Intel RAPL power capping driver fixes, cleanups and switch over to
    using the new CPU offline/online state machine (Jacob Pan, Thomas
    Gleixner, Sebastian Andrzej Siewior)

    - Fixes and cleanups in the exynos-ppmu, exynos-nocp, rk3399_dmc,
    rockchip-dfi devfreq drivers and the devfreq core (Axel Lin,
    Chanwoo Choi, Javier Martinez Canillas, MyungJoo Ham, Viresh Kumar)

    - Fix for false-positive KASAN warnings during resume from ACPI S3
    (suspend-to-RAM) on x86 (Josh Poimboeuf)

    - Memory map verification during resume from hibernation on x86 to
    ensure a consistent address space layout (Chen Yu)

    - Wakeup sources debugging enhancement (Xing Wei)

    - rockchip-io AVS driver cleanup (Shawn Lin)"

    * tag 'pm-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (127 commits)
    devfreq: rk3399_dmc: Don't use OPP structures outside of RCU locks
    devfreq: rk3399_dmc: Remove dangling rcu_read_unlock()
    devfreq: exynos: Don't use OPP structures outside of RCU locks
    Documentation: intel_pstate: Document HWP energy/performance hints
    cpufreq: intel_pstate: Support for energy performance hints with HWP
    cpufreq: intel_pstate: Add locking around HWP requests
    PM / sleep: Print active wakeup sources when blocking on wakeup_count reads
    PM / core: Fix bug in the error handling of async suspend
    PM / wakeirq: Fix dedicated wakeirq for drivers not using autosuspend
    PM / Domains: Fix compatible for domain idle state
    PM / OPP: Don't WARN on multiple calls to dev_pm_opp_set_regulators()
    PM / OPP: Allow platform specific custom set_opp() callbacks
    PM / OPP: Separate out _generic_set_opp()
    PM / OPP: Add infrastructure to manage multiple regulators
    PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage()
    PM / OPP: Manage supply's voltage/current in a separate structure
    PM / OPP: Don't use OPP structure outside of rcu protected section
    PM / OPP: Reword binding supporting multiple regulators per device
    PM / OPP: Fix incorrect cpu-supply property in binding
    cpuidle: Add a kerneldoc comment to cpuidle_use_deepest_state()
    ..

    Linus Torvalds
     

13 Dec, 2016

2 commits

  • * pm-cpufreq: (51 commits)
    Documentation: intel_pstate: Document HWP energy/performance hints
    cpufreq: intel_pstate: Support for energy performance hints with HWP
    cpufreq: intel_pstate: Add locking around HWP requests
    cpufreq: ondemand: Set MIN_FREQUENCY_UP_THRESHOLD to 1
    cpufreq: intel_pstate: Add Knights Mill CPUID
    MAINTAINERS: Add bug tracking system location entry for cpufreq
    cpufreq: dt: Add support for zx296718
    cpufreq: acpi-cpufreq: drop rdmsr_on_cpus() usage
    cpufreq: acpi-cpufreq: Convert to hotplug state machine
    cpufreq: intel_pstate: fix intel_pstate_exit_perf_limits() prototype
    cpufreq: intel_pstate: Set EPP/EPB to 0 in performance mode
    cpufreq: schedutil: Rectify comment in sugov_irq_work() function
    cpufreq: intel_pstate: increase precision of performance limits
    cpufreq: intel_pstate: round up min_perf limits
    cpufreq: Make cpufreq_update_policy() void
    ACPI / processor: Make acpi_processor_ppc_has_changed() void
    cpufreq: Avoid using inactive policies
    cpufreq: intel_pstate: Generic governors support
    cpufreq: intel_pstate: Request P-states control from SMM if needed
    cpufreq: dt: Add support for r8a7743 and r8a7745
    ...

    Rafael J. Wysocki
     
  • * pm-opp:
    PM / OPP: Don't WARN on multiple calls to dev_pm_opp_set_regulators()
    PM / OPP: Allow platform specific custom set_opp() callbacks
    PM / OPP: Separate out _generic_set_opp()
    PM / OPP: Add infrastructure to manage multiple regulators
    PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage()
    PM / OPP: Manage supply's voltage/current in a separate structure
    PM / OPP: Don't use OPP structure outside of rcu protected section
    PM / OPP: Reword binding supporting multiple regulators per device
    PM / OPP: Fix incorrect cpu-supply property in binding
    PM / OPP: Pass opp_table to dev_pm_opp_put_regulator()
    PM / OPP: fix debug/error messages in dev_pm_opp_of_get_sharing_cpus()
    PM / OPP: make _of_get_opp_desc_node() a static function

    Rafael J. Wysocki
     

08 Dec, 2016

2 commits

  • It is possible to provide hints to the HWP algorithms in the processor
    to be more performance centric or more energy centric. These hints are
    provided by using HWP energy performance preference (EPP) or energy
    performance bias (EPB) settings.

    The scope of these settings is per logical processor, which means that
    each of the logical processors in the package can be programmed with a
    different value.

    This change provides a cpufreq sysfs interface for supplying these
    hints. For each policy, two additional attributes are available to
    check and set the hint. These attributes are only present when the
    intel_pstate driver is using HWP mode.

    These attributes are:
    - energy_performance_available_preferences
    - energy_performance_preference

    To get the list of supported hints:
    $ cat energy_performance_available_preferences
    default performance balance_performance balance_power power

    The current preference can be read or changed via the cpufreq sysfs
    attribute "energy_performance_preference". Reading this attribute
    displays the current effective setting, regardless of the method used
    to change it. The user can write any of the valid preference strings
    to this attribute, and can always restore the power-on default by
    writing "default".

    Implementation
    Since these hints can also be provided by a direct MSR write or by tools
    like x86_energy_perf_policy, the driver doesn't maintain any state
    internally. A user operation results in a direct read/write of MSR 0x774
    (HWP_REQUEST_MSR). The driver uses read-modify-write so that the other
    fields in this MSR are left unchanged.
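
    The read-modify-write step can be sketched on plain integers. This is
    an illustration only: it assumes the EPP field occupies bits 31:24 of
    the HWP request value, and get_epp()/set_epp() are hypothetical helper
    names, not functions from the driver. No real MSR is accessed.

```python
EPP_SHIFT = 24                    # assumption: EPP lives in bits 31:24
EPP_MASK = 0xFF << EPP_SHIFT


def get_epp(hwp_request):
    """Extract the 8-bit EPP field from a raw request value."""
    return (hwp_request & EPP_MASK) >> EPP_SHIFT


def set_epp(hwp_request, epp):
    """Return a new request value with only the EPP field replaced."""
    if not 0 <= epp <= 0xFF:
        raise ValueError("EPP must fit in 8 bits")
    return (hwp_request & ~EPP_MASK) | (epp << EPP_SHIFT)
```

    For example, set_epp(0x8000FF08, 0xC0) yields 0xC000FF08: the low 24
    bits are carried over untouched and only the EPP byte changes.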

    Summary of changes:
    - The struct cpudata field epp_saved is renamed to epp_powersave, as it
    stores the original powersave EPP value to restore once the policy is
    switched from performance back to powersave.
    - A new struct cpudata field epp_saved is used to store the raw MSR
    EPP/EPB value when a CPU goes offline or on suspend, and to restore it
    on online/resume. This ensures that the EPP value is restored to the
    correct value irrespective of the means used to set it.
    - EPP/EPB value ranges are fixed for each preference that can be set
    via cpufreq sysfs, so a user request is mapped to/from this range.
    - New attributes are only added when HWP is present.
    - Since an EPP value of 0 is valid, the fields are initialized to
    -EINVAL when not valid. The field epp_default is read only once
    after power-up to avoid reading it on subsequent CPU online operations.
    - New suspend callback to store the EPP value on suspend.
    - Don't invalidate the old epp_saved field on resume and online, as now
    we can restore the last EPP value on suspend and this field can still
    hold the old EPP value sampled during the switch from powersave to
    performance.
    - While here, optimize the setting of cpu_data->epp_powersave = epp in
    intel_pstate_hwp_set(), as this was done in both the true and false
    paths.
    - The epp/epb set function returns an error to the caller on failure, to
    be passed on to user space for display.

    Signed-off-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     
  • To avoid race conditions from multiple threads, increase the scope
    of intel_pstate_limits_lock to include HWP requests also.

    Signed-off-by: Srinivas Pandruvada
    [ rjw: Subject ]
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     

06 Dec, 2016

1 commit


02 Dec, 2016

1 commit

  • Currently the minimal up_threshold is 11, but users may want to
    use a smaller minimal up_threshold for performance tuning,
    so MIN_FREQUENCY_UP_THRESHOLD can be set to 1 because:

    1. Current systems wouldn't be affected, as they already have
    a value >= 11.
    2. New systems with a default kernel would still keep the default
    value, which is >= 11.

    Users now have the advantage that they can make their own decisions
    and customize the 'trip point' to switch to the max frequency.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=65501
    Signed-off-by: Chen Yu
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Chen Yu
     

01 Dec, 2016

3 commits

  • Add Knights Mill (KNM) to the list of CPUIDs supported by intel_pstate.

    Signed-off-by: Piotr Luc
    Reviewed-by: Dave Hansen
    Acked-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki

    Piotr Luc
     
  • Add the compatible string for supporting the generic cpufreq driver on
    the ZTE's zx296718 SoC.

    Signed-off-by: Baoyou Xie
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Baoyou Xie
     
  • Joonyoung Shim reported an interesting problem on his ARM octa-core
    Odroid-XU3 platform. During system suspend, dev_pm_opp_put_regulator()
    was failing for a struct device for which dev_pm_opp_set_regulator() had
    been called earlier.

    This happened because an earlier call to
    dev_pm_opp_of_cpumask_remove_table() function (from cpufreq-dt.c file)
    removed all the entries from opp_table->dev_list apart from the last CPU
    device in the cpumask of CPUs sharing the OPP.

    But both dev_pm_opp_set_regulator() and dev_pm_opp_put_regulator()
    routines get CPU device for the first CPU in the cpumask. And so the OPP
    core failed to find the OPP table for the struct device.

    This patch attempts to fix this problem by returning a pointer to the
    opp_table from dev_pm_opp_set_regulator() and using that as the
    parameter to dev_pm_opp_put_regulator(). This ensures that the
    dev_pm_opp_put_regulator() doesn't fail to find the opp table.

    Note that a similar design problem also exists with the other
    dev_pm_opp_put_*() APIs, but those aren't currently used by anyone, so
    we don't need to update them for now.

    Cc: 4.4+ # 4.4+
    Reported-by: Joonyoung Shim
    Signed-off-by: Stephen Boyd
    Signed-off-by: Viresh Kumar
    [ Viresh: Wrote commit log and tested on exynos 5250 ]
    Signed-off-by: Rafael J. Wysocki

    Stephen Boyd
     

30 Nov, 2016

1 commit

  • Rename CONFIG_SCHED_ITMT for Intel Turbo Boost Max Technology 3.0
    to CONFIG_SCHED_MC_PRIO. This makes the configuration extensible
    in future to other architectures that wish to similarly establish
    CPU core priorities support in the scheduler.

    The description in Kconfig is updated to reflect this change with
    added details for better clarity. The configuration is explicitly
    default-y, to enable the feature on CPUs that have this feature.

    It has no effect on non-TBM3 CPUs.

    Signed-off-by: Tim Chen
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Srinivas Pandruvada
    Cc: Thomas Gleixner
    Cc: bp@suse.de
    Cc: jolsa@redhat.com
    Cc: linux-acpi@vger.kernel.org
    Cc: linux-pm@vger.kernel.org
    Cc: rjw@rjwysocki.net
    Link: http://lkml.kernel.org/r/2b2ee29d93e3f162922d72d0165a1405864fbb23.1480444902.git.tim.c.chen@linux.intel.com
    Signed-off-by: Ingo Molnar

    Tim Chen
     

28 Nov, 2016

4 commits

  • The online / pre_down callback is invoked on the target CPU since commit
    1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu") which means
    for the hotplug callback we can use rdmsrl() instead of rdmsr_on_cpus().

    This leaves us with set_boost() as the only user which still needs to
    read/write the MSR on different CPUs. There is no point in doing that
    update on all cpus with the read modify write magic via per cpu data. We
    simply can issue a function call on all online CPUs which also means that we
    need half that many IPIs.

    Signed-off-by: Sebastian Andrzej Siewior
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Sebastian Andrzej Siewior
     
  • Install the callbacks via the state machine.

    Signed-off-by: Sebastian Andrzej Siewior
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Sebastian Andrzej Siewior
     
  • The addition of the generic governor support marked
    intel_pstate_exit_perf_limits() as inline, which fixed a warning,
    but it introduced another warning:

    drivers/cpufreq/intel_pstate.c: In function ‘intel_pstate_exit_perf_limits’:
    drivers/cpufreq/intel_pstate.c:483:1: error: no return statement in function returning non-void [-Werror=return-type]

    This changes it back to a 'void' return type, and changes the
    corresponding intel_pstate_init_acpi_perf_limits() function to
    be inline as well for consistency.

    Fixes: 001c76f05b01 (cpufreq: intel_pstate: Generic governors support)
    Signed-off-by: Arnd Bergmann
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Arnd Bergmann
     
  • When the user has selected the performance policy, set the EPP (Energy
    Performance Preference) or EPB (Energy Performance Bias) to maximum
    performance mode.

    Also, when the user switches back to powersave, restore EPP/EPB to the
    last EPP/EPB value before entering performance mode. If the user has not
    changed EPP/EPB manually, this will be the power-on default value.

    Signed-off-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     

25 Nov, 2016

1 commit

  • Use the ACPI cppc_lib interface to get CPPC performance limits and update
    the per-CPU priority for the ITMT scheduler. If the highest performance
    differs between CPUs, the ITMT feature is enabled.

    Co-developed-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Tim Chen
    Cc: linux-pm@vger.kernel.org
    Cc: peterz@infradead.org
    Cc: jolsa@redhat.com
    Cc: rjw@rjwysocki.net
    Cc: linux-acpi@vger.kernel.org
    Cc: Srinivas Pandruvada
    Cc: bp@suse.de
    Link: http://lkml.kernel.org/r/0998b98943bcdec7d1ddd4ff27358da555ea8e92.1479844244.git.tim.c.chen@linux.intel.com
    Signed-off-by: Thomas Gleixner

    Rafael J. Wysocki
     

22 Nov, 2016

2 commits

  • Even with round up of limits->min_perf and limits->max_perf, in some
    cases the resultant performance is 100 MHz less than desired.

    For example when the maximum frequency is 3.50 GHz, setting
    scaling_min_frequency to 2.3 GHz always results in 2.2 GHz minimum.

    Currently the fixed-point arithmetic uses 8-bit precision for
    calculating limits->min_perf and limits->max_perf, while some other
    operations in this driver use 14-bit precision. Using 14-bit precision
    for calculating limits->min_perf and limits->max_perf as well addresses
    this issue.

    Introduce fp_ext_toint(), equivalent to fp_toint(), and int_ext_tofp(),
    equivalent to int_tofp(), both with 14-bit precision.
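
    The effect of the precision change can be reproduced with a small
    model. This is a simplified sketch, not the driver's code: it assumes
    P-states in units of 100 MHz (so 3.5 GHz is P-state 35), a 2.3 GHz
    request already rounded up to 66 percent, and truncating fixed-point
    math. The function name min_pstate() is hypothetical.

```python
def min_pstate(max_pstate, min_pct, frac_bits):
    """Scale max_pstate by min_pct percent with truncating fixed-point math."""
    # Percentage as a fixed-point fraction; the division truncates.
    ratio = (min_pct << frac_bits) // 100
    # Fixed-point multiply: both operands carry frac_bits fractional bits.
    scaled = ((max_pstate << frac_bits) * ratio) >> frac_bits
    # Drop the fractional bits to get an integer P-state.
    return scaled >> frac_bits


print(min_pstate(35, 66, 8))     # 22, i.e. 2.2 GHz
print(min_pstate(35, 66, 14))    # 23, i.e. 2.3 GHz
```

    With 8 fractional bits, 66/100 is stored as 168/256 (about 0.656), and
    35 * 0.656 truncates to 22. With 14 bits the stored fraction is close
    enough to 0.66 that the product stays above 23.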

    Signed-off-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     
  • In some use cases, the user wants to enforce a minimum performance limit
    on CPUs. But because of simple division, the resultant performance is
    100 MHz less than desired in some cases.

    For example when the maximum frequency is 3.50 GHz, setting
    scaling_min_frequency to 1.6 GHz always results in 1.5 GHz minimum. With
    a simple round up, the minimum frequency can be set to 1.6 GHz in this
    case. This round up is already done for max_policy_pct and max_perf, so do
    the same for min_policy_pct and min_perf.
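
    The 1.6 GHz example can be worked through as follows. This sketch uses
    plain integer arithmetic rather than the driver's fixed-point helpers,
    assumes P-states in units of 100 MHz, and the function name is
    hypothetical.

```python
def min_pstate_from_freq(min_khz, max_khz, max_pstate, round_up):
    """Return (percentage, resulting minimum P-state) for a requested minimum."""
    if round_up:
        pct = -(-min_khz * 100 // max_khz)    # ceiling division
    else:
        pct = min_khz * 100 // max_khz        # truncating division
    return pct, max_pstate * pct // 100


print(min_pstate_from_freq(1600000, 3500000, 35, round_up=False))  # (45, 15)
print(min_pstate_from_freq(1600000, 3500000, 35, round_up=True))   # (46, 16)
```

    1.6 / 3.5 is about 45.7 percent. Truncating to 45 gives 35 * 45 / 100 =
    15.75, which becomes P-state 15 (1.5 GHz); rounding up to 46 gives 16.1,
    which becomes P-state 16 (1.6 GHz).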

    Signed-off-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     

21 Nov, 2016

3 commits

  • The return value of cpufreq_update_policy() is never used, so make
    it void.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • There are two places in the cpufreq core in which low-level driver
    callbacks may be invoked for an inactive cpufreq policy, which isn't
    guaranteed to work in general. Both are due to possible races with
    CPU offline.

    First, in cpufreq_get(), the policy may become inactive after
    the check against policy->cpus in cpufreq_cpu_get() and before
    policy->rwsem is acquired, in which case using it going forward may
    not be correct.

    Second, an analogous situation is possible in cpufreq_update_policy().

    Avoid using inactive policies by adding policy_is_inactive() checks
    to the code in the above places.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • There may be reasons to use generic cpufreq governors (e.g. schedutil)
    on Intel platforms instead of the intel_pstate driver's internal
    governor. However, that currently can only be done by disabling
    intel_pstate altogether and using the acpi-cpufreq driver instead
    of it, which is subject to limitations.

    First of all, acpi-cpufreq only works on systems where the _PSS
    object is present in the ACPI tables for all logical CPUs. Second,
    on those systems acpi-cpufreq will only use frequencies listed by
    _PSS which may be suboptimal. In particular, by convention, the
    whole turbo range is represented in _PSS as a single P-state and
    the frequency assigned to it is greater by 1 MHz than the greatest
    non-turbo frequency listed by _PSS. That may lead governors to
    use turbo frequencies less often, which may result in suboptimal
    performance.

    For this reason, make it possible to use the intel_pstate driver
    with generic cpufreq governors as a "normal" cpufreq driver. That
    mode is enforced by adding intel_pstate=passive to the kernel
    command line and cannot be disabled at run time. In that mode,
    intel_pstate provides a cpufreq driver interface including
    the ->target() and ->fast_switch() callbacks and is listed in
    scaling_driver as "intel_cpufreq".

    Signed-off-by: Rafael J. Wysocki
    Tested-by: Doug Smythies

    Rafael J. Wysocki
     

18 Nov, 2016

1 commit

  • Currently, intel_pstate is unable to control P-states on my
    IvyBridge-based Acer Aspire S5, because they are controlled by SMM
    on that machine by default and it is necessary to request OS control
    of P-states from it via the SMI Command register exposed in the ACPI
    FADT. intel_pstate doesn't do that now, but acpi-cpufreq and other
    cpufreq drivers for x86 platforms do.

    Address this problem by making intel_pstate use the ACPI-defined
    mechanism as well. However, intel_pstate is not modular and it
    doesn't need the module refcount tricks played by
    acpi_processor_notify_smm(), so export the core of this function
    to it as acpi_processor_pstate_control() and make it call that.
    [The changes in processor_perflib.c related to this should not
    make any functional difference for the acpi_processor_notify_smm()
    users].

    To be safe, only call acpi_processor_pstate_control() from intel_pstate
    if ACPI _PPC support is enabled in it.

    Suggested-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Srinivas Pandruvada

    Rafael J. Wysocki
     

17 Nov, 2016

4 commits

  • Add the compatible strings for supporting the generic cpufreq driver on
    the Renesas RZ/G1M (r8a7743) and RZ/G1E (r8a7745) SoCs.

    Signed-off-by: Geert Uytterhoeven
    Acked-by: Viresh Kumar
    Acked-by: Simon Horman
    Signed-off-by: Rafael J. Wysocki

    Geert Uytterhoeven
     
  • With preemption turned on, we can read an incorrect throttling state
    while being switched to a CPU on a different chip.

    BUG: using smp_processor_id() in preemptible [00000000] code: cat/7343
    caller is .powernv_cpufreq_throttle_check+0x2c/0x710
    CPU: 13 PID: 7343 Comm: cat Not tainted 4.8.0-rc5-dirty #1
    Call Trace:
    [c0000007d25b75b0] [c000000000971378] .dump_stack+0xe4/0x150 (unreliable)
    [c0000007d25b7640] [c0000000005162e4] .check_preemption_disabled+0x134/0x150
    [c0000007d25b76e0] [c0000000007b63ac] .powernv_cpufreq_throttle_check+0x2c/0x710
    [c0000007d25b7790] [c0000000007b6d18] .powernv_cpufreq_target_index+0x288/0x360
    [c0000007d25b7870] [c0000000007acee4] .__cpufreq_driver_target+0x394/0x8c0
    [c0000007d25b7920] [c0000000007b22ac] .cpufreq_set+0x7c/0xd0
    [c0000007d25b79b0] [c0000000007adf50] .store_scaling_setspeed+0x80/0xc0
    [c0000007d25b7a40] [c0000000007ae270] .store+0xa0/0x100
    [c0000007d25b7ae0] [c0000000003566e8] .sysfs_kf_write+0x88/0xb0
    [c0000007d25b7b70] [c0000000003553b8] .kernfs_fop_write+0x178/0x260
    [c0000007d25b7c10] [c0000000002ac3cc] .__vfs_write+0x3c/0x1c0
    [c0000007d25b7cf0] [c0000000002ad584] .vfs_write+0xc4/0x230
    [c0000007d25b7d90] [c0000000002aeef8] .SyS_write+0x58/0x100
    [c0000007d25b7e30] [c00000000000bfec] system_call+0x38/0xfc

    Fixes: 09a972d16209 (cpufreq: powernv: Report cpu frequency throttling)
    Reviewed-by: Gautham R. Shenoy
    Signed-off-by: Denis Kirjanov
    Signed-off-by: Rafael J. Wysocki

    Denis Kirjanov
     
  • The original comment about the frequency increase to maximum is wrong.

    Both increases and decreases happen in steps.

    Signed-off-by: Stratos Karafotis
    Signed-off-by: Rafael J. Wysocki

    Stratos Karafotis
     
  • Conservative governor changes the CPU frequency in steps.
    That means that if a CPU runs at max frequency, it will need several
    sampling periods to return to min frequency when the workload
    is finished.

    If the update function that calculates the load and target frequency
    is deferred, the governor might need even more time to decrease the
    frequency.

    This may have an impact on power consumption; after all, conservative
    should decrease the frequency at every sampling period in which there
    is no workload.

    To resolve the above issue, calculate the number of sampling periods
    for which the update was deferred. Since conservative should have
    dropped the frequency by one freq_step for each of those periods
    because the CPU was idle, subtract the corresponding amount from the
    requested frequency.
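
    The adjustment can be sketched as follows. This is a simplified model
    with hypothetical names, not the governor's code: it assumes the number
    of deferred periods is the elapsed time divided by the sampling rate,
    and that the result is clamped at the policy minimum.

```python
def deferred_periods(elapsed_us, sampling_rate_us):
    """Whole sampling periods that passed while the update was deferred."""
    return elapsed_us // sampling_rate_us


def adjust_requested_freq(requested_khz, min_khz, freq_step_khz, idle_periods):
    """Drop one freq_step per idle period, never going below the minimum."""
    drop = idle_periods * freq_step_khz
    if requested_khz > min_khz + drop:
        return requested_khz - drop
    return min_khz
```

    For instance, with a 200000 kHz step, a request of 3400000 kHz and
    three idle periods, the adjusted request is 2800000 kHz; after twenty
    idle periods it is simply the minimum.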

    Below is the kernel trace without and with this patch. First, an
    intensive workload is applied on a specific CPU. Then the workload
    is removed and the CPU goes idle.

    WITHOUT

    -0 [007] dN.. 620.329153: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 620.350857: cpu_frequency: state=1700000 cpu_id=7
    kworker/7:2-556 [007] .... 620.370856: cpu_frequency: state=1900000 cpu_id=7
    kworker/7:2-556 [007] .... 620.390854: cpu_frequency: state=2100000 cpu_id=7
    kworker/7:2-556 [007] .... 620.411853: cpu_frequency: state=2200000 cpu_id=7
    kworker/7:2-556 [007] .... 620.432854: cpu_frequency: state=2400000 cpu_id=7
    kworker/7:2-556 [007] .... 620.453854: cpu_frequency: state=2600000 cpu_id=7
    kworker/7:2-556 [007] .... 620.494856: cpu_frequency: state=2900000 cpu_id=7
    kworker/7:2-556 [007] .... 620.515856: cpu_frequency: state=3100000 cpu_id=7
    kworker/7:2-556 [007] .... 620.536858: cpu_frequency: state=3300000 cpu_id=7
    kworker/7:2-556 [007] .... 620.557857: cpu_frequency: state=3401000 cpu_id=7
    -0 [007] d... 669.591363: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 669.591939: cpu_idle: state=4294967295 cpu_id=7
    -0 [007] d... 669.591980: cpu_idle: state=4 cpu_id=7
    -0 [007] dN.. 669.591989: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 670.201224: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 670.221975: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 670.222016: cpu_frequency: state=3300000 cpu_id=7
    -0 [007] d... 670.222026: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 670.234964: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 670.801251: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 671.236046: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 671.236073: cpu_frequency: state=3100000 cpu_id=7
    -0 [007] d... 671.236112: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 671.393437: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 671.401277: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 671.404083: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 671.404111: cpu_frequency: state=2900000 cpu_id=7
    -0 [007] d... 671.404125: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 671.404974: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 671.501180: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 671.995414: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 671.995459: cpu_frequency: state=2800000 cpu_id=7
    -0 [007] d... 671.995469: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 671.996287: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 672.001305: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.078374: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 672.078410: cpu_frequency: state=2600000 cpu_id=7
    -0 [007] d... 672.078419: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.158020: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 672.158040: cpu_frequency: state=2400000 cpu_id=7
    -0 [007] d... 672.158044: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.160038: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 672.234557: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.237121: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 672.237174: cpu_frequency: state=2100000 cpu_id=7
    -0 [007] d... 672.237186: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.237778: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 672.267902: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.269860: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 672.269906: cpu_frequency: state=1900000 cpu_id=7
    -0 [007] d... 672.269914: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.271902: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 672.751342: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.823056: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 672.823095: cpu_frequency: state=1600000 cpu_id=7

    WITH

    -0 [007] dN.. 4380.928009: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-399 [007] .... 4380.949767: cpu_frequency: state=2000000 cpu_id=7
    kworker/7:2-399 [007] .... 4380.969765: cpu_frequency: state=2200000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.009766: cpu_frequency: state=2500000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.029767: cpu_frequency: state=2600000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.049769: cpu_frequency: state=2800000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.069769: cpu_frequency: state=3000000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.089771: cpu_frequency: state=3100000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.109772: cpu_frequency: state=3400000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.129773: cpu_frequency: state=3401000 cpu_id=7
    -0 [007] d... 4428.226159: cpu_idle: state=1 cpu_id=7
    -0 [007] d... 4428.226176: cpu_idle: state=4294967295 cpu_id=7
    -0 [007] d... 4428.226181: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 4428.227177: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 4428.551640: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 4428.649239: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-399 [007] .... 4428.649268: cpu_frequency: state=2800000 cpu_id=7
    -0 [007] d... 4428.649278: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 4428.689856: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 4428.799542: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 4428.801683: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-399 [007] .... 4428.801748: cpu_frequency: state=1700000 cpu_id=7
    -0 [007] d... 4428.801761: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 4428.806545: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 4429.051880: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 4429.086240: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-399 [007] .... 4429.086293: cpu_frequency: state=1600000 cpu_id=7

    Without the patch the CPU dropped to min frequency after 3.2s
    With the patch applied the CPU dropped to min frequency after 0.86s
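
The second figure can be rechecked from the trace timestamps. The sketch below is one way to do that, not the author's method. It assumes that measurement starts at the first idle entry (a cpu_idle event whose state is not 4294967295, the value logged on idle exit) and that 1600000 kHz is the minimum frequency, since it is the last frequency in both traces. The function name and sample list are illustrative.

```python
import re

IDLE_EXIT = 4294967295  # (u32)-1, logged when a CPU leaves an idle state

EVENT_RE = re.compile(
    r"(?P<ts>\d+\.\d+): (?P<event>cpu_idle|cpu_frequency): "
    r"state=(?P<state>\d+) cpu_id=(?P<cpu>\d+)"
)


def time_to_min_freq(trace_lines, min_khz):
    """Seconds from the first idle entry to the first min-frequency event.

    Returns None if either event is missing from the trace.
    """
    start = None
    for line in trace_lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue  # skip "..." and any non-event lines
        ts = float(m["ts"])
        state = int(m["state"])
        if m["event"] == "cpu_idle":
            # Assumption: the clock starts at the first idle *entry*.
            if start is None and state != IDLE_EXIT:
                start = ts
        elif start is not None and state == min_khz:
            return ts - start
    return None


# A few lines copied from the WITH trace above.
sample = [
    "-0 [007] dN.. 4380.928009: cpu_idle: state=4294967295 cpu_id=7",
    "kworker/7:2-399 [007] .... 4381.129773: cpu_frequency: state=3401000 cpu_id=7",
    "-0 [007] d... 4428.226159: cpu_idle: state=1 cpu_id=7",
    "kworker/7:2-399 [007] .... 4428.649268: cpu_frequency: state=2800000 cpu_id=7",
    "kworker/7:2-399 [007] .... 4429.086293: cpu_frequency: state=1600000 cpu_id=7",
]

elapsed = time_to_min_freq(sample, 1600000)
print(f"{elapsed:.2f}s")
```

With these sample lines the difference is 4429.086293 - 4428.226159, which rounds to the 0.86s quoted above.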

    Signed-off-by: Stratos Karafotis
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Stratos Karafotis
     

15 Nov, 2016

3 commits

  • What's returned from this function is the delta by which the frequency
    must be increased or decreased and not the final frequency that should
    be selected.

    Name it properly to match its purpose. Also update the variables used to
    store that value.
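
The distinction between a delta and a final frequency can be sketched as follows. This is a minimal illustration, not the governor's code: the function names, the percent-of-maximum step and the clamping are all assumptions made for the example.

```python
def freq_step_khz(step_percent, max_khz):
    """Return the step (a delta in kHz), not a final frequency."""
    return step_percent * max_khz // 100


def next_freq_khz(current_khz, step_khz, increase, min_khz, max_khz):
    """Apply the delta in the requested direction and clamp to the limits."""
    target = current_khz + step_khz if increase else current_khz - step_khz
    return max(min_khz, min(max_khz, target))


step = freq_step_khz(5, 3400000)                               # delta only
up = next_freq_khz(1600000, step, True, 1600000, 3400000)      # one step up
down = next_freq_khz(1600000, step, False, 1600000, 3400000)   # clamped at min
```

Keeping the step and the resulting frequency in separately named values is what makes a name such as "target" misleading for the former.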

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • lpstate_idx remains uninitialized in the case when elapsed_time
    is greater than MAX_RAMP_DOWN_TIME. At the end of rampdown the
    global pstate should be equal to the local pstate.
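
The invariant described here can be sketched as follows. This is a hypothetical model, not the driver's code: the value of MAX_RAMP_DOWN_TIME_MS, the linear interpolation and the function name are assumptions, and larger numbers simply mean higher performance levels. The point is only that both values are assigned on every path and that they coincide once the ramp-down window has elapsed.

```python
MAX_RAMP_DOWN_TIME_MS = 5120  # assumed value, for illustration only


def ramp_down_step(elapsed_ms, local_level, highest_level):
    """Return (global_level, local_level) for one timer tick."""
    if elapsed_ms > MAX_RAMP_DOWN_TIME_MS:
        # Ramp-down is over: the global level equals the local level.
        global_level = local_level
    else:
        # Assumed linear ramp from highest_level down toward local_level.
        remaining = MAX_RAMP_DOWN_TIME_MS - elapsed_ms
        span = highest_level - local_level
        global_level = local_level + span * remaining // MAX_RAMP_DOWN_TIME_MS
    return global_level, local_level
```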

    Fixes: 20b15b766354 (cpufreq: powernv: Use PMCR to verify global and localpstate)
    Reported-by: Stephen Rothwell
    Signed-off-by: Akshay Adiga
    Signed-off-by: Rafael J. Wysocki

    Akshay Adiga
     
  • Use get_target_pstate_use_cpu_load() to calculate the target P-State for
    devices whose preferred power management profile in the ACPI FADT is
    set to PM_MOBILE.

    This may help resolve some thermal issues caused by low, sustained
    CPU-bound workloads. The current algorithm tends to over-provision in
    this case because it doesn't look at CPU busyness.

    Also include the fix from Arnd Bergmann for a compile issue when
    CONFIG_ACPI is not defined.

    Signed-off-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     

11 Nov, 2016

6 commits

  • For device-tree based pxa25x and pxa27x platforms, the cpufreq-dt driver
    does the job as well as pxa2xx-cpufreq, so add these platforms to the
    compatibility list.

    This won't work for legacy non device-tree platforms where
    pxa2xx-cpufreq is still required.

    Signed-off-by: Robert Jarzmik
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Robert Jarzmik
     
  • Allow CPUfreq statistics to be cleared by writing anything to
    /sys/.../cpufreq/stats/reset.
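
A minimal sketch of using the new file from a script. The stats directory is passed in by the caller because the full sysfs path is elided above. The function name is illustrative, and writing "1" is an arbitrary choice since any content triggers the reset.

```python
from pathlib import Path


def reset_cpufreq_stats(stats_dir):
    """Clear cpufreq statistics by writing to <stats_dir>/reset.

    stats_dir is the policy's cpufreq "stats" directory in sysfs,
    supplied by the caller.
    """
    reset_file = Path(stats_dir) / "reset"
    reset_file.write_text("1\n")  # the content is ignored; any write resets
```

On a real system the write is likely to need root privileges.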

    Signed-off-by: Markus Mayer
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Markus Mayer
     
  • The earlier implementation of governors used background timers, so
    functions, mutexes, etc. had the 'timer' keyword in their names.

    That is no longer the case. Replace 'timer' with 'update', as those
    functions and variables are based around updates to frequency.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • As fast_switch() may be called with interrupts disabled, we cannot
    hold a mutex to update the global_pstate_info. So currently, fast_switch()
    does not update the global_pstate_info, which ends up with stale data
    whenever the pstate is updated through fast_switch().

    As the gpstate_timer can fire after fast_switch() has updated the pstates,
    the timer handler cannot rely on the cached values of the local and global
    pstates and needs to read them from the PMCR.

    Only gpstate_timer_handler() is affected by the stale cached pstate data,
    because either the fast_switch() or the target_index() routine will be
    called for a given governor, but gpstate_timer can fire after the governor
    has changed to schedutil.

    Signed-off-by: Akshay Adiga
    Reviewed-by: Gautham R. Shenoy
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Akshay Adiga
     
  • Add fast_switch, which performs a lightweight operation to set the desired
    pstate. Both global and local pstates are set to the same desired pstate.

    Signed-off-by: Akshay Adiga
    Reviewed-by: Gautham R. Shenoy
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Akshay Adiga
     
  • Fixes the following sparse warning:

    drivers/cpufreq/brcmstb-avs-cpufreq.c:982:18: warning:
    symbol 'brcm_avs_cpufreq_attr' was not declared. Should it be static?

    Signed-off-by: Wei Yongjun
    Acked-by: Markus Mayer
    Signed-off-by: Rafael J. Wysocki

    Wei Yongjun
     

01 Nov, 2016

1 commit