12 Feb, 2019

1 commit

  • These macros can be reused by governors which don't use the common
    governor code present in cpufreq_governor.c and should be moved to the
    relevant header.

    Now that they are being moved to the right header file, reuse them in
    the schedutil governor as well (which required renaming the show/store
    routines).

    Also create a gov_attr_wo() macro for write-only sysfs files. It will be
    used by the Interactive governor in a later patch.

    Signed-off-by: Viresh Kumar

    Viresh Kumar
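
The macro family described above can be pictured with a small userspace sketch. The struct layout, the permission bits, and the example tunables (rate_limit_us, boost) are assumptions for illustration, not the kernel definitions; only the show_/store_ naming convention comes from the commit message.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's governor attribute descriptor. */
struct governor_attr {
    const char *name;
    unsigned int mode;                          /* permission bits (assumed) */
    int (*show)(char *buf, size_t len);         /* read handler, or NULL */
    int (*store)(const char *buf, size_t len);  /* write handler, or NULL */
};

/* Each macro pastes the attribute name onto a show_/store_ prefix. */
#define gov_attr_ro(_name) \
    static struct governor_attr _name = { #_name, 0444, show_##_name, NULL }
#define gov_attr_rw(_name) \
    static struct governor_attr _name = { #_name, 0644, show_##_name, store_##_name }
#define gov_attr_wo(_name) \
    static struct governor_attr _name = { #_name, 0200, NULL, store_##_name }

/* Hypothetical read-write tunable. */
static unsigned int rate_limit_value = 500;

static int show_rate_limit_us(char *buf, size_t len)
{
    return snprintf(buf, len, "%u\n", rate_limit_value);
}

static int store_rate_limit_us(const char *buf, size_t len)
{
    unsigned int v;

    if (sscanf(buf, "%u", &v) != 1)
        return -1;
    rate_limit_value = v;
    return (int)len;
}
gov_attr_rw(rate_limit_us);

/* Hypothetical write-only tunable: it has a store handler but no show handler. */
static int boost_requested;

static int store_boost(const char *buf, size_t len)
{
    (void)buf;
    boost_requested = 1;
    return (int)len;
}
gov_attr_wo(boost);
```

Because the macros expect handlers named show_<attr> and store_<attr>, a governor whose routines follow another naming scheme has to rename them first, which is presumably why schedutil's routines were renamed.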
     

10 Jan, 2018

1 commit

  • commit 7d5905dc14a87805a59f3c5bf70173aac2bb18f8 upstream.

    After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get()
    for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
    on x86 can be either the nominal CPU frequency (which is constant)
    or the frequency most recently requested by a scaling governor in
    cpufreq, depending on the cpufreq configuration. That is somewhat
    inconsistent and is different from what it was before 4.13, so in
    order to restore the previous behavior, make it report the current
    CPU frequency like the scaling_cur_freq sysfs file in cpufreq.

    To that end, modify the /proc/cpuinfo implementation on x86 to use
    aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
    registers, if available, and use their values to compute the CPU
    frequency to be reported as "cpu MHz".

    However, do that carefully enough to avoid accumulating delays that
    lead to unacceptable access times for /proc/cpuinfo on systems with
    many CPUs. Run aperfmperf_snapshot_khz() once on all CPUs
    asynchronously at the /proc/cpuinfo open time, add a single delay
    upfront (if necessary) at that point and simply compute the current
    frequency while running show_cpuinfo() for each individual CPU.

    Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce
    the default delay between consecutive APERF and MPERF reads to 10 ms,
    which should be sufficient to get large enough numbers for the
    frequency computation in all cases.

    Fixes: 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"")
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Thomas Gleixner
    Tested-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Rafael J. Wysocki
     

04 Sep, 2017

1 commit

  • * pm-cpufreq-sched:
    cpufreq: schedutil: Always process remote callback with slow switching
    cpufreq: schedutil: Don't restrict kthread to related_cpus unnecessarily
    cpufreq: Return 0 from ->fast_switch() on errors
    cpufreq: Simplify cpufreq_can_do_remote_dvfs()
    cpufreq: Process remote callbacks from any CPU if the platform permits
    sched: cpufreq: Allow remote cpufreq callbacks
    cpufreq: schedutil: Use unsigned int for iowait boost
    cpufreq: schedutil: Make iowait boost more energy efficient

    Rafael J. Wysocki
     

08 Aug, 2017

1 commit


01 Aug, 2017

2 commits

  • On many platforms, CPUs can do DVFS across cpufreq policies, i.e. a CPU
    from policy-A can change the frequency of CPUs belonging to policy-B.

    This is quite common on ARM platforms, where we don't
    configure any per-cpu register.

    Add a flag to identify such platforms and update
    cpufreq_can_do_remote_dvfs() to allow remote callbacks if this flag is
    set.

    Also enable the flag for cpufreq-dt driver which is used only on ARM
    platforms currently.

    Signed-off-by: Viresh Kumar
    Acked-by: Saravana Kannan
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
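
A minimal sketch of the check this commit describes, assuming a simplified policy that stores its CPUs as a bitmask. The struct, the flag name, and the function name are illustrative stand-ins, not the kernel's own definitions.

```c
#include <stdbool.h>

/* Hypothetical, simplified policy: member CPUs as a bitmask plus the new flag. */
struct policy_sketch {
    unsigned long cpus;              /* bit n set => CPU n belongs to the policy */
    bool dvfs_possible_from_any_cpu; /* assumed name for the platform flag */
};

/*
 * A callback running on this_cpu may change the policy's frequency if the
 * platform allows DVFS from any CPU, or if this_cpu belongs to the policy.
 * this_cpu is assumed to be smaller than the bit width of unsigned long.
 */
static bool can_do_remote_dvfs(const struct policy_sketch *policy,
                               unsigned int this_cpu)
{
    if (policy->dvfs_possible_from_any_cpu)
        return true;
    return ((policy->cpus >> this_cpu) & 1UL) != 0;
}
```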
     
  • With Android UI and benchmarks the latency of cpufreq response to
    certain scheduling events can become very critical. Currently, callbacks
    into cpufreq governors are only made from the scheduler if the target
    CPU of the event is the same as the current CPU. This means there are
    certain situations where a target CPU may not run the cpufreq governor
    for some time.

    One testcase to show this behavior is where a task starts running on
    CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the
    system is configured such that the new tasks should receive maximum
    demand initially, this should result in CPU0 increasing frequency
    immediately. Because of the above-mentioned limitation, however, this
    does not occur.

    This patch updates the scheduler core to call the cpufreq callbacks for
    remote CPUs as well.

    The schedutil, ondemand and conservative governors are updated to
    process cpufreq utilization update hooks called for remote CPUs where
    the remote CPU is managed by the cpufreq policy of the local CPU.

    The intel_pstate driver is updated to always reject remote callbacks.

    This was tested with a couple of use cases (Android: hackbench, recentfling,
    galleryfling, vellamo; Ubuntu: hackbench) on an ARM hikey board (64-bit
    octa-core, single policy). Only galleryfling showed minor improvements,
    while the others didn't show much deviation.

    The reason is that this patch only targets a corner case, where all of the
    following must be true to improve performance, and that
    doesn't happen too often with these tests:

    - Task is migrated to another CPU.
    - The task has high demand, and should take the target CPU to higher
    OPPs.
    - And the target CPU doesn't call into the cpufreq governor until the
    next tick.

    Based on initial work from Steve Muckle.

    Signed-off-by: Viresh Kumar
    Acked-by: Saravana Kannan
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

26 Jul, 2017

2 commits

  • The policy->transition_latency field is used for multiple purposes
    today and it's not straightforward at all. This is how it is used:

    A. Set the correct transition_latency value.

    B. Set it to CPUFREQ_ETERNAL because:
    1. We don't want automatic dynamic switching (with
    ondemand/conservative) to happen at all.
    2. We don't know the transition latency.

    This patch handles the B.1. case in a more readable way. A new flag for
    the cpufreq drivers is added to disallow use of cpufreq governors which
    have the dynamic_switching flag set.

    All the current cpufreq drivers which are setting transition_latency
    unconditionally to CPUFREQ_ETERNAL are updated to use it. They don't
    need to set transition_latency anymore.

    There shouldn't be any functional change after this patch.

    Signed-off-by: Viresh Kumar
    Reviewed-by: Dominik Brodowski
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • There is no limitation in the ondemand or conservative governors that
    disallows a transition_latency greater than 10 ms.

    The max_transition_latency field is instead used to disallow automatic
    dynamic frequency switching on platforms that don't want these
    governors to run.

    Replace max_transition_latency with a boolean (dynamic_switching) and
    check for transition_latency == CPUFREQ_ETERNAL along with that. This
    makes the code straightforward to read and understand.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
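
The check described in this commit can be sketched as follows. The struct, the function name, and the numeric stand-in for CPUFREQ_ETERNAL are assumptions made for illustration. Only the rule itself comes from the commit message: a governor with dynamic_switching set is refused when the latency is CPUFREQ_ETERNAL.

```c
#include <stdbool.h>

/* Stand-in for CPUFREQ_ETERNAL; the actual kernel value is not shown here. */
#define ETERNAL_SKETCH 0xFFFFFFFFu

/* Hypothetical governor descriptor carrying only the new boolean. */
struct gov_switching_sketch {
    bool dynamic_switching; /* true for governors that switch automatically */
};

/* A dynamically switching governor is refused when the latency is "eternal". */
static bool governor_allowed(const struct gov_switching_sketch *gov,
                             unsigned int transition_latency)
{
    return !(gov->dynamic_switching && transition_latency == ETERNAL_SKETCH);
}
```

A governor that never switches on its own, such as performance, would leave the boolean unset and so pass the check regardless of the latency value.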
     

22 Jul, 2017

2 commits

  • The policy->transition_delay_us field is used only by the schedutil
    governor currently, and this field describes how fast the driver wants
    the cpufreq governor to change the CPU's frequency. It should rather be
    common across all governors, as it doesn't have any schedutil
    dependency.

    Create a new helper cpufreq_policy_transition_delay_us() to get the
    transition delay across all governors.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
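
One plausible shape for such a helper is sketched below. The commit message only says the helper returns the transition delay for any governor. The fallback (deriving a delay from the transition latency with a multiplier of 1000) is an assumption, as are all names in the block.

```c
/* Assumed constants; the kernel's own values are not stated in this log. */
#define LATENCY_MULTIPLIER_SKETCH 1000ULL
#define NSEC_PER_USEC_SKETCH      1000u

/* Hypothetical subset of the policy fields the helper would consult. */
struct policy_delay_sketch {
    unsigned int transition_delay_us;   /* set by the driver, 0 if unset */
    unsigned int transition_latency_ns; /* hardware switching latency */
};

/*
 * Prefer the driver's explicit delay. Otherwise derive one from the
 * transition latency, and fall back to the bare multiplier if the
 * latency rounds down to zero microseconds.
 */
static unsigned long long
policy_transition_delay_us(const struct policy_delay_sketch *p)
{
    unsigned int latency_us;

    if (p->transition_delay_us)
        return p->transition_delay_us;

    latency_us = p->transition_latency_ns / NSEC_PER_USEC_SKETCH;
    if (latency_us)
        return latency_us * LATENCY_MULTIPLIER_SKETCH; /* 64-bit product */

    return LATENCY_MULTIPLIER_SKETCH;
}
```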
     
  • The cpufreq core and governors aren't supposed to set a limit on how
    fast we want to try changing the frequency. This is currently done for
    the legacy governors with help of min_sampling_rate.

    At worst, we may end up setting the sampling rate to a value lower than
    the rate at which frequency can be changed and then one of the CPUs in
    the policy will do nothing but change frequency forever.

    But that is something for the user to decide and there is no need to
    have special handling for such cases in the core. Leave it for the user
    to figure out.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

15 Jul, 2017

1 commit

  • Pull thermal management updates from Zhang Rui:

    - Improve thermal cpu_cooling interaction with cpufreq core.

    The cpu_cooling driver is designed to use CPU frequency scaling to
    avoid high thermal states for a platform. But it wasn't glued really
    well with cpufreq core.

    For example, clipped-cpus is copied from the policy structure, and it's
    much better to use the policy->cpus (or related_cpus) fields directly
    as they may have been updated. Not that things were broken before this
    series, but they can be optimized a bit more.

    This series tries to improve interactions between cpufreq core and
    cpu_cooling driver and does some fixes/cleanups to the cpu_cooling
    driver. (Viresh Kumar)

    - A couple of fixes and cleanups in thermal core and imx, hisilicon,
    bcm_2835, int340x thermal drivers. (Arvind Yadav, Dan Carpenter,
    Sumeet Pawnikar, Srinivas Pandruvada, Willy WOLFF)

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (24 commits)
    thermal: bcm2835: fix an error code in probe()
    thermal: hisilicon: Handle return value of clk_prepare_enable
    thermal: imx: Handle return value of clk_prepare_enable
    thermal: int340x: check for sensor when PTYP is missing
    Thermal/int340x: Fix few typos and kernel-doc style
    thermal: fix source code documentation for parameters
    thermal: cpu_cooling: Replace kmalloc with kmalloc_array
    thermal: cpu_cooling: Rearrange struct cpufreq_cooling_device
    thermal: cpu_cooling: 'freq' can't be zero in cpufreq_state2power()
    thermal: cpu_cooling: don't store cpu_dev in cpufreq_cdev
    thermal: cpu_cooling: get_level() can't fail
    thermal: cpu_cooling: create structure for idle time stats
    thermal: cpu_cooling: merge frequency and power tables
    thermal: cpu_cooling: get rid of 'allowed_cpus'
    thermal: cpu_cooling: OPPs are registered for all CPUs
    thermal: cpu_cooling: store cpufreq policy
    cpufreq: create cpufreq_table_count_valid_entries()
    thermal: cpu_cooling: use cpufreq_policy to register cooling device
    thermal: cpu_cooling: get rid of a variable in cpufreq_set_cur_state()
    thermal: cpu_cooling: remove cpufreq_cooling_get_level()
    ...

    Linus Torvalds
     

27 Jun, 2017

1 commit

  • The goal of this change is to give users a uniform and meaningful
    result when they read /sys/...cpufreq/scaling_cur_freq
    on modern x86 hardware, as compared to what they get today.

    Modern x86 processors include the hardware needed
    to accurately calculate frequency over an interval --
    APERF, MPERF, and the TSC.

    Here we provide an x86 routine to make this calculation
    on supported hardware, and use it in preference to any
    driver-specific cpufreq_driver.get() routine.

    MHz is computed like so:

    MHz = base_MHz * delta_APERF / delta_MPERF

    MHz is the average frequency of the busy processor
    over a measurement interval. The interval is
    defined to be the time between successive invocations
    of aperfmperf_khz_on_cpu(), which are expected to
    happen on-demand when users read the sysfs attribute
    cpufreq/scaling_cur_freq.

    As with previous methods of calculating MHz,
    idle time is excluded.

    base_MHz above is from TSC calibration global "cpu_khz".

    This x86 native method to calculate MHz returns a meaningful result
    whether P-states are controlled by hardware or firmware,
    and whether or not the Linux cpufreq sub-system is installed.

    When this routine is invoked more frequently, the measurement
    interval becomes shorter. However, the code limits re-computation
    to 10ms intervals so that average frequency remains meaningful.

    Discerning users are encouraged to take advantage of
    the turbostat(8) utility, which can gracefully handle
    concurrent measurement intervals of arbitrary length.

    Signed-off-by: Len Brown
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki

    Len Brown
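
The formula above can be written as a short integer routine. This is a sketch with illustrative names, working in kHz because the commit takes the base frequency from "cpu_khz". The guard against a zero MPERF delta is an added assumption.

```c
#include <stdint.h>

/*
 * Average frequency of the busy CPU over the interval in which APERF
 * advanced by delta_aperf and MPERF advanced by delta_mperf:
 *
 *     khz = base_khz * delta_aperf / delta_mperf
 *
 * Returns 0 if MPERF did not advance, since no ratio can be formed.
 */
static uint64_t aperfmperf_khz_sketch(uint64_t base_khz,
                                      uint64_t delta_aperf,
                                      uint64_t delta_mperf)
{
    if (delta_mperf == 0)
        return 0;
    return base_khz * delta_aperf / delta_mperf;
}
```

For example, with a 2,000,000 kHz base and APERF advancing 1.5 times as fast as MPERF over the interval, the result is 3,000,000 kHz. That ratio is what a core running above its base frequency would show.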
     

28 May, 2017

1 commit


18 Apr, 2017

1 commit

  • Make the schedutil governor take the initial (default) value of the
    rate_limit_us sysfs attribute from the (new) transition_delay_us
    policy parameter (to be set by the scaling driver).

    That will allow scaling drivers to make schedutil use smaller default
    values of rate_limit_us and reduce the default average time interval
    between consecutive frequency changes.

    Make intel_pstate set transition_delay_us to 500.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     

04 Feb, 2017

3 commits


21 Nov, 2016

1 commit


11 Nov, 2016

1 commit


20 Oct, 2016

1 commit

  • 'best' is always less than or equal to 'pos', so 'best - pos' yields
    a negative value, which is then cast to 'unsigned int'
    and passed to __cpufreq_driver_target()->acpi_cpufreq_target()
    for policy->freq_table selection. This results in
    BUG: unable to handle kernel paging request at ffff881019b469f8
    IP: [] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
    PGD 267f067
    PUD 0

    Oops: 0000 [#1] PREEMPT SMP
    CPU: 6 PID: 70 Comm: kworker/6:1 Not tainted 4.9.0-rc1-next-20161017-dbg-dirty
    Workqueue: events dbs_work_handler
    task: ffff88041b808000 task.stack: ffff88041b810000
    RIP: 0010:[] [] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
    RSP: 0018:ffff88041b813c60 EFLAGS: 00010282
    RAX: ffff880419b46a00 RBX: ffff88041b848400 RCX: ffff880419b20f80
    RDX: 00000000001dff38 RSI: 00000000ffffffff RDI: ffff88041b848400
    RBP: ffff88041b813cb0 R08: 0000000000000006 R09: 0000000000000040
    R10: ffffffff8207f9e0 R11: ffffffff8173595b R12: 0000000000000000
    R13: ffff88041f1dff38 R14: 0000000000262900 R15: 0000000bfffffff4
    FS: 0000000000000000(0000) GS:ffff88041f000000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff881019b469f8 CR3: 000000041a2d3000 CR4: 00000000001406e0
    Stack:
    ffff88041b813cb0 ffffffff813347f9 ffff88041b813ca0 ffffffff81334663
    ffff88041f1d4bc0 ffff88041b848400 0000000000000000 0000000000000000
    0000000000262900 0000000000000000 ffff88041b813d00 ffffffff813355dc
    Call Trace:
    [] ? cpufreq_freq_transition_begin+0xf1/0xfc
    [] ? get_cpu_idle_time+0x97/0xa6
    [] __cpufreq_driver_target+0x3b6/0x44e
    [] cs_dbs_timer+0x11a/0x135
    [] dbs_work_handler+0x39/0x62
    [] process_one_work+0x280/0x4a5
    [] worker_thread+0x24f/0x397
    [] ? rescuer_thread+0x30b/0x30b
    [] ? nl80211_get_key+0x29/0x36a
    [] kthread+0xfc/0x104
    [] ? put_lock_stats.isra.9+0xe/0x20
    [] ? kthread_create_on_node+0x3f/0x3f
    [] ret_from_fork+0x22/0x30
    Code: 56 4d 6b ff 0c 41 55 41 54 53 48 83 ec 28 48 8b 15 ad 1e 00 00 44 8b 41
    08 48 8b 87 c8 00 00 00 49 89 d5 4e 03 2c c5 80 b2 78 81 8b 74 38 04 45
    3b 75 00 75 11 31 c0 83 39 00 0f 84 1c 01 00
    RIP [] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
    RSP
    CR2: ffff881019b469f8
    ---[ end trace 16d9fc7a17897d37 ]---

    [ rjw: In some cases this bug may also cause incorrect frequencies to
    be selected by cpufreq governors. ]

    Fixes: 899bb6642f2a (cpufreq: skip invalid entries when searching the frequency)
    Link: http://marc.info/?l=linux-kernel&m=147672030714331&w=2
    Reported-and-tested-by: Sedat Dilek
    Reported-and-tested-by: Jörg Otte
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Viresh Kumar
    Cc: 4.8+ # 4.8+
    Signed-off-by: Rafael J. Wysocki

    Sergey Senozhatsky
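
The arithmetic behind this bug can be reproduced in userspace. The sketch below uses a hypothetical table type and function names. The repaired variant, which measures the offset from the table base, is an assumption about the intended index and is not quoted from the patch.

```c
#include <limits.h>

/* Hypothetical frequency table entry. */
struct freq_entry_sketch {
    unsigned int frequency; /* kHz */
};

/*
 * Buggy form: 'best' never lies past 'pos', so the pointer difference is
 * zero or negative, and converting -1 to unsigned int gives UINT_MAX.
 */
static unsigned int index_from_pos(const struct freq_entry_sketch *best,
                                   const struct freq_entry_sketch *pos)
{
    return (unsigned int)(best - pos);
}

/* Assumed repair: measure the offset from the start of the table. */
static unsigned int index_from_table(const struct freq_entry_sketch *table,
                                     const struct freq_entry_sketch *best)
{
    return (unsigned int)(best - table);
}
```

With 'best' one entry behind 'pos', the buggy form yields 0xffffffff on platforms with a 32-bit unsigned int. That would be consistent with the RSI value in the trace above, and an index of that size reads far outside the table.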
     

13 Oct, 2016

1 commit


21 Jul, 2016

1 commit

  • Cpufreq governors may need to know what a particular target frequency
    maps to in the driver without necessarily wanting to set the frequency.
    Support this operation via a new cpufreq API,
    cpufreq_driver_resolve_freq(). This API returns the lowest driver
    frequency equal to or greater than the target frequency
    (CPUFREQ_RELATION_L), subject to any policy (min/max) or driver
    limitations. The mapping is also cached in the policy so that a
    subsequent fast_switch operation can avoid repeating the same lookup.

    The API will call a new cpufreq driver callback, resolve_freq(), if it
    has been registered by the driver. Otherwise the frequency is resolved
    via cpufreq_frequency_table_target(). Rather than require ->target()
    style drivers to provide a resolve_freq() callback it is left to the
    caller to ensure that the driver implements this callback if necessary
    to use cpufreq_driver_resolve_freq().

    Suggested-by: Rafael J. Wysocki
    Signed-off-by: Steve Muckle
    Signed-off-by: Rafael J. Wysocki

    Steve Muckle
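
A minimal sketch of the table-based resolution described above, assuming a plain array of non-zero frequencies rather than the kernel's table type. The struct, the cache field, and the fallback to the highest entry when nothing reaches the target are illustrative assumptions.

```c
#include <stddef.h>

/* Hypothetical subset of a policy: limits plus the cached resolution. */
struct resolve_policy_sketch {
    unsigned int min;
    unsigned int max;
    unsigned int cached_resolved_freq; /* last resolved frequency */
};

/*
 * Clamp the target to the policy limits, then pick the lowest table entry
 * that is equal to or greater than it. If no entry is that high, fall back
 * to the highest entry. The table may be unsorted.
 */
static unsigned int resolve_freq_sketch(struct resolve_policy_sketch *policy,
                                        const unsigned int *table, size_t n,
                                        unsigned int target)
{
    unsigned int above = 0;   /* lowest entry >= target, 0 while none found */
    unsigned int highest = 0; /* highest entry seen, used as the fallback */
    size_t i;

    if (target < policy->min)
        target = policy->min;
    if (target > policy->max)
        target = policy->max;

    for (i = 0; i < n; i++) {
        if (table[i] > highest)
            highest = table[i];
        if (table[i] >= target && (above == 0 || table[i] < above))
            above = table[i];
    }

    policy->cached_resolved_freq = above ? above : highest;
    return policy->cached_resolved_freq;
}
```

Storing the result in the policy mirrors the caching the commit mentions, so a later switch to the same frequency would not need to repeat the lookup.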
     

07 Jul, 2016

1 commit

  • cpufreq drivers aren't required to provide a sorted frequency table
    today, and even the ones which provide a sorted table aren't handled
    efficiently by cpufreq core.

    This patch adds infrastructure to verify if the freq-table provided by
    the drivers is sorted or not, and use efficient helpers if they are
    sorted.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
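
The sortedness check can be sketched as a single pass over the table. The enum and function names are illustrative and the table is simplified to a plain array. How the kernel treats invalid entries or equal neighbours is not covered by this sketch.

```c
#include <stddef.h>

/* Hypothetical classification of a frequency table. */
enum table_order_sketch {
    TABLE_UNSORTED,
    TABLE_SORTED_ASCENDING,
    TABLE_SORTED_DESCENDING
};

/*
 * Walk neighbouring pairs once. A table with fewer than two entries, or
 * with all entries equal, is reported as ascending.
 */
static enum table_order_sketch classify_table(const unsigned int *freqs,
                                              size_t n)
{
    int ascending = 1, descending = 1;
    size_t i;

    for (i = 1; i < n; i++) {
        if (freqs[i] < freqs[i - 1])
            ascending = 0;
        if (freqs[i] > freqs[i - 1])
            descending = 0;
    }

    if (ascending)
        return TABLE_SORTED_ASCENDING;
    if (descending)
        return TABLE_SORTED_DESCENDING;
    return TABLE_UNSORTED;
}
```

Once the order is known, a lookup on a sorted table can stop at the first entry that satisfies the request instead of scanning every entry.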
     

09 Jun, 2016

3 commits

  • This routine can't fail unless the frequency table is invalid and
    doesn't contain any valid entries.

    Make it return the index and WARN() in case it is used for an invalid
    table.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • The policy already has this pointer set, use it instead.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • Most of the callers of cpufreq_frequency_get_table() already have the
    pointer to a valid 'policy' structure and they don't really need to go
    through the per-cpu variable first and then a check to validate the
    frequency, in order to find the freq-table for the policy.

    Directly use the policy->freq_table field instead for them.

    Only one user of that API is left after the above changes, cpu_cooling.c,
    and it accesses the freq_table in a racy way, as the policy can be freed
    in between.

    Fix it by using cpufreq_cpu_get() properly.

    Since there are no more users of cpufreq_frequency_get_table() left, get
    rid of it.

    Signed-off-by: Viresh Kumar
    Acked-by: Javi Merino (cpu_cooling.c)
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

03 Jun, 2016

4 commits

  • The modularity of cpufreq_stats is quite problematic.

    First off, the usage of policy notifiers for the initialization
    and cleanup in the cpufreq_stats module is inherently racy with
    respect to CPU offline/online and the initialization and cleanup
    of the cpufreq driver.

    Second, fast frequency switching (used by the schedutil governor)
    cannot be enabled if any transition notifiers are registered, so
    if the cpufreq_stats module (that registers a transition notifier
    for updating transition statistics) is loaded, the schedutil governor
    cannot use fast frequency switching.

    On the other hand, allowing cpufreq_stats to be built as a module
    doesn't really add much value. Arguably, there's not much reason
    for that code to be modular at all.

    For the above reasons, make the cpufreq stats code non-modular,
    modify the core to invoke functions provided by that code directly
    and drop the notifiers from it.

    Make the stats sysfs attributes appear empty if fast frequency
    switching is enabled as the statistics will not be updated in that
    case anyway (and returning -EBUSY from those attributes breaks
    powertop).

    While at it, clean up Kconfig help for the CPU_FREQ_STAT and
    CPU_FREQ_STAT_DETAILS options.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The 'initialized' field in struct cpufreq_governor is only used by
    the conservative governor (as a usage counter) and the way that
    happens is far from straightforward and arguably incorrect.

    Namely, the value of 'initialized' is checked by
    cpufreq_dbs_governor_init() and cpufreq_dbs_governor_exit() and
    the results of those checks are passed (as the second argument) to
    the ->init() and ->exit() callbacks in struct dbs_governor. Those
    callbacks are only implemented by the ondemand and conservative
    governors and ondemand doesn't use their second argument at all.
    In turn, the conservative governor uses it to decide whether or not
    to either register or unregister a transition notifier.

    That whole mechanism is not only unnecessarily convoluted, but also
    racy, because the 'initialized' field of struct cpufreq_governor is
    updated in cpufreq_init_governor() and cpufreq_exit_governor() under
    policy->rwsem which doesn't help if one of these functions is run
    twice in parallel for different policies (which isn't impossible in
    principle), for example.

    Instead of it, add a proper usage counter to the conservative
    governor and update it from cs_init() and cs_exit() which is
    guaranteed to be non-racy, as those functions are only called
    under gov_dbs_data_mutex which is global.

    With that in place, drop the 'initialized' field from struct
    cpufreq_governor as it is not used any more.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Create a new helper to avoid code duplication across governors.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • The design of the cpufreq governor API is not very straightforward,
    as struct cpufreq_governor provides only one callback to be invoked
    from different code paths for different purposes. The purpose it is
    invoked for is determined by its second "event" argument, causing it
    to act as a "callback multiplexer" of sorts.

    Unfortunately, that leads to extra complexity in governors, some of
    which implement the ->governor() callback as a switch statement
    that simply checks the event argument and invokes a separate function
    to handle that specific event.

    That extra complexity can be eliminated by replacing the all-purpose
    ->governor() callback with a family of callbacks to carry out specific
    governor operations: initialization and exit, start and stop and policy
    limits updates. That turns out to reduce the code size too, so
    do it.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
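
The "family of callbacks" design can be pictured as a struct of function pointers, one per operation, in place of a single entry point that switches on an event code. All names and signatures below are hypothetical and much simpler than the kernel's.

```c
/* Hypothetical per-policy state touched by the callbacks. */
struct policy_state_sketch {
    int initialized;
    int running;
    unsigned int limit_updates;
};

/* One callback per governor operation, replacing an event multiplexer. */
struct governor_ops_sketch {
    const char *name;
    int  (*init)(struct policy_state_sketch *p);
    void (*exit)(struct policy_state_sketch *p);
    int  (*start)(struct policy_state_sketch *p);
    void (*stop)(struct policy_state_sketch *p);
    void (*limits)(struct policy_state_sketch *p);
};

static int demo_init(struct policy_state_sketch *p)
{
    p->initialized = 1;
    return 0;
}

static void demo_exit(struct policy_state_sketch *p)
{
    p->initialized = 0;
}

static int demo_start(struct policy_state_sketch *p)
{
    p->running = 1;
    return 0;
}

static void demo_stop(struct policy_state_sketch *p)
{
    p->running = 0;
}

static void demo_limits(struct policy_state_sketch *p)
{
    p->limit_updates++;
}

static const struct governor_ops_sketch demo_governor = {
    .name   = "demo",
    .init   = demo_init,
    .exit   = demo_exit,
    .start  = demo_start,
    .stop   = demo_stop,
    .limits = demo_limits,
};
```

Each operation gets its own signature and return type, and a governor with nothing to do for a given operation can leave that pointer unset instead of adding an empty switch case.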
     

09 Apr, 2016

1 commit

  • Due to differences in the cpufreq core's handling of runtime CPU
    offline and nonboot CPUs disabling during system suspend-to-RAM,
    fast frequency switching gets disabled after a suspend-to-RAM and
    resume cycle on all of the nonboot CPUs.

    To prevent that from happening, move the invocation of
    cpufreq_disable_fast_switch() from cpufreq_exit_governor() to
    sugov_exit(), as the schedutil governor is the only user of fast
    frequency switching today anyway.

    That simply prevents cpufreq_disable_fast_switch() from being called
    without invoking the ->governor callback for the CPUFREQ_GOV_POLICY_EXIT
    event (which happens during system suspend now).

    Fixes: b7898fda5bc7 (cpufreq: Support for fast frequency switching)
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     

02 Apr, 2016

3 commits

  • Modify the ACPI cpufreq driver to provide a method for switching
    CPU frequencies from interrupt context and update the cpufreq core
    to support that method if available.

    Introduce a new cpufreq driver callback, ->fast_switch, to be
    invoked for frequency switching from interrupt context by (future)
    governors supporting that feature via (new) helper function
    cpufreq_driver_fast_switch().

    Add two new policy flags, fast_switch_possible, to be set by the
    cpufreq driver if fast frequency switching can be used for the
    given policy and fast_switch_enabled, to be set by the governor
    if it is going to use fast frequency switching for the given
    policy. Also add a helper for setting the latter.

    Since fast frequency switching is inherently incompatible with
    cpufreq transition notifiers, make it possible to set the
    fast_switch_enabled only if there are no transition notifiers
    already registered and make the registration of new transition
    notifiers fail if fast_switch_enabled is set for at least one
    policy.

    Implement the ->fast_switch callback in the ACPI cpufreq driver
    and make it set fast_switch_possible during policy initialization
    as appropriate.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
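
The mutual exclusion between fast switching and transition notifiers described above can be sketched with two counters. The struct and function names are illustrative, and locking is omitted. Only the two flags, fast_switch_possible and fast_switch_enabled, are named in the commit message.

```c
#include <stdbool.h>

/* Hypothetical global bookkeeping shared by all policies. */
struct fast_switch_core_sketch {
    unsigned int transition_notifiers; /* registered notifier count */
    unsigned int fast_switch_policies; /* policies with fast switch enabled */
};

/* The two per-policy flags named in the commit message. */
struct fast_switch_policy_sketch {
    bool fast_switch_possible; /* set by the driver */
    bool fast_switch_enabled;  /* set on behalf of the governor */
};

/* Enabling succeeds only if the driver allows it and no notifier exists. */
static bool enable_fast_switch(struct fast_switch_core_sketch *core,
                               struct fast_switch_policy_sketch *policy)
{
    if (!policy->fast_switch_possible || core->transition_notifiers > 0)
        return false;
    if (!policy->fast_switch_enabled) {
        policy->fast_switch_enabled = true;
        core->fast_switch_policies++;
    }
    return true;
}

/* Registration fails while any policy has fast switching enabled. */
static bool register_transition_notifier(struct fast_switch_core_sketch *core)
{
    if (core->fast_switch_policies > 0)
        return false;
    core->transition_notifiers++;
    return true;
}
```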
     
  • Move definitions of symbols related to transition latency and
    sampling rate to include/linux/cpufreq.h so they can be used by
    (future) governors located outside of drivers/cpufreq/.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Move definitions and function headers related to struct gov_attr_set
    to include/linux/cpufreq.h so they can be used by (future) governors
    located outside of drivers/cpufreq/.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     

11 Mar, 2016

2 commits


09 Mar, 2016

3 commits

  • The entire sequence of events (like INIT/START or STOP/EXIT) for which
    cpufreq_governor() is called, is guaranteed to be protected by
    policy->rwsem now.

    The additional checks that were added earlier (as we were forced to drop
    policy->rwsem before calling cpufreq_governor() for EXIT event), aren't
    required anymore.

    Moreover, they weren't really sufficient. They only took care of
    START/STOP events, but not INIT/EXIT, and the state machine was never
    maintained properly by them.

    Kill the unnecessary checks and policy->governor_enabled field.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • Earlier, when the struct freq-attr was used to represent governor
    attributes, the standard cpufreq show/store sysfs attribute callbacks
    were applied to the governor tunable attributes and they always acquire
    the policy->rwsem lock before carrying out the operation. That could
    have resulted in an ABBA deadlock if governor tunable attributes are
    removed under policy->rwsem while one of them is being accessed
    concurrently (if sysfs attributes removal wins the race, it will wait
    for the access to complete with policy->rwsem held while the attribute
    callback will block on policy->rwsem indefinitely).

    We attempted to address this issue by dropping policy->rwsem around
    governor tunable attributes removal (that is, around invocations of the
    ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
    in cpufreq_set_policy(), but that opened up race conditions that had not
    been possible with policy->rwsem held all the time.

    The previous commit, "cpufreq: governor: New sysfs show/store callbacks
    for governor tunables", fixed the original ABBA deadlock by adding new
    governor specific show/store callbacks.

    We don't have to drop rwsem around invocations of governor event
    CPUFREQ_GOV_POLICY_EXIT anymore, and the original fix can be reverted now.

    Fixes: 955ef4833574 (cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT)
    Signed-off-by: Viresh Kumar
    Reported-by: Juri Lelli
    Tested-by: Juri Lelli
    Tested-by: Shilpasri G Bhat
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • Introduce a mechanism by which parts of the cpufreq subsystem
    ("setpolicy" drivers or the core) can register callbacks to be
    executed from cpufreq_update_util() which is invoked by the
    scheduler's update_load_avg() on CPU utilization changes.

    This allows the "setpolicy" drivers to dispense with their timers
    and do all of the computations they need and frequency/voltage
    adjustments in the update_load_avg() code path, among other things.

    The update_load_avg() changes were suggested by Peter Zijlstra.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Ingo Molnar

    Rafael J. Wysocki
     

27 Feb, 2016

1 commit