29 Oct, 2018

1 commit

  • These macros can be reused by governors which don't use the common
    governor code present in cpufreq_governor.c and should be moved to the
    relevant header.

    Now that they are getting moved to the right header file, reuse them in
    schedutil governor as well (that required rename of show/store
    routines).

    Also create gov_attr_wo() macro for write-only sysfs files, this will be
    used by Interactive governor in a later patch.

    Signed-off-by: Viresh Kumar

    Viresh Kumar
     

26 Jul, 2017

1 commit

  • There is no limitation in the ondemand or conservative governors which
    disallow the transition_latency to be greater than 10 ms.

    The max_transition_latency field is rather used to disallow automatic
    dynamic frequency switching for platforms which didn't wanted these
    governors to run.

    Replace max_transition_latency with a boolean (dynamic_switching) and
    check for transition_latency == CPUFREQ_ETERNAL along with that. This
    makes it pretty straight forward to read/understand now.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

22 Jul, 2017

1 commit

  • The cpufreq core and governors aren't supposed to set a limit on how
    fast we want to try changing the frequency. This is currently done for
    the legacy governors with help of min_sampling_rate.

    At worst, we may end up setting the sampling rate to a value lower than
    the rate at which frequency can be changed and then one of the CPUs in
    the policy will be only changing frequency for ever.

    But that is something for the user to decide and there is no need to
    have special handling for such cases in the core. Leave it for the user
    to figure out.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

02 Mar, 2017

1 commit


17 Nov, 2016

1 commit

  • Conservative governor changes the CPU frequency in steps.
    That means that if a CPU runs at max frequency, it will need several
    sampling periods to return to min frequency when the workload
    is finished.

    If the update function that calculates the load and target frequency
    is deferred, the governor might need even more time to decrease the
    frequency.

    This may have impact to power consumption and after all conservative
    should decrease the frequency if there is no workload at every sampling
    rate.

    To resolve the above issue calculate the number of sampling periods
    that the update is deferred. Considering that for each sampling period
    conservative should drop the frequency by a freq_step because the
    CPU was idle apply the proper subtraction to requested frequency.

    Below, the kernel trace with and without this patch. First an
    intensive workload is applied on a specific CPU. Then the workload
    is removed and the CPU goes to idle.

    WITHOUT

    -0 [007] dN.. 620.329153: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 620.350857: cpu_frequency: state=1700000 cpu_id=7
    kworker/7:2-556 [007] .... 620.370856: cpu_frequency: state=1900000 cpu_id=7
    kworker/7:2-556 [007] .... 620.390854: cpu_frequency: state=2100000 cpu_id=7
    kworker/7:2-556 [007] .... 620.411853: cpu_frequency: state=2200000 cpu_id=7
    kworker/7:2-556 [007] .... 620.432854: cpu_frequency: state=2400000 cpu_id=7
    kworker/7:2-556 [007] .... 620.453854: cpu_frequency: state=2600000 cpu_id=7
    kworker/7:2-556 [007] .... 620.494856: cpu_frequency: state=2900000 cpu_id=7
    kworker/7:2-556 [007] .... 620.515856: cpu_frequency: state=3100000 cpu_id=7
    kworker/7:2-556 [007] .... 620.536858: cpu_frequency: state=3300000 cpu_id=7
    kworker/7:2-556 [007] .... 620.557857: cpu_frequency: state=3401000 cpu_id=7
    -0 [007] d... 669.591363: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 669.591939: cpu_idle: state=4294967295 cpu_id=7
    -0 [007] d... 669.591980: cpu_idle: state=4 cpu_id=7
    -0 [007] dN.. 669.591989: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 670.201224: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 670.221975: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 670.222016: cpu_frequency: state=3300000 cpu_id=7
    -0 [007] d... 670.222026: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 670.234964: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 670.801251: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 671.236046: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 671.236073: cpu_frequency: state=3100000 cpu_id=7
    -0 [007] d... 671.236112: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 671.393437: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 671.401277: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 671.404083: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 671.404111: cpu_frequency: state=2900000 cpu_id=7
    -0 [007] d... 671.404125: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 671.404974: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 671.501180: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 671.995414: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 671.995459: cpu_frequency: state=2800000 cpu_id=7
    -0 [007] d... 671.995469: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 671.996287: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 672.001305: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.078374: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 672.078410: cpu_frequency: state=2600000 cpu_id=7
    -0 [007] d... 672.078419: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.158020: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 672.158040: cpu_frequency: state=2400000 cpu_id=7
    -0 [007] d... 672.158044: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.160038: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 672.234557: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.237121: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 672.237174: cpu_frequency: state=2100000 cpu_id=7
    -0 [007] d... 672.237186: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.237778: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 672.267902: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.269860: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 672.269906: cpu_frequency: state=1900000 cpu_id=7
    -0 [007] d... 672.269914: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.271902: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 672.751342: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 672.823056: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 672.823095: cpu_frequency: state=1600000 cpu_id=7

    WITH

    -0 [007] dN.. 4380.928009: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-399 [007] .... 4380.949767: cpu_frequency: state=2000000 cpu_id=7
    kworker/7:2-399 [007] .... 4380.969765: cpu_frequency: state=2200000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.009766: cpu_frequency: state=2500000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.029767: cpu_frequency: state=2600000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.049769: cpu_frequency: state=2800000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.069769: cpu_frequency: state=3000000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.089771: cpu_frequency: state=3100000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.109772: cpu_frequency: state=3400000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.129773: cpu_frequency: state=3401000 cpu_id=7
    -0 [007] d... 4428.226159: cpu_idle: state=1 cpu_id=7
    -0 [007] d... 4428.226176: cpu_idle: state=4294967295 cpu_id=7
    -0 [007] d... 4428.226181: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 4428.227177: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 4428.551640: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 4428.649239: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-399 [007] .... 4428.649268: cpu_frequency: state=2800000 cpu_id=7
    -0 [007] d... 4428.649278: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 4428.689856: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 4428.799542: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 4428.801683: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-399 [007] .... 4428.801748: cpu_frequency: state=1700000 cpu_id=7
    -0 [007] d... 4428.801761: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 4428.806545: cpu_idle: state=4294967295 cpu_id=7
    ...
    -0 [007] d... 4429.051880: cpu_idle: state=4 cpu_id=7
    -0 [007] d... 4429.086240: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-399 [007] .... 4429.086293: cpu_frequency: state=1600000 cpu_id=7

    Without the patch the CPU dropped to min frequency after 3.2s
    With the patch applied the CPU dropped to min frequency after 0.86s

    Signed-off-by: Stratos Karafotis
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Stratos Karafotis
     

11 Nov, 2016

1 commit

  • The earlier implementation of governors used background timers and so
    functions, mutex, etc had 'timer' keyword in their names.

    But that's not true anymore. Replace 'timer' with 'update', as those
    functions, variables are based around updates to frequency.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

03 Jun, 2016

2 commits

  • The 'initialized' field in struct cpufreq_governor is only used by
    the conservative governor (as a usage counter) and the way that
    happens is far from straightforward and arguably incorrect.

    Namely, the value of 'initialized' is checked by
    cpufreq_dbs_governor_init() and cpufreq_dbs_governor_exit() and
    the results of those checks are passed (as the second argument) to
    the ->init() and ->exit() callbacks in struct dbs_governor. Those
    callbacks are only implemented by the ondemand and conservative
    governors and ondemand doesn't use their second argument at all.
    In turn, the conservative governor uses it to decide whether or not
    to either register or unregister a transition notifier.

    That whole mechanism is not only unnecessarily convoluted, but also
    racy, because the 'initialized' field of struct cpufreq_governor is
    updated in cpufreq_init_governor() and cpufreq_exit_governor() under
    policy->rwsem which doesn't help if one of these functions is run
    twice in parallel for different policies (which isn't impossible in
    principle), for example.

    Instead of it, add a proper usage counter to the conservative
    governor and update it from cs_init() and cs_exit() which is
    guaranteed to be non-racy, as those functions are only called
    under gov_dbs_data_mutex which is global.

    With that in place, drop the 'initialized' field from struct
    cpufreq_governor as it is not used any more.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The design of the cpufreq governor API is not very straightforward,
    as struct cpufreq_governor provides only one callback to be invoked
    from different code paths for different purposes. The purpose it is
    invoked for is determined by its second "event" argument, causing it
    to act as a "callback multiplexer" of sorts.

    Unfortunately, that leads to extra complexity in governors, some of
    which implement the ->governor() callback as a switch statement
    that simply checks the event argument and invokes a separate function
    to handle that specific event.

    That extra complexity can be eliminated by replacing the all-purpose
    ->governor() callback with a family of callbacks to carry out specific
    governor operations: initialization and exit, start and stop and policy
    limits updates. That also turns out to reduce the code size too, so
    do it.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     

28 Apr, 2016

1 commit

  • The name of the prev_cpu_wall field in struct cpu_dbs_info is
    confusing, because it doesn't represent wall time, but the previous
    update time as returned by get_cpu_idle_time() (that may be the
    current value of jiffies_64 in some cases, for example).

    Moreover, the names of some related variables in dbs_update() take
    that confusion further.

    Rename all of those things to make their names reflect the purpose
    more accurately. While at it, drop unnecessary parens from one of
    the updated expressions.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar
    Acked-by: Chen Yu

    Rafael J. Wysocki
     

02 Apr, 2016

4 commits


09 Mar, 2016

27 commits

  • That mutex is only used by cpufreq_governor_dbs() and it doesn't
    need to be exported to modules, so make it static and drop the
    export incantation.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Move the definitions of struct od_dbs_tuners and struct cs_dbs_tuners
    from the common governor header to the ondemand and conservative
    governor code, respectively, as they don't need to be in the common
    header any more.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • After previous changes there is only one piece of code in the
    ondemand governor making references to per-CPU data structures,
    but it can be easily modified to avoid doing that, so modify it
    accordingly and move the definition of per-CPU data used by the
    ondemand and conservative governors to the common code. Next,
    change that code to access the per-CPU data structures directly
    rather than via a governor callback.

    This causes the ->get_cpu_cdbs governor callback to become
    unnecessary, so drop it along with the macro and function
    definitions related to it.

    Finally, drop the definitions of struct od_cpu_dbs_info_s and
    struct cs_cpu_dbs_info_s that aren't necessary any more.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Some fields in struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s
    are only used for a limited set of CPUs. Namely, if a policy is
    shared between multiple CPUs, those fields will only be used for one
    of them (policy->cpu). This means that they really are per-policy
    rather than per-CPU and holding room for them in per-CPU data
    structures is generally wasteful. Also moving those fields into
    per-policy data structures will allow some significant simplifications
    to be made going forward.

    For this reason, introduce struct cs_policy_dbs_info and
    struct od_policy_dbs_info to hold those fields. Define each of the
    new structures as an extension of struct policy_dbs_info (such that
    struct policy_dbs_info is embedded in each of them) and introduce
    new ->alloc and ->free governor callbacks to allocate and free
    those structures, respectively, such that ->alloc() will return
    a pointer to the struct policy_dbs_info embedded in the allocated
    data structure and ->free() will take that pointer as its argument.

    With that, modify the code accessing the data fields in question
    in per-CPU data objects to look for them in the new structures
    via the struct policy_dbs_info pointer available to it and drop
    them from struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The ->store() callbacks of some tunable sysfs attributes of the
    ondemand and conservative governors trigger immediate updates of
    the CPU load information for all CPUs "governed" by the given
    dbs_data by walking the cpu_dbs_info structures for all online
    CPUs in the system and updating them.

    This is questionable for two reasons. First, it may lead to a lot of
    extra overhead on a system with many CPUs if the given dbs_data is
    only associated with a few of them. Second, if governor tunables are
    per-policy, the CPUs associated with the other sets of governor
    tunables should not be updated.

    To address this issue, use the observation that in all of the places
    in question the update operation may be carried out in the same way
    (because all of the tunables involved are now located in struct
    dbs_data and readily available to the common code) and make the
    code in those places invoke the same (new) helper function that
    will carry out the update correctly.

    That new function always checks the ignore_nice_load tunable value
    and updates the CPUs' prev_cpu_nice data fields if that's set, which
    wasn't done by the original code in store_io_is_busy(), but it
    should have been done in there too.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The ->powersave_bias_init_cpu callback in struct od_ops is only used
    in one place and that invocation may be replaced with a direct call
    to the function pointed to by that callback, so change the code
    accordingly and drop the callback.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • After some previous changes, the ->get_cpu_dbs_info_s governor
    callback and the "governor" field in struct dbs_governor (whose
    value represents the governor type) are not used any more, so
    drop them.

    Also drop the unused gov_ops field from struct dbs_governor.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • To avoid having to check the governor type explicitly in the common
    code in order to initialize data structures specific to the governor
    type properly, add a ->start callback to struct dbs_governor and
    use it to initialize those data structures for the ondemand and
    conservative governors.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The io_is_busy governor tunable is only used by the ondemand governor
    and is located in the ondemand-specific data structure, but it is
    looked at by the common governor code that has to do ugly things to
    get to that value, so move it to struct dbs_data and modify ondemand
    accordingly.

    Since the conservative governor never touches that field, it will
    be always 0 for that governor and it won't have any effect on the
    results of computations in that case.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The ->freq_increase callback in struct od_ops is never invoked,
    so drop it.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Do not convert microseconds to jiffies and the other way around
    in governor computations related to the sampling rate and sample
    delay and drop delay_for_sampling_rate() which isn't of any use
    then.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The rate_mult field in struct od_cpu_dbs_info_s is used by the code
    shared with the conservative governor and to access it that code
    has to do an ugly governor type check. However, first of all it
    is ever only used for policy->cpu, so it is per-policy rather than
    per-CPU and second, it is initialized to 1 by cpufreq_governor_start(),
    so if the conservative governor never modifies it, it will have no
    effect on the results of any computations.

    For these reasons, move rate_mult to struct policy_dbs_info (as a
    common field).

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The way the ->gov_check_cpu governor callback is used by the ondemand
    and conservative governors is not really straightforward. Namely, the
    governor calls dbs_check_cpu() that updates the load information for
    the policy and the invokes ->gov_check_cpu() for the governor.

    To get rid of that entanglement, notice that cpufreq_governor_limits()
    doesn't need to call dbs_check_cpu() directly. Instead, it can simply
    reset the sample delay to 0 which will cause a sample to be taken
    immediately. The result of that is practically equivalent to calling
    dbs_check_cpu() except that it will trigger a full update of governor
    internal state and not just the ->gov_check_cpu() part.

    Following that observation, make cpufreq_governor_limits() reset
    the sample delay and turn dbs_check_cpu() into a function that will
    simply evaluate the load and return the result called dbs_update().

    That function can now be called by governors from the routines that
    previously were pointed to by ->gov_check_cpu and those routines
    can be called directly by each governor instead of dbs_check_cpu().
    This way ->gov_check_cpu becomes unnecessary, so drop it.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Rework the handling of work items by dbs_update_util_handler() and
    dbs_work_handler() so the former (which is executed in scheduler
    paths) only uses atomic operations when absolutely necessary. That
    is, when the policy is shared and dbs_update_util_handler() has
    already decided that this is the time to queue up a work item.

    In particular, this avoids the atomic ops entirely on platforms where
    policy objects are never shared.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The ondemand governor already updates sample_delay_ns immediately on
    updates to the sampling rate, but conservative doesn't do that.

    It was left out earlier as the code was really too complex to get
    that done easily. Things are sorted out very well now, however, and
    the conservative governor can be modified to follow ondemand in that
    respect.

    Moreover, since the code needed to implement that in the
    conservative governor would be identical to the corresponding
    ondemand governor's code, make that code common and change both
    governors to use it.

    Signed-off-by: Viresh Kumar
    Tested-by: Juri Lelli
    Tested-by: Shilpasri G Bhat
    [ rjw: Changelog ]
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • We used to drop policy->rwsem just before calling __cpufreq_governor()
    in some cases earlier and so it was possible that __cpufreq_governor()
    ran concurrently via separate threads for the same policy.

    In order to guarantee valid state transitions for governors,
    'governor_enabled' was required to be protected using some locking
    and cpufreq_governor_lock was added for that.

    But now __cpufreq_governor() is always called under policy->rwsem,
    and 'governor_enabled' is protected against races even without
    cpufreq_governor_lock.

    Get rid of the extra lock now.

    Signed-off-by: Viresh Kumar
    Tested-by: Juri Lelli
    Tested-by: Shilpasri G Bhat
    [ rjw : Changelog ]
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • The dbs_data_mutex lock is currently used in two places. First,
    cpufreq_governor_dbs() uses it to guarantee mutual exclusion between
    invocations of governor operations from the core. Second, it is used by
    ondemand governor's update_sampling_rate() to ensure the stability of
    data structures walked by it.

    The second usage is quite problematic, because update_sampling_rate() is
    called from a governor sysfs attribute's ->store callback and that leads
    to a deadlock scenario involving cpufreq_governor_exit() which runs
    under dbs_data_mutex. Thus it is better to rework the code so
    update_sampling_rate() doesn't need to acquire dbs_data_mutex.

    To that end, rework update_sampling_rate() to walk a list of policy_dbs
    objects supported by the dbs_data one it has been called for (instead of
    walking cpu_dbs_info object for all CPUs). The list manipulation is
    protected with dbs_data->mutex which also is held around the execution
    of update_sampling_rate(), it is not necessary to hold dbs_data_mutex in
    that function any more.

    Reported-by: Juri Lelli
    Reported-by: Shilpasri G Bhat
    Signed-off-by: Viresh Kumar
    [ rjw: Subject & changelog ]
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • The previous commit introduced a new set of macros for creating sysfs
    attributes that represent governor tunables and the old macros used for
    this purpose are not needed any more, so drop them.

    Signed-off-by: Viresh Kumar
    Tested-by: Juri Lelli
    Tested-by: Shilpasri G Bhat
    [ rjw: Subject & changelog ]
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • The ondemand and conservative governors use the global-attr or freq-attr
    structures to represent sysfs attributes corresponding to their tunables
    (which of them is actually used depends on whether or not different
    policy objects can use the same governor with different tunables at the
    same time and, consequently, on where those attributes are located in
    sysfs).

    Unfortunately, in the freq-attr case, the standard cpufreq show/store
    sysfs attribute callbacks are applied to the governor tunable attributes
    and they always acquire the policy->rwsem lock before carrying out the
    operation. That may lead to an ABBA deadlock if governor tunable
    attributes are removed under policy->rwsem while one of them is being
    accessed concurrently (if sysfs attributes removal wins the race, it
    will wait for the access to complete with policy->rwsem held while the
    attribute callback will block on policy->rwsem indefinitely).

    We attempted to address this issue by dropping policy->rwsem around
    governor tunable attributes removal (that is, around invocations of the
    ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
    in cpufreq_set_policy(), but that opened up race conditions that had not
    been possible with policy->rwsem held all the time. Therefore
    policy->rwsem cannot be dropped in cpufreq_set_policy() at any point,
    but the deadlock situation described above must be avoided too.

    To that end, use the observation that in principle governor tunables may
    be represented by the same data type regardless of whether the governor
    is system-wide or per-policy and introduce a new structure, struct
    governor_attr, for representing them and new corresponding macros for
    creating show/store sysfs callbacks for them. Also make their parent
    kobject use a new kobject type whose default show/store callbacks are
    not related to the standard core cpufreq ones in any way (and they don't
    acquire policy->rwsem in particular).

    Signed-off-by: Viresh Kumar
    Tested-by: Juri Lelli
    Tested-by: Shilpasri G Bhat
    [ rjw: Subject & changelog + rebase ]
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • There are a few common tunables shared between the ondemand and
    conservative governors. Move them to struct dbs_data to simplify
    code.

    Signed-off-by: Viresh Kumar
    Tested-by: Juri Lelli
    Tested-by: Shilpasri G Bhat
    [ rjw: Changelog ]
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • Some tunables are present in governor-specific structures, whereas one
    (min_sampling_rate) is located directly in struct dbs_data.

    There is a special macro for creating its sysfs attribute and the
    show/store callbacks, but since more tunables are going to be moved
    to struct dbs_data, a new generic macro for such cases will be useful,
    so add it and use it for min_sampling_rate.

    Signed-off-by: Viresh Kumar
    Tested-by: Juri Lelli
    Tested-by: Shilpasri G Bhat
    [ rjw: Subject & changelog ]
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • The skip_work field in struct policy_dbs_info technically is a
    counter, so give it a new name to reflect that.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The struct policy_dbs_info objects representing per-policy governor
    data are not accessible directly from the corresponding policy
    objects. To access them, one has to get a pointer to the
    struct cpu_dbs_info of policy->cpu and use the policy_dbs field of
    that which isn't really straightforward.

    To address that rearrange the governor data structures so the
    governor_data pointer in struct cpufreq_policy will point to
    struct policy_dbs_info (instead of struct dbs_data) and that will
    contain a pointer to struct dbs_data.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Since policy->cpu is always passed as the second argument to
    dbs_check_cpu(), it is not really necessary to pass it, because
    the function can obtain that value via its first argument just fine.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The struct cpu_common_dbs_info structure represents the per-policy
    part of the governor data (for the ondemand and conservative
    governors), but its name doesn't reflect its purpose.

    Rename it to struct policy_dbs_info and rename variables related to
    it accordingly.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Since it is possible to obtain a pointer to struct dbs_governor
    from a pointer to the struct governor embedded in it with the help
    of container_of(), the additional gov pointer in struct dbs_data
    isn't really necessary.

    Drop that pointer and make the code using it reach the dbs_governor
    object via policy->governor.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Since it is possible to obtain a pointer to struct dbs_governor
    from a pointer to the struct governor embedded in it via
    container_of(), the second argument of cpufreq_governor_init()
    is not necessary. Accordingly, cpufreq_governor_dbs() doesn't
    need its second argument either and the ->governor callbacks
    for both the ondemand and conservative governors may be set
    to cpufreq_governor_dbs() directly. Make that happen.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Saravana Kannan
    Acked-by: Viresh Kumar

    Rafael J. Wysocki