28 Apr, 2016

1 commit

  • The name of the prev_cpu_wall field in struct cpu_dbs_info is
    confusing, because it doesn't represent wall time, but the previous
    update time as returned by get_cpu_idle_time() (that may be the
    current value of jiffies_64 in some cases, for example).

    Moreover, the names of some related variables in dbs_update() take
    that confusion further.

    Rename all of those things to make their names reflect the purpose
    more accurately. While at it, drop unnecessary parens from one of
    the updated expressions.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar
    Acked-by: Chen Yu

    Rafael J. Wysocki
     

02 Apr, 2016

4 commits


09 Mar, 2016

31 commits

  • That mutex is only used by cpufreq_governor_dbs() and it doesn't
    need to be exported to modules, so make it static and drop the
    export incantation.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Move the definitions of struct od_dbs_tuners and struct cs_dbs_tuners
    from the common governor header to the ondemand and conservative
    governor code, respectively, as they don't need to be in the common
    header any more.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • After previous changes there is only one piece of code in the
    ondemand governor making references to per-CPU data structures,
    but it can be easily modified to avoid doing that, so modify it
    accordingly and move the definition of per-CPU data used by the
    ondemand and conservative governors to the common code. Next,
    change that code to access the per-CPU data structures directly
    rather than via a governor callback.

    This causes the ->get_cpu_cdbs governor callback to become
    unnecessary, so drop it along with the macro and function
    definitions related to it.

    Finally, drop the definitions of struct od_cpu_dbs_info_s and
    struct cs_cpu_dbs_info_s that aren't necessary any more.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Some fields in struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s
    are only used for a limited set of CPUs. Namely, if a policy is
    shared between multiple CPUs, those fields will only be used for one
    of them (policy->cpu). This means that they really are per-policy
    rather than per-CPU and holding room for them in per-CPU data
    structures is generally wasteful. Also moving those fields into
    per-policy data structures will allow some significant simplifications
    to be made going forward.

    For this reason, introduce struct cs_policy_dbs_info and
    struct od_policy_dbs_info to hold those fields. Define each of the
    new structures as an extension of struct policy_dbs_info (such that
    struct policy_dbs_info is embedded in each of them) and introduce
    new ->alloc and ->free governor callbacks to allocate and free
    those structures, respectively, such that ->alloc() will return
    a pointer to the struct policy_dbs_info embedded in the allocated
    data structure and ->free() will take that pointer as its argument.

    With that, modify the code accessing the data fields in question
    in per-CPU data objects to look for them in the new structures
    via the struct policy_dbs_info pointer available to it and drop
    them from struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The ->store() callbacks of some tunable sysfs attributes of the
    ondemand and conservative governors trigger immediate updates of
    the CPU load information for all CPUs "governed" by the given
    dbs_data by walking the cpu_dbs_info structures for all online
    CPUs in the system and updating them.

    This is questionable for two reasons. First, it may lead to a lot of
    extra overhead on a system with many CPUs if the given dbs_data is
    only associated with a few of them. Second, if governor tunables are
    per-policy, the CPUs associated with the other sets of governor
    tunables should not be updated.

    To address this issue, use the observation that in all of the places
    in question the update operation may be carried out in the same way
    (because all of the tunables involved are now located in struct
    dbs_data and readily available to the common code) and make the
    code in those places invoke the same (new) helper function that
    will carry out the update correctly.

    That new function always checks the ignore_nice_load tunable value
    and updates the CPUs' prev_cpu_nice data fields if that's set, which
    wasn't done by the original code in store_io_is_busy(), but it
    should have been done in there too.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The ->powersave_bias_init_cpu callback in struct od_ops is only used
    in one place and that invocation may be replaced with a direct call
    to the function pointed to by that callback, so change the code
    accordingly and drop the callback.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • After some previous changes, the ->get_cpu_dbs_info_s governor
    callback and the "governor" field in struct dbs_governor (whose
    value represents the governor type) are not used any more, so
    drop them.

    Also drop the unused gov_ops field from struct dbs_governor.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • To avoid having to check the governor type explicitly in the common
    code in order to initialize data structures specific to the governor
    type properly, add a ->start callback to struct dbs_governor and
    use it to initialize those data structures for the ondemand and
    conservative governors.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The io_is_busy governor tunable is only used by the ondemand governor
    and is located in the ondemand-specific data structure, but it is
    looked at by the common governor code that has to do ugly things to
    get to that value, so move it to struct dbs_data and modify ondemand
    accordingly.

    Since the conservative governor never touches that field, it will
    be always 0 for that governor and it won't have any effect on the
    results of computations in that case.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The ->freq_increase callback in struct od_ops is never invoked,
    so drop it.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Do not convert microseconds to jiffies and the other way around
    in governor computations related to the sampling rate and sample
    delay and drop delay_for_sampling_rate() which isn't of any use
    then.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The rate_mult field in struct od_cpu_dbs_info_s is used by the code
    shared with the conservative governor and to access it that code
    has to do an ugly governor type check. However, first of all it
    is ever only used for policy->cpu, so it is per-policy rather than
    per-CPU and second, it is initialized to 1 by cpufreq_governor_start(),
    so if the conservative governor never modifies it, it will have no
    effect on the results of any computations.

    For these reasons, move rate_mult to struct policy_dbs_info (as a
    common field).

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The way the ->gov_check_cpu governor callback is used by the ondemand
    and conservative governors is not really straightforward. Namely, the
    governor calls dbs_check_cpu() that updates the load information for
    the policy and the invokes ->gov_check_cpu() for the governor.

    To get rid of that entanglement, notice that cpufreq_governor_limits()
    doesn't need to call dbs_check_cpu() directly. Instead, it can simply
    reset the sample delay to 0 which will cause a sample to be taken
    immediately. The result of that is practically equivalent to calling
    dbs_check_cpu() except that it will trigger a full update of governor
    internal state and not just the ->gov_check_cpu() part.

    Following that observation, make cpufreq_governor_limits() reset
    the sample delay and turn dbs_check_cpu() into a function that will
    simply evaluate the load and return the result called dbs_update().

    That function can now be called by governors from the routines that
    previously were pointed to by ->gov_check_cpu and those routines
    can be called directly by each governor instead of dbs_check_cpu().
    This way ->gov_check_cpu becomes unnecessary, so drop it.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Rework the handling of work items by dbs_update_util_handler() and
    dbs_work_handler() so the former (which is executed in scheduler
    paths) only uses atomic operations when absolutely necessary. That
    is, when the policy is shared and dbs_update_util_handler() has
    already decided that this is the time to queue up a work item.

    In particular, this avoids the atomic ops entirely on platforms where
    policy objects are never shared.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The ondemand governor already updates sample_delay_ns immediately on
    updates to the sampling rate, but conservative doesn't do that.

    It was left out earlier as the code was really too complex to get
    that done easily. Things are sorted out very well now, however, and
    the conservative governor can be modified to follow ondemand in that
    respect.

    Moreover, since the code needed to implement that in the
    conservative governor would be identical to the corresponding
    ondemand governor's code, make that code common and change both
    governors to use it.

    Signed-off-by: Viresh Kumar
    Tested-by: Juri Lelli
    Tested-by: Shilpasri G Bhat
    [ rjw: Changelog ]
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • We used to drop policy->rwsem just before calling __cpufreq_governor()
    in some cases earlier and so it was possible that __cpufreq_governor()
    ran concurrently via separate threads for the same policy.

    In order to guarantee valid state transitions for governors,
    'governor_enabled' was required to be protected using some locking
    and cpufreq_governor_lock was added for that.

    But now __cpufreq_governor() is always called under policy->rwsem,
    and 'governor_enabled' is protected against races even without
    cpufreq_governor_lock.

    Get rid of the extra lock now.

    Signed-off-by: Viresh Kumar
    Tested-by: Juri Lelli
    Tested-by: Shilpasri G Bhat
    [ rjw : Changelog ]
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • The dbs_data_mutex lock is currently used in two places. First,
    cpufreq_governor_dbs() uses it to guarantee mutual exclusion between
    invocations of governor operations from the core. Second, it is used by
    ondemand governor's update_sampling_rate() to ensure the stability of
    data structures walked by it.

    The second usage is quite problematic, because update_sampling_rate() is
    called from a governor sysfs attribute's ->store callback and that leads
    to a deadlock scenario involving cpufreq_governor_exit() which runs
    under dbs_data_mutex. Thus it is better to rework the code so
    update_sampling_rate() doesn't need to acquire dbs_data_mutex.

    To that end, rework update_sampling_rate() to walk a list of policy_dbs
    objects supported by the dbs_data one it has been called for (instead of
    walking cpu_dbs_info object for all CPUs). The list manipulation is
    protected with dbs_data->mutex which also is held around the execution
    of update_sampling_rate(), it is not necessary to hold dbs_data_mutex in
    that function any more.

    Reported-by: Juri Lelli
    Reported-by: Shilpasri G Bhat
    Signed-off-by: Viresh Kumar
    [ rjw: Subject & changelog ]
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • The previous commit introduced a new set of macros for creating sysfs
    attributes that represent governor tunables and the old macros used for
    this purpose are not needed any more, so drop them.

    Signed-off-by: Viresh Kumar
    Tested-by: Juri Lelli
    Tested-by: Shilpasri G Bhat
    [ rjw: Subject & changelog ]
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • The ondemand and conservative governors use the global-attr or freq-attr
    structures to represent sysfs attributes corresponding to their tunables
    (which of them is actually used depends on whether or not different
    policy objects can use the same governor with different tunables at the
    same time and, consequently, on where those attributes are located in
    sysfs).

    Unfortunately, in the freq-attr case, the standard cpufreq show/store
    sysfs attribute callbacks are applied to the governor tunable attributes
    and they always acquire the policy->rwsem lock before carrying out the
    operation. That may lead to an ABBA deadlock if governor tunable
    attributes are removed under policy->rwsem while one of them is being
    accessed concurrently (if sysfs attributes removal wins the race, it
    will wait for the access to complete with policy->rwsem held while the
    attribute callback will block on policy->rwsem indefinitely).

    We attempted to address this issue by dropping policy->rwsem around
    governor tunable attributes removal (that is, around invocations of the
    ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
    in cpufreq_set_policy(), but that opened up race conditions that had not
    been possible with policy->rwsem held all the time. Therefore
    policy->rwsem cannot be dropped in cpufreq_set_policy() at any point,
    but the deadlock situation described above must be avoided too.

    To that end, use the observation that in principle governor tunables may
    be represented by the same data type regardless of whether the governor
    is system-wide or per-policy and introduce a new structure, struct
    governor_attr, for representing them and new corresponding macros for
    creating show/store sysfs callbacks for them. Also make their parent
    kobject use a new kobject type whose default show/store callbacks are
    not related to the standard core cpufreq ones in any way (and they don't
    acquire policy->rwsem in particular).

    Signed-off-by: Viresh Kumar
    Tested-by: Juri Lelli
    Tested-by: Shilpasri G Bhat
    [ rjw: Subject & changelog + rebase ]
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • There are a few common tunables shared between the ondemand and
    conservative governors. Move them to struct dbs_data to simplify
    code.

    Signed-off-by: Viresh Kumar
    Tested-by: Juri Lelli
    Tested-by: Shilpasri G Bhat
    [ rjw: Changelog ]
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • Some tunables are present in governor-specific structures, whereas one
    (min_sampling_rate) is located directly in struct dbs_data.

    There is a special macro for creating its sysfs attribute and the
    show/store callbacks, but since more tunables are going to be moved
    to struct dbs_data, a new generic macro for such cases will be useful,
    so add it and use it for min_sampling_rate.

    Signed-off-by: Viresh Kumar
    Tested-by: Juri Lelli
    Tested-by: Shilpasri G Bhat
    [ rjw: Subject & changelog ]
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • The skip_work field in struct policy_dbs_info technically is a
    counter, so give it a new name to reflect that.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The struct policy_dbs_info objects representing per-policy governor
    data are not accessible directly from the corresponding policy
    objects. To access them, one has to get a pointer to the
    struct cpu_dbs_info of policy->cpu and use the policy_dbs field of
    that which isn't really straightforward.

    To address that rearrange the governor data structures so the
    governor_data pointer in struct cpufreq_policy will point to
    struct policy_dbs_info (instead of struct dbs_data) and that will
    contain a pointer to struct dbs_data.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Since policy->cpu is always passed as the second argument to
    dbs_check_cpu(), it is not really necessary to pass it, because
    the function can obtain that value via its first argument just fine.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The struct cpu_common_dbs_info structure represents the per-policy
    part of the governor data (for the ondemand and conservative
    governors), but its name doesn't reflect its purpose.

    Rename it to struct policy_dbs_info and rename variables related to
    it accordingly.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Since it is possible to obtain a pointer to struct dbs_governor
    from a pointer to the struct governor embedded in it with the help
    of container_of(), the additional gov pointer in struct dbs_data
    isn't really necessary.

    Drop that pointer and make the code using it reach the dbs_governor
    object via policy->governor.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Since it is possible to obtain a pointer to struct dbs_governor
    from a pointer to the struct governor embedded in it via
    container_of(), the second argument of cpufreq_governor_init()
    is not necessary. Accordingly, cpufreq_governor_dbs() doesn't
    need its second argument either and the ->governor callbacks
    for both the ondemand and conservative governors may be set
    to cpufreq_governor_dbs() directly. Make that happen.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Saravana Kannan
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The ondemand and conservative governors are represented by
    struct common_dbs_data whose name doesn't reflect the purpose it
    is used for, so rename it to struct dbs_governor and rename
    variables of that type accordingly.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • For the ondemand and conservative governors (generally, governors
    that use the common code in cpufreq_governor.c), there are two static
    data structures representing the governor, the struct governor
    structure (the interface to the cpufreq core) and the struct
    common_dbs_data one (the interface to the cpufreq_governor.c code).

    There's no fundamental reason why those two structures have to be
    separate. Moreover, if the struct governor one is included into
    struct common_dbs_data, it will be possible to reach the latter from
    the policy via its policy->governor pointer, so it won't be necessary
    to pass a separate pointer to it around. For this reason, embed
    struct governor in struct common_dbs_data.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Saravana Kannan
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Every governor relying on the common code in cpufreq_governor.c
    has to provide its own mutex in struct common_dbs_data. However,
    there actually is no need to have a separate mutex per governor
    for this purpose, they may be using the same global mutex just
    fine. Accordingly, introduce a single common mutex for that and
    drop the mutex field from struct common_dbs_data.

    That at least will ensure that the mutex is always present and
    initialized regardless of what the particular governors do.

    Another benefit is that the common code does not need a pointer to
    a governor-related structure to get to the mutex which sometimes
    helps.

    Finally, it makes the code generally easier to follow.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Saravana Kannan
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Instead of using a per-CPU deferrable timer for queuing up governor
    work items, register a utilization update callback that will be
    invoked from the scheduler on utilization changes.

    The sampling rate is still the same as what was used for the
    deferrable timers and the added irq_work overhead should be offset by
    the eliminated timers overhead, so in theory the functional impact of
    this patch should not be significant.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar
    Tested-by: Gautham R. Shenoy

    Rafael J. Wysocki
     

10 Dec, 2015

2 commits

  • It is possible to get rid of the timer_lock spinlock used by the
    governor timer function for synchronization, but a couple of races
    need to be avoided.

    The first race is between multiple dbs_timer_handler() instances
    that may be running in parallel with each other on different
    CPUs. Namely, one of them has to queue up the work item, but it
    cannot be queued up more than once. To achieve that,
    atomic_inc_return() can be used on the skip_work field of
    struct cpu_common_dbs_info.

    The second race is between an already running dbs_timer_handler()
    and gov_cancel_work(). In that case the dbs_timer_handler() might
    not notice the skip_work incrementation in gov_cancel_work() and
    it might queue up its work item after gov_cancel_work() had
    returned (and that work item would corrupt skip_work going
    forward). To prevent that from happening, gov_cancel_work()
    can be made wait for the timer function to complete (on all CPUs)
    right after skip_work has been incremented.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • cpufreq governors evaluate load at sampling rate and based on that they
    update frequency for a group of CPUs belonging to the same cpufreq
    policy.

    This is required to be done in a single thread for all policy->cpus, but
    because we don't want to wakeup idle CPUs to do just that, we use
    deferrable work for this. If we would have used a single delayed
    deferrable work for the entire policy, there were chances that the CPU
    required to run the handler can be in idle and we might end up not
    changing the frequency for the entire group with load variations.

    And so we were forced to keep per-cpu works, and only the one that
    expires first need to do the real work and others are rescheduled for
    next sampling time.

    We have been using the more complex solution until now, where we used a
    delayed deferrable work for this, which is a combination of a timer and
    a work.

    This could be made lightweight by keeping per-cpu deferred timers with a
    single work item, which is scheduled by the first timer that expires.

    This patch does just that and here are important changes:
    - The timer handler will run in irq context and so we need to use a
    spin_lock instead of the timer_mutex. And so a separate timer_lock is
    created. This also makes the use of the mutex and lock quite clear, as
    we know what exactly they are protecting.
    - A new field 'skip_work' is added to track when the timer handlers can
    queue a work. More comments present in code.

    Suggested-by: Rafael J. Wysocki
    Signed-off-by: Viresh Kumar
    Reviewed-by: Ashwin Chaugule
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

07 Dec, 2015

1 commit


26 Sep, 2015

1 commit

  • Conservative governor has its own 'enable' field to check if
    conservative governor is used for a CPU or not

    This can be checked by policy->governor with 'cpufreq_gov_conservative'
    and so this field can be dropped.

    Because its not guaranteed that dbs_info->cdbs.shared will a valid
    pointer for all CPUs (will be NULL for CPUs that don't use
    ondemand/conservative governors), we can't use it anymore. Lets get
    policy with cpufreq_cpu_get_raw() instead.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar