01 Feb, 2017

1 commit

  • Kernel CPU stats are stored in cputime_t, which is an
    architecture-defined type and hence a bit opaque, requiring accessors
    and mutators for any operation.

    Converting them to nsecs simplifies the code and is one step toward
    the removal of cputime_t in the core code.
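
    As a rough illustration (a hedged sketch, not the actual patch: the
    stats structure and field names are invented for the example, while
    cputime_to_nsecs() is a real kernel helper), the difference looks
    like this:

        /* Before: cputime_t is opaque, so a consumer that wants
         * nanoseconds has to go through a conversion helper. */
        u64 stime_ns = cputime_to_nsecs(stats->stime);

        /* After: the stat is stored in nanoseconds directly, so plain
         * u64 arithmetic works and no accessor is needed. */
        u64 stime_ns = stats->stime;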

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

17 Nov, 2016

1 commit

  • The conservative governor changes the CPU frequency in steps.
    That means that if a CPU runs at max frequency, it will need several
    sampling periods to return to min frequency when the workload
    is finished.

    If the update function that calculates the load and target frequency
    is deferred, the governor might need even more time to decrease the
    frequency.

    This may have an impact on power consumption; after all, conservative
    should decrease the frequency at every sampling period if there is
    no workload.

    To resolve the above issue, calculate the number of sampling periods
    for which the update was deferred. Considering that for each of those
    sampling periods conservative should have dropped the frequency by a
    freq_step (because the CPU was idle), apply the proper subtraction to
    the requested frequency, as in the sketch below.
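
    A minimal sketch of that adjustment, assuming illustrative names
    (delta_time, sampling_rate, freq_step, requested_freq, policy) taken
    from the description rather than from the literal patch:

        unsigned int idle_periods = delta_time / sampling_rate;

        if (idle_periods > 1) {
                unsigned int drop = (idle_periods - 1) * freq_step;

                /* One freq_step per idle sampling period, clamped to
                 * the minimum frequency of the policy. */
                if (requested_freq > policy->min + drop)
                        requested_freq -= drop;
                else
                        requested_freq = policy->min;
        }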

    Below are the kernel traces with and without this patch. First, an
    intensive workload is applied on a specific CPU. Then the workload
    is removed and the CPU goes idle.

    WITHOUT

    <idle>-0 [007] dN.. 620.329153: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 620.350857: cpu_frequency: state=1700000 cpu_id=7
    kworker/7:2-556 [007] .... 620.370856: cpu_frequency: state=1900000 cpu_id=7
    kworker/7:2-556 [007] .... 620.390854: cpu_frequency: state=2100000 cpu_id=7
    kworker/7:2-556 [007] .... 620.411853: cpu_frequency: state=2200000 cpu_id=7
    kworker/7:2-556 [007] .... 620.432854: cpu_frequency: state=2400000 cpu_id=7
    kworker/7:2-556 [007] .... 620.453854: cpu_frequency: state=2600000 cpu_id=7
    kworker/7:2-556 [007] .... 620.494856: cpu_frequency: state=2900000 cpu_id=7
    kworker/7:2-556 [007] .... 620.515856: cpu_frequency: state=3100000 cpu_id=7
    kworker/7:2-556 [007] .... 620.536858: cpu_frequency: state=3300000 cpu_id=7
    kworker/7:2-556 [007] .... 620.557857: cpu_frequency: state=3401000 cpu_id=7
    <idle>-0 [007] d... 669.591363: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 669.591939: cpu_idle: state=4294967295 cpu_id=7
    <idle>-0 [007] d... 669.591980: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] dN.. 669.591989: cpu_idle: state=4294967295 cpu_id=7
    ...
    <idle>-0 [007] d... 670.201224: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 670.221975: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 670.222016: cpu_frequency: state=3300000 cpu_id=7
    <idle>-0 [007] d... 670.222026: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 670.234964: cpu_idle: state=4294967295 cpu_id=7
    ...
    <idle>-0 [007] d... 670.801251: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 671.236046: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 671.236073: cpu_frequency: state=3100000 cpu_id=7
    <idle>-0 [007] d... 671.236112: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 671.393437: cpu_idle: state=4294967295 cpu_id=7
    ...
    <idle>-0 [007] d... 671.401277: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 671.404083: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 671.404111: cpu_frequency: state=2900000 cpu_id=7
    <idle>-0 [007] d... 671.404125: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 671.404974: cpu_idle: state=4294967295 cpu_id=7
    ...
    <idle>-0 [007] d... 671.501180: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 671.995414: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 671.995459: cpu_frequency: state=2800000 cpu_id=7
    <idle>-0 [007] d... 671.995469: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 671.996287: cpu_idle: state=4294967295 cpu_id=7
    ...
    <idle>-0 [007] d... 672.001305: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 672.078374: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 672.078410: cpu_frequency: state=2600000 cpu_id=7
    <idle>-0 [007] d... 672.078419: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 672.158020: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 672.158040: cpu_frequency: state=2400000 cpu_id=7
    <idle>-0 [007] d... 672.158044: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 672.160038: cpu_idle: state=4294967295 cpu_id=7
    ...
    <idle>-0 [007] d... 672.234557: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 672.237121: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 672.237174: cpu_frequency: state=2100000 cpu_id=7
    <idle>-0 [007] d... 672.237186: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 672.237778: cpu_idle: state=4294967295 cpu_id=7
    ...
    <idle>-0 [007] d... 672.267902: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 672.269860: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 672.269906: cpu_frequency: state=1900000 cpu_id=7
    <idle>-0 [007] d... 672.269914: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 672.271902: cpu_idle: state=4294967295 cpu_id=7
    ...
    <idle>-0 [007] d... 672.751342: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 672.823056: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-556 [007] .... 672.823095: cpu_frequency: state=1600000 cpu_id=7

    WITH

    <idle>-0 [007] dN.. 4380.928009: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-399 [007] .... 4380.949767: cpu_frequency: state=2000000 cpu_id=7
    kworker/7:2-399 [007] .... 4380.969765: cpu_frequency: state=2200000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.009766: cpu_frequency: state=2500000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.029767: cpu_frequency: state=2600000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.049769: cpu_frequency: state=2800000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.069769: cpu_frequency: state=3000000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.089771: cpu_frequency: state=3100000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.109772: cpu_frequency: state=3400000 cpu_id=7
    kworker/7:2-399 [007] .... 4381.129773: cpu_frequency: state=3401000 cpu_id=7
    <idle>-0 [007] d... 4428.226159: cpu_idle: state=1 cpu_id=7
    <idle>-0 [007] d... 4428.226176: cpu_idle: state=4294967295 cpu_id=7
    <idle>-0 [007] d... 4428.226181: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 4428.227177: cpu_idle: state=4294967295 cpu_id=7
    ...
    <idle>-0 [007] d... 4428.551640: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 4428.649239: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-399 [007] .... 4428.649268: cpu_frequency: state=2800000 cpu_id=7
    <idle>-0 [007] d... 4428.649278: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 4428.689856: cpu_idle: state=4294967295 cpu_id=7
    ...
    <idle>-0 [007] d... 4428.799542: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 4428.801683: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-399 [007] .... 4428.801748: cpu_frequency: state=1700000 cpu_id=7
    <idle>-0 [007] d... 4428.801761: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 4428.806545: cpu_idle: state=4294967295 cpu_id=7
    ...
    <idle>-0 [007] d... 4429.051880: cpu_idle: state=4 cpu_id=7
    <idle>-0 [007] d... 4429.086240: cpu_idle: state=4294967295 cpu_id=7
    kworker/7:2-399 [007] .... 4429.086293: cpu_frequency: state=1600000 cpu_id=7

    Without the patch, the CPU dropped to the min frequency after 3.2s.
    With the patch applied, the CPU dropped to the min frequency after
    0.86s.

    Signed-off-by: Stratos Karafotis
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Stratos Karafotis
     

11 Nov, 2016

1 commit

  • The earlier implementation of governors used background timers, so
    functions, mutexes, etc. had the 'timer' keyword in their names.

    But that's not true anymore. Replace 'timer' with 'update', as those
    functions and variables are based around updates to the frequency.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

17 Aug, 2016

1 commit

  • It is useful to know the reason why cpufreq_update_util() has just
    been called, and that can be passed as flags to cpufreq_update_util()
    and to the ->func() callback in struct update_util_data. However,
    doing that in addition to passing the util and max arguments they
    already take would be clumsy, so avoid it.

    Instead, use the observation that the schedutil governor is part
    of the scheduler proper, so it can access scheduler data directly.
    This allows the util and max arguments of cpufreq_update_util()
    and the ->func() callback in struct update_util_data to be replaced
    with a flags one, but schedutil has to be modified to follow.

    Thus make the schedutil governor obtain the CFS utilization
    information from the scheduler and use the "RT" and "DL" flags
    instead of the special utilization value of ULONG_MAX to track
    updates from the RT and DL sched classes. Make it non-modular
    too to avoid having to export scheduler variables to modules at
    large.

    Next, update all of the other users of cpufreq_update_util()
    and the ->func() callback in struct update_util_data accordingly.
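
    Roughly, the hook signature changes as sketched below (a simplified
    illustration of the interface described above, not necessarily the
    verbatim kernel prototypes):

        /* Before: the scheduler pushed utilization data into the hook. */
        void (*func)(struct update_util_data *data, u64 time,
                     unsigned long util, unsigned long max);

        /* After: only a flags word identifying the reason (e.g. an RT
         * or DL class update) is passed; schedutil reads the CFS
         * utilization from the scheduler's data structures directly. */
        void (*func)(struct update_util_data *data, u64 time,
                     unsigned int flags);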

    Suggested-by: Peter Zijlstra
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     

10 Jun, 2016

1 commit

  • There's no reason for gov_cancel_work() to exist at all, as it only
    has one caller and the only thing done by that caller is to invoke
    gov_cancel_work().

    Accordingly, drop gov_cancel_work() and move its contents to the
    caller.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     

03 Jun, 2016

4 commits

  • The 'initialized' field in struct cpufreq_governor is only used by
    the conservative governor (as a usage counter) and the way that
    happens is far from straightforward and arguably incorrect.

    Namely, the value of 'initialized' is checked by
    cpufreq_dbs_governor_init() and cpufreq_dbs_governor_exit() and
    the results of those checks are passed (as the second argument) to
    the ->init() and ->exit() callbacks in struct dbs_governor. Those
    callbacks are only implemented by the ondemand and conservative
    governors and ondemand doesn't use their second argument at all.
    In turn, the conservative governor uses it to decide whether or not
    to either register or unregister a transition notifier.

    That whole mechanism is not only unnecessarily convoluted, but also
    racy, because the 'initialized' field of struct cpufreq_governor is
    updated in cpufreq_init_governor() and cpufreq_exit_governor() under
    policy->rwsem which doesn't help if one of these functions is run
    twice in parallel for different policies (which isn't impossible in
    principle), for example.

    Instead of it, add a proper usage counter to the conservative
    governor and update it from cs_init() and cs_exit() which is
    guaranteed to be non-racy, as those functions are only called
    under gov_dbs_data_mutex which is global.

    With that in place, drop the 'initialized' field from struct
    cpufreq_governor as it is not used any more.
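
    A hedged sketch of the counting scheme (the helper names cs_get()
    and cs_put() are illustrative; the real updates happen in cs_init()
    and cs_exit()):

        static struct notifier_block cs_cpufreq_notifier_block; /* set up elsewhere */
        static unsigned int cs_usage_count; /* protected by gov_dbs_data_mutex */

        static void cs_get(void)
        {
                /* Register the transition notifier for the first user only. */
                if (!cs_usage_count++)
                        cpufreq_register_notifier(&cs_cpufreq_notifier_block,
                                                  CPUFREQ_TRANSITION_NOTIFIER);
        }

        static void cs_put(void)
        {
                /* Unregister when the last user goes away. */
                if (!--cs_usage_count)
                        cpufreq_unregister_notifier(&cs_cpufreq_notifier_block,
                                                    CPUFREQ_TRANSITION_NOTIFIER);
        }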

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Create a new helper to avoid code duplication across governors.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • pr_*() helpers already prefix the print messages with
    "cpufreq_governor:", so similar details aren't required in the
    actual message.

    For example, the print message getting fixed looks like this before this
    patch:

    cpufreq_governor: cpufreq: Governor initialization failed (dbs_data kobject init error 0)
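
    The mechanism behind this is the kernel's pr_fmt() convention: when a
    file defines pr_fmt() before its includes, every pr_*() call in it
    gets that prefix automatically, so repeating subsystem names in the
    message body only produces duplication. A sketch of the idiom (the
    exact prefix string used by this file is an assumption):

        #define pr_fmt(fmt) "cpufreq_governor: " fmt

        /* With the define above, this prints
         * "cpufreq_governor: Governor initialization failed ...",
         * so no extra "cpufreq:" prefix belongs in the string itself. */
        pr_err("Governor initialization failed (error %d)\n", ret);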

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • The design of the cpufreq governor API is not very straightforward,
    as struct cpufreq_governor provides only one callback to be invoked
    from different code paths for different purposes. The purpose it is
    invoked for is determined by its second "event" argument, causing it
    to act as a "callback multiplexer" of sorts.

    Unfortunately, that leads to extra complexity in governors, some of
    which implement the ->governor() callback as a switch statement
    that simply checks the event argument and invokes a separate function
    to handle that specific event.

    That extra complexity can be eliminated by replacing the all-purpose
    ->governor() callback with a family of callbacks to carry out specific
    governor operations: initialization and exit, start and stop and policy
    limits updates. That also turns out to reduce the code size too, so
    do it.
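
    Schematically (a simplified sketch of the callback split described
    above):

        /* Before: one entry point, multiplexed on an event code. */
        int (*governor)(struct cpufreq_policy *policy, unsigned int event);

        /* After: one callback per governor operation. */
        int  (*init)(struct cpufreq_policy *policy);
        void (*exit)(struct cpufreq_policy *policy);
        int  (*start)(struct cpufreq_policy *policy);
        void (*stop)(struct cpufreq_policy *policy);
        void (*limits)(struct cpufreq_policy *policy);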

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     

06 May, 2016

1 commit

  • As reported in kernel Bugzilla entry 69821:

    "With CONFIG_HZ_PERIODIC=y the CPU stays at the lowest frequency of
    800MHz even if usage goes to 100%; the frequency does not scale up.
    The governor in use is ondemand, and conservative doesn't work
    either. The performance and userspace governors work as expected.

    With CONFIG_NO_HZ_IDLE or CONFIG_NO_HZ_FULL the CPU scales up with
    ondemand as expected."

    Analysis carried out by Chen Yu leads to the conclusion that the
    observed issue is due to idle_time in dbs_update() representing a
    negative number in which case the function will return 0 as the load
    (unless load is greater than 0 for another CPU sharing the policy),
    although that need not be the right choice.

    Indeed, idle_time representing a negative number means that during
    the last sampling interval the CPU was almost 100% busy on the rough
    average, so 100 should be returned as the load in that case.

    Modify the code accordingly and rearrange it to clarify the handling
    of all of the special cases in it. While at it, also avoid returning
    zero as the load if time_elapsed is 0 (it doesn't really make sense
    to return 0 then).
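
    A hedged sketch of the resulting special-case handling in dbs_update()
    (simplified; the variable names follow the description above):

        unsigned int load;

        if (unlikely(!time_elapsed)) {
                /* No time elapsed: returning 0 makes no sense, so fall
                 * back to the previously observed load. */
                load = j_cdbs->prev_load;
        } else if (unlikely((int)idle_time < 0)) {
                /* idle_time represents a negative number: the CPU was
                 * close to 100% busy on the rough average. */
                load = 100;
        } else {
                if (idle_time > time_elapsed)
                        idle_time = time_elapsed;
                load = 100 * (time_elapsed - idle_time) / time_elapsed;
        }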

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=69821
    Tested-by: Chen Yu
    Tested-by: Timo Valtoaho
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     

28 Apr, 2016

1 commit

  • The name of the prev_cpu_wall field in struct cpu_dbs_info is
    confusing, because it doesn't represent wall time, but the previous
    update time as returned by get_cpu_idle_time() (that may be the
    current value of jiffies_64 in some cases, for example).

    Moreover, the names of some related variables in dbs_update() take
    that confusion further.

    Rename all of those things to make their names reflect the purpose
    more accurately. While at it, drop unnecessary parens from one of
    the updated expressions.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar
    Acked-by: Chen Yu

    Rafael J. Wysocki
     

25 Apr, 2016

3 commits

  • The way cpufreq_governor_start() initializes j_cdbs->prev_load is
    questionable.

    First off, j_cdbs->prev_cpu_wall, used as a denominator in the
    computation, may be zero. That happens when get_cpu_idle_time_us()
    returns -1 and get_cpu_idle_time_jiffy(), used to compute that number
    instead, is called exactly at the jiffies_64 wrap time. It is rather
    hard to trigger that error, but it is not impossible, and it will
    just crash the kernel then.

    Second, j_cdbs->prev_load is computed as the average load during
    the entire time since the system started and it may not reflect the
    load in the previous sampling period (as it is expected to).
    That doesn't play well with the way dbs_update() uses that value.
    Namely, if the update time delta (wall_time) happens to be greater
    than twice the sampling rate on the first invocation of it, the
    initial value of j_cdbs->prev_load (which may be completely off) will
    be returned to the caller as the current load (unless it is equal to
    zero and unless another CPU sharing the same policy object has a
    greater load value).

    For this reason, notice that the prev_load field of struct cpu_dbs_info
    is only used by dbs_update() and only in that one place, so if
    cpufreq_governor_start() is modified to always initialize it to 0,
    it will make dbs_update() always compute the actual load the first time
    it checks the update time delta against the doubled sampling rate
    (after initialization) and there won't be any side effects of it.

    Consequently, modify cpufreq_governor_start() as described.
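
    Sketched, the modified initialization (a hedged approximation;
    get_cpu_idle_time() is the real helper, while the field names follow
    the neighbouring entries):

        /* For each CPU j belonging to the policy: */
        j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_update_time,
                                                  io_busy);
        /* Force a real load computation on the first dbs_update() pass
         * instead of seeding an average over the whole uptime. */
        j_cdbs->prev_load = 0;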

    Fixes: 18b46abd0009 (cpufreq: governor: Be friendly towards latency-sensitive bursty workloads)
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Rafael J. Wysocki
     
  • Revert commit 0df35026c6a5 (cpufreq: governor: Fix negative idle_time
    when configured with CONFIG_HZ_PERIODIC) that introduced a regression
    by causing the ondemand cpufreq governor to misbehave for
    CONFIG_TICK_CPU_ACCOUNTING unset (the frequency goes up to the max at
    one point and stays there indefinitely).

    The revert takes subsequent modifications of the code in question into
    account.

    Fixes: 0df35026c6a5 (cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC)
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=115261
    Reported-and-tested-by: Timo Valtoaho
    Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     

02 Apr, 2016

3 commits

  • Move abstract code related to struct gov_attr_set to a separate (new)
    file so it can be shared with (future) governors that won't share
    more code with "ondemand" and "conservative".

    No intentional functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • In addition to fields representing governor tunables, struct dbs_data
    contains some fields needed for the management of objects of that
    type. As it turns out, that part of struct dbs_data may be shared
    with (future) governors that won't use the common code used by
    "ondemand" and "conservative", so move it to a separate struct type
    and modify the code using struct dbs_data to follow.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Replace the single helper for adding and removing cpufreq utilization
    update hooks, cpufreq_set_update_util_data(), with a pair of helpers,
    cpufreq_add_update_util_hook() and cpufreq_remove_update_util_hook(),
    and modify the users of cpufreq_set_update_util_data() accordingly.

    With the new helpers, the code using them doesn't need to worry
    about the internals of struct update_util_data and in particular
    it doesn't need to worry about populating the func field in it
    properly upfront.
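
    The new pair roughly looks like this (prototypes reconstructed from
    the description above; hedged rather than verbatim):

        /* Install a utilization-update callback for a CPU; the helper
         * populates data->func itself before publishing the pointer. */
        void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
                        void (*func)(struct update_util_data *data, u64 time,
                                     unsigned long util, unsigned long max));

        /* Clear the per-CPU hook pointer again. */
        void cpufreq_remove_update_util_hook(int cpu);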

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar
    Acked-by: Peter Zijlstra (Intel)

    Rafael J. Wysocki
     

23 Mar, 2016

1 commit

  • Modify dbs_irq_work() to always schedule the process-context work
    on the current CPU which also ran the dbs_update_util_handler()
    that the irq_work being handled came from.

    This causes the entire frequency update handling (involving the
    "ondemand" or "conservative" governors) to be carried out by the
    CPU whose frequency is to be updated and reduces the overall amount
    of inter-CPU noise related to cpufreq.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

11 Mar, 2016

1 commit

  • Create cpufreq.c under kernel/sched/ and move the cpufreq code
    related to the scheduler to that file and to sched.h.

    Redefine cpufreq_update_util() as a static inline function to avoid
    function calls at its call sites in the scheduler code (as suggested
    by Peter Zijlstra).

    Also move the definition of struct update_util_data and declaration
    of cpufreq_set_update_util_data() from include/linux/cpufreq.h to
    include/linux/sched.h.
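
    In outline, the static inline looks roughly like this (a hedged
    sketch consistent with the 09 Mar entries below; the per-CPU pointer
    name is an assumption):

        static inline void cpufreq_update_util(u64 time, unsigned long util,
                                               unsigned long max)
        {
                struct update_util_data *data;

                rcu_read_lock();
                data = rcu_dereference(*this_cpu_ptr(&cpufreq_update_util_data));
                /* Both the pointer and its func field may be unset. */
                if (data && data->func)
                        data->func(data, time, util, max);
                rcu_read_unlock();
        }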

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)

    Rafael J. Wysocki
     

09 Mar, 2016

20 commits

  • Use the observation that cpufreq_update_util() is only called
    by the scheduler with rq->lock held, so the callers of
    cpufreq_set_update_util_data() can use synchronize_sched()
    instead of synchronize_rcu() to wait for cpufreq_update_util()
    to complete. Moreover, if they are updated to do that,
    rcu_read_(un)lock() calls in cpufreq_update_util() might be
    replaced with rcu_read_(un)lock_sched(), respectively, but
    those aren't really necessary, because the scheduler calls
    that function from RCU-sched read-side critical sections
    already.

    In addition to that, if cpufreq_set_update_util_data() checks
    the func field in the struct update_util_data before setting
    the per-CPU pointer to it, the data->func check may be dropped
    from cpufreq_update_util() as well.

    Make the above changes to reduce the overhead from
    cpufreq_update_util() in the scheduler paths invoking it
    and to make the cleanup after removing its callbacks somewhat
    less heavy-weight.
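
    A hedged sketch of the resulting setter and wait sequence (names as
    in the neighbouring entries):

        void cpufreq_set_update_util_data(int cpu, struct update_util_data *data)
        {
                /* Validate func upfront so the hot path need not. */
                if (WARN_ON(data && !data->func))
                        return;

                rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data);
        }

        /* A caller that has just cleared the pointer waits for in-flight
         * callbacks with synchronize_sched() rather than synchronize_rcu(),
         * because the scheduler invokes the hook with rq->lock held, i.e.
         * inside an RCU-sched read-side critical section. */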

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar
    Acked-by: Peter Zijlstra (Intel)

    Rafael J. Wysocki
     
  • The show() and store() routines in the cpufreq-governor core don't need
    to check if the struct governor_attr they want to use really provides
    the callbacks they need as expected (if that's not the case, it means a
    bug in the code anyway), so change them to avoid doing that.

    Also, change the error value to -EBUSY if the governor is getting
    removed and we aren't allowed to store any more changes.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • There is a scenario that may lead to undesired results in
    dbs_update_util_handler(). Namely, if two CPUs sharing a policy
    enter the function at the same time, pass the sample delay check
    and then one of them is stalled until dbs_work_handler() (queued
    up by the other CPU) clears the work counter, it may update the
    work counter and queue up another work item prematurely.

    To prevent that from happening, use the observation that the CPU
    queuing up a work item in dbs_update_util_handler() updates the
    last sample time. This means that if another CPU was stalling after
    passing the sample delay check and now successfully updated the work
    counter as a result of the race described above, it will see the new
    value of the last sample time which is different from what it used in
    the sample delay check before. If that happens, the sample delay
    check passed previously is not valid any more, so the CPU should not
    continue.
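
    Sketched (hedged; the field names follow this patch series), the
    re-check in dbs_update_util_handler() after claiming the work counter:

        /* Snapshot taken before the sample delay check. */
        u64 lst = READ_ONCE(policy_dbs->last_sample_time);

        /* ... the sample delay check based on lst passes ... */

        if (policy_dbs->is_shared) {
                /* Only one CPU sharing the policy may claim the work item. */
                if (!atomic_add_unless(&policy_dbs->work_count, 1, 1))
                        return;

                /* If another CPU updated last_sample_time in the meantime,
                 * the sample delay check above is stale: clear the counter
                 * and bail out. */
                if (unlikely(lst != READ_ONCE(policy_dbs->last_sample_time))) {
                        atomic_set(&policy_dbs->work_count, 0);
                        return;
                }
        }
        policy_dbs->last_sample_time = now;
        irq_work_queue(&policy_dbs->irq_work);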

    Fixes: f17cbb53783c (cpufreq: governor: Avoid atomic operations in hot paths)
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The gov_set_update_util() routine is only used internally by the
    common governor code and it doesn't need to be exported, so make
    it static.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Since cpufreq_governor_dbs() is now always called with policy->rwsem
    held, it cannot be executed twice in parallel for the same policy.
    Thus it is not necessary to hold dbs_data_mutex around the invocations
    of cpufreq_governor_start/stop/limits() from it as those functions
    never modify any data that can be shared between different policies.

    However, cpufreq_governor_dbs() may be executed twice in parallel
    for different policies using the same gov->gdbs_data object and
    dbs_data_mutex is still necessary to protect that object against
    concurrent updates.

    For this reason, narrow down the dbs_data_mutex locking to
    cpufreq_governor_init/exit() where it is needed and rename the
    mutex to gov_dbs_data_mutex to reflect its purpose.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • That mutex is only used by cpufreq_governor_dbs() and it doesn't
    need to be exported to modules, so make it static and drop the
    export incantation.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • After previous changes there is only one piece of code in the
    ondemand governor making references to per-CPU data structures,
    but it can be easily modified to avoid doing that, so modify it
    accordingly and move the definition of per-CPU data used by the
    ondemand and conservative governors to the common code. Next,
    change that code to access the per-CPU data structures directly
    rather than via a governor callback.

    This causes the ->get_cpu_cdbs governor callback to become
    unnecessary, so drop it along with the macro and function
    definitions related to it.

    Finally, drop the definitions of struct od_cpu_dbs_info_s and
    struct cs_cpu_dbs_info_s that aren't necessary any more.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Some fields in struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s
    are only used for a limited set of CPUs. Namely, if a policy is
    shared between multiple CPUs, those fields will only be used for one
    of them (policy->cpu). This means that they really are per-policy
    rather than per-CPU and holding room for them in per-CPU data
    structures is generally wasteful. Also moving those fields into
    per-policy data structures will allow some significant simplifications
    to be made going forward.

    For this reason, introduce struct cs_policy_dbs_info and
    struct od_policy_dbs_info to hold those fields. Define each of the
    new structures as an extension of struct policy_dbs_info (such that
    struct policy_dbs_info is embedded in each of them) and introduce
    new ->alloc and ->free governor callbacks to allocate and free
    those structures, respectively, such that ->alloc() will return
    a pointer to the struct policy_dbs_info embedded in the allocated
    data structure and ->free() will take that pointer as its argument.

    With that, modify the code accessing the data fields in question
    in per-CPU data objects to look for them in the new structures
    via the struct policy_dbs_info pointer available to it and drop
    them from struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s.
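
    A hedged sketch of the embedding pattern for the conservative governor
    (the ondemand variant is analogous; the extra field names are
    illustrative):

        struct cs_policy_dbs_info {
                struct policy_dbs_info policy_dbs;
                unsigned int down_skip;
                unsigned int requested_freq;
        };

        static inline struct cs_policy_dbs_info *
        to_dbs_info(struct policy_dbs_info *policy_dbs)
        {
                return container_of(policy_dbs, struct cs_policy_dbs_info,
                                    policy_dbs);
        }

        /* ->alloc() hands the core a pointer to the embedded member... */
        static struct policy_dbs_info *cs_alloc(void)
        {
                struct cs_policy_dbs_info *dbs_info;

                dbs_info = kzalloc(sizeof(*dbs_info), GFP_KERNEL);
                return dbs_info ? &dbs_info->policy_dbs : NULL;
        }

        /* ...and ->free() maps it back to the containing structure. */
        static void cs_free(struct policy_dbs_info *policy_dbs)
        {
                kfree(to_dbs_info(policy_dbs));
        }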

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The ->store() callbacks of some tunable sysfs attributes of the
    ondemand and conservative governors trigger immediate updates of
    the CPU load information for all CPUs "governed" by the given
    dbs_data by walking the cpu_dbs_info structures for all online
    CPUs in the system and updating them.

    This is questionable for two reasons. First, it may lead to a lot of
    extra overhead on a system with many CPUs if the given dbs_data is
    only associated with a few of them. Second, if governor tunables are
    per-policy, the CPUs associated with the other sets of governor
    tunables should not be updated.

    To address this issue, use the observation that in all of the places
    in question the update operation may be carried out in the same way
    (because all of the tunables involved are now located in struct
    dbs_data and readily available to the common code) and make the
    code in those places invoke the same (new) helper function that
    will carry out the update correctly.

    That new function always checks the ignore_nice_load tunable value
    and updates the CPUs' prev_cpu_nice data fields if that's set, which
    wasn't done by the original code in store_io_is_busy(), but it
    should have been done in there too.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • To avoid having to check the governor type explicitly in the common
    code in order to initialize data structures specific to the governor
    type properly, add a ->start callback to struct dbs_governor and
    use it to initialize those data structures for the ondemand and
    conservative governors.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The io_is_busy governor tunable is only used by the ondemand governor
    and is located in the ondemand-specific data structure, but it is
    looked at by the common governor code that has to do ugly things to
    get to that value, so move it to struct dbs_data and modify ondemand
    accordingly.

    Since the conservative governor never touches that field, it will
    be always 0 for that governor and it won't have any effect on the
    results of computations in that case.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • It is possible for a dbs_data object to be updated after its
    usage counter has become 0. That may happen if governor_store()
    runs (via a governor tunable sysfs attribute write) in parallel
    with cpufreq_governor_exit() called for the last cpufreq policy
    associated with the dbs_data in question. In that case, if
    governor_store() acquires dbs_data->mutex right after
    cpufreq_governor_exit() has released it, the ->store() callback
    invoked by it may operate on dbs_data with no users. Although
    sysfs will cause the kobject_put() in cpufreq_governor_exit() to
    block until governor_store() has returned, that situation may
    lead to some unexpected results, depending on the implementation
    of the ->store callback, and therefore it should be avoided.

    To that end, modify governor_store() to check the dbs_data's
    usage count before invoking the ->store() callback and return
    an error if it is 0 at that point.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Do not convert microseconds to jiffies and the other way around
    in governor computations related to the sampling rate and sample
    delay and drop delay_for_sampling_rate() which isn't of any use
    then.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The rate_mult field in struct od_cpu_dbs_info_s is used by the code
    shared with the conservative governor and to access it that code
    has to do an ugly governor type check. However, first of all it
    is only ever used for policy->cpu, so it is per-policy rather than
    per-CPU and second, it is initialized to 1 by cpufreq_governor_start(),
    so if the conservative governor never modifies it, it will have no
    effect on the results of any computations.

    For these reasons, move rate_mult to struct policy_dbs_info (as a
    common field).

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • If store_sampling_rate() updates the sample delay when the ondemand
    governor is in the middle of its high/low dance (OD_SUB_SAMPLE sample
    type is set), the governor will still do the bottom half of the
    previous sample which may take too much time.

    To prevent that from happening, change store_sampling_rate() to always
    reset the sample delay to 0 which also is consistent with the new
    behavior of cpufreq_governor_limits().

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The way the ->gov_check_cpu governor callback is used by the ondemand
    and conservative governors is not really straightforward. Namely, the
    governor calls dbs_check_cpu(), which updates the load information for
    the policy, and then invokes ->gov_check_cpu() for the governor.

    To get rid of that entanglement, notice that cpufreq_governor_limits()
    doesn't need to call dbs_check_cpu() directly. Instead, it can simply
    reset the sample delay to 0 which will cause a sample to be taken
    immediately. The result of that is practically equivalent to calling
    dbs_check_cpu() except that it will trigger a full update of governor
    internal state and not just the ->gov_check_cpu() part.

    Following that observation, make cpufreq_governor_limits() reset
    the sample delay and turn dbs_check_cpu() into a function, called
    dbs_update(), that will simply evaluate the load and return the
    result.

    That function can now be called by governors from the routines that
    previously were pointed to by ->gov_check_cpu and those routines
    can be called directly by each governor instead of dbs_check_cpu().
    This way ->gov_check_cpu becomes unnecessary, so drop it.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Clean up some load-related computations in dbs_check_cpu() and
    cpufreq_governor_start() to get rid of unnecessary operations and
    type casts and make the code easier to read.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The contribution of the CPU nice time to the idle time in dbs_check_cpu()
    is computed in a bogus way, as the code may subtract current and previous
    nice values for different CPUs.

    That doesn't matter for cases when cpufreq policies are not shared,
    but may lead to problems otherwise.

    Fix the computation and simplify it to avoid taking unnecessary steps.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Rework the handling of work items by dbs_update_util_handler() and
    dbs_work_handler() so the former (which is executed in scheduler
    paths) only uses atomic operations when absolutely necessary. That
    is, when the policy is shared and dbs_update_util_handler() has
    already decided that this is the time to queue up a work item.

    In particular, this avoids the atomic ops entirely on platforms where
    policy objects are never shared.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The atomic work counter incrementation in gov_cancel_work() is not
    necessary any more, because work items won't be queued up after
    gov_clear_update_util() anyway, so drop it along with the comment
    about how it may be missed by gov_clear_update_util().

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki