25 Mar, 2016

2 commits

  • Pull more power management and ACPI updates from Rafael Wysocki:
    "The second batch of power management and ACPI updates for v4.6.

    Included are fixups on top of the previous PM/ACPI pull request and
    other material that didn't make it into that request but still should
    go into 4.6.

    Among other things, there's a fix for an intel_pstate driver issue
    uncovered by recent cpufreq changes, a workaround for a boot hang on
    Skylake-H related to the handling of deep C-states by the platform and
    a PCI/ACPI fix for the handling of IO port resources on non-x86
    architectures plus some new device IDs and similar.

    Specifics:

    - Fix for an intel_pstate driver issue related to the handling of MSR
    updates uncovered by the recent cpufreq rework (Rafael Wysocki).

    - cpufreq core cleanups related to starting governors and frequency
    synchronization during resume from system suspend and a locking fix
    for cpufreq_quick_get() (Rafael Wysocki, Richard Cochran).

    - acpi-cpufreq and powernv cpufreq driver updates (Jisheng Zhang,
    Michael Neuling, Richard Cochran, Shilpasri Bhat).

    - intel_idle driver update preventing some Skylake-H systems from
    hanging during initialization by disabling deep C-states mishandled
    by the platform in the problematic configurations (Len Brown).

    - Intel Xeon Phi Processor x200 support for intel_idle
    (Dasaratharaman Chandramouli).

    - cpuidle menu governor updates to make it always honor PM QoS
    latency constraints (and prevent C1 from being used as the fallback
    C-state on x86 when they are set below its exit latency) and to
    restore the previous behavior of falling back to C1 if the next
    timer event is set far enough in the future; that behavior was
    changed in 4.4, which led to an energy consumption regression (Rik
    van Riel, Rafael Wysocki).

    - New device ID for a future AMD UART controller in the ACPI driver
    for AMD SoCs (Wang Hongcheng).

    - Rockchip rk3399 support for the rockchip-io-domain adaptive voltage
    scaling (AVS) driver (David Wu).

    - ACPI PCI resources management fix for the handling of IO space
    resources on architectures where the IO space is memory mapped
    (IA64 and ARM64) broken by the introduction of common ACPI
    resources parsing for PCI host bridges in 4.4 (Lorenzo Pieralisi).

    - Fix for the ACPI backend of the generic device properties API to
    make it parse non-device (data node only) children of an ACPI
    device correctly (Irina Tirdea).

    - Fixes for the handling of global suspend flags (introduced in 4.4)
    during hibernation and resume from it (Lukas Wunner).

    - Support for obtaining configuration information from Device Trees
    in the PM clocks framework (Jon Hunter).

    - ACPI _DSM helper code and devfreq framework cleanups (Colin Ian
    King, Geert Uytterhoeven)"

    * tag 'pm+acpi-4.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
    PM / AVS: rockchip-io: add io selectors and supplies for rk3399
    intel_idle: Support for Intel Xeon Phi Processor x200 Product Family
    intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled
    ACPI / PM: Runtime resume devices when waking from hibernate
    PM / sleep: Clear pm_suspend_global_flags upon hibernate
    cpufreq: governor: Always schedule work on the CPU running update
    cpufreq: Always update current frequency before startig governor
    cpufreq: Introduce cpufreq_update_current_freq()
    cpufreq: Introduce cpufreq_start_governor()
    cpufreq: powernv: Add sysfs attributes to show throttle stats
    cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access static
    PCI: ACPI: IA64: fix IO port generic range check
    ACPI / util: cast data to u64 before shifting to fix sign extension
    cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path
    cpuidle: menu: Fall back to polling if next timer event is near
    cpufreq: acpi-cpufreq: Clean up hot plug notifier callback
    intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts
    cpufreq: Make cpufreq_quick_get() safe to call
    ACPI / property: fix data node parsing in acpi_get_next_subnode()
    ACPI / APD: Add device HID for future AMD UART controller
    ...

    Linus Torvalds
     
  • * pm-cpufreq:
    cpufreq: governor: Always schedule work on the CPU running update
    cpufreq: Always update current frequency before startig governor
    cpufreq: Introduce cpufreq_update_current_freq()
    cpufreq: Introduce cpufreq_start_governor()
    cpufreq: powernv: Add sysfs attributes to show throttle stats
    cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access static
    cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path
    cpufreq: acpi-cpufreq: Clean up hot plug notifier callback
    intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts
    cpufreq: Make cpufreq_quick_get() safe to call

    * pm-cpuidle:
    intel_idle: Support for Intel Xeon Phi Processor x200 Product Family
    intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled
    cpuidle: menu: Fall back to polling if next timer event is near
    cpuidle: menu: use high confidence factors only when considering polling

    Rafael J. Wysocki
     

23 Mar, 2016

6 commits

  • Modify dbs_irq_work() to always schedule the process-context work
    on the current CPU which also ran the dbs_update_util_handler()
    that the irq_work being handled came from.

    This causes the entire frequency update handling (involving the
    "ondemand" or "conservative" governors) to be carried out by the
    CPU whose frequency is to be updated and reduces the overall amount
    of inter-CPU noise related to cpufreq.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • Make policy->cur match the current frequency returned by the driver's
    ->get() callback before starting the governor in case they went out of
    sync in the meantime and drop the piece of code attempting to
    resync policy->cur with the real frequency of the boot CPU from
    cpufreq_resume() as it serves no purpose any more (and it's racy and
    super-ugly anyway).

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Move the part of cpufreq_update_policy() that obtains the current
    frequency from the driver and updates policy->cur if necessary to
    a separate function, cpufreq_update_current_freq().

    That should not introduce functional changes and a subsequent change
    set will need it.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Starting a governor in cpufreq always follows the same pattern
    involving two calls to cpufreq_governor(), one with the event
    argument set to CPUFREQ_GOV_START and one with that argument set to
    CPUFREQ_GOV_LIMITS.

    Introduce cpufreq_start_governor() that will carry out those two
    operations and make all places where governors are started use it.

    That slightly modifies the behavior of cpufreq_set_policy() which
    now also will go back to the old governor if the second call to
    cpufreq_governor() (the one with event equal to CPUFREQ_GOV_LIMITS)
    fails, but that really is how it should work in the first place.

    Also cpufreq_resume() will now print an error message if the
    CPUFREQ_GOV_LIMITS call to cpufreq_governor() fails, but that
    makes it follow cpufreq_add_policy_cpu() and cpufreq_offline()
    in that respect.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
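
The start sequence described in the entry above (a START event followed by a LIMITS event, with a fallback to the old governor if either step fails) can be sketched in plain userspace C. This is only an illustration under stated assumptions: the types `struct governor` and `struct policy`, the `gov_event` enum, and the two toy governors are hypothetical stand-ins and not the kernel's definitions, and the real cpufreq_set_policy() also stops and exits the old governor first, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the governor events named in the entry. */
enum gov_event { GOV_START, GOV_LIMITS };

/* A toy governor: one callback that handles every event. */
struct governor {
	const char *name;
	int (*handle)(enum gov_event event);
};

struct policy {
	const struct governor *governor;
};

/* Start a governor: START first, then LIMITS only if START succeeded. */
static int start_governor(struct policy *policy)
{
	int ret;

	assert(policy && policy->governor && policy->governor->handle);

	ret = policy->governor->handle(GOV_START);
	return ret ? ret : policy->governor->handle(GOV_LIMITS);
}

/*
 * Switch governors, going back to the old one if either step of
 * starting the new one fails.
 */
static int set_governor(struct policy *policy, const struct governor *new_gov)
{
	const struct governor *old_gov = policy->governor;
	int ret;

	policy->governor = new_gov;
	ret = start_governor(policy);
	if (ret) {
		policy->governor = old_gov;
		if (old_gov)
			start_governor(policy);
	}
	return ret;
}

/* Two toy governors used to exercise the sketch. */
static int ok_handler(enum gov_event event)
{
	(void)event;
	return 0;
}

static int limits_fail_handler(enum gov_event event)
{
	return event == GOV_LIMITS ? -1 : 0;
}

static const struct governor gov_ok = { "ok", ok_handler };
static const struct governor gov_bad = { "bad-limits", limits_fail_handler };
```

Switching a policy from `gov_ok` to `gov_bad` with `set_governor()` returns the LIMITS error and leaves the policy on `gov_ok`, which mirrors the behavior change the entry describes.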
     
  • Create sysfs attributes to export throttle information in
    /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats directory. The
    newly added sysfs files are as follows:

    1)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/turbo_stat
    2)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/sub-turbo_stat
    3)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/unthrottle
    4)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/powercap
    5)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overtemp
    6)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/supply_fault
    7)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overcurrent
    8)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/occ_reset

    Detailed explanation of each attribute is added to
    Documentation/ABI/testing/sysfs-devices-system-cpu

    Signed-off-by: Shilpasri G Bhat
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Shilpasri G Bhat
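
A userspace reader for the attributes listed in the entry above might look like the following sketch. It assumes that each file holds a single decimal number, which the entry itself does not state (the ABI document it mentions is the authority), and the helper names `throttle_stat_path()` and `read_throttle_stat()` are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build the path of one throttle_stats attribute for a given CPU.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int throttle_stat_path(char *buf, size_t len, unsigned int cpu,
			      const char *attr)
{
	int n;

	assert(buf && attr);

	n = snprintf(buf, len,
		     "/sys/devices/system/cpu/cpu%u/cpufreq/throttle_stats/%s",
		     cpu, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read one attribute as a number (assumption: one decimal value per file).
 * Returns -1 if the file is missing or cannot be parsed, for example on
 * a machine that does not use the powernv driver.
 */
static long read_throttle_stat(unsigned int cpu, const char *attr)
{
	char path[128];
	long value = -1;
	FILE *f;

	if (throttle_stat_path(path, sizeof(path), cpu, attr))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

For example, `read_throttle_stat(0, "overtemp")` would try to read the overtemp attribute of cpu0 and return -1 where the file does not exist.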
     
  • The implementations of the frequency register read/write operations
    for the given processor (Intel/AMD MSR access or I/O port access) are
    only used internally in acpi-cpufreq, so make them static.

    Signed-off-by: Jisheng Zhang
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Jisheng Zhang
     

22 Mar, 2016

1 commit

  • Commit 96c4726f01cd "cpufreq: powernv: Remove cpu_to_chip_id() from
    hot-path" introduced a 'core_to_chip_map' array to cache the chip-ids
    of all cores.

    Replace this with a per-CPU variable that stores the pointer to the
    chip-array. This removes the linear lookup and provides a neater and
    simpler solution.

    Signed-off-by: Michael Neuling
    Signed-off-by: Shilpasri G Bhat
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Michael Neuling
     

21 Mar, 2016

2 commits

  • Pull ARM SoC driver updates from Arnd Bergmann:
    "Driver updates for ARM SoCs, these contain various things that touch
    the drivers/ directory but got merged through arm-soc for practical
    reasons:

    - Rockchip rk3368 gains power domain support
    - Small updates for the ARM spmi driver
    - The Atmel PMC driver saw a larger rework, touching both
    arch/arm/mach-at91 and drivers/clk/at91
    - All reset controller driver changes always get merged through
    arm-soc, though this time the largest change is the addition of a
    MIPS pistachio reset driver
    - One bugfix for the NXP (formerly Freescale) i.MX weim bus driver"

    * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (43 commits)
    bus: imx-weim: Take the 'status' property value into account
    clk: at91: remove useless includes
    clk: at91: pmc: remove useless capacities handling
    clk: at91: pmc: drop at91_pmc_base
    usb: gadget: atmel: access the PMC using regmap
    ARM: at91: remove useless includes and function prototypes
    ARM: at91: pm: move idle functions to pm.c
    ARM: at91: pm: find and remap the pmc
    ARM: at91: pm: simply call at91_pm_init
    clk: at91: pmc: move pmc structures to C file
    clk: at91: pmc: merge at91_pmc_init in atmel_pmc_probe
    clk: at91: remove IRQ handling and use polling
    clk: at91: make use of syscon/regmap internally
    clk: at91: make use of syscon to share PMC registers in several drivers
    hwmon: (scpi) add energy meter support
    firmware: arm_scpi: add support for 64-bit sensor values
    firmware: arm_scpi: decrease Tx timeout to 20ms
    firmware: arm_scpi: fix send_message and sensor_get_value for big-endian
    reset: sti: Make reset_control_ops const
    reset: zynq: Make reset_control_ops const
    ...

    Linus Torvalds
     
  • Pull ARM SoC platform updates from Arnd Bergmann:
    "Newly added support for additional SoCs:
    - Axis Artpec-6 SoC family
    - Allwinner A83T SoC
    - Mediatek MT7623
    - NXP i.MX6QP SoC
    - ST Microelectronics stm32f469 microcontroller

    New features:
    - SMP support for Mediatek mt2701
    - Big-endian support for NXP i.MX
    - DaVinci now uses the new DMA engine dma_slave_map
    - OMAP now uses the new DMA engine dma_slave_map
    - earlyprintk support for palmchip uart on mach-tango
    - delay timer support for orion

    Other:
    - Exynos PMU driver moved out to drivers/soc/
    - Various smaller updates for Renesas, Xilinx, PXA, AT91, OMAP,
    uniphier"

    * tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (83 commits)
    ARM: uniphier: rework SMP code to support new System Bus binding
    ARM: uniphier: add missing of_node_put()
    ARM: at91: avoid defining CONFIG_* symbols in source code
    ARM: DRA7: hwmod: Add data for eDMA tpcc, tptc0, tptc1
    ARM: imx: Make reset_control_ops const
    ARM: imx: Do L2 errata only if the L2 cache isn't enabled
    ARM: imx: select ARM_CPU_SUSPEND only for imx6
    dmaengine: pxa_dma: fix the maximum requestor line
    ARM: alpine: select the Alpine MSI controller driver
    ARM: pxa: add the number of DMA requestor lines
    dmaengine: mmp-pdma: add number of requestors
    dma: mmp_pdma: Add the #dma-requests DT property documentation
    ARM: OMAP2+: Add rtc hwmod configuration for ti81xx
    ARM: s3c24xx: Avoid warning for inb/outb
    ARM: zynq: Move early printk virtual address to vmalloc area
    ARM: DRA7: hwmod: Add custom reset handler for PCIeSS
    ARM: SAMSUNG: Remove unused register offset definition
    ARM: EXYNOS: Cleanup header files inclusion
    drivers: soc: samsung: Enable COMPILE_TEST
    MAINTAINERS: Add maintainers entry for drivers/soc/samsung
    ...

    Linus Torvalds
     

20 Mar, 2016

2 commits

  • This driver has two issues. First, it tries to fiddle with the hot
    plugged CPU's MSR on the UP_PREPARE event, at a time when the CPU is
    not yet online. Second, the driver sets the "boost-disable" bit for a
    CPU when going down, but does not clear the bit again if the CPU comes
    up again due to DOWN_FAILED.

    This patch fixes the issues by changing the driver to react to the
    ONLINE/DOWN_FAILED events instead of UP_PREPARE. As an added benefit,
    the driver also becomes symmetric with respect to the hot plug
    mechanism.

    Signed-off-by: Richard Cochran
    Signed-off-by: Rafael J. Wysocki

    Richard Cochran
     
  • After commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with
    utilization update callbacks) wrmsrl_on_cpu() cannot be called in the
    intel_pstate_adjust_busy_pstate() path as that is executed with
    disabled interrupts. However, atom_set_pstate() called from there
    via intel_pstate_set_pstate() uses wrmsrl_on_cpu() to update the
    IA32_PERF_CTL MSR which triggers the WARN_ON_ONCE() in
    smp_call_function_single().

    The reason why wrmsrl_on_cpu() is used by atom_set_pstate() is
    because intel_pstate_set_pstate() calling it is also invoked during
    the initialization and cleanup of the driver and in those cases it is
    not guaranteed to be run on the CPU that is being updated. However,
    in the case when intel_pstate_set_pstate() is called by
    intel_pstate_adjust_busy_pstate(), wrmsrl() can be used to update
    the register safely. Moreover, intel_pstate_set_pstate() already
    contains code that only is executed if the function is called by
    intel_pstate_adjust_busy_pstate() and there is a special argument
    passed to it because of that.

    To fix the problem at hand, rearrange the code taking the above
    observations into account.

    First, replace the ->set() callback in struct pstate_funcs with a
    ->get_val() one that will return the value to be written to the
    IA32_PERF_CTL MSR without updating the register.

    Second, split intel_pstate_set_pstate() into two functions,
    intel_pstate_update_pstate() to be called by
    intel_pstate_adjust_busy_pstate() that will contain all of the
    intel_pstate_set_pstate() code which only needs to be executed in
    that case and will use wrmsrl() to update the MSR (after obtaining
    the value to write to it from the ->get_val() callback), and
    intel_pstate_set_min_pstate() to be invoked during the
    initialization and cleanup that will set the P-state to the
    minimum one and will update the MSR using wrmsrl_on_cpu().

    Finally, move the code shared between intel_pstate_update_pstate()
    and intel_pstate_set_min_pstate() to a new static inline function
    intel_pstate_record_pstate() and make them both call it.

    Of course, that unifies the handling of the IA32_PERF_CTL MSR writes
    between Atom and Core.

    Fixes: a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
    Reported-and-tested-by: Josh Boyer
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
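
The rearrangement described in the entry above can be modelled in userspace C as follows. This is a sketch under assumptions, not the driver code: the MSR writes are replaced by counters, `write_msr_local()` and `write_msr_on_cpu()` are hypothetical stand-ins for wrmsrl() and wrmsrl_on_cpu(), the value encoding in `toy_get_val()` is illustrative only, and the struct layouts are simplified.

```c
#include <assert.h>
#include <stdint.h>

/* Counters standing in for real MSR writes (hypothetical instrumentation). */
static int local_writes, remote_writes;
static uint64_t last_msr_value;

/* Stand-in for wrmsrl(): writes the register of the CPU running the code. */
static void write_msr_local(uint64_t val)
{
	local_writes++;
	last_msr_value = val;
}

/* Stand-in for wrmsrl_on_cpu(): may need a cross-CPU call in the kernel. */
static void write_msr_on_cpu(int cpu, uint64_t val)
{
	(void)cpu;
	remote_writes++;
	last_msr_value = val;
}

/* The callback only computes the value; it no longer writes the register. */
struct pstate_funcs {
	uint64_t (*get_val)(int pstate);
};

struct cpudata {
	int cpu;
	int current_pstate;
	int min_pstate;
	const struct pstate_funcs *funcs;
};

/* Illustrative encoding only: the P-state placed in bits 15:8. */
static uint64_t toy_get_val(int pstate)
{
	return (uint64_t)pstate << 8;
}

static const struct pstate_funcs toy_funcs = { toy_get_val };

/* Bookkeeping shared by both paths. */
static void record_pstate(struct cpudata *cpu, int pstate)
{
	cpu->current_pstate = pstate;
}

/* Hot path: runs on the CPU being updated, so a local write is enough. */
static void update_pstate(struct cpudata *cpu, int pstate)
{
	assert(cpu && cpu->funcs && cpu->funcs->get_val);

	if (pstate == cpu->current_pstate)
		return;
	record_pstate(cpu, pstate);
	write_msr_local(cpu->funcs->get_val(pstate));
}

/* Init/cleanup path: may run on any CPU, so target the CPU explicitly. */
static void set_min_pstate(struct cpudata *cpu)
{
	assert(cpu && cpu->funcs && cpu->funcs->get_val);

	record_pstate(cpu, cpu->min_pstate);
	write_msr_on_cpu(cpu->cpu, cpu->funcs->get_val(cpu->min_pstate));
}
```

Keeping the register write out of the callback is what lets each caller pick the write primitive that is safe in its own context.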
     

18 Mar, 2016

1 commit

  • The function, cpufreq_quick_get, accesses the global 'cpufreq_driver' and
    its fields without taking the associated lock, cpufreq_driver_lock.

    Without the locking, nothing guarantees that 'cpufreq_driver' remains
    consistent during the call. This patch fixes the issue by taking the lock
    before accessing the data structure.

    Signed-off-by: Richard Cochran
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Richard Cochran
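
The locking fix in the entry above follows a common pattern: take the lock that protects a global driver pointer before dereferencing it. A minimal userspace sketch is shown below. It uses a C11 `atomic_flag` spinlock in place of the kernel's cpufreq_driver_lock, and `struct freq_driver`, `set_driver()`, `quick_get()` and the toy driver are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Simplified driver description: only the callback the sketch needs. */
struct freq_driver {
	unsigned int (*get)(unsigned int cpu);
};

/* A tiny spinlock standing in for the kernel's driver lock. */
static atomic_flag driver_lock = ATOMIC_FLAG_INIT;
static const struct freq_driver *current_driver; /* protected by driver_lock */

static void lock_driver(void)
{
	while (atomic_flag_test_and_set_explicit(&driver_lock,
						 memory_order_acquire))
		; /* spin until the lock is released */
}

static void unlock_driver(void)
{
	atomic_flag_clear_explicit(&driver_lock, memory_order_release);
}

/* Register or unregister (NULL) a driver under the lock. */
static void set_driver(const struct freq_driver *drv)
{
	assert(!drv || drv->get);

	lock_driver();
	current_driver = drv;
	unlock_driver();
}

/*
 * Read the frequency through the driver. The pointer and its callback
 * are only examined while the lock is held, so a concurrent set_driver()
 * cannot change them in the middle of the call. Returns 0 if no driver
 * is registered.
 */
static unsigned int quick_get(unsigned int cpu)
{
	unsigned int freq = 0;

	lock_driver();
	if (current_driver && current_driver->get)
		freq = current_driver->get(cpu);
	unlock_driver();
	return freq;
}

/* Toy driver used to exercise the sketch. */
static unsigned int toy_get(unsigned int cpu)
{
	return 1000000u + cpu;
}

static const struct freq_driver toy_driver = { toy_get };
```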
     

14 Mar, 2016

1 commit

  • * pm-cpufreq: (94 commits)
    intel_pstate: Do not skip samples partially
    intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
    intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
    intel_pstate: Optimize calculation for max/min_perf_adj
    intel_pstate: Remove extra conversions in pid calculation
    cpufreq: Move scheduler-related code to the sched directory
    Revert "cpufreq: postfix policy directory with the first CPU in related_cpus"
    cpufreq: Reduce cpufreq_update_util() overhead a bit
    cpufreq: Select IRQ_WORK if CPU_FREQ_GOV_COMMON is set
    cpufreq: Remove 'policy->governor_enabled'
    cpufreq: Rename __cpufreq_governor() to cpufreq_governor()
    cpufreq: Relocate handle_update() to kill its declaration
    cpufreq: governor: Drop unnecessary checks from show() and store()
    cpufreq: governor: Fix race in dbs_update_util_handler()
    cpufreq: governor: Make gov_set_update_util() static
    cpufreq: governor: Narrow down the dbs_data_mutex coverage
    cpufreq: governor: Make dbs_data_mutex static
    cpufreq: governor: Relocate definitions of tuners structures
    cpufreq: governor: Move per-CPU data to the common code
    cpufreq: governor: Make governor private data per-policy
    ...

    Rafael J. Wysocki
     

11 Mar, 2016

7 commits


10 Mar, 2016

1 commit

  • Revert commit 3510fac45492 (cpufreq: postfix policy directory with the
    first CPU in related_cpus).

    Earlier, the policy->kobj was added to the kobject core before the
    ->init() callback was called for the cpufreq drivers, which allowed
    those drivers to add or remove driver-dependent sysfs
    files/directories to the same kobj from their ->init() and ->exit()
    callbacks.

    That isn't possible anymore after commit 3510fac45492.

    Now, there is no other clean alternative that people can adopt.

    It's better to revert the earlier commit to allow cpufreq drivers to
    create/remove sysfs files from ->init() and ->exit() callbacks.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

09 Mar, 2016

17 commits

  • Use the observation that cpufreq_update_util() is only called
    by the scheduler with rq->lock held, so the callers of
    cpufreq_set_update_util_data() can use synchronize_sched()
    instead of synchronize_rcu() to wait for cpufreq_update_util()
    to complete. Moreover, if they are updated to do that,
    rcu_read_(un)lock() calls in cpufreq_update_util() might be
    replaced with rcu_read_(un)lock_sched(), respectively, but
    those aren't really necessary, because the scheduler calls
    that function from RCU-sched read-side critical sections
    already.

    In addition to that, if cpufreq_set_update_util_data() checks
    the func field in the struct update_util_data before setting
    the per-CPU pointer to it, the data->func check may be dropped
    from cpufreq_update_util() as well.

    Make the above changes to reduce the overhead from
    cpufreq_update_util() in the scheduler paths invoking it
    and to make the cleanup after removing its callbacks somewhat
    less heavy-weight.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar
    Acked-by: Peter Zijlstra (Intel)

    Rafael J. Wysocki
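
The second part of the entry above (validate the func field once when the pointer is installed, so the hot path can skip that check) can be illustrated without RCU in plain C. The sketch below is an assumption-laden model: the per-CPU pointer is a simple array, `NR_CPUS` is an arbitrary constant, the callback signature is simplified, and the RCU-sched synchronization that the entry discusses is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4 /* arbitrary size for the sketch */

struct update_util_data {
	void (*func)(struct update_util_data *data, unsigned long long time);
};

/* Stand-in for the per-CPU pointer. */
static struct update_util_data *util_data[NR_CPUS];

/*
 * Slow path: refuse to install data without a callback, so that readers
 * never see a non-NULL pointer with a NULL func.
 * Returns 0 on success, -1 if the data was rejected.
 */
static int set_update_util_data(int cpu, struct update_util_data *data)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	if (data && !data->func)
		return -1;
	util_data[cpu] = data;
	return 0;
}

/* Hot path: only the pointer itself needs to be checked. */
static void update_util(int cpu, unsigned long long time)
{
	struct update_util_data *data = util_data[cpu];

	if (data)
		data->func(data, time);
}

/* Toy callback and data objects used to exercise the sketch. */
static unsigned long long last_seen;

static void toy_func(struct update_util_data *data, unsigned long long time)
{
	(void)data;
	last_seen = time;
}

static struct update_util_data toy_data = { toy_func };
static struct update_util_data broken_data = { NULL };
```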
     
  • Commit 0eb463be3436 (cpufreq: governor: Replace timers with utilization
    update callbacks) made CPU_FREQ select IRQ_WORK, but that's not
    necessary, as it is sufficient for IRQ_WORK to be selected by
    CPU_FREQ_GOV_COMMON, so modify the cpufreq Kconfig to that effect.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The entire sequence of events (like INIT/START or STOP/EXIT) for which
    cpufreq_governor() is called, is guaranteed to be protected by
    policy->rwsem now.

    The additional checks that were added earlier (as we were forced to drop
    policy->rwsem before calling cpufreq_governor() for EXIT event), aren't
    required anymore.

    Moreover, they weren't really sufficient. They just take care of
    START/STOP events, but not INIT/EXIT and the state machine was never
    maintained properly by them.

    Kill the unnecessary checks and policy->governor_enabled field.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • The __ at the beginning of the routine's name isn't really necessary
    at all. Rename it to cpufreq_governor() instead.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • handle_update() is declared at the top of the file as its user appears
    before its definition. Relocate the routine to get rid of this.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • The show() and store() routines in the cpufreq-governor core don't need
    to check if the struct governor_attr they want to use really provides
    the callbacks they need as expected (if that's not the case, it means a
    bug in the code anyway), so change them to avoid doing that.

    Also change the error value to -EBUSY, if the governor is getting
    removed and we aren't allowed to store any more changes.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • There is a scenario that may lead to undesired results in
    dbs_update_util_handler(). Namely, if two CPUs sharing a policy
    enter the function at the same time, pass the sample delay check
    and then one of them is stalled until dbs_work_handler() (queued
    up by the other CPU) clears the work counter, it may update the
    work counter and queue up another work item prematurely.

    To prevent that from happening, use the observation that the CPU
    queuing up a work item in dbs_update_util_handler() updates the
    last sample time. This means that if another CPU was stalling after
    passing the sample delay check and now successfully updated the work
    counter as a result of the race described above, it will see the new
    value of the last sample time which is different from what it used in
    the sample delay check before. If that happens, the sample delay
    check passed previously is not valid any more, so the CPU should not
    continue.

    Fixes: f17cbb53783c (cpufreq: governor: Avoid atomic operations in hot paths)
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
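
The re-validation step described in the entry above can be modelled with C11 atomics. The sketch splits the handler into two hypothetical helpers, `delay_check()` and `try_queue_work()`, so that the stall between them can be reproduced deterministically; the struct layout and all names are assumptions made for illustration and the real handler differs in detail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct policy_dbs {
	atomic_ullong last_sample_time;
	unsigned long long sample_delay;
	atomic_int work_count; /* 1 while a work item is pending */
	int queued;            /* work items queued so far */
};

/*
 * Step 1: the sample delay check. On return, *lst holds the last sample
 * time that the decision was based on.
 */
static bool delay_check(struct policy_dbs *p, unsigned long long now,
			unsigned long long *lst)
{
	assert(p && lst);

	*lst = atomic_load(&p->last_sample_time);
	return now - *lst >= p->sample_delay;
}

/*
 * Step 2: claim the work counter, then confirm that nobody else has
 * updated the last sample time since step 1. If somebody has, the
 * earlier check is stale, so release the counter and give up.
 */
static bool try_queue_work(struct policy_dbs *p, unsigned long long now,
			   unsigned long long lst)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&p->work_count, &expected, 1))
		return false;

	if (lst != atomic_load(&p->last_sample_time)) {
		atomic_store(&p->work_count, 0);
		return false;
	}

	atomic_store(&p->last_sample_time, now);
	p->queued++;
	return true;
}

/* The work handler clears the counter when it has finished. */
static void work_done(struct policy_dbs *p)
{
	atomic_store(&p->work_count, 0);
}
```

In the racy interleaving, two CPUs both pass `delay_check()`, the first one queues work and the handler finishes, and only then does the second CPU reach `try_queue_work()`. With the extra comparison it backs off instead of queuing a second work item too early.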
     
  • The gov_set_update_util() routine is only used internally by the
    common governor code and it doesn't need to be exported, so make
    it static.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Since cpufreq_governor_dbs() is now always called with policy->rwsem
    held, it cannot be executed twice in parallel for the same policy.
    Thus it is not necessary to hold dbs_data_mutex around the invocations
    of cpufreq_governor_start/stop/limits() from it as those functions
    never modify any data that can be shared between different policies.

    However, cpufreq_governor_dbs() may be executed twice in parallel
    for different policies using the same gov->gdbs_data object and
    dbs_data_mutex is still necessary to protect that object against
    concurrent updates.

    For this reason, narrow down the dbs_data_mutex locking to
    cpufreq_governor_init/exit() where it is needed and rename the
    mutex to gov_dbs_data_mutex to reflect its purpose.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • That mutex is only used by cpufreq_governor_dbs() and it doesn't
    need to be exported to modules, so make it static and drop the
    export incantation.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Move the definitions of struct od_dbs_tuners and struct cs_dbs_tuners
    from the common governor header to the ondemand and conservative
    governor code, respectively, as they don't need to be in the common
    header any more.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • After previous changes there is only one piece of code in the
    ondemand governor making references to per-CPU data structures,
    but it can be easily modified to avoid doing that, so modify it
    accordingly and move the definition of per-CPU data used by the
    ondemand and conservative governors to the common code. Next,
    change that code to access the per-CPU data structures directly
    rather than via a governor callback.

    This causes the ->get_cpu_cdbs governor callback to become
    unnecessary, so drop it along with the macro and function
    definitions related to it.

    Finally, drop the definitions of struct od_cpu_dbs_info_s and
    struct cs_cpu_dbs_info_s that aren't necessary any more.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Some fields in struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s
    are only used for a limited set of CPUs. Namely, if a policy is
    shared between multiple CPUs, those fields will only be used for one
    of them (policy->cpu). This means that they really are per-policy
    rather than per-CPU and holding room for them in per-CPU data
    structures is generally wasteful. Also moving those fields into
    per-policy data structures will allow some significant simplifications
    to be made going forward.

    For this reason, introduce struct cs_policy_dbs_info and
    struct od_policy_dbs_info to hold those fields. Define each of the
    new structures as an extension of struct policy_dbs_info (such that
    struct policy_dbs_info is embedded in each of them) and introduce
    new ->alloc and ->free governor callbacks to allocate and free
    those structures, respectively, such that ->alloc() will return
    a pointer to the struct policy_dbs_info embedded in the allocated
    data structure and ->free() will take that pointer as its argument.

    With that, modify the code accessing the data fields in question
    in per-CPU data objects to look for them in the new structures
    via the struct policy_dbs_info pointer available to it and drop
    them from struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
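
The embedding scheme described in the entry above (a governor-specific structure that contains struct policy_dbs_info, with ->alloc() returning a pointer to the embedded part and ->free() taking it back) can be sketched in userspace C. The field names, the `to_dbs_info()` helper and the use of calloc()/free() are assumptions for illustration; only the structure names come from the entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Common per-policy part shared by all governors (fields illustrative). */
struct policy_dbs_info {
	unsigned int sample_delay;
};

/* Governor-specific extension that embeds the common part. */
struct od_policy_dbs_info {
	struct policy_dbs_info policy_dbs;
	unsigned int freq_lo; /* illustrative governor-private field */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct od_policy_dbs_info *to_dbs_info(struct policy_dbs_info *policy_dbs)
{
	return container_of(policy_dbs, struct od_policy_dbs_info, policy_dbs);
}

/* Allocate the full structure, hand out only the embedded common part. */
static struct policy_dbs_info *od_alloc(void)
{
	struct od_policy_dbs_info *dbs_info = calloc(1, sizeof(*dbs_info));

	return dbs_info ? &dbs_info->policy_dbs : NULL;
}

/* Take the embedded pointer back and free the enclosing structure. */
static void od_free(struct policy_dbs_info *policy_dbs)
{
	assert(policy_dbs);
	free(to_dbs_info(policy_dbs));
}
```

The common code only ever handles `struct policy_dbs_info *`, while each governor converts that pointer back to its own type when it needs its private fields.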
     
  • The ondemand_powersave_bias_init() function used for resetting data
    fields related to the powersave bias tunable of the ondemand governor
    works by walking all of the online CPUs in the system and updating the
    od_cpu_dbs_info_s structures for all of them.

    However, if governor tunables are per policy, the update should not
    touch the CPUs that are not associated with the given dbs_data.

    Moreover, since the data fields in question are only ever used for
    policy->cpu in each policy governed by ondemand, the update can be
    limited to those specific CPUs.

    Rework the code to take the above observations into account.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The ->store() callbacks of some tunable sysfs attributes of the
    ondemand and conservative governors trigger immediate updates of
    the CPU load information for all CPUs "governed" by the given
    dbs_data by walking the cpu_dbs_info structures for all online
    CPUs in the system and updating them.

    This is questionable for two reasons. First, it may lead to a lot of
    extra overhead on a system with many CPUs if the given dbs_data is
    only associated with a few of them. Second, if governor tunables are
    per-policy, the CPUs associated with the other sets of governor
    tunables should not be updated.

    To address this issue, use the observation that in all of the places
    in question the update operation may be carried out in the same way
    (because all of the tunables involved are now located in struct
    dbs_data and readily available to the common code) and make the
    code in those places invoke the same (new) helper function that
    will carry out the update correctly.

    That new function always checks the ignore_nice_load tunable value
    and updates the CPUs' prev_cpu_nice data fields if that's set, which
    wasn't done by the original code in store_io_is_busy(), but it
    should have been done in there too.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The ->powersave_bias_init_cpu callback in struct od_ops is only used
    in one place and that invocation may be replaced with a direct call
    to the function pointed to by that callback, so change the code
    accordingly and drop the callback.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • After some previous changes, the ->get_cpu_dbs_info_s governor
    callback and the "governor" field in struct dbs_governor (whose
    value represents the governor type) are not used any more, so
    drop them.

    Also drop the unused gov_ops field from struct dbs_governor.

    No functional changes.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Viresh Kumar

    Rafael J. Wysocki