12 Mar, 2019

1 commit

  • After commit b8bd1581aa61 ("cpufreq: intel_pstate: Rework iowait
    boosting to be less aggressive") the handling of the case when
    the SCHED_CPUFREQ_IOWAIT flag is set again after a few iterations of
    intel_pstate_update_util() is a bit inconsistent, because the
    new value of cpu->iowait_boost may be lower than ONE_EIGHTH_FP
    if it was set before, but has not dropped down to zero just yet.

    Fix that up by ensuring that the new value of cpu->iowait_boost
    will always be at least ONE_EIGHTH_FP then.
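    A minimal sketch of the fixed rule, assuming 8 fractional bits as in the driver's fixed-point helpers (so ONE_EIGHTH_FP corresponds to 32 here). The SKETCH_* macros and sketch_iowait_boost_again() are hypothetical, and idle-time handling is left out. It shows what happens when the IOWAIT flag is seen again while an older boost is still decaying: a leftover value below 1/8 restarts at 1/8 instead of being doubled from something smaller.

```c
#include <stdint.h>

/* Fixed-point model, assuming 8 fractional bits (an assumption of this sketch). */
#define SKETCH_FRAC_BITS   8
#define SKETCH_ONE_FP      (1 << SKETCH_FRAC_BITS)   /* 1.0   == 256 */
#define SKETCH_ONE_EIGHTH  (SKETCH_ONE_FP >> 3)      /* 0.125 == 32  */

/*
 * Hypothetical helper: new boost value when the IOWAIT flag is seen
 * again while an older boost may still be decaying.
 */
static int32_t sketch_iowait_boost_again(int32_t cur)
{
	int32_t next;

	if (cur < SKETCH_ONE_EIGHTH)        /* the fix: never restart below 1/8 */
		return SKETCH_ONE_EIGHTH;

	next = cur << 1;                    /* otherwise double the boost */
	if (next > SKETCH_ONE_FP)           /* and saturate at 1.0 (full boost) */
		next = SKETCH_ONE_FP;
	return next;
}
```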

    Fixes: b8bd1581aa61 ("cpufreq: intel_pstate: Rework iowait boosting to be less aggressive")
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

18 Feb, 2019

3 commits

  • The current iowait boosting mechanism in intel_pstate_update_util()
    is quite aggressive, as it goes to the maximum P-state right away,
    and may cause excessive amounts of energy to be used, which is not
    desirable and arguably isn't necessary either.

    Follow commit a5a0809bc58e ("cpufreq: schedutil: Make iowait boost
    more energy efficient") that reworked the analogous iowait boost
    mechanism in the schedutil governor and make the iowait boosting
    in intel_pstate_update_util() work along the same lines.
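    The ramp described above can be modeled as follows. This is a simplified sketch in the spirit of the schedutil rework, not the driver's code: it assumes 8 fractional bits, omits the idle (tick-based) reset, and uses hypothetical names. The boost starts at 1/8, doubles while IOWAIT keeps being reported, saturates at 1.0, and halves on every update without the flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ONE_FP      256                 /* 1.0 with 8 fractional bits (assumed) */
#define SKETCH_ONE_EIGHTH  (SKETCH_ONE_FP >> 3)

/*
 * One update step of a simplified iowait boost: ramp up geometrically
 * from 1/8 while IOWAIT keeps being reported, decay by halving otherwise.
 */
static int32_t sketch_iowait_step(int32_t boost, bool iowait)
{
	if (iowait) {
		if (boost < SKETCH_ONE_EIGHTH)
			return SKETCH_ONE_EIGHTH;   /* start (or restart) at 1/8 */
		boost <<= 1;                        /* 1/8 -> 1/4 -> 1/2 -> 1 */
		return boost > SKETCH_ONE_FP ? SKETCH_ONE_FP : boost;
	}
	return boost >> 1;                          /* decay toward zero */
}

/* Run a sequence of IOWAIT flags and record the boost after each step. */
static void sketch_iowait_run(const bool *flags, size_t n, int32_t *out)
{
	int32_t boost = 0;

	for (size_t i = 0; i < n; i++) {
		boost = sketch_iowait_step(boost, flags[i]);
		out[i] = boost;
	}
}
```

    With the flags on, on, on, on, on, off, off the recorded values are 32, 64, 128, 256, 256, 128, 64, i.e. 1/8, 1/4, 1/2, 1, 1, 1/2 and 1/4 of full boost, rather than a jump straight to the maximum P-state.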

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • There is only one caller of intel_pstate_get_base_pstate() and it is
    more straightforward to carry out the computation directly in the
    caller, so do that and drop intel_pstate_get_base_pstate().

    No intentional changes of behavior.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • After commit 1a4fe38add8b ("cpufreq: intel_pstate: Remove max/min
    fractions to limit performance") the initial value of the pstate local
    variable in intel_pstate_max_within_limits() and the initial value of
    the max_pstate local variable in intel_pstate_prepare_request() are
    both immediately discarded, so initialize both these variables to
    their target values upfront.

    No intentional changes of behavior.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

13 Feb, 2019

1 commit

  • The init code path has several exceptions where the driver can
    decide not to load.

    As CONFIG_X86_INTEL_PSTATE is generally set to Y, the return code is
    not reachable. The initialization code does not report why it chose
    to exit prematurely either, so it is difficult for a user to
    determine, on a given platform, why the driver didn't load properly.

    This patch reports to the user the reason/context of why the driver
    failed to load. That is a valuable hint when debugging a platform.

    Signed-off-by: Erwan Velu
    [ rjw: Subject & changelog, minor fixups ]
    Signed-off-by: Rafael J. Wysocki

    Erwan Velu
     

29 Jan, 2019

1 commit

  • The cpufreq_global_kobject is created using kobject_create_and_add()
    helper, which assigns the kobj_type as dynamic_kobj_ktype and show/store
    routines are set to kobj_attr_show() and kobj_attr_store().

    These routines pass struct kobj_attribute as an argument to the
    show/store callbacks. But all the cpufreq files created using the
    cpufreq_global_kobject expect the argument to be of type struct
    attribute. Things work fine currently as no one accesses the "attr"
    argument. We may not see issues even if the argument is used, as struct
    kobj_attribute has struct attribute as its first element, so both
    pointers get the same address.

    But this is logically incorrect and we should rather use struct
    kobj_attribute instead of struct global_attr in the cpufreq core and
    drivers and the show/store callbacks should take struct kobj_attribute
    as argument instead.

    This bug was caught by Clang CFI builds of the Android kernel, which
    catch mismatches in the function prototypes of such callbacks.
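    The layout argument can be illustrated in userspace. The structs below are minimal stand-ins that only borrow the kernel type names, and the sketch_* functions are hypothetical: because struct attribute is the first member of struct kobj_attribute, a pointer to one has the same address as a pointer to the other, which is why the mismatched prototypes happened to work. The callback prototype that is actually correct takes struct kobj_attribute *, recovered here with a container_of()-style cast.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Minimal userspace stand-ins for the kernel types (illustrative only). */
struct attribute {
	const char *name;
};

struct kobject;

struct kobj_attribute {
	struct attribute attr;   /* first member: same address as the wrapper */
	ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr,
			char *buf);
};

#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* A show callback with the prototype the generic dispatcher expects. */
static ssize_t sketch_show(struct kobject *kobj, struct kobj_attribute *attr,
			   char *buf)
{
	(void)kobj;
	return sprintf(buf, "%s\n", attr->attr.name);
}

/* Mimics the generic dispatcher: recover the wrapper, then call its show(). */
static ssize_t sketch_dispatch(struct kobject *kobj, struct attribute *attr,
			       char *buf)
{
	struct kobj_attribute *ka =
		sketch_container_of(attr, struct kobj_attribute, attr);

	return ka->show(kobj, ka, buf);
}
```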

    Reported-by: Donghee Han
    Reported-by: Sangkyu Kim
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

27 Dec, 2018

1 commit

  • Pull RCU updates from Ingo Molnar:
    "The biggest RCU changes in this cycle were:

    - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

    - Replace calls of RCU-bh and RCU-sched update-side functions to
    their vanilla RCU counterparts. This series is a step towards
    complete removal of the RCU-bh and RCU-sched update-side functions.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - Documentation updates, including a number of flavor-consolidation
    updates from Joel Fernandes.

    - Miscellaneous fixes.

    - Automate generation of the initrd filesystem used for rcutorture
    testing.

    - Convert spin_is_locked() assertions to instead use lockdep.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - SRCU updates, especially including a fix from Dennis Krein for a
    bag-on-head-class bug.

    - RCU torture-test updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits)
    rcutorture: Don't do busted forward-progress testing
    rcutorture: Use 100ms buckets for forward-progress callback histograms
    rcutorture: Recover from OOM during forward-progress tests
    rcutorture: Print forward-progress test age upon failure
    rcutorture: Print time since GP end upon forward-progress failure
    rcutorture: Print histogram of CB invocation at OOM time
    rcutorture: Print GP age upon forward-progress failure
    rcu: Print per-CPU callback counts for forward-progress failures
    rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings
    rcutorture: Dump grace-period diagnostics upon forward-progress OOM
    rcutorture: Prepare for asynchronous access to rcu_fwd_startat
    torture: Remove unnecessary "ret" variables
    rcutorture: Affinity forward-progress test to avoid housekeeping CPUs
    rcutorture: Break up too-long rcu_torture_fwd_prog() function
    rcutorture: Remove cbflood facility
    torture: Bring any extra CPUs online during kernel startup
    rcutorture: Add call_rcu() flooding forward-progress tests
    rcutorture/formal: Replace synchronize_sched() with synchronize_rcu()
    tools/kernel.h: Replace synchronize_sched() with synchronize_rcu()
    net/decnet: Replace rcu_barrier_bh() with rcu_barrier()
    ...

    Linus Torvalds
     

30 Nov, 2018

1 commit

  • Force HWP Request MAX = HWP Request MIN = HWP Capability MIN and EPP to
    0xFF. In this way the performance limits on the offlined CPU will not
    influence performance limits on its sibling CPU, which is still online.

    If the sibling CPU is calling for higher performance, it will impact the
    max core performance: core performance follows the higher of the
    performance requests from the siblings.
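    A sketch of the request value this describes, assuming the IA32_HWP_REQUEST layout documented in the Intel SDM (min perf in bits 7:0, max perf in 15:8, desired perf in 23:16, EPP in 31:24) and the HWP_CAPABILITIES lowest performance level in bits 31:24. sketch_force_min_perf() is a hypothetical helper: it keeps the other fields of the cached request and rewrites only min, max and EPP.

```c
#include <stdint.h>

/*
 * Request for an offlined CPU: min = max = lowest performance from
 * HWP_CAPABILITIES, EPP = 0xFF (maximum energy saving). All other
 * fields of the current request (desired perf, window, ...) are kept.
 */
static uint64_t sketch_force_min_perf(uint64_t req, uint64_t cap)
{
	uint64_t lowest = (cap >> 24) & 0xFF;   /* capabilities bits 31:24 */

	req &= ~0xFFFFULL;                      /* clear min (7:0) and max (15:8) */
	req &= ~(0xFFULL << 24);                /* clear EPP (31:24) */
	req |= lowest | (lowest << 8) | (0xFFULL << 24);
	return req;
}
```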

    Reported-and-tested-by: Chen Yu
    Signed-off-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     

28 Nov, 2018

1 commit


26 Oct, 2018

1 commit


23 Oct, 2018

1 commit

  • Pull perf updates from Ingo Molnar:
    "The main updates in this cycle were:

    - Lots of perf tooling changes too voluminous to list (big perf trace
    and perf stat improvements, lots of libtraceevent reorganization,
    etc.), so I'll list the authors and refer to the changelog for
    details:

    Benjamin Peterson, Jérémie Galarneau, Kim Phillips, Peter
    Zijlstra, Ravi Bangoria, Sangwon Hong, Sean V Kelley, Steven
    Rostedt, Thomas Gleixner, Ding Xiang, Eduardo Habkost, Thomas
    Richter, Andi Kleen, Sanskriti Sharma, Adrian Hunter, Tzvetomir
    Stoyanov, Arnaldo Carvalho de Melo, Jiri Olsa.

    ... with the bulk of the changes written by Jiri Olsa, Tzvetomir
    Stoyanov and Arnaldo Carvalho de Melo.

    - Continued intel_rdt work with a focus on playing well with perf
    events. This also imported some non-perf RDT work due to
    dependencies. (Reinette Chatre)

    - Implement counter freezing for Arch Perfmon v4 (Skylake and newer).
    This speeds up the PMI handler by avoiding unnecessary MSR writes and
    makes it more accurate. (Andi Kleen)

    - kprobes cleanups and simplification (Masami Hiramatsu)

    - Intel Goldmont PMU updates (Kan Liang)

    - ... plus misc other fixes and updates"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (155 commits)
    kprobes/x86: Use preempt_enable() in optimized_callback()
    x86/intel_rdt: Prevent pseudo-locking from using stale pointers
    kprobes, x86/ptrace.h: Make regs_get_kernel_stack_nth() not fault on bad stack
    perf/x86/intel: Export mem events only if there's PEBS support
    x86/cpu: Drop pointless static qualifier in punit_dev_state_show()
    x86/intel_rdt: Fix initial allocation to consider CDP
    x86/intel_rdt: CBM overlap should also check for overlap with CDP peer
    x86/intel_rdt: Introduce utility to obtain CDP peer
    tools lib traceevent, perf tools: Move struct tep_handler definition in a local header file
    tools lib traceevent: Separate out tep_strerror() for strerror_r() issues
    perf python: More portable way to make CFLAGS work with clang
    perf python: Make clang_has_option() work on Python 3
    perf tools: Free temporary 'sys' string in read_event_files()
    perf tools: Avoid double free in read_event_file()
    perf tools: Free 'printk' string in parse_ftrace_printk()
    perf tools: Cleanup trace-event-info 'tdata' leak
    perf strbuf: Match va_{add,copy} with va_end
    perf test: S390 does not support watchpoints in test 22
    perf auxtrace: Include missing asm/bitsperlong.h to get BITS_PER_LONG
    tools include: Adopt linux/bits.h
    ...

    Linus Torvalds
     

16 Oct, 2018

1 commit

  • Expose base_frequency to user space via cpufreq sysfs when HWP is in
    use.

    This HWP base frequency is read from the ACPI _CPC object if present,
    or from the HWP Capabilities MSR otherwise.

    On the majority of the HWP platforms the _CPC object will point to
    the HWP Capabilities MSR using the "Functional Fixed Hardware"
    address space type. The address space type also can simply be
    ACPI_TYPE_INTEGER, however, in which case the platform firmware
    can set its value at the initialization time based on the system
    constraints.
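    As an illustration of the MSR fallback only (the _CPC lookup is not modeled), the base frequency can be derived from the guaranteed performance field, bits 15:8 of HWP_CAPABILITIES. The 100 MHz-per-ratio scaling factor is an assumption of this sketch, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed frequency of one performance ratio unit, in kHz (100 MHz). */
#define SKETCH_KHZ_PER_RATIO 100000u

/* HWP_CAPABILITIES bits 15:8: guaranteed performance level. */
static uint32_t sketch_hwp_guaranteed(uint64_t cap)
{
	return (uint32_t)((cap >> 8) & 0xFF);
}

/* base_frequency in kHz, the unit cpufreq sysfs uses for frequencies. */
static uint32_t sketch_base_freq_khz(uint64_t cap)
{
	return sketch_hwp_guaranteed(cap) * SKETCH_KHZ_PER_RATIO;
}
```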

    Signed-off-by: Srinivas Pandruvada
    [ rjw: Changelog ]
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     

02 Oct, 2018

1 commit

  • Going primarily by:

    https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors

    with additional information gleaned from other related pages; notably:

    - Bonnell shrink was called Saltwell
    - Moorefield is the Merriefield refresh which makes it Airmont

    The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE

    for i in `git grep -l FAM6_ATOM` ; do
    sed -i -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \
    -e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \
    -e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \
    -e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \
    -e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \
    -e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \
    -e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \
    -e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \
    -e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \
    -e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \
    -e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
    done

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: dave.hansen@linux.intel.com
    Cc: len.brown@intel.com
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

06 Aug, 2018

2 commits


31 Jul, 2018

1 commit

  • Dynamic boosting of HWP performance on IO wake showed significant
    improvement to IO workloads. This series was intended for Skylake Xeon
    platforms only, and the feature was enabled by default based on the
    CPU model number.

    But some Xeon platforms reused the Skylake desktop CPU model number. This
    caused undesirable side effects for some graphics workloads. Since they
    are heavily IO bound, the increase in CPU performance reduced the power
    available for the GPU to do its computing, and hence decreased graphics
    benchmark performance.

    For example, on a Skylake desktop the GpuTest benchmark showed an
    average FPS reduction from 529 to 506.

    This change makes sure that the HWP boost feature is enabled by default
    only on Skylake server platforms, based on the ACPI FADT preferred PM
    profile. Desktop users who want the benefit of boost can still enable it
    via the intel_pstate sysfs attribute "hwp_dynamic_boost".
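    A sketch of the default-enable decision. It assumes the check treats the ACPI FADT Preferred_PM_Profile values Enterprise Server (4) and Performance Server (7) as servers; the enum mirrors values from the ACPI specification, while the sketch_* names are hypothetical. Desktop (1) and the other profiles leave boost off by default.

```c
#include <stdbool.h>
#include <stdint.h>

/* Preferred_PM_Profile values from the ACPI FADT (ACPI specification). */
enum sketch_pm_profile {
	SKETCH_PM_UNSPECIFIED        = 0,
	SKETCH_PM_DESKTOP            = 1,
	SKETCH_PM_MOBILE             = 2,
	SKETCH_PM_WORKSTATION        = 3,
	SKETCH_PM_ENTERPRISE_SERVER  = 4,
	SKETCH_PM_SOHO_SERVER        = 5,
	SKETCH_PM_APPLIANCE_PC       = 6,
	SKETCH_PM_PERFORMANCE_SERVER = 7,
};

/* Enable HWP boost by default only when firmware describes a server. */
static bool sketch_hwp_boost_default(uint8_t preferred_profile)
{
	return preferred_profile == SKETCH_PM_ENTERPRISE_SERVER ||
	       preferred_profile == SKETCH_PM_PERFORMANCE_SERVER;
}
```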

    Fixes: 41ab43c9c89e (cpufreq: intel_pstate: enable boost for Skylake Xeon)
    Link: https://bugs.freedesktop.org/show_bug.cgi?id=107410
    Reported-by: Eero Tamminen
    Signed-off-by: Srinivas Pandruvada
    Reviewed-by: Francisco Jerez
    Acked-by: Mel Gorman
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     

25 Jul, 2018

1 commit


19 Jul, 2018

1 commit

  • On HWP platforms with Turbo 3.0, the HWP capability max ratio shows the
    maximum ratio of that core, which can be different from that of other
    cores. If the correct maximum frequency is shown in cpufreq sysfs via
    cpuinfo_max_freq and scaling_max_freq, users can tell which cores can
    run faster and pin some high priority tasks to them.

    Currently the max turbo frequency is shown as max frequency, which is
    the max of all cores, even if some cores can't reach that frequency
    even for a single-threaded workload.

    But it is possible that the max ratio in HWP capabilities is set to 0xFF
    or some other invalid high value (e.g. on one KBL NUC). Since the actual
    performance can never exceed the 1-core turbo frequency from MSR
    TURBO_RATIO_LIMIT, we use it as a bound check.
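    The bound check can be sketched as taking the smaller of the per-core HWP highest performance (bits 7:0 of HWP_CAPABILITIES) and the 1-core turbo ratio (bits 7:0 of MSR_TURBO_RATIO_LIMIT). The helper names are hypothetical and the bit positions are assumptions based on the Intel SDM; the driver's actual code may differ.

```c
#include <stdint.h>

/* HWP_CAPABILITIES bits 7:0: highest performance level of this core. */
static uint32_t sketch_hwp_highest(uint64_t cap)
{
	return (uint32_t)(cap & 0xFF);
}

/* MSR_TURBO_RATIO_LIMIT bits 7:0: max ratio with one core active. */
static uint32_t sketch_turbo_1core(uint64_t trl)
{
	return (uint32_t)(trl & 0xFF);
}

/*
 * Per-core max ratio for cpuinfo_max_freq: trust the HWP value only up
 * to the 1-core turbo limit, which filters out bogus values such as 0xFF.
 */
static uint32_t sketch_core_max_ratio(uint64_t hwp_cap, uint64_t trl)
{
	uint32_t hwp = sketch_hwp_highest(hwp_cap);
	uint32_t lim = sketch_turbo_1core(trl);

	return hwp < lim ? hwp : lim;
}
```

    For example, a core reporting 43 against a 1-core limit of 45 keeps 43, while a bogus 0xFF is clamped to 45.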

    Signed-off-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     

18 Jul, 2018

1 commit

  • Currently, intel_pstate doesn't register if _PSS is not present on
    HP Proliant systems, because it expects the firmware to take over
    CPU performance scaling in that case. However, if ACPI PCCH is
    present, the firmware expects the kernel to use it for CPU
    performance scaling and the pcc-cpufreq driver is loaded for that.

    Unfortunately, the firmware interface used by that driver is not
    scalable for fundamental reasons, so pcc-cpufreq is highly suboptimal
    on systems with more than just a few CPUs. In fact, it is better to
    avoid using it at all.

    For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
    is not present and register if it is there. Also prevent the
    pcc-cpufreq driver from trying to initialize itself if intel_pstate
    has been registered already.

    Fixes: fbbcdc0744da (intel_pstate: skip the driver if ACPI has power mgmt option)
    Reported-by: Andreas Herrmann
    Reviewed-by: Andreas Herrmann
    Acked-by: Srinivas Pandruvada
    Tested-by: Andreas Herrmann
    Cc: 4.16+ # 4.16+
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

02 Jul, 2018

1 commit


19 Jun, 2018

1 commit

  • When the scaling max/min settings are changed, they are internally
    converted to a ratio using the max 1-core turbo frequency. This works
    fine when the 1-core max is the same for every core, but under Turbo
    3.0 this is not the case. For example:

    Core 0: max turbo pstate: 43 (4.3GHz)
    Core 1: max turbo pstate: 45 (4.5GHz)

    In this case the 1-core turbo ratio is the maximum of all cores, so it
    is 45 (4.5GHz). Suppose scaling max is set to 4GHz (ratio 40) for all
    cores. Then on Core 0 the ratio will be

    ratio = max_state * policy->max / max_freq
          = 43 * 4000000 / 4500000
          = 38 (3.8GHz, truncated from 38.2)

    which is 200MHz less than desired. On Core 1 it will be correctly set
    to ratio 40 (4GHz). The same holds true for the scaling min frequency
    limit. So the correct max turbo frequency for Core 0, which in this
    case is 4.3GHz, has to be used, and cpu->pstate.turbo_freq needs to be
    adjusted per CPU using the maximum HWP ratio of that core.
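    The conversion above can be checked with a small sketch of the formula, using integer arithmetic so the result truncates as in the example. sketch_freq_to_ratio() is a hypothetical helper; max_state is a ratio and both frequencies are in kHz, as above.

```c
#include <stdint.h>

/*
 * ratio = max_state * policy_max / max_freq, computed in 64 bits to
 * avoid overflow and truncated like the integer math in the example.
 */
static uint32_t sketch_freq_to_ratio(uint32_t max_state,
				     uint32_t policy_max_khz,
				     uint32_t max_freq_khz)
{
	return (uint32_t)((uint64_t)max_state * policy_max_khz / max_freq_khz);
}
```

    With the global 4.5GHz turbo frequency, Core 0 gets 43 * 4000000 / 4500000 = 38; using its own 4.3GHz turbo frequency it gets 40, matching Core 1 (45 * 4000000 / 4500000 = 40).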

    This change uses the HWP capability of a core to adjust max turbo
    frequency. But since Broadwell HWP doesn't use ratios in the HWP
    capabilities, we have to use legacy max 1 core turbo ratio. This is not
    a problem as the HWP capabilities don't differ among cores in Broadwell.
    We need to check for a non-Broadwell CPU model before applying this
    change, though.

    Signed-off-by: Srinivas Pandruvada
    Cc: 4.6+ # 4.6+
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     

13 Jun, 2018

2 commits

  • Pull more power management updates from Rafael Wysocki:
    "These revert a recent PM core change that introduced a regression, fix
    the build when the recently added Kryo cpufreq driver is selected, add
    support for devices attached to multiple power domains to the generic
    power domains (genpd) framework, add support for iowait boosting on
    systems with hardware-managed P-states (HWP) enabled to the
    intel_pstate driver, modify the behavior of the wakeup_count device
    attribute in sysfs, fix a few issues and clean up some ugliness,
    mostly in cpufreq (core and drivers) and in the cpupower utility.

    Specifics:

    - Revert a recent PM core change that attempted to fix an issue
    related to device links, but introduced a regression (Rafael
    Wysocki)

    - Fix build when the recently added cpufreq driver for Kryo
    processors is selected by making it possible to build that driver
    as a module (Arnd Bergmann)

    - Fix the long idle detection mechanism in the out-of-band (ondemand
    and conservative) cpufreq governors (Chen Yu)

    - Add support for devices in multiple power domains to the generic
    power domains (genpd) framework (Ulf Hansson)

    - Add support for iowait boosting on systems with hardware-managed
    P-states (HWP) enabled to the intel_pstate driver and make it use
    that feature on systems with Skylake Xeon processors as it is
    reported to improve performance significantly on those systems
    (Srinivas Pandruvada)

    - Fix and update the acpi_cpufreq, ti-cpufreq and imx6q cpufreq
    drivers (Colin Ian King, Suman Anna, Sébastien Szymanski)

    - Change the behavior of the wakeup_count device attribute in sysfs
    to expose the number of events when the device might have aborted
    system suspend in progress (Ravi Chandra Sadineni)

    - Fix two minor issues in the cpupower utility (Abhishek Goel, Colin
    Ian King)"

    * tag 'pm-4.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    Revert "PM / runtime: Fixup reference counting of device link suppliers at probe"
    cpufreq: imx6q: check speed grades for i.MX6ULL
    cpufreq: governors: Fix long idle detection logic in load calculation
    cpufreq: intel_pstate: enable boost for Skylake Xeon
    PM / wakeup: Export wakeup_count instead of event_count via sysfs
    PM / Domains: Add dev_pm_domain_attach_by_id() to manage multi PM domains
    PM / Domains: Add support for multi PM domains per device to genpd
    PM / Domains: Split genpd_dev_pm_attach()
    PM / Domains: Don't attach devices in genpd with multi PM domains
    PM / Domains: dt: Allow power-domain property to be a list of specifiers
    cpufreq: intel_pstate: New sysfs entry to control HWP boost
    cpufreq: intel_pstate: HWP boost performance on IO wakeup
    cpufreq: intel_pstate: Add HWP boost utility and sched util hooks
    cpufreq: ti-cpufreq: Use devres managed API in probe()
    cpufreq: ti-cpufreq: Fix an incorrect error return value
    cpufreq: ACPI: make function acpi_cpufreq_fast_switch() static
    cpufreq: kryo: allow building as a loadable module
    cpupower : Fix header name to read idle state name
    cpupower: fix spelling mistake: "logilename" -> "logfilename"

    Linus Torvalds
     
  • The vzalloc() function has no 2-factor argument form, so multiplication
    factors need to be wrapped in array_size(). This patch replaces cases of:

    vzalloc(a * b)

    with:

    vzalloc(array_size(a, b))

    as well as handling cases of:

    vzalloc(a * b * c)

    with:

    vzalloc(array3_size(a, b, c))

    This does, however, attempt to ignore constant size factors like:

    vzalloc(4 * 1024)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.
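    The point of the conversion is that an overflowing product must not silently wrap around to a small allocation size. A userspace sketch of the same idea, assuming the kernel helpers saturate to SIZE_MAX on overflow (so the allocation fails), built on the GCC/Clang __builtin_mul_overflow(); the sketch_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* a * b, or SIZE_MAX if the product overflows size_t. */
static size_t sketch_array_size(size_t a, size_t b)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* a * b * c, or SIZE_MAX if either multiplication overflows. */
static size_t sketch_array3_size(size_t a, size_t b, size_t c)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes) ||
	    __builtin_mul_overflow(bytes, c, &bytes))
		return SIZE_MAX;
	return bytes;
}
```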

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    vzalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    vzalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    vzalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    vzalloc(
    - sizeof(TYPE) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT_ID
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT_ID
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    vzalloc(
    - SIZE * COUNT
    + array_size(COUNT, SIZE)
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    vzalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    vzalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vzalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vzalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    vzalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    vzalloc(C1 * C2 * C3, ...)
    |
    vzalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants.
    @@
    expression E1, E2;
    constant C1, C2;
    @@

    (
    vzalloc(C1 * C2, ...)
    |
    vzalloc(
    - E1 * E2
    + array_size(E1, E2)
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

08 Jun, 2018

1 commit


06 Jun, 2018

3 commits

  • A new attribute is added to intel_pstate sysfs to enable/disable
    HWP dynamic performance boost.

    Reported-by: Mel Gorman
    Tested-by: Giovanni Gherdovich
    Signed-off-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     
  • This change uses the SCHED_CPUFREQ_IOWAIT flag to boost HWP performance.
    Since the SCHED_CPUFREQ_IOWAIT flag is set frequently, we don't start
    boosting steps unless we see the flag in two consecutive ticks. This
    avoids boosting for IO caused by regular system activity.

    To avoid synchronization issues, the actual processing of the flag is
    done on the local CPU callback.
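    A sketch of the filter, loosely modeled on the description: an IOWAIT event triggers a boost only if the previous one arrived less than two ticks earlier. The 4 ms tick (HZ=250) and all names are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick length: 4 ms (HZ=250), in nanoseconds. */
#define SKETCH_TICK_NSEC 4000000ULL

struct sketch_cpu {
	uint64_t last_io_update;   /* time of the previous IOWAIT event (ns) */
};

/*
 * Called for each IOWAIT-flagged update. Returns true (boost) only when
 * the previous IOWAIT event was less than two ticks ago; an isolated
 * event just records its timestamp.
 */
static bool sketch_iowait_should_boost(struct sketch_cpu *cpu, uint64_t now_ns)
{
	bool boost = now_ns < cpu->last_io_update + 2 * SKETCH_TICK_NSEC;

	cpu->last_io_update = now_ns;
	return boost;
}
```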

    Reported-by: Mel Gorman
    Tested-by: Giovanni Gherdovich
    Signed-off-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     
  • Added two utility functions to HWP boost up gradually and boost down to
    the default cached HWP request values.

    Boost up:
    Boost up raises the HWP request minimum value in steps. This minimum
    value can reach up to the HWP request maximum value, depending on how
    frequently the boost up function is called. At most, boost up takes
    three steps to reach the maximum, depending on the current HWP request
    levels and HWP capabilities. For example, if the current settings are:
    - P0 (Turbo max) = P1 (Guaranteed max) = min:
      no boost at all.
    - P0 (Turbo max) > P1 (Guaranteed max) = min:
      one level of boost only, to P0.
    - P0 (Turbo max) = P1 (Guaranteed max) > min:
      two levels of boost, (min + P1)/2 and P1.
    - P0 (Turbo max) > P1 (Guaranteed max) > min:
      three levels of boost, (min + P1)/2, P1 and P0.
    We don't set any level between P0 and P1 as there is no guarantee that
    they will be honored.
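    The level selection can be sketched as below. This illustrates the rules above under hypothetical names, with the HWP request maximum standing in for P0 when turbo is allowed; it is not the driver's code.

```c
#include <stdint.h>

/*
 * Next boosted minimum for one boost-up step, given the current boosted
 * minimum (0 = not boosted yet), the requested minimum, P1 (guaranteed)
 * and the request maximum (P0 when turbo is allowed). Returns the value
 * unchanged once no higher level is left.
 */
static uint32_t sketch_boost_up(uint32_t cur, uint32_t min, uint32_t p1,
				uint32_t max)
{
	uint32_t level1 = (min + p1) / 2;   /* halfway between min and P1 */

	if (max == min || cur >= max)
		return cur;                 /* nothing (left) to boost */
	if (cur == 0)
		cur = min;
	if (cur < level1)
		return level1;
	if (cur < p1)
		return p1;
	if (cur == p1 && max != p1)
		return max;                 /* no levels between P1 and P0 */
	return cur;
}
```

    Starting from 0 with min = 10, P1 = 30 and max = 40, repeated calls step through 20, 30 and 40 (the three-level case); with min = P1 = max nothing changes.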

    Boost down:
    After the system is idle for hold time of 3ms, the HWP request is reset
    to the default value from HWP init or user modified one via sysfs.

    Caching of HWP Request and Capabilities
    Store the HWP request value last set using MSR_HWP_REQUEST and the value
    read from MSR_HWP_CAPABILITIES. This avoids reading MSRs in the boost
    utility functions.

    The limits calculated by these boost utility functions are based on the
    latest HWP request value, which can be modified by the setpolicy()
    callback. So if user space modifies the minimum perf value, that will be
    accounted for every time boost up is called. There may be contention
    with the user-modified minimum perf, in which case the user value takes
    precedence. For example, just before the HWP_REQUEST MSR is updated from
    the setpolicy() callback, the boost up function may be called via the
    scheduler tick callback. Here the cached MSR value is already the latest
    and the limits are updated based on the latest user limits, but on
    return the MSR write callback called from the setpolicy() callback will
    update the HWP_REQUEST value. This value will be used until the next
    time the boost up function is called.

    In addition add a variable to control HWP dynamic boosting. When HWP
    dynamic boost is active then set the HWP specific update util hook. The
    contents in the utility hooks will be filled in the subsequent patches.

    Reported-by: Mel Gorman
    Tested-by: Giovanni Gherdovich
    Signed-off-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     

15 May, 2018

1 commit

  • Allow use of the trace_pstate_sample trace function
    when the intel_pstate driver is in passive mode.
    Since the core_busy and scaled_busy fields are not
    used, and it might be desirable to know which path
    through the driver was used, either intel_cpufreq_target
    or intel_cpufreq_fast_switch, re-task the core_busy
    field as a flag indicator.

    The user can then use the intel_pstate_tracer.py utility
    to summarize and plot the trace.

    Note: The core_busy field still goes by that name
    in include/trace/events/power.h and within the
    intel_pstate_tracer.py script and csv file headers,
    but it is graphed as "performance", and called
    core_avg_perf now in the intel_pstate driver.

    Sometimes, in passive mode, the driver is not called for
    many tens or even hundreds of seconds. The user
    needs to understand, and not be confused by, this limitation.

    Signed-off-by: Doug Smythies
    Signed-off-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki

    Doug Smythies
     

10 Apr, 2018

1 commit


08 Feb, 2018

1 commit

  • When maxcpus=1 is in the kernel command line, the BP is responsible
    for re-enabling HWP after resume. However, currently only the APs
    invoke intel_pstate_hwp_enable() during their online process, so HWP
    is not re-enabled on the BP, which might put the system into an
    unstable state after resume.

    Fix this by enabling the HWP explicitly on BP during resume.

    Reported-by: Doug Smythies
    Suggested-by: Srinivas Pandruvada
    Signed-off-by: Yu Chen
    [ rjw: Subject/changelog, minor modifications ]
    Signed-off-by: Rafael J. Wysocki

    Chen Yu
     

12 Jan, 2018

2 commits

  • Currently intel_pstate can function only in HWP mode on Skylake servers.
    When the HWP feature is not enabled on the processor, the acpi-cpufreq
    driver is used.

    Power and performance tests using the intel_pstate scaling algorithm
    show comparable results. But intel_pstate brings in additional
    features:
    - Display of turbo frequency range, which many users like to see
    - Place limits in the turbo frequency range when platform allows

    Since these tests were done only with the non-PID algorithm introduced
    in kernel version 4.14, this patch is not a backport candidate. Each
    user has to carefully weigh the benefits before backporting it.

    Signed-off-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     
  • Since core_funcs and bxt_funcs have the same set of callbacks, replace
    bxt_funcs with core_funcs.

    Signed-off-by: Srinivas Pandruvada
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     

06 Sep, 2017

1 commit

  • Pull ACPI updates from Rafael Wysocki:
    "These include a usual ACPICA code update (this time to upstream
    revision 20170728), a fix for a boot crash on some systems with
    Thunderbolt devices connected at boot time, a rework of the handling
    of PCI bridges when setting up device wakeup, new support for Apple
    device properties, support for DMA configurations reported via ACPI on
    ARM64, APEI-related updates, ACPI EC driver updates and assorted minor
    modifications in several places.

    Specifics:

    - Update the ACPICA code in the kernel to upstream revision 20170728
    including:
    * Alias operator handling update (Bob Moore).
    * Deferred resolution of reference package elements (Bob Moore).
    * Support for the _DMA method in walk resources (Bob Moore).
    * Tables handling update and support for deferred table
    verification (Lv Zheng).
    * Update of SMMU models for IORT (Robin Murphy).
    * Compiler and disassembler updates (Alex James, Erik Schmauss,
    Ganapatrao Kulkarni, James Morse).
    * Tools updates (Erik Schmauss, Lv Zheng).
    * Assorted minor fixes and cleanups (Bob Moore, Kees Cook, Lv
    Zheng, Shao Ming).

    - Rework the initialization of non-wakeup GPEs with method handlers
    in order to address a boot crash on some systems with Thunderbolt
    devices connected at boot time where we miss an early hotplug event
    due to a delay in GPE enabling (Rafael Wysocki).

    - Rework the handling of PCI bridges when setting up ACPI-based
    device wakeup in order to avoid disabling wakeup for bridges
    prematurely (Rafael Wysocki).

    - Consolidate Apple DMI checks throughout the tree, add support for
    Apple device properties to the device properties framework and use
    these properties for the handling of I2C and SPI devices on Apple
    systems (Lukas Wunner).

    - Add support for _DMA to the ACPI-based device properties lookup
    code and make it possible to use the information from there to
    configure DMA regions on ARM64 systems (Lorenzo Pieralisi).

    - Fix several issues in the APEI code, add support for exporting the
    BERT error region over sysfs and update APEI MAINTAINERS entry with
    reviewers information (Borislav Petkov, Dongjiu Geng, Loc Ho, Punit
    Agrawal, Tony Luck, Yazen Ghannam).

    - Fix a potential initialization ordering issue in the ACPI EC driver
    and clean it up somewhat (Lv Zheng).

    - Update the ACPI SPCR driver to extend the existing XGENE 8250
    workaround in it to a new platform (m400) and to work around an
    Xgene UART clock issue (Graeme Gregory).

    - Add a new utility function to the ACPI core to support using ACPI
    OEM ID / OEM Table ID / Revision for system identification in
    blacklisting or similar and switch over the existing code already
    using this information to this new interface (Toshi Kani).

    - Fix an xpower PMIC issue related to GPADC reads that always return
    0 without extra pin manipulations (Hans de Goede).

    - Add statements to print debug messages in a couple of places in the
    ACPI core for easier diagnostics (Rafael Wysocki).

    - Clean up the ACPI processor driver slightly (Colin Ian King, Hanjun
    Guo).

    - Clean up the ACPI x86 boot code somewhat (Andy Shevchenko).

    - Add a quirk for Dell OptiPlex 9020M to the ACPI backlight driver
    (Alex Hung).

    - Assorted fixes, cleanups and updates related to ACPI (Amitoj Kaur
    Chawla, Bhumika Goyal, Frank Rowand, Jean Delvare, Punit Agrawal,
    Ronald Tschalär, Sumeet Pawnikar)"

    * tag 'acpi-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits)
    ACPI / APEI: Suppress message if HEST not present
    intel_pstate: convert to use acpi_match_platform_list()
    ACPI / blacklist: add acpi_match_platform_list()
    ACPI, APEI, EINJ: Subtract any matching Register Region from Trigger resources
    ACPI: make device_attribute const
    ACPI / sysfs: Extend ACPI sysfs to provide access to boot error region
    ACPI: APEI: fix the wrong iteration of generic error status block
    ACPI / processor: make function acpi_processor_check_duplicates() static
    ACPI / EC: Clean up EC GPE mask flag
    ACPI: EC: Fix possible issues related to EC initialization order
    ACPI / PM: Add debug statements to acpi_pm_notify_handler()
    ACPI: Add debug statements to acpi_global_event_handler()
    ACPI / scan: Enable GPEs before scanning the namespace
    ACPICA: Make it possible to enable runtime GPEs earlier
    ACPICA: Dispatch active GPEs at init time
    ACPI: SPCR: work around clock issue on xgene UART
    ACPI: SPCR: extend XGENE 8250 workaround to m400
    ACPI / LPSS: Don't abort ACPI scan on missing mem resource
    mailbox: pcc: Drop uninformative output during boot
    ACPI/IORT: Add IORT named component memory address limits
    ...

    Linus Torvalds
     

04 Sep, 2017

3 commits

  • * intel_pstate:
    cpufreq: intel_pstate: Shorten a couple of long names
    cpufreq: intel_pstate: Simplify intel_pstate_adjust_pstate()
    cpufreq: intel_pstate: Improve IO performance with per-core P-states
    cpufreq: intel_pstate: Drop INTEL_PSTATE_HWP_SAMPLING_INTERVAL
    cpufreq: intel_pstate: Drop ->update_util from pstate_funcs
    cpufreq: intel_pstate: Do not use PID-based P-state selection

    Rafael J. Wysocki
     
  • * pm-cpufreq-sched:
    cpufreq: schedutil: Always process remote callback with slow switching
    cpufreq: schedutil: Don't restrict kthread to related_cpus unnecessarily
    cpufreq: Return 0 from ->fast_switch() on errors
    cpufreq: Simplify cpufreq_can_do_remote_dvfs()
    cpufreq: Process remote callbacks from any CPU if the platform permits
    sched: cpufreq: Allow remote cpufreq callbacks
    cpufreq: schedutil: Use unsigned int for iowait boost
    cpufreq: schedutil: Make iowait boost more energy efficient

    Rafael J. Wysocki
     
  • * pm-cpufreq: (33 commits)
    cpufreq: imx6q: Fix imx6sx low frequency support
    cpufreq: speedstep-lib: make several arrays static, makes code smaller
    cpufreq: ti: Fix 'of_node_put' being called twice in error handling path
    cpufreq: dt-platdev: Drop few entries from whitelist
    cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2
    ARM: ux500: don't select CPUFREQ_DT
    cpufreq: Convert to using %pOF instead of full_name
    cpufreq: Cap the default transition delay value to 10 ms
    cpufreq: dbx500: Delete obsolete driver
    mfd: db8500-prcmu: Get rid of cpufreq dependency
    cpufreq: enable the DT cpufreq driver on the Ux500
    cpufreq: Loongson2: constify platform_device_id
    cpufreq: dt: Add r8a7796 support to to use generic cpufreq driver
    cpufreq: remove setting of policy->cpu in policy->cpus during init
    cpufreq: mediatek: add support of cpufreq to MT7622 SoC
    cpufreq: mediatek: add cleanups with the more generic naming
    cpufreq: rcar: Add support for R8A7795 SoC
    cpufreq: dt: Add rk3328 compatible to use generic cpufreq driver
    cpufreq: s5pv210: add missing of_node_put()
    cpufreq: Allow dynamic switching with CPUFREQ_ETERNAL latency
    ...

    Rafael J. Wysocki
     

29 Aug, 2017

1 commit


21 Aug, 2017

1 commit


18 Aug, 2017

1 commit

  • policy->cpu is copied into policy->cpus in cpufreq_online() before
    calling into cpufreq_driver->init(). So there's no need to set the
    same in the individual driver init() functions again.

    This patch removes the redundant setting of policy->cpu in policy->cpus
    in the intel_pstate and cppc drivers.

    Reported-by: Viresh Kumar
    Signed-off-by: Sudeep Holla
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Sudeep Holla
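    A minimal user-space model of why the per-driver assignment is redundant. The bitmask-based policy_model, core_online() and the two init functions are hypothetical simplifications of struct cpufreq_policy, cpufreq_online() and a driver's ->init() callback; the model assumes fewer than 64 CPUs.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL, simplified policy: cpus is a 64-bit mask, not a cpumask. */
struct policy_model {
	unsigned int cpu; /* assumed < 64 */
	uint64_t cpus;
};

/* Models the core: policy->cpu is copied into policy->cpus before the
 * driver's ->init() callback runs. */
void core_online(struct policy_model *p,
		 void (*driver_init)(struct policy_model *))
{
	p->cpus |= UINT64_C(1) << p->cpu;
	driver_init(p);
}

/* Old driver init: sets the same bit again (redundant). */
void old_driver_init(struct policy_model *p)
{
	p->cpus |= UINT64_C(1) << p->cpu;
}

/* New driver init: relies on the core having set the bit already. */
void new_driver_init(struct policy_model *p)
{
	(void)p; /* other per-driver setup would go here */
}
```

    Because setting a bit that is already set is a no-op, dropping the driver-side assignment cannot change the resulting mask.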
     

11 Aug, 2017

1 commit

  • The intel_pstate CPU frequency scaling driver has always
    calculated CPU frequency incorrectly. Recent changes have
    eliminated most of the issues; however, the frequency reported
    in the trace buffer, if used, is still incorrect.

    It remains desirable for cpu->pstate.scaling to be a nice
    round number for things such as setting max and min frequencies,
    so the proposal is to just fix the reported frequency in the trace data.

    Fixes what remains of [1].

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=96521 # [1]
    Signed-off-by: Doug Smythies
    Signed-off-by: Rafael J. Wysocki

    Doug Smythies
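    The following sketch shows the kind of arithmetic involved in reporting an average frequency. It assumes, as an illustration and not necessarily the exact bug, that the average is computed from APERF/MPERF deltas and a reference frequency, and that the error comes from building that reference from the round per-ratio scaling value instead of a measured clock. The function names avg_freq_khz() and nominal_ref_khz() and all numbers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Average frequency over a sample, in kHz, from APERF/MPERF deltas.
 * ref_khz is the frequency MPERF is ASSUMED to tick at. Uses 64-bit math;
 * the caller must guarantee mperf_delta > 0. */
uint64_t avg_freq_khz(uint64_t ref_khz, uint64_t aperf_delta,
		      uint64_t mperf_delta)
{
	return ref_khz * aperf_delta / mperf_delta;
}

/* Reference built from a round per-ratio scaling value (e.g. 100000 kHz). */
uint64_t nominal_ref_khz(unsigned int base_ratio, unsigned int scaling_khz)
{
	return (uint64_t)base_ratio * scaling_khz;
}
```

    For example, with a hypothetical base ratio of 34 and a scaling of 100000 kHz, nominal_ref_khz() gives 3400000 kHz, so an APERF/MPERF ratio of 50/100 reports 1700000 kHz; a measured reference of 3392000 kHz would instead give 1696000 kHz. Keeping the round value for limits while using a more accurate figure only for reporting is the trade-off the commit describes.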