09 Apr, 2016

1 commit

  • * pm-core:
    PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
    PM / runtime: Document steps for device removal

    * powercap:
    powercap: intel_rapl: Add missing Haswell model

    * pm-tools:
    tools/power turbostat: work around RC6 counter wrap
    tools/power turbostat: initial KBL support
    tools/power turbostat: initial SKX support
    tools/power turbostat: decode BXT TSC frequency via CPUID
    tools/power turbostat: initial BXT support
    tools/power turbostat: print IRTL MSRs
    tools/power turbostat: SGX state should print only if --debug

    Rafael J. Wysocki
     

08 Apr, 2016

7 commits


17 Mar, 2016

1 commit

  • Pull power management and ACPI updates from Rafael Wysocki:
    "This time the majority of changes go into cpufreq and they are
    significant.

    First off, the way CPU frequency updates are triggered is different
    now. Instead of having to set up and manage a deferrable timer for
    each CPU in the system to evaluate and possibly change its frequency
    periodically, cpufreq governors set up callbacks to be invoked by the
    scheduler on a regular basis (basically on utilization updates). The
    "old" governors, "ondemand" and "conservative", still do all of their
    work in process context (although that is triggered by the scheduler
    now), but intel_pstate does it all in the callback invoked by the
    scheduler with no need for any additional asynchronous processing.

    Of course, this eliminates the overhead related to the management of
    all those timers, but also it allows the cpufreq governor code to be
    simplified quite a bit. On top of that, the common code and data
    structures used by the "ondemand" and "conservative" governors are
    cleaned up and made more straightforward and some long-standing and
    quite annoying problems are addressed. In particular, the handling of
    governor sysfs attributes is modified and the related locking becomes
    more fine grained which allows some concurrency problems to be avoided
    (particularly deadlocks with the core cpufreq code).

    In principle, the new mechanism for triggering frequency updates
    allows utilization information to be passed from the scheduler to
    cpufreq. Although the current code doesn't make use of it, in the
    works is a new cpufreq governor that will make decisions based on the
    scheduler's utilization data. That should allow the scheduler and
    cpufreq to work more closely together in the long run.

    In addition to the core and governor changes, cpufreq drivers are
    updated too. Fixes and optimizations go into intel_pstate, the
    cpufreq-dt driver is updated on top of some modifications in the
    Operating Performance Points (OPP) framework and there are fixes and
    other updates in the powernv cpufreq driver.

    Apart from the cpufreq updates there is some new ACPICA material,
    including a fix for a problem introduced by previous ACPICA updates,
    and some less significant changes in the ACPI code, like CPPC code
    optimizations, ACPI processor driver cleanups and support for loading
    ACPI tables from initrd.

    Also updated are the generic power domains framework, the Intel RAPL
    power capping driver and the turbostat utility and we have a bunch of
    traditional assorted fixes and cleanups.

    Specifics:

    - Redesign of cpufreq governors and the intel_pstate driver to make
    them use callbacks invoked by the scheduler to trigger CPU
    frequency evaluation instead of using per-CPU deferrable timers for
    that purpose (Rafael Wysocki).

    - Reorganization and cleanup of cpufreq governor code to make it more
    straightforward and fix some concurrency problems in it (Rafael
    Wysocki, Viresh Kumar).

    - Cleanup and improvements of locking in the cpufreq core (Viresh
    Kumar).

    - Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
    Kumar, Eric Biggers).

    - intel_pstate driver updates including fixes, optimizations and a
    modification to make it enable hardware-coordinated P-state
    selection (HWP) by default if supported by the processor (Philippe
    Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
    Franciosi).

    - Operating Performance Points (OPP) framework updates to improve its
    handling of voltage regulators and device clocks and updates of the
    cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).

    - Updates of the powernv cpufreq driver to fix initialization and
    cleanup problems in it and correct its worker thread handling with
    respect to CPU offline, new powernv_throttle tracepoint (Shilpasri
    Bhat).

    - ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).

    - ACPICA updates including one fix for a regression introduced by
    previous changes in the ACPICA code (Bob Moore, Lv Zheng, David Box,
    Colin Ian King).

    - Support for installing ACPI tables from initrd (Lv Zheng).

    - Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
    Chaugule).

    - Support for _HID(ACPI0010) devices (ACPI processor containers) and
    ACPI processor driver cleanups (Sudeep Holla).

    - Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
    Aleksey Makarov).

    - Modification of the ACPI PCI IRQ management code to make it treat
    255 in the Interrupt Line register as "not connected" on x86 (as
    per the specification) and avoid attempts to use that value as a
    valid interrupt vector (Chen Fan).

    - ACPI APEI fixes related to resource leaks (Josh Hunt).

    - Removal of modularity from a few ACPI drivers (BGRT, GHES,
    intel_pmic_crc) that cannot be built as modules in practice (Paul
    Gortmaker).

    - PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
    as a valid resource type (Harb Abdulhamid).

    - New device ID (future AMD I2C controller) in the ACPI driver for
    AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).

    - Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).

    - cpuidle menu governor optimization to avoid a square root
    computation in it (Rasmus Villemoes).

    - Fix for potential use-after-free in the generic device properties
    framework (Heikki Krogerus).

    - Updates of the generic power domains (genpd) framework including
    support for multiple power states of a domain, fixes and debugfs
    output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
    Geert Uytterhoeven).

    - Intel RAPL power capping driver updates to reduce IPI overhead in
    it (Jacob Pan).

    - System suspend/hibernation code cleanups (Eric Biggers, Saurabh
    Sengar).

    - Year 2038 fix for the process freezer (Abhilash Jindal).

    - turbostat utility updates including new features (decoding of more
    registers and CPUID fields, sub-second intervals support, GFX MHz
    and RC6 printout, --out command line option), fixes (syscall jitter
    detection and workaround, reduction of the number of syscalls
    made, fixes related to Xeon x200 processors, compiler warning
    fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)"

    * tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits)
    tools/power turbostat: bugfix: TDP MSRs print bits fixing
    tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
    tools/power turbostat: call __cpuid() instead of __get_cpuid()
    tools/power turbostat: indicate SMX and SGX support
    tools/power turbostat: detect and work around syscall jitter
    tools/power turbostat: show GFX%rc6
    tools/power turbostat: show GFXMHz
    tools/power turbostat: show IRQs per CPU
    tools/power turbostat: make fewer systems calls
    tools/power turbostat: fix compiler warnings
    tools/power turbostat: add --out option for saving output in a file
    tools/power turbostat: re-name "%Busy" field to "Busy%"
    tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
    tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
    tools/power turbostat: allow sub-sec intervals
    ACPI / APEI: ERST: Fixed leaked resources in erst_init
    ACPI / APEI: Fix leaked resources
    intel_pstate: Do not skip samples partially
    intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
    intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
    ...

    Linus Torvalds
     

14 Mar, 2016

1 commit

  • Pull turbostat updates for 4.6 from Len Brown.

    * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
    tools/power turbostat: bugfix: TDP MSRs print bits fixing
    tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
    tools/power turbostat: call __cpuid() instead of __get_cpuid()
    tools/power turbostat: indicate SMX and SGX support
    tools/power turbostat: detect and work around syscall jitter
    tools/power turbostat: show GFX%rc6
    tools/power turbostat: show GFXMHz
    tools/power turbostat: show IRQs per CPU
    tools/power turbostat: make fewer systems calls
    tools/power turbostat: fix compiler warnings
    tools/power turbostat: add --out option for saving output in a file
    tools/power turbostat: re-name "%Busy" field to "Busy%"
    tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
    tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
    tools/power turbostat: allow sub-sec intervals
    tools/power turbostat: Decode MSR_MISC_PWR_MGMT
    tools/power turbostat: decode HWP registers
    x86 msr-index: Simplify syntax for HWP fields
    tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency
    tools/power turbostat: decode more CPUID fields

    Rafael J. Wysocki
     

13 Mar, 2016

15 commits

  • MSR_CONFIG_TDP_NOMINAL:
    should print all 8 bits of base_ratio (bit 0:7) 0xFF

    MSR_CONFIG_TDP_LEVEL_1:
    should print all 15 bits of PKG_MIN_PWR_LVL1 (bit 48:62) 0x7FFF
    should print all 15 bits of PKG_MAX_PWR_LVL1 (bit 32:46) 0x7FFF
    should print all 8 bits of LVL1_RATIO (bit 16:23) 0xFF
    should print all 15 bits of PKG_TDP_LVL1 (bit 0:14) 0x7FFF

    And the same modification to MSR_CONFIG_TDP_LEVEL_2.

    MSR_TURBO_ACTIVATION_RATIO:
    should print all 8 bits of MAX_NON_TURBO_RATIO (bit 0:7) 0xFF
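
    The field widths listed above can be illustrated with a small sketch. It
    is written in Python rather than turbostat's C, and the helper names
    (bits, decode_config_tdp_level) are hypothetical; only the bit ranges
    come from the text above.

```python
def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of value, right-aligned."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def decode_config_tdp_level(msr: int) -> dict:
    """Decode the MSR_CONFIG_TDP_LEVEL_1/2 layout described in the commit."""
    return {
        "PKG_MIN_PWR": bits(msr, 48, 62),  # 15 bits, mask 0x7FFF
        "PKG_MAX_PWR": bits(msr, 32, 46),  # 15 bits, mask 0x7FFF
        "RATIO": bits(msr, 16, 23),        # 8 bits, mask 0xFF
        "PKG_TDP": bits(msr, 0, 14),       # 15 bits, mask 0x7FFF
    }
```

    A mask that is one bit too narrow silently drops the top bit of a field,
    which is the class of error this commit corrects.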

    Signed-off-by: Chen Yu
    Signed-off-by: Len Brown

    Chen Yu
     
  • MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=0: unlimited)
    should print as
    MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=8: unlimited)

    Signed-off-by: Len Brown

    Len Brown
     
  • turbostat already checks whether calling each cpuid leaf is legal,
    and it doesn't look at the function return value,
    so call the simpler gcc intrinsic __cpuid() instead of __get_cpuid().

    syntax only, no functional change

    Signed-off-by: Len Brown

    Len Brown
     
  • SGX presence is related to a SKL power workaround,
    so let's show when that is enabled.

    Signed-off-by: Len Brown

    Len Brown
     
  • The accuracy of Bzy_MHz and Busy% depends on reading
    the TSC, APERF, and MPERF close together in time.

    When there is a very short measurement interval,
    or a large system is profoundly idle, the changes
    in APERF and MPERF may be very small.
    They can be small enough that an expensive interrupt
    between reading APERF and MPERF can cause the APERF/MPERF
    ratio to become inaccurate, resulting in invalid
    calculation and display of Bzy_MHz.

    A dummy read of APERF makes this problem
    much more rare. Apparently the first system call
    after exiting a long stretch of idle is when we
    typically see expensive timer interrupts that cause
    large jitter.

    For the cases that the dummy APERF read fails to prevent,
    we compare the latency of the APERF and MPERF reads.
    If they differ by more than 2x, we re-issue them.
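
    The retry logic described above might look roughly like the following
    Python sketch. The function name, the injectable clock and the max_tries
    bound are assumptions made for illustration; the commit text only states
    the dummy read and the 2x latency comparison.

```python
import time
from typing import Callable, Tuple


def read_aperf_mperf(
    read_aperf: Callable[[], int],
    read_mperf: Callable[[], int],
    clock: Callable[[], float] = time.perf_counter,
    max_tries: int = 5,  # assumed bound, not taken from the commit
) -> Tuple[int, int]:
    """Read APERF and MPERF back to back, retrying on lopsided latency."""
    read_aperf()  # dummy read: absorbs the expensive first call after idle
    aperf = mperf = 0
    for _ in range(max_tries):
        t0 = clock()
        aperf = read_aperf()
        t1 = clock()
        mperf = read_mperf()
        t2 = clock()
        aperf_latency, mperf_latency = t1 - t0, t2 - t1
        # Accept the pair only if neither read took more than 2x the other.
        if aperf_latency <= 2 * mperf_latency and mperf_latency <= 2 * aperf_latency:
            break
    return aperf, mperf
```

    If every attempt is lopsided, this sketch returns the last pair read
    rather than failing.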

    Signed-off-by: Len Brown

    Len Brown
     
  • The column "GFX%rc6" shows the percentage of time the GPU
    is in the "render C6" state, rc6. Deep package C-states on several
    systems depend on the GPU being in RC6.

    This information comes from the counter
    /sys/class/drm/card0/power/rc6_residency_ms,
    as read before and after the measurement interval.
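
    As a rough Python illustration of the calculation, assuming the counter
    does not wrap during the interval (the function names are hypothetical):

```python
def read_rc6_ms(path: str = "/sys/class/drm/card0/power/rc6_residency_ms") -> int:
    """Read the cumulative RC6 residency counter, in milliseconds."""
    with open(path) as f:
        return int(f.read().strip())


def gfx_rc6_percent(rc6_ms_before: int, rc6_ms_after: int, interval_sec: float) -> float:
    """Percentage of the measurement interval the GPU spent in RC6."""
    residency_ms = rc6_ms_after - rc6_ms_before
    return 100.0 * residency_ms / (interval_sec * 1000.0)
```

    For example, a counter that advances by 2500 ms over a 5 second interval
    corresponds to 50% RC6 residency.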

    Signed-off-by: Len Brown

    Len Brown
     
  • Under the column "GFXMHz", show a snapshot of this attribute:
    /sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz

    This is an instantaneous snapshot of what sysfs presents
    at the end of the measurement interval. turbostat does
    not average or otherwise perform any math on this value.

    Signed-off-by: Len Brown

    Len Brown
     
  • The new IRQ column shows how many interrupts have occurred on each CPU
    during the measurement interval. This information comes from
    the difference between /proc/interrupts snapshots made before
    and after the measurement interval.

    The first row, the system summary, shows the sum of the IRQs
    for all CPUs during that interval.
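
    The snapshot difference can be sketched in Python as follows. This is an
    illustration only, not turbostat's parser; it assumes the usual
    /proc/interrupts layout (a header row of CPU names, then one row per
    interrupt source) and skips rows that do not carry one count per CPU.

```python
from typing import List


def irqs_per_cpu(proc_interrupts: str) -> List[int]:
    """Sum the per-CPU counts in one /proc/interrupts snapshot."""
    lines = proc_interrupts.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpus
    for line in lines[1:]:
        _, _, rest = line.partition(":")
        fields = rest.split()[:ncpus]
        # Rows such as "ERR:" hold a single global count; skip them.
        if len(fields) == ncpus and all(f.isdigit() for f in fields):
            for cpu, field in enumerate(fields):
                totals[cpu] += int(field)
    return totals


def irq_delta(before: str, after: str) -> List[int]:
    """Interrupts per CPU that occurred between two snapshots."""
    return [b - a for a, b in zip(irqs_per_cpu(before), irqs_per_cpu(after))]
```

    The system summary row is then simply sum(irq_delta(before, after)).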

    Signed-off-by: Len Brown

    Len Brown
     
  • skip the open(2)/close(2) on each msr read
    by keeping the /dev/cpu/*/msr files open.

    The remaining read(2) is generally far fewer cycles
    than the removed open(2) system call.
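
    The idea of keeping the descriptors open can be sketched in Python as
    below. The class name and the path_template parameter are hypothetical,
    and the little-endian decoding assumes an x86 host. Reading the real
    /dev/cpu/*/msr files requires root and the msr driver.

```python
import os
import struct
from typing import Dict


class MsrReader:
    """Keep one file descriptor per CPU open across reads."""

    def __init__(self, path_template: str = "/dev/cpu/{cpu}/msr") -> None:
        self.path_template = path_template
        self.fds: Dict[int, int] = {}

    def read(self, cpu: int, offset: int) -> int:
        fd = self.fds.get(cpu)
        if fd is None:
            # open(2) happens only on the first read for this CPU.
            fd = os.open(self.path_template.format(cpu=cpu), os.O_RDONLY)
            self.fds[cpu] = fd
        # An MSR is 64 bits wide; the file offset selects the MSR address.
        data = os.pread(fd, 8, offset)
        return struct.unpack("<Q", data)[0]

    def close(self) -> None:
        for fd in self.fds.values():
            os.close(fd)
        self.fds.clear()
```

    Each later read costs a single positioned read instead of an
    open/read/close triple.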

    Signed-off-by: Len Brown

    Len Brown
     
  • Signed-off-by: Len Brown

    Len Brown
     
  • By default...

    Turbostat --debug configuration info goes to stderr.

    In FORK mode, turbostat statistics go to stderr.

    In PERIODIC mode, turbostat statistics go to stdout.

    These defaults do not change, but an option "--out file"
    will send all output above only to the specified file.
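
    The routing rules above can be summarised in a short Python sketch. The
    function name and signature are hypothetical; turbostat itself is
    written in C and may structure this differently.

```python
import sys
from typing import Optional, TextIO, Tuple


def pick_streams(fork_mode: bool, out_file: Optional[TextIO] = None) -> Tuple[TextIO, TextIO]:
    """Return (debug_stream, statistics_stream) for the given mode."""
    if out_file is not None:
        # "--out file": everything goes only to the specified file.
        return out_file, out_file
    # Defaults: debug info on stderr; statistics on stderr in FORK mode,
    # on stdout in PERIODIC mode.
    return sys.stderr, (sys.stderr if fork_mode else sys.stdout)
```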

    Signed-off-by: Len Brown

    Len Brown
     
  • some tools processing turbostat output
    have difficulty with items that begin with %...

    Reported-by: Jacob Pan
    Signed-off-by: Len Brown

    Len Brown
     
  • Following changes have been made:
    - changed MSR_NHM_TURBO_RATIO_LIMIT to MSR_TURBO_RATIO_LIMIT in debug print
    for consistency with Developer Manual
    - updated definition of bitfields in MSR_TURBO_RATIO_LIMIT and appropriate
    parsing code
    - added x200 to list of architectures that do not support Nehalem-compatible
    definition of MSR_TURBO_RATIO_LIMIT register (x200 has the register but
    bits definition is custom)
    - fixed typo in code that parses MSR_TURBO_RATIO_LIMIT
    (logical instead of bitwise operator)
    - changed MSR_TURBO_RATIO_LIMIT parsing algorithm so the print out had the
    same order as implementations for other platforms
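
    The effect of a logical operator where a bitwise one was intended can be
    shown with a small Python sketch. The register value and the field
    position below are made up for illustration and are not the x200 layout
    of MSR_TURBO_RATIO_LIMIT.

```python
def c_logical_and(a: int, b: int) -> int:
    """Emulate C's `a && b`, which yields 1 or 0."""
    return 1 if (a != 0 and b != 0) else 0


def ratio_bitwise(msr: int, shift: int) -> int:
    """Intended extraction: (msr >> shift) & 0xFF."""
    return (msr >> shift) & 0xFF


def ratio_logical(msr: int, shift: int) -> int:
    """Buggy extraction: (msr >> shift) && 0xFF."""
    return c_logical_and(msr >> shift, 0xFF)


msr = 0x2A00  # hypothetical register value with 0x2A in bits 8..15
print(ratio_bitwise(msr, 8), ratio_logical(msr, 8))
```

    With the logical operator, any nonzero field collapses to 1, so the
    printed ratio is wrong while the code still compiles cleanly.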

    Signed-off-by: Hubert Chrzaniuk
    Signed-off-by: Len Brown

    Hubert Chrzaniuk
     
  • x200 does not provide any way to programmatically obtain bus clock
    speed. Bclk for the architecture has a fixed value of 100 MHz.
    At the same time x200 cannot be included in has_snb_msrs since
    it does not support C7 idle state.

    Prior to this patch, MHz values reported on this chip
    were erroneously calculated using bclk of 133MHz,
    causing MHz values to be reported 33% higher than actual.
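
    The 33% figure follows directly from the two bclk values, since a
    reported frequency is a ratio multiplied by bclk. A small Python check
    (function names and the ratio of 13 are hypothetical):

```python
def frequency_mhz(ratio: int, bclk_mhz: float) -> float:
    """A reported frequency is the ratio multiplied by the bus clock."""
    return ratio * bclk_mhz


def overreport_percent(assumed_bclk_mhz: float, actual_bclk_mhz: float) -> float:
    """How much too high values are when the wrong bclk is assumed."""
    return 100.0 * (assumed_bclk_mhz / actual_bclk_mhz - 1.0)


print(frequency_mhz(13, 133.0), frequency_mhz(13, 100.0))
print(round(overreport_percent(133.0, 100.0)))
```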

    Signed-off-by: Hubert Chrzaniuk
    Signed-off-by: Len Brown

    Chrzaniuk, Hubert
     
  • turbostat -i interval_sec

    will sample and display statistics every interval_sec.
    interval_sec used to be a whole number of seconds,
    but now we accept a decimal, as small as 0.001 sec (1 ms).
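
    One way to turn such a decimal argument into the seconds/nanoseconds
    pair that sleep-style interfaces expect is sketched below in Python.
    The function name is hypothetical and turbostat's own C parsing may
    differ; the 0.001 sec lower bound is the one stated above.

```python
from decimal import Decimal
from typing import Tuple


def parse_interval(arg: str) -> Tuple[int, int]:
    """Convert an interval in decimal seconds to (tv_sec, tv_nsec)."""
    seconds = Decimal(arg)  # Decimal avoids binary floating-point rounding
    if seconds < Decimal("0.001"):
        raise ValueError("interval must be at least 0.001 sec")
    tv_sec = int(seconds)
    tv_nsec = int((seconds - tv_sec) * 1_000_000_000)
    return tv_sec, tv_nsec
```

    A non-numeric argument raises decimal.InvalidOperation in this sketch.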

    Signed-off-by: Len Brown

    Len Brown
     

03 Mar, 2016

1 commit

  • When building with gcc 6 we're getting various build warnings that just
    require some trivial function declaration and call fixes:

    turbostat.c: In function ‘dump_cstate_pstate_config_info’:
    turbostat.c:1973:1: warning: type of ‘family’ defaults to ‘int’
    dump_cstate_pstate_config_info(family, model)
    turbostat.c:1973:1: warning: type of ‘model’ defaults to ‘int’
    turbostat.c: In function ‘get_tdp’:
    turbostat.c:2145:8: warning: type of ‘model’ defaults to ‘int’
    double get_tdp(model)
    turbostat.c: In function ‘perf_limit_reasons_probe’:
    turbostat.c:2259:6: warning: type of ‘family’ defaults to ‘int’
    void perf_limit_reasons_probe(family, model)
    turbostat.c:2259:6: warning: type of ‘model’ defaults to ‘int’

    Signed-off-by: Colin Ian King
    Cc: Matt Fleming
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-wbicer8n0s9qe6ql8h9x478e@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Colin Ian King
     

17 Feb, 2016

4 commits


21 Jan, 2016

2 commits

  • * pm-tools:
    cpupower: Fix build error in cpufreq-info

    Rafael J. Wysocki
     
  • * acpica:
    ACPICA: Update version to 20160108
    ACPICA: Silence a -Wbad-function-cast warning when acpi_uintptr_t is 'uintptr_t'
    ACPICA: Additional 2016 copyright changes
    ACPICA: Reduce regression fix divergence from upstream ACPICA

    * acpi-video:
    ACPI / video: Add disable_backlight_sysfs_if quirk for the Toshiba Satellite R830
    ACPI / video: Revert "thinkpad_acpi: Use acpi_video_handles_brightness_key_presses()"
    ACPI / video: Document acpi_video_handles_brightness_key_presses() a bit
    ACPI / video: Fix using an uninitialized mutex / list_head in acpi_video_handles_brightness_key_presses()
    ACPI / video: Revert "ACPI / video: driver must be registered before checking for keypresses"
    ACPI / video: Add disable_backlight_sysfs_if quirk for the Toshiba Portege R700

    * acpi-fan:
    ACPI / fan: Improve acpi_device_update_power error message

    Rafael J. Wysocki
     

19 Jan, 2016

1 commit

  • Fix the following build error by including limits.h -

    utils/cpufreq-info.c: In function ‘get_latency’:
    utils/cpufreq-info.c:437:29: error: ‘UINT_MAX’ undeclared (first use in
    this function)
    if (!latency || latency == UINT_MAX) {
    ^
    Signed-off-by: Shreyas B. Prabhu
    Fixes: e98f033f94f3 (cpupower: fix how "cpupower frequency-info" interprets latency)
    Signed-off-by: Rafael J. Wysocki

    Shreyas B. Prabhu
     

16 Jan, 2016

1 commit


12 Jan, 2016

1 commit

  • * pm-sleep:
    PM / sleep: Add support for read-only sysfs attributes

    * pm-tools:
    cpupower: fix how "cpupower frequency-info" interprets latency
    cpupower: rework the "cpupower frequency-info" command
    cpupower: Do not analyse offlined cpus
    cpupower: Provide STATIC variable in Makefile for debug builds
    cpupower: Fix precedence issue

    Rafael J. Wysocki
     

01 Jan, 2016

2 commits


15 Dec, 2015

1 commit

  • This patch adds a userspace tool to access the Linux kernel AML debugger
    interface.

    Two modes are supported by this tool:
    1. Interactive: Users are able to launch a debugging shell to talk with
       the in-kernel AML debugger.
       Note that it is the user's duty to ensure kernel runtime integrity
       when using this debugging tool:
       A. Some control methods evaluated by the users may result in kernel
          panics if those control methods shouldn't be evaluated by the OSPMs
          according to the current BIOS/OS configurations.
       B. Currently if a single stepping evaluation couldn't run to an end,
          then the synchronization primitives acquired by the evaluation may
          block normal OSPM control method evaluations.
    2. Batch: Users are able to execute debugger commands in a script.
       Note that in addition to the above duties, it is the user's duty to
       ensure script runtime integrity when using this debugging tool in
       this mode:
       C. Currently only those commands that are not used for single stepping
          are suitable to be used in this mode.
       D. If the execution of the command may cause a failure that could
          result in an endless kernel execution, the execution of the script
          may also get blocked.
    To exit the utility, currently the "exit/quit" commands are recommended,
    but "ctrl-C" can also be used.

    Signed-off-by: Lv Zheng
    Signed-off-by: Rafael J. Wysocki

    Lv Zheng
     

03 Dec, 2015

2 commits

  • The intel-pstate driver does not support the ondemand governor and does not
    have a valid value in
    /sys/devices/system/cpu/cpu[x]/cpufreq/cpuinfo_transition_latency. The
    intel-pstate driver sets cpuinfo_transition_latency to CPUFREQ_ETERNAL (-1).
    The value written into cpuinfo_transition_latency is defined as an unsigned
    int, so checking the read value against the maximum unsigned int determines
    whether the value is valid.
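
    The validity check amounts to the following, shown here as a Python
    sketch that assumes a 32-bit unsigned int; the function name is
    hypothetical.

```python
UINT_MAX = 2**32 - 1  # CPUFREQ_ETERNAL (-1) as seen through an unsigned int


def latency_is_valid(latency_ns: int) -> bool:
    """A latency of 0 or UINT_MAX means no usable value is available."""
    return latency_ns != 0 and latency_ns != UINT_MAX
```

    A tool can then print a "cannot determine" style message instead of a
    nonsensical latency of roughly 4.29 seconds.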

    Signed-off-by: Jacob Tanenbaum
    Signed-off-by: Thomas Renninger
    Signed-off-by: Rafael J. Wysocki

    Jacob Tanenbaum
     
  • This patch makes two changes to the way that "cpupower
    frequency-info" operates.

    1. Make it so that querying individual values always returns a
    message to the user.

    Currently "cpupower frequency-info" prints nothing when an individual
    value that is queried cannot be returned:

    [root@amd-dinar-09 cpupower]# cpupower -c 4 frequency-info -d
    analyzing CPU 4:
    [root@amd-dinar-09 cpupower]#

    I added messages so that each query prints a message to the terminal

    [root@amd-dinar-09 cpupower]# ./cpupower -c 4 frequency-info -d
    analyzing CPU 4:
    no or unknown cpufreq driver is active on this CPU
    [root@amd-dinar-09 cpupower]#

    (this is just one example)

    2. change debug_output_one() to use the functions already provided
    by cpufreq-info.c to query individual values of interest.

    Signed-off-by: Jacob Tanenbaum
    Signed-off-by: Thomas Renninger
    Signed-off-by: Rafael J. Wysocki

    Jacob Tanenbaum