13 Nov, 2017

1 commit

  • * pm-cpuidle:
    intel_idle: Graceful probe failure when MWAIT is disabled
    cpuidle: Avoid assignment in if () argument
    cpuidle: Clean up cpuidle_enable_device() error handling a bit
    cpuidle: ladder: Add per CPU PM QoS resume latency support
    ARM: cpuidle: Refactor rollback operations if init fails
    ARM: cpuidle: Correct driver unregistration if init fails
    intel_idle: replace conditionals with static_cpu_has(X86_FEATURE_ARAT)
    cpuidle: fix broadcast control when broadcast can not be entered

    Conflicts:
    drivers/idle/intel_idle.c

    Rafael J. Wysocki
     

09 Nov, 2017

1 commit

  • When MWAIT is disabled, intel_idle refuses to probe.
    But it may mislead the user by blaming this on the model number:

    intel_idle: does not run on family 6 model 79

    So defer the check for MWAIT until after the model# white-list check succeeds,
    and if the MWAIT check fails, tell the user how to fix it:

    intel_idle: Please enable MWAIT in BIOS SETUP
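
    The reordered probe can be sketched in user-space C as follows. This is
    only an illustration, not the driver's code: the probe() helper, the
    result enum and the one-entry model list are hypothetical names chosen
    for the example (model 79 is taken from the message quoted above).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's model white-list. */
static const int supported_models[] = { 79 };

enum probe_result {
	PROBE_OK,
	PROBE_BAD_MODEL,	/* model is not on the white-list */
	PROBE_NO_MWAIT		/* model is fine, but MWAIT is disabled */
};

static bool model_is_supported(int model)
{
	size_t i;

	for (i = 0; i < sizeof(supported_models) / sizeof(supported_models[0]); i++)
		if (supported_models[i] == model)
			return true;
	return false;
}

/*
 * Check the model white-list first and MWAIT second, so that a
 * supported model with MWAIT disabled is reported as an MWAIT problem
 * rather than as an unsupported model.
 */
static enum probe_result probe(int model, bool has_mwait)
{
	if (!model_is_supported(model))
		return PROBE_BAD_MODEL;
	if (!has_mwait)
		return PROBE_NO_MWAIT;
	return PROBE_OK;
}
```

    With this ordering, a white-listed model with MWAIT disabled yields
    PROBE_NO_MWAIT, which is the case that maps to the "Please enable MWAIT
    in BIOS SETUP" message.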

    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Len Brown
     

04 Nov, 2017

1 commit

  • This reverts commit 43858b4f25cf0adc5c2ca9cf5ce5fdf2532941e5.

    The reason I removed the leave_mm() calls in question is because the
    heuristic wasn't needed after that patch. With the original version
    of my PCID series, we never flushed a "lazy cpu" (i.e. a CPU running a
    kernel thread) due to a flush on the loaded mm.

    Unfortunately, that caused architectural issues, so now I've
    reinstated these flushes on non-PCID systems in:

    commit b956575bed91 ("x86/mm: Flush more aggressively in lazy TLB mode").

    That, in turn, gives us a power management and occasionally
    performance regression as compared to old kernels: a process that
    goes into a deep idle state on a given CPU and gets its mm flushed
    due to activity on a different CPU will wake the idle CPU.

    Reinstate the old ugly heuristic: a CPU that goes into ACPI C3 or an
    intel_idle state that is likely to cause a TLB flush gets its mm
    switched to init_mm before going idle.
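
    As a rough user-space model of that heuristic (a sketch under stated
    assumptions, not the kernel code): STATE_FLAG_TLB_FLUSHED, struct
    idle_state, struct cpu_ctx and enter_idle() are hypothetical stand-ins,
    and an integer id of 0 plays the role of init_mm.

```c
#include <stdbool.h>

#define STATE_FLAG_TLB_FLUSHED	0x1u	/* hypothetical per-state flag */
#define INIT_MM_ID		0	/* hypothetical id standing in for init_mm */

struct idle_state {
	const char *name;
	unsigned int flags;
};

struct cpu_ctx {
	int active_mm;	/* id of the mm currently loaded on this CPU */
};

/* The heuristic: only states expected to flush the TLB trigger the switch. */
static bool should_leave_mm(const struct idle_state *state)
{
	return (state->flags & STATE_FLAG_TLB_FLUSHED) != 0;
}

static void enter_idle(struct cpu_ctx *cpu, const struct idle_state *state)
{
	if (should_leave_mm(state))
		cpu->active_mm = INIT_MM_ID;	/* drop the user mm before idling */

	/* The actual hardware idle entry would happen here. */
}
```

    A CPU entering a shallow state keeps its mm, while one entering a
    flagged state goes idle on the init_mm stand-in, so a later flush of the
    old mm no longer needs to wake it.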

    FWIW, this heuristic is lousy. Whether we should change CR3 before
    idle isn't a good hint except insofar as the performance hit is a bit
    lower if the TLB is getting flushed by the idle code anyway. What we
    really want to know is whether we anticipate being idle long enough
    that the mm is likely to be flushed before we wake up. This is more a
    matter of the expected latency than the idle state that gets chosen.
    This heuristic also completely fails on systems that don't know
    whether the TLB will be flushed (e.g. AMD systems?). OTOH it may be a
    bit obsolete anyway -- PCID systems don't presently benefit from this
    heuristic at all.

    We also shouldn't do this callback from the innermost bit of the idle code
    due to the RCU nastiness it causes. All the information needed is
    available before rcu_idle_enter() needs to happen.

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 43858b4f25cf "x86/mm: Stop calling leave_mm() in idle code"
    Link: http://lkml.kernel.org/r/c513bbd4e653747213e05bc7062de000bf0202a5.1509793738.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

11 Oct, 2017

1 commit


06 Sep, 2017

1 commit

  • Pull power management updates from Rafael Wysocki:
    "This time (again) cpufreq gets the majority of changes which mostly
    are driver updates (including a major consolidation of intel_pstate),
    some schedutil governor modifications and core cleanups.

    There also are some changes in the system suspend area, mostly related
    to diagnostics and debug messages plus some renames of things related
    to suspend-to-idle. One major change here is that suspend-to-idle is
    now going to be preferred over S3 on systems where the ACPI tables
    indicate to do so and provide requisite support (the Low Power Idle S0
    _DSM in particular). The system sleep documentation and the tools
    related to it are updated too.

    The rest is a few cpuidle changes (nothing major), devfreq updates,
    generic power domains (genpd) framework updates and a few assorted
    modifications elsewhere.

    Specifics:

    - Drop the P-state selection algorithm based on a PID controller from
    intel_pstate and make it use the same P-state selection method
    (based on the CPU load) for all types of systems in the active mode
    (Rafael Wysocki, Srinivas Pandruvada).

    - Rework the cpufreq core and governors to make it possible to take
    cross-CPU utilization updates into account and modify the schedutil
    governor to actually do so (Viresh Kumar).

    - Clean up the handling of transition latency information in the
    cpufreq core and untangle it from the information on which drivers
    cannot do dynamic frequency switching (Viresh Kumar).

    - Add support for new SoCs (MT2701/MT7623 and MT7622) to the mediatek
    cpufreq driver and update its DT bindings (Sean Wang).

    - Modify the cpufreq dt-platdev driver to automatically create
    cpufreq devices for the new (v2) Operating Performance Points (OPP)
    DT bindings and update its whitelist of supported systems (Viresh
    Kumar, Shubhrajyoti Datta, Marc Gonzalez, Khiem Nguyen, Finley
    Xiao).

    - Add support for Ux500 to the cpufreq-dt driver and drop the
    obsolete dbx500 cpufreq driver (Linus Walleij, Arnd Bergmann).

    - Add new SoC (R8A7795) support to the cpufreq rcar driver (Khiem
    Nguyen).

    - Fix and clean up assorted issues in the cpufreq drivers and core
    (Arvind Yadav, Christophe Jaillet, Colin Ian King, Gustavo Silva,
    Julia Lawall, Leonard Crestez, Rob Herring, Sudeep Holla).

    - Update the IO-wait boost handling in the schedutil governor to make
    it less aggressive (Joel Fernandes).

    - Rework system suspend diagnostics to make it print fewer messages
    to the kernel log by default, add a sysfs knob to allow more
    suspend-related messages to be printed and add Low Power S0 Idle
    constraints checks to the ACPI suspend-to-idle code (Rafael
    Wysocki, Srinivas Pandruvada).

    - Prefer suspend-to-idle over S3 on ACPI-based systems with the
    ACPI_FADT_LOW_POWER_S0 flag set and the Low Power Idle S0 _DSM
    interface present in the ACPI tables (Rafael Wysocki).

    - Update documentation related to system sleep and rename a number of
    items in the code to make it clearer that they are related to
    suspend-to-idle (Rafael Wysocki).

    - Export a variable allowing device drivers to check the target
    system sleep state from the core system suspend code (Florian
    Fainelli).

    - Clean up the cpuidle subsystem to handle the polling state on x86
    in a more straightforward way and to use %pOF instead of full_name
    (Rafael Wysocki, Rob Herring).

    - Update the devfreq framework to fix and clean up a few minor issues
    (Chanwoo Choi, Rob Herring).

    - Extend diagnostics in the generic power domains (genpd) framework
    and clean it up slightly (Thara Gopinath, Rob Herring).

    - Fix and clean up a couple of issues in the operating performance
    points (OPP) framework (Viresh Kumar, Waldemar Rymarkiewicz).

    - Add support for RV1108 to the rockchip-io Adaptive Voltage Scaling
    (AVS) driver (David Wu).

    - Fix the usage of notifiers in CPU power management on some
    platforms (Alex Shi).

    - Update the pm-graph system suspend/hibernation and boot profiling
    utility (Todd Brandt).

    - Make it possible to run the cpupower utility without CPU0 (Prarit
    Bhargava)"

    * tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (87 commits)
    cpuidle: Make drivers initialize polling state
    cpuidle: Move polling state initialization code to separate file
    cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START symbol
    cpufreq: imx6q: Fix imx6sx low frequency support
    cpufreq: speedstep-lib: make several arrays static, makes code smaller
    PM: docs: Delete the obsolete states.txt document
    PM: docs: Describe high-level PM strategies and sleep states
    PM / devfreq: Fix memory leak when fail to register device
    PM / devfreq: Add dependency on PM_OPP
    PM / devfreq: Move private devfreq_update_stats() into devfreq
    PM / devfreq: Convert to using %pOF instead of full_name
    PM / AVS: rockchip-io: add io selectors and supplies for RV1108
    cpufreq: ti: Fix 'of_node_put' being called twice in error handling path
    cpufreq: dt-platdev: Drop few entries from whitelist
    cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2
    ARM: ux500: don't select CPUFREQ_DT
    cpuidle: Convert to using %pOF instead of full_name
    cpufreq: Convert to using %pOF instead of full_name
    PM / Domains: Convert to using %pOF instead of full_name
    cpufreq: Cap the default transition delay value to 10 ms
    ...

    Linus Torvalds
     

04 Sep, 2017

1 commit

  • * pm-sleep:
    ACPI / PM: Check low power idle constraints for debug only
    PM / s2idle: Rename platform operations structure
    PM / s2idle: Rename ->enter_freeze to ->enter_s2idle
    PM / s2idle: Rename freeze_state enum and related items
    PM / s2idle: Rename PM_SUSPEND_FREEZE to PM_SUSPEND_TO_IDLE
    ACPI / PM: Prefer suspend-to-idle over S3 on some systems
    platform/x86: intel-hid: Wake up Dell Latitude 7275 from suspend-to-idle
    PM / suspend: Define pr_fmt() in suspend.c
    PM / suspend: Use mem_sleep_labels[] strings in messages
    PM / sleep: Put pm_test under CONFIG_PM_SLEEP_DEBUG
    PM / sleep: Check pm_wakeup_pending() in __device_suspend_noirq()
    PM / core: Add error argument to dpm_show_time()
    PM / core: Split dpm_suspend_noirq() and dpm_resume_noirq()
    PM / s2idle: Rearrange the main suspend-to-idle loop
    PM / timekeeping: Print debug messages when requested
    PM / sleep: Mark suspend/hibernation start and finish
    PM / sleep: Do not print debug messages by default
    PM / suspend: Export pm_suspend_target_state

    Rafael J. Wysocki
     

30 Aug, 2017

1 commit

  • Make the drivers that want to include the polling state into their
    states table initialize it explicitly and drop the initialization of
    it (which in fact is conditional, but that is not obvious from the
    code) from the core.

    Signed-off-by: Rafael J. Wysocki
    Tested-by: Sudeep Holla
    Acked-by: Daniel Lezcano

    Rafael J. Wysocki
     

11 Aug, 2017

1 commit


18 Jul, 2017

1 commit


05 Jul, 2017

1 commit

  • Now that lazy TLB suppresses all flush IPIs (as opposed to all but
    the first), there's no need to leave_mm() when going idle.

    This means we can get rid of the rcuidle hack in
    switch_mm_irqs_off() and we can unexport leave_mm().

    This also removes acpi_unlazy_tlb() from the x86 and ia64 headers,
    since it has no callers any more.

    Signed-off-by: Andy Lutomirski
    Reviewed-by: Nadav Amit
    Reviewed-by: Borislav Petkov
    Reviewed-by: Thomas Gleixner
    Cc: Andrew Morton
    Cc: Arjan van de Ven
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Linus Torvalds
    Cc: Mel Gorman
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/03c699cfd6021e467be650d6b73deaccfe4b4bd7.1498751203.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

30 Jun, 2017

1 commit

  • Remove #define PREFIX and add #define pr_fmt to use more common logging.

    Miscellanea:

    o Add missing newline to format
    o Convert a single printk without KERN_ to pr_info

    Signed-off-by: Joe Perches
    Acked-by: Jacob Pan
    Signed-off-by: Rafael J. Wysocki

    Joe Perches
     

02 May, 2017

1 commit


03 Mar, 2017

1 commit

  • Pull turbostat utility updates from Rafael Wysocki:
    "Power management turbostat utility updates.

    These update turbostat significantly and in particular:

    - default output is now verbose, --debug is no longer required to get
    all counters. As a result, some options have been added to specify
    exactly what output is wanted.

    - added --quiet to skip system configuration output

    - added --list, --show and --hide parameters

    - added --cpu parameter

    - enhanced Baytrail SoC support

    - added Gemini Lake SoC support

    - added sysfs C-state columns

    Also the symbol definitions in arch/x86/include/asm/intel-family.h and
    arch/x86/include/asm/msr-index.h are updated and the intel_idle and
    intel_pstate drivers are modified to use the updated symbols.

    Credits to Len Brown for all of these changes"

    * tag 'pm-turbostat-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (44 commits)
    tools/power turbostat: version 17.02.24
    tools/power turbostat: bugfix: --add u32 was printed as u64
    tools/power turbostat: show error on exec
    tools/power turbostat: dump p-state software config
    tools/power turbostat: show package number, even without --debug
    tools/power turbostat: support "--hide C1" etc.
    tools/power turbostat: move --Package and --processor into the --cpu option
    tools/power turbostat: turbostat.8 update
    tools/power turbostat: update --list feature
    tools/power turbostat: use wide columns to display large numbers
    tools/power turbostat: Add --list option to show available header names
    tools/power turbostat: fix zero IRQ count shown in one-shot command mode
    tools/power turbostat: add --cpu parameter
    tools/power turbostat: print sysfs C-state stats
    tools/power turbostat: extend --add option to accept /sys path
    tools/power turbostat: skip unused counters on BDX
    tools/power turbostat: fix decoding for GLM, DNV, SKX turbo-ratio limits
    tools/power turbostat: skip unused counters on SKX
    tools/power turbostat: Denverton: use HW CC1 counter, skip C3, C7
    tools/power turbostat: initial Gemini Lake SOC support
    ...

    Linus Torvalds
     

01 Mar, 2017

2 commits

  • previously known as MSR_NHM_SNB_PKG_CST_CFG_CTL

    Signed-off-by: Len Brown

    Len Brown
     
  • Cosmetic only -- no functional change in this patch.

    sysfs before:

    state4/desc:MWAIT 0x20
    state4/name:C6-HSW

    sysfs after:

    state4/desc:MWAIT 0x20
    state4/name:C6

    We remove the platform acronyms from the end of the state name
    (-HSW in this case) for three reasons:

    1. more consistency with acpi_idle, which prints C1, C2, C3 etc.

    2. users already know what platform they are on;
    an acronym for the processor code name here
    seems to cause more confusion than clarity.

    3. less clutter in "cpupower monitor" output,
    which truncates the names to 4 columns.

    The precise definition of the state continues to be available in "desc".

    Reported-by: Artem Bityutskiy
    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Len Brown
     

14 Dec, 2016

1 commit

  • Pull power management updates from Rafael Wysocki:
    "Again, cpufreq gets more changes than the other parts this time (one
    new driver, one old driver less, a bunch of enhancements of the
    existing code, new CPU IDs, fixes, cleanups)

    There also are some changes in cpuidle (idle injection rework, a
    couple of new CPU IDs, online/offline rework in intel_idle, fixes and
    cleanups), in the generic power domains framework (mostly related to
    supporting power domains containing CPUs), and in the Operating
    Performance Points (OPP) library (mostly related to supporting devices
    with multiple voltage regulators)

    In addition to that, the system sleep state selection interface is
    modified to make it easier for distributions with unchanged user space
    to support suspend-to-idle as the default system suspend method, some
    issues are fixed in the PM core, the latency tolerance PM QoS
    framework is improved a bit, the Intel RAPL power capping driver is
    cleaned up and there are some fixes and cleanups in the devfreq
    subsystem

    Specifics:

    - New cpufreq driver for Broadcom STB SoCs and a Device Tree binding
    for it (Markus Mayer)

    - Support for ARM Integrator/AP and Integrator/CP in the generic DT
    cpufreq driver and elimination of the old Integrator cpufreq driver
    (Linus Walleij)

    - Support for the zx296718, r8a7743 and r8a7745, Socionext UniPhier,
    and PXA SoCs in the generic DT cpufreq driver (Baoyou Xie,
    Geert Uytterhoeven, Masahiro Yamada, Robert Jarzmik)

    - cpufreq core fix to eliminate races that may lead to using inactive
    policy objects and related cleanups (Rafael Wysocki)

    - cpufreq schedutil governor update to make it use SCHED_FIFO kernel
    threads (instead of regular workqueues) for doing delayed work (to
    reduce the response latency in some cases) and related cleanups
    (Viresh Kumar)

    - New cpufreq sysfs attribute for resetting statistics (Markus Mayer)

    - cpufreq governors fixes and cleanups (Chen Yu, Stratos Karafotis,
    Viresh Kumar)

    - Support for using generic cpufreq governors in the intel_pstate
    driver (Rafael Wysocki)

    - Support for per-logical-CPU P-state limits and the EPP/EPB (Energy
    Performance Preference/Energy Performance Bias) knobs in the
    intel_pstate driver (Srinivas Pandruvada)

    - New CPU ID for Knights Mill in intel_pstate (Piotr Luc)

    - intel_pstate driver modification to use the P-state selection
    algorithm based on CPU load on platforms with the system profile in
    the ACPI tables set to "mobile" (Srinivas Pandruvada)

    - intel_pstate driver cleanups (Arnd Bergmann, Rafael Wysocki,
    Srinivas Pandruvada)

    - cpufreq powernv driver updates including fast switching support
    (for the schedutil governor), fixes and cleanups (Akshay Adiga,
    Andrew Donnellan, Denis Kirjanov)

    - acpi-cpufreq driver rework to switch it over to the new CPU
    offline/online state machine (Sebastian Andrzej Siewior)

    - Assorted cleanups in cpufreq drivers (Wei Yongjun, Prashanth
    Prakash)

    - Idle injection rework (to make it use the regular idle path instead
    of a home-grown custom one) and related powerclamp thermal driver
    updates (Peter Zijlstra, Jacob Pan, Petr Mladek, Sebastian Andrzej
    Siewior)

    - New CPU IDs for Atom Z34xx and Knights Mill in intel_idle (Andy
    Shevchenko, Piotr Luc)

    - intel_idle driver cleanups and switch over to using the new CPU
    offline/online state machine (Anna-Maria Gleixner, Sebastian
    Andrzej Siewior)

    - cpuidle DT driver update to support suspend-to-idle properly
    (Sudeep Holla)

    - cpuidle core cleanups and misc updates (Daniel Lezcano, Pan Bian,
    Rafael Wysocki)

    - Preliminary support for power domains including CPUs in the generic
    power domains (genpd) framework and related DT bindings (Lina Iyer)

    - Assorted fixes and cleanups in the generic power domains (genpd)
    framework (Colin Ian King, Dan Carpenter, Geert Uytterhoeven)

    - Preliminary support for devices with multiple voltage regulators
    and related fixes and cleanups in the Operating Performance Points
    (OPP) library (Viresh Kumar, Masahiro Yamada, Stephen Boyd)

    - System sleep state selection interface rework to make it easier to
    support suspend-to-idle as the default system suspend method
    (Rafael Wysocki)

    - PM core fixes and cleanups, mostly related to the interactions
    between the system suspend and runtime PM frameworks (Ulf Hansson,
    Sahitya Tummala, Tony Lindgren)

    - Latency tolerance PM QoS framework improvements (Andrew Lutomirski)

    - New Knights Mill CPU ID for the Intel RAPL power capping driver
    (Piotr Luc)

    - Intel RAPL power capping driver fixes, cleanups and switch over to
    using the new CPU offline/online state machine (Jacob Pan, Thomas
    Gleixner, Sebastian Andrzej Siewior)

    - Fixes and cleanups in the exynos-ppmu, exynos-nocp, rk3399_dmc,
    rockchip-dfi devfreq drivers and the devfreq core (Axel Lin,
    Chanwoo Choi, Javier Martinez Canillas, MyungJoo Ham, Viresh Kumar)

    - Fix for false-positive KASAN warnings during resume from ACPI S3
    (suspend-to-RAM) on x86 (Josh Poimboeuf)

    - Memory map verification during resume from hibernation on x86 to
    ensure a consistent address space layout (Chen Yu)

    - Wakeup sources debugging enhancement (Xing Wei)

    - rockchip-io AVS driver cleanup (Shawn Lin)"

    * tag 'pm-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (127 commits)
    devfreq: rk3399_dmc: Don't use OPP structures outside of RCU locks
    devfreq: rk3399_dmc: Remove dangling rcu_read_unlock()
    devfreq: exynos: Don't use OPP structures outside of RCU locks
    Documentation: intel_pstate: Document HWP energy/performance hints
    cpufreq: intel_pstate: Support for energy performance hints with HWP
    cpufreq: intel_pstate: Add locking around HWP requests
    PM / sleep: Print active wakeup sources when blocking on wakeup_count reads
    PM / core: Fix bug in the error handling of async suspend
    PM / wakeirq: Fix dedicated wakeirq for drivers not using autosuspend
    PM / Domains: Fix compatible for domain idle state
    PM / OPP: Don't WARN on multiple calls to dev_pm_opp_set_regulators()
    PM / OPP: Allow platform specific custom set_opp() callbacks
    PM / OPP: Separate out _generic_set_opp()
    PM / OPP: Add infrastructure to manage multiple regulators
    PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage()
    PM / OPP: Manage supply's voltage/current in a separate structure
    PM / OPP: Don't use OPP structure outside of rcu protected section
    PM / OPP: Reword binding supporting multiple regulators per device
    PM / OPP: Fix incorrect cpu-supply property in binding
    cpuidle: Add a kerneldoc comment to cpuidle_use_deepest_state()
    ...

    Linus Torvalds
     

02 Dec, 2016

2 commits

  • Install the callbacks via the state machine and let the core invoke the
    callbacks on the already online CPUs.

    The two smp_call_function_single() invocations in intel_idle_cpu_init() have
    been removed because intel_idle_cpu_init() is now invoked via the hotplug
    callback which runs on the target CPU. The IRQ-off calling convention for
    auto_demotion_disable() and c1e_promotion_disable() has not been preserved
    because only those two modify the MSR during CPU initialization.

    Signed-off-by: Sebastian Andrzej Siewior
    Acked-by: Jacob Pan
    Signed-off-by: Rafael J. Wysocki

    Sebastian Andrzej Siewior
     
  • Since commit 1cf4f629d9d2 ("cpu/hotplug: Move online calls to
    hotplugged cpu") the CPU_ONLINE and CPU_DOWN_PREPARE notifiers are
    always run on the hot plugged CPU, and as of commit 3b9d6da67e11
    ("cpu/hotplug: Fix rollback during error-out in __cpu_disable()") the
    CPU_DOWN_FAILED notifier also runs on the hot plugged CPU. This patch
    converts the SMP functional calls into direct calls.

    smp_call_function_single() executes the function with interrupts
    disabled. This calling convention is not preserved, because
    tick_broadcast_enable() and tick_broadcast_disable() handle
    interrupts themselves.

    Signed-off-by: Anna-Maria Gleixner
    Signed-off-by: Sebastian Andrzej Siewior
    Acked-by: Jacob Pan
    Signed-off-by: Rafael J. Wysocki

    Anna-Maria Gleixner
     

01 Dec, 2016

2 commits


18 Nov, 2016

1 commit

  • In preparation for removing the idle_notifier, remove its only user, the
    i7300_idle driver.

    i7300_idle was deployed in 2008 to reduce idle memory power on systems
    using the i7300 chipset. The driver worked by throttling the
    fully-buffered DIMMs during idle periods using the IOAT DMA engine.

    The driver ran only on the i7300 chip-set, and no other hardware has used
    this mechanism. The driver no longer has a maintainer.

    Removing this driver will increase idle power on i7300 systems when they
    run the new kernel without the driver.

    Signed-off-by: Len Brown
    Acked-by: Ingo Molnar
    Acked-by: Peter Zijlstra (Intel)
    Link: http://lkml.kernel.org/r/ad6a044e57cc75f44cc8621abe846e58f7882243.1479449716.git.len.brown@intel.com
    Signed-off-by: Thomas Gleixner

    Len Brown
     

08 Oct, 2016

1 commit

  • When doing an nmi backtrace of many cores, most of which are idle, the
    output is a little overwhelming and very uninformative. Suppress
    messages for cpus that are idling when they are interrupted and just
    emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

    We do this by grouping all the cpuidle code together into a new
    .cpuidle.text section, and then checking the address of the interrupted
    PC to see if it lies within that section.
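
    The address test itself reduces to a half-open range check. A minimal
    sketch, assuming hypothetical names (struct text_section and
    pc_in_idle_section() are invented for this example and the bounds would
    come from the linker in a real kernel):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a text section: [start, end). */
struct text_section {
	uintptr_t start;
	uintptr_t end;	/* first address past the section */
};

/*
 * Return true when the interrupted PC lies inside the idle-code
 * section, i.e. the CPU was idling and its backtrace can be skipped.
 */
static bool pc_in_idle_section(const struct text_section *sec, uintptr_t pc)
{
	return pc >= sec->start && pc < sec->end;
}
```

    Grouping all idle routines into one section is what makes a single
    range comparison sufficient, instead of a list of per-function checks.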

    This commit suitably tags x86 and tile idle routines, and only adds in
    the minimal framework for other architectures.

    Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
    Signed-off-by: Chris Metcalf
    Acked-by: Peter Zijlstra (Intel)
    Tested-by: Peter Zijlstra (Intel)
    Tested-by: Daniel Thompson [arm]
    Tested-by: Petr Mladek
    Cc: Aaron Tomlin
    Cc: Peter Zijlstra (Intel)
    Cc: "Rafael J. Wysocki"
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     

31 Jul, 2016

1 commit

  • Pull x86 cpufeature updates from Thomas Gleixner:

    - a workaround for the MONITOR instruction erratum of Goldmont CPUs

    - small fixes and cleanups here and there

    * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/cpu: Add workaround for MONITOR instruction erratum on Goldmont based CPUs
    x86/cpu: Rename "WESTMERE2" family to "NEHALEM_G"
    x86/amd_nb: Clean up init path
    x86/cpufeature: Add helper macro for mask check macros
    x86/cpufeature: Make sure DISABLED/REQUIRED macros are updated
    x86/cpufeature: Update cpufeaure macros

    Linus Torvalds
     

09 Jul, 2016

2 commits

  • Commit 5dcef69486 ("intel_idle: add BXT support") added an 8-element
    lookup array with just a 2-bit value used for lookups. As per the SDM
    that bit field is really 3 bits wide. While this is supposedly benign
    here, future re-use of the code for other CPUs might expose the issue.
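
    A sketch of the decode with the full 3-bit index, written as plain
    user-space C. The field layout (a 10-bit time value in bits 9:0 and a
    unit selector in bits 12:10) and the unit table values are assumptions
    made for this example, and irtl_to_usec() is a hypothetical helper name:

```c
#include <stdint.h>

/* Assumed unit table, in nanoseconds; the last two entries are unused. */
static const uint64_t irtl_ns_units[8] = {
	1, 32, 1024, 32768, 1048576, 33554432, 0, 0
};

/*
 * Convert a raw IRTL value to microseconds. The unit selector is
 * masked with 0x7 (3 bits), so all 8 table entries are reachable; a
 * 2-bit mask (0x3) would silently alias selectors 4..7 onto 0..3.
 */
static uint64_t irtl_to_usec(uint64_t irtl)
{
	uint64_t value = irtl & 0x3FF;
	uint64_t unit_ns = irtl_ns_units[(irtl >> 10) & 0x7];

	return value * unit_ns / 1000;
}
```

    For example, a selector of 4 with a time value of 1 decodes to 1048 us
    here, whereas a 2-bit mask would have picked the 1 ns unit and returned 0.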

    Signed-off-by: Jan Beulich
    Signed-off-by: Rafael J. Wysocki

    Jan Beulich
     
  • Since irtl_ns_units[] has itself zero entries, make sure the caller
    recognizes those cases along with the MSR read returning zero, as zero
    is not a valid value for exit_latency and target_residency.

    Signed-off-by: Jan Beulich
    Signed-off-by: Rafael J. Wysocki

    Jan Beulich
     

01 Jul, 2016

1 commit

  • Len Brown noticed something was amiss in our INTEL_FAM6_*
    definitions. It seems like model 0x1F was a Nehalem part,
    marketed as "Intel Core i7 and i5 Processors" (according to the
    SDM). But, although it was a Nehalem, 0x1F had some uncore events
    which were shared with Westmere.

    Len also mentioned he thought it was called "Havendale", which
    Wikipedia says was graphics-oriented and canceled:

    https://en.wikipedia.org/wiki/Nehalem_(microarchitecture)

    So either way, it's probably not important what we call it, but
    call it Nehalem to be accurate, and add a "G" since it seems
    graphics-related. If it were canceled that would be a good reason
    why it's so sparsely and inconsistently referred to in the code.

    Signed-off-by: Dave Hansen
    Cc: Dave Hansen
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20160629192737.949C41A8@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar

    Dave Hansen
     

23 Jun, 2016

2 commits

  • Denverton is an Intel Atom based micro server which shares the same
    Goldmont architecture as Broxton. The C-states available on
    Denverton are a subset of Broxton's: only C1, C1e, and C6.

    Signed-off-by: Jacob Pan
    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Jacob Pan
     
  • The Kconfig for this driver is currently declared with:

    config INTEL_IDLE
    bool "Cpuidle Driver for Intel Processors"

    ...meaning that it currently is not being built as a module by anyone.

    This was done in commit 6ce9cd8669fa1195fdc21643370e34523c7ac988
    ("intel_idle: disable module support") since "...the module capability
    is causing more trouble than it is worth."

    This was done over 5y ago, and Daniel adds that:

    ...the modular support has been removed from almost all the cpuidle
    drivers and the cpuidle framework is no longer assuming driver could
    be unloaded.

    Removing the modular dead code in the driver makes sense as this
    is what has been done in the other drivers.

    So let's remove the modular code that is essentially orphaned, so that
    when reading the driver there is no doubt it is builtin-only.

    Since module_init translates to device_initcall in the non-modular
    case, the init ordering remains unchanged with this commit. At a
    later date we might want to consider whether subsys_init or another
    init category seems more appropriate than device_init.

    We replace module.h with moduleparam.h since the file does declare
    some module parameters, and leaving them as such is currently the
    easiest way to remain compatible with existing boot arg use cases.

    Note that MODULE_DEVICE_TABLE is a no-op for non-modular code.

    Also note that we can't remove intel_idle_cpuidle_devices_uninit() as
    that is still used for unwind purposes if the init fails.

    We also delete the MODULE_LICENSE tag etc. since all that information
    is already contained at the top of the file in the comments.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Paul Gortmaker
     

08 Jun, 2016

1 commit

  • Use the new INTEL_FAM6_* macros for intel_idle.c. Also fix up
    some of the macros to be consistent with how some of the
    intel_idle code refers to the model.

    There's one oddity here: model 0x1F is uniquely referred to here
    and nowhere else that I could find. 0x1E/0x1F are just spelled
    out as "Intel Core i7 and i5 Processors" in the SDM or as "Intel
    processors based on the Nehalem, Westmere microarchitectures" in
    the RDPMC section. Comments between tables 19-19 and 19-20 in
    the SDM seem to point to 0x1F being some kind of Westmere, so
    let's call it "WESTMERE2".

    Signed-off-by: Dave Hansen
    Acked-by: Rafael J. Wysocki
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: jacob.jun.pan@intel.com
    Cc: linux-pm@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160603001932.EE978EB9@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar

    Dave Hansen
     

09 Apr, 2016

1 commit

  • Broxton has all the HSW C-states, except C3.
    BXT C-state timing is slightly different.

    Here we trust the IRTL MSRs as authority
    on maximum C-state latency, and override the driver's tables
    with the values found in the associated IRTL MSRs.
    Further we set the target_residency to 1x maximum latency,
    trusting the hardware demotion logic.
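
    The table override described above might look roughly like this in
    isolation. It is a sketch, not the driver's code: struct cstate and
    apply_irtl_usec() are hypothetical, and treating a zero reading as
    "keep the table default" is an assumption of the example.

```c
/* Hypothetical, trimmed-down view of one C-state table entry. */
struct cstate {
	unsigned int exit_latency;	/* microseconds */
	unsigned int target_residency;	/* microseconds */
};

/*
 * Override the built-in numbers with the latency reported by the
 * hardware, and set target_residency to 1x that latency. A zero
 * reading is treated as "nothing usable" and leaves the entry alone.
 */
static void apply_irtl_usec(struct cstate *state, unsigned int irtl_usec)
{
	if (irtl_usec == 0)
		return;

	state->exit_latency = irtl_usec;
	state->target_residency = irtl_usec;
}
```

    Using the same number for both fields is what "1x maximum latency"
    amounts to: the governor is told the state pays off as soon as the idle
    period covers the exit latency, leaving finer decisions to the hardware
    demotion logic.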

    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Len Brown
     

08 Apr, 2016

10 commits

  • KBL is similar to SKL

    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Len Brown
     
  • SKX is similar to BDX

    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Len Brown
     
  • This driver registers cpuidle devices when a CPU comes online, but it
    leaves the registrations in place when a CPU goes offline. The module
    exit code only unregisters the currently online CPUs, leaving the
    devices for offline CPUs dangling.

    This patch changes the driver to clean up all registrations on exit,
    even those from CPUs that are offline.

    Signed-off-by: Richard Cochran
    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Richard Cochran
     
  • If a cpuidle registration error occurs during the hot plug notifier
    callback, we should really inform the hot plug machinery instead of
    just ignoring the error. This patch changes the callback to properly
    return on error.

    Signed-off-by: Richard Cochran
    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Richard Cochran
     
  • The helper function, intel_idle_cpu_init, registers one new device
    with the cpuidle layer. If the registration should fail, that
    function immediately calls intel_idle_cpuidle_devices_uninit() to
    unregister every last CPU's device. However, it makes no sense to do
    so, when called from the hot plug notifier callback.

    This patch moves the call to intel_idle_cpuidle_devices_uninit()
    outside of the helper function to the one call site that actually
    needs to perform the de-registrations.

    Signed-off-by: Richard Cochran
    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Richard Cochran
     
  • This driver sets the broadcast tick quite early on during probe and does
    not clean up again in case of failure. This patch moves the setup call
    after the registration, placing the on_each_cpu() calls within the global
    CPU lock region.

    Signed-off-by: Richard Cochran
    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Richard Cochran
     
  • The helper function, intel_idle_cpuidle_devices_uninit, frees the
    globally allocated per-CPU data. However, this function is invoked
    from the hot plug notifier callback at a time when freeing that data
    is not safe.

    If the call to cpuidle_register_driver() should fail (say, due to lack
    of memory), then the driver will free its per-CPU region. On the
    *next* CPU_ONLINE event, the driver will happily use the region again
    and even free it again if the failure repeats.

    This patch fixes the issue by moving the call to free_percpu() outside
    of the helper function at the two call sites that actually need to
    free the per-CPU data.

    Signed-off-by: Richard Cochran
    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Richard Cochran
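
The ownership rule this commit describes can be sketched in plain C. The names below (`cpu_registered`, `devices_uninit`, `cpu_online`, `driver_init`, `driver_exit`) are hypothetical stand-ins, and `calloc()`/`free()` stand in for the kernel's per-CPU allocator; the point is only that the unregister helper never frees, so the hot plug path cannot leave a dangling region behind.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the driver's globally allocated per-CPU data. */
static int *cpu_registered;

/* Unregisters every CPU's device but leaves the allocation alone. */
static void devices_uninit(void)
{
    assert(cpu_registered != NULL);
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        cpu_registered[cpu] = 0;
}

/* Hot plug path: report a failure, never free shared data from here. */
static int cpu_online(int cpu, int simulate_failure)
{
    if (simulate_failure)
        return -1;
    cpu_registered[cpu] = 1;
    return 0;
}

/* Init owns the allocation. */
static int driver_init(void)
{
    cpu_registered = calloc(NR_CPUS, sizeof(*cpu_registered));
    return cpu_registered ? 0 : -1;
}

/* Exit is a call site that really does need to free the data. */
static void driver_exit(void)
{
    devices_uninit();
    free(cpu_registered);
    cpu_registered = NULL;
}
```

A failed `cpu_online()` leaves the region intact, so a later online event for the same CPU can still use it safely.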
     
  • In the module_init() method, if the per-CPU allocation fails, then the
    active cpuidle registration is not cleaned up. This patch fixes the
    issue by attempting the allocation before registration, and then
    cleaning it up again on registration failure.

    Signed-off-by: Richard Cochran
    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Richard Cochran
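
A rough user-space sketch of the "allocate first, then register" ordering is shown below. Everything here is illustrative: `demo_init`, the failure flags, and the error codes chosen are assumptions, and `calloc()` stands in for the per-CPU allocation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static int *percpu_region;      /* hypothetical per-CPU data */
static bool driver_registered;  /* hypothetical registration state */

/*
 * Allocate before registering. If allocation fails, nothing has been
 * registered yet, so there is nothing to unwind. If registration fails,
 * the only thing to undo is the allocation.
 */
static int demo_init(bool alloc_fails, bool register_fails)
{
    percpu_region = alloc_fails ? NULL : calloc(4, sizeof(*percpu_region));
    if (!percpu_region)
        return -ENOMEM;

    if (register_fails) {
        free(percpu_region);
        percpu_region = NULL;
        return -EBUSY;
    }

    driver_registered = true;
    return 0;
}
```

Either failure path returns with no registration active and no memory held, which is the property the reordering is meant to give.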
     
  • In the module_exit() method, this driver first frees its per-CPU
    pointer, then unregisters a callback making use of the pointer.
    Furthermore, the function, intel_idle_cpuidle_devices_uninit, is racy
    against CPU hot plugging as it calls for_each_online_cpu().

    This patch corrects the issues by unregistering first on the exit path
    while holding the hot plug lock.

    Signed-off-by: Richard Cochran
    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Richard Cochran
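
The exit ordering can be sketched the same way. In this illustration the "hot plug lock" is modeled by a simple flag and the callback by a plain function; `demo_init`, `demo_exit`, `hotplug_callback`, and the lock helpers are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

static int *percpu_data;         /* hypothetical per-CPU pointer */
static int callback_registered;  /* is the hot plug callback live? */
static int hotplug_locked;       /* toy model of the hot plug lock */

static void hotplug_lock(void)   { assert(!hotplug_locked); hotplug_locked = 1; }
static void hotplug_unlock(void) { assert(hotplug_locked);  hotplug_locked = 0; }

/* The callback dereferences percpu_data, so it must never outlive it. */
static int hotplug_callback(int cpu)
{
    if (!callback_registered)
        return 0;
    assert(percpu_data != NULL);
    percpu_data[cpu] = 1;
    return 0;
}

static int demo_init(int ncpus)
{
    percpu_data = calloc((size_t)ncpus, sizeof(*percpu_data));
    if (!percpu_data)
        return -1;
    callback_registered = 1;
    return 0;
}

/* Unregister under the lock first; free the data only afterwards. */
static void demo_exit(void)
{
    hotplug_lock();
    callback_registered = 0;
    hotplug_unlock();

    free(percpu_data);
    percpu_data = NULL;
}
```

Reversing the two steps in `demo_exit()` would let a late callback touch freed memory, which is the bug the commit describes.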
     
  • The function, intel_idle_cpuidle_driver_init, makes calls on each CPU
    to auto_demotion_disable() and c1e_promotion_disable(). These calls
    are redundant, as intel_idle_cpu_init() does the same calls just a bit
    later on. They are also premature, as the driver registration may yet
    fail.

    This patch removes the redundant code.

    Signed-off-by: Richard Cochran
    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Richard Cochran