15 Sep, 2020

1 commit

  • This driver does not restore stop > 3 state, so it limits itself
    to states which do not lose full state or TB.

    The POWER10 SPRs are sufficiently different from P9 that it seems
    easier to split out the P10 code. The POWER10 deep sleep code
    (e.g., the BHRB restore) has been taken out, but it can be re-added
    when stop > 3 support is added.

    Signed-off-by: Nicholas Piggin
    Tested-by: Pratik Rajesh Sampat
    Tested-by: Vaidyanathan Srinivasan
    Reviewed-by: Pratik Rajesh Sampat
    Reviewed-by: Gautham R. Shenoy
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200819094700.493399-1-npiggin@gmail.com

    Nicholas Piggin
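
A minimal sketch of the state filtering this commit describes, assuming hypothetical flag bits (`SKETCH_LOSE_FULL_CONTEXT`, `SKETCH_TIMEBASE_STOP`) in place of the kernel's real OPAL_PM_* definitions. It only illustrates the rule "skip states that lose full state or TB" and is not the driver code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the kernel's OPAL_PM_* values. */
#define SKETCH_LOSE_FULL_CONTEXT (1u << 0)
#define SKETCH_TIMEBASE_STOP     (1u << 1)

/* A state is usable only if it loses neither full context nor the timebase. */
static bool p10_state_usable(uint32_t flags)
{
    return (flags & (SKETCH_LOSE_FULL_CONTEXT | SKETCH_TIMEBASE_STOP)) == 0;
}
```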
     

15 Jul, 2020

1 commit

  • Commit 1961acad2f88559c2cdd2ef67c58c3627f1f6e54 removes usage of
    function "validate_dt_prop_sizes". This patch removes this unused
    function.

    Signed-off-by: Abhishek Goel
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200706053258.121475-1-huntbag@linux.vnet.ibm.com

    Abhishek Goel
     

06 Nov, 2019

1 commit

  • There are two reasons why CPU idle states may be disabled: either
    because the driver has disabled them or because they have been
    disabled by user space via sysfs.

    In the former case, the state's "disabled" flag is set once during
    the initialization of the driver and it is never cleared later (it
    is read-only effectively). In the latter case, the "disable" field
    of the given state's cpuidle_state_usage struct is set and it may be
    changed via sysfs. Thus checking whether or not an idle state has
    been disabled involves reading these two flags every time.

    In order to avoid the additional check of the state's "disabled" flag
    (which is effectively read-only anyway), use the value of it at the
    init time to set a (new) flag in the "disable" field of that state's
    cpuidle_state_usage structure and use the sysfs interface to
    manipulate another (new) flag in it. This way the state is disabled
    whenever the "disable" field of its cpuidle_state_usage structure is
    nonzero, whatever the reason, and it is the only place to look into
    to check whether or not the state has been disabled.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Daniel Lezcano
    Acked-by: Peter Zijlstra (Intel)

    Rafael J. Wysocki
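
The single-field scheme described in this commit can be sketched roughly as follows. The flag and function names are hypothetical and chosen for this illustration; the point is only that the driver's decision and the user's decision occupy separate bits of one "disable" word, so one comparison against zero answers the question:

```c
#include <stdbool.h>

/* Hypothetical flag names for this sketch; one bit per reason. */
#define SKETCH_DISABLED_BY_USER   (1ULL << 0)
#define SKETCH_DISABLED_BY_DRIVER (1ULL << 1)

struct sketch_state_usage {
    unsigned long long disable; /* nonzero means "do not use this state" */
};

/* Init time: copy the driver's read-only decision into the usage field. */
static void usage_init(struct sketch_state_usage *u, bool driver_disabled)
{
    u->disable = driver_disabled ? SKETCH_DISABLED_BY_DRIVER : 0;
}

/* sysfs write path: only the user bit is touched. */
static void usage_set_user_disable(struct sketch_state_usage *u, bool disable)
{
    if (disable)
        u->disable |= SKETCH_DISABLED_BY_USER;
    else
        u->disable &= ~SKETCH_DISABLED_BY_USER;
}

/* Hot path: a single check covers both reasons. */
static bool state_is_disabled(const struct sketch_state_usage *u)
{
    return u->disable != 0;
}
```

Note that clearing the user bit does not re-enable a state the driver disabled, which matches the "effectively read-only" behavior described above.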
     

31 Jul, 2018

1 commit


05 Jun, 2018

1 commit

  • The commit 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of
    snooze to deeper idle state") introduced a timeout for the snooze idle
    state so that it could eventually be promoted to a deeper idle
    state. The snooze timeout value is static and set to the target
    residency of the next idle state, which would train the cpuidle
    governor to pick the next idle state eventually.

    The unfortunate side-effect of this is that if the next idle state(s)
    is disabled, the CPU will forever remain in snooze, despite the fact
    that the system is completely idle, and other deeper idle states are
    available.

    This patch fixes the issue by dynamically setting the snooze timeout
    to the target residency of the next enabled state on the device.

    Before Patch:
    POWER8 : Only nap disabled.
    $ cpupower monitor sleep 30
    sleep took 30.01297 seconds and exited with status 0
    |Idle_Stats
    PKG |CORE|CPU | snoo | Nap | Fast
    0| 8| 0| 96.41| 0.00| 0.00
    0| 8| 1| 96.43| 0.00| 0.00
    0| 8| 2| 96.47| 0.00| 0.00
    0| 8| 3| 96.35| 0.00| 0.00
    0| 8| 4| 96.37| 0.00| 0.00
    0| 8| 5| 96.37| 0.00| 0.00
    0| 8| 6| 96.47| 0.00| 0.00
    0| 8| 7| 96.47| 0.00| 0.00

    POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1,
    stop2) disabled:
    $ cpupower monitor sleep 30
    sleep took 30.05033 seconds and exited with status 0
    |Idle_Stats
    PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop
    0| 16| 0| 89.79| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00
    0| 16| 1| 90.12| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00
    0| 16| 2| 90.21| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00
    0| 16| 3| 90.29| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00

    After Patch:
    POWER8 : Only nap disabled.
    $ cpupower monitor sleep 30
    sleep took 30.01200 seconds and exited with status 0
    |Idle_Stats
    PKG |CORE|CPU | snoo | Nap | Fast
    0| 8| 0| 16.58| 0.00| 77.21
    0| 8| 1| 18.42| 0.00| 75.38
    0| 8| 2| 4.70| 0.00| 94.09
    0| 8| 3| 17.06| 0.00| 81.73
    0| 8| 4| 3.06| 0.00| 95.73
    0| 8| 5| 7.00| 0.00| 96.80
    0| 8| 6| 1.00| 0.00| 98.79
    0| 8| 7| 5.62| 0.00| 94.17

    POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1,
    stop2) disabled:

    $ cpupower monitor sleep 30
    sleep took 30.02110 seconds and exited with status 0
    |Idle_Stats
    PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop
    0| 0| 0| 0.69| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 9.39| 89.70
    0| 0| 1| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.05| 93.21
    0| 0| 2| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 89.93
    0| 0| 3| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 93.26

    Fixes: 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state")
    Cc: stable@vger.kernel.org # v4.2+
    Signed-off-by: Gautham R. Shenoy
    Reviewed-by: Balbir Singh
    Signed-off-by: Michael Ellerman

    Gautham R. Shenoy
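
A rough sketch of the dynamic timeout selection described in this commit. The struct layout, the function name, and the caller-supplied fallback are assumptions made for illustration; index 0 is taken to be snooze:

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_idle_state {
    uint32_t target_residency_us;
    bool disabled;
};

/*
 * Walk the states deeper than snooze (index 0) and return the target
 * residency of the first one that is enabled. If none is enabled,
 * return the caller-supplied fallback.
 */
static uint64_t snooze_timeout_us(const struct sketch_idle_state *states,
                                  int nr_states, uint64_t fallback_us)
{
    for (int i = 1; i < nr_states; i++) {
        if (states[i].disabled)
            continue;
        return states[i].target_residency_us;
    }
    return fallback_us;
}
```

With a static timeout, a disabled next state would leave the CPU looping in snooze; skipping disabled entries is what lets the governor reach the deeper enabled state.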
     

18 Jan, 2018

1 commit


17 Nov, 2017

1 commit

  • Pull powerpc updates from Michael Ellerman:
    "A bit of a small release, I suspect in part due to me travelling for
    KS. But my backlog of patches to review is smaller than usual, so I
    think in part folks just didn't send as much this cycle.

    Non-highlights:

    - Five fixes for the >128T address space handling, both to fix bugs
    in our implementation and to bring the semantics exactly into line
    with x86.

    Highlights:

    - Support for a new OPAL call on bare metal machines which gives us a
    true NMI (ie. is not masked by MSR[EE]=0) for debugging etc.

    - Support for Power9 DD2 in the CXL driver.

    - Improvements to machine check handling so that uncorrectable errors
    can be reported into the generic memory_failure() machinery.

    - Some fixes and improvements for VPHN, which is used under PowerVM
    to notify the Linux partition of topology changes.

    - Plumbing to enable TM (transactional memory) without suspend on
    some Power9 processors (PPC_FEATURE2_HTM_NO_SUSPEND).

    - Support for emulating vector loads from cache-inhibited memory, on
    some Power9 revisions.

    - Disable the fast-endian switch "syscall" by default (behind a
    CONFIG), we believe it has never had any users.

    - A major rework of the API drivers use when initiating and waiting
    for long running operations performed by OPAL firmware, and changes
    to the powernv_flash driver to use the new API.

    - Several fixes for the handling of FP/VMX/VSX while processes are
    using transactional memory.

    - Optimisations of TLB range flushes when using the radix MMU on
    Power9.

    - Improvements to the VAS facility used to access coprocessors on
    Power9, and related improvements to the way the NX crypto driver
    handles requests.

    - Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit.

    Thanks to: Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew
    Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin
    Herrenschmidt, Breno Leitao, Christophe Leroy, Christophe Lombard,
    Cyril Bur, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven,
    Guilherme G. Piccoli, Gustavo Romero, Haren Myneni, Joel Stanley,
    Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami Hiramatsu,
    Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao,
    Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia
    Franco de Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee,
    Shriya, Stephen Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel
    Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, and William A.
    Kennington III"

    * tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (151 commits)
    powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature
    powerpc/64s: Fix masking of SRR1 bits on instruction fault
    powerpc/64s: mm_context.addr_limit is only used on hash
    powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation
    powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary
    powerpc/64s/hash: Fix fork() with 512TB process address space
    powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation
    powerpc/64s/hash: Fix 512T hint detection to use >= 128T
    powerpc: Fix DABR match on hash based systems
    powerpc/signal: Properly handle return value from uprobe_deny_signal()
    powerpc/fadump: use kstrtoint to handle sysfs store
    powerpc/lib: Implement UACCESS_FLUSHCACHE API
    powerpc/lib: Implement PMEM API
    powerpc/powernv/npu: Don't explicitly flush nmmu tlb
    powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm()
    powerpc/powernv/idle: Round up latency and residency values
    powerpc/kprobes: refactor kprobe_lookup_name for safer string operations
    powerpc/kprobes: Blacklist emulate_update_regs() from kprobes
    powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace
    powerpc/kprobes: Disable preemption before invoking probe handler for optprobes
    ...

    Linus Torvalds
     

13 Nov, 2017

1 commit

  • On PowerNV platforms, firmware provides the exit latency and
    target residency for each idle state in nanoseconds. The cpuidle
    framework expects these values in microseconds. Round up to the
    nearest microsecond to avoid errors in cases where the values are
    defined as fractional microseconds.

    The default idle state, 'snooze', has an exit latency of zero. If
    other states have a fractional-microsecond exit latency, they
    would get rounded down to zero microseconds and make the cpuidle
    framework choose a deeper idle state when the snooze loop is the
    right choice.

    Reported-by: Anton Blanchard
    Signed-off-by: Vaidyanathan Srinivasan
    Reviewed-by: Gautham R. Shenoy
    Signed-off-by: Michael Ellerman

    Vaidyanathan Srinivasan
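
The rounding in this commit amounts to a ceiling division. A minimal sketch, with a hypothetical helper name:

```c
#include <stdint.h>

/* Convert nanoseconds to microseconds, rounding up so that any
 * nonzero latency stays nonzero. The 64-bit intermediate avoids
 * overflow when ns is close to UINT32_MAX. */
static uint32_t ns_to_us_round_up(uint32_t ns)
{
    return (uint32_t)(((uint64_t)ns + 999) / 1000);
}
```

For example, a 500 ns exit latency becomes 1 us rather than 0, so the state no longer looks as cheap as snooze.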
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

08 Aug, 2017

1 commit

  • Currently, we use the opal call opal_slw_set_reg() to inform the
    Sleep-Winkle Engine (SLW) to restore the contents of some of the
    Hypervisor state on wakeup from deep idle states that lose full
    hypervisor context (characterized by the flag
    OPAL_PM_LOSE_FULL_CONTEXT).

    However, the current code has a bug in that if opal_slw_set_reg()
    fails, we don't disable the use of these deep states (winkle on
    POWER8, stop4 onwards on POWER9).

    This patch fixes this bug by ensuring that if programming the
    sleep-winkle engine to restore the hypervisor states in
    pnv_save_sprs_for_deep_states() fails, then we exclude such states by
    clearing the OPAL_PM_LOSE_FULL_CONTEXT flag from
    supported_cpuidle_states. As a result POWER8 will be prevented from
    using winkle for CPU-Hotplug, and POWER9 will put the offlined CPUs to
    the default stop state when available.

    Further, we ensure in the initialization of the cpuidle-powernv driver
    to only include those states whose flags are present in
    supported_cpuidle_states, thereby skipping OPAL_PM_LOSE_FULL_CONTEXT
    states when they have been disabled due to stop-api failure.

    Fixes: 1e1601b38e6 ("powerpc/powernv/idle: Restore SPRs for deep idle
    states via stop API.")

    Signed-off-by: Gautham R. Shenoy
    Signed-off-by: Michael Ellerman

    Gautham R. Shenoy
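
The two steps in this commit (drop the flag on failure, then register only states whose flags are all still supported) can be sketched as below. The flag values and function names are hypothetical stand-ins, not the kernel's definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration only. */
#define SKETCH_PM_NAP               (1u << 0)
#define SKETCH_PM_LOSE_FULL_CONTEXT (1u << 1)

/* If programming the restore engine failed (rc != 0), stop
 * advertising states that lose full context. */
static uint32_t update_supported(uint32_t supported, int rc)
{
    if (rc != 0)
        supported &= ~SKETCH_PM_LOSE_FULL_CONTEXT;
    return supported;
}

/* A state is registered only if every flag it carries is supported. */
static bool state_allowed(uint32_t state_flags, uint32_t supported)
{
    return (state_flags & supported) == state_flags;
}
```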
     

28 Jun, 2017

3 commits


19 Jun, 2017

1 commit


30 May, 2017

1 commit

  • The current code in the cpuidle-powernv initialization only allows deep
    stop states (indicated by OPAL_PM_STOP_INST_DEEP) which lose timebase
    (indicated by OPAL_PM_TIMEBASE_STOP). This assumption goes back to
    POWER8 time where deep states used to lose the timebase. However, on
    POWER9, we do have stop states that are deep (they lose hypervisor
    state) but retain the timebase.

    Fix the initialization code in the cpuidle-powernv driver to allow
    such deep states.

    Further, there is a bug in cpuidle-powernv driver with
    CONFIG_TICK_ONESHOT=n where we end up incrementing the nr_idle_states
    even if a platform idle state which loses time base was not added to
    the cpuidle table.

    Fix this by ensuring that the nr_idle_states variable gets incremented
    only when the platform idle state was added to the cpuidle table.

    Signed-off-by: Gautham R. Shenoy
    Signed-off-by: Michael Ellerman

    Gautham R. Shenoy
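
The counting fix described in this commit can be illustrated with a small table-building loop. All names and the flag value are hypothetical; `oneshot_available` stands in for the CONFIG_TICK_ONESHOT condition:

```c
#include <stdbool.h>

/* Hypothetical flag bit for illustration only. */
#define SKETCH_TIMEBASE_STOP (1u << 0)

struct sketch_fw_state {
    const char *name;
    unsigned int flags;
};

/* Copy usable firmware states into table[] and return how many were
 * added. The counter advances only when an entry is actually stored. */
static int build_table(const struct sketch_fw_state *fw, int nr_fw,
                       bool oneshot_available,
                       const char **table, int max)
{
    int nr = 0;

    for (int i = 0; i < nr_fw && nr < max; i++) {
        /* States that stop the timebase need oneshot tick support. */
        if ((fw[i].flags & SKETCH_TIMEBASE_STOP) && !oneshot_available)
            continue; /* not added, therefore not counted */
        table[nr] = fw[i].name;
        nr++;
    }
    return nr;
}
```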
     

29 Apr, 2017

1 commit

  • * pm-cpuidle:
    cpuidle: powernv: Avoid a branch in the core snooze_loop() loop
    cpuidle: powernv: Don't continually set thread priority in snooze_loop()
    cpuidle: powernv: Don't bounce between low and very low thread priority
    cpuidle: cpuidle-cps: remove unused variable
    powernv-cpuidle: Validate DT property array size

    * pm-core:
    PM / runtime: Document autosuspend-helper side effects
    PM / runtime: Fix autosuspend documentation

    * pm-domains:
    PM / Domains: Ignore domain-idle-states that are not compatible
    PM / Domains: Don't warn about IRQ safe device for an always on PM domain
    PM / Domains: Respect errors from genpd's ->power_off() callback
    PM / Domains: Enable users of genpd to specify always on PM domains
    PM / Domains: Clean up code validating genpd's status
    PM / Domain: remove conditional from error case

    * pm-avs:
    PM / AVS: rockchip-io: add io selectors and supplies for rk3328

    * pm-devfreq:
    PM / devfreq: Move struct devfreq_governor to devfreq directory

    Rafael J. Wysocki
     

20 Apr, 2017

3 commits


30 Mar, 2017

1 commit

  • drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
    On PowerNV platform cpu_present could be less than cpu_possible in cases
    where firmware detects the cpu, but it is not available to the OS. When
    CONFIG_HOTPLUG_CPU=n, such cpus are not hotpluggable at runtime and hence
    we skip creating cpu_device.

    This breaks cpuidle on powernv where register_cpu() is not called for
    cpus in cpu_possible_mask that cannot be hot-added at runtime.

    Trying cpuidle_register_device() on a cpu without a cpu_device will cause
    a crash like this:

    cpu 0xf: Vector: 380 (Data SLB Access) at [c000000ff1503490]
    pc: c00000000022c8bc: string+0x34/0x60
    lr: c00000000022ed78: vsnprintf+0x284/0x42c
    sp: c000000ff1503710
    msr: 9000000000009033
    dar: 6000000060000000
    current = 0xc000000ff1480000
    paca = 0xc00000000fe82d00 softe: 0 irq_happened: 0x01
    pid = 1, comm = swapper/8
    Linux version 4.11.0-rc2 (sv@sagarika) (gcc version 4.9.4
    (Buildroot 2017.02-00004-gc28573e) ) #15 SMP Fri Mar 17 19:32:02 IST 2017
    enter ? for help
    [link register ] c00000000022ed78 vsnprintf+0x284/0x42c
    [c000000ff1503710] c00000000022ebb8 vsnprintf+0xc4/0x42c (unreliable)
    [c000000ff1503800] c00000000022ef40 vscnprintf+0x20/0x44
    [c000000ff1503830] c0000000000ab61c vprintk_emit+0x94/0x2cc
    [c000000ff15038a0] c0000000000acc9c vprintk_func+0x60/0x74
    [c000000ff15038c0] c000000000619694 printk+0x38/0x4c
    [c000000ff15038e0] c000000000224950 kobject_get+0x40/0x60
    [c000000ff1503950] c00000000022507c kobject_add_internal+0x60/0x2c4
    [c000000ff15039e0] c000000000225350 kobject_init_and_add+0x70/0x78
    [c000000ff1503a60] c00000000053c288 cpuidle_add_sysfs+0x9c/0xe0
    [c000000ff1503ae0] c00000000053aeac cpuidle_register_device+0xd4/0x12c
    [c000000ff1503b30] c00000000053b108 cpuidle_register+0x98/0xcc
    [c000000ff1503bc0] c00000000085eaf0 powernv_processor_idle_init+0x140/0x1e0
    [c000000ff1503c60] c00000000000cd60 do_one_initcall+0xc0/0x15c
    [c000000ff1503d20] c000000000833e84 kernel_init_freeable+0x1a0/0x25c
    [c000000ff1503dc0] c00000000000d478 kernel_init+0x24/0x12c
    [c000000ff1503e30] c00000000000b564 ret_from_kernel_thread+0x5c/0x78

    This patch fixes the bug by passing correct cpumask from
    powernv-cpuidle driver.

    Signed-off-by: Vaidyanathan Srinivasan
    Reviewed-by: Gautham R. Shenoy
    Acked-by: Michael Ellerman
    [ rjw: Comment massage ]
    Signed-off-by: Rafael J. Wysocki

    Vaidyanathan Srinivasan
     

29 Mar, 2017

1 commit

  • The various properties associated with powernv idle states such as
    names, flags, residency-ns, latencies-ns, psscr, psscr-mask are
    exposed in the device-tree as property arrays such that the pointwise
    entries in each of these arrays correspond to the properties of the
    same idle state.

    This patch validates that the lengths of the property arrays are the
    same. If there is a mismatch, the patch will ensure that we bail out
    and not expose the platform idle states via cpuidle.

    Signed-off-by: Gautham R. Shenoy
    Reviewed-by: Shilpasri G Bhat
    Signed-off-by: Rafael J. Wysocki

    Gautham R. Shenoy
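
A minimal sketch of the length check described in this commit, assuming the caller has already collected the element count of each property array. The helper name is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true when every property array has the same number of
 * entries as the first one; the caller bails out otherwise. */
static bool prop_lengths_match(const size_t *lens, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (lens[i] != lens[0])
            return false;
    }
    return true;
}
```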
     

31 Jan, 2017

2 commits

  • The power9_idle_stop method currently takes only the requested stop
    level as a parameter and picks up the rest of the PSSCR bits from a
    hand-coded macro. This is not a very flexible design, especially when
    the firmware has the capability to communicate the psscr value and the
    mask associated with a particular stop state via device tree.

    This patch modifies the power9_idle_stop API to take as parameters the
    PSSCR value and the PSSCR mask corresponding to the stop state that
    needs to be set. These PSSCR value and mask are respectively obtained
    by parsing the "ibm,cpu-idle-state-psscr" and
    "ibm,cpu-idle-state-psscr-mask" fields from the device tree.

    In addition to this, the patch adds support for handling stop states
    for which ESL and EC bits in the PSSCR are zero. As per the
    architecture, a wakeup from these stop states resumes execution from
    the subsequent instruction as opposed to waking up at the System
    Vector.

    The older firmware sets only the Requested Level (RL) field in the
    psscr and psscr-mask exposed in the device tree. For older firmware
    where psscr-mask=0xf, this patch will set sane default values for the
    remaining PSSCR fields (i.e. PSLL, MTL, ESL, EC, and TR). For the new
    firmware, the patch will validate that the invariants
    required by the ISA for the psscr values are maintained by the
    firmware.

    This skiboot patch that exports fully populated PSSCR values and the
    mask for all the stop states can be found here:
    https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html

    [Optimize the number of instructions before entering STOP with
    ESL=EC=0, and validate that the PSSCR values provided by the firmware
    maintain the invariants required by the ISA, as suggested by Balbir
    Singh]

    Acked-by: Balbir Singh
    Signed-off-by: Gautham R. Shenoy
    Signed-off-by: Michael Ellerman

    Gautham R. Shenoy
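
The value/mask handling described in this commit can be sketched as follows. The bit patterns `SKETCH_DEFAULT_VAL` and `SKETCH_DEFAULT_MASK` are placeholders invented for this example and do not reflect the real PSSCR field layout; only the 0xf "RL only" legacy mask comes from the text above:

```c
#include <stdint.h>

#define SKETCH_RL_MASK      0xfull          /* Requested Level field */
#define SKETCH_DEFAULT_VAL  0x00f00000ull   /* placeholder default bits */
#define SKETCH_DEFAULT_MASK (SKETCH_DEFAULT_VAL | SKETCH_RL_MASK)

/* Merge the state's PSSCR bits into the current register value,
 * touching only the bits covered by the mask. */
static uint64_t apply_psscr(uint64_t current, uint64_t val, uint64_t mask)
{
    return (current & ~mask) | (val & mask);
}

/* Older firmware exposes only the RL field (mask == 0xf); in that
 * case fill in defaults for the remaining fields. */
static void fixup_legacy_psscr(uint64_t *val, uint64_t *mask)
{
    if (*mask == SKETCH_RL_MASK) {
        *val |= SKETCH_DEFAULT_VAL;
        *mask = SKETCH_DEFAULT_MASK;
    }
}
```

Masking `val` as well is a defensive choice in this sketch; it keeps stray bits outside the mask from leaking into the register value.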
     
  • In the current code for powernv_add_idle_states, there is a lot of code
    duplication while initializing an idle state in powernv_states table.

    Add an inline helper function to populate the powernv_states[] table
    for a given idle state. Invoke this for populating the "Nap",
    "Fastsleep" and the stop states in powernv_add_idle_states.

    Signed-off-by: Gautham R. Shenoy
    Acked-by: Balbir Singh
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Michael Ellerman

    Gautham R. Shenoy
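
A sketch of what such a table-populating helper might look like. The struct, its fields, and the signature are simplified assumptions for illustration and differ from the driver's actual helper:

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_NAME_LEN 16

struct sketch_state {
    char name[SKETCH_NAME_LEN];
    unsigned int flags;
    unsigned int exit_latency_us;
    unsigned int target_residency_us;
};

/* Fill one table slot; every caller goes through the same code path
 * instead of repeating the field assignments. */
static void add_state(struct sketch_state *table, int index,
                      const char *name, unsigned int flags,
                      unsigned int target_residency_us,
                      unsigned int exit_latency_us)
{
    struct sketch_state *s = &table[index];

    memset(s, 0, sizeof(*s));
    /* snprintf truncates safely and always NUL-terminates. */
    snprintf(s->name, sizeof(s->name), "%s", name);
    s->flags = flags;
    s->target_residency_us = target_residency_us;
    s->exit_latency_us = exit_latency_us;
}
```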
     

25 Nov, 2016

1 commit


07 Sep, 2016

1 commit

  • Install the callbacks via the state machine.

    v1…v2: - Use only CPUHP_CPUIDLE_DEAD (requested by Daniel Lezcano)

    Signed-off-by: Sebastian Andrzej Siewior
    Cc: linux-pm@vger.kernel.org
    Cc: Peter Zijlstra
    Cc: Daniel Lezcano
    Cc: "Rafael J. Wysocki"
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160824091259.ozyslcopxvbfdqzy@linutronix.de
    Signed-off-by: Thomas Gleixner

    Sebastian Andrzej Siewior
     

15 Jul, 2016

3 commits

  • POWER ISA v3 defines a new idle processor core mechanism. In summary,
    a) new instruction named stop is added.
    b) new per thread SPR named PSSCR is added which controls the behavior
    of stop instruction.

    The supported idle states and the values to be written to the PSSCR register
    to enter each idle state are exposed via ibm,cpu-idle-state-names and
    ibm,cpu-idle-state-psscr respectively. To enter an idle state,
    platform provided power_stop() needs to be invoked with the appropriate
    PSSCR value.

    This patch adds support for this new mechanism in cpuidle powernv driver.

    Cc: Rafael J. Wysocki
    Cc: Daniel Lezcano
    Cc: Rob Herring
    Cc: Lorenzo Pieralisi
    Cc: linux-pm@vger.kernel.org
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: linuxppc-dev@lists.ozlabs.org
    Reviewed-by: Gautham R. Shenoy
    Signed-off-by: Shreyas B. Prabhu
    Signed-off-by: Michael Ellerman

    Shreyas B. Prabhu
     
  • - Use stack instead of kzalloc'ed memory for variables while probing
    device tree for idle states.
    - Set cap for number of idle states that can be added to
    cpuidle_state_table
    - Minor change in way we check of_property_read_u32_array for error
    for sake of consistency
    - Drop unnecessary "&" while assigning function pointer

    Cc: Rafael J. Wysocki
    Cc: Daniel Lezcano
    Cc: linux-pm@vger.kernel.org
    Signed-off-by: Shreyas B. Prabhu
    Signed-off-by: Michael Ellerman

    Shreyas B. Prabhu
     
  • Use cpuidle's CPUIDLE_STATE_MAX macro instead of powernv specific
    MAX_POWERNV_IDLE_STATES.

    Cc: Rafael J. Wysocki
    Cc: Daniel Lezcano
    Cc: linux-pm@vger.kernel.org
    Acked-by: Daniel Lezcano
    Signed-off-by: Shreyas B. Prabhu
    Signed-off-by: Michael Ellerman

    Shreyas B. Prabhu
     

17 Dec, 2015

1 commit


26 Jun, 2015

1 commit

  • On some archs, the local clockevent device stops in deep cpuidle states.
    The broadcast framework is used to wakeup cpus in these idle states, in
    which either an external clockevent device is used to send wakeup ipis
    or the hrtimer broadcast framework kicks in in the absence of such a
    device. One cpu is nominated as the broadcast cpu and this cpu sends
    wakeup ipis to sleeping cpus at the appropriate time. This is the
    implementation in the oneshot mode of broadcast.

    In periodic mode of broadcast however, the presence of such cpuidle
    states results in the cpuidle driver calling tick_broadcast_enable()
    which shuts down the local clockevent devices of all the cpus and
    appoints the tick broadcast device as the clockevent device for each of
    them. This works on those archs where the tick broadcast device is a
    real clockevent device. But on archs which depend on the hrtimer mode
    of broadcast, the tick broadcast device happens to be a pseudo device.
    The consequence is that the local clockevent devices of all cpus are
    shut down and the kernel hangs at boot time in periodic mode.

    Let us thus not register the cpuidle states which have
    CPUIDLE_FLAG_TIMER_STOP flag set, on archs which depend on the hrtimer
    mode of broadcast in periodic mode. This patch takes care of doing this
    on powerpc. The cpus would not have entered into such deep cpuidle
    states in periodic mode on powerpc anyway. So there is no loss here.

    Signed-off-by: Preeti U Murthy
    Cc: 3.19+ # 3.19+
    Signed-off-by: Rafael J. Wysocki

    preeti
     

22 Jun, 2015

1 commit

  • The idle cpus which stay in snooze for a long period can degrade the
    performance of the sibling cpus. If the cpu stays in snooze for more
    than target residency of the next available idle state, then exit from
    snooze. This gives a chance to the cpuidle governor to re-evaluate the
    last idle state of the cpu to promote it to deeper idle states.

    Signed-off-by: Shilpasri G Bhat
    Reviewed-by: Preeti U Murthy
    Signed-off-by: Rafael J. Wysocki

    Shilpasri G Bhat
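
The exit condition this commit adds to the snooze loop can be expressed as a small predicate. This is a sketch with hypothetical names; the time values are in whatever unit the caller's clock uses:

```c
#include <stdbool.h>
#include <stdint.h>

/* Leave snooze when there is work to do, or when the timeout is
 * enabled and the current time has passed the computed exit time. */
static bool snooze_should_exit(uint64_t now, uint64_t exit_time,
                               bool timeout_enabled, bool need_resched)
{
    if (need_resched)
        return true;
    return timeout_enabled && now > exit_time;
}
```

Returning from the loop on timeout gives the governor another chance to pick a deeper state for the CPU.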
     

20 Feb, 2015

1 commit


18 Feb, 2015

1 commit

  • The device tree now exposes the residency values for different idle states. Read
    these values instead of calculating residency from the latency values. The values
    exposed in the DT are validated for optimal power efficiency. However to maintain
    compatibility with the older firmware code which does not expose residency
    values, use default values as a fallback mechanism. While at it, use better
    APIs to parse the powermgmt device tree node.

    Signed-off-by: Preeti U Murthy
    Acked-by: Stewart Smith
    Acked-by: Michael Ellerman
    Signed-off-by: Rafael J. Wysocki

    Preeti U Murthy
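
A minimal sketch of the fallback logic described in this commit, assuming a NULL pointer represents "firmware did not expose residency values". The function name and the truncating ns-to-us conversion are assumptions of this example:

```c
#include <stddef.h>
#include <stdint.h>

/* Prefer the residency from the device tree (given in ns); fall back
 * to a built-in default (in us) when the property is absent. */
static uint32_t residency_us(const uint32_t *dt_residency_ns, size_t index,
                             uint32_t fallback_us)
{
    if (dt_residency_ns != NULL)
        return dt_residency_ns[index] / 1000;
    return fallback_us;
}
```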
     

20 Dec, 2014

1 commit

  • Pull second batch of powerpc updates from Michael Ellerman:
    "The highlight is the series that reworks the idle management on
    powernv, which allows us to use deeper idle states on those machines.

    There's the fix from Anton for the "BUG at kernel/smpboot.c:134!"
    problem.

    An i2c driver for powernv. This is acked by Wolfram Sang, and he
    asked that we take it through the powerpc tree.

    A fix for audit from rgb at Red Hat, acked by Paul Moore who is one of
    the audit maintainers.

    A patch from Ben to export the symbol map of our OPAL firmware as a
    sysfs file, so that tools can use it.

    Also some CXL fixes, a couple of powerpc perf fixes, a fix for
    smt-enabled, and the patch to add __force to get_user() so we can use
    bitwise types"

    * tag 'powerpc-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
    powerpc/powernv: Ignore smt-enabled on Power8 and later
    powerpc/uaccess: Allow get_user() with bitwise types
    powerpc/powernv: Expose OPAL firmware symbol map
    powernv/powerpc: Add winkle support for offline cpus
    powernv/cpuidle: Redesign idle states management
    powerpc/powernv: Enable Offline CPUs to enter deep idle states
    powerpc/powernv: Switch off MMU before entering nap/sleep/rvwinkle mode
    i2c: Driver to expose PowerNV platform i2c busses
    powerpc: add little endian flag to syscall_get_arch()
    power/perf/hv-24x7: Use kmem_cache_free() instead of kfree
    powerpc/perf/hv-24x7: Use per-cpu page buffer
    cxl: Unmap MMIO regions when detaching a context
    cxl: Add timeout to process element commands
    cxl: Change contexts_lock to a mutex to fix sleep while atomic bug
    powerpc: Secondary CPUs must set cpu_callin_map after setting active and online

    Linus Torvalds
     

15 Dec, 2014

2 commits

  • Deep idle states like sleep and winkle are per core idle states. A core
    enters these states only when all the threads enter either the
    particular idle state or a deeper one. There are tasks like fastsleep
    hardware bug workaround and hypervisor core state save which have to be
    done only by the last thread of the core entering deep idle state and
    similarly tasks like timebase resync, hypervisor core register restore
    that have to be done only by the first thread waking up from these
    states.

    The current idle state management does not have a way to distinguish the
    first/last thread of the core waking/entering idle states. Tasks like
    timebase resync are done for all the threads. This is not only
    suboptimal, but can also cause functional issues when subcores and KVM
    are involved.

    This patch adds the necessary infrastructure to track idle states of
    threads in a per-core structure. It uses this info to perform tasks like
    fastsleep workaround and timebase resync only once per core.

    Signed-off-by: Shreyas B. Prabhu
    Originally-by: Preeti U. Murthy
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Rafael J. Wysocki
    Cc: linux-pm@vger.kernel.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Michael Ellerman

    Shreyas B. Prabhu
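
The per-core bookkeeping described in this commit can be sketched with one bit per thread. This is a single-threaded illustration with hypothetical names; real code would need atomic updates or a lock, which are omitted here:

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_core_idle {
    uint8_t awake_threads; /* bit n set means thread n is running */
};

/* Mark a thread idle. Returns true if it was the last awake thread,
 * i.e. the one that should do the per-core entry work. */
static bool thread_enter_idle(struct sketch_core_idle *c, unsigned int thread)
{
    c->awake_threads &= (uint8_t)~(1u << thread);
    return c->awake_threads == 0;
}

/* Mark a thread awake. Returns true if it is the first thread to
 * wake, i.e. the one that should do the per-core restore work. */
static bool thread_exit_idle(struct sketch_core_idle *c, unsigned int thread)
{
    bool first = (c->awake_threads == 0);

    c->awake_threads |= (uint8_t)(1u << thread);
    return first;
}
```

With this, work such as a timebase resync runs once per core wakeup instead of once per thread.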
     
  • The secondary threads should enter deep idle states so as to gain maximum
    powersavings when the entire core is offline. To do so the offline path
    must be made aware of the available deepest idle state. Hence probe the
    device tree for the possible idle states in powernv core code and
    expose the deepest idle state through flags.

    Since the device tree is probed by the cpuidle driver as well, move
    the parameters required to discover the idle states into an appropriate
    common place to both the driver and the powernv core code.

    Another point is that fastsleep idle state may require workarounds in
    the kernel to function properly. This workaround is introduced in the
    subsequent patches. However neither the cpuidle driver nor the hotplug
    path need be bothered about this workaround.

    It will be taken care of by the core powernv code.

    Originally-by: Srivatsa S. Bhat
    Signed-off-by: Preeti U. Murthy
    Signed-off-by: Shreyas B. Prabhu
    Reviewed-by: Paul Mackerras

    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Rafael J. Wysocki
    Cc: linux-pm@vger.kernel.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Michael Ellerman

    Shreyas B. Prabhu
     

13 Nov, 2014

1 commit

  • The only place where the time is invalid is when the ACPI_CSTATE_FFH entry
    method is not set. Otherwise for all the drivers, the time can be correctly
    measured.

    Instead of duplicating the CPUIDLE_FLAG_TIME_VALID flag in all the drivers
    for all the states, just invert the logic by replacing it by the flag
    CPUIDLE_FLAG_TIME_INVALID, hence we can set this flag only for the acpi idle
    driver, remove the former flag from all the drivers and invert the logic with
    this flag in the different governors.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     

21 Oct, 2014

1 commit


05 Aug, 2014

1 commit


11 Jun, 2014

1 commit

  • Currently when entering fastsleep we clear all LPCR PECE bits.

    This patch changes it to only clear the decrementer bit (ie. PECE1), which is
    the only bit we really need to clear here. This is needed if we want to set
    other wakeup causes like the PECEDH bit so we can use hypervisor doorbells on
    powernv. Also we no longer clear the MER bit as it should never be set in the
    host anyway.

    Signed-off-by: Michael Neuling
    Signed-off-by: Benjamin Herrenschmidt

    Michael Neuling
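
The change in this commit is a narrower bit clear. The sketch below contrasts the two behaviors; the bit values are illustrative assumptions for this example and should be checked against the kernel's LPCR definitions before being relied on:

```c
#include <stdint.h>

/* Illustrative values only; treat them as assumptions. */
#define SKETCH_LPCR_PECE_ALL 0x1f000ull /* all wakeup-cause enable bits */
#define SKETCH_LPCR_PECE1    0x02000ull /* decrementer wakeup enable */

/* Old behavior: clear every PECE bit before fastsleep. */
static uint64_t lpcr_clear_all_pece(uint64_t lpcr)
{
    return lpcr & ~SKETCH_LPCR_PECE_ALL;
}

/* New behavior: clear only the decrementer bit, leaving other wakeup
 * causes (such as hypervisor doorbells) as they were. */
static uint64_t lpcr_clear_pece1(uint64_t lpcr)
{
    return lpcr & ~SKETCH_LPCR_PECE1;
}
```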