13 Jan, 2021

1 commit


05 Jan, 2021

1 commit

  • When entering cluster-wide or system-wide power mode, the Exynos CPU
    power management driver checks the next hrtimer events of the CPUs
    composing the power domain to prevent unnecessary attempts to enter
    the power mode. Since struct cpuidle_device has next_hrtimer, this
    can be solved by passing the cpuidle device as a parameter of the
    vendor hook.

    In order to improve responsiveness, it is necessary to prevent
    entering the deep idle state in a boosting scenario. So, the vendor
    driver should be able to control the idle state.

    Due to the above requirements, the parameters required for idle enter
    and exit differ, so the vendor hook is separated into
    cpu_idle_enter and cpu_idle_exit.

    Bug: 176198732

    Change-Id: I2262ba1bae5e6622a8e76bc1d5d16fb27af0bb8a
    Signed-off-by: Park Bumgyu

    Park Bumgyu
     

13 Dec, 2020

1 commit

  • To select domain idlestates for cpuidle-psci when OSI mode has been
    enabled, the PM domains via genpd are being managed through runtime PM.
    This works fine for the regular idlepath, but it doesn't during system wide
    suspend. More precisely, the domain idlestates become temporarily
    disabled, which is because the PM core disables runtime PM for devices
    during system wide suspend.

    Later in the system suspend phase, genpd intends to deal with this from its
    ->suspend_noirq() callback, but this doesn't work as expected for a device
    corresponding to a CPU, because the domain idlestates need to be selected
    on a per CPU basis (the PM core doesn't invoke the callbacks like that).

    To address this problem, let's enable the syscore flag for the
    corresponding CPU device that becomes successfully attached to its PM
    domain (applicable only in OSI mode). This informs the PM core to skip
    invoking the system wide suspend/resume callbacks for the device, thus also
    preventing genpd from screwing up its internal state.

    Moreover, to properly select a domain idlestate for the CPUs during
    suspend-to-idle, let's assign a specific ->enter_s2idle() callback for the
    corresponding domain idlestate (applicable only in OSI mode). From that
    callback, let's invoke dev_pm_genpd_suspend|resume(), as this allows a
    domain idlestate to be selected for the current CPU by genpd.

    Signed-off-by: Ulf Hansson
    (cherry picked from commit 670c90def03429a228229420fa48a17913fdcc0d git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git)
    Bug: 175076037
    Change-Id: Ie70c496e0c14b18fa1e8b67231d0a56ff047414f
    Signed-off-by: Lina Iyer

    Ulf Hansson
     

20 Nov, 2020

1 commit


16 Nov, 2020

1 commit

  • Annotate tegra_pm_set[clear]_cpu_in_lp2() with RCU_NONIDLE in order to
    fix a lockdep warning about suspicious RCU usage of a spinlock during the
    late idling phase.

    WARNING: suspicious RCU usage
    ...
    include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage!
    ...
    (dump_stack) from (lock_acquire)
    (lock_acquire) from (_raw_spin_lock)
    (_raw_spin_lock) from (tegra_pm_set_cpu_in_lp2)
    (tegra_pm_set_cpu_in_lp2) from (tegra_cpuidle_enter)
    (tegra_cpuidle_enter) from (cpuidle_enter_state)
    (cpuidle_enter_state) from (cpuidle_enter_state_coupled)
    (cpuidle_enter_state_coupled) from (cpuidle_enter)
    (cpuidle_enter) from (do_idle)
    ...

    Tested-by: Peter Geis
    Reported-by: Peter Geis
    Signed-off-by: Dmitry Osipenko
    Signed-off-by: Rafael J. Wysocki

    Dmitry Osipenko
     

26 Oct, 2020

1 commit


25 Oct, 2020

1 commit


24 Oct, 2020

1 commit

  • Commit 83788c0caed3 ("cpuidle: remove unused exports") removed the
    capability of registering cpuidle governors, which was unused at that
    time. By exporting the symbol, let's allow platform specific modules to
    register cpuidle governors and use cpuidle_governor_latency_req() to get
    the QoS for the CPU.

    Bug: 169136276
    Link: https://lore.kernel.org/linux-pm/010101746fc98add-45e77496-d2d6-4bc1-a1ce-0692599a9a7a-000000@us-west-2.amazonses.com/
    Signed-off-by: Lina Iyer
    Change-Id: Ifa91576af0a3ae92ce9b216cb67728f037546c5b

    Lina Iyer
     

17 Oct, 2020

1 commit

  • Pull powerpc updates from Michael Ellerman:

    - A series from Nick adding ARCH_WANT_IRQS_OFF_ACTIVATE_MM & selecting
    it for powerpc, as well as a related fix for sparc.

    - Remove support for PowerPC 601.

    - Some fixes for watchpoints & addition of a new ptrace flag for
    detecting ISA v3.1 (Power10) watchpoint features.

    - A fix for kernels using 4K pages and the hash MMU on bare metal
    Power9 systems with > 16TB of RAM, or RAM on the 2nd node.

    - A basic idle driver for shallow stop states on Power10.

    - Tweaks to our sched domains code to better inform the scheduler about
    the hardware topology on Power9/10, where two SMT4 cores can be
    presented by firmware as an SMT8 core.

    - A series doing further reworks & cleanups of our EEH code.

    - Addition of a filter for RTAS (firmware) calls done via sys_rtas(),
    to prevent root from overwriting kernel memory.

    - Other smaller features, fixes & cleanups.

    Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V,
    Athira Rajeev, Biwen Li, Cameron Berkenpas, Cédric Le Goater, Christophe
    Leroy, Christoph Hellwig, Colin Ian King, Daniel Axtens, David Dai, Finn
    Thain, Frederic Barrat, Gautham R. Shenoy, Greg Kurz, Gustavo Romero,
    Ira Weiny, Jason Yan, Joel Stanley, Jordan Niethe, Kajol Jain, Konrad
    Rzeszutek Wilk, Laurent Dufour, Leonardo Bras, Liu Shixin, Luca
    Ceresoli, Madhavan Srinivasan, Mahesh Salgaonkar, Nathan Lynch, Nicholas
    Mc Guire, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran, Pedro
    Miraglia Franco de Carvalho, Pratik Rajesh Sampat, Qian Cai, Qinglang
    Miao, Ravi Bangoria, Russell Currey, Satheesh Rajendran, Scott Cheloha,
    Segher Boessenkool, Srikar Dronamraju, Stan Johnson, Stephen Kitt,
    Stephen Rothwell, Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain,
    Vaidyanathan Srinivasan, Vasant Hegde, Wang Wensheng, Wolfram Sang, Yang
    Yingliang, zhengbin.

    * tag 'powerpc-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (228 commits)
    Revert "powerpc/pci: unmap legacy INTx interrupts when a PHB is removed"
    selftests/powerpc: Fix eeh-basic.sh exit codes
    cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier
    powerpc/time: Make get_tb() common to PPC32 and PPC64
    powerpc/time: Make get_tbl() common to PPC32 and PPC64
    powerpc/time: Remove get_tbu()
    powerpc/time: Avoid using get_tbl() and get_tbu() internally
    powerpc/time: Make mftb() common to PPC32 and PPC64
    powerpc/time: Rename mftbl() to mftb()
    powerpc/32s: Remove #ifdef CONFIG_PPC_BOOK3S_32 in head_book3s_32.S
    powerpc/32s: Rename head_32.S to head_book3s_32.S
    powerpc/32s: Setup the early hash table at all time.
    powerpc/time: Remove ifdef in get_dec() and set_dec()
    powerpc: Remove get_tb_or_rtc()
    powerpc: Remove __USE_RTC()
    powerpc: Tidy up a bit after removal of PowerPC 601.
    powerpc: Remove support for PowerPC 601
    powerpc: Remove PowerPC 601
    powerpc: Drop SYNC_601() ISYNC_601() and SYNC()
    powerpc: Remove CONFIG_PPC601_SYNC_FIX
    ...

    Linus Torvalds
     

28 Sep, 2020

1 commit


27 Sep, 2020

1 commit


23 Sep, 2020

2 commits

  • CPUs may fail to enter the chosen idle state if there was a
    pending interrupt, causing the cpuidle driver to return an error
    value.

    Record that and export it via sysfs along with the other idle state
    statistics.

    This could prove useful in understanding behavior of the governor
    and the system during usecases that involve multiple CPUs.

    Signed-off-by: Lina Iyer
    [ rjw: Changelog and documentation edits ]
    Signed-off-by: Rafael J. Wysocki

    Lina Iyer
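
The accounting described in this commit can be sketched as follows. This is a minimal userspace illustration, not the kernel code: `struct state_stats` and `account_enter_result()` are hypothetical names, and the only behavior taken from the text is that a failed entry attempt is recorded separately from a successful one.

```c
#include <assert.h>

/* Hypothetical per-state counters; the names are assumptions. */
struct state_stats {
	unsigned long long usage;    /* times the state was actually entered */
	unsigned long long rejected; /* times the driver returned an error */
};

/*
 * Account for the result of one enter() attempt.
 * A negative return value means the CPU failed to enter the state,
 * for example because an interrupt was pending.
 */
static void account_enter_result(struct state_stats *stats, int enter_ret)
{
	assert(stats != 0);
	if (enter_ret >= 0)
		stats->usage++;
	else
		stats->rejected++;
}
```

Keeping the two counters apart is what lets a reader of the statistics tell how often the governor's choice was actually honored.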
     
  • Commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the
    idle path") moved the rcu_idle_enter|exit() calls into the cpuidle core.

    However, it forgot to remove a couple of comments in enter_s2idle_proper()
    about why RCU_NONIDLE was needed earlier. So, let's drop them as they have
    become a bit misleading.

    Fixes: 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the idle path")
    Signed-off-by: Ulf Hansson
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     

22 Sep, 2020

2 commits

  • If the PSCI OSI mode isn't supported or fails to be enabled, the PM domain
    topology with the genpd providers isn't initialized. This is perfectly fine
    from cpuidle-psci point of view.

    However, since the PM domain topology in the DTS files is a description of
    the HW, regardless of whether the PSCI OSI mode is supported or not, other
    consumers besides the CPUs may rely on it.

    Therefore, let's always allow the initialization of the PM domain topology
    to succeed, independently of whether the PSCI OSI mode is supported.
    Consequently, we need to track whether we succeeded in enabling the OSI
    mode, so as to know when a domain idlestate can be selected.

    Note that, CPU devices are still not being attached to the PM domain
    topology, unless the PSCI OSI mode is supported.

    Acked-by: Sudeep Holla
    Signed-off-by: Ulf Hansson
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     
  • The current user (cpuidle-psci) of psci_set_osi_mode() only needs to enable
    the PSCI OSI mode. However, as subsequent changes show, there is a need
    to be able to reset back into the PSCI PC mode.

    Therefore, let's extend psci_set_osi_mode() to take a bool as in-parameter,
    to let the user indicate whether to enable OSI or to switch back to PC
    mode.

    Reviewed-by: Sudeep Holla
    Signed-off-by: Ulf Hansson
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
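
The interface change described above, a bool parameter that selects between OSI and PC mode, might look roughly like this. It is only a sketch under stated assumptions: `set_osi_mode_sketch()`, the enum and the global `current_mode` are hypothetical stand-ins, and no firmware call is made.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode values; a real implementation would pass these to firmware. */
enum suspend_mode_sketch {
	MODE_PC = 0,  /* platform-coordinated */
	MODE_OSI = 1  /* OS-initiated */
};

/* Stand-in for the firmware-side state, so the sketch is observable. */
static enum suspend_mode_sketch current_mode = MODE_PC;

/* Enable OSI mode when 'enable' is true, otherwise switch back to PC mode. */
static int set_osi_mode_sketch(bool enable)
{
	current_mode = enable ? MODE_OSI : MODE_PC;
	return 0; /* the sketch cannot fail; a real call may return an error */
}
```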
     

21 Sep, 2020

3 commits

  • The enter() callback of CPUIDLE drivers returns the index of the entered
    idle state on success or a negative value on failure. The negative value
    could be any negative value, i.e. it doesn't necessarily need to be an
    error code. That's because the CPUIDLE core only cares about the fact of
    failure and not about the reason for the enter() failure.

    Like every other enter() callback, arm_cpuidle_simple_enter() returns
    the entered idle-index on success. Unlike some other drivers, it never
    fails. It happens that TEGRA_C1=index=err=0 in the code of the
    cpuidle-tegra driver, and thus the typo in the code, which assumes that
    arm_cpuidle_simple_enter() returns an error code, creates no problem for
    the cpuidle-tegra driver.

    arm_cpuidle_simple_enter() may also return a -ENODEV error if CPU_IDLE
    is disabled in the kernel's config, but all CPUIDLE drivers are disabled if
    CPU_IDLE is disabled, including the cpuidle-tegra driver. So we can't ever
    see the error code from arm_cpuidle_simple_enter() today.

    Of course the code may get some changes in the future and then the
    typo may transform into a real bug, so let's correct the typo! The
    tegra_cpuidle_state_enter() is now changed to make it return the entered
    idle-index on success and a negative error code on failure, which puts it
    on par with arm_cpuidle_simple_enter(), making the code consistent with
    regard to error handling.

    This patch fixes a minor typo in the code; it doesn't fix any bugs.

    Signed-off-by: Dmitry Osipenko
    Reviewed-by: Jon Hunter
    Signed-off-by: Rafael J. Wysocki

    Dmitry Osipenko
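
The return convention described in this commit can be illustrated with a small userspace sketch. The callback type and function names below are hypothetical and heavily simplified (the real enter() callback also takes device and driver pointers); only the rule "index on success, any negative value on failure" comes from the text.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a driver's enter() callback. */
typedef int (*enter_fn)(int index);

/* A callback that always succeeds: it returns the index it entered. */
static int enter_ok(int index)
{
	return index;
}

/* A callback that always fails: any negative value signals failure. */
static int enter_fail(int index)
{
	(void)index;
	return -1;
}

/* The caller only checks the sign; the reason for a failure is ignored. */
static int state_was_entered(enter_fn enter, int index)
{
	return enter(index) >= 0;
}
```

Note that a successful entry into state 0 returns 0, which is also what "no error" looks like. That coincidence is why treating the return value as an error code happened to work for index 0.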
     
  • Commit eb1f00237aca ("lockdep,trace: Expose tracepoints") started to
    expose us to tracepoints. This led to the following RCU splat on an ARM64
    Qcom board.

    [ 5.529634] WARNING: suspicious RCU usage
    [ 5.537307] sdhci-pltfm: SDHCI platform and OF driver helper
    [ 5.541092] 5.9.0-rc3 #86 Not tainted
    [ 5.541098] -----------------------------
    [ 5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage!
    [ 5.541110]
    [ 5.541110] other info that might help us debug this:
    [ 5.541110]
    [ 5.541116]
    [ 5.541116] rcu_scheduler_active = 2, debug_locks = 1
    [ 5.541122] RCU used illegally from extended quiescent state!
    [ 5.541129] no locks held by swapper/0/0.
    [ 5.541134]
    [ 5.541134] stack backtrace:
    [ 5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86
    [ 5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
    [ 5.541157] Call trace:
    [ 5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO
    [ 5.574186] dump_backtrace+0x0/0x1c8
    [ 5.574206] show_stack+0x14/0x20
    [ 5.574229] dump_stack+0xe8/0x154
    [ 5.574250] lockdep_rcu_suspicious+0xd4/0xf8
    [ 5.574269] lock_acquire+0x3f0/0x460
    [ 5.574292] _raw_spin_lock_irqsave+0x80/0xb0
    [ 5.574314] __pm_runtime_suspend+0x4c/0x188
    [ 5.574341] psci_enter_domain_idle_state+0x40/0xa0
    [ 5.574362] cpuidle_enter_state+0xc0/0x610
    [ 5.646487] cpuidle_enter+0x38/0x50
    [ 5.650651] call_cpuidle+0x18/0x40
    [ 5.654467] do_idle+0x228/0x278
    [ 5.657678] cpu_startup_entry+0x24/0x70
    [ 5.661153] rest_init+0x1a4/0x278
    [ 5.665061] arch_call_rest_init+0xc/0x14
    [ 5.668272] start_kernel+0x508/0x540

    Following the path in pm_runtime_put_sync_suspend() from
    psci_enter_domain_idle_state(), it seems like we end up using the RCU.
    Therefore, let's simply silence the splat by informing the RCU about it
    with RCU_NONIDLE.

    Note that this is a temporary solution. Instead we should strive to avoid
    using RCU_NONIDLE (and similar), but rather push rcu_idle_enter|exit()
    further down, closer to the arch specific code. However, as the CPU PM
    notifiers are also using the RCU, additional rework is needed.

    Reported-by: Naresh Kamboju
    Signed-off-by: Ulf Hansson
    Acked-by: Paul E. McKenney
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     
  • Linux 5.9-rc6

    Signed-off-by: Greg Kroah-Hartman
    Change-Id: I3bccdbb773bfc2c604742e6ff5983bf0b61ba0b5

    Greg Kroah-Hartman
     

19 Sep, 2020

1 commit

  • Pull powerpc fixes from Michael Ellerman:
    "Some more powerpc fixes for 5.9:

    - Opt us out of the DEBUG_VM_PGTABLE support for now as it's causing
    crashes.

    - Fix a long standing bug in our DMA mask handling that was hidden
    until recently, and which caused problems with some drivers.

    - Fix a boot failure on systems with large amounts of RAM, and no
    hugepage support and using Radix MMU, only seen in the lab.

    - A few other minor fixes.

    Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Gautham R. Shenoy,
    Hari Bathini, Ira Weiny, Nick Desaulniers, Shirisha Ganta, Vaibhav
    Jain, and Vaidyanathan Srinivasan"

    * tag 'powerpc-5.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute
    cpuidle: pseries: Fix CEDE latency conversion from tb to us
    powerpc/dma: Fix dma_map_ops::get_required_mask
    Revert "powerpc/build: vdso linker warning for orphan sections"
    powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc
    selftests/powerpc: Skip PROT_SAO test in guests/LPARS
    powerpc/book3s64/radix: Fix boot failure with large amount of guest memory

    Linus Torvalds
     

17 Sep, 2020

1 commit

  • Some drivers have to do significant work, some of which relies on RCU
    still being active. Instead of using RCU_NONIDLE in the drivers and
    flipping RCU back on, allow drivers to take over RCU-idle duty.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Ulf Hansson
    Tested-by: Borislav Petkov
    Signed-off-by: Rafael J. Wysocki

    Peter Zijlstra
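
One way to picture "drivers take over RCU-idle duty" is a per-state flag that tells the core to skip its own RCU idle enter/exit calls. The sketch below is a userspace mock under that assumption: the flag name, the counter and all function names are hypothetical, and the RCU calls are replaced by a counter so the effect can be observed.

```c
#include <assert.h>

/* Hypothetical flag: the driver handles RCU idle entry/exit itself. */
#define STATE_FLAG_DRIVER_HANDLES_RCU 0x1u

/* Counts how often the core (not the driver) toggled RCU idle state. */
static int core_rcu_idle_calls;

static void rcu_idle_enter_sketch(void) { core_rcu_idle_calls++; }
static void rcu_idle_exit_sketch(void)  { core_rcu_idle_calls++; }

/* Mock driver callback; a real driver could still use RCU before idling. */
static int driver_enter_sketch(int index)
{
	return index;
}

/* Core-side entry path: only wrap enter() in RCU idle calls if not delegated. */
static int enter_state_sketch(unsigned int flags, int (*enter)(int), int index)
{
	int delegated = (flags & STATE_FLAG_DRIVER_HANDLES_RCU) != 0;
	int ret;

	if (!delegated)
		rcu_idle_enter_sketch();
	ret = enter(index);
	if (!delegated)
		rcu_idle_exit_sketch();
	return ret;
}
```

With the flag set, the driver is free to do its RCU-dependent work first and tell RCU about idleness only around the final low-level idle instruction.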
     

15 Sep, 2020

1 commit

  • This driver does not restore stop > 3 state, so it limits itself
    to states which do not lose full state or TB.

    The POWER10 SPRs are sufficiently different from P9 that it seems
    easier to split out the P10 code. The POWER10 deep sleep code
    (e.g., the BHRB restore) has been taken out, but it can be re-added
    when stop > 3 support is added.

    Signed-off-by: Nicholas Piggin
    Tested-by: Pratik Rajesh Sampat
    Tested-by: Vaidyanathan Srinivasan
    Reviewed-by: Pratik Rajesh Sampat
    Reviewed-by: Gautham R. Shenoy
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200819094700.493399-1-npiggin@gmail.com

    Nicholas Piggin
     

08 Sep, 2020

1 commit

  • Commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for
    CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
    of the Extended CEDE states advertised by the platform. The values
    advertised by the platform are in timebase ticks. However the cpuidle
    framework requires the latency values in microseconds.

    If the tb-ticks value advertised by the platform corresponds to a value
    smaller than 1us, then during the conversion from tb-ticks to
    microseconds in the current code, the result becomes zero. This is
    incorrect as it puts a CEDE state on par with the snooze state.
    This patch fixes this by rounding up the result obtained while
    converting the latency value from tb-ticks to microseconds. It also
    prints a warning in case we discover an extended-cede state with
    wakeup latency to be 0. In such a case, ensure that CEDE(0) has a
    non-zero wakeup latency.

    Fixes: d947fb4c965c ("cpuidle: pseries: Fixup exit latency for CEDE(0)")
    Signed-off-by: Gautham R. Shenoy
    Reviewed-by: Vaidyanathan Srinivasan
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/1599125247-28488-1-git-send-email-ego@linux.vnet.ibm.com

    Gautham R. Shenoy
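
The rounding issue described in this commit is plain integer arithmetic and can be sketched in a few lines. The helper name is hypothetical, and the figure of 512 timebase ticks per microsecond used in the usage note is an assumption for illustration, not something stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert timebase ticks to microseconds, rounding up
 * so that any non-zero tick count yields at least 1us.
 */
static uint64_t tb_ticks_to_us_round_up(uint64_t tb_ticks, uint64_t ticks_per_usec)
{
	assert(ticks_per_usec > 0);
	/* Round-up division: (n + d - 1) / d. */
	return (tb_ticks + ticks_per_usec - 1) / ticks_per_usec;
}
```

Assuming 512 ticks per microsecond, a latency of 100 ticks truncates to 0us with plain division but becomes 1us with the round-up form, so the state no longer looks as cheap as snooze.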
     

01 Sep, 2020

1 commit


26 Aug, 2020

3 commits

  • This allows moving the leave_mm() call into generic code before
    rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Thomas Gleixner
    Acked-by: Rafael J. Wysocki
    Tested-by: Marco Elver
    Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org

    Peter Zijlstra
     
  • Lots of things take locks, due to a wee bug, rcu_lockdep didn't notice
    that the locking tracepoints were using RCU.

    Push rcu_idle_{enter,exit}() as deep as possible into the idle paths,
    this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage.

    Specifically, sched_clock_idle_wakeup_event() will use ktime which
    will use seqlocks which will tickle lockdep, and
    stop_critical_timings() uses lock.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Thomas Gleixner
    Acked-by: Rafael J. Wysocki
    Tested-by: Marco Elver
    Link: https://lkml.kernel.org/r/20200821085348.310943801@infradead.org

    Peter Zijlstra
     
  • Match the pattern elsewhere in this file.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Thomas Gleixner
    Acked-by: Rafael J. Wysocki
    Tested-by: Marco Elver
    Link: https://lkml.kernel.org/r/20200821085348.251340558@infradead.org

    Peter Zijlstra
     

20 Aug, 2020

1 commit

  • An event that gathers the idle state that the CPU attempted to enter and
    the one it actually entered is added. Through this, the idle statistics
    of the CPU can be obtained and used for vendor specific algorithms or for
    system analysis.

    Bug: 162980647

    Change-Id: I9c2491d524722042e881864488f7b3cf7e903d1e
    Signed-off-by: Park Bumgyu

    Park Bumgyu
     

08 Aug, 2020

1 commit

  • Pull powerpc updates from Michael Ellerman:

    - Add support for (optionally) using queued spinlocks & rwlocks.

    - Support for a new faster system call ABI using the scv instruction on
    Power9 or later.

    - Drop support for the PROT_SAO mmap/mprotect flag as it will be
    unsupported on Power10 and future processors, leaving us with no way
    to implement the functionality it requests. This risks breaking
    userspace, though we believe it is unused in practice.

    - A bug fix for, and then the removal of, our custom stack expansion
    checking. We now allow stack expansion up to the rlimit, like other
    architectures.

    - Remove the remnants of our (previously disabled) topology update
    code, which tried to react to NUMA layout changes on virtualised
    systems, but was prone to crashes and other problems.

    - Add PMU support for Power10 CPUs.

    - A change to our signal trampoline so that we don't unbalance the link
    stack (branch return predictor) in the signal delivery path.

    - Lots of other cleanups, refactorings, smaller features and so on as
    usual.

    Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey
    Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju
    T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan
    S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris
    Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan
    Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn
    Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand,
    Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel
    Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh
    Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan
    Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton
    Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan
    Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran,
    Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud,
    Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
    Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
    Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar
    Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza
    Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov,
    Wei Yongjun, Wen Xiong, YueHaibing.

    * tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits)
    selftests/powerpc: Fix pkey syscall redefinitions
    powerpc: Fix circular dependency between percpu.h and mmu.h
    powerpc/powernv/sriov: Fix use of uninitialised variable
    selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
    powerpc/40x: Fix assembler warning about r0
    powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
    powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
    cpuidle: pseries: Fixup exit latency for CEDE(0)
    cpuidle: pseries: Add function to parse extended CEDE records
    cpuidle: pseries: Set the latency-hint before entering CEDE
    selftests/powerpc: Fix online CPU selection
    powerpc/perf: Consolidate perf_callchain_user_[64|32]()
    powerpc/pseries/hotplug-cpu: Remove double free in error path
    powerpc/pseries/mobility: Add pr_debug() for device tree changes
    powerpc/pseries/mobility: Set pr_fmt()
    powerpc/cacheinfo: Warn if cache object chain becomes unordered
    powerpc/cacheinfo: Improve diagnostics about malformed cache lists
    powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages
    powerpc/cacheinfo: Set pr_fmt()
    powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
    ...

    Linus Torvalds
     

30 Jul, 2020

9 commits

  • We are currently assuming that CEDE(0) has exit latency 10us, since
    there is no way for us to query from the platform. However, if the
    wakeup latency of an Extended CEDE state is smaller than 10us, then we
    can be sure that the exit latency of CEDE(0) cannot be more than that.

    In this patch, we fix the exit latency of CEDE(0) if we discover an
    Extended CEDE state with wakeup latency smaller than 10us.

    Benchmark results:

    On POWER8, this patch does not have any impact since the advertized
    latency of Extended CEDE (1) is 30us which is higher than the default
    latency of CEDE (0) which is 10us.

    On POWER9 we see an improvement in the single-threaded performance of
    ebizzy, and no regression in the wakeup latency or the number of
    context-switches.

    ebizzy:
    2 ebizzy threads bound to the same big-core. 25% improvement in the
    avg records/s with patch.

    x without_patch
    * with_patch
        N        Min        Max     Median        Avg     Stddev
    x  10    2491089    5834307    5398375    4244335  1596244.9
    *  10    2893813    5834474    5832448  5327281.3  1055941.4

    context_switch2:
    There is no major regression observed with this patch as seen from the
    context_switch2 benchmark.

    context_switch2 across CPU0 CPU1 (Both belong to same big-core, but
    different small cores). We observe a minor 0.14% regression in the
    number of context-switches (higher is better).

    x without_patch
    * with_patch
        N        Min        Max     Median        Avg     Stddev
    x 500     348872     362236     354712  354745.69   2711.827
    * 500     349422     361452     353942   354215.4  2576.9258

    Difference at 99.0% confidence
    -530.288 +/- 430.963
    -0.149484% +/- 0.121485%
    (Student's t, pooled s = 2645.24)

    context_switch2 across CPU0 CPU8 (Different big-cores). We observe a
    0.37% improvement in the number of context-switches (higher is
    better).

    x without_patch
    * with_patch
        N        Min        Max     Median        Avg     Stddev
    x 500     287956     294940     288896  288977.23  646.59295
    * 500     288300     294646     289582  290064.76  1161.9992

    Difference at 99.0% confidence
    1087.53 +/- 153.194
    0.376337% +/- 0.0530125%
    (Student's t, pooled s = 940.299)

    schbench:
    No major difference could be seen until the 99.9th percentile.

    Without-patch:
    Latency percentiles (usec)
    50.0th: 29
    75.0th: 39
    90.0th: 49
    95.0th: 59
    *99.0th: 13104
    99.5th: 14672
    99.9th: 15824
    min=0, max=17993

    With-patch:
    Latency percentiles (usec)
    50.0th: 29
    75.0th: 40
    90.0th: 50
    95.0th: 61
    *99.0th: 13648
    99.5th: 14768
    99.9th: 15664
    min=0, max=29812

    Signed-off-by: Gautham R. Shenoy
    [mpe: Minor formatting]
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/1596087177-30329-4-git-send-email-ego@linux.vnet.ibm.com

    Gautham R. Shenoy
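
The fixup rule in this commit, lowering the assumed CEDE(0) exit latency when an Extended CEDE state advertises a smaller wakeup latency, can be sketched as a minimum over the discovered latencies. The function and macro names are hypothetical; the 10us default is the value stated in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default CEDE(0) exit latency assumed when the platform gives no better bound. */
#define CEDE0_DEFAULT_EXIT_LATENCY_US 10u

/*
 * Hypothetical helper: return the CEDE(0) exit latency to use, given the
 * wakeup latencies (in microseconds) of the Extended CEDE states.
 */
static uint32_t fixup_cede0_latency_us(const uint32_t *xcede_latency_us, size_t n)
{
	uint32_t latency = CEDE0_DEFAULT_EXIT_LATENCY_US;

	assert(xcede_latency_us != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* CEDE(0) cannot take longer to exit than a deeper CEDE state. */
		if (xcede_latency_us[i] < latency)
			latency = xcede_latency_us[i];
	}
	return latency;
}
```

With the 30us Extended CEDE latency reported for POWER8 the default of 10us is kept, while a smaller value such as 2us would replace it. A latency of 0 would need separate handling, which this sketch does not attempt.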
     
  • Currently we use CEDE with latency-hint 0 as the only other idle state
    on a dedicated LPAR apart from the polling "snooze" state.

    The platform might support additional extended CEDE idle states, which
    can be discovered through the "ibm,get-system-parameter" rtas-call
    made with CEDE_LATENCY_TOKEN.

    This patch adds a function to obtain information about the extended
    CEDE idle states from the platform and parse the contents to populate
    an array of extended CEDE states. These idle states thus discovered
    will be added to the cpuidle framework in the next patch.

    The dmesg output on POWER8 and POWER9 LPARs, showing the result of parsing
    the extended CEDE latency parameters, is as follows:

    POWER8
    [ 10.093279] xcede : xcede_record_size = 10
    [ 10.093285] xcede : Record 0 : hint = 1, latency = 0x3c00 tb ticks, Wake-on-irq = 1
    [ 10.093291] xcede : Record 1 : hint = 2, latency = 0x4e2000 tb ticks, Wake-on-irq = 0
    [ 10.093297] cpuidle : Skipping the 2 Extended CEDE idle states

    POWER9
    [ 5.913180] xcede : xcede_record_size = 10
    [ 5.913183] xcede : Record 0 : hint = 1, latency = 0x400 tb ticks, Wake-on-irq = 1
    [ 5.913188] xcede : Record 1 : hint = 2, latency = 0x3e8000 tb ticks, Wake-on-irq = 0
    [ 5.913193] cpuidle : Skipping the 2 Extended CEDE idle states

    Signed-off-by: Gautham R. Shenoy
    [mpe: Make space for 16 records, drop memset, minor cleanup & formatting]
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/1596087177-30329-3-git-send-email-ego@linux.vnet.ibm.com

    Gautham R. Shenoy
     
  • As per the PAPR, each H_CEDE call is associated with a latency-hint to
    be passed in the VPA field "cede_latency_hint". The CEDE state that
    we have been implicitly entering so far is CEDE with latency-hint = 0.

    This patch explicitly sets the latency hint corresponding to the CEDE
    state that we are currently entering. While at it, we save the
    previous hint, to be restored once we wake up from CEDE. This will be
    required in the future when we expose extended-cede states through the
    cpuidle framework, where each of them will have a different
    cede-latency hint.

    Signed-off-by: Gautham R. Shenoy
    [mpe: Make cede_latency_hint static]
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/1596087177-30329-2-git-send-email-ego@linux.vnet.ibm.com

    Gautham R. Shenoy
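
The save, set, restore sequence described here can be sketched with a mock structure. Everything below is a userspace stand-in: `struct vpa_sketch`, `do_cede_sketch()` and the recording variable are hypothetical, and only the field name cede_latency_hint and the ordering of the steps come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU area holding the latency hint. */
struct vpa_sketch {
	uint8_t cede_latency_hint;
};

/* Records which hint was visible while "ceded", for demonstration only. */
static uint8_t hint_seen_during_cede;

/* Mock of the actual cede operation. */
static void do_cede_sketch(const struct vpa_sketch *vpa)
{
	hint_seen_during_cede = vpa->cede_latency_hint;
}

/* Set the hint for the state being entered and restore the old one on wakeup. */
static void enter_cede_with_hint(struct vpa_sketch *vpa, uint8_t hint)
{
	uint8_t old_hint;

	assert(vpa != 0);
	old_hint = vpa->cede_latency_hint;
	vpa->cede_latency_hint = hint;
	do_cede_sketch(vpa);
	vpa->cede_latency_hint = old_hint;
}
```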
     
  • Control Flow Integrity (CFI) is a security mechanism that disallows
    changes to the original control flow graph of a compiled binary,
    making it significantly harder to perform attacks that redirect
    control flow.

    init_state_node() assigns the same function callback to function
    pointers with different declarations.

    static int init_state_node(struct cpuidle_state *idle_state,
                               const struct of_device_id *matches,
                               struct device_node *state_node)
    {
            ...
            idle_state->enter = match_id->data;
            ...
            idle_state->enter_s2idle = match_id->data;
    }

    Function declarations:

    struct cpuidle_state {
            ...
            int (*enter) (struct cpuidle_device *dev,
                          struct cpuidle_driver *drv,
                          int index);

            void (*enter_s2idle) (struct cpuidle_device *dev,
                                  struct cpuidle_driver *drv,
                                  int index);
    };

    In this case, either enter() or enter_s2idle() would cause a CFI check
    failure, since they use the same callee.

    Align the prototype of enter_s2idle() with that of enter(), since
    enter() needs a return value for some use cases. The return value of
    enter_s2idle() is not needed currently.

    Signed-off-by: Neal Liu
    Reviewed-by: Sami Tolvanen
    Signed-off-by: Rafael J. Wysocki

    Neal Liu
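
The effect of aligning the two prototypes can be shown with a reduced sketch. The structure and function names are hypothetical and the callbacks are simplified to a single int parameter; the point is only that, once both members have the same type, one function can be assigned to both without any cast or type mismatch for an indirect-call check to reject.

```c
#include <assert.h>

/* Reduced stand-in for the idle state structure; both members share one type. */
struct idle_state_sketch {
	int (*enter)(int index);
	int (*enter_s2idle)(int index); /* return value may simply go unused */
};

/* One callee serving both entry paths. */
static int enter_domain_idle_state_sketch(int index)
{
	return index;
}

/* Mirrors the double assignment described for init_state_node(). */
static void init_state_sketch(struct idle_state_sketch *s)
{
	assert(s != 0);
	s->enter = enter_domain_idle_state_sketch;
	s->enter_s2idle = enter_domain_idle_state_sketch;
}
```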
     
  • Depending on the SoC/platform, additional devices may be part of the PSCI
    PM domain topology. This is the case with the 'qcom,rpmh-rsc' device, for
    example, even if this is not yet visible in the corresponding DTS-files.

    Without going into too much detail, a device like the 'qcom,rpmh-rsc' may
    have HW constraints that need to be obeyed before a domain idlestate
    can be picked.

    Therefore, let's implement the ->sync_state() callback to receive a
    notification when all consumers of the PSCI PM domain providers have been
    attached/probed to it. In this way, we can make sure all constraints from
    all relevant devices, are taken into account before allowing a domain
    idlestate to be picked.

    Acked-by: Saravana Kannan
    Signed-off-by: Ulf Hansson
    Reviewed-by: Lukasz Luba
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     
  • To enable support for deferred probing and to allow implementation of the
    ->sync_state() callback from subsequent changes, let's convert into a
    platform driver.

    Reviewed-by: Lina Iyer
    Signed-off-by: Ulf Hansson
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     
  • The current error paths for the cpuidle-psci driver may leak memory or
    possibly leave CPU devices attached to their PM domains. These are quite
    harmless issues, but still deserve to be taken care of.

    However, rather than fixing them by keeping track of allocations that
    need to be freed, which tends to become a bit messy, let's convert into a
    platform driver. In this way, it gets easier to fix the memory leaks as we
    can rely on the devm_* functions.

    Moreover, converting to a platform driver also enables support for deferred
    probe, which subsequent changes benefit from.

    Signed-off-by: Ulf Hansson
    Reviewed-by: Lukasz Luba
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     
  • Currently we allow the cpuidle driver registration to succeed, even if we
    failed to enable the OSI mode when the hierarchical DT layout is used. This
    means running in a degraded mode, by using the available idle states per
    CPU, while also preventing the domain idle states.

    Moving forward, this behaviour looks quite questionable to maintain, as
    complexity seems to grow around it, especially when trying to add support
    for deferred probe, for example.

    Therefore, let's make the cpuidle driver registration fail in this
    situation, thus relying on the default architectural cpuidle backend for
    WFI to be used.

    Reviewed-by: Lina Iyer
    Signed-off-by: Ulf Hansson
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     
  • The combined build object for the PSCI cpuidle driver and the PSCI PM
    domain, is a bit messy. Therefore let's split it up by adding a new Kconfig
    ARM_PSCI_CPUIDLE_DOMAIN and convert into two separate objects.

    Reviewed-by: Lina Iyer
    Reviewed-by: Sudeep Holla
    Signed-off-by: Ulf Hansson
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     

16 Jul, 2020

1 commit

  • The sparse tool complains as follows:

    drivers/cpuidle/cpuidle-pseries.c:25:23: warning:
    symbol 'pseries_idle_driver' was not declared. Should it be static?

    'pseries_idle_driver' is not used outside of this file, so mark
    it static.

    Reported-by: Hulk Robot
    Signed-off-by: Wei Yongjun
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200714142424.66648-1-weiyongjun1@huawei.com

    Wei Yongjun
     

15 Jul, 2020

1 commit

  • Commit 1961acad2f88559c2cdd2ef67c58c3627f1f6e54 removes usage of
    function "validate_dt_prop_sizes". This patch removes this unused
    function.

    Signed-off-by: Abhishek Goel
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200706053258.121475-1-huntbag@linux.vnet.ibm.com

    Abhishek Goel
     

25 Jun, 2020

1 commit

  • Implement call_cpuidle_s2idle() in analogy with call_cpuidle()
    for the s2idle-specific idle state entry and invoke it from
    cpuidle_idle_call() to make the s2idle-specific idle entry code
    path look more similar to the "regular" idle entry one.

    No intentional functional impact.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Chen Yu

    Rafael J. Wysocki