16 Nov, 2020

1 commit

  • Annotate tegra_pm_set[clear]_cpu_in_lp2() with RCU_NONIDLE in order to
    fix a lockdep warning about suspicious RCU usage of a spinlock during the
    late idling phase.

    WARNING: suspicious RCU usage
    ...
    include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage!
    ...
    (dump_stack) from (lock_acquire)
    (lock_acquire) from (_raw_spin_lock)
    (_raw_spin_lock) from (tegra_pm_set_cpu_in_lp2)
    (tegra_pm_set_cpu_in_lp2) from (tegra_cpuidle_enter)
    (tegra_cpuidle_enter) from (cpuidle_enter_state)
    (cpuidle_enter_state) from (cpuidle_enter_state_coupled)
    (cpuidle_enter_state_coupled) from (cpuidle_enter)
    (cpuidle_enter) from (do_idle)
    ...

    Tested-by: Peter Geis
    Reported-by: Peter Geis
    Signed-off-by: Dmitry Osipenko
    Signed-off-by: Rafael J. Wysocki

    Dmitry Osipenko
     

17 Oct, 2020

1 commit

  • Pull powerpc updates from Michael Ellerman:

    - A series from Nick adding ARCH_WANT_IRQS_OFF_ACTIVATE_MM & selecting
    it for powerpc, as well as a related fix for sparc.

    - Remove support for PowerPC 601.

    - Some fixes for watchpoints & addition of a new ptrace flag for
    detecting ISA v3.1 (Power10) watchpoint features.

    - A fix for kernels using 4K pages and the hash MMU on bare metal
    Power9 systems with > 16TB of RAM, or RAM on the 2nd node.

    - A basic idle driver for shallow stop states on Power10.

    - Tweaks to our sched domains code to better inform the scheduler about
    the hardware topology on Power9/10, where two SMT4 cores can be
    presented by firmware as an SMT8 core.

    - A series doing further reworks & cleanups of our EEH code.

    - Addition of a filter for RTAS (firmware) calls done via sys_rtas(),
    to prevent root from overwriting kernel memory.

    - Other smaller features, fixes & cleanups.

    Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V,
    Athira Rajeev, Biwen Li, Cameron Berkenpas, Cédric Le Goater, Christophe
    Leroy, Christoph Hellwig, Colin Ian King, Daniel Axtens, David Dai, Finn
    Thain, Frederic Barrat, Gautham R. Shenoy, Greg Kurz, Gustavo Romero,
    Ira Weiny, Jason Yan, Joel Stanley, Jordan Niethe, Kajol Jain, Konrad
    Rzeszutek Wilk, Laurent Dufour, Leonardo Bras, Liu Shixin, Luca
    Ceresoli, Madhavan Srinivasan, Mahesh Salgaonkar, Nathan Lynch, Nicholas
    Mc Guire, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran, Pedro
    Miraglia Franco de Carvalho, Pratik Rajesh Sampat, Qian Cai, Qinglang
    Miao, Ravi Bangoria, Russell Currey, Satheesh Rajendran, Scott Cheloha,
    Segher Boessenkool, Srikar Dronamraju, Stan Johnson, Stephen Kitt,
    Stephen Rothwell, Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain,
    Vaidyanathan Srinivasan, Vasant Hegde, Wang Wensheng, Wolfram Sang, Yang
    Yingliang, zhengbin.

    * tag 'powerpc-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (228 commits)
    Revert "powerpc/pci: unmap legacy INTx interrupts when a PHB is removed"
    selftests/powerpc: Fix eeh-basic.sh exit codes
    cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier
    powerpc/time: Make get_tb() common to PPC32 and PPC64
    powerpc/time: Make get_tbl() common to PPC32 and PPC64
    powerpc/time: Remove get_tbu()
    powerpc/time: Avoid using get_tbl() and get_tbu() internally
    powerpc/time: Make mftb() common to PPC32 and PPC64
    powerpc/time: Rename mftbl() to mftb()
    powerpc/32s: Remove #ifdef CONFIG_PPC_BOOK3S_32 in head_book3s_32.S
    powerpc/32s: Rename head_32.S to head_book3s_32.S
    powerpc/32s: Setup the early hash table at all time.
    powerpc/time: Remove ifdef in get_dec() and set_dec()
    powerpc: Remove get_tb_or_rtc()
    powerpc: Remove __USE_RTC()
    powerpc: Tidy up a bit after removal of PowerPC 601.
    powerpc: Remove support for PowerPC 601
    powerpc: Remove PowerPC 601
    powerpc: Drop SYNC_601() ISYNC_601() and SYNC()
    powerpc: Remove CONFIG_PPC601_SYNC_FIX
    ...

    Linus Torvalds
     

28 Sep, 2020

1 commit


23 Sep, 2020

2 commits

  • CPUs may fail to enter the chosen idle state if there was a
    pending interrupt, causing the cpuidle driver to return an error
    value.

    Record that and export it via sysfs along with the other idle state
    statistics.

    This could prove useful in understanding the behavior of the governor
    and the system during use cases that involve multiple CPUs.

    Signed-off-by: Lina Iyer
    [ rjw: Changelog and documentation edits ]
    Signed-off-by: Rafael J. Wysocki

    Lina Iyer
     
  • Commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the
    idle path") moved the rcu_idle_enter|exit() calls into the cpuidle core.

    However, it forgot to remove a couple of comments in enter_s2idle_proper()
    about why RCU_NONIDLE was needed earlier. So, let's drop them, as they have
    become a bit misleading.

    Fixes: 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the idle path")
    Signed-off-by: Ulf Hansson
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     

22 Sep, 2020

2 commits

  • If the PSCI OSI mode isn't supported or fails to be enabled, the PM domain
    topology with the genpd providers isn't initialized. This is perfectly fine
    from the cpuidle-psci point of view.

    However, since the PM domain topology in the DTS files is a description of
    the HW, regardless of whether the PSCI OSI mode is supported, other
    consumers besides the CPUs may rely on it.

    Therefore, let's always allow the initialization of the PM domain topology
    to succeed, independently of whether the PSCI OSI mode is supported.
    Consequently, we need to track whether enabling the OSI mode succeeded, so
    as to know when a domain idlestate can be selected.

    Note that CPU devices are still not attached to the PM domain topology
    unless the PSCI OSI mode is supported.

    Acked-by: Sudeep Holla
    Signed-off-by: Ulf Hansson
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     
  • The current user (cpuidle-psci) of psci_set_osi_mode() only needs to enable
    the PSCI OSI mode. However, as subsequent changes show, there is also a
    need to be able to switch back to the PSCI PC mode.

    Therefore, let's extend psci_set_osi_mode() to take a bool as in-parameter,
    to let the user indicate whether to enable OSI or to switch back to PC
    mode.

    Reviewed-by: Sudeep Holla
    Signed-off-by: Ulf Hansson
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
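
    The bool in-parameter described in the psci_set_osi_mode() commit above
    can be sketched as follows. This is only an illustration under stated
    assumptions: every demo_* name is hypothetical, and the firmware call is
    replaced by a stub that merely records the requested mode.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two PSCI suspend modes. */
enum demo_suspend_mode {
	DEMO_MODE_PC = 0,	/* platform-coordinated */
	DEMO_MODE_OSI = 1,	/* OS-initiated */
};

/* Stub "firmware" state; a real driver would issue a PSCI call instead. */
static enum demo_suspend_mode demo_fw_mode = DEMO_MODE_PC;

static int demo_fw_set_suspend_mode(enum demo_suspend_mode mode)
{
	demo_fw_mode = mode;
	return 0;		/* the stub never fails */
}

/* One entry point serves both directions, selected by the bool. */
static int demo_set_osi_mode(bool enable)
{
	return demo_fw_set_suspend_mode(enable ? DEMO_MODE_OSI : DEMO_MODE_PC);
}
```

    A caller that fails later in its setup can then undo the mode switch
    with demo_set_osi_mode(false) instead of needing a second helper.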
     

21 Sep, 2020

2 commits

  • The enter() callback of CPUIDLE drivers returns the index of the entered
    idle state on success or a negative value on failure. It could be any
    negative value, i.e. it doesn't necessarily need to be an error code.
    That's because the CPUIDLE core only cares about the fact of failure and
    not about the reason for the enter() failure.

    Like every other enter() callback, arm_cpuidle_simple_enter() returns
    the entered idle-index on success. Unlike some other drivers, it never
    fails. It happens that TEGRA_C1=index=err=0 in the code of the
    cpuidle-tegra driver, and thus the typo in the code, which assumes that
    arm_cpuidle_simple_enter() returns an error code, creates no problem for
    the cpuidle-tegra driver.

    arm_cpuidle_simple_enter() may also return a -ENODEV error if CPU_IDLE
    is disabled in the kernel's config, but all CPUIDLE drivers are disabled if
    CPU_IDLE is disabled, including the cpuidle-tegra driver. So we can't ever
    see the error code from arm_cpuidle_simple_enter() today.

    Of course the code may change in the future, and then the typo may
    turn into a real bug, so let's correct the typo!
    tegra_cpuidle_state_enter() is now changed to return the entered
    idle-index on success and a negative error code on failure, which puts it
    on par with arm_cpuidle_simple_enter(), making the code consistent with
    regard to error handling.

    This patch fixes a minor typo in the code; it doesn't fix any bugs.

    Signed-off-by: Dmitry Osipenko
    Reviewed-by: Jon Hunter
    Signed-off-by: Rafael J. Wysocki

    Dmitry Osipenko
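
    The return convention described in the cpuidle-tegra commit above can
    be illustrated with a small sketch. The demo_* callbacks are
    hypothetical and take only an index, whereas a real enter() callback
    also receives the device and driver.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical enter()-style callbacks, simplified to a single argument. */
static int demo_enter_ok(int index)
{
	return index;		/* success: report the state actually entered */
}

static int demo_enter_busy(int index)
{
	(void)index;
	return -EBUSY;		/* failure reported as an errno-style code */
}

static int demo_enter_raw(int index)
{
	(void)index;
	return -1;		/* failure reported as a bare negative value */
}

/* Only the sign matters to the caller, not the specific negative value. */
static bool demo_enter_failed(int ret)
{
	return ret < 0;
}
```

    Note that demo_enter_ok(0) returns 0, which is a valid state index and
    also looks like a "no error" code. That coincidence is why confusing an
    index with an error code can go unnoticed for index 0.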
     
  • Commit eb1f00237aca ("lockdep,trace: Expose tracepoints") started to
    expose us to tracepoints. This led to the following RCU splat on an ARM64
    Qcom board.

    [ 5.529634] WARNING: suspicious RCU usage
    [ 5.537307] sdhci-pltfm: SDHCI platform and OF driver helper
    [ 5.541092] 5.9.0-rc3 #86 Not tainted
    [ 5.541098] -----------------------------
    [ 5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage!
    [ 5.541110]
    [ 5.541110] other info that might help us debug this:
    [ 5.541110]
    [ 5.541116]
    [ 5.541116] rcu_scheduler_active = 2, debug_locks = 1
    [ 5.541122] RCU used illegally from extended quiescent state!
    [ 5.541129] no locks held by swapper/0/0.
    [ 5.541134]
    [ 5.541134] stack backtrace:
    [ 5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86
    [ 5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
    [ 5.541157] Call trace:
    [ 5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO
    [ 5.574186] dump_backtrace+0x0/0x1c8
    [ 5.574206] show_stack+0x14/0x20
    [ 5.574229] dump_stack+0xe8/0x154
    [ 5.574250] lockdep_rcu_suspicious+0xd4/0xf8
    [ 5.574269] lock_acquire+0x3f0/0x460
    [ 5.574292] _raw_spin_lock_irqsave+0x80/0xb0
    [ 5.574314] __pm_runtime_suspend+0x4c/0x188
    [ 5.574341] psci_enter_domain_idle_state+0x40/0xa0
    [ 5.574362] cpuidle_enter_state+0xc0/0x610
    [ 5.646487] cpuidle_enter+0x38/0x50
    [ 5.650651] call_cpuidle+0x18/0x40
    [ 5.654467] do_idle+0x228/0x278
    [ 5.657678] cpu_startup_entry+0x24/0x70
    [ 5.661153] rest_init+0x1a4/0x278
    [ 5.665061] arch_call_rest_init+0xc/0x14
    [ 5.668272] start_kernel+0x508/0x540

    Following the path in pm_runtime_put_sync_suspend() from
    psci_enter_domain_idle_state(), it seems like we end up using the RCU.
    Therefore, let's simply silence the splat by informing the RCU about it
    with RCU_NONIDLE.

    Note that this is a temporary solution. We should strive to avoid
    using RCU_NONIDLE (and similar) and instead push rcu_idle_enter|exit()
    further down, closer to the arch-specific code. However, as the CPU PM
    notifiers also use RCU, additional rework is needed.

    Reported-by: Naresh Kamboju
    Signed-off-by: Ulf Hansson
    Acked-by: Paul E. McKenney
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     

19 Sep, 2020

1 commit

  • Pull powerpc fixes from Michael Ellerman:
    "Some more powerpc fixes for 5.9:

    - Opt us out of the DEBUG_VM_PGTABLE support for now as it's causing
    crashes.

    - Fix a long standing bug in our DMA mask handling that was hidden
    until recently, and which caused problems with some drivers.

    - Fix a boot failure on systems with large amounts of RAM, and no
    hugepage support and using Radix MMU, only seen in the lab.

    - A few other minor fixes.

    Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Gautham R. Shenoy,
    Hari Bathini, Ira Weiny, Nick Desaulniers, Shirisha Ganta, Vaibhav
    Jain, and Vaidyanathan Srinivasan"

    * tag 'powerpc-5.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute
    cpuidle: pseries: Fix CEDE latency conversion from tb to us
    powerpc/dma: Fix dma_map_ops::get_required_mask
    Revert "powerpc/build: vdso linker warning for orphan sections"
    powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc
    selftests/powerpc: Skip PROT_SAO test in guests/LPARS
    powerpc/book3s64/radix: Fix boot failure with large amount of guest memory

    Linus Torvalds
     

17 Sep, 2020

1 commit

  • Some drivers have to do significant work, some of which relies on RCU
    still being active. Instead of using RCU_NONIDLE in the drivers and
    flipping RCU back on, allow drivers to take over RCU-idle duty.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Ulf Hansson
    Tested-by: Borislav Petkov
    Signed-off-by: Rafael J. Wysocki

    Peter Zijlstra
     

15 Sep, 2020

1 commit

  • This driver does not restore stop > 3 state, so it limits itself
    to states which do not lose full state or TB.

    The POWER10 SPRs are sufficiently different from P9 that it seems
    easier to split out the P10 code. The POWER10 deep sleep code
    (e.g., the BHRB restore) has been taken out, but it can be re-added
    when stop > 3 support is added.

    Signed-off-by: Nicholas Piggin
    Tested-by: Pratik Rajesh Sampat
    Tested-by: Vaidyanathan Srinivasan
    Reviewed-by: Pratik Rajesh Sampat
    Reviewed-by: Gautham R. Shenoy
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200819094700.493399-1-npiggin@gmail.com

    Nicholas Piggin
     

08 Sep, 2020

1 commit

  • Commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for
    CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
    of the Extended CEDE states advertised by the platform. The values
    advertised by the platform are in timebase ticks. However the cpuidle
    framework requires the latency values in microseconds.

    If the tb-ticks value advertised by the platform corresponds to a value
    smaller than 1us, the current conversion from tb-ticks to microseconds
    truncates the result to zero. This is incorrect as it
    puts a CEDE state on par with the snooze state.

    This patch fixes this by rounding up the result obtained while
    converting the latency value from tb-ticks to microseconds. It also
    prints a warning if we discover an extended-cede state with a
    wakeup latency of 0. In such a case, it ensures that CEDE(0) has a
    non-zero wakeup latency.

    Fixes: d947fb4c965c ("cpuidle: pseries: Fixup exit latency for CEDE(0)")
    Signed-off-by: Gautham R. Shenoy
    Reviewed-by: Vaidyanathan Srinivasan
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/1599125247-28488-1-git-send-email-ego@linux.vnet.ibm.com

    Gautham R. Shenoy
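
    The rounding fix described in the CEDE latency commit above can be
    sketched as follows. This is a minimal illustration, not the driver
    code: the function names are hypothetical, and the rate of 512 timebase
    ticks per microsecond used in the usage note is an assumption that
    matches the POWER8 figures quoted elsewhere in this log (0x3c00 ticks
    for a 30us state).

```c
#include <stdint.h>

/* Round-up integer division; d must be non-zero. */
static uint64_t demo_div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Convert timebase ticks to microseconds, rounding up so that any
 * non-zero tick count yields at least 1us. The tick rate is supplied
 * by the caller (hypothetical parameter).
 */
static uint64_t demo_tb_ticks_to_us(uint64_t tb_ticks, uint64_t ticks_per_us)
{
	return demo_div_round_up(tb_ticks, ticks_per_us);
}
```

    With 512 ticks per microsecond, 0x400 ticks convert to 2us and 0x3c00
    ticks to 30us, while a latency of 100 ticks becomes 1us rather than the
    0 that plain truncating division would give.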
     

26 Aug, 2020

3 commits

  • This allows moving the leave_mm() call into generic code before
    rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Thomas Gleixner
    Acked-by: Rafael J. Wysocki
    Tested-by: Marco Elver
    Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org

    Peter Zijlstra
     
  • Lots of things take locks, but due to a wee bug, rcu_lockdep didn't
    notice that the locking tracepoints were using RCU.

    Push rcu_idle_{enter,exit}() as deep as possible into the idle paths;
    this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage.

    Specifically, sched_clock_idle_wakeup_event() will use ktime which
    will use seqlocks which will tickle lockdep, and
    stop_critical_timings() uses locks.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Thomas Gleixner
    Acked-by: Rafael J. Wysocki
    Tested-by: Marco Elver
    Link: https://lkml.kernel.org/r/20200821085348.310943801@infradead.org

    Peter Zijlstra
     
  • Match the pattern elsewhere in this file.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Thomas Gleixner
    Acked-by: Rafael J. Wysocki
    Tested-by: Marco Elver
    Link: https://lkml.kernel.org/r/20200821085348.251340558@infradead.org

    Peter Zijlstra
     

08 Aug, 2020

1 commit

  • Pull powerpc updates from Michael Ellerman:

    - Add support for (optionally) using queued spinlocks & rwlocks.

    - Support for a new faster system call ABI using the scv instruction on
    Power9 or later.

    - Drop support for the PROT_SAO mmap/mprotect flag as it will be
    unsupported on Power10 and future processors, leaving us with no way
    to implement the functionality it requests. This risks breaking
    userspace, though we believe it is unused in practice.

    - A bug fix for, and then the removal of, our custom stack expansion
    checking. We now allow stack expansion up to the rlimit, like other
    architectures.

    - Remove the remnants of our (previously disabled) topology update
    code, which tried to react to NUMA layout changes on virtualised
    systems, but was prone to crashes and other problems.

    - Add PMU support for Power10 CPUs.

    - A change to our signal trampoline so that we don't unbalance the link
    stack (branch return predictor) in the signal delivery path.

    - Lots of other cleanups, refactorings, smaller features and so on as
    usual.

    Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey
    Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju
    T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan
    S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris
    Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan
    Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn
    Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand,
    Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel
    Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh
    Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan
    Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton
    Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan
    Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran,
    Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud,
    Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
    Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
    Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar
    Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza
    Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov,
    Wei Yongjun, Wen Xiong, YueHaibing.

    * tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits)
    selftests/powerpc: Fix pkey syscall redefinitions
    powerpc: Fix circular dependency between percpu.h and mmu.h
    powerpc/powernv/sriov: Fix use of uninitialised variable
    selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
    powerpc/40x: Fix assembler warning about r0
    powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
    powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
    cpuidle: pseries: Fixup exit latency for CEDE(0)
    cpuidle: pseries: Add function to parse extended CEDE records
    cpuidle: pseries: Set the latency-hint before entering CEDE
    selftests/powerpc: Fix online CPU selection
    powerpc/perf: Consolidate perf_callchain_user_[64|32]()
    powerpc/pseries/hotplug-cpu: Remove double free in error path
    powerpc/pseries/mobility: Add pr_debug() for device tree changes
    powerpc/pseries/mobility: Set pr_fmt()
    powerpc/cacheinfo: Warn if cache object chain becomes unordered
    powerpc/cacheinfo: Improve diagnostics about malformed cache lists
    powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages
    powerpc/cacheinfo: Set pr_fmt()
    powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
    ...

    Linus Torvalds
     

30 Jul, 2020

9 commits

  • We are currently assuming that CEDE(0) has exit latency 10us, since
    there is no way for us to query from the platform. However, if the
    wakeup latency of an Extended CEDE state is smaller than 10us, then we
    can be sure that the exit latency of CEDE(0) cannot be more than that.

    In this patch, we fix the exit latency of CEDE(0) if we discover an
    Extended CEDE state with wakeup latency smaller than 10us.

    Benchmark results:

    On POWER8, this patch does not have any impact since the advertized
    latency of Extended CEDE (1) is 30us which is higher than the default
    latency of CEDE (0) which is 10us.

    On POWER9 we see an improvement in the single-threaded performance of
    ebizzy, and no regression in the wakeup latency or the number of
    context-switches.

    ebizzy:
    2 ebizzy threads bound to the same big-core. 25% improvement in the
    avg records/s with patch.

    x without_patch
    * with_patch
        N       Min       Max    Median         Avg     Stddev
    x  10   2491089   5834307   5398375     4244335  1596244.9
    *  10   2893813   5834474   5832448   5327281.3  1055941.4

    context_switch2:
    There is no major regression observed with this patch as seen from the
    context_switch2 benchmark.

    context_switch2 across CPU0 CPU1 (Both belong to same big-core, but
    different small cores). We observe a minor 0.14% regression in the
    number of context-switches (higher is better).

    x without_patch
    * with_patch
        N      Min      Max   Median        Avg     Stddev
    x 500   348872   362236   354712  354745.69   2711.827
    * 500   349422   361452   353942   354215.4  2576.9258

    Difference at 99.0% confidence
    -530.288 +/- 430.963
    -0.149484% +/- 0.121485%
    (Student's t, pooled s = 2645.24)

    context_switch2 across CPU0 CPU8 (Different big-cores). We observe a
    0.37% improvement in the number of context-switches (higher is
    better).

    x without_patch
    * with_patch
        N      Min      Max   Median        Avg     Stddev
    x 500   287956   294940   288896  288977.23  646.59295
    * 500   288300   294646   289582  290064.76  1161.9992

    Difference at 99.0% confidence
    1087.53 +/- 153.194
    0.376337% +/- 0.0530125%
    (Student's t, pooled s = 940.299)

    schbench:
    No major difference could be seen until the 99.9th percentile.

    Without-patch:
    Latency percentiles (usec)
    50.0th: 29
    75.0th: 39
    90.0th: 49
    95.0th: 59
    *99.0th: 13104
    99.5th: 14672
    99.9th: 15824
    min=0, max=17993

    With-patch:
    Latency percentiles (usec)
    50.0th: 29
    75.0th: 40
    90.0th: 50
    95.0th: 61
    *99.0th: 13648
    99.5th: 14768
    99.9th: 15664
    min=0, max=29812

    Signed-off-by: Gautham R. Shenoy
    [mpe: Minor formatting]
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/1596087177-30329-4-git-send-email-ego@linux.vnet.ibm.com

    Gautham R. Shenoy
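
    The CEDE(0) exit-latency fixup described in the commit above amounts to
    taking a minimum over the advertised Extended CEDE latencies. A minimal
    sketch, assuming the latencies have already been converted to
    microseconds; the names are hypothetical, and a zero latency is not
    treated specially here.

```c
#include <stddef.h>
#include <stdint.h>

/* Default assumed for CEDE(0) when the platform offers no better bound. */
#define DEMO_CEDE0_DEFAULT_EXIT_LATENCY_US 10u

/*
 * Return the exit latency to use for CEDE(0): the default, lowered to the
 * smallest Extended CEDE wakeup latency if any of them is below it.
 */
static uint32_t demo_fixup_cede0_latency(const uint32_t *xcede_latency_us,
					 size_t nr_records)
{
	uint32_t latency = DEMO_CEDE0_DEFAULT_EXIT_LATENCY_US;
	size_t i;

	for (i = 0; i < nr_records; i++) {
		if (xcede_latency_us[i] < latency)
			latency = xcede_latency_us[i];
	}

	return latency;
}
```

    With a smallest advertised latency of 30us (the POWER8 case in the
    commit message) the default of 10us is kept; with a hypothetical record
    of 2us the result drops to 2us.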
     
  • Currently we use CEDE with latency-hint 0 as the only other idle state
    on a dedicated LPAR apart from the polling "snooze" state.

    The platform might support additional extended CEDE idle states, which
    can be discovered through the "ibm,get-system-parameter" rtas-call
    made with CEDE_LATENCY_TOKEN.

    This patch adds a function to obtain information about the extended
    CEDE idle states from the platform and parse the contents to populate
    an array of extended CEDE states. These idle states thus discovered
    will be added to the cpuidle framework in the next patch.

    The dmesg output on POWER8 and POWER9 LPARs, showing the result of
    parsing the extended CEDE latency parameters, is as follows:

    POWER8
    [ 10.093279] xcede : xcede_record_size = 10
    [ 10.093285] xcede : Record 0 : hint = 1, latency = 0x3c00 tb ticks, Wake-on-irq = 1
    [ 10.093291] xcede : Record 1 : hint = 2, latency = 0x4e2000 tb ticks, Wake-on-irq = 0
    [ 10.093297] cpuidle : Skipping the 2 Extended CEDE idle states

    POWER9
    [ 5.913180] xcede : xcede_record_size = 10
    [ 5.913183] xcede : Record 0 : hint = 1, latency = 0x400 tb ticks, Wake-on-irq = 1
    [ 5.913188] xcede : Record 1 : hint = 2, latency = 0x3e8000 tb ticks, Wake-on-irq = 0
    [ 5.913193] cpuidle : Skipping the 2 Extended CEDE idle states

    Signed-off-by: Gautham R. Shenoy
    [mpe: Make space for 16 records, drop memset, minor cleanup & formatting]
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/1596087177-30329-3-git-send-email-ego@linux.vnet.ibm.com

    Gautham R. Shenoy
     
  • As per the PAPR, each H_CEDE call is associated with a latency-hint to
    be passed in the VPA field "cede_latency_hint". The CEDE state that
    we have been implicitly entering so far is CEDE with latency-hint = 0.

    This patch explicitly sets the latency hint corresponding to the CEDE
    state that we are currently entering. While at it, we save the
    previous hint, to be restored once we wake up from CEDE. This will be
    required in the future when we expose extended-cede states through the
    cpuidle framework, where each of them will have a different
    cede-latency hint.

    Signed-off-by: Gautham R. Shenoy
    [mpe: Make cede_latency_hint static]
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/1596087177-30329-2-git-send-email-ego@linux.vnet.ibm.com

    Gautham R. Shenoy
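
    The save/set/restore pattern described in the latency-hint commit above
    can be sketched as follows. Everything here is a hypothetical stand-in:
    the struct models only the one VPA field named in the commit message,
    and the hypervisor call is a stub that records the hint it observed.

```c
#include <stdint.h>

/* Hypothetical model of the one VPA field this commit is concerned with. */
struct demo_vpa {
	uint8_t cede_latency_hint;
};

/* Stub for the H_CEDE hypercall: remembers the hint in effect at call time. */
static uint8_t demo_hint_seen;
static int demo_cede_calls;

static void demo_h_cede(const struct demo_vpa *vpa)
{
	demo_hint_seen = vpa->cede_latency_hint;
	demo_cede_calls++;
}

/* Enter CEDE with an explicit hint, then put the previous hint back. */
static void demo_enter_cede(struct demo_vpa *vpa, uint8_t hint)
{
	uint8_t old_hint = vpa->cede_latency_hint;

	vpa->cede_latency_hint = hint;
	demo_h_cede(vpa);
	vpa->cede_latency_hint = old_hint;
}
```

    Restoring the old value means that whatever hint was in place before
    the idle entry is unaffected once the CPU wakes up.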
     
  • Control Flow Integrity (CFI) is a security mechanism that disallows
    changes to the original control flow graph of a compiled binary,
    making control-flow hijacking attacks significantly harder to perform.

    init_state_node() assigns the same function callback to function
    pointers declared with different prototypes:

    static int init_state_node(struct cpuidle_state *idle_state,
                               const struct of_device_id *matches,
                               struct device_node *state_node)
    {
            ...
            idle_state->enter = match_id->data;
            ...
            idle_state->enter_s2idle = match_id->data;
    }

    Function declarations:

    struct cpuidle_state {
            ...
            int (*enter) (struct cpuidle_device *dev,
                          struct cpuidle_driver *drv,
                          int index);

            void (*enter_s2idle) (struct cpuidle_device *dev,
                                  struct cpuidle_driver *drv,
                                  int index);
    };

    In this case, calling either enter() or enter_s2idle() would fail the
    CFI check, since both point to the same callee.

    Align the prototype of enter_s2idle() with that of enter(), since
    enter() needs its return value for some use cases. The return value of
    enter_s2idle() is currently not needed.

    Signed-off-by: Neal Liu
    Reviewed-by: Sami Tolvanen
    Signed-off-by: Rafael J. Wysocki

    Neal Liu
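
    The effect of aligning the two prototypes, as described in the CFI
    commit above, can be sketched with simplified types. The demo_* structs
    and callback are hypothetical stand-ins for the cpuidle ones; the point
    is only that one callee can be stored in both pointers without a
    type mismatch once the signatures agree.

```c
/* Hypothetical, heavily simplified stand-ins for the cpuidle types. */
struct demo_device { int cpu; };
struct demo_driver { const char *name; };

struct demo_state {
	int (*enter)(struct demo_device *dev, struct demo_driver *drv,
		     int index);
	/* Same signature as enter(), so both pointers share one type. */
	int (*enter_s2idle)(struct demo_device *dev, struct demo_driver *drv,
			    int index);
};

/* A single callee that matches both pointer types exactly. */
static int demo_enter(struct demo_device *dev, struct demo_driver *drv,
		      int index)
{
	(void)dev;
	(void)drv;
	return index;
}

static void demo_init_state(struct demo_state *state)
{
	state->enter = demo_enter;
	state->enter_s2idle = demo_enter;	/* no cast needed */
}
```

    A caller of enter_s2idle() is free to ignore the returned value, which
    is consistent with the commit's remark that it is currently not needed.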
     
  • Depending on the SoC/platform, additional devices may be part of the PSCI
    PM domain topology. This is the case with 'qcom,rpmh-rsc' device, for
    example, even if this is not yet visible in the corresponding DTS-files.

    Without going into too much detail, a device like the 'qcom,rpmh-rsc' may
    have HW constraints that need to be obeyed before a domain idlestate
    can be picked.

    Therefore, let's implement the ->sync_state() callback to receive a
    notification when all consumers of the PSCI PM domain providers have been
    attached/probed to it. In this way, we can make sure all constraints from
    all relevant devices are taken into account before allowing a domain
    idlestate to be picked.

    Acked-by: Saravana Kannan
    Signed-off-by: Ulf Hansson
    Reviewed-by: Lukasz Luba
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     
  • To enable support for deferred probing and to allow implementation of the
    ->sync_state() callback from subsequent changes, let's convert into a
    platform driver.

    Reviewed-by: Lina Iyer
    Signed-off-by: Ulf Hansson
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     
  • The current error paths for the cpuidle-psci driver may leak memory or
    possibly leave CPU devices attached to their PM domains. These are quite
    harmless issues, but still deserve to be taken care of.

    However, rather than fixing them by keeping track of allocations that
    need to be freed, which tends to become a bit messy, let's convert into a
    platform driver. In this way, it gets easier to fix the memory leaks as we
    can rely on the devm_* functions.

    Moreover, converting to a platform driver also enables support for deferred
    probe, which subsequent changes benefit from.

    Signed-off-by: Ulf Hansson
    Reviewed-by: Lukasz Luba
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     
  • Currently we allow the cpuidle driver registration to succeed, even if we
    failed to enable the OSI mode when the hierarchical DT layout is used. This
    means running in a degraded mode, using the available idle states per
    CPU while preventing use of the domain idle states.

    Moving forward, this behaviour looks quite questionable to maintain, as
    complexity seems to grow around it, especially when trying to add support
    for deferred probe, for example.

    Therefore, let's make the cpuidle driver registration fail in this
    situation, thus relying on the default architectural cpuidle backend for
    WFI to be used.

    Reviewed-by: Lina Iyer
    Signed-off-by: Ulf Hansson
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     
  • The combined build object for the PSCI cpuidle driver and the PSCI PM
    domain is a bit messy. Therefore, let's split it up by adding a new Kconfig
    ARM_PSCI_CPUIDLE_DOMAIN and convert into two separate objects.

    Reviewed-by: Lina Iyer
    Reviewed-by: Sudeep Holla
    Signed-off-by: Ulf Hansson
    Signed-off-by: Rafael J. Wysocki

    Ulf Hansson
     

16 Jul, 2020

1 commit

  • The sparse tool complains as follows:

    drivers/cpuidle/cpuidle-pseries.c:25:23: warning:
    symbol 'pseries_idle_driver' was not declared. Should it be static?

    'pseries_idle_driver' is not used outside of this file, so mark
    it static.

    Reported-by: Hulk Robot
    Signed-off-by: Wei Yongjun
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200714142424.66648-1-weiyongjun1@huawei.com

    Wei Yongjun
     

15 Jul, 2020

1 commit

  • Commit 1961acad2f88559c2cdd2ef67c58c3627f1f6e54 removes usage of
    function "validate_dt_prop_sizes". This patch removes this unused
    function.

    Signed-off-by: Abhishek Goel
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200706053258.121475-1-huntbag@linux.vnet.ibm.com

    Abhishek Goel
     

25 Jun, 2020

1 commit

  • Implement call_cpuidle_s2idle() in analogy with call_cpuidle()
    for the s2idle-specific idle state entry and invoke it from
    cpuidle_idle_call() to make the s2idle-specific idle entry code
    path look more similar to the "regular" idle entry one.

    No intentional functional impact.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Chen Yu

    Rafael J. Wysocki
     

23 Jun, 2020

1 commit

  • Suspend to idle was found to not work on Goldmont CPU recently.

    The issue happens due to:

    1. On Goldmont the CPU in idle can only be woken up via IPIs,
    not POLLING mode, due to commit 08e237fa56a1 ("x86/cpu: Add
    workaround for MONITOR instruction erratum on Goldmont based
    CPUs")

    2. When the CPU is entering suspend to idle process, the
    _TIF_POLLING_NRFLAG remains on, because cpuidle_enter_s2idle()
    doesn't match call_cpuidle() exactly.

    3. Commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
    makes use of _TIF_POLLING_NRFLAG to avoid sending IPIs to idle
    CPUs.

    4. As a result, some IPI-related functions might not work
    well during suspend to idle on Goldmont. For example, one
    suspected victim:

    tick_unfreeze() -> timekeeping_resume() -> hrtimers_resume()
    -> clock_was_set() -> on_each_cpu() might wait forever,
    because the IPIs will not be sent to the CPUs which are
    sleeping with _TIF_POLLING_NRFLAG set, and Goldmont CPU
    could not be woken up by only setting _TIF_NEED_RESCHED
    on the monitor address.

    To avoid that, clear the _TIF_POLLING_NRFLAG flag before invoking
    enter_s2idle_proper() in cpuidle_enter_s2idle() in analogy with the
    call_cpuidle() code flow.

    Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
    Suggested-by: Peter Zijlstra (Intel)
    Suggested-by: Rafael J. Wysocki
    Reported-by: kbuild test robot
    Signed-off-by: Chen Yu
    [ rjw: Subject / changelog ]
    Signed-off-by: Rafael J. Wysocki

    Chen Yu
     

13 Jun, 2020

1 commit

  • Pull thermal updates from Daniel Lezcano:

    - Add the hwmon support on the i.MX SC (Anson Huang)

    - Thermal framework cleanups (self-encapsulation, pointless stubs,
    private structures) (Daniel Lezcano)

    - Use the PM QoS frequency changes for the devfreq cooling device
    (Matthias Kaehlcke)

    - Remove duplicate error messages from platform_get_irq() error
    handling (Markus Elfring)

    - Add support for the bandgap sensors (Keerthy)

    - Statically initialize .get_mode/.set_mode ops (Andrzej Pietrasiewicz)

    - Add Renesas R-Car maintainer entry (Niklas Söderlund)

    - Fix error checking after calling ti_bandgap_get_sensor_data() for the
    TI SoC thermal (Sudip Mukherjee)

    - Add a latency constraint for idle injection, add the DT binding and
    change the registering function (Daniel Lezcano)

    - Convert the thermal framework binding to the Yaml schema (Amit
    Kucheria)

    - Replace zero-length array with flexible-array on i.MX 8MM (Gustavo A.
    R. Silva)

    - Thermal framework cleanups (alphabetic order for headers, replace
    module.h by export.h, make file naming consistent) (Amit Kucheria)

    - Merge tsens-common into the tsens driver (Amit Kucheria)

    - Fix platform dependency for the Qoriq driver (Geert Uytterhoeven)

    - Clean up the rcar_thermal_update_temp() function in the rcar thermal
    driver (Niklas Söderlund)

    - Fix the TMSAR register for the TMUv2 on the Qoriq platform (Yuantian
    Tang)

    - Export GDDV, OEM vendor variables, and don't require IDSP for the
    int340x thermal driver - trivial conflicts fixed (Matthew Garrett)

    * tag 'thermal-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (48 commits)
    thermal/int340x_thermal: Don't require IDSP to exist
    thermal/int340x_thermal: Export OEM vendor variables
    thermal/int340x_thermal: Export GDDV
    thermal: qoriq: Update the settings for TMUv2
    thermal: rcar_thermal: Clean up rcar_thermal_update_temp()
    thermal: qoriq: Add platform dependencies
    drivers: thermal: tsens: Merge tsens-common.c into tsens.c
    thermal/of: Rename of-thermal.c
    thermal/governors: Prefix all source files with gov_
    thermal/drivers/user_space: Sort headers alphabetically
    thermal/drivers/of-thermal: Sort headers alphabetically
    thermal/drivers/cpufreq_cooling: Replace module.h with export.h
    thermal/drivers/cpufreq_cooling: Sort headers alphabetically
    thermal/drivers/clock_cooling: Include export.h
    thermal/drivers/clock_cooling: Sort headers alphabetically
    thermal/drivers/thermal_hwmon: Include export.h
    thermal/drivers/thermal_hwmon: Sort headers alphabetically
    thermal/drivers/thermal_helpers: Include export.h
    thermal/drivers/thermal_helpers: Sort headers alphabetically
    thermal/core: Replace module.h with export.h
    ...

    Linus Torvalds
     

06 Jun, 2020

1 commit

  • Pull powerpc updates from Michael Ellerman:

    - Support for userspace to send requests directly to the on-chip GZIP
    accelerator on Power9.

    - Rework of our lockless page table walking (__find_linux_pte()) to
    make it safe against parallel page table manipulations without
    relying on an IPI for serialisation.

    - A series of fixes & enhancements to make our machine check handling
    more robust.

    - Lots of plumbing to add support for "prefixed" (64-bit) instructions
    on Power10.

    - Support for using huge pages for the linear mapping on 8xx (32-bit).

    - Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound
    driver.

    - Removal of some obsolete 40x platforms and associated cruft.

    - Initial support for booting on Power10.

    - Lots of other small features, cleanups & fixes.

    Thanks to: Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan,
    Andrey Abramov, Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent
    Abali, Cédric Le Goater, Chen Zhou, Christian Zigotzky, Christophe
    JAILLET, Christophe Leroy, Dmitry Torokhov, Emmanuel Nicolet, Erhard F.,
    Gautham R. Shenoy, Geoff Levand, George Spelvin, Greg Kurz, Gustavo A.
    R. Silva, Gustavo Walbon, Haren Myneni, Hari Bathini, Joel Stanley,
    Jordan Niethe, Kajol Jain, Kees Cook, Leonardo Bras, Madhavan
    Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Michal
    Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin,
    Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram Pai,
    Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
    Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler,
    Wolfram Sang, Xiongfeng Wang.

    * tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (299 commits)
    powerpc/pseries: Make vio and ibmebus initcalls pseries specific
    cxl: Remove dead Kconfig options
    powerpc: Add POWER10 architected mode
    powerpc/dt_cpu_ftrs: Add MMA feature
    powerpc/dt_cpu_ftrs: Enable Prefixed Instructions
    powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected
    powerpc: Add support for ISA v3.1
    powerpc: Add new HWCAP bits
    powerpc/64s: Don't set FSCR bits in INIT_THREAD
    powerpc/64s: Save FSCR to init_task.thread.fscr after feature init
    powerpc/64s: Don't let DT CPU features set FSCR_DSCR
    powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()
    powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG
    powerpc/module_64: Use special stub for _mcount() with -mprofile-kernel
    powerpc/module_64: Simplify check for -mprofile-kernel ftrace relocations
    powerpc/module_64: Consolidate ftrace code
    powerpc/32: Disable KASAN with pages bigger than 16k
    powerpc/uaccess: Don't set KUEP by default on book3s/32
    powerpc/uaccess: Don't set KUAP by default on book3s/32
    powerpc/8xx: Reduce time spent in allow_user_access() and friends
    ...

    Linus Torvalds
     

05 Jun, 2020

1 commit

  • Pull ARM/SoC driver updates from Arnd Bergmann:
    "These are updates to SoC specific drivers that did not have another
    subsystem maintainer tree to go through for some reason:

    - Some bus and memory drivers for the MIPS P5600 based Baikal-T1 SoC
    that is getting added through the MIPS tree.

    - There are new soc_device identification drivers for TI K3 and
    Qualcomm MSM8939

    - New reset controller drivers for NXP i.MX8MP, Renesas RZ/G1H, and
    Hisilicon hi6220

    - The SCMI firmware interface can now work across ARM SMC/HVC as a
    transport.

    - Mediatek platforms now use a new driver for their "MMSYS" hardware
    block that controls clocks and some other aspects on behalf of the
    media and gpu drivers.

    - Some Tegra processors have improved power management support,
    including getting woken up by the PMIC and cluster power down
    during idle.

    - A new v4l staging driver for Tegra is added.

    - Cleanups and minor bugfixes for TI, NXP, Hisilicon, Mediatek, and
    Tegra"

    * tag 'arm-drivers-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (155 commits)
    clk: sprd: fix compile-testing
    bus: bt1-axi: Build the driver into the kernel
    bus: bt1-apb: Build the driver into the kernel
    bus: bt1-axi: Use sysfs_streq instead of strncmp
    bus: bt1-axi: Optimize the return points in the driver
    bus: bt1-apb: Use sysfs_streq instead of strncmp
    bus: bt1-apb: Use PTR_ERR_OR_ZERO to return from request-regs method
    bus: bt1-apb: Fix show/store callback identations
    bus: bt1-apb: Include linux/io.h
    dt-bindings: memory: Add Baikal-T1 L2-cache Control Block binding
    memory: Add Baikal-T1 L2-cache Control Block driver
    bus: Add Baikal-T1 APB-bus driver
    bus: Add Baikal-T1 AXI-bus driver
    dt-bindings: bus: Add Baikal-T1 APB-bus binding
    dt-bindings: bus: Add Baikal-T1 AXI-bus binding
    staging: tegra-video: fix V4L2 dependency
    tee: fix crypto select
    drivers: soc: ti: knav_qmss_queue: Make knav_gp_range_ops static
    soc: ti: add k3 platforms chipid module driver
    dt-bindings: soc: ti: add binding for k3 platforms chipid module
    ...

    Linus Torvalds
     

30 May, 2020

1 commit

  • kobject_init_and_add() takes a reference even when it fails.
    If this function returns an error, kobject_put() must be called to
    properly clean up the memory associated with the object.

    Previous commit "b8eb718348b8" fixed a similar problem.
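
    The cleanup rule can be sketched with a user-space model of a
    reference-counted object. Everything here (struct model_obj,
    model_init_and_add(), model_put(), model_register()) is a
    hypothetical stand-in written for illustration, not the kernel's
    kobject API. The point it shows is that the error path drops the
    reference, so the release callback frees the memory, instead of the
    caller freeing the object directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object; not the kernel's struct kobject. */
struct model_obj {
    int refcount;
    void (*release)(struct model_obj *obj);
};

static int model_released; /* counts release callbacks, for demonstration */

static void model_release(struct model_obj *obj)
{
    model_released++;
    free(obj);
}

/* Drop one reference; the release callback runs when the count hits zero. */
static void model_put(struct model_obj *obj)
{
    assert(obj != NULL && obj->refcount > 0);
    if (--obj->refcount == 0)
        obj->release(obj);
}

/*
 * Initialize and "add" the object. As with the function described above,
 * the initial reference is taken even when the add step fails.
 */
static int model_init_and_add(struct model_obj *obj, int add_should_fail)
{
    obj->refcount = 1;
    obj->release = model_release;
    return add_should_fail ? -1 : 0;
}

/* Caller pattern: on error, drop the reference rather than leaking it. */
static struct model_obj *model_register(int add_should_fail)
{
    struct model_obj *obj = malloc(sizeof(*obj));

    if (!obj)
        return NULL;
    if (model_init_and_add(obj, add_should_fail)) {
        model_put(obj);   /* releases the object via its callback */
        return NULL;
    }
    return obj;           /* caller now owns one reference */
}
```

    Without the model_put() call on the error path, the object allocated
    in model_register() would never be released.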

    Signed-off-by: Qiushi Wu
    [ rjw: Subject ]
    Signed-off-by: Rafael J. Wysocki

    Qiushi Wu
     

26 May, 2020

1 commit

  • The Qualcomm SPM cpuidle driver seems to be the last driver still
    using the generic ARM CPUidle infrastructure.

    Converting it actually allows us to simplify the driver,
    and we end up removing more lines than we add:

    - We can parse the CPUidle states in the device tree directly
    with dt_idle_states (and don't need to duplicate that
    functionality into the spm driver).

    - Each "saw" device managed by the SPM driver now directly
    registers its own cpuidle driver, removing the need for
    any global (per cpu) state.

    The device tree binding is the same, so the driver stays
    compatible with all old device trees.

    Signed-off-by: Stephan Gerhold
    Reviewed-by: Lina Iyer
    Reviewed-by: Ulf Hansson
    Acked-by: Bjorn Andersson
    Signed-off-by: Rafael J. Wysocki

    Stephan Gerhold
     

19 May, 2020

5 commits

  • Since the cpuidle governor can be switched via sysfs by default,
    remove sysfs_switch and cpuidle_switch_attrs.

    Signed-off-by: Hanjun Guo
    Reviewed-by: Doug Smythies
    Tested-by: Doug Smythies
    Acked-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Hanjun Guo
     
  • Currently, the cpuidle governor can be switched via sysfs only when
    the boot option "cpuidle_sysfs_switch" is passed, but it's important
    to be able to switch the governor to adapt to different workloads,
    especially after the TEO and haltpoll governors were introduced.

    Add available_governors and current_governor to the default
    attributes, but keep current_governor_ro for compatibility.
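
    A minimal user-space sketch of reading one of these attributes
    follows. The helper name read_first_line() is hypothetical, and the
    sysfs directory mentioned in the note after the code is an assumption
    taken from the kernel's cpuidle documentation rather than from this
    commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a text file into out (at most len - 1 bytes)
 * and strip the trailing newline. Returns 0 on success, -1 on error.
 */
static int read_first_line(const char *path, char *out, size_t len)
{
    FILE *f;

    assert(path != NULL && out != NULL && len > 0);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(out, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';
    return 0;
}
```

    On a system with cpuidle enabled, passing
    "/sys/devices/system/cpu/cpuidle/current_governor" (assumed location)
    should yield the active governor name, and writing a name listed in
    available_governors to the same file switches the governor.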

    Signed-off-by: Hanjun Guo
    Reviewed-by: Doug Smythies
    Tested-by: Doug Smythies
    Acked-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Hanjun Guo
     
  • CPUIDLE_NAME_LEN is 16, so it should be possible to accept a governor
    name with 15 characters, but store_current_governor() currently
    rejects such a name because it returns -EINVAL if count
    equals CPUIDLE_NAME_LEN.

    Refactor the code to accept such a case and simplify the code.
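
    The off-by-one can be reproduced in user space. The sketch below is
    not the kernel's store_current_governor(); parse_name_strict() and
    parse_name_relaxed() are hypothetical helpers, -1 stands in for
    -EINVAL, and it assumes that count reaches CPUIDLE_NAME_LEN because
    the written string carries a trailing newline (as "echo" would add).
    Stripping the newline before the length check is one possible way to
    accept the 15-character name, not necessarily the approach the
    commit takes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CPUIDLE_NAME_LEN 16  /* value stated in the commit message */

/* Strict check as described: a write of CPUIDLE_NAME_LEN bytes is rejected. */
static int parse_name_strict(const char *buf, size_t count,
                             char out[CPUIDLE_NAME_LEN])
{
    assert(buf != NULL && out != NULL);
    if (count == 0 || count >= CPUIDLE_NAME_LEN)
        return -1;
    memcpy(out, buf, count);
    out[count] = '\0';
    if (out[count - 1] == '\n')
        out[count - 1] = '\0';
    return 0;
}

/* Relaxed variant: drop the trailing newline first, then check the length. */
static int parse_name_relaxed(const char *buf, size_t count,
                              char out[CPUIDLE_NAME_LEN])
{
    assert(buf != NULL && out != NULL);
    if (count > 0 && buf[count - 1] == '\n')
        count--;
    if (count == 0 || count >= CPUIDLE_NAME_LEN)
        return -1;
    memcpy(out, buf, count);
    out[count] = '\0';
    return 0;
}
```

    A 15-character name followed by a newline (count of 16) fails the
    strict check but fits the 16-byte buffer once the newline is removed.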

    Signed-off-by: Hanjun Guo
    Reviewed-by: Doug Smythies
    Tested-by: Doug Smythies
    Signed-off-by: Rafael J. Wysocki

    Hanjun Guo
     
  • When showing the available governors, the format is "%s " in
    scnprintf(), not "%s", so if a governor name has 15 characters, the
    separating space is truncated and the name runs into the next one.
    Fix it by adding one more to the size.

    While we are at it, fix the minor coding style issue and remove
    the "/sizeof(char)" since sizeof(char) always equals 1.
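
    The truncation can be demonstrated in user space. In the sketch
    below, model_scnprintf() is a hypothetical stand-in that mimics the
    scnprintf() convention of returning the number of characters actually
    stored, and list_names() is an illustrative loop, not the kernel's
    show function. The per_entry parameter plays the role of the size
    argument discussed above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * User-space stand-in for scnprintf(): returns the number of characters
 * actually stored in buf, excluding the terminating NUL.
 */
static size_t model_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (size == 0)
        return 0;
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    if (n < 0)
        return 0;
    return (size_t)n >= size ? size - 1 : (size_t)n;
}

/* Append each name followed by a space, limiting each entry to per_entry bytes. */
static size_t list_names(char *buf, size_t bufsz, const char *const *names,
                         size_t count, size_t per_entry)
{
    size_t i = 0, k;

    assert(buf != NULL && bufsz > 0);
    buf[0] = '\0';
    for (k = 0; k < count; k++) {
        if (i + per_entry > bufsz)
            break;  /* not enough room left for another entry */
        i += model_scnprintf(&buf[i], per_entry, "%s ", names[k]);
    }
    return i;
}
```

    With a 15-character name and per_entry of 16, the space is cut off
    and the next name follows immediately; with per_entry of 17 the
    separator survives.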

    Signed-off-by: Hanjun Guo
    Reviewed-by: Doug Smythies
    Tested-by: Doug Smythies
    Signed-off-by: Rafael J. Wysocki

    Hanjun Guo
     
  • The cpuidle driver can be used as a cooling device by injecting idle
    cycles.

    When the property is set, register the cpuidle driver with the idle
    state node pointer as a cooling device. The thermal framework will do
    the association automatically with the thermal zone via the
    cooling-device defined in the device tree cooling-maps section.

    Signed-off-by: Daniel Lezcano
    Reviewed-by: Lukasz Luba
    Reviewed-by: Amit Kucheria
    Acked-by: Sudeep Holla
    Link: https://lore.kernel.org/r/20200429103644.5492-4-daniel.lezcano@linaro.org

    Daniel Lezcano