28 Sep, 2020

1 commit


19 Sep, 2020

1 commit


17 Sep, 2020

2 commits


14 Sep, 2020

1 commit

  • In order to deal with IPIs as normal interrupts, let's add
    a new way to register them with the architecture code.

    set_smp_ipi_range() takes a range of interrupts, and allows
    the arch code to request them as if they were normal interrupts.
    A standard handler is then called by the core IRQ code to deal
    with the IPI.

    This means that we don't need to call irq_enter/irq_exit, and
    that we don't need to deal with set_irq_regs either. So let's
    move the dispatcher into its own function, and leave handle_IPI()
    as a compatibility function.

    On the sending side, let's make use of ipi_send_mask, which
    already exists for this purpose.

    One of the major differences is that we can end up, in some
    cases (such as when performing IRQ time accounting on the
    scheduler IPI), with nested irq_enter()/irq_exit() pairs.
    Other than the (relatively small) overhead, there should be
    no consequences to it (these pairs are designed to nest
    correctly, and the accounting shouldn't be off).

    Reviewed-by: Valentin Schneider
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     

08 Aug, 2020

1 commit

  • Patch series "mm: cleanup usage of <asm/pgalloc.h>"

    Most architectures have very similar versions of pXd_alloc_one() and
    pXd_free_one() for intermediate levels of page table. These patches add
    generic versions of these functions in <asm-generic/pgalloc.h> and enable
    use of the generic functions where appropriate.

    In addition, functions declared and defined in <asm/pgalloc.h> headers
    are used mostly by core mm and early mm initialization in arch and
    there is no actual reason to have <asm/pgalloc.h> included all over the
    place. The first patch in this series removes unneeded includes of
    <asm/pgalloc.h>.

    In the end it didn't work out as neatly as I hoped and moving
    pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
    unnecessary changes to arches that have custom page table allocations, so
    I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
    to mm/.

    This patch (of 8):

    In most cases the <asm/pgalloc.h> header is required only for
    allocations of page table memory. Most of the .c files that include
    that header do not use symbols declared in <asm/pgalloc.h> and do not
    require that header.

    As for the other header files that used to include <asm/pgalloc.h>, it
    is possible to move that include into the .c file that actually uses
    symbols from <asm/pgalloc.h> and drop the include from the header file.

    The process was somewhat automated using

    sed -i -E '/[<"]asm\/pgalloc\.h/d' \
            $(grep -L -w -f /tmp/xx \
                    $(git grep -E -l '[<"]asm/pgalloc\.h'))

    where /tmp/xx contains all of the symbols defined in
    arch/*/include/asm/pgalloc.h.
    Signed-off-by: Andrew Morton
    Reviewed-by: Pekka Enberg
    Acked-by: Geert Uytterhoeven [m68k]
    Cc: Abdul Haleem
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Christophe Leroy
    Cc: Joerg Roedel
    Cc: Max Filippov
    Cc: Peter Zijlstra
    Cc: Satheesh Rajendran
    Cc: Stafford Horne
    Cc: Stephen Rothwell
    Cc: Steven Rostedt
    Cc: Joerg Roedel
    Cc: Matthew Wilcox
    Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

10 Jun, 2020

1 commit

  • Patch series "mm: consolidate definitions of page table accessors", v2.

    The low level page table accessors (pXY_index(), pXY_offset()) are
    duplicated across all architectures and sometimes more than once. For
    instance, we have 31 definitions of pgd_offset() for 25 supported
    architectures.

    Most of these definitions are actually identical and typically it boils
    down to, e.g.

    static inline unsigned long pmd_index(unsigned long address)
    {
            return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }

    static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
    {
            return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
    }

    These definitions can be shared among 90% of the arches provided
    XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

    For architectures that really need a custom version there is always
    possibility to override the generic version with the usual ifdefs magic.

    These patches introduce include/linux/pgtable.h that replaces
    include/asm-generic/pgtable.h and add the definitions of the page table
    accessors to the new header.

    This patch (of 12):

    The <linux/mm.h> header includes <asm/pgtable.h> to allow inlining of
    the functions involving page table manipulations, e.g. pte_alloc() and
    pmd_alloc(). So, there is no point to explicitly include
    <asm/pgtable.h> in the files that include <linux/mm.h>.

    The include statements in such cases are removed with a simple loop:

    for f in $(git grep -l "include <linux/mm.h>") ; do
            sed -i -e '/include <asm\/pgtable.h>/ d' $f
    done

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mike Rapoport
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

06 Dec, 2019

1 commit

  • Commit ca74b316df96 ("arm: Use common cpu_topology structure and
    functions.") changed cpu_coregroup_mask() from the ARM32 specific
    implementation in arch/arm/include/asm/topology.h to the one shared
    with ARM64 and RISCV in drivers/base/arch_topology.c.

    Currently on ARM32 (TC2 w/ CONFIG_SCHED_MC) the task scheduler setup
    code (w/ CONFIG_SCHED_DEBUG) shows this during CPU hotplug:

    ERROR: groups don't span domain->span

    It happens to CPUs of the cluster of the CPU which gets hot-plugged
    out on scheduler domain MC.

    Turns out that the shared cpu_coregroup_mask() requires that the
    hot-plugged CPU is removed from the core_sibling mask via
    remove_cpu_topology(). Otherwise the 'is core_sibling a subset of
    cpumask_of_node()' check doesn't work. In this case the task scheduler has to
    deal with cpumask_of_node instead of core_sibling which is wrong on
    scheduler domain MC.

    e.g. CPU3 hot-plugged out on TC2 [cluster0: 0,3-4 cluster1: 1-2]:

    cpu_coregroup_mask(): CPU3 cpumask_of_node=0-2,4 core_sibling=0,3-4
                                                                    ^
    should be:

    cpu_coregroup_mask(): CPU3 cpumask_of_node=0-2,4 core_sibling=0,4

    Add remove_cpu_topology() to __cpu_disable() to remove the CPU from the
    topology masks in case of a CPU hotplug out operation.

    At the same time tweak store_cpu_topology() slightly so it will call
    update_siblings_masks() in case of CPU hotplug in operation via
    secondary_start_kernel()->smp_store_cpu_info().

    This aligns the ARM32 implementation with the ARM64 one.

    Guarding remove_cpu_topology() with CONFIG_GENERIC_ARCH_TOPOLOGY is
    necessary since some Arm32 defconfigs (aspeed_g5_defconfig,
    milbeaut_m10v_defconfig, spear13xx_defconfig) specify an explicit

    # CONFIG_ARM_CPU_TOPOLOGY is not set

    w/ ./arch/arm/Kconfig: select GENERIC_ARCH_TOPOLOGY if ARM_CPU_TOPOLOGY

    Fixes: ca74b316df96 ("arm: Use common cpu_topology structure and functions")
    Reviewed-by: Sudeep Holla
    Reviewed-by: Lukasz Luba
    Tested-by: Lukasz Luba
    Tested-by: Ondrej Jirman
    Signed-off-by: Dietmar Eggemann
    Signed-off-by: Russell King

    Dietmar Eggemann
     

13 Aug, 2019

1 commit

  • This commit replaces the open-coded CPU-offline notification with new
    common code. In particular, this change avoids calling scheduler code
    using RCU from an offline CPU that RCU is ignoring. This is a minimal
    change. A more intrusive change might invoke the cpu_check_up_prepare()
    and cpu_set_state_online() functions at CPU-online time, which would
    allow onlining to throw an error if the CPU did not go offline properly.

    Signed-off-by: Paul E. McKenney
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Russell King
    Cc: Mark Rutland
    Cc: Dietmar Eggemann

    Paul E. McKenney
     

09 Jul, 2019

1 commit

  • Pull ARM updates from Russell King:

    - Add a "cut here" to make it clearer where oops dumps should be cut
    from - we already have a marker for the end of the dumps.

    - Add logging severity to show_pte()

    - Drop unnecessary common-page-size linker flag

    - Errata workarounds for Cortex A12 857271, Cortex A17 857272 and
    Cortex A7 814220.

    - Remove some unused variables that had started to provoke a compiler
    warning.

    * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
    ARM: 8863/1: stm32: select ARM errata 814220
    ARM: 8862/1: errata: 814220-B-Cache maintenance by set/way operations can execute out of order
    ARM: 8865/1: mm: remove unused variables
    ARM: 8864/1: Add workaround for I-Cache line size mismatch between CPU cores
    ARM: 8861/1: errata: Workaround errata A12 857271 / A17 857272
    ARM: 8860/1: VDSO: Drop implicit common-page-size linker flag
    ARM: arrange show_pte() to issue severity-based messages
    ARM: add "8<--- cut here ---" to kernel dumps

    Linus Torvalds
     

21 Jun, 2019

1 commit

  • Some big.LITTLE systems have I-Cache line size mismatch between
    LITTLE and big cores. This patch adds a workaround for proper I-Cache
    support on such systems. Without it, some class of the userspace code
    (typically self-modifying) might suffer from random SIGILL failures.

    A similar workaround already exists for the ARM64 architecture. It has
    been added by commit 116c81f427ff ("arm64: Work around systems with
    mismatched cache line sizes").

    Signed-off-by: Marek Szyprowski
    Signed-off-by: Russell King

    Marek Szyprowski
     

19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation #

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 4122 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

2 commits

  • Pull more power management updates from Rafael Wysocki:
    "These fix a recent regression causing kernels built with CONFIG_PM
    unset to crash on systems that support the Performance and Energy Bias
    Hint (EPB), clean up the cpufreq core and some users of transition
    notifiers and introduce a new power domain flag into the generic power
    domains framework (genpd).

    Specifics:

    - Fix recent regression causing kernels built with CONFIG_PM unset to
    crash on systems that support the Performance and Energy Bias Hint
    (EPB) by avoiding to compile the EPB-related code depending on
    CONFIG_PM when it is unset (Rafael Wysocki).

    - Clean up the transition notifier invocation code in the cpufreq
    core and change some users of cpufreq transition notifiers
    accordingly (Viresh Kumar).

    - Change MAINTAINERS to cover the schedutil governor as part of
    cpufreq (Viresh Kumar).

    - Simplify cpufreq_init_policy() to avoid redundant computations (Yue
    Hu).

    - Add explanatory comment to the cpufreq core (Rafael Wysocki).

    - Introduce a new flag, GENPD_FLAG_RPM_ALWAYS_ON, to the generic
    power domains (genpd) framework along with the first user of it
    (Leonard Crestez)"

    * tag 'pm-5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    soc: imx: gpc: Use GENPD_FLAG_RPM_ALWAYS_ON for ERR009619
    PM / Domains: Add GENPD_FLAG_RPM_ALWAYS_ON flag
    cpufreq: Update MAINTAINERS to include schedutil governor
    cpufreq: Don't find governor for setpolicy drivers in cpufreq_init_policy()
    cpufreq: Explain the kobject_put() in cpufreq_policy_alloc()
    cpufreq: Call transition notifier only once for each policy
    x86: intel_epb: Take CONFIG_PM into account

    Linus Torvalds
     
  • Patch series "compiler: allow all arches to enable
    CONFIG_OPTIMIZE_INLINING", v3.

    This patch (of 11):

    When function tracing for IPIs is enabled, we get a warning for an
    overflow of the ipi_types array with the IPI_CPU_BACKTRACE type as
    triggered by raise_nmi():

    arch/arm/kernel/smp.c: In function 'raise_nmi':
    arch/arm/kernel/smp.c:489:2: error: array subscript is above array bounds [-Werror=array-bounds]
    trace_ipi_raise(target, ipi_types[ipinr]);

    This is a correct warning as we actually overflow the array here.

    This patch changes raise_nmi() to call __smp_cross_call() instead of
    smp_cross_call(), to avoid calling into ftrace. For clarification, I'm
    also adding two new code comments describing how this one is special.

    The warning appears to have shown up after commit e7273ff49acf ("ARM:
    8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI"), which changed the
    number assignment from '15' to '8', but as far as I can tell has existed
    since the IPI tracepoints were first introduced. If we decide to
    backport this patch to stable kernels, we probably need to backport
    e7273ff49acf as well.

    [yamada.masahiro@socionext.com: rebase on v5.1-rc1]
    Link: http://lkml.kernel.org/r/20190423034959.13525-2-yamada.masahiro@socionext.com
    Fixes: e7273ff49acf ("ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI")
    Fixes: 365ec7b17327 ("ARM: add IPI tracepoints") # v3.17
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Masahiro Yamada
    Cc: Heiko Carstens
    Cc: Arnd Bergmann
    Cc: Ingo Molnar
    Cc: Christophe Leroy
    Cc: Mathieu Malaterre
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Stefan Agner
    Cc: Boris Brezillon
    Cc: Miquel Raynal
    Cc: Richard Weinberger
    Cc: David Woodhouse
    Cc: Brian Norris
    Cc: Marek Vasut
    Cc: Russell King
    Cc: Borislav Petkov
    Cc: Mark Rutland
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

10 May, 2019

1 commit

  • Currently, the notifiers are called once for each CPU of the policy->cpus
    cpumask. It would be more optimal if the notifier can be called only
    once and all the relevant information be provided to it. Out of the 23
    drivers that register for the transition notifiers today, only 4 of them
    do per-cpu updates and the callback for the rest can be called only once
    for the policy without any impact.

    This would also avoid multiple function calls to the notifier callbacks
    and reduce multiple iterations of notifier core's code (which does
    locking as well).

    This patch adds pointer to the cpufreq policy to the struct
    cpufreq_freqs, so the notifier callback has all the information
    available to it with a single call. The four drivers which perform
    per-cpu updates are updated to use the cpufreq policy. The freqs->cpu
    field is redundant now and is removed.

    Acked-by: David S. Miller (sparc)
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

15 Mar, 2019

1 commit


02 Feb, 2019

3 commits

  • machine_crash_nonpanic_core() does this:

    while (1)
            cpu_relax();

    because the kernel has crashed, and we have no known safe way to deal
    with the CPU. So, we place the CPU into an infinite loop which we
    expect it to never exit - at least not until the system as a whole is
    reset by some method.

    In the absence of erratum 754327, this code assembles to:

    b .

    In other words, an infinite loop. When erratum 754327 is enabled,
    this becomes:

    1: dmb
    b 1b

    It has been observed on some systems (eg, OMAP4) that, if a
    crash is triggered, the system tries to kexec into the panic kernel,
    but fails after taking the secondary CPU down - placing it into one
    of these loops. This causes the system to livelock, and the most
    noticeable effect is that the system stops after issuing:

    Loading crashdump kernel...

    to the system console.

    The tested-as-working solution I came up with was to add wfe() to
    these infinite loops, thusly:

    while (1) {
            cpu_relax();
            wfe();
    }

    which, without 754327, builds to:

    1: wfe
    b 1b

    or, with 754327 enabled:

    1: dmb
    wfe
    b 1b

    Adding "wfe" does two things depending on the environment we're running
    under:
    - where we're running on bare metal, and the processor implements
    "wfe", it stops us spinning endlessly in a loop where we're never
    going to do any useful work.
    - if we're running in a VM, it allows the CPU to be given back to the
    hypervisor and rescheduled for other purposes (maybe a different VM)
    rather than wasting CPU cycles inside a crashed VM.

    However, in light of erratum 794072, Will Deacon wanted to see 10 nops
    as well - which is reasonable to cover the case where we have erratum
    754327 enabled _and_ we have a processor that doesn't implement the
    wfe hint.

    So, we now end up with:

    1: wfe
    b 1b

    when erratum 754327 is disabled, or:

    1: dmb
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    wfe
    b 1b

    when erratum 754327 is enabled. We also get the dmb + 10 nop
    sequence elsewhere in the kernel, in terminating loops.

    This is reasonable - it means we get the workaround for erratum
    794072 when erratum 754327 is enabled, but still relinquish the dead
    processor - either by placing it in a lower power mode when wfe is
    implemented as such or by returning it to the hypervisor, or in the
    case where wfe is a no-op, we use the workaround specified in erratum
    794072 to avoid the problem.

    These are two entirely orthogonal problems - the 10 nops address
    erratum 794072, and the wfe is an optimisation that makes the system
    more efficient when crashed either in terms of power consumption or
    by allowing the host/other VMs to make use of the CPU.

    I don't see any reason not to use kexec() inside a VM - it has the
    potential to provide automated recovery from a failure of the VM's
    kernel with the opportunity for saving a crashdump of the failure.
    A panic() with a reboot timeout won't do that, and reading the
    libvirt documentation, setting on_reboot to "preserve" won't either
    (the documentation states "The preserve action for an on_reboot event
    is treated as a destroy".) Surely it has to be a good thing to
    avoid having CPUs spinning inside a VM that is doing no useful
    work.

    Acked-by: Will Deacon
    Signed-off-by: Russell King

    Russell King
     
  • Consolidating the "pen_release" stuff amongst the various SoC
    implementations gives credence to having a CPU holding pen for
    secondary CPUs. However, this is far from the truth.

    Many SoC implementations cargo-cult copied various bits of the pen
    release implementation from the initial Realview/Versatile Express
    implementation without understanding what it was or why it existed.
    The reason it existed is because these are _development_ platforms,
    and some board firmware is unable to individually control the
    startup of secondary CPUs. Moreover, they do not have a way to
    power down or reset secondary CPUs for hot-unplug. Hence, the
    pen_release implementation was designed for ARM Ltd's development
    platforms to provide a working implementation, even though it is
    very far from what is required.

    It was decided a while back to reduce the duplication by consolidating
    the "pen_release" variable, but this only made the situation worse -
    we have ended up with several implementations that read this variable
    but do not write it - again, showing the cargo-cult mentality at work,
    lack of proper review of new code, and in some cases a lack of testing.

    While it would be preferable to remove pen_release entirely from the
    kernel, this is not possible without help from the SoC maintainers,
    which seems to be lacking. However, I want to remove pen_release from
    arch code to remove the credence that having it gives.

    This patch removes pen_release from the arch code entirely, adding
    private per-SoC definitions for it instead, and explicitly stating
    that write_pen_release() is cargo-cult copied and should not be
    copied any further. Rename write_pen_release() in a similar fashion
    as well.

    Signed-off-by: Russell King

    Russell King
     
  • Arm TC2 fails cpu hotplug stress test.

    This issue was tracked down to a missing copy of the new affinity
    cpumask for the vexpress-spc interrupt into struct
    irq_common_data.affinity when the interrupt is migrated in
    migrate_one_irq().

    Fix it by replacing the arm specific hotplug cpu migration with the
    generic irq code.

    This is the counterpart implementation to commit 217d453d473c ("arm64:
    fix a migrating irq bug when hotplug cpu").

    Tested with cpu hotplug stress test on Arm TC2 (multi_v7_defconfig plus
    CONFIG_ARM_BIG_LITTLE_CPUFREQ=y and CONFIG_ARM_VEXPRESS_SPC_CPUFREQ=y).
    The vexpress-spc interrupt (irq=22) on this board is affine to CPU0.
    Its affinity cpumask now changes correctly e.g. from 0 to 1-4 when
    CPU0 is hotplugged out.

    Suggested-by: Marc Zyngier
    Signed-off-by: Dietmar Eggemann
    Acked-by: Marc Zyngier
    Reviewed-by: Linus Walleij
    Signed-off-by: Russell King

    Dietmar Eggemann
     

02 Jan, 2019

1 commit


12 Nov, 2018

1 commit

  • In big.LITTLE systems, some CPUs require the Spectre workarounds in
    paths such as the context switch, but other CPUs do not. In order
    to handle these differences, we need per-CPU vtables.

    We are unable to use the kernel's per-CPU variables to support this
    as per-CPU is not initialised at times when we need access to the
    vtables, so we have to use an array indexed by logical CPU number.

    We use an array-of-pointers to avoid having function pointers in
    the kernel's read/write .data section.

    Reviewed-by: Julien Thierry
    Signed-off-by: Russell King

    Russell King
     

08 Nov, 2018

1 commit

  • Consider the case where panic() is called at the same time on two
    different CPUs. For example:

    CPU 0:
    panic()
      __crash_kexec
        machine_crash_shutdown
          crash_smp_send_stop
        machine_kexec
          BUG_ON(num_online_cpus() > 1);

    CPU 1:
    panic()
      local_irq_disable
      panic_smp_self_stop

    If CPU 1 calls panic_smp_self_stop() before crash_smp_send_stop(),
    kdump fails: CPU1 can't receive the IPI IRQ, so CPU1 will always stay
    online. To fix this problem, this patch splits out
    panic_smp_self_stop() and adds set_cpu_online(smp_processor_id(),
    false).

    Signed-off-by: Yufen Wang
    Signed-off-by: Russell King

    Yufen Wang
     

05 Jun, 2018

1 commit


31 May, 2018

1 commit

  • Check for CPU bugs when secondary processors are being brought online,
    and also when CPUs are resuming from a low power mode. This gives an
    opportunity to check that processor-specific bug workarounds are
    correctly enabled for all paths by which a CPU re-enters the kernel.

    Signed-off-by: Russell King
    Reviewed-by: Florian Fainelli
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Acked-by: Marc Zyngier

    Russell King
     

19 May, 2018

1 commit

  • Suspending a CPU on a RT kernel results in the following backtrace:

    | Disabling non-boot CPUs ...
    | BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
    | in_atomic(): 1, irqs_disabled(): 128, pid: 18, name: migration/1
    | INFO: lockdep is turned off.
    | irq event stamp: 122
    | hardirqs last enabled at (121): [] _raw_spin_unlock_irqrestore+0x88/0x90
    | hardirqs last disabled at (122): [] _raw_spin_lock_irq+0x28/0x5c
    | CPU: 1 PID: 18 Comm: migration/1 Tainted: G W 4.1.4-rt3-01046-g96ac8da #204
    | Hardware name: Generic DRA74X (Flattened Device Tree)
    | [] (unwind_backtrace) from [] (show_stack+0x20/0x24)
    | [] (show_stack) from [] (dump_stack+0x88/0xdc)
    | [] (dump_stack) from [] (___might_sleep+0x198/0x2a8)
    | [] (___might_sleep) from [] (rt_spin_lock+0x30/0x70)
    | [] (rt_spin_lock) from [] (find_lock_task_mm+0x9c/0x174)
    | [] (find_lock_task_mm) from [] (clear_tasks_mm_cpumask+0xb4/0x1ac)
    | [] (clear_tasks_mm_cpumask) from [] (__cpu_disable+0x98/0xbc)
    | [] (__cpu_disable) from [] (take_cpu_down+0x1c/0x50)
    | [] (take_cpu_down) from [] (multi_cpu_stop+0x11c/0x158)
    | [] (multi_cpu_stop) from [] (cpu_stopper_thread+0xc4/0x184)
    | [] (cpu_stopper_thread) from [] (smpboot_thread_fn+0x18c/0x324)
    | [] (smpboot_thread_fn) from [] (kthread+0xe8/0x104)
    | [] (kthread) from [] (ret_from_fork+0x14/0x3c)
    | CPU1: shutdown

    The root cause of the above backtrace is task_lock(), which takes a
    sleeping lock on -RT.

    To fix the issue, move the clear_tasks_mm_cpumask() call from
    __cpu_disable() to __cpu_die(), which is called on the thread that is
    asking for a target CPU to be shut down. In addition, this change
    restores CPU hotplug functionality on ARM: CPU1 can be
    unplugged/plugged many times.

    Link: http://lkml.kernel.org/r/1441995683-30817-1-git-send-email-grygorii.strashko@ti.com
    [bigeasy: slightly edited the commit message]

    Signed-off-by: Grygorii Strashko
    Cc:
    Cc: Sekhar Nori
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Russell King

    Grygorii Strashko
     

21 Jan, 2018

1 commit

  • With the switch to dynamic exception base address setting, VBAR/Hivecs
    are set only for the boot CPU, but secondaries stay unaware of that.
    That might lead to weird effects when trying to bring up secondaries.

    Fixes: ad475117d201 ("ARM: 8649/2: nommu: remove Hivecs configuration is asm")
    Signed-off-by: Vladimir Murzin
    Acked-by: afzal mohammed
    Signed-off-by: Russell King

    Vladimir Murzin
     

23 Oct, 2017

1 commit

  • Currently, there are several issues with how the MPU is set up:

    1. We won't boot if MPU is missing
    2. We won't boot if use XIP
    3. Further extension of MPU setup requires asm skills

    The 1st point can be relaxed, so we can continue with the boot CPU even
    if the MPU is missing, and fail the boot for secondaries only. To
    address the 2nd point we could create a region covering
    CONFIG_XIP_PHYS_ADDR - _end and that might work for the first stage of
    MPU enable, but due to the MPU's alignment requirement we could cover
    too much, IOW we need more flexibility in how we're partitioning memory
    regions... and that would be hardly possible to achieve because of the
    3rd point.

    This patch is trying to address the 1st and 3rd issues and paves the
    way for the 2nd and further improvements.

    The most visible change introduced with this patch is that we start
    using the mpu_rgn_info array (as it was supposed to be used?), so the
    change in MPU setup done by the boot CPU is recorded there and fed to
    secondaries. It allows us to keep a minimal region setup for the boot
    CPU and do the rest in C. Since we start programming MPU regions in C,
    evaluation of MPU constraints (number of regions supported and minimal
    region order) can be done once, which in turn opens the possibility to
    free up the "probe" region early.

    Tested-by: Szemző András
    Tested-by: Alexandre TORGUE
    Tested-by: Benjamin Gaignard
    Signed-off-by: Vladimir Murzin
    Signed-off-by: Russell King

    Vladimir Murzin
     

23 May, 2017

1 commit

  • To enable smp_processor_id() and might_sleep() debug checks earlier, it's
    required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.

    Adjust the system_state check in ipi_cpu_stop() to handle the extra states.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Steven Rostedt
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/20170516184735.020718977@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

03 Mar, 2017

1 commit

  • Move the following task->mm helper APIs into a new header file,
    <linux/sched/mm.h>, to further reduce the size and complexity
    of <linux/sched.h>.

    Here are how the APIs are used in various kernel files:

    # mm_alloc():
    arch/arm/mach-rpc/ecard.c
    fs/exec.c
    include/linux/sched/mm.h
    kernel/fork.c

    # __mmdrop():
    arch/arc/include/asm/mmu_context.h
    include/linux/sched/mm.h
    kernel/fork.c

    # mmdrop():
    arch/arm/mach-rpc/ecard.c
    arch/m68k/sun3/mmu_emu.c
    arch/x86/mm/tlb.c
    drivers/gpu/drm/amd/amdkfd/kfd_process.c
    drivers/gpu/drm/i915/i915_gem_userptr.c
    drivers/infiniband/hw/hfi1/file_ops.c
    drivers/vfio/vfio_iommu_spapr_tce.c
    fs/exec.c
    fs/proc/base.c
    fs/proc/task_mmu.c
    fs/proc/task_nommu.c
    fs/userfaultfd.c
    include/linux/mmu_notifier.h
    include/linux/sched/mm.h
    kernel/fork.c
    kernel/futex.c
    kernel/sched/core.c
    mm/khugepaged.c
    mm/ksm.c
    mm/mmu_context.c
    mm/mmu_notifier.c
    mm/oom_kill.c
    virt/kvm/kvm_main.c

    # mmdrop_async_fn():
    include/linux/sched/mm.h

    # mmdrop_async():
    include/linux/sched/mm.h
    kernel/fork.c

    # mmget_not_zero():
    fs/userfaultfd.c
    include/linux/sched/mm.h
    mm/oom_kill.c

    # mmput():
    arch/arc/include/asm/mmu_context.h
    arch/arc/kernel/troubleshoot.c
    arch/frv/mm/mmu-context.c
    arch/powerpc/platforms/cell/spufs/context.c
    arch/sparc/include/asm/mmu_context_32.h
    drivers/android/binder.c
    drivers/gpu/drm/etnaviv/etnaviv_gem.c
    drivers/gpu/drm/i915/i915_gem_userptr.c
    drivers/infiniband/core/umem.c
    drivers/infiniband/core/umem_odp.c
    drivers/infiniband/core/uverbs_main.c
    drivers/infiniband/hw/mlx4/main.c
    drivers/infiniband/hw/mlx5/main.c
    drivers/infiniband/hw/usnic/usnic_uiom.c
    drivers/iommu/amd_iommu_v2.c
    drivers/iommu/intel-svm.c
    drivers/lguest/lguest_user.c
    drivers/misc/cxl/fault.c
    drivers/misc/mic/scif/scif_rma.c
    drivers/oprofile/buffer_sync.c
    drivers/vfio/vfio_iommu_type1.c
    drivers/vhost/vhost.c
    drivers/xen/gntdev.c
    fs/exec.c
    fs/proc/array.c
    fs/proc/base.c
    fs/proc/task_mmu.c
    fs/proc/task_nommu.c
    fs/userfaultfd.c
    include/linux/sched/mm.h
    kernel/cpuset.c
    kernel/events/core.c
    kernel/events/uprobes.c
    kernel/exit.c
    kernel/fork.c
    kernel/ptrace.c
    kernel/sys.c
    kernel/trace/trace_output.c
    kernel/tsacct.c
    mm/memcontrol.c
    mm/memory.c
    mm/mempolicy.c
    mm/migrate.c
    mm/mmu_notifier.c
    mm/nommu.c
    mm/oom_kill.c
    mm/process_vm_access.c
    mm/rmap.c
    mm/swapfile.c
    mm/util.c
    virt/kvm/async_pf.c

    # mmput_async():
    include/linux/sched/mm.h
    kernel/fork.c
    mm/oom_kill.c

    # get_task_mm():
    arch/arc/kernel/troubleshoot.c
    arch/powerpc/platforms/cell/spufs/context.c
    drivers/android/binder.c
    drivers/gpu/drm/etnaviv/etnaviv_gem.c
    drivers/infiniband/core/umem.c
    drivers/infiniband/core/umem_odp.c
    drivers/infiniband/hw/mlx4/main.c
    drivers/infiniband/hw/mlx5/main.c
    drivers/infiniband/hw/usnic/usnic_uiom.c
    drivers/iommu/amd_iommu_v2.c
    drivers/iommu/intel-svm.c
    drivers/lguest/lguest_user.c
    drivers/misc/cxl/fault.c
    drivers/misc/mic/scif/scif_rma.c
    drivers/oprofile/buffer_sync.c
    drivers/vfio/vfio_iommu_type1.c
    drivers/vhost/vhost.c
    drivers/xen/gntdev.c
    fs/proc/array.c
    fs/proc/base.c
    fs/proc/task_mmu.c
    include/linux/sched/mm.h
    kernel/cpuset.c
    kernel/events/core.c
    kernel/exit.c
    kernel/fork.c
    kernel/ptrace.c
    kernel/sys.c
    kernel/trace/trace_output.c
    kernel/tsacct.c
    mm/memcontrol.c
    mm/memory.c
    mm/mempolicy.c
    mm/migrate.c
    mm/mmu_notifier.c
    mm/nommu.c
    mm/util.c

    # mm_access():
    fs/proc/base.c
    include/linux/sched/mm.h
    kernel/fork.c
    mm/process_vm_access.c

    # mm_release():
    arch/arc/include/asm/mmu_context.h
    fs/exec.c
    include/linux/sched/mm.h
    include/uapi/linux/sched.h
    kernel/exit.c
    kernel/fork.c

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

02 Mar, 2017

2 commits


01 Mar, 2017

1 commit

  • Pull ARM updates from Russell King:

    - nommu updates from Afzal Mohammed cleaning up the vectors support

    - allow DMA memory "mapping" for nommu from Benjamin Gaignard

    - fixing a correctness issue with R_ARM_PREL31 relocations in the
    module linker

    - add strlen() prototype for the decompressor

    - support for DEBUG_VIRTUAL from Florian Fainelli

    - adjusting memory bounds after memory reservations have been
    registered

    - uniphier cache handling updates from Masahiro Yamada

    - initrd and Thumb Kconfig cleanups

    * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (23 commits)
    ARM: mm: round the initrd reservation to page boundaries
    ARM: mm: clean up initrd initialisation
    ARM: mm: move initrd init code out of arm_memblock_init()
    ARM: 8655/1: improve NOMMU definition of pgprot_*()
    ARM: 8654/1: decompressor: add strlen prototype
    ARM: 8652/1: cache-uniphier: clean up active way setup code
    ARM: 8651/1: cache-uniphier: include <linux/errno.h> instead of <linux/bug.h>
    ARM: 8650/1: module: handle negative R_ARM_PREL31 addends correctly
    ARM: 8649/2: nommu: remove Hivecs configuration is asm
    ARM: 8648/2: nommu: display vectors base
    ARM: 8647/2: nommu: dynamic exception base address setting
    ARM: 8646/1: mmu: decouple VECTORS_BASE from Kconfig
    ARM: 8644/1: Reduce "CPU: shutdown" message to debug level
    ARM: 8641/1: treewide: Replace uses of virt_to_phys with __pa_symbol
    ARM: 8640/1: Add support for CONFIG_DEBUG_VIRTUAL
    ARM: 8639/1: Define KERNEL_START and KERNEL_END
    ARM: 8638/1: mtd: lart: Rename partition defines to be prefixed with PART_
    ARM: 8637/1: Adjust memory boundaries after reservations
    ARM: 8636/1: Cleanup sanity_check_meminfo
    ARM: add CPU_THUMB_CAPABLE to indicate possible Thumb support
    ...

    Linus Torvalds
     

28 Feb, 2017

2 commits

  • Similar to c68b0274fb3c ("ARM: reduce "Booted secondary processor"
    message to debug level"), demote the "CPU: shutdown" pr_notice() into a
    pr_debug().

    Signed-off-by: Florian Fainelli
    Signed-off-by: Russell King

    Florian Fainelli
     
  • Apart from adding the helper function itself, the rest of the kernel is
    converted mechanically using:

    git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_count);/mmgrab\(\1\);/'
    git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_count);/mmgrab\(\&\1\);/'

    This is needed for a later patch that hooks into the helper, but might
    be a worthwhile cleanup on its own.

    (Michal Hocko provided most of the kerneldoc comment.)
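    The semantics of the helper can be illustrated with a minimal userspace
    sketch. This is a toy model, not kernel code: the real mmgrab() lives in
    include/linux/sched/mm.h and operates on the kernel's struct mm_struct
    and atomic_t; here C11 atomics stand in for the kernel primitives, and
    the mmdrop() return value is a simplification for illustration.

    ```c
    #include <assert.h>
    #include <stdatomic.h>

    /* Toy model of struct mm_struct's reference counting. */
    struct mm_struct {
        atomic_int mm_count;   /* users that pin the struct itself */
    };

    /* mmgrab(): pin the mm_struct; replaces the open-coded
     * atomic_inc(&mm->mm_count) that the sed scripts above rewrite. */
    static inline void mmgrab(struct mm_struct *mm)
    {
        atomic_fetch_add(&mm->mm_count, 1);
    }

    /* mmdrop(): release the pin; returns nonzero when the last reference
     * is dropped (the kernel would free the mm at that point). */
    static inline int mmdrop(struct mm_struct *mm)
    {
        return atomic_fetch_sub(&mm->mm_count, 1) == 1;
    }

    int main(void)
    {
        struct mm_struct mm = { .mm_count = 1 };  /* creator holds one ref */
        mmgrab(&mm);                              /* e.g. a lazy-TLB user */
        assert(!mmdrop(&mm));                     /* not the last reference */
        assert(mmdrop(&mm));                      /* last reference dropped */
        return 0;
    }
    ```

    Wrapping the increment in a named helper is what makes the later patch
    possible: a hook can be added inside mmgrab() without touching callers.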

    Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com
    Signed-off-by: Vegard Nossum
    Acked-by: Michal Hocko
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vegard Nossum
     

08 Oct, 2016

2 commits

  • Currently on arm there is code that checks whether it should call
    dump_stack() explicitly, to avoid trying to raise an NMI when the
    current context is not preemptible by the backtrace IPI. Similarly, the
    forthcoming arch/tile support uses an IPI mechanism that does not
    support generating an NMI to self.

    Accordingly, move the code that guards this case into the generic
    mechanism, and invoke it unconditionally whenever we want a backtrace of
    the current cpu. It seems plausible that in all cases, dump_stack()
    will generate better information than a backtrace taken from the NMI
    handler. The register state will be missing, but that state is likely
    not particularly helpful in any case.

    Or, if we think it is helpful, we should be capturing and emitting the
    current register state in all cases when regs == NULL is passed to
    nmi_cpu_backtrace().
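
    The guard being moved into the generic code can be sketched as a toy
    userspace model. The function and variable names below are stand-ins
    chosen for illustration (real cpumasks are not plain bitmasks, and the
    real code uses dump_stack() and the architecture's NMI raise hook); the
    point is only the control flow: handle the current CPU locally, then
    send NMIs to the rest.

    ```c
    #include <assert.h>

    #define NR_CPUS 4

    static int dumped_locally;              /* did we backtrace ourselves? */
    static unsigned int nmi_sent_mask;      /* which CPUs got an NMI */

    static void dump_stack(void) { dumped_locally = 1; }       /* stand-in */
    static void send_nmi(unsigned int mask) { nmi_sent_mask = mask; }

    /* Generic entry point: never try to raise an NMI to self. */
    static void backtrace_cpumask(unsigned int mask, int this_cpu)
    {
        if (mask & (1u << this_cpu)) {
            dump_stack();               /* current CPU: no NMI-to-self */
            mask &= ~(1u << this_cpu);  /* drop self before sending IPIs */
        }
        if (mask)
            send_nmi(mask);             /* remote CPUs still get the NMI */
    }

    int main(void)
    {
        backtrace_cpumask(0xF, 2);      /* all 4 CPUs, running on CPU 2 */
        assert(dumped_locally);
        assert(nmi_sent_mask == 0xB);   /* 0xF with bit 2 cleared */
        return 0;
    }
    ```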

    Link: http://lkml.kernel.org/r/1472487169-14923-3-git-send-email-cmetcalf@mellanox.com
    Signed-off-by: Chris Metcalf
    Tested-by: Daniel Thompson [arm]
    Reviewed-by: Petr Mladek
    Acked-by: Aaron Tomlin
    Cc: "Rafael J. Wysocki"
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     
  • Patch series "improvements to the nmi_backtrace code" v9.

    This patch series modifies the trigger_xxx_backtrace() NMI-based remote
    backtracing code to make it more flexible, and makes a few small
    improvements along the way.

    The motivation comes from the task isolation code, where there are
    scenarios where we want to be able to diagnose a case where some cpu is
    about to interrupt a task-isolated cpu. It can be helpful to see both
    where the interrupting cpu is, and also an approximation of where the
    cpu that is being interrupted is. The nmi_backtrace framework allows us
    to discover the stack of the interrupted cpu.

    I've tested that the change works as desired on tile, and build-tested
    x86, arm, mips, and sparc64. For x86 I confirmed that the generic
    cpuidle stuff as well as the architecture-specific routines are in the
    new cpuidle section. For arm, mips, and sparc I just build-tested it
    and made sure the generic cpuidle routines were in the new cpuidle
    section, but I didn't attempt to figure out what the platform-specific
    idle routines might be. That might be more usefully done by someone
    with platform experience in follow-up patches.

    This patch (of 4):

    Currently you can only request a backtrace of either all cpus, or all
    cpus but yourself. It can also be helpful to request a remote backtrace
    of a single cpu, and since we want that, the logical extension is to
    support a cpumask as the underlying primitive.

    This change modifies the existing lib/nmi_backtrace.c code to take a
    cpumask as its basic primitive, and modifies the linux/nmi.h code to use
    the new "cpumask" method instead.

    The existing clients of nmi_backtrace (arm and x86) are converted to
    using the new cpumask approach in this change.

    The other users of the backtracing API (sparc64 and mips) are converted
    to use the cpumask approach rather than the all/allbutself approach.
    The mips code ignored the "include_self" boolean but with this change it
    will now also dump a local backtrace if requested.
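
    The refactoring pattern, with the cpumask as the one underlying
    primitive and the old entry points as thin wrappers, can be sketched in
    a toy userspace model. A plain bitmask stands in for cpumask_t and the
    function bodies are illustrative only; the real kernel API operates on
    struct cpumask and actually triggers the backtraces.

    ```c
    #include <assert.h>

    #define NR_CPUS 8
    typedef unsigned int cpumask_t;       /* one bit per CPU, toy only */

    static cpumask_t requested;           /* records the last request */

    /* The single underlying primitive. */
    static void trigger_cpumask_backtrace(cpumask_t mask) { requested = mask; }

    /* The old all/allbutself entry points become wrappers... */
    static void trigger_all_cpu_backtrace(void)
    {
        trigger_cpumask_backtrace((1u << NR_CPUS) - 1);
    }

    static void trigger_allbutself_cpu_backtrace(int self)
    {
        trigger_cpumask_backtrace(((1u << NR_CPUS) - 1) & ~(1u << self));
    }

    /* ...and the new single-CPU case falls out for free. */
    static void trigger_single_cpu_backtrace(int cpu)
    {
        trigger_cpumask_backtrace(1u << cpu);
    }

    int main(void)
    {
        trigger_all_cpu_backtrace();
        assert(requested == 0xFF);
        trigger_allbutself_cpu_backtrace(0);
        assert(requested == 0xFE);
        trigger_single_cpu_backtrace(3);
        assert(requested == 0x08);
        return 0;
    }
    ```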

    Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.com
    Signed-off-by: Chris Metcalf
    Tested-by: Daniel Thompson [arm]
    Reviewed-by: Aaron Tomlin
    Reviewed-by: Petr Mladek
    Cc: "Rafael J. Wysocki"
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Ralf Baechle
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     

12 Aug, 2016

1 commit


19 Jun, 2016

1 commit

  • …t/tmlind/linux-omap into fixes

    Fixes for omaps for v4.7-rc cycle:

    - Two boot warning fixes from the RCU tree that should have gotten
    merged several weeks ago already but did not because of issues
    with who merges them. Paul has now split the RCU warning fixes into
    sets for various maintainers.

    - Fix ams-delta FIQ regression caused by omap1 sparse IRQ changes

    - Fix PM for omap3 boards using timer12 and gptimer, like the
    original beagleboard

    - Fix hangs on am437x-sk-evm by lowering the I2C bus speed

    * tag 'fixes-rcu-fiq-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
    ARM: dts: am437x-sk-evm: Reduce i2c0 bus speed for tps65218
    ARM: OMAP2+: timer: add probe for clocksources
    ARM: OMAP1: fix ams-delta FIQ handler to work with sparse IRQ
    arm: Use _rcuidle for smp_cross_call() tracepoints
    arm: Use _rcuidle tracepoint to allow use from idle

    Signed-off-by: Olof Johansson <olof@lixom.net>

    Olof Johansson
     

15 Jun, 2016

1 commit

  • Further testing with false negatives suppressed by commit 293e2421fe25
    ("rcu: Remove superfluous versions of rcu_read_lock_sched_held()")
    identified another unprotected use of RCU from the idle loop. Because RCU
    actively ignores idle-loop code (for energy-efficiency reasons, among
    other things), using RCU from the idle loop can result in too-short
    grace periods, in turn resulting in arbitrary misbehavior.

    The resulting lockdep-RCU splat is as follows:

    ------------------------------------------------------------------------

    ===============================
    [ INFO: suspicious RCU usage. ]
    4.6.0-rc5-next-20160426+ #1112 Not tainted
    -------------------------------
    include/trace/events/ipi.h:35 suspicious rcu_dereference_check() usage!

    other info that might help us debug this:

    RCU used illegally from idle CPU!
    rcu_scheduler_active = 1, debug_locks = 0
    RCU used illegally from extended quiescent state!
    no locks held by swapper/0/0.

    stack backtrace:
    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.0-rc5-next-20160426+ #1112
    Hardware name: Generic OMAP4 (Flattened Device Tree)
    [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
    [] (show_stack) from [] (dump_stack+0xb0/0xe4)
    [] (dump_stack) from [] (smp_cross_call+0xbc/0x188)
    [] (smp_cross_call) from [] (generic_exec_single+0x9c/0x15c)
    [] (generic_exec_single) from [] (smp_call_function_single_async+0x38/0x9c)
    [] (smp_call_function_single_async) from [] (cpuidle_coupled_poke_others+0x8c/0xa8)
    [] (cpuidle_coupled_poke_others) from [] (cpuidle_enter_state_coupled+0x26c/0x390)
    [] (cpuidle_enter_state_coupled) from [] (cpu_startup_entry+0x198/0x3a0)
    [] (cpu_startup_entry) from [] (start_kernel+0x354/0x3c8)
    [] (start_kernel) from [] (0x8000807c)

    ------------------------------------------------------------------------

    Reported-by: Tony Lindgren
    Signed-off-by: Paul E. McKenney
    Tested-by: Tony Lindgren
    Tested-by: Guenter Roeck
    Cc: Russell King
    Cc: Steven Rostedt
    Cc:
    Cc:

    Paul E. McKenney
     

21 May, 2016

1 commit

  • printk() takes some locks and cannot be used in a safe way in NMI
    context.

    The chance of a deadlock is real especially when printing stacks from
    all CPUs. This particular problem has been addressed on x86 by the
    commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all
    CPUs").

    The patchset brings two big advantages. First, it makes the NMI
    backtraces safe on all architectures for free. Second, it makes all NMI
    messages almost safe on all architectures (the temporary buffer is
    limited, so we should still keep the number of messages in NMI context
    to a minimum).

    Note that there already are several messages printed in NMI context:
    WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
    handlers. These are not easy to avoid.

    This patch reuses most of the code and makes it generic. It is useful
    for all messages and architectures that support NMI.

    The alternative printk_func is set when entering and is reset when
    leaving NMI context. It queues IRQ work to copy the messages into the
    main ring buffer in a safe context.

    __printk_nmi_flush() copies all available messages and resets the buffer.
    A simple cmpxchg operation is then enough to stay synchronized with
    writers. A spinlock is also used to synchronize with other flushers.
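
    The writer side of this scheme can be sketched as a toy userspace model
    using C11 atomics. The struct and function names below are stand-ins,
    and the real per-CPU buffer, vscnprintf formatting, and IRQ-work flush
    are omitted; the sketch shows only the core idea: writers reserve space
    by advancing a length field with compare-exchange, so appending in NMI
    context never takes a lock.

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <string.h>

    #define BUF_SIZE 128

    struct nmi_seq_buf {
        atomic_uint len;           /* bytes written so far */
        char buffer[BUF_SIZE];
    };

    /* Append a message losslessly w.r.t. concurrent writers:
     * reserve [len, len+add) with cmpxchg, then copy into our region. */
    static int nmi_buf_append(struct nmi_seq_buf *s, const char *msg)
    {
        unsigned int add = (unsigned int)strlen(msg);
        unsigned int len = atomic_load(&s->len);

        do {
            if (len + add > BUF_SIZE)
                return 0;          /* buffer full: message is dropped */
        } while (!atomic_compare_exchange_weak(&s->len, &len, len + add));

        memcpy(s->buffer + len, msg, add);   /* exclusive reserved region */
        return add;
    }

    int main(void)
    {
        struct nmi_seq_buf s = { .len = 0 };
        assert(nmi_buf_append(&s, "oops: ") == 6);
        assert(nmi_buf_append(&s, "stack\n") == 6);
        assert(memcmp(s.buffer, "oops: stack\n", 12) == 0);
        return 0;
    }
    ```

    A flusher can likewise cmpxchg the length back to zero after copying,
    which is the synchronization with writers described above.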

    We no longer use seq_buf because it depends on an external lock. It
    would be hard to make all supported operations safe for lockless use,
    and it would be confusing and error-prone to make only some operations safe.

    The code is put into separate printk/nmi.c as suggested by Steven
    Rostedt. It needs a per-CPU buffer and is compiled only on
    architectures that call nmi_enter(). This is achieved by the new
    HAVE_NMI Kconfig flag.

    The exceptions are the MN10300 and Xtensa architectures. We need to
    clean up NMI handling there first. Let's do it separately.

    The patch is heavily based on the draft from Peter Zijlstra, see

    https://lkml.org/lkml/2015/6/10/327

    [arnd@arndb.de: printk-nmi: use %zu format string for size_t]
    [akpm@linux-foundation.org: min_t->min - all types are size_t here]
    Signed-off-by: Petr Mladek
    Suggested-by: Peter Zijlstra
    Suggested-by: Steven Rostedt
    Cc: Jan Kara
    Acked-by: Russell King [arm part]
    Cc: Daniel Thompson
    Cc: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: David Miller
    Cc: Daniel Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek