20 Jan, 2021

3 commits

  • This is the 5.10.8 stable release

    * tag 'v5.10.8': (104 commits)
    Linux 5.10.8
    tools headers UAPI: Sync linux/fscrypt.h with the kernel sources
    drm/panfrost: Remove unused variables in panfrost_job_close()
    ...

    Signed-off-by: Jason Liu

    Jason Liu
     
  • This is the 5.10.7 stable release

    * tag 'v5.10.7': (144 commits)
    Linux 5.10.7
    scsi: target: Fix XCOPY NAA identifier lookup
    rtlwifi: rise completion at the last step of firmware callback
    ...

    Signed-off-by: Jason Liu

    Jason Liu
     
  • This is the 5.10.5 stable release

    * tag 'v5.10.5': (63 commits)
    Linux 5.10.5
    device-dax: Fix range release
    ext4: avoid s_mb_prefetch to be zero in individual scenarios
    ...

    Signed-off-by: Jason Liu

    Jason Liu
     

17 Jan, 2021

1 commit

  • [ Upstream commit 98bf2d3f4970179c702ef64db658e0553bc6ef3a ]

    When we have VMAP stack, exception prolog 1 sets r1, not r11.

    When it is not an RTAS machine check, don't trash r1 because it is
    needed by prolog 1.

    Fixes: da7bb43ab9da ("powerpc/32: Fix vmap stack - Properly set r1 before activating MMU")
    Fixes: d2e006036082 ("powerpc/32: Use SPRN_SPRG_SCRATCH2 in exception prologs")
    Cc: stable@vger.kernel.org # v5.10+
    Signed-off-by: Christophe Leroy
    [mpe: Squash in fixup for RTAS machine check from Christophe]
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/bc77d61d1c18940e456a2dee464f1e2eda65a3f0.1608621048.git.christophe.leroy@csgroup.eu
    Signed-off-by: Sasha Levin

    Christophe Leroy
     

13 Jan, 2021

1 commit

  • commit 3ce47d95b7346dcafd9bed3556a8d072cb2b8571 upstream.

    Commit eff8728fe698 ("vmlinux.lds.h: Add PGO and AutoFDO input
    sections") added ".text.unlikely.*" and ".text.hot.*" due to an LLVM
    change [1].

    After another LLVM change [2], these sections are seen in some PowerPC
    builds, where there is an orphan section warning followed by a build failure:

    $ make -skj"$(nproc)" \
    ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 O=out \
    distclean powernv_defconfig zImage.epapr
    ld.lld: warning: kernel/built-in.a(panic.o):(.text.unlikely.) is being placed in '.text.unlikely.'
    ...
    ld.lld: warning: address (0xc000000000009314) of section .text is not a multiple of alignment (256)
    ...
    ERROR: start_text address is c000000000009400, should be c000000000008000
    ERROR: try to enable LD_HEAD_STUB_CATCH config option
    ERROR: see comments in arch/powerpc/tools/head_check.sh
    ...

    Explicitly handle these sections like in the main linker script so
    there is no more build failure.

    [1]: https://reviews.llvm.org/D79600
    [2]: https://reviews.llvm.org/D92493

    Fixes: 83a092cf95f2 ("powerpc: Link warning for orphan sections")
    Cc: stable@vger.kernel.org
    Signed-off-by: Nathan Chancellor
    Signed-off-by: Michael Ellerman
    Link: https://github.com/ClangBuiltLinux/linux/issues/1218
    Link: https://lore.kernel.org/r/20210104205952.1399409-1-natechancellor@gmail.com
    Signed-off-by: Greg Kroah-Hartman

    Nathan Chancellor
     

06 Jan, 2021

1 commit

  • [ Upstream commit 59d512e4374b2d8a6ad341475dc94c4a4bdec7d3 ]

    This is a way to catch some cases of decrementer overflow, when the
    decrementer has underflowed an odd number of times, while MSR[EE] was
    disabled.

    With a typical small decrementer, a timer that fires when MSR[EE] is
    disabled will be "lost" if MSR[EE] remains disabled for between 4.3 and
    8.6 seconds after the timer expires. In any case, the decrementer
    interrupt would be taken at 8.6 seconds and the timer would be found at
    that point.

    So this check is for catching extreme latency events, and it prevents
    those latencies from being a further few seconds long. It's not obvious
    this is a good tradeoff. This is already a watchdog-magnitude event and
    that situation is not significantly improved by this check. For
    large decrementers, it's useless.

    Therefore remove this check, which avoids a mftb when enabling hard
    disabled interrupts (e.g., when enabling after coming from hardware
    interrupt handlers). Perhaps more importantly, it also removes the
    clunky MSR[EE] vs PACA_IRQ_HARD_DIS incoherency in soft-interrupt replay
    which simplifies the code.

    Signed-off-by: Nicholas Piggin
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20201107014336.2337337-1-npiggin@gmail.com
    Signed-off-by: Sasha Levin

    Nicholas Piggin
     

04 Jan, 2021

1 commit

  • This is the 5.10.4 stable release

    * tag 'v5.10.4': (717 commits)
    Linux 5.10.4
    x86/CPU/AMD: Save AMD NodeId as cpu_die_id
    drm/edid: fix objtool warning in drm_cvt_modes()
    ...

    Signed-off-by: Jason Liu

    Conflicts:
    drivers/gpu/drm/imx/dcss/dcss-plane.c
    drivers/media/i2c/ov5640.c

    Jason Liu
     

30 Dec, 2020

6 commits

  • commit f10881a46f8914428110d110140a455c66bdf27b upstream.

    Commit bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
    introduced the following error when invoking the errinjct userspace
    tool:

    [root@ltcalpine2-lp5 librtas]# errinjct open
    [327884.071171] sys_rtas: RTAS call blocked - exploit attempt?
    [327884.071186] sys_rtas: token=0x26, nargs=0 (called by errinjct)
    errinjct: Could not open RTAS error injection facility
    errinjct: librtas: open: Unexpected I/O error

    The entry for ibm,open-errinjct in the rtas_filter array has a typo:
    the "j" is omitted in the RTAS call name. After fixing this typo, the
    errinjct tool functions again as expected.

    [root@ltcalpine2-lp5 linux]# errinjct open
    RTAS error injection facility open, token = 1
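
    As a hedged illustration of the one-character fix (the struct layout
    here is simplified, not the exact kernel definition):

    struct rtas_filter {
            const char *name;   /* must match the firmware call name exactly */
            int token;
    };

    static struct rtas_filter rtas_filters[] = {
            /* was "ibm,open-errinct" -- the missing 'j' broke the lookup */
            { "ibm,open-errinjct", -1 },
            { "ibm,close-errinjct", -1 },
    };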

    Fixes: bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
    Cc: stable@vger.kernel.org
    Signed-off-by: Tyrel Datwyler
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20201208195434.8289-1-tyreld@linux.ibm.com
    Signed-off-by: Greg Kroah-Hartman

    Tyrel Datwyler
     
  • commit d5c243989fb0cb03c74d7340daca3b819f706ee7 upstream.

    We need r1 to be properly set before activating MMU, otherwise any new
    exception taken while saving registers into the stack in syscall
    prologs will use the user stack, which is wrong and will even lockup
    or crash when KUAP is selected.

    Do that by switching the meaning of r11 and r1 until we have saved r1
    to the stack: copy r1 into r11 and setup the new stack pointer in r1.
    To avoid complicating and impacting all generic and specific prolog
    code (and more), copy r1 back into r11 once r11 has been saved onto
    the stack.

    We could get rid of copying r1 back and forth at the cost of rewriting
    everything to use r1 instead of r11 all the way when CONFIG_VMAP_STACK
    is set, but the effort is probably not worth it for now.

    Fixes: da7bb43ab9da ("powerpc/32: Fix vmap stack - Properly set r1 before activating MMU")
    Cc: stable@vger.kernel.org # v5.10+
    Signed-off-by: Christophe Leroy
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/a3d819d5c348cee9783a311d5d3f3ba9b48fd219.1608531452.git.christophe.leroy@csgroup.eu
    Signed-off-by: Greg Kroah-Hartman

    Christophe Leroy
     
  • [ Upstream commit 9014eab6a38c60fd185bc92ed60f46cf99a462ab ]

    It fixes this link warning:

    WARNING: modpost: vmlinux.o(.text.unlikely+0x2d98): Section mismatch in reference from the function init_big_cores.isra.0() to the function .init.text:init_thread_group_cache_map()
    The function init_big_cores.isra.0() references
    the function __init init_thread_group_cache_map().
    This is often because init_big_cores.isra.0 lacks a __init
    annotation or the annotation of init_thread_group_cache_map is wrong.
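
    A minimal sketch of the fix the warning calls for, assuming (as the
    message says) that init_big_cores only runs at boot:

    #include <linux/init.h>

    /* Marking the caller __init places it in .init.text as well, so its
     * reference to an __init function is no longer a section mismatch. */
    static void __init init_thread_group_cache_map(void)
    {
            /* ... boot-time setup ... */
    }

    static int __init init_big_cores(void)
    {
            init_thread_group_cache_map();
            return 0;
    }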

    Fixes: 425752c63b6f ("powerpc: Detect the presence of big-cores via "ibm,thread-groups"")
    Signed-off-by: Cédric Le Goater
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20201221074154.403779-1-clg@kaod.org
    Signed-off-by: Sasha Levin

    Cédric Le Goater
     
  • [ Upstream commit fe18a35e685c9bdabc8b11b3e19deb85a068b75d ]

    Commit 63ce271b5e37 ("powerpc/prom: convert PROM_BUG() to standard
    trap") added an EMIT_BUG_ENTRY for the trap after the branch to
    start_kernel(). The EMIT_BUG_ENTRY was for the address "0b"; however, the
    trap was not labeled with "0". Hence the address used for the bug entry is
    in relative_toc(), where the previous "0" label is. Label the trap as "0" so
    the correct address is used.

    Fixes: 63ce271b5e37 ("powerpc/prom: convert PROM_BUG() to standard trap")
    Signed-off-by: Jordan Niethe
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20201130004404.30953-1-jniethe5@gmail.com
    Signed-off-by: Sasha Levin

    Jordan Niethe
     
  • [ Upstream commit a7223f5bfcaeade4a86d35263493bcda6c940891 ]

    Commit 7053f80d9696 ("powerpc/64: Prevent stack protection in early
    boot") introduced a couple of uses of __attribute__((optimize)) with
    function scope, to disable the stack protector in some early boot
    code.

    Unfortunately, and this is documented in the GCC man pages [0],
    overriding function attributes for optimization is broken, and is only
    supported for debug scenarios, not for production: the problem appears
    to be that setting GCC -f flags using this method will cause it to
    forget about some or all other optimization settings that have been
    applied.

    So the only safe way to disable the stack protector is to disable it
    for the entire source file.

    [0] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
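
    A sketch of the before/after pattern described above (function names
    hypothetical, not the actual kernel code):

    /* Before: function-scope optimize attribute, which GCC documents as
     * suitable only for debugging -- it can drop other -f settings. */
    static void __attribute__((optimize("no-stack-protector")))
    early_setup_before(void)
    {
            /* ... early boot work ... */
    }

    /* After: a plain function; stack protection is instead disabled for
     * the entire source file by the build system (e.g. a per-file
     * -fno-stack-protector flag in the Makefile). */
    static void early_setup_after(void)
    {
            /* ... early boot work ... */
    }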

    Fixes: 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot")
    Signed-off-by: Ard Biesheuvel
    [mpe: Drop one remaining use of __nostackprotector, reported by snowpatch]
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20201028080433.26799-1-ardb@kernel.org
    Signed-off-by: Sasha Levin

    Ard Biesheuvel
     
  • [ Upstream commit 3c0b976bf20d236c57adcefa80f86a0a1d737727 ]

    Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore()
    is called before a stack has been set up in r1. This was previously fine
    as the cpu_restore() functions were implemented in assembly and did not
    use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new
    device tree binding for discovering CPU features") used
    __restore_cpu_cpufeatures() as the cpu_restore() function for a
    device-tree features based cputable entry. This is a C function and
    hence uses a stack in r1.

    generic_secondary_smp_init() is entered on the secondary cpus via the
    primary cpu using the OPAL call opal_start_cpu(). In OPAL, each hardware
    thread has its own stack. The OPAL call is run in the primary's hardware
    thread. During the call, a job is scheduled on a secondary cpu that will
    start executing at the address of generic_secondary_smp_init(). Hence
    the value that will be left in r1 when the secondary cpu enters the
    kernel is part of that secondary cpu's individual OPAL stack. This means
    that __restore_cpu_cpufeatures() will write to that OPAL stack. This is
    not horribly bad as each hardware thread has its own stack and the call
    that enters the kernel from OPAL never returns, but it is still wrong
    and should be corrected.

    Create the temp kernel stack before calling cpu_restore().

    As noted by mpe, for a kexec boot, the secondary CPUs are released from
    the spin loop at address 0x60 by smp_release_cpus() and then jump to
    generic_secondary_smp_init(). The call to smp_release_cpus() is in
    setup_arch(), and it comes before the call to emergency_stack_init().
    emergency_stack_init() allocates an emergency stack in the PACA for each
    CPU. This address in the PACA is what is used to set up the temp kernel
    stack in generic_secondary_smp_init(). Move releasing the secondary CPUs
    to after the PACAs have been allocated an emergency stack, otherwise the
    PACA stack pointer will contain garbage and hence the temp kernel stack
    created from it will be broken.
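
    A sketch of the ordering change described for the kexec case (function
    names from the log; surrounding setup code omitted):

    void __init setup_arch_sketch(void)
    {
            /* ... */
            emergency_stack_init();  /* PACA emergency stacks now allocated */

            /* Only now release the secondaries, so the temp kernel stack
             * built from the PACA stack pointer is valid, not garbage. */
            smp_release_cpus();
    }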

    Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
    Signed-off-by: Jordan Niethe
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20201014072837.24539-1-jniethe5@gmail.com
    Signed-off-by: Sasha Levin

    Jordan Niethe
     

14 Dec, 2020

3 commits

    In sleep mode, the clocks of the CPU core and unused IP blocks are
    turned off (IP blocks that are allowed to wake up the system keep
    running).

    Some QorIQ SoCs, such as the MPC8536, P1022 and T104x, have a deep
    sleep PM mode in addition to the sleep PM mode. In deep sleep mode,
    the power supply is additionally removed from the CPU core and most
    IP blocks. Only the blocks needed to wake the chip out of deep sleep
    remain on.

    This feature supports 32-bit and 36-bit address space.

    The sleep mode corresponds to the Standby state in Linux; the deep
    sleep mode corresponds to the Suspend-to-RAM state of Linux Power
    Management.

    Command to enter sleep mode:
    echo standby > /sys/power/state

    Command to enter deep sleep mode:
    echo mem > /sys/power/state

    Signed-off-by: Dave Liu
    Signed-off-by: Li Yang
    Signed-off-by: Jin Qing
    Signed-off-by: Jerry Huang
    Signed-off-by: Ramneek Mehresh
    Signed-off-by: Zhao Chenhui
    Signed-off-by: Wang Dongsheng
    Signed-off-by: Tang Yuantian
    Signed-off-by: Xie Xiaobo
    Signed-off-by: Zhao Qiang
    Signed-off-by: Shengzhou Liu
    Signed-off-by: Ran Wang

    Ran Wang
     
  • Need to be separated when submitting upstream.

    Signed-off-by: Li Yang

    Li Yang
     
    Various e500 cores have different cache architectures, so they
    need different cache flush operations. Therefore, add a callback
    function cpu_flush_caches to the struct cpu_spec. The cache flush
    operation for the specific kind of e500 is selected at init time.
    The callback function will flush all caches in the current cpu.
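
    A hedged sketch of the callback being added (the real struct cpu_spec
    carries many more fields):

    struct cpu_spec {
            const char *cpu_name;
            /* Flushes all caches on the current CPU; selected at init
             * time for the specific e500 variant. */
            void (*cpu_flush_caches)(void);
    };

    static void e500v2_flush_caches(void)
    {
            /* core-specific cache flush sequence */
    }

    static struct cpu_spec e500v2_spec = {
            .cpu_name         = "e500v2",
            .cpu_flush_caches = e500v2_flush_caches,
    };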

    Signed-off-by: Chenhui Zhao
    Reviewed-by: Yang Li
    Reviewed-by: Jose Rivera
    Signed-off-by: Ran Wang

    Ran Wang
     


24 Nov, 2020

1 commit

  • We call arch_cpu_idle() with RCU disabled, but then use
    local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.

    Switch all arch_cpu_idle() implementations to use
    raw_local_irq_{en,dis}able() and carefully manage the
    lockdep,rcu,tracing state like we do in entry.

    (XXX: we really should change arch_cpu_idle() to not return with
    interrupts enabled)
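
    A minimal sketch of the pattern: idle code runs while RCU is not
    watching, so only the raw, non-tracing IRQ helpers are safe there.

    /* Sketch only; real implementations wait for an interrupt here and,
     * as the XXX above notes, currently return with interrupts enabled. */
    void arch_cpu_idle(void)
    {
            raw_local_irq_enable();  /* no tracepoints, no RCU dependency */
            /* ... architecture-specific wait-for-interrupt ... */
    }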

    Reported-by: Sven Schnelle
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Mark Rutland
    Tested-by: Mark Rutland
    Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org

    Peter Zijlstra
     

23 Nov, 2020

1 commit

  • From Daniel's cover letter:

    IBM Power9 processors can speculatively operate on data in the L1 cache
    before it has been completely validated, via a way-prediction mechanism. It
    is not possible for an attacker to determine the contents of impermissible
    memory using this method, since these systems implement a combination of
    hardware and software security measures to prevent scenarios where
    protected data could be leaked.

    However these measures don't address the scenario where an attacker induces
    the operating system to speculatively execute instructions using data that
    the attacker controls. This can be used for example to speculatively bypass
    "kernel user access prevention" techniques, as discovered by Anthony
    Steinhauser of Google's Safeside Project. This is not an attack by itself,
    but there is a possibility it could be used in conjunction with
    side-channels or other weaknesses in the privileged code to construct an
    attack.

    This issue can be mitigated by flushing the L1 cache between privilege
    boundaries of concern.

    This patch series flushes the L1 cache on kernel entry (patch 2) and after the
    kernel performs any user accesses (patch 3). It also adds a self-test and
    performs some related cleanups.

    Michael Ellerman
     

19 Nov, 2020

3 commits

  • In kup.h we currently include kup-radix.h for all 64-bit builds, which
    includes Book3S and Book3E. The latter doesn't make sense, Book3E
    never uses the Radix MMU.

    This has worked up until now, but almost by accident, and the recent
    uaccess flush changes introduced a build breakage on Book3E because of
    the bad structure of the code.

    So disentangle things so that we only use kup-radix.h for Book3S. This
    requires some more stubs in kup.h and fixing an include in
    syscall_64.c.
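
    A sketch of the disentangled structure (stub names illustrative):

    /* asm/kup.h (sketch): only Book3S 64 pulls in the radix KUP helpers;
     * other platforms get empty stubs. */
    #ifdef CONFIG_PPC_BOOK3S_64
    #include <asm/book3s/64/kup-radix.h>
    #else
    static inline void allow_user_access(void) { }
    static inline void prevent_user_access(void) { }
    #endif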

    Signed-off-by: Michael Ellerman

    Michael Ellerman
     
  • IBM Power9 processors can speculatively operate on data in the L1 cache
    before it has been completely validated, via a way-prediction mechanism. It
    is not possible for an attacker to determine the contents of impermissible
    memory using this method, since these systems implement a combination of
    hardware and software security measures to prevent scenarios where
    protected data could be leaked.

    However these measures don't address the scenario where an attacker induces
    the operating system to speculatively execute instructions using data that
    the attacker controls. This can be used for example to speculatively bypass
    "kernel user access prevention" techniques, as discovered by Anthony
    Steinhauser of Google's Safeside Project. This is not an attack by itself,
    but there is a possibility it could be used in conjunction with
    side-channels or other weaknesses in the privileged code to construct an
    attack.

    This issue can be mitigated by flushing the L1 cache between privilege
    boundaries of concern. This patch flushes the L1 cache after user accesses.

    This is part of the fix for CVE-2020-4788.

    Signed-off-by: Nicholas Piggin
    Signed-off-by: Daniel Axtens
    Signed-off-by: Michael Ellerman

    Nicholas Piggin
     
  • IBM Power9 processors can speculatively operate on data in the L1 cache
    before it has been completely validated, via a way-prediction mechanism. It
    is not possible for an attacker to determine the contents of impermissible
    memory using this method, since these systems implement a combination of
    hardware and software security measures to prevent scenarios where
    protected data could be leaked.

    However these measures don't address the scenario where an attacker induces
    the operating system to speculatively execute instructions using data that
    the attacker controls. This can be used for example to speculatively bypass
    "kernel user access prevention" techniques, as discovered by Anthony
    Steinhauser of Google's Safeside Project. This is not an attack by itself,
    but there is a possibility it could be used in conjunction with
    side-channels or other weaknesses in the privileged code to construct an
    attack.

    This issue can be mitigated by flushing the L1 cache between privilege
    boundaries of concern. This patch flushes the L1 cache on kernel entry.

    This is part of the fix for CVE-2020-4788.

    Signed-off-by: Nicholas Piggin
    Signed-off-by: Daniel Axtens
    Signed-off-by: Michael Ellerman

    Nicholas Piggin
     

18 Nov, 2020

1 commit

  • Commit 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR
    interrupts when PR KVM is supported") removed KVM guest tests from
    interrupts that do not set HV=1, when PR-KVM is not configured.

    This is wrong for HV-KVM HPT guest MMIO emulation case which attempts
    to load the faulting instruction word with MSR[DR]=1 and MSR[HV]=1 with
    the guest MMU context loaded. This can cause host DSI, DSLB interrupts
    which must test for KVM guest. Restore this and add a comment.

    Fixes: 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported")
    Cc: stable@vger.kernel.org # v5.7+
    Signed-off-by: Nicholas Piggin
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20201117135617.3521127-1-npiggin@gmail.com

    Nicholas Piggin
     

16 Nov, 2020

1 commit

  • pseries guest kernels have a FWNMI handler for SRESET and MCE NMIs,
    which is basically the same as the regular handlers for those
    interrupts.

    The system reset FWNMI handler did not have a KVM guest test in it,
    although it probably should have because the guest can itself run
    guests.

    Commit 4f50541f6703b ("powerpc/64s/exception: Move all interrupt
    handlers to new style code gen macros") converted the handler faithfully
    to avoid a KVM test, with a "clever" trick to modify the IKVM_REAL
    setting to 0 when the fwnmi handler is to be generated (PPC_PSERIES=y).
    This worked when the KVM test was generated in the interrupt entry
    handlers, but a later patch moved the KVM test to the common handler,
    and the common handler macro is expanded below the fwnmi entry. This
    prevents the KVM test from being generated even for the 0x100 entry
    point.

    The result is that NMI IPIs in the host kernel when a guest is running
    will use guest registers. This goes particularly badly when an HPT guest is
    running and the MMU is set to guest mode.

    Remove this trickery and just generate the test always.

    Fixes: 9600f261acaa ("powerpc/64s/exception: Move KVM test to common code")
    Cc: stable@vger.kernel.org # v5.7+
    Signed-off-by: Nicholas Piggin
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20201114114743.3306283-1-npiggin@gmail.com

    Nicholas Piggin
     

08 Nov, 2020

1 commit

    When calling early_hash_table(), the kernel hasn't yet been
    relocated to its linking address, so data must be addressed
    with relocation offset.

    Add relocation offset to write into Hash in early_hash_table().

    Fixes: 69a1593abdbc ("powerpc/32s: Setup the early hash table at all time.")
    Reported-by: Erhard Furtner
    Reported-by: Andreas Schwab
    Signed-off-by: Christophe Leroy
    Tested-by: Serge Belyshev
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/9e225a856a8b22e0e77587ee22ab7a2f5bca8753.1604740029.git.christophe.leroy@csgroup.eu

    Christophe Leroy
     

05 Nov, 2020

4 commits

  • When _PAGE_ACCESSED is not set, a minor fault is expected.
    To do this, the TLB miss exception ANDs _PAGE_PRESENT and _PAGE_ACCESSED
    into the L2 entry valid bit.

    To simplify the processing and reduce the number of instructions in
    TLB miss exceptions, manage it as an APG bit and get it next to
    _PAGE_GUARDED bit to allow a copy in one go. Then declare the
    corresponding groups as handling all accesses as user accesses.
    As the PP bits always define user as No Access, it will generate
    a fault.

    Signed-off-by: Christophe Leroy
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/80f488db230c6b0e7b3b990d72bd94a8a069e93e.1602492856.git.christophe.leroy@csgroup.eu

    Christophe Leroy
     
  • The kernel expects pte_young() to work regardless of CONFIG_SWAP.

    Make sure a minor fault is taken to set _PAGE_ACCESSED when it
    is not already set, regardless of the selection of CONFIG_SWAP.

    This adds at least 3 instructions to the TLB miss exception
    handlers' fast path. A following patch will reduce this overhead.

    Also update the rotation instruction to the correct number of bits
    to reflect all changes done to _PAGE_ACCESSED over time.

    Fixes: d069cb4373fe ("powerpc/8xx: Don't touch ACCESSED when no SWAP.")
    Fixes: 5f356497c384 ("powerpc/8xx: remove unused _PAGE_WRITETHRU")
    Fixes: e0a8e0d90a9f ("powerpc/8xx: Handle PAGE_USER via APG bits")
    Fixes: 5b2753fc3e8a ("powerpc/8xx: Implementation of PAGE_EXEC")
    Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.")
    Cc: stable@vger.kernel.org
    Signed-off-by: Christophe Leroy
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.1602492856.git.christophe.leroy@csgroup.eu

    Christophe Leroy
     
  • The kernel expects pte_young() to work regardless of CONFIG_SWAP.

    Make sure a minor fault is taken to set _PAGE_ACCESSED when it
    is not already set, regardless of the selection of CONFIG_SWAP.

    Fixes: 2c74e2586bb9 ("powerpc/40x: Rework 40x PTE access and TLB miss")
    Cc: stable@vger.kernel.org
    Signed-off-by: Christophe Leroy
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/b02ca2ed2d3676a096219b48c0f69ec982a75bcf.1602342801.git.christophe.leroy@csgroup.eu

    Christophe Leroy
     
  • The kernel expects pte_young() to work regardless of CONFIG_SWAP.

    Make sure a minor fault is taken to set _PAGE_ACCESSED when it
    is not already set, regardless of the selection of CONFIG_SWAP.

    Fixes: 84de6ab0e904 ("powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.")
    Cc: stable@vger.kernel.org
    Signed-off-by: Christophe Leroy
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/a44367744de54e2315b2f1a8cbbd7f88488072e0.1602342806.git.christophe.leroy@csgroup.eu

    Christophe Leroy
     

02 Nov, 2020

2 commits

  • The call to rcu_cpu_starting() in start_secondary() is not early
    enough in the CPU-hotplug onlining process, which results in lockdep
    splats as follows (with CONFIG_PROVE_RCU_LIST=y):

    WARNING: suspicious RCU usage
    -----------------------------
    kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!

    other info that might help us debug this:

    RCU used illegally from offline CPU!
    rcu_scheduler_active = 1, debug_locks = 1
    no locks held by swapper/1/0.

    Call Trace:
    dump_stack+0xec/0x144 (unreliable)
    lockdep_rcu_suspicious+0x128/0x14c
    __lock_acquire+0x1060/0x1c60
    lock_acquire+0x140/0x5f0
    _raw_spin_lock_irqsave+0x64/0xb0
    clockevents_register_device+0x74/0x270
    register_decrementer_clockevent+0x94/0x110
    start_secondary+0x134/0x800
    start_secondary_prolog+0x10/0x14

    This is avoided by adding a call to rcu_cpu_starting() near the
    beginning of the start_secondary() function. Note that the
    raw_smp_processor_id() is required in order to avoid calling into
    lockdep before RCU has declared the CPU to be watched for readers.

    It's safe to call rcu_cpu_starting() in the arch code as well as later
    in generic code, as explained by Paul:

    It uses a per-CPU variable so that RCU pays attention only to the
    first call to rcu_cpu_starting() if there is more than one of them.
    This is even intentional, due to there being a generic
    arch-independent call to rcu_cpu_starting() in
    notify_cpu_starting().

    So multiple calls to rcu_cpu_starting() are fine by design.
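
    A sketch of the fix described above (names from the log; the rest of
    the bringup code is omitted):

    void start_secondary(void *unused)
    {
            /* raw_smp_processor_id() avoids lockdep before RCU watches us */
            unsigned int cpu = raw_smp_processor_id();

            /* Tell RCU to watch this CPU before anything that can take
             * tracked locks, e.g. registering the decrementer clockevent. */
            rcu_cpu_starting(cpu);

            /* ... rest of secondary CPU bringup ... */
    }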

    Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")
    Signed-off-by: Qian Cai
    Acked-by: Paul E. McKenney
    [mpe: Add Fixes tag, reword slightly & expand change log]
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20201028182334.13466-1-cai@redhat.com

    Qian Cai
     
    Lockdep complains about the possible deadlock below in
    eeh_addr_cache_show(), because it acquires a lock with IRQs enabled
    while eeh_addr_cache_insert_dev() needs to acquire the same lock with
    IRQs disabled. Let's just make eeh_addr_cache_show() acquire the lock
    with IRQs disabled as well.

    CPU0                                    CPU1
    ----                                    ----
    lock(&pci_io_addr_cache_root.piar_lock);
                                            local_irq_disable();
                                            lock(&tp->lock);
                                            lock(&pci_io_addr_cache_root.piar_lock);

    lock(&tp->lock);

    *** DEADLOCK ***

    lock_acquire+0x140/0x5f0
    _raw_spin_lock_irqsave+0x64/0xb0
    eeh_addr_cache_insert_dev+0x48/0x390
    eeh_probe_device+0xb8/0x1a0
    pnv_pcibios_bus_add_device+0x3c/0x80
    pcibios_bus_add_device+0x118/0x290
    pci_bus_add_device+0x28/0xe0
    pci_bus_add_devices+0x54/0xb0
    pcibios_init+0xc4/0x124
    do_one_initcall+0xac/0x528
    kernel_init_freeable+0x35c/0x3fc
    kernel_init+0x24/0x148
    ret_from_kernel_thread+0x5c/0x80

    lock_acquire+0x140/0x5f0
    _raw_spin_lock+0x4c/0x70
    eeh_addr_cache_show+0x38/0x110
    seq_read+0x1a0/0x660
    vfs_read+0xc8/0x1f0
    ksys_read+0x74/0x130
    system_call_exception+0xf8/0x1d0
    system_call_common+0xe8/0x218
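
    A minimal sketch of the fix, taking the cache lock with IRQs disabled
    in the show path (names from the log):

    static int eeh_addr_cache_show_sketch(struct seq_file *s, void *v)
    {
            unsigned long flags;

            spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
            /* ... walk the address cache, seq_printf() each entry ... */
            spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
            return 0;
    }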

    Fixes: 5ca85ae6318d ("powerpc/eeh_cache: Add a way to dump the EEH address cache")
    Signed-off-by: Qian Cai
    Reviewed-by: Oliver O'Halloran
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20201028152717.8967-1-cai@redhat.com

    Qian Cai
     

26 Oct, 2020

1 commit

  • Use a more generic form for __section that requires quotes to avoid
    complications with clang and gcc differences.

    Remove the quote operator # from compiler_attributes.h __section macro.

    Convert all unquoted __section(foo) uses to quoted __section("foo").
    Also convert __attribute__((section("foo"))) uses to __section("foo")
    even if the __attribute__ has multiple list entry forms.

    Conversion done using the script at:

    https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl
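
    For illustration, the shape of the conversion (variable names
    hypothetical):

    /* Before (relied on the macro stringifying its argument):
     *   static int flag  __section(.init.data);
     *   static int flag2 __attribute__((section(".init.data")));
     */

    /* After: one quoted form everywhere. */
    static int flag  __section(".init.data");
    static int flag2 __section(".init.data");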

    Signed-off-by: Joe Perches
    Reviewed-by: Nick Desaulniers
    Reviewed-by: Miguel Ojeda
    Signed-off-by: Linus Torvalds

    Joe Perches
     

25 Oct, 2020

1 commit

  • Pull powerpc fixes from Michael Ellerman:

    - A fix for undetected data corruption on Power9 Nimbus in the
      emulation of VSX CI loads, among other fixes:
    powerpc/eeh: Fix eeh_dev_check_failure() for PE#0
    powerpc/64s: Remove TM from Power10 features
    selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load workaround
    powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation
    powerpc/powernv/dump: Handle multiple writes to ack attribute
    powerpc/powernv/dump: Fix race while processing OPAL dump
    powerpc/smp: Use GFP_ATOMIC while allocating tmp mask
    powerpc/smp: Remove unnecessary variable
    powerpc/mce: Avoid nmi_enter/exit in real mode on pseries hash
    powerpc/opal_elog: Handle multiple writes to ack attribute

    Linus Torvalds
     

24 Oct, 2020

1 commit

  • Pull arch task_work cleanups from Jens Axboe:
    "Two cleanups that don't fit other categories:

    - Finally get the task_work_add() cleanup done properly, so we don't
    have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
    all callers, and also fixes up the documentation for
    task_work_add().

    - While working on some TIF related changes for 5.11, this
    TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
    duplication for how that is handled"

    * tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
    task_work: cleanup notification modes
    tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()

    Linus Torvalds
     

23 Oct, 2020

2 commits

  • Pull Kbuild updates from Masahiro Yamada:

    - Support 'make compile_commands.json' to generate the compilation
    database more easily, avoiding stale entries

    - Support 'make clang-analyzer' and 'make clang-tidy' for static checks
    using clang-tidy

    - Preprocess scripts/modules.lds.S to allow CONFIG options in the
    module linker script

    - Drop cc-option tests from compiler flags supported by our minimal
    GCC/Clang versions

    - Always use a 12-digit commit hash for CONFIG_LOCALVERSION_AUTO=y

    - Use sha1 build id for both BFD linker and LLD

    - Improve deb-pkg for reproducible builds and rootless builds

    - Remove stale, useless scripts/namespace.pl

    - Turn -Wreturn-type warning into error

    - Fix build error of deb-pkg when CONFIG_MODULES=n

    - Replace 'hostname' command with more portable 'uname -n'

    - Various Makefile cleanups

    * tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
    kbuild: Use uname for LINUX_COMPILE_HOST detection
    kbuild: Only add -fno-var-tracking-assignments for old GCC versions
    kbuild: remove leftover comment for filechk utility
    treewide: remove DISABLE_LTO
    kbuild: deb-pkg: clean up package name variables
    kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n
    kbuild: enforce -Werror=return-type
    scripts: remove namespace.pl
    builddeb: Add support for all required debian/rules targets
    builddeb: Enable rootless builds
    builddeb: Pass -n to gzip for reproducible packages
    kbuild: split the build log of kallsyms
    kbuild: explicitly specify the build id style
    scripts/setlocalversion: make git describe output more reliable
    kbuild: remove cc-option test of -Werror=date-time
    kbuild: remove cc-option test of -fno-stack-check
    kbuild: remove cc-option test of -fno-strict-overflow
    kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles
    kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan
    kbuild: do not create built-in objects for external module builds
    ...

    Linus Torvalds
     
  • Pull initial set_fs() removal from Al Viro:
    "Christoph's set_fs base series + fixups"

    * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Allow a NULL pos pointer to __kernel_read
    fs: Allow a NULL pos pointer to __kernel_write
    powerpc: remove address space overrides using set_fs()
    powerpc: use non-set_fs based maccess routines
    x86: remove address space overrides using set_fs()
    x86: make TASK_SIZE_MAX usable from assembly code
    x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
    lkdtm: remove set_fs-based tests
    test_bitmap: remove user bitmap tests
    uaccess: add infrastructure for kernel builds with set_fs()
    fs: don't allow splice read/write without explicit ops
    fs: don't allow kernel reads and writes without iter ops
    sysctl: Convert to iter interfaces
    proc: add a read_iter method to proc proc_ops
    proc: cleanup the compat vs no compat file ops
    proc: remove a level of indentation in proc_get_inode

    Linus Torvalds
     

22 Oct, 2020

1 commit

  • In commit 269e583357df ("powerpc/eeh: Delete eeh_pe->config_addr") the
    following simplification was made:

    - if (!pe->addr && !pe->config_addr) {
    + if (!pe->addr) {
              eeh_stats.no_cfg_addr++;
              return 0;
      }

    This introduced a bug which causes EEH checking to be skipped for
    devices in PE#0.

    Before the change above the check would always pass since at least one
    of the two PE addresses would be non-zero in all circumstances. On
    PowerNV pe->config_addr would be the BDFN of the first device added to
    the PE. The zero BDFN is reserved for the PHB's root port, but this is
    fine since for obscure platform reasons the root port is never
    assigned to PE#0.

    Similarly, on pseries pe->addr has always been non-zero for the
    reasons outlined in commit 42de19d5ef71 ("powerpc/pseries/eeh: Allow
    zero to be a valid PE configuration address").

    We can fix the problem by deleting the block entirely. The original
    purpose of this test was to avoid performing EEH checks on devices
    that were not on an EEH capable bus. In modern Linux the edev->pe
    pointer will be NULL for devices that are not on an EEH capable bus.
    The code block immediately above this one already checks for the
    edev->pe == NULL case so this test (new and old) is entirely
    redundant.

    Ideally we'd delete eeh_stats.no_cfg_addr too since nothing increments
    it any more. Unfortunately, that information is exposed via
    /proc/powerpc/eeh which means it's technically ABI. We could make it
    hard-coded, but that's a change for another patch.

    Fixes: 269e583357df ("powerpc/eeh: Delete eeh_pe->config_addr")
    Signed-off-by: Oliver O'Halloran
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20201021232554.1434687-1-oohall@gmail.com

    Oliver O'Halloran
     

20 Oct, 2020

2 commits

  • ISA v3.1 removes transactional memory and hence it should not be present
    in cpu_features or cpu_user_features2. Remove CPU_FTR_TM_COMP from
    CPU_FTRS_POWER10. Remove PPC_FEATURE2_HTM_COMP and
    PPC_FEATURE2_HTM_NOSC_COMP from COMMON_USER2_POWER10.

    Fixes: a3ea40d5c736 ("powerpc: Add POWER10 architected mode")
    Signed-off-by: Jordan Niethe
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200827035529.900-1-jniethe5@gmail.com

    Jordan Niethe
     
  • __get_user_atomic_128_aligned() stores to kaddr using stvx which is a
    VMX store instruction, hence kaddr must be 16-byte aligned, otherwise
    the store won't occur as expected.

    Unfortunately when we call __get_user_atomic_128_aligned() in
    p9_hmi_special_emu(), the buffer we pass as kaddr (ie. vbuf) isn't
    guaranteed to be 16B aligned. This means that the write to vbuf in
    __get_user_atomic_128_aligned() has the bottom bits of the address
    truncated. This results in other local variables being
    overwritten. Also vbuf will not contain the correct data which results
    in the userspace emulation being wrong and hence undetected user data
    corruption.

    In the past we've been mostly lucky as vbuf has ended up aligned but
    this is fragile and isn't always true. CONFIG_STACKPROTECTOR in
    particular can change the stack arrangement enough that our luck runs
    out.

    This issue only occurs on POWER9 Nimbus.
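
    A sketch of the kind of remedy this implies (buffer size illustrative;
    __aligned is the kernel's alignment attribute):

    static void p9_hmi_special_emu_sketch(void)
    {
            /* Force 16-byte alignment so the stvx underlying
             * __get_user_atomic_128_aligned() cannot silently truncate
             * the target address. */
            unsigned char vbuf[16] __aligned(16);

            /* ... __get_user_atomic_128_aligned(vbuf, ea, ...) ... */
            (void)vbuf;
    }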
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20201013043741.743413-1-mikey@neuling.org

    Michael Neuling