20 Jan, 2021
3 commits
-
This is the 5.10.8 stable release
* tag 'v5.10.8': (104 commits)
Linux 5.10.8
tools headers UAPI: Sync linux/fscrypt.h with the kernel sources
drm/panfrost: Remove unused variables in panfrost_job_close()
...Signed-off-by: Jason Liu
-
This is the 5.10.7 stable release
* tag 'v5.10.7': (144 commits)
Linux 5.10.7
scsi: target: Fix XCOPY NAA identifier lookup
rtlwifi: rise completion at the last step of firmware callback
...Signed-off-by: Jason Liu
-
This is the 5.10.5 stable release
* tag 'v5.10.5': (63 commits)
Linux 5.10.5
device-dax: Fix range release
ext4: avoid s_mb_prefetch to be zero in individual scenarios
...Signed-off-by: Jason Liu
17 Jan, 2021
1 commit
-
[ Upstream commit 98bf2d3f4970179c702ef64db658e0553bc6ef3a ]
When we have VMAP stack, exception prolog 1 sets r1, not r11.
When it is not an RTAS machine check, don't trash r1 because it is
needed by prolog 1.Fixes: da7bb43ab9da ("powerpc/32: Fix vmap stack - Properly set r1 before activating MMU")
Fixes: d2e006036082 ("powerpc/32: Use SPRN_SPRG_SCRATCH2 in exception prologs")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Christophe Leroy
[mpe: Squash in fixup for RTAS machine check from Christophe]
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/bc77d61d1c18940e456a2dee464f1e2eda65a3f0.1608621048.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin
13 Jan, 2021
1 commit
-
commit 3ce47d95b7346dcafd9bed3556a8d072cb2b8571 upstream.
Commit eff8728fe698 ("vmlinux.lds.h: Add PGO and AutoFDO input
sections") added ".text.unlikely.*" and ".text.hot.*" due to an LLVM
change [1].After another LLVM change [2], these sections are seen in some PowerPC
builds, where there is a orphan section warning then build failure:$ make -skj"$(nproc)" \
ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 O=out \
distclean powernv_defconfig zImage.epapr
ld.lld: warning: kernel/built-in.a(panic.o):(.text.unlikely.) is being placed in '.text.unlikely.'
...
ld.lld: warning: address (0xc000000000009314) of section .text is not a multiple of alignment (256)
...
ERROR: start_text address is c000000000009400, should be c000000000008000
ERROR: try to enable LD_HEAD_STUB_CATCH config option
ERROR: see comments in arch/powerpc/tools/head_check.sh
...Explicitly handle these sections like in the main linker script so
there is no more build failure.[1]: https://reviews.llvm.org/D79600
[2]: https://reviews.llvm.org/D92493Fixes: 83a092cf95f2 ("powerpc: Link warning for orphan sections")
Cc: stable@vger.kernel.org
Signed-off-by: Nathan Chancellor
Signed-off-by: Michael Ellerman
Link: https://github.com/ClangBuiltLinux/linux/issues/1218
Link: https://lore.kernel.org/r/20210104205952.1399409-1-natechancellor@gmail.com
Signed-off-by: Greg Kroah-Hartman
06 Jan, 2021
1 commit
-
[ Upstream commit 59d512e4374b2d8a6ad341475dc94c4a4bdec7d3 ]
This is way to catch some cases of decrementer overflow, when the
decrementer has underflowed an odd number of times, while MSR[EE] was
disabled.With a typical small decrementer, a timer that fires when MSR[EE] is
disabled will be "lost" if MSR[EE] remains disabled for between 4.3 and
8.6 seconds after the timer expires. In any case, the decrementer
interrupt would be taken at 8.6 seconds and the timer would be found at
that point.So this check is for catching extreme latency events, and it prevents
those latencies from being a further few seconds long. It's not obvious
this is a good tradeoff. This is already a watchdog magnitude event and
that situation is not improved a significantly with this check. For
large decrementers, it's useless.Therefore remove this check, which avoids a mftb when enabling hard
disabled interrupts (e.g., when enabling after coming from hardware
interrupt handlers). Perhaps more importantly, it also removes the
clunky MSR[EE] vs PACA_IRQ_HARD_DIS incoherency in soft-interrupt replay
which simplifies the code.Signed-off-by: Nicholas Piggin
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20201107014336.2337337-1-npiggin@gmail.com
Signed-off-by: Sasha Levin
04 Jan, 2021
1 commit
-
This is the 5.10.4 stable release
* tag 'v5.10.4': (717 commits)
Linux 5.10.4
x86/CPU/AMD: Save AMD NodeId as cpu_die_id
drm/edid: fix objtool warning in drm_cvt_modes()
...Signed-off-by: Jason Liu
Conflicts:
drivers/gpu/drm/imx/dcss/dcss-plane.c
drivers/media/i2c/ov5640.c
30 Dec, 2020
6 commits
-
commit f10881a46f8914428110d110140a455c66bdf27b upstream.
Commit bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
introduced the following error when invoking the errinjct userspace
tool:[root@ltcalpine2-lp5 librtas]# errinjct open
[327884.071171] sys_rtas: RTAS call blocked - exploit attempt?
[327884.071186] sys_rtas: token=0x26, nargs=0 (called by errinjct)
errinjct: Could not open RTAS error injection facility
errinjct: librtas: open: Unexpected I/O errorThe entry for ibm,open-errinjct in rtas_filter array has a typo where
the "j" is omitted in the rtas call name. After fixing this typo the
errinjct tool functions again as expected.[root@ltcalpine2-lp5 linux]# errinjct open
RTAS error injection facility open, token = 1Fixes: bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
Cc: stable@vger.kernel.org
Signed-off-by: Tyrel Datwyler
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20201208195434.8289-1-tyreld@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman -
commit d5c243989fb0cb03c74d7340daca3b819f706ee7 upstream.
We need r1 to be properly set before activating MMU, otherwise any new
exception taken while saving registers into the stack in syscall
prologs will use the user stack, which is wrong and will even lockup
or crash when KUAP is selected.Do that by switching the meaning of r11 and r1 until we have saved r1
to the stack: copy r1 into r11 and setup the new stack pointer in r1.
To avoid complicating and impacting all generic and specific prolog
code (and more), copy back r1 into r11 once r11 is save onto
the stack.We could get rid of copying r1 back and forth at the cost of rewriting
everything to use r1 instead of r11 all the way when CONFIG_VMAP_STACK
is set, but the effort is probably not worth it for now.Fixes: da7bb43ab9da ("powerpc/32: Fix vmap stack - Properly set r1 before activating MMU")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Christophe Leroy
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/a3d819d5c348cee9783a311d5d3f3ba9b48fd219.1608531452.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 9014eab6a38c60fd185bc92ed60f46cf99a462ab ]
It fixes this link warning:
WARNING: modpost: vmlinux.o(.text.unlikely+0x2d98): Section mismatch in reference from the function init_big_cores.isra.0() to the function .init.text:init_thread_group_cache_map()
The function init_big_cores.isra.0() references
the function __init init_thread_group_cache_map().
This is often because init_big_cores.isra.0 lacks a __init
annotation or the annotation of init_thread_group_cache_map is wrong.Fixes: 425752c63b6f ("powerpc: Detect the presence of big-cores via "ibm, thread-groups"")
Signed-off-by: Cédric Le Goater
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20201221074154.403779-1-clg@kaod.org
Signed-off-by: Sasha Levin -
[ Upstream commit fe18a35e685c9bdabc8b11b3e19deb85a068b75d ]
Commit 63ce271b5e37 ("powerpc/prom: convert PROM_BUG() to standard
trap") added an EMIT_BUG_ENTRY for the trap after the branch to
start_kernel(). The EMIT_BUG_ENTRY was for the address "0b", however the
trap was not labeled with "0". Hence the address used for bug is in
relative_toc() where the previous "0" label is. Label the trap as "0" so
the correct address is used.Fixes: 63ce271b5e37 ("powerpc/prom: convert PROM_BUG() to standard trap")
Signed-off-by: Jordan Niethe
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20201130004404.30953-1-jniethe5@gmail.com
Signed-off-by: Sasha Levin -
[ Upstream commit a7223f5bfcaeade4a86d35263493bcda6c940891 ]
Commit 7053f80d9696 ("powerpc/64: Prevent stack protection in early
boot") introduced a couple of uses of __attribute__((optimize)) with
function scope, to disable the stack protector in some early boot
code.Unfortunately, and this is documented in the GCC man pages [0],
overriding function attributes for optimization is broken, and is only
supported for debug scenarios, not for production: the problem appears
to be that setting GCC -f flags using this method will cause it to
forget about some or all other optimization settings that have been
applied.So the only safe way to disable the stack protector is to disable it
for the entire source file.[0] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
Fixes: 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot")
Signed-off-by: Ard Biesheuvel
[mpe: Drop one remaining use of __nostackprotector, reported by snowpatch]
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20201028080433.26799-1-ardb@kernel.org
Signed-off-by: Sasha Levin -
[ Upstream commit 3c0b976bf20d236c57adcefa80f86a0a1d737727 ]
Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore()
is called before a stack has been set up in r1. This was previously fine
as the cpu_restore() functions were implemented in assembly and did not
use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new
device tree binding for discovering CPU features") used
__restore_cpu_cpufeatures() as the cpu_restore() function for a
device-tree features based cputable entry. This is a C function and
hence uses a stack in r1.generic_secondary_smp_init() is entered on the secondary cpus via the
primary cpu using the OPAL call opal_start_cpu(). In OPAL, each hardware
thread has its own stack. The OPAL call is ran in the primary's hardware
thread. During the call, a job is scheduled on a secondary cpu that will
start executing at the address of generic_secondary_smp_init(). Hence
the value that will be left in r1 when the secondary cpu enters the
kernel is part of that secondary cpu's individual OPAL stack. This means
that __restore_cpu_cpufeatures() will write to that OPAL stack. This is
not horribly bad as each hardware thread has its own stack and the call
that enters the kernel from OPAL never returns, but it is still wrong
and should be corrected.Create the temp kernel stack before calling cpu_restore().
As noted by mpe, for a kexec boot, the secondary CPUs are released from
the spin loop at address 0x60 by smp_release_cpus() and then jump to
generic_secondary_smp_init(). The call to smp_release_cpus() is in
setup_arch(), and it comes before the call to emergency_stack_init().
emergency_stack_init() allocates an emergency stack in the PACA for each
CPU. This address in the PACA is what is used to set up the temp kernel
stack in generic_secondary_smp_init(). Move releasing the secondary CPUs
to after the PACAs have been allocated an emergency stack, otherwise the
PACA stack pointer will contain garbage and hence the temp kernel stack
created from it will be broken.Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
Signed-off-by: Jordan Niethe
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20201014072837.24539-1-jniethe5@gmail.com
Signed-off-by: Sasha Levin
18 Dec, 2020
1 commit
-
* origin/arch/qoriq: (6 commits)
soc: fsl: qbman: Ensure device cleanup is run for kexec
drivers/soc/fsl: add EPU FSM configuration for deep sleep
fsl_pmc: update device bindings
powerpc/pm: Fix suspend=n in menuconfig for e500mc platforms.
powerpc/pm: add sleep and deep sleep on QorIQ SoCs
...
14 Dec, 2020
3 commits
-
In sleep mode, the clocks of CPU core and unused IP blocks are turned
off (IP blocks allowed to wake up system will running).Some QorIQ SoCs like MPC8536, P1022 and T104x, have deep sleep PM mode
in addtion to the sleep PM mode. While in deep sleep mode,
additionally, the power supply is removed from CPU core and most IP
blocks. Only the blocks needed to wake up the chip out of deep sleep
are ON.This feature supports 32-bit and 36-bit address space.
The sleep mode is equal to the Standby state in Linux. The deep sleep
mode is equal to the Suspend-to-RAM state of Linux Power Management.
Command to enter sleep mode.
echo standby > /sys/power/state
Command to enter deep sleep mode.
echo mem > /sys/power/stateSigned-off-by: Dave Liu
Signed-off-by: Li Yang
Signed-off-by: Jin Qing
Signed-off-by: Jerry Huang
Signed-off-by: Ramneek Mehresh
Signed-off-by: Zhao Chenhui
Signed-off-by: Wang Dongsheng
Signed-off-by: Tang Yuantian
Signed-off-by: Xie Xiaobo
Signed-off-by: Zhao Qiang
Signed-off-by: Shengzhou Liu
Signed-off-by: Ran Wang -
Need to be separated when submitting upstream.
Signed-off-by: Li Yang
-
Various e500 core have different cache architecture, so they
need different cache flush operations. Therefore, add a callback
function cpu_flush_caches to the struct cpu_spec. The cache flush
operation for the specific kind of e500 is selected at init time.
The callback function will flush all caches in the current cpu.Signed-off-by: Chenhui Zhao
Reviewed-by: Yang Li
Reviewed-by: Jose Rivera
Signed-off-by: Ran Wang
30 Nov, 2020
1 commit
-
Pull locking fixes from Thomas Gleixner:
"Two more places which invoke tracing from RCU disabled regions in the
idle path.Similar to the entry path the low level idle functions have to be
non-instrumentable"* tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_idle: Fix intel_idle() vs tracing
sched/idle: Fix arch_cpu_idle() vs tracing
24 Nov, 2020
1 commit
-
We call arch_cpu_idle() with RCU disabled, but then use
local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.Switch all arch_cpu_idle() implementations to use
raw_local_irq_{en,dis}able() and carefully manage the
lockdep,rcu,tracing state like we do in entry.(XXX: we really should change arch_cpu_idle() to not return with
interrupts enabled)Reported-by: Sven Schnelle
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Mark Rutland
Tested-by: Mark Rutland
Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
23 Nov, 2020
1 commit
-
From Daniel's cover letter:
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern.This patch series flushes the L1 cache on kernel entry (patch 2) and after the
kernel performs any user accesses (patch 3). It also adds a self-test and
performs some related cleanups.
19 Nov, 2020
3 commits
-
In kup.h we currently include kup-radix.h for all 64-bit builds, which
includes Book3S and Book3E. The latter doesn't make sense, Book3E
never uses the Radix MMU.This has worked up until now, but almost by accident, and the recent
uaccess flush changes introduced a build breakage on Book3E because of
the bad structure of the code.So disentangle things so that we only use kup-radix.h for Book3S. This
requires some more stubs in kup.h and fixing an include in
syscall_64.c.Signed-off-by: Michael Ellerman
-
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This patch flushes the L1 cache after user accesses.This is part of the fix for CVE-2020-4788.
Signed-off-by: Nicholas Piggin
Signed-off-by: Daniel Axtens
Signed-off-by: Michael Ellerman -
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This patch flushes the L1 cache on kernel entry.This is part of the fix for CVE-2020-4788.
Signed-off-by: Nicholas Piggin
Signed-off-by: Daniel Axtens
Signed-off-by: Michael Ellerman
18 Nov, 2020
1 commit
-
Commit 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR
interrupts when PR KVM is supported") removed KVM guest tests from
interrupts that do not set HV=1, when PR-KVM is not configured.This is wrong for HV-KVM HPT guest MMIO emulation case which attempts
to load the faulting instruction word with MSR[DR]=1 and MSR[HV]=1 with
the guest MMU context loaded. This can cause host DSI, DSLB interrupts
which must test for KVM guest. Restore this and add a comment.Fixes: 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Nicholas Piggin
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20201117135617.3521127-1-npiggin@gmail.com
16 Nov, 2020
1 commit
-
pseries guest kernels have a FWNMI handler for SRESET and MCE NMIs,
which is basically the same as the regular handlers for those
interrupts.The system reset FWNMI handler did not have a KVM guest test in it,
although it probably should have because the guest can itself run
guests.Commit 4f50541f6703b ("powerpc/64s/exception: Move all interrupt
handlers to new style code gen macros") convert the handler faithfully
to avoid a KVM test with a "clever" trick to modify the IKVM_REAL
setting to 0 when the fwnmi handler is to be generated (PPC_PSERIES=y).
This worked when the KVM test was generated in the interrupt entry
handlers, but a later patch moved the KVM test to the common handler,
and the common handler macro is expanded below the fwnmi entry. This
prevents the KVM test from being generated even for the 0x100 entry
point as well.The result is NMI IPIs in the host kernel when a guest is running will
use gest registers. This goes particularly badly when an HPT guest is
running and the MMU is set to guest mode.Remove this trickery and just generate the test always.
Fixes: 9600f261acaa ("powerpc/64s/exception: Move KVM test to common code")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Nicholas Piggin
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20201114114743.3306283-1-npiggin@gmail.com
08 Nov, 2020
1 commit
-
When calling early_hash_table(), the kernel hasn't been yet
relocated to its linking address, so data must be addressed
with relocation offset.Add relocation offset to write into Hash in early_hash_table().
Fixes: 69a1593abdbc ("powerpc/32s: Setup the early hash table at all time.")
Reported-by: Erhard Furtner
Reported-by: Andreas Schwab
Signed-off-by: Christophe Leroy
Tested-by: Serge Belyshev
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/9e225a856a8b22e0e77587ee22ab7a2f5bca8753.1604740029.git.christophe.leroy@csgroup.eu
05 Nov, 2020
4 commits
-
When _PAGE_ACCESSED is not set, a minor fault is expected.
To do this, TLB miss exception ANDs _PAGE_PRESENT and _PAGE_ACCESSED
into the L2 entry valid bit.To simplify the processing and reduce the number of instructions in
TLB miss exceptions, manage it as an APG bit and get it next to
_PAGE_GUARDED bit to allow a copy in one go. Then declare the
corresponding groups as handling all accesses as user accesses.
As the PP bits always define user as No Access, it will generate
a fault.Signed-off-by: Christophe Leroy
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/80f488db230c6b0e7b3b990d72bd94a8a069e93e.1602492856.git.christophe.leroy@csgroup.eu -
The kernel expects pte_young() to work regardless of CONFIG_SWAP.
Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.This adds at least 3 instructions to the TLB miss exception
handlers fast path. Following patch will reduce this overhead.Also update the rotation instruction to the correct number of bits
to reflect all changes done to _PAGE_ACCESSED over time.Fixes: d069cb4373fe ("powerpc/8xx: Don't touch ACCESSED when no SWAP.")
Fixes: 5f356497c384 ("powerpc/8xx: remove unused _PAGE_WRITETHRU")
Fixes: e0a8e0d90a9f ("powerpc/8xx: Handle PAGE_USER via APG bits")
Fixes: 5b2753fc3e8a ("powerpc/8xx: Implementation of PAGE_EXEC")
Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.1602492856.git.christophe.leroy@csgroup.eu -
The kernel expects pte_young() to work regardless of CONFIG_SWAP.
Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.Fixes: 2c74e2586bb9 ("powerpc/40x: Rework 40x PTE access and TLB miss")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/b02ca2ed2d3676a096219b48c0f69ec982a75bcf.1602342801.git.christophe.leroy@csgroup.eu -
The kernel expects pte_young() to work regardless of CONFIG_SWAP.
Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.Fixes: 84de6ab0e904 ("powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/a44367744de54e2315b2f1a8cbbd7f88488072e0.1602342806.git.christophe.leroy@csgroup.eu
02 Nov, 2020
2 commits
-
The call to rcu_cpu_starting() in start_secondary() is not early
enough in the CPU-hotplug onlining process, which results in lockdep
splats as follows (with CONFIG_PROVE_RCU_LIST=y):WARNING: suspicious RCU usage
-----------------------------
kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.Call Trace:
dump_stack+0xec/0x144 (unreliable)
lockdep_rcu_suspicious+0x128/0x14c
__lock_acquire+0x1060/0x1c60
lock_acquire+0x140/0x5f0
_raw_spin_lock_irqsave+0x64/0xb0
clockevents_register_device+0x74/0x270
register_decrementer_clockevent+0x94/0x110
start_secondary+0x134/0x800
start_secondary_prolog+0x10/0x14This is avoided by adding a call to rcu_cpu_starting() near the
beginning of the start_secondary() function. Note that the
raw_smp_processor_id() is required in order to avoid calling into
lockdep before RCU has declared the CPU to be watched for readers.It's safe to call rcu_cpu_starting() in the arch code as well as later
in generic code, as explained by Paul:It uses a per-CPU variable so that RCU pays attention only to the
first call to rcu_cpu_starting() if there is more than one of them.
This is even intentional, due to there being a generic
arch-independent call to rcu_cpu_starting() in
notify_cpu_starting().So multiple calls to rcu_cpu_starting() are fine by design.
Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")
Signed-off-by: Qian Cai
Acked-by: Paul E. McKenney
[mpe: Add Fixes tag, reword slightly & expand change log]
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20201028182334.13466-1-cai@redhat.com -
Lockdep complains that a possible deadlock below in
eeh_addr_cache_show() because it is acquiring a lock with IRQ enabled,
but eeh_addr_cache_insert_dev() needs to acquire the same lock with IRQ
disabled. Let's just make eeh_addr_cache_show() acquire the lock with
IRQ disabled as well.CPU0 CPU1
---- ----
lock(&pci_io_addr_cache_root.piar_lock);
local_irq_disable();
lock(&tp->lock);
lock(&pci_io_addr_cache_root.piar_lock);
lock(&tp->lock);*** DEADLOCK ***
lock_acquire+0x140/0x5f0
_raw_spin_lock_irqsave+0x64/0xb0
eeh_addr_cache_insert_dev+0x48/0x390
eeh_probe_device+0xb8/0x1a0
pnv_pcibios_bus_add_device+0x3c/0x80
pcibios_bus_add_device+0x118/0x290
pci_bus_add_device+0x28/0xe0
pci_bus_add_devices+0x54/0xb0
pcibios_init+0xc4/0x124
do_one_initcall+0xac/0x528
kernel_init_freeable+0x35c/0x3fc
kernel_init+0x24/0x148
ret_from_kernel_thread+0x5c/0x80lock_acquire+0x140/0x5f0
_raw_spin_lock+0x4c/0x70
eeh_addr_cache_show+0x38/0x110
seq_read+0x1a0/0x660
vfs_read+0xc8/0x1f0
ksys_read+0x74/0x130
system_call_exception+0xf8/0x1d0
system_call_common+0xe8/0x218Fixes: 5ca85ae6318d ("powerpc/eeh_cache: Add a way to dump the EEH address cache")
Signed-off-by: Qian Cai
Reviewed-by: Oliver O'Halloran
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20201028152717.8967-1-cai@redhat.com
26 Oct, 2020
1 commit
-
Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.Remove the quote operator # from compiler_attributes.h __section macro.
Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.Conversion done using the script at:
https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl
Signed-off-by: Joe Perches
Reviewed-by: Nick Desaulniers
Reviewed-by: Miguel Ojeda
Signed-off-by: Linus Torvalds
25 Oct, 2020
1 commit
-
Pull powerpc fixes from Michael Ellerman:
- A fix for undetected data corruption on Power9 Nimbus constraint with GCC 4.9
powerpc/eeh: Fix eeh_dev_check_failure() for PE#0
powerpc/64s: Remove TM from Power10 features
selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load workaround
powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation
powerpc/powernv/dump: Handle multiple writes to ack attribute
powerpc/powernv/dump: Fix race while processing OPAL dump
powerpc/smp: Use GFP_ATOMIC while allocating tmp mask
powerpc/smp: Remove unnecessary variable
powerpc/mce: Avoid nmi_enter/exit in real mode on pseries hash
powerpc/opal_elog: Handle multiple writes to ack attribute
24 Oct, 2020
1 commit
-
Pull arch task_work cleanups from Jens Axboe:
"Two cleanups that don't fit other categories:- Finally get the task_work_add() cleanup done properly, so we don't
have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
all callers, and also fixes up the documentation for
task_work_add().- While working on some TIF related changes for 5.11, this
TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
duplication for how that is handled"* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
task_work: cleanup notification modes
tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
23 Oct, 2020
2 commits
-
Pull Kbuild updates from Masahiro Yamada:
- Support 'make compile_commands.json' to generate the compilation
database more easily, avoiding stale entries- Support 'make clang-analyzer' and 'make clang-tidy' for static checks
using clang-tidy- Preprocess scripts/modules.lds.S to allow CONFIG options in the
module linker script- Drop cc-option tests from compiler flags supported by our minimal
GCC/Clang versions- Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y
- Use sha1 build id for both BFD linker and LLD
- Improve deb-pkg for reproducible builds and rootless builds
- Remove stale, useless scripts/namespace.pl
- Turn -Wreturn-type warning into error
- Fix build error of deb-pkg when CONFIG_MODULES=n
- Replace 'hostname' command with more portable 'uname -n'
- Various Makefile cleanups
* tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
kbuild: Use uname for LINUX_COMPILE_HOST detection
kbuild: Only add -fno-var-tracking-assignments for old GCC versions
kbuild: remove leftover comment for filechk utility
treewide: remove DISABLE_LTO
kbuild: deb-pkg: clean up package name variables
kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n
kbuild: enforce -Werror=return-type
scripts: remove namespace.pl
builddeb: Add support for all required debian/rules targets
builddeb: Enable rootless builds
builddeb: Pass -n to gzip for reproducible packages
kbuild: split the build log of kallsyms
kbuild: explicitly specify the build id style
scripts/setlocalversion: make git describe output more reliable
kbuild: remove cc-option test of -Werror=date-time
kbuild: remove cc-option test of -fno-stack-check
kbuild: remove cc-option test of -fno-strict-overflow
kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles
kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan
kbuild: do not create built-in objects for external module builds
... -
Pull initial set_fs() removal from Al Viro:
"Christoph's set_fs base series + fixups"* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Allow a NULL pos pointer to __kernel_read
fs: Allow a NULL pos pointer to __kernel_write
powerpc: remove address space overrides using set_fs()
powerpc: use non-set_fs based maccess routines
x86: remove address space overrides using set_fs()
x86: make TASK_SIZE_MAX usable from assembly code
x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
lkdtm: remove set_fs-based tests
test_bitmap: remove user bitmap tests
uaccess: add infrastructure for kernel builds with set_fs()
fs: don't allow splice read/write without explicit ops
fs: don't allow kernel reads and writes without iter ops
sysctl: Convert to iter interfaces
proc: add a read_iter method to proc proc_ops
proc: cleanup the compat vs no compat file ops
proc: remove a level of indentation in proc_get_inode
22 Oct, 2020
1 commit
-
In commit 269e583357df ("powerpc/eeh: Delete eeh_pe->config_addr") the
following simplification was made:- if (!pe->addr && !pe->config_addr) {
+ if (!pe->addr) {
eeh_stats.no_cfg_addr++;
return 0;
}This introduced a bug which causes EEH checking to be skipped for
devices in PE#0.Before the change above the check would always pass since at least one
of the two PE addresses would be non-zero in all circumstances. On
PowerNV pe->config_addr would be the BDFN of the first device added to
the PE. The zero BDFN is reserved for the PHB's root port, but this is
fine since for obscure platform reasons the root port is never
assigned to PE#0.Similarly, on pseries pe->addr has always been non-zero for the
reasons outlined in commit 42de19d5ef71 ("powerpc/pseries/eeh: Allow
zero to be a valid PE configuration address").We can fix the problem by deleting the block entirely The original
purpose of this test was to avoid performing EEH checks on devices
that were not on an EEH capable bus. In modern Linux the edev->pe
pointer will be NULL for devices that are not on an EEH capable bus.
The code block immediately above this one already checks for the
edev->pe == NULL case so this test (new and old) is entirely
redundant.Ideally we'd delete eeh_stats.no_cfg_addr too since nothing increments
it any more. Unfortunately, that information is exposed via
/proc/powerpc/eeh which means it's technically ABI. We could make it
hard-coded, but that's a change for another patch.Fixes: 269e583357df ("powerpc/eeh: Delete eeh_pe->config_addr")
Signed-off-by: Oliver O'Halloran
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20201021232554.1434687-1-oohall@gmail.com
20 Oct, 2020
2 commits
-
ISA v3.1 removes transactional memory and hence it should not be present
in cpu_features or cpu_user_features2. Remove CPU_FTR_TM_COMP from
CPU_FTRS_POWER10. Remove PPC_FEATURE2_HTM_COMP and
PPC_FEATURE2_HTM_NOSC_COMP from COMMON_USER2_POWER10.Fixes: a3ea40d5c736 ("powerpc: Add POWER10 architected mode")
Signed-off-by: Jordan Niethe
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20200827035529.900-1-jniethe5@gmail.com -
__get_user_atomic_128_aligned() stores to kaddr using stvx which is a
VMX store instruction, hence kaddr must be 16 byte aligned otherwise
the store won't occur as expected.Unfortunately when we call __get_user_atomic_128_aligned() in
p9_hmi_special_emu(), the buffer we pass as kaddr (ie. vbuf) isn't
guaranteed to be 16B aligned. This means that the write to vbuf in
__get_user_atomic_128_aligned() has the bottom bits of the address
truncated. This results in other local variables being
overwritten. Also vbuf will not contain the correct data which results
in the userspace emulation being wrong and hence undetected user data
corruption.In the past we've been mostly lucky as vbuf has ended up aligned but
this is fragile and isn't always true. CONFIG_STACKPROTECTOR in
particular can change the stack arrangement enough that our luck runs
out.This issue only occurs on POWER9 Nimbus
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20201013043741.743413-1-mikey@neuling.org