20 Oct, 2018
3 commits
-
commit 8183d99f4a22c2abbc543847a588df3666ef0c0c upstream.
feature fixups need to use patch_instruction() early in the boot,
even before the code is relocated to its final address, requiring
patch_instruction() to use PTRRELOC() in order to address data.But feature fixups applies on code before it is set to read only,
even for modules. Therefore, feature fixups can use
raw_patch_instruction() instead.Signed-off-by: Christophe Leroy
Signed-off-by: Michael Ellerman
Reported-by: David Gounaris
Tested-by: David Gounaris
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 96dc89d526ef77604376f06220e3d2931a0bfd58 ]
Current we store the userspace r1 to PACATMSCRATCH before finally
saving it to the thread struct.In theory an exception could be taken here (like a machine check or
SLB miss) that could write PACATMSCRATCH and hence corrupt the
userspace r1. The SLB fault currently doesn't touch PACATMSCRATCH, but
others do.We've never actually seen this happen but it's theoretically
possible. Either way, the code is fragile as it is.This patch saves r1 to the kernel stack (which can't fault) before we
turn MSR[RI] back on. PACATMSCRATCH is still used but only with
MSR[RI] off. We then copy r1 from the kernel stack to the thread
struct once we have MSR[RI] back on.Suggested-by: Breno Leitao
Signed-off-by: Michael Neuling
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit cf13435b730a502e814c63c84d93db131e563f5f ]
When we treclaim we store the userspace checkpointed r13 to a scratch
SPR and then later save the scratch SPR to the user thread struct.Unfortunately, this doesn't work as accessing the user thread struct
can take an SLB fault and the SLB fault handler will write the same
scratch SPRG that now contains the userspace r13.To fix this, we store r13 to the kernel stack (which can't fault)
before we access the user thread struct.Found by running P8 guest + powervm + disable_1tb_segments + TM. Seen
as a random userspace segfault with r13 looking like a kernel address.Signed-off-by: Michael Neuling
Reviewed-by: Breno Leitao
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
18 Oct, 2018
1 commit
-
commit 4628a64591e6cee181237060961e98c615c33966 upstream.
Currently _PAGE_DEVMAP bit is not preserved in mprotect(2) calls. As a
result we will see warnings such as:BUG: Bad page map in process JobWrk0013 pte:800001803875ea25 pmd:7624381067
addr:00007f0930720000 vm_flags:280000f9 anon_vma: (null) mapping:ffff97f2384056f0 index:0
file:457-000000fe00000030-00000009-000000ca-00000001_2001.fileblock fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap [xfs] readpage: (null)
CPU: 3 PID: 15848 Comm: JobWrk0013 Tainted: G W 4.12.14-2.g7573215-default #1 SLE12-SP4 (unreleased)
Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018
Call Trace:
dump_stack+0x5a/0x75
print_bad_pte+0x217/0x2c0
? enqueue_task_fair+0x76/0x9f0
_vm_normal_page+0xe5/0x100
zap_pte_range+0x148/0x740
unmap_page_range+0x39a/0x4b0
unmap_vmas+0x42/0x90
unmap_region+0x99/0xf0
? vma_gap_callbacks_rotate+0x1a/0x20
do_munmap+0x255/0x3a0
vm_munmap+0x54/0x80
SyS_munmap+0x1d/0x30
do_syscall_64+0x74/0x150
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
...when mprotect(2) gets used on DAX mappings. Also there is a wide variety
of other failures that can result from the missing _PAGE_DEVMAP flag
when the area gets used by get_user_pages() later.Fix the problem by including _PAGE_DEVMAP in a set of flags that get
preserved by mprotect(2).Fixes: 69660fd797c3 ("x86, mm: introduce _PAGE_DEVMAP")
Fixes: ebd31197931d ("powerpc/mm: Add devmap support for ppc64")
Cc:
Signed-off-by: Jan Kara
Acked-by: Michal Hocko
Reviewed-by: Johannes Thumshirn
Signed-off-by: Dan Williams
Signed-off-by: Greg Kroah-Hartman
13 Oct, 2018
3 commits
-
commit b45ba4a51cde29b2939365ef0c07ad34c8321789 upstream.
Commit 51c3c62b58b3 ("powerpc: Avoid code patching freed init
sections") accesses 'init_mem_is_free' flag too early, before the
kernel is relocated. This provokes early boot failure (before the
console is active).As it is not necessary to do this verification that early, this
patch moves the test into patch_instruction() instead of
__patch_instruction().This modification also has the advantage of avoiding unnecessary
remappings.Fixes: 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections")
Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Christophe Leroy
Signed-off-by: Michael Ellerman
Signed-off-by: Greg Kroah-Hartman -
commit 51c3c62b58b357e8d35e4cc32f7b4ec907426fe3 upstream.
This stops us from doing code patching in init sections after they've
been freed.In this chain:
kvm_guest_init() ->
kvm_use_magic_page() ->
fault_in_pages_readable() ->
__get_user() ->
__get_user_nocheck() ->
barrier_nospec();We have a code patching location at barrier_nospec() and
kvm_guest_init() is an init function. This whole chain gets inlined,
so when we free the init section (hence kvm_guest_init()), this code
goes away and hence should no longer be patched.We seen this as userspace memory corruption when using a memory
checker while doing partition migration testing on powervm (this
starts the code patching post migration via
/sys/kernel/mobility/migration). In theory, it could also happen when
using /sys/kernel/debug/powerpc/barrier_nospec.Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Michael Neuling
Reviewed-by: Nicholas Piggin
Reviewed-by: Christophe Leroy
Signed-off-by: Michael Ellerman
Signed-off-by: Greg Kroah-Hartman -
commit 8cf4c05712f04a405f0dacebcca8f042b391694a upstream.
patch_instruction() uses almost the same sequence as
__patch_instruction()This patch refactor it so that patch_instruction() uses
__patch_instruction() instead of duplicating code.Signed-off-by: Christophe Leroy
Acked-by: Balbir Singh
Signed-off-by: Michael Ellerman
Signed-off-by: Greg Kroah-Hartman
10 Oct, 2018
1 commit
-
[ Upstream commit 46dec40fb741f00f1864580130779aeeaf24fb3d ]
This fixes a bug which causes guest virtual addresses to get translated
to guest real addresses incorrectly when the guest is using the HPT MMU
and has more than 256GB of RAM, or more specifically has a HPT larger
than 2GB. This has showed up in testing as a failure of the host to
emulate doorbell instructions correctly on POWER9 for HPT guests with
more than 256GB of RAM.The bug is that the HPTE index in kvmppc_mmu_book3s_64_hv_xlate()
is stored as an int, and in forming the HPTE address, the index gets
shifted left 4 bits as an int before being signed-extended to 64 bits.
The simple fix is to make the variable a long int, matching the
return type of kvmppc_hv_find_lock_hpte(), which is what calculates
the index.Fixes: 697d3899dcb4 ("KVM: PPC: Implement MMIO emulation support for Book3S HV guests")
Signed-off-by: Paul Mackerras
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
04 Oct, 2018
2 commits
-
[ Upstream commit d3d4ffaae439981e1e441ebb125aa3588627c5d8 ]
We use PHB in mode1 which uses bit 59 to select a correct DMA window.
However there is mode2 which uses bits 59:55 and allows up to 32 DMA
windows per a PE.Even though documentation does not clearly specify that, it seems that
the actual hardware does not support bits 59:55 even in mode1, in other
words we can create a window as big as 1<
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 8950329c4a64c6d3ca0bc34711a1afbd9ce05657 ]
Memory reservation for crashkernel could fail if there are holes around
kdump kernel offset (128M). Fail gracefully in such cases and print an
error message.Signed-off-by: Hari Bathini
Tested-by: David Gibson
Reviewed-by: Dave Young
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
26 Sep, 2018
2 commits
-
[ Upstream commit 51eaa08f029c7343df846325d7cf047be8b96e81 ]
The call to of_find_compatible_node() is returning a pointer with
incremented refcount so it must be explicitly decremented after the
last use. As here it is only being used for checking of node presence
but the result is not actually used in the success path it can be
dropped immediately.Signed-off-by: Nicholas Mc Guire
Fixes: commit f725758b899f ("KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9")
Signed-off-by: Paul Mackerras
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit bd90284cc6c1c9e8e48c8eadd0c79574fcce0b81 ]
The intention here is to consume and discard the remaining buffer
upon error. This works if there has not been a previous partial write.
If there has been, then total_len is no longer total number of bytes
to copy. total_len is always "bytes left to copy", so it should be
added to written bytes.This code may not be exercised any more if partial writes will not be
hit, but this is a small bugfix before a larger change.Reviewed-by: Benjamin Herrenschmidt
Signed-off-by: Nicholas Piggin
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
20 Sep, 2018
1 commit
-
[ Upstream commit 9eab9901b015f489199105c470de1ffc337cfabb ]
We've encountered a performance issue when multiple processors stress
{get,put}_mmio_atsd_reg(). These functions contend for
mmio_atsd_usage, an unsigned long used as a bitmask.The accesses to mmio_atsd_usage are done using test_and_set_bit_lock()
and clear_bit_unlock(). As implemented, both of these will require
a (successful) stwcx to that same cache line.What we end up with is thread A, attempting to unlock, being slowed by
other threads repeatedly attempting to lock. A's stwcx instructions
fail and retry because the memory reservation is lost every time a
different thread beats it to the punch.There may be a long-term way to fix this at a larger scale, but for
now resolve the immediate problem by gating our call to
test_and_set_bit_lock() with one to test_bit(), which is obviously
implemented without using a store.Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2")
Signed-off-by: Reza Arbab
Acked-by: Alistair Popple
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
15 Sep, 2018
5 commits
-
[ Upstream commit 74e96bf44f430cf7a01de19ba6cf49b361cdfd6e ]
The global mce data buffer that used to copy rtas error log is of 2048
(RTAS_ERROR_LOG_MAX) bytes in size. Before the copy we read
extended_log_length from rtas error log header, then use max of
extended_log_length and RTAS_ERROR_LOG_MAX as a size of data to be copied.
Ideally the platform (phyp) will never send extended error log with
size > 2048. But if that happens, then we have a risk of buffer overrun
and corruption. Fix this by using min_t instead.Fixes: d368514c3097 ("powerpc: Fix corruption when grabbing FWNMI data")
Reported-by: Michal Suchanek
Signed-off-by: Mahesh Salgaonkar
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 78ee9946371f5848ddfc88ab1a43867df8f17d83 ]
Because rfi_flush_fallback runs immediately before the return to
userspace it currently runs with the user r1 (stack pointer). This
means if we oops in there we will report a bad kernel stack pointer in
the exception entry path, eg:Bad kernel stack pointer 7ffff7150e40 at c0000000000023b4
Oops: Bad kernel stack pointer, sig: 6 [#1]
LE SMP NR_CPUS=32 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 1246 Comm: klogd Not tainted 4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3 #7
NIP: c0000000000023b4 LR: 0000000010053e00 CTR: 0000000000000040
REGS: c0000000fffe7d40 TRAP: 4100 Not tainted (4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3)
MSR: 9000000002803031 CR: 44000442 XER: 20000000
CFAR: c00000000000bac8 IRQMASK: c0000000f1e66a80
GPR00: 0000000002000000 00007ffff7150e40 00007fff93a99900 0000000000000020
...
NIP [c0000000000023b4] rfi_flush_fallback+0x34/0x80
LR [0000000010053e00] 0x10053e00Although the NIP tells us where we were, and the TRAP number tells us
what happened, it would still be nicer if we could report the actual
exception rather than barfing about the stack pointer.We an do that fairly simply by loading the kernel stack pointer on
entry and restoring the user value before returning. That way we see a
regular oops such as:Unrecoverable exception 4100 at c00000000000239c
Oops: Unrecoverable exception, sig: 6 [#1]
LE SMP NR_CPUS=32 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 1251 Comm: klogd Not tainted 4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty #40
NIP: c00000000000239c LR: 0000000010053e00 CTR: 0000000000000040
REGS: c0000000f1e17bb0 TRAP: 4100 Not tainted (4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty)
MSR: 9000000002803031 CR: 44000442 XER: 20000000
CFAR: c00000000000bac8 IRQMASK: 0
...
NIP [c00000000000239c] rfi_flush_fallback+0x3c/0x80
LR [0000000010053e00] 0x10053e00
Call Trace:
[c0000000f1e17e30] [c00000000000b9e4] system_call+0x5c/0x70 (unreliable)Note this shouldn't make the kernel stack pointer vulnerable to a
meltdown attack, because it should be flushed from the cache before we
return to userspace. The user r1 value will be in the cache, because
we load it in the return path, but that is harmless.Signed-off-by: Michael Ellerman
Reviewed-by: Nicholas Piggin
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit f5daf77a55ef0e695cc90c440ed6503073ac5e07 ]
Fix build errors and warnings in t1042rdb_diu.c by adding header files
and MODULE_LICENSE().../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: warning: data definition has no type or storage class
early_initcall(t1042rdb_diu_init);
../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: error: type defaults to 'int' in declaration of 'early_initcall' [-Werror=implicit-int]
../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: warning: parameter names (without types) in function declarationand
WARNING: modpost: missing MODULE_LICENSE() in arch/powerpc/platforms/85xx/t1042rdb_diu.oSigned-off-by: Randy Dunlap
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Scott Wood
Cc: Kumar Gala
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit c42d3be0c06f0c1c416054022aa535c08a1f9b39 ]
The problem is the the calculation should be "end - start + 1" but the
plus one is missing in this calculation.Fixes: 8626816e905e ("powerpc: add support for MPIC message register API")
Signed-off-by: Dan Carpenter
Reviewed-by: Tyrel Datwyler
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit f7a6947cd49b7ff4e03f1b4f7e7b223003d752ca ]
Currently if you build a 32-bit powerpc kernel and use get_user() to
load a u64 value it will fail to build with eg:kernel/rseq.o: In function `rseq_get_rseq_cs':
kernel/rseq.c:123: undefined reference to `__get_user_bad'This is hitting the check in __get_user_size() that makes sure the
size we're copying doesn't exceed the size of the destination:#define __get_user_size(x, ptr, size, retval)
do {
retval = 0;
__chk_user_ptr(ptr);
if (size > sizeof(x))
(x) = __get_user_bad();Which doesn't immediately make sense because the size of the
destination is u64, but it's not really, because __get_user_check()
etc. internally create an unsigned long and copy into that:#define __get_user_check(x, ptr, size)
({
long __gu_err = -EFAULT;
unsigned long __gu_val = 0;The problem being that on 32-bit unsigned long is not big enough to
hold a u64. We can fix this with a trick from hpa in the x86 code, we
statically check the type of x and set the type of __gu_val to either
unsigned long or unsigned long long.Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
10 Sep, 2018
4 commits
-
commit 8cfbdbdc24815417a3ab35101ccf706b9a23ff17 upstream.
Commit 76fa4975f3ed ("KVM: PPC: Check if IOMMU page is contained in
the pinned physical page", 2018-07-17) added some checks to ensure
that guest DMA mappings don't attempt to map more than the guest is
entitled to access. However, errors in the logic mean that legitimate
guest requests to map pages for DMA are being denied in some
situations. Specifically, if the first page of the range passed to
mm_iommu_get() is mapped with a normal page, and subsequent pages are
mapped with transparent huge pages, we end up with mem->pageshift ==
0. That means that the page size checks in mm_iommu_ua_to_hpa() and
mm_iommu_up_to_hpa_rm() will always fail for every page in that
region, and thus the guest can never map any memory in that region for
DMA, typically leading to a flood of error messages like this:qemu-system-ppc64: VFIO_MAP_DMA: -22
qemu-system-ppc64: vfio_dma_map(0x10005f47780, 0x800000000000000, 0x10000, 0x7fff63ff0000) = -22 (Invalid argument)The logic errors in mm_iommu_get() are:
(a) use of 'ua' not 'ua + (i << PAGE_SHIFT)' in the find_linux_pte()
call (meaning that find_linux_pte() returns the pte for the
first address in the range, not the address we are currently up
to);
(b) use of 'pageshift' as the variable to receive the hugepage shift
returned by find_linux_pte() - for a normal page this gets set
to 0, leading to us setting mem->pageshift to 0 when we conclude
that the pte returned by find_linux_pte() didn't match the page
we were looking at;
(c) comparing 'compshift', which is a page order, i.e. log base 2 of
the number of pages, with 'pageshift', which is a log base 2 of
the number of bytes.To fix these problems, this patch introduces 'cur_ua' to hold the
current user address and uses that in the find_linux_pte() call;
introduces 'pteshift' to hold the hugepage shift found by
find_linux_pte(); and compares 'pteshift' with 'compshift +
PAGE_SHIFT' rather than 'compshift'.The patch also moves the local_irq_restore to the point after the PTE
pointer returned by find_linux_pte() has been dereferenced because
otherwise the PTE could change underneath us, and adds a check to
avoid doing the find_linux_pte() call once mem->pageshift has been
reduced to PAGE_SHIFT, as an optimization.Fixes: 76fa4975f3ed ("KVM: PPC: Check if IOMMU page is contained in the pinned physical page")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Paul Mackerras
Signed-off-by: Michael Ellerman
Signed-off-by: Greg Kroah-Hartman -
commit db2173198b9513f7add8009f225afa1f1c79bcc6 upstream.
The generic code is racy when multiple children of a PCI bridge try to
enable it simultaneously.This leads to drivers trying to access a device through a
not-yet-enabled bridge, and this EEH errors under various
circumstances when using parallel driver probing.There is work going on to fix that properly in the PCI core but it
will take some time.x86 gets away with it because (outside of hotplug), the BIOS enables
all the bridges at boot time.This patch does the same thing on powernv by enabling all bridges that
have child devices at boot time, thus avoiding subsequent races. It's
suitable for backporting to stable and distros, while the proper PCI
fix will probably be significantly more invasive.Signed-off-by: Benjamin Herrenschmidt
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman
Signed-off-by: Greg Kroah-Hartman -
commit cd813e1cd7122f2c261dce5b54d1e0c97f80e1a5 upstream.
During Machine Check interrupt on pseries platform, register r3 points
RTAS extended event log passed by hypervisor. Since hypervisor uses r3
to pass pointer to rtas log, it stores the original r3 value at the
start of the memory (first 8 bytes) pointed by r3. Since hypervisor
stores this info and rtas log is in BE format, linux should make
sure to restore r3 value in correct endian format.Without this patch when MCE handler, after recovery, returns to code that
that caused the MCE may end up with Data SLB access interrupt for invalid
address followed by kernel panic or hang.Severe Machine check interrupt [Recovered]
NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
Initiator: CPU
Error type: SLB [Multihit]
Effective address: d00000000ca70000
cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0]
pc: c0000000009694c0: vsnprintf+0x80/0x480
lr: c0000000009698e0: vscnprintf+0x20/0x60
sp: c0000000fc777830
msr: 8000000002009033
dar: a803a30c000000d0
current = 0xc00000000bc9ef00
paca = 0xc00000001eca5c00 softe: 3 irq_happened: 0x01
pid = 8860, comm = insmod
vscnprintf+0x20/0x60
vprintk_emit+0xb4/0x4b0
vprintk_func+0x5c/0xd0
printk+0x38/0x4c
init_module+0x1c0/0x338 [bork_kernel]
do_one_initcall+0x54/0x230
do_init_module+0x8c/0x248
load_module+0x12b8/0x15b0
sys_finit_module+0xa8/0x110
system_call+0x58/0x6c
--- Exception: c00 (System Call) at 00007fff8bda0644
SP (7fffdfbfe980) is in userspaceThis patch fixes this issue.
Fixes: a08a53ea4c97 ("powerpc/le: Enable RTAS events support")
Cc: stable@vger.kernel.org # v3.15+
Reviewed-by: Nicholas Piggin
Signed-off-by: Mahesh Salgaonkar
Signed-off-by: Michael Ellerman
Signed-off-by: Greg Kroah-Hartman -
commit 1bd6a1c4b80a28d975287630644e6b47d0f977a5 upstream.
Crash memory ranges is an array of memory ranges of the crashing kernel
to be exported as a dump via /proc/vmcore file. The size of the array
is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases
where memblock memory regions count is less than INIT_MEMBLOCK_REGIONS
value. But this count can grow beyond INIT_MEMBLOCK_REGIONS value since
commit 142b45a72e22 ("memblock: Add array resizing support").On large memory systems with a few DLPAR operations, the memblock memory
regions count could be larger than INIT_MEMBLOCK_REGIONS value. On such
systems, registering fadump results in crash or other system failures
like below:task: c00007f39a290010 ti: c00000000b738000 task.ti: c00000000b738000
NIP: c000000000047df4 LR: c0000000000f9e58 CTR: c00000000010f180
REGS: c00000000b73b570 TRAP: 0300 Tainted: G L X (4.4.140+)
MSR: 8000000000009033 CR: 22004484 XER: 20000000
CFAR: c000000000008500 DAR: 000007a450000000 DSISR: 40000000 SOFTE: 0
...
NIP [c000000000047df4] smp_send_reschedule+0x24/0x80
LR [c0000000000f9e58] resched_curr+0x138/0x160
Call Trace:
resched_curr+0x138/0x160 (unreliable)
check_preempt_curr+0xc8/0xf0
ttwu_do_wakeup+0x38/0x150
try_to_wake_up+0x224/0x4d0
__wake_up_common+0x94/0x100
ep_poll_callback+0xac/0x1c0
__wake_up_common+0x94/0x100
__wake_up_sync_key+0x70/0xa0
sock_def_readable+0x58/0xa0
unix_stream_sendmsg+0x2dc/0x4c0
sock_sendmsg+0x68/0xa0
___sys_sendmsg+0x2cc/0x2e0
__sys_sendmsg+0x5c/0xc0
SyS_socketcall+0x36c/0x3f0
system_call+0x3c/0x100as array index overflow is not checked for while setting up crash memory
ranges causing memory corruption. To resolve this issue, dynamically
allocate memory for crash memory ranges and resize it incrementally,
in units of pagesize, on hitting array size limit.Fixes: 2df173d9e85d ("fadump: Initialize elfcore header and add PT_LOAD program headers.")
Cc: stable@vger.kernel.org # v3.4+
Signed-off-by: Hari Bathini
Reviewed-by: Mahesh Salgaonkar
[mpe: Just use PAGE_SIZE directly, fixup variable placement]
Signed-off-by: Michael Ellerman
Signed-off-by: Greg Kroah-Hartman
05 Sep, 2018
1 commit
-
[ Upstream commit b9c1e60e7bf4e64ac1b4f4d6d593f0bb57886973 ]
None of the JITs is allowed to implement exit paths from the BPF
insn mappings other than BPF_JMP | BPF_EXIT. In the BPF core code
we have a couple of rewrites in eBPF (e.g. LD_ABS / LD_IND) and
in eBPF to cBPF translation to retain old existing behavior where
exceptions may occur; they are also tightly controlled by the
verifier where it disallows some of the features such as BPF to
BPF calls when legacy LD_ABS / LD_IND ops are present in the BPF
program. During recent review of all BPF_XADD JIT implementations
I noticed that the ppc64 one is buggy in that it contains two
jumps to exit paths. This is problematic as this can bypass verifier
expectations e.g. pointed out in commit f6b1b3bf0d5f ("bpf: fix
subprog verifier bypass by div/mod by 0 exception"). The first
exit path is obsoleted by the fix in ca36960211eb ("bpf: allow xadd
only on aligned memory") anyway, and for the second one we need to
do a fetch, add and store loop if the reservation from lwarx/ldarx
was lost in the meantime.Fixes: 156d0e290e96 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Reviewed-by: Naveen N. Rao
Reviewed-by: Sandipan Das
Tested-by: Sandipan Das
Signed-off-by: Daniel Borkmann
Signed-off-by: Alexei Starovoitov
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
03 Aug, 2018
12 commits
-
[ Upstream commit 9dcb3df4281876731e4e8bff7940514d72375154 ]
The interrupt controller inside the Wii's Hollywood chip is connected to
two masters, the "Broadway" PowerPC and the "Starlet" ARM926, each with
their own interrupt status and mask registers.When booting the Wii with mini[1], interrupts from the SD card
controller (IRQ 7) are handled by the ARM, because mini provides SD
access over IPC. Linux however can't currently use or disable this IPC
service, so both sides try to handle IRQ 7 without coordination.Let's instead make sure that all interrupts that are unmasked on the PPC
side are masked on the ARM side; this will also make sure that Linux can
properly talk to the SD card controller (and potentially other devices).If access to a device through IPC is desired in the future, interrupts
from that device should not be handled by Linux directly.[1]: https://github.com/lewurm/mini
Signed-off-by: Jonathan Neuschäfer
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 4ea69b2fd623dee2bbc77d3b6b7d8c0924e2026a ]
For multi-function programs, loading the address of a callee
function to a register requires emitting instructions whose
count varies from one to five depending on the nature of the
address.Since we come to know of the callee's address only before the
extra pass, the number of instructions required to load this
address may vary from what was previously generated. This can
make the JITed image grow or shrink.To avoid this, we should generate a constant five-instruction
when loading function addresses by padding the optimized load
sequence with NOPs.Signed-off-by: Sandipan Das
Signed-off-by: Daniel Borkmann
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit e4ccb1dae6bdef228d729c076c38161ef6e7ca34 ]
New binutils generate the following warning
AS arch/powerpc/kernel/head_8xx.o
arch/powerpc/kernel/head_8xx.S: Assembler messages:
arch/powerpc/kernel/head_8xx.S:916: Warning: invalid register expressionThis patch fixes it.
Signed-off-by: Christophe Leroy
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit eae5f709a4d738c52b6ab636981755d76349ea9e ]
__printf is useful to verify format and arguments. Fix arg mismatch
reported by gcc, remove the following warnings (with W=1):arch/powerpc/kernel/prom_init.c:1467:31: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’
arch/powerpc/kernel/prom_init.c:1471:31: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’
arch/powerpc/kernel/prom_init.c:1504:33: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’
arch/powerpc/kernel/prom_init.c:1505:33: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’
arch/powerpc/kernel/prom_init.c:1506:33: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’
arch/powerpc/kernel/prom_init.c:1507:33: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’
arch/powerpc/kernel/prom_init.c:1508:33: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’
arch/powerpc/kernel/prom_init.c:1509:33: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’
arch/powerpc/kernel/prom_init.c:1975:39: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘unsigned int’
arch/powerpc/kernel/prom_init.c:1986:27: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’
arch/powerpc/kernel/prom_init.c:2567:38: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’
arch/powerpc/kernel/prom_init.c:2567:46: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 3 has type ‘long unsigned int’
arch/powerpc/kernel/prom_init.c:2569:38: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’
arch/powerpc/kernel/prom_init.c:2569:46: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 3 has type ‘long unsigned int’The patch also include arg mismatch fix for case with #define DEBUG_PROM
(warning not listed here).This patch fix also the following warnings revealed by checkpatch:
WARNING: Prefer using '"%s...", __func__' to using 'alloc_up', this function's name, in a string
#101: FILE: arch/powerpc/kernel/prom_init.c:1235:
+ prom_debug("alloc_up(%lx, %lx)\n", size, align);and
WARNING: Prefer using '"%s...", __func__' to using 'alloc_down', this function's name, in a string
#138: FILE: arch/powerpc/kernel/prom_init.c:1278:
+ prom_debug("alloc_down(%lx, %lx, %s)\n", size, align,Signed-off-by: Mathieu Malaterre
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 5a4b475cf8511da721f20ba432c244061db7139f ]
Since the value of x is never intended to be read, declare it with gcc
attribute as unused. Fix warning treated as error with W=1:arch/powerpc/platforms/powermac/bootx_init.c:471:21: error: variable ‘x’ set but not used [-Werror=unused-but-set-variable]
Suggested-by: Christophe Leroy
Signed-off-by: Mathieu Malaterre
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit f72cf3f1d49f2c35d6cb682af2e8c93550f264e4 ]
Add a missing prototype for function `note_bootable_part` to silence a
warning treated as error with W=1:arch/powerpc/platforms/powermac/setup.c:361:12: error: no previous prototype for ‘note_bootable_part’ [-Werror=missing-prototypes]
Suggested-by: Christophe Leroy
Signed-off-by: Mathieu Malaterre
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit b87a358b4a1421abd544c0b554b1b7159b2b36c0 ]
Add a missing include .
These functions can all be static, make it so. Fix warnings treated as
errors with W=1:arch/powerpc/platforms/chrp/time.c:41:13: error: no previous prototype for ‘chrp_time_init’ [-Werror=missing-prototypes]
arch/powerpc/platforms/chrp/time.c:66:5: error: no previous prototype for ‘chrp_cmos_clock_read’ [-Werror=missing-prototypes]
arch/powerpc/platforms/chrp/time.c:74:6: error: no previous prototype for ‘chrp_cmos_clock_write’ [-Werror=missing-prototypes]
arch/powerpc/platforms/chrp/time.c:86:5: error: no previous prototype for ‘chrp_set_rtc_time’ [-Werror=missing-prototypes]
arch/powerpc/platforms/chrp/time.c:130:6: error: no previous prototype for ‘chrp_get_rtc_time’ [-Werror=missing-prototypes]Signed-off-by: Mathieu Malaterre
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit c89ca593220931c150cffda24b4d4ccf82f13fc8 ]
The header file was missing from the includes. Fix the
following warning, treated as error with W=1:arch/powerpc/kernel/pci_32.c:286:6: error: no previous prototype for ‘sys_pciconfig_iobase’ [-Werror=missing-prototypes]
Signed-off-by: Mathieu Malaterre
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 926bc2f100c24d4842b3064b5af44ae964c1d81c ]
The stores to update the SLB shadow area must be made as they appear
in the C code, so that the hypervisor does not see an entry with
mismatched vsid and esid. Use WRITE_ONCE for this.GCC has been observed to elide the first store to esid in the update,
which means that if the hypervisor interrupts the guest after storing
to vsid, it could see an entry with old esid and new vsid, which may
possibly result in memory corruption.Signed-off-by: Nicholas Piggin
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 46d4be41b987a6b2d25a2ebdd94cafb44e21d6c5 ]
Correct two cases where eeh_pcid_get() is used to reference the driver's
module but the reference is dropped before the driver pointer is used.In eeh_rmv_device() also refactor a little so that only two calls to
eeh_pcid_put() are needed, rather than three and the reference isn't
taken at all if it wasn't needed.Signed-off-by: Sam Bobroff
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit a6b3964ad71a61bb7c61d80a60bea7d42187b2eb ]
A no-op form of ori (or immediate of 0 into r31 and the result stored
in r31) has been re-tasked as a speculation barrier. The instruction
only acts as a barrier on newer machines with appropriate firmware
support. On older CPUs it remains a harmless no-op.Implement barrier_nospec using this instruction.
mpe: The semantics of the instruction are believed to be that it
prevents execution of subsequent instructions until preceding branches
have been fully resolved and are no longer executing speculatively.
There is no further documentation available at this time.Signed-off-by: Michal Suchanek
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 1128bb7813a896bd608fb622eee3c26aaf33b473 ]
commit 87a156fb18fe1 ("Align hot loops of some string functions")
degraded the performance of string functions by adding useless
nopsA simple benchmark on an 8xx calling 100000x a memchr() that
matches the first byte runs in 41668 TB ticks before this patch
and in 35986 TB ticks after this patch. So this gives an
improvement of approx 10%Another benchmark doing the same with a memchr() matching the 128th
byte runs in 1011365 TB ticks before this patch and 1005682 TB ticks
after this patch, so regardless on the number of loops, removing
those useless nops improves the test by 5683 TB ticks.Fixes: 87a156fb18fe1 ("Align hot loops of some string functions")
Signed-off-by: Christophe Leroy
Signed-off-by: Michael Ellerman
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
28 Jul, 2018
1 commit
-
commit 76fa4975f3ed12d15762bc979ca44078598ed8ee upstream.
A VM which has:
- a DMA capable device passed through to it (eg. network card);
- running a malicious kernel that ignores H_PUT_TCE failure;
- capability of using IOMMU pages bigger that physical pages
can create an IOMMU mapping that exposes (for example) 16MB of
the host physical memory to the device when only 64K was allocated to the VM.The remaining 16MB - 64K will be some other content of host memory, possibly
including pages of the VM, but also pages of host kernel memory, host
programs or other VMs.The attacking VM does not control the location of the page it can map,
and is only allowed to map as many pages as it has pages of RAM.We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
an IOMMU page is contained in the physical page so the PCI hardware won't
get access to unassigned host memory; however this check is missing in
the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and
did not hit this yet as the very first time when the mapping happens
we do not have tbl::it_userspace allocated yet and fall back to
the userspace which in turn calls VFIO IOMMU driver, this fails and
the guest does not retry,This stores the smallest preregistered page size in the preregistered
region descriptor and changes the mm_iommu_xxx API to check this against
the IOMMU page size.This calculates maximum page size as a minimum of the natural region
alignment and compound page size. For the page shift this uses the shift
returned by find_linux_pte() which indicates how the page is mapped to
the current userspace - if the page is huge and this is not a zero, then
it is a leaf pte and the page is mapped within the range.Fixes: 121f80ba68f1 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
Signed-off-by: Michael Ellerman
Signed-off-by: Alexey Kardashevskiy
Signed-off-by: Greg Kroah-Hartman
25 Jul, 2018
1 commit
-
commit b03897cf318dfc47de33a7ecbc7655584266f034 upstream.
On 64-bit servers, SPRN_SPRG3 and its userspace read-only mirror
SPRN_USPRG3 are used as userspace VDSO write and read registers
respectively.SPRN_SPRG3 is lost when we enter stop4 and above, and is currently not
restored. As a result, any read from SPRN_USPRG3 returns zero on an
exit from stop4 (Power9 only) and above.Thus in this situation, on POWER9, any call from sched_getcpu() always
returns zero, as on powerpc, we call __kernel_getcpu() which relies
upon SPRN_USPRG3 to report the CPU and NUMA node information.Fix this by restoring SPRN_SPRG3 on wake up from a deep stop state
with the sprg_vdso value that is cached in PACA.Fixes: e1c1cfed5432 ("powerpc/powernv: Save/Restore additional SPRs for stop4 cpuidle")
Cc: stable@vger.kernel.org # v4.14+
Reported-by: Florian Weimer
Signed-off-by: Gautham R. Shenoy
Reviewed-by: Michael Ellerman
Signed-off-by: Michael Ellerman
Signed-off-by: Greg Kroah-Hartman
03 Jul, 2018
3 commits
-
commit 722cde76d68e8cc4f3de42e71c82fd40dea4f7b9 upstream.
Unregister fadump on kexec down path otherwise the fadump registration
in new kexec-ed kernel complains that fadump is already registered.
This makes new kernel to continue using fadump registered by previous
kernel which may lead to invalid vmcore generation. Hence this patch
fixes this issue by un-registering fadump in fadump_cleanup() which is
called during kexec path so that new kernel can register fadump with
new valid values.Fixes: b500afff11f6 ("fadump: Invalidate registration and release reserved memory for general use.")
Cc: stable@vger.kernel.org # v3.4+
Signed-off-by: Mahesh Salgaonkar
Signed-off-by: Michael Ellerman
Signed-off-by: Greg Kroah-Hartman -
commit ac9816dcbab53c57bcf1d7b15370b08f1e284318 upstream.
Init all present cpus for deep states instead of "all possible" cpus.
Init fails if a possible cpu is guarded. Resulting in making only
non-deep states available for cpuidle/hotplug.Stewart says, this means that for single threaded workloads, if you
guard out a CPU core you'll not get WoF (Workload Optimised
Frequency), which means that performance goes down when you wouldn't
expect it to.Fixes: 77b54e9f213f ("powernv/powerpc: Add winkle support for offline cpus")
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: Akshay Adiga
Signed-off-by: Michael Ellerman
Signed-off-by: Greg Kroah-Hartman -
commit 75743649064ec0cf5ddd69f240ef23af66dde16e upstream.
NX can set the 3rd bit in CR register for XER[SO] (Summary overflow)
which is not related to paste request. The current paste function
returns failure for a successful request when this bit is set. So mask
this bit and check the proper return status.Fixes: 2392c8c8c045 ("powerpc/powernv/vas: Define copy/paste interfaces")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Haren Myneni
Signed-off-by: Michael Ellerman
Signed-off-by: Greg Kroah-Hartman