17 Apr, 2019

1 commit

  • commit 897bc3df8c5aebb54c32d831f917592e873d0559 upstream.

    Commit e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint")
    moved a code block around, and this block uses a 'msr' variable outside of
    a CONFIG_PPC_TRANSACTIONAL_MEM block. However, the 'msr' variable is
    declared inside a CONFIG_PPC_TRANSACTIONAL_MEM block, causing a possible
    error when CONFIG_PPC_TRANSACTIONAL_MEM is not defined:

    error: 'msr' undeclared (first use in this function)

    This is not causing a compilation error in the mainline kernel, because
    'msr' is being used as an argument of MSR_TM_ACTIVE(), which is defined as
    the following when CONFIG_PPC_TRANSACTIONAL_MEM is *not* set:

    #define MSR_TM_ACTIVE(x) 0

    This patch fixes the issue by avoiding any use of the 'msr' variable
    outside the CONFIG_PPC_TRANSACTIONAL_MEM block, rather than relying on
    the MSR_TM_ACTIVE() definition.
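
    For illustration, a minimal standalone sketch of the pattern (the
    config option and macro names follow the commit text; the rest is
    invented for the demo):

    #include <stdio.h>

    /* #define CONFIG_PPC_TRANSACTIONAL_MEM */   /* toggle to compare */

    #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
    #define MSR_TM_ACTIVE(x) ((x) & 1)           /* stand-in for the real test */
    #else
    #define MSR_TM_ACTIVE(x) 0
    #endif

    int main(void)
    {
    #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
        unsigned long msr = 1;
    #endif
        /* With TM off, MSR_TM_ACTIVE(msr) expands to plain 0, so 'msr'
         * never reaches the compiler and no error is raised. Any other
         * use of 'msr' here fails with "'msr' undeclared". */
        if (MSR_TM_ACTIVE(msr))
            printf("transaction active\n");
        return 0;
    }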

    Cc: stable@vger.kernel.org
    Reported-by: Christoph Biedl
    Fixes: e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint")
    Signed-off-by: Breno Leitao
    Signed-off-by: Michael Ellerman
    Signed-off-by: Michael Neuling
    Signed-off-by: Sasha Levin

    Breno Leitao
     

06 Apr, 2019

5 commits

  • [ Upstream commit 81b61324922c67f73813d8a9c175f3c153f6a1c6 ]

    On pseries systems, performing a partition migration can result in
    altering the nodes a CPU is assigned to on the destination system. For
    example, pre-migration on the source system CPUs are in nodes 1 and 3,
    post-migration on the destination system CPUs are in nodes 2 and 3.

    Handling the node change for a CPU can cause corruption in the slab
    cache if we hit a timing window where a CPU's node is changed while cache_reap()
    is invoked. The corruption occurs because the slab cache code appears
    to rely on the CPU and slab cache pages being on the same node.

    The current dynamic updating of a CPU's node done in arch/powerpc/mm/numa.c
    does not prevent us from hitting this scenario.

    Changing the device tree property update notification handler that
    recognizes an affinity change for a CPU to do a full DLPAR remove and
    add of the CPU instead of dynamically changing its node resolves this
    issue.

    Signed-off-by: Nathan Fontenot
    Signed-off-by: Michael W. Bringmann
    Tested-by: Michael W. Bringmann
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin

    Nathan Fontenot
     
  • [ Upstream commit eddd0b332304d554ad6243942f87c2fcea98c56b ]

    The ppc64 specific implementation of the reliable stacktracer,
    save_stack_trace_tsk_reliable(), bails out and reports an "unreliable
    trace" whenever it finds an exception frame on the stack. Stack frames
    are classified as exception frames if the STACK_FRAME_REGS_MARKER
    magic, as written by exception prologues, is found at a particular
    location.

    However, as observed by Joe Lawrence, it is possible in practice that
    non-exception stack frames can alias with prior exception frames and
    thus, that the reliable stacktracer can find a stale
    STACK_FRAME_REGS_MARKER on the stack. This in turn falsely reports an
    unreliable stacktrace and blocks any live patching transition from
    finishing. Said condition lasts until the stack frame is
    overwritten/initialized by a function call or other means.

    In principle, we could mitigate this by making the exception frame
    classification condition in save_stack_trace_tsk_reliable() stronger:
    in addition to testing for STACK_FRAME_REGS_MARKER, we could also take
    into account that for all exceptions executing on the kernel stack
    - their stack frames' backlink pointers always match what is saved
    in their pt_regs instance's ->gpr[1] slot and that
    - their exception frame size equals STACK_INT_FRAME_SIZE, a value
    uncommonly large for non-exception frames.

    However, while these are currently true, relying on them would make
    the reliable stacktrace implementation more sensitive towards future
    changes in the exception entry code. Note that false negatives, i.e.
    not detecting exception frames, would silently break the live patching
    consistency model.

    Furthermore, certain other places (diagnostic stacktraces, perf, xmon)
    rely on STACK_FRAME_REGS_MARKER as well.

    Make the exception exit code clear the on-stack
    STACK_FRAME_REGS_MARKER for those exceptions running on the "normal"
    kernel stack and returning to kernelspace: because the topmost frame
    is ignored by the reliable stack tracer anyway, returns to userspace
    don't need to take care of clearing the marker.

    Furthermore, as I don't have the ability to test this on Book 3E or 32
    bits, limit the change to Book 3S and 64 bits.

    Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model")
    Reported-by: Joe Lawrence
    Signed-off-by: Nicolai Stange
    Signed-off-by: Joe Lawrence
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin

    Nicolai Stange
     
  • [ Upstream commit 5330367fa300742a97e20e953b1f77f48392faae ]

    After we ALIGN up the address we need to make sure we didn't overflow
    and end up with a zero address. In that case, we need to make sure that
    the returned address is greater than mmap_min_addr.

    This fixes the selftest va_128TBswitch --run-hugetlb reporting failures
    when run as a non-root user for

    mmap(-1, MAP_HUGETLB)

    The bug is that a non-root user requesting address -1 will be given address 0
    which will then fail, whereas they should have been given something else that
    would have succeeded.
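
    For illustration, a small userspace sketch of the overflow (ALIGN_UP
    is a stand-in for the kernel's ALIGN(); the alignment value is
    arbitrary):

    #include <stdio.h>
    #include <stdint.h>

    #define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

    int main(void)
    {
        uint64_t align = 1ULL << 24;        /* e.g. a 16MB huge page */
        uint64_t addr = UINT64_MAX;         /* userspace asked for -1 */
        uint64_t res = ALIGN_UP(addr, align);

        /* The addition wraps, so the "aligned" address is 0, which is
         * below mmap_min_addr and must not be handed back to the caller. */
        printf("ALIGN_UP(0x%llx) = 0x%llx\n",
               (unsigned long long)addr, (unsigned long long)res);
        return 0;
    }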

    With this change we also avoid the first mmap(-1, MAP_HUGETLB) returning a
    NULL address as the mmap address. So we think this is not a security issue, because it only affects
    whether we choose an address below mmap_min_addr, not whether we
    actually allow that address to be mapped. ie. there are existing capability
    checks to prevent a user mapping below mmap_min_addr and those will still be
    honoured even without this fix.

    Fixes: 484837601d4d ("powerpc/mm: Add radix support for hugetlb")
    Reviewed-by: Laurent Dufour
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin

    Aneesh Kumar K.V
     
  • [ Upstream commit e7140639b1de65bba435a6bd772d134901141f86 ]

    When building with -Wsometimes-uninitialized, Clang warns:

    arch/powerpc/xmon/ppc-dis.c:157:7: warning: variable 'opcode' is used
    uninitialized whenever 'if' condition is false
    [-Wsometimes-uninitialized]
      if (cpu_has_feature(CPU_FTRS_POWER9))
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    arch/powerpc/xmon/ppc-dis.c:167:7: note: uninitialized use occurs here
      if (opcode == NULL)
          ^~~~~~
    arch/powerpc/xmon/ppc-dis.c:157:3: note: remove the 'if' if its
    condition is always true
      if (cpu_has_feature(CPU_FTRS_POWER9))
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    arch/powerpc/xmon/ppc-dis.c:132:38: note: initialize the variable
    'opcode' to silence this warning
      const struct powerpc_opcode *opcode;
                                          ^
                                          = NULL
    1 warning generated.

    This warning seems to make no sense on the surface because opcode is set
    to NULL right below this statement. However, there is a comma instead of
    semicolon to end the dialect assignment, meaning that the opcode
    assignment only happens in the if statement. Properly terminate that
    line so that Clang no longer warns.
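
    A minimal standalone reproduction of the comma-operator pitfall
    (illustrative names, not the actual xmon code):

    #include <stdio.h>

    int cpu_has_power9;    /* stands in for cpu_has_feature(CPU_FTRS_POWER9) */

    int main(void)
    {
        unsigned long dialect = 0;
        const char *opcode;    /* meant to be set to NULL unconditionally */

        if (cpu_has_power9)
            dialect |= 1,      /* comma, not semicolon... */

        opcode = NULL;         /* ...so this runs only inside the 'if'! */

        if (opcode == NULL)    /* may read an uninitialized pointer */
            printf("dialect=%lu, opcode is NULL\n", dialect);
        return 0;
    }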

    Fixes: 5b102782c7f4 ("powerpc/xmon: Enable disassembly files (compilation changes)")
    Signed-off-by: Nathan Chancellor
    Reviewed-by: Nick Desaulniers
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin

    Nathan Chancellor
     
  • [ Upstream commit 11f5acce2fa43b015a8120fa7620fa4efd0a2952 ]

    We store 2 multilevel tables in iommu_table - one for the hardware and
    one with the corresponding userspace addresses. Before allocating
    the tables, the iommu_table_group_ops::get_table_size() hook returns
    the combined size of the two, and the VFIO SPAPR TCE IOMMU driver adjusts
    the locked_vm counter correctly. When the table is actually allocated,
    the amount of allocated memory is stored in iommu_table::it_allocated_size
    and used to decrement the locked_vm counter when we release the memory
    used by the table; .get_table_size() and .create_table() calculate it
    independently but the result is expected to be the same.

    However the allocator does not add the userspace table size to
    .it_allocated_size, so when we destroy the table because of VFIO PCI
    unplug (i.e. the VFIO container is gone but the userspace keeps running),
    we decrement locked_vm by just half of the size of the memory we are
    releasing.

    To make things worse, since we enabled on-demand allocation of
    indirect levels, it_allocated_size contains only the amount of memory
    actually allocated at table creation time, which can be just a
    fraction of the total. This is not a problem when incrementing locked_vm
    (as the get_table_size() value is used) but it is when decrementing.

    As a result, we leak locked_vm and may not be able to allocate more
    IOMMU tables after a few iterations of hotplug/unplug.

    This sets it_allocated_size in the pnv_pci_ioda2_ops::create_table()
    hook to what pnv_pci_ioda2_get_table_size() returns so from now on we
    have a single place which calculates the maximum memory a table can
    occupy. The original meaning of it_allocated_size is somewhat lost now
    though.

    We do not ditch it_allocated_size here, and we do not call
    get_table_size() from vfio_iommu_spapr_tce.c when decrementing
    locked_vm, as we may have multiple IOMMU groups per container and even
    though they are all supposed to have the same get_table_size()
    implementation, there is a small chance of failure or confusion.
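
    A toy model of the accounting asymmetry (the sizes are made up; it
    only mirrors the increment/decrement mismatch described above):

    #include <stdio.h>

    int main(void)
    {
        long locked_vm = 0;
        const long get_table_size = 1024;    /* both tables, full depth */
        const long it_allocated_size = 256;  /* partial, on-demand levels */

        for (int i = 0; i < 4; i++) {
            locked_vm += get_table_size;     /* table create: full size */
            locked_vm -= it_allocated_size;  /* table destroy: partial size */
        }
        /* locked_vm never returns to 0, so each hotplug/unplug cycle
         * leaks accounting until allocations start to fail. */
        printf("leaked locked_vm: %ld\n", locked_vm);
        return 0;
    }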

    Fixes: 090bad39b237 ("powerpc/powernv: Add indirect levels to it_userspace")
    Signed-off-by: Alexey Kardashevskiy
    Reviewed-by: David Gibson
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin

    Alexey Kardashevskiy
     

03 Apr, 2019

15 commits

  • commit d9470757398a700d9450a43508000bcfd010c7a4 upstream.

    Chandan reported that fstests' generic/026 test hit a crash:

    BUG: Unable to handle kernel data access at 0xc00000062ac40000
    Faulting instruction address: 0xc000000000092240
    Oops: Kernel access of bad area, sig: 11 [#1]
    LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries
    CPU: 0 PID: 27828 Comm: chacl Not tainted 5.0.0-rc2-next-20190115-00001-g6de6dba64dda #1
    NIP: c000000000092240 LR: c00000000066a55c CTR: 0000000000000000
    REGS: c00000062c0c3430 TRAP: 0300 Not tainted (5.0.0-rc2-next-20190115-00001-g6de6dba64dda)
    MSR: 8000000002009033 CR: 44000842 XER: 20000000
    CFAR: 00007fff7f3108ac DAR: c00000062ac40000 DSISR: 40000000 IRQMASK: 0
    GPR00: 0000000000000000 c00000062c0c36c0 c0000000017f4c00 c00000000121a660
    GPR04: c00000062ac3fff9 0000000000000004 0000000000000020 00000000275b19c4
    GPR08: 000000000000000c 46494c4500000000 5347495f41434c5f c0000000026073a0
    GPR12: 0000000000000000 c0000000027a0000 0000000000000000 0000000000000000
    GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR20: c00000062ea70020 c00000062c0c38d0 0000000000000002 0000000000000002
    GPR24: c00000062ac3ffe8 00000000275b19c4 0000000000000001 c00000062ac30000
    GPR28: c00000062c0c38d0 c00000062ac30050 c00000062ac30058 0000000000000000
    NIP memcmp+0x120/0x690
    LR xfs_attr3_leaf_lookup_int+0x53c/0x5b0
    Call Trace:
    xfs_attr3_leaf_lookup_int+0x78/0x5b0 (unreliable)
    xfs_da3_node_lookup_int+0x32c/0x5a0
    xfs_attr_node_addname+0x170/0x6b0
    xfs_attr_set+0x2ac/0x340
    __xfs_set_acl+0xf0/0x230
    xfs_set_acl+0xd0/0x160
    set_posix_acl+0xc0/0x130
    posix_acl_xattr_set+0x68/0x110
    __vfs_setxattr+0xa4/0x110
    __vfs_setxattr_noperm+0xac/0x240
    vfs_setxattr+0x128/0x130
    setxattr+0x248/0x600
    path_setxattr+0x108/0x120
    sys_setxattr+0x28/0x40
    system_call+0x5c/0x70
    Instruction dump:
    7d201c28 7d402428 7c295040 38630008 38840008 408201f0 4200ffe8 2c050000
    4182ff6c 20c50008 54c61838 7d201c28 7d293436 7d4a3436 7c295040

    The instruction dump decodes as:
    subfic r6,r5,8
    rlwinm r6,r6,3,0,28
    ldbrx r9,0,r3
    ldbrx r10,0,r4
    Signed-off-by: Michael Ellerman
    Reviewed-by: Segher Boessenkool
    Tested-by: Chandan Rajendra
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • commit ce9afe08e71e3f7d64f337a6e932e50849230fc2 upstream.

    In cpu_to_drc_index() in the case when FW_FEATURE_DRC_INFO is absent,
    we currently use of_read_property() to obtain the pointer to the array
    corresponding to the property "ibm,drc-indexes". The elements of this
    array are of type __be32, but are accessed without any conversion to
    the OS-endianness, which is buggy on a Little Endian OS.

    Fix this by using the of_property_read_u32_index() accessor function to
    safely read the elements of the array.
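
    For illustration, a userspace sketch of the bug class (ntohl() stands
    in for the kernel's be32_to_cpu(); device-tree cells are always
    big-endian):

    #include <stdio.h>
    #include <stdint.h>
    #include <arpa/inet.h>

    int main(void)
    {
        uint32_t be_cell = htonl(7);    /* a __be32 cell holding 7 */

        /* On a little-endian host the raw access gives a wrong value;
         * converting first is correct on any endianness. */
        printf("raw access:      %u\n", be_cell);
        printf("with conversion: %u\n", ntohl(be_cell));
        return 0;
    }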

    Fixes: e83636ac3334 ("pseries/drc-info: Search DRC properties for CPU indexes")
    Cc: stable@vger.kernel.org # v4.16+
    Reported-by: Pavithra R. Prakash
    Signed-off-by: Gautham R. Shenoy
    Reviewed-by: Vaidyanathan Srinivasan
    [mpe: Make the WARN_ON a WARN_ON_ONCE so it's not retriggerable]
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Gautham R. Shenoy
     
  • commit 86be36f6502c52ddb4b85938145324fd07332da1 upstream.

    Yauheni Kaliuta pointed out that PTR_TO_STACK store/load verifier test
    was failing on powerpc64 BE, and rightfully indicated that the PPC_LD()
    macro is not masking away the last two bits of the offset per the ISA,
    resulting in the generation of 'lwa' instruction instead of the intended
    'ld' instruction.

    Segher also pointed out that we can't simply mask away the last two bits
    as that will result in loading/storing from/to a memory location that
    was not intended.

    This patch addresses this by using ldx/stdx if the offset is not
    word-aligned. We load the offset into a temporary register (TMP_REG_2)
    and use that as the index register in a subsequent ldx/stdx. We fix
    the PPC_LD() macro to mask off the last two bits, but enhance PPC_BPF_LL()
    and PPC_BPF_STL() to factor in the offset value and generate the proper
    instruction sequence. We also convert all existing users of PPC_LD() and
    PPC_STD() to use these macros. All existing uses of these macros have
    been audited to ensure that TMP_REG_2 can be clobbered.
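
    To make the ISA constraint concrete, here is a hedged standalone
    sketch of the DS instruction form (the encoding follows the ISA; the
    helper is invented for the demo). The low two bits of the displacement
    field are opcode-extension bits: for primary opcode 58, 0b00 selects
    'ld' and 0b10 selects 'lwa', so an unmasked unaligned offset silently
    changes the instruction:

    #include <stdio.h>
    #include <stdint.h>

    /* DS-form: opcode(6) | RT(5) | RA(5) | DS(14) | XO(2). */
    static uint32_t ds_encode(uint32_t rt, uint32_t ra, int32_t off)
    {
        /* Buggy style: the whole offset lands in the low 16 bits, so
         * an unaligned offset leaks into the XO field. */
        return (58u << 26) | (rt << 21) | (ra << 16) | (off & 0xffff);
    }

    int main(void)
    {
        uint32_t aligned = ds_encode(3, 1, 4);      /* XO = 0 -> ld */
        uint32_t unaligned = ds_encode(3, 1, 6);    /* XO = 2 -> lwa */

        printf("off=4: 0x%08x XO=%u (ld)\n", aligned, aligned & 3);
        printf("off=6: 0x%08x XO=%u (lwa!)\n", unaligned, unaligned & 3);
        return 0;
    }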

    Fixes: 156d0e290e96 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
    Cc: stable@vger.kernel.org # v4.9+

    Reported-by: Yauheni Kaliuta
    Signed-off-by: Naveen N. Rao
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Naveen N. Rao
     
  • commit 92edf8df0ff2ae86cc632eeca0e651fd8431d40d upstream.

    When I updated the spectre_v2 reporting to handle software count cache
    flush I got the logic wrong when there's no software count cache
    enabled at all.

    The result is that on systems with the software count cache flush
    disabled we print:

    Mitigation: Indirect branch cache disabled, Software count cache flush

    Which correctly indicates that the count cache is disabled, but
    incorrectly says the software count cache flush is enabled.

    The root of the problem is that we are trying to handle all
    combinations of options. But we know now that we only expect to see
    the software count cache flush enabled if the other options are false.

    So split the two cases, which simplifies the logic and fixes the bug.
    We were also missing a space before "(hardware accelerated)".

    The result is we see one of:

    Mitigation: Indirect branch serialisation (kernel only)
    Mitigation: Indirect branch cache disabled
    Mitigation: Software count cache flush
    Mitigation: Software count cache flush (hardware accelerated)
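
    A runnable model of the split logic (printf() stands in for
    seq_buf_printf(); the flag names follow the commit text, and the
    control flow is a sketch of the described fix, not the exact
    security.c diff):

    #include <stdio.h>
    #include <stdbool.h>

    enum count_cache_flush { FLUSH_NONE, FLUSH_SW, FLUSH_HW };

    static void report(bool bcs, bool ccd, enum count_cache_flush flush)
    {
        if (bcs || ccd) {
            /* hardware options present: never mention the SW flush */
            printf("Mitigation: ");
            if (bcs)
                printf("Indirect branch serialisation (kernel only)");
            if (bcs && ccd)
                printf(", ");
            if (ccd)
                printf("Indirect branch cache disabled");
            printf("\n");
        } else if (flush != FLUSH_NONE) {
            printf("Mitigation: Software count cache flush");
            if (flush == FLUSH_HW)
                printf(" (hardware accelerated)");
            printf("\n");
        } else {
            printf("Vulnerable\n");
        }
    }

    int main(void)
    {
        report(false, true, FLUSH_NONE);    /* no bogus SW-flush claim */
        report(false, false, FLUSH_HW);
        return 0;
    }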

    Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush")
    Cc: stable@vger.kernel.org # v4.19+
    Signed-off-by: Michael Ellerman
    Reviewed-by: Michael Neuling
    Reviewed-by: Diana Craciun
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • commit 27da80719ef132cf8c80eb406d5aeb37dddf78cc upstream.

    The commit identified below adds the MC_BTB_FLUSH macro only when
    CONFIG_PPC_FSL_BOOK3E is defined. This results in the following error
    on some configs (seen several times with kisskb randconfig_defconfig):

    arch/powerpc/kernel/exceptions-64e.S:576: Error: Unrecognized opcode: `mc_btb_flush'
    make[3]: *** [scripts/Makefile.build:367: arch/powerpc/kernel/exceptions-64e.o] Error 1
    make[2]: *** [scripts/Makefile.build:492: arch/powerpc/kernel] Error 2
    make[1]: *** [Makefile:1043: arch/powerpc] Error 2
    make: *** [Makefile:152: sub-make] Error 2

    This patch adds a blank definition of MC_BTB_FLUSH for other cases.

    Fixes: 10c5e83afd4a ("powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)")
    Cc: Diana Craciun
    Signed-off-by: Christophe Leroy
    Reviewed-by: Daniel Axtens
    Reviewed-by: Diana Craciun
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Christophe Leroy
     
  • commit 039daac5526932ec731e4499613018d263af8b3e upstream.

    Fixed the following build warning:
    powerpc-linux-gnu-ld: warning: orphan section `__btb_flush_fixup' from
    `arch/powerpc/kernel/head_44x.o' being placed in section
    `__btb_flush_fixup'.

    Signed-off-by: Diana Craciun
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Diana Craciun
     
  • commit dfa88658fb0583abb92e062c7a9cd5a5b94f2a46 upstream.

    Report branch predictor state flush as a mitigation for
    Spectre variant 2.

    Signed-off-by: Diana Craciun
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Diana Craciun
     
  • commit 3bc8ea8603ae4c1e09aca8de229ad38b8091fcb3 upstream.

    If the user chooses not to use the mitigations, replace
    the code sequence with nops.

    Signed-off-by: Diana Craciun
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Diana Craciun
     
  • commit e7aa61f47b23afbec41031bc47ca8d6cb6516abc upstream.

    Switching from the guest to the host is another place
    where speculative accesses can be exploited.
    Flush the branch predictor when entering KVM.

    Signed-off-by: Diana Craciun
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Diana Craciun
     
  • commit 7fef436295bf6c05effe682c8797dfcb0deb112a upstream.

    In order to protect against speculation attacks on
    indirect branches, the branch predictor is flushed at
    kernel entry to protect against the following situations:
    - a userspace process attacking another userspace process
    - a userspace process attacking the kernel
    Basically, when the privilege level changes (i.e. the kernel
    is entered), the branch predictor state is flushed.

    Signed-off-by: Diana Craciun
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Diana Craciun
     
  • commit 10c5e83afd4a3f01712d97d3bb1ae34d5b74a185 upstream.

    In order to protect against speculation attacks on
    indirect branches, the branch predictor is flushed at
    kernel entry to protect against the following situations:
    - a userspace process attacking another userspace process
    - a userspace process attacking the kernel
    Basically, when the privilege level changes (i.e. the
    kernel is entered), the branch predictor state is flushed.

    Signed-off-by: Diana Craciun
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Diana Craciun
     
  • commit f633a8ad636efb5d4bba1a047d4a0f1ef719aa06 upstream.

    When the command line argument is present, the Spectre variant 2
    mitigations are disabled.

    Signed-off-by: Diana Craciun
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Diana Craciun
     
  • commit 98518c4d8728656db349f875fcbbc7c126d4c973 upstream.

    In order to flush the branch predictor the guest kernel performs
    writes to the BUCSR register, which is hypervisor privileged. However,
    the branch predictor is flushed at each KVM entry, so it has
    already been flushed; just return to the guest as soon as
    possible.

    Signed-off-by: Diana Craciun
    [mpe: Tweak comment formatting]
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Diana Craciun
     
  • commit 1cbf8990d79ff69da8ad09e8a3df014e1494462b upstream.

    The BUCSR register can be used to invalidate the entries in the
    branch prediction mechanisms.

    Signed-off-by: Diana Craciun
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Diana Craciun
     
  • commit 76a5eaa38b15dda92cd6964248c39b5a6f3a4e9d upstream.

    In order to protect against speculation attacks (Spectre
    variant 2) on NXP PowerPC platforms, the branch predictor
    should be flushed when the privilege level is changed.
    This patch adds the infrastructure to fix up at runtime
    the code sections that perform the branch predictor flush,
    depending on a boot argument which is added later in a
    separate patch.

    Signed-off-by: Diana Craciun
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Diana Craciun
     

27 Mar, 2019

1 commit

  • commit b5b4453e7912f056da1ca7572574cada32ecb60c upstream.

    Jakub Drnec reported:
    Setting the realtime clock can sometimes make the monotonic clock go
    back by over a hundred years. Decreasing the realtime clock across
    the y2k38 threshold is one reliable way to reproduce. Allegedly this
    can also happen just by running ntpd, I have not managed to
    reproduce that other than booting with rtc at >2038 and then running
    ntp. When this happens, anything with timers (e.g. openjdk) breaks
    rather badly.

    And included a test case (slightly edited for brevity):
    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    long get_time(void) {
        struct timespec tp;
        clock_gettime(CLOCK_MONOTONIC, &tp);
        return tp.tv_sec + tp.tv_nsec / 1000000000;
    }

    int main(void) {
        long last = get_time();
        while (1) {
            long now = get_time();
            if (now < last) {
                printf("clock went backwards by %ld seconds!\n", last - now);
            }
            last = now;
            sleep(1);
        }
        return 0;
    }

    Which when run concurrently with:
    # date -s 2040-1-1
    # date -s 2037-1-1

    Will detect the clock going backward.

    The root cause is that wtom_clock_sec in struct vdso_data is only a
    32-bit signed value, even though we set its value to be equal to
    tk->wall_to_monotonic.tv_sec which is 64-bits.

    Because the monotonic clock starts at zero when the system boots, the
    wall_to_monotonic.tv_sec offset is negative for current and future
    dates. Currently on a freshly booted system the offset will be in the
    vicinity of negative 1.5 billion seconds.

    However if the wall clock is set past the Y2038 boundary, the offset
    from wall to monotonic becomes less than negative 2^31, and no longer
    fits in 32-bits. When that value is assigned to wtom_clock_sec it is
    truncated and becomes positive, causing the VDSO assembly code to
    calculate CLOCK_MONOTONIC incorrectly.

    That causes CLOCK_MONOTONIC to jump ahead by ~4 billion seconds which
    it is not meant to do. Worse, if the time is then set back before the
    Y2038 boundary CLOCK_MONOTONIC will jump backward.
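
    A short illustration of the truncation (the offset value is
    representative of a post-Y2038 wall clock, not taken from a real
    system):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* wall-to-monotonic offset once the clock is set past Y2038 */
        int64_t wall_to_monotonic = -2210000000LL;    /* < -2^31 */
        int32_t wtom_clock_sec = (int32_t)wall_to_monotonic;

        /* The truncated value flips positive, so the VDSO math that
         * adds it to the wall clock jumps ahead by ~2^32 seconds. */
        printf("64-bit: %lld\n", (long long)wall_to_monotonic);
        printf("32-bit: %d\n", wtom_clock_sec);
        return 0;
    }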

    We can fix it simply by storing the full 64-bit offset in the
    vdso_data, and using that in the VDSO assembly code. We also shuffle
    some of the fields in vdso_data to avoid creating a hole.

    The original commit that added the CLOCK_MONOTONIC support to the VDSO
    did actually use a 64-bit value for wtom_clock_sec, see commit
    a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to
    32 bits kernel") (Nov 2005). However just 3 days later it was
    converted to 32-bits in commit 0c37ec2aa88b ("[PATCH] powerpc: vdso
    fixes (take #2)"), and the bug has existed since then AFAICS.

    Fixes: 0c37ec2aa88b ("[PATCH] powerpc: vdso fixes (take #2)")
    Cc: stable@vger.kernel.org # v2.6.15+
    Link: http://lkml.kernel.org/r/HaC.ZfES.62bwlnvAvMP.1STMMj@seznam.cz
    Reported-by: Jakub Drnec
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     

24 Mar, 2019

11 commits

  • commit 152482580a1b0accb60676063a1ac57b2d12daf6 upstream.

    kvm_arch_memslots_updated() is at this point in time an x86-specific
    hook for handling MMIO generation wraparound. x86 stashes 19 bits of
    the memslots generation number in its MMIO sptes in order to avoid
    full page fault walks for repeat faults on emulated MMIO addresses.
    Because only 19 bits are used, wrapping the MMIO generation number is
    possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that
    the generation has changed so that it can invalidate all MMIO sptes in
    case the effective MMIO generation has wrapped so as to avoid using a
    stale spte, e.g. a (very) old spte that was created with generation==0.

    Given that the purpose of kvm_arch_memslots_updated() is to prevent
    consuming stale entries, it needs to be called before the new generation
    is propagated to memslots. Invalidating the MMIO sptes after updating
    memslots means that there is a window where a vCPU could dereference
    the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
    spte that was created with (pre-wrap) generation==0.

    Fixes: e59dbe09f8e6 ("KVM: Introduce kvm_arch_memslots_updated()")
    Cc:
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Sean Christopherson
     
  • commit 9bf3d3c4e4fd82c7174f4856df372ab2a71005b9 upstream.

    Today's message is useless:

    [ 42.253267] Kernel stack overflow in process (ptrval), r1=c65500b0

    This patch fixes it:

    [ 66.905235] Kernel stack overflow in process sh[356], r1=c65560b0

    Fixes: ad67b74d2469 ("printk: hash addresses printed with %p")
    Cc: stable@vger.kernel.org # v4.15+
    Signed-off-by: Christophe Leroy
    [mpe: Use task_pid_nr()]
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Christophe Leroy
     
  • commit 0bbea75c476b77fa7d7811d6be911cc7583e640f upstream.

    It looks like book3s/32 doesn't set RI on machine check, so
    checking RI before calling die() will always be fatal,
    although this is not an issue in most cases.

    Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
    Fixes: daf00ae71dad ("powerpc/traps: restore recoverability of machine_check interrupts")
    Signed-off-by: Christophe Leroy
    Cc: stable@vger.kernel.org
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Christophe Leroy
     
  • commit 35f2806b481f5b9207f25e1886cba5d1c4d12cc7 upstream.

    We added runtime allocation of 16G pages in commit 4ae279c2c96a
    ("powerpc/mm/hugetlb: Allow runtime allocation of 16G."). That was done
    to enable 16G allocation on PowerNV and KVM configs. In the case of a
    KVM config, we would mostly have the entire guest RAM backed by 16G
    hugetlb pages for this to work. PAPR does support partial backing of
    guest RAM with hugepages via the ibm,expected#pages property of the
    memory node in the device tree. This means the rest of the guest RAM
    won't be backed by 16G contiguous pages in the host, and hence a hash
    page table insertion can fail in such cases.

    An example error message will look like

    hash-mmu: mm: Hashing failure ! EA=0x7efc00000000 access=0x8000000000000006 current=readback
    hash-mmu: trap=0x300 vsid=0x67af789 ssize=1 base psize=14 psize 14 pte=0xc000000400000386
    readback[12260]: unhandled signal 7 at 00007efc00000000 nip 00000000100012d0 lr 000000001000127c code 2

    This patch addresses that by preventing runtime allocation of 16G
    hugepages in the LPAR config. To allocate 16G hugetlb pages one needs
    to specify them on the kernel command line: hugepagesz=16G hugepages=

    With radix translation mode we don't run into this issue.

    This change will prevent runtime allocation of 16G hugetlb pages on
    kvm with hash translation mode. However, with the current upstream it
    was observed that a 16G hugetlbfs-backed guest doesn't boot at all.

    We observe boot failure with the below message:
    [131354.647546] KVM: map_vrma at 0 failed, ret=-4

    That means this patch is not resulting in an observable regression.
    Once we fix the boot issue with 16G hugetlb backed memory, we need to
    use ibm,expected#pages memory node attribute to indicate 16G page
    reservation to the guest. This will also enable partial backing of
    guest RAM with 16G pages.

    Fixes: 4ae279c2c96a ("powerpc/mm/hugetlb: Allow runtime allocation of 16G.")
    Cc: stable@vger.kernel.org # v4.14+
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Aneesh Kumar K.V
     
  • commit ca6d5149d2ad0a8d2f9c28cbe379802260a0a5e0 upstream.

    GCC 8 warns about the logic in vr_get/set(), which with -Werror breaks
    the build:

    In function ‘user_regset_copyin’,
        inlined from ‘vr_set’ at arch/powerpc/kernel/ptrace.c:628:9:
    include/linux/regset.h:295:4: error: ‘memcpy’ offset [-527, -529] is
    out of the bounds [0, 16] of object ‘vrsave’ with type ‘union
    <anonymous>’ [-Werror=array-bounds]
    arch/powerpc/kernel/ptrace.c: In function ‘vr_set’:
    arch/powerpc/kernel/ptrace.c:623:5: note: ‘vrsave’ declared here
        } vrsave;

    This has been identified as a regression in GCC, see GCC bug 88273.

    However we can avoid the warning and also simplify the logic and make
    it more robust.

    Currently we pass -1 as end_pos to user_regset_copyout(). This says
    "copy up to the end of the regset".

    The definition of the regset is:
    [REGSET_VMX] = {
        .core_note_type = NT_PPC_VMX, .n = 34,
        .size = sizeof(vector128), .align = sizeof(vector128),
        .active = vr_active, .get = vr_get, .set = vr_set
    },

    The end is calculated as (n * size), ie. 34 * sizeof(vector128).

    In vr_get/set() we pass start_pos as 33 * sizeof(vector128), meaning
    we can copy up to sizeof(vector128) into/out-of vrsave.

    The on-stack vrsave is defined as:
    union {
        elf_vrreg_t reg;
        u32 word;
    } vrsave;

    And elf_vrreg_t is:
        typedef __vector128 elf_vrreg_t;

    So there is no bug, but we rely on all those sizes lining up,
    otherwise we would have a kernel stack exposure/overwrite on our
    hands.

    Rather than relying on that we can pass an explicit end_pos based on
    the sizeof(vrsave). The result should be exactly the same but it's
    more obviously not over-reading/writing the stack and it avoids the
    compiler warning.
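
    The arithmetic, spelled out with the sizes from the definitions
    quoted above:

    #include <stdio.h>

    int main(void)
    {
        const unsigned size_v128 = 16;                 /* sizeof(vector128) */
        const unsigned regset_end = 34 * size_v128;    /* 544, implied by end_pos = -1 */
        const unsigned start_pos = 33 * size_v128;     /* 528, where vrsave starts */

        /* With end_pos = -1 the copy window is 16 bytes, exactly
         * sizeof(vrsave) -- correct, but only because the sizes line up.
         * Passing start_pos + sizeof(vrsave) as an explicit end_pos
         * makes the bound hold by construction. */
        printf("copy window: %u bytes\n", regset_end - start_pos);
        return 0;
    }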

    Reported-by: Meelis Roos
    Reported-by: Mathieu Malaterre
    Cc: stable@vger.kernel.org
    Tested-by: Mathieu Malaterre
    Tested-by: Meelis Roos
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • commit fe1ef6bcdb4fca33434256a802a3ed6aacf0bd2f upstream.

    Commit 8792468da5e1 "powerpc: Add the ability to save FPU without
    giving it up" unexpectedly removed the MSR_FE0 and MSR_FE1 bits from
    the bitmask used to update the MSR of the previous thread in
    __giveup_fpu() causing a KVM-PR MacOS guest to lockup and panic the
    host kernel.

    Leaving FE0/1 enabled means unrelated processes might receive FPEs
    when they're not expecting them and crash. In particular if this
    happens to init the host will then panic.

    eg (transcribed):
    qemu-system-ppc[837]: unhandled signal 8 at 12cc9ce4 nip 12cc9ce4 lr 12cc9ca4 code 0
    systemd[1]: unhandled signal 8 at 202f02e0 nip 202f02e0 lr 001003d4 code 0
    Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

    Reinstate these bits to the MSR bitmask to enable MacOS guests to run
    under 32-bit KVM-PR once again without issue.

    Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it up")
    Cc: stable@vger.kernel.org # v4.6+
    Signed-off-by: Mark Cave-Ayland
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Mark Cave-Ayland
     
  • commit 19f8a5b5be2898573a5e1dc1db93e8d40117606a upstream.

    Commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api
    only on Hotplug", 2017-07-21) added two calls to opal_slw_set_reg()
    inside pnv_cpu_offline(), with the aim of changing the LPCR value in
    the SLW image to disable wakeups from the decrementer while a CPU is
    offline. However, pnv_cpu_offline() gets called each time a secondary
    CPU thread is woken up to participate in running a KVM guest, that is,
    not just when a CPU is offlined.

    Since opal_slw_set_reg() is a very slow operation (with observed
    execution times around 20 milliseconds), this means that an offline
    secondary CPU can often be busy doing the opal_slw_set_reg() call
    when the primary CPU wants to grab all the secondary threads so that
    it can run a KVM guest. This leads to messages like "KVM: couldn't
    grab CPU n" being printed and guest execution failing.

    There is no need to reprogram the SLW image on every KVM guest entry
    and exit. So that we do it only when a CPU is really transitioning
    between online and offline, this moves the calls to
    pnv_program_cpu_hotplug_lpcr() into pnv_smp_cpu_kill_self().

    Fixes: 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug")
    Cc: stable@vger.kernel.org # v4.14+
    Signed-off-by: Paul Mackerras
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Paul Mackerras
     
  • commit 36da5ff0bea2dc67298150ead8d8471575c54c7d upstream.

    The 83xx has 8 SPRG registers and uses at least SPRG4
    for DTLB handling LRU.

    Fixes: 2319f1239592 ("powerpc/mm: e300c2/c3/c4 TLB errata workaround")
    Cc: stable@vger.kernel.org
    Signed-off-by: Christophe Leroy
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Christophe Leroy
     
  • commit 7b62f9bd2246b7d3d086e571397c14ba52645ef1 upstream.

    Currently the opal log is globally readable. It is kernel policy to
    limit the visibility of physical addresses / kernel pointers to root.
    Given this, and the fact that the opal log may contain such information,
    it would be better to limit its readability to root.

    Fixes: bfc36894a48b ("powerpc/powernv: Add OPAL message log interface")
    Cc: stable@vger.kernel.org # v3.15+
    Signed-off-by: Jordan Niethe
    Reviewed-by: Stewart Smith
    Reviewed-by: Andrew Donnellan
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Jordan Niethe
     
  • commit 6d183ca8baec983dc4208ca45ece3c36763df912 upstream.

    The 'nobats' kernel parameter and some options like CONFIG_DEBUG_PAGEALLOC
    deny the use of BATs for mapping memory.

    This patch makes sure that the Wii-specific RAM mapping function
    takes this into account as well.

    Fixes: de32400dd26e ("wii: use both mem1 and mem2 as ram")
    Cc: stable@vger.kernel.org
    Reviewed-by: Jonathan Neuschafer
    Signed-off-by: Christophe Leroy
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Christophe Leroy
     
  • commit 9580b71b5a7863c24a9bd18bcd2ad759b86b1eff upstream.

    Clear the on-stack STACK_FRAME_REGS_MARKER on exception exit in order
    to avoid confusing stacktrace like the one below.

    Call Trace:
    [c0e9dca0] [c01c42a0] print_address_description+0x64/0x2bc (unreliable)
    [c0e9dcd0] [c01c4684] kasan_report+0xfc/0x180
    [c0e9dd10] [c0895130] memchr+0x24/0x74
    [c0e9dd30] [c00a9e38] msg_print_text+0x124/0x574
    [c0e9dde0] [c00ab710] console_unlock+0x114/0x4f8
    [c0e9de40] [c00adc60] vprintk_emit+0x188/0x1c4
    --- interrupt: c0e9df00 at 0x400f330
    LR = init_stack+0x1f00/0x2000
    [c0e9de80] [c00ae3c4] printk+0xa8/0xcc (unreliable)
    [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
    [c0e9df50] [c0c15434] start_kernel+0x310/0x488
    [c0e9dff0] [00003484] 0x3484

    With this patch the trace becomes:

    Call Trace:
    [c0e9dca0] [c01c42c0] print_address_description+0x64/0x2bc (unreliable)
    [c0e9dcd0] [c01c46a4] kasan_report+0xfc/0x180
    [c0e9dd10] [c0895150] memchr+0x24/0x74
    [c0e9dd30] [c00a9e58] msg_print_text+0x124/0x574
    [c0e9dde0] [c00ab730] console_unlock+0x114/0x4f8
    [c0e9de40] [c00adc80] vprintk_emit+0x188/0x1c4
    [c0e9de80] [c00ae3e4] printk+0xa8/0xcc
    [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
    [c0e9df50] [c0c15434] start_kernel+0x310/0x488
    [c0e9dff0] [00003484] 0x3484

    Cc: stable@vger.kernel.org
    Signed-off-by: Christophe Leroy
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Christophe Leroy
     

27 Feb, 2019

1 commit

  • [ Upstream commit fb0bdec51a4901b7dd088de0a1e365e1b9f5cd21 ]

    Commit 8c8c10b90d88 ("powerpc/8xx: fix handling of early NULL pointer
    dereference") moved the loading of r6 earlier in the code. As some
    functions are called inbetween, r6 needs to be loaded again with the
    address of swapper_pg_dir in order to set PTE pointers for
    the Abatron BDI.

    Fixes: 8c8c10b90d88 ("powerpc/8xx: fix handling of early NULL pointer dereference")
    Signed-off-by: Christophe Leroy
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin

    Christophe Leroy
     

15 Feb, 2019

1 commit

  • commit 579b9239c1f38665b21e8d0e6ee83ecc96dbd6bb upstream.

    With support for split pmd lock, we use the pmd page's pmd_huge_pte
    pointer to store the deposited page table. In those configs, when we
    move page tables we need to make sure we move the deposited page table
    to the correct pmd page. Otherwise this can result in a crash when we
    withdraw the deposited page table, because pmd_huge_pte can be found NULL.

    eg:

    __split_huge_pmd+0x1070/0x1940
    __split_huge_pmd+0xe34/0x1940 (unreliable)
    vma_adjust_trans_huge+0x110/0x1c0
    __vma_adjust+0x2b4/0x9b0
    __split_vma+0x1b8/0x280
    __do_munmap+0x13c/0x550
    sys_mremap+0x220/0x7e0
    system_call+0x5c/0x70

    Fixes: 675d995297d4 ("powerpc/book3s64: Enable split pmd ptlock.")
    Cc: stable@vger.kernel.org # v4.18+
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Aneesh Kumar K.V
     

13 Feb, 2019

5 commits

  • [ Upstream commit 0db6896ff6332ba694f1e61b93ae3b2640317633 ]

    For fadump to work successfully there should not be any holes in the
    reserved memory ranges where the kernel has asked firmware to move the
    content of old kernel memory in the event of a crash. Now that fadump
    uses CMA for the reserved area, this memory area is not protected from
    hot-remove operations unless it is CMA-allocated. Hence, the fadump
    service can fail to re-register after a hot-remove operation if the
    hot-removed memory belongs to the fadump reserved region. To avoid this,
    make sure that memory from the fadump reserved area is not hot-removable
    if fadump is registered.

    However, if the user still wants to remove that memory, they can do so
    by manually stopping the fadump service before the hot-remove operation.

    Signed-off-by: Mahesh Salgaonkar
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin

    Mahesh Salgaonkar
     
  • [ Upstream commit ffca395b11c4a5a6df6d6345f794b0e3d578e2d0 ]

    On the 8xx, no-execute is set via PPP bits in the PTE. Therefore
    a no-exec fault generates DSISR_PROTFAULT error bits,
    not DSISR_NOEXEC_OR_G.

    This patch adds DSISR_PROTFAULT to the test mask.

    Fixes: d3ca587404b3 ("powerpc/mm: Fix reporting of kernel execute faults")
    Signed-off-by: Christophe Leroy
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin

    Christophe Leroy
     
  • [ Upstream commit bdbf649efe21173cae63b4b71db84176420f9039 ]

    The powernv platform maintains 2 TCE tables for VFIO - a hardware TCE
    table and a table with userspace addresses; the latter is used for
    marking pages dirty when corresponding TCEs are unmapped from
    the hardware table.

    a68bd1267b72 ("powerpc/powernv/ioda: Allocate indirect TCE levels
    on demand") enabled on-demand allocation of the hardware table,
    however it missed the other table so it has still been fully allocated
    at the boot time. This fixes the issue by allocating a single level,
    just like we do for the hardware table.

    Fixes: a68bd1267b72 ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand")
    Signed-off-by: Alexey Kardashevskiy
    Reviewed-by: David Gibson
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin

    Alexey Kardashevskiy
     
  • [ Upstream commit 17cfccc91545682513541924245abb876d296063 ]

    MMCRA[34:36] and MMCRA[38:44] expose the thresholding counter value.
    The thresholding counter can be used to count latency cycles, such as
    from a load miss to its reload. But the threshold counter value is not
    relevant when the sampled instruction type is unknown or reserved, so
    fix the thresholding counter value to zero when the sampled instruction
    type is unknown or reserved.

    Fixes: 170a315f41c6 ("powerpc/perf: Support to export MMCRA[TEC*] field to userspace")
    Signed-off-by: Madhavan Srinivasan
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin

    Madhavan Srinivasan
     
  • [ Upstream commit 05a4ab823983d9136a460b7b5e0d49ee709a6f86 ]

    With the following piece of code, the following compilation warning
    is encountered:

    if (_IOC_DIR(ioc) != _IOC_NONE) {
        int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : VERIFY_READ;

        if (!access_ok(verify, ioarg, _IOC_SIZE(ioc))) {

    drivers/platform/test/dev.c: In function 'my_ioctl':
    drivers/platform/test/dev.c:219:7: warning: unused variable 'verify' [-Wunused-variable]
    int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : VERIFY_READ;

    This patch fixes it by referencing 'type' in the macro, although
    doing nothing with it.
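
    A standalone model of the fix pattern (the macros here are mock-ups,
    not the real powerpc access_ok()):

    #include <stdio.h>

    /* A macro that ignores its 'type' argument leaves the caller's
     * variable genuinely unused after preprocessing... */
    #define access_ok_old(type, addr, size) (1)
    /* ...while evaluating it through a (void) cast keeps the variable
     * "used" without changing behaviour, as the commit describes. */
    #define access_ok_new(type, addr, size) ((void)(type), 1)

    int main(void)
    {
        int verify = 0;    /* -Wunused-variable fires with access_ok_old() */

        if (access_ok_new(verify, 0, 0))
            printf("ok\n");
        return 0;
    }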

    Signed-off-by: Christophe Leroy
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin

    Christophe Leroy