28 Sep, 2022

2 commits

  • [ Upstream commit 68be1306caea8948738cab04014ca4506b590d38 ]

    Consolidate rmap_recycle and rmap_add into a single function since they
    are only ever called together (and only from one place). This has a nice
    side effect of eliminating an extra kvm_vcpu_gfn_to_memslot(). In
    addition it makes mmu_set_spte(), which is a very long function, a
    little shorter.

    No functional change intended.

    Signed-off-by: David Matlack
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Stable-dep-of: 604f533262ae ("KVM: x86/mmu: add missing update to max_mmu_rmap_size")
    Signed-off-by: Sasha Levin

    David Matlack
     
  • commit 50b2d49bafa16e6311ab2da82f5aafc5f9ada99b upstream.

    Inject #UD when emulating XSETBV if CR4.OSXSAVE is not set. This also
    covers the "XSAVE not supported" check, as setting CR4.OSXSAVE=1 #GPs if
    XSAVE is not supported (and userspace gets to keep the pieces if it
    forces incoherent vCPU state).

    Add a comment to kvm_emulate_xsetbv() to call out that the CPU checks
    CR4.OSXSAVE before checking for intercepts. AMD's APM implies that #UD
    has priority (says that intercepts are checked before #GP exceptions),
    while Intel's SDM says nothing about interception priority. However,
    testing on hardware shows that both AMD and Intel CPUs prioritize the #UD
    over interception.
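
    As a rough sketch of the resulting order of checks (helper names follow
    KVM's but may differ between kernel versions), the #UD is injected before
    the XCR write is even attempted:

    /* Minimal sketch, not the exact kvm_emulate_xsetbv() implementation. */
    int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
    {
            /* The CPU checks CR4.OSXSAVE before checking for intercepts,
             * so the #UD has priority over exiting to the hypervisor. */
            if (!kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) {
                    kvm_queue_exception(vcpu, UD_VECTOR);
                    return 1;
            }

            if (kvm_set_xcr(vcpu, kvm_rcx_read(vcpu), kvm_read_edx_eax(vcpu))) {
                    kvm_inject_gp(vcpu, 0);
                    return 1;
            }

            return kvm_skip_emulated_instruction(vcpu);
    }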

    Fixes: 02d4160fbd76 ("x86: KVM: add xsetbv to the emulator")
    Cc: stable@vger.kernel.org
    Cc: Vitaly Kuznetsov
    Signed-off-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Sean Christopherson
     

23 Sep, 2022

1 commit

  • commit 683412ccf61294d727ead4a73d97397396e69a6b upstream.

    Flush the CPU caches when memory is reclaimed from an SEV guest (where
    reclaim also includes it being unmapped from KVM's memslots). Due to lack
    of coherency for SEV encrypted memory, failure to flush results in silent
    data corruption if userspace is malicious/broken and doesn't ensure SEV
    guest memory is properly pinned and unpinned.

    Cache coherency is not enforced across the VM boundary in SEV (AMD APM
    vol.2 Section 15.34.7). Confidential cachelines generated by confidential
    VM guests have to be explicitly flushed on the host side. If a memory page
    containing dirty confidential cachelines is released by the VM and reallocated
    to another user, the cachelines may corrupt the new user at a later time.

    KVM takes a shortcut by assuming all confidential memory remains pinned
    until the end of the VM's lifetime. Therefore, KVM does not flush caches at
    mmu_notifier invalidation events. Because of this incorrect assumption and
    the lack of cache flushing, malicious userspace can crash the host kernel
    by creating a malicious VM and continuously allocating and releasing unpinned
    confidential memory pages while the VM is running.

    Add cache flush operations to the mmu_notifier operations to ensure that any
    physical memory leaving the guest VM gets flushed. In particular, hook the
    mmu_notifier_invalidate_range_start and mmu_notifier_release events and
    flush the cache accordingly. The flush is done after releasing the mmu lock
    to avoid contention with other vCPUs.
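
    A minimal sketch of the SVM-side flush (the function name is illustrative;
    sev_guest() and wbinvd_on_all_cpus() are helpers already used by the SEV
    code):

    static void sev_guest_memory_reclaimed(struct kvm *kvm)
    {
            if (!sev_guest(kvm))
                    return;

            /* SEV cachelines are not coherent with the host; flush them
             * before the reclaimed pages can be reused by anyone else. */
            wbinvd_on_all_cpus();
    }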

    Cc: stable@vger.kernel.org
    Suggested-by: Sean Christopherson
    Reported-by: Mingwei Zhang
    Signed-off-by: Mingwei Zhang
    Message-Id:
    Signed-off-by: Paolo Bonzini
    [OP: adjusted KVM_X86_OP_OPTIONAL() -> KVM_X86_OP_NULL, applied
    kvm_arch_guest_memory_reclaimed() call in kvm_set_memslot()]
    Signed-off-by: Ovidiu Panait
    Signed-off-by: Greg Kroah-Hartman

    Mingwei Zhang
     

20 Sep, 2022

3 commits

  • [ Upstream commit e87f4152e542610d0b4c6c8548964a68a59d2040 ]

    Force-inline two stack helpers to fix the following objtool warnings:

    vmlinux.o: warning: objtool: in_task_stack()+0xc: call to task_stack_page() leaves .noinstr.text section
    vmlinux.o: warning: objtool: in_entry_stack()+0x10: call to cpu_entry_stack() leaves .noinstr.text section
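
    Illustratively, force-inlining such a tiny helper keeps its caller entirely
    inside .noinstr.text (a sketch, not the exact kernel declaration):

    /* __always_inline instead of plain inline, so the compiler can never
     * emit an out-of-line copy that noinstr code would have to call. */
    static __always_inline void *task_stack_page(const struct task_struct *task)
    {
            return task->stack;
    }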

    Signed-off-by: Borislav Petkov
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lore.kernel.org/r/20220324183607.31717-2-bp@alien8.de
    Stable-dep-of: 54c3931957f6 ("tracing: hold caller_addr to hardirq_{enable,disable}_ip")
    Signed-off-by: Sasha Levin

    Borislav Petkov
     
  • [ Upstream commit ace1a98519270c586c0d4179419292df67441cd1 ]

    Fix:

    vmlinux.o: warning: objtool: __sev_es_nmi_complete()+0x8b: call to __phys_addr_nodebug() leaves .noinstr.text section

    Signed-off-by: Borislav Petkov
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lore.kernel.org/r/20220324183607.31717-4-bp@alien8.de
    Stable-dep-of: 54c3931957f6 ("tracing: hold caller_addr to hardirq_{enable,disable}_ip")
    Signed-off-by: Sasha Levin

    Borislav Petkov
     
  • [ Upstream commit 8b023accc8df70e72f7704d29fead7ca914d6837 ]

    While looking into a bug related to the compiler's handling of addresses
    of labels, I noticed some uses of _THIS_IP_ seemed unused in lockdep.
    Drive-by cleanup.

    -Wunused-parameter:
    kernel/locking/lockdep.c:1383:22: warning: unused parameter 'ip'
    kernel/locking/lockdep.c:4246:48: warning: unused parameter 'ip'
    kernel/locking/lockdep.c:4844:19: warning: unused parameter 'ip'

    Signed-off-by: Nick Desaulniers
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Waiman Long
    Link: https://lore.kernel.org/r/20220314221909.2027027-1-ndesaulniers@google.com
    Stable-dep-of: 54c3931957f6 ("tracing: hold caller_addr to hardirq_{enable,disable}_ip")
    Signed-off-by: Sasha Levin

    Nick Desaulniers
     

08 Sep, 2022

2 commits

  • [ Upstream commit 0204750bd4c6ccc2fb7417618477f10373b33f56 ]

    KVM should not claim to virtualize unknown IA32_ARCH_CAPABILITIES
    bits. When kvm_get_arch_capabilities() was originally written, there
    were only a few bits defined in this MSR, and KVM could virtualize all
    of them. However, over the years, several bits have been defined that
    KVM cannot just blindly pass through to the guest without additional
    work (such as virtualizing an MSR promised by the
    IA32_ARCH_CAPABILITIES feature bit).

    Define a mask of supported IA32_ARCH_CAPABILITIES bits, and mask off
    any other bits that are set in the hardware MSR.
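
    A simplified sketch of the approach (KVM_SUPPORTED_ARCH_CAP stands for
    whatever set of bits KVM actually knows how to virtualize; the real
    function also synthesizes additional bits):

    u64 kvm_get_arch_capabilities(void)
    {
            u64 data = 0;

            if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
                    rdmsrl(MSR_IA32_ARCH_CAPABILITIES, data);

            /* Drop any hardware bits KVM does not know how to handle. */
            return data & KVM_SUPPORTED_ARCH_CAP;
    }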

    Cc: Paolo Bonzini
    Fixes: 5b76a3cff011 ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry")
    Signed-off-by: Jim Mattson
    Reviewed-by: Vipin Sharma
    Reviewed-by: Xiaoyao Li
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Jim Mattson
     
  • [ Upstream commit 020dac4187968535f089f83f376a72beb3451311 ]

    Regardless of the 'msr' argument passed to the VMX version of
    msr_write_intercepted(), the function always checks to see if a
    specific MSR (IA32_SPEC_CTRL) is intercepted for write. This behavior
    seems unintentional and unexpected.

    Modify the function so that it checks to see if the provided 'msr'
    index is intercepted for write.
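
    Roughly, the corrected check tests the write bitmap for the MSR that was
    actually passed in (a sketch; helper names vary between kernel versions):

    static bool msr_write_intercepted(struct vcpu_vmx *vmx, u32 msr)
    {
            if (!cpu_has_vmx_msr_bitmap())
                    return true;

            /* Test the bit for 'msr', not a hard-coded MSR_IA32_SPEC_CTRL. */
            return vmx_test_msr_bitmap_write(vmx->loaded_vmcs->msr_bitmap, msr);
    }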

    Fixes: 67f4b9969c30 ("KVM: nVMX: Handle dynamic MSR intercept toggling")
    Cc: Sean Christopherson
    Signed-off-by: Jim Mattson
    Reviewed-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Jim Mattson
     

31 Aug, 2022

8 commits

  • commit d4bdb0bebc5ba3299d74f123c782d99cd4e25c49 upstream.

    With the existing code in store_latency_data(), the memory operation (mem_op)
    returned to the user is always OP_LOAD when, in fact, it should be OP_STORE.
    This comes from the fact that the function is simply grabbing the information
    from a data source map which covers only load accesses. Intel 12th gen CPU
    offers precise store sampling that captures both the data source and latency.
    Therefore it can use the data source mapping table but must override the
    memory operation to reflect stores instead of loads.

    Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids")
    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20220818054613.1548130-1-eranian@google.com
    Signed-off-by: Greg Kroah-Hartman

    Stephane Eranian
     
  • commit 11745ecfe8fea4b4a4c322967a7605d2ecbd5080 upstream.

    Existing code was generating bogus counts for the SNB IMC bandwidth counters:

    $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
    1.000327813 1,024.03 MiB uncore_imc/data_reads/
    1.000327813 20.73 MiB uncore_imc/data_writes/
    2.000580153 261,120.00 MiB uncore_imc/data_reads/
    2.000580153 23.28 MiB uncore_imc/data_writes/

    The problem was introduced by commit:
    07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC")

    where the read_counter callback was replaced to point to the generic
    uncore_mmio_read_counter() function.

    The SNB IMC counters are free-running 32-bit counters laid out contiguously in
    MMIO. But uncore_mmio_read_counter() is using a readq() call to read from
    MMIO therefore reading 64-bit from MMIO. Although this is okay for the
    uncore_perf_event_update() function because it is shifting the value based
    on the actual counter width to compute a delta, it is not okay for the
    uncore_pmu_event_start() which is simply reading the counter and therefore
    priming the event->prev_count with a bogus value which is responsible for
    causing bogus deltas in the perf stat command above.

    The fix is to reintroduce the custom callback for read_counter for the SNB
    IMC PMU and use readl() instead of readq(). With the change the output of
    perf stat is back to normal:
    $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
    1.000120987 296.94 MiB uncore_imc/data_reads/
    1.000120987 138.42 MiB uncore_imc/data_writes/
    2.000403144 175.91 MiB uncore_imc/data_reads/
    2.000403144 68.50 MiB uncore_imc/data_writes/
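
    The custom callback boils down to a 32-bit MMIO read (a sketch; struct and
    field names follow the uncore driver but may differ between versions):

    static u64 snb_uncore_imc_read_counter(struct intel_uncore_box *box,
                                           struct perf_event *event)
    {
            /* The IMC counters are free-running 32-bit values packed in MMIO,
             * so read 32 bits to avoid picking up the neighbouring counter. */
            return (u64)readl(box->io_addr + event->hw.event_base);
    }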

    Fixes: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC")
    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Kan Liang
    Link: https://lore.kernel.org/r/20220803160031.1379788-1-eranian@google.com
    Signed-off-by: Greg Kroah-Hartman

    Stephane Eranian
     
  • commit 332924973725e8cdcc783c175f68cf7e162cb9e5 upstream.

    Turns out that i386 doesn't unconditionally have LFENCE, as such the
    loop in __FILL_RETURN_BUFFER isn't actually speculation safe on such
    chips.

    Fixes: ba6e31af2be9 ("x86/speculation: Add LFENCE to RSB fill sequence")
    Reported-by: Ben Hutchings
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/Yv9tj9vbQ9nNlXoY@worktop.programming.kicks-ass.net
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 4e3aa9238277597c6c7624f302d81a7b568b6f2d upstream.

    Commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections")
    made a right mess of the RSB stuffing, rewrite the whole thing to not
    suck.

    Thanks to Andrew for the enlightening comment about Post-Barrier RSB
    things so we can make this code less magical.

    Cc: stable@vger.kernel.org
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/YvuNdDWoUZSBjYcm@worktop.programming.kicks-ass.net
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 7df548840c496b0141fb2404b889c346380c2b22 upstream.

    Older Intel CPUs that are not in the affected processor list for MMIO
    Stale Data vulnerabilities currently report "Not affected" in sysfs,
    which may not be correct. Vulnerability status for these older CPUs is
    unknown.

    Add known-not-affected CPUs to the whitelist. Report "unknown"
    mitigation status for CPUs that are not in the blacklist or whitelist and also
    don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware
    immunity to MMIO Stale Data vulnerabilities.

    Mitigation is not deployed when the status is unknown.

    [ bp: Massage, fixup. ]

    Fixes: 8d50cdf8b834 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data")
    Suggested-by: Andrew Cooper
    Suggested-by: Tony Luck
    Signed-off-by: Pawan Gupta
    Signed-off-by: Borislav Petkov
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman

    Pawan Gupta
     
  • commit fc2e426b1161761561624ebd43ce8c8d2fa058da upstream.

    When meeting ftrace trampolines in ORC unwinding, the unwinder uses the
    address of ftrace_{regs_}call to find the ORC entry, which gets the next
    frame at sp+176.

    If an IRQ hits at sub $0xa8,%rsp, the next frame should be at sp+8
    instead of sp+176. This makes the unwinder skip the correct frame and throw
    warnings such as "wrong direction" or "can't access registers", etc.,
    depending on the content of the incorrect frame address.

    By adding the offset *ip - ops->trampoline* to the base address
    ftrace_{regs_}caller, we can get the correct address to find the ORC entry.

    Also change "caller" to "tramp_addr" to make the variable name conform to
    its content.
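
    In rough terms, the lookup becomes (a sketch, not the exact unwinder code):

    /* Translate 'ip' inside the dynamically allocated trampoline back to the
     * equivalent position in ftrace_{regs_}caller, which has ORC data. */
    struct orc_entry *orc;
    unsigned long tramp_addr = (unsigned long)ftrace_regs_caller;

    tramp_addr += ip - ops->trampoline;   /* keep the offset into the trampoline */
    orc = orc_find(tramp_addr);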

    [ mingo: Clarified the changelog a bit. ]

    Fixes: 6be7fa3c74d1 ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines")
    Signed-off-by: Chen Zhongjin
    Signed-off-by: Ingo Molnar
    Reviewed-by: Steven Rostedt (Google)
    Cc:
    Link: https://lore.kernel.org/r/20220819084334.244016-1-chenzhongjin@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Chen Zhongjin
     
  • commit 32ba156df1b1c8804a4e5be5339616945eafea22 upstream.

    On platforms with Arch LBR, the HW raw branch type encoding may leak
    to the perf tool when the SAVE_TYPE option is not set.

    In the intel_pmu_store_lbr(), the HW raw branch type is stored in
    lbr_entries[].type. If the SAVE_TYPE option is set, the
    lbr_entries[].type will be converted into the generic PERF_BR_* type
    in the intel_pmu_lbr_filter() and exposed to the user tools.
    But if the SAVE_TYPE option is NOT set by the user, the current perf
    kernel doesn't clear the field. The HW raw branch type leaks.

    There are two solutions to fix the issue for the Arch LBR.
    One is to clear the field if the SAVE_TYPE option is NOT set.
    The other solution is to unconditionally convert the branch type and
    expose the generic type to the user tools.

    The latter is implemented here, because
    - The branch type is valuable information. I don't see a case where
    you would not benefit from the branch type. (Stephane Eranian)
    - Not having the branch type DOES NOT save any space in the
    branch record (Stephane Eranian)
    - The Arch LBR HW can retrieve the common branch types from the
    LBR_INFO. It doesn't require the high overhead SW disassemble.

    Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
    Reported-by: Stephane Eranian
    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20220816125612.2042397-1-kan.liang@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman

    Kan Liang
     
  • commit c64cc2802a784ecfd25d39945e57e7a147854a5b upstream.

    Move it after CLAC.

    Suggested-by: Peter Zijlstra
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Borislav Petkov
    Link: https://lore.kernel.org/r/20220503032107.680190-5-jiangshanlai@gmail.com
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Lai Jiangshan
     

25 Aug, 2022

2 commits

  • commit 8924779df820c53875abaeb10c648e9cb75b46d4 upstream.

    When kprobes emulates JNG/JNLE instructions on x86 it uses the wrong
    condition. For JNG (opcode: 0F 8E), according to Intel SDM, the jump is
    performed if (ZF == 1 or SF != OF). However the kernel emulation
    currently uses 'and' instead of 'or'.

    As a result, setting a kprobe on JNG/JNLE might cause the kernel to
    behave incorrectly whenever the kprobe is hit.

    Fix by changing the 'and' to 'or'.
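
    A minimal sketch of the corrected condition (not the kernel's actual
    emulation code):

    /* JNG/JLE is taken if ZF == 1 or SF != OF. */
    static bool jle_taken(unsigned long eflags)
    {
            bool zf = eflags & X86_EFLAGS_ZF;
            bool sf = eflags & X86_EFLAGS_SF;
            bool of = eflags & X86_EFLAGS_OF;

            return zf || (sf != of);        /* was incorrectly: zf && (sf != of) */
    }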

    Fixes: 6256e668b7af ("x86/kprobes: Use int3 instead of debug trap for single-step")
    Signed-off-by: Nadav Amit
    Signed-off-by: Ingo Molnar
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20220813225943.143767-1-namit@vmware.com
    Signed-off-by: Greg Kroah-Hartman

    Nadav Amit
     
  • commit 88e0a74902f894fbbc55ad3ad2cb23b4bfba555c upstream.

    Commit c164fbb40c43f ("x86/mm: thread pgprot_t through
    init_memory_mapping()") mistakenly used __pgprot() which doesn't respect
    __default_kernel_pte_mask when setting PUD mapping.

    Fix it by only setting the one bit we actually need (PSE) and leaving
    the other bits (that have been properly masked) alone.
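
    A minimal sketch of the idea (paddr, prot and pud stand in for the state
    of the surrounding mapping loop):

    /* Keep the already-masked protection bits from 'prot' and only add the
     * PSE (large page) bit, instead of building a fresh, unmasked pgprot. */
    pgprot_t pud_prot = __pgprot(pgprot_val(prot) | _PAGE_PSE);

    set_pud(pud, pfn_pud(paddr >> PAGE_SHIFT, pud_prot));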

    Fixes: c164fbb40c43 ("x86/mm: thread pgprot_t through init_memory_mapping()")
    Signed-off-by: Aaron Lu
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Aaron Lu
     

21 Aug, 2022

3 commits

  • commit 1f001e9da6bbf482311e45e48f53c2bd2179e59c upstream.

    Use the return thunk in ftrace trampolines, if needed.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Borislav Petkov
    Reviewed-by: Josh Poimboeuf
    Signed-off-by: Borislav Petkov
    [cascardo: use memcpy(text_gen_insn) as there is no __text_gen_insn]
    Signed-off-by: Thadeu Lima de Souza Cascardo
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit e52fc2cf3f662828cc0d51c4b73bed73ad275fce upstream.

    Return trampoline must not use indirect branch to return; while this
    preserves the RSB, it is fundamentally incompatible with IBT. Instead
    use a retpoline-like ROP gadget that defeats IBT while not unbalancing
    the RSB.

    And since ftrace_stub is no longer a plain RET, don't use it to copy
    from. Since RET is a trivial instruction, poke it directly.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Josh Poimboeuf
    Link: https://lore.kernel.org/r/20220308154318.347296408@infradead.org
    [cascardo: remove ENDBR]
    Signed-off-by: Thadeu Lima de Souza Cascardo
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • This reverts commit e54fcb0812faebd147de72bd37ad87cc4951c68c.

    This temporarily reverts the backport of upstream commit
    1f001e9da6bbf482311e45e48f53c2bd2179e59c. It was not correct to copy the
    ftrace stub, as it would contain a relative jump to the return thunk which
    would not apply to the context it was being copied to, leaving ftrace
    support broken.

    Signed-off-by: Thadeu Lima de Souza Cascardo
    Signed-off-by: Greg Kroah-Hartman

    Thadeu Lima de Souza Cascardo
     

17 Aug, 2022

19 commits

  • [ Upstream commit 4496a6f9b45e8cd83343ad86a3984d614e22cf54 ]

    Attempt to load PERF_GLOBAL_CTRL during nested VM-Enter/VM-Exit if and
    only if the MSR exists (according to the guest vCPU model). KVM has very
    misguided handling of VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL and
    attempts to force the nVMX MSR settings to match the vPMU model, i.e. to
    hide/expose the control based on whether or not the MSR exists from the
    guest's perspective.

    KVM's modifications fail to handle the scenario where the vPMU is hidden
    from the guest _after_ being exposed to the guest, e.g. by userspace
    doing multiple KVM_SET_CPUID2 calls, which is allowed if done before any
    KVM_RUN. nested_vmx_pmu_refresh() is called if and only if there's a
    recognized vPMU, i.e. KVM will leave the bits in the allow state and then
    ultimately reject the MSR load and WARN.

    KVM should not force the VMX MSRs in the first place. KVM taking control
    of the MSRs was a misguided attempt at mimicking what commit 5f76f6f5ff96
    ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled",
    2018-10-01) did for MPX. However, the MPX commit was a workaround for
    another KVM bug and not something that should be imitated (and it should
    never have been done in the first place).

    In other words, KVM's ABI _should_ be that userspace has full control
    over the MSRs, at which point triggering the WARN that loading the MSR
    must not fail is trivial.

    The intent of the WARN is still valid; KVM has consistency checks to
    ensure that vmcs12->{guest,host}_ia32_perf_global_ctrl is valid. The
    problem is that '0' must be considered a valid value at all times, and so
    the simple/obvious solution is to just not actually load the MSR when it
    does not exist. It is userspace's responsibility to provide a sane vCPU
    model, i.e. KVM is well within its ABI and Intel's VMX architecture to
    skip the loads if the MSR does not exist.

    Fixes: 03a8871add95 ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Sean Christopherson
     
  • [ Upstream commit b663f0b5f3d665c261256d1f76e98f077c6e56af ]

    Add a helper to check if the guest PMU has PERF_GLOBAL_CTRL, which is
    unintuitive _and_ diverges from Intel's architecturally defined behavior.
    Even worse, KVM currently implements the check using two different (but
    equivalent) checks, _and_ there has been at least one attempt to add a
    _third_ flavor.
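
    A minimal sketch of such a helper (field names as in KVM's vPMU code, which
    may differ between versions):

    /* PERF_GLOBAL_CTRL exists for Intel architectural PMU version 2 and later. */
    static inline bool kvm_pmu_has_perf_global_ctrl(struct kvm_pmu *pmu)
    {
            return pmu->version > 1;
    }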

    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Sean Christopherson
     
  • [ Upstream commit 98defd2e17803263f49548fea930cfc974d505aa ]

    MSR_CORE_PERF_GLOBAL_CTRL is introduced as part of Architecture PMU V2,
    as indicated by Intel SDM 19.2.2 and the intel_is_valid_msr() function.

    So in the absence of global_ctrl support, all PMCs are enabled as AMD does.

    Signed-off-by: Like Xu
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Like Xu
     
  • [ Upstream commit 93255bf92939d948bc86d81c6bb70bb0fecc5db1 ]

    Mark all MSR_CORE_PERF_GLOBAL_CTRL and MSR_CORE_PERF_GLOBAL_OVF_CTRL bits
    as reserved if there is no guest vPMU. The nVMX VM-Entry consistency
    checks do not check for a valid vPMU prior to consuming the masks via
    kvm_valid_perf_global_ctrl(), i.e. may incorrectly allow a non-zero mask
    to be loaded via VM-Enter or VM-Exit (well, attempted to be loaded, the
    actual MSR load will be rejected by intel_is_valid_msr()).

    Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Sean Christopherson
     
  • [ Upstream commit 2c985527dd8d283e786ad7a67e532ef7f6f00fac ]

    The mask value of the fixed counter control register should be dynamically
    adjusted with the number of fixed counters. This patch introduces a
    variable that includes the reserved bits of fixed counter control
    registers. This is a generic code refactoring.
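
    Conceptually, the reserved-bit mask is derived from the number of fixed
    counters, each of which owns four control bits (a sketch of the refresh
    logic; field names may differ between versions):

    u64 fixed_ctr_ctrl_mask = ~0ull;
    int i;

    /* Clear the usable control bits (0xb: OS enable, USR enable, PMI enable)
     * for each fixed counter that actually exists; the rest stay reserved. */
    for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
            fixed_ctr_ctrl_mask &= ~(0xbull << (i * 4));

    pmu->fixed_ctr_ctrl_mask = fixed_ctr_ctrl_mask;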

    Co-developed-by: Luwei Kang
    Signed-off-by: Luwei Kang
    Signed-off-by: Like Xu
    Acked-by: Peter Zijlstra (Intel)
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Like Xu
     
  • [ Upstream commit 2368048bf5c2ec4b604ac3431564071e89a0bc71 ]

    Return '1', not '-1', when handling an illegal WRMSR to an MCi_CTL or
    MCi_STATUS MSR. The behavior of "all zeros" or "all ones" for CTL MSRs
    is architectural, as is the "only zeros" behavior for STATUS MSRs. I.e.
    the intent is to inject a #GP, not exit to userspace due to an unhandled
    emulation case. Returning '-1' gets interpreted as -EPERM up the stack
    and effectively kills the guest.
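
    In sketch form (illustrative, not the exact MCE MSR handling code):

    /* For MCi_CTL, only "all zeros" or "all ones" is architecturally valid.
     * Returning 1 makes KVM inject #GP into the guest; returning -1 is
     * treated as -EPERM by the caller and kills the guest. */
    if (data != 0 && data != ~0ull)
            return 1;               /* was: return -1; */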

    Fixes: 890ca9aefa78 ("KVM: Add MCE support")
    Fixes: 9ffd986c6e4e ("KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    Reviewed-by: Jim Mattson
    Link: https://lore.kernel.org/r/20220512222716.4112548-2-seanjc@google.com
    Signed-off-by: Sasha Levin

    Sean Christopherson
     
  • [ Upstream commit 0471a7bd1bca2a47a5f378f2222c5cf39ce94152 ]

    Certain guest operating systems (e.g., UNIXWARE) clear bit 0 of
    MC1_CTL to ignore single-bit ECC data errors. Single-bit ECC data
    errors are always correctable and thus are safe to ignore because they
    are informational in nature rather than signaling a loss of data
    integrity.

    Prior to this patch, these guests would crash upon writing MC1_CTL,
    with resultant error messages like the following:

    error: kvm run failed Operation not permitted
    EAX=fffffffe EBX=fffffffe ECX=00000404 EDX=ffffffff
    ESI=ffffffff EDI=00000001 EBP=fffdaba4 ESP=fffdab20
    EIP=c01333a5 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ES =0108 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
    CS =0100 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
    SS =0108 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
    DS =0108 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
    FS =0000 00000000 ffffffff 00c00000
    GS =0000 00000000 ffffffff 00c00000
    LDT=0118 c1026390 00000047 00008200 DPL=0 LDT
    TR =0110 ffff5af0 00000067 00008b00 DPL=0 TSS32-busy
    GDT= ffff5020 000002cf
    IDT= ffff52f0 000007ff
    CR0=8001003b CR2=00000000 CR3=0100a000 CR4=00000230
    DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
    DR6=ffff0ff0 DR7=00000400
    EFER=0000000000000000
    Code=08 89 01 89 51 04 c3 8b 4c 24 08 8b 01 8b 51 04 8b 4c 24 04
    30 c3 f7 05 a4 6d ff ff 10 00 00 00 74 03 0f 31 c3 33 c0 33 d2 c3 8d
    74 26 00 0f 31 c3

    Signed-off-by: Lev Kujawski
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Lev Kujawski
     
  • [ Upstream commit 2d16803c562ecc644803d42ba98a8e0aef9c014e ]

    BLAKE2s has no currently known use as an shash. Just remove all of this
    unnecessary plumbing. Removing this shash was something we talked about
    back when we were making BLAKE2s a built-in, but I simply never got
    around to doing it. So this completes that project.

    Importantly, this fixes a bug in which the lib code depends on
    crypto_simd_disabled_for_test, causing linker errors.

    Also add more alignment tests to the selftests and compare SIMD and
    non-SIMD compression functions, to make up for what we lose from
    testmgr.c.

    Reported-by: gaochao
    Cc: Eric Biggers
    Cc: Ard Biesheuvel
    Cc: stable@vger.kernel.org
    Fixes: 6048fdcc5f26 ("lib/crypto: blake2s: include as built-in")
    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Herbert Xu
    Signed-off-by: Sasha Levin

    Jason A. Donenfeld
     
  • commit 3a2ba42cbd0b669ce3837ba400905f93dd06c79f upstream.

    The bitops compile-time optimization series revealed one more
    problem in olpc-xo1-sci.c:send_ebook_state(), resulting in GCC
    warnings:

    arch/x86/platform/olpc/olpc-xo1-sci.c: In function 'send_ebook_state':
    arch/x86/platform/olpc/olpc-xo1-sci.c:83:63: warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses]
    83 | if (!!test_bit(SW_TABLET_MODE, ebook_switch_idev->sw) == state)
    | ^~
    arch/x86/platform/olpc/olpc-xo1-sci.c:83:13: note: add parentheses around left hand side expression to silence this warning

    Despite this code working as intended, this redundant double
    negation of a boolean value, together with comparing to `char`
    with no explicit conversion to bool, makes compilers think
    the author made some unintentional logical mistakes here.
    Make it the other way around and negate the char instead
    to silence the warnings.
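
    A sketch of the change (the surrounding function is omitted):

    /* Negate the char 'state' instead of the bit test, so both sides of the
     * comparison are plain booleans (was: !!test_bit(...) == state). */
    if (test_bit(SW_TABLET_MODE, ebook_switch_idev->sw) == !!state)
            return;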

    Fixes: d2aa37411b8e ("x86/olpc/xo1/sci: Produce wakeup events for buttons and switches")
    Cc: stable@vger.kernel.org # 3.5+
    Reported-by: Guenter Roeck
    Reported-by: kernel test robot
    Reviewed-and-tested-by: Guenter Roeck
    Signed-off-by: Alexander Lobakin
    Signed-off-by: Yury Norov
    Signed-off-by: Greg Kroah-Hartman

    Alexander Lobakin
     
  • commit dec8784c9088b131a1523f582c2194cfc8107dc0 upstream.

    Fix kprobes to update kcb (kprobes control block) status flag to
    KPROBE_HIT_SSDONE even if the kp->post_handler is not set.

    This bug may cause a kernel panic if another INT3 user runs right
    after kprobes because kprobe_int3_handler() misunderstands the
    INT3 is kprobe's single stepping INT3.
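
    In essence, the fix moves the status update out of the post_handler branch
    (a sketch of the post-processing step, not the exact kernel code):

    /* Mark single-stepping as done unconditionally, so a following INT3 from
     * another user is not mistaken for the kprobe's single-step INT3. */
    kcb->kprobe_status = KPROBE_HIT_SSDONE;

    if (cur->post_handler)
            cur->post_handler(cur, regs, 0);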

    Fixes: 6256e668b7af ("x86/kprobes: Use int3 instead of debug trap for single-step")
    Reported-by: Daniel Müller
    Signed-off-by: Masami Hiramatsu (Google)
    Signed-off-by: Ingo Molnar
    Tested-by: Daniel Müller
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/all/20220727210136.jjgc3lpqeq42yr3m@muellerd-fedora-PC2BDTX9
    Link: https://lore.kernel.org/r/165942025658.342061.12452378391879093249.stgit@devnote2
    Signed-off-by: Greg Kroah-Hartman

    Masami Hiramatsu (Google)
     
  • commit ac6c1b2ca77e722a1e5d651f12f437f2f237e658 upstream.

    When a ftrace_bug happens (where ftrace fails to modify a location) it is
    helpful to have what was at that location as well as what was expected to
    be there.

    But with the conversion to text_poke() the variable that holds the expected
    value for debugging was dropped. Unfortunately, I noticed this when I
    needed it. Add it back.

    Link: https://lkml.kernel.org/r/20220726101851.069d2e70@gandalf.local.home

    Cc: "x86@kernel.org"
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Andrew Morton
    Cc: stable@vger.kernel.org
    Fixes: 768ae4406a5c ("x86/ftrace: Use text_poke()")
    Signed-off-by: Steven Rostedt (Google)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (Google)
     
  • commit e6cfcdda8cbe81eaf821c897369a65fec987b404 upstream.

    AMD's "Technical Guidance for Mitigating Branch Type Confusion,
    Rev. 1.0 2022-07-12" whitepaper, under section 6.1.2 "IBPB On
    Privileged Mode Entry / SMT Safety" says:

    Similar to the Jmp2Ret mitigation, if the code on the sibling thread
    cannot be trusted, software should set STIBP to 1 or disable SMT to
    ensure SMT safety when using this mitigation.

    So, like already being done for retbleed=unret, and now also for
    retbleed=ibpb, force STIBP on machines that have it, and report its SMT
    vulnerability status accordingly.

    [ bp: Remove the "we" and remove "[AMD]" applicability parameter which
    doesn't work here. ]

    Fixes: 3ebc17006888 ("x86/bugs: Add retbleed=ibpb")
    Signed-off-by: Kim Phillips
    Signed-off-by: Borislav Petkov
    Cc: stable@vger.kernel.org # 5.10, 5.15, 5.19
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
    Link: https://lore.kernel.org/r/20220804192201.439596-1-kim.phillips@amd.com
    Signed-off-by: Greg Kroah-Hartman

    Kim Phillips
     
  • [ Upstream commit de979c83574abf6e78f3fa65b716515c91b2613d ]

    With CONFIG_PREEMPTION disabled, arch/x86/entry/thunk_$(BITS).o becomes
    an empty object file.

    With some old versions of binutils (i.e., 2.35.90.20210113-1ubuntu1) the
    GNU assembler doesn't generate a symbol table for empty object files and
    objtool fails with the following error when a valid symbol table cannot
    be found:

    arch/x86/entry/thunk_64.o: warning: objtool: missing symbol table

    To prevent this from happening, build thunk_$(BITS).o only if
    CONFIG_PREEMPTION is enabled.

    BugLink: https://bugs.launchpad.net/bugs/1911359

    Fixes: 320100a5ffe5 ("x86/entry: Remove the TRACE_IRQS cruft")
    Signed-off-by: Andrea Righi
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/Ys/Ke7EWjcX+ZlXO@arighi-desktop
    Signed-off-by: Sasha Levin

    Andrea Righi
     
  • [ Upstream commit 625395c4a0f4775e0fe00f616888d2e6c1ba49db ]

    GCC-12 started triggering a new warning:

    arch/x86/mm/numa.c: In function ‘cpumask_of_node’:
    arch/x86/mm/numa.c:916:39: warning: the comparison will always evaluate as ‘false’ for the address of ‘node_to_cpumask_map’ will never be NULL [-Waddress]
    916 | if (node_to_cpumask_map[node] == NULL) {
    | ^~

    node_to_cpumask_map is of type cpumask_var_t[].

    When CONFIG_CPUMASK_OFFSTACK is set, cpumask_var_t is typedef'd to a
    pointer for dynamic allocation, else to an array of one element. The
    "wicked game" can be checked on line 700 of include/linux/cpumask.h.

    The original code in debug_cpumask_set_cpu() and cpumask_of_node() were
    probably written by the original authors with CONFIG_CPUMASK_OFFSTACK=y
    (i.e. dynamic allocation) in mind, checking if the cpumask was available
    via a direct NULL check.

    When CONFIG_CPUMASK_OFFSTACK is not set, GCC gives the above warning
    while compiling the kernel.

    Fix that by using cpumask_available(), which does the NULL check when
    CONFIG_CPUMASK_OFFSTACK is set, otherwise returns true. Use it wherever
    such checks are made.

    Conditional definitions of cpumask_available() can be found along with
    the definition of cpumask_var_t. Check the cpumask.h reference mentioned
    above.
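
    For example, the cpumask_of_node() check becomes something like this
    (a sketch of the pattern, not the full function):

    /* cpumask_available() performs the NULL check only when
     * CONFIG_CPUMASK_OFFSTACK=y; otherwise it is always true. */
    if (!cpumask_available(node_to_cpumask_map[node])) {    /* was: == NULL */
            printk(KERN_WARNING
                   "cpumask_of_node(%d): no node_to_cpumask_map!\n", node);
            dump_stack();
            return cpu_none_mask;
    }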

    Fixes: c032ef60d1aa ("cpumask: convert node_to_cpumask_map[] to cpumask_var_t")
    Fixes: de2d9445f162 ("x86: Unify node_to_cpumask_map handling between 32 and 64bit")
    Signed-off-by: Siddh Raman Pant
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/20220731160913.632092-1-code@siddh.me
    Signed-off-by: Sasha Levin

    Siddh Raman Pant
     
  • [ Upstream commit ffa6482e461ff550325356ae705b79e256702ea9 ]

    It's possible that this kernel has been kexec'd from a kernel that
    enabled bus lock detection, or (hypothetically) BIOS/firmware has set
    DEBUGCTLMSR_BUS_LOCK_DETECT.

    Disable bus lock detection explicitly if not wanted.
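
    A rough sketch of the idea (the exact placement in the boot code differs;
    the MSR helpers and bit definition are the standard ones):

    u64 debugctl;

    /* Clear a stale DEBUGCTLMSR_BUS_LOCK_DETECT bit, e.g. one left over
     * from a kexec'd kernel, when bus lock detection is not wanted. */
    rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
    if (debugctl & DEBUGCTLMSR_BUS_LOCK_DETECT) {
            debugctl &= ~DEBUGCTLMSR_BUS_LOCK_DETECT;
            wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
    }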

    Fixes: ebb1064e7c2e ("x86/traps: Handle #DB for bus lock")
    Signed-off-by: Chenyi Qiang
    Signed-off-by: Ingo Molnar
    Reviewed-by: Tony Luck
    Link: https://lore.kernel.org/r/20220802033206.21333-1-chenyi.qiang@intel.com
    Signed-off-by: Sasha Levin

    Chenyi Qiang
     
  • [ Upstream commit a910b5ab6b250a88fff1866bf708642d83317466 ]

    Make UMIP an "allowed-1" bit in the CR4_FIXED1 MSR when KVM is emulating UMIP.
    KVM emulates UMIP for both L1 and L2, and so should enumerate that L2 is
    allowed to have CR4.UMIP=1. Not setting the bit doesn't immediately
    break nVMX, as KVM does set/clear the bit in CR4_FIXED1 in response to a
    guest CPUID update, i.e. KVM will correctly (dis)allow nested VM-Entry
    based on whether or not UMIP is exposed to L1. That said, KVM should
    enumerate the bit as being allowed from time zero, e.g. userspace will
    see the wrong value if the MSR is read before CPUID is written.
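
    The change is essentially (a sketch; the surrounding nested-VMX MSR setup
    is omitted):

    /* If KVM emulates UMIP, L1 may set CR4.UMIP for L2, so advertise the bit
     * as allowed-1 in the nested CR4_FIXED1 MSR from time zero. */
    if (vmx_umip_emulated())
            msrs->cr4_fixed1 |= X86_CR4_UMIP;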

    Fixes: 0367f205a3b7 ("KVM: vmx: add support for emulating UMIP")
    Signed-off-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Sean Christopherson
     
  • [ Upstream commit 3741aec4c38fa4123ab08ae552f05366d4fd05d8 ]

    If NRIPS is supported in hardware but disabled in KVM, set next_rip to
    the next RIP when advancing RIP as part of emulating INT3 injection.
    There is no flag to tell the CPU that KVM isn't using next_rip, and so
    leaving next_rip as is will result in the CPU pushing garbage
    onto the stack when vectoring the injected event.

    Reviewed-by: Maxim Levitsky
    Fixes: 66b7138f9136 ("KVM: SVM: Emulate nRIP feature when reinjecting INT3")
    Signed-off-by: Sean Christopherson
    Signed-off-by: Maciej S. Szmigiero
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Sean Christopherson
     
  • [ Upstream commit cd9e6da8048c5b40315ee2d929b6230ce1252c3c ]

    Unwind the RIP advancement done by svm_queue_exception() when injecting
    an INT3 ultimately "fails" due to the CPU encountering a VM-Exit while
    vectoring the injected event, even if the exception reported by the CPU
    isn't the same event that was injected. If vectoring INT3 encounters an
    exception, e.g. #NP, and vectoring the #NP encounters an intercepted
    exception, e.g. #PF when KVM is using shadow paging, then the #NP will
    be reported as the event that was in-progress.

    Note, this is still imperfect, as it will get a false positive if the
    INT3 is cleanly injected, no VM-Exit occurs before the IRET from the INT3
    handler in the guest, the instruction following the INT3 generates an
    exception (directly or indirectly), _and_ vectoring that exception
    encounters an exception that is intercepted by KVM. The false positives
    could theoretically be solved by further analyzing the vectoring event,
    e.g. by comparing the error code against the expected error code were an
    exception to occur when vectoring the original injected exception, but
    SVM without NRIPS is a complete disaster, trying to make it 100% correct
    is a waste of time.

    Reviewed-by: Maxim Levitsky
    Fixes: 66b7138f9136 ("KVM: SVM: Emulate nRIP feature when reinjecting INT3")
    Signed-off-by: Sean Christopherson
    Signed-off-by: Maciej S. Szmigiero
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Sean Christopherson
     
  • [ Upstream commit a1a5482a2c6e38a3ebed32e571625c56a8cc41a6 ]

    On Fri, Jun 17, 2022 at 02:08:52PM +0300, Stephane Eranian wrote:
    > Some changes to the way invalid MSR accesses are reported by the
    > kernel is causing some problems with messages printed on the
    > console.
    >
    > We have seen several cases of ex_handler_msr() printing invalid MSR
    > accesses once but the callstack multiple times causing confusion on
    > the console.

    > The problem here is that another earlier commit (5.13):
    >
    > a358f40600b3 ("once: implement DO_ONCE_LITE for non-fast-path "do once" functionality")
    >
    > Modifies all the pr_*_once() calls to always return true claiming
    > that no caller is ever checking the return value of the functions.
    >
    > This is why we are seeing the callstack printed without the
    > associated printk() msg.

    Extract the ONCE_IF(cond) part into __ONCE_LITE_IF() and use that to
    implement DO_ONCE_LITE_IF() and fix the extable code.
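
    A simplified sketch of the resulting macro (see include/linux/once_lite.h
    for the real definitions):

    /* __ONCE_LITE_IF() evaluates to true only the first time its argument
     * is true at a given call site, so func() runs at most once. */
    #define DO_ONCE_LITE_IF(condition, func, ...)                   \
            ({                                                      \
                    bool __ret_do_once = !!(condition);             \
                                                                    \
                    if (__ONCE_LITE_IF(__ret_do_once))              \
                            func(__VA_ARGS__);                      \
                                                                    \
                    unlikely(__ret_do_once);                        \
            })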

    Fixes: a358f40600b3 ("once: implement DO_ONCE_LITE for non-fast-path "do once" functionality")
    Reported-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra (Intel)
    Tested-by: Stephane Eranian
    Link: https://lkml.kernel.org/r/YqyVFsbviKjVGGZ9@worktop.programming.kicks-ass.net
    Signed-off-by: Sasha Levin

    Peter Zijlstra