28 Sep, 2022

2 commits

  • [ Upstream commit 68be1306caea8948738cab04014ca4506b590d38 ]

    Consolidate rmap_recycle and rmap_add into a single function since they
    are only ever called together (and only from one place). This has a nice
    side effect of eliminating an extra kvm_vcpu_gfn_to_memslot(). In
    addition it makes mmu_set_spte(), which is a very long function, a
    little shorter.

    No functional change intended.

    Signed-off-by: David Matlack
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Stable-dep-of: 604f533262ae ("KVM: x86/mmu: add missing update to max_mmu_rmap_size")
    Signed-off-by: Sasha Levin

    David Matlack
     
  • commit 50b2d49bafa16e6311ab2da82f5aafc5f9ada99b upstream.

    Inject #UD when emulating XSETBV if CR4.OSXSAVE is not set. This also
    covers the "XSAVE not supported" check, as setting CR4.OSXSAVE=1 #GPs if
    XSAVE is not supported (and userspace gets to keep the pieces if it
    forces incoherent vCPU state).

    Add a comment to kvm_emulate_xsetbv() to call out that the CPU checks
    CR4.OSXSAVE before checking for intercepts. AMD's APM implies that #UD
    has priority (says that intercepts are checked before #GP exceptions),
    while Intel's SDM says nothing about interception priority. However,
    testing on hardware shows that both AMD and Intel CPUs prioritize the #UD
    over interception.
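
    As a rough sketch of the resulting order of checks (helper names follow
    KVM's but may differ between kernel versions), the #UD is injected before
    the XCR write is even attempted:

    /* Minimal sketch, not the exact kvm_emulate_xsetbv() implementation. */
    int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
    {
            /* The CPU checks CR4.OSXSAVE before checking for intercepts,
             * so the #UD has priority over exiting to the hypervisor. */
            if (!kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) {
                    kvm_queue_exception(vcpu, UD_VECTOR);
                    return 1;
            }

            if (kvm_set_xcr(vcpu, kvm_rcx_read(vcpu), kvm_read_edx_eax(vcpu))) {
                    kvm_inject_gp(vcpu, 0);
                    return 1;
            }

            return kvm_skip_emulated_instruction(vcpu);
    }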

    Fixes: 02d4160fbd76 ("x86: KVM: add xsetbv to the emulator")
    Cc: stable@vger.kernel.org
    Cc: Vitaly Kuznetsov
    Signed-off-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Sean Christopherson
     

23 Sep, 2022

1 commit

  • commit 683412ccf61294d727ead4a73d97397396e69a6b upstream.

    Flush the CPU caches when memory is reclaimed from an SEV guest (where
    reclaim also includes it being unmapped from KVM's memslots). Due to lack
    of coherency for SEV encrypted memory, failure to flush results in silent
    data corruption if userspace is malicious/broken and doesn't ensure SEV
    guest memory is properly pinned and unpinned.

    Cache coherency is not enforced across the VM boundary in SEV (AMD APM
    vol.2 Section 15.34.7). Confidential cachelines generated by confidential
    VM guests have to be explicitly flushed on the host side. If a memory page
    containing dirty confidential cachelines is released by the VM and reallocated
    to another user, the cachelines may corrupt the new user at a later time.

    KVM takes a shortcut by assuming all confidential memory remains pinned
    until the end of the VM's lifetime. Therefore, KVM does not flush caches at
    mmu_notifier invalidation events. Because of this incorrect assumption and
    the lack of cache flushing, malicious userspace can crash the host kernel
    by creating a malicious VM and continuously allocating and releasing unpinned
    confidential memory pages while the VM is running.

    Add cache flush operations to the mmu_notifier operations to ensure that any
    physical memory leaving the guest VM gets flushed. In particular, hook the
    mmu_notifier_invalidate_range_start and mmu_notifier_release events and
    flush the cache accordingly. The flush is done after releasing the mmu lock
    to avoid contention with other vCPUs.
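
    A minimal sketch of the SVM-side flush (the function name is illustrative;
    sev_guest() and wbinvd_on_all_cpus() are helpers already used by the SEV
    code):

    static void sev_guest_memory_reclaimed(struct kvm *kvm)
    {
            if (!sev_guest(kvm))
                    return;

            /* SEV cachelines are not coherent with the host; flush them
             * before the reclaimed pages can be reused by anyone else. */
            wbinvd_on_all_cpus();
    }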

    Cc: stable@vger.kernel.org
    Suggested-by: Sean Christopherson
    Reported-by: Mingwei Zhang
    Signed-off-by: Mingwei Zhang
    Message-Id:
    Signed-off-by: Paolo Bonzini
    [OP: adjusted KVM_X86_OP_OPTIONAL() -> KVM_X86_OP_NULL, applied
    kvm_arch_guest_memory_reclaimed() call in kvm_set_memslot()]
    Signed-off-by: Ovidiu Panait
    Signed-off-by: Greg Kroah-Hartman

    Mingwei Zhang
     

20 Sep, 2022

3 commits

  • [ Upstream commit e87f4152e542610d0b4c6c8548964a68a59d2040 ]

    Force-inline two stack helpers to fix the following objtool warnings:

    vmlinux.o: warning: objtool: in_task_stack()+0xc: call to task_stack_page() leaves .noinstr.text section
    vmlinux.o: warning: objtool: in_entry_stack()+0x10: call to cpu_entry_stack() leaves .noinstr.text section
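
    Illustratively, force-inlining such a tiny helper keeps its caller entirely
    inside .noinstr.text (a sketch, not the exact kernel declaration):

    /* __always_inline instead of plain inline, so the compiler can never
     * emit an out-of-line copy that noinstr code would have to call. */
    static __always_inline void *task_stack_page(const struct task_struct *task)
    {
            return task->stack;
    }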

    Signed-off-by: Borislav Petkov
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lore.kernel.org/r/20220324183607.31717-2-bp@alien8.de
    Stable-dep-of: 54c3931957f6 ("tracing: hold caller_addr to hardirq_{enable,disable}_ip")
    Signed-off-by: Sasha Levin

    Borislav Petkov
     
  • [ Upstream commit ace1a98519270c586c0d4179419292df67441cd1 ]

    Fix:

    vmlinux.o: warning: objtool: __sev_es_nmi_complete()+0x8b: call to __phys_addr_nodebug() leaves .noinstr.text section

    Signed-off-by: Borislav Petkov
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lore.kernel.org/r/20220324183607.31717-4-bp@alien8.de
    Stable-dep-of: 54c3931957f6 ("tracing: hold caller_addr to hardirq_{enable,disable}_ip")
    Signed-off-by: Sasha Levin

    Borislav Petkov
     
  • [ Upstream commit 8b023accc8df70e72f7704d29fead7ca914d6837 ]

    While looking into a bug related to the compiler's handling of addresses
    of labels, I noticed some uses of _THIS_IP_ seemed unused in lockdep.
    Drive-by cleanup.

    -Wunused-parameter:
    kernel/locking/lockdep.c:1383:22: warning: unused parameter 'ip'
    kernel/locking/lockdep.c:4246:48: warning: unused parameter 'ip'
    kernel/locking/lockdep.c:4844:19: warning: unused parameter 'ip'

    Signed-off-by: Nick Desaulniers
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Waiman Long
    Link: https://lore.kernel.org/r/20220314221909.2027027-1-ndesaulniers@google.com
    Stable-dep-of: 54c3931957f6 ("tracing: hold caller_addr to hardirq_{enable,disable}_ip")
    Signed-off-by: Sasha Levin

    Nick Desaulniers
     

08 Sep, 2022

2 commits

  • [ Upstream commit 0204750bd4c6ccc2fb7417618477f10373b33f56 ]

    KVM should not claim to virtualize unknown IA32_ARCH_CAPABILITIES
    bits. When kvm_get_arch_capabilities() was originally written, there
    were only a few bits defined in this MSR, and KVM could virtualize all
    of them. However, over the years, several bits have been defined that
    KVM cannot just blindly pass through to the guest without additional
    work (such as virtualizing an MSR promised by the
    IA32_ARCH_CAPABILITIES feature bit).

    Define a mask of supported IA32_ARCH_CAPABILITIES bits, and mask off
    any other bits that are set in the hardware MSR.
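
    A simplified sketch of the approach (KVM_SUPPORTED_ARCH_CAP stands for
    whatever set of bits KVM actually knows how to virtualize; the real
    function also synthesizes additional bits):

    u64 kvm_get_arch_capabilities(void)
    {
            u64 data = 0;

            if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
                    rdmsrl(MSR_IA32_ARCH_CAPABILITIES, data);

            /* Drop any hardware bits KVM does not know how to handle. */
            return data & KVM_SUPPORTED_ARCH_CAP;
    }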

    Cc: Paolo Bonzini
    Fixes: 5b76a3cff011 ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry")
    Signed-off-by: Jim Mattson
    Reviewed-by: Vipin Sharma
    Reviewed-by: Xiaoyao Li
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Jim Mattson
     
  • [ Upstream commit 020dac4187968535f089f83f376a72beb3451311 ]

    Regardless of the 'msr' argument passed to the VMX version of
    msr_write_intercepted(), the function always checks to see if a
    specific MSR (IA32_SPEC_CTRL) is intercepted for write. This behavior
    seems unintentional and unexpected.

    Modify the function so that it checks to see if the provided 'msr'
    index is intercepted for write.
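
    Roughly, the corrected check tests the write bitmap for the MSR that was
    actually passed in (a sketch; helper names vary between kernel versions):

    static bool msr_write_intercepted(struct vcpu_vmx *vmx, u32 msr)
    {
            if (!cpu_has_vmx_msr_bitmap())
                    return true;

            /* Test the bit for 'msr', not a hard-coded MSR_IA32_SPEC_CTRL. */
            return vmx_test_msr_bitmap_write(vmx->loaded_vmcs->msr_bitmap, msr);
    }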

    Fixes: 67f4b9969c30 ("KVM: nVMX: Handle dynamic MSR intercept toggling")
    Cc: Sean Christopherson
    Signed-off-by: Jim Mattson
    Reviewed-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Jim Mattson
     

31 Aug, 2022

8 commits

  • commit d4bdb0bebc5ba3299d74f123c782d99cd4e25c49 upstream.

    With the existing code in store_latency_data(), the memory operation (mem_op)
    returned to the user is always OP_LOAD when, in fact, it should be OP_STORE.
    This comes from the fact that the function is simply grabbing the information
    from a data source map which covers only load accesses. Intel 12th gen CPU
    offers precise store sampling that captures both the data source and latency.
    Therefore it can use the data source mapping table but must override the
    memory operation to reflect stores instead of loads.

    Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids")
    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20220818054613.1548130-1-eranian@google.com
    Signed-off-by: Greg Kroah-Hartman

    Stephane Eranian
     
  • commit 11745ecfe8fea4b4a4c322967a7605d2ecbd5080 upstream.

    Existing code was generating bogus counts for the SNB IMC bandwidth counters:

    $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
    1.000327813 1,024.03 MiB uncore_imc/data_reads/
    1.000327813 20.73 MiB uncore_imc/data_writes/
    2.000580153 261,120.00 MiB uncore_imc/data_reads/
    2.000580153 23.28 MiB uncore_imc/data_writes/

    The problem was introduced by commit:
    07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC")

    where the read_counter callback was replaced to point to the generic
    uncore_mmio_read_counter() function.

    The SNB IMC counters are free-running 32-bit counters laid out contiguously in
    MMIO. But uncore_mmio_read_counter() is using a readq() call to read from
    MMIO therefore reading 64-bit from MMIO. Although this is okay for the
    uncore_perf_event_update() function because it is shifting the value based
    on the actual counter width to compute a delta, it is not okay for the
    uncore_pmu_event_start() which is simply reading the counter and therefore
    priming the event->prev_count with a bogus value which is responsible for
    causing bogus deltas in the perf stat command above.

    The fix is to reintroduce the custom callback for read_counter for the SNB
    IMC PMU and use readl() instead of readq(). With the change the output of
    perf stat is back to normal:
    $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
    1.000120987 296.94 MiB uncore_imc/data_reads/
    1.000120987 138.42 MiB uncore_imc/data_writes/
    2.000403144 175.91 MiB uncore_imc/data_reads/
    2.000403144 68.50 MiB uncore_imc/data_writes/
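
    The custom callback boils down to a 32-bit MMIO read (a sketch; struct and
    field names follow the uncore driver but may differ between versions):

    static u64 snb_uncore_imc_read_counter(struct intel_uncore_box *box,
                                           struct perf_event *event)
    {
            /* The IMC counters are free-running 32-bit values packed in MMIO,
             * so read 32 bits to avoid picking up the neighbouring counter. */
            return (u64)readl(box->io_addr + event->hw.event_base);
    }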

    Fixes: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC")
    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Kan Liang
    Link: https://lore.kernel.org/r/20220803160031.1379788-1-eranian@google.com
    Signed-off-by: Greg Kroah-Hartman

    Stephane Eranian
     
  • commit 332924973725e8cdcc783c175f68cf7e162cb9e5 upstream.

    Turns out that i386 doesn't unconditionally have LFENCE, as such the
    loop in __FILL_RETURN_BUFFER isn't actually speculation safe on such
    chips.

    Fixes: ba6e31af2be9 ("x86/speculation: Add LFENCE to RSB fill sequence")
    Reported-by: Ben Hutchings
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/Yv9tj9vbQ9nNlXoY@worktop.programming.kicks-ass.net
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 4e3aa9238277597c6c7624f302d81a7b568b6f2d upstream.

    Commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections")
    made a right mess of the RSB stuffing, rewrite the whole thing to not
    suck.

    Thanks to Andrew for the enlightening comment about Post-Barrier RSB
    things so we can make this code less magical.

    Cc: stable@vger.kernel.org
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/YvuNdDWoUZSBjYcm@worktop.programming.kicks-ass.net
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 7df548840c496b0141fb2404b889c346380c2b22 upstream.

    Older Intel CPUs that are not in the affected processor list for MMIO
    Stale Data vulnerabilities currently report "Not affected" in sysfs,
    which may not be correct. Vulnerability status for these older CPUs is
    unknown.

    Add known-not-affected CPUs to the whitelist. Report "unknown"
    mitigation status for CPUs that are not in the blacklist or whitelist and also
    don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware
    immunity to MMIO Stale Data vulnerabilities.

    Mitigation is not deployed when the status is unknown.

    [ bp: Massage, fixup. ]

    Fixes: 8d50cdf8b834 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data")
    Suggested-by: Andrew Cooper
    Suggested-by: Tony Luck
    Signed-off-by: Pawan Gupta
    Signed-off-by: Borislav Petkov
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman

    Pawan Gupta
     
  • commit fc2e426b1161761561624ebd43ce8c8d2fa058da upstream.

    When meeting ftrace trampolines in ORC unwinding, the unwinder uses the
    address of ftrace_{regs_}call to find the ORC entry, which gets the next
    frame at sp+176.

    If an IRQ hits at sub $0xa8,%rsp, the next frame should be at sp+8
    instead of sp+176. This makes the unwinder skip the correct frame and throw
    warnings such as "wrong direction" or "can't access registers", etc.,
    depending on the content of the incorrect frame address.

    By adding the offset *ip - ops->trampoline* to the base address
    ftrace_{regs_}caller, we can get the correct address to find the ORC entry.

    Also change "caller" to "tramp_addr" to make the variable name conform to
    its content.
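
    In rough terms, the lookup becomes (a sketch, not the exact unwinder code):

    /* Translate 'ip' inside the dynamically allocated trampoline back to the
     * equivalent position in ftrace_{regs_}caller, which has ORC data. */
    struct orc_entry *orc;
    unsigned long tramp_addr = (unsigned long)ftrace_regs_caller;

    tramp_addr += ip - ops->trampoline;   /* keep the offset into the trampoline */
    orc = orc_find(tramp_addr);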

    [ mingo: Clarified the changelog a bit. ]

    Fixes: 6be7fa3c74d1 ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines")
    Signed-off-by: Chen Zhongjin
    Signed-off-by: Ingo Molnar
    Reviewed-by: Steven Rostedt (Google)
    Cc:
    Link: https://lore.kernel.org/r/20220819084334.244016-1-chenzhongjin@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Chen Zhongjin
     
  • commit 32ba156df1b1c8804a4e5be5339616945eafea22 upstream.

    On platforms with Arch LBR, the HW raw branch type encoding may leak
    to the perf tool when the SAVE_TYPE option is not set.

    In the intel_pmu_store_lbr(), the HW raw branch type is stored in
    lbr_entries[].type. If the SAVE_TYPE option is set, the
    lbr_entries[].type will be converted into the generic PERF_BR_* type
    in the intel_pmu_lbr_filter() and exposed to the user tools.
    But if the SAVE_TYPE option is NOT set by the user, the current perf
    kernel doesn't clear the field. The HW raw branch type leaks.

    There are two solutions to fix the issue for the Arch LBR.
    One is to clear the field if the SAVE_TYPE option is NOT set.
    The other solution is to unconditionally convert the branch type and
    expose the generic type to the user tools.

    The latter is implemented here, because
    - The branch type is valuable information. I don't see a case where
    you would not benefit from the branch type. (Stephane Eranian)
    - Not having the branch type DOES NOT save any space in the
    branch record (Stephane Eranian)
    - The Arch LBR HW can retrieve the common branch types from the
    LBR_INFO. It doesn't require the high overhead SW disassemble.

    Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
    Reported-by: Stephane Eranian
    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20220816125612.2042397-1-kan.liang@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman

    Kan Liang
     
  • commit c64cc2802a784ecfd25d39945e57e7a147854a5b upstream.

    Move it after CLAC.

    Suggested-by: Peter Zijlstra
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Borislav Petkov
    Link: https://lore.kernel.org/r/20220503032107.680190-5-jiangshanlai@gmail.com
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Lai Jiangshan
     

25 Aug, 2022

2 commits

  • commit 8924779df820c53875abaeb10c648e9cb75b46d4 upstream.

    When kprobes emulates JNG/JNLE instructions on x86 it uses the wrong
    condition. For JNG (opcode: 0F 8E), according to Intel SDM, the jump is
    performed if (ZF == 1 or SF != OF). However the kernel emulation
    currently uses 'and' instead of 'or'.

    As a result, setting a kprobe on JNG/JNLE might cause the kernel to
    behave incorrectly whenever the kprobe is hit.

    Fix by changing the 'and' to 'or'.
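
    A minimal sketch of the corrected condition (not the kernel's actual
    emulation code):

    /* JNG/JLE is taken if ZF == 1 or SF != OF. */
    static bool jle_taken(unsigned long eflags)
    {
            bool zf = eflags & X86_EFLAGS_ZF;
            bool sf = eflags & X86_EFLAGS_SF;
            bool of = eflags & X86_EFLAGS_OF;

            return zf || (sf != of);        /* was incorrectly: zf && (sf != of) */
    }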

    Fixes: 6256e668b7af ("x86/kprobes: Use int3 instead of debug trap for single-step")
    Signed-off-by: Nadav Amit
    Signed-off-by: Ingo Molnar
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20220813225943.143767-1-namit@vmware.com
    Signed-off-by: Greg Kroah-Hartman

    Nadav Amit
     
  • commit 88e0a74902f894fbbc55ad3ad2cb23b4bfba555c upstream.

    Commit c164fbb40c43f ("x86/mm: thread pgprot_t through
    init_memory_mapping()") mistakenly used __pgprot() which doesn't respect
    __default_kernel_pte_mask when setting PUD mapping.

    Fix it by only setting the one bit we actually need (PSE) and leaving
    the other bits (that have been properly masked) alone.
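
    A minimal sketch of the idea (paddr, prot and pud stand in for the state
    of the surrounding mapping loop):

    /* Keep the already-masked protection bits from 'prot' and only add the
     * PSE (large page) bit, instead of building a fresh, unmasked pgprot. */
    pgprot_t pud_prot = __pgprot(pgprot_val(prot) | _PAGE_PSE);

    set_pud(pud, pfn_pud(paddr >> PAGE_SHIFT, pud_prot));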

    Fixes: c164fbb40c43 ("x86/mm: thread pgprot_t through init_memory_mapping()")
    Signed-off-by: Aaron Lu
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Aaron Lu
     

21 Aug, 2022

3 commits

  • commit 1f001e9da6bbf482311e45e48f53c2bd2179e59c upstream.

    Use the return thunk in ftrace trampolines, if needed.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Borislav Petkov
    Reviewed-by: Josh Poimboeuf
    Signed-off-by: Borislav Petkov
    [cascardo: use memcpy(text_gen_insn) as there is no __text_gen_insn]
    Signed-off-by: Thadeu Lima de Souza Cascardo
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit e52fc2cf3f662828cc0d51c4b73bed73ad275fce upstream.

    Return trampoline must not use indirect branch to return; while this
    preserves the RSB, it is fundamentally incompatible with IBT. Instead
    use a retpoline-like ROP gadget that defeats IBT while not unbalancing
    the RSB.

    And since ftrace_stub is no longer a plain RET, don't use it to copy
    from. Since RET is a trivial instruction, poke it directly.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Josh Poimboeuf
    Link: https://lore.kernel.org/r/20220308154318.347296408@infradead.org
    [cascardo: remove ENDBR]
    Signed-off-by: Thadeu Lima de Souza Cascardo
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • This reverts commit e54fcb0812faebd147de72bd37ad87cc4951c68c.

    This temporarily reverts the backport of upstream commit
    1f001e9da6bbf482311e45e48f53c2bd2179e59c. It was not correct to copy the
    ftrace stub, as it would contain a relative jump to the return thunk which
    would not apply to the context it was being copied to, leaving ftrace
    support broken.

    Signed-off-by: Thadeu Lima de Souza Cascardo
    Signed-off-by: Greg Kroah-Hartman

    Thadeu Lima de Souza Cascardo
     

17 Aug, 2022

19 commits

  • [ Upstream commit 4496a6f9b45e8cd83343ad86a3984d614e22cf54 ]

    Attempt to load PERF_GLOBAL_CTRL during nested VM-Enter/VM-Exit if and
    only if the MSR exists (according to the guest vCPU model). KVM has very
    misguided handling of VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL and
    attempts to force the nVMX MSR settings to match the vPMU model, i.e. to
    hide/expose the control based on whether or not the MSR exists from the
    guest's perspective.

    KVM's modifications fail to handle the scenario where the vPMU is hidden
    from the guest _after_ being exposed to the guest, e.g. by userspace
    doing multiple KVM_SET_CPUID2 calls, which is allowed if done before any
    KVM_RUN. nested_vmx_pmu_refresh() is called if and only if there's a
    recognized vPMU, i.e. KVM will leave the bits in the allow state and then
    ultimately reject the MSR load and WARN.

    KVM should not force the VMX MSRs in the first place. KVM taking control
    of the MSRs was a misguided attempt at mimicking what commit 5f76f6f5ff96
    ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled",
    2018-10-01) did for MPX. However, the MPX commit was a workaround for
    another KVM bug and not something that should be imitated (and it should
    never have been done in the first place).

    In other words, KVM's ABI _should_ be that userspace has full control
    over the MSRs, at which point triggering the WARN that loading the MSR
    must not fail is trivial.

    The intent of the WARN is still valid; KVM has consistency checks to
    ensure that vmcs12->{guest,host}_ia32_perf_global_ctrl is valid. The
    problem is that '0' must be considered a valid value at all times, and so
    the simple/obvious solution is to just not actually load the MSR when it
    does not exist. It is userspace's responsibility to provide a sane vCPU
    model, i.e. KVM is well within its ABI and Intel's VMX architecture to
    skip the loads if the MSR does not exist.

    Fixes: 03a8871add95 ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Sean Christopherson
     
  • [ Upstream commit b663f0b5f3d665c261256d1f76e98f077c6e56af ]

    Add a helper to check if the guest PMU has PERF_GLOBAL_CTRL, which is
    unintuitive _and_ diverges from Intel's architecturally defined behavior.
    Even worse, KVM currently implements the check using two different (but
    equivalent) checks, _and_ there has been at least one attempt to add a
    _third_ flavor.
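
    A minimal sketch of such a helper (field names as in KVM's vPMU code, which
    may differ between versions):

    /* PERF_GLOBAL_CTRL exists for Intel architectural PMU version 2 and later. */
    static inline bool kvm_pmu_has_perf_global_ctrl(struct kvm_pmu *pmu)
    {
            return pmu->version > 1;
    }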

    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Sean Christopherson
     
  • [ Upstream commit 98defd2e17803263f49548fea930cfc974d505aa ]

    MSR_CORE_PERF_GLOBAL_CTRL is introduced as part of Architecture PMU V2,
    as indicated by Intel SDM 19.2.2 and the intel_is_valid_msr() function.

    So in the absence of global_ctrl support, all PMCs are enabled as AMD does.

    Signed-off-by: Like Xu
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Like Xu
     
  • [ Upstream commit 93255bf92939d948bc86d81c6bb70bb0fecc5db1 ]

    Mark all MSR_CORE_PERF_GLOBAL_CTRL and MSR_CORE_PERF_GLOBAL_OVF_CTRL bits
    as reserved if there is no guest vPMU. The nVMX VM-Entry consistency
    checks do not check for a valid vPMU prior to consuming the masks via
    kvm_valid_perf_global_ctrl(), i.e. may incorrectly allow a non-zero mask
    to be loaded via VM-Enter or VM-Exit (well, attempted to be loaded, the
    actual MSR load will be rejected by intel_is_valid_msr()).

    Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Sean Christopherson
     
  • [ Upstream commit 2c985527dd8d283e786ad7a67e532ef7f6f00fac ]

    The mask value of the fixed counter control register should be dynamically
    adjusted with the number of fixed counters. This patch introduces a
    variable that includes the reserved bits of fixed counter control
    registers. This is a generic code refactoring.
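
    Conceptually, the reserved-bit mask is derived from the number of fixed
    counters, each of which owns four control bits (a sketch of the refresh
    logic; field names may differ between versions):

    u64 fixed_ctr_ctrl_mask = ~0ull;
    int i;

    /* Clear the usable control bits (0xb: OS enable, USR enable, PMI enable)
     * for each fixed counter that actually exists; the rest stay reserved. */
    for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
            fixed_ctr_ctrl_mask &= ~(0xbull << (i * 4));

    pmu->fixed_ctr_ctrl_mask = fixed_ctr_ctrl_mask;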

    Co-developed-by: Luwei Kang
    Signed-off-by: Luwei Kang
    Signed-off-by: Like Xu
    Acked-by: Peter Zijlstra (Intel)
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Like Xu
     
  • [ Upstream commit 2368048bf5c2ec4b604ac3431564071e89a0bc71 ]

    Return '1', not '-1', when handling an illegal WRMSR to an MCi_CTL or
    MCi_STATUS MSR. The behavior of "all zeros" or "all ones" for CTL MSRs
    is architectural, as is the "only zeros" behavior for STATUS MSRs. I.e.
    the intent is to inject a #GP, not exit to userspace due to an unhandled
    emulation case. Returning '-1' gets interpreted as -EPERM up the stack
    and effectively kills the guest.
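
    In sketch form (illustrative, not the exact MCE MSR handling code):

    /* For MCi_CTL, only "all zeros" or "all ones" is architecturally valid.
     * Returning 1 makes KVM inject #GP into the guest; returning -1 is
     * treated as -EPERM by the caller and kills the guest. */
    if (data != 0 && data != ~0ull)
            return 1;               /* was: return -1; */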

    Fixes: 890ca9aefa78 ("KVM: Add MCE support")
    Fixes: 9ffd986c6e4e ("KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    Reviewed-by: Jim Mattson
    Link: https://lore.kernel.org/r/20220512222716.4112548-2-seanjc@google.com
    Signed-off-by: Sasha Levin

    Sean Christopherson
     
  • [ Upstream commit 0471a7bd1bca2a47a5f378f2222c5cf39ce94152 ]

    Certain guest operating systems (e.g., UNIXWARE) clear bit 0 of
    MC1_CTL to ignore single-bit ECC data errors. Single-bit ECC data
    errors are always correctable and thus are safe to ignore because they
    are informational in nature rather than signaling a loss of data
    integrity.

    Prior to this patch, these guests would crash upon writing MC1_CTL,
    with resultant error messages like the following:

    error: kvm run failed Operation not permitted
    EAX=fffffffe EBX=fffffffe ECX=00000404 EDX=ffffffff
    ESI=ffffffff EDI=00000001 EBP=fffdaba4 ESP=fffdab20
    EIP=c01333a5 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ES =0108 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
    CS =0100 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
    SS =0108 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
    DS =0108 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
    FS =0000 00000000 ffffffff 00c00000
    GS =0000 00000000 ffffffff 00c00000
    LDT=0118 c1026390 00000047 00008200 DPL=0 LDT
    TR =0110 ffff5af0 00000067 00008b00 DPL=0 TSS32-busy
    GDT= ffff5020 000002cf
    IDT= ffff52f0 000007ff
    CR0=8001003b CR2=00000000 CR3=0100a000 CR4=00000230
    DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
    DR6=ffff0ff0 DR7=00000400
    EFER=0000000000000000
    Code=08 89 01 89 51 04 c3 8b 4c 24 08 8b 01 8b 51 04 8b 4c 24 04
    30 c3 f7 05 a4 6d ff ff 10 00 00 00 74 03 0f 31 c3 33 c0 33 d2 c3 8d
    74 26 00 0f 31 c3

    Signed-off-by: Lev Kujawski
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Lev Kujawski
     
  • [ Upstream commit 2d16803c562ecc644803d42ba98a8e0aef9c014e ]

    BLAKE2s has no currently known use as an shash. Just remove all of this
    unnecessary plumbing. Removing this shash was something we talked about
    back when we were making BLAKE2s a built-in, but I simply never got
    around to doing it. So this completes that project.

    Importantly, this fixes a bug in which the lib code depends on
    crypto_simd_disabled_for_test, causing linker errors.

    Also add more alignment tests to the selftests and compare SIMD and
    non-SIMD compression functions, to make up for what we lose from
    testmgr.c.

    Reported-by: gaochao
    Cc: Eric Biggers
    Cc: Ard Biesheuvel
    Cc: stable@vger.kernel.org
    Fixes: 6048fdcc5f26 ("lib/crypto: blake2s: include as built-in")
    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Herbert Xu
    Signed-off-by: Sasha Levin

    Jason A. Donenfeld
     
  • commit 3a2ba42cbd0b669ce3837ba400905f93dd06c79f upstream.

    The bitops compile-time optimization series revealed one more
    problem in olpc-xo1-sci.c:send_ebook_state(), resulting in GCC
    warnings:

    arch/x86/platform/olpc/olpc-xo1-sci.c: In function 'send_ebook_state':
    arch/x86/platform/olpc/olpc-xo1-sci.c:83:63: warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses]
    83 | if (!!test_bit(SW_TABLET_MODE, ebook_switch_idev->sw) == state)
    | ^~
    arch/x86/platform/olpc/olpc-xo1-sci.c:83:13: note: add parentheses around left hand side expression to silence this warning

    Despite this code working as intended, this redundant double
    negation of a boolean value, together with comparing to `char`
    with no explicit conversion to bool, makes compilers think
    the author made some unintentional logical mistakes here.
    Make it the other way around and negate the char instead
    to silence the warnings.
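
    A sketch of the change (the surrounding function is omitted):

    /* Negate the char 'state' instead of the bit test, so both sides of the
     * comparison are plain booleans (was: !!test_bit(...) == state). */
    if (test_bit(SW_TABLET_MODE, ebook_switch_idev->sw) == !!state)
            return;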

    Fixes: d2aa37411b8e ("x86/olpc/xo1/sci: Produce wakeup events for buttons and switches")
    Cc: stable@vger.kernel.org # 3.5+
    Reported-by: Guenter Roeck
    Reported-by: kernel test robot
    Reviewed-and-tested-by: Guenter Roeck
    Signed-off-by: Alexander Lobakin
    Signed-off-by: Yury Norov
    Signed-off-by: Greg Kroah-Hartman

    Alexander Lobakin
     
  • commit dec8784c9088b131a1523f582c2194cfc8107dc0 upstream.

    Fix kprobes to update kcb (kprobes control block) status flag to
    KPROBE_HIT_SSDONE even if the kp->post_handler is not set.

    This bug may cause a kernel panic if another INT3 user runs right
    after kprobes because kprobe_int3_handler() misunderstands the
    INT3 is kprobe's single stepping INT3.
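
    In essence, the fix moves the status update out of the post_handler branch
    (a sketch of the post-processing step, not the exact kernel code):

    /* Mark single-stepping as done unconditionally, so a following INT3 from
     * another user is not mistaken for the kprobe's single-step INT3. */
    kcb->kprobe_status = KPROBE_HIT_SSDONE;

    if (cur->post_handler)
            cur->post_handler(cur, regs, 0);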

    Fixes: 6256e668b7af ("x86/kprobes: Use int3 instead of debug trap for single-step")
    Reported-by: Daniel Müller
    Signed-off-by: Masami Hiramatsu (Google)
    Signed-off-by: Ingo Molnar
    Tested-by: Daniel Müller
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/all/20220727210136.jjgc3lpqeq42yr3m@muellerd-fedora-PC2BDTX9
    Link: https://lore.kernel.org/r/165942025658.342061.12452378391879093249.stgit@devnote2
    Signed-off-by: Greg Kroah-Hartman

    Masami Hiramatsu (Google)
     
  • commit ac6c1b2ca77e722a1e5d651f12f437f2f237e658 upstream.

    When a ftrace_bug happens (where ftrace fails to modify a location) it is
    helpful to have what was at that location as well as what was expected to
    be there.

    But with the conversion to text_poke() the variable that holds the expected
    value for debugging was dropped. Unfortunately, I noticed this when I
    needed it. Add it back.

    Link: https://lkml.kernel.org/r/20220726101851.069d2e70@gandalf.local.home

    Cc: "x86@kernel.org"
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Andrew Morton
    Cc: stable@vger.kernel.org
    Fixes: 768ae4406a5c ("x86/ftrace: Use text_poke()")
    Signed-off-by: Steven Rostedt (Google)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (Google)
     
  • commit e6cfcdda8cbe81eaf821c897369a65fec987b404 upstream.

    AMD's "Technical Guidance for Mitigating Branch Type Confusion,
    Rev. 1.0 2022-07-12" whitepaper, under section 6.1.2 "IBPB On
    Privileged Mode Entry / SMT Safety" says:

    Similar to the Jmp2Ret mitigation, if the code on the sibling thread
    cannot be trusted, software should set STIBP to 1 or disable SMT to
    ensure SMT safety when using this mitigation.

    So, like already being done for retbleed=unret, and now also for
    retbleed=ibpb, force STIBP on machines that have it, and report its SMT
    vulnerability status accordingly.

    [ bp: Remove the "we" and remove "[AMD]" applicability parameter which
    doesn't work here. ]

    Fixes: 3ebc17006888 ("x86/bugs: Add retbleed=ibpb")
    Signed-off-by: Kim Phillips
    Signed-off-by: Borislav Petkov
    Cc: stable@vger.kernel.org # 5.10, 5.15, 5.19
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
    Link: https://lore.kernel.org/r/20220804192201.439596-1-kim.phillips@amd.com
    Signed-off-by: Greg Kroah-Hartman

    Kim Phillips
     
  • [ Upstream commit de979c83574abf6e78f3fa65b716515c91b2613d ]

    With CONFIG_PREEMPTION disabled, arch/x86/entry/thunk_$(BITS).o becomes
    an empty object file.

    With some old versions of binutils (i.e., 2.35.90.20210113-1ubuntu1) the
    GNU assembler doesn't generate a symbol table for empty object files and
    objtool fails with the following error when a valid symbol table cannot
    be found:

    arch/x86/entry/thunk_64.o: warning: objtool: missing symbol table

    To prevent this from happening, build thunk_$(BITS).o only if
    CONFIG_PREEMPTION is enabled.

    BugLink: https://bugs.launchpad.net/bugs/1911359

    Fixes: 320100a5ffe5 ("x86/entry: Remove the TRACE_IRQS cruft")
    Signed-off-by: Andrea Righi
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/Ys/Ke7EWjcX+ZlXO@arighi-desktop
    Signed-off-by: Sasha Levin

    Andrea Righi
     
  • [ Upstream commit 625395c4a0f4775e0fe00f616888d2e6c1ba49db ]

    GCC-12 started triggering a new warning:

    arch/x86/mm/numa.c: In function ‘cpumask_of_node’:
    arch/x86/mm/numa.c:916:39: warning: the comparison will always evaluate as ‘false’ for the address of ‘node_to_cpumask_map’ will never be NULL [-Waddress]
    916 | if (node_to_cpumask_map[node] == NULL) {
    | ^~

    node_to_cpumask_map is of type cpumask_var_t[].

    When CONFIG_CPUMASK_OFFSTACK is set, cpumask_var_t is typedef'd to a
    pointer for dynamic allocation, else to an array of one element. The
    "wicked game" can be checked on line 700 of include/linux/cpumask.h.

    The original code in debug_cpumask_set_cpu() and cpumask_of_node() were
    probably written by the original authors with CONFIG_CPUMASK_OFFSTACK=y
    (i.e. dynamic allocation) in mind, checking if the cpumask was available
    via a direct NULL check.

    When CONFIG_CPUMASK_OFFSTACK is not set, GCC gives the above warning
    while compiling the kernel.

    Fix that by using cpumask_available(), which does the NULL check when
    CONFIG_CPUMASK_OFFSTACK is set, otherwise returns true. Use it wherever
    such checks are made.

    Conditional definitions of cpumask_available() can be found along with
    the definition of cpumask_var_t. Check the cpumask.h reference mentioned
    above.
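
    For example, the cpumask_of_node() check becomes something like this
    (a sketch of the pattern, not the full function):

    /* cpumask_available() performs the NULL check only when
     * CONFIG_CPUMASK_OFFSTACK=y; otherwise it is always true. */
    if (!cpumask_available(node_to_cpumask_map[node])) {    /* was: == NULL */
            printk(KERN_WARNING
                   "cpumask_of_node(%d): no node_to_cpumask_map!\n", node);
            dump_stack();
            return cpu_none_mask;
    }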

    Fixes: c032ef60d1aa ("cpumask: convert node_to_cpumask_map[] to cpumask_var_t")
    Fixes: de2d9445f162 ("x86: Unify node_to_cpumask_map handling between 32 and 64bit")
    Signed-off-by: Siddh Raman Pant
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/20220731160913.632092-1-code@siddh.me
    Signed-off-by: Sasha Levin

    Siddh Raman Pant
     
  • [ Upstream commit ffa6482e461ff550325356ae705b79e256702ea9 ]

    It's possible that this kernel has been kexec'd from a kernel that
    enabled bus lock detection, or (hypothetically) BIOS/firmware has set
    DEBUGCTLMSR_BUS_LOCK_DETECT.

    Disable bus lock detection explicitly if not wanted.
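
    A rough sketch of the idea (the exact placement in the boot code differs;
    the MSR helpers and bit definition are the standard ones):

    u64 debugctl;

    /* Clear a stale DEBUGCTLMSR_BUS_LOCK_DETECT bit, e.g. one left over
     * from a kexec'd kernel, when bus lock detection is not wanted. */
    rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
    if (debugctl & DEBUGCTLMSR_BUS_LOCK_DETECT) {
            debugctl &= ~DEBUGCTLMSR_BUS_LOCK_DETECT;
            wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
    }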

    Fixes: ebb1064e7c2e ("x86/traps: Handle #DB for bus lock")
    Signed-off-by: Chenyi Qiang
    Signed-off-by: Ingo Molnar
    Reviewed-by: Tony Luck
    Link: https://lore.kernel.org/r/20220802033206.21333-1-chenyi.qiang@intel.com
    Signed-off-by: Sasha Levin

    Chenyi Qiang
     
  • [ Upstream commit a910b5ab6b250a88fff1866bf708642d83317466 ]

    Make UMIP an "allowed-1" bit in the CR4_FIXED1 MSR when KVM is emulating UMIP.
    KVM emulates UMIP for both L1 and L2, and so should enumerate that L2 is
    allowed to have CR4.UMIP=1. Not setting the bit doesn't immediately
    break nVMX, as KVM does set/clear the bit in CR4_FIXED1 in response to a
    guest CPUID update, i.e. KVM will correctly (dis)allow nested VM-Entry
    based on whether or not UMIP is exposed to L1. That said, KVM should
    enumerate the bit as being allowed from time zero, e.g. userspace will
    see the wrong value if the MSR is read before CPUID is written.
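
    The change is essentially (a sketch; the surrounding nested-VMX MSR setup
    is omitted):

    /* If KVM emulates UMIP, L1 may set CR4.UMIP for L2, so advertise the bit
     * as allowed-1 in the nested CR4_FIXED1 MSR from time zero. */
    if (vmx_umip_emulated())
            msrs->cr4_fixed1 |= X86_CR4_UMIP;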

    Fixes: 0367f205a3b7 ("KVM: vmx: add support for emulating UMIP")
    Signed-off-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Sean Christopherson
     
  • [ Upstream commit 3741aec4c38fa4123ab08ae552f05366d4fd05d8 ]

    If NRIPS is supported in hardware but disabled in KVM, set next_rip to
    the next RIP when advancing RIP as part of emulating INT3 injection.
    There is no flag to tell the CPU that KVM isn't using next_rip, and so
    leaving next_rip as is will result in the CPU pushing garbage
    onto the stack when vectoring the injected event.

    Reviewed-by: Maxim Levitsky
    Fixes: 66b7138f9136 ("KVM: SVM: Emulate nRIP feature when reinjecting INT3")
    Signed-off-by: Sean Christopherson
    Signed-off-by: Maciej S. Szmigiero
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Sean Christopherson
     
  • [ Upstream commit cd9e6da8048c5b40315ee2d929b6230ce1252c3c ]

    Unwind the RIP advancement done by svm_queue_exception() when injecting
    an INT3 ultimately "fails" due to the CPU encountering a VM-Exit while
    vectoring the injected event, even if the exception reported by the CPU
    isn't the same event that was injected. If vectoring INT3 encounters an
    exception, e.g. #NP, and vectoring the #NP encounters an intercepted
    exception, e.g. #PF when KVM is using shadow paging, then the #NP will
    be reported as the event that was in-progress.

    Note, this is still imperfect, as it will get a false positive if the
    INT3 is cleanly injected, no VM-Exit occurs before the IRET from the INT3
    handler in the guest, the instruction following the INT3 generates an
    exception (directly or indirectly), _and_ vectoring that exception
    encounters an exception that is intercepted by KVM. The false positives
    could theoretically be solved by further analyzing the vectoring event,
    e.g. by comparing the error code against the expected error code were an
    exception to occur when vectoring the original injected exception, but
    SVM without NRIPS is a complete disaster, trying to make it 100% correct
    is a waste of time.

    Reviewed-by: Maxim Levitsky
    Fixes: 66b7138f9136 ("KVM: SVM: Emulate nRIP feature when reinjecting INT3")
    Signed-off-by: Sean Christopherson
    Signed-off-by: Maciej S. Szmigiero
    Message-Id:
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Sean Christopherson
     
  • [ Upstream commit a1a5482a2c6e38a3ebed32e571625c56a8cc41a6 ]

    On Fri, Jun 17, 2022 at 02:08:52PM +0300, Stephane Eranian wrote:
    > Some changes to the way invalid MSR accesses are reported by the
    > kernel is causing some problems with messages printed on the
    > console.
    >
    > We have seen several cases of ex_handler_msr() printing invalid MSR
    > accesses once but the callstack multiple times causing confusion on
    > the console.

    > The problem here is that another earlier commit (5.13):
    >
    > a358f40600b3 ("once: implement DO_ONCE_LITE for non-fast-path "do once" functionality")
    >
    > Modifies all the pr_*_once() calls to always return true claiming
    > that no caller is ever checking the return value of the functions.
    >
    > This is why we are seeing the callstack printed without the
    > associated printk() msg.

    Extract the ONCE_IF(cond) part into __ONCE_LITE_IF() and use that to
    implement DO_ONCE_LITE_IF() and fix the extable code.
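
    A simplified sketch of the resulting macro (see include/linux/once_lite.h
    for the real definitions):

    /* __ONCE_LITE_IF() evaluates to true only the first time its argument
     * is true at a given call site, so func() runs at most once. */
    #define DO_ONCE_LITE_IF(condition, func, ...)                   \
            ({                                                      \
                    bool __ret_do_once = !!(condition);             \
                                                                    \
                    if (__ONCE_LITE_IF(__ret_do_once))              \
                            func(__VA_ARGS__);                      \
                                                                    \
                    unlikely(__ret_do_once);                        \
            })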

    Fixes: a358f40600b3 ("once: implement DO_ONCE_LITE for non-fast-path "do once" functionality")
    Reported-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra (Intel)
    Tested-by: Stephane Eranian
    Link: https://lkml.kernel.org/r/YqyVFsbviKjVGGZ9@worktop.programming.kicks-ass.net
    Signed-off-by: Sasha Levin

    Peter Zijlstra