17 Apr, 2019

11 commits

  • commit c73f4c998e1fd4249b9edfa39e23f4fda2b9b041 upstream.

    Referring to the "VIRTUALIZING MSR-BASED APIC ACCESSES" chapter of the
    SDM, when "virtualize x2APIC mode" is 1 and "APIC-register
    virtualization" is 0, a RDMSR of 808H should return the VTPR from the
    virtual APIC page.

    However, for nested, KVM currently fails to disable the read intercept
    for this MSR. This means that a RDMSR exit takes precedence over
    "virtualize x2APIC mode", and KVM passes through L1's TPR to L2,
    instead of sourcing the value from L2's virtual APIC page.

    This patch fixes the issue by disabling the read intercept, in VMCS02,
    for the VTPR when "APIC-register virtualization" is 0.

    The issue described above and fix prescribed here, were verified with
    a related patch in kvm-unit-tests titled "Test VMX's virtualize x2APIC
    mode w/ nested".
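
    The fix is small; a minimal sketch of its shape, assuming KVM's existing
    nested MSR-bitmap helpers (nested_vmx_disable_intercept_for_msr() and the
    X2APIC_MSR() macro) and living in nested_vmx_prepare_msr_bitmap():

    if (nested_cpu_has_virt_x2apic_mode(vmcs12)) {
            /*
             * With "virtualize x2APIC mode" on but "APIC-register
             * virtualization" off, let L2 read the TPR (MSR 0x808)
             * straight from its virtual APIC page.
             */
            nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0,
                                                 X2APIC_MSR(APIC_TASKPRI),
                                                 MSR_TYPE_R);
    }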

    Signed-off-by: Marc Orr
    Reviewed-by: Jim Mattson
    Fixes: c992384bde84f ("KVM: vmx: speed up MSR bitmap merge")
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Marc Orr
     
  • commit acff78477b9b4f26ecdf65733a4ed77fe837e9dc upstream.

    The nested_vmx_prepare_msr_bitmap() function doesn't directly guard the
    x2APIC MSR intercepts with the "virtualize x2APIC mode" control. As a
    result, we discovered the potential for a buggy or malicious L1 to get
    access to L0's x2APIC MSRs, via an L2, as follows.

    1. L1 executes WRMSR(IA32_SPEC_CTRL, 1). This causes the spec_ctrl
    variable, in nested_vmx_prepare_msr_bitmap(), to become true.
    2. L1 disables "virtualize x2APIC mode" in VMCS12.
    3. L1 enables "APIC-register virtualization" in VMCS12.

    Now, KVM will set VMCS02's x2APIC MSR intercepts from VMCS12, and then
    set "virtualize x2APIC mode" to 0 in VMCS02. Oops.

    This patch closes the leak by explicitly guarding VMCS02's x2APIC MSR
    intercepts with VMCS12's "virtualize x2APIC mode" control.

    The scenario outlined above and fix prescribed here, were verified with
    a related patch in kvm-unit-tests titled "Add leak scenario to
    virt_x2apic_mode_test".

    Note, it looks like this issue may have been introduced inadvertently
    during a merge---see 15303ba5d1cd.

    Signed-off-by: Marc Orr
    Reviewed-by: Jim Mattson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Marc Orr
     
  • commit 3966c3feca3fd10b2935caa0b4a08c7dd59469e5 upstream.

    Spurious interrupt support was added to perf in the following commit, almost
    a decade ago:

    63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters")

    The two previous patches (resolving the race condition when disabling a
    PMC and NMI latency mitigation) allow for the removal of this older
    spurious interrupt support.

    Currently in x86_pmu_stop(), the bit for the PMC in the active_mask bitmap
    is cleared before disabling the PMC, which sets up a race condition. This
    race condition was mitigated by introducing the running bitmap. That race
    condition can be eliminated by first disabling the PMC, waiting for PMC
    reset on overflow and then clearing the bit for the PMC in the active_mask
    bitmap. The NMI handler will not re-enable a disabled counter.

    If x86_pmu_stop() is called from the perf NMI handler, the NMI latency
    mitigation support will guard against any unhandled NMI messages.
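
    As a rough sketch of the new ordering in x86_pmu_stop() (simplified, with
    the surrounding bookkeeping omitted):

    if (test_bit(hwc->idx, cpuc->active_mask)) {
            /*
             * Disable first; on AMD the disable path now waits for a pending
             * overflow to be handled (see the previous patch), so the NMI
             * handler can no longer race with a half-stopped PMC.
             */
            x86_pmu.disable(event);
            __clear_bit(hwc->idx, cpuc->active_mask);
    }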

    Signed-off-by: Tom Lendacky
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: # 4.14.x-
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: https://lkml.kernel.org/r/Message-ID:
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Lendacky, Thomas
     
  • commit 6d3edaae16c6c7d238360f2841212c2b26774d5e upstream.

    On AMD processors, the detection of an overflowed PMC counter in the NMI
    handler relies on the current value of the PMC. So, for example, to check
    for overflow on a 48-bit counter, bit 47 is checked to see if it is 1 (not
    overflowed) or 0 (overflowed).

    When the perf NMI handler executes it does not know in advance which PMC
    counters have overflowed. As such, the NMI handler will process all active
    PMC counters that have overflowed. NMI latency in newer AMD processors can
    result in multiple overflowed PMC counters being processed in one NMI and
    then a subsequent NMI, that does not appear to be a back-to-back NMI, not
    finding any PMC counters that have overflowed. This may appear to be an
    unhandled NMI resulting in either a panic or a series of messages,
    depending on how the kernel was configured.

    To mitigate this issue, add an AMD handle_irq callback function,
    amd_pmu_handle_irq(), that will invoke the common x86_pmu_handle_irq()
    function and upon return perform some additional processing that will
    indicate if the NMI has been handled or would have been handled had an
    earlier NMI not handled the overflowed PMC. Using a per-CPU variable, the
    lesser of the number of active PMCs or 2 will be set whenever a
    PMC is active. This is used to indicate the possible number of NMIs that
    can still occur. The value of 2 is used for when an NMI does not arrive
    at the LAPIC in time to be collapsed into an already pending NMI. Each
    time the function is called without having handled an overflowed counter,
    the per-CPU value is checked. If the value is non-zero, it is decremented
    and the NMI indicates that it handled the NMI. If the value is zero, then
    the NMI indicates that it did not handle the NMI.
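
    A condensed sketch of that flow (not the verbatim upstream code; the real
    handler also special-cases the situation where no PMCs are active at all):

    static DEFINE_PER_CPU(unsigned int, perf_nmi_counter);

    static int amd_pmu_handle_irq(struct pt_regs *regs)
    {
            struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
            int active  = bitmap_weight(cpuc->active_mask, X86_PMC_IDX_MAX);
            int handled = x86_pmu_handle_irq(regs);

            if (handled) {
                    /* Up to min(active, 2) later "empty" NMIs may still be ours. */
                    this_cpu_write(perf_nmi_counter,
                                   min_t(unsigned int, 2, active));
                    return handled;
            }

            if (!this_cpu_read(perf_nmi_counter))
                    return NMI_DONE;        /* genuinely not ours */

            this_cpu_dec(perf_nmi_counter);
            return NMI_HANDLED;             /* latent NMI from an earlier overflow */
    }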

    Signed-off-by: Tom Lendacky
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: # 4.14.x-
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: https://lkml.kernel.org/r/Message-ID:
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Lendacky, Thomas
     
  • commit 914123fa39042e651d79eaf86bbf63a1b938dddf upstream.

    On AMD processors, the detection of an overflowed counter in the NMI
    handler relies on the current value of the counter. So, for example, to
    check for overflow on a 48-bit counter, bit 47 is checked to see if it
    is 1 (not overflowed) or 0 (overflowed).

    There is currently a race condition present when disabling and then
    updating the PMC. Increased NMI latency in newer AMD processors makes this
    race condition more pronounced. If the counter value has overflowed, it is
    possible to update the PMC value before the NMI handler can run. The
    updated PMC value is not an overflowed value, so when the perf NMI handler
    does run, it will not find an overflowed counter. This may appear as an
    unknown NMI resulting in either a panic or a series of messages, depending
    on how the kernel is configured.

    To eliminate this race condition, the PMC value must be checked after
    disabling the counter. Add an AMD function, amd_pmu_disable_all(), that
    will wait for the NMI handler to reset any active and overflowed counter
    after calling x86_pmu_disable_all().
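
    Roughly, the wait looks like this (a sketch; the iteration bound and the
    1 microsecond delay are illustrative):

    static void amd_pmu_wait_on_overflow(int idx)
    {
            unsigned int i;
            u64 counter;

            /*
             * An overflowed counter has its top bit (bit 47 on a 48-bit
             * counter) clear until the NMI handler reloads it.
             */
            for (i = 0; i < 50; i++) {
                    rdmsrl(x86_pmu_event_addr(idx), counter);
                    if (counter & BIT_ULL(x86_pmu.cntval_bits - 1))
                            break;          /* reloaded, or never overflowed */
                    udelay(1);
            }
    }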

    Signed-off-by: Tom Lendacky
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: # 4.14.x-
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: https://lkml.kernel.org/r/Message-ID:
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Lendacky, Thomas
     
  • commit 5b77e95dd7790ff6c8fbf1cd8d0104ebed818a03 upstream.

    There are a number of problems with how arch/x86/include/asm/bitops.h
    is currently using assembly constraints for the memory region
    bitops are modifying:

    1) Use memory clobber in bitops that touch arbitrary memory

    Certain bit operations that read/write bits take a base pointer and an
    arbitrarily large offset to address the bit relative to that base.
    Inline assembly constraints aren't expressive enough to tell the
    compiler that the assembly directive is going to touch a specific memory
    location of unknown size, therefore we have to use the "memory" clobber
    to indicate that the assembly is going to access memory locations other
    than those listed in the inputs/outputs.

    To indicate that BTR/BTS instructions don't necessarily touch the first
    sizeof(long) bytes of the argument, we also move the address to assembly
    inputs.

    This particular change leads to a size increase of 124 kernel functions in
    a defconfig build. For some of them the diff is in NOP operations, others
    end up re-reading values from memory and may potentially slow down the
    execution. But without these clobbers the compiler is free to cache
    the contents of the bitmaps and use them as if they weren't changed by
    the inline assembly.

    2) Use byte-sized arguments for operations touching single bytes.

    Passing a long value to ANDB/ORB/XORB instructions makes the compiler
    treat sizeof(long) bytes as being clobbered, which isn't the case. This
    may theoretically lead to worse code in the case of heavy optimization.
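
    Sketches of the two changes (heavily simplified; LOCK_PREFIX and the
    bitops.h macro plumbing are omitted, and the function names here are
    illustrative):

    /*
     * 1) Arbitrary-offset bit ops: pass the base address as an input and add
     *    a "memory" clobber so the compiler cannot cache the bitmap contents.
     */
    static __always_inline void set_bit_sketch(long nr, volatile unsigned long *addr)
    {
            asm volatile("bts %1,%0"
                         : : "m" (*(volatile long *)addr), "Ir" (nr)
                         : "memory");
    }

    /* 2) Constant-bit ops touch exactly one byte, so clobber only that byte. */
    #define CONST_MASK(nr) (1 << ((nr) & 7))
    static __always_inline void set_bit_const_sketch(long nr, volatile unsigned long *addr)
    {
            asm volatile("orb %1,%0"
                         : "+m" (((volatile char *)addr)[nr >> 3])
                         : "iq" ((unsigned char)CONST_MASK(nr)));
    }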

    Practical impact:

    I've built a defconfig kernel and looked through some of the functions
    generated by GCC 7.3.0 with and without this clobber, and didn't spot
    any miscompilations.

    However there is a (trivial) theoretical case where this code leads to
    miscompilation:

    https://lkml.org/lkml/2019/3/28/393

    using just GCC 8.3.0 with -O2. It isn't hard to imagine someone writing
    such a function in the kernel someday.

    So the primary motivation is to fix an existing misuse of the asm
    directive, which happens to work in certain configurations now, but
    isn't guaranteed to work under different circumstances.

    [ mingo: Added -stable tag because defconfig only builds a fraction
    of the kernel and the trivial testcase looks normal enough to
    be used in existing or in-development code. ]

    Signed-off-by: Alexander Potapenko
    Cc:
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: Dmitry Vyukov
    Cc: H. Peter Anvin
    Cc: James Y Knight
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com
    [ Edited the changelog, tidied up one of the defines. ]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Alexander Potapenko
     
  • commit 88ca66d8540ca26119b1428cddb96b37925bdf01 upstream.

    The minimum supported gcc version is >= 4.6, so these can be removed.

    Signed-off-by: Rasmus Villemoes
    Signed-off-by: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Dan Williams
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/20190111084931.24601-1-linux@rasmusvillemoes.dk
    Signed-off-by: Greg Kroah-Hartman

    Rasmus Villemoes
     
  • commit 42d8644bd77dd2d747e004e367cb0c895a606f39 upstream.

    The "call" variable comes from the user in privcmd_ioctl_hypercall().
    It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32)
    elements. We need to put an upper bound on it to prevent an out of
    bounds access.
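
    The added check is essentially a one-liner at the top of privcmd_call()
    (sketch; hypercall_page entries are 32 bytes, matching the sizing above):

    if (call >= PAGE_SIZE / sizeof(hypercall_page[0]))
            return -EINVAL;         /* out-of-bounds hypercall index */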

    Cc: stable@vger.kernel.org
    Fixes: 1246ae0bb992 ("xen: add variable hypercall caller")
    Signed-off-by: Dan Carpenter
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     
  • commit ede885ecb2cdf8a8dd5367702e3d964ec846a2d5 upstream.

    get_num_contig_pages() could potentially overflow int so make its type
    consistent with its usage.

    Reported-by: Cfir Cohen
    Cc: stable@vger.kernel.org
    Signed-off-by: David Rientjes
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    David Rientjes
     
  • commit ac3e233d29f7f77f28243af0132057d378d3ea58 upstream.

    GNU linker's -z common-page-size's default value is based on the target
    architecture. arch/x86/entry/vdso/Makefile sets it to the architecture
    default, which is implicit and redundant. Drop it.

    Fixes: 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu")
    Reported-by: Dmitry Golovin
    Reported-by: Bill Wendling
    Suggested-by: Dmitry Golovin
    Suggested-by: Rui Ueyama
    Signed-off-by: Nick Desaulniers
    Signed-off-by: Borislav Petkov
    Acked-by: Andy Lutomirski
    Cc: Andi Kleen
    Cc: Fangrui Song
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/20181206191231.192355-1-ndesaulniers@google.com
    Link: https://bugs.llvm.org/show_bug.cgi?id=38774
    Link: https://github.com/ClangBuiltLinux/linux/issues/31
    Signed-off-by: Nathan Chancellor
    Signed-off-by: Sasha Levin

    Nick Desaulniers
     
  • [ Upstream commit 9ebdfe5230f2e50e3ba05c57723a06e90946815a ]

    According to the SDM, "NMI-window exiting" VM-exits wake a logical
    processor from the same inactive states as would an NMI and
    "interrupt-window exiting" VM-exits wake a logical processor from the
    same inactive states as would an external interrupt. Specifically, they
    wake a logical processor from the shutdown state and from the states
    entered using the HLT and MWAIT instructions.

    Fixes: 6dfacadd5858 ("KVM: nVMX: Add support for activity state HLT")
    Signed-off-by: Jim Mattson
    Reviewed-by: Peter Shier
    Suggested-by: Sean Christopherson
    [Squashed comments of two of Jim's patches and used the simplified code
    hunk provided by Sean. - Radim]
    Signed-off-by: Radim Krčmář
    Signed-off-by: Sasha Levin

    Jim Mattson
     

06 Apr, 2019

5 commits

  • [ Upstream commit a50480cb6d61d5c5fc13308479407b628b6bc1c5 ]

    These interrupt functions are already non-attachable by kprobes.
    Blacklist them explicitly so that they can show up in
    /sys/kernel/debug/kprobes/blacklist and tools like BCC can use this
    additional information.
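
    The mechanism is just the NOKPROBE_SYMBOL() annotation on each such entry
    point, e.g. (one representative symbol shown):

    NOKPROBE_SYMBOL(smp_apic_timer_interrupt);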

    Signed-off-by: Andrea Righi
    Cc: Andy Lutomirski
    Cc: Anil S Keshavamurthy
    Cc: Borislav Petkov
    Cc: David S. Miller
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Naveen N. Rao
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Yonghong Song
    Link: http://lkml.kernel.org/r/20181206095648.GA8249@Dell
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin

    Andrea Righi
     
  • [ Upstream commit d071ae09a4a1414c1433d5ae9908959a7325b0ad ]

    Accessing per-CPU variables is done by finding the offset of the
    variable in the per-CPU block and adding it to the address of the
    respective CPU's block.

    Section 3.10.8 of ld.bfd's documentation states:

    "For expressions involving numbers, relative addresses and absolute
    addresses, ld follows these rules to evaluate terms:

    Other binary operations, that is, between two relative addresses
    not in the same section, or between a relative address and an
    absolute address, first convert any non-absolute term to an
    absolute address before applying the operator."

    Note that LLVM's linker does not adhere to the GNU ld's implementation
    and as such requires implicitly-absolute terms to be explicitly marked
    as absolute in the linker script. If not, it fails currently with:

    ld.lld: error: ./arch/x86/kernel/vmlinux.lds:153: at least one side of the expression must be absolute
    ld.lld: error: ./arch/x86/kernel/vmlinux.lds:154: at least one side of the expression must be absolute
    Makefile:1040: recipe for target 'vmlinux' failed

    This is not a functional change for ld.bfd which converts the term to an
    absolute symbol anyways as specified above.

    Based on a previous submission by Tri Vo.

    Reported-by: Dmitry Golovin
    Signed-off-by: Rafael Ávila de Espíndola
    [ Update commit message per Boris' and Michael's suggestions. ]
    Signed-off-by: Nick Desaulniers
    [ Massage commit message more, fix typos. ]
    Signed-off-by: Borislav Petkov
    Tested-by: Dmitry Golovin
    Cc: "H. Peter Anvin"
    Cc: Andy Lutomirski
    Cc: Brijesh Singh
    Cc: Cao Jin
    Cc: Ingo Molnar
    Cc: Joerg Roedel
    Cc: Masahiro Yamada
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Cc: Tri Vo
    Cc: dima@golovin.in
    Cc: morbo@google.com
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/20181219190145.252035-1-ndesaulniers@google.com
    Signed-off-by: Sasha Levin

    Rafael Ávila de Espíndola
     
  • [ Upstream commit 927185c124d62a9a4d35878d7f6d432a166b74e3 ]

    The kernel uses the OUTPUT_FORMAT linker script command in its linker
    scripts. Most of the time, the -m option is passed to the linker with
    correct architecture, but sometimes (at least for x86_64) the -m option
    contradicts the OUTPUT_FORMAT directive.

    Specifically, arch/x86/boot and arch/x86/realmode/rm produce i386 object
    files, but are linked with the -m elf_x86_64 linker flag when building
    for x86_64.

    The GNU linker manpage doesn't explicitly state any tie-breakers between
    -m and OUTPUT_FORMAT. But with BFD and Gold linkers, OUTPUT_FORMAT
    overrides the emulation value specified with the -m option.

    LLVM lld has a different behavior, however. When supplied with
    contradicting -m and OUTPUT_FORMAT values it fails with the following
    error message:

    ld.lld: error: arch/x86/realmode/rm/header.o is incompatible with elf_x86_64

    Therefore, just add the correct -m after the incorrect one (it overrides
    it), so the linker invocation looks like this:

    ld -m elf_x86_64 -z max-page-size=0x200000 -m elf_i386 --emit-relocs -T \
    realmode.lds header.o trampoline_64.o stack.o reboot.o -o realmode.elf

    This is not a functional change for GNU ld, because (although not
    explicitly documented) OUTPUT_FORMAT overrides -m EMULATION.

    Tested by building x86_64 kernel with GNU gcc/ld toolchain and booting
    it in QEMU.

    [ bp: massage and clarify text. ]

    Suggested-by: Dmitry Golovin
    Signed-off-by: George Rimar
    Signed-off-by: Tri Vo
    Signed-off-by: Borislav Petkov
    Tested-by: Tri Vo
    Tested-by: Nick Desaulniers
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Michael Matz
    Cc: Thomas Gleixner
    Cc: morbo@google.com
    Cc: ndesaulniers@google.com
    Cc: ruiu@google.com
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/20190111201012.71210-1-trong@android.com
    Signed-off-by: Sasha Levin

    George Rimar
     
  • [ Upstream commit 840018668ce2d96783356204ff282d6c9b0e5f66 ]

    When pmu::setup_aux() is called the coresight PMU needs to know which
    sink to use for the session by looking up the information in the
    event's attr::config2 field.

    As such simply replace the cpu information by the complete perf_event
    structure and change all affected customers.

    Signed-off-by: Mathieu Poirier
    Reviewed-by: Suzuki Poulouse
    Acked-by: Peter Zijlstra
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexei Starovoitov
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: Heiko Carstens
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Martin Schwidefsky
    Cc: Namhyung Kim
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-s390@vger.kernel.org
    Link: http://lkml.kernel.org/r/20190131184714.20388-2-mathieu.poirier@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Mathieu Poirier
     
  • [ Upstream commit 179fb36abb097976997f50733d5b122a29158cba ]

    After commit 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments"),
    kexec fails with a kernel panic:

    kexec_core: Starting new kernel
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v3.0 03/02/2018
    RIP: 0010:0xffffc9000001d000

    Call Trace:
    ? __send_ipi_mask+0x1c6/0x2d0
    ? hv_send_ipi_mask_allbutself+0x6d/0xb0
    ? mp_save_irq+0x70/0x70
    ? __ioapic_read_entry+0x32/0x50
    ? ioapic_read_entry+0x39/0x50
    ? clear_IO_APIC_pin+0xb8/0x110
    ? native_stop_other_cpus+0x6e/0x170
    ? native_machine_shutdown+0x22/0x40
    ? kernel_kexec+0x136/0x156

    That happens if hypercall based IPIs are used because the hypercall page is
    reset very early upon kexec reboot, but kexec sends IPIs to stop CPUs,
    which invokes the hypercall and dereferences the unusable page.

    To fix this, reset hv_hypercall_pg to NULL before the page is reset to avoid
    any misuse; IPI sending will then fall back to the non-hypercall-based
    method. This only happens on kexec / kdump so just setting the pointer to
    NULL is good enough.
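
    The core of the fix is an ordering tweak in the Hyper-V cleanup path
    (sketch; hyperv_cleanup() is the assumed location):

    /*
     * Drop the reference first so IPI sending falls back to the
     * non-hypercall path instead of jumping through a dead page.
     */
    hv_hypercall_pg = NULL;

    /* Then reset the hypercall page itself. */
    wrmsrl(HV_X64_MSR_HYPERCALL, 0);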

    Fixes: 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments")
    Signed-off-by: Kairui Song
    Signed-off-by: Thomas Gleixner
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Sasha Levin
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Vitaly Kuznetsov
    Cc: Dave Young
    Cc: devel@linuxdriverproject.org
    Link: https://lkml.kernel.org/r/20190306111827.14131-1-kasong@redhat.com
    Signed-off-by: Sasha Levin

    Kairui Song
     

03 Apr, 2019

3 commits

  • commit 0cf9135b773bf32fba9dd8e6699c1b331ee4b749 upstream.

    The CPUID flag ARCH_CAPABILITIES is unconditionally exposed to host
    userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES
    regardless of hardware support under the pretense that KVM fully
    emulates MSR_IA32_ARCH_CAPABILITIES. Unfortunately, only VMX hosts
    handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS
    also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts).

    Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so
    that it's emulated on AMD hosts.

    Fixes: 1eaafe91a0df4 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported")
    Cc: stable@vger.kernel.org
    Reported-by: Xiaoyao Li
    Cc: Jim Mattson
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Sean Christopherson
     
  • commit 45def77ebf79e2e8942b89ed79294d97ce914fa0 upstream.

    Most (all?) x86 platforms provide a port IO based reset mechanism, e.g.
    OUT 92h or CF9h. Userspace may emulate said mechanism, i.e. reset a
    vCPU in response to KVM_EXIT_IO, without explicitly announcing to KVM
    that it is doing a reset, e.g. Qemu jams vCPU state and resumes running.

    To avoid corrupting %rip after such a reset, commit 0967b7bf1c22 ("KVM:
    Skip pio instruction when it is emulated, not executed") changed the
    behavior of PIO handlers, i.e. today's "fast" PIO handling, to skip the
    instruction prior to exiting to userspace. Full emulation doesn't need
    such tricks because re-emulating the instruction will naturally handle
    %rip being changed to point at the reset vector.

    Updating %rip prior to exiting to userspace has several drawbacks:

    - Userspace sees the wrong %rip on the exit, e.g. if PIO emulation
    fails it will likely yell about the wrong address.
    - Single step exits to userspace for fast PIO are effectively dropped as
    KVM_EXIT_DEBUG is overwritten with KVM_EXIT_IO.
    - Behavior of PIO emulation is different depending on whether it
    goes down the fast path or the slow path.

    Rather than skip the PIO instruction before exiting to userspace,
    snapshot the linear %rip and cancel PIO completion if the current
    value does not match the snapshot. For a 64-bit vCPU, i.e. the most
    common scenario, the snapshot and comparison has negligible overhead
    as VMCS.GUEST_RIP will be cached regardless, i.e. there is no extra
    VMREAD in this case.

    All other alternatives to snapshotting the linear %rip that don't
    rely on an explicit reset announcement suffer from one corner case
    or another. For example, canceling PIO completion on any write to
    %rip fails if userspace does a save/restore of %rip, and attempting to
    avoid that issue by canceling PIO only if %rip changed then fails if PIO
    collides with the reset %rip. Attempting to zero in on the exact reset
    vector won't work for APs, which means adding more hooks such as the
    vCPU's MP_STATE, and so on and so forth.

    Checking for a linear %rip match technically suffers from corner cases,
    e.g. userspace could theoretically rewrite the underlying code page and
    expect a different instruction to execute, or the guest hardcodes a PIO
    reset at 0xfffffff0, but those are far, far outside of what can be
    considered normal operation.
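
    A sketch of the completion side (field and helper names assumed from the
    description above; the snapshot itself is taken when the KVM_EXIT_IO is
    set up):

    static int complete_fast_pio_in(struct kvm_vcpu *vcpu)
    {
            /*
             * Snapshot taken when the exit was queued:
             * vcpu->arch.pio.linear_rip = kvm_get_linear_rip(vcpu);
             */
            if (!kvm_is_linear_rip(vcpu, vcpu->arch.pio.linear_rip))
                    return 1;       /* RIP moved (e.g. userspace reset): cancel PIO */

            /* ... move the port data into the destination register ... */
            return kvm_skip_emulated_instruction(vcpu);
    }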

    Fixes: 432baf60eee3 ("KVM: VMX: use kvm_fast_pio_in for handling IN I/O")
    Cc:
    Reported-by: Jim Mattson
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Sean Christopherson
     
  • commit bebd024e4815b1a170fcd21ead9c2222b23ce9e6 upstream.

    The SMT disable 'nosmt' command line argument is not working properly when
    CONFIG_HOTPLUG_CPU is disabled. The teardown of the sibling CPUs, which are
    required to be brought up due to the MCE issues, cannot work. The CPUs are
    then kept in a half dead state.

    As the 'nosmt' functionality has become popular due to the speculative
    hardware vulnerabilities, the half torn down state is not a proper solution
    to the problem.

    Enforce CONFIG_HOTPLUG_CPU=y when SMP is enabled so the full operation is
    possible.

    Reported-by: Tianyu Lan
    Signed-off-by: Thomas Gleixner
    Acked-by: Greg Kroah-Hartman
    Cc: Konrad Wilk
    Cc: Josh Poimboeuf
    Cc: Mukesh Ojha
    Cc: Peter Zijlstra
    Cc: Jiri Kosina
    Cc: Rik van Riel
    Cc: Andy Lutomirski
    Cc: Micheal Kelley
    Cc: "K. Y. Srinivasan"
    Cc: Linus Torvalds
    Cc: Borislav Petkov
    Cc: K. Y. Srinivasan
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190326163811.598166056@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

27 Mar, 2019

2 commits

  • commit ac5ceccce5501e43d217c596e4ee859f2a3fef79 upstream.

    When the ORC unwinder is invoked for an oops caused by IP==0,
    it currently has no idea what to do because there is no debug information
    for the stack frame of NULL.

    But if RIP is NULL, it is very likely that the last successfully executed
    instruction was an indirect CALL/JMP, and it is possible to unwind out in
    the same way as for the first instruction of a normal function. Hardcode
    a corresponding ORC entry.
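
    The hardcoded entry itself is tiny (a sketch mirroring struct orc_entry):
    treat IP 0 like the first instruction of a function, with the return
    address sitting right at the top of the stack, and let orc_find() return
    it whenever ip == 0.

    static struct orc_entry null_orc_entry = {
            .sp_offset = sizeof(long),      /* return address is at SP   */
            .sp_reg    = ORC_REG_SP,
            .bp_reg    = ORC_REG_UNDEFINED,
            .type      = ORC_TYPE_CALL,     /* unwind like a normal call */
    };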

    With an artificially-added NULL call in prctl_set_seccomp(), before this
    patch, the trace is:

    Call Trace:
    ? __x64_sys_prctl+0x402/0x680
    ? __ia32_sys_prctl+0x6e0/0x6e0
    ? __do_page_fault+0x457/0x620
    ? do_syscall_64+0x6d/0x160
    ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

    After this patch, the trace looks like this:

    Call Trace:
    __x64_sys_prctl+0x402/0x680
    ? __ia32_sys_prctl+0x6e0/0x6e0
    ? __do_page_fault+0x457/0x620
    do_syscall_64+0x6d/0x160
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    prctl_set_seccomp() still doesn't show up in the trace because for some
    reason, tail call optimization is only disabled in builds that use the
    frame pointer unwinder.

    Signed-off-by: Jann Horn
    Signed-off-by: Thomas Gleixner
    Acked-by: Josh Poimboeuf
    Cc: Borislav Petkov
    Cc: Andrew Morton
    Cc: syzbot
    Cc: "H. Peter Anvin"
    Cc: Masahiro Yamada
    Cc: Michal Marek
    Cc: linux-kbuild@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190301031201.7416-2-jannh@google.com
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     
  • commit f4f34e1b82eb4219d8eaa1c7e2e17ca219a6a2b5 upstream.

    When the frame unwinder is invoked for an oops caused by a call to NULL, it
    currently skips the parent function because BP still points to the parent's
    stack frame; the (nonexistent) current function only has the first half of
    a stack frame, and BP doesn't point to it yet.

    Add a special case for IP==0 that calculates a fake BP from SP, then uses
    the real BP for the next frame.

    Note that this handles first_frame specially: Return information about the
    parent function as long as the saved IP is >=first_frame, even if the fake
    BP points below it.

    With an artificially-added NULL call in prctl_set_seccomp(), before this
    patch, the trace is:

    Call Trace:
    ? prctl_set_seccomp+0x3a/0x50
    __x64_sys_prctl+0x457/0x6f0
    ? __ia32_sys_prctl+0x750/0x750
    do_syscall_64+0x72/0x160
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    After this patch, the trace is:

    Call Trace:
    prctl_set_seccomp+0x3a/0x50
    __x64_sys_prctl+0x457/0x6f0
    ? __ia32_sys_prctl+0x750/0x750
    do_syscall_64+0x72/0x160
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Signed-off-by: Jann Horn
    Signed-off-by: Thomas Gleixner
    Acked-by: Josh Poimboeuf
    Cc: Borislav Petkov
    Cc: Andrew Morton
    Cc: syzbot
    Cc: "H. Peter Anvin"
    Cc: Masahiro Yamada
    Cc: Michal Marek
    Cc: linux-kbuild@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190301031201.7416-1-jannh@google.com
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     

24 Mar, 2019

13 commits

  • commit 34333cc6c2cb021662fd32e24e618d1b86de95bf upstream.

    Regarding segments with a limit==0xffffffff, the SDM officially states:

    When the effective limit is FFFFFFFFH (4 GBytes), these accesses may
    or may not cause the indicated exceptions. Behavior is
    implementation-specific and may vary from one execution to another.

    In practice, all CPUs that support VMX ignore limit checks for "flat
    segments", i.e. an expand-up data or code segment with base=0 and
    limit=0xffffffff. This is subtly different than wrapping the effective
    address calculation based on the address size, as the flat segment
    behavior also applies to accesses that would wrap the 4g boundary, e.g.
    a 4-byte access starting at 0xffffffff will access linear addresses
    0xffffffff, 0x0, 0x1 and 0x2.
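
    In code form, the relaxed check looks roughly like this (simplified from
    get_vmx_mem_address(); 's' is the decoded segment, 'off' the effective
    address and 'len' the access size):

    /*
     * Skip the limit check for a "flat" segment: a code segment, or an
     * expand-up data segment, with base 0 and limit 0xffffffff.
     */
    if (!(s.base == 0 && s.limit == 0xffffffff &&
          ((s.type & 8) || !(s.type & 4))))
            exn = exn || ((u64)off + len - 1 > s.limit);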

    Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Sean Christopherson
     
  • commit 8570f9e881e3fde98801bb3a47eef84dd934d405 upstream.

    The address size of an instruction affects the effective address, not
    the virtual/linear address. The final address may still be truncated,
    e.g. to 32-bits outside of long mode, but that happens irrespective of
    the address size, e.g. a 32-bit address size can yield a 64-bit virtual
    address when using FS/GS with a non-zero base.

    Fixes: 064aea774768 ("KVM: nVMX: Decoding memory operands of VMX instructions")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Sean Christopherson
     
  • commit 946c522b603f281195af1df91837a1d4d1eb3bc9 upstream.

    The VMCS.EXIT_QUALIFICATION field reports the displacements of memory
    operands for various instructions, including VMX instructions, as a
    naturally sized unsigned value, but masks the value by the addr size,
    e.g. given a ModRM encoded as -0x28(%ebp), the -0x28 displacement is
    reported as 0xffffffd8 for a 32-bit address size. Despite some weird
    wording regarding sign extension, the SDM explicitly states that bits
    beyond the instruction's address size are undefined:

    In all cases, bits of this field beyond the instruction’s address
    size are undefined.

    Failure to sign extend the displacement results in KVM incorrectly
    treating a negative displacement as a large positive displacement when
    the address size of the VMX instruction is smaller than KVM's native
    size, e.g. a 32-bit address size on a 64-bit KVM.

    The very original decoding, added by commit 064aea774768 ("KVM: nVMX:
    Decoding memory operands of VMX instructions"), sort of modeled sign
    extension by truncating the final virtual/linear address for a 32-bit
    address size. I.e. it messed up the effective address but made it work
    by adjusting the final address.

    When segmentation checks were added, the truncation logic was kept
    as-is and no sign extension logic was introduced. In other words, it
    kept calculating the wrong effective address while mostly generating
    the correct virtual/linear address. As the effective address is what's
    used in the segment limit checks, this results in KVM incorrectly
    injecting #GP/#SS faults due to non-existent segment violations when
    a nested VMM uses negative displacements with an address size smaller
    than KVM's native address size.

    Using the -0x28(%ebp) example, an EBP value of 0x1000 will result in
    KVM using 0x100000fd8 as the effective address when checking for a
    segment limit violation. This causes a 100% failure rate when running
    a 32-bit KVM build as L1 on top of a 64-bit KVM L0.
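
    The fix boils down to sign-extending the displacement according to the
    instruction's address size before it is used (a sketch, simplified from
    get_vmx_mem_address()):

    gva_t off = exit_qualification;         /* 64-bit address size */

    if (addr_size == 1)                     /* 32-bit address size */
            off = (gva_t)sign_extend64(exit_qualification, 31);
    else if (addr_size == 0)                /* 16-bit address size */
            off = (gva_t)sign_extend64(exit_qualification, 15);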

    Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Sean Christopherson
     
  • commit ddfd1730fd829743e41213e32ccc8b4aa6dc8325 upstream.

    When installing new memslots, KVM sets bit 0 of the generation number to
    indicate that an update is in-progress. Until the update is complete,
    there are no guarantees as to whether a vCPU will see the old or the new
    memslots. Explicitly prevent caching MMIO accesses so as to avoid using
    an access cached from the old memslots after the new memslots have been
    installed.

    Note that it is unclear whether or not disabling caching during the
    update window is strictly necessary as there is no definitive
    documentation as to what ordering guarantees KVM provides with respect
    to updating memslots. That being said, the MMIO spte code does not
    allow reusing sptes created while an update is in-progress, and the
    associated documentation explicitly states:

    We do not want to use an MMIO sptes created with an odd generation
    number, ... If KVM is unlucky and creates an MMIO spte while the
    low bit is 1, the next access to the spte will always be a cache miss.

    At the very least, disabling the per-vCPU MMIO cache during updates will
    make its behavior consistent with the MMIO spte behavior and
    documentation.

    Fixes: 56f17dd3fbc4 ("kvm: x86: fix stale mmio cache bug")
    Cc:
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Sean Christopherson
     
  • commit e1359e2beb8b0a1188abc997273acbaedc8ee791 upstream.

    The check to detect a wrap of the MMIO generation explicitly looks for a
    generation number of zero. Now that unique memslots generation numbers
    are assigned to each address space, only address space 0 will get a
    generation number of exactly zero when wrapping. E.g. when address
    space 1 goes from 0x7fffe to 0x80002, the MMIO generation number will
    wrap to 0x2. Adjust the MMIO generation to strip the address space
    modifier prior to checking for a wrap.

    Fixes: 4bd518f1598d ("KVM: use separate generations for each address space")
    Cc:
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Sean Christopherson
     
  • commit 152482580a1b0accb60676063a1ac57b2d12daf6 upstream.

    kvm_arch_memslots_updated() is at this point in time an x86-specific
    hook for handling MMIO generation wraparound. x86 stashes 19 bits of
    the memslots generation number in its MMIO sptes in order to avoid
    full page fault walks for repeat faults on emulated MMIO addresses.
    Because only 19 bits are used, wrapping the MMIO generation number is
    possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that
    the generation has changed so that it can invalidate all MMIO sptes in
    case the effective MMIO generation has wrapped so as to avoid using a
    stale spte, e.g. a (very) old spte that was created with generation==0.

    Given that the purpose of kvm_arch_memslots_updated() is to prevent
    consuming stale entries, it needs to be called before the new generation
    is propagated to memslots. Invalidating the MMIO sptes after updating
    memslots means that there is a window where a vCPU could dereference
    the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
    spte that was created with (pre-wrap) generation==0.

    Fixes: e59dbe09f8e6 ("KVM: Introduce kvm_arch_memslots_updated()")
    Cc:
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Sean Christopherson
     
  • commit 8041ffd36f42d8521d66dd1e236feb58cecd68bc upstream.

    The client IMC bandwidth events currently return very large values:

    $ perf stat -e uncore_imc/data_reads/ -e uncore_imc/data_writes/ -I 10000 -a

    10.000117222 34,788.76 MiB uncore_imc/data_reads/
    10.000117222 8.26 MiB uncore_imc/data_writes/
    20.000374584 34,842.89 MiB uncore_imc/data_reads/
    20.000374584 10.45 MiB uncore_imc/data_writes/
    30.000633299 37,965.29 MiB uncore_imc/data_reads/
    30.000633299 323.62 MiB uncore_imc/data_writes/
    40.000891548 41,012.88 MiB uncore_imc/data_reads/
    40.000891548 6.98 MiB uncore_imc/data_writes/
    50.001142480 1,125,899,906,621,494.75 MiB uncore_imc/data_reads/
    50.001142480 6.97 MiB uncore_imc/data_writes/

    The client IMC events are freerunning counters. They still use the
    old event encoding format (0x1 for data_read and 0x2 for data write).
    The counter bit width is calculated by common code, which assumes that
    the standard encoding format is used for the freerunning counters. As a
    result, the wrong bit width information is calculated.

    The patch intends to convert the old client IMC event encoding to the
    standard encoding format.

    Current common code uses event->attr.config, which is copied directly
    from user space. We should not implicitly modify it for a converted event.
    The event->hw.config is used to replace the event->attr.config in
    common code.

    For client IMC events, the event->attr.config is used to calculate a
    converted event with standard encoding format in the custom
    event_init(). The converted event is stored in event->hw.config.
    For other events of freerunning counters, they already use the standard
    encoding format. The same value as event->attr.config is assigned to
    event->hw.config in common event_init().

    Reported-by: Jin Yao
    Tested-by: Jin Yao
    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: stable@kernel.org # v4.18+
    Fixes: 9aae1780e7e8 ("perf/x86/intel/uncore: Clean up client IMC uncore")
    Link: https://lkml.kernel.org/r/20190227165729.1861-1-kan.liang@linux.intel.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Kan Liang
     
  • commit 0192e6535ebe9af68614198ced4fd6d37b778ebf upstream.

    Prohibit probing on optprobe template code, since it is not executable
    code in itself but a template instruction sequence. If we modify this
    template, every copy made from it would be broken.

    Signed-off-by: Masami Hiramatsu
    Cc: Alexander Shishkin
    Cc: Andrea Righi
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mathieu Desnoyers
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Fixes: 9326638cbee2 ("kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation")
    Link: http://lkml.kernel.org/r/154998787911.31052.15274376330136234452.stgit@devbox
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Masami Hiramatsu
     
  • commit 01bd2ac2f55a1916d81dace12fa8d7ae1c79b5ea upstream.

    Commit f7c90c2aa40048 ("x86/xen: don't write ptes directly in 32-bit
    PV guests") introduced a regression for booting dom0 on huge systems
    with lots of RAM (in the TB range).

    Reason is that on those hosts the p2m list needs to be moved early in
    the boot process and this requires temporary page tables to be created.
    Said commit modified xen_set_pte_init() to use a hypercall for writing
    a PTE, but this requires the page table being in the direct mapped
    area, which is not the case for the temporary page tables used in
    xen_relocate_p2m().

    As the page tables are completely written before being linked into the
    actual address space, a plain write to memory can be used in
    xen_relocate_p2m() instead of set_pte().

    Fixes: f7c90c2aa40048 ("x86/xen: don't write ptes directly in 32-bit PV guests")
    Cc: stable@vger.kernel.org
    Signed-off-by: Juergen Gross
    Reviewed-by: Jan Beulich
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Juergen Gross
     
  • commit 2060e284e9595fc3baed6e035903c05b93266555 upstream.

    The x86 MORUS implementations all fail the improved AEAD tests because
    they produce the wrong result with some data layouts. The issue is that
    they assume that if the skcipher_walk API gives 'nbytes' not aligned to
    the walksize (a.k.a. walk.stride), then it is the end of the data. In
    fact, this can happen before the end.

    Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can
    incorrectly sleep in the skcipher_walk_*() functions while preemption
    has been disabled by kernel_fpu_begin().

    Fix these bugs.

    Fixes: 56e8e57fc3a7 ("crypto: morus - Add common SIMD glue code for MORUS")
    Cc: # v4.18+
    Cc: Ondrej Mosnacek
    Signed-off-by: Eric Biggers
    Reviewed-by: Ondrej Mosnacek
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
     
  • commit 3af349639597fea582a93604734d717e59a0e223 upstream.

    gcmaes_crypt_by_sg() dereferences the NULL pointer returned by
    scatterwalk_ffwd() when encrypting an empty plaintext and the source
    scatterlist ends immediately after the associated data.

    Fix it by only fast-forwarding to the src/dst data scatterlists if the
    data length is nonzero.

    This bug is reproduced by the "rfc4543(gcm(aes))" test vectors when run
    with the new AEAD test manager.

    Fixes: e845520707f8 ("crypto: aesni - Update aesni-intel_glue to use scatter/gather")
    Cc: # v4.17+
    Cc: Dave Watson
    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
     
  • commit ba6771c0a0bc2fac9d6a8759bab8493bd1cffe3b upstream.

    The x86 AEGIS implementations all fail the improved AEAD tests because
    they produce the wrong result with some data layouts. The issue is that
    they assume that if the skcipher_walk API gives 'nbytes' not aligned to
    the walksize (a.k.a. walk.stride), then it is the end of the data. In
    fact, this can happen before the end.

    Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can
    incorrectly sleep in the skcipher_walk_*() functions while preemption
    has been disabled by kernel_fpu_begin().

    Fix these bugs.

    Fixes: 1d373d4e8e15 ("crypto: x86 - Add optimized AEGIS implementations")
    Cc: # v4.18+
    Cc: Ondrej Mosnacek
    Signed-off-by: Eric Biggers
    Reviewed-by: Ondrej Mosnacek
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
     
  • [ Upstream commit 8cd8f0ce0d6aafe661cb3d6781c8b82bc696c04d ]

    Add the CPUID model number of Icelake (ICL) mobile processors to the
    Intel family list. Icelake U/Y series uses model number 0x7E.
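
    The addition is a single define in arch/x86/include/asm/intel-family.h
    (the macro name follows the existing convention in that header):

    #define INTEL_FAM6_ICELAKE_MOBILE       0x7E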

    Signed-off-by: Rajneesh Bhardwaj
    Signed-off-by: Borislav Petkov
    Cc: Andy Shevchenko
    Cc: Dave Hansen
    Cc: "David E. Box"
    Cc: dvhart@infradead.org
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Cc: platform-driver-x86@vger.kernel.org
    Cc: Qiuxu Zhuo
    Cc: Srinivas Pandruvada
    Cc: Thomas Gleixner
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/20190214115712.19642-2-rajneesh.bhardwaj@linux.intel.com
    Signed-off-by: Sasha Levin

    Rajneesh Bhardwaj
     

19 Mar, 2019

3 commits

  • commit c634dc6bdedeb0b2c750fc611612618a85639ab2 upstream.

    Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")
    Signed-off-by: kbuild test robot
    Signed-off-by: Thomas Gleixner
    Cc: "Peter Zijlstra (Intel)"
    Cc: kbuild-all@01.org
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Kan Liang
    Cc: Jiri Olsa
    Cc: Andi Kleen
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190313184243.GA10820@lkp-sb-ep06
    Signed-off-by: Greg Kroah-Hartman

    kbuild test robot
     
  • commit ede271b059463731cbd6dffe55ffd70d7dbe8392 upstream.

    Through:

    validate_event()
    x86_pmu.get_event_constraints(.idx=-1)
    tfa_get_event_constraints()
    dyn_constraint()

    cpuc->constraint_list[-1] is used, which is an obvious out-of-bound access.

    In this case, simply skip the TFA constraint code; there is no event
    constraint with just PMC3, therefore the code will never result in the
    empty set.

    Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")
    Reported-by: Tony Jones
    Reported-by: "DSouza, Nelson"
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Tested-by: Tony Jones
    Tested-by: "DSouza, Nelson"
    Cc: eranian@google.com
    Cc: jolsa@redhat.com
    Cc: stable@kernel.org
    Link: https://lkml.kernel.org/r/20190314130705.441549378@infradead.org
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit f764c58b7faa26f5714e6907f892abc2bc0de4f8 upstream.

    Guenter reported a build warning for CONFIG_CPU_SUP_INTEL=n:

    > With allmodconfig-CONFIG_CPU_SUP_INTEL, this patch results in:
    >
    > In file included from arch/x86/events/amd/core.c:8:0:
    > arch/x86/events/amd/../perf_event.h:1036:45: warning: ‘struct cpu_hw_event’ declared inside parameter list will not be visible outside of this definition or declaration
    > static inline int intel_cpuc_prepare(struct cpu_hw_event *cpuc, int cpu)

    While harmless (an unused pointer is an unused pointer, no matter the type)
    it needs fixing.

    Reported-by: Guenter Roeck
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Fixes: d01b1f96a82e ("perf/x86/intel: Make cpuc allocations consistent")
    Link: http://lkml.kernel.org/r/20190315081410.GR5996@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     

14 Mar, 2019

3 commits

  • commit 400816f60c543153656ac74eaf7f36f6b7202378 upstream

    Skylake (and later) will receive a microcode update to address a TSX
    errata. This microcode will, on execution of a TSX instruction
    (speculative or not) use (clobber) PMC3. This update will also provide
    a new MSR to change this behaviour along with a CPUID bit to enumerate
    the presence of this new MSR.

    When the MSR gets set; the microcode will no longer use PMC3 but will
    Force Abort every TSX transaction (upon executing COMMIT).

    When TSX Force Abort (TFA) is allowed (default); the MSR gets set when
    PMC3 gets scheduled and cleared when, after scheduling, PMC3 is
    unused.

    When TFA is not allowed; clear PMC3 from all constraints such that it
    will not get used.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra (Intel)
     
  • commit 52f64909409c17adf54fcf5f9751e0544ca3a6b4 upstream

    Skylake systems will receive a microcode update to address a TSX
    errata. This microcode will (by default) clobber PMC3 when TSX
    instructions are (speculatively or not) executed.

    It also provides an MSR to cause all TSX transaction to abort and
    preserve PMC3.

    Add the CPUID enumeration and MSR definition.
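
    As a sketch, the additions are a CPUID feature bit plus the new MSR and
    its control bit (the exact bit positions and MSR index below are as
    described for this erratum; treat them as illustrative):

    #define X86_FEATURE_TSX_FORCE_ABORT     (18*32+13) /* MSR_TSX_FORCE_ABORT present */

    #define MSR_TSX_FORCE_ABORT             0x0000010F
    #define MSR_TFA_RTM_FORCE_ABORT_BIT     0
    #define MSR_TFA_RTM_FORCE_ABORT         BIT_ULL(MSR_TFA_RTM_FORCE_ABORT_BIT)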

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra (Intel)
     
  • commit 11f8b2d65ca9029591c8df26bb6bd063c312b7fe upstream

    Such that we can re-use it.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra (Intel)