13 Jan, 2021
4 commits
-
commit 2f80d502d627f30257ba7e3655e71c373b7d1a5a upstream.
Since we know that e >= s, we can reassociate the left shift,
changing the shifted number from 1 to 2 in exchange for
decreasing the right hand side by 1.Reported-by: syzbot+e87846c48bf72bc85311@syzkaller.appspotmail.com
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman -
commit a889ea54b3daa63ee1463dc19ed699407d61458b upstream.
Many TDP MMU functions which need to perform some action on all TDP MMU
roots hold a reference on that root so that they can safely drop the MMU
lock in order to yield to other threads. However, when releasing the
reference on the root, there is a bug: the root will not be freed even
if its reference count (root_count) is reduced to 0.To simplify acquiring and releasing references on TDP MMU root pages, and
to ensure that these roots are properly freed, move the get/put operations
into another TDP MMU root iterator macro.Moving the get/put operations into an iterator macro also helps
simplify control flow when a root does need to be freed. Note that using
the list_for_each_entry_safe macro would not have been appropriate in
this situation because it could keep a pointer to the next root across
an MMU lock release + reacquire, during which time that root could be
freed.Reported-by: Maciej S. Szmigiero
Suggested-by: Paolo Bonzini
Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Fixes: 063afacd8730 ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU")
Fixes: a6a0b05da9f3 ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
Fixes: 14881998566d ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
Signed-off-by: Ben Gardon
Message-Id:
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman -
commit 39b4d43e6003cee51cd119596d3c33d0449eb44c upstream.
Get the so called "root" level from the low level shadow page table
walkers instead of manually attempting to calculate it higher up the
stack, e.g. in get_mmio_spte(). When KVM is using PAE shadow paging,
the starting level of the walk, from the callers perspective, is not
the CR3 root but rather the PDPTR "root". Checking for reserved bits
from the CR3 root causes get_mmio_spte() to consume uninitialized stack
data due to indexing into sptes[] for a level that was not filled by
get_walk(). This can result in false positives and/or negatives
depending on what garbage happens to be on the stack.Opportunistically nuke a few extra newlines.
Fixes: 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Reported-by: Richard Herbert
Cc: Ben Gardon
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson
Message-Id:
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman -
commit 2aa078932ff6c66bf10cc5b3144440dbfa7d813d upstream.
Return -1 from the get_walk() helpers if the shadow walk doesn't fill at
least one spte, which can theoretically happen if the walk hits a
not-present PDPTR. Returning the root level in such a case will cause
get_mmio_spte() to return garbage (uninitialized stack data). In
practice, such a scenario should be impossible as KVM shouldn't get a
reserved-bit page fault with a not-present PDPTR.Note, using mmu->root_level in get_walk() is wrong for other reasons,
too, but that's now a moot point.Fixes: 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Cc: Ben Gardon
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson
Message-Id:
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman
30 Dec, 2020
2 commits
-
commit 9d4747d02376aeb8de38afa25430de79129c5799 upstream.
When both KVM support and the CCP driver are built into the kernel instead
of as modules, KVM initialization can happen before CCP initialization. As
a result, sev_platform_status() will return a failure when it is called
from sev_hardware_setup(), when this isn't really an error condition.Since sev_platform_status() doesn't need to be called at this time anyway,
remove the invocation from sev_hardware_setup().Signed-off-by: Tom Lendacky
Message-Id:
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman -
commit 39485ed95d6b83b62fa75c06c2c4d33992e0d971 upstream.
Until commit e7c587da1252 ("x86/speculation: Use synthetic bits for
IBRS/IBPB/STIBP"), KVM was testing both Intel and AMD CPUID bits before
allowing the guest to write MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD.
Testing only Intel bits on VMX processors, or only AMD bits on SVM
processors, fails if the guests are created with the "opposite" vendor
as the host.While at it, also tweak the host CPU check to use the vendor-agnostic
feature bit X86_FEATURE_IBPB, since we only care about the availability
of the MSR on the host here and not about specific CPUID bits.Fixes: e7c587da1252 ("x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP")
Cc: stable@vger.kernel.org
Reported-by: Denis V. Lunev
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman
13 Dec, 2020
1 commit
-
Pull kvm fixes from Paolo Bonzini:
"Bugfixes for ARM, x86 and tools"* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
tools/kvm_stat: Exempt time-based counters
KVM: mmu: Fix SPTE encoding of MMIO generation upper half
kvm: x86/mmu: Use cpuid to determine max gfn
kvm: svm: de-allocate svm_cpu_data for all cpus in svm_cpu_uninit()
selftests: kvm/set_memory_region_test: Fix race in move region test
KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort()
KVM: arm64: Fix handling of merging tables into a block entry
KVM: arm64: Fix memory leak on stage2 update of a valid PTE
12 Dec, 2020
1 commit
-
Commit cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling")
cleaned up the computation of MMIO generation SPTE masks, however it
introduced a bug how the upper part was encoded:
SPTE bits 52-61 were supposed to contain bits 10-19 of the current
generation number, however a missing shift encoded bits 1-10 there instead
(mostly duplicating the lower part of the encoded generation number that
then consisted of bits 1-9).In the meantime, the upper part was shrunk by one bit and moved by
subsequent commits to become an upper half of the encoded generation number
(bits 9-17 of bits 0-17 encoded in a SPTE).In addition to the above, commit 56871d444bc4 ("KVM: x86: fix overlap between SPTE_MMIO_MASK and generation")
has changed the SPTE bit range assigned to encode the generation number and
the total number of bits encoded but did not update them in the comment
attached to their defines, nor in the KVM MMU doc.
Let's do it here, too, since it is too trivial thing to warrant a separate
commit.Fixes: cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling")
Signed-off-by: Maciej S. Szmigiero
Message-Id:
Cc: stable@vger.kernel.org
[Reorganize macros so that everything is computed from the bit ranges. - Paolo]
Signed-off-by: Paolo Bonzini
04 Dec, 2020
2 commits
-
In the TDP MMU, use shadow_phys_bits to dermine the maximum possible GFN
mapped in the guest for zapping operations. boot_cpu_data.x86_phys_bits
may be reduced in the case of HW features that steal HPA bits for other
purposes. However, this doesn't necessarily reduce GPA space that can be
accessed via TDP. So zap based on a maximum gfn calculated with MAXPHYADDR
retrieved from CPUID. This is already stored in shadow_phys_bits, so use
it instead of x86_phys_bits.Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Signed-off-by: Rick Edgecombe
Message-Id:
Reviewed-by: Sean Christopherson
Signed-off-by: Paolo Bonzini -
The cpu arg for svm_cpu_uninit() was previously ignored resulting in the
per cpu structure svm_cpu_data not being de-allocated for all cpus.Signed-off-by: Jacob Xu
Message-Id:
Signed-off-by: Paolo Bonzini
28 Nov, 2020
2 commits
-
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix alignment of the new HYP sections
- Fix GICR_TYPER access from userspaceS390:
- do not reset the global diag318 data for per-cpu reset
- do not mark memory as protected too early
- fix for destroy page ultravisor callx86:
- fix for SEV debugging
- fix incorrect return code
- fix for 'noapic' with PIC in userspace and LAPIC in kernel
- fix for 5-level paging"* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT
KVM: x86: Fix split-irqchip vs interrupt injection window request
KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint
MAINTAINERS: Update email address for Sean Christopherson
MAINTAINERS: add uv.c also to KVM/s390
s390/uv: handle destroy page legacy interface
KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace
KVM: SVM: fix error return code in svm_create_vcpu()
KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt().
KVM: arm64: Correctly align nVHE percpu data
KVM: s390: remove diag318 reset code
KVM: s390: pv: Mark mm as protected after the set secure parameters and improve cleanup -
Commit 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU") caused
the following WARNING on an Intel Ice Lake CPU:get_mmio_spte: detect reserved bits on spte, addr 0xb80a0, dump hierarchy:
------ spte 0xb80a0 level 5.
------ spte 0xfcd210107 level 4.
------ spte 0x1004c40107 level 3.
------ spte 0x1004c41107 level 2.
------ spte 0x1db00000000b83b6 level 1.
WARNING: CPU: 109 PID: 10254 at arch/x86/kvm/mmu/mmu.c:3569 kvm_mmu_page_fault.cold.150+0x54/0x22f [kvm]
...
Call Trace:
? kvm_io_bus_get_first_dev+0x55/0x110 [kvm]
vcpu_enter_guest+0xaa1/0x16a0 [kvm]
? vmx_get_cs_db_l_bits+0x17/0x30 [kvm_intel]
? skip_emulated_instruction+0xaa/0x150 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0xca/0x520 [kvm]The guest triggering this crashes. Note, this happens with the traditional
MMU and EPT enabled, not with the newly introduced TDP MMU. Turns out,
there was a subtle change in the above mentioned commit. Previously,
walk_shadow_page_get_mmio_spte() was setting 'root' to 'iterator.level'
which is returned by shadow_walk_init() and this equals to
'vcpu->arch.mmu->shadow_root_level'. Now, get_mmio_spte() sets it to
'int root = vcpu->arch.mmu->root_level'.The difference between 'root_level' and 'shadow_root_level' on CPUs
supporting 5-level page tables is that in some case we don't want to
use 5-level, in particular when 'cpuid_maxphyaddr(vcpu)
Message-Id:
Signed-off-by: Paolo Bonzini
27 Nov, 2020
2 commits
-
kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are
a hodge-podge of conditions, hacked together to get something that
more or less works. But what is actually needed is much simpler;
in both cases the fundamental question is, do we have a place to stash
an interrupt if userspace does KVM_INTERRUPT?In userspace irqchip mode, that is !vcpu->arch.interrupt.injected.
Currently kvm_event_needs_reinjection(vcpu) covers it, but it is
unnecessarily restrictive.In split irqchip mode it's a bit more complicated, we need to check
kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK
cycle and thus requires ExtINTs not to be masked) as well as
!pending_userspace_extint(vcpu). However, there is no need to
check kvm_event_needs_reinjection(vcpu), since split irqchip keeps
pending ExtINT state separate from event injection state, and checking
kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher
priority than APIC interrupts. In fact the latter fixes a bug:
when userspace requests an IRQ window vmexit, an interrupt in the
local APIC can cause kvm_cpu_has_interrupt() to be true and thus
kvm_vcpu_ready_for_interrupt_injection() to return false. When this
happens, vcpu_run does not exit to userspace but the interrupt window
vmexits keep occurring. The VM loops without any hope of making progress.Once we try to fix these with something like
return kvm_arch_interrupt_allowed(vcpu) &&
- !kvm_cpu_has_interrupt(vcpu) &&
- !kvm_event_needs_reinjection(vcpu) &&
- kvm_cpu_accept_dm_intr(vcpu);
+ (!lapic_in_kernel(vcpu)
+ ? !vcpu->arch.interrupt.injected
+ : (kvm_apic_accept_pic_intr(vcpu)
+ && !pending_userspace_extint(v)));we realize two things. First, thanks to the previous patch the complex
conditional can reuse !kvm_cpu_has_extint(vcpu). Second, the interrupt
window request in vcpu_enter_guest()bool req_int_win =
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);should be kept in sync with kvm_vcpu_ready_for_interrupt_injection():
it is unnecessary to ask the processor for an interrupt window
if we would not be able to return to userspace. Therefore,
kvm_cpu_accept_dm_intr(vcpu) is basically !kvm_cpu_has_extint(vcpu)
ANDed with the existing check for masked ExtINT. It all makes sense:- we can accept an interrupt from userspace if there is a place
to stash it (and, for irqchip split, ExtINTs are not masked).
Interrupts from userspace _can_ be accepted even if right now
EFLAGS.IF=0.- in order to tell userspace we will inject its interrupt ("IRQ
window open" i.e. kvm_vcpu_ready_for_interrupt_injection), both
KVM and the vCPU need to be ready to accept the interrupt.... and this is what the patch implements.
Reported-by: David Woodhouse
Analyzed-by: David Woodhouse
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini
Reviewed-by: Nikos Tsironis
Reviewed-by: David Woodhouse
Tested-by: David Woodhouse -
Centralize handling of interrupts from the userspace APIC
in kvm_cpu_has_extint and kvm_cpu_get_extint, since
userspace APIC interrupts are handled more or less the
same as ExtINTs are with split irqchip. This removes
duplicated code from kvm_cpu_has_injectable_intr and
kvm_cpu_has_interrupt, and makes the code more similar
between kvm_cpu_has_{extint,interrupt} on one side
and kvm_cpu_get_{extint,interrupt} on the other.Cc: stable@vger.kernel.org
Reviewed-by: Filippo Sironi
Reviewed-by: David Woodhouse
Tested-by: David Woodhouse
Signed-off-by: Paolo Bonzini
17 Nov, 2020
2 commits
-
Fix to return a negative error code from the error handling case
instead of 0 in function svm_create_vcpu(), as done elsewhere in this
function.Fixes: f4c847a95654 ("KVM: SVM: refactor msr permission bitmap allocation")
Reported-by: Hulk Robot
Signed-off-by: Chen Zhou
Message-Id:
Signed-off-by: Paolo Bonzini -
Fix offset computation in __sev_dbg_decrypt() to include the
source paddr before it is rounded down to be aligned to 16 bytes
as required by SEV API. This fixes incorrect guest memory dumps
observed when using qemu monitor.Signed-off-by: Ashish Kalra
Message-Id:
Signed-off-by: Paolo Bonzini
16 Nov, 2020
2 commits
-
Pull kvm fixes from Paolo Bonzini:
"Fixes for ARM and x86, the latter especially for old processors
without two-dimensional paging (EPT/NPT)"* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use
KVM: SVM: Update cr3_lm_rsvd_bits for AMD SEV guests
KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch
KVM: x86: clflushopt should be treated as a no-op by emulation
KVM: arm64: Handle SCXTNUM_ELx traps
KVM: arm64: Unify trap handlers injecting an UNDEF
KVM: arm64: Allow setting of ID_AA64PFR0_EL1.CSV2 from userspace -
Pull perf fixes from Thomas Gleixner:
"A set of fixes for perf:- A set of commits which reduce the stack usage of various perf
event handling functions which allocated large data structs on
stack causing stack overflows in the worst case- Use the proper mechanism for detecting soft interrupts in the
recursion protection- Make the resursion protection simpler and more robust
- Simplify the scheduling of event groups to make the code more
robust and prepare for fixing the issues vs. scheduling of
exclusive event groups- Prevent event multiplexing and rotation for exclusive event groups
- Correct the perf event attribute exclusive semantics to take
pinned events, e.g. the PMU watchdog, into account- Make the anythread filtering conditional for Intel's generic PMU
counters as it is not longer guaranteed to be supported on newer
CPUs. Check the corresponding CPUID leaf to make sure- Fixup a duplicate initialization in an array which was probably
caused by the usual 'copy & paste - forgot to edit' mishap"* tag 'perf-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Fix Add BW copypasta
perf/x86/intel: Make anythread filter support conditional
perf: Tweak perf_event_attr::exclusive semantics
perf: Fix event multiplexing for exclusive groups
perf: Simplify group_sched_in()
perf: Simplify group_sched_out()
perf/x86: Make dummy_iregs static
perf/arch: Remove perf_sample_data::regs_user_copy
perf: Optimize get_recursion_context()
perf: Fix get_recursion_context()
perf/x86: Reduce stack usage for x86_pmu::drain_pebs()
perf: Reduce stack usage of perf_output_begin()
15 Nov, 2020
1 commit
-
In some cases where shadow paging is in use, the root page will
be either mmu->pae_root or vcpu->arch.mmu->lm_root. Then it will
not have an associated struct kvm_mmu_page, because it is allocated
with alloc_page instead of kvm_mmu_alloc_page.Just return false quickly from is_tdp_mmu_root if the TDP MMU is
not in use, which also includes the case where shadow paging is
enabled.Signed-off-by: Paolo Bonzini
13 Nov, 2020
3 commits
-
For AMD SEV guests, update the cr3_lm_rsvd_bits to mask
the memory encryption bit in reserved bits.Signed-off-by: Babu Moger
Message-Id:
Signed-off-by: Paolo Bonzini -
SEV guests fail to boot on a system that supports the PCID feature.
While emulating the RSM instruction, KVM reads the guest CR3
and calls kvm_set_cr3(). If the vCPU is in the long mode,
kvm_set_cr3() does a sanity check for the CR3 value. In this case,
it validates whether the value has any reserved bits set. The
reserved bit range is 63:cpuid_maxphysaddr(). When AMD memory
encryption is enabled, the memory encryption bit is set in the CR3
value. The memory encryption bit may fall within the KVM reserved
bit range, causing the KVM emulation failure.Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will
cache the reserved bits in the CR3 value. This will be initialized
to rsvd_bits(cpuid_maxphyaddr(vcpu), 63).If the architecture has any special bits(like AMD SEV encryption bit)
that needs to be masked from the reserved bits, should be cleared
in vendor specific kvm_x86_ops.vcpu_after_set_cpuid handler.Fixes: a780a3ea628268b2 ("KVM: X86: Fix reserved bits check for MOV to CR3")
Signed-off-by: Babu Moger
Message-Id:
Signed-off-by: Paolo Bonzini -
The instruction emulator ignores clflush instructions, yet fails to
support clflushopt. Treat both similarly.Fixes: 13e457e0eebf ("KVM: x86: Emulator does not decode clflush well")
Signed-off-by: David Edmondson
Message-Id:
Reviewed-by: Joao Martins
Signed-off-by: Paolo Bonzini
10 Nov, 2020
1 commit
-
Starting with Arch Perfmon v5, the anythread filter on generic counters may be
deprecated. The current kernel was exporting the any filter without checking.
On Icelake, it means you could do cpu/event=0x3c,any/ even though the filter
does not exist. This patch corrects the problem by relying on the CPUID 0xa leaf
function to determine if anythread is supported or not as described in the
Intel SDM Vol3b 18.2.5.1 AnyThread Deprecation section.Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20201028194247.3160610-1-eranian@google.com
08 Nov, 2020
6 commits
-
Windows2016 guest tries to enable LBR by setting the corresponding bits
in MSR_IA32_DEBUGCTLMSR. KVM does not emulate MSR_IA32_DEBUGCTLMSR and
spams the host kernel logs with error messages like:kvm [...]: vcpu1, guest rIP: 0xfffff800a8b687d3 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x1, nop"
This patch fixes this by enabling error logging only with
'report_ignored_msrs=1'.Signed-off-by: Pankaj Gupta
Message-Id:
Signed-off-by: Paolo Bonzini -
Commit 5b9bb0ebbcdc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME)
emulation in helper fn", 2020-10-21) subtly changed the behavior of guest
writes to MSR_KVM_SYSTEM_TIME(_NEW). Restore the previous behavior; update
the masterclock any time the guest uses a different msr than before.Fixes: 5b9bb0ebbcdc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fn", 2020-10-21)
Signed-off-by: Oliver Upton
Reviewed-by: Peter Shier
Message-Id:
Signed-off-by: Paolo Bonzini -
Make the paravirtual cpuid enforcement mechanism idempotent to ioctl()
ordering by updating pv_cpuid.features whenever userspace requests the
capability. Extract this update out of kvm_update_cpuid_runtime() into a
new helper function and move its other call site into
kvm_vcpu_after_set_cpuid() where it more likely belongs.Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Signed-off-by: Oliver Upton
Reviewed-by: Peter Shier
Message-Id:
Signed-off-by: Paolo Bonzini -
commit 66570e966dd9 ("kvm: x86: only provide PV features if enabled in
guest's CPUID") only protects against disallowed guest writes to KVM
paravirtual msrs, leaving msr reads unchecked. Fix this by enforcing
KVM_CPUID_FEATURES for msr reads as well.Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Signed-off-by: Oliver Upton
Reviewed-by: Peter Shier
Message-Id:
Signed-off-by: Paolo Bonzini -
Recent introduction of the userspace msr filtering added code that uses
negative error codes for cases that result in either #GP delivery to
the guest, or handled by the userspace msr filtering.This breaks an assumption that a negative error code returned from the
msr emulation code is a semi-fatal error which should be returned
to userspace via KVM_RUN ioctl and usually kill the guest.Fix this by reusing the already existing KVM_MSR_RET_INVALID error code,
and by adding a new KVM_MSR_RET_FILTERED error code for the
userspace filtered msrs.Fixes: 291f35fb2c1d1 ("KVM: x86: report negative values from wrmsr emulation to userspace")
Reported-by: Qian Cai
Signed-off-by: Maxim Levitsky
Message-Id:
Signed-off-by: Paolo Bonzini -
Fix an off-by-one style bug in pte_list_add() where it failed to
account the last full set of SPTEs, i.e. when desc->sptes is full
and desc->more is NULL.Merge the two "PTE_LIST_EXT-1" checks as part of the fix to avoid
an extra comparison.Signed-off-by: Li RongQing
Reviewed-by: Sean Christopherson
Message-Id:
Signed-off-by: Paolo Bonzini
31 Oct, 2020
4 commits
-
Reported-by: kernel test robot
Signed-off-by: Paolo Bonzini -
It was noticed that evmcs_sanitize_exec_ctrls() is not being executed
nowadays despite the code checking 'enable_evmcs' static key looking
correct. Turns out, static key magic doesn't work in '__init' section
(and it is unclear when things changed) but setup_vmcs_config() is called
only once per CPU so we don't really need it to. Switch to checking
'enlightened_vmcs' instead, it is supposed to be in sync with
'enable_evmcs'.Opportunistically make evmcs_sanitize_exec_ctrls '__init' and drop unneeded
extra newline from it.Reported-by: Yang Weijiang
Signed-off-by: Vitaly Kuznetsov
Message-Id:
Signed-off-by: Paolo Bonzini -
The newly introduced kvm_msr_ignored_check() tries to print error or
debug messages via vcpu_*() macros, but those may cause Oops when NULL
vcpu is passed for KVM_GET_MSRS ioctl.Fix it by replacing the print calls with kvm_*() macros.
(Note that this will leave vcpu argument completely unused in the
function, but I didn't touch it to make the fix as small as
possible. A clean up may be applied later.)Fixes: 12bc2132b15e ("KVM: X86: Do the same ignore_msrs check for feature msrs")
BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1178280
Cc:
Signed-off-by: Takashi Iwai
Message-Id:
Signed-off-by: Paolo Bonzini -
Even though the compiler is able to replace static const variables with
their value, it will warn about them being unused when Linux is built with W=1.
Use good old macros instead, this is not C++.Signed-off-by: Paolo Bonzini
25 Oct, 2020
1 commit
-
Pull KVM fixes from Paolo Bonzini:
"Two fixes for this merge window, and an unrelated bugfix for a host
hang"* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: ioapic: break infinite recursion on lazy EOI
KVM: vmx: rename pi_init to avoid conflict with paride
KVM: x86/mmu: Avoid modulo operator on 64-bit value to fix i386 build
24 Oct, 2020
4 commits
-
During shutdown the IOAPIC trigger mode is reset to edge triggered
while the vfio-pci INTx is still registered with a resampler.
This allows us to get into an infinite loop:ioapic_set_irq
-> ioapic_lazy_update_eoi
-> kvm_ioapic_update_eoi_one
-> kvm_notify_acked_irq
-> kvm_notify_acked_gsi
-> (via irq_acked fn ptr) irqfd_resampler_ack
-> kvm_set_irq
-> (via set fn ptr) kvm_set_ioapic_irq
-> kvm_ioapic_set_irq
-> ioapic_set_irqCommit 8be8f932e3db ("kvm: ioapic: Restrict lazy EOI update to
edge-triggered interrupts", 2020-05-04) acknowledges that this recursion
loop exists and tries to avoid it at the call to ioapic_lazy_update_eoi,
but at this point the scenario is already set, we have an edge interrupt
with resampler on the same gsi.Fortunately, the only user of irq ack notifiers (in addition to resamplefd)
is i8254 timer interrupt reinjection. These are edge-triggered, so in
principle they would need the call to kvm_ioapic_update_eoi_one from
ioapic_lazy_update_eoi, but they already disable AVIC(*), so they don't
need the lazy EOI behavior. Therefore, remove the call to
kvm_ioapic_update_eoi_one from ioapic_lazy_update_eoi.This fixes CVE-2020-27152. Note that this issue cannot happen with
SR-IOV assigned devices because virtual functions do not have INTx,
only MSI.Fixes: f458d039db7e ("kvm: ioapic: Lazy update IOAPIC EOI")
Suggested-by: Paolo Bonzini
Tested-by: Alex Williamson
Signed-off-by: Vitaly Kuznetsov
Signed-off-by: Paolo Bonzini -
allyesconfig results in:
ld: drivers/block/paride/paride.o: in function `pi_init':
(.text+0x1340): multiple definition of `pi_init'; arch/x86/kvm/vmx/posted_intr.o:posted_intr.c:(.init.text+0x0): first defined here
make: *** [Makefile:1164: vmlinux] Error 1because commit:
commit 8888cdd0996c2d51cd417f9a60a282c034f3fa28
Author: Xiaoyao Li
Date: Wed Sep 23 11:31:11 2020 -0700KVM: VMX: Extract posted interrupt support to separate files
added another pi_init(), though one already existed in the paride code.
Reported-by: Jens Axboe
Signed-off-by: Paolo Bonzini -
Replace a modulo operator with the more common pattern for computing the
gfn "offset" of a huge page to fix an i386 build error.arch/x86/kvm/mmu/tdp_mmu.c:212: undefined reference to `__umoddi3'
In fact, almost all of tdp_mmu.c can be elided on 32-bit builds, but
that is a much larger patch.Fixes: 2f2fad0897cb ("kvm: x86/mmu: Add functions to handle changed TDP SPTEs")
Reported-by: Daniel Díaz
Signed-off-by: Sean Christopherson
Message-Id:
Signed-off-by: Paolo Bonzini -
Pull KVM updates from Paolo Bonzini:
"For x86, there is a new alternative and (in the future) more scalable
implementation of extended page tables that does not need a reverse
map from guest physical addresses to host physical addresses.For now it is disabled by default because it is still lacking a few of
the existing MMU's bells and whistles. However it is a very solid
piece of work and it is already available for people to hammer on it.Other updates:
ARM:
- New page table code for both hypervisor and guest stage-2
- Introduction of a new EL2-private host context
- Allow EL2 to have its own private per-CPU variables
- Support of PMU event filtering
- Complete rework of the Spectre mitigationPPC:
- Fix for running nested guests with in-kernel IRQ chip
- Fix race condition causing occasional host hard lockup
- Minor cleanups and bugfixesx86:
- allow trapping unknown MSRs to userspace
- allow userspace to force #GP on specific MSRs
- INVPCID support on AMD
- nested AMD cleanup, on demand allocation of nested SVM state
- hide PV MSRs and hypercalls for features not enabled in CPUID
- new test for MSR_IA32_TSC writes from host and guest
- cleanups: MMU, CPUID, shared MSRs
- LAPIC latency optimizations ad bugfixes"* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits)
kvm: x86/mmu: NX largepage recovery for TDP MMU
kvm: x86/mmu: Don't clear write flooding count for direct roots
kvm: x86/mmu: Support MMIO in the TDP MMU
kvm: x86/mmu: Support write protection for nesting in tdp MMU
kvm: x86/mmu: Support disabling dirty logging for the tdp MMU
kvm: x86/mmu: Support dirty logging for the TDP MMU
kvm: x86/mmu: Support changed pte notifier in tdp MMU
kvm: x86/mmu: Add access tracking for tdp_mmu
kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU
kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU
kvm: x86/mmu: Add TDP MMU PF handler
kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg
kvm: x86/mmu: Support zapping SPTEs in the TDP MMU
KVM: Cache as_id in kvm_memory_slot
kvm: x86/mmu: Add functions to handle changed TDP SPTEs
kvm: x86/mmu: Allocate and free TDP MMU roots
kvm: x86/mmu: Init / Uninit the TDP MMU
kvm: x86/mmu: Introduce tdp_iter
KVM: mmu: extract spte.h and spte.c
KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp
...
23 Oct, 2020
2 commits
-
When KVM maps a largepage backed region at a lower level in order to
make it executable (i.e. NX large page shattering), it reduces the TLB
performance of that region. In order to avoid making this degradation
permanent, KVM must periodically reclaim shattered NX largepages by
zapping them and allowing them to be rebuilt in the page fault handler.With this patch, the TDP MMU does not respect KVM's rate limiting on
reclaim. It traverses the entire TDP structure every time. This will be
addressed in a future patch.Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: Ben Gardon
Message-Id:
Signed-off-by: Paolo Bonzini -
Direct roots don't have a write flooding count because the guest can't
affect that paging structure. Thus there's no need to clear the write
flooding count on a fast CR3 switch for direct roots.Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: Ben Gardon
Message-Id:
Signed-off-by: Paolo Bonzini