16 Jul, 2017
1 commit
-
Pull more KVM updates from Radim Krčmář:
"Second batch of KVM updates for v4.13Common:
- add uevents for VM creation/destruction
- annotate and properly access RCU-protected objectss390:
- rename IOCTL added in the first v4.13 mergex86:
- emulate VMLOAD VMSAVE feature in SVM
- support paravirtual asynchronous page fault while nested
- add Hyper-V userspace interfaces for better migration
- improve master clock corner cases
- extend internal error reporting after EPT misconfig
- correct single-stepping of emulated instructions in SVM
- handle MCE during VM entry
- fix nVMX VM entry checks and nVMX VMCS shadowing"* tag 'kvm-4.13-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
kvm: x86: hyperv: make VP_INDEX managed by userspace
KVM: async_pf: Let guest support delivery of async_pf from guest mode
KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list
kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2
KVM: x86: make backwards_tsc_observed a per-VM variable
KVM: trigger uevents when creating or destroying a VM
KVM: SVM: Enable Virtual VMLOAD VMSAVE feature
KVM: SVM: Add Virtual VMLOAD VMSAVE feature definition
KVM: SVM: Rename lbr_ctl field in the vmcb control area
KVM: SVM: Prepare for new bit definition in lbr_ctl
KVM: SVM: handle singlestep exception when skipping emulated instructions
KVM: x86: take slots_lock in kvm_free_pit
KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definition
kvm: vmx: Properly handle machine check during VM-entry
KVM: x86: update master clock before computing kvmclock_offset
kvm: nVMX: Shadow "high" parts of shadowed 64-bit VMCS fields
kvm: nVMX: Fix nested_vmx_check_msr_bitmap_controls
kvm: nVMX: Validate the I/O bitmaps on nested VM-entry
...
14 Jul, 2017
1 commit
-
Pull VFIO updates from Alex Williamson:
- Include Intel XXV710 in INTx workaround (Alex Williamson)
- Make use of ERR_CAST() for error return (Dan Carpenter)
- Fix vfio_group release deadlock from iommu notifier (Alex Williamson)
- Unset KVM-VFIO attributes only on group match (Alex Williamson)
- Fix release path group/file matching with KVM-VFIO (Alex Williamson)
- Remove unnecessary lock uses triggering lockdep splat (Alex Williamson)
* tag 'vfio-v4.13-rc1' of git://github.com/awilliam/linux-vfio:
vfio: Remove unnecessary uses of vfio_container.group_lock
vfio: New external user group/file match
kvm-vfio: Decouple only when we match a group
vfio: Fix group release deadlock
vfio: Use ERR_CAST() instead of open coding it
vfio/pci: Add Intel XXV710 to hidden INTx devices
13 Jul, 2017
1 commit
-
This patch adds a few lines to the KVM common code to fire a
KOBJ_CHANGE uevent whenever a KVM VM is created or destroyed. The event
carries five environment variables:CREATED indicates how many times a new VM has been created. It is
useful for example to trigger specific actions when the first
VM is started
COUNT indicates how many VMs are currently active. This can be used for
logging or monitoring purposes
PID has the pid of the KVM process that has been started or stopped.
This can be used to perform process-specific tuning.
STATS_PATH contains the path in debugfs to the directory with all the
runtime statistics for this VM. This is useful for performance
monitoring and profiling.
EVENT described the type of event, its value can be either "create" or
"destroy"Specific udev rules can be then set up in userspace to deal with the
creation or destruction of VMs as needed.Signed-off-by: Claudio Imbrenda
Signed-off-by: Radim Krčmář
10 Jul, 2017
2 commits
-
…traeger/linux into kvm-master
-
The uniprocessor version of smp_call_function_many does not evaluate
all of its argument, and the compiler emits a warning about "wait"
being unused. This breaks the build on architectures for which
"-Werror" is enabled by default.Work around it by moving the invocation of smp_call_function_many to
its own inline function.Reported-by: Paul Mackerras
Cc: stable@vger.kernel.org
Fixes: 7a97cec26b94c909f4cbad2dc3186af3e457a522
Signed-off-by: Paolo Bonzini
07 Jul, 2017
5 commits
-
we access the memslots array via srcu. Mark it as such and
use the right access functions also for the freeing of
memory slots.Found by sparse:
./include/linux/kvm_host.h:565:16: error: incompatible types in
comparison expression (different address spaces)Signed-off-by: Christian Borntraeger
Reviewed-by: Paolo Bonzini -
mark kvm->busses as rcu protected and use the correct access
function everywhere.found by sparse
virt/kvm/kvm_main.c:3490:15: error: incompatible types in comparison expression (different address spaces)
virt/kvm/kvm_main.c:3509:15: error: incompatible types in comparison expression (different address spaces)
virt/kvm/kvm_main.c:3561:15: error: incompatible types in comparison expression (different address spaces)
virt/kvm/kvm_main.c:3644:15: error: incompatible types in comparison expression (different address spaces)Signed-off-by: Christian Borntraeger
-
irq routing is rcu protected. Use the proper access functions.
Found by sparsevirt/kvm/irqchip.c:233:13: warning: incorrect type in assignment (different address spaces)
virt/kvm/irqchip.c:233:13: expected struct kvm_irq_routing_table *old
virt/kvm/irqchip.c:233:13: got struct kvm_irq_routing_table [noderef] *irq_routingSigned-off-by: Christian Borntraeger
Reviewed-by: Paolo Bonzini -
We do use rcu to protect the pid pointer. Mark it as such and
adopt all code to use the proper access methods.This was detected by sparse.
"virt/kvm/kvm_main.c:2248:15: error: incompatible types in comparison
expression (different address spaces)"Signed-off-by: Christian Borntraeger
Reviewed-by: Paolo Bonzini -
Pull KVM updates from Paolo Bonzini:
"PPC:
- Better machine check handling for HV KVM
- Ability to support guests with threads=2, 4 or 8 on POWER9
- Fix for a race that could cause delayed recognition of signals
- Fix for a bug where POWER9 guests could sleep with interrupts pending.ARM:
- VCPU request overhaul
- allow timer and PMU to have their interrupt number selected from userspace
- workaround for Cavium erratum 30115
- handling of memory poisonning
- the usual crop of fixes and cleanupss390:
- initial machine check forwarding
- migration support for the CMMA page hinting information
- cleanups and fixesx86:
- nested VMX bugfixes and improvements
- more reliable NMI window detection on AMD
- APIC timer optimizationsGeneric:
- VCPU request overhaul + documentation of common code patterns
- kvm_stat improvements"* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
Update my email address
kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
kvm: x86: mmu: allow A/D bits to be disabled in an mmu
x86: kvm: mmu: make spte mmio mask more explicit
x86: kvm: mmu: dead code thanks to access tracking
KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code
KVM: PPC: Book3S HV: Close race with testing for signals on guest entry
KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
KVM: x86: remove ignored type attribute
KVM: LAPIC: Fix lapic timer injection delay
KVM: lapic: reorganize restart_apic_timer
KVM: lapic: reorganize start_hv_timer
kvm: nVMX: Check memory operand to INVVPID
KVM: s390: Inject machine check into the nested guest
KVM: s390: Inject machine check into the guest
tools/kvm_stat: add new interactive command 'b'
tools/kvm_stat: add new command line switch '-i'
tools/kvm_stat: fix error on interactive command 'g'
KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit
...
06 Jul, 2017
1 commit
-
Pull arm64 updates from Will Deacon:
- RAS reporting via GHES/APEI (ACPI)
- Indirect ftrace trampolines for modules
- Improvements to kernel fault reporting
- Page poisoning
- Sigframe cleanups and preparation for SVE context
- Core dump fixes
- Sparse fixes (mainly relating to endianness)
- xgene SoC PMU v3 driver
- Misc cleanups and non-critical fixes
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits)
arm64: fix endianness annotation for 'struct jit_ctx' and friends
arm64: cpuinfo: constify attribute_group structures.
arm64: ptrace: Fix incorrect get_user() use in compat_vfp_set()
arm64: ptrace: Remove redundant overrun check from compat_vfp_set()
arm64: ptrace: Avoid setting compat FP[SC]R to garbage if get_user fails
arm64: fix endianness annotation for __apply_alternatives()/get_alt_insn()
arm64: fix endianness annotation in get_kaslr_seed()
arm64: add missing conversion to __wsum in ip_fast_csum()
arm64: fix endianness annotation in acpi_parking_protocol.c
arm64: use readq() instead of readl() to read 64bit entry_point
arm64: fix endianness annotation for reloc_insn_movw() & reloc_insn_imm()
arm64: fix endianness annotation for aarch64_insn_write()
arm64: fix endianness annotation in aarch64_insn_read()
arm64: fix endianness annotation in call_undef_hook()
arm64: fix endianness annotation for debug-monitors.c
ras: mark stub functions as 'inline'
arm64: pass endianness info to sparse
arm64: ftrace: fix !CONFIG_ARM64_MODULE_PLTS kernels
arm64: signal: Allow expansion of the signal frame
acpi: apei: check for pending errors when probing GHES entries
...
30 Jun, 2017
1 commit
-
KVM/ARM updates for 4.13
- vcpu request overhaul
- allow timer and PMU to have their interrupt number
selected from userspace
- workaround for Cavium erratum 30115
- handling of memory poisonning
- the usual crop of fixes and cleanupsConflicts:
arch/s390/include/asm/kvm_host.h
29 Jun, 2017
2 commits
-
At the point where the kvm-vfio pseudo device wants to release its
vfio group reference, we can't always acquire a new reference to make
that happen. The group can be in a state where we wouldn't allow a
new reference to be added. This new helper function allows a caller
to match a file to a group to facilitate this. Given a file and
group, report if they match. Thus the caller needs to already have a
group reference to match to the file. This allows the deletion of a
group without acquiring a new reference.Signed-off-by: Alex Williamson
Reviewed-by: Eric Auger
Reviewed-by: Paolo Bonzini
Tested-by: Eric Auger
Cc: stable@vger.kernel.org -
Unset-KVM and decrement-assignment only when we find the group in our
list. Otherwise we can get out of sync if the user triggers this for
groups that aren't currently on our list.Signed-off-by: Alex Williamson
Reviewed-by: Alexey Kardashevskiy
Reviewed-by: Eric Auger
Tested-by: Eric Auger
Acked-by: Paolo Bonzini
Cc: stable@vger.kernel.org
27 Jun, 2017
2 commits
-
The call to kvm_put_kvm was removed from error handling in commit
506cfba9e726 ("KVM: don't use anon_inode_getfd() before possible
failures"), but it is _not_ a memory leak. Reuse Al's explanation
to avoid that someone else makes the same mistake.Signed-off-by: Paolo Bonzini
-
Replaces "S_IRUGO | S_IWUSR" with 0644. The reason is that symbolic
permissions considered harmful:
https://lwn.net/Articles/696229/Signed-off-by: Roman Storozhenko
Signed-off-by: Paolo Bonzini
23 Jun, 2017
2 commits
-
Currently external aborts are unsupported by the guest abort
handling. Add handling for SEAs so that the host kernel reports
SEAs which occur in the guest kernel.When an SEA occurs in the guest kernel, the guest exits and is
routed to kvm_handle_guest_abort(). Prior to this patch, a print
message of an unsupported FSC would be printed and nothing else
would happen. With this patch, the code gets routed to the APEI
handling of SEAs in the host kernel to report the SEA information.Signed-off-by: Tyler Baicar
Acked-by: Catalin Marinas
Acked-by: Marc Zyngier
Acked-by: Christoffer Dall
Signed-off-by: Will Deacon -
Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64, notifications for
broken memory can call memory_failure() in mm/memory-failure.c to offline
pages of memory, possibly signalling user space processes and notifying all
the in-kernel users.memory_failure() has two modes, early and late. Early is used by
machine-managers like Qemu to receive a notification when a memory error is
notified to the host. These can then be relayed to the guest before the
affected page is accessed. To enable this, the process must set
PR_MCE_KILL_EARLY in PR_MCE_KILL_SET using the prctl() syscall.Once the early notification has been handled, nothing stops the
machine-manager or guest from accessing the affected page. If the
machine-manager does this the page will fail to be mapped and SIGBUS will
be sent. This patch adds the equivalent path for when the guest accesses
the page, sending SIGBUS to the machine-manager.These two signals can be distinguished by the machine-manager using their
si_code: BUS_MCEERR_AO for 'action optional' early notifications, and
BUS_MCEERR_AR for 'action required' synchronous/late notifications.Do as x86 does, and deliver the SIGBUS when we discover pfn ==
KVM_PFN_ERR_HWPOISON. Use the hugepage size as si_addr_lsb if this vma was
allocated as a hugepage. Transparent hugepages will be split by
memory_failure() before we see them here.Cc: Punit Agrawal
Signed-off-by: James Morse
Signed-off-by: Marc Zyngier
20 Jun, 2017
1 commit
-
Rename:
wait_queue_t => wait_queue_entry_t
'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.Start sorting this out by renaming it to 'wait_queue_entry_t'.
This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
15 Jun, 2017
21 commits
-
When reading the cntpct_el0 in guest with VHE (Virtual Host Extension)
enabled in host, the "Unsupported guest sys_reg access" error reported.
The reason is cnthctl_el2.EL1PCTEN is not enabled, which is expected
to be done in kvm_timer_init_vhe(). The problem is kvm_timer_init_vhe
is called by cpu_init_hyp_mode, and which is called when VHE is disabled.
This patch remove the incorrect call to kvm_timer_init_vhe() from
cpu_init_hyp_mode(), and calls kvm_timer_init_vhe() to enable
cnthctl_el2.EL1PCTEN in cpu_hyp_reinit().Fixes: 488f94d7212b ("KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems")
Cc: stable@vger.kernel.org
Signed-off-by: Hu Huajun
Reviewed-by: Christoffer Dall
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall -
Per ARM DDI 0487B.a, the registers are named ICC_IGRPEN*_EL1 rather than
ICC_GRPEN*_EL1. Correct our mnemonics and comments to match, before we
add more GICv3 register definitions.Signed-off-by: Mark Rutland
Cc: Catalin Marinas
Cc: Marc Zyngier
Cc: kvmarm@lists.cs.columbia.edu
Acked-by: Christoffer Dall
Acked-by: Will Deacon
Signed-off-by: Christoffer Dall -
A write-to-read-only GICv3 access should UNDEF at EL1. But since
we're in complete paranoia-land with broken CPUs, let's assume the
worse and gracefully handle the case.Signed-off-by: Marc Zyngier
Reviewed-by: Christoffer Dall
Signed-off-by: Christoffer Dall -
A read-from-write-only GICv3 access should UNDEF at EL1. But since
we're in complete paranoia-land with broken CPUs, let's assume the
worse and gracefully handle the case.Signed-off-by: Marc Zyngier
Reviewed-by: Christoffer Dall
Signed-off-by: Christoffer Dall -
In order to facilitate debug, let's log which class of GICv3 system
registers are trapped.Tested-by: Alexander Graf
Acked-by: David Daney
Acked-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall -
Now that we're able to safely handle common sysreg access, let's
give the user the opportunity to enable it by passing a specific
command-line option (vgic_v3.common_trap).Tested-by: Alexander Graf
Acked-by: David Daney
Signed-off-by: Marc Zyngier
Acked-by: Christoffer Dall
Signed-off-by: Christoffer Dall -
Add a handler for reading/writing the guest's view of the ICC_PMR_EL1
register, which is located in the ICH_VMCR_EL2.VPMR field.Tested-by: Alexander Graf
Acked-by: David Daney
Acked-by: Christoffer Dall
Reviewed-by: Eric Auger
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall -
Add a handler for reading/writing the guest's view of the ICV_CTLR_EL1
register. only EOIMode and CBPR are of interest here, as all the other
bits directly come from ICH_VTR_EL2 and are Read-Only.Tested-by: Alexander Graf
Acked-by: David Daney
Acked-by: Christoffer Dall
Reviewed-by: Eric Auger
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall -
Add a handler for reading the guest's view of the ICV_RPR_EL1
register, returning the highest active priority.Tested-by: Alexander Graf
Acked-by: David Daney
Acked-by: Christoffer Dall
Reviewed-by: Eric Auger
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall -
Add a handler for writing the guest's view of the ICC_DIR_EL1
register, performing the deactivation of an interrupt if EOImode
is set ot 1.Tested-by: Alexander Graf
Acked-by: David Daney
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall -
Some Cavium Thunder CPUs suffer a problem where a KVM guest may
inadvertently cause the host kernel to quit receiving interrupts.Use the Group-0/1 trapping in order to deal with it.
[maz]: Adapted patch to the Group-0/1 trapping, reworked commit log
Tested-by: Alexander Graf
Acked-by: Catalin Marinas
Reviewed-by: Eric Auger
Signed-off-by: David Daney
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall -
Now that we're able to safely handle Group-0 sysreg access, let's
give the user the opportunity to enable it by passing a specific
command-line option (vgic_v3.group0_trap).Tested-by: Alexander Graf
Acked-by: David Daney
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall -
In order to be able to trap Group-0 GICv3 system registers, we need to
set ICH_HCR_EL2.TALL0 begore entering the guest. This is conditionnaly
done after having restored the guest's state, and cleared on exit.Tested-by: Alexander Graf
Acked-by: David Daney
Acked-by: Christoffer Dall
Reviewed-by: Eric Auger
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall -
A number of Group-0 registers can be handled by the same accessors
as that of Group-1, so let's add the required system register encodings
and catch them in the dispatching function.Tested-by: Alexander Graf
Acked-by: David Daney
Acked-by: Christoffer Dall
Reviewed-by: Eric Auger
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall -
Add a handler for reading/writing the guest's view of the ICC_IGRPEN0_EL1
register, which is located in the ICH_VMCR_EL2.VENG0 field.Tested-by: Alexander Graf
Acked-by: David Daney
Reviewed-by: Eric Auger
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall -
Add a handler for reading/writing the guest's view of the ICC_BPR0_EL1
register, which is located in the ICH_VMCR_EL2.BPR0 field.Tested-by: Alexander Graf
Acked-by: David Daney
Reviewed-by: Eric Auger
Signed-off-by: Marc Zyngier
Reviewed-by: Christoffer Dall
Signed-off-by: Christoffer Dall -
Now that we're able to safely handle Group-1 sysreg access, let's
give the user the opportunity to enable it by passing a specific
command-line option (vgic_v3.group1_trap).Tested-by: Alexander Graf
Acked-by: David Daney
Signed-off-by: Marc Zyngier
Acked-by: Christoffer Dall
Signed-off-by: Christoffer Dall -
In order to be able to trap Group-1 GICv3 system registers, we need to
set ICH_HCR_EL2.TALL1 before entering the guest. This is conditionally
done after having restored the guest's state, and cleared on exit.Tested-by: Alexander Graf
Acked-by: David Daney
Acked-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall -
Add a handler for reading the guest's view of the ICV_HPPIR1_EL1
register. This is a simple parsing of the available LRs, extracting the
highest available interrupt.Tested-by: Alexander Graf
Acked-by: David Daney
Reviewed-by: Eric Auger
Signed-off-by: Marc Zyngier
Reviewed-by: Christoffer Dall
Signed-off-by: Christoffer Dall -
Add a handler for reading/writing the guest's view of the ICV_AP1Rn_EL1
registers. We just map them to the corresponding ICH_AP1Rn_EL2 registers.Tested-by: Alexander Graf
Acked-by: David Daney
Reviewed-by: Eric Auger
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall -
Add a handler for writing the guest's view of the ICC_EOIR1_EL1
register. This involves dropping the priority of the interrupt,
and deactivating it if required (EOImode == 0).Tested-by: Alexander Graf
Acked-by: David Daney
Reviewed-by: Eric Auger
Signed-off-by: Marc Zyngier
Reviewed-by: Christoffer Dall
Signed-off-by: Christoffer Dall