04 Aug, 2016
2 commits
-
The KVM_X2APIC_API_USE_32BIT_IDS feature applies to both
KVM_SET_GSI_ROUTING and KVM_SIGNAL_MSI, but was not mentioned in the
documentation for the latter ioctl.Signed-off-by: Paolo Bonzini
-
…it/kvmarm/kvmarm into HEAD
KVM/ARM Changes for v4.8 - Take 2
Includes GSI routing support to go along with the new VGIC and a small fix that
has been cooking in -next for a while.
23 Jul, 2016
4 commits
-
KVM/ARM changes for Linux 4.8
- GICv3 ITS emulation
- Simpler idmap management that fixes potential TLB conflicts
- Honor the kernel protection in HYP mode
- Removal of the old vgic implementation -
Up to now, only irqchip routing entries could be set. This patch
adds the capability to insert MSI routing entries.For ARM64, let's also increase KVM_MAX_IRQ_ROUTES to 4096: this
include SPI irqchip routes plus MSI routes. In the future this
might be extended.Signed-off-by: Eric Auger
Reviewed-by: Andre Przywara
Signed-off-by: Marc Zyngier -
This patch adds compilation and link against irqchip.
Main motivation behind using irqchip code is to enable MSI
routing code. In the future irqchip routing may also be useful
when targeting multiple irqchips.Routing standard callbacks now are implemented in vgic-irqfd:
- kvm_set_routing_entry
- kvm_set_irq
- kvm_set_msiThey only are supported with new_vgic code.
Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined.
KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed.So from now on IRQCHIP routing is enabled and a routing table entry
must exist for irqfd injection to succeed for a given SPI. This patch
builds a default flat irqchip routing table (gsi=irqchip.pin) covering
all the VGIC SPI indexes. This routing table is overwritten by the
first first user-space call to KVM_SET_GSI_ROUTING ioctl.MSI routing setup is not yet allowed.
Signed-off-by: Eric Auger
Signed-off-by: Marc Zyngier -
On ARM, the MSI msg (address and data) comes along with
out-of-band device ID information. The device ID encodes the
device that writes the MSI msg. Let's convey the device id in
kvm_irq_routing_msi and use KVM_MSI_VALID_DEVID flag value in
kvm_irq_routing_entry to indicate the msi devid is populated.Signed-off-by: Eric Auger
Reviewed-by: Andre Przywara
Acked-by: Radim Krčmář
Signed-off-by: Marc Zyngier
19 Jul, 2016
3 commits
-
Now that all ITS emulation functionality is in place, we advertise
MSI functionality to userland and also the ITS device to the guest - if
userland has configured that.Signed-off-by: Andre Przywara
Reviewed-by: Marc Zyngier
Tested-by: Eric Auger
Signed-off-by: Marc Zyngier -
Introduce a new KVM device that represents an ARM Interrupt Translation
Service (ITS) controller. Since there can be multiple of this per guest,
we can't piggy back on the existing GICv3 distributor device, but create
a new type of KVM device.
On the KVM_CREATE_DEVICE ioctl we allocate and initialize the ITS data
structure and store the pointer in the kvm_device data.
Upon an explicit init ioctl from userland (after having setup the MMIO
address) we register the handlers with the kvm_io_bus framework.
Any reference to an ITS thus has to go via this interface.Signed-off-by: Andre Przywara
Reviewed-by: Marc Zyngier
Tested-by: Eric Auger
Signed-off-by: Marc Zyngier -
The ARM GICv3 ITS MSI controller requires a device ID to be able to
assign the proper interrupt vector. On real hardware, this ID is
sampled from the bus. To be able to emulate an ITS controller, extend
the KVM MSI interface to let userspace provide such a device ID. For
PCI devices, the device ID is simply the 16-bit bus-device-function
triplet, which should be easily available to the userland tool.Also there is a new KVM capability which advertises whether the
current VM requires a device ID to be set along with the MSI data.
This flag is still reported as not available everywhere, later we will
enable it when ITS emulation is used.Signed-off-by: Andre Przywara
Reviewed-by: Eric Auger
Reviewed-by: Marc Zyngier
Acked-by: Christoffer Dall
Acked-by: Paolo Bonzini
Tested-by: Eric Auger
Signed-off-by: Marc Zyngier
18 Jul, 2016
1 commit
-
We will use illegal instruction 0x0000 for handling 2 byte sw breakpoints
from user space. As it can be enabled dynamically via a capability,
let's move setting of ICTL_OPEREXC to the post creation step, so we avoid
any races when enabling that capability just while adding new cpus.Acked-by: Janosch Frank
Reviewed-by: Cornelia Huck
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
14 Jul, 2016
2 commits
-
Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK as a feature flag to
KVM_CAP_X2APIC_API.The quirk made KVM interpret 0xff as a broadcast even in x2APIC mode.
The enableable capability is needed in order to support standard x2APIC and
remain backward compatible.Signed-off-by: Radim Krčmář
[Expand kvm_apic_mda comment. - Paolo]
Signed-off-by: Paolo Bonzini -
KVM_CAP_X2APIC_API is a capability for features related to x2APIC
enablement. KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to
extend APIC ID in get/set ioctl and MSI addresses to 32 bits.
Both are needed to support x2APIC.The feature has to be enableable and disabled by default, because
get/set ioctl shifted and truncated APIC ID to 8 bits by using a
non-standard protocol inspired by xAPIC and the change is not
backward-compatible.Changes to MSI addresses follow the format used by interrupt remapping
unit. The upper address word, that used to be 0, contains upper 24 bits
of the LAPIC address in its upper 24 bits. Lower 8 bits are reserved as
0. Using the upper address word is not backward-compatible either as we
didn't check that userspace zeroed the word. Reserved bits are still
not explicitly checked, but non-zero data will affect LAPIC addresses,
which will cause a bug.Signed-off-by: Radim Krčmář
Signed-off-by: Paolo Bonzini
16 Jun, 2016
1 commit
-
Allow up to 6 KVM guest KScratch registers to be enabled and accessed
via the KVM guest register API and from the guest itself (the fallback
reading and writing of commpage registers is sufficient for KScratch
registers to work as expected).User mode can expose the registers by setting the appropriate bits of
the guest Config4.KScrExist field. KScratch registers that aren't usable
won't be writeable via the KVM Ioctl API.Signed-off-by: James Hogan
Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Ralf Baechle
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini
15 Jun, 2016
1 commit
-
…/kvms390/linux into HEAD
KVM: s390: Features and fixes for 4.8 part1
Four bigger things:
1. The implementation of the STHYI opcode in the kernel. This is used
in libraries like qclib [1] to provide enough information for a
capacity and usage based software licence pricing. The STHYI content
is defined by the related z/VM documentation [2]. Its data can be
composed by accessing several other interfaces provided by LPAR or
the machine. This information is partially sensitive or root-only
so the kernel does the necessary filtering.
2. Preparation for nested virtualization (VSIE). KVM should query the
proper sclp interfaces for the availability of some features before
using it. In the past we have been sloppy and simply assumed that
several features are available. With this we should be able to handle
most cases of a missing feature.
3. CPU model interfaces extended by some additional features that are
not covered by a facility bit in STFLE. For example all the crypto
instructions of the coprocessor provide a query function. As reality
tends to be more complex (e.g. export regulations might block some
algorithms) we have to provide additional interfaces to query or
set these non-stfle features.
4. Several fixes and changes detected and fixed when doing 1-3.All features change base s390 code. All relevant patches have an ACK
from the s390 or component maintainers.The next pull request for 4.8 (part2) will contain the implementation
of VSIE.[1] http://www.ibm.com/developerworks/linux/linux390/qclib.html
[2] https://www.ibm.com/support/knowledgecenter/SSB27U_6.3.0/com.ibm.zvm.v630.hcpb4/hcpb4sth.htm
14 Jun, 2016
1 commit
-
Signed-off-by: Andrea Gelmini
Signed-off-by: Paolo Bonzini
10 Jun, 2016
3 commits
-
Let's not provide the device attribute for cmma enabling and clearing
if the hardware doesn't support it.This also helps getting rid of the undocumented return value "-EINVAL"
in case CMMA is not available when trying to enable it.Also properly document the meaning of -EINVAL for CMMA clearing.
Reviewed-by: Christian Borntraeger
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger -
We have certain instructions that indicate available subfunctions via
a query subfunction (crypto functions and ptff), or via a test bit
function (plo).By exposing these "subfunction blocks" to user space, we allow user space
to
1) query available subfunctions and make sure subfunctions won't get lost
during migration - e.g. properly indicate them via a CPU model
2) change the subfunctions to be reported to the guest (even adding
unavailable ones)This mechanism works just like the way we indicate the stfl(e) list to
user space.This way, user space could even emulate some subfunctions in QEMU in the
future. If this is ever applicable, we have to make sure later on, that
unsupported subfunctions result in an intercept to QEMU.Please note that support to indicate them to the guest is still missing
and requires hardware support. Usually, the IBC takes already care of these
subfunctions for migration safety. QEMU should make sure to always set
these bits properly according to the machine generation to be emulated.Available subfunctions are only valid in combination with STFLE bits
retrieved via KVM_S390_VM_CPU_MACHINE and enabled via
KVM_S390_VM_CPU_PROCESSOR. If the applicable bits are available, the
indicated subfunctions are guaranteed to be correct.Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger -
For now, we only have an interface to query and configure facilities
indicated via STFL(E). However, we also have features indicated via
SCLP, that have to be indicated to the guest by user space and usually
require KVM support.This patch allows user space to query and configure available cpu features
for the guest.Please note that disabling a feature doesn't necessarily mean that it is
completely disabled (e.g. ESOP is mostly handled by the SIE). We will try
our best to disable it.Most features (e.g. SCLP) can't directly be forwarded, as most of them need
in addition to hardware support, support in KVM. As we later on want to
turn these features in KVM explicitly on/off (to simulate different
behavior), we have to filter all features provided by the hardware and
make them configurable.Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
12 May, 2016
1 commit
-
The KVM_MAX_VCPUS define provides the maximum number of vCPUs per guest, and
also the upper limit for vCPU ids. This is okay for all archs except PowerPC
which can have higher ids, depending on the cpu/core/thread topology. In the
worst case (single threaded guest, host with 8 threads per core), it limits
the maximum number of vCPUS to KVM_MAX_VCPUS / 8.This patch separates the vCPU numbering from the total number of vCPUs, with
the introduction of KVM_MAX_VCPU_ID, as the maximal valid value for vCPU ids
plus one.The corresponding KVM_CAP_MAX_VCPU_ID allows userspace to validate vCPU ids
before passing them to KVM_CREATE_VCPU.This patch only implements KVM_MAX_VCPU_ID with a specific value for PowerPC.
Other archs continue to return KVM_MAX_VCPUS instead.Suggested-by: Radim Krcmar
Signed-off-by: Greg Kurz
Reviewed-by: Cornelia Huck
Signed-off-by: Paolo Bonzini
10 May, 2016
1 commit
-
…/kvms390/linux into HEAD
KVM: s390: features and fixes for 4.7 part2
- Use hardware provided information about facility bits that do not
need any hypervisor activitiy
- Add missing documentation for KVM_CAP_S390_RI
- Some updates/fixes for handling cpu models and facilities
09 May, 2016
1 commit
-
We forgot to document that capability, let's add documentation.
Reviewed-by: Christian Borntraeger
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
25 Apr, 2016
1 commit
-
Signed-off-by: Eric Engestrom
Acked-by: Cornelia Huck
Signed-off-by: Radim Krčmář
20 Apr, 2016
2 commits
-
Introduce a FLIC operation for clearing I/O interrupts for a subchannel.
Rationale: According to the platform specification, pending I/O
interruption requests have to be revoked in certain situations. For
instance, according to the Principles of Operation (page 17-27), a
subchannel put into the installed parameters initialized state is in the
same state as after an I/O system reset (just parameters possibly changed).
This implies that any I/O interrupts for that subchannel are no longer
pending (as I/O system resets clear I/O interrupts). Therefore, we need an
interface to clear pending I/O interrupts.Signed-off-by: Halil Pasic
Reviewed-by: Cornelia Huck
Reviewed-by: Christian Borntraeger
Signed-off-by: Christian Borntraeger
Signed-off-by: Cornelia Huck -
FLIC behavior deviates from the API documentation in reporting EINVAL
instead of ENXIO for KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR when the group
or attribute is unknown/unsupported. Unfortunately this can not be fixed
for historical reasons. Let us at least have it documented.Signed-off-by: Halil Pasic
Reviewed-by: Cornelia Huck
Reviewed-by: Christian Borntraeger
Signed-off-by: Christian Borntraeger
Signed-off-by: Cornelia Huck
17 Mar, 2016
1 commit
-
Pull KVM updates from Paolo Bonzini:
"One of the largest releases for KVM... Hardly any generic
changes, but lots of architecture-specific updates.ARM:
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- various optimizations to the vgic save/restore code.PPC:
- enabled KVM-VFIO integration ("VFIO device")
- optimizations to speed up IPIs between vcpus
- in-kernel handling of IOMMU hypercalls
- support for dynamic DMA windows (DDW).s390:
- provide the floating point registers via sync regs;
- separated instruction vs. data accesses
- dirty log improvements for huge guests
- bugfixes and documentation improvements.x86:
- Hyper-V VMBus hypercall userspace exit
- alternative implementation of lowest-priority interrupts using
vector hashing (for better VT-d posted interrupt support)
- fixed guest debugging with nested virtualizations
- improved interrupt tracking in the in-kernel IOAPIC
- generic infrastructure for tracking writes to guest
memory - currently its only use is to speedup the legacy shadow
paging (pre-EPT) case, but in the future it will be used for
virtual GPUs as well
- much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes"* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits)
KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch
KVM: x86: disable MPX if host did not enable MPX XSAVE features
arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit
arm64: KVM: vgic-v3: Reset LRs at boot time
arm64: KVM: vgic-v3: Do not save an LR known to be empty
arm64: KVM: vgic-v3: Save maintenance interrupt state only if required
arm64: KVM: vgic-v3: Avoid accessing ICH registers
KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit
KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit
KVM: arm/arm64: vgic-v2: Reset LRs at boot time
KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty
KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function
KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required
KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers
KVM: s390: allocate only one DMA page per VM
KVM: s390: enable STFLE interpretation only if enabled for the guest
KVM: s390: wake up when the VCPU cpu timer expires
KVM: s390: step the VCPU timer while in enabled wait
KVM: s390: protect VCPU cpu timer with a seqcount
KVM: s390: step VCPU cpu timer during kvm_run ioctl
...
10 Mar, 2016
1 commit
-
Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.KVM runs the guest with CR0.WP=1, so it must handle supervisor writes
specially when pte.u=1/pte.w=0/CR0.WP=0. Such writes cause a fault
when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0.
When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and
restarts execution. This will still cause a user write to fault, while
supervisor writes will succeed. User reads will fault spuriously now,
and KVM will then flip U and W again in the SPTE (U=1, W=0). User reads
will be enabled and supervisor writes disabled, going back to the
originary situation where supervisor writes fault spuriously.When SMEP is in effect, however, U=0 will enable kernel execution of
this page. To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0. If the guest has not enabled NX, the result is a continuous
stream of page faults due to the NX bit being reserved.The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
switch. (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry
control, so they do not use user-return notifiers for EFER---if they did,
EFER.NX would be forced to the same value as the host).There is another bug in the reserved bit check, which I've split to a
separate patch for easier application to stable kernels.Cc: stable@vger.kernel.org
Cc: Andy Lutomirski
Reviewed-by: Xiao Guangrong
Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
Signed-off-by: Paolo Bonzini
09 Mar, 2016
1 commit
-
KVM/ARM updates for 4.6
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- Various optimizations to the vgic save/restore codeConflicts:
include/uapi/linux/kvm.h
04 Mar, 2016
1 commit
-
Signed-off-by: Radim Krčmář
Signed-off-by: Paolo Bonzini
03 Mar, 2016
2 commits
-
kvm_lpage_info->write_count is used to detect if the large page mapping
for the gfn on the specified level is allowed, rename it to disallow_lpage
to reflect its purpose, also we rename has_wrprotected_page() to
mmu_gfn_lpage_is_disallowed() to make the code more clearerLater we will extend this mechanism for page tracking: if the gfn is
tracked then large mapping for that gfn on any level is not allowed.
The new name is more straightforwardReviewed-by: Paolo Bonzini
Signed-off-by: Xiao Guangrong
Signed-off-by: Paolo Bonzini -
…lus/powerpc into HEAD
The highlights are:
* Enable VFIO device on PowerPC, from David Gibson
* Optimizations to speed up IPIs between vcpus in HV KVM,
from Suresh Warrier (who is also Suresh E. Warrier)
* In-kernel handling of IOMMU hypercalls, and support for dynamic DMA
windows (DDW), from Alexey Kardashevskiy.Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
02 Mar, 2016
1 commit
-
The existing KVM_CREATE_SPAPR_TCE only supports 32bit windows which is not
enough for directly mapped windows as the guest can get more than 4GB.This adds KVM_CREATE_SPAPR_TCE_64 ioctl and advertises it
via KVM_CAP_SPAPR_TCE_64 capability. The table size is checked against
the locked memory limit.Since 64bit windows are to support Dynamic DMA windows (DDW), let's add
@bus_offset and @page_shift which are also required by DDW.Signed-off-by: Alexey Kardashevskiy
Signed-off-by: Paul Mackerras
01 Mar, 2016
3 commits
-
To configure the virtual PMUv3 overflow interrupt number, we use the
vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_PMU_V3_IRQ
attribute within the KVM_ARM_VCPU_PMU_V3_CTRL group.After configuring the PMUv3, call the vcpu ioctl with attribute
KVM_ARM_VCPU_PMU_V3_INIT to initialize the PMUv3.Signed-off-by: Shannon Zhao
Acked-by: Peter Maydell
Reviewed-by: Andrew Jones
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
In some cases it needs to get/set attributes specific to a vcpu and so
needs something else than ONE_REG.Let's copy the KVM_DEVICE approach, and define the respective ioctls
for the vcpu file descriptor.Signed-off-by: Shannon Zhao
Reviewed-by: Andrew Jones
Acked-by: Peter Maydell
Signed-off-by: Marc Zyngier -
To support guest PMUv3, use one bit of the VCPU INIT feature array.
Initialize the PMU when initialzing the vcpu with that bit and PMU
overflow interrupt set.Signed-off-by: Shannon Zhao
Acked-by: Peter Maydell
Reviewed-by: Andrew Jones
Signed-off-by: Marc Zyngier
17 Feb, 2016
1 commit
-
The patch implements KVM_EXIT_HYPERV userspace exit
functionality for Hyper-V VMBus hypercalls:
HV_X64_HCALL_POST_MESSAGE, HV_X64_HCALL_SIGNAL_EVENT.Changes v3:
* use vcpu->arch.complete_userspace_io to setup hypercall
resultChanges v2:
* use KVM_EXIT_HYPERV for hypercallsSigned-off-by: Andrey Smetanin
Reviewed-by: Roman Kagan
CC: Gleb Natapov
CC: Paolo Bonzini
CC: Joerg Roedel
CC: "K. Y. Srinivasan"
CC: Haiyang Zhang
CC: Roman Kagan
CC: Denis V. Lunev
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini
16 Feb, 2016
1 commit
-
This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
devices or emulated PCI. These calls allow adding multiple entries
(up to 512) into the TCE table in one call which saves time on
transition between kernel and user space.The current implementation of kvmppc_h_stuff_tce() allows it to be
executed in both real and virtual modes so there is one helper.
The kvmppc_rm_h_put_tce_indirect() needs to translate the guest address
to the host address and since the translation is different, there are
2 helpers - one for each mode.This implements the KVM_CAP_PPC_MULTITCE capability. When present,
the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE if these
are enabled by the userspace via KVM_CAP_PPC_ENABLE_HCALL.
If they can not be handled by the kernel, they are passed on to
the user space. The user space still has to have an implementation
for these.Both HV and PR-syle KVM are supported.
Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras
10 Feb, 2016
3 commits
-
The interface for adapter mappings was designed with code in mind
that maps each address only once; let's document this.Otherwise, duplicate mappings are added to the list, which makes
the code ineffective and uses up the limited amount of mapping
needlessly.Signed-off-by: Cornelia Huck
Signed-off-by: Christian Borntraeger -
Let's properly document KVM_S390_VM_CRYPTO and its attributes.
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger -
Let's properly document KVM_S390_VM_TOD and its attributes.
Reviewed-by: Cornelia Huck
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
26 Jan, 2016
1 commit
-
The KVM_SMI capability is following the KVM_S390_SET_IRQ_STATE capability
which is "4.95", this changes the number of the KVM_SMI chapter to 4.96.Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
Signed-off-by: Paolo Bonzini