06 May, 2013
1 commit
-
Pull kvm updates from Gleb Natapov:
"Highlights of the updates are:

general:
- new emulated device API
- legacy device assignment is now optional
- irqfd interface is more generic and can be shared between arches

x86:
- VMCS shadow support and other nested VMX improvements
- APIC virtualization and Posted Interrupt hardware support
- Optimize mmio spte zapping

ppc:
- BookE: in-kernel MPIC emulation with irqfd support
- Book3S: in-kernel XICS emulation (incomplete)
- Book3S: HV: migration fixes
- BookE: more debug support preparation
- BookE: e6500 support

ARM:
- reworking of Hyp idmaps

s390:
- ioeventfd for virtio-ccw

And many other bug fixes, cleanups and improvements"
* tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
kvm: Add compat_ioctl for device control API
KVM: x86: Account for failing enable_irq_window for NMI window request
KVM: PPC: Book3S: Add API for in-kernel XICS emulation
kvm/ppc/mpic: fix missing unlock in set_base_addr()
kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
kvm/ppc/mpic: remove users
kvm/ppc/mpic: fix mmio region lists when multiple guests used
kvm/ppc/mpic: remove default routes from documentation
kvm: KVM_CAP_IOMMU only available with device assignment
ARM: KVM: iterate over all CPUs for CPU compatibility check
KVM: ARM: Fix spelling in error message
ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
KVM: ARM: Fix API documentation for ONE_REG encoding
ARM: KVM: promote vfp_host pointer to generic host cpu context
ARM: KVM: add architecture specific hook for capabilities
ARM: KVM: perform HYP initilization for hotplugged CPUs
ARM: KVM: switch to a dual-step HYP init code
ARM: KVM: rework HYP page table freeing
ARM: KVM: enforce maximum size for identity mapped code
ARM: KVM: move to a KVM provided HYP idmap
...
03 May, 2013
1 commit
-
* 'kvm-arm-for-3.10' of git://github.com/columbia/linux-kvm-arm:
ARM: KVM: iterate over all CPUs for CPU compatibility check
KVM: ARM: Fix spelling in error message
ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
KVM: ARM: Fix API documentation for ONE_REG encoding
ARM: KVM: promote vfp_host pointer to generic host cpu context
ARM: KVM: add architecture specific hook for capabilities
ARM: KVM: perform HYP initilization for hotplugged CPUs
ARM: KVM: switch to a dual-step HYP init code
ARM: KVM: rework HYP page table freeing
ARM: KVM: enforce maximum size for identity mapped code
ARM: KVM: move to a KVM provided HYP idmap
ARM: KVM: fix HYP mapping limitations around zero
ARM: KVM: simplify HYP mapping population
ARM: KVM: arch_timer: use symbolic constants
ARM: KVM: add support for minimal host vs guest profiling
02 May, 2013
1 commit
-
This adds the API for userspace to instantiate an XICS device in a VM
and connect VCPUs to it. The API consists of a new device type for
the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
which is used to assert and deassert interrupt inputs of the XICS.

The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
Each attribute within this group corresponds to the state of one
interrupt source. The attribute number is the same as the interrupt
source number.

This does not support irq routing or irqfd yet.
Signed-off-by: Paul Mackerras
Acked-by: David Gibson
Signed-off-by: Alexander Graf
30 Apr, 2013
1 commit
-
The default routes were removed from the code during patchset
respinning, but were not removed from the documentation.
Signed-off-by: Scott Wood
Signed-off-by: Alexander Graf
29 Apr, 2013
1 commit
-
Unless I'm mistaken, the size field was encoded 4 bits off and a wrong
value was used for 64-bit FP registers.
Signed-off-by: Christoffer Dall
27 Apr, 2013
9 commits
-
This adds the ability for userspace to save and restore the state
of the XICS interrupt presentation controllers (ICPs) via the
KVM_GET/SET_ONE_REG interface. Since there is one ICP per vcpu, we
simply define a new 64-bit register in the ONE_REG space for the ICP
state. The state includes the CPU priority setting, the pending IPI
priority, and the priority and source number of any pending external
interrupt.
Signed-off-by: Paul Mackerras
Signed-off-by: Alexander Graf
-
For pseries machine emulation, in order to move the interrupt
controller code to the kernel, we need to intercept some RTAS
calls in the kernel itself. This adds an infrastructure to allow
in-kernel handlers to be registered for RTAS services by name.
A new ioctl, KVM_PPC_RTAS_DEFINE_TOKEN, then allows userspace to
associate token values with those service names. Then, when the
guest requests an RTAS service with one of those token values, it
will be handled by the relevant in-kernel handler rather than being
passed up to userspace as at present.
Signed-off-by: Michael Ellerman
Signed-off-by: Benjamin Herrenschmidt
Signed-off-by: Paul Mackerras
[agraf: fix warning]
Signed-off-by: Alexander Graf
-
Now that all the irq routing and irqfd pieces are generic, we can expose
real irqchip support to all of KVM's internal helpers.

This allows us to use irqfd with the in-kernel MPIC.
Signed-off-by: Alexander Graf
-
Enabling this capability connects the vcpu to the designated in-kernel
MPIC. Using explicit connections between vcpus and irqchips allows
for flexibility, but the main benefit at the moment is that it
simplifies the code -- KVM doesn't need vm-global state to remember
which MPIC object is associated with this vm, and it doesn't need to
care about ordering between irqchip creation and vcpu creation.
Signed-off-by: Scott Wood
[agraf: add stub functions for kvmppc_mpic_{dis,}connect_vcpu]
Signed-off-by: Alexander Graf
-
Hook the MPIC code up to the KVM interfaces, add locking, etc.
Signed-off-by: Scott Wood
[agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit]
Signed-off-by: Alexander Graf
-
Currently, devices that are emulated inside KVM are configured in a
hardcoded manner based on an assumption that any given architecture
only has one way to do it. If there's any need to access device state,
it is done through inflexible one-purpose-only IOCTLs (e.g.
KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is
cumbersome and depletes a limited numberspace.

This API provides a mechanism to instantiate a device of a certain
type, returning an ID that can be used to set/get attributes of the
device. Attributes may include configuration parameters (e.g.
register base address), device state, operational commands, etc. It
is similar to the ONE_REG API, except that it acts on devices rather
than vcpus.

Both device types and individual attributes can be tested without having
to create the device or get/set the attribute, without the need for
separately managing enumerated capabilities.
Signed-off-by: Scott Wood
Signed-off-by: Alexander Graf
-
EPTCFG register defined by E.PT is accessed unconditionally by Linux guests
in the presence of MAV 2.0. Emulate it now.
Signed-off-by: Mihai Caraman
Signed-off-by: Alexander Graf
-
Add support for TLBnPS registers available in MMU Architecture Version
(MAV) 2.0.
Signed-off-by: Mihai Caraman
Signed-off-by: Alexander Graf
-
MMU registers were exposed to user-space using sregs interface. Add them
to ONE_REG interface using kvmppc_get_one_reg/kvmppc_set_one_reg delegation
mechanism.
Signed-off-by: Mihai Caraman
Signed-off-by: Alexander Graf
22 Mar, 2013
1 commit
-
If userspace wants to change some specific bits of TSR
(timer status register), it uses the GET/SET_SREGS ioctl interface.
The steps are:
i) userspace makes the get ioctl,
ii) changes TSR in userspace,
iii) then makes the set ioctl.
TSR can be changed by the kernel after step i) and
before step iii).

To avoid this, we have added the ONE_REG ioctls below for ORing and clearing
specific bits in TSR. This patch adds a ONE_REG interface for:
1) setting specific bits in TSR (timer status register)
2) clearing specific bits in TSR (timer status register)
3) setting/getting the TCR register. There are cases where we want to
change only TCR and not TSR. Although we can use SREGS without the
KVM_SREGS_E_UPDATE_TSR flag, I think ONE_REG is better. I am open
if someone feels we should use SREGS only here.
4) getting/setting the TSR register
Signed-off-by: Bharat Bhushan
Signed-off-by: Alexander Graf
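The lost-update problem described in this commit can be sketched with a small
user-space model. This is an illustration only: struct vcpu_model, the helper
names and the two bit values are hypothetical stand-ins, and no real KVM ioctl
is issued. It contrasts writing back a stale copy of the whole register with
ORing in only the bits userspace cares about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder bits; not the real TSR bit assignments. */
#define KERNEL_BIT 0x1u
#define USER_BIT   0x2u

/* Hypothetical stand-in for the vcpu state that holds TSR. */
struct vcpu_model {
	uint32_t tsr;
};

/* Whole-register write, as in the get/modify/set sequence. */
static inline void set_whole_tsr(struct vcpu_model *v, uint32_t value)
{
	assert(v != NULL);
	v->tsr = value;
}

/* Bit-wise update: only the requested bits are touched. */
static inline void or_tsr(struct vcpu_model *v, uint32_t bits)
{
	assert(v != NULL);
	v->tsr |= bits;
}

/* Bit-wise clear: only the requested bits are cleared. */
static inline void clear_tsr(struct vcpu_model *v, uint32_t bits)
{
	assert(v != NULL);
	v->tsr &= ~bits;
}

/*
 * Replays the sequence from the commit message and returns 1 if the
 * bit set by the "kernel" between steps i) and iii) is still set.
 */
static inline int kernel_update_survives(int use_bitwise)
{
	struct vcpu_model v = { .tsr = 0 };
	uint32_t snapshot = v.tsr;          /* step i): userspace reads TSR */

	or_tsr(&v, KERNEL_BIT);             /* kernel changes TSR meanwhile */

	if (use_bitwise)
		or_tsr(&v, USER_BIT);       /* touches only USER_BIT */
	else
		set_whole_tsr(&v, snapshot | USER_BIT); /* stale write-back */

	return (v.tsr & KERNEL_BIT) != 0;
}
```

In this model the whole-register write drops the kernel's bit, while the
bit-wise update keeps it, which is the motivation given above for separate
"set bits" and "clear bits" registers.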
12 Mar, 2013
1 commit
-
We haven't been keeping it in sync, so just remove it.
Signed-off-by: Rusty Russell
06 Mar, 2013
1 commit
-
Enhance KVM_IOEVENTFD with a new flag that allows attaching to virtio-ccw
devices on s390 via the KVM_VIRTIO_CCW_NOTIFY_BUS.
Signed-off-by: Cornelia Huck
Signed-off-by: Marcelo Tosatti
25 Feb, 2013
1 commit
-
Pull KVM updates from Marcelo Tosatti:
"KVM updates for the 3.9 merge window, including x86 real mode
emulation fixes, stronger memory slot interface restrictions, mmu_lock
spinlock hold time reduction, improved handling of large page faults
on shadow, initial APICv HW acceleration support, s390 channel IO
based virtio, amongst others"
* tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
Revert "KVM: MMU: lazily drop large spte"
x86: pvclock kvm: align allocation size to page size
KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr
x86 emulator: fix parity calculation for AAD instruction
KVM: PPC: BookE: Handle alignment interrupts
booke: Added DBCR4 SPR number
KVM: PPC: booke: Allow multiple exception types
KVM: PPC: booke: use vcpu reference from thread_struct
KVM: Remove user_alloc from struct kvm_memory_slot
KVM: VMX: disable apicv by default
KVM: s390: Fix handling of iscs.
KVM: MMU: cleanup __direct_map
KVM: MMU: remove pt_access in mmu_set_spte
KVM: MMU: cleanup mapping-level
KVM: MMU: lazily drop large spte
KVM: VMX: cleanup vmx_set_cr0().
KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
KVM: VMX: disable SMEP feature when guest is in non-paging mode
KVM: Remove duplicate text in api.txt
Revert "KVM: MMU: split kvm_mmu_free_page"
...
12 Feb, 2013
2 commits
-
User space defines the model to emulate to a guest and should therefore
decide which addresses are used for both the virtual CPU interface
directly mapped in the guest physical address space and for the emulated
distributor interface, which is mapped in software by the in-kernel VGIC
support.
Reviewed-by: Will Deacon
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier
-
On ARM some bits are specific to the model being emulated for the guest and
user space needs a way to tell the kernel about those bits. An example is mmio
device base addresses, where KVM must know the base address for a given device
to properly emulate mmio accesses within a certain address range or directly
map a device with virtualization extensions into the guest address space.

We make this API ARM-specific as we haven't yet reached a consensus for a
generic API for all KVM architectures that will allow us to do something like
this.
Reviewed-by: Will Deacon
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier
06 Feb, 2013
1 commit
-
Signed-off-by: Geoff Levand
Signed-off-by: Marcelo Tosatti
05 Feb, 2013
1 commit
-
As Xiao pointed out, there are a few problems with it:
- kvm_arch_commit_memory_region() write protects the memory slot only
for GET_DIRTY_LOG when modifying the flags.
- FNAME(sync_page) uses the old spte value to set a new one without
checking KVM_MEM_READONLY flag.

Since we flush all shadow pages when creating a new slot, the simplest
fix is to disallow such problematic flag changes: this is safe because
no one is doing such things.
Reviewed-by: Gleb Natapov
Signed-off-by: Takuya Yoshikawa
Cc: Xiao Guangrong
Cc: Alex Williamson
Signed-off-by: Marcelo Tosatti
24 Jan, 2013
6 commits
-
Implement the PSCI specification (ARM DEN 0022A) to control
virtual CPUs being "powered" on or off.

PSCI/KVM is detected using the KVM_CAP_ARM_PSCI capability.
A virtual CPU can now be initialized in a "powered off" state,
using the KVM_ARM_VCPU_POWER_OFF feature flag.

The guest can use either SMC or HVC to execute a PSCI function.
Reviewed-by: Will Deacon
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall
-
We use space #18 for floating point regs.
Reviewed-by: Will Deacon
Reviewed-by: Marcelo Tosatti
Signed-off-by: Rusty Russell
Signed-off-by: Christoffer Dall
-
The Cache Size Selection Register (CSSELR) selects the current Cache
Size ID Register (CCSIDR). You write which cache you are interested
in to CSSELR, and read the information out of CCSIDR.

Which cache numbers are valid is known by reading the Cache Level ID
Register (CLIDR).

To export this state to userspace, we add a KVM_REG_ARM_DEMUX
numberspace (17), which uses 8 bits to represent which register is
being demultiplexed (0 for CCSIDR), and the lower 8 bits to represent
this demultiplexing (in our case, the CSSELR value, which is 4 bits).
Reviewed-by: Will Deacon
Reviewed-by: Marcelo Tosatti
Signed-off-by: Rusty Russell
Signed-off-by: Christoffer Dall
-
The following three ioctls are implemented:
- KVM_GET_REG_LIST
- KVM_GET_ONE_REG
- KVM_SET_ONE_REG

Now that we have a table for all the cp15 registers, we can drive a generic
API.

The register IDs carry the following encoding:
ARM registers are mapped using the lower 32 bits. The upper 16 of that
is the register group type, or coprocessor number:

ARM 32-bit CP15 registers have the following id bit patterns:
0x4002 0000 000F

ARM 64-bit CP15 registers have the following id bit patterns:
0x4003 0000 000F

For futureproofing, we need to tell QEMU about the CP15 registers the
host lets the guest access.

It will need this information to restore a current guest on a future
CPU or perhaps a future KVM which allows some of these to be changed.

We use a separate table for these, as they're only for the userspace API.
Reviewed-by: Will Deacon
Reviewed-by: Marcelo Tosatti
Signed-off-by: Rusty Russell
Signed-off-by: Christoffer Dall
-
All interrupt injection is now based on the VM ioctl KVM_IRQ_LINE. This
works semantically well for the GIC as we in fact raise/lower a line on
a machine component (the gic). The IOCTL uses the following struct.

struct kvm_irq_level {
	union {
		__u32 irq; /* GSI */
		__s32 status; /* not used for KVM_IRQ_LEVEL */
	};
	__u32 level; /* 0 or 1 */
};

ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
(GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for
specific cpus. The irq field is interpreted like this:

  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
  field: | irq_type  | vcpu_index |   irq_number   |

The irq_type field has the following values:
- irq_type[0]: out-of-kernel GIC: irq_number 0 is IRQ, irq_number 1 is FIQ
- irq_type[1]: in-kernel GIC: SPI, irq_number between 32 and 1019 (incl.)
  (the vcpu_index field is ignored)
- irq_type[2]: in-kernel GIC: PPI, irq_number between 16 and 31 (incl.)

The irq_number thus corresponds to the irq ID as in the GICv2 specs.
This is documented in Documentation/kvm/api.txt.
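As a rough illustration of the bit layout above, the irq field could be packed
and unpacked as follows. The enum and function names are hypothetical and not
taken from the kernel headers; only the field positions and the irq_type
values 0, 1 and 2 come from the description in this commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the three irq_type values listed above. */
enum irq_type_model {
	IRQ_TYPE_MODEL_CPU = 0, /* out-of-kernel GIC: 0 is IRQ, 1 is FIQ */
	IRQ_TYPE_MODEL_SPI = 1, /* in-kernel GIC, irq_number 32..1019 */
	IRQ_TYPE_MODEL_PPI = 2  /* in-kernel GIC, irq_number 16..31 */
};

/* Pack irq_type (bits 31..24), vcpu_index (23..16), irq_number (15..0). */
static inline uint32_t encode_irq(uint32_t irq_type, uint32_t vcpu_index,
				  uint32_t irq_number)
{
	assert(irq_type <= 0xff);
	assert(vcpu_index <= 0xff);
	assert(irq_number <= 0xffff);
	return (irq_type << 24) | (vcpu_index << 16) | irq_number;
}

/* Accessors for the three fields of an encoded irq value. */
static inline uint32_t irq_type_of(uint32_t irq)
{
	return irq >> 24;
}

static inline uint32_t vcpu_index_of(uint32_t irq)
{
	return (irq >> 16) & 0xff;
}

static inline uint32_t irq_number_of(uint32_t irq)
{
	return irq & 0xffff;
}
```

For example, a PPI with irq_number 27 targeted at vcpu 1 would be encoded as
encode_irq(IRQ_TYPE_MODEL_PPI, 1, 27), and the resulting value would go into
the irq member of struct kvm_irq_level.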
Reviewed-by: Will Deacon
Reviewed-by: Marcelo Tosatti
Signed-off-by: Christoffer Dall
-
Targets KVM support for Cortex-A15 processors.
Contains all the framework components, make files, header files, some
tracing functionality, and basic user space API.

The only supported core is Cortex-A15 for now.
Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.
Reviewed-by: Will Deacon
Reviewed-by: Marcelo Tosatti
Signed-off-by: Rusty Russell
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall
14 Jan, 2013
1 commit
-
Not needed any more.
Reviewed-by: Marcelo Tosatti
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Gleb Natapov
10 Jan, 2013
3 commits
-
We need to be able to read and write the contents of the EPR register
from user space.

This patch implements that logic through the ONE_REG API and declares
its (never implemented) SREGS counterpart as deprecated.
Signed-off-by: Alexander Graf
-
The External Proxy Facility in FSL BookE chips allows the interrupt
controller to automatically acknowledge an interrupt as soon as a
core gets its pending external interrupt delivered.

Today, user space implements the interrupt controller, so we need to
check on it during such a cycle.

This patch implements logic for user space to enable EPR exiting,
disable EPR exiting and EPR exiting itself, so that user space can
acknowledge an interrupt when an external interrupt has successfully
been delivered into the guest vcpu.
Signed-off-by: Alexander Graf
-
Reflect the uapi folder change in SREGS API documentation.
Signed-off-by: Mihai Caraman
Reviewed-by: Amos Kong
Signed-off-by: Alexander Graf
08 Jan, 2013
4 commits
-
Add a new capability, KVM_CAP_S390_CSS_SUPPORT, which will pass
intercepts for channel I/O instructions to userspace. Only I/O
instructions interacting with I/O interrupts need to be handled
in-kernel:
- TEST PENDING INTERRUPTION (tpi) dequeues and stores pending
  interrupts entirely in-kernel.
- TEST SUBCHANNEL (tsch) dequeues pending interrupts in-kernel
  and exits via KVM_EXIT_S390_TSCH to userspace for
  subchannel-related processing.
Reviewed-by: Marcelo Tosatti
Reviewed-by: Alexander Graf
Signed-off-by: Cornelia Huck
Signed-off-by: Marcelo Tosatti
-
Make s390 support KVM_ENABLE_CAP.
Reviewed-by: Marcelo Tosatti
Acked-by: Alexander Graf
Signed-off-by: Cornelia Huck
Signed-off-by: Marcelo Tosatti
-
Add support for injecting machine checks (only repressible
conditions for now).

This is a bit more involved than I/O interrupts, for these reasons:
- Machine checks come in both floating and cpu varieties.
- We don't have a bit for machine checks enabling, but have to use
  a roundabout approach with trapping PSW changing instructions and
  watching for opened machine checks.
Reviewed-by: Alexander Graf
Reviewed-by: Marcelo Tosatti
Signed-off-by: Cornelia Huck
Signed-off-by: Marcelo Tosatti
-
Add support for handling I/O interrupts (standard, subchannel-related
ones and rudimentary adapter interrupts).

The subchannel-identifying parameters are encoded into the interrupt
type.

I/O interrupts are floating, so they can't be injected on a specific
vcpu.
Reviewed-by: Alexander Graf
Reviewed-by: Marcelo Tosatti
Signed-off-by: Cornelia Huck
Signed-off-by: Marcelo Tosatti
06 Dec, 2012
2 commits
-
Implement ONE_REG interface for EPCR register adding KVM_REG_PPC_EPCR to
the list of ONE_REG PPC supported registers.
Signed-off-by: Mihai Caraman
[agraf: remove HV dependency, use get/put_user]
Signed-off-by: Alexander Graf
-
A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor. Reads on
this fd return the contents of the HPT (hashed page table), writes
create and/or remove entries in the HPT. There is a new capability,
KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl. The ioctl
takes an argument structure with the index of the first HPT entry to
read out and a set of flags. The flags indicate whether the user is
intending to read or write the HPT, and whether to return all entries
or only the "bolted" entries (those with the bolted bit, 0x10, set in
the first doubleword).

This is intended for use in implementing qemu's savevm/loadvm and for
live migration. Therefore, on reads, the first pass returns information
about all HPTEs (or all bolted HPTEs). When the first pass reaches the
end of the HPT, it returns from the read. Subsequent reads only return
information about HPTEs that have changed since they were last read.
A read that finds no changed HPTEs in the HPT following where the last
read finished will return 0 bytes.

The format of the data provides a simple run-length compression of the
invalid entries. Each block of data starts with a header that indicates
the index (position in the HPT, which is just an array), the number of
valid entries starting at that index (may be zero), and the number of
invalid entries following those valid entries. The valid entries, 16
bytes each, follow the header. The invalid entries are not explicitly
represented.
Signed-off-by: Paul Mackerras
[agraf: fix documentation]
Signed-off-by: Alexander Graf
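The stream format described in this commit (a header, then the valid entries,
with invalid entries only counted) could be walked roughly as below. This is
a sketch under stated assumptions: the struct and function names are
hypothetical, and the header field widths (32-bit index, 16-bit counts) are an
assumption, since the commit message does not state them. Only the 16-byte
entry size and the header contents (index, valid count, invalid count) come
from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HPTE_SIZE 16 /* each valid entry is 16 bytes, per the description */

/* Assumed header layout; the field widths are not given in the text. */
struct htab_header {
	uint32_t index;     /* position in the HPT of the first entry */
	uint16_t n_valid;   /* valid entries that follow this header */
	uint16_t n_invalid; /* invalid entries after those, not stored */
};

/* Totals gathered while walking one buffer returned by a read. */
struct htab_summary {
	size_t blocks;
	size_t valid;
	size_t invalid;
};

/*
 * Walk a buffer of header + valid-entry blocks and count what it
 * describes. Returns 0 on success, -1 if the buffer is truncated.
 */
static inline int summarize_htab_stream(const uint8_t *buf, size_t len,
					struct htab_summary *out)
{
	size_t pos = 0;

	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	while (pos < len) {
		struct htab_header hdr;
		size_t payload;

		if (len - pos < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + pos, sizeof(hdr));
		pos += sizeof(hdr);

		/* Only valid entries occupy space in the stream. */
		payload = (size_t)hdr.n_valid * HPTE_SIZE;
		if (len - pos < payload)
			return -1;
		pos += payload;

		out->blocks++;
		out->valid += hdr.n_valid;
		out->invalid += hdr.n_invalid;
	}
	return 0;
}
```

A reader such as a migration tool would apply this kind of loop to each buffer
returned by a read on the fd, and stop once a read returns 0 bytes.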
31 Oct, 2012
1 commit
-
Conflicts:
arch/powerpc/include/asm/Kbuild
arch/powerpc/include/uapi/asm/Kbuild
30 Oct, 2012
1 commit
-
All user space offloaded instruction emulation needs to reenter kvm
to produce consistent state again. Fix the section in the documentation
to mention all of them.
Signed-off-by: Alexander Graf