Eric Lee / smarc-fsl-linux-kernel

26 Dec, 2011

1 commit

4d25a066b KVM: Don't automatically expose the TSC deadline timer in cpuid

Unlike all of the other cpuid bits, the TSC deadline timer bit is set
unconditionally, regardless of what userspace wants.

This is broken in several ways:
- if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate the TSC
deadline timer feature, a guest that uses the feature will break
- live migration to older host kernels that don't support the TSC deadline
timer will cause the feature to be pulled from under the guest's feet,
breaking it
- guests that are broken wrt the feature will fail.

Fix by not enabling the feature automatically; instead report it to userspace.
Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee
will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not
KVM_GET_SUPPORTED_CPUID.
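
A rough sketch of the userspace policy this implies, assuming a VMM that builds the guest CPUID table itself. The helper name and its boolean inputs are hypothetical; only the bit position (CPUID leaf 1, ECX bit 24, the architectural TSC deadline timer flag) is taken as given.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 24 advertises the TSC deadline timer. */
#define CPUID_1_ECX_TSC_DEADLINE (UINT32_C(1) << 24)

/*
 * Hypothetical helper: compute the guest-visible ECX value for CPUID leaf 1.
 *   cap_reported   - KVM_CAP_TSC_DEADLINE_TIMER was reported by the kernel
 *   irqchip_in_kvm - userspace called KVM_CREATE_IRQCHIP
 *   user_wants     - the VMM configuration asks for the feature
 */
uint32_t guest_cpuid1_ecx(uint32_t ecx, bool cap_reported,
                          bool irqchip_in_kvm, bool user_wants)
{
    /* Start from a known state: the bit is never inherited blindly. */
    ecx &= ~CPUID_1_ECX_TSC_DEADLINE;

    /* Expose the feature only when all three conditions hold. */
    if (cap_reported && irqchip_in_kvm && user_wants)
        ecx |= CPUID_1_ECX_TSC_DEADLINE;

    return ecx;
}
```

With this shape, migrating to a host that does not report the capability simply leaves the bit clear, provided the VMM applies the same policy on both sides.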

Fixes the Illumos guest kernel, which uses the TSC deadline timer feature.

[avi: add the KVM_CAP + documentation]

Reported-by: Alexey Zaytsev
Tested-by: Alexey Zaytsev
Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity

Jan Kiszka
2011-12-26 19:27:44 +0800

17 Nov, 2011

1 commit

bb75c627f Revert "KVM: PPC: Add support for explicit HIOR setting"

This reverts commit a15bd354f083f20f257db450488db52ac27df439.

It exceeded the padding on the SREGS struct, rendering the ABI
backwards-incompatible.

Conflicts:

arch/powerpc/kvm/powerpc.c
include/linux/kvm.h

Signed-off-by: Avi Kivity

Alexander Graf
2011-11-17 22:30:25 +0800

30 Oct, 2011

1 commit

7697e71f7 KVM: s390: implement sigp external call

Implement sigp external call, which might be required for guests that
issue an external call instead of an emergency signal for IPI.

This fixes an issue with "KVM: unknown SIGP: 0x02" when booting
such an SMP guest.

Signed-off-by: Christian Ehrhardt
Signed-off-by: Christian Borntraeger
Signed-off-by: Marcelo Tosatti

Christian Ehrhardt
2011-10-30 18:24:05 +0800

26 Sep, 2011

3 commits

930b412a0 KVM: PPC: Enable the PAPR CAP for Book3S

Now that Book3S PV mode can also run PAPR guests, we can add a PAPR cap and
enable it for all Book3S targets. Enabling that CAP switches KVM into PAPR
mode.

Signed-off-by: Alexander Graf

Alexander Graf
2011-09-26 00:52:26 +0800

a15bd354f KVM: PPC: Add support for explicit HIOR setting

Until now, we always set HIOR based on the PVR, but this is just wrong.
Instead, we should be setting HIOR explicitly, so user space can decide
what the initial HIOR value is - just like on real hardware.

We keep the old PVR based way around for backwards compatibility, but
once user space uses the SREGS based method, we drop the PVR logic.

Signed-off-by: Alexander Graf

Alexander Graf
2011-09-26 00:52:23 +0800

8c3ba334f KVM: x86: Raise the hard VCPU count limit

The patch raises the hard limit of VCPU count to 254.

This will allow developers to easily work on scalability
and will allow users to test high VCPU setups easily without
patching the kernel.

To prevent possible issues with current setups, KVM_CAP_NR_VCPUS
now returns the recommended VCPU limit (which is still 64) - this
should be a safe value for everybody, while a new KVM_CAP_MAX_VCPUS
returns the hard limit which is now 254.
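
As an illustration of how a VMM might use the two values, here is a minimal sketch. The function and enum names are hypothetical; `nr_vcpus` and `max_vcpus` stand for the results of querying KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS, with 0 meaning the capability is unknown to the running kernel. The fallback values for that case are assumptions of this sketch.

```c
/* Hypothetical verdicts for a requested VCPU count. */
enum vcpu_verdict {
    VCPU_OK,                    /* within the recommended limit */
    VCPU_OK_ABOVE_RECOMMENDED,  /* allowed, but worth a warning */
    VCPU_TOO_MANY               /* exceeds the hard limit */
};

enum vcpu_verdict check_vcpu_count(int requested, int nr_vcpus, int max_vcpus)
{
    /* Assumed fallback: a small conservative value on very old kernels. */
    int soft = nr_vcpus > 0 ? nr_vcpus : 4;
    /* No separate hard limit reported: treat the soft limit as hard. */
    int hard = max_vcpus > 0 ? max_vcpus : soft;

    if (requested > hard)
        return VCPU_TOO_MANY;
    if (requested > soft)
        return VCPU_OK_ABOVE_RECOMMENDED;
    return VCPU_OK;
}
```

Keeping the two limits separate lets existing setups stay at the recommended value while scalability testing can go up to the hard limit.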

Cc: Avi Kivity
Cc: Ingo Molnar
Cc: Marcelo Tosatti
Cc: Pekka Enberg
Suggested-by: Pekka Enberg
Signed-off-by: Sasha Levin
Signed-off-by: Marcelo Tosatti

Sasha Levin
2011-09-26 00:17:57 +0800

20 Sep, 2011

1 commit

b6cf8788a [S390] kvm: extension capability for new address space layout

598841ca9919d008b520114d8a4378c4ce4e40a1 ([S390] use gmap address
spaces for kvm guest images) changed kvm on s390 to use a separate
address space for kvm guests. We can now put KVM guests anywhere
in the user address mode with a size up to 8PB - as long as the
memory is 1MB-aligned. This change was done without KVM extension
capability bit.
The change was added after 3.0, but we still have a chance to add
a feature bit before 3.1 (keeping the releases in a sane state).
We use number 71 to avoid collisions with other pending kvm patches
as requested by Alexander Graf.
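
The placement constraint described above can be sketched as a simple check. The function name is hypothetical, and applying the 1MB alignment to the size as well as to the start address is an assumption of this sketch; the text only states that the memory must be 1MB-aligned and at most 8PB.

```c
#include <stdbool.h>
#include <stdint.h>

#define GUEST_ALIGN  (UINT64_C(1) << 20)  /* 1MB alignment from the text */
#define GUEST_MAX_SZ (UINT64_C(8) << 50)  /* 8PB upper bound from the text */

/* Hypothetical check a VMM could run before registering guest memory. */
bool s390_guest_region_ok(uint64_t user_addr, uint64_t size)
{
    if (size == 0 || size > GUEST_MAX_SZ)
        return false;
    /* Assumption: both start address and size must be 1MB-aligned. */
    return (user_addr % GUEST_ALIGN) == 0 && (size % GUEST_ALIGN) == 0;
}
```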

Signed-off-by: Christian Borntraeger
Acked-by: Avi Kivity
Cc: Alexander Graf
Signed-off-by: Heiko Carstens

Christian Borntraeger
2011-09-20 23:07:34 +0800

12 Jul, 2011

5 commits

aa04b4cc5 KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests

This adds infrastructure which will be needed to allow book3s_hv KVM to
run on older POWER processors, including PPC970, which don't support
the Virtual Real Mode Area (VRMA) facility, but only the Real Mode
Offset (RMO) facility. These processors require a physically
contiguous, aligned area of memory for each guest. When the guest does
an access in real mode (MMU off), the address is compared against a
limit value, and if it is lower, the address is ORed with an offset
value (from the Real Mode Offset Register (RMOR)) and the result becomes
the real address for the access. The size of the RMA has to be one of
a set of supported values, which usually includes 64MB, 128MB, 256MB
and some larger powers of 2.
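
The real-mode translation described above can be modelled in a few lines. This is a sketch of the described behaviour only, not kernel code; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the RMO facility state for one guest. */
struct rmo_state {
    uint64_t limit;  /* RMA size: real-mode addresses must be below this */
    uint64_t rmor;   /* Real Mode Offset Register: base of the RMA */
};

/*
 * Translate a guest real-mode address. Returns true and stores the real
 * address if the access falls inside the RMA, false otherwise.
 */
bool rmo_translate(const struct rmo_state *s, uint64_t addr, uint64_t *real)
{
    if (addr >= s->limit)
        return false;        /* not below the limit: no translation here */

    /* ORing is equivalent to adding because the RMA is aligned to its size. */
    *real = addr | s->rmor;
    return true;
}
```

The OR only yields a correct result when the base is aligned to the RMA size, which is why the area has to be both physically contiguous and aligned.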

Since we are unlikely to be able to allocate 64MB or more of physically
contiguous memory after the kernel has been running for a while, we
allocate a pool of RMAs at boot time using the bootmem allocator. The
size and number of the RMAs can be set using the kvm_rma_size=xx and
kvm_rma_count=xx kernel command line options.

KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability
of the pool of preallocated RMAs. The capability value is 1 if the
processor can use an RMA but doesn't require one (because it supports
the VRMA facility), or 2 if the processor requires an RMA for each guest.

This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the
pool and returns a file descriptor which can be used to map the RMA. It
also returns the size of the RMA in the argument structure.

Having an RMA means we will get multiple KVM_SET_USER_MEMORY_REGION
ioctl calls from userspace. To cope with this, we now preallocate the
kvm->arch.ram_pginfo array when the VM is created with a size sufficient
for up to 64GB of guest memory. Subsequently we will get rid of this
array and use memory associated with each memslot instead.

This moves most of the code that translates the user addresses into
host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level
to kvmppc_core_prepare_memory_region. Also, instead of having to look
up the VMA for each page in order to check the page size, we now check
that the pages we get are compound pages of 16MB. However, if we are
adding memory that is mapped to an RMA, we don't bother with calling
get_user_pages_fast and instead just offset from the base pfn for the
RMA.

Typically the RMA gets added after vcpus are created, which makes it
inconvenient to have the LPCR (logical partition control register) value
in the vcpu->arch struct, since the LPCR controls whether the processor
uses RMA or VRMA for the guest. This moves the LPCR value into the
kvm->arch struct and arranges for the MER (mediated external request)
bit, which is the only bit that varies between vcpus, to be set in
assembly code when going into the guest if there is a pending external
interrupt request.

Signed-off-by: Paul Mackerras
Signed-off-by: Alexander Graf

Paul Mackerras
2011-07-12 18:16:57 +0800

371fefd6f KVM: PPC: Allow book3s_hv guests to use SMT processor modes

This lifts the restriction that book3s_hv guests can only run one
hardware thread per core, and allows them to use up to 4 threads
per core on POWER7. The host still has to run single-threaded.

This capability is advertised to qemu through a new KVM_CAP_PPC_SMT
capability. The return value of the ioctl querying this capability
is the number of vcpus per virtual CPU core (vcore), currently 4.

To use this, the host kernel should be booted with all threads
active, and then all the secondary threads should be offlined.
This will put the secondary threads into nap mode. KVM will then
wake them from nap mode and use them for running guest code (while
they are still offline). To wake the secondary threads, we send
them an IPI using a new xics_wake_cpu() function, implemented in
arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage
we assume that the platform has a XICS interrupt controller and
we are using icp-native.c to drive it. Since the woken thread will
need to acknowledge and clear the IPI, we also export the base
physical address of the XICS registers using kvmppc_set_xics_phys()
for use in the low-level KVM book3s code.

When a vcpu is created, it is assigned to a virtual CPU core.
The vcore number is obtained by dividing the vcpu number by the
number of threads per core in the host. This number is exported
to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes
to run the guest in single-threaded mode, it should make all vcpu
numbers be multiples of the number of threads per core.
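
The numbering rule above amounts to an integer division. A small sketch, with hypothetical helper names, where `threads_per_core` is the value reported for KVM_CAP_PPC_SMT (4 in the text):

```c
/* The vcore a vcpu belongs to: vcpu number divided by threads per core. */
int vcore_of(int vcpu_id, int threads_per_core)
{
    return vcpu_id / threads_per_core;
}

/*
 * The vcpu number to give the n-th guest CPU when each vcore should hold
 * exactly one vcpu (single-threaded mode): a multiple of threads per core.
 */
int single_thread_vcpu_id(int n, int threads_per_core)
{
    return n * threads_per_core;
}
```

For example, with 4 threads per core, vcpus 0 to 3 share vcore 0, while single-threaded guests would use vcpu numbers 0, 4, 8 and so on.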

We distinguish three states of a vcpu: runnable (i.e., ready to execute
the guest), blocked (that is, idle), and busy in host. We currently
implement a policy that the vcore can run only when all its threads
are runnable or blocked. This way, if a vcpu needs to execute elsewhere
in the kernel or in qemu, it can do so without being starved of CPU
by the other vcpus.
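
The policy in the previous paragraph can be expressed as a simple predicate. This is an illustrative sketch with hypothetical type and function names, not the kernel's data structures; it encodes only the stated condition that no thread of the vcore may be busy in the host.

```c
#include <stdbool.h>
#include <stddef.h>

/* The three vcpu states distinguished in the text. */
enum vcpu_state {
    VCPU_RUNNABLE,      /* ready to execute the guest */
    VCPU_BLOCKED,       /* idle */
    VCPU_BUSY_IN_HOST   /* executing elsewhere in the kernel or in qemu */
};

/* A vcore may run only if every one of its threads is runnable or blocked. */
bool vcore_may_run(const enum vcpu_state *threads, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (threads[i] == VCPU_BUSY_IN_HOST)
            return false;
    }
    return true;
}
```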

When a vcore starts to run, it executes in the context of one of the
vcpu threads. The other vcpu threads all go to sleep and stay asleep
until something happens requiring the vcpu thread to return to qemu,
or to wake up to run the vcore (this can happen when another vcpu
thread goes from busy in host state to blocked).

It can happen that a vcpu goes from blocked to runnable state (e.g.
because of an interrupt), and the vcore it belongs to is already
running. In that case it can start to run immediately as long as
none of the vcpus in the vcore have started to exit the guest.
We send the next free thread in the vcore an IPI to get it to start
to execute the guest. It synchronizes with the other threads via
the vcore->entry_exit_count field to make sure that it doesn't go
into the guest if the other vcpus are exiting by the time that it
is ready to actually enter the guest.

Note that there is no fixed relationship between the hardware thread
number and the vcpu number. Hardware threads are assigned to vcpus
as they become runnable, so we will always use the lower-numbered
hardware threads in preference to higher-numbered threads if not all
the vcpus in the vcore are runnable, regardless of which vcpus are
runnable.

Signed-off-by: Paul Mackerras
Signed-off-by: Alexander Graf

Paul Mackerras
2011-07-12 18:16:57 +0800

54738c097 KVM: PPC: Accelerate H_PUT_TCE by implementing it in real mode

This improves I/O performance for guests using the PAPR
paravirtualization interface by making the H_PUT_TCE hcall faster, by
implementing it in real mode. H_PUT_TCE is used for updating virtual
IOMMU tables, and is used both for virtual I/O and for real I/O in the
PAPR interface.

Since this moves the IOMMU tables into the kernel, we define a new
KVM_CREATE_SPAPR_TCE ioctl to allow qemu to create the tables. The
ioctl returns a file descriptor which can be used to mmap the newly
created table. The qemu driver models use them in the same way as
userspace managed tables, but they can be updated directly by the
guest with a real-mode H_PUT_TCE implementation, reducing the number
of host/guest context switches during guest IO.

There are certain circumstances where it is useful for userland qemu
to write to the TCE table even if the kernel H_PUT_TCE path is used
most of the time. Specifically, allowing this will avoid awkwardness
when we need to reset the table. More importantly, we will in the
future need to write the table in order to restore its state after a
checkpoint resume or migration.

Signed-off-by: David Gibson
Signed-off-by: Paul Mackerras
Signed-off-by: Alexander Graf

David Gibson
2011-07-12 18:16:56 +0800

de56a948b KVM: PPC: Add support for Book3S processors in hypervisor mode

This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode. Using hypervisor mode means
that the guest can use the processor's supervisor mode. That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host. This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.

This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses. That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification. In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.

Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.

This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.

With the guest running in supervisor mode, most exceptions go straight
to the guest. We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest. Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.

We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.

In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount. Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.

The POWER7 processor has a restriction that all threads in a core have
to be in the same partition. MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest. At present we require the host and guest to run
in single-thread mode because of this hardware restriction.

This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management. This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.

This also adds a few new exports needed by the book3s_hv code.

Signed-off-by: Paul Mackerras
Signed-off-by: Alexander Graf

Paul Mackerras
2011-07-12 18:16:54 +0800

91e3d71db KVM: Clarify KVM_ASSIGN_PCI_DEVICE documentation

Neither host_irq nor the guest_msi struct is used anymore today.
Tag the former, drop the latter to avoid confusion.

Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity

Jan Kiszka
2011-07-12 18:16:16 +0800

22 May, 2011

1 commit

5ce941ee4 KVM: PPC: booke: add sregs support

Signed-off-by: Scott Wood
Signed-off-by: Alexander Graf

Scott Wood
2011-05-22 20:47:53 +0800

11 May, 2011

1 commit

92a1f12d2 KVM: X86: Implement userspace interface to set virtual_tsc_khz

This patch implements two new vm-ioctls to get and set the
virtual_tsc_khz if the machine supports tsc-scaling. Setting
the tsc-frequency is only possible before userspace creates
any vcpu.

Signed-off-by: Joerg Roedel
Signed-off-by: Avi Kivity

Joerg Roedel
2011-05-11 19:57:06 +0800

12 Jan, 2011

1 commit

344d9588a KVM: Add PV MSR to enable asynchronous page faults delivery.

Guest enables async PF vcpu functionality using this MSR.

Reviewed-by: Rik van Riel
Signed-off-by: Gleb Natapov
Signed-off-by: Marcelo Tosatti

Gleb Natapov
2011-01-12 17:23:12 +0800

24 Oct, 2010

2 commits

7b4203e8c KVM: PPC: Expose level based interrupt cap

Now that we have all the level interrupt magic in place, let's
expose the capability to user space, so it can make use of it!

Signed-off-by: Alexander Graf

Alexander Graf
2010-10-24 16:52:19 +0800

15711e9c9 KVM: PPC: Add get_pvinfo interface to query hypercall instructions

We need to tell the guest the opcodes that make up a hypercall through
interfaces that are controlled by userspace. So we need to add a call
for userspace to allow it to query those opcodes so it can pass them
on.

This is required because the hypercall opcodes can change based on
the hypervisor conditions. If we're running in hardware accelerated
hypervisor mode, a hypercall looks different from when we're running
without hardware acceleration.

Signed-off-by: Alexander Graf
Signed-off-by: Avi Kivity

Alexander Graf
2010-10-24 16:50:57 +0800

01 Aug, 2010

2 commits

a1f4d3950 KVM: Remove memory alias support

As advertised in feature-removal-schedule.txt. Equivalent support is provided
by overlapping memory regions.

Signed-off-by: Avi Kivity

Avi Kivity
2010-08-01 15:47:00 +0800

2d5b5a665 KVM: x86: XSAVE/XRSTOR live migration support

This patch enables save/restore of xsave state.

Signed-off-by: Sheng Yang
Signed-off-by: Marcelo Tosatti

Sheng Yang
2010-08-01 15:46:37 +0800

17 May, 2010

3 commits

ad0a048b0 KVM: PPC: Add OSI hypercall interface

MOL uses its own hypercall interface to call back into userspace when
the guest wants to do something.

So let's implement that as an exit reason, specify it with a CAP and
only really use it when userspace wants us to.

The only user of it so far is MOL.

Signed-off-by: Alexander Graf
Signed-off-by: Avi Kivity

Alexander Graf
2010-05-17 17:17:10 +0800

71fbfd5f3 KVM: Add support for enabling capabilities per-vcpu

Sometimes we don't want all capabilities to be available to all
our vcpus. One example for that is the OSI interface, implemented
in the next patch.

In order to have a generic mechanism for enabling capabilities
individually, this patch introduces a new ioctl that can be used
for this purpose. That way features we don't want in all guests or
userspace configurations can just not be enabled and we're good.

Signed-off-by: Alexander Graf
Signed-off-by: Avi Kivity

Alexander Graf
2010-05-17 17:17:09 +0800

18978768d KVM: PPC: Allow userspace to unset the IRQ line

Userspace can tell us that it wants to trigger an interrupt. But
so far it can't tell us that it wants to stop triggering one.

So let's interpret the parameter to the ioctl that we have anyways
to tell us if we want to raise or lower the interrupt line.

Signed-off-by: Alexander Graf

v2 -> v3:

- Add CAP for unset irq

Signed-off-by: Avi Kivity

Alexander Graf
2010-05-17 17:16:51 +0800

25 Apr, 2010

3 commits

a1efbe77c KVM: x86: Add support for saving&restoring debug registers

So far user space was not able to save and restore debug registers for
migration or after reset. Plug this hole.

Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity

Jan Kiszka
2010-04-25 17:39:10 +0800

48005f64d KVM: x86: Save&restore interrupt shadow mask

The interrupt shadow created by STI or MOV-SS-like operations is part of
the VCPU state and must be preserved across migration. Transfer it in
the spare padding field of kvm_vcpu_events.interrupt.

As a side effect we now have to make vmx_set_interrupt_shadow robust
against both shadow types being set. Give MOV SS a higher priority and
skip STI in that case to avoid that VMX throws a fault on next entry.
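
The priority rule can be sketched as a small normalisation step. The flag names and values below are illustrative assumptions for this sketch, not a statement of the kernel's definitions; the function only shows "prefer MOV SS, otherwise STI, never both".

```c
#include <stdint.h>

/* Illustrative flag values (assumed for this sketch). */
#define SHADOW_INT_MOV_SS UINT32_C(0x01)
#define SHADOW_INT_STI    UINT32_C(0x02)

/* Reduce a shadow mask to at most one shadow type, preferring MOV SS. */
uint32_t normalize_shadow(uint32_t mask)
{
    if (mask & SHADOW_INT_MOV_SS)
        return SHADOW_INT_MOV_SS;   /* MOV SS wins when both are set */
    if (mask & SHADOW_INT_STI)
        return SHADOW_INT_STI;
    return 0;                       /* no interrupt shadow active */
}
```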

Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity

Jan Kiszka
2010-04-25 17:38:28 +0800

c10207fe8 KVM: PPC: Add capability for paired singles

We need to tell userspace that we can emulate paired single instructions.
So let's add a capability export.

Signed-off-by: Alexander Graf
Signed-off-by: Avi Kivity

Alexander Graf
2010-04-25 17:37:47 +0800

01 Mar, 2010

6 commits

d2be1651b KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP

This marks the guest single-step API improvement of 94fe45da and
91586a3b with a capability flag to allow reliable detection by user
space.

Signed-off-by: Jan Kiszka
Cc: stable@kernel.org (2.6.33)
Signed-off-by: Avi Kivity

Jan Kiszka
2010-03-01 23:36:14 +0800

ab9f4ecbb KVM: enable PCI multiple-segments for pass-through device

Enable an optional parameter (default 0), the PCI segment (or domain), in
addition to the BDF when assigning a PCI device to a guest.

Signed-off-by: Zhai Edwin
Acked-by: Chris Wright
Signed-off-by: Marcelo Tosatti

Zhai, Edwin
2010-03-01 23:36:06 +0800

c25bc1638 KVM: Implement NotifyLongSpinWait HYPER-V hypercall

Windows issues this hypercall after a guest has been spinning on a spinlock
for too many iterations.

Signed-off-by: Gleb Natapov
Signed-off-by: Vadim Rozenfeld
Signed-off-by: Avi Kivity

Gleb Natapov
2010-03-01 23:36:00 +0800

10388a071 KVM: Add HYPER-V apic access MSRs

Implement HYPER-V apic MSRs. The spec defines three MSRs that speed up
access to the EOI/TPR/ICR apic registers for PV guests.

Signed-off-by: Gleb Natapov
Signed-off-by: Vadim Rozenfeld
Signed-off-by: Avi Kivity

Gleb Natapov
2010-03-01 23:36:00 +0800

55cd8e5a4 KVM: Implement bare minimum of HYPER-V MSRs

Minimum HYPER-V implementation should have GUEST_OS_ID, HYPERCALL and
VP_INDEX MSRs.

[avi: fix build on i386]

Signed-off-by: Gleb Natapov
Signed-off-by: Vadim Rozenfeld
Signed-off-by: Avi Kivity

Gleb Natapov
2010-03-01 23:35:57 +0800

bc6678a33 KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update

Use two steps for memslot deletion: mark the slot invalid (which stops
instantiation of new shadow pages for that slot, but allows destruction),
then instantiate the new empty slot.

Also simplifies kvm_handle_hva locking.

Signed-off-by: Marcelo Tosatti

Marcelo Tosatti
2010-03-01 23:35:44 +0800

09 Dec, 2009

1 commit

bcd6acd51 Merge commit 'origin/master' into next

Conflicts:
include/linux/kvm.h

Benjamin Herrenschmidt
2009-12-09 14:14:38 +0800

08 Dec, 2009

1 commit

e15a11370 powerpc/kvm: Sync guest visible MMU state

Currently userspace has no chance to find out which virtual address space we're
in and resolve addresses. While that is a big problem for migration, it's also
unpleasant when debugging, as gdb and the monitor don't work on virtual
addresses.

This patch exports enough of the MMU segment state to userspace to make
debugging work and thus also includes the groundwork for migration.

Signed-off-by: Alexander Graf
Signed-off-by: Benjamin Herrenschmidt

Alexander Graf
2009-12-08 13:02:50 +0800

03 Dec, 2009

7 commits

d7b0b5eb3 KVM: s390: Make psw available on all exits, not just a subset

This patch moves s390 processor status word into the base kvm_run
struct and keeps it up to date on all userspace exits.

The userspace ABI is broken by this; however, there are no applications
in the wild using this. A capability check is provided so users can
verify the updated API exists.

Cc: stable@kernel.org
Signed-off-by: Carsten Otte
Signed-off-by: Avi Kivity

Carsten Otte
2009-12-03 15:32:25 +0800

3cfc3092f KVM: x86: Add KVM_GET/SET_VCPU_EVENTS

This new IOCTL exports all state related to exceptions, interrupts, and
NMIs that was so far invisible to user space. Together with appropriate user space
changes, this fixes sporadic problems of vmsave/restore, live migration
and system reset.

[avi: future-proof abi by adding a flags field]

Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity

Jan Kiszka
2009-12-03 15:32:25 +0800

65ac72640 KVM: VMX: Report unexpected simultaneous exceptions as internal errors

These happen when we trap an exception when another exception is being
delivered; we only expect these with MCEs and page faults. If something
unexpected happens, things probably went south and we're better off reporting
an internal error and freezing.

Signed-off-by: Avi Kivity

Avi Kivity
2009-12-03 15:32:24 +0800

a9c7399d6 KVM: Allow internal errors reported to userspace to carry extra data

Usually userspace will freeze the guest so we can inspect it, but some
internal state is not available. Add extra data to internal error
reporting so we can expose it to the debugger. Extra data is specific
to the suberror.

Signed-off-by: Avi Kivity

Avi Kivity
2009-12-03 15:32:24 +0800

c54d2aba2 KVM: Reorder IOCTLs in main kvm.h

Obviously, people tend to extend this header at the bottom - more or
less blindly. Ensure that deprecated stuff gets its own corner again by
moving things to the top. Also add some comments and reindent IOCTLs to
make them more readable and reduce the risk of number collisions.

Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity

Jan Kiszka
2009-12-03 15:32:24 +0800

afbcf7ab8 KVM: allow userspace to adjust kvmclock offset

When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.

Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.

This proposed ioctl will allow userspace to inform us what is the monotonic
clock value in the source host, so we can keep the time skew short and,
more importantly, ensure the clock never goes backwards. Userspace may also need to trigger
the current data, since from the first migration onwards, it won't be
reflected by a simple call to clock_gettime() anymore.
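
The idea can be illustrated with a little arithmetic: the destination keeps an offset so that its own monotonic clock plus that offset continues from the source's value. This is a sketch of the concept with hypothetical function names, not the ioctl's implementation.

```c
#include <stdint.h>

/*
 * Offset to apply on the destination so that, at restore time,
 * dest_monotonic_ns + offset equals the clock value seen on the source.
 * The offset is signed because the destination's monotonic clock may be
 * larger or smaller than the source's.
 */
int64_t kvmclock_offset(uint64_t source_clock_ns, uint64_t dest_monotonic_ns)
{
    return (int64_t)(source_clock_ns - dest_monotonic_ns);
}

/* Guest-visible clock on the destination at a later monotonic time. */
uint64_t guest_clock(uint64_t dest_monotonic_ns, int64_t offset)
{
    return dest_monotonic_ns + (uint64_t)offset;
}
```

Because the offset is fixed at restore time and the destination's monotonic clock only moves forward, the guest-visible value continues from the source's value and does not go backwards, whichever host had the larger monotonic clock.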

[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]

Signed-off-by: Glauber Costa
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Glauber Costa
2009-12-03 15:32:19 +0800

ffde22ac5 KVM: Xen PV-on-HVM guest support

Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.

A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future. Thus this patch
adds special support for the Xen HVM MSR.

I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files. When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.

I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.

[jan: fix i386 build warning]
[avi: future proof abi with a flags field]

Signed-off-by: Ed Swierk
Signed-off-by: Jan Kiszka
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Ed Swierk
2009-12-03 15:32:18 +0800