Eric Lee / linux-smarc-t335x-v3.2

12 Jan, 2011

1 commit

344d9588a KVM: Add PV MSR to enable asynchronous page faults delivery.

Guest enables async PF vcpu functionality using this MSR.

Reviewed-by: Rik van Riel
Signed-off-by: Gleb Natapov
Signed-off-by: Marcelo Tosatti

Gleb Natapov
2011-01-12 17:23:12 +0800

24 Oct, 2010

2 commits

7b4203e8c KVM: PPC: Expose level based interrupt cap

Now that we have all the level interrupt magic in place, let's
expose the capability to user space, so it can make use of it!

Signed-off-by: Alexander Graf

Alexander Graf
2010-10-24 16:52:19 +0800
15711e9c9 KVM: PPC: Add get_pvinfo interface to query hypercall instructions

We need to tell the guest the opcodes that make up a hypercall through
interfaces that are controlled by userspace. So we need to add a call
for userspace to allow it to query those opcodes so it can pass them
on.

This is required because the hypercall opcodes can change based on
the hypervisor conditions. If we're running in hardware accelerated
hypervisor mode, a hypercall looks different from when we're running
without hardware acceleration.

Signed-off-by: Alexander Graf
Signed-off-by: Avi Kivity

Alexander Graf
2010-10-24 16:50:57 +0800

01 Aug, 2010

2 commits

a1f4d3950 KVM: Remove memory alias support

As advertised in feature-removal-schedule.txt. Equivalent support is provided
by overlapping memory regions.

Signed-off-by: Avi Kivity

Avi Kivity
2010-08-01 15:47:00 +0800
2d5b5a665 KVM: x86: XSAVE/XRSTOR live migration support

This patch enables save/restore of xsave state.

Signed-off-by: Sheng Yang
Signed-off-by: Marcelo Tosatti

Sheng Yang
2010-08-01 15:46:37 +0800

17 May, 2010

3 commits

ad0a048b0 KVM: PPC: Add OSI hypercall interface

MOL uses its own hypercall interface to call back into userspace when
the guest wants to do something.

So let's implement that as an exit reason, specify it with a CAP and
only really use it when userspace wants us to.

The only user of it so far is MOL.

Signed-off-by: Alexander Graf
Signed-off-by: Avi Kivity

Alexander Graf
2010-05-17 17:17:10 +0800
71fbfd5f3 KVM: Add support for enabling capabilities per-vcpu

Sometimes we don't want all capabilities to be available to all
our vcpus. One example of that is the OSI interface, implemented
in the next patch.

In order to have a generic mechanism in how to enable capabilities
individually, this patch introduces a new ioctl that can be used
for this purpose. That way features we don't want in all guests or
userspace configurations can just not be enabled and we're good.

Signed-off-by: Alexander Graf
Signed-off-by: Avi Kivity

Alexander Graf
2010-05-17 17:17:09 +0800
18978768d KVM: PPC: Allow userspace to unset the IRQ line

Userspace can tell us that it wants to trigger an interrupt. But
so far it can't tell us that it wants to stop triggering one.

So let's interpret the parameter to the ioctl that we have anyways
to tell us if we want to raise or lower the interrupt line.

Signed-off-by: Alexander Graf

v2 -> v3:

- Add CAP for unset irq
Signed-off-by: Avi Kivity

Alexander Graf
2010-05-17 17:16:51 +0800

25 Apr, 2010

3 commits

a1efbe77c KVM: x86: Add support for saving&restoring debug registers

So far user space was not able to save and restore debug registers for
migration or after reset. Plug this hole.

Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity

Jan Kiszka
2010-04-25 17:39:10 +0800
48005f64d KVM: x86: Save&restore interrupt shadow mask

The interrupt shadow created by STI or MOV-SS-like operations is part of
the VCPU state and must be preserved across migration. Transfer it in
the spare padding field of kvm_vcpu_events.interrupt.

As a side effect we now have to make vmx_set_interrupt_shadow robust
against both shadow types being set. Give MOV SS a higher priority and
skip STI in that case to avoid that VMX throws a fault on next entry.

Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity

Jan Kiszka
2010-04-25 17:38:28 +0800
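
The priority rule described in commit 48005f64d above (prefer MOV SS and skip STI when both shadow types are set) can be sketched as a small model. This is only an illustration of the rule as stated in the commit message, not the kernel's `vmx_set_interrupt_shadow`; the bit values and the function name `set_interrupt_shadow` are assumptions chosen for the example.

```python
# Illustrative bit values (assumptions, not taken from the commit message).
SHADOW_INT_MOV_SS = 0x01   # shadow requested by userspace: MOV-SS-like instruction
SHADOW_INT_STI = 0x02      # shadow requested by userspace: STI

INTR_STATE_STI = 0x1       # modelled hardware interruptibility bit for STI blocking
INTR_STATE_MOV_SS = 0x2    # modelled hardware interruptibility bit for MOV SS blocking


def set_interrupt_shadow(interruptibility: int, mask: int) -> int:
    """Return the new interruptibility state for a requested shadow mask.

    Only one blocking bit is ever set: MOV SS wins over STI, so a mask
    with both bits set cannot produce an invalid combined state.
    """
    # Start from a state with neither blocking bit set.
    interruptibility &= ~(INTR_STATE_STI | INTR_STATE_MOV_SS)
    if mask & SHADOW_INT_MOV_SS:
        interruptibility |= INTR_STATE_MOV_SS
    elif mask & SHADOW_INT_STI:
        interruptibility |= INTR_STATE_STI
    return interruptibility
```

Using `elif` rather than two independent `if` statements is what makes the function robust against a mask that carries both shadow types.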
c10207fe8 KVM: PPC: Add capability for paired singles

We need to tell userspace that we can emulate paired single instructions.
So let's add a capability export.

Signed-off-by: Alexander Graf
Signed-off-by: Avi Kivity

Alexander Graf
2010-04-25 17:37:47 +0800

01 Mar, 2010

6 commits

d2be1651b KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP

This marks the guest single-step API improvement of 94fe45da and
91586a3b with a capability flag to allow reliable detection by user
space.

Signed-off-by: Jan Kiszka
Cc: stable@kernel.org (2.6.33)
Signed-off-by: Avi Kivity

Jan Kiszka
2010-03-01 23:36:14 +0800
ab9f4ecbb KVM: enable PCI multiple-segments for pass-through device

Add an optional parameter (default 0), the PCI segment (or domain), in
addition to the BDF when assigning a PCI device to a guest.

Signed-off-by: Zhai Edwin
Acked-by: Chris Wright
Signed-off-by: Marcelo Tosatti

Zhai, Edwin
2010-03-01 23:36:06 +0800
c25bc1638 KVM: Implement NotifyLongSpinWait HYPER-V hypercall

Windows issues this hypercall after a guest has been spinning on a spinlock
for too many iterations.

Signed-off-by: Gleb Natapov
Signed-off-by: Vadim Rozenfeld
Signed-off-by: Avi Kivity

Gleb Natapov
2010-03-01 23:36:00 +0800
10388a071 KVM: Add HYPER-V apic access MSRs

Implement HYPER-V apic MSRs. The spec defines three MSRs that speed up
access to EOI/TPR/ICR apic registers for PV guests.

Signed-off-by: Gleb Natapov
Signed-off-by: Vadim Rozenfeld
Signed-off-by: Avi Kivity

Gleb Natapov
2010-03-01 23:36:00 +0800
55cd8e5a4 KVM: Implement bare minimum of HYPER-V MSRs

Minimum HYPER-V implementation should have GUEST_OS_ID, HYPERCALL and
VP_INDEX MSRs.

[avi: fix build on i386]

Signed-off-by: Gleb Natapov
Signed-off-by: Vadim Rozenfeld
Signed-off-by: Avi Kivity

Gleb Natapov
2010-03-01 23:35:57 +0800
bc6678a33 KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update

Use two steps for memslot deletion: mark the slot invalid (which stops
instantiation of new shadow pages for that slot, but allows destruction),
then instantiate the new empty slot.

Also simplifies kvm_handle_hva locking.

Signed-off-by: Marcelo Tosatti

Marcelo Tosatti
2010-03-01 23:35:44 +0800
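
The two-step memslot deletion described in commit bc6678a33 above can be pictured with a toy copy-and-publish model. This is a sketch under assumptions: `Slot`, `MemslotTable`, and the `published` history are hypothetical names for illustration, and the point where a kernel would wait for existing readers (an SRCU grace period) is only marked by a comment.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Slot:
    base_gfn: int
    npages: int
    invalid: bool = False


class MemslotTable:
    """Readers only ever see a complete, immutable snapshot of the slots."""

    def __init__(self, slots):
        self.current = tuple(slots)
        self.published = [self.current]  # history of snapshots, kept for inspection

    def _publish(self, slots):
        # Swap in a new snapshot. A kernel implementation would then wait
        # for readers of the old snapshot to finish before continuing.
        self.current = tuple(slots)
        self.published.append(self.current)

    def delete(self, index):
        # Step 1: mark the slot invalid, so no new users are instantiated
        # for it while existing ones can still be torn down.
        marked = list(self.current)
        marked[index] = replace(marked[index], invalid=True)
        self._publish(marked)

        # Step 2: install the new, empty slot.
        emptied = list(self.current)
        emptied[index] = Slot(base_gfn=0, npages=0)
        self._publish(emptied)
```

After `delete()`, the history holds three snapshots: the original, one with the slot marked invalid, and one with the empty slot in its place.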

09 Dec, 2009

1 commit

bcd6acd51 Merge commit 'origin/master' into next

Conflicts:
include/linux/kvm.h

Benjamin Herrenschmidt
2009-12-09 14:14:38 +0800

08 Dec, 2009

1 commit

e15a11370 powerpc/kvm: Sync guest visible MMU state

Currently userspace has no chance to find out which virtual address space we're
in and resolve addresses. While that is a big problem for migration, it's also
unpleasant when debugging, as gdb and the monitor don't work on virtual
addresses.

This patch exports enough of the MMU segment state to userspace to make
debugging work and thus also includes the groundwork for migration.

Signed-off-by: Alexander Graf
Signed-off-by: Benjamin Herrenschmidt

Alexander Graf
2009-12-08 13:02:50 +0800

03 Dec, 2009

7 commits

d7b0b5eb3 KVM: s390: Make psw available on all exits, not just a subset

This patch moves the s390 processor status word into the base kvm_run
struct and keeps it up to date on all userspace exits.

The userspace ABI is broken by this, however there are no applications
in the wild using this. A capability check is provided so users can
verify the updated API exists.

Cc: stable@kernel.org
Signed-off-by: Carsten Otte
Signed-off-by: Avi Kivity

Carsten Otte
2009-12-03 15:32:25 +0800
3cfc3092f KVM: x86: Add KVM_GET/SET_VCPU_EVENTS

This new IOCTL exports all state related to exceptions, interrupts, and
NMIs that was so far invisible to user space. Together with appropriate
user space changes, this fixes sporadic problems of vmsave/restore, live
migration and system reset.

[avi: future-proof abi by adding a flags field]

Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity

Jan Kiszka
2009-12-03 15:32:25 +0800
65ac72640 KVM: VMX: Report unexpected simultaneous exceptions as internal errors

These happen when we trap an exception when another exception is being
delivered; we only expect these with MCEs and page faults. If something
unexpected happens, things probably went south and we're better off reporting
an internal error and freezing.

Signed-off-by: Avi Kivity

Avi Kivity
2009-12-03 15:32:24 +0800
a9c7399d6 KVM: Allow internal errors reported to userspace to carry extra data

Usually userspace will freeze the guest so we can inspect it, but some
internal state is not available. Add extra data to internal error
reporting so we can expose it to the debugger. Extra data is specific
to the suberror.

Signed-off-by: Avi Kivity

Avi Kivity
2009-12-03 15:32:24 +0800
c54d2aba2 KVM: Reorder IOCTLs in main kvm.h

Obviously, people tend to extend this header at the bottom - more or
less blindly. Ensure that deprecated stuff gets its own corner again by
moving things to the top. Also add some comments and reindent IOCTLs to
make them more readable and reduce the risk of number collisions.

Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity

Jan Kiszka
2009-12-03 15:32:24 +0800
afbcf7ab8 KVM: allow userspace to adjust kvmclock offset

When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.

Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.

This proposed ioctl allows userspace to tell us the monotonic clock value
of the source host, so we can keep the time skew short and, more
importantly, ensure that time never goes backwards. Userspace may also need
to fetch the current data, since from the first migration onwards, it won't
be reflected by a simple call to clock_gettime() anymore.

[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]

Signed-off-by: Glauber Costa
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Glauber Costa
2009-12-03 15:32:19 +0800
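
The offset adjustment described in commit afbcf7ab8 above can be sketched as a small model: the guest clock is the host monotonic clock plus an offset, and setting the clock on the destination recomputes that offset so the guest continues from the source's value. `KvmClockModel`, `get_clock`, and `set_clock` are hypothetical names for this illustration, not the kernel interface, and the host clocks are plain callables standing in for `clock_gettime()`.

```python
class KvmClockModel:
    """Guest-visible clock = host monotonic clock + adjustable offset."""

    def __init__(self, host_monotonic_ns):
        self._host_monotonic_ns = host_monotonic_ns  # callable returning ns
        self._offset_ns = 0

    def get_clock(self) -> int:
        # What the guest currently sees; after a migration this differs
        # from the raw host monotonic clock.
        return self._host_monotonic_ns() + self._offset_ns

    def set_clock(self, guest_ns: int) -> None:
        # Choose the offset so the guest clock continues from guest_ns,
        # whether the new host's monotonic clock is larger or smaller.
        self._offset_ns = guest_ns - self._host_monotonic_ns()


if __name__ == "__main__":
    source = KvmClockModel(lambda: 5_000_000)   # long-running source host
    saved = source.get_clock()

    dest_now = [100]                             # freshly booted destination
    dest = KvmClockModel(lambda: dest_now[0])
    dest.set_clock(saved)
    dest_now[0] += 50                            # 50 ns pass on the destination
    print(dest.get_clock())                      # continues from the saved value
```

The model also shows why userspace needs a way to read the guest clock back: once an offset is in place, the host's own monotonic clock no longer matches what the guest sees.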
ffde22ac5 KVM: Xen PV-on-HVM guest support

Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.

A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future. Thus this patch
adds special support for the Xen HVM MSR.

I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files. When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.

I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.

[jan: fix i386 build warning]
[avi: future proof abi with a flags field]

Signed-off-by: Ed Swierk
Signed-off-by: Jan Kiszka
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Ed Swierk
2009-12-03 15:32:18 +0800
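
The MSR-write handling described in commit ffde22ac5 above (copy one page of the hypercall blob into guest memory) can be sketched as follows. The commit message does not say how the written value selects the page and the destination, so this sketch assumes the low 12 bits carry a page index into the blob and the remaining bits carry the page-aligned guest address. `handle_hvm_msr_write` and the `bytearray` standing in for guest memory are hypothetical.

```python
PAGE_SIZE = 4096


def handle_hvm_msr_write(data: int, blob: bytes, guest_memory: bytearray) -> bool:
    """Copy one page of `blob` into `guest_memory`; return False on a bad request.

    Assumed encoding of `data` (not stated in the commit message):
      low 12 bits  -> index of the blob page to copy
      other bits   -> page-aligned guest physical address to copy it to
    """
    page_num = data & (PAGE_SIZE - 1)
    page_addr = data & ~(PAGE_SIZE - 1)

    blob_pages = len(blob) // PAGE_SIZE
    if page_num >= blob_pages:
        return False  # guest asked for a page beyond the end of the blob
    if page_addr + PAGE_SIZE > len(guest_memory):
        return False  # destination outside the modelled guest memory

    start = page_num * PAGE_SIZE
    guest_memory[page_addr:page_addr + PAGE_SIZE] = blob[start:start + PAGE_SIZE]
    return True
```

Only a single page is copied per write, which matches the commit's description and keeps the in-kernel part of the feature small.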

10 Sep, 2009

11 commits

b927a3cec KVM: VMX: Introduce KVM_SET_IDENTITY_MAP_ADDR ioctl

KVM now allows the guest physical address of EPT's identity mapping page to be modified.

(change from v1, discard unnecessary check, change ioctl to accept parameter
address rather than value)

Signed-off-by: Sheng Yang
Signed-off-by: Marcelo Tosatti

Sheng Yang
2009-09-10 13:33:16 +0800
d34e6b175 KVM: add ioeventfd support

ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest. Host userspace can register any
arbitrary IO address with a corresponding eventfd and then pass the eventfd
to a specific end-point of interest for handling.

Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
"heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.

However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc). For these
patterns, the synchronous call is particularly expensive since we really
only want to simply get our notification transmitted asynchronously and
return as quickly as possible. All the synchronous infrastructure to ensure
proper data-dependencies are met in the normal IO case is just unnecessary
overhead for signalling. This adds additional computational load on the
system, as well as latency to the signalling path.

Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd. This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and peripheral components.

To test this theory, we built a test-harness called "doorbell". This
module has a function called "doorbell_ring()" which simply increments a
counter for each time the doorbell is signaled. It supports signalling
from either an eventfd, or an ioctl().

We then wired up two paths to the doorbell: One via QEMU via a registered
io region and through the doorbell ioctl(). The other is direct via
ioeventfd.

You can download this test harness here:

ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

The measured results are as follows:

qemu-mmio: 110000 iops, 9.09us rtt
ioeventfd-mmio: 200100 iops, 5.00us rtt
ioeventfd-pio: 367300 iops, 2.72us rtt

I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy. However, for now we
can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO,
and -350ns for HC, we get:

qemu-pio: 153139 iops, 6.53us rtt
ioeventfd-hc: 412585 iops, 2.37us rtt

These are just for fun, for now, until I can gather more data.

Here is a graph for your convenience:

http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

The conclusion to draw is that we save about 4us by skipping the userspace
hop.

--------------------

Signed-off-by: Gregory Haskins
Acked-by: Michael S. Tsirkin
Signed-off-by: Avi Kivity

Gregory Haskins
2009-09-10 13:33:12 +0800
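
The measured figures in commit d34e6b175 above are consistent with the round-trip time being the reciprocal of the throughput. The sketch below assumes exactly that (one request in flight at a time) and recomputes the round-trip times and the saving from the measured iops only; the function names are hypothetical.

```python
# Measured throughput from the commit message, in operations per second.
MEASURED_IOPS = {
    "qemu-mmio": 110_000,
    "ioeventfd-mmio": 200_100,
    "ioeventfd-pio": 367_300,
}


def rtt_us(iops: float) -> float:
    """Round-trip time in microseconds, assuming one request in flight."""
    return 1_000_000 / iops


def saving_us(slow: str, fast: str) -> float:
    """Round-trip time saved by taking the `fast` path instead of `slow`."""
    return rtt_us(MEASURED_IOPS[slow]) - rtt_us(MEASURED_IOPS[fast])


if __name__ == "__main__":
    for name, iops in MEASURED_IOPS.items():
        print(f"{name}: {rtt_us(iops):.2f}us rtt")
    print(f"saved by skipping userspace (mmio): "
          f"{saving_us('qemu-mmio', 'ioeventfd-mmio'):.2f}us")
```

Comparing the two MMIO paths gives a difference of roughly 4us, in line with the conclusion quoted in the commit message.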
e9f427573 KVM: PIT support for HPET legacy mode

When kvm is in hpet_legacy_mode, the hpet is providing the timer
interrupt and the pit should not be. So in legacy mode, the pit timer
is destroyed, but the *state* of the pit is maintained. So if kvm or
the guest tries to modify the state of the pit, this modification is
accepted, *except* that the timer isn't actually started. When we exit
hpet_legacy_mode, the current state of the pit (which is up to date
since we've been accepting modifications) is used to restart the pit
timer.

The saved_mode code in kvm_pit_load_count temporarily changes mode to
0xff in order to destroy the timer, but then restores the actual
value, again maintaining "current" state of the pit for possible later
reenablement.

[avi: add some reserved storage in the ioctl; make SET_PIT2 IOW]
[marcelo: fix memory corruption due to reserved storage]

Signed-off-by: Beth Kon
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Beth Kon
2009-09-10 13:33:12 +0800
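
The behaviour described in commit e9f427573 above (PIT state stays current in HPET legacy mode, but the timer is not started until legacy mode is left) can be modelled in a few lines. `PitChannelModel` and its methods are hypothetical names; this is a sketch of the described behaviour, not the `kvm_pit_load_count` implementation.

```python
class PitChannelModel:
    """Tracks PIT state separately from whether its timer is running."""

    def __init__(self):
        self.count = 0
        self.mode = 0
        self.hpet_legacy = False
        self.timer_running = False

    def load_count(self, count: int, mode: int) -> None:
        # Modifications are always accepted, so the state stays up to date.
        self.count = count
        self.mode = mode
        # In HPET legacy mode the HPET delivers the timer interrupt,
        # so the PIT timer is not started.
        self.timer_running = not self.hpet_legacy

    def set_hpet_legacy(self, enabled: bool) -> None:
        self.hpet_legacy = enabled
        # Entering legacy mode stops the timer; leaving it restarts the
        # timer from the current (possibly modified) state.
        self.load_count(self.count, self.mode)
```

Because `load_count` keeps recording the state while the timer is stopped, leaving legacy mode needs no extra bookkeeping: the saved state is already the state to restart from.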
2023a29cb KVM: remove old KVMTRACE support code

Return EOPNOTSUPP for KVM_TRACE_ENABLE/PAUSE/DISABLE ioctls.

Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Marcelo Tosatti
2009-09-10 13:33:03 +0800
3f5d18a96 KVM: Return to userspace on emulation failure

Instead of mindlessly retrying to execute the instruction, report the
failure to userspace.

Signed-off-by: Avi Kivity

Avi Kivity
2009-09-10 13:32:52 +0800
73880c80a KVM: Break dependency between vcpu index in vcpus array and vcpu_id.

Archs are free to use vcpu_id as they see fit. For x86 it is used as
the vcpu's apic id. A new ioctl is added to configure the boot vcpu id,
which was assumed to be 0 until now.

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2009-09-10 13:32:52 +0800
6a4a98397 KVM: Reorder ioctls in kvm.h

Somehow the VM ioctls got unsorted; re-sort them.

Signed-off-by: Avi Kivity

Avi Kivity
2009-09-10 13:32:50 +0800
e73333914 KVM: Downsize max support MSI-X entry to 256

We only trap one page for MSI-X entry now, so it's 4k/(128/8) = 256 entries at
most.

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-09-10 13:32:43 +0800
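
The limit in commit e73333914 above follows directly from the arithmetic in its message: one trapped 4k page divided by the size of one 128-bit MSI-X table entry. A minimal sketch of that calculation, with hypothetical constant and function names:

```python
PAGE_SIZE = 4096        # bytes in the single trapped page ("4k")
MSIX_ENTRY_BITS = 128   # size of one MSI-X table entry, in bits


def max_msix_entries(page_size: int = PAGE_SIZE,
                     entry_bits: int = MSIX_ENTRY_BITS) -> int:
    """Number of whole MSI-X table entries that fit in one trapped page."""
    entry_bytes = entry_bits // 8   # 128 bits -> 16 bytes
    return page_size // entry_bytes


if __name__ == "__main__":
    print(max_msix_entries())
```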
c5ff41ce6 KVM: Allow PIT emulation without speaker port

The in-kernel speaker emulation is only a dummy and also unneeded from
the performance point of view. Rather, it takes user space support to
generate sound output on the host, e.g. console beeps.

To allow this, introduce KVM_CREATE_PIT2 which controls in-kernel
speaker port emulation via a flag passed along the new IOCTL. It also
leaves room for future extensions of the PIT configuration interface.

Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity

Jan Kiszka
2009-09-10 13:32:41 +0800
721eecbf4 KVM: irqfd

KVM provides a complete virtual system environment for guests, including
support for injecting interrupts modeled after the real exception/interrupt
facilities present on the native platform (such as the IDT on x86).
Virtual interrupts can come from a variety of sources (emulated devices,
pass-through devices, etc) but all must be injected to the guest via
the KVM infrastructure. This patch adds a new mechanism to inject a specific
interrupt to a guest using a decoupled eventfd mechanism: Any legal signal
on the irqfd (using eventfd semantics from either userspace or kernel) will
translate into an injected interrupt in the guest at the next available
interrupt window.

Signed-off-by: Gregory Haskins
Signed-off-by: Avi Kivity

Gregory Haskins
2009-09-10 13:32:41 +0800
890ca9aef KVM: Add MCE support

The related MSRs are emulated. MCE capability is exported via
extension KVM_CAP_MCE and ioctl KVM_X86_GET_MCE_CAP_SUPPORTED. A new
vcpu ioctl command KVM_X86_SETUP_MCE is used to setup MCE emulation
such as the mcg_cap. MCE is injected via vcpu ioctl command
KVM_X86_SET_MCE. Extended machine-check state (MCG_EXT_P) and CMCI are
not implemented.

Signed-off-by: Huang Ying
Signed-off-by: Avi Kivity

Huang Ying
2009-09-10 13:32:39 +0800

10 Jun, 2009

3 commits

2f8b9ee14 KVM: Make kvm header C++ friendly

Two things needed fixing: 1) g++ does not allow a named structure type
within an anonymous union and 2) Avoid name clash between two padding
fields within the same struct by giving them different names as is
done elsewhere in the header.

Signed-off-by: Nathan Binkert
Signed-off-by: Avi Kivity

nathan binkert
2009-06-10 16:48:39 +0800
e56d532f2 KVM: Device assignment framework rework

After discussion with Marcelo, we decided to rework the device assignment framework
together. The old problem is that the kernel logic is unnecessarily complex. So Marcelo
suggested splitting it up in a more elegant way:

1. Split host IRQ assignment and guest IRQ assignment, and let userspace determine the
combination. Also discard the msi2intx parameter; userspace can specify
KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to
enable MSI to INTx conversion.

2. Split assign IRQ and deassign IRQ. Introduce two new ioctls:
KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ.

This patch also fixes the reversed _IOR vs _IOW in the definition (by deprecating the
old interface).

[avi: replace homemade bitcount() by hweight_long()]

Signed-off-by: Marcelo Tosatti
Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-06-10 16:48:29 +0800
d510d6cc6 KVM: Enable MSI-X for KVM assigned device

This patch finally enables MSI-X.

What we need for MSI-X:
1. Intercept one page in the MMIO region of the device, so that we can get the guest's
desired MSI-X table and set up the real one. This is now done by guest, and
transferred to the kernel using the ioctls KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY.

2. Information for incoming interrupts. One device can now have more than one
interrupt, and they are all handled by one workqueue structure, so we need to
identify them. The previous patch enabled gsi_msg_pending_bitmap to get this done.

3. Mapping from host IRQ to guest gsi as well as guest gsi to real MSI/MSI-X
message address/data. We use the same entry number for the host and guest here, so
that it's easy to find the correlated guest gsi.

What we lack for now:
1. The PCI spec says nothing can exist with the MSI-X table in the same page of
the MMIO region, except pending bits. The patch ignores pending bits as the first
step (so they are always 0 - no pending).

2. The PCI spec allows the MSI-X table to be changed dynamically. That means the OS
can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it. The patch
doesn't support this, and Linux also doesn't work in this way.

3. The patch doesn't implement MSI-X mask all and mask single entry. I will
implement the former in driver/pci/msi.c later. And for single entry, userspace
should have the responsibility to handle it.

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-06-10 16:48:23 +0800