Eric Lee / linux-smarc-t335x-v3.2

02 Aug, 2010

2 commits

4a994358b KVM: Convert mask notifiers to use irqchip/pin instead of gsi

Devices register mask notifiers by gsi, but the irqchip only knows about
irqchip/pin, so irqchip/pin must be converted to gsi before looking up
the mask notifier to call.

Signed-off-by: Gleb Natapov
Signed-off-by: Marcelo Tosatti

Gleb Natapov
2010-08-02 11:40:39 +0800
edba23e51 KVM: Return EFAULT from kvm ioctl when guest accesses bad area

Currently, if a guest accesses an address that belongs to a memory slot but
is not backed by a page, or the page is read-only, KVM treats it like an MMIO
access. Remove that capability. It was never part of the interface and should
not be relied upon.

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2010-08-02 11:40:33 +0800

01 Aug, 2010

7 commits

e36d96f7c KVM: Keep slot ID in memory slot structure

May be used for distinguishing between internal and user slots, or for sorting
slots in size order.

Signed-off-by: Avi Kivity

Avi Kivity
2010-08-01 15:47:07 +0800
0719837c0 KVM: Reduce atomic operations on vcpu->requests

Usually the vcpu->requests bitmap is sparse, so a test_and_clear_bit() for
each request generates a large number of unneeded atomics if a bit is set.

Replace with a separate test/clear sequence. This is safe since there is
no clear_bit() outside the vcpu thread.
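
A minimal userspace sketch of the test/clear idea, using C11 atomics rather
than the kernel bitops. The names demo_vcpu, demo_make_request() and
demo_check_request() are hypothetical and are not the kernel's; the sketch
assumes, as the message above states, that only the vcpu thread clears bits.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for vcpu->requests: one bit per pending request. */
typedef struct {
    atomic_ulong requests;
} demo_vcpu;

/* Any thread may post a request, so setting a bit stays atomic. */
static void demo_make_request(demo_vcpu *v, unsigned int req)
{
    atomic_fetch_or(&v->requests, 1UL << req);
}

/*
 * Called only from the vcpu thread. A plain load filters out the common
 * "bit not set" case, so the atomic read-modify-write runs only when
 * there is actually a bit to clear.
 */
static bool demo_check_request(demo_vcpu *v, unsigned int req)
{
    unsigned long mask = 1UL << req;

    if (!(atomic_load_explicit(&v->requests, memory_order_relaxed) & mask))
        return false;                       /* fast path: no atomic RMW */

    atomic_fetch_and(&v->requests, ~mask);  /* only the vcpu thread clears */
    return true;
}
```

With a sparse bitmap, most calls to demo_check_request() return on the plain
load and never touch the cache line exclusively.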

Signed-off-by: Avi Kivity

Avi Kivity
2010-08-01 15:47:06 +0800
a8eeb04a4 KVM: Add mini-API for vcpu->requests

Makes it a little more readable and hackable.

Signed-off-by: Avi Kivity

Avi Kivity
2010-08-01 15:47:05 +0800
a1f4d3950 KVM: Remove memory alias support

As advertised in feature-removal-schedule.txt. Equivalent support is provided
by overlapping memory regions.

Signed-off-by: Avi Kivity

Avi Kivity
2010-08-01 15:47:00 +0800
2acf923e3 KVM: VMX: Enable XSAVE/XRSTOR for guest

This patch enables the guest to use the XSAVE/XRSTOR instructions.

We assume that host_xcr0 uses all possible bits that the OS supports.

xcr0 is loaded the same way the fpu is handled: as late as we can.

Signed-off-by: Dexuan Cui
Signed-off-by: Sheng Yang
Reviewed-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Dexuan Cui
2010-08-01 15:46:31 +0800
d94e1dc9a KVM: Get rid of KVM_REQ_KICK

KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal
operation. This causes the fast path check for a clear vcpu->requests
to fail all the time, triggering tons of atomic operations.

Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.

Signed-off-by: Avi Kivity

Avi Kivity
2010-08-01 15:35:37 +0800
bf998156d KVM: Avoid killing userspace through guest SRAO MCE on unmapped pages

In the common case, a guest SRAO MCE causes the corresponding poisoned page
to be unmapped and SIGBUS to be sent to QEMU-KVM, which then relays
the MCE to the guest OS.

But it is reported that if the poisoned page is accessed in guest
after unmapping and before MCE is relayed to guest OS, userspace will
be killed.

The reason is as follows. Because poisoned page has been un-mapped,
guest access will cause guest exit and kvm_mmu_page_fault will be
called. kvm_mmu_page_fault can not get the poisoned page for fault
address, so kernel and user space MMIO processing is tried in turn. In
user MMIO processing, poisoned page is accessed again, then userspace
is killed by force_sig_info.

To fix the bug, kvm_mmu_page_fault sends a HWPOISON signal to QEMU-KVM
and does not try kernel and user space MMIO processing for a poisoned
page.

[xiao: fix warning introduced by avi]

Reported-by: Max Asbock
Signed-off-by: Huang Ying
Signed-off-by: Xiao Guangrong
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Huang Ying
2010-08-01 15:35:26 +0800

19 May, 2010

1 commit

0ee75bead KVM: Let vcpu structure alignment be determined at runtime

vmx and svm vcpus have different contents and therefore may have different
alignment requirements. Let each specify its required alignment.

Signed-off-by: Avi Kivity

Avi Kivity
2010-05-19 16:36:29 +0800

17 May, 2010

3 commits

2a059bf44 KVM: Get rid of dead function gva_to_page()

Nobody uses gva_to_page() anymore, so get rid of it.

Signed-off-by: Gui Jianfeng
Signed-off-by: Avi Kivity

Gui Jianfeng
2010-05-17 17:18:10 +0800
90d83dc3d KVM: use the correct RCU API for PROVE_RCU=y

The RCU/SRCU API has already changed for proving RCU usage.

I got the following dmesg when PROVE_RCU=y because we used the incorrect API.
This patch converts rcu_dereference() to srcu_dereference() or the related API.

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by qemu-system-x86/8550:
#0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x29/0x50 [kvm]
#1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]

stack backtrace:
Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27
Call Trace:
[] lockdep_rcu_dereference+0xaa/0xb3
[] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
[] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
[] __kvm_set_memory_region+0x636/0x6e2 [kvm]
[] kvm_set_memory_region+0x37/0x50 [kvm]
[] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
[] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
[] ? unlock_page+0x27/0x2c
[] ? __do_fault+0x3a9/0x3e1
[] kvm_vm_ioctl+0x364/0x38d [kvm]
[] ? up_read+0x23/0x3d
[] vfs_ioctl+0x32/0xa6
[] do_vfs_ioctl+0x495/0x4db
[] ? fget_light+0xc2/0x241
[] ? do_sys_open+0x104/0x116
[] ? retint_swapgs+0xe/0x13
[] sys_ioctl+0x47/0x6a
[] system_call_fastpath+0x16/0x1b

Signed-off-by: Lai Jiangshan
Signed-off-by: Avi Kivity

Lai Jiangshan
2010-05-17 17:18:01 +0800
660c22c42 KVM: limit the number of pages per memory slot

This patch limits the number of pages per memory slot so that
we no longer need to take extra care over type issues.

Signed-off-by: Takuya Yoshikawa
Signed-off-by: Marcelo Tosatti

Takuya Yoshikawa
2010-05-17 17:17:41 +0800

20 Apr, 2010

2 commits

e80e2a60f KVM: Increase NR_IOBUS_DEVS limit to 200

This patch increases the current hardcoded limit of NR_IOBUS_DEVS
from 6 to 200. We are hitting this limit when creating a guest with more
than 1 virtio-net device using vhost-net backend. Each virtio-net
device requires 2 such devices to service notifications from rx/tx queues.

Signed-off-by: Sridhar Samudrala
Signed-off-by: Avi Kivity

Sridhar Samudrala
2010-04-20 18:08:30 +0800
87bf6e7de KVM: fix the handling of dirty bitmaps to avoid overflows

An int is not large enough to store the size of a dirty bitmap.

This patch fixes this problem with the introduction of a wrapper
function to calculate the sizes of dirty bitmaps.

Note: in mark_page_dirty(), we have to consider the fact that
__set_bit() takes the offset as int, not long.
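
A hedged sketch of what such a size wrapper might look like in plain C. The
name demo_dirty_bitmap_bytes() and the DEMO_BITS_PER_LONG macro are
illustrative assumptions, not the kernel's definitions. The point is only
that all arithmetic is done in unsigned long, never in int.

```c
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Bytes needed for a bitmap holding one bit per page, rounded up to a
 * whole number of unsigned longs. Division and modulo are used for the
 * round-up so that npages close to ULONG_MAX cannot wrap around.
 */
static unsigned long demo_dirty_bitmap_bytes(unsigned long npages)
{
    unsigned long nlongs = npages / DEMO_BITS_PER_LONG
                         + (npages % DEMO_BITS_PER_LONG != 0);

    return nlongs * sizeof(unsigned long);
}
```

Callers would use this one helper everywhere a bitmap size is needed, instead
of repeating the rounding expression with an int intermediate.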

Signed-off-by: Takuya Yoshikawa
Signed-off-by: Marcelo Tosatti

Takuya Yoshikawa
2010-04-20 18:06:55 +0800

01 Mar, 2010

13 commits

70e335e16 KVM: Convert kvm->requests_lock to raw_spinlock_t

The code relies on kvm->requests_lock inhibiting preemption.

Noted by Jan Kiszka.

Signed-off-by: Avi Kivity

Avi Kivity
2010-03-01 23:36:13 +0800
8f0b1ab6f KVM: Introduce kvm_host_page_size

This patch introduces a generic function to find out the
host page size for a given gfn. This function is needed by
the kvm iommu code. This patch also simplifies the x86
host_mapping_level function.

Signed-off-by: Joerg Roedel
Signed-off-by: Avi Kivity

Joerg Roedel
2010-03-01 23:36:08 +0800
ab9f4ecbb KVM: enable PCI multiple-segments for pass-through device

Add an optional parameter (default 0), the PCI segment (or domain), in
addition to BDF when assigning a PCI device to a guest.

Signed-off-by: Zhai Edwin
Acked-by: Chris Wright
Signed-off-by: Marcelo Tosatti

Zhai, Edwin
2010-03-01 23:36:06 +0800
02daab21d KVM: Lazify fpu activation and deactivation

Defer fpu deactivation as much as possible - if the guest fpu is loaded, keep
it loaded until the next heavyweight exit (where we are forced to unload it).
This reduces unnecessary exits.

We also defer fpu activation on clts; while clts signals the intent to use the
fpu, we can't be sure the guest will actually use it.

Signed-off-by: Avi Kivity

Avi Kivity
2010-03-01 23:35:50 +0800
79fac95ec KVM: convert slots_lock to a mutex

Signed-off-by: Marcelo Tosatti

Marcelo Tosatti
2010-03-01 23:35:45 +0800
f656ce018 KVM: switch vcpu context to use SRCU

Signed-off-by: Marcelo Tosatti

Marcelo Tosatti
2010-03-01 23:35:45 +0800
e93f8a0f8 KVM: convert io_bus to SRCU

Signed-off-by: Marcelo Tosatti

Marcelo Tosatti
2010-03-01 23:35:45 +0800
a983fb238 KVM: x86: switch kvm_set_memory_alias to SRCU update

Using a similar two-step procedure as for memslots.

Signed-off-by: Marcelo Tosatti

Marcelo Tosatti
2010-03-01 23:35:45 +0800
bc6678a33 KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update

Use two steps for memslot deletion: mark the slot invalid (which stops
instantiation of new shadow pages for that slot, but allows destruction),
then instantiate the new empty slot.

Also simplifies kvm_handle_hva locking.

Signed-off-by: Marcelo Tosatti

Marcelo Tosatti
2010-03-01 23:35:44 +0800
3ad26d813 KVM: use gfn_to_pfn_memslot in kvm_iommu_map_pages

So it's possible to iommu map a memslot before making it visible to
kvm.

Signed-off-by: Marcelo Tosatti

Marcelo Tosatti
2010-03-01 23:35:44 +0800
506f0d6f9 KVM: introduce gfn_to_pfn_memslot

Which takes a memslot pointer instead of using kvm->memslots.

To be used by SRCU conversion later.

Signed-off-by: Marcelo Tosatti

Marcelo Tosatti
2010-03-01 23:35:44 +0800
f7784b8ec KVM: split kvm_arch_set_memory_region into prepare and commit

Required for SRCU conversion later.

Signed-off-by: Marcelo Tosatti

Marcelo Tosatti
2010-03-01 23:35:44 +0800
46a26bf55 KVM: modify memslots layout in struct kvm

Have a pointer to an allocated region inside struct kvm.

[alex: fix ppc book 3s]

Signed-off-by: Alexander Graf
Signed-off-by: Marcelo Tosatti

Marcelo Tosatti
2010-03-01 23:35:43 +0800

03 Dec, 2009

7 commits

d255f4f2b KVM: introduce kvm_vcpu_on_spin

Introduce kvm_vcpu_on_spin, to be used by VMX/SVM to yield processing
once the cpu detects pause-based looping.

Signed-off-by: "Zhai, Edwin"
Signed-off-by: Marcelo Tosatti

Zhai, Edwin
2009-12-03 15:32:17 +0800
10474ae89 KVM: Activate Virtualization On Demand

X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).

Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.

To circumvent this problem at least a bit, this patch introduces on-demand
activation of virtualization. This means that virtualization is instead
enabled on creation of the first virtual machine and disabled on
destruction of the last one.

So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.
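
The enable-on-first, disable-on-last scheme can be sketched as a usage
counter. This is a single-threaded illustration only: every demo_* name is
hypothetical, the boolean stands in for the real hardware state, and real
code would need locking around the counter.

```c
#include <stdbool.h>

static int  demo_usage_count;   /* number of live virtual machines */
static bool demo_hw_enabled;    /* stand-in for "virtualization is on" */

static void demo_hw_enable(void)  { demo_hw_enabled = true; }
static void demo_hw_disable(void) { demo_hw_enabled = false; }

/* The first VM to be created switches virtualization on. */
static void demo_create_vm(void)
{
    if (demo_usage_count++ == 0)
        demo_hw_enable();
}

/* The last VM to be destroyed switches it off again. */
static void demo_destroy_vm(void)
{
    if (demo_usage_count > 0 && --demo_usage_count == 0)
        demo_hw_disable();
}
```

While no VM exists, demo_hw_enabled stays false, which is the state in which
another hypervisor could use the extensions.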

Signed-off-by: Alexander Graf
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Alexander Graf
2009-12-03 15:32:10 +0800
bfd99ff5d KVM: Move assigned device code to own file

Signed-off-by: Avi Kivity

Avi Kivity
2009-12-03 15:32:09 +0800
136bdfeee KVM: Move irq ack notifier list to arch independent code

Mask irq notifier list is already there.

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2009-12-03 15:32:07 +0800
3e71f88bc KVM: Maintain back mapping from irqchip/pin to gsi

Maintain back mapping from irqchip/pin to gsi to speedup
interrupt acknowledgment notifications.
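
As an illustration of why a back mapping helps, here is a small sketch of a
table indexed by (irqchip, pin) that yields the gsi in constant time. The
dimensions and all demo_* names are assumptions made for the example and do
not reflect the actual KVM data structures.

```c
#define DEMO_NR_CHIPS 3
#define DEMO_NR_PINS  24

/* demo_chip_to_gsi[chip][pin] holds the gsi, or -1 when unmapped. */
static int demo_chip_to_gsi[DEMO_NR_CHIPS][DEMO_NR_PINS];

static int demo_backmap_valid(int chip, int pin)
{
    return chip >= 0 && chip < DEMO_NR_CHIPS &&
           pin >= 0 && pin < DEMO_NR_PINS;
}

/* Mark every (chip, pin) pair as unmapped. */
static void demo_backmap_init(void)
{
    for (int c = 0; c < DEMO_NR_CHIPS; c++)
        for (int p = 0; p < DEMO_NR_PINS; p++)
            demo_chip_to_gsi[c][p] = -1;
}

/* Record the mapping whenever a route is installed; -1 on bad indices. */
static int demo_backmap_set(int chip, int pin, int gsi)
{
    if (!demo_backmap_valid(chip, pin))
        return -1;
    demo_chip_to_gsi[chip][pin] = gsi;
    return 0;
}

/* O(1) lookup on the acknowledgment path; -1 means "no gsi". */
static int demo_pin_to_gsi(int chip, int pin)
{
    if (!demo_backmap_valid(chip, pin))
        return -1;
    return demo_chip_to_gsi[chip][pin];
}
```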

[avi: build fix on non-x86/ia64]

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2009-12-03 15:32:07 +0800
46e624b95 KVM: Change irq routing table to use gsi indexed array

Use gsi indexed array instead of scanning all entries on each interrupt
injection.
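
A simplified sketch of the gsi-indexed lookup, assuming at most one route per
gsi (the real table may hold several entries per gsi). The struct
demo_route type, DEMO_NR_GSI and the function names are hypothetical.

```c
#include <stddef.h>

#define DEMO_NR_GSI 8

struct demo_route {
    int gsi;
    int irqchip;
    int pin;
};

/* Table indexed directly by gsi; NULL means no route is installed. */
static const struct demo_route *demo_route_table[DEMO_NR_GSI];

/* Install a route in its gsi slot; -1 if the gsi is out of range. */
static int demo_route_add(const struct demo_route *r)
{
    if (r->gsi < 0 || r->gsi >= DEMO_NR_GSI)
        return -1;
    demo_route_table[r->gsi] = r;
    return 0;
}

/* Interrupt injection path: one bounds check and one array access. */
static const struct demo_route *demo_route_lookup(int gsi)
{
    if (gsi < 0 || gsi >= DEMO_NR_GSI)
        return NULL;
    return demo_route_table[gsi];
}
```

Compared with scanning a list of all entries, the cost per injection no
longer grows with the number of routes.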

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2009-12-03 15:32:07 +0800
1a6e4a8c2 KVM: Move irq sharing information to irqchip level

This removes the assumption that the maximum number of GSIs is smaller than
the number of pins. Sharing is tracked at the pin level, not the GSI level.

[avi: no PIC on ia64]

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2009-12-03 15:32:06 +0800

19 Sep, 2009

1 commit

fc5377668 tracing: Remove markers

Now that the last users of markers have migrated to the event
tracer we can kill off the (now orphan) support code.

Signed-off-by: Christoph Hellwig
Acked-by: Mathieu Desnoyers
Cc: Steven Rostedt
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar

Christoph Hellwig
2009-09-19 03:22:08 +0800

10 Sep, 2009

4 commits

a1b37100d KVM: Reduce runnability interface with arch support code

Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from
interface between general code and arch code. kvm_arch_vcpu_runnable()
checks for interrupts instead.

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2009-09-10 13:33:13 +0800
0b71785dc KVM: Move kvm_cpu_get_interrupt() declaration to x86 code

It is implemented only by x86.

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2009-09-10 13:33:13 +0800
d34e6b175 KVM: add ioeventfd support

ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest. Host userspace can register any
arbitrary IO address with a corresponding eventfd and then pass the eventfd
to a specific end-point of interest for handling.

Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
"heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.

However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc). For these
patterns, the synchronous call is particularly expensive since we really
only want to simply get our notification transmitted asynchronously and
return as quickly as possible. All the synchronous infrastructure to ensure
proper data-dependencies are met in the normal IO case is just unnecessary
overhead for signalling. This adds additional computational load on the
system, as well as latency to the signalling path.

Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd. This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and peripheral components.
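
For readers unfamiliar with eventfd, the following userspace sketch shows
only the signalling primitive that the trigger point would fire: a 64-bit
counter that writers increment and a reader drains. It does not register
anything with KVM. The demo_* helper names are hypothetical; eventfd(),
read() and write() are the standard Linux calls.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a doorbell: an eventfd whose counter starts at zero. */
static int demo_doorbell_create(void)
{
    return eventfd(0, EFD_NONBLOCK);
}

/* Ring the doorbell once by adding 1 to the counter. */
static int demo_doorbell_ring(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Consume all pending rings. Returns how many times the doorbell was
 * rung since the last drain, or -1 if nothing is pending (the read
 * fails with EAGAIN because the descriptor is non-blocking).
 */
static int64_t demo_doorbell_drain(int fd)
{
    uint64_t count;

    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

Because several rings collapse into one counter value, the consumer can fall
behind without losing notifications, which suits a pure trigger.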

To test this theory, we built a test-harness called "doorbell". This
module has a function called "doorbell_ring()" which simply increments a
counter for each time the doorbell is signaled. It supports signalling
from either an eventfd, or an ioctl().

We then wired up two paths to the doorbell: One via QEMU via a registered
io region and through the doorbell ioctl(). The other is direct via
ioeventfd.

You can download this test harness here:

ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

The measured results are as follows:

qemu-mmio: 110000 iops, 9.09us rtt
ioeventfd-mmio: 200100 iops, 5.00us rtt
ioeventfd-pio: 367300 iops, 2.72us rtt

I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy. However, for now we
can extrapolate from the NULLIO runs (+2.56us for MMIO and -350ns for HC),
which gives:

qemu-pio: 153139 iops, 6.53us rtt
ioeventfd-hc: 412585 iops, 2.37us rtt

These are just for fun, for now, until I can gather more data.

Here is a graph for your convenience:

http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

The conclusion to draw is that we save about 4us by skipping the userspace
hop.
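
The measured iops and rtt figures above are related by a simple reciprocal,
assuming one request in flight at a time (an assumption of this sketch, though
the measured numbers are consistent with it). The helper names are
hypothetical.

```c
/* Round-trip time in microseconds for a loop with one request in flight. */
static double demo_rtt_us(double iops)
{
    return 1e6 / iops;
}

/* Microseconds saved per operation when moving from the slow to the fast path. */
static double demo_saving_us(double slow_iops, double fast_iops)
{
    return demo_rtt_us(slow_iops) - demo_rtt_us(fast_iops);
}
```

For example, demo_rtt_us(110000) is roughly 9.09, and
demo_saving_us(110000, 200100) is roughly 4.09, which matches the "about 4us"
conclusion for the MMIO case.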

--------------------

Signed-off-by: Gregory Haskins
Acked-by: Michael S. Tsirkin
Signed-off-by: Avi Kivity

Gregory Haskins
2009-09-10 13:33:12 +0800
090b7aff2 KVM: make io_bus interface more robust

Today kvm_io_bus_register_dev() returns void and will internally BUG_ON
if it fails. We want to create dynamic MMIO/PIO entries driven from
userspace later in the series, so we need to enhance the code to be more
robust with the following changes:

1) Add a return value to the registration function
2) Fix up all the callsites to check the return code, handle any
failures, and percolate the error up to the caller.
3) Add an unregister function that collapses holes in the array
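
The three changes can be illustrated with a small fixed-size device array.
This is a sketch under assumptions: struct demo_bus, DEMO_NR_DEVS and both
functions are hypothetical, and the choice of filling a hole with the last
entry is one possible way to collapse it, not necessarily the patch's.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_NR_DEVS 4

struct demo_bus {
    int   dev_count;
    void *devs[DEMO_NR_DEVS];
};

/* Registration reports failure to the caller instead of crashing. */
static int demo_bus_register(struct demo_bus *bus, void *dev)
{
    if (bus->dev_count >= DEMO_NR_DEVS)
        return -ENOSPC;
    bus->devs[bus->dev_count++] = dev;
    return 0;
}

/* Unregistration keeps the array dense so lookups never see a hole. */
static int demo_bus_unregister(struct demo_bus *bus, void *dev)
{
    for (int i = 0; i < bus->dev_count; i++) {
        if (bus->devs[i] == dev) {
            /* Move the last entry into the freed slot. */
            bus->devs[i] = bus->devs[--bus->dev_count];
            bus->devs[bus->dev_count] = NULL;
            return 0;
        }
    }
    return -ENOENT;
}
```

A caller would check the return value of demo_bus_register() and pass any
error up, which is what item 2 asks of every call site.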

Signed-off-by: Gregory Haskins
Acked-by: Michael S. Tsirkin
Signed-off-by: Avi Kivity

Gregory Haskins
2009-09-10 13:33:12 +0800