22 May, 2011
1 commit
-
KVM does not hold any references to RCU-protected data when it switches
the CPU into guest mode. In fact, switching to guest mode is very similar
to exiting to userspace from an RCU point of view. In addition, the CPU
may stay in guest mode for quite a long time (up to one time slice). Let's
treat guest mode as a quiescent state, just like we do with user-mode
execution.

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity
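A minimal sketch of the mechanism, assuming the rcu_virt_note_context_switch() RCU hook; the simplified kvm_guest_enter() body is illustrative, not the exact mainline code:

	/*
	 * Report a quiescent state right before entering guest mode, so a
	 * long stint in the guest does not stall RCU grace periods.
	 */
	static inline void kvm_guest_enter(void)
	{
		current->flags |= PF_VCPU;	/* this task now runs a vcpu */
		/* From RCU's viewpoint, entering the guest is like a
		 * context switch to userspace. */
		rcu_virt_note_context_switch(smp_processor_id());
	}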
11 May, 2011
4 commits
-
This patch avoids gcc issuing the following warning when KVM_MAX_VCPUS=1:

warning: array subscript is above array bounds

kvm_for_each_vcpu currently checks to see if the index for the vcpu is
valid /after/ loading it. We don't run into problems because the address
is still inside the enclosing struct kvm and we never dereference or write
to it, so this isn't a security issue.

The warning occurs when KVM_MAX_VCPUS=1 because the increment portion of
the loop will *always* cause the loop to load an invalid location, since
++idx will always be > 0.

This patch moves the load so that the check occurs before the load and
we don't run into the compiler warning.

Signed-off-by: Neil Brown
Signed-off-by: Jeff Mahoney
Signed-off-by: Avi Kivity
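The reordering can be sketched like this (a condensed rendering of the macro before and after; treat it as illustrative rather than the verbatim mainline definition):

	/* Before: the vcpu is loaded first, then idx is range-checked. */
	#define kvm_for_each_vcpu(idx, vcpup, kvm) \
		for (idx = 0, vcpup = kvm_get_vcpu(kvm, idx); \
		     idx < atomic_read(&kvm->online_vcpus) && vcpup; \
		     vcpup = kvm_get_vcpu(kvm, ++idx))

	/* After: idx is validated before kvm_get_vcpu() ever runs. */
	#define kvm_for_each_vcpu(idx, vcpup, kvm) \
		for (idx = 0; \
		     idx < atomic_read(&kvm->online_vcpus) && \
		     (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
		     idx++)

-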
Since SSE instructions can issue 16-byte MMIOs, we need to support them. We
can't increase the kvm_run mmio buffer size to 16 bytes without breaking
compatibility, so instead we break the large MMIOs into two smaller 8-byte
ones. Since the bus is 64-bit, we aren't breaking any atomicity guarantees.

Signed-off-by: Avi Kivity
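The splitting can be pictured roughly as follows (the function names are placeholders for the emulator's read/write path, not the actual kvm identifiers):

	/* Chop an MMIO access into bus-width (8-byte) pieces. */
	static int emulate_mmio_rw(struct kvm_vcpu *vcpu, gpa_t gpa,
				   void *val, int bytes, bool write)
	{
		while (bytes > 8) {
			int rc = emulate_mmio_one(vcpu, gpa, val, 8, write);

			if (rc)
				return rc;
			gpa += 8;
			val += 8;	/* void * arithmetic, a gcc extension */
			bytes -= 8;
		}
		return emulate_mmio_one(vcpu, gpa, val, bytes, write);
	}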
-
This reverts commit f86368493ec038218e8663cc1b6e5393cd8e008a.
Simpler fix to follow.
Signed-off-by: Marcelo Tosatti
-
We can get the memslot id directly from memslot->id.
Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity
18 Mar, 2011
5 commits
-
The interrupt injection logic looks something like

  if an nmi is pending, and nmi injection allowed
      inject nmi
  if an nmi is pending
      request exit on nmi window

The problem is that "nmi is pending" can be set asynchronously by
the PIT; if it happens to fire between the two if statements, we
will request an nmi window even though nmi injection is allowed. On
SVM, this has disastrous results, since it causes eflags.TF to be
set in random guest code.

The fix is simple; make nmi_pending synchronous using the standard
vcpu->requests mechanism; this ensures the code above is completely
synchronous wrt nmi_pending.

Signed-off-by: Avi Kivity
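A sketch of the request-based scheme (condensed; the surrounding code is simplified, so treat the exact lines as illustrative):

	/* PIT side, possibly on another cpu: only flag the request. */
	kvm_make_request(KVM_REQ_NMI, vcpu);
	kvm_vcpu_kick(vcpu);

	/* vcpu thread, at one well-defined point before injection: */
	if (kvm_check_request(KVM_REQ_NMI, vcpu))
		vcpu->arch.nmi_pending = true;
	/* From here on, nmi_pending cannot change under our feet. */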
-
Instead of sleeping in kvm_vcpu_on_spin, which can cause gigantic
slowdowns of certain workloads, we use yield_to to get
another VCPU in the same KVM guest to run sooner.

This seems to give a 10-15% speedup in certain workloads.
Signed-off-by: Rik van Riel
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
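The directed yield can be sketched like this (condensed from the real loop; the round-robin starting point and fairness bookkeeping are omitted):

	void kvm_vcpu_on_spin(struct kvm_vcpu *me)
	{
		struct kvm *kvm = me->kvm;
		struct kvm_vcpu *vcpu;
		int i;

		kvm_for_each_vcpu(i, vcpu, kvm) {
			struct task_struct *task;

			if (vcpu == me)
				continue;
			/* vcpu->pid is tracked per vcpu run (next entry). */
			task = get_pid_task(vcpu->pid, PIDTYPE_PID);
			if (!task)
				continue;
			/* Donate our timeslice; preempt the current owner. */
			if (yield_to(task, 1)) {
				put_task_struct(task);
				break;
			}
			put_task_struct(task);
		}
	}

-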
Keep track of which task is running a KVM vcpu. This helps us
figure out later what task to wake up if we want to boost a
vcpu that got preempted.

Unfortunately there are no guarantees that the same task
always keeps the same vcpu, so we can only track the task
across a single "run" of the vcpu.

Signed-off-by: Rik van Riel
Signed-off-by: Avi Kivity
-
Now that we have 'vcpu->mode' to judge whether we need to send an IPI to
other CPUs, the check is exact, so testing the request bit is needless,
and we can drop the spinlock along with it.

Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity
-
Currently we keep track of only two states: guest mode and host
mode. This patch adds an "exiting guest mode" state that tells
us that an IPI will happen soon, so unless we need to wait for the
IPI, we can avoid it completely.

Also:
1. There is no need to atomically read/write ->mode in the vcpu's own thread.
2. Reorganize struct kvm_vcpu to put ->mode and ->requests
   in the same cache line explicitly.

Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity
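The three-state dance looks essentially like this (a condensed sketch of the mode states and the kick path):

	enum {
		OUTSIDE_GUEST_MODE,
		IN_GUEST_MODE,
		EXITING_GUEST_MODE,
	};

	/* Atomically claim the right to send the exit IPI. */
	static int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
	{
		return cmpxchg(&vcpu->mode, IN_GUEST_MODE, EXITING_GUEST_MODE);
	}

	/* Kick path: only the winner of the cmpxchg sends the IPI;
	 * anyone who sees EXITING_GUEST_MODE knows one is on its way. */
	int me = get_cpu();
	if (kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE)
		if (vcpu->cpu != me)
			smp_send_reschedule(vcpu->cpu);
	put_cpu();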
12 Jan, 2011
13 commits
-
Make it available for all archs.
Signed-off-by: Avi Kivity
-
Large page information has two elements, but one of them, write_count, alone
is accessed by a helper function.

This patch replaces that helper function with a more generic one which returns
the newly named kvm_lpage_info structure, and uses it to access the other
element, rmap_pde.

Signed-off-by: Takuya Yoshikawa
Signed-off-by: Avi Kivity
-
Quote from Avi:
| I don't think we need to flush immediately; set a "tlb dirty" bit somewhere
| that is cleared when we flush the tlb. kvm_mmu_notifier_invalidate_page()
| can consult the bit and force a flush if set.

Signed-off-by: Xiao Guangrong
Signed-off-by: Marcelo Tosatti
-
KVM compilation fails with the following warning:

include/linux/kvm_host.h: In function 'kvm_irq_routing_update':
include/linux/kvm_host.h:679:2: error: 'struct kvm' has no member named 'irq_routing'

That function is only used, and only reasonable to have, on systems that
implement an in-kernel interrupt chip. PPC doesn't.

Fix by #ifdef'ing it out when no irqchip is available.

Signed-off-by: Alexander Graf
Signed-off-by: Alexander Graf
Signed-off-by: Avi Kivity
-
Store the irq routing table pointer in the irqfd object,
and use that to inject MSIs directly without bouncing out to
a kernel thread.

While we touch this structure, rearrange the irqfd fields to make the
fastpath better packed for better cache utilization.

This also adds some comments about locking rules and rcu usage in the code.

Some notes on the design:
- Use a pointer into the rt instead of copying an entry,
  to make it possible to use rcu, thus side-stepping
  locking complexities. We also save some memory this way.
- The old workqueue code is still used for level irqs.
  I don't think we DTRT with level anyway; however,
  it seems easier to keep the code around, as
  it has been thought through and debugged, and to fix level later than
  to rip the code out and re-instate it later.

Signed-off-by: Michael S. Tsirkin
Acked-by: Marcelo Tosatti
Acked-by: Gregory Haskins
Signed-off-by: Avi Kivity
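The fastpath amounts to roughly the following (condensed from the wakeup handler; treat the snippet as a sketch):

	struct kvm_kernel_irq_routing_entry *irq;

	rcu_read_lock();
	irq = rcu_dereference(irqfd->irq_entry);
	if (irq)
		/* MSI injection never sleeps, so do it inline. */
		kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
	else
		/* Routing changed under us: fall back to the workqueue. */
		schedule_work(&irqfd->inject);
	rcu_read_unlock();

-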
Cosmetic change, but it helps to correlate IRQs with PCI devices.
Acked-by: Alex Williamson
Acked-by: Michael S. Tsirkin
Signed-off-by: Jan Kiszka
Signed-off-by: Marcelo Tosatti
-
This improves the IRQ forwarding for assigned devices: by using the
kernel's threaded IRQ scheme, we can get rid of the latency-prone work
queue and simplify the code in the same run.

Moreover, we no longer have to hold assigned_dev_lock while raising the
guest IRQ, which can be a lengthy operation as we may have to iterate
over all VCPUs. The lock is now only used for synchronizing masking vs.
unmasking of INTx-type IRQs, and is thus renamed to intx_lock.

Acked-by: Alex Williamson
Acked-by: Michael S. Tsirkin
Signed-off-by: Jan Kiszka
Signed-off-by: Marcelo Tosatti
-
IA64 support forces us to abstract the allocation of the kvm structure.
But instead of mixing this up with arch-specific initialization and
doing the same on destruction, split both steps. This allows us to move
generic destruction calls into generic code.

It also fixes error clean-up on failures of kvm_create_vm for IA64.

Signed-off-by: Jan Kiszka
Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity
-
Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate a bitmap by
vmalloc() which will be used in the next logging, and this has been causing
bad effects on VGA and live migration: vmalloc() consumes extra system time,
triggers tlb flushes, etc.

This patch resolves the issue by pre-allocating one more bitmap and switching
between the two bitmaps during dirty logging.

Performance improvement:
I measured performance for the case of VGA updates by trace-cmd.
The result was 1.5 times faster than the original one.

In the case of live migration, the improvement ratio depends on the workload
and the guest memory size. In general, the larger the memory size is, the more
benefit we get.

Note:
This does not change other architectures' logic, but the allocation size
becomes twice as large. This will increase the actual memory consumption only
when the new size changes the number of pages allocated by vmalloc().

Signed-off-by: Takuya Yoshikawa
Signed-off-by: Fernando Luis Vazquez Cao
Signed-off-by: Marcelo Tosatti
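The bitmap flip can be sketched as follows (field names such as dirty_bitmap_head are taken loosely from the patch; consider the snippet illustrative):

	/* Two halves of one allocation; flip which half is live. */
	n = kvm_dirty_bitmap_bytes(memslot);
	old_bitmap = memslot->dirty_bitmap;
	if (old_bitmap == memslot->dirty_bitmap_head)
		new_bitmap = memslot->dirty_bitmap_head + n / sizeof(long);
	else
		new_bitmap = memslot->dirty_bitmap_head;
	memset(new_bitmap, 0, n);
	memslot->dirty_bitmap = new_bitmap;	/* new faults mark this one */

	/* Report the frozen bitmap to userspace, no vmalloc() needed. */
	if (copy_to_user(log->dirty_bitmap, old_bitmap, n))
		return -EFAULT;

-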
As suggested by Andrea, pass the r/w error code to gup(), upgrading a read
fault to writable if the host pte allows it.

Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
-
The guest enables async PF vcpu functionality using this MSR.
Reviewed-by: Rik van Riel
Signed-off-by: Gleb Natapov
Signed-off-by: Marcelo Tosatti
-
Keep track of memslot changes by keeping a generation number in the memslots
structure. Provide a kvm_write_guest_cached() function that skips the
gfn_to_hva() translation if the memslots were not changed since the previous
invocation.

Acked-by: Rik van Riel
Signed-off-by: Gleb Natapov
Signed-off-by: Marcelo Tosatti
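The cached write works roughly like this (simplified from the gfn_to_hva_cache code; error handling is condensed):

	int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
				   void *data, unsigned long len)
	{
		/* A stale generation means the memslots changed: re-walk. */
		if (ghc->generation != kvm->memslots->generation)
			kvm_gfn_to_hva_cache_init(kvm, ghc, ghc->gpa);

		if (kvm_is_error_hva(ghc->hva))
			return -EFAULT;
		if (copy_to_user((void __user *)ghc->hva, data, len))
			return -EFAULT;
		mark_page_dirty(kvm, ghc->gpa >> PAGE_SHIFT);
		return 0;
	}

-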
If a guest accesses swapped-out memory, do not swap it in from vcpu thread
context. Schedule work to do the swapping and put the vcpu into a halted
state instead.

Interrupts will still be delivered to the guest, and if an interrupt
causes a reschedule, the guest will continue to run another task.

[avi: remove call to get_user_pages_noio(), nacked by Linus; this
makes everything synchronous again]

Acked-by: Rik van Riel
Signed-off-by: Gleb Natapov
Signed-off-by: Marcelo Tosatti
25 Oct, 2010
1 commit
-
* 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (321 commits)
KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages
KVM: Fix signature of kvm_iommu_map_pages stub
KVM: MCE: Send SRAR SIGBUS directly
KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED
KVM: fix typo in copyright notice
KVM: Disable interrupts around get_kernel_ns()
KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address
KVM: MMU: move access code parsing to FNAME(walk_addr) function
KVM: MMU: audit: check whether have unsync sps after root sync
KVM: MMU: audit: introduce audit_printk to cleanup audit code
KVM: MMU: audit: unregister audit tracepoints before module unloaded
KVM: MMU: audit: fix vcpu's spte walking
KVM: MMU: set access bit for direct mapping
KVM: MMU: cleanup for error mask set while walk guest page table
KVM: MMU: update 'root_hpa' out of loop in PAE shadow path
KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn()
KVM: x86: Fix constant type in kvm_get_time_scale
KVM: VMX: Add AX to list of registers clobbered by guest switch
KVM guest: Move a printk that's using the clock before it's ready
KVM: x86: TSC catchup mode
...
24 Oct, 2010
7 commits
-
Breaks otherwise if CONFIG_IOMMU_API is not set.
KVM-Stable-Tag.
Signed-off-by: Jan Kiszka
Signed-off-by: Marcelo Tosatti
-
This just changes some names to better reflect the usage they
will be given. Separated out to keep confusion to a minimum.

Signed-off-by: Zachary Amsden
Signed-off-by: Marcelo Tosatti
-
Instead of blindly attempting to inject an event before each guest entry,
check for a possible event first in vcpu->requests. Sites that can trigger
event injection are modified to set KVM_REQ_EVENT:

- interrupt, nmi window opening
- ppr updates
- i8259 output changes
- local apic irr changes
- rflags updates
- gif flag set
- event set on exit

This improves non-injecting entry performance, and sets the stage for
non-atomic injection.

Signed-off-by: Avi Kivity
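In code, the pattern is roughly the following (condensed from the entry path; treat the exact lines as a sketch):

	/* Any site that may make an event injectable flags the vcpu,
	 * e.g. the local APIC after an irr change: */
	kvm_make_request(KVM_REQ_EVENT, vcpu);

	/* vcpu_enter_guest() then only runs injection when flagged: */
	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win)
		inject_pending_event(vcpu);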
-
This patch introduces an mmu callback to translate gpa
addresses in the walk_addr code. This is later used to
translate l2_gpa addresses into l1_gpa addresses.

Signed-off-by: Joerg Roedel
Signed-off-by: Avi Kivity
-
There is a bug in this function: we call gfn_to_pfn() and
kvm_mmu_gva_to_gpa_read() in atomic context (kvm_mmu_audit() is called
under the protection of the mmu_lock spinlock).

This patch fixes it by:
- introducing gfn_to_pfn_atomic instead of gfn_to_pfn
- getting the mapped gfn from kvm_mmu_page_get_gfn()

It also adds a check for 'notrap' ptes in unsync/direct sps.
Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity
-
Introduce this function to get the pages of consecutive gfns; it can reduce
gup's overhead. Used by a later patch.

Signed-off-by: Xiao Guangrong
Signed-off-by: Marcelo Tosatti
-
Introduce hva_to_pfn_atomic(); it's the fast path and can be used in atomic
context. A later patch will use it.

Signed-off-by: Xiao Guangrong
Signed-off-by: Marcelo Tosatti
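A sketch of what the atomic variant boils down to (only the non-sleeping fast gup path is attempted; the error-pfn name below is an assumption):

	static pfn_t hva_to_pfn_atomic(struct kvm *kvm, unsigned long addr)
	{
		struct page *page;

		/* __get_user_pages_fast() walks page tables w/o sleeping. */
		if (__get_user_pages_fast(addr, 1, 1, &page) == 1)
			return page_to_pfn(page);

		return fault_pfn;	/* assumed error-pfn placeholder */
	}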
20 Aug, 2010
1 commit
-
Signed-off-by: Arnd Bergmann
Signed-off-by: Paul E. McKenney
Cc: Avi Kivity
Cc: Marcelo Tosatti
Reviewed-by: Josh Triplett
02 Aug, 2010
2 commits
-
Devices register mask notifiers using a gsi, but the irqchip knows about
irqchip/pin, so a conversion from irqchip/pin to gsi should be done before
looking for the mask notifier to call.

Signed-off-by: Gleb Natapov
Signed-off-by: Marcelo Tosatti
-
Currently, if the guest accesses an address that belongs to a memory slot
but is not backed by a page, or the page is read-only, KVM treats it like
an MMIO access. Remove that capability. It was never part of the interface
and should not be relied upon.

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity
01 Aug, 2010
6 commits
-
May be used for distinguishing between internal and user slots, or for
sorting slots in size order.

Signed-off-by: Avi Kivity
-
Usually the vcpu->requests bitmap is sparse, so a test_and_clear_bit() for
each request generates a large number of unneeded atomics if a bit is set.

Replace with a separate test/clear sequence. This is safe since there is
no clear_bit() outside the vcpu thread.

Signed-off-by: Avi Kivity
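The resulting helper is essentially the following (a helper in this style; names follow the vcpu->requests mini-API):

	static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
	{
		/* Cheap non-atomic read first; the sparse bitmap makes
		 * this fail (and cost nothing) most of the time. */
		if (test_bit(req, &vcpu->requests)) {
			clear_bit(req, &vcpu->requests);
			return true;
		}
		return false;
	}

Only the vcpu thread ever clears request bits, so the non-atomic test cannot race with a clear from elsewhere.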
-
Makes it a little more readable and hackable.
Signed-off-by: Avi Kivity
-
As advertised in feature-removal-schedule.txt. Equivalent support is provided
by overlapping memory regions.

Signed-off-by: Avi Kivity
-
This patch enables the guest to use the XSAVE/XRSTOR instructions.

We assume that host_xcr0 uses all the possible bits that the OS supports.
And we load xcr0 in the same way we handle the fpu - do it as late as we can.
Signed-off-by: Dexuan Cui
Signed-off-by: Sheng Yang
Reviewed-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
-
KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal
operation. This causes the fast path check for a clear vcpu->requests
to fail all the time, triggering tons of atomic operations.

Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.
Signed-off-by: Avi Kivity
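A condensed sketch of the replacement (close in spirit to the patch, but treat the details as illustrative):

	/* vcpu thread, around guest entry: */
	atomic_set(&vcpu->guest_mode, 1);
	/* ... enter and run the guest ... */
	atomic_set(&vcpu->guest_mode, 0);

	/* kick path: xchg tells us whether the vcpu really was in guest
	 * mode, so vcpu->requests stays all-zero in the common case. */
	if (atomic_xchg(&vcpu->guest_mode, 0))
		smp_send_reschedule(vcpu->cpu);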