28 Oct, 2008
1 commit
-
Every call of kvm_set_irq() should offer an irq_source_id, which is
allocated by kvm_request_irq_source_id(). Based on irq_source_id, we
identify the irq source and implement logical OR for shared level
interrupts.
The allocated irq_source_id can be freed by kvm_free_irq_source_id().
Currently, we support at most sizeof(unsigned long) different irq sources.
[Amit: - rebase to kvm.git HEAD
- move definition of KVM_USERSPACE_IRQ_SOURCE_ID to common file
- move kvm_request_irq_source_id to the update_irq ioctl]
[Xiantao: - Add kvm/ia64 stuff and make it work for kvm/ia64 guests]
Signed-off-by: Sheng Yang
Signed-off-by: Amit Shah
Signed-off-by: Xiantao Zhang
Signed-off-by: Avi Kivity
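The source-ID scheme described in this commit can be pictured with a small standalone model. This is only a sketch under stated assumptions, not the KVM code: `struct irq_model`, `request_irq_source_id()`, `free_irq_source_id()`, `set_irq()` and `NR_IRQ_LINES` are hypothetical names, and the limit on source IDs simply mirrors the figure quoted in the commit message.

```c
#include <assert.h>

/* Hypothetical model; none of these names are taken from the KVM sources. */
#define NR_IRQ_LINES    24
/* Limit as quoted in the commit message. */
#define MAX_IRQ_SOURCES ((int)sizeof(unsigned long))

struct irq_model {
    unsigned long sources_bitmap;           /* one bit per allocated source ID */
    unsigned long line_state[NR_IRQ_LINES]; /* per line: which sources assert it */
};

/* Allocate the lowest free source ID, or return -1 if none is left. */
static int request_irq_source_id(struct irq_model *m)
{
    for (int id = 0; id < MAX_IRQ_SOURCES; id++) {
        if (!(m->sources_bitmap & (1UL << id))) {
            m->sources_bitmap |= 1UL << id;
            return id;
        }
    }
    return -1;
}

/* Release a source ID and drop any level it was still asserting. */
static void free_irq_source_id(struct irq_model *m, int id)
{
    m->sources_bitmap &= ~(1UL << id);
    for (int irq = 0; irq < NR_IRQ_LINES; irq++)
        m->line_state[irq] &= ~(1UL << id);
}

/* Record one source's level; the line level is the logical OR of all sources. */
static int set_irq(struct irq_model *m, int source_id, int irq, int level)
{
    if (level)
        m->line_state[irq] |= 1UL << source_id;
    else
        m->line_state[irq] &= ~(1UL << source_id);
    return m->line_state[irq] != 0;
}
```

In this model a shared level-triggered line stays asserted until every source that raised it has lowered it again, which is the behaviour the logical OR is meant to provide.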
15 Oct, 2008
19 commits
-
Moving irqchip_in_kernel() from ioapic.h to irq.h.
Signed-off-by: Xiantao Zhang
Signed-off-by: Avi Kivity
-
Move the irq ack notification logic into common code and share it
with the ia64 side.
Signed-off-by: Xiantao Zhang
Signed-off-by: Avi Kivity
-
Add a kvm_ prefix to avoid polluting kernel's name space.
Signed-off-by: Xiantao Zhang
Signed-off-by: Avi Kivity
-
To share with other archs, this patch moves the device assignment
logic to common code.
Signed-off-by: Xiantao Zhang
Signed-off-by: Avi Kivity
-
Preparation for kvm/ia64 VT-d support.
Signed-off-by: Zhang xiantao
Signed-off-by: Avi Kivity
-
An assigned device could DMA to mmio pages, so the mmio pages also need
to be mapped into the VT-d page table.
Signed-off-by: Weidong Han
Signed-off-by: Avi Kivity
-
Currently "#include " is not needed in
virt/kvm/kvm_main.c.
Signed-off-by: Weidong Han
Signed-off-by: Avi Kivity
-
One of vcpu_setup responsibilities is to do mmu initialization.
However, we may fail in kvm_arch_vcpu_reset before we get the
chance to init the mmu. OTOH, vcpu_destroy will attempt to destroy the mmu,
triggering a bug. Keeping track of whether or not the mmu is initialized
would unnecessarily complicate things. Rather, we just return,
making sure any needed uninitialization is done before we return, in
case we fail.
Signed-off-by: Glauber Costa
Signed-off-by: Avi Kivity
-
Convert gfn_to_pfn to use get_user_pages_fast, which can do lockless
pagetable lookups on x86. Kernel compilation on 4-way guest is 3.7%
faster on VMX.
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
-
kvm_vm_fault is invoked with mmap_sem held in read mode. Since gfn_to_page
will be converted to get_user_pages_fast, which requires this lock NOT
to be held, switch to opencoded get_user_pages.
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
-
Based on a patch by: Kay, Allen M
This patch enables PCI device assignment based on VT-d support.
When a device is assigned to the guest, the guest memory is pinned and
the mapping is updated in the VT-d IOMMU.
[Amit: Expose KVM_CAP_IOMMU so we can check if an IOMMU is present
and also control enable/disable from userspace]
Signed-off-by: Kay, Allen M
Signed-off-by: Weidong Han
Signed-off-by: Ben-Ami Yassour
Signed-off-by: Amit Shah
Acked-by: Mark Gross
Signed-off-by: Avi Kivity
-
Offline or uninitialized vcpus can be executed if requested to perform
userspace work.
Follow Avi's suggestion to handle halted vcpus in the main loop,
simplifying kvm_emulate_halt(). Introduce a new vcpu->requests bit to
indicate events that promote state from halted to running.
Also standardize vcpu wake sites.
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
-
This is esoteric and only needed to break COW on MAP_SHARED mappings. Since
KVM no longer does these sorts of mappings, breaking COW on them is no longer
necessary.
Signed-off-by: Avi Kivity
-
Before enabling notify_acked_irq for ia64, leave the related APIs as
no-ops first.
Signed-off-by: Xiantao Zhang
Signed-off-by: Avi Kivity
-
Signed-off-by: Dave Hansen
Signed-off-by: Avi Kivity
-
Userspace may specify memory slots that are backed by mmio pages rather than
normal RAM. In some cases it is not enough to identify these mmio pages
by pfn_valid(). This patch adds a PageReserved check as well.
Signed-off-by: Ben-Ami Yassour
Signed-off-by: Muli Ben-Yehuda
Signed-off-by: Avi Kivity
-
Based on a patch from: Ben-Ami Yassour
which was based on a patch from: Amit Shah
Notify IRQ acking on PIC/APIC emulation. The previous patch missed two things:
- Edge triggered interrupts on IOAPIC
- PIC reset with IRR/ISR set should be equivalent to ack (LAPIC probably
needs something similar).
Signed-off-by: Marcelo Tosatti
CC: Amit Shah
CC: Ben-Ami Yassour
Signed-off-by: Avi Kivity
-
The current kvmtrace code uses get_cycles(), while the interpretation would be
easier using nanoseconds. ktime_get() should give at least the same
accuracy as get_cycles() on all architectures (even better on 32bit archs) but
in a better unit (e.g. comparable between hosts with different frequencies).
[avi: avoid ktime_t in public header]
Signed-off-by: Christian Ehrhardt
Acked-by: Christian Borntraeger
Signed-off-by: Avi Kivity
-
This patch fixes kvmtrace use on big endian systems. When using bit fields, the
compiler may lay the data out in a different order than expected when it is
written to a file.
This fixes it by using one variable instead of bit fields.
Signed-off-by: Jerone Young
Signed-off-by: Christian Ehrhardt
Signed-off-by: Avi Kivity
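The bit-field change can be illustrated by packing the fields into one 32-bit value with explicit shifts and masks, so that the bit positions no longer depend on how a compiler orders bit fields. This is a minimal sketch under assumed field widths; the macro and function names (`REC_EVENT_MASK`, `rec_pack()` and so on) are hypothetical and not taken from the kvmtrace headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-27 event, bits 28-30 extra count, bit 31 timestamp flag. */
#define REC_EVENT_MASK  0x0fffffffu
#define REC_EXTRA_SHIFT 28
#define REC_EXTRA_MASK  0x7u
#define REC_TS_SHIFT    31

/* Pack the three fields into a single word with fixed bit positions. */
static uint32_t rec_pack(uint32_t event, uint32_t extra, uint32_t has_ts)
{
    return (event & REC_EVENT_MASK)
         | ((extra & REC_EXTRA_MASK) << REC_EXTRA_SHIFT)
         | ((has_ts & 1u) << REC_TS_SHIFT);
}

/* Accessors that undo the packing. */
static uint32_t rec_event(uint32_t v)  { return v & REC_EVENT_MASK; }
static uint32_t rec_extra(uint32_t v)  { return (v >> REC_EXTRA_SHIFT) & REC_EXTRA_MASK; }
static uint32_t rec_has_ts(uint32_t v) { return v >> REC_TS_SHIFT; }
```

A reader of the trace file then only has to handle the byte order of one 32-bit word, rather than a compiler-specific bit-field layout.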
29 Jul, 2008
2 commits
-
Synchronize changes to host virtual addresses which are part of
a KVM memory slot to the KVM shadow mmu. This allows pte operations
like swapping, page migration, and madvise() to transparently work
with KVM.
Signed-off-by: Andrea Arcangeli
Signed-off-by: Avi Kivity
-
This allows reading memslots with only the mmu_lock held, for mmu
notifiers that run in atomic context and with mmu_lock held.
Signed-off-by: Andrea Arcangeli
Signed-off-by: Avi Kivity
25 Jul, 2008
1 commit
-
This patch just extends the anon_inode_getfd interface to take an additional
parameter with a flag value. The flag value is passed on to
get_unused_fd_flags in anticipation of use with the O_CLOEXEC flag.
No actual semantic changes here; the changed callers all pass 0 for now.
[akpm@linux-foundation.org: KVM fix]
Signed-off-by: Ulrich Drepper
Acked-by: Davide Libenzi
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Jul, 2008
11 commits
-
smp_call_function_mask() now complains when called in a preemptible context;
adjust its callers accordingly.
Signed-off-by: Avi Kivity
-
Flush the shadow mmu before removing regions to avoid stale entries.
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
-
This patch #ifdefs the bitmap array for dirty tracking. We don't have dirty
tracking on s390 today, and we'd love to use our storage keys to store the
dirty information for migration. Therefore, we won't need this array at all,
and given our limited amount of vmalloc space, keeping it limits the number of
guests we can run.
Signed-off-by: Carsten Otte
Signed-off-by: Avi Kivity
-
Currently kvmtrace is not portable. This prevents copying a
trace file from a big-endian target to a little-endian workstation for analysis.
In this patch, the kernel outputs metadata containing a magic number to the trace
log, and changes 64-bit words to be u64 instead of a pair of u32s.
Signed-off-by: Tan Li
Acked-by: Jerone Young
Acked-by: Hollis Blanchard
Signed-off-by: Avi Kivity
-
This patch adds all needed structures to coalesce MMIOs.
Until an architecture uses it, it is not compiled.
Coalesced MMIO introduces two ioctl()s to define the MMIO zones that
can be coalesced:
- KVM_REGISTER_COALESCED_MMIO registers a coalesced MMIO zone.
It takes one parameter (struct kvm_coalesced_mmio_zone) which defines
a memory area where MMIOs can be coalesced until the next switch to
user space. The maximum number of MMIO zones is KVM_COALESCED_MMIO_ZONE_MAX.
- KVM_UNREGISTER_COALESCED_MMIO cancels all registered zones inside
the given bounds (bounds are also given by struct kvm_coalesced_mmio_zone).
The userspace client can check kernel coalesced MMIO availability by asking
ioctl(KVM_CHECK_EXTENSION) for the KVM_CAP_COALESCED_MMIO capability.
The ioctl() call for KVM_CAP_COALESCED_MMIO will return 0 if not supported,
or the page offset where the ring buffer is stored.
The page offset depends on the architecture.
After an ioctl(KVM_RUN), the first page of the KVM memory mapping points to
a kvm_run structure. The offset given by KVM_CAP_COALESCED_MMIO is
an offset to the coalesced MMIO ring, expressed in units of PAGE_SIZE relative
to the start address of the kvm_run structure. The MMIO ring buffer
is defined by the structure kvm_coalesced_mmio_ring.
[akio: fix oops during guest shutdown]
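The address arithmetic and the ring discipline described in this message can be modelled in a few lines of standalone C. This is a sketch under assumptions, not the kernel or userspace implementation: `struct mmio_entry`, `struct mmio_ring`, `RING_SLOTS`, `locate_ring()`, `ring_push()` and `ring_drain()` are hypothetical stand-ins, and the real ring layout is whatever struct kvm_coalesced_mmio_ring defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 16 /* hypothetical capacity */

/* Hypothetical stand-ins for the ring and its entries. */
struct mmio_entry { uint64_t phys_addr; uint32_t len; uint8_t data[8]; };
struct mmio_ring  { uint32_t first; uint32_t last; struct mmio_entry entries[RING_SLOTS]; };

/* The ring lives page_offset pages after the start of the kvm_run mapping. */
static struct mmio_ring *locate_ring(void *run_base, size_t page_offset, size_t page_size)
{
    return (struct mmio_ring *)((char *)run_base + page_offset * page_size);
}

/* Producer side: queue one write, or return -1 if the ring is full or len is too large. */
static int ring_push(struct mmio_ring *ring, uint64_t addr, const void *data, uint32_t len)
{
    uint32_t next = (ring->last + 1) % RING_SLOTS;

    if (next == ring->first || len > sizeof(ring->entries[0].data))
        return -1;
    ring->entries[ring->last].phys_addr = addr;
    ring->entries[ring->last].len = len;
    memcpy(ring->entries[ring->last].data, data, len);
    ring->last = next;
    return 0;
}

/* Consumer side: copy out queued writes in order; returns how many were taken. */
static size_t ring_drain(struct mmio_ring *ring, struct mmio_entry *out, size_t max)
{
    size_t n = 0;

    while (ring->first != ring->last && n < max) {
        out[n++] = ring->entries[ring->first];
        ring->first = (ring->first + 1) % RING_SLOTS;
    }
    return n;
}
```

In this model the producer would be the side that queues writes to a registered zone, and the consumer would drain the ring after the next switch to user space.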
Signed-off-by: Laurent Vivier
Signed-off-by: Akio Takebe
Signed-off-by: Avi Kivity
-
Modify member in_range() of structure kvm_io_device to pass the length and the type
of the I/O (write or read).
This modification allows kvm_io_device to be used with coalesced MMIO.
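To see why the extra arguments help, consider a device that wants to claim only writes lying entirely inside its zone. The sketch below is a hypothetical model: `struct io_device_model` and `coalesced_in_range()` are illustrative names, and the "writes only, fully contained" policy is an assumption made for the example rather than something stated in this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an I/O device with a range callback. */
struct io_device_model {
    int (*in_range)(const struct io_device_model *dev,
                    uint64_t addr, int len, int is_write);
    uint64_t zone_start;
    uint64_t zone_size;
};

/* Assumed policy: claim only writes that fit entirely inside the zone. */
static int coalesced_in_range(const struct io_device_model *dev,
                              uint64_t addr, int len, int is_write)
{
    if (!is_write)
        return 0;
    return addr >= dev->zone_start &&
           addr + (uint64_t)len <= dev->zone_start + dev->zone_size;
}
```

With only an address to go on, a callback like this could not reject reads or accesses that straddle the end of the zone.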
Signed-off-by: Laurent Vivier
Signed-off-by: Avi Kivity
-
[avi: fix ia64 build breakage]
Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity
-
Obsoleted by the vmx-specific per-cpu list.
Signed-off-by: Avi Kivity
-
KVM turns off hardware virtualization extensions during reboot, in order
to disassociate the memory used by the virtualization extensions from the
processor, and in order to have the system in a consistent state.
Unfortunately virtual machines may still be running while this goes on,
and once virtualization extensions are turned off, any virtualization
instruction will #UD on execution.
Fix by adding an exception handler to virtualization instructions; if we get
an exception during reboot, we simply spin waiting for the reset to complete.
If it's a true exception, BUG() so we can have our stack trace.
Signed-off-by: Avi Kivity
-
This patch allows VMAs that contain no backing page to be used for guest
memory. This is useful for assigning mmio regions to a guest.
Signed-off-by: Anthony Liguori
Signed-off-by: Avi Kivity
-
kvm_dev_ioctl casts the arg value to void __user *, just to recast it
again to long. This seems unnecessary.
According to objdump the binary code on x86 is unchanged by this patch.
Signed-off-by: Christian Borntraeger
Signed-off-by: Avi Kivity
16 Jul, 2008
1 commit
-
Conflicts:
arch/powerpc/Kconfig
arch/s390/kernel/time.c
arch/x86/kernel/apic_32.c
arch/x86/kernel/cpu/perfctr-watchdog.c
arch/x86/kernel/i8259_64.c
arch/x86/kernel/ldt.c
arch/x86/kernel/nmi_64.c
arch/x86/kernel/smpboot.c
arch/x86/xen/smp.c
include/asm-x86/hw_irq_32.h
include/asm-x86/hw_irq_64.h
include/asm-x86/mach-default/irq_vectors.h
include/asm-x86/mach-voyager/irq_vectors.h
include/asm-x86/smp.h
kernel/Makefile
Signed-off-by: Ingo Molnar
06 Jul, 2008
1 commit
-
The "remote_irr" variable is used to indicate an interrupt
which has been received by the LAPIC, but not acked.
In our EOI handler, we unset remote_irr and re-inject the
interrupt if the interrupt line is still asserted.
However, we do not set remote_irr here, leading to a
situation where if kvm_ioapic_set_irq() is called, then we go
ahead and call ioapic_service(). This means that IRR is
re-asserted even though the interrupt is currently in service
(i.e. LAPIC IRR is cleared and ISR/TMR set).
The issue with this is that when the currently executing
interrupt handler finishes and writes LAPIC EOI, then TMR is
unset and EOI sent to the IOAPIC. Since IRR is now asserted,
but TMR is not, then when the second interrupt is handled,
no EOI is sent and if there is any pending interrupt, it is
not re-injected.
This fixes a hang only seen while running mke2fs -j on an
8Gb virtio disk backed by a fully sparse raw file, with
aliguori "avoid fragmented virtio-blk transfers by copying"
changes.
Signed-off-by: Mark McLoughlin
Acked-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
26 Jun, 2008
2 commits
-
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.
Acked-by: Jeremy Fitzhardinge
Reviewed-by: Paul E. McKenney
Signed-off-by: Jens Axboe
-
It's never used and the comments refer to nonatomic and retry
interchangeably. So get rid of it.
Acked-by: Jeremy Fitzhardinge
Signed-off-by: Jens Axboe
24 Jun, 2008
1 commit
-
The ioapic acknowledge path translates interrupt vectors to irqs. It
currently uses a first match algorithm, stopping when it finds the first
redirection table entry containing the vector. That fails however if the
guest changes the irq to a different line, leaving the old redirection table
entry in place (though masked). The result is interrupts not making it to the
guest.
Fix by always scanning the entire redirection table.
Signed-off-by: Avi Kivity
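The difference between the two lookup strategies in this commit can be shown with a small standalone model. This is a sketch only: `struct redir_entry`, `NUM_PINS`, `first_match_pin()` and `all_matching_pins()` are hypothetical names, and the table is reduced to the two fields needed for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PINS 24 /* hypothetical table size */

/* Hypothetical, reduced redirection table entry. */
struct redir_entry { uint8_t vector; uint8_t masked; };

/* First-match lookup: stops at the first entry holding the vector. */
static int first_match_pin(const struct redir_entry table[NUM_PINS], uint8_t vector)
{
    for (int pin = 0; pin < NUM_PINS; pin++)
        if (table[pin].vector == vector)
            return pin;
    return -1;
}

/* Full scan: returns a bitmask of every pin whose entry holds the vector. */
static uint32_t all_matching_pins(const struct redir_entry table[NUM_PINS], uint8_t vector)
{
    uint32_t pins = 0;

    for (int pin = 0; pin < NUM_PINS; pin++)
        if (table[pin].vector == vector)
            pins |= (uint32_t)1 << pin;
    return pins;
}
```

If a stale, masked entry on a lower pin still holds the vector, the first-match lookup returns that pin and never reaches the live one, whereas the full scan reports both.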
07 Jun, 2008
1 commit
-
There's a bug in the IOAPIC code for level-triggered interrupts. It's
relatively easy to trigger by sharing (virtio-blk + usbtablet was the
testcase, initially reported by Gerd von Egidy).
The "remote_irr" variable is used to indicate accepted but not yet acked
interrupts. It's cleared from the EOI handler.
The problem is that the EOI handler clears remote_irr unconditionally, even
if it reinjected another pending interrupt.
In that case, kvm_ioapic_set_irq() proceeds to ioapic_service() which
sets remote_irr even if it failed to inject (since the IRR was high due
to EOI reinjection).
Since the TMR bit has been cleared by the first EOI, the second one
fails to clear remote_irr.
The end result is a dead interrupt line.
Fix it by setting remote_irr only if a new pending interrupt has been
generated (and the TMR bit for the vector in question set).
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
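The rule stated in this fix, that remote_irr is set only when a new pending interrupt was actually generated, can be sketched as a toy model. Everything here is hypothetical (`struct lapic_model`, `struct pin_model`, `lapic_accept()`, `service_level_pin()`), and the model deliberately leaves out the TMR handling that the commit message also mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced models of the LAPIC and one IOAPIC pin. */
struct lapic_model { uint8_t irr[256]; };
struct pin_model   { uint8_t vector; int remote_irr; };

/* Returns 1 only if this call created a new pending interrupt. */
static int lapic_accept(struct lapic_model *lapic, uint8_t vector)
{
    int was_pending = lapic->irr[vector];

    lapic->irr[vector] = 1;
    return !was_pending;
}

/* Service a level-triggered pin: mark it in flight only on a real injection. */
static int service_level_pin(struct pin_model *pin, struct lapic_model *lapic)
{
    int injected = lapic_accept(lapic, pin->vector);

    if (injected)
        pin->remote_irr = 1;
    return injected;
}
```

In this model, servicing a pin while its vector is already pending leaves remote_irr untouched, so the flag cannot be left set with no matching EOI to clear it.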