27 Aug, 2012
1 commit
-
KVM_SET_SIGNAL_MASK, when passed a NULL argument, leaves the on-stack signal
sets uninitialized. It then passes them through to
kvm_vcpu_ioctl_set_sigmask.

We should be passing a NULL in this case, not translated garbage.
Signed-off-by: Alan Cox
Signed-off-by: Marcelo Tosatti
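
The NULL handling described in this commit can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel code: `sigset_sketch`, `set_sigmask_sketch` and `ioctl_set_signal_mask_sketch` are hypothetical stand-ins for the kernel's types and functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a signal set; not the kernel's sigset_t. */
struct sigset_sketch {
    unsigned long bits;
};

/* Hypothetical consumer of the mask: a NULL set means "no mask". */
static int set_sigmask_sketch(const struct sigset_sketch *set,
                              struct sigset_sketch *installed)
{
    if (set == NULL) {
        installed->bits = 0;
        return 0;                /* mask cleared */
    }
    *installed = *set;
    return 1;                    /* mask installed */
}

/* Hypothetical ioctl handler showing the pointer handling. */
static int ioctl_set_signal_mask_sketch(const struct sigset_sketch *user_arg,
                                        struct sigset_sketch *installed)
{
    struct sigset_sketch sigset;            /* on-stack, not yet initialized */
    const struct sigset_sketch *p = NULL;   /* stays NULL for a NULL argument */

    if (user_arg != NULL) {
        /* Models copying the caller's set into the on-stack copy. */
        memcpy(&sigset, user_arg, sizeof(sigset));
        p = &sigset;
    }

    /* The uninitialized on-stack set is never handed to the consumer. */
    return set_sigmask_sketch(p, installed);
}
```

The point of the sketch is that the pointer passed on is derived from whether the caller supplied an argument, so the on-stack copy is only read after it has been filled in.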
25 Jul, 2012
1 commit
-
Pull KVM updates from Avi Kivity:
"Highlights include
- full big real mode emulation on pre-Westmere Intel hosts (can be
disabled with emulate_invalid_guest_state=0)
- relatively small ppc and s390 updates
- PCID/INVPCID support in guests
- EOI avoidance (3.6 guests should perform better on 3.6 hosts on
interrupt-intensive workloads)
- Lockless write faults during live migration
- EPT accessed/dirty bits support for new Intel processors"

Fix up conflicts in:
- Documentation/virtual/kvm/api.txt:
Stupid subchapter numbering, added next to each other.
- arch/powerpc/kvm/booke_interrupts.S:
PPC asm changes clashing with the KVM fixes
- arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c:
Duplicated commits through the kvm tree and the s390 tree, with
subsequent edits in the KVM tree.

* tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
KVM: fix race with level interrupts
x86, hyper: fix build with !CONFIG_KVM_GUEST
Revert "apic: fix kvm build on UP without IOAPIC"
KVM guest: switch to apic_set_eoi_write, apic_write
apic: add apic_set_eoi_write for PV use
KVM: VMX: Implement PCID/INVPCID for guests with EPT
KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check
KVM: PPC: Critical interrupt emulation support
KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests
KVM: PPC64: booke: Set interrupt computation mode for 64-bit host
KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt
KVM: PPC: bookehv64: Add support for std/ld emulation.
booke: Added crit/mc exception handler for e500v2
booke/bookehv: Add host crit-watchdog exception support
KVM: MMU: document mmu-lock and fast page fault
KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
KVM: MMU: trace fast page fault
KVM: MMU: fast path of handling guest page fault
KVM: MMU: introduce SPTE_MMU_WRITEABLE bit
KVM: MMU: fold tlb flush judgement into mmu_spte_update
...
21 Jul, 2012
1 commit
-
When more than one source ID is in use for the same GSI, we have the
following race related to handling irq_states:

CPU 0 clears bit 0. CPU 0 reads irq_state as 0. CPU 1 sets level to 1.
CPU 1 calls kvm_ioapic_set_irq(1). CPU 0 calls kvm_ioapic_set_irq(0).
Now ioapic thinks the level is 0 but irq_state is not 0.

Fix by performing all irq_states bitmap handling under pic/ioapic lock.
This also removes the need for atomics with irq_states handling.
Reported-by: Gleb Natapov
Signed-off-by: Michael S. Tsirkin
Signed-off-by: Marcelo Tosatti
11 Jul, 2012
1 commit
-
The kernel no longer allows us to pass NULL for the hard handler
without also specifying IRQF_ONESHOT. IRQF_ONESHOT imposes latency
in the exit path that we don't need for MSI interrupts. Long term
we'd like to inject these interrupts from the hard handler when
possible. In the short term, we can create dummy hard handlers
that return us to the previous behavior. Credit to Michael for
original patch.
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=43328
Signed-off-by: Michael S. Tsirkin
Signed-off-by: Alex Williamson
Signed-off-by: Avi Kivity
07 Jul, 2012
1 commit
-
If last_boosted_vcpu == 0, then we fall through all test cases and
may end up with all VCPUs pouncing on vcpu 0. With a large enough
guest, this can result in enormous runqueue lock contention, which
can prevent vcpu0 from running, leading to a livelock.

Changing < to
Signed-off-by: Marcelo Tosatti
04 Jul, 2012
1 commit
-
fault_page was never freed.
Signed-off-by: Xiao Guangrong
Signed-off-by: Marcelo Tosatti
03 Jul, 2012
2 commits
-
We only know of one so far.
Signed-off-by: Alex Williamson
Signed-off-by: Marcelo Tosatti
-
Prune this down to just the struct kvm_irqfd so we can avoid
changing the function definition for every flag or field we use.
Signed-off-by: Alex Williamson
Acked-by: Cornelia Huck
Signed-off-by: Marcelo Tosatti
18 Jun, 2012
1 commit
-
The KVM code sometimes uses CONFIG_HAVE_KVM_IRQCHIP to protect
code that is related to IRQ routing, which not all in-kernel
irqchips may support.

Use KVM_CAP_IRQ_ROUTING instead.
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall
Signed-off-by: Avi Kivity
16 Jun, 2012
1 commit
-
The masking was wrong (must have been 0x7f), and there is no need to
re-read the value as pci_setup_device already does this for us.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=43339
Signed-off-by: Jan Kiszka
Acked-by: Alex Williamson
Signed-off-by: Marcelo Tosatti
05 Jun, 2012
3 commits
-
kvm_set_irq() has an internal buffer of three irq routing entries, allowing
connecting a GSI to three IRQ chips or one MSI. However setup_routing_entry()
does not properly enforce this, allowing three irqchip routes followed by
an MSI route to overflow the buffer.

Fix by ensuring that an MSI entry is only added to an empty list.
Signed-off-by: Avi Kivity
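
The rule stated in this commit can be sketched as a small validation routine. This is a userspace illustration with hypothetical names (`route_sketch`, `gsi_routes_sketch`, `add_route_sketch`); the explicit bound check and the rejection of further entries after an MSI route are assumptions of the sketch, not a description of the kernel's `setup_routing_entry()`.

```c
#include <stddef.h>

enum route_type_sketch { ROUTE_IRQCHIP_SKETCH, ROUTE_MSI_SKETCH };

struct route_sketch {
    enum route_type_sketch type;
    int irqchip;                 /* only meaningful for irqchip routes */
};

/* Models the three-entry buffer mentioned in the commit message. */
#define MAX_ROUTES_PER_GSI_SKETCH 3

struct gsi_routes_sketch {
    struct route_sketch entries[MAX_ROUTES_PER_GSI_SKETCH];
    size_t count;
};

/* Returns 0 on success, -1 if the route is rejected. */
static int add_route_sketch(struct gsi_routes_sketch *gsi,
                            struct route_sketch new_route)
{
    /* An MSI entry may only be added to an empty list. */
    if (new_route.type == ROUTE_MSI_SKETCH && gsi->count != 0)
        return -1;

    /* Assumption: once an MSI route is present, nothing else may follow. */
    if (gsi->count > 0 && gsi->entries[0].type == ROUTE_MSI_SKETCH)
        return -1;

    /* Assumption: never write past the fixed-size buffer. */
    if (gsi->count >= MAX_ROUTES_PER_GSI_SKETCH)
        return -1;

    gsi->entries[gsi->count++] = new_route;
    return 0;
}
```

With these checks, the sequence of three irqchip routes followed by an MSI route is rejected at the fourth call instead of producing a fourth entry.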
-
lpage_info is created for each large level even when the memory slot is
not for RAM. This means that when we add one slot for a PCI device, we
end up allocating at least KVM_NR_PAGE_SIZES - 1 pages by vmalloc().

To make things worse, there is an increasing number of devices which
would result in more pages being wasted this way.

This patch mitigates this problem by using kvm_kvzalloc().
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Avi Kivity
-
Will be used for lpage_info allocation later.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Avi Kivity
01 May, 2012
1 commit
-
This patch implements the directed yield hypercall found on other
System z hypervisors. It delegates execution time to the virtual cpu
specified in the instruction's parameter.

Useful to avoid long spinlock waits in the guest.
Christian Borntraeger: moved common code in virt/kvm/
Signed-off-by: Konstantin Weitz
Signed-off-by: Christian Borntraeger
Signed-off-by: Marcelo Tosatti
24 Apr, 2012
1 commit
-
Currently, MSI messages can only be injected to in-kernel irqchips by
defining a corresponding IRQ route for each message. This is not only
unwieldy if the MSI messages are generated "on the fly" by user space;
IRQ routes are also a limited resource that user space has to manage
carefully.

By providing a direct injection path, we can both avoid using up limited
resources and simplify the necessary steps for user land.
Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity
20 Apr, 2012
1 commit
-
Merge reason: development work has dependency on kvm patches merged
upstream.

Conflicts:
Documentation/feature-removal-schedule.txt
Signed-off-by: Marcelo Tosatti
19 Apr, 2012
1 commit
-
As pointed out by Jason Baron, when assigning a device to a guest
we first set the iommu domain pointer, which enables mapping
and unmapping of memory slots to the iommu. This leaves a window
where this path is enabled, but we haven't synchronized the iommu
mappings to the existing memory slots. Thus a slot being removed
at that point could send us down unexpected code paths removing
non-existent pinnings and iommu mappings. Take the slots_lock
around creating the iommu domain and initial mappings as well as
around iommu teardown to avoid this race.
Signed-off-by: Alex Williamson
Signed-off-by: Marcelo Tosatti
17 Apr, 2012
1 commit
-
Intel spec says that TMR needs to be set/cleared
when IRR is set, but kvm also clears it on EOI.

I did some tests on a real (AMD based) system,
and I see the same TMR values both before
and after EOI, so I think it's a minor bug in kvm.

This patch fixes TMR to be set/cleared on IRR set
only as per spec.

And now that we don't clear TMR, we can save
an atomic read of TMR on EOI that's not propagated
to ioapic, by checking whether ioapic needs
a specific vector first and calculating
the mode afterwards.
Signed-off-by: Michael S. Tsirkin
Signed-off-by: Marcelo Tosatti
12 Apr, 2012
1 commit
-
We've been adding new mappings, but not destroying old mappings.
This can lead to a page leak as pages are pinned using
get_user_pages, but only unpinned with put_page if they still
exist in the memslots list on vm shutdown. A memslot that is
destroyed while an iommu domain is enabled for the guest will
therefore result in an elevated page reference count that is
never cleared.

Additionally, without this fix, the iommu is only programmed
with the first translation for a gpa. This can result in
peer-to-peer errors if a mapping is destroyed and replaced by a
new mapping at the same gpa as the iommu will still be pointing
to the original, pinned memory address.
Signed-off-by: Alex Williamson
Signed-off-by: Marcelo Tosatti
08 Apr, 2012
4 commits
-
Now that we do neither double buffering nor heuristic selection of the
write protection method these are not needed anymore.

Note: some drivers have their own implementation of set_bit_le() and
making it generic needs a bit of work; so we use test_and_set_bit_le()
and will later replace it with generic set_bit_le().
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Avi Kivity
-
S390's kvm_vcpu_stat does not contain halt_wakeup member.
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
-
The kvm_vcpu_kick function performs roughly the same functionality on
almost all architectures, so we shouldn't have separate copies.

PowerPC keeps a pointer to interchanging waitqueues on the vcpu_arch
structure and to accommodate this special need a
__KVM_HAVE_ARCH_VCPU_GET_WQ define and accompanying function
kvm_arch_vcpu_wq have been defined. For all other architectures this
is a generic inline that just returns &vcpu->wq.
Acked-by: Scott Wood
Signed-off-by: Christoffer Dall
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
-
This patch makes the kvm_io_range array dynamically resizable.
Signed-off-by: Amos Kong
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
20 Mar, 2012
1 commit
-
As kvm_notify_acked_irq calls kvm_assigned_dev_ack_irq under
rcu_read_lock, we cannot use a mutex in the latter function. Switch to a
spin lock to address this.
Signed-off-by: Jan Kiszka
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
08 Mar, 2012
8 commits
-
Using 'int' type is not suitable for a 'long' object. So, correct it.
Signed-off-by: Alex Shi
Signed-off-by: Avi Kivity
-
PCI 2.3 allows IRQ sources to be disabled generically at the device level.
This enables us to share legacy IRQs of such devices with other host devices
when passing them to a guest.

The new IRQ sharing feature introduced here is optional; user space has
to request it explicitly. Moreover, user space can inform us about its
view of PCI_COMMAND_INTX_DISABLE so that we can avoid unmasking the
interrupt and signaling it if the guest masked it via the virtualized
PCI config space.
Signed-off-by: Jan Kiszka
Acked-by: Alex Williamson
Acked-by: Michael S. Tsirkin
Signed-off-by: Avi Kivity
-
If some vcpus are created before KVM_CREATE_IRQCHIP, then
irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading
to potential NULL pointer dereferences.

Fix by:
- ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called
- ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP

This is somewhat long winded because vcpu->arch.apic is created without
kvm->lock held.

Based on earlier patch by Michael Ellerman.
Signed-off-by: Michael Ellerman
Signed-off-by: Avi Kivity
-
Other threads may process the same page in that small window and skip
TLB flush and then return before these functions do flush.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
-
Some members of kvm_memory_slot are not used by every architecture.
This patch is the first step to make this difference clear by
introducing kvm_memory_slot::arch; lpage_info is moved into it.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
-
Narrow down the controlled text inside the conditional so that it will
include lpage_info and rmap stuff only.

For this we change the way we check whether the slot is being created
from "if (npages && !new.rmap)" to "if (npages && !old.npages)".

We also stop checking if lpage_info is NULL when we create lpage_info
because we do it from inside the slot creation code block.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
-
This makes it easy to make lpage_info architecture specific.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
-
This patch cleans up the code and removes the "(void)level;" warning
suppressor.

Note that we can also use this for PT_PAGE_TABLE_LEVEL to treat every
level uniformly later.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
05 Mar, 2012
5 commits
-
This moves __gfn_to_memslot() and search_memslots() from kvm_main.c to
kvm_host.h to reduce the code duplication caused by the need for
non-modular code in arch/powerpc/kvm/book3s_hv_rm_mmu.c to call
gfn_to_memslot() in real mode.

Rather than putting gfn_to_memslot() itself in a header, which would
lead to increased code size, this puts __gfn_to_memslot() in a header.
Then, the non-modular uses of gfn_to_memslot() are changed to call
__gfn_to_memslot() instead. This way there is only one place in the
source code that needs to be changed should the gfn_to_memslot()
implementation need to be modified.

On powerpc, the Book3S HV style of KVM has code that is called from
real mode which needs to call gfn_to_memslot() and thus needs this.
(Module code is allocated in the vmalloc region, which can't be
accessed in real mode.)

With this, we can remove builtin_gfn_to_memslot() from book3s_hv_rm_mmu.c.
Signed-off-by: Paul Mackerras
Acked-by: Avi Kivity
Signed-off-by: Alexander Graf
Signed-off-by: Avi Kivity
-
find_index_from_host_irq returns 0 on error
but callers assume < 0 on error. This should
not matter much: an out of range irq should never happen since
irq handler was registered with this irq #,
and even if it does we get a spurious msix irq in guest
and typically nothing terrible happens.

Still, better to make it consistent.
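
The return-value convention at issue can be illustrated with a small lookup routine. This is a sketch with hypothetical names (`msix_entry_sketch`, `find_index_sketch`), not the kernel function: it shows why 0 is a poor error value, since 0 is also a valid index.

```c
#include <stddef.h>

/* Hypothetical table entry: maps a table slot to a host IRQ number. */
struct msix_entry_sketch {
    int vector;
};

/*
 * Return the index of the entry whose vector equals irq, or a negative
 * value if there is none. Returning 0 for "not found" would be
 * indistinguishable from a match at index 0.
 */
static int find_index_sketch(const struct msix_entry_sketch *entries,
                             size_t nr, int irq)
{
    size_t i;

    for (i = 0; i < nr; i++)
        if (entries[i].vector == irq)
            return (int)i;

    return -1;
}
```

A caller can then test the result with `< 0`, which matches the assumption the commit message attributes to the existing callers.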
Signed-off-by: Michael S. Tsirkin
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
-
This adds an smp_wmb in kvm_mmu_notifier_invalidate_range_end() and an
smp_rmb in mmu_notifier_retry() so that mmu_notifier_retry() will give
the correct answer when called without kvm->mmu_lock being held.
PowerPC Book3S HV KVM wants to use a bitlock per guest page rather than
a single global spinlock in order to improve the scalability of updates
to the guest MMU hashed page table, and so needs this.
Signed-off-by: Paul Mackerras
Acked-by: Avi Kivity
Signed-off-by: Alexander Graf
Signed-off-by: Avi Kivity
-
This patch exports the s390 SIE hardware control block to userspace
via the mapping of the vcpu file descriptor. In order to do so,
a new arch callback named kvm_arch_vcpu_fault is introduced for all
architectures. It allows mapping architecture-specific pages.
Signed-off-by: Carsten Otte
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
-
This patch introduces a new config option for user controlled kernel
virtual machines. It introduces a parameter to KVM_CREATE_VM that
allows setting bits that alter the capabilities of the newly created
virtual machine.
The parameter is passed to kvm_arch_init_vm for all architectures.
The only valid modifier bit for now is KVM_VM_S390_UCONTROL.
This requires CAP_SYS_ADMIN privileges and creates a user controlled
virtual machine on s390 architectures.
Signed-off-by: Carsten Otte
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
01 Feb, 2012
1 commit
-
It is possible that the __set_bit() in mark_page_dirty() is called
simultaneously on the same region of memory, which may result in only
one bit being set, because some callers do not take mmu_lock before
mark_page_dirty().

This problem is hard to reproduce because when we reach mark_page_dirty()
beginning from, e.g., tdp_page_fault(), mmu_lock is being held during
__direct_map(): making kvm-unit-tests' dirty log api test write to two
pages concurrently was not useful for this reason.

So we have confirmed that there can actually be a race condition by
checking if some callers really reach there without holding mmu_lock
using spin_is_locked(): probably they were from kvm_write_guest_page().

To fix this race, this patch changes the bit operation to the atomic
version: note that nr_dirty_pages also suffers from the race but we do
not need exactly correct numbers for now.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Marcelo Tosatti
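
The difference between a plain and an atomic bit set can be sketched in userspace with C11 atomics. This is an analogue of the approach under stated assumptions, not the kernel code: `mark_page_dirty_sketch` and `page_is_dirty_sketch` are hypothetical names, and `atomic_fetch_or` stands in for the kernel's atomic bit operation.

```c
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_WORD_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/*
 * Set the dirty bit for one page. atomic_fetch_or performs the
 * read-modify-write as a single atomic step, so two threads setting
 * different bits in the same word cannot overwrite each other's update,
 * which a plain "word |= mask" could do.
 */
static void mark_page_dirty_sketch(atomic_ulong *bitmap, unsigned long page)
{
    unsigned long mask = 1UL << (page % BITS_PER_WORD_SKETCH);

    atomic_fetch_or(&bitmap[page / BITS_PER_WORD_SKETCH], mask);
}

/* Return 1 if the page's dirty bit is set, 0 otherwise. */
static int page_is_dirty_sketch(atomic_ulong *bitmap, unsigned long page)
{
    unsigned long word = atomic_load(&bitmap[page / BITS_PER_WORD_SKETCH]);

    return (int)((word >> (page % BITS_PER_WORD_SKETCH)) & 1UL);
}
```

The sketch only covers the bitmap itself. A separate counter of dirty pages, like the nr_dirty_pages mentioned above, would need its own atomic update to be exact.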
13 Jan, 2012
1 commit
-
module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.
Acked-by: Mauro Carvalho Chehab
Signed-off-by: Rusty Russell
11 Jan, 2012
1 commit
-
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (53 commits)
iommu/amd: Set IOTLB invalidation timeout
iommu/amd: Init stats for iommu=pt
iommu/amd: Remove unnecessary cache flushes in amd_iommu_resume
iommu/amd: Add invalidate-context call-back
iommu/amd: Add amd_iommu_device_info() function
iommu/amd: Adapt IOMMU driver to PCI register name changes
iommu/amd: Add invalid_ppr callback
iommu/amd: Implement notifiers for IOMMUv2
iommu/amd: Implement IO page-fault handler
iommu/amd: Add routines to bind/unbind a pasid
iommu/amd: Implement device aquisition code for IOMMUv2
iommu/amd: Add driver stub for AMD IOMMUv2 support
iommu/amd: Add stat counter for IOMMUv2 events
iommu/amd: Add device errata handling
iommu/amd: Add function to get IOMMUv2 domain for pdev
iommu/amd: Implement function to send PPR completions
iommu/amd: Implement functions to manage GCR3 table
iommu/amd: Implement IOMMUv2 TLB flushing routines
iommu/amd: Add support for IOMMUv2 domain mode
iommu/amd: Add amd_iommu_domain_direct_map function
...