25 Jul, 2012

1 commit

  • Pull KVM updates from Avi Kivity:
    "Highlights include
    - full big real mode emulation on pre-Westmere Intel hosts (can be
    disabled with emulate_invalid_guest_state=0)
    - relatively small ppc and s390 updates
    - PCID/INVPCID support in guests
    - EOI avoidance (3.6 guests should perform better on 3.6 hosts on
    interrupt intensive workloads)
    - Lockless write faults during live migration
    - EPT accessed/dirty bits support for new Intel processors"

    Fix up conflicts in:
    - Documentation/virtual/kvm/api.txt:

    Stupid subchapter numbering, added next to each other.

    - arch/powerpc/kvm/booke_interrupts.S:

    PPC asm changes clashing with the KVM fixes

    - arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c:

    Duplicated commits through the kvm tree and the s390 tree, with
    subsequent edits in the KVM tree.

    * tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
    KVM: fix race with level interrupts
    x86, hyper: fix build with !CONFIG_KVM_GUEST
    Revert "apic: fix kvm build on UP without IOAPIC"
    KVM guest: switch to apic_set_eoi_write, apic_write
    apic: add apic_set_eoi_write for PV use
    KVM: VMX: Implement PCID/INVPCID for guests with EPT
    KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check
    KVM: PPC: Critical interrupt emulation support
    KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests
    KVM: PPC64: booke: Set interrupt computation mode for 64-bit host
    KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt
    KVM: PPC: bookehv64: Add support for std/ld emulation.
    booke: Added crit/mc exception handler for e500v2
    booke/bookehv: Add host crit-watchdog exception support
    KVM: MMU: document mmu-lock and fast page fault
    KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
    KVM: MMU: trace fast page fault
    KVM: MMU: fast path of handling guest page fault
    KVM: MMU: introduce SPTE_MMU_WRITEABLE bit
    KVM: MMU: fold tlb flush judgement into mmu_spte_update
    ...

    Linus Torvalds
     

21 Jul, 2012

1 commit

  • When more than 1 source id is in use for the same GSI, we have the
    following race related to handling irq_states:

    CPU 0 clears bit 0. CPU 0 reads irq_state as 0. CPU 1 sets level to 1.
    CPU 1 calls kvm_ioapic_set_irq(1). CPU 0 calls kvm_ioapic_set_irq(0).
    Now ioapic thinks the level is 0 but irq_state is not 0.

    Fix by performing all irq_states bitmap handling under pic/ioapic lock.
    This also removes the need for atomics with irq_states handling.

    Reported-by: Gleb Natapov
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Marcelo Tosatti

    Michael S. Tsirkin
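
A minimal userspace sketch of the locking rule described above, not the kernel code itself. The struct, the function name, and the use of a pthread mutex in place of the pic/ioapic lock are all assumptions made for illustration. The point it shows is that the per-source bitmap update and the level handed to the irqchip happen in one critical section, so no other CPU can slip in between them.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_SOURCE_IDS 32   /* hypothetical limit for this sketch */

/* One interrupt line (GSI) shared by several source ids. */
struct irq_line {
    pthread_mutex_t lock;     /* stands in for the pic/ioapic lock */
    unsigned long irq_state;  /* one bit per source id */
    int chip_level;           /* level last reported to the irqchip model */
};

/* Update one source's bit and report the combined level, all under the lock. */
static int irq_line_set(struct irq_line *line, int source_id, int level)
{
    int combined;

    assert(source_id >= 0 && source_id < MAX_SOURCE_IDS);

    pthread_mutex_lock(&line->lock);
    if (level)
        line->irq_state |= 1UL << source_id;
    else
        line->irq_state &= ~(1UL << source_id);
    combined = line->irq_state != 0;
    line->chip_level = combined;  /* same critical section as the bitmap update */
    pthread_mutex_unlock(&line->lock);

    return combined;
}
```

Because the bitmap is only touched while the lock is held, plain (non-atomic) bit operations are sufficient here, which matches the remark about no longer needing atomics.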
     

11 Jul, 2012

1 commit

  • The kernel no longer allows us to pass NULL for the hard handler
    without also specifying IRQF_ONESHOT. IRQF_ONESHOT imposes latency
    in the exit path that we don't need for MSI interrupts. Long term
    we'd like to inject these interrupts from the hard handler when
    possible. In the short term, we can create dummy hard handlers
    that return us to the previous behavior. Credit to Michael for
    original patch.

    Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=43328

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Alex Williamson
    Signed-off-by: Avi Kivity

    Alex Williamson
     

07 Jul, 2012

1 commit

  • If last_boosted_vcpu == 0, then we fall through all test cases and
    may end up with all VCPUs pouncing on vcpu 0. With a large enough
    guest, this can result in enormous runqueue lock contention, which
    can prevent vcpu0 from running, leading to a livelock.

    Changing < to <= makes sure we properly handle that case.

    Signed-off-by: Marcelo Tosatti

    Rik van Riel
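
The following is a simplified userspace model of a two-pass candidate scan, assuming the fix relaxes the first-pass skip condition from `<` to `<=`. The function name and the flat integer indices are hypothetical; the real code walks VCPU structures and applies more eligibility checks. It shows that with `<=` the scan starts just after the last boosted VCPU even when that index is 0.

```c
#include <assert.h>

/*
 * Return the index of the first VCPU to try boosting, or -1 if none.
 * Pass 0 scans the indices after last_boosted; pass 1 wraps around
 * and scans from 0 up to and including last_boosted.
 */
static int next_boost_candidate(int nr_vcpus, int last_boosted, int self)
{
    assert(nr_vcpus > 0 && last_boosted >= 0 && last_boosted < nr_vcpus);

    for (int pass = 0; pass < 2; pass++) {
        for (int i = 0; i < nr_vcpus; i++) {
            if (!pass && i <= last_boosted) {
                i = last_boosted;  /* loop increment moves to last_boosted + 1 */
                continue;
            } else if (pass && i > last_boosted) {
                break;
            }
            if (i == self)
                continue;          /* never yield to ourselves */
            return i;
        }
    }
    return -1;
}
```

With a strict `<` and `last_boosted == 0`, the skip branch would never fire and every spinning VCPU would pick index 0 first, which is the pile-up the message describes.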
     

04 Jul, 2012

1 commit


03 Jul, 2012

2 commits


18 Jun, 2012

1 commit


16 Jun, 2012

1 commit

  • The masking was wrong (must have been 0x7f), and there is no need to
    re-read the value as pci_setup_device already does this for us.

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=43339
    Signed-off-by: Jan Kiszka
    Acked-by: Alex Williamson
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
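
A short sketch of the masking, assuming the value in question is the PCI header type byte, where the low seven bits give the header layout and bit 7 is the multi-function flag. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define HDR_TYPE_MASK   0x7f  /* low seven bits: header layout */
#define HDR_MULTI_FUNC  0x80  /* bit 7: multi-function flag */
#define HDR_TYPE_NORMAL 0x00  /* ordinary (non-bridge) device */

/* True when the header layout is "normal", whatever the multi-function bit says. */
static bool is_normal_header(uint8_t hdr_type)
{
    return (hdr_type & HDR_TYPE_MASK) == HDR_TYPE_NORMAL;
}
```

A multi-function device with a normal header reads back as 0x80, so any check that does not mask with 0x7f first would wrongly reject it.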
     

05 Jun, 2012

3 commits

  • kvm_set_irq() has an internal buffer of three irq routing entries, allowing
    connecting a GSI to three IRQ chips or one MSI. However setup_routing_entry()
    does not properly enforce this, allowing three irqchip routes followed by
    an MSI route to overflow the buffer.

    Fix by ensuring that an MSI entry is added to an empty list.

    Signed-off-by: Avi Kivity

    Avi Kivity
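
The rule can be sketched in userspace as follows. This is an illustrative model, not the kernel's setup_routing_entry(): the types, the fixed-size array, and the duplicate-chip check are assumptions. It shows that an MSI route is only accepted on an empty list, so a GSI never ends up with more than three entries.

```c
#include <errno.h>

#define MAX_ROUTES_PER_GSI 3   /* matches the three-entry buffer described above */

enum route_type { ROUTE_IRQCHIP, ROUTE_MSI };

struct route {
    enum route_type type;
    int chip;                  /* irqchip id; unused for MSI */
};

struct gsi_routes {
    int count;
    struct route entry[MAX_ROUTES_PER_GSI];
};

/* Add a route for one GSI; returns 0 on success or -EINVAL if it is not allowed. */
static int gsi_add_route(struct gsi_routes *g, enum route_type type, int chip)
{
    for (int i = 0; i < g->count; i++) {
        /* MSI must be alone: reject if either side of the pair is MSI. */
        if (g->entry[i].type == ROUTE_MSI || type == ROUTE_MSI)
            return -EINVAL;
        /* At most one route per irqchip. */
        if (g->entry[i].chip == chip)
            return -EINVAL;
    }
    if (g->count == MAX_ROUTES_PER_GSI)
        return -EINVAL;

    g->entry[g->count].type = type;
    g->entry[g->count].chip = chip;
    g->count++;
    return 0;
}
```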
     
  • lpage_info is created for each large level even when the memory slot is
    not for RAM. This means that when we add one slot for a PCI device, we
    end up allocating at least KVM_NR_PAGE_SIZES - 1 pages by vmalloc().

    To make things worse, there is an increasing number of devices which
    would result in more pages being wasted this way.

    This patch mitigates this problem by using kvm_kvzalloc().

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Avi Kivity

    Takuya Yoshikawa
     
  • Will be used for lpage_info allocation later.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Avi Kivity

    Takuya Yoshikawa
     

01 May, 2012

1 commit


24 Apr, 2012

1 commit

  • Currently, MSI messages can only be injected to in-kernel irqchips by
    defining a corresponding IRQ route for each message. This is not only
    awkward if the MSI messages are generated "on the fly" by user space;
    IRQ routes are also a limited resource that user space has to manage
    carefully.

    By providing a direct injection path, we can both avoid using up limited
    resources and simplify the necessary steps for user land.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
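
From user space, direct injection might look like the sketch below. It assumes the injection path is the KVM_SIGNAL_MSI ioctl taking a struct kvm_msi from <linux/kvm.h>, which the message above does not name; the helper names are hypothetical, and vm_fd is assumed to be a VM file descriptor obtained elsewhere.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Build an MSI message from a 64-bit address and a 32-bit data word. */
static struct kvm_msi make_msi(uint64_t address, uint32_t data)
{
    struct kvm_msi msi;

    memset(&msi, 0, sizeof(msi));          /* flags and padding must be zero */
    msi.address_lo = (uint32_t)address;
    msi.address_hi = (uint32_t)(address >> 32);
    msi.data = data;
    return msi;
}

/* Inject one MSI without setting up an IRQ route; returns the ioctl result. */
static int signal_msi(int vm_fd, uint64_t address, uint32_t data)
{
    struct kvm_msi msi = make_msi(address, data);

    return ioctl(vm_fd, KVM_SIGNAL_MSI, &msi);
}
```

Each call carries the full message, so user space does not need to allocate or recycle routing table entries for messages it generates dynamically.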
     

20 Apr, 2012

1 commit


19 Apr, 2012

1 commit

  • As pointed out by Jason Baron, when assigning a device to a guest
    we first set the iommu domain pointer, which enables mapping
    and unmapping of memory slots to the iommu. This leaves a window
    where this path is enabled, but we haven't synchronized the iommu
    mappings to the existing memory slots. Thus a slot being removed
    at that point could send us down unexpected code paths removing
    non-existent pinnings and iommu mappings. Take the slots_lock
    around creating the iommu domain and initial mappings as well as
    around iommu teardown to avoid this race.

    Signed-off-by: Alex Williamson
    Signed-off-by: Marcelo Tosatti

    Alex Williamson
     

17 Apr, 2012

1 commit

  • Intel spec says that TMR needs to be set/cleared
    when IRR is set, but kvm also clears it on EOI.

    I did some tests on a real (AMD based) system,
    and I see the same TMR values both before
    and after EOI, so I think it's a minor bug in kvm.

    This patch fixes TMR to be set/cleared on IRR set
    only as per spec.

    And now that we don't clear TMR, we can save
    an atomic read of TMR on EOI that's not propagated
    to ioapic, by checking whether ioapic needs
    a specific vector first and calculating
    the mode afterwards.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Marcelo Tosatti

    Michael S. Tsirkin
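
A toy model of the register behaviour described above, with hypothetical names and a much simplified local APIC (three 256-bit registers and nothing else). It shows TMR being written only when a vector is accepted into IRR, and EOI reading TMR without modifying it.

```c
#include <stdbool.h>
#include <stdint.h>

/* 256 vectors, one bit each, in eight 32-bit words per register. */
struct lapic_model {
    uint32_t irr[8];
    uint32_t isr[8];
    uint32_t tmr[8];
};

static void reg_set(uint32_t *reg, uint8_t vec)   { reg[vec / 32] |= 1u << (vec % 32); }
static void reg_clear(uint32_t *reg, uint8_t vec) { reg[vec / 32] &= ~(1u << (vec % 32)); }
static bool reg_test(const uint32_t *reg, uint8_t vec)
{
    return (reg[vec / 32] >> (vec % 32)) & 1u;
}

/* Accepting an interrupt is the only place TMR is set or cleared. */
static void lapic_accept_irq(struct lapic_model *a, uint8_t vec, bool level_triggered)
{
    if (level_triggered)
        reg_set(a->tmr, vec);
    else
        reg_clear(a->tmr, vec);
    reg_set(a->irr, vec);
}

/* Move a pending vector into service. */
static void lapic_deliver(struct lapic_model *a, uint8_t vec)
{
    reg_clear(a->irr, vec);
    reg_set(a->isr, vec);
}

/* EOI clears ISR and reports the trigger mode; TMR itself is left alone. */
static bool lapic_eoi(struct lapic_model *a, uint8_t vec)
{
    reg_clear(a->isr, vec);
    return reg_test(a->tmr, vec);
}
```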
     

12 Apr, 2012

1 commit

  • We've been adding new mappings, but not destroying old mappings.
    This can lead to a page leak as pages are pinned using
    get_user_pages, but only unpinned with put_page if they still
    exist in the memslots list on vm shutdown. A memslot that is
    destroyed while an iommu domain is enabled for the guest will
    therefore result in an elevated page reference count that is
    never cleared.

    Additionally, without this fix, the iommu is only programmed
    with the first translation for a gpa. This can result in
    peer-to-peer errors if a mapping is destroyed and replaced by a
    new mapping at the same gpa as the iommu will still be pointing
    to the original, pinned memory address.

    Signed-off-by: Alex Williamson
    Signed-off-by: Marcelo Tosatti

    Alex Williamson
     

08 Apr, 2012

4 commits

  • Now that we do neither double buffering nor heuristic selection of the
    write protection method these are not needed anymore.

    Note: some drivers have their own implementation of set_bit_le() and
    making it generic needs a bit of work; so we use test_and_set_bit_le()
    and will later replace it with generic set_bit_le().

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Avi Kivity

    Takuya Yoshikawa
     
  • S390's kvm_vcpu_stat does not contain halt_wakeup member.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
     
  • The kvm_vcpu_kick function performs roughly the same functionality on
    almost all architectures, so we shouldn't have separate copies.

    PowerPC keeps a pointer to interchanging waitqueues on the vcpu_arch
    structure, and to accommodate this special need a
    __KVM_HAVE_ARCH_VCPU_GET_WQ define and accompanying function
    kvm_arch_vcpu_wq have been defined. For all other architectures this
    is a generic inline that just returns &vcpu->wq.

    Acked-by: Scott Wood
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Christoffer Dall
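
The override pattern described above can be sketched like this. The struct definitions are stand-ins invented for the example, and the arch branch (returning a pointer stored in the arch structure) is an assumption about how such an override could look; only the define and the function name come from the message.

```c
#include <stddef.h>

/* Stand-in types for this sketch; the kernel's real structures differ. */
struct wait_queue_model { int waiters; };
struct vcpu_arch_model  { struct wait_queue_model *wqp; };
struct vcpu_model {
    struct wait_queue_model wq;
    struct vcpu_arch_model arch;
};

#ifdef __KVM_HAVE_ARCH_VCPU_GET_WQ
/* Architecture override: the wait queue is reached through a pointer. */
static inline struct wait_queue_model *kvm_arch_vcpu_wq(struct vcpu_model *vcpu)
{
    return vcpu->arch.wqp;
}
#else
/* Generic version: the wait queue embedded in the vcpu itself. */
static inline struct wait_queue_model *kvm_arch_vcpu_wq(struct vcpu_model *vcpu)
{
    return &vcpu->wq;
}
#endif
```

Shared code then calls kvm_arch_vcpu_wq() everywhere and never needs to know which variant was compiled in.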
     
  • This patch makes the kvm_io_range array dynamically resizable.

    Signed-off-by: Amos Kong
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Amos Kong
     

20 Mar, 2012

1 commit

  • As kvm_notify_acked_irq calls kvm_assigned_dev_ack_irq under
    rcu_read_lock, we cannot use a mutex in the latter function. Switch to a
    spin lock to address this.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Jan Kiszka
     

08 Mar, 2012

8 commits


05 Mar, 2012

5 commits

  • This moves __gfn_to_memslot() and search_memslots() from kvm_main.c to
    kvm_host.h to reduce the code duplication caused by the need for
    non-modular code in arch/powerpc/kvm/book3s_hv_rm_mmu.c to call
    gfn_to_memslot() in real mode.

    Rather than putting gfn_to_memslot() itself in a header, which would
    lead to increased code size, this puts __gfn_to_memslot() in a header.
    Then, the non-modular uses of gfn_to_memslot() are changed to call
    __gfn_to_memslot() instead. This way there is only one place in the
    source code that needs to be changed should the gfn_to_memslot()
    implementation need to be modified.

    On powerpc, the Book3S HV style of KVM has code that is called from
    real mode which needs to call gfn_to_memslot() and thus needs this.
    (Module code is allocated in the vmalloc region, which can't be
    accessed in real mode.)

    With this, we can remove builtin_gfn_to_memslot() from book3s_hv_rm_mmu.c.

    Signed-off-by: Paul Mackerras
    Acked-by: Avi Kivity
    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Paul Mackerras
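
As a rough illustration of the layering, here is a userspace sketch with invented types and `_model`/`_inline` names. A small lookup is defined `static inline`, as it would be in a header, so that code which cannot call into a module gets its own copy, while the lookup logic still lives in one place in the source.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Stand-in types; the kernel's memslot structures carry much more state. */
struct memslot_model {
    gfn_t base_gfn;
    uint64_t npages;
};

struct memslots_model {
    size_t nslots;
    struct memslot_model slots[4];
};

/* Linear search for the slot containing gfn; NULL if none does. */
static inline struct memslot_model *
search_memslots_model(struct memslots_model *s, gfn_t gfn)
{
    for (size_t i = 0; i < s->nslots; i++) {
        struct memslot_model *m = &s->slots[i];

        if (gfn >= m->base_gfn && gfn < m->base_gfn + m->npages)
            return m;
    }
    return NULL;
}

/* Header-level helper that non-modular callers can use directly. */
static inline struct memslot_model *
gfn_to_memslot_inline(struct memslots_model *s, gfn_t gfn)
{
    return search_memslots_model(s, gfn);
}
```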
     
  • find_index_from_host_irq returns 0 on error
    but callers assume < 0 on error. This should
    not matter much: an out of range irq should never happen since
    irq handler was registered with this irq #,
    and even if it does we get a spurious msix irq in guest
    and typically nothing terrible happens.

    Still, better to make it consistent.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Michael S. Tsirkin
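
The inconsistency is easy to see in a small model. The function below is a hypothetical stand-in for the lookup, written over a plain array: since 0 is a valid index, only a negative return can signal "not found" unambiguously.

```c
#include <stddef.h>

/*
 * Return the index of irq in the vectors table, or -1 if it is absent.
 * Returning 0 for "absent" would be indistinguishable from a hit on
 * the first entry.
 */
static int find_index_model(const int *vectors, size_t n, int irq)
{
    for (size_t i = 0; i < n; i++) {
        if (vectors[i] == irq)
            return (int)i;
    }
    return -1;
}
```

Callers that test `index < 0` then behave as intended for both the first entry and a missing one.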
     
  • This adds an smp_wmb in kvm_mmu_notifier_invalidate_range_end() and an
    smp_rmb in mmu_notifier_retry() so that mmu_notifier_retry() will give
    the correct answer when called without kvm->mmu_lock being held.
    PowerPC Book3S HV KVM wants to use a bitlock per guest page rather than
    a single global spinlock in order to improve the scalability of updates
    to the guest MMU hashed page table, and so needs this.

    Signed-off-by: Paul Mackerras
    Acked-by: Avi Kivity
    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Paul Mackerras
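
The barrier pairing can be modelled with C11 atomics. This is a sketch under assumptions: the struct and function names are invented, and release/acquire fences stand in for smp_wmb() and smp_rmb(). The writer bumps the sequence number before dropping the in-progress count; the reader checks the count before the sequence number, so a reader that sees the count at zero also sees the new sequence number.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct notifier_model {
    atomic_ulong seq;    /* incremented at the end of every invalidation */
    atomic_long count;   /* number of invalidations currently in progress */
};

static void invalidate_range_start(struct notifier_model *n)
{
    atomic_fetch_add_explicit(&n->count, 1, memory_order_relaxed);
}

static void invalidate_range_end(struct notifier_model *n)
{
    atomic_fetch_add_explicit(&n->seq, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* stands in for smp_wmb() */
    atomic_fetch_sub_explicit(&n->count, 1, memory_order_relaxed);
}

/* True if work based on the sequence snapshot must be redone. */
static bool notifier_retry(struct notifier_model *n, unsigned long snapshot)
{
    if (atomic_load_explicit(&n->count, memory_order_relaxed) != 0)
        return true;
    atomic_thread_fence(memory_order_acquire);   /* stands in for smp_rmb() */
    return atomic_load_explicit(&n->seq, memory_order_relaxed) != snapshot;
}
```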
     
  • This patch exports the s390 SIE hardware control block to userspace
    via the mapping of the vcpu file descriptor. In order to do so,
    a new arch callback named kvm_arch_vcpu_fault is introduced for all
    architectures. It allows mapping architecture-specific pages.

    Signed-off-by: Carsten Otte
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Carsten Otte
     
  • This patch introduces a new config option for user controlled kernel
    virtual machines. It introduces a parameter to KVM_CREATE_VM that
    allows setting bits that alter the capabilities of the newly created
    virtual machine.
    The parameter is passed to kvm_arch_init_vm for all architectures.
    The only valid modifier bit for now is KVM_VM_S390_UCONTROL.
    This requires CAP_SYS_ADMIN privileges and creates a user controlled
    virtual machine on s390 architectures.

    Signed-off-by: Carsten Otte
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Carsten Otte
     

01 Feb, 2012

1 commit

  • It is possible that the __set_bit() in mark_page_dirty() is called
    simultaneously on the same region of memory, which may result in only
    one bit being set, because some callers do not take mmu_lock before
    mark_page_dirty().

    This problem is hard to reproduce because when we reach mark_page_dirty()
    beginning from, e.g., tdp_page_fault(), mmu_lock is being held during
    __direct_map(): making kvm-unit-tests' dirty log api test write to two
    pages concurrently was not useful for this reason.

    So we have confirmed that there can actually be a race condition by
    checking if some callers really reach there without holding mmu_lock
    using spin_is_locked(): probably they were from kvm_write_guest_page().

    To fix this race, this patch changes the bit operation to the atomic
    version: note that nr_dirty_pages also suffers from the race but we do
    not need exactly correct numbers for now.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
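
A userspace sketch of the atomic variant, using C11 atomics in place of the kernel's bit operations. The function name and the flat bitmap layout are assumptions. A plain read-modify-write on a shared word can lose one of two concurrent updates; an atomic OR cannot.

```c
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Set the dirty bit for a page, safe against concurrent callers on the same word. */
static void mark_dirty_atomic(atomic_ulong *bitmap, unsigned long rel_gfn)
{
    atomic_fetch_or(&bitmap[rel_gfn / BITS_PER_WORD],
                    1UL << (rel_gfn % BITS_PER_WORD));
}
```

The trade-off is a slightly more expensive operation on every dirty mark in exchange for not requiring every caller to hold the same lock.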
     

13 Jan, 2012

1 commit


11 Jan, 2012

1 commit

  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (53 commits)
    iommu/amd: Set IOTLB invalidation timeout
    iommu/amd: Init stats for iommu=pt
    iommu/amd: Remove unnecessary cache flushes in amd_iommu_resume
    iommu/amd: Add invalidate-context call-back
    iommu/amd: Add amd_iommu_device_info() function
    iommu/amd: Adapt IOMMU driver to PCI register name changes
    iommu/amd: Add invalid_ppr callback
    iommu/amd: Implement notifiers for IOMMUv2
    iommu/amd: Implement IO page-fault handler
    iommu/amd: Add routines to bind/unbind a pasid
    iommu/amd: Implement device aquisition code for IOMMUv2
    iommu/amd: Add driver stub for AMD IOMMUv2 support
    iommu/amd: Add stat counter for IOMMUv2 events
    iommu/amd: Add device errata handling
    iommu/amd: Add function to get IOMMUv2 domain for pdev
    iommu/amd: Implement function to send PPR completions
    iommu/amd: Implement functions to manage GCR3 table
    iommu/amd: Implement IOMMUv2 TLB flushing routines
    iommu/amd: Add support for IOMMUv2 domain mode
    iommu/amd: Add amd_iommu_domain_direct_map function
    ...

    Linus Torvalds
     

09 Jan, 2012

1 commit