29 Apr, 2014

1 commit

  • Currently the check below in vgic_ioaddr_overlap() always succeeds,
    because the vgic dist base and vgic cpu base are still kept UNDEF
    after initialization, so the following code returns 0 every time:

    if (IS_VGIC_ADDR_UNDEF(dist) || IS_VGIC_ADDR_UNDEF(cpu))
            return 0;

    So, before invoking vgic_ioaddr_overlap, the corresponding base
    address needs to be set first (see the sketch below this entry).

    Signed-off-by: Haibin Wang
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Haibin Wang
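
    A minimal sketch of the resulting assignment helper, loosely following the
    ARM VGIC code (treat names and structure as illustrative, not verbatim):

        /* Record the base address before the overlap check, and back it out
         * again if the check fails. */
        static int vgic_ioaddr_assign(struct kvm *kvm, phys_addr_t *ioaddr,
                                      phys_addr_t addr, phys_addr_t size)
        {
                int ret;

                if (!IS_VGIC_ADDR_UNDEF(*ioaddr))
                        return -EEXIST;

                *ioaddr = addr;                 /* set the base first ...      */
                ret = vgic_ioaddr_overlap(kvm); /* ... so the check can see it */
                if (ret)
                        *ioaddr = VGIC_ADDR_UNDEF;

                return ret;
        }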
     

28 Apr, 2014

4 commits

  • Since KVM internally represents the ICFGR registers by stuffing two
    of them into one word, the offset for accessing the internal
    representation and the one for the MMIO based access are different.
    So keep the original offset around, but adjust the internal array
    offset by one bit.

    Reported-by: Haibin Wang
    Signed-off-by: Andre Przywara
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Andre Przywara
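
    To illustrate the packing, a rough sketch (the helper and the bit
    arithmetic are illustrative only, not the actual VGIC code):

        /* Two 16-interrupt MMIO ICFGR words share one internal word, so the
         * internal lookup halves the MMIO offset, while picking the upper or
         * lower half still uses the original, unadjusted offset. */
        u32 *word = vgic_cfg_word(dist, vcpu_id, offset >> 1); /* hypothetical helper */
        u32 half  = (offset & 4) ? 16 : 0;  /* which 16 interrupts within the word */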
     
  • get_user_pages(mm) is simply wrong if mm->mm_users == 0 and exit_mmap/etc
    was already called (or is in progress); mm->mm_count can only pin mm->pgd
    and the mm_struct itself.

    Change kvm_setup_async_pf/async_pf_execute to inc/dec mm->mm_users
    (see the sketch below this entry).

    kvm_create_vm/kvm_destroy_vm play with ->mm_count too, but that case looks
    fine at first glance; it seems that this ->mm is only used to verify that
    current->mm == kvm->mm.

    Signed-off-by: Oleg Nesterov
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini

    Oleg Nesterov
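
    A minimal sketch of the pinning change described above, assuming the
    3.15-era refcounting helpers (atomic mm_users/mm_count, mmput()/mmdrop()):

        /* Before: only the mm_struct allocation was pinned, which does not
         * keep the page tables alive for get_user_pages(). */
        atomic_inc(&work->mm->mm_count);        /* paired with mmdrop() */

        /* After: pin the users count so the whole address space stays valid
         * while the async page fault worker runs. */
        atomic_inc(&work->mm->mm_users);        /* paired with mmput() */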
     
  • When dispatching an SGI (mode == 0), the vcpu of the VM should send the
    SGI to the cpus in the target_cpus list. So a "break" must be added to
    the case 0 branch (see the sketch below this entry).

    Cc: stable@vger.kernel.org # 3.10+
    Signed-off-by: Haibin Wang
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Haibin Wang
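
    A hedged sketch of the switch statement in question (simplified from the
    VGIC SGI dispatch code; variable names are illustrative):

        switch (mode) {
        case 0:
                /* Deliver to the cpus listed in target_cpus. */
                if (!target_cpus)
                        return;
                break;  /* the missing break: without it we fall through
                         * and clobber target_cpus below */
        case 1:
                /* Deliver to all cpus except the source. */
                target_cpus = ((1 << nrcpus) - 1) & ~(1 << vcpu_id) & 0xff;
                break;
        case 2:
                /* Deliver to the source cpu only. */
                target_cpus = 1 << vcpu_id;
                break;
        }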
     
  • As a result of the deprecation of the MSI-X/MSI enablement functions
    pci_enable_msix() and pci_enable_msi_block(), all drivers using these
    two interfaces need to be updated to use the new pci_enable_msi_range()
    or pci_enable_msi_exact() and pci_enable_msix_range() or
    pci_enable_msix_exact() interfaces (see the sketch below this entry).

    Signed-off-by: Alexander Gordeev
    Cc: Gleb Natapov
    Cc: Paolo Bonzini
    Cc: kvm@vger.kernel.org
    Cc: linux-pci@vger.kernel.org
    Signed-off-by: Paolo Bonzini

    Alexander Gordeev
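
    A minimal before/after sketch for the MSI-X half of such a conversion
    (error handling trimmed for brevity):

        /* Before: pci_enable_msix() could return a positive count, asking
         * the driver to retry with fewer vectors. */
        ret = pci_enable_msix(pdev, entries, nvec);

        /* After: pci_enable_msix_range() retries internally and returns the
         * number of vectors actually allocated, or a negative error. */
        ret = pci_enable_msix_range(pdev, entries, 1, nvec);
        if (ret < 0)
                return ret;
        nvec = ret;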
     

15 Apr, 2014

1 commit

  • Pull KVM fixes from Marcelo Tosatti:
    - Fix for guest triggerable BUG_ON (CVE-2014-0155)
    - CR4.SMAP support
    - Spurious WARN_ON() fix

    * git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: x86: remove WARN_ON from get_kernel_ns()
    KVM: Rename variable smep to cr4_smep
    KVM: expose SMAP feature to guest
    KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
    KVM: Add SMAP support when setting CR4
    KVM: Remove SMAP bit from CR4_RESERVED_BITS
    KVM: ioapic: try to recover if pending_eoi goes out of range
    KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)

    Linus Torvalds
     

08 Apr, 2014

1 commit

  • Commit 8146875de7d4 (arm, kvm: Fix CPU hotplug callback registration)
    holds the lock before calling the two functions:

    kvm_vgic_hyp_init()
    kvm_timer_hyp_init()

    and both of these functions call register_cpu_notifier() to register a
    cpu notifier, which causes a double lock on cpu_add_remove_lock.

    Since both functions are only called from inside kvm_arch_init() with
    cpu_add_remove_lock already held, simply use __register_cpu_notifier()
    to fix the problem (see the sketch below this entry).

    Fixes: 8146875de7d4 (arm, kvm: Fix CPU hotplug callback registration)
    Signed-off-by: Ming Lei
    Reviewed-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Ming Lei
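
    A sketch of the calling pattern involved, following the cpu-hotplug
    callback registration convention from the same series (the notifier
    block name is illustrative):

        /* The begin/done pair takes cpu_add_remove_lock around the whole
         * registration sequence ... */
        cpu_notifier_register_begin();

        /* ... so helpers called inside it must use the double-underscore
         * variant, which assumes the lock is already held, rather than
         * register_cpu_notifier(), which would try to take it again. */
        err = __register_cpu_notifier(&vgic_cpu_nb);

        cpu_notifier_register_done();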
     

04 Apr, 2014

3 commits

  • The RTC tracking code tracks the cardinality of rtc_status.dest_map
    in rtc_status.pending_eoi. It has some WARN_ONs that trigger if
    pending_eoi ever becomes negative; however, these do nothing to
    recover, and bad things will happen soon after they trigger.

    When the next RTC interrupt is triggered, rtc_check_coalesced() will
    return false, but ioapic_service will find pending_eoi != 0 and
    do a BUG_ON. To avoid this, should pending_eoi ever be nonzero,
    call kvm_rtc_eoi_tracking_restore_all to recompute a correct
    dest_map and pending_eoi.

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
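
    A hedged sketch of the recovery described above (the exact call site in
    ioapic.c may differ):

        /* If the tracked count has drifted, rebuild dest_map/pending_eoi
         * from current vcpu state instead of letting ioapic_service() hit
         * its BUG_ON later. */
        if (WARN_ON(ioapic->rtc_status.pending_eoi != 0))
                kvm_rtc_eoi_tracking_restore_all(ioapic);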
     
  • QE reported that they got the BUG_ON in ioapic_service to trigger.
    I cannot reproduce it, but there are two reasons why this could happen.

    The less likely but also easiest one is when kvm_irq_delivery_to_apic
    does not deliver to any APIC and returns -1.

    Because irqe.shorthand == 0, the kvm_for_each_vcpu loop in that
    function is never reached. However, you can target the similar loop in
    kvm_irq_delivery_to_apic_fast; just program a zero logical destination
    address into the IOAPIC, or an out-of-range physical destination address.

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • Pull VFIO updates from Alex Williamson:
    "VFIO updates for v3.15 include:

    - Allow the vfio-type1 IOMMU to support multiple domains within a
    container
    - Plumb path to query whether all domains are cache-coherent
    - Wire query into kvm-vfio device to avoid KVM x86 WBINVD emulation
    - Always select CONFIG_ANON_INODES, vfio depends on it (Arnd)

    The first patch also makes the vfio-type1 IOMMU driver completely
    independent of the bus_type of the devices it's handling, which
    enables it to be used for both vfio-pci and a future vfio-platform
    (and hopefully combinations involving both simultaneously)"

    * tag 'vfio-v3.15-rc1' of git://github.com/awilliam/linux-vfio:
    vfio: always select ANON_INODES
    kvm/vfio: Support for DMA coherent IOMMUs
    vfio: Add external user check extension interface
    vfio/type1: Add extension to test DMA cache coherence of IOMMU
    vfio/iommu_type1: Multi-IOMMU domain support

    Linus Torvalds
     

03 Apr, 2014

1 commit

  • Pull kvm updates from Paolo Bonzini:
    "PPC and ARM do not have much going on this time. Most of the cool
    stuff, instead, is in s390 and (after a few releases) x86.

    ARM has some caching fixes and PPC has transactional memory support in
    guests. MIPS has some fixes, with more probably coming in 3.16 as
    QEMU will soon get support for MIPS KVM.

    For x86 there are optimizations for debug registers, which trigger on
    some Windows games, and other important fixes for Windows guests. We
    now expose to the guest Broadwell instruction set extensions and also
    Intel MPX. There's also a fix/workaround for OS X guests, nested
    virtualization features (preemption timer), and a couple kvmclock
    refinements.

    For s390, the main news is asynchronous page faults, together with
    improvements to IRQs (floating irqs and adapter irqs) that speed up
    virtio devices"

    * tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits)
    KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8
    KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset
    KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode
    KVM: PPC: Book3S HV: Return ENODEV error rather than EIO
    KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code
    KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state
    KVM: PPC: Book3S HV: Add transactional memory support
    KVM: Specify byte order for KVM_EXIT_MMIO
    KVM: vmx: fix MPX detection
    KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n
    KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE
    KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write
    KVM: s390: clear local interrupts at cpu initial reset
    KVM: s390: Fix possible memory leak in SIGP functions
    KVM: s390: fix calculation of idle_mask array size
    KVM: s390: randomize sca address
    KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP
    KVM: Bump KVM_MAX_IRQ_ROUTES for s390
    KVM: s390: irq routing for adapter interrupts.
    KVM: s390: adapter interrupt sources
    ...

    Linus Torvalds
     

01 Apr, 2014

1 commit

  • Pull x86 LTO changes from Peter Anvin:
    "More infrastructure work in preparation for link-time optimization
    (LTO). Most of these changes is to make sure symbols accessed from
    assembly code are properly marked as visible so the linker doesn't
    remove them.

    My understanding is that the changes to support LTO are still not
    upstream in binutils, but are on the way there. This patchset should
    conclude the x86-specific changes, and remaining patches to actually
    enable LTO will be fed through the Kbuild tree (other than keeping up
    with changes to the x86 code base, of course), although not
    necessarily in this merge window"

    * 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    Kbuild, lto: Handle basic LTO in modpost
    Kbuild, lto: Disable LTO for asm-offsets.c
    Kbuild, lto: Add a gcc-ld script to let run gcc as ld
    Kbuild, lto: add ld-version and ld-ifversion macros
    Kbuild, lto: Drop .number postfixes in modpost
    Kbuild, lto, workaround: Don't warn for initcall_reference in modpost
    lto: Disable LTO for sys_ni
    lto: Handle LTO common symbols in module loader
    lto, workaround: Add workaround for initcall reordering
    lto: Make asmlinkage __visible
    x86, lto: Disable LTO for the x86 VDSO
    initconst, x86: Fix initconst mistake in ts5500 code
    initconst: Fix initconst mistake in dcdbas
    asmlinkage: Make trace_hardirqs_on/off_caller visible
    asmlinkage, x86: Fix 32bit memcpy for LTO
    asmlinkage Make __stack_chk_failed and memcmp visible
    asmlinkage: Mark rwsem functions that can be called from assembler asmlinkage
    asmlinkage: Make main_extable_sort_needed visible
    asmlinkage, mutex: Mark __visible
    asmlinkage: Make trace_hardirq visible
    ...

    Linus Torvalds
     

19 Mar, 2014

1 commit

  • When registering a new irqfd, we call its ->poll method to collect any
    event that might have previously been pending so that we can trigger it.
    This is done under the kvm->irqfds.lock, which means the eventfd's ctx
    lock is taken under it.

    However, if we get a POLLHUP in irqfd_wakeup, we will be called with the
    ctx lock held before getting the irqfds.lock to deactivate the irqfd,
    causing lockdep to complain.

    Calling the ->poll method does not really need the irqfds.lock, so let's
    just move it after we've given up the irqfds.lock in kvm_irqfd_assign().

    Signed-off-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini

    Cornelia Huck
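
    A rough sketch of the reordering in kvm_irqfd_assign() (simplified; the
    surrounding setup and error handling are omitted):

        /* Publish the irqfd while holding irqfds.lock ... */
        spin_lock_irq(&kvm->irqfds.lock);
        list_add_tail(&irqfd->list, &kvm->irqfds.items);
        spin_unlock_irq(&kvm->irqfds.lock);

        /* ... but only poll for already-pending events after dropping it,
         * so the eventfd ctx lock is never taken under irqfds.lock. */
        events = f.file->f_op->poll(f.file, &irqfd->pt);
        if (events & POLLIN)
                schedule_work(&irqfd->inject);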
     

13 Mar, 2014

1 commit

  • Both QEMU and KVM have already accumulated a significant number of
    optimizations based on the hard-coded assumption that ioapic polarity
    will always use the ActiveHigh convention, where the logical and
    physical states of level-triggered irq lines always match (i.e.,
    active(asserted) == high == 1, inactive == low == 0). QEMU guests
    are expected to follow directions given via ACPI and configure the
    ioapic with polarity 0 (ActiveHigh). However, even when misbehaving
    guests (e.g. OS X <= 10.9) set the ioapic polarity to 1 (ActiveLow),
    QEMU will still use the ActiveHigh signaling convention when
    interfacing with KVM, so KVM now ignores the ioapic polarity programmed
    by the guest.
    Signed-off-by: Gabriel L. Somlo
    [Move documentation to KVM_IRQ_LINE, add ia64. - Paolo]
    Signed-off-by: Paolo Bonzini

    Gabriel L. Somlo
     

27 Feb, 2014

2 commits

  • VFIO now has support for using the IOMMU_CACHE flag and a mechanism
    for an external user to test the current operating mode of the IOMMU.
    Add support for this to the kvm-vfio pseudo device so that we only
    register noncoherent DMA when necessary.

    Signed-off-by: Alex Williamson
    Cc: Gleb Natapov
    Cc: Paolo Bonzini
    Acked-by: Paolo Bonzini

    Alex Williamson
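
    A hedged sketch of how the kvm-vfio device can consume the query
    (assuming the external-user "check extension" helper added by this
    series; treat the exact names as illustrative):

        /* Ask the group's IOMMU whether DMA is cache coherent; only register
         * noncoherent DMA with KVM (enabling WBINVD emulation on x86) when
         * it is not. */
        if (!vfio_external_check_extension(vfio_group, VFIO_DMA_CC_IOMMU))
                kvm_arch_register_noncoherent_dma(dev->kvm);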
     
  • Use the arch-specific function kvm_arch_vcpu_runnable() to add a further
    criterion for identifying a suitable vcpu to yield to during undirected
    yield processing (see the sketch below this entry).

    Signed-off-by: Michael Mueller
    Reviewed-by: Christian Borntraeger
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Paolo Bonzini

    Michael Mueller
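
    A sketch of how such a check can look inside the kvm_vcpu_on_spin()
    candidate loop (simplified; the exact conditions in kvm_main.c differ):

        kvm_for_each_vcpu(i, vcpu, kvm) {
                if (vcpu == me)
                        continue;
                /* Skip vcpus that are blocked and could not run even if we
                 * yielded to them. */
                if (waitqueue_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
                        continue;
                if (kvm_vcpu_yield_to(vcpu) > 0)
                        break;
        }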
     

18 Feb, 2014

1 commit

  • When this was introduced, kvm_flush_remote_tlbs() could be called
    without holding mmu_lock. It is now acknowledged that the function
    must be called before releasing mmu_lock, and all callers have already
    been changed to do so.

    There is no need to use smp_mb() and cmpxchg() any more.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Paolo Bonzini

    Takuya Yoshikawa
     

30 Jan, 2014

4 commits

  • On s390 we are not able to cancel work. Instead we will flush the work and wait for
    completion.

    Signed-off-by: Dominik Dingel
    Signed-off-by: Christian Borntraeger

    Dominik Dingel
     
  • By setting a Kconfig option, the architecture can control when guest
    notifications will be presented by the apf backend.
    There is the default batch mechanism, working as before, where the vcpu
    thread pulls in this information.
    In contrast, there is now the direct mechanism, which pushes the
    information to the guest.
    This way s390 can use an already existing architecture interface.

    Still the vcpu thread should call check_completion to cleanup leftovers.

    Signed-off-by: Dominik Dingel
    Signed-off-by: Christian Borntraeger

    Dominik Dingel
     
  • If kvm_io_bus_register_dev() fails, the caller currently returns success,
    but it should return an error code (see the sketch below this entry).

    I also did a little cleanup like removing an impossible NULL test.

    Cc: stable@vger.kernel.org
    Fixes: 2b3c246a682c ('KVM: Make coalesced mmio use a device per zone')
    Signed-off-by: Dan Carpenter
    Signed-off-by: Paolo Bonzini

    Dan Carpenter
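
    A small sketch of the kind of fix described, in the spirit of
    kvm_vm_ioctl_register_coalesced_mmio() (details abbreviated):

        ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, zone->addr,
                                      zone->size, &dev->dev);
        if (ret < 0)
                goto out_free_dev;      /* propagate the error instead of
                                         * falling through and returning 0 */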
     
  • This patch adds a floating irq controller as a kvm_device.
    It will be necessary for migration of floating interrupts as well
    as for hardening the reset code by allowing user space to explicitly
    remove all pending floating interrupts.

    Signed-off-by: Jens Freimann
    Reviewed-by: Cornelia Huck
    Signed-off-by: Christian Borntraeger

    Jens Freimann
     

23 Jan, 2014

1 commit

  • Pull KVM updates from Paolo Bonzini:
    "First round of KVM updates for 3.14; PPC parts will come next week.

    Nothing major here, just bugfixes all over the place. The most
    interesting part is the ARM guys' virtualized interrupt controller
    overhaul, which lets userspace get/set the state and thus enables
    migration of ARM VMs"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits)
    kvm: make KVM_MMU_AUDIT help text more readable
    KVM: s390: Fix memory access error detection
    KVM: nVMX: Update guest activity state field on L2 exits
    KVM: nVMX: Fix nested_run_pending on activity state HLT
    KVM: nVMX: Clean up handling of VMX-related MSRs
    KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject
    KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
    KVM: nVMX: Leave VMX mode on clearing of feature control MSR
    KVM: VMX: Fix DR6 update on #DB exception
    KVM: SVM: Fix reading of DR6
    KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS
    add support for Hyper-V reference time counter
    KVM: remove useless write to vcpu->hv_clock.tsc_timestamp
    KVM: x86: fix tsc catchup issue with tsc scaling
    KVM: x86: limit PIT timer frequency
    KVM: x86: handle invalid root_hpa everywhere
    kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub
    kvm: vfio: silence GCC warning
    KVM: ARM: Remove duplicate include
    arm/arm64: KVM: relax the requirements of VMA alignment for THP
    ...

    Linus Torvalds
     

15 Jan, 2014

2 commits

  • Commit 7940876e1330671708186ac3386aa521ffb5c182 ("kvm: make local
    functions static") broke KVM PPC builds due to removing (rather than
    moving) the stub version of kvm_vcpu_eligible_for_directed_yield().

    This patch reintroduces it.

    Signed-off-by: Scott Wood
    Cc: Stephen Hemminger
    Cc: Alexander Graf
    [Move the #ifdef inside the function. - Paolo]
    Signed-off-by: Paolo Bonzini

    Scott Wood
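
    A sketch of the reintroduced stub with the #ifdef moved inside the
    function, roughly as described above (body abridged):

        bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
        {
        #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
                bool eligible;

                eligible = !vcpu->spin_loop.in_spin_loop ||
                            vcpu->spin_loop.dy_eligible;

                if (vcpu->spin_loop.in_spin_loop)
                        kvm_vcpu_set_dy_eligible(vcpu, !vcpu->spin_loop.dy_eligible);

                return eligible;
        #else
                /* Architectures without the relax intercept simply treat
                 * every vcpu as eligible. */
                return true;
        #endif
        }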
     
  • Building vfio.o triggers a GCC warning (when building for 32 bits x86):
    arch/x86/kvm/../../../virt/kvm/vfio.c: In function 'kvm_vfio_set_group':
    arch/x86/kvm/../../../virt/kvm/vfio.c:104:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    void __user *argp = (void __user *)arg;
    ^

    Silence this warning by casting arg to unsigned long.

    argp's current type, "void __user *", is always cast to "int32_t
    __user *". So its type might as well be changed to "int32_t __user *".

    Signed-off-by: Paul Bolle
    Signed-off-by: Paolo Bonzini

    Paul Bolle
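
    A sketch of the two-step cast that silences the warning on 32-bit builds
    (mirroring the description above; the surrounding code is abbreviated):

        static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
        {
                /* Going through unsigned long first avoids the
                 * int-to-pointer-cast warning when pointers are 32 bits. */
                int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
                int32_t fd;

                if (get_user(fd, argp))
                        return -EFAULT;
                /* ... look up the vfio group file from fd ... */
                return 0;
        }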
     

22 Dec, 2013

5 commits

  • Implement support for the CPU interface register access driven by MMIO
    address offsets from the CPU interface base address. Useful for user
    space to support save/restore of the VGIC state.

    This commit adds support only for the same logic as the current VGIC
    support, and no more. For example, the active priority registers are
    handled as RAZ/WI, just like setting priorities on the emulated
    distributor.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • Handle MMIO accesses to the two registers which should support both the
    case where the VMs want to read/write either of these registers and the
    case where user space reads/writes these registers to do save/restore of
    the VGIC state.

    Note that the added complexity compared to simple set/clear enable
    registers stems from the bookkeeping of source cpu ids. It may be
    possible to change the underlying data structure to simplify the
    complexity, but since this is not in the critical path at all, this will
    do.

    Also note that reading this register from a live guest will not be
    accurate compared to on hardware, because some state may be living on
    the CPU LRs and the only way to give a consistent read would be to force
    stop all the VCPUs and request them to unqueue the LR state onto the
    distributor. Until we have an actual user of live reading this
    register, we can live with the difference.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • To properly access the VGIC state from user space it is very impractical
    to have to loop through all the LRs in all register access functions.
    Instead, support moving all pending state from LRs to the distributor,
    but leave active state LRs alone.

    Note that to accurately present the active and pending state to VCPUs
    reading these distributor registers from a live VM, we would have to
    stop all VCPUs other than the calling VCPU and ask each CPU to unqueue
    their LR state onto the distributor and add fields to track active state
    on the distributor side as well. We don't have any users of such
    functionality yet and there are other inaccuracies of the GIC emulation,
    so don't provide accurate synchronized access to this state just yet.
    However, when the time comes, having this function should help.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • Add infrastructure to handle distributor and cpu interface register
    accesses through the KVM_{GET/SET}_DEVICE_ATTR interface by adding the
    KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_CPU_REGS groups
    and defining the semantics of the attr field to be the MMIO offset as
    specified in the GICv2 specs.

    Missing register accesses or other changes in individual register access
    functions to support save/restore of the VGIC state is added in
    subsequent patches.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
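
    A hedged userspace sketch of driving this interface for save/restore (the
    exact encoding of the attr field, i.e. which bits carry the cpuid and
    which the MMIO offset, is defined by the KVM API documentation, so the
    shift below is illustrative):

        #include <linux/kvm.h>
        #include <sys/ioctl.h>

        /* Read one distributor register of the in-kernel VGIC. */
        static int vgic_get_dist_reg(int vgic_fd, __u64 cpuid, __u64 offset,
                                     __u32 *val)
        {
                struct kvm_device_attr attr = {
                        .group = KVM_DEV_ARM_VGIC_GRP_DIST_REGS,
                        .attr  = (cpuid << 32) | offset,  /* illustrative encoding */
                        .addr  = (__u64)(unsigned long)val,
                };

                return ioctl(vgic_fd, KVM_GET_DEVICE_ATTR, &attr);
        }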
     
  • Rename the vgic_ranges array to vgic_dist_ranges to be more specific and
    to prepare for handling CPU interface register access as well (for
    save/restore of VGIC state).

    Pass offset from distributor or interface MMIO base to
    find_matching_range function instead of the physical address of the
    access in the VM memory map. This allows other callers that are unaware
    of the VM specifics, but have generic VGIC knowledge, to reuse the
    function.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall