15 Apr, 2014

1 commit

  • Pull KVM fixes from Marcelo Tosatti:
    - Fix for guest triggerable BUG_ON (CVE-2014-0155)
    - CR4.SMAP support
    - Spurious WARN_ON() fix

    * git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: x86: remove WARN_ON from get_kernel_ns()
    KVM: Rename variable smep to cr4_smep
    KVM: expose SMAP feature to guest
    KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
    KVM: Add SMAP support when setting CR4
    KVM: Remove SMAP bit from CR4_RESERVED_BITS
    KVM: ioapic: try to recover if pending_eoi goes out of range
    KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)

    Linus Torvalds
     

08 Apr, 2014

1 commit

  • Commit 8146875de7d4 (arm, kvm: Fix CPU hotplug callback registration)
    holds the lock before calling the two functions:

    kvm_vgic_hyp_init()
    kvm_timer_hyp_init()

    Both functions call register_cpu_notifier() to register a CPU
    notifier, which causes a double lock on cpu_add_remove_lock.

    Since both functions are only called from kvm_arch_init() with
    cpu_add_remove_lock held, simply use __register_cpu_notifier()
    to fix the problem.

    Fixes: 8146875de7d4 (arm, kvm: Fix CPU hotplug callback registration)
    Signed-off-by: Ming Lei
    Reviewed-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Ming Lei
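
    The double lock described above can be illustrated with a toy model.
    This is only a sketch: the toy_* names are hypothetical, the lock is a
    plain flag rather than a real mutex, and the "already locked" variant
    stands in for the role __register_cpu_notifier() plays in the commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a non-recursive lock such as cpu_add_remove_lock. */
static bool lock_held;
static int notifier_count;

/* Returns false instead of deadlocking when the lock is already held. */
static bool toy_lock(void)
{
    if (lock_held)
        return false;
    lock_held = true;
    return true;
}

static void toy_unlock(void)
{
    assert(lock_held);
    lock_held = false;
}

/* Lock-free variant: the caller must already hold the lock. */
static int toy_register_notifier_locked(void)
{
    assert(lock_held);
    return ++notifier_count;
}

/* Locking variant: takes the lock itself, so it must not be called
 * from a context that already holds it. */
static int toy_register_notifier(void)
{
    int ret;

    if (!toy_lock())
        return -1; /* a real mutex would self-deadlock here */
    ret = toy_register_notifier_locked();
    toy_unlock();
    return ret;
}
```

    A caller that already holds the lock gets -1 from
    toy_register_notifier() in this model, while
    toy_register_notifier_locked() succeeds.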
     

04 Apr, 2014

3 commits

  • The RTC tracking code tracks the cardinality of rtc_status.dest_map
    into rtc_status.pending_eoi. It has some WARN_ONs that trigger if
    pending_eoi ever becomes negative; however, these do not do anything
    to recover, and bad things will happen soon after they trigger.

    When the next RTC interrupt is triggered, rtc_check_coalesced() will
    return false, but ioapic_service will find pending_eoi != 0 and
    do a BUG_ON. To avoid this, should pending_eoi ever be nonzero,
    call kvm_rtc_eoi_tracking_restore_all to recompute a correct
    dest_map and pending_eoi.

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
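
    The recovery idea, recomputing the counter from the bitmap rather than
    only warning, might look roughly like the sketch below. The struct and
    the toy_* helpers are hypothetical simplifications, not the kernel's
    rtc_status code, and a 32-bit map is assumed for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified mirror of the RTC tracking state. */
struct toy_rtc_status {
    int pending_eoi;   /* should equal the number of bits set in dest_map */
    uint32_t dest_map; /* one bit per vcpu that still owes an EOI */
};

/* Count the set bits, i.e. the cardinality of the destination map. */
static int toy_popcount(uint32_t v)
{
    int n = 0;

    for (; v; v &= v - 1)
        n++;
    return n;
}

/* Recompute the counter from the bitmap instead of trusting a bad value. */
static void toy_rtc_recover(struct toy_rtc_status *s)
{
    s->pending_eoi = toy_popcount(s->dest_map);
}

/* Called on EOI from vcpu `id`; recovers if the counter goes negative. */
static void toy_rtc_ack(struct toy_rtc_status *s, unsigned int id)
{
    assert(id < 32);
    if (s->dest_map & (UINT32_C(1) << id)) {
        s->dest_map &= ~(UINT32_C(1) << id);
        s->pending_eoi--;
    }
    if (s->pending_eoi < 0)
        toy_rtc_recover(s);
}
```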
     
  • QE reported that they got the BUG_ON in ioapic_service to trigger.
    I cannot reproduce it, but there are two reasons why this could happen.

    The less likely but also simpler one is when kvm_irq_delivery_to_apic
    does not deliver to any APIC and returns -1.

    Because irqe.shorthand == 0, the kvm_for_each_vcpu loop in that
    function is never reached. However, you can target the similar loop in
    kvm_irq_delivery_to_apic_fast; just program a zero logical destination
    address into the IOAPIC, or an out-of-range physical destination address.

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
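
    The hazard of storing a -1 return value into a counter can be sketched
    as follows. The log does not show the actual patch, so this is only one
    plausible shape of such a fix; toy_deliver() and its bitmask arguments
    are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for interrupt delivery: returns the number of
 * APICs that accepted the interrupt, or -1 if no destination matched. */
static int toy_deliver(unsigned int dest_mask, unsigned int present_mask)
{
    unsigned int hit = dest_mask & present_mask;
    int n = 0;

    if (!hit)
        return -1;
    for (; hit; hit &= hit - 1)
        n++;
    return n;
}

/* Value to store as the number of outstanding EOIs: a failed delivery
 * leaves nothing pending, so a negative return is treated as zero. */
static int toy_pending_eoi_after(unsigned int dest_mask,
                                 unsigned int present_mask)
{
    int ret = toy_deliver(dest_mask, present_mask);

    return ret < 0 ? 0 : ret;
}
```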
     
  • Pull VFIO updates from Alex Williamson:
    "VFIO updates for v3.15 include:

    - Allow the vfio-type1 IOMMU to support multiple domains within a
    container
    - Plumb path to query whether all domains are cache-coherent
    - Wire query into kvm-vfio device to avoid KVM x86 WBINVD emulation
    - Always select CONFIG_ANON_INODES, vfio depends on it (Arnd)

    The first patch also makes the vfio-type1 IOMMU driver completely
    independent of the bus_type of the devices it's handling, which
    enables it to be used for both vfio-pci and a future vfio-platform
    (and hopefully combinations involving both simultaneously)"

    * tag 'vfio-v3.15-rc1' of git://github.com/awilliam/linux-vfio:
    vfio: always select ANON_INODES
    kvm/vfio: Support for DMA coherent IOMMUs
    vfio: Add external user check extension interface
    vfio/type1: Add extension to test DMA cache coherence of IOMMU
    vfio/iommu_type1: Multi-IOMMU domain support

    Linus Torvalds
     

03 Apr, 2014

1 commit

  • Pull kvm updates from Paolo Bonzini:
    "PPC and ARM do not have much going on this time. Most of the cool
    stuff, instead, is in s390 and (after a few releases) x86.

    ARM has some caching fixes and PPC has transactional memory support in
    guests. MIPS has some fixes, with more probably coming in 3.16 as
    QEMU will soon get support for MIPS KVM.

    For x86 there are optimizations for debug registers, which trigger on
    some Windows games, and other important fixes for Windows guests. We
    now expose to the guest Broadwell instruction set extensions and also
    Intel MPX. There's also a fix/workaround for OS X guests, nested
    virtualization features (preemption timer), and a couple kvmclock
    refinements.

    For s390, the main news is asynchronous page faults, together with
    improvements to IRQs (floating irqs and adapter irqs) that speed up
    virtio devices"

    * tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits)
    KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8
    KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset
    KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode
    KVM: PPC: Book3S HV: Return ENODEV error rather than EIO
    KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code
    KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state
    KVM: PPC: Book3S HV: Add transactional memory support
    KVM: Specify byte order for KVM_EXIT_MMIO
    KVM: vmx: fix MPX detection
    KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n
    KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE
    KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write
    KVM: s390: clear local interrupts at cpu initial reset
    KVM: s390: Fix possible memory leak in SIGP functions
    KVM: s390: fix calculation of idle_mask array size
    KVM: s390: randomize sca address
    KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP
    KVM: Bump KVM_MAX_IRQ_ROUTES for s390
    KVM: s390: irq routing for adapter interrupts.
    KVM: s390: adapter interrupt sources
    ...

    Linus Torvalds
     

01 Apr, 2014

1 commit

  • Pull x86 LTO changes from Peter Anvin:
    "More infrastructure work in preparation for link-time optimization
    (LTO). Most of these changes are to make sure symbols accessed from
    assembly code are properly marked as visible so the linker doesn't
    remove them.

    My understanding is that the changes to support LTO are still not
    upstream in binutils, but are on the way there. This patchset should
    conclude the x86-specific changes, and remaining patches to actually
    enable LTO will be fed through the Kbuild tree (other than keeping up
    with changes to the x86 code base, of course), although not
    necessarily in this merge window"

    * 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    Kbuild, lto: Handle basic LTO in modpost
    Kbuild, lto: Disable LTO for asm-offsets.c
    Kbuild, lto: Add a gcc-ld script to let run gcc as ld
    Kbuild, lto: add ld-version and ld-ifversion macros
    Kbuild, lto: Drop .number postfixes in modpost
    Kbuild, lto, workaround: Don't warn for initcall_reference in modpost
    lto: Disable LTO for sys_ni
    lto: Handle LTO common symbols in module loader
    lto, workaround: Add workaround for initcall reordering
    lto: Make asmlinkage __visible
    x86, lto: Disable LTO for the x86 VDSO
    initconst, x86: Fix initconst mistake in ts5500 code
    initconst: Fix initconst mistake in dcdbas
    asmlinkage: Make trace_hardirqs_on/off_caller visible
    asmlinkage, x86: Fix 32bit memcpy for LTO
    asmlinkage Make __stack_chk_failed and memcmp visible
    asmlinkage: Mark rwsem functions that can be called from assembler asmlinkage
    asmlinkage: Make main_extable_sort_needed visible
    asmlinkage, mutex: Mark __visible
    asmlinkage: Make trace_hardirq visible
    ...

    Linus Torvalds
     

21 Mar, 2014

4 commits


19 Mar, 2014

1 commit

  • When registering a new irqfd, we call its ->poll method to collect any
    event that might have previously been pending so that we can trigger it.
    This is done under the kvm->irqfds.lock, which means the eventfd's ctx
    lock is taken under it.

    However, if we get a POLLHUP in irqfd_wakeup, we will be called with the
    ctx lock held before getting the irqfds.lock to deactivate the irqfd,
    causing lockdep to complain.

    Calling the ->poll method does not really need the irqfds.lock, so let's
    just move it after we've given up the irqfds.lock in kvm_irqfd_assign().

    Signed-off-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini

    Cornelia Huck
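
    The lock-order inversion described above can be reproduced with a very
    small order checker. This is a sketch in the spirit of lockdep, not its
    implementation; the toy_* functions and the two-lock model are
    assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum { IRQFDS_LOCK, CTX_LOCK, NLOCKS };

static bool held[NLOCKS];
static bool seen_after[NLOCKS][NLOCKS]; /* [a][b]: b was taken while a held */
static bool inversion;

static void toy_reset(void)
{
    for (int a = 0; a < NLOCKS; a++) {
        held[a] = false;
        for (int b = 0; b < NLOCKS; b++)
            seen_after[a][b] = false;
    }
    inversion = false;
}

/* Record the ordering edge and flag it if the reverse edge exists. */
static void toy_acquire(int l)
{
    for (int h = 0; h < NLOCKS; h++) {
        if (!held[h] || h == l)
            continue;
        seen_after[h][l] = true;
        if (seen_after[l][h])
            inversion = true;
    }
    held[l] = true;
}

static void toy_release(int l)
{
    held[l] = false;
}

/* Before: ->poll (which takes the ctx lock) runs under irqfds.lock. */
static void toy_assign_old(void)
{
    toy_acquire(IRQFDS_LOCK);
    toy_acquire(CTX_LOCK);
    toy_release(CTX_LOCK);
    toy_release(IRQFDS_LOCK);
}

/* After: ->poll runs once irqfds.lock has been dropped. */
static void toy_assign_new(void)
{
    toy_acquire(IRQFDS_LOCK);
    toy_release(IRQFDS_LOCK);
    toy_acquire(CTX_LOCK);
    toy_release(CTX_LOCK);
}

/* POLLHUP wakeup: entered with the ctx lock held, then takes irqfds.lock. */
static void toy_wakeup_hup(void)
{
    toy_acquire(CTX_LOCK);
    toy_acquire(IRQFDS_LOCK);
    toy_release(IRQFDS_LOCK);
    toy_release(CTX_LOCK);
}
```

    Running the old assign path followed by the wakeup path sets
    `inversion`; the new assign path followed by the wakeup path does not.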
     

13 Mar, 2014

1 commit

  • Both QEMU and KVM have already accumulated a significant number of
    optimizations based on the hard-coded assumption that ioapic polarity
    will always use the ActiveHigh convention, where the logical and
    physical states of level-triggered irq lines always match (i.e.,
    active(asserted) == high == 1, inactive == low == 0). QEMU guests
    are expected to follow directions given via ACPI and configure the
    ioapic with polarity 0 (ActiveHigh). However, even when misbehaving
    guests (e.g. OS X
    Signed-off-by: Gabriel L. Somlo
    [Move documentation to KVM_IRQ_LINE, add ia64. - Paolo]
    Signed-off-by: Paolo Bonzini

    Gabriel L. Somlo
     

27 Feb, 2014

2 commits

  • VFIO now has support for using the IOMMU_CACHE flag and a mechanism
    for an external user to test the current operating mode of the IOMMU.
    Add support for this to the kvm-vfio pseudo device so that we only
    register noncoherent DMA when necessary.

    Signed-off-by: Alex Williamson
    Cc: Gleb Natapov
    Cc: Paolo Bonzini
    Acked-by: Paolo Bonzini

    Alex Williamson
     
  • Use the architecture-specific function kvm_arch_vcpu_runnable() to add a
    further criterion to identify a suitable vcpu to yield to during undirected yield
    processing.

    Signed-off-by: Michael Mueller
    Reviewed-by: Christian Borntraeger
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Paolo Bonzini

    Michael Mueller
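
    As a rough illustration of adding one more filter to candidate
    selection, consider the sketch below. The toy_vcpu fields and the
    existing "preempted" criterion are assumptions; only the idea of an
    additional runnable check comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vcpu state used only for this sketch. */
struct toy_vcpu {
    bool preempted; /* assumed pre-existing criterion */
    bool runnable;  /* the additional, architecture-defined criterion */
};

/* Return the index of the first suitable yield target, or -1 if none. */
static int toy_pick_yield_target(const struct toy_vcpu *v, int n, int self)
{
    for (int i = 0; i < n; i++) {
        if (i == self)
            continue;
        if (!v[i].preempted)
            continue;
        if (!v[i].runnable) /* skip vcpus that could not make progress */
            continue;
        return i;
    }
    return -1;
}
```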
     

18 Feb, 2014

1 commit

  • When this was introduced, kvm_flush_remote_tlbs() could be called
    without holding mmu_lock. It is now acknowledged that the function
    must be called before releasing mmu_lock, and all callers have already
    been changed to do so.

    There is no need to use smp_mb() and cmpxchg() any more.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Paolo Bonzini

    Takuya Yoshikawa
     

14 Feb, 2014

3 commits


04 Feb, 2014

1 commit


30 Jan, 2014

4 commits

  • On s390 we are not able to cancel work. Instead we will flush the work and wait for
    completion.

    Signed-off-by: Dominik Dingel
    Signed-off-by: Christian Borntraeger

    Dominik Dingel
     
  • By setting a Kconfig option, the architecture can control when
    guest notifications will be presented by the apf backend.
    There is the default batch mechanism, working as before, where the vcpu
    thread should pull in this information.
    In contrast, there is now the direct mechanism, which pushes the
    information to the guest.
    This way s390 can use an already existing architecture interface.

    Still, the vcpu thread should call check_completion to clean up leftovers.

    Signed-off-by: Dominik Dingel
    Signed-off-by: Christian Borntraeger

    Dominik Dingel
     
  • If kvm_io_bus_register_dev() fails, the calling code returns success, but
    it should return an error code.

    I also did a little cleanup like removing an impossible NULL test.

    Cc: stable@vger.kernel.org
    Fixes: 2b3c246a682c ('KVM: Make coalesced mmio use a device per zone')
    Signed-off-by: Dan Carpenter
    Signed-off-by: Paolo Bonzini

    Dan Carpenter
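
    The bug class here, a failure path that falls through to "return 0",
    can be shown with a minimal sketch. The toy_bus type and both functions
    are hypothetical and do not reflect the actual KVM code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bus with a fixed number of device slots. */
struct toy_bus {
    int used;
    int capacity;
};

/* Registration fails with -ENOSPC once the bus is full. */
static int toy_bus_register(struct toy_bus *bus)
{
    if (bus->used >= bus->capacity)
        return -ENOSPC;
    bus->used++;
    return 0;
}

/* Caller: propagate the callee's error instead of reporting success. */
static int toy_register_zone(struct toy_bus *bus)
{
    int ret = toy_bus_register(bus);

    if (ret < 0)
        return ret; /* returning 0 here would hide the failure */
    return 0;
}
```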
     
  • This patch adds a floating irq controller as a kvm_device.
    It will be necessary for migration of floating interrupts as well
    as for hardening the reset code by allowing user space to explicitly
    remove all pending floating interrupts.

    Signed-off-by: Jens Freimann
    Reviewed-by: Cornelia Huck
    Signed-off-by: Christian Borntraeger

    Jens Freimann
     

23 Jan, 2014

1 commit

  • Pull KVM updates from Paolo Bonzini:
    "First round of KVM updates for 3.14; PPC parts will come next week.

    Nothing major here, just bugfixes all over the place. The most
    interesting part is the ARM guys' virtualized interrupt controller
    overhaul, which lets userspace get/set the state and thus enables
    migration of ARM VMs"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits)
    kvm: make KVM_MMU_AUDIT help text more readable
    KVM: s390: Fix memory access error detection
    KVM: nVMX: Update guest activity state field on L2 exits
    KVM: nVMX: Fix nested_run_pending on activity state HLT
    KVM: nVMX: Clean up handling of VMX-related MSRs
    KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject
    KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
    KVM: nVMX: Leave VMX mode on clearing of feature control MSR
    KVM: VMX: Fix DR6 update on #DB exception
    KVM: SVM: Fix reading of DR6
    KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS
    add support for Hyper-V reference time counter
    KVM: remove useless write to vcpu->hv_clock.tsc_timestamp
    KVM: x86: fix tsc catchup issue with tsc scaling
    KVM: x86: limit PIT timer frequency
    KVM: x86: handle invalid root_hpa everywhere
    kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub
    kvm: vfio: silence GCC warning
    KVM: ARM: Remove duplicate include
    arm/arm64: KVM: relax the requirements of VMA alignment for THP
    ...

    Linus Torvalds
     

15 Jan, 2014

2 commits

  • Commit 7940876e1330671708186ac3386aa521ffb5c182 ("kvm: make local
    functions static") broke KVM PPC builds due to removing (rather than
    moving) the stub version of kvm_vcpu_eligible_for_directed_yield().

    This patch reintroduces it.

    Signed-off-by: Scott Wood
    Cc: Stephen Hemminger
    Cc: Alexander Graf
    [Move the #ifdef inside the function. - Paolo]
    Signed-off-by: Paolo Bonzini

    Scott Wood
     
  • Building vfio.o triggers a GCC warning (when building for 32-bit x86):
    arch/x86/kvm/../../../virt/kvm/vfio.c: In function 'kvm_vfio_set_group':
    arch/x86/kvm/../../../virt/kvm/vfio.c:104:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    void __user *argp = (void __user *)arg;
    ^

    Silence this warning by casting arg to unsigned long.

    argp's current type, "void __user *", is always cast to "int32_t
    __user *". So its type might as well be changed to "int32_t __user *".

    Signed-off-by: Paul Bolle
    Signed-off-by: Paolo Bonzini

    Paul Bolle
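
    A userspace sketch of the same cast pattern follows. The commit casts
    through unsigned long, which is pointer-sized in the kernel; the sketch
    uses uintptr_t as the portable equivalent, and the toy_* helper names
    are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* An ioctl-style argument: a pointer carried in a fixed 64-bit field.
 * Casting uint64_t straight to a pointer warns on 32-bit targets, so the
 * value goes through a pointer-sized integer type first. */
static int32_t *toy_arg_to_ptr(uint64_t arg)
{
    return (int32_t *)(uintptr_t)arg;
}

/* The reverse direction, widening the pointer into the 64-bit field. */
static uint64_t toy_ptr_to_arg(int32_t *p)
{
    return (uint64_t)(uintptr_t)p;
}
```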
     

09 Jan, 2014

2 commits


22 Dec, 2013

10 commits

  • Implement support for the CPU interface register access driven by MMIO
    address offsets from the CPU interface base address. Useful for user
    space to support save/restore of the VGIC state.

    This commit adds support only for the same logic as the current VGIC
    support, and no more. For example, the active priority registers are
    handled as RAZ/WI, just like setting priorities on the emulated
    distributor.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • Handle MMIO accesses to the two registers which should support both the
    case where the VMs want to read/write either of these registers and the
    case where user space reads/writes these registers to do save/restore of
    the VGIC state.

    Note that the added complexity compared to simple set/clear enable
    registers stems from the bookkeeping of source cpu ids. It may be
    possible to change the underlying data structure to simplify the
    complexity, but since this is not in the critical path at all, this will
    do.

    Also note that reading this register from a live guest will not be
    accurate compared to on hardware, because some state may be living on
    the CPU LRs and the only way to give a consistent read would be to force
    stop all the VCPUs and request them to unqueue the LR state onto the
    distributor. Until we have an actual user of live reading this
    register, we can live with the difference.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • To properly access the VGIC state from user space it is very impractical
    to have to loop through all the LRs in all register access functions.
    Instead, support moving all pending state from LRs to the distributor,
    but leave active state LRs alone.

    Note that to accurately present the active and pending state to VCPUs
    reading these distributor registers from a live VM, we would have to
    stop all VCPUs other than the calling VCPU and ask each CPU to unqueue
    their LR state onto the distributor and add fields to track active state
    on the distributor side as well. We don't have any users of such
    functionality yet and there are other inaccuracies of the GIC emulation,
    so don't provide accurate synchronized access to this state just yet.
    However, when the time comes, having this function should help.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • Add infrastructure to handle distributor and cpu interface register
    accesses through the KVM_{GET/SET}_DEVICE_ATTR interface by adding the
    KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_CPU_REGS groups
    and defining the semantics of the attr field to be the MMIO offset as
    specified in the GICv2 specs.

    Missing register accesses or other changes in individual register access
    functions to support save/restore of the VGIC state are added in
    subsequent patches.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • Rename the vgic_ranges array to vgic_dist_ranges to be more specific and
    to prepare for handling CPU interface register access as well (for
    save/restore of VGIC state).

    Pass offset from distributor or interface MMIO base to
    find_matching_range function instead of the physical address of the
    access in the VM memory map. This allows other callers that are unaware
    of the VM specifics, but have generic VGIC knowledge, to reuse the function.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
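
    Looking up a register range by offset rather than by guest physical
    address might look like the sketch below. The table contents and toy_*
    names are illustrative assumptions, not the kernel's vgic_dist_ranges;
    the physical-address wrapper shows how an MMIO caller would subtract
    the base before reusing the offset-based lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One register block, described relative to the device's MMIO base. */
struct toy_range {
    uint64_t base; /* offset of the first byte */
    uint64_t len;  /* length in bytes */
};

/* Illustrative table only. */
static const struct toy_range toy_dist_ranges[] = {
    { 0x000, 0x00c },
    { 0x100, 0x080 },
    { 0x180, 0x080 },
};
#define TOY_NR_RANGES (sizeof(toy_dist_ranges) / sizeof(toy_dist_ranges[0]))

/* Offset-based lookup: usable by any caller that knows the register
 * layout, with no knowledge of where the device sits in guest memory. */
static const struct toy_range *toy_find_range(uint64_t offset, uint64_t len)
{
    for (size_t i = 0; i < TOY_NR_RANGES; i++) {
        const struct toy_range *r = &toy_dist_ranges[i];

        if (offset >= r->base && offset + len <= r->base + r->len)
            return r;
    }
    return NULL;
}

/* MMIO path: convert the guest physical address to an offset first. */
static const struct toy_range *toy_find_range_phys(uint64_t mmio_base,
                                                   uint64_t phys,
                                                   uint64_t len)
{
    if (phys < mmio_base)
        return NULL;
    return toy_find_range(phys - mmio_base, len);
}
```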
     
  • Support setting the distributor and cpu interface base addresses in the
    VM physical address space through the KVM_{SET,GET}_DEVICE_ATTR API
    in addition to the ARM specific API.

    This has the added benefit of being able to share more code in user
    space and do things in a uniform manner.

    Also deprecate the older API at the same time, but backwards
    compatibility will be maintained.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • Support creating the ARM VGIC device through the KVM_CREATE_DEVICE
    ioctl, which can later be used with KVM_{GET/SET}_DEVICE_ATTR. This
    is useful both for setting addresses through a more generic API than
    the ARM-specific one and for save/restore of VGIC state.

    Adds KVM_CAP_DEVICE_CTRL to ARM capabilities.

    Note that we change the check for creating a VGIC from bailing out if
    any VCPUs were created, to bailing out if any VCPUs were ever run. This
    is an important distinction that shouldn't break anything, but allows
    creating the VGIC after the VCPUs have been created.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • Rework the VGIC initialization slightly to allow initialization of the
    vgic cpu-specific state even if the irqchip (the VGIC) hasn't been
    created by user space yet. This is safe, because the vgic data
    structures are already allocated when the CPU is allocated if VGIC
    support is compiled into the kernel. Further, the init process does not
    depend on any other information and the sacrifice is a slight
    performance degradation for creating VMs in the no-VGIC case.

    The reason is that the new device control API doesn't mandate creating
    the VGIC before creating the VCPU and it is unreasonable to require user
    space to create the VGIC before creating the VCPUs.

    At the same time move the irqchip_in_kernel check out of
    kvm_vcpu_first_run_init and into the init function to make the per-vcpu
    and global init functions symmetric and add comments on the exported
    functions making it a bit easier to understand the init flow by only
    looking at vgic.c.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • For migration to work we need to save (and later restore) the state of
    each core's virtual generic timer.
    Since this is per VCPU, we can use the [gs]et_one_reg ioctl and export
    the three needed registers (control, counter, compare value).
    Though they live in cp15 space, we don't use the existing list, since
    they need special accessor functions and the arch timer is optional.

    Acked-by: Marc Zyngier
    Signed-off-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • Initialize the cntvoff at kvm_init_vm time, not before running the VCPUs
    for the first time, because that would overwrite any values potentially
    restored from user space.

    Cc: Andre Przywara
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
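
    Why the initialization point matters for restore can be seen in a tiny
    model. The toy_vm struct and functions are hypothetical; they only
    mimic the ordering of "init, restore from user space, first run".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical VM state holding only the virtual counter offset. */
struct toy_vm {
    uint64_t cntvoff;
};

/* New behaviour: the offset gets its default at VM creation time. */
static void toy_vm_init(struct toy_vm *vm, uint64_t host_counter)
{
    vm->cntvoff = host_counter;
}

/* User space restores a previously saved offset, e.g. after migration. */
static void toy_restore_cntvoff(struct toy_vm *vm, uint64_t saved)
{
    vm->cntvoff = saved;
}

/* Old behaviour: initializing at first run clobbers a restored value. */
static void toy_first_run_old(struct toy_vm *vm, uint64_t host_counter)
{
    vm->cntvoff = host_counter;
}

/* New behaviour: first run leaves the offset alone. */
static void toy_first_run_new(struct toy_vm *vm)
{
    (void)vm;
}
```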
     

13 Dec, 2013

1 commit