04 Nov, 2013

1 commit


01 Nov, 2013

1 commit


31 Oct, 2013

3 commits

  • We currently use some ad-hoc arch variables tied to legacy KVM device
    assignment to manage emulation of instructions that depend on whether
    non-coherent DMA is present. Create an interface for this, adapting
    legacy KVM device assignment and adding VFIO via the KVM-VFIO device.
    For now we assume that non-coherent DMA is possible any time we have a
    VFIO group. Eventually an interface can be developed as part of the
    VFIO external user interface to query the coherency of a group.

    Signed-off-by: Alex Williamson
    Signed-off-by: Paolo Bonzini

    Alex Williamson
     
  • Default to operating in coherent mode. This simplifies the logic when
    we switch to a model of registering and unregistering noncoherent I/O
    with KVM.

    Signed-off-by: Alex Williamson
    Signed-off-by: Paolo Bonzini

    Alex Williamson
     
  • So far we've succeeded at making KVM and VFIO mostly unaware of each
    other, but areas are cropping up where a connection beyond eventfds
    and irqfds needs to be made. This patch introduces a KVM-VFIO device
    that is meant to be a gateway for such interaction. The user creates
    the device and can add and remove VFIO groups to it via file
    descriptors. When a group is added, KVM verifies the group is valid
    and gets a reference to it via the VFIO external user interface.

    Signed-off-by: Alex Williamson
    Signed-off-by: Paolo Bonzini

    Alex Williamson
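The first two entries in this group describe non-coherent DMA as something that is registered and unregistered with KVM, with coherent operation as the default. A plain reference count captures that model: each source of non-coherent DMA (a legacy-assigned device, or a VFIO group added through the KVM-VFIO device) increments it, and emulation only asks whether the count is non-zero. This is a userspace sketch under that assumption. The names `vm_state`, `register_noncoherent_dma()`, `unregister_noncoherent_dma()` and `has_noncoherent_dma()` are hypothetical and are not the kernel's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-VM state: number of registered non-coherent DMA users.
 * Zero means the VM runs in the default, coherent mode. */
struct vm_state {
    atomic_int noncoherent_dma_count;
};

/* Called when a possibly non-coherent DMA source is attached, e.g. when a
 * VFIO group is added (assumed non-coherent, as the first entry states). */
static void register_noncoherent_dma(struct vm_state *vm)
{
    atomic_fetch_add(&vm->noncoherent_dma_count, 1);
}

/* Called when that source is removed; must pair with a register call. */
static void unregister_noncoherent_dma(struct vm_state *vm)
{
    int prev = atomic_fetch_sub(&vm->noncoherent_dma_count, 1);
    assert(prev > 0); /* catch unbalanced unregister calls */
    (void)prev;
}

/* Instruction emulation consults only this predicate. */
static bool has_noncoherent_dma(struct vm_state *vm)
{
    return atomic_load(&vm->noncoherent_dma_count) != 0;
}
```

Initialize the counter with `atomic_init(&vm.noncoherent_dma_count, 0)`. A count rather than a flag lets several groups come and go independently, and the VM drops back to coherent mode only after the last one is removed.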
     

17 Oct, 2013

1 commit


15 Oct, 2013

1 commit

  • Page pinning is not mandatory in KVM async page fault processing, since
    after the async page fault event is delivered to a guest, the guest
    accesses the page once again and that access does its own GUP. Drop the
    FOLL_GET flag in GUP in the async_pf code, and simplify the check/clear
    processing.

    Suggested-by: Gleb Natapov
    Signed-off-by: Gu zheng
    Signed-off-by: chai wen
    Signed-off-by: Gleb Natapov

    chai wen
     

14 Oct, 2013

1 commit

  • The gfn_to_index function relies on huge page defines which may either
    not make sense on systems that don't support huge pages or are defined
    in an inconvenient way for other architectures. Since this is
    x86-specific, move the function to arch/x86/include/asm/kvm_host.h.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Gleb Natapov

    Christoffer Dall
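As an illustration of what `gfn_to_index` computes: the index of a (huge) page within a memslot, at a given page-size level. This sketch assumes the x86 layout in which each level above 4 KiB pages adds 9 address bits (4 KiB, 2 MiB, 1 GiB). `HPAGE_GFN_SHIFT` is a local stand-in for the x86 define, so treat this as a model rather than the header's text.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Assumed x86 layout: level 1 = 4 KiB, level 2 = 2 MiB, level 3 = 1 GiB.
 * Each level up covers 2^9 = 512 times as many guest frames. */
#define HPAGE_GFN_SHIFT(level) (((level) - 1) * 9)

/* Index of the page containing gfn, counted from the first page
 * (of the same size) of a memslot that starts at base_gfn. */
static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
{
    assert(level >= 1 && level <= 3);
    return (gfn >> HPAGE_GFN_SHIFT(level)) -
           (base_gfn >> HPAGE_GFN_SHIFT(level));
}
```

The 9-bits-per-level assumption is exactly what makes the function x86-specific, which is why the change moves it out of the generic header.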
     

30 Sep, 2013

1 commit

  • In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
    the kvm_lock was made a raw lock. However, the kvm mmu_shrink()
    function tries to grab the (non-raw) mmu_lock within the scope of
    the raw locked kvm_lock being held. This leads to the following:

    BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
    in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
    Preemption disabled at:[] mmu_shrink+0x5c/0x1b0 [kvm]

    Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
    Call Trace:
    [] __might_sleep+0xfd/0x160
    [] rt_spin_lock+0x24/0x50
    [] mmu_shrink+0xec/0x1b0 [kvm]
    [] shrink_slab+0x17d/0x3a0
    [] ? mem_cgroup_iter+0x130/0x260
    [] balance_pgdat+0x54a/0x730
    [] ? set_pgdat_percpu_threshold+0xa7/0xd0
    [] kswapd+0x18f/0x490
    [] ? get_parent_ip+0x11/0x50
    [] ? __init_waitqueue_head+0x50/0x50
    [] ? balance_pgdat+0x730/0x730
    [] kthread+0xdb/0xe0
    [] ? finish_task_switch+0x52/0x100
    [] kernel_thread_helper+0x4/0x10
    [] ? __init_kthread_worker+0x

    After the previous patch, kvm_lock need not be a raw spinlock anymore,
    so change it back.

    Reported-by: Paul Gortmaker
    Cc: kvm@vger.kernel.org
    Cc: gleb@redhat.com
    Cc: jan.kiszka@siemens.com
    Reviewed-by: Gleb Natapov
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

25 Sep, 2013

1 commit

  • '.done' is used to mark the completion of 'async_pf_execute()', but
    'cancel_work_sync()' returns true when the work was canceled, so we
    use it instead.

    Signed-off-by: Radim Krčmář
    Reviewed-by: Paolo Bonzini
    Reviewed-by: Gleb Natapov
    Signed-off-by: Paolo Bonzini

    Radim Krčmář
     

17 Sep, 2013

1 commit

  • Page tables in a read-only memory slot will currently cause a triple
    fault because the page walker uses gfn_to_hva and it fails on such a slot.

    OVMF uses such a page table; however, real hardware seems to be fine with
    that as long as the accessed/dirty bits are set. Save whether the slot
    is readonly, and later check it when updating the accessed and dirty bits.

    Reviewed-by: Xiao Guangrong
    Reviewed-by: Gleb Natapov
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

29 Jul, 2013

1 commit

  • Current common code uses PAGE_OFFSET to indicate a bad host virtual address.
    As this check won't work on architectures that don't map kernel and user memory
    into the same address space (e.g. s390), such architectures can now provide
    their own KVM_HVA_ERR_BAD defines.

    Signed-off-by: Dominik Dingel
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Paolo Bonzini

    Dominik Dingel
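The override pattern this entry describes is the usual `#ifndef` fallback: an architecture that needs different error markers defines them before the generic header is processed, and everyone else gets the PAGE_OFFSET-based defaults. The sketch below is roughly the shape of that generic fallback. The `PAGE_SIZE` and `PAGE_OFFSET` values are illustrative only (a classic 32-bit x86 3G/1G split), not taken from any particular configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values; a kernel build gets these from <asm/page.h>. */
#define PAGE_SIZE   4096UL
#define PAGE_OFFSET 0xC0000000UL

/*
 * An architecture whose kernel and user address spaces overlap (such as
 * s390) would define KVM_HVA_ERR_BAD and its own check before this point.
 */
#ifndef KVM_HVA_ERR_BAD
#define KVM_HVA_ERR_BAD    (PAGE_OFFSET)
#define KVM_HVA_ERR_RO_BAD (PAGE_OFFSET + PAGE_SIZE)

static_assert(KVM_HVA_ERR_RO_BAD >= PAGE_OFFSET,
              "the read-only marker must also count as an error HVA");

/* Generic check: nothing at or above PAGE_OFFSET is a user-space HVA. */
static inline bool kvm_is_error_hva(unsigned long addr)
{
    return addr >= PAGE_OFFSET;
}
#endif
```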
     

18 Jul, 2013

2 commits

  • This is called right after the memslots are updated, i.e. when the result
    of update_memslots() gets installed in install_new_memslots(). Since
    the memslots need to be updated twice when we delete or move a memslot,
    kvm_arch_commit_memory_region() does not correspond to this exactly.

    In the following patch, x86 will use this new API to check if the mmio
    generation has reached its maximum value, in which case mmio sptes need
    to be flushed out.

    Signed-off-by: Takuya Yoshikawa
    Acked-by: Alexander Graf
    Reviewed-by: Xiao Guangrong
    Signed-off-by: Paolo Bonzini

    Takuya Yoshikawa
     
  • Add new functions kvm_io_bus_{read,write}_cookie() that allow users of
    the KVM I/O infrastructure to use a cookie value to speed up the lookup
    of a device on an I/O bus.

    Signed-off-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Gleb Natapov

    Cornelia Huck
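One way to read the cookie in the last entry is as a cached index: a caller that remembers where its device sat on the bus last time tries that slot first and falls back to a full search only when the guess misses. This is a model under simplified, assumed types. `struct io_bus`, `io_bus_find()` and `io_bus_find_cookie()` are hypothetical, and the real functions perform the read or write instead of returning an index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bus: device ranges kept sorted by (addr, len). */
struct io_range {
    uint64_t addr;
    int      len;
};

struct io_bus {
    int             dev_count;
    struct io_range range[8];
};

/* Slow path: binary search for a range matching addr/len exactly. */
static int io_bus_find(const struct io_bus *bus, uint64_t addr, int len)
{
    int lo = 0, hi = bus->dev_count - 1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        const struct io_range *r = &bus->range[mid];

        if (r->addr == addr && r->len == len)
            return mid;
        if (r->addr < addr || (r->addr == addr && r->len < len))
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}

/* Fast path: try the cached index first, then fall back to the search. */
static int io_bus_find_cookie(const struct io_bus *bus, uint64_t addr,
                              int len, int cookie)
{
    assert(bus != NULL);
    if (cookie >= 0 && cookie < bus->dev_count &&
        bus->range[cookie].addr == addr && bus->range[cookie].len == len)
        return cookie;          /* cookie still valid: no search needed */
    return io_bus_find(bus, addr, len);
}
```

A stale cookie (for example after devices were added or removed) costs only the fallback search, so callers can keep reusing the last returned index without extra invalidation.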
     

04 Jul, 2013

1 commit

  • Pull KVM fixes from Paolo Bonzini:
    "On the x86 side, there are some optimizations and documentation
    updates. The big ARM/KVM change for 3.11, support for AArch64, will
    come through Catalin Marinas's tree. s390 and PPC have misc cleanups
    and bugfixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits)
    KVM: PPC: Ignore PIR writes
    KVM: PPC: Book3S PR: Invalidate SLB entries properly
    KVM: PPC: Book3S PR: Allow guest to use 1TB segments
    KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match
    KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry
    KVM: PPC: Book3S PR: Fix proto-VSID calculations
    KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL
    KVM: Fix RTC interrupt coalescing tracking
    kvm: Add a tracepoint write_tsc_offset
    KVM: MMU: Inform users of mmio generation wraparound
    KVM: MMU: document fast invalidate all mmio sptes
    KVM: MMU: document fast invalidate all pages
    KVM: MMU: document fast page fault
    KVM: MMU: document mmio page fault
    KVM: MMU: document write_flooding_count
    KVM: MMU: document clear_spte_count
    KVM: MMU: drop kvm_mmu_zap_mmio_sptes
    KVM: MMU: init kvm generation close to mmio wrap-around value
    KVM: MMU: add tracepoint for check_mmio_spte
    KVM: MMU: fast invalidate all mmio sptes
    ...

    Linus Torvalds
     

04 Jun, 2013

1 commit

  • We can easily reach the 1000 limit by starting a VM with a couple
    hundred I/O devices (multifunction=on). The hardcoded limit has
    already been adjusted 3 times (6, then 200, then 300, then 1000).

    In userspace, the maximum number of file descriptors already limits
    the ioeventfd count. But kvm_io_bus devices are also used for the
    PIT, PIC, IOAPIC and coalesced_mmio, and those cannot be limited by
    the file descriptor maximum.

    Currently only ioeventfds take up too many kvm_io_bus device slots,
    so simply exclude them from the kvm_io_range limit count.

    Also fix an indentation issue in kvm_host.h.

    Signed-off-by: Amos Kong
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Gleb Natapov

    Amos Kong
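The counting change amounts to tracking ioeventfds separately and checking the limit only against the devices that are not ioeventfds. This is a hedged model of that bookkeeping. `struct bus_counts`, `bus_register()` and `MAX_BUS_DEVS` are hypothetical names; only the limit of 1000 comes from the entry.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_BUS_DEVS 1000 /* the hardcoded limit discussed above */

/* Hypothetical bookkeeping: total devices and how many are ioeventfds. */
struct bus_counts {
    int dev_count;
    int ioeventfd_count;
};

/*
 * Registration fails only if non-ioeventfd devices (PIT, PIC, IOAPIC,
 * coalesced MMIO, ...) would exceed the limit; ioeventfds are already
 * bounded in userspace by the file descriptor limit.
 */
static int bus_register(struct bus_counts *bus, bool is_ioeventfd)
{
    assert(bus->ioeventfd_count <= bus->dev_count);
    if (!is_ioeventfd &&
        bus->dev_count - bus->ioeventfd_count >= MAX_BUS_DEVS)
        return -ENOSPC;
    bus->dev_count++;
    if (is_ioeventfd)
        bus->ioeventfd_count++;
    return 0;
}
```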
     

31 May, 2013

1 commit

  • The kvm_host.h header file doesn't handle inclusion well
    when archs don't support KVM.

    This results in build failures for such archs when they
    want to implement context tracking, because this subsystem
    includes kvm_host.h in order to implement the
    guest_enter/exit APIs, but it doesn't handle the KVM-off case.

    To fix this, move the guest_enter()/guest_exit()
    declarations and generic implementation to the context
    tracking headers. These generic APIs actually belong to
    this subsystem, alongside the other domain-boundary tracking
    such as user_enter() et al.

    KVM now properly becomes a user of this library, rather than
    the other, buggy way around.

    Reported-by: Kevin Hilman
    Reviewed-by: Kevin Hilman
    Tested-by: Kevin Hilman
    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Kevin Hilman
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

16 May, 2013

1 commit

  • kvmclock updates which are isolated to a given vcpu, such as vcpu->cpu
    migration, should not allow system_timestamp from the rest of the vcpus
    to remain static. Otherwise NTP frequency correction applies to one
    vcpu's system_timestamp but not to the others.

    So in those cases, request a kvmclock update for all vcpus. The worst
    case for a remote vcpu to update its kvmclock is then bounded by the
    maximum nohz sleep latency.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Gleb Natapov

    Marcelo Tosatti
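"Request a kvmclock update for all vcpus" follows the per-vcpu request pattern: set a pending bit on every vcpu, and let each vcpu consume it on its next entry. That deferred consumption is why a sleeping remote vcpu's update is bounded by the nohz sleep latency. The sketch models only the bookkeeping. `REQ_CLOCK_UPDATE`, `request_clock_update_all()` and `check_and_clear_request()` are hypothetical, and the real code also kicks vcpus and uses atomic bit operations.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VCPUS        4         /* illustrative */
#define REQ_CLOCK_UPDATE (1u << 0) /* hypothetical request bit */

struct vcpu {
    unsigned int requests; /* pending work, checked before guest entry */
};

struct vm {
    int         nr_vcpus;
    struct vcpu vcpus[MAX_VCPUS];
};

/* Mark every vcpu, not just the one that migrated, so that all
 * system_timestamp values are refreshed together. */
static void request_clock_update_all(struct vm *vm)
{
    assert(vm->nr_vcpus <= MAX_VCPUS);
    for (int i = 0; i < vm->nr_vcpus; i++)
        vm->vcpus[i].requests |= REQ_CLOCK_UPDATE;
}

/* Called on the vcpu's next entry; returns true if the request was set. */
static bool check_and_clear_request(struct vcpu *v, unsigned int req)
{
    bool pending = (v->requests & req) != 0;

    v->requests &= ~req;
    return pending;
}
```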
     

06 May, 2013

1 commit

  • Pull kvm updates from Gleb Natapov:
    "Highlights of the updates are:

    general:
    - new emulated device API
    - legacy device assignment is now optional
    - irqfd interface is more generic and can be shared between arches

    x86:
    - VMCS shadow support and other nested VMX improvements
    - APIC virtualization and Posted Interrupt hardware support
    - Optimize mmio spte zapping

    ppc:
    - BookE: in-kernel MPIC emulation with irqfd support
    - Book3S: in-kernel XICS emulation (incomplete)
    - Book3S: HV: migration fixes
    - BookE: more debug support preparation
    - BookE: e6500 support

    ARM:
    - reworking of Hyp idmaps

    s390:
    - ioeventfd for virtio-ccw

    And many other bug fixes, cleanups and improvements"

    * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
    kvm: Add compat_ioctl for device control API
    KVM: x86: Account for failing enable_irq_window for NMI window request
    KVM: PPC: Book3S: Add API for in-kernel XICS emulation
    kvm/ppc/mpic: fix missing unlock in set_base_addr()
    kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
    kvm/ppc/mpic: remove users
    kvm/ppc/mpic: fix mmio region lists when multiple guests used
    kvm/ppc/mpic: remove default routes from documentation
    kvm: KVM_CAP_IOMMU only available with device assignment
    ARM: KVM: iterate over all CPUs for CPU compatibility check
    KVM: ARM: Fix spelling in error message
    ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
    KVM: ARM: Fix API documentation for ONE_REG encoding
    ARM: KVM: promote vfp_host pointer to generic host cpu context
    ARM: KVM: add architecture specific hook for capabilities
    ARM: KVM: perform HYP initilization for hotplugged CPUs
    ARM: KVM: switch to a dual-step HYP init code
    ARM: KVM: rework HYP page table freeing
    ARM: KVM: enforce maximum size for identity mapped code
    ARM: KVM: move to a KVM provided HYP idmap
    ...

    Linus Torvalds
     

02 May, 2013

1 commit

  • This adds the API for userspace to instantiate an XICS device in a VM
    and connect VCPUs to it. The API consists of a new device type for
    the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
    functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
    which is used to assert and deassert interrupt inputs of the XICS.

    The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
    Each attribute within this group corresponds to the state of one
    interrupt source. The attribute number is the same as the interrupt
    source number.

    This does not support irq routing or irqfd yet.

    Signed-off-by: Paul Mackerras
    Acked-by: David Gibson
    Signed-off-by: Alexander Graf

    Paul Mackerras
     

28 Apr, 2013

3 commits


27 Apr, 2013

7 commits

  • The hassle of getting refcounting right was greater than the hassle
    of keeping a list of devices to destroy on VM exit.

    Signed-off-by: Scott Wood
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Hook the MPIC code up to the KVM interfaces, add locking, etc.

    Signed-off-by: Scott Wood
    [agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit]
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Currently, devices that are emulated inside KVM are configured in a
    hardcoded manner based on an assumption that any given architecture
    only has one way to do it. If there's any need to access device state,
    it is done through inflexible one-purpose-only IOCTLs (e.g.
    KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is
    cumbersome and depletes the limited ioctl number space.

    This API provides a mechanism to instantiate a device of a certain
    type, returning an ID that can be used to set/get attributes of the
    device. Attributes may include configuration parameters (e.g.
    register base address), device state, operational commands, etc. It
    is similar to the ONE_REG API, except that it acts on devices rather
    than vcpus.

    Both device types and individual attributes can be tested without
    creating the device or getting/setting the attribute, so there is no
    need to manage enumerated capabilities separately.

    Signed-off-by: Scott Wood
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Setting up IRQ routes is not IOAPIC-specific. Extract everything
    that really is generic code into irqchip.c and leave only the
    IOAPIC-specific bits in irq_comm.c.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
  • The prototype has been stale for a while; I can't find any actual
    function definition behind it. Let's just remove it.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
  • Quite a bit of code in KVM has been conditionalized on availability of
    IOAPIC emulation. However, most of it is generically applicable to
    platforms that don't have an IOAPIC but a different type of irq chip.

    Make code that relies only on IRQ routing, not on an APIC itself,
    depend on CONFIG_HAVE_KVM_IRQ_ROUTING, so that we can reuse it later.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
  • The concept of routing interrupt lines to an irqchip is not
    IOAPIC-specific. Every irqchip has a maximum number of pins
    that can be linked to irq lines.

    So let's add a new define that allows us to reuse generic code for
    non-IOAPIC platforms.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     

17 Apr, 2013

1 commit


16 Apr, 2013

1 commit


08 Apr, 2013

2 commits

  • The variable kvm_rebooting is a common kvm variable, so move its
    declaration from arch/x86/include/asm/kvm_host.h to
    include/linux/kvm_host.h.

    Fixes this sparse warning when building on arm64:

    virt/kvm/kvm_main.c:warning: symbol 'kvm_rebooting' was not declared. Should it be static?

    Signed-off-by: Geoff Levand
    Signed-off-by: Gleb Natapov

    Geoff Levand
     
  • The variables vm_list and kvm_lock are common to all architectures, so
    move the declarations from arch/x86/include/asm/kvm_host.h to
    include/linux/kvm_host.h.

    Fixes sparse warnings like these when building for arm64:

    virt/kvm/kvm_main.c: warning: symbol 'kvm_lock' was not declared. Should it be static?
    virt/kvm/kvm_main.c: warning: symbol 'vm_list' was not declared. Should it be static?

    Signed-off-by: Geoff Levand
    Signed-off-by: Gleb Natapov

    Geoff Levand
     

07 Apr, 2013

1 commit

  • This patch extends the kvm_gfn_to_hva_cache_init functions to support
    reads and writes that cross a page boundary. If the range falls within
    a single memslot, this is a fast operation. If the range is split
    between two memslots, the slower kvm_read_guest and kvm_write_guest
    are used.

    Tested against the kvm_clock unit tests.

    Signed-off-by: Andrew Honig
    Signed-off-by: Gleb Natapov

    Andrew Honig
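The fast/slow decision in this entry comes down to whether the first and last guest frames of the access fall in the same memslot. Because a memslot is a contiguous run of frames, checking both ends is enough. The sketch models only that decision. `struct memslot`, `slot_contains()` and `cache_fast_path_ok()` are hypothetical, and the real cache also records the host virtual address and a memslot generation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* 4 KiB guest pages assumed */

/* Hypothetical memslot: a contiguous run of guest frame numbers. */
struct memslot {
    uint64_t base_gfn;
    uint64_t npages;
};

static bool slot_contains(const struct memslot *s, uint64_t gfn)
{
    return gfn >= s->base_gfn && gfn - s->base_gfn < s->npages;
}

/*
 * True if the access [gpa, gpa + len) may use the cached fast path: its
 * first and last frames lie in slot s, so one contiguous host mapping
 * covers it, even if it crosses a page boundary. Otherwise fall back to
 * the slower per-page copy routines.
 */
static bool cache_fast_path_ok(const struct memslot *s, uint64_t gpa,
                               uint64_t len)
{
    assert(len > 0);
    uint64_t start_gfn = gpa >> PAGE_SHIFT;
    uint64_t end_gfn = (gpa + len - 1) >> PAGE_SHIFT;

    return slot_contains(s, start_gfn) && slot_contains(s, end_gfn);
}
```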
     

22 Mar, 2013

1 commit


21 Mar, 2013

1 commit

  • Merge reason:

    From: Alexander Graf

    "Just recently this really important patch got pulled into Linus' tree for 3.9:

    commit 1674400aaee5b466c595a8fc310488263ce888c7
    Author: Anton Blanchard
    Date: Tue Mar 12 01:51:51 2013 +0000

    Without that commit, I can not boot my G5, thus I can't run automated tests on it against my queue.

    Could you please merge kvm/next against linus/master, so that I can base my trees against that?"

    * upstream/master: (653 commits)
    PCI: Use ROM images from firmware only if no other ROM source available
    sparc: remove unused "config BITS"
    sparc: delete "if !ULTRA_HAS_POPULATION_COUNT"
    KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)
    KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797)
    KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796)
    arm64: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS
    arm64: Do not select GENERIC_HARDIRQS_NO_DEPRECATED
    inet: limit length of fragment queue hash table bucket lists
    qeth: Fix scatter-gather regression
    qeth: Fix invalid router settings handling
    qeth: delay feature trace
    sgy-cts1000: Remove __dev* attributes
    KVM: x86: fix deadlock in clock-in-progress request handling
    KVM: allow host header to be included even for !CONFIG_KVM
    hwmon: (lm75) Fix tcn75 prefix
    hwmon: (lm75.h) Update header inclusion
    MAINTAINERS: Remove Mark M. Hoffman
    xfs: ensure we capture IO errors correctly
    xfs: fix xfs_iomap_eof_prealloc_initial_size type
    ...

    Signed-off-by: Marcelo Tosatti

    Marcelo Tosatti
     

19 Mar, 2013

1 commit

  • The new context tracking subsystem unconditionally includes kvm_host.h
    headers for the guest enter/exit macros. This causes a compile
    failure when KVM is not enabled.

    Fix by adding an IS_ENABLED(CONFIG_KVM) check to kvm_host so it can
    be included/compiled even when KVM is not enabled.

    Cc: Frederic Weisbecker
    Signed-off-by: Kevin Hilman
    Signed-off-by: Marcelo Tosatti

    Kevin Hilman
     

11 Mar, 2013

1 commit

  • Note that we mark a vcpu as preempted only when its task state was
    Running at the time of preemption.

    Thanks Jiannan, Avi for preemption notifier ideas. Thanks Gleb, PeterZ
    for their valuable suggestions. Thanks Srikar for an idea on avoiding
    the rcu lock while checking task state, which improved overcommit numbers.

    Reviewed-by: Chegu Vinod
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Raghavendra K T
    Signed-off-by: Gleb Natapov

    Raghavendra K T
     

06 Mar, 2013

1 commit