04 Jul, 2013

1 commit

  • Pull KVM fixes from Paolo Bonzini:
    "On the x86 side, there are some optimizations and documentation
    updates. The big ARM/KVM change for 3.11, support for AArch64, will
    come through Catalin Marinas's tree. s390 and PPC have misc cleanups
    and bugfixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits)
    KVM: PPC: Ignore PIR writes
    KVM: PPC: Book3S PR: Invalidate SLB entries properly
    KVM: PPC: Book3S PR: Allow guest to use 1TB segments
    KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match
    KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry
    KVM: PPC: Book3S PR: Fix proto-VSID calculations
    KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL
    KVM: Fix RTC interrupt coalescing tracking
    kvm: Add a tracepoint write_tsc_offset
    KVM: MMU: Inform users of mmio generation wraparound
    KVM: MMU: document fast invalidate all mmio sptes
    KVM: MMU: document fast invalidate all pages
    KVM: MMU: document fast page fault
    KVM: MMU: document mmio page fault
    KVM: MMU: document write_flooding_count
    KVM: MMU: document clear_spte_count
    KVM: MMU: drop kvm_mmu_zap_mmio_sptes
    KVM: MMU: init kvm generation close to mmio wrap-around value
    KVM: MMU: add tracepoint for check_mmio_spte
    KVM: MMU: fast invalidate all mmio sptes
    ...

    Linus Torvalds
     

04 Jun, 2013

1 commit

  • We can easily reach the 1000 limit by starting a VM with a couple
    hundred I/O devices (multifunction=on). The hardcoded limit has
    already been adjusted 3 times (6 ~ 200 ~ 300 ~ 1000).

    In userspace, the maximum number of file descriptors already
    limits the ioeventfd count. But kvm_io_bus devices are also used
    for pit, pic, ioapic, coalesced_mmio. They can't be limited
    by the maximum number of file descriptors.

    Currently only ioeventfds take too many kvm_io_bus devices,
    so just exclude them from the kvm_io_range limit count.

    Also fixed one indent issue in kvm_host.h

    Signed-off-by: Amos Kong
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Gleb Natapov

    Amos Kong
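
The counting rule described in this commit can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the kernel's code: `model_io_bus`, `model_register_dev` and `MODEL_NR_IOBUS_DEVS` are hypothetical names, and the cap of 1000 is taken from the commit message.

```c
#include <errno.h>

/* Hypothetical cap, standing in for the hardcoded limit of 1000. */
#define MODEL_NR_IOBUS_DEVS 1000

/* Simplified stand-in for a kvm_io_bus: only the two counters matter here. */
struct model_io_bus {
    int dev_count;        /* every registered device, ioeventfds included */
    int ioeventfd_count;  /* the subset of dev_count that are ioeventfds */
};

/*
 * Register one device on the bus. ioeventfds are subtracted from the count
 * that is compared against the cap, since the file descriptor limit already
 * bounds them. Returns 0 on success, -ENOSPC once the number of
 * non-ioeventfd devices has reached the cap.
 */
static int model_register_dev(struct model_io_bus *bus, int is_ioeventfd)
{
    if (bus->dev_count - bus->ioeventfd_count > MODEL_NR_IOBUS_DEVS - 1)
        return -ENOSPC;

    bus->dev_count++;
    if (is_ioeventfd)
        bus->ioeventfd_count++;
    return 0;
}
```

In this model any number of ioeventfds can be registered without consuming the budget, while the 1001st non-ioeventfd device is rejected.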
     

31 May, 2013

1 commit

  • The kvm_host.h header file doesn't handle inclusion well
    when archs don't support KVM.

    This results in build failures for such archs when they
    want to implement context tracking, because this subsystem
    includes kvm_host.h in order to implement the
    guest_enter/exit APIs but the header doesn't handle the KVM off case.

    To fix this, move the guest_enter()/guest_exit()
    declarations and generic implementation to the context
    tracking headers. These generic APIs actually belong to
    this subsystem, alongside the other domain boundary tracking
    APIs like user_enter() et al.

    KVM now properly becomes a user of this library, not the
    other buggy way around.

    Reported-by: Kevin Hilman
    Reviewed-by: Kevin Hilman
    Tested-by: Kevin Hilman
    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Kevin Hilman
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

16 May, 2013

1 commit

  • kvmclock updates which are isolated to a given vcpu, such as vcpu->cpu
    migration, should not allow system_timestamp from the rest of the vcpus
    to remain static. Otherwise ntp frequency correction applies to one
    vcpu's system_timestamp but not the others.

    So in those cases, request a kvmclock update for all vcpus. The worst
    case for a remote vcpu to update its kvmclock is then bounded by maximum
    nohz sleep latency.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Gleb Natapov

    Marcelo Tosatti
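
A minimal sketch of the idea in this commit, assuming a per-vcpu bitmask of pending requests that each vcpu consumes before it next enters the guest. The names `model_vcpu`, `MODEL_REQ_CLOCK_UPDATE` and the helper functions are hypothetical and only model the behaviour described above.

```c
#include <stddef.h>

/* Hypothetical request bit meaning "refresh this vcpu's kvmclock". */
#define MODEL_REQ_CLOCK_UPDATE 0

/* Simplified vcpu: only the pending-request bitmask is modelled. */
struct model_vcpu {
    unsigned long requests;
};

/* Mark one request as pending on a single vcpu. */
static void model_make_request(struct model_vcpu *vcpu, int req)
{
    vcpu->requests |= 1UL << req;
}

/*
 * Instead of updating only the vcpu that triggered the event, flag every
 * vcpu so that all system_timestamp values get refreshed together.
 */
static void model_request_clock_update_all(struct model_vcpu *vcpus, size_t n)
{
    for (size_t i = 0; i < n; i++)
        model_make_request(&vcpus[i], MODEL_REQ_CLOCK_UPDATE);
}

/* Consume a pending request: returns 1 and clears it if it was set. */
static int model_check_request(struct model_vcpu *vcpu, int req)
{
    unsigned long bit = 1UL << req;

    if (!(vcpu->requests & bit))
        return 0;
    vcpu->requests &= ~bit;
    return 1;
}
```

Each vcpu picks up the request the next time it checks its bitmask, which is why the worst-case delay is bounded by how long a remote vcpu can stay idle.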
     

06 May, 2013

1 commit

  • Pull kvm updates from Gleb Natapov:
    "Highlights of the updates are:

    general:
    - new emulated device API
    - legacy device assignment is now optional
    - irqfd interface is more generic and can be shared between arches

    x86:
    - VMCS shadow support and other nested VMX improvements
    - APIC virtualization and Posted Interrupt hardware support
    - Optimize mmio spte zapping

    ppc:
    - BookE: in-kernel MPIC emulation with irqfd support
    - Book3S: in-kernel XICS emulation (incomplete)
    - Book3S: HV: migration fixes
    - BookE: more debug support preparation
    - BookE: e6500 support

    ARM:
    - reworking of Hyp idmaps

    s390:
    - ioeventfd for virtio-ccw

    And many other bug fixes, cleanups and improvements"

    * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
    kvm: Add compat_ioctl for device control API
    KVM: x86: Account for failing enable_irq_window for NMI window request
    KVM: PPC: Book3S: Add API for in-kernel XICS emulation
    kvm/ppc/mpic: fix missing unlock in set_base_addr()
    kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
    kvm/ppc/mpic: remove users
    kvm/ppc/mpic: fix mmio region lists when multiple guests used
    kvm/ppc/mpic: remove default routes from documentation
    kvm: KVM_CAP_IOMMU only available with device assignment
    ARM: KVM: iterate over all CPUs for CPU compatibility check
    KVM: ARM: Fix spelling in error message
    ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
    KVM: ARM: Fix API documentation for ONE_REG encoding
    ARM: KVM: promote vfp_host pointer to generic host cpu context
    ARM: KVM: add architecture specific hook for capabilities
    ARM: KVM: perform HYP initilization for hotplugged CPUs
    ARM: KVM: switch to a dual-step HYP init code
    ARM: KVM: rework HYP page table freeing
    ARM: KVM: enforce maximum size for identity mapped code
    ARM: KVM: move to a KVM provided HYP idmap
    ...

    Linus Torvalds
     

02 May, 2013

1 commit

  • This adds the API for userspace to instantiate an XICS device in a VM
    and connect VCPUs to it. The API consists of a new device type for
    the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
    functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
    which is used to assert and deassert interrupt inputs of the XICS.

    The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
    Each attribute within this group corresponds to the state of one
    interrupt source. The attribute number is the same as the interrupt
    source number.

    This does not support irq routing or irqfd yet.

    Signed-off-by: Paul Mackerras
    Acked-by: David Gibson
    Signed-off-by: Alexander Graf

    Paul Mackerras
     

28 Apr, 2013

3 commits


27 Apr, 2013

7 commits

  • The hassle of getting refcounting right was greater than the hassle
    of keeping a list of devices to destroy on VM exit.

    Signed-off-by: Scott Wood
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Hook the MPIC code up to the KVM interfaces, add locking, etc.

    Signed-off-by: Scott Wood
    [agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit]
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Currently, devices that are emulated inside KVM are configured in a
    hardcoded manner based on an assumption that any given architecture
    only has one way to do it. If there's any need to access device state,
    it is done through inflexible one-purpose-only IOCTLs (e.g.
    KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is
    cumbersome and depletes a limited numberspace.

    This API provides a mechanism to instantiate a device of a certain
    type, returning an ID that can be used to set/get attributes of the
    device. Attributes may include configuration parameters (e.g.
    register base address), device state, operational commands, etc. It
    is similar to the ONE_REG API, except that it acts on devices rather
    than vcpus.

    Both device types and individual attributes can be tested without having
    to create the device or get/set the attribute, without the need for
    separately managing enumerated capabilities.

    Signed-off-by: Scott Wood
    Signed-off-by: Alexander Graf

    Scott Wood
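
The get/set/test pattern of the device attribute API described above can be illustrated with a small userspace model. This is a sketch under assumptions, not the KVM implementation: the `model_*` names, the single attribute group and the use of -ENXIO for an unknown attribute are all choices made for this example.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_GRP_REGS 1   /* hypothetical attribute group id */
#define MODEL_NR_REGS  8   /* hypothetical number of attributes in the group */

/* Hypothetical emulated device whose state is a small bank of registers. */
struct model_device {
    uint64_t regs[MODEL_NR_REGS];
};

/* Names one attribute and points at the caller's value buffer. */
struct model_device_attr {
    uint32_t group;
    uint64_t attr;
    uint64_t *addr;
};

/* Test whether an attribute exists without reading or writing it. */
static int model_has_attr(const struct model_device_attr *a)
{
    if (a->group != MODEL_GRP_REGS || a->attr >= MODEL_NR_REGS)
        return -ENXIO;
    return 0;
}

/* Copy the caller's value into the device state. */
static int model_set_attr(struct model_device *dev,
                          const struct model_device_attr *a)
{
    int ret = model_has_attr(a);

    if (ret)
        return ret;
    dev->regs[a->attr] = *a->addr;
    return 0;
}

/* Copy the device state out to the caller's buffer. */
static int model_get_attr(const struct model_device *dev,
                          const struct model_device_attr *a)
{
    int ret = model_has_attr(a);

    if (ret)
        return ret;
    *a->addr = dev->regs[a->attr];
    return 0;
}
```

Because the existence check is a separate operation, a caller can probe for an attribute up front rather than tracking a separate capability for each one.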
     
  • Setting up IRQ routes is nothing IOAPIC specific. Extract everything
    that really is generic code into irqchip.c and only leave the ioapic
    specific bits to irq_comm.c.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
  • The prototype has been stale for a while, and I can't spot any real
    function definition behind it. Let's just remove it.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
  • Quite a bit of code in KVM has been conditionalized on availability of
    IOAPIC emulation. However, most of it is generically applicable to
    platforms that don't have an IOAPIC, but a different type of irq chip.

    Make code that only relies on IRQ routing, not an APIC itself, depend on
    CONFIG_HAVE_KVM_IRQ_ROUTING, so that we can reuse it later.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
  • The concept of routing interrupt lines to an irqchip is nothing
    that is IOAPIC specific. Every irqchip has a maximum number of pins
    that can be linked to irq lines.

    So let's add a new define that allows us to reuse generic code for
    non-IOAPIC platforms.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     

17 Apr, 2013

1 commit


16 Apr, 2013

1 commit


08 Apr, 2013

2 commits

  • The variable kvm_rebooting is a common kvm variable, so move its
    declaration from arch/x86/include/asm/kvm_host.h to
    include/linux/kvm_host.h.

    Fixes this sparse warning when building on arm64:

    virt/kvm/kvm_main.c: warning: symbol 'kvm_rebooting' was not declared. Should it be static?

    Signed-off-by: Geoff Levand
    Signed-off-by: Gleb Natapov

    Geoff Levand
     
  • The variables vm_list and kvm_lock are common to all architectures, so
    move the declarations from arch/x86/include/asm/kvm_host.h to
    include/linux/kvm_host.h.

    Fixes sparse warnings like these when building for arm64:

    virt/kvm/kvm_main.c: warning: symbol 'kvm_lock' was not declared. Should it be static?
    virt/kvm/kvm_main.c: warning: symbol 'vm_list' was not declared. Should it be static?

    Signed-off-by: Geoff Levand
    Signed-off-by: Gleb Natapov

    Geoff Levand
     

07 Apr, 2013

1 commit

  • This patch adds support for kvm_gfn_to_hva_cache_init functions for
    reads and writes that will cross a page boundary. If the range falls within
    the same memslot, then this will be a fast operation. If the range
    is split between two memslots, then the slower kvm_read_guest and
    kvm_write_guest are used.

    Tested: Test against kvm_clock unit tests.

    Signed-off-by: Andrew Honig
    Signed-off-by: Gleb Natapov

    Andrew Honig
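
The fast-path/slow-path decision described in this commit can be modelled as follows. This is a simplified sketch, not the kernel's code: the `model_*` types and functions are hypothetical, a 4 KiB page size is assumed, and memslots are reduced to a base frame number and a page count.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12   /* assumes 4 KiB pages */

/* Hypothetical, simplified memslot: a contiguous run of guest frames. */
struct model_memslot {
    uint64_t base_gfn;
    uint64_t npages;
};

/* Hypothetical cache; slot stays NULL when the slow path must be used. */
struct model_hva_cache {
    uint64_t gpa;
    uint64_t len;
    const struct model_memslot *slot;
};

/* Find the memslot containing a guest frame number, or NULL if unmapped. */
static const struct model_memslot *
model_find_slot(const struct model_memslot *slots, size_t n, uint64_t gfn)
{
    for (size_t i = 0; i < n; i++)
        if (gfn >= slots[i].base_gfn &&
            gfn - slots[i].base_gfn < slots[i].npages)
            return &slots[i];
    return NULL;
}

/*
 * Returns 1 if the whole range lies in one memslot (fast path), 0 if it
 * spans memslots (slow path), and -1 if the range is empty or touches an
 * unmapped frame.
 */
static int model_cache_init(struct model_hva_cache *c,
                            const struct model_memslot *slots, size_t n,
                            uint64_t gpa, uint64_t len)
{
    c->gpa = gpa;
    c->len = len;
    c->slot = NULL;
    if (len == 0)
        return -1;

    uint64_t start_gfn = gpa >> MODEL_PAGE_SHIFT;
    uint64_t end_gfn = (gpa + len - 1) >> MODEL_PAGE_SHIFT;
    const struct model_memslot *first = model_find_slot(slots, n, start_gfn);
    int same_slot = 1;

    if (!first)
        return -1;
    /* Walk every frame the range touches, not only the first one. */
    for (uint64_t gfn = start_gfn + 1; gfn <= end_gfn; gfn++) {
        const struct model_memslot *s = model_find_slot(slots, n, gfn);

        if (!s)
            return -1;
        if (s != first)
            same_slot = 0;
    }
    if (same_slot)
        c->slot = first;
    return same_slot;
}
```

A 16-byte access that starts 8 bytes before a page boundary touches two frames; the model takes the fast path only when both frames resolve to the same slot.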
     

22 Mar, 2013

1 commit


21 Mar, 2013

1 commit

  • Merge reason:

    From: Alexander Graf

    "Just recently this really important patch got pulled into Linus' tree for 3.9:

    commit 1674400aaee5b466c595a8fc310488263ce888c7
    Author: Anton Blanchard
    Date: Tue Mar 12 01:51:51 2013 +0000

    Without that commit, I can not boot my G5, thus I can't run automated tests on it against my queue.

    Could you please merge kvm/next against linus/master, so that I can base my trees against that?"

    * upstream/master: (653 commits)
    PCI: Use ROM images from firmware only if no other ROM source available
    sparc: remove unused "config BITS"
    sparc: delete "if !ULTRA_HAS_POPULATION_COUNT"
    KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)
    KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797)
    KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796)
    arm64: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS
    arm64: Do not select GENERIC_HARDIRQS_NO_DEPRECATED
    inet: limit length of fragment queue hash table bucket lists
    qeth: Fix scatter-gather regression
    qeth: Fix invalid router settings handling
    qeth: delay feature trace
    sgy-cts1000: Remove __dev* attributes
    KVM: x86: fix deadlock in clock-in-progress request handling
    KVM: allow host header to be included even for !CONFIG_KVM
    hwmon: (lm75) Fix tcn75 prefix
    hwmon: (lm75.h) Update header inclusion
    MAINTAINERS: Remove Mark M. Hoffman
    xfs: ensure we capture IO errors correctly
    xfs: fix xfs_iomap_eof_prealloc_initial_size type
    ...

    Signed-off-by: Marcelo Tosatti

    Marcelo Tosatti
     

19 Mar, 2013

1 commit

  • The new context tracking subsystem unconditionally includes kvm_host.h
    headers for the guest enter/exit macros. This causes a compile
    failure when KVM is not enabled.

    Fix by adding an IS_ENABLED(CONFIG_KVM) check to kvm_host so it can
    be included/compiled even when KVM is not enabled.

    Cc: Frederic Weisbecker
    Signed-off-by: Kevin Hilman
    Signed-off-by: Marcelo Tosatti

    Kevin Hilman
     

11 Mar, 2013

1 commit

  • Note that we mark as preempted only when vcpu's task state was
    Running during preemption.

    Thanks Jiannan, Avi for preemption notifier ideas. Thanks Gleb, PeterZ
    for their precious suggestions. Thanks Srikar for an idea on avoiding
    rcu lock while checking task state that improved overcommit numbers.

    Reviewed-by: Chegu Vinod
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Raghavendra K T
    Signed-off-by: Gleb Natapov

    Raghavendra K T
     

06 Mar, 2013

2 commits

  • Add a new bus type for virtio-ccw devices on s390.

    Signed-off-by: Cornelia Huck
    Signed-off-by: Marcelo Tosatti

    Cornelia Huck
     
  • Currently, eventfd introduces module_init/module_exit functions
    to initialize/cleanup the irqfd workqueue. This only works, however,
    if no other module_init/module_exit functions are built into the
    same module.

    Let's just move the initialization and cleanup to kvm_init and kvm_exit.
    This way, it is also clearer where kvm startup may fail.

    Signed-off-by: Cornelia Huck
    Signed-off-by: Marcelo Tosatti

    Cornelia Huck
     

05 Mar, 2013

5 commits


25 Feb, 2013

1 commit

  • Pull KVM updates from Marcelo Tosatti:
    "KVM updates for the 3.9 merge window, including x86 real mode
    emulation fixes, stronger memory slot interface restrictions, mmu_lock
    spinlock hold time reduction, improved handling of large page faults
    on shadow, initial APICv HW acceleration support, s390 channel IO
    based virtio, amongst others"

    * tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
    Revert "KVM: MMU: lazily drop large spte"
    x86: pvclock kvm: align allocation size to page size
    KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr
    x86 emulator: fix parity calculation for AAD instruction
    KVM: PPC: BookE: Handle alignment interrupts
    booke: Added DBCR4 SPR number
    KVM: PPC: booke: Allow multiple exception types
    KVM: PPC: booke: use vcpu reference from thread_struct
    KVM: Remove user_alloc from struct kvm_memory_slot
    KVM: VMX: disable apicv by default
    KVM: s390: Fix handling of iscs.
    KVM: MMU: cleanup __direct_map
    KVM: MMU: remove pt_access in mmu_set_spte
    KVM: MMU: cleanup mapping-level
    KVM: MMU: lazily drop large spte
    KVM: VMX: cleanup vmx_set_cr0().
    KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
    KVM: VMX: disable SMEP feature when guest is in non-paging mode
    KVM: Remove duplicate text in api.txt
    Revert "KVM: MMU: split kvm_mmu_free_page"
    ...

    Linus Torvalds
     

11 Feb, 2013

1 commit

  • This field was needed to differentiate memory slots created by the new
    API, KVM_SET_USER_MEMORY_REGION, from those by the old equivalent,
    KVM_SET_MEMORY_REGION, whose support was dropped long before:

    commit b74a07beed0e64bfba413dcb70dd6749c57f43dc
    KVM: Remove kernel-allocated memory regions

    Although we also have private memory slots to which KVM allocates
    memory with vm_mmap(), !user_alloc slots in other words, the slot id
    should be enough for differentiating them.

    Note: corresponding function parameters will be removed later.

    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Gleb Natapov

    Takuya Yoshikawa
     

29 Jan, 2013

1 commit

  • Virtual interrupt delivery relieves KVM of injecting vAPIC interrupts
    manually; injection is fully taken care of by the hardware. This needs
    some special awareness in the existing interrupt injection path:

    - For a pending interrupt, instead of direct injection, we may need
    to update architecture-specific indicators before resuming the guest.

    - A pending interrupt which is masked by ISR should also be
    considered in the above update action, since hardware will decide
    when to inject it at the right time. The current has_interrupt and
    get_interrupt only return a valid vector from the injection p.o.v.

    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Kevin Tian
    Signed-off-by: Yang Zhang
    Signed-off-by: Gleb Natapov

    Yang Zhang
     

28 Jan, 2013

2 commits

  • While remotely reading the cputime of a task running on a
    full dynticks CPU, the values stored in the utime/stime fields
    of struct task_struct may be stale. They may be those
    of the last kernel/user transition time snapshot, and
    we need to add the tickless time spent since this snapshot.

    To fix this, flush the cputime of the dynticks CPUs on
    kernel user transition and record the time / context
    where we did this. Then on top of this snapshot and the current
    time, perform the fixup on the reader side from task_times()
    accessors.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    [fixed kvm module related build errors]
    Signed-off-by: Sedat Dilek

    Frederic Weisbecker
     
  • Do some preparatory groundwork before adding guest_enter()
    and guest_exit() context tracking callbacks. Those will
    be later used to read the guest cputime safely when we
    run in full dynticks mode.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Gleb Natapov
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Marcelo Tosatti
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

10 Jan, 2013

1 commit

  • The External Proxy Facility in FSL BookE chips allows the interrupt
    controller to automatically acknowledge an interrupt as soon as a
    core gets its pending external interrupt delivered.

    Today, user space implements the interrupt controller, so we need to
    check on it during such a cycle.

    This patch implements logic for user space to enable EPR exiting,
    disable EPR exiting and EPR exiting itself, so that user space can
    acknowledge an interrupt when an external interrupt has successfully
    been delivered into the guest vcpu.

    Signed-off-by: Alexander Graf

    Alexander Graf
     

23 Dec, 2012

1 commit

  • Previous patch "kvm: Minor memory slot optimization" (b7f69c555ca43)
    overlooked the generation field of the memory slots. Re-using the
    original memory slots left us with two slightly different memory
    slots with the same generation. To fix this, make update_memslots()
    take a new parameter to specify the last generation. This also makes
    generation management more explicit to avoid such problems in the future.

    Reported-by: Takuya Yoshikawa
    Signed-off-by: Alex Williamson
    Signed-off-by: Gleb Natapov

    Alex Williamson
     

14 Dec, 2012

1 commit

  • We're currently offering a whopping 32 memory slots to user space, so an
    int is a bit excessive for storing this. We would like to increase
    our memslots, but SHRT_MAX should be more than enough.

    Reviewed-by: Gleb Natapov
    Signed-off-by: Alex Williamson
    Signed-off-by: Marcelo Tosatti

    Alex Williamson