06 Nov, 2015

1 commit

  • Pull KVM updates from Paolo Bonzini:
    "First batch of KVM changes for 4.4.

    s390:
    A bunch of fixes and optimizations for interrupt and time handling.

    PPC:
    Mostly bug fixes.

    ARM:
    No big features, but many small fixes and prerequisites including:

    - a number of fixes for the arch-timer

    - introducing proper level-triggered semantics for the arch-timers

    - a series of patches to synchronously halt a guest (prerequisite
    for IRQ forwarding)

    - some tracepoint improvements

    - a tweak for the EL2 panic handlers

    - some more VGIC cleanups getting rid of redundant state

    x86:
    Quite a few changes:

    - support for VT-d posted interrupts (i.e. PCI devices can inject
    interrupts directly into vCPUs). This introduces a new
    component (in virt/lib/) that connects VFIO and KVM together.
    The same infrastructure will be used for ARM interrupt
    forwarding as well.

    - more Hyper-V features, though the main one, the Hyper-V synthetic
    interrupt controller, will have to wait for 4.5. These will let
    KVM expose Hyper-V devices.

    - nested virtualization now supports VPID (same as PCID but for
    vCPUs) which makes it quite a bit faster

    - for future hardware that supports NVDIMM, there is support for
    clflushopt, clwb, pcommit

    - support for "split irqchip", i.e. LAPIC in kernel +
    IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
    the hypervisor

    - obligatory smattering of SMM fixes

    - on the guest side, stable scheduler clock support was rewritten
    to not require help from the hypervisor"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
    KVM: VMX: Fix commit which broke PML
    KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
    KVM: x86: allow RSM from 64-bit mode
    KVM: VMX: fix SMEP and SMAP without EPT
    KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
    KVM: device assignment: remove pointless #ifdefs
    KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
    KVM: x86: zero apic_arb_prio on reset
    drivers/hv: share Hyper-V SynIC constants with userspace
    KVM: x86: handle SMBASE as physical address in RSM
    KVM: x86: add read_phys to x86_emulate_ops
    KVM: x86: removing unused variable
    KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
    KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
    KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
    KVM: arm/arm64: Optimize away redundant LR tracking
    KVM: s390: use simple switch statement as multiplexer
    KVM: s390: drop useless newline in debugging data
    KVM: s390: SCA must not cross page boundaries
    KVM: arm: Do not indent the arguments of DECLARE_BITMAP
    ...

    Linus Torvalds
     

04 Nov, 2015

1 commit


23 Oct, 2015

2 commits

  • Correct some old mistakes in the API documentation:

    1. VCPU is identified by index (using kvm_get_vcpu() function), but
    "cpu id" can be mistaken for affinity ID.
    2. Some error codes are wrong.

    [ Slightly tweaked some grammar and did some s/CPU index/vcpu_index/
    in the descriptions. -Christoffer ]

    Signed-off-by: Pavel Fedin
    Signed-off-by: Christoffer Dall

    Pavel Fedin
     
  • Forwarded physical interrupts on arm/arm64 are a tricky concept and the
    way we deal with them is not easy to understand by reading
    various specs.

    Therefore, add a proper documentation file explaining the flow and
    rationale of the behavior of the vgic.

    Some of this text was contributed by Marc Zyngier and edited by me.
    Omissions and errors are all mine.

    Signed-off-by: Christoffer Dall

    Christoffer Dall
     

12 Oct, 2015

1 commit


01 Oct, 2015

6 commits

  • This patch updates the Posted-Interrupts Descriptor when vCPU
    is blocked.

    pre-block:
    - Add the vCPU to the blocked per-CPU list
    - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

    post-block:
    - Remove the vCPU from the per-CPU list

    Signed-off-by: Feng Wu
    [Concentrate invocation of pre/post-block hooks to vcpu_block. - Paolo]
    Signed-off-by: Paolo Bonzini

    Feng Wu
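
The pre-block and post-block steps above can be modelled as a small sketch. This is an illustrative Python model, not the kernel code: the class names, the list layout, and the symbolic vector constants are assumptions, and the text does not say what happens to 'NV' after unblocking, so the sketch leaves it alone.

```python
from dataclasses import dataclass, field

# Symbolic stand-ins; the real vector numbers are not given in the text.
POSTED_INTR_VECTOR = "POSTED_INTR_VECTOR"
POSTED_INTR_WAKEUP_VECTOR = "POSTED_INTR_WAKEUP_VECTOR"


@dataclass
class PiDescriptor:
    # Notification vector field ('NV') of the Posted-Interrupts Descriptor.
    nv: str = POSTED_INTR_VECTOR


@dataclass
class Vcpu:
    vcpu_id: int
    pi_desc: PiDescriptor = field(default_factory=PiDescriptor)


class PhysicalCpu:
    """Hypothetical holder of the per-CPU list of blocked vCPUs."""

    def __init__(self):
        self.blocked_vcpus = []

    def pre_block(self, vcpu):
        # Add the vCPU to the blocked per-CPU list and redirect
        # notifications to the wakeup vector.
        self.blocked_vcpus.append(vcpu)
        vcpu.pi_desc.nv = POSTED_INTR_WAKEUP_VECTOR

    def post_block(self, vcpu):
        # Remove the vCPU from the per-CPU list.
        self.blocked_vcpus.remove(vcpu)
```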
     
  • Cc: Gleb Natapov
    Cc: Paolo Bonzini
    Signed-off-by: Jason Wang
    Signed-off-by: Paolo Bonzini

    Jason Wang
     
  • In order to enable userspace PIC support, the userspace PIC needs to
    be able to inject local interrupts even when the APICs are in the
    kernel.

    KVM_INTERRUPT now supports sending local interrupts to an APIC when
    APICs are in the kernel.

    The ready_for_interrupt_request flag is now only set when the CPU/APIC
    will immediately accept and inject an interrupt (i.e. APIC has not
    masked the PIC).

    When the PIC wishes to initiate an INTA cycle with, say, CPU0, it
    kicks CPU0 out of the guest, and rendezvouses with CPU0 once it arrives
    in userspace.

    When the CPU/APIC unmasks the PIC, a KVM_EXIT_IRQ_WINDOW_OPEN is
    triggered, so that userspace has a chance to inject a PIC interrupt
    if it had been pending.

    Overall, this design can lead to a small number of spurious userspace
    rendezvous, in particular whenever the PIC transitions from low to
    high while it is masked, and whenever the PIC becomes unmasked while
    it is low.

    Note: this does not buffer more than one local interrupt in the
    kernel, so the VMM needs to enter the guest in order to complete
    interrupt injection before injecting an additional interrupt.

    Compiles for x86.

    Can pass the KVM Unit Tests.

    Signed-off-by: Steve Rutherford
    Signed-off-by: Paolo Bonzini

    Steve Rutherford
     
  • In order to support a userspace IOAPIC interacting with an in kernel
    APIC, the EOI exit bitmaps need to be configurable.

    If the IOAPIC is in userspace (i.e. the irqchip has been split), the
    EOI exit bitmaps will be set whenever the GSI Routes are configured.
    In particular, the low MSI routes are reservable for userspace
    IOAPICs. For these MSI routes, the EOI Exit bit corresponding to the
    destination vector of the route will be set for the destination VCPU.

    The intention is for the userspace IOAPICs to use the reservable MSI
    routes to inject interrupts into the guest.

    This is a slight abuse of the notion of an MSI Route, given that MSIs
    classically bypass the IOAPIC. It might be worthwhile to add an
    additional route type to improve clarity.

    Compile tested for Intel x86.

    Signed-off-by: Steve Rutherford
    Signed-off-by: Paolo Bonzini

    Steve Rutherford
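
How the EOI exit bitmaps follow from the reserved MSI routes can be sketched roughly as below. This is a simplified model, not the kernel implementation: the function names are hypothetical, routes are plain (address_lo, data) pairs, and it assumes physical destination mode with the MSI destination ID equal to the VCPU index. The field positions (vector in data bits 7:0, destination ID in address bits 19:12) are the standard x86 MSI layout.

```python
from collections import defaultdict


def msi_vector(data):
    # x86 MSI data register: bits 7:0 hold the interrupt vector.
    return data & 0xFF


def msi_dest_id(address_lo):
    # x86 MSI address register: bits 19:12 hold the destination ID.
    return (address_lo >> 12) & 0xFF


def build_eoi_exit_bitmaps(reserved_routes):
    """Return {vcpu: 256-bit bitmap as an int} for the given routes.

    Assumption: destination ID == VCPU index (physical mode only).
    """
    bitmaps = defaultdict(int)
    for address_lo, data in reserved_routes:
        bitmaps[msi_dest_id(address_lo)] |= 1 << msi_vector(data)
    return dict(bitmaps)
```

For example, a route with address 0xFEE01000 and data 0x31 sets bit 0x31 in the bitmap of VCPU 1.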
     
  • Adds KVM_EXIT_IOAPIC_EOI which allows the kernel to EOI
    level-triggered IOAPIC interrupts.

    Uses a per VCPU exit bitmap to decide whether or not the IOAPIC needs
    to be informed (which is identical to the EOI_EXIT_BITMAP field used
    by modern x86 processors, but can also be used to elide kvm IOAPIC EOI
    exits on older processors).

    [Note: A prototype using ResampleFDs found that decoupling the EOI
    from the VCPU's thread made it possible for the VCPU to not see a
    recent EOI after reentering the guest. This does not match real
    hardware.]

    Compile tested for Intel x86.

    Signed-off-by: Steve Rutherford
    Signed-off-by: Paolo Bonzini

    Steve Rutherford
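
The per-VCPU decision described above can be sketched as a single bitmap test. This is an illustrative model only: the function name and the dictionary standing in for the exit record are assumptions, not the kernel's data structures.

```python
def handle_eoi(eoi_exit_bitmap, vector):
    """Decide whether an EOI for `vector` must be reported to userspace.

    `eoi_exit_bitmap` is the VCPU's 256-bit exit bitmap held as an int.
    Returns a hypothetical exit record, or None when the EOI can be
    completed without informing the userspace IOAPIC.
    """
    if not 0 <= vector <= 255:
        raise ValueError("vector out of range")
    if (eoi_exit_bitmap >> vector) & 1:
        return {"exit_reason": "KVM_EXIT_IOAPIC_EOI", "vector": vector}
    return None
```

Because the exit is taken on the VCPU's own thread, the VCPU cannot re-enter the guest before userspace has seen the EOI, which is the ordering the ResampleFD prototype failed to guarantee.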
     
  • First patch in a series which enables the relocation of the
    PIC/IOAPIC to userspace.

    Adds capability KVM_CAP_SPLIT_IRQCHIP;

    KVM_CAP_SPLIT_IRQCHIP enables the construction of LAPICs without the
    rest of the irqchip.

    Compile tested for x86.

    Signed-off-by: Steve Rutherford
    Suggested-by: Andrew Honig
    Signed-off-by: Paolo Bonzini

    Steve Rutherford
     

23 Aug, 2015

1 commit


23 Jul, 2015

1 commit

  • The notification is sent by exiting the vcpu to user space
    if KVM_REQ_HV_CRASH is enabled for the vcpu. On exit to user space,
    the kvm_run structure contains a system_event of type
    KVM_SYSTEM_EVENT_CRASH to report that a guest crash occurred.

    Signed-off-by: Andrey Smetanin
    Signed-off-by: Denis V. Lunev
    Reviewed-by: Peter Hornyack
    CC: Paolo Bonzini
    CC: Gleb Natapov
    Signed-off-by: Paolo Bonzini

    Andrey Smetanin
     

21 Jul, 2015

4 commits

  • Finally advertise the KVM capability for SET_GUEST_DEBUG. Once arm
    support is added this check can be moved to the common
    kvm_vm_ioctl_check_extension() code.

    Signed-off-by: Alex Bennée
    Acked-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Alex Bennée
     
  • This adds support for SW breakpoints inserted by userspace.

    We do this by trapping all guest software debug exceptions to the
    hypervisor (MDCR_EL2.TDE). The exit handler sets an exit reason of
    KVM_EXIT_DEBUG with the kvm_debug_exit_arch structure holding the
    exception syndrome information.

    It will be up to userspace to extract the PC (via GET_ONE_REG) and
    determine if the debug event was for a breakpoint it inserted. If not,
    userspace will need to re-inject the correct exception and restart the
    hypervisor to deliver the debug exception to the guest.

    Any other guest software debug exception (e.g. single step or HW
    assisted breakpoints) will cause an error and the VM to be killed. This
    is addressed by later patches which add support for the other debug
    types.

    Signed-off-by: Alex Bennée
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Alex Bennée
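
The userspace side of this flow can be sketched as follows. This is a minimal model under stated assumptions: the function name and the returned action strings are hypothetical, and the PC is passed in directly rather than fetched with GET_ONE_REG.

```python
def handle_debug_exit(pc, user_breakpoints):
    """Classify a KVM_EXIT_DEBUG event by the guest PC.

    `user_breakpoints` is the set of guest addresses where userspace
    (for example a debugger stub) planted a software breakpoint.
    """
    if pc in user_breakpoints:
        # The breakpoint is ours: stop and hand control to the debugger.
        return "stop_for_debugger"
    # The guest hit its own breakpoint: the exception belongs to the guest.
    return "reinject_to_guest"
```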
     
  • This commit adds a stub function to support the KVM_SET_GUEST_DEBUG
    ioctl. Any unsupported flag will return -EINVAL. For now, only
    KVM_GUESTDBG_ENABLE is supported, although it won't have any effects.

    Signed-off-by: Alex Bennée
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Alex Bennée
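
The flag check described above amounts to rejecting any bit outside the supported set. A minimal sketch, assuming a hypothetical function name; KVM_GUESTDBG_ENABLE is the only flag accepted, as the text states.

```python
import errno

KVM_GUESTDBG_ENABLE = 0x00000001

# Only this flag is supported by the stub for now.
SUPPORTED_FLAGS = KVM_GUESTDBG_ENABLE


def set_guest_debug(control):
    """Model of the stub: 0 on success, -EINVAL for unsupported flags."""
    if control & ~SUPPORTED_FLAGS:
        return -errno.EINVAL
    # Accepted, but it has no effect yet.
    return 0
```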
     
  • Bring into line with the comments for the other structures and their
    KVM_EXIT_* cases. Also update api.txt to reflect use in kvm_run
    documentation.

    Signed-off-by: Alex Bennée
    Reviewed-by: David Hildenbrand
    Reviewed-by: Andrew Jones
    Acked-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Alex Bennée
     

05 Jun, 2015

3 commits

  • Follow up to commit e194bbdf362ba7d53cfd23ba24f1a7c90ef69a74.

    Suggested-by: Bandan Das
    Suggested-by: Alex Williamson
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • This is now very simple to do. The only interesting part is a simple
    trick to find the right memslot in gfn_to_rmap, retrieving the address
    space from the spte role word. The same trick is used in the auditing
    code.

    The comment on top of union kvm_mmu_page_role has been stale forever,
    so remove it. Speaking of stale code, remove pad_for_nice_hex_output
    too: it was splitting the "access" bitfield across two bytes and thus
    had effectively turned into pad_for_ugly_hex_output.

    Reviewed-by: Radim Krčmář
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • Only two ioctls have to be modified; the address space id is
    placed in the higher 16 bits of their slot id argument.

    As of this patch, no architecture defines more than one
    address space; x86 will be the first.

    Reviewed-by: Radim Krčmář
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
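
The encoding described above can be sketched as simple bit packing. The helper names are hypothetical, and the sketch assumes a 32-bit slot argument whose low 16 bits carry the slot id.

```python
def pack_slot(as_id, slot_id):
    """Place the address space id in the upper 16 bits of the slot argument."""
    if not 0 <= as_id <= 0xFFFF:
        raise ValueError("address space id must fit in 16 bits")
    if not 0 <= slot_id <= 0xFFFF:
        raise ValueError("slot id must fit in 16 bits")
    return (as_id << 16) | slot_id


def unpack_slot(arg):
    """Split a slot argument back into (address space id, slot id)."""
    return arg >> 16, arg & 0xFFFF
```

With a single address space (id 0), the packed value equals the plain slot id, so existing callers are unaffected.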
     

04 Jun, 2015

1 commit

  • This patch includes changes to the external API for SMM support.
    Userspace can predicate the availability of the new fields and
    ioctls on a new capability, KVM_CAP_X86_SMM, which is added at the end
    of the patch series.

    Reviewed-by: Radim Krčmář
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

20 May, 2015

1 commit

  • KVM may turn a user page into a kernel page when the kernel writes a read-only
    user page if CR0.WP = 1. This shadow page entry will be reused after
    SMAP is enabled, so the kernel is allowed to access this user page.

    Fix it by adding SMAP && !CR0.WP to the shadow page's role and resetting the
    MMU once CR4.SMAP is updated.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Paolo Bonzini

    Xiao Guangrong
     

07 May, 2015

1 commit

  • Introduce KVM_CAP_DISABLE_QUIRKS for disabling x86 quirks that were previously
    created in order to overcome QEMU issues. Those issues were mostly the result of
    an invalid VM BIOS. Currently there are two quirks that can be disabled:

    1. KVM_QUIRK_LINT0_REENABLED - LINT0 was enabled after boot
    2. KVM_QUIRK_CD_NW_CLEARED - CD and NW are cleared after boot

    These two issues are already resolved in recent releases of QEMU, and would
    therefore be disabled by QEMU.

    Signed-off-by: Nadav Amit
    Message-Id:
    [Report capability from KVM_CHECK_EXTENSION too. - Paolo]
    Signed-off-by: Paolo Bonzini

    Nadav Amit
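
A quirk mask of this kind can be sketched as a bitmask that userspace hands to the hypervisor. The bit positions and the helper name below are assumptions for illustration; only the two quirk names come from the text.

```python
# Assumed bit positions; the text only names the quirks.
KVM_QUIRK_LINT0_REENABLED = 1 << 0
KVM_QUIRK_CD_NW_CLEARED = 1 << 1


def quirk_active(disabled_quirks, quirk):
    """A quirk keeps applying unless userspace has disabled it."""
    return not (disabled_quirks & quirk)
```

A VMM whose firmware no longer depends on either behaviour would pass both bits, after which neither quirk applies.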
     

21 Apr, 2015

1 commit

  • Some PowerNV systems include a hardware random-number generator.
    This HWRNG is present on POWER7+ and POWER8 chips and is capable of
    generating one 64-bit random number every microsecond. The random
    numbers are produced by sampling a set of 64 unstable high-frequency
    oscillators and are almost completely entropic.

    PAPR defines an H_RANDOM hypercall which guests can use to obtain one
    64-bit random sample from the HWRNG. This adds a real-mode
    implementation of the H_RANDOM hypercall. This hypercall was
    implemented in real mode because the latency of reading the HWRNG is
    generally small compared to the latency of a guest exit and entry for
    all the threads in the same virtual core.

    Userspace can detect the presence of the HWRNG and the H_RANDOM
    implementation by querying the KVM_CAP_PPC_HWRNG capability. The
    H_RANDOM hypercall implementation will only be invoked when the guest
    does an H_RANDOM hypercall if userspace first enables the in-kernel
    H_RANDOM implementation using the KVM_CAP_PPC_ENABLE_HCALL capability.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Paul Mackerras
    Signed-off-by: Alexander Graf

    Michael Ellerman
     

08 Apr, 2015

2 commits


01 Apr, 2015

3 commits

  • This patch adds support to migrate vcpu interrupts. Two new vcpu ioctls
    are added which get/set the complete status of pending interrupts in one
    go. The ioctls are marked as available with the new capability
    KVM_CAP_S390_IRQ_STATE.

    We cannot use a ONEREG, as the number of pending local interrupts is not
    constant and depends on the number of CPUs.

    To retrieve the interrupt state we add an ioctl KVM_S390_GET_IRQ_STATE.
    Its input parameter is a pointer to a struct kvm_s390_irq_state which
    has a buffer and length. For all currently pending interrupts, we copy
    a struct kvm_s390_irq into the buffer and pass it to userspace.

    To store interrupt state into a buffer provided by userspace, we add an
    ioctl KVM_S390_SET_IRQ_STATE. It passes a struct kvm_s390_irq_state into
    the kernel and injects all interrupts contained in the buffer.

    Signed-off-by: Jens Freimann
    Signed-off-by: Christian Borntraeger
    Acked-by: Cornelia Huck

    Jens Freimann
     
  • We introduced struct kvm_s390_irq a while ago; it allows injecting
    all kinds of interrupts as defined in the Principles of
    Operation.
    Add an ioctl to inject interrupts with the extended struct kvm_s390_irq.

    Signed-off-by: Jens Freimann
    Signed-off-by: Christian Borntraeger
    Acked-by: Cornelia Huck

    Jens Freimann
     
  • This fixes a bug introduced with commit c05c4186bbe4 ("KVM: s390:
    add floating irq controller").

    get_all_floating_irqs() does copy_to_user() while holding
    a spin lock. Let's fix this by filling a temporary buffer
    first and copy it to userspace after giving up the lock.

    Cc: # 3.18+: 69a8d4562638 KVM: s390: no need to hold...

    Reviewed-by: David Hildenbrand
    Signed-off-by: Jens Freimann
    Signed-off-by: Christian Borntraeger
    Acked-by: Cornelia Huck

    Jens Freimann
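
The shape of the fix (snapshot under the lock, copy out after releasing it) can be sketched in userspace terms. This is an analogy in Python, not the kernel code: threading.Lock stands in for the spin lock, and the `copy_out` callback stands in for copy_to_user(); all names are hypothetical.

```python
import threading


class FloatingIrqs:
    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []

    def inject(self, irq):
        with self._lock:
            self._pending.append(irq)

    def get_all(self, copy_out):
        # Fill a temporary buffer while holding the lock...
        with self._lock:
            snapshot = list(self._pending)
        # ...and only do the slow, possibly blocking copy after
        # the lock has been released.
        copy_out(snapshot)
        return len(snapshot)
```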
     

28 Mar, 2015

6 commits

  • Now that the code is in place for KVM to support MIPS SIMD Architecture
    (MSA) in MIPS guests, wire up the new KVM_CAP_MIPS_MSA capability.

    For backwards compatibility, the capability must be explicitly enabled
    in order to detect or make use of MSA from the guest.

    The capability is not supported if the hardware supports MSA vector
    partitioning, since the extra support cannot be tested yet and it
    extends the state that the userland program would have to save.

    Signed-off-by: James Hogan
    Acked-by: Paolo Bonzini
    Cc: Ralf Baechle
    Cc: Gleb Natapov
    Cc: Jonathan Corbet
    Cc: linux-mips@linux-mips.org
    Cc: kvm@vger.kernel.org
    Cc: linux-api@vger.kernel.org
    Cc: linux-doc@vger.kernel.org

    James Hogan
     
  • Add KVM register numbers for the MIPS SIMD Architecture (MSA) registers,
    and implement access to them with the KVM_GET_ONE_REG / KVM_SET_ONE_REG
    ioctls when the MSA capability is enabled (exposed in a later patch) and
    present in the guest according to its Config3.MSAP bit.

    The MSA vector registers use the same register numbers as the FPU
    registers except with a different size (128bits). Since MSA depends on
    Status.FR=1, these registers are inaccessible when Status.FR=0. These
    registers are returned as a single native endian 128bit value, rather
    than least significant half first with each 64-bit half native endian as
    the kernel uses internally.

    Signed-off-by: James Hogan
    Cc: Paolo Bonzini
    Cc: Paul Burton
    Cc: Ralf Baechle
    Cc: Gleb Natapov
    Cc: Jonathan Corbet
    Cc: linux-mips@linux-mips.org
    Cc: kvm@vger.kernel.org
    Cc: linux-api@vger.kernel.org
    Cc: linux-doc@vger.kernel.org

    James Hogan
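
The difference between the two layouts can be sketched as follows. This is an illustrative model with hypothetical helper names: it takes the two 64-bit halves the kernel is described as keeping internally and produces the single native-endian 128-bit value that userspace is described as seeing.

```python
import sys

MASK64 = (1 << 64) - 1


def msa_reg_to_bytes(lo, hi):
    """Combine two 64-bit halves into one native-endian 128-bit value."""
    value = ((hi & MASK64) << 64) | (lo & MASK64)
    return value.to_bytes(16, sys.byteorder)


def msa_reg_from_bytes(raw):
    """Split a native-endian 128-bit value back into (lo, hi) halves."""
    if len(raw) != 16:
        raise ValueError("an MSA vector register is 128 bits wide")
    value = int.from_bytes(raw, sys.byteorder)
    return value & MASK64, value >> 64
```

On a little-endian host the result happens to match "least significant half first"; on a big-endian host the most significant half comes first, which is where the two layouts diverge.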
     
  • Now that the code is in place for KVM to support FPU in MIPS KVM guests,
    wire up the new KVM_CAP_MIPS_FPU capability.

    For backwards compatibility, the capability must be explicitly enabled
    in order to detect or make use of the FPU from the guest.

    Signed-off-by: James Hogan
    Cc: Paolo Bonzini
    Cc: Ralf Baechle
    Cc: Gleb Natapov
    Cc: Jonathan Corbet
    Cc: linux-mips@linux-mips.org
    Cc: kvm@vger.kernel.org
    Cc: linux-api@vger.kernel.org
    Cc: linux-doc@vger.kernel.org

    James Hogan
     
  • Add KVM register numbers for the MIPS FPU registers, and implement
    access to them with the KVM_GET_ONE_REG / KVM_SET_ONE_REG ioctls when
    the FPU capability is enabled (exposed in a later patch) and present in
    the guest according to its Config1.FP bit.

    The registers are accessible in the current mode of the guest, with each
    sized access showing what the guest would see with an equivalent access,
    and like the architecture they may become UNPREDICTABLE if the FR mode
    is changed. When FR=0, odd doubles are inaccessible as they do not exist
    in that mode.

    Signed-off-by: James Hogan
    Acked-by: Paolo Bonzini
    Cc: Paul Burton
    Cc: Ralf Baechle
    Cc: Gleb Natapov
    Cc: Jonathan Corbet
    Cc: linux-mips@linux-mips.org
    Cc: kvm@vger.kernel.org
    Cc: linux-api@vger.kernel.org
    Cc: linux-doc@vger.kernel.org

    James Hogan
     
  • Add Config4 and Config5 co-processor 0 registers, and add capability to
    write the Config1, Config3, Config4, and Config5 registers using the KVM
    API.

    Only supported bits can be written, to minimise the chances of the guest
    being given a configuration from e.g. QEMU that is inconsistent with
    that being emulated, and as such the handling is in trap_emul.c as it
    may need to be different for VZ. Currently the only modification
    permitted is to make Config4 and Config5 exist via the M bits, but other
    bits will be added for FPU and MSA support in future patches.

    Care should be taken by userland not to change bits without fully
    handling the possible extra state that may then exist and which the
    guest may begin to use and depend on.

    Signed-off-by: James Hogan
    Cc: Paolo Bonzini
    Cc: Ralf Baechle
    Cc: Gleb Natapov
    Cc: linux-mips@linux-mips.org
    Cc: kvm@vger.kernel.org

    James Hogan
     
  • Implement access to the guest Processor Identification CP0 register
    using the KVM_GET_ONE_REG and KVM_SET_ONE_REG ioctls. This allows the
    owning process to modify and read back the value that is exposed to the
    guest in this register.

    Signed-off-by: James Hogan
    Cc: Paolo Bonzini
    Cc: Ralf Baechle
    Cc: Gleb Natapov
    Cc: linux-mips@linux-mips.org
    Cc: kvm@vger.kernel.org

    James Hogan
     

17 Mar, 2015

3 commits

  • Provide the KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS ioctls, which can be used
    to get/set guest storage keys. This functionality is needed for live migration
    of s390 guests that use storage keys.

    Signed-off-by: Jason J. Herne
    Reviewed-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    Jason J. Herne
     
  • The Store System Information (STSI) instruction currently collects all
    information it relays to the caller in the kernel. Some information,
    however, is only available in user space. An example of this is the
    guest name: The kernel always sets "KVMGuest", but user space knows the
    actual guest name.

    This patch introduces a new exit, KVM_EXIT_S390_STSI, guarded by a
    capability that can be enabled by user space if it wants to be able to
    insert such data. User space will be provided with the target buffer
    and the requested STSI function code.

    Reviewed-by: Eric Farman
    Reviewed-by: Christian Borntraeger
    Signed-off-by: Ekaterina Tumanova
    Signed-off-by: Christian Borntraeger

    Ekaterina Tumanova
     
  • On s390, we've got to make sure to hold the IPTE lock while accessing
    logical memory. So let's add an ioctl for reading and writing logical
    memory to provide this feature for userspace, too.
    The maximum transfer size of this call is limited to 64kB to prevent
    the guest from triggering huge copy_from/to_user transfers. QEMU
    currently only requests up to one or two pages, so 16*4kB seems
    to be a reasonable limit here.

    Signed-off-by: Thomas Huth
    Signed-off-by: Christian Borntraeger

    Thomas Huth
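
The limit arithmetic can be checked with a short sketch. The constant and function names are hypothetical, and raising ValueError is only a stand-in for whatever error the ioctl reports.

```python
PAGE_SIZE = 4 * 1024                 # 4kB pages, as in the text
MEM_OP_MAX_SIZE = 16 * PAGE_SIZE     # 16 * 4kB = 64kB


def check_mem_op_size(size):
    """Reject transfers larger than the 64kB cap."""
    if size > MEM_OP_MAX_SIZE:
        raise ValueError("transfer exceeds the 64kB limit")
    return size
```

A request for two pages (8192 bytes) passes comfortably, while a request for 65537 bytes is rejected.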
     

14 Mar, 2015

1 commit

  • To cleanly restore an SMP VM we need to ensure that the current pause
    state of each vcpu is correctly recorded. Things could get confused if
    the CPU starts running after migration restore completes when it was
    paused before its state was captured.

    We use the existing KVM_GET/SET_MP_STATE ioctl to do this. The arm/arm64
    interface is a lot simpler as the only valid states are
    KVM_MP_STATE_RUNNABLE and KVM_MP_STATE_STOPPED.

    Signed-off-by: Alex Bennée
    Signed-off-by: Christoffer Dall

    Alex Bennée
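
Saving and restoring the pause state across migration can be sketched as below. This is a toy model: the Vcpu class and helper names are hypothetical, and the two state names are used as plain strings rather than the real ioctl constants.

```python
KVM_MP_STATE_RUNNABLE = "KVM_MP_STATE_RUNNABLE"
KVM_MP_STATE_STOPPED = "KVM_MP_STATE_STOPPED"


class Vcpu:
    def __init__(self, paused=False):
        self.paused = paused


def get_mp_state(vcpu):
    return KVM_MP_STATE_STOPPED if vcpu.paused else KVM_MP_STATE_RUNNABLE


def set_mp_state(vcpu, state):
    if state == KVM_MP_STATE_RUNNABLE:
        vcpu.paused = False
    elif state == KVM_MP_STATE_STOPPED:
        vcpu.paused = True
    else:
        # Only two states are valid on arm/arm64.
        raise ValueError("invalid mp_state for arm/arm64")


def migrate(vcpus):
    """Record each vcpu's state on the source and replay it on the target."""
    saved = [get_mp_state(v) for v in vcpus]
    restored = [Vcpu() for _ in saved]
    for vcpu, state in zip(restored, saved):
        set_mp_state(vcpu, state)
    return restored
```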
     

12 Mar, 2015

1 commit

  • This patch enables irqfd on arm/arm64.

    Both irqfd and resamplefd are supported. Injection is implemented
    in vgic.c without routing.

    This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.

    KVM_CAP_IRQFD is now advertised. The KVM_CAP_IRQFD_RESAMPLE capability
    is advertised automatically as soon as CONFIG_HAVE_KVM_IRQFD is set.

    Irqfd injection is restricted to SPI. The rationale behind not
    supporting PPI irqfd injection is that any device using a PPI would
    be a private-to-the-CPU device (timer for instance), so its state
    would have to be context-switched along with the VCPU and would
    require in-kernel wiring anyhow. It is not a relevant use case for
    irqfds.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Eric Auger
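
The SPI-only restriction can be sketched as a range check on GIC interrupt IDs. The function names are hypothetical; the ID ranges (SGI 0-15, PPI 16-31, SPI 32-1019) are those of the GIC architecture. How an irqfd's gsi number maps onto an interrupt ID is not covered here.

```python
def gic_irq_class(intid):
    """Classify a GIC interrupt ID."""
    if 0 <= intid <= 15:
        return "SGI"   # software-generated, per CPU
    if 16 <= intid <= 31:
        return "PPI"   # private peripheral, per CPU (e.g. the timer)
    if 32 <= intid <= 1019:
        return "SPI"   # shared peripheral
    return "invalid"


def irqfd_allowed(intid):
    # Only shared peripheral interrupts may be injected through an irqfd.
    return gic_irq_class(intid) == "SPI"
```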