09 Jan, 2017

1 commit

  • commit 0d808df06a44200f52262b6eb72bcb6042f5a7c5 upstream.

    When switching from/to a guest that has a transaction in progress,
    we need to save/restore the checkpointed register state. Although
    XER is part of the CPU state that gets checkpointed, the code that
    does this saving and restoring doesn't save/restore XER.

    This fixes it by saving and restoring the XER. To allow userspace
    to read/write the checkpointed XER value, we also add a new ONE_REG
    specifier.

    The visible effect of this bug is that the guest may see its XER
    value being corrupted when it uses transactions.

    Fixes: e4e38121507a ("KVM: PPC: Book3S HV: Add transactional memory support")
    Fixes: 0a8eccefcb34 ("KVM: PPC: Book3S HV: Add missing code for transaction reclaim on guest exit")
    Signed-off-by: Paul Mackerras
    Reviewed-by: Thomas Huth
    Signed-off-by: Paul Mackerras
    Signed-off-by: Greg Kroah-Hartman

    Paul Mackerras
     

20 Nov, 2016

1 commit

  • Userspace can read the exact value of kvmclock by reading the TSC
    and fetching the timekeeping parameters out of guest memory. This
    however is brittle and not necessary anymore with KVM 4.11. Provide
    a mechanism that lets userspace know if the new KVM_GET_CLOCK
    semantics are in effect, and---since we are at it---if the clock
    is stable across all VCPUs.

    Cc: Radim Krčmář
    Cc: Marcelo Tosatti
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Radim Krčmář

    Paolo Bonzini
     

04 Aug, 2016

2 commits


23 Jul, 2016

4 commits

  • KVM/ARM changes for Linux 4.8

    - GICv3 ITS emulation
    - Simpler idmap management that fixes potential TLB conflicts
    - Honor the kernel protection in HYP mode
    - Removal of the old vgic implementation

    Radim Krčmář
     
  • Up to now, only irqchip routing entries could be set. This patch
    adds the capability to insert MSI routing entries.

    For ARM64, let's also increase KVM_MAX_IRQ_ROUTES to 4096: this
    includes SPI irqchip routes plus MSI routes. In the future this
    might be extended.

    Signed-off-by: Eric Auger
    Reviewed-by: Andre Przywara
    Signed-off-by: Marc Zyngier

    Eric Auger
     
  • This patch adds compilation and link against irqchip.

    Main motivation behind using irqchip code is to enable MSI
    routing code. In the future irqchip routing may also be useful
    when targeting multiple irqchips.

    Routing standard callbacks now are implemented in vgic-irqfd:
    - kvm_set_routing_entry
    - kvm_set_irq
    - kvm_set_msi

    They are only supported with the new_vgic code.

    Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined.
    KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed.

    So from now on IRQCHIP routing is enabled and a routing table entry
    must exist for irqfd injection to succeed for a given SPI. This patch
    builds a default flat irqchip routing table (gsi=irqchip.pin) covering
    all the VGIC SPI indexes. This routing table is overwritten by the
    first user-space call to the KVM_SET_GSI_ROUTING ioctl.

    MSI routing setup is not yet allowed.

    Signed-off-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Eric Auger
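
The default flat table described above (gsi=irqchip.pin) can be sketched as follows. This is an illustration only: `struct demo_route` and `demo_build_flat_routes` are hypothetical names and do not reproduce the kernel's routing structures or the vgic-irqfd code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified routing entry; not the kernel's struct. */
struct demo_route {
    uint32_t gsi;      /* global system interrupt number */
    uint32_t irqchip;  /* index of the interrupt controller */
    uint32_t pin;      /* input pin on that controller */
};

/* Fill `table` with a flat 1:1 mapping: GSI n routes to pin n of irqchip 0. */
size_t demo_build_flat_routes(struct demo_route *table, size_t nr_spis)
{
    assert(table != NULL);
    for (size_t i = 0; i < nr_spis; i++) {
        table[i].gsi = (uint32_t)i;
        table[i].irqchip = 0;
        table[i].pin = (uint32_t)i;
    }
    return nr_spis;  /* number of entries written */
}
```

With such a table every SPI index has an entry from the start, which is what irqfd injection needs until userspace installs its own table.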
     
  • On ARM, the MSI msg (address and data) comes along with
    out-of-band device ID information. The device ID encodes the
    device that writes the MSI msg. Let's convey the device id in
    kvm_irq_routing_msi and use KVM_MSI_VALID_DEVID flag value in
    kvm_irq_routing_entry to indicate the msi devid is populated.

    Signed-off-by: Eric Auger
    Reviewed-by: Andre Przywara
    Acked-by: Radim Krčmář
    Signed-off-by: Marc Zyngier

    Eric Auger
     

19 Jul, 2016

2 commits

  • Now that all ITS emulation functionality is in place, we advertise
    MSI functionality to userland and also the ITS device to the guest - if
    userland has configured that.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • The ARM GICv3 ITS MSI controller requires a device ID to be able to
    assign the proper interrupt vector. On real hardware, this ID is
    sampled from the bus. To be able to emulate an ITS controller, extend
    the KVM MSI interface to let userspace provide such a device ID. For
    PCI devices, the device ID is simply the 16-bit bus-device-function
    triplet, which should be easily available to the userland tool.

    Also there is a new KVM capability which advertises whether the
    current VM requires a device ID to be set along with the MSI data.
    This flag is still reported as not available everywhere; it will be
    enabled later, when ITS emulation is used.

    Signed-off-by: Andre Przywara
    Reviewed-by: Eric Auger
    Reviewed-by: Marc Zyngier
    Acked-by: Christoffer Dall
    Acked-by: Paolo Bonzini
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
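
As a sketch of how a userland tool might derive the 16-bit device ID for a PCI device, the helper below packs the bus-device-function triplet in the conventional PCI layout (8-bit bus, 5-bit device, 3-bit function). `demo_pci_devid` is a hypothetical helper name, not part of the KVM interface.

```c
#include <assert.h>
#include <stdint.h>

/* Pack bus (8 bits), device (5 bits) and function (3 bits) into 16 bits. */
uint16_t demo_pci_devid(uint8_t bus, uint8_t dev, uint8_t fn)
{
    assert(dev < 32 && fn < 8);  /* device and function number ranges */
    return (uint16_t)((bus << 8) | (dev << 3) | fn);
}
```

For example, device 00:03.0 would map to 0x0018 under this layout.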
     

18 Jul, 2016

1 commit

  • We will use illegal instruction 0x0000 for handling 2 byte sw breakpoints
    from user space. As it can be enabled dynamically via a capability,
    let's move setting of ICTL_OPEREXC to the post creation step, so we avoid
    any races when enabling that capability just while adding new cpus.

    Acked-by: Janosch Frank
    Reviewed-by: Cornelia Huck
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     

14 Jul, 2016

2 commits

  • Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK as a feature flag to
    KVM_CAP_X2APIC_API.

    The quirk made KVM interpret 0xff as a broadcast even in x2APIC mode.
    The enableable capability is needed in order to support standard x2APIC and
    remain backward compatible.

    Signed-off-by: Radim Krčmář
    [Expand kvm_apic_mda comment. - Paolo]
    Signed-off-by: Paolo Bonzini

    Radim Krčmář
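
The effect of the quirk can be illustrated with a simplified destination check. This is only a sketch: the names are hypothetical, the real KVM logic has more cases, and the sketch assumes that the standard x2APIC broadcast destination is 0xffffffff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_XAPIC_BROADCAST  0xffu        /* 8-bit xAPIC broadcast ID */
#define DEMO_X2APIC_BROADCAST 0xffffffffu  /* 32-bit x2APIC broadcast ID */

/* Decide whether `dest` addresses all LAPICs. */
bool demo_is_broadcast(uint32_t dest, bool x2apic_mode, bool quirk_disabled)
{
    assert(x2apic_mode || dest <= 0xffu);  /* xAPIC destinations are 8 bits */

    if (!x2apic_mode)
        return dest == DEMO_XAPIC_BROADCAST;
    if (dest == DEMO_X2APIC_BROADCAST)
        return true;
    /* Quirk: 0xff is still treated as broadcast unless it was disabled. */
    return !quirk_disabled && dest == DEMO_XAPIC_BROADCAST;
}
```

With the quirk disabled, 0xff in x2APIC mode is just the ID of one LAPIC, which is what standard x2APIC expects.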
     
  • KVM_CAP_X2APIC_API is a capability for features related to x2APIC
    enablement. KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to
    extend APIC ID in get/set ioctl and MSI addresses to 32 bits.
    Both are needed to support x2APIC.

    The feature has to be enableable and disabled by default, because
    get/set ioctl shifted and truncated APIC ID to 8 bits by using a
    non-standard protocol inspired by xAPIC and the change is not
    backward-compatible.

    Changes to MSI addresses follow the format used by interrupt remapping
    unit. The upper address word, that used to be 0, contains upper 24 bits
    of the LAPIC address in its upper 24 bits. Lower 8 bits are reserved as
    0. Using the upper address word is not backward-compatible either as we
    didn't check that userspace zeroed the word. Reserved bits are still
    not explicitly checked, but non-zero data will affect LAPIC addresses,
    which will cause a bug.

    Signed-off-by: Radim Krčmář
    Signed-off-by: Paolo Bonzini

    Radim Krčmář
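
A minimal sketch of the address layout described above. The upper word follows the commit text (upper 24 bits of the APIC ID, lower 8 bits zero). Placing the low 8 bits of the ID in bits 19:12 of the lower word is the conventional x86 MSI layout and is an assumption here, as are all `demo_` names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of MSI address words. */
struct demo_msi_addr {
    uint32_t lo;  /* lower address word */
    uint32_t hi;  /* upper address word, formerly always 0 */
};

/* Split a 32-bit APIC ID across the two address words. */
struct demo_msi_addr demo_msi_encode(uint32_t apic_id)
{
    struct demo_msi_addr a;
    a.lo = 0xfee00000u | ((apic_id & 0xffu) << 12);  /* low 8 bits of the ID */
    a.hi = apic_id & 0xffffff00u;                    /* upper 24 bits of the ID */
    return a;
}

/* Recover the 32-bit APIC ID from the two address words. */
uint32_t demo_msi_decode(struct demo_msi_addr a)
{
    assert((a.hi & 0xffu) == 0);  /* lower 8 bits of the upper word are reserved */
    return (a.hi & 0xffffff00u) | ((a.lo >> 12) & 0xffu);
}
```

The decode side shows why stale data in the upper word matters: any non-zero bits there become part of the destination ID.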
     

16 Jun, 2016

1 commit

  • Allow up to 6 KVM guest KScratch registers to be enabled and accessed
    via the KVM guest register API and from the guest itself (the fallback
    reading and writing of commpage registers is sufficient for KScratch
    registers to work as expected).

    User mode can expose the registers by setting the appropriate bits of
    the guest Config4.KScrExist field. KScratch registers that aren't usable
    won't be writeable via the KVM Ioctl API.

    Signed-off-by: James Hogan
    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Cc: Ralf Baechle
    Cc: linux-mips@linux-mips.org
    Cc: kvm@vger.kernel.org
    Signed-off-by: Paolo Bonzini

    James Hogan
     

10 Jun, 2016

1 commit

  • Let's not provide the device attribute for cmma enabling and clearing
    if the hardware doesn't support it.

    This also helps getting rid of the undocumented return value "-EINVAL"
    in case CMMA is not available when trying to enable it.

    Also properly document the meaning of -EINVAL for CMMA clearing.

    Reviewed-by: Christian Borntraeger
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     

12 May, 2016

1 commit

  • The KVM_MAX_VCPUS define provides the maximum number of vCPUs per guest, and
    also the upper limit for vCPU ids. This is okay for all archs except PowerPC
    which can have higher ids, depending on the cpu/core/thread topology. In the
    worst case (single threaded guest, host with 8 threads per core), it limits
    the maximum number of vCPUS to KVM_MAX_VCPUS / 8.

    This patch separates the vCPU numbering from the total number of vCPUs, with
    the introduction of KVM_MAX_VCPU_ID, as the maximal valid value for vCPU ids
    plus one.

    The corresponding KVM_CAP_MAX_VCPU_ID allows userspace to validate vCPU ids
    before passing them to KVM_CREATE_VCPU.

    This patch only implements KVM_MAX_VCPU_ID with a specific value for PowerPC.
    Other archs continue to return KVM_MAX_VCPUS instead.

    Suggested-by: Radim Krcmar
    Signed-off-by: Greg Kurz
    Reviewed-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini

    Greg Kurz
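
The worst case above can be worked through with a small sketch. The ID formula is an assumption for illustration (IDs spaced by the host's threads per core), and the limit of 1024 in the usage note is a made-up value; userspace would take the real limit from KVM_CAP_MAX_VCPU_ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed numbering: each guest core starts at a multiple of host_threads. */
uint32_t demo_vcpu_id(uint32_t index, uint32_t guest_threads,
                      uint32_t host_threads)
{
    assert(guest_threads > 0 && guest_threads <= host_threads);
    return (index / guest_threads) * host_threads + index % guest_threads;
}

/* max_vcpu_id is the maximal valid ID plus one, so valid IDs are below it. */
bool demo_vcpu_id_ok(uint32_t id, uint32_t max_vcpu_id)
{
    return id < max_vcpu_id;
}
```

With a single-threaded guest on a host with 8 threads per core and a limit of 1024, vCPU index 127 gets ID 1016 and passes, while index 128 gets ID 1024 and fails, so only 1024 / 8 = 128 vCPUs fit.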
     

09 May, 2016

1 commit


09 Mar, 2016

1 commit


04 Mar, 2016

1 commit


03 Mar, 2016

1 commit


02 Mar, 2016

1 commit

  • The existing KVM_CREATE_SPAPR_TCE only supports 32bit windows which is not
    enough for directly mapped windows as the guest can get more than 4GB.

    This adds KVM_CREATE_SPAPR_TCE_64 ioctl and advertises it
    via KVM_CAP_SPAPR_TCE_64 capability. The table size is checked against
    the locked memory limit.

    Since 64bit windows are to support Dynamic DMA windows (DDW), let's add
    @bus_offset and @page_shift which are also required by DDW.

    Signed-off-by: Alexey Kardashevskiy
    Signed-off-by: Paul Mackerras

    Alexey Kardashevskiy
     

01 Mar, 2016

2 commits

  • In some cases userspace needs to get/set attributes specific to a
    vcpu, which calls for something other than ONE_REG.

    Let's copy the KVM_DEVICE approach, and define the respective ioctls
    for the vcpu file descriptor.

    Signed-off-by: Shannon Zhao
    Reviewed-by: Andrew Jones
    Acked-by: Peter Maydell
    Signed-off-by: Marc Zyngier

    Shannon Zhao
     
  • To support guest PMUv3, use one bit of the VCPU INIT feature array.
    Initialize the PMU when initializing the vcpu with that bit and PMU
    overflow interrupt set.

    Signed-off-by: Shannon Zhao
    Acked-by: Peter Maydell
    Reviewed-by: Andrew Jones
    Signed-off-by: Marc Zyngier

    Shannon Zhao
     

17 Feb, 2016

1 commit

  • The patch implements KVM_EXIT_HYPERV userspace exit
    functionality for Hyper-V VMBus hypercalls:
    HV_X64_HCALL_POST_MESSAGE, HV_X64_HCALL_SIGNAL_EVENT.

    Changes v3:
    * use vcpu->arch.complete_userspace_io to setup hypercall
    result

    Changes v2:
    * use KVM_EXIT_HYPERV for hypercalls

    Signed-off-by: Andrey Smetanin
    Reviewed-by: Roman Kagan
    CC: Gleb Natapov
    CC: Paolo Bonzini
    CC: Joerg Roedel
    CC: "K. Y. Srinivasan"
    CC: Haiyang Zhang
    CC: Roman Kagan
    CC: Denis V. Lunev
    CC: qemu-devel@nongnu.org
    Signed-off-by: Paolo Bonzini

    Andrey Smetanin
     

16 Feb, 2016

1 commit

  • This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
    H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
    devices or emulated PCI. These calls allow adding multiple entries
    (up to 512) into the TCE table in one call which saves time on
    transition between kernel and user space.

    The current implementation of kvmppc_h_stuff_tce() allows it to be
    executed in both real and virtual modes so there is one helper.
    The kvmppc_rm_h_put_tce_indirect() needs to translate the guest address
    to the host address and since the translation is different, there are
    2 helpers - one for each mode.

    This implements the KVM_CAP_PPC_MULTITCE capability. When present,
    the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE if these
    are enabled by the userspace via KVM_CAP_PPC_ENABLE_HCALL.
    If they can not be handled by the kernel, they are passed on to
    the user space. The user space still has to have an implementation
    for these.

    Both HV and PR-style KVM are supported.

    Signed-off-by: Alexey Kardashevskiy
    Reviewed-by: David Gibson
    Signed-off-by: Paul Mackerras

    Alexey Kardashevskiy
     

26 Jan, 2016

1 commit


26 Nov, 2015

2 commits

  • A new vcpu exit is introduced to notify the userspace of the
    changes in Hyper-V SynIC configuration triggered by guest writing to the
    corresponding MSRs.

    Changes v4:
    * exit into userspace only if guest writes into SynIC MSR's

    Changes v3:
    * added KVM_EXIT_HYPERV types and structs notes into docs

    Signed-off-by: Andrey Smetanin
    Reviewed-by: Roman Kagan
    Signed-off-by: Denis V. Lunev
    CC: Gleb Natapov
    CC: Paolo Bonzini
    CC: Roman Kagan
    CC: Denis V. Lunev
    CC: qemu-devel@nongnu.org
    Signed-off-by: Paolo Bonzini

    Andrey Smetanin
     
  • SynIC (synthetic interrupt controller) is a lapic extension,
    which is controlled via MSRs and maintains for each vCPU
    - 16 synthetic interrupt "lines" (SINT's); each can be configured to
    trigger a specific interrupt vector optionally with auto-EOI
    semantics
    - a message page in the guest memory with 16 256-byte per-SINT message
    slots
    - an event flag page in the guest memory with 16 2048-bit per-SINT
    event flag areas

    The host triggers a SINT whenever it delivers a new message to the
    corresponding slot or flips an event flag bit in the corresponding area.
    The guest informs the host that it can try delivering a message by
    explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
    MSR.

    The userspace (qemu) triggers interrupts and receives EOM notifications
    via irqfd with resampler; for that, a GSI is allocated for each
    configured SINT, and irq_routing api is extended to support GSI-SINT
    mapping.

    Changes v4:
    * added activation of SynIC by vcpu KVM_ENABLE_CAP
    * added per SynIC active flag
    * added deactivation of APICv upon SynIC activation

    Changes v3:
    * added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into
    docs

    Changes v2:
    * do not use posted interrupts for Hyper-V SynIC AutoEOI vectors
    * add Hyper-V SynIC vectors into EOI exit bitmap
    * Hyper-V SynIC SINT msr write logic simplified

    Signed-off-by: Andrey Smetanin
    Reviewed-by: Roman Kagan
    Signed-off-by: Denis V. Lunev
    CC: Gleb Natapov
    CC: Paolo Bonzini
    CC: Roman Kagan
    CC: Denis V. Lunev
    CC: qemu-devel@nongnu.org
    Signed-off-by: Paolo Bonzini

    Andrey Smetanin
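
The sizes listed above imply that both per-vCPU pages are exactly 4096 bytes: 16 slots of 256 bytes, and 16 areas of 2048 bits (256 bytes) each. The sketch below computes offsets into those pages. All names are hypothetical, and numbering the bits of a byte from the least significant end is an assumption.

```c
#include <assert.h>
#include <stddef.h>

enum {
    DEMO_SINT_COUNT       = 16,    /* synthetic interrupt lines per vCPU */
    DEMO_MSG_SLOT_BYTES   = 256,   /* one message slot per SINT */
    DEMO_EVENT_FLAGS_BITS = 2048,  /* one event flag area per SINT */
};

/* Byte offset of a SINT's message slot within the message page. */
size_t demo_msg_slot_offset(unsigned sint)
{
    assert(sint < DEMO_SINT_COUNT);
    return (size_t)sint * DEMO_MSG_SLOT_BYTES;
}

/* Byte offset within the event flag page that holds a given flag. */
size_t demo_event_flag_byte(unsigned sint, unsigned flag)
{
    assert(sint < DEMO_SINT_COUNT && flag < DEMO_EVENT_FLAGS_BITS);
    return (size_t)sint * (DEMO_EVENT_FLAGS_BITS / 8) + flag / 8;
}

/* Bit position of the flag inside that byte (assumed LSB-first). */
unsigned demo_event_flag_bit(unsigned flag)
{
    return flag % 8;
}
```

The last flag of the last SINT lands in byte 4095, the final byte of a 4 KiB page.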
     

06 Nov, 2015

1 commit

  • Pull KVM updates from Paolo Bonzini:
    "First batch of KVM changes for 4.4.

    s390:
    A bunch of fixes and optimizations for interrupt and time handling.

    PPC:
    Mostly bug fixes.

    ARM:
    No big features, but many small fixes and prerequisites including:

    - a number of fixes for the arch-timer

    - introducing proper level-triggered semantics for the arch-timers

    - a series of patches to synchronously halt a guest (prerequisite
    for IRQ forwarding)

    - some tracepoint improvements

    - a tweak for the EL2 panic handlers

    - some more VGIC cleanups getting rid of redundant state

    x86:
    Quite a few changes:

    - support for VT-d posted interrupts (i.e. PCI devices can inject
    interrupts directly into vCPUs). This introduces a new
    component (in virt/lib/) that connects VFIO and KVM together.
    The same infrastructure will be used for ARM interrupt
    forwarding as well.

    - more Hyper-V features, though the main one Hyper-V synthetic
    interrupt controller will have to wait for 4.5. These will let
    KVM expose Hyper-V devices.

    - nested virtualization now supports VPID (same as PCID but for
    vCPUs) which makes it quite a bit faster

    - for future hardware that supports NVDIMM, there is support for
    clflushopt, clwb, pcommit

    - support for "split irqchip", i.e. LAPIC in kernel +
    IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
    the hypervisor

    - obligatory smattering of SMM fixes

    - on the guest side, stable scheduler clock support was rewritten
    to not require help from the hypervisor"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
    KVM: VMX: Fix commit which broke PML
    KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
    KVM: x86: allow RSM from 64-bit mode
    KVM: VMX: fix SMEP and SMAP without EPT
    KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
    KVM: device assignment: remove pointless #ifdefs
    KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
    KVM: x86: zero apic_arb_prio on reset
    drivers/hv: share Hyper-V SynIC constants with userspace
    KVM: x86: handle SMBASE as physical address in RSM
    KVM: x86: add read_phys to x86_emulate_ops
    KVM: x86: removing unused variable
    KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
    KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
    KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
    KVM: arm/arm64: Optimize away redundant LR tracking
    KVM: s390: use simple switch statement as multiplexer
    KVM: s390: drop useless newline in debugging data
    KVM: s390: SCA must not cross page boundaries
    KVM: arm: Do not indent the arguments of DECLARE_BITMAP
    ...

    Linus Torvalds
     

12 Oct, 2015

1 commit


01 Oct, 2015

5 commits

  • Cc: Gleb Natapov
    Cc: Paolo Bonzini
    Signed-off-by: Jason Wang
    Signed-off-by: Paolo Bonzini

    Jason Wang
     
  • In order to enable userspace PIC support, the userspace PIC needs to
    be able to inject local interrupts even when the APICs are in the
    kernel.

    KVM_INTERRUPT now supports sending local interrupts to an APIC when
    APICs are in the kernel.

    The ready_for_interrupt_request flag is now only set when the CPU/APIC
    will immediately accept and inject an interrupt (i.e. APIC has not
    masked the PIC).

    When the PIC wishes to initiate an INTA cycle with, say, CPU0, it
    kicks CPU0 out of the guest, and rendezvous with CPU0 once it arrives
    in userspace.

    When the CPU/APIC unmasks the PIC, a KVM_EXIT_IRQ_WINDOW_OPEN is
    triggered, so that userspace has a chance to inject a PIC interrupt
    if it had been pending.

    Overall, this design can lead to a small number of spurious userspace
    rendezvous, in particular whenever the PIC transitions from low to
    high while it is masked and whenever the PIC becomes unmasked while
    it is low.

    Note: this does not buffer more than one local interrupt in the
    kernel, so the VMM needs to enter the guest in order to complete
    interrupt injection before injecting an additional interrupt.

    Compiles for x86.

    Can pass the KVM Unit Tests.

    Signed-off-by: Steve Rutherford
    Signed-off-by: Paolo Bonzini

    Steve Rutherford
     
  • In order to support a userspace IOAPIC interacting with an in kernel
    APIC, the EOI exit bitmaps need to be configurable.

    If the IOAPIC is in userspace (i.e. the irqchip has been split), the
    EOI exit bitmaps will be set whenever the GSI Routes are configured.
    In particular, the low MSI routes are reservable for userspace
    IOAPICs. For these MSI routes, the EOI Exit bit corresponding to the
    destination vector of the route will be set for the destination VCPU.

    The intention is for the userspace IOAPICs to use the reservable MSI
    routes to inject interrupts into the guest.

    This is a slight abuse of the notion of an MSI Route, given that MSIs
    classically bypass the IOAPIC. It might be worthwhile to add an
    additional route type to improve clarity.

    Compile tested for Intel x86.

    Signed-off-by: Steve Rutherford
    Signed-off-by: Paolo Bonzini

    Steve Rutherford
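
Setting the EOI exit bit for a route's destination vector can be sketched with a 256-bit per-VCPU bitmap, one bit per interrupt vector. `struct demo_eoi_bitmap` and the two helpers are hypothetical and do not mirror KVM's internal data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VCPU bitmap: one bit for each of the 256 vectors. */
struct demo_eoi_bitmap {
    uint64_t bits[4];
};

/* Request an exit when the guest signals EOI for `vector`. */
void demo_eoi_set(struct demo_eoi_bitmap *b, unsigned vector)
{
    assert(vector < 256);
    b->bits[vector / 64] |= UINT64_C(1) << (vector % 64);
}

/* Check whether an EOI for `vector` has to be reported. */
bool demo_eoi_test(const struct demo_eoi_bitmap *b, unsigned vector)
{
    assert(vector < 256);
    return (b->bits[vector / 64] >> (vector % 64)) & 1;
}
```

Reconfiguring the GSI routes would then amount to clearing the bitmap and setting one bit per reserved MSI route that targets the VCPU.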
     
  • Adds KVM_EXIT_IOAPIC_EOI which allows the kernel to EOI
    level-triggered IOAPIC interrupts.

    Uses a per VCPU exit bitmap to decide whether or not the IOAPIC needs
    to be informed (which is identical to the EOI_EXIT_BITMAP field used
    by modern x86 processors, but can also be used to elide kvm IOAPIC EOI
    exits on older processors).

    [Note: A prototype using ResampleFDs found that decoupling the EOI
    from the VCPU's thread made it possible for the VCPU to not see a
    recent EOI after reentering the guest. This does not match real
    hardware.]

    Compile tested for Intel x86.

    Signed-off-by: Steve Rutherford
    Signed-off-by: Paolo Bonzini

    Steve Rutherford
     
  • First patch in a series which enables the relocation of the
    PIC/IOAPIC to userspace.

    Adds capability KVM_CAP_SPLIT_IRQCHIP;

    KVM_CAP_SPLIT_IRQCHIP enables the construction of LAPICs without the
    rest of the irqchip.

    Compile tested for x86.

    Signed-off-by: Steve Rutherford
    Suggested-by: Andrew Honig
    Signed-off-by: Paolo Bonzini

    Steve Rutherford
     

23 Aug, 2015

1 commit


23 Jul, 2015

1 commit

  • The notification is sent by exiting the vcpu to user space
    if KVM_REQ_HV_CRASH is enabled for the vcpu. On exit to user space,
    the kvm_run structure contains a system_event of type
    KVM_SYSTEM_EVENT_CRASH to signal that a guest crash occurred.

    Signed-off-by: Andrey Smetanin
    Signed-off-by: Denis V. Lunev
    Reviewed-by: Peter Hornyack
    CC: Paolo Bonzini
    CC: Gleb Natapov
    Signed-off-by: Paolo Bonzini

    Andrey Smetanin
     

21 Jul, 2015

3 commits

  • Finally advertise the KVM capability for SET_GUEST_DEBUG. Once arm
    support is added this check can be moved to the common
    kvm_vm_ioctl_check_extension() code.

    Signed-off-by: Alex Bennée
    Acked-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Alex Bennée
     
  • This adds support for SW breakpoints inserted by userspace.

    We do this by trapping all guest software debug exceptions to the
    hypervisor (MDCR_EL2.TDE). The exit handler sets an exit reason of
    KVM_EXIT_DEBUG with the kvm_debug_exit_arch structure holding the
    exception syndrome information.

    It will be up to userspace to extract the PC (via GET_ONE_REG) and
    determine if the debug event was for a breakpoint it inserted. If not
    userspace will need to re-inject the correct exception and restart the
    hypervisor to deliver the debug exception to the guest.

    Any other guest software debug exception (e.g. single step or HW
    assisted breakpoints) will cause an error and the VM to be killed. This
    is addressed by later patches which add support for the other debug
    types.

    Signed-off-by: Alex Bennée
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Alex Bennée
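
The userspace side of this check can be sketched as a lookup of the reported PC in the debugger's own breakpoint list. `demo_is_our_breakpoint` and the list are hypothetical; reading the PC via GET_ONE_REG and re-injecting the exception are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if `pc` matches a breakpoint that userspace inserted. */
bool demo_is_our_breakpoint(uint64_t pc, const uint64_t *bps, size_t n)
{
    assert(bps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (bps[i] == pc)
            return true;
    }
    return false;  /* not ours: the exception belongs to the guest */
}
```

A linear scan is enough for a handful of breakpoints. A debugger tracking many of them would likely use a hash table or a sorted array instead.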
     
  • This commit adds a stub function to support the KVM_SET_GUEST_DEBUG
    ioctl. Any unsupported flag will return -EINVAL. For now, only
    KVM_GUESTDBG_ENABLE is supported, although it won't have any effects.

    Signed-off-by: Alex Bennée
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Alex Bennée
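
The flag check described above can be sketched as a mask test. The constant names are local to this example, and the value 0x1 is assumed to mirror KVM_GUESTDBG_ENABLE. The function is not the kernel's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_GUESTDBG_ENABLE     0x1u  /* assumed to mirror KVM_GUESTDBG_ENABLE */
#define DEMO_GUESTDBG_VALID_MASK DEMO_GUESTDBG_ENABLE

/* Accept only supported flags; anything else is rejected with -EINVAL. */
int demo_check_guest_debug_flags(uint32_t control)
{
    if (control & ~DEMO_GUESTDBG_VALID_MASK)
        return -EINVAL;
    return 0;
}
```

Later debug types could be supported by widening the valid mask instead of adding a new test for each flag.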