04 Aug, 2016

2 commits


23 Jul, 2016

4 commits

  • KVM/ARM changes for Linux 4.8

    - GICv3 ITS emulation
    - Simpler idmap management that fixes potential TLB conflicts
    - Honor the kernel protection in HYP mode
    - Removal of the old vgic implementation

    Radim Krčmář
     
  • Up to now, only irqchip routing entries could be set. This patch
    adds the capability to insert MSI routing entries.

    For ARM64, let's also increase KVM_MAX_IRQ_ROUTES to 4096: this
    includes SPI irqchip routes plus MSI routes. In the future this
    might be extended.

    Signed-off-by: Eric Auger
    Reviewed-by: Andre Przywara
    Signed-off-by: Marc Zyngier

    Eric Auger
     
  • This patch adds compilation and link against irqchip.

    Main motivation behind using irqchip code is to enable MSI
    routing code. In the future irqchip routing may also be useful
    when targeting multiple irqchips.

    The standard routing callbacks are now implemented in vgic-irqfd:
    - kvm_set_routing_entry
    - kvm_set_irq
    - kvm_set_msi

    They are only supported with the new_vgic code.

    Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined.
    KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed.

    So from now on IRQCHIP routing is enabled and a routing table entry
    must exist for irqfd injection to succeed for a given SPI. This patch
    builds a default flat irqchip routing table (gsi=irqchip.pin) covering
    all the VGIC SPI indexes. This routing table is overwritten by the
    first user-space call to the KVM_SET_GSI_ROUTING ioctl.

    MSI routing setup is not yet allowed.

    Signed-off-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Eric Auger
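
    The default flat table described above (gsi=irqchip.pin) can be
    pictured with a minimal sketch. The struct and function names below are
    hypothetical stand-ins, not the kernel's types (the real UAPI type is
    struct kvm_irq_routing_entry), and a single irqchip with index 0 is an
    assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified routing entry used only for this sketch. */
struct route_entry {
    uint32_t gsi;      /* global system interrupt number */
    uint32_t irqchip;  /* irqchip index (assumed: a single VGIC, index 0) */
    uint32_t pin;      /* SPI index on that irqchip */
};

/*
 * Fill `table` with an identity mapping (gsi == pin) for up to nr_spis
 * SPIs, bounded by the capacity of the table.
 * Returns the number of entries written.
 */
size_t build_flat_routing(struct route_entry *table, size_t capacity,
                          uint32_t nr_spis)
{
    assert(table != NULL);

    size_t n = nr_spis < capacity ? nr_spis : capacity;

    for (size_t i = 0; i < n; i++) {
        table[i].gsi = (uint32_t)i;
        table[i].irqchip = 0;
        table[i].pin = (uint32_t)i;
    }
    return n;
}
```

    With such a table in place, injecting through an irqfd bound to gsi N
    reaches SPI N until user space installs its own table.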
     
  • On ARM, the MSI msg (address and data) comes along with
    out-of-band device ID information. The device ID encodes the
    device that writes the MSI msg. Let's convey the device id in
    kvm_irq_routing_msi and use KVM_MSI_VALID_DEVID flag value in
    kvm_irq_routing_entry to indicate the msi devid is populated.

    Signed-off-by: Eric Auger
    Reviewed-by: Andre Przywara
    Acked-by: Radim Krčmář
    Signed-off-by: Marc Zyngier

    Eric Auger
     

19 Jul, 2016

3 commits

  • Now that all ITS emulation functionality is in place, we advertise
    MSI functionality to userland and also the ITS device to the guest - if
    userland has configured that.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • Introduce a new KVM device that represents an ARM Interrupt Translation
    Service (ITS) controller. Since there can be multiple of these per guest,
    we can't piggyback on the existing GICv3 distributor device, but create
    a new type of KVM device.
    On the KVM_CREATE_DEVICE ioctl we allocate and initialize the ITS data
    structure and store the pointer in the kvm_device data.
    Upon an explicit init ioctl from userland (after having set up the MMIO
    address) we register the handlers with the kvm_io_bus framework.
    Any reference to an ITS thus has to go via this interface.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • The ARM GICv3 ITS MSI controller requires a device ID to be able to
    assign the proper interrupt vector. On real hardware, this ID is
    sampled from the bus. To be able to emulate an ITS controller, extend
    the KVM MSI interface to let userspace provide such a device ID. For
    PCI devices, the device ID is simply the 16-bit bus-device-function
    triplet, which should be easily available to the userland tool.

    Also there is a new KVM capability which advertises whether the
    current VM requires a device ID to be set along with the MSI data.
    This flag is still reported as not available everywhere; we will
    enable it later, when ITS emulation is used.

    Signed-off-by: Andre Przywara
    Reviewed-by: Eric Auger
    Reviewed-by: Marc Zyngier
    Acked-by: Christoffer Dall
    Acked-by: Paolo Bonzini
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
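
    For the PCI case mentioned above, the 16-bit device ID follows the
    standard bus/device/function packing (8 bits of bus, 5 bits of device,
    3 bits of function). A minimal sketch; the helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack a PCI bus/device/function triplet into a 16-bit ID:
 * bits 15:8 = bus, bits 7:3 = device, bits 2:0 = function.
 */
uint16_t pci_devid(uint8_t bus, uint8_t dev, uint8_t fn)
{
    assert(dev < 32 && fn < 8);  /* device is 5 bits, function is 3 bits */
    return (uint16_t)((bus << 8) | (dev << 3) | fn);
}
```

    For example, device 00:03.0 yields 0x0018, which userland would pass
    as the device ID alongside the MSI address and data.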
     

18 Jul, 2016

1 commit

  • We will use illegal instruction 0x0000 for handling 2 byte sw breakpoints
    from user space. As it can be enabled dynamically via a capability,
    let's move setting of ICTL_OPEREXC to the post creation step, so we avoid
    any races when enabling that capability just while adding new cpus.

    Acked-by: Janosch Frank
    Reviewed-by: Cornelia Huck
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     

14 Jul, 2016

2 commits

  • Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK as a feature flag to
    KVM_CAP_X2APIC_API.

    The quirk made KVM interpret 0xff as a broadcast even in x2APIC mode.
    The enableable capability is needed in order to support standard x2APIC and
    remain backward compatible.

    Signed-off-by: Radim Krčmář
    [Expand kvm_apic_mda comment. - Paolo]
    Signed-off-by: Paolo Bonzini

    Radim Krčmář
     
  • KVM_CAP_X2APIC_API is a capability for features related to x2APIC
    enablement. KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to
    extend APIC ID in get/set ioctl and MSI addresses to 32 bits.
    Both are needed to support x2APIC.

    The feature has to be enableable and disabled by default, because
    get/set ioctl shifted and truncated APIC ID to 8 bits by using a
    non-standard protocol inspired by xAPIC and the change is not
    backward-compatible.

    Changes to MSI addresses follow the format used by interrupt remapping
    unit. The upper address word, that used to be 0, contains upper 24 bits
    of the LAPIC address in its upper 24 bits. Lower 8 bits are reserved as
    0. Using the upper address word is not backward-compatible either as we
    didn't check that userspace zeroed the word. Reserved bits are still
    not explicitly checked, but non-zero data will affect LAPIC addresses,
    which will cause a bug.

    Signed-off-by: Radim Krčmář
    Signed-off-by: Paolo Bonzini

    Radim Krčmář
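
    The MSI address layout described above can be sketched as follows.
    The struct and function names are hypothetical; the only facts relied
    on are the layout given in the message (upper 24 bits of the ID in
    bits 31:8 of the upper word, bits 7:0 reserved as zero) and the
    classic x86 MSI format, where the low 8 bits of the destination ID sit
    in bits 19:12 of the lower address word.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_ADDR_BASE 0xfee00000u  /* x86 MSI address window */

/* Hypothetical pair of address words, for illustration only. */
struct msi_addr {
    uint32_t lo;
    uint32_t hi;
};

/* Split a 32-bit APIC ID across the lower and upper MSI address words. */
struct msi_addr msi_addr_for_apic_id(uint32_t apic_id)
{
    struct msi_addr a;

    /* Low 8 bits of the ID: classic destination field, bits 19:12. */
    a.lo = MSI_ADDR_BASE | ((apic_id & 0xffu) << 12);
    /* Upper 24 bits of the ID stay in bits 31:8; bits 7:0 remain zero. */
    a.hi = apic_id & 0xffffff00u;
    return a;
}

/* User space should keep the reserved low byte of the upper word zero. */
int msi_addr_hi_reserved_ok(uint32_t hi)
{
    return (hi & 0xffu) == 0;
}
```

    Because the reserved bits are not checked by KVM, a check like
    msi_addr_hi_reserved_ok() on the user-space side is one way to avoid
    the misrouted interrupts the message warns about.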
     

16 Jun, 2016

1 commit

  • Allow up to 6 KVM guest KScratch registers to be enabled and accessed
    via the KVM guest register API and from the guest itself (the fallback
    reading and writing of commpage registers is sufficient for KScratch
    registers to work as expected).

    User mode can expose the registers by setting the appropriate bits of
    the guest Config4.KScrExist field. KScratch registers that aren't usable
    won't be writeable via the KVM Ioctl API.

    Signed-off-by: James Hogan
    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Cc: Ralf Baechle
    Cc: linux-mips@linux-mips.org
    Cc: kvm@vger.kernel.org
    Signed-off-by: Paolo Bonzini

    James Hogan
     

15 Jun, 2016

1 commit

  • …/kvms390/linux into HEAD

    KVM: s390: Features and fixes for 4.8 part1

    Four bigger things:
    1. The implementation of the STHYI opcode in the kernel. This is used
    in libraries like qclib [1] to provide enough information for a
    capacity and usage based software licence pricing. The STHYI content
    is defined by the related z/VM documentation [2]. Its data can be
    composed by accessing several other interfaces provided by LPAR or
    the machine. This information is partially sensitive or root-only
    so the kernel does the necessary filtering.
    2. Preparation for nested virtualization (VSIE). KVM should query the
    proper sclp interfaces for the availability of some features before
    using it. In the past we have been sloppy and simply assumed that
    several features are available. With this we should be able to handle
    most cases of a missing feature.
    3. CPU model interfaces extended by some additional features that are
    not covered by a facility bit in STFLE. For example all the crypto
    instructions of the coprocessor provide a query function. As reality
    tends to be more complex (e.g. export regulations might block some
    algorithms) we have to provide additional interfaces to query or
    set these non-stfle features.
    4. Several fixes and changes detected and fixed when doing 1-3.

    All features change base s390 code. All relevant patches have an ACK
    from the s390 or component maintainers.

    The next pull request for 4.8 (part2) will contain the implementation
    of VSIE.

    [1] http://www.ibm.com/developerworks/linux/linux390/qclib.html
    [2] https://www.ibm.com/support/knowledgecenter/SSB27U_6.3.0/com.ibm.zvm.v630.hcpb4/hcpb4sth.htm

    Paolo Bonzini
     

14 Jun, 2016

1 commit


10 Jun, 2016

3 commits

  • Let's not provide the device attribute for cmma enabling and clearing
    if the hardware doesn't support it.

    This also helps getting rid of the undocumented return value "-EINVAL"
    in case CMMA is not available when trying to enable it.

    Also properly document the meaning of -EINVAL for CMMA clearing.

    Reviewed-by: Christian Borntraeger
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     
  • We have certain instructions that indicate available subfunctions via
    a query subfunction (crypto functions and ptff), or via a test bit
    function (plo).

    By exposing these "subfunction blocks" to user space, we allow user space
    to
    1) query available subfunctions and make sure subfunctions won't get lost
    during migration - e.g. properly indicate them via a CPU model
    2) change the subfunctions to be reported to the guest (even adding
    unavailable ones)

    This mechanism works just like the way we indicate the stfl(e) list to
    user space.

    This way, user space could even emulate some subfunctions in QEMU in the
    future. If this is ever applicable, we have to make sure later on, that
    unsupported subfunctions result in an intercept to QEMU.

    Please note that support to indicate them to the guest is still missing
    and requires hardware support. Usually, the IBC already takes care of these
    subfunctions for migration safety. QEMU should make sure to always set
    these bits properly according to the machine generation to be emulated.

    Available subfunctions are only valid in combination with STFLE bits
    retrieved via KVM_S390_VM_CPU_MACHINE and enabled via
    KVM_S390_VM_CPU_PROCESSOR. If the applicable bits are available, the
    indicated subfunctions are guaranteed to be correct.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     
  • For now, we only have an interface to query and configure facilities
    indicated via STFL(E). However, we also have features indicated via
    SCLP, that have to be indicated to the guest by user space and usually
    require KVM support.

    This patch allows user space to query and configure available cpu features
    for the guest.

    Please note that disabling a feature doesn't necessarily mean that it is
    completely disabled (e.g. ESOP is mostly handled by the SIE). We will try
    our best to disable it.

    Most features (e.g. SCLP) can't be forwarded directly, as most of them
    need support in KVM in addition to hardware support. As we later on want to
    turn these features in KVM explicitly on/off (to simulate different
    behavior), we have to filter all features provided by the hardware and
    make them configurable.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     

12 May, 2016

1 commit

  • The KVM_MAX_VCPUS define provides the maximum number of vCPUs per guest, and
    also the upper limit for vCPU ids. This is okay for all archs except PowerPC
    which can have higher ids, depending on the cpu/core/thread topology. In the
    worst case (single threaded guest, host with 8 threads per core), it limits
    the maximum number of vCPUS to KVM_MAX_VCPUS / 8.

    This patch separates the vCPU numbering from the total number of vCPUs, with
    the introduction of KVM_MAX_VCPU_ID, as the maximal valid value for vCPU ids
    plus one.

    The corresponding KVM_CAP_MAX_VCPU_ID allows userspace to validate vCPU ids
    before passing them to KVM_CREATE_VCPU.

    This patch only implements KVM_MAX_VCPU_ID with a specific value for PowerPC.
    Other archs continue to return KVM_MAX_VCPUS instead.

    Suggested-by: Radim Krcmar
    Signed-off-by: Greg Kurz
    Reviewed-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini

    Greg Kurz
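
    The worst-case arithmetic above can be sketched as follows. The
    numbering scheme (each guest core starting at a multiple of the host's
    threads per core) is an assumption made for illustration, and all
    function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed numbering: guest core c gets ids starting at c * host_threads,
 * and only the first guest_threads ids of each core are used.
 */
uint32_t vcpu_id_for(uint32_t index, uint32_t guest_threads,
                     uint32_t host_threads)
{
    assert(guest_threads > 0 && guest_threads <= host_threads);
    return (index / guest_threads) * host_threads + index % guest_threads;
}

/* The limit is "maximal valid id plus one", so valid ids are below it. */
int vcpu_id_is_valid(uint32_t id, uint32_t max_vcpu_id)
{
    return id < max_vcpu_id;
}

/* How many vCPUs fit if every id must stay below id_limit. */
uint32_t max_vcpus_when_ids_capped(uint32_t id_limit, uint32_t guest_threads,
                                   uint32_t host_threads)
{
    assert(guest_threads > 0 && guest_threads <= host_threads);

    uint32_t full_cores = id_limit / host_threads;
    uint32_t rest = id_limit % host_threads;
    uint32_t partial = rest < guest_threads ? rest : guest_threads;

    return full_cores * guest_threads + partial;
}
```

    With an id limit of 1024, one guest thread per core and eight host
    threads per core, max_vcpus_when_ids_capped() gives 128, matching the
    KVM_MAX_VCPUS / 8 figure quoted in the message.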
     

10 May, 2016

1 commit


09 May, 2016

1 commit


25 Apr, 2016

1 commit


20 Apr, 2016

2 commits

  • Introduce a FLIC operation for clearing I/O interrupts for a subchannel.

    Rationale: According to the platform specification, pending I/O
    interruption requests have to be revoked in certain situations. For
    instance, according to the Principles of Operation (page 17-27), a
    subchannel put into the installed parameters initialized state is in the
    same state as after an I/O system reset (just parameters possibly changed).
    This implies that any I/O interrupts for that subchannel are no longer
    pending (as I/O system resets clear I/O interrupts). Therefore, we need an
    interface to clear pending I/O interrupts.

    Signed-off-by: Halil Pasic
    Reviewed-by: Cornelia Huck
    Reviewed-by: Christian Borntraeger
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Cornelia Huck

    Halil Pasic
     
  • FLIC behavior deviates from the API documentation in reporting EINVAL
    instead of ENXIO for KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR when the group
    or attribute is unknown/unsupported. Unfortunately this can not be fixed
    for historical reasons. Let us at least have it documented.

    Signed-off-by: Halil Pasic
    Reviewed-by: Cornelia Huck
    Reviewed-by: Christian Borntraeger
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Cornelia Huck

    Halil Pasic
     

17 Mar, 2016

1 commit

  • Pull KVM updates from Paolo Bonzini:
    "One of the largest releases for KVM... Hardly any generic
    changes, but lots of architecture-specific updates.

    ARM:
    - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
    - PMU support for guests
    - 32bit world switch rewritten in C
    - various optimizations to the vgic save/restore code.

    PPC:
    - enabled KVM-VFIO integration ("VFIO device")
    - optimizations to speed up IPIs between vcpus
    - in-kernel handling of IOMMU hypercalls
    - support for dynamic DMA windows (DDW).

    s390:
    - provide the floating point registers via sync regs;
    - separated instruction vs. data accesses
    - dirty log improvements for huge guests
    - bugfixes and documentation improvements.

    x86:
    - Hyper-V VMBus hypercall userspace exit
    - alternative implementation of lowest-priority interrupts using
    vector hashing (for better VT-d posted interrupt support)
    - fixed guest debugging with nested virtualizations
    - improved interrupt tracking in the in-kernel IOAPIC
    - generic infrastructure for tracking writes to guest
    memory - currently its only use is to speedup the legacy shadow
    paging (pre-EPT) case, but in the future it will be used for
    virtual GPUs as well
    - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits)
    KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch
    KVM: x86: disable MPX if host did not enable MPX XSAVE features
    arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit
    arm64: KVM: vgic-v3: Reset LRs at boot time
    arm64: KVM: vgic-v3: Do not save an LR known to be empty
    arm64: KVM: vgic-v3: Save maintenance interrupt state only if required
    arm64: KVM: vgic-v3: Avoid accessing ICH registers
    KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit
    KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit
    KVM: arm/arm64: vgic-v2: Reset LRs at boot time
    KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty
    KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function
    KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required
    KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers
    KVM: s390: allocate only one DMA page per VM
    KVM: s390: enable STFLE interpretation only if enabled for the guest
    KVM: s390: wake up when the VCPU cpu timer expires
    KVM: s390: step the VCPU timer while in enabled wait
    KVM: s390: protect VCPU cpu timer with a seqcount
    KVM: s390: step VCPU cpu timer during kvm_run ioctl
    ...

    Linus Torvalds
     

10 Mar, 2016

1 commit

  • Yes, all of these are needed. :) This is admittedly a bit odd, but
    kvm-unit-tests access.flat tests this if you run it with "-cpu host"
    and of course ept=0.

    KVM runs the guest with CR0.WP=1, so it must handle supervisor writes
    specially when pte.u=1/pte.w=0/CR0.WP=0. Such writes cause a fault
    when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0.
    When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and
    restarts execution. This will still cause a user write to fault, while
    supervisor writes will succeed. User reads will fault spuriously now,
    and KVM will then flip U and W again in the SPTE (U=1, W=0). User reads
    will be enabled and supervisor writes disabled, going back to the
    original situation where supervisor writes fault spuriously.

    When SMEP is in effect, however, U=0 will enable kernel execution of
    this page. To avoid this, KVM also sets NX=1 in the shadow PTE together
    with U=0. If the guest has not enabled NX, the result is a continuous
    stream of page faults due to the NX bit being reserved.

    The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
    switch. (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry
    control, so they do not use user-return notifiers for EFER---if they did,
    EFER.NX would be forced to the same value as the host).

    There is another bug in the reserved bit check, which I've split to a
    separate patch for easier application to stable kernels.

    Cc: stable@vger.kernel.org
    Cc: Andy Lutomirski
    Reviewed-by: Xiao Guangrong
    Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
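
    The U/W flipping described above can be modelled with a toy state
    machine. This is a sketch of the reasoning in the message, not KVM's
    shadow MMU code: the types and function names are hypothetical, the
    guest page is assumed to be executable (NX clear in the guest PTE), and
    only the three access kinds discussed in the message are modelled.

```c
#include <assert.h>
#include <stdbool.h>

enum access { SUPERVISOR_WRITE, USER_READ, USER_WRITE };

/* Toy shadow PTE: only the bits the commit message talks about. */
struct spte {
    bool u;   /* user-accessible */
    bool w;   /* writable */
    bool nx;  /* no-execute */
};

/* Hardware check as seen by the CPU, which runs the guest with CR0.WP=1. */
bool access_allowed(const struct spte *s, enum access a)
{
    switch (a) {
    case SUPERVISOR_WRITE: return s->w;
    case USER_READ:        return s->u;
    case USER_WRITE:       return s->u && s->w;
    }
    return false;
}

/*
 * Emulate guest pte.u=1/pte.w=0 with guest CR0.WP=0.
 * Returns true if the access succeeds (possibly after a spurious fault
 * that flips U and W), false if the fault belongs to the guest.
 */
bool handle_access(struct spte *s, enum access a, bool smep,
                   int *spurious_faults)
{
    if (access_allowed(s, a))
        return true;
    if (a == USER_WRITE)
        return false;  /* guest pte.w=0: a genuine fault for user mode */

    (*spurious_faults)++;
    if (a == SUPERVISOR_WRITE) {
        s->u = false;
        s->w = true;
        s->nx = smep;  /* with SMEP, U=0 must not make the page kernel-executable */
    } else {
        s->u = true;
        s->w = false;
        s->nx = false; /* assumed: the guest page itself is executable */
    }
    return access_allowed(s, a);
}
```

    In this model, NX=1 is only meaningful if EFER.NX is set while the
    guest runs, which is the condition the fix enforces.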
     

09 Mar, 2016

1 commit


04 Mar, 2016

1 commit


03 Mar, 2016

2 commits

    kvm_lpage_info->write_count is used to detect whether the large page
    mapping for the gfn on the specified level is allowed. Rename it to
    disallow_lpage to reflect its purpose, and also rename
    has_wrprotected_page() to mmu_gfn_lpage_is_disallowed() to make the
    code clearer.

    Later we will extend this mechanism for page tracking: if the gfn is
    tracked then large mapping for that gfn on any level is not allowed.
    The new name is more straightforward.

    Reviewed-by: Paolo Bonzini
    Signed-off-by: Xiao Guangrong
    Signed-off-by: Paolo Bonzini

    Xiao Guangrong
     
  • …lus/powerpc into HEAD

    The highlights are:

    * Enable VFIO device on PowerPC, from David Gibson
    * Optimizations to speed up IPIs between vcpus in HV KVM,
    from Suresh Warrier (who is also Suresh E. Warrier)
    * In-kernel handling of IOMMU hypercalls, and support for dynamic DMA
    windows (DDW), from Alexey Kardashevskiy.

    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

    Paolo Bonzini
     

02 Mar, 2016

1 commit

  • The existing KVM_CREATE_SPAPR_TCE only supports 32bit windows which is not
    enough for directly mapped windows as the guest can get more than 4GB.

    This adds KVM_CREATE_SPAPR_TCE_64 ioctl and advertises it
    via KVM_CAP_SPAPR_TCE_64 capability. The table size is checked against
    the locked memory limit.

    Since 64bit windows are to support Dynamic DMA windows (DDW), let's add
    @bus_offset and @page_shift which are also required by DDW.

    Signed-off-by: Alexey Kardashevskiy
    Signed-off-by: Paul Mackerras

    Alexey Kardashevskiy
     

01 Mar, 2016

3 commits

  • To configure the virtual PMUv3 overflow interrupt number, we use the
    vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_PMU_V3_IRQ
    attribute within the KVM_ARM_VCPU_PMU_V3_CTRL group.

    After configuring the PMUv3, call the vcpu ioctl with attribute
    KVM_ARM_VCPU_PMU_V3_INIT to initialize the PMUv3.

    Signed-off-by: Shannon Zhao
    Acked-by: Peter Maydell
    Reviewed-by: Andrew Jones
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Shannon Zhao
     
    In some cases userspace needs to get/set attributes specific to a vcpu,
    and so needs something other than ONE_REG.

    Let's copy the KVM_DEVICE approach, and define the respective ioctls
    for the vcpu file descriptor.

    Signed-off-by: Shannon Zhao
    Reviewed-by: Andrew Jones
    Acked-by: Peter Maydell
    Signed-off-by: Marc Zyngier

    Shannon Zhao
     
  • To support guest PMUv3, use one bit of the VCPU INIT feature array.
    Initialize the PMU when initializing the vcpu with that bit and PMU
    overflow interrupt set.

    Signed-off-by: Shannon Zhao
    Acked-by: Peter Maydell
    Reviewed-by: Andrew Jones
    Signed-off-by: Marc Zyngier

    Shannon Zhao
     

17 Feb, 2016

1 commit

  • The patch implements KVM_EXIT_HYPERV userspace exit
    functionality for Hyper-V VMBus hypercalls:
    HV_X64_HCALL_POST_MESSAGE, HV_X64_HCALL_SIGNAL_EVENT.

    Changes v3:
    * use vcpu->arch.complete_userspace_io to setup hypercall
    result

    Changes v2:
    * use KVM_EXIT_HYPERV for hypercalls

    Signed-off-by: Andrey Smetanin
    Reviewed-by: Roman Kagan
    CC: Gleb Natapov
    CC: Paolo Bonzini
    CC: Joerg Roedel
    CC: "K. Y. Srinivasan"
    CC: Haiyang Zhang
    CC: Roman Kagan
    CC: Denis V. Lunev
    CC: qemu-devel@nongnu.org
    Signed-off-by: Paolo Bonzini

    Andrey Smetanin
     

16 Feb, 2016

1 commit

  • This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
    H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
    devices or emulated PCI. These calls allow adding multiple entries
    (up to 512) into the TCE table in one call which saves time on
    transition between kernel and user space.

    The current implementation of kvmppc_h_stuff_tce() allows it to be
    executed in both real and virtual modes so there is one helper.
    The kvmppc_rm_h_put_tce_indirect() needs to translate the guest address
    to the host address and since the translation is different, there are
    2 helpers - one for each mode.

    This implements the KVM_CAP_PPC_MULTITCE capability. When present,
    the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE if these
    are enabled by the userspace via KVM_CAP_PPC_ENABLE_HCALL.
    If they can not be handled by the kernel, they are passed on to
    the user space. The user space still has to have an implementation
    for these.

    Both HV and PR-style KVM are supported.

    Signed-off-by: Alexey Kardashevskiy
    Reviewed-by: David Gibson
    Signed-off-by: Paul Mackerras

    Alexey Kardashevskiy
     

10 Feb, 2016

3 commits


26 Jan, 2016

1 commit