15 Jun, 2017

2 commits


08 Jun, 2017

7 commits

  • The PMU IRQ number is set through the VCPU device's KVM_SET_DEVICE_ATTR
    ioctl handler for the KVM_ARM_VCPU_PMU_V3_IRQ attribute, but there is no
    enforced or stated requirement that this must happen after initializing
    the VGIC. As a result, calling vgic_valid_spi() which relies on the
    nr_spis being set during the VGIC init can incorrectly fail.

    Introduce irq_is_spi, which determines if an IRQ number is within the
    SPI range without verifying it against the actual VGIC properties.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Christoffer Dall
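
    A range-only check of this kind can be sketched as follows. This is a
    minimal illustration, not the kernel's definition: the constant names
    are stand-ins, and the bounds follow the GIC architecture's INTID
    layout (0-15 SGIs, 16-31 PPIs, 32-1019 SPIs).

```c
#include <stdbool.h>

/* Stand-in names; bounds taken from the GIC architecture's INTID layout. */
#define NR_PRIVATE_IRQS 32    /* SGIs (0-15) plus PPIs (16-31) */
#define MAX_SPI         1019  /* highest SPI INTID */

/*
 * Range-only check: it does not consult how many SPIs the VGIC was
 * actually configured with, so it is usable before the VGIC is initialized.
 */
static bool irq_is_spi(unsigned int irq)
{
	return irq >= NR_PRIVATE_IRQS && irq <= MAX_SPI;
}
```

    Because nothing here depends on VGIC state, the result does not depend
    on whether the VGIC has been initialized.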
     
  • When injecting an IRQ to the VGIC, you now have to present an owner
    token for that IRQ line to show that you are the owner of that line.

    IRQ lines driven from userspace or via an irqfd do not have an owner and
    will simply pass a NULL pointer.

    Also get rid of the unused kvm_vgic_inject_mapped_irq prototype.

    Signed-off-by: Christoffer Dall
    Acked-by: Marc Zyngier

    Christoffer Dall
     
  • Having multiple devices being able to signal the same interrupt line is
    very confusing and almost certainly guarantees a configuration error.

    Therefore, introduce a very simple allocator which allows a device to
    claim an interrupt line from the vgic for a given VM.

    Signed-off-by: Christoffer Dall
    Acked-by: Marc Zyngier

    Christoffer Dall
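
    The allocator and the owner-token check described in the two entries
    above can be sketched together as below. All names (claim_irq_line,
    inject_irq, NR_LINES) and the choice of error codes are hypothetical;
    this is a sketch under those assumptions, not the kernel's
    implementation.

```c
#include <errno.h>
#include <stddef.h>

#define NR_LINES 64  /* hypothetical number of interrupt lines per VM */

struct irq_line {
	void *owner;  /* NULL means unclaimed (userspace or irqfd driven) */
	int level;
};

static struct irq_line lines[NR_LINES];

/* Claim a line for a device; fails if a different owner already holds it. */
static int claim_irq_line(unsigned int intid, void *owner)
{
	if (intid >= NR_LINES || owner == NULL)
		return -EINVAL;
	if (lines[intid].owner && lines[intid].owner != owner)
		return -EEXIST;
	lines[intid].owner = owner;
	return 0;
}

/* Change a line's level; the caller must present the line's owner token. */
static int inject_irq(unsigned int intid, int level, void *owner)
{
	if (intid >= NR_LINES)
		return -EINVAL;
	if (lines[intid].owner != owner)
		return -EINVAL;
	lines[intid].level = level;
	return 0;
}
```

    An unclaimed line accepts injection with a NULL owner, which matches
    the userspace and irqfd case, while a claimed line rejects it.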
     
  • First we define an ABI using the vcpu devices that lets userspace set
    the interrupt numbers for the various timers on both the 32-bit and
    64-bit KVM/ARM implementations.

    Second, we add the definitions for the groups and attributes introduced
    by the above ABI. (We add the PMU define on the 32-bit side as well for
    symmetry and it may get used some day.)

    Third, we set up the arch-specific vcpu device operation handlers to
    call into the timer code for anything related to the
    KVM_ARM_VCPU_TIMER_CTRL group.

    Fourth, we implement support for getting and setting the timer interrupt
    numbers using the above defined ABI in the arch timer code.

    Fifth, we introduce error checking upon enabling the arch timer (which
    is called when first running a VCPU) to check that all VCPUs are
    configured to use the same PPI for the timer (as mandated by the
    architecture) and that the virtual and physical timers are not
    configured to use the same IRQ number.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Christoffer Dall
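
    The error checking in the fifth step can be sketched as below. The
    struct and function names are hypothetical, and the sketch only
    covers the two conditions named in the message: every VCPU uses the
    same timer IRQ numbers, and the virtual and physical timers do not
    share one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VCPU view of the configured timer interrupt numbers. */
struct vcpu_timer_irqs {
	unsigned int vtimer_irq;
	unsigned int ptimer_irq;
};

/*
 * Returns true when all VCPUs agree on both timer IRQ numbers and the
 * virtual and physical timers use different IRQ numbers.
 */
static bool timer_irqs_are_valid(const struct vcpu_timer_irqs *vcpus, size_t n)
{
	size_t i;

	if (n == 0)
		return true;
	if (vcpus[0].vtimer_irq == vcpus[0].ptimer_irq)
		return false;
	for (i = 1; i < n; i++) {
		if (vcpus[i].vtimer_irq != vcpus[0].vtimer_irq ||
		    vcpus[i].ptimer_irq != vcpus[0].ptimer_irq)
			return false;
	}
	return true;
}
```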
     
  • We currently initialize the arch timer IRQ numbers from the reset code,
    presumably because we once intended to model multiple CPU or SoC types
    from within the kernel and have hard-coded reset values in the reset
    code.

    As we are moving towards userspace being in charge of more fine-grained
    CPU emulation and stitching together the pieces needed to emulate a
    particular type of CPU, we should no longer have a tight coupling
    between resetting a VCPU and setting IRQ numbers.

    Therefore, move the logic to define and use the default IRQ numbers to
    the timer code and set the IRQ number immediately when creating the
    VCPU.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Christoffer Dall
     
  • We are about to need this define in the arch timer code as well so move
    it to a common location.

    Signed-off-by: Christoffer Dall
    Acked-by: Marc Zyngier

    Christoffer Dall
     
  • Since we got support for devices in userspace which allows reporting the
    PMU overflow output status to userspace, we should actually allow
    creating the PMU on systems without an in-kernel irqchip, which in turn
    requires us to slightly clarify error codes for the ABI and move things
    around for the initialization phase.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Christoffer Dall
     

18 May, 2017

1 commit

  • If userspace creates the VCPUs after initializing the VGIC, then we end
    up in a situation where we trigger a bug in kvm_vcpu_get_idx(), because
    it is called prior to adding the VCPU into the vcpus array on the VM.

    There is no tight coupling between the VCPU index and the area of the
    redistributor region used for the VCPU, so we can simply ensure that all
    creations of redistributors are serialized per VM, and increment an
    offset when we successfully add a redistributor.

    The vgic_register_redist_iodev() function can be called from two paths.
    The first is vgic_register_all_redist_iodevs(), which is called via the
    kvm_vgic_addr() device attribute handler. This path already holds the
    kvm->lock mutex.

    The other path is via kvm_vgic_vcpu_init, which is called through a
    longer chain from kvm_vm_ioctl_create_vcpu(), which releases the
    kvm->lock mutex just before calling kvm_arch_vcpu_create(), so we can
    simply take this mutex again later for our purposes.

    Fixes: ab6f468c10 ("KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs")
    Signed-off-by: Christoffer Dall
    Tested-by: Jean-Philippe Brucker
    Reviewed-by: Eric Auger

    Christoffer Dall
     

09 May, 2017

4 commits

  • …l/git/kvmarm/kvmarm into HEAD

    Second round of KVM/ARM Changes for v4.12.

    Changes include:
    - A fix related to the 32-bit idmap stub
    - A fix to the bitmask used to decode the operands of an AArch32 CP
    instruction
    - We have moved the files shared between arch/arm/kvm and
    arch/arm64/kvm to virt/kvm/arm
    - We add support for saving/restoring the virtual ITS state to
    userspace

    Paolo Bonzini
     
  • The its->initialized field doesn't bring much to the table, and creates
    unnecessary ordering between setting the address and initializing it
    (which amounts to exactly nothing).

    Let's kill it altogether, making KVM_DEV_ARM_VGIC_CTRL_INIT the no-op
    it deserves to be.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Marc Zyngier
     
  • Instead of waiting with registering KVM iodevs until the first VCPU is
    run, we can actually create the iodevs when the redist base address is
    set. The only downside is that we must now also check if we need to do
    this for VCPUs which are created after creating the VGIC, because there
    is no enforced ordering between creating the VGIC (and setting its base
    addresses) and creating the VCPUs.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Christoffer Dall
     
  • Pull KVM updates from Paolo Bonzini:
    "ARM:
    - HYP mode stub supports kexec/kdump on 32-bit
    - improved PMU support
    - virtual interrupt controller performance improvements
    - support for userspace virtual interrupt controller (slower, but
    necessary for KVM on the weird Broadcom SoCs used by the Raspberry
    Pi 3)

    MIPS:
    - basic support for hardware virtualization (ImgTec P5600/P6600/I6400
    and Cavium Octeon III)

    PPC:
    - in-kernel acceleration for VFIO

    s390:
    - support for guests without storage keys
    - adapter interruption suppression

    x86:
    - usual range of nVMX improvements, notably nested EPT support for
    accessed and dirty bits
    - emulation of CPL3 CPUID faulting

    generic:
    - first part of VCPU thread request API
    - kvm_stat improvements"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
    kvm: nVMX: Don't validate disabled secondary controls
    KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick
    Revert "KVM: Support vCPU-based gfn->hva cache"
    tools/kvm: fix top level makefile
    KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING
    KVM: Documentation: remove VM mmap documentation
    kvm: nVMX: Remove superfluous VMX instruction fault checks
    KVM: x86: fix emulation of RSM and IRET instructions
    KVM: mark requests that need synchronization
    KVM: return if kvm_vcpu_wake_up() did wake up the VCPU
    KVM: add explicit barrier to kvm_vcpu_kick
    KVM: perform a wake_up in kvm_make_all_cpus_request
    KVM: mark requests that do not need a wakeup
    KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up
    KVM: x86: always use kvm_make_request instead of set_bit
    KVM: add kvm_{test,clear}_request to replace {test,clear}_bit
    s390: kvm: Cpu model support for msa6, msa7 and msa8
    KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
    kvm: better MWAIT emulation for guests
    KVM: x86: virtualize cpuid faulting
    ...

    Linus Torvalds
     

08 May, 2017

1 commit

  • We plan to support different migration ABIs, i.e. ones characterizing
    the ITS table layout format in guest RAM. For example, a new ABI
    will be needed if vLPIs get supported for the nested use case.

    So let's introduce an array of supported ABIs (at the moment a single
    ABI is supported though). The following characteristics are foreseen
    to vary with the ABI: size of table entries, save/restore operation,
    the way ABI settings are applied.

    By default the MAX_ABI_REV is applied on its creation. In subsequent
    patches we will introduce a way for the userspace to change the ABI
    in use.

    The entry sizes now are set according to the ABI version and not
    hardcoded anymore.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall

    Eric Auger
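
    An ABI table of the kind described above could be laid out as in the
    following sketch. The field names, the helper names, and the entry
    sizes are all illustrative assumptions; only the idea of indexing an
    array by ABI revision and defaulting to the highest revision comes
    from the message.

```c
#include <stddef.h>

/* Hypothetical per-ABI description; sizes below are illustrative only. */
struct its_abi {
	size_t cte_esz;  /* collection table entry size, in bytes */
	size_t dte_esz;  /* device table entry size, in bytes */
	size_t ite_esz;  /* interrupt translation entry size, in bytes */
};

static const struct its_abi its_abis[] = {
	{ .cte_esz = 8, .dte_esz = 8, .ite_esz = 8 },  /* revision 0 */
};

#define NR_ITS_ABIS (sizeof(its_abis) / sizeof(its_abis[0]))
#define MAX_ABI_REV (NR_ITS_ABIS - 1)

struct its {
	unsigned int abi_rev;
};

/* On creation the highest supported revision is selected by default. */
static void its_init(struct its *its)
{
	its->abi_rev = MAX_ABI_REV;
}

/* Entry sizes are looked up through the selected ABI, not hard-coded. */
static const struct its_abi *its_get_abi(const struct its *its)
{
	return &its_abis[its->abi_rev];
}
```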
     

09 Apr, 2017

5 commits

  • When not using an in-kernel VGIC, but instead emulating an interrupt
    controller in userspace, we should report the PMU overflow status to
    that userspace interrupt controller using the KVM_CAP_ARM_USER_IRQ
    feature.

    Reviewed-by: Alexander Graf
    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • If you're running with a userspace gic or other interrupt controller
    (that is no vgic in the kernel), then you have so far not been able to
    use the architected timers, because the output of the architected
    timers, which are driven inside the kernel, was a kernel-only construct
    between the arch timer code and the vgic.

    This patch implements the new KVM_CAP_ARM_USER_IRQ feature, where we use a
    side channel on the kvm_run structure, run->s.regs.device_irq_level, to
    always notify userspace of the timer output levels when using a userspace
    irqchip.

    This works by ensuring that before we enter the guest, if the timer
    output level has changed compared to what we last told userspace, we
    don't enter the guest, but instead return to userspace to notify it of
    the new level. If we are exiting, because of an MMIO for example, and
    the level changed at the same time, the value is also updated and
    userspace can sample the line as it needs. This is nicely achieved
    by simply always updating the timer_irq_level field after the main run
    loop.

    Note that the kvm_timer_update_irq trace event is changed to show the
    host IRQ number for the timer instead of the guest IRQ number, because
    the kernel no longer knows which IRQ userspace wires up the timer signal
    to.

    Also note that this patch implements all required functionality but does
    not yet advertise the capability.

    Reviewed-by: Alexander Graf
    Reviewed-by: Marc Zyngier
    Signed-off-by: Alexander Graf
    Signed-off-by: Christoffer Dall

    Alexander Graf
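
    The notify-on-change logic described above can be sketched as follows.
    The struct, the function names, and the bit assignments are stand-ins
    chosen for illustration; only the device_irq_level field name comes
    from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit assignments for the two timer output lines. */
#define DEV_EL1_VTIMER (1u << 0)
#define DEV_EL1_PTIMER (1u << 1)
#define DEV_TIMER_MASK ((uint64_t)(DEV_EL1_VTIMER | DEV_EL1_PTIMER))

/* Stand-in for the part of the shared run structure seen by userspace. */
struct run_sketch {
	uint64_t device_irq_level;
};

/*
 * Checked before entering the guest: true when the current timer output
 * levels differ from what userspace was last told, in which case the
 * caller returns to userspace instead of entering the guest.
 */
static bool should_notify_user(const struct run_sketch *run, uint64_t levels)
{
	return (run->device_irq_level & DEV_TIMER_MASK) !=
	       (levels & DEV_TIMER_MASK);
}

/* Called after the main run loop on every exit, whatever the exit reason. */
static void update_run_state(struct run_sketch *run, uint64_t levels)
{
	run->device_irq_level = (run->device_irq_level & ~DEV_TIMER_MASK) |
				(levels & DEV_TIMER_MASK);
}
```

    Updating unconditionally on exit means an MMIO exit that coincides
    with a level change still leaves userspace with the current value.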
     
  • We don't use these fields anymore so let's nuke them completely.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • There is no need to calculate and maintain live_lrs when we always
    populate the lowest numbered LRs first on every entry and clear all LRs
    on every exit.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • We don't have to save/restore the VMCR on every entry to/from the guest,
    since on GICv2 we can access the control interface from EL1 and on VHE
    systems with GICv3 we can access the control interface from KVM running
    in EL2.

    GICv3 systems without VHE become the rare case, which have to
    save/restore the register on each round trip.

    Note that userspace accesses may see out-of-date values if the VCPU is
    running while accessing the VGIC state via the KVM device API, but this
    is already the case and it is up to userspace to quiesce the CPUs before
    reading the CPU registers from the GIC for an up-to-date view.

    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     

04 Apr, 2017

1 commit

  • We currently have some code to clear the list registers on GICv3, but we
    never call this code, because the caller got nuked when removing the old
    vgic. We also used to have a similar GICv2 part, but that got lost in
    the process too.

    Let's reintroduce the logic for GICv2 and call the logic when we
    initialize the use of hypervisors on the CPU, for example when first
    loading KVM or when exiting a low power state.

    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
     

08 Feb, 2017

6 commits


30 Jan, 2017

1 commit

  • VGICv3 CPU interface registers are accessed using the
    KVM_DEV_ARM_VGIC_CPU_SYSREGS ioctl. These registers are accessed
    as 64-bit. The CPU's MPIDR value is passed along with the register ID
    and is used to identify the CPU for register accesses.

    A VM that supports SEIs expects them on the destination machine in
    order to handle guest aborts, hence the check for ICC_CTLR_EL1.SEIS
    compatibility. Similarly, a VM that supports Affinity Level 3, which is
    required for AArch64 mode, needs it to be supported on the destination
    machine, hence the check for ICC_CTLR_EL1.A3V compatibility.

    The arch/arm64/kvm/vgic-sys-reg-v3.c handles read and write of VGIC
    CPU registers for AArch64.

    For AArch32 mode, arch/arm/kvm/vgic-v3-coproc.c file is created but
    APIs are not implemented.

    Updated arch/arm/include/uapi/asm/kvm.h with new definitions
    required to compile for AArch32.

    The version of the VGIC v3 specification is defined in
    Documentation/virtual/kvm/devices/arm-vgic-v3.txt

    Acked-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Pavel Fedin
    Signed-off-by: Vijaya Kumar K
    Signed-off-by: Marc Zyngier

    Vijaya Kumar K
     

25 Jan, 2017

2 commits

  • Add a file to debugfs to read the in-kernel state of the vgic. We don't
    do any locking of the entire VGIC state while traversing all the IRQs,
    so if the VM is running the user/developer may not see a quiesced state,
    but should take care to pause the VM using facilities in user space for
    that purpose.

    We also don't support LPIs yet, but they can be added easily if needed.

    Reviewed-by: Eric Auger
    Tested-by: Eric Auger
    Tested-by: Andre Przywara
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • One of the goals behind the VGIC redesign was to get rid of cached or
    intermediate state in the data structures, but we decided to allow
    ourselves to precompute the pending value of an IRQ based on the line
    level and pending latch state. However, this has now become difficult
    to base proper GICv3 save/restore on, because there is a potential to
    modify the pending state without knowing if an interrupt is edge or
    level configured.

    See the following post and related message for more background:
    https://lists.cs.columbia.edu/pipermail/kvmarm/2017-January/023195.html

    This commit gets rid of the precomputed pending field in favor of a
    function that calculates the value when needed, irq_is_pending().

    The soft_pending field is renamed to pending_latch to represent that
    this latch is the equivalent hardware latch which gets manipulated by
    the input signal for edge-triggered interrupts and when writing to the
    SPENDR/CPENDR registers.

    After this commit save/restore code should be able to simply restore the
    pending_latch state, line_level state, and config state in any order and
    get the desired result.

    Reviewed-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Christoffer Dall
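
    Computing the pending value on demand from the three independent
    pieces of state can be sketched as below. The type and enumerator
    names are stand-ins; the rule is the one stated in the message: an
    edge-triggered interrupt is pending when its latch is set, and a
    level-triggered one when either the latch or the line level is set.

```c
#include <stdbool.h>

enum irq_config {
	IRQ_CONFIG_EDGE,
	IRQ_CONFIG_LEVEL,
};

/* Stand-in for the per-interrupt state; no cached "pending" field. */
struct irq_sketch {
	enum irq_config config;
	bool pending_latch;  /* set by edge input or SPENDR, cleared by CPENDR */
	bool line_level;     /* current level of the input line */
};

/* Derive the pending state from the stored state whenever it is needed. */
static bool irq_is_pending(const struct irq_sketch *irq)
{
	if (irq->config == IRQ_CONFIG_EDGE)
		return irq->pending_latch;
	return irq->pending_latch || irq->line_level;
}
```

    Since nothing derived is stored, the three fields can be restored in
    any order without leaving a stale pending value behind.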
     

13 Jan, 2017

1 commit

  • The current KVM world switch code unintentionally sets the wrong bits
    in CNTHCTL_EL2 when E2H == 1, which may allow a guest OS to access the
    physical timer. The bit positions in CNTHCTL_EL2 change depending on the
    HCR_EL2.E2H bit. EL1PCEN and EL1PCTEN are the 1st and 0th bits when E2H
    is not set, but they are the 11th and 10th bits respectively when E2H
    is set.

    In fact, on VHE we only need to set those bits once, not for every world
    switch. This is because the host kernel runs in EL2 with HCR_EL2.TGE ==
    1, which makes those bits have no effect for the host kernel execution.
    So we just set those bits once for guests, and that's it.

    Signed-off-by: Jintack Lim
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Jintack Lim
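
    The dependence of the bit positions on E2H can be sketched as below.
    The function name is hypothetical; the positions are the ones given in
    the message (bits 1 and 0 without E2H, bits 11 and 10 with it).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns the mask covering EL1PCEN and EL1PCTEN in CNTHCTL_EL2 for the
 * given HCR_EL2.E2H setting.
 */
static uint64_t cnthctl_el1_phys_access_bits(bool e2h)
{
	const uint64_t el1pcten = UINT64_C(1) << 0;  /* bit 0 when E2H == 0 */
	const uint64_t el1pcen  = UINT64_C(1) << 1;  /* bit 1 when E2H == 0 */
	const uint64_t bits = el1pcten | el1pcen;

	/* With E2H == 1 the same two controls sit 10 bits higher. */
	return e2h ? bits << 10 : bits;
}
```

    Using the E2H == 0 mask on a VHE system would touch bits 0 and 1
    instead of bits 10 and 11, which is the mistake the commit describes.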
     

25 Dec, 2016

1 commit


22 Sep, 2016

2 commits

  • This patch allows building and using vgic-v3 in 32-bit mode.

    Unfortunately, it can not be split in several steps without extra
    stubs to keep patches independent and bisectable. For instance,
    virt/kvm/arm/vgic/vgic-v3.c uses function from vgic-v3-sr.c, handling
    access to GICv3 cpu interface from the guest requires vgic_v3.vgic_sre
    to be already defined.

    This is how support has been done:

    * handle SGI requests from the guest

    * report configured SRE on access to GICv3 cpu interface from the guest

    * required vgic-v3 macros are provided via uapi.h

    * static keys are used to select GIC backend

    * to make vgic-v3 build KVM_ARM_VGIC_V3 guard is removed along with
    the static inlines

    Acked-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Vladimir Murzin
    Signed-off-by: Christoffer Dall

    Vladimir Murzin
     
  • Currently GIC backend is selected via alternative framework and this
    is fine. We are going to introduce vgic-v3 to 32-bit world and there
    we don't have patching framework in hand, so we can either check
    support for GICv3 every time we need to choose which backend to use or
    try to optimise it by using static keys. The latter looks quite
    promising because we can share logic involved in selecting GIC backend
    between architectures if both use static keys.

    This patch moves arm64 from alternative to static keys framework for
    selecting GIC backend. For that we embed static key into vgic_global
    and enable the key during vgic initialisation based on what has
    already been exposed by the host GIC driver.

    Acked-by: Marc Zyngier
    Signed-off-by: Vladimir Murzin
    Signed-off-by: Christoffer Dall

    Vladimir Murzin
     

08 Sep, 2016

2 commits

  • Now that we have the necessary infrastructure to handle MMIO accesses
    in HYP, perform the GICV access on behalf of the guest. This requires
    checking that the access is strictly 32bit, properly aligned, and
    falls within the expected range.

    When all conditions are satisfied, we perform the access and tell
    the rest of the HYP code that the instruction has been correctly
    emulated.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
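
    The three checks listed above (strictly 32-bit, properly aligned,
    within the expected range) can be sketched as a single predicate. The
    function and parameter names are hypothetical, and the caller is
    assumed to supply the base and size of the guest's GICV region.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when a faulting guest access may be emulated directly:
 * it must fall inside the GICV region, be exactly 32 bits wide, and be
 * naturally aligned.
 */
static bool gicv_access_is_valid(uint64_t fault_ipa, unsigned int len,
				 uint64_t cpu_base, uint64_t cpu_size)
{
	/* Range check: [cpu_base, cpu_base + cpu_size) */
	if (fault_ipa < cpu_base || fault_ipa >= cpu_base + cpu_size)
		return false;

	/* Only 32-bit accesses are accepted. */
	if (len != sizeof(uint32_t))
		return false;

	/* The address must be 4-byte aligned. */
	if (fault_ipa & 3)
		return false;

	return true;
}
```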
     
  • In order to efficiently perform the GICV access on behalf of the
    guest, we need to be able to avoid going back all the way to
    the host kernel.

    For this, we introduce a new hook in the world switch code,
    conveniently placed just after populating the fault info.
    At that point, we only have saved/restored the GP registers,
    and we can quickly perform all the required checks (data abort,
    translation fault, valid faulting syndrome, not an external
    abort, not a PTW).

    Coming back from the emulation code, we need to skip the emulated
    instruction. This involves an additional bit of save/restore in
    order to be able to access the guest's PC (and possibly CPSR if
    this is a 32bit guest).

    At this stage, no emulation code is provided.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     

04 Aug, 2016

1 commit


03 Aug, 2016

1 commit

  • Pull KVM updates from Paolo Bonzini:

    - ARM: GICv3 ITS emulation and various fixes. Removal of the
    old VGIC implementation.

    - s390: support for trapping software breakpoints, nested
    virtualization (vSIE), the STHYI opcode, initial extensions
    for CPU model support.

    - MIPS: support for MIPS64 hosts (32-bit guests only) and lots
    of cleanups, preliminary to this and the upcoming support for
    hardware virtualization extensions.

    - x86: support for execute-only mappings in nested EPT; reduced
    vmexit latency for TSC deadline timer (by about 30%) on Intel
    hosts; support for more than 255 vCPUs.

    - PPC: bugfixes.

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
    KVM: PPC: Introduce KVM_CAP_PPC_HTM
    MIPS: Select HAVE_KVM for MIPS64_R{2,6}
    MIPS: KVM: Reset CP0_PageMask during host TLB flush
    MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
    MIPS: KVM: Sign extend MFC0/RDHWR results
    MIPS: KVM: Fix 64-bit big endian dynamic translation
    MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
    MIPS: KVM: Use 64-bit CP0_EBase when appropriate
    MIPS: KVM: Set CP0_Status.KX on MIPS64
    MIPS: KVM: Make entry code MIPS64 friendly
    MIPS: KVM: Use kmap instead of CKSEG0ADDR()
    MIPS: KVM: Use virt_to_phys() to get commpage PFN
    MIPS: Fix definition of KSEGX() for 64-bit
    KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
    kvm: x86: nVMX: maintain internal copy of current VMCS
    KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
    KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
    KVM: arm64: vgic-its: Simplify MAPI error handling
    KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
    KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
    ...

    Linus Torvalds
     

23 Jul, 2016

1 commit

  • This patch adds compilation and linking against irqchip.

    Main motivation behind using irqchip code is to enable MSI
    routing code. In the future irqchip routing may also be useful
    when targeting multiple irqchips.

    The standard routing callbacks are now implemented in vgic-irqfd:
    - kvm_set_routing_entry
    - kvm_set_irq
    - kvm_set_msi

    They are only supported with new_vgic code.

    Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined.
    KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed.

    So from now on IRQCHIP routing is enabled and a routing table entry
    must exist for irqfd injection to succeed for a given SPI. This patch
    builds a default flat irqchip routing table (gsi=irqchip.pin) covering
    all the VGIC SPI indexes. This routing table is overwritten by the
    first user-space call to KVM_SET_GSI_ROUTING ioctl.

    MSI routing setup is not yet allowed.

    Signed-off-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Eric Auger
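
    A default flat table of the form gsi=irqchip.pin can be sketched as
    below. The struct and function names are hypothetical, and a single
    irqchip with index 0 is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing entry: one GSI mapped to one pin of one irqchip. */
struct routing_entry {
	uint32_t gsi;
	uint32_t irqchip;
	uint32_t pin;
};

/*
 * Fill 'entries' with an identity mapping covering every SPI index, so
 * that irqfd injection finds an entry for each SPI by default.
 * Returns the number of entries written.
 */
static size_t build_default_routing(struct routing_entry *entries,
				    size_t nr_spis)
{
	size_t i;

	for (i = 0; i < nr_spis; i++) {
		entries[i].gsi = (uint32_t)i;
		entries[i].irqchip = 0;  /* assumed single irqchip */
		entries[i].pin = (uint32_t)i;
	}
	return nr_spis;
}
```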
     

19 Jul, 2016

1 commit