02 May, 2018

1 commit

  • commit 85bd0ba1ff9875798fad94218b627ea9f768f3c3 upstream.

    Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1
    or 1.0 to a guest, defaulting to the latest version of the PSCI
    implementation that is compatible with the requested version. This is
    no different from doing a firmware upgrade on KVM.

    But in order to give a chance to hypothetical badly implemented guests
    that would have a fit by discovering something other than PSCI 0.2,
    let's provide a new API that allows userspace to pick one particular
    version of the API.

    This is implemented as a new class of "firmware" registers, where
    we expose the PSCI version. This allows the PSCI version to be
    save/restored as part of a guest migration, and also set to
    any supported version if the guest requires it.

    Cc: stable@vger.kernel.org #4.16
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
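
The selection rule described in this commit can be sketched roughly as follows. This is a minimal, self-contained illustration and not the kernel's code: the `sketch_` names and the per-VM structure are hypothetical, and the compatibility rule (0.1 only for guests that did not ask for PSCI 0.2, 0.2 or 1.0 otherwise) is an assumption drawn from the description above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* PSCI version encoding: major number in bits [31:16], minor in [15:0]. */
#define SKETCH_PSCI_0_1 UINT64_C(0x00001)
#define SKETCH_PSCI_0_2 UINT64_C(0x00002)
#define SKETCH_PSCI_1_0 UINT64_C(0x10000)

/* Hypothetical per-VM state; the real kernel structures differ. */
struct sketch_vm {
	bool wants_psci_0_2;   /* guest was created asking for PSCI 0.2 */
	uint64_t psci_version; /* version currently exposed to the guest */
};

/* Default: the latest version compatible with what was requested. */
static inline uint64_t sketch_default_psci_version(const struct sketch_vm *vm)
{
	return vm->wants_psci_0_2 ? SKETCH_PSCI_1_0 : SKETCH_PSCI_0_1;
}

/* Models a userspace write to the "firmware" register holding the version. */
static inline int sketch_set_psci_version(struct sketch_vm *vm, uint64_t val)
{
	switch (val) {
	case SKETCH_PSCI_0_1:
		if (vm->wants_psci_0_2)
			return -EINVAL;
		break;
	case SKETCH_PSCI_0_2:
	case SKETCH_PSCI_1_0:
		if (!vm->wants_psci_0_2)
			return -EINVAL;
		break;
	default:
		/* Unknown versions are rejected. */
		return -EINVAL;
	}
	vm->psci_version = val;
	return 0;
}
```

On migration, the source side would read the register and the destination would write the same value back, so the guest keeps seeing the version it booted with.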
     

17 Feb, 2018

5 commits

  • Commit a4097b351118 upstream.

    We're about to need kvm_psci_version in HYP too. So let's turn it
    into a static inline, and pass the kvm structure as a second
    parameter (so that HYP can do a kern_hyp_va on it).

    Tested-by: Ard Biesheuvel
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • Commit 09e6be12effd upstream.

    The new SMC Calling Convention (v1.1) allows for a reduced overhead
    when calling into the firmware, and provides a new feature discovery
    mechanism.

    Make it visible to KVM guests.

    Tested-by: Ard Biesheuvel
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • Commit 58e0b2239a4d upstream.

    PSCI 1.0 can be trivially implemented by providing the FEATURES
    call on top of PSCI 0.2 and returning 1.0 as the PSCI version.

    We happily ignore everything else, as they are either optional or
    are clarifications that do not require any additional change.

    PSCI 1.0 is now the default until we decide to add a userspace
    selection API.

    Reviewed-by: Christoffer Dall
    Tested-by: Ard Biesheuvel
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • Commit d0a144f12a7c upstream.

    As we're about to trigger a PSCI version explosion, it doesn't
    hurt to introduce a PSCI_VERSION helper that is going to be
    used everywhere.

    Reviewed-by: Christoffer Dall
    Tested-by: Ard Biesheuvel
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
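
As a rough illustration of the kind of helper this commit describes: PSCI reports its version as a 32-bit value with the major number in bits [31:16] and the minor number in bits [15:0]. The sketch below packs and unpacks such a value. The `SKETCH_`/`sketch_` names are hypothetical and are not the kernel's own definitions.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel helper; names are illustrative. */
#define SKETCH_PSCI_MAJOR_SHIFT 16
#define SKETCH_PSCI_MINOR_MASK  0xffffu

/* Pack a major/minor pair into the 32-bit PSCI version encoding. */
#define SKETCH_PSCI_VERSION(maj, min) \
	((((uint32_t)(maj)) << SKETCH_PSCI_MAJOR_SHIFT) | \
	 (((uint32_t)(min)) & SKETCH_PSCI_MINOR_MASK))

/* Extract the major number from an encoded version. */
static inline uint32_t sketch_psci_major(uint32_t version)
{
	return version >> SKETCH_PSCI_MAJOR_SHIFT;
}

/* Extract the minor number from an encoded version. */
static inline uint32_t sketch_psci_minor(uint32_t version)
{
	return version & SKETCH_PSCI_MINOR_MASK;
}
```

With this encoding, versions compare correctly as plain integers, so a check such as "at least 0.2" is a single `>=` against `SKETCH_PSCI_VERSION(0, 2)`.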
     
  • Commit 1a2fb94e6a77 upstream.

    As we're about to update the PSCI support, and because I'm lazy,
    let's move the PSCI include file to include/kvm so that both
    ARM architectures can find it.

    Acked-by: Christoffer Dall
    Tested-by: Ard Biesheuvel
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     

25 Jul, 2017

1 commit

  • kvm_pmu_overflow_set() is called from perf's interrupt handler,
    making the call of kvm_vgic_inject_irq() from it introduced with
    "KVM: arm/arm64: PMU: remove request-less vcpu kick" a really bad
    idea, as it's quite easy to try and retake a lock that the
    interrupted context is already holding. The fix is to use a vcpu
    kick, leaving the interrupt injection to kvm_pmu_sync_hwstate(),
    like it was doing before the refactoring. We don't just revert,
    though, because before the kick was request-less, leaving the vcpu
    exposed to the request-less vcpu kick race, and also because the
    kick was used unnecessarily from register access handlers.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Andrew Jones
    Signed-off-by: Marc Zyngier

    Andrew Jones
     

15 Jun, 2017

2 commits


08 Jun, 2017

7 commits

  • The PMU IRQ number is set through the VCPU device's KVM_SET_DEVICE_ATTR
    ioctl handler for the KVM_ARM_VCPU_PMU_V3_IRQ attribute, but there is no
    enforced or stated requirement that this must happen after initializing
    the VGIC. As a result, calling vgic_valid_spi() which relies on the
    nr_spis being set during the VGIC init can incorrectly fail.

    Introduce irq_is_spi, which determines if an IRQ number is within the
    SPI range without verifying it against the actual VGIC properties.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Christoffer Dall
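
A range check of the kind described here can be sketched without any VGIC state, using only the GIC architecture's fixed INTID ranges (0-15 SGIs, 16-31 PPIs, 32-1019 SPIs). The `sketch_` names below are hypothetical; this is an illustration, not the kernel's definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fixed INTID ranges from the GIC architecture. */
#define SKETCH_NR_SGIS         16u   /* INTIDs 0..15 */
#define SKETCH_NR_PRIVATE_IRQS 32u   /* SGIs + PPIs: INTIDs 0..31 */
#define SKETCH_MAX_SPI         1019u /* last SPI INTID */

/* True if irq is a private peripheral interrupt (16..31). */
static inline bool sketch_irq_is_ppi(uint32_t irq)
{
	return irq >= SKETCH_NR_SGIS && irq < SKETCH_NR_PRIVATE_IRQS;
}

/*
 * True if irq lies in the architectural SPI range (32..1019). Unlike a
 * check against the number of SPIs configured for a particular VGIC,
 * this does not depend on the VGIC having been initialized.
 */
static inline bool sketch_irq_is_spi(uint32_t irq)
{
	return irq >= SKETCH_NR_PRIVATE_IRQS && irq <= SKETCH_MAX_SPI;
}
```

Because the check is purely arithmetic, it gives the same answer whether userspace sets the PMU IRQ before or after initializing the VGIC.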
     
  • When injecting an IRQ to the VGIC, you now have to present an owner
    token for that IRQ line to show that you are the owner of that line.

    IRQ lines driven from userspace or via an irqfd do not have an owner and
    will simply pass a NULL pointer.

    Also get rid of the unused kvm_vgic_inject_mapped_irq prototype.

    Signed-off-by: Christoffer Dall
    Acked-by: Marc Zyngier

    Christoffer Dall
     
  • Having multiple devices being able to signal the same interrupt line is
    very confusing and almost certainly guarantees a configuration error.

    Therefore, introduce a very simple allocator which allows a device to
    claim an interrupt line from the vgic for a given VM.

    Signed-off-by: Christoffer Dall
    Acked-by: Marc Zyngier

    Christoffer Dall
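
The owner-token scheme from this commit and the previous one can be sketched as below. This is a simplified model under stated assumptions: the structure, the function names, and the specific error codes are hypothetical, and locking is omitted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one interrupt line. */
struct sketch_irq {
	void *owner;     /* opaque token; NULL means the line is unowned */
	bool line_level; /* current input level */
};

/* Claim a line for a device; fails if another device already owns it. */
static inline int sketch_irq_set_owner(struct sketch_irq *irq, void *owner)
{
	if (irq->owner && irq->owner != owner)
		return -EEXIST;
	irq->owner = owner;
	return 0;
}

/*
 * Change the level of a line. The caller must present the token that
 * matches the line's owner. Callers without an owner (userspace, irqfd)
 * pass NULL, which only matches unowned lines.
 */
static inline int sketch_irq_inject(struct sketch_irq *irq, bool level,
				    void *owner)
{
	if (irq->owner != owner)
		return -EINVAL;
	irq->line_level = level;
	return 0;
}
```

Any pointer unique to the claiming device works as a token, since it is only ever compared for equality and never dereferenced.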
     
  • First we define an ABI using the vcpu devices that lets userspace set
    the interrupt numbers for the various timers on both the 32-bit and
    64-bit KVM/ARM implementations.

    Second, we add the definitions for the groups and attributes introduced
    by the above ABI. (We add the PMU define on the 32-bit side as well for
    symmetry and it may get used some day.)

    Third, we set up the arch-specific vcpu device operation handlers to
    call into the timer code for anything related to the
    KVM_ARM_VCPU_TIMER_CTRL group.

    Fourth, we implement support for getting and setting the timer interrupt
    numbers using the above defined ABI in the arch timer code.

    Fifth, we introduce error checking upon enabling the arch timer (which
    is called when first running a VCPU) to check that all VCPUs are
    configured to use the same PPI for the timer (as mandated by the
    architecture) and that the virtual and physical timers are not
    configured to use the same IRQ number.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Christoffer Dall
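
The consistency check described in the fifth step could look roughly like the sketch below. It assumes a flat array of per-VCPU settings, which is a simplification; all names are hypothetical, and the PPI range check is included here as an assumption about what "use the same PPI" implies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-VCPU timer interrupt configuration. */
struct sketch_vcpu_timer_irqs {
	uint32_t vtimer_irq;
	uint32_t ptimer_irq;
};

/* PPIs occupy INTIDs 16..31 in the GIC architecture. */
static inline bool sketch_is_ppi(uint32_t irq)
{
	return irq >= 16 && irq < 32;
}

/*
 * Returns true if the configuration is usable: both timers use PPIs, the
 * virtual and physical timers use different IRQs, and every VCPU uses
 * the same pair of IRQ numbers as VCPU 0.
 */
static inline bool sketch_timer_irqs_valid(
	const struct sketch_vcpu_timer_irqs *vcpus, size_t nr_vcpus)
{
	size_t i;

	if (nr_vcpus == 0)
		return true;
	if (!sketch_is_ppi(vcpus[0].vtimer_irq) ||
	    !sketch_is_ppi(vcpus[0].ptimer_irq))
		return false;
	if (vcpus[0].vtimer_irq == vcpus[0].ptimer_irq)
		return false;
	for (i = 1; i < nr_vcpus; i++) {
		if (vcpus[i].vtimer_irq != vcpus[0].vtimer_irq ||
		    vcpus[i].ptimer_irq != vcpus[0].ptimer_irq)
			return false;
	}
	return true;
}
```

Running the check when the timer is first enabled, rather than when each IRQ number is set, lets userspace configure VCPUs in any order.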
     
  • We currently initialize the arch timer IRQ numbers from the reset code,
    presumably because we once intended to model multiple CPU or SoC types
    from within the kernel and have hard-coded reset values in the reset
    code.

    As we are moving towards userspace being in charge of more fine-grained
    CPU emulation and stitching together the pieces needed to emulate a
    particular type of CPU, we should no longer have a tight coupling
    between resetting a VCPU and setting IRQ numbers.

    Therefore, move the logic to define and use the default IRQ numbers to
    the timer code and set the IRQ number immediately when creating the
    VCPU.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Christoffer Dall
     
  • We are about to need this define in the arch timer code as well so move
    it to a common location.

    Signed-off-by: Christoffer Dall
    Acked-by: Marc Zyngier

    Christoffer Dall
     
  • Since we got support for devices in userspace which allows reporting the
    PMU overflow output status to userspace, we should actually allow
    creating the PMU on systems without an in-kernel irqchip, which in turn
    requires us to slightly clarify error codes for the ABI and move things
    around for the initialization phase.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Christoffer Dall
     

18 May, 2017

1 commit

  • If userspace creates the VCPUs after initializing the VGIC, then we end
    up in a situation where we trigger a bug in kvm_vcpu_get_idx(), because
    it is called prior to adding the VCPU into the vcpus array on the VM.

    There is no tight coupling between the VCPU index and the area of the
    redistributor region used for the VCPU, so we can simply ensure that all
    creations of redistributors are serialized per VM, and increment an
    offset when we successfully add a redistributor.

    The vgic_register_redist_iodev() function can be called from two paths.
    One is vgic_register_all_redist_iodev(), which is called via the
    kvm_vgic_addr() device attribute handler. This path already holds the
    kvm->lock mutex.

    The other path is via kvm_vgic_vcpu_init, which is called through a
    longer chain from kvm_vm_ioctl_create_vcpu(), which releases the
    kvm->lock mutex just before calling kvm_arch_vcpu_create(), so we can
    simply take this mutex again later for our purposes.

    Fixes: ab6f468c10 ("KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs")
    Signed-off-by: Christoffer Dall
    Tested-by: Jean-Philippe Brucker
    Reviewed-by: Eric Auger

    Christoffer Dall
     

09 May, 2017

4 commits

  • …l/git/kvmarm/kvmarm into HEAD

    Second round of KVM/ARM Changes for v4.12.

    Changes include:
    - A fix related to the 32-bit idmap stub
    - A fix to the bitmask used to decode the operands of an AArch32 CP
    instruction
    - We have moved the files shared between arch/arm/kvm and
    arch/arm64/kvm to virt/kvm/arm
    - We add support for saving/restoring the virtual ITS state to
    userspace

    Paolo Bonzini
     
  • The its->initialized doesn't bring much to the table, and creates
    unnecessary ordering between setting the address and initializing it
    (which amounts to exactly nothing).

    Let's kill it altogether, making KVM_DEV_ARM_VGIC_CTRL_INIT the no-op
    it deserves to be.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Marc Zyngier
     
  • Instead of waiting with registering KVM iodevs until the first VCPU is
    run, we can actually create the iodevs when the redist base address is
    set. The only downside is that we must now also check if we need to do
    this for VCPUs which are created after creating the VGIC, because there
    is no enforced ordering between creating the VGIC (and setting its base
    addresses) and creating the VCPUs.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Christoffer Dall
     
  • Pull KVM updates from Paolo Bonzini:
    "ARM:
    - HYP mode stub supports kexec/kdump on 32-bit
    - improved PMU support
    - virtual interrupt controller performance improvements
    - support for userspace virtual interrupt controller (slower, but
    necessary for KVM on the weird Broadcom SoCs used by the Raspberry
    Pi 3)

    MIPS:
    - basic support for hardware virtualization (ImgTec P5600/P6600/I6400
    and Cavium Octeon III)

    PPC:
    - in-kernel acceleration for VFIO

    s390:
    - support for guests without storage keys
    - adapter interruption suppression

    x86:
    - usual range of nVMX improvements, notably nested EPT support for
    accessed and dirty bits
    - emulation of CPL3 CPUID faulting

    generic:
    - first part of VCPU thread request API
    - kvm_stat improvements"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
    kvm: nVMX: Don't validate disabled secondary controls
    KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick
    Revert "KVM: Support vCPU-based gfn->hva cache"
    tools/kvm: fix top level makefile
    KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING
    KVM: Documentation: remove VM mmap documentation
    kvm: nVMX: Remove superfluous VMX instruction fault checks
    KVM: x86: fix emulation of RSM and IRET instructions
    KVM: mark requests that need synchronization
    KVM: return if kvm_vcpu_wake_up() did wake up the VCPU
    KVM: add explicit barrier to kvm_vcpu_kick
    KVM: perform a wake_up in kvm_make_all_cpus_request
    KVM: mark requests that do not need a wakeup
    KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up
    KVM: x86: always use kvm_make_request instead of set_bit
    KVM: add kvm_{test,clear}_request to replace {test,clear}_bit
    s390: kvm: Cpu model support for msa6, msa7 and msa8
    KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
    kvm: better MWAIT emulation for guests
    KVM: x86: virtualize cpuid faulting
    ...

    Linus Torvalds
     

08 May, 2017

1 commit

  • We plan to support different migration ABIs, i.e. ABIs characterizing
    the ITS table layout format in guest RAM. For example, a new ABI
    will be needed if vLPIs get supported for the nested use case.

    So let's introduce an array of supported ABIs (at the moment a single
    ABI is supported though). The following characteristics are foreseen
    to vary with the ABI: size of table entries, save/restore operation,
    the way ABI settings are applied.

    By default the MAX_ABI_REV is applied on its creation. In subsequent
    patches we will introduce a way for the userspace to change the ABI
    in use.

    The entry sizes now are set according to the ABI version and not
    hardcoded anymore.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall

    Eric Auger
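
One way to picture the ABI table described in this commit is the sketch below. The structure layout, the names, and the 8-byte entry sizes are assumptions made for illustration only; the save/restore callbacks that the commit mentions are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one migration ABI revision. */
struct sketch_its_abi {
	unsigned int cte_esz; /* collection table entry size, bytes */
	unsigned int dte_esz; /* device table entry size, bytes */
	unsigned int ite_esz; /* interrupt translation entry size, bytes */
};

/* Supported revisions, indexed by revision number (one for now). */
static const struct sketch_its_abi sketch_its_abis[] = {
	[0] = { 8, 8, 8 }, /* sizes here are assumed, not authoritative */
};

#define SKETCH_NR_ITS_ABIS \
	(sizeof(sketch_its_abis) / sizeof(sketch_its_abis[0]))
#define SKETCH_MAX_ABI_REV (SKETCH_NR_ITS_ABIS - 1)

/* Hypothetical ITS instance that remembers which revision it uses. */
struct sketch_its {
	uint32_t abi_rev;
};

/* On creation, the newest revision is selected by default. */
static inline void sketch_its_init(struct sketch_its *its)
{
	its->abi_rev = (uint32_t)SKETCH_MAX_ABI_REV;
}

/* Entry sizes come from the table instead of being hardcoded. */
static inline const struct sketch_its_abi *
sketch_its_get_abi(const struct sketch_its *its)
{
	return &sketch_its_abis[its->abi_rev];
}
```

Adding a new revision then amounts to appending a table entry, and code that needs an entry size looks it up through the instance's current revision.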
     

09 Apr, 2017

5 commits

  • When not using an in-kernel VGIC, but instead emulating an interrupt
    controller in userspace, we should report the PMU overflow status to
    that userspace interrupt controller using the KVM_CAP_ARM_USER_IRQ
    feature.

    Reviewed-by: Alexander Graf
    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • If you're running with a userspace gic or other interrupt controller
    (that is, no vgic in the kernel), then you have so far not been able to
    use the architected timers, because the output of the architected
    timers, which are driven inside the kernel, was a kernel-only construct
    between the arch timer code and the vgic.

    This patch implements the new KVM_CAP_ARM_USER_IRQ feature, where we use a
    side channel on the kvm_run structure, run->s.regs.device_irq_level, to
    always notify userspace of the timer output levels when using a userspace
    irqchip.

    This works by ensuring that before we enter the guest, if the timer
    output level has changed compared to what we last told userspace, we
    don't enter the guest, but instead return to userspace to notify it of
    the new level. If we are exiting, because of an MMIO for example, and
    the level changed at the same time, the value is also updated and
    userspace can sample the line as it needs. This is nicely achieved by
    simply always updating the timer_irq_level field after the main run
    loop.

    Note that the kvm_timer_update_irq trace event is changed to show the
    host IRQ number for the timer instead of the guest IRQ number, because
    the kernel no longer knows which IRQ userspace wires up the timer signal
    to.

    Also note that this patch implements all required functionality but does
    not yet advertise the capability.

    Reviewed-by: Alexander Graf
    Reviewed-by: Marc Zyngier
    Signed-off-by: Alexander Graf
    Signed-off-by: Christoffer Dall

    Alexander Graf
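
The notify-on-change logic described above can be sketched as follows. This is a simplified model: the bit assignments, structure, and function names are hypothetical, and only the comparison against the last value reported to userspace is shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments in the level bitmap shared with userspace. */
#define SKETCH_DEV_VTIMER (1u << 0)
#define SKETCH_DEV_PTIMER (1u << 1)

/* Hypothetical slice of the shared run structure. */
struct sketch_run_state {
	uint32_t device_irq_level; /* levels last reported to userspace */
};

/* Build the bitmap from the current timer output levels. */
static inline uint32_t sketch_current_levels(bool vtimer_high,
					     bool ptimer_high)
{
	return (vtimer_high ? SKETCH_DEV_VTIMER : 0u) |
	       (ptimer_high ? SKETCH_DEV_PTIMER : 0u);
}

/* Before guest entry: true means "return to userspace instead". */
static inline bool sketch_should_notify_user(
	const struct sketch_run_state *run, uint32_t levels)
{
	return run->device_irq_level != levels;
}

/* After the run loop: always publish the latest levels. */
static inline void sketch_update_run(struct sketch_run_state *run,
				     uint32_t levels)
{
	run->device_irq_level = levels;
}
```

Because the update runs on every exit, an exit taken for an unrelated reason such as MMIO still carries the current levels, and no separate notification is needed in that case.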
     
  • We don't use these fields anymore so let's nuke them completely.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • There is no need to calculate and maintain live_lrs when we always
    populate the lowest numbered LRs first on every entry and clear all LRs
    on every exit.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • We don't have to save/restore the VMCR on every entry to/from the guest,
    since on GICv2 we can access the control interface from EL1 and on VHE
    systems with GICv3 we can access the control interface from KVM running
    in EL2.

    GICv3 systems without VHE become the rare case, which has to
    save/restore the register on each round trip.

    Note that userspace accesses may see out-of-date values if the VCPU is
    running while accessing the VGIC state via the KVM device API, but this
    is already the case and it is up to userspace to quiesce the CPUs before
    reading the CPU registers from the GIC for an up-to-date view.

    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     

04 Apr, 2017

1 commit

  • We currently have some code to clear the list registers on GICv3, but we
    never call this code, because the caller got nuked when removing the old
    vgic. We also used to have a similar GICv2 part, but that got lost in
    the process too.

    Let's reintroduce the logic for GICv2 and call the logic when we
    initialize the use of hypervisors on the CPU, for example when first
    loading KVM or when exiting a low power state.

    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
     

08 Feb, 2017

6 commits


30 Jan, 2017

1 commit

  • VGICv3 CPU interface registers are accessed using the
    KVM_DEV_ARM_VGIC_CPU_SYSREGS ioctl. These registers are accessed
    as 64-bit values. The CPU MPIDR value is passed along with the register
    ID and is used to identify the CPU whose registers are accessed.

    A VM that supports SEIs expects the destination machine to support them
    too in order to handle guest aborts, hence the check for
    ICC_CTLR_EL1.SEIS compatibility. Similarly, a VM that supports Affinity
    Level 3, which is required for AArch64 mode, needs it to be supported on
    the destination machine, hence the check for ICC_CTLR_EL1.A3V
    compatibility.

    The arch/arm64/kvm/vgic-sys-reg-v3.c handles read and write of VGIC
    CPU registers for AArch64.

    For AArch32 mode, arch/arm/kvm/vgic-v3-coproc.c file is created but
    APIs are not implemented.

    Updated arch/arm/include/uapi/asm/kvm.h with new definitions
    required to compile for AArch32.

    The version of VGIC v3 specification is defined here
    Documentation/virtual/kvm/devices/arm-vgic-v3.txt

    Acked-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Pavel Fedin
    Signed-off-by: Vijaya Kumar K
    Signed-off-by: Marc Zyngier

    Vijaya Kumar K
     

25 Jan, 2017

2 commits

  • Add a file to debugfs to read the in-kernel state of the vgic. We don't
    do any locking of the entire VGIC state while traversing all the IRQs,
    so if the VM is running the user/developer may not see a quiesced state,
    but should take care to pause the VM using facilities in user space for
    that purpose.

    We also don't support LPIs yet, but they can be added easily if needed.

    Reviewed-by: Eric Auger
    Tested-by: Eric Auger
    Tested-by: Andre Przywara
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • One of the goals behind the VGIC redesign was to get rid of cached or
    intermediate state in the data structures, but we decided to allow
    ourselves to precompute the pending value of an IRQ based on the line
    level and pending latch state. However, this has now become difficult
    to base proper GICv3 save/restore on, because there is a potential to
    modify the pending state without knowing if an interrupt is edge or
    level configured.

    See the following post and related message for more background:
    https://lists.cs.columbia.edu/pipermail/kvmarm/2017-January/023195.html

    This commit gets rid of the precomputed pending field in favor of a
    function that calculates the value when needed, irq_is_pending().

    The soft_pending field is renamed to pending_latch to represent that
    this latch is the equivalent hardware latch which gets manipulated by
    the input signal for edge-triggered interrupts and when writing to the
    SPENDR/CPENDR registers.

    After this commit save/restore code should be able to simply restore the
    pending_latch state, line_level state, and config state in any order and
    get the desired result.

    Reviewed-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Christoffer Dall
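
The idea of computing the pending state on demand can be sketched as below, assuming a reduced interrupt structure with only the three fields the commit message names. The `sketch_` names are hypothetical.

```c
#include <stdbool.h>

enum sketch_irq_config {
	SKETCH_CONFIG_EDGE,
	SKETCH_CONFIG_LEVEL,
};

/* Hypothetical, reduced interrupt state: no cached "pending" field. */
struct sketch_irq_state {
	bool line_level;    /* current level of the input signal */
	bool pending_latch; /* latched by edges or pending-register writes */
	enum sketch_irq_config config;
};

/*
 * Derive the pending state when it is needed. Edge interrupts are
 * pending only if latched; level interrupts are pending if latched or
 * if the line is currently high.
 */
static inline bool sketch_irq_is_pending(const struct sketch_irq_state *irq)
{
	if (irq->config == SKETCH_CONFIG_EDGE)
		return irq->pending_latch;
	return irq->pending_latch || irq->line_level;
}
```

Since nothing is precomputed, restoring the three fields in any order yields the same final answer, which is the property the commit message relies on for save/restore.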
     

13 Jan, 2017

1 commit

  • Current KVM world switch code unintentionally sets the wrong bits in
    CNTHCTL_EL2 when E2H == 1, which may allow the guest OS to access the
    physical timer. The bit positions in CNTHCTL_EL2 change depending on the
    HCR_EL2.E2H bit. EL1PCEN and EL1PCTEN are bits 1 and 0 when E2H is
    not set, but they are bits 11 and 10 respectively when E2H is set.

    In fact, on VHE we only need to set those bits once, not for every world
    switch. This is because the host kernel runs in EL2 with HCR_EL2.TGE ==
    1, which makes those bits have no effect for the host kernel execution.
    So we just set those bits once for guests, and that's it.

    Signed-off-by: Jintack Lim
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Jintack Lim
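
The bit positions given in the commit message can be captured in a small helper. This is only a sketch of the arithmetic: the function name is hypothetical and it does not touch any system register.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mask covering EL1PCTEN and EL1PCEN in CNTHCTL_EL2, using the bit
 * positions stated above: bits 0 and 1 when HCR_EL2.E2H is clear, and
 * bits 10 and 11 when it is set.
 */
static inline uint64_t sketch_cnthctl_el1_phys_mask(bool e2h)
{
	const uint64_t el1pcten = UINT64_C(1) << 0;
	const uint64_t el1pcen = UINT64_C(1) << 1;

	/* With E2H set, both bits sit 10 positions higher. */
	return (el1pcten | el1pcen) << (e2h ? 10 : 0);
}
```

Using the E2H == 0 positions on a system running with E2H == 1 would leave bits 10 and 11 untouched and modify two unrelated bits instead, which is the kind of mistake this commit fixes.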
     

25 Dec, 2016

1 commit


22 Sep, 2016

1 commit

  • This patch allows building and using vgic-v3 in 32-bit mode.

    Unfortunately, it cannot be split into several steps without extra
    stubs to keep patches independent and bisectable. For instance,
    virt/kvm/arm/vgic/vgic-v3.c uses functions from vgic-v3-sr.c, and
    handling access to the GICv3 CPU interface from the guest requires
    vgic_v3.vgic_sre to be already defined.

    Here is how support has been implemented:

    * handle SGI requests from the guest

    * report configured SRE on access to GICv3 cpu interface from the guest

    * required vgic-v3 macros are provided via uapi.h

    * static keys are used to select GIC backend

    * to make vgic-v3 build KVM_ARM_VGIC_V3 guard is removed along with
    the static inlines

    Acked-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Vladimir Murzin
    Signed-off-by: Christoffer Dall

    Vladimir Murzin