25 Aug, 2019

1 commit

  • At the moment we use 2 IO devices per GICv3 redistributor: one
    for the RD_base frame and one for the SGI_base frame.

    Instead we can use a single IO device per redistributor (the 2
    frames are contiguous). This saves slots on the KVM_MMIO_BUS
    which is currently limited to NR_IOBUS_DEVS (1000).

    This change allows us to instantiate up to 512 redistributors and may
    speed up guest boot with a large number of VCPUs.

    Signed-off-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Eric Auger
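The slot arithmetic behind this change can be sketched as follows, assuming each redistributor spans two contiguous 64 KiB frames (RD_base, then SGI_base) and using the NR_IOBUS_DEVS limit of 1000 quoted above. The helper names are hypothetical, not KVM's API.

```c
#include <assert.h>
#include <stdint.h>

#define NR_IOBUS_DEVS   1000         /* KVM_MMIO_BUS slot limit quoted above */
#define GIC_FRAME_SIZE  0x10000ULL   /* assumed: 64 KiB per frame */
#define RD_FRAMES       2            /* RD_base + SGI_base, contiguous */

/* Hypothetical: base address of the redistributor for vcpu index idx.
 * One IO device can cover both frames, i.e. RD_FRAMES * GIC_FRAME_SIZE. */
static uint64_t rdist_base(uint64_t region_base, unsigned int idx)
{
    return region_base + (uint64_t)idx * RD_FRAMES * GIC_FRAME_SIZE;
}

/* How many redistributors fit on the bus if each needs devs_per_rd slots
 * (ignoring the slots used by other MMIO devices). */
static unsigned int max_rdists(unsigned int devs_per_rd)
{
    return NR_IOBUS_DEVS / devs_per_rd;
}
```

With two slots per redistributor at most 500 fit on the bus; with one slot each, 512 redistributors fit with room left for other devices.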
     

19 Aug, 2019

1 commit


05 Aug, 2019

1 commit

  • Since commit 328e56647944 ("KVM: arm/arm64: vgic: Defer
    touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
    its GICv2 equivalent) loaded as long as we can, only syncing it
    back when we're scheduled out.

    There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
    which is indirectly called from kvm_vcpu_check_block(), needs to
    evaluate the guest's view of ICC_PMR_EL1. At the point where we
    call kvm_vcpu_check_block(), the vcpu is still loaded, and any
    changes to PMR are not visible in memory until we do a vcpu_put().

    Things go really south if the guest does the following:

    mov x0, #0 // or any small value masking interrupts
    msr ICC_PMR_EL1, x0

    [vcpu preempted, then rescheduled, VMCR sampled]

    mov x0, #0xff // allow all interrupts
    msr ICC_PMR_EL1, x0
    wfi // traps to EL2, so no resampling of VMCR

    [interrupt arrives just after WFI]

    Here, the hypervisor's view of PMR is zero, while the guest has enabled
    its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
    interrupts are pending (despite an interrupt being received) and we'll
    block for no reason. If the guest doesn't have a periodic interrupt
    firing once it has blocked, it will stay there forever.

    To avoid this unfortunate situation, let's resync VMCR from
    kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
    will observe the latest value of PMR.

    This has been found by booting an arm64 Linux guest with the pseudo NMI
    feature, and thus using interrupt priorities to mask interrupts instead
    of the usual PSTATE masking.

    Cc: stable@vger.kernel.org # 4.12
    Fixes: 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
    Signed-off-by: Marc Zyngier

    Marc Zyngier
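To make the failure concrete, here is a toy model, assuming a single 8-bit priority mask and ignoring the rest of VMCR. The toy_* names are hypothetical and are not KVM functions. The pending check only consults the saved copy, so the guest's unmasking stays invisible until something resyncs it, which is what the blocking hook does in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: the live PMR sits in the CPU interface while the vcpu is
 * loaded; the saved copy is only refreshed by toy_vmcr_sync(). */
struct toy_vcpu {
    uint8_t live_pmr;    /* what the guest last wrote to ICC_PMR_EL1 */
    uint8_t saved_pmr;   /* in-memory copy consulted by the pending check */
};

static void toy_vmcr_sync(struct toy_vcpu *v)
{
    v->saved_pmr = v->live_pmr;
}

/* An interrupt is signalled only if its priority is higher (numerically
 * lower) than the priority mask. */
static bool toy_pending_irq(const struct toy_vcpu *v, uint8_t irq_prio)
{
    return irq_prio < v->saved_pmr;
}

/* The fix: resync before the block check looks at the saved state. */
static void toy_vcpu_blocking(struct toy_vcpu *v)
{
    toy_vmcr_sync(v);
}
```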
     

19 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation this program is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details you should have received a copy of the gnu general
    public license along with this program if not see http www gnu org
    licenses

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 503 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Reviewed-by: Enrico Weigelt
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

24 Jan, 2019

3 commits


12 Aug, 2018

1 commit

  • Although vgic-v3 now supports Group0 interrupts, it still doesn't
    deal with Group0 SGIs. As usual with the GIC, nothing is simple:

    - ICC_SGI1R can signal SGIs of both groups, since GICD_CTLR.DS==1
    with KVM (as per 8.1.10, Non-secure EL1 access)

    - ICC_SGI0R can only generate Group0 SGIs

    - ICC_ASGI1R sees its scope refocussed to generate only Group0
    SGIs (as per the note at the bottom of Table 8-14)

    We only support Group1 SGIs so far, so no material change.

    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
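The register rules listed above can be summarized as a small lookup, assuming GICD_CTLR.DS == 1 as KVM presents it. The enum and function are hypothetical illustrations, not KVM code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of which SGI group each system register may
 * generate when GICD_CTLR.DS == 1, as described above. */
enum sgi_reg { REG_SGI1R, REG_SGI0R, REG_ASGI1R };

/* Returns true if a write to reg may generate an SGI in group (0 or 1). */
static bool sgi_reg_allows_group(enum sgi_reg reg, int group)
{
    switch (reg) {
    case REG_SGI1R:
        return true;          /* DS == 1: either group */
    case REG_SGI0R:
    case REG_ASGI1R:
        return group == 0;    /* Group0 only */
    }
    return false;
}
```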
     

21 Jul, 2018

3 commits

  • Simply letting IGROUPR be writable from userspace would break
    migration from old kernels to newer kernels, because old kernels
    incorrectly report interrupt groups as group 1. This would not be a big
    problem if userspace wrote GICD_IIDR as read from the kernel, because we
    could detect the incompatibility and return an error to userspace.
    Unfortunately, this is not the case with current userspace
    implementations and simply letting IGROUPR be writable from userspace for
    an emulated GICv2 silently breaks migration and causes the destination
    VM to no longer run after migration.

    We now encourage userspace to write the read and expected value of
    GICD_IIDR as the first part of a GIC register restore, and if we observe
    a write to GICD_IIDR we know that userspace has been updated and has had
    a chance to cope with older kernels (VGICv2 IIDR.Revision == 0)
    incorrectly reporting interrupts as group 1, and therefore we now allow
    groups to be user writable.

    Reviewed-by: Andrew Jones
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
     
  • In preparation for proper group 0 and group 1 support in the vgic, we
    add a field to struct vgic_irq to store the group of each interrupt.

    We initialize the group to group 0 when emulating GICv2 and to group 1
    when emulating GICv3, just like we treat them today. LPIs are always
    group 1. We also continue to ignore writes from the guest, preserving
    existing functionality, for now.

    Finally, we also add this field to the vgic debug logic to show the
    group for all interrupts.

    Reviewed-by: Andrew Jones
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
     
  • As we are about to tweak implementation aspects of the VGIC emulation,
    while still preserving some level of backwards compatibility support,
    add a field to keep track of the implementation revision field which is
    reported to the VM and to userspace.

    Reviewed-by: Andrew Jones
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
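The three commits above combine roughly as sketched below: a per-interrupt group with model-dependent defaults, and userspace IGROUPR writes on an emulated GICv2 honoured only once userspace has written GICD_IIDR. All structures and toy_* names are hypothetical simplifications; LPIs are taken to start at INTID 8192.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum vgic_model { MODEL_GICV2, MODEL_GICV3 };

/* Hypothetical per-VM state: whether userspace has written GICD_IIDR. */
struct toy_dist {
    enum vgic_model model;
    bool iidr_written;     /* set once userspace restores GICD_IIDR */
};

struct toy_irq {
    uint32_t intid;
    uint8_t group;
};

/* Default group: 0 for an emulated GICv2, 1 for GICv3; LPIs always 1. */
static void toy_init_group(const struct toy_dist *d, struct toy_irq *irq)
{
    if (irq->intid >= 8192 || d->model == MODEL_GICV3)
        irq->group = 1;
    else
        irq->group = 0;
}

static void toy_userspace_write_iidr(struct toy_dist *d)
{
    d->iidr_written = true;
}

/* On GICv2, a userspace IGROUPR write is only honoured after an IIDR
 * write shows that userspace can cope with older kernels. */
static bool toy_userspace_write_group(const struct toy_dist *d,
                                      struct toy_irq *irq, uint8_t group)
{
    if (d->model == MODEL_GICV2 && !d->iidr_written)
        return false;   /* ignored in this sketch */
    irq->group = group;
    return true;
}
```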
     

25 May, 2018

3 commits

  • Let's raise the number of supported vcpus along with
    vgic v3 now that HW is looming with more physical CPUs.

    Signed-off-by: Eric Auger
    Acked-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Eric Auger
     
  • kvm_vgic_vcpu_early_init() gets called after kvm_vgic_vcpu_init(), which
    is confusing. The call path is as follows:

    kvm_vm_ioctl_create_vcpu
    |_ kvm_arch_vcpu_create
       |_ kvm_vcpu_init
          |_ kvm_arch_vcpu_init
             |_ kvm_vgic_vcpu_init
    |_ kvm_arch_vcpu_postcreate
       |_ kvm_vgic_vcpu_early_init

    Static initialization currently done in kvm_vgic_vcpu_early_init()
    can be moved to kvm_vgic_vcpu_init(). So let's move the code and
    remove kvm_vgic_vcpu_early_init(), which leaves
    kvm_arch_vcpu_postcreate() doing nothing.

    Signed-off-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Eric Auger
     
  • At the moment KVM supports a single rdist region. We want to
    support several separate rdist regions so let's introduce a list
    of them. This patch currently only cares about a single
    entry in this list as the functionality to register several redist
    regions is not yet there. So this only translates the existing code
    into something functionally similar using that new data struct.

    The redistributor region handle is stored in the vgic_cpu structure
    to allow later computation of the TYPER last bit.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Eric Auger
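A list of redistributor regions can be walked to place each vcpu and derive the GICR_TYPER.Last bit, sketched here under the assumption of a 128 KiB stride per redistributor and Last being set on the final slot of each region. The struct and function are hypothetical; KVM's own version also has to handle the last vcpu overall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RD_STRIDE 0x20000ULL   /* assumed: two contiguous 64 KiB frames */

/* Hypothetical region: base address and number of redistributors. */
struct toy_rdist_region {
    uint64_t base;
    unsigned int count;
};

/* Walk the regions in order and find where vcpu idx lands. Returns the
 * region index, or -1 if the regions are too small. */
static int toy_find_region(const struct toy_rdist_region *r, size_t n,
                           unsigned int idx, uint64_t *rd_base, bool *last)
{
    for (size_t i = 0; i < n; i++) {
        if (idx < r[i].count) {
            *rd_base = r[i].base + idx * RD_STRIDE;
            /* TYPER.Last: final redistributor of a contiguous region */
            *last = (idx == r[i].count - 1);
            return (int)i;
        }
        idx -= r[i].count;
    }
    return -1;
}
```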
     

27 Apr, 2018

1 commit

  • Now that we make sure we don't inject multiple instances of the
    same GICv2 SGI at the same time, we've made another bug more
    obvious:

    If we exit with an active SGI, we completely lose track of which
    vcpu it came from. On the next entry, we restore it with 0 as a
    source, and if that wasn't the right one, too bad. While this
    doesn't seem to trouble GIC-400, the architectural model gets
    offended and doesn't deactivate the interrupt on EOI.

    Another connected issue is that we will happily make pending
    an interrupt from another vcpu, overriding the above zero with
    something that is just as inconsistent. Don't do that.

    The final issue is that we signal a maintenance interrupt when
    no pending interrupts are present in the LR. Assuming we've fixed
    the two issues above, we end up in a situation where we keep
    exiting as soon as we've reached the active state, and are
    unable to inject the following pending interrupt.

    The fix comes in 3 parts:
    - GICv2 SGIs have their source vcpu saved if they are active on
    exit, and restored on entry
    - Multi-SGIs cannot go via the Pending+Active state, as this would
    corrupt the source field
    - Multi-SGIs are converted to using MI on EOI instead of NPIE

    Fixes: 16ca6a607d84bef0 ("KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid")
    Reported-by: Mark Rutland
    Tested-by: Mark Rutland
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
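The three-part fix can be illustrated with the GICv2 GICH_LR encoding for software-injected interrupts (HW == 0): CPUID in bits [12:10], EOI in bit 19, pending in bit 28, active in bit 29. The struct and toy_* helpers are hypothetical and much simpler than KVM's vgic_irq handling; they only show the source being preserved across exit/entry and EOI being requested when more sources are queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GICv2 GICH_LR fields (software-injected interrupt, HW == 0). */
#define LR_VIRTUALID_MASK  0x3ffU
#define LR_CPUID_SHIFT     10
#define LR_CPUID_MASK      (0x7U << LR_CPUID_SHIFT)  /* SGI source CPU */
#define LR_EOI             (1U << 19)   /* maintenance interrupt on EOI */
#define LR_PENDING         (1U << 28)
#define LR_ACTIVE          (1U << 29)

/* Hypothetical slice of per-SGI state. */
struct toy_sgi {
    uint32_t intid;           /* 0..15 */
    uint8_t active_source;    /* source vcpu remembered across exits */
    uint8_t pending_sources;  /* bitmap of vcpus with a pending instance */
    bool active;
};

/* On exit: keep the source of an active SGI instead of dropping it. */
static void toy_fold_lr(struct toy_sgi *s, uint32_t lr)
{
    s->active = !!(lr & LR_ACTIVE);
    if (s->active)
        s->active_source = (lr & LR_CPUID_MASK) >> LR_CPUID_SHIFT;
}

/* On entry: rebuild the LR without ever combining Pending+Active. */
static uint32_t toy_populate_lr(const struct toy_sgi *s)
{
    uint32_t lr = s->intid & LR_VIRTUALID_MASK;

    if (s->active) {
        lr |= LR_ACTIVE | ((uint32_t)s->active_source << LR_CPUID_SHIFT);
        if (s->pending_sources)
            lr |= LR_EOI;   /* come back once this instance is EOIed */
        return lr;
    }
    if (s->pending_sources) {
        unsigned int src = 0;
        while (!(s->pending_sources & (1U << src)))
            src++;          /* lowest-numbered pending source */
        lr |= LR_PENDING | ((uint32_t)src << LR_CPUID_SHIFT);
        if (s->pending_sources & (s->pending_sources - 1))
            lr |= LR_EOI;   /* more sources queued: MI on EOI */
    }
    return lr;
}
```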
     

20 Mar, 2018

1 commit


19 Mar, 2018

2 commits

  • As we're about to change the way we map devices at HYP, we need
    to move away from kern_hyp_va on an IO address.

    One way of achieving this is to store the VAs in kvm_vgic_global_state,
    and use that directly from the HYP code. This requires a small change
    to create_hyp_io_mappings so that it can also return a HYP VA.

    We take this opportunity to nuke the vctrl_base field in the emulated
    distributor, as it is not used anymore.

    Acked-by: Catalin Marinas
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • There is really no need to store the vgic_elrsr on the VGIC data
    structures as the only need we have for the elrsr is to figure out if an
    LR is inactive when we save the VGIC state upon returning from the
    guest. We might as well store this in a temporary local variable.

    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
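Using ELRSR as a local value on the save path might look like the sketch below, where a set bit means the corresponding list register is empty. The function and its arguments are hypothetical; clearing the saved copy stands in for whatever KVM does with empty LRs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical save path: elrsr is read once into a local and consumed
 * immediately, rather than being cached in the vgic structures. */
static unsigned int toy_save_lrs(const uint32_t *hw_lrs, uint64_t elrsr,
                                 unsigned int used_lrs, uint32_t *saved)
{
    unsigned int live = 0;

    for (unsigned int i = 0; i < used_lrs; i++) {
        if (elrsr & (1ULL << i)) {
            saved[i] = 0;           /* empty LR: nothing left to save */
        } else {
            saved[i] = hw_lrs[i];   /* LR still holds state */
            live++;
        }
    }
    return live;
}
```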
     

15 Mar, 2018

1 commit

  • We currently don't allow resetting mapped IRQs from userspace, because
    their state is controlled by the hardware. But we do need to reset the
    state when the VM is reset, so we provide a function for the 'owner' of
    the mapped interrupt to reset the interrupt state.

    Currently only the timer uses mapped interrupts, so we call this
    function from the timer reset logic.

    Cc: stable@vger.kernel.org
    Fixes: 4c60e360d6df ("KVM: arm/arm64: Provide a get_input_level for the arch timer")
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
     

02 Jan, 2018

1 commit

  • The GIC sometimes needs to sample the physical line of a mapped
    interrupt. As we know this to be notoriously slow, provide a callback
    function for devices (such as the timer) which can do this much faster
    than talking to the distributor, for example by comparing a few
    in-memory values. Fall back to the good old method of poking the
    physical GIC if no callback is provided.

    Reviewed-by: Marc Zyngier
    Reviewed-by: Eric Auger
    Signed-off-by: Christoffer Dall

    Christoffer Dall
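The callback-with-fallback pattern described above can be sketched as an optional function pointer on the mapped interrupt. The struct, the toy_* names and the "timer" comparison are hypothetical stand-ins; the slow path merely counts how often it would have touched the physical GIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mapped interrupt with an optional fast-path callback. */
struct toy_mapped_irq {
    unsigned int vintid;
    bool (*get_input_level)(unsigned int vintid);
};

static unsigned int slow_gic_reads;   /* fallbacks to the physical GIC */

/* Stand-in for querying the distributor (the slow path). */
static bool toy_read_physical_line(unsigned int vintid)
{
    (void)vintid;
    slow_gic_reads++;
    return false;
}

static bool toy_get_line_level(const struct toy_mapped_irq *irq)
{
    if (irq->get_input_level)
        return irq->get_input_level(irq->vintid);
    return toy_read_physical_line(irq->vintid);
}

/* Example fast callback: compare a couple of in-memory values. */
static unsigned long long toy_cval = 100, toy_now = 150;

static bool toy_timer_level(unsigned int vintid)
{
    (void)vintid;
    return toy_now >= toy_cval;
}
```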
     

10 Nov, 2017

4 commits

  • The doorbell interrupt is only useful if the vcpu is blocked on WFI.
    In all other cases, receiving a doorbell interrupt is just a waste
    of cycles.

    So let's only enable the doorbell if a vcpu is getting blocked,
    and disable it when it is unblocked. This is very similar to
    what we're doing for the background timer.

    Reviewed-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Let's use the irq bypass mechanism also used for x86 posted interrupts
    to intercept the virtual PCIe endpoint configuration and establish our
    LPI->VLPI mapping.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • In order to control the GICv4 view of virtual CPUs, we rely
    on an irqdomain allocated for that purpose. Let's add a couple
    of helpers to that effect.

    At the same time, the vgic data structures gain new fields to
    track all this... erm... wonderful stuff.

    The way we hook into the vgic init is slightly convoluted. We
    need the vgic to be initialized (in order to guarantee that
    the number of vcpus is now fixed), and we must have a vITS
    (otherwise this is all very pointless). So we end up calling
    the init from both vgic_init and vgic_its_create.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Add a new has_gicv4 field in the global VGIC state that indicates
    whether the HW is GICv4 capable, as well as a per-VM predicate
    indicating whether a VM can support direct injection
    (the above being true and the VM having an ITS).

    Reviewed-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
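The per-VM predicate and the block/unblock doorbell handling from this series can be sketched together as below. All structures and toy_* names are hypothetical; the predicate follows the description (GICv4-capable HW and a VM with an ITS), and the doorbell is only armed while the vcpu blocks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical global and per-VM state mirroring the description. */
struct toy_global { bool has_gicv4; };
struct toy_vm { bool has_its; };
struct toy_vcpu { const struct toy_vm *vm; bool doorbell_enabled; };

static struct toy_global toy_vgic_global;

/* Direct injection needs GICv4-capable HW and a VM with an ITS. */
static bool toy_supports_direct_msis(const struct toy_vm *vm)
{
    return toy_vgic_global.has_gicv4 && vm->has_its;
}

/* The doorbell only matters while the vcpu is blocked in WFI. */
static void toy_vcpu_blocking(struct toy_vcpu *v)
{
    if (toy_supports_direct_msis(v->vm))
        v->doorbell_enabled = true;
}

static void toy_vcpu_unblocking(struct toy_vcpu *v)
{
    v->doorbell_enabled = false;
}
```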
     

07 Nov, 2017

1 commit

  • We want to reuse the core of the map/unmap functions for IRQ
    forwarding. Let's move the computation of the hwirq into
    kvm_vgic_map_phys_irq and pass the Linux IRQ as a parameter.
    The host_irq field is added to struct vgic_irq.

    We introduce kvm_vgic_map/unmap_irq which take a struct vgic_irq
    handle as a parameter.

    Acked-by: Christoffer Dall
    Signed-off-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Eric Auger
     

15 Jun, 2017

2 commits


08 Jun, 2017

4 commits

  • The PMU IRQ number is set through the VCPU device's KVM_SET_DEVICE_ATTR
    ioctl handler for the KVM_ARM_VCPU_PMU_V3_IRQ attribute, but there is no
    enforced or stated requirement that this must happen after initializing
    the VGIC. As a result, calling vgic_valid_spi() which relies on the
    nr_spis being set during the VGIC init can incorrectly fail.

    Introduce irq_is_spi, which determines if an IRQ number is within the
    SPI range without verifying it against the actual VGIC properties.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Christoffer Dall
     
  • When injecting an IRQ to the VGIC, you now have to present an owner
    token for that IRQ line to show that you are the owner of that line.

    IRQ lines driven from userspace or via an irqfd do not have an owner and
    will simply pass a NULL pointer.

    Also get rid of the unused kvm_vgic_inject_mapped_irq prototype.

    Signed-off-by: Christoffer Dall
    Acked-by: Marc Zyngier

    Christoffer Dall
     
  • Having multiple devices being able to signal the same interrupt line is
    very confusing and almost certainly guarantees a configuration error.

    Therefore, introduce a very simple allocator which allows a device to
    claim an interrupt line from the vgic for a given VM.

    Signed-off-by: Christoffer Dall
    Acked-by: Marc Zyngier

    Christoffer Dall
     
  • We are about to need this define in the arch timer code as well so move
    it to a common location.

    Signed-off-by: Christoffer Dall
    Acked-by: Marc Zyngier

    Christoffer Dall
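A range-only SPI check and the owner-token scheme from this series might look like the sketch below, assuming SPIs span INTIDs 32 to 1019. The line struct and toy_* helpers are hypothetical; the -EBUSY and -EINVAL return codes are this sketch's choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define VGIC_NR_PRIVATE_IRQS 32     /* SGIs + PPIs */
#define VGIC_MAX_SPI         1019   /* highest architectural SPI */

/* Range check only: does not consult the VGIC's configured nr_spis,
 * so it works before the VGIC is initialized. */
static bool toy_irq_is_spi(unsigned int irq)
{
    return irq >= VGIC_NR_PRIVATE_IRQS && irq <= VGIC_MAX_SPI;
}

/* Hypothetical per-line state with a simple owner allocator. */
struct toy_line { const void *owner; bool level; };

/* Claim a line for owner; fails if another device already owns it. */
static int toy_set_owner(struct toy_line *l, const void *owner)
{
    if (l->owner && l->owner != owner)
        return -EBUSY;
    l->owner = owner;
    return 0;
}

/* Injection must present the owner token; unowned lines take NULL. */
static int toy_inject(struct toy_line *l, bool level, const void *owner)
{
    if (l->owner != owner)
        return -EINVAL;
    l->level = level;
    return 0;
}
```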
     

18 May, 2017

1 commit

  • If userspace creates the VCPUs after initializing the VGIC, then we end
    up in a situation where we trigger a bug in kvm_vcpu_get_idx(), because
    it is called prior to adding the VCPU into the vcpus array on the VM.

    There is no tight coupling between the VCPU index and the area of the
    redistributor region used for the VCPU, so we can simply ensure that all
    creations of redistributors are serialized per VM, and increment an
    offset when we successfully add a redistributor.

    The vgic_register_redist_iodev() function can be called from two paths:
    vgic_register_all_redist_iodevs(), which is called via the kvm_vgic_addr()
    device attribute handler. This path already holds the kvm->lock mutex.

    The other path is via kvm_vgic_vcpu_init, which is called through a
    longer chain from kvm_vm_ioctl_create_vcpu(), which releases the
    kvm->lock mutex just before calling kvm_arch_vcpu_create(), so we can
    simply take this mutex again later for our purposes.

    Fixes: ab6f468c10 ("KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs")
    Signed-off-by: Christoffer Dall
    Tested-by: Jean-Philippe Brucker
    Reviewed-by: Eric Auger

    Christoffer Dall
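The serialization described above decouples a redistributor's address from the vcpu index: a per-VM offset is handed out and bumped under a lock. In this sketch a pthread mutex stands in for kvm->lock, a 128 KiB stride is assumed, and the struct and function are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define RD_STRIDE 0x20000ULL   /* assumed size of one redistributor */

/* Hypothetical per-VM state for handing out redistributor addresses. */
struct toy_kvm {
    pthread_mutex_t lock;   /* stands in for kvm->lock */
    uint64_t rdist_base;
    uint64_t rdist_offset;  /* bumped on each successful registration */
};

static uint64_t toy_register_redist(struct toy_kvm *kvm)
{
    uint64_t addr;

    pthread_mutex_lock(&kvm->lock);
    addr = kvm->rdist_base + kvm->rdist_offset;
    /* ... register the iodev at addr; on success: */
    kvm->rdist_offset += RD_STRIDE;
    pthread_mutex_unlock(&kvm->lock);
    return addr;
}
```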
     

09 May, 2017

4 commits

  • …l/git/kvmarm/kvmarm into HEAD

    Second round of KVM/ARM Changes for v4.12.

    Changes include:
    - A fix related to the 32-bit idmap stub
    - A fix to the bitmask used to decode the operands of an AArch32 CP
    instruction
    - We have moved the files shared between arch/arm/kvm and
    arch/arm64/kvm to virt/kvm/arm
    - We add support for saving/restoring the virtual ITS state to
    userspace

    Paolo Bonzini
     
  • The its->initialized field doesn't bring much to the table, and creates
    unnecessary ordering between setting the address and initializing it
    (which amounts to exactly nothing).

    Let's kill it altogether, making KVM_DEV_ARM_VGIC_CTRL_INIT the no-op
    it deserves to be.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Marc Zyngier
     
  • Instead of waiting until the first VCPU is run to register KVM iodevs,
    we can actually create the iodevs when the redist base address is
    set. The only downside is that we must now also check if we need to do
    this for VCPUs which are created after creating the VGIC, because there
    is no enforced ordering between creating the VGIC (and setting its base
    addresses) and creating the VCPUs.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Christoffer Dall
     
  • Pull KVM updates from Paolo Bonzini:
    "ARM:
    - HYP mode stub supports kexec/kdump on 32-bit
    - improved PMU support
    - virtual interrupt controller performance improvements
    - support for userspace virtual interrupt controller (slower, but
    necessary for KVM on the weird Broadcom SoCs used by the Raspberry
    Pi 3)

    MIPS:
    - basic support for hardware virtualization (ImgTec P5600/P6600/I6400
    and Cavium Octeon III)

    PPC:
    - in-kernel acceleration for VFIO

    s390:
    - support for guests without storage keys
    - adapter interruption suppression

    x86:
    - usual range of nVMX improvements, notably nested EPT support for
    accessed and dirty bits
    - emulation of CPL3 CPUID faulting

    generic:
    - first part of VCPU thread request API
    - kvm_stat improvements"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
    kvm: nVMX: Don't validate disabled secondary controls
    KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick
    Revert "KVM: Support vCPU-based gfn->hva cache"
    tools/kvm: fix top level makefile
    KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING
    KVM: Documentation: remove VM mmap documentation
    kvm: nVMX: Remove superfluous VMX instruction fault checks
    KVM: x86: fix emulation of RSM and IRET instructions
    KVM: mark requests that need synchronization
    KVM: return if kvm_vcpu_wake_up() did wake up the VCPU
    KVM: add explicit barrier to kvm_vcpu_kick
    KVM: perform a wake_up in kvm_make_all_cpus_request
    KVM: mark requests that do not need a wakeup
    KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up
    KVM: x86: always use kvm_make_request instead of set_bit
    KVM: add kvm_{test,clear}_request to replace {test,clear}_bit
    s390: kvm: Cpu model support for msa6, msa7 and msa8
    KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
    kvm: better MWAIT emulation for guests
    KVM: x86: virtualize cpuid faulting
    ...

    Linus Torvalds
     

08 May, 2017

1 commit

  • We plan to support different migration ABIs, i.e. characterizing
    the ITS table layout format in guest RAM. For example, a new ABI
    will be needed if vLPIs get supported for the nested use case.

    So let's introduce an array of supported ABIs (at the moment a single
    ABI is supported though). The following characteristics are foreseen
    to vary with the ABI: size of table entries, save/restore operation,
    and the way ABI settings are applied.

    By default, MAX_ABI_REV is applied when the ITS is created. In
    subsequent patches we will introduce a way for userspace to change
    the ABI in use.

    The entry sizes are now set according to the ABI version and are no
    longer hardcoded.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall

    Eric Auger
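An array of ABIs indexed by revision, with the newest applied at ITS creation, could be sketched as follows. The struct, the toy_* names and the 8-byte entry sizes for revision 0 are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of ITS migration ABIs, indexed by revision. */
struct toy_its_abi {
    unsigned int cte_esz;   /* collection table entry size */
    unsigned int dte_esz;   /* device table entry size */
    unsigned int ite_esz;   /* interrupt translation entry size */
};

static const struct toy_its_abi toy_its_abis[] = {
    [0] = { .cte_esz = 8, .dte_esz = 8, .ite_esz = 8 },  /* assumed sizes */
};

#define TOY_MAX_ABI_REV \
    (sizeof(toy_its_abis) / sizeof(toy_its_abis[0]) - 1)

struct toy_its { unsigned int abi_rev; };

static const struct toy_its_abi *toy_its_abi(const struct toy_its *its)
{
    return &toy_its_abis[its->abi_rev];
}

/* At creation time, default to the newest supported ABI. */
static void toy_its_create(struct toy_its *its)
{
    its->abi_rev = TOY_MAX_ABI_REV;
}
```

Entry sizes are then read through toy_its_abi() rather than hardcoded, so adding a revision only means adding a table entry.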
     

09 Apr, 2017

3 commits

  • We don't use these fields anymore so let's nuke them completely.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • There is no need to calculate and maintain live_lrs when we always
    populate the lowest numbered LRs first on every entry and clear all LRs
    on every exit.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • We don't have to save/restore the VMCR on every entry to/from the guest,
    since on GICv2 we can access the control interface from EL1 and on VHE
    systems with GICv3 we can access the control interface from KVM running
    in EL2.

    GICv3 systems without VHE become the rare case and have to
    save/restore the register on each round trip.

    Note that userspace accesses may see out-of-date values if the VCPU is
    running while accessing the VGIC state via the KVM device API, but this
    is already the case and it is up to userspace to quiesce the CPUs before
    reading the CPU registers from the GIC for an up-to-date view.

    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Christoffer Dall