17 Sep, 2015

1 commit

  • This patch removes config option of KVM_ARM_MAX_VCPUS,
    and like other ARCHs, just choose the maximum allowed
    value from hardware, and follows the reasons:

    1) from distribution view, the option has to be
    defined as the max allowed value because it need to
    meet all kinds of virtulization applications and
    need to support most of SoCs;

    2) using a bigger value doesn't introduce extra memory
    consumption, and the help text in Kconfig isn't accurate
    because kvm_vpu structure isn't allocated until request
    of creating VCPU is sent from QEMU;

    3) the main effect is that the field of vcpus[] in 'struct kvm'
    becomes a bit bigger(sizeof(void *) per vcpu) and need more cache
    lines to hold the structure, but 'struct kvm' is one generic struct,
    and it has worked well on other ARCHs already in this way. Also,
    the world switch frequecy is often low, for example, it is ~2000
    when running kernel building load in VM from APM xgene KVM host,
    so the effect is very small, and the difference can't be observed
    in my test at all.

    Cc: Dann Frazier
    Signed-off-by: Ming Lei
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Ming Lei
     

12 Aug, 2015

5 commits

  • In order to remove the crude hack where we sneak the masked bit
    into the timer's control register, make use of the phys_irq_map
    API control the active state of the interrupt.

    This causes some limited changes to allow for potential error
    propagation.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Virtual interrupts mapped to a HW interrupt should only be triggered
    from inside the kernel. Otherwise, you could end up confusing the
    kernel (and the GIC's) state machine.

    Rearrange the injection path so that kvm_vgic_inject_irq is
    used for non-mapped interrupts, and kvm_vgic_inject_mapped_irq is
    used for mapped interrupts. The latter should only be called from
    inside the kernel (timer, irqfd).

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • In order to control the active state of an interrupt, introduce
    a pair of accessors allowing the state to be set/queried.

    This only affects the logical state, and the HW state will only be
    applied at world-switch time.

    Acked-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • In order to be able to feed physical interrupts to a guest, we need
    to be able to establish the virtual-physical mapping between the two
    worlds.

    The mappings are kept in a set of RCU lists, indexed by virtual interrupts.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • As we're about to cram more information in the vgic_lr structure
    (HW interrupt number and additional state information), we switch
    to a layout similar to the HW's:

    - use bitfields to save space (we don't need more than 10 bits
    to represent the irq numbers)
    - source CPU and HW interrupt can share the same field, as
    a SGI doesn't have a physical line.

    Reviewed-by: Alex Bennée
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     

08 Apr, 2015

1 commit


31 Mar, 2015

2 commits

  • Currently we have struct kvm_exit_mmio for encapsulating MMIO abort
    data to be passed on from syndrome decoding all the way down to the
    VGIC register handlers. Now as we switch the MMIO handling to be
    routed through the KVM MMIO bus, it does not make sense anymore to
    use that structure already from the beginning. So we keep the data in
    local variables until we put them into the kvm_io_bus framework.
    Then we fill kvm_exit_mmio in the VGIC only, making it a VGIC private
    structure. On that way we replace the data buffer in that structure
    with a pointer pointing to a single location in a local variable, so
    we get rid of some copying on the way.
    With all of the virtual GIC emulation code now being registered with
    the kvm_io_bus, we can remove all of the old MMIO handling code and
    its dispatching functionality.

    I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something),
    because that touches a lot of code lines without any good reason.

    This is based on an original patch by Nikolay.

    Signed-off-by: Andre Przywara
    Cc: Nikolay Nikolaev
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • Using the framework provided by the recent vgic.c changes, we
    register a kvm_io_bus device on mapping the virtual GICv3 resources.
    The distributor mapping is pretty straight forward, but the
    redistributors need some more love, since they need to be tagged with
    the respective redistributor (read: VCPU) they are connected with.
    We use the kvm_io_bus framework to register one devices per VCPU.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Andre Przywara
     

27 Mar, 2015

3 commits

  • Using the framework provided by the recent vgic.c changes we register
    a kvm_io_bus device when initializing the virtual GICv2.

    Signed-off-by: Andre Przywara
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • Currently we use a lot of VGIC specific code to do the MMIO
    dispatching.
    Use the previous reworks to add kvm_io_bus style MMIO handlers.

    Those are not yet called by the MMIO abort handler, also the actual
    VGIC emulator function do not make use of it yet, but will be enabled
    with the following patches.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • iodev.h contains definitions for the kvm_io_bus framework. This is
    needed both by the generic KVM code in virt/kvm as well as by
    architecture specific code under arch/. Putting the header file in
    virt/kvm and using local includes in the architecture part seems at
    least dodgy to me, so let's move the file into include/kvm, so that a
    more natural "#include " can be used by all of the code.
    This also solves a problem later when using struct kvm_io_device
    in arm_vgic.h.
    Fixing up the FSF address in the GPL header and a wrong include path
    on the way.

    Signed-off-by: Andre Przywara
    Acked-by: Christoffer Dall
    Reviewed-by: Marc Zyngier
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Marc Zyngier

    Andre Przywara
     

14 Mar, 2015

3 commits

  • When a VCPU is no longer running, we currently check to see if it has a
    timer scheduled in the future, and if it does, we schedule a host
    hrtimer to notify is in case the timer expires while the VCPU is still
    not running. When the hrtimer fires, we mask the guest's timer and
    inject the timer IRQ (still relying on the guest unmasking the time when
    it receives the IRQ).

    This is all good and fine, but when migration a VM (checkpoint/restore)
    this introduces a race. It is unlikely, but possible, for the following
    sequence of events to happen:

    1. Userspace stops the VM
    2. Hrtimer for VCPU is scheduled
    3. Userspace checkpoints the VGIC state (no pending timer interrupts)
    4. The hrtimer fires, schedules work in a workqueue
    5. Workqueue function runs, masks the timer and injects timer interrupt
    6. Userspace checkpoints the timer state (timer masked)

    At restore time, you end up with a masked timer without any timer
    interrupts and your guest halts never receiving timer interrupts.

    Fix this by only kicking the VCPU in the workqueue function, and sample
    the expired state of the timer when entering the guest again and inject
    the interrupt and mask the timer only then.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Alex Bennée
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • Migrating active interrupts causes the active state to be lost
    completely. This implements some additional bitmaps to track the active
    state on the distributor and export this to user space.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Alex Bennée
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • There is an interesting bug in the vgic code, which manifests itself
    when the KVM run loop has a signal pending or needs a vmid generation
    rollover after having disabled interrupts but before actually switching
    to the guest.

    In this case, we flush the vgic as usual, but we sync back the vgic
    state and exit to userspace before entering the guest. The consequence
    is that we will be syncing the list registers back to the software model
    using the GICH_ELRSR and GICH_EISR from the last execution of the guest,
    potentially overwriting a list register containing an interrupt.

    This showed up during migration testing where we would capture a state
    where the VM has masked the arch timer but there were no interrupts,
    resulting in a hung test.

    Cc: Marc Zyngier
    Reported-by: Alex Bennee
    Signed-off-by: Christoffer Dall
    Signed-off-by: Alex Bennée
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     

12 Mar, 2015

1 commit

  • We can definitely decide at run-time whether to use the GIC and timers
    or not, and the extra code and data structures that we allocate space
    for is really negligable with this config option, so I don't think it's
    worth the extra complexity of always having to define stub static
    inlines. The !CONFIG_KVM_ARM_VGIC/TIMER case is pretty much an untested
    code path anyway, so we're better off just getting rid of it.

    Signed-off-by: Christoffer Dall
    Acked-by: Marc Zyngier

    Christoffer Dall
     

21 Jan, 2015

8 commits

  • With all of the GICv3 code in place now we allow userland to ask the
    kernel for using a virtual GICv3 in the guest.
    Also we provide the necessary support for guests setting the memory
    addresses for the virtual distributor and redistributors.
    This requires some userland code to make use of that feature and
    explicitly ask for a virtual GICv3.
    Document that KVM_CREATE_IRQCHIP only works for GICv2, but is
    considered legacy and using KVM_CREATE_DEVICE is preferred.

    Signed-off-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • With all the necessary GICv3 emulation code in place, we can now
    connect the code to the GICv3 backend in the kernel.
    The LR register handling is different depending on the emulated GIC
    model, so provide different implementations for each.
    Also allow non-v2-compatible GICv3 implementations (which don't
    provide MMIO regions for the virtual CPU interface in the DT), but
    restrict those hosts to support GICv3 guests only.
    If the device tree provides a GICv2 compatible GICV resource entry,
    but that one is faulty, just disable the GICv2 emulation and let the
    user use at least the GICv3 emulation for guests.
    To provide proper support for the legacy KVM_CREATE_IRQCHIP ioctl,
    note virtual GICv2 compatibility in struct vgic_params and use it
    on creating a VGICv2.

    Signed-off-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • While the generation of a (virtual) inter-processor interrupt (SGI)
    on a GICv2 works by writing to a MMIO register, GICv3 uses the system
    register ICC_SGI1R_EL1 to trigger them.
    Add a trap handler function that calls the new SGI register handler
    in the GICv3 code. As ICC_SRE_EL1.SRE at this point is still always 0,
    this will not trap yet, but will only be used later when all the data
    structures have been initialized properly.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • With everything separated and prepared, we implement a model of a
    GICv3 distributor and redistributors by using the existing framework
    to provide handler functions for each register group.

    Currently we limit the emulation to a model enforcing a single
    security state, with SRE==1 (forcing system register access) and
    ARE==1 (allowing more than 8 VCPUs).

    We share some of the functions provided for GICv2 emulation, but take
    the different ways of addressing (v)CPUs into account.
    Save and restore is currently not implemented.

    Similar to the split-off of the GICv2 specific code, the new emulation
    code goes into a new file (vgic-v3-emul.c).

    Signed-off-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • ICC_SRE_EL1 is a system register allowing msr/mrs accesses to the
    GIC CPU interface for EL1 (guests). Currently we force it to 0, but
    for proper GICv3 support we have to allow guests to use it (depending
    on their selected virtual GIC model).
    So add ICC_SRE_EL1 to the list of saved/restored registers on a
    world switch, but actually disallow a guest to change it by only
    restoring a fixed, once-initialized value.
    This value depends on the GIC model userland has chosen for a guest.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • Currently the maximum number of vCPUs supported is a global value
    limited by the used GIC model. GICv3 will lift this limit, but we
    still need to observe it for guests using GICv2.
    So the maximum number of vCPUs is per-VM value, depending on the
    GIC model the guest uses.
    Store and check the value in struct kvm_arch, but keep it down to
    8 for now.

    Signed-off-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • Currently we only have one virtual GIC model supported, so all guests
    use the same emulation code. With the addition of another model we
    end up with different guests using potentially different vGIC models,
    so we have to split up some functions to be per VM.
    Introduce a vgic_vm_ops struct to hold function pointers for those
    functions that are different and provide the necessary code to
    initialize them.
    Also split up the vgic_init() function to separate out VGIC model
    specific functionality into a separate function, which will later be
    different for a GICv3 model.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • With the introduction of a second emulated GIC model we need to let
    userspace specify the GIC model to use for each VM. Pass the
    userspace provided value down into the vGIC code and store it there
    to differentiate later.

    Signed-off-by: Andre Przywara
    Acked-by: Christoffer Dall
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Andre Przywara
     

15 Dec, 2014

1 commit

  • It is curently possible to run a VM with architected timers support
    without creating an in-kernel VGIC, which will result in interrupts from
    the virtual timer going nowhere.

    To address this issue, move the architected timers initialization to the
    time when we run a VCPU for the first time, and then only initialize
    (and enable) the architected timers if we have a properly created and
    initialized in-kernel VGIC.

    When injecting interrupts from the virtual timer to the vgic, the
    current setup should ensure that this never calls an on-demand init of
    the VGIC, which is the only call path that could return an error from
    kvm_vgic_inject_irq(), so capture the return value and raise a warning
    if there's an error there.

    We also change the kvm_timer_init() function from returning an int to be
    a void function, since the function always succeeds.

    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     

13 Dec, 2014

3 commits

  • Some code paths will need to check to see if the internal state of the
    vgic has been initialized (such as when creating new VCPUs), so
    introduce such a macro that checks the nr_cpus field which is set when
    the vgic has been initialized.

    Also set nr_cpus = 0 in kvm_vgic_destroy, because the error path in
    vgic_init() will call this function, and code should never errornously
    assume the vgic to be properly initialized after an error.

    Acked-by: Marc Zyngier
    Reviewed-by: Eric Auger
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • The vgic_initialized() macro currently returns the state of the
    vgic->ready flag, which indicates if the vgic is ready to be used when
    running a VM, not specifically if its internal state has been
    initialized.

    Rename the macro accordingly in preparation for a more nuanced
    initialization flow.

    Acked-by: Marc Zyngier
    Reviewed-by: Eric Auger
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • VGIC initialization currently happens in three phases:
    (1) kvm_vgic_create() (triggered by userspace GIC creation)
    (2) vgic_init_maps() (triggered by userspace GIC register read/write
    requests, or from kvm_vgic_init() if not already run)
    (3) kvm_vgic_init() (triggered by first VM run)

    We were doing initialization of some state to correspond with the
    state of a freshly-reset GIC in kvm_vgic_init(); this is too late,
    since it will overwrite changes made by userspace using the
    register access APIs before the VM is run. Move this initialization
    earlier, into the vgic_init_maps() phase.

    This fixes a bug where QEMU could successfully restore a saved
    VM state snapshot into a VM that had already been run, but could
    not restore it "from cold" using the -loadvm command line option
    (the symptoms being that the restored VM would run but interrupts
    were ignored).

    Finally rename vgic_init_maps to vgic_init and renamed kvm_vgic_init to
    kvm_vgic_map_resources.

    [ This patch is originally written by Peter Maydell, but I have
    modified it somewhat heavily, renaming various bits and moving code
    around. If something is broken, I am to be blamed. - Christoffer ]

    Acked-by: Marc Zyngier
    Reviewed-by: Eric Auger
    Signed-off-by: Peter Maydell
    Signed-off-by: Christoffer Dall

    Peter Maydell
     

16 Oct, 2014

1 commit

  • The EIRSR and ELRSR registers are 32-bit registers on GICv2, and we
    store these as an array of two such registers on the vgic vcpu struct.
    However, we access them as a single 64-bit value or as a bitmap pointer
    in the generic vgic code, which breaks BE support.

    Instead, store them as u64 values on the vgic structure and do the
    word-swapping in the assembly code, which already handles the byte order
    for BE systems.

    Tested-by: Victor Kamensky
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     

07 Oct, 2014

1 commit

  • The vgic code can be disabled in Kconfig and there are dummy implementations
    of most of the provided API functions for the disabled case.

    However, the newly introduced kvm_vgic_destroy/kvm_vgic_vcpu_destroy
    functions are lacking those dummies, resulting in this build error:

    arch/arm/kvm/arm.c: In function 'kvm_arch_destroy_vm':
    arch/arm/kvm/arm.c:165:2: error: implicit declaration of function 'kvm_vgic_destroy' [-Werror=implicit-function-declaration]
    kvm_vgic_destroy(kvm);
    ^
    arch/arm/kvm/arm.c: In function 'kvm_arch_vcpu_free':
    arch/arm/kvm/arm.c:248:2: error: implicit declaration of function 'kvm_vgic_vcpu_destroy' [-Werror=implicit-function-declaration]
    kvm_vgic_vcpu_destroy(vcpu);
    ^

    This adds two inline helpers to get it to build again in this configuration.

    Signed-off-by: Arnd Bergmann
    Fixes: c1bfb577add ("arm/arm64: KVM: vgic: switch to dynamic allocation")
    Signed-off-by: Christoffer Dall

    Arnd Bergmann
     

19 Sep, 2014

9 commits

  • It is now quite easy to delay the allocation of the vgic tables
    until we actually require it to be up and running (when the first
    vcpu is kicking around, or someones tries to access the GIC registers).

    This allow us to allocate memory for the exact number of CPUs we
    have. As nobody configures the number of interrupts just yet,
    use a fallback to VGIC_NR_IRQS_LEGACY.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Nuke VGIC_NR_IRQS entierly, now that the distributor instance
    contains the number of IRQ allocated to this GIC.

    Also add VGIC_NR_IRQS_LEGACY to preserve the current API.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Now that we can (almost) dynamically size the number of interrupts,
    we're facing an interesting issue:

    We have to evaluate at runtime whether or not an access hits a valid
    register, based on the sizing of this particular instance of the
    distributor. Furthermore, the GIC spec says that accessing a reserved
    register is RAZ/WI.

    For this, add a new field to our range structure, indicating the number
    of bits a single interrupts uses. That allows us to find out whether or
    not the access is in range.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • We now have the information about the number of CPU interfaces in
    the distributor itself. Let's get rid of VGIC_MAX_CPUS, and just
    rely on KVM_MAX_VCPUS where we don't have the choice. Yet.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Having a dynamic number of supported interrupts means that we
    cannot relly on VGIC_NR_SHARED_IRQS being fixed anymore.

    Instead, make it take the distributor structure as a parameter,
    so it can return the right value.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • So far, all the VGIC data structures are statically defined by the
    *maximum* number of vcpus and interrupts it supports. It means that
    we always have to oversize it to cater for the worse case.

    Start by changing the data structures to be dynamically sizeable,
    and allocate them at runtime.

    The sizes are still very static though.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Writes to GICD_ISPENDRn and GICD_ICPENDRn are currently not handled
    correctly for level-triggered interrupts. The spec states that for
    level-triggered interrupts, writes to the GICD_ISPENDRn activate the
    output of a flip-flop which is in turn or'ed with the actual input
    interrupt signal. Correspondingly, writes to GICD_ICPENDRn simply
    deactivates the output of that flip-flop, but does not (of course) affect
    the external input signal. Reads from GICC_IAR will also deactivate the
    flip-flop output.

    This requires us to track the state of the level-input separately from
    the state in the flip-flop. We therefore introduce two new variables on
    the distributor struct to track these two states. Astute readers may
    notice that this is introducing more state than required (because an OR
    of the two states gives you the pending state), but the remaining vgic
    code uses the pending bitmap for optimized operations to figure out, at
    the end of the day, if an interrupt is pending or not on the distributor
    side. Refactoring the code to consider the two state variables all the
    places where we currently access the precomputed pending value, did not
    look pretty.

    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • We have a special bitmap on the distributor struct to keep track of when
    level-triggered interrupts are queued on the list registers. This was
    named irq_active, which is confusing, because the active state of an
    interrupt as per the GIC spec is a different thing, not specifically
    related to edge-triggered/level-triggered configurations but rather
    indicates an interrupt which has been ack'ed but not yet eoi'ed.

    Rename the bitmap and the corresponding accessor functions to irq_queued
    to clarify what this is actually used for.

    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • The irq_state field on the distributor struct is ambiguous in its
    meaning; the comment says it's the level of the input put, but that
    doesn't make much sense for edge-triggered interrupts. The code
    actually uses this state variable to check if the interrupt is in the
    pending state on the distributor so clarify the comment and rename the
    actual variable and accessor methods.

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     

11 Jul, 2014

1 commit