20 May, 2016

21 commits

  • We now store the mapped hardware IRQ number in our struct, so we
    don't need the irq_phys_map for the new VGIC.
    Implement the hardware IRQ mapping on top of the reworked arch
    timer interface.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall

    Andre Przywara
     
  • map_resources is the last initialization step. It is executed on
    first VCPU run. At that stage the code checks that userspace has provided
    the base addresses for the relevant VGIC regions, which depend on the
    type of VGIC that is exposed to the guest. Also we check if the two
    regions overlap.
    If the checks succeed, we register the respective register frames with
    the kvm_io_bus framework.

    If we emulate a GICv2, the function also forces vgic_init execution if
    it has not been executed yet. Also we map the virtual GIC CPU interface
    onto the guest's CPU interface.

    Signed-off-by: Eric Auger
    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall

    Eric Auger
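
The base address and overlap checks described in this commit can be sketched roughly as follows. This is only an illustration under assumptions: `ADDR_UNDEF_DEMO`, `regions_overlap()` and `check_regions()` are hypothetical names rather than the functions added by the patch, and ranges that wrap around the end of the address space are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "userspace has not set this base address". */
#define ADDR_UNDEF_DEMO UINT64_MAX

/* True if [a, a + a_size) and [b, b + b_size) share at least one byte.
 * Assumes neither range wraps around the end of the address space. */
static bool regions_overlap(uint64_t a, uint64_t a_size,
                            uint64_t b, uint64_t b_size)
{
    return a < b + b_size && b < a + a_size;
}

/* Returns 0 if both bases are set and the regions are disjoint, -1 otherwise. */
static int check_regions(uint64_t dist_base, uint64_t dist_size,
                         uint64_t cpu_base, uint64_t cpu_size)
{
    assert(dist_size > 0 && cpu_size > 0);

    if (dist_base == ADDR_UNDEF_DEMO || cpu_base == ADDR_UNDEF_DEMO)
        return -1;
    if (regions_overlap(dist_base, dist_size, cpu_base, cpu_size))
        return -1;
    return 0;
}
```

Checking for unset addresses before the overlap test keeps the sentinel value out of the range arithmetic.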
     
  • This patch allocates and initializes the data structures used
    to model the vgic distributor and virtual cpu interfaces. At that
    stage the number of IRQs and the number of virtual CPUs are frozen.

    Signed-off-by: Eric Auger
    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall

    Eric Auger
     
  • This patch implements the vgic_creation function, which is
    called on the KVM_CREATE_IRQCHIP VM IOCTL (v2 only) or KVM_CREATE_DEVICE.

    Signed-off-by: Eric Auger
    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall

    Eric Auger
     
  • Implement the kvm_vgic_hyp_init and vgic_probe functions.
    This uses the new firmware independent VGIC probing to support both ACPI
    and DT based systems (code from Marc Zyngier).

    The vgic_global struct is enriched with new fields populated
    by those functions.

    Signed-off-by: Eric Auger
    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall

    Eric Auger
     
  • kvm_vgic_addr is used by the userspace to set the base address of
    the following register regions, as seen by the guest:
    - distributor (v2 and v3),
    - re-distributors (v3),
    - CPU interface (v2).

    Signed-off-by: Eric Auger
    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall

    Eric Auger
     
  • In contrast to GICv2, SGIs in a GICv3 implementation are not triggered
    by an MMIO write, but by a system register write. KVM already knows
    about that register; we just need to implement the handler and wire
    it up to the core KVM/ARM code.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall

    Andre Przywara
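
To illustrate what such a handler has to extract from the written value, here is a minimal decoding sketch. The field positions follow the ICC_SGI1R_EL1 layout in the GICv3 architecture specification; `struct demo_sgi` and `demo_decode_sgi1r()` are hypothetical names and not the handler from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of an SGI generation request. */
struct demo_sgi {
    unsigned int intid;    /* SGI number, 0..15 */
    bool broadcast;        /* IRM: all PEs except the sender */
    uint8_t aff1, aff2, aff3;
    uint16_t target_list;  /* one bit per Aff0 value 0..15 */
};

static struct demo_sgi demo_decode_sgi1r(uint64_t reg)
{
    struct demo_sgi sgi;

    sgi.target_list = (uint16_t)(reg & 0xffff);           /* bits [15:0]  */
    sgi.aff1        = (uint8_t)((reg >> 16) & 0xff);      /* bits [23:16] */
    sgi.intid       = (unsigned int)((reg >> 24) & 0xf);  /* bits [27:24] */
    sgi.aff2        = (uint8_t)((reg >> 32) & 0xff);      /* bits [39:32] */
    sgi.broadcast   = ((reg >> 40) & 1) != 0;             /* bit 40 (IRM) */
    sgi.aff3        = (uint8_t)((reg >> 48) & 0xff);      /* bits [55:48] */
    return sgi;
}
```

A real handler would then walk the VCPUs and mark the SGI pending on each one whose affinity matches, which is left out here.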
     
  • Add an MMIO handling framework to the VGIC emulation:
    Each register is described by its offset, size (or number of bits per
    IRQ, if applicable) and the read/write handler functions. We provide
    initialization macros so that each GIC register can be described
    concisely later.

    Separate dispatch functions for read and write accesses are connected
    to the kvm_io_bus framework and binary-search for the responsible
    register handler based on the offset address within the region.
    We convert the incoming data (referenced by a pointer) to the host's
    endianness and use pass-by-value to hand the data over to the actual
    handler functions.

    The register handler prototype and the endianness conversion are
    courtesy of Christoffer Dall.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall

    Marc Zyngier
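
The offset-based lookup described in this commit can be sketched with the C library's `bsearch()`. This is a simplified illustration, not the kernel code: `struct demo_reg`, the sample table and the handler names are assumptions, and write handlers and the endianness conversion are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical register description: offset, length and a read handler. */
struct demo_reg {
    uint32_t offset;                   /* offset within the MMIO region */
    uint32_t len;                      /* bytes covered by this register */
    uint32_t (*read)(uint32_t offset); /* offset relative to the register */
};

static uint32_t read_ctlr(uint32_t offset) { (void)offset; return 0x1; }
static uint32_t read_zero(uint32_t offset) { (void)offset; return 0; }

/* Sample table; it must stay sorted by offset for the binary search. */
static const struct demo_reg demo_regs[] = {
    { 0x000, 0x04, read_ctlr },
    { 0x080, 0x80, read_zero },
    { 0x100, 0x80, read_zero },
};

/* bsearch() comparator: key is an offset, elem is a register description. */
static int match_reg(const void *key, const void *elem)
{
    uint32_t offset = *(const uint32_t *)key;
    const struct demo_reg *reg = elem;

    if (offset < reg->offset)
        return -1;
    if (offset >= reg->offset + reg->len)
        return 1;
    return 0;
}

static const struct demo_reg *find_reg(uint32_t offset)
{
    return bsearch(&offset, demo_regs,
                   sizeof(demo_regs) / sizeof(demo_regs[0]),
                   sizeof(demo_regs[0]), match_reg);
}

/* Dispatch a 32-bit read; unknown offsets read as zero in this sketch. */
static uint32_t demo_mmio_read(uint32_t offset)
{
    const struct demo_reg *reg = find_reg(offset);

    return reg ? reg->read(offset - reg->offset) : 0;
}
```

Because the comparator treats each entry as a half-open range, an access anywhere inside a multi-word register bank resolves to the same table entry.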
     
  • Tell KVM whether a particular VCPU has an IRQ that needs handling
    in the guest. This is used to decide whether a VCPU is runnable.

    Signed-off-by: Eric Auger
    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Eric Auger
     
  • Implement the framework for syncing IRQs between our emulation and
    the list registers, which represent the guest's view of IRQs.
    This is done in kvm_vgic_flush_hwstate and kvm_vgic_sync_hwstate,
    which get called on guest entry and exit.
    The code talking to the actual GICv2/v3 hardware is added in the
    following patches.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Eric Auger
    Signed-off-by: Andre Przywara
    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall

    Marc Zyngier
     
  • Provide a vgic_queue_irq_unlock() function which decides whether a
    given IRQ needs to be queued to a VCPU's ap_list.
    This should be called whenever an IRQ becomes pending or enabled,
    either as a result of userspace injection, from in-kernel emulated
    devices like the architected timer or from MMIO accesses to the
    distributor emulation.
    Also provide the necessary functions to allow userland to inject an
    IRQ into a guest.
    Since this is the first code that starts using our locking mechanism, we
    add some (hopefully) clear documentation of our locking strategy and
    requirements along with this patch.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Andre Przywara

    Christoffer Dall
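
As a rough sketch of the queueing decision, assuming a heavily simplified per-IRQ state: an IRQ is a candidate for a VCPU's list once it is both pending and enabled and is not queued already. `struct demo_irq` and `demo_queue_target()` are hypothetical; locking and active-state handling are omitted entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified per-IRQ state. */
struct demo_irq {
    bool enabled;
    bool pending;
    bool queued;       /* already on some VCPU's list */
    int target_vcpu;   /* VCPU the IRQ is routed to */
};

/* Returns the VCPU index the IRQ should be queued to, or -1 if it
 * does not need queueing right now. */
static int demo_queue_target(const struct demo_irq *irq)
{
    assert(irq != NULL);

    if (irq->queued)
        return -1;                 /* nothing to do, already listed */
    if (irq->enabled && irq->pending)
        return irq->target_vcpu;   /* deliverable: queue it */
    return -1;                     /* masked or not pending */
}
```

Calling such a function whenever the pending or enabled flag changes matches the rule stated in the commit message.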
     
  • Add a new header file for the new and improved GIC implementation.
    The big change is that we now have a struct vgic_irq per IRQ instead
    of spreading all the information over various bitmaps.

    We include this new header conditionally from within the old header
    file for the time being to avoid touching all the users.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier

    Christoffer Dall
     
  • Currently the PMU uses a member of the struct vgic_dist directly,
    which not only breaks abstraction, but will fail with the new VGIC.
    Abstract this access in the VGIC header file and refactor the validity
    check in the PMU code.

    Signed-off-by: Andre Przywara

    Andre Przywara
     
  • The number of list registers is a property of the underlying system, not
    of the emulated VGIC CPU interface.

    As we are about to move this variable to global state in the new vgic
    for clarity, move it from the legacy implementation as well to make the
    merge of the new code easier.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Andre Przywara
    Reviewed-by: Andre Przywara

    Christoffer Dall
     
  • We are about to modify the VGIC to allocate all data structures
    dynamically and store mapped IRQ information on a per-IRQ struct, which
    is indeed allocated dynamically at init time.

    Therefore, we cannot record the mapped IRQ info from the timer at timer
    reset time like it's done now, because VCPU reset happens before timer
    init.

    A possible later time to do this is on the first run of a VCPU; it
    just requires us to make the enable state per-VCPU and do the lookup
    of the physical IRQ number when we are about to run the VCPU.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Andre Przywara

    Christoffer Dall
     
  • Now that the virtual arch timer does not care about the irq_phys_map
    anymore, let's rework kvm_vgic_map_phys_irq() to return an error
    value instead. Any reference to that mapping can later be done by
    passing the correct combination of VCPU and virtual IRQ number.
    This makes the irq_phys_map handling completely private to the
    VGIC code.

    Signed-off-by: Andre Przywara
    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall

    Andre Przywara
     
  • Now that the interface between the arch timer and the VGIC does not
    require passing the irq_phys_map entry pointer anymore, let's remove
    it from the virtual arch timer and use the virtual IRQ number instead
    directly.
    The remaining pointer returned by kvm_vgic_map_phys_irq() will be
    removed in the following patch.

    Signed-off-by: Andre Przywara
    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall

    Andre Przywara
     
  • The communication of a Linux IRQ number from outside the VGIC to the
    vgic was a leftover from the days when the vgic code cared about how a
    particular device injects virtual interrupts mapped to a physical
    interrupt.

    We can safely remove this notion, leaving all physical IRQ handling to
    be done in the device driver (the arch timer in this case), which makes
    room for a saner API for the new VGIC.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Andre Przywara
    Reviewed-by: Eric Auger

    Christoffer Dall
     
  • kvm_vgic_unmap_phys_irq() only needs the virtual IRQ number, so let's
    just pass that between the arch timer and the VGIC to get rid of
    the irq_phys_map pointer.

    Signed-off-by: Andre Przywara
    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall

    Andre Przywara
     
  • For getting the active state of a mapped IRQ, we actually only need
    the virtual IRQ number, not the pointer to the mapping entry.
    Pass the virtual IRQ number from the arch timer to the VGIC directly.

    Signed-off-by: Andre Przywara
    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall

    Andre Przywara
     
  • When we want to inject a hardware mapped IRQ into a guest, we actually
    only need the virtual IRQ number from the irq_phys_map.
    So let's pass this number directly from the arch timer to the VGIC
    to avoid using the map as a parameter.

    Signed-off-by: Andre Przywara
    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall

    Andre Przywara
     

03 May, 2016

1 commit

  • Currently, the firmware tables are parsed twice: once in the GIC
    drivers and once when initializing the vGIC. This duplicates code
    and makes it more tedious to add support for another firmware
    table (like ACPI).

    Use the recently introduced helper gic_get_kvm_info() to get
    information about the virtual GIC.

    With this change, the virtual GIC becomes agnostic to the firmware
    table and KVM will be able to initialize the vGIC on ACPI.

    Signed-off-by: Julien Grall
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Julien Grall
     

09 Mar, 2016

2 commits

  • Just like on GICv2, we're a bit hammer-happy with the GICv3
    registers, and access them more often than we should.

    Adopt a policy similar to what we do for GICv2, saving/restoring only
    the minimal set of registers. As we don't access the registers
    linearly anymore (we may skip some), the convoluted accessors become
    slightly simpler, and we can drop the ugly indexing macro that
    tended to confuse the reviewers.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • GICv2 registers are *slow*. As in "terrifyingly slow". Which is bad.
    But we're equally bad, as we make a point of accessing them even if
    we don't have any interrupt in flight.

    A good solution is to first find out if we have anything useful to
    write into the GIC, and if we don't, to simply not do it. This
    involves tracking which LRs actually have something valid there.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
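
The idea of tracking live list registers can be sketched as a bitmask that gates every register access. Everything here is an assumption made for illustration: `struct demo_cpu_if`, the `demo_hw_lr` array standing in for the hardware registers, and the write counter.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_LR 4

/* Hypothetical in-memory CPU interface state. */
struct demo_cpu_if {
    uint32_t lr[DEMO_NR_LR];  /* saved list register values */
    uint32_t live_lrs;        /* bit i set: lr[i] holds a valid interrupt */
};

static uint32_t demo_hw_lr[DEMO_NR_LR];  /* stands in for the slow hardware */
static unsigned int demo_hw_writes;      /* counts simulated register writes */

/* Write back only the list registers that actually hold something. */
static void demo_restore(const struct demo_cpu_if *cpu_if)
{
    int i;

    if (!cpu_if->live_lrs)
        return;               /* no interrupt in flight: touch nothing */

    for (i = 0; i < DEMO_NR_LR; i++) {
        if (cpu_if->live_lrs & (1u << i)) {
            demo_hw_lr[i] = cpu_if->lr[i];
            demo_hw_writes++;
        }
    }
}
```

With no live LRs the function returns before any access, which is the common case the commit message is targeting.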
     

01 Mar, 2016

14 commits

  • Programming the active state in the (re)distributor can be an
    expensive operation so it makes some sense to try and reduce
    the number of accesses as much as possible. So far, we
    program the active state on each VM entry, but there is some
    opportunity to do less.

    An obvious solution is to cache the active state in memory,
    and only program it in the HW when conditions change. But
    because the HW can also change things under our feet (the active
    state can transition from 1 to 0 when the guest does an EOI),
    some precautions have to be taken, which amount to only caching
    an "inactive" state, and always programming it otherwise.

    With this in place, we observe a reduction of around 700 cycles
    on a 2GHz GICv2 platform for a NULL hypercall.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
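
A minimal sketch of the "cache only the inactive state" rule, with a counter standing in for the expensive distributor access. `struct demo_timer` and `demo_program_active()` are hypothetical names; the real code also has to invalidate the cache at other points, which is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-timer state. */
struct demo_timer {
    bool cached_inactive;    /* last value programmed was "inactive" */
    unsigned int hw_writes;  /* counts simulated distributor accesses */
};

static void demo_program_active(struct demo_timer *t, bool active)
{
    assert(t != NULL);

    /* Only an "inactive" state may be trusted: the hardware can clear
     * the active bit on its own, but it never sets it behind our back. */
    if (!active && t->cached_inactive)
        return;

    t->hw_writes++;                /* stand-in for the expensive access */
    t->cached_inactive = !active;
}
```

Repeated requests for "inactive" cost a single access, while every request for "active" still reaches the hardware.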
     
  • To configure the virtual PMUv3 overflow interrupt number, we use the
    vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_PMU_V3_IRQ
    attribute within the KVM_ARM_VCPU_PMU_V3_CTRL group.

    After configuring the PMUv3, call the vcpu ioctl with attribute
    KVM_ARM_VCPU_PMU_V3_INIT to initialize the PMUv3.

    Signed-off-by: Shannon Zhao
    Acked-by: Peter Maydell
    Reviewed-by: Andrew Jones
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Shannon Zhao
     
  • To support guest PMUv3, use one bit of the VCPU INIT feature array.
    Initialize the PMU when initializing the vcpu with that bit and the PMU
    overflow interrupt set.

    Signed-off-by: Shannon Zhao
    Acked-by: Peter Maydell
    Reviewed-by: Andrew Jones
    Signed-off-by: Marc Zyngier

    Shannon Zhao
     
  • When KVM frees a VCPU, it needs to free the perf_event of the PMU as well.

    Signed-off-by: Shannon Zhao
    Reviewed-by: Marc Zyngier
    Reviewed-by: Andrew Jones
    Signed-off-by: Marc Zyngier

    Shannon Zhao
     
  • When resetting a vcpu, the PMU state needs to be reset to its initial status.

    Signed-off-by: Shannon Zhao
    Reviewed-by: Marc Zyngier
    Reviewed-by: Andrew Jones
    Signed-off-by: Marc Zyngier

    Shannon Zhao
     
  • When calling perf_event_create_kernel_counter to create a perf_event,
    assign an overflow handler. Then, when the perf event overflows, set the
    corresponding bit of the guest PMOVSSET register. If this counter is enabled
    and its interrupt is enabled as well, kick the vcpu to sync the
    interrupt.

    On VM entry, if a counter has overflowed and the interrupt level has
    changed, inject the interrupt with the corresponding level. On VM exit, sync
    the interrupt level as well if it has changed.

    Signed-off-by: Shannon Zhao
    Reviewed-by: Marc Zyngier
    Reviewed-by: Andrew Jones
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Shannon Zhao
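
The overflow path can be sketched as plain bit operations on three per-VCPU registers. `struct demo_pmu_regs` and `demo_on_overflow()` are hypothetical; the return value stands in for "kick the vcpu".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical copy of the guest-visible PMU registers involved. */
struct demo_pmu_regs {
    uint32_t pmovsset;    /* overflow status, one bit per counter */
    uint32_t pmcntenset;  /* counter enable bits */
    uint32_t pmintenset;  /* overflow interrupt enable bits */
};

/* Record an overflow of counter idx. Returns true if the VCPU should be
 * kicked, i.e. the counter and its interrupt are both enabled. */
static bool demo_on_overflow(struct demo_pmu_regs *r, unsigned int idx)
{
    uint32_t mask;

    assert(idx < 32);
    mask = 1u << idx;

    r->pmovsset |= mask;
    return (r->pmcntenset & mask) && (r->pmintenset & mask);
}
```

The overflow bit is recorded unconditionally, so the guest can still read it from PMOVSSET even when no interrupt is raised.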
     
  • According to the ARMv8 spec, writing 1 to PMCR.E enables all counters
    that are enabled by PMCNTENSET, while writing 0 to PMCR.E disables all
    counters. Writing 1 to PMCR.P resets all event counters, not
    including PMCCNTR, to zero. Writing 1 to PMCR.C resets PMCCNTR to
    zero.

    Signed-off-by: Shannon Zhao
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Shannon Zhao
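
The three control bits can be sketched like this. The bit positions (E = bit 0, P = bit 1, C = bit 2) and the use of index 31 for the cycle counter follow the ARMv8 PMU register layout; `struct demo_pmu` and `demo_pmcr_write()` are hypothetical and do not model the perf events behind the counters.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PMCR_E     (1u << 0)  /* enable */
#define DEMO_PMCR_P     (1u << 1)  /* reset event counters */
#define DEMO_PMCR_C     (1u << 2)  /* reset cycle counter */
#define DEMO_CYCLE_IDX  31         /* PMCCNTR uses bit/index 31 */

/* Hypothetical PMU model. */
struct demo_pmu {
    uint64_t counter[32];
    uint32_t cntenset;   /* counters selected via PMCNTENSET */
    uint32_t running;    /* counters that are actually counting */
};

static void demo_pmcr_write(struct demo_pmu *pmu, uint32_t val)
{
    int i;

    /* E gates every counter that PMCNTENSET has selected. */
    pmu->running = (val & DEMO_PMCR_E) ? pmu->cntenset : 0;

    if (val & DEMO_PMCR_P)                 /* event counters only */
        for (i = 0; i < DEMO_CYCLE_IDX; i++)
            pmu->counter[i] = 0;

    if (val & DEMO_PMCR_C)                 /* cycle counter only */
        pmu->counter[DEMO_CYCLE_IDX] = 0;
}
```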
     
  • Add an access handler which emulates writing and reading the PMSWINC
    register, and add support for creating a software increment event.

    Signed-off-by: Shannon Zhao
    Reviewed-by: Andrew Jones
    Signed-off-by: Marc Zyngier

    Shannon Zhao
     
  • Since the reset value of PMOVSSET and PMOVSCLR is UNKNOWN, use
    reset_unknown for their reset handler. Add a handler to emulate writing
    the PMOVSSET or PMOVSCLR register.

    When a non-zero value is written to PMOVSSET and the counter and its
    interrupt are enabled, kick this vcpu to sync the PMU interrupt.

    Signed-off-by: Shannon Zhao
    Reviewed-by: Andrew Jones
    Signed-off-by: Marc Zyngier

    Shannon Zhao
     
  • When we use tools like perf on the host, perf passes the event type and
    the id of this event type category to the kernel, which then maps them to
    a hardware event number and writes this number to the PMU PMEVTYPER_EL0
    register. When getting the event number in KVM, directly use the raw
    event type to create a perf_event for it.

    Signed-off-by: Shannon Zhao
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Shannon Zhao
     
  • Since the reset value of PMCNTENSET and PMCNTENCLR is UNKNOWN, use
    reset_unknown for their reset handler. Add a handler to emulate writing
    the PMCNTENSET or PMCNTENCLR register.

    When writing to PMCNTENSET, call perf_event_enable to enable the perf
    event. When writing to PMCNTENCLR, call perf_event_disable to disable
    the perf event.

    Signed-off-by: Shannon Zhao
    Signed-off-by: Marc Zyngier

    Shannon Zhao
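
The set/clear register pair can be sketched as follows. `struct demo_pmu_en` is hypothetical, and the `event_running` flags merely stand in for the perf_event_enable and perf_event_disable calls named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter-enable state. */
struct demo_pmu_en {
    uint32_t enable_mask;     /* value read back from either register */
    bool event_running[32];   /* stand-in for the perf event state */
};

/* Writing 1 bits to PMCNTENSET enables the matching counters. */
static void demo_write_cntenset(struct demo_pmu_en *p, uint32_t val)
{
    int i;

    p->enable_mask |= val;
    for (i = 0; i < 32; i++)
        if (val & (1u << i))
            p->event_running[i] = true;   /* perf_event_enable() here */
}

/* Writing 1 bits to PMCNTENCLR disables the matching counters. */
static void demo_write_cntenclr(struct demo_pmu_en *p, uint32_t val)
{
    int i;

    p->enable_mask &= ~val;
    for (i = 0; i < 32; i++)
        if (val & (1u << i))
            p->event_running[i] = false;  /* perf_event_disable() here */
}
```

Zero bits in the written value leave the corresponding counters untouched in both directions, which is what makes the pair usable without a read-modify-write.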
     
  • These kinds of registers include PMEVCNTRn, PMCCNTR and PMXEVCNTR, which
    is mapped to PMEVCNTRn.

    The access handler translates all aarch32 register offsets to aarch64
    ones and uses vcpu_sys_reg() to access their values, to avoid having to
    handle big endian.

    When reading these registers, return the sum of the register value and
    the value the perf event counts.

    Signed-off-by: Shannon Zhao
    Reviewed-by: Andrew Jones
    Signed-off-by: Marc Zyngier

    Shannon Zhao
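
The read path amounts to one addition. In this sketch the `width_mask` parameter is an assumption added to show how a counter narrower than 64 bits would wrap; `demo_read_counter()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Value seen by the guest: the stored register value plus whatever the
 * backing perf event has counted since, truncated to the counter width. */
static uint64_t demo_read_counter(uint64_t reg_value, uint64_t perf_count,
                                  uint64_t width_mask)
{
    return (reg_value + perf_count) & width_mask;
}
```

For example, a 32-bit counter holding 0xfffffff0 whose perf event has counted 0x20 more events would read back as 0x10.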
     
  • Add a reset handler which gets the host value of PMCR_EL0 and makes the
    writable bits architecturally UNKNOWN, except PMCR.E which is zero. Add
    an access handler for PMCR.

    Signed-off-by: Shannon Zhao
    Reviewed-by: Andrew Jones
    Signed-off-by: Marc Zyngier

    Shannon Zhao
     
  • Here we plan to support a virtual PMU for the guest by full software
    emulation, so define some basic structs and functions in preparation for
    further steps. Define struct kvm_pmc for a performance monitor counter and
    struct kvm_pmu for the performance monitor unit of each vcpu. According to
    the ARMv8 spec, the PMU contains at most 32 (ARMV8_PMU_MAX_COUNTERS)
    counters.

    Since this only supports ARM64 (or PMUv3), add a separate config symbol
    for it.

    Signed-off-by: Shannon Zhao
    Acked-by: Marc Zyngier
    Reviewed-by: Andrew Jones
    Signed-off-by: Marc Zyngier

    Shannon Zhao
     

14 Dec, 2015

1 commit


25 Nov, 2015

1 commit

  • We were incorrectly removing the active state from the physical
    distributor on the timer interrupt when the timer output level was
    deasserted. We shouldn't be doing this without considering the virtual
    interrupt's active state, because the architecture requires that when an
    LR has the HW bit set and the pending or active bits set, then the
    physical interrupt must also have the corresponding bits set.

    This addresses an issue where we have been observing an inconsistency
    between the LR state and the physical distributor state where the LR
    state was active and the physical distributor was not active, which
    shouldn't happen.

    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall