08 Jun, 2017

2 commits

  • First, we define an ABI using the vcpu devices that lets userspace set
    the interrupt numbers for the various timers on both the 32-bit and
    64-bit KVM/ARM implementations.

    Second, we add the definitions for the groups and attributes introduced
    by the above ABI. (We add the PMU define on the 32-bit side as well for
    symmetry and it may get used some day.)

    Third, we set up the arch-specific vcpu device operation handlers to
    call into the timer code for anything related to the
    KVM_ARM_VCPU_TIMER_CTRL group.

    Fourth, we implement support for getting and setting the timer interrupt
    numbers using the above defined ABI in the arch timer code.

    Fifth, we introduce error checking when the arch timer is enabled (which
    happens when a VCPU first runs) to verify that all VCPUs are configured
    to use the same PPI for the timer (as mandated by the architecture) and
    that the virtual and physical timers are not configured to use the same
    IRQ number. (A userspace sketch of the new ABI follows this entry.)

    Signed-off-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Christoffer Dall
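
    A minimal userspace sketch of the ABI described above, assuming a VCPU
    file descriptor in vcpu_fd and the commonly used PPI numbers 27 and 30;
    error handling is omitted:

      #include <linux/kvm.h>
      #include <sys/ioctl.h>
      #include <stdint.h>

      static int set_timer_irq(int vcpu_fd, uint64_t attr, uint32_t ppi)
      {
          struct kvm_device_attr a = {
              .group = KVM_ARM_VCPU_TIMER_CTRL,
              .attr  = attr,
              .addr  = (uint64_t)(unsigned long)&ppi,  /* PPI number, 16..31 */
          };

          return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &a);
      }

      /* Must be done for every VCPU with the same numbers, vtimer != ptimer. */
      void configure_timer_irqs(int vcpu_fd)
      {
          set_timer_irq(vcpu_fd, KVM_ARM_VCPU_TIMER_IRQ_VTIMER, 27);
          set_timer_irq(vcpu_fd, KVM_ARM_VCPU_TIMER_IRQ_PTIMER, 30);
      }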
     
  • We currently initialize the arch timer IRQ numbers from the reset code,
    presumably because we once intended to model multiple CPU or SoC types
    from within the kernel and have hard-coded reset values in the reset
    code.

    As we are moving towards userspace being in charge of more fine-grained
    CPU emulation and stitching together the pieces needed to emulate a
    particular type of CPU, we should no longer have a tight coupling
    between resetting a VCPU and setting IRQ numbers.

    Therefore, move the logic to define and use the default IRQ numbers into
    the timer code and set them immediately when creating the VCPU (a sketch
    of the idea follows this entry).

    Signed-off-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Christoffer Dall
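
    A rough kernel-style sketch of the idea; structure and field names are
    illustrative, and only the commonly used PPI assignments (27 for the
    virtual timer, 30 for the non-secure physical timer) are taken as
    defaults:

      /* Defaults applied at VCPU creation; userspace may override them
       * later through the KVM_ARM_VCPU_TIMER_CTRL vcpu device group. */
      #define DEFAULT_VTIMER_IRQ  27  /* virtual timer PPI */
      #define DEFAULT_PTIMER_IRQ  30  /* non-secure physical timer PPI */

      void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
      {
          struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;

          timer->vtimer_irq = DEFAULT_VTIMER_IRQ;
          timer->ptimer_irq = DEFAULT_PTIMER_IRQ;
      }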
     

09 Apr, 2017

1 commit

  • If you're running with a userspace GIC or other interrupt controller
    (that is, no vgic in the kernel), then you have so far not been able to
    use the architected timers, because the timer output, which is driven
    inside the kernel, was a kernel-only construct between the arch timer
    code and the vgic.

    This patch implements the new KVM_CAP_ARM_USER_IRQ feature, where we use a
    side channel on the kvm_run structure, run->s.regs.device_irq_level, to
    always notify userspace of the timer output levels when using a userspace
    irqchip.

    This works by ensuring that, before we enter the guest, if the timer
    output level has changed compared to what we last told userspace, we
    don't enter the guest, but instead return to userspace to notify it of
    the new level. If we are exiting anyway, because of an MMIO for example,
    and the level changed at the same time, the value is also updated and
    userspace can sample the line as it needs. This is nicely achieved by
    simply always updating the timer_irq_level field after the main run
    loop (see the sketch after this entry).

    Note that the kvm_timer_update_irq trace event is changed to show the
    host IRQ number for the timer instead of the guest IRQ number, because
    the kernel no longer knows which IRQ userspace wires the timer signal
    up to.

    Also note that this patch implements all required functionality but does
    not yet advertise the capability.

    Reviewed-by: Alexander Graf
    Reviewed-by: Marc Zyngier
    Signed-off-by: Alexander Graf
    Signed-off-by: Christoffer Dall

    Alexander Graf
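
    A sketch of the userspace side, assuming a mmap'ed kvm_run structure in
    run and a hypothetical set_gic_line() helper that drives the emulated
    interrupt controller; the PPI numbers are an assumption:

      #include <linux/kvm.h>
      #include <sys/ioctl.h>

      extern void set_gic_line(int intid, int level); /* userspace GIC model */
      #define VTIMER_PPI 27
      #define PTIMER_PPI 30

      void run_vcpu(int vcpu_fd, struct kvm_run *run)
      {
          ioctl(vcpu_fd, KVM_RUN, 0);

          /* The kernel mirrors the timer output levels on every return. */
          set_gic_line(VTIMER_PPI,
                       !!(run->s.regs.device_irq_level & KVM_ARM_DEV_EL1_VTIMER));
          set_gic_line(PTIMER_PPI,
                       !!(run->s.regs.device_irq_level & KVM_ARM_DEV_EL1_PTIMER));
      }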
     

13 Jan, 2017

1 commit

  • The current KVM world switch code unintentionally sets the wrong bits in
    CNTHCTL_EL2 when E2H == 1, which may allow the guest OS to access the
    physical timer. The bit positions in CNTHCTL_EL2 change depending on the
    HCR_EL2.E2H bit: EL1PCEN and EL1PCTEN are bits 1 and 0 when E2H is not
    set, but bits 11 and 10 respectively when E2H is set.

    In fact, on VHE we only need to set those bits once, not on every world
    switch. This is because the host kernel runs in EL2 with HCR_EL2.TGE ==
    1, which makes those bits have no effect on host kernel execution. So we
    just set those bits once for guests, and that's it (the two layouts are
    sketched after this entry).

    Signed-off-by: Jintack Lim
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Jintack Lim
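
    A purely illustrative sketch of the two CNTHCTL_EL2 layouts; the macro
    names are made up for the example, and clearing EL1PCEN/EL1PCTEN is what
    traps guest physical timer/counter accesses to EL2:

      #include <stdbool.h>

      #define CNTHCTL_EL1PCTEN      (1UL << 0)   /* HCR_EL2.E2H == 0 layout */
      #define CNTHCTL_EL1PCEN       (1UL << 1)
      #define CNTHCTL_E2H_EL1PCTEN  (1UL << 10)  /* HCR_EL2.E2H == 1 layout */
      #define CNTHCTL_E2H_EL1PCEN   (1UL << 11)

      /* Bits to clear so that guest physical timer/counter accesses trap. */
      static unsigned long cnthctl_phys_timer_bits(bool e2h)
      {
          if (e2h)
              return CNTHCTL_E2H_EL1PCEN | CNTHCTL_E2H_EL1PCTEN;
          return CNTHCTL_EL1PCEN | CNTHCTL_EL1PCTEN;
      }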
     

20 May, 2016

2 commits

  • We are about to modify the VGIC to allocate all data structures
    dynamically and store mapped IRQ information on a per-IRQ struct, which
    is indeed allocated dynamically at init time.

    Therefore, we cannot record the mapped IRQ info from the timer at timer
    reset time as is done now, because VCPU reset happens before timer init.

    A possible later time to do this is on the first run of each VCPU; it
    just requires us to move the enable state to a per-VCPU state and to
    look up the physical IRQ number when we are about to run the VCPU (a
    sketch of the idea follows this entry).

    Signed-off-by: Christoffer Dall
    Signed-off-by: Andre Przywara

    Christoffer Dall
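
    A kernel-style sketch of the idea with illustrative names: keep an
    enabled flag per VCPU and defer the physical IRQ lookup and mapping
    until the VCPU is about to run, by which point the vgic structures
    exist:

      static void kvm_timer_enable(struct kvm_vcpu *vcpu)
      {
          struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;

          if (timer->enabled)
              return;

          /* The vgic is fully initialized here, unlike at VCPU reset time. */
          timer->phys_irq = lookup_host_timer_irq();               /* illustrative */
          map_virt_to_phys_irq(vcpu, timer->irq, timer->phys_irq); /* illustrative */
          timer->enabled = true;
      }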
     
  • Now that the interface between the arch timer and the VGIC no longer
    requires passing the irq_phys_map entry pointer, let's remove it from
    the virtual arch timer and use the virtual IRQ number directly instead.
    The remaining pointer returned by kvm_vgic_map_phys_irq() will be
    removed in the following patch.

    Signed-off-by: Andre Przywara
    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall

    Andre Przywara
     

01 Mar, 2016

1 commit

  • Programming the active state in the (re)distributor can be an
    expensive operation so it makes some sense to try and reduce
    the number of accesses as much as possible. So far, we
    program the active state on each VM entry, but there is some
    opportunity to do less.

    An obvious solution is to cache the active state in memory,
    and only program it in the HW when conditions change. But
    because the HW can also change things under our feet (the active
    state can transition from 1 to 0 when the guest does an EOI),
    some precautions have to be taken, which amount to only caching
    an "inactive" state, and always programming it otherwise (see
    the sketch after this entry).

    With this in place, we observe a reduction of around 700 cycles
    on a 2GHz GICv2 platform for a NULL hypercall.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
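
    A sketch of the caching rule with illustrative names (the cached flag
    lives per VCPU in practice): a cached "active" cannot be trusted, since
    a guest EOI can clear it in HW behind our back, so only a remembered
    "inactive" state lets us skip the distributor access:

      static void timer_set_phys_active(struct kvm_vcpu *vcpu, int host_irq,
                                        bool active)
      {
          /* Skipping the write is only safe if we know HW is already inactive. */
          if (!active && vcpu->arch.timer_cpu.active_cleared_last)
              return;

          irq_set_irqchip_state(host_irq, IRQCHIP_STATE_ACTIVE, active);
          vcpu->arch.timer_cpu.active_cleared_last = !active;
      }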
     

23 Oct, 2015

2 commits

  • The arch timer currently uses edge-triggered semantics in the sense that
    the line is never sampled by the vgic and lowering the line from the
    timer to the vgic doesn't have any effect on the pending state of
    virtual interrupts in the vgic. This means that we do not support a
    guest with the otherwise valid behavior of (1) disable interrupts (2)
    enable the timer (3) disable the timer (4) enable interrupts. Such a
    guest would validly not expect to see any interrupts on real hardware,
    but will see interrupts on KVM.

    This patch fixes this shortcoming through the following series of
    changes.

    First, we change the flow of the timer/vgic sync/flush operations. Now
    the timer is always flushed/synced before the vgic, because the vgic
    samples the state of the timer output. This has the implication that we
    move the timer operations into non-preemptible sections, but that is
    fine after the previous commit getting rid of hrtimer schedules on every
    entry/exit.

    Second, we change the internal behavior of the timer, letting the timer
    keep track of its previous output state, and only lower/raise the line
    to the vgic when the state changes. Note that in theory this could have
    been accomplished more simply by signalling the vgic every time the
    state *potentially* changed, but we don't want to be hitting the vgic
    more often than necessary.

    Third, we get rid of the use of the map->active field in the vgic and
    instead simply set the interrupt as active on the physical distributor
    whenever the input to the GIC is asserted and conversely clear the
    physical active state when the input to the GIC is deasserted.

    Fourth, and finally, we now initialize the timer PPIs (and all the other
    unused PPIs for now) to be level-triggered, and modify the sync code to
    sample the line state on HW sync and re-inject a new interrupt if it is
    still pending at that time. (A sketch of the level-tracking logic from
    the second change follows this entry.)

    Signed-off-by: Christoffer Dall

    Christoffer Dall
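
    A sketch of the level-tracking logic from the second change, with
    illustrative helper names (the vgic injection call is abstracted behind
    a hypothetical timer_signal_vgic()):

      static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
      {
          struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
          bool new_level = kvm_timer_should_fire(vcpu); /* ENABLE, !IMASK, CVAL hit */

          /* Only bother the vgic when the timer output actually changes. */
          if (new_level == timer->irq.level)
              return;

          timer->irq.level = new_level;
          timer_signal_vgic(vcpu, timer->irq.irq, new_level); /* hypothetical */
      }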
     
  • We currently schedule a soft timer every time we exit the guest if the
    timer did not expire while running the guest. This is really not
    necessary, because the only work we do in the timer work function is to
    kick the vcpu.

    Kicking the vcpu does two things:
    (1) If the vcpu thread is on a waitqueue, make it runnable and remove it
    from the waitqueue.
    (2) If the vcpu is running on a different physical CPU from the one
    doing the kick, it sends a reschedule IPI.

    The second case cannot happen, because the soft timer is only ever
    scheduled when the vcpu is not running. The first case is only relevant
    when the vcpu thread is on a waitqueue, which is only the case when the
    vcpu thread has called kvm_vcpu_block().

    Therefore, we only need to make sure a timer is scheduled for
    kvm_vcpu_block(), which we do by encapsulating all calls to
    kvm_vcpu_block() with kvm_timer_{un}schedule calls.

    Additionally, we only schedule a soft timer if the timer is enabled and
    unmasked, since it is useless otherwise (see the sketch after this
    entry).

    Note that theoretically userspace can use the SET_ONE_REG interface to
    change registers that should cause the timer to fire, even if the vcpu
    is blocked without a scheduled timer, but this case was not supported
    before this patch and we leave it for future work for now.

    Signed-off-by: Christoffer Dall

    Christoffer Dall
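
    A sketch of the blocking-path hooks, assuming illustrative helpers for
    the "can fire"/"should fire" checks and for computing the time until
    expiry:

      void kvm_timer_schedule(struct kvm_vcpu *vcpu)
      {
          struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;

          /* Pointless if the timer is disabled or masked, or already pending. */
          if (!kvm_timer_irq_can_fire(vcpu) || kvm_timer_should_fire(vcpu))
              return;

          hrtimer_start(&timer->timer,
                        ns_to_ktime(kvm_timer_compute_delta(vcpu)),
                        HRTIMER_MODE_REL);
      }

      void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
      {
          hrtimer_cancel(&vcpu->arch.timer_cpu.timer);
      }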
     

14 Mar, 2015

1 commit

  • When a VCPU is no longer running, we currently check to see if it has a
    timer scheduled in the future, and if it does, we schedule a host
    hrtimer to notify us in case the timer expires while the VCPU is still
    not running. When the hrtimer fires, we mask the guest's timer and
    inject the timer IRQ (still relying on the guest unmasking the timer
    when it receives the IRQ).

    This is all good and fine, but when migrating a VM (checkpoint/restore)
    this introduces a race. It is unlikely, but possible, for the following
    sequence of events to happen:

    1. Userspace stops the VM
    2. Hrtimer for VCPU is scheduled
    3. Userspace checkpoints the VGIC state (no pending timer interrupts)
    4. The hrtimer fires, schedules work in a workqueue
    5. Workqueue function runs, masks the timer and injects timer interrupt
    6. Userspace checkpoints the timer state (timer masked)

    At restore time, you end up with a masked timer without any pending
    timer interrupts, and your guest halts, never receiving timer
    interrupts.

    Fix this by only kicking the VCPU in the workqueue function, sampling
    the expired state of the timer when entering the guest again, and
    injecting the interrupt and masking the timer only then (see the sketch
    after this entry).

    Signed-off-by: Christoffer Dall
    Signed-off-by: Alex Bennée
    Signed-off-by: Christoffer Dall

    Christoffer Dall
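
    A sketch of the new flow with illustrative helper and field names: the
    expiry path only wakes the VCPU, and the actual state change happens on
    the next guest entry, so a checkpointing userspace can never observe a
    half-applied expiry:

      static void kvm_timer_expired_work(struct work_struct *work)
      {
          struct arch_timer_cpu *timer =
              container_of(work, struct arch_timer_cpu, expired);
          struct kvm_vcpu *vcpu = timer->vcpu;      /* illustrative back-pointer */

          kvm_vcpu_kick(vcpu);                      /* wake it up, change nothing */
      }

      void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
      {
          /* On guest entry: only now sample the expiry, inject and mask. */
          if (timer_enabled_and_unmasked(vcpu) && timer_has_expired(vcpu))
              kvm_timer_inject_irq(vcpu);           /* injects the IRQ and masks */
      }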
     

12 Mar, 2015

1 commit

  • We can definitely decide at run-time whether to use the GIC and timers
    or not, and the extra code and data structures that we allocate space
    for are really negligible with this config option, so I don't think it's
    worth the extra complexity of always having to define stub static
    inlines. The !CONFIG_KVM_ARM_VGIC/TIMER case is pretty much an untested
    code path anyway, so we're better off just getting rid of it.

    Signed-off-by: Christoffer Dall
    Acked-by: Marc Zyngier

    Christoffer Dall
     

15 Dec, 2014

1 commit

  • It is currently possible to run a VM with architected timers support
    without creating an in-kernel VGIC, which will result in interrupts from
    the virtual timer going nowhere.

    To address this issue, move the architected timers initialization to the
    time when we run a VCPU for the first time, and then only initialize
    (and enable) the architected timers if we have a properly created and
    initialized in-kernel VGIC.

    When injecting interrupts from the virtual timer to the vgic, the
    current setup should ensure that this never calls an on-demand init of
    the VGIC, which is the only call path that could return an error from
    kvm_vgic_inject_irq(), so capture the return value and raise a warning
    if there's an error there.

    We also change the kvm_timer_init() function from returning an int to
    being a void function, since the function always succeeds. (A sketch of
    the enable-time check follows this entry.)

    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
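
    A kernel-style sketch of the enable-time check with illustrative fields
    (the real code also maps the timer interrupt into the vgic at this
    point):

      void kvm_timer_enable(struct kvm_vcpu *vcpu)
      {
          struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;

          if (timer->enabled)
              return;

          /* Without an initialized in-kernel vgic, timer IRQs go nowhere,
           * so leave the timer disabled. */
          if (!irqchip_in_kernel(vcpu->kvm) || !vgic_initialized(vcpu->kvm))
              return;

          timer->enabled = 1;
      }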
     

11 Jul, 2014

1 commit

  • For correct guest suspend/resume behaviour we need to ensure we include
    the generic timer registers for 64-bit guests. As CONFIG_KVM_ARM_TIMER
    is always set for arm64, we don't need to worry about null
    implementations. However, I have re-jigged the kvm_arm_timer_set/get_reg
    declarations to be in the common include/kvm/arm_arch_timer.h header (a
    userspace sketch of saving a timer register follows this entry).

    Acked-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Alex Bennée
    Signed-off-by: Christoffer Dall

    Alex Bennée
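
    A userspace sketch of saving one of the generic timer registers with
    KVM_GET_ONE_REG; the register ID comes from the uapi headers and error
    handling is omitted:

      #include <linux/kvm.h>
      #include <sys/ioctl.h>
      #include <stdint.h>

      uint64_t read_timer_cnt(int vcpu_fd)
      {
          uint64_t val = 0;
          struct kvm_one_reg reg = {
              .id   = KVM_REG_ARM_TIMER_CNT,
              .addr = (uint64_t)(unsigned long)&val,
          };

          ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
          return val;
      }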
     

27 Jun, 2013

1 commit

  • The arch_timer irq numbers (or PPI numbers) are implementation
    dependent, so the host virtual timer irq number can be different from
    the guest virtual timer irq number.

    This patch ensures that the host virtual timer irq number is read from
    the DTB and the guest virtual timer irq is determined based on the vcpu
    target type.

    Signed-off-by: Anup Patel
    Signed-off-by: Pranavkumar Sawargaonkar
    Signed-off-by: Christoffer Dall

    Anup Patel
     

19 May, 2013

1 commit

  • As KVM/arm64 is looming on the horizon, it makes sense to move some
    of the common code to a single location in order to reduce duplication.

    The code could live anywhere. Actually, most of KVM is already built
    with a bunch of ugly ../../.. hacks in the various Makefiles, so we're
    not exactly talking about style here. But maybe it is time to start
    moving into a less ugly direction.

    The include files must be in a "public" location, as they are accessed
    from non-KVM files (arch/arm/kernel/asm-offsets.c).

    For this purpose, introduce two new locations:
    - virt/kvm/arm/ : x86 and ia64 already share the ioapic code in
    virt/kvm, so this could be seen as a (very ugly) precedent.
    - include/kvm/ : there is already an include/xen, and while the
    intent is slightly different, this seems as good a location as
    any

    Eventually, we should probably have independent Makefiles at every
    level (just like everywhere else in the kernel), but this is just
    the first step.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Gleb Natapov

    Marc Zyngier