29 Sep, 2016

1 commit


28 Sep, 2016

2 commits

  • If the vgic hasn't been created and initialized, we shouldn't attempt to
    look at its data structures or flush/sync anything to the GIC hardware.

    This fixes an issue reported by Alexander Graf when using a userspace
    irqchip.

    Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
    Cc: stable@vger.kernel.org
    Reported-by: Alexander Graf
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • If userspace creates a PMU for the VCPU, but doesn't create an in-kernel
    irqchip, then we end up in a nasty path where we try to take an
    uninitialized spinlock, which can lead to all sorts of breakages.

    Luckily, QEMU always creates the VGIC before the PMU, so we can
    establish this as ABI and check for the VGIC in the PMU init stage.
    This can be relaxed at a later time if we want to support PMU with a
    userspace irqchip.

    Cc: stable@vger.kernel.org
    Cc: Shannon Zhao
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
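The ordering rule above can be illustrated with a minimal userspace sketch. The structs, the helper name and the choice of -ENODEV as the error code are all hypothetical assumptions, not the kernel's actual PMU code. The point is that the PMU init path refuses to proceed when no in-kernel VGIC exists, so it never reaches the uninitialized spinlock.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-VM state reduced to what this check needs. */
struct vm_sketch {
	bool vgic_initialized;   /* true once an in-kernel irqchip exists */
};

/* Hypothetical per-vCPU PMU state. */
struct pmu_sketch {
	bool created;
};

/*
 * Refuse PMU init unless the in-kernel VGIC already exists, instead of
 * later touching VGIC structures (and their lock) that were never set up.
 * The -ENODEV value is an assumption for illustration.
 */
static int pmu_init_sketch(const struct vm_sketch *vm, struct pmu_sketch *pmu)
{
	if (!vm->vgic_initialized)
		return -ENODEV;
	pmu->created = true;
	return 0;
}
```

Establishing "VGIC before PMU" as ABI keeps the check this simple; relaxing it later for a userspace irqchip would mean replacing the early return with a separate non-VGIC path.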
     

22 Sep, 2016

6 commits

  • This patch allows building and using vgic-v3 in 32-bit mode.

    Unfortunately, it cannot be split into several steps without extra
    stubs to keep the patches independent and bisectable. For instance,
    virt/kvm/arm/vgic/vgic-v3.c uses functions from vgic-v3-sr.c, and
    handling access to the GICv3 cpu interface from the guest requires
    vgic_v3.vgic_sre to be already defined.

    This is how support has been added:

    * handle SGI requests from the guest

    * report configured SRE on access to GICv3 cpu interface from the guest

    * required vgic-v3 macros are provided via uapi.h

    * static keys are used to select GIC backend

    * to make vgic-v3 build, the KVM_ARM_VGIC_V3 guard is removed along
    with the static inlines

    Acked-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Vladimir Murzin
    Signed-off-by: Christoffer Dall

    Vladimir Murzin
     
  • We have a couple of 64-bit registers defined in the GICv3 architecture,
    so unsigned long accesses to these registers will only access a single
    32-bit part of that register. On the other hand, these registers can't
    be accessed as 64-bit with a single instruction like ldrd/strd or
    ldmia/stmia if we run a 32-bit host, because KVM does not support
    access to MMIO space done by these instructions.

    It means that a 32-bit guest accesses these registers in 32-bit
    chunks, so the only thing we need to do is to ensure that
    extract_bytes() always takes 64-bit data.

    Acked-by: Marc Zyngier
    Signed-off-by: Vladimir Murzin
    Signed-off-by: Christoffer Dall

    Vladimir Murzin
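A minimal sketch of why the parameter type matters, assuming the helper extracts `num` bytes starting at byte `offset` of the register value. The body is a userspace illustration, not a copy of the kernel's `extract_bytes()`. With `uint64_t` data the upper 32-bit half stays reachable even when `unsigned long` is only 32 bits wide.

```c
#include <stdint.h>

/*
 * Extract @num bytes starting at byte @offset of @data.
 * Assumes offset + num <= 8, i.e. the access stays inside one
 * 64-bit register.
 */
static uint64_t extract_bytes_sketch(uint64_t data, unsigned int offset,
				     unsigned int num)
{
	/* Avoid an undefined 64-bit shift when all 8 bytes are requested. */
	uint64_t mask = (num >= 8) ? UINT64_MAX
				   : ((UINT64_C(1) << (num * 8)) - 1);

	return (data >> (offset * 8)) & mask;
}
```

A 32-bit guest reading the high word of a 64-bit register would then correspond to `extract_bytes_sketch(reg, 4, 4)`.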
     
  • Well, this patch is looking ahead, but we'll get the following
    compiler warnings as soon as we introduce vgic-v3 to the 32-bit world:

    CC arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.o
    arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_mmio_read_v3r_typer':
    arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:184:35: warning: left shift count >= width of type [-Wshift-count-overflow]
    value = (mpidr & GENMASK(23, 0)) << 32;
    ^
    In file included from ./include/linux/kernel.h:10:0,
    from ./include/asm-generic/bug.h:13,
    from ./arch/arm/include/asm/bug.h:59,
    from ./include/linux/bug.h:4,
    from ./include/linux/io.h:23,
    from ./arch/arm/include/asm/arch_gicv3.h:23,
    from ./include/linux/irqchip/arm-gic-v3.h:411,
    from arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:14:
    arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_v3_dispatch_sgi':
    ./include/linux/bitops.h:6:24: warning: left shift count >= width of type [-Wshift-count-overflow]
    #define BIT(nr) (1UL << (nr))
    ^
    arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:614:20: note: in expansion of macro 'BIT'
    broadcast = reg & BIT(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
    ^

    Let's fix them now.

    Acked-by: Marc Zyngier
    Signed-off-by: Vladimir Murzin
    Signed-off-by: Christoffer Dall

    Vladimir Murzin
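Both warnings come from shifting a 32-bit `unsigned long` by 32 or more bits. A minimal userspace sketch of the usual remedy is to widen to a 64-bit type before shifting. The macros below are local stand-ins for the kernel's `BIT_ULL()`/`GENMASK_ULL()`, and whether the actual patch used exactly these helpers is an assumption. Bit 40 is the IRM (routing mode) field of ICC_SGI1R in the GICv3 architecture.

```c
#include <stdint.h>

/* Local stand-ins for the kernel's 64-bit bit helpers. */
#define BIT_ULL_SKETCH(nr)        (UINT64_C(1) << (nr))
#define GENMASK_ULL_SKETCH(h, l) \
	((UINT64_MAX >> (63 - (h))) & ~((UINT64_C(1) << (l)) - 1))

/* ICC_SGI1R IRM field: 1 means "all PEs except self". */
#define ICC_SGI1R_IRM_BIT_SKETCH 40

/* Place MPIDR affinity bits [23:0] into bits [55:32] of a TYPER value. */
static uint64_t typer_affinity_sketch(uint32_t mpidr)
{
	/* Widen first: shifting a 32-bit value by 32 is undefined in C. */
	return (uint64_t)(mpidr & GENMASK_ULL_SKETCH(23, 0)) << 32;
}

static int sgi_is_broadcast_sketch(uint64_t reg)
{
	return (reg & BIT_ULL_SKETCH(ICC_SGI1R_IRM_BIT_SKETCH)) != 0;
}
```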
     
  • So far, the ITS code has been guarded by the KVM_ARM_VGIC_V3 config
    option, which was introduced to hide everything specific to vgic-v3
    from the 32-bit world. We are going to support vgic-v3 in the 32-bit
    world and KVM_ARM_VGIC_V3 will go away, but we don't have support for
    ITS there yet and we need to continue keeping ITS away.
    Introduce a new config option to prevent the ITS code from being built
    in 32-bit mode once support for vgic-v3 is done.

    Signed-off-by: Vladimir Murzin
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Vladimir Murzin
     
  • So we can reuse the code under arch/arm

    Signed-off-by: Vladimir Murzin
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Vladimir Murzin
     
  • Currently the GIC backend is selected via the alternatives framework,
    and this is fine. We are going to introduce vgic-v3 to the 32-bit
    world, where we don't have a patching framework at hand, so we can
    either check support for GICv3 every time we need to choose which
    backend to use or try to optimise it by using static keys. The latter
    looks quite promising because we can share the logic involved in
    selecting the GIC backend between architectures if both use static keys.

    This patch moves arm64 from alternative to static keys framework for
    selecting GIC backend. For that we embed static key into vgic_global
    and enable the key during vgic initialisation based on what has
    already been exposed by the host GIC driver.

    Acked-by: Marc Zyngier
    Signed-off-by: Vladimir Murzin
    Signed-off-by: Christoffer Dall

    Vladimir Murzin
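A userspace analogue of the selection logic, sketched under loud assumptions: a plain `bool` stands in for the static key embedded in vgic_global (in the kernel this would be a static branch test, patched once at init rather than a memory load), and the field and function names here are hypothetical. The key is set once during init from what the host GIC driver reports, and shared code then dispatches on it.

```c
#include <stdbool.h>

/* Hypothetical global state; the bool stands in for a static key. */
struct vgic_global_sketch {
	bool gicv3_cpuif;   /* set once at init from the host GIC driver */
};

static struct vgic_global_sketch vgic_global_sketch;

/* Hypothetical backend hooks; return values only identify the backend. */
static int vgic_v2_save_sketch(void) { return 2; }
static int vgic_v3_save_sketch(void) { return 3; }

/* "Enable the key" during initialisation. */
static void vgic_init_sketch(bool host_has_gicv3)
{
	vgic_global_sketch.gicv3_cpuif = host_has_gicv3;
}

/* Backend selection that arm and arm64 could share. */
static int vgic_save_state_sketch(void)
{
	if (vgic_global_sketch.gicv3_cpuif)
		return vgic_v3_save_sketch();
	return vgic_v2_save_sketch();
}
```

The design point is that the selection logic lives in one place; only the mechanism that makes the branch cheap (alternatives versus static keys) differs per architecture.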
     

16 Sep, 2016

2 commits

  • This commit adds the ability for archs to export
    per-vcpu information via a new per-vcpu dir in
    the VM's debugfs directory.

    If kvm_arch_has_vcpu_debugfs() returns true, then KVM
    will create a vcpu dir for each vCPU in the VM's
    debugfs directory. Then kvm_arch_create_vcpu_debugfs()
    is responsible for populating each vcpu directory
    with arch specific entries.

    The per-vcpu path in debugfs will look like:

    /sys/kernel/debug/kvm/29162-10/vcpu0
    /sys/kernel/debug/kvm/29162-10/vcpu1

    This is all arch specific for now because the only
    user of this interface (x86) wants to export x86-specific
    per-vcpu information to user-space.

    Signed-off-by: Luiz Capitulino
    Signed-off-by: Paolo Bonzini

    Luiz Capitulino
     
  • This makes it possible to call kvm_destroy_vm_debugfs() from
    kvm_create_vm_debugfs() in error conditions.

    Reviewed-by: Paolo Bonzini
    Signed-off-by: Luiz Capitulino
    Signed-off-by: Paolo Bonzini

    Luiz Capitulino
     

13 Sep, 2016

1 commit


08 Sep, 2016

13 commits

  • Remove two unnecessary labels now that kvm_timer_hyp_init is not
    creating its own workqueue anymore.

    Signed-off-by: Paolo Bonzini
    Signed-off-by: Christoffer Dall

    Paolo Bonzini
     
  • If, when proxying a GICV access at EL2, we detect that the guest is
    doing something silly, report an EL1 SError instead of ignoring the
    access.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • So far, we've been disabling KVM on systems where the GICV region couldn't
    be safely given to a guest. Now that we're able to handle this access
    safely by emulating it in HYP, we can enable this feature when we detect
    an unsafe configuration.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Now that we have the necessary infrastructure to handle MMIO accesses
    in HYP, perform the GICV access on behalf of the guest. This requires
    checking that the access is strictly 32bit, properly aligned, and
    falls within the expected range.

    When all conditions are satisfied, we perform the access and tell
    the rest of the HYP code that the instruction has been correctly
    emulated.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
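The three conditions named above (strictly 32-bit, properly aligned, within the expected range) can be sketched as a single predicate. This is a minimal illustration, not the kernel's HYP code; `gicv_base` and `gicv_size` are hypothetical parameters describing the GICV region in the guest's address space, and the sketch assumes `gicv_size` is at least 4 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

static bool gicv_access_ok_sketch(uint64_t fault_ipa, unsigned int len,
				  uint64_t gicv_base, uint64_t gicv_size)
{
	uint64_t offset;

	/* Strictly 32-bit accesses only. */
	if (len != sizeof(uint32_t))
		return false;

	/* Naturally aligned on a 32-bit boundary. */
	if (fault_ipa & (sizeof(uint32_t) - 1))
		return false;

	/* The whole access must fall inside the GICV region. */
	if (fault_ipa < gicv_base)
		return false;
	offset = fault_ipa - gicv_base;
	return offset + len <= gicv_size;
}
```

When the predicate fails, the companion change in this series reports an EL1 SError to the guest instead of performing the access.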
     
  • In order to efficiently perform the GICV access on behalf of the
    guest, we need to be able to avoid going back all the way to
    the host kernel.

    For this, we introduce a new hook in the world switch code,
    conveniently placed just after populating the fault info.
    At that point, we have only saved/restored the GP registers,
    and we can quickly perform all the required checks (data abort,
    translation fault, valid faulting syndrome, not an external
    abort, not a PTW).

    Coming back from the emulation code, we need to skip the emulated
    instruction. This involves an additional bit of save/restore in
    order to be able to access the guest's PC (and possibly CPSR if
    this is a 32bit guest).

    At this stage, no emulation code is provided.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Marc Zyngier
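The quick pre-filter listed above can be sketched as a check on the syndrome value. The bit positions follow the ARMv8 ESR_ELx encoding for data aborts (EC in bits [31:26], ISV bit 24, EA bit 9, S1PTW bit 7, fault status in bits [5:0]). The macro and function names are local and hypothetical, and the sketch only illustrates the conditions; it is not the kernel's world-switch hook.

```c
#include <stdbool.h>
#include <stdint.h>

/* ESR_ELx fields used below (ARMv8 data-abort syndrome encoding). */
#define ESR_EC_SHIFT_SKETCH      26
#define ESR_EC_DABT_LOW_SKETCH   0x24u                /* data abort, lower EL */
#define ESR_ISV_SKETCH           (UINT32_C(1) << 24)  /* syndrome valid */
#define ESR_EA_SKETCH            (UINT32_C(1) << 9)   /* external abort */
#define ESR_S1PTW_SKETCH         (UINT32_C(1) << 7)   /* stage-1 walk fault */
#define ESR_FSC_TYPE_SKETCH      0x3cu                /* fault type, ignoring level */
#define ESR_FSC_TRANS_SKETCH     0x04u                /* translation fault */

static bool can_emulate_at_hyp_sketch(uint32_t esr)
{
	if ((esr >> ESR_EC_SHIFT_SKETCH) != ESR_EC_DABT_LOW_SKETCH)
		return false;   /* not a data abort */
	if ((esr & ESR_FSC_TYPE_SKETCH) != ESR_FSC_TRANS_SKETCH)
		return false;   /* not a translation fault */
	if (!(esr & ESR_ISV_SKETCH))
		return false;   /* no valid syndrome to decode */
	if (esr & (ESR_EA_SKETCH | ESR_S1PTW_SKETCH))
		return false;   /* external abort or page-table walk */
	return true;
}
```

Only when all checks pass would the hook go on to save the extra state (PC, and CPSR for a 32-bit guest) needed to skip the emulated instruction.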
     
  • As we plan to do some emulation at HYP, let's make kvm_skip_instr32
    part of the hyp_text section. This doesn't preclude the kernel
    from using it.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Add the bit of glue and const-ification that is required to use
    the code inherited from the arm64 port, and move over to it.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • It would make some sense to share the conditional execution code
    between 32 and 64bit. In order to achieve this, let's move that
    code to virt/kvm/arm/aarch32.c. While we're at it, drop a
    superfluous BUG_ON() that wasn't that useful.

    Following patches will migrate the 32bit port to that code base.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
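The conditional execution check that the series moves into virt/kvm/arm/aarch32.c boils down to evaluating an AArch32 condition code against the NZCV flags in CPSR. The sketch below is one way to express that check, using a switch rather than whatever lookup the kernel file actually uses; names are hypothetical. Odd condition codes are the negation of the preceding even one, and 0xE (AL) and 0xF always pass.

```c
#include <stdbool.h>
#include <stdint.h>

/* NZCV flags live in CPSR bits [31:28]. */
#define PSR_N_SKETCH (UINT32_C(1) << 31)
#define PSR_Z_SKETCH (UINT32_C(1) << 30)
#define PSR_C_SKETCH (UINT32_C(1) << 29)
#define PSR_V_SKETCH (UINT32_C(1) << 28)

/* Would an instruction with condition @cond (0..15) execute? */
static bool cond_passed_sketch(uint32_t cpsr, unsigned int cond)
{
	bool n = cpsr & PSR_N_SKETCH, z = cpsr & PSR_Z_SKETCH;
	bool c = cpsr & PSR_C_SKETCH, v = cpsr & PSR_V_SKETCH;
	bool result;

	switch (cond >> 1) {
	case 0: result = z; break;               /* EQ / NE */
	case 1: result = c; break;               /* CS / CC */
	case 2: result = n; break;               /* MI / PL */
	case 3: result = v; break;               /* VS / VC */
	case 4: result = c && !z; break;         /* HI / LS */
	case 5: result = n == v; break;          /* GE / LT */
	case 6: result = (n == v) && !z; break;  /* GT / LE */
	default: return true;                    /* AL and 0xF */
	}
	/* Odd codes invert the even code before them. */
	return (cond & 1) ? !result : result;
}
```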
     
  • As kvm_set_routing_entry() was changing prototype between 4.7 and 4.8,
    an ugly hack was put in place in order to survive both building in
    -next and the merge window.

    Now that everything has been merged, let's dump the compatibility
    hack for good.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Eric Auger
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Just a rename so we can implement a v3-specific function later.

    We take the chance to get rid of the V2/V3 ops comments as well.

    No functional change.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Christoffer Dall
     
  • As we are about to deal with multiple data types and situations where
    the vgic should not be initialized when doing userspace accesses on the
    register attributes, factor out the functionality of
    vgic_attr_regs_access into smaller bits which can be reused by a new
    function later.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Christoffer Dall
     
  • vms and vcpus have statistics associated with them which can be viewed
    within the debugfs. Currently it is assumed within the vcpu_stat_get() and
    vm_stat_get() functions that all of these statistics are represented as
    u32s, however the next patch adds some u64 vcpu statistics.

    Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly.
    Since vcpu statistics are per vcpu, they will only be updated by a single
    vcpu at a time so this shouldn't present a problem on 32-bit machines
    which can't atomically increment 64-bit numbers. However vm statistics
    could potentially be updated by multiple vcpus from that vm at a time.
    To avoid the overhead of atomics make all vm statistics ulong such that
    they are 64-bit on 64-bit systems where they can be atomically incremented
    and are 32-bit on 32-bit systems which may not be able to atomically
    increment 64-bit numbers. Modify vm_stat_get() to expect ulongs.

    Signed-off-by: Suraj Jitindar Singh
    Reviewed-by: David Matlack
    Acked-by: Christian Borntraeger
    Signed-off-by: Paul Mackerras

    Suraj Jitindar Singh
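A minimal sketch of the type split described above, with hypothetical field names: per-vCPU counters become 64-bit (each is written only by its own vCPU), per-VM counters become `unsigned long` (native width, so a single-instruction atomic increment is available on both 32- and 64-bit hosts), and the debugfs getter sums the 64-bit per-vCPU values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU statistic: only ever updated by its own vCPU. */
struct vcpu_stat_sketch {
	uint64_t halt_wait_ns;
};

/* Hypothetical per-VM statistic: native width for cheap atomics. */
struct vm_stat_sketch {
	unsigned long remote_flushes;
};

/* Debugfs-style aggregate of one u64 field across all vCPUs. */
static uint64_t vcpu_stat_get_sketch(const struct vcpu_stat_sketch *vcpus,
				     size_t nr_vcpus)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < nr_vcpus; i++)
		sum += vcpus[i].halt_wait_ns;
	return sum;
}
```

Summing into a `uint64_t` is what keeps totals above 2^32 from wrapping on 32-bit hosts.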
     
  • The workqueue "irqfd_cleanup_wq" queues a single work item
    &irqfd->shutdown and hence doesn't require ordering. It is a host-wide
    workqueue for issuing deferred shutdown requests aggregated from all
    vm* instances. It is not being used on a memory reclaim path.
    Hence, it has been converted to use system_wq.
    The work item has been flushed in kvm_irqfd_release().

    The workqueue "wqueue" queues a single work item &timer->expired
    and hence doesn't require ordering. Also, it is not being used on
    a memory reclaim path. Hence, it has been converted to use system_wq.

    System workqueues have been able to handle a high level of concurrency
    for a long time now and hence it's not required to have a singlethreaded
    workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
    created with create_singlethread_workqueue(), system_wq allows multiple
    work items to overlap executions even on the same CPU; however, a
    per-cpu workqueue doesn't have any CPU locality or global ordering
    guarantee unless the target CPU is explicitly specified and thus the
    increase of local concurrency shouldn't make any difference.

    Signed-off-by: Bhaktipriya Shridhar
    Signed-off-by: Paolo Bonzini

    Bhaktipriya Shridhar
     

18 Aug, 2016

1 commit


17 Aug, 2016

2 commits

  • Similarly to f005bd7e3b84 ("clocksource/arm_arch_timer: Force
    per-CPU interrupt to be level-triggered"), make sure we can
    survive an interrupt that has been misconfigured as edge-triggered
    by forcing it to be level-triggered (active low is assumed, but
    the GIC doesn't really care whether this is high or low).

    Hopefully, the amount of shouting in the kernel log will convince
    the user to do something about their firmware.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
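The sanitizing step above can be sketched as follows. The two constants are local copies of the Linux `IRQ_TYPE_LEVEL_HIGH`/`IRQ_TYPE_LEVEL_LOW` encodings; the function name and message text are hypothetical. Anything that is not a level trigger is logged loudly and replaced with level low, matching the "active low is assumed" choice in the description.

```c
#include <stdio.h>

/* Local copies of the Linux level-trigger encodings. */
#define IRQ_TYPE_LEVEL_HIGH_SKETCH 0x4u
#define IRQ_TYPE_LEVEL_LOW_SKETCH  0x8u

static unsigned int sanitize_timer_irq_flags_sketch(unsigned int flags)
{
	if (flags == IRQ_TYPE_LEVEL_HIGH_SKETCH ||
	    flags == IRQ_TYPE_LEVEL_LOW_SKETCH)
		return flags;

	/* Shout, in the hope that someone fixes the firmware tables. */
	fprintf(stderr, "timer IRQ: invalid trigger 0x%x, assuming level low\n",
		flags);
	return IRQ_TYPE_LEVEL_LOW_SKETCH;
}
```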
     
  • When a guest wants to map a device-ID/event-ID combination that is
    already mapped, we may end up in a situation where an LPI is never
    "put", thus never being freed.
    Since the GICv3 spec says that mapping an already mapped LPI is
    UNPREDICTABLE, let's just bail out early in this situation to avoid
    any potential leaks.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Andre Przywara
     

16 Aug, 2016

3 commits

  • When userspace provides the doorbell address for an MSI to be
    injected into the guest, we find a KVM device which feels responsible.
    Let's check that this device is really an emulated ITS before we make
    real use of the container_of-ed pointer.

    [ Moved NULL-pointer check to caller of static function
    - Christoffer ]

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Andre Przywara
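The pattern being fixed, checking a generic device's type before using `container_of()` to reach the enclosing structure, can be sketched in userspace. The device types and struct layouts below are hypothetical; only the idea (verify the type first, also handle a missing device) comes from the description.

```c
#include <stddef.h>

#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical device types and layouts. */
enum dev_type_sketch { DEV_TYPE_VGIC_V3_SKETCH, DEV_TYPE_VGIC_ITS_SKETCH };

struct kvm_device_sketch {
	enum dev_type_sketch type;
};

struct vgic_its_sketch {
	int enabled;
	struct kvm_device_sketch dev;   /* embedded generic device */
};

/* Convert to an ITS only after confirming the device really is one. */
static struct vgic_its_sketch *dev_to_its_sketch(struct kvm_device_sketch *dev)
{
	if (!dev || dev->type != DEV_TYPE_VGIC_ITS_SKETCH)
		return NULL;
	return container_of_sketch(dev, struct vgic_its_sketch, dev);
}
```

Without the type check, `container_of_sketch()` would happily compute a pointer into whatever structure happens to embed the device, which is exactly the misuse the commit guards against.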
     
  • Currently we register an ITS device upon userland issuing the CTLR_INIT
    ioctl to mark initialization of the ITS as done.
    This deviates from the initialization sequence of the existing GIC
    devices and does not play well with the way QEMU handles things.
    To be more in line with what we are used to, register the ITS(es) just
    before the first VCPU is about to run, so in the map_resources() call.
    This involves iterating through the list of KVM devices and mapping
    each ITS that we find.

    Signed-off-by: Andre Przywara
    Reviewed-by: Eric Auger
    Tested-by: Eric Auger
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • There are two problems with the current implementation of the MMIO
    handlers for the propbaser and pendbaser:

    First, the write to the value itself is not guaranteed to be an atomic
    64-bit write so two concurrent writes to the structure field could be
    intermixed.

    Second, because we do a read-modify-update operation without any
    synchronization, if we have two 32-bit accesses to separate parts of the
    register, we can lose one of them.

    By using the atomic cmpxchg64 we should cover both issues above.

    Reviewed-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Christoffer Dall
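A C11 analogue of the compare-and-swap approach, with C11 `atomic_compare_exchange_weak()` standing in for the kernel's `cmpxchg64()`. The function name is hypothetical, and the sketch assumes `offset` is 0 (low word) or 4 (high word). The read-modify-write is retried until no other writer has changed the 64-bit value in between, so neither half is lost and the 64-bit store is never torn.

```c
#include <stdatomic.h>
#include <stdint.h>

static void update_reg_half_sketch(_Atomic uint64_t *reg, unsigned int offset,
				   uint32_t val)
{
	unsigned int shift = offset ? 32 : 0;   /* 0: low word, 4: high word */
	uint64_t mask = UINT64_C(0xffffffff) << shift;
	uint64_t old = atomic_load(reg);
	uint64_t new;

	do {
		/* Recompute from the freshest value seen by the last CAS. */
		new = (old & ~mask) | ((uint64_t)val << shift);
	} while (!atomic_compare_exchange_weak(reg, &old, new));
}
```

On failure, `atomic_compare_exchange_weak()` writes the current value back into `old`, so the loop naturally merges in a concurrent update to the other half.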
     

12 Aug, 2016

2 commits

  • KVM devices were manipulating list data structures without any form of
    synchronization, and some implementations of the create operations also
    suffered from a lack of synchronization.

    Now that we've split the xics create operation into create and init, we
    can hold the kvm->lock mutex while calling the create operation and when
    manipulating the devices list.

    The error path in the generic code gets slightly ugly because we have to
    take the mutex again and delete the device from the list, but holding
    the mutex during anon_inode_getfd or releasing/locking the mutex in the
    common non-error path seemed wrong.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Paolo Bonzini
    Acked-by: Christian Borntraeger
    Signed-off-by: Radim Krčmář

    Christoffer Dall
     
  • As we are about to hold the kvm->lock during the create operation on KVM
    devices, we should move the call to xics_debugfs_init into its own
    function, since holding a mutex over extended amounts of time might not
    be a good idea.

    Introduce an init operation on the kvm_device_ops struct which cannot
    fail and call this, if configured, after the device has been created.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Paolo Bonzini
    Signed-off-by: Radim Krčmář

    Christoffer Dall
     

10 Aug, 2016

2 commits

  • Right now the following sequence of events can happen:

    1. Thread X calls vgic_put_irq
    2. Thread Y calls vgic_add_lpi
    3. Thread Y gets lpi_list_lock
    4. Thread X drops the ref count to 0 and blocks on lpi_list_lock
    5. Thread Y finds the irq via the lpi_list_lock, raises the ref
    count to 1, and releases the lpi_list_lock.
    6. Thread X proceeds and frees the irq.

    Avoid this by holding the spinlock around the kref_put.

    Reviewed-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Christoffer Dall
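A userspace sketch of the fix, with a pthread mutex standing in for `lpi_list_lock` and a plain counter standing in for the kref; all names are hypothetical. Because both the lookup (which takes a reference) and the put (which drops it and unlinks at zero) run under the same lock, step 5 of the sequence above can no longer find an IRQ whose count has already reached 0.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified LPI bookkeeping. */
struct lpi_sketch {
	int intid;
	int refcount;              /* only touched under lpi_list_lock */
	struct lpi_sketch *next;
};

static pthread_mutex_t lpi_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct lpi_sketch *lpi_list;

/* Lookup takes its reference under the lock. */
static struct lpi_sketch *lpi_get_sketch(int intid)
{
	struct lpi_sketch *irq;

	pthread_mutex_lock(&lpi_list_lock);
	for (irq = lpi_list; irq; irq = irq->next) {
		if (irq->intid == intid) {
			irq->refcount++;
			break;
		}
	}
	pthread_mutex_unlock(&lpi_list_lock);
	return irq;
}

/* Drop the reference while holding the lock; unlink and free at zero. */
static void lpi_put_sketch(struct lpi_sketch *irq)
{
	pthread_mutex_lock(&lpi_list_lock);
	if (--irq->refcount == 0) {
		struct lpi_sketch **pp = &lpi_list;

		while (*pp != irq)
			pp = &(*pp)->next;
		*pp = irq->next;
		free(irq);
	}
	pthread_mutex_unlock(&lpi_list_lock);
}
```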
     
  • During low memory conditions, we could be dereferencing a NULL pointer
    when vgic_add_lpi fails to allocate memory.

    Consider for example this call sequence:

    vgic_its_cmd_handle_mapi
    itte->irq = vgic_add_lpi(kvm, lpi_nr);
    update_lpi_config(kvm, itte->irq, NULL);
    ret = kvm_read_guest(kvm, propbase + irq->intid
    ^^^^
    kaboom?

    Instead, return an error pointer from vgic_add_lpi and check the return
    value from its single caller.

    Signed-off-by: Christoffer Dall

    Christoffer Dall
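The error-pointer idiom the fix relies on can be sketched in userspace. `ERR_PTR_SKETCH()`, `PTR_ERR_SKETCH()` and `IS_ERR_SKETCH()` are local copies modelled on the kernel helpers of similar names, and the allocator and caller are hypothetical stand-ins for `vgic_add_lpi()` and the MAPI handler. The allocation failure travels back as an encoded errno, and the caller checks before dereferencing.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Local copies of the kernel's error-pointer idiom. */
#define MAX_ERRNO_SKETCH 4095

static inline void *ERR_PTR_SKETCH(long error) { return (void *)error; }
static inline long PTR_ERR_SKETCH(const void *ptr) { return (long)ptr; }
static inline int IS_ERR_SKETCH(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

struct lpi_irq_sketch { unsigned int intid; };

/* Hypothetical allocator: failure is ERR_PTR(-ENOMEM), never NULL. */
static struct lpi_irq_sketch *add_lpi_sketch(unsigned int intid, int simulate_oom)
{
	struct lpi_irq_sketch *irq = simulate_oom ? NULL : malloc(sizeof(*irq));

	if (!irq)
		return ERR_PTR_SKETCH(-ENOMEM);
	irq->intid = intid;
	return irq;
}

/* The caller checks the result before touching irq->intid. */
static int handle_mapi_sketch(unsigned int intid, int simulate_oom)
{
	struct lpi_irq_sketch *irq = add_lpi_sketch(intid, simulate_oom);

	if (IS_ERR_SKETCH(irq))
		return (int)PTR_ERR_SKETCH(irq);
	free(irq);
	return 0;
}
```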
     

09 Aug, 2016

1 commit

  • According to the KVM API documentation, an MSI injection should
    return a value > 0 on success.
    Return possible errors in vgic_its_trigger_msi() and report a
    successful injection back to userland, while also reporting the
    case where the MSI could not be delivered due to the guest not
    having the LPI mapped, for instance.

    Signed-off-by: Andre Przywara
    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Andre Przywara
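A minimal sketch of the return-value translation, assuming the KVM_SIGNAL_MSI convention of > 0 for delivered, 0 when the guest blocked the MSI, and < 0 for errors. The internal result codes and the function name are hypothetical; in particular, modelling "LPI not mapped" as a positive internal code is an assumption made only for illustration.

```c
#include <errno.h>

/* Hypothetical internal results from the trigger helper. */
#define ITS_TRIGGER_OK_SKETCH        0
#define ITS_TRIGGER_UNMAPPED_SKETCH  1   /* guest has not mapped the LPI */

/* Map the helper's result onto the userspace-visible convention. */
static int its_inject_msi_result_sketch(int trigger_ret)
{
	if (trigger_ret == ITS_TRIGGER_OK_SKETCH)
		return 1;               /* delivered */
	if (trigger_ret == ITS_TRIGGER_UNMAPPED_SKETCH)
		return 0;               /* blocked by the guest */
	return trigger_ret;         /* negative errno passes through */
}
```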
     

04 Aug, 2016

1 commit


03 Aug, 2016

1 commit

  • Pull KVM updates from Paolo Bonzini:

    - ARM: GICv3 ITS emulation and various fixes. Removal of the
    old VGIC implementation.

    - s390: support for trapping software breakpoints, nested
    virtualization (vSIE), the STHYI opcode, initial extensions
    for CPU model support.

    - MIPS: support for MIPS64 hosts (32-bit guests only) and lots
    of cleanups, preliminary to this and the upcoming support for
    hardware virtualization extensions.

    - x86: support for execute-only mappings in nested EPT; reduced
    vmexit latency for TSC deadline timer (by about 30%) on Intel
    hosts; support for more than 255 vCPUs.

    - PPC: bugfixes.

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
    KVM: PPC: Introduce KVM_CAP_PPC_HTM
    MIPS: Select HAVE_KVM for MIPS64_R{2,6}
    MIPS: KVM: Reset CP0_PageMask during host TLB flush
    MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
    MIPS: KVM: Sign extend MFC0/RDHWR results
    MIPS: KVM: Fix 64-bit big endian dynamic translation
    MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
    MIPS: KVM: Use 64-bit CP0_EBase when appropriate
    MIPS: KVM: Set CP0_Status.KX on MIPS64
    MIPS: KVM: Make entry code MIPS64 friendly
    MIPS: KVM: Use kmap instead of CKSEG0ADDR()
    MIPS: KVM: Use virt_to_phys() to get commpage PFN
    MIPS: Fix definition of KSEGX() for 64-bit
    KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
    kvm: x86: nVMX: maintain internal copy of current VMCS
    KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
    KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
    KVM: arm64: vgic-its: Simplify MAPI error handling
    KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
    KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
    ...

    Linus Torvalds
     

30 Jul, 2016

1 commit

  • Pull smp hotplug updates from Thomas Gleixner:
    "This is the next part of the hotplug rework.

    - Convert all notifiers with a priority assigned

    - Convert all CPU_STARTING/DYING notifiers

    The final removal of the STARTING/DYING infrastructure will happen
    when the merge window closes.

    Another 700 hundred line of unpenetrable maze gone :)"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
    timers/core: Correct callback order during CPU hot plug
    leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
    powerpc/numa: Convert to hotplug state machine
    arm/perf: Fix hotplug state machine conversion
    irqchip/armada: Avoid unused function warnings
    ARC/time: Convert to hotplug state machine
    clocksource/atlas7: Convert to hotplug state machine
    clocksource/armada-370-xp: Convert to hotplug state machine
    clocksource/exynos_mct: Convert to hotplug state machine
    clocksource/arm_global_timer: Convert to hotplug state machine
    rcu: Convert rcutree to hotplug state machine
    KVM/arm/arm64/vgic-new: Convert to hotplug state machine
    smp/cfd: Convert core to hotplug state machine
    x86/x2apic: Convert to CPU hotplug state machine
    profile: Convert to hotplug state machine
    timers/core: Convert to hotplug state machine
    hrtimer: Convert to hotplug state machine
    x86/tboot: Convert to hotplug state machine
    arm64/armv8 deprecated: Convert to hotplug state machine
    hwtracing/coresight-etm4x: Convert to hotplug state machine
    ...

    Linus Torvalds
     

24 Jul, 2016

1 commit