22 Apr, 2015

3 commits

  • …it/kvmarm/kvmarm into kvm-master

    KVM/ARM changes for v4.1, take #2:

    Rather small this time:

    - a fix for a nasty bug with virtual IRQ injection
    - a fix for irqfd

    Paolo Bonzini
     
  • When userland injects an SPI via the KVM_IRQ_LINE ioctl, we currently
    only check it against a fixed limit, which historically is set
    to 127. With the new dynamic IRQ allocation, the effective limit may
    actually be smaller (64).
    So when a malicious or buggy userland injects an SPI between the
    effective limit and the fixed one, we overrun the memory of our VGIC
    bitmaps and bytemaps.
    I could trigger a host kernel NULL pointer dereference with current
    mainline by injecting some bogus IRQ number from a hacked kvmtool:
    -----------------
    ....
    DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1)
    DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1)
    DEBUG: IRQ #114 still in the game, writing to bytemap now...
    Unable to handle kernel NULL pointer dereference at virtual address 00000000
    pgd = ffffffc07652e000
    [00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000
    Internal error: Oops: 96000006 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027
    Hardware name: FVP Base (DT)
    task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000
    PC is at kvm_vgic_inject_irq+0x234/0x310
    LR is at kvm_vgic_inject_irq+0x30c/0x310
    pc : [] lr : [] pstate: 80000145
    .....

    So this patch fixes this by checking the SPI number against the
    actual limit. Also we remove the former legacy hard limit of
    127 in the ioctl code.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    CC: # 4.0, 3.19, 3.18
    [maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__,
    as suggested by Christopher Covington]
    Signed-off-by: Marc Zyngier

    Andre Przywara
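
    The check described above can be sketched as follows. This is a
    minimal user-space model, not the kernel patch itself: struct
    toy_vgic, toy_spi_in_range() and TOY_NR_PRIVATE_IRQS are hypothetical
    names, and the sketch assumes that IRQ numbers 0..31 are private
    (SGIs and PPIs), so SPIs start at 32.

```c
#include <stdbool.h>

/* Assumption: IRQs 0..31 are private (SGIs and PPIs); SPIs start at 32. */
#define TOY_NR_PRIVATE_IRQS 32u

/* Hypothetical stand-in for the per-VM distributor state. */
struct toy_vgic {
    unsigned int nr_irqs; /* IRQs actually allocated, private ones included */
};

/* Return true only if irq_num names an SPI backed by allocated state. */
static bool toy_spi_in_range(const struct toy_vgic *vgic, unsigned int irq_num)
{
    return irq_num >= TOY_NR_PRIVATE_IRQS && irq_num < vgic->nr_irqs;
}
```

    With nr_irqs set to 64, IRQ 114 from the trace above is rejected
    before any bitmap or bytemap is touched, even though it is below the
    old fixed limit of 127.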
     
  • irqfd/arm currently does not support routing. kvm_irq_map_gsi is
    supposed to return all the routing entries associated with the
    provided gsi and return the number of those entries. We should
    return 0 at this point.

    Signed-off-by: Eric Auger
    Acked-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Eric Auger
     

21 Apr, 2015

1 commit

  • This creates a debugfs directory for each HV guest (assuming debugfs
    is enabled in the kernel config), and within that directory, a file
    by which the contents of the guest's HPT (hashed page table) can be
    read. The directory is named vmnnnn, where nnnn is the PID of the
    process that created the guest. The file is named "htab". This is
    intended to help in debugging problems in the host's management
    of guest memory.

    The contents of the file consist of a series of lines like this:

    3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196

    The first field is the index of the entry in the HPT, the second and
    third are the HPT entry, so the third entry contains the real page
    number that is mapped by the entry if the entry's valid bit is set.
    The fourth field is the guest's view of the second doubleword of the
    entry, so it contains the guest physical address. (The format of the
    second through fourth fields are described in the Power ISA and also
    in arch/powerpc/include/asm/mmu-hash64.h.)

    Signed-off-by: Paul Mackerras
    Signed-off-by: Alexander Graf

    Paul Mackerras
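
    A tool that reads the "htab" file could split each line into its four
    hexadecimal fields roughly as sketched below. The struct and function
    names are hypothetical, and the sketch only parses the fields; it does
    not decode the bit layout, which is described in the Power ISA.

```c
#include <stdio.h>

/* Hypothetical container for one parsed line of the "htab" file. */
struct toy_htab_line {
    unsigned long long index;   /* index of the entry in the HPT */
    unsigned long long hpte_v;  /* first doubleword of the HPT entry */
    unsigned long long hpte_r;  /* second doubleword, host view (real page number) */
    unsigned long long guest_r; /* second doubleword, guest view (guest physical address) */
};

/* Parse four hexadecimal fields; return 0 on success, -1 on a malformed line. */
static int toy_parse_htab_line(const char *line, struct toy_htab_line *out)
{
    int n = sscanf(line, "%llx %llx %llx %llx",
                   &out->index, &out->hpte_v, &out->hpte_r, &out->guest_r);

    return n == 4 ? 0 : -1;
}
```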
     

14 Apr, 2015

1 commit

  • Pull KVM updates from Paolo Bonzini:
    "First batch of KVM changes for 4.1

    The most interesting bit here is irqfd/ioeventfd support for ARM and
    ARM64.

    Summary:

    ARM/ARM64:
    fixes for live migration, irqfd and ioeventfd support (enabling
    vhost, too), page aging

    s390:
    interrupt handling rework, allowing to inject all local interrupts
    via new ioctl and to get/set the full local irq state for migration
    and introspection. New ioctls to access memory by virtual address,
    and to get/set the guest storage keys. SIMD support.

    MIPS:
    FPU and MIPS SIMD Architecture (MSA) support. Includes some
    patches from Ralf Baechle's MIPS tree.

    x86:
    bugfixes (notably for pvclock, the others are small) and cleanups.
    Another small latency improvement for the TSC deadline timer"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
    KVM: use slowpath for cross page cached accesses
    kvm: mmu: lazy collapse small sptes into large sptes
    KVM: x86: Clear CR2 on VCPU reset
    KVM: x86: DR0-DR3 are not clear on reset
    KVM: x86: BSP in MSR_IA32_APICBASE is writable
    KVM: x86: simplify kvm_apic_map
    KVM: x86: avoid logical_map when it is invalid
    KVM: x86: fix mixed APIC mode broadcast
    KVM: x86: use MDA for interrupt matching
    kvm/ppc/mpic: drop unused IRQ_testbit
    KVM: nVMX: remove unnecessary double caching of MAXPHYADDR
    KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry
    KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu
    KVM: vmx: pass error code with internal error #2
    x86: vdso: fix pvclock races with task migration
    KVM: remove kvm_read_hva and kvm_read_hva_atomic
    KVM: x86: optimize delivery of TSC deadline timer interrupt
    KVM: x86: extract blocking logic from __vcpu_run
    kvm: x86: fix x86 eflags fixed bit
    KVM: s390: migrate vcpu interrupt state
    ...

    Linus Torvalds
     

10 Apr, 2015

1 commit


08 Apr, 2015

4 commits


01 Apr, 2015

1 commit

  • We introduced struct kvm_s390_irq a while ago, which allows
    injecting all kinds of interrupts as defined in the Principles of
    Operation.
    Add an ioctl to inject interrupts with the extended struct kvm_s390_irq.

    Signed-off-by: Jens Freimann
    Signed-off-by: Christian Borntraeger
    Acked-by: Cornelia Huck

    Jens Freimann
     

31 Mar, 2015

3 commits

  • Currently we have struct kvm_exit_mmio for encapsulating MMIO abort
    data to be passed on from syndrome decoding all the way down to the
    VGIC register handlers. Now that we switch the MMIO handling to be
    routed through the KVM MMIO bus, it no longer makes sense to use
    that structure right from the beginning. So we keep the data in
    local variables until we put them into the kvm_io_bus framework.
    Then we fill kvm_exit_mmio in the VGIC only, making it a VGIC private
    structure. Along the way we replace the data buffer in that structure
    with a pointer to a single location in a local variable, so
    we get rid of some copying.
    With all of the virtual GIC emulation code now being registered with
    the kvm_io_bus, we can remove all of the old MMIO handling code and
    its dispatching functionality.

    I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something),
    because that touches a lot of code lines without any good reason.

    This is based on an original patch by Nikolay.

    Signed-off-by: Andre Przywara
    Cc: Nikolay Nikolaev
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • Using the framework provided by the recent vgic.c changes, we
    register a kvm_io_bus device on mapping the virtual GICv3 resources.
    The distributor mapping is pretty straightforward, but the
    redistributors need some more love, since they need to be tagged with
    the respective redistributor (read: VCPU) they are connected with.
    We use the kvm_io_bus framework to register one device per VCPU.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • Currently we handle the redistributor registers in two separate MMIO
    regions, one for the overall behaviour and SPIs and one for the
    SGIs/PPIs. The latter forces the creation of _two_ KVM I/O bus
    devices for each redistributor.
    Since the spec mandates those two pages to be contiguous, we could as
    well merge them and save the churn with the second KVM I/O bus device.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Andre Przywara
     

27 Mar, 2015

6 commits

  • Using the framework provided by the recent vgic.c changes we register
    a kvm_io_bus device when initializing the virtual GICv2.

    Signed-off-by: Andre Przywara
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • Currently we use a lot of VGIC specific code to do the MMIO
    dispatching.
    Use the previous reworks to add kvm_io_bus style MMIO handlers.

    Those are not yet called by the MMIO abort handler, and the actual
    VGIC emulator functions do not make use of them yet; both will be
    enabled by the following patches.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • The vgic_find_range() function in vgic.c takes a struct kvm_exit_mmio
    argument, but actually only uses the length field in there. Since we
    need to get rid of that structure in that part of the code anyway,
    let's rework the function (and its callers) to pass the length
    argument to the function directly.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Andre Przywara
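
    The shape of such a lookup, taking the access length as a plain
    argument, might look like the sketch below. It is a user-space
    illustration only: struct toy_io_range and toy_find_range() are
    hypothetical names, and the zero-length terminator entry is an
    assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical register range: offsets are relative to the device base. */
struct toy_io_range {
    unsigned long base;
    unsigned long len;
};

/*
 * Return the first range that fully contains [offset, offset + len),
 * or NULL if none does. The table ends with an entry whose len is 0.
 */
static const struct toy_io_range *
toy_find_range(const struct toy_io_range *ranges,
               unsigned long offset, unsigned int len)
{
    for (; ranges->len; ranges++) {
        if (offset >= ranges->base &&
            offset + len <= ranges->base + ranges->len)
            return ranges;
    }
    return NULL;
}
```

    Passing only the offset and length keeps the lookup independent of
    whatever structure carries the rest of the MMIO access.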
     
  • The name "kvm_mmio_range" is a bit bold, given that it only covers
    the VGIC's MMIO ranges. To avoid confusion with kvm_io_range, rename
    it to vgic_io_range.

    Signed-off-by: Andre Przywara
    Acked-by: Christoffer Dall
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • iodev.h contains definitions for the kvm_io_bus framework. This is
    needed both by the generic KVM code in virt/kvm as well as by
    architecture specific code under arch/. Putting the header file in
    virt/kvm and using local includes in the architecture part seems at
    least dodgy to me, so let's move the file into include/kvm, so that a
    more natural "#include " can be used by all of the code.
    This also solves a problem later when using struct kvm_io_device
    in arm_vgic.h.
    Fixing up the FSF address in the GPL header and a wrong include path
    on the way.

    Signed-off-by: Andre Przywara
    Acked-by: Christoffer Dall
    Reviewed-by: Marc Zyngier
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • This is needed in e.g. ARM vGIC emulation, where the MMIO handling
    depends on the VCPU that does the access.

    Signed-off-by: Nikolay Nikolaev
    Signed-off-by: Andre Przywara
    Acked-by: Paolo Bonzini
    Acked-by: Christoffer Dall
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Nikolay Nikolaev
     

24 Mar, 2015

1 commit

  • A KVM guest can fail to start up with the following trace on the host:

    qemu-system-x86: page allocation failure: order:4, mode:0x40d0
    Call Trace:
    dump_stack+0x47/0x67
    warn_alloc_failed+0xee/0x150
    __alloc_pages_direct_compact+0x14a/0x150
    __alloc_pages_nodemask+0x776/0xb80
    alloc_kmem_pages+0x3a/0x110
    kmalloc_order+0x13/0x50
    kmemdup+0x1b/0x40
    __kvm_set_memory_region+0x24a/0x9f0 [kvm]
    kvm_set_ioapic+0x130/0x130 [kvm]
    kvm_set_memory_region+0x21/0x40 [kvm]
    kvm_vm_ioctl+0x43f/0x750 [kvm]

    The failure happens when attempting to allocate pages for
    'struct kvm_memslots'. However, that structure doesn't have to
    live in physically contiguous (kmalloc-ed) address
    space, so change the allocation to kvm_kvzalloc() so that
    it will be vmalloc-ed when its size is more than a page.

    Signed-off-by: Igor Mammedov
    Signed-off-by: Marcelo Tosatti

    Igor Mammedov
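
    The size-based choice described above can be modelled in user space
    as below. The names are hypothetical and a 4 KiB page size is
    assumed; the sketch only shows the decision, not a real allocation.
    For reference, "order:4" in the trace means a request for 2^4 = 16
    physically contiguous pages.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

enum toy_alloc_kind {
    TOY_ALLOC_KMALLOC, /* physically contiguous memory */
    TOY_ALLOC_VMALLOC  /* only virtually contiguous memory */
};

/* Use the contiguous allocator only when the request fits in one page. */
static enum toy_alloc_kind toy_pick_allocator(size_t size)
{
    return size > TOY_PAGE_SIZE ? TOY_ALLOC_VMALLOC : TOY_ALLOC_KMALLOC;
}
```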
     

19 Mar, 2015

1 commit

  • When no bits in mask are set,
    kvm_arch_mmu_enable_log_dirty_pt_masked() has nothing to do. But since
    it needs to be called from the generic code, it cannot be inlined, and
    a few function calls, two when PML is enabled, are wasted.

    Since it is common to see many pages remain clean, e.g. framebuffers can
    stay calm for a long time, it is worth eliminating this overhead.

    Signed-off-by: Takuya Yoshikawa
    Reviewed-by: Paolo Bonzini
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
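
    The optimisation amounts to testing the mask in the caller before
    making the out-of-line call. A minimal sketch, with hypothetical names
    and a call counter standing in for the real per-architecture handler:

```c
#include <stddef.h>

/* Counts how often the (hypothetical) per-mask handler is invoked. */
static unsigned int toy_handler_calls;

/* Stand-in for the out-of-line, per-architecture function. */
static void toy_enable_log_dirty_masked(size_t word_index, unsigned long mask)
{
    (void)word_index;
    (void)mask;
    toy_handler_calls++;
}

/* Walk the dirty bitmap one word at a time, skipping words with no bits set. */
static void toy_process_dirty_bitmap(const unsigned long *bitmap, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        unsigned long mask = bitmap[i];

        if (!mask)
            continue; /* nothing dirty here: avoid the out-of-line call */
        toy_enable_log_dirty_masked(i, mask);
    }
}
```

    For a bitmap in which most words are zero, the handler is called only
    for the few words that actually contain dirty bits.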
     

17 Mar, 2015

1 commit


14 Mar, 2015

4 commits

  • When a VCPU is no longer running, we currently check to see if it has a
    timer scheduled in the future, and if it does, we schedule a host
    hrtimer to notify us in case the timer expires while the VCPU is still
    not running. When the hrtimer fires, we mask the guest's timer and
    inject the timer IRQ (still relying on the guest unmasking the timer when
    it receives the IRQ).

    This is all good and fine, but when migrating a VM (checkpoint/restore)
    this introduces a race. It is unlikely, but possible, for the following
    sequence of events to happen:

    1. Userspace stops the VM
    2. Hrtimer for VCPU is scheduled
    3. Userspace checkpoints the VGIC state (no pending timer interrupts)
    4. The hrtimer fires, schedules work in a workqueue
    5. Workqueue function runs, masks the timer and injects timer interrupt
    6. Userspace checkpoints the timer state (timer masked)

    At restore time, you end up with a masked timer without any pending
    timer interrupt, and your guest halts, never receiving timer interrupts.

    Fix this by only kicking the VCPU in the workqueue function, and by
    sampling the expired state of the timer when entering the guest again,
    injecting the interrupt and masking the timer only then.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Alex Bennée
    Signed-off-by: Christoffer Dall

    Christoffer Dall
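
    The ordering of the fix can be illustrated with a small user-space
    model. Everything here is hypothetical (struct toy_vcpu and both
    functions); it only demonstrates that the work item leaves the
    checkpointed state untouched and that masking and injection happen
    together on guest entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state involved in the race. */
struct toy_vcpu {
    uint64_t timer_cval;    /* guest-programmed expiry time */
    bool timer_masked;      /* saved by userspace with the timer state */
    bool timer_irq_pending; /* saved by userspace with the VGIC state */
    bool kicked;            /* VCPU was asked to run again */
};

/* Work item run after the hrtimer fires: only wake the VCPU. */
static void toy_timer_work(struct toy_vcpu *vcpu)
{
    vcpu->kicked = true;
}

/* On guest entry: sample expiry, then mask and inject in one step. */
static void toy_timer_flush_on_entry(struct toy_vcpu *vcpu, uint64_t now)
{
    if (!vcpu->timer_masked && now >= vcpu->timer_cval) {
        vcpu->timer_masked = true;
        vcpu->timer_irq_pending = true;
    }
}
```

    Because the work item changes neither timer_masked nor
    timer_irq_pending, a checkpoint taken while the VM is stopped cannot
    observe one of them updated without the other.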
     
  • Migrating active interrupts causes the active state to be lost
    completely. This implements some additional bitmaps to track the active
    state on the distributor and export this to user space.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Alex Bennée
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • This helps re-factor away some of the repetitive code and makes the code
    flow more nicely.

    Signed-off-by: Alex Bennée
    Signed-off-by: Christoffer Dall

    Alex Bennée
     
  • There is an interesting bug in the vgic code, which manifests itself
    when the KVM run loop has a signal pending or needs a vmid generation
    rollover after having disabled interrupts but before actually switching
    to the guest.

    In this case, we flush the vgic as usual, but we sync back the vgic
    state and exit to userspace before entering the guest. The consequence
    is that we will be syncing the list registers back to the software model
    using the GICH_ELRSR and GICH_EISR from the last execution of the guest,
    potentially overwriting a list register containing an interrupt.

    This showed up during migration testing where we would capture a state
    where the VM has masked the arch timer but there were no interrupts,
    resulting in a hung test.

    Cc: Marc Zyngier
    Reported-by: Alex Bennee
    Signed-off-by: Christoffer Dall
    Signed-off-by: Alex Bennée
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     

13 Mar, 2015

1 commit


12 Mar, 2015

3 commits

  • This patch enables irqfd on arm/arm64.

    Both irqfd and resamplefd are supported. Injection is implemented
    in vgic.c without routing.

    This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.

    KVM_CAP_IRQFD is now advertised. The KVM_CAP_IRQFD_RESAMPLE capability
    is advertised automatically as soon as CONFIG_HAVE_KVM_IRQFD is set.

    Irqfd injection is restricted to SPI. The rationale behind not
    supporting PPI irqfd injection is that any device using a PPI would
    be a private-to-the-CPU device (timer for instance), so its state
    would have to be context-switched along with the VCPU and would
    require in-kernel wiring anyhow. It is not a relevant use case for
    irqfds.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Eric Auger
     
  • To prepare for irqfd addition, coarse grain locking is removed at
    kvm_vgic_sync_hwstate level and finer grain locking is introduced in
    vgic_process_maintenance only.

    Signed-off-by: Eric Auger
    Acked-by: Christoffer Dall
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Eric Auger
     
  • Introduce the __KVM_HAVE_ARCH_INTC_INITIALIZED define and the
    associated kvm_arch_intc_initialized function. The latter
    allows testing whether the virtual interrupt controller is initialized
    and ready to accept virtual IRQ injection. On some architectures,
    the virtual interrupt controller is dynamically instantiated, justifying
    that kind of check.

    The new function can now be used by irqfd to check whether the
    virtual interrupt controller is ready on KVM_IRQFD request. If not,
    KVM_IRQFD returns -EAGAIN.

    Signed-off-by: Eric Auger
    Acked-by: Christoffer Dall
    Reviewed-by: Andre Przywara
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Eric Auger
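
    The readiness check on the KVM_IRQFD path can be sketched as below.
    This is a user-space model with hypothetical names (struct toy_kvm,
    toy_intc_initialized(), toy_irqfd_assign()); only the -EAGAIN return
    value is taken from the description above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-VM state. */
struct toy_kvm {
    bool intc_initialized; /* set once the virtual interrupt controller is ready */
};

/* Model of the architecture hook that reports controller readiness. */
static bool toy_intc_initialized(const struct toy_kvm *kvm)
{
    return kvm->intc_initialized;
}

/* Refuse to set up an irqfd until interrupts can actually be injected. */
static int toy_irqfd_assign(const struct toy_kvm *kvm)
{
    if (!toy_intc_initialized(kvm))
        return -EAGAIN;
    return 0;
}
```

    A caller that sees -EAGAIN can retry once the virtual interrupt
    controller has been instantiated.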
     

11 Mar, 2015

2 commits

  • Several dts only list "arm,cortex-a7-gic" or "arm,gic-400" in their GIC
    compatible list, and while this is correct (and supported by the GIC
    driver), KVM will fail to detect that it can support these cases.

    This patch adds the missing strings to the VGIC code. The of_device_id
    entries are padded to keep the probe function data aligned.

    Signed-off-by: Mark Rutland
    Cc: Andre Przywara
    Cc: Christoffer Dall
    Cc: Marc Zyngier
    Cc: Michal Simek
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Mark Rutland
     
  • POWER supports irqfds but forgot to advertise them. Some userspace does
    not check for the capability, but others check it---thus they work on
    x86 and s390 but not POWER.

    To keep other architectures from making the same mistake in the future,
    let common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE.

    Reported-and-tested-by: Greg Kurz
    Cc: stable@vger.kernel.org
    Fixes: 297e21053a52f060944e9f0de4c64fad9bcd72fc
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Marcelo Tosatti

    Paolo Bonzini
     

10 Mar, 2015

7 commits

  • WARNING: Prefer [subsystem eg: netdev]_info([subsystem]dev, ... then
    dev_info(dev, ... then pr_info(... to printk(KERN_INFO ...
    + printk(KERN_INFO "kvm: exiting hardware virtualization\n");

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then
    dev_err(dev, ... then pr_err(... to printk(KERN_ERR ...
    + printk(KERN_ERR "kvm: misc device register failed\n");

    Signed-off-by: Xiubo Li
    Signed-off-by: Marcelo Tosatti

    Xiubo Li
     
  • ERROR: code indent should use tabs where possible
    + const struct kvm_io_range *r2)$

    WARNING: please, no spaces at the start of a line
    + const struct kvm_io_range *r2)$

    This patch fixes this ERROR & WARNING to reduce noise when checking new
    patches in kvm_main.c.

    Signed-off-by: Xiubo Li
    Signed-off-by: Marcelo Tosatti

    Xiubo Li
     
  • WARNING: please, no space before tabs
    + * ^I^Ikvm->lock --> kvm->slots_lock --> kvm->irq_lock$

    WARNING: please, no space before tabs
    +^I^I * ^I- gfn_to_hva (kvm_read_guest, gfn_to_pfn)$

    WARNING: please, no space before tabs
    +^I^I * ^I- kvm_is_visible_gfn (mmu_check_roots)$

    This patch fixes these warnings to reduce noise when checking new
    patches in kvm_main.c.

    Signed-off-by: Xiubo Li
    Signed-off-by: Marcelo Tosatti

    Xiubo Li
     
  • There are many Warnings like this:
    WARNING: Missing a blank line after declarations
    + struct kvm_coalesced_mmio_zone zone;
    + r = -EFAULT;

    This patch fixes these warnings to reduce noise when checking new
    patches in kvm_main.c.

    Signed-off-by: Xiubo Li
    Signed-off-by: Marcelo Tosatti

    Xiubo Li
     
  • WARNING: EXPORT_SYMBOL(foo); should immediately follow its
    function/variable
    +EXPORT_SYMBOL_GPL(gfn_to_page);

    This patch fixes these warnings to reduce noise when checking new
    patches in kvm_main.c.

    Signed-off-by: Xiubo Li
    Signed-off-by: Marcelo Tosatti

    Xiubo Li
     
  • ERROR: do not initialise statics to 0 or NULL
    +static int kvm_usage_count = 0;

    The kvm_usage_count variable will be placed in the .bss segment when
    linking, so there is obviously no need to set it to 0 here.

    This patch fixes this ERROR to reduce noise when checking new patches
    in kvm_main.c.

    Signed-off-by: Xiubo Li
    Signed-off-by: Marcelo Tosatti

    Xiubo Li
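
    The point can be checked with a few lines of standalone C. The names
    are hypothetical; the sketch relies only on the C rule that objects
    with static storage duration and no initializer start out as zero.

```c
/* No initializer: static storage is zero-initialized by the C language. */
static int toy_usage_count;

/* Return the current count, then increment it. */
static int toy_usage_get_and_inc(void)
{
    return toy_usage_count++;
}
```

    The first call returns 0 even though the variable was never assigned
    explicitly.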
     
  • WARNING: labels should not be indented
    + out_free_irq_routing:

    This patch fixes this WARNING to reduce noise when checking new patches
    in kvm_main.c.

    Signed-off-by: Xiubo Li
    Signed-off-by: Marcelo Tosatti

    Xiubo Li