29 Jul, 2008

2 commits


25 Jul, 2008

1 commit

  • This patch just extends the anon_inode_getfd interface to take an additional
    parameter with a flag value. The flag value is passed on to
    get_unused_fd_flags in anticipation of its use with the O_CLOEXEC flag.

    No actual semantic changes here; the changed callers all pass 0 for now.

    [akpm@linux-foundation.org: KVM fix]
    Signed-off-by: Ulrich Drepper
    Acked-by: Davide Libenzi
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
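The flag parameter is plumbed through so that O_CLOEXEC can later reach the new descriptor. As a userspace illustration of what that flag means for a descriptor (not of anon_inode_getfd itself, which is kernel-internal), this sketch opens /dev/null with and without O_CLOEXEC and checks FD_CLOEXEC with fcntl(F_GETFD). The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* 1 if fd is close-on-exec, 0 if not, -1 if fcntl() fails. */
static int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}

/* Open /dev/null with extra open(2) flags and report whether the new
 * descriptor came back close-on-exec (-1 if open() fails). */
static int open_reports_cloexec(int extra_flags)
{
    int fd = open("/dev/null", O_RDONLY | extra_flags);
    if (fd < 0)
        return -1;
    int r = fd_is_cloexec(fd);
    close(fd);
    return r;
}
```

Setting the flag atomically at creation time matters because a separate fcntl(F_SETFD) call leaves a window in which another thread could fork and exec with the descriptor still inheritable.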

20 Jul, 2008

11 commits

  • smp_call_function_mask() now complains when called in a preemptible context;
    adjust its callers accordingly.

    Signed-off-by: Avi Kivity

    Avi Kivity
  • Flush the shadow mmu before removing regions to avoid stale entries.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
  • This patch #ifdefs the bitmap array for dirty tracking. We don't have dirty
    tracking on s390 today, and we'd love to use our storage keys to store the
    dirty information for migration. Therefore, we won't need this array at all,
    and given our limited amount of vmalloc space, it limits the number of guests
    we can run.

    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Carsten Otte
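For a sense of scale, here is a sketch of the bitmap size, assuming one bit per guest page rounded up to whole unsigned longs. The struct and the SKETCH_NO_DIRTY_BITMAP switch are hypothetical stand-ins for the #ifdef the patch adds, not the real names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical memslot: the bitmap member only exists when dirty
 * tracking is built in (an architecture like s390 would define the switch). */
struct memslot_sketch {
    size_t npages;
#ifndef SKETCH_NO_DIRTY_BITMAP
    unsigned long *dirty_bitmap;   /* one bit per guest page */
#endif
};

/* Bytes of (vmalloc'ed) bitmap needed for npages guest pages, rounded up
 * to a whole number of unsigned longs. */
static size_t dirty_bitmap_bytes(size_t npages)
{
    size_t bits = (npages + BITS_PER_LONG - 1) / BITS_PER_LONG * BITS_PER_LONG;
    return bits / CHAR_BIT;
}
```

A 4 GiB slot of 4 KiB pages is 1,048,576 pages, so dirty_bitmap_bytes(1 << 20) is 131072 bytes (128 KiB) of vmalloc space per slot under these assumptions.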
  • Currently kvmtrace is not portable: a trace file cannot be copied from a
    big-endian target to a little-endian workstation for analysis.
    With this patch, the kernel writes metadata containing a magic number to the
    trace log, and 64-bit words are stored as a u64 instead of a pair of u32s.

    Signed-off-by: Tan Li
    Acked-by: Jerone Young
    Acked-by: Hollis Blanchard
    Signed-off-by: Avi Kivity

    Tan, Li
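A reader on the analysis workstation can use the magic number to detect a foreign byte order. The sketch below assumes a hypothetical magic value of 0x12345678 (the real kvmtrace constant may differ) and shows why a single u64 is easier to fix up than a pair of u32s: it byte-swaps as one unit, with no question of which half comes first.

```c
#include <assert.h>
#include <stdint.h>

#define TRACE_MAGIC 0x12345678u   /* hypothetical value for this sketch */

static uint32_t swab32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* A u64 swaps as one unit: swap each half and exchange the halves. */
static uint64_t swab64(uint64_t x)
{
    return ((uint64_t)swab32((uint32_t)x) << 32) | swab32((uint32_t)(x >> 32));
}

/* 0: trace has the reader's byte order, 1: opposite order, -1: not a trace. */
static int trace_needs_swap(uint32_t magic_as_read)
{
    if (magic_as_read == TRACE_MAGIC)
        return 0;
    if (magic_as_read == swab32(TRACE_MAGIC))
        return 1;
    return -1;
}

/* Fix up one 64-bit field (e.g. a timestamp) read from the trace. */
static uint64_t trace_u64(uint64_t raw, int swap)
{
    return swap ? swab64(raw) : raw;
}
```

A reader would call trace_needs_swap() once on the metadata and then pass the result to trace_u64() for every 64-bit field.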
  • This patch adds all needed structures to coalesce MMIOs.
    Until an architecture uses it, it is not compiled.

    Coalesced MMIO introduces two ioctls that define the MMIO zones that can be
    coalesced:

    - KVM_REGISTER_COALESCED_MMIO registers a coalesced MMIO zone.
    It takes one parameter (struct kvm_coalesced_mmio_zone) which defines
    a memory area where MMIOs can be coalesced until the next switch to
    user space. The maximum number of MMIO zones is KVM_COALESCED_MMIO_ZONE_MAX.

    - KVM_UNREGISTER_COALESCED_MMIO cancels all registered zones inside
    the given bounds (bounds are also given by struct kvm_coalesced_mmio_zone).

    The userspace client can check kernel coalesced MMIO availability by asking
    ioctl(KVM_CHECK_EXTENSION) for the KVM_CAP_COALESCED_MMIO capability.
    This call returns 0 if the capability is not supported, or otherwise the
    page offset at which the ring buffer is stored.
    The page offset depends on the architecture.

    After an ioctl(KVM_RUN), the first page of the memory mapped from KVM holds
    a kvm_run structure. The offset given by KVM_CAP_COALESCED_MMIO is
    the offset of the coalesced MMIO ring, expressed in units of PAGE_SIZE
    relative to the start of the kvm_run structure. The MMIO ring buffer
    is defined by the structure kvm_coalesced_mmio_ring.

    [akio: fix oops during guest shutdown]

    Signed-off-by: Laurent Vivier
    Signed-off-by: Akio Takebe
    Signed-off-by: Avi Kivity

    Laurent Vivier
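The producer/consumer protocol can be sketched in user space. The entry layout and the first/last indices are modelled on kvm_coalesced_mmio_ring, though the exact field layout here is an assumption; RING_MAX is a hypothetical small capacity (the real one depends on the page size), and this sketch keeps one slot empty to tell a full ring from an empty one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One coalesced write; layout modelled on struct kvm_coalesced_mmio
 * (an assumption, not copied from the header). */
struct mmio_entry {
    uint64_t phys_addr;
    uint32_t len;
    uint32_t pad;
    uint8_t  data[8];
};

#define RING_MAX 4u   /* hypothetical; the real capacity depends on PAGE_SIZE */

struct mmio_ring {
    uint32_t first;   /* next entry user space will replay */
    uint32_t last;    /* next slot the kernel will fill */
    struct mmio_entry entries[RING_MAX];
};

/* Producer (kernel side): record a write; returns 0 when the ring is full,
 * in which case the write would have to go through a normal exit instead. */
static int ring_push(struct mmio_ring *r, uint64_t addr, const void *buf, uint32_t len)
{
    uint32_t next = (r->last + 1) % RING_MAX;
    if (next == r->first || len > sizeof r->entries[0].data)
        return 0;
    struct mmio_entry *e = &r->entries[r->last];
    e->phys_addr = addr;
    e->len = len;
    memcpy(e->data, buf, len);
    r->last = next;
    return 1;
}

/* Consumer (user space, after KVM_RUN returns): replay pending writes in
 * order; returns how many were replayed and the last address seen. */
static unsigned ring_drain(struct mmio_ring *r, uint64_t *last_addr)
{
    unsigned n = 0;
    while (r->first != r->last) {
        const struct mmio_entry *e = &r->entries[r->first];
        *last_addr = e->phys_addr;   /* a real client would emulate the write here */
        r->first = (r->first + 1) % RING_MAX;
        n++;
    }
    return n;
}

/* The ring lives page_offset pages past the start of the mapped kvm_run. */
static void *ring_address(void *kvm_run, long page_offset, long page_size)
{
    return (char *)kvm_run + page_offset * page_size;
}
```

With RING_MAX of 4, three writes fit before ring_push() reports the ring full; one drain then replays all three in order.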
  • Modify the in_range() member of struct kvm_io_device to pass the length and
    the type of the I/O (write or read).

    This modification allows kvm_io_device to be used with coalesced MMIO.

    Signed-off-by: Laurent Vivier
    Signed-off-by: Avi Kivity

    Laurent Vivier
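With the length and direction available, a coalesced MMIO zone can require that the whole access fits inside it. A minimal sketch, with a hypothetical zone struct modelled on kvm_coalesced_mmio_zone and the assumption that only writes are coalesced, since a read needs its result before the guest can continue.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical zone descriptor, modelled on kvm_coalesced_mmio_zone. */
struct zone {
    uint64_t addr;
    uint32_t size;
};

/* A write of len bytes at addr is accepted only if the whole access
 * [addr, addr + len) lies inside the zone; reads are never coalesced. */
static int zone_in_range(const struct zone *z, uint64_t addr, int len, int is_write)
{
    if (!is_write)
        return 0;
    if (addr < z->addr)
        return 0;
    return addr + (uint64_t)len <= z->addr + z->size;
}
```

Without the length, a 4-byte write starting two bytes before the end of a zone would be wrongly accepted as in range.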
  • [avi: fix ia64 build breakage]

    Signed-off-by: Sheng Yang
    Signed-off-by: Avi Kivity

    Sheng Yang
  • Obsoleted by the vmx-specific per-cpu list.

    Signed-off-by: Avi Kivity

    Avi Kivity
  • KVM turns off hardware virtualization extensions during reboot, in order
    to disassociate the memory used by the virtualization extensions from the
    processor, and in order to have the system in a consistent state.
    Unfortunately, virtual machines may still be running while this goes on,
    and once virtualization extensions are turned off, any virtualization
    instruction will #UD on execution.

    Fix by adding an exception handler to virtualization instructions; if we get
    an exception during reboot, we simply spin waiting for the reset to complete.
    If it's a true exception, BUG() so we can have our stack trace.

    Signed-off-by: Avi Kivity

    Avi Kivity
  • This patch allows VMAs that contain no backing page to be used for guest
    memory. This is useful for assigning mmio regions to a guest.

    Signed-off-by: Anthony Liguori
    Signed-off-by: Avi Kivity

    Anthony Liguori
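For a mapping with no backing struct page, the pfn has to be computed from the VMA instead of looked up. A sketch assuming that vm_pgoff holds the pfn backing vm_start (true for raw pfn maps set up that way, but an assumption here, since the commit does not say how the pfn is derived); struct vma_sketch is a hypothetical stand-in for vm_area_struct.

```c
#include <assert.h>

#define PAGE_SHIFT 12

/* Hypothetical stand-in for the two vm_area_struct fields used here. */
struct vma_sketch {
    unsigned long vm_start;   /* first user address of the mapping */
    unsigned long vm_pgoff;   /* assumed: pfn backing vm_start */
};

/* Translate a user address inside a raw pfn mapping to its pfn; there is
 * no struct page to consult, so this is pure arithmetic. */
static unsigned long pfnmap_addr_to_pfn(const struct vma_sketch *vma, unsigned long addr)
{
    assert(addr >= vma->vm_start);
    return ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
}
```

For a mapping at 0x40000000 whose first pfn is 0xfe000, any address in the fourth page (0x40003000 to 0x40003fff) maps to pfn 0xfe003.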
  • kvm_dev_ioctl casts the arg value to void __user *, just to cast it back
    to long. This seems unnecessary.

    According to objdump the binary code on x86 is unchanged by this patch.

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Avi Kivity

    Christian Borntraeger

16 Jul, 2008

1 commit

  • Conflicts:

    arch/powerpc/Kconfig
    arch/s390/kernel/time.c
    arch/x86/kernel/apic_32.c
    arch/x86/kernel/cpu/perfctr-watchdog.c
    arch/x86/kernel/i8259_64.c
    arch/x86/kernel/ldt.c
    arch/x86/kernel/nmi_64.c
    arch/x86/kernel/smpboot.c
    arch/x86/xen/smp.c
    include/asm-x86/hw_irq_32.h
    include/asm-x86/hw_irq_64.h
    include/asm-x86/mach-default/irq_vectors.h
    include/asm-x86/mach-voyager/irq_vectors.h
    include/asm-x86/smp.h
    kernel/Makefile

    Signed-off-by: Ingo Molnar

    Ingo Molnar

06 Jul, 2008

1 commit

  • The "remote_irr" variable is used to indicate an interrupt
    which has been received by the LAPIC, but not acked.

    In our EOI handler, we unset remote_irr and re-inject the
    interrupt if the interrupt line is still asserted.

    However, we do not set remote_irr here, leading to a
    situation where, if kvm_ioapic_set_irq() is called, we go
    ahead and call ioapic_service(). This means that IRR is
    re-asserted even though the interrupt is currently in service
    (i.e. LAPIC IRR is cleared and ISR/TMR set).

    The issue with this is that when the currently executing
    interrupt handler finishes and writes LAPIC EOI, TMR is
    unset and EOI is sent to the IOAPIC. Since IRR is now asserted
    but TMR is not, when the second interrupt is handled,
    no EOI is sent, and any pending interrupt is not
    re-injected.

    This fixes a hang only seen while running mke2fs -j on an
    8GB virtio disk backed by a fully sparse raw file, with
    aliguori's "avoid fragmented virtio-blk transfers by copying"
    changes.

    Signed-off-by: Mark McLoughlin
    Acked-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Mark McLoughlin

26 Jun, 2008

2 commits


24 Jun, 2008

1 commit

  • The ioapic acknowledge path translates interrupt vectors to irqs. It
    currently uses a first-match algorithm, stopping when it finds the first
    redirection table entry containing the vector. That fails, however, if the
    guest changes the irq to a different line, leaving the old redirection table
    entry in place (though masked). The result is that interrupts do not reach
    the guest.

    Fix by always scanning the entire redirection table.

    Signed-off-by: Avi Kivity

    Avi Kivity
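The difference between the old and the fixed acknowledge path, as a sketch over a hypothetical redirection table (24 pins assumed): a masked stale entry that still carries the vector stops the first-match search before it reaches the line the guest actually uses.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PINS 24   /* assumed pin count for this sketch */

struct redir_entry {
    uint8_t vector;
    int masked;       /* a masked entry can still carry a stale vector */
};

/* Old lookup: stop at the first entry that uses the vector. */
static int eoi_first_match(const struct redir_entry *tbl, uint8_t vector)
{
    for (int pin = 0; pin < NUM_PINS; pin++)
        if (tbl[pin].vector == vector)
            return pin;
    return -1;
}

/* Fixed lookup: handle every entry that uses the vector. acked[] (room for
 * NUM_PINS entries) receives the pins; the return value is their count. */
static int eoi_full_scan(const struct redir_entry *tbl, uint8_t vector, int *acked)
{
    int n = 0;
    for (int pin = 0; pin < NUM_PINS; pin++)
        if (tbl[pin].vector == vector)
            acked[n++] = pin;
    return n;
}
```

If the guest moved vector 0x31 from pin 3 (now masked) to pin 10, the first-match lookup attributes the EOI to pin 3 only, while the full scan also reaches pin 10.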

07 Jun, 2008

1 commit

  • There's a bug in the IOAPIC code for level-triggered interrupts. It's
    relatively easy to trigger by sharing (virtio-blk + usbtablet was the
    testcase, initially reported by Gerd von Egidy).

    The "remote_irr" variable is used to indicate accepted but not yet acked
    interrupts. It's cleared from the EOI handler.

    The problem is that the EOI handler clears remote_irr unconditionally, even
    if it reinjected another pending interrupt.

    In that case, kvm_ioapic_set_irq() proceeds to ioapic_service(), which
    sets remote_irr even if it failed to inject (since the IRR was high due
    to EOI reinjection).

    Since the TMR bit has been cleared by the first EOI, the second one
    fails to clear remote_irr.

    The end result is a dead interrupt line.

    Fix it by setting remote_irr only if a new pending interrupt has been
    generated (and the TMR bit for the vector in question set).

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti

18 May, 2008

1 commit

  • There's still a race in kvm_vcpu_block(), if a wake_up_interruptible()
    call happens before the task state is set to TASK_INTERRUPTIBLE:

    CPU0                                    CPU1

    kvm_vcpu_block
    add_wait_queue
    kvm_cpu_has_interrupt = 0
                                            set interrupt
                                            if (waitqueue_active())
                                                wake_up_interruptible()
    kvm_cpu_has_pending_timer
    kvm_arch_vcpu_runnable
    signal_pending
    set_current_state(TASK_INTERRUPTIBLE)
    schedule()

    This can be fixed by using prepare_to_wait(), which sets the task state
    before testing the wait condition.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
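The lost wakeup in the CPU0/CPU1 trace above can be replayed deterministically in a toy single-threaded model, where the waker runs exactly in the window between the condition check and the sleep. The types and functions are hypothetical; only the ordering mirrors the commit.

```c
#include <assert.h>
#include <stdbool.h>

enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

/* Toy model of one vcpu task and the CPU that raises its interrupt. */
struct sim {
    enum task_state state;
    bool on_waitqueue;
    bool interrupt_pending;
};

/* CPU1: set the interrupt, then wake the task if it is queued. Waking a
 * task that is already RUNNING changes nothing. */
static void raise_interrupt(struct sim *s)
{
    s->interrupt_pending = true;
    if (s->on_waitqueue)
        s->state = TASK_RUNNING;
}

/* Old order: check the condition, then set the state. CPU1 runs in the
 * window in between; returns true if schedule() would sleep anyway. */
static bool old_order_misses_wakeup(void)
{
    struct sim s = { TASK_RUNNING, false, false };
    s.on_waitqueue = true;                    /* add_wait_queue() */
    bool must_sleep = !s.interrupt_pending;   /* kvm_cpu_has_interrupt() etc. */
    raise_interrupt(&s);                      /* the race window */
    if (!must_sleep)
        return false;
    s.state = TASK_INTERRUPTIBLE;             /* set_current_state() */
    return s.state == TASK_INTERRUPTIBLE;     /* schedule() would sleep */
}

/* New order: prepare_to_wait() queues the task and sets the state before
 * the check, so the same wakeup leaves it RUNNING. */
static bool new_order_misses_wakeup(void)
{
    struct sim s = { TASK_RUNNING, false, false };
    s.on_waitqueue = true;                    /* prepare_to_wait(): queue... */
    s.state = TASK_INTERRUPTIBLE;             /* ...and set the state */
    bool must_sleep = !s.interrupt_pending;
    raise_interrupt(&s);
    if (!must_sleep)
        return false;
    return s.state == TASK_INTERRUPTIBLE;     /* RUNNING here, so schedule() returns */
}
```

old_order_misses_wakeup() returns true and new_order_misses_wakeup() returns false: once the state is set first, a wakeup landing after the check is no longer lost.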

04 May, 2008

1 commit


02 May, 2008

1 commit

  • a) none of the callers even looks at inode or file returned by anon_inode_getfd()
    b) any caller that would try to look at those would be racy, since by the time
    it returns we might have raced with close() from another thread and that
    file would be pining for fjords.

    Signed-off-by: Al Viro

    Al Viro

27 Apr, 2008

14 commits

  • Use kvm's own refcounting instead of playing with ->filp->f_count.
    That will allow getting rid of a lot of crap in anon_inode_getfd() and
    killing a race in kvm_dev_ioctl_create_vm() (the file might have been closed
    immediately by another thread, so ->filp might point to an already freed
    struct file when we get around to setting it).

    Signed-off-by: Al Viro
    Signed-off-by: Avi Kivity

    Al Viro
  • It's a globally exported symbol now.

    Signed-off-by: Hollis Blanchard
    Signed-off-by: Avi Kivity

    Hollis Blanchard
  • So userspace can save/restore the mpstate during migration.

    [avi: export the #define constants describing the value]
    [christian: add s390 stubs]
    [avi: ditto for ia64]

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
  • Timers that fire between guest hlt and vcpu_block's add_wait_queue() are
    ignored, possibly resulting in hangs.

    Also make sure that atomic_inc and waitqueue_active tests happen in the
    specified order, otherwise the following race is open:

    CPU0                                    CPU1
    if (waitqueue_active(wq))
                                            add_wait_queue()
                                            if (!atomic_read(pit_timer->pending))
                                                schedule()
    atomic_inc(pit_timer->pending)

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
  • This interface allows a user space application to read a trace of
    kvm-related events through relayfs.

    Signed-off-by: Feng (Eric) Liu
    Signed-off-by: Avi Kivity

    Feng(Eric) Liu
  • This patch introduces a gfn_to_pfn() function and corresponding functions like
    kvm_release_pfn_dirty(). Using these new functions, we can modify the x86
    MMU to no longer assume that it can always get a struct page for any given gfn.

    We don't want to eliminate gfn_to_page() entirely because a number of places
    assume they can do gfn_to_page() and then kmap() the results. When we support
    IO memory, gfn_to_page() will fail for IO pages although gfn_to_pfn() will
    succeed.

    This does not implement support for avoiding reference counting for reserved
    RAM or for IO memory. However, it should make those things pretty
    straightforward.

    Since we're only introducing new common symbols, I don't think it will break
    the non-x86 architectures but I haven't tested those. I've tested Intel,
    AMD, NPT, and hugetlbfs with Windows and Linux guests.

    [avi: fix overflow when shifting left pfns by adding casts]

    Signed-off-by: Anthony Liguori
    Signed-off-by: Avi Kivity

    Anthony Liguori
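The "[avi: fix overflow ...]" note is about the order of cast and shift. A sketch assuming a 32-bit host, where the pfn type is 32 bits wide: a pfn of 0x100000 (the 4 GiB boundary with 4 KiB pages) loses its high bits if it is shifted before being widened. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* 32-bit pfn shifted before widening: the shift is done in 32 bits, so
 * physical addresses at or above 4 GiB wrap around. */
static uint64_t pfn_to_phys_buggy(uint32_t pfn)
{
    return pfn << PAGE_SHIFT;
}

/* Cast first, then shift: the full 64-bit address survives. */
static uint64_t pfn_to_phys_fixed(uint32_t pfn)
{
    return (uint64_t)pfn << PAGE_SHIFT;
}
```

Below 4 GiB both versions agree, which is why this kind of bug only shows up on hosts with memory (or MMIO) above that boundary.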
  • The main purpose of adding these functions is the ability to release the
    spinlock that protects the kvm list while still being able to do operations
    on a specific kvm in a safe way.

    Signed-off-by: Izik Eidus
    Signed-off-by: Avi Kivity

    Izik Eidus
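The pattern these functions enable: find the VM under the list lock, take a reference, drop the lock, work on the VM, then drop the reference; whoever drops the last reference frees it. A C11 sketch with hypothetical names (vm_get/vm_put), not the kernel's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical VM object; only the refcount matters here. */
struct vm {
    atomic_int users;
    int *destroyed_flag;   /* lets a caller observe destruction */
};

static struct vm *vm_create(int *destroyed_flag)
{
    struct vm *vm = malloc(sizeof *vm);
    if (!vm)
        return NULL;
    atomic_init(&vm->users, 1);   /* the creator's reference */
    vm->destroyed_flag = destroyed_flag;
    return vm;
}

/* Take a reference, e.g. while still holding the list lock. */
static void vm_get(struct vm *vm)
{
    atomic_fetch_add(&vm->users, 1);
}

/* Drop a reference; atomic_fetch_sub returns the old value, so 1 means
 * this caller held the last reference and must free the object. */
static void vm_put(struct vm *vm)
{
    if (atomic_fetch_sub(&vm->users, 1) == 1) {
        *vm->destroyed_flag = 1;
        free(vm);
    }
}
```

Because the reference is taken before the lock is released, the VM cannot be freed underneath the caller even if its creator drops its own reference concurrently.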
  • Since kvm_regs is too big to allocate on the kernel stack on ia64,
    use kzalloc to allocate it.

    Signed-off-by: Xiantao Zhang
    Signed-off-by: Avi Kivity

    Xiantao Zhang
  • Create large page mappings if the guest PTEs are marked as such and
    the underlying memory is hugetlbfs backed. If the large page contains
    write-protected pages, a large pte is not used.

    This gives a consistent 2% improvement for data copies on a RAM-mounted
    filesystem, without NPT/EPT.

    Anthony measures a 4% improvement on 4-way kernbench, with NPT.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
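The large-page decision as a sketch, assuming 2 MiB large pages over 4 KiB pages (512 small pages each) and a hypothetical per-page write-protect array. This version simply scans the 512 flags; a per-large-page counter would avoid the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGES_PER_LARGE 512u   /* assumed: 2 MiB large page / 4 KiB page */

/* Hypothetical inputs: the guest PTE's large-page bit, whether the slot is
 * hugetlbfs backed, and one write-protect flag per 4 KiB page (gfn-indexed). */
static bool use_large_pte(uint64_t gfn, bool guest_pte_large, bool hugetlbfs_backed,
                          const bool *write_protected)
{
    if (!guest_pte_large || !hugetlbfs_backed)
        return false;
    uint64_t base = gfn & ~(uint64_t)(PAGES_PER_LARGE - 1);  /* 2 MiB aligned */
    for (uint64_t i = 0; i < PAGES_PER_LARGE; i++)
        if (write_protected[base + i])
            return false;   /* fall back to 4 KiB mappings */
    return true;
}
```

A single write-protected 4 KiB page anywhere in the aligned 2 MiB region is enough to force small mappings for that whole region, while neighbouring regions are unaffected.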
  • Mark zapped root pagetables as invalid and ignore such pages during lookup.

    This is a problem with the cr3-target feature, where a zapped root table fools
    the faulting code into creating a read-only mapping. The result is a lockup
    if the instruction can't be emulated.

    Signed-off-by: Marcelo Tosatti
    Cc: Anthony Liguori
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
  • With CONFIG_PREEMPT=n, this is needed in order to prevent the fault-in
    code from sleeping.

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Avi Kivity

    Andrea Arcangeli
  • The second page is only needed on archs that support pio.

    Noted by Carsten Otte.

    Signed-off-by: Avi Kivity

    Avi Kivity
  • Signed-off-by: Avi Kivity

    Avi Kivity
  • Signed-off-by: Jan Engelhardt
    Signed-off-by: Avi Kivity

    Jan Engelhardt

04 Mar, 2008

2 commits


09 Feb, 2008

1 commit

  • Sometimes simple attributes might need to return an error, e.g. for
    acquiring a mutex interruptibly. In fact we have that situation in
    spufs already, which is the original user of the simple attributes. This
    patch merges the temporarily forked attributes in spufs back into the
    main ones and allows them to return errors.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Christoph Hellwig
    Cc:
    Cc: Arnd Bergmann
    Cc: Greg KH
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
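A userspace model of the changed contract: get() now returns an int, and the read path hands an error such as -EINTR back to the caller instead of formatting a value. The struct and helpers are hypothetical; only the idea of returning an error from get/set comes from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of a simple attribute whose accessors can fail. */
struct simple_attr_sketch {
    int (*get)(void *data, uint64_t *val);
    int (*set)(void *data, uint64_t val);
    const char *fmt;                  /* e.g. "%llu\n" */
};

/* Read path: an error from get() is returned as-is; otherwise the value is
 * formatted into buf and the snprintf() length is returned. */
static int attr_read(const struct simple_attr_sketch *a, void *data,
                     char *buf, size_t len)
{
    uint64_t val;
    int ret = a->get(data, &val);
    if (ret)
        return ret;
    return snprintf(buf, len, a->fmt, (unsigned long long)val);
}

/* Example getters: one succeeds, one acts as if an interruptible lock
 * acquisition was interrupted. */
static int get_ok(void *data, uint64_t *val)
{
    *val = *(const uint64_t *)data;
    return 0;
}

static int get_interrupted(void *data, uint64_t *val)
{
    (void)data;
    (void)val;
    return -EINTR;
}
```

Reading an attribute backed by get_ok() with a value of 42 yields "42\n"; one backed by get_interrupted() returns -EINTR without touching the buffer.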