12 Jan, 2011

13 commits

  • Make it available for all archs.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Large page information has two elements, but only one of them,
    write_count, is accessed through a helper function.

    This patch replaces that helper with a more generic one which returns the
    newly named kvm_lpage_info structure, and uses it to access the other
    element, rmap_pde.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Avi Kivity

    Takuya Yoshikawa
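
A minimal sketch of the idea, with simplified field types and a toy lookup (the real kernel code indexes per-memslot arrays per large-page level; `toy_slot` and `lpage_info_slot` here are illustrative stand-ins):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: the two per-large-page fields are bundled into one struct, and
 * a single generic lookup helper returns a pointer to it, so callers can
 * reach either field instead of going through a write_count-only helper.
 * Types and the lookup are simplified placeholders, not the kernel's code. */
struct kvm_lpage_info {
    int write_count;
    unsigned long rmap_pde;
};

/* Toy "memslot" holding one info record per large page. */
struct toy_slot {
    struct kvm_lpage_info lpage_info[4];
};

/* Generic accessor replacing the write_count-only helper. */
static struct kvm_lpage_info *lpage_info_slot(struct toy_slot *slot, int idx)
{
    return &slot->lpage_info[idx];
}
```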
     
  • Quote from Avi:
    | I don't think we need to flush immediately; set a "tlb dirty" bit somewhere
    | that is cleared when we flush the tlb. kvm_mmu_notifier_invalidate_page()
    | can consult the bit and force a flush if set.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
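
The quoted scheme can be sketched with a toy flag and invented function names (the real code works on per-vcpu state under appropriate locking; this only shows the deferred-flush logic):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the "tlb dirty" idea quoted above: a flag records that TLB
 * state is stale; flushing clears it; the invalidate path forces a flush
 * only if the flag is set. All names here are illustrative. */
static bool tlb_dirty;
static int flush_count;

static void mark_tlb_dirty(void) { tlb_dirty = true; }

static void flush_tlb(void)
{
    flush_count++;
    tlb_dirty = false;
}

/* Analogue of kvm_mmu_notifier_invalidate_page() consulting the bit. */
static void invalidate_page(void)
{
    if (tlb_dirty)
        flush_tlb();
}
```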
     
  • KVM compilation fails with the following warning:

    include/linux/kvm_host.h: In function 'kvm_irq_routing_update':
    include/linux/kvm_host.h:679:2: error: 'struct kvm' has no member named 'irq_routing'

    That function is used, and only makes sense, on systems that implement
    an in-kernel interrupt chip. PPC doesn't.

    Fix by #ifdef'ing it out when no irqchip is available.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
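
The shape of the fix can be sketched as follows, with a stand-in config macro and toy struct (the real guard is `__KVM_HAVE_IOAPIC` / per-arch Kconfig; names here are simplified):

```c
#include <assert.h>

/* Sketch: the member and the helper body are only compiled when an
 * in-kernel irqchip exists, so architectures without one (e.g. PPC)
 * never reference the missing irq_routing member. */
#define HAVE_KVM_IRQCHIP 1   /* would come from per-arch config */

struct toy_kvm {
#ifdef HAVE_KVM_IRQCHIP
    void *irq_routing;       /* only exists with an in-kernel irqchip */
#endif
    int dummy;
};

static int irq_routing_present(void)
{
#ifdef HAVE_KVM_IRQCHIP
    return 1;
#else
    return 0;
#endif
}
```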
     
  • Store irq routing table pointer in the irqfd object,
    and use that to inject MSI directly without bouncing out to
    a kernel thread.

    While we touch this structure, rearrange the irqfd fields so that the
    fast path is better packed, for better cache utilization.

    This also adds some comments about locking rules and rcu usage in the
    code.

    Some notes on the design:
    - Use a pointer into the rt instead of copying an entry,
    to make it possible to use rcu, thus side-stepping
    locking complexities. We also save some memory this way.
    - Old workqueue code is still used for level irqs.
    I don't think we DTRT with level anyway; however,
    it seems easier to keep the code around, as
    it has been thought through and debugged, and to fix level later
    than rip it out and reinstate it later.

    Signed-off-by: Michael S. Tsirkin
    Acked-by: Marcelo Tosatti
    Acked-by: Gregory Haskins
    Signed-off-by: Avi Kivity

    Michael S. Tsirkin
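
The first design note can be sketched in plain userspace C (no real RCU here; in the kernel the write side would use rcu_assign_pointer() and the read side rcu_dereference(), as the comments note; all struct and function names are invented):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: the irqfd keeps a pointer into the current routing table
 * rather than a private copy, so publishing a new table entry is a
 * pointer-visible update and no entry is duplicated per irqfd. */
struct msi_entry { int gsi; int data; };

struct routing_table { struct msi_entry entries[8]; };

struct toy_irqfd {
    const struct msi_entry *entry;  /* points into the live table */
};

static void irqfd_update(struct toy_irqfd *fd,
                         const struct routing_table *rt, int gsi)
{
    /* In the kernel this assignment would use rcu_assign_pointer(). */
    fd->entry = &rt->entries[gsi];
}

static int irqfd_inject(const struct toy_irqfd *fd)
{
    /* In the kernel the read side would use rcu_dereference(). */
    return fd->entry->data;
}
```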
     
  • Cosmetic change, but it helps to correlate IRQs with PCI devices.

    Acked-by: Alex Williamson
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
     
  • This improves the IRQ forwarding for assigned devices: By using the
    kernel's threaded IRQ scheme, we can get rid of the latency-prone work
    queue and simplify the code in the same run.

    Moreover, we no longer have to hold assigned_dev_lock while raising the
    guest IRQ, which can be a lengthy operation as we may have to iterate
    over all VCPUs. The lock is now only used for synchronizing masking vs.
    unmasking of INTx-type IRQs, so it is renamed to intx_lock.

    Acked-by: Alex Williamson
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
     
  • IA64 support forces us to abstract the allocation of the kvm structure.
    But instead of mixing this up with arch-specific initialization and
    doing the same on destruction, split both steps. This allows moving
    generic destruction calls into generic code.

    It also fixes error clean-up on failures of kvm_create_vm for IA64.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     
  • Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate, via
    vmalloc(), a bitmap which will be used in the next logging round, and this
    has been hurting VGA and live migration: vmalloc() consumes extra system
    time, triggers TLB flushes, etc.

    This patch resolves this issue by pre-allocating one more bitmap and switching
    between two bitmaps during dirty logging.

    Performance improvement:
    I measured performance for the case of VGA update by trace-cmd.
    The result was 1.5 times faster than the original one.

    In the case of live migration, the improvement ratio depends on the workload
    and the guest memory size. In general, the larger the memory size is the more
    benefits we get.

    Note:
    This does not change other architectures' logic, but the allocation size
    doubles. This will increase the actual memory consumption only when
    the new size changes the number of pages allocated by vmalloc().

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Fernando Luis Vazquez Cao
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
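
The double-bitmap switch can be sketched like this; the layout and function name are invented simplifications (the kernel allocates the bitmaps dynamically and copies the retired one to userspace):

```c
#include <assert.h>
#include <string.h>

/* Sketch of the pre-allocated double-bitmap scheme: instead of
 * vmalloc()ing a fresh bitmap on every GET_DIRTY_LOG, two bitmaps are
 * allocated once and swapped; the retired one is handed to userspace
 * while the other collects new dirty bits. */
#define BITMAP_WORDS 4

struct toy_memslot {
    unsigned long dirty_a[BITMAP_WORDS];
    unsigned long dirty_b[BITMAP_WORDS];
    unsigned long *active;   /* bitmap currently collecting dirty bits */
};

static unsigned long *get_dirty_log(struct toy_memslot *slot)
{
    unsigned long *full = slot->active;

    /* Switch logging to the other bitmap and clear it for reuse. */
    slot->active = (full == slot->dirty_a) ? slot->dirty_b : slot->dirty_a;
    memset(slot->active, 0, sizeof(slot->dirty_a));
    return full;             /* caller copies this out to userspace */
}
```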
     
  • As suggested by Andrea, pass the r/w error code to gup(), upgrading a read
    fault to writable if the host pte allows it.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
     
  • The guest enables async PF vcpu functionality using this MSR.

    Reviewed-by: Rik van Riel
    Signed-off-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Gleb Natapov
     
  • Keep track of memslot changes by keeping a generation number in the
    memslots structure. Provide a kvm_write_guest_cached() function that skips
    the gfn_to_hva() translation if memslots have not changed since the
    previous invocation.

    Acked-by: Rik van Riel
    Signed-off-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Gleb Natapov
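
A toy sketch of the generation-number cache (the translation itself is faked; `toy_ghc` stands in for the kernel's gfn_to_hva_cache, and the counter only exists to make the fast path observable):

```c
#include <assert.h>

/* Sketch: a cached translation is reused while the memslots generation
 * is unchanged, and redone after any memslot update bumps it. */
struct toy_slots { unsigned generation; unsigned long base_hva; };

struct toy_ghc {          /* analogue of a gfn_to_hva cache */
    unsigned generation;
    unsigned long hva;
};

static int translations;  /* counts slow-path lookups, for illustration */

static unsigned long cached_gfn_to_hva(struct toy_ghc *ghc,
                                       const struct toy_slots *slots,
                                       unsigned long gfn)
{
    if (ghc->generation != slots->generation) {
        translations++;                       /* slow path */
        ghc->hva = slots->base_hva + gfn;     /* toy translation */
        ghc->generation = slots->generation;
    }
    return ghc->hva;                          /* fast path otherwise */
}
```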
     
  • If a guest accesses swapped-out memory, do not swap it in from vcpu thread
    context. Schedule work to do the swapping and put the vcpu into a halted
    state instead.

    Interrupts will still be delivered to the guest, and if an interrupt
    causes a reschedule, the guest will continue running another task.

    [avi: remove call to get_user_pages_noio(), nacked by Linus; this
    makes everything synchronous again]

    Acked-by: Rik van Riel
    Signed-off-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Gleb Natapov
     

25 Oct, 2010

1 commit

  • * 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (321 commits)
    KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages
    KVM: Fix signature of kvm_iommu_map_pages stub
    KVM: MCE: Send SRAR SIGBUS directly
    KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED
    KVM: fix typo in copyright notice
    KVM: Disable interrupts around get_kernel_ns()
    KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address
    KVM: MMU: move access code parsing to FNAME(walk_addr) function
    KVM: MMU: audit: check whether have unsync sps after root sync
    KVM: MMU: audit: introduce audit_printk to cleanup audit code
    KVM: MMU: audit: unregister audit tracepoints before module unloaded
    KVM: MMU: audit: fix vcpu's spte walking
    KVM: MMU: set access bit for direct mapping
    KVM: MMU: cleanup for error mask set while walk guest page table
    KVM: MMU: update 'root_hpa' out of loop in PAE shadow path
    KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn()
    KVM: x86: Fix constant type in kvm_get_time_scale
    KVM: VMX: Add AX to list of registers clobbered by guest switch
    KVM guest: Move a printk that's using the clock before it's ready
    KVM: x86: TSC catchup mode
    ...

    Linus Torvalds
     

24 Oct, 2010

7 commits


20 Aug, 2010

1 commit


02 Aug, 2010

2 commits


01 Aug, 2010

7 commits

  • May be used for distinguishing between internal and user slots, or for sorting
    slots in size order.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Usually the vcpu->requests bitmap is sparse, so a test_and_clear_bit() for
    each request generates a large number of unneeded atomics if a bit is set.

    Replace with a separate test/clear sequence. This is safe since there is
    no clear_bit() outside the vcpu thread.

    Signed-off-by: Avi Kivity

    Avi Kivity
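
The test/clear split can be sketched as follows; the non-atomic toy below only models the control flow (the kernel uses real atomic bitops, and the counter is added here purely to make the saved atomics visible):

```c
#include <assert.h>

/* Sketch: instead of an unconditional test_and_clear_bit() per request,
 * first do a plain read and only perform the (modeled) atomic clear when
 * the bit is actually set. This is safe under the same condition as in
 * the commit: no clear happens outside the vcpu thread. */
static unsigned long requests;
static int atomic_ops;

static int check_request(int bit)
{
    if (!(requests & (1UL << bit)))   /* cheap non-atomic test */
        return 0;
    atomic_ops++;                     /* models the atomic clear */
    requests &= ~(1UL << bit);
    return 1;
}
```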
     
  • Makes it a little more readable and hackable.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • As advertised in feature-removal-schedule.txt. Equivalent support is provided
    by overlapping memory regions.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • This patch enables the guest to use the XSAVE/XRSTOR instructions.

    We assume that host_xcr0 uses all possible bits that the OS supports.

    We load xcr0 the same way we handle the fpu - as late as we can.

    Signed-off-by: Dexuan Cui
    Signed-off-by: Sheng Yang
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Dexuan Cui
     
  • KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal
    operation. This causes the fast path check for a clear vcpu->requests
    to fail all the time, triggering tons of atomic operations.

    Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.

    Signed-off-by: Avi Kivity

    Avi Kivity
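
The effect of the fix can be sketched with a toy vcpu (non-atomic; in the kernel `guest_mode` is an atomic and `requests` uses atomic bitops):

```c
#include <assert.h>

/* Sketch: KVM_REQ_KICK kept a bit permanently set in vcpu->requests
 * while in guest mode, defeating the "no requests pending" fast path.
 * Moving the in-guest indication to a separate guest_mode flag lets the
 * fast path see a genuinely empty requests word. */
struct toy_vcpu {
    unsigned long requests;
    int guest_mode;          /* replaces the KVM_REQ_KICK bit */
};

static int fast_path_ok(const struct toy_vcpu *v)
{
    return v->requests == 0; /* no longer poisoned by a kick bit */
}
```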
     
  • In common cases, a guest SRAO MCE causes the corresponding poisoned page
    to be unmapped and a SIGBUS to be sent to QEMU-KVM, which then relays
    the MCE to the guest OS.

    But it is reported that if the poisoned page is accessed in the guest
    after unmapping and before the MCE is relayed to the guest OS, userspace
    will be killed.

    The reason is as follows. Because the poisoned page has been unmapped,
    a guest access causes a guest exit, and kvm_mmu_page_fault is
    called. kvm_mmu_page_fault cannot get the poisoned page for the fault
    address, so kernel and user space MMIO processing are tried in turn. In
    user MMIO processing, the poisoned page is accessed again, and userspace
    is killed by force_sig_info.

    To fix the bug, have kvm_mmu_page_fault send a HWPOISON signal to
    QEMU-KVM and not try kernel and user space MMIO processing for poisoned
    pages.

    [xiao: fix warning introduced by avi]

    Reported-by: Max Asbock
    Signed-off-by: Huang Ying
    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Huang Ying
     

19 May, 2010

1 commit


17 May, 2010

3 commits

  • Nobody uses gva_to_page() anymore, so get rid of it.

    Signed-off-by: Gui Jianfeng
    Signed-off-by: Avi Kivity

    Gui Jianfeng
     
  • The RCU/SRCU APIs have already changed to support proving RCU usage.

    I got the following dmesg with PROVE_RCU=y because we used the incorrect
    API. This patch converts rcu_dereference() to srcu_dereference() or a
    family API.

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    2 locks held by qemu-system-x86/8550:
    #0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x29/0x50 [kvm]
    #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]

    stack backtrace:
    Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27
    Call Trace:
    [] lockdep_rcu_dereference+0xaa/0xb3
    [] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
    [] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
    [] __kvm_set_memory_region+0x636/0x6e2 [kvm]
    [] kvm_set_memory_region+0x37/0x50 [kvm]
    [] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
    [] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
    [] ? unlock_page+0x27/0x2c
    [] ? __do_fault+0x3a9/0x3e1
    [] kvm_vm_ioctl+0x364/0x38d [kvm]
    [] ? up_read+0x23/0x3d
    [] vfs_ioctl+0x32/0xa6
    [] do_vfs_ioctl+0x495/0x4db
    [] ? fget_light+0xc2/0x241
    [] ? do_sys_open+0x104/0x116
    [] ? retint_swapgs+0xe/0x13
    [] sys_ioctl+0x47/0x6a
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Avi Kivity

    Lai Jiangshan
     
  • This patch limits the number of pages per memory slot so that we no
    longer need to take extra care over type issues.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
     

20 Apr, 2010

2 commits

  • This patch increases the current hardcoded limit of NR_IOBUS_DEVS
    from 6 to 200. We are hitting this limit when creating a guest with more
    than one virtio-net device using the vhost-net backend. Each virtio-net
    device requires two such devices to service notifications from its rx/tx
    queues.

    Signed-off-by: Sridhar Samudrala
    Signed-off-by: Avi Kivity

    Sridhar Samudrala
     
  • Int is not long enough to store the size of a dirty bitmap.

    This patch fixes this problem with the introduction of a wrapper
    function to calculate the sizes of dirty bitmaps.

    Note: in mark_page_dirty(), we have to consider the fact that
    __set_bit() takes the offset as int, not long.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
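
The wrapper's arithmetic can be sketched like this; the function name and macro are illustrative (the kernel's version lives in x86 code and rounds via its own bitmap helpers):

```c
#include <assert.h>

/* Sketch: computing the dirty-bitmap size in an unsigned long avoids
 * overflowing int for very large memslots. Rounds page count up to
 * whole unsigned longs of bits, then converts to bytes. */
#define BITS_PER_LONG_TOY ((int)(8 * sizeof(unsigned long)))

static unsigned long dirty_bitmap_bytes(unsigned long npages)
{
    unsigned long longs =
        (npages + BITS_PER_LONG_TOY - 1) / BITS_PER_LONG_TOY;
    return longs * sizeof(unsigned long);
}
```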
     

01 Mar, 2010

3 commits