23 Sep, 2010

2 commits

  • When we reboot, we disable vmx extensions, since otherwise INIT gets
    blocked. If a task on another cpu then hits a vmx instruction, it will
    fault because vmx is disabled. We trap that fault to avoid a nasty oops
    and spin until the reboot completes.

    The problem is that we spin with interrupts disabled. This blocks
    smp_send_stop() from running, and the reboot process stalls.

    Fix by enabling interrupts before spinning.

    KVM-Stable-Tag.
    Signed-off-by: Avi Kivity
    Signed-off-by: Marcelo Tosatti

    Avi Kivity
     
  • I think I see the following (theoretical) race:

    During irqfd assign, we drop irqfds lock before we
    schedule inject work. Therefore, deassign running
    on another CPU could cause shutdown and flush to run
    before inject, causing use after free in inject.

    A simple fix is to schedule inject under the lock.

    Signed-off-by: Michael S. Tsirkin
    Acked-by: Gregory Haskins
    Signed-off-by: Marcelo Tosatti

    Michael S. Tsirkin
     

10 Sep, 2010

1 commit

  • The CPU_STARTING callback was added upstream with the intention
    of being used for KVM, specifically for the hardware enablement
    that must be done before we can run in hardware virt. It had
    bugs on the x86_64 architecture at the time, where it was called
    after CPU_ONLINE. The arches have since merged and the bug is
    gone.

    It might be noted other features should probably start making
    use of this callback; microcode updates in particular, which
    might fix important errata, would be best applied before
    beginning to run user tasks.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Marcelo Tosatti

    Zachary Amsden
     

02 Aug, 2010

4 commits


01 Aug, 2010

15 commits

  • This patch converts unnecessary divide and modulo operations
    in the KVM large page related code into logical operations.
    This allows converting gfn_t to u64 without breaking 32-bit
    builds.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Marcelo Tosatti

    Joerg Roedel
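
The conversion described above can be illustrated with a small user-space sketch. It assumes a large page covers a power-of-two number of small pages, so that division becomes a right shift and modulo becomes a mask. The names `PAGES_PER_HPAGE_SHIFT`, `hpage_index` and `hpage_offset` are hypothetical and are not taken from the patch. The motivation, as far as can be inferred from the message, is that a 64-bit `/` or `%` on a 32-bit build typically compiles to a libgcc helper call that the kernel does not provide, while shifts and masks need no helper.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;              /* guest frame number, 64 bits wide */

/* Assumption for this sketch: 512 small pages per large page (2 MiB / 4 KiB). */
#define PAGES_PER_HPAGE_SHIFT 9
#define PAGES_PER_HPAGE       (UINT64_C(1) << PAGES_PER_HPAGE_SHIFT)

/*
 * Index of the large page that contains gfn, relative to the large page
 * that contains base_gfn. Equivalent to
 * gfn / PAGES_PER_HPAGE - base_gfn / PAGES_PER_HPAGE, without a division.
 */
static inline uint64_t hpage_index(gfn_t gfn, gfn_t base_gfn)
{
    return (gfn >> PAGES_PER_HPAGE_SHIFT) -
           (base_gfn >> PAGES_PER_HPAGE_SHIFT);
}

/* Offset of gfn inside its large page. Equivalent to gfn % PAGES_PER_HPAGE. */
static inline uint64_t hpage_offset(gfn_t gfn)
{
    return gfn & (PAGES_PER_HPAGE - 1);
}
```

The rewrite is only valid because the divisor is a power of two; for any other divisor the shift and mask would give different results.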
     
  • This patch fixes the following warning.

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    include/linux/kvm_host.h:259 invoked rcu_dereference_check() without
    protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    no locks held by qemu-system-x86/29679.

    stack backtrace:
    Pid: 29679, comm: qemu-system-x86 Not tainted 2.6.35-rc3+ #200
    Call Trace:
    [] lockdep_rcu_dereference+0xa8/0xb1
    [] kvm_iommu_unmap_memslots+0xc9/0xde [kvm]
    [] kvm_iommu_unmap_guest+0x40/0x4e [kvm]
    [] kvm_arch_destroy_vm+0x1a/0x186 [kvm]
    [] kvm_put_kvm+0x110/0x167 [kvm]
    [] kvm_vcpu_release+0x18/0x1c [kvm]
    [] fput+0x22a/0x3a0
    [] filp_close+0xb4/0xcd
    [] put_files_struct+0x1b7/0x36b
    [] ? put_files_struct+0x48/0x36b
    [] ? do_raw_spin_unlock+0x118/0x160
    [] exit_files+0x6d/0x75
    [] do_exit+0x47d/0xc60
    [] ? _raw_spin_unlock_irq+0x30/0x36
    [] do_group_exit+0xcf/0x134
    [] get_signal_to_deliver+0x732/0x81d
    [] ? cpu_clock+0x4e/0x60
    [] do_notify_resume+0x117/0xc43
    [] ? trace_hardirqs_on+0xd/0xf
    [] ? sys_rt_sigtimedwait+0x2b5/0x3bf
    [] ? trace_hardirqs_off_thunk+0x3a/0x3c
    [] ? sysret_signal+0x5/0x3d
    [] int_signal+0x12/0x17

    Signed-off-by: Sheng Yang
    Signed-off-by: Marcelo Tosatti

    Sheng Yang
     
  • is_hwpoison_address accesses the page table, so the caller must hold
    current->mm->mmap_sem in read mode. So fix its usage in hva_to_pfn of
    kvm accordingly.

    Comment is_hwpoison_address to remind other users.

    Reported-by: Avi Kivity
    Signed-off-by: Huang Ying
    Signed-off-by: Avi Kivity

    Huang Ying
     
  • May be used for distinguishing between internal and user slots, or for sorting
    slots in size order.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Makes it a little more readable and hackable.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • As advertised in feature-removal-schedule.txt. Equivalent support is provided
    by overlapping memory regions.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Otherwise we might try to deliver a timer interrupt to a cpu that
    can't possibly handle it.

    Signed-off-by: Chris Lalancette
    Signed-off-by: Marcelo Tosatti

    Chris Lalancette
     
  • No real bugs in this one.

    Signed-off-by: Andi Kleen
    Signed-off-by: Avi Kivity

    Andi Kleen
     
  • When the user passes in a NULL mask, pass it on from the ioctl
    handler.

    Found by gcc 4.6's new warnings.

    Signed-off-by: Andi Kleen
    Signed-off-by: Avi Kivity

    Andi Kleen
     
  • The type of '*new.rmap' is not 'struct page *', fix it

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Marcelo Tosatti

    Lai Jiangshan
     
  • Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Now that all arch specific ioctls have centralized locking, it is easy to
    move it to the central dispatcher.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • All vcpu ioctls need to be locked, so instead of locking each one specifically
    we lock at the generic dispatcher.

    This patch only updates generic ioctls and leaves arch specific ioctls alone.

    Signed-off-by: Avi Kivity

    Avi Kivity
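
The idea in the two locking commits above, taking the vcpu lock once in the dispatcher instead of in every handler, can be sketched in user space as follows. This is an illustration only: a C11 `atomic_flag` spin loop stands in for the kernel's per-vcpu mutex, and `struct vcpu`, `vcpu_ioctl`, the two handlers and the command numbers are all hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct vcpu {
    atomic_flag lock;   /* stand-in for the per-vcpu mutex */
    bool lock_held;     /* debugging aid: true while the lock is held */
    long regs;          /* some state that the ioctls read and write */
};

static void vcpu_init(struct vcpu *v)
{
    atomic_flag_clear(&v->lock);
    v->lock_held = false;
    v->regs = 0;
}

static void vcpu_lock(struct vcpu *v)
{
    while (atomic_flag_test_and_set(&v->lock))
        ;               /* spin until the flag was previously clear */
    v->lock_held = true;
}

static void vcpu_unlock(struct vcpu *v)
{
    v->lock_held = false;
    atomic_flag_clear(&v->lock);
}

/* Handlers do no locking of their own; they rely on the dispatcher. */
static long ioctl_set_regs(struct vcpu *v, long arg)
{
    assert(v->lock_held);
    v->regs = arg;
    return 0;
}

static long ioctl_get_regs(struct vcpu *v, long arg)
{
    (void)arg;
    assert(v->lock_held);
    return v->regs;
}

enum { VCPU_SET_REGS, VCPU_GET_REGS };

/* Central dispatcher: one lock/unlock pair covers every command. */
static long vcpu_ioctl(struct vcpu *v, int cmd, long arg)
{
    long r;

    vcpu_lock(v);
    switch (cmd) {
    case VCPU_SET_REGS: r = ioctl_set_regs(v, arg); break;
    case VCPU_GET_REGS: r = ioctl_get_regs(v, arg); break;
    default:            r = -1;                     break;
    }
    vcpu_unlock(v);
    return r;
}
```

With the lock in one place, a newly added handler cannot forget to take it, which appears to be the point of moving the locking to the dispatcher.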
     
  • Remove this check in an effort to allow kvm guests to run without
    root privileges. This capability check doesn't seem to add any
    security since the device needs to have already been added via the
    assign device ioctl and the io actually occurs through the pci
    sysfs interface.

    Signed-off-by: Alex Williamson
    Signed-off-by: Marcelo Tosatti

    Alex Williamson
     
  • In the common case, a guest SRAO MCE causes the corresponding poisoned
    page to be unmapped and SIGBUS to be sent to QEMU-KVM, which then relays
    the MCE to the guest OS.

    However, it has been reported that if the poisoned page is accessed in
    the guest after unmapping and before the MCE is relayed to the guest OS,
    userspace is killed.

    The reason is as follows. Because the poisoned page has been unmapped,
    the guest access causes a guest exit and kvm_mmu_page_fault is called.
    kvm_mmu_page_fault cannot get the poisoned page for the fault address,
    so kernel and user space MMIO processing are tried in turn. During user
    MMIO processing the poisoned page is accessed again, and userspace is
    killed by force_sig_info.

    To fix the bug, kvm_mmu_page_fault sends a HWPOISON signal to QEMU-KVM
    and does not try kernel and user space MMIO processing for a poisoned
    page.

    [xiao: fix warning introduced by avi]

    Reported-by: Max Asbock
    Signed-off-by: Huang Ying
    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Huang Ying
     

11 Jun, 2010

1 commit


09 Jun, 2010

1 commit

  • This is obviously a left-over from the old interface taking the
    size. Apparently a mostly harmless issue with the current iommu_unmap
    implementation.

    Signed-off-by: Jan Kiszka
    Acked-by: Joerg Roedel
    Signed-off-by: Avi Kivity

    Jan Kiszka
     

22 May, 2010

1 commit

  • * 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits)
    KVM: x86: Add missing locking to arch specific vcpu ioctls
    KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls
    KVM: MMU: Segregate shadow pages with different cr0.wp
    KVM: x86: Check LMA bit before set_efer
    KVM: Don't allow lmsw to clear cr0.pe
    KVM: Add cpuid.txt file
    KVM: x86: Tell the guest we'll warn it about tsc stability
    x86, paravirt: don't compute pvclock adjustments if we trust the tsc
    x86: KVM guest: Try using new kvm clock msrs
    KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
    KVM: x86: add new KVMCLOCK cpuid feature
    KVM: x86: change msr numbers for kvmclock
    x86, paravirt: Add a global synchronization point for pvclock
    x86, paravirt: Enable pvclock flags in vcpu_time_info structure
    KVM: x86: Inject #GP with the right rip on efer writes
    KVM: SVM: Don't allow nested guest to VMMCALL into host
    KVM: x86: Fix exception reinjection forced to true
    KVM: Fix wallclock version writing race
    KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots
    KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)
    ...

    Linus Torvalds
     

19 May, 2010

1 commit


18 May, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86/amd-iommu: Add amd_iommu=off command line option
    iommu-api: Remove iommu_{un}map_range functions
    x86/amd-iommu: Implement ->{un}map callbacks for iommu-api
    x86/amd-iommu: Make amd_iommu_iova_to_phys aware of multiple page sizes
    x86/amd-iommu: Make iommu_unmap_page and fetch_pte aware of page sizes
    x86/amd-iommu: Make iommu_map_page and alloc_pte aware of page sizes
    kvm: Change kvm_iommu_map_pages to map large pages
    VT-d: Change {un}map_range functions to implement {un}map interface
    iommu-api: Add ->{un}map callbacks to iommu_ops
    iommu-api: Add iommu_map and iommu_unmap functions
    iommu-api: Rename ->{un}map function pointers to ->{un}map_range

    Linus Torvalds
     

17 May, 2010

8 commits

  • As Avi pointed out, the test-bit part of mark_page_dirty() was important
    in the days of shadow paging, but EPT and NPT have since become common
    and the chance of faulting a page more than once per iteration is
    small. So let's remove the bit test to avoid the extra access.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Avi Kivity

    Takuya Yoshikawa
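
A minimal user-space sketch of the change described above: the dirty bit is set unconditionally instead of being tested first. The helper names `mark_dirty` and `is_dirty` are hypothetical, and the plain `|=` used here is not atomic, so this shows only the logic, not the kernel's bit operations.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Set the dirty bit for page rel_gfn without reading it first.
 * The older pattern would have been: if (!is_dirty(...)) set the bit.
 * Setting a bit that is already set is harmless, so the test only
 * costs an extra memory access.
 */
static void mark_dirty(unsigned long *bitmap, unsigned long rel_gfn)
{
    bitmap[rel_gfn / BITS_PER_LONG] |= 1UL << (rel_gfn % BITS_PER_LONG);
}

/* Return 1 if the dirty bit for page rel_gfn is set, 0 otherwise. */
static int is_dirty(const unsigned long *bitmap, unsigned long rel_gfn)
{
    return (int)((bitmap[rel_gfn / BITS_PER_LONG] >>
                  (rel_gfn % BITS_PER_LONG)) & 1UL);
}
```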
     
  • On CPU_UP_CANCELED, hardware_enable() has not been called on the CPU
    that is coming up, because raw_notifier_call_chain(CPU_ONLINE)
    has not been called for this cpu.

    Drop the handling for CPU_UP_CANCELED.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Avi Kivity

    Lai Jiangshan
     
  • The RCU/SRCU API has already changed to support proving RCU usage.

    I got the following dmesg with PROVE_RCU=y because we used the incorrect API.
    This patch converts rcu_dereference() to srcu_dereference() or a related API.

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    2 locks held by qemu-system-x86/8550:
    #0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x29/0x50 [kvm]
    #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]

    stack backtrace:
    Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27
    Call Trace:
    [] lockdep_rcu_dereference+0xaa/0xb3
    [] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
    [] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
    [] __kvm_set_memory_region+0x636/0x6e2 [kvm]
    [] kvm_set_memory_region+0x37/0x50 [kvm]
    [] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
    [] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
    [] ? unlock_page+0x27/0x2c
    [] ? __do_fault+0x3a9/0x3e1
    [] kvm_vm_ioctl+0x364/0x38d [kvm]
    [] ? up_read+0x23/0x3d
    [] vfs_ioctl+0x32/0xa6
    [] do_vfs_ioctl+0x495/0x4db
    [] ? fget_light+0xc2/0x241
    [] ? do_sys_open+0x104/0x116
    [] ? retint_swapgs+0xe/0x13
    [] sys_ioctl+0x47/0x6a
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Avi Kivity

    Lai Jiangshan
     
  • This patch limits the number of pages per memory slot so that
    we no longer need to take extra care over type issues.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
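
A check of the kind described above might look like the following sketch. The cap value, the 4 KiB page size and the names `MEM_MAX_NR_PAGES` and `check_slot_size` are assumptions made for illustration; the message does not state the limit the patch uses. The cap here is chosen so that any accepted page count also fits in a signed 32-bit integer, which is one way to avoid the type issues the message mentions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PAGE_SHIFT 12                                    /* assumed 4 KiB pages */
#define MEM_MAX_NR_PAGES ((UINT64_C(1) << 31) - 1)       /* assumed cap */

/*
 * Validate a requested slot size in bytes. On success, store the page
 * count in *npages_out and return 0; reject oversized slots with -EINVAL.
 */
static int check_slot_size(uint64_t memory_size, uint64_t *npages_out)
{
    uint64_t npages = memory_size >> PAGE_SHIFT;

    if (npages > MEM_MAX_NR_PAGES)
        return -EINVAL;

    *npages_out = npages;
    return 0;
}
```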
     
  • kvm_coalesced_mmio_init() keeps holding the addresses of the coalesced
    mmio ring page and dev even after it has freed them.

    Also, if this function fails, which should be rare, it suggests that
    the system is in a serious state, so we had better stop the work
    following kvm_create_vm().

    This patch fixes these problems.

    We move the coalesced mmio initialization out of kvm_create_vm().
    This seems natural because it includes a registration which
    can be done only once the vm has been successfully created.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
     
  • Free IRQs and disable MSIX upon failure.

    Cc: Avi Kivity
    Signed-off-by: Jing Zhang
    Signed-off-by: Marcelo Tosatti

    jing zhang
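
The cleanup described above follows the usual unwind-on-error pattern, sketched here in user space. Everything in the sketch is hypothetical: `fake_request_irq` and `fake_free_irq` stand in for the real IRQ calls, `msix_enabled` stands in for the device's MSI-X state, and `fail_at` exists only so the failure path can be exercised.

```c
#include <assert.h>

#define NVEC 4

static int irq_requested[NVEC];   /* 1 while vector i is held */
static int msix_enabled;          /* 1 while "MSI-X" is enabled */

/* Pretend to request vector vec; fail when vec == fail_at. */
static int fake_request_irq(int vec, int fail_at)
{
    if (vec == fail_at)
        return -1;
    irq_requested[vec] = 1;
    return 0;
}

static void fake_free_irq(int vec)
{
    irq_requested[vec] = 0;
}

/* Number of vectors currently held. */
static int irqs_held(void)
{
    int i, n = 0;

    for (i = 0; i < NVEC; i++)
        n += irq_requested[i];
    return n;
}

/* Enable MSI-X and request all vectors; undo everything on failure. */
static int assign_msix(int fail_at)
{
    int i, r = 0;

    msix_enabled = 1;
    for (i = 0; i < NVEC; i++) {
        r = fake_request_irq(i, fail_at);
        if (r)
            goto err;
    }
    return 0;

err:
    while (--i >= 0)              /* free only the vectors already obtained */
        fake_free_irq(i);
    msix_enabled = 0;
    return r;
}
```

Unwinding in reverse order from the failing index means vectors that were never requested are never freed.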
     
  • This patch changes the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO
    from -EINVAL to -ENXIO if no coalesced mmio dev exists.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Marcelo Tosatti

    Wei Yongjun
     
  • This patch does the following:

    - removes the call to tracepoint_synchronize_unregister() when the kvm
    module is unloaded, since ftrace can handle it

    - cleans up ftrace's macro

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Avi Kivity

    Xiao Guangrong
     

13 May, 2010

1 commit


11 May, 2010

1 commit


25 Apr, 2010

1 commit


21 Apr, 2010

1 commit

  • I got this dmesg because srcu_read_lock() is missing in
    kvm_mmu_notifier_release().

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    arch/x86/kvm/x86.h:72 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    2 locks held by qemu-system-x86/3100:
    #0: (rcu_read_lock){.+.+..}, at: [] __mmu_notifier_release+0x38/0xdf
    #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] kvm_mmu_zap_all+0x21/0x5e [kvm]

    stack backtrace:
    Pid: 3100, comm: qemu-system-x86 Not tainted 2.6.34-rc3-22949-gbc8a97a-dirty #2
    Call Trace:
    [] lockdep_rcu_dereference+0xaa/0xb3
    [] unalias_gfn+0x56/0xab [kvm]
    [] gfn_to_memslot+0x16/0x25 [kvm]
    [] gfn_to_rmap+0x17/0x6e [kvm]
    [] rmap_remove+0xa0/0x19d [kvm]
    [] kvm_mmu_zap_page+0x109/0x34d [kvm]
    [] kvm_mmu_zap_all+0x35/0x5e [kvm]
    [] kvm_arch_flush_shadow+0x16/0x22 [kvm]
    [] kvm_mmu_notifier_release+0x15/0x17 [kvm]
    [] __mmu_notifier_release+0x88/0xdf
    [] ? __mmu_notifier_release+0x38/0xdf
    [] ? exit_mm+0xe0/0x115
    [] exit_mmap+0x2c/0x17e
    [] mmput+0x2d/0xd4
    [] exit_mm+0x108/0x115
    [...]

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Avi Kivity

    Lai Jiangshan
     

20 Apr, 2010

1 commit

  • An int is not large enough to store the size of a dirty bitmap.

    This patch fixes this problem with the introduction of a wrapper
    function to calculate the sizes of dirty bitmaps.

    Note: in mark_page_dirty(), we have to consider the fact that
    __set_bit() takes the offset as int, not long.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
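
A size wrapper of the kind described above could be sketched as follows, assuming the bitmap holds one bit per page and is rounded up to a whole number of `unsigned long` words. The name `dirty_bitmap_bytes` is hypothetical. Returning `unsigned long` rather than `int` keeps the byte count in the same type as the page count.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Bytes needed for a dirty bitmap covering npages pages, one bit per
 * page, rounded up to a multiple of sizeof(unsigned long).
 */
static unsigned long dirty_bitmap_bytes(unsigned long npages)
{
    /* Round the bit count up to the next multiple of BITS_PER_LONG. */
    unsigned long bits = (npages + BITS_PER_LONG - 1) & ~(BITS_PER_LONG - 1);

    return bits / CHAR_BIT;
}
```

The sketch does not guard against `npages` so close to `ULONG_MAX` that the rounding wraps around; a caller would need to cap the page count first.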