02 Aug, 2010

4 commits


01 Aug, 2010

15 commits

  • This patch converts unnecessary divide and modulo operations
    in the KVM large page related code into logical operations.
    This allows to convert gfn_t to u64 while not breaking 32
    bit builds.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Marcelo Tosatti

    Joerg Roedel
     
  • This patch fixes the following warning.

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    include/linux/kvm_host.h:259 invoked rcu_dereference_check() without
    protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    no locks held by qemu-system-x86/29679.

    stack backtrace:
    Pid: 29679, comm: qemu-system-x86 Not tainted 2.6.35-rc3+ #200
    Call Trace:
    [] lockdep_rcu_dereference+0xa8/0xb1
    [] kvm_iommu_unmap_memslots+0xc9/0xde [kvm]
    [] kvm_iommu_unmap_guest+0x40/0x4e [kvm]
    [] kvm_arch_destroy_vm+0x1a/0x186 [kvm]
    [] kvm_put_kvm+0x110/0x167 [kvm]
    [] kvm_vcpu_release+0x18/0x1c [kvm]
    [] fput+0x22a/0x3a0
    [] filp_close+0xb4/0xcd
    [] put_files_struct+0x1b7/0x36b
    [] ? put_files_struct+0x48/0x36b
    [] ? do_raw_spin_unlock+0x118/0x160
    [] exit_files+0x6d/0x75
    [] do_exit+0x47d/0xc60
    [] ? _raw_spin_unlock_irq+0x30/0x36
    [] do_group_exit+0xcf/0x134
    [] get_signal_to_deliver+0x732/0x81d
    [] ? cpu_clock+0x4e/0x60
    [] do_notify_resume+0x117/0xc43
    [] ? trace_hardirqs_on+0xd/0xf
    [] ? sys_rt_sigtimedwait+0x2b5/0x3bf
    [] ? trace_hardirqs_off_thunk+0x3a/0x3c
    [] ? sysret_signal+0x5/0x3d
    [] int_signal+0x12/0x17

    Signed-off-by: Sheng Yang
    Signed-off-by: Marcelo Tosatti

    Sheng Yang
     
  • is_hwpoison_address accesses the page table, so the caller must hold
    current->mm->mmap_sem in read mode. So fix its usage in hva_to_pfn of
    kvm accordingly.

    Comment is_hwpoison_address to remind other users.

    Reported-by: Avi Kivity
    Signed-off-by: Huang Ying
    Signed-off-by: Avi Kivity

    Huang Ying
     
  • May be used for distinguishing between internal and user slots, or for sorting
    slots in size order.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Makes it a little more readable and hackable.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • As advertised in feature-removal-schedule.txt. Equivalent support is provided
    by overlapping memory regions.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Otherwise we might try to deliver a timer interrupt to a cpu that
    can't possibly handle it.

    Signed-off-by: Chris Lalancette
    Signed-off-by: Marcelo Tosatti

    Chris Lalancette
     
  • No real bugs in this one.

    Signed-off-by: Andi Kleen
    Signed-off-by: Avi Kivity

    Andi Kleen
     
  • When the user passed in a NULL mask pass this on from the ioctl
    handler.

    Found by gcc 4.6's new warnings.

    Signed-off-by: Andi Kleen
    Signed-off-by: Avi Kivity

    Andi Kleen
     
  • The type of '*new.rmap' is not 'struct page *', fix it

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Marcelo Tosatti

    Lai Jiangshan
     
  • Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Now that all arch specific ioctls have centralized locking, it is easy to
    move it to the central dispatcher.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • All vcpu ioctls need to be locked, so instead of locking each one specifically
    we lock at the generic dispatcher.

    This patch only updates generic ioctls and leaves arch specific ioctls alone.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Remove this check in an effort to allow kvm guests to run without
    root privileges. This capability check doesn't seem to add any
    security since the device needs to have already been added via the
    assign device ioctl and the io actually occurs through the pci
    sysfs interface.

    Signed-off-by: Alex Williamson
    Signed-off-by: Marcelo Tosatti

    Alex Williamson
     
  • In common cases, guest SRAO MCE will cause corresponding poisoned page
    be un-mapped and SIGBUS be sent to QEMU-KVM, then QEMU-KVM will relay
    the MCE to guest OS.

    But it is reported that if the poisoned page is accessed in guest
    after unmapping and before MCE is relayed to guest OS, userspace will
    be killed.

    The reason is as follows. Because poisoned page has been un-mapped,
    guest access will cause guest exit and kvm_mmu_page_fault will be
    called. kvm_mmu_page_fault can not get the poisoned page for fault
    address, so kernel and user space MMIO processing is tried in turn. In
    user MMIO processing, poisoned page is accessed again, then userspace
    is killed by force_sig_info.

    To fix the bug, kvm_mmu_page_fault send HWPOISON signal to QEMU-KVM
    and do not try kernel and user space MMIO processing for poisoned
    page.

    [xiao: fix warning introduced by avi]

    Reported-by: Max Asbock
    Signed-off-by: Huang Ying
    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Huang Ying
     

11 Jun, 2010

1 commit


09 Jun, 2010

1 commit

  • This is obviously a left-over from the the old interface taking the
    size. Apparently a mostly harmless issue with the current iommu_unmap
    implementation.

    Signed-off-by: Jan Kiszka
    Acked-by: Joerg Roedel
    Signed-off-by: Avi Kivity

    Jan Kiszka
     

22 May, 2010

1 commit

  • * 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits)
    KVM: x86: Add missing locking to arch specific vcpu ioctls
    KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls
    KVM: MMU: Segregate shadow pages with different cr0.wp
    KVM: x86: Check LMA bit before set_efer
    KVM: Don't allow lmsw to clear cr0.pe
    KVM: Add cpuid.txt file
    KVM: x86: Tell the guest we'll warn it about tsc stability
    x86, paravirt: don't compute pvclock adjustments if we trust the tsc
    x86: KVM guest: Try using new kvm clock msrs
    KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
    KVM: x86: add new KVMCLOCK cpuid feature
    KVM: x86: change msr numbers for kvmclock
    x86, paravirt: Add a global synchronization point for pvclock
    x86, paravirt: Enable pvclock flags in vcpu_time_info structure
    KVM: x86: Inject #GP with the right rip on efer writes
    KVM: SVM: Don't allow nested guest to VMMCALL into host
    KVM: x86: Fix exception reinjection forced to true
    KVM: Fix wallclock version writing race
    KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots
    KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)
    ...

    Linus Torvalds
     

19 May, 2010

1 commit


18 May, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86/amd-iommu: Add amd_iommu=off command line option
    iommu-api: Remove iommu_{un}map_range functions
    x86/amd-iommu: Implement ->{un}map callbacks for iommu-api
    x86/amd-iommu: Make amd_iommu_iova_to_phys aware of multiple page sizes
    x86/amd-iommu: Make iommu_unmap_page and fetch_pte aware of page sizes
    x86/amd-iommu: Make iommu_map_page and alloc_pte aware of page sizes
    kvm: Change kvm_iommu_map_pages to map large pages
    VT-d: Change {un}map_range functions to implement {un}map interface
    iommu-api: Add ->{un}map callbacks to iommu_ops
    iommu-api: Add iommu_map and iommu_unmap functions
    iommu-api: Rename ->{un}map function pointers to ->{un}map_range

    Linus Torvalds
     

17 May, 2010

8 commits

  • As Avi pointed out, testing bit part in mark_page_dirty() was important
    in the days of shadow paging, but currently EPT and NPT has already become
    common and the chance of faulting a page more that once per iteration is
    small. So let's remove the test bit to avoid extra access.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Avi Kivity

    Takuya Yoshikawa
     
  • When CPU_UP_CANCELED, hardware_enable() has not been called at the CPU
    which is going up because raw_notifier_call_chain(CPU_ONLINE)
    has not been called for this cpu.

    Drop the handling for CPU_UP_CANCELED.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Avi Kivity

    Lai Jiangshan
     
  • The RCU/SRCU API have already changed for proving RCU usage.

    I got the following dmesg when PROVE_RCU=y because we used incorrect API.
    This patch coverts rcu_deference() to srcu_dereference() or family API.

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    2 locks held by qemu-system-x86/8550:
    #0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x29/0x50 [kvm]
    #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]

    stack backtrace:
    Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27
    Call Trace:
    [] lockdep_rcu_dereference+0xaa/0xb3
    [] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
    [] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
    [] __kvm_set_memory_region+0x636/0x6e2 [kvm]
    [] kvm_set_memory_region+0x37/0x50 [kvm]
    [] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
    [] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
    [] ? unlock_page+0x27/0x2c
    [] ? __do_fault+0x3a9/0x3e1
    [] kvm_vm_ioctl+0x364/0x38d [kvm]
    [] ? up_read+0x23/0x3d
    [] vfs_ioctl+0x32/0xa6
    [] do_vfs_ioctl+0x495/0x4db
    [] ? fget_light+0xc2/0x241
    [] ? do_sys_open+0x104/0x116
    [] ? retint_swapgs+0xe/0x13
    [] sys_ioctl+0x47/0x6a
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Avi Kivity

    Lai Jiangshan
     
  • This patch limits the number of pages per memory slot to make
    us free from extra care about type issues.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
     
  • kvm_coalesced_mmio_init() keeps to hold the addresses of a coalesced
    mmio ring page and dev even after it has freed them.

    Also, if this function fails, though it might be rare, it seems to be
    suggesting the system's serious state: so we'd better stop the works
    following the kvm_creat_vm().

    This patch clears these problems.

    We move the coalesced mmio's initialization out of kvm_create_vm().
    This seems to be natural because it includes a registration which
    can be done only when vm is successfully created.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
     
  • Free IRQ's and disable MSIX upon failure.

    Cc: Avi Kivity
    Signed-off-by: Jing Zhang
    Signed-off-by: Marcelo Tosatti

    jing zhang
     
  • This patch change the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO
    from -EINVAL to -ENXIO if no coalesced mmio dev exists.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Marcelo Tosatti

    Wei Yongjun
     
  • This patch does:

    - no need call tracepoint_synchronize_unregister() when kvm module
    is unloaded since ftrace can handle it

    - cleanup ftrace's macro

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Avi Kivity

    Xiao Guangrong
     

13 May, 2010

1 commit


11 May, 2010

1 commit


25 Apr, 2010

1 commit


21 Apr, 2010

1 commit

  • I got this dmesg due to srcu_read_lock() is missing in
    kvm_mmu_notifier_release().

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    arch/x86/kvm/x86.h:72 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    2 locks held by qemu-system-x86/3100:
    #0: (rcu_read_lock){.+.+..}, at: [] __mmu_notifier_release+0x38/0xdf
    #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] kvm_mmu_zap_all+0x21/0x5e [kvm]

    stack backtrace:
    Pid: 3100, comm: qemu-system-x86 Not tainted 2.6.34-rc3-22949-gbc8a97a-dirty #2
    Call Trace:
    [] lockdep_rcu_dereference+0xaa/0xb3
    [] unalias_gfn+0x56/0xab [kvm]
    [] gfn_to_memslot+0x16/0x25 [kvm]
    [] gfn_to_rmap+0x17/0x6e [kvm]
    [] rmap_remove+0xa0/0x19d [kvm]
    [] kvm_mmu_zap_page+0x109/0x34d [kvm]
    [] kvm_mmu_zap_all+0x35/0x5e [kvm]
    [] kvm_arch_flush_shadow+0x16/0x22 [kvm]
    [] kvm_mmu_notifier_release+0x15/0x17 [kvm]
    [] __mmu_notifier_release+0x88/0xdf
    [] ? __mmu_notifier_release+0x38/0xdf
    [] ? exit_mm+0xe0/0x115
    [] exit_mmap+0x2c/0x17e
    [] mmput+0x2d/0xd4
    [] exit_mm+0x108/0x115
    [...]

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Avi Kivity

    Lai Jiangshan
     

20 Apr, 2010

1 commit

  • Int is not long enough to store the size of a dirty bitmap.

    This patch fixes this problem with the introduction of a wrapper
    function to calculate the sizes of dirty bitmaps.

    Note: in mark_page_dirty(), we have to consider the fact that
    __set_bit() takes the offset as int, not long.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

08 Mar, 2010

1 commit


01 Mar, 2010

1 commit