02 Aug, 2010

2 commits


01 Aug, 2010

7 commits

  • May be used for distinguishing between internal and user slots, or for sorting
    slots in size order.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Usually the vcpu->requests bitmap is sparse, so a test_and_clear_bit() for
    each request generates a large number of unneeded atomics if a bit is set.

    Replace with a separate test/clear sequence. This is safe since there is
    no clear_bit() outside the vcpu thread.

    Signed-off-by: Avi Kivity

    Avi Kivity
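
The test/clear idea in the commit above can be sketched in portable C11. This is a minimal userspace illustration, not the kernel code: `request_bits`, `check_request_rmw()` and `check_request_test_then_clear()` are hypothetical names, and C11 atomics stand in for the kernel's test_bit()/clear_bit()/test_and_clear_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vcpu->requests bitmap (one word). */
typedef atomic_ulong request_bits;

/* Old pattern: always an atomic read-modify-write, even if the bit is clear. */
bool check_request_rmw(request_bits *requests, unsigned int bit)
{
    unsigned long mask = 1UL << bit;

    assert(bit < sizeof(unsigned long) * CHAR_BIT);
    return (atomic_fetch_and(requests, ~mask) & mask) != 0;
}

/* New pattern: a plain load first, and an atomic clear only when the bit is set. */
bool check_request_test_then_clear(request_bits *requests, unsigned int bit)
{
    unsigned long mask = 1UL << bit;

    assert(bit < sizeof(unsigned long) * CHAR_BIT);
    if ((atomic_load_explicit(requests, memory_order_relaxed) & mask) == 0)
        return false;                  /* common case: no atomic RMW at all */
    atomic_fetch_and(requests, ~mask); /* only this thread ever clears bits */
    return true;
}
```

The split version is only safe under the condition the commit states: no other thread clears bits, so a bit seen as set cannot vanish between the test and the clear.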
     
  • Makes it a little more readable and hackable.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • As advertised in feature-removal-schedule.txt. Equivalent support is provided
    by overlapping memory regions.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • This patch enables the guest to use the XSAVE/XRSTOR instructions.

    We assume that host_xcr0 uses all the bits that the OS supports.

    We load xcr0 the same way we handle the FPU: as late as we can.

    Signed-off-by: Dexuan Cui
    Signed-off-by: Sheng Yang
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Dexuan Cui
     
  • KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal
    operation. This causes the fast path check for a clear vcpu->requests
    to fail all the time, triggering tons of atomic operations.

    Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.

    Signed-off-by: Avi Kivity

    Avi Kivity
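
A rough sketch of why moving the "kick" state out of the request bitmap helps, assuming a simplified model: the struct and function names below are hypothetical, and C11 atomics stand in for the kernel's atomic_t helpers. With the flag kept separate, the request word stays zero during normal operation, so the fast-path check can succeed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the relevant vcpu state. */
struct vcpu_sketch {
    atomic_ulong requests; /* pending request bits, zero in the common case */
    atomic_int guest_mode; /* 1 while the vcpu is running guest code */
};

void vcpu_sketch_init(struct vcpu_sketch *v)
{
    assert(v != NULL);
    atomic_init(&v->requests, 0);
    atomic_init(&v->guest_mode, 0);
}

/* Fast path: one load tells us whether any request needs handling. */
bool has_pending_requests(struct vcpu_sketch *v)
{
    return atomic_load(&v->requests) != 0;
}

void enter_guest(struct vcpu_sketch *v)
{
    atomic_store(&v->guest_mode, 1);
}

/* True if the caller should interrupt the vcpu: it was in guest mode
 * and nobody else has kicked it yet. */
bool kick_needed(struct vcpu_sketch *v)
{
    return atomic_exchange(&v->guest_mode, 0) == 1;
}
```

Entering guest mode leaves `requests` untouched, so `has_pending_requests()` keeps returning false until a real request is posted.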
     
  • In the common case, a guest SRAO MCE causes the corresponding poisoned
    page to be unmapped and SIGBUS to be sent to QEMU-KVM, which then relays
    the MCE to the guest OS.

    But it is reported that if the poisoned page is accessed in the guest
    after unmapping and before the MCE is relayed to the guest OS, userspace
    will be killed.

    The reason is as follows. Because the poisoned page has been unmapped,
    a guest access causes a guest exit and kvm_mmu_page_fault is called.
    kvm_mmu_page_fault cannot get the poisoned page for the fault address,
    so kernel and user space MMIO processing are tried in turn. In user
    MMIO processing, the poisoned page is accessed again, and userspace
    is then killed by force_sig_info.

    To fix the bug, kvm_mmu_page_fault sends a HWPOISON signal to QEMU-KVM
    and does not try kernel and user space MMIO processing for a poisoned
    page.

    [xiao: fix warning introduced by avi]

    Reported-by: Max Asbock
    Signed-off-by: Huang Ying
    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Huang Ying
     

19 May, 2010

1 commit


17 May, 2010

3 commits

  • Nobody uses gva_to_page() anymore, so get rid of it.

    Signed-off-by: Gui Jianfeng
    Signed-off-by: Avi Kivity

    Gui Jianfeng
     
  • The RCU/SRCU API has already changed to support proving RCU usage.

    I got the following dmesg output with PROVE_RCU=y because we used the
    incorrect API. This patch converts rcu_dereference() to
    srcu_dereference() or the related family of APIs.

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    2 locks held by qemu-system-x86/8550:
    #0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x29/0x50 [kvm]
    #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]

    stack backtrace:
    Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27
    Call Trace:
    [] lockdep_rcu_dereference+0xaa/0xb3
    [] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
    [] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
    [] __kvm_set_memory_region+0x636/0x6e2 [kvm]
    [] kvm_set_memory_region+0x37/0x50 [kvm]
    [] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
    [] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
    [] ? unlock_page+0x27/0x2c
    [] ? __do_fault+0x3a9/0x3e1
    [] kvm_vm_ioctl+0x364/0x38d [kvm]
    [] ? up_read+0x23/0x3d
    [] vfs_ioctl+0x32/0xa6
    [] do_vfs_ioctl+0x495/0x4db
    [] ? fget_light+0xc2/0x241
    [] ? do_sys_open+0x104/0x116
    [] ? retint_swapgs+0xe/0x13
    [] sys_ioctl+0x47/0x6a
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Avi Kivity

    Lai Jiangshan
     
  • This patch limits the number of pages per memory slot so that we
    need not take extra care over type issues.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
     

20 Apr, 2010

2 commits

  • This patch increases the current hardcoded limit of NR_IOBUS_DEVS
    from 6 to 200. We are hitting this limit when creating a guest with more
    than 1 virtio-net device using vhost-net backend. Each virtio-net
    device requires 2 such devices to service notifications from rx/tx queues.

    Signed-off-by: Sridhar Samudrala
    Signed-off-by: Avi Kivity

    Sridhar Samudrala
     
  • An int is not large enough to store the size of a dirty bitmap.

    This patch fixes this problem with the introduction of a wrapper
    function to calculate the sizes of dirty bitmaps.

    Note: in mark_page_dirty(), we have to consider the fact that
    __set_bit() takes the offset as int, not long.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
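
A minimal sketch of what such a size wrapper might look like, assuming one dirty bit per page and a bitmap stored as whole unsigned longs. `dirty_bitmap_bytes()` is a hypothetical name used for illustration, not necessarily the helper the patch adds.

```c
#include <assert.h>
#include <limits.h>

/* Bytes needed for a bitmap with one bit per page, rounded up to a
 * whole number of unsigned longs. Everything is computed in unsigned
 * long so that large page counts do not overflow an int. */
unsigned long dirty_bitmap_bytes(unsigned long npages)
{
    unsigned long bits_per_long = sizeof(unsigned long) * CHAR_BIT;
    unsigned long nlongs;

    /* Guard the round-up addition against wrap-around. */
    assert(npages <= ULONG_MAX - (bits_per_long - 1));
    nlongs = (npages + bits_per_long - 1) / bits_per_long;
    return nlongs * sizeof(unsigned long);
}
```

Centralizing the calculation in one function means every allocation and copy of the bitmap agrees on the same, wide enough type.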
     

01 Mar, 2010

13 commits


03 Dec, 2009

7 commits


19 Sep, 2009

1 commit

  • Now that the last users of markers have migrated to the event
    tracer, we can kill off the (now orphaned) support code.

    Signed-off-by: Christoph Hellwig
    Acked-by: Mathieu Desnoyers
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Christoph Hellwig
     

10 Sep, 2009

4 commits

  • Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from
    interface between general code and arch code. kvm_arch_vcpu_runnable()
    checks for interrupts instead.

    Signed-off-by: Gleb Natapov
    Signed-off-by: Avi Kivity

    Gleb Natapov
     
  • It is implemented only by x86.

    Signed-off-by: Gleb Natapov
    Signed-off-by: Avi Kivity

    Gleb Natapov
     
  • ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
    signal when written to by a guest. Host userspace can register any
    arbitrary IO address with a corresponding eventfd and then pass the eventfd
    to a specific end-point of interest for handling.

    Normal IO requires a blocking round-trip since the operation may cause
    side-effects in the emulated model or may return data to the caller.
    Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
    "heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
    device model synchronously before returning control back to the vcpu.

    However, there is a subclass of IO which acts purely as a trigger for
    other IO (such as to kick off an out-of-band DMA request, etc). For these
    patterns, the synchronous call is particularly expensive since we really
    only want to get our notification transmitted asynchronously and
    return as quickly as possible. All the synchronous infrastructure to ensure
    proper data-dependencies are met in the normal IO case is just unnecessary
    overhead for signalling. This adds additional computational load on the
    system, as well as latency to the signalling path.

    Therefore, we provide a mechanism for registration of an in-kernel trigger
    point that allows the VCPU to only require a very brief, lightweight
    exit just long enough to signal an eventfd. This also means that any
    clients compatible with the eventfd interface (which includes userspace
    and kernelspace equally well) can now register to be notified. The end
    result should be a more flexible and higher performance notification API
    for the backend KVM hypervisor and peripheral components.

    To test this theory, we built a test-harness called "doorbell". This
    module has a function called "doorbell_ring()" which simply increments a
    counter for each time the doorbell is signaled. It supports signalling
    from either an eventfd, or an ioctl().

    We then wired up two paths to the doorbell: One via QEMU via a registered
    io region and through the doorbell ioctl(). The other is direct via
    ioeventfd.

    You can download this test harness here:

    ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

    The measured results are as follows:

    qemu-mmio: 110000 iops, 9.09us rtt
    ioeventfd-mmio: 200100 iops, 5.00us rtt
    ioeventfd-pio: 367300 iops, 2.72us rtt

    I didn't measure qemu-pio, because I have to figure out how to register a
    PIO region with qemu's device model, and I got lazy. However, for now we
    can extrapolate from the NULLIO runs (+2.56us for MMIO, -350ns for HC)
    to get:

    qemu-pio: 153139 iops, 6.53us rtt
    ioeventfd-hc: 412585 iops, 2.37us rtt

    These are just for fun, for now, until I can gather more data.

    Here is a graph for your convenience:

    http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

    The conclusion to draw is that we save about 4us by skipping the userspace
    hop.

    --------------------

    Signed-off-by: Gregory Haskins
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Avi Kivity

    Gregory Haskins
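
The measured iops and rtt figures in the commit above are related by a simple reciprocal, assuming the operations are issued strictly one at a time so that each round trip takes 1/iops seconds. The helper names below are hypothetical; this is only a worked check of the arithmetic.

```c
#include <assert.h>

/* Round-trip time in microseconds for a serialized stream of operations. */
double rtt_us_from_iops(double iops)
{
    assert(iops > 0.0);
    return 1e6 / iops;
}

/* Inverse conversion: operations per second for a given round-trip time. */
double iops_from_rtt_us(double rtt_us)
{
    assert(rtt_us > 0.0);
    return 1e6 / rtt_us;
}
```

For example, `rtt_us_from_iops(110000)` is about 9.09 and `rtt_us_from_iops(200100)` is about 5.00, so the qemu-mmio and ioeventfd-mmio paths differ by roughly 4.09us, which matches the "about 4us" conclusion. Likewise `iops_from_rtt_us(9.09 - 2.56)` gives about 153139, the extrapolated qemu-pio figure.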
     
  • Today kvm_io_bus_register_dev() returns void and will internally BUG_ON
    if it fails. We want to create dynamic MMIO/PIO entries driven from
    userspace later in the series, so we need to make the code more
    robust with the following changes:

    1) Add a return value to the registration function
    2) Fix up all the callsites to check the return code, handle any
    failures, and percolate the error up to the caller.
    3) Add an unregister function that collapses holes in the array

    Signed-off-by: Gregory Haskins
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Avi Kivity

    Gregory Haskins
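
The three changes listed in the commit above can be sketched in plain C as follows. This is an illustration under stated assumptions, not the kernel implementation: `struct io_bus`, `struct io_device`, `IO_BUS_MAX_DEVS`, the function names and the chosen error codes are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define IO_BUS_MAX_DEVS 6 /* hypothetical capacity for this sketch */

struct io_device {
    int id; /* placeholder payload */
};

struct io_bus {
    int dev_count;
    struct io_device *devs[IO_BUS_MAX_DEVS];
};

/* Change 1: report failure to the caller instead of crashing. */
int io_bus_register_dev(struct io_bus *bus, struct io_device *dev)
{
    assert(bus != NULL && dev != NULL);
    if (bus->dev_count >= IO_BUS_MAX_DEVS)
        return -ENOSPC;
    bus->devs[bus->dev_count++] = dev;
    return 0;
}

/* Change 3: remove a device and collapse the hole it leaves, here by
 * moving the last entry into the freed slot. */
int io_bus_unregister_dev(struct io_bus *bus, struct io_device *dev)
{
    assert(bus != NULL && dev != NULL);
    for (int i = 0; i < bus->dev_count; i++) {
        if (bus->devs[i] == dev) {
            bus->devs[i] = bus->devs[--bus->dev_count];
            bus->devs[bus->dev_count] = NULL;
            return 0;
        }
    }
    return -ENOENT;
}
```

Change 2 then falls to the callers: each one checks the return value, undoes any partial setup on failure, and passes the error code up. Moving the last entry into the hole keeps the array dense without preserving order; an order-preserving variant would shift the tail down instead.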