22 May, 2011

1 commit

  • KVM does not hold any references to RCU-protected data when it switches
    the CPU into guest mode. In fact, switching to guest mode is very
    similar to exiting to userspace from the RCU point of view. In addition,
    the CPU may stay in guest mode for quite a long time (up to one time
    slice). Let's treat guest mode as a quiescent state, just like we do
    with user-mode execution.
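
    A minimal sketch of the entry path, assuming the
    rcu_virt_note_context_switch() helper (the surrounding function is
    abridged and illustrative):

        static inline void kvm_guest_enter(void)
        {
                account_system_vtime(current);
                current->flags |= PF_VCPU;
                /* from RCU's point of view this CPU is now "idle",
                 * just as if it had returned to userspace */
                rcu_virt_note_context_switch(smp_processor_id());
        }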

    Signed-off-by: Gleb Natapov
    Signed-off-by: Avi Kivity

    Gleb Natapov
     

11 May, 2011

4 commits

  • This patch avoids gcc issuing the following warning when KVM_MAX_VCPUS=1:
    warning: array subscript is above array bounds

    kvm_for_each_vcpu currently checks to see if the index for the vcpu is
    valid /after/ loading it. We don't run into problems because the address
    is still inside the enclosing struct kvm and we never dereference or
    write to it, so this isn't a security issue.

    The warning occurs when KVM_MAX_VCPUS=1 because the increment portion of
    the loop will *always* cause the loop to load an invalid location since
    ++idx will always be > 0.

    This patch moves the load so that the check occurs before the load and
    we don't run into the compiler warning.
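
    The reordering, sketched before and after (the exact definition lives
    in include/linux/kvm_host.h):

        /* before: loads the vcpu before checking idx */
        #define kvm_for_each_vcpu(idx, vcpup, kvm) \
                for (idx = 0, vcpup = kvm_get_vcpu(kvm, idx); \
                     idx < atomic_read(&kvm->online_vcpus) && vcpup; \
                     vcpup = kvm_get_vcpu(kvm, ++idx))

        /* after: checks idx first, then loads */
        #define kvm_for_each_vcpu(idx, vcpup, kvm) \
                for (idx = 0; \
                     idx < atomic_read(&kvm->online_vcpus) && \
                     (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
                     idx++)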

    Signed-off-by: Neil Brown
    Signed-off-by: Jeff Mahoney
    Signed-off-by: Avi Kivity

    Jeff Mahoney
     
  • Since SSE instructions can issue 16-byte MMIOs, we need to support them.
    We can't increase the kvm_run mmio buffer size to 16 bytes without
    breaking compatibility, so instead we break the large MMIOs into two
    smaller 8-byte ones. Since the bus is 64 bits wide, we aren't breaking
    any atomicity guarantees.
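
    A rough sketch of the splitting (illustrative only;
    emulate_mmio_chunk() is a hypothetical stand-in for the existing
    8-byte path):

        while (bytes) {
                unsigned int now = min(bytes, 8U);  /* kvm_run limit */

                /* each 8-byte half is atomic on a 64-bit bus */
                emulate_mmio_chunk(vcpu, gpa, val, now);
                gpa += now;
                val += now;
                bytes -= now;
        }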

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • This reverts commit f86368493ec038218e8663cc1b6e5393cd8e008a.

    Simpler fix to follow.

    Signed-off-by: Marcelo Tosatti

    Marcelo Tosatti
     
  • We can get the memslot id from memslot->id directly.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Avi Kivity

    Xiao Guangrong
     

18 Mar, 2011

5 commits

  • The interrupt injection logic looks something like

        if an nmi is pending, and nmi injection allowed
            inject nmi
        if an nmi is pending
            request exit on nmi window

    the problem is that "nmi is pending" can be set asynchronously by
    the PIT; if it happens to fire between the two if statements, we
    will request an nmi window even though nmi injection is allowed. On
    SVM, this has disastrous results, since it causes eflags.TF to be
    set in random guest code.

    The fix is simple; make nmi_pending synchronous using the standard
    vcpu->requests mechanism; this ensures the code above is completely
    synchronous wrt nmi_pending.
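
    A sketch of the synchronous flow, assuming a KVM_REQ_NMI request bit
    and the standard request helpers (illustrative, not the full patch):

        /* asynchronous producer, e.g. the PIT: */
        kvm_make_request(KVM_REQ_NMI, vcpu);
        kvm_vcpu_kick(vcpu);

        /* vcpu thread, at one well-defined point before injection: */
        if (kvm_check_request(KVM_REQ_NMI, vcpu))
                vcpu->arch.nmi_pending = true;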

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Instead of sleeping in kvm_vcpu_on_spin, which can cause gigantic
    slowdowns of certain workloads, we instead use yield_to to get
    another VCPU in the same KVM guest to run sooner.

    This seems to give a 10-15% speedup in certain workloads.
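
    Schematically (a simplified sketch; the real patch resolves each vcpu
    to its task and applies fairness heuristics, and yield_to_task() here
    is a hypothetical wrapper around the scheduler's yield_to()):

        void kvm_vcpu_on_spin(struct kvm_vcpu *me)
        {
                struct kvm_vcpu *vcpu;
                int i;

                kvm_for_each_vcpu(i, vcpu, me->kvm) {
                        if (vcpu == me)
                                continue;
                        /* hand the rest of our timeslice to a sibling */
                        if (yield_to_task(vcpu))
                                break;
                }
        }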

    Signed-off-by: Rik van Riel
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Rik van Riel
     
  • Keep track of which task is running a KVM vcpu. This helps us
    figure out later what task to wake up if we want to boost a
    vcpu that got preempted.

    Unfortunately there are no guarantees that the same task
    always keeps the same vcpu, so we can only track the task
    across a single "run" of the vcpu.
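
    Roughly, on each entry to KVM_RUN (a sketch; error handling omitted):

        if (vcpu->pid != task_pid(current)) {
                struct pid *oldpid = vcpu->pid;

                /* publish the new task; readers look it up under RCU */
                rcu_assign_pointer(vcpu->pid,
                                   get_task_pid(current, PIDTYPE_PID));
                synchronize_rcu();
                put_pid(oldpid);
        }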

    Signed-off-by: Rik van Riel
    Signed-off-by: Avi Kivity

    Rik van Riel
     
  • Now that we have 'vcpu->mode' to judge whether an IPI needs to be sent
    to other CPUs, and that check is exact, testing the request bit is
    needless, so we can drop the spinlock that went along with it.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Avi Kivity

    Xiao Guangrong
     
  • Currently we keep track of only two states: guest mode and host
    mode. This patch adds an "exiting guest mode" state that tells
    us that an IPI will happen soon, so unless we need to wait for the
    IPI, we can avoid it completely.

    Also:

    1. There is no need to read/write ->mode atomically from the vcpu's
       own thread.

    2. Reorganize struct kvm_vcpu to put ->mode and ->requests in the
       same cache line explicitly.
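
    The resulting states, sketched to match the description above (names
    as in kvm_host.h of this era; treat as illustrative):

        enum {
                OUTSIDE_GUEST_MODE,
                IN_GUEST_MODE,
                EXITING_GUEST_MODE,     /* an IPI is already on its way */
        };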

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Avi Kivity

    Xiao Guangrong
     

12 Jan, 2011

13 commits

  • Make it available for all archs.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Large page information has two elements, but one of them, write_count,
    alone is accessed by a helper function.

    This patch replaces that helper function with a more generic one that
    returns the newly named kvm_lpage_info structure, and uses it to access
    the other element, rmap_pde, as well.
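
    The structure in question, sketched from the description above (the
    accessor name is illustrative):

        struct kvm_lpage_info {
                unsigned long rmap_pde;
                int write_count;
        };

        /* generic accessor replacing the write_count-only helper */
        static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn,
                        struct kvm_memory_slot *slot, int level);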

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Avi Kivity

    Takuya Yoshikawa
     
  • Quote from Avi:
    | I don't think we need to flush immediately; set a "tlb dirty" bit somewhere
    | that is cleared when we flush the tlb. kvm_mmu_notifier_invalidate_page()
    | can consult the bit and force a flush if set.
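
    One way to realize this, as a sketch (a tlbs_dirty counter rather than
    a single bit; names illustrative):

        /* instead of flushing immediately: */
        vcpu->kvm->tlbs_dirty++;

        /* in kvm_mmu_notifier_invalidate_page(): */
        need_tlb_flush |= kvm->tlbs_dirty;
        if (need_tlb_flush)
                kvm_flush_remote_tlbs(kvm);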

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
     
  • KVM compilation fails with the following error:

    include/linux/kvm_host.h: In function 'kvm_irq_routing_update':
    include/linux/kvm_host.h:679:2: error: 'struct kvm' has no member named 'irq_routing'

    That function is only used and reasonable to have on systems that implement
    an in-kernel interrupt chip. PPC doesn't.

    Fix by #ifdef'ing it out when no irqchip is available.
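
    The guard, sketched (config symbol of this era; the function body is
    abridged):

        #ifdef CONFIG_HAVE_KVM_IRQCHIP
        static inline void kvm_irq_routing_update(struct kvm *kvm,
                        struct kvm_irq_routing_table *irq_rt)
        {
                rcu_assign_pointer(kvm->irq_routing, irq_rt);
        }
        #endif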

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • Store the irq routing table pointer in the irqfd object, and use that
    to inject MSIs directly without bouncing out to a kernel thread.

    While we touch this structure, rearrange the irqfd fields to make the
    fastpath better packed for better cache utilization.

    This also adds some comments about locking rules and RCU usage in the
    code.

    Some notes on the design:
    - Use a pointer into the routing table instead of copying an entry,
      to make it possible to use RCU, thus side-stepping locking
      complexities. We also save some memory this way.
    - The old workqueue code is still used for level irqs. I don't think
      we DTRT with level anyway; however, it seems easier to keep the
      code around, as it has been thought through and debugged, and to
      fix level handling later than to rip the code out and re-instate
      it later.
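
    The fastpath, sketched (abridged; the fallback keeps the existing
    workqueue item):

        rcu_read_lock();
        irq = rcu_dereference(irqfd->irq_entry);
        if (irq)
                /* direct MSI injection, no thread bounce */
                kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
        else
                schedule_work(&irqfd->inject);  /* slow/level path */
        rcu_read_unlock();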

    Signed-off-by: Michael S. Tsirkin
    Acked-by: Marcelo Tosatti
    Acked-by: Gregory Haskins
    Signed-off-by: Avi Kivity

    Michael S. Tsirkin
     
  • Cosmetic change, but it helps to correlate IRQs with PCI devices.

    Acked-by: Alex Williamson
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
     
  • This improves the IRQ forwarding for assigned devices: by using the
    kernel's threaded IRQ scheme, we can get rid of the latency-prone work
    queue and simplify the code at the same time.

    Moreover, we no longer have to hold assigned_dev_lock while raising the
    guest IRQ, which can be a lengthy operation as we may have to iterate
    over all VCPUs. The lock is now only used for synchronizing masking vs.
    unmasking of INTx-type IRQs, and is thus renamed to intx_lock.
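
    Registration then goes through the kernel's request_threaded_irq()
    (a sketch; the handler name is illustrative):

        r = request_threaded_irq(dev->host_irq, NULL,
                                 kvm_assigned_dev_thread, IRQF_ONESHOT,
                                 dev->irq_name, dev);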

    Acked-by: Alex Williamson
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
     
  • IA64 support forces us to abstract the allocation of the kvm structure.
    But instead of mixing this up with arch-specific initialization and
    doing the same on destruction, split both steps. This allows moving
    generic destruction calls into generic code.

    It also fixes error clean-up on failures of kvm_create_vm for IA64.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     
  • Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate a bitmap
    by vmalloc() which will be used in the next logging round, and this has
    been having a bad effect on VGA and live migration: vmalloc() consumes
    extra system time, triggers TLB flushes, etc.

    This patch resolves the issue by pre-allocating one more bitmap and
    switching between the two bitmaps during dirty logging.

    Performance improvement:
    I measured performance for the case of VGA updates with trace-cmd.
    The result was 1.5 times faster than the original code.

    In the case of live migration, the improvement ratio depends on the
    workload and the guest memory size. In general, the larger the memory
    size, the more benefit we get.

    Note:
    This does not change other architectures' logic, but the allocation
    size doubles. This will increase actual memory consumption only when
    the new size changes the number of pages allocated by vmalloc().
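
    The switch itself, sketched (assuming the spare buffer is allocated
    right behind the active one; field names illustrative):

        /* in kvm_vm_ioctl_get_dirty_log(): pick the inactive half */
        dirty_bitmap = memslot->dirty_bitmap_head;
        if (dirty_bitmap == memslot->dirty_bitmap)
                dirty_bitmap += n / sizeof(long);
        memset(dirty_bitmap, 0, n);

        /* ... hand the old bitmap to userspace, then flip: */
        memslot->dirty_bitmap = dirty_bitmap;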

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Fernando Luis Vazquez Cao
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
     
  • As suggested by Andrea, pass the r/w error code to gup(), upgrading read
    faults to writable if the host pte allows it.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
     
  • The guest enables async PF vcpu functionality using this MSR.
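
    For reference, a sketch of the guest side (MSR number from KVM's
    custom MSR range; the data layout is abridged):

        #define MSR_KVM_ASYNC_PF_EN  0x4b564d02

        /* point KVM at per-cpu async-PF data and set the enable bit */
        wrmsrl(MSR_KVM_ASYNC_PF_EN, pa_of_apf_data | KVM_ASYNC_PF_ENABLED);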

    Reviewed-by: Rik van Riel
    Signed-off-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Gleb Natapov
     
  • Keep track of memslot changes by keeping a generation number in the
    memslots structure. Provide a kvm_write_guest_cached() function that
    skips the gfn_to_hva() translation if the memslots have not changed
    since the previous invocation.
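
    The cache, sketched (shape of the structure and the staleness check;
    details abridged):

        struct gfn_to_hva_cache {
                u64 generation;
                gpa_t gpa;
                unsigned long hva;
                struct kvm_memory_slot *memslot;
        };

        /* in kvm_write_guest_cached(): re-translate only when stale */
        if (ghc->generation != kvm->memslots->generation)
                kvm_gfn_to_hva_cache_init(kvm, ghc, ghc->gpa);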

    Acked-by: Rik van Riel
    Signed-off-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Gleb Natapov
     
  • If a guest accesses swapped-out memory, do not swap it in from the vcpu
    thread context. Schedule work to do the swapping and put the vcpu into
    a halted state instead.

    Interrupts will still be delivered to the guest, and if an interrupt
    causes a reschedule, the guest will continue running another task.

    [avi: remove call to get_user_pages_noio(), nacked by Linus; this
    makes everything synchronous again]

    Acked-by: Rik van Riel
    Signed-off-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Gleb Natapov
     

25 Oct, 2010

1 commit

  • * 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (321 commits)
    KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages
    KVM: Fix signature of kvm_iommu_map_pages stub
    KVM: MCE: Send SRAR SIGBUS directly
    KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED
    KVM: fix typo in copyright notice
    KVM: Disable interrupts around get_kernel_ns()
    KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address
    KVM: MMU: move access code parsing to FNAME(walk_addr) function
    KVM: MMU: audit: check whether have unsync sps after root sync
    KVM: MMU: audit: introduce audit_printk to cleanup audit code
    KVM: MMU: audit: unregister audit tracepoints before module unloaded
    KVM: MMU: audit: fix vcpu's spte walking
    KVM: MMU: set access bit for direct mapping
    KVM: MMU: cleanup for error mask set while walk guest page table
    KVM: MMU: update 'root_hpa' out of loop in PAE shadow path
    KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn()
    KVM: x86: Fix constant type in kvm_get_time_scale
    KVM: VMX: Add AX to list of registers clobbered by guest switch
    KVM guest: Move a printk that's using the clock before it's ready
    KVM: x86: TSC catchup mode
    ...

    Linus Torvalds
     
