24 Oct, 2012

1 commit


23 Oct, 2012

1 commit

  • We cannot call kvm_release_pfn_clean directly to release the pfn,
    since we may encounter a noslot pfn, which is used to cache mmio info
    in the spte.

    Signed-off-by: Xiao Guangrong
    Cc: stable@vger.kernel.org
    Signed-off-by: Avi Kivity

    Xiao Guangrong
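
The guard this message calls for can be modeled in user space. The following is a minimal sketch, not the kernel code: the sentinel value, the toy reference counts, and every toy_* name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_frame_t;

/* Hypothetical sentinel: a "noslot" value that names no real page frame. */
#define TOY_NOSLOT_FRAME  UINT64_MAX
#define TOY_NR_FRAMES     16

static int toy_refcount[TOY_NR_FRAMES];   /* toy per-frame reference counts */

static bool toy_is_noslot(toy_frame_t pfn)
{
    return pfn == TOY_NOSLOT_FRAME;
}

/*
 * Drop one reference, but only when pfn names a real frame.
 * Returns true if a reference was dropped, false if pfn was the sentinel.
 */
static bool toy_release_frame(toy_frame_t pfn)
{
    if (toy_is_noslot(pfn))
        return false;           /* nothing to release: no page behind it */

    assert(pfn < TOY_NR_FRAMES && toy_refcount[pfn] > 0);
    toy_refcount[pfn]--;
    return true;
}
```

Releasing the sentinel without the check would index a page that does not exist, which is the failure the commit guards against.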
     

06 Oct, 2012

1 commit


05 Oct, 2012

1 commit

  • Pull KVM updates from Avi Kivity:
    "Highlights of the changes for this release include support for vfio
    level triggered interrupts, improved big real mode support on older
    Intels, a streamlined guest page table walker, guest APIC speedups,
    PIO optimizations, better overcommit handling, and read-only memory."

    * tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits)
    KVM: s390: Fix vcpu_load handling in interrupt code
    KVM: x86: Fix guest debug across vcpu INIT reset
    KVM: Add resampling irqfds for level triggered interrupts
    KVM: optimize apic interrupt delivery
    KVM: MMU: Eliminate pointless temporary 'ac'
    KVM: MMU: Avoid access/dirty update loop if all is well
    KVM: MMU: Eliminate eperm temporary
    KVM: MMU: Optimize is_last_gpte()
    KVM: MMU: Simplify walk_addr_generic() loop
    KVM: MMU: Optimize pte permission checks
    KVM: MMU: Update accessed and dirty bits after guest pagetable walk
    KVM: MMU: Move gpte_access() out of paging_tmpl.h
    KVM: MMU: Optimize gpte_access() slightly
    KVM: MMU: Push clean gpte write protection out of gpte_access()
    KVM: clarify kvmclock documentation
    KVM: make processes waiting on vcpu mutex killable
    KVM: SVM: Make use of asm.h
    KVM: VMX: Make use of asm.h
    KVM: VMX: Make lto-friendly
    KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
    ...

    Conflicts:
    arch/s390/include/asm/processor.h
    arch/x86/kvm/i8259.c

    Linus Torvalds
     

03 Oct, 2012

1 commit

  • Pull workqueue changes from Tejun Heo:
    "This is workqueue updates for v3.7-rc1. A lot of activities this
    round including considerable API and behavior cleanups.

    * delayed_work combines a timer and a work item. The handling of the
    timer part has always been a bit clunky leading to confusing
    cancelation API with weird corner-case behaviors. delayed_work is
    updated to use new IRQ safe timer and cancelation now works as
    expected.

    * Another deficiency of delayed_work was lack of the counterpart of
    mod_timer() which led to cancel+queue combinations or open-coded
    timer+work usages. mod_delayed_work[_on]() are added.

    These two delayed_work changes make delayed_work provide the same
    interface as, and behave like, a timer that is executed in process
    context.

    * A work item could be executed concurrently on multiple CPUs, which
    is rather unintuitive and made flush_work() behavior confusing and
    half-broken under certain circumstances. This problem doesn't
    exist for non-reentrant workqueues. While non-reentrancy check
    isn't free, the overhead is incurred only when a work item bounces
    across different CPUs and even in simulated pathological scenario
    the overhead isn't too high.

    All workqueues are made non-reentrant. This removes the
    distinction between flush_[delayed_]work() and
    flush_[delayed_]work_sync(). The former is now as strong as the
    latter and the specified work item is guaranteed to have finished
    execution of any previous queueing on return.

    * In addition to the various bug fixes, Lai redid and simplified CPU
    hotplug handling significantly.

    * Joonsoo introduced system_highpri_wq and used it during CPU
    hotplug.

    There are two merge commits - one to pull in IRQ safe timer from
    tip/timers/core and the other to pull in CPU hotplug fixes from
    wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."

    Fixed a number of trivial conflicts, but the more interesting conflicts
    were silent ones where the deprecated interfaces had been used by new
    code in the merge window, and thus didn't cause any real data conflicts.

    Tejun pointed out a few of them, I fixed a couple more.

    * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
    workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
    workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
    workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
    workqueue: remove @delayed from cwq_dec_nr_in_flight()
    workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
    workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
    workqueue: use __cpuinit instead of __devinit for cpu callbacks
    workqueue: rename manager_mutex to assoc_mutex
    workqueue: WORKER_REBIND is no longer necessary for idle rebinding
    workqueue: WORKER_REBIND is no longer necessary for busy rebinding
    workqueue: reimplement idle worker rebinding
    workqueue: deprecate __cancel_delayed_work()
    workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
    workqueue: use mod_delayed_work() instead of __cancel + queue
    workqueue: use irqsafe timer for delayed_work
    workqueue: clean up delayed_work initializers and add missing one
    workqueue: make deferrable delayed_work initializer names consistent
    workqueue: cosmetic whitespace updates for macro definitions
    workqueue: deprecate system_nrt[_freezable]_wq
    workqueue: deprecate flush[_delayed]_work_sync()
    ...

    Linus Torvalds
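
The difference between queueing a delayed work item and modifying it, as described in the pull message above, can be illustrated with a small user-space model. This is a sketch under stated assumptions: the toy_* names and the struct are hypothetical, and the model tracks only the pending flag and the expiry, not real timers or worker threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a delayed work item: a pending flag plus an expiry. */
struct toy_delayed_work {
    bool pending;
    unsigned long expires;
};

/* Queue only if idle. Returns false when the item was already pending. */
static bool toy_queue_delayed_work(struct toy_delayed_work *dw,
                                   unsigned long now, unsigned long delay)
{
    assert(dw != NULL);
    if (dw->pending)
        return false;           /* expiry of the pending item is unchanged */
    dw->pending = true;
    dw->expires = now + delay;
    return true;
}

/*
 * Timer-like semantics in the spirit of mod_timer(): (re)arm the item
 * whether or not it is pending. Returns true when it was already pending.
 */
static bool toy_mod_delayed_work(struct toy_delayed_work *dw,
                                 unsigned long now, unsigned long delay)
{
    bool was_pending;

    assert(dw != NULL);
    was_pending = dw->pending;
    dw->pending = true;
    dw->expires = now + delay;
    return was_pending;
}
```

With only the first helper, a caller that wants to push the expiry back has to cancel and requeue, which is the cancel+queue pattern the message says mod_delayed_work[_on]() replaces.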
     

23 Sep, 2012

1 commit

  • To emulate level triggered interrupts, add a resample option to
    KVM_IRQFD. When specified, a new resamplefd is provided that notifies
    the user when the irqchip has been resampled by the VM. This may, for
    instance, indicate an EOI. Also in this mode, posting of an interrupt
    through an irqfd only asserts the interrupt. On resampling, the
    interrupt is automatically de-asserted prior to user notification.
    This enables level triggered interrupts to be posted and re-enabled
    from vfio with no userspace intervention.

    All resampling irqfds can make use of a single irq source ID, so we
    reserve a new one for this interface.

    Signed-off-by: Alex Williamson
    Signed-off-by: Avi Kivity

    Alex Williamson
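
The assert-then-resample cycle described above can be sketched as a tiny state machine. This is a user-space illustration only: the struct and the toy_* functions are hypothetical and do not use the KVM ioctl interface or eventfds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one level-triggered line driven through a resampling irqfd. */
struct toy_resample_irqfd {
    bool line_asserted;       /* current level seen by the irqchip */
    unsigned int notified;    /* how often the resample side was signalled */
};

/* Posting an interrupt through the irqfd only asserts the line. */
static void toy_irqfd_post(struct toy_resample_irqfd *s)
{
    assert(s != NULL);
    s->line_asserted = true;
}

/*
 * On resampling (for instance a guest EOI): de-assert first, then notify
 * the user, who may re-post if the device still needs service.
 */
static void toy_irqfd_resample(struct toy_resample_irqfd *s)
{
    assert(s != NULL);
    s->line_asserted = false;
    s->notified++;
}
```

The ordering inside toy_irqfd_resample() mirrors the message: the line is already low when the notification arrives, so a re-post by the device owner raises it again cleanly.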
     

20 Sep, 2012

1 commit

  • Most interrupts are delivered to only one vcpu. Use pre-built tables to
    find the interrupt destination instead of looping through all vcpus. In
    logical mode, loop only through the vcpus in the logical cluster the irq
    is sent to.

    Signed-off-by: Gleb Natapov
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Avi Kivity

    Gleb Natapov
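
A minimal sketch of the table-lookup idea follows. It assumes a simplified layout (256 physical IDs, 16 clusters of 4 vcpus each); the sizes, the struct, and the toy_* names are illustrative assumptions rather than the data structure the patch adds.

```c
#include <assert.h>

#define TOY_MAX_APIC_ID   256
#define TOY_CLUSTERS      16
#define TOY_CLUSTER_SIZE  4

struct toy_apic_map {
    int phys_map[TOY_MAX_APIC_ID];                    /* APIC ID -> vcpu, -1 if none */
    int logical_map[TOY_CLUSTERS][TOY_CLUSTER_SIZE];  /* [cluster][bit] -> vcpu */
};

static void toy_map_init(struct toy_apic_map *m)
{
    for (int i = 0; i < TOY_MAX_APIC_ID; i++)
        m->phys_map[i] = -1;
    for (int c = 0; c < TOY_CLUSTERS; c++)
        for (int b = 0; b < TOY_CLUSTER_SIZE; b++)
            m->logical_map[c][b] = -1;
}

/* Record one vcpu under both its physical ID and its logical position. */
static void toy_map_add(struct toy_apic_map *m, int vcpu, unsigned int apic_id,
                        unsigned int cluster, unsigned int bit)
{
    assert(apic_id < TOY_MAX_APIC_ID);
    assert(cluster < TOY_CLUSTERS && bit < TOY_CLUSTER_SIZE);
    m->phys_map[apic_id] = vcpu;
    m->logical_map[cluster][bit] = vcpu;
}

/* Physical mode: one array lookup instead of a loop over all vcpus. */
static int toy_find_physical(const struct toy_apic_map *m, unsigned int dest_id)
{
    return m->phys_map[dest_id % TOY_MAX_APIC_ID];
}

/* Logical mode: visit only the vcpus of one cluster selected by mask. */
static int toy_find_logical(const struct toy_apic_map *m, unsigned int cluster,
                            unsigned int mask, int out[TOY_CLUSTER_SIZE])
{
    int n = 0;

    for (unsigned int bit = 0; bit < TOY_CLUSTER_SIZE; bit++) {
        int vcpu = m->logical_map[cluster % TOY_CLUSTERS][bit];

        if ((mask & (1u << bit)) && vcpu >= 0)
            out[n++] = vcpu;
    }
    return n;
}
```

The cost of a lookup no longer depends on the number of vcpus in the guest, only on the fixed cluster size.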
     

18 Sep, 2012

1 commit

  • The vcpu mutex can be held for an unlimited time, so
    taking it with mutex_lock on an ioctl is wrong:
    one process could be passed a vcpu fd and
    call this ioctl on a vcpu used by another process;
    it would then be unkillable until the owner exits.

    Call mutex_lock_killable instead and return status.
    Note: mutex_lock_interruptible would be even nicer,
    but I am not sure all users are prepared to handle EINTR
    from these ioctls. They might misinterpret it as an error.

    Cleanup paths expect a vcpu that can't be used by
    any userspace so this will always succeed - catch bugs
    by calling BUG_ON.

    Catch callers that don't check return state by adding
    __must_check.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Marcelo Tosatti

    Michael S. Tsirkin
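
The calling convention this message describes (a lock attempt that can fail, a return value the compiler insists on checking, and a hard assertion on paths that must never fail) can be modeled in user space. This is a sketch, assuming GCC or Clang for the attribute; the toy_* names are hypothetical, and because the toy cannot sleep it reports -EBUSY where a real waiter would simply keep waiting.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* User-space stand-in for the kernel's __must_check annotation. */
#define toy_must_check __attribute__((warn_unused_result))

struct toy_vcpu_lock {
    bool held;
};

/*
 * Models a killable lock attempt: if the lock is contended and a fatal
 * signal is pending, give up with -EINTR instead of waiting forever.
 */
static toy_must_check int toy_lock_killable(struct toy_vcpu_lock *l,
                                            bool fatal_signal_pending)
{
    if (l->held)
        return fatal_signal_pending ? -EINTR : -EBUSY;
    l->held = true;
    return 0;
}

static void toy_unlock(struct toy_vcpu_lock *l)
{
    assert(l->held);    /* plays the role of BUG_ON on an impossible state */
    l->held = false;
}

/* An ioctl-style caller must propagate the lock status to its caller. */
static long toy_vcpu_ioctl(struct toy_vcpu_lock *l, bool fatal_signal_pending)
{
    int r = toy_lock_killable(l, fatal_signal_pending);

    if (r)
        return r;
    /* ... the actual ioctl work would run here with the lock held ... */
    toy_unlock(l);
    return 0;
}
```

Dropping the `int r =` assignment in toy_vcpu_ioctl() would trigger a compiler warning, which is how the annotation catches callers that ignore the status.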
     

06 Sep, 2012

3 commits


28 Aug, 2012

1 commit

  • The build error was caused by built-in functions calling
    functions implemented in modules. The error was introduced by
    commit 4d8b81abc4 ("KVM: introduce readonly memslot").

    The patch fixes the build error by moving the function
    __gfn_to_hva_memslot() from kvm_main.c to kvm_host.h and making it
    "inline" so that the built-in function (kvmppc_h_enter) can use it.

    Acked-by: Paul Mackerras
    Signed-off-by: Gavin Shan
    Signed-off-by: Marcelo Tosatti

    Gavin Shan
     

27 Aug, 2012

1 commit

  • KVM_SET_SIGNAL_MASK, when passed a NULL argument, leaves the on-stack
    signal sets uninitialized and then passes them through to
    kvm_vcpu_ioctl_set_sigmask.

    We should pass NULL in this case, not translated garbage.

    Signed-off-by: Alan Cox
    Signed-off-by: Marcelo Tosatti

    Alan Cox
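
The shape of the fix can be shown with a small user-space model: keep a pointer that stays NULL unless the caller actually supplied a mask, and hand that pointer on. The struct and the toy_* functions are hypothetical stand-ins, not the kernel's types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_sigset {
    unsigned long bits;
};

/* Stand-in for the consumer: reports whether it was given a mask. */
static int toy_set_sigmask(const struct toy_sigset *set)
{
    return set != NULL;
}

/*
 * Stand-in for the ioctl path. The on-stack copy is filled only when the
 * caller passed a mask; otherwise NULL is forwarded, never the
 * uninitialized local.
 */
static int toy_ioctl_set_signal_mask(const struct toy_sigset *user_arg)
{
    struct toy_sigset local;
    const struct toy_sigset *p = NULL;

    if (user_arg != NULL) {
        memcpy(&local, user_arg, sizeof(local));
        assert(local.bits == user_arg->bits);
        p = &local;
    }
    return toy_set_sigmask(p);
}
```

The buggy pattern is forwarding `&local` unconditionally, which hands the consumer whatever happened to be on the stack.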
     

22 Aug, 2012

7 commits


21 Aug, 2012

1 commit

  • flush[_delayed]_work_sync() are now spurious. Mark them deprecated
    and convert all users to flush[_delayed]_work().

    If you're cc'd and wondering what's going on: Now all workqueues are
    non-reentrant and the regular flushes guarantee that the work item is
    not pending or running on any CPU on return, so there's no reason to
    use the sync flushes at all and they're going away.

    This patch doesn't make any functional difference.

    Signed-off-by: Tejun Heo
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Ian Campbell
    Cc: Jens Axboe
    Cc: Mattia Dongili
    Cc: Kent Yoder
    Cc: David Airlie
    Cc: Jiri Kosina
    Cc: Karsten Keil
    Cc: Bryan Wu
    Cc: Benjamin Herrenschmidt
    Cc: Alasdair Kergon
    Cc: Mauro Carvalho Chehab
    Cc: Florian Tobias Schandinat
    Cc: David Woodhouse
    Cc: "David S. Miller"
    Cc: linux-wireless@vger.kernel.org
    Cc: Anton Vorontsov
    Cc: Sangbeom Kim
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: Eric Van Hensbergen
    Cc: Takashi Iwai
    Cc: Steven Whitehouse
    Cc: Petr Vandrovec
    Cc: Mark Fasheh
    Cc: Christoph Hellwig
    Cc: Avi Kivity

    Tejun Heo
     

15 Aug, 2012

1 commit


06 Aug, 2012

9 commits


26 Jul, 2012

4 commits

  • Handle KVM_IRQ_LINE and KVM_IRQ_LINE_STATUS in the generic
    kvm_vm_ioctl() function and call into kvm_vm_ioctl_irq_line().

    This is even more relevant when KVM/ARM also uses this ioctl.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Avi Kivity

    Christoffer Dall
     
  • Currently, kvm allocates some pages and uses them as error indicators.
    This wastes memory and does not scale well.

    Based on Avi's suggestion, we use error codes instead of these pages
    to indicate the error conditions.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Avi Kivity

    Xiao Guangrong
     
  • kvm_async_pf_wakeup_all uses bad_page to generate a broadcast wakeup
    and uses put_page to release bad_page; this works only because
    bad_page is a normal page. Since we will use an error code instead of
    bad_page, use kvm_release_page_clean to release the page, which
    handles the error code properly.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Avi Kivity

    Xiao Guangrong
     
  • Merge patches queued during the run-up to the merge window.

    * queue: (25 commits)
    KVM: Choose better candidate for directed yield
    KVM: Note down when cpu relax intercepted or pause loop exited
    KVM: Add config to support ple or cpu relax optimzation
    KVM: switch to symbolic name for irq_states size
    KVM: x86: Fix typos in pmu.c
    KVM: x86: Fix typos in lapic.c
    KVM: x86: Fix typos in cpuid.c
    KVM: x86: Fix typos in emulate.c
    KVM: x86: Fix typos in x86.c
    KVM: SVM: Fix typos
    KVM: VMX: Fix typos
    KVM: remove the unused parameter of gfn_to_pfn_memslot
    KVM: remove is_error_hpa
    KVM: make bad_pfn static to kvm_main.c
    KVM: using get_fault_pfn to get the fault pfn
    KVM: MMU: track the refcount when unmap the page
    KVM: x86: remove unnecessary mark_page_dirty
    KVM: MMU: Avoid handling same rmap_pde in kvm_handle_hva_range()
    KVM: MMU: Push trace_kvm_age_page() into kvm_age_rmapp()
    KVM: MMU: Add memslot parameter to hva handlers
    ...

    Signed-off-by: Avi Kivity

    Avi Kivity
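
The idea in the commit above that replaces error-indicator pages with error codes can be sketched as follows: reserve the top of the frame-number space for negated errno values, so no page has to be allocated just to signal failure. This is a user-space illustration; the type, the 4095 limit, and the toy_* names are assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_pfn_t;

/* Assumed size of the reserved range at the top of the pfn space. */
#define TOY_MAX_ERRNO 4095

/* Encode a positive errno as a value no real frame number can take. */
static toy_pfn_t toy_pfn_from_errno(int err)
{
    assert(err > 0 && err <= TOY_MAX_ERRNO);
    return (toy_pfn_t)0 - (toy_pfn_t)err;
}

static bool toy_is_error_pfn(toy_pfn_t pfn)
{
    return pfn >= (toy_pfn_t)0 - (toy_pfn_t)TOY_MAX_ERRNO;
}

/* Recover the errno from an encoded value. */
static int toy_pfn_to_errno(toy_pfn_t pfn)
{
    assert(toy_is_error_pfn(pfn));
    return (int)((toy_pfn_t)0 - pfn);
}
```

A caller tests the returned value with toy_is_error_pfn() before treating it as a frame number, much as pointer-returning code checks for an encoded error before dereferencing.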
     

25 Jul, 2012

1 commit

  • Pull KVM updates from Avi Kivity:
    "Highlights include
    - full big real mode emulation on pre-Westmere Intel hosts (can be
    disabled with emulate_invalid_guest_state=0)
    - relatively small ppc and s390 updates
    - PCID/INVPCID support in guests
    - EOI avoidance (3.6 guests should perform better on 3.6 hosts on
    interrupt intensive workloads)
    - Lockless write faults during live migration
    - EPT accessed/dirty bits support for new Intel processors"

    Fix up conflicts in:
    - Documentation/virtual/kvm/api.txt:

    Stupid subchapter numbering, added next to each other.

    - arch/powerpc/kvm/booke_interrupts.S:

    PPC asm changes clashing with the KVM fixes

    - arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c:

    Duplicated commits through the kvm tree and the s390 tree, with
    subsequent edits in the KVM tree.

    * tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
    KVM: fix race with level interrupts
    x86, hyper: fix build with !CONFIG_KVM_GUEST
    Revert "apic: fix kvm build on UP without IOAPIC"
    KVM guest: switch to apic_set_eoi_write, apic_write
    apic: add apic_set_eoi_write for PV use
    KVM: VMX: Implement PCID/INVPCID for guests with EPT
    KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check
    KVM: PPC: Critical interrupt emulation support
    KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests
    KVM: PPC64: booke: Set interrupt computation mode for 64-bit host
    KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt
    KVM: PPC: bookehv64: Add support for std/ld emulation.
    booke: Added crit/mc exception handler for e500v2
    booke/bookehv: Add host crit-watchdog exception support
    KVM: MMU: document mmu-lock and fast page fault
    KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
    KVM: MMU: trace fast page fault
    KVM: MMU: fast path of handling guest page fault
    KVM: MMU: introduce SPTE_MMU_WRITEABLE bit
    KVM: MMU: fold tlb flush judgement into mmu_spte_update
    ...

    Linus Torvalds
     

23 Jul, 2012

3 commits

  • Currently, on guests with many vcpus, there is a high probability of
    yielding to the same vcpu that recently did a pause-loop exit or had
    cpu relax intercepted. Such a yield can lead to the vcpu spinning
    again and hence degrade performance.

    The patchset keeps track of pause loop exits and cpu relax interceptions
    and gives a chance to a vcpu which:
    (a) Has not done a pause loop exit or had cpu relax intercepted at all
    (it is probably a preempted lock holder)
    (b) Was skipped in the last iteration because it did a pause loop exit or
    had cpu relax intercepted, and has probably become eligible now
    (next eligible lock holder)

    Signed-off-by: Raghavendra K T
    Reviewed-by: Marcelo Tosatti
    Reviewed-by: Rik van Riel
    Tested-by: Christian Borntraeger # on s390x
    Signed-off-by: Avi Kivity

    Raghavendra K T
     
  • Noting which vcpus did a pause loop exit or had cpu relax intercepted
    helps in filtering the right candidate to yield to. Selecting the wrong
    vcpu, i.e., one that just did a pl-exit or had cpu relax intercepted,
    may contribute to performance degradation.

    Signed-off-by: Raghavendra K T
    Reviewed-by: Marcelo Tosatti
    Reviewed-by: Rik van Riel
    Tested-by: Christian Borntraeger # on s390x
    Signed-off-by: Avi Kivity

    Raghavendra K T
     
  • Suggested-by: Avi Kivity
    Signed-off-by: Raghavendra K T
    Reviewed-by: Marcelo Tosatti
    Reviewed-by: Rik van Riel
    Tested-by: Christian Borntraeger # on s390x
    Signed-off-by: Avi Kivity

    Raghavendra K T
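
Conditions (a) and (b) from the first commit of this group can be expressed as a small predicate. This is a sketch of one plausible reading: the two flags, the toggle used to implement "skipped in the last iteration", and the toy_* names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-vcpu bookkeeping assumed for this sketch. */
struct toy_spin_state {
    bool in_spin_loop;   /* vcpu did a pause loop exit / cpu relax intercept */
    bool dy_eligible;    /* it was skipped last time, so it may run now */
};

/*
 * (a) A vcpu that is not spinning is always a candidate.
 * (b) A spinning vcpu is a candidate only every other time it is
 *     considered, so it is skipped once and then given a chance.
 */
static bool toy_eligible_for_directed_yield(struct toy_spin_state *v)
{
    bool eligible;

    assert(v != NULL);
    eligible = !v->in_spin_loop || v->dy_eligible;
    if (v->in_spin_loop)
        v->dy_eligible = !v->dy_eligible;
    return eligible;
}
```

A yielding vcpu would walk the other vcpus and pick the first one for which this predicate holds, so a likely preempted lock holder is preferred over a vcpu that was itself just spinning.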
     

21 Jul, 2012

1 commit