14 Dec, 2012

1 commit

  • Pull KVM updates from Marcelo Tosatti:
    "Considerable KVM/PPC work, x86 kvmclock vsyscall support,
    IA32_TSC_ADJUST MSR emulation, amongst others."

    Fix up trivial conflict in kernel/sched/core.c due to cross-cpu
    migration notifier added next to rq migration call-back.

    * tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (156 commits)
    KVM: emulator: fix real mode segment checks in address linearization
    VMX: remove unneeded enable_unrestricted_guest check
    KVM: VMX: fix DPL during entry to protected mode
    x86/kexec: crash_vmclear_local_vmcss needs __rcu
    kvm: Fix irqfd resampler list walk
    KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump
    x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary
    KVM: MMU: optimize for set_spte
    KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface
    KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation
    KVM: PPC: bookehv: Add guest computation mode for irq delivery
    KVM: PPC: Make EPCR a valid field for booke64 and bookehv
    KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit
    KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation
    KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulation
    KVM: PPC: e500: Add emulation helper for getting instruction ea
    KVM: PPC: bookehv64: Add support for interrupt handling
    KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler
    KVM: PPC: booke: Fix get_tb() compile error on 64-bit
    KVM: PPC: e500: Silence bogus GCC warning in tlb code
    ...

    Linus Torvalds
     

10 Dec, 2012

1 commit

  • * 'for-upstream' of https://github.com/agraf/linux-2.6: (28 commits)
    KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface
    KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation
    KVM: PPC: bookehv: Add guest computation mode for irq delivery
    KVM: PPC: Make EPCR a valid field for booke64 and bookehv
    KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit
    KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation
    KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulation
    KVM: PPC: e500: Add emulation helper for getting instruction ea
    KVM: PPC: bookehv64: Add support for interrupt handling
    KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler
    KVM: PPC: booke: Fix get_tb() compile error on 64-bit
    KVM: PPC: e500: Silence bogus GCC warning in tlb code
    KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking
    KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidations
    MAINTAINERS: Add git tree link for PPC KVM
    KVM: PPC: Book3S PR: MSR_DE doesn't exist on Book 3S
    KVM: PPC: Book3S PR: Fix VSX handling
    KVM: PPC: Book3S PR: Emulate PURR, SPURR and DSCR registers
    KVM: PPC: Book3S HV: Don't give the guest RW access to RO pages
    KVM: PPC: Book3S HV: Report correct HPT entry index when reading HPT
    ...

    Marcelo Tosatti
     

08 Dec, 2012

1 commit


07 Dec, 2012

1 commit


06 Dec, 2012

1 commit

  • The current eventfd code assumes that when we have eventfd, we also have
    irqfd for in-kernel interrupt delivery. This is not necessarily true. On
    PPC we don't have an in-kernel irqchip yet, but we can still easily
    support eventfd.

    Signed-off-by: Alexander Graf

    Alexander Graf
     

05 Dec, 2012

1 commit


28 Nov, 2012

2 commits

  • TSC initialization will soon make use of online_vcpus.

    Signed-off-by: Marcelo Tosatti

    Marcelo Tosatti
     
  • KVM added a global variable to guarantee monotonicity in the guest.
    One of the reasons for that is that the time between

    1. ktime_get_ts(&timespec);
    2. rdtscll(tsc);

    is variable. That is, given a host with stable TSC, suppose that
    two VCPUs read the same time via ktime_get_ts() above.

    The time required to execute 2. is not the same on those two instances
    executing in different VCPUS (cache misses, interrupts...).

    If the TSC value that is used by the host to interpolate when
    calculating the monotonic time is the same value used to calculate
    the tsc_timestamp value stored in the pvclock data structure, and
    a single tuple is visible to all
    vcpus simultaneously, this problem disappears. See comment on top
    of pvclock_update_vm_gtod_copy for details.

    Monotonicity is then guaranteed by synchronicity of the host TSCs
    and guest TSCs.

    Set TSC stable pvclock flag in that case, allowing the guest to read
    clock from userspace.

    Signed-off-by: Marcelo Tosatti

    Marcelo Tosatti
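
The monotonicity argument above can be sketched in userspace C. This is a simplified model, not the kernel's pvclock code: `struct clock_tuple`, `guest_time_ns()` and the fixed `ticks_per_ns` rate are hypothetical names introduced for illustration (real pvclock scales the TSC delta with a multiplier and a shift).

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the pvclock data a guest reads. */
struct clock_tuple {
    uint64_t system_time_ns; /* host monotonic time when the tuple was taken */
    uint64_t tsc_timestamp;  /* TSC value paired with that time */
};

/*
 * Guest-side read: extrapolate from the tuple using the current TSC.
 * ticks_per_ns is an illustrative fixed rate, assumed for this sketch.
 */
static uint64_t guest_time_ns(const struct clock_tuple *t, uint64_t tsc,
                              uint64_t ticks_per_ns)
{
    return t->system_time_ns + (tsc - t->tsc_timestamp) / ticks_per_ns;
}
```

With one tuple shared by all vcpus, a later TSC read can never yield an earlier time. With per-vcpu tuples such as `{1000, 5000}` and `{1000, 5040}` (same host time, but the second vcpu sampled the TSC 40 ticks later), the second vcpu reading at TSC 5110 computes an earlier time than the first vcpu reading at TSC 5100.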
     

19 Nov, 2012

1 commit

  • Prepending irq-unsafe vtime APIs with underscores was actually
    a bad idea as the result is a big mess in the API namespace that
    is even waiting to be further extended. Also these helpers
    are always called from irq safe callers except kvm. Just
    provide a vtime_account_system_irqsafe() for this specific
    case so that we can remove the underscore prefix on other
    vtime functions.

    Signed-off-by: Frederic Weisbecker
    Reviewed-by: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Paul Gortmaker
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens

    Frederic Weisbecker
     

01 Nov, 2012

1 commit

  • After commit b3356bf0dbb349 (KVM: emulator: optimize "rep ins" handling),
    the pieces of io data can be collected and written to guest memory
    or MMIO together.

    Unfortunately, kvm splits the mmio access into 8-byte pieces and stores them
    in vcpu->mmio_fragments. If the guest uses "rep ins" to move a large amount
    of data, it will overflow vcpu->mmio_fragments.

    The bug can be exposed by isapc (-M isapc):

    [23154.818733] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
    [ ......]
    [23154.858083] Call Trace:
    [23154.859874] [] kvm_get_cr8+0x1d/0x28 [kvm]
    [23154.861677] [] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 [kvm]
    [23154.863604] [] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]

    Instead, we can use one mmio_fragment to store a large mmio access and
    split it when we pass the mmio-exit-info to userspace. After that, we only
    need two entries to store mmio info for an access that crosses mmio pages.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
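
The idea of keeping one fragment per contiguous access and splitting it only when handing data to userspace might look like the sketch below. It is a userspace model under stated assumptions: `struct mmio_frag`, `mmio_next_chunk()` and `MMIO_EXIT_MAX` are hypothetical names, and the 8-byte limit is taken from the commit text.

```c
#include <stdint.h>

#define MMIO_EXIT_MAX 8u /* bytes handed to userspace per exit, per the text */

/* Hypothetical fragment: one entry describes a whole contiguous access. */
struct mmio_frag {
    uint64_t gpa; /* guest physical address of the remaining data */
    unsigned len; /* bytes still to be handed to userspace */
};

/*
 * Carve the next chunk off the front of the fragment.
 * Stores the chunk's address in *gpa and returns its length (0 when done).
 */
static unsigned mmio_next_chunk(struct mmio_frag *f, uint64_t *gpa)
{
    unsigned n = f->len < MMIO_EXIT_MAX ? f->len : MMIO_EXIT_MAX;

    *gpa = f->gpa;
    f->gpa += n;
    f->len -= n;
    return n;
}
```

A 20-byte access starting at 0x1000 then occupies a single fragment and is delivered as chunks of 8, 8 and 4 bytes, instead of consuming three fragment slots up front.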
     

30 Oct, 2012

2 commits

  • This patch filters noslot pfns out of the error pfns, based on Marcelo's
    comment that a noslot pfn is not an error pfn.

    After this patch:
    - is_noslot_pfn indicates that the gfn is not in a slot
    - is_error_pfn indicates that the gfn is in a slot but an error occurred
    when translating the gfn to a pfn
    - is_error_noslot_pfn indicates that the pfn is either an error pfn or
    a noslot pfn
    is_invalid_pfn can then be removed, which makes the code cleaner.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
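
The three predicates described above can be modelled as follows. The bit encodings here are illustrative assumptions chosen so that the error range and the noslot value do not overlap; they are not claimed to match the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pfn_t;

/* Illustrative encodings only; assumed for this sketch. */
#define PFN_ERR_MASK (0x7ffULL << 52) /* any of these bits set: error pfn */
#define PFN_NOSLOT   (1ULL << 63)     /* the gfn is not covered by a slot */

/* The gfn is in a slot, but translating it to a pfn failed. */
static bool is_error_pfn(pfn_t pfn)
{
    return (pfn & PFN_ERR_MASK) != 0;
}

/* The gfn is not in any slot; this is not treated as an error. */
static bool is_noslot_pfn(pfn_t pfn)
{
    return pfn == PFN_NOSLOT;
}

/* Either of the two cases above. */
static bool is_error_noslot_pfn(pfn_t pfn)
{
    return (pfn & (PFN_ERR_MASK | PFN_NOSLOT)) != 0;
}
```

Callers that only care whether a usable pfn came back can test `is_error_noslot_pfn()`, while callers that handle missing slots specially (for example MMIO) can check `is_noslot_pfn()` separately.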
     
  • Switching to or from guest context is done on ioctl context.
    So by the time we call kvm_guest_enter() or kvm_guest_exit()
    we know we are not running the idle task.

    As a result, we can directly account the cputime using
    vtime_account_system().

    There are three good reasons to do this:

    * We avoid some useless checks on guest switch, which slightly
    optimizes this fast path.

    * In the case of CONFIG_IRQ_TIME_ACCOUNTING, calling vtime_account()
    checks for irq time to account. This is pointless since we know
    we are not in an irq on guest switch. This is wasting cpu cycles
    for no good reason. vtime_account_system() OTOH is a no-op in
    this config option.

    * We can remove the irq disable/enable around kvm guest switch in s390.

    A further optimization may consist in introducing a vtime_account_guest()
    that directly calls account_guest_time().

    Signed-off-by: Frederic Weisbecker
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Avi Kivity
    Cc: Marcelo Tosatti
    Cc: Joerg Roedel
    Cc: Alexander Graf
    Cc: Xiantao Zhang
    Cc: Christian Borntraeger
    Cc: Cornelia Huck
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Paul Gortmaker

    Frederic Weisbecker
     

23 Oct, 2012

1 commit


11 Oct, 2012

1 commit

  • * 'for-upstream' of http://github.com/agraf/linux-2.6: (56 commits)
    arch/powerpc/kvm/e500_tlb.c: fix error return code
    KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas
    KVM: PPC: Book3S: Get/set guest FP regs using the GET/SET_ONE_REG interface
    KVM: PPC: Book3S: Get/set guest SPRs using the GET/SET_ONE_REG interface
    KVM: PPC: set IN_GUEST_MODE before checking requests
    KVM: PPC: e500: MMU API: fix leak of shared_tlb_pages
    KVM: PPC: e500: fix allocation size error on g2h_tlb1_map
    KVM: PPC: Book3S HV: Fix calculation of guest phys address for MMIO emulation
    KVM: PPC: Book3S HV: Remove bogus update of physical thread IDs
    KVM: PPC: Book3S HV: Fix updates of vcpu->cpu
    KVM: Move some PPC ioctl definitions to the correct place
    KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly
    KVM: PPC: Move kvm->arch.slot_phys into memslot.arch
    KVM: PPC: Book3S HV: Take the SRCU read lock before looking up memslots
    KVM: PPC: bookehv: Allow duplicate calls of DO_KVM macro
    KVM: PPC: BookE: Support FPU on non-hv systems
    KVM: PPC: 440: Implement mfdcrx
    KVM: PPC: 440: Implement mtdcrx
    Document IACx/DACx registers access using ONE_REG API
    KVM: PPC: E500: Remove E500_TLB_DIRTY flag
    ...

    Marcelo Tosatti
     

09 Oct, 2012

1 commit


06 Oct, 2012

1 commit

  • This patch adds watchdog emulation to KVM. The watchdog
    emulation is enabled by the KVM_ENABLE_CAP(KVM_CAP_PPC_BOOKE_WATCHDOG) ioctl.
    A kernel timer is used for the emulation and drives the
    h/w watchdog state machine. On watchdog timer expiry, KVM exits to QEMU
    if TCR.WRC is non-zero. QEMU can then reset, shut down, etc., depending on
    how it is configured.

    Signed-off-by: Liu Yu
    Signed-off-by: Scott Wood
    [bharat.bhushan@freescale.com: reworked patch]
    Signed-off-by: Bharat Bhushan
    [agraf: adjust to new request framework]
    Signed-off-by: Alexander Graf

    Bharat Bhushan
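
A rough model of the emulated state machine is sketched below. It assumes the usual Book E three-stage sequence (first expiry sets ENW, second sets WIS, third is the final timeout), which the commit text does not spell out; the bit values, `struct wdt` and `wdt_expire()` are hypothetical and not the architected layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; not the architected TSR bit positions. */
#define TSR_ENW 0x1u /* assumed: set on the first expiry */
#define TSR_WIS 0x2u /* assumed: set on the second expiry */

struct wdt {
    uint32_t tsr;     /* emulated timer status */
    uint32_t tcr_wrc; /* emulated TCR.WRC field; 0 means no final action */
};

/*
 * Handle one timer expiry. Returns true when the final stage is reached
 * and TCR.WRC is non-zero, i.e. when the emulation would exit to userspace.
 */
static bool wdt_expire(struct wdt *w)
{
    if (!(w->tsr & TSR_ENW)) {
        w->tsr |= TSR_ENW;
        return false;
    }
    if (!(w->tsr & TSR_WIS)) {
        w->tsr |= TSR_WIS;
        return false;
    }
    return w->tcr_wrc != 0;
}
```

In this model the third consecutive expiry is the one that reaches userspace, and only when TCR.WRC is non-zero, matching the exit condition in the commit message.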
     

05 Oct, 2012

1 commit

  • Pull KVM updates from Avi Kivity:
    "Highlights of the changes for this release include support for vfio
    level triggered interrupts, improved big real mode support on older
    Intels, a streamlined guest page table walker, guest APIC speedups,
    PIO optimizations, better overcommit handling, and read-only memory."

    * tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits)
    KVM: s390: Fix vcpu_load handling in interrupt code
    KVM: x86: Fix guest debug across vcpu INIT reset
    KVM: Add resampling irqfds for level triggered interrupts
    KVM: optimize apic interrupt delivery
    KVM: MMU: Eliminate pointless temporary 'ac'
    KVM: MMU: Avoid access/dirty update loop if all is well
    KVM: MMU: Eliminate eperm temporary
    KVM: MMU: Optimize is_last_gpte()
    KVM: MMU: Simplify walk_addr_generic() loop
    KVM: MMU: Optimize pte permission checks
    KVM: MMU: Update accessed and dirty bits after guest pagetable walk
    KVM: MMU: Move gpte_access() out of paging_tmpl.h
    KVM: MMU: Optimize gpte_access() slightly
    KVM: MMU: Push clean gpte write protection out of gpte_access()
    KVM: clarify kvmclock documentation
    KVM: make processes waiting on vcpu mutex killable
    KVM: SVM: Make use of asm.h
    KVM: VMX: Make use of asm.h
    KVM: VMX: Make lto-friendly
    KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
    ...

    Conflicts:
    arch/s390/include/asm/processor.h
    arch/x86/kvm/i8259.c

    Linus Torvalds
     

25 Sep, 2012

1 commit

  • Use vtime as a naming prefix for the virtual cputime
    accounting APIs:

    - account_system_vtime() -> vtime_account()
    - account_switch_vtime() -> vtime_task_switch()

    This makes it easier to add further variants such
    as vtime_account_system(), vtime_account_idle(), ... if we
    want to find out the context we account to from generic code.

    It also makes it clearer which subsystem these APIs
    belong to.

    Signed-off-by: Frederic Weisbecker
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra

    Frederic Weisbecker
     

23 Sep, 2012

1 commit

  • To emulate level triggered interrupts, add a resample option to
    KVM_IRQFD. When specified, a new resamplefd is provided that notifies
    the user when the irqchip has been resampled by the VM. This may, for
    instance, indicate an EOI. Also in this mode, posting of an interrupt
    through an irqfd only asserts the interrupt. On resampling, the
    interrupt is automatically de-asserted prior to user notification.
    This enables level triggered interrupts to be posted and re-enabled
    from vfio with no userspace intervention.

    All resampling irqfds can make use of a single irq source ID, so we
    reserve a new one for this interface.

    Signed-off-by: Alex Williamson
    Signed-off-by: Avi Kivity

    Alex Williamson
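
The assert/resample behaviour described above can be illustrated with a small userspace model. This is not the kernel implementation and does not use the KVM_IRQFD ioctl; `struct resample_irqfd` and both helpers are hypothetical names for the sketch.

```c
#include <stdbool.h>

/* Hypothetical model of one resampling irqfd; not kernel code. */
struct resample_irqfd {
    bool line_asserted;     /* current level of the interrupt line */
    unsigned notifications; /* how often the resamplefd was signalled */
};

/* Posting through the irqfd only asserts the line; it never de-asserts. */
static void irqfd_post(struct resample_irqfd *r)
{
    r->line_asserted = true;
}

/*
 * On resample (for instance an EOI): de-assert the line first,
 * then notify the user through the resamplefd.
 */
static void irqfd_resample(struct resample_irqfd *r)
{
    r->line_asserted = false;
    r->notifications++;
}
```

After a notification, a device whose level-triggered interrupt is still pending simply posts again, which re-asserts the line without further userspace involvement.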
     

18 Sep, 2012

1 commit

  • vcpu mutex can be held for unlimited time so
    taking it with mutex_lock on an ioctl is wrong:
    one process could be passed a vcpu fd and
    call this ioctl on the vcpu used by another process,
    it will then be unkillable until the owner exits.

    Call mutex_lock_killable instead and return status.
    Note: mutex_lock_interruptible would be even nicer,
    but I am not sure all users are prepared to handle EINTR
    from these ioctls. They might misinterpret it as an error.

    Cleanup paths expect a vcpu that can't be used by
    any userspace so this will always succeed - catch bugs
    by calling BUG_ON.

    Catch callers that don't check return state by adding
    __must_check.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Marcelo Tosatti

    Michael S. Tsirkin
     

06 Sep, 2012

1 commit


28 Aug, 2012

1 commit

  • The build error was caused by builtin functions calling
    functions implemented in modules. This error was introduced by
    commit 4d8b81abc4 ("KVM: introduce readonly memslot").

    The patch fixes the build error by moving the function __gfn_to_hva_memslot()
    from kvm_main.c to kvm_host.h and making it "inline" so that the
    builtin function (kvmppc_h_enter) can use it.

    Acked-by: Paul Mackerras
    Signed-off-by: Gavin Shan
    Signed-off-by: Marcelo Tosatti

    Gavin Shan
     

22 Aug, 2012

6 commits


06 Aug, 2012

9 commits


26 Jul, 2012

3 commits

  • Handle KVM_IRQ_LINE and KVM_IRQ_LINE_STATUS in the generic
    kvm_vm_ioctl() function and call into kvm_vm_ioctl_irq_line().

    This is even more relevant when KVM/ARM also uses this ioctl.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Avi Kivity

    Christoffer Dall
     
  • Currently, kvm allocates some pages and uses them as error indicators,
    which wastes memory and is not good for scalability.

    Based on Avi's suggestion, we use error codes instead of these pages
    to indicate the error conditions.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Avi Kivity

    Xiao Guangrong
     
  • Merge patches queued during the run-up to the merge window.

    * queue: (25 commits)
    KVM: Choose better candidate for directed yield
    KVM: Note down when cpu relax intercepted or pause loop exited
    KVM: Add config to support ple or cpu relax optimzation
    KVM: switch to symbolic name for irq_states size
    KVM: x86: Fix typos in pmu.c
    KVM: x86: Fix typos in lapic.c
    KVM: x86: Fix typos in cpuid.c
    KVM: x86: Fix typos in emulate.c
    KVM: x86: Fix typos in x86.c
    KVM: SVM: Fix typos
    KVM: VMX: Fix typos
    KVM: remove the unused parameter of gfn_to_pfn_memslot
    KVM: remove is_error_hpa
    KVM: make bad_pfn static to kvm_main.c
    KVM: using get_fault_pfn to get the fault pfn
    KVM: MMU: track the refcount when unmap the page
    KVM: x86: remove unnecessary mark_page_dirty
    KVM: MMU: Avoid handling same rmap_pde in kvm_handle_hva_range()
    KVM: MMU: Push trace_kvm_age_page() into kvm_age_rmapp()
    KVM: MMU: Add memslot parameter to hva handlers
    ...

    Signed-off-by: Avi Kivity

    Avi Kivity