22 Oct, 2012

1 commit

  • Use __func__ instead of the function name in svm_hardware_enable since
    those things tend to get out of sync. This also slims down printk line
    length in conjunction with using pr_err.

    No functionality change.

    Cc: Joerg Roedel
    Cc: Avi Kivity
    Signed-off-by: Borislav Petkov
    Signed-off-by: Avi Kivity

    Borislav Petkov
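    The difference is easy to illustrate outside the kernel; below is a
    minimal user-space sketch with fprintf standing in for pr_err (the
    function names are made up for illustration, not KVM's):

```c
#include <stdio.h>
#include <string.h>

/* Hardcoded name: silently drifts out of sync if the function is renamed. */
static void enable_hw_old(void)
{
        fprintf(stderr, "svm_hardware_enable: SVM is disabled in your BIOS.\n");
}

/* __func__ expands to the enclosing function's name, so the message
 * follows any rename automatically and the format string stays shorter. */
static void enable_hw_new(void)
{
        fprintf(stderr, "%s: SVM is disabled in your BIOS.\n", __func__);
}

/* Tiny helper showing what __func__ evaluates to. */
static const char *current_func_name(void)
{
        return __func__;
}
```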
     

23 Sep, 2012

1 commit

  • When we reset a vcpu on INIT, we have so far overwritten the dr7 value
    provided by KVM_SET_GUEST_DEBUG, and we have also cleared switch_db_regs
    unconditionally.

    Fix this by saving the dr7 used for guest debugging and calculating the
    effective register value, as well as switch_db_regs, on any potential
    change. This changes the focus of the set_guest_debug vendor op to
    update_db_bp_intercept.

    Found while trying to stop on start_secondary.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
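    The shape of the fix can be sketched in plain C; the struct and field
    names below are illustrative stand-ins, not the kernel's:

```c
#include <stdbool.h>
#include <stdint.h>

#define DR7_BP_EN_MASK 0xffULL   /* L0..L3/G0..G3 breakpoint-enable bits */

/* Hypothetical per-vcpu state, loosely modeled on the commit text. */
struct vcpu_dbg {
        uint64_t dr7;              /* dr7 as written by the guest */
        uint64_t guest_debug_dr7;  /* dr7 set via KVM_SET_GUEST_DEBUG */
        bool host_debugging;       /* userspace debugger active */
        bool switch_db_regs;
};

/* Recompute the effective dr7 and switch_db_regs on any change, instead
 * of clobbering them unconditionally on INIT. */
static uint64_t update_effective_dr7(struct vcpu_dbg *v)
{
        uint64_t eff = v->host_debugging ? v->guest_debug_dr7 : v->dr7;
        v->switch_db_regs = (eff & DR7_BP_EN_MASK) != 0;
        return eff;
}
```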
     

12 Jul, 2012

1 commit

  • This patch handles PCID/INVPCID for guests.

    Process-context identifiers (PCIDs) are a facility by which a logical processor
    may cache information for multiple linear-address spaces so that the processor
    may retain cached information when software switches to a different linear
    address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual
    Volume 3A for details.

    For guests with EPT, the PCID feature is enabled and INVPCID behaves as running
    natively.
    For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD.

    Signed-off-by: Junjie Mao
    Signed-off-by: Avi Kivity

    Mao, Junjie
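    The EPT/non-EPT policy above can be sketched as a small decision
    helper (names are illustrative, not KVM's):

```c
#include <stdbool.h>

/* Sketch of the commit's policy: expose PCID to the guest only with EPT;
 * without it, hide the feature and let INVPCID raise #UD. */
struct pcid_policy {
        bool expose_pcid;   /* PCID bit visible in guest CPUID */
        bool invpcid_uds;   /* INVPCID raises #UD in the guest */
};

static struct pcid_policy guest_pcid_policy(bool ept, bool host_has_pcid)
{
        struct pcid_policy p;
        p.expose_pcid = ept && host_has_pcid;
        p.invpcid_uds = !p.expose_pcid;
        return p;
}
```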
     

06 Jun, 2012

1 commit

  • Introduces a couple of print functions, which are essentially wrappers
    around standard printk functions, with a KVM: prefix.

    Functions introduced or modified are:
    - kvm_err(fmt, ...)
    - kvm_info(fmt, ...)
    - kvm_debug(fmt, ...)
    - kvm_pr_unimpl(fmt, ...)
    - pr_unimpl(vcpu, fmt, ...) -> vcpu_unimpl(vcpu, fmt, ...)

    Signed-off-by: Christoffer Dall
    Signed-off-by: Avi Kivity

    Christoffer Dall
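    A user-space analog of such wrappers, with fprintf standing in for the
    printk machinery (a sketch, not the kernel implementation):

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Each wrapper prepends a "kvm: " prefix before delegating to the
 * ordinary printf machinery (##__VA_ARGS__ is a GNU C extension). */
#define kvm_err(fmt, ...)   fprintf(stderr, "kvm: " fmt, ##__VA_ARGS__)
#define kvm_info(fmt, ...)  fprintf(stdout, "kvm: " fmt, ##__VA_ARGS__)

/* Same idea, rendered into a buffer so the result can be inspected. */
static int kvm_snprintf(char *buf, size_t n, const char *fmt, ...)
{
        va_list ap;
        int len = snprintf(buf, n, "kvm: ");
        va_start(ap, fmt);
        len += vsnprintf(buf + len, n - len, fmt, ap);
        va_end(ap);
        return len;
}
```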
     

29 Mar, 2012

1 commit

  • Pull kvm updates from Avi Kivity:
    "Changes include timekeeping improvements, support for assigning host
    PCI devices that share interrupt lines, s390 user-controlled guests, a
    large ppc update, and random fixes."

    This is with the sign-offs fixed; hopefully next merge window we won't
    have rebased commits.

    * 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
    KVM: Convert intx_mask_lock to spin lock
    KVM: x86: fix kvm_write_tsc() TSC matching thinko
    x86: kvmclock: abstract save/restore sched_clock_state
    KVM: nVMX: Fix erroneous exception bitmap check
    KVM: Ignore the writes to MSR_K7_HWCR(3)
    KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask
    KVM: PMU: add proper support for fixed counter 2
    KVM: PMU: Fix raw event check
    KVM: PMU: warn when pin control is set in eventsel msr
    KVM: VMX: Fix delayed load of shared MSRs
    KVM: use correct tlbs dirty type in cmpxchg
    KVM: Allow host IRQ sharing for assigned PCI 2.3 devices
    KVM: Ensure all vcpus are consistent with in-kernel irqchip settings
    KVM: x86 emulator: Allow PM/VM86 switch during task switch
    KVM: SVM: Fix CPL updates
    KVM: x86 emulator: VM86 segments must have DPL 3
    KVM: x86 emulator: Fix task switch privilege checks
    arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice
    KVM: x86 emulator: correctly mask pmc index bits in RDPMC instruction emulation
    KVM: mmu_notifier: Flush TLBs before releasing mmu_lock
    ...

    Linus Torvalds
     

08 Mar, 2012

5 commits

  • Task switches can switch between Protected Mode and VM86. The current
    mode must be updated during the task switch emulation so that the new
    segment selectors are interpreted correctly.

    In order to let privilege checks succeed, rflags needs to be updated in
    the vcpu struct as this causes a CPL update.

    Signed-off-by: Kevin Wolf
    Signed-off-by: Avi Kivity

    Kevin Wolf
     
  • Keep CPL at 0 in real mode and at 3 in VM86. In protected/long mode, use
    RPL rather than DPL of the code segment.

    Signed-off-by: Kevin Wolf
    Signed-off-by: Avi Kivity

    Kevin Wolf
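    The CPL rule is compact enough to state directly in C (a sketch under
    the commit's description; the mode enum is illustrative):

```c
#include <stdint.h>

enum cpu_mode { MODE_REAL, MODE_VM86, MODE_PROT };

/* CPL per the commit: 0 in real mode, 3 in VM86, otherwise the RPL
 * (low two bits) of the CS selector rather than the descriptor DPL. */
static int guest_cpl(enum cpu_mode mode, uint16_t cs_selector)
{
        switch (mode) {
        case MODE_REAL: return 0;
        case MODE_VM86: return 3;
        default:        return cs_selector & 3;
        }
}
```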
     
  • Currently, all task switches check privileges against the DPL of the
    TSS. This is only correct for jmp/call to a TSS. If a task gate is used,
    the DPL of this task gate is used for the check instead. Exceptions,
    external interrupts and iret shouldn't perform any check.

    [avi: kill kvm-kmod remnants]

    Signed-off-by: Kevin Wolf
    Signed-off-by: Avi Kivity

    Kevin Wolf
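    The corrected check could look roughly like this (a hedged sketch; the
    enum and helper names are invented for illustration):

```c
#include <stdbool.h>

enum ts_reason { TS_JMP, TS_CALL, TS_GATE, TS_IRET, TS_EXCEPTION };

/* Per the commit: jmp/call check against the TSS descriptor's DPL, a
 * task gate is checked against the gate's own DPL, and iret, exceptions
 * and external interrupts perform no privilege check at all. */
static bool task_switch_priv_ok(enum ts_reason r, int cpl, int rpl,
                                int tss_dpl, int gate_dpl)
{
        switch (r) {
        case TS_JMP:
        case TS_CALL:
                return tss_dpl >= cpl && tss_dpl >= rpl;
        case TS_GATE:
                return gate_dpl >= cpl && gate_dpl >= rpl;
        default:
                return true;
        }
}
```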
     
  • Redefine the API to take a parameter indicating whether an
    adjustment is in host or guest cycles.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
     
  • This requires some restructuring; rather than use 'virtual_tsc_khz'
    to indicate whether hardware rate scaling is in effect, we consider
    each VCPU to always have a virtual TSC rate. Instead, there is new
    logic above the vendor-specific hardware scaling that decides whether
    it is even necessary to use and updates all rate variables used by
    common code. This means we can simply query the virtual rate at
    any point, which is needed for software rate scaling.

    There is also now a threshold added to the TSC rate scaling; minor
    differences and variations of measured TSC rate can accidentally
    provoke rate scaling to be used when it is not needed. Instead,
    we have a tolerance variable called tsc_tolerance_ppm, which is
    the maximum variation from user requested rate at which scaling
    will be used. The default is 250ppm, which is half the
    threshold for NTP adjustment, allowing for some hardware variation.

    In the event that hardware rate scaling is not available, we can
    kludge a bit by forcing TSC catchup to turn on when a faster than
    hardware speed has been requested, but there is nothing available
    yet for the reverse case; this requires a trap and emulate software
    implementation for RDTSC, which is still forthcoming.

    [avi: fix 64-bit division on i386]

    Signed-off-by: Zachary Amsden
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Zachary Amsden
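    The tolerance test reduces to integer arithmetic; a sketch of the
    comparison (helper name invented, rates in kHz as KVM uses them):

```c
#include <stdbool.h>
#include <stdint.h>

/* Default from the commit: 250 ppm, half the NTP adjustment threshold. */
static const uint64_t tsc_tolerance_ppm = 250;

/* Engage rate scaling only when the user-requested virtual TSC rate
 * differs from the host rate by more than the tolerance. The division
 * is avoided by cross-multiplying:
 *   diff/host > tol/1e6  <=>  diff * 1e6 > host * tol               */
static bool needs_tsc_scaling(uint64_t host_khz, uint64_t user_khz)
{
        uint64_t diff = host_khz > user_khz ? host_khz - user_khz
                                            : user_khz - host_khz;
        return diff * 1000000ULL > host_khz * tsc_tolerance_ppm;
}
```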
     

05 Mar, 2012

2 commits

  • Also use true instead of 1 for enabling by default.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Davidlohr Bueso
     
  • In some cases guests should not provide workarounds for errata even when the
    physical processor is affected. For example, because of erratum 400 on family
    10h processors a Linux guest will read an MSR (resulting in VMEXIT) before
    going to idle in order to avoid getting stuck in a non-C0 state. This is not
    necessary: HLT and IO instructions are intercepted and therefore there is no
    reason for erratum 400 workaround in the guest.

    This patch allows us to present a guest with certain errata as fixed,
    regardless of the state of actual hardware.

    Signed-off-by: Boris Ostrovsky
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Boris Ostrovsky
     

02 Mar, 2012

1 commit

  • It turned out that a performance counter on AMD does not
    count at all when the GO or HO bit is set in the control
    register and SVM is disabled in EFER.

    This patch works around this issue by masking out the HO bit
    in the performance counter control register when SVM is not
    enabled.

    The GO bit is not touched because it is only set when the
    user wants to count in guest-mode only. So when SVM is
    disabled the counter should not run at all and the
    not-counting is the intended behaviour.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Peter Zijlstra
    Cc: Avi Kivity
    Cc: Stephane Eranian
    Cc: David Ahern
    Cc: Gleb Natapov
    Cc: Robert Richter
    Cc: stable@vger.kernel.org # v3.2
    Link: http://lkml.kernel.org/r/1330523852-19566-1-git-send-email-joerg.roedel@amd.com
    Signed-off-by: Ingo Molnar

    Joerg Roedel
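    The workaround amounts to one mask operation on the event-select
    value; a sketch (the helper name is invented, the bit positions match
    the AMD event-select layout):

```c
#include <stdbool.h>
#include <stdint.h>

/* GO/HO bits of the AMD performance-counter event-select register. */
#define AMD64_EVENTSEL_GUESTONLY (1ULL << 40)
#define AMD64_EVENTSEL_HOSTONLY  (1ULL << 41)

/* With SVM disabled in EFER, a counter with HO set never counts, so mask
 * HO out. GO is left untouched: with SVM off there is no guest mode, so
 * a guest-only counter correctly counts nothing. */
static uint64_t amd_sanitize_eventsel(uint64_t evsel, bool svm_enabled)
{
        if (!svm_enabled)
                evsel &= ~AMD64_EVENTSEL_HOSTONLY;
        return evsel;
}
```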
     

30 Oct, 2011

1 commit

  • AMD processors apparently have a bug in the hardware task switching
    support when NPT is enabled. If the task switch triggers an NPF, we can
    get wrong EXITINTINFO along with that fault. On resume, spurious
    exceptions may then be injected into the guest.

    We were able to reproduce this bug when our guest triggered #SS and the
    handler was supposed to run on a separate task whose stack pages had not
    yet been touched.

    Work around the issue by continuing to emulate task switches even in
    NPT mode.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
     

26 Sep, 2011

6 commits

  • This avoids recording events that cause the vmexit before the actual
    exit reason.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
     
  • When the TSC MSR is read by an L2 guest (when L1 allowed this MSR to be
    read without exit), we need to return L2's notion of the TSC, not L1's.

    The current code incorrectly returned the L1 TSC, because svm_get_msr()
    was also used in x86.c, where that behavior was assumed; now that those
    places call the new svm_read_l1_tsc(), the MSR read can be fixed.

    Signed-off-by: Nadav Har'El
    Tested-by: Joerg Roedel
    Acked-by: Joerg Roedel
    Signed-off-by: Avi Kivity

    Nadav Har'El
     
  • KVM assumed in several places that reading the TSC MSR returns the value for
    L1. This is incorrect, because when L2 is running, the correct TSC read exit
    emulation is to return L2's value.

    We therefore add a new x86_ops function, read_l1_tsc, to use in places that
    specifically need to read the L1 TSC, NOT the TSC of the current level of
    guest.

    Note that one change, of one line in kvm_arch_vcpu_load, is made redundant
    by a different patch sent by Zachary Amsden (and not yet applied):
    kvm_arch_vcpu_load() should not read the guest TSC, and if it didn't, of
    course we didn't have to change the call of kvm_get_msr() to read_l1_tsc().

    [avi: moved callback to kvm_x86_ops tsc block]

    Signed-off-by: Nadav Har'El
    Acked-by: Zachary Amsden
    Signed-off-by: Avi Kivity

    Nadav Har'El
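    The distinction between the two reads can be sketched with per-level
    TSC offsets (field and function names below are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: the TSC seen at each nesting level is the hardware TSC plus
 * that level's accumulated offset. read_l1_tsc must apply only L1's
 * offset, even while L2 is running. */
struct vcpu_tsc {
        uint64_t l1_offset;        /* offset L0 programmed for L1 */
        uint64_t l2_extra_offset;  /* extra offset L1 programmed for L2 */
        bool in_l2;
};

/* TSC as the currently running guest level sees it. */
static uint64_t read_guest_tsc(const struct vcpu_tsc *v, uint64_t host_tsc)
{
        uint64_t tsc = host_tsc + v->l1_offset;
        if (v->in_l2)
                tsc += v->l2_extra_offset;
        return tsc;
}

/* TSC as L1 sees it, regardless of which level is running. */
static uint64_t read_l1_tsc(const struct vcpu_tsc *v, uint64_t host_tsc)
{
        return host_tsc + v->l1_offset;
}
```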
     
  • Architecturally, PDPTEs are cached in the PDPTRs when CR3 is reloaded.
    On SVM, it is not possible to implement this, but on VMX this is possible
    and was indeed implemented until nested SVM changed this to unconditionally
    read PDPTEs dynamically. This has noticeable impact when running PAE guests.

    Fix by changing the MMU to read PDPTRs from the cache, falling back to
    reading from memory for the nested MMU.

    Signed-off-by: Avi Kivity
    Tested-by: Joerg Roedel
    Signed-off-by: Marcelo Tosatti

    Avi Kivity
     
  • The vmexit tracepoints format the exit_reason to make it human-readable.
    Since the exit_reason depends on the instruction set (vmx or svm),
    formatting is handled with ftrace_print_symbols_seq() by referring to
    the appropriate exit reason table.

    However, the ftrace_print_symbols_seq() function is not meant to be used
    directly in tracepoints since it does not export the formatting table
    which userspace tools like trace-cmd and perf use to format traces.

    In practice perf dies when formatting vmexit-related events and
    trace-cmd falls back to printing the numeric value (with extra
    formatting code in the kvm plugin to paper over this limitation). Other
    userspace consumers of vmexit-related tracepoints would be in similar
    trouble.

    To avoid significant changes to the kvm_exit tracepoint, this patch
    moves the vmx and svm exit reason tables into arch/x86/kvm/trace.h and
    selects the right table with __print_symbolic() depending on the
    instruction set. Note that __print_symbolic() is designed for exporting
    the formatting table to userspace and allows trace-cmd and perf to work.

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Avi Kivity

    Stefan Hajnoczi
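    A user-space analog of the value/name tables that __print_symbolic()
    consumes (the three SVM exit codes shown are real; the lookup helper
    is a sketch):

```c
#include <stddef.h>
#include <string.h>

/* {value, name} pairs, one table per instruction set, terminated by a
 * NULL name, mirroring the shape __print_symbolic() expects. */
struct exit_sym { unsigned long val; const char *name; };

static const struct exit_sym svm_exit_reasons[] = {
        { 0x060, "interrupt" },   /* VMEXIT_INTR */
        { 0x078, "hlt" },         /* VMEXIT_HLT  */
        { 0x07b, "io" },          /* VMEXIT_IOIO */
        { 0, NULL }
};

/* Format a raw exit reason as the symbolic name userspace tools show. */
static const char *exit_reason_str(const struct exit_sym *tab,
                                   unsigned long val)
{
        for (; tab->name; tab++)
                if (tab->val == val)
                        return tab->name;
        return "unknown";
}
```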
     
  • The kvm_exit tracepoint recently added the isa argument to aid decoding
    exit_reason. The semantics of exit_reason depend on the instruction set
    (vmx or svm) and the isa argument allows traces to be analyzed on other
    machines.

    Add the isa argument to kvm_nested_vmexit and kvm_nested_vmexit_inject
    so these tracepoints can also be self-describing.

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Avi Kivity

    Stefan Hajnoczi
     

12 Jul, 2011

1 commit

  • This patch allows the guest to enable the VMXE bit in CR4, which is a
    prerequisite to running VMXON.

    Whether to allow setting the VMXE bit now depends on the architecture (svm
    or vmx), so the check has moved into kvm_x86_ops->set_cr4(). This function
    now returns an int: if kvm_x86_ops->set_cr4() returns 1, __kvm_set_cr4()
    also returns 1, causing kvm_set_cr4() to inject a #GP.

    Turning on the VMXE bit is allowed only when the nested VMX feature is
    enabled, and turning it off is forbidden after a vmxon.

    Signed-off-by: Nadav Har'El
    Signed-off-by: Marcelo Tosatti

    Nadav Har'El
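    The new contract can be sketched as follows (a hedged illustration;
    the parameter names are invented, only the CR4.VMXE bit position is
    architectural):

```c
#include <stdbool.h>

#define X86_CR4_VMXE (1UL << 13)

/* The vendor op returns nonzero to reject a CR4 value; the caller turns
 * that into a #GP for the guest. Setting VMXE requires nested VMX to be
 * enabled; clearing it is forbidden once VMXON has been executed. */
static int vmx_set_cr4_sketch(unsigned long cr4, bool nested_enabled,
                              bool after_vmxon)
{
        if (cr4 & X86_CR4_VMXE) {
                if (!nested_enabled)
                        return 1;   /* VMXE not allowed -> inject #GP */
        } else if (after_vmxon) {
                return 1;           /* cannot clear VMXE after VMXON */
        }
        return 0;
}
```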
     

11 May, 2011

9 commits