27 Apr, 2013

17 commits

  • Quite a bit of code in KVM has been conditionalized on availability of
    IOAPIC emulation. However, most of it is generically applicable to
    platforms that don't have an IOAPIC, but a different type of irq chip.

    Make code that only relies on IRQ routing, not an APIC itself, depend on
    CONFIG_HAVE_KVM_IRQ_ROUTING, so that we can reuse it later.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
  • The concept of routing interrupt lines to an irqchip is nothing
    that is IOAPIC specific. Every irqchip has a maximum number of pins
    that can be linked to irq lines.

    So let's add a new define that allows us to reuse generic code for
    non-IOAPIC platforms.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
  • At present, the KVM_GET_DIRTY_LOG ioctl doesn't report modifications
    done by the host to the virtual processor areas (VPAs) and dispatch
    trace logs (DTLs) registered by the guest. This is because those
    modifications are done either in real mode or in the host kernel
    context, and in neither case does the access go through the guest's
    HPT, and thus no change (C) bit gets set in the guest's HPT.

    However, the changes done by the host do need to be tracked so that
    the modified pages get transferred when doing live migration. In
    order to track these modifications, this adds a dirty flag to the
    struct representing the VPA/DTL areas, and arranges to set the flag
    when the VPA/DTL gets modified by the host. Then, when we are
    collecting the dirty log, we also check the dirty flags for the
    VPA and DTL for each vcpu and set the relevant bit in the dirty log
    if necessary. Doing this also means we now need to keep track of
    the guest physical address of the VPA/DTL areas.

    So as not to lose track of modifications to a VPA/DTL area when it gets
    unregistered, or when a new area gets registered in its place, we need
    to transfer the dirty state to the rmap chain. This adds code to
    kvmppc_unpin_guest_page() to do that if the area was dirty. To simplify
    that code, we now require that all VPA, DTL and SLB shadow buffer areas
    fit within a single host page. Guests already comply with this
    requirement because pHyp requires that these areas not cross a 4k
    boundary.

    Signed-off-by: Paul Mackerras
    Signed-off-by: Alexander Graf

    Paul Mackerras
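
    The dirty-flag scheme described above can be sketched roughly as follows.
    This is a simplified illustration, not the kernel code: struct vpa_area,
    host_modify_area() and harvest_area_dirty() are hypothetical names, and
    a 4k page size is assumed. Because each area fits within a single page,
    one bit in the dirty log is enough per area.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4k pages */

/* Hypothetical stand-in for the per-vcpu VPA/DTL descriptor. */
struct vpa_area {
    uint64_t gpa;    /* guest physical address of the area */
    bool registered; /* the guest has registered this area */
    bool dirty;      /* set whenever the host writes to the area */
};

/* Host-side write: no C bit is set in the guest HPT, so record it by hand. */
static void host_modify_area(struct vpa_area *a)
{
    assert(a != NULL);
    if (a->registered)
        a->dirty = true;
}

/*
 * While collecting the dirty log, fold the flag into the bitmap.
 * The bitmap covers guest frames [base_gfn, base_gfn + npages).
 */
static void harvest_area_dirty(struct vpa_area *a, uint64_t *bitmap,
                               uint64_t base_gfn, uint64_t npages)
{
    uint64_t gfn;

    assert(a != NULL && bitmap != NULL);
    if (!a->registered || !a->dirty)
        return;

    gfn = a->gpa >> PAGE_SHIFT;
    if (gfn < base_gfn || gfn >= base_gfn + npages)
        return; /* area lies outside this range; keep the flag set */

    a->dirty = false;
    gfn -= base_gfn;
    bitmap[gfn / 64] |= (uint64_t)1 << (gfn % 64);
}
```

    Clearing the flag only after the bit has been recorded means a write
    that happens between two log collections is reported exactly once.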
     
  • At present, the code that determines whether a HPT entry has changed,
    and thus needs to be sent to userspace when it is copying the HPT,
    doesn't consider a hardware update to the reference and change bits
    (R and C) in the HPT entries to constitute a change that needs to
    be sent to userspace. This adds code to check for changes in R and C
    when we are scanning the HPT to find changed entries, and adds code
    to set the changed flag for the HPTE when we update the R and C bits
    in the guest view of the HPTE.

    Since we now need to set the HPTE changed flag in book3s_64_mmu_hv.c
    as well as book3s_hv_rm_mmu.c, we move the note_hpte_modification()
    function into kvm_book3s_64.h.

    Current Linux guest kernels don't use the hardware updates of R and C
    in the HPT, so this change won't affect them. Linux (or other) kernels
    might in future want to use the R and C bits and have them correctly
    transferred across when a guest is migrated, so it is better to correct
    this deficiency.

    Signed-off-by: Paul Mackerras
    Signed-off-by: Alexander Graf

    Paul Mackerras
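
    A minimal sketch of the idea, under stated assumptions: struct
    hpte_shadow and hpte_sync_rc() are hypothetical names, and the real
    code tracks the changed flag differently. The R and C masks are the
    reference and change bits in the second doubleword of an HPTE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HPTE_R_R 0x100ULL /* reference bit */
#define HPTE_R_C 0x080ULL /* change bit */

/* Hypothetical pairing of the hardware HPTE with the guest's view of it. */
struct hpte_shadow {
    uint64_t hw_r;    /* second doubleword as updated by hardware */
    uint64_t guest_r; /* second doubleword as seen by the guest */
    bool modified;    /* entry must be resent to userspace */
};

/*
 * Copy newly set R/C bits into the guest view. Returns true and marks
 * the entry as modified if the guest view actually changed.
 */
static bool hpte_sync_rc(struct hpte_shadow *e)
{
    uint64_t rc;

    assert(e != NULL);
    rc = e->hw_r & (HPTE_R_R | HPTE_R_C);
    if ((e->guest_r & rc) == rc)
        return false; /* nothing new to report */

    e->guest_r |= rc;
    e->modified = true;
    return true;
}
```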
     
  • Add e6500 core to Kconfig description.

    Signed-off-by: Mihai Caraman
    Signed-off-by: Alexander Graf

    Mihai Caraman
     
  • Extend processor compatibility names to e6500 cores.

    Signed-off-by: Mihai Caraman
    Reviewed-by: Alexander Graf
    Signed-off-by: Alexander Graf

    Mihai Caraman
     
  • Embedded.Page Table (E.PT) category is not supported yet in e6500 kernel.
    Configure TLBnCFG to remove E.PT and E.HV.LRAT categories from VCPUs.

    Signed-off-by: Mihai Caraman
    Signed-off-by: Alexander Graf

    Mihai Caraman
     
  • EPTCFG register defined by E.PT is accessed unconditionally by Linux guests
    in the presence of MAV 2.0. Emulate it now.

    Signed-off-by: Mihai Caraman
    Signed-off-by: Alexander Graf

    Mihai Caraman
     
  • Add support for TLBnPS registers available in MMU Architecture Version
    (MAV) 2.0.

    Signed-off-by: Mihai Caraman
    Signed-off-by: Alexander Graf

    Mihai Caraman
     
  • Vcpu's MMU default configuration and geometry update logic was buried in
    a chunk of code. Move them to dedicated functions to add more clarity.

    Signed-off-by: Mihai Caraman
    Signed-off-by: Alexander Graf

    Mihai Caraman
     
  • MMU registers were exposed to user-space using sregs interface. Add them
    to ONE_REG interface using kvmppc_get_one_reg/kvmppc_set_one_reg delegation
    mechanism.

    Signed-off-by: Mihai Caraman
    Signed-off-by: Alexander Graf

    Mihai Caraman
     
  • Refactor Book3E ONE_REG ioctl implementation to use kvmppc_get_one_reg/
    kvmppc_set_one_reg delegation interface introduced by Book3S. This is
    necessary for MMU SPRs, which are platform specific.

    Get rid of useless case braces in the process.

    Signed-off-by: Mihai Caraman
    Signed-off-by: Alexander Graf

    Mihai Caraman
     
  • This allows an exit to user space when the emulator requests it by
    returning EMULATE_EXIT_USER. This will be used by later patches in
    this series.

    Signed-off-by: Bharat Bhushan
    Signed-off-by: Alexander Graf

    Bharat Bhushan
     
  • Currently the instruction emulator code returns EMULATE_EXIT_USER
    and common code initializes "run->exit_reason = .." and
    "vcpu->arch.hcall_needed = .." with one fixed reason.
    But the emulator may need to exit to user space for different
    reasons. To support that, the "run->exit_reason = .."
    and "vcpu->arch.hcall_needed = .." initialization is moved a
    level up, into the emulator.

    Signed-off-by: Bharat Bhushan
    Signed-off-by: Alexander Graf

    Bharat Bhushan
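
    The shape of this change can be sketched as below. Everything except
    the name EMULATE_EXIT_USER is hypothetical (the run_state struct, the
    opcode and exit-reason enums, and both functions); the point is only
    that the emulator, not the common exit path, picks the exit reason.

```c
#include <assert.h>
#include <stddef.h>

enum emulation_result { EMULATE_DONE, EMULATE_EXIT_USER };

/* Hypothetical exit reasons and opcodes, for illustration only. */
enum exit_reason { EXIT_NONE, EXIT_HCALL, EXIT_DEBUG };
enum opcode { OP_OTHER, OP_HCALL, OP_DEBUG_TRAP };

struct run_state {
    enum exit_reason exit_reason;
};

/* The emulator knows why it is leaving, so it records the reason itself. */
static enum emulation_result emulate_insn(struct run_state *run, enum opcode op)
{
    assert(run != NULL);
    switch (op) {
    case OP_HCALL:
        run->exit_reason = EXIT_HCALL;
        return EMULATE_EXIT_USER;
    case OP_DEBUG_TRAP:
        run->exit_reason = EXIT_DEBUG;
        return EMULATE_EXIT_USER;
    default:
        return EMULATE_DONE;
    }
}

/* Common code no longer hardcodes a reason; it only forwards the exit. */
static int handle_exit(struct run_state *run, enum opcode op)
{
    return emulate_insn(run, op) == EMULATE_EXIT_USER; /* 1: go to userspace */
}
```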
     
  • Instruction emulation returns EMULATE_DO_PAPR when it requires an
    exit to userspace on book3s. A similar return value is required
    for booke. The name EMULATE_DO_PAPR is confusing there, so it is
    renamed to EMULATE_EXIT_USER.

    Signed-off-by: Bharat Bhushan
    Signed-off-by: Alexander Graf

    Bharat Bhushan
     
  • This patch defines the interface parameter for KVM_SET_GUEST_DEBUG
    ioctl support. Follow-up patches will use this for setting up
    hardware breakpoints, watchpoints and software breakpoints.

    Also kvm_arch_vcpu_ioctl_set_guest_debug() is brought one level below.
    This is because I am not sure what is required for book3s. So this ioctl
    behaviour will not change for book3s.

    Signed-off-by: Bharat Bhushan
    Signed-off-by: Alexander Graf

    Bharat Bhushan
     
  • The kernel can only access pages that are mapped as memory.
    So flush only the valid kernel pages.

    Signed-off-by: Bharat Bhushan
    Signed-off-by: Alexander Graf

    Bharat Bhushan
     

25 Apr, 2013

1 commit


22 Apr, 2013

18 commits

  • If we load the complete EFER MSR on entry or exit, EFER.LMA (and LME)
    loading is skipped. Their consistency is already checked now before
    starting the transition.

    Signed-off-by: Jan Kiszka
    Reviewed-by: Paolo Bonzini
    Signed-off-by: Gleb Natapov

    Jan Kiszka
     
  • As we may emulate the loading of EFER on VM-entry and VM-exit, implement
    the checks that VMX performs on the guest and host values on vmlaunch/
    vmresume. Factor out kvm_valid_efer for this purpose, which checks for
    reserved bits that are set.

    Signed-off-by: Jan Kiszka
    Reviewed-by: Paolo Bonzini
    Signed-off-by: Gleb Natapov

    Jan Kiszka
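
    As a rough illustration of such checks (not the code from this commit):
    the sketch below rejects reserved bits and applies the guest-state
    consistency rules for EFER as the Intel SDM describes them for VM entry.
    The function names and the efer_supported mask are assumptions; a real
    implementation derives the supported bits from CPU features.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_SCE (1ULL << 0)  /* SYSCALL enable */
#define EFER_LME (1ULL << 8)  /* long mode enable */
#define EFER_LMA (1ULL << 10) /* long mode active */
#define EFER_NX  (1ULL << 11) /* no-execute enable */

/* Assumption: the only bits this hypothetical configuration supports. */
static const uint64_t efer_supported =
    EFER_SCE | EFER_LME | EFER_LMA | EFER_NX;

/* True if no bit outside the supported set is set. */
static bool efer_reserved_bits_clear(uint64_t efer)
{
    return (efer & ~efer_supported) == 0;
}

/*
 * Guest EFER check on VM entry: LMA must match the "IA-32e mode guest"
 * entry control, and must equal LME when paging (CR0.PG) is enabled.
 */
static bool guest_efer_consistent(uint64_t efer, bool ia32e_guest, bool cr0_pg)
{
    bool lma = (efer & EFER_LMA) != 0;
    bool lme = (efer & EFER_LME) != 0;

    if (!efer_reserved_bits_clear(efer))
        return false;
    if (lma != ia32e_guest)
        return false;
    if (cr0_pg && lma != lme)
        return false;
    return true;
}
```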
     
  • The logic for checking if interrupts can be injected also has to be
    applied to NMIs. The difference is that if NMI interception is on, these
    events are consumed and blocked by the VM exit.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Gleb Natapov

    Jan Kiszka
     
  • vmx_set_nmi_mask will soon be used by vmx_nmi_allowed. No functional
    changes.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Gleb Natapov

    Jan Kiszka
     
  • If userspace creates and destroys multiple VMs within the same process
    we leak 20k of memory in the userspace process context per VM. This
    patch frees the memory in kvm_arch_destroy_vm. If the process exits
    without closing the VM file descriptor or the file descriptor has been
    shared with another process then we don't free the memory.

    It's still possible for a user space process to leak memory if the last
    process to close the fd for the VM is not the process that created it.
    However, this is an unexpected case that's only caused by a user space
    process that's misbehaving.

    Signed-off-by: Andrew Honig
    Signed-off-by: Gleb Natapov

    Andrew Honig
     
  • Fix to return a negative error code from the error handling
    case instead of 0, as returned elsewhere in this function.

    Signed-off-by: Wei Yongjun
    Reviewed-by: Paolo Bonzini
    Signed-off-by: Gleb Natapov

    Wei Yongjun
     
  • Once L1 loads VMCS12 we enable shadow-vmcs capability and copy all the VMCS12
    shadowed fields to the shadow vmcs. When we release the VMCS12, we also
    disable shadow-vmcs capability.

    Signed-off-by: Abel Gordon
    Reviewed-by: Orit Wasserman
    Signed-off-by: Gleb Natapov

    Abel Gordon
     
  • Synchronize between the VMCS12 software controlled structure and the
    processor-specific shadow vmcs

    Signed-off-by: Abel Gordon
    Reviewed-by: Orit Wasserman
    Signed-off-by: Gleb Natapov

    Abel Gordon
     
  • Introduce a function used to copy fields from the software controlled VMCS12
    to the processor-specific shadow vmcs

    Signed-off-by: Abel Gordon
    Reviewed-by: Orit Wasserman
    Signed-off-by: Gleb Natapov

    Abel Gordon
     
  • Introduce a function used to copy fields from the processor-specific shadow
    vmcs to the software controlled VMCS12

    Signed-off-by: Abel Gordon
    Reviewed-by: Orit Wasserman
    Signed-off-by: Gleb Natapov

    Abel Gordon
     
  • Unmap vmcs12 and release the corresponding shadow vmcs

    Signed-off-by: Abel Gordon
    Reviewed-by: Orit Wasserman
    Signed-off-by: Gleb Natapov

    Abel Gordon
     
  • Allocate a shadow vmcs used by the processor to shadow part of the fields
    stored in the software defined VMCS12 (let L1 access fields without causing
    exits). Note we keep a shadow vmcs only for the current vmcs12. Once a vmcs12
    becomes non-current, its shadow vmcs is released.

    Signed-off-by: Abel Gordon
    Reviewed-by: Orit Wasserman
    Signed-off-by: Gleb Natapov

    Abel Gordon
     
  • handle_vmon doesn't check if L1 is already in root mode (VMXON
    was previously called). This patch adds this missing check and calls
    nested_vmx_failValid if VMX is already ON.
    We need this check because L0 will allocate the shadow vmcs when L1
    executes VMXON and we want to avoid host leaks (due to shadow vmcs
    allocation) if L1 executes VMXON repeatedly.

    Signed-off-by: Abel Gordon
    Reviewed-by: Orit Wasserman
    Signed-off-by: Gleb Natapov

    Abel Gordon
     
  • Refactor existing code so we re-use vmcs12_write_any to copy fields from the
    shadow vmcs specified by the link pointer (used by the processor,
    implementation-specific) to the VMCS12 software format used by L0 to hold
    the fields in L1 memory address space.

    Signed-off-by: Abel Gordon
    Reviewed-by: Orit Wasserman
    Signed-off-by: Gleb Natapov

    Abel Gordon
     
  • Prepare vmread and vmwrite bitmaps according to a pre-specified list of fields.
    These lists are intended to specify the most frequently accessed fields so we can
    minimize the number of fields that are copied between the software controlled
    VMCS12 format and the processor-specific shadow vmcs. The lists were built by
    measuring the VMCS field access rate after L2 Ubuntu 12.04 booted when it was
    running on top of L1 KVM, also Ubuntu 12.04. Note that during boot there were
    additional fields which were frequently modified but they were not added to
    these lists because after boot these fields were no longer accessed by L1.

    Signed-off-by: Abel Gordon
    Reviewed-by: Orit Wasserman
    Signed-off-by: Gleb Natapov

    Abel Gordon
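
    One way to build such a bitmap from a field list is sketched below.
    It assumes the usual VMX convention that a set bit makes the access
    exit, so the bitmap starts fully set and the listed fields are
    cleared. build_shadow_bitmap() and access_exits() are hypothetical
    names, and the three field encodings are used only as sample inputs,
    not as the list chosen by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VMCS_BITMAP_BYTES 4096 /* one 4k page, i.e. 32768 bits */

/* Sample VMCS field encodings, picked only for illustration. */
enum {
    VM_EXIT_REASON = 0x4402,
    GUEST_RSP      = 0x681c,
    GUEST_RIP      = 0x681e
};

/* Start with "every access exits", then clear the bit of each listed field. */
static void build_shadow_bitmap(unsigned char *bitmap,
                                const unsigned short *fields, size_t n)
{
    size_t i;

    memset(bitmap, 0xff, VMCS_BITMAP_BYTES);
    for (i = 0; i < n; i++) {
        unsigned int f = fields[i];

        assert(f < VMCS_BITMAP_BYTES * 8);
        bitmap[f / 8] &= (unsigned char)~(1u << (f % 8));
    }
}

/* Nonzero if an access to this field would still cause an exit. */
static int access_exits(const unsigned char *bitmap, unsigned int field)
{
    return (bitmap[field / 8] >> (field % 8)) & 1;
}
```

    Keeping the list short matters because every field left clear in the
    bitmap has to be copied between the two formats on each transition.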
     
  • Add logic required to detect if shadow-vmcs is supported by the
    processor. Introduce a new kernel module parameter to specify if L0 should use
    shadow vmcs (or not) to run L1.

    Signed-off-by: Abel Gordon
    Reviewed-by: Orit Wasserman
    Signed-off-by: Gleb Natapov

    Abel Gordon
     
  • Add definitions for all the vmcs control fields/bits
    required to enable vmcs-shadowing

    Signed-off-by: Abel Gordon
    Reviewed-by: Orit Wasserman
    Signed-off-by: Gleb Natapov

    Abel Gordon
     
  • Gleb Natapov
     

18 Apr, 2013

2 commits


17 Apr, 2013

2 commits