25 Apr, 2010

40 commits

  • Marcelo introduced gfn_to_hva_memslot() when he implemented
    gfn_to_pfn_memslot(). Let's use this for gfn_to_hva() too.

    Note: also remove parentheses next to return as checkpatch said to do.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Avi Kivity

    Takuya Yoshikawa
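
    The reuse described in this entry can be sketched as follows. This is a
    minimal user-space illustration, not the kernel code: the struct layout,
    the 4 KiB page size, the BAD_HVA value and both function names are
    assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12          /* assumption: 4 KiB pages */
#define BAD_HVA    UINT64_MAX  /* illustrative "no slot found" value */

/* Hypothetical, simplified stand-in for a KVM memory slot. */
struct memslot {
    uint64_t base_gfn;        /* first guest frame number covered */
    uint64_t npages;          /* number of pages in the slot */
    uint64_t userspace_addr;  /* host virtual address backing base_gfn */
};

/* Translate a gfn that is known to lie inside `slot`. */
static uint64_t slot_gfn_to_hva(const struct memslot *slot, uint64_t gfn)
{
    assert(gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages);
    return slot->userspace_addr + ((gfn - slot->base_gfn) << PAGE_SHIFT);
}

/* Generic lookup: find the slot, then reuse the per-slot helper. */
static uint64_t lookup_gfn_to_hva(const struct memslot *slots, int nslots,
                                  uint64_t gfn)
{
    for (int i = 0; i < nslots; i++) {
        const struct memslot *s = &slots[i];

        if (gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
            return slot_gfn_to_hva(s, gfn);
    }
    return BAD_HVA;
}
```

    Keeping the address arithmetic in one per-slot helper means the generic
    lookup and any slot-specific caller cannot drift apart.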
     
  • When injecting a vmexit.intr into the nested hypervisor
    there might be leftover values in the exit_info fields.
    Clear them to not confuse nested hypervisors.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Avi Kivity

    Joerg Roedel
     
  • If we have the following situation with nested svm:

    1. Host KVM intercepts cr0 writes
    2. Guest hypervisor intercepts only selective cr0 writes

    Then we get a cr0 write intercept which is handled on the
    host. But that intercept may actually be a selective cr0
    intercept for the guest. This patch checks for this
    condition and injects a selective cr0 intercept if needed.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Avi Kivity

    Joerg Roedel
     
  • The vcpu->arch.cr0 variable is already set in the
    architecture specific set_cr0 callbacks. There is no need to
    set it in the common code.
    This allows the architecture code to keep the old arch.cr0
    value if it wants. This is required for nested svm to decide
    if a selective_cr0 exit needs to be injected.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Avi Kivity

    Joerg Roedel
     
  • Hyper-V as a guest wants to write this bit. This patch
    ignores it.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Avi Kivity

    Joerg Roedel
     
  • This patch implements the emulation of the vm_cr msr for
    nested svm.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Avi Kivity

    Joerg Roedel
     
  • This patch adds a tracepoint to get information about the
    most important intercept bitmasks from the nested vmcb.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Avi Kivity

    Joerg Roedel
     
  • A recent change broke tracing of the nested vmcb address. It
    was reported as 0 all the time. This patch fixes it.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Avi Kivity

    Joerg Roedel
     
  • This patch implements the NMI intercept checking for nested
    svm.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Avi Kivity

    Joerg Roedel
     
  • Without resetting the MMU the gva_to_gpa function will not
    work reliably when the vcpu is running in nested context.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Avi Kivity

    Joerg Roedel
     
  • This patch removes whitespace errors, fixes comment formats
    and most of the checkpatch warnings. Now vim does not show
    c-space-errors anymore.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Avi Kivity

    Joerg Roedel
     
  • Call directly into the vendor services for getting/setting rflags in
    emulate_instruction to ensure injected TF survives the emulation.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     
  • RF is not required for injecting TF as the latter will trigger only
    after an instruction execution anyway. So do not touch RF when arming or
    disarming guest single-step mode.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     
  • When in guest debugging mode, we have to reinject those #BP software
    exceptions that are caused by guest-injected INT3. As older AMD
    processors do not support the required nRIP VMCB field, try to emulate
    it by moving RIP past the instruction on exception injection. Fix it up
    again in case the injection failed and we were able to catch this. This
    does not work for unintercepted faults, but it is better than doing
    nothing.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     
  • Based on Gleb's suggestion: Add a helper kvm_is_linear_rip that matches
    a given linear RIP against the current one. Use this for guest
    single-stepping, more users will follow.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
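
    A helper of this kind might look like the sketch below. The linear RIP is
    taken to be the code segment base plus RIP; the struct and the function
    names are hypothetical stand-ins for the vcpu state, not the kernel
    definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of the vcpu state needed here. */
struct vcpu_view {
    uint64_t cs_base;  /* base address of the code segment */
    uint64_t rip;      /* instruction pointer relative to CS */
};

/* Linear address of the instruction the vcpu would execute next. */
static uint64_t current_linear_rip(const struct vcpu_view *v)
{
    assert(v != NULL);
    return v->cs_base + v->rip;
}

/* True if the recorded linear RIP still matches the current one. */
static bool is_linear_rip(const struct vcpu_view *v, uint64_t linear_rip)
{
    return current_linear_rip(v) == linear_rip;
}
```

    Comparing linear addresses rather than raw RIP values keeps the check
    meaningful when the guest runs with a non-zero CS base.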
     
  • Move svm_queue_exception past skip_emulated_instruction to allow calling
    it later on.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     
  • This restores the deferred VCPU kicking before 956f97cf. We need this
    over -rt as wake_up* requires non-atomic context in this configuration.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     
  • When we destroy a vcpu, we should also make sure to kill all pending
    timers that could still be up. When not doing this, hrtimers might
    dereference null pointers trying to call our code.

    This patch fixes spontaneous kernel panics seen after closing VMs.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • While converting the kzalloc we used to allocate our vcpu struct to
    vmalloc, I forgot to memset the contents to zeros. That broke quite
    a lot.

    This patch memsets it to zero again.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • So far user space was not able to save and restore debug registers for
    migration or after reset. Plug this hole.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     
  • The interrupt shadow created by STI or MOV-SS-like operations is part of
    the VCPU state and must be preserved across migration. Transfer it in
    the spare padding field of kvm_vcpu_events.interrupt.

    As a side effect we now have to make vmx_set_interrupt_shadow robust
    against both shadow types being set. Give MOV SS a higher priority and
    skip STI in that case so that VMX does not throw a fault on the next entry.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
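
    The priority rule for the two shadow types can be sketched as below. The
    flag names and numeric values are illustrative assumptions, not the
    definitions used by KVM or VMX; only the "MOV SS wins over STI" logic is
    taken from the entry.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values; names and numbers are assumptions. */
#define SHADOW_MOV_SS    0x1u  /* shadow bits as passed in by the caller */
#define SHADOW_STI       0x2u
#define HW_BLOCK_STI     0x1u  /* bits of the hardware interruptibility state */
#define HW_BLOCK_MOV_SS  0x2u

/* Return the new interruptibility state for the requested shadow mask. */
static uint32_t apply_interrupt_shadow(uint32_t hw_state, uint32_t mask)
{
    assert((mask & ~(SHADOW_MOV_SS | SHADOW_STI)) == 0);

    /* Start from a state with neither blocking bit set. */
    hw_state &= ~(HW_BLOCK_STI | HW_BLOCK_MOV_SS);

    if (mask & SHADOW_MOV_SS)
        hw_state |= HW_BLOCK_MOV_SS;  /* wins when both types are set */
    else if (mask & SHADOW_STI)
        hw_state |= HW_BLOCK_STI;

    return hw_state;
}
```

    With this shape, at most one blocking bit is ever written, whatever
    combination user space restores.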
     
  • To prevent user space from migrating a pending software exception or
    interrupt, mask them out on KVM_GET_VCPU_EVENTS. Without this, user
    space would try to reinject them, and we would have to reconstruct the
    proper instruction length for VMX event injection. Now the pending event
    will be reinjected via executing the triggering instruction again.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     
  • The nested_svm_intr() function does not execute the vmexit
    anymore. Therefore we may still be in the nested state after
    that function ran. This patch changes the nested_svm_intr()
    function to return whether the irq window could be enabled.

    Cc: stable@kernel.org
    Signed-off-by: Joerg Roedel
    Signed-off-by: Avi Kivity

    Joerg Roedel
     
  • vcpu->run is initialized on vcpu creation and can never be NULL
    here.

    Signed-off-by: Gleb Natapov
    Signed-off-by: Avi Kivity

    Gleb Natapov
     
  • We used to use get_free_pages to allocate our vcpu struct. Unfortunately
    that call failed on me several times after my machine had a big enough
    uptime, as memory became too fragmented by then.

    Fortunately, we don't need it to be page aligned any more! We can just
    vmalloc it and everything's great.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • We don't need as complex code. I had some thinkos while writing it, figuring
    I needed to support PPC32 paths on PPC64 which would have required DR=0, but
    everything just runs fine with DR=1.

    So let's make the functions simple C call wrappers that reserve some space on
    the stack for the respective functions to clobber.

    Fixes out-of-RMA-access (and thus guest FPU loading) on the PS3.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • We had code to make use of the secondary htab buckets, but kept that
    disabled because it was unstable when I put it in.

    I checked again if that's still the case and apparently it was only
    exposing some instability that was there anyways before. I haven't
    seen any badness related to usage of secondary htab entries so far.

    This should speed up guest memory allocations by quite a bit, because
    we now have more space to put PTEs in.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • We need to tell userspace that we can emulate paired single instructions.
    So let's add a capability export.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • The one big thing about the Gekko is paired singles.

    Paired singles are an extension to the instruction set, that adds 32 single
    precision floating point registers (qprs), some SPRs to modify the behavior
    of paired single operations and instructions to deal with qprs to the
    instruction set.

    Unfortunately, it also changes semantics of existing operations that affect
    single values in FPRs. In most cases they get mirrored to the corresponding
    QPR.

    Thanks to that we need to emulate all FPU operations and all the new paired
    single operations too.

    In order to achieve that, we use the just introduced FPU call helpers to
    call the real FPU whenever the guest wants to modify an FPR. Additionally
    we also fix up the QPR values along the way.

    That way we can execute paired single FPU operations without implementing a
    soft fpu.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • When we get a program interrupt we usually don't expect it to perform an
    MMIO operation. But why not? When we emulate paired singles, we can end
    up loading or storing to an MMIO address - and the handling of those
    happens in the program interrupt handler.

    So let's teach the program interrupt handler how to deal with EMULATE_MMIO.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • The PowerPC specification always lists bits from MSB to LSB. That is
    really confusing when you're trying to write C code, because it fits
    in pretty badly with the normal (1 << xx) schemes.

    So I came up with some nice wrappers that allow getting and setting fields
    in a u64 with bit numbers exactly as given in the spec. That makes the
    code in KVM easier to compare with the spec.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
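
    Wrappers of this kind could look like the sketch below, where bit 0 is
    the most significant bit of a 64-bit value and a field is named by its
    first and last bit, as in the spec. The function names are hypothetical
    and this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the `width` lowest bits set, valid for widths 1..64. */
static uint64_t low_mask(int width)
{
    return width == 64 ? ~UINT64_C(0) : (UINT64_C(1) << width) - 1;
}

/* Extract bits msb..lsb (MSB-first numbering, bit 0 is the top bit). */
static uint64_t ppc_get_field(uint64_t value, int msb, int lsb)
{
    assert(0 <= msb && msb <= lsb && lsb <= 63);
    return (value >> (63 - lsb)) & low_mask(lsb - msb + 1);
}

/* Return `value` with bits msb..lsb replaced by `field`. */
static uint64_t ppc_set_field(uint64_t value, int msb, int lsb, uint64_t field)
{
    assert(0 <= msb && msb <= lsb && lsb <= 63);

    uint64_t mask = low_mask(lsb - msb + 1) << (63 - lsb);

    return (value & ~mask) | ((field << (63 - lsb)) & mask);
}
```

    For example, a field the spec calls bits 32-37 of a 64-bit register is
    read with ppc_get_field(reg, 32, 37), with no manual conversion to
    (1 << xx) style shifts.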
     
  • BATs didn't work. Well, they did, but only up to BAT3. As soon as we
    came to BAT4 the offset calculation was screwed up and we ended up
    overwriting BAT0-3.

    Fortunately, Linux hasn't been using BAT4+. It's still a good
    idea to write correct code though.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • To emulate paired single instructions, we need to be able to call FPU
    operations from within the kernel. Since we don't want gcc to spill
    arbitrary FPU code everywhere, we tell it to use a soft fpu.

    Since we know we can really call the FPU in safe areas, let's also add
    some calls that we can later use to actually execute real world FPU
    operations on the host's FPU.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • We need to call the ext giveup handlers from code outside of book3s.c.
    So let's make it non-static.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • The Book3S KVM implementation contains some helper functions to load and store
    data from and to virtual addresses.

    Unfortunately, this helper used to keep the physical address it so nicely
    found out for us to itself. So let's change that and make it return the
    physical address it resolved.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • The Book3S_32 specification allows two instructions to modify segment
    registers: mtsrin and mtsr.

    Most normal operating systems use mtsrin, because it allows defining which
    segment it wants to change using a register. But since I was trying to run
    an embedded guest, it turned out to be using mtsr with hardcoded values.

    So let's also emulate mtsr. It's a valid instruction after all.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
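
    The difference between the two instructions is where the segment register
    number comes from, which can be sketched as below. The function names and
    the gpr array are illustrative assumptions; the field positions follow the
    32-bit PowerPC encoding (primary opcode 31, SR field in the instruction
    word for mtsr, top four bits of rB for mtsrin).

```c
#include <assert.h>
#include <stdint.h>

/* mtsr: the segment register number is hardcoded in the instruction word
 * (SR field, bits 12-15 in the spec's MSB-first numbering). */
static unsigned mtsr_target(uint32_t inst)
{
    assert((inst >> 26) == 31);  /* primary opcode 31 */
    return (inst >> 16) & 0xf;
}

/* mtsrin: the segment register number is the top four bits of rB. */
static unsigned mtsrin_target(uint32_t inst, const uint32_t gpr[32])
{
    assert((inst >> 26) == 31);  /* primary opcode 31 */

    unsigned rb = (inst >> 11) & 0x1f;  /* rB field, bits 16-20 */

    return (gpr[rb] >> 28) & 0xf;
}
```

    An emulator handling both therefore only differs in how it picks the
    target register; the value written comes from rS in either case.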
     
  • There's a typo in the debug ifdef of the book3s_32 mmu emulation. While trying
    to debug something I stumbled across that and wanted to save anyone after me
    (or myself later) from having to debug that again.

    So let's fix the ifdef.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • There are some situations when we're pretty sure the guest will use the
    FPU soon. So we can save the churn of going into the guest, finding out
    it does want to use the FPU and going out again.

    This patch adds preloading of the FPU when it's reasonable.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • When we for example get an Altivec interrupt, but our guest doesn't support
    altivec, we need to inject a program interrupt, not an altivec interrupt.

    The same goes for paired singles. When an altivec interrupt arrives, we're
    pretty sure we need to emulate the instruction because it's a paired single
    operation.

    So let's make all the ext handlers aware that they need to jump to the
    program interrupt handler when an extension interrupt arrives that
    was not supposed to arrive for the guest CPU.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • The Gekko has some SPR values that differ from other PPC core values and
    also some additional ones.

    Let's add support for them in our mfspr/mtspr emulator.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf