14 Jan, 2013

8 commits

  • If userspace starts dirty logging for a large slot, say 64GB of
    memory, kvm_mmu_slot_remove_write_access() needs to hold mmu_lock for
    a long time, on the order of tens of milliseconds. This patch bounds
    the lock hold time by asking the scheduler whether we need to
    reschedule for others (a minimal sketch of this lock-break pattern
    follows the commit list below).

    One penalty for this is that we need to flush TLBs before releasing
    mmu_lock. But since holding mmu_lock for a long time affects not only
    the guest (in other words, the vCPU threads) but also the host as a
    whole, that is a cost worth paying.

    In practice, the cost will not be very high, because we can protect a
    fair amount of memory before being rescheduled: in my test environment,
    cond_resched_lock() was called only once while protecting 12GB of
    memory, even without THP. We can also revisit Avi's "unlocked TLB
    flush" work later to completely suppress the extra TLB flushes if
    needed.

    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Gleb Natapov

    Takuya Yoshikawa
     
  • Better to place mmu_lock handling and TLB flushing code together since
    this is a self-contained function.

    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Gleb Natapov

    Takuya Yoshikawa
     
  • There is no reason to make callers take mmu_lock, since we do not need
    to protect kvm_mmu_change_mmu_pages() and
    kvm_mmu_slot_remove_write_access() together with mmu_lock in
    kvm_arch_commit_memory_region(): the former calls
    kvm_mmu_commit_zap_page() and flushes TLBs by itself.

    Note: we do not need to protect kvm->arch.n_requested_mmu_pages with
    mmu_lock, as can be seen from the fact that it is already read
    locklessly.

    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Gleb Natapov

    Takuya Yoshikawa
     
  • Not needed any more.

    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Gleb Natapov

    Takuya Yoshikawa
     
  • This makes it possible to release mmu_lock and reschedule conditionally
    in a later patch. Although this may increase the time needed to protect
    the whole slot when we start dirty logging, the kernel should not allow
    userspace to trigger something that will hold a spinlock for as long as
    tens of milliseconds: in fact there is no upper bound, since the hold
    time is roughly proportional to the number of guest pages.

    Another point to note is that this patch removes the only user of
    slot_bitmap, which would otherwise cause problems when we increase the
    number of slots further.

    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Gleb Natapov

    Takuya Yoshikawa
     
  • No longer need to care about the mapping level in this function.

    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Gleb Natapov

    Takuya Yoshikawa
     
  • Calling kvm_mmu_slot_remove_write_access() for a deleted slot does
    nothing but search for non-existent mmu pages which have mappings to
    that deleted memory; this is safe but a waste of time.

    Since we want to make the function rmap based in a later patch, in a
    manner which makes it unsafe to call for a deleted slot, we make the
    caller check that the slot is non-zero and has dirty logging enabled.

    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Gleb Natapov

    Takuya Yoshikawa
     
  • Gleb Natapov
     
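A minimal, hypothetical C sketch of the lock-break pattern described in the
first commit above: walk the slot's pages under a lock, and periodically
flush TLBs and yield the lock when a reschedule is wanted. The names
(protect_one_page, flush_tlbs, need_resched_hint, remove_write_access) are
illustrative stand-ins rather than the real KVM functions, and a pthread
mutex stands in for mmu_lock.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <pthread.h>

    /* Illustrative stand-ins for kernel primitives (not the real KVM code). */
    static pthread_mutex_t mmu_lock = PTHREAD_MUTEX_INITIALIZER;
    static bool tlb_needs_flush;

    static void protect_one_page(size_t gfn)  { tlb_needs_flush = true; (void)gfn; }
    static void flush_tlbs(void)              { tlb_needs_flush = false; }
    static bool need_resched_hint(size_t gfn) { return gfn && (gfn % 4096) == 0; }

    /*
     * Write-protect every page of a slot, but never hold the lock for an
     * unbounded time: when a reschedule is wanted, flush TLBs first (so no
     * stale writable mapping survives), then drop and reacquire the lock.
     */
    static void remove_write_access(size_t slot_npages)
    {
            pthread_mutex_lock(&mmu_lock);

            for (size_t gfn = 0; gfn < slot_npages; gfn++) {
                    protect_one_page(gfn);

                    if (need_resched_hint(gfn)) {
                            if (tlb_needs_flush)
                                    flush_tlbs();
                            pthread_mutex_unlock(&mmu_lock); /* let others run */
                            pthread_mutex_lock(&mmu_lock);
                    }
            }

            if (tlb_needs_flush)
                    flush_tlbs();
            pthread_mutex_unlock(&mmu_lock);
    }

    int main(void)
    {
            remove_write_access(16 * 1024); /* e.g. a 64MB slot of 4KB pages */
            puts("slot write-protected");
            return 0;
    }

The key point, as the commit notes, is that TLBs are flushed before the lock
is released; otherwise another vCPU could keep writing through a stale
writable spte while the dirty-logging code already considers the page
protected.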

11 Jan, 2013

3 commits

  • trace_kvm_userspace_exit has been missing the KVM_EXIT_WATCHDOG exit.

    CC: Bharat Bhushan
    Signed-off-by: Cornelia Huck
    Signed-off-by: Marcelo Tosatti

    Cornelia Huck
     
  • We have two issues in the current code:
    - if the target gfn is used as its own page table, the guest will
    refault and kvm will then use a small page size to map it. We need two
    #PFs to fix its shadow page table.

    - sometimes, say when an exception is triggered during a vm-exit caused
    by a #PF (see handle_exception() in vmx.c), we remove all the shadow
    pages shadowed by the target gfn before going into the page fault path,
    which causes an infinite loop:
    delete the shadow pages shadowed by the gfn -> try to use a large page
    size to map the gfn -> retry the access -> ...

    To fix these, we can adjust the page size early if the target gfn is
    used as a page table.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
     
  • If the write-fault access is from the supervisor and CR0.WP is not set
    on the vcpu, kvm fixes it up by adjusting the pte access: it sets the W
    bit on the pte and clears the U bit. This is the case where kvm can
    change the pte access from read-only to writable.

    Unfortunately, the pte access here is the access of the 'direct' shadow
    page table, meaning direct sp.role.access = pte_access, so we end up
    creating a writable spte entry in a read-only shadow page table. As a
    result the Dirty bit is not tracked when two guest ptes point to the
    same large page. Note that it has no impact other than the Dirty bit,
    since cr0.wp is encoded into sp.role.

    This can be fixed by adjusting the pte access before establishing the
    shadow page table (a simplified sketch of this adjustment follows the
    commit list below). After that, no mmu-specific code remains in the
    common function, and two parameters can be dropped from set_spte().

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
     
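As a rough illustration of the last commit above, the sketch below adjusts
the permission bits before they are propagated into the shadow-page role,
rather than patching the spte afterwards. The bit names and the helper are
hypothetical simplifications, not the actual set_spte()/shadow-walk code.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Simplified access bits, loosely modeled on x86 pte permissions. */
    #define ACC_WRITE (1u << 1)   /* W: writable        */
    #define ACC_USER  (1u << 2)   /* U: user-accessible */

    /*
     * A supervisor write with CR0.WP=0 is allowed even through a read-only
     * pte.  Adjusting the access *before* it is used to pick the shadow
     * page keeps sp.role.access consistent with the sptes created below it.
     */
    static uint32_t adjust_pte_access(uint32_t pte_access, bool write_fault,
                                      bool user_fault, bool cr0_wp)
    {
            if (write_fault && !user_fault && !cr0_wp) {
                    pte_access |= ACC_WRITE;  /* set W   */
                    pte_access &= ~ACC_USER;  /* clear U */
            }
            return pte_access;
    }

    int main(void)
    {
            uint32_t acc = adjust_pte_access(0,     /* read-only pte    */
                                             true,  /* write fault      */
                                             false, /* from supervisor  */
                                             false  /* CR0.WP clear     */);
            printf("adjusted access: W=%d U=%d\n",
                   !!(acc & ACC_WRITE), !!(acc & ACC_USER));
            return 0;
    }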


08 Jan, 2013

6 commits

  • The MMU code tries to avoid if()s that the hardware is not able to
    predict reliably by using bitwise operations to streamline code
    execution, but in the case of the dirty-bit folding this gains us
    nothing, since write_fault is checked right before the folding code.
    Let's just piggyback onto that if() to make the code clearer (a small
    before/after illustration follows the commit list below).

    Signed-off-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Gleb Natapov
     
  • trace_kvm_mmu_delay_free_pages() is no longer used.

    Signed-off-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Gleb Natapov
     
  • Add a new capability, KVM_CAP_S390_CSS_SUPPORT, which will pass
    intercepts for channel I/O instructions to userspace. Only I/O
    instructions interacting with I/O interrupts need to be handled
    in-kernel:

    - TEST PENDING INTERRUPTION (tpi) dequeues and stores pending
    interrupts entirely in-kernel.
    - TEST SUBCHANNEL (tsch) dequeues pending interrupts in-kernel
    and exits via KVM_EXIT_S390_TSCH to userspace for subchannel-related
    processing.

    (A userspace sketch of enabling this capability via KVM_ENABLE_CAP
    follows the commit list below.)

    Reviewed-by: Marcelo Tosatti
    Reviewed-by: Alexander Graf
    Signed-off-by: Cornelia Huck
    Signed-off-by: Marcelo Tosatti

    Cornelia Huck
     
  • Make s390 support KVM_ENABLE_CAP.

    Reviewed-by: Marcelo Tosatti
    Acked-by: Alexander Graf
    Signed-off-by: Cornelia Huck
    Signed-off-by: Marcelo Tosatti

    Cornelia Huck
     
  • Explicitly catch all channel I/O-related instruction intercepts
    in the kernel and set condition code 3 for them.

    This paves the way for properly handling these instructions later
    on.

    Note: This is not architecture-compliant (the previous code wasn't
    either), since setting cc 3 is not the correct thing to do for some
    of these instructions. For Linux guests, however, it still has the
    intended effect of stopping css probing.

    Reviewed-by: Marcelo Tosatti
    Reviewed-by: Alexander Graf
    Signed-off-by: Cornelia Huck
    Signed-off-by: Marcelo Tosatti

    Cornelia Huck
     
  • Add support for injecting machine checks (only repressible
    conditions for now).

    This is a bit more involved than I/O interrupts, for these reasons:

    - Machine checks come in both floating and cpu varieties.
    - We don't have a bit for enabling machine checks, but have to use
    a roundabout approach: trap PSW-changing instructions and watch for
    machine checks being opened up.

    Reviewed-by: Alexander Graf
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Cornelia Huck
    Signed-off-by: Marcelo Tosatti

    Cornelia Huck
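
Two of the commits above lend themselves to small userspace sketches. First,
the dirty-bit folding change: the branchless form avoids an if() the hardware
might mispredict, but since write_fault is already tested right before, the
plain if() is clearer. The spte layout here is a made-up stand-in, not the
real x86 spte format.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define SPTE_DIRTY_SHIFT 9                   /* illustrative bit position */
    #define SPTE_DIRTY_MASK  (1ULL << SPTE_DIRTY_SHIFT)

    /* Branchless "folding": shift the boolean into place unconditionally. */
    static uint64_t set_dirty_folded(uint64_t spte, bool write_fault)
    {
            return spte | ((uint64_t)write_fault << SPTE_DIRTY_SHIFT);
    }

    /* Clearer form: piggyback on the if (write_fault) test that already exists. */
    static uint64_t set_dirty_plain(uint64_t spte, bool write_fault)
    {
            if (write_fault)
                    spte |= SPTE_DIRTY_MASK;
            return spte;
    }

    int main(void)
    {
            printf("folded: %#llx, plain: %#llx\n",
                   (unsigned long long)set_dirty_folded(0, true),
                   (unsigned long long)set_dirty_plain(0, true));
            return 0;
    }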
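
Second, the KVM_ENABLE_CAP plumbing for KVM_CAP_S390_CSS_SUPPORT: a userspace
VMM on an s390 host might enable the capability roughly as below. This is a
fragment, assuming kvm_fd comes from opening /dev/kvm and vcpu_fd from
KVM_CREATE_VCPU, and that the installed linux/kvm.h defines the capability;
error handling is minimal.

    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /*
     * Enable in-kernel channel-subsystem support.  Returns the ioctl result,
     * or -1 if the host kernel does not know the capability at all.
     */
    int enable_css_support(int kvm_fd, int vcpu_fd)
    {
            struct kvm_enable_cap cap;

            if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_S390_CSS_SUPPORT) <= 0) {
                    fprintf(stderr, "KVM_CAP_S390_CSS_SUPPORT not available\n");
                    return -1;
            }

            memset(&cap, 0, sizeof(cap));
            cap.cap = KVM_CAP_S390_CSS_SUPPORT;

            /* Enabled per vcpu, but (per the commit) it affects the whole VM. */
            return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
    }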