12 Sep, 2020

2 commits

  • when kmalloc() fails in kvm_io_bus_unregister_dev(), before removing
    the bus, we should iterate over all other devices linked to it and call
    kvm_iodevice_destructor() for them

    Fixes: 90db10434b16 ("KVM: kvm_io_bus_unregister_dev() should never fail")
    Cc: stable@vger.kernel.org
    Reported-and-tested-by: syzbot+f196caa45793d6374707@syzkaller.appspotmail.com
    Link: https://syzkaller.appspot.com/bug?extid=f196caa45793d6374707
    Signed-off-by: Rustam Kovhaev
    Reviewed-by: Vitaly Kuznetsov
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Rustam Kovhaev
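
    A minimal sketch of the fallback described above, assuming the usual
    struct kvm_io_bus/kvm_io_range layout from virt/kvm; the helper name is
    invented for illustration and is not taken from the patch:

    #include <linux/kvm_host.h>
    #include <kvm/iodev.h>

    /* If shrinking the bus fails, at least run the destructors of the
     * devices that remain attached to the old bus. */
    static void kvm_io_bus_destroy_devs(struct kvm_io_bus *bus,
                                        struct kvm_io_device *skip)
    {
            int i;

            for (i = 0; i < bus->dev_count; i++) {
                    struct kvm_io_device *dev = bus->range[i].dev;

                    if (dev == skip)        /* the device being unregistered */
                            continue;
                    kvm_iodevice_destructor(dev);
            }
    }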
     
  • …kvmarm/kvmarm into HEAD

    KVM/arm64 fixes for Linux 5.9, take #1

    - Multiple stolen time fixes, with a new capability to match x86
    - Fix for hugetlbfs mappings when PUD and PMD are the same level
    - Fix for hugetlbfs mappings when PTE mappings are enforced
    (dirty logging, for example)
    - Fix tracing output of 64bit values

    Paolo Bonzini
     

22 Aug, 2020

1 commit

  • The 'flags' field of 'struct mmu_notifier_range' is used to indicate
    whether invalidate_range_{start,end}() are permitted to block. In the
    case of kvm_mmu_notifier_invalidate_range_start(), this field is not
    forwarded on to the architecture-specific implementation of
    kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
    whether or not to block.

    Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
    architectures are aware as to whether or not they are permitted to block.

    Cc:
    Cc: Marc Zyngier
    Cc: Suzuki K Poulose
    Cc: James Morse
    Signed-off-by: Will Deacon
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Will Deacon
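
    A sketch of what the extra parameter enables; MMU_NOTIFIER_RANGE_BLOCKABLE
    is the existing notifier flag, while kvm_unmap_range_blocking()/_atomic()
    are placeholder names, not real functions:

    int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start,
                            unsigned long end, unsigned int flags)
    {
            bool may_block = flags & MMU_NOTIFIER_RANGE_BLOCKABLE;

            /* The backend can now honour the notifier's blockable hint. */
            return may_block ? kvm_unmap_range_blocking(kvm, start, end)
                             : kvm_unmap_range_atomic(kvm, start, end);
    }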
     

13 Aug, 2020

1 commit

  • After the cleanup of page fault accounting, gup does not need to pass
    task_struct around any more. Remove that parameter in the whole gup
    stack.

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: John Hubbard
    Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
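
    A call-site sketch of the result (grab_one_page() is a made-up helper;
    the caller is assumed to hold mmap_lock, as get_user_pages_remote()
    requires):

    static struct page *grab_one_page(struct mm_struct *mm, unsigned long addr)
    {
            struct page *page;
            long npages;

            /* Before: get_user_pages_remote(tsk, mm, addr, 1, ...) */
            npages = get_user_pages_remote(mm, addr, 1, FOLL_WRITE,
                                           &page, NULL, NULL);
            return npages == 1 ? page : NULL;
    }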
     

09 Jul, 2020

1 commit

  • An OVMF-booted guest running on shadow pages crashes with a TRIPLE FAULT after
    enabling paging from SMM. The crash is triggered from mmu_check_root() and
    is caused by kvm_is_visible_gfn() searching through memslots with as_id = 0
    while vCPU may be in a different context (address space).

    Introduce kvm_vcpu_is_visible_gfn() and use it from mmu_check_root().

    Signed-off-by: Vitaly Kuznetsov
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Vitaly Kuznetsov
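
    A sketch of the new helper, assuming it looks the slot up through the
    vCPU's own address space and reuses the existing visibility check:

    bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
    {
            struct kvm_memory_slot *memslot =
                    kvm_vcpu_gfn_to_memslot(vcpu, gfn);

            return kvm_is_visible_memslot(memslot);
    }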
     

13 Jun, 2020

1 commit

  • Pull more KVM updates from Paolo Bonzini:
    "The guest side of the asynchronous page fault work has been delayed to
    5.9 in order to sync with Thomas's interrupt entry rework, but here's
    the rest of the KVM updates for this merge window.

    MIPS:
    - Loongson port

    PPC:
    - Fixes

    ARM:
    - Fixes

    x86:
    - KVM_SET_USER_MEMORY_REGION optimizations
    - Fixes
    - Selftest fixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (62 commits)
    KVM: x86: do not pass poisoned hva to __kvm_set_memory_region
    KVM: selftests: fix sync_with_host() in smm_test
    KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected
    KVM: async_pf: Cleanup kvm_setup_async_pf()
    kvm: i8254: remove redundant assignment to pointer s
    KVM: x86: respect singlestep when emulating instruction
    KVM: selftests: Don't probe KVM_CAP_HYPERV_ENLIGHTENED_VMCS when nested VMX is unsupported
    KVM: selftests: do not substitute SVM/VMX check with KVM_CAP_NESTED_STATE check
    KVM: nVMX: Consult only the "basic" exit reason when routing nested exit
    KVM: arm64: Move hyp_symbol_addr() to kvm_asm.h
    KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
    KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts
    KVM: arm64: Remove host_cpu_context member from vcpu structure
    KVM: arm64: Stop sparse from moaning at __hyp_this_cpu_ptr
    KVM: arm64: Handle PtrAuth traps early
    KVM: x86: Unexport x86_fpu_cache and make it static
    KVM: selftests: Ignore KVM 5-level paging support for VM_MODE_PXXV48_4K
    KVM: arm64: Save the host's PtrAuth keys in non-preemptible context
    KVM: arm64: Stop save/restoring ACTLR_EL1
    KVM: arm64: Add emulation for 32bit guests accessing ACTLR2
    ...

    Linus Torvalds
     

10 Jun, 2020

2 commits

  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
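
    The effect of the rule on a typical call site, as a before/after sketch:

    /* Before */
    down_read(&mm->mmap_sem);
    vma = find_vma(mm, addr);
    up_read(&mm->mmap_sem);

    /* After */
    mmap_read_lock(mm);
    vma = find_vma(mm, addr);
    mmap_read_unlock(mm);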
     
  • Patch series "mm: consolidate definitions of page table accessors", v2.

    The low level page table accessors (pXY_index(), pXY_offset()) are
    duplicated across all architectures and sometimes more than once. For
    instance, we have 31 definition of pgd_offset() for 25 supported
    architectures.

    Most of these definitions are actually identical and typically it boils
    down to, e.g.

    static inline unsigned long pmd_index(unsigned long address)
    {
            return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }

    static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
    {
            return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
    }

    These definitions can be shared among 90% of the arches provided
    XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

    For architectures that really need a custom version there is always
    possibility to override the generic version with the usual ifdefs magic.

    These patches introduce include/linux/pgtable.h that replaces
    include/asm-generic/pgtable.h and add the definitions of the page table
    accessors to the new header.

    This patch (of 12):

    The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
    functions involving page table manipulations, e.g. pte_alloc() and
    pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
    in the files that include <linux/mm.h>.

    The include statements in such cases are removed with a simple loop:

    for f in $(git grep -l "include <asm/pgtable.h>") ; do
    sed -i -e '/include <asm\/pgtable.h>/ d' $f
    done

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mike Rapoport
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
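
    The "usual ifdefs magic" mentioned above typically looks like the
    following: the generic header only provides a definition when the
    architecture has not already supplied one (a sketch, not a quote from
    the actual include/linux/pgtable.h):

    #ifndef pmd_index
    static inline unsigned long pmd_index(unsigned long address)
    {
            return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }
    #define pmd_index pmd_index
    #endif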
     

09 Jun, 2020

1 commit

  • The API __get_user_pages_fast() is renamed to get_user_pages_fast_only()
    to align with pin_user_pages_fast_only().

    As part of this we get rid of the write parameter; instead, callers
    pass FOLL_WRITE to get_user_pages_fast_only(). This does not change any
    existing functionality of the API.

    All the callers are changed to pass FOLL_WRITE.

    Also introduce get_user_page_fast_only(), and use it in a few places
    that hard-code nr_pages to 1.

    Updated the documentation of the API.

    Signed-off-by: Souptick Joarder
    Signed-off-by: Andrew Morton
    Reviewed-by: John Hubbard
    Reviewed-by: Paul Mackerras [arch/powerpc/kvm]
    Cc: Matthew Wilcox
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Mark Rutland
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paolo Bonzini
    Cc: Stephen Rothwell
    Cc: Mike Rapoport
    Cc: Aneesh Kumar K.V
    Cc: Michal Suchanek
    Link: http://lkml.kernel.org/r/1590396812-31277-1-git-send-email-jrdr.linux@gmail.com
    Signed-off-by: Linus Torvalds

    Souptick Joarder
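
    A caller-side sketch: the old boolean write argument becomes a FOLL_WRITE
    gup flag, and the single-page wrapper covers the nr_pages == 1 case
    (probe_writable() is a made-up name for illustration):

    static bool probe_writable(unsigned long addr)
    {
            struct page *page;

            /* Before: __get_user_pages_fast(addr, 1, 1, &page) == 1 */
            if (get_user_page_fast_only(addr, FOLL_WRITE, &page)) {
                    put_page(page);         /* drop the reference again */
                    return true;
            }
            return false;
    }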
     

08 Jun, 2020

1 commit

  • Commit b1394e745b94 ("KVM: x86: fix APIC page invalidation") tried
    to fix inappropriate APIC page invalidation by re-introducing arch
    specific kvm_arch_mmu_notifier_invalidate_range() and calling it from
    kvm_mmu_notifier_invalidate_range_start. However, the patch left a
    possible race where the VMCS APIC address cache is updated *before*
    it is unmapped:

    (Invalidator) kvm_mmu_notifier_invalidate_range_start()
    (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD)
    (KVM VCPU) vcpu_enter_guest()
    (KVM VCPU) kvm_vcpu_reload_apic_access_page()
    (Invalidator) actually unmap page

    Because of the above race, there can be a mismatch between the
    host physical address stored in the APIC_ACCESS_PAGE VMCS field and
    the host physical address stored in the EPT entry for the APIC GPA
    (0xfee00000). When this happens, the processor will not trap APIC
    accesses, and will instead show the raw contents of the APIC-access page.
    Because Windows OS periodically checks for unexpected modifications to
    the LAPIC register, this will show up as a BSOD crash with BugCheck
    CRITICAL_STRUCTURE_CORRUPTION (109), which we are currently seeing in
    https://bugzilla.redhat.com/show_bug.cgi?id=1751017.

    The root cause of the issue is that kvm_arch_mmu_notifier_invalidate_range()
    cannot guarantee that no additional references are taken to the pages in
    the range before kvm_mmu_notifier_invalidate_range_end(). Fortunately,
    this case is supported by the MMU notifier API, as documented in
    include/linux/mmu_notifier.h:

    * If the subsystem
    * can't guarantee that no additional references are taken to
    * the pages in the range, it has to implement the
    * invalidate_range() notifier to remove any references taken
    * after invalidate_range_start().

    The fix therefore is to reload the APIC-access page field in the VMCS
    from kvm_mmu_notifier_invalidate_range() instead of ..._range_start().

    Cc: stable@vger.kernel.org
    Fixes: b1394e745b94 ("KVM: x86: fix APIC page invalidation")
    Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=197951
    Signed-off-by: Eiichi Tsukata
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Eiichi Tsukata
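
    A sketch of the shape of the fix, assuming the existing
    KVM_REQ_APIC_PAGE_RELOAD machinery; reloading now happens from the
    invalidate_range() hook, i.e. after the page is actually unmapped:

    void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
                                                unsigned long start,
                                                unsigned long end)
    {
            unsigned long apic_address;

            apic_address = gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
            if (start <= apic_address && apic_address < end)
                    kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
    }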
     

04 Jun, 2020

1 commit

  • After commit 63d0434 ("KVM: x86: move kvm_create_vcpu_debugfs after
    last failure point") we are creating the per-vCPU debugfs files
    after the creation of the vCPU file descriptor. This makes it
    possible for userspace to reach kvm_vcpu_release before
    kvm_create_vcpu_debugfs has finished. The vcpu->debugfs_dentry
    then does not have any associated inode anymore, and this causes
    a NULL-pointer dereference in debugfs_create_file.

    The solution is simply to avoid removing the files; they are
    cleaned up when the VM file descriptor is closed (and that must be
    after KVM_CREATE_VCPU returns). We can stop storing the dentry
    in struct kvm_vcpu too, because it is not needed anywhere after
    kvm_create_vcpu_debugfs returns.

    Reported-by: syzbot+705f4401d5a93a59b87d@syzkaller.appspotmail.com
    Fixes: 63d04348371b ("KVM: x86: move kvm_create_vcpu_debugfs after last failure point")
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

16 May, 2020

4 commits

  • Fix spelling and typos (e.g., repeated words) in comments.

    Signed-off-by: Fuad Tabba
    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20200401140310.29701-1-tabba@google.com

    Fuad Tabba
     
  • Two new stats for exposing halt-polling cpu usage:
    halt_poll_success_ns
    halt_poll_fail_ns

    Thus sum of these 2 stats is the total cpu time spent polling. "success"
    means the VCPU polled until a virtual interrupt was delivered. "fail"
    means the VCPU had to schedule out (either because the maximum poll time
    was reached or it needed to yield the CPU).

    To avoid touching every arch's kvm_vcpu_stat struct, only update and
    export halt-polling cpu usage stats if we're on x86.

    Exporting cpu usage as a u64 and in nanoseconds means we will overflow at
    ~500 years, which seems reasonably large.

    Signed-off-by: David Matlack
    Signed-off-by: Jon Cargille
    Reviewed-by: Jim Mattson

    Message-Id:
    Signed-off-by: Paolo Bonzini

    David Matlack
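
    A sketch of the accounting, assuming a small helper around the polling
    loop in kvm_vcpu_block() (the helper name is illustrative; the stat
    fields are the two listed above):

    static void update_halt_poll_stats(struct kvm_vcpu *vcpu, ktime_t start,
                                       ktime_t end, bool success)
    {
            u64 poll_ns = ktime_to_ns(ktime_sub(end, start));

            if (success)
                    vcpu->stat.halt_poll_success_ns += poll_ns;
            else
                    vcpu->stat.halt_poll_fail_ns += poll_ns;
    }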
     
  • While optimizing posted-interrupt delivery especially for the timer
    fastpath scenario, I measured kvm_x86_ops.deliver_posted_interrupt()
    to introduce substantial latency because the processor has to perform
    all vmentry tasks, ack the posted interrupt notification vector,
    read the posted-interrupt descriptor etc.

    This is not only slow, it is also unnecessary when delivering an
    interrupt to the current CPU (as is the case for the LAPIC timer) because
    PIR->IRR and IRR->RVI synchronization is already performed on vmentry.
    Therefore, skip kvm_vcpu_trigger_posted_interrupt() in this case, and
    instead do vmx_sync_pir_to_irr() on the EXIT_FASTPATH_REENTER_GUEST
    fastpath as well.

    Tested-by: Haiwei Li
    Cc: Haiwei Li
    Suggested-by: Paolo Bonzini
    Signed-off-by: Wanpeng Li
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Wanpeng Li
     
  • hva_to_pfn_remapped() calls fixup_user_fault(), which has already
    handled the retry gracefully. Even if "unlocked" is set to true, it
    means that we've got a VM_FAULT_RETRY inside fixup_user_fault(),
    however the page fault has already retried and we should have the pfn
    set correctly. No need to do that again.

    Signed-off-by: Peter Xu
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Peter Xu
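
    The call in hva_to_pfn_remapped(), sketched as a fragment with the
    post-cleanup fixup_user_fault() signature (the task_struct parameter
    was dropped by the gup change listed above under 13 Aug):

    bool unlocked = false;
    int r;

    r = fixup_user_fault(current->mm, addr,
                         write_fault ? FAULT_FLAG_WRITE : 0, &unlocked);
    if (r)
            return r;

    /*
     * 'unlocked' only means mmap_lock was dropped and re-taken while the
     * fault was retried inside fixup_user_fault(); the pfn lookup that
     * follows remains valid, so no extra retry is needed here.
     */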
     

14 May, 2020

2 commits

  • The use of any sort of waitqueue (simple or regular) for
    wait/waking vcpus has always been an overkill and semantically
    wrong. Because this is per-vcpu (which is blocked) there is
    only ever a single waiting vcpu, thus no need for any sort of
    queue.

    As such, make use of the rcuwait primitive, with the following
    considerations:

    - rcuwait already provides the proper barriers that serialize
    concurrent waiter and waker.

    - Task wakeup is done in rcu read critical region, with a
    stable task pointer.

    - Because there is no concurrency among waiters, we need
    not worry about rcuwait_wait_event() calls corrupting
    the wait->task. As a consequence, this saves the locking
    done in swait when modifying the queue. This also applies
    to per-vcore wait for powerpc kvm-hv.

    The x86 tscdeadline_latency test mentioned in 8577370fb0cb
    ("KVM: Use simple waitqueue for vcpu->wq") shows that, on avg,
    latency is reduced by around 15-20% with this change.

    Cc: Paul Mackerras
    Cc: kvmarm@lists.cs.columbia.edu
    Cc: linux-mips@vger.kernel.org
    Reviewed-by: Marc Zyngier
    Signed-off-by: Davidlohr Bueso
    Message-Id:
    [Avoid extra logic changes. - Paolo]
    Signed-off-by: Paolo Bonzini

    Davidlohr Bueso
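
    The rcuwait pattern for the single-waiter case, sketched
    (vcpu_has_work() is a placeholder condition; rcuwait_init(),
    rcuwait_wait_event() and rcuwait_wake_up() are the primitives from
    linux/rcuwait.h):

    #include <linux/rcuwait.h>

    struct rcuwait vcpu_wait;           /* one per vCPU, no queue needed */

    rcuwait_init(&vcpu_wait);

    /* vCPU side: the only possible waiter blocks until it has work. */
    rcuwait_wait_event(&vcpu_wait, vcpu_has_work(vcpu), TASK_INTERRUPTIBLE);

    /* Waker side: resolves the task pointer under RCU and wakes it. */
    rcuwait_wake_up(&vcpu_wait);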
     
  • Paolo Bonzini
     

25 Apr, 2020

1 commit

  • KVM_CAP_HALT_POLL is a per-VM capability that lets userspace
    control the halt-polling time, allowing halt-polling to be tuned or
    disabled on particular VMs.

    With dynamic halt-polling, a VM's VCPUs can poll from anywhere from
    [0, halt_poll_ns] on each halt. KVM_CAP_HALT_POLL sets the
    upper limit on the poll time.

    Signed-off-by: David Matlack
    Signed-off-by: Jon Cargille
    Reviewed-by: Jim Mattson
    Message-Id:
    Signed-off-by: Paolo Bonzini

    David Matlack
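
    From userspace the cap is a per-VM KVM_ENABLE_CAP whose first argument
    is the maximum poll time in nanoseconds (a sketch; vm_fd is assumed to
    be the fd returned by KVM_CREATE_VM, and 0 disables halt-polling):

    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    struct kvm_enable_cap cap = {
            .cap = KVM_CAP_HALT_POLL,
            .args[0] = 100000,      /* cap halt-polling at 100 us for this VM */
    };

    if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
            perror("KVM_ENABLE_CAP(KVM_CAP_HALT_POLL)");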
     

21 Apr, 2020

4 commits

  • When a nested page fault is taken from an address that does not have
    a memslot associated to it, kvm_mmu_do_page_fault returns RET_PF_EMULATE
    (via mmu_set_spte) and kvm_mmu_page_fault then invokes svm_need_emulation_on_page_fault.

    The default answer there is to return false, but in this case this just
    causes the page fault to be retried ad libitum. Since this is not a
    fast path, and the only other case where it is taken is an erratum,
    just stick a kvm_vcpu_gfn_to_memslot check in there to detect the
    common case where the erratum is not happening.

    This fixes an infinite loop in the new set_memory_region_test.

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • In earlier versions of kvm, 'kvm_run' was an independent structure
    and was not included in the vcpu structure. At present, 'kvm_run'
    is already included in the vcpu structure, so the parameter
    'kvm_run' is redundant.

    This patch simplifies the function definition, removes the extra
    'kvm_run' parameter, and extracts it from the 'kvm_vcpu' structure
    if necessary.

    Signed-off-by: Tianjia Zhang
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Tianjia Zhang
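
    The shape of the simplification, sketched on a made-up arch helper
    (the concrete functions differ per architecture):

    /* Before: static int handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run); */
    static int handle_mmio(struct kvm_vcpu *vcpu)
    {
            struct kvm_run *run = vcpu->run;    /* extracted where needed */

            run->exit_reason = KVM_EXIT_MMIO;
            return 0;
    }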
     
  • Create a new function kvm_is_visible_memslot() and use it from
    kvm_is_visible_gfn(); use the new function in try_async_pf() too,
    to avoid an extra memslot lookup.

    Opportunistically squish a multi-line comment into a single-line comment.

    Note, the end result, KVM_PFN_NOSLOT, is unchanged.

    Cc: Jim Mattson
    Cc: Rick Edgecombe
    Suggested-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
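
    A sketch of the extracted helper, assuming the same visibility rules
    kvm_is_visible_gfn() already applied (a user-controlled slot that is
    not marked invalid):

    bool kvm_is_visible_memslot(struct kvm_memory_slot *memslot)
    {
            return memslot && memslot->id < KVM_USER_MEM_SLOTS &&
                   !(memslot->flags & KVM_MEMSLOT_INVALID);
    }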
     
  • The placement of kvm_create_vcpu_debugfs is more or less irrelevant, since
    it cannot fail and userspace should not care about the debugfs entries until
    it knows the vcpu has been created. Moving it after the last failure
    point removes the need to remove the directory when unwinding the creation.

    Reviewed-by: Emanuele Giuseppe Esposito
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

31 Mar, 2020

1 commit

  • Pass @opaque to kvm_arch_hardware_setup() and
    kvm_arch_check_processor_compat() to allow architecture specific code to
    reference @opaque without having to stash it away in a temporary global
    variable. This will enable x86 to separate its vendor specific callback
    ops, which are passed via @opaque, into "init" and "runtime" ops without
    having to stash away the "init" ops.

    No functional change intended.

    Reviewed-by: Cornelia Huck
    Tested-by: Cornelia Huck #s390
    Acked-by: Marc Zyngier
    Signed-off-by: Sean Christopherson
    Message-Id:
    Reviewed-by: Vitaly Kuznetsov
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     

26 Mar, 2020

1 commit

  • Reset the LRU slot if it becomes invalid when deleting a memslot to fix
    an out-of-bounds/use-after-free access when searching through memslots.

    Explicitly check for there being no used slots in search_memslots(), and
    in the caller of s390's approximation variant.

    Fixes: 36947254e5f9 ("KVM: Dynamically size memslot array based on number of used slots")
    Reported-by: Qian Cai
    Cc: Peter Xu
    Signed-off-by: Sean Christopherson
    Message-Id:
    Acked-by: Christian Borntraeger
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
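
    The explicit guard in search_memslots(), sketched:

    /* Bail out before touching the array when no slots are in use. */
    if (!slots->used_slots)
            return NULL;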
     

17 Mar, 2020

7 commits

  • Drop largepages_enabled, kvm_largepages_enabled() and
    kvm_disable_largepages() now that all users are gone.

    Note, largepages_enabled was an x86-only flag that got left in common
    KVM code when KVM gained support for multiple architectures.

    No functional change intended.

    Reviewed-by: Vitaly Kuznetsov
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • It's never used anywhere now.

    Signed-off-by: Peter Xu
    Signed-off-by: Paolo Bonzini

    Peter Xu
     
  • It could take kvm->mmu_lock for an extended period of time when
    enabling dirty log for the first time. The main cost is to clear
    all the D-bits of last level SPTEs. This situation can benefit from
    manual dirty log protect as well, which can reduce the mmu_lock
    time taken. The sequence is like this:

    1. Initialize all the bits of the dirty bitmap to 1 when enabling
    dirty log for the first time
    2. Only write protect the huge pages
    3. KVM_GET_DIRTY_LOG returns the dirty bitmap info
    4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
    SPTEs gradually in small chunks

    Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment,
    I did some tests with a 128G Windows VM and counted the time taken
    by memory_global_dirty_log_start; here are the numbers:

    VM Size    Before    After optimization
    128G       460ms     10ms

    Signed-off-by: Jay Zhou
    Signed-off-by: Paolo Bonzini

    Jay Zhou
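
    The userspace side of this sequence, sketched (vm_fd is the VM file
    descriptor; the cap and flag names are the existing manual dirty-log
    protect interface):

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    struct kvm_enable_cap cap = {
            .cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
            .args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
                       KVM_DIRTY_LOG_INITIALLY_SET,
    };

    ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
    /* Dirty bitmaps now start out all-ones; KVM_CLEAR_DIRTY_LOG then
     * write-protects / clears D-bits in small chunks as described above. */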
     
  • Now that the memslot logic doesn't assume memslots are always non-NULL,
    dynamically size the array of memslots instead of unconditionally
    allocating memory for the maximum number of memslots.

    Note, because a to-be-deleted memslot must first be invalidated, the
    array size cannot be immediately reduced when deleting a memslot.
    However, consecutive deletions will realize the memory savings, i.e.
    a second deletion will trim the entry.

    Tested-by: Christoffer Dall
    Tested-by: Marc Zyngier
    Reviewed-by: Peter Xu
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Refactor memslot handling to treat the number of used slots as the de
    facto size of the memslot array, e.g. return NULL from id_to_memslot()
    when an invalid index is provided instead of relying on npages==0 to
    detect an invalid memslot. Rework the sorting and walking of memslots
    in advance of dynamically sizing memslots to aid bisection and debug,
    e.g. with luck, a bug in the refactoring will bisect here and/or hit a
    WARN instead of randomly corrupting memory.

    Alternatively, a global null/invalid memslot could be returned, i.e. so
    callers of id_to_memslot() don't have to explicitly check for a NULL
    memslot, but that approach runs the risk of introducing difficult-to-
    debug issues, e.g. if the global null slot is modified. Constifying
    the return from id_to_memslot() to combat such issues is possible, but
    would require a massive refactoring of arch specific code and would
    still be susceptible to casting shenanigans.

    Add function comments to update_memslots() and search_memslots() to
    explicitly (and loudly) state how memslots are sorted.

    Opportunistically stuff @hva with a non-canonical value when deleting a
    private memslot on x86 to detect bogus usage of the freed slot.

    No functional change intended.

    Tested-by: Christoffer Dall
    Tested-by: Marc Zyngier
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Rework kvm_get_dirty_log() so that it "returns" the associated memslot
    on success. A future patch will rework memslot handling such that
    id_to_memslot() can return NULL, returning the memslot makes it more
    obvious that the validity of the memslot has been verified, i.e.
    precludes the need to add validity checks in the arch code that are
    technically unnecessary.

    To maintain ordering in s390, move the call to kvm_arch_sync_dirty_log()
    from s390's kvm_vm_ioctl_get_dirty_log() to the new kvm_get_dirty_log().
    This is a nop for PPC, the only other arch that doesn't select
    KVM_GENERIC_DIRTYLOG_READ_PROTECT, as its sync_dirty_log() is empty.

    Ideally, moving the sync_dirty_log() call would be done in a separate
    patch, but it can't be done in a follow-on patch because that would
    temporarily break s390's ordering. Making the move in a preparatory
    patch would be functionally correct, but would create an odd scenario
    where the moved sync_dirty_log() would operate on a "different" memslot
    due to consuming the result of a different id_to_memslot(). The
    memslot couldn't actually be different as slots_lock is held, but the
    code is confusing enough as it is, i.e. moving sync_dirty_log() in this
    patch is the lesser of all evils.

    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Move the implementations of KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG
    for CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT into common KVM code.
    The arch specific implementations are extremely similar, differing
    only in whether the dirty log needs to be sync'd from hardware (x86)
    and how the TLBs are flushed. Add new arch hooks to handle sync
    and TLB flush; the sync will also be used for non-generic dirty log
    support in a future patch (s390).

    The ulterior motive for providing a common implementation is to
    eliminate the dependency between arch and common code with respect to
    the memslot referenced by the dirty log, i.e. to make it obvious in the
    code that the validity of the memslot is guaranteed, as a future patch
    will rework memslot handling such that id_to_memslot() can return NULL.

    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson