12 Sep, 2020

2 commits

  • when kmalloc() fails in kvm_io_bus_unregister_dev(), before removing
    the bus, we should iterate over all other devices linked to it and call
    kvm_iodevice_destructor() for them

    Fixes: 90db10434b16 ("KVM: kvm_io_bus_unregister_dev() should never fail")
    Cc: stable@vger.kernel.org
    Reported-and-tested-by: syzbot+f196caa45793d6374707@syzkaller.appspotmail.com
    Link: https://syzkaller.appspot.com/bug?extid=f196caa45793d6374707
    Signed-off-by: Rustam Kovhaev
    Reviewed-by: Vitaly Kuznetsov
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Rustam Kovhaev
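
    A minimal sketch of the fallback described above, assuming the usual
    struct kvm_io_bus/kvm_io_range layout from virt/kvm; the helper name is
    invented for illustration and is not taken from the patch:

    #include <linux/kvm_host.h>
    #include <kvm/iodev.h>

    /* If shrinking the bus fails, at least run the destructors of the
     * devices that remain attached to the old bus. */
    static void kvm_io_bus_destroy_devs(struct kvm_io_bus *bus,
                                        struct kvm_io_device *skip)
    {
            int i;

            for (i = 0; i < bus->dev_count; i++) {
                    struct kvm_io_device *dev = bus->range[i].dev;

                    if (dev == skip)        /* the device being unregistered */
                            continue;
                    kvm_iodevice_destructor(dev);
            }
    }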
     
  • …kvmarm/kvmarm into HEAD

    KVM/arm64 fixes for Linux 5.9, take #1

    - Multiple stolen time fixes, with a new capability to match x86
    - Fix for hugetlbfs mappings when PUD and PMD are the same level
    - Fix for hugetlbfs mappings when PTE mappings are enforced
    (dirty logging, for example)
    - Fix tracing output of 64bit values

    Paolo Bonzini
     

22 Aug, 2020

1 commit

  • The 'flags' field of 'struct mmu_notifier_range' is used to indicate
    whether invalidate_range_{start,end}() are permitted to block. In the
    case of kvm_mmu_notifier_invalidate_range_start(), this field is not
    forwarded on to the architecture-specific implementation of
    kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
    whether or not to block.

    Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
    architectures are aware as to whether or not they are permitted to block.

    Cc:
    Cc: Marc Zyngier
    Cc: Suzuki K Poulose
    Cc: James Morse
    Signed-off-by: Will Deacon
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Will Deacon
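
    A sketch of what the extra parameter enables; MMU_NOTIFIER_RANGE_BLOCKABLE
    is the existing notifier flag, while kvm_unmap_range_blocking()/_atomic()
    are placeholder names, not real functions:

    int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start,
                            unsigned long end, unsigned int flags)
    {
            bool may_block = flags & MMU_NOTIFIER_RANGE_BLOCKABLE;

            /* The backend can now honour the notifier's blockable hint. */
            return may_block ? kvm_unmap_range_blocking(kvm, start, end)
                             : kvm_unmap_range_atomic(kvm, start, end);
    }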
     

13 Aug, 2020

1 commit

  • After the cleanup of page fault accounting, gup does not need to pass
    task_struct around any more. Remove that parameter in the whole gup
    stack.

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: John Hubbard
    Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
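
    A call-site sketch of the result (grab_one_page() is a made-up helper;
    the caller is assumed to hold mmap_lock, as get_user_pages_remote()
    requires):

    static struct page *grab_one_page(struct mm_struct *mm, unsigned long addr)
    {
            struct page *page;
            long npages;

            /* Before: get_user_pages_remote(tsk, mm, addr, 1, ...) */
            npages = get_user_pages_remote(mm, addr, 1, FOLL_WRITE,
                                           &page, NULL, NULL);
            return npages == 1 ? page : NULL;
    }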
     

09 Jul, 2020

1 commit

  • An OVMF-booted guest running on shadow pages crashes with a TRIPLE FAULT after
    enabling paging from SMM. The crash is triggered from mmu_check_root() and
    is caused by kvm_is_visible_gfn() searching through memslots with as_id = 0
    while vCPU may be in a different context (address space).

    Introduce kvm_vcpu_is_visible_gfn() and use it from mmu_check_root().

    Signed-off-by: Vitaly Kuznetsov
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Vitaly Kuznetsov
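
    A sketch of the new helper, assuming it looks the slot up through the
    vCPU's own address space and reuses the existing visibility check:

    bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
    {
            struct kvm_memory_slot *memslot =
                    kvm_vcpu_gfn_to_memslot(vcpu, gfn);

            return kvm_is_visible_memslot(memslot);
    }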
     

13 Jun, 2020

1 commit

  • Pull more KVM updates from Paolo Bonzini:
    "The guest side of the asynchronous page fault work has been delayed to
    5.9 in order to sync with Thomas's interrupt entry rework, but here's
    the rest of the KVM updates for this merge window.

    MIPS:
    - Loongson port

    PPC:
    - Fixes

    ARM:
    - Fixes

    x86:
    - KVM_SET_USER_MEMORY_REGION optimizations
    - Fixes
    - Selftest fixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (62 commits)
    KVM: x86: do not pass poisoned hva to __kvm_set_memory_region
    KVM: selftests: fix sync_with_host() in smm_test
    KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected
    KVM: async_pf: Cleanup kvm_setup_async_pf()
    kvm: i8254: remove redundant assignment to pointer s
    KVM: x86: respect singlestep when emulating instruction
    KVM: selftests: Don't probe KVM_CAP_HYPERV_ENLIGHTENED_VMCS when nested VMX is unsupported
    KVM: selftests: do not substitute SVM/VMX check with KVM_CAP_NESTED_STATE check
    KVM: nVMX: Consult only the "basic" exit reason when routing nested exit
    KVM: arm64: Move hyp_symbol_addr() to kvm_asm.h
    KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
    KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts
    KVM: arm64: Remove host_cpu_context member from vcpu structure
    KVM: arm64: Stop sparse from moaning at __hyp_this_cpu_ptr
    KVM: arm64: Handle PtrAuth traps early
    KVM: x86: Unexport x86_fpu_cache and make it static
    KVM: selftests: Ignore KVM 5-level paging support for VM_MODE_PXXV48_4K
    KVM: arm64: Save the host's PtrAuth keys in non-preemptible context
    KVM: arm64: Stop save/restoring ACTLR_EL1
    KVM: arm64: Add emulation for 32bit guests accessing ACTLR2
    ...

    Linus Torvalds
     

10 Jun, 2020

2 commits

  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
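
    The effect of the rule on a typical call site, as a before/after sketch:

    /* Before */
    down_read(&mm->mmap_sem);
    vma = find_vma(mm, addr);
    up_read(&mm->mmap_sem);

    /* After */
    mmap_read_lock(mm);
    vma = find_vma(mm, addr);
    mmap_read_unlock(mm);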
     
  • Patch series "mm: consolidate definitions of page table accessors", v2.

    The low level page table accessors (pXY_index(), pXY_offset()) are
    duplicated across all architectures and sometimes more than once. For
    instance, we have 31 definition of pgd_offset() for 25 supported
    architectures.

    Most of these definitions are actually identical and typically it boils
    down to, e.g.

    static inline unsigned long pmd_index(unsigned long address)
    {
            return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }

    static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
    {
            return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
    }

    These definitions can be shared among 90% of the arches provided
    XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

    For architectures that really need a custom version there is always
    possibility to override the generic version with the usual ifdefs magic.

    These patches introduce include/linux/pgtable.h that replaces
    include/asm-generic/pgtable.h and add the definitions of the page table
    accessors to the new header.

    This patch (of 12):

    The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
    functions involving page table manipulations, e.g. pte_alloc() and
    pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
    in the files that include <linux/mm.h>.

    The include statements in such cases are removed with a simple loop:

    for f in $(git grep -l "include <asm/pgtable.h>") ; do
    sed -i -e '/include <asm\/pgtable.h>/ d' $f
    done

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mike Rapoport
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
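
    The "usual ifdefs magic" mentioned above typically looks like the
    following: the generic header only provides a definition when the
    architecture has not already supplied one (a sketch, not a quote from
    the actual include/linux/pgtable.h):

    #ifndef pmd_index
    static inline unsigned long pmd_index(unsigned long address)
    {
            return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }
    #define pmd_index pmd_index
    #endif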
     

09 Jun, 2020

1 commit

  • The API __get_user_pages_fast() is renamed to get_user_pages_fast_only()
    to align with pin_user_pages_fast_only().

    As part of this we get rid of the write parameter; instead, callers
    pass FOLL_WRITE to get_user_pages_fast_only(). This does not change any
    existing functionality of the API.

    All the callers are changed to pass FOLL_WRITE.

    Also introduce get_user_page_fast_only(), and use it in a few places
    that hard-code nr_pages to 1.

    Updated the documentation of the API.

    Signed-off-by: Souptick Joarder
    Signed-off-by: Andrew Morton
    Reviewed-by: John Hubbard
    Reviewed-by: Paul Mackerras [arch/powerpc/kvm]
    Cc: Matthew Wilcox
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Mark Rutland
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paolo Bonzini
    Cc: Stephen Rothwell
    Cc: Mike Rapoport
    Cc: Aneesh Kumar K.V
    Cc: Michal Suchanek
    Link: http://lkml.kernel.org/r/1590396812-31277-1-git-send-email-jrdr.linux@gmail.com
    Signed-off-by: Linus Torvalds

    Souptick Joarder
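
    A caller-side sketch: the old boolean write argument becomes a FOLL_WRITE
    gup flag, and the single-page wrapper covers the nr_pages == 1 case
    (probe_writable() is a made-up name for illustration):

    static bool probe_writable(unsigned long addr)
    {
            struct page *page;

            /* Before: __get_user_pages_fast(addr, 1, 1, &page) == 1 */
            if (get_user_page_fast_only(addr, FOLL_WRITE, &page)) {
                    put_page(page);         /* drop the reference again */
                    return true;
            }
            return false;
    }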
     

08 Jun, 2020

1 commit

  • Commit b1394e745b94 ("KVM: x86: fix APIC page invalidation") tried
    to fix inappropriate APIC page invalidation by re-introducing arch
    specific kvm_arch_mmu_notifier_invalidate_range() and calling it from
    kvm_mmu_notifier_invalidate_range_start. However, the patch left a
    possible race where the VMCS APIC address cache is updated *before*
    it is unmapped:

    (Invalidator) kvm_mmu_notifier_invalidate_range_start()
    (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD)
    (KVM VCPU) vcpu_enter_guest()
    (KVM VCPU) kvm_vcpu_reload_apic_access_page()
    (Invalidator) actually unmap page

    Because of the above race, there can be a mismatch between the
    host physical address stored in the APIC_ACCESS_PAGE VMCS field and
    the host physical address stored in the EPT entry for the APIC GPA
    (0xfee00000). When this happens, the processor will not trap APIC
    accesses, and will instead show the raw contents of the APIC-access page.
    Because Windows OS periodically checks for unexpected modifications to
    the LAPIC register, this will show up as a BSOD crash with BugCheck
    CRITICAL_STRUCTURE_CORRUPTION (109), which we are currently seeing in
    https://bugzilla.redhat.com/show_bug.cgi?id=1751017.

    The root cause of the issue is that kvm_arch_mmu_notifier_invalidate_range()
    cannot guarantee that no additional references are taken to the pages in
    the range before kvm_mmu_notifier_invalidate_range_end(). Fortunately,
    this case is supported by the MMU notifier API, as documented in
    include/linux/mmu_notifier.h:

    * If the subsystem
    * can't guarantee that no additional references are taken to
    * the pages in the range, it has to implement the
    * invalidate_range() notifier to remove any references taken
    * after invalidate_range_start().

    The fix therefore is to reload the APIC-access page field in the VMCS
    from kvm_mmu_notifier_invalidate_range() instead of ..._range_start().

    Cc: stable@vger.kernel.org
    Fixes: b1394e745b94 ("KVM: x86: fix APIC page invalidation")
    Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=197951
    Signed-off-by: Eiichi Tsukata
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Eiichi Tsukata
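
    A sketch of the shape of the fix, assuming the existing
    KVM_REQ_APIC_PAGE_RELOAD machinery; reloading now happens from the
    invalidate_range() hook, i.e. after the page is actually unmapped:

    void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
                                                unsigned long start,
                                                unsigned long end)
    {
            unsigned long apic_address;

            apic_address = gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
            if (start <= apic_address && apic_address < end)
                    kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
    }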
     

04 Jun, 2020

1 commit

  • After commit 63d0434 ("KVM: x86: move kvm_create_vcpu_debugfs after
    last failure point") we are creating the per-vCPU debugfs files
    after the creation of the vCPU file descriptor. This makes it
    possible for userspace to reach kvm_vcpu_release before
    kvm_create_vcpu_debugfs has finished. The vcpu->debugfs_dentry
    then does not have any associated inode anymore, and this causes
    a NULL-pointer dereference in debugfs_create_file.

    The solution is simply to avoid removing the files; they are
    cleaned up when the VM file descriptor is closed (and that must be
    after KVM_CREATE_VCPU returns). We can stop storing the dentry
    in struct kvm_vcpu too, because it is not needed anywhere after
    kvm_create_vcpu_debugfs returns.

    Reported-by: syzbot+705f4401d5a93a59b87d@syzkaller.appspotmail.com
    Fixes: 63d04348371b ("KVM: x86: move kvm_create_vcpu_debugfs after last failure point")
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

16 May, 2020

4 commits

  • Fix spelling and typos (e.g., repeated words) in comments.

    Signed-off-by: Fuad Tabba
    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20200401140310.29701-1-tabba@google.com

    Fuad Tabba
     
  • Two new stats for exposing halt-polling cpu usage:
    halt_poll_success_ns
    halt_poll_fail_ns

    Thus sum of these 2 stats is the total cpu time spent polling. "success"
    means the VCPU polled until a virtual interrupt was delivered. "fail"
    means the VCPU had to schedule out (either because the maximum poll time
    was reached or it needed to yield the CPU).

    To avoid touching every arch's kvm_vcpu_stat struct, only update and
    export halt-polling cpu usage stats if we're on x86.

    Exporting cpu usage as a u64 and in nanoseconds means we will overflow at
    ~500 years, which seems reasonably large.

    Signed-off-by: David Matlack
    Signed-off-by: Jon Cargille
    Reviewed-by: Jim Mattson

    Message-Id:
    Signed-off-by: Paolo Bonzini

    David Matlack
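
    A sketch of the accounting, assuming a small helper around the polling
    loop in kvm_vcpu_block() (the helper name is illustrative; the stat
    fields are the two listed above):

    static void update_halt_poll_stats(struct kvm_vcpu *vcpu, ktime_t start,
                                       ktime_t end, bool success)
    {
            u64 poll_ns = ktime_to_ns(ktime_sub(end, start));

            if (success)
                    vcpu->stat.halt_poll_success_ns += poll_ns;
            else
                    vcpu->stat.halt_poll_fail_ns += poll_ns;
    }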
     
  • While optimizing posted-interrupt delivery especially for the timer
    fastpath scenario, I measured kvm_x86_ops.deliver_posted_interrupt()
    to introduce substantial latency because the processor has to perform
    all vmentry tasks, ack the posted interrupt notification vector,
    read the posted-interrupt descriptor etc.

    This is not only slow, it is also unnecessary when delivering an
    interrupt to the current CPU (as is the case for the LAPIC timer) because
    PIR->IRR and IRR->RVI synchronization is already performed on vmentry.
    Therefore, skip kvm_vcpu_trigger_posted_interrupt() in this case, and
    instead do vmx_sync_pir_to_irr() on the EXIT_FASTPATH_REENTER_GUEST
    fastpath as well.

    Tested-by: Haiwei Li
    Cc: Haiwei Li
    Suggested-by: Paolo Bonzini
    Signed-off-by: Wanpeng Li
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Wanpeng Li
     
  • hva_to_pfn_remapped() calls fixup_user_fault(), which has already
    handled the retry gracefully. Even if "unlocked" is set to true, it
    means that we've got a VM_FAULT_RETRY inside fixup_user_fault(),
    however the page fault has already retried and we should have the pfn
    set correctly. No need to do that again.

    Signed-off-by: Peter Xu
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Peter Xu
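
    The call in hva_to_pfn_remapped(), sketched as a fragment with the
    post-cleanup fixup_user_fault() signature (the task_struct parameter
    was dropped by the gup change listed above under 13 Aug):

    bool unlocked = false;
    int r;

    r = fixup_user_fault(current->mm, addr,
                         write_fault ? FAULT_FLAG_WRITE : 0, &unlocked);
    if (r)
            return r;

    /*
     * 'unlocked' only means mmap_lock was dropped and re-taken while the
     * fault was retried inside fixup_user_fault(); the pfn lookup that
     * follows remains valid, so no extra retry is needed here.
     */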
     

14 May, 2020

2 commits

  • The use of any sort of waitqueue (simple or regular) for
    wait/waking vcpus has always been an overkill and semantically
    wrong. Because this is per-vcpu (which is blocked) there is
    only ever a single waiting vcpu, thus no need for any sort of
    queue.

    As such, make use of the rcuwait primitive, with the following
    considerations:

    - rcuwait already provides the proper barriers that serialize
    concurrent waiter and waker.

    - Task wakeup is done in rcu read critical region, with a
    stable task pointer.

    - Because there is no concurrency among waiters, we need
    not worry about rcuwait_wait_event() calls corrupting
    the wait->task. As a consequence, this saves the locking
    done in swait when modifying the queue. This also applies
    to per-vcore wait for powerpc kvm-hv.

    The x86 tscdeadline_latency test mentioned in 8577370fb0cb
    ("KVM: Use simple waitqueue for vcpu->wq") shows that, on avg,
    latency is reduced by around 15-20% with this change.

    Cc: Paul Mackerras
    Cc: kvmarm@lists.cs.columbia.edu
    Cc: linux-mips@vger.kernel.org
    Reviewed-by: Marc Zyngier
    Signed-off-by: Davidlohr Bueso
    Message-Id:
    [Avoid extra logic changes. - Paolo]
    Signed-off-by: Paolo Bonzini

    Davidlohr Bueso
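
    The rcuwait pattern for the single-waiter case, sketched
    (vcpu_has_work() is a placeholder condition; rcuwait_init(),
    rcuwait_wait_event() and rcuwait_wake_up() are the primitives from
    linux/rcuwait.h):

    #include <linux/rcuwait.h>

    struct rcuwait vcpu_wait;           /* one per vCPU, no queue needed */

    rcuwait_init(&vcpu_wait);

    /* vCPU side: the only possible waiter blocks until it has work. */
    rcuwait_wait_event(&vcpu_wait, vcpu_has_work(vcpu), TASK_INTERRUPTIBLE);

    /* Waker side: resolves the task pointer under RCU and wakes it. */
    rcuwait_wake_up(&vcpu_wait);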
     
  • Paolo Bonzini
     

25 Apr, 2020

1 commit

  • KVM_CAP_HALT_POLL is a per-VM capability that lets userspace
    control the halt-polling time, allowing halt-polling to be tuned or
    disabled on particular VMs.

    With dynamic halt-polling, a VM's VCPUs can poll from anywhere from
    [0, halt_poll_ns] on each halt. KVM_CAP_HALT_POLL sets the
    upper limit on the poll time.

    Signed-off-by: David Matlack
    Signed-off-by: Jon Cargille
    Reviewed-by: Jim Mattson
    Message-Id:
    Signed-off-by: Paolo Bonzini

    David Matlack
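
    From userspace the cap is a per-VM KVM_ENABLE_CAP whose first argument
    is the maximum poll time in nanoseconds (a sketch; vm_fd is assumed to
    be the fd returned by KVM_CREATE_VM, and 0 disables halt-polling):

    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    struct kvm_enable_cap cap = {
            .cap = KVM_CAP_HALT_POLL,
            .args[0] = 100000,      /* cap halt-polling at 100 us for this VM */
    };

    if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
            perror("KVM_ENABLE_CAP(KVM_CAP_HALT_POLL)");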
     

21 Apr, 2020

4 commits

  • When a nested page fault is taken from an address that does not have
    a memslot associated to it, kvm_mmu_do_page_fault returns RET_PF_EMULATE
    (via mmu_set_spte) and kvm_mmu_page_fault then invokes svm_need_emulation_on_page_fault.

    The default answer there is to return false, but in this case this just
    causes the page fault to be retried ad libitum. Since this is not a
    fast path, and the only other case where it is taken is an erratum,
    just stick a kvm_vcpu_gfn_to_memslot check in there to detect the
    common case where the erratum is not happening.

    This fixes an infinite loop in the new set_memory_region_test.

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • In earlier versions of kvm, 'kvm_run' was an independent structure
    and was not included in the vcpu structure. At present, 'kvm_run'
    is already included in the vcpu structure, so the parameter
    'kvm_run' is redundant.

    This patch simplifies the function definition, removes the extra
    'kvm_run' parameter, and extracts it from the 'kvm_vcpu' structure
    if necessary.

    Signed-off-by: Tianjia Zhang
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Tianjia Zhang
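
    The shape of the simplification, sketched on a made-up arch helper
    (the concrete functions differ per architecture):

    /* Before: static int handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run); */
    static int handle_mmio(struct kvm_vcpu *vcpu)
    {
            struct kvm_run *run = vcpu->run;    /* extracted where needed */

            run->exit_reason = KVM_EXIT_MMIO;
            return 0;
    }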
     
  • Create a new function kvm_is_visible_memslot() and use it from
    kvm_is_visible_gfn(); use the new function in try_async_pf() too,
    to avoid an extra memslot lookup.

    Opportunistically squish a multi-line comment into a single-line comment.

    Note, the end result, KVM_PFN_NOSLOT, is unchanged.

    Cc: Jim Mattson
    Cc: Rick Edgecombe
    Suggested-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
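
    A sketch of the extracted helper, assuming the same visibility rules
    kvm_is_visible_gfn() already applied (a user-controlled slot that is
    not marked invalid):

    bool kvm_is_visible_memslot(struct kvm_memory_slot *memslot)
    {
            return memslot && memslot->id < KVM_USER_MEM_SLOTS &&
                   !(memslot->flags & KVM_MEMSLOT_INVALID);
    }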
     
  • The placement of kvm_create_vcpu_debugfs is more or less irrelevant, since
    it cannot fail and userspace should not care about the debugfs entries until
    it knows the vcpu has been created. Moving it after the last failure
    point removes the need to remove the directory when unwinding the creation.

    Reviewed-by: Emanuele Giuseppe Esposito
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

31 Mar, 2020

1 commit

  • Pass @opaque to kvm_arch_hardware_setup() and
    kvm_arch_check_processor_compat() to allow architecture specific code to
    reference @opaque without having to stash it away in a temporary global
    variable. This will enable x86 to separate its vendor specific callback
    ops, which are passed via @opaque, into "init" and "runtime" ops without
    having to stash away the "init" ops.

    No functional change intended.

    Reviewed-by: Cornelia Huck
    Tested-by: Cornelia Huck #s390
    Acked-by: Marc Zyngier
    Signed-off-by: Sean Christopherson
    Message-Id:
    Reviewed-by: Vitaly Kuznetsov
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     

26 Mar, 2020

1 commit

  • Reset the LRU slot if it becomes invalid when deleting a memslot to fix
    an out-of-bounds/use-after-free access when searching through memslots.

    Explicitly check for there being no used slots in search_memslots(), and
    in the caller of s390's approximation variant.

    Fixes: 36947254e5f9 ("KVM: Dynamically size memslot array based on number of used slots")
    Reported-by: Qian Cai
    Cc: Peter Xu
    Signed-off-by: Sean Christopherson
    Message-Id:
    Acked-by: Christian Borntraeger
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
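
    The explicit guard in search_memslots(), sketched:

    /* Bail out before touching the array when no slots are in use. */
    if (!slots->used_slots)
            return NULL;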
     

17 Mar, 2020

7 commits

  • Drop largepages_enabled, kvm_largepages_enabled() and
    kvm_disable_largepages() now that all users are gone.

    Note, largepages_enabled was an x86-only flag that got left in common
    KVM code when KVM gained support for multiple architectures.

    No functional change intended.

    Reviewed-by: Vitaly Kuznetsov
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • It's never used anywhere now.

    Signed-off-by: Peter Xu
    Signed-off-by: Paolo Bonzini

    Peter Xu
     
  • It could take kvm->mmu_lock for an extended period of time when
    enabling dirty log for the first time. The main cost is to clear
    all the D-bits of last level SPTEs. This situation can benefit from
    manual dirty log protect as well, which can reduce the mmu_lock
    time taken. The sequence is like this:

    1. Initialize all the bits of the dirty bitmap to 1 when enabling
    dirty log for the first time
    2. Only write protect the huge pages
    3. KVM_GET_DIRTY_LOG returns the dirty bitmap info
    4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
    SPTEs gradually in small chunks

    Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment,
    I did some tests with a 128G Windows VM and counted the time taken
    by memory_global_dirty_log_start; here are the numbers:

    VM Size    Before    After optimization
    128G       460ms     10ms

    Signed-off-by: Jay Zhou
    Signed-off-by: Paolo Bonzini

    Jay Zhou
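
    The userspace side of this sequence, sketched (vm_fd is the VM file
    descriptor; the cap and flag names are the existing manual dirty-log
    protect interface):

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    struct kvm_enable_cap cap = {
            .cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
            .args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
                       KVM_DIRTY_LOG_INITIALLY_SET,
    };

    ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
    /* Dirty bitmaps now start out all-ones; KVM_CLEAR_DIRTY_LOG then
     * write-protects / clears D-bits in small chunks as described above. */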
     
  • Now that the memslot logic doesn't assume memslots are always non-NULL,
    dynamically size the array of memslots instead of unconditionally
    allocating memory for the maximum number of memslots.

    Note, because a to-be-deleted memslot must first be invalidated, the
    array size cannot be immediately reduced when deleting a memslot.
    However, consecutive deletions will realize the memory savings, i.e.
    a second deletion will trim the entry.

    Tested-by: Christoffer Dall
    Tested-by: Marc Zyngier
    Reviewed-by: Peter Xu
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Refactor memslot handling to treat the number of used slots as the de
    facto size of the memslot array, e.g. return NULL from id_to_memslot()
    when an invalid index is provided instead of relying on npages==0 to
    detect an invalid memslot. Rework the sorting and walking of memslots
    in advance of dynamically sizing memslots to aid bisection and debug,
    e.g. with luck, a bug in the refactoring will bisect here and/or hit a
    WARN instead of randomly corrupting memory.

    Alternatively, a global null/invalid memslot could be returned, i.e. so
    callers of id_to_memslot() don't have to explicitly check for a NULL
    memslot, but that approach runs the risk of introducing difficult-to-
    debug issues, e.g. if the global null slot is modified. Constifying
    the return from id_to_memslot() to combat such issues is possible, but
    would require a massive refactoring of arch specific code and would
    still be susceptible to casting shenanigans.

    Add function comments to update_memslots() and search_memslots() to
    explicitly (and loudly) state how memslots are sorted.

    Opportunistically stuff @hva with a non-canonical value when deleting a
    private memslot on x86 to detect bogus usage of the freed slot.

    No functional change intended.

    Tested-by: Christoffer Dall
    Tested-by: Marc Zyngier
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Rework kvm_get_dirty_log() so that it "returns" the associated memslot
    on success. A future patch will rework memslot handling such that
    id_to_memslot() can return NULL, returning the memslot makes it more
    obvious that the validity of the memslot has been verified, i.e.
    precludes the need to add validity checks in the arch code that are
    technically unnecessary.

    To maintain ordering in s390, move the call to kvm_arch_sync_dirty_log()
    from s390's kvm_vm_ioctl_get_dirty_log() to the new kvm_get_dirty_log().
    This is a nop for PPC, the only other arch that doesn't select
    KVM_GENERIC_DIRTYLOG_READ_PROTECT, as its sync_dirty_log() is empty.

    Ideally, moving the sync_dirty_log() call would be done in a separate
    patch, but it can't be done in a follow-on patch because that would
    temporarily break s390's ordering. Making the move in a preparatory
    patch would be functionally correct, but would create an odd scenario
    where the moved sync_dirty_log() would operate on a "different" memslot
    due to consuming the result of a different id_to_memslot(). The
    memslot couldn't actually be different as slots_lock is held, but the
    code is confusing enough as it is, i.e. moving sync_dirty_log() in this
    patch is the lesser of all evils.

    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Move the implementations of KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG
    for CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT into common KVM code.
    The arch specific implementations are extremely similar, differing
    only in whether the dirty log needs to be sync'd from hardware (x86)
    and how the TLBs are flushed. Add new arch hooks to handle sync
    and TLB flush; the sync will also be used for non-generic dirty log
    support in a future patch (s390).

    The ulterior motive for providing a common implementation is to
    eliminate the dependency between arch and common code with respect to
    the memslot referenced by the dirty log, i.e. to make it obvious in the
    code that the validity of the memslot is guaranteed, as a future patch
    will rework memslot handling such that id_to_memslot() can return NULL.

    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson