17 Feb, 2020

1 commit

  • With VHE, running a vCPU always requires the sequence:

    1. kvm_arm_vhe_guest_enter();
    2. kvm_vcpu_run_vhe();
    3. kvm_arm_vhe_guest_exit()

    ... and as we invoke this from the shared arm/arm64 KVM code, 32-bit arm
    has to provide stubs for all three functions.

    To simplify the common code, and make it easier to make further
    modifications to the arm64-specific portions in the near future, let's
    fold kvm_arm_vhe_guest_enter() and kvm_arm_vhe_guest_exit() into
    kvm_vcpu_run_vhe().

    The 32-bit stubs for kvm_arm_vhe_guest_enter() and
    kvm_arm_vhe_guest_exit() are removed, as they are no longer used. The
    32-bit stub for kvm_vcpu_run_vhe() is left as-is.

    There should be no functional change as a result of this patch.

    Signed-off-by: Mark Rutland
    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20200210114757.2889-1-mark.rutland@arm.com

    Mark Rutland
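
    A minimal sketch of the folded arrangement described above. The DAIF
    helpers are the arm64 interrupt-masking primitives; the exact body of
    kvm_vcpu_run_vhe() and the inner __kvm_vcpu_run_vhe() call are
    assumptions for illustration, not the verbatim patch:

    /* Sketch: enter/exit handling folded into the VHE run path. */
    int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
    {
        int ret;

        local_daif_mask();              /* was kvm_arm_vhe_guest_enter() */

        ret = __kvm_vcpu_run_vhe(vcpu); /* existing low-level run loop */

        local_daif_restore(DAIF_PROCCTX_NOIRQ); /* was kvm_arm_vhe_guest_exit() */
        isb();

        return ret;
    }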
     

12 Feb, 2020

1 commit

  • Accessing a per-cpu variable only makes sense when preemption is
    disabled (and the kernel does check this when the right debug options
    are switched on).

    For kvm_get_running_vcpu(), it is fine to return the value after
    re-enabling preemption, as the preempt notifiers will make sure that
    this is kept consistent across task migration (the comment above the
    function hints at it, but lacks the crucial preemption management).

    While we're at it, move the comment from the ARM code, which explains
    why the whole thing works.

    Fixes: 7495e22bb165 ("KVM: Move running VCPU from ARM to common code")
    Cc: Paolo Bonzini
    Reported-by: Zenghui Yu
    Tested-by: Zenghui Yu
    Reviewed-by: Peter Xu
    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/318984f6-bc36-33a3-abc6-bf2295974b06@huawei.com
    Signed-off-by: Paolo Bonzini

    Marc Zyngier
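
    A sketch of the accessor with the preemption management described
    above; the per-cpu variable name comes from the "running vCPU"
    tracking patch listed under 28 Jan below, so treat the exact
    identifiers as assumptions:

    struct kvm_vcpu *kvm_get_running_vcpu(void)
    {
        struct kvm_vcpu *vcpu;

        /* Reading the per-cpu value is only meaningful with preemption off. */
        preempt_disable();
        vcpu = __this_cpu_read(kvm_running_vcpu);
        preempt_enable();

        /*
         * Returning the value after re-enabling preemption is fine: the
         * preempt notifiers keep kvm_running_vcpu consistent across task
         * migration.
         */
        return vcpu;
    }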
     

05 Feb, 2020

2 commits

  • We are testing virtual machines with KSM on a v5.4-rc2 kernel,
    and found a zero_page refcount overflow.
    The refcount is incremented in try_async_pf()
    (via get_user_pages()) without being decremented in mmu_set_spte()
    while handling an EPT violation.
    In kvm_release_pfn_clean(), only an unreserved page will have
    put_page() called on it; however, the zero page is reserved.
    So, as VMs are repeatedly created and destroyed, the refcount of the
    zero page keeps increasing until it overflows. (A simplified sketch
    of the put path follows this entry.)

    step1:
    echo 10000 > /sys/kernel/mm/ksm/pages_to_scan
    echo 1 > /sys/kernel/mm/ksm/run
    echo 1 > /sys/kernel/mm/ksm/use_zero_pages

    step2:
    create several normal qemu/kvm VMs, destroy them after 10s,
    and repeat this cycle continuously.

    After a long period of time, all domains hang because the
    refcount of the zero page has overflowed.

    QEMU prints an error log as follows:
    …
    error: kvm run failed Bad address
    EAX=00006cdc EBX=00000008 ECX=80202001 EDX=078bfbfd
    ESI=ffffffff EDI=00000000 EBP=00000008 ESP=00006cc4
    EIP=000efd75 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ES =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
    CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
    SS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
    DS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
    FS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
    GS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
    LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
    TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
    GDT= 000f7070 00000037
    IDT= 000f70ae 00000000
    CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
    DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
    DR6=00000000ffff0ff0 DR7=0000000000000400
    EFER=0000000000000000
    Code=00 01 00 00 00 e9 e8 00 00 00 c7 05 4c 55 0f 00 01 00 00 00 35 00 00 01 00 8b 3d 04 00 01 00 b8 d8 d3 00 00 c1 e0 08 0c ea a3 00 00 01 00 c7 05 04
    …

    Meanwhile, a kernel warning is reported:

    [40914.836375] WARNING: CPU: 3 PID: 82067 at ./include/linux/mm.h:987 try_get_page+0x1f/0x30
    [40914.836412] CPU: 3 PID: 82067 Comm: CPU 0/KVM Kdump: loaded Tainted: G OE 5.2.0-rc2 #5
    [40914.836415] RIP: 0010:try_get_page+0x1f/0x30
    [40914.836417] Code: 40 00 c3 0f 1f 84 00 00 00 00 00 48 8b 47 08 a8 01 75 11 8b 47 34 85 c0 7e 10 f0 ff 47 34 b8 01 00 00 00 c3 48 8d 78 ff eb e9 0b 31 c0 c3 66 90 66 2e 0f 1f 84 00 0
    0 00 00 00 48 8b 47 08 a8
    [40914.836418] RSP: 0018:ffffb4144e523988 EFLAGS: 00010286
    [40914.836419] RAX: 0000000080000000 RBX: 0000000000000326 RCX: 0000000000000000
    [40914.836420] RDX: 0000000000000000 RSI: 00004ffdeba10000 RDI: ffffdf07093f6440
    [40914.836421] RBP: ffffdf07093f6440 R08: 800000424fd91225 R09: 0000000000000000
    [40914.836421] R10: ffff9eb41bfeebb8 R11: 0000000000000000 R12: ffffdf06bbd1e8a8
    [40914.836422] R13: 0000000000000080 R14: 800000424fd91225 R15: ffffdf07093f6440
    [40914.836423] FS: 00007fb60ffff700(0000) GS:ffff9eb4802c0000(0000) knlGS:0000000000000000
    [40914.836425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [40914.836426] CR2: 0000000000000000 CR3: 0000002f220e6002 CR4: 00000000003626e0
    [40914.836427] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [40914.836427] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [40914.836428] Call Trace:
    [40914.836433] follow_page_pte+0x302/0x47b
    [40914.836437] __get_user_pages+0xf1/0x7d0
    [40914.836441] ? irq_work_queue+0x9/0x70
    [40914.836443] get_user_pages_unlocked+0x13f/0x1e0
    [40914.836469] __gfn_to_pfn_memslot+0x10e/0x400 [kvm]
    [40914.836486] try_async_pf+0x87/0x240 [kvm]
    [40914.836503] tdp_page_fault+0x139/0x270 [kvm]
    [40914.836523] kvm_mmu_page_fault+0x76/0x5e0 [kvm]
    [40914.836588] vcpu_enter_guest+0xb45/0x1570 [kvm]
    [40914.836632] kvm_arch_vcpu_ioctl_run+0x35d/0x580 [kvm]
    [40914.836645] kvm_vcpu_ioctl+0x26e/0x5d0 [kvm]
    [40914.836650] do_vfs_ioctl+0xa9/0x620
    [40914.836653] ksys_ioctl+0x60/0x90
    [40914.836654] __x64_sys_ioctl+0x16/0x20
    [40914.836658] do_syscall_64+0x5b/0x180
    [40914.836664] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [40914.836666] RIP: 0033:0x7fb61cb6bfc7

    Signed-off-by: LinFeng
    Signed-off-by: Zhuang Yanying
    Signed-off-by: Paolo Bonzini

    Zhuang Yanying
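
    A simplified sketch of the put path described above, showing why the
    reference taken by get_user_pages() is never dropped for the
    (reserved) zero page; this is an illustration of the imbalance, not
    the fix itself:

    void kvm_release_pfn_clean(kvm_pfn_t pfn)
    {
        /* Reserved pages (including the zero page) are skipped, so the
         * reference taken in try_async_pf() is never dropped and the
         * refcount keeps growing.
         */
        if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
            put_page(pfn_to_page(pfn)); /* never reached for the zero page */
    }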
     
  • Fedora kernel builds on armv7hl began failing recently because
    kvm_arm_exception_type and kvm_arm_exception_class were undeclared in
    trace.h. Add the missing include.

    Fixes: 0e20f5e25556 ("KVM: arm/arm64: Cleanup MMIO handling")
    Signed-off-by: Jeremy Cline
    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20200205134146.82678-1-jcline@redhat.com

    Jeremy Cline
     

31 Jan, 2020

4 commits

  • From Boris Ostrovsky:

    The KVM hypervisor may provide a guest with the ability to defer a remote
    TLB flush when the remote vCPU is not running. When this feature is used,
    the TLB flush will happen only when the remote vCPU is scheduled to run
    again. This avoids unnecessary (and expensive) IPIs.

    Under certain circumstances, when a guest initiates such a deferred action,
    the hypervisor may miss the request. It is also possible that the guest
    may mistakenly assume that it has already marked a remote vCPU as needing
    a flush when in fact that request had already been processed by the
    hypervisor. In both cases this will result in an invalid translation
    being present in a vCPU, potentially allowing accesses to memory locations
    in that guest's address space that should not be accessible. (A simplified
    guest-side sketch of the deferral mechanism follows this entry.)

    Note that only intra-guest memory is vulnerable.

    The five patches address both of these problems:
    1. The first patch makes sure the hypervisor doesn't accidentally clear
    a guest's remote flush request
    2. The rest of the patches prevent the race between the hypervisor
    acknowledging a remote flush request and the guest issuing a new one.

    Conflicts:
    arch/x86/kvm/x86.c [move from kvm_arch_vcpu_free to kvm_arch_vcpu_destroy]

    Paolo Bonzini
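
    A simplified guest-side sketch of the deferral idea described above.
    The KVM_VCPU_PREEMPTED and KVM_VCPU_FLUSH_TLB flags are part of the
    x86 PV steal-time ABI; the surrounding variables (src, cpu, flushmask)
    are illustrative fragments of the guest flush path, not the exact
    upstream code:

    /* src points at the target CPU's struct kvm_steal_time. */
    state = READ_ONCE(src->preempted);
    if (state & KVM_VCPU_PREEMPTED) {
        /* vCPU is not running: ask the hypervisor to flush its TLB
         * when it is next scheduled in, instead of sending an IPI.
         */
        if (try_cmpxchg(&src->preempted, &state,
                        state | KVM_VCPU_FLUSH_TLB))
            __cpumask_clear_cpu(cpu, flushmask);
    }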
     
  • __kvm_map_gfn()'s call to gfn_to_pfn_memslot() is:
    * relatively expensive
    * not callable in certain cases (such as from atomic context)

    Stashing the gfn-to-pfn mapping should help with both cases.

    This is part of CVE-2019-3016.

    Signed-off-by: Boris Ostrovsky
    Reviewed-by: Joao Martins
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini

    Boris Ostrovsky
     
  • kvm_vcpu_(un)map operates on gfns from any current address space.
    In certain cases we want to make sure we are not mapping SMRAM
    and for that we can use kvm_(un)map_gfn() that we are introducing
    in this patch.

    This is part of CVE-2019-3016.

    Signed-off-by: Boris Ostrovsky
    Reviewed-by: Joao Martins
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini

    Boris Ostrovsky
     
  • KVM/arm updates for Linux 5.6

    - Fix MMIO sign extension
    - Fix HYP VA tagging on tag space exhaustion
    - Fix PSTATE/CPSR handling when generating exception
    - Fix MMU notifier's advertising of young pages
    - Fix poisoned page handling
    - Fix PMU SW event handling
    - Fix TVAL register access
    - Fix AArch32 external abort injection
    - Fix ITS unmapped collection handling
    - Various cleanups

    Paolo Bonzini
     

28 Jan, 2020

23 commits

  • According to the ARM ARM, registers CNT{P,V}_TVAL_EL0 have bits [63:32]
    RES0 [1]. When reading the register, the value is truncated to the least
    significant 32 bits [2], and on writes, TimerValue is treated as a signed
    32-bit integer [1, 2].

    When the guest behaves correctly and writes 32-bit values, treating TVAL
    as an unsigned 64-bit register works as expected. However, things start
    to break down when the guest writes larger values, because
    (u64)0x1_ffff_ffff = 8589934591, but (s32)0x1_ffff_ffff = -1: the
    former will cause the timer interrupt to be asserted in the future, while
    the latter will cause it to be asserted now. Let's treat TVAL as a
    signed 32-bit register on writes, to match the behaviour described in
    the architecture, and the behaviour experimentally exhibited by the
    virtual timer on a non-VHE host. (A worked example follows this entry.)

    [1] Arm DDI 0487E.a, section D13.8.18
    [2] Arm DDI 0487E.a, section D11.2.4

    Signed-off-by: Alexandru Elisei
    [maz: replaced the read-side mask with lower_32_bits]
    Signed-off-by: Marc Zyngier
    Fixes: 8fa761624871 ("KVM: arm/arm64: arch_timer: Fix CNTP_TVAL calculation")
    Link: https://lore.kernel.org/r/20200127103652.2326-1-alexandru.elisei@arm.com

    Alexandru Elisei
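
    A worked example of the cast difference described above. This is a
    standalone illustration of the arithmetic only, not KVM code:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t written = 0x1ffffffffULL;  /* guest writes a >32-bit value */

        /* Treated as an unsigned 64-bit value: fires far in the future. */
        uint64_t as_u64 = written;          /* 8589934591 */

        /* Treated as a signed 32-bit value (low 32 bits): fires now. */
        int64_t as_s32 = (int32_t)(written & 0xffffffffULL);   /* -1 */

        printf("u64: %llu, s32: %lld\n",
               (unsigned long long)as_u64, (long long)as_s32);
        return 0;
    }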
     
  • Let the code never use unsupported event counters. Change
    kvm_pmu_handle_pmcr() to only reset supported counters and
    kvm_pmu_vcpu_reset() to only stop supported counters.

    Other actions are filtered on the supported counters in
    kvm/sysregs.c

    Signed-off-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20200124142535.29386-5-eric.auger@redhat.com

    Eric Auger
     
  • At the moment a SW_INCR counter always overflows at the 32-bit
    boundary, independently of whether the n+1th counter is
    programmed as CHAIN.

    Check whether the SW_INCR counter is a 64-bit counter and, if so,
    implement the 64-bit logic. (A simplified sketch follows this entry.)

    Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters")
    Signed-off-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20200124142535.29386-4-eric.auger@redhat.com

    Eric Auger
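
    A simplified sketch of the 64-bit logic described above: increment the
    low 32 bits and, on wrap, either carry into the high (chained) counter
    or raise the overflow bit. counter[], chained() and overflow_set are
    illustrative stand-ins, not the upstream identifiers:

    /* Fragment from inside the SW_INCR handler, for counter i. */
    low = lower_32_bits(counter[i] + 1);
    counter[i] = low;
    if (low == 0) {                     /* the low 32 bits wrapped */
        if (chained(i)) {               /* 64-bit (chained): carry into i + 1 */
            high = lower_32_bits(counter[i + 1] + 1);
            counter[i + 1] = high;
            if (high == 0)
                overflow_set |= BIT(i + 1);
        } else {                        /* plain 32-bit counter: overflow now */
            overflow_set |= BIT(i);
        }
    }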
     
  • At the moment we update the chain bitmap on type setting. This
    does not take into account the enable state of the odd register.

    Let's make sure a counter is never considered as chained if
    the high counter is disabled.

    We recompute the chain state on enable/disable and type changes.

    Also let create_perf_event() use the chain bitmap rather than
    kvm_pmu_idx_has_chain_evtype().

    Suggested-by: Marc Zyngier
    Signed-off-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20200124142535.29386-3-eric.auger@redhat.com

    Eric Auger
     
  • The specification says PMSWINC increments PMEVCNTR<n>_EL0 by 1
    if event counter n is enabled and configured to count SW_INCR.

    For PMEVCNTR<n>_EL0 to be enabled, PMCNTENSET must be set for the
    corresponding event counter, but the PMCR.E bit must also be set.
    (A minimal sketch of the extra check follows this entry.)

    Fixes: 7a0adc7064b8 ("arm64: KVM: Add access handler for PMSWINC register")
    Signed-off-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Reviewed-by: Andrew Murray
    Acked-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20200124142535.29386-2-eric.auger@redhat.com

    Eric Auger
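
    A minimal sketch of the extra global-enable check described above;
    the exact placement inside the SW_INCR handler is an assumption:

    /* Bail out of the SW_INCR handler when the PMU is globally disabled,
     * in addition to the per-counter PMCNTENSET check.
     */
    if (!(__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E))
        return;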
     
  • Avoid the "writable" check in __gfn_to_hva_many(), which will always fail
    on read-only memslots due to gfn_to_hva() assuming writes. Functionally,
    this allows x86 to create large mappings for read-only memslots that
    are backed by HugeTLB mappings.

    Note, the changelog for commit 05da45583de9 ("KVM: MMU: large page
    support") states "If the largepage contains write-protected pages, a
    large pte is not used.", but "write-protected" refers to pages that are
    temporarily read-only, e.g. read-only memslots didn't even exist at the
    time.

    Fixes: 4d8b81abc47b ("KVM: introduce readonly memslot")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    [Redone using kvm_vcpu_gfn_to_memslot_prot. - Paolo]
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Use kvm_vcpu_gfn_to_hva() when retrieving the host page size so that the
    correct set of memslots is used when handling x86 page faults in SMM.

    Fixes: 54bf36aac520 ("KVM: x86: use vcpu-specific functions to read/write/translate GFNs")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Add a helper, is_transparent_hugepage(), to explicitly check whether a
    compound page is a THP and use it when populating KVM's secondary MMU.
    The explicit check fixes a bug where a remapped compound page, e.g. for
    an XDP Rx socket, is mapped into a KVM guest and is mistaken for a THP,
    which results in KVM incorrectly creating a huge page in its secondary
    MMU.

    Fixes: 936a5fe6e6148 ("thp: kvm mmu transparent hugepage support")
    Reported-by: syzbot+c9d1fb51ac9d0d10c39d@syzkaller.appspotmail.com
    Cc: Andrea Arcangeli
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Check the result of __kvm_gfn_to_hva_cache_init() and return immediately
    instead of relying on the kvm_is_error_hva() check to detect errors so
    that it's abundantly clear KVM intends to immediately bail on an error.

    Note, the hva check is still mandatory to handle errors on subsequent
    calls with the same generation. Similarly, always return -EFAULT on
    error so that multiple (bad) calls for a given generation will get the
    same result, e.g. on an illegal gfn wrap, propagating the return from
    __kvm_gfn_to_hva_cache_init() would cause the initial call to return
    -EINVAL and subsequent calls to return -EFAULT.

    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
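
    A sketch of the shape described above: check the init result and
    return -EFAULT directly rather than relying solely on the later hva
    check. The surrounding cached-access context is illustrative, not the
    exact upstream diff:

    /* In the cached read/write path: re-initialize the cache when the
     * memslot generation changed, and bail out immediately on failure.
     */
    if (slots->generation != ghc->generation) {
        if (__kvm_gfn_to_hva_cache_init(slots, ghc, ghc->gpa, ghc->len))
            return -EFAULT;
    }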
     
  • Barret reported a (technically benign) bug where nr_pages_avail can be
    accessed without being initialized if gfn_to_hva_many() fails.

    virt/kvm/kvm_main.c:2193:13: warning: 'nr_pages_avail' may be
    used uninitialized in this function [-Wmaybe-uninitialized]

    Rather than simply squashing the warning by initializing nr_pages_avail,
    fix the underlying issues by reworking __kvm_gfn_to_hva_cache_init() to
    return immediately instead of continuing on. Now that all callers check
    the result and/or bail immediately on a bad hva, there's no need to
    explicitly nullify the memslot on error.

    Reported-by: Barret Rhoden
    Fixes: f1b9dd5eb86c ("kvm: Disallow wraparound in kvm_gfn_to_hva_cache_init")
    Cc: Jim Mattson
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • When reading/writing using the guest/host cache, check for a bad hva
    before checking for a NULL memslot, which triggers the slow path for
    handling cross-page accesses. Because the memslot is nullified on error
    by __kvm_gfn_to_hva_cache_init(), if the bad hva is encountered after
    crossing into a new page, then the kvm_{read,write}_guest() slow path
    could potentially write/access the first chunk prior to detecting the
    bad hva.

    Arguably, performing a partial access is semantically correct from an
    architectural perspective, but that behavior is certainly not intended.
    In the original implementation, memslot was not explicitly nullified
    and therefore the partial access behavior varied based on whether the
    memslot itself was null, or if the hva was simply bad. The current
    behavior was introduced as a seemingly unintentional side effect in
    commit f1b9dd5eb86c ("kvm: Disallow wraparound in
    kvm_gfn_to_hva_cache_init"), which justified the change with "since some
    callers don't check the return code from this function, it sit seems
    prudent to clear ghc->memslot in the event of an error".

    Regardless of intent, the partial access is dependent on _not_ checking
    the result of the cache initialization, which is arguably a bug in its
    own right, at best simply weird.

    Fixes: 8f964525a121 ("KVM: Allow cross page reads and writes from cached translations.")
    Cc: Jim Mattson
    Cc: Andrew Honig
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
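
    A sketch of the reordered checks described above (the write path is
    shown, the read path is symmetric; the fragment's local variables are
    those of the cached-write helper and are illustrative):

    /* Check for a bad hva first, so a cached access that crossed into a
     * new page and failed re-init never performs a partial write via the
     * slow path.
     */
    if (kvm_is_error_hva(ghc->hva))
        return -EFAULT;

    /* Only then fall back to the slow path for cross-page accesses. */
    if (unlikely(!ghc->memslot))
        return kvm_write_guest(kvm, gpa, data, len);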
     
  • For ring-based dirty log tracking, it will be more efficient to account
    writes during schedule-out or schedule-in to the currently running VCPU.
    We would like to do it even if the write doesn't use the current VCPU's
    address space, as is the case for cached writes (see commit 4e335d9e7ddb,
    "Revert "KVM: Support vCPU-based gfn->hva cache"", 2017-05-02).

    Therefore, add a mechanism to track the currently-loaded kvm_vcpu struct.
    There is already something similar in KVM/ARM; one important difference
    is that kvm_arch_vcpu_{load,put} have two callers in virt/kvm/kvm_main.c:
    we have to update both the architecture-independent vcpu_{load,put} and
    the preempt notifiers.

    Another change made in the process is to allow using kvm_get_running_vcpu()
    in preemptible code. This is allowed because preempt notifiers ensure
    that the value does not change even after the VCPU thread is migrated.

    Signed-off-by: Paolo Bonzini
    Reviewed-by: Paolo Bonzini
    Signed-off-by: Peter Xu
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
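
    A sketch of the tracking mechanism described above: a per-cpu pointer
    updated from the preempt notifiers (and, symmetrically, from
    vcpu_load()/vcpu_put(), not shown). The exact naming is an assumption:

    static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_running_vcpu);

    static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
    {
        struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

        __this_cpu_write(kvm_running_vcpu, vcpu);   /* vCPU now loaded here */
        kvm_arch_vcpu_load(vcpu, cpu);
    }

    static void kvm_sched_out(struct preempt_notifier *pn,
                              struct task_struct *next)
    {
        struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

        kvm_arch_vcpu_put(vcpu);
        __this_cpu_write(kvm_running_vcpu, NULL);   /* no vCPU on this CPU */
    }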
     
  • It's already going to reach 2400 bytes (which is over half of the page
    size on 4K page archs), so maybe it's good to have this build-time
    check in case it overflows when adding new fields. (A sketch of such a
    check follows this entry.)

    Signed-off-by: Peter Xu
    Signed-off-by: Paolo Bonzini

    Peter Xu
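
    A sketch of the kind of build-time check described above; the commit
    body shown here does not name the structure, so struct kvm_run is an
    assumption:

    /* Assumption: the structure being guarded is struct kvm_run, which is
     * mmap()ed to userspace as a single page and so must never exceed it.
     */
    BUILD_BUG_ON(sizeof(struct kvm_run) > PAGE_SIZE);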
     
  • Remove kvm_read_guest_atomic() because it's not used anywhere.

    Signed-off-by: Peter Xu
    Signed-off-by: Paolo Bonzini

    Peter Xu
     
  • Open code the allocation and freeing of the vcpu->run page in
    kvm_vm_ioctl_create_vcpu() and kvm_vcpu_destroy() respectively. Doing
    so allows kvm_vcpu_init() to be a pure init function and eliminates
    kvm_vcpu_uninit() entirely.

    Signed-off-by: Sean Christopherson
    Reviewed-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
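
    A sketch of the open-coded allocation and freeing described above;
    error labels and surrounding context are illustrative:

    /* In kvm_vm_ioctl_create_vcpu(): give the vcpu its kvm_run page. */
    page = alloc_page(GFP_KERNEL | __GFP_ZERO);
    if (!page) {
        r = -ENOMEM;
        goto vcpu_free;
    }
    vcpu->run = page_address(page);

    /* ... and in kvm_vcpu_destroy(): */
    free_page((unsigned long)vcpu->run);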
     
  • Move the putting of vcpu->pid to kvm_vcpu_destroy(). vcpu->pid is
    guaranteed to be NULL when kvm_vcpu_uninit() is called in the error path
    of kvm_vm_ioctl_create_vcpu(), e.g. it is explicitly nullified by
    kvm_vcpu_init() and is only changed by KVM_RUN.

    No functional change intended.

    Acked-by: Christoffer Dall
    Signed-off-by: Sean Christopherson
    Reviewed-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Remove kvm_arch_vcpu_init() and kvm_arch_vcpu_uninit() now that all
    arch specific implementations are nops.

    Acked-by: Christoffer Dall
    Signed-off-by: Sean Christopherson
    Reviewed-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Add an arm specific hook to free the arm64-only sve_state. Doing so
    eliminates the last functional code from kvm_arch_vcpu_uninit() across
    all architectures and paves the way for removing kvm_arch_vcpu_init()
    and kvm_arch_vcpu_uninit() entirely.

    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Fold init() into create() now that the two are called back-to-back by
    common KVM code (kvm_vcpu_init() calls kvm_arch_vcpu_init() as its last
    action, and kvm_vm_ioctl_create_vcpu() calls kvm_arch_vcpu_create()
    immediately thereafter). This paves the way for removing
    kvm_arch_vcpu_{un}init() entirely.

    Note, there is no associated unwinding in kvm_arch_vcpu_uninit() that
    needs to be relocated (to kvm_arch_vcpu_destroy()).

    No functional change intended.

    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Remove kvm_arch_vcpu_setup() now that all arch specific implementations
    are nops.

    Acked-by: Christoffer Dall
    Signed-off-by: Sean Christopherson
    Reviewed-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Initialize the preempt notifier immediately in kvm_vcpu_init() to pave
    the way for removing kvm_arch_vcpu_setup(), i.e. to allow arch specific
    code to call vcpu_load() during kvm_arch_vcpu_create().

    Back when preemption support was added, the location of the call to init
    the preempt notifier was perfectly sane. The overall vCPU creation flow
    featured a single arch specific hook and the preempt notifier was used
    immediately after its initialization (by vcpu_load()). E.g.:

    vcpu = kvm_arch_ops->vcpu_create(kvm, n);
    if (IS_ERR(vcpu))
    return PTR_ERR(vcpu);

    preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);

    vcpu_load(vcpu);
    r = kvm_mmu_setup(vcpu);
    vcpu_put(vcpu);
    if (r < 0)
    goto free_vcpu;

    Today, the call to preempt_notifier_init() is sandwiched between two
    arch specific calls, kvm_arch_vcpu_create() and kvm_arch_vcpu_setup(),
    which needlessly forces x86 (and possibly others?) to split its vCPU
    creation flow. Init the preempt notifier prior to any arch specific
    call so that each arch can independently decide how best to organize
    its creation flow.

    Acked-by: Christoffer Dall
    Signed-off-by: Sean Christopherson
    Reviewed-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Unexport kvm_vcpu_cache and kvm_vcpu_{un}init() and make them static
    now that they are referenced only in kvm_main.c.

    Acked-by: Christoffer Dall
    Signed-off-by: Sean Christopherson
    Reviewed-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Now that all architectures tightly couple vcpu allocation/free with the
    mandatory calls to kvm_vcpu_{un}init(), move the sequences verbatim to
    common KVM code.

    Move both allocation and initialization in a single patch to eliminate
    thrash in arch specific code. The bisection benefits of moving the two
    pieces in separate patches are marginal at best, whereas the odds of
    introducing a transient arch specific bug are non-zero.

    Acked-by: Christoffer Dall
    Signed-off-by: Sean Christopherson
    Reviewed-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     

24 Jan, 2020

3 commits

  • Add kvm_vcpu_destroy() and wire up all architectures to call the common
    function instead of their arch specific implementation. The common
    destruction function will be used by future patches to move allocation
    and initialization of vCPUs to common KVM code, i.e. to free resources
    that are allocated by arch agnostic code.

    No functional change intended.

    Acked-by: Christoffer Dall
    Signed-off-by: Sean Christopherson
    Reviewed-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Add a pre-allocation arch hook to handle checks that are currently done
    by arch specific code prior to allocating the vCPU object. This paves
    the way for moving the allocation to common KVM code.

    Acked-by: Christoffer Dall
    Signed-off-by: Sean Christopherson
    Reviewed-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Remove the superfluous kvm_arch_vcpu_free() as it is no longer called
    from common KVM code. Note, kvm_arch_vcpu_destroy() *is* called from
    common code, i.e. choosing which function to whack is not completely
    arbitrary.

    Acked-by: Christoffer Dall
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     

23 Jan, 2020

5 commits

  • KVM's inject_abt64() injects an external-abort into an aarch64 guest.
    The KVM_CAP_ARM_INJECT_EXT_DABT is intended to do exactly this, but
    for an aarch32 guest inject_abt32() injects an implementation-defined
    exception, 'Lockdown fault'.

    Change this to external abort. For non-LPAE we now get the documented:
    | Unhandled fault: external abort on non-linefetch (0x008) at 0x9c800f00
    and for LPAE:
    | Unhandled fault: synchronous external abort (0x210) at 0x9c800f00

    Fixes: 74a64a981662a ("KVM: arm/arm64: Unify 32bit fault injection")
    Reported-by: Beata Michalska
    Signed-off-by: James Morse
    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20200121123356.203000-3-james.morse@arm.com

    James Morse
     
  • Beata reports that KVM_SET_VCPU_EVENTS doesn't inject the expected
    exception to a non-LPAE aarch32 guest.

    The host intends to inject DFSR.FS=0x14 "IMPLEMENTATION DEFINED fault
    (Lockdown fault)", but the guest receives DFSR.FS=0x04 "Fault on
    instruction cache maintenance". This fault is hooked by
    do_translation_fault() since ARMv6, which goes on to silently 'handle'
    the exception, and restart the faulting instruction.

    It turns out, when TTBCR.EAE is clear DFSR is split, and FS[4] has
    to shuffle up to DFSR[10].

    As KVM only does this in one place, fix up the static values. We
    now get the expected:
    | Unhandled fault: lock abort (0x404) at 0x9c800f00

    Fixes: 74a64a981662a ("KVM: arm/arm64: Unify 32bit fault injection")
    Reported-by: Beata Michalska
    Signed-off-by: James Morse
    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20200121123356.203000-2-james.morse@arm.com

    James Morse
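
    A worked example of the short-descriptor (TTBCR.EAE == 0) encoding
    described above: FS is a 5-bit field split across DFSR, with FS[3:0]
    in DFSR[3:0] and FS[4] in DFSR[10]. The snippet is illustrative only:

    /* Pack FS = 0x14 ("IMPLEMENTATION DEFINED fault, Lockdown") into a
     * short-descriptor DFSR value.
     */
    u32 fs   = 0x14;
    u32 dfsr = (fs & 0xf) | ((fs >> 4) << 10);  /* = 0x404, "lock abort" */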
     
  • kvm_test_age_hva() is called upon mmu_notifier_test_young(), but the
    wrong address range has been passed to handle_hva_to_gpa(). With the
    wrong address range, no young bits will be checked in
    handle_hva_to_gpa(), meaning that zero is always returned from
    mmu_notifier_test_young().

    Fix the issue by passing the correct address range to the underlying
    function handle_hva_to_gpa(), so that the hardware young (access) bit
    will be visited. (A sketch of the corrected call follows this entry.)

    Fixes: 35307b9a5f7e ("arm/arm64: KVM: Implement Stage-2 page aging")
    Signed-off-by: Gavin Shan
    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20200121055659.19560-1-gshan@redhat.com

    Gavin Shan
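
    A sketch of the corrected call described above; the one-page range is
    inferred from the description and should be treated as an assumption:

    /* Pass a non-empty range so handle_hva_to_gpa() actually visits the
     * page and checks its young/access bit.
     */
    return handle_hva_to_gpa(kvm, hva, hva + PAGE_SIZE,
                             kvm_test_age_hva_handler, NULL);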
     
  • Our MMIO handling is a bit odd, in the sense that it uses an
    intermediate per-vcpu structure to store the decoded information
    that describes the access.

    But the same information is readily available in the HSR/ESR_EL2
    field, and we actually use this field to populate the structure.

    Let's simplify the whole thing by getting rid of the superfluous
    structure, saving a (tiny) bit of space in the vcpu structure.

    [32bit fix courtesy of Olof Johansson ]
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • The wrappers make it less clear that the position of the call
    to kvm_arch_async_page_present depends on the architecture, and
    that only one of the two call sites will actually be active.
    Remove them.

    Cc: Andy Lutomirski
    Cc: Christian Borntraeger
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini