17 Jan, 2019

1 commit

  • commit fb544d1ca65a89f7a3895f7531221ceeed74ada7 upstream.

    We recently addressed a VMID generation race by introducing a read/write
    lock around accesses and updates to the vmid generation values.

    However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but
    does so without taking the read lock.

    As far as I can tell, this can lead to the same kind of race:

    VM 0, VCPU 0                            VM 0, VCPU 1
    ------------                            ------------
    update_vttbr (vmid 254)
                                            update_vttbr (vmid 1) // roll over
                                            read_lock(kvm_vmid_lock);
                                            force_vm_exit()
    local_irq_disable
    need_new_vmid_gen == false //because vmid gen matches
    enter_guest (vmid 254)
                                            kvm_arch.vttbr = <PGD>:<VMID 1>
                                            read_unlock(kvm_vmid_lock);
                                            enter_guest (vmid 1)

    Which results in running two VCPUs in the same VM with different VMIDs
    and (even worse) other VCPUs from other VMs could now allocate clashing
    VMID 254 from the new generation as long as VCPU 0 is not exiting.

    Attempt to solve this by making sure vttbr is updated before another CPU
    can observe the updated VMID generation.
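
    In sketch form, the required ordering is a publish/consume pairing
    (hypothetical variable names, not the verbatim diff):

        /* Producer (rollover): make the new vttbr visible first. */
        kvm->arch.vttbr = pgd_phys | new_vmid;
        smp_wmb();
        kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);

        /* Consumer: a matching generation now implies a fresh vttbr. */
        if (kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)) {
                smp_rmb();      /* pairs with the smp_wmb() above */
                /* safe to enter the guest using kvm->arch.vttbr */
        }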

    Cc: stable@vger.kernel.org
    Fixes: f0cf47d939d0 ("KVM: arm/arm64: Close VMID generation race")
    Reviewed-by: Julien Thierry
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Christoffer Dall
     

10 Jan, 2019

1 commit

  • commit 107352a24900fb458152b92a4e72fbdc83fd5510 upstream.

    We currently only halt the guest when a vCPU messes with the active
    state of an SPI. This is perfectly fine for GICv2, but isn't enough
    for GICv3, where all vCPUs can access the state of any other vCPU.

    Let's broaden the condition to include any GICv3 interrupt that
    has an active state (i.e. all but LPIs).

    Cc: stable@vger.kernel.org
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     

14 Nov, 2018

1 commit

  • commit da5a3ce66b8bb51b0ea8a89f42aac153903f90fb upstream.

    At boot time, KVM stashes the host MDCR_EL2 value, but only does this
    when the kernel is not running in hyp mode (i.e. is non-VHE). When the
    kernel is running in hyp mode (VHE), the stashed value of MDCR_EL2.HPMN
    is left at zero, which can lead to CONSTRAINED UNPREDICTABLE behaviour.

    Since we use this value to derive the MDCR_EL2 value when switching
    to/from a guest, after a guest has been run, the performance counters
    do not behave as expected. This has been observed to result in accesses
    via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant
    counters, resulting in events not being counted. In these cases, only
    the fixed-purpose cycle counter appears to work as expected.

    Fix this by always stashing the host MDCR_EL2 value, regardless of VHE.
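
    The shape of the fix, roughly (kvm_arm_init_debug() stashes MDCR_EL2
    via a hyp call; treat this as an outline rather than the exact diff):

        static void cpu_hyp_reinit(void)
        {
                if (is_kernel_in_hyp_mode())
                        kvm_timer_init_vhe();
                else
                        cpu_init_hyp_mode(NULL);

                /* Stash the host MDCR_EL2 on both VHE and non-VHE. */
                kvm_arm_init_debug();
        }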

    Cc: Christoffer Dall
    Cc: James Morse
    Cc: Will Deacon
    Cc: stable@vger.kernel.org
    Fixes: 1e947bad0b63b351 ("arm64: KVM: Skip HYP setup when already running in HYP")
    Tested-by: Robin Murphy
    Signed-off-by: Mark Rutland
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Mark Rutland
     

26 Sep, 2018

2 commits

  • [ Upstream commit 1d47191de7e15900f8fbfe7cccd7c6e1c2d7c31a ]

    The vgic_init function can race with kvm_arch_vcpu_create() which does
    not hold kvm_lock() and we therefore have no synchronization primitives
    to ensure we're doing the right thing.

    If the user is trying to initialize or run the VM while at the same time
    creating more VCPUs, we simply have to refuse to initialize the VGIC in
    this case rather than silently failing with a broken VCPU.
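
    The refusal amounts to a check of this shape early in vgic_init()
    (field names from the generic KVM code; a sketch, not the exact diff):

        /* Refuse to initialize the VGIC while a VCPU is still being
         * created; userspace can simply retry later. */
        if (kvm->created_vcpus != atomic_read(&kvm->online_vcpus))
                return -EBUSY;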

    Reviewed-by: Eric Auger
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Christoffer Dall
     
  • [ Upstream commit 6b8b9a48545e08345b8ff77c9fd51b1aebdbefb3 ]

    It's possible for userspace to control n. Sanitize n when using it as an
    array index, to inhibit the potential spectre-v1 write gadget.

    Note that while it appears that n must be bound to the interval [0,3]
    due to the way it is extracted from addr, we cannot guarantee that
    compiler transformations (and/or future refactoring) will ensure this is
    the case, and given this is a slow path it's better to always perform
    the masking.

    Found by smatch.
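
    The masking pattern looks roughly like this (the bound of 4 and the
    destination array are illustrative, modelled on the vgic-v3 APR
    registers):

        #include <linux/nospec.h>

        /* n derives from a userspace-controlled address; clamp it so a
         * mispredicted bounds check cannot steer the write. */
        n = array_index_nospec(n, 4);
        vgicv3->vgic_ap1r[n] = val;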

    Signed-off-by: Mark Rutland
    Cc: Christoffer Dall
    Cc: Marc Zyngier
    Cc: kvmarm@lists.cs.columbia.edu
    Signed-off-by: Marc Zyngier
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Mark Rutland
     

05 Sep, 2018

2 commits

  • commit 976d34e2dab10ece5ea8fe7090b7692913f89084 upstream.

    When there is contention on faulting in a particular page table entry
    at stage 2, the break-before-make requirement of the architecture can
    lead to additional refaulting due to TLB invalidation.

    Avoid this by skipping a page table update if the new value of the PTE
    matches the previous value.
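
    In code, the skip sits just ahead of the break-before-make sequence;
    a sketch of the stage-2 PTE path:

        /* Another vcpu may have installed the same translation already;
         * skip the update to avoid a needless invalidate and refault. */
        if (pte_val(old_pte) == pte_val(*new_pte))
                return 0;

        /* Otherwise break-before-make: clear, invalidate, then set. */
        kvm_set_pte(pte, __pte(0));
        kvm_tlb_flush_vmid_ipa(kvm, addr);
        kvm_set_pte(pte, *new_pte);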

    Cc: stable@vger.kernel.org
    Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
    Reviewed-by: Suzuki Poulose
    Acked-by: Christoffer Dall
    Signed-off-by: Punit Agrawal
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Punit Agrawal
     
  • commit 86658b819cd0a9aa584cd84453ed268a6f013770 upstream.

    Contention on updating a PMD entry by a large number of vcpus can lead
    to duplicate work when handling stage 2 page faults. As the page table
    update follows the break-before-make requirement of the architecture,
    it can lead to repeated refaults due to clearing the entry and
    flushing the tlbs.

    This problem is more likely when -

    * there are a large number of vcpus
    * the mapping is a large block mapping

    such as when using PMD hugepages (512MB) with 64k pages.

    Fix this by skipping the page table update if there is no change in
    the entry being updated.

    Cc: stable@vger.kernel.org
    Fixes: ad361f093c1e ("KVM: ARM: Support hugetlbfs backed huge pages")
    Reviewed-by: Suzuki Poulose
    Acked-by: Christoffer Dall
    Signed-off-by: Punit Agrawal
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Punit Agrawal
     

24 Aug, 2018

2 commits

  • commit 9432a3175770e06cb83eada2d91fac90c977cb99 upstream.

    A comment warning against this bug is there, but the code is not doing what
    the comment says. Therefore it is possible that an EPOLLHUP races against
    irq_bypass_register_consumer. The EPOLLHUP handler schedules irqfd_shutdown,
    and if that runs soon enough, you get a use-after-free.

    Reported-by: syzbot
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini
    Reviewed-by: David Hildenbrand
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Paolo Bonzini
     
  • [ Upstream commit ba56bc3a0786992755e6804fbcbdc60ef6cfc24c ]

    When booting a 64 KB pages kernel on an ACPI GICv3 system that
    implements support for v2 emulation, the following warning is
    produced

    GICV size 0x2000 not a multiple of page size 0x10000

    and support for v2 emulation is disabled, preventing GICv2 VMs
    from being able to run on such hosts.

    The reason is that vgic_v3_probe() performs a sanity check on the
    size of the window (it should be a multiple of the page size),
    while the ACPI MADT parsing code hardcodes the size of the window
    to 8 KB. This makes sense, considering that ACPI does not bother
    to describe the size in the first place, under the assumption that
    platforms implementing ACPI will follow the architecture and not
    put anything else in the same 64 KB window.

    So let's just drop the sanity check altogether, and assume that
    the window is at least 64 KB in size.

    Fixes: 909777324588 ("KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init")
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Marc Zyngier
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Ard Biesheuvel
     

25 Jul, 2018

1 commit

  • commit b5020a8e6b54d2ece80b1e7dedb33c79a40ebd47 upstream.

    Syzbot reports crashes in kvm_irqfd_assign(), caused by a use-after-free
    when kvm_irqfd_assign() and kvm_irqfd_deassign() run in parallel
    for one specific eventfd. When the assign path hasn't finished but the
    irqfd has been added to the kvm->irqfds.items list, another thread may
    deassign the eventfd and free the struct kvm_kernel_irqfd. The assign
    path then uses the struct kvm_kernel_irqfd that has been freed by the
    deassign path. To avoid this issue, keep the irqfd under kvm->irq_srcu
    protection after it has been added to the kvm->irqfds.items list, and
    call synchronize_srcu() in irqfd_shutdown() to make sure that the irqfd
    has been fully initialized in the assign path.
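
    The assign-side ordering, sketched (the deassign side pairs this with
    a synchronize_srcu(&kvm->irq_srcu) before freeing the irqfd):

        idx = srcu_read_lock(&kvm->irq_srcu);
        list_add(&irqfd->list, &kvm->irqfds.items);
        /* ... finish initializing the irqfd while still inside the
         * SRCU read-side section, so a racing deassign must wait ... */
        srcu_read_unlock(&kvm->irq_srcu, idx);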

    Reported-by: Dmitry Vyukov
    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Cc: Dmitry Vyukov
    Signed-off-by: Tianyu Lan
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Lan Tianyu
     

22 Jul, 2018

4 commits

  • commit 5d81f7dc9bca4f4963092433e27b508cbe524a32 upstream.

    Now that all our infrastructure is in place, let's expose the
    availability of ARCH_WORKAROUND_2 to guests. We take this opportunity
    to tidy up a couple of SMCCC constants.

    Acked-by: Christoffer Dall
    Reviewed-by: Mark Rutland
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • commit 55e3748e8902ff641e334226bdcb432f9a5d78d3 upstream.

    In order to offer ARCH_WORKAROUND_2 support to guests, we need
    a bit of infrastructure.

    Let's add a flag indicating whether or not the guest uses
    SSBD mitigation. Depending on the state of this flag, allow
    KVM to disable ARCH_WORKAROUND_2 before entering the guest,
    and enable it when exiting it.

    Reviewed-by: Christoffer Dall
    Reviewed-by: Mark Rutland
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • Commit 44a497abd621a71c645f06d3d545ae2f46448830 upstream.

    kvm_vgic_global_state is part of the read-only section, and is
    usually accessed using a PC-relative address generation (adrp + add).

    It is thus useless to use kern_hyp_va() on it, and actively problematic
    if kern_hyp_va() becomes non-idempotent. On the other hand, there is
    no way that the compiler is going to guarantee that such access is
    always PC relative.

    So let's bite the bullet and provide our own accessor.
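
    The accessor forces a PC-relative address computation with inline
    assembly; it has the shape of the arm64 hyp_symbol_addr() helper
    (shown as a sketch):

        #define hyp_symbol_addr(s)                                      \
                ({                                                      \
                        typeof(&s) addr;                                \
                        asm("adrp  %0, %1\n"                            \
                            "add   %0, %0, :lo12:%1\n"                  \
                            : "=r" (addr) : "S" (&s));                  \
                        addr;                                           \
                })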

    Acked-by: Catalin Marinas
    Reviewed-by: James Morse
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • Commit 36989e7fd386a9a5822c48691473863f8fbb404d upstream.

    kvm_host_cpu_state is a per-cpu allocation made from kvm_arch_init()
    used to store the host EL1 registers when KVM switches to a guest.

    Make it easier for ASM to generate pointers into this per-cpu memory
    by making it a static allocation.
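
    The change boils down to swapping a runtime allocation for a static
    definition, roughly:

        /* Before: a dynamic per-cpu allocation, awkward to address
         * from assembly. */
        kvm_host_cpu_state = alloc_percpu(kvm_cpu_context_t);

        /* After: a static symbol assembly can offset from directly. */
        static DEFINE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state);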

    Signed-off-by: James Morse
    Acked-by: Christoffer Dall
    Signed-off-by: Catalin Marinas
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    James Morse
     

21 Jun, 2018

1 commit

  • [ Upstream commit 5e1ca5e23b167987d5b6d8b08f2d5b7dd2d13f49 ]

    It's possible for userspace to control n. Sanitize n when using it as an
    array index.

    Note that while it appears that n must be bound to the interval [0,3]
    due to the way it is extracted from addr, we cannot guarantee that
    compiler transformations (and/or future refactoring) will ensure this is
    the case, and given this is a slow path it's better to always perform
    the masking.

    Found by smatch.

    Signed-off-by: Mark Rutland
    Acked-by: Christoffer Dall
    Acked-by: Marc Zyngier
    Cc: kvmarm@lists.cs.columbia.edu
    Signed-off-by: Will Deacon
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Mark Rutland
     

30 May, 2018

1 commit

  • [ Upstream commit 62b06f8f429cd233e4e2e7bbd21081ad60c9018f ]

    Our irq_is_pending() helper function accesses multiple members of the
    vgic_irq struct, so we need to hold the lock when calling it.
    Add that requirement as a comment to the definition and take the lock
    around the call in vgic_mmio_read_pending(), where we were missing it
    before.
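
    Sketch of the fixed reader in vgic_mmio_read_pending():

        /* irq_is_pending() looks at several vgic_irq fields, so hold
         * irq_lock across the call. */
        spin_lock(&irq->irq_lock);
        if (irq_is_pending(irq))
                value |= (1U << i);
        spin_unlock(&irq->irq_lock);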

    Fixes: 96b298000db4 ("KVM: arm/arm64: vgic-new: Add PENDING registers handlers")
    Signed-off-by: Andre Przywara
    Signed-off-by: Marc Zyngier
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Andre Przywara
     

23 May, 2018

2 commits

  • commit bf308242ab98b5d1648c3663e753556bef9bec01 upstream.

    kvm_read_guest() will eventually look up in kvm_memslots(), which requires
    either to hold the kvm->slots_lock or to be inside a kvm->srcu critical
    section.
    In contrast to x86 and s390 we don't take the SRCU lock on every guest
    exit, so we have to do it individually for each kvm_read_guest() call.

    Provide a wrapper which does that and use that everywhere.

    Note that ending the SRCU critical section before returning from the
    kvm_read_guest() wrapper is safe, because the data has been *copied*, so
    we don't need to rely on valid references to the memslot anymore.
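
    The wrapper is small enough to show in full; it looks essentially
    like this (modulo surrounding context):

        static inline int kvm_read_guest_lock(struct kvm *kvm, gpa_t gpa,
                                              void *data, unsigned long len)
        {
                int srcu_idx = srcu_read_lock(&kvm->srcu);
                int ret = kvm_read_guest(kvm, gpa, data, len);

                srcu_read_unlock(&kvm->srcu, srcu_idx);

                return ret;
        }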

    Cc: stable@vger.kernel.org # 4.8+
    Reported-by: Jan Glauber
    Signed-off-by: Andre Przywara
    Acked-by: Christoffer Dall
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Andre Przywara
     
  • commit 711702b57cc3c50b84bd648de0f1ca0a378805be upstream.

    kvm_read_guest() will eventually look up in kvm_memslots(), which requires
    either to hold the kvm->slots_lock or to be inside a kvm->srcu critical
    section.
    In contrast to x86 and s390 we don't take the SRCU lock on every guest
    exit, so we have to do it individually for each kvm_read_guest() call.
    Use the newly introduced wrapper for that.

    Cc: stable@vger.kernel.org # 4.12+
    Reported-by: Jan Glauber
    Signed-off-by: Andre Przywara
    Acked-by: Christoffer Dall
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Andre Przywara
     

02 May, 2018

2 commits

  • commit 85bd0ba1ff9875798fad94218b627ea9f768f3c3 upstream.

    Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1
    or 1.0 to a guest, defaulting to the latest version of the PSCI
    implementation that is compatible with the requested version. This is
    no different from doing a firmware upgrade on KVM.

    But in order to give a chance to hypothetical badly implemented guests
    that would have a fit by discovering something other than PSCI 0.2,
    let's provide a new API that allows userspace to pick one particular
    version of the API.

    This is implemented as a new class of "firmware" registers, where
    we expose the PSCI version. This allows the PSCI version to be
    save/restored as part of a guest migration, and also set to
    any supported version if the guest requires it.
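
    From userspace the register is driven through the usual ONE_REG
    interface; a sketch (vcpu_fd and the version value are illustrative):

        uint64_t ver = 2;                       /* PSCI 0.2 encoding */
        struct kvm_one_reg reg = {
                .id   = KVM_REG_ARM_PSCI_VERSION,
                .addr = (uint64_t)&ver,
        };

        ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);  /* save during migration */
        ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);  /* restore, or pin a version */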

    Cc: stable@vger.kernel.org #4.16
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • commit f0cf47d939d0b4b4f660c5aaa4276fa3488f3391 upstream.

    Before entering the guest, we check whether our VMID is still
    part of the current generation. In order to avoid taking a lock,
    we start with checking that the generation is still current, and
    only if not current do we take the lock, recheck, and update the
    generation and VMID.

    This leaves open a small race: A vcpu can bump up the global
    generation number as well as the VM's, but has not updated
    the VMID itself yet.

    At that point another vcpu from the same VM comes in, checks
    the generation (and finds it not needing anything), and jumps
    into the guest. At this point, we end up with two vcpus belonging
    to the same VM running with two different VMIDs. Eventually, the
    VMID used by the second vcpu will get reassigned, and things will
    really go wrong...

    A simple solution would be to drop this initial check, and always take
    the lock. This is likely to cause performance issues. A middle ground
    is to convert the spinlock to a rwlock, and only take the read lock
    on the fast path. If the check fails at that point, drop it and
    acquire the write lock, rechecking the condition.

    This ensures that the above scenario doesn't occur.
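
    The resulting fast/slow path, sketched against update_vttbr():

        read_lock(&kvm_vmid_lock);
        if (!need_new_vmid_gen(kvm)) {
                read_unlock(&kvm_vmid_lock);
                return;                         /* fast path */
        }
        read_unlock(&kvm_vmid_lock);

        write_lock(&kvm_vmid_lock);
        if (!need_new_vmid_gen(kvm)) {          /* recheck: another vcpu
                                                 * may have won the race */
                write_unlock(&kvm_vmid_lock);
                return;
        }
        /* ... bump the generation and allocate a fresh VMID ... */
        write_unlock(&kvm_vmid_lock);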

    Cc: stable@vger.kernel.org
    Reported-by: Mark Rutland
    Tested-by: Shannon Zhao
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     

26 Apr, 2018

1 commit

  • [ Upstream commit a340b3e229b24a56f1c7f5826b15a3af0f4b13e5 ]

    For EPT-violations that are triggered by a read, the pages are also mapped with
    write permissions (if their memory region is also writable). That would avoid
    getting yet another fault on the same page when a write occurs.

    This optimization only happens when you have a "struct page" backing the memory
    region. So also enable it for memory regions that do not have a "struct page".
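
    The idea in sketch form, for the remapped (no "struct page") path;
    treat this as an illustration rather than the merged diff:

        /* No struct page to consult here; derive writability from the
         * VMA itself. */
        if (writable)
                *writable = vma->vm_flags & VM_WRITE;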

    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Cc: kvm@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: KarimAllah Ahmed
    Reviewed-by: Paolo Bonzini
    Signed-off-by: Radim Krčmář
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    KarimAllah Ahmed
     

24 Apr, 2018

1 commit

  • commit 7d8b44c54e0c7c8f688e3a07f17e6083f849f01f upstream.

    vgic_copy_lpi_list() parses the LPI list and picks LPIs targeting
    a given vcpu. We allocate the array containing the intids before taking
    the lpi_list_lock, which means we can have an array size that is not
    equal to the number of LPIs.

    This is particularly obvious when looking at the path coming from
    vgic_enable_lpis, which is not a command, and thus can run in parallel
    with commands:

    vcpu 0:                                   vcpu 1:
    vgic_enable_lpis
      its_sync_lpi_pending_table
        vgic_copy_lpi_list
          intids = kmalloc_array(irq_count)
                                              MAPI(lpi targeting vcpu 0)
          list_for_each_entry(lpi_list_head)
            intids[i++] = irq->intid;

    At that stage, we will happily overrun the intids array. Boo. An easy
    fix is to break once the array is full. The MAPI command will update
    the config anyway, and we won't miss a thing. We also make sure that
    lpi_list_count is read exactly once, so that further updates of that
    value will not affect the array bound check.
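
    Sketched against vgic_copy_lpi_list(), the fix samples the count once
    and bounds the walk:

        irq_count = READ_ONCE(dist->lpi_list_count);
        intids = kmalloc_array(irq_count, sizeof(intids[0]), GFP_KERNEL);
        if (!intids)
                return -ENOMEM;

        spin_lock(&dist->lpi_list_lock);
        list_for_each_entry(irq, &dist->lpi_list_head, lpi_list) {
                /* The list may have grown since irq_count was sampled. */
                if (i == irq_count)
                        break;
                /* ... filter by target vcpu, record irq->intid ... */
        }
        spin_unlock(&dist->lpi_list_lock);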

    Cc: stable@vger.kernel.org
    Fixes: ccb1d791ab9e ("KVM: arm64: vgic-its: Fix pending table sync")
    Reviewed-by: Andre Przywara
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     

21 Mar, 2018

3 commits

  • commit 16ca6a607d84bef0129698d8d808f501afd08d43 upstream.

    The vgic code is trying to be clever when injecting GICv2 SGIs,
    and will happily populate LRs with the same interrupt number if
    they come from multiple vcpus (after all, they are distinct
    interrupt sources).

    Unfortunately, this is against the letter of the architecture,
    and the GICv2 architecture spec says "Each valid interrupt stored
    in the List registers must have a unique VirtualID for that
    virtual CPU interface.". GICv3 has similar (although slightly
    ambiguous) restrictions.

    This results in guests locking up when using GICv2-on-GICv3, for
    example. The obvious fix is to stop trying so hard, and inject
    a single vcpu per SGI per guest entry. After all, pending SGIs
    with multiple source vcpus are pretty rare, and are mostly seen
    in scenarios where the physical CPUs are severely overcommitted.

    But as we now only inject a single instance of a multi-source SGI per
    vcpu entry, we may delay those interrupts for longer than strictly
    necessary, and run the risk of injecting lower priority interrupts
    in the meantime.

    In order to address this, we adopt a three-stage strategy:

    - If we encounter a multi-source SGI in the AP list while computing
      its depth, we force the list to be sorted.
    - When populating the LRs, we prevent the injection of any interrupt
      of lower priority than that of the first multi-source SGI we've
      injected.
    - Finally, the injection of a multi-source SGI triggers the request
      of a maintenance interrupt when there will be no pending interrupt
      in the LRs (HCR_NPIE).

    At the point where the last pending interrupt in the LRs switches
    from Pending to Active, the maintenance interrupt will be delivered,
    allowing us to add the remaining SGIs using the same process.

    Cc: stable@vger.kernel.org
    Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
    Acked-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • commit 27e91ad1e746e341ca2312f29bccb9736be7b476 upstream.

    On guest exit, and when using GICv2 on GICv3, we use a dsb(st) to
    force synchronization between the memory-mapped guest view and
    the system-register view that the hypervisor uses.

    This is incorrect, as the spec calls out the need for "a DSB whose
    required access type is both loads and stores with any Shareability
    attribute", while we're only synchronizing stores.

    We also lack an isb after the dsb to ensure that the latter has
    actually been executed before we start reading stuff from the sysregs.

    The fix is pretty easy: turn dsb(st) into dsb(sy), and slap an isb()
    just after.
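
    In the save path this is just (using the kernel's barrier macros):

        dsb(sy);        /* order prior loads and stores, any shareability */
        isb();          /* complete the dsb before the sysreg reads below */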

    Cc: stable@vger.kernel.org
    Fixes: f68d2b1b73cc ("arm64: KVM: Implement vgic-v3 save/restore")
    Acked-by: Christoffer Dall
    Reviewed-by: Andre Przywara
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • commit 76600428c3677659e3c3633bb4f2ea302220a275 upstream.

    On my GICv3 system, the following is printed to the kernel log at boot:

    kvm [1]: 8-bit VMID
    kvm [1]: IDMAP page: d20e35000
    kvm [1]: HYP VA range: 800000000000:ffffffffffff
    kvm [1]: vgic-v2@2c020000
    kvm [1]: GIC system register CPU interface enabled
    kvm [1]: vgic interrupt IRQ1
    kvm [1]: virtual timer IRQ4
    kvm [1]: Hyp mode initialized successfully

    The KVM IDMAP is a mapping of a statically allocated kernel structure,
    and so printing its physical address leaks the physical placement of
    the kernel when physical KASLR is in effect. So change the kvm_info()
    to kvm_debug() to remove it from the log output.

    While at it, trim the output a bit more: IRQ numbers can be found in
    /proc/interrupts, and the HYP VA and vgic-v2 lines are not highly
    informational either.

    Acked-by: Will Deacon
    Acked-by: Christoffer Dall
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Ard Biesheuvel
     

09 Mar, 2018

1 commit

  • commit b28676bb8ae4569cced423dc2a88f7cb319d5379 upstream.

    Reported by syzkaller:

    pte_list_remove: ffff9714eb1f8078 0->BUG
    ------------[ cut here ]------------
    kernel BUG at arch/x86/kvm/mmu.c:1157!
    invalid opcode: 0000 [#1] SMP
    RIP: 0010:pte_list_remove+0x11b/0x120 [kvm]
    Call Trace:
    drop_spte+0x83/0xb0 [kvm]
    mmu_page_zap_pte+0xcc/0xe0 [kvm]
    kvm_mmu_prepare_zap_page+0x81/0x4a0 [kvm]
    kvm_mmu_invalidate_zap_all_pages+0x159/0x220 [kvm]
    kvm_arch_flush_shadow_all+0xe/0x10 [kvm]
    kvm_mmu_notifier_release+0x6c/0xa0 [kvm]
    ? kvm_mmu_notifier_release+0x5/0xa0 [kvm]
    __mmu_notifier_release+0x79/0x110
    ? __mmu_notifier_release+0x5/0x110
    exit_mmap+0x15a/0x170
    ? do_exit+0x281/0xcb0
    mmput+0x66/0x160
    do_exit+0x2c9/0xcb0
    ? __context_tracking_exit.part.5+0x4a/0x150
    do_group_exit+0x50/0xd0
    SyS_exit_group+0x14/0x20
    do_syscall_64+0x73/0x1f0
    entry_SYSCALL64_slow_path+0x25/0x25

    The reason is that when a new memslot is created, there is no guarantee
    that it does not overlap with private memslots. This can be triggered by the
    following program:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    long r[16];

    int main()
    {
        void *p = valloc(0x4000);

        r[2] = open("/dev/kvm", 0);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul);

        uint64_t addr = 0xf000;
        ioctl(r[3], KVM_SET_IDENTITY_MAP_ADDR, &addr);
        r[6] = ioctl(r[3], KVM_CREATE_VCPU, 0x0ul);
        ioctl(r[3], KVM_SET_TSS_ADDR, 0x0ul);
        ioctl(r[6], KVM_RUN, 0);
        ioctl(r[6], KVM_RUN, 0);

        struct kvm_userspace_memory_region mr = {
            .slot = 0,
            .flags = KVM_MEM_LOG_DIRTY_PAGES,
            .guest_phys_addr = 0xf000,
            .memory_size = 0x4000,
            .userspace_addr = (uintptr_t) p
        };
        ioctl(r[3], KVM_SET_USER_MEMORY_REGION, &mr);
        return 0;
    }

    This patch fixes the bug by refusing to create a new memslot when it
    overlaps with private memslots.

    Reported-by: Dmitry Vyukov
    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Cc: Dmitry Vyukov
    Cc: Eric Biggers
    Cc: stable@vger.kernel.org
    Signed-off-by: Wanpeng Li

    Wanpeng Li
     

25 Feb, 2018

2 commits

  • [ Upstream commit 7465894e90e5a47e0e52aa5f1f708653fc40020f ]

    vgic_set_owner acquires the irq lock without disabling interrupts,
    resulting in a lockdep splat (an interrupt could fire and result
    in the same lock being taken if the same virtual irq is to be
    injected).

    In practice, it is almost impossible to trigger this bug, but
    better safe than sorry. Convert the lock acquisition to a
    spin_lock_irqsave() and keep lockdep happy.
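
    Sketch of the converted acquisition in kvm_vgic_set_owner():

        unsigned long flags;

        spin_lock_irqsave(&irq->irq_lock, flags);
        if (irq->owner && irq->owner != owner)
                ret = -EEXIST;
        else
                irq->owner = owner;
        spin_unlock_irqrestore(&irq->irq_lock, flags);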

    Reported-by: James Morse
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • [ Upstream commit 58d0d19a204604ca0da26058828a53558b265da3 ]

    Since it is perfectly legal to run the kernel at EL1, it is not
    actually an error if HYP mode is not available when attempting to
    initialize KVM, given that KVM support cannot be built as a module.
    So demote the kvm_err() to kvm_info(), which prevents the error from
    appearing on an otherwise 'quiet' console.

    Acked-by: Marc Zyngier
    Acked-by: Christoffer Dall
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Christoffer Dall
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Ard Biesheuvel
     

17 Feb, 2018

9 commits

  • commit 58d6b15e9da5042a99c9c30ad725792e4569150e upstream.

    cpu_pm_enter() calls the pm notifier chain with CPU_PM_ENTER, then if
    there is a failure: CPU_PM_ENTER_FAILED.

    When KVM receives CPU_PM_ENTER it calls cpu_hyp_reset() which will
    return us to the hyp-stub. If we subsequently get a CPU_PM_ENTER_FAILED,
    KVM does nothing, leaving the CPU running with the hyp-stub, at odds
    with kvm_arm_hardware_enabled.

    Add CPU_PM_ENTER_FAILED as a fallthrough for CPU_PM_EXIT, this reloads
    KVM based on kvm_arm_hardware_enabled. This is safe even if CPU_PM_ENTER
    never gets as far as KVM, as cpu_hyp_reinit() calls cpu_hyp_reset()
    to make sure the hyp-stub is loaded before reloading KVM.
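
    The notifier then treats a failed suspend exactly like a resume;
    roughly:

        switch (action) {
        case CPU_PM_ENTER:
                if (__this_cpu_read(kvm_arm_hardware_enabled))
                        cpu_hyp_reset();        /* back to the hyp-stub */
                return NOTIFY_OK;
        case CPU_PM_ENTER_FAILED:               /* fall through */
        case CPU_PM_EXIT:
                if (__this_cpu_read(kvm_arm_hardware_enabled))
                        cpu_hyp_reinit();
                return NOTIFY_OK;
        default:
                return NOTIFY_DONE;
        }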

    Fixes: 67f691976662 ("arm64: kvm: allows kvm cpu hotplug")
    CC: Lorenzo Pieralisi
    Reviewed-by: Christoffer Dall
    Signed-off-by: James Morse
    Signed-off-by: Christoffer Dall
    Signed-off-by: Greg Kroah-Hartman

    James Morse
     
  • Commit 6167ec5c9145 upstream.

    A new feature of SMCCC 1.1 is that it offers firmware-based CPU
    workarounds. In particular, SMCCC_ARCH_WORKAROUND_1 provides
    BP hardening for CVE-2017-5715.

    If the host has some mitigation for this issue, report that
    we deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the
    host workaround on every guest exit.

    Tested-by: Ard Biesheuvel
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • Commit a4097b351118 upstream.

    We're about to need kvm_psci_version in HYP too. So let's turn it
    into a static inline, and pass the kvm structure as a second
    parameter (so that HYP can do a kern_hyp_va on it).
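
    A sketch of the resulting inline (the kvm parameter exists so the HYP
    caller can hand in a kern_hyp_va-translated pointer):

        static inline int kvm_psci_version(struct kvm_vcpu *vcpu,
                                           struct kvm *kvm)
        {
                if (test_bit(KVM_ARM_VCPU_PSCI_0_2, vcpu->arch.features))
                        return KVM_ARM_PSCI_LATEST;

                return KVM_ARM_PSCI_0_1;
        }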

    Tested-by: Ard Biesheuvel
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • Commit 09e6be12effd upstream.

    The new SMC Calling Convention (v1.1) allows for a reduced overhead
    when calling into the firmware, and provides a new feature discovery
    mechanism.

    Make it visible to KVM guests.

    Tested-by: Ard Biesheuvel
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • Commit 58e0b2239a4d upstream.

    PSCI 1.0 can be trivially implemented by providing the FEATURES
    call on top of PSCI 0.2 and returning 1.0 as the PSCI version.

    We happily ignore everything else, as they are either optional or
    are clarifications that do not require any additional change.

    PSCI 1.0 is now the default until we decide to add a userspace
    selection API.

    Reviewed-by: Christoffer Dall
    Tested-by: Ard Biesheuvel
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • Commit 84684fecd7ea upstream.

    Instead of open coding the accesses to the various registers,
    let's add explicit SMCCC accessors.
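
    The accessors are thin wrappers around the vcpu GPR file; in sketch
    form:

        static u32 smccc_get_function(struct kvm_vcpu *vcpu)
        {
                return vcpu_get_reg(vcpu, 0);
        }

        static unsigned long smccc_get_arg1(struct kvm_vcpu *vcpu)
        {
                return vcpu_get_reg(vcpu, 1);
        }

        static void smccc_set_retval(struct kvm_vcpu *vcpu, unsigned long a0,
                                     unsigned long a1, unsigned long a2,
                                     unsigned long a3)
        {
                vcpu_set_reg(vcpu, 0, a0);
                vcpu_set_reg(vcpu, 1, a1);
                vcpu_set_reg(vcpu, 2, a2);
                vcpu_set_reg(vcpu, 3, a3);
        }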

    Reviewed-by: Christoffer Dall
    Tested-by: Ard Biesheuvel
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • Commit d0a144f12a7c upstream.

    As we're about to trigger a PSCI version explosion, it doesn't
    hurt to introduce a PSCI_VERSION helper that is going to be
    used everywhere.
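
    The helper just packs major/minor into the PSCI version format (a
    sketch matching the uapi encoding):

        #define PSCI_VERSION(major, minor)                              \
                ((((major) & 0x7fff) << 16) | ((minor) & 0xffff))

        /* e.g. PSCI_VERSION(1, 0) == 0x10000 */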

    Reviewed-by: Christoffer Dall
    Tested-by: Ard Biesheuvel
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • Commit 1a2fb94e6a77 upstream.

    As we're about to update the PSCI support, and because I'm lazy,
    let's move the PSCI include file to include/kvm so that both
    ARM architectures can find it.

    Acked-by: Christoffer Dall
    Tested-by: Ard Biesheuvel
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • Commit 6840bdd73d07 upstream.

    Now that we have per-CPU vectors, let's plug them into the KVM/arm64 code.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Will Deacon
    Signed-off-by: Catalin Marinas
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     

04 Feb, 2018

1 commit

  • [ Upstream commit 20b7035c66bacc909ae3ffe92c1a1ea7db99fe4f ]

    KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that
    "any unblocked signal received [...] will cause KVM_RUN to return with
    -EINTR" and that "the signal will only be delivered if not blocked by
    the original signal mask".

    This, however, is only true, when the calling task has a signal handler
    registered for a signal. If not, signal evaluation is short-circuited for
    SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN
    returning or the whole process is terminated.

    Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
    to that in do_sigtimedwait() to avoid short-circuiting of signals.

    Signed-off-by: Jan H. Schönherr
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jan H. Schönherr
     

24 Jan, 2018

1 commit

  • commit c507babf10ead4d5c8cca704539b170752a8ac84 upstream.

    KVM only supports PMD hugepages at stage 2 but doesn't actually check
    that the provided hugepage memory pagesize is PMD_SIZE before populating
    stage 2 entries.

    In cases where the backing hugepage size is smaller than PMD_SIZE (such
    as when using contiguous hugepages), KVM can end up creating stage 2
    mappings that extend beyond the supplied memory.

    Fix this by checking for the pagesize of userspace vma before creating
    PMD hugepage at stage 2.
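
    Sketch of the check in user_mem_abort(): only take the hugepage path
    when the VMA is genuinely backed by PMD_SIZE pages.

        if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
                hugetlb = true;
                gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
        }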

    Fixes: 66b3923a1a0f77a ("arm64: hugetlb: add support for PTE contiguous bit")
    Signed-off-by: Punit Agrawal
    Cc: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall
    Signed-off-by: Greg Kroah-Hartman

    Punit Agrawal
     

17 Jan, 2018

1 commit

  • commit e39d200fa5bf5b94a0948db0dae44c1b73b84a56 upstream.

    Reported by syzkaller:

    BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm]
    Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298

    CPU: 6 PID: 32298 Comm: syz-executor Tainted: G OE 4.15.0-rc2+ #18
    Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
    Call Trace:
    dump_stack+0xab/0xe1
    print_address_description+0x6b/0x290
    kasan_report+0x28a/0x370
    write_mmio+0x11e/0x270 [kvm]
    emulator_read_write_onepage+0x311/0x600 [kvm]
    emulator_read_write+0xef/0x240 [kvm]
    emulator_fix_hypercall+0x105/0x150 [kvm]
    em_hypercall+0x2b/0x80 [kvm]
    x86_emulate_insn+0x2b1/0x1640 [kvm]
    x86_emulate_instruction+0x39a/0xb90 [kvm]
    handle_exception+0x1b4/0x4d0 [kvm_intel]
    vcpu_enter_guest+0x15a0/0x2640 [kvm]
    kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm]
    kvm_vcpu_ioctl+0x479/0x880 [kvm]
    do_vfs_ioctl+0x142/0x9a0
    SyS_ioctl+0x74/0x80
    entry_SYSCALL_64_fastpath+0x23/0x9a

    The patched-vmmcall path writes the 3-byte opcode 0F 01 C1 (vmcall)
    to guest memory; however, the write_mmio tracepoint always prints 8 bytes
    through *(u64 *)val, since kvm splits the mmio access into 8-byte chunks.
    This leaks 5 bytes from the kernel stack (CVE-2017-17741). This patch fixes
    it by accessing only the bytes which we operate on.

    Before patch:

    syz-executor-5567 [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f

    After patch:

    syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f
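
    One way to realize the fix, sketched (the merged patch instead passes
    the buffer pointer down into the tracepoint, to the same effect):

        u64 data = 0;

        /* val holds only len valid bytes; never read past them. */
        memcpy(&data, val, min(len, 8));
        trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, len, gpa, data);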

    Reported-by: Dmitry Vyukov
    Reviewed-by: Darren Kenny
    Reviewed-by: Marc Zyngier
    Tested-by: Marc Zyngier
    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Cc: Marc Zyngier
    Cc: Christoffer Dall
    Signed-off-by: Wanpeng Li
    Signed-off-by: Paolo Bonzini
    Cc: Mathieu Desnoyers
    Signed-off-by: Greg Kroah-Hartman

    Wanpeng Li