25 Jul, 2018
1 commit
-
commit b5020a8e6b54d2ece80b1e7dedb33c79a40ebd47 upstream.
Syzbot reports crashes in kvm_irqfd_assign(), caused by a use-after-free
when kvm_irqfd_assign() and kvm_irqfd_deassign() run in parallel
for one specific eventfd. When the assign path hasn't finished but the irqfd
has been added to the kvm->irqfds.items list, another thread may deassign the
eventfd and free the struct kvm_kernel_irqfd. The assign path then uses
the struct kvm_kernel_irqfd that has been freed by the deassign path. To avoid
such an issue, keep the irqfd under kvm->irq_srcu protection after the irqfd
has been added to the kvm->irqfds.items list, and call synchronize_srcu()
in irq_shutdown() to make sure that the irqfd has been fully initialized in
the assign path.
Reported-by: Dmitry Vyukov
Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Dmitry Vyukov
Signed-off-by: Tianyu Lan
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman
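As a rough illustration of the SRCU pattern the fix relies on (names below are invented for the sketch; this is not the actual kvm_irqfd code), readers hold the SRCU read lock while touching an object published on a list, and the teardown path calls synchronize_srcu() before freeing it:
/* Sketch only: illustrates the publish/teardown SRCU pattern, not the real
 * kvm_irqfd structures. List locking is omitted for brevity. */
struct my_irqfd {
	struct list_head list;
	/* ... per-irqfd state ... */
};

static void assign_path(struct srcu_struct *srcu, struct list_head *items,
			struct my_irqfd *irqfd)
{
	int idx = srcu_read_lock(srcu);		/* enter read-side section */

	list_add(&irqfd->list, items);		/* publish: deassign can now find it */
	/* ... finish initialization; irqfd cannot be freed while we hold srcu ... */

	srcu_read_unlock(srcu, idx);
}

static void deassign_path(struct srcu_struct *srcu, struct my_irqfd *irqfd)
{
	list_del(&irqfd->list);			/* unpublish */
	synchronize_srcu(srcu);			/* wait for the assign path to finish */
	kfree(irqfd);				/* now safe to free */
}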
22 Jul, 2018
4 commits
-
commit 5d81f7dc9bca4f4963092433e27b508cbe524a32 upstream.
Now that all our infrastructure is in place, let's expose the
availability of ARCH_WORKAROUND_2 to guests. We take this opportunity
to tidy up a couple of SMCCC constants.
Acked-by: Christoffer Dall
Reviewed-by: Mark Rutland
Signed-off-by: Marc Zyngier
Signed-off-by: Catalin Marinas
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman
-
commit 55e3748e8902ff641e334226bdcb432f9a5d78d3 upstream.
In order to offer ARCH_WORKAROUND_2 support to guests, we need
a bit of infrastructure.
Let's add a flag indicating whether or not the guest uses
SSBD mitigation. Depending on the state of this flag, allow
KVM to disable ARCH_WORKAROUND_2 before entering the guest,
and enable it when exiting it.
Reviewed-by: Christoffer Dall
Reviewed-by: Mark Rutland
Signed-off-by: Marc Zyngier
Signed-off-by: Catalin Marinas
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman
-
Commit 44a497abd621a71c645f06d3d545ae2f46448830 upstream.
kvm_vgic_global_state is part of the read-only section, and is
usually accessed using a PC-relative address generation (adrp + add).
It is thus useless to use kern_hyp_va() on it, and actively problematic
if kern_hyp_va() becomes non-idempotent. On the other hand, there is
no way that the compiler is going to guarantee that such access is
always PC relative.
So let's bite the bullet and provide our own accessor.
Acked-by: Catalin Marinas
Reviewed-by: James Morse
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman
-
Commit 36989e7fd386a9a5822c48691473863f8fbb404d upstream.
kvm_host_cpu_state is a per-cpu allocation made from kvm_arch_init()
used to store the host EL1 registers when KVM switches to a guest.
Make it easier for ASM to generate pointers into this per-cpu memory
by making it a static allocation.
Signed-off-by: James Morse
Acked-by: Christoffer Dall
Signed-off-by: Catalin Marinas
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman
21 Jun, 2018
1 commit
-
[ Upstream commit 5e1ca5e23b167987d5b6d8b08f2d5b7dd2d13f49 ]
It's possible for userspace to control n. Sanitize n when using it as an
array index.
Note that while it appears that n must be bound to the interval [0,3]
due to the way it is extracted from addr, we cannot guarantee that
compiler transformations (and/or future refactoring) will ensure this is
the case, and given this is a slow path it's better to always perform
the masking.
Found by smatch.
Signed-off-by: Mark Rutland
Acked-by: Christoffer Dall
Acked-by: Marc Zyngier
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Will Deacon
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
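A minimal sketch of the defensive masking described above (function and variable names are hypothetical; only the pattern of always bounding the userspace-derived index matters):
/* Sketch: bound a userspace-controlled index before using it, even if it
 * "should" already be in range. */
static u32 read_banked_reg(const u32 regs[4], unsigned long addr)
{
	unsigned int n = addr >> 2;	/* hypothetical extraction from the offset */

	n &= 3;		/* always mask before indexing, per the reasoning above */
	return regs[n];
}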
30 May, 2018
1 commit
-
[ Upstream commit 62b06f8f429cd233e4e2e7bbd21081ad60c9018f ]
Our irq_is_pending() helper function accesses multiple members of the
vgic_irq struct, so we need to hold the lock when calling it.
Add that requirement as a comment to the definition and take the lock
around the call in vgic_mmio_read_pending(), where we were missing it
before.
Fixes: 96b298000db4 ("KVM: arm/arm64: vgic-new: Add PENDING registers handlers")
Signed-off-by: Andre Przywara
Signed-off-by: Marc Zyngier
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
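A sketch of what the fixed read path looks like under that rule (abbreviated, based on the helper names in the commit text rather than a verbatim copy of the upstream function):
/* Sketch: irq_is_pending() reads several vgic_irq fields, so take the
 * per-IRQ lock around the call when assembling the pending bitmap. */
for (i = 0; i < len * 8; i++) {
	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);

	spin_lock(&irq->irq_lock);
	if (irq_is_pending(irq))
		value |= (1U << i);
	spin_unlock(&irq->irq_lock);

	vgic_put_irq(vcpu->kvm, irq);
}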
23 May, 2018
2 commits
-
commit bf308242ab98b5d1648c3663e753556bef9bec01 upstream.
kvm_read_guest() will eventually look up in kvm_memslots(), which requires
either to hold the kvm->slots_lock or to be inside a kvm->srcu critical
section.
In contrast to x86 and s390 we don't take the SRCU lock on every guest
exit, so we have to do it individually for each kvm_read_guest() call.
Provide a wrapper which does that and use that everywhere.
Note that ending the SRCU critical section before returning from the
kvm_read_guest() wrapper is safe, because the data has been *copied*, so
we don't need to rely on valid references to the memslot anymore.
Cc: Stable # 4.8+
Reported-by: Jan Glauber
Signed-off-by: Andre Przywara
Acked-by: Christoffer Dall
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman
-
commit 711702b57cc3c50b84bd648de0f1ca0a378805be upstream.
kvm_read_guest() will eventually look up in kvm_memslots(), which requires
either to hold the kvm->slots_lock or to be inside a kvm->srcu critical
section.
In contrast to x86 and s390 we don't take the SRCU lock on every guest
exit, so we have to do it individually for each kvm_read_guest() call.
Use the newly introduced wrapper for that.
Cc: Stable # 4.12+
Reported-by: Jan Glauber
Signed-off-by: Andre Przywara
Acked-by: Christoffer Dall
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman
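A sketch of the wrapper described in these two entries (it mirrors the description above; the exact upstream name and header may differ):
/* Take the kvm->srcu read lock around kvm_read_guest() so the memslot
 * lookup happens inside an SRCU critical section; the data is copied,
 * so the lock can be dropped before returning. */
static inline int kvm_read_guest_lock(struct kvm *kvm, gpa_t gpa,
				      void *data, unsigned long len)
{
	int srcu_idx = srcu_read_lock(&kvm->srcu);
	int ret = kvm_read_guest(kvm, gpa, data, len);

	srcu_read_unlock(&kvm->srcu, srcu_idx);
	return ret;
}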
02 May, 2018
2 commits
-
commit 85bd0ba1ff9875798fad94218b627ea9f768f3c3 upstream.
Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1
or 1.0 to a guest, defaulting to the latest version of the PSCI
implementation that is compatible with the requested version. This is
no different from doing a firmware upgrade on KVM.
But in order to give a chance to hypothetical badly implemented guests
that would have a fit by discovering something other than PSCI 0.2,
let's provide a new API that allows userspace to pick one particular
version of the API.
This is implemented as a new class of "firmware" registers, where
we expose the PSCI version. This allows the PSCI version to be
saved/restored as part of a guest migration, and also set to
any supported version if the guest requires it.
Cc: stable@vger.kernel.org #4.16
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman
-
commit f0cf47d939d0b4b4f660c5aaa4276fa3488f3391 upstream.
Before entering the guest, we check whether our VMID is still
part of the current generation. In order to avoid taking a lock,
we start with checking that the generation is still current, and
only if not current do we take the lock, recheck, and update the
generation and VMID.
This leaves open a small race: A vcpu can bump up the global
generation number as well as the VM's, but has not updated
the VMID itself yet.
At that point another vcpu from the same VM comes in, checks
the generation (and finds it not needing anything), and jumps
into the guest. At this point, we end-up with two vcpus belonging
to the same VM running with two different VMIDs. Eventually, the
VMID used by the second vcpu will get reassigned, and things will
really go wrong...
A simple solution would be to drop this initial check, and always take
the lock. This is likely to cause performance issues. A middle ground
is to convert the spinlock to a rwlock, and only take the read lock
on the fast path. If the check fails at that point, drop it and
acquire the write lock, rechecking the condition.
This ensures that the above scenario doesn't occur.
Cc: stable@vger.kernel.org
Reported-by: Mark Rutland
Tested-by: Shannon Zhao
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman
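A sketch of the read-lock fast path / write-lock slow path described above (the helper names are invented for the illustration, not the actual KVM/arm code):
static DEFINE_RWLOCK(kvm_vmid_lock);	/* was a spinlock before the fix */

static void update_vmid(struct kvm *kvm)
{
	read_lock(&kvm_vmid_lock);
	if (likely(vmid_is_current_generation(kvm))) {	/* hypothetical helper */
		read_unlock(&kvm_vmid_lock);
		return;					/* fast path */
	}
	read_unlock(&kvm_vmid_lock);

	write_lock(&kvm_vmid_lock);
	if (!vmid_is_current_generation(kvm)) {		/* recheck under the write lock */
		bump_vmid_generation();			/* hypothetical helper */
		assign_new_vmid(kvm);			/* hypothetical helper */
	}
	write_unlock(&kvm_vmid_lock);
}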
26 Apr, 2018
1 commit
-
[ Upstream commit a340b3e229b24a56f1c7f5826b15a3af0f4b13e5 ]
For EPT-violations that are triggered by a read, the pages are also mapped with
write permissions (if their memory region is also writable). That would avoid
getting yet another fault on the same page when a write occurs.
This optimization only happens when you have a "struct page" backing the memory
region. So also enable it for memory regions that do not have a "struct page".
Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed
Reviewed-by: Paolo Bonzini
Signed-off-by: Radim Krčmář
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
24 Apr, 2018
1 commit
-
commit 7d8b44c54e0c7c8f688e3a07f17e6083f849f01f upstream.
vgic_copy_lpi_list() parses the LPI list and picks LPIs targeting
a given vcpu. We allocate the array containing the intids before taking
the lpi_list_lock, which means we can have an array size that is not
equal to the number of LPIs.
This is particularly obvious when looking at the path coming from
vgic_enable_lpis, which is not a command, and thus can run in parallel
with commands:
vcpu 0:                                      vcpu 1:
vgic_enable_lpis
  its_sync_lpi_pending_table
    vgic_copy_lpi_list
      intids = kmalloc_array(irq_count)
                                             MAPI(lpi targeting vcpu 0)
      list_for_each_entry(lpi_list_head)
        intids[i++] = irq->intid;
At that stage, we will happily overrun the intids array. Boo. An easy
fix is to break once the array is full. The MAPI command will update
the config anyway, and we won't miss a thing. We also make sure that
lpi_list_count is read exactly once, so that further updates of that
value will not affect the array bound check.
Cc: stable@vger.kernel.org
Fixes: ccb1d791ab9e ("KVM: arm64: vgic-its: Fix pending table sync")
Reviewed-by: Andre Przywara
Reviewed-by: Eric Auger
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman
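Roughly, the fixed copy loop looks like this (a sketch following the commit text, not the verbatim upstream code):
/* Snapshot the count once, size the array from it, and stop once it is
 * full; a concurrent MAPI may have grown the list in the meantime. */
irq_count = READ_ONCE(dist->lpi_list_count);
intids = kmalloc_array(irq_count, sizeof(intids[0]), GFP_KERNEL);
if (!intids)
	return -ENOMEM;

spin_lock(&dist->lpi_list_lock);
list_for_each_entry(irq, &dist->lpi_list_head, lpi_list) {
	if (i == irq_count)
		break;			/* array full, don't overrun it */
	/* ... filter for the target vcpu ... */
	intids[i++] = irq->intid;
}
spin_unlock(&dist->lpi_list_lock);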
21 Mar, 2018
3 commits
-
commit 16ca6a607d84bef0129698d8d808f501afd08d43 upstream.
The vgic code is trying to be clever when injecting GICv2 SGIs,
and will happily populate LRs with the same interrupt number if
they come from multiple vcpus (after all, they are distinct
interrupt sources).
Unfortunately, this is against the letter of the architecture,
and the GICv2 architecture spec says "Each valid interrupt stored
in the List registers must have a unique VirtualID for that
virtual CPU interface.". GICv3 has similar (although slightly
ambiguous) restrictions.
This results in guests locking up when using GICv2-on-GICv3, for
example. The obvious fix is to stop trying so hard, and inject
a single vcpu per SGI per guest entry. After all, pending SGIs
with multiple source vcpus are pretty rare, and are mostly seen
in scenarios where the physical CPUs are severely overcommitted.
But as we now only inject a single instance of a multi-source SGI per
vcpu entry, we may delay those interrupts for longer than strictly
necessary, and run the risk of injecting lower priority interrupts
in the meantime.
In order to address this, we adopt a three stage strategy:
- If we encounter a multi-source SGI in the AP list while computing
its depth, we force the list to be sorted
- When populating the LRs, we prevent the injection of any interrupt
of lower priority than that of the first multi-source SGI we've
injected.
- Finally, the injection of a multi-source SGI triggers the request
of a maintenance interrupt when there will be no pending interrupt
in the LRs (HCR_NPIE).
At the point where the last pending interrupt in the LRs switches
from Pending to Active, the maintenance interrupt will be delivered,
allowing us to add the remaining SGIs using the same process.
Cc: stable@vger.kernel.org
Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
Acked-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman
-
commit 27e91ad1e746e341ca2312f29bccb9736be7b476 upstream.
On guest exit, and when using GICv2 on GICv3, we use a dsb(st) to
force synchronization between the memory-mapped guest view and
the system-register view that the hypervisor uses.
This is incorrect, as the spec calls out the need for "a DSB whose
required access type is both loads and stores with any Shareability
attribute", while we're only synchronizing stores.
We also lack an isb after the dsb to ensure that the latter has
actually been executed before we start reading stuff from the sysregs.
The fix is pretty easy: turn dsb(st) into dsb(sy), and slap an isb()
just after.
Cc: stable@vger.kernel.org
Fixes: f68d2b1b73cc ("arm64: KVM: Implement vgic-v3 save/restore")
Acked-by: Christoffer Dall
Reviewed-by: Andre Przywara
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman
-
commit 76600428c3677659e3c3633bb4f2ea302220a275 upstream.
On my GICv3 system, the following is printed to the kernel log at boot:
kvm [1]: 8-bit VMID
kvm [1]: IDMAP page: d20e35000
kvm [1]: HYP VA range: 800000000000:ffffffffffff
kvm [1]: vgic-v2@2c020000
kvm [1]: GIC system register CPU interface enabled
kvm [1]: vgic interrupt IRQ1
kvm [1]: virtual timer IRQ4
kvm [1]: Hyp mode initialized successfully
The KVM IDMAP is a mapping of a statically allocated kernel structure,
and so printing its physical address leaks the physical placement of
the kernel when physical KASLR is in effect. So change the kvm_info() to
kvm_debug() to remove it from the log output.
While at it, trim the output a bit more: IRQ numbers can be found in
/proc/interrupts, and the HYP VA and vgic-v2 lines are not highly
informational either.
Cc:
Acked-by: Will Deacon
Acked-by: Christoffer Dall
Signed-off-by: Ard Biesheuvel
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman
09 Mar, 2018
1 commit
-
commit b28676bb8ae4569cced423dc2a88f7cb319d5379 upstream.
Reported by syzkaller:
pte_list_remove: ffff9714eb1f8078 0->BUG
------------[ cut here ]------------
kernel BUG at arch/x86/kvm/mmu.c:1157!
invalid opcode: 0000 [#1] SMP
RIP: 0010:pte_list_remove+0x11b/0x120 [kvm]
Call Trace:
drop_spte+0x83/0xb0 [kvm]
mmu_page_zap_pte+0xcc/0xe0 [kvm]
kvm_mmu_prepare_zap_page+0x81/0x4a0 [kvm]
kvm_mmu_invalidate_zap_all_pages+0x159/0x220 [kvm]
kvm_arch_flush_shadow_all+0xe/0x10 [kvm]
kvm_mmu_notifier_release+0x6c/0xa0 [kvm]
? kvm_mmu_notifier_release+0x5/0xa0 [kvm]
__mmu_notifier_release+0x79/0x110
? __mmu_notifier_release+0x5/0x110
exit_mmap+0x15a/0x170
? do_exit+0x281/0xcb0
mmput+0x66/0x160
do_exit+0x2c9/0xcb0
? __context_tracking_exit.part.5+0x4a/0x150
do_group_exit+0x50/0xd0
SyS_exit_group+0x14/0x20
do_syscall_64+0x73/0x1f0
entry_SYSCALL64_slow_path+0x25/0x25
The reason is that when creating a new memslot, there is no guarantee that the
new memslot does not overlap with private memslots. This can be triggered by the
following program:
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

long r[16];
int main()
{
	void *p = valloc(0x4000);

	r[2] = open("/dev/kvm", 0);
	r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul);

	uint64_t addr = 0xf000;
	ioctl(r[3], KVM_SET_IDENTITY_MAP_ADDR, &addr);
	r[6] = ioctl(r[3], KVM_CREATE_VCPU, 0x0ul);
	ioctl(r[3], KVM_SET_TSS_ADDR, 0x0ul);
	ioctl(r[6], KVM_RUN, 0);
	ioctl(r[6], KVM_RUN, 0);

	struct kvm_userspace_memory_region mr = {
		.slot = 0,
		.flags = KVM_MEM_LOG_DIRTY_PAGES,
		.guest_phys_addr = 0xf000,
		.memory_size = 0x4000,
		.userspace_addr = (uintptr_t) p
	};
	ioctl(r[3], KVM_SET_USER_MEMORY_REGION, &mr);
	return 0;
}
This patch fixes the bug by not adding a new memslot even if it
overlaps with private memslots.
Reported-by: Dmitry Vyukov
Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Dmitry Vyukov
Cc: Eric Biggers
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li
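The overlap condition itself is plain interval arithmetic; a hedged sketch of the check (not the exact __kvm_set_memory_region code) that the fix applies against private slots as well:
/* Two GFN ranges overlap unless one ends before the other begins. */
static bool slots_overlap(gfn_t base_gfn, unsigned long npages,
			  const struct kvm_memory_slot *slot)
{
	return !(base_gfn + npages <= slot->base_gfn ||
		 base_gfn >= slot->base_gfn + slot->npages);
}
With the fix, a new userspace slot that overlaps a private (internal) slot is rejected rather than created.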
25 Feb, 2018
2 commits
-
[ Upstream commit 7465894e90e5a47e0e52aa5f1f708653fc40020f ]
vgic_set_owner acquires the irq lock without disabling interrupts,
resulting in a lockdep splat (an interrupt could fire and result
in the same lock being taken if the same virtual irq is to be
injected).
In practice, it is almost impossible to trigger this bug, but
better safe than sorry. Convert the lock acquisition to a
spin_lock_irqsave() and keep lockdep happy.
Reported-by: James Morse
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 58d0d19a204604ca0da26058828a53558b265da3 ]
Since it is perfectly legal to run the kernel at EL1, it is not
actually an error if HYP mode is not available when attempting to
initialize KVM, given that KVM support cannot be built as a module.
So demote the kvm_err() to kvm_info(), which prevents the error from
appearing on an otherwise 'quiet' console.
Acked-by: Marc Zyngier
Acked-by: Christoffer Dall
Signed-off-by: Ard Biesheuvel
Signed-off-by: Christoffer Dall
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
17 Feb, 2018
9 commits
-
commit 58d6b15e9da5042a99c9c30ad725792e4569150e upstream.
cpu_pm_enter() calls the pm notifier chain with CPU_PM_ENTER, then if
there is a failure: CPU_PM_ENTER_FAILED.
When KVM receives CPU_PM_ENTER it calls cpu_hyp_reset() which will
return us to the hyp-stub. If we subsequently get a CPU_PM_ENTER_FAILED,
KVM does nothing, leaving the CPU running with the hyp-stub, at odds
with kvm_arm_hardware_enabled.
Add CPU_PM_ENTER_FAILED as a fallthrough for CPU_PM_EXIT, this reloads
KVM based on kvm_arm_hardware_enabled. This is safe even if CPU_PM_ENTER
never gets as far as KVM, as cpu_hyp_reinit() calls cpu_hyp_reset()
to make sure the hyp-stub is loaded before reloading KVM.
Fixes: 67f691976662 ("arm64: kvm: allows kvm cpu hotplug")
CC: Lorenzo Pieralisi
Reviewed-by: Christoffer Dall
Signed-off-by: James Morse
Signed-off-by: Christoffer Dall
Signed-off-by: Greg Kroah-Hartman
-
Commit 6167ec5c9145 upstream.
A new feature of SMCCC 1.1 is that it offers firmware-based CPU
workarounds. In particular, SMCCC_ARCH_WORKAROUND_1 provides
BP hardening for CVE-2017-5715.
If the host has some mitigation for this issue, report that
we deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the
host workaround on every guest exit.
Tested-by: Ard Biesheuvel
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Catalin Marinas
Signed-off-by: Will Deacon
Signed-off-by: Ard Biesheuvel
Signed-off-by: Greg Kroah-Hartman
-
Commit a4097b351118 upstream.
We're about to need kvm_psci_version in HYP too. So let's turn it
into a static inline, and pass the kvm structure as a second
parameter (so that HYP can do a kern_hyp_va on it).
Tested-by: Ard Biesheuvel
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Catalin Marinas
Signed-off-by: Will Deacon
Signed-off-by: Ard Biesheuvel
Signed-off-by: Greg Kroah-Hartman
-
Commit 09e6be12effd upstream.
The new SMC Calling Convention (v1.1) allows for a reduced overhead
when calling into the firmware, and provides a new feature discovery
mechanism.
Make it visible to KVM guests.
Tested-by: Ard Biesheuvel
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Catalin Marinas
Signed-off-by: Will Deacon
Signed-off-by: Ard Biesheuvel
Signed-off-by: Greg Kroah-Hartman
-
Commit 58e0b2239a4d upstream.
PSCI 1.0 can be trivially implemented by providing the FEATURES
call on top of PSCI 0.2 and returning 1.0 as the PSCI version.
We happily ignore everything else, as they are either optional or
are clarifications that do not require any additional change.
PSCI 1.0 is now the default until we decide to add a userspace
selection API.
Reviewed-by: Christoffer Dall
Tested-by: Ard Biesheuvel
Signed-off-by: Marc Zyngier
Signed-off-by: Catalin Marinas
Signed-off-by: Will Deacon
Signed-off-by: Ard Biesheuvel
Signed-off-by: Greg Kroah-Hartman
-
Commit 84684fecd7ea upstream.
Instead of open coding the accesses to the various registers,
let's add explicit SMCCC accessors.
Reviewed-by: Christoffer Dall
Tested-by: Ard Biesheuvel
Signed-off-by: Marc Zyngier
Signed-off-by: Catalin Marinas
Signed-off-by: Will Deacon
Signed-off-by: Ard Biesheuvel
Signed-off-by: Greg Kroah-Hartman
-
Commit d0a144f12a7c upstream.
As we're about to trigger a PSCI version explosion, it doesn't
hurt to introduce a PSCI_VERSION helper that is going to be
used everywhere.
Reviewed-by: Christoffer Dall
Tested-by: Ard Biesheuvel
Signed-off-by: Marc Zyngier
Signed-off-by: Catalin Marinas
Signed-off-by: Will Deacon
Signed-off-by: Ard Biesheuvel
Signed-off-by: Greg Kroah-Hartman
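For reference, a helper of this shape packs major/minor numbers into the PSCI version encoding (a sketch modelled on the PSCI UAPI definitions):
#define PSCI_VERSION_MAJOR_SHIFT	16
#define PSCI_VERSION_MINOR_MASK		((1U << PSCI_VERSION_MAJOR_SHIFT) - 1)
#define PSCI_VERSION_MAJOR_MASK		~PSCI_VERSION_MINOR_MASK
#define PSCI_VERSION(maj, min)						\
	((((maj) << PSCI_VERSION_MAJOR_SHIFT) & PSCI_VERSION_MAJOR_MASK) | \
	 ((min) & PSCI_VERSION_MINOR_MASK))
/* e.g. PSCI_VERSION(0, 2) == 0x00000002, PSCI_VERSION(1, 0) == 0x00010000 */
-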
Commit 1a2fb94e6a77 upstream.
As we're about to update the PSCI support, and because I'm lazy,
let's move the PSCI include file to include/kvm so that both
ARM architectures can find it.
Acked-by: Christoffer Dall
Tested-by: Ard Biesheuvel
Signed-off-by: Marc Zyngier
Signed-off-by: Catalin Marinas
Signed-off-by: Will Deacon
Signed-off-by: Ard Biesheuvel
Signed-off-by: Greg Kroah-Hartman
-
Commit 6840bdd73d07 upstream.
Now that we have per-CPU vectors, let's plug them into the KVM/arm64 code.
Signed-off-by: Marc Zyngier
Signed-off-by: Will Deacon
Signed-off-by: Catalin Marinas
Signed-off-by: Ard Biesheuvel
Signed-off-by: Greg Kroah-Hartman
04 Feb, 2018
1 commit
-
[ Upstream commit 20b7035c66bacc909ae3ffe92c1a1ea7db99fe4f ]
KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that
"any unblocked signal received [...] will cause KVM_RUN to return with
-EINTR" and that "the signal will only be delivered if not blocked by
the original signal mask".This, however, is only true, when the calling task has a signal handler
registered for a signal. If not, signal evaluation is short-circuited for
SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN
returning or the whole process is terminated.Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
to that in do_sigtimedwait() to avoid short-circuiting of signals.Signed-off-by: Jan H. Schönherr
Signed-off-by: Paolo Bonzini
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
24 Jan, 2018
1 commit
-
commit c507babf10ead4d5c8cca704539b170752a8ac84 upstream.
KVM only supports PMD hugepages at stage 2 but doesn't actually check
that the provided hugepage memory pagesize is PMD_SIZE before populating
stage 2 entries.
In cases where the backing hugepage size is smaller than PMD_SIZE (such
as when using contiguous hugepages), KVM can end up creating stage 2
mappings that extend beyond the supplied memory.
Fix this by checking for the pagesize of userspace vma before creating
PMD hugepage at stage 2.
Fixes: 66b3923a1a0f77a ("arm64: hugetlb: add support for PTE contiguous bit")
Signed-off-by: Punit Agrawal
Cc: Marc Zyngier
Reviewed-by: Christoffer Dall
Signed-off-by: Christoffer Dall
Signed-off-by: Greg Kroah-Hartman
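A sketch of the added check (an illustration of the idea rather than the exact patch; vma_kernel_pagesize() reports the real backing hugepage size):
/* Only install a stage-2 PMD block mapping when the userspace VMA is
 * really backed by PMD_SIZE hugepages; contiguous-PTE hugepages are
 * smaller and must fall back to page mappings. */
if (is_vm_hugetlb_page(vma) && vma_kernel_pagesize(vma) == PMD_SIZE) {
	hugetlb = true;
	gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
} else {
	/* map at PAGE_SIZE granularity */
}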
17 Jan, 2018
1 commit
-
commit e39d200fa5bf5b94a0948db0dae44c1b73b84a56 upstream.
Reported by syzkaller:
BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm]
Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298
CPU: 6 PID: 32298 Comm: syz-executor Tainted: G OE 4.15.0-rc2+ #18
Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
Call Trace:
dump_stack+0xab/0xe1
print_address_description+0x6b/0x290
kasan_report+0x28a/0x370
write_mmio+0x11e/0x270 [kvm]
emulator_read_write_onepage+0x311/0x600 [kvm]
emulator_read_write+0xef/0x240 [kvm]
emulator_fix_hypercall+0x105/0x150 [kvm]
em_hypercall+0x2b/0x80 [kvm]
x86_emulate_insn+0x2b1/0x1640 [kvm]
x86_emulate_instruction+0x39a/0xb90 [kvm]
handle_exception+0x1b4/0x4d0 [kvm_intel]
vcpu_enter_guest+0x15a0/0x2640 [kvm]
kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm]
kvm_vcpu_ioctl+0x479/0x880 [kvm]
do_vfs_ioctl+0x142/0x9a0
SyS_ioctl+0x74/0x80
entry_SYSCALL_64_fastpath+0x23/0x9a
The path of patched vmmcall will patch 3 bytes opcode 0F 01 C1 (vmcall)
to the guest memory; however, the write_mmio tracepoint always prints 8 bytes
through *(u64 *)val since kvm splits the mmio access into 8 bytes. This
leaks 5 bytes from the kernel stack (CVE-2017-17741). This patch fixes
it by just accessing the bytes which we operate on.
Before patch:
syz-executor-5567 [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f
After patch:
syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f
Reported-by: Dmitry Vyukov
Reviewed-by: Darren Kenny
Reviewed-by: Marc Zyngier
Tested-by: Marc Zyngier
Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Marc Zyngier
Cc: Christoffer Dall
Signed-off-by: Wanpeng Li
Signed-off-by: Paolo Bonzini
Cc: Mathieu Desnoyers
Signed-off-by: Greg Kroah-Hartman
30 Dec, 2017
1 commit
-
commit 7839c672e58bf62da8f2f0197fefb442c02ba1dd upstream.
When we unmap the HYP memory, we try to be clever and unmap one
PGD at a time. If we start with a non-PGD aligned address and try
to unmap a whole PGD, things go horribly wrong in unmap_hyp_range
(addr and end can never match, and it all goes really badly as we
keep incrementing pgd and parse random memory as page tables...).
The obvious fix is to let unmap_hyp_range do what it does best,
which is to iterate over a range.
The size of the linear mapping, which begins at PAGE_OFFSET, can be
easily calculated by subtracting PAGE_OFFSET from high_memory, because
high_memory is defined as the linear map address of the last byte of
DRAM, plus one.
The size of the vmalloc region is given trivially by VMALLOC_END -
VMALLOC_START.
Reported-by: Andre Przywara
Tested-by: Andre Przywara
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall
Signed-off-by: Greg Kroah-Hartman
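In code form, the two ranges end up being unmapped roughly like this (a sketch using the sizes described above; the exact call sites and pgd argument may differ upstream):
/* Linear map: starts at PAGE_OFFSET, ends at high_memory (exclusive). */
unmap_hyp_range(hyp_pgd, kern_hyp_va(PAGE_OFFSET),
		(unsigned long)high_memory - PAGE_OFFSET);
/* vmalloc region: trivially VMALLOC_END - VMALLOC_START bytes. */
unmap_hyp_range(hyp_pgd, kern_hyp_va(VMALLOC_START),
		VMALLOC_END - VMALLOC_START);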
25 Dec, 2017
1 commit
-
[ Upstream commit 46bea48ac241fe0b413805952dda74dd0c09ba8b ]
The kvm slabs can consume a significant amount of system memory
and indeed in our production environment we have observed that
a lot of machines are spending a significant amount of memory that
cannot be left as system memory overhead. Also the allocations
from these slabs can be triggered directly by user space applications
which have access to kvm and thus a buggy application can leak
such memory. So, these caches should be accounted to kmemcg.
Signed-off-by: Shakeel Butt
Signed-off-by: Paolo Bonzini
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
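Opting a slab cache into kmemcg accounting is done with the SLAB_ACCOUNT flag; a sketch of the kind of change involved (cache name and size arguments are placeholders):
/* Allocations from this cache are now charged to the allocating task's
 * memory cgroup instead of counting as unaccounted system overhead. */
kvm_vcpu_cache = kmem_cache_create("kvm_vcpu", vcpu_size, vcpu_align,
				   SLAB_ACCOUNT, NULL);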
17 Dec, 2017
1 commit
-
commit 64afe6e9eb4841f35317da4393de21a047a883b3 upstream.
The current pending table parsing code assumes that we keep the
previous read of the pending bits, but keep that variable in
the current block, making sure it is discarded on each loop.
We end-up using whatever is on the stack. Who knows, it might
just be the right thing...
Fixes: 33d3bc9556a7d ("KVM: arm64: vgic-its: Read initial LPI pending table")
Reported-by: AKASHI Takahiro
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall
Signed-off-by: Greg Kroah-Hartman
14 Dec, 2017
5 commits
-
commit 686f294f2f1ae40705283dd413ca1e4c14f20f93 upstream.
We miss a test against NULL after allocation.
Fixes: 6d03a68f8054 ("KVM: arm64: vgic-its: Turn device_id validation into generic ID validation")
Reported-by: AKASHI Takahiro
Acked-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall
Signed-off-by: Greg Kroah-Hartman
-
commit ddb4b0102cb9cdd2398d98b3e1e024e08a2f4239 upstream.
The current pending table parsing code assumes that we keep the
previous read of the pending bits, but keep that variable in
the current block, making sure it is discarded on each loop.
We end-up using whatever is on the stack. Who knows, it might
just be the right thing...
Fixes: 280771252c1ba ("KVM: arm64: vgic-v3: KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES")
Reported-by: AKASHI Takahiro
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall
Signed-off-by: Greg Kroah-Hartman
-
commit 150009e2c70cc3c6e97f00e7595055765d32fb85 upstream.
Using the size of the structure we're allocating is a good idea
and avoids any surprise... In this case, we're happily confusing
kvm_kernel_irq_routing_entry and kvm_irq_routing_entry...
Fixes: 95b110ab9a09 ("KVM: arm/arm64: Enable irqchip routing")
Reported-by: AKASHI Takahiro
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall
Signed-off-by: Greg Kroah-Hartman
-
commit fc396e066318c0a02208c1d3f0b62950a7714999 upstream.
We are incorrectly rearranging 32-bit words inside a 64-bit typed value
for big endian systems, which would result in never marking a virtual
interrupt as inactive on big endian systems (assuming 32 or fewer LRs on
the hardware). Fix this by not doing any word order manipulation for
the typed values.
Acked-by: Christoffer Dall
Signed-off-by: Christoffer Dall
Signed-off-by: Greg Kroah-Hartman
-
commit b1394e745b9453dcb5b0671c205b770e87dedb87 upstream.
Implementation of the unpinned APIC page didn't update the VMCS address
cache when invalidation was done through range mmu notifiers.
This became a problem when the page notifier was removed.
Re-introduce the arch-specific helper and call it from ...range_start.
Reported-by: Fabian Grünbichler
Fixes: 38b9917350cb ("kvm: vmx: Implement set_apic_access_page_addr")
Fixes: 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic v2")
Reviewed-by: Paolo Bonzini
Reviewed-by: Andrea Arcangeli
Tested-by: Wanpeng Li
Tested-by: Fabian Grünbichler
Signed-off-by: Radim Krčmář
Signed-off-by: Greg Kroah-Hartman
05 Nov, 2017
1 commit
-
Pull KVM fixes from Paolo Bonzini:
"Fixes for interrupt controller emulation in ARM/ARM64 and x86, plus a
one-liner x86 KVM guest fix"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Update APICv on APIC reset
KVM: VMX: Do not fully reset PI descriptor on vCPU reset
kvm: Return -ENODEV from update_persistent_clock
KVM: arm/arm64: vgic-its: Check GITS_BASER Valid bit before saving tables
KVM: arm/arm64: vgic-its: Check CBASER/BASER validity before enabling the ITS
KVM: arm/arm64: vgic-its: Fix vgic_its_restore_collection_table returned value
KVM: arm/arm64: vgic-its: Fix return value for device table restore
arm/arm64: kvm: Disable branch profiling in HYP code
arm/arm64: kvm: Move initialization completion message
arm/arm64: KVM: set right LR register value for 32 bit guest when inject abort
KVM: arm64: its: Fix missing dynamic allocation check in scan_its_table