05 Sep, 2018
2 commits
-
commit 976d34e2dab10ece5ea8fe7090b7692913f89084 upstream.
When there is contention on faulting in a particular page table entry
at stage 2, the break-before-make requirement of the architecture can
lead to additional refaulting due to TLB invalidation.

Avoid this by skipping a page table update if the new value of the PTE
matches the previous value.

Cc: stable@vger.kernel.org
Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Reviewed-by: Suzuki Poulose
Acked-by: Christoffer Dall
Signed-off-by: Punit Agrawal
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman
-
commit 86658b819cd0a9aa584cd84453ed268a6f013770 upstream.
Contention on updating a PMD entry by a large number of vcpus can lead
to duplicate work when handling stage 2 page faults. As the page table
update follows the break-before-make requirement of the architecture,
it can lead to repeated refaults due to clearing the entry and
flushing the TLBs.

This problem is more likely when -
* there are a large number of vcpus
* the mapping is a large block mapping

such as when using PMD hugepages (512MB) with 64k pages.

Fix this by skipping the page table update if there is no change in
the entry being updated.

Cc: stable@vger.kernel.org
Fixes: ad361f093c1e ("KVM: ARM: Support hugetlbfs backed huge pages")
Reviewed-by: Suzuki Poulose
Acked-by: Christoffer Dall
Signed-off-by: Punit Agrawal
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman
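Illustration (not taken from either patch): the two fixes above share one idea, which is to compare the old and new entry before doing break-before-make. The sketch below is a minimal userspace model of that idea. The struct, the function name and the flush counter are hypothetical stand-ins, not the kernel's stage 2 helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one stage 2 entry plus a count of TLB invalidations. */
struct s2_entry {
    uint64_t val;          /* 0 means "not present" */
    unsigned tlb_flushes;  /* how many times we had to invalidate */
};

/* Install new_val. Returns true if the entry was actually rewritten. */
static bool s2_set_entry(struct s2_entry *e, uint64_t new_val)
{
    if (e->val != 0) {
        /* Another vcpu already installed the same mapping: nothing to do. */
        if (e->val == new_val)
            return false;
        /* Break-before-make: clear the live entry and invalidate first. */
        e->val = 0;
        e->tlb_flushes++;
    }
    e->val = new_val;      /* make */
    return true;
}
```

With many vcpus faulting on the same address, only the first caller pays for the update; the others see an identical entry and return without clearing it, so no extra refaults are triggered.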
21 Mar, 2018
1 commit
-
commit 76600428c3677659e3c3633bb4f2ea302220a275 upstream.
On my GICv3 system, the following is printed to the kernel log at boot:
kvm [1]: 8-bit VMID
kvm [1]: IDMAP page: d20e35000
kvm [1]: HYP VA range: 800000000000:ffffffffffff
kvm [1]: vgic-v2@2c020000
kvm [1]: GIC system register CPU interface enabled
kvm [1]: vgic interrupt IRQ1
kvm [1]: virtual timer IRQ4
kvm [1]: Hyp mode initialized successfully

The KVM IDMAP is a mapping of a statically allocated kernel structure,
and so printing its physical address leaks the physical placement of
the kernel when physical KASLR is in effect. So change the kvm_info() to
kvm_debug() to remove it from the log output.

While at it, trim the output a bit more: IRQ numbers can be found in
/proc/interrupts, and the HYP VA and vgic-v2 lines are not highly
informational either.

Cc:
Acked-by: Will Deacon
Acked-by: Christoffer Dall
Signed-off-by: Ard Biesheuvel
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman
24 Jan, 2018
1 commit
-
commit c507babf10ead4d5c8cca704539b170752a8ac84 upstream.
KVM only supports PMD hugepages at stage 2 but doesn't actually check
that the provided hugepage memory pagesize is PMD_SIZE before populating
stage 2 entries.

In cases where the backing hugepage size is smaller than PMD_SIZE (such
as when using contiguous hugepages), KVM can end up creating stage 2
mappings that extend beyond the supplied memory.

Fix this by checking for the pagesize of userspace vma before creating
PMD hugepage at stage 2.

Fixes: 66b3923a1a0f77a ("arm64: hugetlb: add support for PTE contiguous bit")
Signed-off-by: Punit Agrawal
Cc: Marc Zyngier
Reviewed-by: Christoffer Dall
Signed-off-by: Christoffer Dall
Signed-off-by: Greg Kroah-Hartman
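Illustration (not taken from the patch): the check described above amounts to choosing a block mapping only when the backing page size is exactly PMD_SIZE. The sketch below models that decision in plain C. The constants assume 4K base pages with 2MB PMD blocks, and the function name is hypothetical.

```c
/* Illustrative constants: 4K base pages, 2MB PMD blocks. */
#define SKETCH_PAGE_SIZE  (4096UL)
#define SKETCH_PMD_SIZE   (2UL * 1024 * 1024)

/*
 * Decide the stage 2 mapping size from the page size backing the
 * userspace VMA. Only an exact PMD_SIZE backing may use a block mapping;
 * anything else (for example a smaller contiguous hugepage) falls back
 * to page-sized mappings so we never map beyond the supplied memory.
 */
static unsigned long stage2_map_size(unsigned long vma_pagesize)
{
    if (vma_pagesize == SKETCH_PMD_SIZE)
        return SKETCH_PMD_SIZE;
    return SKETCH_PAGE_SIZE;
}
```

For a 64K backing page, `stage2_map_size(64UL * 1024)` yields the 4K page size rather than a 2MB block.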
30 Dec, 2017
1 commit
-
commit 7839c672e58bf62da8f2f0197fefb442c02ba1dd upstream.
When we unmap the HYP memory, we try to be clever and unmap one
PGD at a time. If we start with a non-PGD aligned address and try
to unmap a whole PGD, things go horribly wrong in unmap_hyp_range
(addr and end can never match, and it all goes really badly as we
keep incrementing pgd and parse random memory as page tables...).

The obvious fix is to let unmap_hyp_range do what it does best,
which is to iterate over a range.

The size of the linear mapping, which begins at PAGE_OFFSET, can be
easily calculated by subtracting PAGE_OFFSET from high_memory, because
high_memory is defined as the linear map address of the last byte of
DRAM, plus one.

The size of the vmalloc region is given trivially by VMALLOC_END -
VMALLOC_START.

Reported-by: Andre Przywara
Tested-by: Andre Przywara
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall
Signed-off-by: Greg Kroah-Hartman
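Illustration (not taken from the patch): iterating over a range means clamping each step to the next top-level boundary or to the end of the range, whichever comes first, so an unaligned start still terminates. The sketch below assumes a 1GB top-level entry size and no address wraparound. All names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PGDIR_SIZE  (1ULL << 30)   /* illustrative: 1GB per top-level entry */
#define SKETCH_PGDIR_MASK  (~(SKETCH_PGDIR_SIZE - 1))

/* Next top-level boundary after addr, clamped to end. */
static uint64_t sketch_pgd_addr_end(uint64_t addr, uint64_t end)
{
    uint64_t boundary = (addr + SKETCH_PGDIR_SIZE) & SKETCH_PGDIR_MASK;
    return boundary < end ? boundary : end;
}

/* Walk [start, start + size) and return how many top-level entries it touches. */
static unsigned walk_range(uint64_t start, uint64_t size)
{
    uint64_t addr = start, end = start + size, next;
    unsigned entries = 0;

    while (addr < end) {
        next = sketch_pgd_addr_end(addr, end);
        entries++;          /* a real walker would unmap [addr, next) here */
        addr = next;
    }
    return entries;
}

/* Size of a region given its first address and one-past-the-last address. */
static uint64_t region_size(uint64_t first, uint64_t one_past_last)
{
    return one_past_last - first;
}
```

A 1GB range that starts 4K past a boundary touches two top-level entries, and the loop ends because `addr` always reaches `end` exactly. `region_size()` mirrors the "high_memory minus PAGE_OFFSET" and "VMALLOC_END minus VMALLOC_START" calculations from the message.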
05 Sep, 2017
1 commit
-
The ARM-ARM has two bits in the ESR/HSR relevant to external aborts.
A range of {I,D}FSC values (of which bit 5 is always set) and bit 9 'EA'
which provides:
> an IMPLEMENTATION DEFINED classification of External Aborts.

This bit is in addition to the {I,D}FSC range, and has an implementation
defined meaning. KVM should always ignore this bit when handling external
aborts from a guest.

Remove the ESR_ELx_EA definition and rewrite its helper
kvm_vcpu_dabt_isextabt() to check the {I,D}FSC range. This merges
kvm_vcpu_dabt_isextabt() and the recently added is_abort_sea() helper.

CC: Tyler Baicar
Reported-by: gengdongjiu
Signed-off-by: James Morse
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall
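Illustration (not taken from the patch): classifying an abort by its fault status code alone, and ignoring the EA bit, could look like the sketch below. The numeric FSC values are written from memory of the architecture's fault status encoding and should be checked against the ARM ARM before reuse. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ESR_FSC_MASK  0x3FU       /* fault status code, bits [5:0] */
#define SKETCH_ESR_EA_BIT    (1U << 9)   /* IMPLEMENTATION DEFINED, ignored here */

/* True if the syndrome's fault status code falls in the external abort range. */
static bool is_external_abort(uint32_t esr)
{
    uint32_t fsc = esr & SKETCH_ESR_FSC_MASK;

    switch (fsc) {
    case 0x10:                 /* synchronous external abort */
    case 0x14: case 0x15:      /* external abort on table walk, levels 0-3 */
    case 0x16: case 0x17:
    case 0x18:                 /* synchronous parity/ECC error */
    case 0x1c: case 0x1d:      /* parity/ECC error on table walk, levels 0-3 */
    case 0x1e: case 0x1f:
        return true;
    default:
        return false;
    }
}
```

The result is the same whether or not bit 9 is set in the syndrome, which is the behaviour the message asks for.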
25 Jul, 2017
1 commit
-
The mmu_notifier_release() callback of KVM triggers cleaning up
the stage2 page table on kvm-arm. However there could be other
notifier callbacks in parallel with the mmu_notifier_release(),
which could cause the callbacks to end up operating on an empty stage2
page table. Make sure we check it for all the notifier callbacks.

Cc: stable@vger.kernel.org
Fixes: commit 293f29363 ("kvm-arm: Unmap shadow pagetables properly")
Reported-by: Alex Graf
Reviewed-by: Christoffer Dall
Signed-off-by: Suzuki K Poulose
Signed-off-by: Marc Zyngier
07 Jul, 2017
1 commit
-
Pull KVM updates from Paolo Bonzini:
"PPC:
- Better machine check handling for HV KVM
- Ability to support guests with threads=2, 4 or 8 on POWER9
- Fix for a race that could cause delayed recognition of signals
- Fix for a bug where POWER9 guests could sleep with interrupts pending.

ARM:
- VCPU request overhaul
- allow timer and PMU to have their interrupt number selected from userspace
- workaround for Cavium erratum 30115
- handling of memory poisoning
- the usual crop of fixes and cleanups

s390:
- initial machine check forwarding
- migration support for the CMMA page hinting information
- cleanups and fixes

x86:
- nested VMX bugfixes and improvements
- more reliable NMI window detection on AMD
- APIC timer optimizations

Generic:
- VCPU request overhaul + documentation of common code patterns
- kvm_stat improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
Update my email address
kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
kvm: x86: mmu: allow A/D bits to be disabled in an mmu
x86: kvm: mmu: make spte mmio mask more explicit
x86: kvm: mmu: dead code thanks to access tracking
KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code
KVM: PPC: Book3S HV: Close race with testing for signals on guest entry
KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
KVM: x86: remove ignored type attribute
KVM: LAPIC: Fix lapic timer injection delay
KVM: lapic: reorganize restart_apic_timer
KVM: lapic: reorganize start_hv_timer
kvm: nVMX: Check memory operand to INVVPID
KVM: s390: Inject machine check into the nested guest
KVM: s390: Inject machine check into the guest
tools/kvm_stat: add new interactive command 'b'
tools/kvm_stat: add new command line switch '-i'
tools/kvm_stat: fix error on interactive command 'g'
KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit
...
06 Jul, 2017
1 commit
-
Pull arm64 updates from Will Deacon:
- RAS reporting via GHES/APEI (ACPI)
- Indirect ftrace trampolines for modules
- Improvements to kernel fault reporting
- Page poisoning
- Sigframe cleanups and preparation for SVE context
- Core dump fixes
- Sparse fixes (mainly relating to endianness)
- xgene SoC PMU v3 driver
- Misc cleanups and non-critical fixes
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits)
arm64: fix endianness annotation for 'struct jit_ctx' and friends
arm64: cpuinfo: constify attribute_group structures.
arm64: ptrace: Fix incorrect get_user() use in compat_vfp_set()
arm64: ptrace: Remove redundant overrun check from compat_vfp_set()
arm64: ptrace: Avoid setting compat FP[SC]R to garbage if get_user fails
arm64: fix endianness annotation for __apply_alternatives()/get_alt_insn()
arm64: fix endianness annotation in get_kaslr_seed()
arm64: add missing conversion to __wsum in ip_fast_csum()
arm64: fix endianness annotation in acpi_parking_protocol.c
arm64: use readq() instead of readl() to read 64bit entry_point
arm64: fix endianness annotation for reloc_insn_movw() & reloc_insn_imm()
arm64: fix endianness annotation for aarch64_insn_write()
arm64: fix endianness annotation in aarch64_insn_read()
arm64: fix endianness annotation in call_undef_hook()
arm64: fix endianness annotation for debug-monitors.c
ras: mark stub functions as 'inline'
arm64: pass endianness info to sparse
arm64: ftrace: fix !CONFIG_ARM64_MODULE_PLTS kernels
arm64: signal: Allow expansion of the signal frame
acpi: apei: check for pending errors when probing GHES entries
...
23 Jun, 2017
2 commits
-
Currently external aborts are unsupported by the guest abort
handling. Add handling for SEAs so that the host kernel reports
SEAs which occur in the guest kernel.

When an SEA occurs in the guest kernel, the guest exits and is
routed to kvm_handle_guest_abort(). Prior to this patch, a print
message of an unsupported FSC would be printed and nothing else
would happen. With this patch, the code gets routed to the APEI
handling of SEAs in the host kernel to report the SEA information.

Signed-off-by: Tyler Baicar
Acked-by: Catalin Marinas
Acked-by: Marc Zyngier
Acked-by: Christoffer Dall
Signed-off-by: Will Deacon
-
Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64, notifications for
broken memory can call memory_failure() in mm/memory-failure.c to offline
pages of memory, possibly signalling user space processes and notifying all
the in-kernel users.

memory_failure() has two modes, early and late. Early is used by
machine-managers like Qemu to receive a notification when a memory error is
notified to the host. These can then be relayed to the guest before the
affected page is accessed. To enable this, the process must set
PR_MCE_KILL_EARLY in PR_MCE_KILL_SET using the prctl() syscall.

Once the early notification has been handled, nothing stops the
machine-manager or guest from accessing the affected page. If the
machine-manager does this the page will fail to be mapped and SIGBUS will
be sent. This patch adds the equivalent path for when the guest accesses
the page, sending SIGBUS to the machine-manager.

These two signals can be distinguished by the machine-manager using their
si_code: BUS_MCEERR_AO for 'action optional' early notifications, and
BUS_MCEERR_AR for 'action required' synchronous/late notifications.

Do as x86 does, and deliver the SIGBUS when we discover pfn ==
KVM_PFN_ERR_HWPOISON. Use the hugepage size as si_addr_lsb if this vma was
allocated as a hugepage. Transparent hugepages will be split by
memory_failure() before we see them here.

Cc: Punit Agrawal
Signed-off-by: James Morse
Signed-off-by: Marc Zyngier
06 Jun, 2017
1 commit
-
Under memory pressure, we start ageing pages, which amounts to parsing
the page tables. Since we don't want to allocate any extra level,
we pass NULL for our private allocation cache. Which means that
stage2_get_pud() is allowed to fail. This results in the following
splat:

[ 1520.409577] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[ 1520.417741] pgd = ffff810f52fef000
[ 1520.421201] [00000008] *pgd=0000010f636c5003, *pud=0000010f56f48003, *pmd=0000000000000000
[ 1520.429546] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 1520.435156] Modules linked in:
[ 1520.438246] CPU: 15 PID: 53550 Comm: qemu-system-aar Tainted: G W 4.12.0-rc4-00027-g1885c397eaec #7205
[ 1520.448705] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB12A 10/26/2016
[ 1520.463726] task: ffff800ac5fb4e00 task.stack: ffff800ce04e0000
[ 1520.469666] PC is at stage2_get_pmd+0x34/0x110
[ 1520.474119] LR is at kvm_age_hva_handler+0x44/0xf0
[ 1520.478917] pc : [] lr : [] pstate: 40000145
[ 1520.486325] sp : ffff800ce04e33d0
[ 1520.489644] x29: ffff800ce04e33d0 x28: 0000000ffff40064
[ 1520.494967] x27: 0000ffff27e00000 x26: 0000000000000000
[ 1520.500289] x25: ffff81051ba65008 x24: 0000ffff40065000
[ 1520.505618] x23: 0000ffff40064000 x22: 0000000000000000
[ 1520.510947] x21: ffff810f52b20000 x20: 0000000000000000
[ 1520.516274] x19: 0000000058264000 x18: 0000000000000000
[ 1520.521603] x17: 0000ffffa6fe7438 x16: ffff000008278b70
[ 1520.526940] x15: 000028ccd8000000 x14: 0000000000000008
[ 1520.532264] x13: ffff7e0018298000 x12: 0000000000000002
[ 1520.537582] x11: ffff000009241b93 x10: 0000000000000940
[ 1520.542908] x9 : ffff0000092ef800 x8 : 0000000000000200
[ 1520.548229] x7 : ffff800ce04e36a8 x6 : 0000000000000000
[ 1520.553552] x5 : 0000000000000001 x4 : 0000000000000000
[ 1520.558873] x3 : 0000000000000000 x2 : 0000000000000008
[ 1520.571696] x1 : ffff000008fd5000 x0 : ffff0000080b149c
[ 1520.577039] Process qemu-system-aar (pid: 53550, stack limit = 0xffff800ce04e0000)
[...]
[ 1521.510735] [] stage2_get_pmd+0x34/0x110
[ 1521.516221] [] kvm_age_hva_handler+0x44/0xf0
[ 1521.522054] [] handle_hva_to_gpa+0xb8/0xe8
[ 1521.527716] [] kvm_age_hva+0x44/0xf0
[ 1521.532854] [] kvm_mmu_notifier_clear_flush_young+0x70/0xc0
[ 1521.539992] [] __mmu_notifier_clear_flush_young+0x88/0xd0
[ 1521.546958] [] page_referenced_one+0xf0/0x188
[ 1521.552881] [] rmap_walk_anon+0xec/0x250
[ 1521.558370] [] rmap_walk+0x78/0xa0
[ 1521.563337] [] page_referenced+0x164/0x180
[ 1521.569002] [] shrink_active_list+0x178/0x3b8
[ 1521.574922] [] shrink_node_memcg+0x328/0x600
[ 1521.580758] [] shrink_node+0xc4/0x328
[ 1521.585986] [] do_try_to_free_pages+0xc0/0x340
[ 1521.592000] [] try_to_free_pages+0xcc/0x240
[...]

The trivial fix is to handle this NULL pud value early, rather than
dereferencing it blindly.

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier
Reviewed-by: Christoffer Dall
Signed-off-by: Christoffer Dall
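Illustration (not taken from the patch): the shape of the fix is a NULL check between the two lookup levels. In the sketch below, the lookup of the upper level may fail when no allocation cache is supplied, and the lower-level lookup returns early instead of dereferencing the result. The types and function names are hypothetical simplifications of the page table walker.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_pmd { unsigned long val; };
struct sketch_pud { struct sketch_pmd *pmd_table; };

/* Without an allocation cache, a missing level cannot be created. */
static struct sketch_pud *get_pud(struct sketch_pud *existing, bool have_cache,
                                  struct sketch_pud *spare)
{
    if (existing)
        return existing;
    return have_cache ? spare : NULL;
}

static struct sketch_pmd *get_pmd(struct sketch_pud *existing, bool have_cache,
                                  struct sketch_pud *spare)
{
    struct sketch_pud *pud = get_pud(existing, have_cache, spare);

    if (!pud)               /* the early check: never dereference NULL */
        return NULL;
    return pud->pmd_table;
}
```

Callers on the page ageing path, which pass no cache, then treat a NULL result as "nothing mapped here" rather than crashing.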
16 May, 2017
2 commits
-
We yield the kvm->mmu_lock occasionally while performing an operation
(e.g., unmap or permission changes) on a large area of stage2 mappings.
However this could possibly cause another thread to clear and free up
the stage2 page tables while we were waiting to regain the lock, and
thus the original thread could end up accessing memory that was
freed. This patch fixes the problem by making sure that the stage2
pagetable is still valid after we regain the lock. The fact that
mmu_notifier->release() could be called twice (via __mmu_notifier_release
and mmu_notifier_unregister) increases the possibility of hitting
this race where there are two threads trying to unmap the entire guest
shadow pages.

While at it, clean up the redundant checks around cond_resched_lock in
stage2_wp_range(), as cond_resched_lock already does the same checks.

Cc: Mark Rutland
Cc: Radim Krčmář
Cc: andreyknvl@google.com
Cc: Paolo Bonzini
Cc: stable@vger.kernel.org
Acked-by: Marc Zyngier
Signed-off-by: Suzuki K Poulose
Reviewed-by: Christoffer Dall
Signed-off-by: Christoffer Dall
-
Make sure we don't use a cached value of the KVM stage2 PGD while
resetting the PGD.

Cc: Marc Zyngier
Cc: stable@vger.kernel.org
Signed-off-by: Suzuki K Poulose
Reviewed-by: Christoffer Dall
Signed-off-by: Christoffer Dall
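Illustration (not taken from either patch): both fixes in this group come down to re-reading the PGD after any point where the lock may have been dropped. The single-threaded sketch below models the yield with a callback that stands in for another thread running. All names are hypothetical.

```c
#include <stddef.h>

struct sketch_vm {
    int *pgd;                              /* NULL once the tables are freed */
    void (*on_yield)(struct sketch_vm *);  /* stands in for other threads running */
};

/* Models cond_resched_lock(): the lock is dropped and others may run. */
static void sketch_yield_lock(struct sketch_vm *vm)
{
    if (vm->on_yield)
        vm->on_yield(vm);
}

/* Process up to nchunks chunks; returns how many were processed. */
static unsigned unmap_chunks(struct sketch_vm *vm, unsigned nchunks)
{
    unsigned done = 0;

    while (done < nchunks) {
        /* Re-read the PGD on every pass instead of trusting a cached copy. */
        int *pgd = vm->pgd;

        if (!pgd)
            break;          /* tables were freed while we waited */
        pgd[done] = 0;      /* "unmap" this chunk */
        done++;
        sketch_yield_lock(vm);
    }
    return done;
}

/* Example of a concurrent teardown that happens during a yield. */
static void free_tables(struct sketch_vm *vm)
{
    vm->pgd = NULL;
}
```

With `free_tables` as the yield callback, the walk stops after the first chunk instead of touching tables that no longer belong to it.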
15 May, 2017
1 commit
-
In kvm_free_stage2_pgd() we check the stage2 PGD before holding
the lock and proceed to take the lock if it is valid. And we unmap
the page tables, followed by releasing the lock. We reset the PGD
only after dropping this lock, which could cause a race condition
where another thread waiting on or even holding the lock, could
potentially see that the PGD is still valid and proceed to perform
a stage2 operation and later encounter a NULL PGD.

[223090.242280] Unable to handle kernel NULL pointer dereference at virtual address 00000040
[223090.262330] PC is at unmap_stage2_range+0x8c/0x428
[223090.262332] LR is at kvm_unmap_hva_handler+0x2c/0x3c
[223090.262531] Call trace:
[223090.262533] [] unmap_stage2_range+0x8c/0x428
[223090.262535] [] kvm_unmap_hva_handler+0x2c/0x3c
[223090.262537] [] handle_hva_to_gpa+0xb0/0x104
[223090.262539] [] kvm_unmap_hva+0x5c/0xbc
[223090.262543] [] kvm_mmu_notifier_invalidate_page+0x50/0x8c
[223090.262547] [] __mmu_notifier_invalidate_page+0x5c/0x84
[223090.262551] [] try_to_unmap_one+0x1d0/0x4a0
[223090.262553] [] rmap_walk+0x1cc/0x2e0
[223090.262555] [] try_to_unmap+0x74/0xa4
[223090.262557] [] migrate_pages+0x31c/0x5ac
[223090.262561] [] compact_zone+0x3fc/0x7ac
[223090.262563] [] compact_zone_order+0x94/0xb0
[223090.262564] [] try_to_compact_pages+0x108/0x290
[223090.262569] [] __alloc_pages_direct_compact+0x70/0x1ac
[223090.262571] [] __alloc_pages_nodemask+0x434/0x9f4
[223090.262572] [] alloc_pages_vma+0x230/0x254
[223090.262574] [] do_huge_pmd_anonymous_page+0x114/0x538
[223090.262576] [] handle_mm_fault+0xd40/0x17a4
[223090.262577] [] __get_user_pages+0x12c/0x36c
[223090.262578] [] get_user_pages_unlocked+0xa4/0x1b8
[223090.262579] [] __gfn_to_pfn_memslot+0x280/0x31c
[223090.262580] [] gfn_to_pfn_prot+0x4c/0x5c
[223090.262582] [] kvm_handle_guest_abort+0x240/0x774
[223090.262584] [] handle_exit+0x11c/0x1ac
[223090.262586] [] kvm_arch_vcpu_ioctl_run+0x31c/0x648
[223090.262587] [] kvm_vcpu_ioctl+0x378/0x768
[223090.262590] [] do_vfs_ioctl+0x324/0x5a4
[223090.262591] [] SyS_ioctl+0x90/0xa4
[223090.262595] [] el0_svc_naked+0x38/0x3c

This patch moves the stage2 PGD manipulation under the lock.

Reported-by: Alexander Graf
Cc: Mark Rutland
Cc: Marc Zyngier
Cc: Paolo Bonzini
Cc: Radim Krčmář
Reviewed-by: Christoffer Dall
Reviewed-by: Marc Zyngier
Signed-off-by: Suzuki K Poulose
Signed-off-by: Christoffer Dall
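Illustration (not taken from the patch): moving the PGD manipulation under the lock means the validity check and the reset happen in one critical section, and only the final free happens outside it. The sketch below models the lock with a flag and assertions rather than a real spinlock. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_kvm {
    int lock_held;   /* stands in for kvm->mmu_lock */
    void *pgd;
};

static void sketch_lock(struct sketch_kvm *kvm)
{
    assert(!kvm->lock_held);
    kvm->lock_held = 1;
}

static void sketch_unlock(struct sketch_kvm *kvm)
{
    assert(kvm->lock_held);
    kvm->lock_held = 0;
}

/* Check and clear the PGD in one critical section; free it afterwards. */
static int free_stage2_pgd(struct sketch_kvm *kvm)
{
    void *pgd;

    sketch_lock(kvm);
    pgd = kvm->pgd;         /* read under the lock, not before taking it */
    kvm->pgd = NULL;        /* reset while still holding the lock */
    sketch_unlock(kvm);

    if (!pgd)
        return 0;           /* someone else already tore it down */
    free(pgd);
    return 1;
}
```

Any other thread that takes the lock afterwards sees a NULL PGD immediately, so it cannot start a stage2 operation on tables that are about to be freed.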
09 May, 2017
1 commit
-
…l/git/kvmarm/kvmarm into HEAD
Second round of KVM/ARM Changes for v4.12.
Changes include:
- A fix related to the 32-bit idmap stub
- A fix to the bitmask used to decode the operands of an AArch32 CP
instruction
- We have moved the files shared between arch/arm/kvm and
arch/arm64/kvm to virt/kvm/arm
- We add support for saving/restoring the virtual ITS state to
userspace
04 May, 2017
1 commit
-
For some time now we have been having a lot of shared functionality
between the arm and arm64 KVM support in arch/arm, which not only
required a horrible inter-arch reference from the Makefile in
arch/arm64/kvm, but also created confusion for newcomers to the code
base, as was recently seen on the mailing list.

Further, it causes confusion for things like cscope, which needs special
attention to index specific shared files for arm64 from the arm tree.

Move the shared files into virt/kvm/arm and move the trace points along
with it. When moving the tracepoints we have to modify the way the vgic
creates definitions of the trace points, so we take the chance to
include the VGIC tracepoints in its very own special vgic trace.h file.

Signed-off-by: Christoffer Dall