27 Dec, 2018

2 commits

  • Pull RCU updates from Ingo Molnar:
    "The biggest RCU changes in this cycle were:

    - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

    - Replace calls of RCU-bh and RCU-sched update-side functions to
    their vanilla RCU counterparts. This series is a step towards
    complete removal of the RCU-bh and RCU-sched update-side functions.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - Documentation updates, including a number of flavor-consolidation
    updates from Joel Fernandes.

    - Miscellaneous fixes.

    - Automate generation of the initrd filesystem used for rcutorture
    testing.

    - Convert spin_is_locked() assertions to instead use lockdep.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - SRCU updates, especially including a fix from Dennis Krein for a
    bag-on-head-class bug.

    - RCU torture-test updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits)
    rcutorture: Don't do busted forward-progress testing
    rcutorture: Use 100ms buckets for forward-progress callback histograms
    rcutorture: Recover from OOM during forward-progress tests
    rcutorture: Print forward-progress test age upon failure
    rcutorture: Print time since GP end upon forward-progress failure
    rcutorture: Print histogram of CB invocation at OOM time
    rcutorture: Print GP age upon forward-progress failure
    rcu: Print per-CPU callback counts for forward-progress failures
    rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings
    rcutorture: Dump grace-period diagnostics upon forward-progress OOM
    rcutorture: Prepare for asynchronous access to rcu_fwd_startat
    torture: Remove unnecessary "ret" variables
    rcutorture: Affinity forward-progress test to avoid housekeeping CPUs
    rcutorture: Break up too-long rcu_torture_fwd_prog() function
    rcutorture: Remove cbflood facility
    torture: Bring any extra CPUs online during kernel startup
    rcutorture: Add call_rcu() flooding forward-progress tests
    rcutorture/formal: Replace synchronize_sched() with synchronize_rcu()
    tools/kernel.h: Replace synchronize_sched() with synchronize_rcu()
    net/decnet: Replace rcu_barrier_bh() with rcu_barrier()
    ...

    Linus Torvalds
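
    A few illustrative conversions of the kind this flavor-consolidation
    series performs (the struct, callback, and function names below are
    made-up examples; only the RCU API calls themselves are real):

        struct foo {
                struct rcu_head rcu;
        };

        static void foo_free_cb(struct rcu_head *head)
        {
                kfree(container_of(head, struct foo, rcu));
        }

        static void foo_release(struct foo *p)
        {
                /* was call_rcu_bh() or call_rcu_sched() */
                call_rcu(&p->rcu, foo_free_cb);
        }

        static void foo_teardown(void)
        {
                /* was synchronize_rcu_bh() or synchronize_sched() */
                synchronize_rcu();
                /* was rcu_barrier_bh() or rcu_barrier_sched() */
                rcu_barrier();
        }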
     
  • Pull KVM updates from Paolo Bonzini:
    "ARM:
    - selftests improvements
    - large PUD support for HugeTLB
    - single-stepping fixes
    - improved tracing
    - various timer and vGIC fixes

    x86:
    - Processor Tracing virtualization
    - STIBP support
    - some correctness fixes
    - refactorings and splitting of vmx.c
    - use the Hyper-V range TLB flush hypercall
    - reduce order of vcpu struct
    - WBNOINVD support
    - do not use -ftrace for __noclone functions
    - nested guest support for PAUSE filtering on AMD
    - more Hyper-V enlightenments (direct mode for synthetic timers)

    PPC:
    - nested VFIO

    s390:
    - bugfixes only this time"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
    KVM: x86: Add CPUID support for new instruction WBNOINVD
    kvm: selftests: ucall: fix exit mmio address guessing
    Revert "compiler-gcc: disable -ftracer for __noclone functions"
    KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines
    KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs
    KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
    MAINTAINERS: Add arch/x86/kvm sub-directories to existing KVM/x86 entry
    KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streams
    KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()
    KVM/MMU: Flush tlb directly in kvm_set_pte_rmapp()
    KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte()
    KVM: Make kvm_set_spte_hva() return int
    KVM: Replace old tlb flush function with new one to flush a specified range.
    KVM/MMU: Add tlb flush with range helper function
    KVM/VMX: Add hv tlb range flush support
    x86/hyper-v: Add HvFlushGuestAddressList hypercall support
    KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops
    KVM: x86: Disable Intel PT when VMXON in L1 guest
    KVM: x86: Set intercept for Intel PT MSRs read/write
    KVM: x86: Implement Intel PT MSRs read/write emulation
    ...

    Linus Torvalds
     

26 Dec, 2018

1 commit

  • Pull arm64 festive updates from Will Deacon:
    "In the end, we ended up with quite a lot more than I expected:

    - Support for ARMv8.3 Pointer Authentication in userspace (CRIU and
    kernel-side support to come later)

    - Support for per-thread stack canaries, pending an update to GCC
    that is currently undergoing review

    - Support for kexec_file_load(), which permits secure boot of a kexec
    payload but also happens to improve the performance of kexec
    dramatically because we can avoid the sucky purgatory code from
    userspace. Kdump will come later (requires updates to libfdt).

    - Optimisation of our dynamic CPU feature framework, so that all
    detected features are enabled via a single stop_machine()
    invocation

    - KPTI whitelisting of Cortex-A CPUs unaffected by Meltdown, so that
    they can benefit from global TLB entries when KASLR is not in use

    - 52-bit virtual addressing for userspace (kernel remains 48-bit)

    - Patch in LSE atomics for per-cpu atomic operations

    - Custom preempt.h implementation to avoid unconditional calls to
    preempt_schedule() from preempt_enable()

    - Support for the new 'SB' Speculation Barrier instruction

    - Vectorised implementation of XOR checksumming and CRC32
    optimisations

    - Workaround for Cortex-A76 erratum #1165522

    - Improved compatibility with Clang/LLD

    - Support for TX2 system PMUs for profiling the L3 cache and DMC

    - Reflect read-only permissions in the linear map by default

    - Ensure MMIO reads are ordered with subsequent calls to Xdelay()

    - Initial support for memory hotplug

    - Tweak the threshold when we invalidate the TLB by-ASID, so that
    mremap() performance is improved for ranges spanning multiple PMDs.

    - Minor refactoring and cleanups"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (125 commits)
    arm64: kaslr: print PHYS_OFFSET in dump_kernel_offset()
    arm64: sysreg: Use _BITUL() when defining register bits
    arm64: cpufeature: Rework ptr auth hwcaps using multi_entry_cap_matches
    arm64: cpufeature: Reduce number of pointer auth CPU caps from 6 to 4
    arm64: docs: document pointer authentication
    arm64: ptr auth: Move per-thread keys from thread_info to thread_struct
    arm64: enable pointer authentication
    arm64: add prctl control for resetting ptrauth keys
    arm64: perf: strip PAC when unwinding userspace
    arm64: expose user PAC bit positions via ptrace
    arm64: add basic pointer authentication support
    arm64/cpufeature: detect pointer authentication
    arm64: Don't trap host pointer auth use to EL2
    arm64/kvm: hide ptrauth from guests
    arm64/kvm: consistently handle host HCR_EL2 flags
    arm64: add pointer authentication register bits
    arm64: add comments about EC exception levels
    arm64: perf: Treat EXCLUDE_EL* bit definitions as unsigned
    arm64: kpti: Whitelist Cortex-A CPUs that don't implement the CSV3 field
    arm64: enable per-task stack canaries
    ...

    Linus Torvalds
     

21 Dec, 2018

5 commits

  • This patch moves the tlb flush in kvm_set_pte_rmapp() to
    kvm_mmu_notifier_change_pte() in order to avoid a redundant tlb flush.

    Signed-off-by: Lan Tianyu
    Signed-off-by: Paolo Bonzini

    Lan Tianyu
     
  • This patch makes kvm_set_spte_hva() return int so the caller can
    check the return value to determine whether to flush the tlb or not.

    Signed-off-by: Lan Tianyu
    Acked-by: Paul Mackerras
    Signed-off-by: Paolo Bonzini

    Lan Tianyu
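
    Taken together, the two commits above let the generic change_pte MMU
    notifier decide when to flush; a simplified sketch of the resulting
    pattern (not the literal diff):

        spin_lock(&kvm->mmu_lock);
        kvm->mmu_notifier_seq++;
        if (kvm_set_spte_hva(kvm, address, pte))
                kvm_flush_remote_tlbs(kvm);   /* flush only when the arch hook asks */
        spin_unlock(&kvm->mmu_lock);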
     
  • Signed-off-by: Wei Yang
    [Preserved the iff and a probably intentional weird bracket notation.
    Also dropped the style change to make a single-purpose patch. - Radim]
    Signed-off-by: Radim Krčmář

    Wei Yang
     
  • Since the offset is added directly to the hva from the
    gfn_to_hva_cache, a negative offset could result in an out of bounds
    write. The existing BUG_ON only checks for addresses beyond the end of
    the gfn_to_hva_cache, not for addresses before the start of the
    gfn_to_hva_cache.

    Note that all current call sites have non-negative offsets.

    Fixes: 4ec6e8636256 ("kvm: Introduce kvm_write_guest_offset_cached()")
    Reported-by: Cfir Cohen
    Signed-off-by: Jim Mattson
    Reviewed-by: Cfir Cohen
    Reviewed-by: Peter Shier
    Reviewed-by: Krish Sadhukhan
    Reviewed-by: Sean Christopherson
    Signed-off-by: Radim Krčmář

    Jim Mattson
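
    A sketch of the stricter validation being described, assuming the usual
    gfn_to_hva_cache fields (illustrative, not the literal patch):

        /* Reject any (offset, len) pair that does not fall entirely inside the
         * cached window; with unsigned arithmetic this also catches offsets
         * that would land before the start of the cache. */
        if (offset >= ghc->len || len > ghc->len - offset)
                return -EINVAL;

        if (__copy_to_user((void __user *)ghc->hva + offset, data, len))
                return -EFAULT;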
     
  • Previously, in the case where (gpa + len) wrapped around, the entire
    region was not validated, as the comment claimed. It doesn't actually
    seem that wraparound should be allowed here at all.

    Furthermore, since some callers don't check the return code from this
    function, it seems prudent to clear ghc->memslot in the event of an
    error.

    Fixes: 8f964525a121f ("KVM: Allow cross page reads and writes from cached translations.")
    Reported-by: Cfir Cohen
    Signed-off-by: Jim Mattson
    Reviewed-by: Cfir Cohen
    Reviewed-by: Marc Orr
    Cc: Andrew Honig
    Signed-off-by: Radim Krčmář

    Jim Mattson
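
    A sketch of the two behaviours described above, as they would appear when
    initialising the cache (illustrative only):

        /* Refuse a region whose end wraps around the guest address space, and
         * poison the cache on any error so callers that ignore the return
         * code still fail safely on later accesses. */
        if (gpa + len < gpa) {
                ghc->memslot = NULL;
                return -EINVAL;
        }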
     

20 Dec, 2018

9 commits

  • …marm/kvmarm into HEAD

    KVM/arm updates for 4.21

    - Large PUD support for HugeTLB
    - Single-stepping fixes
    - Improved tracing
    - Various timer and vgic fixups

    Paolo Bonzini
     
  • 32 and 64bit use different symbols to identify the traps.
    32bit has a fine grained approach (prefetch abort, data abort and HVC),
    while 64bit is pretty happy with just "trap".

    This has been fine so far, except that we now need to decode some
    of that in tracepoints that are common to both architectures.

    Introduce ARM_EXCEPTION_IS_TRAP which abstracts the trap symbols
    and make the tracepoint use it.

    Acked-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • There are two things we need to take care of when we create block
    mappings in the stage 2 page tables:

    (1) The alignment within a PMD between the host address range and the
    guest IPA range must be the same, since otherwise we end up mapping
    pages with the wrong offset.

    (2) The head and tail of a memory slot may not cover a full block
    size, and we have to take care to not map those with block
    descriptors, since we could expose memory to the guest that the host
    did not intend to expose.

    So far, we have been taking care of (1), but not (2), and our commentary
    describing (1) was somewhat confusing.

    This commit attempts to factor out the checks of both into a common
    function, and if we don't pass the check, we won't attempt any PMD
    mappings for either hugetlbfs or THP.

    Note that we used to only check the alignment for THP, not for
    hugetlbfs, but as far as I can tell the check needs to be applied to
    both scenarios.

    Cc: Ralph Palutke
    Cc: Lukas Braun
    Reported-by: Lukas Braun
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
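
    A simplified sketch of the common check described above (the helper name
    and exact arithmetic are illustrative, not the code from the patch):

        static bool fault_supports_block_mapping(struct kvm_memory_slot *memslot,
                                                 unsigned long hva, phys_addr_t ipa,
                                                 unsigned long block_size)
        {
                unsigned long uaddr_start = memslot->userspace_addr;
                unsigned long uaddr_end = uaddr_start + (memslot->npages << PAGE_SHIFT);

                /* (1) host VA and guest IPA must share the same offset within the block */
                if ((hva & (block_size - 1)) != (ipa & (block_size - 1)))
                        return false;

                /* (2) the block containing hva must lie entirely inside the memslot, so
                 *     the head and tail of the slot never get block descriptors */
                return (hva & ~(block_size - 1)) >= uaddr_start &&
                       (hva & ~(block_size - 1)) + block_size <= uaddr_end;
        }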
     
  • We currently only halt the guest when a vCPU messes with the active
    state of an SPI. This is perfectly fine for GICv2, but isn't enough
    for GICv3, where all vCPUs can access the state of any other vCPU.

    Let's broaden the condition to include any GICv3 interrupt that
    has an active state (i.e. all but LPIs).

    Cc: stable@vger.kernel.org
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • kvm_timer_vcpu_terminate can only be called in two scenarios:

    1. As part of cleanup during a failed VCPU create
    2. As part of freeing the whole VM (struct kvm refcount == 0)

    In the first case, we cannot have programmed any timers or mapped any
    IRQs, and therefore we do not have to cancel anything or unmap anything.

    In the second case, the VCPU will have gone through kvm_timer_vcpu_put,
    which will have canceled the emulated physical timer's hrtimer, and we
    do not need to do that here as well. We also do not care if the irq is
    recorded as mapped or not in the VGIC data structure, because the whole
    VM is going away. That leaves us only with having to ensure that we
    cancel the bg_timer if we were blocking the last time we called
    kvm_timer_vcpu_put().

    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
     
  • The use of a work queue in the hrtimer expire function for the bg_timer
    is a leftover from the time when we would inject interrupts when the
    bg_timer expired.

    Since we are no longer doing that, we can instead call
    kvm_vcpu_wake_up() directly from the hrtimer function and remove all
    workqueue functionality from the arch timer code.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
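
    A hedged sketch of the resulting expiry path (structure and field names
    follow the arch timer code of that era, but treat it as illustrative):

        static enum hrtimer_restart bg_timer_expire(struct hrtimer *hrt)
        {
                struct arch_timer_cpu *timer;
                struct kvm_vcpu *vcpu;

                timer = container_of(hrt, struct arch_timer_cpu, bg_timer);
                vcpu = container_of(timer, struct kvm_vcpu, arch.timer_cpu);

                /* Wake the blocked vCPU directly; no work item needed. */
                kvm_vcpu_wake_up(vcpu);
                return HRTIMER_NORESTART;
        }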
     
  • The kvm_exit tracepoint strangely always reported exits as being IRQs.
    This seems to be because either the __print_symbolic or the tracepoint
    macros use a variable named idx.

    Take this chance to update the fields in the tracepoint to reflect the
    concepts in the arm64 architecture that we pass to the tracepoint and
    move the exception type table to the same location and header files as
    the exits code.

    We also clear out the exception code to 0 for IRQ exits (which
    translates to UNKNOWN in text) to make it slightly less confusing to
    parse the trace output.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
     
  • When checking if there are any pending IRQs for the VM, consider the
    active state and priority of the IRQs as well.

    Otherwise we could be continuously scheduling a guest hypervisor without
    it seeing an IRQ.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
     
  • When using the nospec API, it should be taken into account that:

    "...if the CPU speculates past the bounds check then
    * array_index_nospec() will clamp the index within the range of [0,
    * size)."

    The above is part of the header for macro array_index_nospec() in
    linux/nospec.h

    Now, in this particular case, if intid evaluates to exactly VGIC_MAX_SPI
    or to exactly VGIC_MAX_PRIVATE, the array_index_nospec() macro ends up
    returning VGIC_MAX_SPI - 1 or VGIC_MAX_PRIVATE - 1 respectively, instead
    of VGIC_MAX_SPI or VGIC_MAX_PRIVATE, which, based on the original logic:

    /* SGIs and PPIs */
    if (intid <= VGIC_MAX_PRIVATE)
        return &vcpu->arch.vgic_cpu.private_irqs[intid];

    /* SPIs */
    if (intid <= VGIC_MAX_SPI)
        return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];

    are valid values for intid.

    Fix this by calling array_index_nospec() macro with VGIC_MAX_PRIVATE + 1
    and VGIC_MAX_SPI + 1 as arguments for its parameter size.

    Fixes: 41b87599c743 ("KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Gustavo A. R. Silva
    [dropped the SPI part which was fixed separately]
    Signed-off-by: Marc Zyngier

    Gustavo A. R. Silva
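
    The corrected clamping for the private interrupts, roughly (a sketch of
    the fix described above; array_index_nospec() takes the array size, i.e.
    the largest valid index plus one):

        /* SGIs and PPIs */
        if (intid <= VGIC_MAX_PRIVATE) {
                intid = array_index_nospec(intid, VGIC_MAX_PRIVATE + 1);
                return &vcpu->arch.vgic_cpu.private_irqs[intid];
        }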
     

19 Dec, 2018

1 commit

  • If you register a kvm_coalesced_mmio_zone with '.pio = 0' but then
    unregister it with '.pio = 1', KVM_UNREGISTER_COALESCED_MMIO will try to
    unregister it from KVM_PIO_BUS rather than KVM_MMIO_BUS, which is a
    no-op. But it frees the kvm_coalesced_mmio_dev anyway, causing a
    use-after-free.

    Fix it by only unregistering and freeing the zone if the correct value
    of 'pio' is provided.

    Reported-by: syzbot+f87f60bb6f13f39b54e3@syzkaller.appspotmail.com
    Fixes: 0804c849f1df ("kvm/x86 : add coalesced pio support")
    Signed-off-by: Eric Biggers
    Signed-off-by: Paolo Bonzini

    Eric Biggers
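
    A sketch of the guard being described, as it would sit in the unregister
    path (the surrounding loop is simplified and illustrative):

        list_for_each_entry_safe(dev, tmp, &kvm->coalesced_zones, list) {
                /* Only tear down a zone that was registered with the same
                 * 'pio' flag the caller is now passing. */
                if (zone->pio != dev->zone.pio)
                        continue;
                if (!coalesced_mmio_in_range(dev, zone->addr, zone->size))
                        continue;

                kvm_io_bus_unregister_dev(kvm,
                                zone->pio ? KVM_PIO_BUS : KVM_MMIO_BUS, &dev->dev);
                kvm_iodevice_destructor(&dev->dev);
        }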
     

18 Dec, 2018

15 commits

  • SPIs should be checked against the VM's specific configuration, and
    not the architectural maximum.

    Cc: stable@vger.kernel.org
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • In attempting to re-construct the logic for our stage 2 page table
    layout I found the reasoning in the comment explaining how we calculate
    the number of levels used for stage 2 page tables a bit backwards.

    This commit attempts to clarify the comment, to make it slightly easier
    to read without having the Arm ARM open on the right page.

    While we're at it, fixup a typo in a comment that was recently changed.

    Reviewed-by: Suzuki K Poulose
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
     
  • To change the active state of an interrupt via MMIO, halt is requested for all vcpus of
    the affected guest before modifying the IRQ state. This is done by calling
    cond_resched_lock() in vgic_mmio_change_active(). However interrupts are
    disabled at this point and we cannot reschedule a vcpu.

    We actually don't need any of this, as kvm_arm_halt_guest ensures that
    all the other vcpus are out of the guest. Let's just drop that useless
    code.

    Signed-off-by: Julien Thierry
    Suggested-by: Christoffer Dall
    Cc: stable@vger.kernel.org
    Signed-off-by: Marc Zyngier

    Julien Thierry
     
  • KVM only supports PMD hugepages at stage 2. Now that the various page
    handling routines are updated, extend the stage 2 fault handling to
    map in PUD hugepages.

    Addition of PUD hugepage support enables additional page sizes (e.g.,
    1G with 4K granule) which can be useful on cores that support mapping
    larger block sizes in the TLB entries.

    Signed-off-by: Punit Agrawal
    Reviewed-by: Christoffer Dall
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    [ Replace BUG() => WARN_ON(1) for arm32 PUD helpers ]
    Signed-off-by: Suzuki Poulose
    Signed-off-by: Marc Zyngier

    Punit Agrawal
     
  • In preparation for creating larger hugepages at Stage 2, add support
    to the age handling notifiers for PUD hugepages when encountered.

    Provide trivial helpers for arm32 to allow sharing code.

    Signed-off-by: Punit Agrawal
    Reviewed-by: Christoffer Dall
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    [ Replaced BUG() => WARN_ON(1) for arm32 PUD helpers ]
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Marc Zyngier

    Punit Agrawal
     
  • In preparation for creating larger hugepages at Stage 2, extend the
    access fault handling at Stage 2 to support PUD hugepages when
    encountered.

    Provide trivial helpers for arm32 to allow sharing of code.

    Signed-off-by: Punit Agrawal
    Reviewed-by: Christoffer Dall
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    [ Replaced BUG() => WARN_ON(1) in PUD helpers ]
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Marc Zyngier

    Punit Agrawal
     
  • In preparation for creating PUD hugepages at stage 2, add support for
    detecting execute permissions on PUD page table entries. Faults due to
    lack of execute permissions on page table entries is used to perform
    i-cache invalidation on first execute.

    Provide trivial implementations of arm32 helpers to allow sharing of
    code.

    Signed-off-by: Punit Agrawal
    Reviewed-by: Christoffer Dall
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    [ Replaced BUG() => WARN_ON(1) in arm32 PUD helpers ]
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Marc Zyngier

    Punit Agrawal
     
  • In preparation for creating PUD hugepages at stage 2, add support for
    write protecting PUD hugepages when they are encountered. Write
    protecting guest tables is used to track dirty pages when migrating
    VMs.

    Also, provide trivial implementations of required kvm_s2pud_* helpers
    to allow sharing of code with arm32.

    Signed-off-by: Punit Agrawal
    Reviewed-by: Christoffer Dall
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    [ Replaced BUG() => WARN_ON() in arm32 pud helpers ]
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Marc Zyngier

    Punit Agrawal
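
    Since arm32 has no stage 2 PUD level, the kvm_s2pud_* helpers mentioned
    above can be simple stubs there; an illustrative sketch (exact names and
    bodies may differ from the patch):

        /* arm32: stage 2 never uses PUD entries, so these are unreachable;
         * WARN rather than BUG, as the bracketed note above says. */
        static inline void kvm_set_s2pud_readonly(pud_t *pudp)
        {
                WARN_ON(1);
        }

        static inline bool kvm_s2pud_readonly(pud_t *pudp)
        {
                WARN_ON(1);
                return false;
        }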
     
  • Introduce helpers to abstract architectural handling of the conversion
    of pfn to page table entries and marking a PMD page table entry as a
    block entry.

    The helpers are introduced in preparation for supporting PUD hugepages
    at stage 2 - which are supported on arm64 but do not exist on arm.

    Signed-off-by: Punit Agrawal
    Reviewed-by: Suzuki K Poulose
    Acked-by: Christoffer Dall
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Reviewed-by: Marc Zyngier
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Marc Zyngier

    Punit Agrawal
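
    On arm64 the abstractions described above can simply wrap the generic
    page-table constructors; a hedged sketch (the macro names are modelled on
    the description, not quoted from the patch):

        /* pfn -> page table entry, and marking a PMD as a block entry */
        #define kvm_pfn_pte(pfn, prot)   pfn_pte(pfn, prot)
        #define kvm_pfn_pmd(pfn, prot)   pfn_pmd(pfn, prot)
        #define kvm_pmd_mkhuge(pmd)      pmd_mkhuge(pmd)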
     
  • The stage 2 fault handler marks a page as executable if it is handling an
    execution fault, or if it was a permission fault, in which case the
    executable bit needs to be preserved.

    The logic to decide if the page should be marked executable is
    duplicated for PMD and PTE entries. To avoid creating another copy
    when support for PUD hugepages is introduced refactor the code to
    share the checks needed to mark a page table entry as executable.

    Signed-off-by: Punit Agrawal
    Reviewed-by: Suzuki K Poulose
    Reviewed-by: Christoffer Dall
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Marc Zyngier

    Punit Agrawal
     
  • The code for operations such as marking the pfn as dirty, and
    dcache/icache maintenance during stage 2 fault handling is duplicated
    between normal pages and PMD hugepages.

    Instead of creating another copy of the operations when we introduce
    PUD hugepages, let's share them across the different pagesizes.

    Signed-off-by: Punit Agrawal
    Reviewed-by: Suzuki K Poulose
    Reviewed-by: Christoffer Dall
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Marc Zyngier

    Punit Agrawal
     
  • When restoring the active state from userspace, we don't know which CPU
    was the source for the active state, and this is not architecturally
    exposed in any of the register state.

    Set the active_source to 0 in this case. In the future, we can expand
    on this and expose the information as additional information to
    userspace for GICv2 if anyone cares.

    Cc: stable@vger.kernel.org
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
     
  • We recently addressed a VMID generation race by introducing a read/write
    lock around accesses and updates to the vmid generation values.

    However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but
    does so without taking the read lock.

    As far as I can tell, this can lead to the same kind of race:

    VM 0, VCPU 0                       VM 0, VCPU 1
    ------------                       ------------
    update_vttbr (vmid 254)
                                       update_vttbr (vmid 1) // roll over
                                       read_lock(kvm_vmid_lock);
                                       force_vm_exit()
    local_irq_disable
    need_new_vmid_gen == false
    // because vmid gen matches

    enter_guest (vmid 254)
                                       kvm_arch.vttbr = <PGD>:<VMID 1>
                                       read_unlock(kvm_vmid_lock);

                                       enter_guest (vmid 1)

    Which results in running two VCPUs in the same VM with different VMIDs
    and (even worse) other VCPUs from other VMs could now allocate clashing
    VMID 254 from the new generation as long as VCPU 0 is not exiting.

    Attempt to solve this by making sure vttbr is updated before another CPU
    can observe the updated VMID generation.

    Cc: stable@vger.kernel.org
    Fixes: f0cf47d939d0 "KVM: arm/arm64: Close VMID generation race"
    Reviewed-by: Julien Thierry
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
     
  • When we emulate a guest instruction, we don't advance the hardware
    singlestep state machine, and thus the guest will receive a software
    step exception after the next instruction that is not emulated by the
    host.

    We bodge around this in an ad-hoc fashion. Sometimes we explicitly check
    whether userspace requested a single step, and fake a debug exception
    from within the kernel. Other times, we advance the HW singlestep state and
    rely on the HW to generate the exception for us. Thus, the observed step
    behaviour differs for host and guest.

    Let's make this simpler and consistent by always advancing the HW
    singlestep state machine when we skip an instruction. Thus we can rely
    on the hardware to generate the singlestep exception for us, and never
    need to explicitly check for an active-pending step, nor do we need to
    fake a debug exception from the guest.

    Cc: Peter Maydell
    Reviewed-by: Alex Bennée
    Reviewed-by: Christoffer Dall
    Signed-off-by: Mark Rutland
    Signed-off-by: Marc Zyngier

    Mark Rutland
     
  • When we emulate an MMIO instruction, we advance the CPU state within
    decode_hsr(), before emulating the instruction effects.

    Having this logic in decode_hsr() is opaque, and advancing the state
    before emulation is problematic. It gets in the way of applying
    consistent single-step logic, and it prevents us from being able to fail
    an MMIO instruction with a synchronous exception.

    Clean this up by only advancing the CPU state *after* the effects of the
    instruction are emulated.

    Cc: Peter Maydell
    Reviewed-by: Alex Bennée
    Reviewed-by: Christoffer Dall
    Signed-off-by: Mark Rutland
    Signed-off-by: Marc Zyngier

    Mark Rutland
     

14 Dec, 2018

3 commits

  • There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
    it can take kvm->mmu_lock for an extended period of time. Second, its user
    can actually see many false positives in some cases. The latter is due
    to a benign race like this:

    1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
    them.
    2. The guest modifies the pages, causing them to be marked dirty.
    3. Userspace actually copies the pages.
    4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
    they were not written to since (3).

    This is especially a problem for large guests, where the time between
    (1) and (3) can be substantial. This patch introduces a new
    capability which, when enabled, makes KVM_GET_DIRTY_LOG not
    write-protect the pages it returns. Instead, userspace has to
    explicitly clear the dirty log bits just before using the content
    of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
    64-page granularity rather than requiring a sync of a full memslot;
    this way, the mmu_lock is taken for small amounts of time, and
    only a small amount of time will pass between write protection
    of pages and the sending of their content.

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
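
    A hedged userspace sketch of the new flow (the ioctl and structure come
    from this series; error handling is omitted, the 64-page alignment rules
    are simplified, and enabling of the manual-protect capability via
    KVM_ENABLE_CAP is assumed to have happened at VM creation):

        #include <linux/kvm.h>
        #include <sys/ioctl.h>

        /* One pass over a memslot during migration. */
        static void sync_slot(int vm_fd, __u32 slot, void *bitmap, __u32 npages)
        {
                struct kvm_dirty_log get = {
                        .slot = slot,
                        .dirty_bitmap = bitmap,
                };
                struct kvm_clear_dirty_log clear = {
                        .slot = slot,
                        .first_page = 0,
                        .num_pages = npages,      /* 64-page granularity */
                        .dirty_bitmap = bitmap,
                };

                ioctl(vm_fd, KVM_GET_DIRTY_LOG, &get);   /* no longer write-protects */
                /* ...copy out the pages reported dirty in 'bitmap'... */
                ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear); /* re-protect just these */
        }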
     
  • Once manual dirty log reprotection is enabled, kvm_get_dirty_log_protect's
    pointer argument will always be false on exit, because no TLB flush is needed
    until the manual re-protection operation. Rename it from "is_dirty" to "flush",
    which more accurately tells the caller what they have to do with it.

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • The first such capability to be handled in virt/kvm/ will be manual
    dirty page reprotection.

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

10 Dec, 2018

1 commit

  • An SVE system is so far the only case where we mandate VHE. As we're
    starting to grow these requirements, let's slightly rework the way we
    deal with that situation, allowing for easy extension of this check.

    Acked-by: Christoffer Dall
    Reviewed-by: James Morse
    Signed-off-by: Marc Zyngier
    Signed-off-by: Will Deacon

    Marc Zyngier
     

13 Nov, 2018

1 commit

  • lockdep_assert_held() is better suited to checking locking requirements,
    since it only checks if the current thread holds the lock regardless of
    whether someone else does. This is also a step towards possibly removing
    spin_is_locked().

    Signed-off-by: Lance Roy
    Cc: Marc Zyngier
    Cc: Eric Auger
    Cc: linux-arm-kernel@lists.infradead.org
    Cc:
    Signed-off-by: Paul E. McKenney
    Acked-by: Christoffer Dall

    Lance Roy
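
    The shape of the conversion, in generic terms (illustrative; the exact
    assertion wrappers used in the vgic code differ slightly):

        /* Before: only checks that *somebody* holds the lock. */
        BUG_ON(!spin_is_locked(&irq->irq_lock));

        /* After: checks that the *current* context holds it, and compiles
         * away when lockdep is not configured. */
        lockdep_assert_held(&irq->irq_lock);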
     

27 Oct, 2018

1 commit

  • Revert 5ff7091f5a2ca ("mm, mmu_notifier: annotate mmu notifiers with
    blockable invalidate callbacks").

    MMU_INVALIDATE_DOES_NOT_BLOCK flags was the only one used and it is no
    longer needed since 93065ac753e4 ("mm, oom: distinguish blockable mode for
    mmu notifiers"). We now have a full support for per range !blocking
    behavior so we can drop the stop gap workaround which the per notifier
    flag was used for.

    Link: http://lkml.kernel.org/r/20180827112623.8992-4-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Cc: David Rientjes
    Cc: Boris Ostrovsky
    Cc: Jerome Glisse
    Cc: Juergen Gross
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

26 Oct, 2018

1 commit

  • Pull KVM updates from Radim Krčmář:
    "ARM:
    - Improved guest IPA space support (32 to 52 bits)

    - RAS event delivery for 32bit

    - PMU fixes

    - Guest entry hardening

    - Various cleanups

    - Port of dirty_log_test selftest

    PPC:
    - Nested HV KVM support for radix guests on POWER9. The performance
    is much better than with PR KVM. Migration and arbitrary level of
    nesting is supported.

    - Disable nested HV-KVM on early POWER9 chips that need a particular
    hardware bug workaround

    - One VM per core mode to prevent potential data leaks

    - PCI pass-through optimization

    - merge ppc-kvm topic branch and kvm-ppc-fixes to get a better base

    s390:
    - Initial version of AP crypto virtualization via vfio-mdev

    - Improvement for vfio-ap

    - Set the host program identifier

    - Optimize page table locking

    x86:
    - Enable nested virtualization by default

    - Implement Hyper-V IPI hypercalls

    - Improve #PF and #DB handling

    - Allow guests to use Enlightened VMCS

    - Add migration selftests for VMCS and Enlightened VMCS

    - Allow coalesced PIO accesses

    - Add an option to perform nested VMCS host state consistency check
    through hardware

    - Automatic tuning of lapic_timer_advance_ns

    - Many fixes, minor improvements, and cleanups"

    * tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
    KVM/nVMX: Do not validate that posted_intr_desc_addr is page aligned
    Revert "kvm: x86: optimize dr6 restore"
    KVM: PPC: Optimize clearing TCEs for sparse tables
    x86/kvm/nVMX: tweak shadow fields
    selftests/kvm: add missing executables to .gitignore
    KVM: arm64: Safety check PSTATE when entering guest and handle IL
    KVM: PPC: Book3S HV: Don't use streamlined entry path on early POWER9 chips
    arm/arm64: KVM: Enable 32 bits kvm vcpu events support
    arm/arm64: KVM: Rename function kvm_arch_dev_ioctl_check_extension()
    KVM: arm64: Fix caching of host MDCR_EL2 value
    KVM: VMX: enable nested virtualization by default
    KVM/x86: Use 32bit xor to clear registers in svm.c
    kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD
    kvm: vmx: Defer setting of DR6 until #DB delivery
    kvm: x86: Defer setting of CR2 until #PF delivery
    kvm: x86: Add payload operands to kvm_multiple_exception
    kvm: x86: Add exception payload fields to kvm_vcpu_events
    kvm: x86: Add has_payload and payload to kvm_queued_exception
    KVM: Documentation: Fix omission in struct kvm_vcpu_events
    KVM: selftests: add Enlightened VMCS test
    ...

    Linus Torvalds