Eric Lee / smarc-fsl-linux-kernel

23 Aug, 2018

3 commits

b37211531 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

Pull second set of KVM updates from Paolo Bonzini:
"ARM:
- Support for Group0 interrupts in guests
- Cache management optimizations for ARMv8.4 systems
- Userspace interface for RAS
- Fault path optimization
- Emulated physical timer fixes
- Random cleanups

x86:
- fixes for L1TF
- a new test case
- non-support for SGX (inject the right exception in the guest)
- fix lockdep false positive"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (49 commits)
KVM: VMX: fixes for vmentry_l1d_flush module parameter
kvm: selftest: add dirty logging test
kvm: selftest: pass in extra memory when create vm
kvm: selftest: include the tools headers
kvm: selftest: unify the guest port macros
tools: introduce test_and_clear_bit
KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled
KVM: vmx: Inject #UD for SGX ENCLS instruction in guest
KVM: vmx: Add defines for SGX ENCLS exiting
x86/kvm/vmx: Fix coding style in vmx_setup_l1d_flush()
x86: kvm: avoid unused variable warning
KVM: Documentation: rename the capability of KVM_CAP_ARM_SET_SERROR_ESR
KVM: arm/arm64: Skip updating PTE entry if no change
KVM: arm/arm64: Skip updating PMD entry if no change
KVM: arm: Use true and false for boolean values
KVM: arm/arm64: vgic: Do not use spin_lock_irqsave/restore with irq disabled
KVM: arm/arm64: vgic: Move DEBUG_SPINLOCK_BUG_ON to vgic.h
KVM: arm: vgic-v3: Add support for ICC_SGI0R and ICC_ASGI1R accesses
KVM: arm64: vgic-v3: Add support for ICC_SGI0R_EL1 and ICC_ASGI1R_EL1 accesses
KVM: arm/arm64: vgic-v3: Add core support for Group0 SGIs
...

Linus Torvalds
2018-08-23 04:52:44 +0800
cd9b44f90 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge more updates from Andrew Morton:

- the rest of MM

- procfs updates

- various misc things

- more y2038 fixes

- get_maintainer updates

- lib/ updates

- checkpatch updates

- various epoll updates

- autofs updates

- hfsplus

- some reiserfs work

- fatfs updates

- signal.c cleanups

- ipc/ updates

* emailed patches from Andrew Morton : (166 commits)
ipc/util.c: update return value of ipc_getref from int to bool
ipc/util.c: further variable name cleanups
ipc: simplify ipc initialization
ipc: get rid of ids->tables_initialized hack
lib/rhashtable: guarantee initial hashtable allocation
lib/rhashtable: simplify bucket_table_alloc()
ipc: drop ipc_lock()
ipc/util.c: correct comment in ipc_obtain_object_check
ipc: rename ipcctl_pre_down_nolock()
ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid()
ipc: reorganize initialization of kern_ipc_perm.seq
ipc: compute kern_ipc_perm.id under the ipc lock
init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE
fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp
adfs: use timespec64 for time conversion
kernel/sysctl.c: fix typos in comments
drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md
fork: don't copy inconsistent signal handler state to child
signal: make get_signal() return bool
signal: make sigkill_pending() return bool
...

Linus Torvalds
2018-08-23 03:34:08 +0800
93065ac75 mm, oom: distinguish blockable mode for mmu notifiers ... Browse Code »

There are several blockable mmu notifiers which might sleep in
mmu_notifier_invalidate_range_start and that is a problem for the
oom_reaper because it needs to guarantee a forward progress so it cannot
depend on any sleepable locks.

Currently we simply back off and mark an oom victim with blockable mmu
notifiers as done after a short sleep. That can result in selecting a new
oom victim prematurely because the previous one still hasn't torn its
memory down yet.

We can do much better though. Even if mmu notifiers use sleepable locks
there is no reason to automatically assume those locks are held. Moreover
majority of notifiers only care about a portion of the address space and
there is absolutely zero reason to fail when we are unmapping an unrelated
range. Many notifiers do really block and wait for HW which is harder to
handle and we have to bail out though.

This patch handles the low hanging fruit.
__mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
are not allowed to sleep if the flag is set to false. This is achieved by
using trylock instead of the sleepable lock for most callbacks and
continue as long as we do not block down the call chain.

I think we can improve that even further because there is a common pattern
to do a range lookup first and then do something about that. The first
part can be done without a sleeping lock in most cases AFAICS.

The oom_reaper end then simply retries if there is at least one notifier
which couldn't make any progress in !blockable mode. A retry loop is
already implemented to wait for the mmap_sem and this is basically the
same thing.

The simplest way for driver developers to test this code path is to wrap
userspace code which uses these notifiers into a memcg and set the hard
limit to hit the oom. This can be done e.g. after the test faults in all
the mmu notifier managed memory and set the hard limit to something really
small. Then we are looking for a proper process tear down.

[akpm@linux-foundation.org: coding style fixes]
[akpm@linux-foundation.org: minor code simplification]
Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
Signed-off-by: Michal Hocko
Acked-by: Christian König # AMD notifiers
Acked-by: Leon Romanovsky # mlx and umem_odp
Reported-by: David Rientjes
Cc: "David (ChunMing) Zhou"
Cc: Paolo Bonzini
Cc: Alex Deucher
Cc: David Airlie
Cc: Jani Nikula
Cc: Joonas Lahtinen
Cc: Rodrigo Vivi
Cc: Doug Ledford
Cc: Jason Gunthorpe
Cc: Mike Marciniszyn
Cc: Dennis Dalessandro
Cc: Sudeep Dutt
Cc: Ashutosh Dixit
Cc: Dimitri Sivanich
Cc: Boris Ostrovsky
Cc: Juergen Gross
Cc: "Jérôme Glisse"
Cc: Andrea Arcangeli
Cc: Felix Kuehling
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2018-08-23 01:52:44 +0800

22 Aug, 2018

2 commits

631989303 Merge tag 'kvmarm-for-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kv… ... Browse Code »

…marm/kvmarm into HEAD

KVM/arm updates for 4.19

- Support for Group0 interrupts in guests
- Cache management optimizations for ARMv8.4 systems
- Userspace interface for RAS, allowing error retrival and injection
- Fault path optimization
- Emulated physical timer fixes
- Random cleanups

Paolo Bonzini
2018-08-22 20:07:56 +0800
0214f46b3 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/eb… ... Browse Code »

…iederm/user-namespace

Pull core signal handling updates from Eric Biederman:
"It was observed that a periodic timer in combination with a
sufficiently expensive fork could prevent fork from every completing.
This contains the changes to remove the need for that restart.

This set of changes is split into several parts:

- The first part makes PIDTYPE_TGID a proper pid type instead
something only for very special cases. The part starts using
PIDTYPE_TGID enough so that in __send_signal where signals are
actually delivered we know if the signal is being sent to a a group
of processes or just a single process.

- With that prep work out of the way the logic in fork is modified so
that fork logically makes signals received while it is running
appear to be received after the fork completes"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits)
signal: Don't send signals to tasks that don't exist
signal: Don't restart fork when signals come in.
fork: Have new threads join on-going signal group stops
fork: Skip setting TIF_SIGPENDING in ptrace_init_task
signal: Add calculate_sigpending()
fork: Unconditionally exit if a fatal signal is pending
fork: Move and describe why the code examines PIDNS_ADDING
signal: Push pid type down into complete_signal.
signal: Push pid type down into __send_signal
signal: Push pid type down into send_signal
signal: Pass pid type into do_send_sig_info
signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task
signal: Pass pid type into group_send_sig_info
signal: Pass pid and pid type into send_sigqueue
posix-timers: Noralize good_sigevent
signal: Use PIDTYPE_TGID to clearly store where file signals will be sent
pid: Implement PIDTYPE_TGID
pids: Move the pgrp and session pid pointers from task_struct to signal_struct
kvm: Don't open code task_pid in kvm_vcpu_ioctl
pids: Compute task_tgid using signal->leader_pid
...

Linus Torvalds
2018-08-22 04:47:29 +0800

20 Aug, 2018

1 commit

e61cf2e3a Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

Pull first set of KVM updates from Paolo Bonzini:
"PPC:
- minor code cleanups

x86:
- PCID emulation and CR3 caching for shadow page tables
- nested VMX live migration
- nested VMCS shadowing
- optimized IPI hypercall
- some optimizations

ARM will come next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (85 commits)
kvm: x86: Set highest physical address bits in non-present/reserved SPTEs
KVM/x86: Use CC_SET()/CC_OUT in arch/x86/kvm/vmx.c
KVM: X86: Implement PV IPIs in linux guest
KVM: X86: Add kvm hypervisor init time platform setup callback
KVM: X86: Implement "send IPI" hypercall
KVM/x86: Move X86_CR4_OSXSAVE check into kvm_valid_sregs()
KVM: x86: Skip pae_root shadow allocation if tdp enabled
KVM/MMU: Combine flushing remote tlb in mmu_set_spte()
KVM: vmx: skip VMWRITE of HOST_{FS,GS}_BASE when possible
KVM: vmx: skip VMWRITE of HOST_{FS,GS}_SEL when possible
KVM: vmx: always initialize HOST_{FS,GS}_BASE to zero during setup
KVM: vmx: move struct host_state usage to struct loaded_vmcs
KVM: vmx: compute need to reload FS/GS/LDT on demand
KVM: nVMX: remove a misleading comment regarding vmcs02 fields
KVM: vmx: rename __vmx_load_host_state() and vmx_save_host_state()
KVM: vmx: add dedicated utility to access guest's kernel_gs_base
KVM: vmx: track host_state.loaded using a loaded_vmcs pointer
KVM: vmx: refactor segmentation code in vmx_save_host_state()
kvm: nVMX: Fix fault priority for VMX operations
kvm: nVMX: Fix fault vector for VMX operation at CPL > 0
...

Linus Torvalds
2018-08-20 01:38:36 +0800

15 Aug, 2018

1 commit

1202f4fdb Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux ... Browse Code »

Pull arm64 updates from Will Deacon:
"A bunch of good stuff in here. Worth noting is that we've pulled in
the x86/mm branch from -tip so that we can make use of the core
ioremap changes which allow us to put down huge mappings in the
vmalloc area without screwing up the TLB. Much of the positive
diffstat is because of the rseq selftest for arm64.

Summary:

- Wire up support for qspinlock, replacing our trusty ticket lock
code

- Add an IPI to flush_icache_range() to ensure that stale
instructions fetched into the pipeline are discarded along with the
I-cache lines

- Support for the GCC "stackleak" plugin

- Support for restartable sequences, plus an arm64 port for the
selftest

- Kexec/kdump support on systems booting with ACPI

- Rewrite of our syscall entry code in C, which allows us to zero the
GPRs on entry from userspace

- Support for chained PMU counters, allowing 64-bit event counters to
be constructed on current CPUs

- Ensure scheduler topology information is kept up-to-date with CPU
hotplug events

- Re-enable support for huge vmalloc/IO mappings now that the core
code has the correct hooks to use break-before-make sequences

- Miscellaneous, non-critical fixes and cleanups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits)
arm64: alternative: Use true and false for boolean values
arm64: kexec: Add comment to explain use of __flush_icache_range()
arm64: sdei: Mark sdei stack helper functions as static
arm64, kaslr: export offset in VMCOREINFO ELF notes
arm64: perf: Add cap_user_time aarch64
efi/libstub: Only disable stackleak plugin for arm64
arm64: drop unused kernel_neon_begin_partial() macro
arm64: kexec: machine_kexec should call __flush_icache_range
arm64: svc: Ensure hardirq tracing is updated before return
arm64: mm: Export __sync_icache_dcache() for xen-privcmd
drivers/perf: arm-ccn: Use devm_ioremap_resource() to map memory
arm64: Add support for STACKLEAK gcc plugin
arm64: Add stack information to on_accessible_stack
drivers/perf: hisi: update the sccl_id/ccl_id when MT is supported
arm64: fix ACPI dependencies
rseq/selftests: Add support for arm64
arm64: acpi: fix alignment fault in accessing ACPI
efi/arm: map UEFI memory map even w/o runtime services enabled
efi/arm: preserve early mapping of UEFI memory map longer for BGRT
drivers: acpi: add dependency of EFI for arm64
...

Linus Torvalds
2018-08-15 07:39:13 +0800

13 Aug, 2018

2 commits

976d34e2d KVM: arm/arm64: Skip updating PTE entry if no change ... Browse Code »

When there is contention on faulting in a particular page table entry
at stage 2, the break-before-make requirement of the architecture can
lead to additional refaulting due to TLB invalidation.

Avoid this by skipping a page table update if the new value of the PTE
matches the previous value.

Cc: stable@vger.kernel.org
Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Reviewed-by: Suzuki Poulose
Acked-by: Christoffer Dall
Signed-off-by: Punit Agrawal
Signed-off-by: Marc Zyngier

Punit Agrawal
2018-08-13 22:32:01 +0800
86658b819 KVM: arm/arm64: Skip updating PMD entry if no change ... Browse Code »

Contention on updating a PMD entry by a large number of vcpus can lead
to duplicate work when handling stage 2 page faults. As the page table
update follows the break-before-make requirement of the architecture,
it can lead to repeated refaults due to clearing the entry and
flushing the tlbs.

This problem is more likely when -

* there are large number of vcpus
* the mapping is large block mapping

such as when using PMD hugepages (512MB) with 64k pages.

Fix this by skipping the page table update if there is no change in
the entry being updated.

Cc: stable@vger.kernel.org
Fixes: ad361f093c1e ("KVM: ARM: Support hugetlbfs backed huge pages")
Reviewed-by: Suzuki Poulose
Acked-by: Christoffer Dall
Signed-off-by: Punit Agrawal
Signed-off-by: Marc Zyngier

Punit Agrawal
2018-08-13 22:31:35 +0800

12 Aug, 2018

3 commits

d0823cb34 KVM: arm/arm64: vgic: Do not use spin_lock_irqsave/restore with irq disabled ... Browse Code »

kvm_vgic_sync_hwstate is only called with IRQ being disabled.
There is thus no need to call spin_lock_irqsave/restore in
vgic_fold_lr_state and vgic_prune_ap_list.

This patch replace them with the non irq-safe version.

Signed-off-by: Jia He
Acked-by: Christoffer Dall
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier

Jia He
2018-08-12 19:15:18 +0800
dc961e539 KVM: arm/arm64: vgic: Move DEBUG_SPINLOCK_BUG_ON to vgic.h ... Browse Code »

DEBUG_SPINLOCK_BUG_ON can be used with both vgic-v2 and vgic-v3,
so let's move it to vgic.h

Signed-off-by: Jia He
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier

Jia He
2018-08-12 19:14:08 +0800
6249f2a47 KVM: arm/arm64: vgic-v3: Add core support for Group0 SGIs ... Browse Code »

Although vgic-v3 now supports Group0 interrupts, it still doesn't
deal with Group0 SGIs. As usually with the GIC, nothing is simple:

- ICC_SGI1R can signal SGIs of both groups, since GICD_CTLR.DS==1
with KVM (as per 8.1.10, Non-secure EL1 access)

- ICC_SGI0R can only generate Group0 SGIs

- ICC_ASGI1R sees its scope refocussed to generate only Group0
SGIs (as per the note at the bottom of Table 8-14)

We only support Group1 SGIs so far, so no material change.

Reviewed-by: Eric Auger
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Marc Zyngier
2018-08-12 19:06:34 +0800

06 Aug, 2018

4 commits

b9b33da2a KVM: try __get_user_pages_fast even if not in atomic context ... Browse Code »

We are currently cutting hva_to_pfn_fast short if we do not want an
immediate exit, which is represented by !async && !atomic. However,
this is unnecessary, and __get_user_pages_fast is *much* faster
because the regular get_user_pages takes pmd_lock/pte_lock.
In fact, when many CPUs take a nested vmexit at the same time
the contention on those locks is visible, and this patch removes
about 25% (compared to 4.18) from vmexit.flat on a 16 vCPU
nested guest.

Suggested-by: Andrea Arcangeli
Signed-off-by: Paolo Bonzini

Paolo Bonzini
2018-08-06 23:59:07 +0800
b08660e59 KVM: x86: Add tlb remote flush callback in kvm_x86_ops. ... Browse Code »

This patch is to provide a way for platforms to register hv tlb remote
flush callback and this helps to optimize operation of tlb flush
among vcpus for nested virtualization case.

Signed-off-by: Lan Tianyu
Signed-off-by: Paolo Bonzini

Tianyu Lan
2018-08-06 23:59:06 +0800
50c28f21d kvm: x86: Use fast CR3 switch for nested VMX ... Browse Code »

Use the fast CR3 switch mechanism to locklessly change the MMU root
page when switching between L1 and L2. The switch from L2 to L1 should
always go through the fast path, while the switch from L1 to L2 should
go through the fast path if L1's CR3/EPTP for L2 hasn't changed
since the last time.

Signed-off-by: Junaid Shahid
Signed-off-by: Paolo Bonzini

Junaid Shahid
2018-08-06 23:58:54 +0800
d2ce98ca0 Merge tag 'v4.18-rc6' into HEAD ... Browse Code »

Pull bug fixes into the KVM development tree to avoid nasty conflicts.

Paolo Bonzini
2018-08-06 23:31:36 +0800

31 Jul, 2018

2 commits

245715cbe KVM: arm/arm64: Fix lost IRQs from emulated physcial timer when blocked ... Browse Code »

When the VCPU is blocked (for example from WFI) we don't inject the
physical timer interrupt if it should fire while the CPU is blocked, but
instead we just wake up the VCPU and expect kvm_timer_vcpu_load to take
care of injecting the interrupt.

Unfortunately, kvm_timer_vcpu_load() doesn't actually do that, it only
has support to schedule a soft timer if the emulated phys timer is
expected to fire in the future.

Follow the same pattern as kvm_timer_update_state() and update the irq
state after potentially scheduling a soft timer.

Reported-by: Andre Przywara
Cc: Stable # 4.15+
Fixes: bbdd52cfcba29 ("KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit")
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Christoffer Dall
2018-07-31 14:53:20 +0800
7afc4ddbf KVM: arm/arm64: Fix potential loss of ptimer interrupts ... Browse Code »

kvm_timer_update_state() is called when changing the phys timer
configuration registers, either via vcpu reset, as a result of a trap
from the guest, or when userspace programs the registers.

phys_timer_emulate() is in turn called by kvm_timer_update_state() to
either cancel an existing software timer, or program a new software
timer, to emulate the behavior of a real phys timer, based on the change
in configuration registers.

Unfortunately, the interaction between these two functions left a small
race; if the conceptual emulated phys timer should actually fire, but
the soft timer hasn't executed its callback yet, we cancel the timer in
phys_timer_emulate without injecting an irq. This only happens if the
check in kvm_timer_update_state is called before the timer should fire,
which is relatively unlikely, but possible.

The solution is to update the state of the phys timer after calling
phys_timer_emulate, which will pick up the pending timer state and
update the interrupt value.

Note that this leaves the opportunity of raising the interrupt twice,
once in the just-programmed soft timer, and once in
kvm_timer_update_state. Since this always happens synchronously with
the VCPU execution, there is no harm in this, and the guest ever only
sees a single timer interrupt.

Cc: Stable # 4.15+
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Christoffer Dall
2018-07-31 14:53:16 +0800

25 Jul, 2018

1 commit

4765096f4 Merge branch 'sched/urgent' into sched/core, to pick up fixes ... Browse Code »

Signed-off-by: Ingo Molnar

Ingo Molnar
2018-07-25 17:29:58 +0800

24 Jul, 2018

1 commit

6b8b9a485 KVM: arm/arm64: vgic: Fix possible spectre-v1 write in vgic_mmio_write_apr() ... Browse Code »

It's possible for userspace to control n. Sanitize n when using it as an
array index, to inhibit the potential spectre-v1 write gadget.

Note that while it appears that n must be bound to the interval [0,3]
due to the way it is extracted from addr, we cannot guarantee that
compiler transformations (and/or future refactoring) will ensure this is
the case, and given this is a slow path it's better to always perform
the masking.

Found by smatch.

Signed-off-by: Mark Rutland
Cc: Christoffer Dall
Cc: Marc Zyngier
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Marc Zyngier

Mark Rutland
2018-07-24 20:53:54 +0800

21 Jul, 2018

16 commits

71dbc8a96 kvm: Don't open code task_pid in kvm_vcpu_ioctl ... Browse Code »

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2018-07-21 23:43:12 +0800
b0960b956 KVM: arm: Add 32bit get/set events support ... Browse Code »

arm64's new use of KVMs get_events/set_events API calls isn't just
or RAS, it allows an SError that has been made pending by KVM as
part of its device emulation to be migrated.

Wire this up for 32bit too.

We only need to read/write the HCR_VA bit, and check that no esr has
been provided, as we don't yet support VDFSR.

Signed-off-by: James Morse
Reviewed-by: Dongjiu Geng
Signed-off-by: Marc Zyngier

James Morse
2018-07-21 23:02:32 +0800
539aee0ed KVM: arm64: Share the parts of get/set events useful to 32bit ... Browse Code »

The get/set events helpers to do some work to check reserved
and padding fields are zero. This is useful on 32bit too.

Move this code into virt/kvm/arm/arm.c, and give the arch
code some underscores.

This is temporarily hidden behind __KVM_HAVE_VCPU_EVENTS until
32bit is wired up.

Signed-off-by: James Morse
Reviewed-by: Dongjiu Geng
Signed-off-by: Marc Zyngier

James Morse
2018-07-21 23:02:31 +0800
b7b27facc arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS ... Browse Code »

For the migrating VMs, user space may need to know the exception
state. For example, in the machine A, KVM make an SError pending,
when migrate to B, KVM also needs to pend an SError.

This new IOCTL exports user-invisible states related to SError.
Together with appropriate user space changes, user space can get/set
the SError exception state to do migrate/snapshot/suspend.

Signed-off-by: Dongjiu Geng
Reviewed-by: James Morse
[expanded documentation wording]
Signed-off-by: James Morse
Signed-off-by: Marc Zyngier

Dongjiu Geng
2018-07-21 23:02:30 +0800
32f8777ed KVM: arm/arm64: vgic: Let userspace opt-in to writable v2 IGROUPR ... Browse Code »

Simply letting IGROUPR be writable from userspace would break
migration from old kernels to newer kernels, because old kernels
incorrectly report interrupt groups as group 1. This would not be a big
problem if userspace wrote GICD_IIDR as read from the kernel, because we
could detect the incompatibility and return an error to userspace.
Unfortunately, this is not the case with current userspace
implementations and simply letting IGROUPR be writable from userspace for
an emulated GICv2 silently breaks migration and causes the destination
VM to no longer run after migration.

We now encourage userspace to write the read and expected value of
GICD_IIDR as the first part of a GIC register restore, and if we observe
a write to GICD_IIDR we know that userspace has been updated and has had
a chance to cope with older kernels (VGICv2 IIDR.Revision == 0)
incorrectly reporting interrupts as group 1, and therefore we now allow
groups to be user writable.

Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Christoffer Dall
2018-07-21 23:02:29 +0800
d53c2c29a KVM: arm/arm64: vgic: Allow configuration of interrupt groups ... Browse Code »

Implement the required MMIO accessors for GICv2 and GICv3 for the
IGROUPR distributor and redistributor registers.

This can allow guests to change behavior compared to running on previous
versions of KVM, but only to align with the architecture and hardware
implementations.

This also allows userspace to configure the interrupts groups for GICv3.
We don't allow userspace to write the groups on GICv2 just yet, because
that would result in GICv2 guests not receiving interrupts after
migrating from an older kernel that exposes GICv2 interrupts as group 1.

Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Christoffer Dall
2018-07-21 23:02:29 +0800
b489edc36 KVM: arm/arm64: vgic: Return error on incompatible uaccess GICD_IIDR writes ... Browse Code »

If userspace attempts to write a GICD_IIDR that does not match the
kernel version, return an error to userspace. The intention is to allow
implementation changes inside KVM while avoiding silently breaking
migration resulting in guests not running without any clear indication
of what went wrong.

Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Christoffer Dall
2018-07-21 23:02:28 +0800
c6e0917b6 KVM: arm/arm64: vgic: Permit uaccess writes to return errors ... Browse Code »

Currently we do not allow any vgic mmio write operations to fail, which
makes sense from mmio traps from the guest. However, we should be able
to report failures to userspace, if userspace writes incompatible values
to read-only registers. Rework the internal interface to allow errors
to be returned on the write side for userspace writes.

Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Christoffer Dall
2018-07-21 23:02:27 +0800
873220990 KVM: arm/arm64: vgic: Signal IRQs using their configured group ... Browse Code »

Now when we have a group configuration on the struct IRQ, use this state
when populating the LR and signaling interrupts as either group 0 or
group 1 to the VM. Depending on the model of the emulated GIC, and the
guest's configuration of the VMCR, interrupts may be signaled as IRQs or
FIQs to the VM.

Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Christoffer Dall
2018-07-21 23:02:26 +0800
8df3c8f33 KVM: arm/arm64: vgic: Add group field to struct irq ... Browse Code »

In preparation for proper group 0 and group 1 support in the vgic, we
add a field in the struct irq to store the group of all interrupts.

We initialize the group to group 0 when emulating GICv2 and to group 1
when emulating GICv3, just like we treat them today. LPIs are always
group 1. We also continue to ignore writes from the guest, preserving
existing functionality, for now.

Finally, we also add this field to the vgic debug logic to show the
group for all interrupts.

Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Christoffer Dall
2018-07-21 23:02:24 +0800
dd6251e46 KVM: arm/arm64: vgic: GICv2 IGROUPR should read as zero ... Browse Code »

We currently don't support grouping in the emulated VGIC, which is a
known defect on KVM (not hurting any currently used guests as far as
we're aware). This is currently handled by treating all interrupts as
group 0 interrupts for an emulated GICv2 and always signaling interrupts
as group 0 to the virtual CPU interface.

However, when reading which group interrupts belongs to in the guest
from the emulated VGIC, the VGIC currently reports group 1 instead of
group 0, which is misleading. Fix this temporarily before introducing
full group support by changing the hander to _raz instead of _rao.

Fixes: fb848db39661a "KVM: arm/arm64: vgic-new: Add GICv2 MMIO handling framework"
Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Christoffer Dall
2018-07-21 23:02:22 +0800
aa075b0f3 KVM: arm/arm64: vgic: Keep track of implementation revision ... Browse Code »

As we are about to tweak implementation aspects of the VGIC emulation,
while still preserving some level of backwards compatibility support,
add a field to keep track of the implementation revision field which is
reported to the VM and to userspace.

Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Christoffer Dall
2018-07-21 23:02:21 +0800
a2dca217d KVM: arm/arm64: vgic: Define GICD_IIDR fields for GICv2 and GIv3 ... Browse Code »

Instead of hardcoding the shifts and masks in the GICD_IIDR register
emulation, let's add the definition of these fields to the GIC header
files and use them.

This will make things more obvious when we're going to bump the revision
in the IIDR when we'll make guest-visible changes to the implementation.

Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Christoffer Dall
2018-07-21 23:02:19 +0800
e294cb3a6 KVM: arm/arm64: vgic-debug: Show LPI status ... Browse Code »

The vgic debugfs file only knows about SGI/PPI/SPI interrupts, and
completely ignores LPIs. Let's fix that.

Signed-off-by: Marc Zyngier

Marc Zyngier
2018-07-21 23:02:16 +0800
2326aceeb KVM: arm64: vgic-its: Remove VLA usage ... Browse Code »

In the quest to remove all stack VLA usage from the kernel[1], this
switches to using a maximum size and adds sanity checks. Additionally
cleans up some of the int-vs-u32 usage and adds additional bounds checking.
As it currently stands, this will always be 8 bytes until the ABI changes.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Cc: Christoffer Dall
Cc: Eric Auger
Cc: Andre Przywara
Cc: linux-arm-kernel@lists.infradead.org
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Kees Cook
[maz: dropped WARN_ONs]
Signed-off-by: Marc Zyngier

Kees Cook
2018-07-21 23:02:14 +0800
1d47191de KVM: arm/arm64: Fix vgic init race ... Browse Code »

The vgic_init function can race with kvm_arch_vcpu_create() which does
not hold kvm_lock() and we therefore have no synchronization primitives
to ensure we're doing the right thing.

As the user is trying to initialize or run the VM while at the same time
creating more VCPUs, we just have to refuse to initialize the VGIC in
this case rather than silently failing with a broken VCPU.

Reviewed-by: Eric Auger
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Christoffer Dall
2018-07-21 23:02:07 +0800

19 Jul, 2018

1 commit

47f7dc4b8 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

Pull kvm fixes from Paolo Bonzini:
"Miscellaneous bugfixes, plus a small patchlet related to Spectre v2"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvmclock: fix TSC calibration for nested guests
KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabled
KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer
KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel.
x86/kvmclock: set pvti_cpu0_va after enabling kvmclock
x86/kvm/Kconfig: Ensure CRYPTO_DEV_CCP_DD state at minimum matches KVM_AMD
kvm: nVMX: Restore exit qual for VM-entry failure due to MSR loading
x86/kvm/vmx: don't read current->thread.{fs,gs}base of legacy tasks
KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR

Linus Torvalds
2018-07-19 02:08:44 +0800

18 Jul, 2018

2 commits

9432a3175 KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer ... Browse Code »

A comment warning against this bug is there, but the code is not doing what
the comment says. Therefore it is possible that an EPOLLHUP races against
irq_bypass_register_consumer. The EPOLLHUP handler schedules irqfd_shutdown,
and if that runs soon enough, you get a use-after-free.

Reported-by: syzbot
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini
Reviewed-by: David Hildenbrand

Paolo Bonzini
2018-07-18 17:31:27 +0800
b5020a8e6 KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel. ... Browse Code »

Syzbot reports crashes in kvm_irqfd_assign(), caused by use-after-free
when kvm_irqfd_assign() and kvm_irqfd_deassign() run in parallel
for one specific eventfd. When the assign path hasn't finished but irqfd
has been added to kvm->irqfds.items list, another thead may deassign the
eventfd and free struct kvm_kernel_irqfd(). The assign path then uses
the struct kvm_kernel_irqfd that has been freed by deassign path. To avoid
such issue, keep irqfd under kvm->irq_srcu protection after the irqfd
has been added to kvm->irqfds.items list, and call synchronize_srcu()
in irq_shutdown() to make sure that irqfd has been fully initialized in
the assign path.

Reported-by: Dmitry Vyukov
Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Dmitry Vyukov
Signed-off-by: Tianyu Lan
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini

Lan Tianyu
2018-07-18 17:31:27 +0800

13 Jul, 2018

1 commit

03133347b KVM: s390: a utility function for migration ... Browse Code »

Introduce a utility function that will be used later on for storage
attributes migration, and use it in kvm_main.c to replace existing code
that does the same thing.

Signed-off-by: Claudio Imbrenda
Message-Id:
Signed-off-by: Christian Borntraeger

Claudio Imbrenda
2018-07-13 15:48:57 +0800