Eric Lee / smarc-fsl-linux-kernel

16 Aug, 2018

1 commit

f0660d587 KVM: x86: Add a framework for supporting MSR-based features

commit 801e459a6f3a63af9d447e6249088c76ae16efc4 upstream

Provide a new KVM capability that allows bits within MSRs to be recognized
as features. Two new ioctls are added to the /dev/kvm ioctl routine to
retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops
callback is used to determine support for the listed MSR-based features.

Signed-off-by: Tom Lendacky
Signed-off-by: Paolo Bonzini
[Tweaked documentation. - Radim]
Signed-off-by: Radim Krčmář
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Tom Lendacky
2018-08-16 00:12:59 +0800

02 May, 2018

1 commit

e5a290c4f arm/arm64: KVM: Add PSCI version selection API

commit 85bd0ba1ff9875798fad94218b627ea9f768f3c3 upstream.

Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1
or 1.0 to a guest, defaulting to the latest version of the PSCI
implementation that is compatible with the requested version. This is
no different from doing a firmware upgrade on KVM.

But in order to give a chance to hypothetical badly implemented guests
that would have a fit by discovering something other than PSCI 0.2,
let's provide a new API that allows userspace to pick one particular
version of the API.

This is implemented as a new class of "firmware" registers, where
we expose the PSCI version. This allows the PSCI version to be
save/restored as part of a guest migration, and also set to
any supported version if the guest requires it.
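
To illustrate what the value held in such a register looks like: the PSCI_VERSION call reports the major version in bits 31:16 and the minor version in bits 15:0. The sketch below encodes and decodes that layout. The helper names are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of the patch. The layout follows the
 * PSCI_VERSION return value: major in bits 31:16, minor in bits 15:0. */
static uint32_t psci_version_encode(uint16_t major, uint16_t minor)
{
    return ((uint32_t)major << 16) | minor;
}

static uint16_t psci_version_major(uint32_t version)
{
    return (uint16_t)(version >> 16);
}

static uint16_t psci_version_minor(uint32_t version)
{
    return (uint16_t)(version & 0xffff);
}
```

With this layout PSCI 0.2 is the value 0x00000002 and PSCI 1.0 is 0x00010000, which is what userspace would save on the source and write back on the destination of a migration.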

Cc: stable@vger.kernel.org #4.16
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman

Marc Zyngier
2018-05-02 03:58:27 +0800

26 Apr, 2018

1 commit

ddf09f2a0 KVM: PPC: Book3S HV: Enable migration of decrementer register

[ Upstream commit 5855564c8ab2d9cefca7b2933bd19818eb795e40 ]

This adds a register identifier for use with the one_reg interface
to allow the decrementer expiry time to be read and written by
userspace. The decrementer expiry time is in guest timebase units
and is equal to the sum of the decrementer and the guest timebase.
(The expiry time is used rather than the decrementer value itself
because the expiry time is not constantly changing, though the
decrementer value is, while the guest vcpu is not running.)

Without this, a guest vcpu migrated to a new host will see its
decrementer set to some random value. On POWER8 and earlier, the
decrementer is 32 bits wide and counts down at 512MHz, so the
guest vcpu will potentially see no decrementer interrupts for up
to about 4 seconds, which will lead to a stall. With POWER9, the
decrementer is now 56 bits wide, so the stall can be much longer
(up to 2.23 years) and more noticeable.
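
The figures above can be reproduced with a little arithmetic. This is only a sketch: it assumes the decrementer is a signed counter that raises its interrupt once it goes negative, so the longest wait is the largest positive value divided by the 512MHz tick rate. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 512 MHz decrementer/timebase frequency, as stated above. */
#define TB_HZ 512000000.0

/* Expiry time in guest timebase units: decrementer plus guest timebase. */
static int64_t dec_expires(int64_t dec, uint64_t guest_tb)
{
    return (int64_t)(guest_tb + (uint64_t)dec);
}

/* Longest wait in seconds before a signed decrementer of the given width
 * goes negative: the largest positive value is 2^(bits-1) - 1 ticks.
 * Valid for widths from 2 to 64 bits. */
static double dec_max_stall_seconds(unsigned int bits)
{
    uint64_t max_ticks = (UINT64_C(1) << (bits - 1)) - 1;

    return (double)max_ticks / TB_HZ;
}
```

For a 32-bit decrementer this gives roughly 4.19 seconds; for a 56-bit decrementer it gives about 7.04e7 seconds, or roughly 2.23 years.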

To help work around the problem in cases where userspace has not been
updated to migrate the decrementer expiry time, we now set the
default decrementer expiry at vcpu creation time to the current time
rather than the maximum possible value. This should mean an
immediate decrementer interrupt when a migrated vcpu starts
running. In cases where the decrementer is 32 bits wide and more
than 4 seconds elapse between the creation of the vcpu and when it
first runs, the decrementer would have wrapped around to positive
values and there may still be a stall - but this is no worse than
the current situation. In the large-decrementer case, we are sure
to get an immediate decrementer interrupt (assuming the time from
vcpu creation to first run is less than 2.23 years) and we thus
avoid a very long stall.

Signed-off-by: Paul Mackerras
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Paul Mackerras
2018-04-26 17:02:04 +0800

11 Mar, 2018

1 commit

dc6fb79de KVM: x86: fix backward migration with async_PF

commit fe2a3027e74e40a3ece3a4c1e4e51403090a907a upstream.

Guests on new hypervisors might set the KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT
bit when enabling async_PF, but this bit is reserved on old hypervisors,
which results in a failure upon migration.

To avoid breaking different cases, we check for the CPUID feature bit
before enabling the feature and nothing else.
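
A minimal sketch of the guest-side check follows. The constants are local mirrors of the kernel's paravirt definitions and should be treated as assumptions here, and async_pf_msr_value() is a hypothetical helper, not the function touched by the patch.

```c
#include <stdint.h>

/* Local mirrors of the KVM paravirt constants (assumed values). */
#define KVM_ASYNC_PF_ENABLED               (1u << 0)
#define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT (1u << 2)
#define KVM_FEATURE_ASYNC_PF_VMEXIT        10 /* CPUID feature bit number */

/* Hypothetical helper: build the value a guest writes to
 * MSR_KVM_ASYNC_PF_EN. The VMEXIT delivery bit is set only when the
 * hypervisor advertises the matching CPUID feature bit. */
static uint64_t async_pf_msr_value(uint64_t area_pa, uint32_t kvm_features)
{
    uint64_t val = area_pa | KVM_ASYNC_PF_ENABLED;

    if (kvm_features & (1u << KVM_FEATURE_ASYNC_PF_VMEXIT))
        val |= KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
    return val;
}
```

A guest started on a hypervisor that does not advertise the feature never sets the bit, so the MSR value stays acceptable to hosts that treat the bit as reserved.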

Fixes: 52a5c155cf79 ("KVM: async_pf: Let guest support delivery of async_pf from guest mode")
Cc:
Reviewed-by: Wanpeng Li
Reviewed-by: David Hildenbrand
Signed-off-by: Radim Krčmář
Signed-off-by: Paolo Bonzini
[jwang: port to 4.14]
Signed-off-by: Jack Wang
Signed-off-by: Greg Kroah-Hartman

Radim Krčmář
2018-03-11 23:23:23 +0800

08 Sep, 2017

1 commit

082d3900a Merge tag 'kvm-arm-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm

KVM/ARM Changes for v4.14

Two minor cleanups and improvements, a fix for decoding external abort
types from guests, and added support for migrating the active priority
of interrupts when running a GICv2 guest on a GICv3 host.

Radim Krčmář
2017-09-08 00:22:04 +0800

05 Sep, 2017

1 commit

9b87e7a8b KVM: arm/arm64: Support uaccess of GICC_APRn

When migrating guests around we need to know the active priorities to
ensure functional virtual interrupt prioritization by the GIC.

This commit clarifies the API and how active priorities of interrupts in
different groups are represented, and implements the accessor functions
for the uaccess register range.

We live with a slight layering violation in accessing GICv3 data
structures from vgic-mmio-v2.c, because anything else just adds too much
complexity for us to deal with (it's not like there's a benefit
elsewhere in the code of an intermediate representation as is the case
with the VMCR). We accept this, because while doing v3 processing from
a file named something-v2.c can look strange at first, this really is
specific to dealing with the user space interface for something that
looks like a GICv2.

Reviewed-by: Marc Zyngier
Signed-off-by: Christoffer Dall

Christoffer Dall
2017-09-05 23:33:39 +0800

29 Aug, 2017

1 commit

8fa1696ea KVM: s390: Multiple Epoch Facility support

Allow for the enablement of MEF and the support for the extended
epoch in SIE and VSIE for the extended guest TOD-Clock.

A new interface is used for getting/setting a guest's extended TOD-Clock
that uses a single ioctl invocation, KVM_S390_VM_TOD_EXT. Since the
host time is a moving target that might see an epoch switch or STP sync
checks we need an atomic ioctl and cannot use the existing two
interfaces. The old method of getting and setting the guest TOD-Clock is
still retained and is used when the old ioctls are called.

Signed-off-by: Collin L. Walling
Reviewed-by: Janosch Frank
Reviewed-by: Claudio Imbrenda
Reviewed-by: Jason J. Herne
Reviewed-by: Cornelia Huck
Signed-off-by: Christian Borntraeger

Collin L. Walling
2017-08-29 21:15:54 +0800

14 Jul, 2017

2 commits

d3457c877 kvm: x86: hyperv: make VP_INDEX managed by userspace

Hyper-V identifies vCPUs by Virtual Processor Index, which can be
queried via HV_X64_MSR_VP_INDEX msr. It is defined by the spec as a
sequential number which can't exceed the maximum number of vCPUs per VM.
APIC ids can be sparse and thus aren't a valid replacement for VP
indices.

Current KVM uses its internal vcpu index as VP_INDEX. However, to make
it predictable and persistent across VM migrations, the userspace has to
control the value of VP_INDEX.

This patch achieves that, by storing vp_index explicitly on vcpu, and
allowing HV_X64_MSR_VP_INDEX to be set from the host side. For
compatibility it's initialized to KVM vcpu index. Also a few variables
are renamed to make a clear distinction between this Hyper-V vp_index and
KVM vcpu_id (== APIC id). Besides, a new capability,
KVM_CAP_HYPERV_VP_INDEX, is added to allow the userspace to skip
attempting msr writes where unsupported, to avoid spamming error logs.

Signed-off-by: Roman Kagan
Signed-off-by: Radim Krčmář

Roman Kagan
2017-07-14 22:28:18 +0800
52a5c155c KVM: async_pf: Let guest support delivery of async_pf from guest mode

Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1,
async page faults are delivered to L1 as #PF vmexits; if bit 2 is 0,
kvm_can_do_async_pf returns 0 if in guest mode.
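
The host-side rule described above can be sketched as a small predicate. This is a simplified stand-in for the guest-mode part of kvm_can_do_async_pf() only; the function name and parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 2 of MSR_KVM_ASYNC_PF_EN, per the description above. */
#define ASYNC_PF_DELIVERY_AS_PF_VMEXIT (1u << 2)

/* Hypothetical predicate: while the vcpu is running a nested guest,
 * async page faults are allowed only if L1 asked for them to be
 * delivered as #PF vmexits. */
static bool can_do_async_pf(uint64_t msr_val, bool in_guest_mode)
{
    if (in_guest_mode && !(msr_val & ASYNC_PF_DELIVERY_AS_PF_VMEXIT))
        return false;
    return true;
}
```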

This is similar to what svm.c wanted to do all along, but it is only
enabled for Linux as L1 hypervisor. Foreign hypervisors must never
receive async page faults as vmexits, because they'd probably be very
confused about that.

Cc: Paolo Bonzini
Cc: Radim Krčmář
Signed-off-by: Wanpeng Li
Signed-off-by: Radim Krčmář

Wanpeng Li
2017-07-14 20:26:16 +0800

13 Jul, 2017

1 commit

efc479e69 kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2

There is a flaw in the Hyper-V SynIC implementation in KVM: when message
page or event flags page is enabled by setting the corresponding msr,
KVM zeroes it out. This is problematic because on migration the
corresponding MSRs are loaded on the destination, so the content of
those pages is lost.

This went unnoticed so far because the only user of those pages was
in-KVM hyperv synic timers, which could continue working despite that
zeroing.

Newer QEMU uses those pages for Hyper-V VMBus implementation, and
zeroing them breaks the migration.

Besides, in newer QEMU the content of those pages is fully managed by
QEMU, so zeroing them is undesirable even when writing the MSRs from the
guest side.

To support this new scheme, introduce a new capability,
KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic
pages aren't zeroed out in KVM.

Signed-off-by: Roman Kagan
Signed-off-by: Radim Krčmář

Roman Kagan
2017-07-13 23:41:04 +0800

03 Jul, 2017

2 commits

ac8d57e57 kvm: x86: mmu: allow A/D bits to be disabled in an mmu

Adds the plumbing to disable A/D bits in the MMU based on a new role
bit, ad_disabled. When A/D is disabled, the MMU operates as though A/D
aren't available (i.e., using access tracking faults instead).

To avoid SP -> kvm_mmu_page.role.ad_disabled lookups all over the
place, A/D disablement is now stored in the SPTE. This state is stored
in the SPTE by tweaking the use of SPTE_SPECIAL_MASK for access
tracking. Rather than just setting SPTE_SPECIAL_MASK when an
access-tracking SPTE is non-present, we now always set
SPTE_SPECIAL_MASK for access-tracking SPTEs.

Signed-off-by: Peter Feiner
[Use role.ad_disabled even for direct (non-shadow) EPT page tables. Add
documentation and a few MMU_WARN_ONs. - Paolo]
Signed-off-by: Paolo Bonzini

Peter Feiner
2017-07-03 17:19:54 +0800
8a53e7e57 Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

- Better machine check handling for HV KVM
- Ability to support guests with threads=2, 4 or 8 on POWER9
- Fix for a race that could cause delayed recognition of signals
- Fix for a bug where POWER9 guests could sleep with interrupts
pending.

Paolo Bonzini
2017-07-03 16:41:59 +0800

30 Jun, 2017

1 commit

04a7ea04d Merge tag 'kvmarm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM updates for 4.13

- vcpu request overhaul
- allow timer and PMU to have their interrupt number
selected from userspace
- workaround for Cavium erratum 30115
- handling of memory poisoning
- the usual crop of fixes and cleanups

Conflicts:
arch/s390/include/asm/kvm_host.h

Paolo Bonzini
2017-06-30 18:38:26 +0800

22 Jun, 2017

4 commits

2c1a48f2e KVM: S390: add new group for flic

In some cases, for example migration, userspace needs to get or set all
AIS states. So we introduce a new group KVM_DEV_FLIC_AISM_ALL to provide
interfaces to get or set the adapter-interruption-suppression mode for
all ISCs. The corresponding documentation is updated.

Signed-off-by: Yi Min Zhao
Reviewed-by: Halil Pasic
Signed-off-by: Christian Borntraeger

Yi Min Zhao
2017-06-22 18:41:07 +0800
4036e3874 KVM: s390: ioctls to get and set guest storage attributes

* Add the struct used in the ioctls to get and set CMMA attributes.
* Add the two functions needed to get and set the CMMA attributes for
guest pages.
* Add the two ioctls that use the aforementioned functions.

Signed-off-by: Claudio Imbrenda
Acked-by: Cornelia Huck
Signed-off-by: Christian Borntraeger

Claudio Imbrenda
2017-06-22 18:41:06 +0800
190df4a21 KVM: s390: CMMA tracking, ESSA emulation, migration mode

* Add a migration state bitmap to keep track of which pages have dirty
CMMA information.
* Disable CMMA by default, so we can track if it's used or not. Enable
it on first use like we do for storage keys (unless we are doing a
migration).
* Create a VM attribute to enter and leave migration mode.
* In migration mode, CMMA is disabled in the SIE block, so ESSA is
always interpreted and emulated in software.
* Free the migration state on VM destroy.

Signed-off-by: Claudio Imbrenda
Acked-by: Cornelia Huck
Reviewed-by: Christian Borntraeger
Signed-off-by: Christian Borntraeger

Claudio Imbrenda
2017-06-22 18:41:05 +0800
2ed4f9dd1 KVM: PPC: Book3S HV: Add capability to report possible virtual SMT modes

Now that userspace can set the virtual SMT mode by enabling the
KVM_CAP_PPC_SMT capability, it is useful for userspace to be able
to query the set of possible virtual SMT modes. This provides a
new capability, KVM_CAP_PPC_SMT_POSSIBLE, to provide this
information. The return value is a bitmap of possible modes, with
bit N set if virtual SMT mode 2^N is available. That is, 1 indicates
SMT1 is available, 2 indicates that SMT2 is available, 3 indicates
that both SMT1 and SMT2 are available, and so on.
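
The bitmap encoding above can be checked with a short helper. This is a sketch only; smt_mode_possible() is a hypothetical name, and the bitmap argument stands for the value userspace gets back when it queries KVM_CAP_PPC_SMT_POSSIBLE.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if virtual SMT mode `mode`
 * (1, 2, 4, ...) is advertised in the capability bitmap. */
static bool smt_mode_possible(unsigned int bitmap, unsigned int mode)
{
    /* Only powers of two are SMT modes. */
    if (mode == 0 || (mode & (mode - 1)) != 0)
        return false;
    /* Bit N stands for mode 2^N, so the mode value is itself the mask. */
    return (bitmap & mode) != 0;
}
```

For instance, a bitmap of 3 reports SMT1 and SMT2 as possible, but not SMT4.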

Signed-off-by: Paul Mackerras

Paul Mackerras
2017-06-22 09:25:31 +0800

21 Jun, 2017

1 commit

134764ed6 KVM: PPC: Book3S HV: Add new capability to control MCE behaviour

This introduces a new KVM capability to control how KVM behaves
on machine check exception (MCE) in HV KVM guests.

If this capability has not been enabled, KVM redirects machine check
exceptions to guest's 0x200 vector, if the address in error belongs to
the guest. With this capability enabled, KVM will cause a guest exit
with the exit reason indicating an NMI.

The new capability is required to avoid problems if a new kernel/KVM
is used with an old QEMU, running a guest that doesn't issue
"ibm,nmi-register". As old QEMU does not understand the NMI exit
type, it treats it as a fatal error. However, the guest could have
handled the machine check error if the exception was delivered to
guest's 0x200 interrupt vector instead of NMI exit in case of old
QEMU.

[paulus@ozlabs.org - Reworded the commit message to be clearer,
enable only on HV KVM.]

Signed-off-by: Aravinda Prasad
Reviewed-by: David Gibson
Signed-off-by: Mahesh Salgaonkar
Signed-off-by: Paul Mackerras

Aravinda Prasad
2017-06-21 11:37:08 +0800

19 Jun, 2017

1 commit

3c3135246 KVM: PPC: Book3S HV: Allow userspace to set the desired SMT mode

This allows userspace to set the desired virtual SMT (simultaneous
multithreading) mode for a VM, that is, the number of VCPUs that
get assigned to each virtual core. Previously, the virtual SMT mode
was fixed to the number of threads per subcore, and if userspace
wanted to have fewer vcpus per vcore, then it would achieve that by
using a sparse CPU numbering. This had the disadvantage that the
vcpu numbers can get quite large, particularly for SMT1 guests on
a POWER8 with 8 threads per core. With this patch, userspace can
set its desired virtual SMT mode and then use contiguous vcpu
numbering.

On POWER8, where the threading mode is "strict", the virtual SMT mode
must be less than or equal to the number of threads per subcore. On
POWER9, which implements a "loose" threading mode, the virtual SMT
mode can be any power of 2 between 1 and 8, even though there is
effectively one thread per subcore, since the threads are independent
and can all be in different partitions.
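
The two rules above can be sketched as a validity check. The function and parameter names are hypothetical, and the sketch assumes the mode must be a power of 2 in the strict case as well, which the text only states explicitly for POWER9.

```c
#include <stdbool.h>

static bool is_pow2(unsigned int x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical check. strict_threading is true for POWER8-style hosts;
 * threads_per_subcore is only consulted in that case. */
static bool smt_mode_valid(unsigned int mode, bool strict_threading,
                           unsigned int threads_per_subcore)
{
    /* Any power of 2 between 1 and 8 (assumed for both host types). */
    if (!is_pow2(mode) || mode > 8)
        return false;
    /* Strict threading: at most the number of threads per subcore. */
    if (strict_threading && mode > threads_per_subcore)
        return false;
    return true;
}
```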

Signed-off-by: Paul Mackerras

Paul Mackerras
2017-06-19 12:34:20 +0800

08 Jun, 2017

2 commits

99a1db7a2 KVM: arm/arm64: Allow setting the timer IRQ numbers from userspace

First we define an ABI using the vcpu devices that lets userspace set
the interrupt numbers for the various timers on both the 32-bit and
64-bit KVM/ARM implementations.

Second, we add the definitions for the groups and attributes introduced
by the above ABI. (We add the PMU define on the 32-bit side as well for
symmetry and it may get used some day.)

Third, we set up the arch-specific vcpu device operation handlers to
call into the timer code for anything related to the
KVM_ARM_VCPU_TIMER_CTRL group.

Fourth, we implement support for getting and setting the timer interrupt
numbers using the above defined ABI in the arch timer code.

Fifth, we introduce error checking upon enabling the arch timer (which
is called when first running a VCPU) to check that all VCPUs are
configured to use the same PPI for the timer (as mandated by the
architecture) and that the virtual and physical timers are not
configured to use the same IRQ number.
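
The checks in the fifth step can be sketched as follows. The struct and function names are hypothetical and do not mirror the arch timer code; the PPI range 16..31 is the GIC's private peripheral interrupt range.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vcpu record of the configured timer interrupts. */
struct vcpu_timer_irqs {
    int vtimer_irq;
    int ptimer_irq;
};

/* PPIs occupy interrupt IDs 16..31 on the GIC. */
static bool is_ppi(int irq)
{
    return irq >= 16 && irq < 32;
}

/* Returns true if every vcpu uses the same PPIs as vcpu 0 and the
 * virtual and physical timers do not share an IRQ number. */
static bool timer_irqs_valid(const struct vcpu_timer_irqs *v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!is_ppi(v[i].vtimer_irq) || !is_ppi(v[i].ptimer_irq))
            return false;
        if (v[i].vtimer_irq == v[i].ptimer_irq)
            return false;
        if (v[i].vtimer_irq != v[0].vtimer_irq ||
            v[i].ptimer_irq != v[0].ptimer_irq)
            return false;
    }
    return true;
}
```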

Signed-off-by: Christoffer Dall
Reviewed-by: Marc Zyngier

Christoffer Dall
2017-06-08 22:59:57 +0800
a2befacf5 KVM: arm64: Allow creating the PMU without the in-kernel GIC

Since we got support for devices in userspace which allows reporting the
PMU overflow output status to userspace, we should actually allow
creating the PMU on systems without an in-kernel irqchip, which in turn
requires us to slightly clarify error codes for the ABI and move things
around for the initialization phase.

Signed-off-by: Christoffer Dall
Reviewed-by: Marc Zyngier

Christoffer Dall
2017-06-08 22:59:44 +0800

04 Jun, 2017

1 commit

3bb96149f KVM: Add documentation for VCPU requests

Signed-off-by: Andrew Jones
Acked-by: Christoffer Dall
Signed-off-by: Christoffer Dall

Andrew Jones
2017-06-04 22:53:00 +0800

09 May, 2017

3 commits

36c344f3f Merge tag 'kvm-arm-for-v4.12-round2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

Second round of KVM/ARM Changes for v4.12.

Changes include:
- A fix related to the 32-bit idmap stub
- A fix to the bitmask used to decode the operands of an AArch32 CP
instruction
- We have moved the files shared between arch/arm/kvm and
arch/arm64/kvm to virt/kvm/arm
- We add support for saving/restoring the virtual ITS state to
userspace

Paolo Bonzini
2017-05-09 18:51:49 +0800
cb9d04346 KVM: arm/arm64: Clarification and relaxation to ITS save/restore ABI

Clarify what is meant by the save/restore ABI only supporting virtual
physical interrupts.

Relax the requirement of the order that the collection entries are
written in and be clear that there is no particular ordering enforced.

Some cosmetic changes in the capitalization of ID names to align with
the GICv3 manual and remove the empty line in the bottom of the patch.

Signed-off-by: Christoffer Dall
Reviewed-by: Eric Auger

Christoffer Dall
2017-05-09 16:51:37 +0800
2d3e4866d Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
"ARM:
- HYP mode stub supports kexec/kdump on 32-bit
- improved PMU support
- virtual interrupt controller performance improvements
- support for userspace virtual interrupt controller (slower, but
necessary for KVM on the weird Broadcom SoCs used by the Raspberry
Pi 3)

MIPS:
- basic support for hardware virtualization (ImgTec P5600/P6600/I6400
and Cavium Octeon III)

PPC:
- in-kernel acceleration for VFIO

s390:
- support for guests without storage keys
- adapter interruption suppression

x86:
- usual range of nVMX improvements, notably nested EPT support for
accessed and dirty bits
- emulation of CPL3 CPUID faulting

generic:
- first part of VCPU thread request API
- kvm_stat improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
kvm: nVMX: Don't validate disabled secondary controls
KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick
Revert "KVM: Support vCPU-based gfn->hva cache"
tools/kvm: fix top level makefile
KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING
KVM: Documentation: remove VM mmap documentation
kvm: nVMX: Remove superfluous VMX instruction fault checks
KVM: x86: fix emulation of RSM and IRET instructions
KVM: mark requests that need synchronization
KVM: return if kvm_vcpu_wake_up() did wake up the VCPU
KVM: add explicit barrier to kvm_vcpu_kick
KVM: perform a wake_up in kvm_make_all_cpus_request
KVM: mark requests that do not need a wakeup
KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up
KVM: x86: always use kvm_make_request instead of set_bit
KVM: add kvm_{test,clear}_request to replace {test,clear}_bit
s390: kvm: Cpu model support for msa6, msa7 and msa8
KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
kvm: better MWAIT emulation for guests
KVM: x86: virtualize cpuid faulting
...

Linus Torvalds
2017-05-09 03:37:56 +0800

08 May, 2017

2 commits

100e62983 KVM: arm/arm64: Add GICV3 pending table save API documentation

Add description for how to save GICV3 LPI pending bit into
guest RAM pending tables.

Signed-off-by: Eric Auger
Acked-by: Christoffer Dall
Acked-by: Marc Zyngier

Eric Auger
2017-05-08 20:31:22 +0800
de2a09107 KVM: arm/arm64: Add ITS save/restore API documentation

Add description for how to access ITS registers and how to save/restore
ITS tables into/from memory.

Reviewed-by: Christoffer Dall
Signed-off-by: Eric Auger

Eric Auger
2017-05-08 20:30:49 +0800

29 Apr, 2017

1 commit

bcb85c887 KVM: Documentation: remove VM mmap documentation

Since commit 80f5b5e700fa9c ("KVM: remove vm mmap method"), the VM mmap
handler is gone. Remove the corresponding documentation.

Signed-off-by: Jann Horn
Signed-off-by: Paolo Bonzini

Jann Horn
2017-04-29 02:40:52 +0800

27 Apr, 2017

2 commits

c24a7be21 Merge tag 'kvm-arm-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Changes for v4.12.

Changes include:
- Using the common sysreg definitions between KVM and arm64
- Improved hyp-stub implementation with support for kexec and kdump on the 32-bit side
- Proper PMU exception handling
- Performance improvements of our GIC handling
- Support for irqchip in userspace with in-kernel arch-timers and PMU support
- A fix for a race condition in our PSCI code

Conflicts:
Documentation/virtual/kvm/api.txt
include/uapi/linux/kvm.h

Paolo Bonzini
2017-04-27 23:33:14 +0800
cf9bdd357 Merge tag 'kvm-s390-next-4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: MSA8 feature for guests

- Detect all function codes for KMA and export the features
for use in the cpu model

Paolo Bonzini
2017-04-27 20:11:07 +0800

26 Apr, 2017

1 commit

e000b8e09 s390: kvm: Cpu model support for msa6, msa7 and msa8

msa6 and msa7 require no changes.
msa8 adds kma instruction and feature area.

Signed-off-by: Jason J. Herne
Reviewed-by: Christian Borntraeger
Signed-off-by: Christian Borntraeger

Jason J. Herne
2017-04-26 20:19:01 +0800

21 Apr, 2017

1 commit

668fffa3f kvm: better MWAIT emulation for guests

Guests that are heavy on futexes end up IPI'ing each other a lot. That
can lead to significant slowdowns and latency increase for those guests
when running within KVM.

If only a single guest is needed on a host, we have a lot of spare host
CPU time we can throw at the problem. Modern CPUs implement a feature
called "MWAIT" which allows guests to wake up sleeping remote CPUs without
an IPI - thus without an exit - at the expense of never going out of guest
context.

The decision whether this is something sensible to use should be up to the
VM admin, so to user space. We can however allow MWAIT execution on systems
that support it properly hardware wise.

This patch adds a CAP to user space and a KVM cpuid leaf to indicate
availability of native MWAIT execution. With that enabled, the worst a
guest can do is waste as many cycles as a "jmp ." would do, so it's not
a privilege problem.

We consciously do *not* expose the feature in our CPUID bitmap, as most
people will want to benefit from sleeping vCPUs to allow for over commit.

Reported-by: "Gabriel L. Somlo"
Signed-off-by: Michael S. Tsirkin
[agraf: fix amd, change commit message]
Signed-off-by: Alexander Graf
Signed-off-by: Paolo Bonzini

Michael S. Tsirkin
2017-04-21 18:50:28 +0800

20 Apr, 2017

1 commit

121f80ba6 KVM: PPC: VFIO: Add in-kernel acceleration for VFIO

This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests targeted at an IOMMU TCE table used for VFIO
without passing them to user space, which saves time on switching
to user space and back.

This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM.
KVM tries to handle a TCE request in real mode; if that fails,
it passes the request to virtual mode to complete the operation.
If the virtual mode handler fails, the request is passed to
user space; this is not expected to happen though.

To avoid dealing with page use counters (which is tricky in real mode),
this only accelerates SPAPR TCE IOMMU v2 clients which are required
to pre-register the userspace memory. The very first TCE request will
be handled in the VFIO SPAPR TCE driver anyway as the userspace view
of the TCE table (iommu_table::it_userspace) is not allocated till
the very first mapping happens and we cannot call vmalloc in real mode.

If we fail to update a hardware IOMMU table for an unexpected reason, we just
clear it and move on as there is nothing really we can do about it -
for example, if we hot plug a VFIO device to a guest, existing TCE tables
will be mirrored automatically to the hardware and there is no interface
to report to the guest about possible failures.

This adds a new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to
the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd
and associates a physical IOMMU table with the SPAPR TCE table (which
is a guest view of the hardware IOMMU table). The iommu_table object
is cached and referenced so we do not have to look it up in real mode.

This does not implement the UNSET counterpart as there is no use for it -
once the acceleration is enabled, the existing userspace won't
disable it unless a VFIO container is destroyed; this adds necessary
cleanup to the KVM_DEV_VFIO_GROUP_DEL handler.

This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user
space.

This adds a real mode version of WARN_ON_ONCE() as the generic version
causes problems with rcu_sched. Since we are testing what vmalloc_to_phys()
returns in the code, this also adds a check for the already existing
vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect().

This finally makes use of vfio_external_user_iommu_id() which was
introduced quite some time ago and was considered for removal.

Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).

Signed-off-by: Alexey Kardashevskiy
Acked-by: Alex Williamson
Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras

Alexey Kardashevskiy
2017-04-20 09:39:26 +0800

12 Apr, 2017

1 commit

f7b1a77d3 Merge tag 'kvm-s390-next-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux

From: Christian Borntraeger

KVM: s390: features for 4.12

1. guarded storage support for guests
This contains an s390 base Linux feature branch that is necessary
to implement the KVM part
2. Provide an interface to implement adapter interruption suppression
which is necessary for proper zPCI support
3. Use more defines instead of numbers
4. Provide logging for lazy enablement of runtime instrumentation

Radim Krčmář
2017-04-12 02:54:40 +0800

09 Apr, 2017

2 commits

3fe17e682 KVM: arm/arm64: Add ARM user space interrupt signaling ABI

We have 2 modes for dealing with interrupts in the ARM world. We can
either handle them all using hardware acceleration through the vgic or
we can emulate a gic in user space and only drive CPU IRQ pins from
there.

Unfortunately, when driving IRQs from user space, we never tell user
space about events from devices emulated inside the kernel, which may
result in interrupt line state changes, so we lose out on for example
timer and PMU events if we run with user space gic emulation.

Define an ABI to publish such device output levels to userspace.

Reviewed-by: Alexander Graf
Reviewed-by: Marc Zyngier
Signed-off-by: Alexander Graf
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Alexander Graf
2017-04-09 22:49:38 +0800
d824ca52a arm/arm64: Add hyp-stub API documentation

In order to help people understand the hyp-stub API that exists
between the host kernel and the hypervisor mode (whether a hypervisor
has been installed or not), let's document said API.

As with any form of documentation, I expect it to become obsolete
and completely misleading within 20 minutes after having been merged.

Acked-by: Russell King
Acked-by: Catalin Marinas
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall

Marc Zyngier
2017-04-09 22:49:36 +0800

07 Apr, 2017

2 commits

ad6260da1 KVM: x86: drop legacy device assignment

Legacy device assignment has been deprecated since 4.2 (released
1.5 years ago). VFIO is better and everyone should have switched to it.
If they haven't, this should convince them. :)

Reviewed-by: Alex Williamson
Signed-off-by: Paolo Bonzini

Paolo Bonzini
2017-04-07 22:49:00 +0800
47a4693e1 KVM: s390: introduce AIS capability

Introduce a cap to enable the AIS facility bit, and add documentation
for this capability.

Signed-off-by: Yi Min Zhao
Signed-off-by: Fei Li
Reviewed-by: Cornelia Huck
Signed-off-by: Christian Borntraeger

Yi Min Zhao
2017-04-07 15:11:11 +0800

06 Apr, 2017

2 commits

715958f92 Merge tag 'kvm_mips_4.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/kvm-mips

From: James Hogan

KVM: MIPS: VZ support, Octeon III, and TLBR

Add basic support for the MIPS Virtualization Module (generally known as
MIPS VZ) in KVM. We primarily support the ImgTec P5600, P6600, I6400,
and Cavium Octeon III cores so far. Support is included for the
following VZ / guest hardware features:
- MIPS32 and MIPS64, r5 (VZ requires r5 or later) and r6
- TLBs with GuestID (IMG cores) or Root ASID Dealias (Octeon III)
- Shared physical root/guest TLB (IMG cores)
- FPU / MSA
- Cop0 timer (up to 1GHz for now due to soft timer limit)
- Segmentation control (EVA)
- Hardware page table walker (HTW) both for root and guest TLB

Also included is a proper implementation of the TLBR instruction for the
trap & emulate MIPS KVM implementation.

Preliminary MIPS architecture changes are applied directly with Ralf's
ack.

Radim Krčmář
2017-04-06 20:47:03 +0800
a89209501 KVM: s390: introduce adapter interrupt inject function

Inject adapter interrupts on a specified adapter, which allows
retrieving the adapter flags, e.g. whether the adapter is subject to
the AIS facility. Also add documentation for this interface.

For adapters subject to AIS, handle the airq injection suppression
for a given ISC according to the interruption mode:
- before injection, if NO-Interruptions Mode, just return 0 and
suppress, otherwise, allow the injection.
- after injection, if SINGLE-Interruption Mode, change it to
NO-Interruptions Mode to suppress the following interrupts.

Besides, add tracepoint for suppressed airq and AIS mode transitions.
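
The suppression rules above amount to a small state machine per ISC. The sketch below uses a hypothetical enum and function, not the data layout of the patch, and it returns whether the interrupt was injected, whereas the text above describes the suppressed case as returning 0.

```c
#include <stdbool.h>

/* Hypothetical per-ISC interruption modes. */
enum ais_mode {
    AIS_ALL_INTERRUPTIONS,
    AIS_SINGLE_INTERRUPTION,
    AIS_NO_INTERRUPTIONS,
};

/* Returns true if the adapter interrupt is injected, false if it is
 * suppressed. Updates the mode after a successful injection. */
static bool ais_try_inject(enum ais_mode *mode)
{
    /* Before injection: NO-Interruptions Mode suppresses the airq. */
    if (*mode == AIS_NO_INTERRUPTIONS)
        return false;

    /* The actual injection would happen here. */

    /* After injection: SINGLE-Interruption Mode allows exactly one. */
    if (*mode == AIS_SINGLE_INTERRUPTION)
        *mode = AIS_NO_INTERRUPTIONS;
    return true;
}
```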

Signed-off-by: Yi Min Zhao
Signed-off-by: Fei Li
Reviewed-by: Cornelia Huck
Signed-off-by: Christian Borntraeger

Yi Min Zhao
2017-04-06 19:15:37 +0800