Eric Lee / smarc-fsl-linux-kernel

16 Mar, 2019

2 commits

636deed6c Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

Pull KVM updates from Paolo Bonzini:
"ARM:
- some cleanups
- direct physical timer assignment
- cache sanitization for 32-bit guests

s390:
- interrupt cleanup
- introduction of the Guest Information Block
- preparation for processor subfunctions in cpu models

PPC:
- bug fixes and improvements, especially related to machine checks
and protection keys

x86:
- many, many cleanups, including removing a bunch of MMU code for
unnecessary optimizations
- AVIC fixes

Generic:
- memcg accounting"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits)
kvm: vmx: fix formatting of a comment
KVM: doc: Document the life cycle of a VM and its resources
MAINTAINERS: Add KVM selftests to existing KVM entry
Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"
KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()
KVM: PPC: Fix compilation when KVM is not enabled
KVM: Minor cleanups for kvm_main.c
KVM: s390: add debug logging for cpu model subfunctions
KVM: s390: implement subfunction processor calls
arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
KVM: arm/arm64: Remove unused timer variable
KVM: PPC: Book3S: Improve KVM reference counting
KVM: PPC: Book3S HV: Fix build failure without IOMMU support
Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"
x86: kvmguest: use TSC clocksource if invariant TSC is exposed
KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start
KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter
KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns
KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes()
KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children
...

Linus Torvalds
2019-03-16 06:00:28 +0800
eca6be566 KVM: doc: Document the life cycle of a VM and its resources ... Browse Code »

The series to add memcg accounting to KVM allocations[1] states:

There are many KVM kernel memory allocations which are tied to the
life of the VM process and should be charged to the VM process's
cgroup.

While it is correct to account KVM kernel allocations to the cgroup of
the process that created the VM, it's technically incorrect to state
that the KVM kernel memory allocations are tied to the life of the VM
process. This is because the VM itself, i.e. struct kvm, is not tied to
the life of the process which created it, rather it is tied to the life
of its associated file descriptor. In other words, kvm_destroy_vm() is
not invoked until fput() decrements its associated file's refcount to
zero. A simple example is to fork() in Qemu and have the child sleep
indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file
descriptor *and* the rogue child is killed.

The allocations are guaranteed to be *accounted* to the process which
created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the
ioctl() with -EIO if kvm->mm != current->mm. I.e. the child can keep
the VM "alive" but can't do anything useful with its reference.

Note that because 'struct kvm' also holds a reference to the mm_struct
of its owner, the above behavior also applies to userspace allocations.

Given that mucking with a VM's file descriptor can lead to subtle and
undesirable behavior, e.g. memcg charges persisting after a VM is shut
down, explicitly document a VM's lifecycle and its impact on the VM's
resources.

Alternatively, KVM could aggressively free resources when the creating
process exits, e.g. via mmu_notifier->release(). However, mmu_notifier
isn't guaranteed to be available, and freeing resources when the creator
exits is likely to be error prone and fragile as KVM would need to
ensure that it only freed resources that are truly out of reach. In
practice, the existing behavior shouldn't be problematic as a properly
configured system will prevent a child process from being moved out of
the appropriate cgroup hierarchy, i.e. prevent hiding the process from
the OOM killer, and will prevent an unprivileged user from being able to
to hold a reference to struct kvm via another method, e.g. debugfs.

[1]https://patchwork.kernel.org/patch/10806707/

Signed-off-by: Sean Christopherson
Signed-off-by: Paolo Bonzini

Sean Christopherson
2019-03-16 02:24:33 +0800

07 Mar, 2019

1 commit

8457fdfeb virtio-ccw: diag 500 may return a negative cookie ... Browse Code »

If something goes wrong in the kvm io bus handling, the virtio-ccw
diagnose may return a negative error value in the cookie gpr.

Document this.

Reviewed-by: Halil Pasic
Signed-off-by: Cornelia Huck
Signed-off-by: Michael S. Tsirkin

Cornelia Huck
2019-03-07 00:19:33 +0800

21 Feb, 2019

3 commits

49113d360 KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter ... Browse Code »

The hard-coded value 10000 in grow_halt_poll_ns() stands for the initial
start value when raising up vcpu->halt_poll_ns.
It actually sets the first timeout to the first polling session.
This value has significant effect on how tolerant we are to outliers.
On the standard case, higher value is better - we will spend more time
in the polling busyloop, handle events/interrupts faster and result
in better performance.
But on outliers it puts us in a busy loop that does nothing.
Even if the shrink factor is zero, we will still waste time on the first
iteration.
The optimal value changes between different workloads. It depends on
outliers rate and polling sessions length.
As this value has significant effect on the dynamic halt-polling
algorithm, it should be configurable and exposed.

Reviewed-by: Boris Ostrovsky
Reviewed-by: Liran Alon
Signed-off-by: Nir Weiner
Signed-off-by: Paolo Bonzini

Nir Weiner
2019-02-21 05:48:50 +0800
a592a3b8f Revert "KVM: MMU: document fast invalidate all pages" ... Browse Code »

Remove x86 KVM's fast invalidate mechanism, i.e. revert all patches
from the original series[1].

Though not explicitly stated, for all intents and purposes the fast
invalidate mechanism was added to speed up the scenario where removing
a memslot, e.g. as part of accessing reading PCI ROM, caused KVM to
flush all shadow entries[1]. Now that the memslot case flushes only
shadow entries belonging to the memslot, i.e. doesn't use the fast
invalidate mechanism, the only remaining usage of the mechanism are
when the VM is being destroyed and when the MMIO generation rolls
over.

When a VM is being destroyed, either there are no active vcpus, i.e.
there's no lock contention, or the VM has ungracefully terminated, in
which case we want to reclaim its pages as quickly as possible, i.e.
not release the MMU lock if there are still CPUs executing in the VM.

The MMIO generation scenario is almost literally a one-in-a-million
occurrence, i.e. is not a performance sensitive scenario.

Given that lock-breaking is not desirable (VM teardown) or irrelevant
(MMIO generation overflow), remove the fast invalidate mechanism to
simplify the code (a small amount) and to discourage future code from
zapping all pages as using such a big hammer should be a last restort.

This reverts commit f6f8adeef542a18b1cb26a0b772c9781a10bb477.

[1] https://lkml.kernel.org/r/1369960590-14138-1-git-send-email-xiaoguangrong@linux.vnet.ibm.com

Cc: Xiao Guangrong
Signed-off-by: Sean Christopherson
Signed-off-by: Paolo Bonzini

Sean Christopherson
2019-02-21 05:48:39 +0800
164bf7e56 KVM: Move the memslot update in-progress flag to bit 63 ... Browse Code »

...now that KVM won't explode by moving it out of bit 0. Using bit 63
eliminates the need to jump over bit 0, e.g. when calculating a new
memslots generation or when propagating the memslots generation to an
MMIO spte.

Signed-off-by: Sean Christopherson
Signed-off-by: Paolo Bonzini

Sean Christopherson
2019-02-21 05:48:37 +0800

12 Jan, 2019

1 commit

cf1754c2a Documentation/virtual/kvm: Update URL for AMD SEV API specification ... Browse Code »

The URL of [api-spec] in Documentation/virtual/kvm/amd-memory-encryption.rst
is no longer valid, replaced space with underscore.

Signed-off-by: Christophe de Dinechin
Reviewed-by: Brijesh Singh
Signed-off-by: Radim Krčmář

Christophe de Dinechin
2019-01-12 01:38:07 +0800

15 Dec, 2018

1 commit

2bc39970e x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID ... Browse Code »

With every new Hyper-V Enlightenment we implement we're forced to add a
KVM_CAP_HYPERV_* capability. While this approach works it is fairly
inconvenient: the majority of the enlightenments we do have corresponding
CPUID feature bit(s) and userspace has to know this anyways to be able to
expose the feature to the guest.

Add KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one
cap to rule them all!") returning all Hyper-V CPUID feature leaves.

Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible:
Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000,
0x40000001) and we would probably confuse userspace in case we decide to
return these twice.

KVM_CAP_HYPERV_CPUID's number is interim: we're intended to drop
KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead.

Suggested-by: Paolo Bonzini
Signed-off-by: Vitaly Kuznetsov
Signed-off-by: Paolo Bonzini

Vitaly Kuznetsov
2018-12-15 00:59:54 +0800

14 Dec, 2018

2 commits

2a31b9db1 kvm: introduce manual dirty log reprotect ... Browse Code »

There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:

1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).

This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.

Signed-off-by: Paolo Bonzini

Paolo Bonzini
2018-12-14 19:34:19 +0800
e5d83c74a kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic ... Browse Code »

The first such capability to be handled in virt/kvm/ will be manual
dirty page reprotection.

Signed-off-by: Paolo Bonzini

Paolo Bonzini
2018-12-14 19:34:18 +0800

26 Oct, 2018

1 commit

0d1e8b8d2 Merge tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

Pull KVM updates from Radim Krčmář:
"ARM:
- Improved guest IPA space support (32 to 52 bits)

- RAS event delivery for 32bit

- PMU fixes

- Guest entry hardening

- Various cleanups

- Port of dirty_log_test selftest

PPC:
- Nested HV KVM support for radix guests on POWER9. The performance
is much better than with PR KVM. Migration and arbitrary level of
nesting is supported.

- Disable nested HV-KVM on early POWER9 chips that need a particular
hardware bug workaround

- One VM per core mode to prevent potential data leaks

- PCI pass-through optimization

- merge ppc-kvm topic branch and kvm-ppc-fixes to get a better base

s390:
- Initial version of AP crypto virtualization via vfio-mdev

- Improvement for vfio-ap

- Set the host program identifier

- Optimize page table locking

x86:
- Enable nested virtualization by default

- Implement Hyper-V IPI hypercalls

- Improve #PF and #DB handling

- Allow guests to use Enlightened VMCS

- Add migration selftests for VMCS and Enlightened VMCS

- Allow coalesced PIO accesses

- Add an option to perform nested VMCS host state consistency check
through hardware

- Automatic tuning of lapic_timer_advance_ns

- Many fixes, minor improvements, and cleanups"

* tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
KVM/nVMX: Do not validate that posted_intr_desc_addr is page aligned
Revert "kvm: x86: optimize dr6 restore"
KVM: PPC: Optimize clearing TCEs for sparse tables
x86/kvm/nVMX: tweak shadow fields
selftests/kvm: add missing executables to .gitignore
KVM: arm64: Safety check PSTATE when entering guest and handle IL
KVM: PPC: Book3S HV: Don't use streamlined entry path on early POWER9 chips
arm/arm64: KVM: Enable 32 bits kvm vcpu events support
arm/arm64: KVM: Rename function kvm_arch_dev_ioctl_check_extension()
KVM: arm64: Fix caching of host MDCR_EL2 value
KVM: VMX: enable nested virtualization by default
KVM/x86: Use 32bit xor to clear registers in svm.c
kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD
kvm: vmx: Defer setting of DR6 until #DB delivery
kvm: x86: Defer setting of CR2 until #PF delivery
kvm: x86: Add payload operands to kvm_multiple_exception
kvm: x86: Add exception payload fields to kvm_vcpu_events
kvm: x86: Add has_payload and payload to kvm_queued_exception
KVM: Documentation: Fix omission in struct kvm_vcpu_events
KVM: selftests: add Enlightened VMCS test
...

Linus Torvalds
2018-10-26 08:57:35 +0800

25 Oct, 2018

1 commit

01aa9d518 Merge tag 'docs-4.20' of git://git.lwn.net/linux ... Browse Code »

Pull documentation updates from Jonathan Corbet:
"This is a fairly typical cycle for documentation. There's some welcome
readability improvements for the formatted output, some LICENSES
updates including the addition of the ISC license, the removal of the
unloved and unmaintained 00-INDEX files, the deprecated APIs document
from Kees, more MM docs from Mike Rapoport, and the usual pile of typo
fixes and corrections"

* tag 'docs-4.20' of git://git.lwn.net/linux: (41 commits)
docs: Fix typos in histogram.rst
docs: Introduce deprecated APIs list
kernel-doc: fix declaration type determination
doc: fix a typo in adding-syscalls.rst
docs/admin-guide: memory-hotplug: remove table of contents
doc: printk-formats: Remove bogus kobject references for device nodes
Documentation: preempt-locking: Use better example
dm flakey: Document "error_writes" feature
docs/completion.txt: Fix a couple of punctuation nits
LICENSES: Add ISC license text
LICENSES: Add note to CDDL-1.0 license that it should not be used
docs/core-api: memory-hotplug: add some details about locking internals
docs/core-api: rename memory-hotplug-notifier to memory-hotplug
docs: improve readability for people with poorer eyesight
yama: clarify ptrace_scope=2 in Yama documentation
docs/vm: split memory hotplug notifier description to Documentation/core-api
docs: move memory hotplug description into admin-guide/mm
doc: Fix acronym "FEKEK" in ecryptfs
docs: fix some broken documentation references
iommu: Fix passthrough option documentation
...

Linus Torvalds
2018-10-25 01:01:11 +0800

19 Oct, 2018

1 commit

e42b4a507 Merge tag 'kvmarm-for-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/kv… ... Browse Code »

…marm/kvmarm into HEAD

KVM/arm updates for 4.20

- Improved guest IPA space support (32 to 52 bits)
- RAS event delivery for 32bit
- PMU fixes
- Guest entry hardening
- Various cleanups

Paolo Bonzini
2018-10-19 21:24:24 +0800

18 Oct, 2018

2 commits

c4f55198c kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD ... Browse Code »

This is a per-VM capability which can be enabled by userspace so that
the faulting linear address will be included with the information
about a pending #PF in L2, and the "new DR6 bits" will be included
with the information about a pending #DB in L2. With this capability
enabled, the L1 hypervisor can now intercept #PF before CR2 is
modified. Under VMX, the L1 hypervisor can now intercept #DB before
DR6 and DR7 are modified.

When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should
generally provide an appropriate payload when injecting a #PF or #DB
exception via KVM_SET_VCPU_EVENTS. However, to support restoring old
checkpoints, this payload is not required.

Note that bit 16 of the "new DR6 bits" is set to indicate that a debug
exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM
region while advanced debugging of RTM transactional regions was
enabled. This is the reverse of DR6.RTM, which is cleared in this
scenario.

This capability also enables exception.pending in struct
kvm_vcpu_events, which allows userspace to distinguish between pending
and injected exceptions.

Reported-by: Jim Mattson
Suggested-by: Paolo Bonzini
Signed-off-by: Jim Mattson
Signed-off-by: Paolo Bonzini

Jim Mattson
2018-10-18 01:07:44 +0800
59073aaf6 kvm: x86: Add exception payload fields to kvm_vcpu_events ... Browse Code »

The per-VM capability KVM_CAP_EXCEPTION_PAYLOAD (to be introduced in a
later commit) adds the following fields to struct kvm_vcpu_events:
exception_has_payload, exception_payload, and exception.pending.

With this capability set, all of the details of vcpu->arch.exception,
including the payload for a pending exception, are reported to
userspace in response to KVM_GET_VCPU_EVENTS.

With this capability clear, the original ABI is preserved, and the
exception.injected field is set for either pending or injected
exceptions.

When userspace calls KVM_SET_VCPU_EVENTS with
KVM_CAP_EXCEPTION_PAYLOAD clear, exception.injected is no longer
translated to exception.pending. KVM_SET_VCPU_EVENTS can now only
establish a pending exception when KVM_CAP_EXCEPTION_PAYLOAD is set.

Reported-by: Jim Mattson
Suggested-by: Paolo Bonzini
Signed-off-by: Jim Mattson
Signed-off-by: Paolo Bonzini

Jim Mattson
2018-10-18 01:07:38 +0800

17 Oct, 2018

4 commits

bba9ce58d KVM: Documentation: Fix omission in struct kvm_vcpu_events ... Browse Code »

The header file indicates that there are 36 reserved bytes at the end
of this structure. Adjust the documentation to agree with the header
file.

Signed-off-by: Jim Mattson
Signed-off-by: Paolo Bonzini

Jim Mattson
2018-10-17 06:30:21 +0800
0804c849f kvm/x86 : add coalesced pio support ... Browse Code »

Coalesced pio is based on coalesced mmio and can be used for some port
like rtc port, pci-host config port and so on.

Specially in case of rtc as coalesced pio, some versions of windows guest
access rtc frequently because of rtc as system tick. guest access rtc like
this: write register index to 0x70, then write or read data from 0x71.
writing 0x70 port is just as index and do nothing else. So we can use
coalesced pio to handle this scene to reduce VM-EXIT time.

When starting and closing a virtual machine, it will access pci-host config
port frequently. So setting these port as coalesced pio can reduce startup
and shutdown time.

without my patch, get the vm-exit time of accessing rtc 0x70 and piix 0xcf8
using perf tools: (guest OS : windows 7 64bit)
IO Port Access Samples Samples% Time% Min Time Max Time Avg time
0x70:POUT 86 30.99% 74.59% 9us 29us 10.75us (+- 3.41%)
0xcf8:POUT 1119 2.60% 2.12% 2.79us 56.83us 3.41us (+- 2.23%)

with my patch
IO Port Access Samples Samples% Time% Min Time Max Time Avg time
0x70:POUT 106 32.02% 29.47% 0us 10us 1.57us (+- 7.38%)
0xcf8:POUT 1065 1.67% 0.28% 0.41us 65.44us 0.66us (+- 10.55%)

Signed-off-by: Peng Hao
Signed-off-by: Paolo Bonzini

Peng Hao
2018-10-17 06:30:11 +0800
9943450b7 kvm/x86 : add document for coalesced mmio ... Browse Code »

Signed-off-by: Peng Hao
Signed-off-by: Paolo Bonzini

Peng Hao
2018-10-17 06:30:10 +0800
214ff83d4 KVM: x86: hyperv: implement PV IPI send hypercalls ... Browse Code »

Using hypercall for sending IPIs is faster because this allows to specify
any number of vCPUs (even > 64 with sparse CPU set), the whole procedure
will take only one VMEXIT.

Current Hyper-V TLFS (v5.0b) claims that HvCallSendSyntheticClusterIpi
hypercall can't be 'fast' (passing parameters through registers) but
apparently this is not true, Windows always uses it as 'fast' so we need
to support that.

Signed-off-by: Vitaly Kuznetsov
Signed-off-by: Paolo Bonzini

Vitaly Kuznetsov
2018-10-17 06:29:47 +0800

09 Oct, 2018

4 commits

901f8c3f6 KVM: PPC: Book3S HV: Add NO_HASH flag to GET_SMMU_INFO ioctl result ... Browse Code »

This adds a KVM_PPC_NO_HASH flag to the flags field of the
kvm_ppc_smmu_info struct, and arranges for it to be set when
running as a nested hypervisor, as an unambiguous indication
to userspace that HPT guests are not supported. Reporting the
KVM_CAP_PPC_MMU_HASH_V3 capability as false could be taken as
indicating only that the new HPT features in ISA V3.0 are not
supported, leaving it ambiguous whether pre-V3.0 HPT features
are supported.

Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras

Paul Mackerras
2018-10-09 13:14:54 +0800
aa069a996 KVM: PPC: Book3S HV: Add a VM capability to enable nested virtualization ... Browse Code »

With this, userspace can enable a KVM-HV guest to run nested guests
under it.

The administrator can control whether any nested guests can be run;
setting the "nested" module parameter to false prevents any guests
becoming nested hypervisors (that is, any attempt to enable the nested
capability on a guest will fail). Guests which are already nested
hypervisors will continue to be so.

Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras

Paul Mackerras
2018-10-09 13:14:47 +0800
9d67121a4 Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next ... Browse Code »

This merges in the "ppc-kvm" topic branch of the powerpc tree to get a
series of commits that touch both general arch/powerpc code and KVM
code. These commits will be merged both via the KVM tree and the
powerpc tree.

Signed-off-by: Paul Mackerras

Paul Mackerras
2018-10-09 13:13:20 +0800
303234185 KVM: PPC: Book3S HV: Add one-reg interface to virtual PTCR register ... Browse Code »

This adds a one-reg register identifier which can be used to read and
set the virtual PTCR for the guest. This register identifies the
address and size of the virtual partition table for the guest, which
contains information about the nested guests under this guest.

Migrating this value is the only extra requirement for migrating a
guest which has nested guests (assuming of course that the destination
host supports nested virtualization in the kvm-hv module).

Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras
Signed-off-by: Michael Ellerman

Paul Mackerras
2018-10-09 13:04:27 +0800

03 Oct, 2018

1 commit

233a7cb23 kvm: arm64: Allow tuning the physical address size for VM ... Browse Code »

Allow specifying the physical address size limit for a new
VM via the kvm_type argument for the KVM_CREATE_VM ioctl. This
allows us to finalise the stage2 page table as early as possible
and hence perform the right checks on the memory slots
without complication. The size is encoded as Log2(PA_Size) in
bits[7:0] of the type field. For backward compatibility the
value 0 is reserved and implies 40bits. Also, lift the limit
of the IPA to host limit and allow lower IPA sizes (e.g, 32).

The userspace could check the extension KVM_CAP_ARM_VM_IPA_SIZE
for the availability of this feature. The cap check returns the
maximum limit for the physical address shift supported by the host.

Cc: Marc Zyngier
Cc: Christoffer Dall
Cc: Peter Maydell
Cc: Paolo Bonzini
Cc: Radim Krčmář
Reviewed-by: Eric Auger
Signed-off-by: Suzuki K Poulose
Signed-off-by: Marc Zyngier

Suzuki K Poulose
2018-10-03 18:45:20 +0800

20 Sep, 2018

1 commit

6fbbde9a1 KVM: x86: Control guest reads of MSR_PLATFORM_INFO ... Browse Code »

Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access
to reads of MSR_PLATFORM_INFO.

Disabling access to reads of this MSR gives userspace the control to "expose"
this platform-dependent information to guests in a clear way. As it exists
today, guests that read this MSR would get unpopulated information if userspace
hadn't already set it (and prior to this patch series, only the CPUID faulting
information could have been populated). This existing interface could be
confusing if guests don't handle the potential for incorrect/incomplete
information gracefully (e.g. zero reported for base frequency).

Signed-off-by: Drew Schmitt
Signed-off-by: Paolo Bonzini

Drew Schmitt
2018-09-20 06:51:46 +0800

12 Sep, 2018

1 commit

40ebdb8e5 KVM: s390: Make huge pages unavailable in ucontrol VMs ... Browse Code »

We currently do not notify all gmaps when using gmap_pmdp_xchg(), due
to locking constraints. This makes ucontrol VMs, which is the only VM
type that creates multiple gmaps, incompatible with huge pages. Also
we would need to hold the guest_table_lock of all gmaps that have this
vmaddr maped to synchronize access to the pmd.

ucontrol VMs are rather exotic and creating a new locking concept is
no easy task. Hence we return EINVAL when trying to active
KVM_CAP_S390_HPAGE_1M and report it as being not available when
checking for it.

Fixes: a4499382 ("KVM: s390: Add huge page enablement control")
Signed-off-by: Janosch Frank
Reviewed-by: David Hildenbrand
Reviewed-by: Claudio Imbrenda
Message-Id:
Signed-off-by: Janosch Frank

Janosch Frank
2018-09-12 20:46:37 +0800

10 Sep, 2018

1 commit

a7ddcea58 Drop all 00-INDEX files from Documentation/ ... Browse Code »

This is a respin with a wider audience (all that get_maintainer returned)
and I know this spams a *lot* of people. Not sure what would be the correct
way, so my apologies for ruining your inbox.

The 00-INDEX files are supposed to give a summary of all files present
in a directory, but these files are horribly out of date and their
usefulness is brought into question. Often a simple "ls" would reveal
the same information as the filenames are generally quite descriptive as
a short introduction to what the file covers (it should not surprise
anyone what Documentation/sched/sched-design-CFS.txt covers)

A few years back it was mentioned that these files were no longer really
needed, and they have since then grown further out of date, so perhaps
it is time to just throw them out.

A short status yields the following _outdated_ 00-INDEX files, first
counter is files listed in 00-INDEX but missing in the directory, last
is files present but not listed in 00-INDEX.

List of outdated 00-INDEX:
Documentation: (4/10)
Documentation/sysctl: (0/1)
Documentation/timers: (1/0)
Documentation/blockdev: (3/1)
Documentation/w1/slaves: (0/1)
Documentation/locking: (0/1)
Documentation/devicetree: (0/5)
Documentation/power: (1/1)
Documentation/powerpc: (0/5)
Documentation/arm: (1/0)
Documentation/x86: (0/9)
Documentation/x86/x86_64: (1/1)
Documentation/scsi: (4/4)
Documentation/filesystems: (2/9)
Documentation/filesystems/nfs: (0/2)
Documentation/cgroup-v1: (0/2)
Documentation/kbuild: (0/4)
Documentation/spi: (1/0)
Documentation/virtual/kvm: (1/0)
Documentation/scheduler: (0/2)
Documentation/fb: (0/1)
Documentation/block: (0/1)
Documentation/networking: (6/37)
Documentation/vm: (1/3)

Then there are 364 subdirectories in Documentation/ with several files that
are missing 00-INDEX alltogether (and another 120 with a single file and no
00-INDEX).

I don't really have an opinion to whether or not we /should/ have 00-INDEX,
but the above 00-INDEX should either be removed or be kept up to date. If
we should keep the files, I can try to keep them updated, but I rather not
if we just want to delete them anyway.

As a starting point, remove all index-files and references to 00-INDEX and
see where the discussion is going.

Signed-off-by: Henrik Austad
Acked-by: "Paul E. McKenney"
Just-do-it-by: Steven Rostedt
Reviewed-by: Jens Axboe
Acked-by: Paul Moore
Acked-by: Greg Kroah-Hartman
Acked-by: Mark Brown
Acked-by: Mike Rapoport
Cc: [Almost everybody else]
Signed-off-by: Jonathan Corbet

Henrik Austad
2018-09-10 05:08:58 +0800

22 Aug, 2018

2 commits

688e0581d KVM: Documentation: rename the capability of KVM_CAP_ARM_SET_SERROR_ESR ... Browse Code »

In the documentation description, this capability's name is
KVM_CAP_ARM_SET_SERROR_ESR, but in the header file this
capability's name is KVM_CAP_ARM_INJECT_SERROR_ESR, so change
the documentation description to make it same.

Signed-off-by: Dongjiu Geng
Reported-by: Andrew Jones
Reviewed-by: Andrew Jones
Signed-off-by: Paolo Bonzini

Dongjiu Geng
2018-08-22 20:08:05 +0800
631989303 Merge tag 'kvmarm-for-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kv… ... Browse Code »

…marm/kvmarm into HEAD

KVM/arm updates for 4.19

- Support for Group0 interrupts in guests
- Cache management optimizations for ARMv8.4 systems
- Userspace interface for RAS, allowing error retrival and injection
- Fault path optimization
- Emulated physical timer fixes
- Random cleanups

Paolo Bonzini
2018-08-22 20:07:56 +0800

06 Aug, 2018

2 commits

4180bf1b6 KVM: X86: Implement "send IPI" hypercall ... Browse Code »

Using hypercall to send IPIs by one vmexit instead of one by one for
xAPIC/x2APIC physical mode and one vmexit per-cluster for x2APIC cluster
mode. Intel guest can enter x2apic cluster mode when interrupt remmaping
is enabled in qemu, however, latest AMD EPYC still just supports xapic
mode which can get great improvement by Exit-less IPIs. This patchset
lets a guest send multicast IPIs, with at most 128 destinations per
hypercall in 64-bit mode and 64 vCPUs per hypercall in 32-bit mode.

Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads, the VM
is 80 vCPUs, IPI microbenchmark(https://lkml.org/lkml/2017/12/19/141):

x2apic cluster mode, vanilla

Dry-run: 0, 2392199 ns
Self-IPI: 6907514, 15027589 ns
Normal IPI: 223910476, 251301666 ns
Broadcast IPI: 0, 9282161150 ns
Broadcast lock: 0, 8812934104 ns

x2apic cluster mode, pv-ipi

Dry-run: 0, 2449341 ns
Self-IPI: 6720360, 15028732 ns
Normal IPI: 228643307, 255708477 ns
Broadcast IPI: 0, 7572293590 ns => 22% performance boost
Broadcast lock: 0, 8316124651 ns

x2apic physical mode, vanilla

Dry-run: 0, 3135933 ns
Self-IPI: 8572670, 17901757 ns
Normal IPI: 226444334, 255421709 ns
Broadcast IPI: 0, 19845070887 ns
Broadcast lock: 0, 19827383656 ns

x2apic physical mode, pv-ipi

Dry-run: 0, 2446381 ns
Self-IPI: 6788217, 15021056 ns
Normal IPI: 219454441, 249583458 ns
Broadcast IPI: 0, 7806540019 ns => 154% performance boost
Broadcast lock: 0, 9143618799 ns

Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Vitaly Kuznetsov
Signed-off-by: Wanpeng Li
Signed-off-by: Paolo Bonzini

Wanpeng Li
2018-08-06 23:59:20 +0800
8fcc4b592 kvm: nVMX: Introduce KVM_CAP_NESTED_STATE ... Browse Code »

For nested virtualization L0 KVM is managing a bit of state for L2 guests,
this state can not be captured through the currently available IOCTLs. In
fact the state captured through all of these IOCTLs is usually a mix of L1
and L2 state. It is also dependent on whether the L2 guest was running at
the moment when the process was interrupted to save its state.

With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
that is in VMX operation.

Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: H. Peter Anvin
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jim Mattson
[karahmed@ - rename structs and functions and make them ready for AMD and
address previous comments.
- handle nested.smm state.
- rebase & a bit of refactoring.
- Merge 7/8 and 8/8 into one patch. ]
Signed-off-by: KarimAllah Ahmed
Signed-off-by: Paolo Bonzini

Jim Mattson
2018-08-06 23:58:30 +0800

31 Jul, 2018

1 commit

a44993829 KVM: s390: Add huge page enablement control ... Browse Code »

General KVM huge page support on s390 has to be enabled via the
kvm.hpage module parameter. Either nested or hpage can be enabled, as
we currently do not support vSIE for huge backed guests. Once the vSIE
support is added we will either drop the parameter or enable it as
default.

For a guest the feature has to be enabled through the new
KVM_CAP_S390_HPAGE_1M capability and the hpage module
parameter. Enabling it means that cmm can't be enabled for the vm and
disables pfmf and storage key interpretation.

This is due to the fact that in some cases, in upcoming patches, we
have to split huge pages in the guest mapping to be able to set more
granular memory protection on 4k pages. These split pages have fake
page tables that are not visible to the Linux memory management which
subsequently will not manage its PGSTEs, while the SIE will. Disabling
these features lets us manage PGSTE data in a consistent matter and
solve that problem.

Signed-off-by: Janosch Frank
Reviewed-by: David Hildenbrand

Janosch Frank
2018-07-31 05:13:38 +0800

21 Jul, 2018

4 commits

b0960b956 KVM: arm: Add 32bit get/set events support ... Browse Code »

arm64's new use of KVMs get_events/set_events API calls isn't just
or RAS, it allows an SError that has been made pending by KVM as
part of its device emulation to be migrated.

Wire this up for 32bit too.

We only need to read/write the HCR_VA bit, and check that no esr has
been provided, as we don't yet support VDFSR.

Signed-off-by: James Morse
Reviewed-by: Dongjiu Geng
Signed-off-by: Marc Zyngier

James Morse
2018-07-21 23:02:32 +0800
be26b3a73 arm64: KVM: export the capability to set guest SError syndrome ... Browse Code »

For the arm64 RAS Extension, user space can inject a virtual-SError
with specified ESR. So user space needs to know whether KVM support
to inject such SError, this interface adds this query for this capability.

KVM will check whether system support RAS Extension, if supported, KVM
returns true to user space, otherwise returns false.

Signed-off-by: Dongjiu Geng
Reviewed-by: James Morse
[expanded documentation wording]
Signed-off-by: James Morse
Signed-off-by: Marc Zyngier

Dongjiu Geng
2018-07-21 23:02:31 +0800
b7b27facc arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS ... Browse Code »

For the migrating VMs, user space may need to know the exception
state. For example, in the machine A, KVM make an SError pending,
when migrate to B, KVM also needs to pend an SError.

This new IOCTL exports user-invisible states related to SError.
Together with appropriate user space changes, user space can get/set
the SError exception state to do migrate/snapshot/suspend.

Signed-off-by: Dongjiu Geng
Reviewed-by: James Morse
[expanded documentation wording]
Signed-off-by: James Morse
Signed-off-by: Marc Zyngier

Dongjiu Geng
2018-07-21 23:02:30 +0800
327432c24 KVM: arm/arm64: vgic: Update documentation of the GIC devices wrt IIDR ... Browse Code »

Update the documentation to reflect the ordering requirements of
restoring the GICD_IIDR register before any other registers and the
effects this has on restoring the interrupt groups for an emulated GICv2
instance.

Also remove some outdated limitations in the documentation while we're
at it.

Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Christoffer Dall
2018-07-21 23:02:30 +0800

22 Jun, 2018

1 commit

2ddc64981 KVM: fix KVM_CAP_HYPERV_TLBFLUSH paragraph number ... Browse Code »

KVM_CAP_HYPERV_TLBFLUSH collided with KVM_CAP_S390_PSW-BPB, its paragraph
number should now be 8.18.

Signed-off-by: Vitaly Kuznetsov
Signed-off-by: Radim Krčmář

Vitaly Kuznetsov
2018-06-22 23:30:20 +0800

02 Jun, 2018

3 commits

71e9d9aee KVM: docs: nVMX: Remove known limitations as they do not exist now ... Browse Code »

We can document other "Known Limitations" but the ones currently
referenced don't hold anymore...

Signed-off-by: Liran Alon
Signed-off-by: Paolo Bonzini

Liran Alon
2018-06-02 01:18:28 +0800
7f4693b8b KVM: docs: mmu: KVM support exposing SLAT to guests ... Browse Code »

Fix outdated statement that KVM is not able to expose SLAT
(Second-Layer-Address-Translation) to guests.
This was implemented a long time ago...

Signed-off-by: Liran Alon
Signed-off-by: Paolo Bonzini

Liran Alon
2018-06-02 01:18:27 +0800
5eec43a1f Merge tag 'kvmarm-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kv… ... Browse Code »

…marm/kvmarm into HEAD

KVM/ARM updates for 4.18

- Lazy context-switching of FPSIMD registers on arm64
- Allow virtual redistributors to be part of two or more MMIO ranges

Paolo Bonzini
2018-06-02 01:17:22 +0800