Eric Lee / smarc-fsl-linux-kernel

08 Sep, 2021

1 commit

192ad3c27 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
"ARM:
- Page ownership tracking between host EL1 and EL2
- Rely on userspace page tables to create large stage-2 mappings
- Fix incompatibility between pKVM and kmemleak
- Fix the PMU reset state, and improve the performance of the virtual
PMU
- Move over to the generic KVM entry code
- Address PSCI reset issues w.r.t. save/restore
- Preliminary rework for the upcoming pKVM fixed feature
- A bunch of MM cleanups
- a vGIC fix for timer spurious interrupts
- Various cleanups

s390:
- enable interpretation of specification exceptions
- fix a vcpu_idx vs vcpu_id mixup

x86:
- fast (lockless) page fault support for the new MMU
- new MMU now the default
- increased maximum allowed VCPU count
- allow inhibiting IRQs on KVM_RUN while debugging guests
- let Hyper-V-enabled guests run with virtualized LAPIC as long as
they do not enable the Hyper-V "AutoEOI" feature
- fixes and optimizations for the toggling of AMD AVIC (virtualized
LAPIC)
- tuning for the case when two-dimensional paging (EPT/NPT) is
disabled
- bugfixes and cleanups, especially with respect to vCPU reset and
choosing a paging mode based on CR0/CR4/EFER
- support for 5-level page table on AMD processors

Generic:
- MMU notifier invalidation callbacks do not take mmu_lock unless
necessary
- improved caching of LRU kvm_memory_slot
- support for histogram statistics
- add statistics for halt polling and remote TLB flush requests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits)
KVM: Drop unused kvm_dirty_gfn_invalid()
KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
KVM: MMU: mark role_regs and role accessors as maybe unused
KVM: MIPS: Remove a "set but not used" variable
x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait
KVM: stats: Add VM stat for remote tlb flush requests
KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count()
KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page
KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality
Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"
KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page
kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710
kvm: x86: Increase MAX_VCPUS to 1024
kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS
KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host
KVM: s390: index kvm->arch.idle_mask by vcpu_idx
KVM: s390: Enable specification exception interpretation
KVM: arm64: Trim guest debug exception handling
KVM: SVM: Add 5-level page table support for SVM
...

Linus Torvalds
2021-09-08 04:40:51 +0800

02 Sep, 2021

1 commit

4ac6d9086 Merge tag 'docs-5.15' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
"Yet another set of documentation changes:

- A reworking of PDF generation to yield better results for documents
using CJK fonts in particular.

- A new set of translations into traditional Chinese, a dialect for
which I am assured there is a community of interested readers.

- A lot more regular Chinese translation work as well.

... plus the usual assortment of updates, fixes, typo tweaks, etc"

* tag 'docs-5.15' of git://git.lwn.net/linux: (55 commits)
docs: sphinx-requirements: Move sphinx_rtd_theme to top
docs: pdfdocs: Enable language-specific font choice of zh_TW translations
docs: pdfdocs: Teach xeCJK about character classes of quotation marks
docs: pdfdocs: Permit AutoFakeSlant for CJK fonts
docs: pdfdocs: One-half spacing for CJK translations
docs: pdfdocs: Add conf.py local to translations for ascii-art alignment
docs: pdfdocs: Preserve inter-phrase space in Korean translations
docs: pdfdocs: Choose Serif font as CJK mainfont if possible
docs: pdfdocs: Add CJK-language-specific font settings
docs: pdfdocs: Refactor config for CJK document
scripts/kernel-doc: Override -Werror from KCFLAGS with KDOC_WERROR
docs/zh_CN: Add zh_CN/accounting/psi.rst
doc: align Italian translation
Documentation/features/vm: riscv supports THP now
docs/zh_CN: add infiniband user_verbs translation
docs/zh_CN: add infiniband user_mad translation
docs/zh_CN: add infiniband tag_matching translation
docs/zh_CN: add infiniband sysfs translation
docs/zh_CN: add infiniband opa_vnic translation
docs/zh_CN: add infiniband ipoib translation
...

Linus Torvalds
2021-09-02 09:49:47 +0800

21 Aug, 2021

2 commits

61e5f69ef KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ

KVM_GUESTDBG_BLOCKIRQ will allow KVM to block all interrupts
while running.

This change is mostly intended for more robust single stepping
of the guest and it has the following benefits when enabled:

* Resuming from a breakpoint is much more reliable.
When resuming execution from a breakpoint, with interrupts enabled,
more often than not, KVM would inject an interrupt and make the CPU
jump immediately to the interrupt handler and eventually return to
the breakpoint, to trigger it again.

From the user point of view it looks like the CPU never executed a
single instruction and in some cases that can even prevent forward
progress, for example, when the breakpoint is placed by an automated
script (e.g. lx-symbols), which does something in response to the
breakpoint and then continues the guest automatically.
If the script execution takes enough time for another interrupt to
arrive, the guest will be stuck on the same breakpoint RIP forever.

* Normal single stepping is much more predictable, since it won't
land the debugger into an interrupt handler.

* RFLAGS.TF has less chance to be leaked to the guest:

We set that flag behind the guest's back to do single stepping
but if single step lands us into an interrupt/exception handler
it will be leaked to the guest in the form of being pushed
to the stack.
This doesn't completely eliminate this problem as exceptions
can still happen, but at least this reduces the chances
of this happening.

Signed-off-by: Maxim Levitsky
Message-Id:
Signed-off-by: Paolo Bonzini

Maxim Levitsky
2021-08-21 04:06:37 +0800
0176ec512 KVM: stats: Update doc for histogram statistics

Add documentation for linear and logarithmic histogram statistics.
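
To make the two histogram types concrete, here is a minimal sketch of how a sample could be mapped to a bucket. It assumes the bucket layout as I understand it from the KVM API documentation: linear buckets of a fixed `bucket_size`, and logarithmic buckets where bucket 0 holds the value 0 and later buckets cover successive powers of two, with the last bucket absorbing everything larger in both cases. The function names and zero-based indices are illustrative and are not taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Linear histogram (assumed layout): bucket i covers
 * [bucket_size * i, bucket_size * (i + 1)); the last bucket also
 * takes every larger value. Indices are zero-based. */
size_t linear_hist_bucket(uint64_t value, uint64_t bucket_size, size_t nbuckets)
{
    assert(bucket_size > 0 && nbuckets > 0);
    uint64_t idx = value / bucket_size;
    return idx < nbuckets ? (size_t)idx : nbuckets - 1;
}

/* Logarithmic histogram (assumed layout): bucket 0 holds the value 0,
 * bucket i (i >= 1) covers [2^(i-1), 2^i); the last bucket also takes
 * every larger value. */
size_t log_hist_bucket(uint64_t value, size_t nbuckets)
{
    assert(nbuckets > 0);
    size_t idx = 0;
    while (value != 0) {    /* idx ends up as floor(log2(value)) + 1 */
        value >>= 1;
        idx++;
    }
    return idx < nbuckets ? idx : nbuckets - 1;
}
```

With four linear buckets of size 10, the values 0 to 9 land in bucket 0 and anything from 30 upwards lands in bucket 3. With five logarithmic buckets, the values 4 to 7 land in bucket 3 and anything from 8 upwards lands in bucket 4.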

Signed-off-by: Jing Zhang
Message-Id:
[Small changes to the phrasing. - Paolo]
Signed-off-by: Paolo Bonzini

Jing Zhang
2021-08-21 04:06:32 +0800

13 Aug, 2021

2 commits

9a63b4517 Merge branch 'kvm-tdpmmu-fixes' into HEAD

Merge topic branch with fixes for 5.14-rc6 and 5.15 merge window.

Paolo Bonzini
2021-08-13 15:35:01 +0800
ce25681d5 KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock

Add yet another spinlock for the TDP MMU and take it when marking indirect
shadow pages unsync. When using the TDP MMU and L1 is running L2(s) with
nested TDP, KVM may encounter shadow pages for the TDP entries managed by
L1 (controlling L2) when handling a TDP MMU page fault. The unsync logic
is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and
misbehaves when a shadow page is marked unsync via a TDP MMU page fault,
which runs with mmu_lock held for read, not write.

Lack of a critical section manifests most visibly as an underflow of
unsync_children in clear_unsync_child_bit() due to unsync_children being
corrupted when multiple CPUs write it without a critical section and
without atomic operations. But underflow is the best case scenario. The
worst case scenario is that unsync_children prematurely hits '0' and
leads to guest memory corruption due to KVM neglecting to properly sync
shadow pages.

Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock
would functionally be ok. Usurping the lock could degrade performance when
building upper level page tables on different vCPUs, especially since the
unsync flow could hold the lock for a comparatively long time depending on
the number of indirect shadow pages and the depth of the paging tree.

For simplicity, take the lock for all MMUs, even though KVM could fairly
easily know that mmu_lock is held for write. If mmu_lock is held for
write, there cannot be contention for the inner spinlock, and marking
shadow pages unsync across multiple vCPUs will be slow enough that
bouncing the kvm_arch cacheline should be in the noise.

Note, even though L2 could theoretically be given access to its own EPT
entries, a nested MMU must hold mmu_lock for write and thus cannot race
against a TDP MMU page fault. I.e. the additional spinlock only _needs_ to
be taken by the TDP MMU, as opposed to being taken by any MMU for a VM
that is running with the TDP MMU enabled. Holding mmu_lock for read also
prevents the indirect shadow page from being freed. But as above, keep
it simple and always take the lock.
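
The locking idea described so far can be modelled in user space. The sketch below is not the kernel code: the struct and function names are hypothetical, and a C11 `atomic_flag` stands in for the kernel spinlock. It shows only the inner lock that serializes the non-atomic unsync bookkeeping, on the assumption that callers may run concurrently (as they can when mmu_lock is held only for read).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the non-atomic fields of a shadow page. */
struct shadow_page_model {
    bool unsync;
    unsigned int unsync_children;
};

/* Hypothetical per-VM state: one inner lock for the unsync flow. */
struct vm_unsync_model {
    atomic_flag unsync_lock;
};

void vm_unsync_model_init(struct vm_unsync_model *vm)
{
    atomic_flag_clear(&vm->unsync_lock);
}

/* Mark sp unsync and account for it in its parent. The plain
 * (non-atomic) updates are safe only because every caller takes
 * the same inner lock around them. */
void mark_unsync(struct vm_unsync_model *vm,
                 struct shadow_page_model *sp,
                 struct shadow_page_model *parent)
{
    assert(vm && sp && parent);
    while (atomic_flag_test_and_set_explicit(&vm->unsync_lock,
                                             memory_order_acquire))
        ;   /* spin until the lock is free */
    if (!sp->unsync) {
        sp->unsync = true;
        parent->unsync_children++;
    }
    atomic_flag_clear_explicit(&vm->unsync_lock, memory_order_release);
}
```

Calling `mark_unsync()` twice for the same page increments the parent's counter only once, which is the invariant that a racy, unlocked update could break.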

Alternative #1, the TDP MMU could simply pass "false" for can_unsync and
effectively disable unsync behavior for nested TDP. Write protecting leaf
shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such
VMMs typically don't modify TDP entries, but the same may not hold true for
non-standard use cases and/or VMMs that are migrating physical pages (from
L1's perspective).

Alternative #2, the unsync logic could be made thread safe. In theory,
simply converting all relevant kvm_mmu_page fields to atomics and using
atomic bitops for the bitmap would suffice. However, (a) an in-depth audit
would be required, (b) the code churn would be substantial, and (c) legacy
shadow paging would incur additional atomic operations in performance
sensitive paths for no benefit (to legacy shadow paging).

Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon
Signed-off-by: Sean Christopherson
Message-Id:
Signed-off-by: Paolo Bonzini

Sean Christopherson
2021-08-13 15:32:14 +0800

03 Aug, 2021

1 commit

52ac8b358 KVM: Block memslot updates across range_start() and range_end()

We would like to avoid taking mmu_lock for .invalidate_range_{start,end}()
notifications that are unrelated to KVM. Because mmu_notifier_count
must be modified while holding mmu_lock for write, and must always
be paired across start->end to stay balanced, lock elision must
happen in both or none. Therefore, in preparation for this change,
this patch prevents memslot updates across range_start() and range_end().

Note, technically flag-only memslot updates could be allowed in parallel,
but stalling a memslot update for a relatively short amount of time is
not a scalability issue, and this is all more than complex enough.

A long note on the locking: a previous version of the patch used an rwsem
to block the memslot update while the MMU notifier runs, but this resulted
in the following deadlock involving the pseudo-lock tagged as
"mmu_notifier_invalidate_range_start".

======================================================
WARNING: possible circular locking dependency detected
5.12.0-rc3+ #6 Tainted: G OE
------------------------------------------------------
qemu-system-x86/3069 is trying to acquire lock:
ffffffff9c775ca0 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: __mmu_notifier_invalidate_range_end+0x5/0x190

but task is already holding lock:
ffffaff7410a9160 (&kvm->mmu_notifier_slots_lock){.+.+}-{3:3}, at: kvm_mmu_notifier_invalidate_range_start+0x36d/0x4f0 [kvm]

which lock already depends on the new lock.

This corresponds to the following MMU notifier logic:

invalidate_range_start
  take pseudo lock
  down_read()           (*)
  release pseudo lock
invalidate_range_end
  take pseudo lock      (**)
  up_read()
  release pseudo lock

At point (*) we take the mmu_notifier_slots_lock inside the pseudo lock;
at point (**) we take the pseudo lock inside the mmu_notifier_slots_lock.

This could cause a deadlock (ignoring for a second that the pseudo lock
is not a lock):

- invalidate_range_start waits on down_read(), because the rwsem is
held by install_new_memslots

- install_new_memslots waits on down_write(), because the rwsem is
held till (another) invalidate_range_end finishes

- invalidate_range_end sits waiting on the pseudo lock, held by
invalidate_range_start.

Removing the fairness of the rwsem breaks the cycle (in lockdep terms,
it would change the *shared* rwsem readers into *shared recursive*
readers), so open-code the wait using a readers count and a
spinlock. This also allows handling blockable and non-blockable
critical sections in the same way.
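
A rough user-space model of the "readers count plus spinlock" scheme is sketched below. All names are hypothetical and a C11 `atomic_flag` stands in for the kernel spinlock. Where the kernel would put the memslot updater to sleep and wake it from range_end(), this model simply reports failure so the caller can retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the per-VM state. */
struct vm_model {
    atomic_flag lock;                   /* stands in for the spinlock */
    unsigned int active_invalidations;  /* the "readers count" */
    unsigned int memslots_generation;   /* bumped on each install */
};

static void model_lock(struct vm_model *vm)
{
    while (atomic_flag_test_and_set_explicit(&vm->lock, memory_order_acquire))
        ;   /* spin */
}

static void model_unlock(struct vm_model *vm)
{
    atomic_flag_clear_explicit(&vm->lock, memory_order_release);
}

void vm_model_init(struct vm_model *vm)
{
    atomic_flag_clear(&vm->lock);
    vm->active_invalidations = 0;
    vm->memslots_generation = 0;
}

void model_range_start(struct vm_model *vm)
{
    model_lock(vm);
    vm->active_invalidations++;
    model_unlock(vm);
}

void model_range_end(struct vm_model *vm)
{
    model_lock(vm);
    assert(vm->active_invalidations > 0);   /* start/end must be paired */
    vm->active_invalidations--;
    model_unlock(vm);
    /* A real implementation would wake a waiting updater here. */
}

/* Install new memslots only if no invalidation is in flight. The check
 * and the swap happen under the same lock, so no range_start() can slip
 * in between them. Returns false if the caller has to wait and retry. */
bool model_try_install_memslots(struct vm_model *vm)
{
    bool ok;

    model_lock(vm);
    ok = (vm->active_invalidations == 0);
    if (ok)
        vm->memslots_generation++;
    model_unlock(vm);
    return ok;
}
```

Because the count is a plain integer rather than a fair rwsem, nested or overlapping start/end pairs never block each other, which mirrors the "shared recursive readers" behaviour the commit message asks for.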

Losing the rwsem fairness does theoretically allow MMU notifiers to
block install_new_memslots forever. Note that mm/mmu_notifier.c's own
retry scheme in mmu_interval_read_begin also uses wait/wake_up
and is likewise not fair.

Signed-off-by: Paolo Bonzini

Paolo Bonzini
2021-08-03 15:44:03 +0800

26 Jul, 2021

5 commits

3b1c8c568 docs: virt: kvm: api.rst: replace some characters

The conversion tools used during DocBook/LaTeX/html/Markdown->ReST
conversion and some cut-and-pasted text contain some characters that
aren't easily reachable on standard keyboards and/or could cause
troubles when parsed by the documentation build system.

Replace the occurrences of the following characters:

- U+00a0 (' '): NO-BREAK SPACE
as it can cause lines being truncated on PDF output

Signed-off-by: Mauro Carvalho Chehab
Message-Id:
Signed-off-by: Paolo Bonzini

Mauro Carvalho Chehab
2021-07-26 20:26:06 +0800
0e691ee7b KVM: Documentation: Fix KVM_CAP_ENFORCE_PV_FEATURE_CPUID name

'KVM_CAP_ENFORCE_PV_CPUID' doesn't match the define in
include/uapi/linux/kvm.h.

Signed-off-by: Vitaly Kuznetsov
Message-Id:
Signed-off-by: Paolo Bonzini

Vitaly Kuznetsov
2021-07-26 20:24:30 +0800
b426d9d78 docs: virt: kvm: api.rst: replace some characters

The conversion tools used during DocBook/LaTeX/html/Markdown->ReST
conversion and some cut-and-pasted text contain some characters that
aren't easily reachable on standard keyboards and/or could cause
troubles when parsed by the documentation build system.

Replace the occurrences of the following characters:

- U+00a0 (' '): NO-BREAK SPACE
as it can cause lines being truncated on PDF output

Signed-off-by: Mauro Carvalho Chehab
Link: https://lore.kernel.org/r/ff70cb42d63f3a1da66af1b21b8d038418ed5189.1626947264.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet

Mauro Carvalho Chehab
2021-07-26 04:35:46 +0800
a9fd134be docs: kvm: properly format code blocks and lists

Add a '::' so that a code block is interpreted properly and also add a
blank line before the start of a list.

Fixes: fdc09ddd4064 ("KVM: stats: Add documentation for binary statistics interface")
Signed-off-by: Ioana Ciornei
Reviewed-by: Jing Zhang
Link: https://lore.kernel.org/r/20210722100356.635078-4-ciorneiioana@gmail.com
Signed-off-by: Jonathan Corbet

Ioana Ciornei
2021-07-26 04:34:33 +0800
8b9671643 docs: kvm: fix build warnings

Fix some small build warnings. The title underline was too short in some
cases and a code block was not indented.

Documentation/virt/kvm/api.rst:7216: WARNING: Title underline too short.

Fixes: 6dba94035203 ("KVM: x86: Introduce KVM_GET_SREGS2 / KVM_SET_SREGS2")
Signed-off-by: Ioana Ciornei
Link: https://lore.kernel.org/r/20210722100356.635078-3-ciorneiioana@gmail.com
Signed-off-by: Jonathan Corbet

Ioana Ciornei
2021-07-26 04:34:33 +0800

29 Jun, 2021

1 commit

233a806b0 Merge tag 'docs-5.14' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
"This was a reasonably active cycle for documentation; this includes:

- Some kernel-doc cleanups. That script is still regex onslaught from
hell, but it has gotten a little better.

- Improvements to the checkpatch docs, which are also used by the
tool itself.

- A major update to the pathname lookup documentation.

- Elimination of :doc: markup, since our automarkup magic can create
references from filenames without all the extra noise.

- The flurry of Chinese translation activity continues.

Plus, of course, the usual collection of updates, typo fixes, and
warning fixes"

* tag 'docs-5.14' of git://git.lwn.net/linux: (115 commits)
docs: path-lookup: use bare function() rather than literals
docs: path-lookup: update symlink description
docs: path-lookup: update get_link() ->follow_link description
docs: path-lookup: update WALK_GET, WALK_PUT desc
docs: path-lookup: no get_link()
docs: path-lookup: update i_op->put_link and cookie description
docs: path-lookup: i_op->follow_link replaced with i_op->get_link
docs: path-lookup: Add macro name to symlink limit description
docs: path-lookup: remove filename_mountpoint
docs: path-lookup: update do_last() part
docs: path-lookup: update path_mountpoint() part
docs: path-lookup: update path_to_nameidata() part
docs: path-lookup: update follow_managed() part
docs: Makefile: Use CONFIG_SHELL not SHELL
docs: Take a little noise out of the build process
docs: x86: avoid using ReST :doc:`foo` markup
docs: virt: kvm: s390-pv-boot.rst: avoid using ReST :doc:`foo` markup
docs: userspace-api: landlock.rst: avoid using ReST :doc:`foo` markup
docs: trace: ftrace.rst: avoid using ReST :doc:`foo` markup
docs: trace: coresight: coresight.rst: avoid using ReST :doc:`foo` markup
...

Linus Torvalds
2021-06-29 07:53:05 +0800

25 Jun, 2021

6 commits

b8917b4ae Merge tag 'kvmarm-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for v5.14.

- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration
and apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes

Paolo Bonzini
2021-06-25 23:24:24 +0800
19238e75b kvm: x86: Allow userspace to handle emulation errors

Add a fallback mechanism to the in-kernel instruction emulator that
allows userspace the opportunity to process an instruction the emulator
was unable to. When the in-kernel instruction emulator fails to process
an instruction it will either inject a #UD into the guest or exit to
userspace with exit reason KVM_INTERNAL_ERROR. This is because it does
not know how to proceed in an appropriate manner. This feature lets
userspace get involved to see if it can figure out a better path
forward.

Signed-off-by: Aaron Lewis
Reviewed-by: David Edmondson
Message-Id:
Reviewed-by: Jim Mattson
Signed-off-by: Paolo Bonzini

Aaron Lewis
2021-06-25 06:00:48 +0800
167f8a5ca KVM: x86/mmu: Rename "nxe" role bit to "efer_nx" for macro shenanigans

Rename "nxe" to "efer_nx" so that future macro magic can use the pattern
"reg_bit" for all CR0, CR4, and EFER bits that are included in the role.
Using "efer_nx" also makes it clear that the role bit reflects EFER.NX,
not the NX bit in the corresponding PTE.

Signed-off-by: Sean Christopherson
Message-Id:
Signed-off-by: Paolo Bonzini

Sean Christopherson
2021-06-25 06:00:41 +0800
00a669780 KVM: x86/mmu: Use MMU role to check for matching guest page sizes

Originally, __kvm_sync_page used to check the cr4_pae bit in the role
to avoid zapping 4-byte kvm_mmu_pages when guest page sizes are 8-byte
or the other way round. However, in commit 47c42e6b4192 ("KVM: x86: fix
handling of role.cr4_pae and rename it to 'gpte_size'", 2019-03-28) it
was observed that this did not work for nested EPT, where the page table
size would be 8 bytes even if CR4.PAE=0. (Note that the check still
has to be done for nested *NPT*, so it is not possible to use tdp_enabled
or similar).

Therefore, a hack was introduced to identify nested EPT shadow pages
and unconditionally call __kvm_sync_page() on them. However, it is
possible to do without the hack to identify nested EPT shadow pages:
if EPT is active, there will be no shadow pages in non-EPT format,
and all of them will have gpte_is_8_bytes set to true; we can just
check the MMU role directly, and the test will always be true.

Even for non-EPT shadow MMUs, this test should really always be true
now that __kvm_sync_page() is called if and only if the role is an
exact match (kvm_mmu_get_page()) or is part of the current MMU context
(kvm_mmu_sync_roots()). A future commit will convert the likely-pointless
check into a meaningful WARN to enforce that the mmu_roles of the current
context and the shadow page are compatible.

Cc: Vitaly Kuznetsov
Signed-off-by: Sean Christopherson
Message-Id:
Signed-off-by: Paolo Bonzini

Sean Christopherson
2021-06-25 06:00:37 +0800
63f5a1909 KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken

Warn userspace that KVM_SET_CPUID{,2} after KVM_RUN "may" cause guest
instability. Initialize last_vmentry_cpu to -1 and use it to detect if
the vCPU has been run at least once when its CPUID model is changed.
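
The detection described above can be sketched as a small model. The struct and function names here are hypothetical; only the field name `last_vmentry_cpu` and the sentinel value -1 come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the relevant vCPU state. */
struct vcpu_model {
    int last_vmentry_cpu;   /* -1 until the vCPU has entered the guest */
};

void vcpu_model_init(struct vcpu_model *vcpu)
{
    vcpu->last_vmentry_cpu = -1;
}

/* Record the physical CPU on which the vCPU last entered the guest. */
void vcpu_model_run(struct vcpu_model *vcpu, int cpu)
{
    assert(cpu >= 0);
    vcpu->last_vmentry_cpu = cpu;
}

/* Returns true if a CPUID update at this point should trigger the
 * warning, i.e. if the vCPU has already run at least once. */
bool vcpu_model_cpuid_update_is_late(const struct vcpu_model *vcpu)
{
    return vcpu->last_vmentry_cpu != -1;
}
```

Using a sentinel in an existing field avoids adding a separate "has run" flag, at the cost of relying on -1 never being a valid CPU number.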

KVM does not correctly handle changes to paging related settings in the
guest's vCPU model after KVM_RUN, e.g. MAXPHYADDR, GBPAGES, etc... KVM
could theoretically zap all shadow pages, but actually making that happen
is a mess due to lock inversion (vcpu->mutex is held). And even then,
updating paging settings on the fly would only work if all vCPUs are
stopped, updated in concert with identical settings, then restarted.

To support running vCPUs with different vCPU models (that affect paging),
KVM would need to track all relevant information in kvm_mmu_page_role.
Note, that's the _page_ role, not the full mmu_role. Updating mmu_role
isn't sufficient as a vCPU can reuse a shadow page translation that was
created by a vCPU with different settings and thus completely skip the
reserved bit checks (that are tied to CPUID).

Tracking CPUID state in kvm_mmu_page_role is _extremely_ undesirable as
it would require doubling gfn_track from a u16 to a u32, i.e. would
increase KVM's memory footprint by 2 bytes for every 4kb of guest memory.
E.g. MAXPHYADDR (6 bits), GBPAGES, AMD vs. INTEL = 1 bit, and SEV C-BIT
would all need to be tracked.
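
To put the footprint claim in numbers, the arithmetic can be written out as a helper. The function name is hypothetical; the only inputs taken from the text are the widening from u16 to u32 (2 extra bytes per entry) and one entry per 4 KiB guest page.

```c
#include <assert.h>
#include <stdint.h>

/* Extra memory needed if each per-page tracking entry grows from
 * 2 bytes (u16) to 4 bytes (u32), with one entry per 4 KiB page. */
uint64_t extra_tracking_bytes(uint64_t guest_bytes)
{
    const uint64_t page_size = 4096;
    const uint64_t extra_per_page = sizeof(uint32_t) - sizeof(uint16_t);

    assert(guest_bytes % page_size == 0);
    return (guest_bytes / page_size) * extra_per_page;
}
```

For a 1 GiB guest this gives 512 KiB of additional memory, and for a 1 TiB guest it gives 512 MiB.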

In practice, there is no remotely sane use case for changing any paging
related CPUID entries on the fly, so just sweep it under the rug (after
yelling at userspace).

Signed-off-by: Sean Christopherson
Message-Id:
Signed-off-by: Paolo Bonzini

Sean Christopherson
2021-06-25 06:00:36 +0800
fdc09ddd4 KVM: stats: Add documentation for binary statistics interface

This new API provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
The statistics descriptors are defined by KVM in kernel and can be
used by userspace to discover VM/VCPU statistics during the one-time setup
stage.
The statistics data itself could be read out by userspace telemetry
periodically without any extra parsing or setup effort.
There are a few existing interface protocols and definitions, but none
of them fulfils all of the requirements that this interface implements,
listed below:
1. During high frequency periodic stats reading, there should be no
extra efforts except the stats data read itself.
2. Support stats annotation, like type (cumulative, instantaneous,
peak, histogram, etc) and unit (counter, time, size, cycles, etc).
3. The stats data reading should be free of lock/synchronization. We
don't care about the consistency between all the stats data. Not all
stats data can be read out at exactly the same time. What we really
care about is the change or trend of the stats data. The lock-free
solution is not just for efficiency and scalability, but also for the
stats data accuracy and usability. For example, in the situation
that all the stats data readings are protected by a global lock,
if one VCPU died somehow with that lock held, then all stats data
reading would be blocked, and we would have no way to tell from the
stats data which VCPU has died.
4. The stats data reading workload can be handed over to another
unprivileged process.
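
The split between a one-time setup stage and cheap periodic reads (requirement 1 above) can be illustrated with a toy model. This is not the KVM binary layout: the descriptor struct, the stat names and both functions are hypothetical, and the data block is just an in-memory array rather than a file descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: maps a stat name to a slot in the data block. */
struct stat_desc_model {
    const char *name;
    size_t index;
};

/* One-time setup: resolve a name to its slot. Returns -1 if unknown. */
long find_stat_index(const struct stat_desc_model *descs, size_t ndescs,
                     const char *name)
{
    assert(descs && name);
    for (size_t i = 0; i < ndescs; i++) {
        if (strcmp(descs[i].name, name) == 0)
            return (long)descs[i].index;
    }
    return -1;
}

/* Periodic read: a single load from the already-resolved slot, with no
 * parsing and no lock. A reader may observe a value that is slightly
 * stale relative to other stats, which the interface accepts. */
uint64_t read_stat(const volatile uint64_t *data, size_t index)
{
    return data[index];
}
```

After the setup stage has resolved the indices it needs, each polling round is only as expensive as the loads themselves.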

Reviewed-by: David Matlack
Reviewed-by: Ricardo Koller
Reviewed-by: Krish Sadhukhan
Reviewed-by: Fuad Tabba
Signed-off-by: Jing Zhang
Message-Id:
Signed-off-by: Paolo Bonzini

Jing Zhang
2021-06-25 06:00:23 +0800

23 Jun, 2021

1 commit

c3ab0e28a Merge branch 'topic/ppc-kvm' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into HEAD

- Support for the H_RPT_INVALIDATE hypercall

- Conversion of Book3S entry/exit to C

- Bug fixes

Paolo Bonzini
2021-06-23 19:30:41 +0800

22 Jun, 2021

2 commits

b87cc116c KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability

Now that we have H_RPT_INVALIDATE fully implemented, enable
support for it via the KVM_CAP_PPC_RPT_INVALIDATE KVM capability.

Signed-off-by: Bharata B Rao
Reviewed-by: David Gibson
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20210621085003.904767-6-bharata@linux.ibm.com

Bharata B Rao
2021-06-22 21:38:28 +0800
04c02c201 KVM: arm64: Document MTE capability and ioctl

A new capability (KVM_CAP_ARM_MTE) identifies that the kernel supports
granting a guest access to the tags, and provides a mechanism for the
VMM to enable it.

A new ioctl (KVM_ARM_MTE_COPY_TAGS) provides a simple way for a VMM to
access the tags of a guest without having to maintain a PROT_MTE mapping
in userspace. The above capability gates access to the ioctl.

Reviewed-by: Catalin Marinas
Signed-off-by: Steven Price
Signed-off-by: Marc Zyngier
Link: https://lore.kernel.org/r/20210621111716.37157-7-steven.price@arm.com

Steven Price
2021-06-22 21:08:07 +0800

18 Jun, 2021

5 commits

c6c032bf2 docs: virt: kvm: s390-pv-boot.rst: avoid using ReST :doc:`foo` markup

The :doc:`foo` tag is auto-generated via automarkup.py.
So, use the filename at the sources, instead of :doc:`foo`.

Signed-off-by: Mauro Carvalho Chehab
Link: https://lore.kernel.org/r/8c0fc6578ff6384580fd0d622f363bbbd4fe91da.1623824363.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet

Mauro Carvalho Chehab
2021-06-18 03:24:39 +0800
0dbb11230 KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall

This hypercall is used by the SEV guest to notify a change in the page
encryption status to the hypervisor. The hypercall should be invoked
only when the encryption attribute is changed from encrypted -> decrypted
and vice versa. By default all guest pages are considered encrypted.

The hypercall exits to userspace to manage the guest shared regions and
integrate with the userspace VMM's migration code.

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Paolo Bonzini
Cc: Joerg Roedel
Cc: Borislav Petkov
Cc: Tom Lendacky
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford
Signed-off-by: Brijesh Singh
Signed-off-by: Ashish Kalra
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Co-developed-by: Paolo Bonzini
Signed-off-by: Paolo Bonzini
Message-Id:
Signed-off-by: Paolo Bonzini

Ashish Kalra
2021-06-18 02:25:39 +0800
6dba94035 KVM: x86: Introduce KVM_GET_SREGS2 / KVM_SET_SREGS2

This is a new version of KVM_GET_SREGS / KVM_SET_SREGS.

It has the following changes:
* Has flags for future extensions
* Has vcpu's PDPTRs, allowing to save/restore them on migration.
* Lacks obsolete interrupt bitmap (done now via KVM_SET_VCPU_EVENTS)

A new capability, KVM_CAP_SREGS2, is added to signal
the presence of this ioctl to userspace.

Signed-off-by: Maxim Levitsky
Message-Id:
Signed-off-by: Paolo Bonzini

Maxim Levitsky
2021-06-18 01:09:47 +0800
644f70671 KVM: x86: hyper-v: Introduce KVM_CAP_HYPERV_ENFORCE_CPUID

Modeled after KVM_CAP_ENFORCE_PV_FEATURE_CPUID, the new capability allows
for limiting Hyper-V features to those exposed to the guest in Hyper-V
CPUIDs (0x40000003, 0x40000004, ...).

Signed-off-by: Vitaly Kuznetsov
Signed-off-by: Paolo Bonzini
Message-Id:
Signed-off-by: Paolo Bonzini

Vitaly Kuznetsov
2021-06-18 01:09:38 +0800
b10a038e8 KVM: mmu: Add slots_arch_lock for memslot arch fields

Add a new lock to protect the arch-specific fields of memslots if they
need to be modified in a kvm->srcu read critical section. A future
commit will use this lock to lazily allocate memslot rmaps for x86.

Signed-off-by: Ben Gardon
Message-Id:
[Add Documentation/ hunk. - Paolo]
Signed-off-by: Paolo Bonzini

Ben Gardon
2021-06-18 01:09:26 +0800

09 Jun, 2021

1 commit

b1bd5cba3 KVM: X86: MMU: Use the correct inherited permissions to get shadow page

When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logical AND of its parents'
permissions. Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different. KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries. Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.
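
The "logical AND of its parents' permissions" rule can be sketched as follows. The bit encoding and the function name are hypothetical and chosen for illustration only; the point is that two walks reaching the same table through parents with different permissions produce different effective access values, and therefore need different shadow pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding of the "uwx" permission letters. */
enum {
    ACC_X = 1,
    ACC_W = 2,
    ACC_U = 4,
    ACC_ALL = ACC_U | ACC_W | ACC_X,
};

/* Effective access after walking the given entries from the root down:
 * a permission survives only if every entry on the path grants it. */
unsigned int effective_access(const unsigned int *walk, size_t levels)
{
    unsigned int access = ACC_ALL;

    assert(walk != NULL || levels == 0);
    for (size_t i = 0; i < levels; i++)
        access &= walk[i];
    return access;
}
```

For instance, a walk through a "uw-" pud entry and a "uw-" pmd entry yields "uw-", while a walk through a "u--" pud entry to the same "uw-" pmd entry yields "u--", so the two walks must not share a shadow page keyed only on the table's gfn.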

For example, here is a shared pagetable:

pgd[]   pud[]        pmd[]            virtual address pointers
                  /->pmd1(u--)->pte1(uw-)->page1
        pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2
        pud2(u--)--->pmd1(u--)->pte1(uw-)->page1
                  \->pmd2(uw-)->pte2(uw-)->page2

[...] access. "u--" is used also to get
the pagetable for pud1, instead of "uw-".

- Then the guest writes to ptr2 and KVM reuses pud1 which is present.
The hypervisor sets up a shadow page for ptr2 with pt->access of "uw-"
even though the pud1 pmd (because of the incorrect argument to
kvm_mmu_get_page in the previous step) has role.access="u--".

- Then the guest reads from ptr3. The hypervisor reuses pud1's
shadow pmd for pud2, because both use "u--" for their permissions.
Thus, the shadow pmd already includes entries for both pmd1 and pmd2.

- At last, the guest writes to ptr4. This causes no vmexit or pagefault,
because pud1's shadow page structures included an "uw-" page even though
its role.access was "u--".

Any kind of shared pagetable can hit a similar problem in a virtual
machine without TDP enabled if the permissions inherited from different
ancestors differ.

In order to fix the problem, we change pt->access to be an array, and
any access in it will not include permissions ANDed from child ptes.

The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.

The problem had existed long before the commit 41074d07c78b ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it
is hard to find which is the culprit. So there is no fixes tag here.

Signed-off-by: Lai Jiangshan
Message-Id:
Cc: stable@vger.kernel.org
Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching")
Signed-off-by: Paolo Bonzini

Lai Jiangshan
2021-06-09 00:29:53 +0800

30 May, 2021

1 commit

224478289 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
"ARM fixes:

- Another state update on exit to userspace fix

- Prevent the creation of mixed 32/64 VMs

- Fix regression with irqbypass not restarting the guest on failed
connect

- Fix regression with debug register decoding resulting in
overlapping access

- Commit exception state on exit to userspace

- Fix the MMU notifier return values

- Add missing 'static' qualifiers in the new host stage-2 code

x86 fixes:

- fix guest missed wakeup with assigned devices

- fix WARN reported by syzkaller

- do not use BIT() in UAPI headers

- make the kvm_amd.avic parameter bool

PPC fixes:

- make halt polling heuristics consistent with other architectures

selftests:

- various fixes

- new performance selftest memslot_perf_test

- test UFFD minor faults in demand_paging_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits)
selftests: kvm: fix overlapping addresses in memslot_perf_test
KVM: X86: Kill off ctxt->ud
KVM: X86: Fix warning caused by stale emulation context
KVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interception
KVM: x86/mmu: Fix comment mentioning skip_4k
KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK
KVM: x86: add start_assignment hook to kvm_x86_ops
KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch
selftests: kvm: do only 1 memslot_perf_test run by default
KVM: X86: Use _BITUL() macro in UAPI headers
KVM: selftests: add shared hugetlbfs backing source type
KVM: selftests: allow using UFFD minor faults for demand paging
KVM: selftests: create alias mappings when using shared memory
KVM: selftests: add shmem backing source type
KVM: selftests: refactor vm_mem_backing_src_type flags
KVM: selftests: allow different backing source types
KVM: selftests: compute correct demand paging size
KVM: selftests: simplify setup_demand_paging error handling
KVM: selftests: Print a message if /dev/kvm is missing
...

Linus Torvalds
2021-05-30 00:02:25 +0800

27 May, 2021

1 commit

084071d5e KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK

KVM_REQ_UNBLOCK will be used to exit a vcpu from
its inner vcpu halt emulation loop.

Rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK, switch
PowerPC to arch specific request bit.

Signed-off-by: Marcelo Tosatti

Message-Id:
Signed-off-by: Paolo Bonzini

Marcelo Tosatti
2021-05-27 19:57:38 +0800

21 May, 2021

2 commits

0a5fab9f0 docs: virt: api.rst: fix a pointer to SGX documentation

The document which describes the SGX kernel architecture was added in
commit 3fa97bf00126 ("Documentation/x86: Document SGX kernel architecture"),
but the reference in virt/kvm/api.rst points to a
nonexistent document.

Signed-off-by: Mauro Carvalho Chehab
Link: https://lore.kernel.org/r/138c24633c6e4edf862a2b4d77033c603fc10406.1621413933.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet

Mauro Carvalho Chehab
2021-05-21 03:44:14 +0800
e437c1a3e docs: vcpu-requests.rst: fix reference for atomic ops

Changeset f0400a77ebdc ("atomic: Delete obsolete documentation")
got rid of atomic_ops.rst, pointing out that it was superseded by
Documentation/atomic_*.txt.

Update its reference accordingly.

Fixes: f0400a77ebdc ("atomic: Delete obsolete documentation")
Signed-off-by: Mauro Carvalho Chehab
Link: https://lore.kernel.org/r/703af756ac26a06c2185c05dfe6d902253f11161.1621413933.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet

Mauro Carvalho Chehab
2021-05-21 03:44:13 +0800

17 May, 2021

1 commit

ccb013c29 Merge tag 'x86_urgent_for_v5.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
"The three SEV commits are not really urgent material. But we figured
since getting them in now will avoid a huge amount of conflicts
between future SEV changes touching tip, the kvm and probably other
trees, sending them to you now would be best.

The idea is that the tip, kvm etc branches for 5.14 will all base
on top of -rc2 and thus everything will be peachy. What is more, those
changes are purely mechanical and defines movement so they should be
fine to go now (famous last words).

Summary:

- Enable -Wundef for the compressed kernel build stage

- Reorganize SEV code to streamline and simplify future development"

* tag 'x86_urgent_for_v5.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/compressed: Enable -Wundef
x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG
x86/sev: Move GHCB MSR protocol and NAE definitions in a common header
x86/sev-es: Rename sev-es.{ch} to sev.{ch}

Linus Torvalds
2021-05-17 00:31:06 +0800

10 May, 2021

1 commit

059e5c321 x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG

The SYSCFG MSR continued being updated beyond the K8 family; drop the K8
name from it.

Suggested-by: Borislav Petkov
Signed-off-by: Brijesh Singh
Signed-off-by: Borislav Petkov
Acked-by: Joerg Roedel
Link: https://lkml.kernel.org/r/20210427111636.1207-4-brijesh.singh@amd.com

Brijesh Singh
2021-05-10 13:51:38 +0800

07 May, 2021

1 commit

46a63924b doc/kvm: Fix wrong entry for KVM_CAP_X86_MSR_FILTER

The capability that exposes new ioctl KVM_X86_SET_MSR_FILTER to
userspace is specified incorrectly as the ioctl itself (instead of
KVM_CAP_X86_MSR_FILTER). This patch fixes it.
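For illustration, userspace would normally probe for the capability before issuing the ioctl. The following is a minimal sketch, not taken from the patch: the helper name has_msr_filter is hypothetical, and the numeric values are copied by hand from include/uapi/linux/kvm.h (an assumption worth checking against the headers of the running kernel).

```python
import fcntl
import os

# Values transcribed from include/uapi/linux/kvm.h (assumed, verify locally).
KVMIO = 0xAE
KVM_CHECK_EXTENSION = (KVMIO << 8) | 0x03   # _IO(KVMIO, 0x03)
KVM_CAP_X86_MSR_FILTER = 189

def has_msr_filter(dev="/dev/kvm"):
    """Return True/False for the capability, or None if KVM cannot be queried."""
    try:
        fd = os.open(dev, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        return None  # no KVM device, or no permission to open it
    try:
        # KVM_CHECK_EXTENSION returns 0 when the capability is absent.
        return fcntl.ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_X86_MSR_FILTER) > 0
    except OSError:
        return None
    finally:
        os.close(fd)

print("KVM_CAP_X86_MSR_FILTER:", has_msr_filter())
```

Only when the check succeeds should a VMM go on to call KVM_X86_SET_MSR_FILTER on the VM file descriptor.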

Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering")
Reviewed-by: Alexander Graf
Signed-off-by: Siddharth Chandrasekaran
Message-Id:
Signed-off-by: Paolo Bonzini

Siddharth Chandrasekaran
2021-05-07 18:06:11 +0800

02 May, 2021

1 commit

152d32aa8 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
"This is a large update by KVM standards, including AMD PSP (Platform
Security Processor, aka "AMD Secure Technology") and ARM CoreSight
(debug and trace) changes.

ARM:

- CoreSight: Add support for ETE and TRBE

- Stage-2 isolation for the host kernel when running in protected
mode

- Guest SVE support when running in nVHE mode

- Force W^X hypervisor mappings in nVHE mode

- ITS save/restore for guests using direct injection with GICv4.1

- nVHE panics now produce readable backtraces

- Guest support for PTP using the ptp_kvm driver

- Performance improvements in the S2 fault handler

x86:

- AMD PSP driver changes

- Optimizations and cleanup of nested SVM code

- AMD: Support for virtual SPEC_CTRL

- Optimizations of the new MMU code: fast invalidation, zap under
read lock, enable/disable dirty page logging under read lock

- /dev/kvm API for AMD SEV live migration (guest API coming soon)

- support SEV virtual machines sharing the same encryption context

- support SGX in virtual machines

- add a few more statistics

- improved directed yield heuristics

- Lots and lots of cleanups

Generic:

- Rework of MMU notifier interface, simplifying and optimizing the
architecture-specific code

- a handful of "Get rid of oprofile leftovers" patches

- Some selftests improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits)
KVM: selftests: Speed up set_memory_region_test
selftests: kvm: Fix the check of return value
KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt()
KVM: SVM: Skip SEV cache flush if no ASIDs have been used
KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()
KVM: SVM: Drop redundant svm_sev_enabled() helper
KVM: SVM: Move SEV VMCB tracking allocation to sev.c
KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()
KVM: SVM: Unconditionally invoke sev_hardware_teardown()
KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)
KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y
KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables
KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features
KVM: SVM: Move SEV module params/variables to sev.c
KVM: SVM: Disable SEV/SEV-ES if NPT is disabled
KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails
KVM: SVM: Zero out the VMCB array used to track SEV ASID association
x86/sev: Drop redundant and potentially misleading 'sev_enabled'
KVM: x86: Move reverse CPUID helpers to separate header file
KVM: x86: Rename GPR accessors to make mode-aware variants the defaults
...

Linus Torvalds
2021-05-02 01:14:08 +0800

27 Apr, 2021

1 commit

2f9ef0559 Merge tag 'docs-5.13' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
"It's been a relatively busy cycle in docsland, though more than
usually well contained to Documentation/ itself. Highlights include:

- The Chinese translators have been busy and show no signs of
stopping anytime soon. Italian has also caught up.

- Aditya Srivastava has been working on improvements to the
kernel-doc script.

- Thorsten continues his work on reporting-issues.rst and related
documentation around regression reporting.

- Lots of documentation updates, typo fixes, etc. as usual"

* tag 'docs-5.13' of git://git.lwn.net/linux: (139 commits)
docs/zh_CN: add openrisc translation to zh_CN index
docs/zh_CN: add openrisc index.rst translation
docs/zh_CN: add openrisc todo.rst translation
docs/zh_CN: add openrisc openrisc_port.rst translation
docs/zh_CN: add core api translation to zh_CN index
docs/zh_CN: add core-api index.rst translation
docs/zh_CN: add core-api irq index.rst translation
docs/zh_CN: add core-api irq irqflags-tracing.rst translation
docs/zh_CN: add core-api irq irq-domain.rst translation
docs/zh_CN: add core-api irq irq-affinity.rst translation
docs/zh_CN: add core-api irq concepts.rst translation
docs: sphinx-pre-install: don't barf on beta Sphinx releases
scripts: kernel-doc: improve parsing for kernel-doc comments syntax
docs/zh_CN: two minor fixes in zh_CN/doc-guide/
Documentation: dev-tools: Add Testing Overview
docs/zh_CN: add translations in zh_CN/dev-tools/gcov
docs: reporting-issues: make people CC the regressions list
MAINTAINERS: add regressions mailing list
doc:it_IT: align Italian documentation
docs/zh_CN: sync reporting-issues.rst
...

Linus Torvalds
2021-04-27 04:22:43 +0800

26 Apr, 2021

1 commit

f82762fb6 KVM: documentation: fix sphinx warnings

Signed-off-by: Paolo Bonzini

Paolo Bonzini
2021-04-26 17:19:28 +0800

23 Apr, 2021

2 commits

c4f71901d Merge tag 'kvmarm-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.13

New features:

- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
- Alexandru is now a reviewer (not really a new feature...)

Fixes:
- Proper emulation of the GICR_TYPER register
- Handle the complete set of relocation in the nVHE EL2 object
- Get rid of the oprofile dependency in the PMU code (and of the
oprofile body parts at the same time)
- Debug and SPE fixes
- Fix vcpu reset

Paolo Bonzini
2021-04-23 19:41:17 +0800
fd49e8ee7 Merge branch 'kvm-sev-cgroup' into HEAD

Paolo Bonzini
2021-04-23 01:19:01 +0800