02 Jun, 2018
4 commits
-
The kvm struct has been bloating. For example, it's tens of kilobytes
for x86, which turns out to be a large amount of memory to allocate
contiguously via kzalloc. Thus, this patch does the following:
1. Uses architecture-specific routines to allocate the kvm struct via
vzalloc for x86.
2. Switches arm to __KVM_HAVE_ARCH_VM_ALLOC so that it can use vzalloc
when has_vhe() is true.
Other architectures continue to default to kalloc, as they have a
dependency on kalloc or have a small-enough struct kvm.
Signed-off-by: Marc Orr
Reviewed-by: Marc Zyngier
Signed-off-by: Paolo Bonzini -
Use new return type vm_fault_t for fault handler. For
now, this is just documenting that the function returns
a VM_FAULT value rather than an errno. Once all instances
are converted, vm_fault_t will become a distinct type.
commit 1c8f422059ae ("mm: change return type to vm_fault_t")
Signed-off-by: Souptick Joarder
Reviewed-by: Matthew Wilcox
Signed-off-by: Paolo Bonzini -
…t/kvms390/linux into HEAD
KVM: s390: Cleanups for 4.18
- cleanups for nested, clock handling, crypto, storage keys and
control register bits -
…marm/kvmarm into HEAD
KVM/ARM updates for 4.18
- Lazy context-switching of FPSIMD registers on arm64
- Allow virtual redistributors to be part of two or more MMIO ranges
26 May, 2018
10 commits
-
Signed-off-by: Liran Alon
Signed-off-by: Radim Krčmář -
Document the subtle nuances that KVM_CAP_X86_DISABLE_EXITS induces in
the KVM_GET_SUPPORTED_CPUID API.
Fixes: 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT intercepts")
Signed-off-by: Jim Mattson
Signed-off-by: Radim Krčmář -
We need a new capability to indicate support for the newly added
HvFlushVirtualAddress{List,Space}{,Ex} hypercalls. Upon seeing this
capability, userspace is supposed to announce PV TLB flush features
by setting the appropriate CPUID bits (if needed).
Signed-off-by: Vitaly Kuznetsov
Signed-off-by: Radim Krčmář -
Implement the HvFlushVirtualAddress{List,Space}Ex hypercalls in the same way
we've implemented their non-EX counterparts.
Signed-off-by: Vitaly Kuznetsov
[Initialized valid_bank_mask to silence misguided GCC warnings. - Radim]
Signed-off-by: Radim Krčmář -
Implement the HvFlushVirtualAddress{List,Space} hypercalls in a simplistic way:
do a full TLB flush with KVM_REQ_TLB_FLUSH and kick vCPUs which are currently
IN_GUEST_MODE.
Signed-off-by: Vitaly Kuznetsov
Signed-off-by: Radim Krčmář -
The Hyper-V style PV TLB flush hypercalls implementation will use this API.
To avoid memory allocation in the CONFIG_CPUMASK_OFFSTACK case, add a
cpumask_var_t argument.
Signed-off-by: Vitaly Kuznetsov
Signed-off-by: Radim Krčmář -
Prepare to support TLB flush hypercalls, some of which are REP hypercalls.
Also, return HV_STATUS_INVALID_HYPERCALL_INPUT as it seems more
appropriate.
Signed-off-by: Vitaly Kuznetsov
Signed-off-by: Radim Krčmář -
Avoid open-coding offsets for hypercall input parameters; we already
have defines for them.
Signed-off-by: Vitaly Kuznetsov
Signed-off-by: Radim Krčmář -
The Hyper-V TLB flush hypercall definitions will be required for KVM, so move
them to hyperv-tlfs.h. The structures also need to be renamed, as the '_pcpu'
suffix is irrelevant for a general-purpose definition.
Signed-off-by: Vitaly Kuznetsov
Signed-off-by: Radim Krčmář -
To resolve conflicts with the PV TLB flush series.
25 May, 2018
26 commits
-
Let's raise the number of supported vcpus along with
vgic v3, now that HW is looming with more physical CPUs.
Signed-off-by: Eric Auger
Acked-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
Now that all the internals are ready to handle multiple redistributor
regions, let's allow the userspace to register them.
Signed-off-by: Eric Auger
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
This new attribute allows the userspace to set the base address
of a redistributor region, relaxing the constraint of having all
consecutive redistributor frames contiguous.
Signed-off-by: Eric Auger
Acked-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
On a vcpu's first run, we eventually know the actual number of vcpus.
This is a synchronization point at which to check that all redistributors
were assigned. In kvm_vgic_map_resources() we check that both the dist and
redist base addresses were set, and check for potential base address
inconsistencies.
Signed-off-by: Eric Auger
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
As we are going to register several redist regions,
vgic_register_all_redist_iodevs() may be called several times. We need
to register a redist_iodev for a given vcpu only once, so let's
check whether the base address has already been set. Initialize the latter
in kvm_vgic_vcpu_init().
Signed-off-by: Eric Auger
Acked-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
kvm_vgic_vcpu_early_init gets called after kvm_vgic_vcpu_init, which
is confusing. The call path is as follows:
kvm_vm_ioctl_create_vcpu
|_ kvm_arch_vcpu_create
|_ kvm_vcpu_init
|_ kvm_arch_vcpu_init
|_ kvm_vgic_vcpu_init
|_ kvm_arch_vcpu_postcreate
|_ kvm_vgic_vcpu_early_init
The static initialization currently done in kvm_vgic_vcpu_early_init()
can be moved to kvm_vgic_vcpu_init(). So let's move the code and
remove kvm_vgic_vcpu_early_init(). kvm_arch_vcpu_postcreate() does
nothing.
Signed-off-by: Eric Auger
Signed-off-by: Marc Zyngier -
We introduce a new helper that creates and inserts a new redistributor
region into the rdist region list. This helper handles both the case
where the redistributor region size is known at registration time
and the legacy case where it is not (eventually depending on the number
of online vcpus). Depending on the pfns, we perform all the possible checks
that we can do:
- end of memory crossing
- incorrect alignment of the base address
- collision with the distributor region, if already defined
- collision with already registered rdist regions
- check of the new index
Rdist regions must be inserted by increasing order of indices. Indices
must be contiguous.
Signed-off-by: Eric Auger
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
vgic_v3_check_base() currently only handles the case of a unique
legacy redistributor region whose size is not explicitly set but
inferred, instead, from the number of online vcpus.
We adapt it to handle the case of multiple redistributor regions
with explicitly defined sizes. We rely on two new helpers:
- vgic_v3_rdist_overlap() is used to detect overlap with the dist
region, if defined
- vgic_v3_rd_region_size() computes the size of the redist region,
whether it is a legacy unique region or a new explicitly sized
region.
Signed-off-by: Eric Auger
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
The TYPER of a redistributor reflects whether the rdist is
the last one of the redistributor region. Let's compare the TYPER
GPA against the address of the last occupied slot within the
redistributor region.
Signed-off-by: Eric Auger
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
We introduce vgic_v3_rdist_free_slot() to help identify
where we can place a new 2x64KB redistributor.
Signed-off-by: Eric Auger
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
At the moment KVM supports a single rdist region. We want to
support several separate rdist regions, so let's introduce a list
of them. This patch currently only cares about a single
entry in this list as the functionality to register several redist
regions is not yet there. So this only translates the existing code
into something functionally similar using that new data struct.
The redistributor region handle is stored in the vgic_cpu structure
to allow later computation of the TYPER last bit.
Signed-off-by: Eric Auger
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
We introduce a new KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION attribute in the
KVM_DEV_ARM_VGIC_GRP_ADDR group. It allows userspace to provide the
base address and size of a redistributor region.
Compared to KVM_VGIC_V3_ADDR_TYPE_REDIST, this new attribute makes it
possible to declare several separate redistributor regions, so the whole
redist space does not need to be contiguous anymore.
Signed-off-by: Eric Auger
Reviewed-by: Peter Maydell
Acked-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
In case kvm_vgic_map_resources() fails, typically if the vgic
distributor is not defined, __kvm_vgic_destroy will be called
several times. Indeed kvm_vgic_map_resources() is called on
first vcpu run. As a result dist->spis is freed more than once,
and on the second time it causes a "kernel BUG at mm/slub.c:3912!"
Set dist->spis to NULL to avoid the crash.
Fixes: ad275b8bb1e6 ("KVM: arm/arm64: vgic-new: vgic_init: implement
vgic_init")
Signed-off-by: Eric Auger
Reviewed-by: Marc Zyngier
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
The conversion of the FPSIMD context switch trap code to C has added
some overhead to calling it, due to the need to save registers that
the procedure call standard defines as caller-saved.
So, perhaps it is no longer worth invoking this trap handler quite
so early. Instead, we can invoke it from fixup_guest_exit(), with little
likelihood of increasing the overhead much further.
As a convenience, this patch gives __hyp_switch_fpsimd() the same
return semantics as fixup_guest_exit(). For now there is no
possibility of a spurious FPSIMD trap, so the function always
returns true, but this allows it to be tail-called with a single
return statement.
Signed-off-by: Dave Martin
Reviewed-by: Marc Zyngier
Reviewed-by: Christoffer Dall
Reviewed-by: Alex Bennée
Signed-off-by: Marc Zyngier -
The entire tail of fixup_guest_exit() is contained in if statements
of the form if (x && *exit_code == ARM_EXCEPTION_TRAP). As a result,
we can check just once and bail out of the function early, allowing
the remaining if conditions to be simplified.
The only awkward case is where *exit_code is changed to
ARM_EXCEPTION_EL1_SERROR in the case of an illegal GICv2 CPU
interface access: in that case, the GICv3 trap handling code is
skipped using a goto. This avoids pointlessly evaluating the
static branch check for the GICv3 case, even though we can't have
vgic_v2_cpuif_trap and vgic_v3_cpuif_trap true simultaneously
unless we have a GICv3 and GICv2 on the host: that sounds stupid,
but I haven't satisfied myself that it can't happen.
No functional change.
Signed-off-by: Dave Martin
Reviewed-by: Marc Zyngier
Reviewed-by: Alex Bennée
Acked-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
In fixup_guest_exit(), there are a couple of cases where after
checking what the exit code was, we assign it explicitly with the
value it already had.
Assuming this is not indicative of a bug, these assignments are not
needed.
This patch removes the redundant assignments, and simplifies some
if-nesting that becomes trivial as a result.
No functional change.
Signed-off-by: Dave Martin
Reviewed-by: Alex Bennée
Acked-by: Marc Zyngier
Acked-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
Now that the host SVE context can be saved on demand from Hyp,
there is no longer any need to save this state in advance before
entering the guest.
This patch removes the relevant call to
kvm_fpsimd_flush_cpu_state().
Since the problem that function was intended to solve now no longer
exists, the function and its dependencies are also deleted.
Signed-off-by: Dave Martin
Reviewed-by: Alex Bennée
Acked-by: Christoffer Dall
Acked-by: Marc Zyngier
Acked-by: Catalin Marinas
Signed-off-by: Marc Zyngier -
This patch adds SVE context saving to the hyp FPSIMD context switch
path. This means that it is no longer necessary to save the host
SVE state in advance of entering the guest, when in use.
In order to avoid adding pointless complexity to the code, VHE is
assumed if SVE is in use. VHE is an architectural prerequisite for
SVE, so there is no good reason to turn CONFIG_ARM64_VHE off in
kernels that support both SVE and KVM.
Historically, software models exist that can expose the
architecturally invalid configuration of SVE without VHE, so if
this situation is detected at kvm_init() time then KVM will be
disabled.
Signed-off-by: Dave Martin
Reviewed-by: Alex Bennée
Acked-by: Catalin Marinas
Signed-off-by: Marc Zyngier -
In order to make sve_save_state()/sve_load_state() more easily
reusable and to get rid of a potential branch on context switch
critical paths, this patch makes sve_pffr() inline and moves it to
fpsimd.h.
<asm/processor.h> must be included in fpsimd.h in order to make
this work, and this creates an #include cycle that is tricky to
avoid without modifying core code, due to the way the PR_SVE_*()
prctl helpers are included in the core prctl implementation.
Instead of breaking the cycle, this patch defers inclusion of
<asm/fpsimd.h> in <asm/processor.h> until the point where it is
actually needed: i.e., immediately before the prctl definitions.
No functional change.
Signed-off-by: Dave Martin
Reviewed-by: Alex Bennée
Acked-by: Catalin Marinas
Acked-by: Marc Zyngier
Signed-off-by: Marc Zyngier -
sve_pffr(), which is used to derive the base address used for
low-level SVE save/restore routines, currently takes the relevant
task_struct as an argument.
The only accessed fields are actually part of thread_struct, so
this patch changes the argument type accordingly. This is done in
preparation for moving this function to a header, where we do not
want to have to include <linux/sched.h> due to the consequent
circular #include problems.
No functional change.
Signed-off-by: Dave Martin
Reviewed-by: Alex Bennée
Acked-by: Catalin Marinas
Acked-by: Marc Zyngier
Signed-off-by: Marc Zyngier -
Having read_zcr_features() inline in cpufeature.h results in that
header requiring #includes which make it hard to include
<asm/cpufeature.h> elsewhere without triggering header inclusion
cycles.
This is not a hot-path function and arguably should not be in
cpufeature.h in the first place, so this patch moves it to
fpsimd.c, compiled conditionally if CONFIG_ARM64_SVE=y.
This allows some SVE-related #includes to be dropped from
cpufeature.h, which will ease future maintenance.
A couple of missing #includes of <asm/cpufeature.h> are exposed by this
change under arch/arm64/. This patch adds the missing #includes as
necessary.
No functional change.
Signed-off-by: Dave Martin
Reviewed-by: Alex Bennée
Acked-by: Catalin Marinas
Acked-by: Marc Zyngier
Signed-off-by: Marc Zyngier -
This patch refactors KVM to align the host and guest FPSIMD
save/restore logic with each other for arm64. This reduces the
number of redundant save/restore operations that must occur, and
reduces the common-case IRQ blackout time during guest exit storms
by saving the host state lazily and optimising away the need to
restore the host state before returning to the run loop.
Four hooks are defined in order to enable this:
* kvm_arch_vcpu_run_map_fp():
  Called on PID change to map necessary bits of current to Hyp.
* kvm_arch_vcpu_load_fp():
  Set up FP/SIMD for entering the KVM run loop (parse as
  "vcpu_load fp").
* kvm_arch_vcpu_ctxsync_fp():
  Get FP/SIMD into a safe state for re-enabling interrupts after a
  guest exit back to the run loop.
  For arm64 specifically, this involves updating the host kernel's
  FPSIMD context tracking metadata so that kernel-mode NEON use
  will cause the vcpu's FPSIMD state to be saved back correctly
  into the vcpu struct. This must be done before re-enabling
  interrupts because kernel-mode NEON may be used by softirqs.
* kvm_arch_vcpu_put_fp():
  Save guest FP/SIMD state back to memory and dissociate from the
  CPU ("vcpu_put fp").
Also, the arm64 FPSIMD context switch code is updated to enable it
to save back FPSIMD state for a vcpu, not just current. A few
helpers drive this:
* fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
  mark this CPU as having context fp (which may belong to a vcpu)
  currently loaded in its registers. This is the non-task
  equivalent of the static function fpsimd_bind_to_cpu() in
  fpsimd.c.
* task_fpsimd_save():
  exported to allow KVM to save the guest's FPSIMD state back to
  memory on exit from the run loop.
* fpsimd_flush_state():
  invalidate any context's FPSIMD state that is currently loaded.
  Used to disassociate the vcpu from the CPU regs on run loop exit.
These changes allow the run loop to enable interrupts (and thus
softirqs that may use kernel-mode NEON) without having to save the
guest's FPSIMD state eagerly.
Some new vcpu_arch fields are added to make all this work. Because
host FPSIMD state can now be saved back directly into current's
thread_struct as appropriate, host_cpu_context is no longer used
for preserving the FPSIMD state. However, it is still needed for
preserving other things such as the host's system registers. To
avoid ABI churn, the redundant storage space in host_cpu_context is
not removed for now.
arch/arm is not addressed by this patch and continues to use its
current save/restore logic. It could provide implementations of
the helpers later if desired.
Signed-off-by: Dave Martin
Reviewed-by: Marc Zyngier
Reviewed-by: Christoffer Dall
Reviewed-by: Alex Bennée
Acked-by: Catalin Marinas
Signed-off-by: Marc Zyngier -
In struct vcpu_arch, the debug_flags field is used to store
debug-related flags about the vcpu state.
Since we are about to add some more flags related to FPSIMD and
SVE, it makes sense to add them to the existing flags field rather
than adding new fields. Since there is only one debug_flags flag
defined so far, there is plenty of free space for expansion.
In preparation for adding more flags, this patch renames the
debug_flags field to simply "flags", and updates comments
appropriately.
The flag definitions are also moved to <asm/kvm_host.h>, since
their presence in <asm/kvm_asm.h> was for purely historical
reasons: these definitions are not used from asm any more, and not
very likely to be as more Hyp asm is migrated to C.
KVM_ARM64_DEBUG_DIRTY_SHIFT has not been used since commit
1ea66d27e7b0 ("arm64: KVM: Move away from the assembly version of
the world switch"), so this patch gets rid of that too.
No functional change.
Signed-off-by: Dave Martin
Reviewed-by: Marc Zyngier
Reviewed-by: Alex Bennée
Acked-by: Christoffer Dall
[maz: fixed minor conflict]
Signed-off-by: Marc Zyngier -
In preparation for optimising the way KVM manages switching the
guest and host FPSIMD state, it is necessary to provide a means for
code outside arch/arm64/kernel/fpsimd.c to restore the user trap
configuration for SVE correctly for the current task.
Rather than requiring external code to duplicate the maintenance
explicitly, this patch moves the trap maintenance to
fpsimd_bind_to_cpu(), since it is logically part of the work of
associating the current task with the cpu.
Because fpsimd_bind_to_cpu() is rather a cryptic name to publish
alongside fpsimd_bind_state_to_cpu(), the former function is
renamed to fpsimd_bind_task_to_cpu() to make its purpose more
explicit.
This patch makes appropriate changes to ensure that
fpsimd_bind_task_to_cpu() is always called alongside
task_fpsimd_load(), so that the trap maintenance continues to be
done in every situation where it was done prior to this patch.
As a side-effect, the metadata updates done by
fpsimd_bind_task_to_cpu() now change from conditional to
unconditional in the "already bound" case of sigreturn. This is
harmless, and a couple of extra stores on this slow path will not
impact performance. I consider this a reasonable price to pay for
a slightly cleaner interface.
Signed-off-by: Dave Martin
Reviewed-by: Alex Bennée
Acked-by: Marc Zyngier
Acked-by: Catalin Marinas
Signed-off-by: Marc Zyngier -
Currently the FPSIMD handling code uses the condition task->mm ==
NULL as a hint that the task has no FPSIMD register context.
The ->mm check is only there to filter out tasks that cannot
possibly have FPSIMD context loaded, for optimisation purposes.
Also, TIF_FOREIGN_FPSTATE must always be checked anyway before
saving FPSIMD context back to memory. For these reasons, the ->mm
checks are not useful, providing that TIF_FOREIGN_FPSTATE is
maintained in a consistent way for all threads.
The context switch logic is already deliberately optimised to defer
reloads of the regs until ret_to_user (or sigreturn as a special
case), and save them only if they have been previously loaded.
These paths are the only places where the wrong_task and wrong_cpu
conditions can be made false, by calling fpsimd_bind_task_to_cpu().
Kernel threads by definition never reach these paths. As a result,
the wrong_task and wrong_cpu tests in fpsimd_thread_switch() will
always yield true for kernel threads.
This patch removes the redundant checks and special-case code,
ensuring that TIF_FOREIGN_FPSTATE is set whenever a kernel thread
is scheduled in, and ensures that this flag is set for the init
task. The fpsimd_flush_task_state() call already present in
copy_thread() ensures the same for any new task.
With TIF_FOREIGN_FPSTATE always set for kernel threads, this patch
ensures that no extra context save work is added for kernel
threads, and eliminates the redundant context saving that may
currently occur for kernel threads that have acquired an mm via
use_mm().
Signed-off-by: Dave Martin
Reviewed-by: Catalin Marinas
Reviewed-by: Alex Bennée
Reviewed-by: Christoffer Dall
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Ard Biesheuvel
Signed-off-by: Marc Zyngier -
The init task is started with thread_flags equal to 0, which means
that TIF_FOREIGN_FPSTATE is initially clear.
It is theoretically possible (if unlikely) that the init task could
reach userspace without ever being scheduled out. If this occurs,
data left in the FPSIMD registers by the kernel could be exposed.
This patch fixes this anomaly by ensuring that the init task's
initial TIF_FOREIGN_FPSTATE is set.
Signed-off-by: Dave Martin
Fixes: 005f78cd8849 ("arm64: defer reloading a task's FPSIMD state to userland resume")
Reviewed-by: Catalin Marinas
Reviewed-by: Alex Bennée
Cc: Will Deacon
Cc: Ard Biesheuvel
Signed-off-by: Marc Zyngier