26 Jan, 2017
1 commit
-
commit 04478197416e3a302e9ebc917ba1aa884ef9bfab upstream.
kvm_s390_get_machine() populates the facility bitmap by copying bytes
from the host results that are stored in a 256 byte array in the prefix
page. The KVM code does use the size of the target buffer (2k), thus
copying and exposing unrelated kernel memory (mostly machine check
related logout data).

Let's use the size of the source buffer instead. This is ok, as the
target buffer will always be greater than or equal to the source buffer as
the KVM internal buffers (and thus S390_ARCH_FAC_LIST_SIZE_BYTE) cover
the maximum possible size that is allowed by STFLE, which is 256
doublewords. All structures are zero allocated so we can leave bytes
256-2047 unchanged.

Add a similar fix for kvm_arch_init_vm().
Reported-by: Heiko Carstens
[found with smatch]
Signed-off-by: Christian Borntraeger
Acked-by: Cornelia Huck
Signed-off-by: Greg Kroah-Hartman
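
The fix in the commit above amounts to bounding the copy by the size of the source buffer instead of the target. A minimal user-space sketch of that idea follows; the names `SRC_FAC_BYTES`, `DST_FAC_BYTES` and `copy_fac_list` are hypothetical and only mirror the sizes quoted in the message, they are not the kernel's identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sizes taken from the commit message; the names are hypothetical. */
#define SRC_FAC_BYTES 256   /* host STFLE results in the prefix page */
#define DST_FAC_BYTES 2048  /* KVM-internal facility buffer (2k) */

/*
 * Copy the host facility bytes into a zero-initialized target buffer.
 * The length is bounded by the source, so nothing located behind the
 * 256-byte source array is ever read. Returns the number of bytes copied.
 */
static size_t copy_fac_list(uint8_t *dst, const uint8_t *src)
{
    size_t n = SRC_FAC_BYTES < DST_FAC_BYTES ? SRC_FAC_BYTES : DST_FAC_BYTES;

    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);
    return n;
}
```

Because the target is assumed to be zero allocated, the bytes behind the copied range simply stay zero, which matches the remark about bytes 256-2047 in the message.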
26 Oct, 2016
1 commit
-
Diag224 requires a page-aligned 4k buffer to store the name table
into. kmalloc does not guarantee page alignment, hence we replace it
with __get_free_page for the buffer allocation.

Cc: stable@vger.kernel.org # v4.8+
Reported-by: Michael Holzheu
Signed-off-by: Janosch Frank
Reviewed-by: Cornelia Huck
Signed-off-by: Christian Borntraeger
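
The alignment requirement described above can be illustrated outside the kernel. The sketch below is a user-space analogy only: C11 `aligned_alloc` stands in for `__get_free_page`, and the names `NAME_TABLE_SIZE`, `is_page_aligned` and `alloc_name_table` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NAME_TABLE_SIZE 4096  /* one 4k page, as required by Diag224 */

/* Returns nonzero if the pointer sits on a 4k boundary. */
static int is_page_aligned(const void *p)
{
    return ((uintptr_t)p % NAME_TABLE_SIZE) == 0;
}

/*
 * Allocate a 4k buffer that is guaranteed to be 4k aligned.
 * A plain malloc() (like kmalloc in the kernel) gives no such guarantee.
 * The caller releases the buffer with free().
 */
static void *alloc_name_table(void)
{
    void *p = aligned_alloc(NAME_TABLE_SIZE, NAME_TABLE_SIZE);

    assert(p == NULL || is_page_aligned(p));
    return p;
}
```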
21 Oct, 2016
1 commit
-
Usually a validity intercept is a programming error of the host
because of invalid entries in the state description.
We can get a validity intercept if the mode of the runtime
instrumentation control block is wrong. As the host does not know
which modes are valid, this can be used by userspace to trigger
a WARN.
Instead of printing a WARN let's return an error to userspace as
this can only happen if userspace provides a malformed initial
value (e.g. on migration). The kernel should never warn on bogus
input. Instead let's log it into the s390 debug feature.

While at it, let's return -EINVAL for all validity intercepts as
this will trigger an error in QEMU like

error: kvm run failed Invalid argument
PSW=mask 0404c00180000000 addr 000000000063c226 cc 00
R00=000000000000004f R01=0000000000000004 R02=0000000000760005 R03=000000007fe0a000
R04=000000000064ba2a R05=000000049db73dd0 R06=000000000082c4b0 R07=0000000000000041
R08=0000000000000002 R09=000003e0804042a8 R10=0000000496152c42 R11=000000007fe0afb0
[...]

This will avoid an endless loop of validity intercepts.
Cc: stable@vger.kernel.org # v4.5+
Fixes: c6e5f166373a ("KVM: s390: implement the RI support of guest")
Acked-by: Fan Zhang
Reviewed-by: Pierre Morel
Signed-off-by: Christian Borntraeger
07 Oct, 2016
1 commit
-
Pull KVM updates from Radim Krčmář:
"All architectures:
- move `make kvmconfig` stubs from x86
- use 64 bits for debugfs stats

ARM:
- Important fixes for not using an in-kernel irqchip
- handle SError exceptions and present them to guests if appropriate
- proxying of GICV access at EL2 if guest mappings are unsafe
- GICv3 on AArch32 on ARMv8
- preparations for GICv3 save/restore, including ABI docs
- cleanups and a bit of optimizations

MIPS:
- A couple of fixes in preparation for supporting MIPS EVA host
  kernels
- MIPS SMP host & TLB invalidation fixes

PPC:
- Fix the bug which caused guests to falsely report lockups
- other minor fixes
- a small optimization

s390:
- Lazy enablement of runtime instrumentation
- up to 255 CPUs for nested guests
- rework of machine check delivery
- cleanups and fixes

x86:
- IOMMU part of AMD's AVIC for vmexit-less interrupt delivery
- Hyper-V TSC page
- per-vcpu tsc_offset in debugfs
- accelerated INS/OUTS in nVMX
- cleanups and fixes"

* tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (140 commits)
KVM: MIPS: Drop dubious EntryHi optimisation
KVM: MIPS: Invalidate TLB by regenerating ASIDs
KVM: MIPS: Split kernel/user ASID regeneration
KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
KVM: arm/arm64: vgic: Don't flush/sync without a working vgic
KVM: arm64: Require in-kernel irqchip for PMU support
KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register
KVM: PPC: Book3S PR: Support 64kB page size on POWER8E and POWER8NVL
KVM: PPC: Book3S: Remove duplicate setting of the B field in tlbie
KVM: PPC: BookE: Fix a sanity check
KVM: PPC: Book3S HV: Take out virtual core piggybacking code
KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread
ARM: gic-v3: Work around definition of gic_write_bpr1
KVM: nVMX: Fix the NMI IDT-vectoring handling
KVM: VMX: Enable MSR-BASED TPR shadow even if APICv is inactive
KVM: nVMX: Fix reload apic access page warning
kvmconfig: add virtio-gpu to config fragment
config: move x86 kvm_guest.config to a common location
arm64: KVM: Remove duplicating init code for setting VMID
ARM: KVM: Support vgic-v3
...
05 Oct, 2016
1 commit
-
Pull s390 updates from Martin Schwidefsky:
"The new features and main improvements in this merge for v4.9

- Support for the UBSAN sanitizer
- Set HAVE_EFFICIENT_UNALIGNED_ACCESS, it improves the code in some
  places
- Improvements for the in-kernel fpu code, in particular the overhead
  for multiple consecutive in kernel fpu users is reduced
- Add a SIMD implementation for the RAID6 gen and xor operations
- Add RAID6 recovery based on the XC instruction
- The PCI DMA flush logic has been improved to increase the speed of
  the map / unmap operations
- The time synchronization code has seen some updates

And bug fixes all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits)
s390/con3270: fix insufficient space padding
s390/con3270: fix use of uninitialised data
MAINTAINERS: update DASD maintainer
s390/cio: fix accidental interrupt enabling during resume
s390/dasd: add missing \n to end of dev_err messages
s390/config: Enable config options for Docker
s390/dasd: make query host access interruptible
s390/dasd: fix panic during offline processing
s390/dasd: fix hanging offline processing
s390/pci_dma: improve lazy flush for unmap
s390/pci_dma: split dma_update_trans
s390/pci_dma: improve map_sg
s390/pci_dma: simplify dma address calculation
s390/pci_dma: remove dma address range check
iommu/s390: simplify registration of I/O address translation parameters
s390: migrate exception table users off module.h and onto extable.h
s390: export header for CLP ioctl
s390/vmur: fix irq pointer dereference in int handler
s390/dasd: add missing KOBJ_CHANGE event for unformatted devices
s390: enable UBSAN
...
16 Sep, 2016
1 commit
-
Two stubs are added:
o kvm_arch_has_vcpu_debugfs(): must return true if the arch
supports creating debugfs entries in the vcpu debugfs dir
(which will be implemented by the next commit)
o kvm_arch_create_vcpu_debugfs(): code that creates debugfs
entries in the vcpu debugfs dir

For x86, this commit introduces a new file to avoid growing
arch/x86/kvm/x86.c even more.

Signed-off-by: Luiz Capitulino
Signed-off-by: Paolo Bonzini
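
The two stubs described in the commit above could look roughly like the sketch below for an architecture that has no per-vcpu debugfs entries. The signatures are assumed rather than taken from the message, `struct kvm_vcpu` is a minimal stand-in for the kernel structure, and `create_vcpu_debugfs` is a hypothetical generic caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct kvm_vcpu (assumption). */
struct kvm_vcpu {
    int vcpu_id;
    int debugfs_entries;  /* hypothetical count of created entries */
};

/* Stub for an architecture without per-vcpu debugfs entries. */
static bool kvm_arch_has_vcpu_debugfs(void)
{
    return false;
}

/* Matching stub: nothing to create, so report success. */
static int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
{
    (void)vcpu;
    return 0;
}

/*
 * Hypothetical generic caller: only ask the arch code to create
 * entries if it declares support for them.
 */
static int create_vcpu_debugfs(struct kvm_vcpu *vcpu)
{
    assert(vcpu != NULL);
    if (!kvm_arch_has_vcpu_debugfs())
        return 0;
    return kvm_arch_create_vcpu_debugfs(vcpu);
}
```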
08 Sep, 2016
11 commits
-
* Reuse existing functionality from memdup_user() instead of keeping
  duplicate source code.

  This issue was detected by using the Coccinelle software.

* Return directly if this copy operation failed.

Reviewed-by: David Hildenbrand
Acked-by: Cornelia Huck
Signed-off-by: Markus Elfring
Message-Id:
Signed-off-by: Christian Borntraeger
-
* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus reuse the corresponding function "kmalloc_array".

  Suggested-by: Paolo Bonzini

  This issue was detected also by using the Coccinelle software.

* Replace the specification of data structures by pointer dereferences
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

* Delete the local variable "size" which became unnecessary with
  this refactoring.

Signed-off-by: Markus Elfring
Acked-by: Cornelia Huck
Message-Id:
Signed-off-by: Christian Borntraeger
-
If the SCA entries aren't used by the hardware (no SIGPIF), we
can simply not set the entries, stick to the basic sca and allow more
than 64 VCPUs.

To hinder any other facility from using these entries, let's properly
provoke intercepts by not setting the MCN and keeping the entries
unset.

This effectively allows providing more than 64 VCPUs to a guest when
running KVM under KVM (vSIE) or under z/VM. Let's limit it to 255 for
now, to not run into problems if the CPU numbers are limited somewhere
else.

Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
-
Only enable runtime instrumentation if the guest issues an RI related
instruction or if userspace changes the riccb to a valid state.
This makes entry/exit a tiny bit faster.

Initial patch by Christian Borntraeger

Signed-off-by: Fan Zhang
Signed-off-by: Christian Borntraeger
-
The payload data for protection exceptions is a superset of the
payload of other translation exceptions. Let's set the additional
flags and use a fall through to minimize code duplication.

Signed-off-by: Janosch Frank
Reviewed-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
-
Let's avoid working with the PER_EVENT* defines, used for control register
manipulation, when checking the u8 PER code. Introduce separate defines
based on the existing defines.

Reviewed-by: Eric Farman
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
-
Let's also write the external damage code already provided by
struct kvm_s390_mchk_info.

Reviewed-by: Cornelia Huck
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
-
Vector registers are only to be stored if the facility is available
and if the guest has set up the machine check extended save area.

If anything goes wrong while writing the vector registers, the vector
registers are to be marked as invalid. Please note that we are allowed
to write the registers although they are marked as invalid.

Machine checks and "store status" SIGP orders are two different concepts,
let's correctly separate these. As the SIGP part is completely handled in
user space, we can drop it.

This patch is based on a patch from Cornelia Huck.

Reviewed-by: Cornelia Huck
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
-
Store status writes the prefix which is not to be done by a machine check.
Also, the psw is stored and later on overwritten by the failing-storage
address, which looks strange at first sight.

Store status and machine check handling look similar, but they are actually
two different things.

Reviewed-by: Cornelia Huck
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
-
Let's factor this out to prepare for bigger changes. Reorder the calls to
match the logical order given in the PoP.

Reviewed-by: Cornelia Huck
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
05 Sep, 2016
1 commit
-
We store the address of riccbd at the wrong location, overwriting
gvrd. This means that our nested guest will not be able to use runtime
instrumentation. It also causes a memory leak if our KVM guest actually
sets gvrd.

Not noticed until now, as KVM guests never make use of gvrd and runtime
instrumentation wasn't completely tested yet.

Reported-by: Fan Zhang
Reviewed-by: Cornelia Huck
Signed-off-by: David Hildenbrand
Signed-off-by: Cornelia Huck
29 Aug, 2016
1 commit
-
The CPACF code makes some assumptions about the availability of hardware
support. E.g. if the machine supports KM(AES-256) without chaining it is
assumed that KMC(AES-256) with chaining is available as well. For the
existing CPUs this is true but the architecturally correct way is to
check each CPACF function on its own. This is what the query function
of each instruction is all about.

Reviewed-by: Harald Freudenberger
Signed-off-by: Martin Schwidefsky
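
Checking each function on its own, as the commit above argues, means testing one bit in the query result of the specific instruction rather than inferring it from another instruction. The sketch below assumes a 128-bit query result with IBM bit numbering (bit 0 is the leftmost bit of byte 0); `struct query_mask`, `mask_test_func` and `mask_set_func` are hypothetical names, and any function code used with them is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Query result of one instruction, modeled as a 128-bit mask.
 * Bit 0 is the leftmost bit of byte 0 (layout is an assumption).
 */
struct query_mask {
    uint8_t bytes[16];
};

/* Returns true if function code 'func' is marked available in 'mask'. */
static bool mask_test_func(const struct query_mask *mask, unsigned int func)
{
    assert(func < 128);
    return (mask->bytes[func >> 3] & (0x80u >> (func & 7))) != 0;
}

/* Marks function code 'func' as available (used to build demo masks). */
static void mask_set_func(struct query_mask *mask, unsigned int func)
{
    assert(func < 128);
    mask->bytes[func >> 3] |= (uint8_t)(0x80u >> (func & 7));
}
```

With one mask per instruction, a caller would test the KM mask before using KM and the KMC mask before using KMC, so availability of one never implies the other.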
26 Aug, 2016
2 commits
-
Pull facility mask patch from the KVM tree.
* tag 's390forkvm' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
KVM: s390: generate facility mask from readable list
-
Automatically generate the KVM facility mask out of a readable list.
Manually changing the masks is very error prone, especially if the
special IBM bit numbering has to be considered.

Signed-off-by: Heiko Carstens
Reviewed-by: Christian Borntraeger
Signed-off-by: Christian Borntraeger
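
The IBM bit numbering mentioned above counts from the most significant bit, which is what makes hand-written masks error prone. A minimal sketch of generating a mask from a readable list of facility numbers follows; `MASK_DWORDS` and `build_facility_mask` are hypothetical names and the mask size is an assumption, so this is not the generator the commit adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MASK_DWORDS 4  /* hypothetical mask size: 256 facility bits */

/*
 * Build a doubleword mask from a list of facility numbers.
 * IBM numbering: facility 0 is the most significant bit of dword 0,
 * facility 63 the least significant bit, facility 64 the most
 * significant bit of dword 1, and so on.
 */
static void build_facility_mask(uint64_t mask[MASK_DWORDS],
                                const unsigned int *bits, size_t nbits)
{
    for (size_t i = 0; i < MASK_DWORDS; i++)
        mask[i] = 0;

    for (size_t i = 0; i < nbits; i++) {
        unsigned int bit = bits[i];

        assert(bit < MASK_DWORDS * 64);
        mask[bit / 64] |= UINT64_C(1) << (63 - bit % 64);
    }
}
```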
25 Aug, 2016
1 commit
-
As the meaning of these variables and pointers seems to change more
frequently, let's directly access our save area, instead of going via
current->thread.

Right now, this is broken for set/get_fpu. They simply overwrite the
host registers, as the pointers to the current save area were turned
into the static host save area.

Cc: stable@vger.kernel.org # 4.7
Fixes: 3f6813b9a5e0 ("s390/fpu: allocate 'struct fpu' with the task_struct")
Reported-by: Hao QingFeng
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
12 Aug, 2016
2 commits
-
When triggering KVM_RUN without a user memory region being mapped
(KVM_SET_USER_MEMORY_REGION) a validity intercept occurs. This could
happen if the user memory region was not mapped initially or if it
was unmapped after the vcpu is initialized. The function
kvm_s390_handle_requests checks for the KVM_REQ_MMU_RELOAD bit. The
check function always clears this bit. If gmap_mprotect_notify
returns an error code, the mapping failed, but the KVM_REQ_MMU_RELOAD
was not set anymore. So the next time kvm_s390_handle_requests is
called, the execution would fall through the check for
KVM_REQ_MMU_RELOAD. The bit needs to be set again if
gmap_mprotect_notify returns an error code. Setting the bit again with
kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu) fixes the bug.

Reviewed-by: David Hildenbrand
Signed-off-by: Julius Niedworok
Signed-off-by: Christian Borntraeger
-
When KVM_RUN is triggered on a VCPU without an initial reset, a
validity intercept occurs.
Setting the prefix will set the KVM_REQ_MMU_RELOAD bit initially,
thus preventing the bug.

Reviewed-by: David Hildenbrand
Acked-by: Cornelia Huck
Signed-off-by: Julius Niedworok
Signed-off-by: Christian Borntraeger
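
The first of the two commits above describes a test-and-clear request bit that has to be armed again when the requested work fails. A minimal sketch of that pattern follows; `struct vcpu`, `REQ_MMU_RELOAD`, `make_request`, `check_request`, `handle_requests` and the two demo callbacks are hypothetical stand-ins and not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define REQ_MMU_RELOAD 0  /* hypothetical request number */

struct vcpu {
    unsigned long requests;  /* pending request bits */
};

static void make_request(struct vcpu *vcpu, int req)
{
    vcpu->requests |= 1UL << req;
}

/* Test-and-clear: returns true once per pending request. */
static bool check_request(struct vcpu *vcpu, int req)
{
    unsigned long bit = 1UL << req;

    if (!(vcpu->requests & bit))
        return false;
    vcpu->requests &= ~bit;
    return true;
}

/* Demo callbacks standing in for a failing and a succeeding mapping. */
static int mapping_fails(void) { return -EFAULT; }
static int mapping_succeeds(void) { return 0; }

/* 'remap' stands in for the operation that may fail. */
static int handle_requests(struct vcpu *vcpu, int (*remap)(void))
{
    assert(vcpu != NULL && remap != NULL);
    if (check_request(vcpu, REQ_MMU_RELOAD)) {
        int rc = remap();

        if (rc) {
            /* Re-arm: check_request() already cleared the bit. */
            make_request(vcpu, REQ_MMU_RELOAD);
            return rc;
        }
    }
    return 0;
}
```

Without the re-arm step, a second call after a failure would skip the request entirely, which is the fall-through the message describes.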
03 Aug, 2016
1 commit
-
Pull KVM updates from Paolo Bonzini:
- ARM: GICv3 ITS emulation and various fixes. Removal of the
  old VGIC implementation.
- s390: support for trapping software breakpoints, nested
  virtualization (vSIE), the STHYI opcode, initial extensions
  for CPU model support.
- MIPS: support for MIPS64 hosts (32-bit guests only) and lots
  of cleanups, preliminary to this and the upcoming support for
  hardware virtualization extensions.
- x86: support for execute-only mappings in nested EPT; reduced
  vmexit latency for TSC deadline timer (by about 30%) on Intel
  hosts; support for more than 255 vCPUs.
- PPC: bugfixes.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
KVM: PPC: Introduce KVM_CAP_PPC_HTM
MIPS: Select HAVE_KVM for MIPS64_R{2,6}
MIPS: KVM: Reset CP0_PageMask during host TLB flush
MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
MIPS: KVM: Sign extend MFC0/RDHWR results
MIPS: KVM: Fix 64-bit big endian dynamic translation
MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
MIPS: KVM: Use 64-bit CP0_EBase when appropriate
MIPS: KVM: Set CP0_Status.KX on MIPS64
MIPS: KVM: Make entry code MIPS64 friendly
MIPS: KVM: Use kmap instead of CKSEG0ADDR()
MIPS: KVM: Use virt_to_phys() to get commpage PFN
MIPS: Fix definition of KSEGX() for 64-bit
KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
kvm: x86: nVMX: maintain internal copy of current VMCS
KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
KVM: arm64: vgic-its: Simplify MAPI error handling
KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
...
27 Jul, 2016
1 commit
-
Pull s390 updates from Martin Schwidefsky:
"There are a couple of new things for s390 with this merge request:

- a new scheduling domain "drawer" is added to reflect the unusual
  topology found on z13 machines. Performance tests showed up to 8
  percent gain with the additional domain.
- the new crc-32 checksum crypto module uses the vector-galois-field
  multiply and sum SIMD instruction to speed up crc-32 and crc-32c.
- proper __ro_after_init support, this requires RO_AFTER_INIT_DATA in
  the generic vmlinux.lds linker script definitions.
- kcov instrumentation support. A prerequisite for that is the
  inline assembly basic block cleanup, which is the reason for the
  net/iucv/iucv.c change.
- support for 2GB pages is added to the hugetlbfs backend.

Then there are two removals:

- the oprofile hardware sampling support is dead code and is removed.
  The oprofile user space uses the perf interface nowadays.
- the ETR clock synchronization is removed, this has been superseded
  by the STP clock synchronization. And it always has been
  "interesting" code..

And the usual bug fixes and cleanups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (82 commits)
s390/pci: Delete an unnecessary check before the function call "pci_dev_put"
s390/smp: clean up a condition
s390/cio/chp : Remove deprecated create_singlethread_workqueue
s390/chsc: improve channel path descriptor determination
s390/chsc: sanitize fmt check for chp_desc determination
s390/cio: make fmt1 channel path descriptor optional
s390/chsc: fix ioctl CHSC_INFO_CU command
s390/cio/device_ops: fix kernel doc
s390/cio: allow to reset channel measurement block
s390/console: Make preferred console handling more consistent
s390/mm: fix gmap tlb flush issues
s390/mm: add support for 2GB hugepages
s390: have unique symbol for __switch_to address
s390/cpuinfo: show maximum thread id
s390/ptrace: clarify bits in the per_struct
s390: stack address vs thread_info
s390: remove pointless load within __switch_to
s390: enable kcov support
s390/cpumf: use basic block for ecctr inline assembly
s390/hypfs: use basic block for diag inline assembly
...
18 Jul, 2016
2 commits
-
We don't emulate ptff subfunctions, therefore react on any attempt of
execution by setting cc=3 (Requested function not available).

Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
-
We will use illegal instruction 0x0000 for handling 2 byte sw breakpoints
from user space. As it can be enabled dynamically via a capability,
let's move setting of ICTL_OPEREXC to the post creation step, so we avoid
any races when enabling that capability just while adding new cpus.

Acked-by: Janosch Frank
Reviewed-by: Cornelia Huck
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
14 Jul, 2016
1 commit
-
Arch-specific code will use it.
Signed-off-by: Radim Krčmář
Signed-off-by: Paolo Bonzini
05 Jul, 2016
1 commit
-
In case we have to emulate an instruction or part of it (instruction,
partial instruction, operation exception), we have to inject a PER
instruction-fetching event for that instruction, if hardware told us to do
so.

In case we retry an instruction, we must not inject the PER event.

Please note that we don't filter the events properly yet, so guest
debugging will be visible for the guest.

Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
01 Jul, 2016
1 commit
-
Use the functions from context_tracking.h directly.
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Thomas Gleixner
Reviewed-by: Rik van Riel
Signed-off-by: Paolo Bonzini
21 Jun, 2016
9 commits
-
Let's be careful first and allow nested virtualization only if enabled
by the system administrator. In addition, user space still has to
explicitly enable it via SCLP features for it to work.

Acked-by: Christian Borntraeger
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
-
We have certain SIE features that we cannot support for now.
Let's add these features, so user space can directly prepare to enable
them, so we don't have to update yet another component.

In addition, add a comment block, telling why it is for now not possible to
forward/enable these features.

Acked-by: Christian Borntraeger
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
-
Guest 2 sets up the epoch of guest 3 from his point of view. Therefore,
we have to add the guest 2 epoch to the guest 3 epoch. We also have to take
care of guest 2 epoch changes on STP syncs. This will work just fine by
also updating the guest 3 epoch when a vsie_block has been set for a VCPU.

Acked-by: Christian Borntraeger
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
-
Whenever a SIGP external call is injected via the SIGP external call
interpretation facility, the VCPU is not kicked. When a VCPU is currently
in the VSIE, the external call might not be processed immediately.

Therefore we have to provoke partial execution exceptions, which leads to a
kick of the VCPU and therefore also kick out of VSIE. This is done by
simulating the WAIT state. This bit has no other side effects.

Acked-by: Christian Borntraeger
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
-
As we want to make use of CPUSTAT_WAIT also when a VCPU is not idle but
to force interception of external calls, let's check in the bitmap instead.

Acked-by: Christian Borntraeger
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
-
Whenever we want to wake up a VCPU (e.g. when injecting an IRQ), we
have to kick it out of vsie, so the request will be handled faster.

Acked-by: Christian Borntraeger
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
-
We can avoid one unneeded SIE entry after we reported a fault to g2.
Theoretically, g2 resolves the fault and we can create the shadow mapping
directly, instead of failing again when entering the SIE.

Acked-by: Christian Borntraeger
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
-
We can easily enable ibs for guest 2, so he can use it for guest 3.
Acked-by: Christian Borntraeger
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger
-
We can easily enable cei for guest 2, so he can use it for guest 3.
Acked-by: Christian Borntraeger
Signed-off-by: David Hildenbrand
Signed-off-by: Christian Borntraeger