22 Apr, 2015
3 commits
-
…it/kvmarm/kvmarm into kvm-master
KVM/ARM changes for v4.1, take #2:
Rather small this time:
- a fix for a nasty bug with virtual IRQ injection
- a fix for irqfd
-
When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently
only check it against a fixed limit, which historically is set
to 127. With the new dynamic IRQ allocation the effective limit may
actually be smaller (64).
So when a malicious or buggy userland now injects a SPI in that
range, we spill over on our VGIC bitmaps and bytemaps memory.
I could trigger a host kernel NULL pointer dereference with current
mainline by injecting some bogus IRQ number from a hacked kvmtool:
-----------------
....
DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1)
DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1)
DEBUG: IRQ #114 still in the game, writing to bytemap now...
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ffffffc07652e000
[00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000
Internal error: Oops: 96000006 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027
Hardware name: FVP Base (DT)
task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000
PC is at kvm_vgic_inject_irq+0x234/0x310
LR is at kvm_vgic_inject_irq+0x30c/0x310
pc : [] lr : [] pstate: 80000145
.....
So this patch fixes this by checking the SPI number against the
actual limit. Also we remove the former legacy hard limit of
127 in the ioctl code.
Signed-off-by: Andre Przywara
Reviewed-by: Christoffer Dall
CC: # 4.0, 3.19, 3.18
[maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__,
as suggested by Christopher Covington]
Signed-off-by: Marc Zyngier
-
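The range check described in the KVM_IRQ_LINE fix can be illustrated with a small userspace sketch. It is not the kernel code: `struct toy_vgic`, `FIRST_SPI`, `LEGACY_IRQ_MAX` and both helper functions are hypothetical names chosen for this example, and the only facts taken from the commit message are the fixed limit of 127 and a possible effective limit of 64.

```c
#include <stdbool.h>

/* Hypothetical constants for illustration; not the kernel's definitions. */
#define FIRST_SPI        32u   /* GIC interrupt IDs 0-15 are SGIs, 16-31 are PPIs */
#define LEGACY_IRQ_MAX  127u   /* the fixed limit mentioned in the commit message */

/* Hypothetical per-VM state: how many IRQs the VGIC tables were sized for. */
struct toy_vgic {
    unsigned int nr_irqs;
};

/* Old-style check: only compares against the fixed legacy limit. */
static bool spi_ok_legacy(unsigned int irq)
{
    return irq >= FIRST_SPI && irq <= LEGACY_IRQ_MAX;
}

/* Fixed check: compares against the limit this VM was configured with. */
static bool spi_ok_dynamic(const struct toy_vgic *vgic, unsigned int irq)
{
    return irq >= FIRST_SPI && irq < vgic->nr_irqs;
}
```

With `nr_irqs` set to 64, IRQ 114 from the trace passes the legacy check but is rejected by the dynamic one, which is the gap the patch closes.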
irqfd/arm currently does not support routing. kvm_irq_map_gsi is
supposed to return all the routing entries associated with the
provided gsi and return the number of those entries. We should
return 0 at this point.
Signed-off-by: Eric Auger
Acked-by: Christoffer Dall
Signed-off-by: Marc Zyngier
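The contract stated for kvm_irq_map_gsi (fill in the matching routing entries, return their count) can be sketched as follows. This is a simplified illustration under assumptions: `struct toy_routing_entry` and both functions are hypothetical and do not mirror the kernel's types or signatures.

```c
#include <stddef.h>

/* Hypothetical, simplified routing entry; the kernel's struct differs. */
struct toy_routing_entry {
    unsigned int gsi;
    unsigned int pin;
};

/* With a routing table: copy matching entries, return how many were found. */
static int toy_map_gsi_routed(const struct toy_routing_entry *table, size_t n,
                              unsigned int gsi,
                              struct toy_routing_entry *out, size_t out_max)
{
    size_t found = 0;

    for (size_t i = 0; i < n && found < out_max; i++) {
        if (table[i].gsi == gsi)
            out[found++] = table[i];
    }
    return (int)found;
}

/* Without routing support there are no entries to report, so return 0. */
static int toy_map_gsi_unrouted(struct toy_routing_entry *out, unsigned int gsi)
{
    (void)out;
    (void)gsi;
    return 0;
}
```

Returning 0 tells callers that there is nothing to iterate over, so they never touch the (unfilled) output array.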
21 Apr, 2015
1 commit
-
This creates a debugfs directory for each HV guest (assuming debugfs
is enabled in the kernel config), and within that directory, a file
by which the contents of the guest's HPT (hashed page table) can be
read. The directory is named vmnnnn, where nnnn is the PID of the
process that created the guest. The file is named "htab". This is
intended to help in debugging problems in the host's management
of guest memory.
The contents of the file consist of a series of lines like this:
3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196
The first field is the index of the entry in the HPT, the second and
third are the HPT entry, so the third field contains the real page
number that is mapped by the entry if the entry's valid bit is set.
The fourth field is the guest's view of the second doubleword of the
entry, so it contains the guest physical address. (The format of the
second through fourth fields is described in the Power ISA and also
in arch/powerpc/include/asm/mmu-hash64.h.)
Signed-off-by: Paul Mackerras
Signed-off-by: Alexander Graf
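A line of the "htab" file in the format shown above can be split into its four hexadecimal fields with a few lines of C. This is only a sketch of a possible userspace reader: `struct htab_line` and `parse_htab_line` are names invented for this example, and the field names `v`, `hr` and `gr` are assumptions.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One parsed line of the htab file; field names are illustrative. */
struct htab_line {
    unsigned long index; /* index of the entry in the HPT */
    uint64_t v;          /* first doubleword of the HPT entry */
    uint64_t hr;         /* second doubleword, host view (real page number) */
    uint64_t gr;         /* second doubleword, guest view (guest physical address) */
};

/* Returns 0 on success, -1 if the line does not hold four hex fields. */
static int parse_htab_line(const char *line, struct htab_line *out)
{
    int n = sscanf(line, "%lx %" SCNx64 " %" SCNx64 " %" SCNx64,
                   &out->index, &out->v, &out->hr, &out->gr);

    return n == 4 ? 0 : -1;
}
```

Decoding the individual bit fields of each doubleword is left out on purpose; their layout is defined by the Power ISA and the header named in the commit message.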
14 Apr, 2015
1 commit
-
Pull KVM updates from Paolo Bonzini:
"First batch of KVM changes for 4.1
The most interesting bit here is irqfd/ioeventfd support for ARM and
ARM64.
Summary:
ARM/ARM64:
fixes for live migration, irqfd and ioeventfd support (enabling
vhost, too), page aging
s390:
interrupt handling rework, allowing to inject all local interrupts
via new ioctl and to get/set the full local irq state for migration
and introspection. New ioctls to access memory by virtual address,
and to get/set the guest storage keys. SIMD support.
MIPS:
FPU and MIPS SIMD Architecture (MSA) support. Includes some
patches from Ralf Baechle's MIPS tree.
x86:
bugfixes (notably for pvclock, the others are small) and cleanups.
Another small latency improvement for the TSC deadline timer"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
KVM: use slowpath for cross page cached accesses
kvm: mmu: lazy collapse small sptes into large sptes
KVM: x86: Clear CR2 on VCPU reset
KVM: x86: DR0-DR3 are not clear on reset
KVM: x86: BSP in MSR_IA32_APICBASE is writable
KVM: x86: simplify kvm_apic_map
KVM: x86: avoid logical_map when it is invalid
KVM: x86: fix mixed APIC mode broadcast
KVM: x86: use MDA for interrupt matching
kvm/ppc/mpic: drop unused IRQ_testbit
KVM: nVMX: remove unnecessary double caching of MAXPHYADDR
KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry
KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu
KVM: vmx: pass error code with internal error #2
x86: vdso: fix pvclock races with task migration
KVM: remove kvm_read_hva and kvm_read_hva_atomic
KVM: x86: optimize delivery of TSC deadline timer interrupt
KVM: x86: extract blocking logic from __vcpu_run
kvm: x86: fix x86 eflags fixed bit
KVM: s390: migrate vcpu interrupt state
...
10 Apr, 2015
1 commit
-
kvm_write_guest_cached() does not mark all written pages as dirty and
code comments in kvm_gfn_to_hva_cache_init() talk about NULL memslot
with cross page accesses. Fix all the easy way.
The check is '
Message-Id:
Reviewed-by: Wanpeng Li
Signed-off-by: Paolo Bonzini
08 Apr, 2015
4 commits
-
The corresponding write functions just use __copy_to_user. Do the
same on the read side.
This reverts what's left of commit 86ab8cffb498 (KVM: introduce
gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomic, 2012-08-21)
Cc: Xiao Guangrong
Signed-off-by: Paolo Bonzini
Message-Id:
-
…git/kvms390/linux into HEAD
Features and fixes for 4.1 (kvm/next)
1. Assorted changes
1.1 allow more feature bits for the guest
1.2 Store breaking event address on program interrupts
2. Interrupt handling rework
2.1 Fix copy_to_user while holding a spinlock (cc stable)
2.2 Rework floating interrupts to follow the priorities
2.3 Allow to inject all local interrupts via new ioctl
2.4 allow to get/set the full local irq state, e.g. for migration
and introspection
-
…arm/kvmarm into 'kvm-next'
KVM/ARM changes for v4.1:
- fixes for live migration
- irqfd support
- kvm-io-bus & vgic rework to enable ioeventfd
- page ageing for stage-2 translation
- various cleanups
-
…it/kvmarm/kvmarm into 'kvm-next'
Fixes for KVM/ARM for 4.0-rc5.
Fixes page refcounting issues in our Stage-2 page table management code,
fixes a missing unlock in a gicv3 error path, and fixes a race that can
cause lost interrupts if signals are pending just prior to entering the
guest.
01 Apr, 2015
1 commit
-
We introduced struct kvm_s390_irq a while ago; it allows injecting
all kinds of interrupts as defined in the Principles of
Operation.
Add an ioctl to inject interrupts with the extended struct kvm_s390_irq.
Signed-off-by: Jens Freimann
Signed-off-by: Christian Borntraeger
Acked-by: Cornelia Huck
31 Mar, 2015
3 commits
-
Currently we have struct kvm_exit_mmio for encapsulating MMIO abort
data to be passed on from syndrome decoding all the way down to the
VGIC register handlers. Now as we switch the MMIO handling to be
routed through the KVM MMIO bus, it does not make sense anymore to
use that structure already from the beginning. So we keep the data in
local variables until we put them into the kvm_io_bus framework.
Then we fill kvm_exit_mmio in the VGIC only, making it a VGIC private
structure. Along the way we replace the data buffer in that structure
with a pointer pointing to a single location in a local variable, so
we get rid of some copying on the way.
With all of the virtual GIC emulation code now being registered with
the kvm_io_bus, we can remove all of the old MMIO handling code and
its dispatching functionality.
I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something),
because that touches a lot of code lines without any good reason.
This is based on an original patch by Nikolay.
Signed-off-by: Andre Przywara
Cc: Nikolay Nikolaev
Reviewed-by: Marc Zyngier
Signed-off-by: Marc Zyngier
-
Using the framework provided by the recent vgic.c changes, we
register a kvm_io_bus device on mapping the virtual GICv3 resources.
The distributor mapping is pretty straightforward, but the
redistributors need some more love, since they need to be tagged with
the respective redistributor (read: VCPU) they are connected with.
We use the kvm_io_bus framework to register one device per VCPU.
Signed-off-by: Andre Przywara
Reviewed-by: Marc Zyngier
Signed-off-by: Marc Zyngier
-
Currently we handle the redistributor registers in two separate MMIO
regions, one for the overall behaviour and SPIs and one for the
SGIs/PPIs. The latter forces the creation of _two_ KVM I/O bus
devices for each redistributor.
Since the spec mandates those two pages to be contiguous, we might as
well merge them and save the churn with the second KVM I/O bus device.
Signed-off-by: Andre Przywara
Reviewed-by: Marc Zyngier
Signed-off-by: Marc Zyngier
27 Mar, 2015
6 commits
-
Using the framework provided by the recent vgic.c changes we register
a kvm_io_bus device when initializing the virtual GICv2.
Signed-off-by: Andre Przywara
Signed-off-by: Marc Zyngier
-
Currently we use a lot of VGIC specific code to do the MMIO
dispatching.
Use the previous reworks to add kvm_io_bus style MMIO handlers.
Those are not yet called by the MMIO abort handler, and the actual
VGIC emulator functions do not make use of them yet, but they will be
enabled with the following patches.
Signed-off-by: Andre Przywara
Reviewed-by: Marc Zyngier
Signed-off-by: Marc Zyngier
-
The vgic_find_range() function in vgic.c takes a struct kvm_exit_mmio
argument, but actually only uses the length field in there. Since we
need to get rid of that structure in that part of the code anyway,
let's rework the function (and its callers) to pass the length
argument to the function directly.
Signed-off-by: Andre Przywara
Reviewed-by: Christoffer Dall
Reviewed-by: Marc Zyngier
Signed-off-by: Marc Zyngier
-
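The idea behind the vgic_find_range() rework, a lookup that needs only an offset and an access length, can be sketched like this. The struct and function below are hypothetical and reduced for illustration; the real range table carries more fields (such as handler pointers) and the real function has a different signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced range descriptor; field names are illustrative. */
struct toy_io_range {
    uint64_t base;
    uint64_t len;
};

/*
 * Return the range that fully contains [offset, offset + len), or NULL.
 * The table is terminated by an entry with len == 0.
 */
static const struct toy_io_range *
toy_find_range(const struct toy_io_range *ranges, uint64_t offset,
               unsigned int len)
{
    for (const struct toy_io_range *r = ranges; r->len != 0; r++) {
        if (offset >= r->base && offset + len <= r->base + r->len)
            return r;
    }
    return NULL;
}
```

Passing the length as a plain integer keeps the lookup independent of whatever structure the caller uses to describe the access.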
The name "kvm_mmio_range" is a bit bold, given that it only covers
the VGIC's MMIO ranges. To avoid confusion with kvm_io_range, rename
it to vgic_io_range.
Signed-off-by: Andre Przywara
Acked-by: Christoffer Dall
Reviewed-by: Marc Zyngier
Signed-off-by: Marc Zyngier
-
iodev.h contains definitions for the kvm_io_bus framework. This is
needed both by the generic KVM code in virt/kvm as well as by
architecture specific code under arch/. Putting the header file in
virt/kvm and using local includes in the architecture part seems at
least dodgy to me, so let's move the file into include/kvm, so that a
more natural "#include " can be used by all of the code.
This also solves a problem later when using struct kvm_io_device
in arm_vgic.h.
Fixing up the FSF address in the GPL header and a wrong include path
on the way.
Signed-off-by: Andre Przywara
Acked-by: Christoffer Dall
Reviewed-by: Marc Zyngier
Reviewed-by: Marcelo Tosatti
Signed-off-by: Marc Zyngier
-
This is needed in e.g. ARM vGIC emulation, where the MMIO handling
depends on the VCPU that does the access.
Signed-off-by: Nikolay Nikolaev
Signed-off-by: Andre Przywara
Acked-by: Paolo Bonzini
Acked-by: Christoffer Dall
Reviewed-by: Marc Zyngier
Signed-off-by: Marc Zyngier
24 Mar, 2015
1 commit
-
A KVM guest can fail to start up with the following trace on the host:
qemu-system-x86: page allocation failure: order:4, mode:0x40d0
Call Trace:
dump_stack+0x47/0x67
warn_alloc_failed+0xee/0x150
__alloc_pages_direct_compact+0x14a/0x150
__alloc_pages_nodemask+0x776/0xb80
alloc_kmem_pages+0x3a/0x110
kmalloc_order+0x13/0x50
kmemdup+0x1b/0x40
__kvm_set_memory_region+0x24a/0x9f0 [kvm]
kvm_set_ioapic+0x130/0x130 [kvm]
kvm_set_memory_region+0x21/0x40 [kvm]
kvm_vm_ioctl+0x43f/0x750 [kvm]
The failure happens when attempting to allocate pages for
'struct kvm_memslots'. However, it does not have to be
present in physically contiguous (kmalloc-ed) address
space, so change the allocation to kvm_kvzalloc() so that
it will be vmalloc-ed when its size is more than a page.
Signed-off-by: Igor Mammedov
Signed-off-by: Marcelo Tosatti
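The size-based choice described for the memslots allocation, and what "order:4" in the trace means, can be illustrated in plain C. This is a userspace sketch under assumptions: a 4 KiB page size, and hypothetical names (`toy_pick_allocator`, `toy_order_for`) that do not exist in the kernel.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

enum toy_alloc_kind {
    TOY_ALLOC_CONTIGUOUS,  /* kmalloc-like: physically contiguous memory */
    TOY_ALLOC_VIRTUAL      /* vmalloc-like: only virtually contiguous */
};

/* Anything larger than one page goes to the vmalloc-like allocator. */
static enum toy_alloc_kind toy_pick_allocator(size_t size)
{
    return size > TOY_PAGE_SIZE ? TOY_ALLOC_VIRTUAL : TOY_ALLOC_CONTIGUOUS;
}

/* Smallest order such that (TOY_PAGE_SIZE << order) >= size. */
static unsigned int toy_order_for(size_t size)
{
    unsigned int order = 0;

    while (((size_t)TOY_PAGE_SIZE << order) < size)
        order++;
    return order;
}
```

Under these assumptions an order-4 request asks for 16 physically contiguous pages, which is hard to satisfy on a fragmented host; a vmalloc-style allocation avoids that requirement.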
19 Mar, 2015
1 commit
-
When no bits in mask are set,
kvm_arch_mmu_enable_log_dirty_pt_masked() has nothing to do. But since
it needs to be called from the generic code, it cannot be inlined, and
a few function calls, two when PML is enabled, are wasted.
Since it is common to see many pages remain clean, e.g. framebuffers can
stay calm for a long time, it is worth eliminating this overhead.
Signed-off-by: Takuya Yoshikawa
Reviewed-by: Paolo Bonzini
Signed-off-by: Marcelo Tosatti
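The optimisation amounts to testing the mask before making the out-of-line call. A minimal sketch, assuming a dirty bitmap processed one word at a time; `toy_arch_enable_log_dirty` is a hypothetical stand-in for the architecture hook and simply counts its invocations:

```c
#include <stddef.h>

/* Counts calls so the saving is observable; stands in for the arch hook. */
static unsigned long toy_arch_calls;

static void toy_arch_enable_log_dirty(size_t word_index, unsigned long mask)
{
    (void)word_index;
    (void)mask;
    toy_arch_calls++;
}

/* Walk a dirty bitmap one word at a time; skip words with no bits set. */
static void toy_process_dirty_bitmap(const unsigned long *bitmap, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        unsigned long mask = bitmap[i];

        if (!mask)
            continue;   /* nothing dirty in this word: avoid the call */
        toy_arch_enable_log_dirty(i, mask);
    }
}
```

For a mostly clean bitmap the number of calls drops from one per word to one per dirty word.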
17 Mar, 2015
1 commit
-
Fixes for KVM/ARM for 4.0-rc5.
Fixes page refcounting issues in our Stage-2 page table management code,
fixes a missing unlock in a gicv3 error path, and fixes a race that can
cause lost interrupts if signals are pending just prior to entering the
guest.
14 Mar, 2015
4 commits
-
When a VCPU is no longer running, we currently check to see if it has a
timer scheduled in the future, and if it does, we schedule a host
hrtimer to notify us in case the timer expires while the VCPU is still
not running. When the hrtimer fires, we mask the guest's timer and
inject the timer IRQ (still relying on the guest unmasking the timer when
it receives the IRQ).
This is all good and fine, but when migrating a VM (checkpoint/restore)
this introduces a race. It is unlikely, but possible, for the following
sequence of events to happen:
1. Userspace stops the VM
2. Hrtimer for VCPU is scheduled
3. Userspace checkpoints the VGIC state (no pending timer interrupts)
4. The hrtimer fires, schedules work in a workqueue
5. Workqueue function runs, masks the timer and injects timer interrupt
6. Userspace checkpoints the timer state (timer masked)
At restore time, you end up with a masked timer without any timer
interrupts and your guest halts, never receiving timer interrupts.
Fix this by only kicking the VCPU in the workqueue function, and sample
the expired state of the timer when entering the guest again and inject
the interrupt and mask the timer only then.
Signed-off-by: Christoffer Dall
Signed-off-by: Alex Bennée
Signed-off-by: Christoffer Dall
-
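The shape of the timer fix (the deferred work only kicks the VCPU, while masking and injection happen together on guest entry) can be modelled with a toy state machine. Everything below is a hypothetical simplification for illustration; none of the types or function names are taken from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_timer {
    uint64_t cval;      /* compare value: timer expires when now >= cval */
    bool enabled;
    bool masked;
};

struct toy_vcpu {
    struct toy_timer timer;
    bool kicked;        /* set by the deferred work, nothing else */
    bool irq_pending;   /* stands in for the VGIC pending state */
};

/* Deferred work after the host hrtimer fires: only wake the VCPU. */
static void toy_timer_work(struct toy_vcpu *vcpu)
{
    vcpu->kicked = true;
}

static bool toy_timer_expired(const struct toy_timer *t, uint64_t now)
{
    return t->enabled && !t->masked && now >= t->cval;
}

/* On guest entry: sample the timer, then mask and inject in one place. */
static void toy_flush_on_entry(struct toy_vcpu *vcpu, uint64_t now)
{
    if (toy_timer_expired(&vcpu->timer, now)) {
        vcpu->timer.masked = true;
        vcpu->irq_pending = true;
    }
    vcpu->kicked = false;
}
```

In this model the timer state and the interrupt state only change together, so a checkpoint taken between the hrtimer firing and the next guest entry still sees an unmasked timer with no pending interrupt, a consistent pair.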
Migrating active interrupts causes the active state to be lost
completely. This implements some additional bitmaps to track the active
state on the distributor and export this to user space.
Signed-off-by: Christoffer Dall
Signed-off-by: Alex Bennée
Signed-off-by: Christoffer Dall
-
This helps re-factor away some of the repetitive code and makes the code
flow more nicely.
Signed-off-by: Alex Bennée
Signed-off-by: Christoffer Dall
-
There is an interesting bug in the vgic code, which manifests itself
when the KVM run loop has a signal pending or needs a vmid generation
rollover after having disabled interrupts but before actually switching
to the guest.
In this case, we flush the vgic as usual, but we sync back the vgic
state and exit to userspace before entering the guest. The consequence
is that we will be syncing the list registers back to the software model
using the GICH_ELRSR and GICH_EISR from the last execution of the guest,
potentially overwriting a list register containing an interrupt.
This showed up during migration testing where we would capture a state
where the VM has masked the arch timer but there were no interrupts,
resulting in a hung test.
Cc: Marc Zyngier
Reported-by: Alex Bennee
Signed-off-by: Christoffer Dall
Signed-off-by: Alex Bennée
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall
13 Mar, 2015
1 commit
-
Add the missing unlock before return from function kvm_vgic_create()
in the error handling case.
Signed-off-by: Wei Yongjun
Signed-off-by: Christoffer Dall
12 Mar, 2015
3 commits
-
This patch enables irqfd on arm/arm64.
Both irqfd and resamplefd are supported. Injection is implemented
in vgic.c without routing.
This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.
KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability
automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set.
Irqfd injection is restricted to SPI. The rationale behind not
supporting PPI irqfd injection is that any device using a PPI would
be a private-to-the-CPU device (timer for instance), so its state
would have to be context-switched along with the VCPU and would
require in-kernel wiring anyhow. It is not a relevant use case for
irqfds.
Signed-off-by: Eric Auger
Reviewed-by: Christoffer Dall
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall
-
To prepare for irqfd addition, coarse grain locking is removed at
kvm_vgic_sync_hwstate level and finer grain locking is introduced in
vgic_process_maintenance only.
Signed-off-by: Eric Auger
Acked-by: Christoffer Dall
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall
-
Introduce the __KVM_HAVE_ARCH_INTC_INITIALIZED define and the
associated kvm_arch_intc_initialized function. The latter
allows testing whether the virtual interrupt controller is initialized
and ready to accept virtual IRQ injection. On some architectures,
the virtual interrupt controller is dynamically instantiated, justifying
that kind of check.
The new function can now be used by irqfd to check whether the
virtual interrupt controller is ready on KVM_IRQFD request. If not,
KVM_IRQFD returns -EAGAIN.
Signed-off-by: Eric Auger
Acked-by: Christoffer Dall
Reviewed-by: Andre Przywara
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall
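The readiness check described for KVM_IRQFD can be sketched as a guard at the top of the assignment path. The names `struct toy_vm`, `toy_arch_intc_initialized` and `toy_irqfd_assign` are hypothetical and only model the behaviour stated in the commit message: return -EAGAIN while the virtual interrupt controller is not yet initialized.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical VM state; the real structure is far larger. */
struct toy_vm {
    bool intc_initialized;
};

/* Stand-in for the per-architecture readiness test. */
static bool toy_arch_intc_initialized(const struct toy_vm *vm)
{
    return vm->intc_initialized;
}

/* Refuse to wire up an irqfd until the interrupt controller is ready. */
static int toy_irqfd_assign(struct toy_vm *vm)
{
    if (!toy_arch_intc_initialized(vm))
        return -EAGAIN;
    /* Real code would set up the eventfd wiring here. */
    return 0;
}
```

A caller that sees -EAGAIN can finish creating the interrupt controller and retry the request.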
11 Mar, 2015
2 commits
-
Several dts only list "arm,cortex-a7-gic" or "arm,gic-400" in their GIC
compatible list, and while this is correct (and supported by the GIC
driver), KVM will fail to detect that it can support these cases.
This patch adds the missing strings to the VGIC code. The of_device_id
entries are padded to keep the probe function data aligned.
Signed-off-by: Mark Rutland
Cc: Andre Przywara
Cc: Christoffer Dall
Cc: Marc Zyngier
Cc: Michal Simek
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall
-
POWER supports irqfds but forgot to advertise them. Some userspace does
not check for the capability, but others check it, and thus they work on
x86 and s390 but not POWER.
To keep other architectures from making the same mistake in the future, let
common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE.
Reported-and-tested-by: Greg Kurz
Cc: stable@vger.kernel.org
Fixes: 297e21053a52f060944e9f0de4c64fad9bcd72fc
Signed-off-by: Paolo Bonzini
Signed-off-by: Marcelo Tosatti
10 Mar, 2015
7 commits
-
WARNING: Prefer [subsystem eg: netdev]_info([subsystem]dev, ... then
dev_info(dev, ... then pr_info(... to printk(KERN_INFO ...
+ printk(KERN_INFO "kvm: exiting hardware virtualization\n");
WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then
dev_err(dev, ... then pr_err(... to printk(KERN_ERR ...
+ printk(KERN_ERR "kvm: misc device register failed\n");
Signed-off-by: Xiubo Li
Signed-off-by: Marcelo Tosatti
-
ERROR: code indent should use tabs where possible
+ const struct kvm_io_range *r2)$
WARNING: please, no spaces at the start of a line
+ const struct kvm_io_range *r2)$
This patch fixes this ERROR & WARNING to reduce noise when checking new
patches in kvm_main.c.
Signed-off-by: Xiubo Li
Signed-off-by: Marcelo Tosatti
-
WARNING: please, no space before tabs
+ * ^I^Ikvm->lock --> kvm->slots_lock --> kvm->irq_lock$
WARNING: please, no space before tabs
+^I^I * ^I- gfn_to_hva (kvm_read_guest, gfn_to_pfn)$
WARNING: please, no space before tabs
+^I^I * ^I- kvm_is_visible_gfn (mmu_check_roots)$
This patch fixes these warnings to reduce noise when checking new
patches in kvm_main.c.
Signed-off-by: Xiubo Li
Signed-off-by: Marcelo Tosatti
-
There are many Warnings like this:
WARNING: Missing a blank line after declarations
+ struct kvm_coalesced_mmio_zone zone;
+ r = -EFAULT;
This patch fixes these warnings to reduce noise when checking new
patches in kvm_main.c.
Signed-off-by: Xiubo Li
Signed-off-by: Marcelo Tosatti
-
WARNING: EXPORT_SYMBOL(foo); should immediately follow its
function/variable
+EXPORT_SYMBOL_GPL(gfn_to_page);
This patch fixes these warnings to reduce noise when checking new
patches in kvm_main.c.
Signed-off-by: Xiubo Li
Signed-off-by: Marcelo Tosatti
-
ERROR: do not initialise statics to 0 or NULL
+static int kvm_usage_count = 0;
kvm_usage_count is placed in the .bss segment at link time, so
there is no need to set it to 0 here.
This patch fixes this ERROR to reduce noise when checking new patches
in kvm_main.c.
Signed-off-by: Xiubo Li
Signed-off-by: Marcelo Tosatti
-
WARNING: labels should not be indented
+ out_free_irq_routing:
This patch fixes this WARNING to reduce noise when checking new patches
in kvm_main.c.
Signed-off-by: Xiubo Li
Signed-off-by: Marcelo Tosatti