Eric Lee / smarc-fsl-linux-kernel

06 Nov, 2015

1 commit

933425fb0 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
"First batch of KVM changes for 4.4.

s390:
A bunch of fixes and optimizations for interrupt and time handling.

PPC:
Mostly bug fixes.

ARM:
No big features, but many small fixes and prerequisites including:

- a number of fixes for the arch-timer

- introducing proper level-triggered semantics for the arch-timers

- a series of patches to synchronously halt a guest (prerequisite
for IRQ forwarding)

- some tracepoint improvements

- a tweak for the EL2 panic handlers

- some more VGIC cleanups getting rid of redundant state

x86:
Quite a few changes:

- support for VT-d posted interrupts (i.e. PCI devices can inject
interrupts directly into vCPUs). This introduces a new
component (in virt/lib/) that connects VFIO and KVM together.
The same infrastructure will be used for ARM interrupt
forwarding as well.

- more Hyper-V features, though the main one, the Hyper-V synthetic
interrupt controller, will have to wait for 4.5. These will let
KVM expose Hyper-V devices.

- nested virtualization now supports VPID (same as PCID but for
vCPUs) which makes it quite a bit faster

- for future hardware that supports NVDIMM, there is support for
clflushopt, clwb, pcommit

- support for "split irqchip", i.e. LAPIC in kernel +
IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
the hypervisor

- obligatory smattering of SMM fixes

- on the guest side, stable scheduler clock support was rewritten
to not require help from the hypervisor"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
KVM: VMX: Fix commit which broke PML
KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
KVM: x86: allow RSM from 64-bit mode
KVM: VMX: fix SMEP and SMAP without EPT
KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
KVM: device assignment: remove pointless #ifdefs
KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
KVM: x86: zero apic_arb_prio on reset
drivers/hv: share Hyper-V SynIC constants with userspace
KVM: x86: handle SMBASE as physical address in RSM
KVM: x86: add read_phys to x86_emulate_ops
KVM: x86: removing unused variable
KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
KVM: arm/arm64: Optimize away redundant LR tracking
KVM: s390: use simple switch statement as multiplexer
KVM: s390: drop useless newline in debugging data
KVM: s390: SCA must not cross page boundaries
KVM: arm: Do not indent the arguments of DECLARE_BITMAP
...

Linus Torvalds
2015-11-06 08:26:26 +0800

04 Nov, 2015

7 commits

b97e6de9c KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic

We do not want to do too much work in atomic context, in particular
not walking all the VCPUs of the virtual machine. So we want
to distinguish the architecture-specific injection function for irqfd
from kvm_set_msi. Since it's still empty, reuse the newly added
kvm_arch_set_irq and rename it to kvm_arch_set_irq_inatomic.

Reviewed-by: Radim Krčmář
Signed-off-by: Paolo Bonzini

Paolo Bonzini
2015-11-04 23:24:35 +0800
6956d8946 KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs

The symbol was missing a KVM dependency.

Signed-off-by: Jan Beulich
Signed-off-by: Paolo Bonzini

Jan Beulich
2015-11-04 23:24:30 +0800
197a4f4b0 Merge tag 'kvm-arm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Changes for v4.4-rc1

Includes a number of fixes for the arch-timer, introducing proper
level-triggered semantics for the arch-timers, a series of patches to
synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
getting rid of redundant state, and finally a stylistic change that gets rid of
some ctags warnings.

Conflicts:
arch/x86/include/asm/kvm_host.h

Paolo Bonzini
2015-11-04 23:24:17 +0800
26caea769 KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()

Now vgic_set_lr() and vgic_sync_lr_elrsr() are always used
together. Merge them into one function, saving a second vgic_ops
dereference every time.

Signed-off-by: Pavel Fedin
Signed-off-by: Christoffer Dall

Pavel Fedin
2015-11-04 22:29:49 +0800
212c76545 KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings

1. Remove the unnecessary 'irq' argument, because the irq number can be retrieved
from the LR.
2. Since cff9211eb1a1f58ce7f5a2d596b617928fd4be0e
("arm/arm64: KVM: Fix arch timer behavior for disabled interrupts")
LR_STATE_PENDING is queued back by vgic_retire_lr() itself. Also, it
clears vlr.state itself. Therefore, we remove the same, now duplicated,
check with all accompanying bit manipulations from vgic_unqueue_irqs().
3. vgic_retire_lr() is always accompanied by vgic_irq_clear_queued(). Since
it already does more than just clearing the LR, move
vgic_irq_clear_queued() inside of it.

Signed-off-by: Pavel Fedin
Signed-off-by: Christoffer Dall

Pavel Fedin
2015-11-04 22:29:49 +0800
c4cd4c168 KVM: arm/arm64: Optimize away redundant LR tracking

Currently we use vgic_irq_lr_map in order to track which LRs hold which
IRQs, and lr_used bitmap in order to track which LRs are used or free.

vgic_irq_lr_map is actually used only for the piggy-back optimization, and
can easily be replaced by iteration over lr_used. This is good because in
the future, when LPI support is introduced, the number of IRQs will grow to at
least 16384, while numbers from 1024 to 8191 are never going to be used.
This would be a huge memory waste.

In its turn, lr_used is also completely redundant since
ae705930fca6322600690df9dc1c7d0516145a93 ("arm/arm64: KVM: Keep elrsr/aisr
in sync with software model"), because together with lr_used we also update
elrsr. This allows us to easily replace lr_used with elrsr, inverting all
conditions (because in elrsr '1' means 'free').
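A minimal user-space sketch of the idea, assuming a hypothetical 4-LR CPU interface and a plain uint64_t standing in for the ELRSR value: a free LR is found by testing for a set bit, and the per-IRQ map is replaced by scanning only the in-use LRs. The helper names (find_free_lr, find_lr_for_irq, and so on) are illustrative, not the vgic's actual functions.

```c
#include <stdint.h>

#define NR_LRS 4  /* hypothetical: number of implemented list registers */

/* In ELRSR a set bit means the LR is empty (free), the opposite sense
 * of the old lr_used bitmap. */
static int find_free_lr(uint64_t elrsr)
{
    for (int lr = 0; lr < NR_LRS; lr++)
        if (elrsr & (UINT64_C(1) << lr))
            return lr;
    return -1;                              /* all LRs in use */
}

static uint64_t mark_lr_used(uint64_t elrsr, int lr)
{
    return elrsr & ~(UINT64_C(1) << lr);    /* clear the "free" bit */
}

static uint64_t mark_lr_free(uint64_t elrsr, int lr)
{
    return elrsr | (UINT64_C(1) << lr);     /* set the "free" bit */
}

/* Replaces a per-IRQ irq->LR map: scan only the in-use LRs and compare
 * the IRQ number each one holds. */
static int find_lr_for_irq(uint64_t elrsr, const int lr_irq[NR_LRS], int irq)
{
    for (int lr = 0; lr < NR_LRS; lr++)
        if (!(elrsr & (UINT64_C(1) << lr)) && lr_irq[lr] == irq)
            return lr;
    return -1;
}
```

The memory used now scales with the number of LRs (a handful) rather than with the number of IRQs, which is the point of the change.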

Signed-off-by: Pavel Fedin
Signed-off-by: Christoffer Dall

Pavel Fedin
2015-11-04 22:29:49 +0800
6aa2fdb87 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
"The irq department delivers:

- Rework the irqdomain core infrastructure to accommodate ACPI based
systems. This is required to support ARM64 without creating
artificial device tree nodes.

- Sanitize the ACPI based ARM GIC initialization by making use of the
new firmware independent irqdomain core

- Further improvements to the generic MSI management

- Generalize the irq migration on CPU hotplug

- Improvements to the threaded interrupt infrastructure

- Allow the migration of "chained" low level interrupt handlers

- Allow optional force masking of interrupts in disable_irq[_nosync]

- Support for two new interrupt chips - Sigh!

- A larger set of errata fixes for ARM gicv3

- The usual pile of fixes, updates, improvements and cleanups all
over the place"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
Document that IRQ_NONE should be returned when IRQ not actually handled
PCI/MSI: Allow the MSI domain to be device-specific
PCI: Add per-device MSI domain hook
of/irq: Use the msi-map property to provide device-specific MSI domain
of/irq: Split of_msi_map_rid to reuse msi-map lookup
irqchip/gic-v3-its: Parse new version of msi-parent property
PCI/MSI: Use of_msi_get_domain instead of open-coded "msi-parent" parsing
of/irq: Use of_msi_get_domain instead of open-coded "msi-parent" parsing
of/irq: Add support code for multi-parent version of "msi-parent"
irqchip/gic-v3-its: Add handling of PCI requester id.
PCI/MSI: Add helper function pci_msi_domain_get_msi_rid().
of/irq: Add new function of_msi_map_rid()
Docs: dt: Add PCI MSI map bindings
irqchip/gic-v2m: Add support for multiple MSI frames
irqchip/gic-v3: Fix translation of LPIs after conversion to irq_fwspec
irqchip/mxs: Add Alphascale ASM9260 support
irqchip/mxs: Prepare driver for hardware with different offsets
irqchip/mxs: Panic if ioremap or domain creation fails
irqdomain: Documentation updates
irqdomain/msi: Use fwnode instead of of_node
...

Linus Torvalds
2015-11-04 06:40:01 +0800

23 Oct, 2015

8 commits

e21f09108 arm/arm64: KVM: Add tracepoints for vgic and timer

The VGIC and timer code for KVM arm/arm64 doesn't have any tracepoints
or tracepoint infrastructure defined. Rewriting some of the timer code
handling showed me how much we need this, so let's add these simple
trace points once and for all and we can easily expand with additional
trace points in these files as we go along.

Cc: Wei Huang
Signed-off-by: Christoffer Dall

Christoffer Dall
2015-10-23 05:01:48 +0800
8fe2f19e6 arm/arm64: KVM: Support edge-triggered forwarded interrupts

We mark edge-triggered interrupts with the HW bit set as queued to
prevent the VGIC code from injecting LRs with both the Active and
Pending bits set at the same time while also setting the HW bit,
because the hardware does not support this.

However, this means that we must also clear the queued flag when we sync
back an LR where the state on the physical distributor went from active
to inactive because the guest deactivated the interrupt. At this point
we must also check if the interrupt is pending on the distributor, and
tell the VGIC to queue it again if it is.

Since these actions on the sync path are extremely close to those for
level-triggered interrupts, rename process_level_irq to
process_queued_irq, allowing it to cater for both cases.

Signed-off-by: Christoffer Dall

Christoffer Dall
2015-10-23 05:01:44 +0800
4b4b4512d arm/arm64: KVM: Rework the arch timer to use level-triggered semantics

The arch timer currently uses edge-triggered semantics in the sense that
the line is never sampled by the vgic and lowering the line from the
timer to the vgic doesn't have any effect on the pending state of
virtual interrupts in the vgic. This means that we do not support a
guest with the otherwise valid behavior of (1) disable interrupts (2)
enable the timer (3) disable the timer (4) enable interrupts. Such a
guest would validly not expect to see any interrupts on real hardware,
but will see interrupts on KVM.

This patch fixes this shortcoming through the following series of
changes.

First, we change the flow of the timer/vgic sync/flush operations. Now
the timer is always flushed/synced before the vgic, because the vgic
samples the state of the timer output. This has the implication that we
move the timer operations into non-preemptible sections, but that is
fine after the previous commit getting rid of hrtimer schedules on every
entry/exit.

Second, we change the internal behavior of the timer, letting the timer
keep track of its previous output state, and only lower/raise the line
to the vgic when the state changes. Note that in theory this could have
been accomplished more simply by signalling the vgic every time the
state *potentially* changed, but we don't want to be hitting the vgic
more often than necessary.

Third, we get rid of the use of the map->active field in the vgic and
instead simply set the interrupt as active on the physical distributor
whenever the input to the GIC is asserted and conversely clear the
physical active state when the input to the GIC is deasserted.

Fourth, and finally, we now initialize the timer PPIs (and all the other
unused PPIs for now), to be level-triggered, and modify the sync code to
sample the line state on HW sync and re-inject a new interrupt if it is
still pending at that time.
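The second change above (signal the vgic only when the timer output actually changes) can be sketched as a small state machine. This is a hypothetical model: timer_model, vgic_set_line and timer_update_output are illustrative names, and the call counter exists only to make the behaviour observable.

```c
#include <stdbool.h>

struct timer_model {
    bool irq_level;   /* level last signalled to the vgic */
    int vgic_calls;   /* how many times we "hit" the vgic */
};

/* Stand-in for the real vgic injection call. */
static void vgic_set_line(struct timer_model *t, bool level)
{
    t->irq_level = level;
    t->vgic_calls++;
}

/* Called on every flush/sync with the freshly computed timer output;
 * only a transition is forwarded, so the vgic is not hit needlessly. */
static void timer_update_output(struct timer_model *t, bool should_fire)
{
    if (should_fire != t->irq_level)
        vgic_set_line(t, should_fire);
}
```

Signalling on every *potential* change would also be correct, as the commit notes; the remembered level simply avoids redundant vgic work.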

Signed-off-by: Christoffer Dall

Christoffer Dall
2015-10-23 05:01:44 +0800
54723bb37 arm/arm64: KVM: Use appropriate define in VGIC reset code

We currently initialize the SGIs to be enabled in the VGIC code, but we
use the VGIC_NR_PPIS define for this purpose, instead of the more
natural VGIC_NR_SGIS. Change this slightly confusing use of the
defines.

Note: This should have no functional change, as both names are defined
to the number 16.

Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall

Christoffer Dall
2015-10-23 05:01:43 +0800
8bf9a701e arm/arm64: KVM: Implement GICD_ICFGR as RO for PPIs

The GICD_ICFGR allows the bits for the SGIs and PPIs to be read only.
We currently simulate this behavior by writing a hardcoded value to the
register for the SGIs and PPIs on every write of these bits to the
register (ignoring what the guest actually wrote), and by writing the
same value as the reset value to the register.

This is a bit counter-intuitive, as the register is RO for these bits,
and we can just implement it that way, allowing us to control the value
of the bits purely in the reset code.
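A sketch of treating the SGI/PPI configuration bits as read-only, write-ignored. It assumes the GIC layout of two configuration bits per interrupt (so GICD_ICFGR0 and GICD_ICFGR1 cover INTIDs 0-31) and a hypothetical distributor with 128 interrupts; the reset values are illustrative, with SGIs edge-triggered and PPIs level-triggered as described in the timer rework commit in this log.

```c
#include <stdint.h>

#define NR_PRIVATE_IRQS 32  /* 16 SGIs + 16 PPIs */
#define IRQS_PER_ICFGR  16  /* 2 config bits per interrupt, 32-bit regs */
#define NR_ICFGR        8   /* hypothetical: 128 interrupts modelled */

struct dist_model {
    uint32_t icfgr[NR_ICFGR];
};

/* The reset code is the only place that decides the private bits. */
static void dist_reset(struct dist_model *d)
{
    for (unsigned int i = 0; i < NR_ICFGR; i++)
        d->icfgr[i] = 0;                 /* 0b00 per IRQ: level-triggered */
    d->icfgr[0] = 0xAAAAAAAAu;           /* 0b10 per SGI: edge-triggered */
}

/* Guest write handler: registers covering INTIDs 0..31 are RO/WI. */
static void icfgr_write(struct dist_model *d, unsigned int reg, uint32_t val)
{
    if (reg * IRQS_PER_ICFGR < NR_PRIVATE_IRQS)
        return;                          /* ignore the guest's value */
    if (reg < NR_ICFGR)
        d->icfgr[reg] = val;
}
```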

Reviewed-by: Marc Zyngier
Signed-off-by: Christoffer Dall

Christoffer Dall
2015-10-23 05:01:42 +0800
9103617df arm/arm64: KVM: vgic: Factor out level irq processing on guest exit

Currently vgic_process_maintenance() handles a completed
level-triggered interrupt directly, but we are soon going to reuse this
logic for level-triggered mapped interrupts with the HW bit set, so
move this logic into a separate static function.

Probably the most scary part of this commit is convincing yourself that
the current flow is safe compared to the old one. In the following I
try to list the changes and why they are harmless:

Move vgic_irq_clear_queued after kvm_notify_acked_irq:
Harmless because the only potential effect of clearing the queued
flag wrt. kvm_set_irq is that vgic_update_irq_pending does not set
the pending bit on the emulated CPU interface or in the
pending_on_cpu bitmask if the function is called with level=1.
However, the point of kvm_notify_acked_irq is to call kvm_set_irq
with level=0, and we set the queued flag again in
__kvm_vgic_sync_hwstate later on if the level is still high.

Move vgic_set_lr before kvm_notify_acked_irq:
Also harmless, because the LRs are CPU-local operations and
kvm_notify_acked_irq only affects the distributor.

Move vgic_dist_irq_clear_soft_pend after kvm_notify_acked_irq:
Also harmless, because now we check the level state in the
clear_soft_pend function and lower the pending bits if the level is
low.

Reviewed-by: Eric Auger
Reviewed-by: Marc Zyngier
Signed-off-by: Christoffer Dall

Christoffer Dall
2015-10-23 05:01:42 +0800
d35268da6 arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block

We currently schedule a soft timer every time we exit the guest if the
timer did not expire while running the guest. This is really not
necessary, because the only work we do in the timer work function is to
kick the vcpu.

Kicking the vcpu does two things:
(1) If the vcpu thread is on a waitqueue, make it runnable and remove it
from the waitqueue.
(2) If the vcpu is running on a different physical CPU from the one
doing the kick, it sends a reschedule IPI.

The second case cannot happen, because the soft timer is only ever
scheduled when the vcpu is not running. The first case is only relevant
when the vcpu thread is on a waitqueue, which is only the case when the
vcpu thread has called kvm_vcpu_block().

Therefore, we only need to make sure a timer is scheduled for
kvm_vcpu_block(), which we do by encapsulating all calls to
kvm_vcpu_block() with kvm_timer_{un}schedule calls.

Additionally, we only schedule a soft timer if the timer is enabled and
unmasked, since it is useless otherwise.

Note that theoretically userspace can use the SET_ONE_REG interface to
change registers that should cause the timer to fire, even if the vcpu
is blocked without a scheduled timer, but this case was not supported
before this patch and we leave it for future work for now.
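A sketch of the "only if enabled and unmasked" check that gates arming the soft timer around kvm_vcpu_block(). The ENABLE (bit 0) and IMASK (bit 1) positions follow the Arm generic timer's CNTV_CTL layout; soft_timer, timer_schedule and timer_unschedule are hypothetical stand-ins for kvm_timer_schedule()/kvm_timer_unschedule(), not their real signatures.

```c
#include <stdbool.h>
#include <stdint.h>

#define CTL_ENABLE (1u << 0)   /* CNTV_CTL.ENABLE */
#define CTL_IMASK  (1u << 1)   /* CNTV_CTL.IMASK */

struct soft_timer {
    bool armed;
    uint64_t expires_ns;       /* relative expiry of the host timer */
};

/* Only an enabled, unmasked guest timer can ever wake the vcpu. */
static bool timer_irq_can_fire(uint32_t cntv_ctl)
{
    return (cntv_ctl & CTL_ENABLE) && !(cntv_ctl & CTL_IMASK);
}

/* Counter ticks until cval, converted to ns without overflowing the
 * intermediate product; freq_hz must be non-zero. */
static uint64_t ns_until_expiry(uint64_t cval, uint64_t now, uint64_t freq_hz)
{
    if (cval <= now)
        return 0;
    uint64_t delta = cval - now;
    return delta / freq_hz * 1000000000ull +
           (delta % freq_hz) * 1000000000ull / freq_hz;
}

/* Called right before blocking the vcpu. */
static void timer_schedule(struct soft_timer *st, uint32_t ctl,
                           uint64_t cval, uint64_t now, uint64_t freq_hz)
{
    if (!timer_irq_can_fire(ctl))
        return;                /* disabled or masked: nothing to arm */
    st->armed = true;
    st->expires_ns = ns_until_expiry(cval, now, freq_hz);
}

/* Called right after the vcpu comes back from blocking. */
static void timer_unschedule(struct soft_timer *st)
{
    st->armed = false;
}
```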

Signed-off-by: Christoffer Dall

Christoffer Dall
2015-10-23 05:01:42 +0800
3217f7c25 KVM: Add kvm_arch_vcpu_{un}blocking callbacks

Sometimes it is useful for architecture implementations of KVM to know
when the VCPU thread is about to block or when it comes back from
blocking (arm/arm64 needs to know this to properly implement timers, for
example).

Therefore provide a generic architecture callback function in line with
what we do elsewhere for KVM generic-arch interactions.

Reviewed-by: Marc Zyngier
Signed-off-by: Christoffer Dall

Christoffer Dall
2015-10-23 05:01:41 +0800

21 Oct, 2015

4 commits

0d997491f arm/arm64: KVM: Fix disabled distributor operation

We currently do a single update of the vgic state when the distributor
enable/disable control register is accessed and then bypass updating the
state for as long as the distributor remains disabled.

This is incorrect, because updating the state does not consider the
distributor enable bit, and thus you can end up in a situation where an
interrupt is marked as pending on the CPU interface, but not pending on
the distributor, which is an impossible state to be in, and triggers a
warning. Consider for example the following sequence of events:

1. An interrupt is marked as pending on the distributor
- the interrupt is also forwarded to the CPU interface
2. The guest turns off the distributor (it's about to do a reboot)
- we stop updating the CPU interface state from now on
3. The guest disables the pending interrupt
- we remove the pending state from the distributor, but don't touch
the CPU interface, see point 2.

Since the distributor disable bit really means that no interrupts should
be forwarded to the CPU interface, we modify the code to keep updating
the internal VGIC state, but always set the CPU interface pending bits
to zero when the distributor is disabled.
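The fix can be modelled as "always recompute, but gate the CPU interface on the distributor enable bit". The sketch below is hypothetical: a single 32-bit word per field stands in for the vgic's bitmaps, and state_consistent() expresses the invariant whose violation triggered the warning.

```c
#include <stdbool.h>
#include <stdint.h>

struct vgic_model {
    bool dist_enabled;     /* distributor enable bit */
    uint32_t pending;      /* pending on the (emulated) distributor */
    uint32_t enabled;      /* per-interrupt enable bits */
    uint32_t pend_on_cpu;  /* what the CPU interface is offered */
};

/* Keep updating the internal state on every change; the distributor
 * enable bit only decides whether anything reaches the CPU interface. */
static void vgic_update_state(struct vgic_model *v)
{
    v->pend_on_cpu = v->dist_enabled ? (v->pending & v->enabled) : 0;
}

/* Invariant: nothing may be pending on the CPU interface unless it is
 * also pending on the distributor. */
static bool state_consistent(const struct vgic_model *v)
{
    return (v->pend_on_cpu & ~v->pending) == 0;
}
```

Replaying steps 1-3 from the commit message against this model leaves pend_on_cpu at zero after the distributor is disabled, so the impossible state cannot arise.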

Signed-off-by: Christoffer Dall

Christoffer Dall
2015-10-21 00:09:13 +0800
544c572e0 arm/arm64: KVM: Clear map->active on pend/active clear

When a guest reboots or offlines/onlines CPUs, it is not uncommon for it
to clear the pending and active states of an interrupt through the
emulated VGIC distributor. However, since the architected timers are
defined by the architecture to be level triggered and the guest
rightfully expects them to be that, but we emulate them as
edge-triggered, we have to mimic level-triggered behavior for an
edge-triggered virtual implementation.

We currently do not signal the VGIC when the map->active field is true,
because it indicates that the guest has already been signalled of the
interrupt as required. Normally this field is set to false when the
guest deactivates the virtual interrupt through the sync path.

We also need to catch the case where the guest deactivates the interrupt
through the emulated distributor, again allowing guests to boot even if
the original virtual timer signal hit before the guest's GIC
initialization sequence is run.

Reviewed-by: Eric Auger
Signed-off-by: Christoffer Dall

Christoffer Dall
2015-10-21 00:06:34 +0800
cff9211eb arm/arm64: KVM: Fix arch timer behavior for disabled interrupts

We have an interesting issue when the guest disables the timer interrupt
on the VGIC, which happens when turning VCPUs off using PSCI, for
example.

The problem is that because the guest disables the virtual interrupt at
the VGIC level, we never inject interrupts to the guest and therefore
never mark the interrupt as active on the physical distributor. The
host also never takes the timer interrupt (we only use the timer device
to trigger a guest exit and everything else is done in software), so the
interrupt does not become active through normal means.

The result is that we keep entering the guest with a programmed timer
that will always fire as soon as we context switch the hardware timer
state and run the guest, preventing forward progress for the VCPU.

Since the active state on the physical distributor is really part of the
timer logic, it is the job of our virtual arch timer driver to manage
this state.

The timer->map->active boolean field indicates whether we have signalled
this interrupt to the vgic and if that interrupt is still pending or
active. As long as that is the case, the hardware doesn't have to
generate physical interrupts and therefore we mark the interrupt as
active on the physical distributor.

We also have to restore the pending state of an interrupt that was
queued to an LR but was retired from the LR for some reason, while
remaining pending in the LR.

Cc: Marc Zyngier
Reported-by: Lorenzo Pieralisi
Signed-off-by: Christoffer Dall

Christoffer Dall
2015-10-21 00:04:54 +0800
437f9963b KVM: arm/arm64: Do not inject spurious interrupts

When lowering a level-triggered line from userspace, we forgot to lower
the pending bit on the emulated CPU interface and we also did not
re-compute the pending_on_cpu bitmap for the CPU affected by the change.

Update vgic_update_irq_pending() to fix the two issues above and also
raise a warning in vgic_queue_irq_to_lr if we encounter an interrupt
pending on a CPU which is neither marked active nor pending.

[ Commit text reworked completely - Christoffer ]

Signed-off-by: Pavel Fedin
Signed-off-by: Christoffer Dall

Pavel Fedin
2015-10-21 00:04:43 +0800

16 Oct, 2015

4 commits

f33143d80 kvm/irqchip: allow only multiple irqchip routes per GSI

Any other irq routing types (MSI, S390_ADAPTER, upcoming Hyper-V
SynIC) map one-to-one to GSI.
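A hypothetical validation routine expressing the rule: a GSI may carry several routes only when every route on it is an irqchip route. The enum and struct are illustrative; KVM's actual check lives in its routing setup code and also enforces constraints not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative route kinds named after the commit text. */
enum route_type { ROUTE_IRQCHIP, ROUTE_MSI, ROUTE_S390_ADAPTER };

struct route {
    unsigned int gsi;
    enum route_type type;
};

/* Reject any pair of routes sharing a GSI unless both are irqchip. */
static bool routes_valid(const struct route *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (r[i].gsi == r[j].gsi &&
                (r[i].type != ROUTE_IRQCHIP || r[j].type != ROUTE_IRQCHIP))
                return false;
    return true;
}
```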

Signed-off-by: Andrey Smetanin
Reviewed-by: Roman Kagan
Signed-off-by: Denis V. Lunev
CC: Vitaly Kuznetsov
CC: "K. Y. Srinivasan"
CC: Gleb Natapov
CC: Paolo Bonzini
Signed-off-by: Paolo Bonzini

Andrey Smetanin
2015-10-16 16:34:30 +0800
c9a5eccac kvm/eventfd: add arch-specific set_irq

Allow for arch-specific interrupt types to be set. For that, add
kvm_arch_set_irq() which takes interrupt type-specific action if it
recognizes the interrupt type given, and -EWOULDBLOCK otherwise.

The default implementation always returns -EWOULDBLOCK.
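The contract is "handle it now or return -EWOULDBLOCK", and the caller then falls back to deferred injection, which in the irqfd code means schedule_work(&irqfd->inject). The sketch below models both the default hook and a hypothetical arch override; enum irq_kind and the counters are illustrative only.

```c
#include <errno.h>

/* Hypothetical interrupt kinds: one an arch can inject directly, one not. */
enum irq_kind { KIND_ARCH_FAST, KIND_OTHER };

struct inject_stats {
    int atomic_done;   /* handled immediately by the arch hook */
    int deferred;      /* would go to schedule_work() */
};

/* Default behaviour described above: decline everything. */
static int default_arch_set_irq(enum irq_kind kind)
{
    (void)kind;
    return -EWOULDBLOCK;
}

/* Hypothetical arch override that recognises one interrupt type. */
static int example_arch_set_irq(enum irq_kind kind)
{
    return kind == KIND_ARCH_FAST ? 0 : -EWOULDBLOCK;
}

/* Caller pattern: try the hook first, defer on -EWOULDBLOCK. */
static void irqfd_wakeup_sketch(int (*arch_set_irq)(enum irq_kind),
                                enum irq_kind kind, struct inject_stats *s)
{
    if (arch_set_irq(kind) == -EWOULDBLOCK)
        s->deferred++;
    else
        s->atomic_done++;
}
```

Because the default declines everything, an architecture that does not override the hook keeps the pre-existing deferred path unchanged.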

Signed-off-by: Andrey Smetanin
Reviewed-by: Roman Kagan
Signed-off-by: Denis V. Lunev
CC: Vitaly Kuznetsov
CC: "K. Y. Srinivasan"
CC: Gleb Natapov
CC: Paolo Bonzini
Signed-off-by: Paolo Bonzini

Andrey Smetanin
2015-10-16 16:34:29 +0800
ba1aefcd6 kvm/eventfd: factor out kvm_notify_acked_gsi()

Factor out kvm_notify_acked_gsi() helper to iterate over EOI listeners
and notify those matching the given gsi.

It will be reused in the upcoming Hyper-V SynIC implementation.
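A minimal model of the helper: walk the EOI listeners and call back those registered for the given GSI. KVM keeps these notifiers on an RCU-protected list; the plain singly linked list and the ack counter here are simplifying assumptions.

```c
#include <stddef.h>

struct ack_notifier {
    int gsi;
    void (*irq_acked)(struct ack_notifier *an);
    struct ack_notifier *next;
    int acks;                  /* demo counter only */
};

static void count_ack(struct ack_notifier *an)
{
    an->acks++;
}

/* Notify every listener registered for the given GSI. */
static void notify_acked_gsi(struct ack_notifier *head, int gsi)
{
    for (struct ack_notifier *an = head; an; an = an->next)
        if (an->gsi == gsi)
            an->irq_acked(an);
}
```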

Signed-off-by: Andrey Smetanin
Reviewed-by: Roman Kagan
Signed-off-by: Denis V. Lunev
CC: Vitaly Kuznetsov
CC: "K. Y. Srinivasan"
CC: Gleb Natapov
CC: Paolo Bonzini
Signed-off-by: Paolo Bonzini

Andrey Smetanin
2015-10-16 16:34:29 +0800
351dc6477 kvm/eventfd: avoid loop inside irqfd_update()

The for loop inside irqfd_update() is unnecessary,
because any other value for irq_entry.type will just trigger
schedule_work(&irqfd->inject) in irqfd_wakeup.

Signed-off-by: Andrey Smetanin
Reviewed-by: Roman Kagan
Signed-off-by: Denis V. Lunev
CC: Vitaly Kuznetsov
CC: "K. Y. Srinivasan"
CC: Gleb Natapov
CC: Paolo Bonzini
Signed-off-by: Paolo Bonzini

Andrey Smetanin
2015-10-16 16:34:28 +0800

14 Oct, 2015

1 commit

6003a4201 kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c

async_pf_execute() seems to be missing a memory barrier which might
cause the waker to not notice the waiter and miss sending a wake_up as
in the following figure.

  async_pf_execute                      kvm_vcpu_block
------------------------------------------------------------------------
  spin_lock(&vcpu->async_pf.lock);
  if (waitqueue_active(&vcpu->wq))
  /* The CPU might reorder the test for
     the waitqueue up here, before
     prior writes complete */
                                        prepare_to_wait(&vcpu->wq, &wait,
                                          TASK_INTERRUPTIBLE);
                                        /*if (kvm_vcpu_check_block(vcpu) < 0) */
                                         /*if (kvm_arch_vcpu_runnable(vcpu)) { */
                                          ...
                                          return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
                                            !vcpu->arch.apf.halted)
                                            || !list_empty_careful(&vcpu->async_pf.done)
                                          ...
                                          return 0;
  list_add_tail(&apf->link,
    &vcpu->async_pf.done);
  spin_unlock(&vcpu->async_pf.lock);
                                        waited = true;
                                        schedule();
------------------------------------------------------------------------

The attached patch adds the missing memory barrier.
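The race is the classic store-buffering pattern: each side writes its own flag and then reads the other's, so a full barrier is needed on both sides. The user-space C11 model below is illustrative, not the kernel patch. atomic_thread_fence(memory_order_seq_cst) plays the role of smp_mb() on the waker side and of the barrier implied by set_current_state() inside prepare_to_wait() on the waiter side; the two booleans are hypothetical stand-ins for the done list and the waitqueue.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct model {
    atomic_bool done_nonempty;  /* stands in for !list_empty(async_pf.done) */
    atomic_bool waiter_queued;  /* stands in for waitqueue_active(&vcpu->wq) */
    int wakeups;                /* only the single waker writes this */
};

static void model_init(struct model *m)
{
    atomic_init(&m->done_nonempty, false);
    atomic_init(&m->waiter_queued, false);
    m->wakeups = 0;
}

/* Waker (async_pf_execute): publish the completion, then look for waiters. */
static void waker(struct model *m)
{
    atomic_store_explicit(&m->done_nonempty, true, memory_order_relaxed);
    /* Full barrier: without it the load below may be satisfied before
     * the store above is visible, which is the reordering in the figure. */
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&m->waiter_queued, memory_order_relaxed))
        m->wakeups++;           /* stands in for wake_up_interruptible() */
}

/* Waiter (kvm_vcpu_block): enqueue first, then re-check the condition.
 * Returns true if it would go to sleep. */
static bool waiter_would_sleep(struct model *m)
{
    atomic_store_explicit(&m->waiter_queued, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return !atomic_load_explicit(&m->done_nonempty, memory_order_relaxed);
}
```

With both fences in place, at least one side must observe the other's store, so the waiter cannot sleep while the waker skips the wake-up. Removing either fence re-opens the window shown in the figure.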

I found this issue when I was looking through the linux source code
for places calling waitqueue_active() before wake_up*(), but without
preceding memory barriers, after sending a patch to fix a similar
issue in drivers/tty/n_tty.c (Details about the original issue can be
found here: https://lkml.org/lkml/2015/9/28/849).

Signed-off-by: Kosuke Tatsukawa
Signed-off-by: Paolo Bonzini

Kosuke Tatsukawa
2015-10-14 22:41:08 +0800

10 Oct, 2015

1 commit

4f64cb65b arm/arm64: KVM: Only allow 64bit hosts to build VGICv3

Hardware virtualisation of GICv3 is only supported by 64bit hosts for
the moment. Some VGICv3 bits are missing from the 32bit side, and this
patch makes it possible to still build 32bit hosts when CONFIG_ARM_GIC_V3
is selected.

To this end, we introduce a new option, CONFIG_KVM_ARM_VGIC_V3, that is
only enabled on the 64bit side. The selection is done unconditionally
because CONFIG_ARM_GIC_V3 is always enabled on arm64.

Reviewed-by: Marc Zyngier
Signed-off-by: Jean-Philippe Brucker
Signed-off-by: Marc Zyngier

Jean-Philippe Brucker
2015-10-10 06:11:57 +0800

01 Oct, 2015

10 commits

bf9f6ac8d KVM: Update Posted-Interrupts Descriptor when vCPU is blocked

This patch updates the Posted-Interrupts Descriptor when vCPU
is blocked.

pre-block:
- Add the vCPU to the blocked per-CPU list
- Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

post-block:
- Remove the vCPU from the per-CPU list

Signed-off-by: Feng Wu
[Concentrate invocation of pre/post-block hooks to vcpu_block. - Paolo]
Signed-off-by: Paolo Bonzini

Feng Wu
2015-10-01 21:06:53 +0800
f70c20aaf KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'

This patch adds an arch-specific hook, 'arch_update', in
'struct kvm_kernel_irqfd'. On the Intel side, it is used to
update the IRTE when VT-d posted interrupts are used.

Signed-off-by: Feng Wu
Reviewed-by: Alex Williamson
Signed-off-by: Paolo Bonzini

Feng Wu
2015-10-01 21:06:47 +0800
9016cfb57 KVM: eventfd: add irq bypass consumer management

This patch adds the registration/unregistration of an
irq_bypass_consumer on irqfd assignment/deassignment.

Signed-off-by: Eric Auger
Signed-off-by: Feng Wu
Reviewed-by: Alex Williamson
Signed-off-by: Paolo Bonzini

Eric Auger
2015-10-01 21:06:46 +0800
1a02b2703 KVM: introduce kvm_arch functions for IRQ bypass

This patch introduces
- kvm_arch_irq_bypass_add_producer
- kvm_arch_irq_bypass_del_producer
- kvm_arch_irq_bypass_stop
- kvm_arch_irq_bypass_start

They make it possible to specialize the KVM IRQ bypass consumer in
case CONFIG_KVM_HAVE_IRQ_BYPASS is set.
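The "weak implementations of the callbacks" note below refers to a common kernel pattern: generic code calls a hook unconditionally, a weak default does nothing, and an architecture that needs the hook supplies a strong definition that the linker prefers. A sketch with a hypothetical hook name; returning 0 from the default is an assumption, and __attribute__((weak)) is a GCC/Clang extension (the kernel wraps it as __weak).

```c
/* Hypothetical hook; an arch overrides it with a non-weak definition
 * of the same symbol in another object file. */
int arch_bypass_add_producer_sketch(int producer_id);

__attribute__((weak)) int arch_bypass_add_producer_sketch(int producer_id)
{
    (void)producer_id;
    return 0;   /* assumed default: nothing to do, report success */
}

/* Generic-side caller (hypothetical): no #ifdef needed at the call site. */
static int generic_connect(int producer_id)
{
    return arch_bypass_add_producer_sketch(producer_id);
}
```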

Signed-off-by: Eric Auger
[Add weak implementations of the callbacks. - Feng]
Signed-off-by: Feng Wu
Reviewed-by: Alex Williamson
Signed-off-by: Paolo Bonzini

Eric Auger
2015-10-01 21:06:45 +0800
166c9775f KVM: create kvm_irqfd.h

Move the _irqfd_resampler and _irqfd struct declarations into a new
public header: kvm_irqfd.h. They are renamed to kvm_kernel_irqfd_resampler
and kvm_kernel_irqfd, respectively. Those datatypes
will be used by architecture specific code, in the context of
IRQ bypass manager integration.

Signed-off-by: Eric Auger
Signed-off-by: Feng Wu
Reviewed-by: Alex Williamson
Signed-off-by: Paolo Bonzini

Eric Auger
2015-10-01 21:06:44 +0800
37d9fe478 virt: Add virt directory to the top Makefile

We need to build files in virt/lib/, which are now used by
KVM and VFIO, so add virt directory to the top Makefile.

Signed-off-by: Feng Wu
Acked-by: Michal Marek
Signed-off-by: Paolo Bonzini

Feng Wu
2015-10-01 21:06:44 +0800
f73f81731 virt: IRQ bypass manager

When a physical I/O device is assigned to a virtual machine through
facilities like VFIO and KVM, the interrupt for the device generally
bounces through the host system before being injected into the VM.
However, hardware technologies exist that often allow the host to be
bypassed for some of these scenarios. Intel Posted Interrupts allow
the specified physical edge interrupts to be directly injected into a
guest when delivered to a physical processor while the vCPU is
running. ARM IRQ Forwarding allows forwarded physical interrupts to
be directly deactivated by the guest.

The IRQ bypass manager here is meant to provide the shim to connect
interrupt producers, generally the host physical device driver, with
interrupt consumers, generally the hypervisor, in order to configure
these bypass mechanisms. To do this, we base the connection on a
shared, opaque token. For KVM-VFIO this is expected to be an
eventfd_ctx since this is the connection we already use to connect an
eventfd to an irqfd on the in-kernel path. When a producer and
consumer with matching tokens are found, callbacks via both registered
participants allow the bypass facilities to be automatically enabled.
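A simplified model of the token matching: producers and consumers register independently, and whichever arrives second triggers the connection when an unconnected peer with the same opaque token exists. The structures and function names are hypothetical; the real manager in virt/lib/ adds locking and invokes the participants' callbacks where this sketch only links the two pointers.

```c
#include <stdbool.h>
#include <stddef.h>

struct consumer;

struct producer {
    void *token;             /* opaque; for KVM-VFIO an eventfd_ctx pointer */
    struct consumer *peer;   /* non-NULL once connected */
};

struct consumer {
    void *token;
    struct producer *peer;
};

#define MAX_ENDPOINTS 8      /* hypothetical fixed capacity */

struct bypass_mgr {
    struct producer *producers[MAX_ENDPOINTS];
    size_t nprod;
    struct consumer *consumers[MAX_ENDPOINTS];
    size_t ncons;
};

/* Where the real manager would run both sides' callbacks. */
static void bypass_connect(struct producer *p, struct consumer *c)
{
    p->peer = c;
    c->peer = p;
}

static bool register_producer(struct bypass_mgr *m, struct producer *p)
{
    if (m->nprod == MAX_ENDPOINTS)
        return false;
    m->producers[m->nprod++] = p;
    for (size_t i = 0; i < m->ncons; i++) {
        struct consumer *c = m->consumers[i];
        if (!c->peer && c->token == p->token) {
            bypass_connect(p, c);
            break;
        }
    }
    return true;
}

static bool register_consumer(struct bypass_mgr *m, struct consumer *c)
{
    if (m->ncons == MAX_ENDPOINTS)
        return false;
    m->consumers[m->ncons++] = c;
    for (size_t i = 0; i < m->nprod; i++) {
        struct producer *p = m->producers[i];
        if (!p->peer && p->token == c->token) {
            bypass_connect(p, c);
            break;
        }
    }
    return true;
}
```

Matching on a shared pointer keeps the manager ignorant of what either side actually is, which is what lets VFIO and KVM stay decoupled.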

Signed-off-by: Alex Williamson
Reviewed-by: Eric Auger
Tested-by: Eric Auger
Tested-by: Feng Wu
Signed-off-by: Feng Wu
Signed-off-by: Paolo Bonzini

Alex Williamson
2015-10-01 21:06:43 +0800
e9ea5069d kvm: add capability for any-length ioeventfds

Cc: Gleb Natapov
Cc: Paolo Bonzini
Signed-off-by: Jason Wang
Signed-off-by: Paolo Bonzini

Jason Wang
2015-10-01 21:06:31 +0800
d3febddde kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

All fields of kvm_io_range are initialized or copied explicitly
afterwards, so switch to kmalloc().

Cc: Gleb Natapov
Cc: Paolo Bonzini
Cc: Michael S. Tsirkin
Signed-off-by: Jason Wang
Signed-off-by: Paolo Bonzini

Jason Wang
2015-10-01 21:06:29 +0800
b053b2aef KVM: x86: Add EOI exit bitmap inference

In order to support a userspace IOAPIC interacting with an in kernel
APIC, the EOI exit bitmaps need to be configurable.

If the IOAPIC is in userspace (i.e. the irqchip has been split), the
EOI exit bitmaps will be set whenever the GSI Routes are configured.
In particular, the low MSI routes are reservable for userspace
IOAPICs. For these MSI routes, the EOI Exit bit corresponding to the
destination vector of the route will be set for the destination VCPU.

The intention is for the userspace IOAPICs to use the reservable MSI
routes to inject interrupts into the guest.

This is a slight abuse of the notion of an MSI Route, given that MSIs
classically bypass the IOAPIC. It might be worthwhile to add an
additional route type to improve clarity.

Compile tested for Intel x86.
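A sketch of the inference step under simplifying assumptions: each MSI route already carries its destination vCPU index and vector (the real code decodes these from the MSI address/data), GSIs below a hypothetical nr_reserved count are the low routes reserved for the userspace IOAPIC, and each vCPU's 256-vector EOI exit bitmap is stored as four 64-bit words.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_VCPUS 4               /* hypothetical VM size */

struct msi_route {
    unsigned int gsi;
    unsigned int dest_vcpu;      /* assumed pre-decoded destination */
    uint8_t vector;
};

/* One 256-bit EOI exit bitmap per vCPU, as four 64-bit words. */
struct eoi_bitmaps {
    uint64_t bits[NR_VCPUS][4];
};

/* Rebuild the bitmaps whenever the GSI routes are (re)configured. */
static void infer_eoi_exit_bitmaps(struct eoi_bitmaps *b,
                                   const struct msi_route *r, size_t n,
                                   unsigned int nr_reserved)
{
    memset(b, 0, sizeof(*b));
    for (size_t i = 0; i < n; i++) {
        /* Only the low, reserved routes belong to the userspace IOAPIC. */
        if (r[i].gsi >= nr_reserved || r[i].dest_vcpu >= NR_VCPUS)
            continue;
        b->bits[r[i].dest_vcpu][r[i].vector / 64] |=
            UINT64_C(1) << (r[i].vector % 64);
    }
}

static bool eoi_exit_set(const struct eoi_bitmaps *b, unsigned int vcpu,
                         uint8_t vector)
{
    return (b->bits[vcpu][vector / 64] >> (vector % 64)) & 1;
}
```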

Signed-off-by: Steve Rutherford
Signed-off-by: Paolo Bonzini

Steve Rutherford
2015-10-01 21:06:28 +0800

25 Sep, 2015

1 commit

920552b21 KVM: disable halt_poll_ns as default for s390x

We observed some performance degradation on s390x with dynamic
halt polling. Until we can provide a proper fix, let's enable
halt_poll_ns by default only for supported architectures.

Architectures are now free to set their own halt_poll_ns
default value.

Signed-off-by: David Hildenbrand
Signed-off-by: Paolo Bonzini

David Hildenbrand
2015-09-25 16:31:30 +0800

17 Sep, 2015

2 commits

efe4d36a7 Merge tag 'kvm-arm-for-4.3-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master

Second set of KVM/ARM changes for 4.3-rc2

- Workaround for a Cortex-A57 erratum
- Bug fix for the debugging infrastructure
- Fix for 32bit guests with more than 4GB of address space
on a 32bit host
- A number of fixes for the (unusual) case when we don't use
the in-kernel GIC emulation
- Removal of ThumbEE handling on arm64, since these have been
dropped from the architecture before anyone actually ever
built a CPU
- Remove the KVM_ARM_MAX_VCPUS limitation which has become
fairly pointless

Paolo Bonzini
2015-09-17 22:51:59 +0800
ef748917b arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'

This patch removes the KVM_ARM_MAX_VCPUS config option and,
like other ARCHs, just chooses the maximum value allowed by
hardware, for the following reasons:

1) from a distribution's point of view, the option has to be
defined as the maximum allowed value because it needs to
meet all kinds of virtualization applications and
needs to support most SoCs;

2) using a bigger value doesn't introduce extra memory
consumption, and the help text in Kconfig isn't accurate
because the kvm_vcpu structure isn't allocated until a request
to create a VCPU is sent from QEMU;

3) the main effect is that the vcpus[] field in 'struct kvm'
becomes a bit bigger (sizeof(void *) per vcpu) and needs more cache
lines to hold the structure, but 'struct kvm' is one generic struct,
and it has worked well on other ARCHs already in this way. Also,
the world switch frequency is often low, for example, it is ~2000
when running a kernel build load in a VM on an APM X-Gene KVM host,
so the effect is very small, and the difference can't be observed
in my test at all.

Cc: Dann Frazier
Signed-off-by: Ming Lei
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Ming Lei
2015-09-17 20:13:27 +0800

16 Sep, 2015

1 commit

62bea5bff KVM: add halt_attempted_poll to VCPU stats

This new statistic can help diagnose VCPUs that, for any reason,
trigger bad behavior of halt_poll_ns autotuning.

For example, say halt_poll_ns = 480000, and wakeups are spaced exactly
like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
10+20+40+80+160+320+480 = 1110 microseconds out of every
479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
consumes about 30% more CPU than it would without
polling. This would show up as an abnormally high number of
attempted polls compared to successful polls.
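A quick check of the arithmetic above. The poll and gap values are copied from the commit message rather than derived from KVM's autotuning logic, and the result is the fraction of wall-clock time spent in failed polls (1110 / 3359, roughly a third).

```c
#include <stddef.h>
#include <stdint.h>

/* Values quoted in the commit message, in microseconds. */
static const uint32_t polls_us[] = { 10, 20, 40, 80, 160, 320, 480 };
static const uint32_t gaps_us[]  = { 479, 481, 479, 481, 479, 481, 479 };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

static uint32_t sum_u32(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Time burnt in failed polls divided by total time between wakeups. */
static double wasted_fraction(void)
{
    return (double)sum_u32(polls_us, COUNT(polls_us)) /
           sum_u32(gaps_us, COUNT(gaps_us));
}
```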

Acked-by: Christian Borntraeger
Reviewed-by: David Matlack
Signed-off-by: Paolo Bonzini

Paolo Bonzini
2015-09-16 18:17:00 +0800