Eric Lee / smarc-fsl-linux-kernel

14 Feb, 2015

1 commit

b9085bcbf Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM update from Paolo Bonzini:
"Fairly small update, but there are some interesting new features.

Common:
Optional support for adding a small amount of polling on each HLT
instruction executed in the guest (or equivalent for other
architectures). This can improve latency by up to 50% in some
scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This
also has to be enabled manually for now, but the plan is to
auto-tune this in the future.

ARM/ARM64:
The highlights are support for GICv3 emulation and dirty page
tracking.

s390:
Several optimizations and bugfixes. Also a first: a feature
exposed by KVM (UUID and long guest name in /proc/sysinfo) before
it is available in IBM's hypervisor! :)

MIPS:
Bugfixes.

x86:
Support for PML (page modification logging, a new feature in
Broadwell Xeons that speeds up dirty page tracking), nested
virtualization improvements (nested APICv---a nice optimization),
usual round of emulation fixes.

There is also a new option to reduce latency of the TSC deadline
timer in the guest; this needs to be tuned manually.

Some commits are common between this pull and Catalin's; I see you
have already included his tree.

Powerpc:
Nothing yet.

The KVM/PPC changes will come in through the PPC maintainers,
because I haven't received them yet and I might end up being
offline for some part of next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
KVM: ia64: drop kvm.h from installed user headers
KVM: x86: fix build with !CONFIG_SMP
KVM: x86: emulate: correct page fault error code for NoWrite instructions
KVM: Disable compat ioctl for s390
KVM: s390: add cpu model support
KVM: s390: use facilities and cpu_id per KVM
KVM: s390/CPACF: Choose crypto control block format
s390/kernel: Update /proc/sysinfo file with Extended Name and UUID
KVM: s390: reenable LPP facility
KVM: s390: floating irqs: fix user triggerable endless loop
kvm: add halt_poll_ns module parameter
kvm: remove KVM_MMIO_SIZE
KVM: MIPS: Don't leak FPU/DSP to guest
KVM: MIPS: Disable HTW while in guest
KVM: nVMX: Enable nested posted interrupt processing
KVM: nVMX: Enable nested virtual interrupt delivery
KVM: nVMX: Enable nested apic register virtualization
KVM: nVMX: Make nested control MSRs per-cpu
KVM: nVMX: Enable nested virtualize x2apic mode
KVM: nVMX: Prepare for using hardware MSR bitmap
...

Linus Torvalds
2015-02-14 01:55:09 +0800

12 Feb, 2015

1 commit

0664e57ff mm: gup: kvm use get_user_pages_unlocked

Use the more generic get_user_pages_unlocked which has the additional
benefit of passing FAULT_FLAG_ALLOW_RETRY at the very first page fault
(which allows the first page fault in an unmapped area to be always able
to block indefinitely by being allowed to release the mmap_sem).

Signed-off-by: Andrea Arcangeli
Reviewed-by: Andres Lagar-Cavilla
Reviewed-by: Kirill A. Shutemov
Cc: Peter Feiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2015-02-12 09:06:05 +0800

09 Feb, 2015

1 commit

de8e5d744 KVM: Disable compat ioctl for s390

We never had a 31bit QEMU/kuli running. We would need to review several
ioctls to check if this creates holes, bugs or whatever to make it work.
Let's just disable compat support for KVM on s390.

Signed-off-by: Christian Borntraeger
Acked-by: Paolo Bonzini

Christian Borntraeger
2015-02-09 19:44:14 +0800

06 Feb, 2015

1 commit

f78195129 kvm: add halt_poll_ns module parameter

This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.

This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters keep the guest from sending pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.

With this patch performance reaches 60-65% of bare metal and, more
importantly, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.

Of course there is some price to pay; every time an otherwise idle vCPU
is interrupted, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.

The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll saves Linux from scheduling the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
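
The poll-then-block behaviour described above can be sketched in user-space C with a simulated clock. This is only an illustration under stated assumptions: struct vcpu_model, irq_pending() and handle_halt() are hypothetical names, the 100 ns polling step is arbitrary, and the real code blocks via kvm_vcpu_block rather than advancing a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vCPU model: an interrupt becomes pending at irq_at_ns. */
struct vcpu_model {
	uint64_t now_ns;		/* simulated clock */
	uint64_t irq_at_ns;		/* time at which the interrupt arrives */
	uint64_t successful_polls;	/* halts that ended inside the poll window */
	uint64_t blocks;		/* halts that had to schedule out */
};

static bool irq_pending(const struct vcpu_model *v)
{
	return v->now_ns >= v->irq_at_ns;
}

/* Handle one HLT: poll for up to halt_poll_ns, then fall back to blocking. */
static void handle_halt(struct vcpu_model *v, uint64_t halt_poll_ns)
{
	uint64_t deadline;

	assert(v);
	deadline = v->now_ns + halt_poll_ns;

	while (v->now_ns < deadline) {
		if (irq_pending(v)) {
			v->successful_polls++;
			return;
		}
		v->now_ns += 100;	/* one simulated polling step */
	}

	/* Poll window expired: schedule out until the interrupt arrives. */
	v->blocks++;
	if (v->now_ns < v->irq_at_ns)
		v->now_ns = v->irq_at_ns;
}
```

In this model, successful_polls plays the role of the halt_successful_poll stat: if it stays near zero while blocks grows, the poll window is mostly wasted time.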

While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.

The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.

Signed-off-by: Paolo Bonzini

Paolo Bonzini
2015-02-06 20:08:37 +0800

29 Jan, 2015

1 commit

3b0f1d01e KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirty

We don't have to write protect guest memory for dirty logging if architecture
supports hardware dirty logging, such as PML on VMX, so rename it to be more
generic.

Signed-off-by: Kai Huang
Reviewed-by: Xiao Guangrong
Signed-off-by: Paolo Bonzini

Kai Huang
2015-01-29 22:30:38 +0800

28 Jan, 2015

1 commit

b0165f1b4 kvm: update_memslots: clean flags for invalid memslots

Indeed, any invalid memslots should be new->npages = 0,
new->base_gfn = 0 and new->flags = 0 at the same time.

Signed-off-by: Tiejun Chen
Signed-off-by: Paolo Bonzini

Tiejun Chen
2015-01-28 04:31:44 +0800

23 Jan, 2015

1 commit

4b9905899 KVM: Remove unused config symbol

The dirty page logging series introduced both
HAVE_KVM_ARCH_DIRTY_LOG_PROTECT and KVM_GENERIC_DIRTYLOG_READ_PROTECT
config symbols, but only KVM_GENERIC_DIRTYLOG_READ_PROTECT is used.
Just remove the unused one.

(The config symbol was renamed during the development of the patch
series and the old name just crept in by accident.)

Reported-by: Paul Bolle
Signed-off-by: Christoffer Dall

Christoffer Dall
2015-01-23 17:52:03 +0800

21 Jan, 2015

18 commits

4fa96afd9 arm/arm64: KVM: force alignment of VGIC dist/CPU/redist addresses

Although the GIC architecture requires us to map the MMIO regions
only at page aligned addresses, we currently do not enforce this from
the kernel side.
Restrict any vGICv2 regions to be 4K aligned and any GICv3 regions
to be 64K aligned. Document this requirement.
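
A minimal sketch of such an alignment check, assuming a hypothetical enum gic_model and helper name (the kernel uses its own constants and error paths):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model identifiers; not the kernel's constants. */
enum gic_model { GIC_MODEL_V2, GIC_MODEL_V3 };

#define SZ_4K	0x1000ULL
#define SZ_64K	0x10000ULL

/* Return true if addr meets the alignment required for the given model. */
static bool vgic_addr_aligned(enum gic_model model, uint64_t addr)
{
	uint64_t align = (model == GIC_MODEL_V3) ? SZ_64K : SZ_4K;

	/* align is a power of two, so masking the low bits tests alignment. */
	assert((align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}
```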

Signed-off-by: Andre Przywara
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:33 +0800
ac3d37356 arm/arm64: KVM: allow userland to request a virtual GICv3

With all of the GICv3 code in place now we allow userland to ask the
kernel for using a virtual GICv3 in the guest.
Also we provide the necessary support for guests setting the memory
addresses for the virtual distributor and redistributors.
This requires some userland code to make use of that feature and
explicitly ask for a virtual GICv3.
Document that KVM_CREATE_IRQCHIP only works for GICv2, but is
considered legacy and using KVM_CREATE_DEVICE is preferred.

Signed-off-by: Andre Przywara
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:33 +0800
b5d84ff60 arm/arm64: KVM: enable kernel side of GICv3 emulation

With all the necessary GICv3 emulation code in place, we can now
connect the code to the GICv3 backend in the kernel.
The LR register handling is different depending on the emulated GIC
model, so provide different implementations for each.
Also allow non-v2-compatible GICv3 implementations (which don't
provide MMIO regions for the virtual CPU interface in the DT), but
restrict those hosts to support GICv3 guests only.
If the device tree provides a GICv2 compatible GICV resource entry,
but that one is faulty, just disable the GICv2 emulation and let the
user use at least the GICv3 emulation for guests.
To provide proper support for the legacy KVM_CREATE_IRQCHIP ioctl,
note virtual GICv2 compatibility in struct vgic_params and use it
on creating a VGICv2.

Signed-off-by: Andre Przywara
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:32 +0800
6d52f35af arm64: KVM: add SGI generation register emulation

While the generation of a (virtual) inter-processor interrupt (SGI)
on a GICv2 works by writing to a MMIO register, GICv3 uses the system
register ICC_SGI1R_EL1 to trigger them.
Add a trap handler function that calls the new SGI register handler
in the GICv3 code. As ICC_SRE_EL1.SRE at this point is still always 0,
this will not trap yet, but will only be used later when all the data
structures have been initialized properly.

Signed-off-by: Andre Przywara
Reviewed-by: Christoffer Dall
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:32 +0800
a0675c25d arm/arm64: KVM: add virtual GICv3 distributor emulation

With everything separated and prepared, we implement a model of a
GICv3 distributor and redistributors by using the existing framework
to provide handler functions for each register group.

Currently we limit the emulation to a model enforcing a single
security state, with SRE==1 (forcing system register access) and
ARE==1 (allowing more than 8 VCPUs).

We share some of the functions provided for GICv2 emulation, but take
the different ways of addressing (v)CPUs into account.
Save and restore is currently not implemented.

Similar to the split-off of the GICv2 specific code, the new emulation
code goes into a new file (vgic-v3-emul.c).

Signed-off-by: Andre Przywara
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:31 +0800
9fedf1467 arm/arm64: KVM: add opaque private pointer to MMIO data

For a GICv2 there is always only one (v)CPU involved: the one that
does the access. On a GICv3 the access to a CPU redistributor is
memory-mapped, but not banked, so the (v)CPU affected is determined by
looking at the MMIO address region being accessed.
To allow passing the affected CPU into the accessors later, extend
struct kvm_exit_mmio to add an opaque private pointer parameter.
The current GICv2 emulation just does not use it.

Signed-off-by: Andre Przywara
Acked-by: Christoffer Dall
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:30 +0800
1d916229e arm/arm64: KVM: split GICv2 specific emulation code from vgic.c

vgic.c is currently a mixture of generic vGIC emulation code and
functions specific to emulating a GICv2. To ease the addition of
GICv3, split off strictly v2 specific parts into a new file
vgic-v2-emul.c.

Signed-off-by: Andre Przywara
Acked-by: Christoffer Dall

-------
As the diff isn't always obvious here (and to aid eventual rebases),
here is a list of high-level changes done to the code:
* added new file to respective arm/arm64 Makefiles
* moved GICv2 specific functions to vgic-v2-emul.c:
- handle_mmio_misc()
- handle_mmio_set_enable_reg()
- handle_mmio_clear_enable_reg()
- handle_mmio_set_pending_reg()
- handle_mmio_clear_pending_reg()
- handle_mmio_priority_reg()
- vgic_get_target_reg()
- vgic_set_target_reg()
- handle_mmio_target_reg()
- handle_mmio_cfg_reg()
- handle_mmio_sgi_reg()
- vgic_v2_unqueue_sgi()
- read_set_clear_sgi_pend_reg()
- write_set_clear_sgi_pend_reg()
- handle_mmio_sgi_set()
- handle_mmio_sgi_clear()
- vgic_v2_handle_mmio()
- vgic_get_sgi_sources()
- vgic_dispatch_sgi()
- vgic_v2_queue_sgi()
- vgic_v2_map_resources()
- vgic_v2_init()
- vgic_v2_add_sgi_source()
- vgic_v2_init_model()
- vgic_v2_init_emulation()
- handle_cpu_mmio_misc()
- handle_mmio_abpr()
- handle_cpu_mmio_ident()
- vgic_attr_regs_access()
- vgic_create() (renamed to vgic_v2_create())
- vgic_destroy() (renamed to vgic_v2_destroy())
- vgic_has_attr() (renamed to vgic_v2_has_attr())
- vgic_set_attr() (renamed to vgic_v2_set_attr())
- vgic_get_attr() (renamed to vgic_v2_get_attr())
- struct kvm_mmio_range vgic_dist_ranges[]
- struct kvm_mmio_range vgic_cpu_ranges[]
- struct kvm_device_ops kvm_arm_vgic_v2_ops {}

Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:30 +0800
832158125 arm/arm64: KVM: add vgic.h header file

vgic.c is currently a mixture of generic vGIC emulation code and
functions specific to emulating a GICv2. To ease the addition of
GICv3 later, we create new header file vgic.h, which holds constants
and prototypes of commonly used functions.
Rename some identifiers to avoid name space clutter.
I removed the long-standing comment about using the kvm_io_bus API
to tackle the GIC register ranges, as it wouldn't be a win for us
anymore.

Signed-off-by: Andre Przywara
Acked-by: Christoffer Dall

-------
As the diff isn't always obvious here (and to aid eventual rebases),
here is a list of high-level changes done to the code:
* moved definitions and prototypes from vgic.c to vgic.h:
- VGIC_ADDR_UNDEF
- ACCESS_{READ,WRITE}_*
- vgic_init()
- vgic_update_state()
- vgic_kick_vcpus()
- vgic_get_vmcr()
- vgic_set_vmcr()
- struct mmio_range {} (renamed to struct kvm_mmio_range)
* removed static keyword and exported prototype in vgic.h:
- vgic_bitmap_get_reg()
- vgic_bitmap_set_irq_val()
- vgic_bitmap_get_shared_map()
- vgic_bytemap_get_reg()
- vgic_dist_irq_set_pending()
- vgic_dist_irq_clear_pending()
- vgic_cpu_irq_clear()
- vgic_reg_access()
- handle_mmio_raz_wi()
- vgic_handle_enable_reg()
- vgic_handle_set_pending_reg()
- vgic_handle_clear_pending_reg()
- vgic_handle_cfg_reg()
- vgic_unqueue_irqs()
- find_matching_range() (renamed to vgic_find_range)
- vgic_handle_mmio_range()
- vgic_update_state()
- vgic_get_vmcr()
- vgic_set_vmcr()
- vgic_queue_irq()
- vgic_kick_vcpus()
- vgic_init()
- vgic_v2_init_emulation()
- vgic_has_attr_regs()
- vgic_set_common_attr()
- vgic_get_common_attr()
- vgic_destroy()
- vgic_create()
* moved functions to vgic.h (static inline):
- mmio_data_read()
- mmio_data_write()
- is_in_range()

Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:30 +0800
b60da146c arm/arm64: KVM: refactor/wrap vgic_set/get_attr()

vgic_set_attr() and vgic_get_attr() contain both code specific for
the emulated GIC as well as code for the userland facing, generic
part of the GIC.
Split the guest GIC facing code off from the generic part to allow
easier splitting later.

Signed-off-by: Andre Przywara
Reviewed-by: Christoffer Dall
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:29 +0800
d97f683d0 arm/arm64: KVM: refactor MMIO accessors

The MMIO accessors for GICD_I[CS]ENABLER, GICD_I[CS]PENDR and
GICD_ICFGR behave very similarly for GICv2 and GICv3, although the way
the affected VCPU is determined differs.
Since we need them to access the registers from three different
places in the future, we factor out a generic, backend-facing
implementation and use small wrappers in the current GICv2 emulation.
This will ease adding GICv3 accessors later.

Signed-off-by: Andre Przywara
Reviewed-by: Christoffer Dall
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:29 +0800
2f5fa41a7 arm/arm64: KVM: make the value of ICC_SRE_EL1 a per-VM variable

ICC_SRE_EL1 is a system register allowing msr/mrs accesses to the
GIC CPU interface for EL1 (guests). Currently we force it to 0, but
for proper GICv3 support we have to allow guests to use it (depending
on their selected virtual GIC model).
So add ICC_SRE_EL1 to the list of saved/restored registers on a
world switch, but actually disallow a guest to change it by only
restoring a fixed, once-initialized value.
This value depends on the GIC model userland has chosen for a guest.

Signed-off-by: Andre Przywara
Reviewed-by: Christoffer Dall
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:28 +0800
3caa2d8c3 arm/arm64: KVM: make the maximum number of vCPUs a per-VM value

Currently the maximum number of vCPUs supported is a global value
limited by the used GIC model. GICv3 will lift this limit, but we
still need to observe it for guests using GICv2.
So the maximum number of vCPUs becomes a per-VM value, depending on the
GIC model the guest uses.
Store and check the value in struct kvm_arch, but keep it down to
8 for now.

Signed-off-by: Andre Przywara
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:28 +0800
4ce7ebdfc arm/arm64: KVM: dont rely on a valid GICH base address

To check whether the vGIC was already initialized, we currently check
the GICH base address for not being NULL. Since with GICv3 we may
get along without this address, let's use the irqchip_in_kernel()
function to detect an already initialized vGIC.

Signed-off-by: Andre Przywara
Acked-by: Christoffer Dall
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:27 +0800
ea2f83a7d arm/arm64: KVM: move kvm_register_device_ops() into vGIC probing

Currently we unconditionally register the GICv2 emulation device
during the host's KVM initialization. Since with GICv3 support we
may end up with only v2 or only v3 or both supported, we move the
registration into the GIC probing function, where we will later know
which combination is valid.

Signed-off-by: Andre Przywara
Acked-by: Christoffer Dall
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:27 +0800
b26e5fdac arm/arm64: KVM: introduce per-VM ops

Currently we only have one virtual GIC model supported, so all guests
use the same emulation code. With the addition of another model we
end up with different guests using potentially different vGIC models,
so we have to split up some functions to be per VM.
Introduce a vgic_vm_ops struct to hold function pointers for those
functions that are different and provide the necessary code to
initialize them.
Also split up the vgic_init() function to separate out VGIC model
specific functionality into a separate function, which will later be
different for a GICv3 model.
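
The per-VM dispatch idea can be sketched as a table of function pointers chosen once at VM creation. The struct layout and every name below are hypothetical illustrations, not the fields of the actual vgic_vm_ops:

```c
#include <assert.h>
#include <stddef.h>

struct vm;

/* Hypothetical per-VM operations table; the field name is illustrative. */
struct vm_ops {
	int (*init_model)(struct vm *vm);
};

struct vm {
	const struct vm_ops *ops;
	int model_ready;	/* records which model-specific init ran */
};

static int v2_init_model(struct vm *vm) { vm->model_ready = 2; return 0; }
static int v3_init_model(struct vm *vm) { vm->model_ready = 3; return 0; }

static const struct vm_ops v2_ops = { .init_model = v2_init_model };
static const struct vm_ops v3_ops = { .init_model = v3_init_model };

/* Pick the ops table for the requested model, then run its init hook. */
static int vm_select_model(struct vm *vm, int gic_version)
{
	assert(vm != NULL);
	switch (gic_version) {
	case 2:
		vm->ops = &v2_ops;
		break;
	case 3:
		vm->ops = &v3_ops;
		break;
	default:
		return -1;	/* unknown model */
	}
	return vm->ops->init_model(vm);
}
```

Generic code then calls through vm->ops without knowing which model a given guest uses.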

Signed-off-by: Andre Przywara
Reviewed-by: Christoffer Dall
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:26 +0800
05bc8aafe arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones

Some GICv3 registers can and will be accessed as 64 bit registers.
Currently the register handling code can only deal with 32 bit
accesses, so we do two consecutive calls to cover this.
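
As a rough sketch of the wrapping, a 64 bit access can be served by calling a 32 bit handler twice. The handler type and function names are hypothetical, and the low-word-first ordering is an assumption of this example:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32 bit register handler type; offset is in bytes. */
typedef void (*mmio32_fn)(uint32_t *regs, uint32_t offset,
			  uint32_t *val, int is_write);

/* Trivial 32 bit handler backed by a plain array of registers. */
static void reg32_access(uint32_t *regs, uint32_t offset,
			 uint32_t *val, int is_write)
{
	if (is_write)
		regs[offset / 4] = *val;
	else
		*val = regs[offset / 4];
}

/* Serve one 64 bit access as two consecutive 32 bit accesses. */
static void mmio64_access(mmio32_fn handler, uint32_t *regs, uint32_t offset,
			  uint64_t *val, int is_write)
{
	uint32_t lo = (uint32_t)*val;
	uint32_t hi = (uint32_t)(*val >> 32);

	assert(offset % 8 == 0);
	handler(regs, offset, &lo, is_write);		/* low word */
	handler(regs, offset + 4, &hi, is_write);	/* high word */
	if (!is_write)
		*val = ((uint64_t)hi << 32) | lo;
}
```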

Signed-off-by: Andre Przywara
Reviewed-by: Christoffer Dall
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:26 +0800
96415257a arm/arm64: KVM: refactor vgic_handle_mmio() function

Currently we only need to deal with one MMIO region for the GIC
emulation (the GICv2 distributor), but we soon need to extend this.
Refactor the existing code to allow easier addition of different
ranges without code duplication.

Signed-off-by: Andre Przywara
Reviewed-by: Christoffer Dall
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:25 +0800
59892136c arm/arm64: KVM: pass down user space provided GIC type into vGIC code

With the introduction of a second emulated GIC model we need to let
userspace specify the GIC model to use for each VM. Pass the
userspace provided value down into the vGIC code and store it there
to differentiate later.

Signed-off-by: Andre Przywara
Acked-by: Christoffer Dall
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall

Andre Przywara
2015-01-21 01:25:25 +0800

16 Jan, 2015

2 commits

ba0513b5b KVM: Add generic support for dirty page logging

kvm_get_dirty_log() provides generic handling of dirty bitmap, currently reused
by several architectures. Building on that we introduce
kvm_get_dirty_log_protect() adding write protection to mark these pages dirty
for future write access, before next KVM_GET_DIRTY_LOG ioctl call from user
space.
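
The read-and-protect cycle can be modelled in a few lines of plain C. This is a simplified sketch, not the kernel implementation: struct slot_model and both functions are hypothetical, and it ignores the locking and atomic exchange a real implementation needs.

```c
#include <assert.h>
#include <stddef.h>

#define NWORDS 2

/* Hypothetical memslot model: one dirty bit and one write-protect bit per page. */
struct slot_model {
	unsigned long dirty[NWORDS];
	unsigned long wprotect[NWORDS];
};

/*
 * Copy the dirty bitmap to out, clear it, and write-protect the pages that
 * were dirty so that the next write faults and marks them dirty again.
 */
static void get_dirty_log_protect(struct slot_model *s, unsigned long *out)
{
	assert(s && out);
	for (size_t i = 0; i < NWORDS; i++) {
		unsigned long mask = s->dirty[i];

		s->dirty[i] = 0;
		out[i] = mask;
		s->wprotect[i] |= mask;
	}
}

/* A guest write to a page: drop write protection and mark the page dirty. */
static void guest_write(struct slot_model *s, size_t page)
{
	size_t bits = sizeof(unsigned long) * 8;
	unsigned long bit = 1UL << (page % bits);

	assert(page < NWORDS * bits);
	s->wprotect[page / bits] &= ~bit;
	s->dirty[page / bits] |= bit;
}
```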

Reviewed-by: Christoffer Dall
Signed-off-by: Mario Smarduch

Mario Smarduch
2015-01-16 21:40:14 +0800
a6d510166 KVM: Add architecture-defined TLB flush support

Allow architectures to override the generic kvm_flush_remote_tlbs()
function via HAVE_KVM_ARCH_TLB_FLUSH_ALL. ARMv7 will need this to
provide its own TLB flush interface.

Reviewed-by: Christoffer Dall
Reviewed-by: Marc Zyngier
Reviewed-by: Paolo Bonzini
Signed-off-by: Mario Smarduch

Mario Smarduch
2015-01-16 21:40:14 +0800

11 Jan, 2015

2 commits

065c00348 KVM: arm/arm64: vgic: add init entry to VGIC KVM device

Since the advent of VGIC dynamic initialization, the VGIC is
initialized quite late, on the first vcpu run or "on-demand" when
injecting an IRQ or when the guest sets its registers.

This initialization could be initiated explicitly much earlier
by user space, as soon as it has provided the requested
dimensioning parameters.

This patch adds a new entry to the VGIC KVM device that allows
the user to manually request the VGIC init:
- a new KVM_DEV_ARM_VGIC_GRP_CTRL group is introduced.
- Its first attribute is KVM_DEV_ARM_VGIC_CTRL_INIT

The rationale behind introducing a group is to be able to add other
controls later on, if needed.

Signed-off-by: Eric Auger
Signed-off-by: Christoffer Dall

Eric Auger
2015-01-11 21:12:15 +0800
66b030e48 KVM: arm/arm64: vgic: vgic_init returns -ENODEV when no online vcpu

To be more explicit on vgic initialization failure, -ENODEV is
returned by vgic_init when no online vcpus can be found at init.

Signed-off-by: Eric Auger
Signed-off-by: Christoffer Dall

Eric Auger
2015-01-11 21:12:15 +0800

09 Jan, 2015

1 commit

ff651cb61 KVM: nVMX: Add nested msr load/restore algorithm

Several hypervisors need the MSR auto load/restore feature.
We read MSRs from the VM-entry MSR load area specified by L1,
and load them via kvm_set_msr in the nested entry.
When a nested exit occurs, we get MSRs via kvm_get_msr, writing
them to L1's MSR store area. After this, we read MSRs from the VM-exit
MSR load area, and load them via kvm_set_msr.
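
The load and store passes can be sketched as two loops over an array of MSR entries. Everything here is an illustrative assumption: set_msr() and get_msr() are hypothetical stand-ins for kvm_set_msr and kvm_get_msr, the MSR state is a small array, and returning the 1-based position of a failing entry is a choice made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* One entry of an MSR load/store area: index, reserved bits, value. */
struct msr_entry {
	uint32_t index;
	uint32_t reserved;
	uint64_t value;
};

/* Hypothetical stand-in for the vCPU's MSR state (not the KVM API). */
#define NR_MSRS 8
static uint64_t msr_file[NR_MSRS];

static int set_msr(uint32_t index, uint64_t value)
{
	if (index >= NR_MSRS)
		return -1;
	msr_file[index] = value;
	return 0;
}

static int get_msr(uint32_t index, uint64_t *value)
{
	if (index >= NR_MSRS)
		return -1;
	*value = msr_file[index];
	return 0;
}

/* Load each listed MSR; return 0, or the 1-based position of the failing entry. */
static uint32_t load_msrs(const struct msr_entry *area, uint32_t count)
{
	assert(area || count == 0);
	for (uint32_t i = 0; i < count; i++)
		if (area[i].reserved || set_msr(area[i].index, area[i].value))
			return i + 1;
	return 0;
}

/* Store the current value of each listed MSR back into the area. */
static uint32_t store_msrs(struct msr_entry *area, uint32_t count)
{
	assert(area || count == 0);
	for (uint32_t i = 0; i < count; i++)
		if (area[i].reserved || get_msr(area[i].index, &area[i].value))
			return i + 1;
	return 0;
}
```

In terms of the flow above, a nested entry would call load_msrs() on L1's VM-entry load area, and a nested exit would call store_msrs() on the store area followed by load_msrs() on the VM-exit load area.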

Signed-off-by: Wincy Van
Signed-off-by: Paolo Bonzini

Wincy Van
2015-01-09 05:45:14 +0800

31 Dec, 2014

1 commit

2eebdde65 timecounter: keep track of accumulated fractional nanoseconds

The current timecounter implementation will drop a variable amount
of resolution, depending on the magnitude of the time delta. In
other words, reading the clock too often or too close to a time
stamp conversion will introduce errors into the time values. This
patch fixes the issue by introducing a fractional nanosecond field
that accumulates the low order bits.
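
The effect is easy to reproduce with a simplified converter that computes ns = (cycles * mult) >> shift. The sketch below contrasts a lossy conversion with one that carries the discarded bits in a caller-provided field; the struct and function names are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified cycle counter parameters: ns = (cycles * mult) >> shift. */
struct cyc_conv {
	uint32_t mult;
	uint32_t shift;
};

/* Conversion that discards the low-order bits on every call. */
static uint64_t cyc2ns_lossy(const struct cyc_conv *cc, uint64_t cycles)
{
	return (cycles * cc->mult) >> cc->shift;
}

/* Conversion that carries the discarded low-order bits over in *frac. */
static uint64_t cyc2ns_frac(const struct cyc_conv *cc, uint64_t cycles,
			    uint64_t *frac)
{
	uint64_t mask, ns;

	assert(cc->shift < 64);
	mask = (1ULL << cc->shift) - 1;
	ns = cycles * cc->mult + *frac;
	*frac = ns & mask;	/* remember what the shift throws away */
	return ns >> cc->shift;
}
```

With mult = 1 and shift = 1 (half a nanosecond per cycle), converting one cycle at a time yields 0 ns forever in the lossy version, while the fractional version yields 1 ns on every second call.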

Reported-by: Janusz Użycki
Signed-off-by: Richard Cochran
Signed-off-by: David S. Miller

Richard Cochran
2014-12-31 07:29:27 +0800

28 Dec, 2014

2 commits

dbaff3094 kvm: warn on more invariant breakage

Modifying a non-existent slot is not allowed. Also check that the
first loop doesn't move a deleted slot beyond the used part of
the mslots array.

Signed-off-by: Paolo Bonzini

Paolo Bonzini
2014-12-28 17:01:25 +0800
efbeec709 kvm: fix sorting of memslots with base_gfn == 0

Before commit 0e60b0799fed (kvm: change memslot sorting rule from size
to GFN, 2014-12-01), the memslots' sorting key was npages, meaning
that a valid memslot couldn't have its sorting key equal to zero.
On the other hand, a valid memslot can have base_gfn == 0, and invalid
memslots are identified by base_gfn == npages == 0.

Because of this, commit 0e60b0799fed broke the invariant that invalid
memslots are at the end of the mslots array. When a memslot with
base_gfn == 0 was created, any invalid memslots before it were left
in place.
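
The invariant itself can be expressed as a small check. This is a sketch with hypothetical names (struct memslot_model, invalid_slots_at_end); it only tests that no valid slot follows an invalid one and says nothing about how the kernel orders the valid slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced memslot: only the two fields discussed above. */
struct memslot_model {
	uint64_t base_gfn;
	uint64_t npages;
};

/* A slot is in use when it covers at least one page. */
static bool slot_valid(const struct memslot_model *s)
{
	return s->npages != 0;
}

/* Return true if every invalid slot sits after all valid slots. */
static bool invalid_slots_at_end(const struct memslot_model *slots, size_t n)
{
	bool seen_invalid = false;

	assert(slots || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!slot_valid(&slots[i]))
			seen_invalid = true;
		else if (seen_invalid)
			return false;	/* valid slot after an invalid one */
	}
	return true;
}
```

Note that a valid slot with base_gfn == 0 is distinguishable from an invalid one only through npages, which is why base_gfn alone is an ambiguous sorting key.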

This can be fixed by changing the insertion to use a ">=" comparison
instead of "
Reported-by: Andy Lutomirski
Tested-by: Jamie Heilman
Signed-off-by: Paolo Bonzini

Paolo Bonzini
2014-12-28 17:01:17 +0800

19 Dec, 2014

1 commit

66dcff86b Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM update from Paolo Bonzini:
"3.19 changes for KVM:

- spring cleaning: removed support for IA64, and for hardware-
assisted virtualization on the PPC970

- ARM, PPC, s390 all had only small fixes

For x86:
- small performance improvements (though only on weird guests)
- usual round of hardware-compliancy fixes from Nadav
- APICv fixes
- XSAVES support for hosts and guests. XSAVES hosts were broken
because the (non-KVM) XSAVES patches inadvertently changed the KVM
userspace ABI whenever XSAVES was enabled; hence, this part is
going to stable. Guest support is just a matter of exposing the
feature and CPUID leaves support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits)
KVM: move APIC types to arch/x86/
KVM: PPC: Book3S: Enable in-kernel XICS emulation by default
KVM: PPC: Book3S HV: Improve H_CONFER implementation
KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
KVM: PPC: Book3S HV: Remove code for PPC970 processors
KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function
arch: powerpc: kvm: book3s_pr.c: Remove unused function
arch: powerpc: kvm: book3s.c: Remove some unused functions
arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
KVM: PPC: Book3S HV: ptes are big endian
KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
KVM: PPC: Book3S HV: Fix KSM memory corruption
KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
KVM: PPC: Book3S HV: Fix computation of tlbie operand
KVM: PPC: Book3S HV: Add missing HPTE unlock
KVM: PPC: BookE: Improve irq inject tracepoint
arm/arm64: KVM: Require in-kernel vgic for the arch timers
...

Linus Torvalds
2014-12-19 08:05:28 +0800

15 Dec, 2014

3 commits

333bce5aa Merge tag 'kvm-arm-for-3.19-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot
problems, clarifies VCPU init, and fixes a regression concerning the
VGIC init flow.

Conflicts:
arch/ia64/kvm/kvm-ia64.c [deleted in HEAD and modified in kvmarm]

Paolo Bonzini
2014-12-15 20:06:40 +0800
05971120f arm/arm64: KVM: Require in-kernel vgic for the arch timers

It is currently possible to run a VM with architected timers support
without creating an in-kernel VGIC, which will result in interrupts from
the virtual timer going nowhere.

To address this issue, move the architected timers initialization to the
time when we run a VCPU for the first time, and then only initialize
(and enable) the architected timers if we have a properly created and
initialized in-kernel VGIC.

When injecting interrupts from the virtual timer to the vgic, the
current setup should ensure that this never calls an on-demand init of
the VGIC, which is the only call path that could return an error from
kvm_vgic_inject_irq(), so capture the return value and raise a warning
if there's an error there.

We also change the kvm_timer_init() function from returning an int to be
a void function, since the function always succeeds.

Reviewed-by: Marc Zyngier
Signed-off-by: Christoffer Dall

Christoffer Dall
2014-12-15 18:50:42 +0800
ca7d9c829 arm/arm64: KVM: Initialize the vgic on-demand when injecting IRQs

Userspace assumes that it can wire up IRQ injections after having
created all VCPUs and after having created the VGIC, but potentially
before starting the first VCPU. This can currently lead to lost IRQs
because the state of that IRQ injection is not stored anywhere and we
don't return an error to userspace.

We haven't seen this problem manifest itself yet, presumably because
guests reset the devices on boot, but this could cause issues with
migration and other non-standard startup configurations.

Reviewed-by: Marc Zyngier
Signed-off-by: Christoffer Dall

Christoffer Dall
2014-12-15 18:36:21 +0800

13 Dec, 2014

3 commits

1f57be289 arm/arm64: KVM: Add (new) vgic_initialized macro

Some code paths will need to check to see if the internal state of the
vgic has been initialized (such as when creating new VCPUs), so
introduce such a macro that checks the nr_cpus field which is set when
the vgic has been initialized.

Also set nr_cpus = 0 in kvm_vgic_destroy, because the error path in
vgic_init() will call this function, and code should never erroneously
assume the vgic to be properly initialized after an error.

Acked-by: Marc Zyngier
Reviewed-by: Eric Auger
Signed-off-by: Christoffer Dall

Christoffer Dall
2014-12-13 21:17:10 +0800
c52edf5f8 arm/arm64: KVM: Rename vgic_initialized to vgic_ready

The vgic_initialized() macro currently returns the state of the
vgic->ready flag, which indicates if the vgic is ready to be used when
running a VM, not specifically if its internal state has been
initialized.

Rename the macro accordingly in preparation for a more nuanced
initialization flow.

Acked-by: Marc Zyngier
Reviewed-by: Eric Auger
Signed-off-by: Christoffer Dall

Christoffer Dall
2014-12-13 21:17:05 +0800
6d3cfbe21 arm/arm64: KVM: vgic: move reset initialization into vgic_init_maps()

VGIC initialization currently happens in three phases:
(1) kvm_vgic_create() (triggered by userspace GIC creation)
(2) vgic_init_maps() (triggered by userspace GIC register read/write
requests, or from kvm_vgic_init() if not already run)
(3) kvm_vgic_init() (triggered by first VM run)

We were doing initialization of some state to correspond with the
state of a freshly-reset GIC in kvm_vgic_init(); this is too late,
since it will overwrite changes made by userspace using the
register access APIs before the VM is run. Move this initialization
earlier, into the vgic_init_maps() phase.

This fixes a bug where QEMU could successfully restore a saved
VM state snapshot into a VM that had already been run, but could
not restore it "from cold" using the -loadvm command line option
(the symptoms being that the restored VM would run but interrupts
were ignored).

Finally, rename vgic_init_maps to vgic_init and rename kvm_vgic_init to
kvm_vgic_map_resources.

[ This patch is originally written by Peter Maydell, but I have
modified it somewhat heavily, renaming various bits and moving code
around. If something is broken, I am to be blamed. - Christoffer ]

Acked-by: Marc Zyngier
Reviewed-by: Eric Auger
Signed-off-by: Peter Maydell
Signed-off-by: Christoffer Dall

Peter Maydell
2014-12-13 21:15:52 +0800