14 Feb, 2015

1 commit

  • Pull KVM update from Paolo Bonzini:
    "Fairly small update, but there are some interesting new features.

    Common:
    Optional support for adding a small amount of polling on each HLT
    instruction executed in the guest (or equivalent for other
    architectures). This can improve latency up to 50% on some
    scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This
    also has to be enabled manually for now, but the plan is to
    auto-tune this in the future.

    ARM/ARM64:
    The highlights are support for GICv3 emulation and dirty page
    tracking.

    s390:
    Several optimizations and bugfixes. Also a first: a feature
    exposed by KVM (UUID and long guest name in /proc/sysinfo) before
    it is available in IBM's hypervisor! :)

    MIPS:
    Bugfixes.

    x86:
    Support for PML (page modification logging, a new feature in
    Broadwell Xeons that speeds up dirty page tracking), nested
    virtualization improvements (nested APICv---a nice optimization),
    and the usual round of emulation fixes.

    There is also a new option to reduce latency of the TSC deadline
    timer in the guest; this needs to be tuned manually.

    Some commits are common between this pull and Catalin's; I see you
    have already included his tree.

    Powerpc:
    Nothing yet.

    The KVM/PPC changes will come in through the PPC maintainers,
    because I haven't received them yet and I might end up being
    offline for some part of next week."

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
    KVM: ia64: drop kvm.h from installed user headers
    KVM: x86: fix build with !CONFIG_SMP
    KVM: x86: emulate: correct page fault error code for NoWrite instructions
    KVM: Disable compat ioctl for s390
    KVM: s390: add cpu model support
    KVM: s390: use facilities and cpu_id per KVM
    KVM: s390/CPACF: Choose crypto control block format
    s390/kernel: Update /proc/sysinfo file with Extended Name and UUID
    KVM: s390: reenable LPP facility
    KVM: s390: floating irqs: fix user triggerable endless loop
    kvm: add halt_poll_ns module parameter
    kvm: remove KVM_MMIO_SIZE
    KVM: MIPS: Don't leak FPU/DSP to guest
    KVM: MIPS: Disable HTW while in guest
    KVM: nVMX: Enable nested posted interrupt processing
    KVM: nVMX: Enable nested virtual interrupt delivery
    KVM: nVMX: Enable nested apic register virtualization
    KVM: nVMX: Make nested control MSRs per-cpu
    KVM: nVMX: Enable nested virtualize x2apic mode
    KVM: nVMX: Prepare for using hardware MSR bitmap
    ...

    Linus Torvalds
     

12 Feb, 2015

1 commit

  • Use the more generic get_user_pages_unlocked which has the additional
    benefit of passing FAULT_FLAG_ALLOW_RETRY at the very first page fault
    (which allows the first page fault in an unmapped area to be always able
    to block indefinitely by being allowed to release the mmap_sem).

    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Andres Lagar-Cavilla
    Reviewed-by: Kirill A. Shutemov
    Cc: Peter Feiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

09 Feb, 2015

1 commit

  • We never had a 31bit QEMU/kuli running. We would need to review several
    ioctls to check if this creates holes, bugs or whatever to make it work.
    Let's just disable compat support for KVM on s390.

    Signed-off-by: Christian Borntraeger
    Acked-by: Paolo Bonzini

    Christian Borntraeger
     

06 Feb, 2015

1 commit

  • This patch introduces a new module parameter for the KVM module; when it
    is present, KVM attempts a bit of polling on every HLT before scheduling
    itself out via kvm_vcpu_block.

    This parameter helps a lot for latency-bound workloads---in particular
    I tested it with O_DSYNC writes with a battery-backed disk in the host.
    In this case, writes are fast (because the data doesn't have to go all
    the way to the platters) but they cannot be merged by either the host or
    the guest. KVM's performance here is usually around 30% of bare metal,
    or 50% if you use cache=directsync or cache=writethrough (these
    parameters prevent the guest from sending pointless flush requests,
    and at the same time they are not slow because of the battery-backed
    cache).
    The bad performance happens because on every halt the host CPU decides
    to halt itself too. When the interrupt comes, the vCPU thread is then
    migrated to a new physical CPU, and in general the latency is horrible
    because the vCPU thread has to be scheduled back in.

    With this patch performance reaches 60-65% of bare metal and, more
    importantly, 99% of what you get if you use idle=poll in the guest. This
    means that the tunable gets rid of this particular bottleneck, and more
    work can be done to improve performance in the kernel or QEMU.

    Of course there is some price to pay; every time an otherwise idle vCPU
    receives an interrupt, it will poll unnecessarily and thus impose a
    little load on the host. The above results were obtained with a mostly
    random value of the parameter (500000), and the load was around
    1.5-2.5% CPU usage on one of the host's cores for each idle guest vCPU.

    The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
    that can be used to tune the parameter. It counts how many HLT
    instructions received an interrupt during the polling period; each
    successful poll saves Linux from scheduling the VCPU thread out and back
    in, and may also avoid a likely trip to C1 and back for the physical CPU.

    While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
    Of these halts, almost all are failed polls. During the benchmark,
    instead, basically all halts end within the polling period, except a more
    or less constant stream of 50 per second coming from vCPUs that are not
    running the benchmark. The wasted time is thus very low. Things may
    be slightly different for Windows VMs, which have a ~10 ms timer tick.

    The effect is also visible on Marcelo's recently-introduced latency
    test for the TSC deadline timer. Though of course a non-RT kernel has
    awful latency bounds, the latency of the timer is around 8000-10000 clock
    cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
    deadline timer, thus, the effect is both a smaller average latency and
    a smaller variance.

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
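The poll-then-block decision described above can be modelled in a few lines. This is a userspace simulation, not the kvm_vcpu_block() code. The clock is simulated, and `struct poll_sim`, `halt_with_poll()` and the fixed polling step are hypothetical names chosen for illustration. It shows how a non-zero halt_poll_ns turns an interrupt that arrives shortly after HLT into a successful poll. It also shows how an idle vCPU burns the whole window before sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one halting vCPU; times are simulated nanoseconds. */
struct poll_sim {
	uint64_t now_ns;               /* simulated clock */
	uint64_t irq_at_ns;            /* when the next interrupt becomes pending */
	uint64_t halt_successful_poll; /* halts that ended inside the poll window */
	uint64_t halt_blocked;         /* halts that fell back to sleeping */
	uint64_t poll_ns_spent;        /* host CPU time burned while polling */
};

static bool irq_pending(const struct poll_sim *s)
{
	return s->now_ns >= s->irq_at_ns;
}

/*
 * Handle one HLT: if polling is enabled, spin for at most halt_poll_ns,
 * re-checking for a pending interrupt every poll_step_ns. If none shows up,
 * "block" by jumping the clock to the interrupt (the schedule-out path).
 * Returns true if the poll succeeded.
 */
static bool halt_with_poll(struct poll_sim *s, uint64_t halt_poll_ns,
			   uint64_t poll_step_ns)
{
	const uint64_t start = s->now_ns;

	assert(poll_step_ns > 0);
	if (halt_poll_ns > 0) {
		for (;;) {
			if (irq_pending(s)) {
				s->halt_successful_poll++;
				s->poll_ns_spent += s->now_ns - start;
				return true;
			}
			if (s->now_ns - start >= halt_poll_ns)
				break;
			s->now_ns += poll_step_ns;
		}
		s->poll_ns_spent += s->now_ns - start;
	}
	/* Poll window expired (or polling disabled): sleep until the IRQ. */
	s->halt_blocked++;
	if (s->now_ns < s->irq_at_ns)
		s->now_ns = s->irq_at_ns;
	return false;
}
```

With halt_poll_ns = 500000, an interrupt 3 us after HLT ends the halt inside the window. An interrupt 10 ms away costs the full 500 us of polling before the vCPU sleeps anyway, which is the price the commit describes for idle vCPUs.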
     

29 Jan, 2015

1 commit


28 Jan, 2015

1 commit


23 Jan, 2015

1 commit

  • The dirty page logging series introduced both
    HAVE_KVM_ARCH_DIRTY_LOG_PROTECT and KVM_GENERIC_DIRTYLOG_READ_PROTECT
    config symbols, but only KVM_GENERIC_DIRTYLOG_READ_PROTECT is used.
    Just remove the unused one.

    (The config symbol was renamed during the development of the patch
    series and the old name just crept in by accident.)

    Reported-by: Paul Bolle
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     

21 Jan, 2015

18 commits

  • Although the GIC architecture requires us to map the MMIO regions
    only at page aligned addresses, we currently do not enforce this from
    the kernel side.
    Restrict any vGICv2 regions to be 4K aligned and any GICv3 regions
    to be 64K aligned. Document this requirement.

    Signed-off-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • With all of the GICv3 code in place now, we allow userland to ask the
    kernel to use a virtual GICv3 in the guest.
    Also we provide the necessary support for guests setting the memory
    addresses for the virtual distributor and redistributors.
    This requires some userland code to make use of that feature and
    explicitly ask for a virtual GICv3.
    Document that KVM_CREATE_IRQCHIP only works for GICv2, but is
    considered legacy and using KVM_CREATE_DEVICE is preferred.

    Signed-off-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • With all the necessary GICv3 emulation code in place, we can now
    connect the code to the GICv3 backend in the kernel.
    The LR register handling is different depending on the emulated GIC
    model, so provide different implementations for each.
    Also allow non-v2-compatible GICv3 implementations (which don't
    provide MMIO regions for the virtual CPU interface in the DT), but
    restrict those hosts to support GICv3 guests only.
    If the device tree provides a GICv2 compatible GICV resource entry,
    but that one is faulty, just disable the GICv2 emulation and let the
    user use at least the GICv3 emulation for guests.
    To provide proper support for the legacy KVM_CREATE_IRQCHIP ioctl,
    note virtual GICv2 compatibility in struct vgic_params and use it
    on creating a VGICv2.

    Signed-off-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • While the generation of a (virtual) inter-processor interrupt (SGI)
    on a GICv2 works by writing to an MMIO register, GICv3 uses the system
    register ICC_SGI1R_EL1 to trigger them.
    Add a trap handler function that calls the new SGI register handler
    in the GICv3 code. As ICC_SRE_EL1.SRE at this point is still always 0,
    this will not trap yet, but will only be used later when all the data
    structures have been initialized properly.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • With everything separated and prepared, we implement a model of a
    GICv3 distributor and redistributors by using the existing framework
    to provide handler functions for each register group.

    Currently we limit the emulation to a model enforcing a single
    security state, with SRE==1 (forcing system register access) and
    ARE==1 (allowing more than 8 VCPUs).

    We share some of the functions provided for GICv2 emulation, but take
    the different ways of addressing (v)CPUs into account.
    Save and restore is currently not implemented.

    Similar to the split-off of the GICv2 specific code, the new emulation
    code goes into a new file (vgic-v3-emul.c).

    Signed-off-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • For a GICv2 there is always only one (v)CPU involved: the one that
    does the access. On a GICv3 the access to a CPU redistributor is
    memory-mapped, but not banked, so the (v)CPU affected is determined by
    looking at the MMIO address region being accessed.
    To allow passing the affected CPU into the accessors later, extend
    struct kvm_exit_mmio to add an opaque private pointer parameter.
    The current GICv2 emulation just does not use it.

    Signed-off-by: Andre Przywara
    Acked-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • vgic.c is currently a mixture of generic vGIC emulation code and
    functions specific to emulating a GICv2. To ease the addition of
    GICv3, split off strictly v2 specific parts into a new file
    vgic-v2-emul.c.

    Signed-off-by: Andre Przywara
    Acked-by: Christoffer Dall

    -------
    As the diff isn't always obvious here (and to aid eventual rebases),
    here is a list of high-level changes done to the code:
    * added new file to respective arm/arm64 Makefiles
    * moved GICv2 specific functions to vgic-v2-emul.c:
    - handle_mmio_misc()
    - handle_mmio_set_enable_reg()
    - handle_mmio_clear_enable_reg()
    - handle_mmio_set_pending_reg()
    - handle_mmio_clear_pending_reg()
    - handle_mmio_priority_reg()
    - vgic_get_target_reg()
    - vgic_set_target_reg()
    - handle_mmio_target_reg()
    - handle_mmio_cfg_reg()
    - handle_mmio_sgi_reg()
    - vgic_v2_unqueue_sgi()
    - read_set_clear_sgi_pend_reg()
    - write_set_clear_sgi_pend_reg()
    - handle_mmio_sgi_set()
    - handle_mmio_sgi_clear()
    - vgic_v2_handle_mmio()
    - vgic_get_sgi_sources()
    - vgic_dispatch_sgi()
    - vgic_v2_queue_sgi()
    - vgic_v2_map_resources()
    - vgic_v2_init()
    - vgic_v2_add_sgi_source()
    - vgic_v2_init_model()
    - vgic_v2_init_emulation()
    - handle_cpu_mmio_misc()
    - handle_mmio_abpr()
    - handle_cpu_mmio_ident()
    - vgic_attr_regs_access()
    - vgic_create() (renamed to vgic_v2_create())
    - vgic_destroy() (renamed to vgic_v2_destroy())
    - vgic_has_attr() (renamed to vgic_v2_has_attr())
    - vgic_set_attr() (renamed to vgic_v2_set_attr())
    - vgic_get_attr() (renamed to vgic_v2_get_attr())
    - struct kvm_mmio_range vgic_dist_ranges[]
    - struct kvm_mmio_range vgic_cpu_ranges[]
    - struct kvm_device_ops kvm_arm_vgic_v2_ops {}

    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • vgic.c is currently a mixture of generic vGIC emulation code and
    functions specific to emulating a GICv2. To ease the addition of
    GICv3 later, we create new header file vgic.h, which holds constants
    and prototypes of commonly used functions.
    Rename some identifiers to avoid name space clutter.
    I removed the long-standing comment about using the kvm_io_bus API
    to tackle the GIC register ranges, as it wouldn't be a win for us
    anymore.

    Signed-off-by: Andre Przywara
    Acked-by: Christoffer Dall

    -------
    As the diff isn't always obvious here (and to aid eventual rebases),
    here is a list of high-level changes done to the code:
    * moved definitions and prototypes from vgic.c to vgic.h:
    - VGIC_ADDR_UNDEF
    - ACCESS_{READ,WRITE}_*
    - vgic_init()
    - vgic_update_state()
    - vgic_kick_vcpus()
    - vgic_get_vmcr()
    - vgic_set_vmcr()
    - struct mmio_range {} (renamed to struct kvm_mmio_range)
    * removed static keyword and exported prototype in vgic.h:
    - vgic_bitmap_get_reg()
    - vgic_bitmap_set_irq_val()
    - vgic_bitmap_get_shared_map()
    - vgic_bytemap_get_reg()
    - vgic_dist_irq_set_pending()
    - vgic_dist_irq_clear_pending()
    - vgic_cpu_irq_clear()
    - vgic_reg_access()
    - handle_mmio_raz_wi()
    - vgic_handle_enable_reg()
    - vgic_handle_set_pending_reg()
    - vgic_handle_clear_pending_reg()
    - vgic_handle_cfg_reg()
    - vgic_unqueue_irqs()
    - find_matching_range() (renamed to vgic_find_range)
    - vgic_handle_mmio_range()
    - vgic_update_state()
    - vgic_get_vmcr()
    - vgic_set_vmcr()
    - vgic_queue_irq()
    - vgic_kick_vcpus()
    - vgic_init()
    - vgic_v2_init_emulation()
    - vgic_has_attr_regs()
    - vgic_set_common_attr()
    - vgic_get_common_attr()
    - vgic_destroy()
    - vgic_create()
    * moved functions to vgic.h (static inline):
    - mmio_data_read()
    - mmio_data_write()
    - is_in_range()

    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • vgic_set_attr() and vgic_get_attr() contain both code specific to
    the emulated GIC and code for the userland-facing, generic
    part of the GIC.
    Split the guest GIC facing code off from the generic part to allow
    easier splitting later.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • The MMIO accessors for GICD_I[CS]ENABLER, GICD_I[CS]PENDR and
    GICD_ICFGR behave very similarly for GICv2 and GICv3, although the way
    the affected VCPU is determined differs.
    Since we need them to access the registers from three different
    places in the future, we factor out a generic, backend-facing
    implementation and use small wrappers in the current GICv2 emulation.
    This will ease adding GICv3 accessors later.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • ICC_SRE_EL1 is a system register allowing msr/mrs accesses to the
    GIC CPU interface for EL1 (guests). Currently we force it to 0, but
    for proper GICv3 support we have to allow guests to use it (depending
    on their selected virtual GIC model).
    So add ICC_SRE_EL1 to the list of saved/restored registers on a
    world switch, but actually disallow a guest to change it by only
    restoring a fixed, once-initialized value.
    This value depends on the GIC model userland has chosen for a guest.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • Currently the maximum number of vCPUs supported is a global value
    limited by the used GIC model. GICv3 will lift this limit, but we
    still need to observe it for guests using GICv2.
    So the maximum number of vCPUs is a per-VM value, depending on the
    GIC model the guest uses.
    Store and check the value in struct kvm_arch, but keep it down to
    8 for now.

    Signed-off-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • To check whether the vGIC was already initialized, we currently check
    the GICH base address for not being NULL. Since with GICv3 we may
    get along without this address, let's use the irqchip_in_kernel()
    function to detect an already initialized vGIC.

    Signed-off-by: Andre Przywara
    Acked-by: Christoffer Dall
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • Currently we unconditionally register the GICv2 emulation device
    during the host's KVM initialization. Since with GICv3 support we
    may end up with only v2 or only v3 or both supported, we move the
    registration into the GIC probing function, where we will later know
    which combination is valid.

    Signed-off-by: Andre Przywara
    Acked-by: Christoffer Dall
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • Currently we only have one virtual GIC model supported, so all guests
    use the same emulation code. With the addition of another model we
    end up with different guests using potentially different vGIC models,
    so we have to split up some functions to be per VM.
    Introduce a vgic_vm_ops struct to hold function pointers for those
    functions that are different and provide the necessary code to
    initialize them.
    Also split up the vgic_init() function to separate out VGIC model
    specific functionality into a separate function, which will later be
    different for a GICv3 model.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • Some GICv3 registers can and will be accessed as 64 bit registers.
    Currently the register handling code can only deal with 32 bit
    accesses, so we do two consecutive calls to cover this.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • Currently we only need to deal with one MMIO region for the GIC
    emulation (the GICv2 distributor), but we soon need to extend this.
    Refactor the existing code to allow easier addition of different
    ranges without code duplication.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Andre Przywara
     
  • With the introduction of a second emulated GIC model we need to let
    userspace specify the GIC model to use for each VM. Pass the
    userspace provided value down into the vGIC code and store it there
    to differentiate later.

    Signed-off-by: Andre Przywara
    Acked-by: Christoffer Dall
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Andre Przywara
     

16 Jan, 2015

2 commits

  • kvm_get_dirty_log() provides generic handling of dirty bitmap, currently reused
    by several architectures. Building on that we introduce
    kvm_get_dirty_log_protect() adding write protection to mark these pages dirty
    for future write access, before next KVM_GET_DIRTY_LOG ioctl call from user
    space.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Mario Smarduch

    Mario Smarduch
     
  • Allow architectures to override the generic kvm_flush_remote_tlbs()
    function via HAVE_KVM_ARCH_TLB_FLUSH_ALL. ARMv7 will need this to
    provide its own TLB flush interface.

    Reviewed-by: Christoffer Dall
    Reviewed-by: Marc Zyngier
    Reviewed-by: Paolo Bonzini
    Signed-off-by: Mario Smarduch

    Mario Smarduch
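The get-and-protect step added by kvm_get_dirty_log_protect() can be illustrated with a plain bitmap. Each word is read and cleared, then copied out for userspace. Every page it reported is write-protected, so the next guest write faults and marks the page dirty again. This is a userspace sketch with hypothetical names (`get_dirty_log_protect()`, `guest_write()`, and a `write_protected` array standing in for the architecture's page tables). The real function works on a memslot and hands the write protection to the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 64

/*
 * Copy the dirty bitmap out for userspace, clear it, and write-protect
 * every page that was reported dirty. Returns true if anything was dirty,
 * i.e. when the caller would also have to flush remote TLBs.
 */
static bool get_dirty_log_protect(uint64_t *dirty, uint64_t *out,
				  bool *write_protected, size_t nwords)
{
	bool any = false;

	assert(dirty != NULL && out != NULL && write_protected != NULL);
	for (size_t i = 0; i < nwords; i++) {
		/* Read-and-clear; concurrent writers would need an atomic xchg. */
		uint64_t mask = dirty[i];

		dirty[i] = 0;
		out[i] = mask;
		if (!mask)
			continue;
		any = true;
		for (unsigned int bit = 0; bit < BITS_PER_WORD; bit++)
			if (mask & (1ULL << bit))
				write_protected[i * BITS_PER_WORD + bit] = true;
	}
	return any;
}

/* A guest write to a protected page faults: mark it dirty, make it writable. */
static void guest_write(uint64_t *dirty, bool *write_protected, size_t gfn)
{
	if (write_protected[gfn]) {
		write_protected[gfn] = false;
		dirty[gfn / BITS_PER_WORD] |= 1ULL << (gfn % BITS_PER_WORD);
	}
}
```

The boolean result is where an architecture-specific flush fits in. After pages are write-protected, stale writable TLB entries must go, which is why an override such as HAVE_KVM_ARCH_TLB_FLUSH_ALL matters to this path.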
     

11 Jan, 2015

2 commits

  • Since the advent of VGIC dynamic initialization, the VGIC is
    initialized quite late: on the first vcpu run, or "on demand" when
    injecting an IRQ or when the guest sets its registers.

    This initialization could be requested explicitly much earlier
    by user space, as soon as it has provided the requested
    dimensioning parameters.

    This patch adds a new entry to the VGIC KVM device that allows
    the user to manually request the VGIC init:
    - a new KVM_DEV_ARM_VGIC_GRP_CTRL group is introduced.
    - Its first attribute is KVM_DEV_ARM_VGIC_CTRL_INIT

    The rationale behind introducing a group is to be able to add other
    controls later on, if needed.

    Signed-off-by: Eric Auger
    Signed-off-by: Christoffer Dall

    Eric Auger
     
  • To be more explicit on vgic initialization failure, -ENODEV is
    returned by vgic_init when no online vcpus can be found at init.

    Signed-off-by: Eric Auger
    Signed-off-by: Christoffer Dall

    Eric Auger
     

09 Jan, 2015

1 commit

  • Several hypervisors need the MSR auto load/restore feature.
    We read MSRs from the VM-entry MSR load area specified by L1,
    and load them via kvm_set_msr in the nested entry.
    When a nested exit occurs, we get MSRs via kvm_get_msr, writing
    them to L1's MSR store area. After this, we read MSRs from the VM-exit
    MSR load area, and load them via kvm_set_msr.

    Signed-off-by: Wincy Van
    Signed-off-by: Paolo Bonzini

    Wincy Van
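The load/store flow can be sketched as follows. The 16-byte entry layout (32-bit MSR index, 32 reserved bits, 64-bit value) is the VMX MSR-area format. Everything else is a hypothetical model: `struct msr_file` and the small index range it accepts, `nested_load_msrs()` and `nested_store_msrs()`, rejecting entries with reserved bits set, and returning the 1-based position of a failing entry. kvm_set_msr/kvm_get_msr are replaced by array accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* VMX MSR-area entry: 32-bit index, 32 reserved bits, 64-bit value. */
struct vmx_msr_entry {
	uint32_t index;
	uint32_t reserved;
	uint64_t value;
};

#define NR_SIM_MSRS 16

/* Hypothetical stand-in for the vCPU's MSR state. */
struct msr_file {
	uint64_t val[NR_SIM_MSRS];
};

static int set_msr(struct msr_file *f, uint32_t index, uint64_t value)
{
	if (index >= NR_SIM_MSRS)
		return -1;
	f->val[index] = value;
	return 0;
}

static int get_msr(const struct msr_file *f, uint32_t index, uint64_t *value)
{
	if (index >= NR_SIM_MSRS)
		return -1;
	*value = f->val[index];
	return 0;
}

/* Load every entry of an L1-provided VM-entry or VM-exit load area.
 * Returns 0, or the 1-based position of the first entry that failed. */
static size_t nested_load_msrs(struct msr_file *f,
			       const struct vmx_msr_entry *area, size_t count)
{
	assert(area != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (area[i].reserved != 0 ||
		    set_msr(f, area[i].index, area[i].value) != 0)
			return i + 1;
	}
	return 0;
}

/* On nested exit: read current values into L1's VM-exit store area. */
static size_t nested_store_msrs(const struct msr_file *f,
				struct vmx_msr_entry *area, size_t count)
{
	assert(area != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (get_msr(f, area[i].index, &area[i].value) != 0)
			return i + 1;
	}
	return 0;
}
```

Following the order in the commit message, a nested exit calls nested_store_msrs() on L1's store area first and then nested_load_msrs() on its VM-exit load area.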
     

31 Dec, 2014

1 commit

  • The current timecounter implementation will drop a variable amount
    of resolution, depending on the magnitude of the time delta. In
    other words, reading the clock too often or too close to a time
    stamp conversion will introduce errors into the time values. This
    patch fixes the issue by introducing a fractional nanosecond field
    that accumulates the low order bits.

    Reported-by: Janusz Użycki
    Signed-off-by: Richard Cochran
    Signed-off-by: David S. Miller

    Richard Cochran
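The resolution loss is easy to reproduce with a cycle counter whose period is not a whole number of nanoseconds. The accumulating conversion below follows the fix described above: it keeps the low `shift` bits of `cycles * mult` in a fractional field instead of discarding them. `struct cyclecounter_sim` and both helper names are hypothetical. The values mult = 1, shift = 1 (half a nanosecond per cycle) are chosen only to make the loss obvious.

```c
#include <assert.h>
#include <stdint.h>

/* ns = (cycles * mult) >> shift, as for a clocksource-style counter. */
struct cyclecounter_sim {
	uint32_t mult;
	uint32_t shift;
};

/* Old behaviour: the low 'shift' bits are thrown away on every read. */
static uint64_t cyc2ns_truncating(const struct cyclecounter_sim *cc,
				  uint64_t cycles)
{
	return (cycles * cc->mult) >> cc->shift;
}

/* New behaviour: carry the sub-nanosecond remainder in *frac. */
static uint64_t cyc2ns_frac(const struct cyclecounter_sim *cc,
			    uint64_t cycles, uint64_t *frac)
{
	uint64_t mask, ns;

	assert(cc->shift < 64);
	mask = (1ULL << cc->shift) - 1;
	ns = cycles * cc->mult + *frac;
	*frac = ns & mask;   /* low-order bits saved for the next read */
	return ns >> cc->shift;
}
```

Reading one cycle at a time ten times, the truncating version reports 0 ns in total. The accumulating version reports 5 ns, the true elapsed time, because each half nanosecond is carried into the next read.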
     

28 Dec, 2014

2 commits

  • Modifying a non-existent slot is not allowed. Also check that the
    first loop doesn't move a deleted slot beyond the used part of
    the mslots array.

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • Before commit 0e60b0799fed (kvm: change memslot sorting rule from size
    to GFN, 2014-12-01), the memslots' sorting key was npages, meaning
    that a valid memslot couldn't have its sorting key equal to zero.
    On the other hand, a valid memslot can have base_gfn == 0, and invalid
    memslots are identified by base_gfn == npages == 0.

    Because of this, commit 0e60b0799fed broke the invariant that invalid
    memslots are at the end of the mslots array. When a memslot with
    base_gfn == 0 was created, any invalid memslots before it were left
    in place.

    This can be fixed by changing the insertion to use a ">=" comparison
    instead of "
    Reported-by: Andy Lutomirski
    Tested-by: Jamie Heilman
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
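The broken invariant can be reproduced with a simplified model of the slot array. The model sorts slots by descending base_gfn and keeps invalid entries (base_gfn == npages == 0) at the tail. This is not update_memslots(): `struct slot`, `insert_slot()`, `unused_slots_at_end()` and the `use_ge` switch are hypothetical, and only the creation of a slot is modelled. The new slot starts at its current index and is moved toward the front. With a strict ">" test, a slot at base_gfn 0 never moves past the invalid entries ahead of it; with ">=" it does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct slot {
	uint64_t base_gfn;
	uint64_t npages; /* 0 means invalid/unused */
};

/*
 * Move the newly created slot at index 'pos' toward the front of an array
 * sorted by descending base_gfn, shifting entries back. Returns final index.
 */
static size_t insert_slot(struct slot *s, size_t pos, bool use_ge)
{
	struct slot created = s[pos];

	assert(created.npages != 0); /* only creation is modelled */
	while (pos > 0 &&
	       (use_ge ? created.base_gfn >= s[pos - 1].base_gfn
		       : created.base_gfn > s[pos - 1].base_gfn)) {
		s[pos] = s[pos - 1];
		pos--;
	}
	s[pos] = created;
	return pos;
}

/* Invariant: no valid slot may appear after an invalid one. */
static bool unused_slots_at_end(const struct slot *s, size_t n)
{
	bool seen_unused = false;

	for (size_t i = 0; i < n; i++) {
		if (s[i].npages == 0)
			seen_unused = true;
		else if (seen_unused)
			return false;
	}
	return true;
}
```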
     

19 Dec, 2014

1 commit

  • Pull KVM update from Paolo Bonzini:
    "3.19 changes for KVM:

    - spring cleaning: removed support for IA64, and for hardware-
    assisted virtualization on the PPC970

    - ARM, PPC, s390 all had only small fixes

    For x86:
    - small performance improvements (though only on weird guests)
    - usual round of hardware-compliancy fixes from Nadav
    - APICv fixes
    - XSAVES support for hosts and guests. XSAVES hosts were broken
    because the (non-KVM) XSAVES patches inadvertently changed the KVM
    userspace ABI whenever XSAVES was enabled; hence, this part is
    going to stable. Guest support is just a matter of exposing the
    feature and CPUID leaves support"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits)
    KVM: move APIC types to arch/x86/
    KVM: PPC: Book3S: Enable in-kernel XICS emulation by default
    KVM: PPC: Book3S HV: Improve H_CONFER implementation
    KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
    KVM: PPC: Book3S HV: Remove code for PPC970 processors
    KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
    KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
    arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function
    arch: powerpc: kvm: book3s_pr.c: Remove unused function
    arch: powerpc: kvm: book3s.c: Remove some unused functions
    arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
    KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
    KVM: PPC: Book3S HV: ptes are big endian
    KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
    KVM: PPC: Book3S HV: Fix KSM memory corruption
    KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
    KVM: PPC: Book3S HV: Fix computation of tlbie operand
    KVM: PPC: Book3S HV: Add missing HPTE unlock
    KVM: PPC: BookE: Improve irq inject tracepoint
    arm/arm64: KVM: Require in-kernel vgic for the arch timers
    ...

    Linus Torvalds
     

15 Dec, 2014

3 commits

  • …git/kvmarm/kvmarm into HEAD

    Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot
    problems, clarifies VCPU init, and fixes a regression concerning the
    VGIC init flow.

    Conflicts:
    arch/ia64/kvm/kvm-ia64.c [deleted in HEAD and modified in kvmarm]

    Paolo Bonzini
     
  • It is currently possible to run a VM with architected timers support
    without creating an in-kernel VGIC, which will result in interrupts from
    the virtual timer going nowhere.

    To address this issue, move the architected timers initialization to the
    time when we run a VCPU for the first time, and then only initialize
    (and enable) the architected timers if we have a properly created and
    initialized in-kernel VGIC.

    When injecting interrupts from the virtual timer to the vgic, the
    current setup should ensure that this never calls an on-demand init of
    the VGIC, which is the only call path that could return an error from
    kvm_vgic_inject_irq(), so capture the return value and raise a warning
    if there's an error there.

    We also change the kvm_timer_init() function from returning an int to be
    a void function, since the function always succeeds.

    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • Userspace assumes that it can wire up IRQ injections after having
    created all VCPUs and after having created the VGIC, but potentially
    before starting the first VCPU. This can currently lead to lost IRQs
    because the state of that IRQ injection is not stored anywhere and we
    don't return an error to userspace.

    We haven't seen this problem manifest itself yet, presumably because
    guests reset the devices on boot, but this could cause issues with
    migration and other non-standard startup configurations.

    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
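The gating in the architected-timer commit above can be sketched as a first-run hook. `struct vm_sim`, `struct vcpu_sim`, `vcpu_first_run()`, `vgic_inject()` and `timer_fire()` are hypothetical, and the warning is a plain fprintf standing in for a kernel warning. The sketch shows only the ordering: the timer is enabled on the first run, and only when an in-kernel vGIC is initialized. A VM without one therefore never enables a timer whose interrupts would go nowhere, and a failed injection is reported rather than silently dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct vm_sim {
	bool vgic_initialized; /* in-kernel vGIC created and initialized */
	int injected;          /* interrupts delivered through the vGIC */
};

struct vcpu_sim {
	bool has_run;
	bool timer_enabled;
};

/* Called before entering the guest; does its work only once per vCPU. */
static void vcpu_first_run(const struct vm_sim *vm, struct vcpu_sim *vcpu)
{
	if (vcpu->has_run)
		return;
	vcpu->has_run = true;
	/* No vGIC, no timer: its interrupts would have nowhere to go. */
	vcpu->timer_enabled = vm->vgic_initialized;
}

/* Stand-in for injecting the timer interrupt; fails without a vGIC. */
static int vgic_inject(struct vm_sim *vm)
{
	if (!vm->vgic_initialized)
		return -1;
	vm->injected++;
	return 0;
}

/* Timer expiry: given the gating above, injection should never fail. */
static void timer_fire(struct vm_sim *vm, const struct vcpu_sim *vcpu)
{
	assert(vcpu->has_run); /* timers only run after the first vCPU run */
	if (!vcpu->timer_enabled)
		return;
	if (vgic_inject(vm) != 0)
		fprintf(stderr, "WARN: timer interrupt injection failed\n");
}
```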
     

13 Dec, 2014

3 commits

  • Some code paths will need to check to see if the internal state of the
    vgic has been initialized (such as when creating new VCPUs), so
    introduce such a macro that checks the nr_cpus field which is set when
    the vgic has been initialized.

    Also set nr_cpus = 0 in kvm_vgic_destroy, because the error path in
    vgic_init() will call this function, and code should never erroneously
    assume the vgic to be properly initialized after an error.

    Acked-by: Marc Zyngier
    Reviewed-by: Eric Auger
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • The vgic_initialized() macro currently returns the state of the
    vgic->ready flag, which indicates if the vgic is ready to be used when
    running a VM, not specifically if its internal state has been
    initialized.

    Rename the macro accordingly in preparation for a more nuanced
    initialization flow.

    Acked-by: Marc Zyngier
    Reviewed-by: Eric Auger
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • VGIC initialization currently happens in three phases:
    (1) kvm_vgic_create() (triggered by userspace GIC creation)
    (2) vgic_init_maps() (triggered by userspace GIC register read/write
    requests, or from kvm_vgic_init() if not already run)
    (3) kvm_vgic_init() (triggered by first VM run)

    We were doing initialization of some state to correspond with the
    state of a freshly-reset GIC in kvm_vgic_init(); this is too late,
    since it will overwrite changes made by userspace using the
    register access APIs before the VM is run. Move this initialization
    earlier, into the vgic_init_maps() phase.

    This fixes a bug where QEMU could successfully restore a saved
    VM state snapshot into a VM that had already been run, but could
    not restore it "from cold" using the -loadvm command line option
    (the symptoms being that the restored VM would run but interrupts
    were ignored).

    Finally, rename vgic_init_maps to vgic_init and rename kvm_vgic_init to
    kvm_vgic_map_resources.

    [ This patch is originally written by Peter Maydell, but I have
    modified it somewhat heavily, renaming various bits and moving code
    around. If something is broken, I am to be blamed. - Christoffer ]

    Acked-by: Marc Zyngier
    Reviewed-by: Eric Auger
    Signed-off-by: Peter Maydell
    Signed-off-by: Christoffer Dall

    Peter Maydell
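The reordered initialization in this list can be modelled as a small state machine. The predicates mirror the earlier commits: "initialized" tests nr_cpus, and "ready" tests a flag set once resources are mapped for running. The phases mirror the commit: create, then init (allocate and reset), then map resources on first run. Everything carries a `sim_` prefix because the bodies, `struct vgic_sim`, the single `priority` register and its reset value are a hypothetical userspace model. Because the reset values are applied in the init phase, a register value that userspace restores before the first run survives resource mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define RESET_PRIORITY 0xa0u /* hypothetical reset value for one register */

struct vgic_sim {
	bool created;
	int nr_cpus;       /* non-zero once internal state exists */
	bool ready;        /* set once resources are mapped for running */
	uint32_t priority; /* stand-in for guest-visible GIC register state */
};

static bool sim_vgic_initialized(const struct vgic_sim *v) { return v->nr_cpus != 0; }
static bool sim_vgic_ready(const struct vgic_sim *v) { return v->ready; }

/* Phase 1: userspace creates the GIC device. */
static void sim_vgic_create(struct vgic_sim *v)
{
	*v = (struct vgic_sim){ .created = true };
}

/* Phase 2: allocate state and apply reset values (idempotent). */
static int sim_vgic_init(struct vgic_sim *v, int online_vcpus)
{
	assert(v->created);
	if (sim_vgic_initialized(v))
		return 0;
	if (online_vcpus == 0)
		return -ENODEV;
	v->nr_cpus = online_vcpus;
	v->priority = RESET_PRIORITY;
	return 0;
}

/* Userspace register write (e.g. restoring a snapshot) triggers phase 2. */
static int sim_vgic_set_priority(struct vgic_sim *v, int online_vcpus,
				 uint32_t val)
{
	int ret = sim_vgic_init(v, online_vcpus);

	if (ret == 0)
		v->priority = val;
	return ret;
}

/* Phase 3: first vCPU run. No reset here, so restored state survives. */
static int sim_vgic_map_resources(struct vgic_sim *v, int online_vcpus)
{
	int ret = sim_vgic_init(v, online_vcpus);

	if (ret == 0)
		v->ready = true;
	return ret;
}

/* Error/teardown path: later checks must not see a stale "initialized". */
static void sim_vgic_destroy(struct vgic_sim *v)
{
	v->nr_cpus = 0;
	v->ready = false;
}
```

If the reset were instead done in sim_vgic_map_resources(), the restored 0x40 would be overwritten by the reset value on first run. That matches the "-loadvm from cold" symptom the commit fixes.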