11 Sep, 2015

1 commit

  • In the scope of the idle memory tracking feature, which is introduced by
    the following patch, we need to clear the referenced/accessed bit not only
    in primary, but also in secondary ptes. The latter is required in order
    to estimate the working set size (wss) of KVM VMs. At the same time we
    want to avoid flushing the TLB, because it is quite expensive and it
    won't really affect the final result.

    Currently, there is no function for clearing pte young bit that would meet
    our requirements, so this patch introduces one. To achieve that we have
    to add a new mmu-notifier callback, clear_young, since there is no method
    for testing-and-clearing a secondary pte without flushing the TLB. The
    new method is not mandatory and currently only implemented by KVM.

    Signed-off-by: Vladimir Davydov
    Reviewed-by: Andres Lagar-Cavilla
    Acked-by: Paolo Bonzini
    Cc: Minchan Kim
    Cc: Raghavendra K T
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Greg Thelen
    Cc: Michel Lespinasse
    Cc: David Rientjes
    Cc: Pavel Emelyanov
    Cc: Cyrill Gorcunov
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
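
The difference between clearing the young bit with and without a TLB flush, as described in the commit above, can be modelled in user space. This is only a toy sketch: the structure, the bit position and every `sketch_` name are assumptions made for illustration and are not the kernel's mmu-notifier API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position for the accessed/young flag in this model. */
#define SKETCH_PTE_YOUNG (1u << 5)

/* Toy model of one page table entry plus a counter of TLB flushes. */
struct sketch_mmu {
    uint32_t pte;
    unsigned int tlb_flushes;
};

/* Test and clear the young bit; never touches the TLB. */
static bool sketch_clear_young(struct sketch_mmu *m)
{
    bool young = (m->pte & SKETCH_PTE_YOUNG) != 0;

    m->pte &= ~SKETCH_PTE_YOUNG;
    return young;
}

/* Same test-and-clear, but pays for a flush when the bit was set. */
static bool sketch_clear_flush_young(struct sketch_mmu *m)
{
    bool young = sketch_clear_young(m);

    if (young)
        m->tlb_flushes++;
    return young;
}
```

In this model a caller that only samples access bits periodically, as idle memory tracking does, would use the first variant and leave the flush counter untouched.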
     

30 Jul, 2015

1 commit


29 Jul, 2015

1 commit


10 Jul, 2015

1 commit

  • If there are no assigned devices, the guest PAT is not providing
    any useful information and can be overridden to writeback; VMX
    always does this because it has the "IPAT" bit in its extended
    page table entries, but SVM does not have anything similar.
    Hook into VFIO and legacy device assignment so that they
    provide this information to KVM.

    Reviewed-by: Alex Williamson
    Tested-by: Joerg Roedel
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

04 Jul, 2015

1 commit

  • Commit 1cde2930e154 ("sched/preempt: Add static_key() to preempt_notifiers")
    had two problems. First, with the addition of the static_key, the
    preempt-notifier API needs to sleep; however, we need to hold off preemption
    while modifying the preempt notifier list, otherwise a preemption could
    observe an inconsistent list state. KVM correctly registers and
    unregisters preempt notifiers with preemption disabled, so the sleep
    caused dmesg splats.

    Second, KVM registers and unregisters preemption notifiers very often
    (in vcpu_load/vcpu_put). With a single uniprocessor guest the static key
    would move between 0 and 1 continuously, hitting the slow path on every
    userspace exit.

    To fix this, wrap the static_key inc/dec in a new API, and call it from
    KVM.

    Fixes: 1cde2930e154 ("sched/preempt: Add static_key() to preempt_notifiers")
    Reported-by: Pontus Fuchs
    Reported-by: Takashi Iwai
    Tested-by: Takashi Iwai
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Paolo Bonzini

    Peter Zijlstra
     

25 Jun, 2015

1 commit

  • Pull arm64 updates from Catalin Marinas:
    "Mostly refactoring/clean-up:

    - CPU ops and PSCI (Power State Coordination Interface) refactoring
    following the merging of the arm64 ACPI support, together with
    handling of Trusted (secure) OS instances

    - Using fixmap for permanent FDT mapping, removing the initial dtb
    placement requirements (within 512MB from the start of the kernel
    image). This required moving the FDT self reservation out of the
    memreserve processing

    - Idmap (1:1 mapping used for MMU on/off) handling clean-up

    - Removing flush_cache_all() - not safe on ARM unless the MMU is off.
    Last stages of CPU power down/up are handled by firmware already

    - "Alternatives" (run-time code patching) refactoring and support for
    immediate branch patching, GICv3 CPU interface access

    - User faults handling clean-up

    And some fixes:

    - Fix for VDSO building with broken ELF toolchains

    - Fix another case of init_mm.pgd usage for user mappings (during
    ASID roll-over broadcasting)

    - Fix for FPSIMD reloading after CPU hotplug

    - Fix for missing syscall trace exit

    - Workaround for .inst asm bug

    - Compat fix for switching the user tls tpidr_el0 register"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits)
    arm64: use private ratelimit state along with show_unhandled_signals
    arm64: show unhandled SP/PC alignment faults
    arm64: vdso: work-around broken ELF toolchains in Makefile
    arm64: kernel: rename __cpu_suspend to keep it aligned with arm
    arm64: compat: print compat_sp instead of sp
    arm64: mm: Fix freeing of the wrong memmap entries with !SPARSEMEM_VMEMMAP
    arm64: entry: fix context tracking for el0_sp_pc
    arm64: defconfig: enable memtest
    arm64: mm: remove reference to tlb.S from comment block
    arm64: Do not attempt to use init_mm in reset_context()
    arm64: KVM: Switch vgic save/restore to alternative_insn
    arm64: alternative: Introduce feature for GICv3 CPU interface
    arm64: psci: fix !CONFIG_HOTPLUG_CPU build warning
    arm64: fix bug for reloading FPSIMD state after CPU hotplug.
    arm64: kernel thread don't need to save fpsimd context.
    arm64: fix missing syscall trace exit
    arm64: alternative: Work around .inst assembler bugs
    arm64: alternative: Merge alternative-asm.h into alternative.h
    arm64: alternative: Allow immediate branch as alternative instruction
    arm64: Rework alternate sequence for ARM erratum 845719
    ...

    Linus Torvalds
     

19 Jun, 2015

4 commits


18 Jun, 2015

1 commit


17 Jun, 2015

2 commits

  • Commit fd1d0ddf2ae9 (KVM: arm/arm64: check IRQ number on userland
    injection) rightly limited the range of interrupts userspace can
    inject in a guest, but failed to consider the (unlikely) case where
    a guest is configured with 1024 interrupts.

    In this case, interrupts ranging from 1020 to 1023 are unusable,
    as they have a special meaning for the GIC CPU interface.

    Make sure that these numbers cannot be used as an IRQ. Also delete
    a redundant (and similarly buggy) check in kvm_set_irq.

    Reported-by: Peter Maydell
    Cc: Andre Przywara
    Cc: # 4.1, 4.0, 3.19, 3.18
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • If a GICv3-enabled guest tries to configure Group0, we print a
    warning on the console (because we don't support Group0 interrupts).

    This is fairly pointless, and would allow a guest to spam the
    console. Let's just drop the warning.

    Acked-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
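
The range restriction from the first commit in this group (interrupts 1020 to 1023 are reserved even when a guest is configured with 1024 interrupts) can be sketched as a small predicate. The function and constant names below are made up for this example and are not taken from the kernel source.

```c
#include <stdbool.h>

/* Per the commit message, interrupt numbers 1020..1023 have a special
 * meaning for the GIC CPU interface. The macro name is an assumption. */
#define SKETCH_FIRST_SPECIAL_IRQ 1020u

/* Return true if irq may be injected into a guest that was configured
 * with nr_irqs interrupts. */
static bool sketch_irq_is_injectable(unsigned int irq, unsigned int nr_irqs)
{
    if (irq >= nr_irqs)
        return false;   /* beyond what the guest was configured with */
    if (irq >= SKETCH_FIRST_SPECIAL_IRQ)
        return false;   /* reserved numbers, even with 1024 interrupts */
    return true;
}
```

With 1024 configured interrupts, 1019 passes the check while 1020 through 1023 are rejected.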
     

12 Jun, 2015

1 commit

  • So far, we configured the world-switch by having a small array
    of pointers to the save and restore functions, depending on the
    GIC used on the platform.

    Loading these values each time is a bit silly (they never change),
    and it makes sense to rely on the instruction patching instead.

    This leads to a nice cleanup of the code.

    Acked-by: Will Deacon
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas

    Marc Zyngier
     

10 Jun, 2015

1 commit

  • Commit 47a98b15ba7c ("arm/arm64: KVM: support for un-queuing active
    IRQs") introduced handling of the GICD_I[SC]ACTIVER registers,
    but only for the GICv2 emulation. For the sake of completeness and
    as this is a pre-requisite for save/restore of the GICv3 distributor
    state, we should also emulate their handling in the distributor and
    redistributor frames of an emulated GICv3.

    Acked-by: Christoffer Dall
    Signed-off-by: Andre Przywara
    Signed-off-by: Marc Zyngier

    Andre Przywara
     

05 Jun, 2015

2 commits

  • Only two ioctls have to be modified; the address space id is
    placed in the higher 16 bits of their slot id argument.

    As of this patch, no architecture defines more than one
    address space; x86 will be the first.

    Reviewed-by: Radim Krčmář
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • We need to hide SMRAM from guests not running in SMM. Therefore, all
    uses of kvm_read_guest* and kvm_write_guest* must be changed to use
    different address spaces, depending on whether the VCPU is in system
    management mode. We need to introduce a new family of functions for
    this purpose.

    For now, the VCPU-based functions have the same behavior as the
    existing per-VM ones; they just accept a different type for the
    first argument. Later, however, they will be changed to use one of many
    "struct kvm_memslots" stored in struct kvm, through an architecture hook.
    VM-based functions will unconditionally use the first memslots pointer.

    Whenever possible, this patch introduces slot-based functions with an
    __ prefix, with two wrappers for generic and vcpu-based actions.
    The exceptions are kvm_read_guest and kvm_write_guest, which are copied
    into the new functions kvm_vcpu_read_guest and kvm_vcpu_write_guest.

    Reviewed-by: Radim Krčmář
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
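
The first commit in this group places the address space id in the higher 16 bits of the slot id argument. A minimal sketch of that packing, assuming a 32-bit slot argument; the helper names are hypothetical and not kernel API:

```c
#include <stdint.h>

/* Pack an address space id (bits 31..16) and a slot index (bits 15..0)
 * into one 32-bit slot argument. */
static uint32_t sketch_make_slot_arg(uint16_t as_id, uint16_t slot_index)
{
    return ((uint32_t)as_id << 16) | slot_index;
}

/* Extract the address space id from the upper 16 bits. */
static uint16_t sketch_slot_arg_as_id(uint32_t slot_arg)
{
    return (uint16_t)(slot_arg >> 16);
}

/* Extract the slot index from the lower 16 bits. */
static uint16_t sketch_slot_arg_index(uint32_t slot_arg)
{
    return (uint16_t)(slot_arg & 0xffffu);
}
```

Because address space 0 leaves the upper bits clear, a slot argument built for a single address space has the same value as the plain slot index.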
     

28 May, 2015

4 commits


26 May, 2015

4 commits

  • Prepare for the case of multiple address spaces.

    Reviewed-by: Radim Krcmar
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • Architecture-specific helpers are not supposed to muck with
    struct kvm_userspace_memory_region contents. Add const to
    enforce this.

    In order to eliminate the only write in __kvm_set_memory_region,
    the cleaning of deleted slots is pulled up from update_memslots
    to __kvm_set_memory_region.

    Reviewed-by: Takuya Yoshikawa
    Reviewed-by: Radim Krcmar
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • kvm_memslots provides lockdep checking. Use it consistently instead of
    explicit dereferencing of kvm->memslots.

    Reviewed-by: Radim Krcmar
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • kvm_alloc_memslots is extracted out of previously scattered code
    that was in kvm_init_memslots_id and kvm_create_vm.

    kvm_free_memslot and kvm_free_memslots are new names of
    kvm_free_physmem and kvm_free_physmem_slot, but they also take
    an explicit pointer to struct kvm_memslots.

    This will simplify the transition to multiple address spaces,
    each represented by one pointer to struct kvm_memslots.

    Reviewed-by: Takuya Yoshikawa
    Reviewed-by: Radim Krcmar
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

20 May, 2015

1 commit

  • gfn_to_pfn_async is used in just one place, and because of x86-specific
    treatment that place will need to look at the memory slot. Hence inline
    it into try_async_pf and export __gfn_to_pfn_memslot.

    The patch also switches the subsequent call to gfn_to_pfn_prot to use
    __gfn_to_pfn_memslot. This is a small optimization. Finally, remove
    the now-unused async argument of __gfn_to_pfn.

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

08 May, 2015

2 commits

  • On cpu hotplug only KVM emits an unconditional message that its notifier
    has been called. It can certainly be assumed that calling cpu hotplug
    notifiers works, so there is no added value if KVM prints a message.

    If an error happens on cpu online KVM will still emit a warning.

    So let's remove this superfluous message.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Paolo Bonzini

    Heiko Carstens
     
  • Caching memslot value and using mark_page_dirty_in_slot() avoids another
    O(log N) search when dirtying the page.

    Signed-off-by: Radim Krčmář
    Signed-off-by: Paolo Bonzini

    Radim Krčmář
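
The saving described in the memslot caching commit above comes from reusing a slot pointer that was already looked up instead of searching again. A simplified user-space sketch, assuming slots sorted by ascending base gfn and one dirty byte per page; the types and `sketch_` names are illustrative only:

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_slot {
    uint64_t base_gfn;      /* first guest frame number in the slot */
    uint64_t npages;        /* number of pages covered by the slot */
    unsigned char *dirty;   /* one byte per page, for simplicity */
};

/* O(log N) binary search over non-overlapping slots sorted by base_gfn. */
static struct sketch_slot *sketch_find_slot(struct sketch_slot *slots,
                                            size_t n, uint64_t gfn)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (gfn < slots[mid].base_gfn)
            hi = mid;
        else if (gfn >= slots[mid].base_gfn + slots[mid].npages)
            lo = mid + 1;
        else
            return &slots[mid];
    }
    return NULL;
}

/* Mark a page dirty when the caller already holds the slot: no search. */
static void sketch_mark_dirty_in_slot(struct sketch_slot *slot, uint64_t gfn)
{
    if (slot && slot->dirty)
        slot->dirty[gfn - slot->base_gfn] = 1;
}

/* Convenience path that has to search first. */
static void sketch_mark_dirty(struct sketch_slot *slots, size_t n,
                              uint64_t gfn)
{
    sketch_mark_dirty_in_slot(sketch_find_slot(slots, n, gfn), gfn);
}
```

A caller that needs the slot for other reasons can look it up once and pass it to `sketch_mark_dirty_in_slot`, skipping the second search that `sketch_mark_dirty` would perform.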
     

22 Apr, 2015

3 commits

  • …it/kvmarm/kvmarm into kvm-master

    KVM/ARM changes for v4.1, take #2:

    Rather small this time:

    - a fix for a nasty bug with virtual IRQ injection
    - a fix for irqfd

    Paolo Bonzini
     
  • When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently
    only check it against a fixed limit, which historically is set
    to 127. With the new dynamic IRQ allocation the effective limit may
    actually be smaller (64).
    So when a malicious or buggy userland now injects a SPI in that
    range, we spill over on our VGIC bitmaps and bytemaps memory.
    I could trigger a host kernel NULL pointer dereference with current
    mainline by injecting some bogus IRQ number from a hacked kvmtool:
    -----------------
    ....
    DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1)
    DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1)
    DEBUG: IRQ #114 still in the game, writing to bytemap now...
    Unable to handle kernel NULL pointer dereference at virtual address 00000000
    pgd = ffffffc07652e000
    [00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000
    Internal error: Oops: 96000006 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027
    Hardware name: FVP Base (DT)
    task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000
    PC is at kvm_vgic_inject_irq+0x234/0x310
    LR is at kvm_vgic_inject_irq+0x30c/0x310
    pc : [] lr : [] pstate: 80000145
    .....

    So this patch fixes this by checking the SPI number against the
    actual limit. Also we remove the former legacy hard limit of
    127 in the ioctl code.

    Signed-off-by: Andre Przywara
    Reviewed-by: Christoffer Dall
    CC: # 4.0, 3.19, 3.18
    [maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__,
    as suggested by Christopher Covington]
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • irqfd/arm currently does not support routing. kvm_irq_map_gsi is
    supposed to return all the routing entries associated with the
    provided gsi and return the number of those entries. We should
    return 0 at this point.

    Signed-off-by: Eric Auger
    Acked-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Eric Auger
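
The overflow fixed by the SPI injection commit in this group (the one by Andre Przywara) arises when the check uses a fixed limit while the bitmaps are sized by the number of interrupts actually allocated. A user-space sketch of checking against the actual limit; the structure and function names are assumptions for illustration, not the VGIC code:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

struct sketch_vgic {
    unsigned int nr_irqs;     /* interrupts actually allocated */
    unsigned long *pending;   /* one bit per allocated interrupt */
};

/* Allocate a pending bitmap sized for exactly nr_irqs interrupts. */
static int sketch_vgic_init(struct sketch_vgic *v, unsigned int nr_irqs)
{
    size_t words = (nr_irqs + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;

    v->pending = calloc(words, sizeof(unsigned long));
    if (!v->pending)
        return -ENOMEM;
    v->nr_irqs = nr_irqs;
    return 0;
}

/* Reject any irq at or beyond the dynamic limit before touching the bitmap. */
static int sketch_inject_irq(struct sketch_vgic *v, unsigned int irq)
{
    if (irq >= v->nr_irqs)
        return -EINVAL;
    v->pending[irq / SKETCH_BITS_PER_LONG] |= 1UL << (irq % SKETCH_BITS_PER_LONG);
    return 0;
}
```

With 64 allocated interrupts, the irq=114 value from the log above is refused rather than written past the end of the bitmap.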
     

21 Apr, 2015

1 commit

  • This creates a debugfs directory for each HV guest (assuming debugfs
    is enabled in the kernel config), and within that directory, a file
    by which the contents of the guest's HPT (hashed page table) can be
    read. The directory is named vmnnnn, where nnnn is the PID of the
    process that created the guest. The file is named "htab". This is
    intended to help in debugging problems in the host's management
    of guest memory.

    The contents of the file consist of a series of lines like this:

    3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196

    The first field is the index of the entry in the HPT; the second and
    third are the HPT entry, so the third field contains the real page
    number that is mapped by the entry if the entry's valid bit is set.
    The fourth field is the guest's view of the second doubleword of the
    entry, so it contains the guest physical address. (The format of the
    second through fourth fields is described in the Power ISA and also
    in arch/powerpc/include/asm/mmu-hash64.h.)

    Signed-off-by: Paul Mackerras
    Signed-off-by: Alexander Graf

    Paul Mackerras
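
A line of the "htab" file described above consists of four hexadecimal fields, so it can be split with a single `sscanf` call. The sketch below assumes exactly the layout shown in the commit message; the struct and function names are invented for this example.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One parsed line of the htab file (field names are illustrative). */
struct sketch_htab_line {
    unsigned long index;   /* index of the entry in the HPT */
    uint64_t hpte_v;       /* first doubleword of the HPT entry */
    uint64_t hpte_r;       /* second doubleword (real page number) */
    uint64_t guest_r;      /* guest's view of the second doubleword */
};

/* Parse one line; returns 0 on success, -1 if fewer than four fields. */
static int sketch_parse_htab_line(const char *line,
                                  struct sketch_htab_line *out)
{
    int n = sscanf(line, "%lx %" SCNx64 " %" SCNx64 " %" SCNx64,
                   &out->index, &out->hpte_v, &out->hpte_r, &out->guest_r);

    return n == 4 ? 0 : -1;
}
```

Applied to the sample line from the commit message, the index field parses as 0x3f48 and the fourth field as 0x00000003b5c71196.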
     

14 Apr, 2015

1 commit

  • Pull KVM updates from Paolo Bonzini:
    "First batch of KVM changes for 4.1

    The most interesting bit here is irqfd/ioeventfd support for ARM and
    ARM64.

    Summary:

    ARM/ARM64:
    fixes for live migration, irqfd and ioeventfd support (enabling
    vhost, too), page aging

    s390:
    interrupt handling rework, allowing to inject all local interrupts
    via new ioctl and to get/set the full local irq state for migration
    and introspection. New ioctls to access memory by virtual address,
    and to get/set the guest storage keys. SIMD support.

    MIPS:
    FPU and MIPS SIMD Architecture (MSA) support. Includes some
    patches from Ralf Baechle's MIPS tree.

    x86:
    bugfixes (notably for pvclock, the others are small) and cleanups.
    Another small latency improvement for the TSC deadline timer"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
    KVM: use slowpath for cross page cached accesses
    kvm: mmu: lazy collapse small sptes into large sptes
    KVM: x86: Clear CR2 on VCPU reset
    KVM: x86: DR0-DR3 are not clear on reset
    KVM: x86: BSP in MSR_IA32_APICBASE is writable
    KVM: x86: simplify kvm_apic_map
    KVM: x86: avoid logical_map when it is invalid
    KVM: x86: fix mixed APIC mode broadcast
    KVM: x86: use MDA for interrupt matching
    kvm/ppc/mpic: drop unused IRQ_testbit
    KVM: nVMX: remove unnecessary double caching of MAXPHYADDR
    KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry
    KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu
    KVM: vmx: pass error code with internal error #2
    x86: vdso: fix pvclock races with task migration
    KVM: remove kvm_read_hva and kvm_read_hva_atomic
    KVM: x86: optimize delivery of TSC deadline timer interrupt
    KVM: x86: extract blocking logic from __vcpu_run
    kvm: x86: fix x86 eflags fixed bit
    KVM: s390: migrate vcpu interrupt state
    ...

    Linus Torvalds
     

10 Apr, 2015

1 commit


08 Apr, 2015

4 commits


01 Apr, 2015

1 commit

  • We introduced struct kvm_s390_irq a while ago; it allows injecting
    all kinds of interrupts as defined in the Principles of Operation.
    Add an ioctl to inject interrupts with the extended struct kvm_s390_irq.

    Signed-off-by: Jens Freimann
    Signed-off-by: Christian Borntraeger
    Acked-by: Cornelia Huck

    Jens Freimann
     

31 Mar, 2015

1 commit

  • Currently we have struct kvm_exit_mmio for encapsulating MMIO abort
    data to be passed on from syndrome decoding all the way down to the
    VGIC register handlers. Now that we switch the MMIO handling to be
    routed through the KVM MMIO bus, it no longer makes sense to use
    that structure from the very beginning. So we keep the data in
    local variables until we put them into the kvm_io_bus framework.
    Then we fill kvm_exit_mmio in the VGIC only, making it a VGIC private
    structure. In doing so we replace the data buffer in that structure
    with a pointer to a single location in a local variable, so
    we get rid of some copying along the way.
    With all of the virtual GIC emulation code now being registered with
    the kvm_io_bus, we can remove all of the old MMIO handling code and
    its dispatching functionality.

    I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something),
    because that touches a lot of code lines without any good reason.

    This is based on an original patch by Nikolay.

    Signed-off-by: Andre Przywara
    Cc: Nikolay Nikolaev
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Andre Przywara