03 Aug, 2016

1 commit

  • Pull KVM updates from Paolo Bonzini:

    - ARM: GICv3 ITS emulation and various fixes. Removal of the
    old VGIC implementation.

    - s390: support for trapping software breakpoints, nested
    virtualization (vSIE), the STHYI opcode, initial extensions
    for CPU model support.

    - MIPS: support for MIPS64 hosts (32-bit guests only) and lots
    of cleanups, preliminary to this and the upcoming support for
    hardware virtualization extensions.

    - x86: support for execute-only mappings in nested EPT; reduced
    vmexit latency for TSC deadline timer (by about 30%) on Intel
    hosts; support for more than 255 vCPUs.

    - PPC: bugfixes.

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
    KVM: PPC: Introduce KVM_CAP_PPC_HTM
    MIPS: Select HAVE_KVM for MIPS64_R{2,6}
    MIPS: KVM: Reset CP0_PageMask during host TLB flush
    MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
    MIPS: KVM: Sign extend MFC0/RDHWR results
    MIPS: KVM: Fix 64-bit big endian dynamic translation
    MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
    MIPS: KVM: Use 64-bit CP0_EBase when appropriate
    MIPS: KVM: Set CP0_Status.KX on MIPS64
    MIPS: KVM: Make entry code MIPS64 friendly
    MIPS: KVM: Use kmap instead of CKSEG0ADDR()
    MIPS: KVM: Use virt_to_phys() to get commpage PFN
    MIPS: Fix definition of KSEGX() for 64-bit
    KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
    kvm: x86: nVMX: maintain internal copy of current VMCS
    KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
    KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
    KVM: arm64: vgic-its: Simplify MAPI error handling
    KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
    KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
    ...

    Linus Torvalds
     

30 Jul, 2016

1 commit

  • Pull smp hotplug updates from Thomas Gleixner:
    "This is the next part of the hotplug rework.

    - Convert all notifiers with a priority assigned

    - Convert all CPU_STARTING/DYING notifiers

    The final removal of the STARTING/DYING infrastructure will happen
    when the merge window closes.

    Another 700 hundred line of unpenetrable maze gone :)"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
    timers/core: Correct callback order during CPU hot plug
    leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
    powerpc/numa: Convert to hotplug state machine
    arm/perf: Fix hotplug state machine conversion
    irqchip/armada: Avoid unused function warnings
    ARC/time: Convert to hotplug state machine
    clocksource/atlas7: Convert to hotplug state machine
    clocksource/armada-370-xp: Convert to hotplug state machine
    clocksource/exynos_mct: Convert to hotplug state machine
    clocksource/arm_global_timer: Convert to hotplug state machine
    rcu: Convert rcutree to hotplug state machine
    KVM/arm/arm64/vgic-new: Convert to hotplug state machine
    smp/cfd: Convert core to hotplug state machine
    x86/x2apic: Convert to CPU hotplug state machine
    profile: Convert to hotplug state machine
    timers/core: Convert to hotplug state machine
    hrtimer: Convert to hotplug state machine
    x86/tboot: Convert to hotplug state machine
    arm64/armv8 deprecated: Convert to hotplug state machine
    hwtracing/coresight-etm4x: Convert to hotplug state machine
    ...

    Linus Torvalds
     

23 Jul, 2016

1 commit


19 Jul, 2016

26 commits

  • If we care to move all the checks that do not involve any memory
    allocation, we can simplify the MAPI error handling. Let's do that,
    it cannot hurt.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • vgic_its_cmd_handle_mapi has an extra "subcmd" argument, which is
    already contained in the command buffer that all command handlers
    obtain from the command queue. Let's drop it, as it is not that
    useful.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • There is no need to have separate functions to validate devices
    and collections, as the architecture doesn't really distinguish the
    two, and they are supposed to be managed the same way.

    Let's turn the DevID checker into a generic one.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Going from the ITS structure to the corresponding KVM structure
    would be quite handy at times. The kvm_device pointer that is
    passed at create time is quite convenient for this, so let's
    keep a copy of it in the vgic_its structure.

    This will be put to a good use in subsequent patches.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Instead of spreading random allocations all over the place,
    consolidate allocation/init/freeing of collections in a pair
    of constructor/destructor.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • When checking that the storage address of a device entry is valid,
    it is critical to compute the actual address of the entry, rather
    than relying on the beginning of the page to match a CPU page of
    the same size: for example, if the guest places the table at the
    last 64kB boundary of RAM, but RAM size isn't a multiple of 64kB...

    Fix this by computing the actual offset of the device ID in the
    L2 page, and check the corresponding GFN.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
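
The difference between checking the start of the L2 page and checking the entry itself can be sketched in user-space C. This is a simplified illustration, not the patch: `struct ram_region`, `in_ram()`, `l2_entry_addr()` and `entry_is_backed()` are hypothetical names, and guest RAM is modelled as a single contiguous byte range rather than through a GFN lookup.

```c
#include <stdbool.h>
#include <stdint.h>

#define L2_PAGE_SIZE 0x10000u /* 64kB L2 page, as in the example above */

/* Hypothetical description of guest RAM: the range [base, base + size). */
struct ram_region {
    uint64_t base;
    uint64_t size;
};

/* True if [addr, addr + len) lies entirely inside the region. */
static bool in_ram(const struct ram_region *r, uint64_t addr, uint64_t len)
{
    return addr >= r->base && len <= r->size &&
           addr - r->base <= r->size - len;
}

/* Address of the entry with the given index inside an L2 page. */
static uint64_t l2_entry_addr(uint64_t l2_page, uint32_t index, uint32_t esz)
{
    return l2_page + (uint64_t)index * esz;
}

/* Validate the entry itself instead of the first byte of its L2 page. */
static bool entry_is_backed(const struct ram_region *r, uint64_t l2_page,
                            uint32_t index, uint32_t esz)
{
    if ((uint64_t)index * esz + esz > L2_PAGE_SIZE)
        return false; /* index does not fit in one L2 page */
    return in_ram(r, l2_entry_addr(l2_page, index, esz), esz);
}
```

With 96kB of RAM starting at 0 and an L2 page at 0x10000, the page start is inside RAM, but an 8-byte entry at index 0x1000 (offset 0x8000) is not, so only the per-entry check rejects it.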
     
  • Checking that the device_id fits in the table is not enough; we must
    also make sure that the associated memory is accessible.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • The nr_entries variable in vgic_its_check_device_id actually
    describes the size of the L1 table, and not the number of
    entries in this table.

    Rename it to l1_tbl_size, so that we can now change the code
    with a better understanding of what is what.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • The ITS tables are stored in LE format. If the host is reading
    a L1 table entry to check its validity, it must convert it to
    the CPU endianness.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
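
A minimal user-space sketch of the conversion: the entry is decoded byte by byte, so the result does not depend on the host byte order. `read_le64()` and `l1_entry_is_valid()` are hypothetical names, and treating bit 63 as the valid bit is an assumption of this example; in-kernel code would use a helper such as `le64_to_cpu()` instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode a 64-bit little-endian value, independent of host byte order. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Assumption for this sketch: bit 63 marks an L1 entry as valid. */
static bool l1_entry_is_valid(const uint8_t *raw)
{
    return (read_le64(raw) >> 63) & 1;
}
```

Testing the valid bit on the raw, unconverted value would look at the wrong byte on a big-endian host, which is the class of bug the commit addresses.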
     
  • The current code will fail on valid indirect tables, and happily
    use the ones that are pointing out of the guest RAM. Funny what a
    small "!" can do for you...

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Instead of sprinkling raw kref_get() calls every time we cannot
    do a normal vgic_get_irq(), use the existing vgic_get_irq_kref(),
    which does the same thing and is paired with a vgic_put_irq().

    vgic_get_irq_kref is moved to vgic.h in order to be easily shared.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • VGICv2 save and restore accesses the CPU interface registers.
    Restore this access mode, which had been altered.
    Also explicitly set the iodev_type for both the DIST and CPU
    interface.

    Signed-off-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Eric Auger
     
  • Now that all ITS emulation functionality is in place, we advertise
    MSI functionality to userland and also the ITS device to the guest - if
    userland has configured that.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • When userland wants to inject an MSI into the guest, it uses the
    KVM_SIGNAL_MSI ioctl, which carries the doorbell address along with
    the payload and the device ID.
    With the help of the KVM IO bus framework we learn the corresponding
    ITS from the doorbell address. We then use our wrapper functions to
    iterate the linked lists and find the proper Interrupt Translation Table
    Entry (ITTE) and thus the corresponding struct vgic_irq to finally set
    the pending bit.
    We also provide the handler for the ITS "INT" command, which allows a
    guest to trigger an MSI via the ITS command queue. Since this one knows
    about the right ITS already, we directly call the MMIO handler function
    without using the kvm_io_bus framework.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
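
The lookup described above (device ID and event ID to ITTE to IRQ) can be sketched with plain singly linked lists. All type and function names here (`irq_sketch`, `itte_sketch`, `device_sketch`, `find_itte()`, `trigger_msi()`) are hypothetical stand-ins for the kernel structures, and locking and the doorbell-to-ITS step are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct irq_sketch {
    uint32_t intid;
    bool pending;
};

/* One translation entry: event ID -> IRQ. */
struct itte_sketch {
    uint32_t event_id;
    struct irq_sketch *irq;
    struct itte_sketch *next;
};

/* One device with its list of translation entries. */
struct device_sketch {
    uint32_t device_id;
    struct itte_sketch *ittes;
    struct device_sketch *next;
};

/* Walk the device list, then that device's ITTE list. */
static struct itte_sketch *find_itte(struct device_sketch *devs,
                                     uint32_t device_id, uint32_t event_id)
{
    for (struct device_sketch *d = devs; d; d = d->next) {
        if (d->device_id != device_id)
            continue;
        for (struct itte_sketch *e = d->ittes; e; e = e->next)
            if (e->event_id == event_id)
                return e;
        return NULL; /* device found, event not mapped */
    }
    return NULL; /* unknown device */
}

/* Mark the IRQ behind (device_id, event_id) pending, if it is mapped. */
static bool trigger_msi(struct device_sketch *devs,
                        uint32_t device_id, uint32_t event_id)
{
    struct itte_sketch *itte = find_itte(devs, device_id, event_id);

    if (!itte || !itte->irq)
        return false;
    itte->irq->pending = true;
    return true;
}
```

An unmapped device or event is reported to the caller instead of being treated as success, so the caller can decide how to handle the failed injection.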
     
  • The connection between a device, an event ID, the LPI number and the
    associated CPU is stored in in-memory tables in a GICv3, but their
    format is not specified by the spec. Instead software uses a command
    queue in a ring buffer to let an ITS implementation use its own
    format.
    Implement handlers for the various ITS commands and let them store
    the requested relation into our own data structures. Those data
    structures are protected by the its_lock mutex.
    Our internal ring buffer read and write pointers are protected by the
    its_cmd mutex, so that only one VCPU per ITS can handle commands at
    any given time.
    Error handling is very basic at the moment, as we don't have a good
    way of communicating errors to the guest (usually an SError).
    The INT command handler is missing from this patch, as we gain the
    capability of actually injecting MSIs into the guest only later on.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
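
As a rough illustration of the command queue handling, the sketch below consumes 32-byte commands from a ring buffer until the read offset catches up with the write offset. `struct cmd_queue`, `process_commands()`, `struct opcode_log` and `record_opcode()` are hypothetical names, the opcode is assumed to sit in the first byte of each command, and the mutex that serialises command handling is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define ITS_CMD_SIZE 32u /* each ITS command occupies 32 bytes */
#define LOG_MAX 16u

typedef void (*cmd_handler_t)(const uint8_t *cmd, void *ctx);

struct cmd_queue {
    const uint8_t *buf;
    uint32_t size;    /* buffer size in bytes */
    uint32_t creadr;  /* read offset, advanced by the emulation */
    uint32_t cwriter; /* write offset, advanced by the guest */
};

/* Example handler for this sketch: remember the opcode byte of each command. */
struct opcode_log {
    uint8_t opcodes[LOG_MAX];
    unsigned int count;
};

static void record_opcode(const uint8_t *cmd, void *ctx)
{
    struct opcode_log *log = ctx;

    if (log->count < LOG_MAX)
        log->opcodes[log->count++] = cmd[0];
}

/* Handle all queued commands; returns how many were processed. */
static unsigned int process_commands(struct cmd_queue *q,
                                     cmd_handler_t handle, void *ctx)
{
    unsigned int n = 0;

    /* Reject malformed offsets so the loop below always terminates. */
    if (q->size == 0 || q->size % ITS_CMD_SIZE ||
        q->creadr >= q->size || q->cwriter >= q->size ||
        q->creadr % ITS_CMD_SIZE || q->cwriter % ITS_CMD_SIZE)
        return 0;

    while (q->creadr != q->cwriter) {
        handle(q->buf + q->creadr, ctx);
        q->creadr += ITS_CMD_SIZE;
        if (q->creadr == q->size)
            q->creadr = 0; /* wrap around to the start of the ring */
        n++;
    }
    return n;
}
```

Validating the offsets up front matters because they are guest-controlled; a misaligned write offset would otherwise never be matched by the read offset.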
     
  • The (system-wide) LPI configuration table is held in a table in
    (guest) memory. To achieve reasonable performance, we cache this data
    in our struct vgic_irq. If the guest updates the configuration data
    (which consists of the enable bit and the priority value), it issues
    an INV or INVALL command to allow us to update our information.
    Provide functions that update that information for one LPI or all LPIs
    mapped to a specific collection.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
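
A small sketch of how one byte of the LPI configuration table could be decoded into the two cached values. It assumes the GICv3 layout of one byte per LPI, with the enable bit in bit 0, the priority in bits [7:2], and LPI INTIDs starting at 8192; `struct lpi_config`, `decode_lpi_prop()` and `lpi_config_lookup()` are hypothetical names, and the table is a plain array rather than guest memory.

```c
#include <stdbool.h>
#include <stdint.h>

#define LPI_INTID_BASE 8192u          /* first LPI INTID */
#define LPI_PROP_ENABLE 0x01u         /* bit 0: enable */
#define LPI_PROP_PRIORITY_MASK 0xfcu  /* bits [7:2]: priority */

struct lpi_config {
    bool enabled;
    uint8_t priority;
};

/* Split one configuration byte into enable bit and priority. */
static struct lpi_config decode_lpi_prop(uint8_t prop)
{
    struct lpi_config cfg = {
        .enabled = (prop & LPI_PROP_ENABLE) != 0,
        .priority = prop & LPI_PROP_PRIORITY_MASK,
    };
    return cfg;
}

/* Look up the configuration of one LPI; false if the INTID is out of range. */
static bool lpi_config_lookup(const uint8_t *table, uint32_t nr_lpis,
                              uint32_t intid, struct lpi_config *out)
{
    if (intid < LPI_INTID_BASE || intid - LPI_INTID_BASE >= nr_lpis)
        return false;
    *out = decode_lpi_prop(table[intid - LPI_INTID_BASE]);
    return true;
}
```

An INV-style update would call the lookup for one INTID, while an INVALL-style update would loop over every LPI mapped to the collection.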
     
  • The LPI pending status for a GICv3 redistributor is held in a table
    in (guest) memory. To achieve reasonable performance, we cache the
    pending bit in our struct vgic_irq. The initial pending state must be
    read from guest memory upon enabling LPIs for this redistributor.
    As we can't access the guest memory while we hold the lpi_list spinlock,
    we create a snapshot of the LPI list and iterate over that.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
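
Reading one pending bit can be sketched as follows, assuming the pending table holds one bit per INTID and is indexed from INTID 0. `lpi_is_pending()` is a hypothetical helper operating on a local copy of the table; the snapshotting of the LPI list and the guest memory access are not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return the pending bit for an INTID, or false if it lies outside the table. */
static bool lpi_is_pending(const uint8_t *pendtable, size_t table_bytes,
                           uint32_t intid)
{
    size_t byte_offset = intid / 8;
    unsigned int bit_nr = intid % 8;

    if (byte_offset >= table_bytes)
        return false;
    return (pendtable[byte_offset] >> bit_nr) & 1;
}
```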
     
  • LPIs are dynamically created (mapped) at guest runtime and their
    actual number can be quite high, but is mostly assigned using a very
    sparse allocation scheme. So arrays are not an ideal data structure
    to hold the information.
    We use a spin-lock protected linked list to hold all mapped LPIs,
    represented by their struct vgic_irq. This lock is grouped between the
    ap_list_lock and the vgic_irq lock in our locking order.
    Also we store a pointer to that struct vgic_irq in our struct its_itte,
    so we can easily access it.
    Eventually we call our new vgic_get_lpi() from vgic_get_irq(), so
    the VGIC code gets transparent access to LPIs.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • Add emulation for some basic MMIO registers used in the ITS emulation.
    This includes:
    - GITS_{CTLR,TYPER,IIDR}
    - ID registers
    - GITS_{CBASER,CREADR,CWRITER}
    (which implement the ITS command buffer handling)
    - GITS_BASER

    Most of the handlers are pretty straightforward, only the CWRITER
    handler is a bit more involved by taking the new its_cmd mutex and
    then iterating over the command buffer.
    The registers holding base addresses and attributes are sanitised before
    storing them.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • Introduce a new KVM device that represents an ARM Interrupt Translation
    Service (ITS) controller. Since there can be multiple of these per guest,
    we can't piggyback on the existing GICv3 distributor device, and instead
    create a new type of KVM device.
    On the KVM_CREATE_DEVICE ioctl we allocate and initialize the ITS data
    structure and store the pointer in the kvm_device data.
    Upon an explicit init ioctl from userland (after having set up the MMIO
    address) we register the handlers with the kvm_io_bus framework.
    Any reference to an ITS thus has to go via this interface.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • The ARM GICv3 ITS emulation code goes into a separate file, but needs
    to be connected to the GICv3 emulation, of which it is an option.
    The ITS MMIO handlers require the respective ITS pointer to be passed in,
    so we amend the existing VGIC MMIO framework to let it cope with that.
    Also we introduce the basic ITS data structure and initialize it, but
    don't return any success yet, as we are not yet ready for the show.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • In the GICv3 redistributor there are the PENDBASER and PROPBASER
    registers which we did not emulate so far, as they only make sense
    when having an ITS. In preparation for that, emulate those MMIO
    accesses by storing the 64-bit data written to them in a variable
    which we later read in the ITS emulation.
    We also sanitise the registers, making sure RES0 regions are respected
    and checking for valid memory attributes.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • At the moment our struct vgic_irq's are statically allocated at guest
    creation time. So getting a pointer to an IRQ structure is trivial and
    safe. LPIs are more dynamic, they can be mapped and unmapped at any time
    during the guest's _runtime_.
    In preparation for supporting LPIs we introduce reference counting for
    those structures using the kernel's kref infrastructure.
    Since private IRQs and SPIs are statically allocated, we avoid actually
    refcounting them, since they would never be released anyway.
    But we take provisions to increase the refcount when an IRQ gets onto a
    VCPU list and decrease it when it gets removed. Also this introduces
    vgic_put_irq(), which wraps kref_put and hides the release function from
    the callers.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
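
The get/put pairing described above can be illustrated with a simplified user-space model. This is not the kernel's kref API: the counter is a plain non-atomic integer, and `struct irq`, `irq_alloc_lpi()`, `irq_get()` and `irq_put()` are hypothetical names. The only behaviour taken from the commit message is that statically allocated IRQs (private IRQs and SPIs) skip the refcounting.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MIN_LPI 8192u /* INTIDs below this are statically allocated */

struct irq {
    uint32_t intid;
    unsigned int refcount; /* only meaningful for LPIs */
};

/* Allocate a dynamically mapped LPI holding one initial reference. */
static struct irq *irq_alloc_lpi(uint32_t intid)
{
    struct irq *irq;

    if (intid < MIN_LPI)
        return NULL;
    irq = calloc(1, sizeof(*irq));
    if (!irq)
        return NULL;
    irq->intid = intid;
    irq->refcount = 1;
    return irq;
}

/* Take a reference; a no-op for statically allocated IRQs. */
static void irq_get(struct irq *irq)
{
    if (irq->intid < MIN_LPI)
        return;
    irq->refcount++;
}

/* Drop a reference; frees the IRQ and returns true on the last put. */
static bool irq_put(struct irq *irq)
{
    if (irq->intid < MIN_LPI)
        return false;
    if (--irq->refcount)
        return false;
    free(irq);
    return true;
}
```

Hiding the release step inside `irq_put()` mirrors the idea of wrapping `kref_put` so that callers never deal with the release function directly.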
     
  • The kvm_io_bus framework is a nice place to hold information about
    various MMIO regions for kernel emulated devices.
    Add a call to retrieve the kvm_io_device structure which is associated
    with a certain MMIO address. This avoids duplicating kvm_io_bus'
    knowledge of MMIO regions, without having to fake MMIO calls if a user
    needs the device a certain MMIO address belongs to.
    This will be used by the ITS emulation to get the associated ITS device
    when someone triggers an MSI via an ioctl from userspace.

    Signed-off-by: Andre Przywara
    Reviewed-by: Eric Auger
    Reviewed-by: Marc Zyngier
    Acked-by: Christoffer Dall
    Acked-by: Paolo Bonzini
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • kvm_register_device_ops() can return an error, so let's check its return
    value and propagate this up the call chain.

    Signed-off-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
     
  • Logically a GICv3 redistributor is assigned to a (v)CPU, so we should
    aim to keep redistributor related variables out of our struct vgic_dist.

    Let's start by replacing the redistributor related kvm_io_device array
    with two members in our existing struct vgic_cpu, which are naturally
    per-VCPU and thus don't require any allocation / freeing.
    So apart from the better fit with the redistributor design this saves
    some code as well.

    Signed-off-by: Andre Przywara
    Reviewed-by: Eric Auger
    Reviewed-by: Marc Zyngier
    Tested-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Andre Przywara
     

15 Jul, 2016

8 commits

  • Install the callbacks via the state machine and let the core invoke
    the callbacks on the already online CPUs.

    Signed-off-by: Anna-Maria Gleixner
    Cc: Andre Przywara
    Cc: Christoffer Dall
    Cc: Eric Auger
    Cc: Linus Torvalds
    Cc: Marc Zyngier
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krcmar
    Cc: Thomas Gleixner
    Cc: kvm@vger.kernel.org
    Cc: kvmarm@lists.cs.columbia.edu
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160713153337.900484868@linutronix.de
    Signed-off-by: Ingo Molnar

    Anna-Maria Gleixner
     
  • Install the callbacks via the state machine and let the core invoke
    the callbacks on the already online CPUs.

    Signed-off-by: Richard Cochran
    Signed-off-by: Anna-Maria Gleixner
    Reviewed-by: Sebastian Andrzej Siewior
    Cc: Christoffer Dall
    Cc: Gleb Natapov
    Cc: Linus Torvalds
    Cc: Marc Zyngier
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krcmar
    Cc: Thomas Gleixner
    Cc: kvm@vger.kernel.org
    Cc: kvmarm@lists.cs.columbia.edu
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160713153336.634155707@linutronix.de
    Signed-off-by: Ingo Molnar

    Richard Cochran
     
  • Install the callbacks via the state machine and let the core invoke
    the callbacks on the already online CPUs.
    The VGIC callback is run after KVM's main callback since it reflects the
    makefile order.

    Signed-off-by: Richard Cochran
    Signed-off-by: Anna-Maria Gleixner
    Reviewed-by: Sebastian Andrzej Siewior
    Cc: Christoffer Dall
    Cc: Gleb Natapov
    Cc: Linus Torvalds
    Cc: Marc Zyngier
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krcmar
    Cc: Thomas Gleixner
    Cc: kvm@vger.kernel.org
    Cc: kvmarm@lists.cs.columbia.edu
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160713153336.546953286@linutronix.de
    Signed-off-by: Ingo Molnar

    Richard Cochran
     
  • Install the callbacks via the state machine. The core won't invoke the
    callbacks on already online CPUs.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Anna-Maria Gleixner
    Reviewed-by: Sebastian Andrzej Siewior
    Acked-by: Paolo Bonzini
    Cc: Gleb Natapov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Radim Krcmar
    Cc: kvm@vger.kernel.org
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160713153335.886159080@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Once anon_inode_getfd() has succeeded, it's impossible to undo
    in a clean way and no, sys_close() is not usable in such cases.
    Use anon_inode_getfile() and get_unused_fd_flags() to get struct file
    and descriptor and do *not* install the file into the descriptor table
    until after the last possible failure exit.

    Signed-off-by: Paolo Bonzini

    Al Viro
     
  • This reverts commit 77ecc085fed1af1000ca719522977b960aa6da52.

    Al Viro colorfully says: "You should *NEVER* use sys_close() on failure
    exit paths like that. Moreover, this kvm_put_kvm() becomes a double-put,
    since closing the damn file will drop that reference to kvm. Please,
    revert. anon_inode_getfd() should be used only when there's no possible
    failures past its call".

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • If creating the debugfs entries for a VM fails, the function returns
    directly without releasing the anon file. This leaks memory and file
    descriptors, even though the leak is not serious.

    Signed-off-by: Liu Shuo
    Fixes: 536a6f88c49dd739961ffd53774775afed852c83
    Signed-off-by: Paolo Bonzini

    Liu Shuo
     
  • When freeing the nested resources of a vcpu, there is an assumption that
    the vcpu's vmcs01 is the current VMCS on the CPU that executes
    nested_release_vmcs12(). If this assumption is violated, the vcpu's
    vmcs01 may be made active on multiple CPUs at the same time, in
    violation of Intel's specification. Moreover, since the vcpu's vmcs01 is
    not VMCLEARed on every CPU on which it is active, it can linger in a
    CPU's VMCS cache after it has been freed and potentially
    repurposed. Subsequent eviction from the CPU's VMCS cache on a capacity
    miss can result in memory corruption.

    It is not sufficient for vmx_free_vcpu() to call vmx_load_vmcs01(). If
    the vcpu in question was last loaded on a different CPU, it must be
    migrated to the current CPU before calling vmx_load_vmcs01().

    Signed-off-by: Jim Mattson
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini

    Jim Mattson
     

14 Jul, 2016

1 commit


05 Jul, 2016

2 commits

  • The vGPU folks would like to trap the first access to a BAR by setting
    vm_ops on the VMAs produced by mmap-ing a VFIO device. The fault handler
    then can use remap_pfn_range to place some non-reserved pages in the VMA.

    This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
    and fixup_user_fault together help support it. The patch also supports
    VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
    reference counting.

    Cc: Xiao Guangrong
    Cc: Andrea Arcangeli
    Cc: Radim Krčmář
    Tested-by: Neo Jia
    Reported-by: Kirti Wankhede
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • Handle VM_IO like VM_PFNMAP, as is common in the rest of Linux; extract
    the formula to convert hva->pfn into a new function, which will soon
    gain more capabilities.

    Cc: Xiao Guangrong
    Cc: Andrea Arcangeli
    Cc: Radim Krčmář
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
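
For orientation, the linear hva->pfn conversion commonly used for VM_PFNMAP-style mappings can be sketched as below. Whether this matches the extracted function exactly is an assumption; `struct vma_sketch` and `hva_to_pfn_linear()` are hypothetical user-space stand-ins, and a 4kB page size is assumed.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4kB pages */

/* Hypothetical stand-in for the VMA fields the formula needs. */
struct vma_sketch {
    uint64_t vm_start; /* first virtual address of the mapping */
    uint64_t vm_end;   /* first address past the mapping */
    uint64_t vm_pgoff; /* page frame number backing vm_start */
};

/* Page offset of hva within the VMA, added to the VMA's starting PFN. */
static int hva_to_pfn_linear(const struct vma_sketch *vma, uint64_t hva,
                             uint64_t *pfn)
{
    if (hva < vma->vm_start || hva >= vma->vm_end)
        return -1;
    *pfn = ((hva - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
    return 0;
}
```

The formula only holds when the mapping is linear across the whole VMA, which is why a mapping populated page by page from a fault handler needs a different lookup.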