29 Jun, 2017

2 commits

  • At the point where the kvm-vfio pseudo device wants to release its
    vfio group reference, we can't always acquire a new reference to make
    that happen. The group can be in a state where we wouldn't allow a
    new reference to be added. This new helper function allows a caller
    to match a file to a group to facilitate this. Given a file and
    group, report if they match. Thus the caller needs to already have a
    group reference to match to the file. This allows the deletion of a
    group without acquiring a new reference.

    Signed-off-by: Alex Williamson
    Reviewed-by: Eric Auger
    Reviewed-by: Paolo Bonzini
    Tested-by: Eric Auger
    Cc: stable@vger.kernel.org

    Alex Williamson
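
    The matching idea above can be sketched in user-space C. This is only an
    illustration: struct group, struct file, group_try_get() and
    release_if_match() are hypothetical stand-ins, not the kernel's types or
    the helper this commit adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's file and vfio group objects. */
struct group {
    int refcount;     /* references currently held */
    bool going_away;  /* when set, no new reference may be taken */
};

struct file {
    struct group *private_data;  /* group backing this file, if any */
};

/* Taking a new reference fails once the group is being torn down. */
static bool group_try_get(struct group *g)
{
    if (g->going_away)
        return false;
    g->refcount++;
    return true;
}

/* Report whether filep refers to group, without touching the refcount. */
static bool file_matches_group(const struct file *filep,
                               const struct group *group)
{
    assert(filep != NULL && group != NULL);
    return filep->private_data == group;
}

/* Delete path: drop the reference we already hold if filep names our group. */
static bool release_if_match(const struct file *filep, struct group *held)
{
    if (!file_matches_group(filep, held))
        return false;
    assert(held->refcount > 0);
    held->refcount--;
    return true;
}
```

    The point of the sketch is that release_if_match() never calls
    group_try_get(), so it still works when the group refuses new references.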
     
  • Unset-KVM and decrement-assignment only when we find the group in our
    list. Otherwise we can get out of sync if the user triggers this for
    groups that aren't currently on our list.

    Signed-off-by: Alex Williamson
    Reviewed-by: Alexey Kardashevskiy
    Reviewed-by: Eric Auger
    Tested-by: Eric Auger
    Acked-by: Paolo Bonzini
    Cc: stable@vger.kernel.org

    Alex Williamson
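
    A minimal sketch of the rule described above, assuming a simple array as
    the group list. The names dev_state, add_group() and del_group() are
    hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_GROUPS 8

/* Hypothetical bookkeeping for the pseudo device; not the kernel's layout. */
struct dev_state {
    int groups[MAX_GROUPS];  /* ids of the groups currently on our list */
    size_t ngroups;
    int assigned;            /* assignment count, kept in step with the list */
};

/* Add a group; duplicates and a full list are rejected without side effects. */
static bool add_group(struct dev_state *s, int id)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->ngroups; i++)
        if (s->groups[i] == id)
            return false;
    if (s->ngroups == MAX_GROUPS)
        return false;
    s->groups[s->ngroups++] = id;
    s->assigned++;
    return true;
}

/* Remove a group; the counter changes only after a successful lookup. */
static bool del_group(struct dev_state *s, int id)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->ngroups; i++) {
        if (s->groups[i] != id)
            continue;
        s->groups[i] = s->groups[--s->ngroups];  /* fill the hole */
        s->assigned--;
        return true;
    }
    return false;  /* unknown group: leave the counter untouched */
}
```

    Deleting a group that was never added returns false and leaves assigned
    unchanged, which is the out-of-sync case the commit guards against.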
     

20 Apr, 2017

1 commit

  • This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
    and H_STUFF_TCE requests targeted at an IOMMU TCE table used for VFIO
    without passing them to user space, which saves the time spent switching
    to user space and back.

    This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM.
    KVM tries to handle a TCE request in real mode; if that fails,
    it passes the request to virtual mode to complete the operation.
    If the virtual mode handler fails, the request is passed to
    user space; this is not expected to happen though.

    To avoid dealing with page use counters (which is tricky in real mode),
    this only accelerates SPAPR TCE IOMMU v2 clients which are required
    to pre-register the userspace memory. The very first TCE request will
    be handled in the VFIO SPAPR TCE driver anyway as the userspace view
    of the TCE table (iommu_table::it_userspace) is not allocated till
    the very first mapping happens and we cannot call vmalloc in real mode.

    If we fail to update a hardware IOMMU table for an unexpected reason, we
    just clear it and move on as there is nothing we can really do about it -
    for example, if we hot plug a VFIO device to a guest, existing TCE tables
    will be mirrored automatically to the hardware and there is no interface
    to report possible failures to the guest.

    This adds a new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to
    the VFIO KVM device. It takes a VFIO group fd and a SPAPR TCE table fd
    and associates a physical IOMMU table with the SPAPR TCE table (which
    is a guest view of the hardware IOMMU table). The iommu_table object
    is cached and referenced so we do not have to look it up in real mode.

    This does not implement the UNSET counterpart as there is no use for it -
    once the acceleration is enabled, the existing userspace won't
    disable it unless a VFIO container is destroyed; this adds necessary
    cleanup to the KVM_DEV_VFIO_GROUP_DEL handler.

    This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user
    space.

    This adds a real mode version of WARN_ON_ONCE() as the generic version
    causes problems with rcu_sched. Since we are testing what
    vmalloc_to_phys() returns in the code, this also adds a check for the
    already existing vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect().

    This finally makes use of vfio_external_user_iommu_id() which was
    introduced quite some time ago and was considered for removal.

    Tests show that this patch increases transmission speed from 220MB/s
    to 750..1020MB/s on a 10Gb network (Chelsio CXGB3 10Gb ethernet card).

    Signed-off-by: Alexey Kardashevskiy
    Acked-by: Alex Williamson
    Reviewed-by: David Gibson
    Signed-off-by: Paul Mackerras

    Alexey Kardashevskiy
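
    The three-level fallback described above (real mode, then virtual mode,
    then user space) can be sketched as a small dispatcher. The handler type,
    the enum and the function names below are assumptions made for the
    example; the real handlers return hypercall status codes instead of a
    boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which level ended up dealing with the request. */
enum tce_level { HANDLED_REAL_MODE, HANDLED_VIRT_MODE, PASSED_TO_USERSPACE };

/* A handler returns true when it fully completed the request. */
typedef bool (*tce_handler)(unsigned long ioba, unsigned long tce);

/* Try the cheapest path first and fall back one level at a time. */
static enum tce_level dispatch_tce(tce_handler real_mode, tce_handler virt_mode,
                                   unsigned long ioba, unsigned long tce)
{
    assert(real_mode != NULL && virt_mode != NULL);
    if (real_mode(ioba, tce))
        return HANDLED_REAL_MODE;
    if (virt_mode(ioba, tce))
        return HANDLED_VIRT_MODE;
    return PASSED_TO_USERSPACE;  /* not expected in practice */
}

/* Trivial handlers used to exercise the dispatcher. */
static bool always_ok(unsigned long ioba, unsigned long tce)
{
    (void)ioba;
    (void)tce;
    return true;
}

static bool always_fail(unsigned long ioba, unsigned long tce)
{
    (void)ioba;
    (void)tce;
    return false;
}
```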
     

02 Dec, 2016

1 commit

  • Sometimes users need to be aware when a vfio_group attaches to a
    KVM or detaches from it. KVM already calls the get/put methods from vfio
    to manipulate the vfio_group reference, so it can notify the vfio_group
    in a similar way.

    Cc: Kirti Wankhede
    Cc: Xiao Guangrong
    Signed-off-by: Jike Song
    Acked-by: Paolo Bonzini
    Signed-off-by: Alex Williamson

    Jike Song
     

10 Jul, 2015

1 commit

  • If there are no assigned devices, the guest PAT is not providing
    any useful information and can be overridden to writeback; VMX
    always does this because it has the "IPAT" bit in its extended
    page table entries, but SVM does not have anything similar.
    Hook into VFIO and legacy device assignment so that they
    provide this information to KVM.

    Reviewed-by: Alex Williamson
    Tested-by: Joerg Roedel
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

24 Oct, 2014

1 commit

  • After commit 80ce163 (KVM: VFIO: register kvm_device_ops dynamically),
    the kvm_device_ops of vfio can be registered dynamically. Commit 3c3c29fd
    (kvm-vfio: do not use module_init) moved the dynamic registration into
    kvm_init in order to fix the broken unloading of the kvm module. However,
    the kvm_device_ops of vfio remains registered after the kvm-intel module
    is removed with rmmod, which leads to a device type collision detection
    warning when the kvm-intel module is inserted again.

    WARNING: CPU: 1 PID: 10358 at /root/cathy/kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:3289 kvm_init+0x234/0x282 [kvm]()
    Modules linked in: kvm_intel(O+) kvm(O) nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache lockd sunrpc pci_stub bridge stp llc autofs4 8021q cpufreq_ondemand ipv6 joydev microcode pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e i2c_i801 ixgbe ptp pps_core hwmon mdio tpm_tis tpm ipmi_si ipmi_msghandler acpi_cpufreq isci libsas scsi_transport_sas button dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kvm_intel]
    CPU: 1 PID: 10358 Comm: insmod Tainted: G W O 3.17.0-rc1 #2
    Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
    0000000000000cd9 ffff880ff08cfd18 ffffffff814a61d9 0000000000000cd9
    0000000000000000 ffff880ff08cfd58 ffffffff810417b7 ffff880ff08cfd48
    ffffffffa045bcac ffffffffa049c420 0000000000000040 00000000000000ff
    Call Trace:
    [] dump_stack+0x49/0x60
    [] warn_slowpath_common+0x7c/0x96
    [] ? kvm_init+0x234/0x282 [kvm]
    [] warn_slowpath_null+0x15/0x17
    [] kvm_init+0x234/0x282 [kvm]
    [] vmx_init+0x1bf/0x42a [kvm_intel]
    [] ? vmx_check_processor_compat+0x64/0x64 [kvm_intel]
    [] do_one_initcall+0xe3/0x170
    [] ? __vunmap+0xad/0xb8
    [] do_init_module+0x2b/0x174
    [] load_module+0x43e/0x569
    [] ? do_init_module+0x174/0x174
    [] ? copy_module_from_user+0x39/0x82
    [] ? module_sect_show+0x20/0x20
    [] SyS_init_module+0x54/0x81
    [] system_call_fastpath+0x16/0x1b
    ---[ end trace 0626f4a3ddea56f3 ]---

    The bug can be reproduced by:

    rmmod kvm_intel.ko
    insmod kvm_intel.ko

    without rmmod/insmod of kvm.ko.

    This patch fixes the bug by unregistering the kvm_device_ops of vfio when
    the kvm-intel module is removed.

    Reported-by: Liu Rongrong
    Fixes: 3c3c29fd0d7cddc32862c350d0700ce69953e3bd
    Signed-off-by: Wanpeng Li
    Signed-off-by: Paolo Bonzini

    Wanpeng Li
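
    The collision can be reproduced with a toy registration table. This is a
    user-space sketch under stated assumptions: ops_table,
    register_device_ops() and unregister_device_ops() are hypothetical names
    and do not mirror the kernel's signatures or error codes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEV_TYPES 4

/* Hypothetical ops structure; only its identity matters here. */
struct device_ops {
    const char *name;
};

/* One slot per device type, NULL when the type is free. */
static const struct device_ops *ops_table[MAX_DEV_TYPES];

/* Returns 0 on success, -1 if the type is out of range or already taken. */
static int register_device_ops(const struct device_ops *ops, unsigned int type)
{
    assert(ops != NULL);
    if (type >= MAX_DEV_TYPES)
        return -1;
    if (ops_table[type] != NULL)
        return -1;  /* collision: a stale entry is still registered */
    ops_table[type] = ops;
    return 0;
}

/* Free the slot so a later registration of the same type can succeed. */
static void unregister_device_ops(unsigned int type)
{
    if (type < MAX_DEV_TYPES)
        ops_table[type] = NULL;
}
```

    Registering the same type twice without an unregister in between fails,
    which corresponds to reloading kvm-intel while kvm stays loaded.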
     

24 Sep, 2014

1 commit

  • /me got confused between the kernel and QEMU. In the kernel, you can
    only have one module_init function, and it will prevent unloading the
    module unless you also have the corresponding module_exit function.

    So, commit 80ce1639727e (KVM: VFIO: register kvm_device_ops dynamically,
    2014-09-02) broke unloading of the kvm module, by adding a module_init
    function and no module_exit.

    Repair it by making kvm_vfio_ops_init weak, and checking it in
    kvm_init.

    Cc: Will Deacon
    Cc: Gleb Natapov
    Cc: Alex Williamson
    Fixes: 80ce1639727e9d38729c34f162378508c307ca25
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
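
    The weak-symbol technique named above can be sketched as follows,
    assuming GCC or Clang on an ELF platform. optional_ops_init() and
    core_init() are hypothetical names; the sketch only shows the general
    pattern, not the actual change to kvm_init.

```c
#include <stddef.h>

/* Weak declaration: if no object file defines the symbol, its address is NULL. */
extern int optional_ops_init(void) __attribute__((weak));

/* Returns 1 if the optional hook was present and ran, 0 if it was absent. */
static int core_init(void)
{
    if (optional_ops_init == NULL)
        return 0;
    optional_ops_init();
    return 1;
}
```

    Because the hook is looked up through a weak reference, the core needs no
    module_init function of its own in the optional code, and linking still
    succeeds when that code is not built in.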
     

17 Sep, 2014

1 commit

  • Now that we have a dynamic means to register kvm_device_ops, use that
    for the VFIO kvm device, instead of relying on the static table.

    This is achieved by a module_init call to register the ops with KVM.

    Cc: Gleb Natapov
    Cc: Paolo Bonzini
    Acked-by: Alex Williamson
    Signed-off-by: Will Deacon
    Signed-off-by: Paolo Bonzini

    Will Deacon
     

27 Feb, 2014

1 commit

  • VFIO now has support for using the IOMMU_CACHE flag and a mechanism
    for an external user to test the current operating mode of the IOMMU.
    Add support for this to the kvm-vfio pseudo device so that we only
    register noncoherent DMA when necessary.

    Signed-off-by: Alex Williamson
    Cc: Gleb Natapov
    Cc: Paolo Bonzini
    Acked-by: Paolo Bonzini

    Alex Williamson
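
    One way to picture "only register noncoherent DMA when necessary" is the
    sketch below. All structure and function names are hypothetical; the
    per-group flag stands in for whatever the VFIO external user interface
    reports about IOMMU_CACHE support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VM state: number of registered noncoherent DMA users. */
struct vm {
    int noncoherent_dma_count;
};

/* Hypothetical group state: true when its IOMMU enforces cache coherency. */
struct group {
    bool iommu_cache_coherent;
};

struct pseudo_dev {
    struct vm *vm;
    const struct group *groups;
    size_t ngroups;
    bool noncoherent;  /* what we last reported to the VM */
};

/* Re-evaluate all groups and register or unregister only on a transition. */
static void update_coherency(struct pseudo_dev *dev)
{
    bool noncoherent = false;

    assert(dev != NULL && dev->vm != NULL);
    for (size_t i = 0; i < dev->ngroups; i++) {
        if (!dev->groups[i].iommu_cache_coherent) {
            noncoherent = true;
            break;
        }
    }
    if (noncoherent == dev->noncoherent)
        return;  /* nothing changed, keep the current registration */
    dev->noncoherent = noncoherent;
    if (noncoherent)
        dev->vm->noncoherent_dma_count++;
    else
        dev->vm->noncoherent_dma_count--;
}
```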
     

15 Jan, 2014

1 commit

  • Building vfio.o triggers a GCC warning (when building for 32 bits x86):
    arch/x86/kvm/../../../virt/kvm/vfio.c: In function 'kvm_vfio_set_group':
    arch/x86/kvm/../../../virt/kvm/vfio.c:104:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    void __user *argp = (void __user *)arg;
    ^

    Silence this warning by casting arg to unsigned long.

    argp's current type, "void __user *", is always cast to "int32_t
    __user *". So its type might as well be changed to "int32_t __user *".

    Signed-off-by: Paul Bolle
    Signed-off-by: Paolo Bonzini

    Paul Bolle
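
    The cast pattern can be shown outside the kernel. This is a sketch that
    assumes, as the kernel does, that unsigned long has the same width as a
    pointer; portable user-space code would normally use uintptr_t instead.
    The function name arg_to_ptr() is made up for the example.

```c
#include <stdint.h>

/*
 * arg is a 64-bit integer even on 32-bit builds. Casting it straight to a
 * pointer triggers -Wint-to-pointer-cast there, so narrow it to unsigned
 * long first and convert the result to the final pointer type.
 */
static int32_t *arg_to_ptr(uint64_t arg)
{
    return (int32_t *)(unsigned long)arg;
}
```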
     

31 Oct, 2013

2 commits

  • We currently use some ad-hoc arch variables tied to legacy KVM device
    assignment to manage emulation of instructions that depend on whether
    non-coherent DMA is present. Create an interface for this, adapting
    legacy KVM device assignment and adding VFIO via the KVM-VFIO device.
    For now we assume that non-coherent DMA is possible any time we have a
    VFIO group. Eventually an interface can be developed as part of the
    VFIO external user interface to query the coherency of a group.

    Signed-off-by: Alex Williamson
    Signed-off-by: Paolo Bonzini

    Alex Williamson
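
    A minimal sketch of such an interface, assuming a plain per-VM counter
    shared by every source of non-coherent DMA. The names are hypothetical
    and the real code may use atomics and different signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VM state shared by legacy assignment and VFIO. */
struct vm_arch {
    int noncoherent_dma_count;
};

/* Called by any user that may perform non-coherent DMA for this VM. */
static void register_noncoherent_dma(struct vm_arch *arch)
{
    assert(arch != NULL);
    arch->noncoherent_dma_count++;
}

/* Called when that user goes away; must pair with a register call. */
static void unregister_noncoherent_dma(struct vm_arch *arch)
{
    assert(arch != NULL && arch->noncoherent_dma_count > 0);
    arch->noncoherent_dma_count--;
}

/* Emulation code asks this instead of peeking at ad-hoc variables. */
static bool has_noncoherent_dma(const struct vm_arch *arch)
{
    assert(arch != NULL);
    return arch->noncoherent_dma_count > 0;
}
```

    With a counter, two independent users can register and unregister in any
    order and the query stays true until the last one is gone.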
     
  • So far we've succeeded at making KVM and VFIO mostly unaware of each
    other, but areas are cropping up where a connection beyond eventfds
    and irqfds needs to be made. This patch introduces a KVM-VFIO device
    that is meant to be a gateway for such interaction. The user creates
    the device and can add and remove VFIO groups to it via file
    descriptors. When a group is added, KVM verifies the group is valid
    and gets a reference to it via the VFIO external user interface.

    Signed-off-by: Alex Williamson
    Signed-off-by: Paolo Bonzini

    Alex Williamson
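
    From user space, creating the device and adding a group might look like
    the sketch below. It assumes the Linux UAPI header linux/kvm.h as found
    in current kernels (KVM_CREATE_DEVICE, KVM_DEV_TYPE_VFIO,
    KVM_SET_DEVICE_ATTR, KVM_DEV_VFIO_GROUP, KVM_DEV_VFIO_GROUP_ADD); the
    helper name kvm_vfio_attach_group() is hypothetical, and vm_fd and
    group_fd are expected to be an open KVM VM fd and an open VFIO group fd.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Create a KVM-VFIO device on vm_fd and add the VFIO group group_fd to it.
 * Returns the device fd on success, or -1 on failure with errno set by the
 * failing ioctl.
 */
static int kvm_vfio_attach_group(int vm_fd, int group_fd)
{
    struct kvm_create_device cd = { .type = KVM_DEV_TYPE_VFIO };
    int32_t fd = group_fd;  /* the attribute payload is a pointer to the fd */
    struct kvm_device_attr attr = {
        .group = KVM_DEV_VFIO_GROUP,
        .attr = KVM_DEV_VFIO_GROUP_ADD,
        .addr = (uint64_t)(unsigned long)&fd,
    };

    if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
        return -1;
    if (ioctl((int)cd.fd, KVM_SET_DEVICE_ATTR, &attr) < 0) {
        close((int)cd.fd);  /* do not leak the device on failure */
        return -1;
    }
    return (int)cd.fd;
}
```

    Removing a group would use the same call with KVM_DEV_VFIO_GROUP_DEL as
    the attribute.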