28 Feb, 2016

1 commit

  • Calling return copy_to_user(...) in an ioctl will not
    do the right thing if there's a pagefault:
    copy_to_user returns the number of bytes not copied
    in this case.

    Fix up vfio to do

        return copy_to_user(...) ? -EFAULT : 0;

    everywhere.

    Cc: stable@vger.kernel.org
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Alex Williamson

    Michael S. Tsirkin
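
The return-value pitfall in this commit can be illustrated with a small userspace sketch. It is only an approximation: `sketch_copy_to_user()` and its `writable` parameter are hypothetical stand-ins that mimic the kernel helper's "bytes not copied" return convention, and the two `ioctl_*` functions are invented names for the buggy and fixed shapes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace stand-in for the kernel's copy_to_user(): returns the number of
 * bytes that could NOT be copied (0 on full success). The `writable`
 * parameter is a hypothetical knob that simulates a fault part-way through.
 */
static unsigned long sketch_copy_to_user(void *dst, const void *src,
                                         unsigned long n, unsigned long writable)
{
    unsigned long done = n < writable ? n : writable;

    assert(dst != NULL && src != NULL);
    memcpy(dst, src, done);
    return n - done;
}

/* Buggy shape: leaks a positive "bytes not copied" count to the caller. */
static long ioctl_buggy(void *dst, const void *src, unsigned long n,
                        unsigned long writable)
{
    return (long)sketch_copy_to_user(dst, src, n, writable);
}

/* Fixed shape from the commit message: any shortfall becomes -EFAULT. */
static long ioctl_fixed(void *dst, const void *src, unsigned long n,
                        unsigned long writable)
{
    return sketch_copy_to_user(dst, src, n, writable) ? -EFAULT : 0;
}
```

With a simulated fault after 3 of 8 bytes, the buggy shape hands the caller 5 (a positive value that is not an error code), while the fixed shape reports -EFAULT.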
     

28 Jan, 2016

1 commit

  • Using iommu_present() to determine whether an IOMMU group is real or
    fake has some problems. First, apparently Power systems don't
    register an IOMMU on the device bus, so the groups and containers get
    marked as noiommu and then won't bind to their actual IOMMU driver.
    Second, I expect we'll run into the same issue as we try to support
    vGPUs through vfio, since they're likely to emulate this behavior of
    creating an IOMMU group on a virtual device and then providing a vfio
    IOMMU backend tailored to the sort of isolation they provide, which
    won't necessarily be fully compatible with the IOMMU API.

    The solution here is to use the existing iommudata interface to IOMMU
    groups, which allows us to easily identify the fake groups we've
    created for noiommu purposes. The iommudata we set is purely
    arbitrary since we're only comparing the address, so we use the
    address of the noiommu switch itself.

    Reported-by: Alexey Kardashevskiy
    Reviewed-by: Alexey Kardashevskiy
    Tested-by: Alexey Kardashevskiy
    Tested-by: Anatoly Burakov
    Tested-by: Santosh Shukla
    Fixes: 03a76b60f8ba ("vfio: Include No-IOMMU mode")
    Signed-off-by: Alex Williamson

    Alex Williamson
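
The "compare the address only" idea can be sketched in plain C as follows. This is an illustration under assumptions, not the vfio code: `struct sketch_group`, `sketch_noiommu`, and both helpers are hypothetical names standing in for the IOMMU group, the noiommu switch, and the iommudata accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct iommu_group. */
struct sketch_group {
    void *iommudata;   /* opaque per-group pointer owned by the group's user */
};

/* Stand-in for the module's noiommu switch; only its address matters here. */
static bool sketch_noiommu;

/* Tag a group we created ourselves as a fake (no-IOMMU) group. */
static void sketch_mark_fake(struct sketch_group *group)
{
    assert(group != NULL);
    group->iommudata = &sketch_noiommu;
}

/* A group is fake exactly when its iommudata carries our private tag. */
static bool sketch_group_is_fake(const struct sketch_group *group)
{
    return group->iommudata == (void *)&sketch_noiommu;
}
```

Because the tag is the address of an object private to the module, no other user of the group can produce the same pointer by accident, and the value stored in the switch never needs to be read.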
     

05 Jan, 2016

1 commit

  • The flags entry is there to tell the user that some
    optional information is available.

    Since we report the iova_pgsizes, signal it to the user
    by setting the flags to VFIO_IOMMU_INFO_PGSIZES.

    Signed-off-by: Pierre Morel
    Signed-off-by: Alex Williamson

    Pierre Morel
     

22 Dec, 2015

2 commits

  • There is really no way to safely give a user full access to a DMA
    capable device without an IOMMU to protect the host system. There is
    also no way to provide DMA translation, for use cases such as device
    assignment to virtual machines. However, there are still those users
    that want userspace drivers even under those conditions. The UIO
    driver exists for this use case, but does not provide the degree of
    device access and programming that VFIO has. In an effort to avoid
    code duplication, this introduces a No-IOMMU mode for VFIO.

    This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
    the "enable_unsafe_noiommu_mode" option on the vfio driver. This
    should make it very clear that this mode is not safe. Additionally,
    CAP_SYS_RAWIO privileges are necessary to work with groups and
    containers using this mode. Groups making use of this support are
    named /dev/vfio/noiommu-$GROUP and can only make use of the special
    VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically
    binding a device without a native IOMMU group to a VFIO bus driver
    will taint the kernel and should therefore not be considered
    supported. This patch includes no-iommu support for the vfio-pci bus
    driver only.

    Signed-off-by: Alex Williamson
    Acked-by: Michael S. Tsirkin

    Alex Williamson
     
  • This loop ends with count set to -1 and not zero, so the warning message
    isn't printed when it should be. I've fixed this by changing the post-op
    to a pre-op.

    Fixes: 0990822c9866 ('VFIO: platform: reset: AMD xgbe reset module')
    Signed-off-by: Dan Carpenter
    Reviewed-by: Eric Auger
    Signed-off-by: Alex Williamson

    Dan Carpenter
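
The off-by-one can be reproduced in a few lines of userspace C. The sketch assumes a poll that never succeeds (the timeout case); `sketch_still_busy()` and the other function names are hypothetical and only model the shape of the loop and of the check that follows it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hardware poll: reports "still busy" forever (a timeout case). */
static bool sketch_still_busy(void)
{
    return true;
}

/* Post-decrement: on timeout the loop exits with count == -1. */
static int sketch_poll_postop(int count)
{
    assert(count > 0);
    while (count-- && sketch_still_busy())
        ;
    return count;
}

/* Pre-decrement: on timeout the loop exits with count == 0. */
static int sketch_poll_preop(int count)
{
    assert(count > 0);
    while (--count && sketch_still_busy())
        ;
    return count;
}

/* The warning condition used after the loop: fires only when count is 0. */
static bool sketch_should_warn(int count)
{
    return !count;
}
```

With the post-decrement form the final test sees 0, exits, and still decrements once more, so a `!count` check afterwards never fires. The pre-decrement form polls one time fewer but leaves `count` at exactly 0 on timeout.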
     

04 Dec, 2015

1 commit

  • Revert commit 033291eccbdb ("vfio: Include No-IOMMU mode") due to lack
    of a user. This was originally intended to fill a need for the DPDK
    driver, but uptake has been slow, so rather than support an unproven
    kernel interface, revert it and revisit when userspace catches up.

    Signed-off-by: Alex Williamson

    Alex Williamson
     

21 Nov, 2015

2 commits


20 Nov, 2015

2 commits


14 Nov, 2015

1 commit

  • Pull VFIO updates from Alex Williamson:
    - Use kernel interfaces for VPD emulation (Alex Williamson)
    - Platform fix for releasing IRQs (Eric Auger)
    - Type1 IOMMU always advertises PAGE_SIZE support when smaller mapping
      sizes are available (Eric Auger)
    - Platform fixes for incorrectly using copies of structures rather than
      pointers to structures (James Morse)
    - Rework platform reset modules, fix leak, and add AMD xgbe reset
      module (Eric Auger)
    - Fix vfio_device_get_from_name() return value (Joerg Roedel)
    - No-IOMMU interface (Alex Williamson)
    - Fix potential out of bounds array access in PCI config handling (Dan
      Carpenter)

    * tag 'vfio-v4.4-rc1' of git://github.com/awilliam/linux-vfio:
    vfio/pci: make an array larger
    vfio: Include No-IOMMU mode
    vfio: Fix bug in vfio_device_get_from_name()
    VFIO: platform: reset: AMD xgbe reset module
    vfio: platform: reset: calxedaxgmac: fix ioaddr leak
    vfio: platform: add dev_info on device reset
    vfio: platform: use list of registered reset function
    vfio: platform: add compat in vfio_platform_device
    vfio: platform: reset: calxedaxgmac: add reset function registration
    vfio: platform: introduce module_vfio_reset_handler macro
    vfio: platform: add capability to register a reset function
    vfio: platform: introduce vfio-platform-base module
    vfio/platform: store mapped memory in region, instead of an on-stack copy
    vfio/type1: handle case where IOMMU does not support PAGE_SIZE size
    VFIO: platform: clear IRQ_NOAUTOEN when de-assigning the IRQ
    vfio/pci: Use kernel VPD access functions
    vfio: Whitelist PCI bridges

    Linus Torvalds
     

09 Nov, 2015

1 commit


05 Nov, 2015

2 commits

  • There is really no way to safely give a user full access to a DMA
    capable device without an IOMMU to protect the host system. There is
    also no way to provide DMA translation, for use cases such as device
    assignment to virtual machines. However, there are still those users
    that want userspace drivers even under those conditions. The UIO
    driver exists for this use case, but does not provide the degree of
    device access and programming that VFIO has. In an effort to avoid
    code duplication, this introduces a No-IOMMU mode for VFIO.

    This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
    the "enable_unsafe_noiommu_mode" option on the vfio driver. This
    should make it very clear that this mode is not safe. Additionally,
    CAP_SYS_RAWIO privileges are necessary to work with groups and
    containers using this mode. Groups making use of this support are
    named /dev/vfio/noiommu-$GROUP and can only make use of the special
    VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically
    binding a device without a native IOMMU group to a VFIO bus driver
    will taint the kernel and should therefore not be considered
    supported. This patch includes no-iommu support for the vfio-pci bus
    driver only.

    Signed-off-by: Alex Williamson
    Acked-by: Michael S. Tsirkin

    Alex Williamson
     
  • The vfio_device_get_from_name() function might return a
    non-NULL pointer, when called with a device name that is not
    found in the list. This causes undefined behavior, in my
    case calling an invalid function pointer later on:

    kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
    BUG: unable to handle kernel paging request at ffff8800cb3ddc08

    [...]

    Call Trace:
    [] ? vfio_group_fops_unl_ioctl+0x253/0x410 [vfio]
    [] do_vfs_ioctl+0x2cd/0x4c0
    [] ? __fget+0x77/0xb0
    [] SyS_ioctl+0x79/0x90
    [] ? syscall_return_slowpath+0x50/0x130
    [] entry_SYSCALL_64_fastpath+0x16/0x75

    Fix the issue by returning NULL when there is no device with
    the requested name in the list.

    Cc: stable@vger.kernel.org # v4.2+
    Fixes: 4bc94d5dc95d ("vfio: Fix lockdep issue")
    Signed-off-by: Joerg Roedel
    Signed-off-by: Alex Williamson

    Joerg Roedel
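
The failure mode is typical of list walks that return the loop cursor: with a circular list, the cursor ends up back at the list head rather than at NULL when nothing matches. The sketch below models that with a hypothetical sentinel-headed list (`struct sketch_dev` and both lookup functions are invented for illustration); it is not the vfio implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device node on a circular list with a sentinel head. */
struct sketch_dev {
    const char *name;
    struct sketch_dev *next;
};

/*
 * Buggy shape: returns the loop cursor. When nothing matches, the cursor has
 * walked back to the sentinel, so the caller gets a non-NULL pointer that is
 * not a real device.
 */
static struct sketch_dev *sketch_find_buggy(struct sketch_dev *head,
                                            const char *name)
{
    struct sketch_dev *it;

    assert(head != NULL && name != NULL);
    for (it = head->next; it != head; it = it->next)
        if (!strcmp(it->name, name))
            break;
    return it;
}

/* Fixed shape: a separate result that stays NULL unless a match is found. */
static struct sketch_dev *sketch_find_fixed(struct sketch_dev *head,
                                            const char *name)
{
    struct sketch_dev *it, *found = NULL;

    assert(head != NULL && name != NULL);
    for (it = head->next; it != head; it = it->next) {
        if (!strcmp(it->name, name)) {
            found = it;
            break;
        }
    }
    return found;
}
```

Keeping the cursor and the result in separate variables makes the "not found" case explicit, which is the shape the fix describes.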
     

04 Nov, 2015

11 commits

  • This patch introduces a module that registers and implements a low-level
    reset function for the AMD XGBE device.

    It performs the following actions:
    - reset the PHY
    - disable auto-negotiation
    - disable & clear auto-negotiation IRQ
    - soft-reset the MAC

    Those tiny pieces of code are inherited from the native xgbe driver.

    Signed-off-by: Eric Auger
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • In the current code the vfio_platform_region is copied on the stack.
    As a consequence the ioaddr address is not iounmapped in the vfio
    platform driver (vfio_platform_regions_cleanup). The patch uses the
    pointer to the region instead.

    Signed-off-by: Eric Auger
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • It might be helpful for the end-user to check the device reset
    function was found by the vfio platform reset framework.

    Let's store a pointer to the struct device in vfio_platform_device
    and trace when the reset function is called or not found.

    Signed-off-by: Eric Auger
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • Remove the static lookup table and use the dynamic list of registered
    reset functions instead. Also load the reset module through its alias.
    The reset struct module pointer is stored in vfio_platform_device.

    We also remove the useless struct device pointer parameter in
    vfio_platform_get_reset.

    This patch fixes the issue related to the usage of __symbol_get, which,
    besides being moot, prevented compilation with CONFIG_MODULES
    disabled.

    Also, the usage of MODULE_ALIAS makes it possible to add a new reset module
    without needing to update the framework. This was suggested by Arnd.

    Signed-off-by: Eric Auger
    Reported-by: Arnd Bergmann
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • Let's retrieve the compatibility string on probe and store it
    in the vfio_platform_device struct.

    Signed-off-by: Eric Auger
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • This patch adds the reset function registration/unregistration.
    This is handled through the module_vfio_reset_handler macro. The
    latter also defines a MODULE_ALIAS which simplifies the load from
    vfio-platform.

    Signed-off-by: Eric Auger
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • The module_vfio_reset_handler macro:
    - defines a module alias
    - implements module init/exit functions which respectively register
      and unregister the reset function.

    Signed-off-by: Eric Auger
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • In preparation for subsequent changes in reset function lookup,
    let's introduce a dynamic list of reset combos (compat string,
    reset module, reset function). The list can be populated/voided with
    vfio_platform_register/unregister_reset. Those are not yet used in
    this patch.

    Signed-off-by: Eric Auger
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Alex Williamson

    Eric Auger
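
A dynamic registry keyed by compat string might look roughly like the sketch below. Everything here is hypothetical: the node layout, the `sketch_*` function names, and the example handler are invented for illustration, the module pointer from the reset combo is left out, and the locking a kernel implementation would need is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*sketch_reset_fn)(void *dev);

/* One "reset combo": compat string plus the function that resets the device. */
struct sketch_reset_node {
    const char *compat;
    sketch_reset_fn reset;
    struct sketch_reset_node *next;
};

static struct sketch_reset_node *sketch_reset_list;

/* Add a caller-owned node to the front of the list. */
static void sketch_register_reset(struct sketch_reset_node *node)
{
    assert(node != NULL && node->compat != NULL);
    node->next = sketch_reset_list;
    sketch_reset_list = node;
}

/* Unlink the node registered for this compat string, if any. */
static void sketch_unregister_reset(const char *compat)
{
    struct sketch_reset_node **pp;

    for (pp = &sketch_reset_list; *pp; pp = &(*pp)->next) {
        if (!strcmp((*pp)->compat, compat)) {
            *pp = (*pp)->next;
            return;
        }
    }
}

/* Look up the reset function for a device's compat string; NULL if none. */
static sketch_reset_fn sketch_lookup_reset(const char *compat)
{
    const struct sketch_reset_node *n;

    for (n = sketch_reset_list; n; n = n->next)
        if (!strcmp(n->compat, compat))
            return n->reset;
    return NULL;
}

/* Example reset handler used only for illustration. */
static int sketch_example_reset(void *dev)
{
    (void)dev;
    return 0;
}
```

Compared with a static lookup table, a list like this lets a new reset module announce itself at load time without any change to the framework.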
     
  • To prepare for vfio platform reset rework let's build
    vfio_platform_common.c and vfio_platform_irq.c in a separate
    module from vfio-platform and vfio-amba. This makes it possible
    to have separate module inits and works around a race between
    platform driver init and vfio reset module init: that way we
    make sure symbols exported by base are available when vfio-platform
    driver gets probed.

    The open/release being implemented in the base module, the ref
    count is applied to the parent module instead.

    Signed-off-by: Eric Auger
    Suggested-by: Arnd Bergmann
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • vfio_platform_{read,write}_mmio() call ioremap_nocache() to map
    a region of io memory, which they store in struct vfio_platform_region to
    be eventually re-used, or unmapped by vfio_platform_regions_cleanup().

    These functions receive a copy of their struct vfio_platform_region
    argument on the stack - so these mapped areas are always allocated, and
    always leaked.

    Pass this argument as a pointer instead.

    Fixes: 6e3f26456009 "vfio/platform: read and write support for the device fd"
    Signed-off-by: James Morse
    Acked-by: Baptiste Reynal
    Tested-by: Baptiste Reynal
    Signed-off-by: Alex Williamson

    James Morse
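
The by-value versus by-pointer difference can be seen in a small userspace model. The names are hypothetical: `sketch_ioremap()` stands in for the mapping call and merely counts outstanding mappings, and `struct sketch_region` carries only the cached address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vfio_platform_region. */
struct sketch_region {
    void *ioaddr;   /* cached mapping, NULL until first access */
};

static int sketch_live_mappings;       /* counts mappings never released */
static char sketch_io_window[64];      /* pretend device memory */

/* Stand-in for the mapping call: hands out the window and counts it. */
static void *sketch_ioremap(void)
{
    sketch_live_mappings++;
    return sketch_io_window;
}

/* Buggy shape: the mapping is stored in a by-value copy and then lost. */
static void sketch_read_by_value(struct sketch_region reg)
{
    if (!reg.ioaddr)
        reg.ioaddr = sketch_ioremap();
    (void)reg.ioaddr;   /* the copy goes out of scope here */
}

/* Fixed shape: the mapping is stored in the caller's region and reused. */
static void sketch_read_by_pointer(struct sketch_region *reg)
{
    assert(reg != NULL);
    if (!reg->ioaddr)
        reg->ioaddr = sketch_ioremap();
}

/* Stand-in for the cleanup path: it can only unmap what the region records. */
static void sketch_cleanup(struct sketch_region *reg)
{
    if (reg->ioaddr) {
        sketch_live_mappings--;
        reg->ioaddr = NULL;
    }
}
```

In the by-value shape every access creates a fresh mapping that cleanup can never find. In the by-pointer shape the first access maps once, and cleanup releases it.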
     
  • Current vfio_pgsize_bitmap code hides the supported IOMMU page
    sizes smaller than PAGE_SIZE. As a result, in case the IOMMU
    does not support PAGE_SIZE page, the alignment check on map/unmap
    is done with larger page sizes, if any. This can fail although
    mapping could be done with pages smaller than PAGE_SIZE.

    This patch modifies vfio_pgsize_bitmap implementation so that,
    in case the IOMMU supports page sizes smaller than PAGE_SIZE
    we pretend PAGE_SIZE is supported and hide sub-PAGE_SIZE sizes.
    That way the user will be able to map/unmap buffers whose size/
    start address is aligned with PAGE_SIZE. Pinning code uses that
    granularity while iommu driver can use the sub-PAGE_SIZE size
    to map the buffer.

    Signed-off-by: Eric Auger
    Acked-by: Will Deacon
    Signed-off-by: Alex Williamson

    Eric Auger
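
The bitmap adjustment described above can be sketched as a pure function. The 64 KiB page size is an assumption chosen for the example, and `sketch_pgsize_bitmap()` is a hypothetical name; the kernel works with PAGE_SIZE and the intersection of the domains' page-size bitmaps.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed CPU page size for the sketch: 64 KiB (the kernel uses PAGE_SIZE). */
#define SKETCH_PAGE_SIZE ((uint64_t)1 << 16)
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * If the IOMMU supports any size below the CPU page size, hide those sizes
 * and advertise the CPU page size instead.
 */
static uint64_t sketch_pgsize_bitmap(uint64_t iommu_bitmap)
{
    uint64_t bitmap = iommu_bitmap;

    assert(iommu_bitmap != 0);   /* an IOMMU with no page sizes is not modelled */
    if (bitmap & ~SKETCH_PAGE_MASK) {
        bitmap &= SKETCH_PAGE_MASK;
        bitmap |= SKETCH_PAGE_SIZE;
    }
    return bitmap;
}
```

For an IOMMU that supports 4 KiB and 2 MiB pages, the function reports 64 KiB and 2 MiB, so alignment checks against the smallest advertised size use the CPU page size.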
     

28 Oct, 2015

3 commits

  • The vfio platform driver currently sets the IRQ_NOAUTOEN before
    doing the request_irq to properly handle the user masking. However
    it does not clear it when de-assigning the IRQ. This brings issues
    when loading the native driver again which may not explicitly enable
    the IRQ. This problem was observed with xgbe driver.

    Signed-off-by: Eric Auger
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • The PCI VPD capability operates on a set of window registers in PCI
    config space. Writing to the address register triggers either a read
    or write, depending on the setting of the PCI_VPD_ADDR_F bit within
    the address register. The data register provides either the source
    for writes or the target for reads.

    This model is susceptible to being broken by concurrent access, for
    which the kernel has adopted a set of access functions to serialize
    these registers. Additionally, commits like 932c435caba8 ("PCI: Add
    dev_flags bit to access VPD through function 0") and 7aa6ca4d39ed
    ("PCI: Add VPD function 0 quirk for Intel Ethernet devices") indicate
    that VPD registers can be shared between functions on multifunction
    devices creating dependencies between otherwise independent devices.

    Fortunately it's quite easy to emulate the VPD registers, simply
    storing copies of the address and data registers in memory and
    triggering a VPD read or write on writes to the address register.
    This allows vfio users to avoid seeing spurious register changes from
    accesses on other devices and enables the use of shared quirks in the
    host kernel. We can theoretically still race with access through
    sysfs, but the window of opportunity is much smaller.

    Signed-off-by: Alex Williamson
    Acked-by: Mark Rustad

    Alex Williamson
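
The emulation scheme can be modelled in userspace as a pair of shadow registers plus a trigger on address writes. This is a sketch under assumptions: the backing array replaces the kernel's VPD access functions, the flag and mask values are assumed to match the PCI definitions, and the convention that a set flag requests a write and is toggled on completion follows the usual VPD protocol rather than anything stated in the message above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_VPD_ADDR_F    0x8000u   /* flag bit in the address register */
#define SKETCH_VPD_ADDR_MASK 0x7fffu   /* address bits */
#define SKETCH_VPD_SIZE      64u       /* size of the pretend VPD storage */

/* Emulated register copies plus a pretend backing store. */
struct sketch_vpd {
    uint16_t addr;                    /* emulated address register */
    uint32_t data;                    /* emulated data register */
    uint8_t store[SKETCH_VPD_SIZE];   /* stands in for the device's VPD */
};

/*
 * Handle a write to the emulated address register. Flag set means "write the
 * data register to VPD"; flag clear means "read VPD into the data register".
 * The flag is toggled afterwards to signal completion.
 */
static int sketch_vpd_write_addr(struct sketch_vpd *vpd, uint16_t val)
{
    uint16_t off = val & SKETCH_VPD_ADDR_MASK;

    assert(vpd != NULL);
    if (off > SKETCH_VPD_SIZE - sizeof(vpd->data))
        return -1;                    /* out of range: nothing happens */

    if (val & SKETCH_VPD_ADDR_F)
        memcpy(&vpd->store[off], &vpd->data, sizeof(vpd->data));
    else
        memcpy(&vpd->data, &vpd->store[off], sizeof(vpd->data));

    vpd->addr = (uint16_t)(val ^ SKETCH_VPD_ADDR_F);
    return 0;
}
```

Because the user only ever sees the shadow copies, accesses made by another function of the same device do not show up as spurious changes in these registers.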
     
  • When determining whether a group is viable, we already allow devices
    bound to pcieport. Generalize this to include any PCI bridge device.

    Signed-off-by: Alex Williamson

    Alex Williamson
     

01 Oct, 2015

1 commit


25 Jul, 2015

1 commit

  • When we open a device file descriptor, we currently have the
    following:

    vfio_group_get_device_fd()
      mutex_lock(&group->device_lock);
        open()
        ...
        if (ret)
          release()
    If we hit that error case, we call the backend driver release path,
    which for vfio-pci looks like this:

    vfio_pci_release()
      vfio_pci_disable()
        vfio_pci_try_bus_reset()
          vfio_pci_get_devs()
            vfio_device_get_from_dev()
              vfio_group_get_device()
                mutex_lock(&group->device_lock);

    Whoops, we've stumbled back onto group.device_lock and created a
    deadlock. There's a low likelihood of ever seeing this play out, but
    obviously it needs to be fixed. To do that we can use a reference to
    the vfio_device for vfio_group_get_device_fd() rather than holding the
    lock. There was a loop in this function, theoretically allowing
    multiple devices with the same name, but in practice we don't expect
    such a thing to happen and the code is already aborting from the loop
    with break on any sort of error rather than continuing and only
    parsing the first match anyway, so the loop was effectively unused
    already.

    Signed-off-by: Alex Williamson
    Fixes: 20f300175a1e ("vfio/pci: Fix racy vfio_device_get_from_dev() call")
    Reported-by: Joerg Roedel
    Tested-by: Joerg Roedel

    Alex Williamson
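
The reference-instead-of-lock approach can be modelled without threads by treating the lock as a flag that must not be taken twice. All names are hypothetical, the group holds a single device, and the release path is reduced to "takes and drops the group lock".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical group with a non-recursive lock modelled as a flag. */
struct sketch_lgroup {
    bool locked;
    int dev_refs;   /* references held on the single device in this sketch */
};

static void sketch_lock(struct sketch_lgroup *g)
{
    assert(!g->locked);   /* re-acquiring here would be the deadlock */
    g->locked = true;
}

static void sketch_unlock(struct sketch_lgroup *g)
{
    g->locked = false;
}

/* Release path: like the one described above, it needs the group lock. */
static void sketch_release(struct sketch_lgroup *g)
{
    sketch_lock(g);
    sketch_unlock(g);
}

/*
 * Fixed shape: look the device up under the lock, take a reference, drop the
 * lock, and only then run open() and its error path.
 */
static int sketch_get_device_fd(struct sketch_lgroup *g, int open_ret)
{
    sketch_lock(g);
    g->dev_refs++;           /* the reference keeps the device alive */
    sketch_unlock(g);

    if (open_ret) {          /* open() failed */
        sketch_release(g);   /* safe: the lock is no longer held */
        g->dev_refs--;
        return open_ret;
    }
    return 0;                /* success: the reference stays with the fd */
}
```

Had the error path run while the lock was still held, the assertion in `sketch_lock()` would trip, which is the model's equivalent of the deadlock.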
     

29 Jun, 2015

1 commit

  • Pull VFIO updates from Alex Williamson:

    - fix race with device reference versus driver release (Alex Williamson)

    - add reset hooks and Calxeda xgmac reset for vfio-platform (Eric Auger)

    - enable vfio-platform for ARM64 (Eric Auger)

    - tag Baptiste Reynal as vfio-platform sub-maintainer (Alex Williamson)

    * tag 'vfio-v4.2-rc1' of git://github.com/awilliam/linux-vfio:
    MAINTAINERS: Add vfio-platform sub-maintainer
    VFIO: platform: enable ARM64 build
    VFIO: platform: Calxeda xgmac reset module
    VFIO: platform: populate the reset function on probe
    VFIO: platform: add reset callback
    VFIO: platform: add reset struct and lookup table
    vfio/pci: Fix racy vfio_device_get_from_dev() call

    Linus Torvalds
     

24 Jun, 2015

1 commit

  • Pull powerpc updates from Michael Ellerman:

    - disable the 32-bit vdso when building LE, so we can build with a
      64-bit only toolchain.

    - EEH fixes from Gavin & Richard.

    - enable the sys_kcmp syscall from Laurent.

    - sysfs control for fastsleep workaround from Shreyas.

    - expose OPAL events as an irq chip by Alistair.

    - MSI ops moved to pci_controller_ops by Daniel.

    - fix for kernel to userspace backtraces for perf from Anton.

    - merge pseries and pseries_le defconfigs from Cyril.

    - CXL in-kernel API from Mikey.

    - OPAL prd driver from Jeremy.

    - fix for DSCR handling & tests from Anshuman.

    - Powernv flash mtd driver from Cyril.

    - dynamic DMA Window support on powernv from Alexey.

    - LLVM clang fixes & workarounds from Anton.

    - reworked version of the patch to abort syscalls when transactional.

    - fix the swap encoding to support 4TB, from Aneesh.

    - various fixes as usual.

    - Freescale updates from Scott: Highlights include more 8xx
      optimizations, an e6500 hugetlb optimization, QMan device tree nodes,
      t1024/t1023 support, and various fixes and cleanup.

    * tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (180 commits)
    cxl: Fix typo in debug print
    cxl: Add CXL_KERNEL_API config option
    powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma()
    powerpc/mm: Change the swap encoding in pte.
    powerpc/mm: PTE_RPN_MAX is not used, remove the same
    powerpc/tm: Abort syscalls in active transactions
    powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off
    powerpc/include: Add opal-prd to installed uapi headers
    powerpc/powernv: fix construction of opal PRD messages
    powerpc/powernv: Increase opal-irqchip initcall priority
    powerpc: Make doorbell check preemption safe
    powerpc/powernv: pnv_init_idle_states() should only run on powernv
    macintosh/nvram: Remove as unused
    powerpc: Don't use gcc specific options on clang
    powerpc: Don't use -mno-strict-align on clang
    powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it
    powerpc: Only use -mabi=altivec if toolchain supports it
    powerpc: Fix duplicate const clang warning in user access code
    vfio: powerpc/spapr: Support Dynamic DMA windows
    vfio: powerpc/spapr: Register memory and define IOMMU v2
    ...

    Linus Torvalds
     

22 Jun, 2015

4 commits

  • This patch enables building VFIO platform and derivatives on ARM64.

    Signed-off-by: Eric Auger
    Acked-by: Baptiste Reynal
    Tested-by: Baptiste Reynal
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • This patch introduces a module that registers and implements a basic
    reset function for the Calxeda xgmac device. The reset function basically
    disables interrupts and stops DMA transfers.

    The reset function code is inherited from the native calxeda xgmac driver.

    Signed-off-by: Eric Auger
    Acked-by: Baptiste Reynal
    Tested-by: Baptiste Reynal
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • The reset function lookup happens on vfio-platform probe. The reset
    module load is requested and a reference to the function symbol is
    held. The reference is released on vfio-platform remove.

    Signed-off-by: Eric Auger
    Acked-by: Baptiste Reynal
    Tested-by: Baptiste Reynal
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • A new reset callback is introduced. If this callback is populated,
    the reset is invoked on device first open/last close or upon userspace
    ioctl. The modality is exposed on VFIO_DEVICE_GET_INFO.

    Signed-off-by: Eric Auger
    Acked-by: Baptiste Reynal
    Tested-by: Baptiste Reynal
    Signed-off-by: Alex Williamson

    Eric Auger
     

18 Jun, 2015

1 commit

  • This patch introduces the vfio_platform_reset_combo struct that
    stores all the information useful to handle the reset modality:
    compat string, name of the reset function, name of the module that
    implements the reset function. A lookup table of such structures
    is added, currently empty.

    Signed-off-by: Eric Auger
    Acked-by: Baptiste Reynal
    Tested-by: Baptiste Reynal
    Signed-off-by: Alex Williamson

    Eric Auger
     

11 Jun, 2015

3 commits

  • This adds create/remove window ioctls to create and remove DMA windows.
    sPAPR defines a Dynamic DMA windows capability which allows
    para-virtualized guests to create additional DMA windows on a PCI bus.
    Existing Linux kernels use this new window to map the entire guest
    memory and switch to direct DMA operations, saving time on map/unmap
    requests which would normally happen in large numbers.

    This adds 2 ioctl handlers - VFIO_IOMMU_SPAPR_TCE_CREATE and
    VFIO_IOMMU_SPAPR_TCE_REMOVE - to create and remove windows.
    Up to 2 windows are supported now by the hardware and by this driver.

    This changes the VFIO_IOMMU_SPAPR_TCE_GET_INFO handler to return additional
    information such as the number of supported windows and the maximum number
    of TCE table levels.

    DDW is added as a capability, not as a SPAPR TCE IOMMU v2 unique feature
    as we still want to support v2 on platforms which cannot do DDW for
    the sake of TCE acceleration in KVM (coming soon).

    Signed-off-by: Alexey Kardashevskiy
    [aw: for the vfio related changes]
    Acked-by: Alex Williamson
    Reviewed-by: David Gibson
    Signed-off-by: Michael Ellerman

    Alexey Kardashevskiy
     
  • The existing implementation accounts the whole DMA window in
    the locked_vm counter. This is going to be worse with multiple
    containers and huge DMA windows. Also, real-time accounting would require
    additional tracking of accounted pages due to the page size difference -
    IOMMU uses 4K pages and system uses 4K or 64K pages.

    Another issue is that the actual page pinning/unpinning happens on every
    DMA map/unmap request. This does not affect the performance much now as
    we spend way too much time now on switching context between
    guest/userspace/host but this will start to matter when we add in-kernel
    DMA map/unmap acceleration.

    This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
    New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
    2 new ioctls to register/unregister DMA memory -
    VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
    which receive user space address and size of a memory region which
    needs to be pinned/unpinned and counted in locked_vm.
    New IOMMU splits physical pages pinning and TCE table update
    into 2 different operations. It requires:
    1) guest pages to be registered first
    2) subsequent map/unmap requests to work only with pre-registered memory.
    For the default single window case this means that the entire guest
    (instead of 2GB) needs to be pinned before using VFIO.
    When a huge DMA window is added, no additional pinning will be
    required, otherwise it would be guest RAM + 2GB.

    The new memory registration ioctls are not supported by
    VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
    will require memory to be preregistered in order to work.

    The accounting is done per the user process.

    This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
    can do with v1 or v2 IOMMUs.

    In order to support memory pre-registration, we need a way to track
    the use of every registered memory region and only allow unregistration
    if a region is not in use anymore. So we need a way to tell which
    region a just-cleared TCE came from.

    This adds a userspace view of the TCE table into iommu_table struct.
    It contains userspace address, one per TCE entry. The table is only
    allocated when the ownership over an IOMMU group is taken which means
    it is only used from outside of the powernv code (such as VFIO).

    As v2 IOMMU supports IODA2 and pre-IODA2 IOMMUs (which do not support
    DDW API), this creates a default DMA window for IODA2 for consistency.

    Signed-off-by: Alexey Kardashevskiy
    [aw: for the vfio related changes]
    Acked-by: Alex Williamson
    Reviewed-by: David Gibson
    Signed-off-by: Michael Ellerman

    Alexey Kardashevskiy
     
  • Previously, the IOMMU user (VFIO) would take control over the IOMMU table
    belonging to a specific IOMMU group. This approach did not allow sharing
    tables between IOMMU groups attached to the same container.

    This introduces a new IOMMU ownership flavour where the user can not
    only control the existing IOMMU table but also remove/create tables on
    demand.
    If an IOMMU implements take/release_ownership() callbacks, this lets
    the user have full control over the IOMMU group. When the ownership
    is taken, the platform code removes all the windows so the caller must
    create them.
    Before returning the ownership back to the platform code, VFIO
    unprograms and removes all the tables it created.

    This changes IODA2's ownership handler to remove the existing table
    rather than manipulating the existing one. From now on,
    iommu_take_ownership() and iommu_release_ownership() are only called
    from the vfio_iommu_spapr_tce driver.

    Old-style ownership is still supported allowing VFIO to run on older
    P5IOC2 and IODA IO controllers.

    No change in userspace-visible behaviour is expected. Since it recreates
    TCE tables on each ownership change, related kernel traces will appear
    more often.

    This adds a pnv_pci_ioda2_setup_default_config() which is called
    when PE is being configured at boot time and when the ownership is
    passed from VFIO to the platform code.

    Signed-off-by: Alexey Kardashevskiy
    [aw: for the vfio related changes]
    Acked-by: Alex Williamson
    Reviewed-by: David Gibson
    Signed-off-by: Michael Ellerman

    Alexey Kardashevskiy