11 Jun, 2015

2 commits

  • This adds create/remove window ioctls to create and remove DMA windows.
    sPAPR defines a Dynamic DMA windows capability which allows
    para-virtualized guests to create additional DMA windows on a PCI bus.
    Existing Linux kernels use this new window to map the entire guest
    memory and switch to direct DMA operations, saving time on map/unmap
    requests which would otherwise happen in large numbers.

    This adds 2 ioctl handlers - VFIO_IOMMU_SPAPR_TCE_CREATE and
    VFIO_IOMMU_SPAPR_TCE_REMOVE - to create and remove windows.
    Up to 2 windows are supported now by the hardware and by this driver.

    This changes the VFIO_IOMMU_SPAPR_TCE_GET_INFO handler to return
    additional information such as the number of supported windows and
    the maximum number of TCE table levels.

    DDW is added as a capability, not as a SPAPR TCE IOMMU v2 unique feature
    as we still want to support v2 on platforms which cannot do DDW for
    the sake of TCE acceleration in KVM (coming soon).

    Signed-off-by: Alexey Kardashevskiy
    [aw: for the vfio related changes]
    Acked-by: Alex Williamson
    Reviewed-by: David Gibson
    Signed-off-by: Michael Ellerman

    Alexey Kardashevskiy
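
To give a feel for what mapping the entire guest memory through one window costs in table space, the following is a minimal sketch of the arithmetic, not kernel code. The helper names (tce_entries, tce_table_bytes) are hypothetical, and the sketch assumes a flat, single-level table of 64-bit TCEs and a minimum IOMMU page shift of 12 (4K pages); multi-level tables, which the commit also mentions, are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each TCE is one 64-bit (8-byte) entry. */
#define TCE_ENTRY_BYTES 8u

/*
 * Hypothetical helper: number of TCEs needed to cover window_size bytes
 * with IOMMU pages of size 2^page_shift.
 */
static uint64_t tce_entries(uint64_t window_size, unsigned page_shift)
{
    /* Assumption: 4K (shift 12) is the smallest IOMMU page size. */
    assert(page_shift >= 12 && page_shift < 64);

    uint64_t page_mask = (UINT64_C(1) << page_shift) - 1;

    /* A partial trailing page still needs its own entry. */
    return (window_size >> page_shift) +
           ((window_size & page_mask) != 0 ? 1 : 0);
}

/* Hypothetical helper: bytes occupied by a flat, single-level TCE table. */
static uint64_t tce_table_bytes(uint64_t window_size, unsigned page_shift)
{
    return tce_entries(window_size, page_shift) * TCE_ENTRY_BYTES;
}
```

Under these assumptions a 2GB window with 4K pages needs 524288 entries (4MB of table), while a 64GB window with 64K pages needs 1048576 entries (8MB), which illustrates why a larger IOMMU page size keeps a huge window affordable.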
     
  • The existing implementation accounts the whole DMA window in
    the locked_vm counter. This is going to be worse with multiple
    containers and huge DMA windows. Also, real-time accounting would require
    additional tracking of accounted pages due to the page size difference -
    the IOMMU uses 4K pages and the system uses 4K or 64K pages.

    Another issue is that actual page pinning/unpinning happens on every
    DMA map/unmap request. This does not affect performance much now, as
    we spend far too much time switching context between
    guest/userspace/host, but it will start to matter when we add in-kernel
    DMA map/unmap acceleration.

    This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
    The new IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and
    introduces 2 new ioctls to register/unregister DMA memory -
    VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
    which receive the userspace address and size of a memory region which
    needs to be pinned/unpinned and counted in locked_vm.
    The new IOMMU splits physical page pinning and TCE table updates
    into 2 different operations. It requires:
    1) guest pages to be registered first
    2) subsequent map/unmap requests to work only with pre-registered memory.
    For the default single window case this means that the entire guest
    (instead of 2GB) needs to be pinned before using VFIO.
    When a huge DMA window is added, no additional pinning will be
    required, otherwise it would be guest RAM + 2GB.

    The new memory registration ioctls are not supported by
    VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
    will require memory to be preregistered in order to work.

    Accounting is done per user process.

    This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
    can do with v1 or v2 IOMMUs.

    In order to support memory pre-registration, we need a way to track
    the use of every registered memory region and only allow unregistration
    if a region is not in use anymore. So we need a way to tell which
    region a just-cleared TCE came from.

    This adds a userspace view of the TCE table to the iommu_table struct.
    It contains userspace addresses, one per TCE entry. The table is only
    allocated when the ownership over an IOMMU group is taken which means
    it is only used from outside of the powernv code (such as VFIO).

    As v2 IOMMU supports IODA2 and pre-IODA2 IOMMUs (which do not support
    DDW API), this creates a default DMA window for IODA2 for consistency.

    Signed-off-by: Alexey Kardashevskiy
    [aw: for the vfio related changes]
    Acked-by: Alex Williamson
    Reviewed-by: David Gibson
    Signed-off-by: Michael Ellerman

    Alexey Kardashevskiy
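
The register-then-map rule described in this commit can be sketched as a toy userspace model. This is not the kernel implementation and does not call the VFIO ioctls; every name here (struct container, register_memory, map_page and so on) is hypothetical, the region table is a fixed-size array for brevity, and "in use" is tracked as a simple count of live mappings per region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 8          /* arbitrary limit for this sketch */

struct region {
    uint64_t start, size;      /* userspace address range */
    unsigned long mapped;      /* live TCEs pointing into this region */
    bool used;
};

struct container {
    struct region regions[MAX_REGIONS];
};

/* Find the registered region that fully contains [ua, ua + size). */
static struct region *find_region(struct container *c, uint64_t ua,
                                  uint64_t size)
{
    assert(c != NULL);
    for (size_t i = 0; i < MAX_REGIONS; i++) {
        struct region *r = &c->regions[i];

        if (r->used && ua >= r->start && size <= r->size &&
            ua - r->start <= r->size - size)
            return r;
    }
    return NULL;
}

/* Step 1: memory must be registered (pinned and accounted) up front. */
static int register_memory(struct container *c, uint64_t ua, uint64_t size)
{
    assert(c != NULL);
    if (size == 0)
        return -1;
    for (size_t i = 0; i < MAX_REGIONS; i++) {
        struct region *r = &c->regions[i];

        if (!r->used) {
            *r = (struct region){ .start = ua, .size = size, .used = true };
            return 0;
        }
    }
    return -1;                 /* no free slot */
}

/* Step 2: map requests only succeed inside pre-registered memory. */
static int map_page(struct container *c, uint64_t ua, uint64_t page_size)
{
    struct region *r = find_region(c, ua, page_size);

    if (!r)
        return -1;
    r->mapped++;
    return 0;
}

/* Clearing a mapping tells us which region it belonged to. */
static int unmap_page(struct container *c, uint64_t ua, uint64_t page_size)
{
    struct region *r = find_region(c, ua, page_size);

    if (!r || r->mapped == 0)
        return -1;
    r->mapped--;
    return 0;
}

/* Unregistration is refused while any mapping still uses the region. */
static int unregister_memory(struct container *c, uint64_t ua, uint64_t size)
{
    struct region *r = find_region(c, ua, size);

    if (!r || r->start != ua || r->size != size || r->mapped != 0)
        return -1;
    r->used = false;
    return 0;
}
```

In this model a map request before registration fails, a mapped region cannot be unregistered, and unregistration succeeds once the last mapping is cleared, mirroring requirements 1) and 2) above.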
     

12 May, 2015

1 commit

  • The patch adds one more EEH sub-command (VFIO_EEH_PE_INJECT_ERR)
    to inject the specified EEH error, which is represented by
    (struct vfio_eeh_pe_err), to the indicated PE for testing purposes.

    Signed-off-by: Gavin Shan
    Reviewed-by: David Gibson
    Acked-by: Alex Williamson
    Signed-off-by: Michael Ellerman

    Gavin Shan
     

05 Aug, 2014

1 commit

  • The patch adds new IOCTL commands for sPAPR VFIO container device
    to support EEH functionality for PCI devices which have been passed
    through from the host to another user, such as a guest, via VFIO.

    Signed-off-by: Gavin Shan
    Acked-by: Alexander Graf
    Acked-by: Alex Williamson
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     

06 Sep, 2013

1 commit


11 Jul, 2013

1 commit

  • Pull vfio updates from Alex Williamson:
    "Largely hugepage support for vfio/type1 iommu and surrounding cleanups
    and fixes"

    * tag 'vfio-v3.11' of git://github.com/awilliam/linux-vfio:
    vfio/type1: Fix leak on error path
    vfio: Limit group opens
    vfio/type1: Fix missed frees and zero sized removes
    vfio: fix documentation
    vfio: Provide module option to disable vfio_iommu_type1 hugepage support
    vfio: hugepage support for vfio_iommu_type1
    vfio: Convert type1 iommu to use rbtree

    Linus Torvalds
     

21 Jun, 2013

1 commit


20 Jun, 2013

1 commit

  • VFIO implements platform-independent functionality such as
    a PCI driver, BAR access (via read/write on a file descriptor
    or direct mapping when possible) and IRQ signaling.

    The platform-dependent part includes IOMMU initialization
    and handling. This implements an IOMMU driver for VFIO
    which maps/unmaps pages for guest IO and provides
    information about the DMA window (required by a POWER
    guest).

    Cc: David Gibson
    Signed-off-by: Alexey Kardashevskiy
    Signed-off-by: Paul Mackerras
    Acked-by: Alex Williamson
    Signed-off-by: Benjamin Herrenschmidt

    Alexey Kardashevskiy
     

22 Sep, 2012

1 commit


31 Jul, 2012

1 commit