18 Mar, 2016

1 commit

  • Pull VFIO updates from Alex Williamson:
    "Various enablers for assignment of Intel graphics devices and future
    support of vGPU devices (Alex Williamson). This includes

    - Handling the vfio type1 interface as an API rather than a specific
    implementation, allowing multiple type1 providers.

    - Capability chains, similar to PCI device capabilities, that allow
    extending ioctls. Extensions here include device specific regions
    and sparse mmap descriptions. The former is used to expose non-PCI
    regions for IGD, including the OpRegion (particularly the Video
    BIOS Table), and read only PCI config access to the host and LPC
    bridge as drivers often depend on identifying those devices.

    Sparse mmaps here are used to describe the MSI-X vector table, which
    vfio has always protected from mmap, but never had an API to
    explicitly define that protection. In future vGPU support this is
    expected to allow the description of PCI BARs that may mix direct
    access and emulated access within a single region.

    - The ability to expose the shadow ROM as an option ROM as IGD use
    cases may rely on the ROM even though the physical device does not
    make use of a PCI option ROM BAR"

    * tag 'vfio-v4.6-rc1' of git://github.com/awilliam/linux-vfio:
    vfio/pci: return -EFAULT if copy_to_user fails
    vfio/pci: Expose shadow ROM as PCI option ROM
    vfio/pci: Intel IGD host and LCP bridge config space access
    vfio/pci: Intel IGD OpRegion support
    vfio/pci: Enable virtual register in PCI config space
    vfio/pci: Add infrastructure for additional device specific regions
    vfio: Define device specific region type capability
    vfio/pci: Include sparse mmap capability for MSI-X table regions
    vfio: Define sparse mmap capability for regions
    vfio: Add capability chain helpers
    vfio: Define capability chains
    vfio: If an IOMMU backend fails, keep looking
    vfio/pci: Fix unsigned comparison overflow

    Linus Torvalds
     

28 Feb, 2016

1 commit

  • Calling return copy_to_user(...) in an ioctl will not
    do the right thing if there's a pagefault:
    copy_to_user returns the number of bytes not copied
    in this case.

    Fix up vfio to do
    return copy_to_user(...) ?
    -EFAULT : 0;

    everywhere.

    Cc: stable@vger.kernel.org
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Alex Williamson

    Michael S. Tsirkin
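The return convention described above can be illustrated with a minimal userspace sketch. The helper sketch_copy_to_user() and both ioctl-style wrappers are hypothetical stand-ins written for this example; only the "returns the number of bytes not copied" behaviour is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_to_user(): copies at most `writable`
 * bytes and returns the number of bytes NOT copied, which is the
 * convention the commit message describes.
 */
static unsigned long sketch_copy_to_user(void *dst, const void *src,
                                         unsigned long n,
                                         unsigned long writable)
{
    unsigned long done = n < writable ? n : writable;

    assert(dst != NULL && src != NULL);
    memcpy(dst, src, done);
    return n - done;
}

/* Buggy pattern: a short copy leaks a positive byte count to the caller. */
static long sketch_ioctl_buggy(void *dst, const void *src,
                               unsigned long n, unsigned long writable)
{
    return (long)sketch_copy_to_user(dst, src, n, writable);
}

/* Fixed pattern: any short copy is reported as -EFAULT. */
static long sketch_ioctl_fixed(void *dst, const void *src,
                               unsigned long n, unsigned long writable)
{
    return sketch_copy_to_user(dst, src, n, writable) ? -EFAULT : 0;
}
```

With the buggy wrapper, a caller that only checks for a negative result would treat a partial copy as success.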
     

26 Feb, 2016

1 commit


23 Feb, 2016

9 commits

  • Integrated graphics may have their ROM shadowed at 0xc0000 rather than
    implement a PCI option ROM. Make this ROM appear to the user using
    the ROM BAR.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • Provide read-only access to PCI config space of the PCI host bridge
    and LPC bridge through device specific regions. This may be used to
    configure a VM with matching register contents to satisfy driver
    requirements. Providing this through the vfio file descriptor removes
    an additional userspace requirement for access through pci-sysfs and
    removes the CAP_SYS_ADMIN requirement that doesn't appear to apply to
    the specific devices we're accessing.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • This is the first consumer of vfio device specific resource support,
    providing read-only access to the OpRegion for Intel graphics devices.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • Typically config space for a device is mapped out into capability
    specific handlers and unassigned space. The latter allows direct
    read/write access to config space. Sometimes we know about registers
    living in this void space and would like an easy way to virtualize
    them, similar to how BAR registers are managed. To do this, create
    one more pseudo (fake) PCI capability to be handled as purely virtual
    space. Reads and writes are serviced entirely from virtual config
    space.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • Add support for additional regions with indexes started after the
    already defined fixed regions. Device specific code can register
    these regions with the new vfio_pci_register_dev_region() function.
    The ops structure per region currently only includes read/write
    access and a release function, allowing automatic cleanup when the
    device is closed. mmap support is only missing here because it's
    not needed by the first user queued for this support.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • vfio-pci has never allowed the user to directly mmap the MSI-X vector
    table, but we've always relied on implicit knowledge of the user that
    they cannot do this. Now that we have capability chains that we can
    expose in the region info ioctl and a sparse mmap capability that
    represents the sub-areas within the region that can be mmap'd, we can
    make the mmap constraints more explicit.

    Signed-off-by: Alex Williamson

    Alex Williamson
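As a rough illustration of how a user might walk such a capability chain, the sketch below searches a returned info buffer for a capability ID. The struct is a local mirror modeled on the header layout (id, version, offset of the next entry, with 0 ending the chain); the names sketch_cap_header and sketch_find_cap() are assumptions made for this example, not kernel or library API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of a capability header; `next` is an offset from the
 * start of the info buffer, and 0 terminates the chain (assumed layout). */
struct sketch_cap_header {
    uint16_t id;
    uint16_t version;
    uint32_t next;
};

/*
 * Walk the chain starting at offset `first` and return the offset of the
 * first capability whose id matches, or -1 if none is found or the chain
 * is malformed (out of bounds or looping).
 */
static long sketch_find_cap(const uint8_t *buf, size_t len,
                            uint32_t first, uint16_t id)
{
    uint32_t off = first;
    size_t hops = 0;

    assert(buf != NULL);
    while (off != 0) {
        struct sketch_cap_header hdr;

        if (off > len || len - off < sizeof(hdr))
            return -1;              /* header would run past the buffer */
        if (hops++ > len / sizeof(hdr))
            return -1;              /* more hops than headers fit: a loop */
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.id == id)
            return (long)off;
        off = hdr.next;
    }
    return -1;
}
```

A caller would then interpret the bytes at the returned offset according to the capability type, for example as a list of mmap-able sub-areas for a sparse mmap capability.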
     
  • Allow sub-modules to easily reallocate a buffer for managing
    capability chains for info ioctls.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • Consider an IOMMU to be an API rather than an implementation; we might
    have multiple implementations supporting the same API, so try another
    if one fails. The expectation here is that we'll really only have
    one implementation per device type. For instance the existing type1
    driver works with any PCI device where the IOMMU API is available. A
    vGPU vendor may have a virtual PCI device which provides DMA isolation
    and mapping through other mechanisms, but can re-use userspaces that
    make use of the type1 VFIO IOMMU API. This allows that to work.

    Signed-off-by: Alex Williamson

    Alex Williamson
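The "keep looking" behaviour can be sketched as a loop over registered backends that only stops at the first one whose open succeeds. Everything here (struct sketch_backend, the two example open callbacks, sketch_try_backends()) is hypothetical and exists only to show the control flow, not the actual vfio driver list handling.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical backend descriptor: several may implement the same API. */
struct sketch_backend {
    const char *name;
    int (*open)(unsigned long type);
};

/* Example callbacks: one backend rejects the request, one accepts it. */
static int sketch_open_fail(unsigned long type) { (void)type; return -EINVAL; }
static int sketch_open_ok(unsigned long type)   { (void)type; return 0; }

/*
 * Try each backend in turn. A failure from one backend is not final;
 * the error is only returned if every candidate fails.
 */
static int sketch_try_backends(const struct sketch_backend *list, size_t n,
                               unsigned long type, const char **picked)
{
    int ret = -ENODEV;

    assert(picked != NULL);
    for (size_t i = 0; i < n; i++) {
        ret = list[i].open(type);
        if (ret == 0) {
            *picked = list[i].name;
            return 0;
        }
    }
    return ret;
}
```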
     
  • Signed versus unsigned comparisons are implicitly cast to unsigned,
    which results in a couple of possible overflows. For instance (start +
    count) might overflow and wrap, getting through our validation test.
    Also when unwinding setup, -1 being compared as unsigned doesn't
    produce the intended stop condition. Fix both of these and also fix
    vfio_msi_set_vector_signal() to validate parameters before using the
    vector index, though none of the callers should pass bad indexes
    anymore.

    Reported-by: Eric Auger
    Reviewed-by: Eric Auger
    Tested-by: Eric Auger
    Signed-off-by: Alex Williamson

    Alex Williamson
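Both pitfalls can be shown in a small standalone sketch. The function names are made up for this example and the checks are a generic illustration of the technique, not the code from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Naive validation: start + count can wrap and slip under max. */
static bool sketch_range_ok_naive(unsigned int start, unsigned int count,
                                  unsigned int max)
{
    return start + count <= max;
}

/* Wrap-safe validation: never forms the sum start + count. */
static bool sketch_range_ok_safe(unsigned int start, unsigned int count,
                                 unsigned int max)
{
    return start <= max && count <= max - start;
}

/*
 * Unwind from index `last` down to 0. The index is kept signed so that
 * the "i >= 0" stop condition works; with an unsigned index the test
 * would always be true. Returns the number of entries visited.
 */
static int sketch_unwind(int last)
{
    int visited = 0;

    assert(last >= -1);
    for (int i = last; i >= 0; i--)
        visited++;
    return visited;
}
```

For example, with start = UINT_MAX and count = 2 the naive sum wraps to 1, so the naive check passes while the wrap-safe check rejects the range.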
     

28 Jan, 2016

1 commit

  • Using iommu_present() to determine whether an IOMMU group is real or
    fake has some problems. First, apparently Power systems don't
    register an IOMMU on the device bus, so the groups and containers get
    marked as noiommu and then won't bind to their actual IOMMU driver.
    Second, I expect we'll run into the same issue as we try to support
    vGPUs through vfio, since they're likely to emulate this behavior of
    creating an IOMMU group on a virtual device and then providing a vfio
    IOMMU backend tailored to the sort of isolation they provide, which
    won't necessarily be fully compatible with the IOMMU API.

    The solution here is to use the existing iommudata interface to IOMMU
    groups, which allows us to easily identify the fake groups we've
    created for noiommu purposes. The iommudata we set is purely
    arbitrary since we're only comparing the address, so we use the
    address of the noiommu switch itself.

    Reported-by: Alexey Kardashevskiy
    Reviewed-by: Alexey Kardashevskiy
    Tested-by: Alexey Kardashevskiy
    Tested-by: Anatoly Burakov
    Tested-by: Santosh Shukla
    Fixes: 03a76b60f8ba ("vfio: Include No-IOMMU mode")
    Signed-off-by: Alex Williamson

    Alex Williamson
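The address-as-marker idea can be sketched in a few lines. The group structure and helper names below are hypothetical; the only point taken from the commit message is that the stored pointer is compared by address and never dereferenced, so any stable address owned by the module will do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stands in for the module's noiommu on/off switch. */
static bool sketch_noiommu_switch;

/* Hypothetical group with an opaque per-group data pointer. */
struct sketch_group {
    void *iommudata;
};

/* Tag a group we created ourselves as a fake (no-IOMMU) group. */
static void sketch_mark_fake(struct sketch_group *g)
{
    assert(g != NULL);
    g->iommudata = &sketch_noiommu_switch;
}

/* A group is fake only if it carries exactly our marker address. */
static bool sketch_is_fake(const struct sketch_group *g)
{
    assert(g != NULL);
    return g->iommudata == (void *)&sketch_noiommu_switch;
}
```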
     

05 Jan, 2016

1 commit

  • The flags entry is there to tell the user that some
    optional information is available.

    Since we report the iova_pgsizes, signal it to the user
    by setting the flags to VFIO_IOMMU_INFO_PGSIZES.

    Signed-off-by: Pierre Morel
    Signed-off-by: Alex Williamson

    Pierre Morel
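From the user's side, the flag is what makes the optional field trustworthy. The sketch below uses a local mirror of the info structure and a local copy of the flag bit (both named sketch_* and assumed for illustration) rather than including the real UAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the flag bit; assumed to mirror VFIO_IOMMU_INFO_PGSIZES. */
#define SKETCH_INFO_PGSIZES (1u << 0)

/* Local mirror of the info structure returned to the user (assumed layout). */
struct sketch_iommu_info {
    uint32_t argsz;
    uint32_t flags;
    uint64_t iova_pgsizes;
};

/*
 * Only read iova_pgsizes when the flag says it was filled in.
 * Returns true and stores the bitmap on success, false otherwise.
 */
static bool sketch_get_pgsizes(const struct sketch_iommu_info *info,
                               uint64_t *pgsizes)
{
    assert(info != NULL && pgsizes != NULL);
    if (!(info->flags & SKETCH_INFO_PGSIZES))
        return false;
    *pgsizes = info->iova_pgsizes;
    return true;
}
```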
     

22 Dec, 2015

2 commits

  • There is really no way to safely give a user full access to a DMA
    capable device without an IOMMU to protect the host system. There is
    also no way to provide DMA translation, for use cases such as device
    assignment to virtual machines. However, there are still those users
    that want userspace drivers even under those conditions. The UIO
    driver exists for this use case, but does not provide the degree of
    device access and programming that VFIO has. In an effort to avoid
    code duplication, this introduces a No-IOMMU mode for VFIO.

    This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
    the "enable_unsafe_noiommu_mode" option on the vfio driver. This
    should make it very clear that this mode is not safe. Additionally,
    CAP_SYS_RAWIO privileges are necessary to work with groups and
    containers using this mode. Groups making use of this support are
    named /dev/vfio/noiommu-$GROUP and can only make use of the special
    VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically
    binding a device without a native IOMMU group to a VFIO bus driver
    will taint the kernel and should therefore not be considered
    supported. This patch includes no-iommu support for the vfio-pci bus
    driver only.

    Signed-off-by: Alex Williamson
    Acked-by: Michael S. Tsirkin

    Alex Williamson
     
  • This loop ends with count set to -1 and not zero so the warning message
    isn't printed when it should be. I've fixed this by changing the postop
    to a preop.

    Fixes: 0990822c9866 ('VFIO: platform: reset: AMD xgbe reset module')
    Signed-off-by: Dan Carpenter
    Reviewed-by: Eric Auger
    Signed-off-by: Alex Williamson

    Dan Carpenter
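The difference between the two loop forms can be reproduced in isolation. The polling callbacks and function names are hypothetical; the sketch only shows what value the counter holds when a timeout loop of each form runs out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device polls: one never becomes ready, one is ready at once. */
static bool sketch_always_busy(void) { return true; }
static bool sketch_never_busy(void)  { return false; }

/* Post-decrement form: on timeout the counter ends at -1, not 0. */
static int sketch_poll_postdec(int count, bool (*busy)(void))
{
    assert(busy != NULL);
    while (count-- && busy())
        ;
    return count;
}

/* Pre-decrement form: on timeout the counter ends at exactly 0. */
static int sketch_poll_predec(int count, bool (*busy)(void))
{
    assert(busy != NULL);
    while (--count && busy())
        ;
    return count;
}
```

A later "if (!count)" timeout check therefore only fires with the pre-decrement form, at the cost of one fewer poll.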
     

04 Dec, 2015

1 commit

  • Revert commit 033291eccbdb ("vfio: Include No-IOMMU mode") due to lack
    of a user. This was originally intended to fill a need for the DPDK
    driver, but uptake has been slow so rather than support an unproven
    kernel interface revert it and revisit when userspace catches up.

    Signed-off-by: Alex Williamson

    Alex Williamson
     

21 Nov, 2015

2 commits


20 Nov, 2015

2 commits


14 Nov, 2015

1 commit

  • Pull VFIO updates from Alex Williamson:
    - Use kernel interfaces for VPD emulation (Alex Williamson)
    - Platform fix for releasing IRQs (Eric Auger)
    - Type1 IOMMU always advertises PAGE_SIZE support when smaller mapping
    sizes are available (Eric Auger)
    - Platform fixes for incorrectly using copies of structures rather than
    pointers to structures (James Morse)
    - Rework platform reset modules, fix leak, and add AMD xgbe reset
    module (Eric Auger)
    - Fix vfio_device_get_from_name() return value (Joerg Roedel)
    - No-IOMMU interface (Alex Williamson)
    - Fix potential out of bounds array access in PCI config handling (Dan
    Carpenter)

    * tag 'vfio-v4.4-rc1' of git://github.com/awilliam/linux-vfio:
    vfio/pci: make an array larger
    vfio: Include No-IOMMU mode
    vfio: Fix bug in vfio_device_get_from_name()
    VFIO: platform: reset: AMD xgbe reset module
    vfio: platform: reset: calxedaxgmac: fix ioaddr leak
    vfio: platform: add dev_info on device reset
    vfio: platform: use list of registered reset function
    vfio: platform: add compat in vfio_platform_device
    vfio: platform: reset: calxedaxgmac: add reset function registration
    vfio: platform: introduce module_vfio_reset_handler macro
    vfio: platform: add capability to register a reset function
    vfio: platform: introduce vfio-platform-base module
    vfio/platform: store mapped memory in region, instead of an on-stack copy
    vfio/type1: handle case where IOMMU does not support PAGE_SIZE size
    VFIO: platform: clear IRQ_NOAUTOEN when de-assigning the IRQ
    vfio/pci: Use kernel VPD access functions
    vfio: Whitelist PCI bridges

    Linus Torvalds
     

09 Nov, 2015

1 commit


05 Nov, 2015

2 commits

  • There is really no way to safely give a user full access to a DMA
    capable device without an IOMMU to protect the host system. There is
    also no way to provide DMA translation, for use cases such as device
    assignment to virtual machines. However, there are still those users
    that want userspace drivers even under those conditions. The UIO
    driver exists for this use case, but does not provide the degree of
    device access and programming that VFIO has. In an effort to avoid
    code duplication, this introduces a No-IOMMU mode for VFIO.

    This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
    the "enable_unsafe_noiommu_mode" option on the vfio driver. This
    should make it very clear that this mode is not safe. Additionally,
    CAP_SYS_RAWIO privileges are necessary to work with groups and
    containers using this mode. Groups making use of this support are
    named /dev/vfio/noiommu-$GROUP and can only make use of the special
    VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically
    binding a device without a native IOMMU group to a VFIO bus driver
    will taint the kernel and should therefore not be considered
    supported. This patch includes no-iommu support for the vfio-pci bus
    driver only.

    Signed-off-by: Alex Williamson
    Acked-by: Michael S. Tsirkin

    Alex Williamson
     
  • The vfio_device_get_from_name() function might return a
    non-NULL pointer, when called with a device name that is not
    found in the list. This causes undefined behavior, in my
    case calling an invalid function pointer later on:

    kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
    BUG: unable to handle kernel paging request at ffff8800cb3ddc08

    [...]

    Call Trace:
    [] ? vfio_group_fops_unl_ioctl+0x253/0x410 [vfio]
    [] do_vfs_ioctl+0x2cd/0x4c0
    [] ? __fget+0x77/0xb0
    [] SyS_ioctl+0x79/0x90
    [] ? syscall_return_slowpath+0x50/0x130
    [] entry_SYSCALL_64_fastpath+0x16/0x75

    Fix the issue by returning NULL when there is no device with
    the requested name in the list.

    Cc: stable@vger.kernel.org # v4.2+
    Fixes: 4bc94d5dc95d ("vfio: Fix lockdep issue")
    Signed-off-by: Joerg Roedel
    Signed-off-by: Alex Williamson

    Joerg Roedel
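The general shape of this bug, returning the loop cursor instead of a separately tracked result, can be sketched with a plain array. The types and function names are hypothetical and the array replaces the kernel's list for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device entry, identified by name. */
struct sketch_dev {
    const char *name;
};

/* Buggy shape: returns the cursor, which is non-NULL even when the
 * loop ran off the end without finding a match. */
static const struct sketch_dev *sketch_find_buggy(const struct sketch_dev *devs,
                                                  size_t n, const char *name)
{
    const struct sketch_dev *cur;

    for (cur = devs; cur < devs + n; cur++)
        if (strcmp(cur->name, name) == 0)
            break;
    return cur;                 /* one past the end when not found */
}

/* Fixed shape: a separate result pointer stays NULL unless matched. */
static const struct sketch_dev *sketch_find_fixed(const struct sketch_dev *devs,
                                                  size_t n, const char *name)
{
    const struct sketch_dev *found = NULL;

    assert(devs != NULL && name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(devs[i].name, name) == 0) {
            found = &devs[i];
            break;
        }
    }
    return found;
}
```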
     

04 Nov, 2015

11 commits

  • This patch introduces a module that registers and implements a low-level
    reset function for the AMD XGBE device.

    It performs the following actions:
    - reset the PHY
    - disable auto-negotiation
    - disable & clear auto-negotiation IRQ
    - soft-reset the MAC

    Those tiny pieces of code are inherited from the native xgbe driver.

    Signed-off-by: Eric Auger
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • In the current code the vfio_platform_region is copied on the stack.
    As a consequence the ioaddr address is not iounmapped in the vfio
    platform driver (vfio_platform_regions_cleanup). The patch uses the
    pointer to the region instead.

    Signed-off-by: Eric Auger
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • It might be helpful for the end-user to check the device reset
    function was found by the vfio platform reset framework.

    Let's store a pointer to the struct device in vfio_platform_device
    and trace when the reset function is called or not found.

    Signed-off-by: Eric Auger
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • Remove the static lookup table and use the dynamic list of registered
    reset functions instead. Also load the reset module through its alias.
    The reset struct module pointer is stored in vfio_platform_device.

    We also remove the useless struct device pointer parameter in
    vfio_platform_get_reset.

    This patch fixes the issue related to the usage of __symbol_get, which
    besides being moot, prevented compilation with CONFIG_MODULES
    disabled.

    Also usage of MODULE_ALIAS makes it possible to add a new reset module
    without needing to update the framework. This was suggested by Arnd.

    Signed-off-by: Eric Auger
    Reported-by: Arnd Bergmann
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • Let's retrieve the compatibility string on probe and store it
    in the vfio_platform_device struct

    Signed-off-by: Eric Auger
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • This patch adds the reset function registration/unregistration.
    This is handled through the module_vfio_reset_handler macro. The
    latter also defines a MODULE_ALIAS which simplifies the load from
    vfio-platform.

    Signed-off-by: Eric Auger
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • The module_vfio_reset_handler macro:
    - defines a module alias
    - implements module init/exit functions which respectively register
    and unregister the reset function.

    Signed-off-by: Eric Auger
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • In preparation for subsequent changes in reset function lookup,
    let's introduce a dynamic list of reset combos (compat string,
    reset module, reset function). The list can be populated/voided with
    vfio_platform_register/unregister_reset. Those are not yet used in
    this patch.

    Signed-off-by: Eric Auger
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Alex Williamson

    Eric Auger
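A dynamic registry keyed by compat string might look roughly like the sketch below. All names are hypothetical, the list is a plain singly linked list, and the locking and module reference handling that a kernel implementation would need are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*sketch_reset_fn)(void *dev);

/* One registered "combo": compat string plus its reset function. */
struct sketch_reset_node {
    const char *compat;
    sketch_reset_fn reset;
    struct sketch_reset_node *next;
};

static struct sketch_reset_node *sketch_reset_list;

/* Example reset function used to exercise the registry. */
static int sketch_dummy_reset(void *dev) { (void)dev; return 0; }

/* Add a node at the head of the list. */
static void sketch_register_reset(struct sketch_reset_node *node)
{
    assert(node != NULL && node->compat != NULL);
    node->next = sketch_reset_list;
    sketch_reset_list = node;
}

/* Unlink the first node registered for this compat string, if any. */
static void sketch_unregister_reset(const char *compat)
{
    struct sketch_reset_node **pp = &sketch_reset_list;

    while (*pp) {
        if (strcmp((*pp)->compat, compat) == 0) {
            *pp = (*pp)->next;
            return;
        }
        pp = &(*pp)->next;
    }
}

/* Look up the reset function for a device's compat string. */
static sketch_reset_fn sketch_lookup_reset(const char *compat)
{
    for (struct sketch_reset_node *n = sketch_reset_list; n; n = n->next)
        if (strcmp(n->compat, compat) == 0)
            return n->reset;
    return NULL;
}
```

Because lookup goes through the list rather than a static table, a new reset module only needs to register itself; the framework code does not change.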
     
  • To prepare for vfio platform reset rework let's build
    vfio_platform_common.c and vfio_platform_irq.c in a separate
    module from vfio-platform and vfio-amba. This makes it possible
    to have separate module inits and works around a race between
    platform driver init and vfio reset module init: that way we
    make sure symbols exported by base are available when vfio-platform
    driver gets probed.

    The open/release being implemented in the base module, the ref
    count is applied to the parent module instead.

    Signed-off-by: Eric Auger
    Suggested-by: Arnd Bergmann
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • vfio_platform_{read,write}_mmio() call ioremap_nocache() to map
    a region of io memory, which they store in struct vfio_platform_region to
    be eventually re-used, or unmapped by vfio_platform_regions_cleanup().

    These functions receive a copy of their struct vfio_platform_region
    argument on the stack - so these mapped areas are always allocated, and
    always leaked.

    Pass this argument as a pointer instead.

    Fixes: 6e3f26456009 "vfio/platform: read and write support for the device fd"
    Signed-off-by: James Morse
    Acked-by: Baptiste Reynal
    Tested-by: Baptiste Reynal
    Signed-off-by: Alex Williamson

    James Morse
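Why a by-value argument loses the mapping can be seen in a small sketch. The region structure, the fake MMIO buffer, and both helpers are hypothetical; the assignment to ioaddr stands in for storing the result of an ioremap-style call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical region with a lazily created mapping. */
struct sketch_region {
    void *ioaddr;
};

/* Stands in for the address an ioremap-style call would return. */
static char sketch_fake_mmio[16];

/* By value: the mapping is stored in a stack copy and then lost,
 * so every call maps again and cleanup never sees the address. */
static void sketch_map_by_value(struct sketch_region reg)
{
    if (!reg.ioaddr)
        reg.ioaddr = sketch_fake_mmio;
}

/* By pointer: the mapping is stored in the caller's region, so it can
 * be reused on the next access and unmapped at cleanup. */
static void sketch_map_by_pointer(struct sketch_region *reg)
{
    assert(reg != NULL);
    if (!reg->ioaddr)
        reg->ioaddr = sketch_fake_mmio;
}
```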
     
  • Current vfio_pgsize_bitmap code hides the supported IOMMU page
    sizes smaller than PAGE_SIZE. As a result, in case the IOMMU
    does not support PAGE_SIZE pages, the alignment check on map/unmap
    is done with larger page sizes, if any. This can fail although
    mapping could be done with pages smaller than PAGE_SIZE.

    This patch modifies vfio_pgsize_bitmap implementation so that,
    in case the IOMMU supports page sizes smaller than PAGE_SIZE
    we pretend PAGE_SIZE is supported and hide sub-PAGE_SIZE sizes.
    That way the user will be able to map/unmap buffers whose size/
    start address is aligned with PAGE_SIZE. Pinning code uses that
    granularity while iommu driver can use the sub-PAGE_SIZE size
    to map the buffer.

    Signed-off-by: Eric Auger
    Acked-by: Will Deacon
    Signed-off-by: Alex Williamson

    Eric Auger
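The bitmap adjustment described above can be sketched as follows, assuming a 64 KiB host page size for the example. SKETCH_PAGE_SIZE, SKETCH_PAGE_MASK, and sketch_advertise_pgsizes() are names invented here; the logic is one plausible reading of the description, not a copy of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed host page size for this example: 64 KiB. */
#define SKETCH_PAGE_SIZE 0x10000UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Given the IOMMU's supported page-size bitmap, return the bitmap to
 * advertise: if any size below the host page size is supported, hide
 * those sizes and pretend the host page size itself is supported.
 */
static unsigned long sketch_advertise_pgsizes(unsigned long iommu_bitmap)
{
    assert(iommu_bitmap != 0);
    if (iommu_bitmap & ~SKETCH_PAGE_MASK) {
        iommu_bitmap &= SKETCH_PAGE_MASK;   /* drop sub-page sizes */
        iommu_bitmap |= SKETCH_PAGE_SIZE;   /* advertise the host page */
    }
    return iommu_bitmap;
}
```

For an IOMMU supporting 4 KiB and 2 MiB pages (bitmap 0x201000), the advertised bitmap becomes 0x210000, so alignment checks are done against 64 KiB rather than 2 MiB.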
     

28 Oct, 2015

3 commits

  • The vfio platform driver currently sets the IRQ_NOAUTOEN before
    doing the request_irq to properly handle the user masking. However
    it does not clear it when de-assigning the IRQ. This brings issues
    when loading the native driver again which may not explicitly enable
    the IRQ. This problem was observed with the xgbe driver.

    Signed-off-by: Eric Auger
    Signed-off-by: Alex Williamson

    Eric Auger
     
  • The PCI VPD capability operates on a set of window registers in PCI
    config space. Writing to the address register triggers either a read
    or write, depending on the setting of the PCI_VPD_ADDR_F bit within
    the address register. The data register provides either the source
    for writes or the target for reads.

    This model is susceptible to being broken by concurrent access, for
    which the kernel has adopted a set of access functions to serialize
    these registers. Additionally, commits like 932c435caba8 ("PCI: Add
    dev_flags bit to access VPD through function 0") and 7aa6ca4d39ed
    ("PCI: Add VPD function 0 quirk for Intel Ethernet devices") indicate
    that VPD registers can be shared between functions on multifunction
    devices creating dependencies between otherwise independent devices.

    Fortunately it's quite easy to emulate the VPD registers, simply
    storing copies of the address and data registers in memory and
    triggering a VPD read or write on writes to the address register.
    This allows vfio users to avoid seeing spurious register changes from
    accesses on other devices and enables the use of shared quirks in the
    host kernel. We can theoretically still race with access through
    sysfs, but the window of opportunity is much smaller.

    Signed-off-by: Alex Williamson
    Acked-by: Mark Rustad

    Alex Williamson
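The emulation scheme can be modeled in userspace as a pair of shadow registers in front of a backing store. In this sketch the backing array stands in for whatever serialized VPD access the kernel provides; the struct, the macro names, and the handling of unaligned or out-of-range addresses are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_VPD_ADDR_F    0x8000u  /* flag bit, as for PCI_VPD_ADDR_F */
#define SKETCH_VPD_ADDR_MASK 0x7fffu

struct sketch_vpd {
    uint16_t addr;         /* emulated address register */
    uint32_t data;         /* emulated data register */
    uint8_t  backing[64];  /* stand-in for the device's VPD storage */
};

/*
 * Emulate a write to the address register. Flag set means "write the
 * data register to VPD", flag clear means "read VPD into the data
 * register". Completion is signalled by flipping the flag.
 */
static void sketch_vpd_write_addr(struct sketch_vpd *v, uint16_t val)
{
    uint16_t off = val & SKETCH_VPD_ADDR_MASK;

    assert(v != NULL);
    v->addr = val;

    /* Unaligned or out of range: leave the flag as written, so the
     * access never appears to complete (a choice made for this sketch). */
    if ((off & 3) || off > sizeof(v->backing) - 4)
        return;

    if (val & SKETCH_VPD_ADDR_F) {
        memcpy(&v->backing[off], &v->data, 4);
        v->addr = off;                        /* flag cleared: write done */
    } else {
        memcpy(&v->data, &v->backing[off], 4);
        v->addr = off | SKETCH_VPD_ADDR_F;    /* flag set: data ready */
    }
}
```

Because the user only ever sees the shadow registers, accesses made by other functions to the real registers do not show up as spurious changes.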
     
  • When determining whether a group is viable, we already allow devices
    bound to pcieport. Generalize this to include any PCI bridge device.

    Signed-off-by: Alex Williamson

    Alex Williamson
     

01 Oct, 2015

1 commit