18 Mar, 2016

1 commit

  • Pull VFIO updates from Alex Williamson:
    "Various enablers for assignment of Intel graphics devices and future
    support of vGPU devices (Alex Williamson). This includes

    - Handling the vfio type1 interface as an API rather than a specific
    implementation, allowing multiple type1 providers.

    - Capability chains, similar to PCI device capabilities, that allow
    extending ioctls. Extensions here include device specific regions
    and sparse mmap descriptions. The former is used to expose non-PCI
    regions for IGD, including the OpRegion (particularly the Video
    BIOS Table), and read only PCI config access to the host and LPC
    bridge as drivers often depend on identifying those devices.

    Sparse mmaps here are used to describe the MSI-X vector table, which
    vfio has always protected from mmap, but never had an API to
    explicitly define that protection. In future vGPU support this is
    expected to allow the description of PCI BARs that may mix direct
    access and emulated access within a single region.

    - The ability to expose the shadow ROM as an option ROM as IGD use
    cases may rely on the ROM even though the physical device does not
    make use of a PCI option ROM BAR"

    * tag 'vfio-v4.6-rc1' of git://github.com/awilliam/linux-vfio:
    vfio/pci: return -EFAULT if copy_to_user fails
    vfio/pci: Expose shadow ROM as PCI option ROM
    vfio/pci: Intel IGD host and LCP bridge config space access
    vfio/pci: Intel IGD OpRegion support
    vfio/pci: Enable virtual register in PCI config space
    vfio/pci: Add infrastructure for additional device specific regions
    vfio: Define device specific region type capability
    vfio/pci: Include sparse mmap capability for MSI-X table regions
    vfio: Define sparse mmap capability for regions
    vfio: Add capability chain helpers
    vfio: Define capability chains
    vfio: If an IOMMU backend fails, keep looking
    vfio/pci: Fix unsigned comparison overflow

    Linus Torvalds
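
    The capability chain idea described above can be sketched in user
    space as follows. This is only an illustration: struct cap_header is
    a local stand-in modeled on the header the series defines, and
    find_cap(), put_cap(), demo_find() and the ids 1 and 2 are
    hypothetical names and values, not the kernel's API. It assumes each
    header stores the offset of the next capability from the start of the
    info buffer, with 0 ending the chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for a capability header (hypothetical layout). */
struct cap_header {
    uint16_t id;
    uint16_t version;
    uint32_t next;   /* offset of the next capability, 0 = end of chain */
};

/* Return the offset of the capability with the given id, or 0 if absent. */
static uint32_t find_cap(const uint8_t *buf, size_t len,
                         uint32_t first, uint16_t id)
{
    struct cap_header hdr;
    size_t max_hops = len / sizeof(hdr);   /* guards against looping chains */
    uint32_t off = first;

    for (size_t hops = 0; off != 0 && hops < max_hops; hops++) {
        if (off > len - sizeof(hdr))       /* header would overrun buffer */
            return 0;
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.id == id)
            return off;
        off = hdr.next;
    }
    return 0;
}

/* Write one header into the buffer (demo helper). */
static void put_cap(uint8_t *buf, uint32_t off, uint16_t id, uint32_t next)
{
    struct cap_header hdr = { .id = id, .version = 1, .next = next };
    memcpy(buf + off, &hdr, sizeof(hdr));
}

/* Build a two-entry chain (id 1 at offset 32, id 2 at offset 48). */
static uint32_t demo_find(uint16_t id)
{
    uint8_t buf[64] = { 0 };

    put_cap(buf, 32, 1, 48);
    put_cap(buf, 48, 2, 0);
    return find_cap(buf, sizeof(buf), 32, id);
}
```

    A caller that does not recognize an id simply follows next to the
    following entry, which is what lets an ioctl grow new extensions
    without breaking existing users.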
     

28 Feb, 2016

1 commit

  • Calling return copy_to_user(...) in an ioctl will not
    do the right thing if there's a pagefault:
    copy_to_user returns the number of bytes not copied
    in this case.

    Fix up vfio to do
        return copy_to_user(...) ? -EFAULT : 0;

    everywhere.

    Cc: stable@vger.kernel.org
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Alex Williamson

    Michael S. Tsirkin
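
    A minimal user-space sketch of the problem, assuming a stand-in for
    copy_to_user(): fake_copy_to_user(), ioctl_buggy() and ioctl_fixed()
    are hypothetical names, and the "writable" parameter simulates a
    fault partway through the destination buffer.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_to_user(): copies at most `writable`
 * bytes and, like the real function, returns the number of bytes that
 * could NOT be copied (0 on full success).
 */
static unsigned long fake_copy_to_user(void *dst, const void *src,
                                       unsigned long n,
                                       unsigned long writable)
{
    unsigned long done = n < writable ? n : writable;

    memcpy(dst, src, done);
    return n - done;
}

/* Buggy pattern: the leftover byte count leaks out as the ioctl result. */
static long ioctl_buggy(unsigned long writable)
{
    char dst[16];
    const char src[16] = "vfio";

    return (long)fake_copy_to_user(dst, src, sizeof(src), writable);
}

/* Fixed pattern: any short copy is reported as -EFAULT. */
static long ioctl_fixed(unsigned long writable)
{
    char dst[16];
    const char src[16] = "vfio";

    return fake_copy_to_user(dst, src, sizeof(src), writable) ? -EFAULT : 0;
}
```

    With only 10 of 16 bytes writable, the buggy variant returns 6, a
    positive value the caller cannot tell apart from a legitimate result,
    while the fixed variant returns -EFAULT.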
     

26 Feb, 2016

1 commit


23 Feb, 2016

7 commits

  • Integrated graphics may have their ROM shadowed at 0xc0000 rather than
    implement a PCI option ROM. Make this ROM appear to the user using
    the ROM BAR.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • Provide read-only access to PCI config space of the PCI host bridge
    and LPC bridge through device specific regions. This may be used to
    configure a VM with matching register contents to satisfy driver
    requirements. Providing this through the vfio file descriptor removes
    an additional userspace requirement for access through pci-sysfs and
    removes the CAP_SYS_ADMIN requirement that doesn't appear to apply to
    the specific devices we're accessing.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • This is the first consumer of vfio device specific resource support,
    providing read-only access to the OpRegion for Intel graphics devices.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • Typically config space for a device is mapped out into capability
    specific handlers and unassigned space. The latter allows direct
    read/write access to config space. Sometimes we know about registers
    living in this void space and would like an easy way to virtualize
    them, similar to how BAR registers are managed. To do this, create
    one more pseudo (fake) PCI capability to be handled as purely virtual
    space. Reads and writes are serviced entirely from virtual config
    space.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • Add support for additional regions with indexes starting after the
    already defined fixed regions. Device specific code can register
    these regions with the new vfio_pci_register_dev_region() function.
    The ops structure per region currently only includes read/write
    access and a release function, allowing automatic cleanup when the
    device is closed. mmap support is only missing here because it's
    not needed by the first user queued for this support.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • vfio-pci has never allowed the user to directly mmap the MSI-X vector
    table, but we've always relied on implicit knowledge of the user that
    they cannot do this. Now that we have capability chains that we can
    expose in the region info ioctl and a sparse mmap capability that
    represents the sub-areas within the region that can be mmap'd, we can
    make the mmap constraints more explicit.

    Signed-off-by: Alex Williamson

    Alex Williamson
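
    As a rough sketch of what such a sparse mmap description could
    contain, the function below computes the mmap'able sub-areas of a BAR
    around a page-aligned MSI-X table. struct area and msix_mmap_areas()
    are hypothetical names, and the rounding to page boundaries is an
    assumption about how the protected range is sized, not a copy of the
    driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One mmap'able sub-area of a region (hypothetical type). */
struct area {
    uint64_t offset;
    uint64_t size;
};

/*
 * Fill out[] with the parts of a BAR that do not share a page with the
 * MSI-X table; returns the number of areas (0, 1 or 2).
 * `page` must be a power of two.
 */
static size_t msix_mmap_areas(uint64_t bar_size, uint64_t tbl_off,
                              uint64_t tbl_size, uint64_t page,
                              struct area out[2])
{
    /* Round the protected range outwards to whole pages. */
    uint64_t start = tbl_off & ~(page - 1);
    uint64_t end = (tbl_off + tbl_size + page - 1) & ~(page - 1);
    size_t n = 0;

    if (start > 0)
        out[n++] = (struct area){ 0, start };
    if (end < bar_size)
        out[n++] = (struct area){ end, bar_size - end };
    return n;
}
```

    For a 16 KiB BAR with a 128 byte table at offset 0x2000 and 4 KiB
    pages, this yields two areas: [0, 0x2000) and [0x3000, 0x4000).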
     
  • In signed versus unsigned comparisons, the signed operand is implicitly
    cast to unsigned, which results in a couple of possible overflows. For
    instance (start + count) might overflow and wrap, getting through our
    validation test.
    Also when unwinding setup, -1 being compared as unsigned doesn't
    produce the intended stop condition. Fix both of these and also fix
    vfio_msi_set_vector_signal() to validate parameters before using the
    vector index, though none of the callers should pass bad indexes
    anymore.

    Reported-by: Eric Auger
    Reviewed-by: Eric Auger
    Tested-by: Eric Auger
    Signed-off-by: Alex Williamson

    Alex Williamson
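
    Both failure modes can be reproduced in a few lines of user-space C.
    The function names below are hypothetical and the checks are a
    sketch of the pattern, not the driver's actual code; the implicit
    conversion is written as an explicit cast so the behavior is visible.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Buggy range check: start + count can wrap around and look small. */
static bool range_ok_buggy(unsigned start, unsigned count, unsigned nvec)
{
    return !(start + count > nvec);
}

/* Overflow-safe range check: never adds the two untrusted values. */
static bool range_ok_fixed(unsigned start, unsigned count, unsigned nvec)
{
    return start <= nvec && count <= nvec - start;
}

/*
 * Unwind loop condition "j >= start" with signed j and unsigned start.
 * The compiler converts j to unsigned, spelled out here as a cast.
 */
static bool unwind_continues_buggy(int j, unsigned start)
{
    return (unsigned)j >= start;
}

/* Comparing as signed gives the intended stop condition at j == -1. */
static bool unwind_continues_fixed(int j, unsigned start)
{
    return j >= (int)start;
}
```

    With start = UINT_MAX and count = 2 the sum wraps to 1 and passes the
    buggy check, and with j = -1 and start = 0 the buggy loop condition
    stays true because -1 becomes UINT_MAX.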
     

22 Dec, 2015

1 commit

  • There is really no way to safely give a user full access to a DMA
    capable device without an IOMMU to protect the host system. There is
    also no way to provide DMA translation, for use cases such as device
    assignment to virtual machines. However, there are still those users
    that want userspace drivers even under those conditions. The UIO
    driver exists for this use case, but does not provide the degree of
    device access and programming that VFIO has. In an effort to avoid
    code duplication, this introduces a No-IOMMU mode for VFIO.

    This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
    the "enable_unsafe_noiommu_mode" option on the vfio driver. This
    should make it very clear that this mode is not safe. Additionally,
    CAP_SYS_RAWIO privileges are necessary to work with groups and
    containers using this mode. Groups making use of this support are
    named /dev/vfio/noiommu-$GROUP and can only make use of the special
    VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically
    binding a device without a native IOMMU group to a VFIO bus driver,
    will taint the kernel and should therefore not be considered
    supported. This patch includes no-iommu support for the vfio-pci bus
    driver only.

    Signed-off-by: Alex Williamson
    Acked-by: Michael S. Tsirkin

    Alex Williamson
     

04 Dec, 2015

1 commit

  • Revert commit 033291eccbdb ("vfio: Include No-IOMMU mode") due to lack
    of a user. This was originally intended to fill a need for the DPDK
    driver, but uptake has been slow so rather than support an unproven
    kernel interface revert it and revisit when userspace catches up.

    Signed-off-by: Alex Williamson

    Alex Williamson
     

20 Nov, 2015

1 commit


14 Nov, 2015

1 commit

  • Pull VFIO updates from Alex Williamson:
    - Use kernel interfaces for VPD emulation (Alex Williamson)
    - Platform fix for releasing IRQs (Eric Auger)
    - Type1 IOMMU always advertises PAGE_SIZE support when smaller mapping
    sizes are available (Eric Auger)
    - Platform fixes for incorrectly using copies of structures rather than
    pointers to structures (James Morse)
    - Rework platform reset modules, fix leak, and add AMD xgbe reset
    module (Eric Auger)
    - Fix vfio_device_get_from_name() return value (Joerg Roedel)
    - No-IOMMU interface (Alex Williamson)
    - Fix potential out of bounds array access in PCI config handling (Dan
    Carpenter)

    * tag 'vfio-v4.4-rc1' of git://github.com/awilliam/linux-vfio:
    vfio/pci: make an array larger
    vfio: Include No-IOMMU mode
    vfio: Fix bug in vfio_device_get_from_name()
    VFIO: platform: reset: AMD xgbe reset module
    vfio: platform: reset: calxedaxgmac: fix ioaddr leak
    vfio: platform: add dev_info on device reset
    vfio: platform: use list of registered reset function
    vfio: platform: add compat in vfio_platform_device
    vfio: platform: reset: calxedaxgmac: add reset function registration
    vfio: platform: introduce module_vfio_reset_handler macro
    vfio: platform: add capability to register a reset function
    vfio: platform: introduce vfio-platform-base module
    vfio/platform: store mapped memory in region, instead of an on-stack copy
    vfio/type1: handle case where IOMMU does not support PAGE_SIZE size
    VFIO: platform: clear IRQ_NOAUTOEN when de-assigning the IRQ
    vfio/pci: Use kernel VPD access functions
    vfio: Whitelist PCI bridges

    Linus Torvalds
     

09 Nov, 2015

1 commit


05 Nov, 2015

1 commit

  • There is really no way to safely give a user full access to a DMA
    capable device without an IOMMU to protect the host system. There is
    also no way to provide DMA translation, for use cases such as device
    assignment to virtual machines. However, there are still those users
    that want userspace drivers even under those conditions. The UIO
    driver exists for this use case, but does not provide the degree of
    device access and programming that VFIO has. In an effort to avoid
    code duplication, this introduces a No-IOMMU mode for VFIO.

    This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
    the "enable_unsafe_noiommu_mode" option on the vfio driver. This
    should make it very clear that this mode is not safe. Additionally,
    CAP_SYS_RAWIO privileges are necessary to work with groups and
    containers using this mode. Groups making use of this support are
    named /dev/vfio/noiommu-$GROUP and can only make use of the special
    VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically
    binding a device without a native IOMMU group to a VFIO bus driver,
    will taint the kernel and should therefore not be considered
    supported. This patch includes no-iommu support for the vfio-pci bus
    driver only.

    Signed-off-by: Alex Williamson
    Acked-by: Michael S. Tsirkin

    Alex Williamson
     

28 Oct, 2015

1 commit

  • The PCI VPD capability operates on a set of window registers in PCI
    config space. Writing to the address register triggers either a read
    or write, depending on the setting of the PCI_VPD_ADDR_F bit within
    the address register. The data register provides either the source
    for writes or the target for reads.

    This model is susceptible to being broken by concurrent access, for
    which the kernel has adopted a set of access functions to serialize
    these registers. Additionally, commits like 932c435caba8 ("PCI: Add
    dev_flags bit to access VPD through function 0") and 7aa6ca4d39ed
    ("PCI: Add VPD function 0 quirk for Intel Ethernet devices") indicate
    that VPD registers can be shared between functions on multifunction
    devices creating dependencies between otherwise independent devices.

    Fortunately it's quite easy to emulate the VPD registers, simply
    storing copies of the address and data registers in memory and
    triggering a VPD read or write on writes to the address register.
    This allows vfio users to avoid seeing spurious register changes from
    accesses on other devices and enables the use of shared quirks in the
    host kernel. We can theoretically still race with access through
    sysfs, but the window of opportunity is much smaller.

    Signed-off-by: Alex Williamson
    Acked-by: Mark Rustad

    Alex Williamson
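
    The emulation described above can be sketched roughly as follows. A
    byte array stands in for the real VPD storage, which the kernel would
    reach through its serialized VPD access functions; struct vpd_emul,
    vpd_write_data() and vpd_write_addr() are hypothetical names.
    VPD_ADDR_F mirrors the PCI_VPD_ADDR_F bit, and silently ignoring
    unaligned or out-of-range addresses is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VPD_ADDR_F    0x8000u   /* flag bit in the address register */
#define VPD_ADDR_MASK 0x7fffu

/* In-memory copies of the VPD window registers (hypothetical type). */
struct vpd_emul {
    uint16_t addr;
    uint32_t data;
    uint8_t backing[32];   /* stand-in for the device's VPD contents */
};

/* Writes to the data register only update the stored copy. */
static void vpd_write_data(struct vpd_emul *v, uint32_t val)
{
    v->data = val;
}

/* Writes to the address register trigger the actual read or write. */
static void vpd_write_addr(struct vpd_emul *v, uint16_t val)
{
    uint16_t off = val & VPD_ADDR_MASK;

    if (off % 4 || (size_t)off + 4 > sizeof(v->backing))
        return;   /* simplification: ignore bad addresses */

    if (val & VPD_ADDR_F) {
        /* F set: write the data register out, then clear F when done. */
        memcpy(v->backing + off, &v->data, 4);
        v->addr = off;
    } else {
        /* F clear: read into the data register, then set F when done. */
        memcpy(&v->data, v->backing + off, 4);
        v->addr = off | VPD_ADDR_F;
    }
}
```

    Because the guest only ever sees these private copies, an access by
    another function sharing the same physical VPD registers cannot change
    the address or data values it reads back.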
     

01 Oct, 2015

1 commit


10 Jun, 2015

1 commit

  • Testing the driver for a PCI device is racy: the device can be all but
    complete in the release path and still report the driver as ours.
    Therefore we can't trust drvdata to be valid. This race can sometimes
    be seen when one port of a multifunction device is being unbound from
    the vfio-pci driver while another function is being released by the
    user and attempting a bus reset. The device in the remove path is
    found as a dependent device for the bus reset of the release path
    device, the driver is still set to vfio-pci, but the drvdata has
    already been cleared, resulting in a null pointer dereference.

    To resolve this, fix vfio_device_get_from_dev() to not take the
    dev_get_drvdata() shortcut and instead traverse through the
    iommu_group, vfio_group, vfio_device path to get a reference we
    can trust. Once we have that reference, we know the device isn't
    in transition and we can test to make sure the driver is still what
    we expect, so that we don't interfere with devices we don't own.

    Signed-off-by: Alex Williamson

    Alex Williamson
     

02 May, 2015

1 commit


08 Apr, 2015

6 commits

  • Reported by 0-day test infrastructure.

    Fixes: ecaa1f6a0154 ("vfio-pci: Add VGA arbiter client")
    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • We can save some power by putting devices that are bound to vfio-pci
    but not in use by the user in the D3hot power state. Devices get
    woken into D0 when opened by the user. Resets return the device to
    D0, so we need to re-apply the low power state after a bus reset.
    It's tempting to try to use D3cold, but we have no reason to inhibit
    hotplug of idle devices and we might get into a loop of having the
    device disappear before we have a chance to try to use it.

    A new module parameter allows this feature to be disabled if there are
    devices that misbehave as a result of this change.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • As indicated in the comment, this is not entirely uncommon and
    causes user concern for no reason.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • This copies the same support from pci-stub for exactly the same
    purpose, enabling a set of PCI IDs to be automatically added to the
    driver's dynamic ID table at module load time. The code here is
    pretty simple and both vfio-pci and pci-stub are fairly unique in
    being meta drivers, capable of attaching to any device, so there's no
    attempt made to generalize the code into pci-core.

    Signed-off-by: Alex Williamson

    Alex Williamson
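
    A sketch of how such an ID list might be parsed, assuming the
    comma-separated "vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]"
    hex format that pci-stub accepts. struct pci_id, parse_id() and
    parse_ids() are hypothetical names, ANY_ID mirrors PCI_ANY_ID, and the
    IDs used in the usage note are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ANY_ID 0xffffffffu   /* wildcard, mirrors PCI_ANY_ID */

struct pci_id {
    unsigned vendor, device, subvendor, subdevice, class_code, class_mask;
};

/* Parse one entry; vendor and device are required, the rest optional. */
static int parse_id(const char *s, struct pci_id *id)
{
    int fields;

    id->subvendor = id->subdevice = ANY_ID;
    id->class_code = id->class_mask = 0;
    fields = sscanf(s, "%x:%x:%x:%x:%x:%x",
                    &id->vendor, &id->device, &id->subvendor,
                    &id->subdevice, &id->class_code, &id->class_mask);
    return fields >= 2 ? 0 : -1;
}

/* Parse a comma-separated list, skipping malformed entries. */
static size_t parse_ids(const char *list, struct pci_id *out, size_t max)
{
    const char *p = list;
    size_t n = 0;

    while (*p && n < max) {
        size_t len = strcspn(p, ",");
        char tok[64];

        if (len > 0 && len < sizeof(tok)) {
            memcpy(tok, p, len);
            tok[len] = '\0';
            if (parse_id(tok, &out[n]) == 0)
                n++;
        }
        p += len;
        if (*p == ',')
            p++;
    }
    return n;
}
```

    For example, parsing "1234:5678,bogus,abcd:ef01:9abc" yields two
    entries: the malformed middle entry is skipped, and the unspecified
    sub-IDs default to the wildcard.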
     
  • If VFIO VGA access is disabled for the user, either by CONFIG option
    or module parameter, we can often opt-out of VGA arbitration. We can
    do this when PCI bridge control of VGA routing is possible. This
    means that we must have a parent bridge and there must only be a
    single VGA device below that bridge. Fortunately this is the typical
    case for discrete GPUs.

    Doing this allows us to minimize the impact of additional GPUs, in
    terms of VGA arbitration, when they are only used via vfio-pci for
    non-VGA applications.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • Add a module option so that we don't require a CONFIG change and
    kernel rebuild to disable VGA support. Not only can VGA support be
    troublesome in itself, but by disabling it we can reduce the impact
    to host devices by doing a VGA arbitration opt-out.

    Signed-off-by: Alex Williamson

    Alex Williamson
     

17 Mar, 2015

7 commits

  • An unintended consequence of commit 42ac9bd18d4f ("vfio: initialize
    the virqfd workqueue in VFIO generic code") is that the vfio module
    is renamed to vfio_core so that it can include both vfio and virqfd.
    That's a user visible change that may break module loading scripts
    and it imposes eventfd support as a dependency on the core vfio code,
    which it's really not. virqfd is intended to be provided as a service
    to vfio bus drivers, so instead of wrapping it into vfio.ko, we can
    make it a stand-alone module toggled by vfio bus drivers. This has
    the additional benefit of removing initialization and exit from the
    core vfio code.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • Now we have finally completely decoupled virqfd from VFIO_PCI. We can
    initialize it from the VFIO generic code, in order to safely use it from
    multiple independent VFIO bus drivers.

    Signed-off-by: Antonios Motakis
    Signed-off-by: Baptiste Reynal
    Reviewed-by: Eric Auger
    Tested-by: Eric Auger
    Signed-off-by: Alex Williamson

    Antonios Motakis
     
  • The virqfd functionality that VFIO_PCI uses to implement interrupt
    masking and unmasking via an eventfd is generic enough to be reused
    by another driver. Move it to a separate file in order to allow the code
    to be shared.

    Signed-off-by: Antonios Motakis
    Signed-off-by: Baptiste Reynal
    Reviewed-by: Eric Auger
    Tested-by: Eric Auger
    Signed-off-by: Alex Williamson

    Antonios Motakis
     
  • VFIO_PCI passes the VFIO device structure *vdev via eventfd to the handler
    that implements masking/unmasking of IRQs via an eventfd. We can replace
    it in the virqfd infrastructure with an opaque type so we can make use
    of the mechanism from other VFIO bus drivers.

    Signed-off-by: Antonios Motakis
    Signed-off-by: Baptiste Reynal
    Reviewed-by: Eric Auger
    Tested-by: Eric Auger
    Signed-off-by: Alex Williamson

    Antonios Motakis
     
  • The Virqfd code needs to keep accesses to any struct *virqfd safe, but
    this comes into play only when creating or destroying eventfds, so sharing
    the same spinlock with the VFIO bus driver is not necessary.

    Signed-off-by: Antonios Motakis
    Signed-off-by: Baptiste Reynal
    Reviewed-by: Eric Auger
    Tested-by: Eric Auger
    Signed-off-by: Alex Williamson

    Antonios Motakis
     
  • The functions vfio_pci_virqfd_init and vfio_pci_virqfd_exit are not really
    PCI specific, since we plan to reuse the virqfd code with more VFIO drivers
    in addition to VFIO_PCI.

    Signed-off-by: Antonios Motakis
    [Baptiste Reynal: Move rename vfio_pci_virqfd_init and vfio_pci_virqfd_exit
    from "vfio: add a vfio_ prefix to virqfd_enable and virqfd_disable and export"]
    Signed-off-by: Baptiste Reynal
    Reviewed-by: Eric Auger
    Tested-by: Eric Auger
    Signed-off-by: Alex Williamson

    Antonios Motakis
     
  • We want to reuse virqfd functionality in multiple VFIO drivers; before
    moving these functions to core VFIO, add the vfio_ prefix to the
    virqfd_enable and virqfd_disable functions, and export them so they can
    be used from other modules.

    Signed-off-by: Antonios Motakis
    Signed-off-by: Baptiste Reynal
    Reviewed-by: Eric Auger
    Tested-by: Eric Auger
    Signed-off-by: Alex Williamson

    Antonios Motakis
     

12 Mar, 2015

1 commit

  • This adds a missing break statement to VFIO_DEVICE_SET_IRQS handler
    without which vfio_pci_set_err_trigger() would never be called.

    While we are here, add another "break" to VFIO_PCI_REQ_IRQ_INDEX case
    so if we add more indexes later, we won't miss it.

    Fixes: 6140a8f56238 ("vfio-pci: Add device request interface")
    Signed-off-by: Alexey Kardashevskiy
    Signed-off-by: Alex Williamson

    Alexey Kardashevskiy
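
    The effect of the missing break can be shown with a reduced switch.
    This is a simplified sketch, not the driver's code: the enum values
    and pick_buggy()/pick_fixed() are hypothetical, and the inner
    per-action switch is omitted.

```c
#include <assert.h>

enum irq_index { ERR_IRQ, REQ_IRQ };
enum handler { H_NONE, H_ERR_TRIGGER, H_REQ_TRIGGER };

/* Without a break, the ERR_IRQ choice is overwritten by the next case. */
static enum handler pick_buggy(enum irq_index idx)
{
    enum handler h = H_NONE;

    switch (idx) {
    case ERR_IRQ:
        h = H_ERR_TRIGGER;
        /* fall through */
    case REQ_IRQ:
        h = H_REQ_TRIGGER;
    }
    return h;
}

/* With a break after each case, every index keeps its own handler. */
static enum handler pick_fixed(enum irq_index idx)
{
    enum handler h = H_NONE;

    switch (idx) {
    case ERR_IRQ:
        h = H_ERR_TRIGGER;
        break;
    case REQ_IRQ:
        h = H_REQ_TRIGGER;
        break;
    }
    return h;
}
```

    In the buggy variant the error trigger handler can never be selected,
    which matches the symptom described in the commit message.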
     

11 Feb, 2015

2 commits


08 Jan, 2015

1 commit

  • Currently vfio-pci supports only normal PCI devices, so vfio_pci_probe()
    returns early if the PCI device is not a normal device. However, the
    current code gets this check wrong: PCI_HEADER_TYPE is the offset of the
    header type in configuration space, but we use this value to mask the
    type value.

    This patch fixes this by doing the check directly on pci_dev->hdr_type.

    Signed-off-by: Wei Yang
    Signed-off-by: Alex Williamson
    Cc: stable@vger.kernel.org # v3.6+

    Wei Yang
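
    A small sketch of why the old check misfires, assuming the usual
    values of 0x0e for the PCI_HEADER_TYPE config offset, 0 for a normal
    header and 1 for a bridge header, and assuming hdr_type already has
    the multifunction bit stripped. The macro and function names are
    local stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HDR_TYPE_OFFSET 0x0e   /* a config space offset, not a mask */
#define HDR_TYPE_NORMAL 0x00
#define HDR_TYPE_BRIDGE 0x01

/* Buggy check: masking with the offset 0x0e drops bit 0 of the type. */
static bool is_normal_buggy(uint8_t hdr_type)
{
    return (hdr_type & HDR_TYPE_OFFSET) == HDR_TYPE_NORMAL;
}

/* Fixed check: compare the header type directly. */
static bool is_normal_fixed(uint8_t hdr_type)
{
    return hdr_type == HDR_TYPE_NORMAL;
}
```

    Since 0x01 & 0x0e is 0, the buggy check treats a bridge as a normal
    device, while the direct comparison rejects it.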
     

18 Dec, 2014

1 commit


23 Nov, 2014

1 commit


08 Nov, 2014

1 commit