08 Jan, 2015

1 commit

  • Current vfio-pci supports only normal PCI devices, so vfio_pci_probe()
    returns if the PCI device is not a normal device. The current check is
    wrong, however: PCI_HEADER_TYPE is the offset of the header type field
    in configuration space, but the code uses that value as a mask on the type.

    This patch fixes the check by testing pci_dev->hdr_type directly.

    Signed-off-by: Wei Yang
    Signed-off-by: Alex Williamson
    Cc: stable@vger.kernel.org # v3.6+

    Wei Yang
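
A minimal sketch of why the old check misfires, assuming the usual pci_regs.h values (header type at offset 0x0e, type 0 for a normal device, type 1 for a bridge, bit 7 as the multi-function flag). The DEMO_ names and both helper functions are hypothetical and exist only for illustration; they are not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the kernel's pci_regs.h; the DEMO_ names are local to this sketch. */
#define DEMO_PCI_HEADER_TYPE        0x0e /* config-space OFFSET of the header type byte */
#define DEMO_PCI_HEADER_TYPE_NORMAL 0x00
#define DEMO_PCI_HEADER_TYPE_BRIDGE 0x01
#define DEMO_HDR_TYPE_MASK          0x7f /* bit 7 is the multi-function flag */

/* Old-style check: the register offset is used as if it were a bit mask. */
bool demo_is_normal_buggy(uint8_t raw_type)
{
    return (raw_type & DEMO_PCI_HEADER_TYPE) == DEMO_PCI_HEADER_TYPE_NORMAL;
}

/* Fixed check: compare the type with the multi-function bit stripped. */
bool demo_is_normal_fixed(uint8_t raw_type)
{
    uint8_t hdr_type = raw_type & DEMO_HDR_TYPE_MASK;
    return hdr_type == DEMO_PCI_HEADER_TYPE_NORMAL;
}

/* Usage example: a bridge slips through the buggy check but not the fixed one. */
void demo_hdr_type_examples(void)
{
    assert(demo_is_normal_buggy(DEMO_PCI_HEADER_TYPE_BRIDGE));  /* 0x01 & 0x0e == 0 */
    assert(!demo_is_normal_fixed(DEMO_PCI_HEADER_TYPE_BRIDGE));
    assert(demo_is_normal_fixed(0x80)); /* multi-function normal device */
}
```

Because 0x0e has bit 0 clear, masking a bridge's type value of 0x01 with it yields 0, which compares equal to the normal type.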
     

18 Dec, 2014

1 commit


23 Nov, 2014

1 commit


08 Nov, 2014

1 commit


11 Oct, 2014

1 commit

  • Pull VFIO updates from Alex Williamson:
    - Nested IOMMU extension to type1 (Will Deacon)
    - Restore MSIx message before enabling (Gavin Shan)
    - Fix remove path locking (Alex Williamson)

    * tag 'vfio-v3.18-rc1' of git://github.com/awilliam/linux-vfio:
    vfio-pci: Fix remove path locking
    drivers/vfio: Export vfio_spapr_iommu_eeh_ioctl() with GPL
    vfio/pci: Restore MSIx message prior to enabling
    PCI: Export MSI message relevant functions
    vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type
    iommu: introduce domain attribute for nesting IOMMUs

    Linus Torvalds
     

30 Sep, 2014

2 commits

  • Locking both the remove() and release() path results in a deadlock
    that should have been obvious. To fix this we can get and hold the
    vfio_device reference as we evaluate whether to do a bus/slot reset.
    This will automatically block any remove() calls, allowing us to
    remove the explicit lock. Fixes 61d792562b53.

    Signed-off-by: Alex Williamson
    Cc: stable@vger.kernel.org [3.17]

    Alex Williamson
     
  • The MSIx vector table lives in device memory, which may be cleared as
    part of a backdoor device reset. This is the case on the IBM IPR HBA
    when the BIST is run on the device. When assigned to a QEMU guest,
    the guest driver does a pci_save_state(), issues a BIST, then does a
    pci_restore_state(). The BIST clears the MSIx vector table, but due
    to the way interrupts are configured the pci_restore_state() does not
    restore the vector table as expected. Eventually this results in an
    EEH error on Power platforms when the device attempts to signal an
    interrupt with the zero'd table entry.

    Fix the problem by restoring the host cached MSI message prior to
    enabling each vector.

    Reported-by: Wen Xiong
    Signed-off-by: Gavin Shan
    Signed-off-by: Alex Williamson

    Gavin Shan
     

25 Sep, 2014

1 commit

  • In PCIe r1.0, sec 5.10.2, bit 0 of the Uncorrectable Error Status, Mask,
    and Severity Registers was for "Training Error." In PCIe r1.1, sec 7.10.2,
    bit 0 was redefined to be "Undefined."

    Rename PCI_ERR_UNC_TRAIN to PCI_ERR_UNC_UND to reflect this change.

    No functional change.

    [bhelgaas: changelog]
    Signed-off-by: Chen, Gong
    Signed-off-by: Bjorn Helgaas

    Chen, Gong
     

09 Aug, 2014

1 commit

  • The existing vfio_pci_open() fails when vfio_spapr_pci_eeh_open()
    returns an error, which breaks P5IOC2 PHB support on POWER7. This
    patch restores that support.

    The patch fixes the issue by ignoring the return value of
    vfio_spapr_pci_eeh_open().

    Signed-off-by: Alexey Kardashevskiy
    Signed-off-by: Gavin Shan
    Signed-off-by: Alex Williamson

    Alexey Kardashevskiy
     

08 Aug, 2014

3 commits

  • Each time a device is released, mark whether a local reset was
    successful or whether a bus/slot reset is needed. If a reset is
    needed and all of the affected devices are bound to vfio-pci and
    unused, allow the reset. This is most useful when the userspace
    driver is killed and releases all the devices in an unclean state,
    such as when a QEMU VM quits.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • Serializing open/release allows us to fix a refcnt error if we fail
    to enable the device and lets us prevent devices from being unbound
    or opened, giving us an opportunity to do bus resets on release. Note
    that binding devices to vfio-pci is not serialized while the mutex is
    held.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • Our current open/release path looks like this:

    vfio_pci_open
      vfio_pci_enable
        pci_enable_device
        pci_save_state
        pci_store_saved_state

    vfio_pci_release
      vfio_pci_disable
        pci_disable_device
        pci_restore_state

    pci_enable_device() doesn't modify PCI_COMMAND_MASTER, so if a device
    comes to us with it enabled, it persists through the open and gets
    stored as part of the device saved state. We then restore that saved
    state when released, which can allow the device to attempt to continue
    to do DMA. When the group is disconnected from the domain, this will
    get caught by the IOMMU, but if there are other devices in the group,
    the device may continue running and interfere with the user. Even in
    the former case, IOMMUs don't necessarily behave well and a stream of
    blocked DMA can result in unpleasant behavior on the host.

    Explicitly disable Bus Master as we're enabling the device and
    slightly re-work release to make sure that pci_disable_device() is
    the last thing that touches the device.

    Signed-off-by: Alex Williamson

    Alex Williamson
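
The Bus Master commit above can be modelled with a toy device that has only a command register. This is a sketch under stated assumptions: the struct and the demo_ functions are hypothetical stand-ins for the save/restore/disable sequence, and only the bit value (bit 2, 0x4, for PCI_COMMAND_MASTER) is taken from pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PCI_COMMAND_MASTER 0x4u /* bit 2 of the command register */

/* Toy device model: only the 16-bit command register matters here. */
struct demo_dev {
    uint16_t command;       /* live register value */
    uint16_t saved_command; /* snapshot taken at open time */
};

/* Open path: clear Bus Master first, then snapshot the state. */
void demo_open(struct demo_dev *dev)
{
    dev->command = (uint16_t)(dev->command & ~DEMO_PCI_COMMAND_MASTER);
    dev->saved_command = dev->command;
}

/* Release path: restore the snapshot, then disable as the very last step. */
void demo_release(struct demo_dev *dev)
{
    dev->command = dev->saved_command;
    dev->command = (uint16_t)(dev->command & ~DEMO_PCI_COMMAND_MASTER);
}

/* Usage example: a device that arrives with Bus Master set cannot keep it. */
void demo_bus_master_example(void)
{
    struct demo_dev dev = { .command = 0x0006, .saved_command = 0 };
    demo_open(&dev);
    assert(dev.saved_command == 0x0002);     /* snapshot has Bus Master clear */
    dev.command |= DEMO_PCI_COMMAND_MASTER;  /* user enables DMA while open */
    demo_release(&dev);
    assert((dev.command & DEMO_PCI_COMMAND_MASTER) == 0);
}
```

The point of the ordering is that the saved state can never carry Bus Master, so restoring it on release cannot re-enable DMA.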
     

05 Aug, 2014

1 commit

  • The patch adds new ioctl commands to the sPAPR VFIO container device
    to support EEH functionality for PCI devices that have been passed
    through from the host to another user via VFIO.

    Signed-off-by: Gavin Shan
    Acked-by: Alexander Graf
    Acked-by: Alex Williamson
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     

31 May, 2014

3 commits

  • According to the PCI Local Bus Specification, bit 0 of the MSI Message
    Control register (offset: 2, length: 2) enables or disables MSI logic,
    so it should not contribute to the calculation of the MSI interrupt
    count. The patch fixes the calculation.

    Signed-off-by: Gavin Shan
    Signed-off-by: Alex Williamson

    Gavin Shan
     
  • There's nothing we can do differently if pci_load_and_free_saved_state()
    fails, other than maybe print a log message, but the actual reload
    of the state is an unnecessary step here since we've only just saved
    it. We can clean up a Coverity warning and eliminate the unnecessary
    step by freeing the state ourselves.

    Detected by Coverity: CID 753101

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • When sizing the TPH capability we store the register containing the
    table size into the 'dword' variable, but then use the uninitialized
    'byte' variable to analyze the size. The table size is also actually
    reported as an N-1 value, so correct sizing to account for this.

    The round_up() for both TPH and DPA is unnecessary; remove it.

    Detected by Coverity: CID 714665 & 715156

    Signed-off-by: Alex Williamson

    Alex Williamson
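
The two register decodings touched by the MSI count and TPH sizing commits above can be sketched as follows. The field positions are my reading of pci_regs.h and the PCI/PCIe specifications (Multiple Message Capable in Message Control bits 3:1; TPH ST table size in bits 26:16 stored as N-1, two bytes per entry, 12-byte base capability). The DEMO_ names and both functions are hypothetical and are not the driver's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror the kernel's pci_regs.h; the DEMO_ names are local to this sketch. */
#define DEMO_MSI_FLAGS_ENABLE 0x0001u     /* Message Control bit 0: MSI enable */
#define DEMO_MSI_FLAGS_QMASK  0x000eu     /* bits 3:1: Multiple Message Capable */
#define DEMO_TPH_LOC_MASK     0x00000600u /* ST table location field */
#define DEMO_TPH_LOC_CAP      0x00000200u /* ST table lives in the capability */
#define DEMO_TPH_ST_MASK      0x07ff0000u /* ST table size field, stored as N-1 */
#define DEMO_TPH_ST_SHIFT     16
#define DEMO_TPH_BASE_SIZEOF  12u         /* capability size with no ST table */

/* Number of MSI vectors the device can request; bit 0 must not take part. */
unsigned demo_msi_vector_count(uint16_t msg_ctrl)
{
    unsigned log2_count = (msg_ctrl & DEMO_MSI_FLAGS_QMASK) >> 1;
    return 1u << log2_count;
}

/* Length in bytes of the TPH capability, including an in-capability ST table. */
unsigned demo_tph_cap_len(uint32_t tph_cap_reg)
{
    if ((tph_cap_reg & DEMO_TPH_LOC_MASK) != DEMO_TPH_LOC_CAP)
        return DEMO_TPH_BASE_SIZEOF;
    unsigned entries = ((tph_cap_reg & DEMO_TPH_ST_MASK) >> DEMO_TPH_ST_SHIFT) + 1;
    return DEMO_TPH_BASE_SIZEOF + entries * 2; /* two bytes per ST entry */
}

/* Usage example. */
void demo_cap_examples(void)
{
    assert(demo_msi_vector_count(0x000a) == 32);
    assert(demo_msi_vector_count(0x000a | DEMO_MSI_FLAGS_ENABLE) == 32);
    assert(demo_tph_cap_len(0x00030200u) == 20); /* field 3 means 4 entries */
    assert(demo_tph_cap_len(0x00030000u) == 12); /* no table in the capability */
}
```

Setting the enable bit leaves the vector count unchanged, and an ST size field of 3 describes four entries because the field is N-1 encoded.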
     

15 Feb, 2014

1 commit


25 Jan, 2014

1 commit


16 Jan, 2014

1 commit

  • PCI resets will attempt to take the device_lock for any device to be
    reset. This is a problem if that lock is already held, for instance
    in the device remove path. It's not sufficient to simply kill the
    user process or skip the reset if called after .remove as a race could
    result in the same deadlock. Instead, we handle all resets as "best
    effort" using the PCI "try" reset interfaces. This prevents the user
    from being able to induce a deadlock by triggering a reset.

    Signed-off-by: Alex Williamson
    Signed-off-by: Bjorn Helgaas

    Alex Williamson
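
The "best effort" idea can be illustrated in userspace with a non-blocking lock attempt. This is only a sketch: the atomic flag stands in for device_lock, every demo_ name is hypothetical, and returning -EAGAIN on contention is an assumption of the sketch rather than a statement about the PCI "try" reset interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

struct demo_device {
    atomic_flag lock; /* stands in for the kernel's device_lock */
    int resets;       /* number of resets actually performed */
};

void demo_device_init(struct demo_device *dev)
{
    atomic_flag_clear(&dev->lock);
    dev->resets = 0;
}

/* Returns nonzero if the lock was taken; never blocks. */
int demo_trylock(struct demo_device *dev)
{
    return !atomic_flag_test_and_set(&dev->lock);
}

void demo_unlock(struct demo_device *dev)
{
    atomic_flag_clear(&dev->lock);
}

/* Best-effort reset: give up instead of waiting if the lock is already held. */
int demo_try_reset(struct demo_device *dev)
{
    if (!demo_trylock(dev))
        return -EAGAIN;
    dev->resets++;
    demo_unlock(dev);
    return 0;
}

/* Usage example: a reset requested while "remove" holds the lock fails fast. */
void demo_try_reset_example(void)
{
    struct demo_device dev;
    demo_device_init(&dev);
    assert(demo_try_reset(&dev) == 0);
    int held = demo_trylock(&dev); /* e.g. the remove path owns the lock */
    assert(held);
    assert(demo_try_reset(&dev) == -EAGAIN);
    assert(dev.resets == 1);
    demo_unlock(&dev);
}
```

Because the reset path never waits on the lock, a user who triggers a reset during removal gets an error back instead of a deadlock.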
     

15 Jan, 2014

1 commit

  • device_lock is much too prone to lockups. For instance if we have a
    pending .remove then device_lock is already held. If userspace
    attempts to modify AER signaling after that point, a deadlock occurs.
    eventfd setup/teardown is already protected in vfio with the igate
    mutex. AER is not a high performance interrupt, so we can also use
    the same mutex to protect signaling versus setup races.

    Signed-off-by: Alex Williamson

    Alex Williamson
     

18 Dec, 2013

1 commit


05 Sep, 2013

2 commits

  • The current VFIO_DEVICE_RESET interface only maps to PCI use cases
    where we can isolate the reset to the individual PCI function. This
    means the device must support FLR (PCIe or AF), PM reset on D3hot->D0
    transition, device specific reset, or be a singleton device on a bus
    for a secondary bus reset. FLR does not have widespread support,
    PM reset is not very reliable, and bus topology is dictated by the
    system and device design. We need to provide a means for a user to
    induce a bus reset in cases where the existing mechanisms are not
    available or not reliable.

    This device specific extension to VFIO provides the user with this
    ability. Two new ioctls are introduced:
    - VFIO_DEVICE_PCI_GET_HOT_RESET_INFO
    - VFIO_DEVICE_PCI_HOT_RESET

    The first provides the user with information about the extent of
    devices affected by a hot reset. This is essentially a list of
    devices and the IOMMU groups they belong to. The user may then
    initiate a hot reset by calling the second ioctl. We must be
    careful that the user has ownership of all the affected devices
    found via the first ioctl, so the second ioctl takes a list of file
    descriptors for the VFIO groups affected by the reset. Each group
    must have IOMMU protection established for the ioctl to succeed.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • Having PCIe/PCI-X capability isn't enough to assume that there are
    extended capabilities. Both specs define that the first capability
    header is all zero if there are no extended capabilities. Testing
    for this avoids an erroneous message about hiding capability 0x0 at
    offset 0x100.

    Signed-off-by: Alex Williamson

    Alex Williamson
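
The extended capability test described in the commit above amounts to looking at the first header dword at offset 0x100. A minimal sketch, assuming the standard PCIe header layout (capability ID in bits 15:0, next pointer in bits 31:20); the demo_ function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An all-zero first header at offset 0x100 means "no extended capabilities". */
bool demo_has_ext_caps(uint32_t first_header)
{
    return first_header != 0;
}

/* Capability ID: bits 15:0 of the header. */
uint16_t demo_ext_cap_id(uint32_t header)
{
    return (uint16_t)(header & 0xffffu);
}

/* Offset of the next capability: bits 31:20, dword aligned; 0 ends the list. */
uint16_t demo_ext_cap_next(uint32_t header)
{
    return (uint16_t)((header >> 20) & 0xffcu);
}

/* Usage example with a made-up header value. */
void demo_ext_cap_example(void)
{
    uint32_t header = 0x14010001u; /* hypothetical: ID 0x0001, next at 0x140 */
    assert(!demo_has_ext_caps(0));
    assert(demo_has_ext_caps(header));
    assert(demo_ext_cap_id(header) == 0x0001);
    assert(demo_ext_cap_next(header) == 0x140);
}
```

Checking for a zero header before walking the list is what avoids reporting a bogus capability 0x0 at offset 0x100.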
     

28 Aug, 2013

1 commit

  • eventfd_fget() tests to see whether the file is an eventfd file, which
    we then immediately pass to eventfd_ctx_fileget(), which again tests
    whether the file is an eventfd file. Simplify slightly by using
    fdget() so that we only test that we're looking at an eventfd once.
    fget() could also be used, but fdget() makes use of fget_light() for
    another slight optimization.

    Signed-off-by: Alex Williamson

    Alex Williamson
     

25 Jul, 2013

1 commit

  • If an attempt is made to unbind a device from vfio-pci while that
    device is in use, the request is blocked until the device becomes
    unused. Unfortunately, that unbind path still grabs the device_lock,
    which certain things like __pci_reset_function() also want to take.
    This means we need to try to acquire the locks ourselves and use the
    pre-locked version, __pci_reset_function_locked().

    Signed-off-by: Alex Williamson

    Alex Williamson
     

29 Jun, 2013

1 commit


03 May, 2013

1 commit

  • Pull vfio updates from Alex Williamson:
    "Changes include extension to support PCI AER notification to
    userspace, byte granularity of PCI config space and access to
    unarchitected PCI config space, better protection around IOMMU driver
    accesses, default file mode fix, and a few misc cleanups."

    * tag 'vfio-for-v3.10' of git://github.com/awilliam/linux-vfio:
    vfio: Set container device mode
    vfio: Use down_reads to protect iommu disconnects
    vfio: Convert container->group_lock to rwsem
    PCI/VFIO: use pcie_flags_reg instead of access PCI-E Capabilities Register
    vfio-pci: Enable raw access to unassigned config space
    vfio-pci: Use byte granularity in config map
    vfio: make local function vfio_pci_intx_unmask_handler() static
    VFIO-AER: Vfio-pci driver changes for supporting AER
    VFIO: Wrapper for getting reference to vfio_device

    Linus Torvalds
     

30 Apr, 2013

1 commit

  • Pull PCI updates from Bjorn Helgaas:
    "PCI changes for the v3.10 merge window:

    PCI device hotplug
    - Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe)
    - Make acpiphp builtin only, not modular (Jiang Liu)
    - Add acpiphp mutual exclusion (Jiang Liu)

    Power management
    - Skip "PME enabled/disabled" messages when not supported (Rafael
    Wysocki)
    - Fix fallback to PCI_D0 (Rafael Wysocki)

    Miscellaneous
    - Factor quirk_io_region (Yinghai Lu)
    - Cache MSI capability offsets & cleanup (Gavin Shan, Bjorn Helgaas)
    - Clean up EISA resource initialization and logging (Bjorn Helgaas)
    - Fix prototype warnings (Andy Shevchenko, Bjorn Helgaas)
    - MIPS: Initialize of_node before scanning bus (Gabor Juhos)
    - Fix pcibios_get_phb_of_node() declaration "weak" annotation (Gabor
    Juhos)
    - Add MSI INTX_DISABLE quirks for AR8161/AR8162/etc (Xiong Huang)
    - Fix aer_inject return values (Prarit Bhargava)
    - Remove PME/ACPI dependency (Andrew Murray)
    - Use shared PCI_BUS_NUM() and PCI_DEVID() (Shuah Khan)"

    * tag 'pci-v3.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (63 commits)
    vfio-pci: Use cached MSI/MSI-X capabilities
    vfio-pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
    PCI: Remove "extern" from function declarations
    PCI: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
    PCI: Drop msi_mask_reg() and remove drivers/pci/msi.h
    PCI: Use msix_table_size() directly, drop multi_msix_capable()
    PCI: Drop msix_table_offset_reg() and msix_pba_offset_reg() macros
    PCI: Drop is_64bit_address() and is_mask_bit_support() macros
    PCI: Drop msi_data_reg() macro
    PCI: Drop msi_lower_address_reg() and msi_upper_address_reg() macros
    PCI: Drop msi_control_reg() macro and use PCI_MSI_FLAGS directly
    PCI: Use cached MSI/MSI-X offsets from dev, not from msi_desc
    PCI: Clean up MSI/MSI-X capability #defines
    PCI: Use cached MSI-X cap while enabling MSI-X
    PCI: Use cached MSI cap while enabling MSI interrupts
    PCI: Remove MSI/MSI-X cap check in pci_msi_check_device()
    PCI: Cache MSI/MSI-X capability offsets in struct pci_dev
    PCI: Use u8, not int, for PM capability offset
    [SCSI] megaraid_sas: Use correct #define for MSI-X capability
    PCI: Remove "extern" from function declarations
    ...

    Linus Torvalds
     

25 Apr, 2013

2 commits


15 Apr, 2013

1 commit


01 Apr, 2013

2 commits

  • Devices like be2net hide registers between the gaps in capabilities
    and architected regions of PCI config space. Our choices to support
    such devices are either to build an ever-growing and unmanageable
    whitelist or to rely on hardware isolation to protect us. These registers are
    really no different than MMIO or I/O port space registers, which we
    don't attempt to regulate, so treat PCI config space in the same way.

    Reported-by: Gavin Shan
    Signed-off-by: Alex Williamson
    Tested-by: Gavin Shan

    Alex Williamson
     
  • The config map previously used a byte per dword to map regions of
    config space to capabilities. Modulo a bug where we round the length
    of capabilities down instead of up, this theoretically works well and
    saves space so long as devices don't try to hide registers in the gaps
    between capabilities. Unfortunately they do exactly that so we need
    byte granularity on our config space map. Increase the allocation of
    the config map and split accesses at capability region boundaries.

    Signed-off-by: Alex Williamson
    Tested-by: Gavin Shan

    Alex Williamson
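
Splitting an access at capability region boundaries, as the byte-granularity commit above describes, can be sketched with a map that holds one owner ID per config-space byte. The function name, the map layout, and the ID values in the example are hypothetical; the real driver's data structures may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Given a per-byte map of region IDs, return how many bytes of the access
 * [pos, pos + count) belong to the same region as the first byte. A caller
 * would handle that chunk, advance, and repeat until the access is consumed.
 */
size_t demo_chunk_len(const uint8_t *map, size_t map_size, size_t pos, size_t count)
{
    if (pos >= map_size)
        return 0;
    if (count > map_size - pos)
        count = map_size - pos; /* clip the access to the map */

    size_t len = 0;
    while (len < count && map[pos + len] == map[pos])
        len++;
    return len;
}

/* Usage example: bytes 4..9 belong to one capability, 10..15 to another. */
void demo_chunk_example(void)
{
    const uint8_t map[16] = { 0, 0, 0, 0, 5, 5, 5, 5, 5, 5,
                              0x11, 0x11, 0x11, 0x11, 0x11, 0x11 };
    assert(demo_chunk_len(map, sizeof(map), 2, 8) == 2); /* stops at byte 4 */
    assert(demo_chunk_len(map, sizeof(map), 4, 8) == 6); /* stops at byte 10 */
}
```

With one map entry per dword, the two bytes before offset 4 in this example could not be distinguished from their neighbours, which is the limitation the commit removes.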
     

27 Mar, 2013

1 commit

  • The VFIO_DEVICE_SET_IRQS ioctl takes a start and count parameter, both
    of which are unsigned. We attempt to bounds check these, but fail to
    account for the case where start is a very large number, allowing
    start + count to wrap back into the valid range. Bounds check both
    start and start + count.

    Reported-by: Dan Carpenter
    Signed-off-by: Alex Williamson

    Alex Williamson
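
The wraparound is easy to reproduce with 32-bit unsigned arithmetic. In the sketch below, the naive check only tests start + count, while the second check is a wrap-free formulation of my own (using subtraction) that matches the intent of checking both start and start + count; it is not the patch's literal code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Naive check: start + count can wrap around and land back inside the range. */
bool demo_range_ok_naive(uint32_t start, uint32_t count, uint32_t max)
{
    return (uint32_t)(start + count) <= max;
}

/* Wrap-free check: validate start first, then compare count to the room left. */
bool demo_range_ok(uint32_t start, uint32_t count, uint32_t max)
{
    return start < max && count <= max - start;
}

/* Usage example with 8 valid IRQs. */
void demo_range_example(void)
{
    assert(demo_range_ok_naive(0xffffffffu, 2, 8)); /* wraps to 1: wrongly accepted */
    assert(!demo_range_ok(0xffffffffu, 2, 8));
    assert(demo_range_ok(2, 6, 8));
    assert(!demo_range_ok(2, 7, 8));
}
```

Comparing count against max - start avoids the addition entirely, so a very large count cannot wrap either.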
     

26 Mar, 2013

1 commit


16 Mar, 2013

1 commit


11 Mar, 2013

1 commit

  • - New VFIO_DEVICE_SET_IRQS ioctl option to pass the eventfd that is
    signaled when an error occurs in the vfio_pci_device

    - Register pci_error_handler for the vfio_pci driver

    - When the device encounters an error, the error handler registered by
    the vfio_pci driver gets invoked by the AER infrastructure

    - In the error handler, signal the eventfd registered for the device.

    - This results in the QEMU eventfd handler getting invoked and
    appropriate action taken for the guest.

    Signed-off-by: Vijay Mohan Pandarathil
    Signed-off-by: Alex Williamson

    Vijay Mohan Pandarathil
     

25 Feb, 2013

1 commit


19 Feb, 2013

2 commits

  • PCI defines display class VGA regions at I/O port address 0x3b0, 0x3c0
    and MMIO address 0xa0000. As these are non-overlapping, we can ignore
    the I/O port vs MMIO difference and expose them both in a single
    region. We make use of the VGA arbiter around each access to
    configure chipset access as necessary.

    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • We give the user access to change the power state of the device but
    certain transitions result in an uninitialized state which the user
    cannot resolve. To fix this we need to mark the PowerState field of
    the PMCSR register read-only and effect the requested change on behalf
    of the user. This has the added benefit that pdev->current_state
    remains accurate while controlled by the user.

    The primary example of this bug is a QEMU guest doing a reboot, where
    the device is put into D3 on shutdown and becomes unusable on the next
    boot because the device did a soft reset on D3->D0 (NoSoftRst-).

    Signed-off-by: Alex Williamson

    Alex Williamson
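
The PowerState handling in the last commit above can be sketched as a write handler that never lets the user store the field directly. This is a toy model under stated assumptions: PowerState is taken to be PMCSR bits 1:0, all other PMCSR bits are ignored, and the struct and demo_ functions are hypothetical stand-ins for the driver and the PCI core.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PM_CTRL_STATE_MASK 0x0003u /* PMCSR bits 1:0: PowerState */

enum demo_power_state { DEMO_D0 = 0, DEMO_D1 = 1, DEMO_D2 = 2, DEMO_D3HOT = 3 };

struct demo_pm_dev {
    uint16_t pmcsr;                      /* emulated PMCSR contents */
    enum demo_power_state current_state; /* stands in for pdev->current_state */
};

/* Stand-in for the kernel performing the transition on the user's behalf. */
void demo_set_power_state(struct demo_pm_dev *dev, enum demo_power_state state)
{
    dev->current_state = state;
    dev->pmcsr = (uint16_t)((dev->pmcsr & ~DEMO_PM_CTRL_STATE_MASK) | (unsigned)state);
}

/* User write to PMCSR: decode the requested state and apply it via the helper. */
void demo_pmcsr_write(struct demo_pm_dev *dev, uint16_t val)
{
    enum demo_power_state requested =
        (enum demo_power_state)(val & DEMO_PM_CTRL_STATE_MASK);
    demo_set_power_state(dev, requested);
}

/* Usage example: guest shutdown puts the device in D3hot, reboot returns to D0. */
void demo_pm_example(void)
{
    struct demo_pm_dev dev = { .pmcsr = 0, .current_state = DEMO_D0 };
    demo_pmcsr_write(&dev, 0x0003);
    assert(dev.current_state == DEMO_D3HOT);
    demo_pmcsr_write(&dev, 0x0000);
    assert(dev.current_state == DEMO_D0);
}
```

Routing every change through one helper is what keeps the tracked state in step with what the user requested.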