02 Apr, 2020

2 commits

  • When we try to get an MSI cookie for a VFIO device, that can fail if
    CONFIG_IOMMU_DMA is not set. In this case iommu_get_msi_cookie() returns
    -ENODEV, and that should not be fatal.

    Ignore that case and proceed with the initialisation.

    This fixes VFIO with a platform device on the Calxeda Midway (no MSIs).

    Fixes: f6810c15cf973f ("iommu/arm-smmu: Clean up early-probing workarounds")
    Signed-off-by: Andre Przywara
    Acked-by: Robin Murphy
    Reviewed-by: Eric Auger
    Signed-off-by: Alex Williamson

    Andre Przywara
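
The error handling described in this commit can be sketched in plain C. This is a minimal userspace illustration under stated assumptions, not the kernel patch: `get_msi_cookie_stub()` and `attach_domain()` are hypothetical names, and `-ENOMEM` is an arbitrary example of a genuine failure. Only the `-ENODEV` convention comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for iommu_get_msi_cookie(): returns 0 on success,
 * -ENODEV when MSI cookie support is not built in, or another negative
 * errno on a genuine failure. */
static int get_msi_cookie_stub(bool dma_support, bool fail_hard)
{
    if (!dma_support)
        return -ENODEV;
    return fail_hard ? -ENOMEM : 0;
}

/* Treat -ENODEV as "no MSI cookie available" and carry on; propagate
 * every other error to the caller. */
int attach_domain(bool dma_support, bool fail_hard)
{
    int ret = get_msi_cookie_stub(dma_support, fail_hard);

    if (ret && ret != -ENODEV)
        return ret;     /* genuine failure: abort initialisation */
    return 0;           /* success, or cookie support simply absent */
}
```

With this shape, a platform without MSI cookie support still completes initialisation, while real allocation failures are still reported.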
     
  • Older versions of skiboot only provide a single value in the device
    tree property "ibm,mmio-atsd", even when multiple Address Translation
    Shoot Down (ATSD) registers are present. This prevents NVLink2 devices
    (other than the first) from being used with vfio-pci because vfio-pci
    expects to be able to assign a dedicated ATSD register to each NVLink2
    device.

    However, ATSD registers can be shared among devices. This change
    allows vfio-pci to fall back to sharing the register at index 0 if
    necessary.

    Fixes: 7f92891778df ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver")
    Signed-off-by: Sam Bobroff
    Reviewed-by: Alexey Kardashevskiy
    Signed-off-by: Alex Williamson

    Sam Bobroff
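
The fallback rule can be illustrated with a small sketch. The function name `pick_atsd_index()` and the `-1` return for an empty property are assumptions made for this example; the commit message only states that devices fall back to sharing the register at index 0.

```c
#include <assert.h>
#include <stddef.h>

/* Pick the ATSD register index for the NVLink2 device numbered `link`.
 * `nregs` is the number of values found in the device tree property.
 * Returns -1 if the property provides no register at all (assumption). */
int pick_atsd_index(size_t link, size_t nregs)
{
    if (nregs == 0)
        return -1;          /* nothing to assign */
    if (link < nregs)
        return (int)link;   /* a dedicated register is available */
    return 0;               /* older firmware: share register 0 */
}
```

With a single-valued property, every device ends up on register 0; with enough values, each device keeps its own register.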
     

24 Mar, 2020

12 commits

  • Alex Williamson
     
  • The cleanup is getting a tad long.

    Reviewed-by: Cornelia Huck
    Reviewed-by: Kevin Tian
    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • It currently results in messages like:

    "vfio-pci 0000:03:00.0: vfio_pci: ..."

    Which is quite a bit redundant.

    Reviewed-by: Cornelia Huck
    Reviewed-by: Kevin Tian
    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • With the VF Token interface we can now expect that a vfio userspace
    driver must be in collaboration with the PF driver; an unwitting
    userspace driver will not be able to get past the GET_DEVICE_FD step
    in accessing the device. We can now move on to actually allowing
    SR-IOV to be enabled by vfio-pci on the PF. Support for this is not
    enabled by default in this commit, but it does provide a module option
    for this to be enabled (enable_sriov=1). Enabling VFs is rather
    straightforward, except we don't want to risk that a VF might get
    autoprobed and bound to other drivers, so a bus notifier is used to
    "capture" VFs to vfio-pci using the driver_override support. We
    assume any later action to bind the device to other drivers is
    condoned by the system admin and allow it with a log warning.

    vfio-pci will disable SR-IOV on a PF before releasing the device,
    allowing a VF driver to be assured other drivers cannot take over the
    PF and that any other userspace driver must know the shared VF token.
    This support also does not provide a mechanism for the PF userspace
    driver itself to manipulate SR-IOV through the vfio API. With this
    patch SR-IOV can only be enabled via the host sysfs interface and the
    PF driver user cannot create or remove VFs.

    Reviewed-by: Cornelia Huck
    Reviewed-by: Kevin Tian
    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device
    agnostic ioctl for setting, retrieving, and probing device features.
    This implementation provides a 16-bit field for specifying a feature
    index, where the data portion of the ioctl is determined by the
    semantics for the given feature. Additional flag bits indicate the
    direction and nature of the operation; SET indicates user data is
    provided into the device feature, GET indicates the device feature is
    written out into user data. The PROBE flag augments determining
    whether the given feature is supported, and if provided, whether the
    given operation on the feature is supported.

    The first user of this ioctl is for setting the vfio-pci VF token,
    where the user provides a shared secret key (UUID) on a SR-IOV PF
    device, which users must provide when opening associated VF devices.

    Reviewed-by: Cornelia Huck
    Reviewed-by: Kevin Tian
    Signed-off-by: Alex Williamson

    Alex Williamson
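
A toy model of the flag handling may help. The `FEAT_*` constants, their bit positions, and the rule that a non-probe request needs exactly one direction are assumptions for illustration; the authoritative definitions live in the kernel's uapi header. Only the 16-bit index and the SET/GET/PROBE roles come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout (assumed): low 16 bits index, then flag bits. */
#define FEAT_INDEX_MASK 0xffffu
#define FEAT_GET        (1u << 16)
#define FEAT_SET        (1u << 17)
#define FEAT_PROBE      (1u << 18)

#define FEAT_VF_TOKEN   0u      /* hypothetical feature index */

/* Operations this toy device supports per feature; 0 means unknown. */
static uint32_t supported_ops(uint32_t index)
{
    return index == FEAT_VF_TOKEN ? FEAT_SET : 0;  /* token is set-only */
}

/* Returns 0 if the request is acceptable, -1 otherwise. */
int feature_request(uint32_t flags)
{
    uint32_t index = flags & FEAT_INDEX_MASK;
    uint32_t ops = flags & (FEAT_GET | FEAT_SET);
    uint32_t ok = supported_ops(index);

    if (!ok)
        return -1;                      /* feature not supported */
    if (flags & FEAT_PROBE)
        return (ops & ~ok) ? -1 : 0;    /* only asks "would this work?" */
    if (ops != FEAT_GET && ops != FEAT_SET)
        return -1;                      /* need exactly one direction */
    return (ops & ok) ? 0 : -1;
}
```

A probe with no direction bits checks that the feature exists; a probe with SET or GET additionally checks that direction without transferring data.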
     
  • If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
    fully isolated from the PF. The PF can always cause a denial of service
    to the VF, even if by simply resetting itself. The degree to which a PF
    can access the data passed through a VF or interfere with its operation
    is dependent on a given SR-IOV implementation. Therefore we want to
    avoid a scenario where an existing vfio-pci based userspace driver might
    assume the PF driver is trusted, for example assigning a PF to one VM
    and VF to another with some expectation of isolation. IOMMU grouping
    could be a solution to this, but imposes an unnecessarily strong
    relationship between PF and VF drivers if they need to operate with the
    same IOMMU context. Instead we introduce a "VF token", which is
    essentially just a shared secret between PF and VF drivers, implemented
    as a UUID.

    The VF token can be set by a vfio-pci based PF driver and must be known
    by the vfio-pci based VF driver in order to gain access to the device.
    This allows the degree to which this VF token is considered secret to be
    determined by the applications and environment. For example a VM might
    generate a random UUID known only internally to the hypervisor while a
    userspace networking appliance might use a shared, or even well known,
    UUID among the application drivers.

    To incorporate this VF token, the VFIO_GROUP_GET_DEVICE_FD interface is
    extended to accept key=value pairs in addition to the device name. This
    allows us to most easily deny user access to the device without risk
    that existing userspace drivers assume region offsets, IRQs, and other
    device features, leading to more elaborate error paths. These options
    are expected to take the form:

    "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"

    Where the device name is always provided first for compatibility and
    additional options are specified in a space separated list. The
    relation between and requirements for the additional options will be
    vfio bus driver dependent, however an unknown or unused option within this
    schema should return an error. This allows for future use of unknown
    options as well as a positive indication to the user that an option is
    used.

    An example VF token option would take this form:

    "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"

    When accessing a VF where the PF is making use of vfio-pci, the user
    MUST provide the current vf_token. When accessing a PF, the user MUST
    provide the current vf_token IF there are active VF users or MAY provide
    a vf_token in order to set the current VF token when no VF users are
    active. The former requirement assures VF users that an unassociated
    driver cannot usurp the PF device. These semantics also imply that a
    VF token MUST be set by a PF driver before VF drivers can access their
    device; the default token is random and mechanisms to read the token are
    not provided in order to protect the VF token of previous users. Use of
    the vf_token option outside of these cases will return an error, as
    discussed above.

    Reviewed-by: Cornelia Huck
    Reviewed-by: Kevin Tian
    Signed-off-by: Alex Williamson

    Alex Williamson
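
The option string format can be illustrated with a small parser. This is a sketch under assumptions, not the vfio-pci implementation: `parse_device_string()` is a hypothetical helper that recognises only `vf_token` and rejects anything else, mirroring the stated rule that unknown options return an error.

```c
#include <assert.h>
#include <string.h>

/* Split "NAME opt=val ..." into the device name and the vf_token value.
 * Returns 0 on success, -1 on a malformed or unknown option.
 * `token` is set to "" when no vf_token option is present. */
int parse_device_string(const char *in, char *name, size_t name_len,
                        char *token, size_t token_len)
{
    size_t n = strcspn(in, " ");    /* the name ends at the first space */

    if (n == 0 || n >= name_len || token_len == 0)
        return -1;
    memcpy(name, in, n);
    name[n] = '\0';
    token[0] = '\0';
    in += n;

    for (;;) {
        in += strspn(in, " ");      /* skip separators */
        if (*in == '\0')
            return 0;               /* all options consumed */
        n = strcspn(in, " ");       /* length of one key=value option */
        const char *eq = memchr(in, '=', n);
        if (!eq)
            return -1;              /* not of the form key=value */
        size_t klen = (size_t)(eq - in);
        size_t vlen = n - klen - 1;
        if (klen != strlen("vf_token") || memcmp(in, "vf_token", klen) != 0)
            return -1;              /* unknown option: report an error */
        if (vlen >= token_len)
            return -1;              /* value does not fit the buffer */
        memcpy(token, eq + 1, vlen);
        token[vlen] = '\0';
        in += n;
    }
}
```

Keeping the device name first means a string without options parses exactly as it did before the extension.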
     
  • This currently serves the same purpose as the default implementation
    but will be expanded for additional functionality.

    Reviewed-by: Cornelia Huck
    Reviewed-by: Kevin Tian
    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • Allow bus drivers to provide their own callback to match a device to
    the user provided string.

    Reviewed-by: Cornelia Huck
    Reviewed-by: Kevin Tian
    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • vfio_group_pin_pages() and vfio_group_unpin_pages() are introduced to
    avoid the inefficient search/check/ref/deref operations on the VFIO
    group that each call into vfio_pin_pages() and vfio_unpin_pages()
    incurs.

    The VFIO group is taken as an argument directly. The callers combine
    search/check/ref/deref operations associated with VFIO group by calling
    vfio_group_get_external_user()/vfio_group_get_external_user_from_dev()
    beforehand, and vfio_group_put_external_user() afterwards.

    Suggested-by: Alex Williamson
    Signed-off-by: Yan Zhao
    Signed-off-by: Alex Williamson

    Yan Zhao
     
  • vfio_dma_rw will read/write a range of user space memory pointed to by
    IOVA into/from a kernel buffer without requiring the user space memory
    to be pinned.

    TODO: mark the IOVAs to user space memory dirty if they are written in
    vfio_dma_rw().

    Cc: Kevin Tian
    Signed-off-by: Yan Zhao
    Signed-off-by: Alex Williamson

    Yan Zhao
     
  • An external user calls vfio_group_get_external_user_from_dev() with a
    device pointer to get the VFIO group associated with this device.
    The VFIO group is checked to be viable and to have an IOMMU set. Then
    the container user counter is increased and a VFIO group reference is
    held to prevent the VFIO group from disposal before the external user
    exits.

    When the external user finishes using the VFIO group, it calls
    vfio_group_put_external_user() to dereference the VFIO group and the
    container user counter.

    Suggested-by: Alex Williamson
    Signed-off-by: Yan Zhao
    Signed-off-by: Alex Williamson

    Yan Zhao
     
  • Since commit 7723f4c5ecdb ("driver core: platform: Add an error
    message to platform_get_irq*()"), platform_get_irq() calls dev_err()
    on an error. As we enumerate all interrupts until platform_get_irq()
    fails, we now systematically get a message such as:
    "vfio-platform fff51000.ethernet: IRQ index 3 not found" which is
    a false positive.

    Let's use platform_get_irq_optional() instead.

    Signed-off-by: Eric Auger
    Cc: stable@vger.kernel.org # v5.3+
    Reviewed-by: Andre Przywara
    Tested-by: Andre Przywara
    Signed-off-by: Alex Williamson

    Eric Auger
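
The enumeration pattern behind this change can be sketched in userspace. `get_irq_optional_stub()` is a hypothetical stand-in that stays silent on a missing index; the IRQ numbers and the `-ENXIO` return value are assumptions for this example.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a quiet IRQ lookup on a toy device with
 * `nirqs` interrupts: returns a made-up IRQ number, or -ENXIO past the
 * end, without logging anything. */
static int get_irq_optional_stub(int nirqs, int index)
{
    return index < nirqs ? 100 + index : -ENXIO;
}

/* Count interrupts by probing indexes until the lookup fails. Running
 * off the end is the expected way to stop, not an error. */
int count_irqs(int nirqs)
{
    int n = 0;

    while (get_irq_optional_stub(nirqs, n) >= 0)
        n++;
    return n;
}
```

Because the final failed lookup is part of normal operation, a lookup that logs an error on failure would emit one spurious message per device.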
     

04 Feb, 2020

1 commit

  • Pull VFIO updates from Alex Williamson:

    - Fix nvlink error path (Alexey Kardashevskiy)

    - Update nvlink and spapr to use mmgrab() (Julia Lawall)

    - Update static declaration (Ben Dooks)

    - Annotate __iomem to fix sparse warnings (Ben Dooks)

    * tag 'vfio-v5.6-rc1' of git://github.com/awilliam/linux-vfio:
    vfio: platform: fix __iomem in vfio_platform_amdxgbe.c
    vfio/mdev: make create attribute static
    vfio/spapr_tce: use mmgrab
    vfio: vfio_pci_nvlink2: use mmgrab
    vfio/spapr/nvlink2: Skip unpinning pages on error exit

    Linus Torvalds
     

01 Feb, 2020

3 commits

  • In order to provide a clearer, more symmetric API for pinning and
    unpinning DMA pages. This way, pin_user_pages*() calls match up with
    unpin_user_pages*() calls, and the API is a lot closer to being
    self-explanatory.

    Link: http://lkml.kernel.org/r/20200107224558.2362728-23-jhubbard@nvidia.com
    Signed-off-by: John Hubbard
    Reviewed-by: Jan Kara
    Cc: Alex Williamson
    Cc: Aneesh Kumar K.V
    Cc: Björn Töpel
    Cc: Christoph Hellwig
    Cc: Daniel Vetter
    Cc: Dan Williams
    Cc: Hans Verkuil
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jens Axboe
    Cc: Jerome Glisse
    Cc: Jonathan Corbet
    Cc: Kirill A. Shutemov
    Cc: Leon Romanovsky
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Hubbard
     
  • 1. Change vfio from get_user_pages_remote(), to
    pin_user_pages_remote().

    2. Because all FOLL_PIN-acquired pages must be released via
    put_user_page(), also convert the put_page() call over to
    put_user_pages_dirty_lock().

    Note that this effectively changes the code's behavior in
    vfio_iommu_type1.c: put_pfn(): it now ultimately calls
    set_page_dirty_lock(), instead of set_page_dirty(). This is probably
    more accurate.

    As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
    dealing with a file backed page where we have reference on the inode it
    hangs off." [1]

    [1] https://lore.kernel.org/r/20190723153640.GB720@lst.de

    Link: http://lkml.kernel.org/r/20200107224558.2362728-20-jhubbard@nvidia.com
    Signed-off-by: John Hubbard
    Tested-by: Alex Williamson
    Acked-by: Alex Williamson
    Cc: Aneesh Kumar K.V
    Cc: Björn Töpel
    Cc: Christoph Hellwig
    Cc: Daniel Vetter
    Cc: Dan Williams
    Cc: Hans Verkuil
    Cc: Ira Weiny
    Cc: Jan Kara
    Cc: Jason Gunthorpe
    Cc: Jens Axboe
    Cc: Jerome Glisse
    Cc: Jonathan Corbet
    Cc: Kirill A. Shutemov
    Cc: Leon Romanovsky
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Hubbard
     
  • Update VFIO to take advantage of the recently loosened restriction on
    FOLL_LONGTERM with get_user_pages_remote(). Also, now it is possible to
    fix a bug: the VFIO caller is logically a FOLL_LONGTERM user, but it
    wasn't setting FOLL_LONGTERM.

    Also, remove an unnecessary pair of calls that were releasing and
    reacquiring the mmap_sem. There is no need to avoid holding mmap_sem
    just in order to call page_to_pfn().

    Also, now that the DAX check ("if a VMA is DAX, don't allow long
    term pinning") is in the internals of get_user_pages_remote() and
    __gup_longterm_locked(), there's no need for it at the VFIO call site. So
    remove it.

    Link: http://lkml.kernel.org/r/20200107224558.2362728-8-jhubbard@nvidia.com
    Signed-off-by: John Hubbard
    Tested-by: Alex Williamson
    Acked-by: Alex Williamson
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Ira Weiny
    Suggested-by: Jason Gunthorpe
    Cc: Dan Williams
    Cc: Jerome Glisse
    Cc: Aneesh Kumar K.V
    Cc: Björn Töpel
    Cc: Christoph Hellwig
    Cc: Daniel Vetter
    Cc: Hans Verkuil
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Jonathan Corbet
    Cc: Kirill A. Shutemov
    Cc: Leon Romanovsky
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Hubbard
     

10 Jan, 2020

2 commits

  • The ioaddr should have __iomem marker on it, so add that to fix
    the following sparse warnings:

    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:33:44: warning: incorrect type in argument 2 (different address spaces)
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:33:44: expected void volatile [noderef] *addr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:33:44: got void *
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:34:33: warning: incorrect type in argument 1 (different address spaces)
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:34:33: expected void const volatile [noderef] *addr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:34:33: got void *
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:44:44: warning: incorrect type in argument 2 (different address spaces)
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:44:44: expected void volatile [noderef] *addr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:44:44: got void *
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:45:33: warning: incorrect type in argument 2 (different address spaces)
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:45:33: expected void volatile [noderef] *addr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:45:33: got void *
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:69:41: warning: incorrect type in argument 1 (different address spaces)
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:69:41: expected void *ioaddr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:69:41: got void [noderef] *ioaddr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:71:30: warning: incorrect type in argument 1 (different address spaces)
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:71:30: expected void *ioaddr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:71:30: got void [noderef] *ioaddr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:76:49: warning: incorrect type in argument 1 (different address spaces)
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:76:49: expected void *ioaddr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:76:49: got void [noderef] *ioaddr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:85:37: warning: incorrect type in argument 1 (different address spaces)
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:85:37: expected void *ioaddr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:85:37: got void [noderef] *ioaddr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:87:30: warning: incorrect type in argument 1 (different address spaces)
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:87:30: expected void *ioaddr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:87:30: got void [noderef] *ioaddr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:90:30: warning: incorrect type in argument 1 (different address spaces)
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:90:30: expected void *ioaddr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:90:30: got void [noderef] *ioaddr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:93:30: warning: incorrect type in argument 1 (different address spaces)
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:93:30: expected void *ioaddr
    drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:93:30: got void [noderef] *ioaddr

    Signed-off-by: Ben Dooks (Codethink)
    Acked-by: Eric Auger
    Signed-off-by: Alex Williamson

    Ben Dooks (Codethink)
     
  • The create attribute is not exported, so make it
    static to avoid the following sparse warning:

    drivers/vfio/mdev/mdev_sysfs.c:77:1: warning: symbol 'mdev_type_attr_create' was not declared. Should it be static?

    Signed-off-by: Ben Dooks (Codethink)
    Reviewed-by: Cornelia Huck
    Signed-off-by: Alex Williamson

    Ben Dooks (Codethink)
     

08 Jan, 2020

2 commits

  • mmgrab() was introduced in commit f1f1007644ff ("mm: add new mmgrab()
    helper") and most of the kernel was updated to use it. Update a
    remaining file.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    @@ expression e; @@
    - atomic_inc(&e->mm_count);
    + mmgrab(e);

    Signed-off-by: Julia Lawall
    Reviewed-by: Cornelia Huck
    Signed-off-by: Alex Williamson

    Julia Lawall
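
The effect of the semantic patch can be mimicked in userspace with C11 atomics. `struct mm`, `mm_grab()` and `mm_count_read()` are toy names invented for this sketch; they only illustrate replacing an open-coded counter increment with a named helper.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy stand-in for an object with a reference counter. */
struct mm {
    atomic_int mm_count;
};

/* Named helper that hides the counter, in the spirit of mmgrab():
 * callers write mm_grab(mm) instead of incrementing the field directly. */
void mm_grab(struct mm *mm)
{
    atomic_fetch_add(&mm->mm_count, 1);
}

/* Read the current count, for inspection in this example. */
int mm_count_read(struct mm *mm)
{
    return atomic_load(&mm->mm_count);
}
```

Routing every increment through one helper makes the call sites self-describing and lets the counter representation change in one place.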
     
  • mmgrab() was introduced in commit f1f1007644ff ("mm: add new mmgrab()
    helper") and most of the kernel was updated to use it. Update a
    remaining file.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    @@ expression e; @@
    - atomic_inc(&e->mm_count);
    + mmgrab(e);

    Signed-off-by: Julia Lawall
    Reviewed-by: Cornelia Huck
    Signed-off-by: Alex Williamson

    Julia Lawall
     

07 Jan, 2020

1 commit

  • The nvlink2 subdriver for IBM Witherspoon machines preregisters
    GPU memory in the IOMMU API so KVM TCE code can map this memory
    for DMA as well. This is done by mm_iommu_newdev() called from
    vfio_pci_nvgpu_regops::mmap.

    In the unlikely event of failure, data->mem remains NULL and
    since mm_iommu_put() (which unregisters the region and unpins memory
    if that was regular memory) does not expect mem=NULL, it should not be
    called.

    This adds a check to only call mm_iommu_put() for a valid data->mem.

    Fixes: 7f92891778df ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver")
    Signed-off-by: Alexey Kardashevskiy
    Signed-off-by: Alex Williamson

    Alexey Kardashevskiy
     

06 Jan, 2020

1 commit


08 Dec, 2019

1 commit


05 Dec, 2019

1 commit


04 Dec, 2019

1 commit

  • Pull PCI updates from Bjorn Helgaas:
    "Enumeration:

    - Warn if a host bridge has no NUMA info (Yunsheng Lin)

    - Add PCI_STD_NUM_BARS for the number of standard BARs (Denis
    Efremov)

    Resource management:

    - Fix boot-time Embedded Controller GPE storm caused by incorrect
    resource assignment after ACPI Bus Check Notification (Mika
    Westerberg)

    - Protect pci_reassign_bridge_resources() against concurrent
    addition/removal (Benjamin Herrenschmidt)

    - Fix bridge dma_ranges resource list cleanup (Rob Herring)

    - Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters to control
    the MMIO and prefetchable MMIO window sizes of hotplug bridges
    independently (Nicholas Johnson)

    - Fix MMIO/MMIO_PREF window assignment that assigned more space than
    desired (Nicholas Johnson)

    - Only enforce bus numbers from bridge EA if the bridge has EA
    devices downstream (Subbaraya Sundeep)

    - Consolidate DT "dma-ranges" parsing and convert all host drivers to
    use shared parsing (Rob Herring)

    Error reporting:

    - Restore AER capability after resume (Mayurkumar Patel)

    - Add PoisonTLPBlocked AER counter (Rajat Jain)

    - Use for_each_set_bit() to simplify AER code (Andy Shevchenko)

    - Fix AER kernel-doc (Andy Shevchenko)

    - Add "pcie_ports=dpc-native" parameter to allow native use of DPC
    even if platform didn't grant control over AER (Olof Johansson)

    Hotplug:

    - Avoid returning prematurely from sysfs requests to enable or
    disable a PCIe hotplug slot (Lukas Wunner)

    - Don't disable interrupts twice when suspending hotplug ports (Mika
    Westerberg)

    - Fix deadlocks when PCIe ports are hot-removed while suspended (Mika
    Westerberg)

    Power management:

    - Remove unnecessary ASPM locking (Bjorn Helgaas)

    - Add support for disabling L1 PM Substates (Heiner Kallweit)

    - Allow re-enabling Clock PM after it has been disabled (Heiner
    Kallweit)

    - Add sysfs attributes for controlling ASPM link states (Heiner
    Kallweit)

    - Remove CONFIG_PCIEASPM_DEBUG, including "link_state" and "clk_ctl"
    sysfs files (Heiner Kallweit)

    - Avoid AMD FCH XHCI USB PME# from D0 defect that prevents wakeup on
    USB 2.0 or 1.1 connect events (Kai-Heng Feng)

    - Move power state check out of pci_msi_supported() (Bjorn Helgaas)

    - Fix incorrect MSI-X masking on resume and revert related nvme quirk
    for Kingston NVME SSD running FW E8FK11.T (Jian-Hong Pan)

    - Always return devices to D0 when thawing to fix hibernation with
    drivers like mlx4 that used legacy power management (previously we
    only did it for drivers with new power management ops) (Dexuan Cui)

    - Clear PCIe PME Status even for legacy power management (Bjorn
    Helgaas)

    - Fix PCI PM documentation errors (Bjorn Helgaas)

    - Use dev_printk() for more power management messages (Bjorn Helgaas)

    - Apply D2 delay as milliseconds, not microseconds (Bjorn Helgaas)

    - Convert xen-platform from legacy to generic power management (Bjorn
    Helgaas)

    - Removed unused .resume_early() and .suspend_late() legacy power
    management hooks (Bjorn Helgaas)

    - Rearrange power management code for clarity (Rafael J. Wysocki)

    - Decode power states more clearly ("4" or "D4" really refers to
    "D3cold") (Bjorn Helgaas)

    - Notice when reading PM Control register returns an error (~0)
    instead of interpreting it as being in D3hot (Bjorn Helgaas)

    - Add missing link delays required by the PCIe spec (Mika Westerberg)

    Virtualization:

    - Move pci_prg_resp_pasid_required() to CONFIG_PCI_PRI (Bjorn
    Helgaas)

    - Allow VFs to use PRI (the PF PRI is shared by the VFs, but the code
    previously didn't recognize that) (Kuppuswamy Sathyanarayanan)

    - Allow VFs to use PASID (the PF PASID capability is shared by the
    VFs, but the code previously didn't recognize that) (Kuppuswamy
    Sathyanarayanan)

    - Disconnect PF and VF ATS enablement, since ATS in PFs and
    associated VFs can be enabled independently (Kuppuswamy
    Sathyanarayanan)

    - Cache PRI and PASID capability offsets (Kuppuswamy Sathyanarayanan)

    - Cache the PRI PRG Response PASID Required bit (Bjorn Helgaas)

    - Consolidate ATS declarations in linux/pci-ats.h (Krzysztof
    Wilczynski)

    - Remove unused PRI and PASID stubs (Bjorn Helgaas)

    - Removed unnecessary EXPORT_SYMBOL_GPL() from ATS, PRI, and PASID
    interfaces that are only used by built-in IOMMU drivers (Bjorn
    Helgaas)

    - Hide PRI and PASID state restoration functions used only inside the
    PCI core (Bjorn Helgaas)

    - Add a DMA alias quirk for the Intel VCA NTB (Slawomir Pawlowski)

    - Serialize sysfs sriov_numvfs reads vs writes (Pierre Crégut)

    - Update Cavium ACS quirk for ThunderX2 and ThunderX3 (George
    Cherian)

    - Fix the UPDCR register address in the Intel ACS quirk (Steffen
    Liebergeld)

    - Unify ACS quirk implementations (Bjorn Helgaas)

    Amlogic Meson host bridge driver:

    - Fix meson PERST# GPIO polarity problem (Remi Pommarel)

    - Add DT bindings for Amlogic Meson G12A (Neil Armstrong)

    - Fix meson clock names to match DT bindings (Neil Armstrong)

    - Add meson support for Amlogic G12A SoC with separate shared PHY
    (Neil Armstrong)

    - Add meson extended PCIe PHY functions for Amlogic G12A USB3+PCIe
    combo PHY (Neil Armstrong)

    - Add arm64 DT for Amlogic G12A PCIe controller node (Neil Armstrong)

    - Add commented-out description of VIM3 USB3/PCIe mux in arm64 DT
    (Neil Armstrong)

    Broadcom iProc host bridge driver:

    - Invalidate iProc PAXB address mapping before programming it
    (Abhishek Shah)

    - Fix iproc-msi and mvebu __iomem annotations (Ben Dooks)

    Cadence host bridge driver:

    - Refactor Cadence PCIe host controller to use as a library for both
    host and endpoint (Tom Joseph)

    Freescale Layerscape host bridge driver:

    - Add layerscape LS1028a support (Xiaowei Bao)

    Intel VMD host bridge driver:

    - Add VMD bus 224-255 restriction decode (Jon Derrick)

    - Add VMD 8086:9A0B device ID (Jon Derrick)

    - Remove Keith from VMD maintainer list (Keith Busch)

    Marvell ARMADA 3700 / Aardvark host bridge driver:

    - Use LTSSM state to build link training flag since Aardvark doesn't
    implement the Link Training bit (Remi Pommarel)

    - Delay before training Aardvark link in case PERST# was asserted
    before the driver probe (Remi Pommarel)

    - Fix Aardvark issues with Root Control reads and writes (Remi
    Pommarel)

    - Don't rely on jiffies in Aardvark config access path since
    interrupts may be disabled (Remi Pommarel)

    - Fix Aardvark big-endian support (Grzegorz Jaszczyk)

    Marvell ARMADA 370 / XP host bridge driver:

    - Make mvebu_pci_bridge_emul_ops static (Ben Dooks)

    Microsoft Hyper-V host bridge driver:

    - Add hibernation support for Hyper-V virtual PCI devices (Dexuan
    Cui)

    - Track Hyper-V pci_protocol_version per-hbus, not globally (Dexuan
    Cui)

    - Avoid kmemleak false positive on hv hbus buffer (Dexuan Cui)

    Mobiveil host bridge driver:

    - Change mobiveil csr_read()/write() function names that conflict
    with riscv arch functions (Kefeng Wang)

    NVIDIA Tegra host bridge driver:

    - Fix Tegra CLKREQ dependency programming (Vidya Sagar)

    Renesas R-Car host bridge driver:

    - Remove unnecessary header include from rcar (Andrew Murray)

    - Tighten register index checking for rcar inbound range programming
    (Marek Vasut)

    - Fix rcar inbound range alignment calculation to improve packing of
    multiple entries (Marek Vasut)

    - Update rcar MACCTLR setting to match documentation (Yoshihiro
    Shimoda)

    - Clear bit 0 of MACCTLR before PCIETCTLR.CFINIT per manual
    (Yoshihiro Shimoda)

    - Add Marek Vasut and Yoshihiro Shimoda as R-Car maintainers (Simon
    Horman)

    Rockchip host bridge driver:

    - Make rockchip 0V9 and 1V8 power regulators non-optional (Robin
    Murphy)

    Socionext UniPhier host bridge driver:

    - Set uniphier to host (RC) mode always (Kunihiko Hayashi)

    Endpoint drivers:

    - Fix endpoint driver sign extension problem when shifting page
    number to phys_addr_t (Alan Mikhak)

    Misc:

    - Add NumaChip SPDX header (Krzysztof Wilczynski)

    - Replace EXTRA_CFLAGS with ccflags-y (Krzysztof Wilczynski)

    - Remove unused includes (Krzysztof Wilczynski)

    - Removed unused sysfs attribute groups (Ben Dooks)

    - Remove PTM and ASPM dependencies on PCIEPORTBUS (Bjorn Helgaas)

    - Add PCIe Link Control 2 register field definitions to replace magic
    numbers in AMDGPU and Radeon CIK/SI (Bjorn Helgaas)

    - Fix incorrect Link Control 2 Transmit Margin usage in AMDGPU and
    Radeon CIK/SI PCIe Gen3 link training (Bjorn Helgaas)

    - Use pcie_capability_read_word() instead of pci_read_config_word()
    in AMDGPU and Radeon CIK/SI (Frederick Lawler)

    - Remove unused pci_irq_get_node() (Greg Kroah-Hartman)

    - Make asm/msi.h mandatory and simplify PCI_MSI_IRQ_DOMAIN Kconfig
    (Palmer Dabbelt, Michal Simek)

    - Read all 64 bits of Switchtec part_event_bitmap (Logan Gunthorpe)

    - Fix erroneous intel-iommu dependency on CONFIG_AMD_IOMMU (Bjorn
    Helgaas)

    - Fix bridge emulation big-endian support (Grzegorz Jaszczyk)

    - Fix dwc find_next_bit() usage (Niklas Cassel)

    - Fix pcitest.c fd leak (Hewenliang)

    - Fix typos and comments (Bjorn Helgaas)

    - Fix Kconfig whitespace errors (Krzysztof Kozlowski)"

    * tag 'pci-v5.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (160 commits)
    PCI: Remove PCI_MSI_IRQ_DOMAIN architecture whitelist
    asm-generic: Make msi.h a mandatory include/asm header
    Revert "nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"
    PCI/MSI: Fix incorrect MSI-X masking on resume
    PCI/MSI: Move power state check out of pci_msi_supported()
    PCI/MSI: Remove unused pci_irq_get_node()
    PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer
    PCI: hv: Change pci_protocol_version to per-hbus
    PCI: hv: Add hibernation support
    PCI: hv: Reorganize the code in preparation of hibernation
    MAINTAINERS: Remove Keith from VMD maintainer
    PCI/ASPM: Remove PCIEASPM_DEBUG Kconfig option and related code
    PCI/ASPM: Add sysfs attributes for controlling ASPM link states
    PCI: Fix indentation
    drm/radeon: Prefer pcie_capability_read_word()
    drm/radeon: Replace numbers with PCI_EXP_LNKCTL2 definitions
    drm/radeon: Correct Transmit Margin masks
    drm/amdgpu: Prefer pcie_capability_read_word()
    PCI: uniphier: Set mode register to host mode
    drm/amdgpu: Replace numbers with PCI_EXP_LNKCTL2 definitions
    ...

    Linus Torvalds
     

03 Dec, 2019

1 commit

  • Since irq_bypass_register_producer() is called after request_irq(), we
    should do tear-down in reverse order: irq_bypass_unregister_producer()
    then free_irq().

    Specifically free_irq() may release resources required by the
    irqbypass del_producer() callback. Notably an example provided by
    Marc Zyngier on arm64 with GICv4 that he indicates has the potential
    to wedge the hardware:

    free_irq(irq)
    __free_irq(irq)
    irq_domain_deactivate_irq(irq)
    its_irq_domain_deactivate()
    [unmap the VLPI from the ITS]

    kvm_arch_irq_bypass_del_producer(cons, prod)
    kvm_vgic_v4_unset_forwarding(kvm, irq, ...)
    its_unmap_vlpi(irq)
    [Unmap the VLPI from the ITS (again), remap the original LPI]

    Signed-off-by: Jiang Yi
    Cc: stable@vger.kernel.org # v4.4+
    Fixes: 6d7425f109d26 ("vfio: Register/unregister irq_bypass_producer")
    Link: https://lore.kernel.org/kvm/20191127164910.15888-1-giangyi@amazon.com
    Reviewed-by: Marc Zyngier
    Reviewed-by: Eric Auger
    [aw: commit log]
    Signed-off-by: Alex Williamson

    Jiang Yi
     

02 Dec, 2019

1 commit

  • Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
    "As part of the cleanup of some remaining y2038 issues, I came to
    fs/compat_ioctl.c, which still has a couple of commands that need
    support for time64_t.

    In completely unrelated work, I spent time on cleaning up parts of
    this file in the past, moving things out into drivers instead.

    After Al Viro reviewed an earlier version of this series and did a lot
    more of that cleanup, I decided to try to completely eliminate the
    rest of it and move it all into drivers.

    This series incorporates some of Al's work and many patches of my own,
    but in the end stops short of actually removing the last part, which
    is the scsi ioctl handlers. I have patches for those as well, but they
    need more testing or possibly a rewrite"

    * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
    scsi: sd: enable compat ioctls for sed-opal
    pktcdvd: add compat_ioctl handler
    compat_ioctl: move SG_GET_REQUEST_TABLE handling
    compat_ioctl: ppp: move simple commands into ppp_generic.c
    compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
    compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
    compat_ioctl: unify copy-in of ppp filters
    tty: handle compat PPP ioctls
    compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
    compat_ioctl: handle SIOCOUTQNSD
    af_unix: add compat_ioctl support
    compat_ioctl: reimplement SG_IO handling
    compat_ioctl: move WDIOC handling into wdt drivers
    fs: compat_ioctl: move FITRIM emulation into file systems
    gfs2: add compat_ioctl support
    compat_ioctl: remove unused convert_in_user macro
    compat_ioctl: remove last RAID handling code
    compat_ioctl: remove /dev/raw ioctl translation
    compat_ioctl: remove PCI ioctl translation
    compat_ioctl: remove joystick ioctl translation
    ...

    Linus Torvalds
     

23 Oct, 2019

1 commit

  • Each of these drivers has a copy of the same trivial helper function to
    convert the pointer argument and then call the native ioctl handler.

    We now have a generic implementation of that, so use it.
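
    The commit text does not name the generic helper, so the sketch
    below is only a userspace model of the pattern it describes: convert
    the 32-bit pointer argument, then call the native handler. All
    identifiers (demo_file_ops, demo_compat_ptr(), demo_compat_ptr_ioctl(),
    demo_echo_ioctl(), DEMO_ENOIOCTLCMD) are hypothetical and chosen for
    illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical native handler signature (no struct file in this model). */
typedef long (*demo_ioctl_fn)(unsigned int cmd, unsigned long arg);

/* Hypothetical stand-in for a driver's file operations. */
struct demo_file_ops {
    demo_ioctl_fn unlocked_ioctl;
};

/* Illustrative error value for "no such ioctl command". */
#define DEMO_ENOIOCTLCMD 515

/* Model of pointer conversion: widen a 32-bit user pointer. */
static unsigned long demo_compat_ptr(uint32_t uptr)
{
    return (unsigned long)uptr;
}

/* One shared helper replaces a private copy in every driver. */
long demo_compat_ptr_ioctl(const struct demo_file_ops *ops,
                           unsigned int cmd, unsigned long arg)
{
    assert(ops != NULL);
    if (!ops->unlocked_ioctl)
        return -DEMO_ENOIOCTLCMD;
    return ops->unlocked_ioctl(cmd, demo_compat_ptr((uint32_t)arg));
}

/* Sample native handler used for demonstration: echoes its argument. */
long demo_echo_ioctl(unsigned int cmd, unsigned long arg)
{
    (void)cmd;
    return (long)arg;
}
```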

    Acked-by: Greg Kroah-Hartman
    Acked-by: Michael S. Tsirkin
    Acked-by: David S. Miller
    Acked-by: Jarkko Sakkinen
    Reviewed-by: Jarkko Sakkinen
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Jiri Kosina
    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Cornelia Huck
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

18 Oct, 2019

1 commit

  • Currently, no hugepage split code can transfer the reserved bit
    from head to tail during the split, so checking the head can't make
    a difference in a race with hugepage splitting.

    The buddy allocator wouldn't allow a driver to allocate a hugepage if
    any subpage is reserved in the e820 map at boot. If any driver sets
    the reserved bit of the head page before mapping the hugepage in
    userland, it needs to set the reserved bit in all subpages to be safe.

    Signed-off-by: Ben Luo
    Reviewed-by: Andrea Arcangeli
    Signed-off-by: Alex Williamson

    Ben Luo
     

16 Oct, 2019

1 commit

  • After enabling CONFIG_IOMMU_DMA on X86 a new warning appears when
    compiling vfio:

    drivers/vfio/vfio_iommu_type1.c: In function ‘vfio_iommu_type1_attach_group’:
    drivers/vfio/vfio_iommu_type1.c:1827:7: warning: ‘resv_msi_base’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
    ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    The warning is a false positive, because the call to iommu_get_msi_cookie()
    only happens when vfio_iommu_has_sw_msi() returned true. And that only
    happens when it also set resv_msi_base.

    But initialize the variable anyway to get rid of the warning.
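
    The shape of the false positive can be reproduced in a small
    userspace sketch. demo_has_sw_msi() and demo_attach() are
    hypothetical stand-ins for the functions named above, and the base
    value is arbitrary; the point is only that the out-parameter is
    written on exactly the path that returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: reports whether a software MSI region exists. */
static bool demo_has_sw_msi(bool present, uint64_t *base)
{
    assert(base != NULL);
    if (!present)
        return false;        /* *base is left untouched on this path */
    *base = 0x8000000;       /* arbitrary illustrative value */
    return true;
}

/* Hypothetical caller: reads the base only when the helper said "true". */
uint64_t demo_attach(bool present)
{
    /* Initialised only so the compiler cannot see an uninitialised path. */
    uint64_t resv_msi_base = 0;
    bool resv_msi = demo_has_sw_msi(present, &resv_msi_base);

    if (resv_msi)
        return resv_msi_base;
    return 0;
}
```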

    Signed-off-by: Joerg Roedel
    Reviewed-by: Cornelia Huck
    Reviewed-by: Eric Auger
    Signed-off-by: Alex Williamson

    Joerg Roedel
     

14 Oct, 2019

1 commit

  • Code that iterates over all standard PCI BARs typically uses
    PCI_STD_RESOURCE_END. However, that requires the unusual test
    "i <= PCI_STD_RESOURCE_END" rather than the more typical
    "i < PCI_STD_NUM_BARS".

    Add a definition for PCI_STD_NUM_BARS and change loops to use the more
    idiomatic C style to help avoid fencepost errors.
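
    The two loop styles can be compared in a minimal sketch. The DEMO_*
    macros are local stand-ins defined here for illustration (six
    standard BARs, indexed from 0), not the kernel's headers.

```c
/* Local stand-ins for illustration only. */
#define DEMO_STD_RESOURCES     0
#define DEMO_STD_NUM_BARS      6
#define DEMO_STD_RESOURCE_END  (DEMO_STD_RESOURCES + DEMO_STD_NUM_BARS - 1)

/* Inclusive upper bound: needs "<=", which is easy to mistype as "<". */
int demo_count_inclusive(void)
{
    int n = 0;

    for (int i = DEMO_STD_RESOURCES; i <= DEMO_STD_RESOURCE_END; i++)
        n++;
    return n;
}

/* Half-open bound: the usual "i < count" C idiom. */
int demo_count_half_open(void)
{
    int n = 0;

    for (int i = 0; i < DEMO_STD_NUM_BARS; i++)
        n++;
    return n;
}
```

    Both loops visit the same six indices; writing "<" against the
    inclusive bound would silently skip the last BAR.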

    Link: https://lore.kernel.org/r/20190927234026.23342-1-efremov@linux.com
    Link: https://lore.kernel.org/r/20190927234308.23935-1-efremov@linux.com
    Link: https://lore.kernel.org/r/20190916204158.6889-3-efremov@linux.com
    Signed-off-by: Denis Efremov
    Signed-off-by: Bjorn Helgaas
    Acked-by: Sebastian Ott # arch/s390/
    Acked-by: Bartlomiej Zolnierkiewicz # video/fbdev/
    Acked-by: Gustavo Pimentel # pci/controller/dwc/
    Acked-by: Jack Wang # scsi/pm8001/
    Acked-by: Martin K. Petersen # scsi/pm8001/
    Acked-by: Ulf Hansson # memstick/

    Denis Efremov
     

26 Sep, 2019

1 commit

  • This patch is part of a series that extends the kernel ABI to allow
    passing tagged user pointers (with the top byte set to something other
    than 0x00) as syscall arguments.

    vaddr_get_pfn() uses provided user pointers for vma lookups, which can
    only be done with untagged pointers.

    Untag user pointers in this function.
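
    As a rough illustration of what untagging means for a user address,
    the sketch below clears an 8-bit tag held in the top byte.
    demo_untag_user_addr() is a hypothetical helper and simple masking is
    an assumption made for this example; it is not presented as the
    kernel's implementation.

```c
#include <stdint.h>

/*
 * Hypothetical helper: drop a tag stored in bits 63:56 of a user
 * address so the result can be used for address-range lookups.
 */
uint64_t demo_untag_user_addr(uint64_t addr)
{
    return addr & 0x00ffffffffffffffULL;
}
```

    An address that carries no tag passes through unchanged, so callers
    can apply the helper unconditionally before a lookup.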

    Link: http://lkml.kernel.org/r/87422b4d72116a975896f2b19b00f38acbd28f33.1563904656.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Reviewed-by: Eric Auger
    Reviewed-by: Vincenzo Frascino
    Reviewed-by: Catalin Marinas
    Reviewed-by: Kees Cook
    Cc: Dave Hansen
    Cc: Will Deacon
    Cc: Al Viro
    Cc: Felix Kuehling
    Cc: Jens Wiklander
    Cc: Khalid Aziz
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     

25 Sep, 2019

1 commit

  • Replace PAGE_SHIFT + compound_order(page) with the new page_shift()
    function. Minor improvements in readability.
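
    The replacement amounts to wrapping the sum in a named helper. In
    this sketch, struct demo_page, demo_compound_order() and
    DEMO_PAGE_SHIFT (assumed to be 12, i.e. 4 KiB base pages) are
    hypothetical stand-ins for the kernel's types and constants.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed base page shift for illustration: 4 KiB pages. */
#define DEMO_PAGE_SHIFT 12

/* Hypothetical stand-in for a (possibly compound) page. */
struct demo_page {
    unsigned int order;  /* 0 for a base page, >0 for a compound page */
};

static unsigned int demo_compound_order(const struct demo_page *page)
{
    return page->order;
}

/* Number of address bits covered by the page, as one readable call. */
unsigned int demo_page_shift(const struct demo_page *page)
{
    assert(page != NULL);
    return DEMO_PAGE_SHIFT + demo_compound_order(page);
}
```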

    [akpm@linux-foundation.org: fix build in tce_page_is_contained()]
    Link: http://lkml.kernel.org/r/201907241853.yNQTrJWd%25lkp@intel.com
    Link: http://lkml.kernel.org/r/20190721104612.19120-3-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

21 Sep, 2019

2 commits

  • Pull VFIO updates from Alex Williamson:

    - Fix spapr iommu error case (Alexey Kardashevskiy)

    - Consolidate region type definitions (Cornelia Huck)

    - Restore saved original PCI state on release (hexin)

    - Simplify mtty sample driver interrupt path (Parav Pandit)

    - Support for reporting valid IOVA regions to user (Shameer Kolothum)

    * tag 'vfio-v5.4-rc1' of git://github.com/awilliam/linux-vfio:
    vfio_pci: Restore original state on release
    vfio/type1: remove duplicate retrieval of reserved regions
    vfio/type1: Add IOVA range capability support
    vfio/type1: check dma map request is within a valid iova range
    vfio/spapr_tce: Fix incorrect tce_iommu_group memory free
    vfio-mdev/mtty: Simplify interrupt generation
    vfio: re-arrange vfio region definitions
    vfio/type1: Update iova list on detach
    vfio/type1: Check reserved region conflict and update iova list
    vfio/type1: Introduce iova list and add iommu aperture validity check

    Linus Torvalds
     
  • Pull powerpc updates from Michael Ellerman:
    "This is a bit late, partly due to me travelling, and partly due to a
    power outage knocking out some of my test systems *while* I was
    travelling.

    - Initial support for running on a system with an Ultravisor, which
    is software that runs below the hypervisor and protects guests
    against some attacks by the hypervisor.

    - Support for building the kernel to run as a "Secure Virtual
    Machine", ie. as a guest capable of running on a system with an
    Ultravisor.

    - Some changes to our DMA code on bare metal, to allow devices with
    medium sized DMA masks (> 32 && < 59 bits) to use more than 2GB of
    DMA space.

    - Support for firmware assisted crash dumps on bare metal (powernv).

    - Two series fixing bugs in and refactoring our PCI EEH code.

    - A large series refactoring our exception entry code to use gas
    macros, both to make it more readable and also enable some future
    optimisations.

    As well as many cleanups and other minor features & fixups.

    Thanks to: Adam Zerella, Alexey Kardashevskiy, Alistair Popple, Andrew
    Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual,
    Balbir Singh, Benjamin Herrenschmidt, Cédric Le Goater, Christophe
    JAILLET, Christophe Leroy, Christopher M. Riedl, Christoph Hellwig,
    Claudio Carvalho, Daniel Axtens, David Gibson, David Hildenbrand,
    Desnes A. Nunes do Rosario, Ganesh Goudar, Gautham R. Shenoy, Greg
    Kurz, Guerney Hunt, Gustavo Romero, Halil Pasic, Hari Bathini, Joakim
    Tjernlund, Jonathan Neuschafer, Jordan Niethe, Leonardo Bras, Lianbo
    Jiang, Madhavan Srinivasan, Mahesh Salgaonkar, Mahesh Salgaonkar,
    Masahiro Yamada, Maxiwell S. Garcia, Michael Anderson, Nathan
    Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver
    O'Halloran, Qian Cai, Ram Pai, Ravi Bangoria, Reza Arbab, Ryan Grimm,
    Sam Bobroff, Santosh Sivaraj, Segher Boessenkool, Sukadev Bhattiprolu,
    Thiago Bauermann, Thiago Jung Bauermann, Thomas Gleixner, Tom
    Lendacky, Vasant Hegde"

    * tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (264 commits)
    powerpc/mm/mce: Keep irqs disabled during lockless page table walk
    powerpc: Use ftrace_graph_ret_addr() when unwinding
    powerpc/ftrace: Enable HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
    ftrace: Look up the address of return_to_handler() using helpers
    powerpc: dump kernel log before carrying out fadump or kdump
    docs: powerpc: Add missing documentation reference
    powerpc/xmon: Fix output of XIVE IPI
    powerpc/xmon: Improve output of XIVE interrupts
    powerpc/mm/radix: remove useless kernel messages
    powerpc/fadump: support holes in kernel boot memory area
    powerpc/fadump: remove RMA_START and RMA_END macros
    powerpc/fadump: update documentation about option to release opalcore
    powerpc/fadump: consider f/w load area
    powerpc/opalcore: provide an option to invalidate /sys/firmware/opal/core file
    powerpc/opalcore: export /sys/firmware/opal/core for analysing opal crashes
    powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP
    powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel
    powerpc/fadump: improve how crashed kernel's memory is reserved
    powerpc/fadump: consider reserved ranges while releasing memory
    powerpc/fadump: make crash memory ranges array allocation generic
    ...

    Linus Torvalds
     

30 Aug, 2019

1 commit

  • Invalidating a TCE cache entry for each updated TCE is quite expensive.
    This makes use of the new iommu_table_ops::xchg_no_kill()/tce_kill()
    callbacks to bring down the time spent in mapping a huge guest DMA window.
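
    The saving comes from deferring invalidation until a whole range has
    been updated. The sketch below models that idea in userspace;
    demo_tce_table, demo_xchg_no_kill(), demo_tce_kill() and
    demo_map_range() are hypothetical names that only mirror the roles
    of the callbacks mentioned above, and the 4 KiB step is an
    assumption.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TCE_ENTRIES 64

/* Hypothetical table model: entries plus a count of cache invalidations. */
struct demo_tce_table {
    unsigned long entries[DEMO_TCE_ENTRIES];
    unsigned int invalidations;
};

/* Update one entry without touching the (modelled) TCE cache. */
static void demo_xchg_no_kill(struct demo_tce_table *t, size_t idx,
                              unsigned long val)
{
    t->entries[idx] = val;
}

/* Invalidate the cache for a whole range in a single operation. */
static void demo_tce_kill(struct demo_tce_table *t, size_t first, size_t count)
{
    (void)first;
    (void)count;
    t->invalidations++;
}

/* Map a contiguous range: many updates, one invalidation at the end. */
void demo_map_range(struct demo_tce_table *t, size_t first, size_t count,
                    unsigned long base)
{
    assert(first + count <= DEMO_TCE_ENTRIES);

    for (size_t i = 0; i < count; i++)
        demo_xchg_no_kill(t, first + i, base + i * 4096UL);

    if (count)
        demo_tce_kill(t, first, count);
}
```

    Mapping 16 entries this way costs one invalidation in the model
    instead of 16.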

    Signed-off-by: Alexey Kardashevskiy
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20190829085252.72370-4-aik@ozlabs.ru

    Alexey Kardashevskiy
     

24 Aug, 2019

1 commit


23 Aug, 2019

1 commit

  • vfio_pci_enable() saves the device's initial configuration information
    with the intent that it is restored in vfio_pci_disable(). However,
    the commit referenced in Fixes: below replaced the call to
    __pci_reset_function_locked(), which is not wrapped in a state save
    and restore, with pci_try_reset_function(), which overwrites the
    restored device state with the current state before applying it to the
    device. Reinstate use of __pci_reset_function_locked() to return to
    the desired behavior.

    Fixes: 890ed578df82 ("vfio-pci: Use pci "try" reset interface")
    Signed-off-by: hexin
    Signed-off-by: Liu Qi
    Signed-off-by: Zhang Yu
    Signed-off-by: Alex Williamson

    hexin