26 Sep, 2019

1 commit

  • This patch is part of a series that extends the kernel ABI to allow
    tagged user pointers (with the top byte set to something other than
    0x00) to be passed as syscall arguments.

    vaddr_get_pfn() uses provided user pointers for vma lookups, which can
    only be done with untagged pointers.

    Untag user pointers in this function.
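
    A minimal user-space sketch of what untagging means here, assuming a
    64-bit address whose top byte carries the tag. The names
    sketch_untag_addr() and sketch_lookup_key() are hypothetical and are not
    the kernel's API; the kernel's own untagged_addr() is
    architecture-specific and may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clear the top byte (bits 63..56) of a 64-bit address. */
static inline uint64_t sketch_untag_addr(uint64_t addr)
{
    return addr & ~(UINT64_C(0xff) << 56);
}

/*
 * Hypothetical lookup key: the untagged address rounded down to a page
 * boundary, which is the kind of value a vma lookup would compare against.
 */
static inline uint64_t sketch_lookup_key(uint64_t tagged, unsigned int page_shift)
{
    assert(page_shift < 56);
    return (sketch_untag_addr(tagged) >> page_shift) << page_shift;
}
```

    With a tag of 0x2a in the top byte, the lookup key no longer depends on
    the tag, so the same mapping is found for tagged and untagged pointers.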

    Link: http://lkml.kernel.org/r/87422b4d72116a975896f2b19b00f38acbd28f33.1563904656.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Reviewed-by: Eric Auger
    Reviewed-by: Vincenzo Frascino
    Reviewed-by: Catalin Marinas
    Reviewed-by: Kees Cook
    Cc: Dave Hansen
    Cc: Will Deacon
    Cc: Al Viro
    Cc: Felix Kuehling
    Cc: Jens Wiklander
    Cc: Khalid Aziz
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     

25 Sep, 2019

1 commit

  • Replace PAGE_SHIFT + compound_order(page) with the new page_shift()
    function. Minor improvements in readability.
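
    A small sketch of the substitution described above, under the assumption
    of 4 KiB base pages. struct sketch_page and the sketch_* helpers are
    stand-ins for illustration only, not the kernel's definitions.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12u /* assumption: 4 KiB base pages */

/* Stand-in for struct page: only the compound order matters here. */
struct sketch_page {
    unsigned int order;
};

static inline unsigned int sketch_compound_order(const struct sketch_page *page)
{
    return page->order;
}

/* Replaces the open-coded "PAGE_SHIFT + compound_order(page)" at call sites. */
static inline unsigned int sketch_page_shift(const struct sketch_page *page)
{
    assert(page->order < 32);
    return SKETCH_PAGE_SHIFT + sketch_compound_order(page);
}
```

    For an order-9 compound page this yields a shift of 21, i.e. a 2 MiB
    region under the 4 KiB assumption.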

    [akpm@linux-foundation.org: fix build in tce_page_is_contained()]
    Link: http://lkml.kernel.org/r/201907241853.yNQTrJWd%25lkp@intel.com
    Link: http://lkml.kernel.org/r/20190721104612.19120-3-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

21 Sep, 2019

2 commits

  • Pull VFIO updates from Alex Williamson:

    - Fix spapr iommu error case (Alexey Kardashevskiy)

    - Consolidate region type definitions (Cornelia Huck)

    - Restore saved original PCI state on release (hexin)

    - Simplify mtty sample driver interrupt path (Parav Pandit)

    - Support for reporting valid IOVA regions to user (Shameer Kolothum)

    * tag 'vfio-v5.4-rc1' of git://github.com/awilliam/linux-vfio:
    vfio_pci: Restore original state on release
    vfio/type1: remove duplicate retrieval of reserved regions
    vfio/type1: Add IOVA range capability support
    vfio/type1: check dma map request is within a valid iova range
    vfio/spapr_tce: Fix incorrect tce_iommu_group memory free
    vfio-mdev/mtty: Simplify interrupt generation
    vfio: re-arrange vfio region definitions
    vfio/type1: Update iova list on detach
    vfio/type1: Check reserved region conflict and update iova list
    vfio/type1: Introduce iova list and add iommu aperture validity check

    Linus Torvalds
     
  • Pull powerpc updates from Michael Ellerman:
    "This is a bit late, partly due to me travelling, and partly due to a
    power outage knocking out some of my test systems *while* I was
    travelling.

    - Initial support for running on a system with an Ultravisor, which
    is software that runs below the hypervisor and protects guests
    against some attacks by the hypervisor.

    - Support for building the kernel to run as a "Secure Virtual
    Machine", ie. as a guest capable of running on a system with an
    Ultravisor.

    - Some changes to our DMA code on bare metal, to allow devices with
    medium sized DMA masks (> 32 && < 59 bits) to use more than 2GB of
    DMA space.

    - Support for firmware assisted crash dumps on bare metal (powernv).

    - Two series fixing bugs in and refactoring our PCI EEH code.

    - A large series refactoring our exception entry code to use gas
    macros, both to make it more readable and also enable some future
    optimisations.

    As well as many cleanups and other minor features & fixups.

    Thanks to: Adam Zerella, Alexey Kardashevskiy, Alistair Popple, Andrew
    Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual,
    Balbir Singh, Benjamin Herrenschmidt, Cédric Le Goater, Christophe
    JAILLET, Christophe Leroy, Christopher M. Riedl, Christoph Hellwig,
    Claudio Carvalho, Daniel Axtens, David Gibson, David Hildenbrand,
    Desnes A. Nunes do Rosario, Ganesh Goudar, Gautham R. Shenoy, Greg
    Kurz, Guerney Hunt, Gustavo Romero, Halil Pasic, Hari Bathini, Joakim
    Tjernlund, Jonathan Neuschafer, Jordan Niethe, Leonardo Bras, Lianbo
    Jiang, Madhavan Srinivasan, Mahesh Salgaonkar,
    Masahiro Yamada, Maxiwell S. Garcia, Michael Anderson, Nathan
    Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver
    O'Halloran, Qian Cai, Ram Pai, Ravi Bangoria, Reza Arbab, Ryan Grimm,
    Sam Bobroff, Santosh Sivaraj, Segher Boessenkool, Sukadev Bhattiprolu,
    Thiago Bauermann, Thiago Jung Bauermann, Thomas Gleixner, Tom
    Lendacky, Vasant Hegde"

    * tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (264 commits)
    powerpc/mm/mce: Keep irqs disabled during lockless page table walk
    powerpc: Use ftrace_graph_ret_addr() when unwinding
    powerpc/ftrace: Enable HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
    ftrace: Look up the address of return_to_handler() using helpers
    powerpc: dump kernel log before carrying out fadump or kdump
    docs: powerpc: Add missing documentation reference
    powerpc/xmon: Fix output of XIVE IPI
    powerpc/xmon: Improve output of XIVE interrupts
    powerpc/mm/radix: remove useless kernel messages
    powerpc/fadump: support holes in kernel boot memory area
    powerpc/fadump: remove RMA_START and RMA_END macros
    powerpc/fadump: update documentation about option to release opalcore
    powerpc/fadump: consider f/w load area
    powerpc/opalcore: provide an option to invalidate /sys/firmware/opal/core file
    powerpc/opalcore: export /sys/firmware/opal/core for analysing opal crashes
    powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP
    powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel
    powerpc/fadump: improve how crashed kernel's memory is reserved
    powerpc/fadump: consider reserved ranges while releasing memory
    powerpc/fadump: make crash memory ranges array allocation generic
    ...

    Linus Torvalds
     

30 Aug, 2019

1 commit

  • Invalidating a TCE cache entry for each updated TCE is quite expensive.
    This makes use of the new iommu_table_ops::xchg_no_kill()/tce_kill()
    callbacks to bring down the time spent in mapping a huge guest DMA window.
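
    The batching idea can be sketched in user space as follows. This is a
    toy model, not the powerpc implementation: the table layout, the
    function signatures and the "kills" counter are all assumptions made
    for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy TCE table: entries plus a counter of cache invalidation requests. */
struct sketch_tce_table {
    uint64_t entries[64];
    unsigned long kills;
};

/* Update one entry without invalidating the TCE cache; returns the old entry. */
static uint64_t sketch_xchg_no_kill(struct sketch_tce_table *t, size_t idx,
                                    uint64_t new_tce)
{
    assert(idx < 64);
    uint64_t old = t->entries[idx];
    t->entries[idx] = new_tce;
    return old;
}

/* Invalidate the cache for a whole range with a single request. */
static void sketch_tce_kill(struct sketch_tce_table *t, size_t first, size_t count)
{
    (void)first;
    (void)count;
    t->kills++;
}

/* Map a range: update every entry first, then invalidate once at the end. */
static void sketch_map_range(struct sketch_tce_table *t, size_t first,
                             size_t count, uint64_t base)
{
    for (size_t i = 0; i < count; i++)
        sketch_xchg_no_kill(t, first + i, base + ((uint64_t)i << 12));
    sketch_tce_kill(t, first, count);
}
```

    Mapping 16 entries this way issues one invalidation instead of 16, which
    is the saving the commit is after for huge guest DMA windows.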

    Signed-off-by: Alexey Kardashevskiy
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20190829085252.72370-4-aik@ozlabs.ru

    Alexey Kardashevskiy
     

24 Aug, 2019

1 commit


23 Aug, 2019

1 commit

  • vfio_pci_enable() saves the device's initial configuration information
    with the intent that it is restored in vfio_pci_disable(). However,
    the commit referenced in Fixes: below replaced the call to
    __pci_reset_function_locked(), which is not wrapped in a state save
    and restore, with pci_try_reset_function(), which overwrites the
    restored device state with the current state before applying it to the
    device. Reinstate use of __pci_reset_function_locked() to return to
    the desired behavior.

    Fixes: 890ed578df82 ("vfio-pci: Use pci "try" reset interface")
    Signed-off-by: hexin
    Signed-off-by: Liu Qi
    Signed-off-by: Zhang Yu
    Signed-off-by: Alex Williamson

    hexin
     

20 Aug, 2019

7 commits


24 Jul, 2019

2 commits

  • To permit batching of TLB flushes across multiple calls to the IOMMU
    driver's ->unmap() implementation, introduce a new structure for
    tracking the address range to be flushed and the granularity at which
    the flushing is required.

    This is hooked into the IOMMU API and its callers are updated to make use
    of the new structure. Subsequent patches will plumb this into the IOMMU
    drivers as well, but for now the gathered information is ignored.
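
    A rough user-space model of such a tracking structure, assuming it
    records an inclusive address range and a flush granularity. The field
    names and the merge policy (keep the smallest granularity seen) are
    assumptions for this sketch; the structure actually added by the patch
    may behave differently.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical gather state kept on the caller's stack across unmap calls. */
struct sketch_gather {
    unsigned long start; /* lowest address to flush */
    unsigned long end;   /* highest address to flush (inclusive) */
    size_t pgsize;       /* granularity at which flushing is required */
};

static void sketch_gather_init(struct sketch_gather *g)
{
    g->start = ULONG_MAX;
    g->end = 0;
    g->pgsize = 0;
}

/* Widen the pending range to cover one more unmapped region. */
static void sketch_gather_add(struct sketch_gather *g, unsigned long iova,
                              size_t size, size_t pgsize)
{
    assert(size > 0);
    unsigned long last = iova + size - 1;

    if (iova < g->start)
        g->start = iova;
    if (last > g->end)
        g->end = last;
    /* Assumption: keep the smallest granularity seen so far. */
    if (g->pgsize == 0 || pgsize < g->pgsize)
        g->pgsize = pgsize;
}
```

    Because the state lives with the caller rather than in the shared
    domain, several unmap calls can accumulate into one range without extra
    synchronisation, and a single flush can cover all of them.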

    Signed-off-by: Will Deacon

    Will Deacon
     
  • Commit add02cfdc9bc ("iommu: Introduce Interface for IOMMU TLB Flushing")
    added three new TLB flushing operations to the IOMMU API so that the
    underlying driver operations can be batched when unmapping large regions
    of IO virtual address space.

    However, the ->iotlb_range_add() callback has not been implemented by
    any IOMMU drivers (amd_iommu.c implements it as an empty function, which
    incurs the overhead of an indirect branch). Instead, drivers either flush
    the entire IOTLB in the ->iotlb_sync() callback or perform the necessary
    invalidation during ->unmap().

    Attempting to implement ->iotlb_range_add() for arm-smmu-v3.c revealed
    two major issues:

    1. The page size used to map the region in the page-table is not known,
    and so it is not generally possible to issue TLB flushes in the most
    efficient manner.

    2. The only mutable state passed to the callback is a pointer to the
    iommu_domain, which can be accessed concurrently and therefore
    requires expensive synchronisation to keep track of the outstanding
    flushes.

    Remove the callback entirely in preparation for extending ->unmap() and
    ->iotlb_sync() to update a token on the caller's stack.

    Signed-off-by: Will Deacon

    Will Deacon
     

18 Jul, 2019

1 commit

  • Pull VFIO updates from Alex Williamson:

    - Static symbol cleanup in mdev samples (Kefeng Wang)

    - Use vma helper in nvlink code (Peng Hao)

    - Remove unused code in mbochs sample (YueHaibing)

    - Send uevents around mdev registration (Alex Williamson)

    * tag 'vfio-v5.3-rc1' of git://github.com/awilliam/linux-vfio:
    mdev: Send uevents around parent device registration
    sample/mdev/mbochs: remove set but not used variable 'mdev_state'
    vfio: vfio_pci_nvlink2: use a vma helper function
    vfio-mdev/samples: make some symbols static

    Linus Torvalds
     

17 Jul, 2019

2 commits

  • Merge more updates from Andrew Morton:
    "VM:
    - z3fold fixes and enhancements by Henry Burns and Vitaly Wool

    - more accurate reclaimed slab caches calculations by Yafang Shao

    - fix MAP_UNINITIALIZED UAPI symbol to not depend on config, by
    Christoph Hellwig

    - !CONFIG_MMU fixes by Christoph Hellwig

    - new novmcoredd parameter to omit device dumps from vmcore, by
    Kairui Song

    - new test_meminit module for testing heap and pagealloc
    initialization, by Alexander Potapenko

    - ioremap improvements for huge mappings, by Anshuman Khandual

    - generalize kprobe page fault handling, by Anshuman Khandual

    - device-dax hotplug fixes and improvements, by Pavel Tatashin

    - enable synchronous DAX fault on powerpc, by Aneesh Kumar K.V

    - add pte_devmap() support for arm64, by Robin Murphy

    - unify locked_vm accounting with a helper, by Daniel Jordan

    - several misc fixes

    core/lib:
    - new typeof_member() macro including some users, by Alexey Dobriyan

    - make BIT() and GENMASK() available in asm, by Masahiro Yamada

    - changed LIST_POISON2 on x86_64 to 0xdead000000000122 for better
    code generation, by Alexey Dobriyan

    - rbtree code size optimizations, by Michel Lespinasse

    - convert struct pid count to refcount_t, by Joel Fernandes

    get_maintainer.pl:
    - add --no-moderated switch to skip moderated ML's, by Joe Perches

    misc:
    - ptrace PTRACE_GET_SYSCALL_INFO interface

    - coda updates

    - gdb scripts, various"

    [ Using merge message suggestion from Vlastimil Babka, with some editing - Linus ]

    * emailed patches from Andrew Morton: (100 commits)
    fs/select.c: use struct_size() in kmalloc()
    mm: add account_locked_vm utility function
    arm64: mm: implement pte_devmap support
    mm: introduce ARCH_HAS_PTE_DEVMAP
    mm: clean up is_device_*_page() definitions
    mm/mmap: move common defines to mman-common.h
    mm: move MAP_SYNC to asm-generic/mman-common.h
    device-dax: "Hotremove" persistent memory that is used like normal RAM
    mm/hotplug: make remove_memory() interface usable
    device-dax: fix memory and resource leak if hotplug fails
    include/linux/lz4.h: fix spelling and copy-paste errors in documentation
    ipc/mqueue.c: only perform resource calculation if user valid
    include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures
    scripts/gdb: add helpers to find and list devices
    scripts/gdb: add lx-genpd-summary command
    drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
    kernel/pid.c: convert struct pid count to refcount_t
    drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings
    select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining()
    select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR
    ...

    Linus Torvalds
     
  • locked_vm accounting is done roughly the same way in five places, so
    unify them in a helper.

    Include the helper's caller in the debug print to distinguish between
    callsites.

    Error codes stay the same, so user-visible behavior does too. The one
    exception is that the -EPERM case in tce_account_locked_vm is removed
    because Alexey has never seen it triggered.
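
    The shape of such a shared helper can be sketched as below. This is a
    simplified user-space model: the struct, the explicit limit and bypass
    parameters, and the -ENOMEM return value are assumptions of the sketch,
    and locking and debug output are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the per-process state that tracks locked pages. */
struct sketch_mm {
    unsigned long locked_vm;
};

/*
 * Hypothetical unified helper: charge (inc) or uncharge (!inc) locked pages.
 * "limit" models the locked-memory limit in pages; "bypass" models a
 * privileged caller that is allowed to exceed it.
 */
static int sketch_account_locked_vm(struct sketch_mm *mm, unsigned long pages,
                                    bool inc, unsigned long limit, bool bypass)
{
    assert(mm != NULL);

    if (inc) {
        if (!bypass && mm->locked_vm + pages > limit)
            return -ENOMEM;
        mm->locked_vm += pages;
    } else {
        assert(pages <= mm->locked_vm);
        mm->locked_vm -= pages;
    }
    return 0;
}
```

    Each of the former call sites then reduces to one call with inc set to
    true when pinning and false when releasing.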

    [daniel.m.jordan@oracle.com: v3]
    Link: http://lkml.kernel.org/r/20190529205019.20927-1-daniel.m.jordan@oracle.com
    [sfr@canb.auug.org.au: fix mm/util.c]
    Link: http://lkml.kernel.org/r/20190524175045.26897-1-daniel.m.jordan@oracle.com
    Signed-off-by: Daniel Jordan
    Signed-off-by: Stephen Rothwell
    Tested-by: Alexey Kardashevskiy
    Acked-by: Alex Williamson
    Cc: Alan Tull
    Cc: Alex Williamson
    Cc: Benjamin Herrenschmidt
    Cc: Christoph Lameter
    Cc: Christophe Leroy
    Cc: Davidlohr Bueso
    Cc: Jason Gunthorpe
    Cc: Mark Rutland
    Cc: Michael Ellerman
    Cc: Moritz Fischer
    Cc: Paul Mackerras
    Cc: Steve Sistare
    Cc: Wu Hao
    Cc: Ira Weiny
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Jordan
     

15 Jul, 2019

1 commit


12 Jul, 2019

1 commit


03 Jul, 2019

1 commit


19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation #

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 4122 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

07 Jun, 2019

3 commits

  • In the following sequences, child devices created while the mdev parent
    device is being removed can be left behind, or half-initialized child
    mdev devices can be removed in a race.

    issue-1:
    --------
    cpu-0 cpu-1
    ----- -----
    mdev_unregister_device()
    device_for_each_child()
    mdev_device_remove_cb()
    mdev_device_remove()
    create_store()
    mdev_device_create() [...]
    device_add()
    parent_remove_sysfs_files()

    /* BUG: device added by cpu-0
    * whose parent is getting removed
    * and it won't process this mdev.
    */

    issue-2:
    --------
    Below crash is observed when user initiated remove is in progress
    and mdev_unregister_driver() completes parent unregistration.

    cpu-0 cpu-1
    ----- -----
    remove_store()
    mdev_device_remove()
    active = false;
    mdev_unregister_device()
    parent device removed.
    [...]
    parents->ops->remove()
    /*
    * BUG: Accessing invalid parent.
    */

    This race is similar to create() racing with mdev_unregister_device().

    BUG: unable to handle kernel paging request at ffffffffc0585668
    PGD e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
    Oops: 0000 [#1] SMP PTI
    CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
    Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
    RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev]
    Call Trace:
    remove_store+0x71/0x90 [mdev]
    kernfs_fop_write+0x113/0x1a0
    vfs_write+0xad/0x1b0
    ksys_write+0x5a/0xe0
    do_syscall_64+0x5a/0x210
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Therefore, mdev core is improved as below to overcome above issues.

    Wait for any ongoing mdev create() and remove() to finish before
    unregistering parent device.
    This continues to allow multiple create and remove to progress in
    parallel for different mdev devices as most common case.
    At the same time guard parent removal while parent is being accessed by
    create() and remove() callbacks.
    create()/remove() and unregister_device() are synchronized by the rwsem.

    Refactor device removal code to mdev_device_remove_common() to avoid
    acquiring unreg_sem of the parent.

    Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
    Signed-off-by: Parav Pandit
    Reviewed-by: Cornelia Huck
    Signed-off-by: Alex Williamson

    Parav Pandit
     
  • If device removal is initiated by two threads as below, mdev core
    attempts to create a sysfs remove file on a stale device.
    During this flow, the call trace [1] below is observed.

    cpu-0 cpu-1
    ----- -----
    mdev_unregister_device()
    device_for_each_child
    mdev_device_remove_cb
    mdev_device_remove
    user_syscall
    remove_store()
    mdev_device_remove()
    [..]
    unregister device();
    /* not found in list or
    * active=false.
    */
    sysfs_create_file()
    ..Call trace

    Now that mdev core follows correct device removal sequence of the linux
    bus model, remove shouldn't fail in normal cases. If it fails, there is
    no point of creating a stale file or checking for specific error status.

    kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
    sysfs_create_file_ns+0x7f/0x90
    kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
    5.1.0-rc6-vdevbus+ #6
    kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
    08/09/2016
    kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
    kernel: Call Trace:
    kernel: remove_store+0xdc/0x100 [mdev]
    kernel: kernfs_fop_write+0x113/0x1a0
    kernel: vfs_write+0xad/0x1b0
    kernel: ksys_write+0x5a/0xe0
    kernel: do_syscall_64+0x5a/0x210
    kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Reviewed-by: Cornelia Huck
    Signed-off-by: Parav Pandit
    Signed-off-by: Alex Williamson

    Parav Pandit
     
  • This patch addresses below two issues and prepares the code to address
    3rd issue listed below.

    1. mdev device is placed on the mdev bus before it is created in the
    vendor driver. Once a device is placed on the mdev bus without creating
    its supporting underlying vendor device, mdev driver's probe() gets
    triggered. However there isn't a stable mdev available to work on.

    create_store()
    mdev_create_device()
    device_register()
    ...
    vfio_mdev_probe()
    [...]
    parent->ops->create()
    vfio_ap_mdev_create()
    mdev_set_drvdata(mdev, matrix_mdev);
    /* Valid pointer set above */

    Due to this way of initialization, mdev driver who wants to use the mdev,
    doesn't have a valid mdev to work on.

    2. Current creation sequence is,
    parent->ops->create()
    groups_register()

    Remove sequence is,
    parent->ops->remove()
    groups_unregister()

    However, remove sequence should be exact mirror of creation sequence.
    Once this is achieved, all users of the mdev will be terminated first
    before removing underlying vendor device.
    (Follow standard linux driver model).
    At that point vendor's remove() ops shouldn't fail because taking the
    device off the bus should terminate any usage.

    3. When remove operation fails, mdev sysfs removal attempts to add the
    file back on already removed device. Following call trace [1] is observed.

    [1] call trace:
    kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
    kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
    kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
    kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
    kernel: Call Trace:
    kernel: remove_store+0xdc/0x100 [mdev]
    kernel: kernfs_fop_write+0x113/0x1a0
    kernel: vfs_write+0xad/0x1b0
    kernel: ksys_write+0x5a/0xe0
    kernel: do_syscall_64+0x5a/0x210
    kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Therefore, mdev core is improved in following ways.

    1. Split the device registration/deregistration sequence so that some
    things can be done between initialization of the device and hooking it
    up to the bus respectively after deregistering it from the bus but
    before giving up our final reference.
    In particular, this means invoking the ->create() and ->remove()
    callbacks in those new windows. This gives the vendor driver an
    initialized mdev device to work with during creation.
    At the same time, a bus driver that wishes to bind to the mdev driver
    also gets an initialized mdev device.

    This follows standard Linux kernel bus and device model.

    2. During remove flow, first remove the device from the bus. This
    ensures that any bus specific devices are removed.
    Once device is taken off the mdev bus, invoke remove() of mdev
    from the vendor driver.

    3. The driver core device model provides way to register and auto
    unregister the device sysfs attribute groups at dev->groups.
    Make use of dev->groups to let core create the groups and eliminate
    code to avoid explicit groups creation and removal.

    To ensure that the new sequence is solid, the stack dump below was taken
    from a process that attempts to remove the device while the device is in
    use by the vfio driver and a user application.
    This stack dump validates that the vfio driver guards against such device
    removal when the device is in use.

    cat /proc/21962/stack
    [] vfio_del_group_dev+0x216/0x3c0 [vfio]
    [] mdev_remove+0x21/0x40 [mdev]
    [] device_release_driver_internal+0xe8/0x1b0
    [] bus_remove_device+0xf9/0x170
    [] device_del+0x168/0x350
    [] mdev_device_remove_common+0x1d/0x50 [mdev]
    [] mdev_device_remove+0x8c/0xd0 [mdev]
    [] remove_store+0x71/0x90 [mdev]
    [] kernfs_fop_write+0x113/0x1a0
    [] vfs_write+0xad/0x1b0
    [] ksys_write+0x5a/0xe0
    [] do_syscall_64+0x5a/0x210
    [] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [] 0xffffffffffffffff

    This prepares the code to eliminate calling device_create_file() in
    subsequent patch.

    Reviewed-by: Cornelia Huck
    Signed-off-by: Parav Pandit
    Signed-off-by: Alex Williamson

    Parav Pandit
     

31 May, 2019

2 commits

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms and conditions of the gnu general public license
    version 2 as published by the free software foundation this program
    is distributed in the hope it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details you should have received a copy of the gnu general
    public license along with this program if not see http www gnu org
    licenses

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 228 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Steve Winslow
    Reviewed-by: Richard Fontana
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190528171438.107155473@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation this program is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 655 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Kate Stewart
    Reviewed-by: Richard Fontana
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070034.575739538@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

1 commit


15 May, 2019

2 commits

  • To facilitate additional options to get_user_pages_fast() change the
    singular write parameter to be gup_flags.

    This patch does not change any functionality. New functionality will
    follow in subsequent patches.

    Some of the get_user_pages_fast() call sites were unchanged because they
    already passed FOLL_WRITE or 0 for the write parameter.
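
    The call-site conversion can be illustrated with a small stand-alone
    sketch. SKETCH_FOLL_WRITE, its value, and both functions are
    placeholders for illustration; they model only the parameter change and
    do not pin any pages.

```c
#include <assert.h>

#define SKETCH_FOLL_WRITE 0x01u /* placeholder flag value for this sketch */

/* Old call sites passed a boolean "write"; map it onto a flags word. */
static inline unsigned int sketch_write_to_gup_flags(int write)
{
    return write ? SKETCH_FOLL_WRITE : 0;
}

/*
 * Toy model of the new-style call: takes gup_flags instead of "write".
 * It only validates the flags and reports the page count as "pinned".
 */
static int sketch_gup_fast(unsigned long start, int nr_pages,
                           unsigned int gup_flags)
{
    (void)start;
    assert((gup_flags & ~SKETCH_FOLL_WRITE) == 0);
    return nr_pages;
}
```

    A caller that used to pass write = 1 now passes the write flag, and one
    that passed 0 keeps passing 0, which is why such call sites needed no
    change. Further flags can later be OR-ed into the same argument.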

    NOTE: It was suggested to change the ordering of the get_user_pages_fast()
    arguments to ensure that callers were converted. This breaks the current
    GUP call site convention of having the returned pages be the final
    parameter. So the suggestion was rejected.

    Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Mike Marshall
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Martin Schwidefsky
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • Patch series "Add FOLL_LONGTERM to GUP fast and use it".

    HFI1, qib, and mthca use get_user_pages_fast() due to its performance
    advantages. These pages can be held for a significant time. But
    get_user_pages_fast() does not protect against mapping FS DAX pages.

    Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
    retains the performance while also adding the FS DAX checks. XDP has also
    shown interest in using this functionality.[1]

    In addition we change get_user_pages() to use the new FOLL_LONGTERM flag
    and remove the specialized get_user_pages_longterm call.

    [1] https://lkml.org/lkml/2019/3/19/939

    This patch (of 7):

    This patch starts a series which aims to support FOLL_LONGTERM in
    get_user_pages_fast(). Some callers would like to take a longterm (user
    controlled) pin of pages with the fast variant of GUP for performance
    purposes.

    Rather than have a separate get_user_pages_longterm() call, introduce
    FOLL_LONGTERM and change the longterm callers to use it.

    This patch does not change any functionality. In the short term
    "longterm" or user controlled pins are unsafe for Filesystems and FS DAX
    in particular has been blocked. However, callers of get_user_pages_fast()
    were not "protected".

    FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it
    requires vmas to determine if DAX is in use.

    NOTE: In merging with the CMA changes we opt to change the
    get_user_pages() call in check_and_migrate_cma_pages() to a call of
    __get_user_pages_locked() on the newly migrated pages. This makes the
    code read better in that we are calling __get_user_pages_locked() on the
    pages before and after a potential migration.

    As a side effect some of the interfaces are cleaned up but this is not the
    primary purpose of the series.

    In review[1] it was asked:

    > This I don't get - if you do lock down long term mappings performance
    > of the actual get_user_pages call shouldn't matter to start with.
    >
    > What do I miss?

    A couple of points.

    First "longterm" is a relative thing and at this point is probably a
    misnomer. This is really flagging a pin which is going to be given to
    hardware and can't move. I've thought of a couple of alternative names
    but I think we have to settle on if we are going to use FL_LAYOUT or
    something else to solve the "longterm" problem. Then I think we can
    change the flag to a better name.

    Second, it depends on how often you are registering memory. I have spoken
    with some RDMA users who consider MR in the performance path... For the
    overall application performance. I don't have the numbers as the tests
    for HFI1 were done a long time ago. But there was a significant
    advantage. Some of which is probably due to the fact that you don't have
    to hold mmap_sem.

    Finally, architecturally I think it would be good for everyone to use
    *_fast. There are patches submitted to the RDMA list which would allow
    the use of *_fast (they rework the use of mmap_sem) and as soon as they
    are accepted I'll submit a patch to convert the RDMA core as well. Also
    to this point others are looking to use *_fast.

    As an aside, Jason pointed out in my previous submission that *_fast and
    *_unlocked look very much the same. I agree and I think further cleanup
    will be coming. But I'm focused on getting the final solution for DAX at
    the moment.

    [1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965

    [ira.weiny@intel.com: v3]
    Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Andrew Morton
    Cc: Aneesh Kumar K.V
    Cc: Michal Hocko
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Peter Zijlstra
    Cc: Jason Gunthorpe
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "David S. Miller"
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: Ralf Baechle
    Cc: James Hogan
    Cc: Dan Williams
    Cc: Mike Marshall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
     

13 May, 2019

1 commit

  • Pull IOMMU updates from Joerg Roedel:

    - ATS support for ARM-SMMU-v3.

    - AUX domain support in the IOMMU-API and the Intel VT-d driver. This
    adds support for multiple DMA address spaces per (PCI-)device. The
    use-case is to multiplex devices between host and KVM guests in a
    more flexible way than supported by SR-IOV.

    - the rest are smaller cleanups and fixes, two of which needed to be
    reverted after testing in linux-next.

    * tag 'iommu-updates-v5.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (45 commits)
    Revert "iommu/amd: Flush not present cache in iommu_map_page"
    Revert "iommu/amd: Remove the leftover of bypass support"
    iommu/vt-d: Fix leak in intel_pasid_alloc_table on error path
    iommu/vt-d: Make kernel parameter igfx_off work with vIOMMU
    iommu/vt-d: Set intel_iommu_gfx_mapped correctly
    iommu/amd: Flush not present cache in iommu_map_page
    iommu/vt-d: Cleanup: no spaces at the start of a line
    iommu/vt-d: Don't request page request irq under dmar_global_lock
    iommu/vt-d: Use struct_size() helper
    iommu/mediatek: Fix leaked of_node references
    iommu/amd: Remove amd_iommu_pd_list
    iommu/arm-smmu: Log CBFRSYNRA register on context fault
    iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel
    iommu/arm-smmu-v3: Disable tagged pointers
    iommu/arm-smmu-v3: Add support for PCI ATS
    iommu/arm-smmu-v3: Link domains and devices
    iommu/arm-smmu-v3: Add a master->domain pointer
    iommu/arm-smmu-v3: Store SteamIDs in master
    iommu/arm-smmu-v3: Rename arm_smmu_master_data to arm_smmu_master
    ACPI/IORT: Check ATS capability in root complex nodes
    ...

    Linus Torvalds
     

11 May, 2019

1 commit

  • Pull VFIO updates from Alex Williamson:

    - Improve dev_printk() usage (Bjorn Helgaas)

    - Fix issue with blocking in !TASK_RUNNING state while waiting for
    userspace to release devices (Farhan Ali)

    - Fix error path cleanup in nvlink setup (Greg Kurz)

    - mdev-core cleanups and fixes in preparation for more use cases (Parav
    Pandit)

    - Cornelia has volunteered as an official vfio reviewer (Cornelia Huck)

    * tag 'vfio-v5.2-rc1' of git://github.com/awilliam/linux-vfio:
    vfio: Add Cornelia Huck as reviewer
    vfio/mdev: Avoid inline get and put parent helpers
    vfio/mdev: Fix aborting mdev child device removal if one fails
    vfio/mdev: Follow correct remove sequence
    vfio/mdev: Avoid masking error code to EBUSY
    vfio/mdev: Drop redundant extern for exported symbols
    vfio/mdev: Removed unused kref
    vfio/mdev: Avoid release parent reference during error path
    vfio-pci/nvlink2: Fix potential VMA leak
    vfio: Fix WARNING "do not call blocking ops when !TASK_RUNNING"
    vfio: Use dev_printk() when possible

    Linus Torvalds
     

08 May, 2019

6 commits

  • As section 15 of Documentation/process/coding-style.rst clearly
    describes, the compiler is able to optimize the code itself.

    Hence drop inline from the get and put helpers for the parent.

    Signed-off-by: Parav Pandit
    Signed-off-by: Alex Williamson

    Parav Pandit
     
  • device_for_each_child() stops executing the callback function for the
    remaining child devices if the callback hits an error.
    Each child mdev device is independent of the others.
    While unregistering the parent device, the mdev core must remove all
    child mdev devices.
    Therefore, mdev_device_remove_cb() always returns success so that
    device_for_each_child() doesn't abort if one child removal hits an error.
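
    The iteration behaviour described above can be sketched in plain
    userspace C. This is a simplified model only: struct child,
    for_each_child() and both callbacks are hypothetical names made up for
    illustration, not the kernel's device_for_each_child() or the mdev code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a child device; not the kernel's struct device. */
struct child {
    int remove_fails;   /* non-zero: simulate a failing removal */
    int removed;        /* set once removal succeeded */
};

typedef int (*child_cb)(struct child *c);

/*
 * Simplified model of an iterator that stops at the first non-zero
 * return value, as the text describes for device_for_each_child().
 */
static int for_each_child(struct child *kids, size_t n, child_cb cb)
{
    for (size_t i = 0; i < n; i++) {
        int ret = cb(&kids[i]);

        if (ret)
            return ret;     /* remaining children are skipped */
    }
    return 0;
}

static int try_remove(struct child *c)
{
    if (c->remove_fails)
        return -1;
    c->removed = 1;
    return 0;
}

/* Propagates the error, so iteration aborts at the first failure. */
static int remove_cb_propagate(struct child *c)
{
    return try_remove(c);
}

/* Always reports success, so every child gets a removal attempt. */
static int remove_cb_always_ok(struct child *c)
{
    (void)try_remove(c);
    return 0;
}
```

    With three children where the second one fails, the propagating
    callback leaves the third child untouched, while the always-succeeding
    callback still attempts removal of the third child.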

    While at it, simplify the remove and unregister functions as described
    below.

    There is no need to pass the forced flag pointer during mdev parent
    removal, which invokes mdev_device_remove(). So simplify the flow.

    mdev_device_remove() is called from two paths.
    1. mdev_unregister_driver()
         mdev_device_remove_cb()
           mdev_device_remove()
    2. remove_store()
         mdev_device_remove()

    Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
    Reviewed-by: Maxim Levitsky
    Signed-off-by: Parav Pandit
    Signed-off-by: Alex Williamson

    Parav Pandit
     
  • mdev_remove_sysfs_files() should follow the exact mirror sequence of
    the create path, similar to what is followed in the error unwinding
    path of mdev_create_sysfs_files().
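
    A minimal sketch of the "mirror sequence" idea, assuming three generic
    steps A, B and C. The step names, the log and both functions are
    hypothetical and do not reflect the actual sysfs attributes handled by
    the mdev code.

```c
#include <stddef.h>

#define MAX_LOG 16

/* Hypothetical log of executed steps, for illustration only. */
static const char *step_log[MAX_LOG];
static size_t step_count;

static void record(const char *s)
{
    if (step_count < MAX_LOG)
        step_log[step_count++] = s;
}

/* fail_at: 0 = no failure, 2 or 3 = that create step fails. */
static int create_files(int fail_at)
{
    record("create A");
    if (fail_at == 2)
        goto undo_a;
    record("create B");
    if (fail_at == 3)
        goto undo_b;
    record("create C");
    return 0;

    /* Error unwinding: undo completed steps in reverse order. */
undo_b:
    record("remove B");
undo_a:
    record("remove A");
    return -1;
}

/* Teardown mirrors the create order exactly: C, then B, then A. */
static void remove_files(void)
{
    record("remove C");
    record("remove B");
    record("remove A");
}
```

    The teardown function and the unwind labels walk the same steps in the
    same reverse order, which is the property the commit asks for.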

    Fixes: 6a62c1dfb5c7 ("vfio/mdev: Re-order sysfs attribute creation")
    Reviewed-by: Cornelia Huck
    Reviewed-by: Maxim Levitsky
    Signed-off-by: Parav Pandit
    Signed-off-by: Alex Williamson

    Parav Pandit
     
  • Instead of masking the returned error as -EBUSY, return the actual
    error returned by the driver.
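
    The difference between masking and propagating an error code can be
    illustrated as follows. The callback type, the sample driver callback
    and both wrapper functions are hypothetical and only model the idea.

```c
#include <errno.h>

/* Hypothetical driver callback type, for illustration only. */
typedef int (*remove_fn)(void);

/* Sample driver callback that fails with a specific error. */
static int driver_remove_nodev(void)
{
    return -ENODEV;
}

/* Masking: any failure is reported as -EBUSY, hiding the real cause. */
static int remove_masked(remove_fn fn)
{
    int ret = fn();

    if (ret)
        return -EBUSY;
    return 0;
}

/* Propagating: the caller sees exactly what the driver returned. */
static int remove_propagated(remove_fn fn)
{
    return fn();
}
```

    With the propagating variant, a caller can tell -ENODEV apart from a
    genuine -EBUSY.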

    Reviewed-by: Cornelia Huck
    Reviewed-by: Maxim Levitsky
    Signed-off-by: Parav Pandit
    Signed-off-by: Alex Williamson

    Parav Pandit
     
  • Remove unused kref from the mdev_device structure.

    Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
    Reviewed-by: Cornelia Huck
    Reviewed-by: Kirti Wankhede
    Reviewed-by: Maxim Levitsky
    Signed-off-by: Parav Pandit
    Signed-off-by: Alex Williamson

    Parav Pandit
     
  • During mdev parent registration in mdev_register_device(), if the
    parent device is a duplicate, the code releases the reference of the
    existing parent device.
    This is incorrect. The existing parent device should not be touched.
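
    A minimal sketch of duplicate handling that leaves the existing entry
    alone, assuming a tiny fixed-size registry. struct parent, the
    registry array and both functions are hypothetical and are not the
    mdev core's data structures.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical parent entry with a plain integer reference count. */
struct parent {
    int dev_id;
    int refcount;
};

#define MAX_PARENTS 4
static struct parent *parents[MAX_PARENTS];

static struct parent *find_parent(int dev_id)
{
    for (size_t i = 0; i < MAX_PARENTS; i++)
        if (parents[i] && parents[i]->dev_id == dev_id)
            return parents[i];
    return NULL;
}

static int register_parent(struct parent *p)
{
    /* Duplicate: fail without touching the existing entry's refcount. */
    if (find_parent(p->dev_id))
        return -EEXIST;

    for (size_t i = 0; i < MAX_PARENTS; i++) {
        if (!parents[i]) {
            p->refcount = 1;    /* the registry holds one reference */
            parents[i] = p;
            return 0;
        }
    }
    return -ENOMEM;             /* registry full */
}
```

    Registering a second parent with the same dev_id fails, and the
    reference count of the first one stays at 1.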

    Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
    Reviewed-by: Cornelia Huck
    Reviewed-by: Kirti Wankhede
    Reviewed-by: Maxim Levitsky
    Signed-off-by: Parav Pandit
    Signed-off-by: Alex Williamson

    Parav Pandit
     

02 May, 2019

1 commit

  • If vfio_pci_register_dev_region() fails then we should roll back
    previous changes, i.e. unmap the ATSD registers.

    Fixes: 7f92891778df ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver")
    Signed-off-by: Greg Kurz
    Reviewed-by: Alexey Kardashevskiy
    Signed-off-by: Alex Williamson

    Greg Kurz