01 Feb, 2020

4 commits

  • Pull rdma updates from Jason Gunthorpe:
    "A very quiet cycle with few notable changes. Mostly the usual list of
    one or two patches to drivers changing something that isn't quite rc
    worthy. The subsystem seems to be seeing a larger number of rework and
    cleanup style patches right now, I feel that several vendors are
    prepping their drivers for new silicon.

    Summary:

    - Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4,
    rxe, i40iw

    - Larger series doing cleanup and rework for hns and hfi1.

    - Some general reworking of the CM code to make it a little more
    understandable

    - Unify the different code paths connected to the uverbs FD scheme

    - New UAPI ioctls conversions for get context and get async fd

    - Trace points for CQ and CM portions of the RDMA stack

    - mlx5 driver support for virtio-net formatted rings as RDMA raw
    ethernet QPs

    - verbs support for setting the PCI-E relaxed ordering bit on DMA
    traffic connected to a MR

    - A couple of bug fixes that came too late to make rc7"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (108 commits)
    RDMA/core: Make the entire API tree static
    RDMA/efa: Mask access flags with the correct optional range
    RDMA/cma: Fix unbalanced cm_id reference count during address resolve
    RDMA/umem: Fix ib_umem_find_best_pgsz()
    IB/mlx4: Fix leak in id_map_find_del
    IB/opa_vnic: Spelling correction of 'erorr' to 'error'
    IB/hfi1: Fix logical condition in msix_request_irq
    RDMA/cm: Remove CM message structs
    RDMA/cm: Use IBA functions for complex structure members
    RDMA/cm: Use IBA functions for simple structure members
    RDMA/cm: Use IBA functions for swapping get/set acessors
    RDMA/cm: Use IBA functions for simple get/set acessors
    RDMA/cm: Add SET/GET implementations to hide IBA wire format
    RDMA/cm: Add accessors for CM_REQ transport_type
    IB/mlx5: Return the administrative GUID if exists
    RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence
    IB/mlx4: Fix memory leak in add_gid error flow
    IB/mlx5: Expose RoCE accelerator counters
    RDMA/mlx5: Set relaxed ordering when requested
    RDMA/core: Add the core support field to METHOD_GET_CONTEXT
    ...

    Linus Torvalds
     
  • Provide a clearer, more symmetric API for pinning and unpinning DMA
    pages. This way, pin_user_pages*() calls match up with
    unpin_user_pages*() calls, and the API is a lot closer to being
    self-explanatory.

    Link: http://lkml.kernel.org/r/20200107224558.2362728-23-jhubbard@nvidia.com
    Signed-off-by: John Hubbard
    Reviewed-by: Jan Kara
    Cc: Alex Williamson
    Cc: Aneesh Kumar K.V
    Cc: Björn Töpel
    Cc: Christoph Hellwig
    Cc: Daniel Vetter
    Cc: Dan Williams
    Cc: Hans Verkuil
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jens Axboe
    Cc: Jerome Glisse
    Cc: Jonathan Corbet
    Cc: Kirill A. Shutemov
    Cc: Leon Romanovsky
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Hubbard
     
  • Convert infiniband to use the new pin_user_pages*() calls.

    Also, revert earlier changes to Infiniband ODP that had it using
    put_user_page(). ODP is "Case 3" in
    Documentation/core-api/pin_user_pages.rst, which is to say, normal
    get_user_pages() and put_page() are the APIs to use there.

    The new pin_user_pages*() calls replace the corresponding
    get_user_pages*() calls and set the FOLL_PIN flag. The FOLL_PIN flag
    requires the caller to return the pages via put_user_page*() calls,
    but infiniband was already doing that as part of an earlier commit.

    Link: http://lkml.kernel.org/r/20200107224558.2362728-14-jhubbard@nvidia.com
    Signed-off-by: John Hubbard
    Reviewed-by: Jason Gunthorpe
    Cc: Alex Williamson
    Cc: Aneesh Kumar K.V
    Cc: Björn Töpel
    Cc: Christoph Hellwig
    Cc: Daniel Vetter
    Cc: Dan Williams
    Cc: Hans Verkuil
    Cc: Ira Weiny
    Cc: Jan Kara
    Cc: Jason Gunthorpe
    Cc: Jens Axboe
    Cc: Jerome Glisse
    Cc: Jonathan Corbet
    Cc: Kirill A. Shutemov
    Cc: Leon Romanovsky
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Hubbard
     
  • Get rid of the mmap_sem calls as part of the switch to
    get_user_pages_fast(). Note that get_user_pages_fast() will, if
    necessary, fall back to __gup_longterm_unlocked(), which takes the
    mmap_sem as needed.

    Link: http://lkml.kernel.org/r/20200107224558.2362728-10-jhubbard@nvidia.com
    Signed-off-by: John Hubbard
    Reviewed-by: Leon Romanovsky
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Ira Weiny
    Cc: Alex Williamson
    Cc: Aneesh Kumar K.V
    Cc: Björn Töpel
    Cc: Daniel Vetter
    Cc: Dan Williams
    Cc: Hans Verkuil
    Cc: Jason Gunthorpe
    Cc: Jens Axboe
    Cc: Jerome Glisse
    Cc: Jonathan Corbet
    Cc: Kirill A. Shutemov
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Hubbard
     

31 Jan, 2020

1 commit

  • Compilation of mlx5 driver without CONFIG_INFINIBAND_USER_ACCESS generates
    the following error.

    on x86_64:

    ld: drivers/infiniband/hw/mlx5/main.o: in function `mlx5_ib_handler_MLX5_IB_METHOD_VAR_OBJ_ALLOC':
    main.c:(.text+0x186d): undefined reference to `ib_uverbs_get_ucontext_file'
    ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x2480): undefined reference to `uverbs_idr_class'
    ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x24d8): undefined reference to `uverbs_destroy_def_handler'

    This is happening because some parts of the UAPI description are not
    static. This is a holdover from earlier code that relied on struct
    pointers to refer to object types; object types are now referenced by
    number. Remove the unused globals and add statics to the remaining UAPI
    description elements.

    Remove the redundant #ifdefs around mlx5_ib_*defs and the obsolete
    mlx5_ib_get_devx_tree().

    The compiler now trims a lot more unused code, including the above
    problematic definitions when !CONFIG_INFINIBAND_USER_ACCESS.

    Fixes: 7be76bef320b ("IB/mlx5: Introduce VAR object and its alloc/destroy methods")
    Reported-by: Randy Dunlap
    Acked-by: Randy Dunlap
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

30 Jan, 2020

1 commit

  • The uapi value IB_UVERBS_ACCESS_OPTIONAL_RANGE shouldn't be used inside
    the driver; use IB_ACCESS_OPTIONAL instead.

    Fixes: 86dd738cf20c ("RDMA/efa: Allow passing of optional access flags for MR registration")
    Link: https://lore.kernel.org/r/20200129071803.40117-1-galpress@amazon.com
    Signed-off-by: Gal Pressman
    Signed-off-by: Jason Gunthorpe

    Gal Pressman
     

29 Jan, 2020

5 commits

  • Pull networking updates from David Miller:

    1) Add WireGuard

    2) Add HE and TWT support to ath11k driver, from John Crispin.

    3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.

    4) Add variable window congestion control to TIPC, from Jon Maloy.

    5) Add BCM84881 PHY driver, from Russell King.

    6) Start adding netlink support for ethtool operations, from Michal
    Kubecek.

    7) Add XDP drop and TX action support to ena driver, from Sameeh
    Jubran.

    8) Add new ipv4 route notifications so that mlxsw driver does not have
    to handle identical routes itself. From Ido Schimmel.

    9) Add BPF dynamic program extensions, from Alexei Starovoitov.

    10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.

    11) Add support for macsec HW offloading, from Antoine Tenart.

    12) Add initial support for MPTCP protocol, from Christoph Paasch,
    Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.

    13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
    Cherian, and others.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
    net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
    udp: segment looped gso packets correctly
    netem: change mailing list
    qed: FW 8.42.2.0 debug features
    qed: rt init valid initialization changed
    qed: Debug feature: ilt and mdump
    qed: FW 8.42.2.0 Add fw overlay feature
    qed: FW 8.42.2.0 HSI changes
    qed: FW 8.42.2.0 iscsi/fcoe changes
    qed: Add abstraction for different hsi values per chip
    qed: FW 8.42.2.0 Additional ll2 type
    qed: Use dmae to write to widebus registers in fw_funcs
    qed: FW 8.42.2.0 Parser offsets modified
    qed: FW 8.42.2.0 Queue Manager changes
    qed: FW 8.42.2.0 Expose new registers and change windows
    qed: FW 8.42.2.0 Internal ram offsets modifications
    MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
    Documentation: net: octeontx2: Add RVU HW and drivers overview
    octeontx2-pf: ethtool RSS config support
    octeontx2-pf: Add basic ethtool support
    ...

    Linus Torvalds
     
  • The commit below missed the AF_IB and loopback code flow in
    rdma_resolve_addr(). This leads to an unbalanced cm_id refcount in
    cma_work_handler(), which puts a reference that was not taken prior
    to queuing the work.

    A call trace is observed with such code flow:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    __mutex_lock_slowpath+0x166/0x1d0
    mutex_lock+0x1f/0x2f
    cma_work_handler+0x25/0xa0
    process_one_work+0x17f/0x440
    worker_thread+0x126/0x3c0

    Hence, hold the cm_id reference when scheduling the resolve work item.
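
    A minimal user-space sketch of the rule behind this fix: whoever queues
    work whose handler drops a reference must take that reference first.
    The names below (toy_id, toy_queue_work, toy_work_handler) are
    hypothetical stand-ins for illustration, not the kernel's cma code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted cm_id. */
struct toy_id {
	int refcount;       /* number of live references */
	bool work_pending;  /* a work item is queued for this id */
};

/*
 * Queue the resolve work. The handler drops one reference when it
 * finishes, so the matching reference is taken here, before queuing.
 */
static void toy_queue_work(struct toy_id *id)
{
	id->refcount++;
	id->work_pending = true;
}

/* Work handler: does its job, then puts the reference taken at queue time. */
static void toy_work_handler(struct toy_id *id)
{
	assert(id->work_pending);
	id->work_pending = false;
	id->refcount--;
}
```

    With the increment in place, the count after the handler runs equals the
    count before the work was queued; without it, the handler would drop a
    reference that belongs to someone else.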

    Fixes: 722c7b2bfead ("RDMA/{cma, core}: Avoid callback on rdma_addr_cancel()")
    Link: https://lore.kernel.org/r/20200126142652.104803-2-leon@kernel.org
    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe

    Parav Pandit
     
  • Except for the last entry, the ending iova alignment sets the maximum
    possible page size, because the low bits of the iova must be zero when
    starting the next chunk.
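
    The alignment argument can be sketched in plain C. This is only an
    illustration under stated assumptions, not the ib_umem_find_best_pgsz()
    implementation: the helper names and the bitmap convention (bit N set
    means a page size of 2^N bytes is supported) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest power-of-two page size that divides 'iova_end', i.e. its lowest
 * set bit. Returns 0 when iova_end is 0, meaning the boundary imposes no
 * limit.
 */
static uint64_t toy_max_pgsz_for_boundary(uint64_t iova_end)
{
	return iova_end & (~iova_end + 1);
}

/*
 * Pick the largest supported page size that still keeps the next chunk
 * aligned. Returns 0 if no supported size fits.
 */
static uint64_t toy_best_pgsz(uint64_t supported_bitmap, uint64_t iova_end)
{
	uint64_t limit = toy_max_pgsz_for_boundary(iova_end);
	uint64_t allowed = supported_bitmap;
	uint64_t best = 0;

	if (limit)
		allowed &= (limit << 1) - 1;  /* keep only sizes <= limit */

	for (uint64_t bit = 1; bit != 0; bit <<= 1)
		if (allowed & bit)
			best = bit;           /* remember the highest set bit */

	return best;
}
```

    For example, a chunk ending at 0x201000 is only 4 KiB aligned, so a
    2 MiB page size is ruled out even if the hardware supports it.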

    Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR")
    Link: https://lore.kernel.org/r/20200128135612.174820-1-leon@kernel.org
    Signed-off-by: Artemy Kovalyov
    Signed-off-by: Leon Romanovsky
    Tested-by: Gal Pressman
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe

    Artemy Kovalyov
     
  • Pull perf updates from Ingo Molnar:
    "Kernel side changes:

    - Ftrace is one of the last W^X violators (after this only KLP is
    left). These patches move it over to the generic text_poke()
    interface and thereby get rid of this oddity. This requires a
    surprising amount of surgery, by Peter Zijlstra.

    - x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to
    count certain types of events that have a special, quirky hw ABI
    (by Kim Phillips)

    - kprobes fixes by Masami Hiramatsu

    Lots of tooling updates as well, the following subcommands were
    updated: annotate/report/top, c2c, clang, record, report/top TUI,
    sched timehist, tests; plus updates were done to the gtk ui, libperf,
    headers and the parser"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
    perf/x86/amd: Add support for Large Increment per Cycle Events
    perf/x86/amd: Constrain Large Increment per Cycle events
    perf/x86/intel/rapl: Add Comet Lake support
    tracing: Initialize ret in syscall_enter_define_fields()
    perf header: Use last modification time for timestamp
    perf c2c: Fix return type for histogram sorting comparision functions
    perf beauty sockaddr: Fix augmented syscall format warning
    perf/ui/gtk: Fix gtk2 build
    perf ui gtk: Add missing zalloc object
    perf tools: Use %define api.pure full instead of %pure-parser
    libperf: Setup initial evlist::all_cpus value
    perf report: Fix no libunwind compiled warning break s390 issue
    perf tools: Support --prefix/--prefix-strip
    perf report: Clarify in help that --children is default
    tools build: Fix test-clang.cpp with Clang 8+
    perf clang: Fix build with Clang 9
    kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic
    tools lib: Fix builds when glibc contains strlcpy()
    perf report/top: Make 'e' visible in the help and make it toggle showing callchains
    perf report/top: Do not offer annotation for symbols without samples
    ...

    Linus Torvalds
     
  • Pull EFI updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Cleanup of the GOP [graphics output] handling code in the EFI stub

    - Complete refactoring of the mixed mode handling in the x86 EFI stub

    - Overhaul of the x86 EFI boot/runtime code

    - Increase robustness for mixed mode code

    - Add the ability to disable DMA at the root port level in the EFI
    stub

    - Get rid of RWX mappings in the EFI memory map and page tables,
    where possible

    - Move the support code for the old EFI memory mapping style into its
    only user, the SGI UV1+ support code.

    - plus misc fixes, updates, smaller cleanups.

    ... and due to interactions with the RWX changes, another round of PAT
    cleanups make a guest appearance via the EFI tree - with no side
    effects intended"

    * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
    efi/x86: Disable instrumentation in the EFI runtime handling code
    efi/libstub/x86: Fix EFI server boot failure
    efi/x86: Disallow efi=old_map in mixed mode
    x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld
    efi/x86: avoid KASAN false positives when accessing the 1:1 mapping
    efi: Fix handling of multiple efi_fake_mem= entries
    efi: Fix efi_memmap_alloc() leaks
    efi: Add tracking for dynamically allocated memmaps
    efi: Add a flags parameter to efi_memory_map
    efi: Fix comment for efi_mem_type() wrt absent physical addresses
    efi/arm: Defer probe of PCIe backed efifb on DT systems
    efi/x86: Limit EFI old memory map to SGI UV machines
    efi/x86: Avoid RWX mappings for all of DRAM
    efi/x86: Don't map the entire kernel text RW for mixed mode
    x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd
    efi/libstub/x86: Fix unused-variable warning
    efi/libstub/x86: Use mandatory 16-byte stack alignment in mixed mode
    efi/libstub/x86: Use const attribute for efi_is_64bit()
    efi: Allow disabling PCI busmastering on bridges during boot
    efi/x86: Allow translating 64-bit arguments for mixed mode calls
    ...

    Linus Torvalds
     

28 Jan, 2020

2 commits

  • Pull ioremap updates from Christoph Hellwig:
    "Remove the ioremap_nocache API (plus wrappers) that are always
    identical to ioremap"

    * tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap:
    remove ioremap_nocache and devm_ioremap_nocache
    MIPS: define ioremap_nocache to ioremap

    Linus Torvalds
     
  • Using CX-3 virtual functions, either from a bare-metal machine or
    pass-through from a VM, MAD packets are proxied through the PF driver.

    Since the VF drivers have separate name spaces for MAD Transaction Ids
    (TIDs), the PF driver has to re-map the TIDs and keep the book keeping in
    a cache.

    Following the RDMA Connection Manager (CM) protocol, it is clear when an
    entry has to be evicted from the cache. When a DREP is sent from
    mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similarly,
    when a REJ is received by mlx4_ib_demux_cm_handler(), id_map_find_del()
    is called.

    This function wipes out the TID in use from the IDR or XArray and removes
    the id_map_entry from the table.

    In short, it does everything except the topping of the cake, which is to
    remove the entry from the list and free it. In other words, for the REJ
    case enumerated above, one id_map_entry will be leaked.

    For the other case above, a DREQ has been received first. The reception of
    the DREQ will trigger queuing of a delayed work to delete the
    id_map_entry, for the case where the VM doesn't send back a DREP.

    In the normal case, the VM _will_ send back a DREP, and id_map_find_del()
    will be called.

    But this scenario introduces a secondary leak. First, when the DREQ is
    received, a delayed work is queued. The VM will then return a DREP, which
    will call id_map_find_del(). As stated above, this will free the TID used
    from the XArray or IDR. Now, there is a window where that particular TID
    can be re-allocated, let's say by an outgoing REQ. This TID will later be
    wiped out by the delayed work, when the function id_map_ent_timeout() is
    called. But the id_map_entry allocated by the outgoing REQ will not be
    de-allocated, and we have a leak.

    Both leaks are fixed by removing the id_map_find_del() function and only
    using schedule_delayed(). A check has been added to schedule_delayed()
    to see whether the work has already been queued.

    Another benefit of always using the delayed version for deleting entries
    is that we get a TimeWait effect: a TID no longer in use will occupy
    the XArray or IDR for CM_CLEANUP_CACHE_TIMEOUT time, without any ability
    of being re-used for that time period.
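
    The "already queued" check can be sketched as below. This is a
    simplified user-space model under stated assumptions; the struct, its
    fields, and toy_schedule_delayed() are hypothetical and do not mirror
    the mlx4 code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an id_map_entry with a delayed delete. */
struct toy_map_entry {
	bool scheduled_delete;  /* delayed delete already queued */
	int times_queued;       /* how often work was actually queued */
};

/*
 * Queue the delayed delete unless one is already pending.
 * Returns true if work was queued by this call.
 */
static bool toy_schedule_delayed(struct toy_map_entry *ent)
{
	if (ent->scheduled_delete)
		return false;   /* nothing to do, cleanup is on its way */

	ent->scheduled_delete = true;
	ent->times_queued++;
	return true;
}
```

    Calling it once for the DREQ and again for the DREP queues the cleanup
    only once, so a single path ends up freeing the entry.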

    Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization")
    Link: https://lore.kernel.org/r/20200123155521.1212288-1-haakon.bugge@oracle.com
    Signed-off-by: Håkon Bugge
    Signed-off-by: Manjunath Patil
    Reviewed-by: Rama Nichanamatlu
    Reviewed-by: Jack Morgenstein
    Signed-off-by: Jason Gunthorpe

    Håkon Bugge
     

27 Jan, 2020

1 commit

  • Pull SCSI fixes from James Bottomley:
    "Two last minute fixes, both in drivers.

    The fnic one is a highly unlikely condition, but the RDMA one is a
    recently introduced regression that causes a kernel warning to trigger
    in every RDMA logon, which would be unsightly if it got into the final
    release"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: RDMA/isert: Fix a recently introduced regression related to logout
    scsi: fnic: do not queue commands during fwreset

    Linus Torvalds
     

26 Jan, 2020

11 commits

  • Correcting a minor spelling mistake in the comments.

    Link: https://lore.kernel.org/r/20200118162542.15188-1-dab9861@gmail.com
    Signed-off-by: Dillon Brock
    Acked-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe

    Dillon Brock
     
  • Clang warns:

    drivers/infiniband/hw/hfi1/msix.c:136:22: warning: overlapping
    comparisons always evaluate to false [-Wtautological-overlap-compare]
    if (type < IRQ_SDMA && type >= IRQ_OTHER)
    ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
    1 warning generated.

    It is impossible for something to be less than 0 (IRQ_SDMA) and greater
    than or equal to 3 (IRQ_OTHER) at the same time. A logical OR should
    have been used to keep the same logic as before.
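
    A small sketch of the difference between the two conditions. The bounds
    are passed as parameters (0 and 3, the values quoted above) and the
    function names are hypothetical; this is not the hfi1 code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * The shape clang warned about: with first <= limit, no value can be
 * below 'first' and at or above 'limit' at the same time.
 */
static bool toy_rejects_with_and(int type, int first, int limit)
{
	return type < first && type >= limit;
}

/* The intended range check: reject anything outside [first, limit). */
static bool toy_rejects_with_or(int type, int first, int limit)
{
	return type < first || type >= limit;
}
```

    With first = 0 and limit = 3, the AND form never rejects anything,
    while the OR form rejects every value outside 0..2.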

    Link: https://lore.kernel.org/r/20200116222658.5285-1-natechancellor@gmail.com
    Link: https://github.com/ClangBuiltLinux/linux/issues/841
    Fixes: 13d2a8384bd9 ("IB/hfi1: Decouple IRQ name from type")
    Signed-off-by: Nathan Chancellor
    Reviewed-by: Nick Desaulniers
    Acked-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe

    Nathan Chancellor
     
  • All accesses now use the new IBA accessor scheme, so delete the structs
    entirely and generate the structures from the schema file.

    Link: https://lore.kernel.org/r/20200116170037.30109-8-jgg@ziepe.ca
    Tested-by: Leon Romanovsky
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • Use a Coccinelle spatch to replace CM structure members used as
    structures, arrays, or pointers with IBA_GET/SET versions. Applied with

    $ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

    The spatch file was generated using the template pattern:

    @@
    expression src;
    expression len;
    {struct} *msg;
    @@
    - memcpy(msg->{old_name}, src, len)
    + IBA_SET_MEM({new_name}, msg, src, len)
    @@
    {struct} *msg;
    identifier x;
    @@
    - msg->{old_name}.x
    + IBA_GET_MEM_PTR({new_name}, msg)->x
    @@
    {struct} *msg;
    @@
    - &msg->{old_name}
    + IBA_GET_MEM_PTR({new_name}, msg)

    For GIDs:
    @@
    {struct} *msg;
    @@
    - msg->{old_name}
    + *IBA_GET_MEM_PTR({new_name}, msg)

    For non-GIDs:
    @@
    {struct} *msg;
    @@
    - msg->{old_name}
    + IBA_GET_MEM_PTR({new_name}, msg)

    Iterated for every remaining IBA_CHECK_OFF()/IBA_CHECK_GET()
    pairing. Touched up with clang-format after.

    Link: https://lore.kernel.org/r/20200116170037.30109-7-jgg@ziepe.ca
    Tested-by: Leon Romanovsky
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • Use a Coccinelle spatch script to replace use of simple CM structure
    members with IBA_GET/SET versions. Applied with

    $ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

    The spatch file was generated using the template pattern:

    @@
    expression val;
    {struct} *msg;
    @@
    - msg->{old_name} = val
    + IBA_SET({new_name}, msg, be{bits}_to_cpu(val))
    @@
    {struct} *msg;
    @@
    - msg->{old_name}
    + cpu_to_be{bits}(IBA_GET({new_name}, msg))

    Iterated for every IBA_CHECK_OFF that isn't a CM_FIELD_MLOC.

    And the below iterated over all byte sizes to remove doubled byte swaps:

    @@
    expression val;
    @@
    -be{bits}_to_cpu(cpu_to_be{bits}(val))
    +val

    (and __be_to_cpu and ntoh variants)

    Touched up with clang-format after.
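
    Why the doubled swaps can be dropped: converting to big-endian and
    straight back is the identity on every host. The sketch below uses the
    user-space htonl()/ntohl() family from <arpa/inet.h> as an assumed
    stand-in for the kernel's cpu_to_be*()/be*_to_cpu() helpers; the toy_
    function names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* CPU -> big-endian -> CPU for a 32-bit value; always returns 'val'. */
static uint32_t toy_roundtrip32(uint32_t val)
{
	return ntohl(htonl(val));
}

/* Same round trip for a 16-bit value. */
static uint16_t toy_roundtrip16(uint16_t val)
{
	return ntohs(htons(val));
}
```

    Since the round trip returns its input unchanged, replacing the nested
    call with the bare value preserves behavior.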

    Link: https://lore.kernel.org/r/20200116170037.30109-6-jgg@ziepe.ca
    Tested-by: Leon Romanovsky
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • Use a Coccinelle spatch script to replace CM helper functions that
    return/accept BE values with IBA_GET/SET versions. Applied with

    $ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

    The spatch file was generated using the template pattern:

    @@
    expression val;
    {struct} *msg;
    @@
    - {old_setter}(msg, val)
    + IBA_SET({new_name}, msg, be{bits}_to_cpu(val))
    @@
    {struct} *msg;
    @@
    - {old_getter}(msg)
    + cpu_to_be{bits}(IBA_GET({new_name}, msg))

    Iterated for every IBA_CHECK_GET_BE()/IBA_CHECK_SET_BE() pairing.

    And the below iterated over all byte sizes to remove doubled byte swaps:

    @@
    expression val;
    @@
    -be{bits}_to_cpu(cpu_to_be{bits}(val))
    +val

    (and __be_to_cpu and ntoh variants)

    Touched up with clang-format after.

    Link: https://lore.kernel.org/r/20200116170037.30109-5-jgg@ziepe.ca
    Tested-by: Leon Romanovsky
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • Use a Coccinelle spatch to replace CM helper functions with IBA_GET/SET
    versions. Applied with

    $ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

    The spatch file was generated using the template pattern:

    @@
    expression val;
    {struct} *msg;
    @@
    - {old_setter}
    + IBA_SET({new_name}, msg, val)
    @@
    {struct} *msg;
    @@
    - {old_getter}
    + IBA_GET({new_name}, msg)

    Iterated for every IBA_CHECK_GET()/IBA_CHECK_GET() pairing. Touched up
    with clang-format after.

    Link: https://lore.kernel.org/r/20200116170037.30109-4-jgg@ziepe.ca
    Tested-by: Leon Romanovsky
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • There is no separation between the RDMA-CM wire format as declared in
    IBTA and the kernel logic which implements the needed support. This
    situation leads to many mistakes in conversion between big-endian
    (wire format) and the CPU format used by the kernel. It also mixes RDMA
    core code with a combination of uXX and beXX variables.

    The idea is that all accesses to IBA definitions go through special
    GET/SET macros to ensure that no conversion mistakes are made. The
    shifting and masking required to read the value is automatically deduced
    using the field offset description from the tables in the IBA
    specification.
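
    A rough sketch of the idea, assuming a field is described by a bit
    offset and a bit width inside a big-endian message buffer. The
    functions toy_get_bits() and toy_set_bits() are hypothetical and far
    simpler than the kernel's IBA_GET()/IBA_SET() macros; callers only ever
    see CPU-order values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Read 'num_bits' (1..64) starting at 'bit_off' from a big-endian buffer.
 * Bit 0 is the most significant bit of buf[0].
 */
static uint64_t toy_get_bits(const uint8_t *buf, unsigned int bit_off,
			     unsigned int num_bits)
{
	uint64_t val = 0;

	for (unsigned int i = 0; i < num_bits; i++) {
		unsigned int bit = bit_off + i;
		unsigned int b = (buf[bit / 8] >> (7 - bit % 8)) & 1u;

		val = (val << 1) | b;
	}
	return val;
}

/* Write the low 'num_bits' of 'val' at 'bit_off', leaving other bits alone. */
static void toy_set_bits(uint8_t *buf, unsigned int bit_off,
			 unsigned int num_bits, uint64_t val)
{
	for (unsigned int i = 0; i < num_bits; i++) {
		unsigned int bit = bit_off + i;
		uint8_t mask = (uint8_t)(1u << (7 - bit % 8));

		if ((val >> (num_bits - 1 - i)) & 1u)
			buf[bit / 8] |= mask;
		else
			buf[bit / 8] &= (uint8_t)~mask;
	}
}
```

    A 24-bit field at bit offset 8, for instance, occupies bytes 1 to 3 of
    the buffer, and the caller never handles a byte swap directly.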

    This starts with the CM MADs described in IBTA release 1.3 volume 1.

    To confirm that the new macros behave the same as the old accessors a
    self-test is included in this patch.

    Each macro replacing a straightforward struct field tests at compile
    time that the new field has the same offsetof() and width as the old
    field.

    For the fields with accessor functions there is a runtime test: the
    'all ones' value is placed in a dummy message and read back in several
    ways to confirm that both approaches give identical results.
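
    The two kinds of checks can be sketched as follows. The message layout,
    the TOY_ constants, and the accessor names are hypothetical and chosen
    only for illustration; the real self-test covers the CM MAD fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "old style" message layout; not a real CM MAD. */
struct toy_msg {
	uint8_t hdr[4];
	uint8_t local_id[4];  /* 32-bit big-endian field on the wire */
	uint8_t flags;
};

/* Hypothetical "new style" description of the same field. */
#define TOY_LOCAL_ID_OFFSET 4
#define TOY_LOCAL_ID_BYTES  4

/* Compile-time checks: the new description agrees with the old struct. */
_Static_assert(offsetof(struct toy_msg, local_id) == TOY_LOCAL_ID_OFFSET,
	       "offset mismatch");
_Static_assert(sizeof(((struct toy_msg *)0)->local_id) == TOY_LOCAL_ID_BYTES,
	       "width mismatch");

/* Old-style accessor: reads through the struct member. */
static uint32_t toy_old_get_local_id(const struct toy_msg *msg)
{
	return ((uint32_t)msg->local_id[0] << 24) |
	       ((uint32_t)msg->local_id[1] << 16) |
	       ((uint32_t)msg->local_id[2] << 8) |
	       (uint32_t)msg->local_id[3];
}

/* New-style accessor: reads by byte offset from the raw buffer. */
static uint32_t toy_new_get_local_id(const void *raw)
{
	const uint8_t *p = (const uint8_t *)raw + TOY_LOCAL_ID_OFFSET;
	uint32_t val = 0;

	for (int i = 0; i < TOY_LOCAL_ID_BYTES; i++)
		val = (val << 8) | p[i];
	return val;
}

/* Runtime check: write all ones into the field, read it back both ways. */
static int toy_selftest(void)
{
	struct toy_msg msg;

	memset(&msg, 0, sizeof(msg));
	memset(msg.local_id, 0xff, sizeof(msg.local_id));

	return toy_old_get_local_id(&msg) == 0xffffffffu &&
	       toy_new_get_local_id(&msg) == 0xffffffffu &&
	       msg.hdr[3] == 0 && msg.flags == 0;
}
```

    The all-ones pattern is useful because any wrong offset or width shows
    up either as missing one bits in the value or as stray one bits in a
    neighbouring field.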

    Later patches in this series delete the self test.

    This creates a tested table of new field name, old field name(s) and some
    meta information like BE coding for the functions which will be used in
    the next patches.

    Link: https://lore.kernel.org/r/20200116170037.30109-3-jgg@ziepe.ca
    Link: https://lore.kernel.org/r/20191212093830.316934-5-leon@kernel.org
    Signed-off-by: Leon Romanovsky
    Tested-by: Leon Romanovsky
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Leon Romanovsky
     
  • Access the two fields through wrappers, like all other fields, to make it
    clearer what is happening.

    Link: https://lore.kernel.org/r/20200116170037.30109-2-jgg@ziepe.ca
    Tested-by: Leon Romanovsky
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • A user can change the operational GUID (a.k.a. effective GUID) through
    link/infiniband. Therefore it is preferred to return the currently set
    GUID if it exists instead of the operational one.

    This way the PF can query which VF GUID will be set in the next bind. In
    order to align with MAC address, zero is returned if administrative GUID
    is not set.

    For example, before setting administrative GUID:
    $ ip link show
    ib0: mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
    link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    vf 0 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
    spoof checking off, NODE_GUID 00:00:00:00:00:00:00:00, PORT_GUID 00:00:00:00:00:00:00:00, link-state auto, trust off, query_rss off

    Then:

    $ ip link set ib0 vf 0 node_guid 11:00:af:21:cb:05:11:00
    $ ip link set ib0 vf 0 port_guid 22:11:af:21:cb:05:11:00

    After setting administrative GUID:
    $ ip link show
    ib0: mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
    link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    vf 0 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
    spoof checking off, NODE_GUID 11:00:af:21:cb:05:11:00, PORT_GUID 22:11:af:21:cb:05:11:00, link-state auto, trust off, query_rss off

    Fixes: 9c0015ef0928 ("IB/mlx5: Implement callbacks for getting VFs GUID attributes")
    Link: https://lore.kernel.org/r/20200116120048.12744-1-leon@kernel.org
    Signed-off-by: Danit Goldberg
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Danit Goldberg
     
  • Setting entry->driver_removed is missing locking. Protect it with
    xa_lock(), which is held by the only reader.

    Otherwise readers may continue to see driver_removed = false after
    rdma_user_mmap_entry_remove() returns and may continue to try and
    establish new mmaps.

    Fixes: 3411f9f01b76 ("RDMA/core: Create mmap database and cookie helper functions")
    Link: https://lore.kernel.org/r/20200115202041.GA17199@ziepe.ca
    Reviewed-by: Gal Pressman
    Acked-by: Michal Kalderon
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

21 Jan, 2020

3 commits

  • From https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

    Leon Romanovsky says:

    ====================
    Use ODP MRs for kernel ULPs

    The following series extends MR creation routines to allow creation of
    user MRs through kernel ULPs as a proxy. The immediate use case is to
    allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
    MRs to be created, and such MRs were not possible to create prior to
    this series.

    The first part of this patchset extends RDMA to have a special verb,
    ib_reg_user_mr(). The common use case for this function is a
    userspace application that allocates memory for HCA access while the
    responsibility to register the memory at the HCA is on a kernel ULP.
    This ULP acts as an agent for the userspace application.

    The second part provides advise MR functionality for ULPs. This is an
    integral part of ODP flows and is used to trigger page faults in
    advance to prepare memory before running the working set.

    The third part is the actual user of those in-kernel APIs.
    ====================

    * tag 'rds-odp-for-5.5':
    net/rds: Use prefetch for On-Demand-Paging MR
    net/rds: Handle ODP mr registration/unregistration
    net/rds: Detect need of On-Demand-Paging memory registration
    RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths
    IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs
    RDMA/mlx5: Don't fake udata for kernel path
    IB/mlx5: Add ODP WQE handlers for kernel QPs
    IB/core: Add interface to advise_mr for kernel users
    IB/core: Introduce ib_reg_user_mr
    IB: Allow calls to ib_umem_get from kernel ULPs

    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • Leon Romanovsky says:

    ====================
    Use ODP MRs for kernel ULPs

    The following series extends MR creation routines to allow creation of
    user MRs through kernel ULPs as a proxy. The immediate use case is to
    allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
    MRs to be created, and such MRs were not possible to create prior to
    this series.

    The first part of this patchset extends RDMA to have a special verb,
    ib_reg_user_mr(). The common use case for this function is a
    userspace application that allocates memory for HCA access while the
    responsibility to register the memory at the HCA is on a kernel ULP.
    This ULP acts as an agent for the userspace application.

    The second part provides advise MR functionality for ULPs. This is an
    integral part of ODP flows and is used to trigger page faults in
    advance to prepare memory before running the working set.

    The third part is the actual user of those in-kernel APIs.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • iscsit_close_connection() calls isert_wait_conn(). Due to commit
    e9d3009cb936 both functions call target_wait_for_sess_cmds() although that
    last function should be called only once. Fix this by removing the
    target_wait_for_sess_cmds() call from isert_wait_conn() and by only calling
    isert_wait_conn() after target_wait_for_sess_cmds().

    Fixes: e9d3009cb936 ("scsi: target: iscsi: Wait for all commands to finish before freeing a session").
    Link: https://lore.kernel.org/r/20200116044737.19507-1-bvanassche@acm.org
    Reported-by: Rahul Kundu
    Signed-off-by: Bart Van Assche
    Tested-by: Mike Marciniszyn
    Acked-by: Sagi Grimberg
    Signed-off-by: Martin K. Petersen

    Bart Van Assche
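
The call ordering described in the iSER fix above can be pictured with a minimal C sketch. Every name here (`struct conn`, `wait_for_sess_cmds`, `transport_wait_conn`, `close_connection`) is a hypothetical stand-in for illustration, not the kernel's actual code; the sketch only assumes what the commit text states, namely that the session-command wait runs once and the transport wait runs after it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection; not the kernel's structure. */
struct conn {
    int sess_cmd_waits;  /* how many times the session-command wait ran */
    int transport_waits; /* how many times the transport wait ran */
    int order_ok;        /* 1 if the transport wait ran after exactly one session wait */
};

/* Models target_wait_for_sess_cmds(): should run once per close. */
static void wait_for_sess_cmds(struct conn *c)
{
    c->sess_cmd_waits++;
}

/* Models isert_wait_conn() after the fix: it no longer waits for
 * session commands itself, it only does transport-level waiting. */
static void transport_wait_conn(struct conn *c)
{
    c->order_ok = (c->sess_cmd_waits == 1);
    c->transport_waits++;
}

/* Models iscsit_close_connection(): session wait first, transport wait second. */
void close_connection(struct conn *c)
{
    assert(c != NULL);
    wait_for_sess_cmds(c);
    transport_wait_conn(c);
}
```

Calling `close_connection()` on a zero-initialized `struct conn` leaves both counters at 1, which is the single-call property the fix restores.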
     

20 Jan, 2020

2 commits


17 Jan, 2020

10 commits

  • This merge syncs with the latest mlx5-next HW bits and layout updates for
    upcoming features, plus one patch that improves the
    mlx5_create_auto_grouped_flow_table() API across all mlx5 users.

    * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
    net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
    net/mlx5e: Add discard counters per priority
    net/mlx5e: Expose FEC feilds and related capability bit
    net/mlx5: Add mlx5_ifc definitions for connection tracking support
    net/mlx5: Add copy header action struct layout
    net/mlx5: Expose resource dump register mapping
    net/mlx5: Add structures and defines for MIRC register
    net/mlx5: Read MCAM register groups 1 and 2
    net/mlx5: Add structures layout for new MCAM access reg groups
    net/mlx5: Expose vDPA emulation device capabilities
    net/mlx5: Add Virtio Emulation related device capabilities

    Signed-off-by: Saeed Mahameed

    Saeed Mahameed
     
  • Refactor mlx5_create_auto_grouped_flow_table() to use the ft_attr param,
    which already carries the max_fte, prio and flags members and is
    used the same way in the similar mlx5_create_flow_table() function.

    Signed-off-by: Paul Blakey
    Reviewed-by: Roi Dayan
    Reviewed-by: Oz Shlomo
    Reviewed-by: Mark Bloch
    Signed-off-by: Saeed Mahameed

    Paul Blakey
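
As a rough illustration of the refactor above, the sketch below passes one attribute struct to both creation functions instead of separate scalar arguments. The types and function names are simplified, hypothetical stand-ins rather than the mlx5 API; only the `max_fte`, `prio` and `flags` members come from the commit text, and `max_num_groups` is an assumed auto-group-specific field added for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute bundle shared by both creation paths. */
struct flow_table_attr {
    int      max_fte;        /* from the commit text */
    int      prio;           /* from the commit text */
    uint32_t flags;          /* from the commit text */
    int      max_num_groups; /* assumption: only used by the auto-grouped variant */
};

/* Hypothetical result object, returned by value to keep the sketch small. */
struct flow_table {
    int      max_fte;
    int      prio;
    uint32_t flags;
    int      num_groups;
};

/* Plain table creation: consumes the common members of the attr struct. */
struct flow_table create_flow_table(const struct flow_table_attr *attr)
{
    assert(attr != NULL);
    struct flow_table ft = {
        .max_fte = attr->max_fte,
        .prio = attr->prio,
        .flags = attr->flags,
        .num_groups = 0,
    };
    return ft;
}

/* Auto-grouped creation: same attr struct, plus the group count. */
struct flow_table create_auto_grouped_flow_table(const struct flow_table_attr *attr)
{
    struct flow_table ft = create_flow_table(attr);
    ft.num_groups = attr->max_num_groups;
    return ft;
}
```

With this shape, a caller fills one `struct flow_table_attr` using designated initializers and can hand it to either function, so adding a new attribute later does not change either signature.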
     
  • In procedure mlx4_ib_add_gid(), if the driver is unable to update the FW
    gid table, there is a memory leak in the driver's copy of the gid table:
    the gid entry's context buffer is not freed.

    If such an error occurs, free the entry's context buffer, and mark the
    entry as available (by setting its context pointer to NULL).

    Fixes: e26be1bfef81 ("IB/mlx4: Implement ib_device callbacks")
    Link: https://lore.kernel.org/r/20200115085050.73746-1-leon@kernel.org
    Signed-off-by: Jack Morgenstein
    Reviewed-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Jack Morgenstein
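
The error path described in the mlx4_ib_add_gid() fix above follows a common cleanup pattern, sketched here in self-contained C. The structures, the `fw_update_fn` hook and the helpers `fw_ok`/`fw_fail` are hypothetical and exist only for this illustration; the real driver code differs. What the sketch takes from the commit text is the rule itself: if the firmware update fails, free the entry's context buffer and set the pointer to NULL so the slot is available again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-entry context buffer. */
struct gid_ctx {
    int refcount;
};

/* Hypothetical table slot; a NULL ctx means the slot is available. */
struct gid_entry {
    struct gid_ctx *ctx;
};

/* Hypothetical firmware-update hook: 0 on success, negative on failure. */
typedef int (*fw_update_fn)(const struct gid_entry *table, size_t n);

/* Example hooks used to exercise both outcomes. */
int fw_ok(const struct gid_entry *table, size_t n)   { (void)table; (void)n; return 0; }
int fw_fail(const struct gid_entry *table, size_t n) { (void)table; (void)n; return -1; }

int add_gid(struct gid_entry *table, size_t n, fw_update_fn update_fw)
{
    size_t i;
    int ret;

    assert(table != NULL && update_fw != NULL);

    /* Find the first available slot. */
    for (i = 0; i < n; i++)
        if (table[i].ctx == NULL)
            break;
    if (i == n)
        return -2; /* table full */

    table[i].ctx = calloc(1, sizeof(*table[i].ctx));
    if (table[i].ctx == NULL)
        return -3; /* allocation failed */
    table[i].ctx->refcount = 1;

    ret = update_fw(table, n);
    if (ret) {
        /* The fix: release the buffer and mark the slot available again. */
        free(table[i].ctx);
        table[i].ctx = NULL;
    }
    return ret;
}
```

After a failed `add_gid(table, n, fw_fail)` the slot's `ctx` is NULL again, so a later successful call can reuse the same slot instead of leaking the earlier allocation.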
     
  • Introduce the following RoCE accelerator counters:
    * roce_adp_retrans - number of adaptive retransmissions for RoCE traffic.
    * roce_adp_retrans_to - number of times RoCE traffic timed out
    due to adaptive retransmission.
    * roce_slow_restart - number of times RoCE slow restart was used.
    * roce_slow_restart_cnps - number of times RoCE slow restart
    generated CNP packets.
    * roce_slow_restart_trans - number of times RoCE slow restart changed
    state to slow restart.

    Link: https://lore.kernel.org/r/20200115145459.83280-3-leon@kernel.org
    Signed-off-by: Avihai Horon
    Reviewed-by: Maor Gottlieb
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Avihai Horon
     
  • Enable relaxed ordering in the mkey context when requested. As relaxed
    ordering is not currently supported in UMR, disable UMR usage for relaxed
    ordering MRs.

    Link: https://lore.kernel.org/r/1578506740-22188-11-git-send-email-yishaih@mellanox.com
    Signed-off-by: Michael Guralnik
    Signed-off-by: Yishai Hadas
    Signed-off-by: Jason Gunthorpe

    Michael Guralnik
     
  • Add the core support field to METHOD_GET_CONTEXT, this field should
    represent capabilities that are not device-specific.

    Return support for optional access flags for memory regions. User-space
    will use this capability to mask the optional access flags on
    kernels that do not support them.

    Link: https://lore.kernel.org/r/1578506740-22188-10-git-send-email-yishaih@mellanox.com
    Signed-off-by: Michael Guralnik
    Signed-off-by: Yishai Hadas
    Signed-off-by: Jason Gunthorpe

    Michael Guralnik
     
  • As part of adding a range of optional access flags that drivers need to be
    able to accept, mask this range inside the efa driver. This prevents the
    driver from failing when an access flag from that range is passed.

    Link: https://lore.kernel.org/r/1578506740-22188-8-git-send-email-yishaih@mellanox.com
    Signed-off-by: Michael Guralnik
    Signed-off-by: Yishai Hadas
    Signed-off-by: Jason Gunthorpe

    Michael Guralnik
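
The masking idea in the efa change above can be sketched as follows. All macro names, the bit positions chosen for the optional range (bits 20 through 29) and the set of "supported" flags are assumptions made for this example, not values taken from the commit or guaranteed to match the kernel headers. The point shown is the order of operations: clear the optional range first, then reject whatever unsupported bits remain.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical mandatory access flags a driver might support. */
#define ACCESS_LOCAL_WRITE  (1u << 0)
#define ACCESS_REMOTE_READ  (1u << 2)
#define ACCESS_MW_BIND      (1u << 4) /* treated as unsupported in this sketch */

/* Assumed optional range: bits 20..29 inclusive. */
#define ACCESS_OPTIONAL_FIRST (1u << 20)
#define ACCESS_OPTIONAL_LAST  (1u << 29)
#define ACCESS_OPTIONAL_RANGE \
    (((ACCESS_OPTIONAL_LAST << 1) - 1) & ~(ACCESS_OPTIONAL_FIRST - 1))

#define DRV_SUPPORTED_FLAGS (ACCESS_LOCAL_WRITE | ACCESS_REMOTE_READ)

/* Returns 0 if the request is acceptable, -EOPNOTSUPP otherwise. */
int check_mr_access(uint32_t access_flags)
{
    /* Optional flags may be ignored, so drop them before validating. */
    access_flags &= ~ACCESS_OPTIONAL_RANGE;

    if (access_flags & ~DRV_SUPPORTED_FLAGS)
        return -EOPNOTSUPP;
    return 0;
}
```

Under these assumptions, a request carrying `ACCESS_LOCAL_WRITE` together with any bit in the optional range is accepted, while an unsupported mandatory bit such as `ACCESS_MW_BIND` still causes a failure.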
     
  • Allow future extensions of the get context command through the uverbs
    ioctl kabi.

    Unlike the uverbs version, this does not also return an async_fd; that
    has to be done with another command.

    Link: https://lore.kernel.org/r/1578506740-22188-5-git-send-email-yishaih@mellanox.com
    Signed-off-by: Yishai Hadas
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • This lock only serializes ucontext creation. Instead of checking the
    ucontext_lock during destruction, hold the existing hw_destroy_rwsem during
    creation, which is the standard pattern for object creation.

    The simplification of locking is needed for the next patch.

    Link: https://lore.kernel.org/r/1578506740-22188-4-git-send-email-yishaih@mellanox.com
    Signed-off-by: Yishai Hadas
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • Allow the async FD to be allocated separately from the context.

    This is necessary to introduce the ioctl to create a context, as an ioctl
    should only ever create a single uobject at a time.

    If multiple async FDs are created then the first one is used to deliver
    affiliated events from any ib_uevent_object, and all subsequent ones
    receive only unaffiliated events.

    Link: https://lore.kernel.org/r/1578506740-22188-3-git-send-email-yishaih@mellanox.com
    Signed-off-by: Yishai Hadas
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
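
The delivery rule for multiple async FDs described above can be modelled with a small C sketch. The types and functions are hypothetical and only count events instead of queueing them. The sketch also assumes that unaffiliated events go to every async FD, including the first; the commit text only states that later FDs receive unaffiliated events alone.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ASYNC_FDS 4

/* Hypothetical async FD: counts the events it has been handed. */
struct async_fd {
    int affiliated_events;
    int unaffiliated_events;
};

/* Hypothetical context holding async FDs in creation order. */
struct context {
    struct async_fd *fds[MAX_ASYNC_FDS];
    size_t nfds;
};

/* Register an async FD; the first one registered becomes the default. */
int context_add_async_fd(struct context *ctx, struct async_fd *fd)
{
    assert(ctx != NULL && fd != NULL);
    if (ctx->nfds == MAX_ASYNC_FDS)
        return -1;
    ctx->fds[ctx->nfds++] = fd;
    return 0;
}

/* Affiliated events (tied to an object) go only to the first async FD. */
void deliver_affiliated(struct context *ctx)
{
    if (ctx->nfds > 0)
        ctx->fds[0]->affiliated_events++;
}

/* Unaffiliated events are assumed to be broadcast to every async FD. */
void deliver_unaffiliated(struct context *ctx)
{
    for (size_t i = 0; i < ctx->nfds; i++)
        ctx->fds[i]->unaffiliated_events++;
}
```

With two FDs registered, one affiliated and one unaffiliated delivery leave the first FD with one event of each kind and the second FD with a single unaffiliated event.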