10 Dec, 2020

1 commit

  • If cm_create_timewait_info() fails, the timewait_info pointer will contain
    an error value and will be used in cm_remove_remote() later.

    general protection fault, probably for non-canonical address 0xdffffc0000000024: 0000 [#1] SMP KASAN PTI
    KASAN: null-ptr-deref in range [0x0000000000000120-0x0000000000000127]
    CPU: 2 PID: 12446 Comm: syz-executor.3 Not tainted 5.10.0-rc5-5d4c0742a60e #27
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
    RIP: 0010:cm_remove_remote.isra.0+0x24/0x170 drivers/infiniband/core/cm.c:978
    Code: 84 00 00 00 00 00 41 54 55 53 48 89 fb 48 8d ab 2d 01 00 00 e8 7d bf 4b fe 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 b6 04 02 48 89 ea 83 e2 07 38 d0 7f 08 84 c0 0f 85 fc 00 00 00
    RSP: 0018:ffff888013127918 EFLAGS: 00010006
    RAX: dffffc0000000000 RBX: fffffffffffffff4 RCX: ffffc9000a18b000
    RDX: 0000000000000024 RSI: ffffffff82edc573 RDI: fffffffffffffff4
    RBP: 0000000000000121 R08: 0000000000000001 R09: ffffed1002624f1d
    R10: 0000000000000003 R11: ffffed1002624f1c R12: ffff888107760c70
    R13: ffff888107760c40 R14: fffffffffffffff4 R15: ffff888107760c9c
    FS: 00007fe1ffcc1700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000001b2ff21000 CR3: 000000010f504001 CR4: 0000000000370ee0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    cm_destroy_id+0x189/0x15b0 drivers/infiniband/core/cm.c:1155
    cma_connect_ib drivers/infiniband/core/cma.c:4029 [inline]
    rdma_connect_locked+0x1100/0x17c0 drivers/infiniband/core/cma.c:4107
    rdma_connect+0x2a/0x40 drivers/infiniband/core/cma.c:4140
    ucma_connect+0x277/0x340 drivers/infiniband/core/ucma.c:1069
    ucma_write+0x236/0x2f0 drivers/infiniband/core/ucma.c:1724
    vfs_write+0x220/0x830 fs/read_write.c:603
    ksys_write+0x1df/0x240 fs/read_write.c:658
    do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation")
    Link: https://lore.kernel.org/r/20201204064205.145795-1-leon@kernel.org
    Reviewed-by: Maor Gottlieb
    Reported-by: Amit Matityahu
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Leon Romanovsky
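
    The failure mode above can be sketched in userspace C, under stated
    assumptions. ERR_PTR(), PTR_ERR() and IS_ERR() below are simplified
    stand-ins for the kernel helpers of the same names; struct conn,
    create_timewait_info(), conn_init() and conn_destroy() are hypothetical
    names for illustration, not the code in cm.c. The RDI value
    fffffffffffffff4 in the trace is what -ENOMEM (-12) looks like when stored
    in a pointer. Resetting the field to NULL on failure is one way to keep
    cleanup from dereferencing it; the actual patch may differ in detail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's ERR_PTR helpers (simplified). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical types: not the real CM structures. */
struct timewait_info { int remote_id; };
struct conn { struct timewait_info *timewait_info; };

/* Hypothetical allocator that can be told to fail, like a failed kmalloc. */
static struct timewait_info *create_timewait_info(int fail)
{
	struct timewait_info *info;

	if (fail)
		return ERR_PTR(-ENOMEM);
	info = calloc(1, sizeof(*info));
	return info ? info : ERR_PTR(-ENOMEM);
}

/* On failure, clear the field so cleanup never sees an error value. */
static int conn_init(struct conn *c, int fail)
{
	assert(c != NULL);
	c->timewait_info = create_timewait_info(fail);
	if (IS_ERR(c->timewait_info)) {
		int ret = (int)PTR_ERR(c->timewait_info);

		c->timewait_info = NULL;
		return ret;
	}
	return 0;
}

/* Cleanup only has to handle "NULL" or "valid pointer". */
static void conn_destroy(struct conn *c)
{
	assert(c != NULL);
	free(c->timewait_info);	/* free(NULL) is a no-op */
	c->timewait_info = NULL;
}
```

    With this shape, calling conn_destroy() after a failed conn_init() is
    harmless, because the error-encoded pointer is never left in the object.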
     

08 Dec, 2020

1 commit

  • The query_gid_table ioctl skips non-IB/RoCE ports. As a result, it returns
    an empty GID table for devices such as EFA, which have a GID table but are
    not IB/RoCE.

    Fixes: c4b4d548fabc ("RDMA/core: Introduce new GID table query API")
    Link: https://lore.kernel.org/r/20201206153238.34878-1-galpress@amazon.com
    Signed-off-by: Gal Pressman
    Signed-off-by: Jason Gunthorpe

    Gal Pressman
     

02 Dec, 2020

2 commits

  • The local variables cur_state and new_state hold the state that should be
    used for the modify QP operation instead of the ones in the ib_qp_attr
    struct.

    Fixes: 40909f664d27 ("RDMA/efa: Add EFA verbs implementation")
    Link: https://lore.kernel.org/r/20201201091724.37016-1-galpress@amazon.com
    Reviewed-by: Firas JahJah
    Reviewed-by: Yossi Leybovich
    Signed-off-by: Gal Pressman
    Signed-off-by: Jason Gunthorpe

    Gal Pressman
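
    A minimal sketch of the idea, assuming simplified types: the attribute
    struct only carries a valid state when the matching mask bit is set, so
    the transition is resolved into locals first and those locals are used
    from then on. The enum, ATTR_STATE, ATTR_CUR_STATE and
    resolve_transition() are hypothetical stand-ins modeled loosely on
    IB_QP_STATE and IB_QP_CUR_STATE, not the EFA driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the verbs types and mask bits. */
enum qp_state { QPS_RESET, QPS_INIT, QPS_RTR, QPS_RTS, QPS_ERR };
enum { ATTR_STATE = 1 << 0, ATTR_CUR_STATE = 1 << 1 };

struct qp_attr { enum qp_state qp_state, cur_qp_state; };
struct qp { enum qp_state state; };
struct transition { enum qp_state cur_state, new_state; };

/*
 * Resolve the states once. The attr fields are only meaningful when the
 * corresponding mask bit is set; otherwise fall back to the QP's own state.
 */
static struct transition resolve_transition(const struct qp *qp,
					    const struct qp_attr *attr,
					    int mask)
{
	struct transition t;

	assert(qp != NULL && attr != NULL);
	t.cur_state = (mask & ATTR_CUR_STATE) ? attr->cur_qp_state : qp->state;
	t.new_state = (mask & ATTR_STATE) ? attr->qp_state : t.cur_state;
	return t;
}
```

    Reading attr->qp_state directly when ATTR_STATE is not set would pick up
    whatever the caller left in the struct, which is the kind of mistake the
    commit describes.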
     
  • This patch fixes an issue introduced by a previous commit, where the iWARP
    doorbell address wasn't initialized, causing a call trace whenever an RDMA
    application tries to use this interface:

    Illegal doorbell address: 0000000000000000. Legal range for doorbell addresses is [0000000011431e08..00000000ec3799d3]
    WARNING: CPU: 11 PID: 11990 at drivers/net/ethernet/qlogic/qed/qed_dev.c:93 qed_db_rec_sanity.isra.12+0x48/0x70 [qed]
    ...
    hpsa scsi_transport_sas [last unloaded: crc8]
    CPU: 11 PID: 11990 Comm: rping Tainted: G S 5.10.0-rc1 #29
    Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018
    RIP: 0010:qed_db_rec_sanity.isra.12+0x48/0x70 [qed]
    ...
    RSP: 0018:ffffafc28458fa88 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: ffff8d0d4c620000 RCX: 0000000000000000
    RDX: ffff8d10afde7d50 RSI: ffff8d10afdd8b40 RDI: ffff8d10afdd8b40
    RBP: ffffafc28458fe38 R08: 0000000000000003 R09: 0000000000007fff
    R10: 0000000000000001 R11: ffffafc28458f888 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff8d0d43ccbbd0 R15: ffff8d0d48dae9c0
    FS: 00007fbd5267e740(0000) GS:ffff8d10afdc0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fbd4f258fb8 CR3: 0000000108d96003 CR4: 00000000001706e0
    Call Trace:
    qed_db_recovery_add+0x6d/0x1f0 [qed]
    qedr_create_user_qp+0x57e/0xd30 [qedr]
    qedr_create_qp+0x5f3/0xab0 [qedr]
    ? lookup_get_idr_uobject.part.12+0x45/0x90 [ib_uverbs]
    create_qp+0x45d/0xb30 [ib_uverbs]
    ? ib_uverbs_cq_event_handler+0x30/0x30 [ib_uverbs]
    ib_uverbs_create_qp+0xb9/0xe0 [ib_uverbs]
    ib_uverbs_write+0x3f9/0x570 [ib_uverbs]
    ? security_mmap_file+0x62/0xe0
    vfs_write+0xb7/0x200
    ksys_write+0xaf/0xd0
    ? syscall_trace_enter.isra.25+0x152/0x200
    do_syscall_64+0x2d/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: 06e8d1df46ed ("RDMA/qedr: Add support for user mode XRC-SRQ's")
    Link: https://lore.kernel.org/r/20201127163251.14533-1-palok@marvell.com
    Signed-off-by: Michal Kalderon
    Signed-off-by: Igor Russkikh
    Signed-off-by: Alok Prasad
    Signed-off-by: Jason Gunthorpe

    Alok Prasad
     

26 Nov, 2020

4 commits

  • When a memory window is bound to a memory region, the local write access
    should be set for its mtpt table.

    Fixes: c7c28191408b ("RDMA/hns: Add MW support for hip08")
    Link: https://lore.kernel.org/r/1606386372-21094-1-git-send-email-liweihang@huawei.com
    Signed-off-by: Yixian Liu
    Signed-off-by: Weihang Li
    Signed-off-by: Jason Gunthorpe

    Yixian Liu
     
  • The maximum number of retransmissions should be returned when querying a
    QP, not the value of the retransmission counter.

    Fixes: 99fcf82521d9 ("RDMA/hns: Fix the wrong value of rnr_retry when querying qp")
    Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC")
    Link: https://lore.kernel.org/r/1606382977-21431-1-git-send-email-liweihang@huawei.com
    Signed-off-by: Wenpeng Liang
    Signed-off-by: Weihang Li
    Signed-off-by: Jason Gunthorpe

    Wenpeng Liang
     
  • The SRQ capacity is read from the firmware, and its field should end at
    bit 19.

    Fixes: ba6bb7e97421 ("RDMA/hns: Add interfaces to get pf capabilities from firmware")
    Link: https://lore.kernel.org/r/1606382812-23636-1-git-send-email-liweihang@huawei.com
    Signed-off-by: Wenpeng Liang
    Signed-off-by: Weihang Li
    Signed-off-by: Jason Gunthorpe

    Wenpeng Liang
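
    To illustrate why the end bit matters, here is a hedged sketch of bit
    field extraction. GENMASK32() is modeled on the kernel's GENMASK() but
    written for this example, and the layout (a capacity field in bits 19:0
    of a 32-bit word) is an assumption; the commit message only states that
    the field ends at bit 19.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with bits h..l set, modeled on the kernel's GENMASK() for 32 bits. */
#define GENMASK32(h, l) \
	((uint32_t)(~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* Assumed layout for this sketch: a capacity field in bits 19:0. */
#define CAP_FIELD_HI 19
#define CAP_FIELD_LO 0

/* Extract the field occupying bits hi..lo of word. */
static uint32_t field_get(uint32_t word, unsigned int hi, unsigned int lo)
{
	assert(hi < 32 && lo <= hi);
	return (word & GENMASK32(hi, lo)) >> lo;
}
```

    For the word 0x00112345, field_get(word, 19, 0) yields 0x12345, while a
    mask that wrongly extends to bit 20 yields 0x112345: one neighbouring bit
    leaks into the reported capacity.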
     
  • Two earlier bug fixes have created a security problem in the hfi1
    driver. One fix aimed to solve an issue where current->mm was not valid
    when closing the hfi1 cdev. It attempted to do this by saving a cached
    value of the current->mm pointer at file open time. This is a problem if
    another process with access to the FD calls in via write() or ioctl() to
    pin pages via the hfi driver. The other fix tried to solve a use after
    free by taking a reference on the mm.

    To fix this correctly we use the existing cached value of the mm in the
    mmu notifier. Now we can check in the insert, evict, etc. routines that
    current->mm matched what the notifier was registered for. If not, then
    don't allow access. The register of the mmu notifier will save the mm
    pointer.

    Since in do_exit() the exit_mm() is called before exit_files(), which
    would call our close routine, a reference is needed on the mm. We rely on
    the mmgrab done by the registration of the notifier, whereas before it was
    explicit. The mmu notifier deregistration happens when the user context is
    torn down, the creation of which triggered the registration.

    Also of note is we do not do any explicit work to protect the interval
    tree notifier. It doesn't seem that this is going to be needed since we
    aren't actually doing anything with current->mm. The interval tree
    notifier stuff still has a FIXME noted from a previous commit that will be
    addressed in a follow on patch.

    Cc:
    Fixes: e0cf75deab81 ("IB/hfi1: Fix mm_struct use after free")
    Fixes: 3faa3d9a308e ("IB/hfi1: Make use of mm consistent")
    Link: https://lore.kernel.org/r/20201125210112.104301.51331.stgit@awfm-01.aw.intel.com
    Suggested-by: Jann Horn
    Reported-by: Jason Gunthorpe
    Reviewed-by: Ira Weiny
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe

    Dennis Dalessandro
     

25 Nov, 2020

1 commit

  • i40iw_mmap manipulates the vma->vm_pgoff to differentiate a push page mmap
    vs a doorbell mmap, and uses it to compute the pfn in remap_pfn_range
    without any validation. This is vulnerable to an mmap exploit as described
    in: https://lore.kernel.org/r/20201119093523.7588-1-zhudi21@huawei.com

    The push feature is disabled in the driver currently and therefore no push
    mmaps are issued from user-space. The feature does not work as expected in
    the x722 product.

    Remove the push module parameter and all VMA attribute manipulations for
    this feature in i40iw_mmap. Update i40iw_mmap to only allow DB user
    mmappings at offset = 0. Check that vm_pgoff is zero and that the mmaps
    are bound to a single page.

    Cc:
    Fixes: d37498417947 ("i40iw: add files for iwarp interface")
    Link: https://lore.kernel.org/r/20201125005616.1800-2-shiraz.saleem@intel.com
    Reported-by: Di Zhu
    Signed-off-by: Shiraz Saleem
    Signed-off-by: Jason Gunthorpe

    Shiraz Saleem
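
    The two checks described above can be sketched as a small validation
    function. This is a userspace illustration only: struct vma_req is a
    hypothetical stand-in for the few struct vm_area_struct fields involved,
    validate_db_mmap() is an invented name, and the 4096-byte page size is an
    assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096UL	/* assumed page size for this sketch */

/* Hypothetical stand-in for the struct vm_area_struct fields used here. */
struct vma_req {
	unsigned long vm_start;	/* first byte of the mapping */
	unsigned long vm_end;	/* one past the last byte */
	unsigned long vm_pgoff;	/* user-supplied offset, in pages */
};

/* Accept only a single-page mapping at offset 0; reject everything else. */
static int validate_db_mmap(const struct vma_req *vma)
{
	assert(vma != NULL);
	if (vma->vm_pgoff != 0)
		return -EINVAL;
	if (vma->vm_end - vma->vm_start != PAGE_SIZE_BYTES)
		return -EINVAL;
	return 0;
}
```

    Because the offset is never used to compute a pfn after this check, a
    caller can no longer steer the mapping toward arbitrary physical pages.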
     

24 Nov, 2020

1 commit

  • We return 'err' in the error branch, but this variable may have been set
    to zero by the preceding code. Fix it by setting 'err' to a negative value
    before jumping to the error label.

    Fixes: 74c2174e7be5 ("IB uverbs: add mthca user CQ support")
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Link: https://lore.kernel.org/r/1605837422-42724-1-git-send-email-wangxiongfeng2@huawei.com
    Reported-by: Hulk Robot
    Signed-off-by: Xiongfeng Wang
    Signed-off-by: Jason Gunthorpe

    Xiongfeng Wang
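
    The bug class is easy to reproduce in a few lines. In the sketch below,
    step_ok(), struct ctx and ctx_setup() are hypothetical names; the point is
    only that an earlier successful call leaves err at 0, so a later failure
    detected through a NULL pointer must assign err itself before the goto.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct ctx { void *buf; };

/* Hypothetical earlier step that succeeds and therefore returns 0. */
static int step_ok(void)
{
	return 0;
}

/* fail_alloc != 0 simulates an allocation failure in the second step. */
static int ctx_setup(struct ctx *c, int fail_alloc)
{
	int err;

	assert(c != NULL);
	c->buf = NULL;

	err = step_ok();	/* err is 0 from here on */
	if (err)
		goto out;

	c->buf = fail_alloc ? NULL : malloc(64);
	if (!c->buf) {
		err = -ENOMEM;	/* without this, the function would return 0 */
		goto out;
	}
	return 0;

out:
	return err;
}
```

    A caller that tests the return value then sees the failure instead of
    continuing with a half-initialized object.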
     

14 Nov, 2020

1 commit

  • Fix to return a negative error code from the error handling case instead
    of 0, as done elsewhere in this function.

    Fixes: 4730f4a6c6b2 ("IB/hfi1: Activate the dummy netdev")
    Link: https://lore.kernel.org/r/1605249747-17942-1-git-send-email-zhangchangzhong@huawei.com
    Reported-by: Hulk Robot
    Signed-off-by: Zhang Changzhong
    Acked-by: Mike Marciniszyn
    Signed-off-by: Jason Gunthorpe

    Zhang Changzhong
     

13 Nov, 2020

3 commits

  • dma_virt_ops requires that all pages have a kernel virtual address.
    Introduce an INFINIBAND_VIRT_DMA Kconfig symbol that depends on !HIGHMEM
    and make all three drivers depend on the new symbol.

    Also remove the ARCH_DMA_ADDR_T_64BIT dependency, which has been obsolete
    since commit 4965a68780c5 ("arch: define the ARCH_DMA_ADDR_T_64BIT config
    symbol in lib/Kconfig")

    Fixes: 551199aca1c3 ("lib/dma-virt: Add dma_virt_ops")
    Link: https://lore.kernel.org/r/20201106181941.1878556-2-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • Fix a missing kfree in pvrdma_register_device() when
    ib_device_set_netdev() fails.

    Fixes: 4b38da75e089 ("RDMA/drivers: Convert easy drivers to use ib_device_set_netdev()")
    Link: https://lore.kernel.org/r/20201111032202.17925-1-miaoqinglang@huawei.com
    Reported-by: Hulk Robot
    Signed-off-by: Qinglang Miao
    Signed-off-by: Jason Gunthorpe

    Qinglang Miao
     
  • The xarray is never mutated from an IRQ handler, only from work queues
    under a spinlock_irq. Thus there is no reason for it to be an IRQ type
    xarray.

    This was copied over from the original IDR code, but the recent rework put
    the xarray inside another spinlock_irq which will unbalance the unlocking.

    Fixes: c206f8bad15d ("RDMA/cm: Make it clearer how concurrency works in cm_req_handler()")
    Link: https://lore.kernel.org/r/0-v1-808b6da3bd3f+1857-cm_xarray_no_irq_jgg@nvidia.com
    Reported-by: Matthew Wilcox
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

06 Nov, 2020

1 commit

  • Pull rdma fixes from Jason Gunthorpe:
    "A few more merge window regressions that didn't make rc1:

    - New validation in the DMA layer triggers wrong use of the DMA layer
    in rxe, siw and rdmavt

    - Accidental change of a hypervisor facing ABI when widening the port
    speed u8 to u16 in vmw_pvrdma

    - Memory leak on error unwind in SRP target"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
    RDMA/srpt: Fix typo in srpt_unregister_mad_agent docstring
    RDMA/vmw_pvrdma: Fix the active_speed and phys_state value
    IB/srpt: Fix memory leak in srpt_add_one
    RDMA: Fix software RDMA drivers for dma mapping error

    Linus Torvalds
     

05 Nov, 2020

1 commit


04 Nov, 2020

1 commit

  • Pull documentation build warning fixes from Jonathan Corbet:
    "This contains a series of warning fixes from Mauro; once applied, the
    number of warnings from the once-noisy docs build process is nearly
    zero.

    Getting to this point has required a lot of work; once there,
    hopefully we can keep things that way.

    I have packaged this as a separate pull because it does a fair amount
    of reaching outside of Documentation/. The changes are all in comments
    and in code placement. It's all been in linux-next since last week"

    * tag 'docs-5.10-warnings' of git://git.lwn.net/linux: (24 commits)
    docs: SafeSetID: fix a warning
    amdgpu: fix a few kernel-doc markup issues
    selftests: kselftest_harness.h: fix kernel-doc markups
    drm: amdgpu_dm: fix a typo
    gpu: docs: amdgpu.rst: get rid of wrong kernel-doc markups
    drm: amdgpu: kernel-doc: update some adev parameters
    docs: fs: api-summary.rst: get rid of kernel-doc include
    IB/srpt: docs: add a description for cq_size member
    locking/refcount: move kernel-doc markups to the proper place
    docs: lockdep-design: fix some warning issues
    MAINTAINERS: fix broken doc refs due to yaml conversion
    ice: docs fix a devlink info that broke a table
    crypto: sun8x-ce*: update entries to its documentation
    net: phy: remove kernel-doc duplication
    mm: pagemap.h: fix two kernel-doc markups
    blk-mq: docs: add kernel-doc description for a new struct member
    docs: userspace-api: add iommu.rst to the index file
    docs: hwmon: mp2975.rst: address some html build warnings
    docs: net: statistics.rst: remove a duplicated kernel-doc
    docs: kasan.rst: add two missing blank lines
    ...

    Linus Torvalds
     

03 Nov, 2020

3 commits

  • The pvrdma_port_attr structure is ABI toward the hypervisor, changing it
    breaks the ability to report the speed properly. Revert the change to u16.

    Fixes: 376ceb31ff87 ("RDMA: Fix link active_speed size")
    Link: https://lore.kernel.org/r/20201102225437.26557-1-aditr@vmware.com
    Reviewed-by: Vishnu Dasa
    Signed-off-by: Adit Ranadive
    Signed-off-by: Jason Gunthorpe

    Adit Ranadive
     
  • A failure in srpt_refresh_port() for the second port leaves the MAD agent
    registered for the first one. However, srpt_add_one() is marked as
    "failed", so SRPT leaks the resources of that first port, which stays
    registered but is never used or released.

    Unregister the MAD agent for all ports in case of failure.

    Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
    Link: https://lore.kernel.org/r/20201028065051.112430-1-leon@kernel.org
    Signed-off-by: Maor Gottlieb
    Signed-off-by: Leon Romanovsky
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jason Gunthorpe

    Maor Gottlieb
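
    The unwind can be sketched as follows, assuming a two-port device. All
    names here (struct target_dev, port_refresh(), unregister_mad_agents(),
    device_add()) are hypothetical and only mirror the shape of the fix: when
    port i fails, everything registered for the ports before it is undone
    before the error is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_PORTS 2

struct port { bool mad_registered; };
struct target_dev { struct port ports[NUM_PORTS]; };

/* Hypothetical per-port setup; fails on request. */
static int port_refresh(struct port *p, bool fail)
{
	if (fail)
		return -ENOMEM;
	p->mad_registered = true;
	return 0;
}

/* Undo registration for the first port_cnt ports. */
static void unregister_mad_agents(struct target_dev *dev, int port_cnt)
{
	for (int i = 0; i < port_cnt; i++)
		dev->ports[i].mad_registered = false;
}

/* failing_port selects which port fails; pass -1 for no failure. */
static int device_add(struct target_dev *dev, int failing_port)
{
	assert(dev != NULL);
	for (int i = 0; i < NUM_PORTS; i++)
		dev->ports[i].mad_registered = false;

	for (int i = 0; i < NUM_PORTS; i++) {
		int ret = port_refresh(&dev->ports[i], i == failing_port);

		if (ret) {
			/* Ports 0..i-1 succeeded earlier: release them. */
			unregister_mad_agents(dev, i);
			return ret;
		}
	}
	return 0;
}
```

    If the second port fails, the first port ends up unregistered again
    rather than left behind on a device that was never added.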
     
  • The commit f959dcd6ddfd ("dma-direct: Fix potential NULL pointer
    dereference") made dma_mask a mandatory field to be set up even for
    dma_virt_ops based DMA devices. The commit in the Fixes tag omitted
    setting up the dma_mask on virtual devices, triggering the trace below
    when the two were combined during the merge window.

    Fix it by setting empty DMA MASK for software based RDMA devices.

    WARNING: CPU: 1 PID: 8488 at kernel/dma/mapping.c:149 dma_map_page_attrs+0x493/0x700
    CPU: 1 PID: 8488 Comm: syz-executor144 Not tainted 5.9.0-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:dma_map_page_attrs+0x493/0x700 kernel/dma/mapping.c:149
    Trace:
    dma_map_single_attrs include/linux/dma-mapping.h:279 [inline]
    ib_dma_map_single include/rdma/ib_verbs.h:3967 [inline]
    ib_mad_post_receive_mads+0x23f/0xd60 drivers/infiniband/core/mad.c:2715
    ib_mad_port_start drivers/infiniband/core/mad.c:2862 [inline]
    ib_mad_port_open drivers/infiniband/core/mad.c:3016 [inline]
    ib_mad_init_device+0x72b/0x1400 drivers/infiniband/core/mad.c:3092
    add_client_context+0x405/0x5e0 drivers/infiniband/core/device.c:680
    enable_device_and_get+0x1d5/0x3c0 drivers/infiniband/core/device.c:1301
    ib_register_device drivers/infiniband/core/device.c:1376 [inline]
    ib_register_device+0x7a7/0xa40 drivers/infiniband/core/device.c:1335
    rxe_register_device+0x46d/0x570 drivers/infiniband/sw/rxe/rxe_verbs.c:1182
    rxe_add+0x12fe/0x16d0 drivers/infiniband/sw/rxe/rxe.c:247
    rxe_net_add+0x8c/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:507
    rxe_newlink drivers/infiniband/sw/rxe/rxe.c:269 [inline]
    rxe_newlink+0xb7/0xe0 drivers/infiniband/sw/rxe/rxe.c:250
    nldev_newlink+0x30e/0x540 drivers/infiniband/core/nldev.c:1555
    rdma_nl_rcv_msg+0x367/0x690 drivers/infiniband/core/netlink.c:195
    rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
    rdma_nl_rcv+0x2f2/0x440 drivers/infiniband/core/netlink.c:259
    netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
    netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
    netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1919
    sock_sendmsg_nosec net/socket.c:651 [inline]
    sock_sendmsg+0xcf/0x120 net/socket.c:671
    ____sys_sendmsg+0x6e8/0x810 net/socket.c:2353
    ___sys_sendmsg+0xf3/0x170 net/socket.c:2407
    __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x443699

    Link: https://lore.kernel.org/r/20201030093803.278830-1-parav@nvidia.com
    Reported-by: syzbot+34dc2fea3478e659af01@syzkaller.appspotmail.com
    Fixes: e0477b34d9d1 ("RDMA: Explicitly pass in the dma_device to ib_register_device")
    Signed-off-by: Parav Pandit
    Tested-by: Guoqing Jiang
    Tested-by: Dennis Dalessandro
    Reviewed-by: Christoph Hellwig
    Acked-by: Zhu Yanjun
    Signed-off-by: Jason Gunthorpe

    Parav Pandit
     

29 Oct, 2020

1 commit

  • Changeset c804af2c1d31 ("IB/srpt: use new shared CQ mechanism")
    added a new member for struct srpt_rdma_ch, but didn't add the
    corresponding kernel-doc markup, as reported when doing
    "make htmldocs":

    ./drivers/infiniband/ulp/srpt/ib_srpt.h:331: warning: Function parameter or member 'cq_size' not described in 'srpt_rdma_ch'

    Add a description for it.

    Fixes: c804af2c1d31 ("IB/srpt: use new shared CQ mechanism")
    Signed-off-by: Mauro Carvalho Chehab
    Tested-by: Brendan Higgins
    Reviewed-by: Brendan Higgins
    Link: https://lore.kernel.org/r/df0e5f0e866b91724299ef569a2da8115e48c0cf.1603791716.git.mchehab+huawei@kernel.org
    Signed-off-by: Jonathan Corbet

    Mauro Carvalho Chehab
     

28 Oct, 2020

2 commits

  • Fixes memory leak in iWARP CM

    Fixes: e411e0587e0d ("RDMA/qedr: Add iWARP connection management functions")
    Link: https://lore.kernel.org/r/20201021115008.28138-1-palok@marvell.com
    Signed-off-by: Michal Kalderon
    Signed-off-by: Igor Russkikh
    Signed-off-by: Alok Prasad
    Signed-off-by: Jason Gunthorpe

    Alok Prasad
     
  • There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED, either the
    handler triggers a completion and another thread does rdma_connect() or
    the handler directly calls rdma_connect().

    In all cases rdma_connect() needs to hold the handler_mutex, but when
    handlers are invoked this is already held by the core code. This causes
    ULPs using the 2nd method to deadlock.

    Provide a rdma_connect_locked() and have all ULPs call it from their
    handlers.

    Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.com
    Reported-and-tested-by: Guoqing Jiang
    Fixes: 2a7cec538169 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state")
    Acked-by: Santosh Shilimkar
    Acked-by: Jack Wang
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Max Gurtovoy
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
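
    The locked/unlocked split can be illustrated with a toy, non-recursive
    lock that reports a second acquisition as -EDEADLK instead of hanging.
    This is a single-threaded userspace model under stated assumptions:
    struct toy_mutex, struct conn_id, connect_locked() and connect_unlocked()
    are hypothetical names, not the rdma_cm implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy non-recursive lock: a second acquisition is reported, not blocked. */
struct toy_mutex { bool held; };

static int toy_lock(struct toy_mutex *m)
{
	if (m->held)
		return -EDEADLK;	/* a real mutex would deadlock here */
	m->held = true;
	return 0;
}

static void toy_unlock(struct toy_mutex *m)
{
	m->held = false;
}

struct conn_id {
	struct toy_mutex handler_mutex;
	bool connected;
};

/* Caller must already hold handler_mutex, e.g. inside an event handler. */
static int connect_locked(struct conn_id *id)
{
	assert(id != NULL && id->handler_mutex.held);
	id->connected = true;
	return 0;
}

/* For callers outside a handler: take the lock, then delegate. */
static int connect_unlocked(struct conn_id *id)
{
	int ret;

	assert(id != NULL);
	ret = toy_lock(&id->handler_mutex);
	if (ret)
		return ret;
	ret = connect_locked(id);
	toy_unlock(&id->handler_mutex);
	return ret;
}
```

    A handler that already holds the mutex gets -EDEADLK from
    connect_unlocked() in this model, while connect_locked() succeeds, which
    mirrors why handlers are switched to the _locked variant.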
     

27 Oct, 2020

3 commits

  • Some drivers (such as EFA) have a GID table, but aren't IB/RoCE devices.
    Remove the unnecessary rdma_ib_or_roce() check.

    This fixes rdma-core failures for EFA when it uses the new ioctl interface
    for querying the GID table.

    Fixes: 9f85cbe50aa0 ("RDMA/uverbs: Expose the new GID query API to user space")
    Link: https://lore.kernel.org/r/20201026082621.32463-1-galpress@amazon.com
    Signed-off-by: Gal Pressman
    Signed-off-by: Jason Gunthorpe

    Gal Pressman
     
  • When a mlx5 core devlink instance is reloaded in different net namespace,
    its associated IB device is deleted and recreated.

    Example sequence is:
    $ ip netns add foo
    $ devlink dev reload pci/0000:00:08.0 netns foo
    $ ip netns del foo

    The mlx5 IB device needs to attach and detach the netdevice through the
    netdev notifier chain during the load and unload sequences. Below is the
    call graph of the unload flow.

    cleanup_net()
    down_read(&pernet_ops_rwsem);
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Parav Pandit
     
  • The patch referenced below has a typo that results in using the wrong L2
    header size for outbound traffic. (V4 V6).

    It also breaks kernel-side RC traffic because they use AVs that use
    RDMA_NETWORK_XXX enums instead of RXE_NETWORK_TYPE_XXX enums. Fix this by
    transcoding between these enum types.

    Fixes: e0d696d201dd ("RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI")
    Link: https://lore.kernel.org/r/20201016211343.22906-1-rpearson@hpe.com
    Signed-off-by: Bob Pearson
    Signed-off-by: Jason Gunthorpe

    Bob Pearson
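
    Transcoding between two enum types that number the same concepts
    differently might look like the sketch below. The enum names and numeric
    values are hypothetical, chosen only so that IPv4 and IPv6 differ between
    the two types; they are not the values of the RDMA_NETWORK_XXX or
    RXE_NETWORK_TYPE_XXX enums. ip_hdr_len() uses the 20-byte minimum IPv4
    header and the 40-byte fixed IPv6 header to show how a mixed-up type
    selects the wrong header size.

```c
#include <assert.h>

/* Hypothetical values: the two enums number IPv4/IPv6 differently. */
enum core_network_type {
	CORE_NET_IB,
	CORE_NET_ROCE_V1,
	CORE_NET_IPV4,
	CORE_NET_IPV6,
};

enum drv_network_type {
	DRV_NET_IPV4 = 1,
	DRV_NET_IPV6 = 2,
};

/* Translate explicitly; never reinterpret one enum's value as the other. */
static int core_to_drv(enum core_network_type t)
{
	switch (t) {
	case CORE_NET_IPV4:
		return DRV_NET_IPV4;
	case CORE_NET_IPV6:
		return DRV_NET_IPV6;
	default:
		return -1;	/* not representable in the driver enum */
	}
}

/* Header length picked from the driver-side type. */
static int ip_hdr_len(enum drv_network_type t)
{
	assert(t == DRV_NET_IPV4 || t == DRV_NET_IPV6);
	return t == DRV_NET_IPV4 ? 20 : 40;
}
```

    In this model, passing CORE_NET_IPV4 (value 2) straight into ip_hdr_len()
    would be read as DRV_NET_IPV6 and return 40, which is the V4/V6 mix-up in
    miniature.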
     

18 Oct, 2020

1 commit

  • Pull rdma updates from Jason Gunthorpe:
    "A usual cycle for RDMA with a typical mix of driver and core subsystem
    updates:

    - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma,
    hns, usnic, qib, qedr, cxgb4, hns, bnxt_re

    - Various rtrs fixes and updates

    - Bug fix for mlx4 CM emulation for virtualization scenarios where
    MRA wasn't working right

    - Use tracepoints instead of pr_debug in the CM code

    - Scrub the locking in ucma and cma to close more syzkaller bugs

    - Use tasklet_setup in the subsystem

    - Revert the idea that 'destroy' operations are not allowed to fail
    at the driver level. This proved unworkable from a HW perspective.

    - Revise how the umem API works so drivers make fewer mistakes using
    it

    - XRC support for qedr

    - Convert uverbs objects RWQ and MW to the new allocation scheme

    - Large queue entry sizes for hns

    - Use hmm_range_fault() for mlx5 On Demand Paging

    - uverbs APIs to inspect the GID table instead of sysfs

    - Move some of the RDMA code for building large page SGLs into
    lib/scatterlist"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits)
    RDMA/ucma: Fix use after free in destroy id flow
    RDMA/rxe: Handle skb_clone() failure in rxe_recv.c
    RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI
    RDMA: Explicitly pass in the dma_device to ib_register_device
    lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values
    IB/mlx4: Convert rej_tmout radix-tree to XArray
    RDMA/rxe: Fix bug rejecting all multicast packets
    RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
    RDMA/rxe: Remove duplicate entries in struct rxe_mr
    IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS
    IB/rdmavt: Fix sizeof mismatch
    MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER
    RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
    RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
    RDMA/umem: Move to allocate SG table from pages
    lib/scatterlist: Add support in dynamic allocation of SG table from pages
    tools/testing/scatterlist: Show errors in human readable form
    tools/testing/scatterlist: Rejuvenate bit-rotten test
    RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
    RDMA/uverbs: Expose the new GID query API to user space
    ...

    Linus Torvalds
     

17 Oct, 2020

6 commits

  • Merge more updates from Andrew Morton:
    "155 patches.

    Subsystems affected by this patch series: mm (dax, debug, thp,
    readahead, page-poison, util, memory-hotplug, zram, cleanups), misc,
    core-kernel, get_maintainer, MAINTAINERS, lib, bitops, checkpatch,
    binfmt, ramfs, autofs, nilfs, rapidio, panic, relay, kgdb, ubsan,
    romfs, and fault-injection"

    * emailed patches from Andrew Morton: (155 commits)
    lib, uaccess: add failure injection to usercopy functions
    lib, include/linux: add usercopy failure capability
    ROMFS: support inode blocks calculation
    ubsan: introduce CONFIG_UBSAN_LOCAL_BOUNDS for Clang
    sched.h: drop in_ubsan field when UBSAN is in trap mode
    scripts/gdb/tasks: add headers and improve spacing format
    scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command
    kernel/relay.c: drop unneeded initialization
    panic: dump registers on panic_on_warn
    rapidio: fix the missed put_device() for rio_mport_add_riodev
    rapidio: fix error handling path
    nilfs2: fix some kernel-doc warnings for nilfs2
    autofs: harden ioctl table
    ramfs: fix nommu mmap with gaps in the page cache
    mm: remove the now-unnecessary mmget_still_valid() hack
    mm/gup: take mmap_lock in get_dump_page()
    binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot
    coredump: rework elf/elf_fdpic vma_dump_size() into common helper
    coredump: refactor page range dumping into common helper
    coredump: let dump_emit() bail out on short writes
    ...

    Linus Torvalds
     
  • The preceding patches have ensured that core dumping properly takes the
    mmap_lock. Thanks to that, we can now remove mmget_still_valid() and all
    its users.

    Signed-off-by: Jann Horn
    Signed-off-by: Andrew Morton
    Acked-by: Linus Torvalds
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Cc: "Eric W . Biederman"
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com
    Signed-off-by: Linus Torvalds

    Jann Horn
     
  • ucma_free_ctx() should call __destroy_id() on all the connection requests
    that have not been delivered to user space. Currently it calls it on the
    context itself, causing a use after free.

    Fixes the trace:

    BUG: Unable to handle kernel data access on write at 0x5deadbeef0000108
    Faulting instruction address: 0xc0080000002428f4
    Oops: Kernel access of bad area, sig: 11 [#1]
    Call Trace:
    [c000000207f2b680] [c00800000024280c] .__destroy_id+0x28c/0x610 [rdma_ucm] (unreliable)
    [c000000207f2b750] [c0080000002429c4] .__destroy_id+0x444/0x610 [rdma_ucm]
    [c000000207f2b820] [c008000000242c24] .ucma_close+0x94/0xf0 [rdma_ucm]
    [c000000207f2b8c0] [c00000000046fbdc] .__fput+0xac/0x330
    [c000000207f2b960] [c00000000015d48c] .task_work_run+0xbc/0x110
    [c000000207f2b9f0] [c00000000012fb00] .do_exit+0x430/0xc50
    [c000000207f2bae0] [c0000000001303ec] .do_group_exit+0x5c/0xd0
    [c000000207f2bb70] [c000000000144a34] .get_signal+0x194/0xe30
    [c000000207f2bc60] [c00000000001f6b4] .do_notify_resume+0x124/0x470
    [c000000207f2bd60] [c000000000032484] .interrupt_exit_user_prepare+0x1b4/0x240
    [c000000207f2be20] [c000000000010034] interrupt_return+0x14/0x1c0

    Rename listen_ctx to conn_req_ctx as the poor name was the cause of this
    bug.

    Fixes: a1d33b70dbbc ("RDMA/ucma: Rework how new connections are passed through event delivery")
    Link: https://lore.kernel.org/r/20201012045600.418271-4-leon@kernel.org
    Signed-off-by: Maor Gottlieb
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Maor Gottlieb
     
  • If skb_clone() is unable to allocate memory for a new sk_buff this is not
    detected by the current code.

    Check for a NULL return and continue. This is similar to other errors in
    this loop over QPs attached to the multicast address and consistent with
    the unreliable UD transport.

    Fixes: e7ec96fc7932f ("RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()")
    Addresses-Coverity-ID: 1497804: Null pointer dereferences (NULL_RETURNS)
    Link: https://lore.kernel.org/r/20201013184236.5231-1-rpearson@hpe.com
    Signed-off-by: Bob Pearson
    Signed-off-by: Jason Gunthorpe

    Bob Pearson
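
    A hedged sketch of the check-and-continue pattern: pkt_clone() is a
    hypothetical stand-in for skb_clone() with failure injection, and
    deliver_all() models the loop over receivers. A failed clone simply skips
    that receiver, which is acceptable for an unreliable datagram transport.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct pkt { int len; };

/* Hypothetical clone with failure injection, standing in for skb_clone(). */
static struct pkt *pkt_clone(const struct pkt *p, int fail)
{
	struct pkt *c;

	if (fail)
		return NULL;
	c = malloc(sizeof(*c));
	if (c)
		*c = *p;
	return c;
}

/*
 * Deliver p to n receivers. Bit i of fail_mask forces the clone for
 * receiver i to fail. Returns the number of receivers actually served.
 */
static int deliver_all(const struct pkt *p, int n, unsigned int fail_mask)
{
	int delivered = 0;

	assert(p != NULL && n >= 0 && n <= 32);
	for (int i = 0; i < n; i++) {
		struct pkt *c = pkt_clone(p, (fail_mask >> i) & 1);

		if (!c)
			continue;	/* drop for this receiver only */
		delivered++;
		free(c);	/* stand-in for handing the clone off */
	}
	return delivered;
}
```

    The important part is that the NULL result is never dereferenced and one
    failed allocation does not abort delivery to the remaining receivers.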
     
  • RXE was wrongly using an internal kernel enum as part of its uAPI, split
    this out into a dedicated uAPI enum just for RXE. It only uses the IPv4
    and IPv6 values.

    This was exposed by changing the internal kernel enum definition which
    broke RXE.

    Fixes: 1c15b4f2a42f ("RDMA/core: Modify enum ib_gid_type and enum rdma_network_type")
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • The code in setup_dma_device has become rather convoluted, move all of
    this to the drivers. Drivers now pass in a DMA capable struct device which
    will be used to set up DMA, or drivers must fully configure the ibdev for
    DMA and pass in NULL.

    Other than setting the masks in rvt all drivers were doing this already
    anyhow.

    mthca, mlx4 and mlx5 were already setting up the maximum DMA segment size
    for DMA based on their hardware limits in:
    __mthca_init_one()
    dma_set_max_seg_size (1G)

    __mlx4_init_one()
    dma_set_max_seg_size (1G)

    mlx5_pci_init()
    set_dma_caps()
    dma_set_max_seg_size (2G)

    Other non software drivers (except usnic) were extended to UINT_MAX [1, 2]
    instead of 2G as was before.

    [1] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/
    [2] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/

    Link: https://lore.kernel.org/r/20201008082752.275846-1-leon@kernel.org
    Link: https://lore.kernel.org/r/6b2ed339933d066622d5715903870676d8cc523a.1602590106.git.mchehab+huawei@kernel.org
    Suggested-by: Christoph Hellwig
    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

16 Oct, 2020

2 commits

  • Maor Gottlieb says:

    ====================
    This series extends __sg_alloc_table_from_pages to allow chaining of new
    pages to an already initialized SG table.

    This allows drivers to use the optimization of merging contiguous
    pages without needing to preallocate all the pages and hold them in a very
    large temporary buffer before the call to SG table initialization.

    The last patch changes the Infiniband core to use the new API. It removes
    duplicate functionality from the code and benefits from the optimization
    of dynamically allocating the SG table from pages.

    On a huge-page system with a 2MB page size, the SG table would contain
    512 times as many SG entries without this change.
    ====================

    * branch 'dynamic_sg':
    RDMA/umem: Move to allocate SG table from pages
    lib/scatterlist: Add support in dynamic allocation of SG table from pages
    tools/testing/scatterlist: Show errors in human readable form
    tools/testing/scatterlist: Rejuvenate bit-rotten test
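
    The 512x figure in the cover letter above follows from a 2MB huge page
    spanning 512 base pages of 4KB. The following is a minimal userspace
    sketch of that arithmetic, not the kernel implementation: it counts
    entries with and without merging physically contiguous pages. All names
    (sketch_count_sg_entries, sketch_fill_contiguous, SKETCH_PAGE_SIZE) and
    the 4KB base page size are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed base page size */

/*
 * Count the scatter-gather entries needed to describe n_pages pages,
 * given their page frame numbers. When merge is non-zero, a page that
 * directly follows the previous one in physical memory extends the
 * current entry, as long as the entry stays within max_seg_bytes.
 */
static size_t sketch_count_sg_entries(const uint64_t *pfns, size_t n_pages,
                                      int merge, uint64_t max_seg_bytes)
{
    size_t entries = 0;
    uint64_t cur_len = 0;

    assert(max_seg_bytes >= SKETCH_PAGE_SIZE);

    for (size_t i = 0; i < n_pages; i++) {
        int contiguous = i > 0 && pfns[i] == pfns[i - 1] + 1;

        if (merge && contiguous &&
            cur_len + SKETCH_PAGE_SIZE <= max_seg_bytes) {
            cur_len += SKETCH_PAGE_SIZE;   /* grow the current entry */
        } else {
            entries++;                     /* start a new entry */
            cur_len = SKETCH_PAGE_SIZE;
        }
    }
    return entries;
}

/* Fill pfns[] with n consecutive page frame numbers starting at first. */
static void sketch_fill_contiguous(uint64_t *pfns, size_t n, uint64_t first)
{
    for (size_t i = 0; i < n; i++)
        pfns[i] = first + i;
}
```

    Filling an array of 512 consecutive frame numbers and calling
    sketch_count_sg_entries() with merge set to 0 and then to 1 gives 512
    entries versus a single entry, provided the segment size limit is at
    least 2MB.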

    Jason Gunthorpe
     
  • Pull networking updates from Jakub Kicinski:

    - Add redirect_neigh() BPF packet redirect helper, allowing to limit
    stack traversal in common container configs and improving TCP
    back-pressure.

    Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

    - Expand netlink policy support and improve policy export to user
    space. (Ge)netlink core performs request validation according to
    declared policies. Expand the expressiveness of those policies
    (min/max length and bitmasks). Allow dumping policies for particular
    commands. This is used for feature discovery by user space (instead
    of kernel version parsing or trial and error).

    - Support IGMPv3/MLDv2 multicast listener discovery protocols in
    bridge.

    - Allow more than 255 IPv4 multicast interfaces.

    - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
    packets of TCPv6.

    - In Multi-path TCP (MPTCP) support concurrent transmission of data on
    multiple subflows in a load balancing scenario. Enhance advertising
    addresses via the RM_ADDR/ADD_ADDR options.

    - Support SMC-Dv2 version of SMC, which enables multi-subnet
    deployments.

    - Allow more calls to same peer in RxRPC.

    - Support two new Controller Area Network (CAN) protocols - CAN-FD and
    ISO 15765-2:2016.

    - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
    kernel problem.

    - Add TC actions for implementing MPLS L2 VPNs.

    - Improve nexthop code - e.g. handle various corner cases when nexthop
    objects are removed from groups better, skip unnecessary
    notifications and make it easier to offload nexthops into HW by
    converting to a blocking notifier.

    - Support adding and consuming TCP header options by BPF programs,
    opening the doors for easy experimental and deployment-specific TCP
    option use.

    - Reorganize TCP congestion control (CC) initialization to simplify
    life of TCP CC implemented in BPF.

    - Add support for shipping BPF programs with the kernel and loading
    them early on boot via the User Mode Driver mechanism, hence reusing
    all the user space infra we have.

    - Support sleepable BPF programs, initially targeting LSM and tracing.

    - Add bpf_d_path() helper for returning full path for given 'struct
    path'.

    - Make bpf_tail_call compatible with bpf-to-bpf calls.

    - Allow BPF programs to call map_update_elem on sockmaps.

    - Add BPF Type Format (BTF) support for type and enum discovery, as
    well as support for using BTF within the kernel itself (current use
    is for pretty printing structures).

    - Support listing and getting information about bpf_links via the bpf
    syscall.

    - Enhance kernel interfaces around NIC firmware update. Allow
    specifying overwrite mask to control if settings etc. are reset
    during update; report expected max time operation may take to users;
    support firmware activation without machine reboot incl. limits of
    how much impact reset may have (e.g. dropping link or not).

    - Extend ethtool configuration interface to report IEEE-standard
    counters, to limit the need for per-vendor logic in user space.

    - Adopt or extend devlink use for debug, monitoring, fw update in many
    drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
    dpaa2-eth).

    - In mlxsw expose critical and emergency SFP module temperature alarms.
    Refactor port buffer handling to make the defaults more suitable and
    support setting these values explicitly via the DCBNL interface.

    - Add XDP support for Intel's igb driver.

    - Support offloading TC flower classification and filtering rules to
    mscc_ocelot switches.

    - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
    fixed interval period pulse generator and one-step timestamping in
    dpaa-eth.

    - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
    offload.

    - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
    this HW to use it. Convert mvpp2 to split PCS.

    - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
    7-port Mediatek MT7531 IP.

    - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
    and wcn3680 support in wcn36xx.

    - Improve performance for packets which don't require much offloads on
    recent Mellanox NICs by 20% by making multiple packets share a
    descriptor entry.

    - Move chelsio inline crypto drivers (for TLS and IPsec) from the
    crypto subtree to drivers/net. Move MDIO drivers out of the phy
    directory.

    - Clean up a lot of W=1 warnings, reportedly the actively developed
    subsections of networking drivers should now build W=1 warning free.

    - Make sure drivers don't use in_interrupt() to dynamically adapt their
    code. Convert tasklets to use new tasklet_setup API (sadly this
    conversion is not yet complete).

    * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
    Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
    net, sockmap: Don't call bpf_prog_put() on NULL pointer
    bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
    bpf, sockmap: Add locking annotations to iterator
    netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
    net: fix pos incrementment in ipv6_route_seq_next
    net/smc: fix invalid return code in smcd_new_buf_create()
    net/smc: fix valid DMBE buffer sizes
    net/smc: fix use-after-free of delayed events
    bpfilter: Fix build error with CONFIG_BPFILTER_UMH
    cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
    net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
    bpf: Fix register equivalence tracking.
    rxrpc: Fix loss of final ack on shutdown
    rxrpc: Fix bundle counting for exclusive connections
    netfilter: restore NF_INET_NUMHOOKS
    ibmveth: Identify ingress large send packets.
    ibmveth: Switch order of ibmveth_helper calls.
    cxgb4: handle 4-tuple PEDIT to NAT mode translation
    selftests: Add VRF route leaking tests
    ...

    Linus Torvalds
     

14 Oct, 2020

2 commits

  • Simplify the code by using new function dev_fetch_sw_netstats().

    Signed-off-by: Heiner Kallweit
    Link: https://lore.kernel.org/r/6cad1a04-f021-d94b-45fd-7cc7cf07367d@gmail.com
    Signed-off-by: Jakub Kicinski

    Heiner Kallweit
     
  • Pull block updates from Jens Axboe:

    - Series of merge handling cleanups (Baolin, Christoph)

    - Series of blk-throttle fixes and cleanups (Baolin)

    - Series cleaning up BDI, separating the block device from the
    backing_dev_info (Christoph)

    - Removal of bdget() as a generic API (Christoph)

    - Removal of blkdev_get() as a generic API (Christoph)

    - Cleanup of is-partition checks (Christoph)

    - Series reworking disk revalidation (Christoph)

    - Series cleaning up bio flags (Christoph)

    - bio crypt fixes (Eric)

    - IO stats inflight tweak (Gabriel)

    - blk-mq tags fixes (Hannes)

    - Buffer invalidation fixes (Jan)

    - Allow soft limits for zone append (Johannes)

    - Shared tag set improvements (John, Kashyap)

    - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

    - DM no-wait support (Mike, Konstantin)

    - Request allocation improvements (Ming)

    - Allow md/dm/bcache to use IO stat helpers (Song)

    - Series improving blk-iocost (Tejun)

    - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
    Xianting, Yang, Yufen, yangerkun)

    * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
    block: fix uapi blkzoned.h comments
    blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
    blk-mq: get rid of the dead flush handle code path
    block: get rid of unnecessary local variable
    block: fix comment and add lockdep assert
    blk-mq: use helper function to test hw stopped
    block: use helper function to test queue register
    block: remove redundant mq check
    block: invoke blk_mq_exit_sched no matter whether have .exit_sched
    percpu_ref: don't refer to ref->data if it isn't allocated
    block: ratelimit handle_bad_sector() message
    blk-throttle: Re-use the throtl_set_slice_end()
    blk-throttle: Open code __throtl_de/enqueue_tg()
    blk-throttle: Move service tree validation out of the throtl_rb_first()
    blk-throttle: Move the list operation after list validation
    blk-throttle: Fix IO hang for a corner case
    blk-throttle: Avoid tracking latency if low limit is invalid
    blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
    blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
    block: Remove redundant 'return' statement
    ...

    Linus Torvalds
     

09 Oct, 2020

3 commits

  • This was missed during the initial review of the patch below.

    Fixes: 227a0e142e37 ("IB/mlx4: Add support for REJ due to timeout")
    Link: https://lore.kernel.org/r/1602253482-6718-1-git-send-email-haakon.bugge@oracle.com
    Signed-off-by: Håkon Bugge
    Signed-off-by: Jason Gunthorpe

    Håkon Bugge
     
  • Fix a bug in rxe_rcv() that causes all multicast packets to be
    dropped. Currently rxe_match_dgid() is called for each packet to verify
    that the destination IP address matches one of the entries in the port
    source GID table. This is incorrect for IP multicast addresses since they
    do not appear in the GID table.

    Add code to detect multicast addresses.

    Change function name to rxe_chk_dgid() which is clearer.
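
    The check described above can be sketched in userspace C as follows.
    This is an illustration under stated assumptions, not the rxe code:
    struct sketch_gid, sketch_ipv4_is_multicast(),
    sketch_ipv6_is_multicast() and sketch_chk_dgid() are hypothetical
    names, and the table is a plain array rather than the port GID table.
    Only the address ranges (224.0.0.0/4 for IPv4 and ff00::/8 for IPv6)
    are standard facts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit GID / IPv6 address. */
struct sketch_gid {
    uint8_t raw[16];
};

/* IPv4 multicast is 224.0.0.0/4: the top four bits are 1110. */
static int sketch_ipv4_is_multicast(uint32_t addr_host_order)
{
    return (addr_host_order & 0xf0000000u) == 0xe0000000u;
}

/* IPv6 multicast is ff00::/8: the first byte is 0xff. */
static int sketch_ipv6_is_multicast(const struct sketch_gid *addr)
{
    return addr->raw[0] == 0xff;
}

/*
 * Destination check: multicast addresses are accepted without
 * consulting the table, because they never appear in it. Everything
 * else must match a table entry. Returns 1 to accept, 0 to drop.
 */
static int sketch_chk_dgid(const struct sketch_gid *dgid,
                           const struct sketch_gid *table, size_t n_gids)
{
    assert(dgid != NULL);

    if (sketch_ipv6_is_multicast(dgid))
        return 1;

    for (size_t i = 0; i < n_gids; i++)
        if (memcmp(dgid->raw, table[i].raw, sizeof(dgid->raw)) == 0)
            return 1;

    return 0;
}
```

    With a table lookup alone, a destination such as ff02::1 would never
    match and the packet would be dropped. Testing for multicast first
    avoids that.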

    Link: https://lore.kernel.org/r/20201008212753.265249-1-rpearson@hpe.com
    Signed-off-by: Bob Pearson
    Signed-off-by: Jason Gunthorpe

    Bob Pearson
     
  • The changes referenced below replaced skb_clone() by taking additional
    references, passing the skb along and then freeing the skb. This
    deleted the packets before they could be processed and additionally
    passed bad data in each packet. Since pkt is stored in skb->cb,
    changing pkt->qp changed it for all the packets.

    Replace skb_get() with skb_clone() in rxe_rcv_mcast_pkt() for cases where
    multiple QPs are receiving multicast packets on the same address.

    Delete kfree_skb() because the packets need to live until they have been
    processed by each QP. They are freed later.
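
    The difference between taking a reference and cloning can be shown
    with a small userspace model. This is a sketch under assumptions, not
    the sk_buff implementation: struct sketch_buf, struct sketch_data and
    the sketch_* functions are hypothetical, and cb_qp stands in for the
    pkt->qp field kept in skb->cb. A reference returns the same header, so
    the scratch area is shared. A clone gets its own header and scratch
    area while still sharing the payload.

```c
#include <assert.h>
#include <stdlib.h>

/* Payload shared between buffer headers, with its own reference count. */
struct sketch_data {
    int refs;
    unsigned char bytes[64];
};

/* Buffer header: per-header scratch field plus a pointer to the payload. */
struct sketch_buf {
    int refs;                 /* references to this header */
    int cb_qp;                /* stands in for pkt->qp stored in skb->cb */
    struct sketch_data *data; /* payload, possibly shared with clones */
};

static struct sketch_buf *sketch_alloc(void)
{
    struct sketch_buf *b = calloc(1, sizeof(*b));

    if (!b)
        return NULL;
    b->data = calloc(1, sizeof(*b->data));
    if (!b->data) {
        free(b);
        return NULL;
    }
    b->refs = 1;
    b->data->refs = 1;
    return b;
}

/* Extra reference: same header is returned, so cb_qp is shared. */
static struct sketch_buf *sketch_get(struct sketch_buf *b)
{
    b->refs++;
    return b;
}

/* Clone: new header with a private cb_qp, payload shared. */
static struct sketch_buf *sketch_clone(const struct sketch_buf *b)
{
    struct sketch_buf *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    *c = *b;
    c->refs = 1;
    c->data->refs++;
    return c;
}

/* Drop one header reference; free the payload with its last header. */
static void sketch_put(struct sketch_buf *b)
{
    if (--b->refs > 0)
        return;
    if (--b->data->refs == 0)
        free(b->data);
    free(b);
}
```

    In this model, writing cb_qp through a pointer obtained from
    sketch_get() overwrites the value seen by every other holder, which
    mirrors the bad data described above. A header from sketch_clone()
    keeps its own value, so each receiver can record its own QP.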

    Fixes: 86af61764151 ("IB/rxe: remove unnecessary skb_clone")
    Fixes: fe896ceb5772 ("IB/rxe: replace refcount_inc with skb_get")
    Link: https://lore.kernel.org/r/20201008203651.256958-1-rpearson@hpe.com
    Signed-off-by: Bob Pearson
    Signed-off-by: Jason Gunthorpe

    Bob Pearson