05 Mar, 2020

3 commits

  • commit 468d020e2f02867b8ec561461a1689cd4365e493 upstream.

    The driver should first check whether each sge is valid, then fill the
    valid sges and the calculated total into hardware; otherwise invalid sges
    will cause an error.

    Fixes: 52e3b42a2f58 ("RDMA/hns: Filter for zero length of sge in hip08 kernel mode")
    Fixes: 7bdee4158b37 ("RDMA/hns: Fill sq wqe context of ud type in hip08")
    Link: https://lore.kernel.org/r/1578571852-13704-1-git-send-email-liweihang@huawei.com
    Signed-off-by: Lijun Ou
    Signed-off-by: Weihang Li
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Lijun Ou
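
The sge check described in commit 468d020e2f02 can be pictured with a small userspace sketch. It is not the hns driver code: `struct sge` and `fill_valid_sges()` are hypothetical names, and "valid" is assumed here to mean "non-zero length", following the title of the commit named in the first Fixes: tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified scatter/gather entry; not the driver's struct. */
struct sge {
    uint64_t addr;
    uint32_t length;
};

/*
 * Copy only the non-zero-length entries of src[] into dst[] and return how
 * many were kept. The byte total of the kept entries is stored in *total.
 * dst must have room for n entries.
 */
static size_t fill_valid_sges(const struct sge *src, size_t n,
                              struct sge *dst, uint64_t *total)
{
    size_t kept = 0;

    assert(src != NULL && dst != NULL && total != NULL);

    *total = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i].length == 0)
            continue;   /* empty sge: skip it instead of handing it on */
        dst[kept++] = src[i];
        *total += src[i].length;
    }
    return kept;
}
```

Both the entry count and the byte total come from the same filtered pass, so the two values written to hardware cannot disagree.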
     
  • commit 4768820243d71d49f1044b3f911ac3d52bdb79af upstream.

    Currently, the wqe idx is calculated repeatedly everywhere it is used.
    This patch defines wqe_idx and calculates it only once, then just uses it
    as needed.

    Fixes: 2d40788825ac ("RDMA/hns: Add support for processing send wr and receive wr")
    Link: https://lore.kernel.org/r/1575981902-5274-1-git-send-email-liweihang@hisilicon.com
    Signed-off-by: Yixian Liu
    Signed-off-by: Weihang Li
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Yixian Liu
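
A minimal sketch of the "calculate wqe_idx once" idea from commit 4768820243d7, assuming a ring whose size is a power of two. `struct wq`, `wqe_index()` and `post_requests()` are illustrative names, not the driver's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical work queue; wqe_cnt is assumed to be a power of two. */
struct wq {
    uint32_t head;
    uint32_t wqe_cnt;
};

/* Slot used by the nreq-th request of the current post call. */
static uint32_t wqe_index(const struct wq *wq, uint32_t nreq)
{
    assert(wq->wqe_cnt != 0 && (wq->wqe_cnt & (wq->wqe_cnt - 1)) == 0);
    return (wq->head + nreq) & (wq->wqe_cnt - 1);
}

/* Record the ids of n posted requests, computing each index only once. */
static void post_requests(struct wq *wq, uint64_t *wrid,
                          const uint64_t *ids, uint32_t n)
{
    for (uint32_t nreq = 0; nreq < n; nreq++) {
        uint32_t wqe_idx = wqe_index(wq, nreq); /* computed once per request */

        wrid[wqe_idx] = ids[nreq];  /* reused wherever the slot is needed */
    }
    wq->head += n;
}
```

Keeping the index in one local variable means every later use in the loop body refers to the same slot.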
     
  • [ Upstream commit 663218a3e715fd9339d143a3e10088316b180f4f ]

    Warnings like below can fill up the dmesg while disconnecting RDMA
    connections.
    Hence, remove the unwanted WARN_ON.

    WARNING: CPU: 6 PID: 0 at drivers/infiniband/sw/siw/siw_cm.c:1229 siw_cm_llp_data_ready+0xc1/0xd0 [siw]
    RIP: 0010:siw_cm_llp_data_ready+0xc1/0xd0 [siw]
    Call Trace:

    tcp_data_queue+0x226/0xb40
    tcp_rcv_established+0x220/0x620
    tcp_v4_do_rcv+0x12a/0x1e0
    tcp_v4_rcv+0xb05/0xc00
    ip_local_deliver_finish+0x69/0x210
    ip_local_deliver+0x6b/0xe0
    ip_rcv+0x273/0x362
    __netif_receive_skb_core+0xb35/0xc30
    netif_receive_skb_internal+0x3d/0xb0
    napi_gro_frags+0x13b/0x200
    t4_ethrx_handler+0x433/0x7d0 [cxgb4]
    process_responses+0x318/0x580 [cxgb4]
    napi_rx_handler+0x14/0x100 [cxgb4]
    net_rx_action+0x149/0x3b0
    __do_softirq+0xe3/0x30a
    irq_exit+0x100/0x110
    do_IRQ+0x7f/0xe0
    common_interrupt+0xf/0xf

    Link: https://lore.kernel.org/r/20200207141429.27927-1-krishna2@chelsio.com
    Signed-off-by: Krishnamraju Eraparaju
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Krishnamraju Eraparaju
     

29 Feb, 2020

1 commit

  • commit 76261ada16dcc3be610396a46d35acc3efbda682 upstream.

    Since commit 04060db41178 introduces soft lockups when toggling network
    interfaces, revert it.

    Link: https://marc.info/?l=target-devel&m=158157054906196
    Cc: Rahul Kundu
    Cc: Mike Marciniszyn
    Cc: Sagi Grimberg
    Reported-by: Dakshaja Uppalapati
    Fixes: 04060db41178 ("scsi: RDMA/isert: Fix a recently introduced regression related to logout")
    Signed-off-by: Bart Van Assche
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
     

24 Feb, 2020

6 commits

  • [ Upstream commit 4835709176e8ccf6561abc9f5c405293e008095f ]

    Kernel paths must not set udata; they must provide a NULL pointer
    instead of faking a zeroed udata struct.

    Signed-off-by: Leon Romanovsky
    Signed-off-by: Sasha Levin

    Leon Romanovsky
     
  • [ Upstream commit 2c9d4e26d1ab27ceae2ded2ffe930f8e5f5b2a89 ]

    This counter, RxShrErr, is required for error analysis and debug.

    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Link: https://lore.kernel.org/r/20200106134235.119356.29123.stgit@awfm-01.aw.intel.com
    Reviewed-by: Kaike Wan
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Mike Marciniszyn
     
  • [ Upstream commit 5ffd048698ea5139743acd45e8ab388a683642b8 ]

    All other code paths increment some form of drop counter.

    This was missed in the original implementation.

    Fixes: 82c2611daaf0 ("staging/rdma/hfi1: Handle packets with invalid RHF on context 0")
    Link: https://lore.kernel.org/r/20200106134228.119356.96828.stgit@awfm-01.aw.intel.com
    Reviewed-by: Kaike Wan
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Mike Marciniszyn
     
  • [ Upstream commit eca44507c3e908b7362696a4d6a11d90371334c6 ]

    The address of a page shouldn't be printed, to avoid security issues.

    Link: https://lore.kernel.org/r/1578313276-29080-2-git-send-email-liweihang@huawei.com
    Signed-off-by: Wenpeng Liang
    Signed-off-by: Weihang Li
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Wenpeng Liang
     
  • [ Upstream commit 6ca18d8927d468c763571f78c9a7387a69ffa020 ]

    The type of mmap_offset should be u64 instead of int to match the type of
    mminfo.offset. Otherwise, after several thousand CQs have been created,
    it will overflow.

    Link: https://lore.kernel.org/r/20191227113613.5020-1-kejiewei.cn@gmail.com
    Signed-off-by: Jiewei Ke
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Jiewei Ke
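
The overflow described in commit 6ca18d8927d4 can be reproduced with plain arithmetic. The sketch below assumes, for illustration only, that each queue takes a 1 MiB slice of the mmap offset space and that `int` is 32 bits wide. The real per-queue size depends on the queue, and the helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed per-queue mapping size, for illustration only (1 MiB). */
#define QUEUE_MAP_SIZE (UINT64_C(1) << 20)

/* Offset the n-th queue gets when offsets are handed out back to back. */
static uint64_t nth_mmap_offset(uint64_t n)
{
    assert(n <= UINT64_MAX / QUEUE_MAP_SIZE);   /* stay within 64 bits */
    return n * QUEUE_MAP_SIZE;
}

/* True if the offset can no longer be represented in a signed int. */
static int overflows_int(uint64_t offset)
{
    return offset > (uint64_t)INT_MAX;
}
```

Under these assumptions the 2048th queue is the first whose offset no longer fits in an `int`, while a 64-bit counter still has ample headroom.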
     
  • [ Upstream commit 6b57cea9221b0247ad5111b348522625e489a8e4 ]

    Currently, when the low-level driver raises Pkey, GID, and port change
    events, they are delivered to the registered handlers in the order in
    which the handlers were registered.

    IB core and other ULPs such as IPoIB are interested in GID, LID, Pkey
    change events.

    Since all GID queries done by ULPs are serviced by IB core, and the IB
    core defers cache updates to a work queue, it is possible for other
    clients to see stale cache data when they handle their own events.

    For example, the below call tree shows how ipoib will call
    rdma_query_gid() concurrently with the update to the cache sitting in the
    WQ.

    mlx5_ib_handle_event()
      ib_dispatch_event()
        ib_cache_event()
          queue_work() -> slow cache update

        [..]
        ipoib_event()
          queue_work()
            [..]
            work handler
              ipoib_ib_dev_flush_light()
                __ipoib_ib_dev_flush()
                  ipoib_dev_addr_changed_valid()
                    rdma_query_gid()

    Signed-off-by: Leon Romanovsky
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Parav Pandit
     

20 Feb, 2020

10 commits

  • commit 1dd017882e01d2fcd9c5dbbf1eb376211111c393 upstream.

    We don't need to set the pkey as valid when the user sets only one of
    pkey index or port number; otherwise it results in a NULL pointer
    dereference while accessing the uninitialized pkey list. The following
    crash from syzkaller revealed it.

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] SMP KASAN PTI
    CPU: 1 PID: 14753 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
    rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
    RIP: 0010:get_pkey_idx_qp_list+0x161/0x2d0
    Code: 01 00 00 49 8b 5e 20 4c 39 e3 0f 84 b9 00 00 00 e8 e4 42 6e fe 48
    8d 7b 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 b6 04
    02 84 c0 74 08 3c 01 0f 8e d0 00 00 00 48 8d 7d 04 48 b8
    RSP: 0018:ffffc9000bc6f950 EFLAGS: 00010202
    RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff82c8bdec
    RDX: 0000000000000002 RSI: ffffc900030a8000 RDI: 0000000000000010
    RBP: ffff888112c8ce80 R08: 0000000000000004 R09: fffff5200178df1f
    R10: 0000000000000001 R11: fffff5200178df1f R12: ffff888115dc4430
    R13: ffff888115da8498 R14: ffff888115dc4410 R15: ffff888115da8000
    FS: 00007f20777de700(0000) GS:ffff88811b100000(0000)
    knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000001b2f721000 CR3: 00000001173ca002 CR4: 0000000000360ee0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    port_pkey_list_insert+0xd7/0x7c0
    ib_security_modify_qp+0x6fa/0xfc0
    _ib_modify_qp+0x8c4/0xbf0
    modify_qp+0x10da/0x16d0
    ib_uverbs_modify_qp+0x9a/0x100
    ib_uverbs_write+0xaa5/0xdf0
    __vfs_write+0x7c/0x100
    vfs_write+0x168/0x4a0
    ksys_write+0xc8/0x200
    do_syscall_64+0x9c/0x390
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs")
    Link: https://lore.kernel.org/r/20200212080651.GB679970@unreal
    Signed-off-by: Maor Gottlieb
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
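
The rule stated in commit 1dd017882e01 (mark the pkey as valid only when both the pkey index and the port number are supplied) can be sketched as below. The flag names, `struct port_pkey` and `update_pkey_state()` are assumptions made for this example. The in-kernel logic has more cases, such as inheriting earlier settings, which are left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute-mask bits standing in for the QP attribute flags. */
enum {
    ATTR_PKEY_INDEX = 1u << 0,
    ATTR_PORT       = 1u << 1,
};

enum pkey_state { PKEY_NOT_SET, PKEY_VALID };

/* Hypothetical, simplified port/pkey record. */
struct port_pkey {
    enum pkey_state state;
    unsigned int port_num;
    unsigned int pkey_index;
};

/* Apply the supplied attributes; mark the record valid only if both are present. */
static void update_pkey_state(struct port_pkey *pp, unsigned int attr_mask,
                              unsigned int port_num, unsigned int pkey_index)
{
    const unsigned int both = ATTR_PKEY_INDEX | ATTR_PORT;

    assert(pp != NULL);

    if (attr_mask & ATTR_PORT)
        pp->port_num = port_num;
    if (attr_mask & ATTR_PKEY_INDEX)
        pp->pkey_index = pkey_index;
    if ((attr_mask & both) == both)
        pp->state = PKEY_VALID;     /* a half-specified setting stays NOT_SET */
}
```

A record that stays `PKEY_NOT_SET` would then never be used to look up a pkey list that was not initialized.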
     
  • commit 8ac0e6641c7ca14833a2a8c6f13d8e0a435e535c upstream.

    When running stress tests with RXE, the following call traces often occur:

    watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]
    ...
    Call Trace:

    create_object+0x3f/0x3b0
    kmem_cache_alloc_node_trace+0x129/0x2d0
    __kmalloc_reserve.isra.52+0x2e/0x80
    __alloc_skb+0x83/0x270
    rxe_init_packet+0x99/0x150 [rdma_rxe]
    rxe_requester+0x34e/0x11a0 [rdma_rxe]
    rxe_do_task+0x85/0xf0 [rdma_rxe]
    tasklet_action_common.isra.21+0xeb/0x100
    __do_softirq+0xd0/0x298
    irq_exit+0xc5/0xd0
    smp_apic_timer_interrupt+0x68/0x120
    apic_timer_interrupt+0xf/0x20

    ...

    The root cause is that a tasklet is itself a softirq. Inside the tasklet
    handler, another softirq handler is triggered. These softirq handlers
    usually run on the same CPU core, which causes the soft lockup.

    Fixes: 8700e3e7c485 ("Soft RoCE driver")
    Link: https://lore.kernel.org/r/20200212072635.682689-8-leon@kernel.org
    Signed-off-by: Zhu Yanjun
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Zhu Yanjun
     
  • commit 8a4f300b978edbbaa73ef9eca660e45eb9f13873 upstream.

    Make sure to free the allocated cpumask_var_t's to avoid the following
    reported memory leak by kmemleak:

    $ cat /sys/kernel/debug/kmemleak
    unreferenced object 0xffff8897f812d6a8 (size 8):
    comm "kworker/1:1", pid 347, jiffies 4294751400 (age 101.703s)
    hex dump (first 8 bytes):
    00 00 00 00 00 00 00 00 ........
    backtrace:
    alloc_cpumask_var_node+0x4c/0xb0
    hfi1_comp_vectors_set_up+0x20f/0x800 [hfi1]
    hfi1_init_dd+0x3311/0x4960 [hfi1]
    init_one+0x25e/0xf10 [hfi1]
    local_pci_probe+0xd4/0x180
    work_for_cpu_fn+0x51/0xa0
    process_one_work+0x8f0/0x17b0
    worker_thread+0x536/0xb50
    kthread+0x30c/0x3d0
    ret_from_fork+0x3a/0x50

    Fixes: 5d18ee67d4c1 ("IB/{hfi1, rdmavt, qib}: Implement CQ completion vector support")
    Link: https://lore.kernel.org/r/20200205110530.12129-1-kamalheib1@gmail.com
    Signed-off-by: Kamal Heib
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Kamal Heib
     
  • commit d219face9059f38ad187bde133451a2a308fdb7c upstream.

    As per draft-hilland-iwarp-verbs-v1.0, sec 6.2.3, always initiate a CLOSE
    when entering into TERM state.

    In c4iw_modify_qp(), the disconnect operation should only be performed
    when the modify_qp call is invoked from ib_core. All other internal
    modify_qp calls (invoked within iw_cxgb4) that need a 'disconnect' should
    call c4iw_ep_disconnect() explicitly after modify_qp. Otherwise, deadlocks
    like the one below can occur:

    Call Trace:
    schedule+0x2f/0xa0
    schedule_preempt_disabled+0xa/0x10
    __mutex_lock.isra.5+0x2d0/0x4a0
    c4iw_ep_disconnect+0x39/0x430 => tries to reacquire ep lock again
    c4iw_modify_qp+0x468/0x10d0
    rx_data+0x218/0x570 => acquires ep lock
    process_work+0x5f/0x70
    process_one_work+0x1a7/0x3b0
    worker_thread+0x30/0x390
    kthread+0x112/0x130
    ret_from_fork+0x35/0x40

    Fixes: d2c33370ae73 ("RDMA/iw_cxgb4: Always disconnect when QP is transitioning to TERMINATE state")
    Link: https://lore.kernel.org/r/20200204091230.7210-1-krishna2@chelsio.com
    Signed-off-by: Krishnamraju Eraparaju
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Krishnamraju Eraparaju
     
  • commit a72f4ac1d778f7bde93dfee69bfc23377ec3d74f upstream.

    Add a check that the size specified in the flow spec header doesn't cause
    an overflow when calculating the filter size, and thus prevent access to
    invalid memory. The following crash from syzkaller revealed it.

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] SMP KASAN PTI
    CPU: 1 PID: 17834 Comm: syz-executor.3 Not tainted 5.5.0-rc5 #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
    rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
    RIP: 0010:memchr_inv+0xd3/0x330
    Code: 89 f9 89 f5 83 e1 07 0f 85 f9 00 00 00 49 89 d5 49 c1 ed 03 45 85
    ed 74 6f 48 89 d9 48 b8 00 00 00 00 00 fc ff df 48 c1 e9 03 3c 01
    00 0f 85 0d 02 00 00 44 0f b6 e5 48 b8 01 01 01 01 01 01
    RSP: 0018:ffffc9000a13fa50 EFLAGS: 00010202
    RAX: dffffc0000000000 RBX: 7fff88810de9d820 RCX: 0ffff11021bd3b04
    RDX: 000000000000fff8 RSI: 0000000000000000 RDI: 7fff88810de9d820
    RBP: 0000000000000000 R08: ffff888110d69018 R09: 0000000000000009
    R10: 0000000000000001 R11: ffffed10236267cc R12: 0000000000000004
    R13: 0000000000001fff R14: ffff88810de9d820 R15: 0000000000000040
    FS: 00007f9ee0e51700(0000) GS:ffff88811b100000(0000)
    knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 0000000115ea0006 CR4: 0000000000360ee0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    spec_filter_size.part.16+0x34/0x50
    ib_uverbs_kern_spec_to_ib_spec_filter+0x691/0x770
    ib_uverbs_ex_create_flow+0x9ea/0x1b40
    ib_uverbs_write+0xaa5/0xdf0
    __vfs_write+0x7c/0x100
    vfs_write+0x168/0x4a0
    ksys_write+0xc8/0x200
    do_syscall_64+0x9c/0x390
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x465b49
    Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89
    f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01
    f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007f9ee0e50c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000465b49
    RDX: 00000000000003a0 RSI: 00000000200007c0 RDI: 0000000000000004
    RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ee0e516bc
    R13: 00000000004ca2da R14: 000000000070deb8 R15: 00000000ffffffff
    Modules linked in:
    Dumping ftrace buffer:
    (ftrace buffer empty)

    Fixes: 94e03f11ad1f ("IB/uverbs: Add support for flow tag")
    Link: https://lore.kernel.org/r/20200126171500.4623-1-leon@kernel.org
    Signed-off-by: Avihai Horon
    Reviewed-by: Maor Gottlieb
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Avihai Horon
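
The size check added by commit a72f4ac1d778 can be sketched as follows. The header layout and the function name are illustrative, not the uverbs ABI. The sketch assumes a spec is a header followed by a value and a mask of equal size, so the filter size is half of whatever follows the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical spec header; the layout is illustrative only. */
struct spec_hdr {
    uint32_t type;
    uint16_t size;      /* total spec size in bytes, header included */
    uint16_t reserved;
};

/*
 * Return 0 and store the size of the value (or mask) part in *filter_sz,
 * or -1 if the declared size is too small to hold even the header.
 */
static int spec_filter_size(const struct spec_hdr *hdr, size_t *filter_sz)
{
    assert(hdr != NULL && filter_sz != NULL);

    if (hdr->size < sizeof(*hdr))
        return -1;      /* without this check the subtraction below wraps */

    *filter_sz = (hdr->size - sizeof(*hdr)) / 2;
    return 0;
}
```

Without the early return, a declared size smaller than the header would wrap the unsigned subtraction and yield a huge filter size, which matches the kind of out-of-range access seen in the `memchr_inv()` crash above.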
     
  • commit 9ea04d0df6e6541c6736b43bff45f1e54875a1db upstream.

    When disassociating a device from umad we must ensure that the sysfs
    access is prevented before blocking the fops, otherwise assumptions in
    sysfs don't hold:

    CPU0                          CPU1
    ib_umad_kill_port()           ibdev_show()
      port->ib_dev = NULL
                                    dev_name(port->ib_dev)

    The prior patch made an error in moving the device_destroy(); it should
    have been split into device_del() (above) and put_device() (below). At
    this point we already have the split, so move the device_del() back to its
    original place.

    kernel stack
    PF: error_code(0x0000) - not-present page
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
    RIP: 0010:ibdev_show+0x18/0x50 [ib_umad]
    RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000
    RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870
    RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000
    R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40
    R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58
    FS: 00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0
    Call Trace:
    dev_attr_show+0x15/0x50
    sysfs_kf_seq_show+0xb8/0x1a0
    seq_read+0x12d/0x350
    vfs_read+0x89/0x140
    ksys_read+0x55/0xd0
    do_syscall_64+0x55/0x1b0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9:

    Fixes: cf7ad3030271 ("IB/umad: Avoid destroying device while it is accessed")
    Link: https://lore.kernel.org/r/20200212072635.682689-9-leon@kernel.org
    Signed-off-by: Yonatan Cohen
    Signed-off-by: Leon Romanovsky
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Yonatan Cohen
     
  • commit f92e48718889b3d49cee41853402aa88cac84a6b upstream.

    When the hfi1 device is shut down during a system reboot, it is possible
    that some QPs might not have been freed by ULPs. More send requests could
    be posted and a lingering timer could be triggered to schedule more packet
    sends, leading to a crash:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
    IP: [ffffffff810a65f2] __queue_work+0x32/0x3c0
    PGD 0
    Oops: 0000 1 SMP
    Modules linked in: nvmet_rdma(OE) nvmet(OE) nvme(OE) dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) pal_raw(POE) pal_pmt(POE) pal_cache(POE) pal_pile(POE) pal(POE) pal_compatible(OE) rpcrdma sunrpc ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_ssif pcspkr ses enclosure joydev scsi_transport_sas i2c_i801 sg mei_me lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad dm_multipath hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en
    sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core crct10dif_pclmul crct10dif_common hfi1(OE) igb crc32c_intel rdmavt(OE) ahci ib_core libahci libata ptp megaraid_sas pps_core dca i2c_algo_bit i2c_core devlink dm_mirror dm_region_hash dm_log dm_mod
    CPU: 23 PID: 0 Comm: swapper/23 Tainted: P OE ------------ 3.10.0-693.el7.x86_64 #1
    Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0028.121720182203 12/17/2018
    task: ffff8808f4ec4f10 ti: ffff8808f4ed8000 task.ti: ffff8808f4ed8000
    RIP: 0010:[ffffffff810a65f2] [ffffffff810a65f2] __queue_work+0x32/0x3c0
    RSP: 0018:ffff88105df43d48 EFLAGS: 00010046
    RAX: 0000000000000086 RBX: 0000000000000086 RCX: 0000000000000000
    RDX: ffff880f74e758b0 RSI: 0000000000000000 RDI: 000000000000001f
    RBP: ffff88105df43d80 R08: ffff8808f3c583c8 R09: ffff8808f3c58000
    R10: 0000000000000002 R11: ffff88105df43da8 R12: ffff880f74e758b0
    R13: 000000000000001f R14: 0000000000000000 R15: ffff88105a300000
    FS: 0000000000000000(0000) GS:ffff88105df40000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000102 CR3: 00000000019f2000 CR4: 00000000001407e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
    ffff88105b6dd708 0000001f00000286 0000000000000086 ffff88105a300000
    ffff880f74e75800 0000000000000000 ffff88105a300000 ffff88105df43d98
    ffffffff810a6b85 ffff88105a301e80 ffff88105df43dc8 ffffffffc0224cde
    Call Trace:
    <IRQ>

    [ffffffff810a6b85] queue_work_on+0x45/0x50
    [ffffffffc0224cde] _hfi1_schedule_send+0x6e/0xc0 [hfi1]
    [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
    [ffffffffc0224d62] hfi1_schedule_send+0x32/0x70 [hfi1]
    [ffffffffc0170644] rvt_rc_timeout+0xd4/0x120 [rdmavt]
    [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
    [ffffffff81097316] call_timer_fn+0x36/0x110
    [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
    [ffffffff8109982d] run_timer_softirq+0x22d/0x310
    [ffffffff81090b3f] __do_softirq+0xef/0x280
    [ffffffff816b6a5c] call_softirq+0x1c/0x30
    [ffffffff8102d3c5] do_softirq+0x65/0xa0
    [ffffffff81090ec5] irq_exit+0x105/0x110
    [ffffffff816b76c2] smp_apic_timer_interrupt+0x42/0x50
    [ffffffff816b5c1d] apic_timer_interrupt+0x6d/0x80
    <EOI>

    [ffffffff81527a02] ? cpuidle_enter_state+0x52/0xc0
    [ffffffff81527b48] cpuidle_idle_call+0xd8/0x210
    [ffffffff81034fee] arch_cpu_idle+0xe/0x30
    [ffffffff810e7bca] cpu_startup_entry+0x14a/0x1c0
    [ffffffff81051af6] start_secondary+0x1b6/0x230
    Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 be 02 00 00 41 f6 86 02 01 00 00 01 0f 85 58 02 00 00 49 c7 c7 28 19 01 00
    RIP [ffffffff810a65f2] __queue_work+0x32/0x3c0
    RSP ffff88105df43d48
    CR2: 0000000000000102

    The solution is to reset the QPs before the device resources are freed.
    This reset will change the QP state to prevent post sends and delete
    timers to prevent callbacks.

    Fixes: 0acb0cc7ecc1 ("IB/rdmavt: Initialize and teardown of qpn table")
    Link: https://lore.kernel.org/r/20200210131040.87408.38161.stgit@awfm-01.aw.intel.com
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Kaike Wan
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Kaike Wan
     
  • commit be8638344c70bf492963ace206a9896606b6922d upstream.

    Cleaning up a pq can result in the following warning and panic:

    WARNING: CPU: 52 PID: 77418 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0
    list_del corruption, ffff88cb2c6ac068->next is LIST_POISON1 (dead000000000100)
    Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
    nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit]
    CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G OE ------------ 3.10.0-957.38.3.el7.x86_64 #1
    Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
    Call Trace:
    dump_stack+0x19/0x1b
    __warn+0xd8/0x100
    warn_slowpath_fmt+0x5f/0x80
    __list_del_entry+0x63/0xd0
    list_del+0xd/0x30
    kmem_cache_destroy+0x50/0x110
    hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1]
    hfi1_file_close+0x70/0x1e0 [hfi1]
    __fput+0xec/0x260
    ____fput+0xe/0x10
    task_work_run+0xbb/0xe0
    do_notify_resume+0xa5/0xc0
    int_signal+0x12/0x17
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
    IP: kmem_cache_close+0x7e/0x300
    PGD 2cdab19067 PUD 2f7bfdb067 PMD 0
    Oops: 0000 [#1] SMP
    Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
    nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit]
    CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.38.3.el7.x86_64 #1
    Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
    task: ffff88cc26db9040 ti: ffff88b5393a8000 task.ti: ffff88b5393a8000
    RIP: 0010:kmem_cache_close+0x7e/0x300
    RSP: 0018:ffff88b5393abd60 EFLAGS: 00010287
    RAX: 0000000000000000 RBX: ffff88cb2c6ac000 RCX: 0000000000000003
    RDX: 0000000000000400 RSI: 0000000000000400 RDI: ffffffff9095b800
    RBP: ffff88b5393abdb0 R08: ffffffff9095b808 R09: ffffffff8ff77c19
    R10: ffff88b73ce1f160 R11: ffffddecddde9800 R12: ffff88cb2c6ac000
    R13: 000000000000000c R14: ffff88cf3fdca780 R15: 0000000000000000
    FS: 00002aaaaab52500(0000) GS:ffff88b73ce00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000010 CR3: 0000002d27664000 CR4: 00000000007607e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
    __kmem_cache_shutdown+0x14/0x80
    kmem_cache_destroy+0x58/0x110
    hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1]
    hfi1_file_close+0x70/0x1e0 [hfi1]
    __fput+0xec/0x260
    ____fput+0xe/0x10
    task_work_run+0xbb/0xe0
    do_notify_resume+0xa5/0xc0
    int_signal+0x12/0x17
    Code: 00 00 ba 00 04 00 00 0f 4f c2 3d 00 04 00 00 89 45 bc 0f 84 e7 01 00 00 48 63 45 bc 49 8d 04 c4 48 89 45 b0 48 8b 80 c8 00 00 00 8b 78 10 48 89 45 c0 48 83 c0 10 48 89 45 d0 48 8b 17 48 39
    RIP kmem_cache_close+0x7e/0x300
    RSP
    CR2: 0000000000000010

    The panic is the result of slab entries being freed during the destruction
    of the pq slab.

    The code attempts to quiesce the pq, but looking for n_req == 0 doesn't
    account for new requests.

    Fix the issue by using SRCU to get a pq pointer and adjust the pq free
    logic to NULL the fd pq pointer prior to the quiesce.

    Fixes: e87473bc1b6c ("IB/hfi1: Only set fd pointer when base context is completely initialized")
    Link: https://lore.kernel.org/r/20200210131033.87408.81174.stgit@awfm-01.aw.intel.com
    Reviewed-by: Kaike Wan
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Mike Marciniszyn
     
  • commit a70ed0f2e6262e723ae8d70accb984ba309eacc2 upstream.

    Each user context is allocated a certain number of RcvArray (TID)
    entries and these entries are managed through TID groups. These groups
    are put into one of three lists in each user context: tid_group_list,
    tid_used_list, and tid_full_list, depending on the number of used TID
    entries within each group. When TID packets are expected, one or more
    TID groups will be allocated. After the packets are received, the TID
    groups will be freed. Since multiple user threads may access the TID
    groups simultaneously, a mutex exp_mutex is used to synchronize the
    access. However, when the user file is closed, it tries to release
    all TID groups without acquiring the mutex first, which risks a race
    condition with another thread that may be releasing its TID groups,
    leading to data corruption.

    This patch addresses the issue by acquiring the mutex first before
    releasing the TID groups when the file is closed.

    Fixes: 3abb33ac6521 ("staging/hfi1: Add TID cache receive init and free funcs")
    Link: https://lore.kernel.org/r/20200210131026.87408.86853.stgit@awfm-01.aw.intel.com
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Kaike Wan
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Kaike Wan
     
  • commit 10189e8e6fe8dcde13435f9354800429c4474fb1 upstream.

    When binding a QP with a counter and the QP state is not RESET, return
    failure if the rts2rts_qp_counters_set_id is not supported by the
    device.

    This is to prevent cases like manual bind for Connect-IB devices from
    returning success when the feature is not supported.

    Fixes: d14133dd4161 ("IB/mlx5: Support set qp counter")
    Link: https://lore.kernel.org/r/20200126171708.5167-1-leon@kernel.org
    Signed-off-by: Mark Zhang
    Reviewed-by: Maor Gottlieb
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Mark Zhang
     

15 Feb, 2020

8 commits

  • commit 36798d5ae1af62e830c5e045b2e41ce038690c61 upstream.

    Except for the last entry, the ending iova alignment sets the maximum
    possible page size as the low bits of the iova must be zero when starting
    the next chunk.

    Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR")
    Link: https://lore.kernel.org/r/20200128135612.174820-1-leon@kernel.org
    Signed-off-by: Artemy Kovalyov
    Signed-off-by: Leon Romanovsky
    Tested-by: Gal Pressman
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Artemy Kovalyov
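
The constraint in commit 36798d5ae1af (the low bits of a chunk's ending iova must be zero for the page size in use) can be illustrated in isolation. This sketch covers only that ending-address rule; the real search also has to consider start addresses and offsets. The function names and the bitmap convention (one bit per supported page size) are assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Restrict a bitmap of supported page sizes to those for which end_iova is
 * aligned, i.e. sizes no larger than the lowest set bit of end_iova.
 */
static uint64_t limit_by_chunk_end(uint64_t pgsz_bitmap, uint64_t end_iova)
{
    uint64_t lowest;

    assert(pgsz_bitmap != 0);   /* caller supports at least one page size */

    if (end_iova == 0)
        return pgsz_bitmap;     /* zero is aligned to every page size */

    lowest = end_iova & (~end_iova + 1);        /* lowest set bit */
    return pgsz_bitmap & (lowest | (lowest - 1));
}

/* Largest page size left in the bitmap, or 0 if none remains. */
static uint64_t largest_page_size(uint64_t pgsz_bitmap)
{
    uint64_t best = 0;

    for (uint64_t bit = 1; bit != 0; bit <<= 1)
        if (pgsz_bitmap & bit)
            best = bit;
    return best;
}
```

In line with the commit text, such a limit would apply to every chunk except the last, whose end does not start another chunk.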
     
  • commit b4fb4cc5ba83b20dae13cef116c33648e81d2f44 upstream.

    The commit below missed the AF_IB and loopback code flow in
    rdma_resolve_addr(). This leads to an unbalanced cm_id refcount in
    cma_work_handler(), which drops a reference that was not taken before
    the work was queued.

    A call trace is observed with such code flow:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    __mutex_lock_slowpath+0x166/0x1d0
    mutex_lock+0x1f/0x2f
    cma_work_handler+0x25/0xa0
    process_one_work+0x17f/0x440
    worker_thread+0x126/0x3c0

    Hence, hold the cm_id reference when scheduling the resolve work item.

    Fixes: 722c7b2bfead ("RDMA/{cma, core}: Avoid callback on rdma_addr_cancel()")
    Link: https://lore.kernel.org/r/20200126142652.104803-2-leon@kernel.org
    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     
  • commit 14e23bd6d22123f6f3b2747701fa6cd4c6d05873 upstream.

    This should not be using ib_dev to test for disassociation; during
    disassociation is_closed is set under lock and the waitq is triggered.

    Instead check is_closed and be sure to re-obtain the lock to test the
    value after the wait_event returns.

    Fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications")
    Link: https://lore.kernel.org/r/1578504126-9400-12-git-send-email-yishaih@mellanox.com
    Signed-off-by: Yishai Hadas
    Reviewed-by: Håkon Bugge
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Jason Gunthorpe
     
  • commit 04db1580b5e48a79e24aa51ecae0cd4b2296ec23 upstream.

    A NULL pointer can be returned by in_dev_get(). Thus add a corresponding
    check so that a NULL pointer dereference will be avoided at this place.

    Fixes: 8e06af711bf2 ("i40iw: add main, hdr, status")
    Link: https://lore.kernel.org/r/1577672668-46499-1-git-send-email-xiyuyang19@fudan.edu.cn
    Signed-off-by: Xiyu Yang
    Signed-off-by: Xin Tan
    Reviewed-by: Leon Romanovsky
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Xiyu Yang
     
  • commit a242c36951ecd24bc16086940dbe6b522205c461 upstream.

    In rdma_nl_rcv_skb(), the local variable err is assigned the return value
    of the supplied callback function, which could be one of
    ib_nl_handle_resolve_resp(), ib_nl_handle_set_timeout(), or
    ib_nl_handle_ip_res_resp(). These three functions all return skb->len on
    success.

    rdma_nl_rcv_skb() is merely a copy of netlink_rcv_skb(). The callback
    functions used by the latter have the convention: "Returns 0 on success or
    a negative error code".

    In particular, the statement (equal for both functions):

    if (nlh->nlmsg_flags & NLM_F_ACK || err)

    implies that rdma_nl_rcv_skb() always will ack a message, independent of
    the NLM_F_ACK being set in nlmsg_flags or not.

    The fix could be to change the above statement, but it is better to keep
    the two *_rcv_skb() functions equal in this respect and instead change the
    three callback functions in the rdma subsystem to the correct convention.

    Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink")
    Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload")
    Link: https://lore.kernel.org/r/20191216120436.3204814-1-haakon.bugge@oracle.com
    Suggested-by: Mark Haywood
    Signed-off-by: Håkon Bugge
    Tested-by: Mark Haywood
    Reviewed-by: Leon Romanovsky
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Håkon Bugge
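The effect of the two return conventions on the quoted condition can be sketched as below. This is a user-space illustration, not the kernel code: `should_ack()`, `old_style_cb()` and `new_style_cb()` are hypothetical names, and `NLM_F_ACK` is redefined locally with the value it has in `<linux/netlink.h>`.

```c
#include <stdbool.h>

#define NLM_F_ACK 0x04 /* same value as in <linux/netlink.h> */

/* Mirrors the condition quoted in the commit message. */
static bool should_ack(unsigned int nlmsg_flags, int err)
{
    return (nlmsg_flags & NLM_F_ACK) || err;
}

/* Old convention: return the skb length on success (hypothetical stand-in). */
static int old_style_cb(int skb_len)
{
    return skb_len;
}

/* Corrected convention: 0 on success, negative error code on failure. */
static int new_style_cb(int skb_len)
{
    (void)skb_len;
    return 0;
}
```

With the old convention a successful callback yields a non-zero `err`, so `should_ack()` is true even when `NLM_F_ACK` is not set. With the corrected convention an ack is sent only on request or on a real error.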
     
  • commit ea660ad7c1c476fd6e5e3b17780d47159db71dea upstream.

    Using CX-3 virtual functions, either from a bare-metal machine or
    pass-through from a VM, MAD packets are proxied through the PF driver.

    Since the VF drivers have separate name spaces for MAD Transaction Ids
    (TIDs), the PF driver has to re-map the TIDs and keep the book keeping in
    a cache.

    Following the RDMA Connection Manager (CM) protocol, it is clear when an
    entry has to be evicted from the cache. When a DREP is sent from
    mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similarly,
    when a REJ is received by mlx4_ib_demux_cm_handler(), id_map_find_del()
    is called.

    This function wipes out the TID in use from the IDR or XArray and removes
    the id_map_entry from the table.

    In short, it does everything except the topping of the cake, which is to
    remove the entry from the list and free it. In other words, for the REJ
    case enumerated above, one id_map_entry will be leaked.

    For the other case above, a DREQ has been received first. The reception of
    the DREQ will trigger queuing of a delayed work to delete the
    id_map_entry, for the case where the VM doesn't send back a DREP.

    In the normal case, the VM _will_ send back a DREP, and id_map_find_del()
    will be called.

    But this scenario introduces a secondary leak. First, when the DREQ is
    received, a delayed work is queued. The VM will then return a DREP, which
    will call id_map_find_del(). As stated above, this will free the TID used
    from the XArray or IDR. Now, there is a window where that particular TID
    can be re-allocated, let's say by an outgoing REQ. This TID will later be
    wiped out by the delayed work, when the function id_map_ent_timeout() is
    called. But the id_map_entry allocated by the outgoing REQ will not be
    de-allocated, and we have a leak.

    Both leaks are fixed by removing the id_map_find_del() function and only
    using schedule_delayed(). A check has been added to schedule_delayed() to
    see whether the work has already been queued.

    Another benefit of always using the delayed version for deleting entries
    is that we get a TimeWait effect: a TID no longer in use will occupy
    the XArray or IDR for CM_CLEANUP_CACHE_TIMEOUT time, and cannot be
    re-used during that period.

    Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization")
    Link: https://lore.kernel.org/r/20200123155521.1212288-1-haakon.bugge@oracle.com
    Signed-off-by: Håkon Bugge
    Signed-off-by: Manjunath Patil
    Reviewed-by: Rama Nichanamatlu
    Reviewed-by: Jack Morgenstein
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Håkon Bugge
     
  • commit 0fbb37dd82998b5c83355997b3bdba2806968ac7 upstream.

    Some SRP targets that do not support the SRP-2 specification put garbage
    in the reserved bits of the SRP login response. The problem went
    undetected for a long time because the SRP initiator ignored those bits.
    But now one of them is used as SRP_LOGIN_RSP_IMMED_SUPP, which causes a
    critical error on the target when the initiator sends immediate data.

    The ib_srp module has a use_imm_data parameter to enable or disable
    immediate data manually. But it does not help in the above case, because
    use_imm_data is ignored when handling the SRP login response. The problem
    is definitely caused by a bug on the target side, but the initiator's
    behavior also does not look correct. The initiator should not use
    immediate data if use_imm_data is disabled by a user.

    This commit adds an additional check of use_imm_data when handling the
    SRP login response to avoid unexpected use of immediate data.

    Fixes: 882981f4a411 ("RDMA/srp: Add support for immediate data")
    Link: https://lore.kernel.org/r/20200115133055.30232-1-sergeygo@mellanox.com
    Signed-off-by: Sergey Gorenko
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Sergey Gorenko
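The additional check described above amounts to combining the user's setting with the flag from the login response. A minimal sketch follows, assuming a hypothetical helper `may_use_imm_data()` and a placeholder bit value `LOGIN_RSP_IMMED_SUPP_SKETCH`; neither is taken from the driver source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit for illustration only; the real flag value is not shown here. */
#define LOGIN_RSP_IMMED_SUPP_SKETCH 0x80u

/*
 * Immediate data is used only when the user has enabled it AND the target
 * advertises support in its login response.
 */
static bool may_use_imm_data(bool imm_data_enabled, uint8_t rsp_flags)
{
    return imm_data_enabled && (rsp_flags & LOGIN_RSP_IMMED_SUPP_SKETCH) != 0;
}
```

With this shape, a target that sets the bit by accident cannot force immediate data on an initiator whose user disabled it.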
     
  • commit eaad647e5cc27f7b46a27f3b85b14c4c8a64bffa upstream.

    In procedure mlx4_ib_add_gid(), if the driver is unable to update the FW
    gid table, there is a memory leak in the driver's copy of the gid table:
    the gid entry's context buffer is not freed.

    If such an error occurs, free the entry's context buffer, and mark the
    entry as available (by setting its context pointer to NULL).

    Fixes: e26be1bfef81 ("IB/mlx4: Implement ib_device callbacks")
    Link: https://lore.kernel.org/r/20200115085050.73746-1-leon@kernel.org
    Signed-off-by: Jack Morgenstein
    Reviewed-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Jack Morgenstein
     

11 Feb, 2020

2 commits

  • commit d07de8bd1709a80a282963ad7b2535148678a9e4 upstream.

    The nr_pages argument of get_user_pages_remote() should always be in terms
    of the system page size, not the MR page size. Use PAGE_SIZE instead of
    umem_odp->page_shift.

    Fixes: 403cd12e2cf7 ("IB/umem: Add contiguous ODP support")
    Link: https://lore.kernel.org/r/20191222124649.52300-3-leon@kernel.org
    Signed-off-by: Yishai Hadas
    Reviewed-by: Artemy Kovalyov
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Yishai Hadas
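The difference between counting in MR pages and counting in system pages can be shown with a small calculation. This is a sketch under assumptions: `pages_needed()` is a hypothetical helper, and the system page shift of 12 (4 KiB pages) and the MR page shift of 21 (2 MiB pages) are example values.

```c
#include <stddef.h>

#define SYS_PAGE_SHIFT 12u /* assumed 4 KiB system pages */
#define MR_PAGE_SHIFT  21u /* assumed 2 MiB MR pages */

/* Number of pages of size (1 << shift) needed to cover [off, off + len). */
static size_t pages_needed(size_t off, size_t len, unsigned int shift)
{
    size_t page_size = (size_t)1 << shift;

    return (off + len + page_size - 1) >> shift;
}
```

For a 2 MiB span starting at offset 0, counting with `MR_PAGE_SHIFT` gives 1 while counting with `SYS_PAGE_SHIFT` gives 512, so passing the former where system pages are expected would pin far too few pages.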
     
  • commit b5671afe5e39ed71e94eae788bacdcceec69db09 upstream.

    Commit b0ffeb537f3a ("IB/mlx5: Fix iteration overrun in GSI qps") changed
    the way outstanding WRs are tracked for the GSI QP. But the fix did not
    cover the case where a call to ib_post_send() fails and the index used to
    track outstanding WRs is updated.

    Since the prior commit, outstanding_pi should not be bounded; otherwise
    the loop in generate_completions() will fail.

    Fixes: b0ffeb537f3a ("IB/mlx5: Fix iteration overrun in GSI qps")
    Link: https://lore.kernel.org/r/1576195889-23527-1-git-send-email-psajeepa@purestorage.com
    Signed-off-by: Prabhath Sajeepa
    Acked-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Prabhath Sajeepa
     

29 Jan, 2020

1 commit

  • commit 04060db41178c7c244f2c7dcd913e7fd331de915 upstream.

    iscsit_close_connection() calls isert_wait_conn(). Due to commit
    e9d3009cb936 both functions call target_wait_for_sess_cmds() although that
    last function should be called only once. Fix this by removing the
    target_wait_for_sess_cmds() call from isert_wait_conn() and by only calling
    isert_wait_conn() after target_wait_for_sess_cmds().

    Fixes: e9d3009cb936 ("scsi: target: iscsi: Wait for all commands to finish before freeing a session").
    Link: https://lore.kernel.org/r/20200116044737.19507-1-bvanassche@acm.org
    Reported-by: Rahul Kundu
    Signed-off-by: Bart Van Assche
    Tested-by: Mike Marciniszyn
    Acked-by: Sagi Grimberg
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
     

18 Jan, 2020

9 commits

  • commit e88982ad1bb12db699de96fbc07096359ef6176c upstream.

    The code added by this patch is similar to the code that already exists in
    ibmvscsis_determine_resid(). This patch has been tested by running the
    following command:

    strace sg_raw -r 1k /dev/sdb 12 00 00 00 60 00 -o inquiry.bin |&
    grep resid=

    Link: https://lore.kernel.org/r/20191105214632.183302-1-bvanassche@acm.org
    Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
    Signed-off-by: Bart Van Assche
    Acked-by: Honggang Li
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
     
  • commit 546d30099ed204792083f043cd7e016de86016a3 upstream.

    The value returned from mlx5_mr_cache_alloc() is checked for being an
    error or a real pointer. Return a proper error code instead of NULL,
    which is not checked later.

    Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support")
    Link: https://lore.kernel.org/r/20191029055721.7192-1-leon@kernel.org
    Signed-off-by: Leon Romanovsky
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
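Why a NULL return slips past an error-pointer check can be seen from a user-space re-creation of the kernel's error-pointer helpers. The definitions below are simplified stand-ins written for this sketch, not the kernel headers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative error code in a pointer value. */
static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

/* Recover the error code from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

/* True only for pointers in the top MAX_ERRNO bytes of the address space. */
static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

A caller that tests only `IS_ERR()` treats NULL as a valid pointer, so a function whose callers use that test has to return `ERR_PTR(-ENOMEM)` or a similar code rather than NULL.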
     
  • commit 887803db866a7a4e1817a3cb8a3eee4e9879fed2 upstream.

    The qpc/cqc timer entry size needs to be one page, but it is currently
    hard-coded to 4096, which is not appropriate in 64K page scenarios. So
    it should be changed to PAGE_SIZE.

    Fixes: 0e40dc2f70cd ("RDMA/hns: Add timer allocation support for hip08")
    Link: https://lore.kernel.org/r/1571908917-16220-3-git-send-email-liweihang@hisilicon.com
    Signed-off-by: Yangyang Li
    Signed-off-by: Weihang Li
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Yangyang Li
     
  • commit 5c7e76fb7cb5071be800c938ebf2c475e140d3f0 upstream.

    The SRQ page size configuration of the BA and buffer should depend on the
    current PAGE_SHIFT; otherwise it cannot work with 64K pages.

    Fixes: c7bcb13442e1 ("RDMA/hns: Add SRQ support for hip08 kernel mode")
    Link: https://lore.kernel.org/r/1571908917-16220-2-git-send-email-liweihang@hisilicon.com
    Signed-off-by: Lijun Ou
    Signed-off-by: Weihang Li
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Lijun Ou
     
  • commit d302c6e3a6895608a5856bc708c47bda1770b24d upstream.

    Even if there is no response from hardware, we should make sure that QP
    related resources are released to avoid memory leaks.

    Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC")
    Signed-off-by: Yangyang Li
    Signed-off-by: Weihang Li
    Link: https://lore.kernel.org/r/1570584110-3659-1-git-send-email-liweihang@hisilicon.com
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Yangyang Li
     
  • commit d5b60e26e86a463ca83bb5ec502dda6ea685159e upstream.

    This is not the first attempt to fix building random configurations.
    Unfortunately, the attempt in commit a07fc0bb483e ("RDMA/hns: Fix build
    error") caused a new problem when CONFIG_INFINIBAND_HNS_HIP06=m and
    CONFIG_INFINIBAND_HNS_HIP08=y:

    drivers/infiniband/hw/hns/hns_roce_main.o:(.rodata+0xe60): undefined reference to `__this_module'

    Revert commits a07fc0bb483e ("RDMA/hns: Fix build error") and
    a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment") to get back to
    the previous state, then fix the issues described there differently, by
    adding more specific dependencies: INFINIBAND_HNS can now only be built-in
    if at least one of HNS or HNS3 are built-in, and the individual back-ends
    are only available if that code is reachable from the main driver.

    Fixes: a07fc0bb483e ("RDMA/hns: Fix build error")
    Fixes: a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment")
    Fixes: dd74282df573 ("RDMA/hns: Initialize the PCI device for hip08 RoCE")
    Fixes: 08805fdbeb2d ("RDMA/hns: Split hw v1 driver from hns roce driver")
    Link: https://lore.kernel.org/r/20191007211826.3361202-1-arnd@arndb.de
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
     
  • commit 050dbddf249eee3e936b5734c30b2e1b427efdc3 upstream.

    sin_port and sin6_port are big endian member variables. Convert these port
    numbers into CPU endianness before printing.

    Link: https://lore.kernel.org/r/20190930231707.48259-5-bvanassche@acm.org
    Fixes: 6c52fdc244b5 ("rdma/siw: connection management")
    Signed-off-by: Bart Van Assche
    Reviewed-by: Bernard Metzler
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
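The conversion described above can be sketched in user space with `ntohs()`. The helper name `port_for_printing()` is hypothetical; only the standard socket headers are assumed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * sin_port is stored in network (big-endian) byte order, so convert it to
 * host byte order before handing it to a printf-style formatter.
 */
static unsigned int port_for_printing(const struct sockaddr_in *sa)
{
    return ntohs(sa->sin_port);
}
```

On a little-endian host, printing `sin_port` directly would show the byte-swapped value instead of the real port number.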
     
  • commit 663912a6378a34fd4f43b8d873f0c6c6322d9d0e upstream.

    If auto mode is configured, manual counter allocation and QP bind are not
    allowed.

    Fixes: 1bd8e0a9d0fd ("RDMA/counter: Allow manual mode configuration support")
    Link: https://lore.kernel.org/r/20190916071154.20383-3-leon@kernel.org
    Signed-off-by: Mark Zhang
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Mark Zhang
     
  • commit cfd82da4e741c16d71a12123bf0cb585af2b8796 upstream.

    The restrack function returns EINVAL instead of EMSGSIZE when the driver
    operation fails.

    Fixes: 4b42d05d0b2c ("RDMA/hns: Remove unnecessary kzalloc")
    Signed-off-by: Lang Cheng
    Signed-off-by: Weihang Li
    Link: https://lore.kernel.org/r/1567566885-23088-5-git-send-email-liweihang@hisilicon.com
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Lang Cheng