Eric Lee / smarc-fsl-linux-kernel

05 Mar, 2020

3 commits

c2e2f561d RDMA/hns: Bugfix for posting a wqe with sge ... Browse Code »

commit 468d020e2f02867b8ec561461a1689cd4365e493 upstream.

Driver should first check whether the sge is valid, then fill the valid
sge and the caculated total into hardware, otherwise invalid sges will
cause an error.

Fixes: 52e3b42a2f58 ("RDMA/hns: Filter for zero length of sge in hip08 kernel mode")
Fixes: 7bdee4158b37 ("RDMA/hns: Fill sq wqe context of ud type in hip08")
Link: https://lore.kernel.org/r/1578571852-13704-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lijun Ou
Signed-off-by: Weihang Li
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Lijun Ou
2020-03-05 23:43:49 +0800
3065f5776 RDMA/hns: Simplify the calculation and usage of wqe idx for post verbs ... Browse Code »

commit 4768820243d71d49f1044b3f911ac3d52bdb79af upstream.

Currently, the wqe idx is calculated repeatly everywhere it is used. This
patch defines wqe_idx and calculated it only once, then just use it as
needed.

Fixes: 2d40788825ac ("RDMA/hns: Add support for processing send wr and receive wr")
Link: https://lore.kernel.org/r/1575981902-5274-1-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixian Liu
Signed-off-by: Weihang Li
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Yixian Liu
2020-03-05 23:43:48 +0800
eb62f4c2e RDMA/siw: Remove unwanted WARN_ON in siw_cm_llp_data_ready() ... Browse Code »

[ Upstream commit 663218a3e715fd9339d143a3e10088316b180f4f ]

Warnings like below can fill up the dmesg while disconnecting RDMA
connections.
Hence, remove the unwanted WARN_ON.

WARNING: CPU: 6 PID: 0 at drivers/infiniband/sw/siw/siw_cm.c:1229 siw_cm_llp_data_ready+0xc1/0xd0 [siw]
RIP: 0010:siw_cm_llp_data_ready+0xc1/0xd0 [siw]
Call Trace:

tcp_data_queue+0x226/0xb40
tcp_rcv_established+0x220/0x620
tcp_v4_do_rcv+0x12a/0x1e0
tcp_v4_rcv+0xb05/0xc00
ip_local_deliver_finish+0x69/0x210
ip_local_deliver+0x6b/0xe0
ip_rcv+0x273/0x362
__netif_receive_skb_core+0xb35/0xc30
netif_receive_skb_internal+0x3d/0xb0
napi_gro_frags+0x13b/0x200
t4_ethrx_handler+0x433/0x7d0 [cxgb4]
process_responses+0x318/0x580 [cxgb4]
napi_rx_handler+0x14/0x100 [cxgb4]
net_rx_action+0x149/0x3b0
__do_softirq+0xe3/0x30a
irq_exit+0x100/0x110
do_IRQ+0x7f/0xe0
common_interrupt+0xf/0xf

Link: https://lore.kernel.org/r/20200207141429.27927-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin

Krishnamraju Eraparaju
2020-03-05 23:43:38 +0800

29 Feb, 2020

1 commit

d92e714a4 scsi: Revert "RDMA/isert: Fix a recently introduced regression related to logout" ... Browse Code »

commit 76261ada16dcc3be610396a46d35acc3efbda682 upstream.

Since commit 04060db41178 introduces soft lockups when toggling network
interfaces, revert it.

Link: https://marc.info/?l=target-devel&m=158157054906196
Cc: Rahul Kundu
Cc: Mike Marciniszyn
Cc: Sagi Grimberg
Reported-by: Dakshaja Uppalapati
Fixes: 04060db41178 ("scsi: RDMA/isert: Fix a recently introduced regression related to logout")
Signed-off-by: Bart Van Assche
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman

Bart Van Assche
2020-02-29 00:22:25 +0800

24 Feb, 2020

6 commits

b04235f1e RDMA/mlx5: Don't fake udata for kernel path ... Browse Code »

[ Upstream commit 4835709176e8ccf6561abc9f5c405293e008095f ]

Kernel paths must not set udata and provide NULL pointer,
instead of faking zeroed udata struct.

Signed-off-by: Leon Romanovsky
Signed-off-by: Sasha Levin

Leon Romanovsky
2020-02-24 15:36:51 +0800
75d916c3b IB/hfi1: Add RcvShortLengthErrCnt to hfi1stats ... Browse Code »

[ Upstream commit 2c9d4e26d1ab27ceae2ded2ffe930f8e5f5b2a89 ]

This counter, RxShrErr, is required for error analysis and debug.

Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20200106134235.119356.29123.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan
Signed-off-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin

Mike Marciniszyn
2020-02-24 15:36:45 +0800
9cfe6c21f IB/hfi1: Add software counter for ctxt0 seq drop ... Browse Code »

[ Upstream commit 5ffd048698ea5139743acd45e8ab388a683642b8 ]

All other code paths increment some form of drop counter.

This was missed in the original implementation.

Fixes: 82c2611daaf0 ("staging/rdma/hfi1: Handle packets with invalid RHF on context 0")
Link: https://lore.kernel.org/r/20200106134228.119356.96828.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan
Signed-off-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin

Mike Marciniszyn
2020-02-24 15:36:45 +0800
9d89ff3d2 RDMA/hns: Avoid printing address of mtt page ... Browse Code »

[ Upstream commit eca44507c3e908b7362696a4d6a11d90371334c6 ]

Address of a page shouldn't be printed in case of security issues.

Link: https://lore.kernel.org/r/1578313276-29080-2-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang
Signed-off-by: Weihang Li
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin

Wenpeng Liang
2020-02-24 15:36:44 +0800
d1d92e972 RDMA/rxe: Fix error type of mmap_offset ... Browse Code »

[ Upstream commit 6ca18d8927d468c763571f78c9a7387a69ffa020 ]

The type of mmap_offset should be u64 instead of int to match the type of
mminfo.offset. If otherwise, after we create several thousands of CQs, it
will run into overflow issues.

Link: https://lore.kernel.org/r/20191227113613.5020-1-kejiewei.cn@gmail.com
Signed-off-by: Jiewei Ke
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin

Jiewei Ke
2020-02-24 15:36:42 +0800
9ad79d4fa IB/core: Let IB core distribute cache update events ... Browse Code »

[ Upstream commit 6b57cea9221b0247ad5111b348522625e489a8e4 ]

Currently when the low level driver notifies Pkey, GID, and port change
events they are notified to the registered handlers in the order they are
registered.

IB core and other ULPs such as IPoIB are interested in GID, LID, Pkey
change events.

Since all GID queries done by ULPs are serviced by IB core, and the IB
core deferes cache updates to a work queue, it is possible for other
clients to see stale cache data when they handle their own events.

For example, the below call tree shows how ipoib will call
rdma_query_gid() concurrently with the update to the cache sitting in the
WQ.

mlx5_ib_handle_event()
ib_dispatch_event()
ib_cache_event()
queue_work() -> slow cache update

[..]
ipoib_event()
queue_work()
[..]
work handler
ipoib_ib_dev_flush_light()
__ipoib_ib_dev_flush()
ipoib_dev_addr_changed_valid()
rdma_query_gid()
Signed-off-by: Leon Romanovsky
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin

Parav Pandit
2020-02-24 15:36:26 +0800

20 Feb, 2020

10 commits

ae88de70c RDMA/core: Fix protection fault in get_pkey_idx_qp_list ... Browse Code »

commit 1dd017882e01d2fcd9c5dbbf1eb376211111c393 upstream.

We don't need to set pkey as valid in case that user set only one of pkey
index or port number, otherwise it will be resulted in NULL pointer
dereference while accessing to uninitialized pkey list. The following
crash from Syzkaller revealed it.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 14753 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
RIP: 0010:get_pkey_idx_qp_list+0x161/0x2d0
Code: 01 00 00 49 8b 5e 20 4c 39 e3 0f 84 b9 00 00 00 e8 e4 42 6e fe 48
8d 7b 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 b6 04
02 84 c0 74 08 3c 01 0f 8e d0 00 00 00 48 8d 7d 04 48 b8
RSP: 0018:ffffc9000bc6f950 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff82c8bdec
RDX: 0000000000000002 RSI: ffffc900030a8000 RDI: 0000000000000010
RBP: ffff888112c8ce80 R08: 0000000000000004 R09: fffff5200178df1f
R10: 0000000000000001 R11: fffff5200178df1f R12: ffff888115dc4430
R13: ffff888115da8498 R14: ffff888115dc4410 R15: ffff888115da8000
FS: 00007f20777de700(0000) GS:ffff88811b100000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2f721000 CR3: 00000001173ca002 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
port_pkey_list_insert+0xd7/0x7c0
ib_security_modify_qp+0x6fa/0xfc0
_ib_modify_qp+0x8c4/0xbf0
modify_qp+0x10da/0x16d0
ib_uverbs_modify_qp+0x9a/0x100
ib_uverbs_write+0xaa5/0xdf0
__vfs_write+0x7c/0x100
vfs_write+0x168/0x4a0
ksys_write+0xc8/0x200
do_syscall_64+0x9c/0x390
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs")
Link: https://lore.kernel.org/r/20200212080651.GB679970@unreal
Signed-off-by: Maor Gottlieb
Signed-off-by: Leon Romanovsky
Message-Id:
Signed-off-by: Greg Kroah-Hartman

Leon Romanovsky
2020-02-20 02:53:06 +0800
2c753af06 RDMA/rxe: Fix soft lockup problem due to using tasklets in softirq ... Browse Code »

commit 8ac0e6641c7ca14833a2a8c6f13d8e0a435e535c upstream.

When run stress tests with RXE, the following Call Traces often occur

watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]
...
Call Trace:

create_object+0x3f/0x3b0
kmem_cache_alloc_node_trace+0x129/0x2d0
__kmalloc_reserve.isra.52+0x2e/0x80
__alloc_skb+0x83/0x270
rxe_init_packet+0x99/0x150 [rdma_rxe]
rxe_requester+0x34e/0x11a0 [rdma_rxe]
rxe_do_task+0x85/0xf0 [rdma_rxe]
tasklet_action_common.isra.21+0xeb/0x100
__do_softirq+0xd0/0x298
irq_exit+0xc5/0xd0
smp_apic_timer_interrupt+0x68/0x120
apic_timer_interrupt+0xf/0x20

...

The root cause is that tasklet is actually a softirq. In a tasklet
handler, another softirq handler is triggered. Usually these softirq
handlers run on the same cpu core. So this will cause "soft lockup Bug".

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200212072635.682689-8-leon@kernel.org
Signed-off-by: Zhu Yanjun
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Zhu Yanjun
2020-02-20 02:53:06 +0800
8662e612a RDMA/hfi1: Fix memory leak in _dev_comp_vect_mappings_create ... Browse Code »

commit 8a4f300b978edbbaa73ef9eca660e45eb9f13873 upstream.

Make sure to free the allocated cpumask_var_t's to avoid the following
reported memory leak by kmemleak:

$ cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8897f812d6a8 (size 8):
comm "kworker/1:1", pid 347, jiffies 4294751400 (age 101.703s)
hex dump (first 8 bytes):
00 00 00 00 00 00 00 00 ........
backtrace:
[] alloc_cpumask_var_node+0x4c/0xb0
[] hfi1_comp_vectors_set_up+0x20f/0x800 [hfi1]
[] hfi1_init_dd+0x3311/0x4960 [hfi1]
[] init_one+0x25e/0xf10 [hfi1]
[] local_pci_probe+0xd4/0x180
[] work_for_cpu_fn+0x51/0xa0
[] process_one_work+0x8f0/0x17b0
[] worker_thread+0x536/0xb50
[] kthread+0x30c/0x3d0
[] ret_from_fork+0x3a/0x50

Fixes: 5d18ee67d4c1 ("IB/{hfi1, rdmavt, qib}: Implement CQ completion vector support")
Link: https://lore.kernel.org/r/20200205110530.12129-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib
Reviewed-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Kamal Heib
2020-02-20 02:53:06 +0800
b860a4524 RDMA/iw_cxgb4: initiate CLOSE when entering TERM ... Browse Code »

commit d219face9059f38ad187bde133451a2a308fdb7c upstream.

As per draft-hilland-iwarp-verbs-v1.0, sec 6.2.3, always initiate a CLOSE
when entering into TERM state.

In c4iw_modify_qp(), disconnect operation should only be performed when
the modify_qp call is invoked from ib_core. And all other internal
modify_qp calls(invoked within iw_cxgb4) that needs 'disconnect' should
call c4iw_ep_disconnect() explicitly after modify_qp. Otherwise, deadlocks
like below can occur:

Call Trace:
schedule+0x2f/0xa0
schedule_preempt_disabled+0xa/0x10
__mutex_lock.isra.5+0x2d0/0x4a0
c4iw_ep_disconnect+0x39/0x430 => tries to reacquire ep lock again
c4iw_modify_qp+0x468/0x10d0
rx_data+0x218/0x570 => acquires ep lock
process_work+0x5f/0x70
process_one_work+0x1a7/0x3b0
worker_thread+0x30/0x390
kthread+0x112/0x130
ret_from_fork+0x35/0x40

Fixes: d2c33370ae73 ("RDMA/iw_cxgb4: Always disconnect when QP is transitioning to TERMINATE state")
Link: https://lore.kernel.org/r/20200204091230.7210-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Krishnamraju Eraparaju
2020-02-20 02:53:06 +0800
c60c4b4b6 RDMA/core: Fix invalid memory access in spec_filter_size ... Browse Code »

commit a72f4ac1d778f7bde93dfee69bfc23377ec3d74f upstream.

Add a check that the size specified in the flow spec header doesn't cause
an overflow when calculating the filter size, and thus prevent access to
invalid memory. The following crash from syzkaller revealed it.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 17834 Comm: syz-executor.3 Not tainted 5.5.0-rc5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
RIP: 0010:memchr_inv+0xd3/0x330
Code: 89 f9 89 f5 83 e1 07 0f 85 f9 00 00 00 49 89 d5 49 c1 ed 03 45 85
ed 74 6f 48 89 d9 48 b8 00 00 00 00 00 fc ff df 48 c1 e9 03 3c 01
00 0f 85 0d 02 00 00 44 0f b6 e5 48 b8 01 01 01 01 01 01
RSP: 0018:ffffc9000a13fa50 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 7fff88810de9d820 RCX: 0ffff11021bd3b04
RDX: 000000000000fff8 RSI: 0000000000000000 RDI: 7fff88810de9d820
RBP: 0000000000000000 R08: ffff888110d69018 R09: 0000000000000009
R10: 0000000000000001 R11: ffffed10236267cc R12: 0000000000000004
R13: 0000000000001fff R14: ffff88810de9d820 R15: 0000000000000040
FS: 00007f9ee0e51700(0000) GS:ffff88811b100000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000115ea0006 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
spec_filter_size.part.16+0x34/0x50
ib_uverbs_kern_spec_to_ib_spec_filter+0x691/0x770
ib_uverbs_ex_create_flow+0x9ea/0x1b40
ib_uverbs_write+0xaa5/0xdf0
__vfs_write+0x7c/0x100
vfs_write+0x168/0x4a0
ksys_write+0xc8/0x200
do_syscall_64+0x9c/0x390
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x465b49
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01
f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f9ee0e50c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000465b49
RDX: 00000000000003a0 RSI: 00000000200007c0 RDI: 0000000000000004
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ee0e516bc
R13: 00000000004ca2da R14: 000000000070deb8 R15: 00000000ffffffff
Modules linked in:
Dumping ftrace buffer:
(ftrace buffer empty)

Fixes: 94e03f11ad1f ("IB/uverbs: Add support for flow tag")
Link: https://lore.kernel.org/r/20200126171500.4623-1-leon@kernel.org
Signed-off-by: Avihai Horon
Reviewed-by: Maor Gottlieb
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Avihai Horon
2020-02-20 02:53:06 +0800
8a14f01c4 IB/umad: Fix kernel crash while unloading ib_umad ... Browse Code »

commit 9ea04d0df6e6541c6736b43bff45f1e54875a1db upstream.

When disassociating a device from umad we must ensure that the sysfs
access is prevented before blocking the fops, otherwise assumptions in
syfs don't hold:

CPU0 CPU1
ib_umad_kill_port() ibdev_show()
port->ib_dev = NULL
dev_name(port->ib_dev)

The prior patch made an error in moving the device_destroy(), it should
have been split into device_del() (above) and put_device() (below). At
this point we already have the split, so move the device_del() back to its
original place.

kernel stack
PF: error_code(0x0000) - not-present page
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
RIP: 0010:ibdev_show+0x18/0x50 [ib_umad]
RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000
RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870
RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000
R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40
R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58
FS: 00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0
Call Trace:
dev_attr_show+0x15/0x50
sysfs_kf_seq_show+0xb8/0x1a0
seq_read+0x12d/0x350
vfs_read+0x89/0x140
ksys_read+0x55/0xd0
do_syscall_64+0x55/0x1b0
entry_SYSCALL_64_after_hwframe+0x44/0xa9:

Fixes: cf7ad3030271 ("IB/umad: Avoid destroying device while it is accessed")
Link: https://lore.kernel.org/r/20200212072635.682689-9-leon@kernel.org
Signed-off-by: Yonatan Cohen
Signed-off-by: Leon Romanovsky
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Yonatan Cohen
2020-02-20 02:53:06 +0800
6603342a6 IB/rdmavt: Reset all QPs when the device is shut down ... Browse Code »

commit f92e48718889b3d49cee41853402aa88cac84a6b upstream.

When the hfi1 device is shut down during a system reboot, it is possible
that some QPs might have not not freed by ULPs. More requests could be
post sent and a lingering timer could be triggered to schedule more packet
sends, leading to a crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
IP: [ffffffff810a65f2] __queue_work+0x32/0x3c0
PGD 0
Oops: 0000 1 SMP
Modules linked in: nvmet_rdma(OE) nvmet(OE) nvme(OE) dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) pal_raw(POE) pal_pmt(POE) pal_cache(POE) pal_pile(POE) pal(POE) pal_compatible(OE) rpcrdma sunrpc ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_ssif pcspkr ses enclosure joydev scsi_transport_sas i2c_i801 sg mei_me lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad dm_multipath hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en
sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core crct10dif_pclmul crct10dif_common hfi1(OE) igb crc32c_intel rdmavt(OE) ahci ib_core libahci libata ptp megaraid_sas pps_core dca i2c_algo_bit i2c_core devlink dm_mirror dm_region_hash dm_log dm_mod
CPU: 23 PID: 0 Comm: swapper/23 Tainted: P OE ------------ 3.10.0-693.el7.x86_64 #1
Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0028.121720182203 12/17/2018
task: ffff8808f4ec4f10 ti: ffff8808f4ed8000 task.ti: ffff8808f4ed8000
RIP: 0010:[ffffffff810a65f2] [ffffffff810a65f2] __queue_work+0x32/0x3c0
RSP: 0018:ffff88105df43d48 EFLAGS: 00010046
RAX: 0000000000000086 RBX: 0000000000000086 RCX: 0000000000000000
RDX: ffff880f74e758b0 RSI: 0000000000000000 RDI: 000000000000001f
RBP: ffff88105df43d80 R08: ffff8808f3c583c8 R09: ffff8808f3c58000
R10: 0000000000000002 R11: ffff88105df43da8 R12: ffff880f74e758b0
R13: 000000000000001f R14: 0000000000000000 R15: ffff88105a300000
FS: 0000000000000000(0000) GS:ffff88105df40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000102 CR3: 00000000019f2000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88105b6dd708 0000001f00000286 0000000000000086 ffff88105a300000
ffff880f74e75800 0000000000000000 ffff88105a300000 ffff88105df43d98
ffffffff810a6b85 ffff88105a301e80 ffff88105df43dc8 ffffffffc0224cde
Call Trace:
IRQ

[ffffffff810a6b85] queue_work_on+0x45/0x50
[ffffffffc0224cde] _hfi1_schedule_send+0x6e/0xc0 [hfi1]
[ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
[ffffffffc0224d62] hfi1_schedule_send+0x32/0x70 [hfi1]
[ffffffffc0170644] rvt_rc_timeout+0xd4/0x120 [rdmavt]
[ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
[ffffffff81097316] call_timer_fn+0x36/0x110
[ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
[ffffffff8109982d] run_timer_softirq+0x22d/0x310
[ffffffff81090b3f] __do_softirq+0xef/0x280
[ffffffff816b6a5c] call_softirq+0x1c/0x30
[ffffffff8102d3c5] do_softirq+0x65/0xa0
[ffffffff81090ec5] irq_exit+0x105/0x110
[ffffffff816b76c2] smp_apic_timer_interrupt+0x42/0x50
[ffffffff816b5c1d] apic_timer_interrupt+0x6d/0x80
EOI

[ffffffff81527a02] ? cpuidle_enter_state+0x52/0xc0
[ffffffff81527b48] cpuidle_idle_call+0xd8/0x210
[ffffffff81034fee] arch_cpu_idle+0xe/0x30
[ffffffff810e7bca] cpu_startup_entry+0x14a/0x1c0
[ffffffff81051af6] start_secondary+0x1b6/0x230
Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 be 02 00 00 41 f6 86 02 01 00 00 01 0f 85 58 02 00 00 49 c7 c7 28 19 01 00
RIP [ffffffff810a65f2] __queue_work+0x32/0x3c0
RSP ffff88105df43d48
CR2: 0000000000000102

The solution is to reset the QPs before the device resources are freed.
This reset will change the QP state to prevent post sends and delete
timers to prevent callbacks.

Fixes: 0acb0cc7ecc1 ("IB/rdmavt: Initialize and teardown of qpn table")
Link: https://lore.kernel.org/r/20200210131040.87408.38161.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn
Signed-off-by: Kaike Wan
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Kaike Wan
2020-02-20 02:53:05 +0800
b16dfda32 IB/hfi1: Close window for pq and request coliding ... Browse Code »

commit be8638344c70bf492963ace206a9896606b6922d upstream.

Cleaning up a pq can result in the following warning and panic:

WARNING: CPU: 52 PID: 77418 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0
list_del corruption, ffff88cb2c6ac068->next is LIST_POISON1 (dead000000000100)
Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit]
CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G OE ------------ 3.10.0-957.38.3.el7.x86_64 #1
Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
Call Trace:
[] dump_stack+0x19/0x1b
[] __warn+0xd8/0x100
[] warn_slowpath_fmt+0x5f/0x80
[] __list_del_entry+0x63/0xd0
[] list_del+0xd/0x30
[] kmem_cache_destroy+0x50/0x110
[] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1]
[] hfi1_file_close+0x70/0x1e0 [hfi1]
[] __fput+0xec/0x260
[] ____fput+0xe/0x10
[] task_work_run+0xbb/0xe0
[] do_notify_resume+0xa5/0xc0
[] int_signal+0x12/0x17
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [] kmem_cache_close+0x7e/0x300
PGD 2cdab19067 PUD 2f7bfdb067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit]
CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.38.3.el7.x86_64 #1
Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
task: ffff88cc26db9040 ti: ffff88b5393a8000 task.ti: ffff88b5393a8000
RIP: 0010:[] [] kmem_cache_close+0x7e/0x300
RSP: 0018:ffff88b5393abd60 EFLAGS: 00010287
RAX: 0000000000000000 RBX: ffff88cb2c6ac000 RCX: 0000000000000003
RDX: 0000000000000400 RSI: 0000000000000400 RDI: ffffffff9095b800
RBP: ffff88b5393abdb0 R08: ffffffff9095b808 R09: ffffffff8ff77c19
R10: ffff88b73ce1f160 R11: ffffddecddde9800 R12: ffff88cb2c6ac000
R13: 000000000000000c R14: ffff88cf3fdca780 R15: 0000000000000000
FS: 00002aaaaab52500(0000) GS:ffff88b73ce00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 0000002d27664000 CR4: 00000000007607e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
[] __kmem_cache_shutdown+0x14/0x80
[] kmem_cache_destroy+0x58/0x110
[] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1]
[] hfi1_file_close+0x70/0x1e0 [hfi1]
[] __fput+0xec/0x260
[] ____fput+0xe/0x10
[] task_work_run+0xbb/0xe0
[] do_notify_resume+0xa5/0xc0
[] int_signal+0x12/0x17
Code: 00 00 ba 00 04 00 00 0f 4f c2 3d 00 04 00 00 89 45 bc 0f 84 e7 01 00 00 48 63 45 bc 49 8d 04 c4 48 89 45 b0 48 8b 80 c8 00 00 00 8b 78 10 48 89 45 c0 48 83 c0 10 48 89 45 d0 48 8b 17 48 39
RIP [] kmem_cache_close+0x7e/0x300
RSP
CR2: 0000000000000010

The panic is the result of slab entries being freed during the destruction
of the pq slab.

The code attempts to quiesce the pq, but looking for n_req == 0 doesn't
account for new requests.

Fix the issue by using SRCU to get a pq pointer and adjust the pq free
logic to NULL the fd pq pointer prior to the quiesce.

Fixes: e87473bc1b6c ("IB/hfi1: Only set fd pointer when base context is completely initialized")
Link: https://lore.kernel.org/r/20200210131033.87408.81174.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan
Signed-off-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Mike Marciniszyn
2020-02-20 02:53:05 +0800
327f33e54 IB/hfi1: Acquire lock to release TID entries when user file is closed ... Browse Code »

commit a70ed0f2e6262e723ae8d70accb984ba309eacc2 upstream.

Each user context is allocated a certain number of RcvArray (TID)
entries and these entries are managed through TID groups. These groups
are put into one of three lists in each user context: tid_group_list,
tid_used_list, and tid_full_list, depending on the number of used TID
entries within each group. When TID packets are expected, one or more
TID groups will be allocated. After the packets are received, the TID
groups will be freed. Since multiple user threads may access the TID
groups simultaneously, a mutex exp_mutex is used to synchronize the
access. However, when the user file is closed, it tries to release
all TID groups without acquiring the mutex first, which risks a race
condition with another thread that may be releasing its TID groups,
leading to data corruption.

This patch addresses the issue by acquiring the mutex first before
releasing the TID groups when the file is closed.

Fixes: 3abb33ac6521 ("staging/hfi1: Add TID cache receive init and free funcs")
Link: https://lore.kernel.org/r/20200210131026.87408.86853.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn
Signed-off-by: Kaike Wan
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Kaike Wan
2020-02-20 02:53:05 +0800
e30e30c04 IB/mlx5: Return failure when rts2rts_qp_counters_set_id is not supported ... Browse Code »

commit 10189e8e6fe8dcde13435f9354800429c4474fb1 upstream.

When binding a QP with a counter and the QP state is not RESET, return
failure if the rts2rts_qp_counters_set_id is not supported by the
device.

This is to prevent cases like manual bind for Connect-IB devices from
returning success when the feature is not supported.

Fixes: d14133dd4161 ("IB/mlx5: Support set qp counter")
Link: https://lore.kernel.org/r/20200126171708.5167-1-leon@kernel.org
Signed-off-by: Mark Zhang
Reviewed-by: Maor Gottlieb
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Mark Zhang
2020-02-20 02:53:05 +0800

15 Feb, 2020

8 commits

21702236f RDMA/umem: Fix ib_umem_find_best_pgsz() ... Browse Code »

commit 36798d5ae1af62e830c5e045b2e41ce038690c61 upstream.

Except for the last entry, the ending iova alignment sets the maximum
possible page size as the low bits of the iova must be zero when starting
the next chunk.

Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR")
Link: https://lore.kernel.org/r/20200128135612.174820-1-leon@kernel.org
Signed-off-by: Artemy Kovalyov
Signed-off-by: Leon Romanovsky
Tested-by: Gal Pressman
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Artemy Kovalyov
2020-02-15 05:34:08 +0800
56b22525a RDMA/cma: Fix unbalanced cm_id reference count during address resolve ... Browse Code »

commit b4fb4cc5ba83b20dae13cef116c33648e81d2f44 upstream.

Below commit missed the AF_IB and loopback code flow in
rdma_resolve_addr(). This leads to an unbalanced cm_id refcount in
cma_work_handler() which puts the refcount which was not incremented prior
to queuing the work.

A call trace is observed with such code flow:

BUG: unable to handle kernel NULL pointer dereference at (null)
[] __mutex_lock_slowpath+0x166/0x1d0
[] mutex_lock+0x1f/0x2f
[] cma_work_handler+0x25/0xa0
[] process_one_work+0x17f/0x440
[] worker_thread+0x126/0x3c0

Hence, hold the cm_id reference when scheduling the resolve work item.

Fixes: 722c7b2bfead ("RDMA/{cma, core}: Avoid callback on rdma_addr_cancel()")
Link: https://lore.kernel.org/r/20200126142652.104803-2-leon@kernel.org
Signed-off-by: Parav Pandit
Signed-off-by: Leon Romanovsky
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Parav Pandit
2020-02-15 05:34:08 +0800
789236751 RDMA/core: Fix locking in ib_uverbs_event_read ... Browse Code »

commit 14e23bd6d22123f6f3b2747701fa6cd4c6d05873 upstream.

This should not be using ib_dev to test for disassociation, during
disassociation is_closed is set under lock and the waitq is triggered.

Instead check is_closed and be sure to re-obtain the lock to test the
value after the wait_event returns.

Fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications")
Link: https://lore.kernel.org/r/1578504126-9400-12-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas
Reviewed-by: Håkon Bugge
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Jason Gunthorpe
2020-02-15 05:34:08 +0800
33daaea78 RDMA/i40iw: fix a potential NULL pointer dereference ... Browse Code »

commit 04db1580b5e48a79e24aa51ecae0cd4b2296ec23 upstream.

A NULL pointer can be returned by in_dev_get(). Thus add a corresponding
check so that a NULL pointer dereference will be avoided at this place.

Fixes: 8e06af711bf2 ("i40iw: add main, hdr, status")
Link: https://lore.kernel.org/r/1577672668-46499-1-git-send-email-xiyuyang19@fudan.edu.cn
Signed-off-by: Xiyu Yang
Signed-off-by: Xin Tan
Reviewed-by: Leon Romanovsky
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Xiyu Yang
2020-02-15 05:34:08 +0800
b1f90d263 RDMA/netlink: Do not always generate an ACK for some netlink operations ... Browse Code »

commit a242c36951ecd24bc16086940dbe6b522205c461 upstream.

In rdma_nl_rcv_skb(), the local variable err is assigned the return value
of the supplied callback function, which could be one of
ib_nl_handle_resolve_resp(), ib_nl_handle_set_timeout(), or
ib_nl_handle_ip_res_resp(). These three functions all return skb->len on
success.

rdma_nl_rcv_skb() is merely a copy of netlink_rcv_skb(). The callback
functions used by the latter have the convention: "Returns 0 on success or
a negative error code".

In particular, the statement (equal for both functions):

if (nlh->nlmsg_flags & NLM_F_ACK || err)

implies that rdma_nl_rcv_skb() always will ack a message, independent of
the NLM_F_ACK being set in nlmsg_flags or not.

The fix could be to change the above statement, but it is better to keep
the two *_rcv_skb() functions equal in this respect and instead change the
three callback functions in the rdma subsystem to the correct convention.

Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink")
Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload")
Link: https://lore.kernel.org/r/20191216120436.3204814-1-haakon.bugge@oracle.com
Suggested-by: Mark Haywood
Signed-off-by: Håkon Bugge
Tested-by: Mark Haywood
Reviewed-by: Leon Romanovsky
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Håkon Bugge
2020-02-15 05:34:08 +0800
839fb9e04 IB/mlx4: Fix leak in id_map_find_del ... Browse Code »

commit ea660ad7c1c476fd6e5e3b17780d47159db71dea upstream.

Using CX-3 virtual functions, either from a bare-metal machine or
pass-through from a VM, MAD packets are proxied through the PF driver.

Since the VF drivers have separate name spaces for MAD Transaction Ids
(TIDs), the PF driver has to re-map the TIDs and keep the book keeping in
a cache.

Following the RDMA Connection Manager (CM) protocol, it is clear when an
entry has to evicted from the cache. When a DREP is sent from
mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similar when
a REJ is received by the mlx4_ib_demux_cm_handler(), id_map_find_del() is
called.

This function wipes out the TID in use from the IDR or XArray and removes
the id_map_entry from the table.

In short, it does everything except the topping of the cake, which is to
remove the entry from the list and free it. In other words, for the REJ
case enumerated above, one id_map_entry will be leaked.

For the other case above, a DREQ has been received first. The reception of
the DREQ will trigger queuing of a delayed work to delete the
id_map_entry, for the case where the VM doesn't send back a DREP.

In the normal case, the VM _will_ send back a DREP, and id_map_find_del()
will be called.

But this scenario introduces a secondary leak. First, when the DREQ is
received, a delayed work is queued. The VM will then return a DREP, which
will call id_map_find_del(). As stated above, this will free the TID used
from the XArray or IDR. Now, there is window where that particular TID can
be re-allocated, lets say by an outgoing REQ. This TID will later be wiped
out by the delayed work, when the function id_map_ent_timeout() is
called. But the id_map_entry allocated by the outgoing REQ will not be
de-allocated, and we have a leak.

Both leaks are fixed by removing the id_map_find_del() function and only
using schedule_delayed(). Of course, a check in schedule_delayed() to see
if the work already has been queued, has been added.

Another benefit of always using the delayed version for deleting entries,
is that we do get a TimeWait effect; a TID no longer in use, will occupy
the XArray or IDR for CM_CLEANUP_CACHE_TIMEOUT time, without any ability
of being re-used for that time period.

Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200123155521.1212288-1-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge
Signed-off-by: Manjunath Patil
Reviewed-by: Rama Nichanamatlu
Reviewed-by: Jack Morgenstein
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Håkon Bugge
2020-02-15 05:34:08 +0800
996dc3d50 IB/srp: Never use immediate data if it is disabled by a user ... Browse Code »

commit 0fbb37dd82998b5c83355997b3bdba2806968ac7 upstream.

Some SRP targets that do not support specification SRP-2, put the garbage
to the reserved bits of the SRP login response. The problem was not
detected for a long time because the SRP initiator ignored those bits. But
now one of them is used as SRP_LOGIN_RSP_IMMED_SUPP. And it causes a
critical error on the target when the initiator sends immediate data.

The ib_srp module has a use_imm_date parameter to enable or disable
immediate data manually. But it does not help in the above case, because
use_imm_date is ignored at handling the SRP login response. The problem is
definitely caused by a bug on the target side, but the initiator's
behavior also does not look correct. The initiator should not use
immediate data if use_imm_date is disabled by a user.

This commit adds an additional checking of use_imm_date at the handling of
SRP login response to avoid unexpected use of immediate data.

Fixes: 882981f4a411 ("RDMA/srp: Add support for immediate data")
Link: https://lore.kernel.org/r/20200115133055.30232-1-sergeygo@mellanox.com
Signed-off-by: Sergey Gorenko
Reviewed-by: Bart Van Assche
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Sergey Gorenko
2020-02-15 05:34:07 +0800
56f5f41e8 IB/mlx4: Fix memory leak in add_gid error flow ... Browse Code »

commit eaad647e5cc27f7b46a27f3b85b14c4c8a64bffa upstream.

In procedure mlx4_ib_add_gid(), if the driver is unable to update the FW
gid table, there is a memory leak in the driver's copy of the gid table:
the gid entry's context buffer is not freed.

If such an error occurs, free the entry's context buffer, and mark the
entry as available (by setting its context pointer to NULL).

Fixes: e26be1bfef81 ("IB/mlx4: Implement ib_device callbacks")
Link: https://lore.kernel.org/r/20200115085050.73746-1-leon@kernel.org
Signed-off-by: Jack Morgenstein
Reviewed-by: Parav Pandit
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Jack Morgenstein
2020-02-15 05:34:07 +0800

11 Feb, 2020

2 commits

0d1dacfda IB/core: Fix ODP get user pages flow ... Browse Code »

commit d07de8bd1709a80a282963ad7b2535148678a9e4 upstream.

The nr_pages argument of get_user_pages_remote() should always be in terms
of the system page size, not the MR page size. Use PAGE_SIZE instead of
umem_odp->page_shift.

Fixes: 403cd12e2cf7 ("IB/umem: Add contiguous ODP support")
Link: https://lore.kernel.org/r/20191222124649.52300-3-leon@kernel.org
Signed-off-by: Yishai Hadas
Reviewed-by: Artemy Kovalyov
Reviewed-by: Jason Gunthorpe
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Yishai Hadas
2020-02-11 20:35:46 +0800
320a24fae IB/mlx5: Fix outstanding_pi index for GSI qps ... Browse Code »

commit b5671afe5e39ed71e94eae788bacdcceec69db09 upstream.

Commit b0ffeb537f3a ("IB/mlx5: Fix iteration overrun in GSI qps") changed
the way outstanding WRs are tracked for the GSI QP. But the fix did not
cover the case when a call to ib_post_send() fails and updates index to
track outstanding.

Since the prior commmit outstanding_pi should not be bounded otherwise the
loop generate_completions() will fail.

Fixes: b0ffeb537f3a ("IB/mlx5: Fix iteration overrun in GSI qps")
Link: https://lore.kernel.org/r/1576195889-23527-1-git-send-email-psajeepa@purestorage.com
Signed-off-by: Prabhath Sajeepa
Acked-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Prabhath Sajeepa
2020-02-11 20:35:46 +0800

29 Jan, 2020

1 commit

3c6a183d3 scsi: RDMA/isert: Fix a recently introduced regression related to logout ... Browse Code »

commit 04060db41178c7c244f2c7dcd913e7fd331de915 upstream.

iscsit_close_connection() calls isert_wait_conn(). Due to commit
e9d3009cb936 both functions call target_wait_for_sess_cmds() although that
last function should be called only once. Fix this by removing the
target_wait_for_sess_cmds() call from isert_wait_conn() and by only calling
isert_wait_conn() after target_wait_for_sess_cmds().

Fixes: e9d3009cb936 ("scsi: target: iscsi: Wait for all commands to finish before freeing a session").
Link: https://lore.kernel.org/r/20200116044737.19507-1-bvanassche@acm.org
Reported-by: Rahul Kundu
Signed-off-by: Bart Van Assche
Tested-by: Mike Marciniszyn
Acked-by: Sagi Grimberg
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman

Bart Van Assche
2020-01-29 23:45:30 +0800

18 Jan, 2020

9 commits

8efa2de5f RDMA/srpt: Report the SCSI residual to the initiator ... Browse Code »

commit e88982ad1bb12db699de96fbc07096359ef6176c upstream.

The code added by this patch is similar to the code that already exists in
ibmvscsis_determine_resid(). This patch has been tested by running the
following command:

strace sg_raw -r 1k /dev/sdb 12 00 00 00 60 00 -o inquiry.bin |&
grep resid=

Link: https://lore.kernel.org/r/20191105214632.183302-1-bvanassche@acm.org
Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche
Acked-by: Honggang Li
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Bart Van Assche
2020-01-18 02:48:40 +0800
d73bd8a7b RDMA/mlx5: Return proper error value ... Browse Code »

commit 546d30099ed204792083f043cd7e016de86016a3 upstream.

Returned value from mlx5_mr_cache_alloc() is checked to be error or real
pointer. Return proper error code instead of NULL which is not checked
later.

Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191029055721.7192-1-leon@kernel.org
Signed-off-by: Leon Romanovsky
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Leon Romanovsky
2020-01-18 02:48:39 +0800
86933e7e6 RDMA/hns: Bugfix for qpc/cqc timer configuration ... Browse Code »

commit 887803db866a7a4e1817a3cb8a3eee4e9879fed2 upstream.

qpc/cqc timer entry size needs one page, but currently they are fixedly
configured to 4096, which is not appropriate in 64K page scenarios. So
they should be modified to PAGE_SIZE.

Fixes: 0e40dc2f70cd ("RDMA/hns: Add timer allocation support for hip08")
Link: https://lore.kernel.org/r/1571908917-16220-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yangyang Li
Signed-off-by: Weihang Li
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Yangyang Li
2020-01-18 02:48:39 +0800
fe714eaf4 RDMA/hns: Fix to support 64K page for srq ... Browse Code »

commit 5c7e76fb7cb5071be800c938ebf2c475e140d3f0 upstream.

SRQ's page size configuration of BA and buffer should depend on current
PAGE_SHIFT, or it can't work in scenario of 64K page.

Fixes: c7bcb13442e1 ("RDMA/hns: Add SRQ support for hip08 kernel mode")
Link: https://lore.kernel.org/r/1571908917-16220-2-git-send-email-liweihang@hisilicon.com
Signed-off-by: Lijun Ou
Signed-off-by: Weihang Li
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Lijun Ou
2020-01-18 02:48:39 +0800
c249fb6c1 RDMA/hns: Release qp resources when failed to destroy qp ... Browse Code »

commit d302c6e3a6895608a5856bc708c47bda1770b24d upstream.

Even if no response from hardware, we should make sure that qp related
resources are released to avoid memory leaks.

Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC")
Signed-off-by: Yangyang Li
Signed-off-by: Weihang Li
Link: https://lore.kernel.org/r/1570584110-3659-1-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford
Signed-off-by: Greg Kroah-Hartman

Yangyang Li
2020-01-18 02:48:37 +0800
89d316d80 RDMA/hns: Fix build error again ... Browse Code »

commit d5b60e26e86a463ca83bb5ec502dda6ea685159e upstream.

This is not the first attempt to fix building random configurations,
unfortunately the attempt in commit a07fc0bb483e ("RDMA/hns: Fix build
error") caused a new problem when CONFIG_INFINIBAND_HNS_HIP06=m and
CONFIG_INFINIBAND_HNS_HIP08=y:

drivers/infiniband/hw/hns/hns_roce_main.o:(.rodata+0xe60): undefined reference to `__this_module'

Revert commits a07fc0bb483e ("RDMA/hns: Fix build error") and
a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment") to get back to
the previous state, then fix the issues described there differently, by
adding more specific dependencies: INFINIBAND_HNS can now only be built-in
if at least one of HNS or HNS3 are built-in, and the individual back-ends
are only available if that code is reachable from the main driver.

Fixes: a07fc0bb483e ("RDMA/hns: Fix build error")
Fixes: a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment")
Fixes: dd74282df573 ("RDMA/hns: Initialize the PCI device for hip08 RoCE")
Fixes: 08805fdbeb2d ("RDMA/hns: Split hw v1 driver from hns roce driver")
Link: https://lore.kernel.org/r/20191007211826.3361202-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Arnd Bergmann
2020-01-18 02:48:37 +0800
219e92c25 RDMA/siw: Fix port number endianness in a debug message ... Browse Code »

commit 050dbddf249eee3e936b5734c30b2e1b427efdc3 upstream.

sin_port and sin6_port are big endian member variables. Convert these port
numbers into CPU endianness before printing.

Link: https://lore.kernel.org/r/20190930231707.48259-5-bvanassche@acm.org
Fixes: 6c52fdc244b5 ("rdma/siw: connection management")
Signed-off-by: Bart Van Assche
Reviewed-by: Bernard Metzler
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Bart Van Assche
2020-01-18 02:48:37 +0800
0e990da9b RDMA/counter: Prevent QP counter manual binding in auto mode ... Browse Code »

commit 663912a6378a34fd4f43b8d873f0c6c6322d9d0e upstream.

If auto mode is configured, manual counter allocation and QP bind is not
allowed.

Fixes: 1bd8e0a9d0fd ("RDMA/counter: Allow manual mode configuration support")
Link: https://lore.kernel.org/r/20190916071154.20383-3-leon@kernel.org
Signed-off-by: Mark Zhang
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Mark Zhang
2020-01-18 02:48:37 +0800
8328cd684 RDMA/hns: Modify return value of restrack functions ... Browse Code »

commit cfd82da4e741c16d71a12123bf0cb585af2b8796 upstream.

The restrack function return EINVAL instead of EMSGSIZE when the driver
operation fails.

Fixes: 4b42d05d0b2c ("RDMA/hns: Remove unnecessary kzalloc")
Signed-off-by: Lang Cheng
Signed-off-by: Weihang Li
Link: https://lore.kernel.org/r/1567566885-23088-5-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford
Signed-off-by: Greg Kroah-Hartman

Lang Cheng
2020-01-18 02:48:36 +0800