Eric Lee / smarc-fsl-linux-kernel

20 Oct, 2018

1 commit

6edd85a78 IB/hfi1: Fix destroy_qp hang after a link down ... Browse Code »

commit b4a4957d3d1c328b733fce783b7264996f866ad2 upstream.

rvt_destroy_qp() cannot complete until all in process packets have
been released from the underlying hardware. If a link down event
occurs, an application can hang with a kernel stack similar to:

cat /proc//stack
quiesce_qp+0x178/0x250 [hfi1]
rvt_reset_qp+0x23d/0x400 [rdmavt]
rvt_destroy_qp+0x69/0x210 [rdmavt]
ib_destroy_qp+0xba/0x1c0 [ib_core]
nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
process_one_work+0x17a/0x440
worker_thread+0x126/0x3c0
kthread+0xcf/0xe0
ret_from_fork+0x58/0x90
0xffffffffffffffff

quiesce_qp() waits until all outstanding packets have been freed.
This wait should be momentary. During a link down event, the cleanup
handling does not ensure that all packets caught by the link down are
flushed properly.

This is caused by the fact that the freeze path and the link down
event is handled the same. This is not correct. The freeze path
waits until the HFI is unfrozen and then restarts PIO. A link down
is not a freeze event. The link down path cannot restart the PIO
until link is restored. If the PIO path is restarted before the link
comes up, the application (QP) using the PIO path will hang (until
link is restored).

Fix by separating the linkdown path from the freeze path and use the
link down path for link down events.

Close a race condition sc_disable() by acquiring both the progress
and release locks.

Close a race condition in sc_stop() by moving the setting of the flag
bits under the alloc lock.

Cc: # 4.9.x+
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Michael J. Ruhl
2018-10-20 15:48:54 +0800

13 Oct, 2018

1 commit

5656b7354 ucma: fix a use-after-free in ucma_resolve_ip() ... Browse Code »

commit 5fe23f262e0548ca7f19fb79f89059a60d087d22 upstream.

There is a race condition between ucma_close() and ucma_resolve_ip():

CPU0 CPU1
ucma_resolve_ip(): ucma_close():

ctx = ucma_get_ctx(file, cmd.id);

list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
mutex_lock(&mut);
idr_remove(&ctx_idr, ctx->id);
mutex_unlock(&mut);
...
mutex_lock(&mut);
if (!ctx->closing) {
mutex_unlock(&mut);
rdma_destroy_id(ctx->cm_id);
...
ucma_free_ctx(ctx);

ret = rdma_resolve_addr();
ucma_put_ctx(ctx);

Before idr_remove(), ucma_get_ctx() could still find the ctx
and after rdma_destroy_id(), rdma_resolve_addr() may still
access id_priv pointer. Also, ucma_put_ctx() may use ctx after
ucma_free_ctx() too.

ucma_close() should call ucma_put_ctx() too which tests the
refcnt and waits for the last one releasing it. The similar
pattern is already used by ucma_destroy_id().

Reported-and-tested-by: syzbot+da2591e115d57a9cbb8b@syzkaller.appspotmail.com
Reported-by: syzbot+cfe3c1e8ef634ba8964b@syzkaller.appspotmail.com
Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Leon Romanovsky
Signed-off-by: Cong Wang
Reviewed-by: Leon Romanovsky
Signed-off-by: Doug Ledford
Signed-off-by: Greg Kroah-Hartman

Cong Wang
2018-10-13 15:27:29 +0800

10 Oct, 2018

1 commit

d4da71220 RDMA/ucma: check fd type in ucma_migrate_id() ... Browse Code »

[ Upstream commit 0d23ba6034b9cf48b8918404367506da3e4b3ee5 ]

The current code grabs the private_data of whatever file descriptor
userspace has supplied and implicitly casts it to a `struct ucma_file *`,
potentially causing a type confusion.

This is probably fine in practice because the pointer is only used for
comparisons, it is never actually dereferenced; and even in the
comparisons, it is unlikely that a file from another filesystem would have
a ->private_data pointer that happens to also be valid in this context.
But ->private_data is not always guaranteed to be a valid pointer to an
object owned by the file's filesystem; for example, some filesystems just
cram numbers in there.

Check the type of the supplied file descriptor to be safe, analogous to how
other places in the kernel do it.

Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()")
Signed-off-by: Jann Horn
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Jann Horn
2018-10-10 14:54:24 +0800

04 Oct, 2018

9 commits

105470069 RDMA/uverbs: Atomically flush and mark closed the comp event queue ... Browse Code »

commit 67e3816842fe6414d629c7515b955952ec40c7d7 upstream.

Currently a uverbs completion event queue is flushed of events in
ib_uverbs_comp_event_close() with the queue spinlock held and then
released. Yet setting ev_queue->is_closed is not set until later in
uverbs_hot_unplug_completion_event_file().

In between the time ib_uverbs_comp_event_close() releases the lock and
uverbs_hot_unplug_completion_event_file() acquires the lock, a completion
event can arrive and be inserted into the event queue by
ib_uverbs_comp_handler().

This can cause a "double add" list_add warning or crash depending on the
kernel configuration, or a memory leak because the event is never dequeued
since the queue is already closed down.

So add setting ev_queue->is_closed = 1 to ib_uverbs_comp_event_close().

Cc: stable@vger.kernel.org
Fixes: 1e7710f3f656 ("IB/core: Change completion channel to use the reworked objects schema")
Signed-off-by: Steve Wise
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Steve Wise
2018-10-04 08:00:56 +0800
693536a7c IB/hfi1: Fix context recovery when PBC has an UnsupportedVL ... Browse Code »

commit d623500b3c4efd8d4e945ac9003c6b87b469a9ab upstream.

If a packet stream uses an UnsupportedVL (virtual lane), the send
engine will not send the packet, and it will not indicate that an
error has occurred. This will cause the packet stream to block.

HFI has 8 virtual lanes available for packet streams. Each lane can
be enabled or disabled using the UnsupportedVL mask. If a lane is
disabled, adding a packet to the send context must be disallowed.

The current mask for determining unsupported VLs defaults to 0 (allow
all). This is incorrect. Only the VLs that are defined should be
allowed.

Determine which VLs are disabled (mtu == 0), and set the appropriate
unsupported bit in the mask. The correct mask will allow the send
engine to error on the invalid VL, and error recovery will work
correctly.

Cc: # 4.9.x+
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn
Reviewed-by: Lukasz Odzioba
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Michael J. Ruhl
2018-10-04 08:00:56 +0800
412a4b4db IB/hfi1: Invalid user input can result in crash ... Browse Code »

commit 94694d18cf27a6faad91487a38ce516c2b16e7d9 upstream.

If the number of packets in a user sdma request does not match
the actual iovectors being sent, sdma_cleanup can be called on
an uninitialized request structure, resulting in a crash similar
to this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [] __sdma_txclean+0x57/0x1e0 [hfi1]
PGD 8000001044f61067 PUD 1052706067 PMD 0
Oops: 0000 [#1] SMP
CPU: 30 PID: 69912 Comm: upsm Kdump: loaded Tainted: G OE
------------ 3.10.0-862.el7.x86_64 #1
Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS
SE5C610.86B.01.01.0019.101220160604 10/12/2016
task: ffff8b331c890000 ti: ffff8b2ed1f98000 task.ti: ffff8b2ed1f98000
RIP: 0010:[] [] __sdma_txclean+0x57/0x1e0
[hfi1]
RSP: 0018:ffff8b2ed1f9bab0 EFLAGS: 00010286
RAX: 0000000000008b2b RBX: ffff8b2adf6e0000 RCX: 0000000000000000
RDX: 00000000000000a0 RSI: ffff8b2e9eedc540 RDI: ffff8b2adf6e0000
RBP: ffff8b2ed1f9bad8 R08: 0000000000000000 R09: ffffffffc0b04a06
R10: ffff8b331c890190 R11: ffffe6ed00bf1840 R12: ffff8b3315480000
R13: ffff8b33154800f0 R14: 00000000fffffff2 R15: ffff8b2e9eedc540
FS: 00007f035ac47740(0000) GS:ffff8b331e100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000c03fe6000 CR4: 00000000001607e0
Call Trace:
[] user_sdma_send_pkts+0xdcd/0x1990 [hfi1]
[] ? gup_pud_range+0x140/0x290
[] ? hfi1_mmu_rb_insert+0x155/0x1b0 [hfi1]
[] hfi1_user_sdma_process_request+0xc5b/0x11b0 [hfi1]
[] hfi1_aio_write+0xba/0x110 [hfi1]
[] do_sync_readv_writev+0x7b/0xd0
[] do_readv_writev+0xce/0x260
[] ? tty_ldisc_deref+0x19/0x20
[] ? n_tty_ioctl+0xe0/0xe0
[] vfs_writev+0x35/0x60
[] SyS_writev+0x7f/0x110
[] system_call_fastpath+0x1c/0x21
Code: 06 49 c7 47 18 00 00 00 00 0f 87 89 01 00 00 5b 41 5c 41 5d 41 5e 41 5f
5d c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 4e 10 48 89 fb 8b 51 08 49 89 d4
83 e2 0c 41 81 e4 00 e0 00 00 48 c1 ea 02
RIP [] __sdma_txclean+0x57/0x1e0 [hfi1]
RSP
CR2: 0000000000000008

There are two exit points from user_sdma_send_pkts(). One (free_tx)
merely frees the slab entry and one (free_txreq) cleans the sdma_txreq
prior to freeing the slab entry. The free_txreq variation can only be
called after one of the sdma_init*() variations has been called.

In the panic case, the slab entry had been allocated but not inited.

Fix the issue by exiting through free_tx thus avoiding sdma_clean().

Cc: # 4.9.x+
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn
Reviewed-by: Lukasz Odzioba
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
Signed-off-by: Greg Kroah-Hartman

Signed-off-by: Jason Gunthorpe

Michael J. Ruhl
2018-10-04 08:00:56 +0800
d9e49e9ed IB/hfi1: Fix SL array bounds check ... Browse Code »

commit 0dbfaa9f2813787679e296eb5476e40938ab48c8 upstream.

The SL specified by a user needs to be a valid SL.

Add a range check to the user specified SL value which protects from
running off the end of the SL to SC table.

CC: stable@vger.kernel.org
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Signed-off-by: Ira Weiny
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Ira Weiny
2018-10-04 08:00:56 +0800
fcbe49c82 IB/srp: Avoid that sg_reset -d ${srp_device} triggers an infinite loop ... Browse Code »

commit ee92efe41cf358f4b99e73509f2bfd4733609f26 upstream.

Use different loop variables for the inner and outer loop. This avoids
that an infinite loop occurs if there are more RDMA channels than
target->req_ring_size.

Fixes: d92c0da71a35 ("IB/srp: Add multichannel support")
Cc:
Signed-off-by: Bart Van Assche
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Bart Van Assche
2018-10-04 08:00:56 +0800
333cb98f3 IB/mlx4: Test port number before querying type. ... Browse Code »

[ Upstream commit f1228867adaf8890826f2b59e4caddb1c5cc2df7 ]

rdma_ah_find_type() can reach into ib_device->port_immutable with a
potentially out-of-bounds port number, so check that the port number is
valid first.

Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Tarick Bedeir
Reviewed-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Tarick Bedeir
2018-10-04 08:00:48 +0800
0ca45668e IB/core: type promotion bug in rdma_rw_init_one_mr() ... Browse Code »

[ Upstream commit c2d7c8ff89b22ddefb1ac2986c0d48444a667689 ]

"nents" is an unsigned int, so if ib_map_mr_sg() returns a negative
error code then it's type promoted to a high unsigned int which is
treated as success.

Fixes: a060b5629ab0 ("IB/core: generic RDMA READ/WRITE API")
Signed-off-by: Dan Carpenter
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Dan Carpenter
2018-10-04 08:00:47 +0800
eca859882 RDMA/i40w: Hold read semaphore while looking after VMA ... Browse Code »

[ Upstream commit 5d9a2b0e28759e319a623da33940dbb3ce952b7d ]

VMA lookup is supposed to be performed while mmap_sem is held.

Fixes: f26c7c83395b ("i40iw: Add 2MB page support")
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Leon Romanovsky
2018-10-04 08:00:47 +0800
e862ab6b6 RDMA/bnxt_re: Fix a couple off by one bugs ... Browse Code »

[ Upstream commit 474e5a86067e5f12c97d1db8b170c7f45b53097a ]

The sgid_tbl->tbl[] array is allocated in bnxt_qplib_alloc_sgid_tbl().
It has sgid_tbl->max elements. So the > should be >= to prevent
accessing one element beyond the end of the array.

Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Dan Carpenter
Acked-by: Selvin Xavier
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Dan Carpenter
2018-10-04 08:00:47 +0800

29 Sep, 2018

1 commit

3a411a04b iw_cxgb4: only allow 1 flush on user qps ... Browse Code »

commit 308aa2b8f7b7db3332a7d41099fd37851fb793b2 upstream.

Once the qp has been flushed, it cannot be flushed again. The user qp
flush logic wasn't enforcing it however. The bug can cause
touch-after-free crashes like:

Unable to handle kernel paging request for data at address 0x000001ec
Faulting instruction address: 0xc008000016069100
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c008000016069100] flush_qp+0x80/0x480 [iw_cxgb4]
LR [c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
Call Trace:
[c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
[c00800001606e868] c4iw_ib_modify_qp+0x118/0x200 [iw_cxgb4]
[c0080000119eae80] ib_security_modify_qp+0xd0/0x3d0 [ib_core]
[c0080000119c4e24] ib_modify_qp+0xc4/0x2c0 [ib_core]
[c008000011df0284] iwcm_modify_qp_err+0x44/0x70 [iw_cm]
[c008000011df0fec] destroy_cm_id+0xcc/0x370 [iw_cm]
[c008000011ed4358] rdma_destroy_id+0x3c8/0x520 [rdma_cm]
[c0080000134b0540] ucma_close+0x90/0x1b0 [rdma_ucm]
[c000000000444da4] __fput+0xe4/0x2f0

So fix flush_qp() to only flush the wq once.

Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Steve Wise
2018-09-29 18:06:07 +0800

26 Sep, 2018

4 commits

b8b9c7f05 IB/ipoib: Avoid a race condition between start_xmit and cm_rep_handler ... Browse Code »

commit 816e846c2eb9129a3e0afa5f920c8bbc71efecaa upstream.

Inside of start_xmit() the call to check if the connection is up and the
queueing of the packets for later transmission is not atomic which leaves
a window where cm_rep_handler can run, set the connection up, dequeue
pending packets and leave the subsequently queued packets by start_xmit()
sitting on neigh->queue until they're dropped when the connection is torn
down. This only applies to connected mode. These dropped packets can
really upset TCP, for example, and cause multi-minute delays in
transmission for open connections.

Here's the code in start_xmit where we check to see if the connection is
up:

if (ipoib_cm_get(neigh)) {
if (ipoib_cm_up(neigh)) {
ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
goto unref;
}
}

The race occurs if cm_rep_handler execution occurs after the above
connection check (specifically if it gets to the point where it acquires
priv->lock to dequeue pending skb's) but before the below code snippet in
start_xmit where packets are queued.

if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
push_pseudo_header(skb, phdr->hwaddr);
spin_lock_irqsave(&priv->lock, flags);
__skb_queue_tail(&neigh->queue, skb);
spin_unlock_irqrestore(&priv->lock, flags);
} else {
++dev->stats.tx_dropped;
dev_kfree_skb_any(skb);
}

The patch acquires the netif tx lock in cm_rep_handler for the section
where it sets the connection up and dequeues and retransmits deferred
skb's.

Fixes: 839fcaba355a ("IPoIB: Connected mode experimental support")
Cc: stable@vger.kernel.org
Signed-off-by: Aaron Knister
Tested-by: Ira Weiny
Reviewed-by: Ira Weiny
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Aaron Knister
2018-09-26 14:38:05 +0800
8c705dea5 RDMA/cma: Protect cma dev list with lock ... Browse Code »

commit 954a8e3aea87e896e320cf648c1a5bbe47de443e upstream.

When AF_IB addresses are used during rdma_resolve_addr() a lock is not
held. A cma device can get removed while list traversal is in progress
which may lead to crash. ie

CPU0 CPU1
==== ====
rdma_resolve_addr()
cma_resolve_ib_dev()
list_for_each() cma_remove_one()
cur_dev->device mutex_lock(&lock)
list_del();
mutex_unlock(&lock);
cma_process_remove();

Therefore, hold a lock while traversing the list which avoids such
situation.

Cc: # 3.10
Fixes: f17df3b0dede ("RDMA/cma: Add support for AF_IB to rdma_resolve_addr()")
Signed-off-by: Parav Pandit
Reviewed-by: Daniel Jurgens
Signed-off-by: Leon Romanovsky
Reviewed-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Parav Pandit
2018-09-26 14:38:05 +0800
745cb5eb3 IB/ipoib: Fix error return code in ipoib_dev_init() ... Browse Code »

[ Upstream commit 99a7e2bf704d64c966dfacede1ba2d9b47cb676e ]

Fix to return a negative error code from the ipoib_neigh_hash_init()
error handling case instead of 0, as done elsewhere in this function.

Fixes: 515ed4f3aab4 ("IB/IPoIB: Separate control and data related initializations")
Signed-off-by: Wei Yongjun
Reviewed-by: Yuval Shaia
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Wei Yongjun
2018-09-26 14:38:00 +0800
394df5914 IB/rxe: Drop QP0 silently ... Browse Code »

[ Upstream commit 536ca245c512aedfd84cde072d7b3ca14b6e1792 ]

According to "Annex A16: RDMA over Converged Ethernet (RoCE)":

A16.4.3 MANAGEMENT INTERFACES

As defined in the base specification, a special Queue Pair, QP0 is defined
solely for communication between subnet manager(s) and subnet management
agents. Since such an IB-defined subnet management architecture is outside
the scope of this annex, it follows that there is also no requirement that
a port which conforms to this annex be associated with a QP0. Thus, for
end nodes designed to conform to this annex, the concept of QP0 is
undefined and unused for any port connected to an Ethernet network.

CA16-8: A packet arriving at a RoCE port containing a BTH with the
destination QP field set to QP0 shall be silently dropped.

Signed-off-by: Zhu Yanjun
Acked-by: Moni Shoua
Reviewed-by: Yuval Shaia
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Zhu Yanjun
2018-09-26 14:38:00 +0800

20 Sep, 2018

1 commit

bdbf6e0b9 RDMA/cma: Do not ignore net namespace for unbound cm_id ... Browse Code »

[ Upstream commit 643d213a9a034fa04f5575a40dfc8548e33ce04f ]

Currently if the cm_id is not bound to any netdevice, than for such cm_id,
net namespace is ignored; which is incorrect.

Regardless of cm_id bound to a netdevice or not, net namespace must
match. When a cm_id is bound to a netdevice, in such case net namespace
and netdevice both must match.

Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA CM")
Signed-off-by: Parav Pandit
Reviewed-by: Daniel Jurgens
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Parav Pandit
2018-09-20 04:43:45 +0800

15 Sep, 2018

2 commits

c16a0727c RDMA/hns: Fix usage of bitmap allocation functions return values ... Browse Code »

[ Upstream commit a1ceeca679dccc492235f0f629d9e9f7b3d51ca8 ]

hns bitmap allocation functions return 0 on success and -1 on failure.
Callers of these functions wrongly used their return value as an errno,
fix that by making a proper conversion.

Fixes: a598c6f4c5a8 ("IB/hns: Simplify function of pd alloc and qp alloc")
Signed-off-by: Gal Pressman
Acked-by: Lijun Ou
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Gal Pressman
2018-09-15 15:45:29 +0800
52ec8484a IB/hfi1: Invalid NUMA node information can cause a divide by zero ... Browse Code »

[ Upstream commit c513de490f808d8480346f9a58e6a4a5f3de12e7 ]

If the system BIOS does not supply NUMA node information to the
PCI devices, the NUMA node is selected by choosing the current
node.

This can lead to the following crash:

divide error: 0000 SMP
CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G IOE
------------ 3.10.0-693.21.1.el7.x86_64 #1
Hardware name: Intel Corporation S2600KP/S2600KP, BIOS
SE5C610.86B.01.01.0005.101720141054 10/17/2014
Workqueue: events work_for_cpu_fn
task: ffff880174480fd0 ti: ffff880174488000 task.ti: ffff880174488000
RIP: 0010: [] hfi1_dev_affinity_init+0x129/0x6a0 [hfi1]
RSP: 0018:ffff88017448bbf8 EFLAGS: 00010246
RAX: 0000000000000011 RBX: ffff88107ffba6c0 RCX: ffff88085c22e130
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880824ad0000
RBP: ffff88017448bc48 R08: 0000000000000011 R09: 0000000000000002
R10: ffff8808582b6ca0 R11: 0000000000003151 R12: ffff8808582b6ca0
R13: ffff8808582b6518 R14: ffff8808582b6010 R15: 0000000000000012
FS: 0000000000000000(0000) GS:ffff88085ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007efc707404f0 CR3: 0000000001a02000 CR4: 00000000001607f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
hfi1_init_dd+0x14b3/0x27a0 [hfi1]
? pcie_capability_write_word+0x46/0x70
? hfi1_pcie_init+0xc0/0x200 [hfi1]
do_init_one+0x153/0x4c0 [hfi1]
? sched_clock_cpu+0x85/0xc0
init_one+0x1b5/0x260 [hfi1]
local_pci_probe+0x4a/0xb0
work_for_cpu_fn+0x1a/0x30
process_one_work+0x17f/0x440
worker_thread+0x278/0x3c0
? manage_workers.isra.24+0x2a0/0x2a0
kthread+0xd1/0xe0
? insert_kthread_work+0x40/0x40
ret_from_fork+0x77/0xb0
? insert_kthread_work+0x40/0x40

If the BIOS is not supplying NUMA information:
- set the default table count to 1 for all possible nodes
- select node 0 (instead of current NUMA) node to get consistent
performance
- generate an error indicating that the BIOS should be upgraded

Reviewed-by: Gary Leshner
Reviewed-by: Mike Marciniszyn
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro

Signed-off-by: Jason Gunthorpe

Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Michael J. Ruhl
2018-09-15 15:45:27 +0800

10 Sep, 2018

2 commits

def89b81e RDMA/rxe: Set wqe->status correctly if an unexpected response is received ... Browse Code »

commit 61b717d041b1976530f68f8b539b2e3a7dd8e39c upstream.

Every function that returns COMPST_ERROR must set wqe->status to another
value than IB_WC_SUCCESS before returning COMPST_ERROR. Fix the only code
path for which this is not yet the case.

Signed-off-by: Bart Van Assche
Cc:
Reviewed-by: Yuval Shaia
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Bart Van Assche
2018-09-10 01:55:54 +0800
bac5c3c12 ib_srpt: Fix a use-after-free in srpt_close_ch() ... Browse Code »

commit 995250959d22fc341b5424e3343b0ce5df672461 upstream.

Avoid that KASAN reports the following:

BUG: KASAN: use-after-free in srpt_close_ch+0x4f/0x1b0 [ib_srpt]
Read of size 4 at addr ffff880151180cb8 by task check/4681

CPU: 15 PID: 4681 Comm: check Not tainted 4.18.0-rc2-dbg+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0xa4/0xf5
print_address_description+0x6f/0x270
kasan_report+0x241/0x360
__asan_load4+0x78/0x80
srpt_close_ch+0x4f/0x1b0 [ib_srpt]
srpt_set_enabled+0xf7/0x1e0 [ib_srpt]
srpt_tpg_enable_store+0xb8/0x120 [ib_srpt]
configfs_write_file+0x14e/0x1d0 [configfs]
__vfs_write+0xd2/0x3b0
vfs_write+0x101/0x270
ksys_write+0xab/0x120
__x64_sys_write+0x43/0x50
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: aaf45bd83eba ("IB/srpt: Detect session shutdown reliably")
Signed-off-by: Bart Van Assche
Cc:
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Bart Van Assche
2018-09-10 01:55:54 +0800

24 Aug, 2018

3 commits

26c7588c2 RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path ... Browse Code »

[ Upstream commit d63c46734c545ad0488761059004a65c46efdde3 ]

Fix memory leak in the error path of mlx5_ib_create_srq() by making sure
to free the allocated srq.

Fixes: c2b37f76485f ("IB/mlx5: Fix integer overflows in mlx5_ib_create_srq")
Signed-off-by: Kamal Heib
Acked-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Kamal Heib
2018-08-24 19:09:16 +0800
ed4afe79b IB/rxe: Fix missing completion for mem_reg work requests ... Browse Code »

[ Upstream commit 375dc53d032fc11e98036b5f228ad13f7c5933f5 ]

Run the completer task to post a work completion after processing
a memory registration or invalidate work request. This covers the
case where the memory registration or invalidate was the last work
request posted to the qp.

Signed-off-by: Vijay Immanuel
Reviewed-by: Yonatan Cohen
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Vijay Immanuel
2018-08-24 19:09:00 +0800
da327a4b9 IB/mlx4: Fix an error handling path in 'mlx4_ib_rereg_user_mr()' ... Browse Code »

[ Upstream commit 3dc7c7badb7502ec3e3aa817a8bdd9e53aa54c52 ]

Before returning -EPERM we should release some resources, as already done
in the other error handling path of the function.

Fixes: d8f9cc328c88 ("IB/mlx4: Mark user MR as writable if actual virtual memory is writable")
Signed-off-by: Christophe JAILLET
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Christophe Jaillet
2018-08-24 19:08:56 +0800

06 Aug, 2018

1 commit

65be9cbe1 RDMA/uverbs: Expand primary and alt AV port checks ... Browse Code »

commit addb8a6559f0f8b5a37582b7ca698358445a55bf upstream.

The commit cited below checked that the port numbers provided in the
primary and alt AVs are legal.

That is sufficient to prevent a kernel panic. However, it is not
sufficient for correct operation.

In Linux, AVs (both primary and alt) must be completely self-described.
We do not accept an AV from userspace without an embedded port number.
(This has been the case since kernel 3.14 commit dbf727de7440
("IB/core: Use GID table in AH creation and dmac resolution")).

For the primary AV, this embedded port number must match the port number
specified with IB_QP_PORT.

We also expect the port number embedded in the alt AV to match the
alt_port_num value passed by the userspace driver in the modify_qp command
base structure.

Add these checks to modify_qp.

Cc: # 4.16
Fixes: 5d4c05c3ee36 ("RDMA/uverbs: Sanitize user entered port numbers prior to access it")
Signed-off-by: Jack Morgenstein
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Jack Morgenstein
2018-08-06 22:20:51 +0800

03 Aug, 2018

7 commits

a61b3378b RDMA/uverbs: Protect from attempts to create flows on unsupported QP ... Browse Code »

commit 940efcc8889f0d15567eb07fc9fd69b06e366aa5 upstream.

Flows can be created on UD and RAW_PACKET QP types. Attempts to provide
other QP types as an input causes to various unpredictable failures.

The reason is that in order to support all various types (e.g. XRC), we
are supposed to use real_qp handle and not qp handle and expect to
driver/FW to fail such (XRC) flows. The simpler and safer variant is to
ban all QP types except UD and RAW_PACKET, instead of relying on
driver/FW.

Cc: # 3.11
Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs")
Cc: syzkaller
Reported-by: Noa Osherovich
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sudip Mukherjee
Signed-off-by: Greg Kroah-Hartman

Leon Romanovsky
2018-08-03 13:50:43 +0800
929e1a390 RDMA/mad: Convert BUG_ONs to error flows ... Browse Code »

[ Upstream commit 2468b82d69e3a53d024f28d79ba0fdb8bf43dfbf ]

Let's perform checks in-place instead of BUG_ONs.

Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Leon Romanovsky
2018-08-03 13:50:25 +0800
e27dad1eb infiniband: fix a possible use-after-free bug ... Browse Code »

[ Upstream commit cb2595c1393b4a5211534e6f0a0fbad369e21ad8 ]

ucma_process_join() will free the new allocated "mc" struct,
if there is any error after that, especially the copy_to_user().

But in parallel, ucma_leave_multicast() could find this "mc"
through idr_find() before ucma_process_join() frees it, since it
is already published.

So "mc" could be used in ucma_leave_multicast() after it is been
allocated and freed in ucma_process_join(), since we don't refcnt
it.

Fix this by separating "publish" from ID allocation, so that we
can get an ID first and publish it later after copy_to_user().

Fixes: c8f6a362bf3e ("RDMA/cma: Add multicast communication support")
Reported-by: Noam Rathaus
Signed-off-by: Cong Wang
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Cong Wang
2018-08-03 13:50:24 +0800
e581f7c59 drivers/infiniband/ulp/srpt/ib_srpt.c: fix build with gcc-4.4.4 ... Browse Code »

commit 06892cc190550807d332c95a0114c7e175584012 upstream.

gcc-4.4.4 has issues with initialization of anonymous unions:

drivers/infiniband/ulp/srpt/ib_srpt.c: In function 'srpt_zerolength_write':
drivers/infiniband/ulp/srpt/ib_srpt.c:854: error: unknown field 'wr_cqe' specified in initializer
drivers/infiniband/ulp/srpt/ib_srpt.c:854: warning: initialization makes integer from pointer without a cast

Work aound this.

Fixes: 2a78cb4db487 ("IB/srpt: Fix an out-of-bounds stack access in srpt_zerolength_write()")
Cc: Bart Van Assche
Cc: Christoph Hellwig
Cc: Jason Gunthorpe
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Doug Ledford
Signed-off-by: Sudip Mukherjee
Signed-off-by: Greg Kroah-Hartman

Andrew Morton
2018-08-03 13:50:20 +0800
1e8bb2e9c IB/srpt: Fix an out-of-bounds stack access in srpt_zerolength_write() ... Browse Code »

commit 2a78cb4db487372152bed2055c038f9634d595e8 upstream.

Avoid triggering an out-of-bounds stack access by changing the type
of 'wr' from ib_send_wr into ib_rdma_wr.

This patch fixes the following KASAN bug report:

BUG: KASAN: stack-out-of-bounds in rxe_post_send+0x7a9/0x9a0 [rdma_rxe]
Read of size 8 at addr ffff880068197a48 by task kworker/2:1/44

Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
dump_stack+0x8e/0xcd
print_address_description+0x6f/0x280
kasan_report+0x25a/0x380
__asan_load8+0x54/0x90
rxe_post_send+0x7a9/0x9a0 [rdma_rxe]
srpt_zerolength_write+0xf0/0x180 [ib_srpt]
srpt_cm_rtu_recv+0x68/0x110 [ib_srpt]
srpt_rdma_cm_handler+0xbb/0x15b [ib_srpt]
cma_ib_handler+0x1aa/0x4a0 [rdma_cm]
cm_process_work+0x30/0x100 [ib_cm]
cm_work_handler+0xa86/0x351b [ib_cm]
process_one_work+0x475/0x9f0
worker_thread+0x69/0x690
kthread+0x1ad/0x1d0
ret_from_fork+0x3a/0x50

Fixes: aaf45bd83eba ("IB/srpt: Detect session shutdown reliably")
Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sudip Mukherjee
Signed-off-by: Greg Kroah-Hartman

Bart Van Assche
2018-08-03 13:50:20 +0800
d02c9c8bf drivers/infiniband/core/verbs.c: fix build with gcc-4.4.4 ... Browse Code »

commit 6ee687735e745eafae9e6b93d1ea70bc52e7ad07 upstream.

gcc-4.4.4 has issues with initialization of anonymous unions.

drivers/infiniband/core/verbs.c: In function '__ib_drain_sq':
drivers/infiniband/core/verbs.c:2204: error: unknown field 'wr_cqe' specified in initializer
drivers/infiniband/core/verbs.c:2204: warning: initialization makes integer from pointer without a cast

Work around this.

Fixes: a1ae7d0345edd5 ("RDMA/core: Avoid that ib_drain_qp() triggers an out-of-bounds stack access")
Cc: Bart Van Assche
Cc: Steve Wise
Cc: Sagi Grimberg
Cc: Jason Gunthorpe
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Doug Ledford
Signed-off-by: Sudip Mukherjee
Signed-off-by: Greg Kroah-Hartman

Andrew Morton
2018-08-03 13:50:20 +0800
3af618717 RDMA/core: Avoid that ib_drain_qp() triggers an out-of-bounds stack access ... Browse Code »

commit a1ae7d0345edd593d6725d3218434d903a0af95d upstream.

This patch fixes the following KASAN complaint:

==================================================================
BUG: KASAN: stack-out-of-bounds in rxe_post_send+0x77d/0x9b0 [rdma_rxe]
Read of size 8 at addr ffff880061aef860 by task 01/1080

CPU: 2 PID: 1080 Comm: 01 Not tainted 4.16.0-rc3-dbg+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x85/0xc7
print_address_description+0x65/0x270
kasan_report+0x231/0x350
rxe_post_send+0x77d/0x9b0 [rdma_rxe]
__ib_drain_sq+0x1ad/0x250 [ib_core]
ib_drain_qp+0x9/0x30 [ib_core]
srp_destroy_qp+0x51/0x70 [ib_srp]
srp_free_ch_ib+0xfc/0x380 [ib_srp]
srp_create_target+0x1071/0x19e0 [ib_srp]
kernfs_fop_write+0x180/0x210
__vfs_write+0xb1/0x2e0
vfs_write+0xf6/0x250
SyS_write+0x99/0x110
do_syscall_64+0xee/0x2b0
entry_SYSCALL_64_after_hwframe+0x42/0xb7

The buggy address belongs to the page:
page:ffffea000186bbc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x4000000000000000()
raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: 0000000000000000 ffffea000186bbe0 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff880061aef700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880061aef780: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00
>ffff880061aef800: f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 f2 f2 f2 f2
^
ffff880061aef880: f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 f2 f2
ffff880061aef900: f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

Fixes: 765d67748bcf ("IB: new common API for draining queues")
Signed-off-by: Bart Van Assche
Cc: Steve Wise
Cc: Sagi Grimberg
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sudip Mukherjee
Signed-off-by: Greg Kroah-Hartman

Bart Van Assche
2018-08-03 13:50:20 +0800

17 Jul, 2018

3 commits

e8484443c RDMA/ucm: Mark UCM interface as BROKEN ... Browse Code »

commit 7a8690ed6f5346f6738971892205e91d39b6b901 upstream.

In commit 357d23c811a7 ("Remove the obsolete libibcm library")
in rdma-core [1], we removed obsolete library which used the
/dev/infiniband/ucmX interface.

Following multiple syzkaller reports about non-sanitized
user input in the UCMA module, the short audit reveals the same
issues in UCM module too.

It is better to disable this interface in the kernel,
before syzkaller team invests time and energy to harden
this unused interface.

[1] https://github.com/linux-rdma/rdma-core/pull/279

Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Leon Romanovsky
2018-07-17 17:39:33 +0800
f47f1f976 iw_cxgb4: correctly enforce the max reg_mr depth ... Browse Code »

commit 7b72717a20bba8bdd01b14c0460be7d15061cd6b upstream.

The code was mistakenly using the length of the page array memory instead
of the depth of the page array.

This would cause MR creation to fail in some cases.

Fixes: 8376b86de7d3 ("iw_cxgb4: Support the new memory registration API")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Steve Wise
2018-07-17 17:39:31 +0800
ac5270d4b IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values ... Browse Code »

commit b697d7d8c741f27b728a878fc55852b06d0f6f5e upstream.

The __get_txreq() function can return a pointer, ERR_PTR(-EBUSY), or NULL.
All of the relevant call sites look for IS_ERR, so the NULL return would
lead to a NULL pointer exception.

Do not use the ERR_PTR mechanism for this function.

Update all call sites to handle the return value correctly.

Clean up error paths to reflect return value.

Fixes: 45842abbb292 ("staging/rdma/hfi1: move txreq header code")
Cc: # 4.9.x+
Reported-by: Dan Carpenter
Reviewed-by: Mike Marciniszyn
Reviewed-by: Kamenee Arumugam
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Michael J. Ruhl
2018-07-17 17:39:30 +0800

03 Jul, 2018

4 commits

786c8d79f RDMA/mlx4: Discard unknown SQP work requests ... Browse Code »

commit 6b1ca7ece15e94251d1d0d919f813943e4a58059 upstream.

There is no need to crash the machine if unknown work request was
received in SQP MAD.

Cc: # 3.6
Fixes: 37bfc7c1e83f ("IB/mlx4: SR-IOV multiplex and demultiplex MADs")
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford
Signed-off-by: Greg Kroah-Hartman

Leon Romanovsky
2018-07-03 17:24:54 +0800
a33699925 IB/hfi1: Fix user context tail allocation for DMA_RTAIL ... Browse Code »

commit 1bc0299d976e000ececc6acd76e33b4582646cb7 upstream.

The following code fails to allocate a buffer for the
tail address that the hardware DMAs into when the user
context DMA_RTAIL is set.

if (HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL)) {
rcd->rcvhdrtail_kvaddr = dma_zalloc_coherent(
&dd->pcidev->dev, PAGE_SIZE, &dma_hdrqtail,
gfp_flags);
if (!rcd->rcvhdrtail_kvaddr)
goto bail_free;
rcd->rcvhdrqtailaddr_dma = dma_hdrqtail;
}

So the rcvhdrtail_kvaddr would then be NULL.

The mmap logic fails to check for a NULL rcvhdrtail_kvaddr.

The fix is to test for both user and kernel DMA_TAIL options
during the allocation as well as testing for a NULL
rcvhdrtail_kvaddr during the mmap processing.

Additionally, all downstream testing of the capmask for DMA_RTAIL
have been eliminated in favor of testing rcvhdrtail_kvaddr.

Cc: # 4.9.x
Reviewed-by: Michael J. Ruhl
Signed-off-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Mike Marciniszyn
2018-07-03 17:24:54 +0800
964705c4a IB/hfi1: Optimize kthread pointer locking when queuing CQ entries ... Browse Code »

commit af8aab71370a692eaf7e7969ba5b1a455ac20113 upstream.

All threads queuing CQ entries on different CQs are unnecessarily
synchronized by a spin lock to check if the CQ kthread worker hasn't
been destroyed before queuing an CQ entry.

The lock used in 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a
destroyed cq kthread worker") is a device global lock and will have
poor performance at scale as completions are entered from a large
number of CPUs.

Convert to use RCU where the read side of RCU is rvt_cq_enter() to
determine that the worker is alive prior to triggering the
completion event.
Apply write side RCU semantics in rvt_driver_cq_init() and
rvt_cq_exit().

Fixes: 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker")
Cc: # 4.14.x
Reviewed-by: Mike Marciniszyn
Signed-off-by: Sebastian Sanchez
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford
Signed-off-by: Greg Kroah-Hartman

Sebastian Sanchez
2018-07-03 17:24:54 +0800
2bd28cba4 IB/hfi1: Reorder incorrect send context disable ... Browse Code »

commit a93a0a31111231bb1949f4a83b17238f0fa32d6a upstream.

User send context integrity bits are cleared before the context is
disabled. If the send context is still processing data, any packets
that need those integrity bits will cause an error and halt the send
context.

During the disable handling, the driver waits for the context to drain.
If the context is halted, the driver will eventually timeout because
the context won't drain and then incorrectly bounce the link.

Reorder the bit clearing and the context disable.

Examine the software state and send context status as well as the
egress status to determine if a send context is in the halted state.

Promote the check macros to static functions for consistency with the
new check and to follow kernel style.

Remove an unused define that refers to the egress timeout.

Cc: # 4.9.x
Reviewed-by: Mitko Haralanov
Reviewed-by: Mike Marciniszyn
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford
Signed-off-by: Greg Kroah-Hartman

Michael J. Ruhl
2018-07-03 17:24:54 +0800