07 Feb, 2019

1 commit

  • commit 7709b0dc265f28695487712c45f02bbd1f98415d upstream.

    Applications that use the stack for execution purposes cause userspace PSM
    jobs to fail during mmap().

    Both Fortran (non-standard format parsing) and C (callback functions
    located in the stack) applications can be written such that stack
    execution is required. The linker notes this via the gnu_stack ELF flag.

    This causes READ_IMPLIES_EXEC to be set which forces all PROT_READ mmaps
    to have PROT_EXEC for the process.

    Checking for VM_EXEC bit and failing the request with EPERM is overly
    conservative and will break any PSM application using executable stacks.
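
    A hedged sketch of the kind of relaxation described above: drop the VM_EXEC
    test in the driver's mmap handler and keep rejecting only writable mappings.
    The function name and exact flag handling here are illustrative, not the
    upstream diff:

        static int mmap_flags_ok(struct vm_area_struct *vma)
        {
                /* READ_IMPLIES_EXEC makes every PROT_READ mapping PROT_EXEC,
                 * so VM_EXEC alone is not evidence of misuse; only refuse
                 * mappings that could be written. */
                if (vma->vm_flags & VM_WRITE)
                        return -EPERM;
                vma->vm_flags &= ~VM_MAYWRITE;  /* forbid later mprotect(PROT_WRITE) */
                return 0;
        }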

    Cc: #v4.14+
    Fixes: 12220267645c ("IB/hfi: Protect against writable mmap")
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Dennis Dalessandro
    Reviewed-by: Ira Weiny
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     

26 Jan, 2019

2 commits

  • [ Upstream commit 8036e90f92aae2784b855a0007ae2d8154d28b3c ]

    Acquiring the rtnl lock while holding usdev_lock could result in a
    deadlock.

    For example:

    usnic_ib_query_port()
    | mutex_lock(&us_ibdev->usdev_lock)
    | ib_get_eth_speed()
    | rtnl_lock()

    rtnl_lock()
    | usnic_ib_netdevice_event()
    | mutex_lock(&us_ibdev->usdev_lock)

    This commit moves the usdev_lock acquisition after the rtnl lock has been
    released.

    This is safe to do because usdev_lock is not protecting anything being
    accessed in ib_get_eth_speed(). Hence, the correct order of holding locks
    (rtnl -> usdev_lock) is not violated.
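
    A minimal sketch of the reordering, using the helper and lock names quoted
    in the diagram above:

        /* before: ib_get_eth_speed() (which takes rtnl) ran inside usdev_lock */
        ret = ib_get_eth_speed(ibdev, port, &props->active_speed,
                               &props->active_width);  /* rtnl taken and dropped here */
        if (ret)
                return ret;

        mutex_lock(&us_ibdev->usdev_lock);              /* safe: rtnl already released */
        /* fill the remaining attributes that usdev_lock protects */
        mutex_unlock(&us_ibdev->usdev_lock);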

    Signed-off-by: Parvi Kaustubhi
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Parvi Kaustubhi
     
  • [ Upstream commit b024dd0eba6e6d568f69d63c5e3153aba94c23e3 ]

    FRWR memory registration is done with a series of calls and WRs.
    1. ULP invokes ib_dma_map_sg()
    2. ULP invokes ib_map_mr_sg()
    3. ULP posts an IB_WR_REG_MR on the Send queue

    Step 2 generates an iova. It is permissible for ULPs to change this
    iova (with certain restrictions) between steps 2 and 3.

    rxe_map_mr_sg captures the MR's iova but later when rxe processes the
    REG_MR WR, it ignores the MR's iova field. If a ULP alters the MR's iova
    after step 2 but before step 3, rxe never captures that change.

    When the remote sends an RDMA Read targeting that MR, rxe looks up the
    R_key, but the altered iova does not match the iova stored in the MR,
    causing the RDMA Read request to fail.
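
    A sketch of the idea (reg_wr() and struct ib_mr come from rdma/ib_verbs.h;
    the rxe-internal "mem" field names are assumptions): when executing the
    IB_WR_REG_MR work request, re-read the MR's iova instead of trusting the
    value captured in rxe_map_mr_sg():

        const struct ib_reg_wr *reg = reg_wr(wr);

        mem->iova   = reg->mr->iova;    /* honor a ULP change made after ib_map_mr_sg() */
        mem->length = reg->mr->length;
        mem->access = reg->access;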

    Reported-by: Anna Schumaker
    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Chuck Lever
     

13 Jan, 2019

1 commit

  • commit e48d8ed9c6193502d849b35767fd18e20bbd7ba2 upstream.

    Error completions must still contain a valid wr_id and
    qp_num that the consumer can rely on. Correctly
    fill these fields in receive error completions.
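
    A sketch of the kind of fill-in described (ib_wc fields are from
    rdma/ib_verbs.h; the receive wqe name is illustrative, and the status shown
    is just one of the possible error statuses):

        memset(wc, 0, sizeof(*wc));
        wc->wr_id  = wqe->wr_id;        /* consumer must be able to match the WR */
        wc->qp     = &qp->ibqp;         /* and identify the QP it was posted on */
        wc->status = IB_WC_WR_FLUSH_ERR;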

    Reported-by: Walker Benjamin
    Cc: stable@vger.kernel.org
    Signed-off-by: Sagi Grimberg
    Reviewed-by: Zhu Yanjun
    Tested-by: Zhu Yanjun
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Sagi Grimberg
     

10 Jan, 2019

1 commit

  • commit dbc2970caef74e8ff41923d302aa6fb5a4812d0e upstream.

    Incorrect sge sizing in the HFI PIO path will cause an Oops similar to
    this:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] hfi1_verbs_send_pio+0x3d8/0x530 [hfi1]
    PGD 0
    Oops: 0000 1 SMP
    Call Trace:
    ? hfi1_verbs_send_dma+0xad0/0xad0 [hfi1]
    hfi1_verbs_send+0xdf/0x250 [hfi1]
    ? make_rc_ack+0xa80/0xa80 [hfi1]
    hfi1_do_send+0x192/0x430 [hfi1]
    hfi1_do_send_from_rvt+0x10/0x20 [hfi1]
    rvt_post_send+0x369/0x820 [rdmavt]
    ib_uverbs_post_send+0x317/0x570 [ib_uverbs]
    ib_uverbs_write+0x26f/0x420 [ib_uverbs]
    ? security_file_permission+0x21/0xa0
    vfs_write+0xbd/0x1e0
    ? mntput+0x24/0x40
    SyS_write+0x7f/0xe0
    system_call_fastpath+0x16/0x1b

    Fix by adding the missing sizing check to correctly determine the sge
    length.
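
    A sketch of such a clamp (rvt_sge field names are assumptions from rdmavt):
    the per-copy length must respect both the remaining payload and the current
    sge:

        u32 len = ss->sge.length;

        if (len > length)
                len = length;               /* remaining payload */
        if (len > ss->sge.sge_length)
                len = ss->sge.sge_length;   /* the missing check: bound by the sge itself */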

    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     

29 Dec, 2018

1 commit

  • commit 14d15c2b278011056482eb015dff89f9cbf2b841 upstream

    BUG: KASAN: use-after-free in srpt_set_enabled+0x1a9/0x1e0 [ib_srpt]
    Read of size 4 at addr ffff8801269d23f8 by task check/29726

    CPU: 4 PID: 29726 Comm: check Not tainted 4.18.0-rc2-dbg+ #4
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
    Call Trace:
    dump_stack+0xa4/0xf5
    print_address_description+0x6f/0x270
    kasan_report+0x241/0x360
    __asan_load4+0x78/0x80
    srpt_set_enabled+0x1a9/0x1e0 [ib_srpt]
    srpt_tpg_enable_store+0xb8/0x120 [ib_srpt]
    configfs_write_file+0x14e/0x1d0 [configfs]
    __vfs_write+0xd2/0x3b0
    vfs_write+0x101/0x270
    ksys_write+0xab/0x120
    __x64_sys_write+0x43/0x50
    do_syscall_64+0x77/0x230
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f235cfe6154

    Fixes: aaf45bd83eba ("IB/srpt: Detect session shutdown reliably")
    Signed-off-by: Bart Van Assche
    Cc:
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Sasha Levin

    Bart Van Assche
     

21 Dec, 2018

1 commit

  • commit 28a9a9e83ceae2cee25b9af9ad20d53aaa9ab951 upstream

    Packet queue state is overused to determine SDMA descriptor
    availability and packet queue request state.

    cpu 0 ret = user_sdma_send_pkts(req, pcount);
    cpu 0 if (atomic_read(&pq->n_reqs))
    cpu 1 IRQ user_sdma_txreq_cb calls pq_update() (state to _INACTIVE)
    cpu 0 xchg(&pq->state, SDMA_PKT_Q_ACTIVE);

    At this point pq->n_reqs == 0 and pq->state is incorrectly
    SDMA_PKT_Q_ACTIVE. The close path will hang waiting for the state
    to return to _INACTIVE.

    This can also change the state from _DEFERRED to _ACTIVE. However,
    this is a mostly benign race.

    Remove the racy code path.

    Use n_reqs to determine if a packet queue is active or not.
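
    A sketch of the simplified test, with names taken from the commit text:

        /* a packet queue is active exactly when it has outstanding requests */
        static inline bool hfi1_pkt_q_active(struct hfi1_user_sdma_pkt_q *pq)
        {
                return atomic_read(&pq->n_reqs) != 0;
        }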

    Cc: # 4.14.0>
    Reviewed-by: Mitko Haralanov
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Sasha Levin

    Michael J. Ruhl
     

17 Dec, 2018

4 commits

  • commit 36d842194a57f1b21fbc6a6875f2fa2f9a7f8679 upstream.

    When running with KASAN, the following trace is produced:

    [ 62.535888]

    ==================================================================
    [ 62.544930] BUG: KASAN: slab-out-of-bounds in
    gut_hw_stats+0x122/0x230 [hfi1]
    [ 62.553856] Write of size 8 at addr ffff88080e8d6330 by task
    kworker/0:1/14

    [ 62.565333] CPU: 0 PID: 14 Comm: kworker/0:1 Not tainted
    4.19.0-test-build-kasan+ #8
    [ 62.575087] Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS
    SE5C610.86B.01.01.0019.101220160604 10/12/2016
    [ 62.587951] Workqueue: events work_for_cpu_fn
    [ 62.594050] Call Trace:
    [ 62.598023] dump_stack+0xc6/0x14c
    [ 62.603089] ? dump_stack_print_info.cold.1+0x2f/0x2f
    [ 62.610041] ? kmsg_dump_rewind_nolock+0x59/0x59
    [ 62.616615] ? get_hw_stats+0x122/0x230 [hfi1]
    [ 62.622985] print_address_description+0x6c/0x23c
    [ 62.629744] ? get_hw_stats+0x122/0x230 [hfi1]
    [ 62.636108] kasan_report.cold.6+0x241/0x308
    [ 62.642365] get_hw_stats+0x122/0x230 [hfi1]
    [ 62.648703] ? hfi1_alloc_rn+0x40/0x40 [hfi1]
    [ 62.655088] ? __kmalloc+0x110/0x240
    [ 62.660695] ? hfi1_alloc_rn+0x40/0x40 [hfi1]
    [ 62.667142] setup_hw_stats+0xd8/0x430 [ib_core]
    [ 62.673972] ? show_hfi+0x50/0x50 [hfi1]
    [ 62.680026] ib_device_register_sysfs+0x165/0x180 [ib_core]
    [ 62.687995] ib_register_device+0x5a2/0xa10 [ib_core]
    [ 62.695340] ? show_hfi+0x50/0x50 [hfi1]
    [ 62.701421] ? ib_unregister_device+0x2e0/0x2e0 [ib_core]
    [ 62.709222] ? __vmalloc_node_range+0x2d0/0x380
    [ 62.716131] ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt]
    [ 62.723735] ? vmalloc_node+0x5c/0x70
    [ 62.729697] ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt]
    [ 62.737347] ? rvt_driver_mr_init+0x1f5/0x2d0 [rdmavt]
    [ 62.744998] ? __rvt_alloc_mr+0x110/0x110 [rdmavt]
    [ 62.752315] ? rvt_rc_error+0x140/0x140 [rdmavt]
    [ 62.759434] ? rvt_vma_open+0x30/0x30 [rdmavt]
    [ 62.766364] ? mutex_unlock+0x1d/0x40
    [ 62.772445] ? kmem_cache_create_usercopy+0x15d/0x230
    [ 62.780115] rvt_register_device+0x1f6/0x360 [rdmavt]
    [ 62.787823] ? rvt_get_port_immutable+0x180/0x180 [rdmavt]
    [ 62.796058] ? __get_txreq+0x400/0x400 [hfi1]
    [ 62.802969] ? memcpy+0x34/0x50
    [ 62.808611] hfi1_register_ib_device+0xde6/0xeb0 [hfi1]
    [ 62.816601] ? hfi1_get_npkeys+0x10/0x10 [hfi1]
    [ 62.823760] ? hfi1_init+0x89f/0x9a0 [hfi1]
    [ 62.830469] ? hfi1_setup_eagerbufs+0xad0/0xad0 [hfi1]
    [ 62.838204] ? pcie_capability_clear_and_set_word+0xcd/0xe0
    [ 62.846429] ? pcie_capability_read_word+0xd0/0xd0
    [ 62.853791] ? hfi1_pcie_init+0x187/0x4b0 [hfi1]
    [ 62.860958] init_one+0x67f/0xae0 [hfi1]
    [ 62.867301] ? hfi1_init+0x9a0/0x9a0 [hfi1]
    [ 62.873876] ? wait_woken+0x130/0x130
    [ 62.879860] ? read_word_at_a_time+0xe/0x20
    [ 62.886329] ? strscpy+0x14b/0x280
    [ 62.891998] ? hfi1_init+0x9a0/0x9a0 [hfi1]
    [ 62.898405] local_pci_probe+0x70/0xd0
    [ 62.904295] ? pci_device_shutdown+0x90/0x90
    [ 62.910833] work_for_cpu_fn+0x29/0x40
    [ 62.916750] process_one_work+0x584/0x960
    [ 62.922974] ? rcu_work_rcufn+0x40/0x40
    [ 62.928991] ? __schedule+0x396/0xdc0
    [ 62.934806] ? __sched_text_start+0x8/0x8
    [ 62.941020] ? pick_next_task_fair+0x68b/0xc60
    [ 62.947674] ? run_rebalance_domains+0x260/0x260
    [ 62.954471] ? __list_add_valid+0x29/0xa0
    [ 62.960607] ? move_linked_works+0x1c7/0x230
    [ 62.967077] ?
    trace_event_raw_event_workqueue_execute_start+0x140/0x140
    [ 62.976248] ? mutex_lock+0xa6/0x100
    [ 62.982029] ? __mutex_lock_slowpath+0x10/0x10
    [ 62.988795] ? __switch_to+0x37a/0x710
    [ 62.994731] worker_thread+0x62e/0x9d0
    [ 63.000602] ? max_active_store+0xf0/0xf0
    [ 63.006828] ? __switch_to_asm+0x40/0x70
    [ 63.012932] ? __switch_to_asm+0x34/0x70
    [ 63.019013] ? __switch_to_asm+0x40/0x70
    [ 63.025042] ? __switch_to_asm+0x34/0x70
    [ 63.031030] ? __switch_to_asm+0x40/0x70
    [ 63.037006] ? __schedule+0x396/0xdc0
    [ 63.042660] ? kmem_cache_alloc_trace+0xf3/0x1f0
    [ 63.049323] ? kthread+0x59/0x1d0
    [ 63.054594] ? ret_from_fork+0x35/0x40
    [ 63.060257] ? __sched_text_start+0x8/0x8
    [ 63.066212] ? schedule+0xcf/0x250
    [ 63.071529] ? __wake_up_common+0x110/0x350
    [ 63.077794] ? __schedule+0xdc0/0xdc0
    [ 63.083348] ? wait_woken+0x130/0x130
    [ 63.088963] ? finish_task_switch+0x1f1/0x520
    [ 63.095258] ? kasan_unpoison_shadow+0x30/0x40
    [ 63.101792] ? __init_waitqueue_head+0xa0/0xd0
    [ 63.108183] ? replenish_dl_entity.cold.60+0x18/0x18
    [ 63.115151] ? _raw_spin_lock_irqsave+0x25/0x50
    [ 63.121754] ? max_active_store+0xf0/0xf0
    [ 63.127753] kthread+0x1ae/0x1d0
    [ 63.132894] ? kthread_bind+0x30/0x30
    [ 63.138422] ret_from_fork+0x35/0x40

    [ 63.146973] Allocated by task 14:
    [ 63.152077] kasan_kmalloc+0xbf/0xe0
    [ 63.157471] __kmalloc+0x110/0x240
    [ 63.162804] init_cntrs+0x34d/0xdf0 [hfi1]
    [ 63.168883] hfi1_init_dd+0x29a3/0x2f90 [hfi1]
    [ 63.175244] init_one+0x551/0xae0 [hfi1]
    [ 63.181065] local_pci_probe+0x70/0xd0
    [ 63.186759] work_for_cpu_fn+0x29/0x40
    [ 63.192310] process_one_work+0x584/0x960
    [ 63.198163] worker_thread+0x62e/0x9d0
    [ 63.203843] kthread+0x1ae/0x1d0
    [ 63.208874] ret_from_fork+0x35/0x40

    [ 63.217203] Freed by task 1:
    [ 63.221844] __kasan_slab_free+0x12e/0x180
    [ 63.227844] kfree+0x92/0x1a0
    [ 63.232570] single_release+0x3a/0x60
    [ 63.238024] __fput+0x1d9/0x480
    [ 63.242911] task_work_run+0x139/0x190
    [ 63.248440] exit_to_usermode_loop+0x191/0x1a0
    [ 63.254814] do_syscall_64+0x301/0x330
    [ 63.260283] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    [ 63.270199] The buggy address belongs to the object at
    ffff88080e8d5500
    which belongs to the cache kmalloc-4096 of size 4096
    [ 63.287247] The buggy address is located 3632 bytes inside of
    4096-byte region [ffff88080e8d5500, ffff88080e8d6500)
    [ 63.303564] The buggy address belongs to the page:
    [ 63.310447] page:ffffea00203a3400 count:1 mapcount:0
    mapping:ffff88081380e840 index:0x0 compound_mapcount: 0
    [ 63.323102] flags: 0x2fffff80008100(slab|head)
    [ 63.329775] raw: 002fffff80008100 0000000000000000 0000000100000001
    ffff88081380e840
    [ 63.340175] raw: 0000000000000000 0000000000070007 00000001ffffffff
    0000000000000000
    [ 63.350564] page dumped because: kasan: bad access detected

    [ 63.361974] Memory state around the buggy address:
    [ 63.369137] ffff88080e8d6200: 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00
    [ 63.379082] ffff88080e8d6280: 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00
    [ 63.389032] >ffff88080e8d6300: 00 00 00 00 00 00 fc fc fc fc fc fc fc
    fc fc fc
    [ 63.398944] ^
    [ 63.406141] ffff88080e8d6380: fc fc fc fc fc fc fc fc fc fc fc fc fc
    fc fc fc
    [ 63.416109] ffff88080e8d6400: fc fc fc fc fc fc fc fc fc fc fc fc fc
    fc fc fc
    [ 63.426099]
    ==================================================================

    The trace happens because get_hw_stats() assumes there is room in the
    memory allocated in init_cntrs() to accommodate the driver counters.
    Unfortunately, that routine only allocated space for the device
    counters.

    Fix by ensuring the allocation has room for the additional driver
    counters.
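
    A sketch of the sizing change (the counter-count variable names are
    illustrative):

        /* reserve room for the driver counters that get_hw_stats() appends
         * after the device counters */
        dd->cntrs = kcalloc(num_dev_cntrs + num_driver_cntrs,
                            sizeof(u64), GFP_KERNEL);
        if (!dd->cntrs)
                return -ENOMEM;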

    Cc: # v4.14+
    Fixes: b7481944b06e9 ("IB/hfi1: Show statistics counters under IB stats interface")
    Reviewed-by: Mike Marciniczyn
    Reviewed-by: Mike Ruhl
    Signed-off-by: Piotr Stankiewicz
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Piotr Stankiewicz
     
  • [ Upstream commit 75b7b86bdb0df37e08e44b6c1f99010967f81944 ]

    Memory windows are implemented with an indirect MKey. When a page fault
    event comes for a MW MKey, we need to find the MR at the end of the list of
    the indirect MKeys by iterating over all items from the first to the last.

    The offset calculated during this process has to be zeroed after the first
    iteration or the next iteration will start from a wrong address, resulting
    in incorrect ODP faulting behavior.
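
    A sketch of the loop shape implied above; the helper names are placeholders,
    not mlx5 internals:

        do {
                mkey = lookup_mkey(dev, key);        /* placeholder helper */
                addr = mkey_start(mkey) + offset;    /* apply the offset once */
                key  = next_indirect_key(mkey);      /* follow the indirection */
                offset = 0;   /* the fix: later iterations start at the MKey base */
        } while (mkey_is_indirect(mkey));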

    Fixes: db570d7deafb ("IB/mlx5: Add ODP support to MW")
    Signed-off-by: Artemy Kovalyov
    Signed-off-by: Moni Shoua
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Artemy Kovalyov
     
  • [ Upstream commit 4f32fb921b153ae9ea280e02a3e91509fffc03d3 ]

    rdmavt uses a crazy system that loses the type checking when assigning
    functions to struct ib_device function pointers. Because of this the
    signature of this function was not changed when the below commit revised
    things.

    Fix the signature so we are not calling a function pointer with a
    mismatched signature.

    Fixes: 477864c8fcd9 ("IB/core: Let create_ah return extended response to user")
    Signed-off-by: Kamal Heib
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Kamal Heib
     
  • [ Upstream commit 074fca3a18e7e1e0d4d7dcc9d7badc43b90232f4 ]

    Currently, for IB_WR_LOCAL_INV WR, when the next fence is None, the
    current fence will be SMALL instead of Normal Fence.

    Without this patch krping doesn't work on CX-5 devices and throws the
    following error messages (from the CX5 driver, server side):
    [ 710.434014] mlx5_0:dump_cqe:278:(pid 2712): dump error cqe
    [ 710.434016] 00000000 00000000 00000000 00000000
    [ 710.434016] 00000000 00000000 00000000 00000000
    [ 710.434017] 00000000 00000000 00000000 00000000
    [ 710.434018] 00000000 93003204 100000b8 000524d2
    [ 710.434019] krping: cq completion failed with wr_id 0 status 4 opcode 128 vender_err 32

    Fix the logic to set the correct fence type.

    Fixes: 6e8484c5cf07 ("RDMA/mlx5: set UMR wqe fence according to HCA cap")
    Signed-off-by: Majd Dibbiny
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Majd Dibbiny
     

08 Dec, 2018

2 commits

  • commit db7a691a1551a748cb92d9c89c6b190ea87e28d5 upstream.

    If the firmware reports a connection width that is not 1x, 4x, 8x or 12x
    it causes the driver to fail during initialization.

    To prevent this failure every time a new width is introduced to the RDMA
    stack, we set a default width of 4x for widths which are unknown to
    the driver.

    This is needed to allow running old kernels with new firmware.
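
    A sketch of the translation with the fallback; the firmware width encoding
    shown is illustrative, while IB_WIDTH_* come from rdma/ib_verbs.h:

        switch (fw_width) {
        case 1:  *ib_width = IB_WIDTH_1X;  break;
        case 4:  *ib_width = IB_WIDTH_4X;  break;
        case 8:  *ib_width = IB_WIDTH_8X;  break;
        case 12: *ib_width = IB_WIDTH_12X; break;
        default: *ib_width = IB_WIDTH_4X;  break;  /* unknown width: don't fail init */
        }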

    Cc: # 4.1
    Fixes: 1b5daf11b015 ("IB/mlx5: Avoid using the MAD_IFC command under ISSI > 0 mode")
    Signed-off-by: Michael Guralnik
    Reviewed-by: Majd Dibbiny
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael Guralnik
     
  • commit 24c3456c8d5ee6fc1933ca40f7b4406130682668 upstream.

    If for some reason we failed to query the mr status, we need to make sure
    to provide sufficient information for an ambiguous error (guard error on
    sector 0).
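
    A sketch of the fallback (ib_check_mr_status() and IB_MR_CHECK_SIG_STATUS
    are core verbs API; the 0x1 guard-check code is illustrative):

        ret = ib_check_mr_status(mr, IB_MR_CHECK_SIG_STATUS, &mr_status);
        if (ret) {
                /* the query itself failed: report an ambiguous guard error
                 * on sector 0 rather than returning garbage */
                *sector = 0;
                return 0x1;
        }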

    Fixes: 0a7a08ad6f5f ("IB/iser: Implement check_protection")
    Cc:
    Reported-by: Dan Carpenter
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Sagi Grimberg
     

01 Dec, 2018

3 commits

  • commit 5a7189d529cd146cd5838af97b32fcac4122b471 upstream.

    If i40iw_allocate_dma_mem fails when creating a QP, the
    memory allocated for the QP structure using kzalloc is not
    freed because iwqp->allocated_buffer is used to free the
    memory and it is not set up until later. Fix this by setting
    iwqp->allocated_buffer before allocating the DMA memory.
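
    A sketch of the ordering fix; the structure and helper names approximate
    the commit text rather than quoting the driver:

        mem = kzalloc(sizeof(*iwqp), GFP_KERNEL);
        if (!mem)
                return ERR_PTR(-ENOMEM);

        iwqp = mem;
        iwqp->allocated_buffer = mem;    /* set first, so the error path can free it */

        if (i40iw_allocate_dma_mem(hw, &iwqp->q2_ctx_mem, size, 256))
                goto error;              /* error path frees iwqp->allocated_buffer */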

    Fixes: d37498417947 ("i40iw: add files for iwarp interface")
    Signed-off-by: Mustafa Ismail
    Signed-off-by: Shiraz Saleem
    Signed-off-by: Doug Ledford
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Mustafa Ismail
     
  • commit a0e0cb82804a6a21d9067022c2dfdf80d11da429 upstream.

    pq_update() can only be called in two places: from the completion
    function when the complete (npkts) sequence of packets has been
    submitted and processed, or from the setup function if a subset of the
    packets were submitted (i.e. the error path).

    Currently both paths can call pq_update() if an error occurs. This
    race will cause the n_req value to go negative, hanging file_close(),
    or cause a crash by freeing the txlist more than once.

    Several variables are used to determine SDMA send state. Most of
    these are unnecessary, and have code-inspectable races between the
    setup function and the completion function, in both the send path and
    the error path.

    The request 'status' value can be set by the setup or by the
    completion function. This is code-inspectably racy. Since the status
    is not needed in the completion code or by the caller, it has been
    removed.

    The request 'done' value races between usage by the setup and the
    completion function. The completion function does not need this.
    When the number of processed packets matches npkts, it is done.

    The 'has_error' value races between usage of the setup and the
    completion function. This can cause incorrect error handling and leave
    the n_req in an incorrect value (i.e. negative).

    Simplify the code by removing all of the unneeded state checks and
    variables.

    Clean up iovs node when it is freed.

    Eliminate race conditions in the error path:

    If all packets are submitted, the completion handler will set the
    completion status correctly (ok or aborted).

    If all packets are not submitted, the caller must wait until the
    submitted packets have completed, and then set the completion status.

    These two changes eliminate the race condition in the error path.

    Reviewed-by: Mitko Haralanov
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     
  • commit b2bedfb39541a7e14798d066b6f8685d84c8fcf5 upstream.

    Currently qp->port stores the port number whenever IB_QP_PORT
    QP attribute mask is set (during QP state transition to INIT state).
    This port number should be stored for the real QP when XRC target QP
    is used.

    Follow the ib_modify_qp() implementation and hide the access to ->real_qp.
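
    A sketch of the change (IB_QP_PORT and ->real_qp are core verbs concepts;
    following ib_modify_qp(), the attribute is applied to the real QP):

        if (attr_mask & IB_QP_PORT)
                qp->real_qp->port = attr->port_num;  /* not the XRC target QP's field */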

    Fixes: a512c2fbef9c ("IB/core: Introduce modify QP operation with udata")
    Signed-off-by: Parav Pandit
    Reviewed-by: Daniel Jurgens
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     

14 Nov, 2018

5 commits

  • commit 013c2403bf32e48119aeb13126929f81352cc7ac upstream.

    Schedule MR cache work only after the bucket has been initialized.

    Cc: # 4.10
    Fixes: 49780d42dfc9 ("IB/mlx5: Expose MR cache for mlx5_ib")
    Signed-off-by: Artemy Kovalyov
    Reviewed-by: Majd Dibbiny
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Artemy Kovalyov
     
  • [ Upstream commit b97db58557f4aa6d9903f8e1deea6b3d1ed0ba43 ]

    Don't reset the resp opcode for a replayed read response.
    The resp opcode could be in the middle of a write or send
    sequence, when the duplicate read request was received.
    An example sequence is as follows:
    - Receive read request for 12KB PSN 20. Transmit read response
    first, middle and last with PSNs 20,21,22.
    - Receive write first PSN 23.
    At this point the resp psn is 24 and resp opcode is write first.
    - The sender notices that PSN 20 is dropped and retransmits.
    Receive read request for 12KB PSN 20. Transmit read response
    first, middle and last with PSNs 20,21,22. The resp opcode is
    set to -1, the resp psn remains 24.
    - Receive write first PSN 23. This is processed by duplicate_request().
    The resp opcode remains -1 and resp psn remains 24.
    - Receive write middle PSN 24. check_op_seq() reports a missing
    first error since the resp opcode is -1.

    When sending an ack for a duplicate send or write request,
    use the psn of the previous ack sent. Do not use the psn
    of a read response for the ack.
    An example sequence is as follows:
    - Receive write PSN 30. Transmit ACK for PSN 30.
    - Receive read request 4KB PSN 31. Transmit read response with
    PSN 31. The resp psn is now 32.
    - The sender notices that PSN 30 is dropped and retransmits.
    Receive write PSN 30. duplicate_request() sends an ACK with
    PSN 31. That is incorrect since PSN 31 was a read request.

    Signed-off-by: Vijay Immanuel
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Vijay Immanuel
     
  • [ Upstream commit d455f29f6d76a5f94881ca1289aaa1e90617ff5d ]

    Fix a possible recursive lock warning. It is a false warning, as the locks
    are part of two different HW queue data structures - cmdq and creq. A debug
    kernel throws the following warning and stack trace.

    [ 783.914967] ============================================
    [ 783.914970] WARNING: possible recursive locking detected
    [ 783.914973] 4.19.0-rc2+ #33 Not tainted
    [ 783.914976] --------------------------------------------
    [ 783.914979] swapper/2/0 is trying to acquire lock:
    [ 783.914982] 000000002aa3949d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
    [ 783.914999]
    but task is already holding lock:
    [ 783.915002] 00000000be73920d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x2a/0x350 [bnxt_re]
    [ 783.915013]
    other info that might help us debug this:
    [ 783.915016] Possible unsafe locking scenario:

    [ 783.915019] CPU0
    [ 783.915021] ----
    [ 783.915034] lock(&(&hwq->lock)->rlock);
    [ 783.915035] lock(&(&hwq->lock)->rlock);
    [ 783.915037]
    *** DEADLOCK ***

    [ 783.915038] May be due to missing lock nesting notation

    [ 783.915039] 1 lock held by swapper/2/0:
    [ 783.915040] #0: 00000000be73920d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x2a/0x350 [bnxt_re]
    [ 783.915044]
    stack backtrace:
    [ 783.915046] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.0-rc2+ #33
    [ 783.915047] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014
    [ 783.915048] Call Trace:
    [ 783.915049]
    [ 783.915054] dump_stack+0x90/0xe3
    [ 783.915058] __lock_acquire+0x106c/0x1080
    [ 783.915061] ? sched_clock+0x5/0x10
    [ 783.915063] lock_acquire+0xbd/0x1a0
    [ 783.915065] ? bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
    [ 783.915069] _raw_spin_lock_irqsave+0x4a/0x90
    [ 783.915071] ? bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
    [ 783.915073] bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
    [ 783.915078] tasklet_action_common.isra.17+0x197/0x1b0
    [ 783.915081] __do_softirq+0xcb/0x3a6
    [ 783.915084] irq_exit+0xe9/0x100
    [ 783.915085] do_IRQ+0x6a/0x120
    [ 783.915087] common_interrupt+0xf/0xf
    [ 783.915088]

    Use nested notation for the spin_lock to avoid this warning.
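
    A sketch of the annotation (spin_lock_irqsave_nested() and
    SINGLE_DEPTH_NESTING are standard lockdep facilities; the hwq naming
    follows the trace above and is illustrative):

        /* the creq handler legitimately takes the cmdq hwq lock while already
         * holding the creq hwq lock; tell lockdep these are different levels */
        spin_lock_irqsave_nested(&cmdq->lock, flags, SINGLE_DEPTH_NESTING);
        /* complete the firmware command */
        spin_unlock_irqrestore(&cmdq->lock, flags);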

    Signed-off-by: Selvin Xavier
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Selvin Xavier
     
  • [ Upstream commit 4d6e4d12da2c308f8f976d3955c45ee62539ac98 ]

    IPCB should be cleared before icmp_send, since it may contain data from
    previous layers, and that data could be misinterpreted as ip header options,
    which later causes the ihl to be set to an invalid value and results in
    the following stack corruption:

    [ 1083.031512] ib0: packet len 57824 (> 2048) too long to send, dropping
    [ 1083.031843] ib0: packet len 37904 (> 2048) too long to send, dropping
    [ 1083.032004] ib0: packet len 4040 (> 2048) too long to send, dropping
    [ 1083.032253] ib0: packet len 63800 (> 2048) too long to send, dropping
    [ 1083.032481] ib0: packet len 23960 (> 2048) too long to send, dropping
    [ 1083.033149] ib0: packet len 63800 (> 2048) too long to send, dropping
    [ 1083.033439] ib0: packet len 63800 (> 2048) too long to send, dropping
    [ 1083.033700] ib0: packet len 63800 (> 2048) too long to send, dropping
    [ 1083.034124] ib0: packet len 63800 (> 2048) too long to send, dropping
    [ 1083.034387] ==================================================================
    [ 1083.034602] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xf08/0x1310
    [ 1083.034798] Write of size 4 at addr ffff880353457c5f by task kworker/u16:0/7
    [ 1083.034990]
    [ 1083.035104] CPU: 7 PID: 7 Comm: kworker/u16:0 Tainted: G O 4.19.0-rc5+ #1
    [ 1083.035316] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
    [ 1083.035573] Workqueue: ipoib_wq ipoib_cm_skb_reap [ib_ipoib]
    [ 1083.035750] Call Trace:
    [ 1083.035888] dump_stack+0x9a/0xeb
    [ 1083.036031] print_address_description+0xe3/0x2e0
    [ 1083.036213] kasan_report+0x18a/0x2e0
    [ 1083.036356] ? __ip_options_echo+0xf08/0x1310
    [ 1083.036522] __ip_options_echo+0xf08/0x1310
    [ 1083.036688] icmp_send+0x7b9/0x1cd0
    [ 1083.036843] ? icmp_route_lookup.constprop.9+0x1070/0x1070
    [ 1083.037018] ? netif_schedule_queue+0x5/0x200
    [ 1083.037180] ? debug_show_all_locks+0x310/0x310
    [ 1083.037341] ? rcu_dynticks_curr_cpu_in_eqs+0x85/0x120
    [ 1083.037519] ? debug_locks_off+0x11/0x80
    [ 1083.037673] ? debug_check_no_obj_freed+0x207/0x4c6
    [ 1083.037841] ? check_flags.part.27+0x450/0x450
    [ 1083.037995] ? debug_check_no_obj_freed+0xc3/0x4c6
    [ 1083.038169] ? debug_locks_off+0x11/0x80
    [ 1083.038318] ? skb_dequeue+0x10e/0x1a0
    [ 1083.038476] ? ipoib_cm_skb_reap+0x2b5/0x650 [ib_ipoib]
    [ 1083.038642] ? netif_schedule_queue+0xa8/0x200
    [ 1083.038820] ? ipoib_cm_skb_reap+0x544/0x650 [ib_ipoib]
    [ 1083.038996] ipoib_cm_skb_reap+0x544/0x650 [ib_ipoib]
    [ 1083.039174] process_one_work+0x912/0x1830
    [ 1083.039336] ? wq_pool_ids_show+0x310/0x310
    [ 1083.039491] ? lock_acquire+0x145/0x3a0
    [ 1083.042312] worker_thread+0x87/0xbb0
    [ 1083.045099] ? process_one_work+0x1830/0x1830
    [ 1083.047865] kthread+0x322/0x3e0
    [ 1083.050624] ? kthread_create_worker_on_cpu+0xc0/0xc0
    [ 1083.053354] ret_from_fork+0x3a/0x50

    For instance, __ip_options_echo fails to proceed with the invalid srr and
    optlen passed from another layer via IPCB:

    [ 762.139568] IPv4: __ip_options_echo rr=0 ts=0 srr=43 cipso=0
    [ 762.139720] IPv4: ip_options_build: IPCB 00000000f3cd969e opt 000000002ccb3533
    [ 762.139838] IPv4: __ip_options_echo in srr: optlen 197 soffset 84
    [ 762.139852] IPv4: ip_options_build srr=0 is_frag=0 rr_needaddr=0 ts_needaddr=0 ts_needtime=0 rr=0 ts=0
    [ 762.140269] ==================================================================
    [ 762.140713] IPv4: __ip_options_echo rr=0 ts=0 srr=0 cipso=0
    [ 762.141078] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0x12ec/0x1680
    [ 762.141087] Write of size 4 at addr ffff880353457c7f by task kworker/u16:0/7
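
    A sketch of the fix (IPCB() and icmp_send() are core networking API; the
    surrounding call corresponds to ipoib's packet-too-long path shown in the
    log above):

        /* wipe whatever the previous layer left in skb->cb so icmp_send()
         * does not read it back as IP options */
        memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
        icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));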

    Signed-off-by: Denis Drozdov
    Reviewed-by: Erez Shitrit
    Reviewed-by: Feras Daoud
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Denis Drozdov
     
  • [ Upstream commit 0f6ef65d1c6ec8deb5d0f11f86631ec4cfe8f22e ]

    If the provider driver (such as rdma_rxe) doesn't support pma counters,
    avoid exposing its directory similar to optional hw_counters directory.
    If core fails to read the PMA counter, return an error so that user can
    retry later if needed.

    Fixes: 35c4cbb17811 ("IB/core: Create get_perf_mad function in sysfs.c")
    Reported-by: Holger Hoffstätte
    Tested-by: Holger Hoffstätte
    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     

10 Nov, 2018

2 commits

  • commit 0295e39595e1146522f2722715dba7f7fba42217 upstream.

    hdr.cmd can be indirectly controlled by user-space, hence leading to
    a potential exploitation of the Spectre variant 1 vulnerability.

    This issue was detected with the help of Smatch:

    drivers/infiniband/core/ucm.c:1127 ib_ucm_write() warn: potential
    spectre issue 'ucm_cmd_table' [r] (local cap)

    Fix this by sanitizing hdr.cmd before using it to index
    ucm_cmd_table.
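
    A sketch of the standard Spectre-v1 mitigation for a user-controlled table
    index (array_index_nospec() comes from linux/nospec.h):

        if (hdr.cmd >= ARRAY_SIZE(ucm_cmd_table))
                return -EINVAL;
        /* clamp the index under speculation before using it */
        hdr.cmd = array_index_nospec(hdr.cmd, ARRAY_SIZE(ucm_cmd_table));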

    Notice that given that speculation windows are large, the policy is
    to kill the speculation on the first load and not worry if it can be
    completed with a dependent load/store [1].

    [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

    Cc: stable@vger.kernel.org
    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Gustavo A. R. Silva
     
  • commit a3671a4f973ee9d9621d60166cc3b037c397d604 upstream.

    hdr.cmd can be indirectly controlled by user-space, hence leading to
    a potential exploitation of the Spectre variant 1 vulnerability.

    This issue was detected with the help of Smatch:

    drivers/infiniband/core/ucma.c:1686 ucma_write() warn: potential
    spectre issue 'ucma_cmd_table' [r] (local cap)

    Fix this by sanitizing hdr.cmd before using it to index
    ucm_cmd_table.

    Notice that given that speculation windows are large, the policy is
    to kill the speculation on the first load and not worry if it can be
    completed with a dependent load/store [1].

    [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

    Cc: stable@vger.kernel.org
    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Gustavo A. R. Silva
     

04 Nov, 2018

5 commits

  • [ Upstream commit 43cbd64b1fdc1da89abdad88a022d9e87a98e9c6 ]

    usnic has a modified version of the core code's ib_umem_get() and
    related, and the copy misses many of the bug fixes done over the years:

    Commit bc3e53f682d9 ("mm: distinguish between mlocked and pinned pages")
    Commit 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm
    from ib_umem_get")
    Commit 8494057ab5e4 ("IB/uverbs: Prevent integer overflow in ib_umem_get
    address arithmetic")
    Commit 8abaae62f3fd ("IB/core: disallow registering 0-sized memory region")
    Commit 66578b0b2f69 ("IB/core: don't disallow registering region starting
    at 0x0")
    Commit 53376fedb9da ("RDMA/core: not to set page dirty bit if it's already
    set.")
    Commit 8e907ed48827 ("IB/umem: Use the correct mm during ib_umem_release")

    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Jason Gunthorpe
     
  • [ Upstream commit e7b169f34403becd3c9fd3b6e46614ab788f2187 ]

    During QP creation, the mlx5 driver translates the QP type to an
    internal value which is passed on to FW. There was no check to make
    sure that the translated value is valid, and -EINVAL was coerced into
    the mailbox command.

    Current firmware refuses this as an invalid QP type, but future/past
    firmware may do something else.
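
    A sketch of the missing check; the translation helper name is illustrative:

        int mlx5_st = to_mlx5_st(init_attr->qp_type);

        if (mlx5_st < 0)
                return -EINVAL;   /* never let -EINVAL reach the mailbox as a "type" */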

    Fixes: 09a7d9eca1a6c ('{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc')
    Reviewed-by: Ilya Lesokhin
    Signed-off-by: Noa Osherovich
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Noa Osherovich
     
  • [ Upstream commit 6082d9c9c94a408d7409b5f2e4e42ac9e8b16d0d ]

    Adding the vector offset when calling mlx5_vector2eqn() is wrong, because
    mlx5_vector2eqn() checks whether the EQ index is equal to the vector number,
    and the internal completion vectors that mlx5 allocates don't get an EQ
    index.

    The second problem here is that using effective_affinity_mask gives the same
    CPU for different vectors.
    This leads to unmapped queues when calling it from blk_mq_rdma_map_queues().
    This doesn't happen when using affinity_hint mask.

    Fixes: 2572cf57d75a ("mlx5: fix mlx5_get_vector_affinity to start from completion vector 0")
    Fixes: 05e0cc84e00c ("net/mlx5: Fix get vector affinity helper function")
    Signed-off-by: Israel Rukshin
    Reviewed-by: Max Gurtovoy
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Sasha Levin

    Israel Rukshin
     
  • [ Upstream commit 6b9f8970cd30929cb6b372fa44fa66da9e59c650 ]

    If the allocation of elem fails, it is not sufficient to simply check
    for NULL and return. We need to also put our reference on the pool or
    else we will leave the pool with a permanent ref count and we will never
    be able to free it.
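
    A sketch of the balanced error path; the pool helper names are assumptions
    based on the commit text:

        elem = kmem_cache_zalloc(pool_cache(pool), GFP_KERNEL);
        if (!elem) {
                atomic_dec(&pool->num_elem);  /* undo the element count taken earlier */
                rxe_pool_put(pool);           /* and drop our reference on the pool */
                return NULL;
        }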

    Fixes: 4831ca9e4a8e ("IB/rxe: check for allocation failure on elem")
    Suggested-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin

    Doug Ledford
     
  • [ Upstream commit 1f80bd6a6cc8358b81194e1f5fc16449947396ec ]

    The locking order of vlan_rwsem (LOCK A) and then rtnl (LOCK B)
    contradicts other flows such as ipoib_open, possibly causing a deadlock.
    To prevent this deadlock, the heavy flush is called with RTNL locked and
    only then tries to acquire vlan_rwsem.
    This deadlock is possible only when there are child interfaces.
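
    A sketch of the corrected ordering in the heavy-flush path, matching
    ipoib_open(), which runs with rtnl held before taking vlan_rwsem:

        rtnl_lock();                    /* LOCK B first, as in the ipoib_open path */
        down_read(&priv->vlan_rwsem);   /* LOCK A second */
        /* flush the parent and its child interfaces */
        up_read(&priv->vlan_rwsem);
        rtnl_unlock();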

    [ 140.941758] ======================================================
    [ 140.946276] WARNING: possible circular locking dependency detected
    [ 140.950950] 4.15.0-rc1+ #9 Tainted: G O
    [ 140.954797] ------------------------------------------------------
    [ 140.959424] kworker/u32:1/146 is trying to acquire lock:
    [ 140.963450] (rtnl_mutex){+.+.}, at: [] __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib]
    [ 140.970006]
    but task is already holding lock:
    [ 140.975141] (&priv->vlan_rwsem){++++}, at: [] __ipoib_ib_dev_flush+0x51/0x4e0 [ib_ipoib]
    [ 140.982105]
    which lock already depends on the new lock.
    [ 140.990023]
    the existing dependency chain (in reverse order) is:
    [ 140.998650]
    -> #1 (&priv->vlan_rwsem){++++}:
    [ 141.005276] down_read+0x4d/0xb0
    [ 141.009560] ipoib_open+0xad/0x120 [ib_ipoib]
    [ 141.014400] __dev_open+0xcb/0x140
    [ 141.017919] __dev_change_flags+0x1a4/0x1e0
    [ 141.022133] dev_change_flags+0x23/0x60
    [ 141.025695] devinet_ioctl+0x704/0x7d0
    [ 141.029156] sock_do_ioctl+0x20/0x50
    [ 141.032526] sock_ioctl+0x221/0x300
    [ 141.036079] do_vfs_ioctl+0xa6/0x6d0
    [ 141.039656] SyS_ioctl+0x74/0x80
    [ 141.042811] entry_SYSCALL_64_fastpath+0x1f/0x96
    [ 141.046891]
    -> #0 (rtnl_mutex){+.+.}:
    [ 141.051701] lock_acquire+0xd4/0x220
    [ 141.055212] __mutex_lock+0x88/0x970
    [ 141.058631] __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib]
    [ 141.063160] __ipoib_ib_dev_flush+0x71/0x4e0 [ib_ipoib]
    [ 141.067648] process_one_work+0x1f5/0x610
    [ 141.071429] worker_thread+0x4a/0x3f0
    [ 141.074890] kthread+0x141/0x180
    [ 141.078085] ret_from_fork+0x24/0x30
    [ 141.081559]

    other info that might help us debug this:
    [ 141.088967] Possible unsafe locking scenario:
    [ 141.094280]        CPU0                         CPU1
    [ 141.097953]        ----                         ----
    [ 141.101640]   lock(&priv->vlan_rwsem);
    [ 141.104771]                                lock(rtnl_mutex);
    [ 141.109207]                                lock(&priv->vlan_rwsem);
    [ 141.114032]   lock(rtnl_mutex);
    [ 141.116800]
    *** DEADLOCK ***

    Fixes: b4b678b06f6e ("IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop")
    Signed-off-by: Alex Vesker
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Alex Vesker
     

20 Oct, 2018

1 commit

  • commit b4a4957d3d1c328b733fce783b7264996f866ad2 upstream.

    rvt_destroy_qp() cannot complete until all in process packets have
    been released from the underlying hardware. If a link down event
    occurs, an application can hang with a kernel stack similar to:

    cat /proc//stack
    quiesce_qp+0x178/0x250 [hfi1]
    rvt_reset_qp+0x23d/0x400 [rdmavt]
    rvt_destroy_qp+0x69/0x210 [rdmavt]
    ib_destroy_qp+0xba/0x1c0 [ib_core]
    nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
    nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
    nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
    nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
    process_one_work+0x17a/0x440
    worker_thread+0x126/0x3c0
    kthread+0xcf/0xe0
    ret_from_fork+0x58/0x90
    0xffffffffffffffff

    quiesce_qp() waits until all outstanding packets have been freed.
    This wait should be momentary. During a link down event, the cleanup
    handling does not ensure that all packets caught by the link down are
    flushed properly.

    This is caused by the fact that the freeze path and the link down
    event are handled the same way. This is not correct. The freeze path
    waits until the HFI is unfrozen and then restarts PIO. A link down
    is not a freeze event. The link down path cannot restart the PIO
    until link is restored. If the PIO path is restarted before the link
    comes up, the application (QP) using the PIO path will hang (until
    link is restored).

    Fix by separating the linkdown path from the freeze path and use the
    link down path for link down events.

    Close a race condition in sc_disable() by acquiring both the progress
    and release locks.

    Close a race condition in sc_stop() by moving the setting of the flag
    bits under the alloc lock.

    Cc: # 4.9.x+
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     

13 Oct, 2018

1 commit

  • commit 5fe23f262e0548ca7f19fb79f89059a60d087d22 upstream.

    There is a race condition between ucma_close() and ucma_resolve_ip():

    CPU0                                  CPU1
    ucma_resolve_ip():                    ucma_close():

    ctx = ucma_get_ctx(file, cmd.id);

                                          list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
                                              mutex_lock(&mut);
                                              idr_remove(&ctx_idr, ctx->id);
                                              mutex_unlock(&mut);
                                              ...
                                              mutex_lock(&mut);
                                              if (!ctx->closing) {
                                                  mutex_unlock(&mut);
                                                  rdma_destroy_id(ctx->cm_id);
                                              ...
                                              ucma_free_ctx(ctx);

    ret = rdma_resolve_addr();
    ucma_put_ctx(ctx);

    Before idr_remove(), ucma_get_ctx() could still find the ctx
    and after rdma_destroy_id(), rdma_resolve_addr() may still
    access id_priv pointer. Also, ucma_put_ctx() may use ctx after
    ucma_free_ctx() too.

    ucma_close() should call ucma_put_ctx() too, which tests the
    refcnt and waits for the last user to release it. A similar
    pattern is already used by ucma_destroy_id().
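
    A sketch of the close path after the fix (ctx field names follow the commit
    text; the completion-based wait mirrors what ucma_destroy_id() already does):

        mutex_lock(&mut);
        idr_remove(&ctx_idr, ctx->id);   /* no new lookups can find the ctx */
        mutex_unlock(&mut);

        ucma_put_ctx(ctx);               /* drop the table's reference */
        wait_for_completion(&ctx->comp); /* wait for users like ucma_resolve_ip() */

        rdma_destroy_id(ctx->cm_id);
        ucma_free_ctx(ctx);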

    Reported-and-tested-by: syzbot+da2591e115d57a9cbb8b@syzkaller.appspotmail.com
    Reported-by: syzbot+cfe3c1e8ef634ba8964b@syzkaller.appspotmail.com
    Cc: Jason Gunthorpe
    Cc: Doug Ledford
    Cc: Leon Romanovsky
    Signed-off-by: Cong Wang
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     

10 Oct, 2018

1 commit

  • [ Upstream commit 0d23ba6034b9cf48b8918404367506da3e4b3ee5 ]

    The current code grabs the private_data of whatever file descriptor
    userspace has supplied and implicitly casts it to a `struct ucma_file *`,
    potentially causing a type confusion.

    This is probably fine in practice because the pointer is only used for
    comparisons, it is never actually dereferenced; and even in the
    comparisons, it is unlikely that a file from another filesystem would have
    a ->private_data pointer that happens to also be valid in this context.
    But ->private_data is not always guaranteed to be a valid pointer to an
    object owned by the file's filesystem; for example, some filesystems just
    cram numbers in there.

    Check the type of the supplied file descriptor to be safe, analogous to how
    other places in the kernel do it.
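
    A sketch of such a type check; comparing f_op against the owning driver's
    file_operations is the usual pattern, and "ucma_fops" is the assumed name
    here:

        f = fdget(cmd.fd);
        if (!f.file)
                return -ENOENT;
        if (f.file->f_op != &ucma_fops) {  /* not our file: don't trust ->private_data */
                ret = -EINVAL;
                goto file_put;
        }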

    Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()")
    Signed-off-by: Jann Horn
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     

04 Oct, 2018

9 commits

  • commit 67e3816842fe6414d629c7515b955952ec40c7d7 upstream.

    Currently a uverbs completion event queue is flushed of events in
    ib_uverbs_comp_event_close() with the queue spinlock held and then
    released. Yet ev_queue->is_closed is not set until later, in
    uverbs_hot_unplug_completion_event_file().

    In between the time ib_uverbs_comp_event_close() releases the lock and
    uverbs_hot_unplug_completion_event_file() acquires the lock, a completion
    event can arrive and be inserted into the event queue by
    ib_uverbs_comp_handler().

    This can cause a "double add" list_add warning or crash depending on the
    kernel configuration, or a memory leak because the event is never dequeued
    since the queue is already closed down.

    So add setting ev_queue->is_closed = 1 to ib_uverbs_comp_event_close().
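
    A sketch of the close path with the added flag (field names taken from the
    commit text):

        spin_lock_irq(&ev_queue->lock);
        ev_queue->is_closed = 1;   /* added: ib_uverbs_comp_handler() now drops new events */
        /* flush the already-queued events as before */
        spin_unlock_irq(&ev_queue->lock);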

    Cc: stable@vger.kernel.org
    Fixes: 1e7710f3f656 ("IB/core: Change completion channel to use the reworked objects schema")
    Signed-off-by: Steve Wise
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Steve Wise
     
  • commit d623500b3c4efd8d4e945ac9003c6b87b469a9ab upstream.

    If a packet stream uses an UnsupportedVL (virtual lane), the send
    engine will not send the packet, and it will not indicate that an
    error has occurred. This will cause the packet stream to block.

    HFI has 8 virtual lanes available for packet streams. Each lane can
    be enabled or disabled using the UnsupportedVL mask. If a lane is
    disabled, adding a packet to the send context must be disallowed.

    The current mask for determining unsupported VLs defaults to 0 (allow
    all). This is incorrect. Only the VLs that are defined should be
    allowed.

    Determine which VLs are disabled (mtu == 0), and set the appropriate
    unsupported bit in the mask. The correct mask will allow the send
    engine to error on the invalid VL, and error recovery will work
    correctly.
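
    A sketch of building the mask; the per-VL MTU storage names are assumptions:

        u64 unsupported_vls = 0;
        int vl;

        for (vl = 0; vl < num_vls; vl++)
                if (dd->vld[vl].mtu == 0)          /* VL not configured */
                        unsupported_vls |= 1ULL << vl;
        /* write unsupported_vls into the send context's UnsupportedVL mask */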

    Cc: # 4.9.x+
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Lukasz Odzioba
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     
  • commit 94694d18cf27a6faad91487a38ce516c2b16e7d9 upstream.

    If the number of packets in a user sdma request does not match
    the actual iovectors being sent, sdma_cleanup can be called on
    an uninitialized request structure, resulting in a crash similar
    to this:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: [] __sdma_txclean+0x57/0x1e0 [hfi1]
    PGD 8000001044f61067 PUD 1052706067 PMD 0
    Oops: 0000 [#1] SMP
    CPU: 30 PID: 69912 Comm: upsm Kdump: loaded Tainted: G OE
    ------------ 3.10.0-862.el7.x86_64 #1
    Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS
    SE5C610.86B.01.01.0019.101220160604 10/12/2016
    task: ffff8b331c890000 ti: ffff8b2ed1f98000 task.ti: ffff8b2ed1f98000
    RIP: 0010:[] [] __sdma_txclean+0x57/0x1e0
    [hfi1]
    RSP: 0018:ffff8b2ed1f9bab0 EFLAGS: 00010286
    RAX: 0000000000008b2b RBX: ffff8b2adf6e0000 RCX: 0000000000000000
    RDX: 00000000000000a0 RSI: ffff8b2e9eedc540 RDI: ffff8b2adf6e0000
    RBP: ffff8b2ed1f9bad8 R08: 0000000000000000 R09: ffffffffc0b04a06
    R10: ffff8b331c890190 R11: ffffe6ed00bf1840 R12: ffff8b3315480000
    R13: ffff8b33154800f0 R14: 00000000fffffff2 R15: ffff8b2e9eedc540
    FS: 00007f035ac47740(0000) GS:ffff8b331e100000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000008 CR3: 0000000c03fe6000 CR4: 00000000001607e0
    Call Trace:
    [] user_sdma_send_pkts+0xdcd/0x1990 [hfi1]
    [] ? gup_pud_range+0x140/0x290
    [] ? hfi1_mmu_rb_insert+0x155/0x1b0 [hfi1]
    [] hfi1_user_sdma_process_request+0xc5b/0x11b0 [hfi1]
    [] hfi1_aio_write+0xba/0x110 [hfi1]
    [] do_sync_readv_writev+0x7b/0xd0
    [] do_readv_writev+0xce/0x260
    [] ? tty_ldisc_deref+0x19/0x20
    [] ? n_tty_ioctl+0xe0/0xe0
    [] vfs_writev+0x35/0x60
    [] SyS_writev+0x7f/0x110
    [] system_call_fastpath+0x1c/0x21
    Code: 06 49 c7 47 18 00 00 00 00 0f 87 89 01 00 00 5b 41 5c 41 5d 41 5e 41 5f
    5d c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 4e 10 48 89 fb 8b 51 08 49 89 d4
    83 e2 0c 41 81 e4 00 e0 00 00 48 c1 ea 02
    RIP [] __sdma_txclean+0x57/0x1e0 [hfi1]
    RSP
    CR2: 0000000000000008

    There are two exit points from user_sdma_send_pkts(). One (free_tx)
    merely frees the slab entry and one (free_txreq) cleans the sdma_txreq
    prior to freeing the slab entry. The free_txreq variation can only be
    called after one of the sdma_init*() variations has been called.

    In the panic case, the slab entry had been allocated but not inited.

    Fix the issue by exiting through free_tx thus avoiding sdma_clean().

    Cc: # 4.9.x+
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Lukasz Odzioba
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Greg Kroah-Hartman

    Signed-off-by: Jason Gunthorpe

    Michael J. Ruhl
     
  • commit 0dbfaa9f2813787679e296eb5476e40938ab48c8 upstream.

    The SL specified by a user needs to be a valid SL.

    Add a range check to the user specified SL value which protects from
    running off the end of the SL to SC table.
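
    A sketch of the range check; the table name is assumed from the commit text:

        if (sl >= ARRAY_SIZE(ibp->sl_to_sc))
                return -EINVAL;          /* reject before indexing the SL-to-SC table */
        sc = ibp->sl_to_sc[sl];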

    CC: stable@vger.kernel.org
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Signed-off-by: Ira Weiny
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Ira Weiny
     
  • commit ee92efe41cf358f4b99e73509f2bfd4733609f26 upstream.

    Use different loop variables for the inner and outer loop. This avoids
    an infinite loop if there are more RDMA channels than
    target->req_ring_size.
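
    A sketch of the intended shape, with illustrative variable names: the inner
    loop must not reuse the channel index:

        for (i = 0; i < target->ch_count; i++) {         /* outer: RDMA channels */
                ch = &target->ch[i];
                for (j = 0; j < target->req_ring_size; j++)
                        /* per-request work on ch->req_ring[j] */;
        }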

    Fixes: d92c0da71a35 ("IB/srp: Add multichannel support")
    Cc:
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
     
  • [ Upstream commit f1228867adaf8890826f2b59e4caddb1c5cc2df7 ]

    rdma_ah_find_type() can reach into ib_device->port_immutable with a
    potentially out-of-bounds port number, so check that the port number is
    valid first.
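
    A sketch of the guard (rdma_is_port_valid() is the core helper for this
    kind of check; returning the undefined AH type as the fallback is an
    assumption):

        if (!rdma_is_port_valid(dev, port_num))
                return RDMA_AH_ATTR_TYPE_UNDEFINED;   /* don't index port_immutable[] */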

    Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
    Signed-off-by: Tarick Bedeir
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Tarick Bedeir
     
  • [ Upstream commit c2d7c8ff89b22ddefb1ac2986c0d48444a667689 ]

    "nents" is an unsigned int, so if ib_map_mr_sg() returns a negative
    error code then it's type promoted to a high unsigned int which is
    treated as success.
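
    A sketch of the signedness fix (variable names illustrative; ib_map_mr_sg()
    returns int):

        int ret;

        ret = ib_map_mr_sg(mr, sg, sg_cnt, NULL, PAGE_SIZE);
        if (ret < 0)
                return ret;     /* keep the error signed instead of storing it */
        nents = ret;            /* only now convert to the unsigned count */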

    Fixes: a060b5629ab0 ("IB/core: generic RDMA READ/WRITE API")
    Signed-off-by: Dan Carpenter
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     
  • [ Upstream commit 5d9a2b0e28759e319a623da33940dbb3ce952b7d ]

    VMA lookup is supposed to be performed while mmap_sem is held.
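
    A sketch of the locking, as a reminder that find_vma() must run under
    mmap_sem on kernels of this era (the hugetlb test reflects the 2MB-page
    context of the fixed commit and is illustrative):

        down_read(&current->mm->mmap_sem);
        vma = find_vma(current->mm, addr);
        if (vma && is_vm_hugetlb_page(vma))
                /* note the 2MB-page capability while the VMA is stable */;
        up_read(&current->mm->mmap_sem);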

    Fixes: f26c7c83395b ("i40iw: Add 2MB page support")
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     
  • [ Upstream commit 474e5a86067e5f12c97d1db8b170c7f45b53097a ]

    The sgid_tbl->tbl[] array is allocated in bnxt_qplib_alloc_sgid_tbl().
    It has sgid_tbl->max elements. So the > should be >= to prevent
    accessing one element beyond the end of the array.
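
    A sketch of the corrected bound; the index variable name is illustrative:

        if (index >= sgid_tbl->max)      /* was '>', which allowed one-past-the-end */
                return -EINVAL;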

    Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
    Signed-off-by: Dan Carpenter
    Acked-by: Selvin Xavier
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter