20 Oct, 2018
1 commit
-
commit b4a4957d3d1c328b733fce783b7264996f866ad2 upstream.
rvt_destroy_qp() cannot complete until all in-process packets have
been released from the underlying hardware. If a link down event
occurs, an application can hang with a kernel stack similar to:

cat /proc//stack
quiesce_qp+0x178/0x250 [hfi1]
rvt_reset_qp+0x23d/0x400 [rdmavt]
rvt_destroy_qp+0x69/0x210 [rdmavt]
ib_destroy_qp+0xba/0x1c0 [ib_core]
nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
process_one_work+0x17a/0x440
worker_thread+0x126/0x3c0
kthread+0xcf/0xe0
ret_from_fork+0x58/0x90
0xffffffffffffffff

quiesce_qp() waits until all outstanding packets have been freed.
This wait should be momentary. During a link down event, the cleanup
handling does not ensure that all packets caught by the link down are
flushed properly.

This is caused by the fact that the freeze path and the link down
event are handled the same way. This is not correct. The freeze path
waits until the HFI is unfrozen and then restarts PIO. A link down
is not a freeze event. The link down path cannot restart the PIO
until the link is restored. If the PIO path is restarted before the
link comes up, the application (QP) using the PIO path will hang
(until the link is restored).

Fix by separating the link down path from the freeze path and using
the link down path for link down events.

Close a race condition in sc_disable() by acquiring both the progress
and release locks.

Close a race condition in sc_stop() by moving the setting of the flag
bits under the alloc lock.

Cc: # 4.9.x+
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman
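The freeze/link-down split described above can be modeled in a few lines. This is a minimal userspace sketch, not hfi1 code: `struct pio_ctx`, `handle_freeze()`, `handle_link_down()` and `handle_link_up()` are hypothetical names, and the wait for unfreeze is reduced to a comment. What it shows is that the link down path flushes the caught packets but leaves PIO stopped until the link returns, whereas the freeze path restarts PIO itself.

```c
#include <stdbool.h>

/* Hypothetical model of a PIO send context; not the hfi1 structures. */
enum sc_event { SC_EV_FREEZE, SC_EV_LINK_DOWN };

struct pio_ctx {
	bool running;   /* PIO accepts new packets */
	bool link_up;   /* physical link state */
	int in_flight;  /* packets caught by the event */
	int flushed;    /* packets returned to their owners */
};

static void flush_in_flight(struct pio_ctx *pc)
{
	pc->flushed += pc->in_flight;
	pc->in_flight = 0;
}

/* Freeze: stop, flush, wait for unfreeze, then restart PIO here. */
static void handle_freeze(struct pio_ctx *pc)
{
	pc->running = false;
	flush_in_flight(pc);
	/* ... wait until the HFI is unfrozen (omitted in this sketch) ... */
	pc->running = true;
}

/* Link down: stop and flush, but do NOT restart PIO here. */
static void handle_link_down(struct pio_ctx *pc)
{
	pc->link_up = false;
	pc->running = false;
	flush_in_flight(pc);
}

/* PIO is restarted only once the link is restored. */
static void handle_link_up(struct pio_ctx *pc)
{
	pc->link_up = true;
	pc->running = true;
}

static void handle_event(struct pio_ctx *pc, enum sc_event ev)
{
	switch (ev) {
	case SC_EV_FREEZE:
		handle_freeze(pc);
		break;
	case SC_EV_LINK_DOWN:
		handle_link_down(pc);
		break;
	}
}
```

Keeping the two handlers separate means the "restart" decision lives with the event that can actually justify it.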
04 Oct, 2018
6 commits
-
commit d623500b3c4efd8d4e945ac9003c6b87b469a9ab upstream.
If a packet stream uses an unsupported VL (virtual lane), the send
engine will not send the packet, and it will not indicate that an
error has occurred. This will cause the packet stream to block.

HFI has 8 virtual lanes available for packet streams. Each lane can
be enabled or disabled using the UnsupportedVL mask. If a lane is
disabled, adding a packet to the send context must be disallowed.

The current mask for determining unsupported VLs defaults to 0 (allow
all). This is incorrect. Only the VLs that are defined should be
allowed.

Determine which VLs are disabled (mtu == 0), and set the appropriate
unsupported bit in the mask. The correct mask will allow the send
engine to error on the invalid VL, and error recovery will work
correctly.

Cc: # 4.9.x+
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn
Reviewed-by: Lukasz Odzioba
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman
-
commit 94694d18cf27a6faad91487a38ce516c2b16e7d9 upstream.
If the number of packets in a user sdma request does not match
the actual iovectors being sent, sdma_cleanup can be called on
an uninitialized request structure, resulting in a crash similar
to this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [] __sdma_txclean+0x57/0x1e0 [hfi1]
PGD 8000001044f61067 PUD 1052706067 PMD 0
Oops: 0000 [#1] SMP
CPU: 30 PID: 69912 Comm: upsm Kdump: loaded Tainted: G OE
------------ 3.10.0-862.el7.x86_64 #1
Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS
SE5C610.86B.01.01.0019.101220160604 10/12/2016
task: ffff8b331c890000 ti: ffff8b2ed1f98000 task.ti: ffff8b2ed1f98000
RIP: 0010:[] [] __sdma_txclean+0x57/0x1e0
[hfi1]
RSP: 0018:ffff8b2ed1f9bab0 EFLAGS: 00010286
RAX: 0000000000008b2b RBX: ffff8b2adf6e0000 RCX: 0000000000000000
RDX: 00000000000000a0 RSI: ffff8b2e9eedc540 RDI: ffff8b2adf6e0000
RBP: ffff8b2ed1f9bad8 R08: 0000000000000000 R09: ffffffffc0b04a06
R10: ffff8b331c890190 R11: ffffe6ed00bf1840 R12: ffff8b3315480000
R13: ffff8b33154800f0 R14: 00000000fffffff2 R15: ffff8b2e9eedc540
FS: 00007f035ac47740(0000) GS:ffff8b331e100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000c03fe6000 CR4: 00000000001607e0
Call Trace:
[] user_sdma_send_pkts+0xdcd/0x1990 [hfi1]
[] ? gup_pud_range+0x140/0x290
[] ? hfi1_mmu_rb_insert+0x155/0x1b0 [hfi1]
[] hfi1_user_sdma_process_request+0xc5b/0x11b0 [hfi1]
[] hfi1_aio_write+0xba/0x110 [hfi1]
[] do_sync_readv_writev+0x7b/0xd0
[] do_readv_writev+0xce/0x260
[] ? tty_ldisc_deref+0x19/0x20
[] ? n_tty_ioctl+0xe0/0xe0
[] vfs_writev+0x35/0x60
[] SyS_writev+0x7f/0x110
[] system_call_fastpath+0x1c/0x21
Code: 06 49 c7 47 18 00 00 00 00 0f 87 89 01 00 00 5b 41 5c 41 5d 41 5e 41 5f
5d c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 4e 10 48 89 fb 8b 51 08 49 89 d4
83 e2 0c 41 81 e4 00 e0 00 00 48 c1 ea 02
RIP [] __sdma_txclean+0x57/0x1e0 [hfi1]
RSP
CR2: 0000000000000008

There are two exit points from user_sdma_send_pkts(). One (free_tx)
merely frees the slab entry and one (free_txreq) cleans the sdma_txreq
prior to freeing the slab entry. The free_txreq variation can only be
called after one of the sdma_init*() variations has been called.

In the panic case, the slab entry had been allocated but not initialized.
Fix the issue by exiting through free_tx, thus avoiding sdma_clean().
Cc: # 4.9.x+
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn
Reviewed-by: Lukasz Odzioba
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Jason Gunthorpe
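The two exit points follow the usual kernel goto-unwind pattern. In this hypothetical userspace model, `struct user_tx`, `tx_clean()` and `send_one()` stand in for the slab entry, the sdma_txreq cleanup and user_sdma_send_pkts(); only the label names free_tx and free_txreq come from the commit text. A failure before initialization jumps to free_tx, so the cleanup routine never runs on an uninitialized request.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the slab-allocated request. */
struct user_tx {
	int *descs;  /* only exists after the init step */
};

static int clean_calls; /* how many times tx_clean() ran */

/* Models the sdma_txreq cleanup: only valid on an initialized request. */
static void tx_clean(struct user_tx *tx)
{
	clean_calls++;
	free(tx->descs);
	tx->descs = NULL;
}

/* Simulates a send that can fail before or after initialization. */
static int send_one(bool fail_before_init, bool fail_after_init)
{
	struct user_tx *tx = calloc(1, sizeof(*tx));
	int ret = 0;

	if (!tx)
		return -ENOMEM;
	if (fail_before_init) {
		ret = -EINVAL;
		goto free_tx;           /* not initialized: just free */
	}
	tx->descs = calloc(4, sizeof(*tx->descs)); /* the "init" step */
	if (!tx->descs) {
		ret = -ENOMEM;
		goto free_tx;
	}
	if (fail_after_init) {
		ret = -EINVAL;
		goto free_txreq;        /* initialized: clean, then free */
	}
	/* Success: the sketch tears down as if the send had completed. */
free_txreq:
	tx_clean(tx);
free_tx:
	free(tx);
	return ret;
}
```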
-
commit 0dbfaa9f2813787679e296eb5476e40938ab48c8 upstream.
The SL specified by a user needs to be a valid SL.
Add a range check to the user-specified SL value, which protects from
running off the end of the SL to SC table.

CC: stable@vger.kernel.org
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Signed-off-by: Ira Weiny
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit f1228867adaf8890826f2b59e4caddb1c5cc2df7 ]
rdma_ah_find_type() can reach into ib_device->port_immutable with a
potentially out-of-bounds port number, so check that the port number is
valid first.

Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Tarick Bedeir
Reviewed-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 5d9a2b0e28759e319a623da33940dbb3ce952b7d ]
VMA lookup is supposed to be performed while mmap_sem is held.
Fixes: f26c7c83395b ("i40iw: Add 2MB page support")
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 474e5a86067e5f12c97d1db8b170c7f45b53097a ]
The sgid_tbl->tbl[] array is allocated in bnxt_qplib_alloc_sgid_tbl().
It has sgid_tbl->max elements, so the > should be >= to prevent
accessing one element beyond the end of the array.

Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Dan Carpenter
Acked-by: Selvin Xavier
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
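A minimal sketch of the off-by-one, assuming a hypothetical `struct sgid_tbl` with a `tbl` array of `max` entries (the real bnxt_qplib structure has more fields): valid indices run from 0 to max - 1, so the rejection test must be `index >= max`.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical table: tbl[] has exactly max elements. */
struct sgid_tbl {
	const int *tbl;
	size_t max;
};

/* Valid indices are 0 .. max - 1; the commit replaces '>' with '>='. */
static int sgid_lookup(const struct sgid_tbl *t, size_t index, int *out)
{
	if (index >= t->max)
		return -EINVAL;
	*out = t->tbl[index];
	return 0;
}
```

With `>`, an index equal to `max` would pass the check and read one element past the end.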
29 Sep, 2018
1 commit
-
commit 308aa2b8f7b7db3332a7d41099fd37851fb793b2 upstream.
Once the qp has been flushed, it cannot be flushed again. The user qp
flush logic wasn't enforcing it, however. The bug can cause
touch-after-free crashes like:

Unable to handle kernel paging request for data at address 0x000001ec
Faulting instruction address: 0xc008000016069100
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c008000016069100] flush_qp+0x80/0x480 [iw_cxgb4]
LR [c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
Call Trace:
[c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
[c00800001606e868] c4iw_ib_modify_qp+0x118/0x200 [iw_cxgb4]
[c0080000119eae80] ib_security_modify_qp+0xd0/0x3d0 [ib_core]
[c0080000119c4e24] ib_modify_qp+0xc4/0x2c0 [ib_core]
[c008000011df0284] iwcm_modify_qp_err+0x44/0x70 [iw_cm]
[c008000011df0fec] destroy_cm_id+0xcc/0x370 [iw_cm]
[c008000011ed4358] rdma_destroy_id+0x3c8/0x520 [rdma_cm]
[c0080000134b0540] ucma_close+0x90/0x1b0 [rdma_ucm]
[c000000000444da4] __fput+0xe4/0x2f0

So fix flush_qp() to only flush the wq once.
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman
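One way to make a flush idempotent is a per-QP flag that is tested and set before flushing. The sketch below is a hypothetical single-threaded model (`struct qp_model`, `flush_qp_once()`), not the iw_cxgb4 code; in a driver, the test-and-set would have to happen under the QP's lock so that two callers cannot both see the flag clear.

```c
#include <stdbool.h>

/* Hypothetical QP model; not iw_cxgb4's QP structure. */
struct qp_model {
	bool flushed;     /* set by the first flush */
	int flush_count;  /* times the flush work actually ran */
};

/* Flush at most once; later calls return without touching the WQ. */
static void flush_qp_once(struct qp_model *qp)
{
	if (qp->flushed)
		return;
	qp->flushed = true;
	qp->flush_count++;
	/* ... complete outstanding work requests with a flush status ... */
}
```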
15 Sep, 2018
2 commits
-
[ Upstream commit a1ceeca679dccc492235f0f629d9e9f7b3d51ca8 ]
hns bitmap allocation functions return 0 on success and -1 on failure.
Callers of these functions wrongly used their return value as an errno;
fix that by making a proper conversion.

Fixes: a598c6f4c5a8 ("IB/hns: Simplify function of pd alloc and qp alloc")
Signed-off-by: Gal Pressman
Acked-by: Lijun Ou
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit c513de490f808d8480346f9a58e6a4a5f3de12e7 ]
If the system BIOS does not supply NUMA node information to the
PCI devices, the NUMA node is selected by choosing the current
node.

This can lead to the following crash:
divide error: 0000 SMP
CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G IOE
------------ 3.10.0-693.21.1.el7.x86_64 #1
Hardware name: Intel Corporation S2600KP/S2600KP, BIOS
SE5C610.86B.01.01.0005.101720141054 10/17/2014
Workqueue: events work_for_cpu_fn
task: ffff880174480fd0 ti: ffff880174488000 task.ti: ffff880174488000
RIP: 0010: [] hfi1_dev_affinity_init+0x129/0x6a0 [hfi1]
RSP: 0018:ffff88017448bbf8 EFLAGS: 00010246
RAX: 0000000000000011 RBX: ffff88107ffba6c0 RCX: ffff88085c22e130
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880824ad0000
RBP: ffff88017448bc48 R08: 0000000000000011 R09: 0000000000000002
R10: ffff8808582b6ca0 R11: 0000000000003151 R12: ffff8808582b6ca0
R13: ffff8808582b6518 R14: ffff8808582b6010 R15: 0000000000000012
FS: 0000000000000000(0000) GS:ffff88085ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007efc707404f0 CR3: 0000000001a02000 CR4: 00000000001607f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
hfi1_init_dd+0x14b3/0x27a0 [hfi1]
? pcie_capability_write_word+0x46/0x70
? hfi1_pcie_init+0xc0/0x200 [hfi1]
do_init_one+0x153/0x4c0 [hfi1]
? sched_clock_cpu+0x85/0xc0
init_one+0x1b5/0x260 [hfi1]
local_pci_probe+0x4a/0xb0
work_for_cpu_fn+0x1a/0x30
process_one_work+0x17f/0x440
worker_thread+0x278/0x3c0
? manage_workers.isra.24+0x2a0/0x2a0
kthread+0xd1/0xe0
? insert_kthread_work+0x40/0x40
ret_from_fork+0x77/0xb0
? insert_kthread_work+0x40/0x40

If the BIOS is not supplying NUMA information:
- set the default table count to 1 for all possible nodes
- select node 0 (instead of the current NUMA node) to get consistent
  performance
- generate an error indicating that the BIOS should be upgraded

Reviewed-by: Gary Leshner
Reviewed-by: Mike Marciniszyn
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
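The first two bullets can be sketched as two small helpers. Everything here is hypothetical: `NO_NODE` stands in for the kernel's NUMA_NO_NODE, the warning text is illustrative rather than the driver's actual message, and `per_node_count()` only shows the idea of defaulting a count to 1 so that a later division by it cannot trap, which is presumably how the divide error above is avoided.

```c
#include <stdio.h>

#define NO_NODE (-1) /* stand-in for the kernel's NUMA_NO_NODE */

/* No NUMA info from the BIOS: use node 0 instead of the current node,
 * and warn (the message text is illustrative only). */
static int pick_affinity_node(int dev_node)
{
	if (dev_node < 0) {
		fprintf(stderr,
			"no PCI NUMA node from BIOS; using node 0, consider a BIOS upgrade\n");
		return 0;
	}
	return dev_node;
}

/* Default a per-node count to 1 so a later division by it cannot trap. */
static unsigned int per_node_count(unsigned int counted)
{
	return counted ? counted : 1;
}
```

Choosing a fixed node rather than "wherever probe happened to run" also makes the placement reproducible across boots.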
24 Aug, 2018
2 commits
-
[ Upstream commit d63c46734c545ad0488761059004a65c46efdde3 ]
Fix a memory leak in the error path of mlx5_ib_create_srq() by making sure
to free the allocated srq.

Fixes: c2b37f76485f ("IB/mlx5: Fix integer overflows in mlx5_ib_create_srq")
Signed-off-by: Kamal Heib
Acked-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 3dc7c7badb7502ec3e3aa817a8bdd9e53aa54c52 ]
Before returning -EPERM we should release some resources, as is already done
in the other error handling path of the function.

Fixes: d8f9cc328c88 ("IB/mlx4: Mark user MR as writable if actual virtual memory is writable")
Signed-off-by: Christophe JAILLET
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
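The fix follows the common goto-unwind pattern: every early return jumps to a label that releases whatever was acquired so far. A hypothetical userspace sketch; the resource helpers, `reg_mr_sketch()` and the `live` allocation counter are illustrative, not mlx4 code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static int live; /* outstanding allocations, for illustration */

static void *get_res(void)
{
	void *p = malloc(16);

	if (p)
		live++;
	return p;
}

static void put_res(void *p)
{
	if (p) {
		live--;
		free(p);
	}
}

/* Hypothetical registration: acquire two resources, then refuse write
 * access to non-writable memory. Every error path unwinds in reverse. */
static int reg_mr_sketch(bool want_write, bool mem_writable,
			 void **umem_out, void **mr_out)
{
	void *umem, *mr;
	int err;

	umem = get_res();
	if (!umem)
		return -ENOMEM;
	mr = get_res();
	if (!mr) {
		err = -ENOMEM;
		goto err_umem;
	}
	if (want_write && !mem_writable) {
		err = -EPERM;  /* must release umem and mr, not just return */
		goto err_mr;
	}
	*umem_out = umem;
	*mr_out = mr;
	return 0;

err_mr:
	put_res(mr);
err_umem:
	put_res(umem);
	return err;
}
```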
17 Jul, 2018
2 commits
-
commit 7b72717a20bba8bdd01b14c0460be7d15061cd6b upstream.
The code was mistakenly using the length of the page array memory instead
of the depth of the page array.

This would cause MR creation to fail in some cases.
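The underlying mistake is mixing units: a byte length compared against a limit expressed in entries. The sketch below is a hypothetical illustration, not the iw_cxgb4 code; `HW_MAX_FR_DEPTH` and both check functions are invented for the example. With 8-byte entries the byte length is eight times the depth, so a valid request can be rejected.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define HW_MAX_FR_DEPTH 1024u /* hypothetical limit, in page entries */

/* Correct: an entry count compared with an entry-count limit. */
static int check_depth(size_t depth)
{
	return depth <= HW_MAX_FR_DEPTH ? 0 : -EINVAL;
}

/* Buggy: the array's length in bytes compared with the same limit. */
static int check_depth_buggy(size_t depth)
{
	size_t len = depth * sizeof(uint64_t); /* 8 bytes per entry */

	return len <= HW_MAX_FR_DEPTH ? 0 : -EINVAL;
}
```

For example, a depth of 256 entries passes `check_depth()` but fails `check_depth_buggy()`, because 256 * 8 = 2048 bytes exceeds 1024.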
Fixes: 8376b86de7d3 ("iw_cxgb4: Support the new memory registration API")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman
-
commit b697d7d8c741f27b728a878fc55852b06d0f6f5e upstream.
The __get_txreq() function can return a pointer, ERR_PTR(-EBUSY), or NULL.
All of the relevant call sites look for IS_ERR, so the NULL return would
lead to a NULL pointer exception.

Do not use the ERR_PTR mechanism for this function.
Update all call sites to handle the return value correctly.
Clean up error paths to reflect the return value.
Fixes: 45842abbb292 ("staging/rdma/hfi1: move txreq header code")
Cc: # 4.9.x+
Reported-by: Dan Carpenter
Reviewed-by: Mike Marciniszyn
Reviewed-by: Kamenee Arumugam
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman
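Why a NULL return slips through: IS_ERR() only recognizes pointers in the top MAX_ERRNO values of the address space, and NULL is not one of them. The helpers below are userspace re-creations of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() for illustration, and `get_txreq_mixed()`/`get_txreq_fixed()` are hypothetical; the fixed variant mirrors the commit's approach of dropping ERR_PTR in favor of a single NULL failure value.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creations of the kernel helpers, for illustration only. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* Only the top MAX_ERRNO addresses encode errors. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Old style (hypothetical): valid pointer, ERR_PTR(-EBUSY), or NULL. */
static void *get_txreq_mixed(int mode, void *obj)
{
	if (mode == 1)
		return ERR_PTR(-EBUSY);
	if (mode == 2)
		return NULL;  /* not caught by an IS_ERR() check */
	return obj;
}

/* Fixed style (hypothetical): NULL is the only failure value. */
static void *get_txreq_fixed(int mode, void *obj)
{
	return mode ? NULL : obj;
}
```

Callers of the fixed style test `if (!tx)` rather than `IS_ERR(tx)`, so there is only one failure case to handle.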
03 Jul, 2018
8 commits
-
commit 6b1ca7ece15e94251d1d0d919f813943e4a58059 upstream.
There is no need to crash the machine if an unknown work request was
received in SQP MAD.

Cc: # 3.6
Fixes: 37bfc7c1e83f ("IB/mlx4: SR-IOV multiplex and demultiplex MADs")
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford
Signed-off-by: Greg Kroah-Hartman
-
commit 1bc0299d976e000ececc6acd76e33b4582646cb7 upstream.
The following code fails to allocate a buffer for the
tail address that the hardware DMAs into when the user
context DMA_RTAIL is set:

    if (HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL)) {
            rcd->rcvhdrtail_kvaddr = dma_zalloc_coherent(
                    &dd->pcidev->dev, PAGE_SIZE, &dma_hdrqtail,
                    gfp_flags);
            if (!rcd->rcvhdrtail_kvaddr)
                    goto bail_free;
            rcd->rcvhdrqtailaddr_dma = dma_hdrqtail;
    }

So rcvhdrtail_kvaddr would then be NULL.
The mmap logic fails to check for a NULL rcvhdrtail_kvaddr.
The fix is to test for both user and kernel DMA_RTAIL options
during the allocation as well as testing for a NULL
rcvhdrtail_kvaddr during the mmap processing.

Additionally, all downstream testing of the capmask for DMA_RTAIL
has been eliminated in favor of testing rcvhdrtail_kvaddr.

Cc: # 4.9.x
Reviewed-by: Michael J. Ruhl
Signed-off-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman
-
commit a93a0a31111231bb1949f4a83b17238f0fa32d6a upstream.
User send context integrity bits are cleared before the context is
disabled. If the send context is still processing data, any packets
that need those integrity bits will cause an error and halt the send
context.

During the disable handling, the driver waits for the context to drain.
If the context is halted, the driver will eventually time out because
the context won't drain, and will then incorrectly bounce the link.

Reorder the bit clearing and the context disable.
Examine the software state and send context status as well as the
egress status to determine if a send context is in the halted state.

Promote the check macros to static functions for consistency with the
new check and to follow kernel style.

Remove an unused define that refers to the egress timeout.
Cc: # 4.9.x
Reviewed-by: Mitko Haralanov
Reviewed-by: Mike Marciniszyn
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford
Signed-off-by: Greg Kroah-Hartman
-
commit 8c79d8223bb11b2f005695a32ddd3985de97727c upstream.
There are config-dependent code paths that expose panics in unload
paths, both in this file and in debugfs_remove_recursive(), because
CONFIG_FAULT_INJECTION and CONFIG_FAULT_INJECTION_DEBUG_FS can be
set independently.

Having CONFIG_FAULT_INJECTION set and CONFIG_FAULT_INJECTION_DEBUG_FS
not set causes fault_create_debugfs_attr() to return an error.

The debugfs.c routines tolerate failures, but the module unload panics
dereferencing a NULL in the two exit routines. If that is fixed, the
dir passed to debugfs_remove_recursive() comes from a memory location
that was freed and potentially reused, causing a segfault or corrupting
memory.

Here is an example of the NULL deref panic:
[66866.286829] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
[66866.295602] IP: hfi1_dbg_ibdev_exit+0x2a/0x80 [hfi1]
[66866.301138] PGD 858496067 P4D 858496067 PUD 8433a7067 PMD 0
[66866.307452] Oops: 0000 [#1] SMP
[66866.310953] Modules linked in: hfi1(-) rdmavt rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm iw_cm ib_cm ib_core rpcsec_gss_krb5 nfsv4 dns_resolver nfsv3 nfs fscache sb_edac x86_pkg_temp_thermal intel_powerclamp vfat fat coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel iTCO_wdt iTCO_vendor_support crypto_simd mei_me glue_helper cryptd mxm_wmi ipmi_si pcspkr lpc_ich sg mei ioatdma ipmi_devintf i2c_i801 mfd_core shpchp ipmi_msghandler wmi acpi_power_meter acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt igb fb_sys_fops ttm ahci ptp crc32c_intel libahci pps_core drm dca libata i2c_algo_bit i2c_core [last unloaded: opa_vnic]
[66866.385551] CPU: 8 PID: 7470 Comm: rmmod Not tainted 4.14.0-mam-tid-rdma #2
[66866.393317] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016
[66866.405252] task: ffff88084f28c380 task.stack: ffffc90008454000
[66866.411866] RIP: 0010:hfi1_dbg_ibdev_exit+0x2a/0x80 [hfi1]
[66866.417984] RSP: 0018:ffffc90008457da0 EFLAGS: 00010202
[66866.423812] RAX: 0000000000000000 RBX: ffff880857de0000 RCX: 0000000180040001
[66866.431773] RDX: 0000000180040002 RSI: ffffea0021088200 RDI: 0000000040000000
[66866.439734] RBP: ffffc90008457da8 R08: ffff88084220e000 R09: 0000000180040001
[66866.447696] R10: 000000004220e001 R11: ffff88084220e000 R12: ffff88085a31c000
[66866.455657] R13: ffffffffa07c9820 R14: ffffffffa07c9890 R15: ffff881059d78100
[66866.463618] FS: 00007f6876047740(0000) GS:ffff88085f800000(0000) knlGS:0000000000000000
[66866.472644] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[66866.479053] CR2: 0000000000000088 CR3: 0000000856357006 CR4: 00000000001606e0
[66866.487013] Call Trace:
[66866.489747] remove_one+0x1f/0x220 [hfi1]
[66866.494221] pci_device_remove+0x39/0xc0
[66866.498596] device_release_driver_internal+0x141/0x210
[66866.504424] driver_detach+0x3f/0x80
[66866.508409] bus_remove_driver+0x55/0xd0
[66866.512784] driver_unregister+0x2c/0x50
[66866.517164] pci_unregister_driver+0x2a/0xa0
[66866.521934] hfi1_mod_cleanup+0x10/0xaa2 [hfi1]
[66866.526988] SyS_delete_module+0x171/0x250
[66866.531558] do_syscall_64+0x67/0x1b0
[66866.535644] entry_SYSCALL64_slow_path+0x25/0x25
[66866.540792] RIP: 0033:0x7f6875525c27
[66866.544777] RSP: 002b:00007ffd48528e78 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[66866.553224] RAX: ffffffffffffffda RBX: 0000000001cc01d0 RCX: 00007f6875525c27
[66866.561185] RDX: 00007f6875596000 RSI: 0000000000000800 RDI: 0000000001cc0238
[66866.569146] RBP: 0000000000000000 R08: 00007f68757e9060 R09: 00007f6875596000
[66866.577120] R10: 00007ffd48528c00 R11: 0000000000000206 R12: 00007ffd48529db4
[66866.585080] R13: 0000000000000000 R14: 0000000001cc01d0 R15: 0000000001cc0010
[66866.593040] Code: 90 0f 1f 44 00 00 48 83 3d a3 8b 03 00 00 55 48 89 e5 53 48 89 fb 74 4e 48 8d bf 18 0c 00 00 e8 9d f2 ff ff 48 8b 83 20 0c 00 00 8b b8 88 00 00 00 e8 2a 21 b3 e0 48 8b bb 20 0c 00 00 e8 0e
[66866.614127] RIP: hfi1_dbg_ibdev_exit+0x2a/0x80 [hfi1] RSP: ffffc90008457da0
[66866.621885] CR2: 0000000000000088
[66866.625618] ---[ end trace c4817425783fb092 ]---

Fix by ensuring that upon failure from fault_create_debugfs_attr() the
parent pointer for the routines is always set to NULL, and by adding
guards in the exit routines to ensure that debugfs_remove_recursive() is
not called when the parent pointer is NULL.

Fixes: 0181ce31b260 ("IB/hfi1: Add receive fault injection feature")
Cc: # 4.14.x
Reviewed-by: Michael J. Ruhl
Signed-off-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford
Signed-off-by: Greg Kroah-Hartman
-
commit 7b74a83cf54a3747e22c57e25712bd70eef8acee upstream.
On fatal error the driver simulates CQEs for ULPs that rely on
completion of all their posted work requests.

For the GSI traffic, mlx5 has its own mechanism that sends the
completions via software CQEs directly to the relevant CQ.

This should be kept in fatal error too, so the driver should simulate
such CQEs with the specified error state in order to complete GSI QP
work requests.

Without the fix, the following deadlock might appear:
schedule_timeout+0x274/0x350
wait_for_common+0xec/0x240
mcast_remove_one+0xd0/0x120 [ib_core]
ib_unregister_device+0x12c/0x230 [ib_core]
mlx5_ib_remove+0xc4/0x270 [mlx5_ib]
mlx5_detach_device+0x184/0x1a0 [mlx5_core]
mlx5_unload_one+0x308/0x340 [mlx5_core]
mlx5_pci_err_detected+0x74/0xe0 [mlx5_core]

Cc: # 4.7
Fixes: 89ea94a7b6c4 ("IB/mlx5: Reset flow support for IB kernel ULPs")
Signed-off-by: Erez Shitrit
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman
-
commit d8f9cc328c8888369880e2527e9186d745f2bbf6 upstream.
To allow rereg_user_mr to modify the MR from read-only to writable without
using get_user_pages again, we needed to define the initial MR as writable.
However, this was originally done unconditionally, without taking into
account the writability of the underlying virtual memory.

As a result, any attempt to register a read-only MR over read-only
virtual memory failed.

To fix this, do not add the writable flag bit when the user virtual memory
is not writable (e.g. const memory).

However, when the underlying memory is NOT writable (and we therefore
do not define the initial MR as writable), the IB core adds a
"force writable" flag to its user-pages request. If this succeeds,
the reg_user_mr caller gets a writable copy of the original pages.

If the user-space caller then does a rereg_user_mr operation to enable
writability, this will succeed. This should not be allowed, since
the original virtual memory was not writable.

Cc:
Fixes: 9376932d0c26 ("IB/mlx4_ib: Add support for user MR re-registration")
Signed-off-by: Jason Gunthorpe
Signed-off-by: Jack Morgenstein
Signed-off-by: Leon Romanovsky
Signed-off-by: Greg Kroah-Hartman
-
commit 8d3e71136a080d007620472f50c7b3e63ba0f5cf upstream.
A warm restart will fail to unload the driver, leaving the link state
potentially flapping up to the point the BIOS resets the adapter.
Correct the issue by hooking the shutdown PCI method,
which will bring the port down.

Cc: # 4.9.x
Reviewed-by: Mike Marciniszyn
Signed-off-by: Alex Estrin
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford
Signed-off-by: Greg Kroah-Hartman
-
commit 0252f73334f9ef68868e4684200bea3565a4fcee upstream.
The following error occurs in a debug build when running MPI PSM:
[ 307.415911] WARNING: CPU: 4 PID: 23867 at lib/dma-debug.c:1158
check_unmap+0x4ee/0xa20
[ 307.455661] ib_qib 0000:05:00.0: DMA-API: device driver failed to check map
error[device address=0x00000000df82b000] [size=4096 bytes] [mapped as page]
[ 307.517494] Modules linked in:
[ 307.531584] ib_isert iscsi_target_mod ib_srpt target_core_mod rpcrdma
sunrpc ib_srp scsi_transport_srp scsi_tgt ib_iser libiscsi ib_ipoib
scsi_transport_iscsi rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
ib_qib intel_powerclamp coretemp rdmavt intel_rapl iosf_mbi kvm_intel kvm
irqbypass crc32_pclmul ghash_clmulni_intel ipmi_ssif ib_core aesni_intel sg
ipmi_si lrw gf128mul dca glue_helper ipmi_devintf iTCO_wdt gpio_ich hpwdt
iTCO_vendor_support ablk_helper hpilo acpi_power_meter cryptd ipmi_msghandler
ie31200_edac shpchp pcc_cpufreq lpc_ich pcspkr ip_tables xfs libcrc32c sd_mod
crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm ahci crct10dif_pclmul crct10dif_common
drm crc32c_intel libahci tg3 libata serio_raw ptp i2c_core
[ 307.846113] pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 307.866505] CPU: 4 PID: 23867 Comm: mpitests-IMB-MP Kdump: loaded Not
tainted 3.10.0-862.el7.x86_64.debug #1
[ 307.911178] Hardware name: HP ProLiant DL320e Gen8, BIOS J05 11/09/2013
[ 307.944206] Call Trace:
[ 307.956973] [] dump_stack+0x19/0x1b
[ 307.982201] [] __warn+0xd8/0x100
[ 308.005999] [] warn_slowpath_fmt+0x5f/0x80
[ 308.034260] [] check_unmap+0x4ee/0xa20
[ 308.060801] [] ? page_add_file_rmap+0x2a/0x1d0
[ 308.090689] [] debug_dma_unmap_page+0x9d/0xb0
[ 308.120155] [] ? might_fault+0xa0/0xb0
[ 308.146656] [] qib_tid_free.isra.14+0x215/0x2a0 [ib_qib]
[ 308.180739] [] qib_write+0x894/0x1280 [ib_qib]
[ 308.210733] [] ? __inode_security_revalidate+0x70/0x80
[ 308.244837] [] ? security_file_permission+0x27/0xb0
[ 308.266025] qib_ib0.8006: multicast join failed for
ff12:401b:8006:0000:0000:0000:ffff:ffff, status -22
[ 308.323421] [] vfs_write+0xc3/0x1f0
[ 308.347077] [] ? fget_light+0xfc/0x510
[ 308.372533] [] SyS_write+0x8a/0x100
[ 308.396456] [] system_call_fastpath+0x1c/0x21

The code calls qib_map_page(), which has never correctly tested for a
mapping error.

Fix by testing for pci_dma_mapping_error() in all cases and properly
handling the failure in the caller.

Additionally, streamline the qib_map_page() arguments to satisfy just
the single caller.

Cc:
Reviewed-by: Alex Estrin
Tested-by: Don Dutile
Reviewed-by: Don Dutile
Signed-off-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford
Signed-off-by: Greg Kroah-Hartman
21 Jun, 2018
2 commits
-
[ Upstream commit 59482a14918b282ca2a98f38c69da5ebeb1107d2 ]
When IRQ affinity is set and the interrupt type is unknown, a CPU
mask allocated within the function is never freed. Fix this memory
leak by allocating memory within the scope where it is used.

Reviewed-by: Mike Marciniszyn
Reviewed-by: Michael J. Ruhl
Signed-off-by: Sebastian Sanchez
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 5da9e742be44d9b7c68b1bf6e1aaf46a1aa7a52b ]
The module parameter num_user_context is defined as 'int' and
defaults to -1. The module_param_named() says that it is uint.

Correct the module_param_named() type information and update the modinfo
text to reflect the default value.

Reviewed-by: Dennis Dalessandro
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
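The mismatch is easy to see in userspace: the same int storage holding -1 reads as UINT_MAX when formatted as unsigned, which is roughly what a 'uint' parameter getter would report. A minimal sketch; `num_user_ctx_param` and `show_param()` are hypothetical names, and it assumes a 32-bit unsigned int.

```c
#include <stdio.h>

/* Hypothetical parameter stored as int; -1 means "use the default". */
static int num_user_ctx_param = -1;

/* Format the value as an 'int' getter ("%d") or a 'uint' one ("%u"). */
static void show_param(char *buf, size_t len, int as_uint)
{
	if (as_uint)
		snprintf(buf, len, "%u", (unsigned int)num_user_ctx_param);
	else
		snprintf(buf, len, "%d", num_user_ctx_param);
}
```

Declaring the type as int keeps the reported default consistent with the "-1 means default" convention.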
30 May, 2018
16 commits
-
[ Upstream commit 7672ed33c4c15dbe9d56880683baaba4227cf940 ]
Before commit f1b65df5a232 ("IB/mlx5: Add support for active_width and
active_speed in RoCE"), the mlx5_ib driver set the default active_width
and active_speed to IB_WIDTH_4X and IB_SPEED_QDR.

When the RoCE port is down, the RoCE port does not negotiate the active
width with the remote side, causing the active width to be zero. When
running userspace ibstat to view the port status, ibstat will panic as it
reads an invalid width from a sysfs file.

This patch restores the original behavior.
Fixes: f1b65df5a232 ("IB/mlx5: Add support for active_width and active_speed in RoCE").
Signed-off-by: Honggang Li
Reviewed-by: Hal Rosenstock
Reviewed-by: Noa Osherovich
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit caf61b1b8b88ccf1451f7321a176393797e8d292 ]
Once the FW has transitioned to error, FLUSH CQEs can be received.
We want the driver to be aware of the fact that the QP is already in error.

Without this fix, a user may see false error messages in the dmesg log,
mentioning that a FLUSH CQE was received while the QP is not in error state.

Fixes: cecbcddf ("qedr: Add support for QP verbs")
Signed-off-by: Michal Kalderon
Signed-off-by: Ariel Elior
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit b15606f47b89b0b09936d7f45b59ba6275527041 ]
The return code wasn't set properly when CNQ allocation failed.
This only affects error message logging: currently the user will
receive an error message that says the qedr driver load failed
with rc '0', instead of ENOMEM.

Fixes: ec72fce4 ("qedr: Add support for RoCE HW init")
Signed-off-by: Michal Kalderon
Signed-off-by: Ariel Elior
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit c3594f22302cca5e924e47ec1cc8edd265708f41 ]
QPs that were configured with an ack timeout value lower than 1
msec will not implement a retransmission timeout.
This means that if a packet or ACK is dropped, the QP
will not retransmit the packet.

This can lead to an application hang.
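For context, the IB Local ACK Timeout is an exponent t encoding 4.096 usec * 2^t, with 0 meaning no timeout. The sketch below assumes, as a hypothetical, a firmware interface that takes whole milliseconds: plain truncation maps t = 1..7 to 0 ms, so the conversion clamps any nonzero sub-millisecond value to 1 ms. It is not the qedr code; `ack_timeout_ns()` and `ack_timeout_ms()` are illustrative names.

```c
#include <stdint.h>

/* IB Local ACK Timeout: 4.096 usec * 2^t; 4.096 usec = 4096 ns. */
static uint64_t ack_timeout_ns(unsigned int t)
{
	return 4096ULL << t;   /* t is a 5-bit field, so this cannot overflow */
}

/* Convert to whole milliseconds for a msec-based firmware (assumption).
 * Truncation alone would turn t = 1..7 into 0 ms, so clamp to 1 ms. */
static uint32_t ack_timeout_ms(unsigned int t)
{
	uint64_t ms;

	if (t == 0)
		return 0;              /* keep "no timeout" as 0 */
	ms = ack_timeout_ns(t) / 1000000ULL;
	return ms ? (uint32_t)ms : 1;
}
```

For example, t = 7 is about 0.52 ms and becomes 1 ms, while t = 14 is about 67.1 ms and becomes 67 ms.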
Fixes: cecbcddf6 ("qedr: Add support for QP verbs")
Signed-off-by: Michal Kalderon
Signed-off-by: Ariel Elior
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 5d414b178e950ce9685c253994cc730893d5d887 ]
"err" is either zero or possibly uninitialized here. It should be
-EINVAL.

Fixes: 427c1e7bcd7e ("{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib")
Signed-off-by: Dan Carpenter
Acked-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit a18177925c252da7801149abe217c05b80884798 ]
The commit cited below added a gid_type field (RoCEv1 or RoCEv2)
to GID properties.

When adding GIDs, this gid_type field was copied over to the
hardware gid table. However, when deleting GIDs, the gid_type field
was not copied over to the hardware gid table.

As a result, when running RoCEv2, all RoCEv2 gids in the
hardware gid table were set to type RoCEv1 when any gid was deleted.

This problem would persist until the next gid was added (which would again
restore the gid_type field for all the gids in the hardware gid table).

Fix this by copying over the gid_type field to the hardware gid table
when deleting gids, so that the gid_type of all remaining gids is
preserved when a gid is deleted.

Fixes: b699a859d17b ("IB/mlx4: Add gid_type to GID properties")
Reviewed-by: Moni Shoua
Signed-off-by: Jack Morgenstein
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 0077416a3d529baccbe07ab3242e8db541cfadf6 ]
When using IPv4 addresses in RoCEv2, the GID format for the mapped
IPv4 address should be: ::ffff:.

In the cited commit, IPv4-mapped IPv6 addresses had the 3 upper dwords
zeroed out by memset, which resulted in deleting the ffff field.

However, since ipv6_addr_v4mapped() already verifies that the
gid has the format ::ffff:, no change is needed for the gid,
and the memset can simply be removed.

Fixes: 7e57b85c444c ("IB/mlx4: Add support for setting RoCEv2 gids in hardware")
Reviewed-by: Moni Shoua
Signed-off-by: Jack Morgenstein
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 551e1c67b4207455375a2e7a285dea1c7e8fc361 ]
iWARP does not support RDMA WRITE or SEND with immediate data.
The driver should check this before submitting to FW and return an
immediate error.

Signed-off-by: Michal Kalderon
Signed-off-by: Ariel Elior
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit e3fd112cbf21d049faf64ba1471d72b93c22109a ]
There is a race in qedr_poll_cq: latest_cqe wasn't protected by a lock,
leading to a case where two contexts accessing poll_cq at
the same time could leave one of them with a pointer to an old
latest_cqe, reading an invalid CQE element.

Signed-off-by: Amit Radzi
Signed-off-by: Michal Kalderon
Signed-off-by: Ariel Elior
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 497158aa5f520db50452ef928c0f955cb42f2e77 ]
Release the netdev references in the cleanup path. Invoke the cleanup
routines if bnxt_re_ib_reg fails.

Signed-off-by: Selvin Xavier
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit c354dff00db8df80f271418d8392065e10ffffb6 ]
To support host systems with a non-4K page size, l2_db_size shall be
calculated with 4096 instead of PAGE_SIZE. Also, supply the host page size
to FW during initialization.

Signed-off-by: Devesh Sharma
Signed-off-by: Selvin Xavier
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit a45bc17b360d75fac9ced85e99fda14bf38b4dc3 ]
HW requires an unconditional fence for all non-wire memory operations
through the SQ. This guarantees the completion of these memory operations.

Signed-off-by: Devesh Sharma
Signed-off-by: Selvin Xavier
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 65389322b28f81cc137b60a41044c2d958a7b950 ]
The IB spec says that a lid should be ignored when the link layer is
Ethernet, for example when building or parsing a CM request message
(CA17-34). However, since ib_lid_be16() and ib_lid_cpu16() validate the
slid even when the link layer is not IB, we set the slid to zero to
prevent false warnings in the kernel log.

Fixes: 62ede7779904 ("Add OPA extended LID support")
Reviewed-by: Majd Dibbiny
Signed-off-by: Moni Shoua
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit dcdaba08062b4726500b9456f8664bfda896c664 ]
During driver unload, the driver proceeds with cleanup
without waiting for the scheduled events. So the device
pointers get freed and the driver crashes when the events
are scheduled later.

Flush the bnxt_re_task workqueue before starting
device removal.

Signed-off-by: Selvin Xavier
Signed-off-by: Doug Ledford
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 6b4521f5174c26020ae0deb3ef7f2c28557cf445 ]
The driver leaves the QP memory pinned if the QP create command
fails in the FW. Avoid this scenario by adding a proper
exit path if the FW command fails.

Signed-off-by: Devesh Sharma
Signed-off-by: Selvin Xavier
Signed-off-by: Doug Ledford
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
commit f9e76ca3771bf23d2142a81a88ddd8f31f5c4c03 upstream.
A pio send egress error can occur when the PSM library attempts to
send a bad packet. That issue is still being investigated.

The pio error interrupt handler then attempts to progress the recovery
of the errored pio send context.

Code inspection reveals that the handling lacks the necessary locking
if that recovery interleaves with a PSM close of the "context" object
that contains the pio send context.

The lack of locking can cause the recovery to access the already
freed pio send context object and incorrectly deduce that the pio
send context is actually a kernel pio send context, as shown by the
NULL deref stack below:

[] _dev_info+0x6c/0x90
[] sc_restart+0x70/0x1f0 [hfi1]
[] ? __schedule+0x424/0x9b0
[] sc_halted+0x15/0x20 [hfi1]
[] process_one_work+0x17a/0x440
[] worker_thread+0x126/0x3c0
[] ? manage_workers.isra.24+0x2a0/0x2a0
[] kthread+0xcf/0xe0
[] ? insert_kthread_work+0x40/0x40
[] ret_from_fork+0x58/0x90
[] ? insert_kthread_work+0x40/0x40

This is the best-case scenario; other scenarios can corrupt the
already freed memory.

Fix by adding the necessary locking in the pio send context error
handler.

Cc: # 4.9.x
Reviewed-by: Mike Marciniszyn
Reviewed-by: Dennis Dalessandro
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford
Signed-off-by: Greg Kroah-Hartman
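The locking fix can be modeled as a lock shared by the close path and the error handler, with the handler re-checking that the send context still exists before touching it. This is a hypothetical userspace sketch using a pthread mutex in place of the driver's locks; none of these names come from hfi1.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: a user "context" owns a pio send context. */
struct send_ctx_model {
	bool halted;
	int restarts;
};

struct user_ctx_model {
	pthread_mutex_t ctx_lock;    /* taken by both close and the handler */
	struct send_ctx_model *sc;   /* NULL once closed */
};

static void close_user_ctx(struct user_ctx_model *uc)
{
	pthread_mutex_lock(&uc->ctx_lock);
	free(uc->sc);
	uc->sc = NULL;
	pthread_mutex_unlock(&uc->ctx_lock);
}

/* Returns true if a restart was attempted on a live, halted context. */
static bool pio_error_handler(struct user_ctx_model *uc)
{
	bool restarted = false;

	pthread_mutex_lock(&uc->ctx_lock);
	if (uc->sc && uc->sc->halted) {   /* never touch a freed context */
		uc->sc->halted = false;
		uc->sc->restarts++;
		restarted = true;
	}
	pthread_mutex_unlock(&uc->ctx_lock);
	return restarted;
}
```

Because both paths serialize on the same lock, the handler either sees the live context or sees NULL, never a freed object.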