20 Oct, 2018

1 commit

  • commit b4a4957d3d1c328b733fce783b7264996f866ad2 upstream.

    rvt_destroy_qp() cannot complete until all in-process packets have
    been released from the underlying hardware. If a link down event
    occurs, an application can hang with a kernel stack similar to:

    cat /proc//stack
    quiesce_qp+0x178/0x250 [hfi1]
    rvt_reset_qp+0x23d/0x400 [rdmavt]
    rvt_destroy_qp+0x69/0x210 [rdmavt]
    ib_destroy_qp+0xba/0x1c0 [ib_core]
    nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
    nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
    nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
    nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
    process_one_work+0x17a/0x440
    worker_thread+0x126/0x3c0
    kthread+0xcf/0xe0
    ret_from_fork+0x58/0x90
    0xffffffffffffffff

    quiesce_qp() waits until all outstanding packets have been freed.
    This wait should be momentary. During a link down event, the cleanup
    handling does not ensure that all packets caught by the link down are
    flushed properly.

    This happens because the freeze path and the link down event are
    handled the same way, which is not correct. The freeze path waits
    until the HFI is unfrozen and then restarts PIO. A link down is not
    a freeze event: the link down path cannot restart PIO until the link
    is restored. If the PIO path is restarted before the link comes up,
    the application (QP) using the PIO path will hang until the link is
    restored.

    Fix by separating the link down path from the freeze path and using
    the link down path for link down events.

    Close a race condition in sc_disable() by acquiring both the progress
    and release locks.

    Close a race condition in sc_stop() by moving the setting of the flag
    bits under the alloc lock.

    Cc: # 4.9.x+
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     

04 Oct, 2018

6 commits

  • commit d623500b3c4efd8d4e945ac9003c6b87b469a9ab upstream.

    If a packet stream uses an UnsupportedVL (virtual lane), the send
    engine will not send the packet, and it will not indicate that an
    error has occurred. This will cause the packet stream to block.

    HFI has 8 virtual lanes available for packet streams. Each lane can
    be enabled or disabled using the UnsupportedVL mask. If a lane is
    disabled, adding a packet to the send context must be disallowed.

    The current mask for determining unsupported VLs defaults to 0 (allow
    all). This is incorrect. Only the VLs that are defined should be
    allowed.

    Determine which VLs are disabled (mtu == 0), and set the appropriate
    unsupported bit in the mask. The correct mask will allow the send
    engine to error on the invalid VL, and error recovery will work
    correctly.
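
    The mask computation described above can be sketched in user-space C.
    This is a minimal illustration, not the driver's code: the function
    name, the per-lane MTU array, and the "bit N marks VL N as
    unsupported" layout are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_DATA_VLS 8 /* the text above states 8 virtual lanes */

/*
 * Build a mask with one bit per virtual lane. A set bit marks the lane
 * as unsupported. A lane is treated as disabled when its MTU is 0.
 */
static uint32_t unsupported_vl_mask(const uint16_t mtu[NUM_DATA_VLS])
{
	uint32_t mask = 0;
	size_t vl;

	for (vl = 0; vl < NUM_DATA_VLS; vl++) {
		if (mtu[vl] == 0)
			mask |= UINT32_C(1) << vl;
	}
	return mask;
}
```

    With a default of 0 every lane would be reported as usable; deriving
    the mask from the configured MTUs marks each disabled lane explicitly.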

    Cc: # 4.9.x+
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Lukasz Odzioba
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     
  • commit 94694d18cf27a6faad91487a38ce516c2b16e7d9 upstream.

    If the number of packets in a user sdma request does not match
    the actual iovectors being sent, sdma_cleanup can be called on
    an uninitialized request structure, resulting in a crash similar
    to this:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: [] __sdma_txclean+0x57/0x1e0 [hfi1]
    PGD 8000001044f61067 PUD 1052706067 PMD 0
    Oops: 0000 [#1] SMP
    CPU: 30 PID: 69912 Comm: upsm Kdump: loaded Tainted: G OE
    ------------ 3.10.0-862.el7.x86_64 #1
    Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS
    SE5C610.86B.01.01.0019.101220160604 10/12/2016
    task: ffff8b331c890000 ti: ffff8b2ed1f98000 task.ti: ffff8b2ed1f98000
    RIP: 0010:[] [] __sdma_txclean+0x57/0x1e0
    [hfi1]
    RSP: 0018:ffff8b2ed1f9bab0 EFLAGS: 00010286
    RAX: 0000000000008b2b RBX: ffff8b2adf6e0000 RCX: 0000000000000000
    RDX: 00000000000000a0 RSI: ffff8b2e9eedc540 RDI: ffff8b2adf6e0000
    RBP: ffff8b2ed1f9bad8 R08: 0000000000000000 R09: ffffffffc0b04a06
    R10: ffff8b331c890190 R11: ffffe6ed00bf1840 R12: ffff8b3315480000
    R13: ffff8b33154800f0 R14: 00000000fffffff2 R15: ffff8b2e9eedc540
    FS: 00007f035ac47740(0000) GS:ffff8b331e100000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000008 CR3: 0000000c03fe6000 CR4: 00000000001607e0
    Call Trace:
    [] user_sdma_send_pkts+0xdcd/0x1990 [hfi1]
    [] ? gup_pud_range+0x140/0x290
    [] ? hfi1_mmu_rb_insert+0x155/0x1b0 [hfi1]
    [] hfi1_user_sdma_process_request+0xc5b/0x11b0 [hfi1]
    [] hfi1_aio_write+0xba/0x110 [hfi1]
    [] do_sync_readv_writev+0x7b/0xd0
    [] do_readv_writev+0xce/0x260
    [] ? tty_ldisc_deref+0x19/0x20
    [] ? n_tty_ioctl+0xe0/0xe0
    [] vfs_writev+0x35/0x60
    [] SyS_writev+0x7f/0x110
    [] system_call_fastpath+0x1c/0x21
    Code: 06 49 c7 47 18 00 00 00 00 0f 87 89 01 00 00 5b 41 5c 41 5d 41 5e 41 5f
    5d c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 4e 10 48 89 fb 8b 51 08 49 89 d4
    83 e2 0c 41 81 e4 00 e0 00 00 48 c1 ea 02
    RIP [] __sdma_txclean+0x57/0x1e0 [hfi1]
    RSP
    CR2: 0000000000000008

    There are two exit points from user_sdma_send_pkts(). One (free_tx)
    merely frees the slab entry and one (free_txreq) cleans the sdma_txreq
    prior to freeing the slab entry. The free_txreq variation can only be
    called after one of the sdma_init*() variations has been called.

    In the panic case, the slab entry had been allocated but not initialized.

    Fix the issue by exiting through free_tx thus avoiding sdma_clean().
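
    The two exit points can be illustrated with a small user-space
    sketch. The label names follow the description above; the struct,
    the helper functions, and the MAX_DESCS limit are hypothetical
    stand-ins, not the hfi1 implementation.

```c
#include <stdlib.h>

#define MAX_DESCS 4 /* hypothetical limit, for illustration only */

struct txreq {
	int *descs; /* only valid after txreq_init() */
};

/* Stands in for one of the sdma_init*() variations. */
static int txreq_init(struct txreq *tx)
{
	tx->descs = calloc(MAX_DESCS, sizeof(*tx->descs));
	return tx->descs ? 0 : -1;
}

/* Must only run on an initialized request: it touches tx->descs. */
static void txreq_clean(struct txreq *tx)
{
	free(tx->descs);
	tx->descs = NULL;
}

/* Returns 0 on success, -1 if the request is rejected. */
static int send_one(int npkts, int niov)
{
	struct txreq *tx = malloc(sizeof(*tx));
	int ret = -1;

	if (!tx)
		return -1;

	/* Mismatch found before init: skip the clean step entirely. */
	if (npkts != niov)
		goto free_tx;

	if (txreq_init(tx))
		goto free_tx;

	/* Failure after init: the request must be cleaned first. */
	if (npkts > MAX_DESCS)
		goto free_txreq;

	ret = 0; /* the packet would be built and submitted here */

free_txreq:
	txreq_clean(tx);
free_tx:
	free(tx);
	return ret;
}
```

    The point of the layout is that free_tx never reads fields that only
    the init step sets up.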

    Cc: # 4.9.x+
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Lukasz Odzioba
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Greg Kroah-Hartman

    Signed-off-by: Jason Gunthorpe

    Michael J. Ruhl
     
  • commit 0dbfaa9f2813787679e296eb5476e40938ab48c8 upstream.

    The SL specified by a user needs to be a valid SL.

    Add a range check to the user specified SL value which protects from
    running off the end of the SL to SC table.
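
    A range check of this kind might look like the following sketch. The
    function name, the table passed as a parameter, and the -EINVAL
    return value are assumptions for the example; the commit text does
    not show the actual check.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Look up the SC for a user-supplied SL. Returns 0 and stores the SC
 * on success, or -EINVAL when sl would index past the end of the table.
 */
static int sl_to_sc_lookup(const uint8_t *table, size_t table_len,
			   unsigned int sl, uint8_t *sc)
{
	if (sl >= table_len)
		return -EINVAL;
	*sc = table[sl];
	return 0;
}
```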

    CC: stable@vger.kernel.org
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Signed-off-by: Ira Weiny
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Ira Weiny
     
  • [ Upstream commit f1228867adaf8890826f2b59e4caddb1c5cc2df7 ]

    rdma_ah_find_type() can reach into ib_device->port_immutable with a
    potentially out-of-bounds port number, so check that the port number is
    valid first.

    Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
    Signed-off-by: Tarick Bedeir
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Tarick Bedeir
     
  • [ Upstream commit 5d9a2b0e28759e319a623da33940dbb3ce952b7d ]

    VMA lookup is supposed to be performed while mmap_sem is held.

    Fixes: f26c7c83395b ("i40iw: Add 2MB page support")
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     
  • [ Upstream commit 474e5a86067e5f12c97d1db8b170c7f45b53097a ]

    The sgid_tbl->tbl[] array is allocated in bnxt_qplib_alloc_sgid_tbl().
    It has sgid_tbl->max elements. So the > should be >= to prevent
    accessing one element beyond the end of the array.
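
    The off-by-one can be shown with a simplified stand-in for the
    table. The struct below only borrows the names tbl and max from the
    description above; the element type and the accessor are
    hypothetical.

```c
#include <stddef.h>

struct sgid_tbl {
	int    *tbl; /* array of max elements: valid indices are 0..max-1 */
	size_t  max;
};

/* Returns a pointer to entry index, or NULL when index is out of range. */
static int *sgid_entry(struct sgid_tbl *t, size_t index)
{
	/* "index > t->max" would wrongly accept index == max. */
	if (index >= t->max)
		return NULL;
	return &t->tbl[index];
}
```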

    Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
    Signed-off-by: Dan Carpenter
    Acked-by: Selvin Xavier
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     

29 Sep, 2018

1 commit

  • commit 308aa2b8f7b7db3332a7d41099fd37851fb793b2 upstream.

    Once the qp has been flushed, it cannot be flushed again. However, the
    user qp flush logic was not enforcing this. The bug can cause
    touch-after-free crashes like:

    Unable to handle kernel paging request for data at address 0x000001ec
    Faulting instruction address: 0xc008000016069100
    Oops: Kernel access of bad area, sig: 11 [#1]
    ...
    NIP [c008000016069100] flush_qp+0x80/0x480 [iw_cxgb4]
    LR [c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
    Call Trace:
    [c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
    [c00800001606e868] c4iw_ib_modify_qp+0x118/0x200 [iw_cxgb4]
    [c0080000119eae80] ib_security_modify_qp+0xd0/0x3d0 [ib_core]
    [c0080000119c4e24] ib_modify_qp+0xc4/0x2c0 [ib_core]
    [c008000011df0284] iwcm_modify_qp_err+0x44/0x70 [iw_cm]
    [c008000011df0fec] destroy_cm_id+0xcc/0x370 [iw_cm]
    [c008000011ed4358] rdma_destroy_id+0x3c8/0x520 [rdma_cm]
    [c0080000134b0540] ucma_close+0x90/0x1b0 [rdma_ucm]
    [c000000000444da4] __fput+0xe4/0x2f0

    So fix flush_qp() to only flush the wq once.
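
    A "flush only once" guard can be sketched as below. The flag and the
    counter are hypothetical fields added for the illustration, and the
    sketch leaves out the locking a real driver would need around the
    flag.

```c
#include <stdbool.h>

struct qp {
	bool flushed;     /* set once the work queues have been flushed */
	int  flush_count; /* counts real flushes, for illustration only */
};

/* Flush the queue at most once; later calls return without touching it. */
static void flush_qp_once(struct qp *qp)
{
	if (qp->flushed)
		return;
	qp->flushed = true;
	qp->flush_count++; /* the actual flush work would happen here */
}
```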

    Cc: stable@vger.kernel.org
    Signed-off-by: Steve Wise
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Steve Wise
     

15 Sep, 2018

2 commits

  • [ Upstream commit a1ceeca679dccc492235f0f629d9e9f7b3d51ca8 ]

    hns bitmap allocation functions return 0 on success and -1 on failure.
    Callers of these functions wrongly used their return value as an errno.
    Fix that by making a proper conversion.
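
    The conversion can be sketched as follows. The allocator stub and
    the caller are hypothetical, and -ENOMEM is an assumed choice of
    error code. Passing -1 through unchanged would read as -EPERM on
    Linux, which misreports an allocation failure.

```c
#include <errno.h>

/* Stand-in for an allocator that reports failure as -1, not an errno. */
static int bitmap_alloc_stub(int fail, unsigned long *obj)
{
	if (fail)
		return -1;
	*obj = 7;
	return 0;
}

/* Caller translates the -1 into a real errno before returning it. */
static int alloc_pd(int fail, unsigned long *pdn)
{
	if (bitmap_alloc_stub(fail, pdn))
		return -ENOMEM;
	return 0;
}
```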

    Fixes: a598c6f4c5a8 ("IB/hns: Simplify function of pd alloc and qp alloc")
    Signed-off-by: Gal Pressman
    Acked-by: Lijun Ou
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Gal Pressman
     
  • [ Upstream commit c513de490f808d8480346f9a58e6a4a5f3de12e7 ]

    If the system BIOS does not supply NUMA node information to the
    PCI devices, the NUMA node is selected by choosing the current
    node.

    This can lead to the following crash:

    divide error: 0000 SMP
    CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G IOE
    ------------ 3.10.0-693.21.1.el7.x86_64 #1
    Hardware name: Intel Corporation S2600KP/S2600KP, BIOS
    SE5C610.86B.01.01.0005.101720141054 10/17/2014
    Workqueue: events work_for_cpu_fn
    task: ffff880174480fd0 ti: ffff880174488000 task.ti: ffff880174488000
    RIP: 0010: [] hfi1_dev_affinity_init+0x129/0x6a0 [hfi1]
    RSP: 0018:ffff88017448bbf8 EFLAGS: 00010246
    RAX: 0000000000000011 RBX: ffff88107ffba6c0 RCX: ffff88085c22e130
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880824ad0000
    RBP: ffff88017448bc48 R08: 0000000000000011 R09: 0000000000000002
    R10: ffff8808582b6ca0 R11: 0000000000003151 R12: ffff8808582b6ca0
    R13: ffff8808582b6518 R14: ffff8808582b6010 R15: 0000000000000012
    FS: 0000000000000000(0000) GS:ffff88085ec00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007efc707404f0 CR3: 0000000001a02000 CR4: 00000000001607f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Call Trace:
    hfi1_init_dd+0x14b3/0x27a0 [hfi1]
    ? pcie_capability_write_word+0x46/0x70
    ? hfi1_pcie_init+0xc0/0x200 [hfi1]
    do_init_one+0x153/0x4c0 [hfi1]
    ? sched_clock_cpu+0x85/0xc0
    init_one+0x1b5/0x260 [hfi1]
    local_pci_probe+0x4a/0xb0
    work_for_cpu_fn+0x1a/0x30
    process_one_work+0x17f/0x440
    worker_thread+0x278/0x3c0
    ? manage_workers.isra.24+0x2a0/0x2a0
    kthread+0xd1/0xe0
    ? insert_kthread_work+0x40/0x40
    ret_from_fork+0x77/0xb0
    ? insert_kthread_work+0x40/0x40

    If the BIOS is not supplying NUMA information:
    - set the default table count to 1 for all possible nodes
    - select node 0 (instead of the current NUMA node) to get
      consistent performance
    - generate an error indicating that the BIOS should be upgraded
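
    The three steps can be sketched in user-space C. Everything here is
    an assumption made for illustration: the sentinel value, the
    function names, and the idea that the per-node count is later used
    as a divisor (suggested by the divide error above, but not stated).

```c
#include <stdio.h>

#define NODE_UNKNOWN (-1) /* assumed sentinel for "BIOS gave no node" */

/* Return the node to use, falling back to node 0 with a logged error. */
static int pick_numa_node(int bios_node)
{
	if (bios_node == NODE_UNKNOWN) {
		fprintf(stderr, "NUMA node not reported; BIOS upgrade advised\n");
		return 0;
	}
	return bios_node;
}

/* Start every per-node count at 1 so it is never 0 when used later. */
static void init_node_counts(unsigned int *counts, int nr_nodes)
{
	int i;

	for (i = 0; i < nr_nodes; i++)
		counts[i] = 1;
}
```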

    Reviewed-by: Gary Leshner
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro

    Signed-off-by: Jason Gunthorpe

    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     

24 Aug, 2018

2 commits

  • [ Upstream commit d63c46734c545ad0488761059004a65c46efdde3 ]

    Fix memory leak in the error path of mlx5_ib_create_srq() by making sure
    to free the allocated srq.

    Fixes: c2b37f76485f ("IB/mlx5: Fix integer overflows in mlx5_ib_create_srq")
    Signed-off-by: Kamal Heib
    Acked-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Kamal Heib
     
  • [ Upstream commit 3dc7c7badb7502ec3e3aa817a8bdd9e53aa54c52 ]

    Before returning -EPERM we should release some resources, as already done
    in the other error handling path of the function.

    Fixes: d8f9cc328c88 ("IB/mlx4: Mark user MR as writable if actual virtual memory is writable")
    Signed-off-by: Christophe JAILLET
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Christophe Jaillet
     

17 Jul, 2018

2 commits

  • commit 7b72717a20bba8bdd01b14c0460be7d15061cd6b upstream.

    The code was mistakenly using the length of the page array memory instead
    of the depth of the page array.

    This would cause MR creation to fail in some cases.

    Fixes: 8376b86de7d3 ("iw_cxgb4: Support the new memory registration API")
    Cc: stable@vger.kernel.org
    Signed-off-by: Steve Wise
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Steve Wise
     
  • commit b697d7d8c741f27b728a878fc55852b06d0f6f5e upstream.

    The __get_txreq() function can return a pointer, ERR_PTR(-EBUSY), or NULL.
    All of the relevant call sites look for IS_ERR, so the NULL return would
    lead to a NULL pointer exception.

    Do not use the ERR_PTR mechanism for this function.

    Update all call sites to handle the return value correctly.

    Clean up error paths to reflect return value.
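
    The mismatch between IS_ERR-style checks and a NULL return can be
    demonstrated outside the kernel. The helpers below are user-space
    stand-ins written for this sketch (lower-case names to mark them as
    such), and get_txreq() is a hypothetical getter that uses NULL as
    its only failure value.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct txreq {
	int id;
};

/* Hypothetical getter after the change: NULL is the only failure value. */
static struct txreq *get_txreq(struct txreq *slot, int busy)
{
	return busy ? NULL : slot;
}

/* A caller that checks only is_err() lets a NULL result through. */
static int caller_accepts(const void *p)
{
	return !is_err(p);
}
```

    Because NULL is not in the error-pointer range, a call site that
    tests only for an error pointer goes on to dereference NULL. With a
    single failure value, a plain NULL check at each call site suffices.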

    Fixes: 45842abbb292 ("staging/rdma/hfi1: move txreq header code")
    Cc: # 4.9.x+
    Reported-by: Dan Carpenter
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Kamenee Arumugam
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     

03 Jul, 2018

8 commits

  • commit 6b1ca7ece15e94251d1d0d919f813943e4a58059 upstream.

    There is no need to crash the machine if an unknown work request is
    received in SQP MAD.

    Cc: # 3.6
    Fixes: 37bfc7c1e83f ("IB/mlx4: SR-IOV multiplex and demultiplex MADs")
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     
  • commit 1bc0299d976e000ececc6acd76e33b4582646cb7 upstream.

    The following code fails to allocate a buffer for the
    tail address that the hardware DMAs into when the user
    context DMA_RTAIL is set.

    if (HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL)) {
            rcd->rcvhdrtail_kvaddr = dma_zalloc_coherent(
                    &dd->pcidev->dev, PAGE_SIZE, &dma_hdrqtail,
                    gfp_flags);
            if (!rcd->rcvhdrtail_kvaddr)
                    goto bail_free;
            rcd->rcvhdrqtailaddr_dma = dma_hdrqtail;
    }

    So the rcvhdrtail_kvaddr would then be NULL.

    The mmap logic fails to check for a NULL rcvhdrtail_kvaddr.

    The fix is to test for both user and kernel DMA_RTAIL options
    during the allocation as well as testing for a NULL
    rcvhdrtail_kvaddr during the mmap processing.

    Additionally, all downstream testing of the capmask for DMA_RTAIL
    has been eliminated in favor of testing rcvhdrtail_kvaddr.

    Cc: # 4.9.x
    Reviewed-by: Michael J. Ruhl
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Mike Marciniszyn
     
  • commit a93a0a31111231bb1949f4a83b17238f0fa32d6a upstream.

    User send context integrity bits are cleared before the context is
    disabled. If the send context is still processing data, any packets
    that need those integrity bits will cause an error and halt the send
    context.

    During the disable handling, the driver waits for the context to drain.
    If the context is halted, the driver will eventually time out because
    the context won't drain and then incorrectly bounce the link.

    Reorder the bit clearing and the context disable.

    Examine the software state and send context status as well as the
    egress status to determine if a send context is in the halted state.

    Promote the check macros to static functions for consistency with the
    new check and to follow kernel style.

    Remove an unused define that refers to the egress timeout.

    Cc: # 4.9.x
    Reviewed-by: Mitko Haralanov
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     
  • commit 8c79d8223bb11b2f005695a32ddd3985de97727c upstream.

    There are config dependent code paths that expose panics in unload
    paths both in this file and in debugfs_remove_recursive() because
    CONFIG_FAULT_INJECTION and CONFIG_FAULT_INJECTION_DEBUG_FS can be
    set independently.

    Having CONFIG_FAULT_INJECTION set and CONFIG_FAULT_INJECTION_DEBUG_FS
    unset causes fault_create_debugfs_attr() to return an error.

    The debugfs.c routines tolerate failures, but the module unload panics
    dereferencing a NULL in the two exit routines. If that is fixed, the
    dir passed to debugfs_remove_recursive comes from a memory location
    that was freed and potentially reused causing a segfault or corrupting
    memory.

    Here is an example of the NULL deref panic:

    [66866.286829] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
    [66866.295602] IP: hfi1_dbg_ibdev_exit+0x2a/0x80 [hfi1]
    [66866.301138] PGD 858496067 P4D 858496067 PUD 8433a7067 PMD 0
    [66866.307452] Oops: 0000 [#1] SMP
    [66866.310953] Modules linked in: hfi1(-) rdmavt rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm iw_cm ib_cm ib_core rpcsec_gss_krb5 nfsv4 dns_resolver nfsv3 nfs fscache sb_edac x86_pkg_temp_thermal intel_powerclamp vfat fat coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel iTCO_wdt iTCO_vendor_support crypto_simd mei_me glue_helper cryptd mxm_wmi ipmi_si pcspkr lpc_ich sg mei ioatdma ipmi_devintf i2c_i801 mfd_core shpchp ipmi_msghandler wmi acpi_power_meter acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt igb fb_sys_fops ttm ahci ptp crc32c_intel libahci pps_core drm dca libata i2c_algo_bit i2c_core [last unloaded: opa_vnic]
    [66866.385551] CPU: 8 PID: 7470 Comm: rmmod Not tainted 4.14.0-mam-tid-rdma #2
    [66866.393317] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016
    [66866.405252] task: ffff88084f28c380 task.stack: ffffc90008454000
    [66866.411866] RIP: 0010:hfi1_dbg_ibdev_exit+0x2a/0x80 [hfi1]
    [66866.417984] RSP: 0018:ffffc90008457da0 EFLAGS: 00010202
    [66866.423812] RAX: 0000000000000000 RBX: ffff880857de0000 RCX: 0000000180040001
    [66866.431773] RDX: 0000000180040002 RSI: ffffea0021088200 RDI: 0000000040000000
    [66866.439734] RBP: ffffc90008457da8 R08: ffff88084220e000 R09: 0000000180040001
    [66866.447696] R10: 000000004220e001 R11: ffff88084220e000 R12: ffff88085a31c000
    [66866.455657] R13: ffffffffa07c9820 R14: ffffffffa07c9890 R15: ffff881059d78100
    [66866.463618] FS: 00007f6876047740(0000) GS:ffff88085f800000(0000) knlGS:0000000000000000
    [66866.472644] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [66866.479053] CR2: 0000000000000088 CR3: 0000000856357006 CR4: 00000000001606e0
    [66866.487013] Call Trace:
    [66866.489747] remove_one+0x1f/0x220 [hfi1]
    [66866.494221] pci_device_remove+0x39/0xc0
    [66866.498596] device_release_driver_internal+0x141/0x210
    [66866.504424] driver_detach+0x3f/0x80
    [66866.508409] bus_remove_driver+0x55/0xd0
    [66866.512784] driver_unregister+0x2c/0x50
    [66866.517164] pci_unregister_driver+0x2a/0xa0
    [66866.521934] hfi1_mod_cleanup+0x10/0xaa2 [hfi1]
    [66866.526988] SyS_delete_module+0x171/0x250
    [66866.531558] do_syscall_64+0x67/0x1b0
    [66866.535644] entry_SYSCALL64_slow_path+0x25/0x25
    [66866.540792] RIP: 0033:0x7f6875525c27
    [66866.544777] RSP: 002b:00007ffd48528e78 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
    [66866.553224] RAX: ffffffffffffffda RBX: 0000000001cc01d0 RCX: 00007f6875525c27
    [66866.561185] RDX: 00007f6875596000 RSI: 0000000000000800 RDI: 0000000001cc0238
    [66866.569146] RBP: 0000000000000000 R08: 00007f68757e9060 R09: 00007f6875596000
    [66866.577120] R10: 00007ffd48528c00 R11: 0000000000000206 R12: 00007ffd48529db4
    [66866.585080] R13: 0000000000000000 R14: 0000000001cc01d0 R15: 0000000001cc0010
    [66866.593040] Code: 90 0f 1f 44 00 00 48 83 3d a3 8b 03 00 00 55 48 89 e5 53 48 89 fb 74 4e 48 8d bf 18 0c 00 00 e8 9d f2 ff ff 48 8b 83 20 0c 00 00 8b b8 88 00 00 00 e8 2a 21 b3 e0 48 8b bb 20 0c 00 00 e8 0e
    [66866.614127] RIP: hfi1_dbg_ibdev_exit+0x2a/0x80 [hfi1] RSP: ffffc90008457da0
    [66866.621885] CR2: 0000000000000088
    [66866.625618] ---[ end trace c4817425783fb092 ]---

    Fix by ensuring that upon failure from fault_create_debugfs_attr() the
    parent pointer for the routines is always set to NULL, and by adding
    guards in the exit routines to ensure that debugfs_remove_recursive()
    is not called when the parent pointer is NULL.
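
    The guard pattern can be sketched in user-space C. The struct,
    remove_dir(), and the init/exit functions are hypothetical stand-ins
    for the debugfs handling; only the shape of the fix is taken from
    the description above.

```c
#include <stddef.h>

struct fault_state {
	void *dir; /* stands in for the debugfs parent pointer */
};

static int removed_calls; /* counts removals, for illustration only */

/* Stand-in for the recursive removal of the directory. */
static void remove_dir(void *dir)
{
	(void)dir;
	removed_calls++;
}

static void fault_init(struct fault_state *st, int create_fails)
{
	static int token;

	/* On failure the parent pointer is left NULL, never stale. */
	st->dir = create_fails ? NULL : &token;
}

static void fault_exit(struct fault_state *st)
{
	/* Guard: nothing was created, so there is nothing to remove. */
	if (!st->dir)
		return;
	remove_dir(st->dir);
	st->dir = NULL;
}
```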

    Fixes: 0181ce31b260 ("IB/hfi1: Add receive fault injection feature")
    Cc: # 4.14.x
    Reviewed-by: Michael J. Ruhl
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Mike Marciniszyn
     
  • commit 7b74a83cf54a3747e22c57e25712bd70eef8acee upstream.

    On fatal error the driver simulates CQE's for ULPs that rely on
    completion of all their posted work requests.

    For the GSI traffic, the mlx5 has its own mechanism that sends the
    completions via software CQE's directly to the relevant CQ.

    This mechanism should keep working after a fatal error too, so the
    driver should simulate such CQE's with the specified error state in
    order to complete GSI QP work requests.

    Without the fix, the following deadlock might appear:
    schedule_timeout+0x274/0x350
    wait_for_common+0xec/0x240
    mcast_remove_one+0xd0/0x120 [ib_core]
    ib_unregister_device+0x12c/0x230 [ib_core]
    mlx5_ib_remove+0xc4/0x270 [mlx5_ib]
    mlx5_detach_device+0x184/0x1a0 [mlx5_core]
    mlx5_unload_one+0x308/0x340 [mlx5_core]
    mlx5_pci_err_detected+0x74/0xe0 [mlx5_core]

    Cc: # 4.7
    Fixes: 89ea94a7b6c4 ("IB/mlx5: Reset flow support for IB kernel ULPs")
    Signed-off-by: Erez Shitrit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Erez Shitrit
     
  • commit d8f9cc328c8888369880e2527e9186d745f2bbf6 upstream.

    To allow rereg_user_mr to modify the MR from read-only to writable without
    using get_user_pages again, we needed to define the initial MR as writable.
    However, this was originally done unconditionally, without taking into
    account the writability of the underlying virtual memory.

    As a result, any attempt to register a read-only MR over read-only
    virtual memory failed.

    To fix this, do not add the writable flag bit when the user virtual memory
    is not writable (e.g. const memory).

    However, when the underlying memory is NOT writable (and we therefore
    do not define the initial MR as writable), the IB core adds a
    "force writable" flag to its user-pages request. If this succeeds,
    the reg_user_mr caller gets a writable copy of the original pages.

    If the user-space caller then does a rereg_user_mr operation to enable
    writability, this will succeed. This should not be allowed, since
    the original virtual memory was not writable.

    Cc:
    Fixes: 9376932d0c26 ("IB/mlx4_ib: Add support for user MR re-registration")
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Jack Morgenstein
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Greg Kroah-Hartman

    Jack Morgenstein
     
  • commit 8d3e71136a080d007620472f50c7b3e63ba0f5cf upstream.

    A warm restart will fail to unload the driver, leaving link state
    potentially flapping up to the point the BIOS resets the adapter.
    Correct the issue by hooking the shutdown pci method, which will
    bring the port down.

    Cc: # 4.9.x
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Alex Estrin
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Alex Estrin
     
  • commit 0252f73334f9ef68868e4684200bea3565a4fcee upstream.

    The following error occurs in a debug build when running MPI PSM:

    [ 307.415911] WARNING: CPU: 4 PID: 23867 at lib/dma-debug.c:1158
    check_unmap+0x4ee/0xa20
    [ 307.455661] ib_qib 0000:05:00.0: DMA-API: device driver failed to check map
    error[device address=0x00000000df82b000] [size=4096 bytes] [mapped as page]
    [ 307.517494] Modules linked in:
    [ 307.531584] ib_isert iscsi_target_mod ib_srpt target_core_mod rpcrdma
    sunrpc ib_srp scsi_transport_srp scsi_tgt ib_iser libiscsi ib_ipoib
    scsi_transport_iscsi rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
    ib_qib intel_powerclamp coretemp rdmavt intel_rapl iosf_mbi kvm_intel kvm
    irqbypass crc32_pclmul ghash_clmulni_intel ipmi_ssif ib_core aesni_intel sg
    ipmi_si lrw gf128mul dca glue_helper ipmi_devintf iTCO_wdt gpio_ich hpwdt
    iTCO_vendor_support ablk_helper hpilo acpi_power_meter cryptd ipmi_msghandler
    ie31200_edac shpchp pcc_cpufreq lpc_ich pcspkr ip_tables xfs libcrc32c sd_mod
    crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea
    sysfillrect sysimgblt fb_sys_fops ttm ahci crct10dif_pclmul crct10dif_common
    drm crc32c_intel libahci tg3 libata serio_raw ptp i2c_core
    [ 307.846113] pps_core dm_mirror dm_region_hash dm_log dm_mod
    [ 307.866505] CPU: 4 PID: 23867 Comm: mpitests-IMB-MP Kdump: loaded Not
    tainted 3.10.0-862.el7.x86_64.debug #1
    [ 307.911178] Hardware name: HP ProLiant DL320e Gen8, BIOS J05 11/09/2013
    [ 307.944206] Call Trace:
    [ 307.956973] [] dump_stack+0x19/0x1b
    [ 307.982201] [] __warn+0xd8/0x100
    [ 308.005999] [] warn_slowpath_fmt+0x5f/0x80
    [ 308.034260] [] check_unmap+0x4ee/0xa20
    [ 308.060801] [] ? page_add_file_rmap+0x2a/0x1d0
    [ 308.090689] [] debug_dma_unmap_page+0x9d/0xb0
    [ 308.120155] [] ? might_fault+0xa0/0xb0
    [ 308.146656] [] qib_tid_free.isra.14+0x215/0x2a0 [ib_qib]
    [ 308.180739] [] qib_write+0x894/0x1280 [ib_qib]
    [ 308.210733] [] ? __inode_security_revalidate+0x70/0x80
    [ 308.244837] [] ? security_file_permission+0x27/0xb0
    [ 308.266025] qib_ib0.8006: multicast join failed for
    ff12:401b:8006:0000:0000:0000:ffff:ffff, status -22
    [ 308.323421] [] vfs_write+0xc3/0x1f0
    [ 308.347077] [] ? fget_light+0xfc/0x510
    [ 308.372533] [] SyS_write+0x8a/0x100
    [ 308.396456] [] system_call_fastpath+0x1c/0x21

    The code calls qib_map_page(), which has never correctly tested for a
    mapping error.

    Fix by testing for pci_dma_mapping_error() in all cases and properly
    handling the failure in the caller.

    Additionally, streamline qib_map_page() arguments to satisfy just
    the single caller.

    Cc:
    Reviewed-by: Alex Estrin
    Tested-by: Don Dutile
    Reviewed-by: Don Dutile
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Mike Marciniszyn
     

21 Jun, 2018

2 commits

  • [ Upstream commit 59482a14918b282ca2a98f38c69da5ebeb1107d2 ]

    When IRQ affinity is set and the interrupt type is unknown, a cpu
    mask allocated within the function is never freed. Fix this memory
    leak by allocating memory within the scope where it is used.

    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Michael J. Ruhl
    Signed-off-by: Sebastian Sanchez
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Sebastian Sanchez
     
  • [ Upstream commit 5da9e742be44d9b7c68b1bf6e1aaf46a1aa7a52b ]

    The module parameter num_user_context is defined as 'int' and
    defaults to -1. The module_param_named() says that it is uint.

    Correct module_param_named() type information and update the modinfo
    text to reflect the default value.
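
    The effect of the type mismatch can be seen in plain C. The variable
    below mirrors the parameter described above and the helper is
    hypothetical; the sketch only shows what a default of -1 becomes
    when it is interpreted as an unsigned int.

```c
#include <limits.h>

/* -1 is the default described in the text above. */
static int num_user_contexts = -1;

/* What a reader sees if the int value is interpreted as unsigned. */
static unsigned int seen_as_uint(int value)
{
	return (unsigned int)value;
}
```

    Here seen_as_uint(num_user_contexts) yields UINT_MAX rather than -1,
    which is why the declared type and the reported type should agree.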

    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     

30 May, 2018

16 commits

  • [ Upstream commit 7672ed33c4c15dbe9d56880683baaba4227cf940 ]

    Before commit f1b65df5a232 ("IB/mlx5: Add support for active_width and
    active_speed in RoCE"), the mlx5_ib driver set the default active_width
    and active_speed to IB_WIDTH_4X and IB_SPEED_QDR.

    When the RoCE port is down, the RoCE port does not negotiate the active
    width with the remote side, causing the active width to be zero. When
    running userspace ibstat to view the port status, ibstat will panic as it
    reads an invalid width from sys file.

    This patch restores the original behavior.

    Fixes: f1b65df5a232 ("IB/mlx5: Add support for active_width and active_speed in RoCE").
    Signed-off-by: Honggang Li
    Reviewed-by: Hal Rosenstock
    Reviewed-by: Noa Osherovich
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Honggang Li
     
  • [ Upstream commit caf61b1b8b88ccf1451f7321a176393797e8d292 ]

    Once the FW is transitioned to error, FLUSH cqes can be received.
    We want the driver to be aware of the fact that QP is already in error.

    Without this fix, a user may see false error messages in the dmesg log,
    mentioning that a FLUSH cqe was received while QP is not in error state.

    Fixes: cecbcddf ("qedr: Add support for QP verbs")
    Signed-off-by: Michal Kalderon
    Signed-off-by: Ariel Elior
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Kalderon, Michal
     
  • [ Upstream commit b15606f47b89b0b09936d7f45b59ba6275527041 ]

    The return code wasn't set properly when CNQ allocation failed.
    This only affects error message logging: currently the user will
    receive an error message that says the qedr driver load failed
    with rc '0' instead of ENOMEM.
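
    The pattern behind this bug can be sketched as below. The function
    and its simulate_failure parameter are hypothetical; the sketch only
    shows why rc has to be assigned before jumping to the error label.

```c
#include <errno.h>
#include <stdlib.h>

/* Returns 0 on success or a negative errno on failure. */
static int setup_queues(size_t n, int simulate_failure)
{
	int rc = 0;
	int *cnq = simulate_failure ? NULL : calloc(n, sizeof(*cnq));

	if (!cnq) {
		rc = -ENOMEM; /* without this, rc would still be 0 here */
		goto err;
	}

	free(cnq);
	return 0;
err:
	return rc;
}
```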

    Fixes: ec72fce4 ("qedr: Add support for RoCE HW init")
    Signed-off-by: Michal Kalderon
    Signed-off-by: Ariel Elior
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Kalderon, Michal
     
  • [ Upstream commit c3594f22302cca5e924e47ec1cc8edd265708f41 ]

    QPs configured with an ack timeout value lower than 1 msec do not
    implement a retransmission timeout. This means that if a packet or
    ACK is dropped, the QP will not retransmit the packet.

    This can lead to an application hang.
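
    For context, InfiniBand encodes the ack timeout as an exponent, with
    the interval equal to 4.096 usec * 2^timeout. The sketch below
    converts that encoding to whole milliseconds and rounds sub-msec
    values up to 1. The rounding is one plausible way to avoid a zero
    timeout and is an assumption, not a description of the actual fix.
    The function names are hypothetical, and a timeout value of 0, which
    means "no timeout" in InfiniBand, must be handled by the caller.

```c
#include <stdint.h>

/* IB encodes the ACK timeout as an exponent: 4.096 us * 2^timeout. */
static uint64_t ack_timeout_ns(uint8_t timeout)
{
	return UINT64_C(4096) << (timeout & 0x1f); /* 5-bit field */
}

/* Convert to whole milliseconds, never returning less than 1. */
static uint32_t ack_timeout_ms(uint8_t timeout)
{
	uint64_t ms = ack_timeout_ns(timeout) / 1000000;

	return ms ? (uint32_t)ms : 1;
}
```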

    Fixes: cecbcddf6 ("qedr: Add support for QP verbs")
    Signed-off-by: Michal Kalderon
    Signed-off-by: Ariel Elior
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Kalderon, Michal
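
    A rough sketch of why small ack timeout values are problematic when a
    timer works in whole milliseconds. It assumes the InfiniBand convention
    that the timeout field is a 5-bit exponent meaning 4.096 usec * 2^exponent,
    with 0 meaning no timeout. The function name and the policy of rounding
    sub-millisecond values up to 1 msec are assumptions for illustration, not
    a description of what the driver does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert an IB-style ack timeout exponent to whole milliseconds.
 * 4.096 usec * 2^8 is about 1.05 msec, so exponent 8 is treated as 1 msec
 * and every step above it doubles the value.
 * Exponents 1..7 are below 1 msec and are rounded up to 1 msec
 * (assumed policy). Exponent 0 is returned as 0, meaning "no timeout".
 */
uint32_t demo_ack_timeout_msec(uint8_t exponent)
{
    assert(exponent < 32);  /* the field is 5 bits wide */

    if (exponent == 0)
        return 0;
    if (exponent <= 8)
        return 1;
    return UINT32_C(1) << (exponent - 8);
}
```

    A naive integer conversion would truncate exponents 1..7 to 0 msec, which
    is indistinguishable from "no timeout" and so disables retransmission.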
     
  • [ Upstream commit 5d414b178e950ce9685c253994cc730893d5d887 ]

    "err" is either zero or possibly uninitialized here. It should be
    -EINVAL.

    Fixes: 427c1e7bcd7e ("{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib")
    Signed-off-by: Dan Carpenter
    Acked-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     
  • [ Upstream commit a18177925c252da7801149abe217c05b80884798 ]

    The commit cited below added a gid_type field (RoCEv1 or RoCEv2)
    to GID properties.

    When adding GIDs, this gid_type field was copied over to the
    hardware gid table. However, when deleting GIDs, the gid_type field
    was not copied over to the hardware gid table.

    As a result, when running RoCEv2, all RoCEv2 gids in the
    hardware gid table were set to type RoCEv1 when any gid was deleted.

    This problem would persist until the next gid was added (which would again
    restore the gid_type field for all the gids in the hardware gid table).

    Fix this by copying over the gid_type field to the hardware gid table
    when deleting gids, so that the gid_type of all remaining gids is
    preserved when a gid is deleted.

    Fixes: b699a859d17b ("IB/mlx4: Add gid_type to GID properties")
    Reviewed-by: Moni Shoua
    Signed-off-by: Jack Morgenstein
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jack M
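
    The gid_type problem above can be illustrated with a small model in which
    the whole hardware table is rebuilt from a software table on every change.
    That rebuild strategy, and every name below (demo_gid_entry,
    demo_sync_hw_table, demo_del_gid), is an assumption made for the sketch and
    is not taken from the mlx4 driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_GIDS 4

enum demo_gid_type { DEMO_ROCE_V1 = 0, DEMO_ROCE_V2 = 1 };

/* Hypothetical table entry: raw GID bytes plus its RoCE type. */
struct demo_gid_entry {
    bool valid;
    uint8_t raw[16];
    enum demo_gid_type type;
};

/*
 * Rebuild the hardware table from the software table. Each valid entry is
 * copied whole, so the type travels with the raw GID. Copying only the raw
 * bytes would leave type at 0, which is RoCEv1 in this sketch.
 */
void demo_sync_hw_table(const struct demo_gid_entry *sw,
                        struct demo_gid_entry *hw)
{
    memset(hw, 0, sizeof(*hw) * DEMO_MAX_GIDS);
    for (int i = 0; i < DEMO_MAX_GIDS; i++) {
        if (sw[i].valid)
            hw[i] = sw[i];
    }
}

/* Delete one GID, then push the remaining entries (with types) to hardware. */
void demo_del_gid(struct demo_gid_entry *sw, struct demo_gid_entry *hw, int idx)
{
    assert(idx >= 0 && idx < DEMO_MAX_GIDS);
    memset(&sw[idx], 0, sizeof(sw[idx]));
    demo_sync_hw_table(sw, hw);
}
```

    The point of the sketch is that the add and delete paths share one sync
    routine, so neither path can forget to carry the type across.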
     
  • [ Upstream commit 0077416a3d529baccbe07ab3242e8db541cfadf6 ]

    When using IPv4 addresses in RoCEv2, the GID format for the mapped
    IPv4 address should be ::ffff:<IPv4 address>.

    In the cited commit, IPv4-mapped IPv6 addresses had the 3 upper dwords
    zeroed out by memset, which resulted in deleting the ffff field.

    However, since procedure ipv6_addr_v4mapped() already verifies that the
    gid has the format ::ffff:<IPv4 address>, no change is needed for the gid,
    and the memset can simply be removed.

    Fixes: 7e57b85c444c ("IB/mlx4: Add support for setting RoCEv2 gids in hardware")
    Reviewed-by: Moni Shoua
    Signed-off-by: Jack Morgenstein
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jack Morgenstein
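
    The effect of the stray memset described above can be reproduced on a
    plain 16-byte buffer. The helpers below are hypothetical userspace
    stand-ins; they only mirror the IPv4-mapped layout (80 zero bits, 16 one
    bits, then the IPv4 address) and are not the kernel's
    ipv6_addr_v4mapped().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_GID_LEN 16

/* True if the GID is 80 zero bits, then 0xffff, then an IPv4 address. */
bool demo_gid_is_v4mapped(const uint8_t gid[DEMO_GID_LEN])
{
    static const uint8_t prefix[12] = {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
    };

    assert(gid != NULL);
    return memcmp(gid, prefix, sizeof(prefix)) == 0;
}

/* Build the GID ::ffff:a.b.c.d in network byte order. */
void demo_make_v4mapped_gid(uint8_t gid[DEMO_GID_LEN],
                            uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    memset(gid, 0, DEMO_GID_LEN);
    gid[10] = 0xff;
    gid[11] = 0xff;
    gid[12] = a;
    gid[13] = b;
    gid[14] = c;
    gid[15] = d;
}
```

    Zeroing the first 12 bytes (the 3 upper dwords) of such a GID keeps the
    IPv4 address in the last dword but wipes the ffff marker, so the result no
    longer passes demo_gid_is_v4mapped().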
     
  • [ Upstream commit 551e1c67b4207455375a2e7a285dea1c7e8fc361 ]

    iWARP does not support RDMA WRITE or SEND with immediate data.
    The driver should check for this before submitting to the FW and
    return an immediate error.

    Signed-off-by: Michal Kalderon
    Signed-off-by: Ariel Elior
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Kalderon, Michal
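
    A minimal sketch of the kind of early check described above. The enums,
    the function name and the choice of -EINVAL as the error code are all
    assumptions for illustration; they are not the qedr driver's own
    definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical work-request opcodes and transports, for illustration only. */
enum demo_wr_opcode {
    DEMO_WR_SEND,
    DEMO_WR_SEND_WITH_IMM,
    DEMO_WR_RDMA_WRITE,
    DEMO_WR_RDMA_WRITE_WITH_IMM,
    DEMO_WR_RDMA_READ
};

enum demo_transport { DEMO_TRANSPORT_ROCE, DEMO_TRANSPORT_IWARP };

/*
 * Reject immediate-data opcodes on iWARP before anything is handed to
 * firmware. Returns 0 if the request may proceed, -EINVAL otherwise.
 */
int demo_validate_wr(enum demo_transport transport, enum demo_wr_opcode op)
{
    bool has_imm;

    assert(op <= DEMO_WR_RDMA_READ);

    has_imm = (op == DEMO_WR_SEND_WITH_IMM ||
               op == DEMO_WR_RDMA_WRITE_WITH_IMM);
    if (transport == DEMO_TRANSPORT_IWARP && has_imm)
        return -EINVAL;
    return 0;
}
```

    Failing in the post path gives the caller a synchronous error instead of
    a later, harder to diagnose completion error.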
     
  • [ Upstream commit e3fd112cbf21d049faf64ba1471d72b93c22109a ]

    Race in qedr_poll_cq: latest_cqe wasn't protected by the lock,
    so two contexts accessing poll_cq at the same time could leave
    one of them holding a pointer to an old latest_cqe and reading
    an invalid cqe element.

    Signed-off-by: Amit Radzi
    Signed-off-by: Michal Kalderon
    Signed-off-by: Ariel Elior
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Kalderon, Michal
     
  • [ Upstream commit 497158aa5f520db50452ef928c0f955cb42f2e77 ]

    Release the netdev references in the cleanup path. Invoke the cleanup
    routines if bnxt_re_ib_reg fails.

    Signed-off-by: Selvin Xavier
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Selvin Xavier
     
  • [ Upstream commit c354dff00db8df80f271418d8392065e10ffffb6 ]

    To support host systems with a non-4K page size, l2_db_size shall be
    calculated with 4096 instead of PAGE_SIZE. Also, supply the host page size
    to FW during initialization.

    Signed-off-by: Devesh Sharma
    Signed-off-by: Selvin Xavier
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Devesh Sharma
     
  • [ Upstream commit a45bc17b360d75fac9ced85e99fda14bf38b4dc3 ]

    HW requires an unconditional fence for all non-wire memory operations
    through SQ. This guarantees the completion of these memory operations.

    Signed-off-by: Devesh Sharma
    Signed-off-by: Selvin Xavier
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Devesh Sharma
     
  • [ Upstream commit 65389322b28f81cc137b60a41044c2d958a7b950 ]

    The IB spec says that a lid should be ignored when the link layer is
    Ethernet, for example when building or parsing a CM request message
    (CA17-34). However, since ib_lid_be16() and ib_lid_cpu16() validate the
    slid regardless of link layer, not only when it is IB, we set the slid
    to zero to prevent false warnings in the kernel log.

    Fixes: 62ede7779904 ("Add OPA extended LID support")
    Reviewed-by: Majd Dibbiny
    Signed-off-by: Moni Shoua
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Moni Shoua
     
  • [ Upstream commit dcdaba08062b4726500b9456f8664bfda896c664 ]

    During driver unload, the driver proceeds with cleanup
    without waiting for the scheduled events. As a result, the
    device pointers are freed and the driver crashes when the
    events run later.

    Flush the bnxt_re_task work queue before starting
    device removal.

    Signed-off-by: Selvin Xavier
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Selvin Xavier
     
  • [ Upstream commit 6b4521f5174c26020ae0deb3ef7f2c28557cf445 ]

    The driver leaves the QP memory pinned if the QP create command
    fails in the FW. Avoid this scenario by adding a proper
    exit path for the case where the FW command fails.

    Signed-off-by: Devesh Sharma
    Signed-off-by: Selvin Xavier
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Devesh Sharma
     
  • commit f9e76ca3771bf23d2142a81a88ddd8f31f5c4c03 upstream.

    A pio send egress error can occur when the PSM library attempts
    to send a bad packet. That issue is still being investigated.

    The pio error interrupt handler then attempts to progress the recovery
    of the errored pio send context.

    Code inspection reveals that the handling lacks the necessary locking
    if that recovery interleaves with a PSM close of the "context" object
    that contains the pio send context.

    The lack of the locking can cause the recovery to access the already
    freed pio send context object and incorrectly deduce that the pio
    send context is actually a kernel pio send context as shown by the
    NULL deref stack below:

    [] _dev_info+0x6c/0x90
    [] sc_restart+0x70/0x1f0 [hfi1]
    [] ? __schedule+0x424/0x9b0
    [] sc_halted+0x15/0x20 [hfi1]
    [] process_one_work+0x17a/0x440
    [] worker_thread+0x126/0x3c0
    [] ? manage_workers.isra.24+0x2a0/0x2a0
    [] kthread+0xcf/0xe0
    [] ? insert_kthread_work+0x40/0x40
    [] ret_from_fork+0x58/0x90
    [] ? insert_kthread_work+0x40/0x40

    This is the best case scenario and other scenarios can corrupt the
    already freed memory.

    Fix by adding the necessary locking in the pio send context error
    handler.

    Cc: # 4.9.x
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl