20 Oct, 2018

1 commit

  • commit b4a4957d3d1c328b733fce783b7264996f866ad2 upstream.

    rvt_destroy_qp() cannot complete until all in process packets have
    been released from the underlying hardware. If a link down event
    occurs, an application can hang with a kernel stack similar to:

    cat /proc/<pid>/stack
    quiesce_qp+0x178/0x250 [hfi1]
    rvt_reset_qp+0x23d/0x400 [rdmavt]
    rvt_destroy_qp+0x69/0x210 [rdmavt]
    ib_destroy_qp+0xba/0x1c0 [ib_core]
    nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
    nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
    nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
    nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
    process_one_work+0x17a/0x440
    worker_thread+0x126/0x3c0
    kthread+0xcf/0xe0
    ret_from_fork+0x58/0x90
    0xffffffffffffffff

    quiesce_qp() waits until all outstanding packets have been freed.
    This wait should be momentary. During a link down event, the cleanup
    handling does not ensure that all packets caught by the link down are
    flushed properly.

    This happens because the freeze path and the link down event are
    handled the same way, which is not correct. The freeze path
    waits until the HFI is unfrozen and then restarts PIO. A link down
    is not a freeze event. The link down path cannot restart the PIO
    until link is restored. If the PIO path is restarted before the link
    comes up, the application (QP) using the PIO path will hang (until
    link is restored).

    Fix by separating the linkdown path from the freeze path and use the
    link down path for link down events.

    Close a race condition in sc_disable() by acquiring both the progress
    and release locks.

    Close a race condition in sc_stop() by moving the setting of the flag
    bits under the alloc lock.

    Cc: stable@vger.kernel.org # 4.9.x+
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     

13 Oct, 2018

1 commit

  • commit 5fe23f262e0548ca7f19fb79f89059a60d087d22 upstream.

    There is a race condition between ucma_close() and ucma_resolve_ip():

    CPU0                                CPU1
    ucma_resolve_ip():                  ucma_close():

    ctx = ucma_get_ctx(file, cmd.id);

                                        list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
                                                mutex_lock(&mut);
                                                idr_remove(&ctx_idr, ctx->id);
                                                mutex_unlock(&mut);
                                                ...
                                                mutex_lock(&mut);
                                                if (!ctx->closing) {
                                                        mutex_unlock(&mut);
                                                        rdma_destroy_id(ctx->cm_id);
                                                ...
                                                ucma_free_ctx(ctx);

    ret = rdma_resolve_addr();
    ucma_put_ctx(ctx);

    Before idr_remove(), ucma_get_ctx() can still find the ctx, and
    after rdma_destroy_id(), rdma_resolve_addr() may still access the
    id_priv pointer. ucma_put_ctx() may also use ctx after
    ucma_free_ctx().

    ucma_close() should call ucma_put_ctx() as well, which tests the
    refcnt and waits for the last holder to release it. A similar
    pattern is already used by ucma_destroy_id().
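
    The put-and-wait idea can be sketched in plain C. This is a
    single-threaded illustration, not the ucma code: ctx_sketch, ctx_get(),
    ctx_put() and ctx_close_begin() are hypothetical names, and a plain
    counter stands in for the kernel's atomic refcount and completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context: starts life with one reference owned by the file. */
struct ctx_sketch {
    int  ref;      /* outstanding references */
    bool closing;  /* set once the close path has started */
};

/* Lookup side: refuse new users once closing, otherwise take a reference. */
static bool ctx_get(struct ctx_sketch *ctx)
{
    assert(ctx != NULL);
    if (ctx->closing)
        return false;
    ctx->ref++;
    return true;
}

/* Drop a reference; returns true when the caller released the last one. */
static bool ctx_put(struct ctx_sketch *ctx)
{
    assert(ctx != NULL && ctx->ref > 0);
    return --ctx->ref == 0;
}

/*
 * Close side: mark the context as closing and drop the file's reference.
 * Returns true only if nobody else still holds the context, i.e. it can be
 * destroyed and freed now; otherwise the closer has to wait for the last put.
 */
static bool ctx_close_begin(struct ctx_sketch *ctx)
{
    assert(ctx != NULL);
    ctx->closing = true;
    return ctx_put(ctx);
}
```

    In this sketch the closer frees the context only once a put has reported
    that the last reference is gone, so a concurrent user that already holds
    a reference never touches freed memory.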

    Reported-and-tested-by: syzbot+da2591e115d57a9cbb8b@syzkaller.appspotmail.com
    Reported-by: syzbot+cfe3c1e8ef634ba8964b@syzkaller.appspotmail.com
    Cc: Jason Gunthorpe
    Cc: Doug Ledford
    Cc: Leon Romanovsky
    Signed-off-by: Cong Wang
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     

10 Oct, 2018

1 commit

  • [ Upstream commit 0d23ba6034b9cf48b8918404367506da3e4b3ee5 ]

    The current code grabs the private_data of whatever file descriptor
    userspace has supplied and implicitly casts it to a `struct ucma_file *`,
    potentially causing a type confusion.

    This is probably fine in practice because the pointer is only used for
    comparisons, it is never actually dereferenced; and even in the
    comparisons, it is unlikely that a file from another filesystem would have
    a ->private_data pointer that happens to also be valid in this context.
    But ->private_data is not always guaranteed to be a valid pointer to an
    object owned by the file's filesystem; for example, some filesystems just
    cram numbers in there.

    Check the type of the supplied file descriptor to be safe, analogous to how
    other places in the kernel do it.
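
    A minimal userspace sketch of such a type check follows. The structs and
    the ucma_fops_sketch object are simplified, hypothetical stand-ins for
    the kernel's file and file_operations types; the point is only that the
    operations pointer is compared before private_data is trusted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel types. */
struct file_ops_sketch { const char *name; };
struct file_sketch {
    const struct file_ops_sketch *f_op;  /* identifies who owns the file */
    void *private_data;                  /* meaning depends on the owner */
};
struct ucma_file_sketch { int id; };

static const struct file_ops_sketch ucma_fops_sketch = { "ucma" };

/*
 * Return the ucma file behind f, or NULL when f belongs to someone else.
 * private_data is only interpreted after the owner check has passed.
 */
static struct ucma_file_sketch *ucma_file_from(const struct file_sketch *f)
{
    assert(f != NULL);
    if (f->f_op != &ucma_fops_sketch)
        return NULL;
    return f->private_data;
}
```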

    Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()")
    Signed-off-by: Jann Horn
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     

04 Oct, 2018

9 commits

  • commit 67e3816842fe6414d629c7515b955952ec40c7d7 upstream.

    Currently a uverbs completion event queue is flushed of events in
    ib_uverbs_comp_event_close() with the queue spinlock held and then
    released. Yet ev_queue->is_closed is not set until later in
    uverbs_hot_unplug_completion_event_file().

    In between the time ib_uverbs_comp_event_close() releases the lock and
    uverbs_hot_unplug_completion_event_file() acquires the lock, a completion
    event can arrive and be inserted into the event queue by
    ib_uverbs_comp_handler().

    This can cause a "double add" list_add warning or crash depending on the
    kernel configuration, or a memory leak because the event is never dequeued
    since the queue is already closed down.

    So add setting ev_queue->is_closed = 1 to ib_uverbs_comp_event_close().
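
    The effect of setting the flag in the same critical section as the flush
    can be sketched as below. This is a single-threaded model with
    hypothetical names (ev_queue_sketch, ev_queue_close(), ev_queue_add());
    the comments mark where the queue spinlock would be held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ev_queue_sketch {
    bool is_closed;
    int  pending;   /* number of queued events */
};

/* Close path: flush and mark closed inside one critical section. */
static void ev_queue_close(struct ev_queue_sketch *q)
{
    assert(q != NULL);
    /* lock would be taken here */
    q->pending = 0;
    q->is_closed = true;
    /* lock would be released here */
}

/* Handler path: only queue an event while the queue is still open. */
static bool ev_queue_add(struct ev_queue_sketch *q)
{
    assert(q != NULL);
    /* lock would be taken here */
    if (q->is_closed)
        return false;
    q->pending++;
    return true;
}
```

    Because the flag is already set when the lock is dropped, a handler that
    runs right after the flush sees a closed queue and adds nothing.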

    Cc: stable@vger.kernel.org
    Fixes: 1e7710f3f656 ("IB/core: Change completion channel to use the reworked objects schema")
    Signed-off-by: Steve Wise
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Steve Wise
     
  • commit d623500b3c4efd8d4e945ac9003c6b87b469a9ab upstream.

    If a packet stream uses an UnsupportedVL (virtual lane), the send
    engine will not send the packet, and it will not indicate that an
    error has occurred. This will cause the packet stream to block.

    HFI has 8 virtual lanes available for packet streams. Each lane can
    be enabled or disabled using the UnsupportedVL mask. If a lane is
    disabled, adding a packet to the send context must be disallowed.

    The current mask for determining unsupported VLs defaults to 0 (allow
    all). This is incorrect. Only the VLs that are defined should be
    allowed.

    Determine which VLs are disabled (mtu == 0), and set the appropriate
    unsupported bit in the mask. The correct mask will allow the send
    engine to error on the invalid VL, and error recovery will work
    correctly.
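
    As a rough illustration of the mask computation, assuming the eight
    lanes mentioned above and a hypothetical per-lane MTU array (the function
    name and types are not taken from the driver):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_VLS 8  /* virtual lanes, per the description above */

/* Return a mask with bit i set when lane i is disabled (mtu == 0). */
static uint8_t unsupported_vl_mask(const uint16_t mtu[NUM_VLS])
{
    uint8_t mask = 0;

    assert(mtu != NULL);
    for (size_t i = 0; i < NUM_VLS; i++) {
        if (mtu[i] == 0)
            mask |= (uint8_t)(1u << i);
    }
    return mask;
}
```

    With only lanes 0 and 2 configured, every other bit of the mask is set,
    so a packet aimed at an unconfigured lane can be rejected rather than
    silently stalling.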

    Cc: stable@vger.kernel.org # 4.9.x+
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Lukasz Odzioba
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     
  • commit 94694d18cf27a6faad91487a38ce516c2b16e7d9 upstream.

    If the number of packets in a user sdma request does not match
    the actual iovectors being sent, sdma_cleanup can be called on
    an uninitialized request structure, resulting in a crash similar
    to this:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: __sdma_txclean+0x57/0x1e0 [hfi1]
    PGD 8000001044f61067 PUD 1052706067 PMD 0
    Oops: 0000 [#1] SMP
    CPU: 30 PID: 69912 Comm: upsm Kdump: loaded Tainted: G OE
    ------------ 3.10.0-862.el7.x86_64 #1
    Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS
    SE5C610.86B.01.01.0019.101220160604 10/12/2016
    task: ffff8b331c890000 ti: ffff8b2ed1f98000 task.ti: ffff8b2ed1f98000
    RIP: 0010:__sdma_txclean+0x57/0x1e0 [hfi1]
    RSP: 0018:ffff8b2ed1f9bab0 EFLAGS: 00010286
    RAX: 0000000000008b2b RBX: ffff8b2adf6e0000 RCX: 0000000000000000
    RDX: 00000000000000a0 RSI: ffff8b2e9eedc540 RDI: ffff8b2adf6e0000
    RBP: ffff8b2ed1f9bad8 R08: 0000000000000000 R09: ffffffffc0b04a06
    R10: ffff8b331c890190 R11: ffffe6ed00bf1840 R12: ffff8b3315480000
    R13: ffff8b33154800f0 R14: 00000000fffffff2 R15: ffff8b2e9eedc540
    FS: 00007f035ac47740(0000) GS:ffff8b331e100000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000008 CR3: 0000000c03fe6000 CR4: 00000000001607e0
    Call Trace:
    user_sdma_send_pkts+0xdcd/0x1990 [hfi1]
    ? gup_pud_range+0x140/0x290
    ? hfi1_mmu_rb_insert+0x155/0x1b0 [hfi1]
    hfi1_user_sdma_process_request+0xc5b/0x11b0 [hfi1]
    hfi1_aio_write+0xba/0x110 [hfi1]
    do_sync_readv_writev+0x7b/0xd0
    do_readv_writev+0xce/0x260
    ? tty_ldisc_deref+0x19/0x20
    ? n_tty_ioctl+0xe0/0xe0
    vfs_writev+0x35/0x60
    SyS_writev+0x7f/0x110
    system_call_fastpath+0x1c/0x21
    Code: 06 49 c7 47 18 00 00 00 00 0f 87 89 01 00 00 5b 41 5c 41 5d 41 5e 41 5f
    5d c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 4e 10 48 89 fb 8b 51 08 49 89 d4
    83 e2 0c 41 81 e4 00 e0 00 00 48 c1 ea 02
    RIP __sdma_txclean+0x57/0x1e0 [hfi1]
    RSP
    CR2: 0000000000000008

    There are two exit points from user_sdma_send_pkts(). One (free_tx)
    merely frees the slab entry and one (free_txreq) cleans the sdma_txreq
    prior to freeing the slab entry. The free_txreq variation can only be
    called after one of the sdma_init*() variations has been called.

    In the panic case, the slab entry had been allocated but not initialized.

    Fix the issue by exiting through free_tx thus avoiding sdma_clean().

    Cc: stable@vger.kernel.org # 4.9.x+
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Lukasz Odzioba
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Greg Kroah-Hartman

    Signed-off-by: Jason Gunthorpe

    Michael J. Ruhl
     
  • commit 0dbfaa9f2813787679e296eb5476e40938ab48c8 upstream.

    The SL specified by a user needs to be a valid SL.

    Add a range check to the user specified SL value which protects from
    running off the end of the SL to SC table.
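
    A range check of this kind can be sketched as follows. The table length
    of 32 and the names sl_to_sc_sketch() and SL_TABLE_SIZE are assumptions
    made for the example, not values quoted from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SL_TABLE_SIZE 32  /* assumed table length for this sketch */

/* Look up the SC for a user-supplied SL, rejecting out-of-range values. */
static int sl_to_sc_sketch(const uint8_t table[SL_TABLE_SIZE],
                           unsigned int sl, uint8_t *sc)
{
    assert(table != NULL && sc != NULL);
    if (sl >= SL_TABLE_SIZE)
        return -EINVAL;      /* never index past the end of the table */
    *sc = table[sl];
    return 0;
}
```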

    CC: stable@vger.kernel.org
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Signed-off-by: Ira Weiny
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Ira Weiny
     
  • commit ee92efe41cf358f4b99e73509f2bfd4733609f26 upstream.

    Use different loop variables for the inner and outer loop. This
    prevents an infinite loop if there are more RDMA channels than
    target->req_ring_size.
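
    The shape of the fix can be illustrated with a small nested loop. The
    function and its slot array are hypothetical; the point is that the
    channel index and the ring index are separate variables, so the inner
    loop can never rewind the outer one.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Tag every ring slot with the channel it belongs to and return how many
 * slots were visited. slots must hold ch_count * ring_size entries.
 */
static size_t init_all_requests(int *slots, size_t ch_count, size_t ring_size)
{
    size_t done = 0;

    assert(slots != NULL);
    for (size_t i = 0; i < ch_count; i++) {        /* outer: channels */
        for (size_t j = 0; j < ring_size; j++) {   /* inner: ring slots */
            slots[i * ring_size + j] = (int)i;
            done++;
        }
    }
    return done;
}
```

    If the inner loop reused i, it would reset the channel index on every
    pass, and with more channels than ring slots the outer condition could
    stay true forever.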

    Fixes: d92c0da71a35 ("IB/srp: Add multichannel support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
     
  • [ Upstream commit f1228867adaf8890826f2b59e4caddb1c5cc2df7 ]

    rdma_ah_find_type() can reach into ib_device->port_immutable with a
    potentially out-of-bounds port number, so check that the port number is
    valid first.

    Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
    Signed-off-by: Tarick Bedeir
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Tarick Bedeir
     
  • [ Upstream commit c2d7c8ff89b22ddefb1ac2986c0d48444a667689 ]

    "nents" is an unsigned int, so if ib_map_mr_sg() returns a negative
    error code then it's type promoted to a high unsigned int which is
    treated as success.
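
    The conversion problem can be reproduced in a few lines of standalone C.
    The helper names below are hypothetical, and the comparison against an
    expected entry count is an assumed shape of the check, not a quote from
    the patched function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Buggy shape: the int result is converted to unsigned before comparing.
 * The cast is written out here; C applies the same conversion implicitly
 * when an int is compared with an unsigned int.
 */
static bool mapping_failed_buggy(int ret, unsigned int nents)
{
    return (unsigned int)ret < nents;
}

/* Fixed shape: test for a negative errno before any unsigned comparison. */
static bool mapping_failed_fixed(int ret, unsigned int nents)
{
    return ret < 0 || (unsigned int)ret < nents;
}

/* Usage sketch: a failed mapping (-ENOMEM) when 4 entries were requested. */
static void mapping_check_demo(void)
{
    assert(!mapping_failed_buggy(-ENOMEM, 4)); /* error slips through */
    assert(mapping_failed_fixed(-ENOMEM, 4));  /* error is caught */
    assert(!mapping_failed_fixed(4, 4));       /* a full mapping is fine */
}
```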

    Fixes: a060b5629ab0 ("IB/core: generic RDMA READ/WRITE API")
    Signed-off-by: Dan Carpenter
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     
  • [ Upstream commit 5d9a2b0e28759e319a623da33940dbb3ce952b7d ]

    VMA lookup is supposed to be performed while mmap_sem is held.

    Fixes: f26c7c83395b ("i40iw: Add 2MB page support")
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     
  • [ Upstream commit 474e5a86067e5f12c97d1db8b170c7f45b53097a ]

    The sgid_tbl->tbl[] array is allocated in bnxt_qplib_alloc_sgid_tbl().
    It has sgid_tbl->max elements. So the > should be >= to prevent
    accessing one element beyond the end of the array.

    Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
    Signed-off-by: Dan Carpenter
    Acked-by: Selvin Xavier
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     

29 Sep, 2018

1 commit

  • commit 308aa2b8f7b7db3332a7d41099fd37851fb793b2 upstream.

    Once the qp has been flushed, it cannot be flushed again. The user qp
    flush logic wasn't enforcing it however. The bug can cause
    touch-after-free crashes like:

    Unable to handle kernel paging request for data at address 0x000001ec
    Faulting instruction address: 0xc008000016069100
    Oops: Kernel access of bad area, sig: 11 [#1]
    ...
    NIP [c008000016069100] flush_qp+0x80/0x480 [iw_cxgb4]
    LR [c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
    Call Trace:
    [c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
    [c00800001606e868] c4iw_ib_modify_qp+0x118/0x200 [iw_cxgb4]
    [c0080000119eae80] ib_security_modify_qp+0xd0/0x3d0 [ib_core]
    [c0080000119c4e24] ib_modify_qp+0xc4/0x2c0 [ib_core]
    [c008000011df0284] iwcm_modify_qp_err+0x44/0x70 [iw_cm]
    [c008000011df0fec] destroy_cm_id+0xcc/0x370 [iw_cm]
    [c008000011ed4358] rdma_destroy_id+0x3c8/0x520 [rdma_cm]
    [c0080000134b0540] ucma_close+0x90/0x1b0 [rdma_ucm]
    [c000000000444da4] __fput+0xe4/0x2f0

    So fix flush_qp() to only flush the wq once.

    Cc: stable@vger.kernel.org
    Signed-off-by: Steve Wise
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Steve Wise
     

26 Sep, 2018

4 commits

  • commit 816e846c2eb9129a3e0afa5f920c8bbc71efecaa upstream.

    Inside start_xmit(), checking whether the connection is up and queueing
    packets for later transmission are not done atomically. This leaves a
    window where cm_rep_handler can run, set the connection up, and dequeue
    pending packets, leaving any packets subsequently queued by start_xmit()
    sitting on neigh->queue until they're dropped when the connection is torn
    down. This only applies to connected mode. These dropped packets can
    really upset TCP, for example, and cause multi-minute delays in
    transmission for open connections.

    Here's the code in start_xmit where we check to see if the connection is
    up:

    if (ipoib_cm_get(neigh)) {
            if (ipoib_cm_up(neigh)) {
                    ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
                    goto unref;
            }
    }

    The race occurs if cm_rep_handler execution occurs after the above
    connection check (specifically if it gets to the point where it acquires
    priv->lock to dequeue pending skb's) but before the below code snippet in
    start_xmit where packets are queued.

    if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
            push_pseudo_header(skb, phdr->hwaddr);
            spin_lock_irqsave(&priv->lock, flags);
            __skb_queue_tail(&neigh->queue, skb);
            spin_unlock_irqrestore(&priv->lock, flags);
    } else {
            ++dev->stats.tx_dropped;
            dev_kfree_skb_any(skb);
    }

    The patch acquires the netif tx lock in cm_rep_handler for the section
    where it sets the connection up and dequeues and retransmits deferred
    skb's.

    Fixes: 839fcaba355a ("IPoIB: Connected mode experimental support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Aaron Knister
    Tested-by: Ira Weiny
    Reviewed-by: Ira Weiny
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Aaron Knister
     
  • commit 954a8e3aea87e896e320cf648c1a5bbe47de443e upstream.

    When AF_IB addresses are used during rdma_resolve_addr() a lock is not
    held. A cma device can get removed while list traversal is in progress
    which may lead to a crash, i.e.:

    CPU0                                CPU1
    ====                                ====
    rdma_resolve_addr()
     cma_resolve_ib_dev()
      list_for_each()                   cma_remove_one()
        cur_dev->device                  mutex_lock(&lock)
                                         list_del();
                                         mutex_unlock(&lock);
                                         cma_process_remove();

    Therefore, hold a lock while traversing the list to avoid this
    situation.

    Cc: stable@vger.kernel.org # 3.10
    Fixes: f17df3b0dede ("RDMA/cma: Add support for AF_IB to rdma_resolve_addr()")
    Signed-off-by: Parav Pandit
    Reviewed-by: Daniel Jurgens
    Signed-off-by: Leon Romanovsky
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     
  • [ Upstream commit 99a7e2bf704d64c966dfacede1ba2d9b47cb676e ]

    Fix to return a negative error code from the ipoib_neigh_hash_init()
    error handling case instead of 0, as done elsewhere in this function.

    Fixes: 515ed4f3aab4 ("IB/IPoIB: Separate control and data related initializations")
    Signed-off-by: Wei Yongjun
    Reviewed-by: Yuval Shaia
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Wei Yongjun
     
  • [ Upstream commit 536ca245c512aedfd84cde072d7b3ca14b6e1792 ]

    According to "Annex A16: RDMA over Converged Ethernet (RoCE)":

    A16.4.3 MANAGEMENT INTERFACES

    As defined in the base specification, a special Queue Pair, QP0 is defined
    solely for communication between subnet manager(s) and subnet management
    agents. Since such an IB-defined subnet management architecture is outside
    the scope of this annex, it follows that there is also no requirement that
    a port which conforms to this annex be associated with a QP0. Thus, for
    end nodes designed to conform to this annex, the concept of QP0 is
    undefined and unused for any port connected to an Ethernet network.

    CA16-8: A packet arriving at a RoCE port containing a BTH with the
    destination QP field set to QP0 shall be silently dropped.

    Signed-off-by: Zhu Yanjun
    Acked-by: Moni Shoua
    Reviewed-by: Yuval Shaia
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Zhu Yanjun
     

20 Sep, 2018

1 commit

  • [ Upstream commit 643d213a9a034fa04f5575a40dfc8548e33ce04f ]

    Currently, if the cm_id is not bound to any netdevice, then the net
    namespace is ignored for that cm_id, which is incorrect.

    Regardless of whether the cm_id is bound to a netdevice, the net
    namespace must match. When a cm_id is bound to a netdevice, both the
    net namespace and the netdevice must match.
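
    The matching rule reads naturally as a small predicate. The sketch below
    uses hypothetical types and names (cm_id_sketch, cm_id_matches()), with
    an opaque pointer standing in for the net namespace and 0 meaning "not
    bound to a device".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a cm_id for matching purposes. */
struct cm_id_sketch {
    const void *net;       /* owning net namespace */
    int bound_dev_if;      /* 0 when not bound to a netdevice */
};

/* Decide whether a request from (req_net, req_ifindex) may match id. */
static bool cm_id_matches(const struct cm_id_sketch *id,
                          const void *req_net, int req_ifindex)
{
    assert(id != NULL);
    if (id->net != req_net)
        return false;                        /* namespace must always match */
    if (id->bound_dev_if == 0)
        return true;                         /* unbound: namespace suffices */
    return id->bound_dev_if == req_ifindex;  /* bound: device must match too */
}
```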

    Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA CM")
    Signed-off-by: Parav Pandit
    Reviewed-by: Daniel Jurgens
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     

15 Sep, 2018

2 commits

  • [ Upstream commit a1ceeca679dccc492235f0f629d9e9f7b3d51ca8 ]

    hns bitmap allocation functions return 0 on success and -1 on failure.
    Callers of these functions wrongly used their return value as an errno,
    fix that by making a proper conversion.
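
    The conversion can be sketched as below. bitmap_alloc_stub() is a
    hypothetical stand-in for an allocator that reports failure as -1, and
    the choice of -ENOMEM as the translated errno is an assumption of this
    example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NBITS 4  /* tiny bitmap for the sketch */

/* Stand-in allocator: 0 on success with *obj set, -1 when the bitmap is full. */
static int bitmap_alloc_stub(unsigned int *bitmap, unsigned int *obj)
{
    for (unsigned int i = 0; i < NBITS; i++) {
        if (!(*bitmap & (1u << i))) {
            *bitmap |= 1u << i;
            *obj = i;
            return 0;
        }
    }
    return -1;
}

/* Caller: translate the allocator's -1 into a real errno for its own callers. */
static int pd_alloc_sketch(unsigned int *bitmap, unsigned int *pdn)
{
    assert(bitmap != NULL && pdn != NULL);
    return bitmap_alloc_stub(bitmap, pdn) ? -ENOMEM : 0;
}
```

    Passing -1 straight through would be read as -EPERM by anything that
    interprets the value as an errno, which is misleading for an exhausted
    bitmap.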

    Fixes: a598c6f4c5a8 ("IB/hns: Simplify function of pd alloc and qp alloc")
    Signed-off-by: Gal Pressman
    Acked-by: Lijun Ou
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Gal Pressman
     
  • [ Upstream commit c513de490f808d8480346f9a58e6a4a5f3de12e7 ]

    If the system BIOS does not supply NUMA node information to the
    PCI devices, the NUMA node is selected by choosing the current
    node.

    This can lead to the following crash:

    divide error: 0000 SMP
    CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G IOE
    ------------ 3.10.0-693.21.1.el7.x86_64 #1
    Hardware name: Intel Corporation S2600KP/S2600KP, BIOS
    SE5C610.86B.01.01.0005.101720141054 10/17/2014
    Workqueue: events work_for_cpu_fn
    task: ffff880174480fd0 ti: ffff880174488000 task.ti: ffff880174488000
    RIP: 0010:hfi1_dev_affinity_init+0x129/0x6a0 [hfi1]
    RSP: 0018:ffff88017448bbf8 EFLAGS: 00010246
    RAX: 0000000000000011 RBX: ffff88107ffba6c0 RCX: ffff88085c22e130
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880824ad0000
    RBP: ffff88017448bc48 R08: 0000000000000011 R09: 0000000000000002
    R10: ffff8808582b6ca0 R11: 0000000000003151 R12: ffff8808582b6ca0
    R13: ffff8808582b6518 R14: ffff8808582b6010 R15: 0000000000000012
    FS: 0000000000000000(0000) GS:ffff88085ec00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007efc707404f0 CR3: 0000000001a02000 CR4: 00000000001607f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Call Trace:
    hfi1_init_dd+0x14b3/0x27a0 [hfi1]
    ? pcie_capability_write_word+0x46/0x70
    ? hfi1_pcie_init+0xc0/0x200 [hfi1]
    do_init_one+0x153/0x4c0 [hfi1]
    ? sched_clock_cpu+0x85/0xc0
    init_one+0x1b5/0x260 [hfi1]
    local_pci_probe+0x4a/0xb0
    work_for_cpu_fn+0x1a/0x30
    process_one_work+0x17f/0x440
    worker_thread+0x278/0x3c0
    ? manage_workers.isra.24+0x2a0/0x2a0
    kthread+0xd1/0xe0
    ? insert_kthread_work+0x40/0x40
    ret_from_fork+0x77/0xb0
    ? insert_kthread_work+0x40/0x40

    If the BIOS is not supplying NUMA information:
    - set the default table count to 1 for all possible nodes
    - select node 0 (instead of the current NUMA node) to get consistent
    performance
    - generate an error indicating that the BIOS should be upgraded

    Reviewed-by: Gary Leshner
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro

    Signed-off-by: Jason Gunthorpe

    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     

10 Sep, 2018

2 commits

  • commit 61b717d041b1976530f68f8b539b2e3a7dd8e39c upstream.

    Every function that returns COMPST_ERROR must set wqe->status to a value
    other than IB_WC_SUCCESS before returning COMPST_ERROR. Fix the only code
    path for which this is not yet the case.

    Signed-off-by: Bart Van Assche
    Cc: stable@vger.kernel.org
    Reviewed-by: Yuval Shaia
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
     
  • commit 995250959d22fc341b5424e3343b0ce5df672461 upstream.

    Avoid that KASAN reports the following:

    BUG: KASAN: use-after-free in srpt_close_ch+0x4f/0x1b0 [ib_srpt]
    Read of size 4 at addr ffff880151180cb8 by task check/4681

    CPU: 15 PID: 4681 Comm: check Not tainted 4.18.0-rc2-dbg+ #4
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
    Call Trace:
    dump_stack+0xa4/0xf5
    print_address_description+0x6f/0x270
    kasan_report+0x241/0x360
    __asan_load4+0x78/0x80
    srpt_close_ch+0x4f/0x1b0 [ib_srpt]
    srpt_set_enabled+0xf7/0x1e0 [ib_srpt]
    srpt_tpg_enable_store+0xb8/0x120 [ib_srpt]
    configfs_write_file+0x14e/0x1d0 [configfs]
    __vfs_write+0xd2/0x3b0
    vfs_write+0x101/0x270
    ksys_write+0xab/0x120
    __x64_sys_write+0x43/0x50
    do_syscall_64+0x77/0x230
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Fixes: aaf45bd83eba ("IB/srpt: Detect session shutdown reliably")
    Signed-off-by: Bart Van Assche
    Cc: stable@vger.kernel.org
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
     

24 Aug, 2018

3 commits

  • [ Upstream commit d63c46734c545ad0488761059004a65c46efdde3 ]

    Fix memory leak in the error path of mlx5_ib_create_srq() by making sure
    to free the allocated srq.

    Fixes: c2b37f76485f ("IB/mlx5: Fix integer overflows in mlx5_ib_create_srq")
    Signed-off-by: Kamal Heib
    Acked-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Kamal Heib
     
  • [ Upstream commit 375dc53d032fc11e98036b5f228ad13f7c5933f5 ]

    Run the completer task to post a work completion after processing
    a memory registration or invalidate work request. This covers the
    case where the memory registration or invalidate was the last work
    request posted to the qp.

    Signed-off-by: Vijay Immanuel
    Reviewed-by: Yonatan Cohen
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Vijay Immanuel
     
  • [ Upstream commit 3dc7c7badb7502ec3e3aa817a8bdd9e53aa54c52 ]

    Before returning -EPERM we should release some resources, as already done
    in the other error handling path of the function.

    Fixes: d8f9cc328c88 ("IB/mlx4: Mark user MR as writable if actual virtual memory is writable")
    Signed-off-by: Christophe JAILLET
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Christophe Jaillet
     

06 Aug, 2018

1 commit

  • commit addb8a6559f0f8b5a37582b7ca698358445a55bf upstream.

    The commit cited below checked that the port numbers provided in the
    primary and alt AVs are legal.

    That is sufficient to prevent a kernel panic. However, it is not
    sufficient for correct operation.

    In Linux, AVs (both primary and alt) must be completely self-described.
    We do not accept an AV from userspace without an embedded port number.
    (This has been the case since kernel 3.14 commit dbf727de7440
    ("IB/core: Use GID table in AH creation and dmac resolution")).

    For the primary AV, this embedded port number must match the port number
    specified with IB_QP_PORT.

    We also expect the port number embedded in the alt AV to match the
    alt_port_num value passed by the userspace driver in the modify_qp command
    base structure.

    Add these checks to modify_qp.

    Cc: stable@vger.kernel.org # 4.16
    Fixes: 5d4c05c3ee36 ("RDMA/uverbs: Sanitize user entered port numbers prior to access it")
    Signed-off-by: Jack Morgenstein
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Jack Morgenstein
     

03 Aug, 2018

7 commits

  • commit 940efcc8889f0d15567eb07fc9fd69b06e366aa5 upstream.

    Flows can be created on UD and RAW_PACKET QP types. Attempts to provide
    other QP types as input cause various unpredictable failures.

    The reason is that in order to support all the various types (e.g. XRC),
    we are supposed to use the real_qp handle and not the qp handle, and
    expect the driver/FW to fail such (XRC) flows. The simpler and safer
    variant is to ban all QP types except UD and RAW_PACKET, instead of
    relying on the driver/FW.
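
    An allow-list of this kind is short to write. The enum and struct below
    are hypothetical stand-ins for the real QP types, and -EINVAL is an
    assumed error code for the rejected case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical subset of QP types, for illustration only. */
enum qp_type_sketch {
    QPT_RC_SKETCH,
    QPT_UC_SKETCH,
    QPT_UD_SKETCH,
    QPT_RAW_PACKET_SKETCH,
    QPT_XRC_INI_SKETCH,
};

struct qp_sketch { enum qp_type_sketch type; };

/* Allow flow creation only on UD and RAW_PACKET; reject everything else. */
static int check_flow_qp_type(const struct qp_sketch *qp)
{
    assert(qp != NULL);
    switch (qp->type) {
    case QPT_UD_SKETCH:
    case QPT_RAW_PACKET_SKETCH:
        return 0;
    default:
        return -EINVAL;
    }
}
```

    Listing the permitted types, rather than the forbidden ones, means a QP
    type added later is rejected until someone decides it is safe.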

    Cc: stable@vger.kernel.org # 3.11
    Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs")
    Cc: syzkaller
    Reported-by: Noa Osherovich
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     
  • [ Upstream commit 2468b82d69e3a53d024f28d79ba0fdb8bf43dfbf ]

    Let's perform checks in-place instead of BUG_ONs.

    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     
  • [ Upstream commit cb2595c1393b4a5211534e6f0a0fbad369e21ad8 ]

    ucma_process_join() frees the newly allocated "mc" struct if any
    error occurs after the allocation, notably in copy_to_user().

    But in parallel, ucma_leave_multicast() could find this "mc"
    through idr_find() before ucma_process_join() frees it, since it
    is already published.

    So "mc" could be used in ucma_leave_multicast() after it has been
    allocated and freed in ucma_process_join(), since we don't refcnt
    it.

    Fix this by separating "publish" from ID allocation, so that we
    can get an ID first and publish it later after copy_to_user().
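
    The reserve-then-publish split can be modelled with a small fixed-size
    table. Everything here is a hypothetical userspace sketch
    (id_table_sketch and the id_*() helpers); it is not the IDR API, only
    the same two-step idea.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IDS 8  /* hypothetical table size */

struct id_table_sketch {
    int   reserved[MAX_IDS];  /* 1 once an ID has been handed out */
    void *object[MAX_IDS];    /* NULL until the object is published */
};

/* Step 1: reserve an ID without making any object visible. -1 when full. */
static int id_reserve(struct id_table_sketch *t)
{
    assert(t != NULL);
    for (int id = 0; id < MAX_IDS; id++) {
        if (!t->reserved[id]) {
            t->reserved[id] = 1;
            t->object[id] = NULL;
            return id;
        }
    }
    return -1;
}

/* Step 2: publish only after every fallible step has succeeded. */
static void id_publish(struct id_table_sketch *t, int id, void *obj)
{
    assert(t != NULL && id >= 0 && id < MAX_IDS && t->reserved[id]);
    t->object[id] = obj;
}

/* Error path: give the ID back; nobody could have seen the object. */
static void id_release(struct id_table_sketch *t, int id)
{
    assert(t != NULL && id >= 0 && id < MAX_IDS);
    t->reserved[id] = 0;
    t->object[id] = NULL;
}

/* Lookups see NULL for IDs that are reserved but not yet published. */
static void *id_find(const struct id_table_sketch *t, int id)
{
    assert(t != NULL);
    if (id < 0 || id >= MAX_IDS || !t->reserved[id])
        return NULL;
    return t->object[id];
}
```

    A lookup that races with the join path finds nothing until the publish
    step, so the error path can free the object without another thread
    holding a pointer to it.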

    Fixes: c8f6a362bf3e ("RDMA/cma: Add multicast communication support")
    Reported-by: Noam Rathaus
    Signed-off-by: Cong Wang
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • commit 06892cc190550807d332c95a0114c7e175584012 upstream.

    gcc-4.4.4 has issues with initialization of anonymous unions:

    drivers/infiniband/ulp/srpt/ib_srpt.c: In function 'srpt_zerolength_write':
    drivers/infiniband/ulp/srpt/ib_srpt.c:854: error: unknown field 'wr_cqe' specified in initializer
    drivers/infiniband/ulp/srpt/ib_srpt.c:854: warning: initialization makes integer from pointer without a cast

    Work around this.

    Fixes: 2a78cb4db487 ("IB/srpt: Fix an out-of-bounds stack access in srpt_zerolength_write()")
    Cc: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Jason Gunthorpe
    Cc: stable@vger.kernel.org
    Signed-off-by: Andrew Morton
    Signed-off-by: Doug Ledford
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Andrew Morton
     
  • commit 2a78cb4db487372152bed2055c038f9634d595e8 upstream.

    Avoid triggering an out-of-bounds stack access by changing the type
    of 'wr' from ib_send_wr into ib_rdma_wr.

    This patch fixes the following KASAN bug report:

    BUG: KASAN: stack-out-of-bounds in rxe_post_send+0x7a9/0x9a0 [rdma_rxe]
    Read of size 8 at addr ffff880068197a48 by task kworker/2:1/44

    Workqueue: ib_cm cm_work_handler [ib_cm]
    Call Trace:
    dump_stack+0x8e/0xcd
    print_address_description+0x6f/0x280
    kasan_report+0x25a/0x380
    __asan_load8+0x54/0x90
    rxe_post_send+0x7a9/0x9a0 [rdma_rxe]
    srpt_zerolength_write+0xf0/0x180 [ib_srpt]
    srpt_cm_rtu_recv+0x68/0x110 [ib_srpt]
    srpt_rdma_cm_handler+0xbb/0x15b [ib_srpt]
    cma_ib_handler+0x1aa/0x4a0 [rdma_cm]
    cm_process_work+0x30/0x100 [ib_cm]
    cm_work_handler+0xa86/0x351b [ib_cm]
    process_one_work+0x475/0x9f0
    worker_thread+0x69/0x690
    kthread+0x1ad/0x1d0
    ret_from_fork+0x3a/0x50

    Fixes: aaf45bd83eba ("IB/srpt: Detect session shutdown reliably")
    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: stable@vger.kernel.org
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
     
  • commit 6ee687735e745eafae9e6b93d1ea70bc52e7ad07 upstream.

    gcc-4.4.4 has issues with initialization of anonymous unions.

    drivers/infiniband/core/verbs.c: In function '__ib_drain_sq':
    drivers/infiniband/core/verbs.c:2204: error: unknown field 'wr_cqe' specified in initializer
    drivers/infiniband/core/verbs.c:2204: warning: initialization makes integer from pointer without a cast

    Work around this.

    Fixes: a1ae7d0345edd5 ("RDMA/core: Avoid that ib_drain_qp() triggers an out-of-bounds stack access")
    Cc: Bart Van Assche
    Cc: Steve Wise
    Cc: Sagi Grimberg
    Cc: Jason Gunthorpe
    Cc: stable@vger.kernel.org
    Signed-off-by: Andrew Morton
    Signed-off-by: Doug Ledford
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Andrew Morton
     
  • commit a1ae7d0345edd593d6725d3218434d903a0af95d upstream.

    This patch fixes the following KASAN complaint:

    ==================================================================
    BUG: KASAN: stack-out-of-bounds in rxe_post_send+0x77d/0x9b0 [rdma_rxe]
    Read of size 8 at addr ffff880061aef860 by task 01/1080

    CPU: 2 PID: 1080 Comm: 01 Not tainted 4.16.0-rc3-dbg+ #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
    Call Trace:
    dump_stack+0x85/0xc7
    print_address_description+0x65/0x270
    kasan_report+0x231/0x350
    rxe_post_send+0x77d/0x9b0 [rdma_rxe]
    __ib_drain_sq+0x1ad/0x250 [ib_core]
    ib_drain_qp+0x9/0x30 [ib_core]
    srp_destroy_qp+0x51/0x70 [ib_srp]
    srp_free_ch_ib+0xfc/0x380 [ib_srp]
    srp_create_target+0x1071/0x19e0 [ib_srp]
    kernfs_fop_write+0x180/0x210
    __vfs_write+0xb1/0x2e0
    vfs_write+0xf6/0x250
    SyS_write+0x99/0x110
    do_syscall_64+0xee/0x2b0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    The buggy address belongs to the page:
    page:ffffea000186bbc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
    flags: 0x4000000000000000()
    raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
    raw: 0000000000000000 ffffea000186bbe0 0000000000000000 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff880061aef700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ffff880061aef780: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00
    >ffff880061aef800: f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 f2 f2 f2 f2
                                                           ^
    ffff880061aef880: f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 f2 f2
    ffff880061aef900: f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 00
    ==================================================================

    Fixes: 765d67748bcf ("IB: new common API for draining queues")
    Signed-off-by: Bart Van Assche
    Cc: Steve Wise
    Cc: Sagi Grimberg
    Cc: stable@vger.kernel.org
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
     

17 Jul, 2018

3 commits

  • commit 7a8690ed6f5346f6738971892205e91d39b6b901 upstream.

    In commit 357d23c811a7 ("Remove the obsolete libibcm library")
    in rdma-core [1], we removed the obsolete library that used the
    /dev/infiniband/ucmX interface.

    Following multiple syzkaller reports about non-sanitized
    user input in the UCMA module, a short audit reveals the same
    issues in the UCM module too.

    It is better to disable this interface in the kernel
    before the syzkaller team invests time and energy in hardening
    this unused interface.

    [1] https://github.com/linux-rdma/rdma-core/pull/279

    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     
  • commit 7b72717a20bba8bdd01b14c0460be7d15061cd6b upstream.

    The code was mistakenly using the length of the page array memory instead
    of the depth of the page array.

    This would cause MR creation to fail in some cases.
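
    The distinction between the two quantities can be sketched as follows.
    The structure and helper names (demo_page_list, page_list_depth, fits)
    are hypothetical and are not taken from the iw_cxgb4 driver; the point
    is only that a byte length and an entry count differ by the entry size,
    so a bound check must use the right one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page-list descriptor; field names are illustrative only. */
struct demo_page_list {
    uint64_t *pages;    /* array of page addresses */
    size_t    mem_len;  /* size of the array memory, in bytes */
};

/* Depth: the number of entries the array can hold. */
static size_t page_list_depth(const struct demo_page_list *pl)
{
    return pl->mem_len / sizeof(pl->pages[0]);
}

/* Bound check expressed in entries (depth), not in bytes (length). */
static int fits(const struct demo_page_list *pl, size_t npages)
{
    return npages <= page_list_depth(pl);
}
```

    With a four-entry array, mem_len is 32 bytes on a typical platform while
    the depth is 4, so comparing a page count against mem_len would give a
    different answer than comparing it against the depth.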

    Fixes: 8376b86de7d3 ("iw_cxgb4: Support the new memory registration API")
    Cc: stable@vger.kernel.org
    Signed-off-by: Steve Wise
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Steve Wise
     
  • commit b697d7d8c741f27b728a878fc55852b06d0f6f5e upstream.

    The __get_txreq() function can return a pointer, ERR_PTR(-EBUSY), or NULL.
    All of the relevant call sites look for IS_ERR, so the NULL return would
    lead to a NULL pointer exception.

    Do not use the ERR_PTR mechanism for this function.

    Update all call sites to handle the return value correctly.

    Clean up error paths to reflect return value.
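
    Why an IS_ERR-only check misses a NULL return can be seen in the
    following userspace sketch. The demo_* helpers only approximate the
    kernel's ERR_PTR()/IS_ERR() helpers, and demo_get_txreq_old() and
    demo_get_txreq_new() are hypothetical stand-ins for the old and new
    return contracts, not the hfi1 code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace approximations of the kernel helpers, for illustration only. */
#define DEMO_MAX_ERRNO 4095

static inline void *demo_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static inline int demo_is_err(const void *p)
{
    /* Error pointers occupy the top DEMO_MAX_ERRNO addresses. */
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_txreq {
    int id;
};

/* Old contract: a valid pointer, an error pointer, or NULL. */
static struct demo_txreq *demo_get_txreq_old(int mode, struct demo_txreq *slot)
{
    if (mode == 0)
        return slot;
    if (mode == 1)
        return demo_err_ptr(-EBUSY);
    return NULL;
}

/* New contract: a valid pointer on success, NULL on any failure. */
static struct demo_txreq *demo_get_txreq_new(int mode, struct demo_txreq *slot)
{
    return mode == 0 ? slot : NULL;
}
```

    Under the old contract, demo_is_err() reports false for the NULL case,
    so a caller that tests only for error pointers would go on to
    dereference NULL. Under the new contract a single NULL test covers
    every failure.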

    Fixes: 45842abbb292 ("staging/rdma/hfi1: move txreq header code")
    Cc: # 4.9.x+
    Reported-by: Dan Carpenter
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Kamenee Arumugam
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     

03 Jul, 2018

4 commits

  • commit 6b1ca7ece15e94251d1d0d919f813943e4a58059 upstream.

    There is no need to crash the machine if an unknown work request is
    received in SQP MAD.

    Cc: # 3.6
    Fixes: 37bfc7c1e83f ("IB/mlx4: SR-IOV multiplex and demultiplex MADs")
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     
  • commit 1bc0299d976e000ececc6acd76e33b4582646cb7 upstream.

    The following code fails to allocate a buffer for the
    tail address that the hardware DMAs into when the user
    context DMA_RTAIL is set.

    if (HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL)) {
            rcd->rcvhdrtail_kvaddr = dma_zalloc_coherent(
                    &dd->pcidev->dev, PAGE_SIZE, &dma_hdrqtail,
                    gfp_flags);
            if (!rcd->rcvhdrtail_kvaddr)
                    goto bail_free;
            rcd->rcvhdrqtailaddr_dma = dma_hdrqtail;
    }

    So the rcvhdrtail_kvaddr would then be NULL.

    The mmap logic fails to check for a NULL rcvhdrtail_kvaddr.

    The fix is to test for both user and kernel DMA_RTAIL options
    during the allocation as well as testing for a NULL
    rcvhdrtail_kvaddr during the mmap processing.

    Additionally, all downstream testing of the capmask for DMA_RTAIL
    has been eliminated in favor of testing rcvhdrtail_kvaddr.

    Cc: # 4.9.x
    Reviewed-by: Michael J. Ruhl
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Mike Marciniszyn
     
  • commit af8aab71370a692eaf7e7969ba5b1a455ac20113 upstream.

    All threads queuing CQ entries on different CQs are unnecessarily
    synchronized by a spin lock to check that the CQ kthread worker has
    not been destroyed before queuing a CQ entry.

    The lock used in 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a
    destroyed cq kthread worker") is a device global lock and will have
    poor performance at scale as completions are entered from a large
    number of CPUs.

    Convert to use RCU where the read side of RCU is rvt_cq_enter() to
    determine that the worker is alive prior to triggering the
    completion event.
    Apply write side RCU semantics in rvt_driver_cq_init() and
    rvt_cq_exit().

    Fixes: 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker")
    Cc: # 4.14.x
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Sebastian Sanchez
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Sebastian Sanchez
     
  • commit a93a0a31111231bb1949f4a83b17238f0fa32d6a upstream.

    User send context integrity bits are cleared before the context is
    disabled. If the send context is still processing data, any packets
    that need those integrity bits will cause an error and halt the send
    context.

    During the disable handling, the driver waits for the context to drain.
    If the context is halted, the driver will eventually time out because
    the context won't drain and then incorrectly bounce the link.

    Reorder the bit clearing and the context disable.

    Examine the software state and send context status as well as the
    egress status to determine if a send context is in the halted state.

    Promote the check macros to static functions for consistency with the
    new check and to follow kernel style.

    Remove an unused define that refers to the egress timeout.

    Cc: # 4.9.x
    Reviewed-by: Mitko Haralanov
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl