15 Feb, 2017

2 commits

  • commit 647bf3d8a8e5777319da92af672289b2a6c4dc66 upstream.

    Update the range check to avoid integer-overflow in edge case.
    Resolves CVE-2016-8636.

    Signed-off-by: Eyal Itkin
    Signed-off-by: Dan Carpenter
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Eyal Itkin
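
A minimal userspace sketch of the kind of overflow-safe range check the message describes. The `struct region` type and both function names are hypothetical and are not taken from the rxe driver; the sketch only illustrates why `addr + len` must not be computed before the length is validated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical region descriptor, not the driver's actual structure. */
struct region {
    uint64_t base;   /* first valid address */
    uint64_t length; /* number of valid bytes */
};

/* Naive form: addr + len may wrap around and slip past the comparison. */
static bool range_ok_naive(const struct region *r, uint64_t addr, uint64_t len)
{
    return addr >= r->base && addr + len <= r->base + r->length;
}

/* Overflow-safe form: every subtraction is guarded by an earlier check. */
static bool range_ok(const struct region *r, uint64_t addr, uint64_t len)
{
    if (addr < r->base || len > r->length)
        return false;
    /* addr >= base and len <= length, so neither side can underflow. */
    return addr - r->base <= r->length - len;
}
```

With a region of 0x100 bytes at 0x1000, a request at 0x1010 with length UINT64_MAX wraps in the naive form and is accepted, while the guarded form rejects it because the length alone already exceeds the region.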
     
  • commit 628f07d33c1f2e7bf31e0a4a988bb07914bd5e73 upstream.

    Update the response's resid field when larger than MTU, instead of only
    updating the local resid variable.

    Fixes: 8700e3e7c485 ("Soft RoCE driver")
    Signed-off-by: Eyal Itkin
    Signed-off-by: Dan Carpenter
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Eyal Itkin
     

01 Feb, 2017

2 commits

  • commit 2d4b21e0a2913612274a69a3ba1bfee4cffc6e77 upstream.

    On a UD QP, the completer tasklet is scheduled for each packet sent.

    If this is followed by a destroy_qp(), a kernel panic occurs
    because the completer tries to operate on a destroyed QP.

    Fixes: 8700e3e7c485 ("Soft RoCE driver")
    Signed-off-by: Yonatan Cohen
    Reviewed-by: Moni Shoua
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Yonatan Cohen
     
  • commit f39f775218a7520e3700de2003c84a042c3b5972 upstream.

    The first argument of list_add_tail is the new item and the second
    is the head of the list. Fix the code to pass arguments in the
    right order, otherwise not all the rxe devices will be removed
    during teardown.

    Fixes: 8700e3e7c4857 ('Soft RoCE driver')
    Signed-off-by: Maor Gottlieb
    Reviewed-by: Moni Shoua
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Maor Gottlieb
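
The effect of swapping the two arguments can be reproduced with a small userspace re-implementation of a circular doubly linked list. This is a sketch modeled on the kernel's list semantics, not the kernel's own list.h; `init_list_head()` and `list_count()` are helper names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Make 'h' an empty list that points at itself. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert 'new_item' just before 'head', i.e. at the tail of the list. */
static void list_add_tail(struct list_head *new_item, struct list_head *head)
{
    new_item->prev = head->prev;
    new_item->next = head;
    head->prev->next = new_item;
    head->prev = new_item;
}

/* Count the entries reachable from 'head' (the head itself is not counted). */
static size_t list_count(const struct list_head *head)
{
    size_t n = 0;

    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

Calling `list_add_tail(&item, &head)` for two items leaves both reachable from the head. Calling `list_add_tail(&head, &item)` instead re-links the head into each item's own list, so only the most recently "added" item stays reachable, which matches the symptom of devices being skipped during teardown.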
     

26 Jan, 2017

2 commits

  • commit a0fa72683e78979ef1123d679b1c40ae28bd9096 upstream.

    A race condition fix added an rxe_qp structure to the stack in order
    to be able to perform rollback in rxe_requester(), but the structure
    is large enough to trigger the warning for possible stack overflow:

    drivers/infiniband/sw/rxe/rxe_req.c: In function 'rxe_requester':
    drivers/infiniband/sw/rxe/rxe_req.c:757:1: error: the frame size of 2064 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

    This changes the rollback function to only save the psn inside
    the qp, which is the only field we access in the rollback_qp
    anyway.

    Fixes: 3050b9985024 ("IB/rxe: Fix race condition between requester and completer")
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
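
The idea of saving only the field that rollback needs, rather than a full copy of a large structure, can be sketched in userspace as follows. All type and function names here (`big_qp`, `rollback_state`, `send_one`, the `xmit_*` stubs) are hypothetical stand-ins and do not reproduce the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a large per-QP structure. */
struct big_qp {
    uint32_t psn;
    unsigned char other_state[2048];
};

/* Small snapshot holding only what rollback needs. */
struct rollback_state {
    uint32_t psn;
};

static void save_state(const struct big_qp *qp, struct rollback_state *rb)
{
    rb->psn = qp->psn;
}

static void restore_state(struct big_qp *qp, const struct rollback_state *rb)
{
    qp->psn = rb->psn;
}

typedef int (*xmit_fn)(const struct big_qp *qp);

/* Stub transmit functions used only to exercise both paths. */
static int xmit_ok(const struct big_qp *qp)   { (void)qp; return 0; }
static int xmit_fail(const struct big_qp *qp) { (void)qp; return -1; }

/* Advance the PSN before transmitting and undo it if the transmit fails. */
static int send_one(struct big_qp *qp, xmit_fn xmit)
{
    struct rollback_state rb; /* a few bytes on the stack, not a whole big_qp */
    int err;

    save_state(qp, &rb);
    qp->psn = (qp->psn + 1) & 0xffffff; /* PSNs are 24 bits wide */
    err = xmit(qp);
    if (err)
        restore_state(qp, &rb);
    return err;
}
```

The snapshot costs four bytes of stack instead of a copy of the whole structure, which is the property the frame-size warning was pointing at.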
     
  • commit d680ebed91e0b45c43ae03a880a0b43211096161 upstream.

    Increase limit of max CQE from 8K to 32K to allow demanding
    applications to work over SoftRoCE with same configuration
    as most RoCEv2 HW vendors have.

    Fixes: 8700e3e7c485 ("Soft RoCE driver")
    Signed-off-by: Yonatan Cohen
    Reviewed-by: Moni Shoua
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Yonatan Cohen
     

09 Jan, 2017

1 commit

  • commit e259934d4df7f99f2a5c2c4f074f6a55bd4b1722 upstream.

    A socket is associated with every QP by the rxe driver but sock_release()
    is never called. Add a call to sock_release() in rxe_qp_cleanup().

    Fixes: commit 8700e3e7c48A5 ("Add Soft RoCE driver")
    Signed-off-by: Bart Van Assche
    Cc: Moni Shoua
    Cc: Kamal Heib
    Cc: Amir Vadai
    Cc: Haggai Eran
    Reviewed-by: Moni Shoua
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
     

17 Nov, 2016

5 commits

  • Doug Ledford
     
  • The method rxe_qp_error() transitions the QP to the error state
    and makes sure the QP is drained. However, it did not update
    the QP state reported to the user's query.

    This patch fixes this.

    Fixes: 8700e3e7c485 ("Soft RoCE driver")
    Signed-off-by: Yonatan Cohen
    Reviewed-by: Moni Shoua
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Yonatan Cohen
     
  • RXE resets the send-q only once in rxe_qp_init_req() when
    QP is created, but when the QP is reused after QP reset, the send-q
    holds previous garbage data.

    This garbage data wrongly fails CQEs that otherwise
    should have completed successfully.

    Fixes: 8700e3e7c485 ("Soft RoCE driver")
    Signed-off-by: Yonatan Cohen
    Reviewed-by: Moni Shoua
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Yonatan Cohen
     
  • To correctly handle an erroneous WR, this fix does the following:
    1. Make sure the bad WQE causes a user completion event.
    2. Call rxe_completer to handle the erred WQE.

    Before the fix, when rxe_requester found a bad WQE, it changed its
    status to IB_WC_LOC_PROT_ERR and exited with 0 for non-RC QPs.

    If this was the 1st WQE then there would be no ACK to invoke the
    completer and this bad WQE would be stuck in the QP's send-q.

    On top of that the requester exiting with 0 caused rxe_do_task to
    endlessly invoke rxe_requester, resulting in a soft-lockup attached
    below.

    In case the WQE was not the 1st and rxe_completer did get a chance to
    handle the bad WQE, it did not cause a completion event since the WQE's
    IB_SEND_SIGNALED flag was not set.

    Setting WQE status to IB_SEND_SIGNALED is subject to IBA spec
    version 1.2.1, section 10.7.3.1 Signaled Completions.

    NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s!
    [] ? rxe_pool_get_index+0x35/0xb0 [rdma_rxe]
    [] lookup_mem+0x3c/0xc0 [rdma_rxe]
    [] copy_data+0x1c4/0x230 [rdma_rxe]
    [] rxe_requester+0x9d0/0x1100 [rdma_rxe]
    [] ? kfree_skbmem+0x5a/0x60
    [] rxe_do_task+0x89/0xf0 [rdma_rxe]
    [] rxe_run_task+0x12/0x30 [rdma_rxe]
    [] rxe_post_send+0x41a/0x550 [rdma_rxe]
    [] ? __kmalloc+0x182/0x200
    [] ? down_read+0x12/0x40
    [] ib_uverbs_post_send+0x532/0x540 [ib_uverbs]
    [] ? tcp_sendmsg+0x402/0xb80
    [] ib_uverbs_write+0x18c/0x3f0 [ib_uverbs]
    [] ? inet_recvmsg+0x7e/0xb0
    [] ? sock_recvmsg+0x3d/0x50
    [] __vfs_write+0x37/0x140
    [] vfs_write+0xb2/0x1b0
    [] SyS_write+0x55/0xc0
    [] entry_SYSCALL_64_fastpath+0x1a/0xa

    Fixes: 8700e3e7c485 ("Soft RoCE driver")
    Signed-off-by: Yonatan Cohen
    Reviewed-by: Moni Shoua
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Yonatan Cohen
     
  • Missing initialization of udp_tunnel_sock_cfg causes the following
    kernel panic when the kernel tries to execute gro_receive().

    While there, we converted udp_port_cfg to use the same
    initialization scheme as udp_tunnel_sock_cfg.

    ------------[ cut here ]------------
    kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
    BUG: unable to handle kernel paging request at ffffffffa0588c50
    IP: [] __this_module+0x50/0xffffffffffff8400 [ib_rxe]
    PGD 1c09067 PUD 1c0a063 PMD bb394067 PTE 80000000ad5e8163
    Oops: 0011 [#1] SMP
    Modules linked in: ib_rxe ip6_udp_tunnel udp_tunnel
    CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc3+ #2
    Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
    task: ffff880235e4e680 ti: ffff880235e68000 task.ti: ffff880235e68000
    RIP: 0010:[]
    [] __this_module+0x50/0xffffffffffff8400 [ib_rxe]
    RSP: 0018:ffff880237343c80 EFLAGS: 00010282
    RAX: 00000000dffe482d RBX: ffff8800ae330900 RCX: 000000002001b712
    RDX: ffff8800ae330900 RSI: ffff8800ae102578 RDI: ffff880235589c00
    RBP: ffff880237343cb0 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800ae33e262
    R13: ffff880235589c00 R14: 0000000000000014 R15: ffff8800ae102578
    FS: 0000000000000000(0000) GS:ffff880237340000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffffa0588c50 CR3: 0000000001c06000 CR4: 00000000000006e0
    Stack:
    ffffffff8160860e ffff8800ae330900 ffff8800ae102578 0000000000000014
    000000000000004e ffff8800ae102578 ffff880237343ce0 ffffffff816088fb
    0000000000000000 ffff8800ae330900 0000000000000000 00000000ffad0000
    Call Trace:

    [] ? udp_gro_receive+0xde/0x130
    [] udp4_gro_receive+0x10b/0x2d0
    [] inet_gro_receive+0x1d3/0x270
    [] dev_gro_receive+0x269/0x3b0
    [] napi_gro_receive+0x38/0x120
    [] mlx5e_handle_rx_cqe+0x27e/0x340 [mlx5_core]
    [] mlx5e_poll_rx_cq+0x66/0x6d0 [mlx5_core]
    [] mlx5e_napi_poll+0x8e/0x400 [mlx5_core]
    [] net_rx_action+0x160/0x380
    [] __do_softirq+0xd7/0x2c5
    [] irq_exit+0xf5/0x100
    [] do_IRQ+0x56/0xd0
    [] common_interrupt+0x8c/0x8c

    [] ? native_safe_halt+0x6/0x10
    [] default_idle+0x1e/0xd0
    [] arch_cpu_idle+0xf/0x20
    [] default_idle_call+0x3c/0x50
    [] cpu_startup_entry+0x323/0x3c0
    [] start_secondary+0x15c/0x1a0
    RIP [] __this_module+0x50/0xffffffffffff8400 [ib_rxe]
    RSP
    CR2: ffffffffa0588c50
    ---[ end trace 489ee31fa7614ac5 ]---
    Kernel panic - not syncing: Fatal exception in interrupt
    Kernel Offset: disabled
    ---[ end Kernel panic - not syncing: Fatal exception in interrupt
    ------------[ cut here ]------------

    Fixes: 8700e3e7c485 ("Soft RoCE driver")
    Signed-off-by: Yonatan Cohen
    Reviewed-by: Moni Shoua
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Yonatan Cohen
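
Why an uninitialized configuration structure with callback members is dangerous, and how zero-initialization avoids it, can be shown with a small userspace sketch. `struct tunnel_cfg` and its members are hypothetical and only loosely modeled on a tunnel socket configuration; they are not the kernel's udp_tunnel_sock_cfg.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical config with optional callbacks. */
struct tunnel_cfg {
    int   encap_type;
    int   (*encap_rcv)(void *pkt);
    void *(*gro_receive)(void *pkt); /* optional, may be left unset */
};

/* Stub receive handler used only for this example. */
static int demo_rcv(void *pkt)
{
    (void)pkt;
    return 0;
}

/* Zero-initialize first, then set only the members that are needed. */
static struct tunnel_cfg make_cfg(void)
{
    struct tunnel_cfg cfg = { 0 }; /* every unnamed member becomes 0 or NULL */

    cfg.encap_type = 1;
    cfg.encap_rcv = demo_rcv;
    return cfg;
}

/* A consumer can then safely test whether the optional hook is present. */
static int has_gro(const struct tunnel_cfg *cfg)
{
    return cfg->gro_receive != NULL;
}
```

Without the `{ 0 }` initializer, `gro_receive` would hold whatever was on the stack, and a consumer that tests it against NULL would jump to a garbage address, which is consistent with the NX-protected page fault in the trace above.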
     

16 Nov, 2016

1 commit

  • The initial code for rdmavt carried with it a restriction that was a
    vestige from the qib driver: to dma map a page, the request had to be
    less than a page size. This is not the case on modern hardware; both qib
    and hfi1 will be just fine with unaligned map requests.

    This fixes a 4.8 regression whereby an IPoIB transfer of > PAGE_SIZE
    will hang because the dma map page call always fails. This was
    introduced after commit 5faba5469522 ("IB/ipoib: Report SG feature
    regardless of HW UD CSUM capability") added the capability to use SG by
    default. Rather than override this, the HW supports it, so allow SG.

    Cc: Stable # 4.8
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     

12 Oct, 2016

1 commit

  • A good practice is to prefix the names of functions by the name
    of the subsystem.

    The kthread worker API is a mix of classic kthreads and workqueues. Each
    worker has a dedicated kthread. It runs a generic function that processes
    queued work items. It is implemented as part of the kthread subsystem.

    This patch renames the existing kthread worker API to use
    the corresponding name from the workqueues API prefixed by
    kthread_:

    __init_kthread_worker() -> __kthread_init_worker()
    init_kthread_worker() -> kthread_init_worker()
    init_kthread_work() -> kthread_init_work()
    insert_kthread_work() -> kthread_insert_work()
    queue_kthread_work() -> kthread_queue_work()
    flush_kthread_work() -> kthread_flush_work()
    flush_kthread_worker() -> kthread_flush_worker()

    Note that the names of DEFINE_KTHREAD_WORK*() macros stay
    as they are. It is common that the "DEFINE_" prefix has
    precedence over the subsystem names.

    Note that INIT() macros and init() functions use different
    naming scheme. There is no good solution. There are several
    reasons for this solution:

    + "init" in the function names stands for the verb "initialize"
    aka "initialize worker". While "INIT" in the macro names
    stands for the noun "INITIALIZER" aka "worker initializer".

    + INIT() macros are used only in DEFINE() macros

    + init() functions are used close to the other kthread()
    functions. It looks much better if all the functions
    use the same scheme.

    + There will be also kthread_destroy_worker() that will
    be used close to kthread_cancel_work(). It is related
    to the init() function. Again it looks better if all
    functions use the same naming scheme.

    + there are several precedents for such init() function
    names, e.g. amd_iommu_init_device(), free_area_init_node(),
    jump_label_init_type(), regmap_init_mmio_clk(),

    + It is not an argument but it was inconsistent even before.

    [arnd@arndb.de: fix linux-next merge conflict]
    Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.com
    Suggested-by: Andrew Morton
    Signed-off-by: Petr Mladek
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     

07 Oct, 2016

6 commits

  • 1. Debugging qp state transitions and qp errors in loopback and
    multiple QP tests is difficult without qp numbers in debug logs.
    This patch adds qp number to important debug logs.

    2. Instead of having the rxe: prefix in some logs and not in
    others, use a uniform module name prefix via the pr_fmt macro.

    3. Code cleanup for various warnings reported by checkpatch:
    incomplete unsigned data types, lines over 80 characters, and return
    statements.

    Signed-off-by: Parav Pandit
    Signed-off-by: Doug Ledford

    Parav Pandit
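
The uniform-prefix idea from item 2 can be sketched in userspace. In the kernel, `pr_fmt` is usually defined in terms of `KBUILD_MODNAME`; here `MODNAME`, `log_fmt()` and `log_qp_state()` are hypothetical stand-ins, and the message text is invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MODNAME "rdma_rxe"           /* stands in for KBUILD_MODNAME */
#define pr_fmt(fmt) MODNAME ": " fmt /* prefix applied to every message */

/* Userspace stand-in for a pr_*() helper: formats into a caller buffer. */
#define log_fmt(buf, size, fmt, ...) \
    snprintf((buf), (size), pr_fmt(fmt), __VA_ARGS__)

/* Example message that carries the QP number, as item 1 suggests. */
static int log_qp_state(char *buf, size_t size, unsigned int qpn,
                        const char *state)
{
    return log_fmt(buf, size, "qp#%u moved to %s state", qpn, state);
}
```

Because the prefix is glued on by string-literal concatenation at compile time, individual call sites no longer need to spell out the module name, and no message can forget it.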
     
  • There is a problem when CONFIG_RDMA_RXE=y and CONFIG_IPV6=y. This
    results in the rdma_rxe initialization occurring before the IPv6
    services are ready. This patch delays the initialization of rdma_rxe
    until after the IPv6 services are ready. This fix is based on one
    proposed by Logan Gunthorpe on a much older code base.

    Signed-off-by: Stephen Bates
    Reviewed-by: Yonatan Cohen
    Signed-off-by: Doug Ledford

    Stephen Bates
     
  • This patch honors the max incoming read request count instead of
    the outgoing read request count
    (a) during modify qp by allocating response queue metadata
    (b) during incoming read request processing

    Signed-off-by: Parav Pandit
    Signed-off-by: Doug Ledford

    Parav Pandit
     
  • This patch fixes the kernel crash below on memory registration for rxe
    and other transport drivers that have a dma_ops extension.

    IB/core invokes ib_map_sg_attrs() in a generic manner with dma attributes,
    which are used by the mlx5 and mthca adapters. However, in doing so it
    did not honor the dma_ops extension of software-based transports for
    sg map/unmap operations. This results in calling dma_map_sg_attrs on the
    hardware virtual device, which crashes with a NULL dereference.

    We extend the core to support sg_map/unmap_attrs and transport drivers
    to implement those dma_ops callback functions.

    Verified using perftest applications.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] check_addr+0x35/0x60
    ...
    Call Trace:
    [] ? nommu_map_sg+0x99/0xd0
    [] ib_umem_get+0x3d6/0x470 [ib_core]
    [] rxe_mem_init_user+0x49/0x270 [rdma_rxe]
    [] ? rxe_add_index+0xca/0x100 [rdma_rxe]
    [] rxe_reg_user_mr+0x9f/0x130 [rdma_rxe]
    [] ib_uverbs_reg_mr+0x14e/0x2c0 [ib_uverbs]
    [] ib_uverbs_write+0x15b/0x3b0 [ib_uverbs]
    [] ? mem_cgroup_commit_charge+0x76/0xe0
    [] ? page_add_new_anon_rmap+0x89/0xc0
    [] ? lru_cache_add_active_or_unevictable+0x39/0xc0
    [] __vfs_write+0x28/0x120
    [] ? rw_verify_area+0x49/0xb0
    [] vfs_write+0xb2/0x1b0
    [] SyS_write+0x46/0xa0
    [] entry_SYSCALL_64_fastpath+0x1a/0xa4

    Signed-off-by: Parav Pandit
    Signed-off-by: Doug Ledford

    Parav Pandit
     
  • Both prepare4 and prepare6 set the loopback mask in the pkt_info structure
    instance of the skb. The xmit_packet and other requester-side functions
    use a pkt_info struct from the stack without the proper mask. As a
    result, the packet is sent out to the actual netdev device and
    loopback functionality is broken.

    Modify prepare() to pass its correctly marked pkt_info struct to
    prepare4() and prepare6() instead of them using SKB_TO_PKT(skb) and
    getting an incorrectly set mask.

    Verified with perftest applications.

    Signed-off-by: Parav Pandit
    Signed-off-by: Doug Ledford

    Parav Pandit
     
  • This patch avoids scheduling a tasklet for WQE and protocol processing
    for user space QPs. It performs the task in the calling process context.

    To improve code readability, kernel-specific post_send handling is moved
    to the post_send_kernel() function.

    Signed-off-by: Parav Pandit
    Signed-off-by: Doug Ledford

    Parav Pandit
     

03 Oct, 2016

1 commit


02 Oct, 2016

4 commits

  • This patch adds lockdep asserts in key code paths to
    ensure lock correctness.

    Reviewed-by: Ira Weiny
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     
  • Add an rvt_qp_init() to initialize specific
    common fields as the qp is created or reset.

    The routine is shared by the rvt_reset_qp() and
    the rvt_create_qp().

    The intent is that lock dep assertions will only
    appear in the rvt_reset_qp().

    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     
  • The reset calldown is misplaced.

    It should only be called in the code that actually
    transitions the QP to reset.

    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     
  • The __must_hold() is sufficient to correct the sparse
    context imbalance inside a function.

    Per Documentation/sparse.txt:
    __must_hold - The specified lock is held on function entry and exit.

    Fixes: Commit c0a67f6ba356 ("IB/rdmavt: Annotate rvt_reset_qp()")
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
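
A userspace sketch of how a `__must_hold()` annotation sits on a function. The macro definition mirrors the usual pattern in which the attribute exists only when the sparse checker (`__CHECKER__`) is running and expands to nothing otherwise. The `spin`, `qp`, `reset_qp()` and `modify_to_reset()` names are hypothetical, and the runtime `assert()` is an addition for this example, not something the annotation itself provides.

```c
#include <assert.h>

/* The annotation is only meaningful to sparse; compilers see nothing. */
#ifdef __CHECKER__
# define __must_hold(x) __attribute__((context(x, 1, 1)))
#else
# define __must_hold(x)
#endif

/* Toy lock that just records whether it is held. */
struct spin {
    int held;
};

static void spin_lock(struct spin *l)   { l->held = 1; }
static void spin_unlock(struct spin *l) { l->held = 0; }

struct qp {
    struct spin lock;
    int state;
};

/* Caller must hold qp->lock on entry; it is still held on exit. */
static void reset_qp(struct qp *qp) __must_hold(&qp->lock)
{
    assert(qp->lock.held); /* runtime counterpart of the static annotation */
    qp->state = 0;
}

/* The only path that transitions the QP to reset takes the lock first. */
static void modify_to_reset(struct qp *qp)
{
    spin_lock(&qp->lock);
    reset_qp(qp);
    spin_unlock(&qp->lock);
}
```

The annotation documents the locking contract at the function boundary, so the checker can flag callers that reach `reset_qp()` without the lock.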
     

17 Sep, 2016

7 commits

  • This improves readability and hides the reference count
    mechanism from the client drivers.

    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     
  • The userspace memory region 'mr' is allocated with kzalloc in
    __rvt_alloc_mr however it is incorrectly being freed with vfree in
    __rvt_free_mr. Fix this by using kfree to free it.

    Signed-off-by: Colin Ian King
    Reviewed-by: Leon Romanovsky
    Acked-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Colin Ian King
     
  • Decrement qp reference when handling error path
    in completer to prevent kmem_cache leak.

    Fixes: 8700e3e7c485 ("Soft RoCE driver")
    Signed-off-by: Yonatan Cohen
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Yonatan Cohen
     
  • rxe_requester() sends a pkt with rxe_xmit_packet() and
    then calls rxe_update() to update the wqe and qp's psn values.
    But sometimes the response is received before the requester
    has had time to update the wqe, in which case the completer
    acts on erroneous wqe values.
    This fix updates the wqe and qp before actually sending
    the request and rolls back when xmit fails.

    Fixes: 8700e3e7c485 ("Soft RoCE driver")
    Signed-off-by: Yonatan Cohen
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Yonatan Cohen
     
  • When handling an ack for atomic opcodes like "fetch&add"
    or "cmp&swp", the method send_atomic_ack() saves the ack
    before sending it, in case it gets lost and never reaches the
    requester, in which case the method duplicate_request()
    will need to find it using the duplicated request.psn.
    But send_atomic_ack() used a wrong psn value and thus
    the above ack was never found.
    This fix uses the ack.psn to locate the ack in case
    it's needed.
    This fix also copies the ack packet to the skb's control buffer
    since duplicate_request() will need it when calling rxe_xmit_packet().

    Fixes: 8700e3e7c485 ("Soft RoCE driver")
    Signed-off-by: Yonatan Cohen
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Yonatan Cohen
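
The save-then-look-up-by-PSN pattern described above can be sketched as a small ring of saved responses. Everything here (`saved_ack`, `responder`, `MAX_SAVED`, `save_ack()`, `find_ack()`) is a hypothetical userspace model and not the driver's data structures; it only shows why the key used when saving must match the key used when a duplicate request arrives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SAVED 4 /* hypothetical ring size */

struct saved_ack {
    int      valid;
    uint32_t psn;        /* key: PSN the ack was sent with */
    uint64_t orig_value; /* payload of the atomic ack */
};

struct responder {
    struct saved_ack ring[MAX_SAVED];
    unsigned int head;
};

/* Save the ack under the PSN it is sent with, overwriting the oldest slot. */
static void save_ack(struct responder *r, uint32_t ack_psn, uint64_t orig_value)
{
    struct saved_ack *slot = &r->ring[r->head];

    slot->valid = 1;
    slot->psn = ack_psn;
    slot->orig_value = orig_value;
    r->head = (r->head + 1) % MAX_SAVED;
}

/* Look up a saved ack using the PSN carried by a duplicated request. */
static const struct saved_ack *find_ack(const struct responder *r,
                                        uint32_t req_psn)
{
    for (size_t i = 0; i < MAX_SAVED; i++)
        if (r->ring[i].valid && r->ring[i].psn == req_psn)
            return &r->ring[i];
    return NULL;
}
```

If `save_ack()` were called with any other PSN than the one the retried request carries, `find_ack()` would return NULL and the saved ack could never be replayed.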
     
  • Disable creation of a UDP socket for IPv6 when
    CONFIG_IPV6 is not enabled, since udp_sock_create6()
    returns 0 when CONFIG_IPV6 is not set.

    [ 46.888632] IP: [] setup_udp_tunnel_sock+0x6/0x4f
    [ 46.891355] *pdpt = 0000000000000000 *pde = f000ff53f000ff53
    [ 46.893918] Oops: 0002 [#1] PREEMPT
    [ 46.896014] CPU: 0 PID: 1 Comm: swapper Not tainted 4.7.0-rc4-00001-g8700e3e #1
    [ 46.900280] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
    [ 46.904905] task: cf06c040 ti: cf05e000 task.ti: cf05e000
    [ 46.907854] EIP: 0060:[] EFLAGS: 00210246 CPU: 0
    [ 46.911137] EIP is at setup_udp_tunnel_sock+0x6/0x4f
    [ 46.914070] EAX: 00000044 EBX: 00000001 ECX: cf05fef0 EDX: ca8142e0
    [ 46.917236] ESI: c2c4505b EDI: cf05fef0 EBP: cf05fed0 ESP: cf05fed0
    [ 46.919836] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
    [ 46.922046] CR0: 80050033 CR2: 000001fc CR3: 02cec000 CR4: 000006b0
    [ 46.924550] Stack:
    [ 46.926014] cf05ff10 c1fd4657 ca8142e0 0000000a 00000000 00000000 0000b712 00000008
    [ 46.931274] 00000000 6bb5bd01 c1fd48de 00000000 00000000 cf05ff1c 00000000 00000000
    [ 46.936122] cf05ff1c c1fd4bdf 00000000 cf05ff28 c2c4507b ffffffff cf05ff88 c2bf1c74
    [ 46.942350] Call Trace:
    [ 46.944403] [] rxe_setup_udp_tunnel+0x8f/0x99
    [ 46.947689] [] ? net_to_rxe+0x4e/0x4e
    [ 46.950567] [] rxe_net_init+0xe/0xa4
    [ 46.953147] [] rxe_module_init+0x20/0x4c
    [ 46.955448] [] do_one_initcall+0x89/0x113
    [ 46.957797] [] ? set_debug_rodata+0xf/0xf
    [ 46.959966] [] ? kernel_init_freeable+0xbe/0x15b
    [ 46.962262] [] kernel_init_freeable+0xde/0x15b
    [ 46.964418] [] kernel_init+0x8/0xd0
    [ 46.966618] [] ret_from_kernel_thread+0xe/0x24
    [ 46.969592] [] ? rest_init+0x6f/0x6f

    Fixes: 8700e3e7c485 ("Soft RoCE driver")
    Signed-off-by: Yonatan Cohen
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Yonatan Cohen
     
  • There is skb_clone(skb, GFP_KERNEL) in spinlock context
    in rxe_rcv_mcast_pkt().

    Found by Linux Driver Verification project (linuxtesting.org).

    Signed-off-by: Alexey Khoroshilov
    Acked-by: Moni Shoua
    Signed-off-by: Doug Ledford

    Alexey Khoroshilov
     

23 Aug, 2016

1 commit

  • The unwind logic for creating a user QP has a double vfree
    of the non-shared receive queue when handling a "too many qps"
    failure.

    The code unwinds the mmap info by decrementing a reference
    count which will call rvt_release_mmap_info() which in turn
    does the vfree() of the r_rq.wq. The unwind code then does
    the same free.

    Fix by guarding the vfree() with the same test that is done
    in close and only do the vfree() if qp->ip is NULL.

    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
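
The ownership rule behind this fix can be modeled in userspace: once the buffer has been handed to a reference-counted info object, dropping that reference is what frees it, and the unwind path must not free it again. The `mmap_info`, `qp`, `release_info()` and `unwind_qp()` names are hypothetical, and the `frees` counter exists only so the sketch can be checked.

```c
#include <assert.h>
#include <stdlib.h>

struct mmap_info {
    int   refcount;
    void *buf; /* buffer shared with user space */
};

struct qp {
    struct mmap_info *ip; /* non-NULL once the buffer is owned by the info */
    void *wq;
};

static int frees; /* counts buffer frees so the sketch can be checked */

/* Drop one reference; the last reference frees the buffer and the info. */
static void release_info(struct mmap_info *ip)
{
    if (--ip->refcount == 0) {
        free(ip->buf);
        frees++;
        free(ip);
    }
}

/* Unwind: free the buffer directly only if no info object owns it. */
static void unwind_qp(struct qp *qp)
{
    if (qp->ip) {
        release_info(qp->ip);
    } else {
        free(qp->wq);
        frees++;
    }
    qp->ip = NULL;
    qp->wq = NULL;
}
```

Each buffer is freed exactly once on either path, which is the invariant the guard on `qp->ip` is meant to restore.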
     

05 Aug, 2016

2 commits

  • Pull second round of rdma updates from Doug Ledford:
    "This can be split out into just two categories:

    - fixes to the RDMA R/W API in regards to SG list length limits
    (about 5 patches)

    - fixes/features for the Intel hfi1 driver (everything else)

    The hfi1 driver is still being brought to full feature support by
    Intel, and they have a lot of people working on it, so that amounts to
    almost the entirety of this pull request"

    * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits)
    IB/hfi1: Add cache evict LRU list
    IB/hfi1: Fix memory leak during unexpected shutdown
    IB/hfi1: Remove unneeded mm argument in remove function
    IB/hfi1: Consistently call ops->remove outside spinlock
    IB/hfi1: Use evict mmu rb operation
    IB/hfi1: Add evict operation to the mmu rb handler
    IB/hfi1: Fix TID caching actions
    IB/hfi1: Make the cache handler own its rb tree root
    IB/hfi1: Make use of mm consistent
    IB/hfi1: Fix user SDMA racy user request claim
    IB/hfi1: Fix error condition that needs to clean up
    IB/hfi1: Release node on insert failure
    IB/hfi1: Validate SDMA user iovector count
    IB/hfi1: Validate SDMA user request index
    IB/hfi1: Use the same capability state for all shared contexts
    IB/hfi1: Prevent null pointer dereference
    IB/hfi1: Rename TID mmu_rb_* functions
    IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister()
    IB/hfi1: Restructure hfi1_file_open
    IB/hfi1: Make iovec loop index easy to understand
    ...

    Linus Torvalds
     
  • Pull base rdma updates from Doug Ledford:
    "Round one of 4.8 code: while this is mostly normal, there is a new
    driver in here (the driver was hosted outside the kernel for several
    years and is actually a fairly mature and well coded driver). It
    amounts to 13,000 of the 16,000 lines of added code in here.

    Summary:

    - Updates/fixes for iw_cxgb4 driver
    - Updates/fixes for mlx5 driver
    - Add flow steering and RSS API
    - Add hardware stats to mlx4 and mlx5 drivers
    - Add firmware version API for RDMA driver use
    - Add the rxe driver (this is a software RoCE driver that makes any
    Ethernet device a RoCE device)
    - Fixes for i40iw driver
    - Support for send only multicast joins in the cma layer
    - Other minor fixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits)
    Soft RoCE driver
    IB/core: Support for CMA multicast join flags
    IB/sa: Add cached attribute containing SM information to SA port
    IB/uverbs: Fix race between uverbs_close and remove_one
    IB/mthca: Clean up error unwind flow in mthca_reset()
    IB/mthca: NULL arg to pci_dev_put is OK
    IB/hfi1: NULL arg to sc_return_credits is OK
    IB/mlx4: Add diagnostic hardware counters
    net/mlx4: Query performance and diagnostics counters
    net/mlx4: Add diagnostic counters capability bit
    Use smaller 512 byte messages for portmapper messages
    IB/ipoib: Report SG feature regardless of HW UD CSUM capability
    IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct
    IB/hfi1: Disable by default
    IB/rdmavt: Disable by default
    IB/mlx5: Fix port counter ID association to QP offset
    IB/mlx5: Fix iteration overrun in GSI qps
    i40iw: Add NULL check for puda buffer
    i40iw: Change dup_ack_thresh to u8
    i40iw: Remove unnecessary check for moving CQ head
    ...

    Linus Torvalds
     

04 Aug, 2016

3 commits

  • Doug Ledford
     
  • Soft RoCE (RXE) - The software RoCE driver

    ib_rxe implements the RDMA transport and registers to the RDMA core
    device as a kernel verbs provider. It also implements the packet IO
    layer. On the other hand ib_rxe registers to the Linux netdev stack
    as a udp encapsulating protocol, in that case RDMA, for sending and
    receiving packets over any Ethernet device. This yields a RDMA
    transport over the UDP/Ethernet network layer forming a RoCEv2
    compatible device.

    The configuration procedure of the Soft RoCE drivers requires
    binding to any existing Ethernet network device. This is done through
    the /sys interface.

    A userspace Soft RoCE library (librxe) provides user applications
    the ability to run with Soft RoCE devices. The use of rxe verbs in
    user space requires the inclusion of librxe as a device-specific
    plug-in to libibverbs. librxe is packaged separately.

    Architecture:

         +-----------------------------------------------------------+
         |                        Application                        |
         +-----------------------------------------------------------+
                                 +-----------------------------------+
                                 |            libibverbs             |
    User                         +-----------------------------------+
                                 +----------------+ +----------------+
                                 | librxe         | | HW RoCE lib    |
                                 +----------------+ +----------------+
    +---------------------------------------------------------------+
         +--------------+                               +------------+
         | Sockets      |                               | RDMA ULP   |
         +--------------+                               +------------+
         +--------------+                      +---------------------+
         | TCP/IP       |                      | ib_core             |
         +--------------+                      +---------------------+
                                     +------------+ +----------------+
    Kernel                           | ib_rxe     | | HW RoCE driver |
                                     +------------+ +----------------+
         +------------------------------------+
         | NIC driver                         |
         +------------------------------------+

    Soft RoCE resources:

    [1] https://github.com/SoftRoCE/librxe-dev librxe - source code in
    GitHub
    [2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE
    Wiki page
    [3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library

    Signed-off-by: Kamal Heib
    Signed-off-by: Amir Vadai
    Signed-off-by: Moni Shoua
    Reviewed-by: Haggai Eran
    Signed-off-by: Doug Ledford

    Moni Shoua
     
  • There is a strict policy in the Linux kernel that new drivers must be
    disabled by default. Hence leave out the "default m" line from Kconfig.

    Fixes: 0194621b2253 ("IB/rdmavt: Create module framework and handle driver registration")
    Signed-off-by: Bart Van Assche
    Cc: Jubin John
    Cc: Dennis Dalessandro
    Cc: Ira Weiny
    Cc: Mike Marciniszyn
    Cc: # v4.6+
    Signed-off-by: Doug Ledford

    Bart Van Assche
     

03 Aug, 2016

2 commits

  • The use of the specific opcode test is redundant since
    all ack entry users correctly manipulate the mr pointer
    to selectively trigger the reference clearing.

    The overly specific test hinders the use of implementation
    specific operations.

    The change needs to get rid of the union to ensure that
    an atomic value is not seen as an MR pointer.

    Reviewed-by: Ashutosh Dixit
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Ira Weiny
    Signed-off-by: Doug Ledford

    Ira Weiny
     
  • Hanging has been observed while writing a file over NFSoRDMA. Dmesg on
    the server contains messages like these:

    [ 931.992501] svcrdma: Error -22 posting RDMA_READ
    [ 952.076879] svcrdma: Error -22 posting RDMA_READ
    [ 982.154127] svcrdma: Error -22 posting RDMA_READ
    [ 1012.235884] svcrdma: Error -22 posting RDMA_READ
    [ 1042.319194] svcrdma: Error -22 posting RDMA_READ

    Here is why:

    With the base memory management extension enabled, FRMR is used instead
    of FMR. The xprtrdma server issues each RDMA read request as the following
    bundle:

    (1)IB_WR_REG_MR, signaled;
    (2)IB_WR_RDMA_READ, signaled;
    (3)IB_WR_LOCAL_INV, signaled & fencing.

    These requests are signaled. In order to generate completion, the fast
    register work request is processed by the hfi1 send engine after being
    posted to the work queue, and the corresponding lkey is not valid until
    the request is processed. However, the rdmavt driver validates lkey when
    the RDMA read request is posted and thus it fails immediately with error
    -EINVAL (-22).

    This patch changes the work flow of local operations (fast register and
    local invalidate) so that fast register work requests are always
    processed immediately to ensure that the corresponding lkey is valid
    when subsequent work requests are posted. Local invalidate requests are
    processed immediately if fencing is not required and no previous local
    invalidate request is pending.

    To allow completion generation for signaled local operations that have
    been processed before posting to the work queue, an internal send flag
    RVT_SEND_COMPLETION_ONLY is added. The hfi1 send engine checks this flag
    and only generates completion for such requests.

    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Jianxin Xiong
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Jianxin Xiong