14 Sep, 2019

1 commit

  • This patch adds a counter for credit waits to assist field debugging.

    Link: https://lore.kernel.org/r/20190911113047.126040.10857.stgit@awfm-01.aw.intel.com
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Kaike Wan
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe

    Kaike Wan
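
    A minimal sketch of such a counter, assuming a hypothetical per-device
    counters struct and wait path (the names below are illustrative, not
    the actual hfi1 symbols):

        #include <linux/atomic.h>

        /* Hypothetical per-device debug counters. */
        struct dbg_counters {
                atomic64_t n_credit_waits;      /* sends that blocked on credits */
        };

        /* Called on the (assumed) path where a QP must wait for credits. */
        static inline void note_credit_wait(struct dbg_counters *c)
        {
                atomic64_inc(&c->n_credit_waits);
        }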
     

29 Jun, 2019

1 commit

  • Historically, rdmavt destroy_ah() has returned -EBUSY when the AH has a
    non-zero reference count. IBTA 11.2.2 notes no such return value or error
    case:

    Output Modifiers:
    - Verb results:
    - Operation completed successfully.
    - Invalid HCA handle.
    - Invalid address handle.

    ULPs never test for this error, so the failed destroy leaks memory.

    The reference count exists to allow for driver independent progress
    mechanisms to process UD SWQEs in parallel with post sends. The SWQE will
    hold a reference count until the UD SWQE completes and then drops the
    reference.

    Fix by removing the need to reference count the AH. Add a UD-specific
    allocation to each SWQE entry to cache the necessary information for
    independent progress, and copy the information during post send
    processing.

    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe

    Michael J. Ruhl
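
    A hedged sketch of the per-SWQE UD cache described above; the struct
    and function names are assumptions, not the actual rdmavt code:

        #include <rdma/ib_verbs.h>

        /* Illustrative per-SWQE UD state, filled in at post-send time so
         * that progress no longer needs to hold a reference on the AH. */
        struct ud_swqe_cache {
                struct rdma_ah_attr ah_attr;    /* private copy of the AH */
                u32 remote_qpn;
                u32 remote_qkey;
        };

        /* Snapshot everything the UD progress engine will need later. */
        static void cache_ud_state(struct ud_swqe_cache *c,
                                   const struct rdma_ah_attr *attr,
                                   u32 qpn, u32 qkey)
        {
                c->ah_attr = *attr;     /* struct copy; no AH refcount taken */
                c->remote_qpn = qpn;
                c->remote_qkey = qkey;
        }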
     

24 Apr, 2019

1 commit

  • The current include file ordering for the rdmavt headers has an
    ab/ba include issue that precludes using inlines from rdma_vt.h
    in rdmavt_qp.h.

    At the heart of the issue is that rdma_vt.h includes rdmavt_qp.h.

    Fix the ordering issue by adjusting rdma_vt.h to not require rdmavt_qp.h
    and move qp related inlines to rdmavt_qp.h.

    Additionally, promote rvt_mmap_info to rdma_vt.h since it is shared
    by rdmavt_cq.h and rdmavt_qp.h.

    Reviewed-by: Michael J. Ruhl
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe

    Mike Marciniszyn
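
    The shape of the problem and of the fix, sketched with simplified
    includes:

        /* Before (simplified): rdma_vt.h pulled in rdmavt_qp.h, so
         * rdmavt_qp.h could not include rdma_vt.h back, and its inlines
         * could not use the rvt_dev_info helpers.
         *
         *   rdma_vt.h:    #include <rdma/rdmavt_qp.h>
         *   rdmavt_qp.h:  (cannot include rdma_vt.h back: ab/ba cycle)
         *
         * After: the dependency is inverted.
         *
         *   rdma_vt.h:    no longer includes rdmavt_qp.h; QP inlines moved
         *   rdmavt_qp.h:  #include <rdma/rdma_vt.h>
         */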
     

06 Feb, 2019

1 commit

  • The RC retry timeout value is based on the estimated time for the
    response packet to come back. However, for TID RDMA READ request, due
    to the use of header suppression, the driver is normally not notified
    for each incoming response packet until the last TID RDMA READ response
    packet. Consequently, the retry timeout value should be extended to
    cover the transaction time for the entire length of a segment (default
    256K) instead of that for a single packet. This patch addresses the
    issue by introducing new retry timer functions to account for multiple
    packets and wrapper functions for backward compatibility.

    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Kaike Wan
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Kaike Wan
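
    A sketch of the scaling described above, under assumed names (the
    driver's actual timer functions and constants differ):

        #include <linux/kernel.h>

        #define TID_RDMA_SEGMENT_SIZE   (256 * 1024)    /* default segment */

        /* Scale the per-packet retry timeout to cover a whole segment's
         * worth of suppressed responses, not a single response packet. */
        static unsigned long segment_retry_timeout(unsigned long pkt_timeout,
                                                   u32 mtu)
        {
                u32 npkts = DIV_ROUND_UP(TID_RDMA_SEGMENT_SIZE, mtu);

                return pkt_timeout * npkts;
        }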
     

01 Feb, 2019

1 commit

  • The OPFN protocol uses the COMPARE_SWAP request to exchange data
    between the requester and the responder and therefore needs to
    be stored in the QP's s_ack_queue when the request is received
    on the responder side. However, because the user does not know
    anything about the OPFN protocol, this extra entry in the
    queue cannot be advertised to the user. This patch adds an extra
    entry in a QP's s_ack_queue.

    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Mitko Haralanov
    Signed-off-by: Kaike Wan
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Kaike Wan
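
    The sizing change amounts to one hidden slot, roughly:

        /* Sketch: the responder's s_ack_queue gets one extra slot, never
         * advertised to the user, so an OPFN COMPARE_SWAP can always be
         * queued. The helper name is illustrative. */
        static u32 ack_queue_entries(u32 advertised_max)
        {
                return advertised_max + 1;      /* +1 reserved for OPFN */
        }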
     

15 Jan, 2019

1 commit

  • Most provider routines are callback routines which the ib core invokes.
    The _callback suffix doesn't convey when such a callback is invoked.
    Therefore, rename port_callback to init_port.

    Additionally, store the init_port function pointer in ib_device_ops, so
    that it can be accessed in subsequent patches when binding rdma device to
    net namespace.

    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Parav Pandit
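
    A sketch of a driver wiring up the renamed hook; the exact init_port
    signature shown here is an assumption based on the description above:

        #include <rdma/ib_verbs.h>

        static int my_init_port(struct ib_device *ibdev, u8 port_num,
                                struct kobject *port_sysfs)
        {
                /* per-port initialization at registration time */
                return 0;
        }

        static const struct ib_device_ops my_dev_ops = {
                .init_port = my_init_port,
        };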
     

07 Dec, 2018

1 commit

  • This patch adds an interface to allow the driver to initialize the QP priv
    struct when the QP is created and after the qpn has been assigned. A
    field is added to the QP priv struct to reference the rcd, and two new
    files are added to contain the function that initializes the rcd field so
    that more TID RDMA related code can be added there later.

    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Kaike Wan
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe

    Mike Marciniszyn
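
    A hedged sketch of the new calldown; all names are illustrative and
    my_pick_rcd() is a hypothetical helper:

        #include <rdma/rdma_vt.h>
        #include <rdma/rdmavt_qp.h>

        struct my_rcd;                          /* driver receive context */
        struct my_rcd *my_pick_rcd(struct rvt_dev_info *rdi,
                                   struct ib_qp_init_attr *attr);

        /* Per-QP private state; rcd is the field described above. */
        struct my_qp_priv {
                struct my_rcd *rcd;
        };

        /* Invoked after the qpn has been assigned. */
        static int my_qp_priv_init(struct rvt_dev_info *rdi, struct rvt_qp *qp,
                                   struct ib_qp_init_attr *init_attr)
        {
                struct my_qp_priv *priv = qp->priv;

                priv->rcd = my_pick_rcd(rdi, init_attr);
                return priv->rcd ? 0 : -EINVAL;
        }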
     

04 Oct, 2018

2 commits

  • Moving send completion code into rdmavt in order to have shared logic
    between qib and hfi1 drivers.

    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Brian Welty
    Signed-off-by: Venkata Sandeep Dhanalakota
    Signed-off-by: Harish Chegondi
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe

    Venkata Sandeep Dhanalakota
     
  • This patch moves hfi1_copy_sge() into rdmavt for sharing with qib.
    It also moves all the wss_*() functions into rdmavt, as
    several wss_*() functions are called from hfi1_copy_sge().

    When SGE copy mode is adaptive, cacheless copy may be done in some cases
    for performance reasons. In those cases, X86 cacheless copy function
    is called since the drivers that use rdmavt and may set SGE copy mode
    to adaptive are X86 only. For this reason, this patch adds
    "depends on X86_64" to rdmavt/Kconfig.

    Reviewed-by: Ashutosh Dixit
    Reviewed-by: Michael J. Ruhl
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Brian Welty
    Signed-off-by: Harish Chegondi
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe

    Brian Welty
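
    A hedged sketch of the working-set-size (wss) idea: count cache lines
    touched by recent copies and, past a threshold, prefer the cacheless
    (non-temporal, x86-only) copy. Names and policy are illustrative:

        #include <linux/atomic.h>
        #include <linux/types.h>

        struct wss {
                atomic_t total_count;   /* cache lines in the current window */
                unsigned int threshold; /* above this, caches are thrashed */
        };

        static bool wss_prefer_cacheless(struct wss *w, size_t copy_len)
        {
                unsigned int lines = copy_len >> 6;     /* 64-byte lines */

                atomic_add(lines, &w->total_count);
                return (unsigned int)atomic_read(&w->total_count) >=
                       w->threshold;
        }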
     

01 Oct, 2018

2 commits

  • Current implementation allows each qp to have only one send engine. As
    such, each qp has only one list to queue prebuilt packets when send engine
    resources are not available. To improve performance, it is desired to
    support multiple send engines for each qp.

    This patch creates the framework to support two send engines
    (two legs) for each qp for the TID RDMA protocol, which can be easily
    extended to support more send engines. It achieves the goal by creating a
    leg specific struct, iowait_work in the iowait struct, to hold the
    work_struct and the tx_list as well as a pointer to the parent iowait
    struct.

    The hfi1_pkt_state now has an additional field to record the current leg's
    work structure, which is then passed to all egress waiters to determine
    the leg that needs to wait via a new iowait helper. The APIs are adjusted
    to use the new leg-specific struct as required.

    Many new and modified helpers are added to support this change.

    Reviewed-by: Mitko Haralanov
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Kaike Wan
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe

    Dennis Dalessandro
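
    A sketch of the leg-specific struct described above; the real hfi1
    iowait definitions carry more state than shown:

        #include <linux/list.h>
        #include <linux/workqueue.h>

        struct iowait;                  /* parent container, elsewhere */

        struct iowait_work {
                struct work_struct iowork;  /* this leg's send engine work */
                struct list_head tx_head;   /* prebuilt packets for this leg */
                struct iowait *iow;         /* back-pointer to the parent */
        };

        /* Two legs per QP: normal IB sends and TID RDMA. */
        enum { MY_IB_SE, MY_TID_SE, MY_NUM_SES };

        struct my_iowait {
                struct iowait_work wait[MY_NUM_SES];
        };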
     
  • The driver-provided function check_send_wqe allows the hardware driver to
    check and set up the incoming send wqe before it is inserted into the swqe
    ring. This patch renames it to setup_wqe to better reflect its
    usage. In addition, this function is now only called when all setup is
    complete in rdmavt.

    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Kaike Wan
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe

    Kaike Wan
     

27 Sep, 2018

2 commits

  • These return the same thing but dev_name is a more conventional use of the
    kernel API.

    Signed-off-by: Jason Gunthorpe
    Reviewed-by: Steve Wise
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Dennis Dalessandro

    Jason Gunthorpe
     
  • The current code has two copies of the device name, ibdev->dev and
    dev_name(&ibdev->dev), and they are set up at different times, which is
    very confusing.

    Set them both up at the same time and make dev_name() the lead name, which
    is the proper use of the driver core APIs. To make it very clear that the
    name is not valid until registration, pass it in to the
    ib_register_device() call rather than messing with ibdev->name directly.

    Also the reorganization now checks that dev_name is unique even if it does
    not contain a %.

    Signed-off-by: Jason Gunthorpe
    Acked-by: Adit Ranadive
    Reviewed-by: Steve Wise
    Acked-by: Devesh Sharma
    Reviewed-by: Shiraz Saleem
    Reviewed-by: Leon Romanovsky
    Reviewed-by: Dennis Dalessandro
    Reviewed-by: Michael J. Ruhl

    Jason Gunthorpe
     

10 May, 2018

2 commits

  • Currently the driver doesn't support completion vectors. These
    are used to indicate which sets of CQs should be grouped together
    into the same vector. A vector is a CQ processing thread that
    runs on a specific CPU.

    If an application has several CQs bound to different completion
    vectors, and each completion vector runs on different CPUs, then
    the completion queue workload is balanced. This helps scale as more
    nodes are used.

    Implement CQ completion vector support using a global workqueue
    where a CQ entry is queued to the CPU corresponding to the CQ's
    completion vector. Since the workqueue is global, it is guaranteed
    to always be there when queueing CQ entries; therefore, the RCU
    locking for cq->rdi->worker in the hot path is superfluous.

    Each completion vector is assigned to a different CPU. The number of
    completion vectors available is computed by taking the number of
    online, physical CPUs from the local NUMA node and subtracting the
    CPUs used for kernel receive queues and the general interrupt.
    Special use cases:

    * If there are no CPUs left for completion vectors, the same CPU
    as the general interrupt is used; therefore, there would only
    be one completion vector available.

    * For multi-HFI systems, the number of completion vectors available
    for each device is the total number of completion vectors in
    the local NUMA node divided by the number of devices in the same
    NUMA node. If there's a division remainder, the first device to
    get initialized gets an extra completion vector.

    Upon a CQ creation, an invalid completion vector could be specified.
    Handle it as follows:

    * If the completion vector is less than 0, set it to 0.

    * Otherwise, set the completion vector to the passed completion
    vector modulo the number of device completion vectors
    available.

    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Sebastian Sanchez
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Sebastian Sanchez
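
    The invalid-vector handling above, as code (helper name illustrative):

        /* Negative vectors become 0; everything else is reduced modulo
         * the completion vectors this device actually owns. */
        static int sanitize_comp_vector(int comp_vector, int dev_comp_vectors)
        {
                if (comp_vector < 0)
                        comp_vector = 0;

                return comp_vector % dev_comp_vectors;
        }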
     
  • All threads queuing CQ entries on different CQs are unnecessarily
    synchronized by a spin lock that checks whether the CQ kthread worker
    has been destroyed before queuing a CQ entry.

    The lock used in 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a
    destroyed cq kthread worker") is a device global lock and will have
    poor performance at scale as completions are entered from a large
    number of CPUs.

    Convert to RCU, where the read side, in rvt_cq_enter(), determines
    that the worker is alive prior to triggering the
    completion event, and write side RCU semantics are applied in
    rvt_driver_cq_init() and rvt_cq_exit().

    Fixes: 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker")
    Cc: # 4.14.x
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Sebastian Sanchez
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Sebastian Sanchez
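
    A hedged sketch of the read side; struct and field names are
    illustrative, not the rdmavt definitions:

        #include <linux/kthread.h>
        #include <linux/rcupdate.h>

        struct my_rdi {
                struct kthread_worker __rcu *worker;
        };

        struct my_cq {
                struct my_rdi *rdi;
                struct kthread_work comptask;
        };

        /* Sample the worker under RCU instead of a device-global lock. */
        static void my_cq_enter(struct my_cq *cq)
        {
                struct kthread_worker *worker;

                rcu_read_lock();
                worker = rcu_dereference(cq->rdi->worker);
                if (worker)
                        kthread_queue_work(worker, &cq->comptask);
                rcu_read_unlock();
        }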
     

20 Mar, 2018

1 commit

  • Extend the uverbs_ioctl header with driver_id and another reserved
    field. driver_id should be used in order to identify the driver.
    Since every driver could have its own parsing tree, this is necessary
    for strace support.
    Downstream patches remove the EXPERIMENTAL flag from the ioctl() IB
    support, and thus we add some reserved fields for future usage.

    Reviewed-by: Yishai Hadas
    Signed-off-by: Matan Barak
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Matan Barak
     

06 Jan, 2018

2 commits

  • rdmavt has a down call to client drivers to retrieve a crafted card
    name.

    This name should be the IB defined name.

    Rather than craft the name each time it is needed, simply retrieve
    the IB allocated name from the IB device.

    Update the function name to reflect its application.

    Clean up driver code to match this change.

    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Michael J. Ruhl
     
  • Currently the HFI and QIB drivers allow the IB core to assign a unit
    number to the driver name string.

    If multiple devices exist in a system, there is a possibility that the
    device unit number and the IB core number will be mismatched.

    Fix by using the driver defined unit number to generate the device
    name.

    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Michael J. Ruhl
     

01 Aug, 2017

1 commit

  • A trap should be sent to the FM until the FM sends a repress message.
    This is in line with IBTA 13.4.9.

    Add the ability to resend traps until a repress message is received.

    Reviewed-by: Dennis Dalessandro
    Reviewed-by: Michael N. Henry
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Michael J. Ruhl
     

18 Jul, 2017

1 commit

  • The callers into the driver now mark GFP_NOIO allocations with the help
    of the memalloc_noio_* calls. This makes it redundant to pass gfp flags
    down to the driver, which can now be GFP_KERNEL only.

    The patch removes the gfp flags argument and updates all driver paths.

    Signed-off-by: Leon Romanovsky
    Signed-off-by: Leon Romanovsky
    Acked-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Leon Romanovsky
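
    The caller-side pattern that replaces the gfp argument, sketched:

        #include <linux/sched/mm.h>
        #include <linux/slab.h>

        /* Mark the task NOIO around the call; the driver then allocates
         * with plain GFP_KERNEL, which behaves as GFP_NOIO here. */
        static void *alloc_under_noio(size_t size)
        {
                unsigned int noio_flag = memalloc_noio_save();
                void *p = kzalloc(size, GFP_KERNEL);

                memalloc_noio_restore(noio_flag);
                return p;
        }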
     

28 Jun, 2017

2 commits

  • Provide the ability for IB clients to modify the OPA specific
    capability mask and include this mask in the subsequent trap data.

    Reviewed-by: Niranjana Vishwanathapura
    Signed-off-by: Michael N. Henry
    Signed-off-by: Doug Ledford

    Vishwanathapura, Niranjana
     
  • SGEs that are contiguous needlessly consume driver dependent TX resources.

    The lkey validation logic is enhanced to compress the SGE that ends
    up in the send wqe when consecutive addresses are detected.

    The lkey validation API used to return 1 (success) or 0 (fail).

    The return value is now an -errno, 0 (compressed), or 1 (uncompressed). An
    additional argument is added to pass the last SGE for the compression.

    Loopback callers always pass a NULL to last_sge since the optimization is
    of little benefit in that situation.

    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Brian Welty
    Signed-off-by: Venkata Sandeep Dhanalakota
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
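
    A sketch of the contiguity check: when an incoming SGE starts exactly
    where the previous one ends, extend the previous entry instead of
    consuming a new slot. The struct shape is illustrative:

        #include <linux/types.h>

        struct my_sge {
                u64 vaddr;
                u32 length;
        };

        static bool try_compress_sge(struct my_sge *last,
                                     const struct my_sge *sge)
        {
                if (last && last->vaddr + last->length == sge->vaddr) {
                        last->length += sge->length;    /* "compressed" */
                        return true;
                }
                return false;   /* caller stores sge in a fresh slot */
        }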
     

29 Apr, 2017

1 commit

  • The InfiniBand spec states that "a multicast address is defined by a
    MGID and a MLID" (section 10.5).

    The current code uses only the MGID for identifying multicast groups.
    Update the driver to be compliant with this definition.

    Reviewed-by: Ira Weiny
    Reviewed-by: Dasaratharaman Chandramouli
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Michael J. Ruhl
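
    A sketch of the compliant lookup: a group matches only when both the
    MGID and the MLID match. The struct shape is illustrative:

        #include <linux/string.h>
        #include <rdma/ib_verbs.h>

        struct my_mcast {
                union ib_gid mgid;
                u16 mlid;
        };

        static bool mcast_match(const struct my_mcast *m,
                                const union ib_gid *mgid, u16 mlid)
        {
                return !memcmp(&m->mgid, mgid, sizeof(*mgid)) &&
                       m->mlid == mlid;
        }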
     

19 Feb, 2017

2 commits

  • Convert copy_sge and related SGE state functions to use boolean.
    To determine whether a QP is in user mode, add a helper function in
    rdmavt_qp.h; this is used to determine if the QP needs the last byte
    ordering. While here, change rvt_pd.user to a boolean.

    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Dean Luick
    Signed-off-by: Brian Welty
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Brian Welty
     
  • Move common code across targets into rdmavt for code reuse.

    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Brian Welty
    Signed-off-by: Venkata Sandeep Dhanalakota
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Venkata Sandeep Dhanalakota
     

03 Aug, 2016

3 commits

  • This fix allows for support of in-kernel reserved operations
    without impacting the ULP user.

    The low level driver can register a non-zero value which
    will be transparently added to the send queue size and hidden
    from the ULP in every respect.

    ULP post sends will never see a full queue due to a reserved
    post send and reserved operations will never exceed that
    registered value.

    The s_avail will continue to track the ULP swqe availability, and the
    difference between the reserved value and the reserved in use will
    track reserved availability.

    Reviewed-by: Ashutosh Dixit
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
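
    A sketch of the hidden-reservation accounting; field names are
    assumptions:

        #include <linux/types.h>

        /* The ring is allocated with extra slots; the ULP only ever sees
         * the advertised size, and reserved use is capped. */
        struct sq_sketch {
                u32 size;               /* advertised ULP send queue size */
                u32 reserved;           /* driver-registered reserved slots */
                u32 reserved_used;      /* reserved operations in flight */
        };

        static bool reserved_op_get(struct sq_sketch *sq)
        {
                if (sq->reserved_used >= sq->reserved)
                        return false;   /* never exceed registered value */
                sq->reserved_used++;
                return true;
        }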
     
  • In order to support extended memory management, add the mechanism to
    invalidate MR keys. This includes a flag "lkey_invalid" in the MR data
    structure that is to be checked when validating access to the MR via
    the associated key, and two utility functions to perform fast memory
    registration and memory key invalidate operations.

    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Jianxin Xiong
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Jianxin Xiong
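
    A sketch of the key check; only the lkey_invalid flag comes from the
    description above, the rest is illustrative:

        #include <linux/compiler.h>
        #include <linux/types.h>

        struct my_mr {
                u32 lkey;
                bool lkey_invalid;      /* set by the invalidate operation */
        };

        /* An invalidated key must fail validation even though the MR
         * object itself still exists. */
        static bool my_mr_key_ok(const struct my_mr *mr, u32 key)
        {
                return mr->lkey == key && !READ_ONCE(mr->lkey_invalid);
        }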
     
  • Add flexibility for driver dependent operations in post send
    because different drivers will have differing post send
    operation support.

    This includes data structure definitions to support a table
    driven scheme along with the necessary validation routine
    using the new table.

    Reviewed-by: Ashutosh Dixit
    Reviewed-by: Jianxin Xiong
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
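
    A hedged sketch of a table-driven check: one entry per WR opcode,
    with zeroed entries meaning "unsupported". The real rdmavt table
    carries more than shown:

        #include <rdma/ib_verbs.h>

        struct op_info {
                size_t length;  /* expected WR struct size, 0 = unsupported */
        };

        static const struct op_info op_table[IB_WR_RESERVED10 + 1] = {
                [IB_WR_SEND]       = { .length = sizeof(struct ib_send_wr) },
                [IB_WR_RDMA_WRITE] = { .length = sizeof(struct ib_rdma_wr) },
        };

        static int validate_post_send(const struct ib_send_wr *wr)
        {
                if (wr->opcode > IB_WR_RESERVED10 ||
                    !op_table[wr->opcode].length)
                        return -EINVAL;
                return 0;
        }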
     

23 Jun, 2016

1 commit

  • The current drivers return errors from this calldown
    wrapped in an ERR_PTR().

    The rdmavt code incorrectly tests for NULL.

    The code is fixed to use IS_ERR() and to set ret according
    to the driver return value.

    Cc: Stable # 4.6+
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
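
    The corrected pattern, sketched:

        #include <linux/err.h>

        /* The calldown returns ERR_PTR()-wrapped errors, so test with
         * IS_ERR() and propagate PTR_ERR(); a NULL check never fires. */
        static int check_calldown(void *priv)
        {
                if (IS_ERR(priv))
                        return PTR_ERR(priv);
                return 0;
        }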
     

27 May, 2016

2 commits

  • rdmavt allows the driver to specify the size of the ack queue, but
    only uses it in the modify QP limit testing when setting the atomic
    limit value.

    The driver-dependent size is now used to size the s_ack_queue ring
    dynamically.

    Since the driver knows its size, the driver will use its define
    for any ring size dependent code.

    Reviewed-by: Mitko Haralanov
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     
  • This matches the ib_qp_attr size and
    avoids an extremely large value when the lower level
    driver registers.

    As part of the patch, the u8 ordinals are moved to the
    end of the struct to reduce pahole noted excesses.

    Reviewed-by: Mitko Haralanov
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     

29 Apr, 2016

1 commit

  • rdi->ports has memory allocated in rvt_alloc_device(), but it does not get
    freed because the hfi1 and qib drivers call ib_dealloc_device()
    directly instead of going through rdmavt. Add a rvt_dealloc_device()
    that frees rdi->ports and then calls ib_dealloc_device(). Switch the hfi1
    and qib drivers to calling rvt_dealloc_device() instead of
    ib_dealloc_device() directly.

    Reviewed-by: Dennis Dalessandro
    Reviewed-by: Brian Welty
    Signed-off-by: Jubin John
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Jubin John
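
    A sketch of the pairing fix; rdi->ports comes from the description
    above, while the other field names are assumptions:

        #include <linux/slab.h>
        #include <rdma/rdma_vt.h>

        /* Free what rvt_alloc_device() allocated before handing the
         * device back to the IB core. */
        static void my_rvt_dealloc_device(struct rvt_dev_info *rdi)
        {
                kfree(rdi->ports);      /* allocated in rvt_alloc_device() */
                ib_dealloc_device(&rdi->ibdev);
        }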