14 May, 2016

9 commits

  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Doug Ledford

    Christoph Hellwig
     
  • Replace the homegrown RDMA READ/WRITE code in srpt with the generic API.
    The only real twist here is that we need to allocate one Linux scatterlist
    per direct buffer in the SRP command, and chain them before handing them
    off to the target core.

    As a side-effect of the conversion the driver will also chain the SEND
    of the SRP response to the RDMA WRITE WRs for a DATA OUT command, and
    properly account for RDMA WRITE WRs instead of just for RDMA READ WRs
    like the driver previously did.

    We now allocate half of the SQ size to RDMA READ/WRITE contexts, assuming
    by default one RDMA READ or WRITE operation per command. If a command
    has multiple operations it will eat into the budget but will still succeed,
    possibly after waiting for WQEs to become available.

    Also ensure the QPs request the maximum allowed SGEs so that the RDMA R/W
    API works correctly.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Doug Ledford

    Christoph Hellwig
     
  • The SRP target driver will need to allocate and chain its own SGLs soon.
    For this export target_alloc_sgl, and add a new argument to it so that it
    can allocate an additional chain entry that doesn't point to a page. Also
    export transport_free_sgl after renaming it to target_free_sgl to free
    these SGLs again.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Doug Ledford

    Christoph Hellwig
     
  • This supports both manual mapping of lots of SGEs, as well as using MRs
    from the QP's MR pool, for iWARP or other cases where it's more optimal.
    For now, MRs are only used for iWARP transports. The user of the RDMA-RW
    API must allocate the QP MR pool as well as size the SQ accordingly.

    Thanks to Steve Wise for testing, fixing and rewriting the iWARP support,
    and to Sagi Grimberg for ideas, reviews and fixes.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Doug Ledford

    Christoph Hellwig
     
  • This is the first step toward moving MR invalidation decisions
    to the core. It will be needed by the upcoming RW API.

    Signed-off-by: Steve Wise
    Reviewed-by: Bart Van Assche
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Doug Ledford

    Steve Wise
     
  • Signed-off-by: Christoph Hellwig
    Tested-by: Steve Wise
    Reviewed-by: Bart Van Assche
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Steve Wise
    Signed-off-by: Doug Ledford

    Christoph Hellwig
     
  • Split the XRC magic into a separate function, and return early on failure
    to make the initialization code readable.

    Signed-off-by: Christoph Hellwig
    Tested-by: Steve Wise
    Reviewed-by: Bart Van Assche
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Steve Wise
    Signed-off-by: Doug Ledford

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Tested-by: Steve Wise
    Reviewed-by: Steve Wise
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Doug Ledford

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Tested-by: Steve Wise
    Reviewed-by: Bart Van Assche
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Steve Wise
    Signed-off-by: Doug Ledford

    Christoph Hellwig
     

13 May, 2016

12 commits

  • The new RW API will need this.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Reviewed-by: Sagi Grimberg
    Tested-by: Steve Wise
    Signed-off-by: Doug Ledford

    Christoph Hellwig
     
  • Doug Ledford
     
  • Add the needed mlx5_ifc hardware bits and structs
    for the following features:

    * Add vport to steering commands for SRIOV ACL support
    * Add mlcr, pcmr and mcia registers for dump module EEPROM
    * Add support for FCS, beacon LED and disable_link bits to
      hca caps
    * Add CQE period mode bit in CQ context for CQE based CQ
      moderation support
    * Add umr SQ bit for fragmented memory registration
    * Add needed bits and caps for Striding RQ support

    In order to avoid possible future conflicts between rdma and
    net-next, we added all expected updates to this file for this release.
    If more changes are submitted, we plan to send them only through
    one of the subsystems, probably net-next.

    All updated bits in this patch will be used later in
    upcoming submissions to the net-next and rdma trees.

    Signed-off-by: Saeed Mahameed
    Signed-off-by: Matan Barak
    Acked-by: Or Gerlitz
    Signed-off-by: Doug Ledford

    Saeed Mahameed
     
  • All reserved fields after early_vf_enable are off by 1, since
    early_vf_enable was not explicitly declared as an array of size 1.

    The reserved field before cqe_zip had a wrong size; it should
    be 0x80 + 0x3f.

    Fixes: b0844444590e ("net/mlx5_core: Introduce access function to read internal timer ")
    Fixes: b4ff3a36d3e4 ("net/mlx5: Use offset based reserved field names in the IFC header file")
    Signed-off-by: Tariq Toukan
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Matan Barak
    Acked-by: Or Gerlitz
    Signed-off-by: Doug Ledford

    Tariq Toukan
     
  • Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Sagi Grimberg
    Cc: Laurence Oberman
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Doug Ledford

    Bart Van Assche
     
  • Since all srp_map_finish_fr() callers pass a non-zero value as
    the fourth argument (sg_nents), the sg_nents == 0 check in that
    function can be removed. Add a count == 0 check in the caller
    of that function.

    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Sagi Grimberg
    Cc: Laurence Oberman
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Doug Ledford

    Bart Van Assche
     
  • The srp_queuecommand() function translates ENOMEM into QUEUE_FULL
    which causes the SCSI mid-layer to retry the command. All other
    error codes are translated into DID_ERROR which causes the SCSI
    command to fail. Return E2BIG if mapping will always fail, to
    prevent the SCSI mid-layer from resubmitting a command
    forever.

    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Sagi Grimberg
    Cc: Laurence Oberman
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Doug Ledford

    Bart Van Assche
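
    The error translation described above can be sketched in userspace C.
    This is an illustration only, not the driver code: the sketch_* names,
    the enum values and the hard_limit parameter are hypothetical stand-ins.
    It shows why a permanent mapping failure should not be reported as
    ENOMEM: ENOMEM is turned into a retry, anything else fails the command.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the SCSI results named in the commit message. */
enum sketch_result {
    SKETCH_OK,          /* command was queued */
    SKETCH_QUEUE_FULL,  /* mid-layer retries the command later */
    SKETCH_DID_ERROR    /* command fails and is not retried */
};

/* Translate a negative errno from the mapping step into a result. */
static enum sketch_result sketch_translate(int map_err)
{
    if (map_err >= 0)
        return SKETCH_OK;
    if (map_err == -ENOMEM)
        return SKETCH_QUEUE_FULL;   /* transient: a retry may succeed */
    return SKETCH_DID_ERROR;        /* includes -E2BIG: permanent failure */
}

/*
 * Hypothetical mapping step. A request that needs more descriptors than
 * the hard limit can never succeed, so it reports -E2BIG, not -ENOMEM.
 */
static int sketch_map(int needed, int available, int hard_limit)
{
    assert(available <= hard_limit);
    if (needed > hard_limit)
        return -E2BIG;      /* will always fail: do not ask for a retry */
    if (needed > available)
        return -ENOMEM;     /* may succeed once resources are returned */
    return needed;
}
```

    With these stand-ins, a request that exceeds the hard limit ends up as
    a failed command instead of being resubmitted indefinitely.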
     
  • This patch does not change any functionality.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Sagi Grimberg
    Cc: Christoph Hellwig
    Cc: Laurence Oberman
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Doug Ledford

    Bart Van Assche
     
  • Ensure that req->nmdesc is set correctly in srp_map_sg() if mapping
    fails. Prevent a mapping failure from causing a memory descriptor leak.
    Report srp_map_sg() failure to the caller.

    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Sagi Grimberg
    Cc: Laurence Oberman
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Doug Ledford

    Bart Van Assche
     
  • Signed-off-by: Bart Van Assche
    Reviewed-by: Sagi Grimberg
    Cc: Christoph Hellwig
    Cc: Laurence Oberman
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Doug Ledford

    Bart Van Assche
     
  • The free request list was removed through patch "IB/srp: Use block layer tags".
    Hence update a comment that refers to that free request list.

    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Sagi Grimberg
    Cc: Laurence Oberman
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Doug Ledford

    Bart Van Assche
     
  • Change one occurrence of "boundries" into "boundaries".

    Signed-off-by: Bart Van Assche
    Reviewed-by: Sagi Grimberg
    Cc: Christoph Hellwig
    Cc: Laurence Oberman
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Doug Ledford

    Bart Van Assche
     

06 May, 2016

11 commits

  • Doug Ledford
     
  • Use c4iw_ep_disconnect() instead. This is part of getting rid of
    abort_connection() altogether so we properly clean up on send_abort()
    failures.

    This is the last user of abort_connection(), so remove it too.

    Signed-off-by: Steve Wise
    Signed-off-by: Hariprasad Shenai
    Signed-off-by: Doug Ledford

    Hariprasad S
     
  • In c4iw_ep_disconnect(), if we fail to initiate a close operation, then
    move the qp to ERROR to disassociate the ep from the qp. Failure to do
    this will leak the ep resources.

    Signed-off-by: Steve Wise
    Signed-off-by: Hariprasad Shenai
    Signed-off-by: Doug Ledford

    Hariprasad S
     
  • Instead return whether the caller needs to disconnect. This is part of
    getting rid of abort_connection() altogether so we properly clean up on
    send_abort() failures.

    Signed-off-by: Steve Wise
    Signed-off-by: Hariprasad Shenai
    Signed-off-by: Doug Ledford

    Hariprasad S
     
  • Use c4iw_ep_disconnect() instead. This is part of getting rid of
    abort_connection() altogether so we properly clean up on send_abort()
    failures.

    Signed-off-by: Steve Wise
    Signed-off-by: Hariprasad Shenai
    Signed-off-by: Doug Ledford

    Hariprasad S
     
  • Signed-off-by: Steve Wise
    Signed-off-by: Hariprasad Shenai
    Signed-off-by: Doug Ledford

    Hariprasad S
     
  • Instead, have the caller, rx_data() handle the close/abort like
    it does for process_mpa_request(). This is part of getting rid of
    abort_connection() altogether so we properly clean up on send_abort()
    failures.

    Signed-off-by: Steve Wise
    Signed-off-by: Hariprasad Shenai
    Signed-off-by: Doug Ledford

    Hariprasad S
     
  • In rx_data(), with the ep in FPDU_MODE, refcnt=2, if we get unexpected
    streaming data, we call c4iw_modify_rc_qp() and move the qp from
    RTS -> TERMINATE. In c4iw_modify_rc_qp(), if rdma_fini() returns
    an error, the ep will be dereferenced (refcnt=1). Then rx_data()
    calls c4iw_ep_disconnect() which starts the close operation.
    But if send_halfclose() fails in c4iw_ep_disconnect(), we will call
    release_ep_resources() derefing the ep which reduces the refcnt to 0
    and frees the ep. However, we still hold the ep mutex at that point, so
    we have a touch-after-free bug. There is a similar issue where
    peer_close() calls c4iw_ep_disconnect().

    The solution is to add a reference to the ep in c4iw_ep_disconnect()
    after acquiring the mutex, and release it after releasing the mutex.

    Signed-off-by: Steve Wise
    Signed-off-by: Hariprasad Shenai
    Signed-off-by: Doug Ledford

    Hariprasad S
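
    The fix described above (take a reference after acquiring the mutex,
    drop it after releasing the mutex) can be sketched as follows. This is
    a simplified single-threaded model, not the cxgb4 code: struct
    sketch_ep and the sketch_* helpers are hypothetical, and a boolean flag
    stands in for the ep mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical endpoint: a reference count and a flag modelling the mutex. */
struct sketch_ep {
    int refcnt;
    bool locked;
    bool freed;
};

static void sketch_get(struct sketch_ep *ep)
{
    ep->refcnt++;
}

static void sketch_put(struct sketch_ep *ep)
{
    assert(ep->refcnt > 0);
    if (--ep->refcnt == 0) {
        /* Freeing while the lock is held is the bug being avoided. */
        assert(!ep->locked);
        ep->freed = true;
    }
}

/*
 * Disconnect path. The caller's reference may be the last one, and the
 * failure branch drops it. The extra reference keeps the endpoint alive
 * until after the unlock.
 */
static void sketch_disconnect(struct sketch_ep *ep, bool halfclose_fails)
{
    ep->locked = true;      /* acquire the mutex */
    sketch_get(ep);         /* pin the endpoint while the mutex is held */

    if (halfclose_fails)
        sketch_put(ep);     /* stands in for release_ep_resources() */

    ep->locked = false;     /* release the mutex */
    sketch_put(ep);         /* drop the pin; this may free the endpoint */
}
```

    Without the sketch_get() call, the put in the failure branch would
    reach zero while the lock flag is still set, which is the situation
    the commit message describes.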
     
  • In c4iw_ep_disconnect(), if we start the ep timer to begin a close,
    but send_halfclose() fails, we need to stop the timer and send a CLOSE
    event up to the IWCM before releasing the resources. Otherwise, we can
    crash when the ep timer fires if the ep is referencing a previous instance
    of the device. This can happen as part of adapter reset/recovery, for
    instance.

    Signed-off-by: Steve Wise
    Signed-off-by: Hariprasad Shenai
    Signed-off-by: Doug Ledford

    Hariprasad S
     
  • If ARP fails before the CPL_PASS_ACCEPT_RPL is seen by hardware, the tid
    will be stuck in SYN_PEND and never released. So create an arp failure
    handler specifically for this message to release the endpoint resources.

    In pass_accept_rpl_arp_failure(), put the parent endpoint so it will
    be freed when destroyed. Also we don't need to call release_tid() here
    because _c4iw_free_ep() calls cxgb4_remove_tid() which releases the
    hwtid.

    If we get an ABORT_REQ_RSS instead of a PASS_ESTABLISH (because the
    peer's ACK to our SYN is never received), then put the parent as well
    in peer_abort().

    Treat accept_cr() failures just like ARP failures: put the parent ep
    and release the ep resources, destroying the tid.

    The ARP failure handlers are called in an atomic context, so we need to
    schedule some of the processing which might block. Namely _c4iw_free_ep()
    which needs a mutex. So create a "special" CPL opcode and handler and
    schedule it via sched() to be run by process_work() in a blockable context.

    Also rework the active open arp failure handler to make use of
    release_ep_resources(). This allows both the active and passive arp
    failure handlers to use the same deferred cleanup function.

    Signed-off-by: Steve Wise
    Signed-off-by: Hariprasad Shenai
    Signed-off-by: Doug Ledford

    Hariprasad S
     
  • iSER currently has a couple of places that set max_sectors in either the
    host template or SCSI host, and all of them get it wrong.

    This patch instead uses a single assignment that (hopefully) gets it right:
    the max_sectors value must be derived from the number of segments in the
    FR or FMR structure, but actually be one lower than the page size multiplied
    by the number of sectors, as it has to handle the case of non-aligned I/O.

    Without this I get trivial-to-reproduce hangs when running xfstests
    (on XFS) over iSER to Linux targets.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Max Gurtovoy
    Acked-by: Sagi Grimberg
    Signed-off-by: Doug Ledford

    Christoph Hellwig
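
    One plausible reading of the sizing rule above is that one page entry
    is held back for unaligned I/O. The sketch below works through that
    arithmetic under that assumption; it is not the iSER code, and the
    sketch_* names and the 512-byte sector size are assumptions made for
    illustration.

```c
#include <assert.h>

#define SKETCH_SECTOR_SHIFT 9   /* assumes 512-byte sectors */

/*
 * Largest transfer, in sectors, that a registration with `segments` page
 * entries can always cover. A buffer of N pages' worth of bytes that does
 * not start on a page boundary can touch N + 1 pages, so one entry is
 * held back.
 */
static unsigned long sketch_max_sectors(unsigned long segments,
                                        unsigned long page_size)
{
    assert(segments > 0);
    return ((segments - 1) * page_size) >> SKETCH_SECTOR_SHIFT;
}

/* Number of pages touched by `len` bytes starting at byte `offset`. */
static unsigned long sketch_pages_spanned(unsigned long offset,
                                          unsigned long len,
                                          unsigned long page_size)
{
    unsigned long first, last;

    assert(len > 0);
    first = offset / page_size;
    last = (offset + len - 1) / page_size;
    return last - first + 1;
}
```

    For example, with 128 entries and 4096-byte pages the limit is 1016
    sectors: a transfer of that size spans 127 pages when aligned and 128
    pages when it starts 512 bytes into a page, so it still fits.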
     

29 Apr, 2016

8 commits