24 Aug, 2016

1 commit


05 Aug, 2016

2 commits

  • Pull second round of rdma updates from Doug Ledford:
    "This can be split out into just two categories:

    - fixes to the RDMA R/W API with regard to SG list length limits
    (about 5 patches)

    - fixes/features for the Intel hfi1 driver (everything else)

    The hfi1 driver is still being brought to full feature support by
    Intel, and they have a lot of people working on it, so that amounts to
    almost the entirety of this pull request"

    * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits)
    IB/hfi1: Add cache evict LRU list
    IB/hfi1: Fix memory leak during unexpected shutdown
    IB/hfi1: Remove unneeded mm argument in remove function
    IB/hfi1: Consistently call ops->remove outside spinlock
    IB/hfi1: Use evict mmu rb operation
    IB/hfi1: Add evict operation to the mmu rb handler
    IB/hfi1: Fix TID caching actions
    IB/hfi1: Make the cache handler own its rb tree root
    IB/hfi1: Make use of mm consistent
    IB/hfi1: Fix user SDMA racy user request claim
    IB/hfi1: Fix error condition that needs to clean up
    IB/hfi1: Release node on insert failure
    IB/hfi1: Validate SDMA user iovector count
    IB/hfi1: Validate SDMA user request index
    IB/hfi1: Use the same capability state for all shared contexts
    IB/hfi1: Prevent null pointer dereference
    IB/hfi1: Rename TID mmu_rb_* functions
    IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister()
    IB/hfi1: Restructure hfi1_file_open
    IB/hfi1: Make iovec loop index easy to understand
    ...

    Linus Torvalds
     
  • Pull base rdma updates from Doug Ledford:
    "Round one of 4.8 code: while this is mostly normal, there is a new
    driver in here (the driver was hosted outside the kernel for several
    years and is actually a fairly mature and well-coded driver). It
    amounts to 13,000 of the 16,000 lines of added code in here.

    Summary:

    - Updates/fixes for iw_cxgb4 driver
    - Updates/fixes for mlx5 driver
    - Add flow steering and RSS API
    - Add hardware stats to mlx4 and mlx5 drivers
    - Add firmware version API for RDMA driver use
    - Add the rxe driver (this is a software RoCE driver that makes any
    Ethernet device a RoCE device)
    - Fixes for i40iw driver
    - Support for send only multicast joins in the cma layer
    - Other minor fixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits)
    Soft RoCE driver
    IB/core: Support for CMA multicast join flags
    IB/sa: Add cached attribute containing SM information to SA port
    IB/uverbs: Fix race between uverbs_close and remove_one
    IB/mthca: Clean up error unwind flow in mthca_reset()
    IB/mthca: NULL arg to pci_dev_put is OK
    IB/hfi1: NULL arg to sc_return_credits is OK
    IB/mlx4: Add diagnostic hardware counters
    net/mlx4: Query performance and diagnostics counters
    net/mlx4: Add diagnostic counters capability bit
    Use smaller 512 byte messages for portmapper messages
    IB/ipoib: Report SG feature regardless of HW UD CSUM capability
    IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct
    IB/hfi1: Disable by default
    IB/rdmavt: Disable by default
    IB/mlx5: Fix port counter ID association to QP offset
    IB/mlx5: Fix iteration overrun in GSI qps
    i40iw: Add NULL check for puda buffer
    i40iw: Change dup_ack_thresh to u8
    i40iw: Remove unnecessary check for moving CQ head
    ...

    Linus Torvalds
     

04 Aug, 2016

1 commit

  • The dma-mapping core and its implementations do not change the DMA
    attributes passed by pointer, so the pointer could point to const data.
    However, the attributes do not have to be a bitfield at all; a plain
    unsigned long will do fine:

    1. It is simpler, both for reading the code and for setting attributes.
    Instead of initializing local attributes on the stack and passing a
    pointer to them to dma_set_attr(), just set the bits.

    2. It gains the safety of const correctness, because the attributes
    are passed by value.

    Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
    )

    and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
    )

    Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
    Signed-off-by: Krzysztof Kozlowski
    Acked-by: Vineet Gupta
    Acked-by: Robin Murphy
    Acked-by: Hans-Christian Noren Egtvedt
    Acked-by: Mark Salter [c6x]
    Acked-by: Jesper Nilsson [cris]
    Acked-by: Daniel Vetter [drm]
    Reviewed-by: Bart Van Assche
    Acked-by: Joerg Roedel [iommu]
    Acked-by: Fabien Dessenne [bdisp]
    Reviewed-by: Marek Szyprowski [vb2-core]
    Acked-by: David Vrabel [xen]
    Acked-by: Konrad Rzeszutek Wilk [xen swiotlb]
    Acked-by: Joerg Roedel [iommu]
    Acked-by: Richard Kuo [hexagon]
    Acked-by: Geert Uytterhoeven [m68k]
    Acked-by: Gerald Schaefer [s390]
    Acked-by: Bjorn Andersson
    Acked-by: Hans-Christian Noren Egtvedt [avr32]
    Acked-by: Vineet Gupta [arc]
    Acked-by: Robin Murphy [arm64 and dma-iommu]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Krzysztof Kozlowski
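    The shift from a pointer-to-struct bitfield to a plain unsigned long can
    be sketched in userspace C. The flag values below are illustrative only;
    the kernel defines its own DMA_ATTR_* bits:

```c
#include <assert.h>

/* Illustrative attribute bits; the kernel defines its own DMA_ATTR_* values. */
#define ATTR_WEAK_ORDERING  (1UL << 0)
#define ATTR_WRITE_COMBINE  (1UL << 1)

/* With attributes passed by value there is nothing for the callee to
 * mutate: callers simply OR the bits together at the call site, and the
 * callee just tests them. */
static int map_page(unsigned long attrs)
{
    return (attrs & ATTR_WRITE_COMBINE) != 0;
}
```

    A caller would write `map_page(ATTR_WEAK_ORDERING | ATTR_WRITE_COMBINE)`
    instead of filling a struct on the stack and calling dma_set_attr().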
     

03 Aug, 2016

1 commit

  • Compute the SGE limit for RDMA READ and WRITE requests in
    ib_create_qp(). Use that limit in the RDMA RW API implementation.

    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Sagi Grimberg
    Cc: Steve Wise
    Cc: Parav Pandit
    Cc: Nicholas Bellinger
    Cc: Laurence Oberman
    Cc: #v4.7+
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Doug Ledford

    Bart Van Assche
     

24 Jun, 2016

2 commits


23 Jun, 2016

6 commits

  • Add IPv6 flow specification support.

    Signed-off-by: Maor Gottlieb
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Maor Gottlieb
     
  • Extend create QP to get Receive Work Queue (WQ) indirection table.

    QP can be created with external Receive Work Queue indirection table,
    in that case it is ready to receive immediately.

    Signed-off-by: Yishai Hadas
    Signed-off-by: Matan Barak
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Doug Ledford

    Yishai Hadas
     
  • User applications that want to spread traffic on several WQs need to
    create an indirection table by using already created WQs.

    Adding uverbs API in order to create and destroy this table.

    Signed-off-by: Yishai Hadas
    Signed-off-by: Matan Barak
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Doug Ledford

    Yishai Hadas
     
  • Introduce Receive Work Queue (WQ) indirection table.
    This object can be used to spread incoming traffic to different
    receive Work Queues.

    A Receive WQ indirection table points to a variable number of WQs.
    This table is given to a QP in downstream patches.

    Signed-off-by: Yishai Hadas
    Signed-off-by: Matan Barak
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Doug Ledford

    Yishai Hadas
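    The spreading idea can be sketched in userspace C: an indirection table
    maps a flow hash to one of the receive WQs. The function name and the
    power-of-two masking are hypothetical illustration, not the verbs API:

```c
#include <assert.h>

/* Hypothetical sketch: look up a receive WQ number by masking a flow
 * hash into an indirection table whose size is a power of two. */
static unsigned int select_wq(const unsigned int *ind_tbl,
                              unsigned int tbl_size,   /* power of two */
                              unsigned int flow_hash)
{
    return ind_tbl[flow_hash & (tbl_size - 1)];
}
```

    Flows with different hashes land on different WQs, which is how RSS
    distributes incoming traffic across receive queues.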
     
  • User space applications which use RSS functionality need to create
    a work queue object (WQ). The lifetime of such an object is:
    * Create a WQ
    * Modify the WQ from reset to init state.
    * Use the WQ (by downstream patches).
    * Destroy the WQ.

    These commands are added to the uverbs API.

    Signed-off-by: Yishai Hadas
    Signed-off-by: Matan Barak
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Doug Ledford

    Yishai Hadas
     
  • Introduce Work Queue object and its create/destroy/modify verbs.

    QP can be created without internal WQs "packaged" inside it,
    this QP can be configured to use "external" WQ object as its
    receive/send queue.
    WQ is a necessary component for RSS technology since RSS mechanism
    is supposed to distribute the traffic between multiple
    Receive Work Queues.

    A WQ is associated (many to one) with a Completion Queue, and it owns
    its WQ properties (PD, WQ size, etc.).
    A WQ has a type; this patch introduces IB_WQT_RQ (i.e. receive queue),
    and it may later be extended with others such as IB_WQT_SQ (send queue).
    A WQ of type IB_WQT_RQ contains receive work requests.

    The PD is an attribute of a work queue (i.e. send/receive queue); it is
    used by the hardware for security validation before scattering into a
    memory region pointed to by the WQ. For that, an external WQ object
    needs a PD, letting the hardware perform that validation.

    When accessing a memory region that is pointed to by the WQ, the WQ's
    PD is used rather than the QP's PD; this behavior is similar to that
    of an SRQ and a QP.

    The WQ context is subject to well-defined state transitions performed
    by the modify_wq verb.
    When a WQ is created, its initial state is IB_WQS_RESET.
    From IB_WQS_RESET it can be modified to itself or to IB_WQS_RDY.
    From IB_WQS_RDY it can be modified to itself, to IB_WQS_RESET,
    or to IB_WQS_ERR.
    From IB_WQS_ERR it can be modified to IB_WQS_RESET.

    Note: transition to IB_WQS_ERR might occur implicitly in case there
    was some HW error.

    Signed-off-by: Yishai Hadas
    Signed-off-by: Matan Barak
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Doug Ledford

    Yishai Hadas
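    The state machine described above can be sketched as a small validity
    check (names shortened from the IB_WQS_* enums; a minimal sketch, not
    the kernel implementation):

```c
#include <assert.h>
#include <stdbool.h>

enum wq_state { WQS_RESET, WQS_RDY, WQS_ERR };

/* Returns true when modify_wq may move the WQ from cur to next,
 * per the transitions listed in the commit message. */
static bool wq_transition_valid(enum wq_state cur, enum wq_state next)
{
    switch (cur) {
    case WQS_RESET: return next == WQS_RESET || next == WQS_RDY;
    case WQS_RDY:   return next == WQS_RDY || next == WQS_RESET ||
                           next == WQS_ERR;
    case WQS_ERR:   return next == WQS_RESET;
    }
    return false;
}
```

    Note that the implicit transition to the error state on a HW error is
    the one move not initiated through modify_wq.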
     

07 Jun, 2016

2 commits

  • Replace the few u64 casts with ULL to match the rest of the casts.

    Signed-off-by: Max Gurtovoy
    Signed-off-by: Doug Ledford

    Max Gurtovoy
     
  • The 64-bit expansion of ib_device_cap_flags caused capability bits to
    overlap, making consumers read the wrong device capabilities. For
    example, IB_DEVICE_SG_GAPS_REG was falsely read by the iser driver,
    causing it to use a non-existent capability. This happened because a
    signed int becomes sign-extended when converted to u64. Fix this by
    casting the IB_DEVICE_ON_DEMAND_PAGING enumeration value to ULL.

    Fixes: f5aa9159a418 ('IB/core: Add arbitrary sg_list support')
    Reported-by: Robert LeBlanc
    Cc: Stable #[v4.6+]
    Acked-by: Sagi Grimberg
    Signed-off-by: Max Gurtovoy
    Signed-off-by: Matan Barak
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Doug Ledford

    Max Gurtovoy
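    The bug class is easy to reproduce in plain C. A 32-bit constant with
    bit 31 set is negative as an int, so widening it to u64 sign-extends
    and pollutes the upper 32 bits; INT_MIN stands in here for such an
    enum value:

```c
#include <assert.h>
#include <stdint.h>
#include <limits.h>

/* Buggy path: an int with bit 31 set sign-extends when widened. */
static uint64_t widen_buggy(int cap32)
{
    return (uint64_t)cap32;      /* upper 32 bits become all ones */
}

/* Fixed path: build the constant as unsigned 64-bit from the start. */
static uint64_t widen_fixed(unsigned long long cap)
{
    return cap;                  /* 1ULL << 31 stays clean */
}
```

    The polluted upper bits are exactly what made unrelated capability
    flags appear set to consumers like iser.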
     

27 May, 2016

1 commit

  • In practice, each RDMA device has a unique set of counters that the
    hardware implements. Having a central set of counters that they must
    all adhere to is limiting and causes many useful counters not to be
    available.

    Therefore we create a dynamic counter registration infrastructure.

    The driver must implement a stats structure allocation routine, in
    which the driver must place the directory name it wants, a list of
    names for all of the counters, an array of u64 counters themselves,
    plus a few generic configuration options.

    We then implement a core routine to create a sysfs file for each
    of the named stats elements, and a core routine to retrieve the
    stats when any of the sysfs attribute files are read.

    To avoid excessive beating on the stats generation routine in the
    drivers, the core code also caches the stats for a short period of
    time, so that reading all of the stats in a given device's directory
    does not result in a stats generation call per file read.

    Future work will attempt to standardize just the shared stats
    elements, and possibly add a method to get the stats via netlink
    in addition to sysfs.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Mark Bloch
    Reviewed-by: Steve Wise
    Signed-off-by: Doug Ledford
    [ Add caching, make structure names more informative, add i40iw support,
    other significant rewrites from the original patch ]

    Christoph Lameter
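    The caching behavior can be sketched with a deterministic tick counter.
    All names and the lifespan value here are hypothetical; the kernel code
    uses jiffies and per-device state:

```c
#include <assert.h>
#include <stdint.h>

#define LIFESPAN 10  /* hypothetical cache lifetime, in ticks */

static uint64_t hw_reads;        /* counts how often we hit "hardware" */
static uint64_t cached_value;
static uint64_t cache_timestamp;

/* Stand-in for the driver's expensive stats generation routine. */
static uint64_t read_hw_counter(void) { hw_reads++; return 42; }

/* Refetch only when the cache has expired, so many sysfs reads in a
 * short window cost a single hardware query. */
static uint64_t get_stat(uint64_t now)
{
    if (hw_reads == 0 || now - cache_timestamp >= LIFESPAN) {
        cached_value = read_hw_counter();
        cache_timestamp = now;
    }
    return cached_value;
}
```

    Reading many attribute files back-to-back thus triggers one generation
    call, not one per file.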
     

14 May, 2016

10 commits


22 Mar, 2016

5 commits

  • Doug Ledford
     
  • Following the practice exercised for network devices, which allow the
    PF net device to configure attributes of its virtual functions, we
    introduce the following functions to be used by IPoIB, the network
    driver implementation for IB devices.

    ib_set_vf_link_state - set the policy for a VF link. More below.
    ib_get_vf_config - read configuration information of a VF
    ib_get_vf_stats - read VF statistics
    ib_set_vf_guid - set the node or port GUID of a VF

    Also add an indication in the device cap flags that this IB device is
    based on a virtual function.

    A VF shares the physical port with the PF and other VFs. When setting
    the link state we have three options:

    1. Auto - in this mode, the virtual port follows the state of the
    physical port and becomes active only if the physical port's state is
    active. In all other cases it remains in a Down state.
    2. Down - sets the state of the virtual port to Down
    3. Up - causes the virtual port to transition into Initialize state if
    it was not already in this state. A virtualization aware subnet manager
    can then bring the state of the port into the Active state.

    Signed-off-by: Eli Cohen
    Reviewed-by: Or Gerlitz
    Signed-off-by: Doug Ledford

    Eli Cohen
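    The three link-state policies can be sketched as a small mapping from
    policy and physical-port state to the virtual port's state (enum names
    shortened from the verbs ones; a sketch, not the driver code):

```c
#include <assert.h>

enum vf_link    { LINK_AUTO, LINK_DOWN, LINK_UP };
enum port_state { PORT_DOWN, PORT_INIT, PORT_ACTIVE };

/* Effective virtual-port state for a given policy:
 * Auto follows the physical port, Down forces Down, Up moves the port
 * to Initialize and leaves activation to a virtualization-aware SM. */
static enum port_state vf_effective_state(enum vf_link policy,
                                          int phys_port_active)
{
    switch (policy) {
    case LINK_AUTO: return phys_port_active ? PORT_ACTIVE : PORT_DOWN;
    case LINK_DOWN: return PORT_DOWN;
    case LINK_UP:   return PORT_INIT;
    }
    return PORT_DOWN;
}
```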
     
  • Per the ongoing standardisation process, when virtual HCAs are present
    in a network, traffic is routed based on a destination GID. In order to
    access the SA we use the well-known SA GID.

    We also add a GRH-required boolean field to the port attributes, which
    is used to report to the verbs consumer whether this port is connected
    to a virtual network. We use this field to determine whether we need to
    create an address vector with a GRH to access the subnet administrator.
    We clear the port attributes struct before calling the hardware driver
    to make sure the default remains that a GRH is not required.

    Signed-off-by: Eli Cohen
    Reviewed-by: Or Gerlitz
    Signed-off-by: Doug Ledford

    Eli Cohen
     
  • The subnet prefix is a part of the port_info MAD returned and should be
    available at the ib_port_attr struct. We define it here and provide a
    default implementation in case the hardware driver does not provide one.
    The subnet prefix is required when creating the address vector to access
    the SA in networks where GRH must be used.

    Signed-off-by: Eli Cohen
    Reviewed-by: Or Gerlitz
    Signed-off-by: Doug Ledford

    Eli Cohen
     
  • The old bitwise device_cap_flags variable was limited to u32, whose
    bits are all already defined. To overcome this, we converted the
    device_cap_flags variable to the u64 type.

    Signed-off-by: Leon Romanovsky
    Reviewed-by: Matan Barak
    Signed-off-by: Doug Ledford

    Leon Romanovsky
     

17 Mar, 2016

1 commit


11 Mar, 2016

1 commit

  • Until all functionality is moved over to rdmavt, drivers still need to
    access a number of fields in data structures that are predominantly
    meant to be used by rdmavt. Once these rdmavt_.h header
    files are no longer being touched by drivers, their content should be
    moved to rdmavt/.h. While here, move a couple of #defines
    over to more general IB verbs header files because they fit better.

    Reviewed-by: Ira Weiny
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     

05 Mar, 2016

1 commit

  • Devices capable of registering SG lists with gaps can now expose this
    to ULPs in the core, using a new device capability,
    IB_DEVICE_SG_GAPS_REG (in a new device_cap_flags_ex field in the
    device attributes, as we ran out of bits), and a new mr_type,
    IB_MR_TYPE_SG_GAPS_REG, which allocates a memory region capable of
    handling SG lists with gaps.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Doug Ledford

    Sagi Grimberg
     

02 Mar, 2016

1 commit


01 Mar, 2016

2 commits

  • The don't-trap flag (i.e. IB_FLOW_ATTR_FLAGS_DONT_TRAP) indicates that
    a QP will receive traffic, but will not steal it.

    When a packet matches a flow steering rule that was created with
    the don't-trap flag, the QPs assigned to this rule will get the
    packet, but matching will continue on to other equal/lower-priority
    rules, letting the QPs assigned to those rules get the packet too.

    If both a don't-trap rule and other rules have the same priority
    and match the same packet, the behavior is undefined.

    The don't-trap flag can't be set on default rule types
    (i.e. IB_FLOW_ATTR_ALL_DEFAULT, IB_FLOW_ATTR_MC_DEFAULT), as default
    rules have no rules after them, so don't-trap has no meaning there.

    Signed-off-by: Marina Varshaver
    Reviewed-by: Matan Barak
    Reviewed-by: Yishai Hadas
    Signed-off-by: Doug Ledford

    Marina Varshaver
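    The steering semantics can be sketched in userspace C. This model
    assumes every rule in priority order matches the packet; the struct
    and function names are hypothetical:

```c
#include <assert.h>

struct rule { int dont_trap; int qp; };

/* Walk rules in priority order: a don't-trap rule delivers the packet
 * and lets matching continue; a normal rule delivers and steals it.
 * Returns how many QPs received the packet. */
static int steer(const struct rule *rules, int n, int *out_qps)
{
    int cnt = 0;
    for (int i = 0; i < n; i++) {
        out_qps[cnt++] = rules[i].qp;
        if (!rules[i].dont_trap)
            break;
    }
    return cnt;
}
```

    Here the first normal (trapping) rule ends the walk, so rules below it
    never see the packet.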
     
  • Add provider-specific drain_sq/drain_rq functions for providers needing
    special drain logic.

    Add static functions __ib_drain_sq() and __ib_drain_rq(), which post
    no-op WRs to the SQ or RQ and block until their completions are
    processed. This ensures that the application's completions for work
    requests posted prior to the drain work request have all been processed.

    Add API functions ib_drain_sq(), ib_drain_rq(), and ib_drain_qp().

    For the drain logic to work, the caller must:

    - ensure there is room in the CQ(s) and QP for the drain work request
    and completion;

    - allocate the CQ using ib_alloc_cq(), with a CQ poll context other
    than IB_POLL_DIRECT;

    - ensure that no other contexts are posting WRs concurrently, otherwise
    the drain is not guaranteed.

    Reviewed-by: Chuck Lever
    Signed-off-by: Steve Wise
    Signed-off-by: Doug Ledford

    Steve Wise
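    The core idea, a marker work request posted behind all outstanding
    work, can be modeled in userspace C. This is a simplified sketch of
    the technique, not the kernel implementation:

```c
#include <assert.h>

#define MARKER (-1)  /* stands in for the no-op drain WR */

/* Process completions in order until the marker completes. Everything
 * posted before the marker is then guaranteed to have been processed.
 * Returns the number of real work requests completed before the marker. */
static int drain(const int *queue, int len)
{
    int processed = 0;
    for (int i = 0; i < len; i++) {
        if (queue[i] == MARKER)
            break;          /* drain WR completed: earlier WRs are done */
        processed++;
    }
    return processed;
}
```

    This is also why concurrent posters break the guarantee: a WR posted
    after the marker is not covered by it.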
     

20 Jan, 2016

1 commit


24 Dec, 2015

2 commits

  • The cross-channel feature allows executing WQEs that involve
    synchronization of I/O operations on different QPs.

    This capability makes it possible to program complex flows with a
    single function call, thereby significantly reducing the overhead
    associated with I/O processing.

    Cross-channel operations support is indicated in the HCA capability
    information.

    Queue pairs can be configured to work as a "sync master queue"
    or as "sync slave queues".

    The added flags are:

    1. Device capability flag IB_DEVICE_CROSS_CHANNEL for
    devices that can perform cross-channel operations.

    2. CQ property flag IB_CQ_FLAGS_IGNORE_OVERRUN to disable the CQ
    overrun check. This check is useless in the cross-channel scenario.

    3. QP property flags to indicate whether queues are slave or master:
    * IB_QP_CREATE_MANAGED_SEND indicates that posted send work requests
    will not be executed immediately and require enabling.
    * IB_QP_CREATE_MANAGED_RECV indicates that posted receive work
    requests will not be executed immediately and require enabling.
    * IB_QP_CREATE_CROSS_CHANNEL declares the QP to work in cross-channel
    mode. If IB_QP_CREATE_MANAGED_SEND and IB_QP_CREATE_MANAGED_RECV are
    not provided, this QP will be a sync master queue; otherwise it will
    be a sync slave.

    Reviewed-by: Sagi Grimberg
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Leon Romanovsky
     
  • Modify enum ib_device_cap_flags such that other patches which add new
    enum values pass strict checkpatch.pl checks.

    Reviewed-by: Sagi Grimberg
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Leon Romanovsky