18 Nov, 2013

2 commits

  • …s', 'ocrdma', 'qib' and 'srp' into for-next

    Roland Dreier
     
  • Commit 400dbc96583f ("IB/core: Infrastructure for extensible uverbs
    commands") added an infrastructure for extensible uverbs commands
    while later commit 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow
    through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
    using this new infrastructure.

    According to commit 400dbc96583f, the purpose of this
    infrastructure is to support passing around provider (e.g. hardware)
    specific buffers when userspace issues commands to the kernel, so that
    it would be possible to extend the uverbs (e.g. core) buffers
    independently from the provider buffers.

    But the new kernel command function prototypes were not modified to
    take advantage of this extension. This issue was pointed out by Roland
    Dreier in a previous review [1].

    So this patch is an attempt at a revised extensible command
    infrastructure.

    This improved extensible command infrastructure distinguishes the core
    (e.g. legacy) command/response buffers from the provider (e.g. hardware)
    command/response buffers: each function implementing an extended command
    is given one struct ib_udata to hold the core (e.g. uverbs) input and
    output buffers, and another struct ib_udata to hold the hardware
    (e.g. provider) input and output buffers.

    Having those buffers identified separately makes it easier to grow one
    buffer to support an extension without having to add code to guess the
    exact size of each command/response part; this should make the extended
    functions more reliable.

    Additionally, instead of relying on the command identifier being greater
    than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure relies on
    unused bits in the command field: of the 32 bits provided by the command
    field, only 6 bits are really needed to encode the identifiers of the
    commands currently supported by the kernel (even using only 6 bits
    leaves room for about 23 new commands).

    So this patch makes use of some high-order bits in the command field to
    store flags, leaving enough room for more command identifiers than one
    will ever need (e.g. 256).

    The new flags are used to specify whether the command should be
    processed as an extended one or as a legacy one (a decoding sketch
    follows this entry). While designing the new command format, care was
    taken to keep the usage of the flags itself extensible.

    Using high-order bits of the command field ensures that a newer
    libibverbs on an older kernel will properly fail when trying to call
    extended commands. On the other hand, an older libibverbs on a newer
    kernel will never be able to issue calls to extended commands.

    The extended command header includes the optional response pointer, so
    that the output buffer length and the output buffer pointer are located
    together in the command, allowing proper parameter checking. This
    should make the implementing functions easier to write and safer.

    Additionally, the extended header ensures 64-bit alignment and makes all
    sizes a multiple of 8 bytes, while extending the maximum buffer sizes:

                               legacy      extended

    Maximum command buffer:    256KBytes   1024KBytes (512KBytes + 512KBytes)
    Maximum response buffer:   256KBytes   1024KBytes (512KBytes + 512KBytes)

    For the purpose of proper buffer size accounting, the header sizes are
    no longer taken into account in "in_words".

    One of the oddities of the current extensible infrastructure, reading
    the "legacy" command header twice, is fixed by removing the "legacy"
    command header from the extended command header: the two are processed
    as different parts of the command, memory is read once, and information
    is not duplicated. This makes it clear that this is an extended command
    scheme and not a different command scheme.

    The proposed scheme will format input (command) and output (response)
    buffers this way:

    - command:

    legacy header +
    extended header +
    command data (core + hw):

    +----------------------------------------+
    |  flags   |   00      00   |  command   |
    |      in_words      |     out_words     |
    +----------------------------------------+
    |                response                |
    |                response                |
    | provider_in_words | provider_out_words |
    |                 padding                |
    +----------------------------------------+
    |                                        |
    .                                        .
    .             (in_words * 8)             .
    |                                        |
    +----------------------------------------+
    |                                        |
    .                                        .
    .        (provider_in_words * 8)         .
    |                                        |
    +----------------------------------------+

    - response, if present:

    +----------------------------------------+
    |                                        |
    .                                        .
    .             (out_words * 8)            .
    |                                        |
    +----------------------------------------+
    |                                        |
    .                                        .
    .        (provider_out_words * 8)        .
    |                                        |
    +----------------------------------------+

    The overall design is to ensure that the extensible infrastructure is
    itself extensible, while being more reliable thanks to more input and
    bounds checking.

    Note:

    The unused field in the extended header would be a perfect candidate to
    hold the command "comp_mask" (e.g. a bit field used to handle
    compatibility). This was suggested by Roland Dreier in a previous
    review [2]. But the "comp_mask" field is likely to be present in the
    uverbs input and/or the provider input, and likewise for the response,
    as noted by Matan Barak [3], so it doesn't make sense to put
    "comp_mask" in the header.

    [1]:
    http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com

    [2]:
    http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com

    [3]:
    http://marc.info/?i=525C1149.6000701@mellanox.com

    Signed-off-by: Yann Droneaud
    Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com

    [ Convert "ret ? ret : 0" to the equivalent "ret". - Roland ]

    Signed-off-by: Roland Dreier

    Yann Droneaud
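
    A minimal sketch of the command-field decoding and of the extended
    header described above. The layout (low bits carry the command
    identifier, high-order bits carry flags; the extended header holds the
    response pointer and the provider buffer sizes) is taken from this
    entry, but the macro, struct and field names below are assumptions,
    not the final kernel definitions:

        /* Sketch only: splitting the 32-bit command field into identifier
         * and flags, plus the extended header that follows the legacy one. */
        #include <stdint.h>

        #define UVERBS_CMD_COMMAND_MASK   0x000000ffu  /* low bits: command id */
        #define UVERBS_CMD_FLAGS_MASK     0xff000000u  /* high-order bits: flags */
        #define UVERBS_CMD_FLAGS_SHIFT    24
        #define UVERBS_CMD_FLAG_EXTENDED  0x80u        /* process as extended */

        struct ex_cmd_hdr {
                uint64_t response;            /* pointer to the response buffer */
                uint16_t provider_in_words;   /* provider command data, 8-byte words */
                uint16_t provider_out_words;  /* provider response data, 8-byte words */
                uint32_t reserved;            /* padding, keeps 64-bit alignment */
        };

        static int is_extended_command(uint32_t command_field)
        {
                uint32_t flags = (command_field & UVERBS_CMD_FLAGS_MASK) >>
                                 UVERBS_CMD_FLAGS_SHIFT;

                return !!(flags & UVERBS_CMD_FLAG_EXTENDED);
        }

        static uint32_t command_id(uint32_t command_field)
        {
                return command_field & UVERBS_CMD_COMMAND_MASK;
        }

    An older kernel treats the whole 32-bit value as the command number, so
    a value with high-order flag bits set falls outside its command table
    and is rejected; this is how the "newer libibverbs on an older kernel
    fails properly" property described above is expected to fall out.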
     

16 Nov, 2013

1 commit


09 Nov, 2013

1 commit


04 Sep, 2013

1 commit


03 Sep, 2013

1 commit


29 Aug, 2013

2 commits

  • Implement ib_uverbs_create_flow() and ib_uverbs_destroy_flow() to
    support flow steering for user space applications.

    Signed-off-by: Hadar Hen Zion
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Hadar Hen Zion
     
  • The RDMA stack allows for applications to create IB_QPT_RAW_PACKET
    QPs, which receive plain Ethernet packets, specifically packets that
    don't carry any QPN to be matched by the receiving side. Applications
    using these QPs must be provided with a method to program some
    steering rule with the HW so packets arriving at the local port can be
    routed to them.

    This patch adds ib_create_flow(), which allows providing a flow
    specification for a QP. When there's a match between the specification
    and a received packet, the packet is forwarded to that QP, in the same
    way one uses ib_attach_multicast() for IB UD multicast handling (a
    usage sketch follows this entry).

    Flow specifications are provided as instances of struct ib_flow_spec_yyy,
    which describe L2, L3 and L4 headers. Currently specs for Ethernet, IPv4,
    TCP and UDP are defined. Flow specs are made of values and masks.

    The input to ib_create_flow() is a struct ib_flow_attr, which contains
    a few mandatory control elements and optional flow specs.

    struct ib_flow_attr {
            enum ib_flow_attr_type type;
            u16      size;
            u16      priority;
            u32      flags;
            u8       num_of_specs;
            u8       port;
            /* Following are the optional layers according to user request
             * struct ib_flow_spec_yyy
             * struct ib_flow_spec_zzz
             */
    };

    As these specs are eventually coming from user space, they are defined and
    used in a way which allows adding new spec types without kernel/user ABI
    change, just with a little API enhancement which defines the newly added spec.

    The flow spec structures are defined with TLV (Type-Length-Value)
    entries, which allows calling ib_create_flow() with a list of variable
    length of optional specs.

    For the actual processing of ib_flow_attr, the driver uses the mandatory
    num_of_specs and size fields along with the TLV nature of the specs.

    Steering rules are processed in an order determined by the domain over
    which the rule is set and by the rule priority. All rules set by user
    space applications fall into the IB_FLOW_DOMAIN_USER domain; other
    domains could be used by future IPoIB RFS and Ethtool flow-steering
    interface implementations. A lower numerical value in the priority
    field means a higher priority.

    The returned value from ib_create_flow() is a struct ib_flow, which
    contains a database pointer (handle) provided by the HW driver to be
    used when calling ib_destroy_flow().

    Applications that offload TCP/IP traffic can also be written over IB
    UD QPs. The ib_create_flow() / ib_destroy_flow() API is designed to
    support UD QPs too. A HW driver can set IB_DEVICE_MANAGED_FLOW_STEERING
    to denote support for flow steering.

    The ib_flow_attr_type enum supports using flow steering for promiscuous
    and sniffer purposes:

    IB_FLOW_ATTR_NORMAL - "regular" rule, steering according to rule specification

    IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive
    all Ethernet traffic which isn't steered to any QP

    IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast

    IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic

    The ALL_DEFAULT and MC_DEFAULT rule options are valid only for the
    Ethernet link type.

    Signed-off-by: Hadar Hen Zion
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Hadar Hen Zion
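
    A usage sketch for the API above: one Ethernet spec appended to the
    attribute as a TLV, steering packets with a given destination MAC to a
    raw packet QP. The ib_flow_spec_eth/val/mask field names and the
    IB_FLOW_SPEC_ETH constant are assumptions about the spec layout
    described in this entry:

        #include <linux/if_ether.h>
        #include <linux/string.h>
        #include <rdma/ib_verbs.h>

        /* Attribute followed immediately by its single TLV spec. */
        struct eth_flow_rule {
                struct ib_flow_attr     attr;
                struct ib_flow_spec_eth eth;
        };

        static struct ib_flow *steer_dmac_to_qp(struct ib_qp *qp,
                                                const u8 *dmac, u8 port)
        {
                struct eth_flow_rule rule = {
                        .attr = {
                                .type         = IB_FLOW_ATTR_NORMAL,
                                .size         = sizeof(rule),
                                .priority     = 0,  /* lower value == higher priority */
                                .flags        = 0,
                                .num_of_specs = 1,  /* just the Ethernet spec below */
                                .port         = port,
                        },
                        .eth = {
                                .type = IB_FLOW_SPEC_ETH,
                                .size = sizeof(struct ib_flow_spec_eth),
                        },
                };

                memcpy(rule.eth.val.dst_mac, dmac, ETH_ALEN);
                memset(rule.eth.mask.dst_mac, 0xff, ETH_ALEN);  /* match all 6 bytes */

                /* Rules set on behalf of user space use IB_FLOW_DOMAIN_USER. */
                return ib_create_flow(qp, &rule.attr, IB_FLOW_DOMAIN_USER);
        }

    The rule stays in effect until ib_destroy_flow() is called on the
    returned struct ib_flow handle.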
     

14 Aug, 2013

1 commit

  • Fix a potential race when an event occurs on a target XRC QP and, in
    the middle of reporting it on its shared QPs, one of them is destroyed
    by a user space application. Also add a note for kernel consumers in
    ib_verbs.h that they must not destroy the QP from within the handler.

    Signed-off-by: Yishai Hadas
    Signed-off-by: Jack Morgenstein
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Yishai Hadas
     

13 Aug, 2013

1 commit

  • Modify the type of local_addr and remote_addr fields in struct
    iw_cm_id from struct sockaddr_in to struct sockaddr_storage to hold
    IPv6 and IPv4 addresses uniformly.

    Change the references to local_addr and remote_addr in the cxgb4, cxgb3,
    nes and amso drivers to match this. However, to be able to actually run
    traffic over IPv6, the low-level drivers still have to add code to
    support it (a small sketch of handling both families follows this
    entry).

    Signed-off-by: Steve Wise
    Reviewed-by: Sean Hefty

    [ Fix unused variable warnings when INFINIBAND_NES_DEBUG not set.
    - Roland ]

    Signed-off-by: Roland Dreier

    Steve Wise
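
    A small sketch of what the sockaddr_storage change means for a
    low-level driver: branch on ss_family before touching the address
    bytes (the helper below is illustrative, not part of the patch):

        #include <linux/in.h>
        #include <linux/in6.h>
        #include <linux/kernel.h>
        #include <linux/socket.h>

        static void print_cm_addr(const struct sockaddr_storage *ss)
        {
                if (ss->ss_family == AF_INET) {
                        const struct sockaddr_in *sin =
                                (const struct sockaddr_in *)ss;

                        pr_debug("IPv4 %pI4:%u\n", &sin->sin_addr,
                                 ntohs(sin->sin_port));
                } else if (ss->ss_family == AF_INET6) {
                        const struct sockaddr_in6 *sin6 =
                                (const struct sockaddr_in6 *)ss;

                        pr_debug("IPv6 %pI6:%u\n", &sin6->sin6_addr,
                                 ntohs(sin6->sin6_port));
                }
        }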
     

09 Jul, 2013

1 commit


08 Jul, 2013

1 commit

  • Continue the approach taken by commit d2b57063e4a ("IB/core: Reserve
    bits in enum ib_qp_create_flags for low-level driver use") and add
    reserved entries to the ib_qp_type and ib_wr_opcode enums. Low-level
    drivers can then define macros to use these reserved values, giving
    proper names to the macros for readability. Also add a range of
    reserved flags to enum ib_send_flags.

    The mlx5 IB driver uses the new additions.

    Signed-off-by: Jack Morgenstein
    Signed-off-by: Roland Dreier

    Jack Morgenstein
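
    A sketch of the intended usage: a low-level driver gives a readable
    name to one of the reserved entries and keeps using the core enums.
    The mlx5-style names below are illustrations of the idea, not claims
    about the exact driver definitions:

        #include <rdma/ib_verbs.h>

        /* Driver-private QP type and WR opcode, carried in reserved slots so
         * the IB core will never collide with them. */
        #define MLX5_IB_QPT_REG_UMR     IB_QPT_RESERVED1
        #define MLX5_IB_WR_UMR          IB_WR_RESERVED1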
     

21 Jun, 2013

6 commits

  • Allow the rdma_ucm to query the IB service ID formed or allocated by
    the rdma_cm by exporting the cma_get_service_id() functionality.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Allow converting from struct ib_sa_path_rec to the IB defined SA path
    record wire format. This will be used to report path data from the
    rdma cm into user space.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Allow the user to specify the qkey when using AF_IB. The qkey is
    added to struct rdma_ucm_conn_param in place of a reserved field, but
    for backwards compatibility it is only accessed if the associated
    rdma_cm_id is using AF_IB.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • AF_IB uses a 64-bit service ID (SID), which the user can control
    through the use of a mask. The rdma_cm will assign values to the
    unmasked portions of the SID based on the selected port space and port
    number.

    Because the IB spec divides the SID range into several regions, a
    SID/mask combination may fall into one of the existing port space
    ranges as defined by the RDMA CM IP Annex. Map the AF_IB SID to the
    correct RDMA port space.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Add support for AF_IB to ip_addr_size, and rename the function to
    account for the change. Give the compiler more control over whether
    the call should be inline or not by moving the definition into the .c
    file, removing the static inline, and exporting it.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Define AF_IB and sockaddr_ib to allow the rdma_cm to use native IB
    addressing.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
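
    A sketch of the shape such a native IB socket address takes; the field
    names and the AF_IB value below are assumptions for illustration, not
    the final uapi definition:

        #include <linux/types.h>
        #include <rdma/ib_verbs.h>      /* union ib_gid */

        #ifndef AF_IB
        #define AF_IB 27                /* illustrative value only */
        #endif

        struct sockaddr_ib_sketch {
                unsigned short int sib_family;   /* AF_IB */
                __be16             sib_pkey;
                __be32             sib_flowinfo;
                union ib_gid       sib_addr;     /* native 128-bit GID */
                __be64             sib_sid;      /* 64-bit service ID */
                __be64             sib_sid_mask;
                __u64              sib_scope_id;
        };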
     

22 Feb, 2013

1 commit

  • This patch enhances the IB core support for Memory Windows (MWs).

    MWs allow an application to have better/flexible control over remote
    access to memory.

    Two types of MWs are supported, with the second type having two flavors:

    Type 1 - associated with PD only
    Type 2A - associated with QPN only
    Type 2B - associated with PD and QPN

    Applications can allocate a MW once, and then repeatedly bind the MW
    to different ranges in MRs that are associated to the same PD. Type 1
    windows are bound through a verb, while type 2 windows are bound by
    posting a work request.

    The 32-bit memory key is composed of a 24-bit index and an 8-bit
    key. The key is changed with each bind, thus allowing more control
    over the peer's use of the memory key.

    The changes introduced are the following:

    * add memory window type enum and a corresponding parameter to ib_alloc_mw.
    * type 2 memory window bind work request support.
    * consolidate the common part of the bind verb struct ibv_mw_bind and
    the bind work request into a single struct.
    * add the ib_inc_rkey helper function to advance the tag part of an rkey.

    Consumer interface details:

    * new device capability flags IB_DEVICE_MEM_WINDOW_TYPE_2A and
    IB_DEVICE_MEM_WINDOW_TYPE_2B are added to indicate device support
    for these features.

    Devices can set either IB_DEVICE_MEM_WINDOW_TYPE_2A or
    IB_DEVICE_MEM_WINDOW_TYPE_2B if they support type 2A or type 2B
    memory windows, or set neither to indicate that they don't support
    type 2 windows at all.

    * modify existing provider and consumer code to use the new parameter
    of ib_alloc_mw and the ib_mw_bind_info structure.

    Signed-off-by: Haggai Eran
    Signed-off-by: Shani Michaeli
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Shani Michaeli
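
    A sketch of the ib_inc_rkey() idea mentioned above: advance only the
    8-bit key part of the rkey and leave the 24-bit index untouched
    (helper name and mask are illustrative):

        #include <linux/types.h>

        static inline u32 inc_rkey_sketch(u32 rkey)
        {
                const u32 key_mask = 0x000000ff;  /* low 8 bits: consumer-owned key */

                return ((rkey + 1) & key_mask) | (rkey & ~key_mask);
        }

    A consumer would compute the new rkey, post the type 2 bind work
    request carrying it, and only then hand the fresh rkey to the peer.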
     

03 Jan, 2013

1 commit


22 Nov, 2012

1 commit


07 Oct, 2012

1 commit


01 Oct, 2012

2 commits

  • When P_Key tables potentially contain both full and partial membership
    copies for the same P_Key, we need a function to find the index for an
    exact (16-bit) P_Key.

    This is necessary when the master forwards QP1 MADs sent by guests.
    If the guest has sent the MAD with a limited membership P_Key, we need
    to forward the MAD using the same limited membership P_Key. Since the
    master may have both the limited and the full member P_Keys in its
    table, we must make sure to retrieve the limited membership P_Key in
    this case (see the sketch after this entry).

    Signed-off-by: Jack Morgenstein
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Jack Morgenstein
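
    A sketch of why an exact-match lookup is needed: full and limited
    membership copies of a P_Key differ only in the top bit, so a lookup
    that ignores bit 15 can return either copy (function names are
    illustrative, not the in-kernel API):

        #include <linux/types.h>

        #define PKEY_FULL_MEMBER_BIT 0x8000  /* set: full member, clear: limited */

        static int pkey_match_any_membership(u16 a, u16 b)
        {
                return (a & 0x7fff) == (b & 0x7fff);  /* what a plain lookup does */
        }

        static int pkey_match_exact(u16 a, u16 b)
        {
                return a == b;  /* what forwarding a guest's limited P_Key needs */
        }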
     
  • Reserve bits 26-31 for internal use by low-level drivers. Two such
    bits are used in the mlx4_ib driver SR-IOV implementation (see the
    sketch after this entry).

    These enum additions guarantee that the core layer will never use
    these bits, so that low-level drivers may safely make use of them.

    Signed-off-by: Jack Morgenstein
    Signed-off-by: Roland Dreier

    Jack Morgenstein
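
    A sketch of the reservation (entry names are assumptions; the reserved
    range, bits 26-31, is what this entry describes):

        enum qp_create_flags_sketch {
                /* ... core-owned create flags occupy bits 0-25 ... */
                QP_CREATE_RESERVED_START = 1 << 26,  /* first driver-owned bit */
                QP_CREATE_RESERVED_END   = 1 << 31,  /* last driver-owned bit */
        };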
     

23 Jul, 2012

1 commit


09 Jul, 2012

3 commits


22 May, 2012

1 commit


19 May, 2012

1 commit


09 May, 2012

2 commits

  • IB_QPT_RAW_PACKET allows applications to build a complete packet,
    including L2 headers, when sending; on the receive side, the HW will
    not strip any headers.

    This QP type is designed for userspace direct access to Ethernet; for
    example by applications that do TCP/IP themselves. Only processes
    with the NET_RAW capability are allowed to create raw packet QPs (the
    name "raw packet QP" is supposed to suggest an analogy to AF_PACKET /
    SOL_RAW sockets).

    Signed-off-by: Or Gerlitz
    Reviewed-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Or Gerlitz
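
    A sketch of requesting this QP type; apart from qp_type (and the
    CAP_NET_RAW gate enforced for user-space callers) this is ordinary QP
    setup:

        #include <linux/capability.h>
        #include <linux/err.h>
        #include <rdma/ib_verbs.h>

        static struct ib_qp *create_raw_packet_qp(struct ib_pd *pd,
                                                  struct ib_cq *cq)
        {
                struct ib_qp_init_attr attr = {
                        .qp_type = IB_QPT_RAW_PACKET,  /* L2 headers left to the app */
                        .send_cq = cq,
                        .recv_cq = cq,
                        .cap = {
                                .max_send_wr  = 64,
                                .max_recv_wr  = 64,
                                .max_send_sge = 1,
                                .max_recv_sge = 1,
                        },
                };

                if (!capable(CAP_NET_RAW))  /* mirror the user-space permission check */
                        return ERR_PTR(-EPERM);

                return ib_create_qp(pd, &attr);
        }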
     
  • Just as we don't allow PDs, CQs, etc. to be destroyed if there are QPs
    that are attached to them, don't let a QP be destroyed if there are
    multicast group(s) attached to it. Use the existing usecnt field of
    struct ib_qp which was added by commit 0e0ec7e ("RDMA/core: Export
    ib_open_qp() to share XRC TGT QPs") to track this.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Or Gerlitz
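
    A sketch of the accounting described above: attaching a multicast group
    bumps the QP's usecnt and detaching drops it, so destroy can refuse to
    tear down a QP that is still in use (the helper is illustrative; the
    real check lives inside the core destroy path):

        #include <linux/atomic.h>
        #include <linux/errno.h>
        #include <rdma/ib_verbs.h>

        static int destroy_qp_if_unused(struct ib_qp *qp)
        {
                if (atomic_read(&qp->usecnt))  /* mcast groups or XRC users remain */
                        return -EBUSY;

                return ib_destroy_qp(qp);
        }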
     

20 Mar, 2012

1 commit


09 Mar, 2012

1 commit

  • Use a bit in wc_flags rather than a whole integer to hold the
    "checksum OK" flag. By itself, this change doesn't reduce the size of
    struct ib_wc on 64-bit machines -- it stays at 56 bytes because of
    padding. However, it will allow more fields to be added in the future
    without enlarging the struct. Also, it will let us take a unified
    approach with future libibverbs checksum offload reporting, because a
    bit flag doesn't break the library ABI (see the sketch after this
    entry).

    This patch was suggested during a conversation with Liran Liss.

    Signed-off-by: Or Gerlitz
    Reviewed-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Or Gerlitz
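
    A sketch of the consumer side of this change: test the bit in wc_flags
    rather than a separate integer field (flag name as described above;
    treat it as an assumption):

        #include <linux/types.h>
        #include <rdma/ib_verbs.h>

        static bool rx_checksum_ok(const struct ib_wc *wc)
        {
                /* Set by the provider when the HW validated the checksums. */
                return !!(wc->wc_flags & IB_WC_IP_CSUM_OK);
        }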
     

06 Mar, 2012

1 commit

  • The kernel IB stack uses one enumeration for IB speed, which wasn't
    explicitly specified in the verbs header file. Add that enum, and use
    it all over the code.

    The IB speed/width notation is also used by iWARP and IBoE HW drivers,
    which use the convention of rate = speed * width to advertise their
    port link rate.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Or Gerlitz
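
    A sketch of the rate = speed * width convention mentioned above, using
    approximate per-lane rates; the numeric speed values and the helper are
    illustrative, not the in-kernel enum:

        /* Approximate per-lane rates, in tenths of a Gb/s. */
        static int lane_gbps_x10(int ib_speed)
        {
                switch (ib_speed) {
                case 1:  return 25;   /* SDR:  2.5 Gb/s per lane */
                case 2:  return 50;   /* DDR:  5 Gb/s per lane */
                case 4:  return 100;  /* QDR: 10 Gb/s per lane */
                default: return 25;   /* FDR10/FDR/EDR omitted for brevity */
                }
        }

        static int port_rate_gbps_x10(int ib_speed, int width_lanes)
        {
                /* e.g. an IBoE driver advertises 40 GbE as QDR over a 4x link */
                return lane_gbps_x10(ib_speed) * width_lanes;
        }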
     

26 Feb, 2012

1 commit


05 Jan, 2012

1 commit


02 Nov, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
    mlx4_core: Deprecate log_num_vlan module param
    IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
    IB/mlx4: Enable 4K mtu for IBoE
    RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
    RDMA/cxgb4: Serialize calls to CQ's comp_handler
    RDMA/cxgb3: Serialize calls to CQ's comp_handler
    IB/qib: Fix issue with link states and QSFP cables
    IB/mlx4: Configure extended active speeds
    mlx4_core: Add extended port capabilities support
    IB/qib: Hold links until tuning data is available
    IB/qib: Clean up checkpatch issue
    IB/qib: Remove s_lock around header validation
    IB/qib: Precompute timeout jiffies to optimize latency
    IB/qib: Use RCU for qpn lookup
    IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
    IB/qib: Decode path MTU optimization
    IB/qib: Optimize RC/UC code by IB operation
    IPoIB: Use the right function to do DMA unmap pages
    RDMA/cxgb4: Use correct QID in insert_recv_cqe()
    RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
    ...

    Linus Torvalds
     
  • …sc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next

    Roland Dreier
     

14 Oct, 2011

1 commit

  • Allow processes that share the same XRC domain to open an existing
    shareable QP. This permits those processes to receive events on the
    shared QP and transfer ownership, so that any process may modify the
    QP. The latter allows the creating process to exit, while a remaining
    process can still transition it for path migration purposes.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
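
    A sketch of a second process in the same XRC domain opening the shared
    target QP by number (struct/field names follow the ib_open_qp()
    interface described above; treat them as assumptions):

        #include <rdma/ib_verbs.h>

        static struct ib_qp *open_shared_xrc_tgt_qp(struct ib_xrcd *xrcd,
                                                    u32 qp_num,
                                                    void (*handler)(struct ib_event *,
                                                                    void *),
                                                    void *ctx)
        {
                struct ib_qp_open_attr attr = {
                        .event_handler = handler,  /* this opener also gets events */
                        .qp_context    = ctx,
                        .qp_num        = qp_num,   /* QP created by another process */
                        .qp_type       = IB_QPT_XRC_TGT,
                };

                return ib_open_qp(xrcd, &attr);
        }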