16 Dec, 2014

2 commits

  • * Add a configuration option to enable on-demand paging support in
    the infiniband subsystem (CONFIG_INFINIBAND_ON_DEMAND_PAGING). In a
    later patch, this configuration option will select the MMU_NOTIFIER
    configuration option to enable mmu notifiers.
    * Add a flag for on demand paging (ODP) support in the IB device capabilities.
    * Add a flag to request ODP MR in the access flags to reg_mr.
    * Fail registrations done with the ODP flag when the low-level driver
    doesn't support this.
    * Change the conditions in which an MR will be writable to explicitly
    specify the access flags. This is to avoid making an MR writable just
    because it is an ODP MR.
    * Add ODP capabilities to the extended query device verb.
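
    The registration rules in the list above can be sketched roughly as
    follows. The flag values and helper names (CAP_ON_DEMAND_PAGING,
    ACCESS_ON_DEMAND, check_reg_mr, mr_is_writable) are hypothetical
    stand-ins chosen for illustration, not the kernel's definitions, and
    the exact set of write permissions is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for this illustration. */
#define CAP_ON_DEMAND_PAGING (1u << 0)  /* device capability bit */
#define ACCESS_LOCAL_WRITE   (1u << 0)  /* access flag bits */
#define ACCESS_REMOTE_WRITE  (1u << 1)
#define ACCESS_REMOTE_READ   (1u << 2)
#define ACCESS_ON_DEMAND     (1u << 6)

/* Fail an ODP registration when the device does not advertise ODP. */
static int check_reg_mr(uint32_t device_caps, uint32_t access_flags)
{
    if ((access_flags & ACCESS_ON_DEMAND) &&
        !(device_caps & CAP_ON_DEMAND_PAGING))
        return -EINVAL;
    return 0;
}

/* Writable only if a write permission was explicitly requested;
 * the ODP flag alone never makes the MR writable. */
static bool mr_is_writable(uint32_t access_flags)
{
    return (access_flags & (ACCESS_LOCAL_WRITE | ACCESS_REMOTE_WRITE)) != 0;
}

/* Usage: a read-only ODP MR is accepted on an ODP device but stays read-only. */
static void odp_example(void)
{
    uint32_t access = ACCESS_ON_DEMAND | ACCESS_REMOTE_READ;

    assert(check_reg_mr(CAP_ON_DEMAND_PAGING, access) == 0);
    assert(check_reg_mr(0, access) == -EINVAL);
    assert(!mr_is_writable(access));
}
```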

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Shachar Raindel
    Signed-off-by: Haggai Eran
    Signed-off-by: Roland Dreier

    Sagi Grimberg
     
  • Add extensible query device capabilities verb to allow adding new features.
    ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to
    copy capability fields to be used by both ib_uverbs_query_device and
    ib_uverbs_ex_query_device.

    Signed-off-by: Eli Cohen
    Signed-off-by: Haggai Eran
    Signed-off-by: Roland Dreier

    Eli Cohen
     

14 Aug, 2014

1 commit


13 Aug, 2014

1 commit

  • Commit ee7aed4528f ("RDMA/ucma: Support querying for AF_IB addresses")
    added struct sockaddr_storage to rdma_user_cm.h without also adding an
    include for linux/socket.h to make sure it is defined. Systemtap
    needs the header files to build standalone and cannot rely on other
    files to pre-include other headers, so add linux/socket.h to the list
    of includes in this file.

    Fixes: ee7aed4528f ("RDMA/ucma: Support querying for AF_IB addresses")
    Signed-off-by: Doug Ledford
    Signed-off-by: Roland Dreier

    Doug Ledford
     

11 Aug, 2014

2 commits


02 Aug, 2014

1 commit

  • Memory re-registration is a feature that enables changing the
    attributes of a memory region registered by user-space, including PD,
    translation (address and length) and access flags.

    Add the required support in uverbs and the kernel verbs API.

    Signed-off-by: Matan Barak
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Matan Barak
     

11 Jun, 2014

1 commit

  • This patch adds iWARP Port Mapper (IWPM) Version 2 support. The iWARP
    Port Mapper implementation is based on the port mapper specification
    section in the Sockets Direct Protocol paper -
    http://www.rdmaconsortium.org/home/draft-pinkerton-iwarp-sdp-v1.0.pdf

    Existing iWARP RDMA providers use the same IP address as the native
    TCP/IP stack when creating RDMA connections. They need a mechanism to
    claim the TCP ports used for RDMA connections to prevent TCP port
    collisions when other host applications use TCP ports. The iWARP Port
    Mapper provides a standard mechanism to accomplish this. Without this
    service it is possible for an RDMA application to bind/listen on the same
    port which is already being used by a native TCP host application. If
    that happens the incoming TCP connection data can be passed to the
    RDMA stack with error.

    The iWARP Port Mapper solution doesn't contain any changes to the
    existing network stack in the kernel space. All the changes are
    contained within the infiniband tree and also in user space.

    The iWARP Port Mapper service is implemented as a user space daemon
    process. Source for the IWPM service is located at
    http://git.openfabrics.org/git?p=~tnikolova/libiwpm-1.0.0/.git;a=summary

    The iWARP driver (port mapper client) sends to the IWPM service the
    local IP address and TCP port it has received from the RDMA
    application, when starting a connection. The IWPM service performs a
    socket bind from user space to get an available TCP port, called a
    mapped port, and communicates it back to the client. In that sense,
    the IWPM service is used to map the TCP port which the RDMA
    application uses to any port available from the host TCP port
    space. The mapped ports are used in iWARP RDMA connections to avoid
    collisions with the native TCP stack, which is aware that these ports are
    taken. When an RDMA connection using a mapped port is terminated, the
    client notifies the IWPM service, which then releases the TCP port.
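
    The "socket bind from user space" step can be illustrated with
    standard POSIX calls. This is only a sketch of the general technique
    (binding to port 0 and reading back the port the TCP stack chose),
    not the IWPM daemon's code; the function names are hypothetical and
    the loopback address is used purely to keep the example local.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to port 0 so the kernel picks a free port.
 * Returns the socket fd (keep it open to hold the port) or -1 on error.
 * The chosen port is stored in *mapped_port in host byte order. */
static int reserve_mapped_port(uint16_t *mapped_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd;

    assert(mapped_port != NULL);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* 0 asks the TCP stack for any available port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }

    *mapped_port = ntohs(addr.sin_port);
    return fd;
}

/* Release the port once the connection that used it is terminated. */
static void release_mapped_port(int fd)
{
    close(fd);
}
```

    As long as the returned descriptor stays open, the native TCP stack
    treats the port as taken, which is what prevents the collision
    described above.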

    The message exchange between the IWPM service and the iWARP drivers
    (between user space and kernel space) is implemented using netlink
    sockets.

    1) Netlink interface functions are added: ibnl_unicast() and
    ibnl_multicast() for sending netlink messages to user space

    2) The signature of the existing ibnl_put_msg() is changed to be more
    generic

    3) Two netlink clients are added: RDMA_NL_NES, RDMA_NL_C4IW
    corresponding to the two iWARP drivers - nes and cxgb4 which use
    the IWPM service

    4) Enums are added to enumerate the attributes in the netlink
    messages, which are exchanged between the user space IWPM service
    and the iWARP drivers

    Signed-off-by: Tatyana Nikolova
    Signed-off-by: Steve Wise
    Reviewed-by: PJ Waskiewicz

    [ Fold in range checking fixes and nlh_next removal as suggested by Dan
    Carpenter and Steve Wise. Fix sparse endianness in hash. - Roland ]

    Signed-off-by: Roland Dreier

    Tatyana Nikolova
     

18 Nov, 2013

7 commits

  • This commit reverts commit 7afbddfae993 ("IB/core: Temporarily disable
    create_flow/destroy_flow uverbs"). Since the uverbs extensions
    functionality was experimental for v3.12, this patch re-enables the
    support for them and flow-steering for v3.13.

    Signed-off-by: Matan Barak
    Signed-off-by: Roland Dreier

    Matan Barak
     
  • Commit 400dbc96583f ("IB/core: Infrastructure for extensible uverbs
    commands") added an infrastructure for extensible uverbs commands
    while later commit 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow
    through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
    using this new infrastructure.

    According to the commit 400dbc96583f, the purpose of this
    infrastructure is to support passing around provider (e.g. hardware)
    specific buffers when userspace issues commands to the kernel, so that
    it would be possible to extend uverbs (eg. core) buffers independently
    from the provider buffers.

    But the new kernel command function prototypes were not modified to
    take advantage of this extension. This issue was exposed by Roland
    Dreier in a previous review[1].

    So the following patch is an attempt at a revised extensible command
    infrastructure.

    This improved extensible command infrastructure distinguishes between
    the core (e.g. legacy) command/response buffers and the provider
    (e.g. hardware) command/response buffers: each extended command
    implementing function is given a struct ib_udata to hold core
    (e.g. uverbs) input and output buffers, and another struct ib_udata to
    hold the hw (e.g. provider) input and output buffers.

    Having those buffers identified separately makes it easier to increase
    one buffer to support extension without having to add some code to
    guess the exact size of each command/response part. This should make
    the extended functions more reliable.

    Additionally, instead of relying on the command identifier being greater
    than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure relies on
    unused bits in the command field: of the 32 bits provided by the command
    field, only 6 bits are really needed to encode the identifier of
    commands currently supported by the kernel. (Even using only 6 bits
    leaves room for about 23 new commands).

    So this patch makes use of some high order bits in command field to
    store flags, leaving enough room for more command identifiers than one
    will ever need (eg. 256).

    The new flags are used to specify if the command should be processed
    as an extended one or a legacy one. While designing the new command
    format, care was taken to make usage of flags itself extensible.

    Using high order bits of the command field ensures that a newer
    libibverbs on an older kernel will properly fail when trying to call
    extended commands. On the other hand, an older libibverbs on a newer
    kernel will never be able to issue calls to extended commands.
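
    The flag encoding can be sketched as below. The mask, shift, and
    flag values are assumptions made for illustration (8 bits of flags
    at the top, 8 bits of identifier at the bottom), and the helper
    names are hypothetical. The sketch also shows why an older kernel,
    which treats the whole 32-bit value as an identifier, rejects an
    extended command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: flags in the top 8 bits, identifier in the low 8 bits. */
#define CMD_COMMAND_MASK  0x000000ffu
#define CMD_FLAGS_MASK    0xff000000u
#define CMD_FLAGS_SHIFT   24
#define CMD_FLAG_EXTENDED 0x80u

/* Build the 32-bit command word from an identifier and a set of flags. */
static uint32_t make_command(uint32_t id, uint32_t flags)
{
    assert((id & ~CMD_COMMAND_MASK) == 0);
    assert((flags & ~(CMD_FLAGS_MASK >> CMD_FLAGS_SHIFT)) == 0);
    return (flags << CMD_FLAGS_SHIFT) | id;
}

static uint32_t command_id(uint32_t command)
{
    return command & CMD_COMMAND_MASK;
}

static bool command_is_extended(uint32_t command)
{
    uint32_t flags = (command & CMD_FLAGS_MASK) >> CMD_FLAGS_SHIFT;

    return (flags & CMD_FLAG_EXTENDED) != 0;
}

/* Bits between the identifier and the flags must stay zero. */
static bool command_is_well_formed(uint32_t command)
{
    return (command & ~(CMD_COMMAND_MASK | CMD_FLAGS_MASK)) == 0;
}

/* An older kernel compares the whole word against its command table size,
 * so any word with a high flag bit set is out of range and fails. */
static bool legacy_kernel_accepts(uint32_t command, uint32_t table_size)
{
    return command < table_size;
}
```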

    The extended command header includes the optional response pointer so
    that output buffer length and output buffer pointer are located
    together in the command, allowing proper parameters checking. This
    should make implementing functions easier and safer.

    Additionally, the extended header ensures 64-bit alignment, while making
    all sizes multiples of 8 bytes, extending the maximum buffer size:

                              legacy      extended

    Maximum command buffer:   256KBytes   1024KBytes (512KBytes + 512KBytes)
    Maximum response buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)

    For the purpose of doing proper buffer size accounting, the header
    sizes are no longer taken into account in "in_words".
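
    The maximum sizes quoted above follow from simple arithmetic,
    sketched here under the assumption that every *_words field is 16
    bits wide, that legacy counts are in 4-byte words, and that
    extended counts are in 8-byte words. The function names are
    hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Largest buffer, in bytes, that a 16-bit word count can describe. */
static uint32_t max_bytes(unsigned word_size)
{
    assert(word_size == 4 || word_size == 8);
    return (uint32_t)UINT16_MAX * word_size;
}

/* Legacy: one 16-bit count of 4-byte words, just under 256 KBytes. */
static uint32_t legacy_max_bytes(void)
{
    return max_bytes(4);
}

/* Extended: core part plus provider part, each a 16-bit count of
 * 8-byte words, so just under 512 KBytes + 512 KBytes. */
static uint32_t extended_max_bytes(void)
{
    return max_bytes(8) + max_bytes(8);
}
```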

    One of the oddities of the current extensible infrastructure, reading
    the "legacy" command header twice, is fixed by removing the "legacy"
    command header from the extended command header. They are processed as
    two different parts of the command: memory is read once and
    information is not duplicated. This makes it clear that this is an
    extended command scheme and not a different command scheme.

    The proposed scheme will format input (command) and output (response)
    buffers this way:

    - command:

    legacy header +
    extended header +
    command data (core + hw):

    +----------------------------------------+
    | flags     |   00      00    |  command |
    |        in_words    |   out_words       |
    +----------------------------------------+
    |                 response               |
    |                 response               |
    | provider_in_words | provider_out_words |
    |                 padding                |
    +----------------------------------------+
    |                                        |
    .                                        .
    .              (in_words * 8)            .
    |                                        |
    +----------------------------------------+
    |                                        |
    .                                        .
    .          (provider_in_words * 8)       .
    |                                        |
    +----------------------------------------+

    - response, if present:

    +----------------------------------------+
    |                                        |
    .                                        .
    .              (out_words * 8)           .
    |                                        |
    +----------------------------------------+
    |                                        |
    .                                        .
    .         (provider_out_words * 8)       .
    |                                        |
    +----------------------------------------+
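
    The layout above can be mirrored with two C structures. This is a
    sketch only: the struct and function names are hypothetical, the
    field widths are read off the diagram, and the consistency rule in
    response_is_consistent() is an assumed example of the kind of
    parameter checking the header makes possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct legacy_hdr {
    uint32_t command;            /* flags | 00 00 | command */
    uint16_t in_words;           /* core input, in 8-byte words */
    uint16_t out_words;          /* core output, in 8-byte words */
};

struct extended_hdr {
    uint64_t response;           /* user pointer to the response buffer */
    uint16_t provider_in_words;  /* provider input, in 8-byte words */
    uint16_t provider_out_words; /* provider output, in 8-byte words */
    uint32_t padding;
};

/* Total bytes written by userspace: both headers plus both inputs. */
static size_t command_total_bytes(const struct legacy_hdr *hdr,
                                  const struct extended_hdr *ex)
{
    assert(hdr != NULL && ex != NULL);
    return sizeof(*hdr) + sizeof(*ex) +
           ((size_t)hdr->in_words + ex->provider_in_words) * 8;
}

/* Total bytes of the response buffer: core output plus provider output. */
static size_t response_total_bytes(const struct legacy_hdr *hdr,
                                   const struct extended_hdr *ex)
{
    return ((size_t)hdr->out_words + ex->provider_out_words) * 8;
}

/* Assumed rule: a response pointer is given exactly when output is expected. */
static bool response_is_consistent(const struct legacy_hdr *hdr,
                                   const struct extended_hdr *ex)
{
    bool wants_output = response_total_bytes(hdr, ex) != 0;

    return wants_output == (ex->response != 0);
}
```

    Because the pointer and both output lengths sit in the same
    command, a check like the last one needs no knowledge of the
    individual command being issued.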

    The overall design is to ensure that the extensible infrastructure is
    itself extensible while being more reliable with more input and bounds
    checking.

    Note:

    The unused field in the extended header would be a perfect candidate to
    hold the command "comp_mask" (e.g. bit field used to handle
    compatibility). This was suggested by Roland Dreier in a previous
    review[2]. But the "comp_mask" field is likely to be present in the uverbs
    input and/or provider input, likewise for the response, as noted by
    Matan Barak[3], so it doesn't make sense to put "comp_mask" in the
    header.

    [1]:
    http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com

    [2]:
    http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com

    [3]:
    http://marc.info/?i=525C1149.6000701@mellanox.com

    Signed-off-by: Yann Droneaud
    Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com

    [ Convert "ret ? ret : 0" to the equivalent "ret". - Roland ]

    Signed-off-by: Roland Dreier

    Yann Droneaud
     
  • The structure holding any types of flow_spec is of no use to
    userspace. It would be wrong for userspace to do:

    struct ib_uverbs_flow_spec flow_spec;

    flow_spec.type = IB_FLOW_SPEC_TCP;
    flow_spec.size = sizeof(flow_spec);

    Instead, userspace should use the dedicated flow_spec structure for
    - Ethernet : struct ib_uverbs_flow_spec_eth,
    - IPv4 : struct ib_uverbs_flow_spec_ipv4,
    - TCP/UDP : struct ib_uverbs_flow_spec_tcp_udp.

    In other words, struct ib_uverbs_flow_spec is a "virtual" data
    structure that can only be used by the kernel as an alias to the others.

    Signed-off-by: Yann Droneaud
    Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
    Signed-off-by: Roland Dreier

    Yann Droneaud
     
  • A common header allows better checking of flow spec sizes, while
    ensuring strict alignment to 64 bits.
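
    As an illustration of what a common header buys, the sketch below
    defines an 8-byte header shared by every flow spec and a size
    check built on it. The structure layouts, the FLOW_SPEC_TCP value,
    and the helper name are hypothetical stand-ins, not the uverbs
    definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLOW_SPEC_TCP 0x40u  /* hypothetical type value */

/* Common header: 8 bytes, so whatever follows starts 64-bit aligned. */
struct flow_spec_hdr {
    uint32_t type;
    uint16_t size;      /* total size of the spec, header included */
    uint16_t reserved;
};

struct flow_tcp_udp_filter {
    uint16_t dst_port;
    uint16_t src_port;
};

/* Dedicated TCP/UDP spec: common header, then value and mask filters. */
struct flow_spec_tcp_udp {
    struct flow_spec_hdr hdr;
    struct flow_tcp_udp_filter val;
    struct flow_tcp_udp_filter mask;
};

/* Validate one spec against the bytes left in the command buffer. */
static bool flow_spec_size_ok(const struct flow_spec_hdr *hdr, size_t bytes_left)
{
    assert(hdr != NULL);
    return hdr->size >= sizeof(*hdr) &&  /* at least a full header */
           hdr->size <= bytes_left &&    /* does not overrun the buffer */
           hdr->size % 8 == 0;           /* keeps the next spec aligned */
}
```

    Userspace would fill the dedicated structure and set hdr.size to
    sizeof(struct flow_spec_tcp_udp), never to the size of a generic
    container type.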

    Signed-off-by: Yann Droneaud
    Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
    Signed-off-by: Roland Dreier

    Yann Droneaud
     
  • This patch adds the "flow" prefix to most of the data structures added as part
    of commit 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through
    uverbs") to keep those names in sync with the data structures added in
    commit 319a441d1361 ("IB/core: Add receive flow steering support").

    It's just a matter of translating 'ib_flow' to 'ib_uverbs_flow'.

    Signed-off-by: Yann Droneaud
    Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
    Signed-off-by: Roland Dreier

    Yann Droneaud
     
  • Commit 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through
    uverbs") added public data structures to support receive flow
    steering. The new structs are not following the 'uverbs' pattern:
    they're lacking the common prefix 'ib_uverbs'.

    This patch replaces ib_kern prefix by ib_uverbs.

    Signed-off-by: Yann Droneaud
    Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
    Signed-off-by: Roland Dreier

    Yann Droneaud
     
  • This patch fixes the following issues:

    1. Remove unneeded checks.

    2. Remove the fixed size from flow_attr.size, thus simplifying the checks.

    3. Remove a 32-bit hole on 64-bit systems with strict alignment in
    struct ib_kern_flow_attr by adding a reserved field.

    Signed-off-by: Matan Barak
    Signed-off-by: Roland Dreier

    Matan Barak
     

22 Oct, 2013

1 commit

  • The create_flow/destroy_flow uverbs and the associated extensions to
    the user-kernel verbs ABI are under review and are too experimental to
    freeze at this point.

    So that userspace is not exposed to experimental features and an unstable
    ABI, temporarily disable this for v3.12 (with a Kconfig option behind
    staging to reenable it if desired).

    The feature will be enabled after proper cleanup for v3.13.

    Signed-off-by: Yann Droneaud
    Link: http://marc.info/?i=cover.1381351016.git.ydroneaud@opteya.com
    Link: http://marc.info/?i=cover.1381177342.git.ydroneaud@opteya.com

    [ Add a Kconfig option to reenable these verbs. - Roland ]

    Signed-off-by: Roland Dreier

    Yann Droneaud
     

29 Aug, 2013

2 commits

  • Implement ib_uverbs_create_flow() and ib_uverbs_destroy_flow() to
    support flow steering for user space applications.

    Signed-off-by: Hadar Hen Zion
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Hadar Hen Zion
     
  • Add infrastructure to support extended uverbs capabilities in a
    forward/backward compatible manner. Uverbs command opcodes which are based on
    the verbs extensions approach should be greater than or equal to
    IB_USER_VERBS_CMD_THRESHOLD. They have a new header format and are
    processed a bit differently.

    Whenever a specific IB_USER_VERBS_CMD_XXX is extended, which practically means
    it needs to have additional arguments, we will be able to add them without creating
    a completely new IB_USER_VERBS_CMD_YYY command or bumping the uverbs ABI version.

    This patch by itself doesn't provide the whole scheme, which is also dependent
    on adding a comp_mask field to each extended uverbs command struct.

    The new header framework allows for future extension of the CMD arguments
    (ib_uverbs_cmd_hdr.in_words, ib_uverbs_cmd_hdr.out_words) for an existing
    new command (that is a command that supports the new uverbs command header format
    suggested in this patch) without bumping the ABI version and while maintaining backward
    and forward compatibility with new and old libibverbs versions.

    In the uverbs command we are passing both uverbs arguments and the provider arguments.
    We split the ib_uverbs_cmd_hdr.in_words to ib_uverbs_cmd_hdr.in_words which will now carry only
    uverbs input argument struct size and ib_uverbs_cmd_hdr.provider_in_words that will carry
    the provider input argument size. Same goes for the response (the uverbs CMD output argument).

    For example take the create_cq call and the mlx4_ib provider:

    The uverbs layer gets libibverbs' struct ibv_create_cq (named struct ib_uverbs_create_cq
    in the kernel), mlx4_ib gets libmlx4's struct mlx4_create_cq (which includes struct
    ibv_create_cq and is named struct mlx4_ib_create_cq in the kernel) and
    in_words = sizeof(mlx4_create_cq)/4.

    Thus ib_uverbs_cmd_hdr.in_words carries both the uverbs and the mlx4_ib input argument sizes,
    where uverbs assumes it knows the size of its input argument - struct ibv_create_cq.

    Now, if we wish to add a variable to struct ibv_create_cq, we can add a comp_mask field
    to the struct, which is basically a bit field indicating which fields exist in the struct
    (as done for the libibverbs API extension), but we need a way to tell what the total
    size of the struct is and not assume the struct size is predefined (since we may get different
    struct sizes from different user libibverbs versions), so that we know at which point the
    provider input argument (struct mlx4_create_cq) begins. The same goes for extending the
    provider struct mlx4_create_cq. Thus we split ib_uverbs_cmd_hdr.in_words into
    ib_uverbs_cmd_hdr.in_words, which will now carry only the uverbs input argument struct size, and
    ib_uverbs_cmd_hdr.provider_in_words, which will carry the provider (mlx4_ib) input argument size.
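
    A minimal sketch of how the split lets the kernel find the provider
    argument without assuming a fixed core struct size. It assumes
    4-byte words, as in the create_cq example above; the struct and
    function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two input sizes carried by the header, in 4-byte words. */
struct split_sizes {
    uint16_t in_words;           /* uverbs (core) input argument */
    uint16_t provider_in_words;  /* provider input argument */
};

/* Return a pointer to the provider argument inside the payload, or NULL
 * if the two sizes do not add up to the payload length. */
static const uint8_t *provider_args(const uint8_t *payload, size_t payload_len,
                                    const struct split_sizes *s,
                                    size_t *provider_len)
{
    size_t core_len = (size_t)s->in_words * 4;
    size_t prov_len = (size_t)s->provider_in_words * 4;

    assert(provider_len != NULL);

    if (core_len + prov_len != payload_len)
        return NULL;

    *provider_len = prov_len;
    return payload + core_len;  /* provider data starts right after core data */
}
```

    If a newer library grows the core struct by one word, only in_words
    changes and the provider data is still found at the right offset.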

    Signed-off-by: Igor Ivanov
    Signed-off-by: Hadar Hen Zion
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Igor Ivanov
     

21 Jun, 2013

8 commits

  • Allow user space applications to join multicast groups using MGIDs
    directly. MGIDs may be passed using AF_IB addresses. Since the
    current multicast join command only supports addresses as large as
    sockaddr_in6, define a new structure for joining addresses specified
    using sockaddr_ib.

    Since AF_IB allows the user to specify the qkey when resolving a
    remote UD QP address, when joining the multicast group use the qkey
    value, if one has been assigned.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Allow user space applications to call resolve_addr using AF_IB. To
    support sockaddr_ib, we need to define a new structure capable of
    handling the larger address size.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Support user space binding to addresses using AF_IB. Since
    sockaddr_ib is larger than sockaddr_in6, we need to define a larger
    structure when binding using AF_IB. This time we use sockaddr_storage
    to cover future cases.
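
    The size argument can be checked with a small sketch. The struct
    example_sockaddr_ib below is a hypothetical stand-in whose field
    layout is assumed for illustration (the real sockaddr_ib is defined
    in kernel headers); sockaddr_in6 and sockaddr_storage are the
    standard socket types.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for an AF_IB address (layout is an assumption). */
struct example_sockaddr_ib {
    uint16_t family;
    uint16_t pkey;
    uint32_t flowinfo;
    uint8_t  addr[16];  /* 128-bit GID */
    uint64_t sid;
    uint64_t sid_mask;
    uint64_t scope_id;
};

/* Copy an address of any family into sockaddr_storage.
 * Returns 0 on success, -1 if the address would not fit. */
static int copy_to_storage(struct sockaddr_storage *dst,
                           const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    if (len > sizeof(*dst))
        return -1;

    memset(dst, 0, sizeof(*dst));
    memcpy(dst, src, len);
    return 0;
}
```

    A structure sized for sockaddr_in6 cannot hold the larger AF_IB
    address, while sockaddr_storage is sized to hold any supported
    address family.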

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Several commands into the RDMA CM from user space are restricted to
    supporting addresses which fit into a sockaddr_in6 structure: bind
    address, resolve address, and join multicast.

    With the addition of AF_IB, we need to support addresses which are
    larger than sockaddr_in6. This will be done by adding new commands
    that exchange address information using sockaddr_storage. However, to
    support existing applications, we maintain the current commands and
    structures, but rename them to indicate that they only support IPv4
    and v6 addresses.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Part of address resolution is mapping IP addresses to IB GIDs. With
    the changes to support querying larger addresses and more path records,
    also provide a way to query IB GIDs after resolution completes.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • The current query_route call can return up to two path records. The
    assumption is that one is the primary path, with optional support
    for an alternate path. In both cases, the paths are assumed to be
    reversible and are used to send CM MADs.

    With the ability to manually set IB path data, the rdma cm can
    eventually be capable of using up to 6 paths per connection:

    forward primary, reverse primary,
    forward alternate, reverse alternate,
    reversible primary path for CM MADs
    reversible alternate path for CM MADs.

    (It is unclear at this time if IB routing will complicate this.) In
    order to handle more flexible routing topologies, add a new command to
    report any number of paths.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • The sockaddr structure for AF_IB is larger than sockaddr_in6. The
    rdma cm user space ABI uses the latter to exchange address information
    between user space and the kernel.

    To support querying for larger addresses, define a new query command
    that exchanges data using sockaddr_storage, rather than sockaddr_in6.
    Unlike the existing query_route command, the new command only returns
    address information. Route (i.e. path record) data is separated.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Allow the user to specify the qkey when using AF_IB. The qkey is
    added to struct rdma_ucm_conn_param in place of a reserved field, but
    for backwards compatibility, is only accessed if the associated
    rdma_cm_id is using AF_IB.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     

22 Feb, 2013

1 commit


22 Nov, 2012

1 commit


03 Oct, 2012

1 commit