14 Nov, 2018

1 commit

  • commit 9a59739bd01f77db6fbe2955a4fce165f0f43568 upstream.

    This enum has become part of the uABI, as both RXE and the
    ib_uverbs_post_send() command expect userspace to supply values from this
    enum. So it should be properly placed in include/uapi/rdma.

    In userspace this enum is called 'enum ibv_wr_opcode' as part of
    libibverbs.h. That enum defines different values for IB_WR_LOCAL_INV,
    IB_WR_SEND_WITH_INV, and IB_WR_LSO. These were introduced (incorrectly, it
    turns out) into libibverbs in 2015.

    The kernel has changed its mind on the numbering for several of the IB_WC
    values over the years, but has remained stable on IB_WR_LOCAL_INV and
    below.

    Based on this we can conclude that there is no real user space user of the
    values beyond IB_WR_ATOMIC_FETCH_AND_ADD, as they have never worked via
    rdma-core. This is confirmed by inspection: only rxe uses the kernel enum
    and implements the latter operations. rxe has clearly never worked with
    these attributes from userspace. Other drivers that support these opcodes
    implement the functionality without calling out to the kernel.

    To make IB_WR_SEND_WITH_INV and related work for RXE in userspace we
    choose to renumber the IB_WR enum in the kernel to match the uABI that
    userspace has been using since before Soft RoCE was merged. This is an
    overall simpler configuration for the whole software stack, and obviously
    can't break anything existing.
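
    Since the enum values are now uABI, the renumbering hazard can be pinned
    down at compile time. The sketch below is a simplified, hypothetical
    illustration (the real values live in include/uapi/rdma/ib_user_verbs.h;
    the names and numbers here are made up for the example):

```c
#include <assert.h>

/* Hypothetical sketch: once an enum has been copied into userspace
 * headers, its numeric values are ABI.  _Static_assert pins each value
 * so an accidental renumbering fails to compile instead of silently
 * breaking userspace. */
enum example_wr_opcode {
	EX_WR_RDMA_WRITE,
	EX_WR_RDMA_WRITE_WITH_IMM,
	EX_WR_SEND,
	EX_WR_SEND_WITH_IMM,
	EX_WR_RDMA_READ,
	EX_WR_ATOMIC_CMP_AND_SWP,
	EX_WR_ATOMIC_FETCH_AND_ADD,
};

/* Values the (hypothetical) userspace library shipped with: */
_Static_assert(EX_WR_SEND == 2, "uABI value must never change");
_Static_assert(EX_WR_ATOMIC_FETCH_AND_ADD == 6, "uABI value must never change");

/* Only opcodes up to ATOMIC_FETCH_AND_ADD ever worked from userspace */
static int ex_opcode_valid(int op)
{
	return op >= EX_WR_RDMA_WRITE && op <= EX_WR_ATOMIC_FETCH_AND_ADD;
}
```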

    Reported-by: Seth Howell
    Tested-by: Seth Howell
    Fixes: 8700e3e7c485 ("Soft RoCE driver")
    Cc:
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Jason Gunthorpe
     

17 Aug, 2018

1 commit

  • Resolve merge conflicts from the -rc cycle against the rdma.git tree:

    Conflicts:
    drivers/infiniband/core/uverbs_cmd.c
    - New ifs added to ib_uverbs_ex_create_flow in -rc and for-next
    - Merge removal of file->ucontext in for-next with new code in -rc
    drivers/infiniband/core/uverbs_main.c
    - for-next removed code from ib_uverbs_write() that was modified
    in for-rc

    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

11 Aug, 2018

2 commits

  • Currently the struct uverbs_obj_type stored in the ib_uobject is part of
    the .rodata segment of the module that defines the object. This is a
    problem if drivers define new uapi objects as we will be left with a
    dangling pointer after device disassociation.

    Switch the uverbs_obj_type for struct uverbs_api_object, which is
    allocated memory that is part of the uverbs_api and is guaranteed to
    always exist. Further this moves the 'type_class' into this memory which
    means access to the IDR/FD function pointers is also guaranteed. Drivers
    cannot define new types.

    This makes it safe to continue to use all uobjects, including driver
    defined ones, after disassociation.
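
    The mechanics can be sketched in plain C (names are stand-ins, not the
    kernel structures): the core copies the module's rodata descriptor into
    memory it owns, so the copy outlives the module.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch of the fix: instead of keeping a pointer into a
 * module's .rodata (which dangles after the module is unloaded), copy
 * the descriptor into memory owned by the core's per-device "api". */
struct obj_type {                /* stands in for uverbs_obj_type    */
	int id;
	int (*destroy)(void *obj);
};

struct api_object {              /* stands in for uverbs_api_object  */
	struct obj_type type;        /* owned copy, not a borrowed ptr   */
};

static struct api_object *api_object_create(const struct obj_type *rodata)
{
	struct api_object *o = malloc(sizeof(*o));

	if (o)
		o->type = *rodata;       /* deep copy: safe after "unload"   */
	return o;
}

static int demo_destroy(void *obj) { (void)obj; return 0; }
static const struct obj_type demo_rodata = { .id = 7, .destroy = demo_destroy };
```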

    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • There is no reason for drivers to do this; the core code should take care of
    everything. The drivers will provide their information from rodata to
    describe their modifications to the core's base uapi specification.

    The core uses this to build up the runtime uapi for each device.

    Signed-off-by: Jason Gunthorpe
    Reviewed-by: Michael J. Ruhl
    Reviewed-by: Leon Romanovsky

    Jason Gunthorpe
     

03 Aug, 2018

1 commit

  • Now that the unregister_netdev flow for IPoIB no longer relies on external
    code, we can introduce the use of priv_destructor and
    needs_free_netdev.

    The rdma_netdev flow is switched to use the netdev common priv_destructor
    instead of the special free_rdma_netdev and the IPOIB ULP adjusted:
    - priv_destructor needs to switch to point to the ULP's destructor
    which will then call the rdma_ndev's in the right order
    - We need to be careful around the error unwind of register_netdev
    as it sometimes calls priv_destructor on failure
    - ULPs need to use ndo_init/uninit to ensure proper ordering
    of failures around register_netdev

    Switching to priv_destructor is a necessary pre-requisite to using
    the rtnl new_link mechanism.

    The VNIC user for rdma_netdev should also be revised, but that is left for
    another patch.
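
    The destructor-chaining described above can be modelled with plain
    function pointers (a toy sketch, not the netdev API): the ULP saves the
    rdma_netdev's destructor and installs its own, which runs ULP teardown
    first and then the saved one, preserving order.

```c
#include <assert.h>

/* Record the order teardown steps run in */
static int order[2], step;

static void rdma_ndev_destructor(void) { order[step++] = 2; }

struct ulp_netdev {
	void (*priv_destructor)(void);
	void (*saved_rdma_destructor)(void);
};

static struct ulp_netdev ulp;

static void ulp_destructor(void)
{
	order[step++] = 1;              /* ULP teardown runs first      */
	ulp.saved_rdma_destructor();    /* then the rdma_ndev's         */
}

/* The ULP redirects priv_destructor to its own chained destructor */
static void ulp_setup(void (*rdma_dtor)(void))
{
	ulp.saved_rdma_destructor = rdma_dtor;
	ulp.priv_destructor = ulp_destructor;
}
```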

    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Denis Drozdov
    Signed-off-by: Leon Romanovsky

    Jason Gunthorpe
     

02 Aug, 2018

2 commits

  • This does the same as the patch before, except for ioctl. The rules are
    the same, but for the ioctl methods the core code handles setting up the
    uobject.

    - Retrieve the ib_dev from the uobject->context->device. This is
    safe under ioctl as the core has already done rdma_alloc_begin_uobject
    and so CREATE calls are entirely protected by the rwsem.
    - Retrieve the ib_dev from uobject->object
    - Call ib_uverbs_get_ucontext()

    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • There are several flows that can destroy a uobject and each one is
    minimized and sprinkled throughout the code base, making it difficult to
    understand and very hard to modify the destroy path.

    Consolidate all of these into uverbs_destroy_uobject() and call it in all
    cases where a uobject has to be destroyed.

    This makes one change to the lifecycle, during any abort (eg when
    alloc_commit is not called) we always call out to alloc_abort, even if
    remove_commit needs to be called to delete a HW object.

    This also renames RDMA_REMOVE_DURING_CLEANUP to RDMA_REMOVE_ABORT to
    clarify its actual usage and revises some of the comments to reflect what
    the life cycle is for the type implementation.

    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

31 Jul, 2018

2 commits

  • Since neither ib_post_send() nor ib_post_recv() modify the data structure
    their second argument points at, declare that argument const. This change
    makes it necessary to declare the 'bad_wr' argument const too and also to
    modify all ULPs that call ib_post_send(), ib_post_recv() or
    ib_post_srq_recv(). This patch does not change any functionality but makes
    it possible for the compiler to verify whether the
    ib_post_(send|recv|srq_recv) really do not modify the posted work request.

    To make this possible, only one cast had to be introduced that casts away
    constness, namely in rpcrdma_post_recvs(). The only way I can think of to
    avoid that cast is to introduce an additional loop in that function or to
    change the data type of bad_wr from struct ib_recv_wr ** into int
    (an index that refers to an element in the work request list). However,
    both approaches would require even more extensive changes than this
    patch.
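
    The const propagation can be seen in a miniature post function (a sketch,
    not the kernel signatures): because the posted list is const, the bad_wr
    out-pointer must also point into that const list.

```c
#include <assert.h>
#include <stddef.h>

/* Miniature stand-in for a recv work request */
struct recv_wr {
	struct recv_wr *next;
	int num_sge;
};

/* Because 'wr' is const, '*bad_wr' can only be 'const struct recv_wr *':
 * the compiler now verifies the post function never writes the WRs. */
static int post_recv(const struct recv_wr *wr, const struct recv_wr **bad_wr)
{
	for (; wr; wr = wr->next) {
		if (wr->num_sge < 0) {   /* reject and report the offender */
			*bad_wr = wr;
			return -1;
		}
	}
	return 0;
}
```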

    Signed-off-by: Bart Van Assche
    Reviewed-by: Chuck Lever
    Signed-off-by: Jason Gunthorpe

    Bart Van Assche
     
  • When posting a send work request, the work request that is posted is not
    modified by any of the RDMA drivers. Make this explicit by constifying
    most ib_send_wr pointers in RDMA transport drivers.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Steve Wise
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe

    Bart Van Assche
     

26 Jul, 2018

1 commit

  • The locking here has always been a bit crazy and spread out; upon some
    careful analysis we can simplify things.

    Create a single function uverbs_destroy_ufile_hw() that internally handles
    all locking. This pulls together pieces of this process that were
    sprinkled all over the place into one place, and covers them with one
    lock.

    This eliminates several duplicate/confusing locks and makes the control
    flow in ib_uverbs_close() and ib_uverbs_free_hw_resources() extremely
    simple.

    Unfortunately we have to keep an extra mutex, ucontext_lock. This lock is
    logically part of the rwsem and provides the 'down write, fail if write
    locked, wait if read locked' semantic we require.
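
    The 'down write, fail if write locked, wait if read locked' semantic can
    be modelled with a toy single-threaded state machine (illustrative only;
    the kernel uses a real mutex in front of the rwsem):

```c
#include <assert.h>

/* Toy model: 'writer' plays the extra ucontext_lock mutex, which makes a
 * competing writer fail fast; 'readers' only makes a writer wait, which
 * is modelled here as a distinct retry return code. */
static int readers, writer;

static int try_down_write(void)
{
	if (writer)
		return -1;       /* write locked: fail immediately   */
	if (readers)
		return 1;        /* read locked: caller must wait    */
	writer = 1;
	return 0;
}

static void up_write(void)  { writer = 0; }
static void down_read(void) { readers++; }
static void up_read(void)   { readers--; }
```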

    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

25 Jul, 2018

3 commits


24 Jul, 2018

1 commit


11 Jul, 2018

3 commits

  • Enable uverbs_destroy_def_handler to be used by drivers and replace
    current code to use it.

    Signed-off-by: Yishai Hadas
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Yishai Hadas
     
  • Extend the existing grh_required flag to check, when AVs are handled,
    that a GRH is present.

    Since we don't want to do query_port during the AV checks for performance
    reasons, move the flag into the immutable_data.

    Signed-off-by: Artemy Kovalyov
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Artemy Kovalyov
     
  • The internal flag IP_BASED_GIDS was added to a field that was being used
    to hold the PortInfo CapabilityMask without considering the effects this
    will have. Since most drivers just use the value from the HW MAD it means
    IP_BASED_GIDS will also become set on any HW that sets the IBA flag
    IsOtherLocalChangesNoticeSupported - which is not intended.

    Fix this by keeping port_cap_flags only for the IBA CapabilityMask value
    and store unrelated flags externally. Move the bit definitions for this to
    ib_mad.h to make it clear what is happening.

    To keep the uAPI unchanged define a new set of flags in the uapi header
    that are only used by ib_uverbs_query_port_resp.port_cap_flags which match
    the current flags supported in rdma-core, and the values exposed by the
    current kernel.
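
    The collision is easy to demonstrate with made-up bit values (both bit
    positions below are hypothetical, chosen only to show the aliasing): if
    the software-only flag shares a word with the IBA CapabilityMask, any HW
    that sets the same IBA bit appears to have the SW feature.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values illustrating the aliasing problem */
#define IBA_SOME_CAP_BIT      (1u << 26)  /* made-up IBA CapabilityMask bit */
#define SW_IP_BASED_GIDS_BIT  (1u << 26)  /* same bit, software-only flag   */

/* The fix: keep the IBA mask verbatim and store unrelated flags apart */
struct port_attr {
	uint32_t port_cap_flags;   /* IBA CapabilityMask only, as from HW */
	uint32_t sw_flags;         /* unrelated software flags live here  */
};

static int has_ip_based_gids(const struct port_attr *a)
{
	return !!(a->sw_flags & SW_IP_BASED_GIDS_BIT);
}
```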

    Fixes: b4a26a27287a ("IB: Report using RoCE IP based gids in port caps")
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Artemy Kovalyov
    Signed-off-by: Leon Romanovsky

    Jason Gunthorpe
     

10 Jul, 2018

2 commits

  • The only purpose for this structure was to hold the ib_uobject_file
    pointer, but now that is part of the standard ib_uobject the structure
    no longer makes any sense, so get rid of it.

    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Leon Romanovsky

    Jason Gunthorpe
     
  • The IDR is part of the ib_ufile so all the machinery to lock it, handle
    closing and disassociation rightly belongs to the ufile not the ucontext.

    This changes the lifetime of that data to match the lifetime of the file
    descriptor which is always strictly longer than the lifetime of the
    ucontext.

    We need the entire locking machinery to continue to exist after ucontext
    destruction to allow us to return the destroy data after a device has been
    disassociated.

    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Leon Romanovsky

    Jason Gunthorpe
     

05 Jul, 2018

1 commit

  • The specs are required to operate the uverbs file, so they belong inside
    the ib_uverbs_device, not inside the ib_device. The spec passed in the
    ib_device is just a communication from the driver and should not be used
    during runtime.

    This also changes the lifetime of the spec memory to match the
    ib_uverbs_device, however at this time the spec_root can still contain
    driver pointers after disassociation, so it cannot be used if ib_dev is
    NULL. This is preparation for another series.

    Signed-off-by: Jason Gunthorpe
    Reviewed-by: Michael J. Ruhl
    Signed-off-by: Leon Romanovsky

    Jason Gunthorpe
     

30 Jun, 2018

1 commit

  • Improve uverbs_cleanup_ucontext algorithm to work properly when the
    topology graph of the objects cannot be determined at compile time. This
    is the case with objects created via the devx interface in mlx5.

    Typically uverbs objects must be created in a strict topologically sorted
    order, so that LIFO ordering will generally cause them to be freed
    properly. There are only a few cases (eg memory windows) where objects can
    point to things out of the strict LIFO order.

    Instead of using an explicit ordering scheme where the HW destroy is not
    allowed to fail, go over the list multiple times and allow the destroy
    function to fail. If progress halts then a final, desperate, cleanup is
    done before leaking the memory. This indicates a driver bug.
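
    The multi-pass algorithm can be sketched over a toy object list (an
    illustration, not the kernel code): sweep repeatedly, letting destroy
    fail for objects whose dependency still exists, stop when a pass makes
    no progress, then do the final forced pass.

```c
#include <assert.h>
#include <stddef.h>

struct uobj {
	struct uobj *next;
	struct uobj *depends_on;  /* destroy fails until this is gone */
	int destroyed;
};

static int destroy_obj(struct uobj *o, int force)
{
	if (!force && o->depends_on && !o->depends_on->destroyed)
		return -1;            /* HW destroy allowed to fail       */
	o->destroyed = 1;
	return 0;
}

static void cleanup_list(struct uobj *head)
{
	int progress = 1;

	while (progress) {        /* go over the list multiple times  */
		struct uobj *o;

		progress = 0;
		for (o = head; o; o = o->next)
			if (!o->destroyed && !destroy_obj(o, 0))
				progress = 1;
	}
	/* final, desperate pass: needing it indicates a driver bug */
	for (struct uobj *o = head; o; o = o->next)
		if (!o->destroyed)
			destroy_obj(o, 1);
}
```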

    Signed-off-by: Yishai Hadas
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Yishai Hadas
     

26 Jun, 2018

3 commits


21 Jun, 2018

1 commit

  • Pull rdma fixes from Jason Gunthorpe:
    "Here are eight fairly small fixes collected over the last two weeks.

    Regression and crashing bug fixes:

    - mlx4/5: Fixes for issues found from various checkers

    - A resource tracking and uverbs regression in the core code

    - qedr: NULL pointer regression found during testing

    - rxe: Various small bugs"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
    IB/rxe: Fix missing completion for mem_reg work requests
    RDMA/core: Save kernel caller name when creating CQ using ib_create_cq()
    IB/uverbs: Fix ordering of ucontext check in ib_uverbs_write
    IB/mlx4: Fix an error handling path in 'mlx4_ib_rereg_user_mr()'
    RDMA/qedr: Fix NULL pointer dereference when running over iWARP without RDMA-CM
    IB/mlx5: Fix return value check in flow_counters_set_data()
    IB/mlx5: Fix memory leak in mlx5_ib_create_flow
    IB/rxe: avoid double kfree skb

    Linus Torvalds
     

20 Jun, 2018

1 commit


19 Jun, 2018

9 commits

  • This patch replaces the ib_device_attr.max_sge with max_send_sge and
    max_recv_sge. It allows ulps to take advantage of devices that have very
    different send and recv sge depths. For example cxgb4 has a max_recv_sge
    of 4, yet a max_send_sge of 16. Splitting out these attributes allows
    much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW
    API. Consider a large RDMA WRITE that has 16 scattergather entries.
    With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of
    16, it can be done with 1 WRITE WR.
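
    The arithmetic behind that example is just a ceiling division:

```c
#include <assert.h>

/* Number of WRs needed to carry nsge scatter/gather entries when each
 * WR holds at most max_sge of them: ceil(nsge / max_sge). */
static int wrs_needed(int nsge, int max_sge)
{
	return (nsge + max_sge - 1) / max_sge;
}
```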

    Acked-by: Sagi Grimberg
    Acked-by: Christoph Hellwig
    Acked-by: Selvin Xavier
    Acked-by: Shiraz Saleem
    Acked-by: Dennis Dalessandro
    Signed-off-by: Steve Wise
    Signed-off-by: Jason Gunthorpe

    Steve Wise
     
  • Few kernel applications like SCST-iSER create CQ using ib_create_cq(),
    where accessing CQ structures using the rdma restrack tool leads to the
    NULL pointer dereference below. This patch saves the caller kernel module
    name, similar to ib_alloc_cq().

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] skip_spaces+0x30/0x30
    PGD 738bac067 PUD 8533f0067 PMD 0
    Oops: 0000 [#1] SMP
    R10: ffff88017fc03300 R11: 0000000000000246 R12: 0000000000000000
    R13: ffff88082fa5a668 R14: ffff88017475a000 R15: 0000000000000000
    FS: 00002b32726582c0(0000) GS:ffff88087fc40000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 00000008491a1000 CR4: 00000000003607e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    [] ? fill_res_name_pid+0x7c/0x90 [ib_core]
    [] fill_res_cq_entry+0xef/0x170 [ib_core]
    [] res_get_common_dumpit+0x3c4/0x480 [ib_core]
    [] nldev_res_get_cq_dumpit+0x13/0x20 [ib_core]
    [] netlink_dump+0x117/0x2e0
    [] __netlink_dump_start+0x1ab/0x230
    [] ibnl_rcv_msg+0x11d/0x1f0 [ib_core]
    [] ? nldev_res_get_mr_dumpit+0x20/0x20 [ib_core]
    [] ? rdma_nl_multicast+0x30/0x30 [ib_core]
    [] netlink_rcv_skb+0xa9/0xc0
    [] ibnl_rcv+0x98/0xb0 [ib_core]
    [] netlink_unicast+0xf2/0x1b0
    [] netlink_sendmsg+0x31f/0x6a0
    [] sock_sendmsg+0xb0/0xf0
    [] ? _raw_spin_unlock_bh+0x1e/0x20
    [] ? release_sock+0x118/0x170
    [] SYSC_sendto+0x121/0x1c0
    [] ? sock_alloc_file+0xa0/0x140
    [] ? __fd_install+0x25/0x60
    [] SyS_sendto+0xe/0x10
    [] system_call_fastpath+0x16/0x1b
    RIP [] skip_spaces+0x30/0x30
    RSP
    CR2: 0000000000000000

    Cc:
    Fixes: f66c8ba4c9fa ("RDMA/core: Save kernel caller name when creating PD and CQ objects")
    Reviewed-by: Steve Wise
    Signed-off-by: Potnuri Bharat Teja
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Bharat Potnuri
     
  • If the AH has a GRH then hold a reference to the sgid_attr inside the
    common struct.

    If the QP is modified with an AV that includes a GRH then also hold a
    reference to the sgid_attr inside the common struct.

    This informs the cache that the sgid_index is in-use so long as the AH or
    QP using it exists.

    This also means that all drivers can access the sgid_attr directly from
    the ah_attr instead of querying the cache during their UD post-send paths.

    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Leon Romanovsky

    Jason Gunthorpe
     
  • The core code now ensures that all driver callbacks that receive an
    rdma_ah_attr will have the sgid_attr pointer set if a GRH is present.

    Drivers can use this pointer instead of calling a query function with
    sgid_index. This simplifies the drivers and also avoids races where a
    gid_index lookup may return different data if it is changed.

    Signed-off-by: Parav Pandit
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Leon Romanovsky

    Parav Pandit
     
  • Introduce AH attribute copy, move and replace APIs to be used by core and
    provider drivers.

    In the CM code flow, the ah attribute might be re-initialized twice while
    processing an incoming request, or initialized once from a path record
    while sending out CM requests. Therefore use the rdma_move_ah_attr() API
    to handle such scenarios instead of memcpy().

    Provider drivers keep a copy of the ah_attr during the lifetime of the ah.
    Therefore, use rdma_replace_ah_attr(), which conditionally releases the
    reference to the old ah_attr and holds a reference to the new attribute;
    that reference is released when the AH is freed.
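
    The two ownership semantics can be sketched with a toy reference counter
    (names and fields are stand-ins, not the kernel API): "move" transfers
    the held reference and clears the source, so no get/put is needed;
    "replace" takes a reference on the new value and drops the old one.

```c
#include <assert.h>
#include <string.h>

struct ah_attr {
	int sgid_ref;      /* 1 if this attr holds a reference */
	int sgid_index;
};

static int refs;       /* total outstanding references      */

static void move_ah_attr(struct ah_attr *dst, struct ah_attr *src)
{
	if (dst->sgid_ref)
		refs--;            /* release dst's old reference        */
	*dst = *src;           /* reference ownership moves with it  */
	memset(src, 0, sizeof(*src));
}

static void replace_ah_attr(struct ah_attr *dst, const struct ah_attr *src)
{
	if (src->sgid_ref)
		refs++;            /* hold a reference for the new value */
	if (dst->sgid_ref)
		refs--;            /* conditionally drop the old one     */
	*dst = *src;
}
```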

    Signed-off-by: Parav Pandit
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Leon Romanovsky

    Jason Gunthorpe
     
  • The sgid_attr will ultimately replace the sgid_index in the ah_attr.
    This will allow for all layers to have a consistent view of what
    gid table entry was selected as processing runs through all stages of the
    stack.

    This commit introduces the pointer and ensures it is set before calling
    any driver callback that includes a struct ah_attr, allowing
    future patches to adjust both the drivers and the callers to use
    sgid_attr instead of sgid_index.

    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky

    Jason Gunthorpe
     
  • If the gid_attr argument is NULL then the functions behave identically to
    rdma_query_gid. ib_query_gid just calls ib_get_cached_gid, so everything
    can be consolidated to one function.

    Now that all callers either use rdma_query_gid() or ib_get_cached_gid(),
    the ib_query_gid() API is removed.
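
    The consolidation pattern is a single lookup with an optional attr
    out-parameter (a sketch with illustrative names, not the kernel code):

```c
#include <assert.h>
#include <stddef.h>

struct gid      { unsigned char raw[16]; };
struct gid_attr { int gid_type; };

struct entry { struct gid gid; struct gid_attr attr; };

static struct entry table[4];

/* One function covers both callers: a NULL gid_attr degenerates to the
 * plain GID query, so the separate wrapper can be removed. */
static int query_gid(int index, struct gid *gid, struct gid_attr *attr)
{
	if (index < 0 || index >= 4)
		return -1;
	*gid = table[index].gid;
	if (attr)                  /* optional out-parameter */
		*attr = table[index].attr;
	return 0;
}
```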

    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Parav Pandit
     
  • Now that ib_gid_attr contains the GID, make use of that in the add_gid()
    callback functions for the provider drivers to simplify the add_gid()
    implementations.

    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Parav Pandit
     
  • In order to be able to expose pointers to the ib_gid_attrs in the GID
    table we need to make it so the value of the pointer cannot be
    changed. Thus each GID table entry gets a unique piece of kref'd memory
    that is written only during initialization and remains constant for its
    lifetime.

    This eventually will allow the struct ib_gid_attrs to be returned without
    copy from many of the query APIs, but it also provides a way to track when
    all users of a HW table index go away.

    For RoCE we no longer allow an in-use HW table index to be re-used for a
    new and different entry. When a GID table entry needs to be removed it is
    hidden from the find API, but remains as a valid HW index and all
    ib_gid_attr pointers remain valid. The HW index is not released until all
    users put the kref.

    Later patches will broadly replace the use of the sgid_index integer with
    the kref'd structure.

    Ultimately this will prevent security problems where the OS changes the
    properties of a HW GID table entry while an active user object is still
    using the entry.
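
    The lifecycle described above follows the usual kref pattern, sketched
    here with a plain counter (illustrative, not the kernel's struct kref):
    removal only hides the entry from lookup, and the slot is reusable only
    after the last put.

```c
#include <assert.h>
#include <stdlib.h>

struct gid_entry {
	int refcount;
	int hidden;        /* removed entries are hidden, not freed */
};

static struct gid_entry *entry_create(void)
{
	struct gid_entry *e = calloc(1, sizeof(*e));

	if (e)
		e->refcount = 1;   /* the table's own reference */
	return e;
}

static struct gid_entry *entry_get(struct gid_entry *e)
{
	e->refcount++;
	return e;
}

/* Returns 1 when the last reference is dropped and the entry is freed,
 * i.e. when its HW index becomes free for reuse. */
static int entry_put(struct gid_entry *e)
{
	if (--e->refcount)
		return 0;
	free(e);
	return 1;
}
```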

    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Parav Pandit
     

13 Jun, 2018

1 commit


04 Jun, 2018

2 commits

  • T10-PI offload capability is currently supported in iSER protocol only,
    and the definition of the HCA protection information checks are missing
    from the core layer. Add those definitions to avoid code duplication in
    other drivers (such as the iSER target and NVMeoF).

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Max Gurtovoy
    Signed-off-by: Jason Gunthorpe

    Max Gurtovoy
     
  • …/leon/linux-rdma.git into for-next

    Pull verbs counters series from Leon Romanovsky:

    ====================
    Verbs flow counters support

    This series comes to allow user space applications to monitor real time
    traffic activity and events of the verbs objects it manages, e.g.: ibv_qp,
    ibv_wq, ibv_flow.

    The API enables generic counter creation and defines how counters are
    associated with a verbs object; the current mlx5 driver uses this API
    for flow counters.

    With this API, an application can monitor the entire life cycle of object
    activity, defined here as a static counters attachment. This API also
    allows dynamic counters monitoring of measurement points for a partial
    period in the verbs object life cycle.

    In addition it presents the implementation of the generic counters
    interface.

    This is achieved by extending flow creation with a new flow count
    specification type, which allows the user to associate previously created
    flow counters (created via the generic verbs counters interface) with the
    new flow. Once associated, the user can read statistics by using the read
    function of the generic counters interface.

    The API includes:
    1. create and destroy APIs for the new counters objects
    2. reading the counter values from HW

    Note:
    The attach API, which allows the application to define the measurement
    points per object, is a user space only API; this data is passed to the
    kernel when the counted object (e.g. flow) is created with the counters
    object.
    ====================
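
    The described flow can be mocked in a few lines of C (a hypothetical
    model, not the real rdma-core calls): create a counters object, bind it
    to a flow at creation time, then read packet/byte values through the
    generic read interface.

```c
#include <assert.h>

struct counters { unsigned long long packets, bytes; };
struct flow     { struct counters *counters; };

/* A flow with a bound counters object updates it per packet */
static void flow_rx(struct flow *f, int len)
{
	if (f->counters) {
		f->counters->packets++;
		f->counters->bytes += len;
	}
}

/* Generic read interface: returns packets and bytes */
static int read_counters(const struct counters *c,
			 unsigned long long out[2])
{
	out[0] = c->packets;
	out[1] = c->bytes;
	return 0;
}
```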

    * tag 'verbs_flow_counters':
    IB/mlx5: Add counters read support
    IB/mlx5: Add flow counters read support
    IB/mlx5: Add flow counters binding support
    IB/mlx5: Add counters create and destroy support
    IB/uverbs: Add support for flow counters
    IB/core: Add support for flow counters
    IB/core: Support passing uhw for create_flow
    IB/uverbs: Add read counters support
    IB/core: Introduce counters read verb
    IB/uverbs: Add create/destroy counters support
    IB/core: Introduce counters object and its create/destroy
    IB/uverbs: Add an ib_uobject getter to ioctl() infrastructure
    net/mlx5: Export flow counter related API
    net/mlx5: Use flow counter pointer as input to the query function

    Jason Gunthorpe
     

02 Jun, 2018

2 commits

  • A counters object can be attached to a flow on creation by providing the
    counter specification action.

    General counters, which count packets and bytes, are introduced;
    downstream patches from this series will use them as part of flow
    counters binding.

    In addition, increase the number of supported flow specification layers
    to 10, to account for the newly added count specification and the
    previously added drop specification.

    Reviewed-by: Yishai Hadas
    Signed-off-by: Raed Salem
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Raed Salem
     
  • This is required when user-space drivers need to pass extra information
    regarding how to handle this flow steering specification.

    Reviewed-by: Yishai Hadas
    Signed-off-by: Matan Barak
    Signed-off-by: Boris Pismenny
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Matan Barak