02 Jun, 2017

2 commits

  • Commit 9fdca4da4d8c (IB/SA: Split struct sa_path_rec based on IB and
    ROCE specific fields) made service_id an attribute specific to the
    IB and OPA SA Path Records, so it was no longer assigned for RoCE.

    This caused the following kernel panic in the CMA request handler flow:

    [ 27.074594] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    [ 27.074731] IP: __radix_tree_lookup+0x1d/0xe0
    ...
    [ 27.075356] Workqueue: ib_cm cm_work_handler [ib_cm]
    [ 27.075401] task: ffff88022e3b8000 task.stack: ffffc90001298000
    [ 27.075449] RIP: 0010:__radix_tree_lookup+0x1d/0xe0
    ...
    [ 27.075979] Call Trace:
    [ 27.076015] radix_tree_lookup+0xd/0x10
    [ 27.076055] cma_ps_find+0x59/0x70 [rdma_cm]
    [ 27.076097] cma_id_from_event+0xd2/0x470 [rdma_cm]
    [ 27.076144] ? ib_init_ah_from_path+0x39a/0x590 [ib_core]
    [ 27.076193] cma_req_handler+0x25/0x480 [rdma_cm]
    [ 27.076237] cm_process_work+0x25/0x120 [ib_cm]
    [ 27.076280] ? cm_get_bth_pkey.isra.62+0x3c/0xa0 [ib_cm]
    [ 27.076350] cm_req_handler+0xb03/0xd40 [ib_cm]
    [ 27.076430] ? sched_clock_cpu+0x11/0xb0
    [ 27.076478] cm_work_handler+0x194/0x1588 [ib_cm]
    [ 27.076525] process_one_work+0x160/0x410
    [ 27.076565] worker_thread+0x137/0x4a0
    [ 27.076614] kthread+0x112/0x150
    [ 27.076684] ? max_active_store+0x60/0x60
    [ 27.077642] ? kthread_park+0x90/0x90
    [ 27.078530] ret_from_fork+0x2c/0x40

    This patch moves it back to the common SA Path Record structure
    and removes the redundant setter and getter.

    Tested on Connect-IB and ConnectX-4 in InfiniBand and RoCE modes, respectively.

    Fixes: 9fdca4da4d8c (IB/SA: Split struct sa_path_rec based on IB and
    ROCE specific fields)
    Signed-off-by: Majd Dibbiny
    Reviewed-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Majd Dibbiny
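
The idea behind this fix can be illustrated with a minimal userspace sketch. The type and function names below (`path_rec`, `init_roce_rec`, and so on) are hypothetical stand-ins, not the kernel's definitions; the point is only that a field kept outside the type-specific union is assigned for every record type, including RoCE.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
enum rec_type { REC_TYPE_IB, REC_TYPE_ROCE };

struct path_rec_ib   { uint16_t dlid; uint16_t slid; };
struct path_rec_roce { uint8_t dmac[6]; };

struct path_rec {
	enum rec_type type;
	uint64_t service_id;	/* common: meaningful for every record type */
	union {
		struct path_rec_ib ib;
		struct path_rec_roce roce;
	} u;
};

/* Fill a RoCE record; service_id is set without touching the union. */
static void init_roce_rec(struct path_rec *rec, uint64_t service_id,
			  const uint8_t dmac[6])
{
	assert(rec != NULL);
	memset(rec, 0, sizeof(*rec));
	rec->type = REC_TYPE_ROCE;
	rec->service_id = service_id;
	memcpy(rec->u.roce.dmac, dmac, 6);
}
```

Had `service_id` lived only in the IB member of the union, a RoCE record would have had no place to carry it, which is the situation the commit message describes.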
     
  • RDMA netlink is part of ib_core, hence ibnl_chk_listeners(),
    ibnl_init() and ibnl_cleanup() don't need to be published
    in a public header file.

    Let's remove EXPORT_SYMBOL from ibnl_chk_listeners() and move all these
    functions to a private header file.

    CC: Yuval Shaia
    Signed-off-by: Leon Romanovsky
    Reviewed-by: Yuval Shaia
    Signed-off-by: Doug Ledford

    Leon Romanovsky
     

10 May, 2017

1 commit


09 May, 2017

1 commit


05 May, 2017

1 commit


04 May, 2017

1 commit

  • Pull rdma updates from Doug Ledford:
    "More exhaustive description of primary updates in this release:

    - Lots of driver fixes and misc fixes across the board.

    - I had to base on a net-next tree because the IPoIB Accelerator
    patches needed it.

    Unfortunately, it was known to Mellanox that there would need to be
    an IPoIB accelerator patch to the net tree (which left some
    functions turned off by an #ifdef construct to avoid warnings about
    defined but unused functions), then one to the RDMA tree, then a
    fixup that went back and re-enabled the functions in the net tree
    and enabled their use in the rdma tree.

    Also, a sparse fix was sent to the net tree after I did my pull,
    and the fixup patch conflicts quite directly with that sparse fix,
    so I'm going to submit the fixup patch towards the end of the merge
    window by itself and based upon your master branch at the time.

    - Two separate rounds of hfi1 fixes, one that got dropped from last
    release because it came in just a day or two before the end of the
    merge window and then the one from this release cycle.

    Of note is that I now have a third series that just landed from
    Intel yesterday. It is not included in this pull request, but I may
    submit it by the end of the week. I'll talk to Intel about
    improving the timing of their submissions for my workflow.

    - Changes to our idr usage in the RDMA subsystem that will tie into
    our cgroup management and also into the upcoming changes for the
    RDMA kernel<->userspace API.

    - Addition of support for a netdev to be tied to an RDMA device at
    the core level

    - Addition of the VNIC driver from Intel.

    While IPoIB provides IP over InfiniBand (and *only* IP, no lower
    layer protocol headers are allowed or supported), the VNIC driver
    presents a virtual Ethernet device with support for things like
    varying Ethertypes, VLANs, priorities and other features of
    Ethernet.

    The virtual devices are centrally managed by the OPA fabric
    manager, making this (for the time being) a strictly OPA specific
    feature.

    - Improvements to the On-Demand Paging support in the RDMA subsystem.

    - Addition of three significant OPA changes.

    While we added OPA support some time ago (via the hfi1 driver), the
    RDMA subsystem has so far glossed over the areas where OPA and
    InfiniBand differ.

    With this release we are starting to add support for the OPA
    extensions into the RDMA core in the following areas: extended port
    information for OPA is now supported, extended Address Handle
    attributes for OPA are now supported, and extended SA Queries to
    get OPA specific subnet information are now supported.

    Concise summary from the tag:
    - idr usage and locking changes
    - build fix for hns
    - ipoib debug path record file fix
    - hfi1 updates
    - core RDMA netdev addition
    - Intel VNIC driver addition
    - Enhanced accelerators for IPoIB addition
    - Debug cleanups in cxgb3/4
    - Trivial cleanups from SF Markus Elfring
    - Misc rxe fixes from Mellanox
    - Misc ipoib fixes from Mellanox
    - Lots of mlx4/mlx5 changes from Mellanox
    - Misc fixes across the RDMA subsystem
    - ODP paging fixes and improvements
    - qedr updates
    - hfi1 updates
    - OPA port info patches
    - OPA AH patches
    - OPA SA Query patches"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (191 commits)
    infiniband: avoid dereferencing uninitialized dst on error path
    IB/SA: Add OPA addr header
    IB/mlx5: Add port_xmit_wait to counter registers read
    IB/ocrdma: fix out of bounds access to local buffer
    IB/mlx4: Fix incorrect order of formal and actual parameters
    IB/mlx4: Change flush logic so it adheres to the variable name
    mlx5: Fix mlx5_ib_map_mr_sg mr length
    IB/rxe: Don't clamp residual length to mtu
    IB/SA: Add support to query OPA path records
    IB/SA: Add OPA path record type
    IB/SA: Split struct sa_path_rec based on IB and ROCE specific fields
    IB/SA: Introduce path record specific types
    IB/SA: Rename ib_sa_path_rec to sa_path_rec
    IB/CM: Add braces when using sizeof
    IB/core: Define 'opa' rdma_ah_attr type
    IB/core: Define 'ib' and 'roce' rdma_ah_attr types
    IB/core: Use rdma_ah_attr accessor functions
    IB/core: Add accessor functions for rdma_ah_attr fields
    IB/PVRDMA: Rename ib_ah_attr related functions
    IB/mthca: Rename to_ib_ah_attr to to_rdma_ah_attr
    ...

    Linus Torvalds
     

02 May, 2017

14 commits

  • When importing the patch 57520751445b (IB/SA: Add OPA path record type),
    a new header file should have been added to the repo as part of the
    patch. However, as the patch didn't apply cleanly using git am, I
    instead used patch manually, and followed that up with git add -u, which
    misses new files. This adds the new file back in.

    Fixes: 57520751445b (IB/SA: Add OPA path record type)
    Signed-off-by: Doug Ledford

    Doug Ledford
     
  • When bit 26 of the capmask2 field in the OPA classport info
    query is set, the SA will query for OPA path records instead
    of querying for IB path records. Note that OPA
    path records can only be queried by kernel ULPs.
    Userspace clients continue to query IB path records.

    Reviewed-by: Don Hiatt
    Reviewed-by: Ira Weiny
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
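
The selection rule described in this commit can be sketched as a small userspace helper. The macro name, struct, and function below are hypothetical and chosen only for illustration; the sketch assumes capmask2 is available as a plain 32-bit host-order value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for "bit 26 of capmask2" from the commit message. */
#define CPI_CAPMASK2_OPA_PR_SUPPORT (UINT32_C(1) << 26)

/* Hypothetical, reduced view of the OPA classport info. */
struct class_port_info { uint32_t capmask2; };

enum pr_query_type { PR_QUERY_IB, PR_QUERY_OPA };

/*
 * Pick the path record flavour to query: OPA records only when the
 * capability bit is set and the caller is a kernel ULP; IB otherwise.
 */
static enum pr_query_type choose_pr_query(const struct class_port_info *cpi,
					  bool kernel_ulp)
{
	assert(cpi != NULL);
	if (kernel_ulp && (cpi->capmask2 & CPI_CAPMASK2_OPA_PR_SUPPORT))
		return PR_QUERY_OPA;
	return PR_QUERY_IB;
}
```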
     
  • Add opa_sa_path_rec to sa_path_rec data structure.
    The 'type' field in sa_path_rec identifies the
    type of the path record.

    Reviewed-by: Don Hiatt
    Reviewed-by: Ira Weiny
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     
  • sa_path_rec now contains a union of sa_path_rec_ib and sa_path_rec_roce
    based on the type of the path record. Note that fields applicable to
    path record type ROCE v1 and ROCE v2 fall under sa_path_rec_roce.
    Accessor functions are added to these fields so the caller doesn't have
    to know the type.

    Reviewed-by: Don Hiatt
    Reviewed-by: Ira Weiny
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
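
A rough sketch of what type-hiding accessors over such a union might look like follows. All names here are hypothetical and the structures are heavily reduced; the behaviour for a mismatched type (ignore the store, return 0 or NULL) is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced record types; not the kernel's definitions. */
enum rec_type { REC_IB, REC_ROCE_V1, REC_ROCE_V2 };

struct rec_ib   { uint16_t dlid; };
struct rec_roce { uint8_t dmac[6]; };

struct path_rec {
	enum rec_type type;
	union {
		struct rec_ib ib;
		struct rec_roce roce;	/* shared by RoCE v1 and v2 */
	} u;
};

static int rec_is_roce(const struct path_rec *rec)
{
	return rec->type == REC_ROCE_V1 || rec->type == REC_ROCE_V2;
}

/* Setters store only when the record type actually has the field. */
static void rec_set_dlid(struct path_rec *rec, uint16_t dlid)
{
	assert(rec != NULL);
	if (rec->type == REC_IB)
		rec->u.ib.dlid = dlid;
}

static void rec_set_dmac(struct path_rec *rec, const uint8_t dmac[6])
{
	assert(rec != NULL);
	if (rec_is_roce(rec))
		memcpy(rec->u.roce.dmac, dmac, 6);
}

/* Getters return a neutral value for a type that lacks the field. */
static uint16_t rec_get_dlid(const struct path_rec *rec)
{
	return rec->type == REC_IB ? rec->u.ib.dlid : 0;
}

static const uint8_t *rec_get_dmac(const struct path_rec *rec)
{
	return rec_is_roce(rec) ? rec->u.roce.dmac : NULL;
}
```

Callers can then use `rec_set_dlid()` or `rec_get_dmac()` uniformly and never inspect the union directly.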
     
  • struct sa_path_rec has a gid_type field. This patch introduces a more
    generic path record specific type 'rec_type' which is either IB, ROCE v1
    or ROCE v2. The patch also provides conversion functions to get
    a gid type from a path record type and vice versa.

    Reviewed-by: Don Hiatt
    Reviewed-by: Ira Weiny
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
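
The pair of conversion functions mentioned here can be pictured with the sketch below. The enum and function names are hypothetical, and the sketch assumes three distinct GID types that map one-to-one onto the three record types, which may not match the kernel's actual enums.

```c
#include <assert.h>

/* Hypothetical enums; assumed to map one-to-one for this sketch. */
enum gid_type { GID_TYPE_IB, GID_TYPE_ROCE_V1, GID_TYPE_ROCE_V2 };
enum rec_type { REC_TYPE_IB, REC_TYPE_ROCE_V1, REC_TYPE_ROCE_V2 };

/* GID type -> path record type. */
static enum rec_type gid_to_rec_type(enum gid_type gid)
{
	switch (gid) {
	case GID_TYPE_IB:
		return REC_TYPE_IB;
	case GID_TYPE_ROCE_V2:
		return REC_TYPE_ROCE_V2;
	default:
		return REC_TYPE_ROCE_V1;
	}
}

/* Path record type -> GID type. */
static enum gid_type rec_to_gid_type(enum rec_type rec)
{
	switch (rec) {
	case REC_TYPE_IB:
		return GID_TYPE_IB;
	case REC_TYPE_ROCE_V2:
		return GID_TYPE_ROCE_V2;
	default:
		return GID_TYPE_ROCE_V1;
	}
}
```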
     
  • Rename ib_sa_path_rec to a more generic sa_path_rec.
    This is part of extending ib_sa to also support OPA
    path records in addition to the IB defined path records.

    Reviewed-by: Don Hiatt
    Reviewed-by: Ira Weiny
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     
  • OPA ah_attr types allow core components to specify
    attributes that may be specific to OPA devices.
    For instance, the OPA ah_attr type provides 32-bit LIDs,
    enabling larger OPA fabric sizes.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Reviewed-by: Sean Hefty
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     
  • rdma_ah_attr can now be either ib or roce, allowing
    core components to use one type or the other and also
    to define attributes unique to a specific type. struct
    ib_ah is also initialized with the type when it's first
    created. This ensures that calls such as modify_ah
    don't modify the type of the address handle attribute.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Reviewed-by: Sean Hefty
    Reviewed-by: Niranjana Vishwanathapura
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     
  • These accessor functions are meant to be used to get
    and set individual fields of struct rdma_ah_attr.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Reviewed-by: Sean Hefty
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     
  • Rename ib_destroy_ah to rdma_destroy_ah so it's in sync with the
    rename of the ib address handle attribute.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Reviewed-by: Sean Hefty
    Reviewed-by: Niranjana Vishwanathapura
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     
  • Rename ib_query_ah to rdma_query_ah so it's in sync with the
    rename of the ib address handle attribute.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Reviewed-by: Sean Hefty
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     
  • Rename ib_modify_ah to rdma_modify_ah so it's in sync with the
    rename of the ib address handle attribute.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Reviewed-by: Sean Hefty
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     
  • Rename ib_create_ah to rdma_create_ah so it's in sync with the
    rename of the ib address handle attribute.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Reviewed-by: Sean Hefty
    Reviewed-by: Niranjana Vishwanathapura
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     
  • This patch simply renames struct ib_ah_attr to
    rdma_ah_attr as these fields specify attributes that are
    not necessarily specific to IB.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Reviewed-by: Niranjana Vishwanathapura
    Reviewed-by: Sean Hefty
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     

29 Apr, 2017

7 commits

  • For OPA devices, the SA will query the OPA classport info
    instead of the IB defined classport info.
    The OPA classport info exposes additional information and
    capabilities that are specific to OPA devices.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     
  • Both opa_vnic and the hfi driver use the same opa_classport_info
    definition. ib_sa will also become capable of querying OPA class
    port info and will need this definition. Move it to ib_mad.h
    for everyone to use.

    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     
  • rdma_cap_opa_ah(..) enables core components to check if the
    corresponding port supports OPA extended addressing.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     
  • The SA will query and cache class port info as part of
    its initialization. The SA will also invalidate and
    refresh the cache based on specific events. Callers such
    as IPoIB and CM can query the SA to get the classportinfo
    information. Apart from making the caller code much simpler,
    this change puts the onus on the SA to query and maintain
    classportinfo, much like how it maintains the address handle to the SM.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
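
The cache-and-invalidate pattern described in this commit can be sketched in userspace as follows. Everything here is hypothetical: the struct names, the `fake_sa_query()` stand-in for a real SA query, and the choice to refill lazily on the next lookup rather than eagerly on the invalidating event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced class port info. */
struct class_port_info { uint32_t cap_mask; };

struct cpi_cache {
	struct class_port_info data;
	bool valid;
};

typedef struct class_port_info (*cpi_query_fn)(void);

/* Stand-in for a real SA query; each call returns a fresh value. */
static struct class_port_info fake_sa_query(void)
{
	static uint32_t calls;
	struct class_port_info cpi = { ++calls };
	return cpi;
}

/* Return the cached info, querying only when the cache is stale. */
static const struct class_port_info *cpi_get(struct cpi_cache *cache,
					     cpi_query_fn query)
{
	assert(cache != NULL && query != NULL);
	if (!cache->valid) {
		cache->data = query();
		cache->valid = true;
	}
	return &cache->data;
}

/* Called on events that may change the class port info. */
static void cpi_invalidate(struct cpi_cache *cache)
{
	assert(cache != NULL);
	cache->valid = false;
}
```

Callers only ever use `cpi_get()`; they need not know when or how often the underlying query runs.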
     
  • Move FECN and BECN related defines to common header files.

    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Don Hiatt
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Don Hiatt
     
  • These inline functions improve code readability by
    enabling callers to read specific fields from the
    header without knowledge of byte offsets.

    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Don Hiatt
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Don Hiatt
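
To illustrate the kind of helper this commit describes, here is a minimal sketch of field getters for the first word of an InfiniBand Base Transport Header. The function and macro names are hypothetical, and the sketch assumes the word has already been converted to host byte order; the bit layout follows the usual BTH definition (OpCode, SE, M, PadCnt, TVer, P_Key).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shift/mask names for the first 32-bit BTH word. */
#define BTH_OPCODE_SHIFT 24
#define BTH_OPCODE_MASK  0xffu
#define BTH_SE_SHIFT     23
#define BTH_PAD_SHIFT    20
#define BTH_PAD_MASK     0x3u
#define BTH_TVER_SHIFT   16
#define BTH_TVER_MASK    0xfu
#define BTH_PKEY_MASK    0xffffu

/* w0 is the first BTH word in host byte order (an assumption here). */
static inline uint8_t bth_get_opcode(uint32_t w0)
{
	return (w0 >> BTH_OPCODE_SHIFT) & BTH_OPCODE_MASK;
}

static inline uint8_t bth_get_se(uint32_t w0)
{
	return (w0 >> BTH_SE_SHIFT) & 0x1u;
}

static inline uint8_t bth_get_pad(uint32_t w0)
{
	return (w0 >> BTH_PAD_SHIFT) & BTH_PAD_MASK;
}

static inline uint8_t bth_get_tver(uint32_t w0)
{
	return (w0 >> BTH_TVER_SHIFT) & BTH_TVER_MASK;
}

static inline uint16_t bth_get_pkey(uint32_t w0)
{
	return w0 & BTH_PKEY_MASK;
}
```

A caller writes `bth_get_pad(w0)` instead of repeating the shift and mask at every use site.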
     
  • The InfiniBand spec states that "A multicast address is defined by a
    MGID and a MLID" (section 10.5).

    The current code only uses the MGID for identifying multicast groups.
    Update the driver to be compliant with this definition.

    Reviewed-by: Ira Weiny
    Reviewed-by: Dasaratharaman Chandramouli
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Michael J. Ruhl
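
The change in identification can be sketched as a comparison on a two-part key. The struct and function names below are hypothetical, and a 16-bit MLID is assumed for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical multicast key: both parts identify the group. */
struct mcast_addr {
	uint8_t  mgid[16];
	uint16_t mlid;
};

/* Two groups are the same only if MGID and MLID both match. */
static bool mcast_addr_equal(const struct mcast_addr *a,
			     const struct mcast_addr *b)
{
	assert(a != NULL && b != NULL);
	return memcmp(a->mgid, b->mgid, sizeof(a->mgid)) == 0 &&
	       a->mlid == b->mlid;
}
```

With an MGID-only comparison, two entries that share an MGID but differ in MLID would be treated as one group; the combined key keeps them apart.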
     

26 Apr, 2017

4 commits

  • Add the IB_ACCESS_HUGETLB ib_reg_mr flag.
    A hugetlb region registered with this flag
    will use a single translation entry per huge page.

    Signed-off-by: Artemy Kovalyov
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Artemy Kovalyov
     
  • Currently ODP supports only regular MMU pages.
    Add ODP support for regions consisting of physically contiguous chunks
    of arbitrary order (huge pages for instance) to improve performance.

    Signed-off-by: Artemy Kovalyov
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Artemy Kovalyov
     
  • The page size is held by struct ib_umem in the page_size field.

    It is better to store it as an exponent, because a page size is
    always a power of two and is used as a factor, divisor or ilog2's argument.

    Converting page_size to page_shift makes the code portable
    and avoids the following error when compiling on ARM:

    ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!

    CC: Selvin Xavier
    CC: Steve Wise
    CC: Lijun Ou
    CC: Shiraz Saleem
    CC: Adit Ranadive
    CC: Dennis Dalessandro
    CC: Ram Amrani
    Signed-off-by: Artemy Kovalyov
    Signed-off-by: Leon Romanovsky
    Acked-by: Ram Amrani
    Acked-by: Shiraz Saleem
    Acked-by: Selvin Xavier
    Acked-by: Adit Ranadive
    Signed-off-by: Doug Ledford

    Artemy Kovalyov
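
The benefit of storing the exponent can be shown with a short sketch. The struct and helper names are hypothetical. On 32-bit ARM a plain 64-bit division is compiled into a call to the `__aeabi_uldivmod` runtime helper, which the kernel does not provide; shifts and masks need no helper at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced memory-region descriptor. */
struct umem {
	uint64_t length;		/* region length in bytes */
	unsigned int page_shift;	/* log2 of the page size */
};

/* Number of pages covering the region: shift instead of 64-bit divide. */
static uint64_t umem_num_pages(const struct umem *u)
{
	assert(u != NULL && u->page_shift < 64);
	uint64_t page_size = UINT64_C(1) << u->page_shift;
	return (u->length + page_size - 1) >> u->page_shift;
}

/* Offset of an address within its page: mask instead of 64-bit modulo. */
static uint64_t umem_page_offset(const struct umem *u, uint64_t addr)
{
	assert(u != NULL && u->page_shift < 64);
	return addr & ((UINT64_C(1) << u->page_shift) - 1);
}
```

For a 10000-byte region with `page_shift = 12`, `umem_num_pages()` yields 3 pages without ever dividing.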
     
  • The function ib_unregister_mad_agent always returns zero, and
    the returned value is never checked. As such, change the return
    type to void.

    CC: Joe Jin
    CC: Junxiao Bi
    Signed-off-by: Zhu Yanjun
    Reviewed-by: Yuval Shaia
    Reviewed-by: Hal Rosenstock
    Signed-off-by: Doug Ledford

    Zhu Yanjun
     

22 Apr, 2017

2 commits

  • Add high data rate speed to the ib_port_speed enumeration.

    Signed-off-by: Noa Osherovich
    Signed-off-by: Eran Ben Elisha
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Noa Osherovich
     
  • This flow steering specification marks a flow to be dropped by the HW.
    If the user creates a flow with only the drop specification,
    then all the packets that hit this flow will be dropped; otherwise the HW
    will drop only the packets that match the other L2/L3/L4 specifications.

    Signed-off-by: Slava Shwartsman
    Reviewed-by: Maor Gottlieb
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Slava Shwartsman
     

21 Apr, 2017

5 commits

  • Add RDMA netdev interface to ib device structure allowing RDMA
    netdev devices to be allocated by ib clients.

    The idea is to allow providers to optimize the IPoIB data path.
    A new struct that includes functions and data members is exposed.
    It exposes a set of callback functions for handling data path flows
    in the IPoIB driver.

    Each provider can support this set of functions in order
    to optimize its specific data path, and let IPoIB leverage
    its data path.

    It is assumed that providers supply the full set of functions,
    not only part of them, in order to work properly.

    Signed-off-by: Erez Shitrit
    Signed-off-by: Niranjana Vishwanathapura
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Niranjana Vishwanathapura
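
The "full set of callbacks" assumption stated in this commit can be pictured with a small sketch. The struct layout, callback names, and the completeness check are all hypothetical and only illustrate the idea of a provider-supplied operations table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback table a provider would fill in. */
struct rn_ops {
	int (*send)(void *priv, const void *buf, size_t len);
	int (*attach_mcast)(void *priv, const uint8_t *mgid, uint16_t mlid);
	int (*detach_mcast)(void *priv, const uint8_t *mgid, uint16_t mlid);
};

/* A provider is usable only if it supplies every callback. */
static bool rn_ops_complete(const struct rn_ops *ops)
{
	assert(ops != NULL);
	return ops->send && ops->attach_mcast && ops->detach_mcast;
}

/* Trivial provider stubs so the check can be exercised. */
static int stub_send(void *priv, const void *buf, size_t len)
{
	(void)priv; (void)buf; (void)len;
	return 0;
}

static int stub_mcast(void *priv, const uint8_t *mgid, uint16_t mlid)
{
	(void)priv; (void)mgid; (void)mlid;
	return 0;
}
```

A client could call `rn_ops_complete()` once at setup time and fall back to its generic data path when the table is only partially filled.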
     
  • HFI1 HW specific support for VNIC functionality.
    Dynamically allocate a set of contexts for VNIC when the first vnic
    port is instantiated. Allocate VNIC contexts from user contexts pool
    and return them back to the same pool while freeing up. Set aside
    enough MSI-X interrupts for VNIC contexts and assign them when the
    contexts are allocated. On the receive side, use an RSM rule to
    spread TCP/UDP streams among VNIC contexts.

    Reviewed-by: Dennis Dalessandro
    Reviewed-by: Ira Weiny
    Signed-off-by: Niranjana Vishwanathapura
    Signed-off-by: Andrzej Kacprowski
    Signed-off-by: Doug Ledford

    Vishwanathapura, Niranjana
     
  • Define OPA VNIC interface between hardware independent VNIC
    functionality and the hardware dependent VNIC functionality.

    Reviewed-by: Dennis Dalessandro
    Reviewed-by: Ira Weiny
    Signed-off-by: Niranjana Vishwanathapura
    Signed-off-by: Doug Ledford

    Vishwanathapura, Niranjana
     
  • Add rdma netdev interface to ib device structure allowing rdma netdev
    devices to be allocated by ib clients.

    Reviewed-by: Dennis Dalessandro
    Reviewed-by: Ira Weiny
    Signed-off-by: Niranjana Vishwanathapura
    Signed-off-by: Doug Ledford

    Vishwanathapura, Niranjana
     
  • Doug Ledford
     

20 Apr, 2017

1 commit


06 Apr, 2017

1 commit

  • Add ability to fault packets on transmit by opcode.
    Dropping by packet can be achieved by setting the mask to 0.

    In order to drop non-verbs traffic we set PbcInsertHrc
    to NONE (0x2). The packet will still be delivered to
    the receiving node but a KHdrHCRCErr (KDETH packet
    with a bad HCRC) will be triggered and the packet will
    not be delivered to the correct context.

    In order to drop regular verbs traffic we set the
    PbcTestEbp flag. The packet will still be delivered
    to the receiving node but a 'late ebp error' will
    be triggered and the packet will be dropped.

    A global toggle (/sys/kernel/debug/hfi1/hfi1_X/fault_suppress_err)
    has been added to suppress the error messages on the receive
    node when a packet was faulted on the sending node.

    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Don Hiatt
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Don Hiatt
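
One plausible reading of the opcode-and-mask selection is sketched below. The struct and function are hypothetical and are not taken from the driver; the sketch assumes a packet is selected when its opcode equals the configured opcode under the mask, so that a mask of 0 selects every packet regardless of opcode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fault-injection settings. */
struct fault_opcode {
	bool    enabled;
	uint8_t opcode;	/* opcode to fault */
	uint8_t mask;	/* bits of the opcode that must match */
};

/*
 * Decide whether a packet with the given opcode should be faulted.
 * With mask == 0 both sides of the comparison are 0, so every
 * packet matches (an assumption about the intended semantics).
 */
static bool should_fault(const struct fault_opcode *f, uint8_t pkt_opcode)
{
	assert(f != NULL);
	if (!f->enabled)
		return false;
	return (pkt_opcode & f->mask) == (f->opcode & f->mask);
}
```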