02 Jun, 2017

1 commit

  • Commit 9fdca4da4d8c (IB/SA: Split struct sa_path_rec based on IB and
    ROCE specific fields) moved the service_id to be specific attribute
    for IB and OPA SA Path Record, and thus wasn't assigned for RoCE.

    This caused the following kernel panic in the CMA request handler flow:

    [ 27.074594] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    [ 27.074731] IP: __radix_tree_lookup+0x1d/0xe0
    ...
    [ 27.075356] Workqueue: ib_cm cm_work_handler [ib_cm]
    [ 27.075401] task: ffff88022e3b8000 task.stack: ffffc90001298000
    [ 27.075449] RIP: 0010:__radix_tree_lookup+0x1d/0xe0
    ...
    [ 27.075979] Call Trace:
    [ 27.076015] radix_tree_lookup+0xd/0x10
    [ 27.076055] cma_ps_find+0x59/0x70 [rdma_cm]
    [ 27.076097] cma_id_from_event+0xd2/0x470 [rdma_cm]
    [ 27.076144] ? ib_init_ah_from_path+0x39a/0x590 [ib_core]
    [ 27.076193] cma_req_handler+0x25/0x480 [rdma_cm]
    [ 27.076237] cm_process_work+0x25/0x120 [ib_cm]
    [ 27.076280] ? cm_get_bth_pkey.isra.62+0x3c/0xa0 [ib_cm]
    [ 27.076350] cm_req_handler+0xb03/0xd40 [ib_cm]
    [ 27.076430] ? sched_clock_cpu+0x11/0xb0
    [ 27.076478] cm_work_handler+0x194/0x1588 [ib_cm]
    [ 27.076525] process_one_work+0x160/0x410
    [ 27.076565] worker_thread+0x137/0x4a0
    [ 27.076614] kthread+0x112/0x150
    [ 27.076684] ? max_active_store+0x60/0x60
    [ 27.077642] ? kthread_park+0x90/0x90
    [ 27.078530] ret_from_fork+0x2c/0x40

    This patch moves it back to the common SA Path Record structure
    and removes the redundant setter and getter.

    Tested on Connect-IB and ConnectX-4 in InfiniBand and RoCE, respectively.

    Fixes: 9fdca4da4d8c (IB/SA: Split struct sa_path_rec based on IB and
    ROCE specific fields)
    Signed-off-by: Majd Dibbiny
    Reviewed-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Majd Dibbiny
     

02 May, 2017

5 commits


29 Apr, 2017

2 commits

  • For OPA devices, SA will query the OPA classport info
    instead of the IB defined classport info.
    The OPA classport info exposes additional information and
    capabilities that are specific to OPA devices.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     
  • SA will query and cache class port info as part of
    its initialization. SA will also invalidate and
    refresh the cache based on specific events. Callers such
    as IPoIB and CM can query the SA to get the classportinfo
    information. Apart from making the caller code much simpler,
    this change puts the onus on the SA to query and maintain
    classportinfo much like how it maintains the address handle to the SM.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
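
The caching behaviour described in this entry can be sketched as a small state machine. This is a minimal model only: the names `cpi_cache`, `cpi_query`, `cpi_on_event` and `cpi_get` are hypothetical, the query is simulated by a counter, and a real SA query would be asynchronous.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the cached ClassPortInfo contents. */
struct cpi_data {
    unsigned int cap_mask;
};

struct cpi_cache {
    bool valid;
    struct cpi_data data;
    int queries_sent;          /* counts simulated SA queries */
};

/* Simulated query; the value stored is a placeholder, not real data. */
static void cpi_query(struct cpi_cache *c)
{
    c->queries_sent++;
    c->data.cap_mask = 0x1;
    c->valid = true;
}

/* Initialization: query once and keep the result. */
static void cpi_init(struct cpi_cache *c)
{
    c->valid = false;
    c->queries_sent = 0;
    c->data.cap_mask = 0;
    cpi_query(c);
}

/* A relevant event invalidates the cache and refreshes it. */
static void cpi_on_event(struct cpi_cache *c)
{
    c->valid = false;
    cpi_query(c);
}

/* Callers read from the cache and never issue their own query. */
static bool cpi_get(const struct cpi_cache *c, struct cpi_data *out)
{
    if (!c->valid)
        return false;
    *out = c->data;
    return true;
}
```

In this model any number of `cpi_get()` calls after `cpi_init()` leaves `queries_sent` at 1, and each `cpi_on_event()` adds exactly one more query.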
     

13 Jan, 2017

1 commit

  • The RDMA core uses ib_pack() to convert from unpacked CPU structs
    to on-the-wire bitpacked structs.

    This process requires that 1 bit fields are declared as u8 in the
    unpacked struct, otherwise the packing process does not read the
    value properly and the packed result is wired to 0. Several
    places wrongly used int.

    Crucially this means the kernel has never set reversible
    correctly in the path record request. It has always asked for
    irreversible paths even if the ULP requests otherwise.

    When the kernel is used with a SM that supports this feature, it
    completely breaks communication management if reversible paths are
    not properly requested.

    The only reason this ever worked is because opensm ignores the
    reversible bit.

    Cc: stable@vger.kernel.org
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Doug Ledford

    Jason Gunthorpe
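
One way to see why the declared type matters is the simplified model below. It assumes, as a simplification, that the packer reads a 1-byte field directly and interprets any wider field as big-endian bytes; the structs and helpers (`rec_int`, `rec_u8`, `field_read`, `pack_1bit`) are hypothetical and are not the kernel's `ib_pack()` code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical unpacked records: the same 1-bit flag, declared two ways. */
struct rec_int { int reversible; };      /* the problematic declaration */
struct rec_u8  { uint8_t reversible; };  /* the declaration the text calls for */

/* Simplified field reader (an assumption of this sketch): the bytes of the
 * field are combined most-significant first, so a 1-byte field is read as
 * is and a wider field is treated as big-endian. */
static uint32_t field_read(const void *base, size_t off, size_t size)
{
    const uint8_t *p = (const uint8_t *)base + off;
    uint32_t v = 0;

    for (size_t i = 0; i < size; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Pack a field into a 1-bit wire slot: only the lowest bit survives. */
static uint8_t pack_1bit(const void *base, size_t off, size_t size)
{
    return (uint8_t)(field_read(base, off, size) & 1u);
}

/* Returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    uint16_t x = 1;
    uint8_t first;

    memcpy(&first, &x, 1);
    return first == 1;
}
```

Under these assumptions, a `uint8_t` field holding 1 packs to 1 on any host, while an `int` field holding 1 packs to 0 on a little-endian host, because the set bit ends up in the most significant byte of the value that was read.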
     

04 Aug, 2016

1 commit

  • Added UCMA and CMA support for multicast join flags. Flags are
    passed using UCMA CM join command previously reserved fields.
    Currently supporting two join flags indicating two different
    multicast JoinStates:

    1. Full Member:
    The initiator creates the Multicast group(MCG) if it wasn't
    previously created, can send Multicast messages to the group
    and receive messages from the MCG.

    2. Send Only Full Member:
    The initiator creates the Multicast group(MCG) if it wasn't
    previously created, can send Multicast messages to the group
    but doesn't receive any messages from the MCG.

    IB: Send Only Full Member requires a query of ClassPortInfo
    to determine if SM/SA supports this option. If SM/SA
    doesn't support Send-Only there will be no join request
    sent and an error will be returned.

    ETH: When Send Only Full Member is requested no IGMP join
    will be sent.

    Signed-off-by: Alex Vesker
    Reviewed-by: Hal Rosenstock
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Alex Vesker
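
The decision logic described in this entry can be sketched roughly as follows. All names here are hypothetical, and the JoinState bit positions (bit 0 for Full Member, bit 3 for Send Only Full Member) are an assumption taken from the IB specification's JoinState layout rather than from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the real UCMA flag names are not shown here. */
enum sketch_join_flag {
    SKETCH_JOIN_FULL_MEMBER,
    SKETCH_JOIN_SENDONLY_FULL_MEMBER,
};

/* Assumed JoinState bit positions (see lead-in). */
#define SKETCH_STATE_FULL_MEMBER          (1u << 0)
#define SKETCH_STATE_SENDONLY_FULL_MEMBER (1u << 3)

/* IB case: returns 0 and fills *join_state, or -1 when send-only is
 * requested but the SM/SA does not support it (no join is sent then). */
static int sketch_ib_prepare_join(enum sketch_join_flag flag,
                                  bool sa_supports_sendonly,
                                  uint8_t *join_state)
{
    switch (flag) {
    case SKETCH_JOIN_FULL_MEMBER:
        *join_state = SKETCH_STATE_FULL_MEMBER;
        return 0;
    case SKETCH_JOIN_SENDONLY_FULL_MEMBER:
        if (!sa_supports_sendonly)
            return -1;
        *join_state = SKETCH_STATE_SENDONLY_FULL_MEMBER;
        return 0;
    }
    return -1;
}

/* ETH case: a send-only member does not need to receive group traffic,
 * so no IGMP join is sent for it. */
static bool sketch_eth_sends_igmp_join(enum sketch_join_flag flag)
{
    return flag != SKETCH_JOIN_SENDONLY_FULL_MEMBER;
}
```

The `sa_supports_sendonly` argument stands in for the result of the ClassPortInfo query mentioned above.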
     

26 May, 2016

1 commit


23 Dec, 2015

2 commits

  • Since RoCEv2 runs over an IP header, IGMP join and leave requests
    must be sent to the network when joining and leaving
    multicast groups.

    Signed-off-by: Moni Shoua
    Signed-off-by: Doug Ledford

    Moni Shoua
     
  • In order to support multiple GID types, we need to store the gid_type
    with each GID. This is also aligned with the RoCE v2 annex "RoCEv2 PORT
    GID table entries shall have a "GID type" attribute that denotes the L3
    Address type". The currently supported GID is IB_GID_TYPE_IB which is
    also RoCE v1 GID type.

    This implies that gid_type should be added to roce_gid_table meta-data.

    Signed-off-by: Matan Barak
    Signed-off-by: Doug Ledford

    Matan Barak
     

22 Oct, 2015

2 commits

  • The GID cache accompanies every GID with attributes.
    The GID attributes link the GID with its netdevice, which could be
    resolved to smac and vlan id easily. Since we've added the netdevice
    (ifindex and net) to the path record, storing the L2 attributes is
    duplicated data and hence these attributes are removed.

    Signed-off-by: Matan Barak
    Reviewed-By: Devesh Sharma
    Signed-off-by: Doug Ledford

    Matan Barak
     
  • In order to find the sgid_index, one could just query the IB cache
    with the correct GID and netdevice. Therefore, instead of storing
    the L2 attributes directly in the path, we only store the
    ifindex and net and use them later to get the sgid_index.
    The vlan_id and smac L2 attributes are removed in a later patch.

    Signed-off-by: Matan Barak
    Reviewed-By: Devesh Sharma
    Signed-off-by: Doug Ledford

    Matan Barak
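
The lookup idea in this entry, finding the source GID index from a GID plus its netdevice, can be sketched as a plain table search. The table layout and the function `sketch_find_sgid_index` are hypothetical; the sketch keys only on `ifindex` and leaves out the network namespace for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical GID table entry: the GID plus the netdevice it belongs to. */
struct sketch_gid_entry {
    uint8_t gid[16];
    int ifindex;
};

/* Return the index of the entry matching both GID and ifindex, or -1. */
static int sketch_find_sgid_index(const struct sketch_gid_entry *table, int n,
                                  const uint8_t gid[16], int ifindex)
{
    for (int i = 0; i < n; i++) {
        if (table[i].ifindex == ifindex &&
            memcmp(table[i].gid, gid, 16) == 0)
            return i;
    }
    return -1;
}
```

Because the same GID may appear on more than one netdevice, the `ifindex` is what disambiguates the entries, which is why the path only needs to carry the netdevice identity and not the L2 attributes themselves.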
     

15 Jan, 2014

1 commit

  • This patch adds support for Ethernet L2 attributes in the
    verbs/cm/cma structures.

    When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
    in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.

    Thus, those attributes were added to the following structures:

    * ib_ah_attr - added dmac
    * ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
    * ib_wc - added smac, vlan_id
    * ib_sa_path_rec - added smac, dmac, vlan_id
    * cm_av - added smac and vlan_id

    For the path record structure, extra care was taken to avoid the new
    fields when packing it into wire format, so we don't break the IB CM
    and SA wire protocol.

    On the active side, the CM fills its internal structures from the
    path provided by the ULP. We add there taking the ETH L2 attributes
    and placing them into the CM Address Handle (struct cm_av).

    On the passive side, the CM fills its internal structures from the WC
    associated with the REQ message. We add there taking the ETH L2
    attributes from the WC.

    When the HW driver provides the required ETH L2 attributes in the WC,
    they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
    code checks for the presence of these flags, and in their absence does
    address resolution from the ib_init_ah_from_wc() helper function.

    ib_modify_qp_is_ok is also updated to consider the link layer. Some
    parameters are mandatory for Ethernet link layer, while they are
    irrelevant for IB. Vendor drivers are modified to support the new
    function signature.

    Signed-off-by: Matan Barak
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Matan Barak
     

21 Jun, 2013

1 commit


09 Jul, 2012

1 commit

  • This query is needed for SRIOV alias GUID support.

    The query is implemented per the IB Spec definition
    in section 15.2.5.18 (GuidInfoRecord).

    Signed-off-by: Erez Shitrit
    Signed-off-by: Jack Morgenstein
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Erez Shitrit
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

17 Nov, 2009

1 commit

  • Export rdma_set_ib_paths to user space to allow applications to
    manually set the IB path used for connections. This allows
    alternative ways for a user space application or library to obtain
    path record information, including retrieving path information
    from cached data, avoiding direct interaction with the IB SA.
    The IB SA is a single, centralized entity that can limit scaling
    on large clusters running MPI applications.

    Future changes to the rdma cm can expand on this framework to
    support the full range of features allowed by the IB CM, such as
    separate forward and reverse paths and APM.

    Signed-off-by: Sean Hefty
    Reviewed-By: Jason Gunthorpe
    Signed-off-by: Roland Dreier

    Sean Hefty
     

15 Jul, 2008

1 commit


10 Oct, 2007

1 commit


17 Feb, 2007

1 commit

  • The IB SA tracks multicast join/leave requests on a per port basis and
    does not do any reference counting: if two users of the same port join
    the same group, and one leaves that group, then the SA will remove the
    port from the group even though one remaining user still wants to stay
    a member. Therefore, in order to support multiple users of the
    same multicast group from the same port, we need to perform reference
    counting locally.

    To do this, add a multicast submodule to ib_sa to perform reference
    counting of multicast join/leave operations. Modify ib_ipoib (the
    only in-kernel user of multicast) to use the new interface.

    Signed-off-by: Roland Dreier

    Sean Hefty
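
The local reference counting described in this entry can be sketched as below. This is a minimal model under stated assumptions: the types and functions (`sketch_mc_port`, `sketch_mc_join`, `sketch_mc_leave`) are hypothetical, and the requests to the SA are represented only by the counters `sa_joins` and `sa_leaves`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_GROUPS 8

struct sketch_mc_group {
    uint8_t mgid[16];
    int users;                 /* local reference count; 0 means slot is free */
};

struct sketch_mc_port {
    struct sketch_mc_group groups[SKETCH_MAX_GROUPS];
    int sa_joins;              /* simulated join requests sent to the SA */
    int sa_leaves;             /* simulated leave requests sent to the SA */
};

static struct sketch_mc_group *sketch_mc_find(struct sketch_mc_port *port,
                                              const uint8_t mgid[16])
{
    for (int i = 0; i < SKETCH_MAX_GROUPS; i++) {
        struct sketch_mc_group *g = &port->groups[i];

        if (g->users > 0 && memcmp(g->mgid, mgid, 16) == 0)
            return g;
    }
    return NULL;
}

/* Only the first local user of a group triggers a join toward the SA. */
static int sketch_mc_join(struct sketch_mc_port *port, const uint8_t mgid[16])
{
    struct sketch_mc_group *g = sketch_mc_find(port, mgid);

    if (g) {
        g->users++;
        return 0;
    }
    for (int i = 0; i < SKETCH_MAX_GROUPS; i++) {
        g = &port->groups[i];
        if (g->users == 0) {
            memcpy(g->mgid, mgid, 16);
            g->users = 1;
            port->sa_joins++;
            return 0;
        }
    }
    return -1;                 /* no free slot */
}

/* Only the last local user of a group triggers a leave toward the SA. */
static int sketch_mc_leave(struct sketch_mc_port *port, const uint8_t mgid[16])
{
    struct sketch_mc_group *g = sketch_mc_find(port, mgid);

    if (!g)
        return -1;             /* not a member */
    if (--g->users == 0)
        port->sa_leaves++;
    return 0;
}
```

With two local users of the same group, the SA sees one join and, only after both have left, one leave, so the port stays in the group for as long as any local user needs it.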
     

23 Sep, 2006

2 commits

  • Relevant SA queries are actually "greater than" / "less than", not
    "greater than or equal" / "less than or equal" as the names imply.
    (See IB spec 1.2 Vol 1, 15.2.5.16 PATHRECORD/Table 205 PathRecord)

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Roland Dreier

    Michael S. Tsirkin
     
  • Require users to register with SA module, to prevent the sa_query
    module text from going away while an SA query callback is still
    running. Update all in-tree users for the new interface.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Michael S. Tsirkin
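
For the selector naming fix in the first commit of this date, the strict comparison semantics can be sketched as below. The enum names and the function `sketch_sel_matches` are hypothetical, and the numeric selector values (0 greater than, 1 less than, 2 exactly, 3 best available) are an assumption based on the PathRecord selector encoding rather than something this log states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical selector names; numeric values are assumed (see lead-in). */
enum sketch_selector {
    SKETCH_SEL_GT   = 0,   /* strictly greater than the requested value */
    SKETCH_SEL_LT   = 1,   /* strictly less than the requested value */
    SKETCH_SEL_EQ   = 2,   /* exactly the requested value */
    SKETCH_SEL_BEST = 3,   /* best available; the requested value is not a filter */
};

/* Does a candidate value (for example an MTU or rate code) satisfy the
 * selector relative to the requested value? */
static bool sketch_sel_matches(enum sketch_selector sel,
                               unsigned int requested,
                               unsigned int candidate)
{
    switch (sel) {
    case SKETCH_SEL_GT:
        return candidate > requested;
    case SKETCH_SEL_LT:
        return candidate < requested;
    case SKETCH_SEL_EQ:
        return candidate == requested;
    case SKETCH_SEL_BEST:
        return true;
    }
    return false;
}
```

The point of the rename is visible in the boundary case: a candidate equal to the requested value matches neither the "greater than" nor the "less than" selector.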
     

18 Jun, 2006

1 commit


11 Apr, 2006

1 commit

  • Push translation of static rate to HCA format into low-level drivers,
    where it belongs. For static rate encoding, use encoding of rate
    field from IB standard PathRecord, with addition of value 0, for
    backwards compatibility with current usage. The changes are:

    - Add enum ib_rate to midlayer includes.
    - Get rid of static rate translation in IPoIB; just use static rate
    directly from Path and MulticastGroup records.
    - Update mthca driver to translate absolute static rate into the
    format used by hardware. This also fixes mthca's static rate
    handling for HCAs that are capable of 4X DDR.

    Signed-off-by: Jack Morgenstein
    Signed-off-by: Roland Dreier

    Jack Morgenstein
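
As an illustration of the encoding this entry refers to, the sketch below lists rate codes as they appear in the IB PathRecord rate field, plus the extra value 0, and converts them to multiples of 2.5 Gb/s. The enum and function names are hypothetical, and the translation into any particular hardware format, which the commit pushes into the low-level driver, is not shown.

```c
#include <assert.h>

/* Rate codes following the PathRecord encoding, with 0 added to mean
 * "use the port's current rate" (names are hypothetical). */
enum sketch_rate {
    SKETCH_RATE_PORT_CURRENT = 0,
    SKETCH_RATE_2_5_GBPS     = 2,
    SKETCH_RATE_10_GBPS      = 3,
    SKETCH_RATE_30_GBPS      = 4,
    SKETCH_RATE_5_GBPS       = 5,
    SKETCH_RATE_20_GBPS      = 6,
    SKETCH_RATE_40_GBPS      = 7,
    SKETCH_RATE_60_GBPS      = 8,
    SKETCH_RATE_80_GBPS      = 9,
    SKETCH_RATE_120_GBPS     = 10,
};

/* Convert a rate code to a multiple of the 2.5 Gb/s base rate.
 * Returns -1 for "port current" and for unknown codes. */
static int sketch_rate_to_mult(enum sketch_rate rate)
{
    switch (rate) {
    case SKETCH_RATE_2_5_GBPS: return 1;
    case SKETCH_RATE_5_GBPS:   return 2;
    case SKETCH_RATE_10_GBPS:  return 4;
    case SKETCH_RATE_20_GBPS:  return 8;
    case SKETCH_RATE_30_GBPS:  return 12;
    case SKETCH_RATE_40_GBPS:  return 16;
    case SKETCH_RATE_60_GBPS:  return 24;
    case SKETCH_RATE_80_GBPS:  return 32;
    case SKETCH_RATE_120_GBPS: return 48;
    default:                   return -1;
    }
}
```

Note that the codes are not ordered by speed (5 Gb/s is code 5, after 30 Gb/s at code 4), so a driver cannot compare them numerically and needs a conversion of this kind first.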
     

09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change
    generated code (from gcc point of view we replaced unsigned int with
    typedef) and documents what's going on far better.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
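
The effect of the typedef can be illustrated with a small user-space sketch. It assumes that `__nocast` expands to a sparse-only attribute when `__CHECKER__` is defined and to nothing otherwise; the flag `SKETCH_GFP_ZERO` and the function `sketch_alloc` are hypothetical and exist only to show a `gfp_t` parameter in a signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Under sparse (__CHECKER__ defined) the attribute enables extra checks;
 * for a normal compiler it disappears entirely. */
#ifdef __CHECKER__
# define __nocast __attribute__((nocast))
#else
# define __nocast
#endif

/* For the compiler this is just unsigned int, so generated code is the same. */
typedef unsigned int __nocast gfp_t;

/* Hypothetical flag for this sketch only. */
#define SKETCH_GFP_ZERO ((gfp_t)0x1u)

/* The parameter type documents that allocation flags are expected here. */
static void *sketch_alloc(size_t size, gfp_t flags)
{
    return (flags & SKETCH_GFP_ZERO) ? calloc(1, size) : malloc(size);
}
```

Compared with writing `unsigned int __nocast flags` at every call site, the named type makes the intent of the argument visible in each prototype.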
     

10 Sep, 2005

2 commits


27 Aug, 2005

1 commit