02 Jun, 2017
2 commits
-
Commit 9fdca4da4d8c (IB/SA: Split struct sa_path_rec based on IB and
ROCE specific fields) moved the service_id to be a specific attribute
of the IB and OPA SA Path Records, so it was no longer assigned for RoCE.

This caused the following kernel panic in the CMA request handler flow:
[ 27.074594] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 27.074731] IP: __radix_tree_lookup+0x1d/0xe0
...
[ 27.075356] Workqueue: ib_cm cm_work_handler [ib_cm]
[ 27.075401] task: ffff88022e3b8000 task.stack: ffffc90001298000
[ 27.075449] RIP: 0010:__radix_tree_lookup+0x1d/0xe0
...
[ 27.075979] Call Trace:
[ 27.076015] radix_tree_lookup+0xd/0x10
[ 27.076055] cma_ps_find+0x59/0x70 [rdma_cm]
[ 27.076097] cma_id_from_event+0xd2/0x470 [rdma_cm]
[ 27.076144] ? ib_init_ah_from_path+0x39a/0x590 [ib_core]
[ 27.076193] cma_req_handler+0x25/0x480 [rdma_cm]
[ 27.076237] cm_process_work+0x25/0x120 [ib_cm]
[ 27.076280] ? cm_get_bth_pkey.isra.62+0x3c/0xa0 [ib_cm]
[ 27.076350] cm_req_handler+0xb03/0xd40 [ib_cm]
[ 27.076430] ? sched_clock_cpu+0x11/0xb0
[ 27.076478] cm_work_handler+0x194/0x1588 [ib_cm]
[ 27.076525] process_one_work+0x160/0x410
[ 27.076565] worker_thread+0x137/0x4a0
[ 27.076614] kthread+0x112/0x150
[ 27.076684] ? max_active_store+0x60/0x60
[ 27.077642] ? kthread_park+0x90/0x90
[ 27.078530] ret_from_fork+0x2c/0x40

This patch moves it back to the common SA Path Record structure
and removes the redundant setter and getter.

Tested on Connect-IB and ConnectX-4 in InfiniBand and RoCE respectively.
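A minimal sketch of the layout this fix restores, assuming simplified stand-in types (the struct, field and function names below are illustrative and are not copied from the kernel headers). The point is only that service_id sits in the common part of the record, so it is valid whatever the record type, while type-specific fields stay in the union:

```c
#include <stdint.h>

/* Illustrative stand-ins; not the kernel's definitions. */
enum path_rec_type { REC_IB, REC_ROCE_V1, REC_ROCE_V2 };

struct path_rec_ib   { uint16_t dlid; uint16_t slid; };
struct path_rec_roce { uint8_t dmac[6]; };

struct path_rec {
	enum path_rec_type rec_type;
	uint64_t service_id;          /* common: assigned for IB and RoCE alike */
	union {
		struct path_rec_ib   ib;
		struct path_rec_roce roce;
	};
};

/* Hypothetical consumer: the lookup key is read the same way for every
 * record type, so no per-type setter or getter is involved. */
static uint64_t listener_key(const struct path_rec *rec)
{
	return rec->service_id;
}
```

With the field in the common part, a RoCE record initialised with a service_id yields that same value as the lookup key, which is the property the request handler flow relies on.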
Fixes: 9fdca4da4d8c (IB/SA: Split struct sa_path_rec based on IB and
ROCE specific fields)
Signed-off-by: Majd Dibbiny
Reviewed-by: Parav Pandit
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford
-
RDMA netlink is part of ib_core, hence ibnl_chk_listeners(),
ibnl_init() and ibnl_cleanup() don't need to be published
in a public header file.

Let's remove EXPORT_SYMBOL from ibnl_chk_listeners() and move all these
functions to a private header file.

CC: Yuval Shaia
Signed-off-by: Leon Romanovsky
Reviewed-by: Yuval Shaia
Signed-off-by: Doug Ledford
10 May, 2017
1 commit
-
This patch prepares the uapi export by fixing the following error:
.../linux/smc_diag.h:6:27: fatal error: rdma/ib_verbs.h: No such file or directory
#include <rdma/ib_verbs.h>

Signed-off-by: Nicolas Dichtel
Signed-off-by: Masahiro Yamada
09 May, 2017
1 commit
05 May, 2017
1 commit
-
This field is causing excessive cache line bouncing.
There are spare bytes in the r_lock cache line, so the best approach
is to make it an rvt QP field and remove it from the hfi1 priv field.

Signed-off-by: Sebastian Sanchez
Signed-off-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford
04 May, 2017
1 commit
-
Pull rdma updates from Doug Ledford:
"More exhaustive description of primary updates in this release:

- Lots of driver fixes and misc fixes across the board.

- I had to base on a net-next tree because the IPoIB Accelerator
  patches needed it.

  Unfortunately, it was known to Mellanox that there would need to be
  an IPoIB accelerator patch to the net tree (which left some
  functions turned off by an #ifdef construct to avoid warnings about
  defined but unused functions), then one to the RDMA tree, then a
  fixup that went back and re-enabled the functions in the net tree
  and enabled their use in the rdma tree.

  Also, a sparse fix was sent to the net tree after I did my pull,
  and the fixup patch conflicts quite directly with that sparse fix,
  so I'm going to submit the fixup patch towards the end of the merge
  window by itself and based upon your master branch at the time.

- Two separate rounds of hfi1 fixes, one that got dropped from last
  release because it came in just a day or two before the end of the
  merge window and then the one from this release cycle.

  Of note is that I now have a third series that just landed from
  Intel yesterday. It is not included in this pull request, but I may
  submit it by the end of the week. I'll talk to Intel about
  improving the timing of their submissions for my workflow.

- Changes to our idr usage in the RDMA subsystem that will tie into
  our cgroup management and also into the upcoming changes for the
  RDMA kernel<->userspace API.

- Addition of support for a netdev to be tied to an RDMA device at
  the core level.

- Addition of the VNIC driver from Intel.

  While IPoIB provides IP over InfiniBand (and *only* IP, no lower
  layer protocol headers are allowed or supported), the VNIC driver
  presents a virtual Ethernet device with support for things like
  varying Ethertypes, VLANs, priorities and other features of
  Ethernet.

  The virtual devices are centrally managed by the OPA fabric
  manager, making this (for the time being) a strictly OPA specific
  feature.

- Improvements to the On-Demand Paging support in the RDMA subsystem.

- Addition of three significant OPA changes.

  While we added OPA support some time ago (via the hfi1 driver), the
  RDMA subsystem has so far glossed over the areas where OPA and
  InfiniBand differ.

  With this release we are starting to add support for the OPA
  extensions into the RDMA core in the following areas: extended port
  information for OPA is now supported, extended Address Handle
  attributes for OPA are now supported, and extended SA Queries to
  get OPA specific subnet information are now supported.

Concise summary from the tag:
- idr usage and locking changes
- build fix for hns
- ipoib debug path record file fix
- hfi1 updates
- core RDMA netdev addition
- Intel VNIC driver addition
- Enhanced accelerators for IPoIB addition
- Debug cleanups in cxgb3/4
- Trivial cleanups from SF Markus Elfring
- Misc rxe fixes from Mellanox
- Misc ipoib fixes from Mellanox
- Lots of mlx4/mlx5 changes from Mellanox
- Misc fixes across the RDMA subsystem
- ODP paging fixes and improvements
- qedr updates
- hfi1 updates
- OPA port info patches
- OPA AH patches
- OPA SA Query patches"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (191 commits)
infiniband: avoid dereferencing uninitialized dst on error path
IB/SA: Add OPA addr header
IB/mlx5: Add port_xmit_wait to counter registers read
IB/ocrdma: fix out of bounds access to local buffer
IB/mlx4: Fix incorrect order of formal and actual parameters
IB/mlx4: Change flush logic so it adheres to the variable name
mlx5: Fix mlx5_ib_map_mr_sg mr length
IB/rxe: Don't clamp residual length to mtu
IB/SA: Add support to query OPA path records
IB/SA: Add OPA path record type
IB/SA: Split struct sa_path_rec based on IB and ROCE specific fields
IB/SA: Introduce path record specific types
IB/SA: Rename ib_sa_path_rec to sa_path_rec
IB/CM: Add braces when using sizeof
IB/core: Define 'opa' rdma_ah_attr type
IB/core: Define 'ib' and 'roce' rdma_ah_attr types
IB/core: Use rdma_ah_attr accessor functions
IB/core: Add accessor functions for rdma_ah_attr fields
IB/PVRDMA: Rename ib_ah_attr related functions
IB/mthca: Rename to_ib_ah_attr to to_rdma_ah_attr
...
02 May, 2017
14 commits
-
When importing the patch 57520751445b (IB/SA: Add OPA path record type),
a new header file should have been added to the repo as part of the
patch. However, as the patch didn't apply cleanly using git am, I
instead used patch manually, and followed that up with git add -u, which
misses new files. This adds the new file back in.

Fixes: 57520751445b (IB/SA: Add OPA path record type)
Signed-off-by: Doug Ledford
-
When bit 26 of the capmask2 field in the OPA classport info
query is set, SA will query for OPA path records instead
of querying for IB path records. Note that OPA
path records can only be queried by kernel ULPs.
Userspace clients continue to query IB path records.

Reviewed-by: Don Hiatt
Reviewed-by: Ira Weiny
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
Add opa_sa_path_rec to the sa_path_rec data structure.
The 'type' field in sa_path_rec identifies the
type of the path record.

Reviewed-by: Don Hiatt
Reviewed-by: Ira Weiny
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
sa_path_rec now contains a union of sa_path_rec_ib and sa_path_rec_roce
based on the type of the path record. Note that fields applicable to
path record type ROCE v1 and ROCE v2 fall under sa_path_rec_roce.
Accessor functions are added to these fields so the caller doesn't have
to know the type.

Reviewed-by: Don Hiatt
Reviewed-by: Ira Weiny
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
struct sa_path_rec has a gid_type field. This patch introduces a more
generic path record specific type 'rec_type' which is either IB, ROCE v1
or ROCE v2. The patch also provides conversion functions to get
a gid type from a path record type and vice versa.

Reviewed-by: Don Hiatt
Reviewed-by: Ira Weiny
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
Rename ib_sa_path_rec to a more generic sa_path_rec.
This is part of extending ib_sa to also support OPA
path records in addition to the IB defined path records.

Reviewed-by: Don Hiatt
Reviewed-by: Ira Weiny
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
The OPA ah_attr type allows core components to specify
attributes that may be specific to OPA devices.
For instance, the OPA type ah_attr provides 32-bit LIDs,
enabling larger OPA fabric sizes.

Reviewed-by: Ira Weiny
Reviewed-by: Don Hiatt
Reviewed-by: Sean Hefty
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
rdma_ah_attr can now be either ib or roce, allowing
core components to use one type or the other and also
to define attributes unique to a specific type. struct
ib_ah is also initialized with the type when it's first
created. This ensures that calls such as modify_ah
don't modify the type of the address handle attribute.

Reviewed-by: Ira Weiny
Reviewed-by: Don Hiatt
Reviewed-by: Sean Hefty
Reviewed-by: Niranjana Vishwanathapura
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
These accessor functions are supposed to be used to get
and set individual fields of struct rdma_ah_attr.

Reviewed-by: Ira Weiny
Reviewed-by: Don Hiatt
Reviewed-by: Sean Hefty
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
Rename ib_destroy_ah to rdma_destroy_ah so it's in sync with the
rename of the ib address handle attribute.

Reviewed-by: Ira Weiny
Reviewed-by: Don Hiatt
Reviewed-by: Sean Hefty
Reviewed-by: Niranjana Vishwanathapura
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
Rename ib_query_ah to rdma_query_ah so it's in sync with the
rename of the ib address handle attribute.

Reviewed-by: Ira Weiny
Reviewed-by: Don Hiatt
Reviewed-by: Sean Hefty
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
Rename ib_modify_ah to rdma_modify_ah so it's in sync with the
rename of the ib address handle attribute.

Reviewed-by: Ira Weiny
Reviewed-by: Don Hiatt
Reviewed-by: Sean Hefty
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
Rename ib_create_ah to rdma_create_ah so it's in sync with the
rename of the ib address handle attribute.

Reviewed-by: Ira Weiny
Reviewed-by: Don Hiatt
Reviewed-by: Sean Hefty
Reviewed-by: Niranjana Vishwanathapura
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
This patch simply renames struct ib_ah_attr to
rdma_ah_attr as these fields specify attributes that are
not necessarily specific to IB.

Reviewed-by: Ira Weiny
Reviewed-by: Don Hiatt
Reviewed-by: Niranjana Vishwanathapura
Reviewed-by: Sean Hefty
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
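The address-handle commits in this group (the typed rdma_ah_attr, its accessor functions, and the 'opa' type with 32-bit LIDs) fit together roughly as follows. This is a minimal sketch under stated assumptions: every name below is an illustrative stand-in and the field set is invented for the example, so it should not be read as the kernel's actual definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins; not the kernel's definitions. */
enum ah_attr_type { AH_TYPE_IB, AH_TYPE_ROCE, AH_TYPE_OPA };

struct ah_attr {
	enum ah_attr_type type;       /* fixed when the handle is created */
	uint8_t sl;                   /* common to every type */
	union {
		struct { uint16_t dlid; } ib;
		struct { uint8_t dmac[6]; } roce;
		struct { uint32_t dlid; } opa;   /* 32-bit LID */
	};
};

/* Accessors hide which union member holds the field. */
static void ah_set_dlid(struct ah_attr *attr, uint32_t dlid)
{
	if (attr->type == AH_TYPE_OPA)
		attr->opa.dlid = dlid;
	else if (attr->type == AH_TYPE_IB)
		attr->ib.dlid = (uint16_t)dlid;  /* IB LIDs are 16 bits wide */
}

static uint32_t ah_get_dlid(const struct ah_attr *attr)
{
	if (attr->type == AH_TYPE_OPA)
		return attr->opa.dlid;
	if (attr->type == AH_TYPE_IB)
		return attr->ib.dlid;
	return 0;                         /* no LID in this sketch's RoCE case */
}

/* A modify operation that refuses to change the type of an existing handle. */
static bool ah_modify(struct ah_attr *ah, const struct ah_attr *new_attr)
{
	if (new_attr->type != ah->type)
		return false;
	*ah = *new_attr;
	return true;
}
```

Callers go through ah_set_dlid() and ah_get_dlid() without switching on the type themselves, and an attempt to modify an OPA handle with IB attributes is rejected rather than silently retyping it.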
29 Apr, 2017
7 commits
-
For OPA devices, SA will query the OPA classport info
instead of the IB defined classport info.
OPA classport info exposes additional information and
capabilities that are specific to OPA devices.

Reviewed-by: Ira Weiny
Reviewed-by: Don Hiatt
Reviewed-by: Dennis Dalessandro
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
Both opa_vnic and the hfi driver use the same opa_classport_info
definition. ib_sa will also be capable of querying OPA class
port info and will need this definition. Move it to ib_mad.h
for everyone to use.

Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
rdma_cap_opa_ah(..) enables core components to check if the
corresponding port supports OPA extended addressing.

Reviewed-by: Ira Weiny
Reviewed-by: Don Hiatt
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
SA will query and cache class port info as part of
its initialization. SA will also invalidate and
refresh the cache based on specific events. Callers such
as IPoIB and CM can query the SA to get the classportinfo
information. Apart from making the caller code much simpler,
this change puts the onus on the SA to query and maintain
classportinfo, much like how it maintains the address handle to the SM.

Reviewed-by: Ira Weiny
Reviewed-by: Don Hiatt
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Doug Ledford
-
Move FECN and BECN related defines to common header files.

Reviewed-by: Dennis Dalessandro
Signed-off-by: Don Hiatt
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford
-
These inline functions improve code readability by
enabling callers to read specific fields from the
header without knowledge of byte offsets.

Reviewed-by: Dennis Dalessandro
Signed-off-by: Don Hiatt
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford
-
The InfiniBand spec states "A multicast address is defined by a
MGID and a MLID" (section 10.5).

The current code only uses the MGID for identifying multicast groups.
Update the driver to be compliant with this definition.

Reviewed-by: Ira Weiny
Reviewed-by: Dasaratharaman Chandramouli
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford
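The MGID-plus-MLID definition quoted in the commit above can be sketched as a two-part lookup key. This is a minimal illustration, assuming a 16-byte MGID and a 16-bit MLID; the struct and function names are hypothetical and are not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key type: a multicast address is the pair (MGID, MLID). */
struct mcast_key {
	uint8_t  mgid[16];
	uint16_t mlid;
};

/* Two groups are the same only if both the MGID and the MLID match. */
static bool mcast_key_equal(const struct mcast_key *a,
			    const struct mcast_key *b)
{
	return memcmp(a->mgid, b->mgid, sizeof(a->mgid)) == 0 &&
	       a->mlid == b->mlid;
}
```

Comparing on the MGID alone would treat two entries with the same MGID but different MLIDs as one group, which is the behaviour the commit describes moving away from.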
26 Apr, 2017
4 commits
-
Add the IB_ACCESS_HUGETLB ib_reg_mr flag.
A hugetlb region registered with this flag
will use a single translation entry per huge page.

Signed-off-by: Artemy Kovalyov
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford
-
Currently ODP supports only regular MMU pages.
Add ODP support for regions consisting of physically contiguous chunks
of arbitrary order (huge pages for instance) to improve performance.

Signed-off-by: Artemy Kovalyov
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford
-
The page size is held by struct ib_umem in the page_size field.
It is better to store it as an exponent, because a page size is by nature
always a power of two and is used as a factor, a divisor or ilog2's argument.

Converting page_size to page_shift keeps the code portable
and avoids the following error while compiling on ARM:

ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!
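To illustrate why an exponent helps: on 32-bit ARM a 64-bit division by a runtime value compiles to a call to the __aeabi_uldivmod helper, which is the symbol the error above complains about, whereas shifts and masks need no helper. The sketch below uses hypothetical helper names and is not the code from the patch.

```c
#include <stdint.h>

/* Pages needed to cover len bytes: a shift replaces len / page_size. */
static uint64_t num_pages(uint64_t len, unsigned int page_shift)
{
	uint64_t page_size = 1ULL << page_shift;

	return (len + page_size - 1) >> page_shift;
}

/* Offset of addr within its page: a mask replaces addr % page_size. */
static uint64_t page_offset(uint64_t addr, unsigned int page_shift)
{
	return addr & ((1ULL << page_shift) - 1);
}
```

With page_shift = 12 (4 KiB pages), a length of 8193 bytes needs three pages and the address 0x1234 lies at offset 0x234 in its page.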
CC: Selvin Xavier
CC: Steve Wise
CC: Lijun Ou
CC: Shiraz Saleem
CC: Adit Ranadive
CC: Dennis Dalessandro
CC: Ram Amrani
Signed-off-by: Artemy Kovalyov
Signed-off-by: Leon Romanovsky
Acked-by: Ram Amrani
Acked-by: Shiraz Saleem
Acked-by: Selvin Xavier
Acked-by: Adit Ranadive
Signed-off-by: Doug Ledford
-
The function ib_unregister_mad_agent always returns zero, and
the returned value is not checked. As such, change the return
type to void.

CC: Joe Jin
CC: Junxiao Bi
Signed-off-by: Zhu Yanjun
Reviewed-by: Yuval Shaia
Reviewed-by: Hal Rosenstock
Signed-off-by: Doug Ledford
22 Apr, 2017
2 commits
-
Add high data rate speed to the ib_port_speed enumeration.
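As a rough picture of what adding a speed to such an enumeration involves, the sketch below models port speeds as one-bit-per-speed values, with HDR taking the next free bit. The enum is a local stand-in with hypothetical names; the numeric values follow the pattern I believe the kernel enumeration uses and should be treated as assumptions.

```c
/* Local stand-in for a port speed enumeration; values are assumptions. */
enum port_speed {
	SPEED_SDR   = 1,
	SPEED_DDR   = 2,
	SPEED_QDR   = 4,
	SPEED_FDR10 = 8,
	SPEED_FDR   = 16,
	SPEED_EDR   = 32,
	SPEED_HDR   = 64    /* the newly added high data rate entry */
};

/* Map a speed value to a printable name. */
static const char *speed_name(enum port_speed s)
{
	switch (s) {
	case SPEED_SDR:   return "SDR";
	case SPEED_DDR:   return "DDR";
	case SPEED_QDR:   return "QDR";
	case SPEED_FDR10: return "FDR10";
	case SPEED_FDR:   return "FDR";
	case SPEED_EDR:   return "EDR";
	case SPEED_HDR:   return "HDR";
	}
	return "unknown";
}
```

Code that switches over the speed, as speed_name() does here, needs a new case when an entry is added; otherwise the new speed falls through to the default.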
Signed-off-by: Noa Osherovich
Signed-off-by: Eran Ben Elisha
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford
-
This flow steering specification identifies flows to be dropped by the HW.
If a user creates a flow with only the drop specification,
then all the packets that hit this flow will be dropped; otherwise the HW
will drop only the packets that match the other L2/L3/L4 specifications.

Signed-off-by: Slava Shwartsman
Reviewed-by: Maor Gottlieb
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford
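The drop semantics described in the commit above can be modelled in a few lines. This is a software sketch only, with hypothetical spec and packet types; the real matching is done by the hardware and the verbs structures look different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified flow specifications. */
enum spec_type { SPEC_ETH_TYPE, SPEC_DST_PORT, SPEC_DROP };

struct spec   { enum spec_type type; uint16_t value; };
struct packet { uint16_t eth_type; uint16_t dst_port; };

/* Does one match specification accept the packet? */
static bool spec_matches(const struct spec *s, const struct packet *p)
{
	switch (s->type) {
	case SPEC_ETH_TYPE: return p->eth_type == s->value;
	case SPEC_DST_PORT: return p->dst_port == s->value;
	default:            return true;  /* SPEC_DROP is an action, not a match */
	}
}

/* A packet is dropped if the flow carries a drop spec and the packet
 * matches every other spec; with no other specs, everything matches. */
static bool flow_drops(const struct spec *specs, size_t n,
		       const struct packet *p)
{
	bool has_drop = false;

	for (size_t i = 0; i < n; i++) {
		if (specs[i].type == SPEC_DROP)
			has_drop = true;
		else if (!spec_matches(&specs[i], p))
			return false;
	}
	return has_drop;
}
```

A flow holding only the drop spec drops every packet that reaches it, while a flow combining the drop spec with a destination-port spec drops only packets sent to that port.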
21 Apr, 2017
5 commits
-
Add RDMA netdev interface to ib device structure allowing RDMA
netdev devices to be allocated by ib clients.

The idea is to allow providers to optimize the IPoIB data path.
A new struct that includes functions and data members is exposed.
It exposes a set of callback functions for handling data path flows
in the IPoIB driver.

Each provider can support this set of functions in order
to optimize its specific data path, and let IPoIB leverage
its data path.

There is an assumption that providers should give the full set
of functions and not only part of them, in order to work properly.

Signed-off-by: Erez Shitrit
Signed-off-by: Niranjana Vishwanathapura
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford
-
HFI1 HW specific support for VNIC functionality.
Dynamically allocate a set of contexts for VNIC when the first vnic
port is instantiated. Allocate VNIC contexts from user contexts pool
and return them back to the same pool while freeing up. Set aside
enough MSI-X interrupts for VNIC contexts and assign them when the
contexts are allocated. On the receive side, use an RSM rule to
spread TCP/UDP streams among VNIC contexts.

Reviewed-by: Dennis Dalessandro
Reviewed-by: Ira Weiny
Signed-off-by: Niranjana Vishwanathapura
Signed-off-by: Andrzej Kacprowski
Signed-off-by: Doug Ledford
-
Define OPA VNIC interface between hardware independent VNIC
functionality and the hardware dependent VNIC functionality.

Reviewed-by: Dennis Dalessandro
Reviewed-by: Ira Weiny
Signed-off-by: Niranjana Vishwanathapura
Signed-off-by: Doug Ledford
-
Add rdma netdev interface to ib device structure allowing rdma netdev
devices to be allocated by ib clients.

Reviewed-by: Dennis Dalessandro
Reviewed-by: Ira Weiny
Signed-off-by: Niranjana Vishwanathapura
Signed-off-by: Doug Ledford
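The RDMA netdev commits in this group describe a set of provider callbacks that must be supplied as a whole. A minimal sketch of that "all or nothing" expectation is below; the struct, its members and the stub functions are hypothetical names chosen for illustration and do not reproduce the kernel's interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback table a provider would fill in. */
struct netdev_ops_sketch {
	int (*send)(void *dev, const void *buf, size_t len);
	int (*attach_mcast)(void *dev, const unsigned char *mgid);
	int (*detach_mcast)(void *dev, const unsigned char *mgid);
};

/* Accept a provider only if every callback is present. */
static bool ops_complete(const struct netdev_ops_sketch *ops)
{
	return ops && ops->send && ops->attach_mcast && ops->detach_mcast;
}

/* Stub callbacks standing in for a provider's data path. */
static int stub_send(void *dev, const void *buf, size_t len)
{
	(void)dev; (void)buf; (void)len;
	return 0;
}

static int stub_mcast(void *dev, const unsigned char *mgid)
{
	(void)dev; (void)mgid;
	return 0;
}
```

A table populated with stub_send and stub_mcast for all three members passes ops_complete(), while one that sets only send is rejected, which mirrors the stated assumption that a partial set of functions will not work properly.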
20 Apr, 2017
1 commit
-
We rename the "write" flags to "exclusive", as it's used for both
WRITE and DESTROY actions.

Fixes: 3832125624b7 ('IB/core: Add support for idr types')
Signed-off-by: Matan Barak
Reviewed-by: Sean Hefty
Signed-off-by: Doug Ledford
06 Apr, 2017
1 commit
-
Add ability to fault packets on transmit by opcode.
Dropping by packet can be achieved by setting the mask to 0.

In order to drop non-verbs traffic we set PbcInsertHrc
to NONE (0x2). The packet will still be delivered to
the receiving node but a KHdrHCRCErr (KDETH packet
with a bad HCRC) will be triggered and the packet will
not be delivered to the correct context.

In order to drop regular verbs traffic we set the
PbcTestEbp flag. The packet will still be delivered
to the receiving node but a 'late ebp error' will
be triggered and the packet will be dropped.

A global toggle (/sys/kernel/debug/hfi1/hfi1_X/fault_suppress_err)
has been added to suppress the error messages on the receive
node when a packet was faulted on the sending node.

Reviewed-by: Dennis Dalessandro
Signed-off-by: Mike Marciniszyn
Signed-off-by: Don Hiatt
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford
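One plausible reading of the opcode-and-mask selection in the commit above is a masked comparison, sketched below. The function name and the exact comparison are assumptions made for illustration; the hfi1 driver's actual check may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* A packet is selected for faulting when its opcode equals the configured
 * opcode on the bits set in mask. With mask == 0 no bits are compared,
 * so every packet is selected. */
static bool fault_selects(uint8_t opcode, uint8_t fault_opcode, uint8_t mask)
{
	return (opcode & mask) == (fault_opcode & mask);
}
```

Under this reading a full mask of 0xff targets exactly one opcode, and a mask of 0 selects every packet regardless of opcode, which matches the commit's note that dropping by packet is achieved by setting the mask to 0.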