24 Feb, 2020
1 commit
-
[ Upstream commit 6b57cea9221b0247ad5111b348522625e489a8e4 ]
Currently when the low level driver notifies Pkey, GID, and port change
events they are notified to the registered handlers in the order they are
registered.IB core and other ULPs such as IPoIB are interested in GID, LID, Pkey
change events.Since all GID queries done by ULPs are serviced by IB core, and the IB
core deferes cache updates to a work queue, it is possible for other
clients to see stale cache data when they handle their own events.For example, the below call tree shows how ipoib will call
rdma_query_gid() concurrently with the update to the cache sitting in the
WQ.mlx5_ib_handle_event()
ib_dispatch_event()
ib_cache_event()
queue_work() -> slow cache update[..]
ipoib_event()
queue_work()
[..]
work handler
ipoib_ib_dev_flush_light()
__ipoib_ib_dev_flush()
ipoib_dev_addr_changed_valid()
rdma_query_gid()
Signed-off-by: Leon Romanovsky
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
15 Feb, 2020
1 commit
-
commit ca95c1411198c2d87217c19d44571052cdc94725 upstream.
Verify that MR access flags that are passed from user are all supported
ones, otherwise an error is returned.Fixes: 4fca03778351 ("IB/uverbs: Move ib_access_flags and ib_read_counters_flags to uapi")
Link: https://lore.kernel.org/r/1578506740-22188-6-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik
Signed-off-by: Yishai Hadas
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman
18 Dec, 2019
1 commit
-
commit ecdfdfdbe4d4c74029f2b416b7ee6d0aeb56364a upstream.
If dev->dma_device->params == NULL then the maximum DMA segment size is 64
KB. See also the dma_get_max_seg_size() implementation. This patch fixes
the following kernel warning:DMA-API: infiniband rxe0: mapping sg segment longer than device claims to support [len=126976] [max=65536]
WARNING: CPU: 4 PID: 4848 at kernel/dma/debug.c:1220 debug_dma_map_sg+0x3d9/0x450
RIP: 0010:debug_dma_map_sg+0x3d9/0x450
Call Trace:
srp_queuecommand+0x626/0x18d0 [ib_srp]
scsi_queue_rq+0xd02/0x13e0 [scsi_mod]
__blk_mq_try_issue_directly+0x2b3/0x3f0
blk_mq_request_issue_directly+0xac/0xf0
blk_insert_cloned_request+0xdf/0x170
dm_mq_queue_rq+0x43d/0x830 [dm_mod]
__blk_mq_try_issue_directly+0x2b3/0x3f0
blk_mq_request_issue_directly+0xac/0xf0
blk_mq_try_issue_list_directly+0xb8/0x170
blk_mq_sched_insert_requests+0x23c/0x3b0
blk_mq_flush_plug_list+0x529/0x730
blk_flush_plug_list+0x21f/0x260
blk_mq_make_request+0x56b/0xf20
generic_make_request+0x196/0x660
submit_bio+0xae/0x290
blkdev_direct_IO+0x822/0x900
generic_file_direct_write+0x110/0x200
__generic_file_write_iter+0x124/0x2a0
blkdev_write_iter+0x168/0x270
aio_write+0x1c4/0x310
io_submit_one+0x971/0x1390
__x64_sys_io_submit+0x12a/0x390
do_syscall_64+0x6f/0x2e0
entry_SYSCALL_64_after_hwframe+0x49/0xbeLink: https://lore.kernel.org/r/20191025225830.257535-2-bvanassche@acm.org
Cc:
Fixes: 0b5cb3300ae5 ("RDMA/srp: Increase max_segment_size")
Signed-off-by: Bart Van Assche
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman
23 Oct, 2019
1 commit
-
The issue is in drivers/infiniband/core/uverbs_std_types_cq.c in the
UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE) function. We check that:if (attr.comp_vector >= attrs->ufile->device->num_comp_vectors) {
But we don't check if "attr.comp_vector" is negative. It could
potentially lead to an array underflow. My concern would be where
cq->vector is used in the create_cq() function from the cxgb4 driver.And really "attr.comp_vector" is appears as a u32 to user space so that's
the right type to use.Fixes: 9ee79fce3642 ("IB/core: Add completion queue (cq) object actions")
Link: https://lore.kernel.org/r/20191011133419.GA22905@mwanda
Signed-off-by: Dan Carpenter
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
22 Sep, 2019
2 commits
-
Pull RDMA subsystem updates from Jason Gunthorpe:
"This cycle mainly saw lots of bug fixes and clean up code across the
core code and several drivers, few new functional changes were made.- Many cleanup and bug fixes for hns
- Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed,
bnxt_re, efa- Share the query_port code between all the iWarp drivers
- General rework and cleanup of the ODP MR umem code to fit better
with the mmu notifier get/put scheme- Support rdma netlink in non init_net name spaces
- mlx5 support for XRC devx and DC ODP"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits)
RDMA: Fix double-free in srq creation error flow
RDMA/efa: Fix incorrect error print
IB/mlx5: Free mpi in mp_slave mode
IB/mlx5: Use the original address for the page during free_pages
RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp"
RDMA/hns: Package operations of rq inline buffer into separate functions
RDMA/hns: Optimize cmd init and mode selection for hip08
IB/hfi1: Define variables as unsigned long to fix KASAN warning
IB/{rdmavt, hfi1, qib}: Add a counter for credit waits
IB/hfi1: Add traces for TID RDMA READ
RDMA/siw: Relax from kmap_atomic() use in TX path
IB/iser: Support up to 16MB data transfer in a single command
RDMA/siw: Fix page address mapping in TX path
RDMA: Fix goto target to release the allocated memory
RDMA/usnic: Avoid overly large buffers on stack
RDMA/odp: Add missing cast for 32 bit
RDMA/hns: Use devm_platform_ioremap_resource() to simplify code
Documentation/infiniband: update name of some functions
RDMA/cma: Fix false error message
RDMA/hns: Fix wrong assignment of qp_access_flags
... -
Pull hmm updates from Jason Gunthorpe:
"This is more cleanup and consolidation of the hmm APIs and the very
strongly related mmu_notifier interfaces. Many places across the tree
using these interfaces are touched in the process. Beyond that a
cleanup to the page walker API and a few memremap related changes
round out the series:- General improvement of hmm_range_fault() and related APIs, more
documentation, bug fixes from testing, API simplification &
consolidation, and unused API removal- Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE,
and make them internal kconfig selects- Hoist a lot of code related to mmu notifier attachment out of
drivers by using a refcount get/put attachment idiom and remove the
convoluted mmu_notifier_unregister_no_release() and related APIs.- General API improvement for the migrate_vma API and revision of its
only user in nouveau- Annotate mmu_notifiers with lockdep and sleeping region debugging
Two series unrelated to HMM or mmu_notifiers came along due to
dependencies:- Allow pagemap's memremap_pages family of APIs to work without
providing a struct device- Make walk_page_range() and related use a constant structure for
function pointers"* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits)
libnvdimm: Enable unit test infrastructure compile checks
mm, notifier: Catch sleeping/blocking for !blockable
kernel.h: Add non_block_start/end()
drm/radeon: guard against calling an unpaired radeon_mn_unregister()
csky: add missing brackets in a macro for tlb.h
pagewalk: use lockdep_assert_held for locking validation
pagewalk: separate function pointers from iterator data
mm: split out a new pagewalk.h header from mm.h
mm/mmu_notifiers: annotate with might_sleep()
mm/mmu_notifiers: prime lockdep
mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end
mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports
mm/hmm: hmm_range_fault() infinite loop
mm/hmm: hmm_range_fault() NULL pointer bug
mm/hmm: fix hmm_range_fault()'s handling of swapped out pages
mm/mmu_notifiers: remove unregister_no_release
RDMA/odp: remove ib_ucontext from ib_umem
RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'
RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
RDMA/mlx5: Use ib_umem_start instead of umem.address
...
14 Sep, 2019
2 commits
-
This patch adds a counter for credit waits to assist field debugging.
Link: https://lore.kernel.org/r/20190911113047.126040.10857.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn
Signed-off-by: Kaike Wan
Signed-off-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe -
To resolve dependencies in following patches
mlx5_ib.h conflict resolved by keeing both hunks
Linux 5.3-rc8
Signed-off-by: Jason Gunthorpe
22 Aug, 2019
9 commits
-
At this point the ucontext is only being stored to access the ib_device,
so just store the ib_device directly instead. This is more natural and
logical as the umem has nothing to do with the ucontext.Link: https://lore.kernel.org/r/20190806231548.25242-8-jgg@ziepe.ca
Signed-off-by: Jason Gunthorpe -
This is a significant simplification, no extra list is kept per FD, and
the interval tree is now shared between all the ucontexts, reducing
overhead if there are multiple ucontexts active.Link: https://lore.kernel.org/r/20190806231548.25242-7-jgg@ziepe.ca
Signed-off-by: Jason Gunthorpe -
Jason Gunthorpe says:
====================
This is a collection of general cleanups for ODP to clarify some of the
flows around umem creation and use of the interval tree.
====================The branch is based on v5.3-rc5 due to dependencies
* odp_fixes:
RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
RDMA/mlx5: Use ib_umem_start instead of umem.address
RDMA/core: Make invalidate_range a device operation
RDMA/odp: Use kvcalloc for the dma_list and page_list
RDMA/odp: Check for overflow when computing the umem_odp end
RDMA/odp: Provide ib_umem_odp_release() to undo the allocs
RDMA/odp: Split creating a umem_odp from ib_umem_get
RDMA/odp: Make the three ways to create a umem_odp clear
RMDA/odp: Consolidate umem_odp initialization
RDMA/odp: Make it clearer when a umem is an implicit ODP umem
RDMA/odp: Iterate over the whole rbtree directly
RDMA/odp: Use the common interval tree library instead of generic
RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLBSigned-off-by: Jason Gunthorpe
-
The callback function 'invalidate_range' is implemented in a driver so the
place for it is in the ib_device_ops structure and not in ib_ucontext.Signed-off-by: Moni Shoua
Reviewed-by: Guy Levi
Reviewed-by: Jason Gunthorpe
Link: https://lore.kernel.org/r/20190819111710.18440-11-leon@kernel.org
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe -
Since the page size can be extended in the ODP case by IB_ACCESS_HUGETLB
the existing overflow checks done by ib_umem_get() are not
sufficient. Check for overflow again.Further, remove the unchecked math from the inlines and just use the
precomputed value stored in the interval_tree_node.Link: https://lore.kernel.org/r/20190819111710.18440-9-leon@kernel.org
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe -
This is the last creation API that is overloaded for both, there is very
little code sharing and a driver has to be specifically ready for a
umem_odp to be created to use the odp version.Link: https://lore.kernel.org/r/20190819111710.18440-7-leon@kernel.org
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe -
The three paths to build the umem_odps are kind of muddled, they are:
- As a normal ib_mr umem
- As a child in an implicit ODP umem tree
- As the root of an implicit ODP umem treeOnly the first two are actually umem's, the last is an abuse.
The implicit case can only be triggered by explicit driver request, it
should never be co-mingled with the normal case. While we are here, make
sensible function names and add some comments to make this clearer.Link: https://lore.kernel.org/r/20190819111710.18440-6-leon@kernel.org
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe -
Implicit ODP umems are special, they don't have any page lists, they don't
exist in the interval tree and they are never DMA mapped.Instead of trying to guess this based on a zero length use an explicit
flag.Further, do not allow non-implicit umems to be 0 size.
Link: https://lore.kernel.org/r/20190819111710.18440-4-leon@kernel.org
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe -
ODP is working with userspace VA's in the interval tree which always fit
into an unsigned long, so we can use the common code.This comes at a cost of a 16 byte increase in ib_umem_odp struct size due
to storing the interval tree start/last in addition to the umem
addr/length. However these values were computed and are performance
critical for the interval lookup, so this seems like a worthwhile trade
off.Removes 2k of .text from the kernel.
Link: https://lore.kernel.org/r/20190819111710.18440-2-leon@kernel.org
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
21 Aug, 2019
2 commits
-
task_active_pid_ns() is wrong API to check PID namespace because it
posses some restrictions and return PID namespace where the process
was allocated. It created mismatches with current namespace, which
can be different.Rewrite whole rdma_is_visible_in_pid_ns() logic to provide reliable
results without any relation to allocated PID namespace.Fixes: 8be565e65fa9 ("RDMA/nldev: Factor out the PID namespace check")
Fixes: 6a6c306a09b5 ("RDMA/restrack: Make is_visible_in_pid_ns() as an API")
Reviewed-by: Mark Zhang
Signed-off-by: Leon Romanovsky
Link: https://lore.kernel.org/r/20190815083834.9245-4-leon@kernel.org
Signed-off-by: Doug Ledford -
There is no need to keep DEBUG defines for out-of-the tree testing.
Signed-off-by: Leon Romanovsky
Link: https://lore.kernel.org/r/20190819114547.20704-1-leon@kernel.org
Signed-off-by: Doug Ledford
12 Aug, 2019
1 commit
-
In order to improve readability, add ib_port_phys_state enum to replace
the use of magic numbers.Signed-off-by: Kamal Heib
Reviewed-by: Andrew Boyer
Acked-by: Michal Kalderon
Acked-by: Bernard Metzler
Link: https://lore.kernel.org/r/20190807103138.17219-2-kamalheib1@gmail.com
Signed-off-by: Doug Ledford
06 Aug, 2019
1 commit
-
Add ratelimited helpers to the ibdev_* printk functions.
Implementation inspired by counterpart dev_*_ratelimited functions.Signed-off-by: Gal Pressman
Reviewed-by: Leon Romanovsky
Link: https://lore.kernel.org/r/20190801171447.54440-2-galpress@amazon.com
Signed-off-by: Doug Ledford
05 Aug, 2019
1 commit
-
Send and Receive completion is handled on a single CPU selected at
the time each Completion Queue is allocated. Typically this is when
an initiator instantiates an RDMA transport, or when a target
accepts an RDMA connection.Some ULPs cannot open a connection per CPU to spread completion
workload across available CPUs and MSI vectors. For such ULPs,
provide an API that allows the RDMA core to select a completion
vector based on the device's complement of available comp_vecs.ULPs that invoke ib_alloc_cq() with only comp_vector 0 are converted
to use the new API so that their completion workloads interfere less
with each other.Suggested-by: Håkon Bugge
Signed-off-by: Chuck Lever
Reviewed-by: Leon Romanovsky
Cc:
Cc:
Link: https://lore.kernel.org/r/20190729171923.13428.52555.stgit@manet.1015granger.net
Signed-off-by: Doug Ledford
01 Aug, 2019
2 commits
-
Due to the complexity of client->remove() callbacks it is desirable to not
hold any locks while calling them. Remove the last one by tracking only
the highest client ID and running backwards from there over the xarray.Since the only purpose of that lock was to protect the linked list, we can
drop the lock.Signed-off-by: Jason Gunthorpe
Signed-off-by: Leon Romanovsky
Link: https://lore.kernel.org/r/20190731081841.32345-3-leon@kernel.org
Signed-off-by: Doug Ledford -
lockdep reports:
WARNING: possible circular locking dependency detected
modprobe/302 is trying to acquire lock:
0000000007c8919c ((wq_completion)ib_cm){+.+.}, at: flush_workqueue+0xdf/0x990but task is already holding lock:
000000002d3d2ca9 (&device->client_data_rwsem){++++}, at: remove_client_context+0x79/0xd0 [ib_core]which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&device->client_data_rwsem){++++}:
down_read+0x3f/0x160
ib_get_net_dev_by_params+0xd5/0x200 [ib_core]
cma_ib_req_handler+0x5f6/0x2090 [rdma_cm]
cm_process_work+0x29/0x110 [ib_cm]
cm_req_handler+0x10f5/0x1c00 [ib_cm]
cm_work_handler+0x54c/0x311d [ib_cm]
process_one_work+0x4aa/0xa30
worker_thread+0x62/0x5b0
kthread+0x1ca/0x1f0
ret_from_fork+0x24/0x30-> #1 ((work_completion)(&(&work->work)->work)){+.+.}:
process_one_work+0x45f/0xa30
worker_thread+0x62/0x5b0
kthread+0x1ca/0x1f0
ret_from_fork+0x24/0x30-> #0 ((wq_completion)ib_cm){+.+.}:
lock_acquire+0xc8/0x1d0
flush_workqueue+0x102/0x990
cm_remove_one+0x30e/0x3c0 [ib_cm]
remove_client_context+0x94/0xd0 [ib_core]
disable_device+0x10a/0x1f0 [ib_core]
__ib_unregister_device+0x5a/0xe0 [ib_core]
ib_unregister_device+0x21/0x30 [ib_core]
mlx5_ib_stage_ib_reg_cleanup+0x9/0x10 [mlx5_ib]
__mlx5_ib_remove+0x3d/0x70 [mlx5_ib]
mlx5_ib_remove+0x12e/0x140 [mlx5_ib]
mlx5_remove_device+0x144/0x150 [mlx5_core]
mlx5_unregister_interface+0x3f/0xf0 [mlx5_core]
mlx5_ib_cleanup+0x10/0x3a [mlx5_ib]
__x64_sys_delete_module+0x227/0x350
do_syscall_64+0xc3/0x6a4
entry_SYSCALL_64_after_hwframe+0x49/0xbeWhich is due to the read side of the client_data_rwsem being obtained
recursively through a work queue flush during cm client removal.The lock is being held across the remove in remove_client_context() so
that the function is a fence, once it returns the client is removed. This
is required so that the two callers do not proceed with destruction until
the client completes removal.Instead of using client_data_rwsem use the existing device unregistration
refcount and add a similar client unregistration (client->uses) refcount.This will fence the two unregistration paths without holding any locks.
Cc:
Fixes: 921eab1143aa ("RDMA/devices: Re-organize device.c locking")
Signed-off-by: Jason Gunthorpe
Signed-off-by: Leon Romanovsky
Link: https://lore.kernel.org/r/20190731081841.32345-2-leon@kernel.org
Signed-off-by: Doug Ledford
30 Jul, 2019
1 commit
-
The fix for IB port statistics initialization ("IB/core: Fix querying
total rdma stats") is needed before we take a follow-on patch to
for-next.Signed-off-by: Doug Ledford
26 Jul, 2019
2 commits
-
Now that IB core supports RDMA device binding with specific net namespace,
enable IB core to accept netlink commands in non init_net namespaces.This is done by having per net namespace netlink socket.
At present only netlink device handling client RDMA_NL_NLDEV supports
device handling in multiple net namespaces. Hence do not accept netlink
messages for other clients in non init_net net namespaces.Link: https://lore.kernel.org/r/20190723070205.6247-1-leon@kernel.org
Signed-off-by: Parav Pandit
Signed-off-by: Leon Romanovsky
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe -
So that rdma can work with CONFIG_KERNEL_HEADER_TEST and
CONFIG_HEADERS_CHECK.Link: https://lore.kernel.org/r/20190722170126.GA16453@ziepe.ca
Signed-off-by: Jason Gunthorpe
23 Jul, 2019
1 commit
-
When an OPFN request is flushed, the request is completed without
unreserving itself from the send queue. Subsequently, when a new
request is post sent, the following warning will be triggered:WARNING: CPU: 4 PID: 8130 at rdmavt/qp.c:1761 rvt_post_send+0x72a/0x880 [rdmavt]
Call Trace:
[] dump_stack+0x19/0x1b
[] __warn+0xd8/0x100
[] warn_slowpath_null+0x1d/0x20
[] rvt_post_send+0x72a/0x880 [rdmavt]
[] ? account_entity_dequeue+0xae/0xd0
[] ? __kmalloc+0x55/0x230
[] ib_uverbs_post_send+0x37c/0x5d0 [ib_uverbs]
[] ? rdma_lookup_put_uobject+0x26/0x60 [ib_uverbs]
[] ib_uverbs_write+0x286/0x460 [ib_uverbs]
[] ? security_file_permission+0x27/0xa0
[] vfs_write+0xc0/0x1f0
[] SyS_write+0x7f/0xf0
[] system_call_fastpath+0x22/0x27This patch fixes the problem by moving rvt_qp_wqe_unreserve() into
rvt_qp_complete_swqe() to simplify the code and make it less
error-prone.Fixes: ca95f802ef51 ("IB/hfi1: Unreserve a reserved request when it is completed")
Link: https://lore.kernel.org/r/20190715164528.74174.31364.stgit@awfm-01.aw.intel.com
Cc:
Reviewed-by: Mike Marciniszyn
Reviewed-by: Dennis Dalessandro
Signed-off-by: Kaike Wan
Signed-off-by: Mike Marciniszyn
Signed-off-by: Jason Gunthorpe
09 Jul, 2019
3 commits
-
5.4-rc1 will have new compile time debugging to test that headers can be
compiled stand alone. Many rdma headers are already broken and excluded
from the mechanism, however to avoid compile failures during the merge
window fix enough so that the newly added header compiles clean.Fixes: 413d3347503b ("RDMA/counter: Add set/clear per-port auto mode support")
Reported-by: Stephen Rothwell
Signed-off-by: Jason Gunthorpe
Signed-off-by: Mark Zhang -
Added the interface in the infiniband driver that applies the rdma_dim
adaptive moderation. There is now a special function for allocating an
ib_cq that uses rdma_dim.Performance improvement (ConnectX-5 100GbE, x86) running FIO benchmark over
NVMf between two equal end-hosts with 56 cores across a Mellanox switch
using null_blk device:READS without DIM:
blk size | BW | IOPS | 99th percentile latency | 99.99th latency
512B | 3.8GiB/s | 7.7M | 1401 usec | 2442 usec
4k | 7.0GiB/s | 1.8M | 4817 usec | 6587 usec
64k | 10.7GiB/s| 175k | 9896 usec | 10028 usecIO WRITES without DIM:
blk size | BW | IOPS | 99th percentile latency | 99.99th latency
512B | 3.6GiB/s | 7.5M | 1434 usec | 2474 usec
4k | 6.3GiB/s | 1.6M | 938 usec | 1221 usec
64k | 10.7GiB/s| 175k | 8979 usec | 12780 usecIO READS with DIM:
blk size | BW | IOPS | 99th percentile latency | 99.99th latency
512B | 4GiB/s | 8.2M | 816 usec | 889 usec
4k | 10.1GiB/s| 2.65M| 3359 usec | 5080 usec
64k | 10.7GiB/s| 175k | 9896 usec | 10028 usecIO WRITES with DIM:
blk size | BW | IOPS | 99th percentile latency | 99.99th latency
512B | 3.9GiB/s | 8.1M | 799 usec | 922 usec
4k | 9.6GiB/s | 2.5M | 717 usec | 1004 usec
64k | 10.7GiB/s| 176k | 8586 usec | 12256 usecThe rdma_dim algorithm was designed to measure the effectiveness of
moderation on the flow in a general way and thus should be appropriate
for all RDMA storage protocols.rdma_dim is configured to be the default option based on performance
improvement seen after extensive tests.Signed-off-by: Yamin Friedman
Reviewed-by: Max Gurtovoy
Reviewed-by: Sagi Grimberg
Signed-off-by: Saeed Mahameed
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe -
Userspace expects the IB_TM_CAP_RC bit to indicate that the device
supports RC transport tag matching with rendezvous offload. However the
firmware splits this into two capabilities for eager and rendezvous tag
matching.Only if the FW supports both modes should userspace be told the tag
matching capability is available.Cc: # 4.13
Fixes: eb761894351d ("IB/mlx5: Fill XRQ capabilities")
Signed-off-by: Danit Goldberg
Reviewed-by: Yishai Hadas
Reviewed-by: Artemy Kovalyov
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
05 Jul, 2019
8 commits
-
This patch adds the ability to return the hwstats of per-port default
counters (which can also be queried through sysfs nodes).Signed-off-by: Mark Zhang
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe -
Provide an option to allow users to manually bind a qp with a counter
through RDMA netlink. Limit it to users with ADMIN capability only.Signed-off-by: Mark Zhang
Reviewed-by: Majd Dibbiny
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe -
In manual mode a QP is bound to a counter manually. If counter is not
specified then a new one will be allocated.Manual mode is enabled when user binds a QP, and disabled when the last
manually bound QP is unbound.When auto-mode is turned off and there are counters left, manual mode is
enabled so that the user is able to access these counters.Signed-off-by: Mark Zhang
Reviewed-by: Majd Dibbiny
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe -
Since a QP can only be bound to one counter, then if it is bound to a
separate counter, for backward compatibility purpose, the statistic value
must be:
* stat of default counter
+ stat of all running allocated counters
+ stat of all deallocated counters (history stats)Signed-off-by: Mark Zhang
Reviewed-by: Majd Dibbiny
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe -
This patch adds the ability to return all available counters together with
their properties and hwstats.Signed-off-by: Mark Zhang
Reviewed-by: Majd Dibbiny
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe -
In auto mode all QPs belong to one category are bind automatically to a
single counter set. Currently only "qp type" is supported.In this mode the qp counter is set in RST2INIT modification, and when a qp
is destroyed the counter is unbound.Signed-off-by: Mark Zhang
Reviewed-by: Majd Dibbiny
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe -
Add an API to support set/clear per-port auto mode.
Signed-off-by: Mark Zhang
Reviewed-by: Majd Dibbiny
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe -
Introduce statistic counter as a new resource. It allows a user to monitor
specific objects (e.g., QPs) by binding to a counter.In some cases a user counter resource is created with task other then
"current", because its creation is done as part of rdmatool call.Signed-off-by: Mark Zhang
Reviewed-by: Majd Dibbiny
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
29 Jun, 2019
1 commit
-
Add some helper functions to hide struct rvt_swqe details.
Reviewed-by: Mike Marciniszyn
Signed-off-by: Michael J. Ruhl
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe