Eric Lee / smarc-fsl-linux-kernel

03 Jul, 2018

2 commits

964705c4a IB/hfi1: Optimize kthread pointer locking when queuing CQ entries ... Browse Code »

commit af8aab71370a692eaf7e7969ba5b1a455ac20113 upstream.

All threads queuing CQ entries on different CQs are unnecessarily
synchronized by a spin lock to check if the CQ kthread worker hasn't
been destroyed before queuing an CQ entry.

The lock used in 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a
destroyed cq kthread worker") is a device global lock and will have
poor performance at scale as completions are entered from a large
number of CPUs.

Convert to use RCU where the read side of RCU is rvt_cq_enter() to
determine that the worker is alive prior to triggering the
completion event.
Apply write side RCU semantics in rvt_driver_cq_init() and
rvt_cq_exit().

Fixes: 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker")
Cc: # 4.14.x
Reviewed-by: Mike Marciniszyn
Signed-off-by: Sebastian Sanchez
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford
Signed-off-by: Greg Kroah-Hartman

Sebastian Sanchez
2018-07-03 17:24:54 +0800
96fb9b883 IB/core: Make testing MR flags for writability a static inline function ... Browse Code »

commit 08bb558ac11ab944e0539e78619d7b4c356278bd upstream.

Make the MR writability flags check, which is performed in umem.c,
a static inline function in file ib_verbs.h

This allows the function to be used by low-level infiniband drivers.

Cc:
Signed-off-by: Jason Gunthorpe
Signed-off-by: Jack Morgenstein
Signed-off-by: Leon Romanovsky
Signed-off-by: Greg Kroah-Hartman

Jack Morgenstein
2018-07-03 17:24:53 +0800

30 May, 2018

1 commit

a59bd8195 IB/umem: Use the correct mm during ib_umem_release ... Browse Code »

commit 8e907ed4882714fd13cfe670681fc6cb5284c780 upstream.

User-space may invoke ibv_reg_mr and ibv_dereg_mr in different threads.

If ibv_dereg_mr is called after the thread which invoked ibv_reg_mr has
exited, get_pid_task will return NULL and ib_umem_release will not
decrease mm->pinned_vm.

Instead of using threads to locate the mm, use the overall tgid from the
ib_ucontext struct instead. This matches the behavior of ODP and
disassociate in handling the mm of the process that called ibv_reg_mr.

Cc:
Fixes: 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
Signed-off-by: Lidong Chen
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Lidong Chen
2018-05-30 13:51:49 +0800

26 Apr, 2018

2 commits

fd370b8e6 IB/core: Map iWarp AH type to undefined in rdma_ah_find_type ... Browse Code »

[ Upstream commit 87daac68f77a3e21a1113f816e6a7be0b38bdde8 ]

iWarp devices do not support the creation of address handles
so return AH_ATTR_TYPE_UNDEFINED for all iWarp devices.

While we are here reduce the size of port_num to u8 and add
a comment.

Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Reported-by: Parav Pandit
CC: Sean Hefty
Reviewed-by: Ira Weiny
Reviewed-by: Shiraz Saleem
Signed-off-by: Don Hiatt
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Don Hiatt
2018-04-26 17:02:15 +0800
b7b27e19e RDMA/core: Clarify rdma_ah_find_type ... Browse Code »

[ Upstream commit a6532e7139660c103dda181aa5b2c734aa26ed6c ]

iWARP does not use rdma_ah_attr_type, and for this reason we do not have a
RDMA_AH_ATTR_TYPE_IWARP. rdma_ah_find_type should not even be called on iwarp
ports and for clarity it shouldn't have a special test for iWarp.

This changes the result from RDMA_AH_ATTR_TYPE_ROCE to RDMA_AH_ATTR_TYPE_IB
when wrongly called on an iWarp port.

Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Parav Pandit
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Parav Pandit
2018-04-26 17:02:04 +0800

08 Apr, 2018

1 commit

b0d95e686 RDMA/ucma: Introduce safer rdma_addr_size() variants ... Browse Code »

commit 84652aefb347297aa08e91e283adf7b18f77c2d5 upstream.

There are several places in the ucma ABI where userspace can pass in a
sockaddr but set the address family to AF_IB. When that happens,
rdma_addr_size() will return a size bigger than sizeof struct sockaddr_in6,
and the ucma kernel code might end up copying past the end of a buffer
not sized for a struct sockaddr_ib.

Fix this by introducing new variants

int rdma_addr_size_in6(struct sockaddr_in6 *addr);
int rdma_addr_size_kss(struct __kernel_sockaddr_storage *addr);

that are type-safe for the types used in the ucma ABI and return 0 if the
size computed is bigger than the size of the type passed in. We can use
these new variants to check what size userspace has passed in before
copying any addresses.

Reported-by:
Signed-off-by: Roland Dreier
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Roland Dreier
2018-04-08 20:26:29 +0800

22 Feb, 2018

1 commit

d40ad8657 IB/core: Fix ib_wc structure size to remain in 64 bytes boundary ... Browse Code »

commit cd2a6e7d384b043d5d029e39663061cebc949385 upstream.

The change of slid from u16 to u32 results in sizeof(struct ib_wc)
cross 64B boundary, which causes more cache misses. This patch
rearranges the fields and remain the size to 64B.

Pahole output before this change:

struct ib_wc {
union {
u64 wr_id; /* 8 */
struct ib_cqe * wr_cqe; /* 8 */
}; /* 0 8 */
enum ib_wc_status status; /* 8 4 */
enum ib_wc_opcode opcode; /* 12 4 */
u32 vendor_err; /* 16 4 */
u32 byte_len; /* 20 4 */
struct ib_qp * qp; /* 24 8 */
union {
__be32 imm_data; /* 4 */
u32 invalidate_rkey; /* 4 */
} ex; /* 32 4 */
u32 src_qp; /* 36 4 */
int wc_flags; /* 40 4 */
u16 pkey_index; /* 44 2 */

/* XXX 2 bytes hole, try to pack */

u32 slid; /* 48 4 */
u8 sl; /* 52 1 */
u8 dlid_path_bits; /* 53 1 */
u8 port_num; /* 54 1 */
u8 smac[6]; /* 55 6 */

/* XXX 1 byte hole, try to pack */

u16 vlan_id; /* 62 2 */
/* --- cacheline 1 boundary (64 bytes) --- */
u8 network_hdr_type; /* 64 1 */

/* size: 72, cachelines: 2, members: 17 */
/* sum members: 62, holes: 2, sum holes: 3 */
/* padding: 7 */
/* last cacheline: 8 bytes */
};

Pahole output after this change:

struct ib_wc {
union {
u64 wr_id; /* 8 */
struct ib_cqe * wr_cqe; /* 8 */
}; /* 0 8 */
enum ib_wc_status status; /* 8 4 */
enum ib_wc_opcode opcode; /* 12 4 */
u32 vendor_err; /* 16 4 */
u32 byte_len; /* 20 4 */
struct ib_qp * qp; /* 24 8 */
union {
__be32 imm_data; /* 4 */
u32 invalidate_rkey; /* 4 */
} ex; /* 32 4 */
u32 src_qp; /* 36 4 */
u32 slid; /* 40 4 */
int wc_flags; /* 44 4 */
u16 pkey_index; /* 48 2 */
u8 sl; /* 50 1 */
u8 dlid_path_bits; /* 51 1 */
u8 port_num; /* 52 1 */
u8 smac[6]; /* 53 6 */

/* XXX 1 byte hole, try to pack */

u16 vlan_id; /* 60 2 */
u8 network_hdr_type; /* 62 1 */

/* size: 64, cachelines: 1, members: 17 */
/* sum members: 62, holes: 1, sum holes: 1 */
/* padding: 1 */
};

Fixes: 7db20ecd1d97 ("IB/core: Change wc.slid from 16 to 32 bits")
Signed-off-by: Bodong Wang
Reviewed-by: Parav Pandit
Signed-off-by: Leon Romanovsky
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman

Bodong Wang
2018-02-22 22:42:14 +0800

20 Dec, 2017

2 commits

9fc290e52 IB/core: Fix endianness annotation in rdma_is_multicast_addr() ... Browse Code »

[ Upstream commit 1c3aea2bc8f0b2e5b57375ead40457ff75a3a2ec ]

Since ipv4_addr is a big endian 32-bit number, annotate it as such.

Fixes: commit be1d325a3358 ("IB/core: Set RoCEv2 MGID according to spec")
Signed-off-by: Bart Van Assche
Reviewed-by: Leon Romanovsky
Signed-off-by: Doug Ledford
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Bart Van Assche
2017-12-20 17:10:36 +0800
b57259ca0 IB/core: Fix calculation of maximum RoCE MTU ... Browse Code »

[ Upstream commit 99260132fde7bddc6e0132ce53da94d1c9ccabcb ]

The original code only took into consideration the largest header
possible after the IB_BTH_BYTES. This was incorrect, as the largest
possible header size is the largest possible combination of headers we
might run into. The new code accounts for all possible headers in the
largest possible combination and subtracts that from the MTU to make
sure that all packets will fit on the wire.

Link: https://www.spinics.net/lists/linux-rdma/msg54558.html
Fixes: 3c86aa70bf67 ("RDMA/cm: Add RDMA CM support for IBoE devices")
Signed-off-by: Parav Pandit
Reviewed-by: Daniel Jurgens
Reported-by: Roland Dreier
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Parav Pandit
2017-12-20 17:10:35 +0800

02 Nov, 2017

1 commit

b24413180 License cleanup: add SPDX GPL-2.0 license identifier to files with no license ... Browse Code »

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2017-11-02 18:10:55 +0800

25 Sep, 2017

2 commits

edd315511 IB: Correct MR length field to be 64-bit ... Browse Code »

The ib_mr->length represents the length of the MR in bytes as per
the IBTA spec 1.3 section 11.2.10.3 (REGISTER PHYSICAL MEMORY REGION).

Currently ib_mr->length field is defined as only 32-bits field.
This might result into truncation and failed WRs of consumers who
registers more than 4GB bytes memory regions and whose WRs accessing
such MRs.

This patch makes the length 64-bit to avoid such truncation.

Cc: Sagi Grimberg
Cc: Chuck Lever
Cc: Faisal Latif
Fixes: 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API")
Signed-off-by: Ilya Lesokhin
Signed-off-by: Parav Pandit
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford

Parav Pandit
2017-09-25 23:47:23 +0800
78b1beb09 IB/core: Fix typo in the name of the tag-matching cap struct ... Browse Code »

The tag matching functionality is implemented by mlx5 driver
by extending XRQ, however this internal kernel information was
exposed to user space applications with *xrq* name instead of *tm*.

This patch renames *xrq* to *tm* to handle that.

Fixes: 8d50505ada72 ("IB/uverbs: Expose XRQ capabilities")
Signed-off-by: Leon Romanovsky
Reviewed-by: Yishai Hadas
Signed-off-by: Doug Ledford

Leon Romanovsky
2017-09-25 23:47:23 +0800

10 Sep, 2017

1 commit

ad9a19d00 Merge tag 'nfsd-4.14' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd updates from Bruce Fields:
"More RDMA work and some op-structure constification from Chuck Lever,
and a small cleanup to our xdr encoding"

* tag 'nfsd-4.14' of git://linux-nfs.org/~bfields/linux:
svcrdma: Estimate Send Queue depth properly
rdma core: Add rdma_rw_mr_payload()
svcrdma: Limit RQ depth
svcrdma: Populate tail iovec when receiving
nfsd: Incoming xdr_bufs may have content in tail buffer
svcrdma: Clean up svc_rdma_build_read_chunk()
sunrpc: Const-ify struct sv_serv_ops
nfsd: Const-ify NFSv4 encoding and decoding ops arrays
sunrpc: Const-ify instances of struct svc_xprt_ops
nfsd4: individual encoders no longer see error cases
nfsd4: skip encoder in trivial error cases
nfsd4: define ->op_release for compound ops
nfsd4: opdesc will be useful outside nfs4proc.c
nfsd4: move some nfsd4 op definitions to xdr4.h

Linus Torvalds
2017-09-10 04:31:49 +0800

09 Sep, 2017

1 commit

f808c13fd lib/interval_tree: fast overlap detection ... Browse Code »

Allow interval trees to quickly check for overlaps to avoid unnecesary
tree lookups in interval_tree_iter_first().

As of this patch, all interval tree flavors will require using a
'rb_root_cached' such that we can have the leftmost node easily
available. While most users will make use of this feature, those with
special functions (in addition to the generic insert, delete, search
calls) will avoid using the cached option as they can do funky things
with insertions -- for example, vma_interval_tree_insert_after().

[jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()]
Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso
Signed-off-by: Jérôme Glisse
Acked-by: Christian König
Acked-by: Peter Zijlstra (Intel)
Acked-by: Doug Ledford
Acked-by: Michael S. Tsirkin
Cc: David Airlie
Cc: Jason Wang
Cc: Christian Benvenuti
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2017-09-09 09:26:49 +0800

06 Sep, 2017

1 commit

006281829 rdma core: Add rdma_rw_mr_payload() ... Browse Code »

The amount of payload per MR depends on device capabilities and
the memory registration mode in use. The new rdma_rw API hides both,
making it difficult for ULPs to determine how large their transport
send queues need to be.

Expose the MR payload information via a new API.

Signed-off-by: Chuck Lever
Acked-by: Doug Ledford
Signed-off-by: J. Bruce Fields

Chuck Lever
2017-09-06 03:15:30 +0800

31 Aug, 2017

9 commits

524271129 IB/core: Assign root to all drivers ... Browse Code »

In order to use the parsing tree, we need to assign the root
to all drivers. Currently, we just assign the default parsing
tree via ib_uverbs_add_one. The driver could override this by
assigning a parsing tree prior to registering the device.

Signed-off-by: Matan Barak
Reviewed-by: Yishai Hadas
Signed-off-by: Doug Ledford

Matan Barak
2017-08-31 20:35:14 +0800
d70724f14 IB/core: Add legacy driver's user-data ... Browse Code »

In this phase, we don't want to change all the drivers to use
flexible driver's specific attributes. Therefore, we add two default
attributes: UHW_IN and UHW_OUT. These attributes are optional in some
methods and they encode the driver specific command data. We add
a function that extract this data and creates the legacy udata over
it.

Driver's data should start from UVERBS_UDATA_DRIVER_DATA_FLAG. This
turns on the first bit of the namespace, indicating this attribute
belongs to the driver's namespace.

Signed-off-by: Matan Barak
Reviewed-by: Yishai Hadas
Signed-off-by: Doug Ledford

Matan Barak
2017-08-31 20:35:13 +0800
64b19e132 IB/core: Export ioctl enum types to user-space ... Browse Code »

Add a new ib_user_ioctl_verbs.h which exports all required ABI
enums and structs to the user-space.
Export the default types to user-space through this file.

Signed-off-by: Matan Barak
Reviewed-by: Yishai Hadas
Signed-off-by: Doug Ledford

Matan Barak
2017-08-31 20:35:12 +0800
4da70da23 IB/core: Explicitly destroy an object while keeping uobject ... Browse Code »

When some objects are destroyed, we need to extract their status at
destruction. After object's destruction, this status
(e.g. events_reported) relies in the uobject. In order to have the
latest and correct status, the underlying object should be destroyed,
but we should keep the uobject alive and read this information off the
uobject. We introduce a rdma_explicit_destroy function. This function
destroys the class type object (for example, the IDR class type which
destroys the underlying object as well) and then convert the uobject
to be of a null class type. This uobject will then be destroyed as any
other uobject once uverbs_finalize_object[s] is called.

Signed-off-by: Matan Barak
Reviewed-by: Yishai Hadas
Signed-off-by: Doug Ledford

Matan Barak
2017-08-31 20:35:11 +0800
354103065 IB/core: Add macros for declaring methods and attributes ... Browse Code »

This patch adds macros for declaring objects, methods and
attributes. These definitions are later used by downstream patches
to declare some of the default types.

Signed-off-by: Matan Barak
Reviewed-by: Yishai Hadas
Signed-off-by: Doug Ledford

Matan Barak
2017-08-31 20:35:11 +0800
118620d36 IB/core: Add uverbs merge trees functionality ... Browse Code »

Different drivers support different features and even subset of the
common uverbs implementation. Currently, this is handled as bitmask
in every driver that represents which kind of methods it supports, but
doesn't go down to attributes granularity. Moreover, drivers might
want to add their specific types, methods and attributes to let
their user-space counter-parts be exposed to some more efficient
abstractions. It means that existence of different features is
validated syntactically via the parsing infrastructure rather than
using a complex in-handler logic.

In order to do that, we allow defining features and abstractions
as parsing trees. These per-feature parsing tree could be merged
to an efficient (perfect-hash based) parsing tree, which is later
used by the parsing infrastructure.

To sum it up, this makes a parse tree unique for a device and
represents only the features this particular device supports.
This is done by having a root specification tree per feature.
Before a device registers itself as an IB device, it merges
all these trees into one parsing tree. This parsing tree
is used to parse all user-space commands.

A future user-space application could read this parse tree. This
tree represents which objects, methods and attributes are
supported by this device.

This is based on the idea of
Jason Gunthorpe

Signed-off-by: Matan Barak
Reviewed-by: Yishai Hadas
Signed-off-by: Doug Ledford

Matan Barak
2017-08-31 20:35:10 +0800
09e3ebf8c IB/core: Add DEVICE object and root tree structure ... Browse Code »

This adds the DEVICE object. This object supports creating the context
that all objects are created from. Moreover, it supports executing
methods which are related to the device itself, such as QUERY_DEVICE.
This is a singleton object (per file instance).

All standard objects are put in the root structure. This root will later
on be used in drivers as the source for their whole parsing tree.
Later on, when new features are added, these drivers could mix this root
with other customized objects.

Signed-off-by: Matan Barak
Reviewed-by: Yishai Hadas
Signed-off-by: Doug Ledford

Matan Barak
2017-08-31 20:35:10 +0800
5009010fb IB/core: Declare an object instead of declaring only type attributes ... Browse Code »

Switch all uverbs_type_attrs_xxxx with DECLARE_UVERBS_OBJECT
macros. This will be later used in order to embed the object
specific methods in the objects as well.

Signed-off-by: Matan Barak
Reviewed-by: Yishai Hadas
Signed-off-by: Doug Ledford

Matan Barak
2017-08-31 20:35:09 +0800
fac9658ca IB/core: Add new ioctl interface ... Browse Code »

In this ioctl interface, processing the command starts from
properties of the command and fetching the appropriate user objects
before calling the handler.

Parsing and validation is done according to a specifier declared by
the driver's code. In the driver, all supported objects are declared.
These objects are separated to different object namepsaces. Dividing
objects to namespaces is done at initialization by using the higher
bits of the object ids. This initialization can mix objects declared
in different places to one parsing tree using in this ioctl interface.

For each object we list all supported methods. Similarly to objects,
methods are separated to method namespaces too. Namespacing is done
similarly to the objects case. This could be used in order to add
methods to an existing object.

Each method has a specific handler, which could be either a default
handler or a driver specific handler.
Along with the handler, a bunch of attributes are specified as well.
Similarly to objects and method, attributes are namespaced and hashed
by their ids at initialization too. All supported attributes are
subject to automatic fetching and validation. These attributes include
the command, response and the method's related objects' ids.

When these entities (objects, methods and attributes) are used, the
high bits of the entities ids are used in order to calculate the hash
bucket index. Then, these high bits are masked out in order to have a
zero based index. Since we use these high bits for both bucketing and
namespacing, we get a compact representation and O(1) array access.
This is mandatory for efficient dispatching.

Each attribute has a type (PTR_IN, PTR_OUT, IDR and FD) and a length.
Attributes could be validated through some attributes, like:
(*) Minimum size / Exact size
(*) Fops for FD
(*) Object type for IDR

If an IDR/fd attribute is specified, the kernel also states the object
type and the required access (NEW, WRITE, READ or DESTROY).
All uobject/fd management is done automatically by the infrastructure,
meaning - the infrastructure will fail concurrent commands that at
least one of them requires concurrent access (WRITE/DESTROY),
synchronize actions with device removals (dissociate context events)
and take care of reference counting (increase/decrease) for concurrent
actions invocation. The reference counts on the actual kernel objects
shall be handled by the handlers.

objects
+--------+
| |
| | methods +--------+
| | ns method method_spec +-----+ |len |
+--------+ +------+[d]+-------+ +----------------+[d]+------------+ |attr1+-> |type |
| object +> |method+-> | spec +-> + attr_buckets +-> |default_chain+--> +-----+ |idr_type|
+--------+ +------+ |handler| | | +------------+ |attr2| |access |
| | | | +-------+ +----------------+ |driver chain| +-----+ +--------+
| | | | +------------+
| | +------+
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
+--------+

[d] = Hash ids to groups using the high order bits

The right types table is also chosen by using the high bits from
the ids. Currently we have either default or driver specific groups.

Once validation and object fetching (or creation) completed, we call
the handler:
int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
struct uverbs_attr_bundle *ctx);

ctx bundles attributes of different namespaces. Each element there
is an array of attributes which corresponds to one namespaces of
attributes. For example, in the usually used case:

ctx core
+----------------------------+ +------------+
| core: +---> | valid |
+----------------------------+ | cmd_attr |
| driver: | +------------+
|----------------------------+--+ | valid |
| | cmd_attr |
| +------------+
| | valid |
| | obj_attr |
| +------------+
|
| drivers
| +------------+
+> | valid |
| cmd_attr |
+------------+
| valid |
| cmd_attr |
+------------+
| valid |
| obj_attr |
+------------+

Signed-off-by: Matan Barak
Reviewed-by: Yishai Hadas
Signed-off-by: Doug Ledford

Matan Barak
2017-08-31 20:35:09 +0800

30 Aug, 2017

2 commits

f43dbebfa IB/core: Add support to finalize objects in one transaction ... Browse Code »

The new ioctl based infrastructure either commits or rollbacks
all objects of the method as one transaction. In order to do
that, we introduce a notion of dealing with a collection of
objects that are related to a specific method.

This also requires adding a notion of a method and attribute.
A method contains a hash of attributes, where each bucket
contains several attributes. The attributes are hashed according
to their namespace which resides in the four upper bits of the id.

For example, an object could be a CQ, which has an action of CREATE_CQ.
This action has multiple attributes. For example, the CQ's new handle
and the comp_channel. Each layer in this hierarchy - objects, methods
and attributes is split into namespaces. The basic example for that is
one namespace representing the default entities and another one
representing the driver specific entities.

When declaring these methods and attributes, we actually declare
their specifications. When a method is executed, we actually
allocates some space to hold auxiliary information. This auxiliary
information contains meta-data about the required objects, such
as pointers to their type information, pointers to the uobjects
themselves (if exist), etc.
The specification, along with the auxiliary information we allocated
and filled is given to the finalize_objects function.

Signed-off-by: Matan Barak
Reviewed-by: Yishai Hadas
Signed-off-by: Doug Ledford

Matan Barak
2017-08-30 22:30:38 +0800
a0aa309c3 IB/core: Add a generic way to execute an operation on a uobject ... Browse Code »

The ioctl infrastructure treats all user-objects in the same manner.
It gets objects ids from the user-space and by using the object type
and type attributes mentioned in the object specification, it executes
this required method. Passing an object id from the user-space as
an attribute is carried out in three stages. The first is carried out
before the actual handler and the last is carried out afterwards.

The different supported operations are read, write, destroy and create.
In the first stage, the former three actions just fetches the object
from the repository (by using its id) and locks it. The last action
allocates a new uobject. Afterwards, the second stage is carried out
when the handler itself carries out the required modification of the
object. The last stage is carried out after the handler finishes and
commits the result. The former two operations just unlock the object.
Destroy calls the "free object" operation, taking into account the
object's type and releases the uobject as well. Creation just adds the
new uobject to the repository, making the object visible to the
application.

In order to abstract these details from the ioctl infrastructure
layer, we add uverbs_get_uobject_from_context and
uverbs_finalize_object functions which corresponds to the first
and last stages respectively.

Signed-off-by: Matan Barak
Reviewed-by: Yishai Hadas
Signed-off-by: Doug Ledford

Matan Barak
2017-08-30 22:30:38 +0800

29 Aug, 2017

5 commits

9c2c84962 IB/core: Add new SRQ type IB_SRQT_TM ... Browse Code »

This patch adds new SRQ type - IB_SRQT_TM. The new SRQ type supports tag
matching and rendezvous offloads for MPI applications.

When SRQ receives a message it will search through the matching list
for the corresponding posted receive buffer. The process of searching
the matching list is called tag matching.
In case the tag matching results in a match, the received message will
be placed in the address specified by the receive buffer. In case no
match was found the message will be placed in a generic buffer until the
corresponding receive buffer will be posted. These messages are called
unexpected and their set is called an unexpected list.

Signed-off-by: Artemy Kovalyov
Reviewed-by: Yossi Itigin
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford

Artemy Kovalyov
2017-08-29 20:30:17 +0800
1a56ff6da IB/core: Separate CQ handle in SRQ context ... Browse Code »

Before this change CQ attached to SRQ was part of XRC specific extension.
Moving CQ handle out makes it available to other types extending SRQ
functionality.

Signed-off-by: Artemy Kovalyov
Reviewed-by: Yossi Itigin
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford

Artemy Kovalyov
2017-08-29 20:30:16 +0800
6938fc1ee IB/core: Add XRQ capabilities ... Browse Code »

This patch adds following TM XRQ capabilities:

* max_rndv_hdr_size - Max size of rendezvous request message
* max_num_tags - Max number of entries in tag matching list
* max_ops - Max number of outstanding list operations
* max_sge - Max number of SGE in tag matching entry
* flags - the following flags are currently defined:
- IB_TM_CAP_RC - Support tag matching on RC transport

Signed-off-by: Artemy Kovalyov
Reviewed-by: Yossi Itigin
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford

Artemy Kovalyov
2017-08-29 20:30:16 +0800
0208da90d IB/rdmavt: Handle dereg of inuse MRs properly ... Browse Code »

A destroy of an MR prior to destroying the QP can cause the following
diagnostic if the QP is referencing the MR being de-registered:

hfi1 0000:05:00.0: hfi1_0: rvt_dereg_mr timeout mr ffff8808562108
00 pd ffff880859b20b00

The solution is to when the a non-zero refcount is encountered when
the MR is destroyed the QPs needs to be iterated looking for QPs in
the same PD as the MR. If rvt_qp_mr_clean() detects any such QP
references the rkey/lkey, the QP needs to be put into an error state
via a call to rvt_qp_error() which will trigger the clean up of any
stuck references.

This solution is as specified in IBTA 1.3 Volume 1 11.2.10.5.

[This is reproduced with the 0.4.9 version of qperf and the rc_bw test]

Reviewed-by: Dennis Dalessandro
Signed-off-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford

Mike Marciniszyn
2017-08-29 07:12:31 +0800
4734b4f41 IB/rdmavt: Add QP iterator API for QPs ... Browse Code »

There are currently 3 spots in the qib and hfi1 driver that have
knowledge of the internal QP hash list that should only be in
scope to rdmavt QP code.

Add an iterator API for processing all QPs to hide the
nature of the RCU hashlist.

The API consists of:
- rvt_qp_iter_init()
* For iterating QPs one at a time for seq_file semantics
- rvt_qp_iter_next()
* For iterating QPs one at a time for seq_file semantics
- rvt_qp_iter()
* For iterating all QPs

The first two are used for things like seq_file prints.

The last is for code that just needs to iterate all QPs
in the system.

Reviewed-by: Dennis Dalessandro
Signed-off-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford

Mike Marciniszyn
2017-08-29 07:12:28 +0800

25 Aug, 2017

4 commits

78b57f952 RDMA/core: Cleanup device capability enum ... Browse Code »

Cleanup patch prior exporting the ib_device_cap_flags
to the user space. In this patch, we are aligning the
indentation, removing IB_DEVICE_INIT_TYPE and IB_DEVICE_RESERVED
fields, because it is not used in the kernel.

Signed-off-by: Leon Romanovsky
Reviewed-by: Dennis Dalessandro
Signed-off-by: Doug Ledford

Leon Romanovsky
2017-08-25 04:27:10 +0800
dcc9881e6 RDMA/(core, ulp): Convert register/unregister event handler to be void ... Browse Code »

The functions ib_register_event_handler() and
ib_unregister_event_handler() always returned success and they can't fail.

Let's convert those functions to be void, remove redundant checks and
cleanup tons of goto statements.

Signed-off-by: Leon Romanovsky
Reviewed-by: Dennis Dalessandro
Signed-off-by: Doug Ledford

Leon Romanovsky
2017-08-25 04:27:10 +0800
732912c73 Merge branch 'k.o/for-4.13-rc' into k.o/for-next ... Browse Code »

Pick up -rc fixes.

Signed-off-by: Doug Ledford

Doug Ledford
2017-08-25 03:58:26 +0800
498ca3c82 IB/core: Avoid accessing non-allocated memory when inferring port type ... Browse Code »

Commit 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
introduced the concept of type in ah_attr:
* During ib_register_device, each port is checked for its type which
is stored in ib_device's port_immutable array.
* During uverbs' modify_qp, the type is inferred using the port number
in ib_uverbs_qp_dest struct (address vector) by accessing the
relevant port_immutable array and the type is passed on to
providers.

IB spec (version 1.3) enforces a valid port value only in Reset to
Init. During Init to RTR, the address vector must be valid but port
number is not mentioned as a field in the address vector, so its
value is not validated, which leads to accesses to a non-allocated
memory when inferring the port type.

Save the real port number in ib_qp during modify to Init (when the
comp_mask indicates that the port number is valid) and use this value
to infer the port type.

Avoid copying the address vector fields if the matching bit is not set
in the attr_mask. Address vector can't be modified before the port, so
no valid flow is affected.

Fixes: 44c58487d51a ('IB/core: Define 'ib' and 'roce' rdma_ah_attr types')
Signed-off-by: Noa Osherovich
Reviewed-by: Yishai Hadas
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford

Noa Osherovich
2017-08-25 03:33:33 +0800

23 Aug, 2017

5 commits

e3bf14bdc rdma: Autoload netlink client modules ... Browse Code »

If a message comes in and we do not have the client in the table, then
try to load the module supplying that client using MODULE_ALIAS to find
it.

This duplicates the scheme seen in other netlink muxes (eg nfnetlink).

Signed-off-by: Jason Gunthorpe
Reviewed-by: Leon Romanovsky
Signed-off-by: Doug Ledford

Jason Gunthorpe
2017-08-23 05:04:22 +0800
ec0d8b8a6 IB/hfi1: Stricter bounds checking of MAD trap index ... Browse Code »

The macro size is valid. This change makes it less ambiguous.
Bounds check trap type for better security.

Reviewed-by: Michael J. Ruhl
Signed-off-by: Kamenee Arumugam
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford

Kamenee Arumugame
2017-08-23 02:22:37 +0800
51e658f5d IB/rdmavt, hfi1, qib: Enhance rdmavt and hfi1 to use 32 bit lids ... Browse Code »

Increase lid used in hfi1 driver to 32 bits. qib continues
to use 16 bit lids.

Reviewed-by: Dennis Dalessandro
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Don Hiatt
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford

Dasaratharaman Chandramouli
2017-08-23 02:22:37 +0800
88733e3b8 IB/hfi1: Add 16B UD support ... Browse Code »

Add 16B bypass packet support for UD traffic types.

Reviewed-by: Dennis Dalessandro
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Don Hiatt
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford

Don Hiatt
2017-08-23 02:22:37 +0800
d98bb7f7e IB/hfi1: Determine 9B/16B L2 header type based on Address handle ... Browse Code »

When address handle attributes are initialized, the LIDs are
transformed to be in the 32 bit LID space.
When constructing the header, hfi1 driver will look at the LID
to determine the packet header to be created.

Reviewed-by: Dennis Dalessandro
Signed-off-by: Dasaratharaman Chandramouli
Signed-off-by: Don Hiatt
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford

Don Hiatt
2017-08-23 02:22:37 +0800