06 May, 2015

2 commits

  • Addresses the following kernel logs seen during boot of sparc systems:

    Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
    Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
    Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
    Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
    Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]

    Signed-off-by: David Ahern
    Signed-off-by: Doug Ledford

    David Ahern
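
    Illustrative sketch (not from the patch): on sparc, dereferencing a
    64-bit value through a pointer that is not 8-byte aligned traps and
    produces the "Kernel unaligned access" lines above. Copying the bytes
    out first (the kernel's get_unaligned() helpers do the same) avoids the
    unaligned load; the buffer layout below is hypothetical.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Risky on strict-alignment CPUs (sparc): casting an arbitrary byte
     * pointer to uint64_t * and dereferencing it can trap. */
    static uint64_t read_id_direct(const uint8_t *buf, size_t off)
    {
        return *(const uint64_t *)(buf + off);
    }

    /* Safe everywhere: copy the bytes without assuming alignment. */
    static uint64_t read_id_safe(const uint8_t *buf, size_t off)
    {
        uint64_t id;

        memcpy(&id, buf + off, sizeof(id));
        return id;
    }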
     
  • Signed-off-by: Honggang Li
    Acked-by: Sean Hefty
    Signed-off-by: Doug Ledford

    Honggang LI
     

05 May, 2015

1 commit

  • …necting peer to its clients

    Add functionality to enable the port mapper on the passive side to provide to its
    clients the actual (non-mapped) ip/tcp address information of the connecting peer

    1) Adding remote_info_cb() to process the address info of the connecting peer.
    The address info is provided by the user space port mapper service when
    the connection is initiated by the peer.
    2) Adding a hash list to store the remote address info.
    3) Adding functionality to add/remove the remote address info.
    After the info has been provided to the port mapper client,
    it is removed from the hash list.

    Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
    Reviewed-by: Steve Wise <swise@opengridcomputing.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>

    Tatyana Nikolova
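
    A minimal sketch of the bookkeeping described above, with hypothetical
    names: the remote (non-mapped) peer address is hashed under the mapped
    local address, handed to the port mapper client once, and then removed
    from the hash list.

    #include <stddef.h>
    #include <netinet/in.h>

    #define RINFO_BUCKETS 64

    /* Hypothetical entry: mapped local address as the key, the peer's
     * real (non-mapped) address as the payload. */
    struct rinfo {
        struct sockaddr_in mapped_loc;  /* key */
        struct sockaddr_in remote;      /* connecting peer's real address */
        struct rinfo *next;
    };

    static struct rinfo *rinfo_hash[RINFO_BUCKETS];

    static unsigned int rinfo_bucket(const struct sockaddr_in *k)
    {
        return (k->sin_addr.s_addr ^ k->sin_port) % RINFO_BUCKETS;
    }

    /* Store the info when the user-space port mapper reports a new peer. */
    static void rinfo_add(struct rinfo *ri)
    {
        unsigned int b = rinfo_bucket(&ri->mapped_loc);

        ri->next = rinfo_hash[b];
        rinfo_hash[b] = ri;
    }

    /* Hand the info to the client exactly once, then unlink it. */
    static struct rinfo *rinfo_take(const struct sockaddr_in *key)
    {
        struct rinfo **pp = &rinfo_hash[rinfo_bucket(key)];

        for (; *pp; pp = &(*pp)->next) {
            if ((*pp)->mapped_loc.sin_addr.s_addr == key->sin_addr.s_addr &&
                (*pp)->mapped_loc.sin_port == key->sin_port) {
                struct rinfo *ri = *pp;
                *pp = ri->next;
                return ri;
            }
        }
        return NULL;
    }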
     

06 Feb, 2015

1 commit

  • While commit 7e36ef8205ff ("IB/core: Temporarily disable
    ex_query_device uverb") is correct as it makes the extended
    QUERY_DEVICE uverb (which came as part of commit 5a77abf9a97a
    ("IB/core: Add support for extended query device caps") and commit
    860f10a799c8 ("IB/core: Add flags for on demand paging support")) not
    available to userspace, it doesn't address the initial issue regarding
    ib_copy_to_udata() [1][2].

    Additionally, further discussions around this new uverb seem to
    conclude it would require a different data structure than the one
    currently described in [3].

    Both of these issues require a revert of the changes, so this patch
    partially reverts commit 8cdd312cfed7 ("IB/mlx5: Implement the ODP
    capability query verb") and commit 860f10a799c8 ("IB/core: Add flags
    for on demand paging support") and fully reverts commit 5a77abf9a97a
    ("IB/core: Add support for extended query device caps").

    [1] "Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps"
    http://mid.gmane.org/1418733236.2779.26.camel@opteya.com

    [2] "Re: [PATCH] IB/core: Temporarily disable ex_query_device uverb"
    http://mid.gmane.org/1423067503.3030.83.camel@opteya.com

    [3] "RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask"
    http://mid.gmane.org/2807E5FD2F6FDA4886F6618EAC48510E0CC12C30@CRSMSX101.amr.corp.intel.com

    Cc: Eli Cohen
    Cc: Haggai Eran
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Sagi Grimberg
    Cc: Shachar Raindel
    Signed-off-by: Yann Droneaud
    Signed-off-by: Roland Dreier

    Yann Droneaud
     

16 Dec, 2014

7 commits

  • * Add an interval tree implementation for ODP umems. Create an
    interval tree for each ucontext (including a count of the number of
    ODP MRs in this context, semaphore, etc.), and register ODP umems in
    the interval tree.
    * Add MMU notifiers handling functions, using the interval tree to
    notify only the relevant umems and underlying MRs.
    * Register to receive MMU notifier events from the MM subsystem upon
    ODP MR registration (and unregister accordingly).
    * Add a completion object to synchronize the destruction of ODP umems.
    * Add mechanism to abort page faults when there's a concurrent invalidation.

    The way we synchronize between concurrent invalidations and page
    faults is by keeping a counter of currently running invalidations, and
    a sequence number that is incremented whenever an invalidation is
    caught. The page fault code checks the counter and also verifies that
    the sequence number hasn't progressed before it updates the umem's
    page tables. This is similar to what the kvm module does.

    In order to prevent the case where we register a umem in the middle of
    an ongoing notifier, we also keep a per ucontext counter of the total
    number of active mmu notifiers. We only enable new umems when all the
    running notifiers complete.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Shachar Raindel
    Signed-off-by: Haggai Eran
    Signed-off-by: Yuval Dagan
    Signed-off-by: Roland Dreier

    Haggai Eran
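
    A sketch of the synchronization scheme described above, with
    hypothetical field names (the kvm MMU notifier code follows the same
    pattern): invalidations bump a counter while running and a sequence
    number when they finish; the fault path samples the sequence first and
    refuses to install page-table entries if either changed.

    #include <stdbool.h>

    /* Hypothetical per-umem state; updates are assumed to happen under
     * the umem's lock, which is elided here for brevity. */
    struct odp_sync {
        int notifier_count;          /* invalidations currently running */
        unsigned long notifier_seq;  /* bumped when an invalidation ends */
    };

    /* MMU notifier callbacks. */
    static void invalidate_start(struct odp_sync *s)
    {
        s->notifier_count++;
    }

    static void invalidate_end(struct odp_sync *s)
    {
        s->notifier_seq++;
        s->notifier_count--;
    }

    /* Page-fault path: sample the sequence before touching the pages ... */
    static unsigned long fault_begin(struct odp_sync *s)
    {
        return s->notifier_seq;
    }

    /* ... and only update the umem's page tables if no invalidation ran
     * (or is still running) in the meantime; otherwise retry the fault. */
    static bool fault_can_commit(struct odp_sync *s, unsigned long seq)
    {
        return s->notifier_count == 0 && s->notifier_seq == seq;
    }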
     
  • * Extend the umem struct to keep the ODP related data.
    * Allocate and initialize the ODP related information in the umem
    (page_list, dma_list) and free it as needed at the end of the run.
    * Store a reference to the process PID struct in the ucontext. Used to
    safely obtain the task_struct and the mm during fault handling,
    without preventing the task destruction if needed.
    * Add 2 helper functions: ib_umem_odp_map_dma_pages and
    ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses
    of specific pages of the umem (and, currently, pin them).
    * Support for page faults only - IB core will keep the reference on
    the pages used and call put_page when freeing an ODP umem
    area. Invalidations support will be added in a later patch.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Shachar Raindel
    Signed-off-by: Haggai Eran
    Signed-off-by: Majd Dibbiny
    Signed-off-by: Roland Dreier

    Shachar Raindel
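
    Purely illustrative layout of the per-umem ODP bookkeeping described
    above; the field names are hypothetical stand-ins for the page_list and
    dma_list arrays and the PID reference mentioned in the commit.

    #include <stddef.h>
    #include <stdint.h>

    /* One slot per page covered by the registered range. */
    struct odp_umem_state {
        void     **page_list;  /* page pointers, once faulted in          */
        uint64_t  *dma_list;   /* matching DMA addresses for the device   */
        size_t     npages;     /* pages spanned by the umem               */
        void      *owner_pid;  /* reference to the registering task's pid,
                                * so fault handling can look up the mm
                                * without preventing task destruction     */
    };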
     
  • * Add a configuration option to enable on-demand paging support in
    the infiniband subsystem (CONFIG_INFINIBAND_ON_DEMAND_PAGING). In a
    later patch, this configuration option will select the MMU_NOTIFIER
    configuration option to enable mmu notifiers.
    * Add a flag for on demand paging (ODP) support in the IB device capabilities.
    * Add a flag to request ODP MR in the access flags to reg_mr.
    * Fail registrations done with the ODP flag when the low-level driver
    doesn't support this.
    * Change the conditions in which an MR will be writable to explicitly
    specify the access flags. This is to avoid making an MR writable just
    because it is an ODP MR.
    * Add ODP capabilities to the extended query device verb.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Shachar Raindel
    Signed-off-by: Haggai Eran
    Signed-off-by: Roland Dreier

    Sagi Grimberg
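
    A minimal sketch of the registration-time check implied by the bullets
    above ("Fail registrations done with the ODP flag when the low-level
    driver doesn't support this"). The flag names follow the kernel's
    IB_ACCESS_ON_DEMAND / IB_DEVICE_ON_DEMAND_PAGING convention, but the
    bit values here are illustrative only.

    #include <errno.h>

    /* Bit values are illustrative only. */
    #define IB_ACCESS_ON_DEMAND        (1 << 6)     /* reg_mr access flag */
    #define IB_DEVICE_ON_DEMAND_PAGING (1ULL << 31) /* device capability  */

    struct ib_device_caps {
        unsigned long long device_cap_flags;
    };

    /* Reject an ODP registration when the low-level driver has not
     * advertised on-demand paging support. */
    static int check_odp_access(const struct ib_device_caps *caps,
                                int access_flags)
    {
        if ((access_flags & IB_ACCESS_ON_DEMAND) &&
            !(caps->device_cap_flags & IB_DEVICE_ON_DEMAND_PAGING))
            return -EINVAL;
        return 0;
    }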
     
  • Add extensible query device capabilities verb to allow adding new features.
    ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to
    copy capability fields to be used by both ib_uverbs_query_device and
    ib_uverbs_ex_query_device.

    Signed-off-by: Eli Cohen
    Signed-off-by: Haggai Eran
    Signed-off-by: Roland Dreier

    Eli Cohen
     
  • Add a helper function mlx5_ib_read_user_wqe to read information from
    user-space owned work queues. The function will be used in a later
    patch by the page-fault handling code in mlx5_ib.

    Signed-off-by: Haggai Eran

    [ Add stub for ib_umem_copy_from() for CONFIG_INFINIBAND_USER_MEM=n
    - Roland ]

    Signed-off-by: Roland Dreier

    Haggai Eran
     
  • In some drivers there's a need to read data from a user space area
    that was pinned using ib_umem when running from a different process
    context.

    The ib_umem_copy_from function allows reading data from the physical
    pages pinned in the ib_umem struct.

    Signed-off-by: Haggai Eran
    Signed-off-by: Roland Dreier

    Haggai Eran
     
  • In order to allow umems that do not pin memory, we need the umem to
    keep track of its region's address.

    This makes the offset field redundant, and so this patch removes it.

    Signed-off-by: Haggai Eran
    Signed-off-by: Roland Dreier

    Haggai Eran
     

09 Oct, 2014

1 commit

  • Expose more signature setting parameters. We modify the signature API
    to allow usage of some new execution parameters relevant to the data
    integrity feature.

    This patch modifies ib_sig_domain structure by:

    - Deprecate DIF type in signature API (operation will
    be determined by the parameters alone, no DIF type awareness)
    - Add APPTAG check bitmask (for input domain)
    - Add REFTAG remap (increment) flag for each domain
    - Add APPTAG/REFTAG escape options for each domain

    The mlx5 driver is modified to follow the new parameters in HW
    signature setup.

    At the moment the callers (iser/isert) hard-code new parameters (by
    DIF type). In the future, callers will retrieve them from the scsi
    command structure.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Roland Dreier

    Sagi Grimberg
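
    An illustrative (hypothetical) shape of the per-domain parameters
    listed above; the real ib_sig_domain layout in ib_verbs.h differs in
    detail, this only mirrors the bullet points.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical per-domain T10-DIF settings: no DIF-type enum,
     * behaviour is driven entirely by the fields. */
    struct sig_dif_params {
        uint16_t apptag_check_mask; /* which APPTAG bits to verify (input) */
        bool     ref_remap;         /* increment REFTAG per block          */
        bool     app_escape;        /* skip block when APPTAG == escape    */
        bool     ref_escape;        /* skip block when REFTAG == escape    */
        uint16_t app_tag;
        uint32_t ref_tag;
    };

    struct sig_domain_params {
        struct sig_dif_params mem;   /* memory-side domain */
        struct sig_dif_params wire;  /* wire-side domain   */
    };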
     

20 Sep, 2014

1 commit

  • In debugging an application that receives -ENOMEM from ib_reg_mr(), I
    found that ib_umem_get() can fail because the pinned_vm count has
    wrapped causing it to always be larger than the lock limit even with
    RLIMIT_MEMLOCK set to RLIM_INFINITY.

    The wrapping of pinned_vm occurs because the process that calls
    ib_reg_mr() will have its mm->pinned_vm count incremented. Later a
    different process with a different mm_struct than the one that
    allocated the ib_umem struct ends up releasing it, which results in
    decrementing the new process's mm->pinned_vm count past zero and
    wrapping.

    I'm not entirely sure what circumstances cause a different process to
    release the ib_umem than the one that allocated it but the kernel
    stack trace of the freeing process from my situation looks like the
    following:

    Call Trace:
    [] dump_stack+0x19/0x1b
    [] ib_umem_release+0x1f5/0x200 [ib_core]
    [] mlx4_ib_destroy_qp+0x241/0x440 [mlx4_ib]
    [] ib_destroy_qp+0x12c/0x170 [ib_core]
    [] ib_uverbs_close+0x259/0x4e0 [ib_uverbs]
    [] __fput+0xba/0x240
    [] ____fput+0xe/0x10
    [] task_work_run+0xc4/0xe0
    [] do_notify_resume+0x95/0xa0
    [] int_signal+0x12/0x17

    The following patch fixes the issue by storing the pid struct of the
    process that calls ib_umem_get() so that ib_umem_release and/or
    ib_umem_account() can properly decrement the pinned_vm count of the
    correct mm_struct.

    Signed-off-by: Shawn Bohrer
    Reviewed-by: Shachar Raindel
    Signed-off-by: Roland Dreier

    Shawn Bohrer
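
    A kernel-style sketch of the fix described in the last paragraph, with
    hypothetical helper names: ib_umem_get() records the caller's struct
    pid, and the release path resolves it back to the right task and mm
    before decrementing pinned_vm (get_task_pid(), get_pid_task() and
    get_task_mm() are the standard kernel APIs for this).

    #include <linux/pid.h>
    #include <linux/sched.h>
    #include <linux/mm.h>

    /* Sketch only: at registration time, remember who pinned the pages. */
    static void umem_record_owner(struct pid **owner)
    {
        *owner = get_task_pid(current, PIDTYPE_PID);
    }

    /* Sketch only: at release time, charge the *original* mm, not the mm
     * of whichever process happens to drop the last reference. */
    static void umem_uncharge_owner(struct pid *owner, unsigned long npages)
    {
        struct task_struct *task = get_pid_task(owner, PIDTYPE_PID);
        struct mm_struct *mm;

        put_pid(owner);
        if (!task)
            return;
        mm = get_task_mm(task);
        put_task_struct(task);
        if (!mm)
            return;

        down_write(&mm->mmap_sem);
        mm->pinned_vm -= npages;
        up_write(&mm->mmap_sem);
        mmput(mm);
    }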
     

14 Aug, 2014

1 commit


11 Aug, 2014

2 commits


02 Aug, 2014

1 commit

  • Memory re-registration is a feature that enables changing the
    attributes of a memory region registered by user-space, including PD,
    translation (address and length) and access flags.

    Add the required support in uverbs and the kernel verbs API.

    Signed-off-by: Matan Barak
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Matan Barak
     

11 Jun, 2014

2 commits

  • … 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next

    Roland Dreier
     
  • This patch adds iWARP Port Mapper (IWPM) Version 2 support. The iWARP
    Port Mapper implementation is based on the port mapper specification
    section in the Sockets Direct Protocol paper -
    http://www.rdmaconsortium.org/home/draft-pinkerton-iwarp-sdp-v1.0.pdf

    Existing iWARP RDMA providers use the same IP address as the native
    TCP/IP stack when creating RDMA connections. They need a mechanism to
    claim the TCP ports used for RDMA connections to prevent TCP port
    collisions when other host applications use TCP ports. The iWARP Port
    Mapper provides a standard mechanism to accomplish this. Without this
    service it is possible for an RDMA application to bind/listen on the same
    port which is already being used by a native TCP host application. If
    that happens, the incoming TCP connection data can be passed to the
    RDMA stack in error.

    The iWARP Port Mapper solution doesn't contain any changes to the
    existing network stack in kernel space. All the changes are
    contained within the infiniband tree and in user space.

    The iWARP Port Mapper service is implemented as a user space daemon
    process. Source for the IWPM service is located at
    http://git.openfabrics.org/git?p=~tnikolova/libiwpm-1.0.0/.git;a=summary

    When starting a connection, the iWARP driver (port mapper client) sends
    the IWPM service the local IP address and TCP port it has received from
    the RDMA application. The IWPM service performs a socket bind from user
    space to get an available TCP port, called a mapped port, and
    communicates it back to the client. In that sense, the IWPM service is
    used to map the TCP port which the RDMA application uses to any port
    available from the host TCP port space. The mapped ports are used in
    iWARP RDMA connections to avoid collisions with the native TCP stack,
    which is aware that these ports are taken. When an RDMA connection using
    a mapped port is terminated, the client notifies the IWPM service, which
    then releases the TCP port.

    The message exchange between the IWPM service and the iWARP drivers
    (between user space and kernel space) is implemented using netlink
    sockets.

    1) Netlink interface functions are added: ibnl_unicast() and
    ibnl_multicast() for sending netlink messages to user space

    2) The signature of the existing ibnl_put_msg() is changed to be more
    generic

    3) Two netlink clients are added: RDMA_NL_NES, RDMA_NL_C4IW
    corresponding to the two iWARP drivers, nes and cxgb4, which use
    the IWPM service

    4) Enums are added to enumerate the attributes in the netlink
    messages, which are exchanged between the user space IWPM service
    and the iWARP drivers

    Signed-off-by: Tatyana Nikolova
    Signed-off-by: Steve Wise
    Reviewed-by: PJ Waskiewicz

    [ Fold in range checking fixes and nlh_next removal as suggested by Dan
    Carpenter and Steve Wise. Fix sparse endianness in hash. - Roland ]

    Signed-off-by: Roland Dreier

    Tatyana Nikolova
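
    As an illustration of the user-space side described above ("performs a
    socket bind from user space to get an available TCP port"), a daemon
    can bind to port 0 and read back the port the kernel assigned; this is
    a generic sketch, not code from the libiwpm daemon.

    #include <stdint.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Claim an arbitrary free TCP port on the given local IP and report it.
     * The daemon would keep the socket open so the native stack treats the
     * port as taken, and return the mapped port to the iWARP driver. */
    static int claim_mapped_port(const char *ip, uint16_t *mapped_port)
    {
        struct sockaddr_in addr = { .sin_family = AF_INET };
        socklen_t len = sizeof(addr);
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        if (fd < 0)
            return -1;
        inet_pton(AF_INET, ip, &addr.sin_addr);
        addr.sin_port = 0;                      /* let the kernel pick */

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
            close(fd);
            return -1;
        }
        *mapped_port = ntohs(addr.sin_port);
        return fd;                              /* keep open while mapped */
    }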
     

05 Jun, 2014

1 commit

  • Fix a few functions that are declared with __attribute_const__ in the
    ib_verbs.h header file but defined without it in verbs.c. This gets rid
    of the following sparse warnings:

    drivers/infiniband/core/verbs.c:51:5: error: symbol 'ib_rate_to_mult' redeclared with different type (originally declared at include/rdma/ib_verbs.h:469) - different modifiers
    drivers/infiniband/core/verbs.c:68:14: error: symbol 'mult_to_ib_rate' redeclared with different type (originally declared at include/rdma/ib_verbs.h:607) - different modifiers
    drivers/infiniband/core/verbs.c:85:5: error: symbol 'ib_rate_to_mbps' redeclared with different type (originally declared at include/rdma/ib_verbs.h:476) - different modifiers
    drivers/infiniband/core/verbs.c:111:1: error: symbol 'rdma_node_get_transport' redeclared with different type (originally declared at include/rdma/ib_verbs.h:84) - different modifiers

    Signed-off-by: Roland Dreier

    Roland Dreier
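
    A minimal illustration of what sparse is complaining about: the
    declaration and the definition of a function must carry the same
    attributes (__attribute_const__ expands to __attribute__((const))).
    The helper name below is hypothetical.

    /* Declaration, as in a header (a simplified stand-in for the
     * ib_verbs.h prototypes): marked const, i.e. no side effects. */
    int __attribute__((const)) rate_to_mult(int rate);

    /* Definition: must repeat the attribute, otherwise sparse reports
     * "redeclared with different type ... different modifiers". */
    int __attribute__((const)) rate_to_mult(int rate)
    {
        return 1 << rate;   /* placeholder body for the illustration */
    }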
     

03 Jun, 2014

1 commit

  • This addresses a problem where NFS client writes over IPoIB connected
    mode may deadlock on memory allocation/writeback.

    The problem is not directly memory reclamation. There is an indirect
    dependency between network filesystems writing back pages and
    ipoib_cm_tx_init() due to how a kworker is used. Page reclaim cannot
    make forward progress until ipoib_cm_tx_init() succeeds and it is
    stuck in page reclaim itself waiting for network transmission.
    Ordinarily this situation may be avoided by having the caller use
    GFP_NOFS but ipoib_cm_tx_init() does not have that information.

    To address this, take a general approach and add a new QP creation
    flag that tells the low-level hardware driver to use GFP_NOIO for the
    memory allocations related to the new QP.

    Use the new flag in the ipoib connected mode path, and if the driver
    doesn't support it, re-issue the QP creation without the flag.

    Signed-off-by: Mel Gorman
    Signed-off-by: Jiri Kosina
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Or Gerlitz
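
    A kernel-style sketch of the fallback described in the last paragraph.
    IB_QP_CREATE_USE_GFP_NOIO is the creation flag this series adds (treat
    the exact identifier as an assumption); ipoib-style code retries
    without the flag when the provider rejects it.

    #include <linux/err.h>
    #include <linux/errno.h>
    #include <rdma/ib_verbs.h>

    /* Sketch: ask the provider to allocate the QP with GFP_NOIO so the
     * allocation cannot recurse into I/O (and deadlock against writeback);
     * fall back to a plain creation if the driver rejects the flag. */
    static struct ib_qp *create_tx_qp(struct ib_pd *pd,
                                      struct ib_qp_init_attr *attr)
    {
        struct ib_qp *qp;

        attr->create_flags |= IB_QP_CREATE_USE_GFP_NOIO;
        qp = ib_create_qp(pd, attr);
        if (IS_ERR(qp) && PTR_ERR(qp) == -EINVAL) {
            attr->create_flags &= ~IB_QP_CREATE_USE_GFP_NOIO;
            qp = ib_create_qp(pd, attr);
        }
        return qp;
    }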
     

03 Apr, 2014

1 commit


02 Apr, 2014

2 commits

  • The code that resolves the passive side source MAC within the rdma_cm
    connection request handler was both redundant and buggy, so remove it.

    It was redundant since later, when an RC QP is modified to RTR state,
    the resolution will take place in the ib_core module. It was buggy
    because this callback also deals with UD SIDR exchange, for which we
    incorrectly looked at the REQ member of the CM event and dereferenced
    a random value.

    Fixes: dd5f03beb4f7 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
    Signed-off-by: Moni Shoua
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Moni Shoua
     
  • The code is replaced by driver specific changes and avoids the pointer
    NULL test for drivers that don't overload these operations.

    Suggested-by:
    Reviewed-by: Dennis Dalessandro
    Tested-by: Vinod Kumar
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Roland Dreier

    Mike Marciniszyn
     

08 Mar, 2014

2 commits

  • Introduce a verbs interface for signature-related operations. A
    signature handover operation configures the layouts of data and
    protection attributes both in memory and wire domains.

    Signature operations are:

    - INSERT:
    Generate and insert protection information when handing over
    data from input space to output space.
    - validate and STRIP:
    Validate protection information and remove it when handing over
    data from input space to output space.
    - validate and PASS:
    Validate protection information and pass it when handing over
    data from input space to output space.

    Once the signature handover operation is done, the HCA will offload
    data integrity generation/validation while performing the actual data
    transfer.

    Additions:

    1. HCA signature capabilities in device attributes
    Verbs provider supporting signature handover operations fills
    relevant fields in device attributes structure returned by
    ib_query_device.

    2. QP creation flag IB_QP_CREATE_SIGNATURE_EN
    Creating a QP that will carry signature handover operations may
    require some special preparations from the verbs provider. So we
    add QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare that the
    created QP may carry out signature handover operations. Expose
    signature support to verbs layer (no support for now).

    3. New send work request IB_WR_REG_SIG_MR
    Signature handover work request. This WR will define the signature
    handover properties of the memory/wire domains as well as the
    domains layout. The purpose of this work request is to bind all
    the needed information for the signature operation:

    - data to be transferred: wr->sg_list (ib_sge).
    * The raw data, pre-registered to a single MR (normally, before
    signature, this MR would have been used directly for the data
    transfer)
    - data protection guards: sig_handover.prot (ib_sge).
    * The data protection buffer, pre-registered to a single MR, which
    contains the data integrity guards of the raw data blocks.
    Note that it may not always exist, only in cases where the user is
    interested in storing protection guards in memory.
    - signature operation attributes: sig_handover.sig_attrs.
    * Tells the HCA how to validate/generate the protection information.

    Once the work request is executed, the memory region that will
    describe the signature transaction will be the sig_mr. The
    application can now go ahead and send the sig_mr.rkey or use the
    sig_mr.lkey for data transfer.

    4. New Verb ib_check_mr_status
    check_mr_status verb checks the status of the memory region post
    transaction. The first check that may be used is
    IB_MR_CHECK_SIG_STATUS, which will indicate if any signature
    errors are pending for a specific signature-enabled ib_mr. This
    verb is a lightweight check and is allowed to be taken from
    interrupt context. An application must call this verb after it is
    known that the actual data transfer has finished.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Roland Dreier

    Sagi Grimberg
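
    A hedged usage sketch of the new verb from point 4. The signature
    assumed here (MR, check mask, ib_mr_status out-parameter) should be
    verified against ib_verbs.h before relying on it.

    #include <linux/errno.h>
    #include <rdma/ib_verbs.h>

    /* After the data transfer completes, ask the HCA whether any signature
     * (data-integrity) errors were recorded for this signature-enabled MR. */
    static int check_sig_errors(struct ib_mr *sig_mr)
    {
        struct ib_mr_status mr_status;
        int ret;

        ret = ib_check_mr_status(sig_mr, IB_MR_CHECK_SIG_STATUS, &mr_status);
        if (ret)
            return ret;                 /* the verb itself failed */

        if (mr_status.fail_status & IB_MR_CHECK_SIG_STATUS)
            return -EIO;                /* integrity error pending */

        return 0;
    }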
     
  • This commit introduces verbs for creating/destroying memory
    regions which will allow new types of memory key operations such
    as protected memory registration.

    Indirect memory registration is registering several (one
    or more) pre-registered memory regions in a specific layout.
    The indirect region may potentially describe several regions
    and some repetition format between them.

    Protected Memory registration is registering a memory region
    with various data integrity attributes which will describe protection
    schemes that will be handled by the HCA in an offloaded manner.
    These memory regions will be applicable for a new REG_SIG_MR
    work request introduced later in this patchset.

    In the future these routines may replace or implement current memory
    regions creation routines existing today:
    - ib_reg_user_mr
    - ib_alloc_fast_reg_mr
    - ib_get_dma_mr
    - ib_dereg_mr

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Roland Dreier

    Sagi Grimberg
     

05 Mar, 2014

1 commit

  • This patch refactors the IB core umem code and vendor drivers to use a
    linear (chained) SG table instead of chunk list. With this change the
    relevant code becomes clearer—no need for nested loops to build and
    use umem.

    Signed-off-by: Shachar Raindel
    Signed-off-by: Yishai Hadas
    Signed-off-by: Roland Dreier

    Yishai Hadas
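
    A sketch of why the change simplifies consumers: with a single chained
    scatterlist, drivers iterate once with for_each_sg() instead of walking
    a list of chunks with a nested loop. This is an illustration, not code
    from the patch.

    #include <linux/scatterlist.h>

    /* After the refactoring: one linear/chained SG table per umem. */
    static void walk_umem_pages(struct scatterlist *sg_head, int nents)
    {
        struct scatterlist *sg;
        int i;

        for_each_sg(sg_head, sg, nents, i) {
            /* sg_dma_address(sg) / sg_dma_len(sg) describe each block;
             * previously this required an outer loop over the umem's
             * chunk list and an inner loop over each chunk's pages. */
        }
    }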
     

14 Feb, 2014

1 commit

  • For userspace RoCE UD QPs we need to know the GID format that the
    kernel uses, e.g. when working over older kernels. To that end, add a
    new port capability IB_PORT_IP_BASED_GIDS and report it when query
    port is issued.

    Signed-off-by: Moni Shoua
    Signed-off-by: Matan Barak
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Moni Shoua
     

23 Jan, 2014

2 commits


19 Jan, 2014

2 commits

  • Currently, the IB core and specifically the RDMA-CM assumes that IBoE
    (RoCE) GIDs encode the MAC address (and possibly the VLAN ID) of the
    related Ethernet netdevice interface.

    Change GIDs to be treated as if they encode the interface IP address.

    Since Ethernet layer 2 address parameters are no longer encoded
    within GIDs, we have to extend the InfiniBand address structures (e.g.
    ib_ah_attr) with layer 2 address parameters, namely mac and vlan.

    Signed-off-by: Moni Shoua
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Moni Shoua
     
  • Add the complementary RDMA_NODE_USNIC_UDP for RDMA_TRANSPORT_USNIC_UDP.

    Signed-off-by: Upinder Malhi
    Signed-off-by: Roland Dreier

    Upinder Malhi
     

15 Jan, 2014

3 commits

  • This patch adds support for Ethernet L2 attributes in the
    verbs/cm/cma structures.

    When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
    in a manner similar to how the IB L2 (and the L4 PKEY) attributes are used.

    Thus, those attributes were added to the following structures:

    * ib_ah_attr - added dmac
    * ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
    * ib_wc - added smac, vlan_id
    * ib_sa_path_rec - added smac, dmac, vlan_id
    * cm_av - added smac and vlan_id

    For the path record structure, extra care was taken to avoid the new
    fields when packing it into wire format, so we don't break the IB CM
    and SA wire protocol.

    On the active side, the CM fills its internal structures from the
    path provided by the ULP. We add code there to take the ETH L2
    attributes and place them into the CM address handle (struct cm_av).

    On the passive side, the CM fills its internal structures from the WC
    associated with the REQ message. We add code there to take the ETH L2
    attributes from the WC.

    When the HW driver provides the required ETH L2 attributes in the WC,
    it sets the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
    code checks for the presence of these flags and, in their absence,
    performs address resolution in the ib_init_ah_from_wc() helper function.

    ib_modify_qp_is_ok is also updated to consider the link layer. Some
    parameters are mandatory for Ethernet link layer, while they are
    irrelevant for IB. Vendor drivers are modified to support the new
    function signature.

    Signed-off-by: Matan Barak
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Matan Barak
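
    A small sketch of the passive-side check described above: if the
    provider filled the L2 fields in the work completion it sets the flags,
    otherwise the core falls back to address resolution. The smac/vlan_id
    fields and the two flags are taken from the commit text; the fallback
    helper here is a hypothetical stub.

    #include <linux/errno.h>
    #include <linux/if_ether.h>
    #include <linux/string.h>
    #include <rdma/ib_verbs.h>

    /* Hypothetical fallback: resolve the peer MAC from its GID/IP. */
    static int resolve_l2_from_gid(struct ib_wc *wc, u8 *dmac_out)
    {
        return -ENOENT;                         /* stubbed for the sketch */
    }

    /* Sketch of ib_init_ah_from_wc()-style logic for RoCE. */
    static int fill_dest_l2(struct ib_wc *wc, u8 *dmac_out, u16 *vlan_out)
    {
        if (wc->wc_flags & IB_WC_WITH_SMAC)
            memcpy(dmac_out, wc->smac, ETH_ALEN); /* peer's source MAC */
        else if (resolve_l2_from_gid(wc, dmac_out))
            return -EHOSTUNREACH;

        if (wc->wc_flags & IB_WC_WITH_VLAN)
            *vlan_out = wc->vlan_id;
        else
            *vlan_out = 0xffff;                   /* sketch: no VLAN tag */

        return 0;
    }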
     
  • This patch adds preliminary support for IB L2 device-managed steering,
    currently exposed only in the kernel.

    This flow spec can be used by low-level drivers that need to indicate
    the link layer type when creating device-managed flow rules.

    Signed-off-by: Matan Barak
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Matan Barak
     
  • When creating an IPoIB UD QP, provide a hint to the low level driver
    that the QP should support flow-steering. This means that privileged
    user space applications can steer TCP/IP IPoIB traffic from the
    network stack, in a manner similar to what is done with Ethernet RAW_PACKET QPs.

    The hint is provided through a new QP creation flag called NETIF_QP.

    Signed-off-by: Matan Barak
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Matan Barak
     

14 Jan, 2014

1 commit


17 Dec, 2013

1 commit

  • Userspace input buffer is not modified by kernel, so it can be 'const'.

    This is also a prerequisite to remove the implicit cast
    from INIT_UDATA().

    Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com
    Signed-off-by: Yann Droneaud
    Signed-off-by: Roland Dreier

    Yann Droneaud
     

18 Nov, 2013

2 commits

  • …s', 'ocrdma', 'qib' and 'srp' into for-next

    Roland Dreier
     
  • Commit 400dbc96583f ("IB/core: Infrastructure for extensible uverbs
    commands") added an infrastructure for extensible uverbs commands
    while later commit 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow
    through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
    using this new infrastructure.

    According to the commit 400dbc96583f, the purpose of this
    infrastructure is to support passing around provider (eg. hardware)
    specific buffers when userspace issue commands to the kernel, so that
    it would be possible to extend uverbs (eg. core) buffers independently
    from the provider buffers.

    But the new kernel command function prototypes were not modified to
    take advantage of this extension. This issue was exposed by Roland
    Dreier in a previous review[1].

    So the following patch is an attempt at a revised extensible command
    infrastructure.

    This improved extensible command infrastructure distinguishes the
    core (eg. legacy) command/response buffers from the provider
    (eg. hardware) command/response buffers: each extended command
    implementing function is given a struct ib_udata to hold the core
    (eg. uverbs) input and output buffers, and another struct ib_udata to
    hold the hw (eg. provider) input and output buffers.

    Having those buffers identified separately makes it easier to increase
    one buffer to support extension without having to add code to
    guess the exact size of each command/response part: this should make
    the extended functions more reliable.

    Additionally, instead of relying on the command identifier being greater
    than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure relies on
    unused bits in the command field: of the 32 bits provided by the command
    field, only 6 bits are really needed to encode the identifier of
    commands currently supported by the kernel. (Even using only 6 bits
    leaves room for about 23 new commands.)

    So this patch makes use of some high order bits in the command field to
    store flags, leaving enough room for more command identifiers than one
    will ever need (eg. 256).

    The new flags are used to specify if the command should be processed
    as an extended one or a legacy one. While designing the new command
    format, care was taken to make usage of flags itself extensible.

    Using high order bits of the command field ensures that newer
    libibverbs on an older kernel will properly fail when trying to call
    extended commands. On the other hand, older libibverbs on a newer kernel
    will never be able to issue calls to extended commands.

    The extended command header includes the optional response pointer so
    that output buffer length and output buffer pointer are located
    together in the command, allowing proper parameter checking. This
    should make implementing functions easier and safer.

    Additionally the extended header ensures 64-bit alignment, while making
    all sizes a multiple of 8 bytes, extending the maximum buffer sizes:

                               legacy     extended

    Maximum command buffer:    256KBytes  1024KBytes (512KBytes + 512KBytes)
    Maximum response buffer:   256KBytes  1024KBytes (512KBytes + 512KBytes)

    For the purpose of doing proper buffer size accounting, the header
    sizes are no longer taken into account in "in_words".

    One oddity of the current extensible infrastructure, reading the
    "legacy" command header twice, is fixed by removing the "legacy"
    command header from the extended command header: they are processed as
    two different parts of the command, memory is read once and
    information is not duplicated. This makes it clear that it is an
    extended command scheme and not a different command scheme.

    The proposed scheme will format input (command) and output (response)
    buffers this way:

    - command:

    legacy header +
    extended header +
    command data (core + hw):

    +----------------------------------------+
    | flags    |    00      00    | command  |
    |      in_words      |     out_words     |
    +----------------------------------------+
    |                 response               |
    |                 response               |
    | provider_in_words | provider_out_words |
    |                 padding                |
    +----------------------------------------+
    |                                        |
    .                                        .
    .             (in_words * 8)             .
    |                                        |
    +----------------------------------------+
    |                                        |
    .                                        .
    .        (provider_in_words * 8)         .
    |                                        |
    +----------------------------------------+

    - response, if present:

    +----------------------------------------+
    |                                        |
    .                                        .
    .             (out_words * 8)            .
    |                                        |
    +----------------------------------------+
    |                                        |
    .                                        .
    .        (provider_out_words * 8)        .
    |                                        |
    +----------------------------------------+

    The overall design is to ensure that the extensible infrastructure is
    itself extensible while being more reliable, with more input and bounds
    checking.

    Note:

    The unused field in the extended header would be a perfect candidate to
    hold the command "comp_mask" (eg. a bit field used to handle
    compatibility). This was suggested by Roland Dreier in a previous
    review[2]. But the "comp_mask" field is likely to be present in the uverb
    input and/or provider input, and likewise for the response, as noted by
    Matan Barak[3], so it doesn't make sense to put "comp_mask" in the
    header.

    [1]:
    http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com

    [2]:
    http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com

    [3]:
    http://marc.info/?i=525C1149.6000701@mellanox.com

    Signed-off-by: Yann Droneaud
    Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com

    [ Convert "ret ? ret : 0" to the equivalent "ret". - Roland ]

    Signed-off-by: Roland Dreier

    Yann Droneaud
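
    A hedged sketch of the wire layout in the diagram above, expressed as C
    structures: the legacy header carries the command id and the core word
    counts, and the extended header carries the response pointer plus the
    provider word counts. Field names follow the diagram; check
    ib_user_verbs.h for the authoritative definitions.

    #include <linux/types.h>

    /* Legacy header: flags and the command id share the 32-bit command
     * field; only the low bits encode the command, high bits hold flags. */
    struct ib_uverbs_cmd_hdr {
        __u32 command;     /* flags | 00 00 | command                */
        __u16 in_words;    /* core command length, in 8-byte words   */
        __u16 out_words;   /* core response length, in 8-byte words  */
    };

    /* Extended header, present when the "extended" flag is set. */
    struct ib_uverbs_ex_cmd_hdr {
        __u64 response;           /* user pointer to the response buffer */
        __u16 provider_in_words;  /* hw (provider) command length        */
        __u16 provider_out_words; /* hw (provider) response length       */
        __u32 cmd_hdr_reserved;   /* padding, keeps the header 8-aligned */
    };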
     

16 Nov, 2013

1 commit