Eric Lee / smarc-fsl-linux-kernel

22 Jul, 2018

1 commit

f5778c2d6 xprtrdma: Fix corner cases when handling device removal ... Browse Code »

commit 25524288631fc5b7d33259fca1e0dc38146be5d6 upstream.

Michal Kalderon has found some corner cases around device unload
with active NFS mounts that I didn't have the imagination to test
when xprtrdma device removal was added last year.

- The ULP device removal handler is responsible for deallocating
the PD. That wasn't clear to me initially, and my own testing
suggested it was not necessary, but that is incorrect.

- The transport destruction path can no longer assume that there
is a valid ID.

- When destroying a transport, ensure that ib_free_cq() is not
invoked on a CQ that was already released.

Reported-by: Michal Kalderon
Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from ...")
Signed-off-by: Chuck Lever
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Anna Schumaker
Signed-off-by: Sudip Mukherjee
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2018-07-22 20:28:42 +0800

03 Jul, 2018

1 commit

d097e5b5a xprtrdma: Return -ENOBUFS when no pages are available ... Browse Code »

commit a8f688ec437dc2045cc8f0c89fe877d5803850da upstream.

The use of -EAGAIN in rpcrdma_convert_iovs() is a latent bug: the
transport never calls xprt_write_space() when more pages become
available. -ENOBUFS will trigger the correct "delay briefly and call
again" logic.

Fixes: 7a89f9c626e3 ("xprtrdma: Honor ->send_request API contract")
Signed-off-by: Chuck Lever
Cc: stable@vger.kernel.org # 4.8+
Signed-off-by: Anna Schumaker
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2018-07-03 17:24:54 +0800

26 Apr, 2018

2 commits

0b1fa241d svcrdma: Fix Read chunk round-up ... Browse Code »

[ Upstream commit 175e03101d36c3034f3c80038d4c28838351a7f2 ]

A single NFSv4 WRITE compound can often have three operations:
PUTFH, WRITE, then GETATTR.

When the WRITE payload is sent in a Read chunk, the client places
the GETATTR in the inline part of the RPC/RDMA message, just after
the WRITE operation (sans payload). The position value in the Read
chunk enables the receiver to insert the Read chunk at the correct
place in the received XDR stream; that is between the WRITE and
GETATTR.

According to RFC 8166, an NFS/RDMA client does not have to add XDR
round-up to the Read chunk that carries the WRITE payload. The
receiver adds XDR round-up padding if it is absent and the
receiver's XDR decoder requires it to be present.

Commit 193bcb7b3719 ("svcrdma: Populate tail iovec when receiving")
attempted to add support for receiving such a compound so that just
the WRITE payload appears in rq_arg's page list, and the trailing
GETATTR is placed in rq_arg's tail iovec. (TCP just strings the
whole compound into the head iovec and page list, without regard
to the alignment of the WRITE payload).

The server transport logic also had to accommodate the optional XDR
round-up of the Read chunk, which it did simply by lengthening the
tail iovec when round-up was needed. This approach is adequate for
the NFSv2 and NFSv3 WRITE decoders.

Unfortunately it is not sufficient for nfsd4_decode_write. When the
Read chunk length is a couple of bytes less than PAGE_SIZE, the
computation at the end of nfsd4_decode_write allows argp->pagelen to
go negative, which breaks the logic in read_buf that looks for the
tail iovec.

The result is that a WRITE operation whose payload length is just
less than a multiple of a page succeeds, but the subsequent GETATTR
in the same compound fails with NFS4ERR_OP_ILLEGAL because the XDR
decoder can't find it. Clients ignore the error, but they must
update their attribute cache via a separate round trip.

As nfsd4_decode_write appears to expect the payload itself to always
have appropriate XDR round-up, have svc_rdma_build_normal_read_chunk
add the Read chunk XDR round-up to the page_len rather than
lengthening the tail iovec.

Reported-by: Olga Kornievskaia
Fixes: 193bcb7b3719 ("svcrdma: Populate tail iovec when receiving")
Signed-off-by: Chuck Lever
Tested-by: Olga Kornievskaia
Signed-off-by: J. Bruce Fields
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2018-04-26 17:02:19 +0800
342d9092a xprtrdma: Fix backchannel allocation of extra rpcrdma_reps ... Browse Code »

[ Upstream commit d698c4a02ee02053bbebe051322ff427a2dad56a ]

The backchannel code uses rpcrdma_recv_buffer_put to add new reps
to the free rep list. This also decrements rb_recv_count, which
spoofs the receive overrun logic in rpcrdma_buffer_get_rep.

Commit 9b06688bc3b9 ("xprtrdma: Fix additional uses of
spin_lock_irqsave(rb_lock)") replaced the original open-coded
list_add with a call to rpcrdma_recv_buffer_put(), but then a year
later, commit 05c974669ece ("xprtrdma: Fix receive buffer
accounting") added rep accounting to rpcrdma_recv_buffer_put.
It was an oversight to let the backchannel continue to use this
function.

The fix this, let's combine the "add to free list" logic with
rpcrdma_create_rep.

Also, do not allocate RPCRDMA_MAX_BC_REQUESTS rpcrdma_reps in
rpcrdma_buffer_create and then allocate additional rpcrdma_reps in
rpcrdma_bc_setup_reps. Allocating the extra reps during backchannel
set-up is sufficient.

Fixes: 05c974669ece ("xprtrdma: Fix receive buffer accounting")
Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2018-04-26 17:02:04 +0800

22 Feb, 2018

2 commits

67154fb80 xprtrdma: Fix BUG after a device removal ... Browse Code »

commit e89e8d8fcdc6751e86ccad794b052fe67e6ad619 upstream.

Michal Kalderon reports a BUG that occurs just after device removal:

[ 169.112490] rpcrdma: removing device qedr0 for 192.168.110.146:20049
[ 169.143909] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 169.181837] IP: rpcrdma_dma_unmap_regbuf+0xa/0x60 [rpcrdma]

The RPC/RDMA client transport attempts to allocate some resources
on demand. Registered buffers are one such resource. These are
allocated (or re-allocated) by xprt_rdma_allocate to hold RPC Call
and Reply messages. A hardware resource is associated with each of
these buffers, as they can be used for a Send or Receive Work
Request.

If a device is removed from under an NFS/RDMA mount, the transport
layer is responsible for releasing all hardware resources before
the device can be finally unplugged. A BUG results when the NFS
mount hasn't yet seen much activity: the transport tries to release
resources that haven't yet been allocated.

rpcrdma_free_regbuf() already checks for this case, so just move
that check to cover the DEVICE_REMOVAL case as well.

Reported-by: Michal Kalderon
Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA ...")
Signed-off-by: Chuck Lever
Tested-by: Michal Kalderon
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Anna Schumaker
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2018-02-22 22:42:29 +0800
84b41e370 xprtrdma: Fix calculation of ri_max_send_sges ... Browse Code »

commit 1179e2c27efe21167ec9d882b14becefba2ee990 upstream.

Commit 16f906d66cd7 ("xprtrdma: Reduce required number of send
SGEs") introduced the rpcrdma_ia::ri_max_send_sges field. This fixes
a problem where xprtrdma would not work if the device's max_sge
capability was small (low single digits).

At least RPCRDMA_MIN_SEND_SGES are needed for the inline parts of
each RPC. ri_max_send_sges is set to this value:

ia->ri_max_send_sges = max_sge - RPCRDMA_MIN_SEND_SGES;

Then when marshaling each RPC, rpcrdma_args_inline uses that value
to determine whether the device has enough Send SGEs to convey an
NFS WRITE payload inline, or whether instead a Read chunk is
required.

More recently, commit ae72950abf99 ("xprtrdma: Add data structure to
manage RDMA Send arguments") used the ri_max_send_sges value to
calculate the size of an array, but that commit erroneously assumed
ri_max_send_sges contains a value similar to the device's max_sge,
and not one that was reduced by the minimum SGE count.

This assumption results in the calculated size of the sendctx's
Send SGE array to be too small. When the array is used to marshal
an RPC, the code can write Send SGEs into the following sendctx
element in that array, corrupting it. When the device's max_sge is
large, this issue is entirely harmless; but it results in an oops
in the provider's post_send method, if dev.attrs.max_sge is small.

So let's straighten this out: ri_max_send_sges will now contain a
value with the same meaning as dev.attrs.max_sge, which makes
the code easier to understand, and enables rpcrdma_sendctx_create
to calculate the size of the SGE array correctly.

Reported-by: Michal Kalderon
Fixes: 16f906d66cd7 ("xprtrdma: Reduce required number of send SGEs")
Signed-off-by: Chuck Lever
Tested-by: Michal Kalderon
Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Anna Schumaker
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2018-02-22 22:42:29 +0800

20 Dec, 2017

1 commit

42364042b xprtrdma: Don't defer fencing an async RPC's chunks ... Browse Code »

[ Upstream commit 8f66b1a529047a972cb9602a919c53a95f3d7a2b ]

In current kernels, waiting in xprt_release appears to be safe to
do. I had erroneously believed that for ASYNC RPCs, waiting of any
kind in xprt_release->xprt_rdma_free would result in deadlock. I've
done injection testing and consulted with Trond to confirm that
waiting in the RPC release path is safe.

For the very few times where RPC resources haven't yet been released
earlier by the reply handler, it is safe to wait synchronously in
xprt_rdma_free for invalidation rather than defering it to MR
recovery.

Note: When the QP is error state, posting a LocalInvalidate should
flush and mark the MR as bad. There is no way the remote HCA can
access that MR via a QP in error state, so it is effectively already
inaccessible and thus safe for the Upper Layer to access. The next
time the MR is used it should be recognized and cleaned up properly
by frwr_op_map.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2017-12-20 17:10:36 +0800

30 Nov, 2017

1 commit

a7a05def6 svcrdma: Preserve CB send buffer across retransmits ... Browse Code »

commit 0bad47cada5defba13e98827d22d06f13258dfb3 upstream.

During each NFSv4 callback Call, an RDMA Send completion frees the
page that contains the RPC Call message. If the upper layer
determines that a retransmit is necessary, this is too soon.

One possible symptom: after a GARBAGE_ARGS response an NFSv4.1
callback request, the following BUG fires on the NFS server:

kernel: BUG: Bad page state in process kworker/0:2H pfn:7d3ce2
kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping: (null) index:0x0
kernel: flags: 0x2fffff80000000()
kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff
kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
kernel: page dumped because: nonzero _refcount
kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
ocfs2_nodemanager ocfs2_stackglue rpcrdm a ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
kvm irqbypass crct10dif_pc lmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt
iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801
mei_me mf d_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp
acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs
libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm
mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme
kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax
kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811
kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
kernel: Call Trace:
kernel: dump_stack+0x62/0x80
kernel: bad_page+0xfe/0x11a
kernel: free_pages_check_bad+0x76/0x78
kernel: free_pcppages_bulk+0x364/0x441
kernel: ? ttwu_do_activate.isra.61+0x71/0x78
kernel: free_hot_cold_page+0x1c5/0x202
kernel: __put_page+0x2c/0x36
kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma]
kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma]

This issue exists all the way back to v4.5, but refactoring and code
re-organization prevents this simple patch from applying to kernels
older than v4.12. The fix is the same, however, if someone needs to
backport it.

Reported-by: Ben Coddington
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=314
Fixes: 5d252f90a800 ('svcrdma: Add class for RDMA backwards ... ')
Signed-off-by: Chuck Lever
Reviewed-by: Jeff Layton
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2017-11-30 16:40:54 +0800

02 Nov, 2017

1 commit

b24413180 License cleanup: add SPDX GPL-2.0 license identifier to files with no license ... Browse Code »

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2017-11-02 18:10:55 +0800

25 Sep, 2017

1 commit

edd315511 IB: Correct MR length field to be 64-bit ... Browse Code »

The ib_mr->length represents the length of the MR in bytes as per
the IBTA spec 1.3 section 11.2.10.3 (REGISTER PHYSICAL MEMORY REGION).

Currently ib_mr->length field is defined as only 32-bits field.
This might result into truncation and failed WRs of consumers who
registers more than 4GB bytes memory regions and whose WRs accessing
such MRs.

This patch makes the length 64-bit to avoid such truncation.

Cc: Sagi Grimberg
Cc: Chuck Lever
Cc: Faisal Latif
Fixes: 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API")
Signed-off-by: Ilya Lesokhin
Signed-off-by: Parav Pandit
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford

Parav Pandit
2017-09-25 23:47:23 +0800

12 Sep, 2017

1 commit

8e7757d83 Merge tag 'nfs-for-4.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client updates from Trond Myklebust:
"Hightlights include:

Stable bugfixes:
- Fix mirror allocation in the writeback code to avoid a use after
free
- Fix the O_DSYNC writes to use the correct byte range
- Fix 2 use after free issues in the I/O code

Features:
- Writeback fixes to split up the inode->i_lock in order to reduce
contention
- RPC client receive fixes to reduce the amount of time the
xprt->transport_lock is held when receiving data from a socket into
am XDR buffer.
- Ditto fixes to reduce contention between call side users of the
rdma rb_lock, and its use in rpcrdma_reply_handler.
- Re-arrange rdma stats to reduce false cacheline sharing.
- Various rdma cleanups and optimisations.
- Refactor the NFSv4.1 exchange id code and clean up the code.
- Const-ify all instances of struct rpc_xprt_ops

Bugfixes:
- Fix the NFSv2 'sec=' mount option.
- NFSv4.1: don't use machine credentials for CLOSE when using
'sec=sys'
- Fix the NFSv3 GRANT callback when the port changes on the server.
- Fix livelock issues with COMMIT
- NFSv4: Use correct inode in _nfs4_opendata_to_nfs4_state() when
doing and NFSv4.1 open by filehandle"

* tag 'nfs-for-4.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (69 commits)
NFS: Count the bytes of skipped subrequests in nfs_lock_and_join_requests()
NFS: Don't hold the group lock when calling nfs_release_request()
NFS: Remove pnfs_generic_transfer_commit_list()
NFS: nfs_lock_and_join_requests and nfs_scan_commit_list can deadlock
NFS: Fix 2 use after free issues in the I/O code
NFS: Sync the correct byte range during synchronous writes
lockd: Delete an error message for a failed memory allocation in reclaimer()
NFS: remove jiffies field from access cache
NFS: flush data when locking a file to ensure cache coherence for mmap.
SUNRPC: remove some dead code.
NFS: don't expect errors from mempool_alloc().
xprtrdma: Use xprt_pin_rqst in rpcrdma_reply_handler
xprtrdma: Re-arrange struct rx_stats
NFS: Fix NFSv2 security settings
NFSv4.1: don't use machine credentials for CLOSE when using 'sec=sys'
SUNRPC: ECONNREFUSED should cause a rebind.
NFS: Remove unused parameter gfp_flags from nfs_pageio_init()
NFSv4: Fix up mirror allocation
SUNRPC: Add a separate spinlock to protect the RPC request receive list
SUNRPC: Cleanup xs_tcp_read_common()
...

Linus Torvalds
2017-09-12 13:01:44 +0800

06 Sep, 2017

5 commits

9590d083c xprtrdma: Use xprt_pin_rqst in rpcrdma_reply_handler ... Browse Code »

Adopt the use of xprt_pin_rqst to eliminate contention between
Call-side users of rb_lock and the use of rb_lock in
rpcrdma_reply_handler.

This replaces the mechanism introduced in 431af645cf66 ("xprtrdma:
Fix client lock-up after application signal fires").

Use recv_lock to quickly find the completing rqst, pin it, then
drop the lock. At that point invalidation and pull-up of the Reply
XDR can be done. Both are often expensive operations.

Finally, take recv_lock again to signal completion to the RPC
layer. It also protects adjustment of "cwnd".

This greatly reduces the amount of time a lock is held by the
reply handler. Comparing lock_stat results shows a marked decrease
in contention on rb_lock and recv_lock.

Signed-off-by: Chuck Lever
[trond.myklebust@primarydata.com: Remove call to rpcrdma_buffer_put() from
the "out_norqst:" path in rpcrdma_reply_handler.]
Signed-off-by: Trond Myklebust

Chuck Lever
2017-09-06 06:27:07 +0800
f9773b22a Merge tag 'nfs-rdma-for-4.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs into linux-next ... Browse Code »

NFS-over-RDMA client updates for Linux 4.14

Bugfixes and cleanups:
- Constify rpc_xprt_ops
- Harden RPC call encoding and decoding
- Clean up rpc call decoding to use xdr_streams
- Remove unused variables from various structures
- Refactor code to remove imul instructions
- Rearrange rx_stats structure for better cacheline sharing

Trond Myklebust
2017-09-06 03:16:04 +0800
26fb2254d svcrdma: Estimate Send Queue depth properly ... Browse Code »

The rdma_rw API adjusts max_send_wr upwards during the
rdma_create_qp() call. If the ULP actually wants to take advantage
of these extra resources, it must increase the size of its send
completion queue (created before rdma_create_qp is called) and
increase its send queue accounting limit.

Use the new rdma_rw_mr_factor API to figure out the correct value
to use for the Send Queue and Send Completion Queue depths.

And, ensure that the chosen Send Queue depth for a newly created
transport does not overrun the QP WR limit of the underlying device.

Lastly, there's no longer a need to carry the Send Queue depth in
struct svcxprt_rdma, since the value is used only in the
svc_rdma_accept() path.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2017-09-06 03:15:31 +0800
5a25bfd28 svcrdma: Limit RQ depth ... Browse Code »

Ensure that the chosen Receive Queue depth for a newly created
transport does not overrun the QP WR limit of the underlying device.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2017-09-06 03:15:30 +0800
193bcb7b3 svcrdma: Populate tail iovec when receiving ... Browse Code »

So that NFS WRITE payloads can eventually be placed directly into a
file's page cache, enable the RPC-over-RDMA transport to present
these payloads in the xdr_buf's page list, while placing trailing
content (such as a GETATTR operation) in the xdr_buf's tail.

After this change, the RPC-over-RDMA's "copy tail" hack, added by
commit a97c331f9aa9 ("svcrdma: Handle additional inline content"),
is no longer needed and can be removed.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2017-09-06 03:15:29 +0800

25 Aug, 2017

2 commits

7075a867c svcrdma: Clean up svc_rdma_build_read_chunk() ... Browse Code »

Dan Carpenter observed that the while()
loop in svc_rdma_build_read_chunk() does not document the assumption
that the loop interior is always executed at least once.

Defensive: the function now returns -EINVAL if this assumption
fails.

Suggested-by: Dan Carpenter
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2017-08-25 10:13:50 +0800
2412e9276 sunrpc: Const-ify instances of struct svc_xprt_ops ... Browse Code »

Close an attack vector by moving the arrays of server-side transport
methods to read-only memory.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2017-08-25 10:13:50 +0800

23 Aug, 2017

1 commit

67af6f652 xprtrdma: Re-arrange struct rx_stats ... Browse Code »

To reduce false cacheline sharing, separate counters that are likely
to be accessed in the Call path from those accessed in the Reply
path.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-23 04:19:32 +0800

19 Aug, 2017

1 commit

ce7c252a8 SUNRPC: Add a separate spinlock to protect the RPC request receive list ... Browse Code »

This further reduces contention with the transport_lock, and allows us
to convert to using a non-bh-safe spinlock, since the list is now never
accessed from a bh context.

Signed-off-by: Trond Myklebust

Trond Myklebust
2017-08-19 02:45:04 +0800

16 Aug, 2017

2 commits

6748b0caf xprtrdma: Remove imul instructions from chunk list encoders ... Browse Code »

Re-arrange the pointer arithmetic in the chunk list encoders to
eliminate several more integer multiplication instructions during
Transport Header encoding.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-16 02:01:50 +0800
28d9d56f4 xprtrdma: Remove imul instructions from rpcrdma_convert_iovs() ... Browse Code »

Re-arrange the pointer arithmetic in rpcrdma_convert_iovs() to
eliminate several integer multiplication instructions during
Transport Header encoding.

Also, array overflow does not occur outside development
environments, so replace overflow checking with one spot check
at the end. This reduces the number of conditional branches in
the common case.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-16 01:37:38 +0800

12 Aug, 2017

5 commits

7ec910e78 xprtrdma: Clean up rpcrdma_bc_marshal_reply() ... Browse Code »

Same changes as in rpcrdma_marshal_req(). This removes
C-structure style encoding from the backchannel.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-12 01:20:08 +0800
39f4cd9e9 xprtrdma: Harden chunk list encoding against send buffer overflow ... Browse Code »

While marshaling chunk lists which are variable-length XDR objects,
check for XDR buffer overflow at every step. Measurements show no
significant changes in CPU utilization.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-12 01:20:08 +0800
7a80f3f0d xprtrdma: Set up an xdr_stream in rpcrdma_marshal_req() ... Browse Code »

Initialize an xdr_stream at the top of rpcrdma_marshal_req(), and
use it to encode the fixed transport header fields. This xdr_stream
will be used to encode the chunk lists in a subsequent patch.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-12 01:20:08 +0800
f4a2805e7 xprtrdma: Remove rpclen from rpcrdma_marshal_req ... Browse Code »

Clean up: Remove a variable whose result is no longer used.
Commit 655fec6987be ("xprtrdma: Use gathered Send for large inline
messages") should have removed it.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-12 01:20:08 +0800
09e60641f xprtrdma: Clean up rpcrdma_marshal_req() synopsis ... Browse Code »

Clean up: The caller already has rpcrdma_xprt, so pass that directly
instead. And provide a documenting comment for this critical
function.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-12 01:20:08 +0800

08 Aug, 2017

7 commits

c1bcb68e3 xprtrdma: Clean up XDR decoding in rpcrdma_update_granted_credits() ... Browse Code »

Clean up: Replace C-structure based XDR decoding for consistency
with other areas.

struct rpcrdma_rep is rearranged slightly so that the relevant fields
are in cache when the Receive completion handler is invoked.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-08 22:52:01 +0800
e2a671904 xprtrdma: Remove rpcrdma_rep::rr_len ... Browse Code »

This field is no longer used outside the Receive completion handler.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-08 22:52:01 +0800
fdf503e30 xprtrdma: Remove opcode check in Receive completion handler ... Browse Code »

Clean up: The opcode check is no longer necessary, because since
commit 2fa8f88d8892 ("xprtrdma: Use new CQ API for RPC-over-RDMA
client send CQs"), this completion handler is invoked only for
RECV work requests.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-08 22:52:00 +0800
264b0cdbc xprtrdma: Replace rpcrdma_count_chunks() ... Browse Code »

Clean up chunk list decoding by using the xdr_stream set up in
rpcrdma_reply_handler. This hardens decoding by checking for buffer
overflow at every step while unmarshaling variable-length XDR
objects.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-08 22:52:00 +0800
07ff2dd51 xprtrdma: Refactor rpcrdma_reply_handler() ... Browse Code »

Refactor the reply handler's transport header decoding logic to make
it easier to understand and update.

Convert some of the handler to use xdr_streams, which will enable
stricter validation of input data and enable the eventual addition
of support for new combinations of chunks, such as "Write + Reply"
or "PZRC + normal Read".

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-08 22:52:00 +0800
41c8f70f5 xprtrdma: Harden backchannel call decoding ... Browse Code »

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-08 22:52:00 +0800
96f8778f7 xprtrdma: Add xdr_init_decode to rpcrdma_reply_handler() ... Browse Code »

Transport header decoding deals with untrusted input data, therefore
decoding this header needs to be hardened.

Adopt the same infrastructure that is used when XDR decoding NFS
replies. This is slightly more CPU-intensive than the replaced code,
but we're not adding new atomics, locking, or context switches. The
cost is manageable.

Start by initializing an xdr_stream in rpcrdma_reply_handler().

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-08 22:52:00 +0800

02 Aug, 2017

1 commit

d31ae2548 sunrpc: Const-ify all instances of struct rpc_xprt_ops ... Browse Code »

After transport instance creation, these function pointers never
change. Mark them as constant to prevent their use as an attack
vector for code injections.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-08-02 04:10:35 +0800

14 Jul, 2017

5 commits

b86faee6d Merge tag 'nfs-for-4.13-1' of git://git.linux-nfs.org/projects/anna/linux-nfs ... Browse Code »

Pull NFS client updates from Anna Schumaker:
"Stable bugfixes:
- Fix -EACCESS on commit to DS handling
- Fix initialization of nfs_page_array->npages
- Only invalidate dentries that are actually invalid

Features:
- Enable NFSoRDMA transparent state migration
- Add support for lookup-by-filehandle
- Add support for nfs re-exporting

Other bugfixes and cleanups:
- Christoph cleaned up the way we declare NFS operations
- Clean up various internal structures
- Various cleanups to commits
- Various improvements to error handling
- Set the dt_type of . and .. entries in NFS v4
- Make slot allocation more reliable
- Fix fscache stat printing
- Fix uninitialized variable warnings
- Fix potential list overrun in nfs_atomic_open()
- Fix a race in NFSoRDMA RPC reply handler
- Fix return size for nfs42_proc_copy()
- Fix against MAC forgery timing attacks"

* tag 'nfs-for-4.13-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (68 commits)
NFS: Don't run wake_up_bit() when nobody is waiting...
nfs: add export operations
nfs4: add NFSv4 LOOKUPP handlers
nfs: add a nfs_ilookup helper
nfs: replace d_add with d_splice_alias in atomic_open
sunrpc: use constant time memory comparison for mac
NFSv4.2 fix size storage for nfs42_proc_copy
xprtrdma: Fix documenting comments in frwr_ops.c
xprtrdma: Replace PAGE_MASK with offset_in_page()
xprtrdma: FMR does not need list_del_init()
xprtrdma: Demote "connect" log messages
NFSv4.1: Use seqid returned by EXCHANGE_ID after state migration
NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration
xprtrdma: Don't defer MR recovery if ro_map fails
xprtrdma: Fix FRWR invalidation error recovery
xprtrdma: Fix client lock-up after application signal fires
xprtrdma: Rename rpcrdma_req::rl_free
xprtrdma: Pass only the list of registered MRs to ro_unmap_sync
xprtrdma: Pre-mark remotely invalidated MRs
xprtrdma: On invalidation failure, remove MWs from rl_registered
...

Linus Torvalds
2017-07-14 05:35:37 +0800
6afafa779 xprtrdma: Fix documenting comments in frwr_ops.c ... Browse Code »

Clean up.

FASTREG and LOCAL_INV WRs are typically not signaled. localinv_wake
is used for the last LOCAL_INV WR in a chain, which is always
signaled. The documenting comments should reflect that.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-07-14 04:00:13 +0800
d933cc320 xprtrdma: Replace PAGE_MASK with offset_in_page() ... Browse Code »

Clean up.

Reported by: Geliang Tang
Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-07-14 04:00:13 +0800
e2f6ef091 xprtrdma: FMR does not need list_del_init() ... Browse Code »

Clean up.

Commit 38f1932e60ba ("xprtrdma: Remove FMRs from the unmap list
after unmapping") utilized list_del_init() to try to prevent some
list corruption. The corruption was actually caused by the reply
handler racing with a signal. Now that MR invalidation is properly
serialized, list_del_init() can safely be replaced.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-07-14 04:00:13 +0800
173b8f49b xprtrdma: Demote "connect" log messages ... Browse Code »

Some have complained about the log messages generated when xprtrdma
opens or closes a connection to a server. When an NFS mount is
mostly idle these can appear every few minutes as the client idles
out the connection and reconnects.

Connection and disconnection is a normal part of operation, and not
exceptional, so change these to dprintk's for now. At some point
all of these will be converted to tracepoints, but that's for
another day.

Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2017-07-14 04:00:12 +0800