30 Dec, 2020
2 commits
-
commit 15261b9126cd5bb2ad8521da49d8f5c042d904c7 upstream.
Olga K. observed that rpcrdma_marsh_req() allocates sparse pages
only when it has determined that a Reply chunk is necessary. There
are plenty of cases where no Reply chunk is needed, but the
XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in
rpcrdma_inline_fixup() when it tries to copy parts of the received
Reply into a missing page.To avoid crashing, handle sparse page allocation up front.
Until XATTR support was added, this issue did not appear often
because the only SPARSE_PAGES consumer always expected a reply large
enough to always require a Reply chunk.Reported-by: Olga Kornievskaia
Signed-off-by: Chuck Lever
Cc:
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit d5aa6b22e2258f05317313ecc02efbb988ed6d38 ]
According to RFC5666, the correct netid for an IPv6 addressed RDMA
transport is "rdma6", which we've supported as a mount option since
Linux-4.7. The problem is when we try to load the module "xprtrdma6",
that will fail, since there is no modulealias of that name.Fixes: 181342c5ebe8 ("xprtrdma: Add rdma6 option to support NFS/RDMA IPv6")
Signed-off-by: Trond Myklebust
Signed-off-by: Sasha Levin
23 Oct, 2020
1 commit
-
Pull nfsd updates from Bruce Fields:
"The one new feature this time, from Anna Schumaker, is READ_PLUS,
which has the same arguments as READ but allows the server to return
an array of data and hole extents.Otherwise it's a lot of cleanup and bugfixes"
* tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux: (43 commits)
NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy
SUNRPC: fix copying of multiple pages in gss_read_proxy_verf()
sunrpc: raise kernel RPC channel buffer size
svcrdma: fix bounce buffers for unaligned offsets and multiple pages
nfsd: remove unneeded break
net/sunrpc: Fix return value for sysctl sunrpc.transports
NFSD: Encode a full READ_PLUS reply
NFSD: Return both a hole and a data segment
NFSD: Add READ_PLUS hole segment encoding
NFSD: Add READ_PLUS data support
NFSD: Hoist status code encoding into XDR encoder functions
NFSD: Map nfserr_wrongsec outside of nfsd_dispatch
NFSD: Remove the RETURN_STATUS() macro
NFSD: Call NFSv2 encoders on error returns
NFSD: Fix .pc_release method for NFSv2
NFSD: Remove vestigial typedefs
NFSD: Refactor nfsd_dispatch() error paths
NFSD: Clean up nfsd_dispatch() variables
NFSD: Clean up stale comments in nfsd_dispatch()
NFSD: Clean up switch statement in nfsd_dispatch()
...
17 Oct, 2020
1 commit
-
This was discovered using O_DIRECT at the client side, with small
unaligned file offsets or IOs that span multiple file pages.Fixes: e248aa7be86 ("svcrdma: Remove max_sge check at connect time")
Signed-off-by: Dan Aloni
Signed-off-by: J. Bruce Fields
26 Sep, 2020
1 commit
-
Drop duplicate words in net/sunrpc/.
Also fix "Anyone" to be "Any one".Signed-off-by: Randy Dunlap
Cc: "J. Bruce Fields"
Cc: Chuck Lever
Cc: linux-nfs@vger.kernel.org
Signed-off-by: J. Bruce Fields
22 Sep, 2020
1 commit
-
sg_init_table zeroes its first argument, so the allocation of that argument
doesn't have to.the semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)//
@@
expression x,n,flags;
@@x =
- kcalloc
+ kmalloc_array
(n,sizeof(*x),flags)
...
sg_init_table(x,n)
//Signed-off-by: Julia Lawall
Acked-by: Chuck Lever
Signed-off-by: Anna Schumaker
21 Sep, 2020
3 commits
-
Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker -
These instruments don't appear to add any substantial value.
We already have this at the termination of each RPC:
iozone-2617 [002] 975.713126: rpc_stats_latency: task:418@5 xid=0x260eab5d nfsv3 LOOKUP backlog=15 rtt=32 execute=58
iozone-2617 [002] 975.713127: xprt_release_cong: task:418@5 snd_task:4294967295 cong=256 cwnd=16384
iozone-2617 [002] 975.713127: xprt_put_cong: task:418@5 snd_task:4294967295 cong=0 cwnd=16384Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker -
Introduce a tracepoint in call_allocate that reports the exact
sizes in the RPC buffer allocation request and the status of the
result. This helps catch problems with XDR buffer provisioning,
and replaces transport-specific debugging instrumentation.Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker
10 Sep, 2020
1 commit
-
Pull NFS client bugfixes from Trond Myklebust:
- Fix an NFS/RDMA resource leak
- Fix the error handling during delegation recall
- NFSv4.0 needs to return the delegation on a zero-stateid SETATTR
- Stop printk reading past end of string
* tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: stop printk reading past end of string
NFS: Zero-stateid SETATTR should first return delegation
NFSv4.1 handle ERR_DELAY error reclaiming locking state on delegation recall
xprtrdma: Release in-flight MRs on disconnect
27 Aug, 2020
1 commit
-
Dan Aloni reports that when a server disconnects abruptly, a few
memory regions are left DMA mapped. Over time this leak could pin
enough I/O resources to slow or even deadlock an NFS/RDMA client.I found that if a transport disconnects before pending Send and
FastReg WRs can be posted, the to-be-registered MRs are stranded on
the req's rl_registered list and never released -- since they
weren't posted, there's no Send completion to DMA unmap them.Reported-by: Dan Aloni
Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker
24 Aug, 2020
1 commit
-
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva
10 Aug, 2020
1 commit
-
Pull NFS server updates from Chuck Lever:
"Highlights:
- Support for user extended attributes on NFS (RFC 8276)
- Further reduce unnecessary NFSv4 delegation recallsNotable fixes:
- Fix recent krb5p regression
- Address a few resource leaks and a rare NULL dereferenceOther:
- De-duplicate RPC/RDMA error handling and other utility functions
- Replace storage and display of kernel memory addresses by tracepoints"* tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6: (38 commits)
svcrdma: CM event handler clean up
svcrdma: Remove transport reference counting
svcrdma: Fix another Receive buffer leak
SUNRPC: Refresh the show_rqstp_flags() macro
nfsd: netns.h: delete a duplicated word
SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()")
nfsd: avoid a NULL dereference in __cld_pipe_upcall()
nfsd4: a client's own opens needn't prevent delegations
nfsd: Use seq_putc() in two functions
svcrdma: Display chunk completion ID when posting a rw_ctxt
svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send()
svcrdma: Introduce Send completion IDs
svcrdma: Record Receive completion ID in svc_rdma_decode_rqst
svcrdma: Introduce Receive completion IDs
svcrdma: Introduce infrastructure to support completion IDs
svcrdma: Add common XDR encoders for RDMA and Read segments
svcrdma: Add common XDR decoders for RDMA and Read segments
SUNRPC: Add helpers for decoding list discriminators symbolically
svcrdma: Remove declarations for functions long removed
svcrdma: Clean up trace_svcrdma_send_failed() tracepoint
...
28 Jul, 2020
3 commits
-
Now that there's a core tracepoint that reports these events, there's
no need to maintain dprintk() call sites in each arm of the switch
statements.We also refresh the documenting comments.
Signed-off-by: Chuck Lever
-
Jason tells me that a ULP cannot rely on getting an ESTABLISHED
and DISCONNECTED event pair for each connection, so transport
reference counting in the CM event handler will never be reliable.Now that we have ib_drain_qp(), svcrdma should no longer need to
hold transport references while Sends and Receives are posted. So
remove the get/put call sites in the CM event handlers.This eliminates a significant source of locked memory bus traffic.
Signed-off-by: Chuck Lever
-
During a connection tear down, the Receive queue is flushed before
the device resources are freed. Typically, all the Receives flush
with IB_WR_FLUSH_ERR.However, any pending successful Receives flush with IB_WR_SUCCESS,
and the server automatically posts a fresh Receive to replace the
completing one. This happens even after the connection has closed
and the RQ is drained. Receives that are posted after the RQ is
drained appear never to complete, causing a Receive resource leak.
The leaked Receive buffer is left DMA-mapped.To prevent these late-posted recv_ctxt's from leaking, block new
Receive posting after XPT_CLOSE is set.Signed-off-by: Chuck Lever
16 Jul, 2020
1 commit
-
Currently the header size calculations are using an assignment
operator instead of a += operator when accumulating the header
size leading to incorrect sizes. Fix this by using the correct
operator.Addresses-Coverity: ("Unused value")
Fixes: 302d3deb2068 ("xprtrdma: Prevent inline overflow")
Signed-off-by: Colin Ian King
Reviewed-by: Chuck Lever
Signed-off-by: Anna Schumaker
14 Jul, 2020
16 commits
-
Re-use the post_rw tracepoint (safely) to trace cc_info lifetime
events, including completion IDs.Signed-off-by: Chuck Lever
-
First, refactor: Dereference the svc_rdma_send_ctxt inside
svc_rdma_send() instead of at every call site.Then, it can be passed into trace_svcrdma_post_send() to get the
proper completion ID.Signed-off-by: Chuck Lever
-
Set up a completion ID in each svc_rdma_send_ctxt. The ID is used
to match an incoming Send completion to a transport and to a
previous ib_post_send().Signed-off-by: Chuck Lever
-
When recording a trace event in the Receive path, tie decoding
results and errors to an incoming Receive completion.Signed-off-by: Chuck Lever
-
Set up a completion ID in each svc_rdma_recv_ctxt. The ID is used
to match an incoming Receive completion to a transport and to a
previous ib_post_recv().Signed-off-by: Chuck Lever
-
Clean up: De-duplicate some code.
Signed-off-by: Chuck Lever
-
Clean up: De-duplicate some code.
Signed-off-by: Chuck Lever
-
Use these helpers in a few spots to demonstrate their use.
The remaining open-coded discriminator checks in rpcrdma will be
addressed in subsequent patches.Signed-off-by: Chuck Lever
-
- Use the _err naming convention instead
- Remove display of kernel memory address of the controlling xprtSigned-off-by: Chuck Lever
-
Final refactor: Replace internals of svc_rdma_send_error() with a
simple call to svc_rdma_send_error_msg().Signed-off-by: Chuck Lever
-
Prepare for svc_rdma_send_error_msg() to be invoked from another
source file.Signed-off-by: Chuck Lever
-
Like svc_rdma_send_error(), have svc_rdma_send_error_msg() handle
any error conditions internally, rather than duplicating that
recovery logic at every call site.Signed-off-by: Chuck Lever
-
The common "send RDMA_ERR" function should be in svc_rdma_sendto.c,
since that is where the other Send-related functions are located.
So from here, I will beef up svc_rdma_send_error_msg() and deprecate
svc_rdma_send_error().A generic svc_rdma_send_error_msg() will need to handle both
ERR_CHUNK and ERR_VERS. Copy that logic from svc_rdma_send_error()
to svc_rdma_send_error_msg().Signed-off-by: Chuck Lever
-
Another step towards making svc_rdma_send_error_msg() and
svc_rdma_send_error() similar enough to eliminate one of them.Signed-off-by: Chuck Lever
-
Commit 4757d90b15d8 ("svcrdma: Report Write/Reply chunk overruns")
made an effort to preserve I/O pages until RDMA Write completion.In a subsequent patch, I intend to de-duplicate the two functions
that send ERR_CHUNK responses. Pull the save_io_pages() call out of
svc_rdma_send_error_msg() to make it more like
svc_rdma_send_error().Signed-off-by: Chuck Lever
-
Commit 07d0ff3b0cd2 ("svcrdma: Clean up Read chunk path") moved the
page saver logic so that it gets executed event when an error occurs.
In that case, the I/O is never posted, and those pages are then
leaked. Errors in this path, however, are quite rare.Fixes: 07d0ff3b0cd2 ("svcrdma: Clean up Read chunk path")
Signed-off-by: Chuck Lever
13 Jul, 2020
4 commits
-
Ensure that the connect worker is awoken if an attempt to establish
a connection is unsuccessful. Otherwise the worker waits forever
and the transport workload hangs.Connect errors should not attempt to destroy the ep, since the
connect worker continues to use it after the handler runs, so these
errors are now handled independently of DISCONNECTED events.Reported-by: Dan Aloni
Fixes: e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt")
Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker -
I noticed that when rpcrdma_xprt_connect() returns -ENOMEM,
instead of retrying the connect, the RPC client kills the
RPC task that requested the connection. We want a retry
here.Fixes: cb586decbb88 ("xprtrdma: Make sendctx queue lifetime the same as connection lifetime")
Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker -
Both Dan and I have observed two processes invoking
rpcrdma_xprt_disconnect() concurrently. In my case:1. The connect worker invokes rpcrdma_xprt_disconnect(), which
drains the QP and waits for the final completion
2. This causes the newly posted Receive to flush and invoke
xprt_force_disconnect()
3. xprt_force_disconnect() sets CLOSE_WAIT and wakes up the RPC task
that is holding the transport lock
4. The RPC task invokes xprt_connect(), which calls ->ops->close
5. xprt_rdma_close() invokes rpcrdma_xprt_disconnect(), which tries
to destroy the QP.Deadlock.
To prevent xprt_force_disconnect() from waking anything, handle the
clean up after a failed connection attempt in the xprt's sndtask.The retry loop is removed from rpcrdma_xprt_connect() to ensure
that the newly allocated ep and id are properly released before
a REJECTED connection attempt can be retried.Reported-by: Dan Aloni
Fixes: e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt")
Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker -
In the error paths, there's no need to call kfree(ep) after calling
rpcrdma_ep_put(ep).Fixes: e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt")
Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker
22 Jun, 2020
3 commits
-
The RPC client currently doesn't handle ERR_CHUNK replies correctly.
rpcrdma_complete_rqst() incorrectly passes a negative number to
xprt_complete_rqst() as the number of bytes copied. Instead, set
task->tk_status to the error value, and return zero bytes copied.In these cases, return -EIO rather than -EREMOTEIO. The RPC client's
finite state machine doesn't know what to do with -EREMOTEIO.Additional clean ups:
- Don't double-count RDMA_ERROR replies
- Remove a stale commentSigned-off-by: Chuck Lever
Cc:
Signed-off-by: Anna Schumaker -
1. Ensure that only rpcrdma_cm_event_handler() modifies
ep->re_connect_status to avoid racy changes to that field.2. Ensure that xprt_force_disconnect() is invoked only once as a
transport is closed or destroyed.Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker -
Refactor: Pass struct rpcrdma_xprt instead of an IB layer object.
Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker