17 Dec, 2017

1 commit

  • Pull NFS client fixes from Anna Schumaker:
    "This has two stable bugfixes, one to fix a BUG_ON() when
    nfs_commit_inode() is called with no outstanding commit requests and
    another to fix a race in the SUNRPC receive codepath.

    Additionally, there are also fixes for an NFS client deadlock and an
    xprtrdma performance regression.

    Summary:

    Stable bugfixes:
    - NFS: Avoid a BUG_ON() in nfs_commit_inode() by not waiting for a
    commit in the case that there were no commit requests.
    - SUNRPC: Fix a race in the receive code path

    Other fixes:
    - NFS: Fix a deadlock in nfs client initialization
    - xprtrdma: Fix a performance regression for small IOs"

    * tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
    SUNRPC: Fix a race in the receive code path
    nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests
    xprtrdma: Spread reply processing over more CPUs
    nfs: fix a deadlock in nfs client initialization

    Linus Torvalds
     

16 Dec, 2017

2 commits

  • We must ensure that the call to rpc_sleep_on() in xprt_transmit() cannot
    race with the call to xprt_complete_rqst().

    Reported-by: Chuck Lever
    Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=317
    Fixes: ce7c252a8c74 ("SUNRPC: Add a separate spinlock to protect..")
    Cc: stable@vger.kernel.org # 4.14+
    Reviewed-by: Chuck Lever
    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • Commit d8f532d20ee4 ("xprtrdma: Invoke rpcrdma_reply_handler
    directly from RECV completion") introduced a performance regression
    for NFS I/O small enough to not need memory registration. In multi-
    threaded benchmarks that generate primarily small I/O requests,
    IOPS throughput is reduced by nearly a third. This patch restores
    the previous level of throughput.

    Because workqueues are typically BOUND (in particular ib_comp_wq,
    nfsiod_workqueue, and rpciod_workqueue), NFS/RDMA workloads tend
    to aggregate on the CPU that is handling Receive completions.

    The usual approach to addressing this problem is to create a QP
    and CQ for each CPU, and then schedule transactions on the QP
    for the CPU where you want the transaction to complete. The
    transaction then does not require an extra context switch during
    completion to end up on the same CPU where the transaction was
    started.

    This approach doesn't work for the Linux NFS/RDMA client because
    currently the Linux NFS client does not support multiple connections
    per client-server pair, and the RDMA core API does not make it
    straightforward for ULPs to determine which CPU is responsible for
    handling Receive completions for a CQ.

    So for the moment, record the CPU number in the rpcrdma_req before
    the transport sends each RPC Call. Then during Receive completion,
    queue the RPC completion on that same CPU.

    Additionally, move all RPC completion processing to the deferred
    handler so that even RPCs with simple small replies complete on
    the CPU that sent the corresponding RPC Call.

    Fixes: d8f532d20ee4 ("xprtrdma: Invoke rpcrdma_reply_handler ...")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

15 Dec, 2017

1 commit

  • In testing, we found that nfsd threads may call set_groups in parallel
    for the same entry cached in auth.unix.gid, racing in the call to
    groups_sort, corrupting the groups for that entry and leading to
    permission denials for the client.

    This patch:
    - Makes groups_sort globally visible
    - Moves the call to groups_sort to the modifiers of group_info
    - Removes the call to groups_sort from set_groups


    Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
    Signed-off-by: Thiago Rafael Becker
    Reviewed-by: Matthew Wilcox
    Reviewed-by: NeilBrown
    Acked-by: "J. Bruce Fields"
    Cc: Al Viro
    Cc: Martin Schwidefsky
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thiago Rafael Becker
     

02 Dec, 2017

1 commit

  • Pull NFS client fixes from Anna Schumaker:
    "These patches fix a problem with compiling using an old version of
    gcc, and also fix up error handling in the SUNRPC layer.

    - NFSv4: Ensure gcc 4.4.4 can compile initialiser for
    "invalid_stateid"

    - SUNRPC: Allow connect to return EHOSTUNREACH

    - SUNRPC: Handle ENETDOWN errors"

    * tag 'nfs-for-4.15-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
    SUNRPC: Handle ENETDOWN errors
    SUNRPC: Allow connect to return EHOSTUNREACH
    NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid"

    Linus Torvalds
     

22 Nov, 2017

1 commit

  • With all callbacks converted, and the timer callback prototype
    switched over, the TIMER_FUNC_TYPE cast is no longer needed,
    so remove it. Conversion was done with the following scripts:

    perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
    $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)

    perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
    $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)

    The now unused macros are also dropped from include/linux/timer.h.

    Signed-off-by: Kees Cook

    Kees Cook
     

19 Nov, 2017

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Lots of good bugfixes, including:

    - fix a number of races in the NFSv4+ state code

    - fix some shutdown crashes in multiple-network-namespace cases

    - relax our 4.1 session limits; if you've an artificially low limit
    to the number of 4.1 clients that can mount simultaneously, try
    upgrading"

    * tag 'nfsd-4.15' of git://linux-nfs.org/~bfields/linux: (22 commits)
    SUNRPC: Improve ordering of transport processing
    nfsd: deal with revoked delegations appropriately
    svcrdma: Enqueue after setting XPT_CLOSE in completion handlers
    nfsd: use nfs->ns.inum as net ID
    rpc: remove some BUG()s
    svcrdma: Preserve CB send buffer across retransmits
    nfds: avoid gettimeofday for nfssvc_boot time
    fs, nfsd: convert nfs4_file.fi_ref from atomic_t to refcount_t
    fs, nfsd: convert nfs4_cntl_odstate.co_odcount from atomic_t to refcount_t
    fs, nfsd: convert nfs4_stid.sc_count from atomic_t to refcount_t
    lockd: double unregister of inetaddr notifiers
    nfsd4: catch some false session retries
    nfsd4: fix cached replies to solo SEQUENCE compounds
    sunrcp: make function _svc_create_xprt static
    SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status
    nfsd: use ARRAY_SIZE
    nfsd: give out fewer session slots as limit approaches
    nfsd: increase DRC cache limit
    nfsd: remove unnecessary nofilehandle checks
    nfs_common: convert int to bool
    ...

    Linus Torvalds
     

18 Nov, 2017

24 commits

  • Pull NFS client updates from Anna Schumaker:
    "Stable bugfixes:
    - Revalidate "." and ".." correctly on open
    - Avoid RCU usage in tracepoints
    - Fix ugly referral attributes
    - Fix a typo in nomigration mount option
    - Revert "NFS: Move the flock open mode check into nfs_flock()"

    Features:
    - Implement a stronger send queue accounting system for NFS over RDMA
    - Switch some atomics to the new refcount_t type

    Other bugfixes and cleanups:
    - Clean up access mode bits
    - Remove special-case revalidations in nfs_opendir()
    - Improve invalidating NFS over RDMA memory for async operations that
    time out
    - Handle NFS over RDMA replies with a workqueue
    - Handle NFS over RDMA sends with a workqueue
    - Fix up replaying interrupted requests
    - Remove dead NFS over RDMA definitions
    - Update NFS over RDMA copyright information
    - Be more consistent with bool initialization and comparisons
    - Mark expected switch fall throughs
    - Various sunrpc tracepoint cleanups
    - Fix various OPEN races
    - Fix a typo in nfs_rename()
    - Use common error handling code in nfs_lock_and_join_request()
    - Check that some structures are properly cleaned up during
    net_exit()
    - Remove net pointer from dprintk()s"

    * tag 'nfs-for-4.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (62 commits)
    NFS: Revert "NFS: Move the flock open mode check into nfs_flock()"
    NFS: Fix typo in nomigration mount option
    nfs: Fix ugly referral attributes
    NFS: super: mark expected switch fall-throughs
    sunrpc: remove net pointer from messages
    nfs: remove net pointer from messages
    sunrpc: exit_net cleanup check added
    nfs client: exit_net cleanup check added
    nfs/write: Use common error handling code in nfs_lock_and_join_requests()
    NFSv4: Replace closed stateids with the "invalid special stateid"
    NFSv4: nfs_set_open_stateid must not trigger state recovery for closed state
    NFSv4: Check the open stateid when searching for expired state
    NFSv4: Clean up nfs4_delegreturn_done
    NFSv4: cleanup nfs4_close_done
    NFSv4: Retry NFS4ERR_OLD_STATEID errors in layoutreturn
    pNFS: Retry NFS4ERR_OLD_STATEID errors in layoutreturn-on-close
    NFSv4: Don't try to CLOSE if the stateid 'other' field has changed
    NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID.
    NFS: Fix a typo in nfs_rename()
    NFSv4: Fix open create exclusive when the server reboots
    ...

    Linus Torvalds
     
  • Publishing of net pointer is not safe, use net->ns.inum as net ID
    [ 171.391947] RPC: created new rpcb local clients
    (rpcb_local_clnt: ..., rpcb_local_clnt4: ...) for net f00001e7
    [ 171.767188] NFSD: starting 90-second grace period (net f00001e7)

    Signed-off-by: Vasily Averin
    Signed-off-by: Anna Schumaker

    Vasily Averin
     
  • Ensure that the all_clients list initialized in the net_init hook is
    returned to its initial state.

    Signed-off-by: Vasily Averin
    Signed-off-by: Anna Schumaker

    Vasily Averin
     
  • Display information about the RPC procedure being requested in the
    trace log. This sometimes critical information cannot always be
    derived from other RPC trace entries.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • The rpc_task_begin trace point always displays a task ID of zero.
    Move the trace point call site so that it picks up the new task ID.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • In preparation to enabling -Wimplicit-fallthrough, mark switch cases
    where we are expecting to fall through.

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Anna Schumaker

    Gustavo A. R. Silva
     
  • Credit work contributed by Oracle engineers since 2014.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up. This include should have been removed by
    commit 23826c7aeac7 ("xprtrdma: Serialize credit accounting again").

    Signed-off-by: Chuck Lever
    Reviewed-by: Devesh Sharma
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: C-structure style XDR encoding and decoding logic has
    been replaced over the past several merge windows on both the
    client and server. These data structures are no longer used.

    Signed-off-by: Chuck Lever
    Reviewed-by: Devesh Sharma
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Lift the Send and LocalInv completion handlers out of soft IRQ mode
    to make room for other work. Also, move the Send CQ to a different
    CPU than the CPU where the Receive CQ is running, for improved
    scalability.

    Signed-off-by: Chuck Lever
    Reviewed-by: Devesh Sharma
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • The sendctx circular queue now guarantees that xprtrdma cannot
    overflow the Send Queue, so remove the remaining bits of the
    original Send WQE counting mechanism.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • When an RPC Call includes a file data payload, that payload can come
    from pages in the page cache, or a user buffer (for direct I/O).

    If the payload can fit inline, xprtrdma includes it in the Send
    using a scatter-gather technique. xprtrdma mustn't allow the RPC
    consumer to re-use the memory where that payload resides before the
    Send completes. Otherwise, the new contents of that memory would be
    exposed by an HCA retransmit of the Send operation.

    So, block RPC completion on Send completion, but only in the case
    where a separate file data payload is part of the Send. This
    prevents the reuse of that memory while it is still part of a Send
    operation without an undue cost to other cases.

    Waiting is avoided in the common case because typically the Send
    will have completed long before the RPC Reply arrives.

    These days, an RPC timeout will trigger a disconnect, which tears
    down the QP. The disconnect flushes all waiting Sends. This bounds
    the amount of time the reply handler has to wait for a Send
    completion.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Invoke a common routine for releasing hardware resources (for
    example, invalidating MRs). This needs to be done whether an
    RPC Reply has arrived or the RPC was terminated early.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • We have one boolean flag in rpcrdma_req today. I'd like to add more
    flags, so convert that boolean to a bit flag.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Problem statement:

    Recently Sagi Grimberg observed that kernel RDMA-
    enabled storage initiators don't handle delayed Send completion
    correctly. If Send completion is delayed beyond the end of a ULP
    transaction, the ULP may release resources that are still being used
    by the HCA to complete a long-running Send operation.

    This is a common design trait amongst our initiators. Most Send
    operations are faster than the ULP transaction they are part of.
    Waiting for a completion for these is typically unnecessary.

    Infrequently, a network partition or some other problem crops up
    where an ordering problem can occur. In NFS parlance, the RPC Reply
    arrives and completes the RPC, but the HCA is still retrying the
    Send WR that conveyed the RPC Call. In this case, the HCA can try
    to use memory that has been invalidated or DMA unmapped, and the
    connection is lost. If that memory has been re-used for something
    else (possibly not related to NFS), the Send retransmission
    exposes that data on the wire.

    Thus we cannot assume that it is safe to release Send-related
    resources just because a ULP reply has arrived.

    After some analysis, we have determined that the completion
    housekeeping will not be difficult for xprtrdma:

    - Inline Send buffers are registered via the local DMA key, and
    are already left DMA mapped for the lifetime of a transport
    connection, thus no additional handling is necessary for those
    - Gathered Sends involving page cache pages _will_ need to
    DMA unmap those pages after the Send completes. But like
    inline send buffers, they are registered via the local DMA key,
    and thus will not need to be invalidated

    In addition, RPC completion will need to wait for Send completion
    in the latter case. However, nearly always, the Send that conveys
    the RPC Call will have completed long before the RPC Reply
    arrives, and thus no additional latency will be accrued.

    Design notes:

    In this patch, the rpcrdma_sendctx object is introduced, and a
    lock-free circular queue is added to manage a set of them per
    transport.

    The RPC client's send path already prevents sending more than one
    RPC Call at the same time. This allows us to treat the consumer
    side of the queue (rpcrdma_sendctx_get_locked) as if there is a
    single consumer thread.

    The producer side of the queue (rpcrdma_sendctx_put_locked) is
    invoked only from the Send completion handler, which is a single
    thread of execution (soft IRQ).

    The only care that needs to be taken is with the tail index, which
    is shared between the producer and consumer. Only the producer
    updates the tail index. The consumer compares the head with the
    tail to ensure that a sendctx that is in use is never handed
    out again (or, expressed more conventionally, the queue is empty).

    When the sendctx queue empties completely, there are enough Sends
    outstanding that posting more Send operations can result in a Send
    Queue overflow. In this case, the ULP is told to wait and try again.
    This introduces strong Send Queue accounting to xprtrdma.

    As a final touch, Jason Gunthorpe
    suggested a mechanism that does not require signaling every Send.
    We signal once every N Sends, and perform SGE unmapping of N Send
    operations during that one completion.

    Reported-by: Sagi Grimberg
    Suggested-by: Jason Gunthorpe
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Commit 655fec6987be ("xprtrdma: Use gathered Send for large inline
    messages") assumed that, since the zeroeth element of the Send SGE
    array always pointed to req->rl_rdmabuf, it needed to be initialized
    just once. This was a valid assumption because the Send SGE array
    and rl_rdmabuf both live in the same rpcrdma_req.

    In a subsequent patch, the Send SGE array will be separated from the
    rpcrdma_req, so the zeroeth element of the SGE array needs to be
    initialized every time.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: Make rpcrdma_prepare_send_sges() return a negative errno
    instead of a bool. Soon callers will want distinct treatments of
    different types of failures.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • When this function fails, it needs to undo the DMA mappings it's
    done so far. Otherwise these are leaked.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up. rpcrdma_prepare_hdr_sge() sets num_sge to one, then
    rpcrdma_prepare_msg_sges() sets num_sge again to the count of SGEs
    it added, plus one for the header SGE just mapped in
    rpcrdma_prepare_hdr_sge(). This is confusing, and nails in an
    assumption about when these functions are called.

    Instead, maintain a running count that both functions can update
    with just the number of SGEs they have added to the SGE array.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • We need to decode and save the incoming rdma_credits field _after_
    we know that the direction of the message is "forward direction
    Reply". Otherwise, the credits value in reverse direction Calls is
    also used to update the forward direction credits.

    It is safe to decode the rdma_credits field in rpcrdma_reply_handler
    now that rpcrdma_reply_handler is single-threaded. Receives complete
    in the same order as they were sent on the NFS server.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • I noticed that the soft IRQ thread looked pretty busy under heavy
    I/O workloads. perf suggested one area that was expensive was the
    queue_work() call in rpcrdma_wc_receive. That gave me some ideas.

    Instead of scheduling a separate worker to process RPC Replies,
    promote the Receive completion handler to IB_POLL_WORKQUEUE, and
    invoke rpcrdma_reply_handler directly.

    Note that the poll workqueue is single-threaded. In order to keep
    memory invalidation from serializing all RPC Replies, handle any
    necessary invalidation tasks in a separate multi-threaded workqueue.

    This provides a two-tier scheme, similar to OS I/O interrupt
    handlers: A fast interrupt handler that schedules the slow handler
    and re-enables the interrupt, and a slower handler that is invoked
    for any needed heavy lifting.

    Benefits include:
    - One less context switch for RPCs that don't register memory
    - Receive completion handling is moved out of soft IRQ context to
    make room for other users of soft IRQ
    - The same CPU core now DMA syncs and XDR decodes the Receive buffer

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: I'd like to be able to invoke the tail of
    rpcrdma_reply_handler in two different places. Split the tail out
    into its own helper function.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: Make it easier to pass the decoded XID, vers, credits, and
    proc fields around by moving these variables into struct rpcrdma_rep.

    Note: the credits field will be handled in a subsequent patch.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • A reply with an unrecognized value in the version field means the
    transport header is potentially garbled and therefore all the fields
    are untrustworthy.

    Fixes: 59aa1f9a3cce3 ("xprtrdma: Properly handle RDMA_ERROR ... ")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

16 Nov, 2017

1 commit

  • Pull module updates from Jessica Yu:
    "Summary of modules changes for the 4.15 merge window:

    - treewide module_param_call() cleanup, fix up set/get function
    prototype mismatches, from Kees Cook

    - minor code cleanups"

    * tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
    module: Do not paper over type mismatches in module_param_call()
    treewide: Fix function prototypes for module_param_call()
    module: Prepare to convert all module_param_call() prototypes
    kernel/module: Delete an error message for a failed memory allocation in add_module_usage()

    Linus Torvalds
     

08 Nov, 2017

4 commits

  • Since it can take a while before a specific thread gets scheduled, it
    is better to implement a first-come-first-served queue mechanism.
    That way, if a thread is already scheduled and is idle, it can pick up
    the work to do from the queue.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • I noticed the server was sometimes not closing the connection after
    a flushed Send. For example, if the client responds with an RNR NAK
    to a Reply from the server, that client might be deadlocked, and
    thus wouldn't send any more traffic. Thus the server wouldn't have
    any opportunity to notice the XPT_CLOSE bit has been set.

    Enqueue the transport so that svcxprt notices the bit even if there
    is no more transport activity after a flushed completion, QP access
    error, or device removal event.

    Signed-off-by: Chuck Lever
    Reviewed-By: Devesh Sharma
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • It would be kinder to WARN() and recover in several spots here instead
    of BUG()ing.

    Also, it looks like the read_u32_from_xdr_buf() call could actually
    fail, though it might require a broken (or malicious) client, so convert
    that to just an error return.

    Reported-by: Weston Andros Adamson
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • During each NFSv4 callback Call, an RDMA Send completion frees the
    page that contains the RPC Call message. If the upper layer
    determines that a retransmit is necessary, this is too soon.

    One possible symptom: after a GARBAGE_ARGS response to an NFSv4.1
    callback request, the following BUG fires on the NFS server:

    kernel: BUG: Bad page state in process kworker/0:2H pfn:7d3ce2
    kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping: (null) index:0x0
    kernel: flags: 0x2fffff80000000()
    kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff
    kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
    kernel: page dumped because: nonzero _refcount
    kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
    ocfs2_nodemanager ocfs2_stackglue rpcrdma ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
    rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
    kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt
    iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801
    mei_me mfd_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp
    acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs
    libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper
    syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm
    mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme
    kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax
    kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811
    kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
    kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
    kernel: Call Trace:
    kernel: dump_stack+0x62/0x80
    kernel: bad_page+0xfe/0x11a
    kernel: free_pages_check_bad+0x76/0x78
    kernel: free_pcppages_bulk+0x364/0x441
    kernel: ? ttwu_do_activate.isra.61+0x71/0x78
    kernel: free_hot_cold_page+0x1c5/0x202
    kernel: __put_page+0x2c/0x36
    kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma]
    kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma]

    This issue exists all the way back to v4.5, but refactoring and code
    re-organization prevents this simple patch from applying to kernels
    older than v4.12. The fix is the same, however, if someone needs to
    backport it.

    Reported-by: Ben Coddington
    BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=314
    Fixes: 5d252f90a800 ('svcrdma: Add class for RDMA backwards ... ')
    Cc: stable@vger.kernel.org # v4.12
    Signed-off-by: Chuck Lever
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Chuck Lever