10 Dec, 2014

15 commits

  • There is a nice kernel helper to escape a given string by provided
    rules. Let's use it instead of a custom approach.
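
    A minimal sketch, assuming the helper in question is
    string_escape_str() from <linux/string_helpers.h>; the flags, the
    escape set, and the wrapper name here are illustrative, not taken
    from the patch:

        #include <linux/errno.h>
        #include <linux/string_helpers.h>

        /* Escape backslash and whitespace in "src" as \ooo octal
         * sequences. The helper returns the size the escaped output
         * needs, so a too-small destination can be detected. */
        static int example_escape(const char *src, char *dst, int len)
        {
                int ret = string_escape_str(src, dst, len,
                                            ESCAPE_OCTAL, "\\ \n\t");

                return ret >= len ? -ENAMETOOLONG : ret;
        }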

    Signed-off-by: Andy Shevchenko
    [bfields@redhat.com: fix length calculation]
    Signed-off-by: J. Bruce Fields

    Andy Shevchenko
     
  • ...move the WARN_ON_ONCE inside the following if block since they use
    the same condition.
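
    A hedged before/after sketch of the pattern (the condition and the
    list names are illustrative):

        /* before: the same condition is evaluated twice */
        WARN_ON_ONCE(!list_empty(&rqstp->rq_list));
        if (!list_empty(&rqstp->rq_list))
                list_del_init(&rqstp->rq_list);

        /* after: one evaluation, same one-shot warning */
        if (!list_empty(&rqstp->rq_list)) {
                WARN_ON_ONCE(1);
                list_del_init(&rqstp->rq_list);
        }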

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • These were useful when I was tracking down a race condition between
    svc_xprt_do_enqueue and svc_get_next_xprt.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Testing has shown that the pool->sp_lock can be a bottleneck on a busy
    server. Every time data is received on a socket, the server must take
    that lock in order to dequeue a thread from the sp_threads list.

    Address this problem by eliminating the sp_threads list (which contains
    threads that are currently idle) and replacing it with a RQ_BUSY flag in
    svc_rqst. This allows us to walk the sp_all_threads list under the
    rcu_read_lock and find a suitable thread for the xprt by doing a
    test_and_set_bit.

    Note, however, that we still have a potential atomicity problem with
    this approach. We don't want svc_xprt_do_enqueue to set the
    rqst->rq_xprt pointer unless a test_and_set_bit of RQ_BUSY returned
    zero (which indicates that the thread was idle). But by the time we
    check that, the bit could be flipped by a waking thread.

    To address this, we acquire a new per-rqst spinlock (rq_lock) and take
    that before doing the test_and_set_bit. If that returns false, then we
    can set rq_xprt and drop the spinlock. Then, when the thread wakes up,
    it must set the bit under the same spinlock and can trust that if it was
    already set then the rq_xprt is also properly set.

    With this scheme, the case where we have an idle thread no longer needs
    to take the highly contended pool->sp_lock at all, and that removes the
    bottleneck.

    That still leaves one issue: what of the case where we walk the whole
    sp_all_threads list and don't find an idle thread? Because the search
    is lockless, it's possible for the queueing to race with a thread that
    is going to sleep. To address that, we queue the xprt and then search
    again.

    If we find an idle thread at that point, we can't attach the xprt to it
    directly since that might race with a different thread waking up and
    finding it. All we can do is wake the idle thread back up and let it
    attempt to find the now-queued xprt.
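
    A condensed sketch of the enqueue side of this scheme; the
    structures are simplified and transport reference counting is
    elided, so treat the names as descriptive rather than exact:

        static bool example_hand_off_xprt(struct svc_pool *pool,
                                          struct svc_xprt *xprt)
        {
                struct svc_rqst *rqstp;

                rcu_read_lock();
                list_for_each_entry_rcu(rqstp, &pool->sp_all_threads,
                                        rq_all) {
                        /* rq_lock makes the idle test and the rq_xprt
                         * store atomic w.r.t. the waking thread */
                        spin_lock_bh(&rqstp->rq_lock);
                        if (test_and_set_bit(RQ_BUSY, &rqstp->rq_flags)) {
                                spin_unlock_bh(&rqstp->rq_lock);
                                continue;       /* busy: keep looking */
                        }
                        rqstp->rq_xprt = xprt;  /* idle: attach */
                        spin_unlock_bh(&rqstp->rq_lock);
                        wake_up_process(rqstp->rq_task);
                        rcu_read_unlock();
                        return true;
                }
                rcu_read_unlock();
                return false;   /* caller queues the xprt, searches again */
        }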

    Signed-off-by: Jeff Layton
    Tested-by: Chris Worley
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • In a later patch, we'll be removing some spinlocking around the socket
    and thread queueing code in order to fix some contention problems. At
    that point, the stats counters will no longer be protected by the
    sp_lock.

    Change the counters to atomic_long_t fields, except for the
    "sockets_queued" counter which will still be manipulated under a
    spinlock.
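
    A hedged sketch of the conversion (the stats structure is
    abridged):

        struct example_pool_stats {
                unsigned long   sockets_queued; /* still under a lock */
                atomic_long_t   packets;        /* formerly under sp_lock */
                atomic_long_t   threads_woken;
        };

        static inline void example_count_packet(struct example_pool_stats *s)
        {
                atomic_long_inc(&s->packets);   /* no sp_lock required */
        }

        /* readers use atomic_long_read(&s->packets) */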

    Signed-off-by: Jeff Layton
    Tested-by: Chris Worley
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • ...also make the manipulation of the sp_all_threads list use
    RCU-friendly functions.
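
    A hedged sketch (the rq_rcu_head member is an assumption for this
    sketch): writers still serialize on sp_lock, while the _rcu list
    ops plus a deferred free keep the lockless walkers described in the
    entry above safe:

        static void example_remove_thread(struct svc_pool *pool,
                                          struct svc_rqst *rqstp)
        {
                spin_lock_bh(&pool->sp_lock);
                list_del_rcu(&rqstp->rq_all);   /* was list_del() */
                spin_unlock_bh(&pool->sp_lock);

                /* free only after a grace period, so rcu_read_lock()
                 * walkers of sp_all_threads never see freed memory */
                kfree_rcu(rqstp, rq_rcu_head);
        }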

    Signed-off-by: Jeff Layton
    Tested-by: Chris Worley
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Currently all svc_create callers pass in NULL for the shutdown parm,
    which then gets fixed up to be svc_rpcb_cleanup if the service uses
    rpcbind.

    Simplify this by instead having the only caller that requires it
    (lockd) pass in svc_rpcb_cleanup, and get rid of the special casing.
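
    A sketch of the resulting calling convention (the buffer-size
    constant and the call site are illustrative):

        /* lockd, the one caller that needs rpcbind cleanup, now says
         * so explicitly; everyone else passes NULL */
        static struct svc_serv *example_lockd_create(void)
        {
                return svc_create(&nlmsvc_program, LOCKD_BUFSIZE,
                                  svc_rpcb_cleanup);
        }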

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • The way that svc_wake_up works is a bit inefficient. It walks all of the
    available pools for a service and either wakes up a task in each one or
    sets the SP_TASK_PENDING flag in each one.

    When svc_wake_up is called, there is no need to wake up more than one
    thread to do this work. In practice, only lockd currently uses this
    function and it's single threaded anyway. Thus, this just boils down to
    doing a wake up of a thread in pool 0 or setting a single flag.

    Eliminate the for loop in this function and change it to just operate on
    pool 0. Also update the comments that sit above it and get rid of some
    code that has been commented out for years now.
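
    A hedged sketch of the simplified function; the wakeup detail is
    condensed and the names mirror the description above:

        void example_svc_wake_up(struct svc_serv *serv)
        {
                struct svc_pool *pool = &serv->sv_pools[0]; /* pool 0 only */
                struct svc_rqst *rqstp;

                spin_lock_bh(&pool->sp_lock);
                if (!list_empty(&pool->sp_threads)) {
                        /* an idle thread exists: wake it */
                        rqstp = list_first_entry(&pool->sp_threads,
                                                 struct svc_rqst, rq_list);
                        wake_up_process(rqstp->rq_task);
                } else {
                        /* nobody idle: leave a note for the next thread */
                        set_bit(SP_TASK_PENDING, &pool->sp_flags);
                }
                spin_unlock_bh(&pool->sp_lock);
        }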

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • In a later patch, we'll want to be able to handle this flag without
    holding the sp_lock. Change this field to an unsigned long flags
    field, and declare a new flag in it that can be managed with atomic
    bitops.
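
    A minimal sketch of the change (structure abridged; the flag value
    follows the usual bit-index convention):

        #define SP_TASK_PENDING 0       /* work pending, no xprt queued */

        struct example_svc_pool {
                unsigned long   sp_flags;  /* replaces a bool under sp_lock */
                /* ... */
        };

        static bool example_take_pending(struct example_svc_pool *pool)
        {
                /* observe-and-clear atomically, without sp_lock */
                return test_and_clear_bit(SP_TASK_PENDING, &pool->sp_flags);
        }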

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • In a later patch, we're going to need some atomic bit flags. Since that
    field will need to be an unsigned long, we mitigate that space
    consumption by migrating some other bitflags to the new field. Start
    with the rq_secure flag.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Mainly what I need is 860a0d9e511f "sunrpc: add some tracepoints in
    svc_rqst handling functions", which subsequent server rpc patches from
    jlayton depend on. I'm merging this later tag on the assumption that's
    more likely to be a tested and stable point.

    J. Bruce Fields
     

02 Dec, 2014

2 commits


28 Nov, 2014

2 commits

  • Add a new directory hierarchy under the debugfs sunrpc/ directory:

    sunrpc/
        rpc_xprt/
            <xprt-id>/

    Within that directory, we can put files that give info about the
    xprts. We do have the (minor) problem that there is no succinct,
    unique identifier for rpc_xprts. So we generate them synthetically
    with a static atomic_t counter.

    For now, this directory just holds an "info" file, but we may add
    other files to it in the future.
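
    A hedged sketch of the registration (the fops and helper names are
    illustrative; real code would wire the "info" file to a seq_file
    show routine):

        #include <linux/debugfs.h>

        static const struct file_operations xprt_info_fops; /* stub */

        static struct dentry *example_xprt_debugfs(struct rpc_xprt *xprt,
                                                   struct dentry *parent)
        {
                static atomic_t cur_id; /* synthetic, unique xprt id */
                char name[16];
                struct dentry *dir;

                snprintf(name, sizeof(name), "%d",
                         atomic_inc_return(&cur_id));
                dir = debugfs_create_dir(name, parent); /* <xprt-id>/ */
                if (!dir)
                        return NULL;
                debugfs_create_file("info", S_IFREG | 0400, dir,
                                    xprt, &xprt_info_fops);
                return dir;
        }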

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • It's possible to get a dump of the RPC task queue by writing a value to
    /proc/sys/sunrpc/rpc_debug. If you write any value to that file, you get
    a dump of the RPC client task list into the log buffer. This is a
    rather inconvenient interface, however, and it makes it hard to get
    immediate info about the task queue.

    Add a new directory hierarchy under debugfs:

    sunrpc/
        rpc_clnt/
            <clientid>/

    Within each clientid directory we create a new "tasks" file that will
    dump info similar to what shows up in the log buffer, but with a few
    small differences -- we avoid printing raw kernel addresses in favor of
    symbolic names and the XID is also displayed.
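
    A hedged sketch of such a "tasks" show routine (field use is
    illustrative; %ps prints a symbolic name for a function pointer
    instead of a raw address):

        static int example_tasks_show(struct seq_file *f, void *v)
        {
                struct rpc_clnt *clnt = f->private;
                struct rpc_task *task;

                spin_lock(&clnt->cl_lock);
                list_for_each_entry(task, &clnt->cl_tasks, tk_task)
                        seq_printf(f, "flags=%04x status=%d xid=0x%08x %ps\n",
                                   task->tk_flags, task->tk_status,
                                   task->tk_rqstp ?
                                        be32_to_cpu(task->tk_rqstp->rq_xid) : 0,
                                   task->tk_action);
                spin_unlock(&clnt->cl_lock);
                return 0;
        }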

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     

27 Nov, 2014

2 commits

  • Pull NFS client RDMA changes for 3.19 from Anna Schumaker:
    "NFS: Client side changes for RDMA

    These patches provide various bugfixes and cleanups for using NFS over
    RDMA, including better error handling and performance improvements by
    using pad optimization.

    Signed-off-by: Anna Schumaker "

    * tag 'nfs-rdma-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
    xprtrdma: Display async errors
    xprtrdma: Enable pad optimization
    xprtrdma: Re-write rpcrdma_flush_cqs()
    xprtrdma: Refactor tasklet scheduling
    xprtrdma: unmap all FMRs during transport disconnect
    xprtrdma: Cap req_cqinit
    xprtrdma: Return an errno from rpcrdma_register_external()

    Trond Myklebust
     
  • Pull additional NFS client changes for 3.19 from Anna Schumaker:
    "NFS: Generic client side changes from Chuck

    These patches include fixes for iostats and SETCLIENTID, in addition
    to cleaning up the nfs4_init_callback() function.

    Signed-off-by: Anna Schumaker "

    * tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
    NFS: Clean up nfs4_init_callback()
    NFS: SETCLIENTID XDR buffer sizes are incorrect
    SUNRPC: serialize iostats updates

    Trond Myklebust
     

26 Nov, 2014

8 commits

  • Occasionally mountstats reports a negative retransmission rate.
    Ensure that two RPCs completing concurrently don't confuse the sums
    in the transport's op_metrics array.

    Since pNFS filelayout can invoke rpc_count_iostats() on another
    transport from xprt_release(), we can't rely on simply holding the
    transport_lock in xprt_release(). There's nothing for it but hard
    serialization. One spin lock per RPC operation should make this as
    painless as it can be.
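
    A hedged sketch of the per-op serialization (the metrics structure
    is abridged):

        struct example_iostats {
                spinlock_t      om_lock;        /* new: one per op */
                unsigned long   om_ops;
                /* ... byte counts and timings ... */
        };

        static void example_count_one(struct example_iostats *op_metrics)
        {
                spin_lock(&op_metrics->om_lock);
                op_metrics->om_ops++;
                /* ... accumulate the remaining sums here ... */
                spin_unlock(&op_metrics->om_lock);
        }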

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • An async error upcall is a hard error, and should be reported in
    the system log.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • The Linux NFS/RDMA server used to reject NFSv3 WRITE requests when
    pad optimization was enabled. That bug was fixed by commit
    e560e3b510d2 ("svcrdma: Add zero padding if the client doesn't send
    it").

    We can now enable pad optimization on the client, which helps
    performance and is supported now by both Linux and Solaris servers.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Currently rpcrdma_flush_cqs() attempts to avoid code duplication, and
    simply invokes rpcrdma_recvcq_upcall and rpcrdma_sendcq_upcall. That
    approach has two problems:

    1. rpcrdma_flush_cqs() can run concurrently with provider upcalls.
    Both flush_cqs() and the upcalls were invoking ib_poll_cq() in
    different threads using the same wc buffers (ep->rep_recv_wcs
    and ep->rep_send_wcs), added by commit 1c00dd077654 ("xprtrmda:
    Reduce calls to ib_poll_cq() in completion handlers").

    During transport disconnect processing, this sometimes resulted
    in the same reply getting added to the rpcrdma_tasklets_g list
    more than once, which corrupted the list.

    2. The upcall functions drain only a limited number of CQEs, thanks
    to the poll budget added by commit 8301a2c047cc ("xprtrdma: Limit
    work done by completion handler"), so a single flush call may not
    actually empty the CQs.

    Fixes: a7bc211ac926 ("xprtrdma: On disconnect, don't ignore ... ")
    BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=276
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Restore the separate function that schedules the reply handling
    tasklet. I need to call it from two different paths.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • When using RPCRDMA_MTHCAFMR memory registration, after a few
    transport disconnect / reconnect cycles, ib_map_phys_fmr() starts to
    return EINVAL because the provider has exhausted its map pool.

    Make sure that all FMRs are unmapped during transport disconnect,
    and that ->send_request remarshals them during an RPC retransmit.
    This resets the transport's MRs to ensure that none are leaked
    during a disconnect.
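
    A hedged sketch of the disconnect-side unmap (the array/count
    calling convention is illustrative):

        static int example_unmap_all_fmrs(struct ib_fmr **fmrs,
                                          unsigned int count)
        {
                LIST_HEAD(fmr_list);
                unsigned int i;

                for (i = 0; i < count; i++)
                        list_add_tail(&fmrs[i]->list, &fmr_list);

                /* one verb resets every FMR, so no mapping survives
                 * the disconnect */
                return ib_unmap_fmr(&fmr_list);
        }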

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Recent work made FRMR registration and invalidation completions
    unsignaled. This greatly reduces the adapter interrupt rate.

    Every so often, however, a posted send Work Request is allowed to
    signal. Otherwise, the provider's Work Queue will wrap and the
    workload will hang.

    The number of Work Requests that are allowed to remain unsignaled is
    determined by the value of req_cqinit. Currently, this is set to the
    size of the send Work Queue divided by two, minus 1.

    For FRMR, the send Work Queue is the maximum number of concurrent
    RPCs (currently 32) times the maximum number of Work Requests an
    RPC might use (currently 7, though some adapters may need more).

    For mlx4, this is 224 entries. This leaves completion signaling
    disabled for 111 send Work Requests.

    Some providers hold back dispatching Work Requests until a CQE is
    generated. If completions are disabled, then no CQEs are generated
    for quite some time, and that can stall the Work Queue.

    I've seen this occur running xfstests generic/113 over NFSv4, where
    eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM
    because the Work Queue has overflowed. The connection is dropped
    and re-established.

    Cap the rep_cqinit setting so completions are not left turned off
    for too long.
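
    A hedged sketch of the cap, using the arithmetic above (32 RPCs *
    7 WRs = 224 entries; 224 / 2 - 1 = 111 unsignaled sends before the
    cap). The constant is illustrative:

        #define EXAMPLE_MAX_UNSIGNALED_SENDS    32

        static unsigned int example_cqinit(unsigned int send_queue_depth)
        {
                unsigned int cqinit = send_queue_depth / 2 - 1; /* old policy */

                return min_t(unsigned int, cqinit,
                             EXAMPLE_MAX_UNSIGNALED_SENDS);
        }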

    BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=269
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • The RPC/RDMA send_request method and the chunk registration code
    expect an errno from the registration function. This allows
    the upper layers to distinguish between a recoverable failure
    (for example, temporary memory exhaustion) and a hard failure
    (for example, a bug in the registration logic).

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

25 Nov, 2014

5 commits


20 Nov, 2014

2 commits


14 Nov, 2014

1 commit

  • Bruce reported that he was seeing the following BUG pop:

    BUG: sleeping function called from invalid context at mm/slab.c:2846
    in_atomic(): 0, irqs_disabled(): 0, pid: 4539, name: mount.nfs
    2 locks held by mount.nfs/4539:
    #0: (nfs_clid_init_mutex){+.+.+.}, at: [] nfs4_discover_server_trunking+0x4a/0x2f0 [nfsv4]
    #1: (rcu_read_lock){......}, at: [] gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
    Preemption disabled at:[] printk+0x4d/0x4f

    CPU: 3 PID: 4539 Comm: mount.nfs Not tainted 3.18.0-rc1-00013-g5b095e9 #3393
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    ffff880021499390 ffff8800381476a8 ffffffff81a534cf 0000000000000001
    0000000000000000 ffff8800381476c8 ffffffff81097854 00000000000000d0
    0000000000000018 ffff880038147718 ffffffff8118e4f3 0000000020479f00
    Call Trace:
    [] dump_stack+0x4f/0x7c
    [] __might_sleep+0x114/0x180
    [] __kmalloc+0x1a3/0x280
    [] gss_stringify_acceptor+0x58/0xb0 [auth_rpcgss]
    [] ? gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
    [] rpcauth_stringify_acceptor+0x18/0x30 [sunrpc]
    [] nfs4_proc_setclientid+0x199/0x380 [nfsv4]
    [] ? nfs4_proc_setclientid+0x200/0x380 [nfsv4]
    [] nfs40_discover_server_trunking+0xda/0x150 [nfsv4]
    [] ? nfs40_discover_server_trunking+0x5/0x150 [nfsv4]
    [] nfs4_discover_server_trunking+0x7f/0x2f0 [nfsv4]
    [] nfs4_init_client+0x104/0x2f0 [nfsv4]
    [] nfs_get_client+0x314/0x3f0 [nfs]
    [] ? nfs_get_client+0xe0/0x3f0 [nfs]
    [] nfs4_set_client+0x8a/0x110 [nfsv4]
    [] ? __rpc_init_priority_wait_queue+0xa8/0xf0 [sunrpc]
    [] nfs4_create_server+0x12f/0x390 [nfsv4]
    [] nfs4_remote_mount+0x32/0x60 [nfsv4]
    [] mount_fs+0x39/0x1b0
    [] ? __alloc_percpu+0x15/0x20
    [] vfs_kern_mount+0x6b/0x150
    [] nfs_do_root_mount+0x86/0xc0 [nfsv4]
    [] nfs4_try_mount+0x44/0xc0 [nfsv4]
    [] ? get_nfs_version+0x27/0x90 [nfs]
    [] nfs_fs_mount+0x47d/0xd60 [nfs]
    [] ? mutex_unlock+0xe/0x10
    [] ? nfs_remount+0x430/0x430 [nfs]
    [] ? nfs_clone_super+0x140/0x140 [nfs]
    [] mount_fs+0x39/0x1b0
    [] ? __alloc_percpu+0x15/0x20
    [] vfs_kern_mount+0x6b/0x150
    [] do_mount+0x210/0xbe0
    [] ? copy_mount_options+0x3a/0x160
    [] SyS_mount+0x6f/0xb0
    [] system_call_fastpath+0x12/0x17

    Sleeping under the rcu_read_lock is bad. This patch fixes it by dropping
    the rcu_read_lock before doing the allocation and then reacquiring it
    and redoing the dereference before doing the copy. If we find that the
    string has somehow grown in the meantime, we'll reallocate and try again.
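
    A hedged sketch of that allocate-outside-then-recheck pattern (the
    slot type and helper name are illustrative):

        static char *example_copy_rcu_string(const char __rcu **slot)
        {
                const char *s;
                char *buf;
                size_t len;

                rcu_read_lock();
                len = strlen(rcu_dereference(*slot));
                rcu_read_unlock();
        retry:
                buf = kmalloc(len + 1, GFP_KERNEL); /* may sleep: no RCU */
                if (!buf)
                        return NULL;
                rcu_read_lock();
                s = rcu_dereference(*slot);
                if (strlen(s) > len) {  /* grew while we allocated */
                        len = strlen(s);
                        rcu_read_unlock();
                        kfree(buf);
                        goto retry;
                }
                memcpy(buf, s, strlen(s) + 1);
                rcu_read_unlock();
                return buf;
        }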

    Cc: stable@vger.kernel.org # v3.17+
    Reported-by: "J. Bruce Fields"
    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     

29 Oct, 2014

1 commit

  • The m->pool_to[] array has "maxpools" elements. It's allocated in
    svc_pool_map_alloc_arrays(), which we called earlier in the function.
    This test should be >= instead of >.

    Signed-off-by: Dan Carpenter
    Signed-off-by: J. Bruce Fields

    Dan Carpenter
     

24 Oct, 2014

2 commits