24 Oct, 2010

15 commits

  • By requesting more attributes during a readdir, we can mimic the readdir plus
    operation that NFSv3 had.

    To test, I ran the command `ls -lU --color=none` on directories with various
    numbers of files. Without readdir plus, I see this:

    n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
    --------+-----------+-----------+-----------+-----------+----------
    real    | 0m00.153s | 0m00.589s | 0m05.601s | 0m56.691s | 9m59.128s
    user    | 0m00.007s | 0m00.007s | 0m00.077s | 0m00.703s | 0m06.800s
    sys     | 0m00.010s | 0m00.070s | 0m00.633s | 0m06.423s | 1m10.005s
    access  |         3 |         1 |         1 |         4 |        31
    getattr |         2 |         1 |         1 |         1 |         1
    lookup  |       104 |     1,003 |    10,003 |   100,003 | 1,000,003
    readdir |         2 |        16 |       158 |     1,575 |    15,749
    total   |       111 |     1,021 |    10,163 |   101,583 | 1,015,784

    With readdir plus enabled, I see this:

    n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
    --------+-----------+-----------+-----------+-----------+----------
    real    | 0m00.115s | 0m00.206s | 0m01.079s | 0m12.521s | 2m07.528s
    user    | 0m00.003s | 0m00.003s | 0m00.040s | 0m00.290s | 0m03.296s
    sys     | 0m00.007s | 0m00.020s | 0m00.120s | 0m01.357s | 0m17.556s
    access  |         3 |         1 |         1 |         1 |         7
    getattr |         2 |         1 |         1 |         1 |         1
    lookup  |         4 |         3 |         3 |         3 |         3
    readdir |         6 |        62 |       630 |     6,300 |    62,993
    total   |        15 |        67 |       635 |     6,305 |    63,004

    With readdir plus disabled, the client issues roughly 16x as many RPC calls
    and is 4-5 times slower on large directories.
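
    The timing command above can be re-run locally; the sketch below (directory
    location and file count are arbitrary, and on an actual NFS mount `nfsstat -c`
    before and after would give the per-operation RPC counts shown in the tables):

```shell
# Build a scratch directory with 1,000 files and time an unsorted
# long listing, mirroring the test described above.
dir=$(mktemp -d)
for i in $(seq 1 1000); do : > "$dir/f$i"; done
time ls -lU --color=none "$dir" > /dev/null
ls -U "$dir" | wc -l        # sanity check: 1000 entries
rm -rf "$dir"
```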

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Getattr should be able to decode errors and the readdir file handle.
    decode_getfattr_attrs does the actual attribute decoding, while
    decode_getfattr_generic will check the opcode before decoding. This will
    let other functions call decode_getfattr_attrs to decode their attributes.
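
    The split can be pictured with a user-space sketch (names and the opcode
    value are illustrative, not the kernel's): the generic decoder validates the
    opcode, then delegates attribute parsing to a helper that other decoders may
    call directly.

```c
#include <assert.h>
#include <errno.h>

#define OP_GETATTR 9            /* illustrative opcode value */

struct stream { const int *p; };

/* Does the actual attribute decoding; callable on its own. */
static int decode_attrs(struct stream *s, int *attr_out)
{
        *attr_out = *s->p++;    /* pretend-decode a single attribute */
        return 0;
}

/* Checks the opcode first, then hands off to decode_attrs(). */
static int decode_getattr_generic(struct stream *s, int *attr_out)
{
        int op = *s->p++;
        if (op != OP_GETATTR)
                return -EIO;
        return decode_attrs(s, attr_out);
}
```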

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Check if the decoded entry has the eof bit set when returning from xdr_decode
    with an error. If it does, we should set the eof bits in the array before
    returning. This should keep us from looping when we expect more data but the
    server doesn't give us anything new.
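
    A toy model of the rule (not kernel code): if the decoder fails but has
    already seen the end-of-directory marker, the cache array is marked eof
    anyway, so the reader will not loop forever waiting for entries the server
    will never send.

```c
#include <assert.h>
#include <stdbool.h>

struct entry { bool eof; };             /* last decoded entry's eof bit */
struct array { int n; bool eof; };      /* simplified cache array */

static int fill_array(struct array *a, const struct entry *e, int err)
{
        if (err) {
                if (e->eof)
                        a->eof = true;  /* stop the caller's read loop */
                return err;
        }
        a->n++;
        a->eof = e->eof;
        return 0;
}
```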

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Check for all errors, not a specific one.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • We can use vmapped pages to read more information from the network at once.
    This will reduce the number of calls needed to complete a readdir.
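
    A loose user-space analogy (not the kernel change itself): presenting several
    scattered page-sized buffers to a single readv() call pulls more data off the
    descriptor per call than filling the buffers one read at a time, just as
    vmapping a run of pages lets the client consume a larger readdir reply at
    once.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* One call fills both "pages" instead of two separate reads. */
static ssize_t gather_read(int fd, char *a, char *b, size_t n)
{
        struct iovec iov[2] = {
                { .iov_base = a, .iov_len = n },
                { .iov_base = b, .iov_len = n },
        };
        return readv(fd, iov, 2);
}
```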

    Signed-off-by: Bryan Schumaker
    [trondmy: Added #include for <linux/vmalloc.h> in fs/nfs/dir.c]
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Remove the page size checking code for a readdir decode. This is now done
    by decode_dirent with xdr_streams.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Convert nfs*xdr.c to use an xdr stream in decode_dirent. This will prevent a
    kernel oops that has been occurring when reading a vmapped page.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • We sometimes need to be able to read ahead in an xdr_stream without
    incrementing the current pointer position.
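
    The idea can be sketched in user space (this mirrors the shape of the
    change, not the kernel's xdr_stream API): peek returns the next word but
    leaves the position alone, so a later decode still consumes it.

```c
#include <assert.h>
#include <stddef.h>

struct stream { const unsigned *p, *end; };

/* Look at the next word without advancing: no side effects. */
static const unsigned *stream_peek(const struct stream *s)
{
        return s->p < s->end ? s->p : NULL;
}

/* Consume the next word, advancing the position. */
static const unsigned *stream_next(struct stream *s)
{
        return s->p < s->end ? s->p++ : NULL;
}
```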

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • We will now use readdir plus even on directories that are very large.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • This patch adds readdir plus support to the cache array.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • If we're going through the loop in nfs_readdir() more than once, we usually
    do not want to restart searching from the beginning of the page cache.

    We only want to do that if the previous search failed...

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • This patch adds the readdir cache array and functions to retrieve the array
    stored on a cache page, clear the array by freeing allocated memory, add an
    entry to the array, and search the array for a given cookie.

    It then modifies readdir to make use of the new cache array; parts of the
    old readdir code are no longer needed with the cache array method.

    Finally, nfs_llseek_dir() will set file->f_pos to a value greater than 0 and
    desc->dir_cookie to zero. When we see this, readdir needs to find the file
    at position file->f_pos from the start of the directory.
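
    A hypothetical, much simplified model of the cache array (field and function
    names are illustrative only): a fixed array of (cookie, name) entries stored
    per page, with add and search-by-cookie operations.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 16          /* stand-in for "entries per page" */

struct dir_entry { unsigned long long cookie; char name[32]; };
struct dir_array { int size; struct dir_entry entry[MAX_ENTRIES]; };

static int array_add(struct dir_array *a, unsigned long long cookie,
                     const char *name)
{
        if (a->size >= MAX_ENTRIES)
                return -1;      /* page full: caller starts a new page */
        a->entry[a->size].cookie = cookie;
        strncpy(a->entry[a->size].name, name,
                sizeof a->entry[0].name - 1);
        a->size++;
        return 0;
}

/* Return the entry's index on this page, or -1 if the cookie is absent. */
static int array_search(const struct dir_array *a, unsigned long long cookie)
{
        for (int i = 0; i < a->size; i++)
                if (a->entry[i].cookie == cookie)
                        return i;
        return -1;
}
```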

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • nfs4state.c uses interfaces from ratelimit.h. It needs to include
    that header file to fix build errors:

    fs/nfs/nfs4state.c:1195: warning: type defaults to 'int' in declaration of 'DEFINE_RATELIMIT_STATE'
    fs/nfs/nfs4state.c:1195: warning: parameter names (without types) in function declaration
    fs/nfs/nfs4state.c:1195: error: invalid storage class for function 'DEFINE_RATELIMIT_STATE'
    fs/nfs/nfs4state.c:1195: error: implicit declaration of function '__ratelimit'
    fs/nfs/nfs4state.c:1195: error: '_rs' undeclared (first use in this function)

    Signed-off-by: Randy Dunlap
    Cc: Trond Myklebust
    Cc: linux-nfs@vger.kernel.org
    Signed-off-by: Trond Myklebust

    Randy Dunlap
     
  • Otherwise, we cannot recover state correctly.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • If nfs_intent_set_file() returns an error, we usually want to pass that
    back up the stack.

    Also ensure that nfs_open_revalidate() returns '1' on success.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

20 Oct, 2010

4 commits

  • If the server sends us an NFS4ERR_STALE_CLIENTID while the state management
    thread is busy reclaiming state, we do want to treat all state that wasn't
    reclaimed before the STALE_CLIENTID as if a network partition occurred (see
    the edge conditions described in RFC3530 and RFC5661).
    What we do not want to do is to send an nfs4_reclaim_complete(), since we
    haven't yet even started reclaiming state after the server rebooted.

    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    Trond Myklebust
     
  • In the case of a server reboot, the state recovery thread starts by calling
    nfs4_state_end_reclaim_reboot() in order to avoid edge conditions when
    the server reboots while the client is in the middle of recovery.

    However, if the client has already marked the nfs4_state as requiring
    reboot recovery, then the above behaviour will cause the recovery thread to
    treat the open as if it was part of such an edge condition: the open will
    be recovered as if it was part of a lease expiration (and all the locks
    will be lost).
    Fix is to remove the call to nfs4_state_mark_reclaim_reboot from
    nfs4_async_handle_error(), and nfs4_handle_exception(). Instead we leave it
    to the recovery thread to do this for us.

    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    Trond Myklebust
     
  • NFSv4 open recovery is currently broken: since we do not clear the
    state->flags states before attempting recovery, we end up with the
    'can_open_cached()' function triggering. This again leads to no OPEN call
    being put on the wire.
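
    The shape of the bug and fix, as a toy illustration (flag names are made up
    for the sketch): a cached-open check consults per-state flags, so recovery
    must clear them first or no OPEN ever reaches the wire.

```c
#include <assert.h>
#include <stdbool.h>

#define ST_OPEN_READ  (1u << 0)
#define ST_OPEN_WRITE (1u << 1)

/* True means "we already hold a matching open; skip the OPEN rpc". */
static bool can_open_cached(unsigned flags, unsigned mode)
{
        return (flags & mode) == mode;
}

/* Clearing the flags up front forces a real OPEN during recovery. */
static unsigned begin_recovery(unsigned flags)
{
        return flags & ~(ST_OPEN_READ | ST_OPEN_WRITE);
}
```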

    Reported-by: Sachin Prabhu
    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    Trond Myklebust
     
  • In the case where we lock the page, and then find out that the page has
    been thrown out of the page cache, we should just return VM_FAULT_NOPAGE.
    This is what block_page_mkwrite() does in these situations.

    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    Trond Myklebust
     

08 Oct, 2010

1 commit

  • This patch creates a new idmapper system that uses the request-key function to
    place a call into userspace to map user and group ids to names. The old
    idmapper was single threaded, which prevented more than one request from
    running at a time. This meant that a user would have to wait for an upcall to
    finish before accessing a cached result.

    The upcall result is stored on a keyring of type id_resolver. See the file
    Documentation/filesystems/nfs/idmapper.txt for instructions.
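
    The request-key mechanism is configured through /etc/request-key.conf; a line
    along these lines routes id_resolver upcalls to an idmap helper program (the
    helper's path varies by distribution, so treat this fragment as illustrative
    and see the idmapper.txt document above for the authoritative setup):

```
#OP     TYPE          DESCRIPTION   CALLOUT INFO   PROGRAM ARG1...
create  id_resolver   *             *              /usr/sbin/nfs.idmap %k %d
```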

    Signed-off-by: Bryan Schumaker
    [Trond: fix up the return value of nfs_idmap_lookup_name and clean up code]
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     

30 Sep, 2010

3 commits


24 Sep, 2010

3 commits


23 Sep, 2010

1 commit

  • NFS clients since 2.6.12 support flock locks by emulating fcntl byte-range
    locks. Because of this, some Windows applications that appear to use both
    flock (a share-mode lock mapped to flock by Samba) and fcntl locks
    sequentially on the same file cannot take the lock, as they falsely assume
    the file is already locked. The problem was reported on a setup with Windows
    clients accessing Excel files on a Samba-exported share that is itself an
    NFS mount from a NetApp filer.

    Older NFS clients (< 2.6.12) did not see this problem as flock locks were
    considered local. To support legacy flock behavior, this patch adds a mount
    option "-olocal_lock=" which can take the following values:

    'none' - Neither flock locks nor POSIX locks are local
    'flock' - flock locks are local
    'posix' - fcntl/POSIX locks are local
    'all' - Both flock locks and POSIX locks are local

    Testing:

    - This patch was tested by mounting with the -olocal_lock option set to each
    value in turn and noting the NLM calls in a network packet capture:

    'none'  - NLM calls were seen for both flock() and fcntl(); the flock
              lock was granted, fcntl was denied
    'flock' - no NLM call for flock(); an NLM call was seen for fcntl()
              and was granted
    'posix' - an NLM call was seen for flock() and granted; no NLM call
              for fcntl()
    'all'   - no NLM calls were seen for either flock() or fcntl()

    - No bugs were seen during NFSv4 locking/unlocking in general and NFSv4
    reboot recovery.
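
    The two lock families the option distinguishes, seen from user space:
    flock() takes a whole-file BSD-style lock, while fcntl(F_SETLK) takes a
    POSIX byte-range lock. On a local filesystem both succeed below; on an NFS
    mount, local_lock= controls which of them stays on the client instead of
    going to the NLM server.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

static int take_both_locks(const char *path)
{
        int fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0)
                return -1;
        if (flock(fd, LOCK_EX | LOCK_NB) != 0) {    /* BSD-style lock */
                close(fd);
                return -1;
        }
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        if (fcntl(fd, F_SETLK, &fl) != 0) {         /* POSIX lock */
                close(fd);
                return -1;
        }
        close(fd);              /* releases both locks */
        return 0;
}
```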

    Cc: Neil Brown
    Signed-off-by: Suresh Jayaraman
    Signed-off-by: Trond Myklebust

    Suresh Jayaraman
     

22 Sep, 2010

8 commits


18 Sep, 2010

4 commits

  • A synchronous rename can be interrupted by a SIGKILL. If that happens
    during a sillyrename operation, it's possible for the rename call to
    be sent to the server, but the task exits before processing the
    reply. If this happens, the sillyrenamed file won't get cleaned up
    during nfs_dentry_iput and the server is left with a dangling .nfs* file
    hanging around.

    Fix this problem by turning sillyrename into an asynchronous operation
    and have the task doing the sillyrename just wait on the reply. If the
    task is killed before the sillyrename completes, it'll still proceed
    to completion.
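
    A user-space analogy for the shape of the fix (not the kernel's rpc
    machinery): the rename runs in its own context and the caller merely waits
    on it, so even if the waiter disappears the operation still runs to
    completion.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Kick off the rename in a child; the parent may wait for it or be
 * killed, but either way the rename completes in the child. */
static void start_async_rename(const char *from, const char *to)
{
        pid_t pid = fork();
        if (pid == 0) {
                rename(from, to);
                _exit(0);
        }
}
```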

    Signed-off-by: Jeff Layton
    Reviewed-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • ...since that's where most of the sillyrenaming code lives. A comment
    block is added to the beginning as well to clarify how sillyrenaming
    works. Also, make nfs_async_unlink static as nfs_sillyrename is the only
    caller.

    Signed-off-by: Jeff Layton
    Reviewed-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • Right now, v3 and v4 have their own variants. Create a standard struct
    that will work for v3 and v4. v2 doesn't get anything but a simple error
    and so isn't affected by this.

    Signed-off-by: Jeff Layton
    Reviewed-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • Each NFS version has its own version of the rename args container.
    Standardize them on a common one that's identical to the one NFSv4
    uses.
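
    A hedged sketch of what "one args container for every version" might look
    like (field names are illustrative, not the kernel's actual struct): a
    single struct carrying the old and new directory handles and names, which
    the v3 and v4 code paths can share.

```c
#include <assert.h>

struct fh   { unsigned char data[16]; };        /* opaque file handle */
struct qstr { const char *name; unsigned len; };

/* Common rename-arguments container shared across NFS versions. */
struct rename_args {
        const struct fh   *old_dir, *new_dir;
        const struct qstr *old_name, *new_name;
};
```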

    Signed-off-by: Jeff Layton
    Reviewed-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Jeff Layton
     

17 Sep, 2010

1 commit