24 Oct, 2010

15 commits

  • By requesting more attributes during a readdir, we can mimic the readdir plus
    operation that NFSv3 had.

    To test, I ran the command `ls -lU --color=none` on directories with various
    numbers of files. Without readdir plus, I see this:

    n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
    --------+-----------+-----------+-----------+-----------+----------
    real    | 0m00.153s | 0m00.589s | 0m05.601s | 0m56.691s | 9m59.128s
    user    | 0m00.007s | 0m00.007s | 0m00.077s | 0m00.703s | 0m06.800s
    sys     | 0m00.010s | 0m00.070s | 0m00.633s | 0m06.423s | 1m10.005s
    access  |         3 |         1 |         1 |         4 |        31
    getattr |         2 |         1 |         1 |         1 |         1
    lookup  |       104 |     1,003 |    10,003 |   100,003 | 1,000,003
    readdir |         2 |        16 |       158 |     1,575 |    15,749
    total   |       111 |     1,021 |    10,163 |   101,583 | 1,015,784

    With readdir plus enabled, I see this:

    n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
    --------+-----------+-----------+-----------+-----------+----------
    real    | 0m00.115s | 0m00.206s | 0m01.079s | 0m12.521s | 2m07.528s
    user    | 0m00.003s | 0m00.003s | 0m00.040s | 0m00.290s | 0m03.296s
    sys     | 0m00.007s | 0m00.020s | 0m00.120s | 0m01.357s | 0m17.556s
    access  |         3 |         1 |         1 |         1 |         7
    getattr |         2 |         1 |         1 |         1 |         1
    lookup  |         4 |         3 |         3 |         3 |         3
    readdir |         6 |        62 |       630 |     6,300 |    62,993
    total   |        15 |        67 |       635 |     6,305 |    63,004

    With readdir plus disabled, the client issues roughly 16x as many RPC calls
    and is 4-5 times slower on large directories.
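
    The timing command above can be re-run locally; the sketch below (directory
    location and file count are arbitrary, and on an actual NFS mount `nfsstat -c`
    before and after would give the per-operation RPC counts shown in the tables):

```shell
# Build a scratch directory with 1,000 files and time an unsorted
# long listing, mirroring the test described above.
dir=$(mktemp -d)
for i in $(seq 1 1000); do : > "$dir/f$i"; done
time ls -lU --color=none "$dir" > /dev/null
ls -U "$dir" | wc -l        # sanity check: 1000 entries
rm -rf "$dir"
```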

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Getattr should be able to decode errors and the readdir file handle.
    decode_getfattr_attrs does the actual attribute decoding, while
    decode_getfattr_generic will check the opcode before decoding. This will
    let other functions call decode_getfattr_attrs to decode their attributes.
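
    The split can be pictured with a user-space sketch (names and the opcode
    value are illustrative, not the kernel's): the generic decoder validates the
    opcode, then delegates attribute parsing to a helper that other decoders may
    call directly.

```c
#include <assert.h>
#include <errno.h>

#define OP_GETATTR 9            /* illustrative opcode value */

struct stream { const int *p; };

/* Does the actual attribute decoding; callable on its own. */
static int decode_attrs(struct stream *s, int *attr_out)
{
        *attr_out = *s->p++;    /* pretend-decode a single attribute */
        return 0;
}

/* Checks the opcode first, then hands off to decode_attrs(). */
static int decode_getattr_generic(struct stream *s, int *attr_out)
{
        int op = *s->p++;
        if (op != OP_GETATTR)
                return -EIO;
        return decode_attrs(s, attr_out);
}
```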

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Check if the decoded entry has the eof bit set when returning from xdr_decode
    with an error. If it does, we should set the eof bits in the array before
    returning. This should keep us from looping when we expect more data but the
    server doesn't give us anything new.
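
    A toy model of the rule (not kernel code): if the decoder fails but has
    already seen the end-of-directory marker, the cache array is marked eof
    anyway, so the reader will not loop forever waiting for entries the server
    will never send.

```c
#include <assert.h>
#include <stdbool.h>

struct entry { bool eof; };             /* last decoded entry's eof bit */
struct array { int n; bool eof; };      /* simplified cache array */

static int fill_array(struct array *a, const struct entry *e, int err)
{
        if (err) {
                if (e->eof)
                        a->eof = true;  /* stop the caller's read loop */
                return err;
        }
        a->n++;
        a->eof = e->eof;
        return 0;
}
```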

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Check for all errors, not a specific one.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • We can use vmapped pages to read more information from the network at once.
    This will reduce the number of calls needed to complete a readdir.
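
    A loose user-space analogy (not the kernel change itself): presenting several
    scattered page-sized buffers to a single readv() call pulls more data off the
    descriptor per call than filling the buffers one read at a time, just as
    vmapping a run of pages lets the client consume a larger readdir reply at
    once.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* One call fills both "pages" instead of two separate reads. */
static ssize_t gather_read(int fd, char *a, char *b, size_t n)
{
        struct iovec iov[2] = {
                { .iov_base = a, .iov_len = n },
                { .iov_base = b, .iov_len = n },
        };
        return readv(fd, iov, 2);
}
```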

    Signed-off-by: Bryan Schumaker
    [trondmy: Added #include for <linux/vmalloc.h> in fs/nfs/dir.c]
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Remove the page size checking code for a readdir decode. This is now done
    by decode_dirent with xdr_streams.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Convert nfs*xdr.c to use an xdr stream in decode_dirent. This will prevent a
    kernel oops that has been occurring when reading a vmapped page.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • We sometimes need to be able to read ahead in an xdr_stream without
    incrementing the current pointer position.
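
    The idea can be sketched in user space (this mirrors the shape of the
    change, not the kernel's xdr_stream API): peek returns the next word but
    leaves the position alone, so a later decode still consumes it.

```c
#include <assert.h>
#include <stddef.h>

struct stream { const unsigned *p, *end; };

/* Look at the next word without advancing: no side effects. */
static const unsigned *stream_peek(const struct stream *s)
{
        return s->p < s->end ? s->p : NULL;
}

/* Consume the next word, advancing the position. */
static const unsigned *stream_next(struct stream *s)
{
        return s->p < s->end ? s->p++ : NULL;
}
```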

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • We will now use readdir plus even on directories that are very large.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • This patch adds readdir plus support to the cache array.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • If we're going through the loop in nfs_readdir() more than once, we usually
    do not want to restart searching from the beginning of the page cache.

    We only want to do that if the previous search failed...

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • This patch adds the readdir cache array and functions to retrieve the array
    stored on a cache page, clear the array by freeing allocated memory, add an
    entry to the array, and search the array for a given cookie.

    It then modifies readdir to make use of the new cache array; parts of the
    old readdir code are no longer needed with the cache array method.

    Finally, nfs_llseek_dir() will set file->f_pos to a value greater than 0 and
    desc->dir_cookie to zero. When we see this, readdir needs to find the file
    at position file->f_pos from the start of the directory.
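
    A hypothetical, much simplified model of the cache array (field and function
    names are illustrative only): a fixed array of (cookie, name) entries stored
    per page, with add and search-by-cookie operations.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 16          /* stand-in for "entries per page" */

struct dir_entry { unsigned long long cookie; char name[32]; };
struct dir_array { int size; struct dir_entry entry[MAX_ENTRIES]; };

static int array_add(struct dir_array *a, unsigned long long cookie,
                     const char *name)
{
        if (a->size >= MAX_ENTRIES)
                return -1;      /* page full: caller starts a new page */
        a->entry[a->size].cookie = cookie;
        strncpy(a->entry[a->size].name, name,
                sizeof a->entry[0].name - 1);
        a->size++;
        return 0;
}

/* Return the entry's index on this page, or -1 if the cookie is absent. */
static int array_search(const struct dir_array *a, unsigned long long cookie)
{
        for (int i = 0; i < a->size; i++)
                if (a->entry[i].cookie == cookie)
                        return i;
        return -1;
}
```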

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • nfs4state.c uses interfaces from ratelimit.h. It needs to include
    that header file to fix build errors:

    fs/nfs/nfs4state.c:1195: warning: type defaults to 'int' in declaration of 'DEFINE_RATELIMIT_STATE'
    fs/nfs/nfs4state.c:1195: warning: parameter names (without types) in function declaration
    fs/nfs/nfs4state.c:1195: error: invalid storage class for function 'DEFINE_RATELIMIT_STATE'
    fs/nfs/nfs4state.c:1195: error: implicit declaration of function '__ratelimit'
    fs/nfs/nfs4state.c:1195: error: '_rs' undeclared (first use in this function)

    Signed-off-by: Randy Dunlap
    Cc: Trond Myklebust
    Cc: linux-nfs@vger.kernel.org
    Signed-off-by: Trond Myklebust

    Randy Dunlap
     
  • Otherwise, we cannot recover state correctly.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • If nfs_intent_set_file() returns an error, we usually want to pass that
    back up the stack.

    Also ensure that nfs_open_revalidate() returns '1' on success.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

20 Oct, 2010

4 commits

  • If the server sends us an NFS4ERR_STALE_CLIENTID while the state management
    thread is busy reclaiming state, we do want to treat all state that wasn't
    reclaimed before the STALE_CLIENTID as if a network partition occurred (see
    the edge conditions described in RFC3530 and RFC5661).
    What we do not want to do is to send an nfs4_reclaim_complete(), since we
    haven't yet even started reclaiming state after the server rebooted.

    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    Trond Myklebust
     
  • In the case of a server reboot, the state recovery thread starts by calling
    nfs4_state_end_reclaim_reboot() in order to avoid edge conditions when
    the server reboots while the client is in the middle of recovery.

    However, if the client has already marked the nfs4_state as requiring
    reboot recovery, then the above behaviour will cause the recovery thread to
    treat the open as if it was part of such an edge condition: the open will
    be recovered as if it was part of a lease expiration (and all the locks
    will be lost).
    Fix is to remove the call to nfs4_state_mark_reclaim_reboot from
    nfs4_async_handle_error(), and nfs4_handle_exception(). Instead we leave it
    to the recovery thread to do this for us.

    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    Trond Myklebust
     
  • NFSv4 open recovery is currently broken: since we do not clear the
    state->flags states before attempting recovery, we end up with the
    'can_open_cached()' function triggering. This again leads to no OPEN call
    being put on the wire.
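
    The shape of the bug and fix, as a toy illustration (flag names are made up
    for the sketch): a cached-open check consults per-state flags, so recovery
    must clear them first or no OPEN ever reaches the wire.

```c
#include <assert.h>
#include <stdbool.h>

#define ST_OPEN_READ  (1u << 0)
#define ST_OPEN_WRITE (1u << 1)

/* True means "we already hold a matching open; skip the OPEN rpc". */
static bool can_open_cached(unsigned flags, unsigned mode)
{
        return (flags & mode) == mode;
}

/* Clearing the flags up front forces a real OPEN during recovery. */
static unsigned begin_recovery(unsigned flags)
{
        return flags & ~(ST_OPEN_READ | ST_OPEN_WRITE);
}
```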

    Reported-by: Sachin Prabhu
    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    Trond Myklebust
     
  • In the case where we lock the page, and then find out that the page has
    been thrown out of the page cache, we should just return VM_FAULT_NOPAGE.
    This is what block_page_mkwrite() does in these situations.

    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    Trond Myklebust
     

08 Oct, 2010

1 commit

  • This patch creates a new idmapper system that uses the request-key function to
    place a call into userspace to map user and group ids to names. The old
    idmapper was single threaded, which prevented more than one request from
    running at a time. This meant that a user would have to wait for an upcall to
    finish before accessing a cached result.

    The upcall result is stored on a keyring of type id_resolver. See the file
    Documentation/filesystems/nfs/idmapper.txt for instructions.
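
    The request-key mechanism is configured through /etc/request-key.conf; a line
    along these lines routes id_resolver upcalls to an idmap helper program (the
    helper's path varies by distribution, so treat this fragment as illustrative
    and see the idmapper.txt document above for the authoritative setup):

```
#OP     TYPE          DESCRIPTION   CALLOUT INFO   PROGRAM ARG1...
create  id_resolver   *             *              /usr/sbin/nfs.idmap %k %d
```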

    Signed-off-by: Bryan Schumaker
    [Trond: fix up the return value of nfs_idmap_lookup_name and clean up code]
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     

30 Sep, 2010

3 commits


24 Sep, 2010

3 commits


23 Sep, 2010

1 commit

  • NFS clients since 2.6.12 support flock locks by emulating fcntl byte-range
    locks. Because of this, some Windows applications that appear to use both
    flock (a share-mode lock mapped to flock by Samba) and fcntl locks
    sequentially on the same file cannot take the lock, as they falsely assume
    the file is already locked. The problem was reported on a setup with Windows
    clients accessing Excel files on a Samba-exported share that is itself an
    NFS mount from a NetApp filer.

    Older NFS clients (< 2.6.12) did not see this problem as flock locks were
    considered local. To support legacy flock behavior, this patch adds a mount
    option "-olocal_lock=" which can take the following values:

    'none' - Neither flock locks nor POSIX locks are local
    'flock' - flock locks are local
    'posix' - fcntl/POSIX locks are local
    'all' - Both flock locks and POSIX locks are local

    Testing:

    - This patch was tested by mounting with the -olocal_lock option set to each
    value in turn and noting the NLM calls in a network packet capture:

    'none'  - NLM calls were seen for both flock() and fcntl(); the flock
              lock was granted, fcntl was denied
    'flock' - no NLM call for flock(); an NLM call was seen for fcntl()
              and was granted
    'posix' - an NLM call was seen for flock() and granted; no NLM call
              for fcntl()
    'all'   - no NLM calls were seen for either flock() or fcntl()

    - No bugs were seen during NFSv4 locking/unlocking in general and NFSv4
    reboot recovery.
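
    The two lock families the option distinguishes, seen from user space:
    flock() takes a whole-file BSD-style lock, while fcntl(F_SETLK) takes a
    POSIX byte-range lock. On a local filesystem both succeed below; on an NFS
    mount, local_lock= controls which of them stays on the client instead of
    going to the NLM server.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

static int take_both_locks(const char *path)
{
        int fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0)
                return -1;
        if (flock(fd, LOCK_EX | LOCK_NB) != 0) {    /* BSD-style lock */
                close(fd);
                return -1;
        }
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        if (fcntl(fd, F_SETLK, &fl) != 0) {         /* POSIX lock */
                close(fd);
                return -1;
        }
        close(fd);              /* releases both locks */
        return 0;
}
```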

    Cc: Neil Brown
    Signed-off-by: Suresh Jayaraman
    Signed-off-by: Trond Myklebust

    Suresh Jayaraman
     

22 Sep, 2010

8 commits


18 Sep, 2010

4 commits

  • A synchronous rename can be interrupted by a SIGKILL. If that happens
    during a sillyrename operation, it's possible for the rename call to
    be sent to the server, but the task exits before processing the
    reply. If this happens, the sillyrenamed file won't get cleaned up
    during nfs_dentry_iput and the server is left with a dangling .nfs* file
    hanging around.

    Fix this problem by turning sillyrename into an asynchronous operation
    and have the task doing the sillyrename just wait on the reply. If the
    task is killed before the sillyrename completes, it'll still proceed
    to completion.
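
    A user-space analogy for the shape of the fix (not the kernel's rpc
    machinery): the rename runs in its own context and the caller merely waits
    on it, so even if the waiter disappears the operation still runs to
    completion.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Kick off the rename in a child; the parent may wait for it or be
 * killed, but either way the rename completes in the child. */
static void start_async_rename(const char *from, const char *to)
{
        pid_t pid = fork();
        if (pid == 0) {
                rename(from, to);
                _exit(0);
        }
}
```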

    Signed-off-by: Jeff Layton
    Reviewed-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • ...since that's where most of the sillyrenaming code lives. A comment
    block is added to the beginning as well to clarify how sillyrenaming
    works. Also, make nfs_async_unlink static as nfs_sillyrename is the only
    caller.

    Signed-off-by: Jeff Layton
    Reviewed-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • Right now, v3 and v4 have their own variants. Create a standard struct
    that will work for v3 and v4. v2 doesn't get anything but a simple error
    and so isn't affected by this.

    Signed-off-by: Jeff Layton
    Reviewed-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • Each NFS version has its own version of the rename args container.
    Standardize them on a common one that's identical to the one NFSv4
    uses.
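
    A hedged sketch of what "one args container for every version" might look
    like (field names are illustrative, not the kernel's actual struct): a
    single struct carrying the old and new directory handles and names, which
    the v3 and v4 code paths can share.

```c
#include <assert.h>

struct fh   { unsigned char data[16]; };        /* opaque file handle */
struct qstr { const char *name; unsigned len; };

/* Common rename-arguments container shared across NFS versions. */
struct rename_args {
        const struct fh   *old_dir, *new_dir;
        const struct qstr *old_name, *new_name;
};
```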

    Signed-off-by: Jeff Layton
    Reviewed-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Jeff Layton
     

17 Sep, 2010

1 commit