06 Jan, 2012

1 commit

  • Servers have a finite amount of memory to store NFSv4 open and lock
    owners. Moreover, servers may have a difficult time determining when
    they can reap their state owner table, thanks to gray areas in the
    NFSv4 protocol specification. Thus clients should be careful to reuse
    state owners when possible.

    Currently Linux is not too careful. When a user has closed all her
    files on one mount point, the state owner's reference count goes to
    zero, and it is released. The next OPEN allocates a new one. A
    workload that serially opens and closes files can run through a large
    number of open owners this way.

    When a state owner's reference count goes to zero, slap it onto a free
    list for that nfs_server, with an expiry time. Garbage collect before
    looking for a state owner. This makes state owners for active users
    available for re-use.

    Now that there can be unused state owners remaining at umount time,
    purge the state owner free list when a server is destroyed. Also be
    sure not to reclaim unused state owners during state recovery.

    This change has benefits for the client as well. For some workloads,
    this approach drops the number of OPEN_CONFIRM calls from the same as
    the number of OPEN calls, down to just one. This reduces wire traffic
    and thus open(2) latency. Before this patch, untarring a kernel
    source tarball shows the OPEN_CONFIRM call counter steadily increasing
    through the test. With the patch, the OPEN_CONFIRM count remains at 1
    throughout the entire untar.

    As long as the expiry time is kept short, I don't think garbage
    collection should be terribly expensive, although it does bounce the
    clp->cl_lock around a bit.

    [ At some point we should rationalize the use of the nfs_server
    ->destroy method. ]

    Signed-off-by: Chuck Lever
    [Trond: Fixed a garbage collection race and a few efficiency issues]
    Signed-off-by: Trond Myklebust

    Chuck Lever
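
    The free-list scheme described in this commit can be sketched in plain C.
    This is a minimal illustration, not the kernel code: struct owner,
    struct server, owner_put(), owner_gc() and owner_get() are hypothetical
    names, locking is omitted, a flag stands in for actually freeing memory,
    and the lookup walks only the idle list rather than the per-server RB tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct owner {
    int id;
    int refcount;
    long expires;            /* time after which an idle owner may be freed */
    struct owner *lru_next;  /* link on the server's idle list */
    int freed;               /* set instead of calling free(), for the demo */
};

struct server {
    struct owner *lru_head;  /* oldest idle owner first */
    struct owner *lru_tail;
    long lease;              /* how long an idle owner stays reusable */
};

/* Drop a reference; at zero, park the owner on the idle list. */
void owner_put(struct server *srv, struct owner *o, long now)
{
    assert(o->refcount > 0);
    if (--o->refcount > 0)
        return;
    o->expires = now + srv->lease;
    o->lru_next = NULL;
    if (srv->lru_tail)
        srv->lru_tail->lru_next = o;
    else
        srv->lru_head = o;
    srv->lru_tail = o;
}

/* Garbage collect idle owners whose expiry time has passed. */
void owner_gc(struct server *srv, long now)
{
    while (srv->lru_head && srv->lru_head->expires <= now) {
        struct owner *o = srv->lru_head;

        srv->lru_head = o->lru_next;
        if (!srv->lru_head)
            srv->lru_tail = NULL;
        o->lru_next = NULL;
        o->freed = 1;
    }
}

/* Collect garbage first, then reuse an idle owner with a matching id. */
struct owner *owner_get(struct server *srv, int id, long now)
{
    struct owner **pp, *prev = NULL;

    owner_gc(srv, now);
    for (pp = &srv->lru_head; *pp; prev = *pp, pp = &(*pp)->lru_next) {
        struct owner *o = *pp;

        if (o->id != id)
            continue;
        *pp = o->lru_next;          /* unlink from the idle list */
        if (srv->lru_tail == o)
            srv->lru_tail = prev;
        o->lru_next = NULL;
        o->refcount = 1;
        return o;
    }
    return NULL;                    /* caller would allocate a new owner */
}
```

    A put followed by a get inside the expiry window returns the same owner,
    which is the reuse the commit message is after. Once the window has
    passed, the owner is collected and the caller has to allocate a new one.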
     

05 Jan, 2012

1 commit

  • There's no longer a need to check the so_server field in the state
    owner, because nowadays the RB tree we search for state owners
    contains owners for only that server.

    Make nfs4_find_state_owners_locked() use the same tree searching logic
    as nfs4_insert_state_owner_locked().

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
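
    The idea of having lookup and insertion share one tree walk can be
    sketched as follows. This is an illustration only: it uses a plain,
    unbalanced binary search tree with an integer key rather than the
    kernel's rbtree API, and struct node, tree_walk(), tree_find() and
    tree_insert() are hypothetical names.

```c
#include <stddef.h>

struct node {
    unsigned long key;
    struct node *left, *right;
};

/*
 * Shared search step: return the link that either points at the node
 * with this key or is the empty slot where such a node belongs.
 */
struct node **tree_walk(struct node **link, unsigned long key)
{
    while (*link && (*link)->key != key)
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    return link;
}

/* Lookup: NULL when the key is absent. */
struct node *tree_find(struct node **root, unsigned long key)
{
    return *tree_walk(root, key);
}

/* Insert: return the existing node on a duplicate key, else link in fresh. */
struct node *tree_insert(struct node **root, struct node *fresh)
{
    struct node **link = tree_walk(root, fresh->key);

    if (*link)
        return *link;
    fresh->left = fresh->right = NULL;
    *link = fresh;
    return fresh;
}
```

    Because both operations go through the same walk, they cannot disagree
    about where a given key lives in the tree.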
     

10 Dec, 2011

1 commit


02 Dec, 2011

2 commits


25 Aug, 2011

1 commit


26 Jul, 2011

1 commit


20 Jul, 2011

1 commit


13 Jul, 2011

1 commit


28 May, 2011

1 commit


25 Apr, 2011

2 commits


16 Apr, 2011

1 commit


29 Mar, 2011

1 commit

  • Fix the incorrect use of igrab() inside the i_lock in NFS and Ceph‥

    If we are already holding the i_lock, we have a reference to the
    inode so we can safely use ihold() to gain an extra reference. This
    avoids hangs due to lock recursion on the i_lock now that the
    inode_lock is gone and igrab() uses the i_lock itself.

    Signed-off-by: Dave Chinner
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Ryan Mallon
    Signed-off-by: Linus Torvalds

    Dave Chinner
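
    The lock recursion described above can be modeled in a few lines of
    user-space C. This is a sketch under stated assumptions: struct obj and
    the obj_*() helpers are hypothetical, and a boolean flag stands in for
    a non-recursive spinlock, so a failed trylock marks the spot where the
    real code would hang.

```c
#include <assert.h>
#include <stdbool.h>

struct obj {
    bool locked;  /* models a non-recursive spinlock */
    int count;    /* reference count protected by the lock */
};

/* Returns false where a real spinlock would spin forever. */
bool obj_trylock(struct obj *o)
{
    if (o->locked)
        return false;
    o->locked = true;
    return true;
}

void obj_unlock(struct obj *o)
{
    assert(o->locked);
    o->locked = false;
}

/* igrab()-like: takes the lock itself, so it must not be called with it held. */
bool obj_grab(struct obj *o)
{
    if (!obj_trylock(o))
        return false;
    o->count++;
    obj_unlock(o);
    return true;
}

/* ihold()-like: the caller already owns a reference, so just add one more. */
void obj_hold(struct obj *o)
{
    assert(o->count > 0);
    o->count++;
}
```

    With the lock already held, obj_grab() fails (the model of the hang)
    while obj_hold() succeeds, which mirrors the substitution this commit
    makes.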
     

12 Mar, 2011

4 commits

  • Use our own async error handler.
    Mark the layout as failed and retry i/o through the MDS on specified errors.

    Update the mds_offset in nfs_readpage_retry so that a failed short-read retry
    to a DS gets correctly resent through the MDS.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Data servers cannot send nfs4_proc_get_lease_time, but still need to set up
    state renewal. Add the NFS_CS_CHECK_LEASE_TIME bit to indicate whether the
    lease time can be checked.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • There are no more external users of nfs4_state_mark_reclaim_nograce() or
    nfs4_state_mark_reclaim_reboot(), so mark them as static.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • nfs4_schedule_state_recovery() should only be used when we need to force
    the state manager to check the lease. If we just want to start the
    state manager in order to handle a state recovery situation, we should be
    using nfs4_schedule_state_manager().

    This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing
    its use with a set of helper functions that do the right thing.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

26 Jan, 2011

1 commit

  • The information required to find the nfs_client corresponding to the incoming
    back channel request is contained in the NFS layer. Perform minimal checking
    in the RPC layer pg_authenticate method, and push more detailed checking into
    the NFS layer where the nfs_client can be found.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     

07 Jan, 2011

4 commits

  • NFSv4 migration needs to reassociate state owners from the source to
    the destination nfs_server data structures. To make that easier, move
    the cl_state_owners field to the nfs_server struct. cl_openowner_id
    and cl_lockowner_id accompany this move, as they are used in
    conjunction with cl_state_owners.

    The cl_lock field in the parent nfs_client continues to protect all
    three of these fields.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • A layout can request return-on-close. How this interacts with the
    forgetful model of never sending LAYOUTRETURNS is a bit ambiguous.
    We forget any layouts marked roc, and wait for them to be completely
    forgotten before continuing with the close. In addition, to compensate
    for races with any inflight LAYOUTGETs, and the fact that we do not get
    any layout stateid back from the server, we set the barrier to the worst
    case scenario of current_seqid + number of outstanding LAYOUTGETS.

    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Fred Isaman
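
    The worst-case barrier described above is simple arithmetic, shown here
    as a sketch. roc_barrier() and seqid_at_or_before() are hypothetical
    names, and the wrap-safe comparison is an assumption about how such a
    barrier would be checked, not something stated in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Worst case: every outstanding LAYOUTGET may still bump the seqid once,
 * so the barrier is the current seqid plus the number in flight.
 * Unsigned arithmetic wraps modulo 2^32, as sequence counters do.
 */
uint32_t roc_barrier(uint32_t current_seqid, uint32_t outstanding_layoutgets)
{
    return current_seqid + outstanding_layoutgets;
}

/*
 * Hypothetical check: true when seqid is at or before the barrier.
 * The signed difference keeps the comparison correct across wrap-around
 * (this assumes the usual two's complement conversion).
 */
bool seqid_at_or_before(uint32_t seqid, uint32_t barrier)
{
    return (int32_t)(barrier - seqid) >= 0;
}
```

    For example, with a current seqid of 7 and 3 LAYOUTGETs in flight the
    barrier is 10, so a reply carrying seqid 10 is at or before the barrier
    and one carrying seqid 11 is past it.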
     
  • Currently session draining only drains the fore channel.
    The back channel processing must also be drained.

    Use the back channel highest_slot_used to indicate that a callback is being
    processed by the callback thread. Move the session complete to be per channel.

    When the session is draining, wait for any current back channel processing
    to complete and stop all new back channel processing by returning NFS4ERR_DELAY
    to the back channel client.

    Drain the back channel, then the fore channel.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
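
    The drain ordering in this commit can be sketched as a small
    single-threaded state model. Everything here is hypothetical (struct
    channel, struct session, callback_begin(), callback_end(),
    drain_next()); the real code waits on per-channel completions rather
    than polling counters. The only value taken from the protocol is the
    numeric code of NFS4ERR_DELAY.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NFS4ERR_DELAY 10008  /* value of NFS4ERR_DELAY in RFC 3530 */

struct channel {
    int in_flight;  /* requests currently being processed */
};

struct session {
    bool draining;
    struct channel back;
    struct channel fore;
};

enum drain_step { DRAIN_BACK, DRAIN_FORE, DRAIN_DONE };

/* Called when a callback arrives; refuses new work while draining. */
int callback_begin(struct session *s)
{
    if (s->draining)
        return SKETCH_NFS4ERR_DELAY;
    s->back.in_flight++;
    return 0;
}

/* Called when callback processing finishes. */
void callback_end(struct session *s)
{
    assert(s->back.in_flight > 0);
    s->back.in_flight--;
}

/* Report which channel the drainer still has to wait for: back first. */
enum drain_step drain_next(const struct session *s)
{
    assert(s->draining);
    if (s->back.in_flight > 0)
        return DRAIN_BACK;
    if (s->fore.in_flight > 0)
        return DRAIN_FORE;
    return DRAIN_DONE;
}
```

    A callback that started before draining began is allowed to finish,
    new callbacks are turned away with the delay error, and the fore
    channel is only considered once the back channel is idle.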
     
  • The sessions based callback service is started prior to the CREATE_SESSION call
    so that it can handle CB_NULL requests which can be sent before the
    CREATE_SESSION call returns and the session ID is known.

    Set the callback sessionid after a successful CREATE_SESSION.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     

27 Oct, 2010

1 commit

  • * 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    net/sunrpc: Use static const char arrays
    nfs4: fix channel attribute sanity-checks
    NFSv4.1: Use more sensible names for 'initialize_mountpoint'
    NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure
    NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure
    NFS: client needs to maintain list of inodes with active layouts
    NFS: create and destroy inode's layout cache
    NFSv4.1: pnfs: filelayout: introduce minimal file layout driver
    NFSv4.1: pnfs: full mount/umount infrastructure
    NFS: set layout driver
    NFS: ask for layouttypes during v4 fsinfo call
    NFS: change stateid to be a union
    NFSv4.1: pnfsd, pnfs: protocol level pnfs constants
    SUNRPC: define xdr_decode_opaque_fixed
    NFSD: remove duplicate NFS4_STATEID_SIZE

    Linus Torvalds
     

26 Oct, 2010

1 commit

  • * 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (67 commits)
    SUNRPC: Cleanup duplicate assignment in rpcauth_refreshcred
    nfs: fix unchecked value
    Ask for time_delta during fsinfo probe
    Revalidate caches on lock
    SUNRPC: After calling xprt_release(), we must restart from call_reserve
    NFSv4: Fix up the 'dircount' hint in encode_readdir
    NFSv4: Clean up nfs4_decode_dirent
    NFSv4: nfs4_decode_dirent must clear entry->fattr->valid
    NFSv4: Fix a regression in decode_getfattr
    NFSv4: Fix up decode_attr_filehandle() to handle the case of empty fh pointer
    NFS: Ensure we check all allocation return values in new readdir code
    NFS: Readdir plus in v4
    NFS: introduce generic decode_getattr function
    NFS: check xdr_decode for errors
    NFS: nfs_readdir_filler catch all errors
    NFS: readdir with vmapped pages
    NFS: remove page size checking code
    NFS: decode_dirent should use an xdr_stream
    SUNRPC: Add a helper function xdr_inline_peek
    NFS: remove readdir plus limit
    ...

    Linus Torvalds
     

25 Oct, 2010

1 commit

  • In particular, server reboot will invalidate all layouts.

    Note that in order to have an active layout, we must get a successful response
    from the server. To avoid adding that machinery, this patch just includes a
    stub that fakes up a successful return. Since the layout is never referenced
    for io, this is not a problem.

    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    Signed-off-by: Dean Hildebrand
    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Andy Adamson
     

24 Oct, 2010

2 commits

  • nfs4state.c uses interfaces from ratelimit.h. It needs to include
    that header file to fix build errors:

    fs/nfs/nfs4state.c:1195: warning: type defaults to 'int' in declaration of 'DEFINE_RATELIMIT_STATE'
    fs/nfs/nfs4state.c:1195: warning: parameter names (without types) in function declaration
    fs/nfs/nfs4state.c:1195: error: invalid storage class for function 'DEFINE_RATELIMIT_STATE'
    fs/nfs/nfs4state.c:1195: error: implicit declaration of function '__ratelimit'
    fs/nfs/nfs4state.c:1195: error: '_rs' undeclared (first use in this function)

    Signed-off-by: Randy Dunlap
    Cc: Trond Myklebust
    Cc: linux-nfs@vger.kernel.org
    Signed-off-by: Trond Myklebust

    Randy Dunlap
     
  • Otherwise, we cannot recover state correctly.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

20 Oct, 2010

1 commit

  • If the server sends us an NFS4ERR_STALE_CLIENTID while the state management
    thread is busy reclaiming state, we do want to treat all state that wasn't
    reclaimed before the STALE_CLIENTID as if a network partition occurred (see
    the edge conditions described in RFC3530 and RFC5661).
    What we do not want to do is to send an nfs4_reclaim_complete(), since we
    haven't yet even started reclaiming state after the server rebooted.

    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    Trond Myklebust
     

05 Oct, 2010

1 commit

  • This prepares the removal of the big kernel lock from the
    file locking code. We still use the BKL as long as fs/lockd
    uses it and ceph might sleep, but we can flip the definition
    to a private spinlock as soon as that's done.
    All users outside of fs/lockd get converted to use
    lock_flocks() instead of lock_kernel() where appropriate.

    Based on an earlier patch to use a spinlock from Matthew
    Wilcox, who has attempted this a few times before, the
    earliest patch from over 10 years ago turned it into
    a semaphore, which ended up being slower than the BKL
    and was subsequently reverted.

    Someone should do some serious performance testing when
    this becomes a spinlock, since this has caused problems
    before. Using a spinlock should be at least as good
    as the BKL in theory, but who knows...

    Signed-off-by: Arnd Bergmann
    Acked-by: Matthew Wilcox
    Cc: Christoph Hellwig
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Andrew Morton
    Cc: Miklos Szeredi
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: John Kacur
    Cc: Sage Weil
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-fsdevel@vger.kernel.org

    Arnd Bergmann
     

31 Jul, 2010

2 commits


25 Jun, 2010

1 commit


23 Jun, 2010

2 commits


15 May, 2010

2 commits


03 Mar, 2010

1 commit

  • Ensure that we change the EXCHANGE_ID verifier (i.e. clp->cl_boot_time)
    when we want to reset all state. This is mainly needed when the server
    tells us that it is revoking our open or lock stateids.

    Handle revoking of recallable state by expiring the delegations.

    Handle callback path issues by expiring the delegations and then resetting
    the session.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

10 Feb, 2010

2 commits