26 Apr, 2008

3 commits

  • The file_lock structure is used both as a heavy-weight representation of
    an active lock, with pointers to reference-counted structures, etc., and
    as a simple container for parameters that describe a file lock.

    The conflicting lock returned from __posix_lock_file is an example of
    the latter, so don't call the filesystem or lock manager callbacks when
    copying to it. This also removes the need for an unnecessary
    locks_init_lock call in the NFSv4 server.
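
    A minimal sketch of the lightweight copy this implies, assuming the
    struct file_lock field names of this era (an illustration, not the
    exact patch):

    /* Copy only the fields that describe the lock: no fl_copy_lock()
     * or lock-manager callbacks, and no extra references taken. */
    static void __locks_copy_lock(struct file_lock *new,
                                  const struct file_lock *fl)
    {
            new->fl_owner = fl->fl_owner;
            new->fl_pid   = fl->fl_pid;
            new->fl_file  = NULL;
            new->fl_flags = fl->fl_flags;
            new->fl_type  = fl->fl_type;
            new->fl_start = fl->fl_start;
            new->fl_end   = fl->fl_end;
    }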

    Thanks to Trond for pointing out the error.

    Signed-off-by: J. Bruce Fields
    Cc: Trond Myklebust

    J. Bruce Fields
     
  • Add /proc/fs/nfsd/unlock_filesystem, which allows e.g.:

    shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem

    so that a filesystem can be unmounted before allowing a peer nfsd to
    take over NFS service for the filesystem.
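
    A rough sketch of what such a write handler has to do, assuming a new
    lockd entry point named nlmsvc_unlock_all_by_sb (names and parsing
    simplified for illustration):

    static ssize_t failover_unlock_fs(struct file *file, char *buf,
                                      size_t size)
    {
            struct nameidata nd;
            int error;

            /* expect a newline-terminated mount path */
            if (size == 0 || buf[size - 1] != '\n')
                    return -EINVAL;
            buf[size - 1] = '\0';

            error = path_lookup(buf, 0, &nd);
            if (error)
                    return error;
            /* drop every lock lockd holds on this superblock */
            error = nlmsvc_unlock_all_by_sb(nd.path.mnt->mnt_sb);
            path_put(&nd.path);
            return error;
    }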

    Signed-off-by: S. Wendy Cheng
    Cc: Lon Hohberger
    Cc: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    Wendy Cheng
     
  • For high-availability NFS service, we generally need to be able to drop
    file locks held on the exported filesystem before moving clients to a
    new server. Currently the only way to do that is by shutting down lockd
    entirely, which is often undesirable (for example, if you want to
    continue exporting other filesystems).

    This patch allows the administrator to release all locks held by clients
    accessing the server through a given server IP address, by echoing that
    address to a new file, /proc/fs/nfsd/unlock_ip, as in:

    shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip

    The expected sequence of events can be:
    1. Tear down the IP address
    2. Unexport the path
    3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
    4. Signal peer to begin take-over.

    For now we only support IPv4 addresses and NFSv2/v3 (NFSv4 locks are not
    affected).

    Also, if unmounting the filesystem is required, we assume at step 3 that
    clients using the given server IP address are the only clients holding
    locks on the given filesystem; otherwise, an additional patch is
    required to allow revoking all locks held by lockd on a given filesystem.
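
    A companion sketch of the unlock_ip write handler, assuming a lockd
    entry point named nlmsvc_unlock_all_by_ip (again illustrative rather
    than the exact patch):

    static ssize_t failover_unlock_ip(struct file *file, char *buf,
                                      size_t size)
    {
            __be32 server_ip;

            if (size == 0 || buf[size - 1] != '\n')
                    return -EINVAL;
            buf[size - 1] = '\0';

            server_ip = in_aton(buf);   /* IPv4 only, as noted above */
            if (server_ip == 0)
                    return -EINVAL;
            /* drop every lock held through this server address */
            return nlmsvc_unlock_all_by_ip(server_ip);
    }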

    Signed-off-by: S. Wendy Cheng
    Cc: Lon Hohberger
    Cc: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    fs/lockd/svcsubs.c | 66 +++++++++++++++++++++++++++++++++++++++-----
    fs/nfsd/nfsctl.c | 65 +++++++++++++++++++++++++++++++++++++++++++
    include/linux/lockd/lockd.h | 7 ++++
    3 files changed, 131 insertions(+), 7 deletions(-)

    Wendy Cheng
     

25 Apr, 2008

1 commit

  • Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (80 commits)
    SUNRPC: Invalidate the RPCSEC_GSS session if the server dropped the request
    make nfs_automount_list static
    NFS: remove duplicate flags assignment from nfs_validate_mount_data
    NFS - fix potential NULL pointer dereference v2
    SUNRPC: Don't change the RPCSEC_GSS context on a credential that is in use
    SUNRPC: Fix a race in gss_refresh_upcall()
    SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requests
    SUNRPC: Remove the unused export of xprt_force_disconnect
    SUNRPC: remove XS_SENDMSG_RETRY
    SUNRPC: Protect creds against early garbage collection
    NFSv4: Attempt to use machine credentials in SETCLIENTID calls
    NFSv4: Reintroduce machine creds
    NFSv4: Don't use cred->cr_ops->cr_name in nfs4_proc_setclientid()
    nfs: fix printout of multiword bitfields
    nfs: return negative error value from nfs{,4}_stat_to_errno
    NLM/lockd: Ensure client locking calls use correct credentials
    NFS: Remove the buggy lock-if-signalled case from do_setlk()
    NLM/lockd: Fix a race when cancelling a blocking lock
    NLM/lockd: Ensure that nlmclnt_cancel() returns results of the CANCEL call
    NLM: Remove the signal masking in nlmclnt_proc/nlmclnt_cancel
    ...

    Linus Torvalds
     

24 Apr, 2008

8 commits

  • When svc_recv returns an unexpected error, lockd will print a warning
    and exit. This is problematic for several reasons. In particular, it will
    cause the reference counts for the thread to be wrong, and can lead to a
    potential BUG() call.

    Rather than exiting on error from svc_recv, have the thread do a 1s
    sleep and then retry the loop. This is unlikely to cause any harm, and
    if the error turns out to be something temporary then it may be able to
    recover.
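
    In lockd's main loop that amounts to something like the following,
    where the old code printed the warning and broke out of the loop
    (hedged sketch):

    err = svc_recv(rqstp, timeout);
    if (err == -EAGAIN || err == -EINTR)
            continue;
    if (err < 0) {
            printk(KERN_WARNING
                   "lockd: unexpected error from svc_recv (%d)\n", err);
            schedule_timeout_interruptible(HZ); /* back off 1s, retry */
            continue;
    }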

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • As of 5996a298da43a03081e9ba2116983d173001c862 ("NLM: don't unlock on
    cancel requests") we no longer unlock in this case, so the comment is no
    longer accurate.

    Thanks to Stuart Friedberg for pointing out the inconsistency.

    Cc: Stuart Friedberg
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • There's no reason for a mutex here, except to allow an allocation under
    the lock, which we can avoid with the usual trick of preallocating
    memory for the new object and freeing it if it turns out to be
    unnecessary.
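
    The usual trick looks roughly like this; nsm_lock, nsm_handles and
    the nsm_match() test are illustrative stand-ins for the code this
    patch touches:

    struct nsm_handle *pos, *new;

    new = kzalloc(sizeof(*new), GFP_KERNEL);
    spin_lock(&nsm_lock);
    list_for_each_entry(pos, &nsm_handles, sm_link) {
            if (nsm_match(pos, sap)) {      /* hypothetical match test */
                    atomic_inc(&pos->sm_count);
                    spin_unlock(&nsm_lock);
                    kfree(new);             /* allocation unnecessary */
                    return pos;
            }
    }
    if (new != NULL)
            list_add(&new->sm_link, &nsm_handles);
    spin_unlock(&nsm_lock);
    return new;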

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Use list_for_each_entry(). Also, in keeping with kernel style, make the
    normal case (kzalloc succeeds) unindented and handle the abnormal case
    with a goto.
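
    In other words, something like this (a hedged illustration of the
    style, not the exact hunk):

    nsm = kzalloc(sizeof(*nsm), GFP_KERNEL);
    if (nsm == NULL)
            goto out;                      /* abnormal case: one goto */
    atomic_set(&nsm->sm_count, 1);         /* normal case, unindented */
    list_add(&nsm->sm_link, &nsm_handles);
    out:
            return nsm;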

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The sm_count is decremented to zero but the handle is left on the
    nsm_handles list.
    So in the space between decrementing sm_count and acquiring nsm_mutex,
    it is possible for another task to find this nsm_handle, increment the
    use count and then enter nsm_release itself.

    Thus there's nothing to prevent the nsm being freed before we acquire
    nsm_mutex here.
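
    A standard fix for this pattern, sketched here on the assumption
    that nsm_mutex becomes a spinlock (nsm_lock), is to make the final
    put atomic with the list removal:

    static void nsm_release(struct nsm_handle *nsm)
    {
            /* the count only reaches zero with nsm_lock held, so no
             * one can find and re-reference the handle before it is
             * unlinked */
            if (atomic_dec_and_lock(&nsm->sm_count, &nsm_lock)) {
                    list_del(&nsm->sm_link);
                    spin_unlock(&nsm_lock);
                    kfree(nsm);
            }
    }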

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • fs/lockd/svcshare.c:74:50: warning: Using plain integer as NULL pointer

    Signed-off-by: Harvey Harrison
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: J. Bruce Fields

    Harvey Harrison
     
  • Have lockd_up start lockd using kthread_run. With this change,
    lockd_down now blocks until lockd actually exits, so there's no longer
    any need for the waitqueue code at the end of lockd_down. This also means
    that only one lockd can be running at a time which simplifies the code
    within lockd's main loop.

    This also adds a check for kthread_should_stop in the main loop of
    nlmsvc_retry_blocked and after that function returns. There's no sense
    continuing to retry blocks if lockd is coming down anyway.
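
    In outline (a hedged sketch; error handling and the svc_serv
    plumbing are omitted):

    /* lockd_up(): spawn the one and only lockd thread */
    nlmsvc_task = kthread_run(lockd, nlmsvc_rqst, "lockd");
    if (IS_ERR(nlmsvc_task))
            error = PTR_ERR(nlmsvc_task);

    /* lockd()'s main loop now polls for a stop request */
    while (!kthread_should_stop()) {
            long timeout = nlmsvc_retry_blocked();
            if (kthread_should_stop())
                    break;
            /* ... svc_recv() and dispatch as before ... */
    }

    /* lockd_down(): blocks until lockd() has actually exited */
    kthread_stop(nlmsvc_task);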

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Lockd caches information about hosts that have recently held locks to
    expedite the taking of further locks.

    It periodically discards this information for hosts that have not been
    used for a few minutes.

    lockd currently has a value NLM_HOST_MAX, and changes the 'garbage
    collection' behaviour when the number of hosts exceeds this threshold.

    However, its behaviour is strange, and likely not what was intended.
    When the number of hosts exceeds the max, it scans *less* often (every
    2 minutes vs every minute) and allows unused host information to
    remain around longer (5 minutes instead of 2).

    Having this limit is of dubious value anyway, and we have evidently not
    suffered from the code getting it wrong, so remove the limit
    altogether. We go with the larger values (discard 5-minute-old hosts
    every 2 minutes) as they are probably safer.

    Maybe the periodic garbage collection should be replaced with a
    'shrinker' handler, so that we simply respond to memory pressure...
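
    With the limit gone, the thresholds reduce to two fixed constants,
    roughly:

    #define NLM_HOST_EXPIRE   (300 * HZ)  /* discard after 5 idle minutes */
    #define NLM_HOST_COLLECT  (120 * HZ)  /* collect every 2 minutes */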

    Acked-by: Jeff Layton
    Signed-off-by: Neil Brown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     

20 Apr, 2008

7 commits

  • Now that we've added the 'generic' credentials (that are independent of the
    rpc_client) to the nfs_open_context, we can use those in the NLM client to
    ensure that the lock/unlock requests are authenticated to whoever
    originally opened the file.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • We shouldn't remove the lock from the list of blocked locks until the
    CANCEL call has completed since we may be racing with a GRANTED callback.

    Also ensure that we send an UNLOCK if the CANCEL request failed. Normally
    that should only happen if the process gets hit with a fatal signal.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Currently, it returns success as long as the RPC call was sent. We'd like
    to know if the CANCEL operation succeeded on the server.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • The signal masks have been rendered obsolete by the preceding patch.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Peter Staubach comments:

    > In the course of investigating testing failures in the locking phase of
    > the Connectathon testsuite, I discovered a couple of things. One was
    > that one of the tests in the locking tests was racy when it didn't seem
    > to need to be and two, that the NFS client asynchronously releases locks
    > when a process is exiting.
    ...
    > The Single UNIX Specification Version 3 specifies that: "All locks
    > associated with a file for a given process shall be removed when a file
    > descriptor for that file is closed by that process or the process holding
    > that file descriptor terminates.".
    >
    > This does not specify whether those locks must be released prior to the
    > completion of the exit processing for the process or not. However,
    > general assumptions seem to be that those locks will be released. This
    > leads to more deterministic behavior under normal circumstances.

    The following patch converts the NFSv2/v3 locking code to use the same
    mechanism as NFSv4 for sending asynchronous RPC calls and then waiting for
    them to complete. This ensures that the UNLOCK and CANCEL RPC calls will
    complete even if the user interrupts the call, yet satisfies the
    above request for synchronous behaviour on process exit.
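
    The shared mechanism looks roughly like this (a sketch of the
    async-call-then-wait pattern; the rpc_task_setup details are
    elided):

    task = rpc_run_task(&task_setup_data);  /* RPC_TASK_ASYNC is set */
    if (IS_ERR(task))
            return PTR_ERR(task);
    /* if this wait is interrupted, the task still runs to
     * completion on rpciod */
    status = rpc_wait_for_completion_task(task);
    rpc_put_task(task);
    return status;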

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • When we replace the existing synchronous RPC calls with asynchronous calls,
    the reference count will be needed in order to allow us to examine the
    result of the RPC call.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Also fix up nlmclnt_lock() so that it doesn't pass modified versions of
    fl->fl_flags to nlmclnt_cancel() and other helpers.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

22 Feb, 2008

1 commit

  • Sorry for the noise, but here's the v3 of this compilation fix :)

    There are some places which declare a char buf[...] on the stack and
    later pass it to dprintk(). Since dprintk sometimes (when
    CONFIG_SYSCTL=n) becomes an empty do { } while (0) stub, these
    buffers cause gcc to produce 'unused variable' warnings.

    Wrap these buffers with the RPC_IFDEBUG macro, as Trond proposed, to
    compile them out when not needed.
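
    For example (a hedged illustration):

    static void example(struct svc_rqst *rqstp)
    {
            /* the declaration disappears when RPC debugging is off */
            RPC_IFDEBUG(char buf[RPC_MAX_ADDRBUFLEN]);

            dprintk("lockd: request from %s\n",
                    svc_print_addr(rqstp, buf, sizeof(buf)));
    }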

    Signed-off-by: Pavel Emelyanov
    Acked-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Pavel Emelyanov
     

11 Feb, 2008

4 commits

  • It's possible for lockd to catch a SIGKILL while a GRANT_MSG callback
    is in flight. If this happens we don't want lockd to insert the block
    back into the nlm_blocked list.

    This helps that situation, but there's still a possible race. Fixing
    that will mean adding real locking for nlm_blocked.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • With the current scheme in nlmsvc_grant_blocked, we can end up with more
    than one GRANT_MSG callback for a block in flight. Right now, we requeue
    the block unconditionally so that a GRANT_MSG callback is done again in
    30s. If the client is unresponsive, it can take more than 30s for the
    call already in flight to time out.

    There's no benefit to having more than one GRANT_MSG RPC queued up at a
    time, so put it on the list with a timeout of NLM_NEVER before doing the
    RPC call. If the RPC call submission fails, we requeue it with a short
    timeout. If it works, then nlmsvc_grant_callback will end up requeueing
    it with a shorter timeout after it completes.
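
    Sketched against nlmsvc_grant_blocked (the call signature follows
    the NLM code of this era; treat it as an illustration):

    /* park the block so no second GRANT_MSG is queued meanwhile */
    nlmsvc_insert_block(block, NLM_NEVER);
    error = nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG,
                           &nlmsvc_grant_ops);
    if (error < 0)
            nlmsvc_insert_block(block, 10 * HZ);  /* retry shortly */
    /* on success, nlmsvc_grant_callback() requeues the block */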

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Now that it no longer does an RPC ping, lockd always ends up queueing
    an RPC task for the GRANT_MSG callback. But, it also requeues the block
    for later attempts. Since these are hard RPC tasks, if the client we're
    calling back goes unresponsive the GRANT_MSG callbacks can stack up in
    the RPC queue.

    Fix this by making server-side RPC clients default to soft RPC tasks.
    lockd requeues the block anyway, so this should be OK.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • It's currently possible for an unresponsive NLM client to completely
    lock up a server's lockd. The scenario is something like this:

    1) client1 (or a process on the server) takes a lock on a file
    2) client2 tries to take a blocking lock on the same file and
    awaits the callback
    3) client2 goes unresponsive (plug pulled, network partition, etc)
    4) client1 releases the lock

    ...at that point the server's lockd will try to queue up a GRANT_MSG
    callback for client2, but first it requeues the block with a timeout of
    30s. nlm_async_call will attempt to bind the RPC client to client2 and
    will call rpc_ping. rpc_ping entails a sync RPC call and if client2 is
    unresponsive it will take around 60s for that to time out. Once it times
    out, it's already time to retry the block and the whole process repeats.

    Once in this situation, nlmsvc_retry_blocked will never return until
    the host starts responding again. lockd won't service new calls.

    Fix this by skipping the RPC ping on NLM RPC clients. This makes
    nlm_async_call return quickly when called.
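
    Concretely, when the per-host RPC client is created, the NOPING
    flag is passed to rpc_create(); a sketch with the argument list
    abbreviated:

    struct rpc_create_args args = {
            .protocol  = host->h_proto,
            .address   = (struct sockaddr *)&host->h_addr,
            .addrsize  = sizeof(host->h_addr),
            .program   = &nlm_program,
            .version   = host->h_version,
            .flags     = (RPC_CLNT_CREATE_NOPING |  /* skip the ping */
                          RPC_CLNT_CREATE_AUTOBIND),
    };
    clnt = rpc_create(&args);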

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     

02 Feb, 2008

7 commits

  • It's possible for an RPC to outlive the lockd daemon that created it, so
    we need to make sure that all RPC's are killed when lockd is coming
    down. When nlm_shutdown_hosts is called, kill off all RPC tasks
    associated with the host. Since we need to wait until they have all gone
    away, we might as well just shut down the RPC client altogether.
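
    Roughly, in the per-host shutdown path (hedged sketch):

    /* shutting down the RPC client kills its outstanding tasks and
     * waits for them to go away */
    if (host->h_rpcclnt != NULL) {
            rpc_shutdown_client(host->h_rpcclnt);
            host->h_rpcclnt = NULL;
    }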

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Wendy Cheng noticed that the function name doesn't agree here.

    Signed-off-by: J. Bruce Fields
    Cc: Wendy Cheng

    J. Bruce Fields
     
  • Update the write handler for the portlist file to allow creating new
    listening endpoints on a transport. The general form of the string is:

    transport_name port_number

    For example:

    echo "tcp 2049" > /proc/fs/nfsd/portlist

    This is intended to support the creation of a listening endpoint for
    RDMA transports without adding #ifdef code to the nfssvc.c file.

    Transports can also be removed by writing the same string prefixed
    with '-'. For example:

    echo "-tcp 2049" > /proc/fs/nfsd/portlist

    Attempting to add a listener with an invalid transport string results
    in EPROTONOSUPPORT and a perror string of "Protocol not supported".

    Attempting to remove a non-existent listener (e.g. a bad protocol or
    port) results in ENOTCONN and a perror string of
    "Transport endpoint is not connected".

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • Add a new svc function that allows a service to query whether a
    transport instance has already been created. This is used in lockd
    to determine whether or not a transport needs to be created when
    a lockd instance is brought up.

    Specifying 0 for the address family or port is effectively a wild-card,
    and will result in matching the first transport in the service's list
    that has a matching class name.
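
    A usage sketch modelled on lockd's startup path (the exact call
    signature is assumed from the description above):

    /* 0 for the address family and port is a wild-card: is any UDP
     * listener already registered for this service? */
    if (!svc_find_xprt(serv, "udp", 0, 0))
            err = svc_create_xprt(serv, "udp", nlm_udpport,
                                  SVC_SOCK_DEFAULTS);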

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • Move sk_list and sk_ready to svc_xprt. This involves the close path,
    because these lists are walked by services when closing all their
    transports; so I combined the moving of these lists to svc_xprt with
    making close transport-independent.

    The svc_force_sock_close function has been changed to svc_close_all,
    which takes a list as an argument. This removes some svc-internals
    knowledge from the services.

    This code races with module removal and transport addition.

    Thanks to Simon Holm Thøgersen for a compile fix.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields
    Cc: Simon Holm Thøgersen

    Tom Tucker
     
  • Modify the various kernel RPC svcs to use the svc_create_xprt service.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • Fix an nlm_block leak in the case of supplied blocking lock info.

    Signed-off-by: Oleg Drokin
    Signed-off-by: J. Bruce Fields

    Oleg Drokin