02 Oct, 2014

1 commit

  • We now have cb_to_delegation and to_delegation, which do the same thing
    and are defined separately in different .c files. Move the
    cb_to_delegation definition into a header file and eliminate the
    redundant to_delegation definition.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jeff Layton

    Jeff Layton
     

30 Sep, 2014

3 commits

  • This patch adds server support for the NFS v4.2 operation SEEK, which
    returns the position of the next hole or data segment in a file.

    Signed-off-by: Anna Schumaker
    Signed-off-by: J. Bruce Fields

    Anna Schumaker
     
  • It's cleaner to introduce everything at once and have the server reply
    with "not supported" than it would be to introduce extra operations when
    implementing a specific one in the middle of the list.

    Signed-off-by: Anna Schumaker
    Signed-off-by: J. Bruce Fields

    Anna Schumaker
     
  • Svcrdma currently advertises 1MB, which is too large. The correct value
    is the minimum of RPCSVC_MAXPAYLOAD and the product of the maximum
    scatter-gather entries allowed in an NFSRDMA I/O chunk and the host page
    size. This bug is usually benign because the Linux x86-64 NFSRDMA client
    limits the payload size to the correct value (64*4096 = 256KB). But if
    the Linux client is PPC64 with a 64KB page size, then the client will
    indeed use a payload size that will overflow the server.

    Signed-off-by: Steve Wise
    Signed-off-by: J. Bruce Fields

    Steve Wise
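
The arithmetic in the svcrdma message above can be sketched as follows. This is an illustration only: the constant names are hypothetical, and the values (a 1MB ceiling and 64 scatter-gather entries) are assumptions taken from the figures quoted in the commit text, not from the kernel headers.

```c
#include <stddef.h>

/* Assumed values, taken from the numbers quoted in the commit message. */
#define SKETCH_MAXPAYLOAD (1024u * 1024u)  /* the 1MB ceiling */
#define SKETCH_MAX_SGE    64u              /* entries per I/O chunk */

/* Largest payload a host with the given page size can accept:
 * min(ceiling, scatter-gather entries * page size). */
static size_t sketch_max_payload(size_t page_size)
{
    size_t by_sge = (size_t)SKETCH_MAX_SGE * page_size;

    return by_sge < SKETCH_MAXPAYLOAD ? by_sge : SKETCH_MAXPAYLOAD;
}
```

With 4KB pages the limit is 64*4096 = 256KB. With 64KB pages the product is 4MB, so only the 1MB ceiling applies, which is why a 64KB-page client can send more than a 4KB-page server can take.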
     

27 Sep, 2014

6 commits

  • Add a higher level abstraction than the rpc_ops for callback operations.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Christoph Hellwig
     
  • Split out initializing the nfs4_callback structure from using it. For
    the NULL callback this gets rid of tons of pointless re-initializations.

    Note that I don't quite understand what protects us from running multiple
    NULL callbacks at the same time, but at least this change doesn't make
    it worse.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Christoph Hellwig
     
  • Add a helper to queue up a callback. CB_NULL has a bit of special casing
    because it is special in the specification, but all other new callback
    operations will be able to share code with this and a few more changes
    to refactor the callback code.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Christoph Hellwig
     
  • We can always get at the private data by using container_of, no need for
    a void pointer. Also introduce a little to_delegation helper to avoid
    opencoding the container_of everywhere.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Christoph Hellwig
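
A minimal userspace sketch of the container_of pattern described above. The structures and the helper name are hypothetical stand-ins, not the nfsd definitions, and the macro is a simplified version of the kernel's container_of without its type checking.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down structures for illustration only. */
struct cb_sketch {
    int cb_done;
};

struct deleg_sketch {
    int dl_retries;
    struct cb_sketch dl_recall;  /* callback embedded in the delegation */
};

/* Recover the enclosing delegation from a pointer to its embedded
 * callback, so no separate void pointer to private data is needed. */
static inline struct deleg_sketch *to_deleg_sketch(struct cb_sketch *cb)
{
    return container_of(cb, struct deleg_sketch, dl_recall);
}
```

Because the callback is embedded in the delegation, the pointer arithmetic is fixed at compile time and the callback structure needs no back pointer.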
     
  • This is incorrect when a callback has to be restarted, in which case
    the XDR decoding of the second iteration will see a NULL cb argument.

    [hch: updated description]
    Signed-off-by: Benny Halevy
    Signed-off-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    Benny Halevy
     
  • For any error that is not EBADHANDLE or NFS4ERR_BAD_STATEID,
    nfsd4_cb_recall_done first marks the connection down, then
    retries until dl_retries hits zero, then marks the connection down
    again and sets cb_done. This changes the code to only retry
    for EBADHANDLE or NFS4ERR_BAD_STATEID, and factors setting
    cb_done into a single point in the function.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    Christoph Hellwig
     

18 Sep, 2014

11 commits

  • The grace period is ended in two steps--first userland is notified that
    the grace period is now long enough that any clients who have not yet
    reclaimed can be safely forgotten, then we flip the switch that forbids
    reclaims and allows new opens. I had to think a bit to convince myself
    that the ordering was right here. Document it.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The attempt to automatically set a new grace period time at the end of
    the grace period isn't really helpful. We'll probably shut down and
    reboot before we actually make use of the new grace period time anyway.
    So may as well leave it up to the init system to get this right.

    This just confuses people when they see /proc/fs/nfsd/nfsv4gracetime
    change from what they set it to.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • In the case of v4.0 clients, we may call into the "create" client
    tracking operation multiple times (once for each openowner). Upcalling
    for each one of those is wasteful and slow, however. We can skip doing
    further "create" operations after the first one if we know that one has
    already been done.

    v4.1+ clients generally only call into this function once (on
    RECLAIM_COMPLETE), and we can't skip upcalling on the create even if the
    STABLE bit is set. Doing so would make it impossible for nfsdcltrack to
    lift the grace period early since the timestamp has a different meaning
    in the case where the client is expected to issue a RECLAIM_COMPLETE.

    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • The nfsdcltrack upcall doesn't utilize the NFSD4_CLIENT_STABLE flag,
    which basically results in an upcall every time we call into the client
    tracking ops.

    Change it to set this bit on a successful "check" or "create" request,
    and clear it on a "remove" request. Also, check to see if that bit is
    set before upcalling on a "check" or "remove" request, and skip
    upcalling appropriately, depending on its state.

    Signed-off-by: Jeff Layton

    Jeff Layton
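
One possible reading of the flag handling described above, as a self-contained sketch. All names are hypothetical. It assumes that "skip appropriately" means a "check" is skipped when the bit is already set and a "remove" is skipped when it is clear; the upcall itself is a stub that always succeeds.

```c
#include <stdbool.h>

enum sketch_op { SKETCH_CREATE, SKETCH_CHECK, SKETCH_REMOVE };

struct sketch_client {
    bool stable;  /* stands in for NFSD4_CLIENT_STABLE */
    int upcalls;  /* counts upcalls actually made */
};

/* Hypothetical upcall stub; always succeeds in this sketch. */
static int sketch_upcall(struct sketch_client *clp)
{
    clp->upcalls++;
    return 0;
}

static int sketch_track(struct sketch_client *clp, enum sketch_op op)
{
    int ret;

    switch (op) {
    case SKETCH_CHECK:
        if (clp->stable)
            return 0;            /* record already known: skip */
        ret = sketch_upcall(clp);
        if (ret == 0)
            clp->stable = true;
        return ret;
    case SKETCH_CREATE:
        ret = sketch_upcall(clp);
        if (ret == 0)
            clp->stable = true;
        return ret;
    case SKETCH_REMOVE:
        if (!clp->stable)
            return 0;            /* nothing recorded: skip */
        ret = sketch_upcall(clp);
        clp->stable = false;
        return ret;
    }
    return -1;
}
```

In this sketch a second "check" after a successful one costs nothing, and a "remove" for a client that was never recorded makes no upcall.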
     
  • In a later patch, we want to add a flag that will allow us to reduce the
    need for upcalls. In order to handle that correctly, we'll need to
    ensure that racing upcalls for the same client can't occur. In practice
    it should be rare for this to occur with a well-behaved client, but it
    is possible.

    Convert one of the bits in the cl_flags field to be an upcall bitlock,
    and use it to ensure that upcalls for the same client are serialized.

    Signed-off-by: Jeff Layton

    Jeff Layton
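
The idea of using one bit of a flags word as a lock can be sketched in userspace with C11 atomics. This is not the kernel code: the names are hypothetical, and it shows only a try-lock, whereas the commit describes serializing upcalls, which implies that a contending caller waits for the bit to clear.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit position reserved for the upcall lock. */
#define SKETCH_UPCALL_LOCK (1u << 0)

/* Atomically set the lock bit; succeed only if it was previously clear. */
static bool sketch_upcall_trylock(atomic_uint *flags)
{
    unsigned int old = atomic_fetch_or_explicit(flags, SKETCH_UPCALL_LOCK,
                                                memory_order_acquire);

    return (old & SKETCH_UPCALL_LOCK) == 0;
}

/* Clear the lock bit and leave every other flag untouched. */
static void sketch_upcall_unlock(atomic_uint *flags)
{
    atomic_fetch_and_explicit(flags, ~SKETCH_UPCALL_LOCK,
                              memory_order_release);
}
```

Sharing the word with other flags costs no extra storage per client, at the price of every access to the lock bit being an atomic read-modify-write.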
     
  • In order to support lifting the grace period early, we must tell
    nfsdcltrack what sort of client the "create" upcall is for. We can't
    reliably tell if a v4.0 client has completed reclaiming, so we can only
    lift the grace period once all the v4.1+ clients have issued a
    RECLAIM_COMPLETE and if there are no v4.0 clients.

    Also, in order to lift the grace period, we have to tell userland when
    the grace period started so that it can tell whether a RECLAIM_COMPLETE
    has been issued for each client since then.

    Since this is all optional info, we pass it along in environment
    variables to the "init" and "create" upcalls. By doing this, we don't
    need to revise the upcall format. The UMH upcall can simply make use of
    this info if it happens to be present. If it's not then it can just
    avoid lifting the grace period early.

    Signed-off-by: Jeff Layton

    Jeff Layton
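
A sketch of how a userspace helper might consume such optional information. The function names are hypothetical and the variable name is left to the caller, since the commit text does not name the environment variables. An absent or malformed value is treated as "not provided", in which case the helper would simply not lift the grace period early.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a non-negative decimal timestamp. Returns 1 and stores the
 * value on success, 0 if the string is missing or malformed. */
static int sketch_parse_time(const char *s, long long *out)
{
    char *end;
    long long v;

    if (s == NULL || *s == '\0')
        return 0;
    errno = 0;
    v = strtoll(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < 0)
        return 0;
    *out = v;
    return 1;
}

/* Look up an optional timestamp in the environment. */
static int sketch_env_time(const char *name, long long *out)
{
    return sketch_parse_time(getenv(name), out);
}
```

Because the lookup fails softly, an older kernel that sets no such variables and a newer helper that looks for them still interoperate.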
     
  • Allow a privileged userland process to end the v4 grace period early.
    Writing "Y", "y", or "1" to the file will cause the v4 grace period to
    be lifted. The basic idea with this will be to allow the userland
    client tracking program to lift the grace period once it knows that no
    more clients will be reclaiming state.

    Signed-off-by: Jeff Layton

    Jeff Layton
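
The accepted input can be sketched as a small predicate. The function name is hypothetical, and checking only the first byte is an assumption made here so that a trailing newline from a shell write is tolerated; the commit text states only which values are accepted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if the written buffer asks to end the grace period.
 * Only the first byte is examined in this sketch. */
static bool sketch_wants_end_grace(const char *buf, size_t len)
{
    if (len == 0)
        return false;
    return buf[0] == 'Y' || buf[0] == 'y' || buf[0] == '1';
}
```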
     
  • Add a new procfile that will allow a (privileged) userland process to
    end the NLM grace period early. The basic idea here will be to have
    sm-notify write to this file, if it sent out no NOTIFY requests when
    it runs. In that situation, we can generally expect that there will be
    no reclaim requests so the grace period can be lifted early.

    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • As stated in RFC 5661, section 18.51.3:

    Once a RECLAIM_COMPLETE is done, there can be no further reclaim
    operations for locks whose scope is defined as having completed
    recovery. Once the client sends RECLAIM_COMPLETE, the server will
    not allow the client to do subsequent reclaims of locking state for
    that scope and, if these are attempted, will return
    NFS4ERR_NO_GRACE.

    Ensure that we enforce that requirement.

    Signed-off-by: Jeff Layton

    Jeff Layton
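
The rule quoted from the RFC reduces to a per-client check, sketched here with hypothetical names. The status values are symbolic placeholders rather than the protocol's numeric error codes.

```c
#include <stdbool.h>

enum sketch_status { SKETCH_OK, SKETCH_NO_GRACE };

struct sketch_reclaimer {
    bool reclaim_complete;  /* set once RECLAIM_COMPLETE is processed */
};

/* Record that the client has declared its reclaims finished. */
static void sketch_reclaim_complete(struct sketch_reclaimer *clp)
{
    clp->reclaim_complete = true;
}

/* Gate every reclaim attempt: refuse it once the client is done. */
static enum sketch_status sketch_try_reclaim(const struct sketch_reclaimer *clp)
{
    return clp->reclaim_complete ? SKETCH_NO_GRACE : SKETCH_OK;
}
```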
     
  • Since it's stored in nfsd_net, we don't need to pass it in separately.

    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • Currently, all of the grace period handling is part of lockd. Eventually
    though we'd like to be able to build v4-only servers, at which point
    we'll need to put all of this elsewhere.

    Move the code itself into fs/nfs_common and have it build a grace.ko
    module. Then, rejigger the Kconfig options so that both nfsd and lockd
    enable it automatically.

    Signed-off-by: Jeff Layton

    Jeff Layton
     

11 Sep, 2014

1 commit

  • This fixes a failure in xfstests generic/313 because nfs doesn't update
    mtime on a truncate. The protocol requires this to be done implicitly
    for a size-changing setattr.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    Christoph Hellwig
     

04 Sep, 2014

6 commits


03 Sep, 2014

2 commits


29 Aug, 2014

5 commits


19 Aug, 2014

1 commit

  • One of our customers' applications needs only file names, not file
    attributes. Its directories hold 10K+ inodes. Assume the buffer cache
    holds the directory blocks (and hence the file names), but the inode
    cache is limited and must evict older cached inodes periodically. If
    the application keeps doing readdir(2) from an NFS client on multiple
    directories, some directories' files are periodically removed from the
    inode cache, and a new readdir(2) on the same directory requires disk
    access to bring those inodes back into the inode cache.

    Because a READDIRPLUS request also fetches attributes, doing a getattr
    on each file on the server, it causes unnecessary disk accesses. If the
    server answers READDIRPLUS with -ENOTSUPP, the NFS client falls back to
    the READDIR request, which gets only the names of the files in a
    directory, not attributes, and so avoids disk accesses on the server.

    There's already a corresponding client-side mount option, but an export
    option reduces the need for configuration across multiple clients.

    This flag affects NFSv3 only. If it turns out it's needed for NFSv4 as
    well then we may have to figure out how to extend the behavior to NFSv4,
    but it's not currently obvious how to do that.

    Signed-off-by: Rajesh Ghanekar
    Signed-off-by: J. Bruce Fields

    Rajesh Ghanekar
     

18 Aug, 2014

4 commits

  • As of 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low
    on space", we permit the server to process a LOCK operation even if
    there might not be space to return the conflicting lockowner, because
    we've made returning the conflicting lockowner optional.

    However, the rpc server still wants to know the most we might possibly
    return, so we need to take into account the possible conflicting
    lockowner in the svc_reserve_space() call here.

    Symptoms were log messages like "RPC request reserved 88 but used 108".

    Fixes: 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low on space"
    Reported-by: Kinglong Mee
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • We do what Neil suggests now.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • When creating a file that already exists in a read-only directory with
    O_EXCL, the NFSv3 server returns EACCES rather than EEXIST (which local
    files and the NFSv4 server return). Fix this by checking the MAY_CREATE
    permission only if the file does not exist. Since this already happens
    in do_nfsd_create, the check in nfsd3_proc_create can simply be removed.

    Signed-off-by: Ross Lagerwall
    Signed-off-by: J. Bruce Fields

    Ross Lagerwall
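
The ordering fix described above comes down to testing for existence before testing create permission. A minimal sketch, with a hypothetical function name and with the two conditions passed in as plain booleans rather than looked up in a filesystem:

```c
#include <errno.h>
#include <stdbool.h>

/* Decide the result of an exclusive create. Existence is checked
 * first, so an existing file in a read-only directory yields
 * -EEXIST rather than -EACCES. */
static int sketch_exclusive_create(bool exists, bool dir_writable)
{
    if (exists)
        return -EEXIST;
    if (!dir_writable)
        return -EACCES;
    return 0;
}
```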
     
  • Currently, we hold the state_lock when releasing the lease. That's
    potentially problematic in the future if we allow for setlease methods
    that can sleep. Move the nfs4_put_deleg_lease call out of the delegation
    unhashing routine (which was always a bit goofy anyway), and into the
    unlocked sections of the callers of unhash_delegation_locked.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton