06 Jan, 2012

1 commit

  • Servers have a finite amount of memory to store NFSv4 open and lock
    owners. Moreover, servers may have a difficult time determining when
    they can reap their state owner table, thanks to gray areas in the
    NFSv4 protocol specification. Thus clients should be careful to reuse
    state owners when possible.

    Currently Linux is not too careful. When a user has closed all her
    files on one mount point, the state owner's reference count goes to
    zero, and it is released. The next OPEN allocates a new one. A
    workload that serially opens and closes files can run through a large
    number of open owners this way.

    When a state owner's reference count goes to zero, slap it onto a free
    list for that nfs_server, with an expiry time. Garbage collect before
    looking for a state owner. This makes state owners for active users
    available for re-use.

    Now that there can be unused state owners remaining at umount time,
    purge the state owner free list when a server is destroyed. Also be
    sure not to reclaim unused state owners during state recovery.

    This change has benefits for the client as well. For some workloads,
    this approach drops the number of OPEN_CONFIRM calls from one per
    OPEN call down to just one in total. This reduces wire traffic and
    thus open(2) latency. Before this patch, untarring a kernel source
    tarball showed the OPEN_CONFIRM call counter steadily increasing
    through the test. With the patch, the OPEN_CONFIRM count remains at 1
    throughout the entire untar.

    As long as the expiry time is kept short, I don't think garbage
    collection should be terribly expensive, although it does bounce the
    clp->cl_lock around a bit.

    [ At some point we should rationalize the use of the nfs_server
    ->destroy method. ]

    Signed-off-by: Chuck Lever
    [Trond: Fixed a garbage collection race and a few efficiency issues]
    Signed-off-by: Trond Myklebust

    Chuck Lever
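
The free-list scheme described in this commit can be pictured with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: the class name, the method names, the 90-second expiry and the per-user keying are all hypothetical, and locking (clp->cl_lock in the kernel) is left out.

```python
import time
from collections import OrderedDict


class StateOwnerCache:
    """Toy model of one server's state owners plus an expiring free list."""

    def __init__(self, expiry_seconds=90.0, clock=time.monotonic):
        self.expiry_seconds = expiry_seconds  # hypothetical expiry time
        self.clock = clock
        self.active = {}           # user -> [owner_id, refcount]
        self.free = OrderedDict()  # user -> (owner_id, expires_at), oldest first
        self.next_id = 0

    def _gc(self):
        """Drop free-list entries whose expiry time has passed."""
        now = self.clock()
        while self.free:
            user, (_, expires_at) = next(iter(self.free.items()))
            if expires_at > now:
                break  # entries are in expiry order, so the rest are still fresh
            del self.free[user]

    def get(self, user):
        """Find or allocate the state owner for `user` and take a reference."""
        self._gc()  # garbage collect before looking for a state owner
        if user in self.active:
            self.active[user][1] += 1
            return self.active[user][0]
        if user in self.free:
            owner_id, _ = self.free.pop(user)  # reuse an idle state owner
        else:
            owner_id = self.next_id            # nothing to reuse: allocate
            self.next_id += 1
        self.active[user] = [owner_id, 1]
        return owner_id

    def put(self, user):
        """Drop a reference; at zero, park the owner on the free list."""
        entry = self.active[user]
        entry[1] -= 1
        if entry[1] == 0:
            del self.active[user]
            self.free[user] = (entry[0], self.clock() + self.expiry_seconds)

    def owners_to_reclaim(self):
        """State recovery only considers owners that are still in use."""
        return sorted(owner_id for owner_id, _ in self.active.values())

    def purge(self):
        """Empty the free list, as is done when the server is destroyed."""
        self.free.clear()
```

In this model, a user who closes her last file and reopens another one before the expiry time gets the same owner back, which is the reuse the commit message is after.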
     

25 Oct, 2011

1 commit

  • * 'for-3.2' of git://linux-nfs.org/~bfields/linux: (103 commits)
    nfs41: implement DESTROY_CLIENTID operation
    nfsd4: typo logical vs bitwise negate for want_mask
    nfsd4: allow NFS4_SHARE_SIGNAL_DELEG_WHEN_RESRC_AVAIL | NFS4_SHARE_PUSH_DELEG_WHEN_UNCONTENDED
    nfsd4: seq->status_flags may be used unitialized
    nfsd41: use SEQ4_STATUS_BACKCHANNEL_FAULT when cb_sequence is invalid
    nfsd4: implement new 4.1 open reclaim types
    nfsd4: remove unneeded CLAIM_DELEGATE_CUR workaround
    nfsd4: warn on open failure after create
    nfsd4: preallocate open stateid in process_open1()
    nfsd4: do idr preallocation with stateid allocation
    nfsd4: preallocate nfs4_file in process_open1()
    nfsd4: clean up open owners on OPEN failure
    nfsd4: simplify process_open1 logic
    nfsd4: make is_open_owner boolean
    nfsd4: centralize renew_client() calls
    nfsd4: typo logical vs bitwise negate
    nfs: fix bug about IPv6 address scope checking
    nfsd4: more robust ignoring of WANT bits in OPEN
    nfsd4: move name-length checks to xdr
    nfsd4: move access/deny validity checks to xdr code
    ...

    Linus Torvalds
     

28 Aug, 2011

1 commit


25 Aug, 2011

3 commits


01 Aug, 2011

1 commit


26 Jul, 2011

1 commit


20 Jul, 2011

1 commit


13 Jul, 2011

2 commits

  • If the client is using NFS v4.1, then we can use SECINFO_NO_NAME to find
    the secflavor for the initial mount. If the server doesn't support
    SECINFO_NO_NAME then I fall back on the "guess and check" method used
    for v4.0 mounts.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • can be skipped if the "eir_server_scope" from the exchange_id proc differs from
    previous calls.

    Also, in the future, server_scope will be useful for determining whether
    client trunking is available.

    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust

    Weston Andros Adamson
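
The SECINFO_NO_NAME fallback from the first commit in this group can be sketched as follows. This is a rough model, assuming the server either returns a list of acceptable flavors or rejects the operation; `NotSupported`, `find_mount_flavor`, `secinfo_no_name` and `probe` are hypothetical names, not kernel functions.

```python
class NotSupported(Exception):
    """Stands in for a server that rejects SECINFO_NO_NAME."""


def find_mount_flavor(secinfo_no_name, probe, client_flavors):
    """Pick a security flavor for the initial mount.

    secinfo_no_name: callable returning the server's flavor list, or
                     raising NotSupported (hypothetical interface).
    probe:           callable(flavor) -> bool, one "guess and check" attempt.
    client_flavors:  flavors this client can use, in preference order.
    """
    try:
        server_flavors = secinfo_no_name()
    except NotSupported:
        # Fall back on "guess and check", as for v4.0 mounts.
        for flavor in client_flavors:
            if probe(flavor):
                return flavor
        return None
    # The server told us what it accepts: take the first one we also support.
    for flavor in server_flavors:
        if flavor in client_flavors:
            return flavor
    return None
```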
     

25 Apr, 2011

1 commit

  • If a server for some reason keeps sending NFS4ERR_DELAY errors, we can end
    up looping forever inside nfs4_proc_create_session, and so the usual
    mechanisms for detecting if the nfs_client is dead don't work.

    Fix this by ensuring that we loop inside the nfs4_state_manager thread
    instead.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
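
The idea of this fix can be modelled in a few lines: the session-creation call makes a single attempt and hands NFS4ERR_DELAY back to its caller, and the retry loop lives in the state manager, which can check whether the client is dead on every pass. This is a sketch only; the function names, the `CLIENT_DEAD` marker and the delay value are assumptions for illustration.

```python
NFS4_OK = 0
NFS4ERR_DELAY = 10008      # protocol error code for "try again later"
CLIENT_DEAD = "dead"       # hypothetical marker, not a kernel value


def create_session_once(send_rpc):
    """Make one CREATE_SESSION attempt; never loop here."""
    return send_rpc()


def state_manager(send_rpc, client_is_dead, sleep=lambda seconds: None,
                  delay_seconds=1.0):
    """Retry from the manager loop so the liveness check runs every pass."""
    while True:
        if client_is_dead():
            return CLIENT_DEAD
        status = create_session_once(send_rpc)
        if status != NFS4ERR_DELAY:
            return status
        sleep(delay_seconds)  # hypothetical back-off before the next attempt
```

With the loop inside the single-attempt function instead, a server that answers NFS4ERR_DELAY forever would never let control return to the liveness check.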
     

25 Mar, 2011

3 commits


24 Mar, 2011

1 commit

  • The filelayout driver sends LAYOUTCOMMIT only when COMMIT goes to
    the data server (as opposed to the MDS) and the data server WRITE
    is not NFS_FILE_SYNC.

    Because only whole-file layouts are supported, there is only one
    IOMODE_RW layout segment.

    Signed-off-by: Andy Adamson
    Signed-off-by: Alexandros Batsakis
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Dean Hildebrand
    Signed-off-by: Fred Isaman
    Signed-off-by: Mingyang Guo
    Signed-off-by: Tao Guo
    Signed-off-by: Zhang Jingwang
    Tested-by: Boaz Harrosh
    Signed-off-by: Benny Halevy
    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Andy Adamson
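
The LAYOUTCOMMIT condition stated in this commit reduces to a two-part predicate. The sketch below is an illustration, not the driver code: `needs_layoutcommit` and its parameters are hypothetical, while the `Stable` values mirror the NFS stable_how levels.

```python
from enum import IntEnum


class Stable(IntEnum):
    NFS_UNSTABLE = 0
    NFS_DATA_SYNC = 1
    NFS_FILE_SYNC = 2


def needs_layoutcommit(commit_goes_to_ds, write_stable):
    """True only when COMMIT goes to the data server (not the MDS)
    and the data server WRITE was not NFS_FILE_SYNC."""
    return commit_goes_to_ds and write_stable != Stable.NFS_FILE_SYNC
```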
     

12 Mar, 2011

5 commits

  • Attempt a pNFS file layout read by setting up the nfs_read_data struct and
    calling nfs_initiate_read with the data server rpc client and the
    filelayout rpc call ops.

    Error handling is implemented in a subsequent patch.

    Signed-off-by: Andy Adamson
    Signed-off-by: Dean Hildebrand
    Signed-off-by: Fred Isaman
    Signed-off-by: Fred Isaman
    Signed-off-by: Mingyang Guo
    Signed-off-by: Oleg Drokin
    Signed-off-by: Ricardo Labiaga
    Tested-by: Guo Mingyang
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Introduce a data server set_client and init session following the
    nfs4_set_client and nfs4_init_session convention.

    Once a new nfs_client is on the nfs_client_list, the nfs_client cl_cons_state
    serializes access to creating an nfs_client struct with matching properties.

    Use the new nfs_get_client() that initializes new clients.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • The DS only role cannot be used to mount.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • There are no more external users of nfs4_state_mark_reclaim_nograce() or
    nfs4_state_mark_reclaim_reboot(), so mark them as static.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • nfs4_schedule_state_recovery() should only be used when we need to force
    the state manager to check the lease. If we just want to start the
    state manager in order to handle a state recovery situation, we should be
    using nfs4_schedule_state_manager().

    This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing
    its use with a set of helper functions that do the right thing.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

07 Jan, 2011

3 commits

  • NFSv4 migration needs to reassociate state owners from the source to
    the destination nfs_server data structures. To make that easier, move
    the cl_state_owners field to the nfs_server struct. cl_openowner_id
    and cl_lockowner_id accompany this move, as they are used in
    conjunction with cl_state_owners.

    The cl_lock field in the parent nfs_client continues to protect all
    three of these fields.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • A layout can request return-on-close. How this interacts with the
    forgetful model of never sending LAYOUTRETURNS is a bit ambiguous.
    We forget any layouts marked roc, and wait for them to be completely
    forgotten before continuing with the close. In addition, to compensate
    for races with any inflight LAYOUTGETs, and the fact that we do not get
    any layout stateid back from the server, we set the barrier to the worst
    case scenario of current_seqid + number of outstanding LAYOUTGETS.

    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Fred Isaman
     
  • This is the heart of the wave 2 submission. Add the code to trigger
    drain and forget of any affected layouts. In addition, we set a
    "barrier", below which any LAYOUTGET reply is ignored. This is to
    compensate for the fact that we do not wait for outstanding LAYOUTGETs
    to complete as per section 12.5.5.2.1 of RFC 5661.

    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Fred Isaman
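
The barrier arithmetic from the two layout commits above can be sketched as a small model. It assumes that each outstanding LAYOUTGET may advance the layout seqid by one, and that a reply at or below the barrier counts as stale; the inclusive comparison, the class and the method names are assumptions of this sketch rather than something the commit messages spell out.

```python
class LayoutState:
    """Toy model of a layout's seqid, in-flight LAYOUTGETs and barrier."""

    def __init__(self, seqid=0):
        self.seqid = seqid
        self.outstanding = 0  # LAYOUTGETs sent but not yet answered
        self.barrier = 0

    def send_layoutget(self):
        self.outstanding += 1

    def forget(self):
        """Forget layouts without a LAYOUTRETURN and set the worst-case
        barrier: current seqid + number of outstanding LAYOUTGETs."""
        self.barrier = self.seqid + self.outstanding

    def on_layoutget_reply(self, reply_seqid):
        """Return True if the reply is used, False if it is ignored."""
        self.outstanding -= 1
        if reply_seqid <= self.barrier:  # assumed inclusive comparison
            return False
        self.seqid = reply_seqid
        return True
```

For example, with seqid 5 and two LAYOUTGETs in flight, forgetting sets the barrier to 7, so replies carrying seqid 6 or 7 are dropped and a later reply with seqid 8 is accepted.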
     

05 Jan, 2011

1 commit


17 Dec, 2010

1 commit

  • Clean up.

    The pointer returned by ->decode_dirent() is no longer used as a
    pointer. The only call site (xdr_decode() in fs/nfs/dir.c) simply
    extracts the errno value encoded in the pointer. Replace the
    returned pointer with a standard integer errno return value.

    Also, pass the "server" argument as part of the nfs_entry instead of
    as a separate parameter. It's faster to derive "server" in
    nfs_readdir_xdr_to_array() since we already have the directory's inode
    handy. "server" ought to be invariant for a set of entries in the
    same directory, right?

    The legacy versions of decode_dirent() don't use "server" anyway, so
    it's wasted work for them to derive and pass "server" for each entry.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     

24 Oct, 2010

2 commits

  • By requesting more attributes during a readdir, we can mimic the readdir plus
    operation that was in NFSv3.

    To test, I ran the command `ls -lU --color=none` on directories with various
    numbers of files. Without readdir plus, I see this:

    n files | 100       | 1,000     | 10,000    | 100,000   | 1,000,000
    --------+-----------+-----------+-----------+-----------+----------
    real    | 0m00.153s | 0m00.589s | 0m05.601s | 0m56.691s | 9m59.128s
    user    | 0m00.007s | 0m00.007s | 0m00.077s | 0m00.703s | 0m06.800s
    sys     | 0m00.010s | 0m00.070s | 0m00.633s | 0m06.423s | 1m10.005s
    access  | 3         | 1         | 1         | 4         | 31
    getattr | 2         | 1         | 1         | 1         | 1
    lookup  | 104       | 1,003     | 10,003    | 100,003   | 1,000,003
    readdir | 2         | 16        | 158       | 1,575     | 15,749
    total   | 111       | 1,021     | 10,163    | 101,583   | 1,015,784

    With readdir plus enabled, I see this:

    n files | 100       | 1,000     | 10,000    | 100,000   | 1,000,000
    --------+-----------+-----------+-----------+-----------+----------
    real    | 0m00.115s | 0m00.206s | 0m01.079s | 0m12.521s | 2m07.528s
    user    | 0m00.003s | 0m00.003s | 0m00.040s | 0m00.290s | 0m03.296s
    sys     | 0m00.007s | 0m00.020s | 0m00.120s | 0m01.357s | 0m17.556s
    access  | 3         | 1         | 1         | 1         | 7
    getattr | 2         | 1         | 1         | 1         | 1
    lookup  | 4         | 3         | 3         | 3         | 3
    readdir | 6         | 62        | 630       | 6,300     | 62,993
    total   | 15        | 67        | 635       | 6,305     | 63,004

    With readdir plus disabled, the number of RPC calls is about 16x higher
    and the listing is 4 to 5 times slower on large directories.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Convert nfs*xdr.c to use an xdr stream in decode_dirent. This will prevent a
    kernel oops that has been occurring when reading a vmapped page.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
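
The summary figures in the readdir plus commit above can be rechecked from its two tables. The snippet below is a quick arithmetic check using the "real" and "total" rows for the two largest directory sizes; the helper name `seconds` is just for this sketch.

```python
def seconds(stamp):
    """Convert a time(1)-style stamp such as '9m59.128s' to seconds."""
    minutes, rest = stamp.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


# Values copied from the tables: (real time, total RPC calls).
without_plus = {"100,000": ("0m56.691s", 101_583),
                "1,000,000": ("9m59.128s", 1_015_784)}
with_plus = {"100,000": ("0m12.521s", 6_305),
             "1,000,000": ("2m07.528s", 63_004)}

rpc_ratio = {}
time_ratio = {}
for n in without_plus:
    rpc_ratio[n] = without_plus[n][1] / with_plus[n][1]
    time_ratio[n] = seconds(without_plus[n][0]) / seconds(with_plus[n][0])
    print(f"{n} files: {rpc_ratio[n]:.1f}x RPC calls, "
          f"{time_ratio[n]:.1f}x wall-clock time")
```

Both directory sizes give an RPC ratio of roughly 16 and a wall-clock ratio between 4 and 5, in line with the numbers quoted in the commit message.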
     

17 Sep, 2010

5 commits


31 Jul, 2010

2 commits


25 Jun, 2010

1 commit


23 Jun, 2010

4 commits