13 Jan, 2012

1 commit


10 Jan, 2012

1 commit

  • Now that the use of numeric uids/gids is officially sanctioned in
    RFC3530bis, it is time to change the default here to 'enabled'.

    By doing so, we ensure that NFSv4 copies the behaviour of NFSv3 when we're
    using the default AUTH_SYS authentication (i.e. when the client uses the
    numeric uids/gids as authentication tokens), so that when new files are
    created, they will appear to have the correct user/group.
    It also fixes a number of backward compatibility issues when migrating
    from NFSv3 to NFSv4 on a platform where the server uses different uid/gid
    mappings than the client.

    Note also that this setting has been successfully tested at several
    Connectathon/Bakeathon events against servers that do not support numeric
    uids/gids, and the fallback to using string names/groups has been shown
    to work well in all those test cases.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

06 Jan, 2012

1 commit

  • Servers have a finite amount of memory to store NFSv4 open and lock
    owners. Moreover, servers may have a difficult time determining when
    they can reap their state owner table, thanks to gray areas in the
    NFSv4 protocol specification. Thus clients should be careful to reuse
    state owners when possible.

    Currently Linux is not too careful. When a user has closed all her
    files on one mount point, the state owner's reference count goes to
    zero, and it is released. The next OPEN allocates a new one. A
    workload that serially opens and closes files can run through a large
    number of open owners this way.

    When a state owner's reference count goes to zero, slap it onto a free
    list for that nfs_server, with an expiry time. Garbage collect before
    looking for a state owner. This makes state owners for active users
    available for re-use.

    Now that there can be unused state owners remaining at umount time,
    purge the state owner free list when a server is destroyed. Also be
    sure not to reclaim unused state owners during state recovery.

    This change has benefits for the client as well. For some workloads,
    this approach drops the number of OPEN_CONFIRM calls from the same as
    the number of OPEN calls, down to just one. This reduces wire traffic
    and thus open(2) latency. Before this patch, untarring a kernel
    source tarball shows the OPEN_CONFIRM call counter steadily increasing
    through the test. With the patch, the OPEN_CONFIRM count remains at 1
    throughout the entire untar.

    As long as the expiry time is kept short, I don't think garbage
    collection should be terribly expensive, although it does bounce the
    clp->cl_lock around a bit.

    [ At some point we should rationalize the use of the nfs_server
    ->destroy method. ]

    Signed-off-by: Chuck Lever
    [Trond: Fixed a garbage collection race and a few efficiency issues]
    Signed-off-by: Trond Myklebust

    Chuck Lever
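
The free-list scheme described in the commit above can be illustrated with a small userspace sketch. This is not the kernel code: `struct owner`, `struct server`, and the three functions are hypothetical simplifications, there is no locking, and the owner is only counted as freed rather than actually released.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct owner {
    int uid;                /* identity the owner was created for */
    int refcount;           /* active users of this owner */
    time_t expires;         /* meaningful only while on the free list */
    struct owner *next;     /* free-list linkage */
};

struct server {
    struct owner *free_list;    /* idle owners awaiting reuse or expiry */
    time_t lease;               /* how long an idle owner is kept */
    int freed;                  /* number of owners garbage collected */
};

/* Drop a reference; an idle owner is parked rather than destroyed. */
void owner_put(struct server *srv, struct owner *o, time_t now)
{
    assert(o->refcount > 0);
    if (--o->refcount > 0)
        return;
    o->expires = now + srv->lease;
    o->next = srv->free_list;
    srv->free_list = o;
}

/* Unlink every parked owner whose expiry time has passed. */
void owner_gc(struct server *srv, time_t now)
{
    struct owner **pp = &srv->free_list;

    while (*pp) {
        struct owner *o = *pp;
        if (o->expires <= now) {
            *pp = o->next;
            o->next = NULL;
            srv->freed++;   /* a real client would release the owner here */
        } else {
            pp = &o->next;
        }
    }
}

/* Garbage collect first, then try to revive a parked owner for uid. */
struct owner *owner_get(struct server *srv, int uid, time_t now)
{
    owner_gc(srv, now);
    for (struct owner **pp = &srv->free_list; *pp; pp = &(*pp)->next) {
        struct owner *o = *pp;
        if (o->uid == uid) {
            *pp = o->next;
            o->next = NULL;
            o->refcount = 1;
            return o;
        }
    }
    return NULL;            /* caller would allocate a fresh owner */
}
```

A user who reopens a file before the expiry time gets the same owner back from `owner_get`, which is the case where the commit message reports the OPEN_CONFIRM count staying at 1.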
     

05 Jan, 2012

1 commit


21 Oct, 2011

1 commit


19 Oct, 2011

1 commit

  • The result of ipv6_addr_scope() is not always a single scope value,
    so nfs_sockaddr_match_ipaddr6 cannot use an equality test to compare
    it with IPV6_ADDR_SCOPE_LINKLOCAL.

    This patch fixes the problem, and checks the address before the scope_id.

    Signed-off-by: Mi Jinlong
    Signed-off-by: Trond Myklebust

    Mi Jinlong
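
The point of the commit above is that a classification result made of several flag bits has to be tested with a bitwise AND, not with `==`. The following userspace sketch shows the idea under stated assumptions: `TYPE_UNICAST`, `TYPE_LINKLOCAL`, `addr_type`, and `match_ipaddr6` are hypothetical names and values, not the kernel's helpers or constants.

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical flag values, for illustration only; not the kernel's. */
#define TYPE_UNICAST    0x01u
#define TYPE_LINKLOCAL  0x20u

/* Classify an address as a bitmask; link-local addresses are fe80::/10. */
unsigned int addr_type(const struct in6_addr *a)
{
    unsigned int type = TYPE_UNICAST;

    if (a->s6_addr[0] == 0xfe && (a->s6_addr[1] & 0xc0) == 0x80)
        type |= TYPE_LINKLOCAL;
    return type;
}

/* Compare the addresses first; the scope id only matters for link-local. */
int match_ipaddr6(const struct sockaddr_in6 *s1, const struct sockaddr_in6 *s2)
{
    assert(s1 != NULL && s2 != NULL);
    if (memcmp(&s1->sin6_addr, &s2->sin6_addr, sizeof(struct in6_addr)) != 0)
        return 0;
    if (addr_type(&s1->sin6_addr) & TYPE_LINKLOCAL)     /* mask, not == */
        return s1->sin6_scope_id == s2->sin6_scope_id;
    return 1;
}
```

In this sketch a link-local address classifies as `TYPE_UNICAST | TYPE_LINKLOCAL`, so an equality test against `TYPE_LINKLOCAL` would never succeed and the scope id would never be compared.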
     

01 Aug, 2011

3 commits


15 Jul, 2011

1 commit


13 Jul, 2011

2 commits


30 May, 2011

1 commit

  • Use the pnfs_layoutdriver_type both as a qualifier for the deviceid,
    distinguishing deviceids of different layout types on the server,
    and for freeing the layout-driver allocated structure containing the
    nfs4_deviceid_node.

    [BUG in _deviceid_purge_client]
    [layout_driver MUST set free_deviceid_node if using dev-cache]
    [let ver < 4.1 compile]
    Signed-off-by: Boaz Harrosh
    [removed EXPORT_SYMBOL_GPL(nfs4_deviceid_purge_client)]
    Signed-off-by: Benny Halevy

    Benny Halevy
     

12 Mar, 2011

6 commits


26 Jan, 2011

1 commit

  • The information required to find the nfs_client corresponding to the incoming
    back channel request is contained in the NFS layer. Perform minimal checking
    in the RPC layer pg_authenticate method, and push more detailed checking into
    the NFS layer where the nfs_client can be found.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     

07 Jan, 2011

6 commits

  • Delegations are per-inode, not per-nfs_client. When a server file
    system is migrated, delegations on the client must be moved from the
    source to the destination nfs_server. Make it easier to manage a
    mount point's delegation list across a migration event by moving the
    list to the nfs_server struct.

    Clean up: I added documenting comments to public functions I changed
    in this patch. For consistency I added comments to all the other
    public functions in fs/nfs/delegation.c.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • We're about to move some fields from struct nfs_client to struct
    nfs_server. There is a many-to-one relationship between nfs_servers
    and nfs_clients. After these fields are moved to the nfs_server
    struct, to visit all of the data in these fields that is owned by one
    nfs_client, code will need to visit each nfs_server on the
    cl_superblocks list for that nfs_client.

    To serialize changes to the cl_superblocks list during these little
    expeditions, protect the list with RCU.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • A layout can request return-on-close. How this interacts with the
    forgetful model of never sending LAYOUTRETURNS is a bit ambiguous.
    We forget any layouts marked roc, and wait for them to be completely
    forgotten before continuing with the close. In addition, to compensate
    for races with any inflight LAYOUTGETs, and the fact that we do not get
    any layout stateid back from the server, we set the barrier to the worst
    case scenario of current_seqid + number of outstanding LAYOUTGETS.

    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Fred Isaman
     
  • Fixes a bug where the nfs_client could be freed during callback processing.
    Refactor nfs_find_client to use minorversion specific means to locate the
    correct nfs_client structure.

    In the NFS layer, V4.0 clients are found using the callback_ident field in the
    CB_COMPOUND header. V4.1 clients are found using the sessionID in the
    CB_SEQUENCE operation which is also compared against the sessionID associated
    with the back channel thread after a successful CREATE_SESSION.

    Each of these methods finds the one and only nfs_client associated
    with the incoming callback request - so nfs_find_client_next is not needed.

    In the RPC layer, the pg_authenticate call needs to find the nfs_client. For
    the v4.0 callback service, the callback identifier has not been decoded so a
    search by address, version, and minorversion is used. The sessionid for the
    sessions based callback service has (usually) not been set for the
    pg_authenticate on a CB_NULL call which can be sent prior to the return
    of a CREATE_SESSION call, so the sessionid associated with the back channel
    thread is not used to find the client in pg_authenticate for CB_NULL calls.

    Pass the referenced nfs_client to each CB_COMPOUND operation being processed
    via the new cb_process_state structure. The reference is held across
    cb_compound processing.

    Use the new cb_process_state struct to move the NFS4ERR_RETRY_UNCACHED_REP
    processing from process_op into nfs4_callback_sequence where it belongs.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Use the small id to pointer translator service to provide a unique callback
    identifier per SETCLIENTID call used to identify the v4.0 callback service
    associated with the clientid.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
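
The idea behind the commit above is a table that hands out small integer ids and maps each one back to a pointer. The sketch below is a deliberately naive userspace stand-in with a fixed-size array; `struct id_table`, `id_alloc`, `id_find`, and `id_remove` are hypothetical names and do not reproduce the kernel's translator service or its API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IDS 64          /* arbitrary capacity for the sketch */

/* Hypothetical table: the slot index is the id, the slot value the pointer. */
struct id_table {
    void *slot[MAX_IDS];
};

/* Store ptr in the lowest free slot; return its id, or -1 if full. */
int id_alloc(struct id_table *t, void *ptr)
{
    assert(ptr != NULL);
    for (int id = 0; id < MAX_IDS; id++) {
        if (t->slot[id] == NULL) {
            t->slot[id] = ptr;
            return id;
        }
    }
    return -1;
}

/* Translate an id back to its pointer; NULL if unknown or out of range. */
void *id_find(const struct id_table *t, int id)
{
    if (id < 0 || id >= MAX_IDS)
        return NULL;
    return t->slot[id];
}

/* Release an id so that it can be handed out again. */
void id_remove(struct id_table *t, int id)
{
    if (id >= 0 && id < MAX_IDS)
        t->slot[id] = NULL;
}
```

In this picture the id returned by `id_alloc` plays the role of the callback identifier sent with SETCLIENTID, and `id_find` is the lookup performed when a callback arrives carrying that identifier.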
     
  • Resetting the client minor version operations causes nfs4_destroy_callback
    to fail to shut down the NFSv4.1 callback service.

    There is no reason to reset the client minorversion operations when the
    nfs_client struct is being freed.

    Remove the minorversion reset and rename the function.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     

27 Oct, 2010

2 commits

  • * 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits)
    svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
    svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
    svcrpc: assume svc_delete_xprt() called only once
    svcrpc: never clear XPT_BUSY on dead xprt
    nfsd4: fix connection allocation in sequence()
    nfsd4: only require krb5 principal for NFSv4.0 callbacks
    nfsd4: move minorversion to client
    nfsd4: delay session removal till free_client
    nfsd4: separate callback change and callback probe
    nfsd4: callback program number is per-session
    nfsd4: track backchannel connections
    nfsd4: confirm only on succesful create_session
    nfsd4: make backchannel sequence number per-session
    nfsd4: use client pointer to backchannel session
    nfsd4: move callback setup into session init code
    nfsd4: don't cache seq_misordered replies
    SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
    SUNRPC: Use conventional switch statement when reclassifying sockets
    sunrpc/xprtrdma: clean up workqueue usage
    sunrpc: Turn list_for_each-s into the ..._entry-s
    ...

    Fix up trivial conflicts (two different deprecation notices added in
    separate branches) in Documentation/feature-removal-schedule.txt

    Linus Torvalds
     
  • * 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    net/sunrpc: Use static const char arrays
    nfs4: fix channel attribute sanity-checks
    NFSv4.1: Use more sensible names for 'initialize_mountpoint'
    NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure
    NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure
    NFS: client needs to maintain list of inodes with active layouts
    NFS: create and destroy inode's layout cache
    NFSv4.1: pnfs: filelayout: introduce minimal file layout driver
    NFSv4.1: pnfs: full mount/umount infrastructure
    NFS: set layout driver
    NFS: ask for layouttypes during v4 fsinfo call
    NFS: change stateid to be a union
    NFSv4.1: pnfsd, pnfs: protocol level pnfs constants
    SUNRPC: define xdr_decode_opaque_fixed
    NFSD: remove duplicate NFS4_STATEID_SIZE

    Linus Torvalds
     

26 Oct, 2010

1 commit

  • * 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (67 commits)
    SUNRPC: Cleanup duplicate assignment in rpcauth_refreshcred
    nfs: fix unchecked value
    Ask for time_delta during fsinfo probe
    Revalidate caches on lock
    SUNRPC: After calling xprt_release(), we must restart from call_reserve
    NFSv4: Fix up the 'dircount' hint in encode_readdir
    NFSv4: Clean up nfs4_decode_dirent
    NFSv4: nfs4_decode_dirent must clear entry->fattr->valid
    NFSv4: Fix a regression in decode_getfattr
    NFSv4: Fix up decode_attr_filehandle() to handle the case of empty fh pointer
    NFS: Ensure we check all allocation return values in new readdir code
    NFS: Readdir plus in v4
    NFS: introduce generic decode_getattr function
    NFS: check xdr_decode for errors
    NFS: nfs_readdir_filler catch all errors
    NFS: readdir with vmapped pages
    NFS: remove page size checking code
    NFS: decode_dirent should use an xdr_stream
    SUNRPC: Add a helper function xdr_inline_peek
    NFS: remove readdir plus limit
    ...

    Linus Torvalds
     

25 Oct, 2010

4 commits

  • Implement the driver's io_ops->alloc_lseg and free_lseg functions,
    which integrate with the deviceid cache and call out to
    nfs4_proc_getdeviceinfo when necessary.

    Signed-off-by: Andy Adamson
    Signed-off-by: Dean Hildebrand
    Signed-off-by: Marc Eshel
    Signed-off-by: Mike Sager
    Signed-off-by: Oleg Drokin
    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Tao Guo
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • In particular, server reboot will invalidate all layouts.

    Note that in order to have an active layout, we must get a successful response
    from the server. To avoid adding that machinery, this patch just includes a
    stub that fakes up a successful return. Since the layout is never referenced
    for io, this is not a problem.

    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    Signed-off-by: Dean Hildebrand
    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Put in the infrastructure that uses information returned from the
    server at mount to select a layout driver module.

    In this patch, a stub is used that always returns "no driver found".

    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Dean Hildebrand
    Signed-off-by: Marc Eshel
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Ricardo Labiaga
     
  • Instead of blindly zapping the caches, attempt to revalidate them if
    the server has indicated that it uses high resolution timestamps.

    NFSv4 should be able to always revalidate the cache since the
    protocol requires the update of the change attribute on modification of
    the data. In reality, there are servers (the Linux NFS server
    for example) that do not obey this requirement and use ctime as the
    basis for change attribute. Long term, the server needs to be fixed.
    At this time, and to be on the safe side, continue zapping caches if
    the server indicates that it does not have a high resolution timestamp.

    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Trond Myklebust

    Ricardo Labiaga
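
The decision described in the commit above reduces to one test on the timestamp granularity the server reports. The sketch below assumes a hypothetical threshold of one microsecond for "high resolution"; the threshold, the `cache_action` enum, and both function names are illustrative assumptions, not taken from the patch.

```c
#include <assert.h>
#include <time.h>

enum cache_action { CACHE_REVALIDATE, CACHE_ZAP };

/* Assumed threshold: sub-second granularity of at most 1000 ns. */
int time_is_granular(const struct timespec *delta)
{
    assert(delta != NULL);
    return delta->tv_sec == 0 && delta->tv_nsec <= 1000;
}

/* Revalidate when timestamps can be trusted, otherwise stay safe and zap. */
enum cache_action cache_action_on_lock(const struct timespec *time_delta)
{
    return time_is_granular(time_delta) ? CACHE_REVALIDATE : CACHE_ZAP;
}
```

A server advertising a one-second granularity falls into the `CACHE_ZAP` branch, which matches the conservative behaviour the commit message keeps for servers without high resolution timestamps.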
     

24 Oct, 2010

2 commits

  • By requesting more attributes during a readdir, we can mimic the readdir plus
    operation that was in NFSv3.

    To test, I ran the command `ls -lU --color=none` on directories with various
    numbers of files. Without readdir plus, I see this:

    n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
    --------+-----------+-----------+-----------+-----------+----------
    real    | 0m00.153s | 0m00.589s | 0m05.601s | 0m56.691s | 9m59.128s
    user    | 0m00.007s | 0m00.007s | 0m00.077s | 0m00.703s | 0m06.800s
    sys     | 0m00.010s | 0m00.070s | 0m00.633s | 0m06.423s | 1m10.005s
    access  |         3 |         1 |         1 |         4 |        31
    getattr |         2 |         1 |         1 |         1 |         1
    lookup  |       104 |     1,003 |    10,003 |   100,003 | 1,000,003
    readdir |         2 |        16 |       158 |     1,575 |    15,749
    total   |       111 |     1,021 |    10,163 |   101,583 | 1,015,784

    With readdir plus enabled, I see this:

    n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
    --------+-----------+-----------+-----------+-----------+----------
    real    | 0m00.115s | 0m00.206s | 0m01.079s | 0m12.521s | 2m07.528s
    user    | 0m00.003s | 0m00.003s | 0m00.040s | 0m00.290s | 0m03.296s
    sys     | 0m00.007s | 0m00.020s | 0m00.120s | 0m01.357s | 0m17.556s
    access  |         3 |         1 |         1 |         1 |         7
    getattr |         2 |         1 |         1 |         1 |         1
    lookup  |         4 |         3 |         3 |         3 |         3
    readdir |         6 |        62 |       630 |     6,300 |    62,993
    total   |        15 |        67 |       635 |     6,305 |    63,004

    With readdir plus disabled, about 16 times as many RPC calls are made and
    listing a large directory takes 4 to 5 times longer.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
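
The summary figures in the commit above can be checked against the 1,000,000-file column of its two tables. The helper names below are hypothetical; the input numbers are the ones reported in the commit message.

```c
#include <assert.h>

/* Convert a "XmYY.YYYs" style timing into seconds. */
double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* How many times larger the first measurement is than the second. */
double ratio(double without_plus, double with_plus)
{
    assert(with_plus > 0.0);
    return without_plus / with_plus;
}

/* RPC totals for 1,000,000 files: 1,015,784 without vs 63,004 with. */
double rpc_call_ratio(void)
{
    return ratio(1015784.0, 63004.0);
}

/* Wall-clock times for 1,000,000 files: 9m59.128s vs 2m07.528s. */
double wall_clock_ratio(void)
{
    return ratio(to_seconds(9, 59.128), to_seconds(2, 7.528));
}
```

Both results land inside the ranges quoted in the commit message: the call ratio is a little above 16 and the wall-clock ratio falls between 4 and 5.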
     
  • We can use vmapped pages to read more information from the network at once.
    This will reduce the number of calls needed to complete a readdir.

    Signed-off-by: Bryan Schumaker
    [trondmy: Added #include <linux/vmalloc.h> in fs/nfs/dir.c]
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     

02 Oct, 2010

1 commit


23 Sep, 2010

1 commit

  • NFS clients since 2.6.12 support flock locks by emulating fcntl byte-range
    locks. Because of this, some Windows applications that appear to use both
    flock (share mode lock mapped as flock by Samba) and fcntl locks sequentially
    on the same file cannot lock it, as they falsely assume the file is already
    locked. The problem was reported on a setup with Windows clients accessing
    Excel files on a Samba exported share that is originally an NFS mount from a
    NetApp filer.

    Older NFS clients (< 2.6.12) did not see this problem as flock locks were
    considered local. To support legacy flock behavior, this patch adds a mount
    option "-olocal_lock=" which can take the following values:

    'none' - Neither flock locks nor POSIX locks are local
    'flock' - flock locks are local
    'posix' - fcntl/POSIX locks are local
    'all' - Both flock locks and POSIX locks are local

    Testing:

    - This patch was tested by using the -olocal_lock option with different
      values and noting the NLM calls in the captured network packets.

      'none'  - NLM calls were seen during both flock() and fcntl(); the flock
                lock was granted, fcntl was denied
      'flock' - no NLM calls for flock(); an NLM call was seen for fcntl(),
                granted
      'posix' - an NLM call was seen for flock(), granted; no NLM call for
                fcntl()
      'all'   - no NLM calls were seen during either flock() or fcntl()

    - No bugs were seen during NFSv4 locking/unlocking in general and NFSv4
    reboot recovery.

    Cc: Neil Brown
    Signed-off-by: Suresh Jayaraman
    Signed-off-by: Trond Myklebust

    Suresh Jayaraman
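
The four values accepted by the local_lock option in the commit above map naturally onto two independent flag bits. A minimal parsing sketch, assuming hypothetical flag names `LOCAL_FLOCK` and `LOCAL_POSIX` and a hypothetical `parse_local_lock` helper (the patch's own identifiers are not shown in this log):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag bits; one per lock family handled locally. */
#define LOCAL_FLOCK 0x1u
#define LOCAL_POSIX 0x2u

/* Translate the option value into flags; return 0 on success, -1 if unknown. */
int parse_local_lock(const char *val, unsigned int *flags)
{
    assert(val != NULL && flags != NULL);
    if (strcmp(val, "none") == 0)
        *flags = 0;
    else if (strcmp(val, "flock") == 0)
        *flags = LOCAL_FLOCK;
    else if (strcmp(val, "posix") == 0)
        *flags = LOCAL_POSIX;
    else if (strcmp(val, "all") == 0)
        *flags = LOCAL_FLOCK | LOCAL_POSIX;
    else
        return -1;          /* leave *flags untouched on bad input */
    return 0;
}
```

Keeping the two lock families as separate bits means a later check such as `flags & LOCAL_FLOCK` decides whether an flock() request stays on the client or goes to the server, which corresponds to the NLM traffic observed in the testing notes.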
     

13 Sep, 2010

1 commit


23 Jun, 2010

2 commits