06 Jan, 2012

1 commit

  • Servers have a finite amount of memory to store NFSv4 open and lock
    owners. Moreover, servers may have a difficult time determining when
    they can reap their state owner table, thanks to gray areas in the
    NFSv4 protocol specification. Thus clients should be careful to reuse
    state owners when possible.

    Currently Linux is not too careful. When a user has closed all her
    files on one mount point, the state owner's reference count goes to
    zero, and it is released. The next OPEN allocates a new one. A
    workload that serially opens and closes files can run through a large
    number of open owners this way.

    When a state owner's reference count goes to zero, slap it onto a free
    list for that nfs_server, with an expiry time. Garbage collect before
    looking for a state owner. This makes state owners for active users
    available for re-use.

    Now that there can be unused state owners remaining at umount time,
    purge the state owner free list when a server is destroyed. Also be
    sure not to reclaim unused state owners during state recovery.

    This change has benefits for the client as well. For some workloads,
    this approach drops the number of OPEN_CONFIRM calls from one per OPEN
    call down to just one in total. This reduces wire traffic and thus
    open(2) latency. Before this patch, untarring a kernel source tarball
    showed the OPEN_CONFIRM call counter steadily increasing through the
    test. With the patch, the OPEN_CONFIRM count remains at 1 throughout
    the entire untar.

    As long as the expiry time is kept short, I don't think garbage
    collection should be terribly expensive, although it does bounce the
    clp->cl_lock around a bit.
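
    The free-list scheme described above can be sketched in user-space C.
    This is a minimal illustration, not the kernel patch: the struct
    layouts, the names owner_put(), owner_gc() and owner_get(), and the
    plain integer clock are all assumptions made for the example, and no
    locking is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct state_owner {
    int refcount;
    unsigned long expires;     /* time after which an unused owner may be reaped */
    struct state_owner *next;  /* link in the per-server free list */
    int id;                    /* stands in for the owner's identity */
};

struct server {
    struct state_owner *free_list; /* unused owners, oldest first */
    unsigned long expiry;          /* assumed lifetime of an unused owner */
    int reaped;                    /* count of reaped owners, for illustration */
};

/* Drop a reference; on the last put, park the owner instead of freeing it. */
static void owner_put(struct server *srv, struct state_owner *sp, unsigned long now)
{
    struct state_owner **pp = &srv->free_list;

    assert(sp->refcount > 0);
    if (--sp->refcount > 0)
        return;
    sp->expires = now + srv->expiry;
    sp->next = NULL;
    while (*pp)                    /* append at the tail: list stays oldest first */
        pp = &(*pp)->next;
    *pp = sp;
}

/* Reap owners at the head of the list whose expiry time has passed. */
static void owner_gc(struct server *srv, unsigned long now)
{
    while (srv->free_list && srv->free_list->expires <= now) {
        struct state_owner *sp = srv->free_list;

        srv->free_list = sp->next;
        sp->next = NULL;
        srv->reaped++;             /* real code would free sp here */
    }
}

/* Garbage collect first, then try to reuse a parked owner. */
static struct state_owner *owner_get(struct server *srv, int id, unsigned long now)
{
    struct state_owner **pp;

    owner_gc(srv, now);
    for (pp = &srv->free_list; *pp; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            struct state_owner *sp = *pp;

            *pp = sp->next;        /* unlink from the free list */
            sp->next = NULL;
            sp->refcount = 1;
            return sp;
        }
    }
    return NULL;                   /* caller would allocate a new owner */
}
```

    In this sketch an owner that is reopened before its expiry is handed
    back unchanged, which is what lets an active user keep one open owner
    across serial open/close cycles.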

    [ At some point we should rationalize the use of the nfs_server
    ->destroy method. ]

    Signed-off-by: Chuck Lever
    [Trond: Fixed a garbage collection race and a few efficiency issues]
    Signed-off-by: Trond Myklebust

    Chuck Lever
     

01 Aug, 2011

4 commits

  • Fix two recently introduced compile problems:

    Fix a typo in fs/nfs/pnfs.h

    Move the pnfs_blksize declaration outside the CONFIG_NFS_V4 section in
    struct nfs_server.

    Reported-by: Jens Axboe
    Signed-off-by: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     
  • * 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits)
    pnfsblock: write_pagelist handle zero invalid extents
    pnfsblock: note written INVAL areas for layoutcommit
    pnfsblock: bl_write_pagelist
    pnfsblock: bl_read_pagelist
    pnfsblock: cleanup_layoutcommit
    pnfsblock: encode_layoutcommit
    pnfsblock: merge rw extents
    pnfsblock: add extent manipulation functions
    pnfsblock: bl_find_get_extent
    pnfsblock: xdr decode pnfs_block_layout4
    pnfsblock: call and parse getdevicelist
    pnfsblock: merge extents
    pnfsblock: lseg alloc and free
    pnfsblock: remove device operations
    pnfsblock: add device operations
    pnfsblock: basic extent code
    pnfsblock: use pageio_ops api
    pnfsblock: add blocklayout Kconfig option, Makefile, and stubs
    pnfs: cleanup_layoutcommit
    pnfs: ask for layout_blksize and save it in nfs_server
    ...

    Linus Torvalds
     
  • Call GETDEVICELIST during mount, then call and parse GETDEVICEINFO
    for each device returned.

    [pnfsblock: get rid of deprecated xdr macros]
    Signed-off-by: Jim Rees
    [pnfsblock: fix pnfs_deviceid references]
    Signed-off-by: Fred Isaman
    [pnfsblock: fix print format warnings for sector_t and size_t]
    [pnfs-block: #include ]
    [pnfsblock: no PNFS_NFS_SERVER]
    Signed-off-by: Benny Halevy
    [pnfsblock: fix bug determining size of striped volume]
    [pnfsblock: fix oops when using multiple devices]
    Signed-off-by: Fred Isaman
    Signed-off-by: Benny Halevy
    [pnfsblock: get rid of vmap and deviceid->area structure]
    Signed-off-by: Peng Tao
    Signed-off-by: Jim Rees
    Signed-off-by: Trond Myklebust

    Fred Isaman
     
  • Block layout needs it to determine IO size.

    Signed-off-by: Fred Isaman
    Signed-off-by: Tao Guo
    Signed-off-by: Benny Halevy
    Signed-off-by: Jim Rees
    Signed-off-by: Trond Myklebust

    Fred Isaman
     

28 Jul, 2011

1 commit

  • * 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits)
    NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation()
    nfs: don't use d_move in nfs_async_rename_done
    RDMA: Increasing RPCRDMA_MAX_DATA_SEGS
    SUNRPC: Replace xprt->resend and xprt->sending with a priority queue
    SUNRPC: Allow caller of rpc_sleep_on() to select priority levels
    SUNRPC: Support dynamic slot allocation for TCP connections
    SUNRPC: Clean up the slot table allocation
    SUNRPC: Initalise the struct xprt upon allocation
    SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot
    pnfs: simplify pnfs files module autoloading
    nfs: document nfsv4 sillyrename issues
    NFS: Convert nfs4_set_ds_client to EXPORT_SYMBOL_GPL
    SUNRPC: Convert the backchannel exports to EXPORT_SYMBOL_GPL
    SUNRPC: sunrpc should not explicitly depend on NFS config options
    NFS: Clean up - simplify the switch to read/write-through-MDS
    NFS: Move the pnfs write code into pnfs.c
    NFS: Move the pnfs read code into pnfs.c
    NFS: Allow the nfs_pageio_descriptor to signal that a re-coalesce is needed
    NFS: Use the nfs_pageio_descriptor->pg_bsize in the read/write request
    NFS: Cache rpc_ops in struct nfs_pageio_descriptor
    ...

    Linus Torvalds
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

13 Jul, 2011

2 commits


25 Apr, 2011

1 commit

  • If a server for some reason keeps sending NFS4ERR_DELAY errors, we can end
    up looping forever inside nfs4_proc_create_session, and so the usual
    mechanisms for detecting if the nfs_client is dead don't work.

    Fix this by ensuring that we loop inside the nfs4_state_manager thread
    instead.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

12 Mar, 2011

5 commits


11 Mar, 2011

1 commit


07 Jan, 2011

5 commits

  • Delegations are per-inode, not per-nfs_client. When a server file
    system is migrated, delegations on the client must be moved from the
    source to the destination nfs_server. Make it easier to manage a
    mount point's delegation list across a migration event by moving the
    list to the nfs_server struct.

    Clean up: I added documenting comments to public functions I changed
    in this patch. For consistency I added comments to all the other
    public functions in fs/nfs/delegation.c.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • NFSv4 migration needs to reassociate state owners from the source to
    the destination nfs_server data structures. To make that easier, move
    the cl_state_owners field to the nfs_server struct. cl_openowner_id
    and cl_lockowner_id accompany this move, as they are used in
    conjunction with cl_state_owners.

    The cl_lock field in the parent nfs_client continues to protect all
    three of these fields.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • A layout can request return-on-close. How this interacts with the
    forgetful model of never sending LAYOUTRETURNS is a bit ambiguous.
    We forget any layouts marked roc, and wait for them to be completely
    forgotten before continuing with the close. In addition, to compensate
    for races with any inflight LAYOUTGETs, and the fact that we do not get
    any layout stateid back from the server, we set the barrier to the worst
    case scenario of current_seqid + number of outstanding LAYOUTGETs.
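
    The barrier arithmetic can be sketched as follows. The struct and
    function names are hypothetical, and the rule that a reply must be
    strictly newer than the barrier is an assumption about how such a
    barrier would be consulted; the commit message itself only specifies
    how the barrier value is computed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-inode layout state for the sketch. */
struct layout_hdr {
    uint32_t current_seqid;    /* seqid of the current layout stateid */
    uint32_t outstanding_gets; /* LAYOUTGETs currently in flight */
};

/* Worst case: every in-flight LAYOUTGET bumps the seqid once. */
static uint32_t roc_barrier(const struct layout_hdr *lo)
{
    return lo->current_seqid + lo->outstanding_gets;
}

/* Wrap-tolerant "a is newer than b" test on 32-bit sequence ids. */
static int seqid_is_newer(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Assumed policy: only replies beyond the barrier may be used. */
static int reply_passes_barrier(uint32_t barrier, uint32_t reply_seqid)
{
    assert(barrier != 0);      /* 0 is treated as "no barrier set" here */
    return seqid_is_newer(reply_seqid, barrier);
}
```

    With current_seqid 5 and three LAYOUTGETs outstanding, the barrier is
    8, so any reply carrying a seqid of 8 or lower is treated as belonging
    to the forgotten layout.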

    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Fred Isaman
     
  • Currently session draining only drains the fore channel.
    The back channel processing must also be drained.

    Use the back channel highest_slot_used to indicate that a callback is being
    processed by the callback thread. Move the session complete to be per channel.

    When the session is draining, wait for any current back channel processing
    to complete and stop all new back channel processing by returning NFS4ERR_DELAY
    to the back channel client.

    Drain the back channel, then the fore channel.
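
    A single-threaded sketch of the draining rule, under stated
    assumptions: the struct and function names are invented for the
    example, SLOT_NONE is a hypothetical "idle" marker for
    highest_slot_used, and the waiting that the real code does with a
    completion is reduced to a predicate.

```c
#include <assert.h>

#define NFS4ERR_DELAY 10008 /* NFSv4 status: retry the request later */
#define SLOT_NONE (-1)      /* hypothetical marker: no slot in use */

struct channel {
    int highest_slot_used;  /* SLOT_NONE when the channel is idle */
};

struct session {
    int draining;
    struct channel fc;      /* fore channel */
    struct channel bc;      /* back channel */
};

/* Start processing a callback, or refuse it while the session drains. */
static int bc_callback_begin(struct session *s, int slot)
{
    assert(slot >= 0);
    if (s->draining)
        return NFS4ERR_DELAY;
    s->bc.highest_slot_used = slot;
    return 0;
}

/* Callback finished: the back channel is idle again. */
static void bc_callback_end(struct session *s)
{
    s->bc.highest_slot_used = SLOT_NONE;
}

static int channel_drained(const struct channel *c)
{
    return c->highest_slot_used == SLOT_NONE;
}

/* Checked in the order the commit describes: back channel, then fore. */
static int session_drained(const struct session *s)
{
    return channel_drained(&s->bc) && channel_drained(&s->fc);
}
```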

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Use the small id to pointer translator service to provide a unique callback
    identifier per SETCLIENTID call used to identify the v4.0 callback service
    associated with the clientid.
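
    The idea of a small id to pointer translator can be illustrated with
    a fixed-size table. This is only a conceptual sketch: it does not use
    the kernel's idr API, and the cb_table type, its capacity, and the
    cb_ident_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CB_TABLE_SIZE 16 /* arbitrary capacity for the sketch */

struct cb_table {
    void *slot[CB_TABLE_SIZE]; /* index is the callback identifier */
};

/* Register a client and return the lowest free id, or -1 if full. */
static int cb_ident_alloc(struct cb_table *t, void *client)
{
    assert(client != NULL);
    for (int id = 0; id < CB_TABLE_SIZE; id++) {
        if (t->slot[id] == NULL) {
            t->slot[id] = client;
            return id;
        }
    }
    return -1;
}

/* Translate an id from an incoming callback back to its client. */
static void *cb_ident_find(const struct cb_table *t, int id)
{
    if (id < 0 || id >= CB_TABLE_SIZE)
        return NULL;
    return t->slot[id];
}

/* Release an id so that a later registration can reuse it. */
static void cb_ident_remove(struct cb_table *t, int id)
{
    if (id >= 0 && id < CB_TABLE_SIZE)
        t->slot[id] = NULL;
}
```

    Each SETCLIENTID would carry the id returned by the allocation step,
    and the callback service would use the lookup step to find the
    matching client.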

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     

25 Oct, 2010

4 commits

  • Add the ability to actually send LAYOUTGET and GETDEVICEINFO. This also adds
    in the machinery to handle layout state and the deviceid cache. Note that
    GETDEVICEINFO is not called directly by the generic layer. Instead it
    is called by the drivers while parsing the LAYOUTGET opaque data in response
    to an unknown device id embedded therein. RFC 5661 only encodes
    device ids within the driver-specific opaque data.

    Signed-off-by: Andy Adamson
    Signed-off-by: Dean Hildebrand
    Signed-off-by: Marc Eshel
    Signed-off-by: Mike Sager
    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Tao Guo
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • In particular, server reboot will invalidate all layouts.

    Note that in order to have an active layout, we must get a successful response
    from the server. To avoid adding that machinery, this patch just includes a
    stub that fakes up a successful return. Since the layout is never referenced
    for io, this is not a problem.

    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    Signed-off-by: Dean Hildebrand
    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Put in the infrastructure that uses information returned from the
    server at mount to select a layout driver module.

    In this patch, a stub is used that always returns "no driver found".

    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Dean Hildebrand
    Signed-off-by: Marc Eshel
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Ricardo Labiaga
     
  • Instead of blindly zapping the caches, attempt to revalidate them if
    the server has indicated that it uses high resolution timestamps.

    NFSv4 should be able to always revalidate the cache since the
    protocol requires the update of the change attribute on modification of
    the data. In reality, there are servers (the Linux NFS server
    for example) that do not obey this requirement and use ctime as the
    basis for the change attribute. Long term, the server needs to be fixed.
    At this time, and to be on the safe side, continue zapping caches if
    the server indicates that it does not have a high resolution timestamp.

    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Trond Myklebust

    Ricardo Labiaga
     

23 Jun, 2010

1 commit


15 May, 2010

1 commit


12 Apr, 2010

1 commit

  • Arnaud Giersch reports that NFSv4 locking is broken when we hold a
    delegation since commit 8e469ebd6dc32cbaf620e134d79f740bf0ebab79 (NFSv4:
    Don't allow posix locking against servers that don't support it).

    According to Arnaud, the lock succeeds the first time he opens the file
    (since we cannot do a delegated open) but then fails after we start using
    delegated opens.

    The following patch fixes it by ensuring that locking behaviour is
    governed by a per-filesystem capability flag that is initially set, but
    gets cleared if the server ever returns an OPEN without the
    NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set.

    Reported-by: Arnaud Giersch
    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    Trond Myklebust
     

06 Mar, 2010

1 commit


17 Feb, 2010

1 commit

  • Add __percpu sparse annotations to fs.

    These annotations are to make sparse consider percpu variables to be
    in a different address space and warn if accessed without going
    through percpu accessors. This patch doesn't affect normal builds.

    Signed-off-by: Tejun Heo
    Cc: "Theodore Ts'o"
    Cc: Trond Myklebust
    Cc: Alex Elder
    Cc: Christoph Hellwig
    Cc: Alexander Viro

    Tejun Heo
     

10 Feb, 2010

1 commit


05 Dec, 2009

1 commit

  • If the session is reset during state recovery, the state manager thread can
    sleep on the slot_tbl_waitq causing a deadlock.

    Add a completion framework to the session. Have the state manager thread set
    a new session state (NFS4CLNT_SESSION_DRAINING) and wait for the session slot
    table to drain.

    Signal the state manager thread in nfs41_sequence_free_slot when the
    NFS4CLNT_SESSION_DRAINING bit is set and the session is drained.

    Reported-by: Trond Myklebust
    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     

10 Aug, 2009

1 commit

  • If the NFSv4 server doesn't support a POSIX attribute, the generic NFS code
    needs to know that, so that it doesn't keep trying to poll for it.

    However, by the same token, if the NFSv4 server does support that
    attribute, then we should ensure that the inode metadata is appropriately
    labelled as being untrusted. For instance, if we don't know the correct
    value of the file's uid, we should certainly not be caching ACLs or ACCESS
    results.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

18 Jun, 2009

6 commits

  • Defines a new 'struct nfs4_slot_table' in the 'struct nfs4_session'
    for use by the backchannel. Initializes, resets, and destroys the backchannel
    slot table in the same manner the forechannel slot table is initialized,
    reset, and destroyed.

    The sequenceid for each slot in the backchannel slot table is initialized
    to 0, whereas the forechannel slotid's sequenceid is set to 1.
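
    The differing initial sequence ids can be shown with one shared
    initializer that takes the starting value as a parameter. This is a
    sketch under assumptions: the struct layouts, MAX_SLOTS, and the
    function names are hypothetical simplifications.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SLOTS 16 /* arbitrary capacity for the sketch */

struct slot {
    uint32_t seq_nr; /* sequence id expected for the next request */
};

struct slot_table {
    struct slot slots[MAX_SLOTS];
    int max_slots;
};

/* Initialize every slot's sequence id to ivalue. */
static void slot_table_init(struct slot_table *tbl, int max_slots, uint32_t ivalue)
{
    assert(max_slots > 0 && max_slots <= MAX_SLOTS);
    tbl->max_slots = max_slots;
    for (int i = 0; i < max_slots; i++)
        tbl->slots[i].seq_nr = ivalue;
}

/* Fore channel slots start at 1, back channel slots at 0. */
static void session_init_tables(struct slot_table *fc, struct slot_table *bc, int n)
{
    slot_table_init(fc, n, 1);
    slot_table_init(bc, n, 0);
}
```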

    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Benny Halevy

    Ricardo Labiaga
     
  • At mount, nfs_alloc_client sets the cl_state NFS4CLNT_LEASE_EXPIRED bit
    and nfs4_alloc_session sets the NFS4CLNT_SESSION_SETUP bit, so both bits are
    set when nfs4_lookup_root calls nfs4_recover_expired_lease which schedules
    the nfs4_state_manager and waits for it to complete.

    Place the session setup after the clientid establishment in nfs4_state_manager
    so that the session is set up right after the clientid has been established
    without rescheduling the state manager.

    Unlike NFSv4.0, the nfs_client struct is not ready to use until the session
    has been established. Postpone marking the nfs_client struct to NFS_CS_READY
    until after a successful CREATE_SESSION call so that other threads cannot use
    the client until the session is established.

    If the EXCHANGE_ID call fails and the session has not been set up (the
    NFS4CLNT_SESSION_SETUP bit is set), mark the client with the error and return.

    If the session setup CREATE_SESSION call fails with NFS4ERR_STALE_CLIENTID
    which could occur due to server reboot or network partition between the
    EXCHANGE_ID and CREATE_SESSION call, reset the NFS4CLNT_LEASE_EXPIRED and
    NFS4CLNT_SESSION_SETUP bits and try again.

    If the CREATE_SESSION call fails with other errors, mark the client with
    the error and return.

    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy

    [nfs41: NFS_CS_SESSION_SETUP cl_cons_state for back channel setup]
    On session setup, the CREATE_SESSION reply races with the server back channel
    probe which needs to succeed to set up the back channel. Set a new
    cl_cons_state NFS_CS_SESSION_SETUP just prior to the CREATE_SESSION call
    and add it as a valid state to nfs_find_client so that the client back channel
    can find the nfs_client struct and won't drop the server backchannel probe.
    Use a new cl_cons_state so that NFSv4.0 back channel behaviour which only
    sets NFS_CS_READY is unchanged.
    Adjust waiting on the nfs_client_active_wq accordingly.
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy

    [nfs41: rename NFS_CS_SESSION_SETUP to NFS_CS_SESSION_INITING]
    Signed-off-by: Andy Adamson
    [nfs41: set NFS_CL_SESSION_INITING in alloc_session]
    Signed-off-by: Andy Adamson
    [nfs41: move session setup into a function]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [moved nfs4_proc_create_session declaration here]
    Signed-off-by: Benny Halevy
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Implement the exchange_id operation conforming to
    http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26

    Unlike NFSv4.0, NFSv4.1 requires machine credentials. RPC_AUTH_GSS machine
    credentials will be passed into the kernel at mount time to be available for
    the exchange_id operation.

    RPC_AUTH_UNIX root mounts can use the UNIX root credential. Store the root
    credential in the nfs_client struct.

    Without a credential, NFSv4.1 state renewal fails.

    [nfs41: establish clientid via exchange id only if cred != NULL]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [nfsd41: move nfstime4 from under CONFIG_NFS_V4_1]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [nfs41: do not wait a lease time in exchange id]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [nfs41: pass *session in seq_args and seq_res]
    Signed-off-by: Benny Halevy
    Signed-off-by: Trond Myklebust
    [nfs41: Ignoring impid in decode_exchange_id is missing a READ_BUF]
    Signed-off-by: Benny Halevy
    [nfs41: fix Xcode_exchange_id's xdr Xcoding pointer type]
    [nfs41: get rid of unused struct nfs41_exchange_id_res members]
    Signed-off-by: Benny Halevy

    Benny Halevy
     
  • Use nfs4_call_sync rather than rpc_call_sync to provide
    for an nfs41 sessions-enabled interface for sessions manipulation.

    The nfs41 rpc logic uses the rpc_call_prepare method to
    recover and create the session, as well as selecting a free slot id
    and the rpc_call_done to free the slot and update slot table
    related metadata.

    In the coming patches we'll add rpc prepare and done routines
    for setting up the sequence op and processing the sequence result.

    Signed-off-by: Benny Halevy
    [nfs41: nfs4_call_sync]
    As per 11-14-08 review.
    Squash into "nfs41: introduce nfs4_call_sync" and "nfs41: nfs4_setup_sequence"
    Define two functions one for v4 and one for v41
    add a pointer to struct nfs4_client to the correct one.
    Signed-off-by: Andy Adamson
    [added BUG() in _nfs4_call_sync_session if !CONFIG_NFS_V4_1]
    Signed-off-by: Benny Halevy
    [nfs41: check for session not minorversion]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [group minorversion specific stuff together]
    Signed-off-by: Alexandros Batsakis
    Signed-off-by: Benny Halevy
    Signed-off-by: Andy Adamson
    [nfs41: fixup nfs4_clear_client_minor_version]
    [introduce nfs4_init_client_minor_version() in this patch]
    Signed-off-by: Benny Halevy
    [cleaned-up patch: got rid of nfs_call_sync_t, dprintks, cosmetics, extra server defs]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • NFSv4.1 Sessions basic data types, initialization, and destruction.

    The session is always associated with a struct nfs_client that holds
    the exchange_id results.

    Signed-off-by: Rahul Iyer
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [remove extraneous rpc_clnt pointer, use the struct nfs_client cl_rpcclient.
    remove the rpc_clnt parameter from nfs4_init_session]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [Use the presence of a session to determine behaviour instead of the
    minorversion number.]
    Signed-off-by: Andy Adamson
    [constified nfs4_has_session's struct nfs_client parameter]
    Signed-off-by: Benny Halevy
    [Rename nfs4_put_session() to nfs4_destroy_session() and call it from nfs4_free_client() not nfs4_free_server().
    Also get rid of nfs4_get_session() and the ref_count in nfs4_session struct as keeping track of nfs_client should be sufficient]
    Signed-off-by: Alexandros Batsakis
    [nfs41: pass rsize and wsize into nfs4_init_session]
    Signed-off-by: Andy Adamson
    [separated out removal of rpc_clnt parameter from nfs4_init_session to a
    patch of its own]
    Signed-off-by: Benny Halevy
    [Pass the nfs_client pointer into nfs4_alloc_session]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [nfs41: don't assign to session->clp->cl_session in nfs4_destroy_session]
    [nfs41: fixup nfs4_clear_client_minor_version]
    [introduce nfs4_clear_client_minor_version() in this patch]
    Signed-off-by: Benny Halevy
    [Refactor nfs4_init_session]
    Moved session allocation into nfs4_init_client_minor_version, called from
    nfs4_init_client.
    Leave rsize and wsize initialization in nfs4_init_session, called from
    nfs4_init_server.
    Reverted moving of nfs_fsid definition to nfs_fs_sb.h
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [nfs41: Move NFS4_MAX_SLOT_TABLE define from under CONFIG_NFS_V4_1]
    [Fix compile error when CONFIG_NFS_V4_1 is not set.]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [moved nfs4_init_slot_table definition to "create_session operation"]
    Signed-off-by: Benny Halevy
    [nfs41: alloc session with GFP_KERNEL]
    Signed-off-by: Benny Halevy
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • This field is set to the nfsv4 minor version for this mount.

    Signed-off-by: Benny Halevy

    Note: This patch sets the referral to the same minorversion as the
    current mount. Revisit in future patch.

    Signed-off-by: Andy Adamson
    [removed cl_minorversion assignment in nfs_set_client]
    Signed-off-by: Benny Halevy
    [always define nfs_client.cl_minorversion]
    Signed-off-by: Trond Myklebust

    Benny Halevy
     

03 Apr, 2009

1 commit

  • Define and create superblock-level cache index objects (as managed by
    nfs_server structs).

    Each superblock object is created in a server level index object and is itself
    an index into which inode-level objects are inserted.

    Ideally there would be one superblock-level object per server, and the former
    would be folded into the latter; however, since the "nosharecache" option
    exists this isn't possible.

    The superblock object key is a sequence consisting of:

    (1) Certain superblock s_flags.

    (2) Various connection parameters that serve to distinguish superblocks for
    sget().

    (3) The volume FSID.

    (4) The security flavour.

    (5) The uniquifier length.

    (6) The uniquifier text. This is normally an empty string, unless the fsc=xyz
    mount option was used to explicitly specify a uniquifier.

    The key blob is of variable length, depending on the length of (6).
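
    One way to picture such a variable-length key is a fixed header
    followed by the uniquifier bytes. The sketch below is an assumption
    for illustration only: the field names, their widths, and the
    collapsing of the connection parameters into a single word do not
    come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key layout mirroring items (1) to (6). */
struct sb_key {
    uint32_t s_flags;     /* (1) selected superblock flags */
    uint32_t conn_params; /* (2) connection parameters, collapsed to one word */
    uint64_t fsid_major;  /* (3) volume FSID */
    uint64_t fsid_minor;
    uint32_t sec_flavor;  /* (4) security flavour */
    uint16_t uniq_len;    /* (5) uniquifier length */
    char uniquifier[];    /* (6) uniquifier text, not NUL-terminated */
};

/* Build a key blob; *key_len receives the number of significant bytes. */
static struct sb_key *sb_key_build(uint32_t s_flags, uint32_t conn_params,
                                   uint64_t fsid_major, uint64_t fsid_minor,
                                   uint32_t sec_flavor, const char *uniq,
                                   size_t *key_len)
{
    size_t ulen = uniq ? strlen(uniq) : 0;
    struct sb_key *k;

    assert(ulen <= UINT16_MAX);
    k = calloc(1, sizeof(*k) + ulen); /* zeroed, so any padding compares equal */
    if (!k)
        return NULL;
    k->s_flags = s_flags;
    k->conn_params = conn_params;
    k->fsid_major = fsid_major;
    k->fsid_minor = fsid_minor;
    k->sec_flavor = sec_flavor;
    k->uniq_len = (uint16_t)ulen;
    if (ulen)
        memcpy(k->uniquifier, uniq, ulen);
    *key_len = offsetof(struct sb_key, uniquifier) + ulen;
    return k;
}
```

    Two superblocks that differ only in their uniquifier then produce
    keys of different length or content, so a byte-wise comparison keeps
    them apart in the cache.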

    The superblock object is given no coherency data to carry in the auxiliary data
    permitted by the cache. It is assumed that the superblock is always coherent.

    This patch also adds uniquification handling such that two otherwise identical
    superblocks, at least one of which is marked "nosharecache", won't end up
    trying to share the on-disk cache. It will be possible to manually provide a
    uniquifier through a mount option with a later patch to avoid the error
    otherwise produced.

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells