18 Feb, 2015

1 commit

  • Merge cleanups requested by Linus.

    * cleanups: (3 commits)
    pnfs: Refactor the *_layout_mark_request_commit to use pnfs_layout_mark_request_commit
    nfs: Can call nfs_clear_page_commit() instead
    nfs: Provide and use helper functions for marking a page as unstable

    Trond Myklebust
     

13 Feb, 2015

1 commit

  • Pull nfsd updates from Bruce Fields:
    "The main change is the pNFS block server support from Christoph, which
    allows an NFS client connected to shared disk to do block IO to the
    shared disk in place of NFS reads and writes. This also requires xfs
    patches, which should arrive soon through the xfs tree, barring
    unexpected problems. Support for other filesystems is also possible
    if there's interest.

    Thanks also to Chuck Lever for continuing work to get NFS/RDMA into
    shape"

    * 'for-3.20' of git://linux-nfs.org/~bfields/linux: (32 commits)
    nfsd: default NFSv4.2 to on
    nfsd: pNFS block layout driver
    exportfs: add methods for block layout exports
    nfsd: add trace events
    nfsd: update documentation for pNFS support
    nfsd: implement pNFS layout recalls
    nfsd: implement pNFS operations
    nfsd: make find_any_file available outside nfs4state.c
    nfsd: make find/get/put file available outside nfs4state.c
    nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c
    nfsd: add fh_fsid_match helper
    nfsd: move nfsd_fh_match to nfsfh.h
    fs: add FL_LAYOUT lease type
    fs: track fl_owner for leases
    nfs: add LAYOUT_TYPE_MAX enum value
    nfsd: factor out a helper to decode nfstime4 values
    sunrpc/lockd: fix references to the BKL
    nfsd: fix year-2038 nfs4 state problem
    svcrdma: Handle additional inline content
    svcrdma: Move read list XDR round-up logic
    ...

    Linus Torvalds
     

12 Feb, 2015

1 commit


10 Feb, 2015

1 commit


09 Feb, 2015

3 commits


04 Feb, 2015

4 commits

  • Fix an Oopsable condition when nsm_mon_unmon is called as part of the
    namespace cleanup, which now apparently happens after the utsname
    has been freed.

    Link: http://lkml.kernel.org/r/20150125220604.090121ae@neptune.home
    Reported-by: Bruno Prémont
    Cc: stable@vger.kernel.org # 3.18
    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • * flexfiles: (53 commits)
    pnfs: lookup new lseg at lseg boundary
    nfs41: .init_read and .init_write can be called with valid pg_lseg
    pnfs: Update documentation on the Layout Drivers
    pnfs/flexfiles: Add the FlexFile Layout Driver
    nfs: count DIO good bytes correctly with mirroring
    nfs41: wait for LAYOUTRETURN before retrying LAYOUTGET
    nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writes
    nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flags
    nfs/flexfiles: send layoutreturn before freeing lseg
    nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSE
    nfs41: allow async version layoutreturn
    nfs41: add range to layoutreturn args
    pnfs: allow LD to ask to resend read through pnfs
    nfs: add nfs_pgio_current_mirror helper
    nfs: only reset desc->pg_mirror_idx when mirroring is supported
    nfs41: add a debug warning if we destroy an unempty layout
    pnfs: fail comparison when bucket verifier not set
    nfs: mirroring support for direct io
    nfs: add mirroring support to pgio layer
    pnfs: pass ds_commit_idx through the commit path
    ...

    Conflicts:
    fs/nfs/pnfs.c
    fs/nfs/pnfs.h

    Trond Myklebust
     
  • The flexfile layout is a new layout that extends the
    file layout. It is currently being drafted as a specification at
    https://datatracker.ietf.org/doc/draft-ietf-nfsv4-layout-types/

    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Tom Haynes
    Signed-off-by: Tao Peng

    Tom Haynes
     
  • Add a call to tally stats for a task under a different statsidx than
    what's contained in the task structure.

    This is needed to properly account for pnfs reads/writes when the
    DS nfs version != the MDS version.

    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Tom Haynes

    Weston Andros Adamson
     

30 Jan, 2015

2 commits


23 Jan, 2015

1 commit


16 Jan, 2015

3 commits

  • Currently the Linux server cannot decode RDMA_NOMSG type requests.
    Operations whose length exceeds the fixed size of RDMA SEND buffers,
    like large NFSv4 CREATE(NF4LNK) operations, must be conveyed via
    RDMA_NOMSG.

    For an RDMA_MSG type request, the client sends the RPC/RDMA, RPC
    headers, and some or all of the NFS arguments via RDMA SEND.

    For an RDMA_NOMSG type request, the client sends just the RPC/RDMA
    header via RDMA SEND. The request's read list contains elements for
    the entire RPC message, including the RPC header.

    NFSD expects the RPC/RDMA header and RPC header to be contiguous in
    page zero of the XDR buffer. Add logic in the RDMA READ path to make
    the read list contents land where the server prefers, when the
    incoming message is of type RDMA_NOMSG.

    Signed-off-by: Chuck Lever
    Reviewed-by: Steve Wise
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • The RDMA reader function doesn't change once an svcxprt_rdma is
    instantiated. Instead of checking sc_devcap during every incoming
    RPC, set the reader function once when the connection is accepted.

    Signed-off-by: Chuck Lever
    Reviewed-by: Steve Wise
    Signed-off-by: J. Bruce Fields

    Chuck Lever
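
    The change described above, choosing the reader once at accept time
    instead of testing a capability flag on every request, can be sketched
    in plain C. This is only an illustration: `struct conn`, `DEVCAP_FAST_REG`,
    `conn_accept`, `conn_recv` and both reader functions are hypothetical
    names, not the svcrdma code.

```c
#include <stddef.h>

/* Hypothetical capability bit, standing in for a flag in sc_devcap. */
#define DEVCAP_FAST_REG 0x1u

struct conn;
typedef int (*reader_fn)(struct conn *c, size_t len);

/* Hypothetical stand-in for the transport structure. */
struct conn {
    unsigned int devcap;  /* device capabilities, fixed for the connection */
    reader_fn reader;     /* chosen once, then reused for every request */
};

/* Two illustrative readers; each returns a tag so callers can tell them apart. */
static int read_fast_reg(struct conn *c, size_t len) { (void)c; (void)len; return 1; }
static int read_plain(struct conn *c, size_t len)    { (void)c; (void)len; return 0; }

/* Accept path: inspect the capabilities once and cache the decision. */
static void conn_accept(struct conn *c, unsigned int devcap)
{
    c->devcap = devcap;
    c->reader = (devcap & DEVCAP_FAST_REG) ? read_fast_reg : read_plain;
}

/* Per-request path: no capability check, just an indirect call. */
static int conn_recv(struct conn *c, size_t len)
{
    return c->reader(c, len);
}
```

    A caller would run `conn_accept()` once per connection and then call
    `conn_recv()` for each incoming request.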
     
  • The byte_count argument is not used, and the function is called
    only from one place.

    Signed-off-by: Chuck Lever
    Reviewed-by: Steve Wise
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

10 Dec, 2014

11 commits

  • Testing has shown that the pool->sp_lock can be a bottleneck on a busy
    server. Every time data is received on a socket, the server must take
    that lock in order to dequeue a thread from the sp_threads list.

    Address this problem by eliminating the sp_threads list (which contains
    threads that are currently idle) and replacing it with a RQ_BUSY flag in
    svc_rqst. This allows us to walk the sp_all_threads list under the
    rcu_read_lock and find a suitable thread for the xprt by doing a
    test_and_set_bit.

    Note that we do still have a potential atomicity problem however with
    this approach. We don't want svc_xprt_do_enqueue to set the
    rqst->rq_xprt pointer unless a test_and_set_bit of RQ_BUSY returned
    zero (which indicates that the thread was idle). But, by the time we
    check that, the bit could be flipped by a waking thread.

    To address this, we acquire a new per-rqst spinlock (rq_lock) and take
    that before doing the test_and_set_bit. If that returns false, then we
    can set rq_xprt and drop the spinlock. Then, when the thread wakes up,
    it must set the bit under the same spinlock and can trust that if it was
    already set then the rq_xprt is also properly set.

    With this scheme, the case where we have an idle thread no longer needs
    to take the highly contended pool->sp_lock at all, and that removes the
    bottleneck.

    That still leaves one issue: What of the case where we walk the whole
    sp_all_threads list and don't find an idle thread? Because the search is
    lockless, it's possible for the queueing to race with a thread that is
    going to sleep. To address that, we queue the xprt and then search again.

    If we find an idle thread at that point, we can't attach the xprt to it
    directly since that might race with a different thread waking up and
    finding it. All we can do is wake the idle thread back up and let it
    attempt to find the now-queued xprt.

    Signed-off-by: Jeff Layton
    Tested-by: Chris Worley
    Signed-off-by: J. Bruce Fields

    Jeff Layton
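
    The claim-an-idle-thread scheme in this message can be approximated in
    userspace with C11 atomics. The sketch below is an assumption-laden
    illustration, not the sunrpc code: `struct worker`, `assign_xprt` and
    `worker_wake` are hypothetical names, a tiny spinlock stands in for
    rq_lock, and a plain array stands in for the RCU-protected
    sp_all_threads list.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for svc_rqst; not the kernel structure. */
struct worker {
    atomic_bool lock;  /* plays the role of the per-rqst rq_lock */
    atomic_bool busy;  /* plays the role of the RQ_BUSY bit */
    void *xprt;        /* plays the role of rq_xprt */
};

static void worker_init(struct worker *w)
{
    atomic_init(&w->lock, false);
    atomic_init(&w->busy, false);
    w->xprt = NULL;
}

static void worker_lock(struct worker *w)
{
    /* Spin until this caller is the one that flips lock from false to true. */
    while (atomic_exchange_explicit(&w->lock, true, memory_order_acquire))
        ;
}

static void worker_unlock(struct worker *w)
{
    atomic_store_explicit(&w->lock, false, memory_order_release);
}

/*
 * Enqueue side: walk the pool and test-and-set the busy flag under the
 * per-worker lock. Only if the flag was clear (the worker was idle) is the
 * xprt pointer set. Returns the claimed worker's index, or -1 if none is idle.
 */
static int assign_xprt(struct worker *pool, size_t n, void *xprt)
{
    for (size_t i = 0; i < n; i++) {
        struct worker *w = &pool[i];
        bool was_busy;

        worker_lock(w);
        was_busy = atomic_exchange(&w->busy, true);
        if (!was_busy)
            w->xprt = xprt;
        worker_unlock(w);
        if (!was_busy)
            return (int)i;
    }
    return -1;
}

/*
 * Wakeup side: the worker sets the busy flag under the same lock. If the
 * flag was already set, an enqueuer claimed it and xprt is valid; otherwise
 * it woke on its own and must look for queued work itself (NULL here).
 */
static void *worker_wake(struct worker *w)
{
    void *xprt = NULL;

    worker_lock(w);
    if (atomic_exchange(&w->busy, true))
        xprt = w->xprt;
    worker_unlock(w);
    return xprt;
}
```

    The fallback for the case where no idle worker is found (queue the xprt,
    search again, and wake an idle thread) is left out of this sketch.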
     
  • In a later patch, we'll be removing some spinlocking around the socket
    and thread queueing code in order to fix some contention problems. At
    that point, the stats counters will no longer be protected by the
    sp_lock.

    Change the counters to atomic_long_t fields, except for the
    "sockets_queued" counter which will still be manipulated under a
    spinlock.

    Signed-off-by: Jeff Layton
    Tested-by: Chris Worley
    Signed-off-by: J. Bruce Fields

    Jeff Layton
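
    As a rough userspace analogue of this conversion, the counters that lose
    their lock become atomics while the one still updated under a spinlock
    stays a plain integer. The field and function names below are
    illustrative assumptions, and C11 `atomic_long` stands in for the
    kernel's atomic_long_t.

```c
#include <stdatomic.h>

/* Hypothetical per-pool statistics block. */
struct pool_stats {
    atomic_long packets;           /* updated without any lock held */
    atomic_long threads_woken;     /* updated without any lock held */
    unsigned long sockets_queued;  /* caller must still hold the pool lock */
};

static void stats_init(struct pool_stats *s)
{
    atomic_init(&s->packets, 0);
    atomic_init(&s->threads_woken, 0);
    s->sockets_queued = 0;
}

/* Lock-free increments; relaxed ordering is enough for pure counters. */
static void stats_note_packet(struct pool_stats *s)
{
    atomic_fetch_add_explicit(&s->packets, 1, memory_order_relaxed);
}

static void stats_note_wakeup(struct pool_stats *s)
{
    atomic_fetch_add_explicit(&s->threads_woken, 1, memory_order_relaxed);
}

/* Snapshot for reporting; each value is read atomically on its own. */
static long stats_packets(struct pool_stats *s)
{
    return atomic_load_explicit(&s->packets, memory_order_relaxed);
}
```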
     
  • ...also make the manipulation of sp_all_threads list use RCU-friendly
    functions.

    Signed-off-by: Jeff Layton
    Tested-by: Chris Worley
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • In a later patch, we'll want to be able to handle this flag without
    holding the sp_lock. Change this field to an unsigned long flags
    field, and declare a new flag in it that can be managed with atomic
    bitops.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • There are a couple of holes in the svc_rqst field on x86_64. Move the
    rq_cachetype to a different location to eliminate both of them.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • In a later patch, we're going to need some atomic bit flags. Since that
    field will need to be an unsigned long, we mitigate that space
    consumption by migrating some other bitflags to the new field. Start
    with the rq_secure flag.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Mainly what I need is 860a0d9e511f "sunrpc: add some tracepoints in
    svc_rqst handling functions", which subsequent server rpc patches from
    jlayton depend on. I'm merging this later tag on the assumption that's
    more likely to be a tested and stable point.

    J. Bruce Fields
     

02 Dec, 2014

1 commit

  • All it does is indicate whether a xprt has already been deleted from
    a list or not, which is unnecessary since we use list_del_init and it's
    always set and checked under the sv_lock anyway.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     

28 Nov, 2014

2 commits

  • Add a new directory hierarchy under the debugfs sunrpc/ directory:

    sunrpc/
        rpc_xprt/
            /

    Within that directory, we can put files that give info about the
    xprts. We do have the (minor) problem that there is no succinct,
    unique identifier for rpc_xprts. So we generate them synthetically
    with a static atomic_t counter.

    For now, this directory just holds an "info" file, but we may add
    other files to it in the future.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
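
    Generating a synthetic identifier from a static atomic counter, as this
    message describes, might look like the following userspace sketch.
    `xprt_dir_name` is a hypothetical helper, C11 `atomic_uint` stands in
    for the kernel's atomic_t, and the hexadecimal name format is an
    assumption made for illustration.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Process-wide counter; every call hands out the next unused value. */
static atomic_uint next_xprt_id = 0;

/*
 * Write a unique directory name for a new transport into buf.
 * Returns what snprintf() returns: the length the name needs.
 */
static int xprt_dir_name(char *buf, size_t len)
{
    /* fetch_add returns the old value, so concurrent callers never collide. */
    unsigned int id = atomic_fetch_add(&next_xprt_id, 1);

    return snprintf(buf, len, "%x", id);
}
```

    Identifiers produced this way are unique for the lifetime of the counter
    but carry no meaning beyond creation order.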
     
  • It's possible to get a dump of the RPC task queue by writing a value to
    /proc/sys/sunrpc/rpc_debug. If you write any value to that file, you get
    a dump of the RPC client task list into the log buffer. This is a rather
    inconvenient interface however, and makes it hard to get immediate info
    about the task queue.

    Add a new directory hierarchy under debugfs:

    sunrpc/
        rpc_clnt/
            <clientid>/

    Within each clientid directory we create a new "tasks" file that will
    dump info similar to what shows up in the log buffer, but with a few
    small differences -- we avoid printing raw kernel addresses in favor of
    symbolic names and the XID is also displayed.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     

27 Nov, 2014

1 commit


26 Nov, 2014

1 commit

  • Occasionally mountstats reports a negative retransmission rate.
    Ensure that two RPCs completing concurrently don't confuse the sums
    in the transport's op_metrics array.

    Since pNFS filelayout can invoke rpc_count_iostats() on another
    transport from xprt_release(), we can't rely on simply holding the
    transport_lock in xprt_release(). There's nothing for it but hard
    serialization. One spin lock per RPC operation should make this as
    painless as it can be.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
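
    The idea of one lock per RPC operation's metrics can be illustrated with
    a small userspace sketch. Everything here is hypothetical: `struct
    op_metrics`, `count_iostats` and `retransmissions` are made-up names, a
    minimal C11 spinlock replaces the kernel spinlock, and the
    retransmission figure is assumed to be transmissions minus completed
    operations.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-operation metrics, one instance per RPC procedure. */
struct op_metrics {
    atomic_bool lock;      /* per-operation lock guarding both sums */
    unsigned long ops;     /* completed operations */
    unsigned long ntrans;  /* total transmissions, including retries */
};

static void metrics_init(struct op_metrics *m)
{
    atomic_init(&m->lock, false);
    m->ops = 0;
    m->ntrans = 0;
}

static void metrics_lock(struct op_metrics *m)
{
    while (atomic_exchange_explicit(&m->lock, true, memory_order_acquire))
        ;
}

static void metrics_unlock(struct op_metrics *m)
{
    atomic_store_explicit(&m->lock, false, memory_order_release);
}

/* Both sums change inside one critical section, so they stay in step. */
static void count_iostats(struct op_metrics *m, unsigned long ntrans)
{
    metrics_lock(m);
    m->ops++;
    m->ntrans += ntrans;
    metrics_unlock(m);
}

/* Read both sums under the lock; a torn read is what could go negative. */
static long retransmissions(struct op_metrics *m)
{
    long r;

    metrics_lock(m);
    r = (long)m->ntrans - (long)m->ops;
    metrics_unlock(m);
    return r;
}
```

    Because the lock lives in the metrics entry rather than in the
    transport, it protects the sums no matter which transport's release
    path does the counting.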
     

25 Nov, 2014

3 commits


09 Oct, 2014

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - support the NFSv4.2 SEEK operation (allowing clients to support
    SEEK_HOLE/SEEK_DATA), thanks to Anna.
    - end the grace period early in a number of cases, mitigating a
    long-standing annoyance, thanks to Jeff
    - improve SMP scalability, thanks to Trond"

    * 'for-3.18' of git://linux-nfs.org/~bfields/linux: (55 commits)
    nfsd: eliminate "to_delegation" define
    NFSD: Implement SEEK
    NFSD: Add generic v4.2 infrastructure
    svcrdma: advertise the correct max payload
    nfsd: introduce nfsd4_callback_ops
    nfsd: split nfsd4_callback initialization and use
    nfsd: introduce a generic nfsd4_cb
    nfsd: remove nfsd4_callback.cb_op
    nfsd: do not clear rpc_resp in nfsd4_cb_done_sequence
    nfsd: fix nfsd4_cb_recall_done error handling
    nfsd4: clarify how grace period ends
    nfsd4: stop grace_time update at end of grace period
    nfsd: skip subsequent UMH "create" operations after the first one for v4.0 clients
    nfsd: set and test NFSD4_CLIENT_STABLE bit to reduce nfsdcltrack upcalls
    nfsd: serialize nfsdcltrack upcalls for a particular client
    nfsd: pass extra info in env vars to upcalls to allow for early grace period end
    nfsd: add a v4_end_grace file to /proc/fs/nfsd
    lockd: add a /proc/fs/lockd/nlm_end_grace file
    nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE
    nfsd: remove redundant boot_time parm from grace_done client tracking op
    ...

    Linus Torvalds
     

25 Sep, 2014

1 commit

  • When aborting a connection to preserve source ports, don't wake the task in
    xs_error_report. This allows tasks with RPC_TASK_SOFTCONN to succeed if the
    connection needs to be re-established since it preserves the task's status
    instead of setting it to the status of the aborting kernel_connect().

    This may also avoid a potential conflict on the socket's lock.

    Signed-off-by: Benjamin Coddington
    Cc: stable@vger.kernel.org # 3.14+
    Signed-off-by: Trond Myklebust

    Benjamin Coddington
     

18 Aug, 2014

1 commit


14 Aug, 2014

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    - stable fix for a bug in nfs3_list_one_acl()
    - speed up NFS path walks by supporting LOOKUP_RCU
    - more read/write code cleanups
    - pNFS fixes for layout return on close
    - fixes for the RCU handling in the rpcsec_gss code
    - more NFS/RDMA fixes"

    * tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
    nfs: reject changes to resvport and sharecache during remount
    NFS: Avoid infinite loop when RELEASE_LOCKOWNER getting expired error
    SUNRPC: remove all refcounting of groupinfo from rpcauth_lookupcred
    NFS: fix two problems in lookup_revalidate in RCU-walk
    NFS: allow lockless access to access_cache
    NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU
    NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU
    NFS: support RCU_WALK in nfs_permission()
    sunrpc/auth: allow lockless (rcu) lookup of credential cache.
    NFS: prepare for RCU-walk support but pushing tests later in code.
    NFS: nfs4_lookup_revalidate: only evaluate parent if it will be used.
    NFS: add checks for returned value of try_module_get()
    nfs: clear_request_commit while holding i_lock
    pnfs: add pnfs_put_lseg_async
    pnfs: find swapped pages on pnfs commit lists too
    nfs: fix comment and add warn_on for PG_INODE_REF
    nfs: check wait_on_bit_lock err in page_group_lock
    sunrpc: remove "ec" argument from encrypt_v2 operation
    sunrpc: clean up sparse endianness warnings in gss_krb5_wrap.c
    sunrpc: clean up sparse endianness warnings in gss_krb5_seal.c
    ...

    Linus Torvalds