20 Dec, 2017

1 commit

  • commit 90d91b0cd371193d9dbfa9beacab8ab9a4cb75e0 upstream.

    We must ensure that the call to rpc_sleep_on() in xprt_transmit() cannot
    race with the call to xprt_complete_rqst().

    Reported-by: Chuck Lever
    Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=317
    Fixes: ce7c252a8c74 ("SUNRPC: Add a separate spinlock to protect..")
    Reviewed-by: Chuck Lever
    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     

20 Oct, 2017

1 commit

  • The transport may need to flush transport connect and receive tasks
    that are running on rpciod. In order to do so safely, we need to
    ensure that the caller of cancel_work_sync() etc is not itself
    running on rpciod.
    Do so by running the destroy task from the system workqueue.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

16 Oct, 2017

1 commit

  • We remove the request from the receive list before we call
    xprt_wait_on_pinned_rqst(), and so we need to use list_del_init().
    Otherwise, we will see list corruption when xprt_complete_rqst()
    is called.

    Reported-by: Emre Celebi
    Fixes: ce7c252a8c741 ("SUNRPC: Add a separate spinlock to protect...")
    Signed-off-by: Trond Myklebust

    Trond Myklebust
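
The list corruption described above can be illustrated with a small userspace sketch. It assumes a hand-rolled circular list (`struct list_node`, `list_unlink_init`) and a hypothetical `demo_rqst`; none of these are the kernel's own definitions. The point is only that re-initialising a node when it is unlinked makes a later, second unlink harmless.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for a circular doubly linked list (assumption). */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

static bool list_is_empty(const struct list_node *n)
{
    return n->next == n;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/*
 * Unlink the node and point it back at itself, in the spirit of
 * list_del_init(). Unlinking an already unlinked node only touches
 * the node itself, so neighbouring entries are never corrupted.
 */
static void list_unlink_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical request that sits on a receive list. */
struct demo_rqst {
    struct list_node rq_list;
};

/* Completion path: unlinks the request whether or not it is still queued. */
static void demo_complete(struct demo_rqst *req)
{
    list_unlink_init(&req->rq_list);
}
```

With a plain unlink that leaves stale (or poisoned) pointers in the node, the second call would write through those pointers instead.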
     

06 Sep, 2017

1 commit

  • Adopt the use of xprt_pin_rqst to eliminate contention between
    Call-side users of rb_lock and the use of rb_lock in
    rpcrdma_reply_handler.

    This replaces the mechanism introduced in 431af645cf66 ("xprtrdma:
    Fix client lock-up after application signal fires").

    Use recv_lock to quickly find the completing rqst, pin it, then
    drop the lock. At that point invalidation and pull-up of the Reply
    XDR can be done. Both are often expensive operations.

    Finally, take recv_lock again to signal completion to the RPC
    layer. It also protects adjustment of "cwnd".

    This greatly reduces the amount of time a lock is held by the
    reply handler. Comparing lock_stat results shows a marked decrease
    in contention on rb_lock and recv_lock.

    Signed-off-by: Chuck Lever
    [trond.myklebust@primarydata.com: Remove call to rpcrdma_buffer_put() from
    the "out_norqst:" path in rpcrdma_reply_handler.]
    Signed-off-by: Trond Myklebust

    Chuck Lever
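
The "find, pin, drop the lock, do the expensive work, retake the lock" sequence described above might look roughly like this in userspace. This is a sketch under assumptions: a pthread mutex stands in for recv_lock, a counter stands in for pinning, and all `demo_*` names are hypothetical rather than the xprtrdma code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct demo_rqst {
    unsigned xid;
    int pinned;        /* > 0 while a reply handler is working on it */
    bool done;
    size_t reply_len;
};

struct demo_xprt {
    pthread_mutex_t recv_lock;
    struct demo_rqst *reqs;
    size_t nreqs;
};

static void demo_xprt_init(struct demo_xprt *xprt,
                           struct demo_rqst *reqs, size_t nreqs)
{
    pthread_mutex_init(&xprt->recv_lock, NULL);
    xprt->reqs = reqs;
    xprt->nreqs = nreqs;
}

/* Caller must hold recv_lock. */
static struct demo_rqst *demo_lookup_rqst(struct demo_xprt *xprt, unsigned xid)
{
    for (size_t i = 0; i < xprt->nreqs; i++)
        if (xprt->reqs[i].xid == xid)
            return &xprt->reqs[i];
    return NULL;
}

/* Stand-in for the expensive part (invalidation, XDR pull-up). */
static size_t demo_pull_up_reply(const char *reply)
{
    return strlen(reply);
}

static bool demo_reply_handler(struct demo_xprt *xprt, unsigned xid,
                               const char *reply)
{
    struct demo_rqst *req;
    size_t len;

    /* Short critical section: find the request and pin it. */
    pthread_mutex_lock(&xprt->recv_lock);
    req = demo_lookup_rqst(xprt, xid);
    if (!req) {
        pthread_mutex_unlock(&xprt->recv_lock);
        return false;
    }
    req->pinned++;
    pthread_mutex_unlock(&xprt->recv_lock);

    /* Expensive work runs with the lock dropped. */
    len = demo_pull_up_reply(reply);

    /* Second short critical section: signal completion and unpin. */
    pthread_mutex_lock(&xprt->recv_lock);
    req->reply_len = len;
    req->done = true;
    req->pinned--;
    pthread_mutex_unlock(&xprt->recv_lock);
    return true;
}
```

The pin count is what would let a releasing task wait until the handler has finished with the request, even though the lock is not held in between.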
     

19 Aug, 2017

1 commit


17 Aug, 2017

1 commit


14 Jul, 2017

1 commit

  • In xprt_alloc_slot(), the spin lock is only needed to provide atomicity
    between the atomic_add_unless() failure and the call to xprt_add_backlog().
    We do not actually need to hold it across the memory allocation itself.

    By dropping the lock, we can use a more resilient GFP_NOFS allocation,
    just as we now do in the rest of the RPC client code.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     

26 Apr, 2017

1 commit

  • xprt_force_disconnect() is already invoked from the socket
    transport. I want to invoke xprt_force_disconnect() from the
    RPC-over-RDMA transport, which is a separate module from sunrpc.ko.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

11 Feb, 2017

1 commit

  • The transport lock is needed to protect the xprt_adjust_cwnd() call
    in xs_udp_timer, but it is not necessary for accessing the
    rq_reply_bytes_recvd or tk_status fields. It is correct to sublimate
    the lock into UDP's xs_udp_timer method, where it is required.

    The ->timer method has to take the transport lock if needed, but it
    can now sleep safely, or even call back into the RPC scheduler.

    This is more a clean-up than a fix, but the "issue" was introduced
    by my transport switch patches back in 2005.

    Fixes: 46c0ee8bc4ad ("RPC: separate xprt_timer implementations")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

02 Dec, 2016

1 commit

  • xs_connect() contains an exponential backoff mechanism so the repeated
    connection attempts are delayed by longer and longer amounts.

    This is appropriate when the connection failed due to a timeout, but
    it is not appropriate when a definitive "no" answer is received. In such
    cases, call_connect_status() imposes a minimum 3-second back-off, so
    not having the exponential back-off will never result in immediate
    retries.

    The current situation is a problem when the NFS server tries to
    register with rpcbind but rpcbind isn't running. All connection
    attempts are made on the same "xprt" and as the connection is never
    "closed", the exponential back-off delays successive attempts to register,
    or de-register, different protocols. This results in a multi-minute
    delay with no benefit.

    So, when call_connect_status() receives a definitive "no", use
    xprt_conditional_disconnect() to cancel the previous connection attempt.
    This will set XPRT_CLOSE_WAIT so that xprt->ops->close() calls xs_close()
    which resets the reestablish_timeout.

    To ensure xprt_conditional_disconnect() does the right thing, we
    ensure that rq_connect_cookie is set before a connection attempt, and
    allow xprt_conditional_disconnect() to complete even when the
    transport is not fully connected.

    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    NeilBrown
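
A rough sketch of the back-off policy described above: timeouts keep doubling the delay, while a definitive refusal resets the back-off state and falls back to the fixed minimum delay. The constants and all `demo_*` names are assumptions chosen for illustration, not the values or functions in xprtsock.c.

```c
/* All values are in seconds and are illustrative assumptions. */
#define DEMO_INIT_REEST_TO   3u    /* initial back-off */
#define DEMO_MAX_REEST_TO    300u  /* back-off cap */
#define DEMO_MIN_RETRY_DELAY 3u    /* fixed minimum delay after a refusal */

enum demo_connect_result {
    DEMO_TIMED_OUT,  /* no answer: exponential back-off is appropriate */
    DEMO_REFUSED     /* definitive "no": reset the back-off */
};

struct demo_xprt {
    unsigned reestablish_timeout;  /* delay to use for the next timeout */
};

/* Returns how long to wait before the next connection attempt. */
static unsigned demo_retry_delay(struct demo_xprt *xprt,
                                 enum demo_connect_result result)
{
    unsigned delay;

    if (result == DEMO_REFUSED) {
        /* Equivalent in spirit to closing the transport: start over. */
        xprt->reestablish_timeout = DEMO_INIT_REEST_TO;
        return DEMO_MIN_RETRY_DELAY;
    }

    delay = xprt->reestablish_timeout;
    xprt->reestablish_timeout *= 2;
    if (xprt->reestablish_timeout > DEMO_MAX_REEST_TO)
        xprt->reestablish_timeout = DEMO_MAX_REEST_TO;
    return delay;
}
```

In the rpcbind scenario from the message, every registration attempt would then wait only the minimum delay instead of inheriting the accumulated back-off from earlier refusals.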
     

20 Sep, 2016

1 commit

  • xprtrdma needs to allocate the Call and Reply buffers separately.
    TBH, the reliance on using a single buffer for the pair of XDR
    buffers is transport implementation-specific.

    Instead of passing just the rq_buffer into the buf_free method, pass
    the task structure and let buf_free take care of freeing both
    XDR buffers at once.

    There's a micro-optimization here. In the common case, both
    xprt_release and the transport's buf_free method were checking if
    rq_buffer was NULL. Now the check is done only once per RPC.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

03 Aug, 2016

1 commit


14 Jun, 2016

2 commits


06 Feb, 2016

2 commits


01 Feb, 2016

2 commits


20 Jan, 2016

1 commit


20 Sep, 2015

1 commit


20 Jun, 2015

1 commit

  • This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC:
    Make xs_tcp_close() do a socket shutdown rather than a sock_release").
    Prior to that commit, the autoclose feature would ensure that an
    idle connection would result in the socket being both disconnected and
    released, whereas now it only gets disconnected.

    While the current behaviour is harmless, it does leave the port bound
    until either RPC traffic resumes or the RPC client is shut down.

    Reported-by: Steven Rostedt
    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

16 Jun, 2015

1 commit

  • If the sending queue has a task without ->rq_cong set at the front,
    and then a number of tasks with ->rq_cong set such that they use
    the entire congestion window, then the queue deadlocks. The first
    entry cannot be processed until later entries complete.

    This scenario has been seen with a client using UDP to access a server,
    and the network connection breaking for a period of time - it doesn't
    recover.

    It never really makes sense for an ->rq_cong request to be on the ->sending
    queue, but it can happen when a request is being retried, and finds
    the transport is locked (XPRT_LOCKED). In this case we simply call
    __xprt_put_cong() and the deadlock goes away.

    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    Neil Brown
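
The fix described above can be sketched as follows. This is a simplified model, assuming a congestion window counted in whole requests and hypothetical `demo_*` names; it is not the xprt.c code. The key step is that a retried request gives up its congestion slot before it is parked behind a locked transport.

```c
#include <stdbool.h>

/* Hypothetical, simplified transport state. */
struct demo_xprt {
    unsigned cong;   /* congestion slots currently in use */
    unsigned cwnd;   /* congestion window, in slots */
    bool locked;     /* stand-in for XPRT_LOCKED */
};

struct demo_rqst {
    bool rq_cong;    /* true while this request holds a slot */
};

static bool demo_get_cong(struct demo_xprt *xprt, struct demo_rqst *req)
{
    if (req->rq_cong)
        return true;
    if (xprt->cong >= xprt->cwnd)
        return false;
    req->rq_cong = true;
    xprt->cong++;
    return true;
}

static void demo_put_cong(struct demo_xprt *xprt, struct demo_rqst *req)
{
    if (!req->rq_cong)
        return;
    req->rq_cong = false;
    xprt->cong--;
}

/*
 * Returns true if the request may transmit (it now holds the lock and
 * a slot). Returns false if the caller must park it on the sending
 * queue; in that case the request holds no congestion slot.
 */
static bool demo_reserve_xprt(struct demo_xprt *xprt, struct demo_rqst *req)
{
    if (xprt->locked) {
        /* A retried request may still hold a slot: release it. */
        demo_put_cong(xprt, req);
        return false;
    }
    if (!demo_get_cong(xprt, req))
        return false;
    xprt->locked = true;
    return true;
}
```

Because queued requests no longer occupy the window, the request at the head of the queue can always obtain a slot once the lock is free.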
     

11 Jun, 2015

1 commit

  • It has been exceptionally useful to exercise the logic that handles
    local immediate errors and RDMA connection loss. To enable
    developers to test this regularly and repeatably, add logic to
    simulate connection loss every so often.

    Fault injection is disabled by default. It is enabled with

    $ echo xxx | sudo tee /sys/kernel/debug/sunrpc/inject_fault/disconnect

    where "xxx" is a large positive number of transport method calls
    before a disconnect. A value of several thousand is usually a good
    number that allows reasonable forward progress while still causing a
    lot of connection drops.

    These hooks are disabled when SUNRPC_DEBUG is turned off.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
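
The "disconnect every N transport method calls" behaviour described above could be modelled with a simple countdown. This is a sketch under assumptions: `demo_inject_disconnect` stands in for the debugfs knob, and the counter handling is illustrative rather than a copy of the kernel hook.

```c
/* Stand-in for the debugfs knob; 0 means fault injection is disabled. */
static unsigned demo_inject_disconnect;

struct demo_xprt {
    unsigned countdown;    /* calls remaining before the next disconnect */
    unsigned disconnects;  /* number of injected disconnects so far */
};

static void demo_xprt_init(struct demo_xprt *xprt)
{
    xprt->countdown = demo_inject_disconnect;
    xprt->disconnects = 0;
}

/* Called at the top of each transport method in this model. */
static void demo_maybe_inject(struct demo_xprt *xprt)
{
    if (demo_inject_disconnect == 0)
        return;              /* disabled by default */
    if (xprt->countdown > 1) {
        xprt->countdown--;   /* not yet time to inject a fault */
        return;
    }
    /* Countdown expired: rearm it and simulate connection loss. */
    xprt->countdown = demo_inject_disconnect;
    xprt->disconnects++;
}
```

A larger knob value gives the transport more calls between drops, which is the trade-off between forward progress and fault coverage that the message mentions.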
     

24 Apr, 2015

2 commits

  • * bugfixes:
    NFSv4: Return delegations synchronously in evict_inode
    SUNRPC: Fix a regression when reconnecting
    NFS: remount with security change should return EINVAL
    nfs: do not export discarded symbols
    NFSv4.1: don't export static symbol

    Trond Myklebust
     
  • v2: gracefully handle the case where some dentry pointers end up NULL
    and be more diligent about zeroing out dentry pointers

    We currently have a problem that SELinux policy is being enforced when
    creating debugfs files. If a debugfs file is created as a side effect of
    doing some syscall, then that creation can fail if the SELinux policy
    for that process prevents it.

    This seems wrong. We don't do that for files under /proc, for instance,
    so Bruce has proposed a patch to fix that.

    While discussing that patch however, Greg K.H. stated:

    "No kernel code should care / fail if a debugfs function fails, so
    please fix up the sunrpc code first."

    This patch converts all of the sunrpc debugfs setup code to be void
    return functions, and the callers to not look for errors from those
    functions.

    This should allow rpc_clnt and rpc_xprt creation to work, even if the
    kernel fails to create debugfs files for some reason.

    Cc: Greg Kroah-Hartman
    Acked-by: "J. Bruce Fields"
    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
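
The pattern described above, where debug-file setup returns void and object creation never depends on it, can be sketched as follows. The `demo_*` names, the availability flag and the directory string are hypothetical; this is not the sunrpc debugfs code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simulates whether the debug filesystem accepts new entries. */
static bool demo_debugfs_available = true;

struct demo_xprt {
    unsigned max_reqs;
    const char *debugfs_dir;  /* NULL when no debug directory exists */
};

/* Returns void: there is deliberately nothing for the caller to check. */
static void demo_xprt_debugfs_register(struct demo_xprt *xprt)
{
    xprt->debugfs_dir = demo_debugfs_available ? "rpc_xprt/demo" : NULL;
}

static void demo_xprt_debugfs_unregister(struct demo_xprt *xprt)
{
    /* Always zero the pointer so a stale entry is never reused. */
    xprt->debugfs_dir = NULL;
}

/* Creation fails only for reasons unrelated to debug files. */
static bool demo_xprt_create(struct demo_xprt *xprt, unsigned max_reqs)
{
    if (max_reqs == 0)
        return false;
    xprt->max_reqs = max_reqs;
    demo_xprt_debugfs_register(xprt);
    return true;
}
```

Code that later uses the debug directory has to tolerate a NULL pointer, which is the "gracefully handle NULL dentry pointers" part of the v2 note.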
     

28 Mar, 2015

1 commit

  • If the task needs to give up the socket lock in order to allow a
    reconnect to occur, then it must also clear the 'rq_bytes_sent' field
    so that when it retransmits, it knows to start from the beginning.

    Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing")
    Signed-off-by: Trond Myklebust

    Trond Myklebust
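
A minimal sketch of the rule described above: when a task hands back the socket lock so that a reconnect can happen, its send progress is reset, since the new connection has seen none of the partially sent message. The `demo_*` names and the `snd_owner` field are hypothetical stand-ins, not the kernel's structures.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_rqst {
    size_t rq_len;         /* total bytes in the encoded request */
    size_t rq_bytes_sent;  /* bytes already written to the socket */
};

struct demo_xprt {
    struct demo_rqst *snd_owner;  /* request holding the socket lock */
};

/* Sends at most 'chunk' bytes; returns true once the request is complete. */
static bool demo_send(struct demo_rqst *req, size_t chunk)
{
    size_t remaining = req->rq_len - req->rq_bytes_sent;

    req->rq_bytes_sent += (chunk < remaining) ? chunk : remaining;
    return req->rq_bytes_sent == req->rq_len;
}

/* Give up the socket lock so a reconnect can proceed. */
static void demo_release_for_reconnect(struct demo_xprt *xprt,
                                       struct demo_rqst *req)
{
    if (xprt->snd_owner != req)
        return;
    /* Retransmission must start from the beginning of the message. */
    req->rq_bytes_sent = 0;
    xprt->snd_owner = NULL;
}
```

Without the reset, the retransmission would resume mid-message on a fresh connection and the peer would receive a truncated record.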
     

10 Feb, 2015

1 commit


09 Feb, 2015

1 commit

  • The socket lock is currently held by the task that is requesting the
    connection be established. While that is efficient in the case where
    the connection happens quickly, it is racy in the case where it doesn't.
    What we really want is for the connect helper to be able to block access
    to the socket while it is being set up.

    This patch does so by arranging to transfer the socket lock from the
    task that is requesting the connect attempt, and then releasing that
    lock once everything is done.
    This scheme also gives us automatic protection against collisions with
    the RPC close code, so we can kill the cancel_delayed_work_sync()
    call in xs_close().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

28 Nov, 2014

1 commit

  • Add a new directory hierarchy under the debugfs sunrpc/ directory:

    sunrpc/
        rpc_xprt/
            /

    Within that directory, we can put files that give info about the
    xprts. We do have the (minor) problem that there is no succinct,
    unique identifier for rpc_xprts. So we generate them synthetically
    with a static atomic_t counter.

    For now, this directory just holds an "info" file, but we may add
    other files to it in the future.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
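
Generating synthetic identifiers from a static atomic counter, as described above, might look like this in userspace C11. It is a sketch under assumptions: `atomic_uint` stands in for the kernel's atomic_t, and the hexadecimal name format and `demo_*` names are illustrative choices.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Static counter; zero-initialised like any object with static storage. */
static atomic_uint demo_cur_id;

/* Each call returns a value no other caller has seen, even across threads. */
static unsigned demo_next_xprt_id(void)
{
    return atomic_fetch_add(&demo_cur_id, 1) + 1;
}

/* Formats the per-transport directory name from a fresh identifier. */
static int demo_xprt_dir_name(char *buf, size_t len)
{
    return snprintf(buf, len, "%x", demo_next_xprt_id());
}
```

The identifiers are unique for the lifetime of the counter but carry no meaning beyond that, which matches the "synthetic" description in the message.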
     

25 Nov, 2014

2 commits


14 Aug, 2014

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    - stable fix for a bug in nfs3_list_one_acl()
    - speed up NFS path walks by supporting LOOKUP_RCU
    - more read/write code cleanups
    - pNFS fixes for layout return on close
    - fixes for the RCU handling in the rpcsec_gss code
    - more NFS/RDMA fixes"

    * tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
    nfs: reject changes to resvport and sharecache during remount
    NFS: Avoid infinite loop when RELEASE_LOCKOWNER getting expired error
    SUNRPC: remove all refcounting of groupinfo from rpcauth_lookupcred
    NFS: fix two problems in lookup_revalidate in RCU-walk
    NFS: allow lockless access to access_cache
    NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU
    NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU
    NFS: support RCU_WALK in nfs_permission()
    sunrpc/auth: allow lockless (rcu) lookup of credential cache.
    NFS: prepare for RCU-walk support but pushing tests later in code.
    NFS: nfs4_lookup_revalidate: only evaluate parent if it will be used.
    NFS: add checks for returned value of try_module_get()
    nfs: clear_request_commit while holding i_lock
    pnfs: add pnfs_put_lseg_async
    pnfs: find swapped pages on pnfs commit lists too
    nfs: fix comment and add warn_on for PG_INODE_REF
    nfs: check wait_on_bit_lock err in page_group_lock
    sunrpc: remove "ec" argument from encrypt_v2 operation
    sunrpc: clean up sparse endianness warnings in gss_krb5_wrap.c
    sunrpc: clean up sparse endianness warnings in gss_krb5_seal.c
    ...

    Linus Torvalds
     

18 Jul, 2014

1 commit

  • The current code always selects XPRT_TRANSPORT_BC_TCP for the back
    channel, even when the forward channel was not TCP (eg, RDMA). When
    a 4.1 mount is attempted with RDMA, the server panics in the TCP BC
    code when trying to send CB_NULL.

    Instead, construct the transport protocol number from the forward
    channel transport or'd with XPRT_TRANSPORT_BC. Transports that do
    not support bi-directional RPC will not have registered a "BC"
    transport, causing create_backchannel_client() to fail immediately.

    Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265
    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
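
Deriving the backchannel transport identifier from the forward channel, as described above, can be sketched like this. The numeric values and the registration table are assumptions made for the example (the `DEMO_*` and `demo_*` names are hypothetical); only the "OR in the BC flag, then look it up" logic comes from the message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative identifiers; the real values live in the kernel headers. */
#define DEMO_TRANSPORT_BC   (UINT32_C(1) << 31)
#define DEMO_TRANSPORT_TCP  UINT32_C(6)
#define DEMO_TRANSPORT_RDMA UINT32_C(256)

/* Transports registered in this model: RDMA has no "BC" variant. */
static const uint32_t demo_registered[] = {
    DEMO_TRANSPORT_TCP,
    DEMO_TRANSPORT_RDMA,
    DEMO_TRANSPORT_TCP | DEMO_TRANSPORT_BC,
};

static bool demo_is_registered(uint32_t ident)
{
    size_t n = sizeof(demo_registered) / sizeof(demo_registered[0]);

    for (size_t i = 0; i < n; i++)
        if (demo_registered[i] == ident)
            return true;
    return false;
}

/*
 * Returns the backchannel identifier for a forward channel, or 0 when
 * no matching backchannel transport has been registered.
 */
static uint32_t demo_backchannel_ident(uint32_t forward_ident)
{
    uint32_t bc = forward_ident | DEMO_TRANSPORT_BC;

    return demo_is_registered(bc) ? bc : 0;
}
```

In this model an RDMA forward channel yields no backchannel transport, so client creation fails immediately instead of falling into TCP-specific code.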
     

03 Jul, 2014

1 commit

  • The callback handler xs_error_report() can end up propagating an EPIPE
    error by means of the call to xprt_wake_pending_tasks(). Ensure that
    xprt_connect_status() does not automatically convert this into an
    EIO error.

    Reported-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

11 Jun, 2014

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    - massive cleanup of the NFS read/write code by Anna and Dros
    - support multiple NFS read/write requests per page in order to deal
    with non-page aligned pNFS striping. Also cleans up the r/wsize <
    page size code nicely.
    - stable fix for ensuring inode is declared uptodate only after all
    the attributes have been checked.
    - stable fix for a kernel Oops when remounting
    - NFS over RDMA client fixes
    - move the pNFS files layout driver into its own subdirectory"

    * tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
    NFS: populate ->net in mount data when remounting
    pnfs: fix lockup caused by pnfs_generic_pg_test
    NFSv4.1: Fix typo in dprintk
    NFSv4.1: Comment is now wrong and redundant to code
    NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state
    xprtrdma: Disconnect on registration failure
    xprtrdma: Remove BUG_ON() call sites
    xprtrdma: Avoid deadlock when credit window is reset
    SUNRPC: Move congestion window constants to header file
    xprtrdma: Reset connection timeout after successful reconnect
    xprtrdma: Use macros for reconnection timeout constants
    xprtrdma: Allocate missing pagelist
    xprtrdma: Remove Tavor MTU setting
    xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting
    xprtrdma: Reduce the number of hardway buffer allocations
    xprtrdma: Limit work done by completion handler
    xprtrmda: Reduce calls to ib_poll_cq() in completion handlers
    xprtrmda: Reduce lock contention in completion handlers
    xprtrdma: Split the completion queue
    xprtrdma: Make rpcrdma_ep_destroy() return void
    ...

    Linus Torvalds
     

04 Jun, 2014

1 commit


18 Apr, 2014

1 commit

  • Mostly scripted conversion of the smp_mb__* barriers.

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul E. McKenney
    Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
    Cc: Linus Torvalds
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

30 Mar, 2014

1 commit


29 Jan, 2014

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    - stable fix for an infinite loop in RPC state machine
    - stable fix for a use after free situation in the NFSv4 trunking discovery
    - stable fix for error handling in the NFSv4 trunking discovery
    - stable fix for the page write update code
    - stable fix for the NFSv4.1 mount time security negotiation
    - stable fix for the NFSv4 open code.
    - O_DIRECT locking fixes
    - fix an Oops in the pnfs file commit code
    - RPC layer needs finer grained handling of connection errors
    - more RPC GSS upcall fixes"

    * tag 'nfs-for-3.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (30 commits)
    pnfs: Proper delay for NFS4ERR_RECALLCONFLICT in layout_get_done
    pnfs: fix BUG in filelayout_recover_commit_reqs
    nfs4: fix discover_server_trunking use after free
    NFSv4.1: Handle errors correctly in nfs41_walk_client_list
    nfs: always make sure page is up-to-date before extending a write to cover the entire page
    nfs: page cache invalidation for dio
    nfs: take i_mutex during direct I/O reads
    nfs: merge nfs_direct_write into nfs_file_direct_write
    nfs: merge nfs_direct_read into nfs_file_direct_read
    nfs: increment i_dio_count for reads, too
    nfs: defer inode_dio_done call until size update is done
    nfs: fix size updates for aio writes
    nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAME
    NFSv4.1: Fix a race in nfs4_write_inode
    NFSv4.1: Don't trust attributes if a pNFS LAYOUTCOMMIT is outstanding
    point to the right include file in a comment (left over from a9004abc3)
    NFS: dprintk() should not print negative fileids and inode numbers
    nfs: fix dead code of ipv6_addr_scope
    sunrpc: Fix infinite loop in RPC state machine
    SUNRPC: Add tracepoint for socket errors
    ...

    Linus Torvalds
     

15 Jan, 2014

1 commit