31 Oct, 2019

1 commit


27 Sep, 2019

1 commit

  • Pull NFS client updates from Anna Schumaker:
    "Stable bugfixes:
    - Dequeue the request from the receive queue while we're re-encoding
      # v4.20+
    - Fix buffer handling of GSS MIC without slack # 5.1

    Features:
    - Increase xprtrdma maximum transport header and slot table sizes
    - Add support for nfs4_call_sync() calls using a custom
      rpc_task_struct
    - Optimize the default readahead size
    - Enable pNFS filelayout LAYOUTGET on OPEN

    Other bugfixes and cleanups:
    - Fix possible null-pointer dereferences and memory leaks
    - Various NFS over RDMA cleanups
    - Various NFS over RDMA comment updates
    - Don't receive TCP data into a reset request buffer
    - Don't try to parse incomplete RPC messages
    - Fix congestion window race with disconnect
    - Clean up pNFS return-on-close error handling
    - Fixes for NFS4ERR_OLD_STATEID handling"

    * tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits)
    pNFS/filelayout: enable LAYOUTGET on OPEN
    NFS: Optimise the default readahead size
    NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
    NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
    NFSv4: Fix OPEN_DOWNGRADE error handling
    pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
    NFSv4: Add a helper to increment stateid seqids
    NFSv4: Handle RPC level errors in LAYOUTRETURN
    NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
    NFSv4: Clean up pNFS return-on-close error handling
    pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
    NFS: remove unused check for negative dentry
    NFSv3: use nfs_add_or_obtain() to create and reference inodes
    NFS: Refactor nfs_instantiate() for dentry referencing callers
    SUNRPC: Fix congestion window race with disconnect
    SUNRPC: Don't try to parse incomplete RPC messages
    SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic
    SUNRPC: Fix buffer handling of GSS MIC without slack
    SUNRPC: RPC level errors should always set task->tk_rpc_status
    SUNRPC: Don't receive TCP data into a request buffer that has been reset
    ...

    Linus Torvalds
     

21 Sep, 2019

1 commit

  • If the congestion window closes just as the transport disconnects,
    a reconnect is never driven because:

    1. The XPRT_CONG_WAIT flag prevents tasks from taking the write lock
    2. There's no wake-up of the first task on the xprt->sending queue

    To address this, clear the congestion wait flag as part of
    completing a disconnect.

    Fixes: 75891f502f5f ("SUNRPC: Support for congestion control ... ")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
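
The race described in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `struct model_xprt`, the `MODEL_*` bits and all `model_*` functions are hypothetical names invented for illustration, and the wake-up of queued senders is reduced to a single call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model; only the idea comes from the commit text. */
enum {
    MODEL_CONNECTED = 1 << 0,
    MODEL_CONG_WAIT = 1 << 1,
};

struct model_xprt {
    unsigned int state;
    int reconnects;     /* reconnect attempts driven by senders */
};

/* A sender may take the write lock only while the congestion-wait flag is clear. */
static bool model_can_lock_write(const struct model_xprt *x)
{
    return (x->state & MODEL_CONG_WAIT) == 0;
}

/*
 * Finish a disconnect. Passing clear_cong_wait == false models the
 * behaviour described as broken; true models the described fix.
 */
static void model_disconnect_done(struct model_xprt *x, bool clear_cong_wait)
{
    x->state &= ~(unsigned int)MODEL_CONNECTED;
    if (clear_cong_wait)
        x->state &= ~(unsigned int)MODEL_CONG_WAIT;
}

/* Returns true if the sender got the lock; it reconnects first if needed. */
static bool model_send(struct model_xprt *x)
{
    if (!model_can_lock_write(x))
        return false;   /* nobody can drive the reconnect */
    if ((x->state & MODEL_CONNECTED) == 0) {
        x->reconnects++;
        x->state |= MODEL_CONNECTED;
    }
    assert(x->state & MODEL_CONNECTED);
    return true;
}
```

In this model, a transport that disconnects while the congestion-wait bit is set stays stuck unless `model_disconnect_done()` also clears that bit, which mirrors the fix the message describes.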
     

18 Sep, 2019

1 commit


27 Aug, 2019

1 commit


19 Jul, 2019

1 commit


13 Jul, 2019

1 commit

  • NFSoRDMA client updates for 5.3

    New features:
    - Add a way to place MRs back on the free list
    - Reduce context switching
    - Add new trace events

    Bugfixes and cleanups:
    - Fix a BUG when tracing is enabled with NFSv4.1
    - Fix a use-after-free in rpcrdma_post_recvs
    - Replace use of xdr_stream_pos in rpcrdma_marshal_req
    - Fix occasional transport deadlock
    - Fix show_nfs_errors macros, other tracing improvements
    - Remove RPCRDMA_REQ_F_PENDING and fr_state
    - Various simplifications and refactors

    Trond Myklebust
     

09 Jul, 2019

1 commit

  • Adapt and apply changes that were made to the TCP socket connect
    code. See the following commits for details on the purpose of
    these changes:

    Commit 7196dbb02ea0 ("SUNRPC: Allow changing of the TCP timeout parameters on the fly")
    Commit 3851f1cdb2b8 ("SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout")
    Commit 02910177aede ("SUNRPC: Fix reconnection timeouts")

    Some common transport code is moved to xprt.c to satisfy the code
    duplication police.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

07 Jul, 2019

4 commits


22 Jun, 2019

1 commit

  • Jon Hunter reports:
    "I have been noticing intermittent failures with a system suspend test on
    some of our machines that have a NFS mounted root file-system. Bisecting
    this issue points to your commit 431235818bc3 ("SUNRPC: Declare RPC
    timers as TIMER_DEFERRABLE") and reverting this on top of v5.2-rc3 does
    appear to resolve the problem.

    The cause of the suspend failure appears to be a long delay observed
    sometimes when resuming from suspend, and this is causing our test to
    timeout."

    This reverts commit 431235818bc3a919ca7487500c67c3144feece80.

    Reported-by: Jon Hunter
    Signed-off-by: Anna Schumaker

    Anna Schumaker
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
      initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

26 Apr, 2019

7 commits


16 Mar, 2019

1 commit

  • When the socket is closed, we currently send an EAGAIN error to all
    pending requests in order to ask them to retransmit. Use ENOTCONN
    instead, to ensure that they try to reconnect before attempting to
    transmit.
    This also helps SOFTCONN tasks to behave correctly in this
    situation.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
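
The difference between the two error codes can be sketched as a status dispatcher. This is a simplified userspace illustration, assuming a pending request picks its next step purely from the status it is woken with; `model_next_step()`, `model_close_status()` and the `MODEL_*` values are hypothetical and do not correspond to kernel functions.

```c
#include <assert.h>
#include <errno.h>

enum model_step {
    MODEL_RETRANSMIT,       /* resend on the existing connection */
    MODEL_RECONNECT_FIRST,  /* reconnect, then transmit */
    MODEL_GIVE_UP,
};

/* Hypothetical mapping from a wake-up status to the request's next step. */
static enum model_step model_next_step(int status)
{
    switch (status) {
    case -EAGAIN:
        return MODEL_RETRANSMIT;
    case -ENOTCONN:
        return MODEL_RECONNECT_FIRST;
    default:
        return MODEL_GIVE_UP;
    }
}

/* Status handed to every pending request when the socket closes. */
static int model_close_status(int use_enotconn)
{
    return use_enotconn ? -ENOTCONN : -EAGAIN;
}
```

With `use_enotconn` set, every pending request in the model goes through the reconnect step before it transmits again, instead of retransmitting on a socket that is already closed.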
     

02 Mar, 2019

1 commit

  • If a layout segment gets invalidated while a pNFS I/O operation
    is queued for transmission, then we ideally want to abort
    immediately. This is particularly the case when there is a large
    number of I/O related RPCs queued in the RPC layer, and the layout
    segment gets invalidated due to an ENOSPC error, or an EACCES (because
    the client was fenced). If we do not abort, we may end up forced to
    spam the MDS with a lot of otherwise unnecessary LAYOUTERRORs after
    that I/O fails.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

25 Feb, 2019

1 commit


21 Feb, 2019

3 commits


14 Feb, 2019

1 commit


16 Jan, 2019

2 commits

  • When using Kerberos with v4.20, I've observed frequent connection
    loss on heavy workloads. I traced it down to the client underrunning
    the GSS sequence number window -- NFS servers are required to drop
    the RPC with the low sequence number, and also drop the connection
    to signal that an RPC was dropped.

    Bisected to commit 918f3c1fe83c ("SUNRPC: Improve latency for
    interactive tasks").

    I've got a one-line workaround for this issue, which is easy to
    backport to v4.20 while a more permanent solution is being derived.
    Essentially, tk_owner-based sorting is disabled for RPCs that carry
    a GSS sequence number.

    Fixes: 918f3c1fe83c ("SUNRPC: Improve latency for interactive ... ")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
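
The workaround can be pictured with a toy transmit queue. The sketch below is a simplified userspace model, not the kernel's queueing code: the array-based queue, `struct model_req`, and the rule "group a request behind the last request from the same owner" are assumptions made for illustration, with a sequence number of 0 meaning "no GSS sequence number".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_QUEUE_MAX 16

struct model_req {
    int owner;          /* stands in for tk_owner */
    unsigned int seqno; /* GSS sequence number, 0 if the request has none */
};

struct model_queue {
    struct model_req slot[MODEL_QUEUE_MAX];
    size_t len;
};

/*
 * Enqueue a request. Owner-based grouping places it right after the last
 * queued request from the same owner. With gss_fifo set, requests that
 * carry a sequence number skip the grouping and are appended instead.
 */
static void model_enqueue(struct model_queue *q, struct model_req r, bool gss_fifo)
{
    size_t pos = q->len;    /* default: append at the tail */
    bool group = !(gss_fifo && r.seqno != 0);

    assert(q->len < MODEL_QUEUE_MAX);
    if (group) {
        for (size_t i = q->len; i > 0; i--) {
            if (q->slot[i - 1].owner == r.owner) {
                pos = i;
                break;
            }
        }
    }
    memmove(&q->slot[pos + 1], &q->slot[pos],
            (q->len - pos) * sizeof(q->slot[0]));
    q->slot[pos] = r;
    q->len++;
}

/* True if the sequence numbers would reach the wire in ascending order. */
static bool model_seqnos_ascending(const struct model_queue *q)
{
    unsigned int last = 0;

    for (size_t i = 0; i < q->len; i++) {
        if (q->slot[i].seqno == 0)
            continue;
        if (q->slot[i].seqno < last)
            return false;
        last = q->slot[i].seqno;
    }
    return true;
}
```

In the model, grouping by owner lets a later sequence number overtake an earlier one from a different owner. Appending sequence-numbered requests in arrival order keeps them ascending, which is the property a server-side sequence window depends on.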
     
  • When we resend a request, ensure that the 'rq_bytes_sent' is reset
    to zero.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
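
Why the counter has to be reset can be shown with a minimal model of a partially sent request. This is a sketch with hypothetical names (`struct model_req`, `model_send_some()`, `model_prepare_resend()`); only the field name `bytes_sent` echoes the `rq_bytes_sent` mentioned in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_req {
    size_t len;         /* total bytes in the encoded request */
    size_t bytes_sent;  /* stands in for rq_bytes_sent */
};

/* Send at most budget bytes from the current offset; true once nothing is left. */
static bool model_send_some(struct model_req *r, size_t budget)
{
    size_t left;

    assert(r->bytes_sent <= r->len);
    left = r->len - r->bytes_sent;
    r->bytes_sent += budget < left ? budget : left;
    return r->bytes_sent == r->len;
}

/* Prepare a resend: the whole message must go out again from offset zero. */
static void model_prepare_resend(struct model_req *r)
{
    r->bytes_sent = 0;
}
```

If the reset were skipped in this model, a resend would resume from the stale offset and the receiver would only see the tail of the message.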
     

19 Dec, 2018

2 commits

  • Over the years, xprt_connect_status() has been superseded by
    call_connect_status(), which now handles all the errors that
    xprt_connect_status() does and more. Since xprt_connect_status()
    converts all errors that it doesn't recognise to EIO, it is time
    for it to be retired.

    Reported-by: Chuck Lever
    Signed-off-by: Trond Myklebust
    Tested-by: Chuck Lever

    Trond Myklebust
     
  • When the socket is closed, we need to call xprt_disconnect_done() in order
    to clean up the XPRT_WRITE_SPACE flag, and wake up the sleeping tasks.

    However, we also want to ensure that we don't wake them up before the socket
    is closed, since that would cause thundering herd issues with everyone
    piling up to retransmit before the TCP shutdown dance has completed.
    Only the task that holds XPRT_LOCKED needs to wake up early in order to
    allow the close to complete.

    Reported-by: Dave Wysochanski
    Reported-by: Scott Mayhew
    Cc: Chuck Lever
    Signed-off-by: Trond Myklebust
    Tested-by: Chuck Lever

    Trond Myklebust
     

02 Dec, 2018

2 commits

  • If an asynchronous connection attempt completes while another task is
    in xprt_connect(), then the call to rpc_sleep_on() could end up
    racing with the call to xprt_wake_pending_tasks().
    So add a second test of the connection state after we've put the
    task to sleep and set the XPRT_CONNECTING flag, when we know that there
    can be no asynchronous connection attempts still in progress.

    Fixes: 0b9e79431377d ("SUNRPC: Move the test for XPRT_CONNECTING into...")
    Signed-off-by: Trond Myklebust

    Trond Myklebust
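
The lost wake-up and the second test can be replayed step by step in a single-threaded model. This is a sketch under stated assumptions, not the kernel logic: the interleaving is fixed by the order of the calls, and `struct model_xprt` and the `model_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

struct model_xprt {
    bool connected;
    bool connecting;    /* stands in for the XPRT_CONNECTING flag */
    int sleepers;       /* tasks waiting for a connect wake-up */
};

/* An asynchronous connect attempt finishes and wakes whoever is asleep now. */
static void model_async_connect_done(struct model_xprt *x)
{
    x->connected = true;
    x->connecting = false;
    x->sleepers = 0;
}

/*
 * The task already saw "not connected" and now goes to sleep.
 * Returns true if it is left asleep. With recheck, it tests the
 * connection state again after queueing itself and setting the flag.
 */
static bool model_sleep_for_connect(struct model_xprt *x, bool recheck)
{
    x->sleepers++;
    x->connecting = true;
    if (recheck && x->connected) {
        x->connecting = false;
        x->sleepers--;  /* wake ourselves: nothing to wait for */
        return false;
    }
    return true;
}
```

When the asynchronous attempt completes between the first test and the sleep, the task without the second test stays asleep on a transport that is already connected. The second test catches that case because no other attempt can still be in flight once the flag is set.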
     
  • If we retransmit an RPC request, we currently end up clobbering the
    value of req->rq_rcv_buf.bvec that was allocated by the initial call to
    xprt_request_prepare(req).

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

19 Oct, 2018

1 commit

  • NFS RDMA client updates for Linux 4.20

    Stable bugfixes:
    - Reset credit grant properly after a disconnect

    Other bugfixes and cleanups:
    - xprt_release_rqst_cong is called outside of transport_lock
    - Create more MRs at a time and toss out old ones during recovery
    - Various improvements to the RDMA connection and disconnection code:
      - Improve naming of trace events, functions, and variables
      - Add documenting comments
      - Fix metrics and stats reporting
    - Fix a tracepoint sparse warning

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

03 Oct, 2018

1 commit

  • For TCP, the logic in xprt_connect_status is currently never invoked
    to record a successful connection. Commit 2a4919919a97 ("SUNRPC:
    Return EAGAIN instead of ENOTCONN when waking up xprt->pending")
    changed the way TCP xprts are awoken after a connect succeeds.

    Instead, change connection-oriented transports to bump connect_count
    and compute connect_time the moment that XPRT_CONNECTED is set.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
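
Recording the statistics at the moment the connected state is set, rather than in a later status handler, might look like the following userspace sketch. The struct layout, the tick-based clock passed in as `now`, and the `model_*` function names are assumptions for illustration; only `connect_count` and `connect_time` are names taken from the message.

```c
#include <assert.h>
#include <stdbool.h>

struct model_xprt {
    bool connected;                 /* stands in for XPRT_CONNECTED */
    unsigned long connect_start;    /* tick at which the attempt began */
    unsigned long connect_count;
    unsigned long connect_time;     /* accumulated ticks spent connecting */
};

/* Start a connection attempt at tick "now". */
static void model_connect_begin(struct model_xprt *x, unsigned long now)
{
    x->connected = false;
    x->connect_start = now;
}

/* Record the statistics at the moment the connected state is set. */
static void model_set_connected(struct model_xprt *x, unsigned long now)
{
    if (x->connected)
        return;     /* already counted */
    x->connected = true;
    x->connect_count++;
    x->connect_time += now - x->connect_start;
}
```

Tying the bookkeeping to the state change means a successful connection is counted exactly once in the model, no matter which path later wakes the waiting tasks.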
     

01 Oct, 2018

4 commits