21 Sep, 2020

9 commits

  • Time to remove dprintk call sites in here.

    Regarding the rpc_bind_status tracepoint: It's friendlier to
    administrators if they don't have to look up the error code to
    figure out what went wrong. Replace trace_rpc_bind_status with a
    set of tracepoints that report more specifically what the problem
    was, and what RPC program/version was being queried.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up.

    When enabled, this dprintk adds a line in /var/log/messages after
    every RPC that reports the task ID (no connection to on-the-wire
    XID values) and the RPC's result (no connection to the program,
    operation, or the arguments and results).

    Thus its value is pretty low. Let's remove it.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: Replace dprintk call sites.

    Note that rpc_call_rpcerror() already has a trace point, so perhaps
    adding trace_rpc_refresh_status() isn't necessary. However, it does
    report a particular category of error.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • For a long while we've wanted a tracepoint that fires when a major
    timeout is reported in the system log. Such a tracepoint can be
    attached to other actions that can take place when a timeout is
    detected (e.g., server or connection health assessment).

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • This trace event can be used to audit transport connections from the
    client.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: The rpc_rpc_request tracepoint serves the same purpose.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: The rpc_task_run_action tracepoint serves the same
    purpose.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Introduce a tracepoint in call_allocate that reports the exact
    sizes in the RPC buffer allocation request and the status of the
    result. This helps catch problems with XDR buffer provisioning,
    and replaces transport-specific debugging instrumentation.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Current behaviour: every time a v3 operation is re-sent to the server
    we update (double) the timeout. There is no distinction between whether
    or not the previous timer had expired before the re-send happened.

    Here's the scenario:
    1. Client sends a v3 operation
    2. Server RST-s the connection (prior to the timeout) (e.g., connection
    is immediately reset)
    3. Client re-sends a v3 operation but the timeout is now 120sec.

    As a result, an application sees a 2min pause before a retry in case
    the server again does not reply. Whereas if a connection reset didn't
    change the timeout value, the client would have retried (the 3rd
    time) after 60secs.

    Signed-off-by: Olga Kornievskaia
    Signed-off-by: Anna Schumaker

    Olga Kornievskaia
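
    The scenario above can be illustrated with a small userspace model.
    This is only a sketch of the described behaviour, not the kernel
    change itself: struct retrans_timer and both helper functions are
    hypothetical names, and the 60-second starting value is taken from
    the example in the commit message.

```c
#include <stdbool.h>

/* Hypothetical model of a retransmit timer; names are illustrative only. */
struct retrans_timer {
    unsigned int timeout_secs;   /* current retransmit timeout */
};

/* Behaviour described as "current": every re-send doubles the timeout. */
static void resend_always_backoff(struct retrans_timer *t)
{
    t->timeout_secs *= 2;
}

/* Alternative: back off only when the previous timer actually expired. */
static void resend_backoff_on_expiry(struct retrans_timer *t, bool timer_expired)
{
    if (timer_expired)
        t->timeout_secs *= 2;
}
```

    Starting from 60 seconds, a re-send caused by a connection reset
    (timer not expired) moves the first model to 120 seconds and leaves
    the second at 60 seconds.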
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
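
    For readers unfamiliar with the macro, here is a minimal userspace
    sketch of the pattern. The macro definition below is a stand-in
    modelled on the kernel's, assuming GCC 7+ or a recent Clang; on other
    compilers it expands to a no-op. steps_from() is a hypothetical
    example function, not code from this commit.

```c
/* Userspace stand-in for the kernel's fallthrough pseudo-keyword. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)  /* fallthrough */
#endif

/* Count how many steps run when starting from a given stage. */
static int steps_from(int stage)
{
    int steps = 0;

    switch (stage) {
    case 0:
        steps++;
        fallthrough;    /* deliberate: stage 0 also runs stage 1 */
    case 1:
        steps++;
        fallthrough;    /* deliberate: stage 1 also runs stage 2 */
    case 2:
        steps++;
        break;
    default:
        break;
    }
    return steps;
}
```

    Unlike a comment, the statement form lets the compiler verify that
    each fall-through is intentional when -Wimplicit-fallthrough is
    enabled.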
     

12 Jun, 2020

6 commits


15 May, 2020

1 commit

  • Each rpc_client has a cl_clid which is allocated from a global ida, and
    a debugfs directory which is named after cl_clid.

    We're releasing the cl_clid before we free the debugfs directory named
    after it. As soon as the cl_clid is released, that value is available
    for another newly created client.

    That leaves a window where another client may attempt to create a new
    debugfs directory with the same name as the not-yet-deleted debugfs
    directory from the dying client. Symptoms are log messages like

    Directory 4 with parent 'rpc_clnt' already present!

    Fixes: 7c4310ff5642 "SUNRPC: defer slow parts of rpc_free_client() to a workqueue."
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    J. Bruce Fields
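
    The window described above can be reproduced with a toy model. This
    is a sketch under stated assumptions: the two arrays stand in for
    the global ida and for debugfs directories named after an id, and
    every function name here is hypothetical rather than a kernel API.

```c
#include <stdbool.h>

#define MAX_IDS 8

static bool id_in_use[MAX_IDS];     /* stand-in for the global ida */
static bool dir_present[MAX_IDS];   /* stand-in for directories named by id */

/* Allocate the lowest free id, or return -1 if none is free. */
static int id_alloc(void)
{
    for (int i = 0; i < MAX_IDS; i++) {
        if (!id_in_use[i]) {
            id_in_use[i] = true;
            return i;
        }
    }
    return -1;
}

static void id_free(int id)
{
    id_in_use[id] = false;
}

/* Create a directory named after id; fails if that name still exists. */
static bool dir_create(int id)
{
    if (dir_present[id])
        return false;   /* the "already present!" case */
    dir_present[id] = true;
    return true;
}

static void dir_remove(int id)
{
    dir_present[id] = false;
}
```

    Calling id_free() before dir_remove() lets a second id_alloc() hand
    out the same id while the old directory still exists, so dir_create()
    fails. Removing the directory first and releasing the id last closes
    that window in this model.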
     

12 May, 2020

1 commit

  • Ensure that signalled ASYNC rpc_tasks exit immediately instead of
    spinning until a timeout (or forever).

    To avoid checking for the signal flag on every scheduler iteration,
    the check is instead introduced in the client's finite state
    machine.

    Signed-off-by: Chuck Lever
    Fixes: ae67bd3821bb ("SUNRPC: Fix up task signalling")
    Signed-off-by: Trond Myklebust

    Chuck Lever
     

11 May, 2020

1 commit

  • Parts of rpc_free_client() were recently moved to
    a separate rpc_free_client_work(). This introduced
    a use-after-free as rpc_clnt_remove_pipedir() calls
    rpc_net_ns(), and that uses clnt->cl_xprt which has already
    been freed.
    So move the call to xprt_put() after the call to
    rpc_clnt_remove_pipedir().

    Reported-by: syzbot+22b5ef302c7c40d94ea8@syzkaller.appspotmail.com
    Fixes: 7c4310ff5642 ("SUNRPC: defer slow parts of rpc_free_client() to a workqueue.")
    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    NeilBrown
     

29 Apr, 2020

1 commit

  • The rpciod workqueue is on the write-out path for freeing dirty memory,
    so it is important that it never block waiting for memory to be
    allocated - this can lead to a deadlock.

    rpc_execute() - which is often called by an rpciod work item - calls
    rpc_task_release_client() which can lead to rpc_free_client().

    rpc_free_client() makes two calls which could potentially block waiting
    for memory allocation.

    rpc_clnt_debugfs_unregister() calls into debugfs and will block while
    any of the debugfs files are being accessed. In particular it can block
    while any of the 'open' methods are being called and all of these use
    malloc for one thing or another. So this can deadlock if the memory
    allocation waits for NFS to complete some writes via rpciod.

    rpc_clnt_remove_pipedir() can take the inode_lock() and while it isn't
    obvious that memory allocations can happen while the lock is held, it is
    safer to assume they might and to not let rpciod call
    rpc_clnt_remove_pipedir().

    So this patch moves these two calls (together with the final kfree() and
    rpciod_down()) into a work-item to be run from the system work-queue.
    rpciod can continue its important work, and the final stages of the free
    can happen whenever they happen.

    I have seen this deadlock on a 4.12 based kernel where debugfs used
    synchronize_srcu() when removing objects. synchronize_srcu() requires a
    workqueue and there were no free worker threads and none could be
    allocated. While debugfs no longer uses SRCU, I believe the deadlock
    is still possible.

    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    NeilBrown
     

22 Apr, 2020

1 commit

  • rpc_clnt_test_and_add_xprt() invokes rpc_call_null_helper(), which
    returns the value of rpc_run_task() to "task". Since rpc_run_task()
    can never return an ERR pointer, there is no need for the
    IS_ERR() check on "task" here, so remove it.

    Fixes: 7f554890587c ("SUNRPC: Allow addition of new transports to a struct rpc_clnt")
    Signed-off-by: Xiyu Yang
    Signed-off-by: Xin Tan
    Signed-off-by: Trond Myklebust

    Xiyu Yang
     

08 Apr, 2020

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    Stable fixes:
    - Fix a page leak in nfs_destroy_unlinked_subrequests()

    - Fix use-after-free issues in nfs_pageio_add_request()

    - Fix new mount code constant_table array definitions

    - finish_automount() requires us to hold 2 refs to the mount record

    Features:
    - Improve the accuracy of telldir/seekdir by using 64-bit cookies
    when possible.

    - Allow one RDMA active connection and several zombie connections to
    prevent blocking if the remote server is unresponsive.

    - Limit the size of the NFS access cache by default

    - Reduce the number of references to credentials that are taken by
    NFS

    - pNFS files and flexfiles drivers now support per-layout segment
    COMMIT lists.

    - Enable partial-file layout segments in the pNFS/flexfiles driver.

    - Add support for CB_RECALL_ANY to the pNFS flexfiles layout type

    - pNFS/flexfiles Report NFS4ERR_DELAY and NFS4ERR_GRACE errors from
    the DS using the layouterror mechanism.

    Bugfixes and cleanups:
    - SUNRPC: Fix krb5p regressions

    - Don't specify NFS version in "UDP not supported" error

    - nfsroot: set tcp as the default transport protocol

    - pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()

    - alloc_nfs_open_context() must use the file cred when available

    - Fix locking when dereferencing the delegation cred

    - Fix memory leaks in O_DIRECT when nfs_get_lock_context() fails

    - Various clean ups of the NFS O_DIRECT commit code

    - Clean up RDMA connect/disconnect

    - Replace zero-length arrays with C99-style flexible arrays"

    * tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (86 commits)
    NFS: Clean up process of marking inode stale.
    SUNRPC: Don't start a timer on an already queued rpc task
    NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()
    NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode()
    NFS: Beware when dereferencing the delegation cred
    NFS: Add a module parameter to set nfs_mountpoint_expiry_timeout
    NFS: finish_automount() requires us to hold 2 refs to the mount record
    NFS: Fix a few constant_table array definitions
    NFS: Try to join page groups before an O_DIRECT retransmission
    NFS: Refactor nfs_lock_and_join_requests()
    NFS: Reverse the submission order of requests in __nfs_pageio_add_request()
    NFS: Clean up nfs_lock_and_join_requests()
    NFS: Remove the redundant function nfs_pgio_has_mirroring()
    NFS: Fix memory leaks in nfs_pageio_stop_mirroring()
    NFS: Fix a request reference leak in nfs_direct_write_clear_reqs()
    NFS: Fix use-after-free issues in nfs_pageio_add_request()
    NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()
    NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()
    NFS: Remove unused FLUSH_SYNC support in nfs_initiate_pgio()
    pNFS/flexfiles: Specify the layout segment range in LAYOUTGET
    ...

    Linus Torvalds
     

17 Mar, 2020

1 commit


16 Mar, 2020

2 commits


15 Jan, 2020

1 commit

  • The xprtrdma connect logic can return -EPROTO if the underlying
    device or network path does not support RDMA. This can happen
    after a device removal/insertion.

    - When SOFTCONN is set, EPROTO is a permanent error.

    - When SOFTCONN is not set, EPROTO is treated as a temporary error.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
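
    The two rules above amount to a small decision function. The sketch
    below is a userspace illustration only: enum connect_verdict and
    classify_eproto() are hypothetical names, and treating every other
    error as retryable is a simplifying assumption that the commit
    message does not make.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome type; not a kernel definition. */
enum connect_verdict { CONNECT_RETRY, CONNECT_FAIL };

/*
 * Decide how to treat a connect error given the SOFTCONN setting.
 * Only the -EPROTO case reflects the commit message; other errors are
 * mapped to CONNECT_RETRY purely to keep the example short.
 */
static enum connect_verdict classify_eproto(int err, bool softconn)
{
    if (err == -EPROTO && softconn)
        return CONNECT_FAIL;    /* permanent error */
    return CONNECT_RETRY;       /* temporary error */
}
```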
     

18 Nov, 2019

1 commit

  • NFSoRDMA Client Updates for Linux 5.5

    New Features:
    - New tracepoints for congestion control and Local Invalidate WRs

    Bugfixes and Cleanups:
    - Eliminate log noise in call_reserveresult
    - Fix unstable connections after a reconnect
    - Clean up some code duplication
    - Close race between waking a sender and posting a receive
    - Fix MR list corruption, and clean up MR usage
    - Remove unused rpcrdma_sendctx fields
    - Try to avoid DMA mapping pages if it is too costly
    - Wake pending tasks if connection fails
    - Replace some dprintk()s with tracepoints

    Trond Myklebust
     

04 Nov, 2019

1 commit

  • NFSv2, v3 and NFSv4 servers often have duplicate replay caches that look
    at the source port when deciding whether or not an RPC call is a replay
    of a previous call. This requires clients to perform strange TCP gymnastics
    in order to ensure that when they reconnect to the server, they bind
    to the same source port.

    NFSv4.1 and NFSv4.2 have sessions that provide proper replay semantics,
    that do not look at the source port of the connection. This patch therefore
    ensures they can ignore the rebind requirement.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

24 Oct, 2019

1 commit


27 Sep, 2019

1 commit

  • Pull NFS client updates from Anna Schumaker:
    "Stable bugfixes:
    - Dequeue the request from the receive queue while we're re-encoding
    # v4.20+
    - Fix buffer handling of GSS MIC without slack # 5.1

    Features:
    - Increase xprtrdma maximum transport header and slot table sizes
    - Add support for nfs4_call_sync() calls using a custom
    rpc_task_struct
    - Optimize the default readahead size
    - Enable pNFS filelayout LAYOUTGET on OPEN

    Other bugfixes and cleanups:
    - Fix possible null-pointer dereferences and memory leaks
    - Various NFS over RDMA cleanups
    - Various NFS over RDMA comment updates
    - Don't receive TCP data into a reset request buffer
    - Don't try to parse incomplete RPC messages
    - Fix congestion window race with disconnect
    - Clean up pNFS return-on-close error handling
    - Fixes for NFS4ERR_OLD_STATEID handling"

    * tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits)
    pNFS/filelayout: enable LAYOUTGET on OPEN
    NFS: Optimise the default readahead size
    NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
    NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
    NFSv4: Fix OPEN_DOWNGRADE error handling
    pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
    NFSv4: Add a helper to increment stateid seqids
    NFSv4: Handle RPC level errors in LAYOUTRETURN
    NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
    NFSv4: Clean up pNFS return-on-close error handling
    pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
    NFS: remove unused check for negative dentry
    NFSv3: use nfs_add_or_obtain() to create and reference inodes
    NFS: Refactor nfs_instantiate() for dentry referencing callers
    SUNRPC: Fix congestion window race with disconnect
    SUNRPC: Don't try to parse incomplete RPC messages
    SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic
    SUNRPC: Fix buffer handling of GSS MIC without slack
    SUNRPC: RPC level errors should always set task->tk_rpc_status
    SUNRPC: Don't receive TCP data into a request buffer that has been reset
    ...

    Linus Torvalds
     

21 Sep, 2019

1 commit


18 Sep, 2019

2 commits


27 Aug, 2019

3 commits


19 Jul, 2019

1 commit


18 Jul, 2019

1 commit


13 Jul, 2019

1 commit


07 Jul, 2019

1 commit

  • With NFSv4.1, different network connections need to be explicitly
    bound to a session. During session startup, this is not possible
    so only a single connection must be used for session startup.

    So add a task flag to disable the default round-robin choice of
    connections (when nconnect > 1) and force the use of a single
    connection.
    Then use that flag on all requests for session management - for
    consistency, include NFSv4.0 management (SETCLIENTID) and session
    destruction.

    Reported-by: Chuck Lever
    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    NeilBrown
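
    The effect of such a flag can be sketched with a simplified model of
    connection selection. This is an illustration under assumptions, not
    the kernel implementation: struct conn_pool, pick_connection() and
    the no_round_robin parameter are hypothetical names, and pinning to
    index 0 is an arbitrary choice for the "single connection".

```c
#include <stdbool.h>

/* Hypothetical model of a client with several connections (nconnect > 1). */
struct conn_pool {
    unsigned int nconnect;  /* number of connections, must be >= 1 */
    unsigned int next;      /* round-robin cursor */
};

/*
 * Pick a connection index. When no_round_robin is set, always use
 * connection 0 and leave the cursor untouched; otherwise rotate.
 */
static unsigned int pick_connection(struct conn_pool *p, bool no_round_robin)
{
    unsigned int idx;

    if (no_round_robin)
        return 0;

    idx = p->next;
    p->next = (p->next + 1) % p->nconnect;
    return idx;
}
```

    With nconnect set to 4, ordinary requests cycle through 0, 1, 2, 3,
    while requests carrying the flag always land on connection 0
    regardless of where the cursor is.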