12 Nov, 2020

2 commits


17 Sep, 2020

1 commit

  • nfs_readdir_page_filler() iterates over entries in a directory, reusing
    the same security label buffer, but does not reset the buffer's length.
    This causes decode_attr_security_label() to return -ERANGE if an entry's
    security label is longer than the previous one's. This error, in
    nfs4_decode_dirent(), only gets passed up as -EAGAIN, which causes another
    failed attempt to copy into the buffer. The second error is ignored and
    the remaining entries do not show up in ls, or more precisely in the
    results of the getdents64() syscall.

    Reproduce by creating multiple files in NFS and giving one of the later
    files a longer security label. ls will not see that file nor any that are
    added afterwards, though they will exist on the backend.

    In nfs_readdir_page_filler(), reset the security label buffer length
    before every reuse.

    Signed-off-by: Jeffrey Mitchell
    Fixes: b4487b935452 ("nfs: Fix getxattr kernel panic and memory overflow")
    Signed-off-by: Trond Myklebust

    Jeffrey Mitchell
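
    The failure mode described above can be sketched in userspace. This is
    a minimal model, not the kernel code: struct label_buf, decode_label(),
    decode_all() and LABEL_MAXLEN are hypothetical names, and the decoder
    is assumed to treat the length field as both "space available" on
    input and "bytes decoded" on output.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define LABEL_MAXLEN 64 /* hypothetical buffer capacity */

/* Hypothetical stand-in for the reused security label buffer. */
struct label_buf {
	char data[LABEL_MAXLEN];
	size_t len; /* in: space available, out: bytes decoded */
};

/* Mimics a decoder that treats buf->len as the available space. */
static int decode_label(struct label_buf *buf, const char *label)
{
	size_t need = strlen(label);

	if (need > buf->len)
		return -ERANGE; /* label does not fit in the advertised space */
	memcpy(buf->data, label, need);
	buf->len = need; /* shrinks the advertised space for the next reuse */
	return 0;
}

/*
 * Decodes every label into one shared buffer and returns how many entries
 * were decoded before the first error. With reset_len set, the length is
 * restored before each reuse, which is the shape of the fix.
 */
static size_t decode_all(const char *const *labels, size_t n, int reset_len)
{
	struct label_buf buf = { .len = LABEL_MAXLEN };
	size_t i;

	assert(labels != NULL);
	for (i = 0; i < n; i++) {
		if (reset_len)
			buf.len = LABEL_MAXLEN;
		if (decode_label(&buf, labels[i]) < 0)
			break;
	}
	return i;
}
```

    With the labels "short", "a_much_longer_label" and "x", the model stops
    after the first entry when the length is not reset and decodes all
    three when it is.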
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove fall-through
    markings where they are unnecessary.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
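
    A userspace sketch of how the pseudo-keyword reads in a switch
    statement. The macro definition approximates the kernel's; the
    preprocessor guards are added here for portability, and implied_bits()
    is a hypothetical example function.

```c
#include <assert.h>

/*
 * Userspace approximation of the kernel's fallthrough pseudo-keyword.
 * The fallback branch covers compilers without the attribute.
 */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: count the bits implied by a privilege level. */
static int implied_bits(int level)
{
	int bits = 0;

	switch (level) {
	case 2:
		bits++; /* level 2 implies everything in level 1 */
		fallthrough;
	case 1:
		bits++; /* level 1 implies everything in level 0 */
		fallthrough;
	case 0:
		bits++;
		break;
	default:
		break;
	}
	return bits;
}
```

    Unlike a comment, the statement form is visible to the compiler, so
    an accidental fall-through elsewhere can still be flagged by
    -Wimplicit-fallthrough.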
     

14 Jul, 2020

2 commits


08 Apr, 2020

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    Stable fixes:
    - Fix a page leak in nfs_destroy_unlinked_subrequests()

    - Fix use-after-free issues in nfs_pageio_add_request()

    - Fix new mount code constant_table array definitions

    - finish_automount() requires us to hold 2 refs to the mount record

    Features:
    - Improve the accuracy of telldir/seekdir by using 64-bit cookies
    when possible.

    - Allow one RDMA active connection and several zombie connections to
    prevent blocking if the remote server is unresponsive.

    - Limit the size of the NFS access cache by default

    - Reduce the number of references to credentials that are taken by
    NFS

    - pNFS files and flexfiles drivers now support per-layout segment
    COMMIT lists.

    - Enable partial-file layout segments in the pNFS/flexfiles driver.

    - Add support for CB_RECALL_ANY to the pNFS flexfiles layout type

    - pNFS/flexfiles Report NFS4ERR_DELAY and NFS4ERR_GRACE errors from
    the DS using the layouterror mechanism.

    Bugfixes and cleanups:
    - SUNRPC: Fix krb5p regressions

    - Don't specify NFS version in "UDP not supported" error

    - nfsroot: set tcp as the default transport protocol

    - pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()

    - alloc_nfs_open_context() must use the file cred when available

    - Fix locking when dereferencing the delegation cred

    - Fix memory leaks in O_DIRECT when nfs_get_lock_context() fails

    - Various clean ups of the NFS O_DIRECT commit code

    - Clean up RDMA connect/disconnect

    - Replace zero-length arrays with C99-style flexible arrays"

    * tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (86 commits)
    NFS: Clean up process of marking inode stale.
    SUNRPC: Don't start a timer on an already queued rpc task
    NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()
    NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode()
    NFS: Beware when dereferencing the delegation cred
    NFS: Add a module parameter to set nfs_mountpoint_expiry_timeout
    NFS: finish_automount() requires us to hold 2 refs to the mount record
    NFS: Fix a few constant_table array definitions
    NFS: Try to join page groups before an O_DIRECT retransmission
    NFS: Refactor nfs_lock_and_join_requests()
    NFS: Reverse the submission order of requests in __nfs_pageio_add_request()
    NFS: Clean up nfs_lock_and_join_requests()
    NFS: Remove the redundant function nfs_pgio_has_mirroring()
    NFS: Fix memory leaks in nfs_pageio_stop_mirroring()
    NFS: Fix a request reference leak in nfs_direct_write_clear_reqs()
    NFS: Fix use-after-free issues in nfs_pageio_add_request()
    NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()
    NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()
    NFS: Remove unused FLUSH_SYNC support in nfs_initiate_pgio()
    pNFS/flexfiles: Specify the layout segment range in LAYOUTGET
    ...

    Linus Torvalds
     

07 Apr, 2020

1 commit


24 Mar, 2020

1 commit


16 Mar, 2020

4 commits

  • The current codebase makes use of the zero-length array language
    extension to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:

    struct foo {
            int stuff;
            struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning
    in case the flexible array does not occur last in the structure, which
    will help us prevent some kind of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.

    Also, notice that dynamic memory allocations won't be affected by
    this change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Trond Myklebust

    Gustavo A. R. Silva
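
    A small userspace sketch of allocating a structure that ends in a
    flexible array member. struct foo follows the commit message; the
    definition of struct boo and the foo_alloc() helper are assumptions
    made for illustration. Kernel code would normally use the
    struct_size() helper for the size computation instead of open-coding
    it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element type; the commit message leaves struct boo undefined. */
struct boo {
	int value;
};

struct foo {
	int stuff;
	struct boo array[]; /* flexible array member, must be last */
};

/* Allocates a zeroed struct foo with room for count trailing elements. */
static struct foo *foo_alloc(size_t count)
{
	struct foo *f;

	/* Guard against overflow in the size computation. */
	if (count > (SIZE_MAX - sizeof(*f)) / sizeof(f->array[0]))
		return NULL;
	f = calloc(1, sizeof(*f) + count * sizeof(f->array[0]));
	if (f)
		f->stuff = (int)count; /* sketch only: assumes count fits in int */
	return f;
}
```

    sizeof(*f) covers only the fixed part of the structure, so the
    allocation size is the same whether the trailing member is declared as
    array[0] or array[].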
     
  • In function nfs_permission:
    1. The rcu_read_lock and rcu_read_unlock around nfs_do_access
    are unnecessary because the RCU-critical data structure is already
    protected in the subsidiary function nfs_access_get_cached_rcu. No other
    data structure needs rcu_read_lock in nfs_do_access.

    2. Calling nfs_do_access once is enough, because:
    2-1. When mask has the MAY_NOT_BLOCK bit set,
    the second call to nfs_do_access will not happen.

    2-2. When mask does not have the MAY_NOT_BLOCK bit set,
    the second call to nfs_do_access happens if res == -ECHILD, which
    means the first nfs_do_access returned after the statement
    if (!may_block). The second call goes through the same procedure
    again, except that it continues the work after if (!may_block).
    That work can be performed by a single call to nfs_do_access
    without mangling the mask flag.

    Tested on x86_64
    Signed-off-by: Zhouyi Zhou
    Signed-off-by: Trond Myklebust

    Zhouyi Zhou
     
  • Currently, we have no real limit on the access cache size (we set it
    to ULONG_MAX). That can lead to credentials getting pinned for a
    very long time on lots of files if you have a system with a lot of
    memory.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • When we're running as a 64-bit architecture and are not running in
    32-bit compatibility mode, it is better to use the 64-bit readdir
    cookies that are supplied by the server. Doing so improves the accuracy
    of telldir()/seekdir(), particularly when the directory is changing,
    for instance, when doing 'rm -rf'.

    We still fall back to using the 32-bit offsets on 32-bit architectures
    and when in compatibility mode.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
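
    From userspace, the behaviour this commit targets is the
    telldir()/seekdir() round trip. The sketch below uses only the POSIX
    directory API; resume_is_stable() is a hypothetical helper that checks
    whether seeking back to a saved position returns the same entry again.
    It exercises the interface on any directory and does not by itself
    demonstrate the NFS cookie change.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* for telldir()/seekdir() under strict modes */
#endif
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Reads one entry, records the position, reads the next entry, seeks
 * back and reads again. Returns 1 if both reads yield the same name,
 * 0 if they differ, and -1 if the directory cannot be read.
 */
static int resume_is_stable(const char *path)
{
	char first[256] = "";
	struct dirent *de;
	long mark;
	int same;
	DIR *dir;

	assert(path != NULL);
	dir = opendir(path);
	if (!dir)
		return -1;
	if (!readdir(dir)) {
		closedir(dir);
		return -1;
	}
	mark = telldir(dir); /* opaque position after the first entry */
	de = readdir(dir);
	if (de)
		strncpy(first, de->d_name, sizeof(first) - 1);
	seekdir(dir, mark); /* return to the recorded position */
	de = readdir(dir);
	same = de ? strcmp(first, de->d_name) == 0 : first[0] == '\0';
	closedir(dir);
	return same;
}
```

    On a directory that is not being modified, resume_is_stable(".") is
    expected to return 1. The case the commit message cares about is a
    directory that changes between the telldir() and seekdir() calls.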
     

21 Feb, 2020

1 commit

  • This patch fixes the following sparse error:
    fs/nfs/dir.c:2353:14: error: incompatible types in comparison expression (different address spaces):
    fs/nfs/dir.c:2353:14: struct list_head [noderef] *
    fs/nfs/dir.c:2353:14: struct list_head *

    Signed-off-by: Madhuparna Bhowmik
    Signed-off-by: Paul E. McKenney

    Madhuparna Bhowmik
     

13 Feb, 2020

1 commit

  • If a dentry was not initially looked up while we were holding a
    delegation, then we do still need to revalidate that it still holds
    the same name. If there are multiple hard links to the same file,
    then all the hard links need validation.

    Reported-by: Benjamin Coddington
    Signed-off-by: Trond Myklebust
    Reviewed-by: Benjamin Coddington
    Tested-by: Benjamin Coddington
    [Anna: Put nfs_unset_verifier_delegated() under CONFIG_NFS_V4]
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     

10 Feb, 2020

1 commit

  • In order to avoid having our dentry revalidation race with an update
    of the directory on the server, we need to store the verifier before
    the RPC calls to LOOKUP and READDIR.

    Signed-off-by: Trond Myklebust
    Reviewed-by: Benjamin Coddington
    Tested-by: Benjamin Coddington
    Signed-off-by: Anna Schumaker

    Trond Myklebust
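
    The ordering argument can be illustrated with a toy model. Everything
    here is hypothetical (struct toy_dir, toy_lookup(),
    toy_dentry_is_trusted()); it only shows why a verifier sampled after
    the RPC can hide a directory change that happened while the RPC was in
    flight.

```c
#include <assert.h>
#include <stddef.h>

/* Toy directory: change_attr stands in for the cached change attribute. */
struct toy_dir {
	unsigned long change_attr;
};

/*
 * Simulates one LOOKUP. If sample_before is set, the verifier is read
 * before the simulated RPC; otherwise it is read afterwards. The
 * directory may change while the RPC is in flight.
 */
static unsigned long toy_lookup(struct toy_dir *dir, int sample_before,
				int changed_during_rpc)
{
	unsigned long verf = dir->change_attr;

	if (changed_during_rpc)
		dir->change_attr++; /* update seen while the RPC was in flight */
	if (!sample_before)
		verf = dir->change_attr; /* too late: matches the new state */
	return verf;
}

/* A dentry is trusted only while its verifier matches the directory. */
static int toy_dentry_is_trusted(const struct toy_dir *dir, unsigned long verf)
{
	assert(dir != NULL);
	return verf == dir->change_attr;
}
```

    In this model a verifier sampled before the RPC no longer matches
    after a concurrent change, so the dentry gets revalidated. A verifier
    sampled afterwards matches and the possibly stale result is trusted.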
     

04 Feb, 2020

6 commits

  • When the directory is large and it's being modified by one client
    while another client is doing the 'ls -l' on the same directory then
    the cache page invalidation from nfs_force_use_readdirplus causes
    the reading client to keep restarting READDIRPLUS from cookie 0
    which causes the 'ls -l' to take a very long time to complete,
    possibly never completing.

    Currently when nfs_force_use_readdirplus is called to switch from
    READDIR to READDIRPLUS, it invalidates all the cached pages of the
    directory. This cache page invalidation causes the next nfs_readdir
    to re-read the directory content from cookie 0.

    This patch optimises the cache invalidation in
    nfs_force_use_readdirplus by only truncating the cached pages from
    the last page index accessed to the end of the file. It also marks the
    inode to delay invalidating all the cached pages of the directory
    until the next initial nfs_readdir of the next 'ls' instance.

    Signed-off-by: Dai Ngo
    Reviewed-by: Trond Myklebust
    [Anna - Fix conflicts with Trond's readdir patches]
    [Anna - Remove redundant call to nfs_zap_mapping()]
    [Anna - Replace d_inode(file_dentry(desc->file)) with file_inode(desc->file)]
    Signed-off-by: Anna Schumaker

    Dai Ngo
     
  • Now that the page cache locking is repaired, we should be able to
    switch to using iterate_shared() for improved concurrency when
    doing readdir().

    Signed-off-by: Trond Myklebust
    Reviewed-by: Benjamin Coddington
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • The directory strings stored in the readdir cache may be used with
    printk(), so it is better to ensure they are nul-terminated.

    Signed-off-by: Trond Myklebust
    Reviewed-by: Benjamin Coddington
    Signed-off-by: Anna Schumaker

    Trond Myklebust
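
    A userspace sketch of the idea: a name that arrives with an explicit
    length is copied into a buffer one byte larger and terminated, so it
    can safely be passed to "%s"-style formatting. dup_name_nul() is a
    hypothetical helper, similar in spirit to the kernel's kmemdup_nul();
    whether the commit uses that helper is not stated in the message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate len bytes of name and append a nul. */
static char *dup_name_nul(const char *name, size_t len)
{
	char *copy;

	assert(name != NULL);
	copy = malloc(len + 1);
	if (!copy)
		return NULL;
	memcpy(copy, name, len);
	copy[len] = '\0'; /* safe to hand to printf("%s") afterwards */
	return copy;
}
```

    The caller owns the returned string and releases it with free().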
     
  • When an NFS directory page cache page is removed from the page cache,
    its contents are freed through a call to nfs_readdir_clear_array().
    To prevent the removal of the page cache entry until after we've
    finished reading it, we must take the page lock.

    Fixes: 11de3b11e08c ("NFS: Fix a memory leak in nfs_readdir")
    Cc: stable@vger.kernel.org # v2.6.37+
    Signed-off-by: Trond Myklebust
    Reviewed-by: Benjamin Coddington
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • nfs_readdir_xdr_to_array() must not exit without having initialised
    the array, so that the page cache deletion routines can safely
    call nfs_readdir_clear_array().
    Furthermore, we should ensure that if we exit nfs_readdir_filler()
    with an error, we free up any page contents to prevent a leak
    if we try to fill the page again.

    Fixes: 11de3b11e08c ("NFS: Fix a memory leak in nfs_readdir")
    Cc: stable@vger.kernel.org # v2.6.37+
    Signed-off-by: Trond Myklebust
    Reviewed-by: Benjamin Coddington
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • We do not need to have the rcu lookup method fail in the case where
    the fsuid/fsgid and supplemental groups match.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     

25 Jan, 2020

1 commit


15 Jan, 2020

2 commits


27 Sep, 2019

1 commit

  • Pull NFS client updates from Anna Schumaker:
    "Stable bugfixes:
    - Dequeue the request from the receive queue while we're re-encoding
    # v4.20+
    - Fix buffer handling of GSS MIC without slack # 5.1

    Features:
    - Increase xprtrdma maximum transport header and slot table sizes
    - Add support for nfs4_call_sync() calls using a custom
    rpc_task_struct
    - Optimize the default readahead size
    - Enable pNFS filelayout LAYOUTGET on OPEN

    Other bugfixes and cleanups:
    - Fix possible null-pointer dereferences and memory leaks
    - Various NFS over RDMA cleanups
    - Various NFS over RDMA comment updates
    - Don't receive TCP data into a reset request buffer
    - Don't try to parse incomplete RPC messages
    - Fix congestion window race with disconnect
    - Clean up pNFS return-on-close error handling
    - Fixes for NFS4ERR_OLD_STATEID handling"

    * tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits)
    pNFS/filelayout: enable LAYOUTGET on OPEN
    NFS: Optimise the default readahead size
    NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
    NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
    NFSv4: Fix OPEN_DOWNGRADE error handling
    pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
    NFSv4: Add a helper to increment stateid seqids
    NFSv4: Handle RPC level errors in LAYOUTRETURN
    NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
    NFSv4: Clean up pNFS return-on-close error handling
    pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
    NFS: remove unused check for negative dentry
    NFSv3: use nfs_add_or_obtain() to create and reference inodes
    NFS: Refactor nfs_instantiate() for dentry referencing callers
    SUNRPC: Fix congestion window race with disconnect
    SUNRPC: Don't try to parse incomplete RPC messages
    SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic
    SUNRPC: Fix buffer handling of GSS MIC without slack
    SUNRPC: RPC level errors should always set task->tk_rpc_status
    SUNRPC: Don't receive TCP data into a request buffer that has been reset
    ...

    Linus Torvalds
     

21 Sep, 2019

2 commits


19 Aug, 2019

1 commit


13 Jul, 2019

1 commit

  • This reverts commit be4c2d4723a4a637f0d1b4f7c66447141a4b3564.

    That commit caused a severe memory leak in nfs_readdir_make_qstr().

    When listing a directory with more than 100 files (this is how many
    struct nfs_cache_array_entry elements fit in one 4kB page), all
    allocated file name strings past those 100 leak.

    The root of the leakage is that those string pointers are managed in
    pages which are never linked into the page cache.

    fs/nfs/dir.c puts pages into the page cache by calling
    read_cache_page(); the callback function nfs_readdir_filler() will
    then fill the given page struct which was passed to it, which is
    already linked in the page cache (by do_read_cache_page() calling
    add_to_page_cache_lru()).

    Commit be4c2d4723a4 added another (local) array of allocated pages, to
    be filled with more data, instead of discarding excess items received
    from the NFS server. Those additional pages can be used by the next
    nfs_readdir_filler() call (from within the same nfs_readdir() call).

    The leak happens when some of those additional pages are never used
    (copied to the page cache using copy_highpage()). The pages will be
    freed by nfs_readdir_free_pages(), but their contents will not. The
    commit did not invoke nfs_readdir_clear_array() (and doing so would
    have been dangerous, because it did not track which of those pages
    were already copied to the page cache, risking double free bugs).

    How to reproduce the leak:

    - Use a kernel with CONFIG_SLUB_DEBUG_ON.

    - Create a directory on an NFS mount with more than 100 files with
    names long enough to use the "kmalloc-32" slab (so we can easily
    look up the allocation counts):

    for i in `seq 110`; do touch ${i}_0123456789abcdef; done

    - Drop all caches:

    echo 3 >/proc/sys/vm/drop_caches

    - Check the allocation counter:

    grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
    30564391 nfs_readdir_add_to_array+0x73/0xd0 age=534558/4791307/6540952 pid=370-1048386 cpus=0-47 nodes=0-1

    - Request a directory listing and check the allocation counters again:

    ls
    [...]
    grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
    30564511 nfs_readdir_add_to_array+0x73/0xd0 age=207/4792999/6542663 pid=370-1048386 cpus=0-47 nodes=0-1

    There are now 120 new allocations.

    - Drop all caches and check the counters again:

    echo 3 >/proc/sys/vm/drop_caches
    grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
    30564401 nfs_readdir_add_to_array+0x73/0xd0 age=735/4793524/6543176 pid=370-1048386 cpus=0-47 nodes=0-1

    110 allocations are gone, but 10 have leaked and will never be freed.

    Unhelpfully, those allocations are explicitly excluded from KMEMLEAK,
    which is why my initial attempts with KMEMLEAK were not successful:

    /*
     * Avoid a kmemleak false positive. The pointer to the name is stored
     * in a page cache page which kmemleak does not scan.
     */
    kmemleak_not_leak(string->name);

    It would be possible to solve this bug without reverting the whole
    commit:

    - keep track of which pages were not used, and call
    nfs_readdir_clear_array() on them, or
    - manually link those pages into the page cache

    But for now I have decided to just revert the commit, because the real
    fix would require complex considerations, risking more dangerous
    (crash) bugs, which may seem unsuitable for the stable branches.

    Signed-off-by: Max Kellermann
    Cc: stable@vger.kernel.org # v5.1+
    Signed-off-by: Trond Myklebust

    Max Kellermann
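
    The counter arithmetic in the reproduction can be captured by a small
    counting model. It is a sketch under simplifying assumptions: no real
    memory is allocated, a single "spare" page stands for every page that
    never reaches the page cache, and ENTRIES_PER_PAGE uses the figure of
    100 quoted above. struct toy_page and leaked_names() are hypothetical
    names.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRIES_PER_PAGE 100 /* figure quoted in the commit message */

/* Counts the name strings a page points to; no real memory is used. */
struct toy_page {
	size_t names;
};

/*
 * Models listing nfiles entries and then dropping caches. The cached
 * page is always cleared on eviction. Names held by the spare page are
 * released only if clear_spare is set. Returns the names still live.
 */
static size_t leaked_names(size_t nfiles, int clear_spare)
{
	struct toy_page cached = { 0 }, spare = { 0 };
	size_t live = 0;
	size_t i;

	for (i = 0; i < nfiles; i++) {
		struct toy_page *pg =
			cached.names < ENTRIES_PER_PAGE ? &cached : &spare;

		pg->names++; /* one name string allocated per entry */
		live++;
	}

	/* Page cache eviction clears the array of the cached page. */
	live -= cached.names;
	cached.names = 0;

	/* The spare page is freed; its names go away only if cleared. */
	if (clear_spare) {
		live -= spare.names;
		spare.names = 0;
	}
	return live;
}
```

    For 110 files the model leaves 10 names live when the spare page is
    freed without being cleared, matching the 10 leaked allocations in the
    reproduction, and none when it is cleared first.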
     

07 Jul, 2019

1 commit

  • If the client detects that close-to-open cache consistency has been
    violated, and that the file or directory has been changed on the
    server, then do a cache invalidation when we're done working with
    the file.
    The reason we don't do an immediate cache invalidation is that we
    want to avoid performance problems due to false positives. Also,
    note that we cannot guarantee cache consistency in this situation
    even if we do invalidate the cache.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

10 May, 2019

1 commit


21 Feb, 2019

4 commits

  • This fixes the typo in the comments of nfs_readdir_alloc_pages(),
    which arose because nfs_readdir_large_page and
    nfs_readdir_free_pagearray had been renamed.

    Signed-off-by: Liguang Zhang
    Signed-off-by: Trond Myklebust

    zhangliguang
     
  • This removes a redundant trailing semicolon.

    Fixes: c7944ebb9ce9 ("NFSv4: Fix lookup revalidate of regular files")
    Signed-off-by: Liguang Zhang
    Signed-off-by: Trond Myklebust

    zhangliguang
     
  • When listing very large directories via NFS, clients may take a long
    time to complete. Three factors are involved:

    First of all, ls and practically every other method of listing a
    directory, including python os.listdir and find, rely on libc readdir().
    However, readdir() only reads 32K of directory entries at a time, which
    means that if you have a lot of files in the same directory, it is going
    to take an insanely long time to read all the directory entries.

    Secondly, libc readdir() reads 32K of directory entries at a time, and
    in kernel space the 32K buffer is split into 8 pages. One NFS readdirplus
    rpc is issued per page, which results in many readdirplus rpc calls.

    Lastly, one NFS readdirplus rpc asks for 32K of data (filled by
    nfs_dentry) to fill one page (filled by dentry); we found that nearly
    one third of the data was wasted.

    To solve the above problems, a pagecache mechanism was introduced. One
    NFS readdirplus rpc asks for a large amount of data (more than 32k), the
    data can fill more than one page, and the cached pages can be used for
    the next readdir call. This reduces the number of readdirplus rpc calls
    and improves readdirplus performance.

    TESTING:
    Listing a very large directory (containing 300 thousand files) via NFS:

    time ls -l /nfs_mount | wc -l

    without the patch:
    300001
    real 1m53.524s
    user 0m2.314s
    sys 0m2.599s

    with the patch:
    300001
    real 0m23.487s
    user 0m2.305s
    sys 0m2.558s

    Improved performance: 79.6%
    readdirplus rpc calls decrease: 85%

    Signed-off-by: Liguang Zhang
    Signed-off-by: Trond Myklebust

    luanshi
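
    The relative improvement can be recomputed from the wall-clock times
    quoted in the message (1m53.524s = 113.524 s without the patch,
    23.487 s with it). percent_reduction() is a hypothetical helper used
    only for this arithmetic.

```c
#include <assert.h>

/* Relative reduction in elapsed time, as a percentage of the baseline. */
static double percent_reduction(double before_s, double after_s)
{
	assert(before_s > 0.0);
	return (before_s - after_s) / before_s * 100.0;
}
```

    percent_reduction(113.524, 23.487) evaluates to roughly 79.3, slightly
    below the 79.6% that the message reports.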
     
  • Fix up some compiler warnings about function parameters, etc not being
    correctly described or formatted.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

20 Dec, 2018

2 commits


01 Oct, 2018

1 commit