15 Aug, 2020

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Stable fixes:
    - pNFS: Don't return layout segments that are being used for I/O
    - pNFS: Don't move layout segments off the active list when being used for I/O

    Features:
    - NFS: Add support for user xattrs through the NFSv4.2 protocol
    - NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC
    - NFSv4.0 allow nconnect for v4.0

    Bugfixes and cleanups:
    - nfs: ensure correct writeback errors are returned on close()
    - nfs: nfs_file_write() should check for writeback errors
    - nfs: Fix getxattr kernel panic and memory overflow
    - NFS: Fix the pNFS/flexfiles mirrored read failover code
    - SUNRPC: dont update timeout value on connection reset
    - freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
    - sunrpc: destroy rpc_inode_cachep after unregister_filesystem"

    * tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (32 commits)
    NFS: Fix flexfiles read failover
    fs: nfs: delete repeated words in comments
    rpc_pipefs: convert comma to semicolon
    nfs: Fix getxattr kernel panic and memory overflow
    NFS: Don't return layout segments that are in use
    NFS: Don't move layouts to plh_return_segs list while in use
    NFS: Add layout segment info to pnfs read/write/commit tracepoints
    NFS: Add tracepoints for layouterror and layoutstats.
    NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close()
    SUNRPC dont update timeout value on connection reset
    nfs: nfs_file_write() should check for writeback errors
    nfs: ensure correct writeback errors are returned on close()
    NFSv4.2: xattr cache: get rid of cache discard work queue
    NFS: remove redundant initialization of variable result
    NFSv4.0 allow nconnect for v4.0
    freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
    sunrpc: destroy rpc_inode_cachep after unregister_filesystem
    NFSv4.2: add client side xattr caching.
    NFSv4.2: hook in the user extended attribute handlers
    NFSv4.2: add the extended attribute proc functions.
    ...

    Linus Torvalds
     

05 Aug, 2020

1 commit

  • The NFS_CONTEXT_ERROR_WRITE flag (as well as the check of said flag) was
    removed by commit 6fbda89b257f. The absence of an error check allows
    writes to be continually queued up for a server that may no longer be
    able to handle them. Fix it by adding an error check using the generic
    error reporting functions.

    Fixes: 6fbda89b257f ("NFS: Replace custom error reporting mechanism with generic one")
    Signed-off-by: Scott Mayhew
    Signed-off-by: Trond Myklebust

    Scott Mayhew
     

02 Aug, 2020

1 commit

  • nfs_wb_all() calls filemap_write_and_wait(), which uses
    filemap_check_errors() to determine the error to return.
    filemap_check_errors() only looks at the mapping->flags and will
    therefore only return either -ENOSPC or -EIO. To ensure that the
    correct error is returned on close(), nfs{,4}_file_flush() should call
    filemap_check_wb_err() which looks at the errseq value in
    mapping->wb_err without consuming it.
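
    The difference between the two checks can be illustrated with a toy
    Python model. This is a sketch, not kernel code: the Mapping class and
    its method names are hypothetical stand-ins, and the errseq mechanism
    is reduced to "remember the most recent error".

```python
import errno


class Mapping:
    """Toy stand-in for a page cache mapping (hypothetical, not kernel code)."""

    def __init__(self):
        self.flag_enospc = False  # models the ENOSPC bit in mapping->flags
        self.flag_eio = False     # models the EIO bit in mapping->flags
        self.wb_err = 0           # models the error recorded in mapping->wb_err

    def set_error(self, err):
        # Record the precise error, and fold it into one of the two flag bits.
        self.wb_err = err
        if err == -errno.ENOSPC:
            self.flag_enospc = True
        else:
            self.flag_eio = True

    def check_errors(self):
        # Flag-based check: can only report -ENOSPC or -EIO, and clears the
        # flag it reports.
        ret = 0
        if self.flag_enospc:
            self.flag_enospc = False
            ret = -errno.ENOSPC
        if self.flag_eio:
            self.flag_eio = False
            ret = -errno.EIO
        return ret

    def check_wb_err(self):
        # errseq-style check: reports the recorded error without consuming it.
        return self.wb_err
```

    In this model, recording -EFBIG makes check_errors() report -EIO once
    and then nothing, while check_wb_err() keeps reporting -EFBIG.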

    Fixes: 6fbda89b257f ("NFS: Replace custom error reporting mechanism with
    generic one")
    Signed-off-by: Scott Mayhew
    Signed-off-by: Trond Myklebust

    Scott Mayhew
     

18 Jul, 2020

1 commit

  • Reverting commit d03727b248d0 "NFSv4 fix CLOSE not waiting for
    direct IO compeletion". That patch made fput(), by calling
    inode_dio_done() in nfs_file_release(), wait uninterruptibly
    for any outstanding direct IO to the file, but that wait on IO
    should be killable.

    The problem the patch was also trying to address, REMOVE returning
    ERR_ACCESS because the file is still open, is supposed to be resolved
    by the server returning ERR_FILE_OPEN rather than ERR_ACCESS.

    Signed-off-by: Olga Kornievskaia
    Signed-off-by: Anna Schumaker

    Olga Kornievskaia
     

26 Jun, 2020

1 commit

  • Figuring out the root cause for the REMOVE/CLOSE race and
    suggesting the solution was done by Neil Brown.

    Currently what happens is that direct IO calls hold a reference
    on the open context which is decremented as an asynchronous task
    in the nfs_direct_complete(). Before the reference is decremented,
    control is returned to the application which is free to close the
    file. When close is being processed, it decrements its reference
    on the open_context but since directIO still holds one, it doesn't
    send a close on the wire. It returns control to the application
    which is free to do other operations. For instance, it can delete a
    file. Direct IO then finally releases its reference and triggers
    an asynchronous close, which races with the REMOVE. On the server,
    REMOVE can be processed before the CLOSE, failing the REMOVE with
    EACCES as the file is still opened.
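
    The sequence above can be replayed with a toy reference-count model.
    This is a Python sketch, not kernel code: OpenContext, get(), put()
    and the wire list are hypothetical names. The model also fixes one
    possible ordering, whereas the real CLOSE and REMOVE race on the
    server.

```python
class OpenContext:
    """Toy model of an NFS open context (hypothetical, for illustration)."""

    def __init__(self, wire):
        self.refs = 1     # reference held by the open file
        self.wire = wire  # ordered list of operations sent to the server

    def get(self):
        self.refs += 1

    def put(self):
        # CLOSE goes on the wire only when the last reference is dropped.
        self.refs -= 1
        if self.refs == 0:
            self.wire.append("CLOSE")


def racy_sequence():
    wire = []
    ctx = OpenContext(wire)
    ctx.get()              # direct IO takes its own reference
    ctx.put()              # application close(): direct IO still holds a ref
    wire.append("REMOVE")  # application deletes the file
    ctx.put()              # direct IO completion finally drops its reference
    return wire
```

    In this model racy_sequence() yields REMOVE before CLOSE, which is the
    ordering that makes the server fail the REMOVE.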

    Signed-off-by: Olga Kornievskaia
    Suggested-by: Neil Brown
    CC: stable@vger.kernel.org
    Signed-off-by: Anna Schumaker

    Olga Kornievskaia
     

15 Jan, 2020

2 commits

  • Don't clear the NFS_CONTEXT_RESEND_WRITES flag until after calling
    nfs_commit_inode(). Otherwise, if nfs_commit_inode() returns an
    error, we end up with dirty pages in the page cache, but no tag
    to tell us that those pages need resending.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • swapon over NFS does not go through generic_swapfile_activate
    code path when setting up extents. This makes holes in NFS
    swapfiles possible which is not expected for swapon.

    Signed-off-by: Murphy Zhou
    Signed-off-by: Anna Schumaker

    Murphy Zhou
     

18 Nov, 2019

1 commit


21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

26 Apr, 2019

2 commits


21 Feb, 2019

3 commits

  • As the block and SCSI layouts can only read/write fixed-length
    blocks, we must perform read-modify-write when data to be written is
    not aligned to a block boundary or smaller than the block size.
    (612aa983a0410 pnfs: add flag to force read-modify-write in ->write_begin)

    The current code tries to see if we have to do read-modify-write
    on block-oriented pNFS layouts by just checking !PageUptodate(page),
    but the same condition also applies for overwriting of any uncached
    portions of existing files, making such operations excessively slow
    even if they are block-aligned.

    The change does not affect the optimization for modify-write-read
    cases (38c73044f5f4d NFS: read-modify-write page updating),
    because partial update of !PageUptodate() pages can only happen
    in layouts that can do arbitrary length read/write and never
    in block-based ones.
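
    The alignment rule from the first paragraph can be written down as a
    small Python sketch. It only models that rule; it is not the check in
    nfs_want_read_modify_write(), and the function name and parameters are
    hypothetical.

```python
def needs_read_modify_write(offset, length, block_size):
    """True when a write does not cover whole blocks on a block-based layout."""
    start_aligned = offset % block_size == 0
    end_aligned = (offset + length) % block_size == 0
    # A write that starts and ends on block boundaries replaces whole blocks,
    # so nothing has to be read first.
    return length > 0 and not (start_aligned and end_aligned)
```

    A block-aligned overwrite, such as 8192 bytes at offset 4096 with a
    4096-byte block size, needs no read under this rule, which is the case
    the patch stops penalizing.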

    Testing results:

    We ran fio on one of the pNFS clients running 4.20 kernel
    (vanilla and patched) in this configuration to read/write/overwrite
    files on the storage array, exported as pnfs share by the server.

    pNFS clients ---1G Ethernet--- pNFS server
    (HP DL360 G8)                 (HP DL360 G8)
          |                              |
          |                              |
          +------8G Fiber Channel--------+
                         |
                   Storage Array
                    (HP P6350)

    Throughput of overwrite (both buffered and O_SYNC) is noticeably
    improved.

    Ops.     |block size|   Throughput   |
             |  (KiB)   |    (MiB/s)     |
             |          |  4.20 | patched|
    ---------+----------+----------------+
    buffered |         4|  21.3 |  232   |
    overwrite|        32|  22.2 |  256   |
             |       512|  22.4 |  260   |
    ---------+----------+----------------+
    O_SYNC   |         4|  3.84 |  4.77  |
    overwrite|        32|  12.2 |  32.0  |
             |       512|  18.5 |  152   |
    ---------+----------+----------------+

    Read and write (buffered and O_SYNC) by the same client remain unchanged
    by the patch either negatively or positively, as they should do.

    Ops.     |block size|   Throughput   |
             |  (KiB)   |    (MiB/s)     |
             |          |  4.20 | patched|
    ---------+----------+----------------+
    read     |         4|  548  |  550   |
             |        32|  547  |  551   |
             |       512|  548  |  551   |
    ---------+----------+----------------+
    buffered |         4|  237  |  244   |
    write    |        32|  261  |  268   |
             |       512|  265  |  272   |
    ---------+----------+----------------+
    O_SYNC   |         4|  0.46 |  0.46  |
    write    |        32|  3.60 |  3.57  |
             |       512|  105  |  106   |
    ---------+----------+----------------+

    Signed-off-by: Kazuo Ito
    Tested-by: Hiroyuki Watanabe
    Signed-off-by: Trond Myklebust

    Kazuo Ito
     
  • nfs_want_read_modify_write() didn't check for !PagePrivate when pNFS
    block or SCSI layout was in use, therefore we could lose data forever
    if the page being written was filled by a read before completion.

    Signed-off-by: Kazuo Ito
    Signed-off-by: Trond Myklebust

    Kazuo Ito
     
  • Fix up some compiler warnings about function parameters, etc not being
    correctly described or formatted.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

31 Jul, 2018

1 commit

  • Use new return type vm_fault_t for fault handler
    in struct vm_operations_struct. For now, this is
    just documenting that the function returns a
    VM_FAULT value rather than an errno. Once all
    instances are converted, vm_fault_t will become
    a distinct type.

    see commit 1c8f422059ae ("mm: change return type to
    vm_fault_t") for reference.

    Signed-off-by: Souptick Joarder
    Reviewed-by: Matthew Wilcox
    Signed-off-by: Anna Schumaker

    Souptick Joarder
     

18 Nov, 2017

1 commit


12 Sep, 2017

1 commit

  • 1/ remove 'start' and 'end' args from nfs_file_fsync_commit().
    They aren't used.

    2/ Make nfs_context_set_write_error() a "static inline" in internal.h
    so we can...

    3/ Use nfs_context_set_write_error() instead of mapping_set_error()
    if nfs_pageio_add_request() fails before sending any request.
    NFS generally keeps errors in the open_context, not the mapping,
    so this is more consistent.

    4/ If filemap_write_and_wait_range() reports any error, still
    check ctx->error. The value in ctx->error is likely to be
    more useful. As part of this, NFS_CONTEXT_ERROR_WRITE is
    cleared slightly earlier, before nfs_file_fsync_commit() is called,
    rather than at the start of that function.

    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    NeilBrown
     

07 Sep, 2017

2 commits

  • Since commit 18290650b1c8 ("NFS: Move buffered I/O locking into
    nfs_file_write()") nfs_file_write() has not flushed the correct byte
    range during synchronous writes. generic_write_sync() expects that
    iocb->ki_pos points to the right edge of the range rather than the
    left edge.

    To replicate the problem, open a file with O_DSYNC, have the client
    write at increasing offsets, and then print the successful offsets.
    Block port 2049 partway through that sequence, and observe that the
    client application indicates successful writes in advance of what the
    server received.
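
    The range problem is simple arithmetic and can be sketched in Python.
    The sketch assumes the sync helper derives its range as
    [pos - count, pos - 1] from the position it is handed, which is how
    the description above reads; both function names are hypothetical.

```python
def sync_range(ki_pos, count):
    """Byte range flushed when ki_pos is taken as the right edge of the write."""
    return (ki_pos - count, ki_pos - 1)


def write_then_sync(start, count, advance_pos_first):
    # advance_pos_first=True models the corrected ordering: the position is
    # moved past the written bytes before the sync range is computed.
    ki_pos = start + count if advance_pos_first else start
    return sync_range(ki_pos, count)
```

    For a 4096-byte write at offset 4096, the corrected ordering flushes
    bytes 4096 to 8191, while leaving the position at the left edge flushes
    bytes 0 to 4095 and misses the data just written.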

    Fixes: 18290650b1c8 ("NFS: Move buffered I/O locking into nfs_file_write()")
    Signed-off-by: Jacob Strauss
    Signed-off-by: Tarang Gupta
    Tested-by: Tarang Gupta
    Cc: stable@vger.kernel.org # v4.8+
    Signed-off-by: Trond Myklebust

    tarangg@amazon.com
     
  • When a byte range lock (or flock) is taken out on an NFS file, the
    validity of the cached data is checked and the inode is marked
    NFS_INO_INVALID_DATA. However the cached data isn't flushed from
    the page cache.

    This is sufficient for future read() requests or mmap() requests as
    they call nfs_revalidate_mapping() which performs the flush if
    necessary.

    However an existing mapping is not affected. Accessing data through
    that mapping will continue to return old data even though the inode is
    marked NFS_INO_INVALID_DATA.

    This can easily be confirmed using the 'nfs' tool in
    git://github.com/okirch/twopence-nfs.git
    and running

    nfs coherence FILENAME
    on one client, and
    nfs coherence -r FILENAME
    on another client.

    It appears that prior to Linux 2.6.0 this worked correctly.

    However commit:

    http://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/?id=ca9268fe3ddd075714005adecd4afbd7f9ab87d0

    removed the call to inode_invalidate_pages() from nfs_zap_caches(). I
    haven't tested this code, but inspection suggests that prior to this
    commit, file locking would invalidate all inode pages.

    This patch adds a call to nfs_revalidate_mapping() after a
    successful SETLK so that invalid data is flushed. With this patch the
    above test passes. To minimize impact (and possibly avoid a GETATTR
    call) this only happens if the mapping might be mapped into
    userspace.

    Cc: Olaf Kirch
    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    NeilBrown
     

27 Jul, 2017

2 commits

  • posix_fallocate() will allocate space in an NFS file by considering
    the last byte of every 4K block. If it is before EOF, it will read
    the byte and if it is zero, a zero is written out. If it is after EOF,
    the zero is unconditionally written.

    For the blocks beyond EOF, if NFS believes its cache is valid, it will
    expand these writes to write full pages, and then will merge the pages.
    This results in (typically) 1MB writes. If NFS believes its cache is
    not valid (particularly if NFS_INO_INVALID_DATA or
    NFS_INO_REVAL_PAGECACHE are set - see nfs_write_pageuptodate()), it will
    send the individual 1-byte writes. This results in (typically) 256 times
    as many RPC requests, and can be substantially slower.
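
    The factor of 256 follows from the two sizes mentioned above, as this
    Python sketch shows. The 1MB write size is the "typical" value from the
    text, not a fixed property of NFS, and rpc_counts() is a hypothetical
    helper.

```python
BLOCK = 4096         # posix_fallocate() touches the last byte of each 4K block
WSIZE = 1024 * 1024  # assumed 1MB write size, the "typical" case above


def rpc_counts(length):
    """Return (merged, unmerged) WRITE counts for `length` bytes beyond EOF."""
    unmerged = -(-length // BLOCK)  # ceiling division: one 1-byte write per block
    merged = -(-length // WSIZE)    # full pages merged into wsize-sized writes
    return merged, unmerged
```

    For a 64MB region this gives 64 merged writes against 16384 individual
    1-byte writes.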

    Currently nfs_revalidate_mapping() is only used when reading a file or
    mmapping a file, as these are times when the content needs to be
    up-to-date. Writes don't generally need the cache to be up-to-date, but
    writes beyond EOF can benefit, particularly in the posix_fallocate()
    case.

    So this patch calls nfs_revalidate_mapping() when writing beyond EOF -
    i.e. when there is a gap between the end of the file and the start of
    the write. If the cache is thought to be out of date (as happens after
    taking a file lock), this will cause a GETATTR, and the two flags
    mentioned above will be cleared. With this, posix_fallocate() on a
    newly locked file does not generate excessive tiny writes.

    Signed-off-by: NeilBrown
    Signed-off-by: Anna Schumaker

    NeilBrown
     
  • Prior to commit ca0daa277aca ("NFS: Cache aggressively when file is open
    for writing"), NFS would revalidate, or invalidate, the file size when
    taking a lock. Since that commit it only invalidates the file content.

    If the file size is changed on the server while waiting for the lock,
    the client will have an incorrect understanding of the file size and
    could corrupt data. This particularly happens when writing beyond the
    (supposed) end of file and can easily be demonstrated with
    posix_fallocate().

    If an application opens an empty file, waits for a write lock, and then
    calls posix_fallocate(), glibc will determine that the underlying
    filesystem doesn't support fallocate (assuming version 4.1 or earlier)
    and will write out a '0' byte at the end of each 4K page in the region
    being fallocated that is after the end of the file.
    NFS will (usually) detect that these writes are beyond EOF and will
    expand them to cover the whole page, and then will merge the pages.
    Consequently, NFS will write out large blocks of zeroes beyond where it
    thought EOF was. If EOF had moved, the pre-existing part of the file
    will be over-written. Locking should have protected against this,
    but it doesn't.

    This patch restores the use of nfs_zap_caches() which invalidated the
    cached attributes. When posix_fallocate() asks for the file size, the
    request will go to the server and get a correct answer.

    cc: stable@vger.kernel.org (v4.8+)
    Fixes: ca0daa277aca ("NFS: Cache aggressively when file is open for writing")
    Signed-off-by: NeilBrown
    Signed-off-by: Anna Schumaker

    NeilBrown
     

27 Apr, 2017

1 commit


21 Apr, 2017

2 commits

  • NFS attempts to wait for read and write completion before unlocking in
    order to ensure that the data returned was protected by the lock. When
    this waiting is interrupted by a signal, the unlock may be skipped, and
    messages similar to the following are seen in the kernel ring buffer:

    [20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3:
    [20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1 fl_type=0x0 fl_pid=20183
    [20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1 fl_type=0x0 fl_pid=20185

    For NFSv3, the missing unlock will cause the server to refuse conflicting
    locks indefinitely. For NFSv4, the leftover lock will be removed by the
    server after the lease timeout.

    This patch fixes this issue by skipping the usual wait in
    nfs_iocounter_wait if the FL_CLOSE flag is set when signaled. Instead, the
    wait happens in the unlock RPC task on the NFS UOC rpc_waitqueue.

    For NFSv3, use lockd's new nlmclnt_operations along with
    nfs_async_iocounter_wait to defer NLM's unlock task until the lock
    context's iocounter reaches zero.

    For NFSv4, call nfs_async_iocounter_wait() directly from unlock's
    current rpc_call_prepare.

    Signed-off-by: Benjamin Coddington
    Reviewed-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Benjamin Coddington
     
  • We only need to check lock exclusive/shared types against open mode when
    flock() is used on NFS, so move it into the flock-specific path instead of
    checking it for all locks.

    Signed-off-by: Benjamin Coddington
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Benjamin Coddington
     

25 Feb, 2017

1 commit

  • ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
    take a vma and vmf parameter when the vma already resides in vmf.

    Remove the vma parameter to simplify things.

    [arnd@arndb.de: fix ARM build]
    Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Ross Zwisler
    Cc: Theodore Ts'o
    Cc: Darrick J. Wong
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     

25 Dec, 2016

1 commit


22 Dec, 2016

1 commit

  • Pull more NFS client updates from Trond Myklebust:
    "Highlights include:

    - further attribute cache improvements to make revalidation more fine
    grained

    - NFSv4 locking improvements

    Bugfixes:

    - nfs4_fl_prepare_ds must be careful about reporting success in files
    layout

    - pNFS/flexfiles: Instead of marking a device inactive, remove it
    from the cache"

    * tag 'nfs-for-4.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv4: Retry the DELEGRETURN if the embedded GETATTR is rejected with EACCES
    NFS: Retry the CLOSE if the embedded GETATTR is rejected with EACCES
    NFSv4: Place the GETATTR operation before the CLOSE
    NFSv4: Also ask for attributes when downgrading to a READ-only state
    NFS: Don't abuse NFS_INO_REVAL_FORCED in nfs_post_op_update_inode_locked()
    pNFS: Return RW layouts on OPEN_DOWNGRADE
    NFSv4: Add encode/decode of the layoutreturn op in OPEN_DOWNGRADE
    NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID
    NFSv4: ensure __nfs4_find_lock_state returns consistent result.
    NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.
    pNFS/flexfiles: delete deviceid, don't mark inactive
    NFS: Clean up nfs_attribute_timeout()
    NFS: Remove unused function nfs_revalidate_inode_rcu()
    NFS: Fix and clean up the access cache validity checking
    NFS: Only look at the change attribute cache state in nfs_weak_revalidate()
    NFS: Clean up cache validity checking
    NFS: Don't revalidate the file on close if we hold a delegation
    NFSv4: Don't discard the attributes returned by asynchronous DELEGRETURN
    NFSv4: Update the attribute cache info in update_changeattr

    Linus Torvalds
     

20 Dec, 2016

1 commit


18 Dec, 2016

1 commit

  • Pull more vfs updates from Al Viro:
    "In this pile:

    - autofs-namespace series
    - dedupe stuff
    - more struct path constification"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
    ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features
    ocfs2: charge quota for reflinked blocks
    ocfs2: fix bad pointer cast
    ocfs2: always unlock when completing dio writes
    ocfs2: don't eat io errors during _dio_end_io_write
    ocfs2: budget for extent tree splits when adding refcount flag
    ocfs2: prohibit refcounted swapfiles
    ocfs2: add newlines to some error messages
    ocfs2: convert inode refcount test to a helper
    simple_write_end(): don't zero in short copy into uptodate
    exofs: don't mess with simple_write_{begin,end}
    9p: saner ->write_end() on failing copy into non-uptodate page
    fix gfs2_stuffed_write_end() on short copies
    fix ceph_write_end()
    nfs_write_end(): fix handling of short copies
    vfs: refactor clone/dedupe_file_range common functions
    fs: try to clone files first in vfs_copy_file_range
    vfs: misc struct path constification
    namespace.c: constify struct path passed to a bunch of primitives
    quota: constify struct path in quota_on
    ...

    Linus Torvalds
     

10 Dec, 2016

1 commit

  • What matters when deciding if we should make a page uptodate is
    not how much we _wanted_ to copy, but how much we actually have
    copied. As it is, on architectures that do not zero tail on
    short copy we can leave uninitialized data in page marked uptodate.
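
    The distinction can be shown with a deliberately simplified Python
    sketch. It models only a write that starts at offset 0 of a page that
    was not uptodate beforehand; both function names are hypothetical and
    neither is the kernel's actual logic.

```python
PAGE_SIZE = 4096


def uptodate_by_request(offset, requested, copied):
    # Old rule: trusts the length the caller wanted to copy.
    return offset == 0 and requested == PAGE_SIZE


def uptodate_by_copied(offset, requested, copied):
    # Fixed rule: trusts only the bytes that actually landed in the page.
    return offset == 0 and copied == PAGE_SIZE
```

    With a short copy (4096 bytes requested, 1000 copied) the old rule
    marks the page uptodate although its tail was never written, while the
    fixed rule does not.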

    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     

05 Dec, 2016

1 commit


14 Oct, 2016

1 commit

  • Pull NFS client updates from Anna Schumaker:
    "Highlights include:

    Stable bugfixes:
    - sunrpc: fix write space race causing stalls
    - NFS: Fix inode corruption in nfs_prime_dcache()
    - NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
    - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
    - NFSv4: Open state recovery must account for file permission changes
    - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic

    Features:
    - Add support for tracking multiple layout types with an ordered list
    - Add support for using multiple backchannel threads on the client
    - Add support for pNFS file layout session trunking
    - Delay xprtrdma use of DMA API (for device driver removal)
    - Add support for xprtrdma remote invalidation
    - Add support for larger xprtrdma inline thresholds
    - Use a scatter/gather list for sending xprtrdma RPC calls
    - Add support for the CB_NOTIFY_LOCK callback
    - Improve hashing sunrpc auth_creds by using both uid and gid

    Bugfixes:
    - Fix xprtrdma use of DMA API
    - Validate filenames before adding to the dcache
    - Fix corruption of xdr->nwords in xdr_copy_to_scratch
    - Fix setting buffer length in xdr_set_next_buffer()
    - Don't deadlock the state manager on the SEQUENCE status flags
    - Various delegation and stateid related fixes
    - Retry operations if an interrupted slot receives EREMOTEIO
    - Make nfs boot time y2038 safe"

    * tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
    NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
    fs: nfs: Make nfs boot time y2038 safe
    sunrpc: replace generic auth_cred hash with auth-specific function
    sunrpc: add RPCSEC_GSS hash_cred() function
    sunrpc: add auth_unix hash_cred() function
    sunrpc: add generic_auth hash_cred() function
    sunrpc: add hash_cred() function to rpc_authops struct
    Retry operation on EREMOTEIO on an interrupted slot
    pNFS: Fix atime updates on pNFS clients
    sunrpc: queue work on system_power_efficient_wq
    NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
    NFSv4: If recovery failed for a specific open stateid, then don't retry
    NFSv4: Fix retry issues with nfs41_test/free_stateid
    NFSv4: Open state recovery must account for file permission changes
    NFSv4: Mark the lock and open stateids as invalid after freeing them
    NFSv4: Don't test open_stateid unless it is set
    NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
    NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
    NFSv4: Fix a race when updating an open_stateid
    NFSv4: Fix a race in nfs_inode_reclaim_delegation()
    ...

    Linus Torvalds
     

06 Oct, 2016

1 commit


23 Sep, 2016

1 commit


20 Sep, 2016

1 commit


04 Sep, 2016

1 commit


25 Jul, 2016

1 commit


20 Jul, 2016

1 commit

  • A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
    not really safe to use the generic_cred->acred->ac_flags to store
    the NO_CRKEY_TIMEOUT flag. A lookup for a unx_cred triggered while the
    KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
    KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
    with the auth_cred to be in a state where they're perpetually doing 4K
    NFS_FILE_SYNC writes.

    This can be reproduced as follows:

    1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
    They do not need to be the same export, nor do they even need to be from
    the same NFS server. Also, v3 is fine.
    $ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
    $ sudo mount -o v3,sec=sys server2:/export /mnt/sys

    2. As the normal user, before accessing the kerberized mount, kinit with
    a short lifetime (but not so short that renewing the ticket would leave
    you within the 4-minute window again by the time the original ticket
    expires), e.g.
    $ kinit -l 10m -r 60m

    3. Do some I/O to the kerberized mount and verify that the writes are
    wsize, UNSTABLE:
    $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

    4. Wait until you're within 4 minutes of key expiry, then do some more
    I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
    set. Verify that the writes are 4K, FILE_SYNC:
    $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

    5. Now do some I/O to the sec=sys mount. This will cause
    RPC_CRED_NO_CRKEY_TIMEOUT to be set:
    $ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1

    6. Writes for that user will now be permanently 4K, FILE_SYNC,
    regardless of which mount is being written to, until you reboot
    the client. Renewing the kerberos ticket (assuming it hasn't already
    expired) will have no effect. Grabbing a new kerberos ticket at this
    point will have no effect either.

    Move the flag to the auth->au_flags field (which is currently unused)
    and rename it slightly to reflect that it's no longer associated with
    the auth_cred->ac_flags. Add the rpc_auth to the arg list of
    rpcauth_cred_key_to_expire and check the au_flags there too. Finally,
    add the inode to the arg list of nfs_ctx_key_to_expire so we can
    determine the rpc_auth to pass to rpcauth_cred_key_to_expire.

    Signed-off-by: Scott Mayhew
    Signed-off-by: Trond Myklebust

    Scott Mayhew
     

06 Jul, 2016

2 commits