26 Sep, 2018

1 commit

  • commit 994b15b983a72e1148a173b61e5b279219bb45ae upstream.

    The previous fix broke recovery of delegated stateids because it assumes
    that if we did not mark the delegation as suspect, then the delegation has
    effectively been revoked, and so it removes that delegation irrespectively
    of whether or not it is valid and still in use. While this is "mostly
    harmless" for ordinary I/O, we've seen pNFS fail with LAYOUTGET spinning
    in an infinite loop while complaining that we're using an invalid stateid
    (in this case the all-zero stateid).

    What we rather want to do here is ensure that the delegation is always
    correctly marked as needing testing when that is the case. So we want
    to close the loophole offered by nfs4_schedule_stateid_recovery(),
    which marks the state as needing to be reclaimed, but not the
    delegation that may be backing it.

    Fixes: 0e3d3e5df07dc ("NFSv4.1 fix infinite loop on IO BAD_STATEID error")
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org # v4.11+
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     

26 Apr, 2018

1 commit

  • [ Upstream commit dce2630c7da73b0634686bca557cc8945cc450c8 ]

    There are 2 comments in the NFSv4 code which suggest that
    SIGLOST should possibly be sent to a process. In these
    cases a lock has been lost.
    The current practice is to set NFS_LOCK_LOST so that
    read/write returns EIO when a lock is lost.
    So change these comments to code which sets NFS_LOCK_LOST.

    One case is when lock recovery after apparent server restart
    fails with NFS4ERR_DENIED, NFS4ERR_RECLAIM_BAD, or
    NFS4ERR_RECLAIM_CONFLICT. The other case is when a lock
    attempt as part of lease recovery fails with NFS4ERR_DENIED.

    In an ideal world, these should not happen. However I have
    a packet trace showing an NFSv4.1 session getting
    NFS4ERR_BADSESSION after an extended network partition. The
    NFSv4.1 client treats this like server reboot until/unless
    it gets NFS4ERR_NO_GRACE, in which case it switches over to
    "nograce" recovery mode. In this network trace, the client
    attempts to recover a lock and the server (incorrectly)
    reports NFS4ERR_DENIED rather than NFS4ERR_NO_GRACE. This
    leads to the ineffective comment and the client then
    continues to write using the OPEN stateid.

    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    NeilBrown
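
The effect described in this commit (a lost lock makes later read/write calls fail with EIO) can be modelled in user space. This is a minimal sketch, not the kernel code: `struct lock_state`, `LOCK_LOST_BIT`, `mark_lock_lost()` and `check_lock_before_io()` are hypothetical names, and the bit number is an assumption chosen for illustration.

```c
#include <errno.h>

/* Hypothetical bit number; the kernel's actual NFS_LOCK_LOST value may differ. */
#define LOCK_LOST_BIT 1u

/* Hypothetical stand-in for a per-lock state object. */
struct lock_state {
    unsigned long flags;
};

/* Called when reclaiming the lock fails, for example with NFS4ERR_DENIED. */
static void mark_lock_lost(struct lock_state *ls)
{
    ls->flags |= 1ul << LOCK_LOST_BIT;
}

/* Models the read/write entry point: refuse I/O once the lock is lost. */
static int check_lock_before_io(const struct lock_state *ls)
{
    if (ls->flags & (1ul << LOCK_LOST_BIT))
        return -EIO;
    return 0;
}
```

With a comment in place of `mark_lock_lost()`, the flag is never set and I/O keeps going as if the lock were still held, which is the behaviour seen in the packet trace.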
     

14 Jul, 2017

2 commits

  • Pull NFS client updates from Anna Schumaker:
    "Stable bugfixes:
    - Fix -EACCES on commit to DS handling
    - Fix initialization of nfs_page_array->npages
    - Only invalidate dentries that are actually invalid

    Features:
    - Enable NFSoRDMA transparent state migration
    - Add support for lookup-by-filehandle
    - Add support for nfs re-exporting

    Other bugfixes and cleanups:
    - Christoph cleaned up the way we declare NFS operations
    - Clean up various internal structures
    - Various cleanups to commits
    - Various improvements to error handling
    - Set the dt_type of . and .. entries in NFS v4
    - Make slot allocation more reliable
    - Fix fscache stat printing
    - Fix uninitialized variable warnings
    - Fix potential list overrun in nfs_atomic_open()
    - Fix a race in NFSoRDMA RPC reply handler
    - Fix return size for nfs42_proc_copy()
    - Fix against MAC forgery timing attacks"

    * tag 'nfs-for-4.13-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (68 commits)
    NFS: Don't run wake_up_bit() when nobody is waiting...
    nfs: add export operations
    nfs4: add NFSv4 LOOKUPP handlers
    nfs: add a nfs_ilookup helper
    nfs: replace d_add with d_splice_alias in atomic_open
    sunrpc: use constant time memory comparison for mac
    NFSv4.2 fix size storage for nfs42_proc_copy
    xprtrdma: Fix documenting comments in frwr_ops.c
    xprtrdma: Replace PAGE_MASK with offset_in_page()
    xprtrdma: FMR does not need list_del_init()
    xprtrdma: Demote "connect" log messages
    NFSv4.1: Use seqid returned by EXCHANGE_ID after state migration
    NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration
    xprtrdma: Don't defer MR recovery if ro_map fails
    xprtrdma: Fix FRWR invalidation error recovery
    xprtrdma: Fix client lock-up after application signal fires
    xprtrdma: Rename rpcrdma_req::rl_free
    xprtrdma: Pass only the list of registered MRs to ro_unmap_sync
    xprtrdma: Pre-mark remotely invalidated MRs
    xprtrdma: On invalidation failure, remove MWs from rl_registered
    ...

    Linus Torvalds
     
  • Transparent State Migration copies a client's lease state from the
    server where a filesystem used to reside to the server where it now
    resides. When an NFSv4.1 client first contacts that destination
    server, it uses EXCHANGE_ID to detect trunking relationships.

    The lease that was copied there is returned to that client, but the
    destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to
    the client. This is because the lease was confirmed on the source
    server (before it was copied).

    Normally, when CONFIRMED_R is set, a client purges the lease and
    creates a new one. However, that throws away the entire benefit of
    Transparent State Migration.

    Therefore, the client must not purge that lease when it is possible
    that Transparent State Migration has occurred.

    Reported-by: Xuan Qi
    Signed-off-by: Chuck Lever
    Tested-by: Xuan Qi
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

28 Jun, 2017

1 commit


06 May, 2017

1 commit


31 Jan, 2017

1 commit


27 Jan, 2017

1 commit

  • Lock sequence IDs are bumped in decode_lock by calling
    nfs_increment_seqid(). nfs_increment_seqid() does not use the
    seqid_mutating_err() function fixed in commit 059aa7348241 ("Don't
    increment lock sequence ID after NFS4ERR_MOVED").

    Fixes: 059aa7348241 ("Don't increment lock sequence ID after ...")
    Signed-off-by: Chuck Lever
    Tested-by: Xuan Qi
    Cc: stable@vger.kernel.org # v3.7+
    Signed-off-by: Trond Myklebust

    Chuck Lever
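
The rule behind this fix is that some error replies must not advance the sequence ID. The sketch below is a simplified user-space model: `status_mutates_seqid()`, `struct lock_seqid` and `bump_lock_seqid()` are hypothetical names, and only two of the non-mutating errors listed in RFC 7530 are included. The numeric status codes are the ones assigned by RFC 7530.

```c
#include <stdint.h>

/* Status codes as assigned by RFC 7530; only a small subset is listed. */
enum {
    NFS4_OK           = 0,
    NFS4ERR_DENIED    = 10010,
    NFS4ERR_MOVED     = 10019,
    NFS4ERR_BAD_SEQID = 10026
};

/*
 * Returns nonzero if a reply with this status advances the sequence ID.
 * Hypothetical, simplified stand-in for the kernel's seqid_mutating_err().
 */
static int status_mutates_seqid(int status)
{
    switch (status) {
    case NFS4ERR_MOVED:
    case NFS4ERR_BAD_SEQID:
        return 0;
    default:
        return 1;
    }
}

/* Hypothetical lock sequence counter. */
struct lock_seqid {
    uint32_t counter;
};

/* Bump the lock sequence ID only when the reply status calls for it. */
static void bump_lock_seqid(int status, struct lock_seqid *seqid)
{
    if (status_mutates_seqid(status))
        seqid->counter++;
}
```

Routing the lock path through the same predicate as the open path keeps the two counters consistent after NFS4ERR_MOVED.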
     

14 Jan, 2017

1 commit

  • If the server reboots multiple times, the client should rely on the
    server to tell it that it cannot reclaim state as per section 9.6.3.4
    in RFC7530 and section 8.4.2.1 in RFC5661.
    Currently, the client is being too conservative, and is assuming that
    if the server reboots while state recovery is in progress, then it must
    ignore state that was not recovered before the reboot.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

20 Dec, 2016

2 commits

  • When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
    the ->state_owners rbtree so that it will no longer be used.

    If any stateids attached to this open-owner are still in use, and if a
    request using one gets an NFS4ERR_BAD_STATEID reply, this can go bad.

    The state is marked as needing recovery and the nfs4_state_manager()
    is scheduled to clean up. nfs4_state_manager() finds states to be
    recovered by walking the state_owners rbtree. As the open-owner is
    not in the rbtree, the bad state is not found so nfs4_state_manager()
    completes having done nothing. The request is then retried, with a
    predictable result (indefinite retries).

    If the stateid is for a delegation, this open_owner will be used
    to open files when the delegation is returned. For that to work,
    a new open-owner needs to be presented to the server.

    This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
    in the rbtree but updates the 'create_time' so it looks like a new
    open-owner. With this the indefinite retries no longer happen.

    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    NeilBrown
     
  • If a file has both flock locks and OFD locks, then it is possible that
    two different nfs4 lock states could apply to file accesses from a
    single process.

    It is not possible to know, efficiently, which one is "correct".
    Presumably the state which represents a lock that covers the region
    undergoing IO would be the "correct" one to use, but finding that has
    a non-trivial cost and would provide minuscule value.

    Currently we just return whichever is first in the list, which could
    result in inconsistent behaviour if an application ever put itself in
    this position. As consistent behaviour is preferable (when perfectly
    correct behaviour is not available), change the search to return a
    consistent result in this circumstance.
    Specifically: if there is both a flock and OFD lock state, always return
    the flock one.

    Reviewed-by: Jeff Layton
    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    NeilBrown
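
The search change described in this commit can be sketched as a list walk that returns the flock state as soon as it is found and otherwise falls back to the OFD state. This is an illustrative user-space model: `struct lock_state`, `find_lock_state()` and the owner pointers are hypothetical and do not reproduce the kernel's data structures.

```c
#include <stddef.h>

/* Hypothetical singly linked list of lock states attached to one open state. */
struct lock_state {
    const void *owner;          /* identifies who took the lock */
    struct lock_state *next;
};

/*
 * Return the state owned by flock_owner if there is one, otherwise the
 * state owned by ofd_owner, otherwise NULL. The result does not depend
 * on the order of the list.
 */
static struct lock_state *find_lock_state(struct lock_state *head,
                                          const void *flock_owner,
                                          const void *ofd_owner)
{
    struct lock_state *fallback = NULL;
    struct lock_state *pos;

    for (pos = head; pos != NULL; pos = pos->next) {
        if (pos->owner == flock_owner)
            return pos;             /* the flock state always wins */
        if (pos->owner == ofd_owner && fallback == NULL)
            fallback = pos;         /* remember it, but keep looking */
    }
    return fallback;
}
```

Returning on the first match of either owner, as the old behaviour did, would make the answer depend on insertion order.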
     

05 Dec, 2016

1 commit


02 Dec, 2016

3 commits


19 Nov, 2016

1 commit


28 Sep, 2016

5 commits


06 Aug, 2016

1 commit


25 Jun, 2016

1 commit

  • Commit e8d975e73e5f ("fixing infinite OPEN loop in 4.0 stateid recovery")
    introduced access to state after it was just potentially freed by
    nfs4_put_open_state leading to a random data corruption somewhere.

    BUG: unable to handle kernel paging request at ffff88004941ee40
    IP: [] nfs4_do_reclaim+0x461/0x740
    PGD 3501067 PUD 3504067 PMD 6ff37067 PTE 800000004941e060
    Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
    Modules linked in: loop rpcsec_gss_krb5 acpi_cpufreq tpm_tis joydev i2c_piix4 pcspkr tpm virtio_console nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops floppy serio_raw virtio_blk drm
    CPU: 6 PID: 2161 Comm: 192.168.10.253- Not tainted 4.7.0-rc1-vm-nfs+ #112
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    task: ffff8800463dcd00 ti: ffff88003ff48000 task.ti: ffff88003ff48000
    RIP: 0010:[] [] nfs4_do_reclaim+0x461/0x740
    RSP: 0018:ffff88003ff4bd68 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffffffff81a49900 RCX: 00000000000000e8
    RDX: 00000000000000e8 RSI: ffff8800418b9930 RDI: ffff880040c96c88
    RBP: ffff88003ff4bdf8 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff880040c96c98
    R13: ffff88004941ee20 R14: ffff88004941ee40 R15: ffff88004941ee00
    FS: 0000000000000000(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff88004941ee40 CR3: 0000000060b0b000 CR4: 00000000000006e0
    Stack:
    ffffffff813baad5 ffff8800463dcd00 ffff880000000001 ffffffff810e6b68
    ffff880043ddbc88 ffff8800418b9800 ffff8800418b98c8 ffff88004941ee48
    ffff880040c96c90 ffff880040c96c00 ffff880040c96c20 ffff880040c96c40
    Call Trace:
    [] ? nfs4_do_reclaim+0x35/0x740
    [] ? trace_hardirqs_on_caller+0x128/0x1b0
    [] nfs4_run_state_manager+0x5ed/0xa40
    [] ? nfs4_do_reclaim+0x740/0x740
    [] ? nfs4_do_reclaim+0x740/0x740
    [] kthread+0x101/0x120
    [] ? trace_hardirqs_on_caller+0x128/0x1b0
    [] ret_from_fork+0x1f/0x40
    [] ? kthread_create_on_node+0x250/0x250
    Code: 65 80 4c 8b b5 78 ff ff ff e8 fc 88 4c 00 48 8b 7d 88 e8 13 67 d2 ff 49 8b 47 40 a8 02 0f 84 d3 01 00 00 4c 89 ff e8 7f f9 ff ff 41 80 26 7f 48 8b 7d c8 e8 b1 84 4c 00 e9 39 fd ff ff 3d e6
    RIP [] nfs4_do_reclaim+0x461/0x740
    RSP
    CR2: ffff88004941ee40

    Signed-off-by: Oleg Drokin
    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Oleg Drokin
     

28 May, 2016

1 commit

  • Older versions of gcc don't understand named initializers inside an
    anonymous structure or union member. It can be worked around by adding
    the braces in the initializer for the anonymous member.

    Without this, gcc 4.4.4 will fail the build with

    CC fs/nfs/nfs4state.o
    fs/nfs/nfs4state.c:69: error: unknown field ‘data’ specified in initializer
    fs/nfs/nfs4state.c:69: warning: missing braces around initializer
    fs/nfs/nfs4state.c:69: warning: (near initialization for ‘zero_stateid.<anonymous>.data’)
    make[2]: *** [fs/nfs/nfs4state.o] Error 1

    introduced in commit 93b717fd81bf ("NFSv4: Label stateids with the type")

    Reported-and-tested-by: Boris Ostrovsky
    Cc: Anna Schumaker
    Cc: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Linus Torvalds
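
The workaround can be shown with a small stand-alone type. This is a sketch only: `struct stateid_example` and its fields are hypothetical and only loosely follow the description above. The point is the extra pair of braces around the initializer for the anonymous union.

```c
/* Hypothetical type with an anonymous union as its first member. */
struct stateid_example {
    union {
        unsigned char data[16];
        unsigned int words[4];
    };
    int type;
};

/*
 * Rejected by old compilers such as gcc 4.4.4, which cannot resolve the
 * designator through the anonymous member:
 *
 *     static const struct stateid_example zero = { .data = { 0 }, .type = 1 };
 */

/* Accepted by old and new compilers: brace the anonymous member explicitly. */
static const struct stateid_example zero_example = {
    { .data = { 0 } },
    .type = 1,
};
```

The inner braces initialize the first (anonymous) member positionally, so the `.data` designator is resolved inside the union rather than at the top level.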
     

18 May, 2016

2 commits


03 Oct, 2015

1 commit

  • Currently, we don't test if the state owner is in use before we try to
    recover it. The problem is that if the refcount is zero, then the
    state owner will be waiting on the lru list for garbage collection.
    The expectation in that case is that if you bump the refcount, then
    you must also remove the state owner from the lru list. Otherwise
    the call to nfs4_put_state_owner will corrupt that list by trying
    to add our state owner a second time.

    Avoid the whole problem by just skipping state owners that hold no
    state.

    Reported-by: Andrew W Elble
    Signed-off-by: Trond Myklebust

    Trond Myklebust
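
The fix amounts to a guard at the top of the recovery loop. The following is a simplified model under stated assumptions: `struct state_owner`, its fields and `recover_owners()` are hypothetical, and an array stands in for the kernel's tree and LRU list.

```c
#include <stddef.h>

/* Hypothetical model of a state owner. */
struct state_owner {
    int refcount;       /* zero means the owner is parked for garbage collection */
    size_t nstates;     /* number of open states attached to this owner */
    int recovered;      /* set once recovery has visited this owner */
};

/* Recover every owner that holds state; return how many were recovered. */
static size_t recover_owners(struct state_owner *owners, size_t n)
{
    size_t done = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (owners[i].nstates == 0)
            continue;               /* nothing to recover: leave the refcount alone */
        owners[i].refcount++;       /* pin the owner while recovering */
        owners[i].recovered = 1;
        owners[i].refcount--;       /* drop the reference again */
        done++;
    }
    return done;
}
```

Because idle owners are skipped before the reference count is touched, the get/put pair never runs on an owner that is already queued for garbage collection.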
     

18 Sep, 2015

1 commit

  • A test case is as the description says:
    open(foobar, O_WRONLY);
    sleep() --> reboot the server
    close(foobar)

    The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
    lines before going to restart, there is
    clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).

    NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states, not for open
    owner states. The value of NFS4CLNT_RECLAIM_NOGRACE is 4, which is also the
    value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
    out state and when we go to close it, “call_close” doesn’t get set as
    state flag is not set and CLOSE doesn’t go on the wire.

    Signed-off-by: Olga Kornievskaia
    Signed-off-by: Trond Myklebust

    Olga Kornievskaia
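
The collision described in this commit is easy to reproduce outside the kernel. In the sketch below both bit numbers are 4, as stated in the commit message; the struct, macro and function names are hypothetical stand-ins, and plain bit arithmetic replaces the kernel's `clear_bit()`.

```c
/* Both bit numbers are 4, as stated in the commit message above. */
#define CLNT_RECLAIM_NOGRACE_BIT 4u  /* meaningful in the client's flags word */
#define STATE_O_WRONLY_BIT       4u  /* meaningful in the open state's flags word */

/* Hypothetical stand-in for the per-open-state object. */
struct open_state { unsigned long flags; };

static void clear_flag(unsigned long *word, unsigned int bit)
{
    *word &= ~(1ul << bit);
}

static int test_flag(unsigned long word, unsigned int bit)
{
    return (int)((word >> bit) & 1ul);
}

/* The bug: a client-level bit number applied to the open state's flags. */
static void buggy_clear(struct open_state *state)
{
    clear_flag(&state->flags, CLNT_RECLAIM_NOGRACE_BIT);
}

/* CLOSE is only sent while the write-open is still recorded in the state. */
static int needs_close(const struct open_state *state)
{
    return test_flag(state->flags, STATE_O_WRONLY_BIT);
}
```

After `buggy_clear()` the state no longer records that the file was opened for writing, so `needs_close()` reports 0 and no CLOSE would be sent.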
     

18 Aug, 2015

1 commit


06 Jul, 2015

3 commits


02 Jun, 2015

1 commit

  • While the NFSv4.1 code has always drained the slot tables in order to stop
    non-recovery related RPC calls when doing lease recovery, the NFSv4 code
    did not.
    The reason for the difference in behaviour is that NFSv4 does not have
    session state, and so RPC calls can in theory proceed while recovery is
    happening. In practice, however, anything I/O or state related needs to
    wait until recovery is over.

    This patch changes the behaviour of NFSv4 to match that of NFSv4.1 so that
    we can simplify the state recovery code by assuming that we do not have to
    deal with races between recovery and ordinary I/O.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

01 Jun, 2015

1 commit

  • Problem: When an operation like WRITE receives a BAD_STATEID, even though
    recovery code clears the RECLAIM_NOGRACE recovery flag before recovering
    the open state, because of clearing delegation state for the associated
    inode, nfs_inode_find_state_and_recover() gets called and it makes the
    same state with the RECLAIM_NOGRACE flag again. As a result, when we restart
    looking over the open states, we end up in the infinite loop instead of
    breaking out in the next test of state flags.

    Solution: after the recover_open() function returns, clear the
    RECLAIM_NOGRACE flag that was set again by the call to
    nfs_inode_find_state_and_recover().

    Signed-off-by: Olga Kornievskaia
    Cc: stable@vger.kernel.org
    Signed-off-by: Trond Myklebust

    Olga Kornievskaia
     

27 Apr, 2015

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Another set of mainly bugfixes and a couple of cleanups. No new
    functionality in this round.

    Highlights include:

    Stable patches:
    - Fix a regression in /proc/self/mountstats
    - Fix the pNFS flexfiles O_DIRECT support
    - Fix high load average due to callback thread sleeping

    Bugfixes:
    - Various patches to fix the pNFS layoutcommit support
    - Do not cache pNFS deviceids unless server notifications are enabled
    - Fix a SUNRPC transport reconnection regression
    - make debugfs file creation failure non-fatal in SUNRPC
    - Another fix for circular directory warnings on NFSv4 "junctioned"
    mountpoints
    - Fix locking around NFSv4.2 fallocate() support
    - Truncating NFSv4 file opens should also sync O_DIRECT writes
    - Prevent infinite loop in rpcrdma_ep_create()

    Features:
    - Various improvements to the RDMA transport code's handling of
    memory registration
    - Various code cleanups"

    * tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits)
    fs/nfs: fix new compiler warning about boolean in switch
    nfs: Remove unneeded casts in nfs
    NFS: Don't attempt to decode missing directory entries
    Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one"
    NFS: Rename idmap.c to nfs4idmap.c
    NFS: Move nfs_idmap.h into fs/nfs/
    NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h
    NFS: Add a stub for GETDEVICELIST
    nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes
    nfs: fix DIO good bytes calculation
    nfs: Fetch MOUNTED_ON_FILEID when updating an inode
    sunrpc: make debugfs file creation failure non-fatal
    nfs: fix high load average due to callback thread sleeping
    NFS: Reduce time spent holding the i_mutex during fallocate()
    NFS: Don't zap caches on fallocate()
    xprtrdma: Make rpcrdma_{un}map_one() into inline functions
    xprtrdma: Handle non-SEND completions via a callout
    xprtrdma: Add "open" memreg op
    xprtrdma: Add "destroy MRs" memreg op
    xprtrdma: Add "reset MRs" memreg op
    ...

    Linus Torvalds
     

24 Apr, 2015

1 commit

  • This file is only used internally to the NFS v4 module, so it doesn't
    need to be in the global include path. I also renamed it from
    nfs_idmap.h to nfs4idmap.h to emphasize that it's an NFSv4-only include
    file.

    Signed-off-by: Anna Schumaker
    Signed-off-by: Trond Myklebust

    Anna Schumaker
     

16 Apr, 2015

1 commit


04 Mar, 2015

2 commits


12 Feb, 2015

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    Features:
    - Removing the forced serialisation of open()/close() calls in
    NFSv4.x (x>0) makes for a significant performance improvement in
    metadata intensive workloads.
    - Full support for the pNFS "flexible files" layout type
    - Further RPC/RDMA client improvements from Chuck

    Bugfixes:
    - Stable fix: NFSv4.1 backchannel calls blocking operations with !TASK_RUNNING
    - Stable fix: pnfs_generic_pg_init_read/write can be called with lseg == NULL
    - Stable fix: Fix an Oopsable condition when nsm_mon_unmon is called
    as part of the namespace cleanup,
    - Stable fix: Ensure we reference the inode for return-on-close in
    delegreturn
    - Use SO_REUSEPORT to ensure that NFSv3 TCP connections can rebind to
    the same source address/port combination during a disconnect/
    reconnect event. This is a requirement imposed by most NFSv3
    server duplicate reply cache implementations.

    Optimisations:
    - Ask for no NFSv4.1 delegations on OPEN if using O_DIRECT

    Other:
    - Add Anna Schumaker as co-maintainer for the NFS client"

    * tag 'nfs-for-3.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (119 commits)
    SUNRPC: Cleanup to remove xs_tcp_close()
    pnfs: delete an unintended goto
    pnfs/flexfiles: Do not dprintk after the free
    SUNRPC: Fix stupid typo in xs_sock_set_reuseport
    SUNRPC: Define xs_tcp_fin_timeout only if CONFIG_SUNRPC_DEBUG
    SUNRPC: Handle connection reset more efficiently.
    SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flag
    SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release
    SUNRPC: Ensure xs_tcp_shutdown() requests a full close of the connection
    SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORT
    SUNRPC: Remove TCP socket linger code
    SUNRPC: Remove TCP client connection reset hack
    SUNRPC: TCP/UDP always close the old socket before reconnecting
    SUNRPC: Add helpers to prevent socket create from racing
    SUNRPC: Ensure xs_reset_transport() resets the close connection flags
    SUNRPC: Do not clear the source port in xs_reset_transport
    SUNRPC: Handle EADDRINUSE on connect
    SUNRPC: Set SO_REUSEPORT socket option for TCP connections
    NFSv4.1: Fix pnfs_put_lseg races
    NFSv4.1: pnfs_send_layoutreturn should use GFP_NOFS
    ...

    Linus Torvalds