08 Dec, 2019

3 commits

  • Pull nfsd updates from Bruce Fields:
    "This is a relatively quiet cycle for nfsd, mainly various bugfixes.

    Possibly most interesting is Trond's fixes for some callback races
    that were due to my incomplete understanding of rpc client shutdown.
    Unfortunately at the last minute I've started noticing a new
    intermittent failure to send callbacks. As the logic seems basically
    correct, I'm leaving Trond's patches in for now, and hope to find a
    fix in the next week so I don't have to revert those patches"

    * tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux: (24 commits)
    nfsd: depend on CRYPTO_MD5 for legacy client tracking
    NFSD fixing possible null pointer derefering in copy offload
    nfsd: check for EBUSY from vfs_rmdir/vfs_unink.
    nfsd: Ensure CLONE persists data and metadata changes to the target file
    SUNRPC: Fix backchannel latency metrics
    nfsd: restore NFSv3 ACL support
    nfsd: v4 support requires CRYPTO_SHA256
    nfsd: Fix cld_net->cn_tfm initialization
    lockd: remove __KERNEL__ ifdefs
    sunrpc: remove __KERNEL__ ifdefs
    race in exportfs_decode_fh()
    nfsd: Drop LIST_HEAD where the variable it declares is never used.
    nfsd: document callback_wq serialization of callback code
    nfsd: mark cb path down on unknown errors
    nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()
    nfsd: minor 4.1 callback cleanup
    SUNRPC: Fix svcauth_gss_proxy_init()
    SUNRPC: Trace gssproxy upcall results
    sunrpc: fix crash when cache_head become valid before update
    nfsd: remove private bin2hex implementation
    ...

    Linus Torvalds
     
  • The legacy client tracking infrastructure of nfsd makes use of MD5 to
    derive a client's recovery directory name. As the nfsd module doesn't
    declare any dependency on CRYPTO_MD5, though, it may fail to allocate
    the hash if the kernel was compiled without it. As a result, generation
    of client recovery directories will fail with the following error:

    NFSD: unable to generate recoverydir name

    The explicit dependency on CRYPTO_MD5 was removed as redundant back in
    6aaa67b5f3b9 (NFSD: Remove redundant "select" clauses in fs/Kconfig
    2008-02-11) as it was already implicitly selected via RPCSEC_GSS_KRB5.
    This broke when RPCSEC_GSS_KRB5 was made optional for NFSv4 in commit
    df486a25900f (NFS: Fix the selection of security flavours in Kconfig) at
    a later point.

    Fix the issue by adding back an explicit dependency on CRYPTO_MD5.

    Fixes: df486a25900f (NFS: Fix the selection of security flavours in Kconfig)
    Signed-off-by: Patrick Steinhardt
    Signed-off-by: J. Bruce Fields

    Patrick Steinhardt
     
  • Static checker revealed possible error path leading to possible
    NULL pointer dereferencing.

    Reported-by: Dan Carpenter
    Fixes: e0639dc5805a: ("NFSD introduce async copy feature")
    Signed-off-by: Olga Kornievskaia
    Signed-off-by: J. Bruce Fields

    Olga Kornievskaia
     

01 Dec, 2019

2 commits

  • vfs_rmdir and vfs_unlink can return -EBUSY if the
    target is a mountpoint. This currently gets passed to
    nfserrno() by nfsd_unlink(), and that results in a WARNing,
    which is not user-friendly.

    Possibly the best NFSv4 error is NFS4ERR_FILE_OPEN, because
    there is a sense in which the object is currently in use
    by some other task. The Linux NFSv4 client will map this
    back to EBUSY, which is an added benefit.

    For NFSv3, the best we can do is probably NFS3ERR_ACCES, which isn't
    true, but is not less true than the other options.

    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     
  • The NFSv4.2 CLONE operation has implicit persistence requirements on the
    target file, since there is no protocol requirement that the client issue
    a separate operation to persist data.
    For that reason, we should call vfs_fsync_range() on the destination file
    after a successful call to vfs_clone_file_range().

    Fixes: ffa0160a1039 ("nfsd: implement the NFSv4.2 CLONE operation")
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org # v4.5+
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     

20 Nov, 2019

1 commit

  • An error in e333f3bbefe3 left the nfsd_acl_program->pg_vers array empty,
    which effectively turned off the server's support for NFSv3 ACLs.

    Fixes: e333f3bbefe3 "nfsd: Allow containers to set supported nfs versions"
    Cc: stable@vger.kernel.org
    Cc: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

16 Nov, 2019

1 commit

  • Most of the callers of lookup_one_len_unlocked() treat negatives are
    ERR_PTR(-ENOENT). Provide a helper that would do just that. Note
    that a pinned positive dentry remains positive - it's ->d_inode is
    stable, etc.; a pinned _negative_ dentry can become positive at any
    point as long as you are not holding its parent at least shared.
    So using lookup_one_len_unlocked() needs to be careful;
    lookup_positive_unlocked() is safer and that's what the callers
    end up open-coding anyway.

    Signed-off-by: Al Viro

    Al Viro
     

13 Nov, 2019

2 commits

  • The new nfsdcld client tracking operations use sha256 to compute hashes
    of the kerberos principals, so make sure CRYPTO_SHA256 is enabled.

    Fixes: 6ee95d1c8991 ("nfsd: add support for upcall version 2")
    Reported-by: Jamie Heilman
    Signed-off-by: Scott Mayhew
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
     
  • Don't assign an error pointer to cld_net->cn_tfm, otherwise an oops will
    occur in nfsd4_remove_cld_pipe().

    Also, move the initialization of cld_net->cn_tfm so that it occurs after
    the check to see if nfsdcld is running. This is necessary because
    nfsd4_client_tracking_init() looks for -ETIMEDOUT to determine whether
    to use the "old" nfsdcld tracking ops.

    Fixes: 6ee95d1c8991 ("nfsd: add support for upcall version 2")
    Reported-by: Jamie Heilman
    Signed-off-by: Scott Mayhew
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
     

09 Nov, 2019

5 commits


12 Oct, 2019

1 commit

  • Calling sprintf in a loop is not very efficient, and in any case,
    we already have an implementation of bin-to-hex conversion in lib/
    which we might as well use.

    Note that original code used to nul-terminate the destination while
    bin2hex doesn't. That's why replace kmalloc() with kzalloc().

    Signed-off-by: Andy Shevchenko
    Signed-off-by: J. Bruce Fields

    Andy Shevchenko
     

10 Oct, 2019

1 commit

  • When running an nfs stress test, I see quite a few cached replies that
    don't match up with the actual request. The first comment in
    replay_matches_cache() makes sense, but the code doesn't seem to
    match... fix it.

    This isn't exactly a bugfix, as the server isn't required to catch every
    case of a false retry. So, we may as well do this, but if this is
    fixing a problem then that suggests there's a client bug.

    Fixes: 53da6a53e1d4 ("nfsd4: catch some false session retries")
    Signed-off-by: Scott Mayhew
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
     

09 Oct, 2019

2 commits

  • Randy says:
    > sparse complains about these, as does gcc when used with --pedantic.
    > sparse says:
    >
    > ../fs/nfsd/nfs4state.c:2385:23: warning: unknown escape sequence: '\%'
    > ../fs/nfsd/nfs4state.c:2385:23: warning: unknown escape sequence: '\%'
    > ../fs/nfsd/nfs4state.c:2388:23: warning: unknown escape sequence: '\%'
    > ../fs/nfsd/nfs4state.c:2388:23: warning: unknown escape sequence: '\%'

    I'm not sure how this crept in. Fix it.

    Reported-by: Randy Dunlap
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Fixes gcc '-Wunused-but-set-variable' warning:

    fs/nfsd/nfs4xdr.c: In function nfsd4_encode_splice_read:
    fs/nfsd/nfs4xdr.c:3464:7: warning: variable len set but not used [-Wunused-but-set-variable]

    It is not used since commit 83a63072c815 ("nfsd: fix nfs read eof detection")

    Reported-by: Hulk Robot
    Signed-off-by: YueHaibing
    Signed-off-by: J. Bruce Fields

    YueHaibing
     

28 Sep, 2019

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Add a new knfsd file cache, so that we don't have to open and close
    on each (NFSv2/v3) READ or WRITE. This can speed up read and write
    in some cases. It also replaces our readahead cache.

    - Prevent silent data loss on write errors, by treating write errors
    like server reboots for the purposes of write caching, thus forcing
    clients to resend their writes.

    - Tweak the code that allocates sessions to be more forgiving, so
    that NFSv4.1 mounts are less likely to hang when a server already
    has a lot of clients.

    - Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be
    limited only by the backend filesystem and the maximum RPC size.

    - Allow the server to enforce use of the correct kerberos credentials
    when a client reclaims state after a reboot.

    And some miscellaneous smaller bugfixes and cleanup"

    * tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux: (34 commits)
    sunrpc: clean up indentation issue
    nfsd: fix nfs read eof detection
    nfsd: Make nfsd_reset_boot_verifier_locked static
    nfsd: degraded slot-count more gracefully as allocation nears exhaustion.
    nfsd: handle drc over-allocation gracefully.
    nfsd: add support for upcall version 2
    nfsd: add a "GetVersion" upcall for nfsdcld
    nfsd: Reset the boot verifier on all write I/O errors
    nfsd: Don't garbage collect files that might contain write errors
    nfsd: Support the server resetting the boot verifier
    nfsd: nfsd_file cache entries should be per net namespace
    nfsd: eliminate an unnecessary acl size limit
    Deprecate nfsd fault injection
    nfsd: remove duplicated include from filecache.c
    nfsd: Fix the documentation for svcxdr_tmpalloc()
    nfsd: Fix up some unused variable warnings
    nfsd: close cached files prior to a REMOVE or RENAME that would replace target
    nfsd: rip out the raparms cache
    nfsd: have nfsd_test_lock use the nfsd_file cache
    nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
    ...

    Linus Torvalds
     

24 Sep, 2019

2 commits

  • Currently, the knfsd server assumes that a short read indicates an
    end of file. That assumption is incorrect. The short read means that
    either we've hit the end of file, or we've hit a read error.

    In the case of a read error, the client may want to retry (as per the
    implementation recommendations in RFC1813 and RFC7530), but currently it
    is being told that it hit an eof.

    Move the code to detect eof from version specific code into the generic
    nfsd read.

    Report eof only in the two following cases:
    1) read() returns a zero length short read with no error.
    2) the offset+length of the read is >= the file size.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • Fix sparse warning:

    fs/nfsd/nfssvc.c:364:6: warning:
    symbol 'nfsd_reset_boot_verifier_locked' was not declared. Should it be static?

    Reported-by: Hulk Robot
    Signed-off-by: YueHaibing
    Signed-off-by: J. Bruce Fields

    YueHaibing
     

21 Sep, 2019

2 commits

  • This original code in nfsd4_get_drc_mem() would hand out 30
    slots (approximately NFSD_MAX_MEM_PER_SESSION bytes at slightly
    over 2K per slot) to each requesting client until it ran out
    of space, then it would possibly give one last client a reduced
    allocation, then fail the allocation.

    Since commit de766e570413 ("nfsd: give out fewer session slots as
    limit approaches") the last 90 slots to be given to about 12
    clients with quickly reducing slot counts (better than just 3
    clients). This still seems unnecessarily hasty.
    A subsequent patch allows over-allocation so every client gets
    at least one slot, but that might be a bit restrictive.

    The requested number of nfsd threads is the best guide we have to the
    expected number of clients, so use that - if it is at least 8.

    256 threads on a 256Meg machine - which is a lot for a tiny machine -
    would result in nfsd_drc_max_mem being 2Meg, so 8K (3 slots) would be
    available for the first client, and over 200 clients would get more
    than 1 slot. So I don't think this change will be too debilitating on
    poorly configured machines, though it does mean that a sensible
    configuration is a little more important.

    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     
  • Currently, if there are more clients than allowed for by the
    space allocation in set_max_drc(), we fail a SESSION_CREATE
    request with NFS4ERR_DELAY.
    This means that the client retries indefinitely, which isn't
    a user-friendly response.

    The RFC requires NFS4ERR_NOSPC, but that would at best result in a
    clean failure on the client, which is not much more friendly.

    The current space allocation is a best-guess and doesn't provide any
    guarantees, we could still run out of space when trying to allocate
    drc space.

    So fail more gracefully - always give out at least one slot.
    If all clients used all the space in all slots, we might start getting
    memory pressure, but that is possible anyway.

    So ensure 'num' is always at least 1, and remove the test for it
    being zero.

    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     

19 Sep, 2019

1 commit

  • Pull vfs mount API infrastructure updates from Al Viro:
    "Infrastructure bits of mount API conversions.

    The rest is more of per-filesystem updates and that will happen
    in separate pull requests"

    * 'work.mount-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    mtd: Provide fs_context-aware mount_mtd() replacement
    vfs: Create fs_context-aware mount_bdev() replacement
    new helper: get_tree_keyed()
    vfs: set fs_context::user_ns for reconfigure

    Linus Torvalds
     

10 Sep, 2019

6 commits

  • Version 2 upcalls will allow the nfsd to include a hash of the kerberos
    principal string in the Cld_Create upcall. If a principal is present in
    the svc_cred, then the hash will be included in the Cld_Create upcall.
    We attempt to use the svc_cred.cr_raw_principal (which is returned by
    gssproxy) first, and then fall back to using the svc_cred.cr_principal
    (which is returned by both gssproxy and rpc.svcgssd). Upon a subsequent
    restart, the hash will be returned in the Cld_Gracestart downcall and
    stored in the reclaim_str_hashtbl so it can be used when handling
    reclaim opens.

    Signed-off-by: Scott Mayhew
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
     
  • Add a "GetVersion" upcall to allow nfsd to determine the maximum upcall
    version that the nfsdcld userspace daemon supports. If the daemon
    responds with -EOPNOTSUPP, then we know it only supports v1.

    Signed-off-by: Scott Mayhew
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
     
  • If multiple clients are writing to the same file, then due to the fact
    we share a single file descriptor between all NFSv3 clients writing
    to the file, we have a situation where clients can miss the fact that
    their file data was not persisted. While this should be rare, it
    could cause silent data loss in situations where multiple clients
    are using NLM locking or O_DIRECT to write to the same file.
    Unfortunately, the stateless nature of NFSv3 and the fact that we
    can only identify clients by their IP address means that we cannot
    trivially cache errors; we would not know when it is safe to
    release them from the cache.

    So the solution is to declare a reboot. We understand that this
    should be a rare occurrence, since disks are usually stable. The
    most frequent occurrence is likely to be ENOSPC, at which point
    all writes to the given filesystem are likely to fail anyway.

    So the expectation is that clients will be forced to retry their
    writes until they hit the fatal error.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • If a file may contain unstable writes that can error out, then we want
    to avoid garbage collecting the struct nfsd_file that may be
    tracking those errors.
    So in the garbage collector, we try to avoid collecting files that aren't
    clean. Furthermore, we avoid immediately kicking off the garbage collector
    in the case where the reference drops to zero for the case where there
    is a write error that is being tracked.

    If the file is unhashed while an error is pending, then declare a
    reboot, to ensure the client resends any unstable writes.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • Add support to allow the server to reset the boot verifier in order to
    force clients to resend I/O after a timeout failure.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Lance Shelton
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • Ensure that we can safely clear out the file cache entries when the
    nfs server is shut down on a container. Otherwise, the file cache
    may end up pinning the mounts.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     

06 Sep, 2019

1 commit


29 Aug, 2019

1 commit

  • We're unnecessarily limiting the size of an ACL to less than what most
    filesystems will support. Some users do hit the limit and it's
    confusing and unnecessary.

    It still seems prudent to impose some limit on the number of ACEs the
    client gives us before passing it straight to kmalloc(). So, let's just
    limit it to the maximum number that would be possible given the amount
    of data left in the argument buffer.

    That will still leave one limit beyond whatever the filesystem imposes:
    the client and server negotiate a limit on the size of a request, which
    we have to respect.

    But we're no longer imposing any additional arbitrary limit.

    struct nfs4_ace is 20 bytes on my system and the maximum call size we'll
    negotiate is about a megabyte, so in practice this is limiting the
    allocation here to about a megabyte.

    Reported-by: "de Vandiere, Louis"
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

26 Aug, 2019

1 commit

  • This is only useful for client testing. I haven't really maintained it,
    and reference counting and locking are wrong at this point. You can get
    some of the same functionality now from nfsd/clients/.

    It was a good idea but I think its time has passed.

    In the unlikely event of users, hopefully the BROKEN dependency will
    prompt them to speak up. Otherwise I expect to remove it soon.

    Reported-by: Alex Lyakas
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

20 Aug, 2019

1 commit


19 Aug, 2019

6 commits

  • Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • It's not uncommon for some workloads to do a bunch of I/O to a file and
    delete it just afterward. If knfsd has a cached open file however, then
    the file may still be open when the dentry is unlinked. If the
    underlying filesystem is nfs, then that could trigger it to do a
    sillyrename.

    On a REMOVE or RENAME scan the nfsd_file cache for open files that
    correspond to the inode, and proactively unhash and put their
    references. This should prevent any delete-on-last-close activity from
    occurring, solely due to knfsd's open file cache.

    This must be done synchronously though so we use the variants that call
    flush_delayed_fput. There are deadlock possibilities if you call
    flush_delayed_fput while holding locks, however. In the case of
    nfsd_rename, we don't even do the lookups of the dentries to be renamed
    until we've locked for rename.

    Once we've figured out what the target dentry is for a rename, check to
    see whether there are cached open files associated with it. If there
    are, then unwind all of the locking, close them all, and then reattempt
    the rename.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust
    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • The raparms cache was set up in order to ensure that we carry readahead
    information forward from one RPC call to the next. In other words, it
    was set up because each RPC call was forced to open a struct file, then
    close it, causing the loss of readahead information that is normally
    cached in that struct file, and used to keep the page cache filled when
    a user calls read() multiple times on the same file descriptor.

    Now that we cache the struct file, and reuse it for all the I/O calls
    to a given file by a given user, we no longer have to keep a separate
    readahead cache.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust
    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust
    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Have nfs4_preprocess_stateid_op pass back a nfsd_file instead of a filp.
    Since we now presume that the struct file will be persistent in most
    cases, we can stop fiddling with the raparms in the read code. This
    also means that we don't really care about the rd_tmp_file field
    anymore.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust
    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Jeff Layton