08 Dec, 2019
3 commits
-
Pull nfsd updates from Bruce Fields:
"This is a relatively quiet cycle for nfsd, mainly various bugfixes.Possibly most interesting is Trond's fixes for some callback races
that were due to my incomplete understanding of rpc client shutdown.
Unfortunately at the last minute I've started noticing a new
intermittent failure to send callbacks. As the logic seems basically
correct, I'm leaving Trond's patches in for now, and hope to find a
fix in the next week so I don't have to revert those patches"* tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux: (24 commits)
nfsd: depend on CRYPTO_MD5 for legacy client tracking
NFSD fixing possible null pointer derefering in copy offload
nfsd: check for EBUSY from vfs_rmdir/vfs_unink.
nfsd: Ensure CLONE persists data and metadata changes to the target file
SUNRPC: Fix backchannel latency metrics
nfsd: restore NFSv3 ACL support
nfsd: v4 support requires CRYPTO_SHA256
nfsd: Fix cld_net->cn_tfm initialization
lockd: remove __KERNEL__ ifdefs
sunrpc: remove __KERNEL__ ifdefs
race in exportfs_decode_fh()
nfsd: Drop LIST_HEAD where the variable it declares is never used.
nfsd: document callback_wq serialization of callback code
nfsd: mark cb path down on unknown errors
nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()
nfsd: minor 4.1 callback cleanup
SUNRPC: Fix svcauth_gss_proxy_init()
SUNRPC: Trace gssproxy upcall results
sunrpc: fix crash when cache_head become valid before update
nfsd: remove private bin2hex implementation
... -
The legacy client tracking infrastructure of nfsd makes use of MD5 to
derive a client's recovery directory name. As the nfsd module doesn't
declare any dependency on CRYPTO_MD5, though, it may fail to allocate
the hash if the kernel was compiled without it. As a result, generation
of client recovery directories will fail with the following error:NFSD: unable to generate recoverydir name
The explicit dependency on CRYPTO_MD5 was removed as redundant back in
6aaa67b5f3b9 (NFSD: Remove redundant "select" clauses in fs/Kconfig
2008-02-11) as it was already implicitly selected via RPCSEC_GSS_KRB5.
This broke when RPCSEC_GSS_KRB5 was made optional for NFSv4 in commit
df486a25900f (NFS: Fix the selection of security flavours in Kconfig) at
a later point.Fix the issue by adding back an explicit dependency on CRYPTO_MD5.
Fixes: df486a25900f (NFS: Fix the selection of security flavours in Kconfig)
Signed-off-by: Patrick Steinhardt
Signed-off-by: J. Bruce Fields -
Static checker revealed possible error path leading to possible
NULL pointer dereferencing.Reported-by: Dan Carpenter
Fixes: e0639dc5805a: ("NFSD introduce async copy feature")
Signed-off-by: Olga Kornievskaia
Signed-off-by: J. Bruce Fields
01 Dec, 2019
2 commits
-
vfs_rmdir and vfs_unlink can return -EBUSY if the
target is a mountpoint. This currently gets passed to
nfserrno() by nfsd_unlink(), and that results in a WARNing,
which is not user-friendly.Possibly the best NFSv4 error is NFS4ERR_FILE_OPEN, because
there is a sense in which the object is currently in use
by some other task. The Linux NFSv4 client will map this
back to EBUSY, which is an added benefit.For NFSv3, the best we can do is probably NFS3ERR_ACCES, which isn't
true, but is not less true than the other options.Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields -
The NFSv4.2 CLONE operation has implicit persistence requirements on the
target file, since there is no protocol requirement that the client issue
a separate operation to persist data.
For that reason, we should call vfs_fsync_range() on the destination file
after a successful call to vfs_clone_file_range().Fixes: ffa0160a1039 ("nfsd: implement the NFSv4.2 CLONE operation")
Signed-off-by: Trond Myklebust
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: J. Bruce Fields
20 Nov, 2019
1 commit
-
An error in e333f3bbefe3 left the nfsd_acl_program->pg_vers array empty,
which effectively turned off the server's support for NFSv3 ACLs.Fixes: e333f3bbefe3 "nfsd: Allow containers to set supported nfs versions"
Cc: stable@vger.kernel.org
Cc: Trond Myklebust
Signed-off-by: J. Bruce Fields
16 Nov, 2019
1 commit
-
Most of the callers of lookup_one_len_unlocked() treat negatives are
ERR_PTR(-ENOENT). Provide a helper that would do just that. Note
that a pinned positive dentry remains positive - it's ->d_inode is
stable, etc.; a pinned _negative_ dentry can become positive at any
point as long as you are not holding its parent at least shared.
So using lookup_one_len_unlocked() needs to be careful;
lookup_positive_unlocked() is safer and that's what the callers
end up open-coding anyway.Signed-off-by: Al Viro
13 Nov, 2019
2 commits
-
The new nfsdcld client tracking operations use sha256 to compute hashes
of the kerberos principals, so make sure CRYPTO_SHA256 is enabled.Fixes: 6ee95d1c8991 ("nfsd: add support for upcall version 2")
Reported-by: Jamie Heilman
Signed-off-by: Scott Mayhew
Signed-off-by: J. Bruce Fields -
Don't assign an error pointer to cld_net->cn_tfm, otherwise an oops will
occur in nfsd4_remove_cld_pipe().Also, move the initialization of cld_net->cn_tfm so that it occurs after
the check to see if nfsdcld is running. This is necessary because
nfsd4_client_tracking_init() looks for -ETIMEDOUT to determine whether
to use the "old" nfsdcld tracking ops.Fixes: 6ee95d1c8991 ("nfsd: add support for upcall version 2")
Reported-by: Jamie Heilman
Signed-off-by: Scott Mayhew
Signed-off-by: J. Bruce Fields
09 Nov, 2019
5 commits
-
The declarations were introduced with the file, but the declared
variables were not used.Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
Signed-off-by: Mao Wenan
Signed-off-by: J. Bruce Fields -
The callback code relies on the fact that much of it is only ever called
from the ordered workqueue callback_wq, and this is worth documenting.Reported-by: Trond Myklebust
Signed-off-by: J. Bruce Fields -
An unexpected error is probably a sign that something is wrong with the
callback path.Signed-off-by: J. Bruce Fields
-
When we're destroying the client lease, and we call
nfsd4_shutdown_callback(), we must ensure that we do not return
before all outstanding callbacks have terminated and have
released their payloads.Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields -
Move all the cb_holds_slot management into helper functions. No change
in behavior.Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
12 Oct, 2019
1 commit
-
Calling sprintf in a loop is not very efficient, and in any case,
we already have an implementation of bin-to-hex conversion in lib/
which we might as well use.Note that original code used to nul-terminate the destination while
bin2hex doesn't. That's why replace kmalloc() with kzalloc().Signed-off-by: Andy Shevchenko
Signed-off-by: J. Bruce Fields
10 Oct, 2019
1 commit
-
When running an nfs stress test, I see quite a few cached replies that
don't match up with the actual request. The first comment in
replay_matches_cache() makes sense, but the code doesn't seem to
match... fix it.This isn't exactly a bugfix, as the server isn't required to catch every
case of a false retry. So, we may as well do this, but if this is
fixing a problem then that suggests there's a client bug.Fixes: 53da6a53e1d4 ("nfsd4: catch some false session retries")
Signed-off-by: Scott Mayhew
Signed-off-by: J. Bruce Fields
09 Oct, 2019
2 commits
-
Randy says:
> sparse complains about these, as does gcc when used with --pedantic.
> sparse says:
>
> ../fs/nfsd/nfs4state.c:2385:23: warning: unknown escape sequence: '\%'
> ../fs/nfsd/nfs4state.c:2385:23: warning: unknown escape sequence: '\%'
> ../fs/nfsd/nfs4state.c:2388:23: warning: unknown escape sequence: '\%'
> ../fs/nfsd/nfs4state.c:2388:23: warning: unknown escape sequence: '\%'I'm not sure how this crept in. Fix it.
Reported-by: Randy Dunlap
Signed-off-by: J. Bruce Fields -
Fixes gcc '-Wunused-but-set-variable' warning:
fs/nfsd/nfs4xdr.c: In function nfsd4_encode_splice_read:
fs/nfsd/nfs4xdr.c:3464:7: warning: variable len set but not used [-Wunused-but-set-variable]It is not used since commit 83a63072c815 ("nfsd: fix nfs read eof detection")
Reported-by: Hulk Robot
Signed-off-by: YueHaibing
Signed-off-by: J. Bruce Fields
28 Sep, 2019
1 commit
-
Pull nfsd updates from Bruce Fields:
"Highlights:- Add a new knfsd file cache, so that we don't have to open and close
on each (NFSv2/v3) READ or WRITE. This can speed up read and write
in some cases. It also replaces our readahead cache.- Prevent silent data loss on write errors, by treating write errors
like server reboots for the purposes of write caching, thus forcing
clients to resend their writes.- Tweak the code that allocates sessions to be more forgiving, so
that NFSv4.1 mounts are less likely to hang when a server already
has a lot of clients.- Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be
limited only by the backend filesystem and the maximum RPC size.- Allow the server to enforce use of the correct kerberos credentials
when a client reclaims state after a reboot.And some miscellaneous smaller bugfixes and cleanup"
* tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux: (34 commits)
sunrpc: clean up indentation issue
nfsd: fix nfs read eof detection
nfsd: Make nfsd_reset_boot_verifier_locked static
nfsd: degraded slot-count more gracefully as allocation nears exhaustion.
nfsd: handle drc over-allocation gracefully.
nfsd: add support for upcall version 2
nfsd: add a "GetVersion" upcall for nfsdcld
nfsd: Reset the boot verifier on all write I/O errors
nfsd: Don't garbage collect files that might contain write errors
nfsd: Support the server resetting the boot verifier
nfsd: nfsd_file cache entries should be per net namespace
nfsd: eliminate an unnecessary acl size limit
Deprecate nfsd fault injection
nfsd: remove duplicated include from filecache.c
nfsd: Fix the documentation for svcxdr_tmpalloc()
nfsd: Fix up some unused variable warnings
nfsd: close cached files prior to a REMOVE or RENAME that would replace target
nfsd: rip out the raparms cache
nfsd: have nfsd_test_lock use the nfsd_file cache
nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
...
24 Sep, 2019
2 commits
-
Currently, the knfsd server assumes that a short read indicates an
end of file. That assumption is incorrect. The short read means that
either we've hit the end of file, or we've hit a read error.In the case of a read error, the client may want to retry (as per the
implementation recommendations in RFC1813 and RFC7530), but currently it
is being told that it hit an eof.Move the code to detect eof from version specific code into the generic
nfsd read.Report eof only in the two following cases:
1) read() returns a zero length short read with no error.
2) the offset+length of the read is >= the file size.Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields -
Fix sparse warning:
fs/nfsd/nfssvc.c:364:6: warning:
symbol 'nfsd_reset_boot_verifier_locked' was not declared. Should it be static?Reported-by: Hulk Robot
Signed-off-by: YueHaibing
Signed-off-by: J. Bruce Fields
21 Sep, 2019
2 commits
-
This original code in nfsd4_get_drc_mem() would hand out 30
slots (approximately NFSD_MAX_MEM_PER_SESSION bytes at slightly
over 2K per slot) to each requesting client until it ran out
of space, then it would possibly give one last client a reduced
allocation, then fail the allocation.Since commit de766e570413 ("nfsd: give out fewer session slots as
limit approaches") the last 90 slots to be given to about 12
clients with quickly reducing slot counts (better than just 3
clients). This still seems unnecessarily hasty.
A subsequent patch allows over-allocation so every client gets
at least one slot, but that might be a bit restrictive.The requested number of nfsd threads is the best guide we have to the
expected number of clients, so use that - if it is at least 8.256 threads on a 256Meg machine - which is a lot for a tiny machine -
would result in nfsd_drc_max_mem being 2Meg, so 8K (3 slots) would be
available for the first client, and over 200 clients would get more
than 1 slot. So I don't think this change will be too debilitating on
poorly configured machines, though it does mean that a sensible
configuration is a little more important.Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields -
Currently, if there are more clients than allowed for by the
space allocation in set_max_drc(), we fail a SESSION_CREATE
request with NFS4ERR_DELAY.
This means that the client retries indefinitely, which isn't
a user-friendly response.The RFC requires NFS4ERR_NOSPC, but that would at best result in a
clean failure on the client, which is not much more friendly.The current space allocation is a best-guess and doesn't provide any
guarantees, we could still run out of space when trying to allocate
drc space.So fail more gracefully - always give out at least one slot.
If all clients used all the space in all slots, we might start getting
memory pressure, but that is possible anyway.So ensure 'num' is always at least 1, and remove the test for it
being zero.Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields
19 Sep, 2019
1 commit
-
Pull vfs mount API infrastructure updates from Al Viro:
"Infrastructure bits of mount API conversions.The rest is more of per-filesystem updates and that will happen
in separate pull requests"* 'work.mount-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
mtd: Provide fs_context-aware mount_mtd() replacement
vfs: Create fs_context-aware mount_bdev() replacement
new helper: get_tree_keyed()
vfs: set fs_context::user_ns for reconfigure
10 Sep, 2019
6 commits
-
Version 2 upcalls will allow the nfsd to include a hash of the kerberos
principal string in the Cld_Create upcall. If a principal is present in
the svc_cred, then the hash will be included in the Cld_Create upcall.
We attempt to use the svc_cred.cr_raw_principal (which is returned by
gssproxy) first, and then fall back to using the svc_cred.cr_principal
(which is returned by both gssproxy and rpc.svcgssd). Upon a subsequent
restart, the hash will be returned in the Cld_Gracestart downcall and
stored in the reclaim_str_hashtbl so it can be used when handling
reclaim opens.Signed-off-by: Scott Mayhew
Signed-off-by: J. Bruce Fields -
Add a "GetVersion" upcall to allow nfsd to determine the maximum upcall
version that the nfsdcld userspace daemon supports. If the daemon
responds with -EOPNOTSUPP, then we know it only supports v1.Signed-off-by: Scott Mayhew
Signed-off-by: J. Bruce Fields -
If multiple clients are writing to the same file, then due to the fact
we share a single file descriptor between all NFSv3 clients writing
to the file, we have a situation where clients can miss the fact that
their file data was not persisted. While this should be rare, it
could cause silent data loss in situations where multiple clients
are using NLM locking or O_DIRECT to write to the same file.
Unfortunately, the stateless nature of NFSv3 and the fact that we
can only identify clients by their IP address means that we cannot
trivially cache errors; we would not know when it is safe to
release them from the cache.So the solution is to declare a reboot. We understand that this
should be a rare occurrence, since disks are usually stable. The
most frequent occurrence is likely to be ENOSPC, at which point
all writes to the given filesystem are likely to fail anyway.So the expectation is that clients will be forced to retry their
writes until they hit the fatal error.Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields -
If a file may contain unstable writes that can error out, then we want
to avoid garbage collecting the struct nfsd_file that may be
tracking those errors.
So in the garbage collector, we try to avoid collecting files that aren't
clean. Furthermore, we avoid immediately kicking off the garbage collector
in the case where the reference drops to zero for the case where there
is a write error that is being tracked.If the file is unhashed while an error is pending, then declare a
reboot, to ensure the client resends any unstable writes.Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields -
Add support to allow the server to reset the boot verifier in order to
force clients to resend I/O after a timeout failure.Signed-off-by: Trond Myklebust
Signed-off-by: Lance Shelton
Signed-off-by: J. Bruce Fields -
Ensure that we can safely clear out the file cache entries when the
nfs server is shut down on a container. Otherwise, the file cache
may end up pinning the mounts.Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
06 Sep, 2019
1 commit
-
For vfs_get_keyed_super users.
Signed-off-by: Al Viro
29 Aug, 2019
1 commit
-
We're unnecessarily limiting the size of an ACL to less than what most
filesystems will support. Some users do hit the limit and it's
confusing and unnecessary.It still seems prudent to impose some limit on the number of ACEs the
client gives us before passing it straight to kmalloc(). So, let's just
limit it to the maximum number that would be possible given the amount
of data left in the argument buffer.That will still leave one limit beyond whatever the filesystem imposes:
the client and server negotiate a limit on the size of a request, which
we have to respect.But we're no longer imposing any additional arbitrary limit.
struct nfs4_ace is 20 bytes on my system and the maximum call size we'll
negotiate is about a megabyte, so in practice this is limiting the
allocation here to about a megabyte.Reported-by: "de Vandiere, Louis"
Signed-off-by: J. Bruce Fields
26 Aug, 2019
1 commit
-
This is only useful for client testing. I haven't really maintained it,
and reference counting and locking are wrong at this point. You can get
some of the same functionality now from nfsd/clients/.It was a good idea but I think its time has passed.
In the unlikely event of users, hopefully the BROKEN dependency will
prompt them to speak up. Otherwise I expect to remove it soon.Reported-by: Alex Lyakas
Signed-off-by: J. Bruce Fields
20 Aug, 2019
1 commit
-
Remove duplicated include.
Signed-off-by: YueHaibing
Signed-off-by: J. Bruce Fields
19 Aug, 2019
6 commits
-
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields -
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields -
It's not uncommon for some workloads to do a bunch of I/O to a file and
delete it just afterward. If knfsd has a cached open file however, then
the file may still be open when the dentry is unlinked. If the
underlying filesystem is nfs, then that could trigger it to do a
sillyrename.On a REMOVE or RENAME scan the nfsd_file cache for open files that
correspond to the inode, and proactively unhash and put their
references. This should prevent any delete-on-last-close activity from
occurring, solely due to knfsd's open file cache.This must be done synchronously though so we use the variants that call
flush_delayed_fput. There are deadlock possibilities if you call
flush_delayed_fput while holding locks, however. In the case of
nfsd_rename, we don't even do the lookups of the dentries to be renamed
until we've locked for rename.Once we've figured out what the target dentry is for a rename, check to
see whether there are cached open files associated with it. If there
are, then unwind all of the locking, close them all, and then reattempt
the rename.Signed-off-by: Jeff Layton
Signed-off-by: Trond Myklebust
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields -
The raparms cache was set up in order to ensure that we carry readahead
information forward from one RPC call to the next. In other words, it
was set up because each RPC call was forced to open a struct file, then
close it, causing the loss of readahead information that is normally
cached in that struct file, and used to keep the page cache filled when
a user calls read() multiple times on the same file descriptor.Now that we cache the struct file, and reuse it for all the I/O calls
to a given file by a given user, we no longer have to keep a separate
readahead cache.Signed-off-by: Jeff Layton
Signed-off-by: Trond Myklebust
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields -
Signed-off-by: Jeff Layton
Signed-off-by: Trond Myklebust
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields -
Have nfs4_preprocess_stateid_op pass back a nfsd_file instead of a filp.
Since we now presume that the struct file will be persistent in most
cases, we can stop fiddling with the raparms in the read code. This
also means that we don't really care about the rd_tmp_file field
anymore.Signed-off-by: Jeff Layton
Signed-off-by: Trond Myklebust
Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields