12 Oct, 2020
1 commit
-
The original intent was presumably to reduce code duplication. The
trade-off was:- No support for an NFSD proc function returning a non-success
RPC accept_stat value.
- No support for void NFS replies to non-NULL procedures.
- Everyone pays for the deduplication with a few extra conditional
branches in a hot path.In addition, nfsd_dispatch() leaves *statp uninitialized in the
success path, unlike svc_generic_dispatch().Address all of these problems by moving the logic for encoding
the NFS status code into the NFS XDR encoders themselves. Then
update the NFS .pc_func methods to return an RPC accept_stat
value.Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
02 Oct, 2020
4 commits
-
Remove special dispatcher logic for NFSv2 error responses. These are
rare to the point of becoming extinct, but all NFS responses have to
pay the cost of the extra conditional branches.With this change, the NFSv2 error cases now get proper
xdr_ressize_check() calls.Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields -
nfsd_release_fhandle() assumes that rqstp->rq_resp always points to
an nfsd_fhandle struct. In fact, no NFSv2 procedure uses struct
nfsd_fhandle as its response structure.So far that has been "safe" to do because the res structs put the
resp->fh field at that same offset as struct nfsd_fhandle. I don't
think that's a guarantee, though, and there is certainly nothing
preventing a developer from altering the fields in those structures.Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields -
Clean up: These are not used.
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields -
There's no protection in nfsd_dispatch() against a NULL .pc_func
helpers. A malicious NFS client can trigger a crash by invoking the
unused/unsupported NFSv2 ROOT or WRITECACHE procedures.The current NFSD dispatcher does not support returning a void reply
to a non-NULL procedure, so the reply to both of these is wrong, for
the moment.Cc:
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
24 Aug, 2020
1 commit
-
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva
23 Jan, 2020
1 commit
-
When doing an unstable write, we need to ensure that we sample the
write verifier before releasing the lock, and allowing a commit to
the same file to proceed.Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
20 Dec, 2019
2 commits
-
Change to time64_t and ktime_get_real_seconds() to make the
logic work correctly on 32-bit architectures beyond 2038.Signed-off-by: Arnd Bergmann
Signed-off-by: J. Bruce Fields -
Guardtime handling in nfs3 differs between 32-bit and 64-bit
architectures, and uses the deprecated time_t type.Change it to using time64_t, which behaves the same way on
64-bit and 32-bit architectures, treating the number as an
unsigned 32-bit entity with a range of year 1970 to 2106
consistently, and avoiding the y2038 overflow.Signed-off-by: Arnd Bergmann
Signed-off-by: J. Bruce Fields
24 Sep, 2019
1 commit
-
Currently, the knfsd server assumes that a short read indicates an
end of file. That assumption is incorrect. The short read means that
either we've hit the end of file, or we've hit a read error.In the case of a read error, the client may want to retry (as per the
implementation recommendations in RFC1813 and RFC7530), but currently it
is being told that it hit an eof.Move the code to detect eof from version specific code into the generic
nfsd read.Report eof only in the two following cases:
1) read() returns a zero length short read with no error.
2) the offset+length of the read is >= the file size.Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
10 Aug, 2018
2 commits
-
I've given up on the idea of zero-copy handling of SYMLINK on the
server side. This is because the Linux VFS symlink API requires the
symlink pathname to be in a NUL-terminated kmalloc'd buffer. The
NUL-termination is going to be problematic (watching out for
landing on a page boundary and dealing with a 4096-byte pathname).I don't believe that SYMLINK creation is on a performance path or is
requested frequently enough that it will cause noticeable CPU cache
pollution due to data copies.There will be two places where a transport callout will be necessary
to fill in the rqstp: one will be in the svc_fill_symlink_pathname()
helper that is used by NFSv2 and NFSv3, and the other will be in
nfsd4_decode_create().Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields -
fill_in_write_vector() is nearly the same logic as
svc_fill_write_vector(), but there are a few differences so that
the former can handle multiple WRITE payloads in a single COMPOUND.svc_fill_write_vector() can be adjusted so that it can be used in
the NFSv4 WRITE code path too. Instead of assuming the pages are
coming from rq_args.pages, have the caller pass in the page list.The immediate benefit is a reduction of code duplication. It also
prevents the NFSv4 WRITE decoder from passing an empty vector
element when the transport has provided the payload in the xdr_buf's
page array.Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
04 Apr, 2018
2 commits
-
Move common code in NFSD's legacy SYMLINK decoders into a helper.
The immediate benefits include:- one fewer data copies on transports that support DDP
- consistent error checking across all versions
- reduction of code duplication
- support for both legal forms of SYMLINK requests on RDMA
transports for all versions of NFS (in particular, NFSv2, for
completeness)In the long term, this helper is an appropriate spot to perform a
per-transport call-out to fill the pathname argument using, say,
RDMA Reads.Filling the pathname in the proc function also means that eventually
the incoming filehandle can be interpreted so that filesystem-
specific memory can be allocated as a sink for the pathname
argument, rather than using anonymous pages.Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields -
Move common code in NFSD's legacy NFS WRITE decoders into a helper.
The immediate benefit is reduction of code duplication and some nice
micro-optimizations (see below).In the long term, this helper can perform a per-transport call-out
to fill the rq_vec (say, using RDMA Reads).The legacy WRITE decoders and procs are changed to work like NFSv4,
which constructs the rq_vec just before it is about to call
vfs_writev.Why? Calling a transport call-out from the proc instead of the XDR
decoder means that the incoming FH can be resolved to a particular
filesystem and file. This would allow pages from the backing file to
be presented to the transport to be filled, rather than presenting
anonymous pages and copying or flipping them into the file's page
cache later.I also prefer using the pages in rq_arg.pages, instead of pulling
the data pages directly out of the rqstp::rq_pages array. This is
currently the way the NFSv3 write decoder works, but the other two
do not seem to take this approach. Fixing this removes the only
reference to rq_pages found in NFSD, eliminating an NFSD assumption
about how transports use the pages in rq_pages.Lastly, avoid setting up the first element of rq_vec as a zero-
length buffer. This happens with an RDMA transport when a normal
Read chunk is present because the data payload is in rq_arg's
page list (none of it is in the head buffer).Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.By default all files without license information are under the default
license of the kernel, which is GPL version 2.Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
15 May, 2017
7 commits
-
Signed-off-by: Christoph Hellwig
Acked-by: Trond Myklebust -
struct svc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attach vector for
code injections.Signed-off-by: Christoph Hellwig
-
pc_count is the only writeable memeber of struct svc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.This patch moves it into out out struct svc_procinfo, and into a
separate writable array that is pointed to by struct svc_version.Signed-off-by: Christoph Hellwig
-
Drop the resp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.Signed-off-by: Christoph Hellwig
Acked-by: Trond Myklebust -
Drop the argp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.Signed-off-by: Christoph Hellwig
-
Drop the p and resp arguments as they are always NULL or can trivially
be derived from the rqstp argument. With that all functions now have the
same prototype, and we can remove the unsafe casting to kxdrproc_t.Signed-off-by: Christoph Hellwig
-
Drop the argp and resp arguments as they can trivially be derived from
the rqstp argument. With that all functions now have the same prototype,
and we can remove the unsafe casting to svc_procfunc as well as the
svc_procfunc typedef itself.Signed-off-by: Christoph Hellwig
11 Mar, 2017
1 commit
-
Now that Ext4 and f2fs filesystems support encrypted directories and
files, attempts to access those files may return ENOKEY, resulting in
the following WARNING.Map ENOKEY to nfserr_perm instead of nfserr_io.
[ 1295.411759] ------------[ cut here ]------------
[ 1295.411787] WARNING: CPU: 0 PID: 12786 at fs/nfsd/nfsproc.c:796 nfserrno+0x74/0x80 [nfsd]
[ 1295.411806] nfsd: non-standard errno: -126
[ 1295.411816] Modules linked in: nfsd nfs_acl auth_rpcgss nfsv4 nfs lockd fscache tun bridge stp llc fuse ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event coretemp crct10dif_pclmul crc32_generic crc32_pclmul snd_ens1371 gameport ghash_clmulni_intel snd_ac97_codec f2fs intel_rapl_perf ac97_bus snd_seq ppdev snd_pcm snd_rawmidi snd_timer vmw_balloon snd_seq_device snd joydev soundcore parport_pc parport nfit acpi_cpufreq tpm_tis vmw_vmci tpm_tis_core tpm shpchp i2c_piix4 grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel e1000 mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi fjes [last unloaded: nfs_acl]
[ 1295.412522] CPU: 0 PID: 12786 Comm: nfsd Tainted: G W 4.11.0-rc1+ #521
[ 1295.412959] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 1295.413814] Call Trace:
[ 1295.414252] dump_stack+0x63/0x86
[ 1295.414666] __warn+0xcb/0xf0
[ 1295.415087] warn_slowpath_fmt+0x5f/0x80
[ 1295.415502] ? put_filp+0x42/0x50
[ 1295.415927] nfserrno+0x74/0x80 [nfsd]
[ 1295.416339] nfsd_open+0xd7/0x180 [nfsd]
[ 1295.416746] nfs4_get_vfs_file+0x367/0x3c0 [nfsd]
[ 1295.417182] ? security_inode_permission+0x41/0x60
[ 1295.417591] nfsd4_process_open2+0x9b2/0x1200 [nfsd]
[ 1295.418007] nfsd4_open+0x481/0x790 [nfsd]
[ 1295.418409] nfsd4_proc_compound+0x395/0x680 [nfsd]
[ 1295.418812] nfsd_dispatch+0xb8/0x1f0 [nfsd]
[ 1295.419233] svc_process_common+0x4d9/0x830 [sunrpc]
[ 1295.419631] svc_process+0xfe/0x1b0 [sunrpc]
[ 1295.420033] nfsd+0xe9/0x150 [nfsd]
[ 1295.420420] kthread+0x101/0x140
[ 1295.420802] ? nfsd_destroy+0x60/0x60 [nfsd]
[ 1295.421199] ? kthread_park+0x90/0x90
[ 1295.421598] ret_from_fork+0x2c/0x40
[ 1295.421996] ---[ end trace 0d5a969cd7852e1f ]---Signed-off-by: Kinglong Mee
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields
01 Feb, 2017
2 commits
-
This is just cleanup, no change in functionality.
Signed-off-by: Kinglong Mee
Reviewed-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields -
After fae5096ad217 "nfsd: assume writeable exportabled filesystems have
f_sync" we no longer modify this argument.This is just cleanup, no change in functionality.
Signed-off-by: Kinglong Mee
Signed-off-by: J. Bruce Fields
14 Oct, 2016
1 commit
-
Pull nfsd updates from Bruce Fields:
"Some RDMA work and some good bugfixes, and two new features that could
benefit from user testing:- Anna Schumacker contributed a simple NFSv4.2 COPY implementation.
COPY is already supported on the client side, so a call to
copy_file_range() on a recent client should now result in a
server-side copy that doesn't require all the data to make a round
trip to the client and back.- Jeff Layton implemented callbacks to notify clients when contended
locks become available, which should reduce latency on workloads
with contended locks"* tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
NFSD: Implement the COPY call
nfsd: handle EUCLEAN
nfsd: only WARN once on unmapped errors
exportfs: be careful to only return expected errors.
nfsd4: setclientid_confirm with unmatched verifier should fail
nfsd: randomize SETCLIENTID reply to help distinguish servers
nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
nfsd: add a LRU list for blocked locks
nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
nfsd: plumb in a CB_NOTIFY_LOCK operation
NFSD: fix corruption in notifier registration
svcrdma: support Remote Invalidation
svcrdma: Server-side support for rpcrdma_connect_private
rpcrdma: RDMA/CM private message data structure
svcrdma: Skip put_page() when send_reply() fails
svcrdma: Tail iovec leaves an orphaned DMA mapping
nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
nfsd: eliminate cb_minorversion field
nfsd: don't set a FL_LAYOUT lease for flexfiles layouts
08 Oct, 2016
2 commits
-
Eric Sandeen reports that xfs can return this if filesystem corruption
prevented completing the operation.Reported-by: Eric Sandeen
Signed-off-by: J. Bruce Fields -
No need to spam the logs here.
The only drawback is losing information if we ever encounter two
different unmapped errors, but in practice we've rarely see even one.Signed-off-by: J. Bruce Fields
22 Sep, 2016
1 commit
-
inode_change_ok() will be resposible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better relect that it does also some
modifications in addition to checks.Reviewed-by: Christoph Hellwig
Signed-off-by: Jan Kara
05 Aug, 2016
2 commits
-
There's some odd logic in nfsd_create() that allows it to be called with
the parent directory either locked or unlocked. The only already-locked
caller is NFSv2's nfsd_proc_create(). It's less confusing to split out
the unlocked case into a separate function which the NFSv2 code can call
directly.Also fix some comments while we're here.
Signed-off-by: J. Bruce Fields
-
lookup_one_len already has this check.
The only effect of this patch is to return access instead of perm in the
0-length-filename case. I actually prefer nfserr_perm (or _inval?), but
I doubt anyone cares.The isdotent check seems redundant too, but I worry that some client
might actually care about that strange nfserr_exist error.Signed-off-by: J. Bruce Fields
29 May, 2015
1 commit
-
NFSv2 can set the atime and/or mtime of a file to specific timestamps but not
to the server's current time. To implement the equivalent of utimes("file",
NULL), it uses a heuristic.NFSv3 and later do support setting the atime and/or mtime to the server's
current time directly. The NFSv2 heuristic is still enabled, and causes
timestamps to be set wrong sometimes.Fix this by moving the heuristic into the NFSv2 specific code. We can leave it
out of the create code path: the owner can always set timestamps arbitrarily,
and the workaround would never trigger.Signed-off-by: Andreas Gruenbacher
Reviewed-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields
16 Apr, 2015
1 commit
-
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells
Signed-off-by: Al Viro
30 Jul, 2014
1 commit
-
It's possible for nfsd to fail opening a file that it has just created.
When that happens, we throw a WARN but it doesn't include any info about
the error code. Print the status code to give us a bit more info.Our QA group hit some of these warnings under some very heavy stress
testing. My suspicion is that they hit the file-max limit, but it's hard
to know for sure. Go ahead and add a -ENFILE mapping to
nfserr_serverfault to make the error more distinct (and correct).Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields
10 Jul, 2014
1 commit
-
I saw this pop up with some pynfs testing:
[ 123.609992] nfsd: non-standard errno: -7
...and -7 is -E2BIG. I think what happened is that XFS returned -E2BIG
due to some xattr operations with the ACL10 pynfs TEST (I guess it has
limited xattr size?).Add a better mapping for that error since it's possible that we'll need
it. How about we convert it to NFSERR_FBIG? As Bruce points out, they
both have "BIG" in the name so it must be good.Also, turn the printk in this function into a WARN() so that we can get
a bit more information about situations that don't have proper mappings.Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields
09 Jul, 2014
3 commits
-
Commit db2e747b1499 (vfs: remove mode parameter from vfs_symlink())
have remove mode parameter from vfs_symlink.
So that, iattr isn't needed by nfsd_symlink now, just remove it.Signed-off-by: Kinglong Mee
Signed-off-by: J. Bruce Fields -
Currently nfsd_symlink has a weird hack to serve callers who don't
null-terminate symlink data: it looks ahead at the next byte to see if
it's zero, and copies it to a new buffer to null-terminate if not.That means callers don't have to null-terminate, but they *do* have to
ensure that the byte following the end of the data is theirs to read.That's a bit subtle, and the NFSv4 code actually got this wrong.
So let's just throw out that code and let callers pass null-terminated
strings; we've already fixed them to do that.Signed-off-by: J. Bruce Fields
-
It's simple enough for NFSv2 to null-terminate the symlink data.
A bit weird (it depends on knowing that we've already read the following
byte, which is either padding or part of the mode), but no worse than
the conditional kstrdup it otherwise relies on in nfsd_symlink().Signed-off-by: J. Bruce Fields
26 Feb, 2013
1 commit
-
Signed-off-by: Al Viro
31 Jul, 2012
1 commit
-
When mnt_want_write() starts to handle freezing it will get a full lock
semantics requiring proper lock ordering. So push mnt_want_write() call
consistently outside of i_mutex.CC: linux-nfs@vger.kernel.org
CC: "J. Bruce Fields"
Signed-off-by: Jan Kara
Signed-off-by: Al Viro