12 Oct, 2020

1 commit

  • The original intent was presumably to reduce code duplication. The
    trade-off was:

    - No support for an NFSD proc function returning a non-success
    RPC accept_stat value.
    - No support for void NFS replies to non-NULL procedures.
    - Everyone pays for the deduplication with a few extra conditional
    branches in a hot path.

    In addition, nfsd_dispatch() leaves *statp uninitialized in the
    success path, unlike svc_generic_dispatch().

    Address all of these problems by moving the logic for encoding
    the NFS status code into the NFS XDR encoders themselves. Then
    update the NFS .pc_func methods to return an RPC accept_stat
    value.
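
    A minimal user-space sketch of the split described above, not the
    kernel code: the proc function returns an RPC accept_stat, and the
    encoder alone decides how the NFS status shapes the reply. All
    sketch_* names and the one-word payload are hypothetical; XDR byte
    order is ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RPC accept_stat values as defined by RFC 5531. */
enum accept_stat {
	RPC_SUCCESS = 0,
	RPC_PROC_UNAVAIL = 3,
	RPC_GARBAGE_ARGS = 4,
	RPC_SYSTEM_ERR = 5,
};

#define SKETCH_NFS_OK    0u
#define SKETCH_NFS_STALE 70u	/* numeric value of NFSERR_STALE */

/* Hypothetical result structure: NFS status plus one word of payload. */
struct sketch_res {
	uint32_t status;
	uint32_t value;
};

/* Proc function: records the NFS status in the result and returns an
 * RPC-level accept_stat instead of the NFS status. */
static enum accept_stat sketch_proc(uint32_t arg, struct sketch_res *res)
{
	assert(res != NULL);
	if (arg == 0) {
		res->status = SKETCH_NFS_STALE;
	} else {
		res->status = SKETCH_NFS_OK;
		res->value = arg * 2;
	}
	return RPC_SUCCESS;
}

/* Encoder: always emits the status word, and the payload only on
 * success. Returns the number of 32-bit words written. */
static size_t sketch_encode(uint32_t *buf, const struct sketch_res *res)
{
	size_t n = 0;

	buf[n++] = res->status;
	if (res->status == SKETCH_NFS_OK)
		buf[n++] = res->value;
	return n;
}

/* Dispatcher: no NFS-version-specific status handling is left here. */
static enum accept_stat sketch_dispatch(uint32_t arg, uint32_t *buf,
					size_t *words)
{
	struct sketch_res res = { 0, 0 };
	enum accept_stat stat = sketch_proc(arg, &res);

	*words = (stat == RPC_SUCCESS) ? sketch_encode(buf, &res) : 0;
	return stat;
}
```

    With this shape, an NFS-level error is still an RPC-level success,
    and a proc function is free to return something other than
    RPC_SUCCESS when the RPC itself cannot be honored.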

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

02 Oct, 2020

4 commits

  • Remove special dispatcher logic for NFSv2 error responses. These are
    rare to the point of becoming extinct, but all NFS responses have to
    pay the cost of the extra conditional branches.

    With this change, the NFSv2 error cases now get proper
    xdr_ressize_check() calls.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • nfsd_release_fhandle() assumes that rqstp->rq_resp always points to
    an nfsd_fhandle struct. In fact, no NFSv2 procedure uses struct
    nfsd_fhandle as its response structure.

    So far that has been "safe" to do because the res structs put the
    resp->fh field at the same offset as struct nfsd_fhandle. I don't
    think that's a guarantee, though, and there is certainly nothing
    preventing a developer from altering the fields in those structures.
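
    To illustrate the hazard, here is a small sketch with made-up
    structures (none of these are the kernel's definitions). Casting
    one response type to another only works while the shared field
    happens to sit at the same offset, and nothing enforces that
    unless it is asserted explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a file handle and three response types. */
struct sketch_fh { int handle; };
struct sketch_fhandle_res { struct sketch_fh fh; };
struct sketch_attr_res { struct sketch_fh fh; int stat; };
struct sketch_reordered_res { int stat; struct sketch_fh fh; };

/* The layout assumption can at least be made explicit at build time. */
_Static_assert(offsetof(struct sketch_attr_res, fh) ==
	       offsetof(struct sketch_fhandle_res, fh),
	       "fh must sit where a release helper expects it");

/* Returns nonzero when an fh field at offset 'off' would be found by a
 * release helper that casts the response to sketch_fhandle_res. */
static int sketch_fh_offset_matches(size_t off)
{
	return off == offsetof(struct sketch_fhandle_res, fh);
}
```

    A developer who reorders fields, as in sketch_reordered_res, breaks
    the cast silently; a release helper written for the actual response
    type avoids the question entirely.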

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up: These are not used.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • There's no protection in nfsd_dispatch() against NULL .pc_func
    helpers. A malicious NFS client can trigger a crash by invoking the
    unused/unsupported NFSv2 ROOT or WRITECACHE procedures.

    The current NFSD dispatcher does not support returning a void reply
    to a non-NULL procedure, so the reply to both of these is wrong, for
    the moment.
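
    The message does not spell out the fix, so the following is only a
    generic sketch of the hazard and of one defensive option: a
    procedure table with an empty slot, and a dispatcher that checks
    the slot before calling through it. The table contents and the
    sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* RPC accept_stat values as defined by RFC 5531. */
enum accept_stat { RPC_SUCCESS = 0, RPC_PROC_UNAVAIL = 3 };

typedef enum accept_stat (*sketch_proc_fn)(void);

static enum accept_stat sketch_proc_null(void) { return RPC_SUCCESS; }
static enum accept_stat sketch_proc_getattr(void) { return RPC_SUCCESS; }

/* Slot 2 stands for an unused procedure with no handler. */
static const sketch_proc_fn sketch_procs[] = {
	sketch_proc_null,
	sketch_proc_getattr,
	NULL,
};

static enum accept_stat sketch_dispatch(unsigned int procnum)
{
	size_t nprocs = sizeof(sketch_procs) / sizeof(sketch_procs[0]);

	/* Without this check, calling through a NULL slot would crash. */
	if (procnum >= nprocs || sketch_procs[procnum] == NULL)
		return RPC_PROC_UNAVAIL;
	return sketch_procs[procnum]();
}
```

    Another option is to give every slot a stub handler so the table
    never contains NULL; which of the two a given server uses is a
    design choice this sketch does not settle.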

    Cc:
    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove fall-through
    markings where they are unnecessary.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
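
    For readers outside the kernel tree, a user-space sketch of the
    pattern: the macro maps to the compiler's fallthrough statement
    attribute when one is available and to an empty statement
    otherwise. The function sketch_count_steps() is hypothetical and
    exists only to show the macro in a switch.

```c
#include <assert.h>

/* Use the statement attribute where the compiler offers it. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)	/* fallthrough */
#endif

/* Level 2 deliberately runs the level 1 step as well. */
static int sketch_count_steps(int level)
{
	int steps = 0;

	assert(level >= 0);
	switch (level) {
	case 2:
		steps++;
		fallthrough;
	case 1:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```

    Unlike a comment, the macro is seen by the compiler, so an
    unannotated fall-through can be diagnosed reliably.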

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

23 Jan, 2020

1 commit


20 Dec, 2019

2 commits

  • Change to time64_t and ktime_get_real_seconds() to make the
    logic work correctly on 32-bit architectures beyond 2038.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: J. Bruce Fields

    Arnd Bergmann
     
  • Guardtime handling in nfs3 differs between 32-bit and 64-bit
    architectures, and uses the deprecated time_t type.

    Change it to using time64_t, which behaves the same way on
    64-bit and 32-bit architectures, treating the number as an
    unsigned 32-bit entity with a range of year 1970 to 2106
    consistently, and avoiding the y2038 overflow.
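
    A small sketch of why the interpretation matters, using plain
    64-bit integers in place of time64_t. The function names are
    hypothetical. Widening the 32-bit wire value as unsigned gives the
    same result on every architecture; going through a signed 32-bit
    type is what turns post-2038 values negative.

```c
#include <assert.h>
#include <stdint.h>

/* Treat the wire value as unsigned: covers 1970 through 2106. */
static int64_t sketch_wire_to_seconds(uint32_t wire)
{
	return (int64_t)wire;
}

/* The problematic variant: narrowing to a signed 32-bit value first.
 * Values above INT32_MAX become negative on common ABIs (the
 * conversion is implementation-defined in ISO C). */
static int64_t sketch_wire_to_seconds_signed(uint32_t wire)
{
	return (int64_t)(int32_t)wire;
}

/* A guard check compares the file's ctime seconds with the guard. */
static int sketch_guard_matches(int64_t ctime_sec, uint32_t wire_guard)
{
	return ctime_sec == sketch_wire_to_seconds(wire_guard);
}
```

    With the unsigned reading, a guard of 0x80000000 matches a ctime of
    2147483648 seconds (early 2038) instead of a date in 1901.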

    Signed-off-by: Arnd Bergmann
    Signed-off-by: J. Bruce Fields

    Arnd Bergmann
     

24 Sep, 2019

1 commit

  • Currently, the knfsd server assumes that a short read indicates an
    end of file. That assumption is incorrect. The short read means that
    either we've hit the end of file, or we've hit a read error.

    In the case of a read error, the client may want to retry (as per the
    implementation recommendations in RFC1813 and RFC7530), but currently it
    is being told that it hit an eof.

    Move the code to detect eof from version specific code into the generic
    nfsd read.

    Report eof only in the two following cases:
    1) read() returns a zero length short read with no error.
    2) the offset+length of the read is >= the file size.
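
    The two cases can be sketched as a single predicate. This is a
    user-space illustration with hypothetical names; it assumes the
    caller has already handled read errors and that "length" in case 2
    means the number of bytes actually returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* offset:    starting offset of the read
 * returned:  bytes actually read (no error occurred)
 * requested: bytes the client asked for
 * file_size: current size of the file */
static bool sketch_eof_on_read(uint64_t offset, uint64_t returned,
			       uint64_t requested, uint64_t file_size)
{
	assert(returned <= requested);

	/* Case 1: a zero-length short read with no error. */
	if (requested != 0 && returned == 0)
		return true;
	/* Case 2: the read ended at or beyond the end of the file. */
	return offset + returned >= file_size;
}
```

    A short read that stops before the file size is therefore not
    reported as eof, which leaves the client free to retry.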

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     

10 Aug, 2018

2 commits

  • I've given up on the idea of zero-copy handling of SYMLINK on the
    server side. This is because the Linux VFS symlink API requires the
    symlink pathname to be in a NUL-terminated kmalloc'd buffer. The
    NUL-termination is going to be problematic (watching out for
    landing on a page boundary and dealing with a 4096-byte pathname).

    I don't believe that SYMLINK creation is on a performance path or is
    requested frequently enough that it will cause noticeable CPU cache
    pollution due to data copies.

    There will be two places where a transport callout will be necessary
    to fill in the rqstp: one will be in the svc_fill_symlink_pathname()
    helper that is used by NFSv2 and NFSv3, and the other will be in
    nfsd4_decode_create().

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • fill_in_write_vector() is nearly the same logic as
    svc_fill_write_vector(), but there are a few differences so that
    the former can handle multiple WRITE payloads in a single COMPOUND.

    svc_fill_write_vector() can be adjusted so that it can be used in
    the NFSv4 WRITE code path too. Instead of assuming the pages are
    coming from rq_args.pages, have the caller pass in the page list.

    The immediate benefit is a reduction of code duplication. It also
    prevents the NFSv4 WRITE decoder from passing an empty vector
    element when the transport has provided the payload in the xdr_buf's
    page array.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

04 Apr, 2018

2 commits

  • Move common code in NFSD's legacy SYMLINK decoders into a helper.
    The immediate benefits include:

    - one fewer data copy on transports that support DDP
    - consistent error checking across all versions
    - reduction of code duplication
    - support for both legal forms of SYMLINK requests on RDMA
    transports for all versions of NFS (in particular, NFSv2, for
    completeness)

    In the long term, this helper is an appropriate spot to perform a
    per-transport call-out to fill the pathname argument using, say,
    RDMA Reads.

    Filling the pathname in the proc function also means that eventually
    the incoming filehandle can be interpreted so that filesystem-
    specific memory can be allocated as a sink for the pathname
    argument, rather than using anonymous pages.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Move common code in NFSD's legacy NFS WRITE decoders into a helper.
    The immediate benefit is reduction of code duplication and some nice
    micro-optimizations (see below).

    In the long term, this helper can perform a per-transport call-out
    to fill the rq_vec (say, using RDMA Reads).

    The legacy WRITE decoders and procs are changed to work like NFSv4,
    which constructs the rq_vec just before it is about to call
    vfs_writev.

    Why? Calling a transport call-out from the proc instead of the XDR
    decoder means that the incoming FH can be resolved to a particular
    filesystem and file. This would allow pages from the backing file to
    be presented to the transport to be filled, rather than presenting
    anonymous pages and copying or flipping them into the file's page
    cache later.

    I also prefer using the pages in rq_arg.pages, instead of pulling
    the data pages directly out of the rqstp::rq_pages array. This is
    currently the way the NFSv3 write decoder works, but the other two
    do not seem to take this approach. Fixing this removes the only
    reference to rq_pages found in NFSD, eliminating an NFSD assumption
    about how transports use the pages in rq_pages.

    Lastly, avoid setting up the first element of rq_vec as a zero-
    length buffer. This happens with an RDMA transport when a normal
    Read chunk is present because the data payload is in rq_arg's
    page list (none of it is in the head buffer).
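
    A user-space sketch of the vector-filling idea, including the
    zero-length case: the head buffer contributes an element only if
    it actually holds payload, and the rest comes from the page list
    passed in by the caller. The struct, the page size constant and
    the function name are assumptions for illustration, not the
    kernel's helper.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u

struct sketch_kvec {
	void *base;
	size_t len;
};

/* Describe 'total' payload bytes, of which the first 'head_len' live
 * in 'head' and the remainder in 'pages'. Returns the number of
 * elements filled, at most 'max'. */
static size_t sketch_fill_write_vector(struct sketch_kvec *vec, size_t max,
				       void *head, size_t head_len,
				       void **pages, size_t total)
{
	size_t first = head_len < total ? head_len : total;
	size_t remaining = total;
	size_t n = 0;
	size_t i;

	assert(vec != NULL);

	/* Skip the head entirely when it carries no payload. */
	if (first > 0 && n < max) {
		vec[n].base = head;
		vec[n].len = first;
		n++;
		remaining -= first;
	}
	for (i = 0; remaining > 0 && n < max; i++) {
		size_t chunk = remaining < SKETCH_PAGE_SIZE ?
			       remaining : SKETCH_PAGE_SIZE;

		vec[n].base = pages[i];
		vec[n].len = chunk;
		n++;
		remaining -= chunk;
	}
	return n;
}
```

    Taking the page list as a parameter is what lets one helper serve
    both a single WRITE and several WRITE payloads in one COMPOUND.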

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

15 May, 2017

7 commits


11 Mar, 2017

1 commit

  • Now that Ext4 and f2fs filesystems support encrypted directories and
    files, attempts to access those files may return ENOKEY, resulting in
    the following WARNING.

    Map ENOKEY to nfserr_perm instead of nfserr_io.

    [ 1295.411759] ------------[ cut here ]------------
    [ 1295.411787] WARNING: CPU: 0 PID: 12786 at fs/nfsd/nfsproc.c:796 nfserrno+0x74/0x80 [nfsd]
    [ 1295.411806] nfsd: non-standard errno: -126
    [ 1295.411816] Modules linked in: nfsd nfs_acl auth_rpcgss nfsv4 nfs lockd fscache tun bridge stp llc fuse ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event coretemp crct10dif_pclmul crc32_generic crc32_pclmul snd_ens1371 gameport ghash_clmulni_intel snd_ac97_codec f2fs intel_rapl_perf ac97_bus snd_seq ppdev snd_pcm snd_rawmidi snd_timer vmw_balloon snd_seq_device snd joydev soundcore parport_pc parport nfit acpi_cpufreq tpm_tis vmw_vmci tpm_tis_core tpm shpchp i2c_piix4 grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel e1000 mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi fjes [last unloaded: nfs_acl]
    [ 1295.412522] CPU: 0 PID: 12786 Comm: nfsd Tainted: G W 4.11.0-rc1+ #521
    [ 1295.412959] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
    [ 1295.413814] Call Trace:
    [ 1295.414252] dump_stack+0x63/0x86
    [ 1295.414666] __warn+0xcb/0xf0
    [ 1295.415087] warn_slowpath_fmt+0x5f/0x80
    [ 1295.415502] ? put_filp+0x42/0x50
    [ 1295.415927] nfserrno+0x74/0x80 [nfsd]
    [ 1295.416339] nfsd_open+0xd7/0x180 [nfsd]
    [ 1295.416746] nfs4_get_vfs_file+0x367/0x3c0 [nfsd]
    [ 1295.417182] ? security_inode_permission+0x41/0x60
    [ 1295.417591] nfsd4_process_open2+0x9b2/0x1200 [nfsd]
    [ 1295.418007] nfsd4_open+0x481/0x790 [nfsd]
    [ 1295.418409] nfsd4_proc_compound+0x395/0x680 [nfsd]
    [ 1295.418812] nfsd_dispatch+0xb8/0x1f0 [nfsd]
    [ 1295.419233] svc_process_common+0x4d9/0x830 [sunrpc]
    [ 1295.419631] svc_process+0xfe/0x1b0 [sunrpc]
    [ 1295.420033] nfsd+0xe9/0x150 [nfsd]
    [ 1295.420420] kthread+0x101/0x140
    [ 1295.420802] ? nfsd_destroy+0x60/0x60 [nfsd]
    [ 1295.421199] ? kthread_park+0x90/0x90
    [ 1295.421598] ret_from_fork+0x2c/0x40
    [ 1295.421996] ---[ end trace 0d5a969cd7852e1f ]---

    Signed-off-by: Kinglong Mee
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    Kinglong Mee
     

01 Feb, 2017

2 commits


14 Oct, 2016

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Some RDMA work and some good bugfixes, and two new features that could
    benefit from user testing:

    - Anna Schumaker contributed a simple NFSv4.2 COPY implementation.
    COPY is already supported on the client side, so a call to
    copy_file_range() on a recent client should now result in a
    server-side copy that doesn't require all the data to make a round
    trip to the client and back.

    - Jeff Layton implemented callbacks to notify clients when contended
    locks become available, which should reduce latency on workloads
    with contended locks"

    * tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
    NFSD: Implement the COPY call
    nfsd: handle EUCLEAN
    nfsd: only WARN once on unmapped errors
    exportfs: be careful to only return expected errors.
    nfsd4: setclientid_confirm with unmatched verifier should fail
    nfsd: randomize SETCLIENTID reply to help distinguish servers
    nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
    nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
    nfsd: add a LRU list for blocked locks
    nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
    nfsd: plumb in a CB_NOTIFY_LOCK operation
    NFSD: fix corruption in notifier registration
    svcrdma: support Remote Invalidation
    svcrdma: Server-side support for rpcrdma_connect_private
    rpcrdma: RDMA/CM private message data structure
    svcrdma: Skip put_page() when send_reply() fails
    svcrdma: Tail iovec leaves an orphaned DMA mapping
    nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
    nfsd: eliminate cb_minorversion field
    nfsd: don't set a FL_LAYOUT lease for flexfiles layouts

    Linus Torvalds
     

08 Oct, 2016

2 commits


22 Sep, 2016

1 commit

  • inode_change_ok() will be responsible for clearing capabilities and IMA
    extended attributes and as such will need a dentry. Give it as an argument
    to inode_change_ok() instead of an inode. Also rename inode_change_ok()
    to setattr_prepare() to better reflect that it also makes some
    modifications in addition to checks.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     

05 Aug, 2016

2 commits

  • There's some odd logic in nfsd_create() that allows it to be called with
    the parent directory either locked or unlocked. The only already-locked
    caller is NFSv2's nfsd_proc_create(). It's less confusing to split out
    the unlocked case into a separate function which the NFSv2 code can call
    directly.

    Also fix some comments while we're here.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • lookup_one_len already has this check.

    The only effect of this patch is to return access instead of perm in the
    0-length-filename case. I actually prefer nfserr_perm (or _inval?), but
    I doubt anyone cares.

    The isdotent check seems redundant too, but I worry that some client
    might actually care about that strange nfserr_exist error.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

29 May, 2015

1 commit

  • NFSv2 can set the atime and/or mtime of a file to specific timestamps but not
    to the server's current time. To implement the equivalent of utimes("file",
    NULL), it uses a heuristic.

    NFSv3 and later do support setting the atime and/or mtime to the server's
    current time directly. The NFSv2 heuristic is still enabled, and causes
    timestamps to be set wrong sometimes.

    Fix this by moving the heuristic into the NFSv2 specific code. We can leave it
    out of the create code path: the owner can always set timestamps arbitrarily,
    and the workaround would never trigger.
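
    The message does not describe the heuristic itself. The sketch
    below assumes it has the following shape: a request that sets
    atime and mtime to the same value, close to the server's clock, is
    taken to mean "set to now". The 30-minute tolerance and all names
    are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed tolerance between client and server clocks, in seconds. */
#define SKETCH_TOUCH_TOLERANCE (30 * 60)

/* Returns true when explicit timestamps look like a client's attempt
 * to express utimes("file", NULL) over a protocol that cannot say
 * "use the server's current time". */
static bool sketch_looks_like_touch(int64_t atime, int64_t mtime,
				    int64_t server_now)
{
	int64_t delta;

	if (atime != mtime)
		return false;
	delta = atime - server_now;
	if (delta < 0)
		delta = -delta;
	return delta < SKETCH_TOUCH_TOLERANCE;
}
```

    A guess of this kind is only needed where the protocol cannot
    express the request directly, which is why it belongs in the
    NFSv2-specific path.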

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    Andreas Gruenbacher
     

16 Apr, 2015

1 commit


30 Jul, 2014

1 commit

  • It's possible for nfsd to fail opening a file that it has just created.
    When that happens, we throw a WARN but it doesn't include any info about
    the error code. Print the status code to give us a bit more info.

    Our QA group hit some of these warnings under some very heavy stress
    testing. My suspicion is that they hit the file-max limit, but it's hard
    to know for sure. Go ahead and add a -ENFILE mapping to
    nfserr_serverfault to make the error more distinct (and correct).

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     

10 Jul, 2014

1 commit

  • I saw this pop up with some pynfs testing:

    [ 123.609992] nfsd: non-standard errno: -7

    ...and -7 is -E2BIG. I think what happened is that XFS returned -E2BIG
    due to some xattr operations with the ACL10 pynfs TEST (I guess it has
    limited xattr size?).

    Add a better mapping for that error since it's possible that we'll need
    it. How about we convert it to NFSERR_FBIG? As Bruce points out, they
    both have "BIG" in the name so it must be good.

    Also, turn the printk in this function into a WARN() so that we can get
    a bit more information about situations that don't have proper mappings.
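
    A user-space sketch of such a mapping table with a loud fallback.
    The table is deliberately tiny and the names are hypothetical; the
    status numbers are the protocol values for NFSERR_PERM, NFSERR_IO
    and NFSERR_FBIG. fprintf() stands in for the kernel's WARN().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

enum sketch_nfs_stat {
	SKETCH_NFS_OK = 0,
	SKETCH_NFSERR_PERM = 1,
	SKETCH_NFSERR_IO = 5,
	SKETCH_NFSERR_FBIG = 27,
};

/* Takes a negative errno, as kernel code conventionally passes it. */
static enum sketch_nfs_stat sketch_nfserrno(int err)
{
	static const struct {
		int sys;
		enum sketch_nfs_stat nfs;
	} map[] = {
		{ 0, SKETCH_NFS_OK },
		{ -EPERM, SKETCH_NFSERR_PERM },
		{ -EIO, SKETCH_NFSERR_IO },
		{ -E2BIG, SKETCH_NFSERR_FBIG },
	};
	size_t i;

	assert(err <= 0);
	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (map[i].sys == err)
			return map[i].nfs;
	}
	/* Unmapped values are reported so the table can be extended. */
	fprintf(stderr, "sketch: non-standard errno: %d\n", err);
	return SKETCH_NFSERR_IO;
}
```

    The fallback keeps the server answering, and the warning records
    which errno still needs a proper mapping.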

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     

09 Jul, 2014

3 commits

  • Commit db2e747b1499 (vfs: remove mode parameter from vfs_symlink())
    removed the mode parameter from vfs_symlink(), so nfsd_symlink() no
    longer needs iattr. Just remove it.

    Signed-off-by: Kinglong Mee
    Signed-off-by: J. Bruce Fields

    Kinglong Mee
     
  • Currently nfsd_symlink has a weird hack to serve callers who don't
    null-terminate symlink data: it looks ahead at the next byte to see if
    it's zero, and copies it to a new buffer to null-terminate if not.

    That means callers don't have to null-terminate, but they *do* have to
    ensure that the byte following the end of the data is theirs to read.

    That's a bit subtle, and the NFSv4 code actually got this wrong.

    So let's just throw out that code and let callers pass null-terminated
    strings; we've already fixed them to do that.
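
    What a caller can do instead of relying on look-ahead, sketched in
    user space with malloc() standing in for a kernel allocator: copy
    exactly the bytes it owns into a buffer one byte larger and
    terminate it. The function name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a freshly allocated NUL-terminated copy of the first 'len'
 * bytes of 'data', or NULL on allocation failure. Never reads
 * data[len], so the byte after the payload need not be readable. */
static char *sketch_dup_terminated(const char *data, size_t len)
{
	char *buf;

	assert(data != NULL || len == 0);
	buf = malloc(len + 1);
	if (buf == NULL)
		return NULL;
	if (len > 0)
		memcpy(buf, data, len);
	buf[len] = '\0';
	return buf;
}
```

    The cost is one copy per SYMLINK request; in exchange, the callee
    no longer needs to know anything about what follows the data.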

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • It's simple enough for NFSv2 to null-terminate the symlink data.

    A bit weird (it depends on knowing that we've already read the following
    byte, which is either padding or part of the mode), but no worse than
    the conditional kstrdup it otherwise relies on in nfsd_symlink().

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

26 Feb, 2013

1 commit


31 Jul, 2012

1 commit

  • When mnt_want_write() starts to handle freezing, it will get full lock
    semantics, requiring proper lock ordering. So push the mnt_want_write()
    call consistently outside of i_mutex.

    CC: linux-nfs@vger.kernel.org
    CC: "J. Bruce Fields"
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara