12 Oct, 2020

5 commits

  • Reply to the client with multiple hole and data segments. I use the
    result of the first vfs_llseek() call for encoding as an optimization so
    we don't have to immediately repeat the call. This also lets us encode
    any remaining reply as data if we get an unexpected result while trying
    to calculate a hole.

    Signed-off-by: Anna Schumaker
    Signed-off-by: J. Bruce Fields

    Anna Schumaker
     
  • But only one of each right now. We'll expand on this in the next patch.

    Signed-off-by: Anna Schumaker
    Signed-off-by: J. Bruce Fields

    Anna Schumaker
     
  • However, we still only reply to the READ_PLUS call with a single segment
    at this time.

    Signed-off-by: Anna Schumaker
    Signed-off-by: J. Bruce Fields

    Anna Schumaker
     
  • This patch adds READ_PLUS support for returning a single
    NFS4_CONTENT_DATA segment to the client. This is basically the same as
    the READ operation, only with the extra information about data segments.

    Signed-off-by: Anna Schumaker
    Signed-off-by: J. Bruce Fields

    Anna Schumaker
     
  • The original intent was presumably to reduce code duplication. The
    trade-off was:

    - No support for an NFSD proc function returning a non-success
    RPC accept_stat value.
    - No support for void NFS replies to non-NULL procedures.
    - Everyone pays for the deduplication with a few extra conditional
    branches in a hot path.

    In addition, nfsd_dispatch() leaves *statp uninitialized in the
    success path, unlike svc_generic_dispatch().

    Address all of these problems by moving the logic for encoding
    the NFS status code into the NFS XDR encoders themselves. Then
    update the NFS .pc_func methods to return an RPC accept_stat
    value.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

02 Oct, 2020

1 commit


26 Sep, 2020

5 commits

  • Squelch some sparse warnings:

    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:1860:16: warning: incorrect type in assignment (different base types)
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:1860:16: expected int status
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:1860:16: got restricted __be32
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:1862:24: warning: incorrect type in return expression (different base types)
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:1862:24: expected restricted __be32
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:1862:24: got int status

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Squelch some sparse warnings:

    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4692:24: warning: incorrect type in return expression (different base types)
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4692:24: expected int
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4692:24: got restricted __be32 [usertype]
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4702:32: warning: incorrect type in return expression (different base types)
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4702:32: expected int
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4702:32: got restricted __be32 [usertype]
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4739:13: warning: incorrect type in assignment (different base types)
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4739:13: expected restricted __be32 [usertype] err
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4739:13: got int
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4891:15: warning: incorrect type in assignment (different base types)
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4891:15: expected unsigned int [assigned] [usertype] count
    /home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4891:15: got restricted __be32 [usertype]

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Reserving space for a large READ payload requires special handling when
    reserving space in the xdr buffer pages. One problem we can have is use
    of the scratch buffer, which is used to get a pointer to a contiguous
    region of data up to PAGE_SIZE. When using the scratch buffer, calls to
    xdr_commit_encode() shift the data to its proper alignment in the xdr
    buffer. If we've reserved several pages in a vector, then this could
    potentially invalidate earlier pointers and result in incorrect READ
    data being sent to the client.

    I get around this by looking at the amount of space left in the current
    page, and never reserve more than that for each entry in the read
    vector. This lets us place data directly where it needs to go in the
    buffer pages.
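
    The per-page reservation described above can be sketched in plain
    userspace C. This is only an illustration under stated assumptions, not
    the kernel code: SKETCH_PAGE_SIZE, struct chunk, and split_by_page() are
    hypothetical names, and the buffer is modelled as a flat byte offset.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for the sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* One entry of a hypothetical read vector: where it starts and how long it is. */
struct chunk {
    size_t offset;
    size_t len;
};

/*
 * Split a payload of `total` bytes starting at buffer offset `start` into
 * chunks that never cross a page boundary. Each chunk takes at most the
 * space left in the current page, so no chunk needs a scratch buffer.
 * Returns the number of chunks written to `out`.
 */
static size_t split_by_page(size_t start, size_t total,
                            struct chunk *out, size_t max_chunks)
{
    size_t n = 0;

    while (total > 0) {
        size_t room = SKETCH_PAGE_SIZE - (start % SKETCH_PAGE_SIZE);
        size_t take = total < room ? total : room;

        assert(n < max_chunks); /* caller must provide enough entries */
        out[n].offset = start;
        out[n].len = take;
        n++;

        start += take;
        total -= take;
    }
    return n;
}
```

    For example, a 5000-byte payload starting 4000 bytes into the buffer
    would be split into three chunks: the tail of the first page, one full
    page, and the remainder.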

    Signed-off-by: Anna Schumaker
    Signed-off-by: J. Bruce Fields

    Anna Schumaker
     
  • In nfsd4_encode_listxattrs(), the variable p is assigned to at one point
    but this value is never used before p is reassigned. Fix this.

    Addresses-Coverity: ("Unused value")
    Signed-off-by: Alex Dewar
    Signed-off-by: J. Bruce Fields

    Alex Dewar
     
  • Missing "is".

    Signed-off-by: Alex Dewar
    Signed-off-by: J. Bruce Fields

    Alex Dewar
     

14 Jul, 2020

3 commits


17 Mar, 2020

3 commits

  • Address some minor nits I noticed while working on this function.

    Signed-off-by: Chuck Lever

    Chuck Lever
     
  • svcrdma expects that the payload falls precisely into the xdr_buf
    page vector. This does not seem to be the case for
    nfsd4_encode_readv().

    This code is called only when fops->splice_read is missing or when
    RQ_SPLICE_OK is clear, so it's not a noticeable problem in many
    common cases.

    Add new transport method: ->xpo_read_payload so that when a READ
    payload does not fit exactly in rq_res's page vector, the XDR
    encoder can inform the RPC transport exactly where that payload is,
    without the payload's XDR pad.

    That way, when a Write chunk is present, the transport knows what
    byte range in the Reply message is supposed to be matched with the
    chunk.

    Note that the Linux NFS server implementation of NFS/RDMA can
    currently handle only one Write chunk per RPC-over-RDMA message.
    This simplifies the implementation of this fix.

    Fixes: b04209806384 ("nfsd4: allow exotic read compounds")
    Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053
    Signed-off-by: Chuck Lever

    Chuck Lever
     
  • Currently, nfsd4_encode_exchange_id() encodes the utsname nodename
    string in the server_scope field. In a multi-host container
    environment, if an nfsd container is restarted on a different host than
    it was originally running on, clients will see a server_scope mismatch
    and will not attempt to reclaim opens.

    Instead, set the server_scope while we're in a process context during
    service startup, so we get the utsname nodename of the current process
    and store that in nfsd_net.

    Signed-off-by: Scott Mayhew
    [bfields: fix up major_id too]
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Chuck Lever

    Scott Mayhew
     

20 Dec, 2019

2 commits


10 Dec, 2019

2 commits


08 Dec, 2019

1 commit

  • Pull nfsd updates from Bruce Fields:
    "This is a relatively quiet cycle for nfsd, mainly various bugfixes.

    Possibly most interesting is Trond's fixes for some callback races
    that were due to my incomplete understanding of rpc client shutdown.
    Unfortunately at the last minute I've started noticing a new
    intermittent failure to send callbacks. As the logic seems basically
    correct, I'm leaving Trond's patches in for now, and hope to find a
    fix in the next week so I don't have to revert those patches"

    * tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux: (24 commits)
    nfsd: depend on CRYPTO_MD5 for legacy client tracking
    NFSD fixing possible null pointer derefering in copy offload
    nfsd: check for EBUSY from vfs_rmdir/vfs_unink.
    nfsd: Ensure CLONE persists data and metadata changes to the target file
    SUNRPC: Fix backchannel latency metrics
    nfsd: restore NFSv3 ACL support
    nfsd: v4 support requires CRYPTO_SHA256
    nfsd: Fix cld_net->cn_tfm initialization
    lockd: remove __KERNEL__ ifdefs
    sunrpc: remove __KERNEL__ ifdefs
    race in exportfs_decode_fh()
    nfsd: Drop LIST_HEAD where the variable it declares is never used.
    nfsd: document callback_wq serialization of callback code
    nfsd: mark cb path down on unknown errors
    nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()
    nfsd: minor 4.1 callback cleanup
    SUNRPC: Fix svcauth_gss_proxy_init()
    SUNRPC: Trace gssproxy upcall results
    sunrpc: fix crash when cache_head become valid before update
    nfsd: remove private bin2hex implementation
    ...

    Linus Torvalds
     

16 Nov, 2019

1 commit

  • Most of the callers of lookup_one_len_unlocked() treat negatives as
    ERR_PTR(-ENOENT). Provide a helper that would do just that. Note
    that a pinned positive dentry remains positive - its ->d_inode is
    stable, etc.; a pinned _negative_ dentry can become positive at any
    point as long as you are not holding its parent at least shared.
    So using lookup_one_len_unlocked() needs to be careful;
    lookup_positive_unlocked() is safer and that's what the callers
    end up open-coding anyway.

    Signed-off-by: Al Viro

    Al Viro
     

09 Oct, 2019

1 commit

  • Fixes gcc '-Wunused-but-set-variable' warning:

    fs/nfsd/nfs4xdr.c: In function nfsd4_encode_splice_read:
    fs/nfsd/nfs4xdr.c:3464:7: warning: variable len set but not used [-Wunused-but-set-variable]

    It is not used since commit 83a63072c815 ("nfsd: fix nfs read eof detection")

    Reported-by: Hulk Robot
    Signed-off-by: YueHaibing
    Signed-off-by: J. Bruce Fields

    YueHaibing
     

24 Sep, 2019

1 commit

  • Currently, the knfsd server assumes that a short read indicates an
    end of file. That assumption is incorrect. The short read means that
    either we've hit the end of file, or we've hit a read error.

    In the case of a read error, the client may want to retry (as per the
    implementation recommendations in RFC1813 and RFC7530), but currently it
    is being told that it hit an eof.

    Move the code to detect eof from version specific code into the generic
    nfsd read.

    Report eof only in the two following cases:
    1) read() returns a zero length short read with no error.
    2) the offset+length of the read is >= the file size.
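
    The two cases above can be expressed as a small predicate. The following
    is a userspace sketch, not the function added by this commit:
    read_hit_eof() and its parameters are hypothetical names, and "length"
    in case 2 is assumed to mean the number of bytes actually returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a READ reply should report eof.
 *   err       - 0 on success, nonzero if the read failed
 *   offset    - starting offset of the read
 *   requested - number of bytes the client asked for
 *   returned  - number of bytes actually read
 *   file_size - current size of the file
 */
static bool read_hit_eof(int err, uint64_t offset, uint64_t requested,
                         uint64_t returned, uint64_t file_size)
{
    assert(returned <= requested);

    /* A read error is never reported as eof, so the client can retry. */
    if (err)
        return false;

    /* Case 1: zero-length short read with no error. */
    if (requested > 0 && returned == 0)
        return true;

    /* Case 2: the read reached or passed the end of the file. */
    return offset + returned >= file_size;
}
```

    A short read that stops before the file size, for instance 50 bytes
    returned at offset 0 of a 100-byte file, is not treated as eof in this
    sketch.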

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     

29 Aug, 2019

1 commit

  • We're unnecessarily limiting the size of an ACL to less than what most
    filesystems will support. Some users do hit the limit and it's
    confusing and unnecessary.

    It still seems prudent to impose some limit on the number of ACEs the
    client gives us before passing it straight to kmalloc(). So, let's just
    limit it to the maximum number that would be possible given the amount
    of data left in the argument buffer.

    That will still leave one limit beyond whatever the filesystem imposes:
    the client and server negotiate a limit on the size of a request, which
    we have to respect.

    But we're no longer imposing any additional arbitrary limit.

    struct nfs4_ace is 20 bytes on my system and the maximum call size we'll
    negotiate is about a megabyte, so in practice this is limiting the
    allocation here to about a megabyte.
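
    A minimal sketch of the bound described above, in userspace C. The
    names max_aces_for_buffer() and ace_count_ok() are hypothetical, and
    SKETCH_MIN_ACE_BYTES is an assumed minimum on-the-wire size of one ACE
    chosen for illustration; the actual check in the kernel may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption for this sketch: an encoded ACE needs at least 20 bytes
 * (type, flag, access mask, "who" length, and a short padded "who").
 */
#define SKETCH_MIN_ACE_BYTES 20u

/* Largest ACE count that could possibly fit in the remaining bytes. */
static size_t max_aces_for_buffer(size_t bytes_left, size_t min_ace_bytes)
{
    assert(min_ace_bytes > 0);
    return bytes_left / min_ace_bytes;
}

/*
 * Reject a client-supplied ACE count before allocating memory for it if
 * the count could not fit in what is left of the argument buffer.
 */
static bool ace_count_ok(uint32_t nace, size_t bytes_left)
{
    return nace <= max_aces_for_buffer(bytes_left, SKETCH_MIN_ACE_BYTES);
}
```

    With about a megabyte left in the argument buffer, this caps the count
    at roughly 52,000 entries, so a bogus count such as UINT32_MAX is
    rejected before any allocation is attempted.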

    Reported-by: "de Vandiere, Louis"
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

19 Aug, 2019

3 commits


04 Jul, 2019

3 commits

  • Decode the implementation ID and display it in nfsd/clients/#/info. It
    may help identify the client. It won't be used otherwise.

    (When this went into the protocol, I thought the implementation ID would
    be a slippery slope towards implementation-specific workarounds as with
    the http user-agent. But I guess I was wrong, the risk seems pretty low
    now.)

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Commit bf8d909705e "nfsd: Decode and send 64bit time values" fixed the
    code without updating the comment.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • After commit 95582b008388 "vfs: change inode times to use struct
    timespec64" there are spots in the NFSv4 decoding where we decode the
    protocol into a struct timeval and then convert that into a timeval64.

    That's unnecessary in the NFSv4 case since the on-the-wire protocol also
    uses 64-bit values. So just fix up our code to use timeval64 everywhere.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

24 Apr, 2019

2 commits

  • Convert knfsd to use the user namespace of the container that started
    the server processes.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • clang warns that 'contextlen' may be accessed without an initialization:

    fs/nfsd/nfs4xdr.c:2911:9: error: variable 'contextlen' is uninitialized when used here [-Werror,-Wuninitialized]
    contextlen);
    ^~~~~~~~~~
    fs/nfsd/nfs4xdr.c:2424:16: note: initialize the variable 'contextlen' to silence this warning
    int contextlen;
    ^
    = 0

    Presumably this cannot happen, as FATTR4_WORD2_SECURITY_LABEL is
    set if CONFIG_NFSD_V4_SECURITY_LABEL is enabled.
    Adding another #ifdef like the other two in this function
    avoids the warning.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: J. Bruce Fields

    Arnd Bergmann
     

26 Sep, 2018

3 commits

  • Upon receiving a request for async copy, create a new kthread. If we
    get an asynchronous request, make sure to copy the needed arguments/state
    from the stack before starting the copy. Then start the thread and reply
    back to the client indicating copy is asynchronous.

    nfsd_copy_file_range() will copy in a loop over the total number of
    bytes that need to be copied. If a failure happens in the middle, we
    ignore the error and return how much we copied so far. Once done, we
    create a work item for the callback workqueue and send CB_OFFLOAD with
    the results.

    The lifetime of the copy stateid is bound to the vfs copy. This way we
    don't need to keep the nfsd_net structure for the callback. We could
    keep it around longer so that an OFFLOAD_STATUS that came late would
    still get results, but clients should be able to deal without that.

    We handle OFFLOAD_CANCEL by sending a signal to the copy thread and
    calling kthread_stop.

    A client should cancel any ongoing copies before calling DESTROY_CLIENT;
    if not, we return a CLIENT_BUSY error.

    If the client is destroyed for some other reason (lease expiration, or
    server shutdown), we must clean up any ongoing copies ourselves.

    Signed-off-by: Olga Kornievskaia
    [colin.king@canonical.com: fix leak in error case]
    [bfields@fieldses.org: remove signalling, merge patches]
    Signed-off-by: J. Bruce Fields

    Olga Kornievskaia
     
  • Signed-off-by: Olga Kornievskaia
    Signed-off-by: J. Bruce Fields

    Olga Kornievskaia
     
  • Signed-off-by: Olga Kornievskaia
    Signed-off-by: J. Bruce Fields

    Olga Kornievskaia
     

10 Aug, 2018

1 commit

  • READ_BUF(8);
    dummy = be32_to_cpup(p++);
    dummy = be32_to_cpup(p++);
    ...
    READ_BUF(4);
    dummy = be32_to_cpup(p++);

    Assigning value to "dummy" here, but that stored value
    is overwritten before it can be used.
    At the same time READ_BUF() will re-update the pointer p.

    Delete the invalid assignment statements.

    Signed-off-by: nixiaoming
    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    nixiaoming
     

17 Jun, 2018

2 commits

  • The change attribute is what is used by clients to revalidate their
    caches. Our server may use i_version or ctime for that purpose. Those
    choices behave slightly differently, and it may be useful to the client
    to know which we're using. This attribute tells the client that. The
    Linux client doesn't use this attribute yet, though.

    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Currently we return the worst-case value of 1 second in the time delta
    attribute. That's not terribly useful. Instead, return a value
    calculated from the time granularity supported by the filesystem and the
    system clock.
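
    One way such a value could be derived is sketched below in userspace C.
    The commit text does not say how the two inputs are combined; taking
    the coarser (larger) of the two is an assumption of this sketch, and
    time_delta_ns(), fs_gran_ns, and tick_hz are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000u

/*
 * Return a time delta in nanoseconds given:
 *   fs_gran_ns - timestamp granularity the filesystem supports
 *   tick_hz    - system clock tick rate in ticks per second
 * Assumption: the reported delta is the coarser of the two, since
 * timestamps cannot be more precise than either source allows.
 */
static uint32_t time_delta_ns(uint32_t fs_gran_ns, uint32_t tick_hz)
{
    uint32_t tick_ns;

    assert(tick_hz > 0 && tick_hz <= SKETCH_NSEC_PER_SEC);
    tick_ns = SKETCH_NSEC_PER_SEC / tick_hz;

    return fs_gran_ns > tick_ns ? fs_gran_ns : tick_ns;
}
```

    Under these assumptions, a filesystem with nanosecond timestamps on a
    system ticking 250 times per second would report a delta of 4 ms rather
    than the worst-case 1 second.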

    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields