06 Nov, 2020

1 commit


12 Oct, 2020

1 commit

  • The original intent was presumably to reduce code duplication. The
    trade-off was:

    - No support for an NFSD proc function returning a non-success
    RPC accept_stat value.
    - No support for void NFS replies to non-NULL procedures.
    - Everyone pays for the deduplication with a few extra conditional
    branches in a hot path.

    In addition, nfsd_dispatch() leaves *statp uninitialized in the
    success path, unlike svc_generic_dispatch().

    Address all of these problems by moving the logic for encoding
    the NFS status code into the NFS XDR encoders themselves. Then
    update the NFS .pc_func methods to return an RPC accept_stat
    value.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

02 Oct, 2020

1 commit


23 Jan, 2020

2 commits


20 Dec, 2019

1 commit

  • The decode_time3 function behaves differently on 32-bit
    and 64-bit architectures: on the former, a 32-bit timestamp
    gets converted into an signed number and then into a timestamp
    between 1902 and 2038, while on the latter it is interpreted
    as unsigned in the range 1970-2106.

    Change all the remaining 'timespec' in nfsd to 'timespec64'
    to make the behavior the same, and use the current interpretation
    of the dominant 64-bit architectures.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: J. Bruce Fields

    Arnd Bergmann
     

16 Nov, 2019

1 commit

  • Most of the callers of lookup_one_len_unlocked() treat negatives are
    ERR_PTR(-ENOENT). Provide a helper that would do just that. Note
    that a pinned positive dentry remains positive - it's ->d_inode is
    stable, etc.; a pinned _negative_ dentry can become positive at any
    point as long as you are not holding its parent at least shared.
    So using lookup_one_len_unlocked() needs to be careful;
    lookup_positive_unlocked() is safer and that's what the callers
    end up open-coding anyway.

    Signed-off-by: Al Viro

    Al Viro
     

10 Sep, 2019

1 commit


24 Apr, 2019

1 commit


06 Apr, 2019

1 commit

  • After this commit
    f875a79 nfsd: allow nfsv3 readdir request to be larger.
    nfsv3 readdir request size can be larger than PAGE_SIZE. So if the
    directory been read is large enough, we can use multiple pages
    in rq_respages. Update buffer count and page pointers like we do
    in readdirplus to make this happen.

    Now listing a directory within 3000 files will panic because we
    are counting in a wrong way and would write on random page.

    Fixes: f875a79 "nfsd: allow nfsv3 readdir request to be larger"
    Signed-off-by: Murphy Zhou
    Signed-off-by: J. Bruce Fields

    Murphy Zhou
     

09 Mar, 2019

1 commit

  • nfsd currently reports the NFSv3 dtpref FSINFO parameter
    to be PAGE_SIZE, so NFS clients will typically ask for one
    page of directory entries at a time. This is needlessly restrictive
    as nfsd can handle larger replies easily.

    Also, a READDIR request (but not a READDIRPLUS request) has the count
    size clipped to PAGE_SIE, again unnecessary.

    This patch lifts these limits so that larger readdir requests can be
    used.

    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     

06 Mar, 2019

1 commit

  • If the result of an NFSv3 readdir{,plus} request results in the
    "offset" on one entry having to be split across 2 pages, and is sized
    so that the next directory entry doesn't fit in the requested size,
    then memory corruption can happen.

    When encode_entry() is called after encoding the last entry that fits,
    it notices that ->offset and ->offset1 are set, and so stores the
    offset value in the two pages as required. It clears ->offset1 but
    *does not* clear ->offset.

    Normally this omission doesn't matter as encode_entry_baggage() will
    be called, and will set ->offset to a suitable value (not on a page
    boundary).
    But in the case where cd->buflen < elen and nfserr_toosmall is
    returned, ->offset is not reset.

    This means that nfsd3proc_readdirplus will see ->offset with a value 4
    bytes before the end of a page, and ->offset1 set to NULL.
    It will try to write 8bytes to ->offset.
    If we are lucky, the next page will be read-only, and the system will
    BUG: unable to handle kernel paging request at...

    If we are unlucky, some innocent page will have the first 4 bytes
    corrupted.

    nfsd3proc_readdir() doesn't even check for ->offset1, it just blindly
    writes 8 bytes to the offset wherever it is.

    Fix this by clearing ->offset after it is used, and copying the
    ->offset handling code from nfsd3_proc_readdirplus into
    nfsd3_proc_readdir.

    (Note that the commit hash in the Fixes tag is from the 'history'
    tree - this bug predates git).

    Fixes: 0b1d57cf7654 ("[PATCH] kNFSd: Fix nfs3 dentry encoding")
    Fixes-URL: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=0b1d57cf7654
    Cc: stable@vger.kernel.org (v2.6.12+)
    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     

06 Jun, 2018

1 commit

  • struct timespec is not y2038 safe. Transition vfs to use
    y2038 safe struct timespec64 instead.

    The change was made with the help of the following cocinelle
    script. This catches about 80% of the changes.
    All the header file and logic changes are included in the
    first 5 rules. The rest are trivial substitutions.
    I avoid changing any of the function signatures or any other
    filesystem specific data structures to keep the patch simple
    for review.

    The script can be a little shorter by combining different cases.
    But, this version was sufficient for my usecase.

    virtual patch

    @ depends on patch @
    identifier now;
    @@
    - struct timespec
    + struct timespec64
    current_time ( ... )
    {
    - struct timespec now = current_kernel_time();
    + struct timespec64 now = current_kernel_time64();
    ...
    - return timespec_trunc(
    + return timespec64_trunc(
    ... );
    }

    @ depends on patch @
    identifier xtime;
    @@
    struct \( iattr \| inode \| kstat \) {
    ...
    - struct timespec xtime;
    + struct timespec64 xtime;
    ...
    }

    @ depends on patch @
    identifier t;
    @@
    struct inode_operations {
    ...
    int (*update_time) (...,
    - struct timespec t,
    + struct timespec64 t,
    ...);
    ...
    }

    @ depends on patch @
    identifier t;
    identifier fn_update_time =~ "update_time$";
    @@
    fn_update_time (...,
    - struct timespec *t,
    + struct timespec64 *t,
    ...) { ... }

    @ depends on patch @
    identifier t;
    @@
    lease_get_mtime( ... ,
    - struct timespec *t
    + struct timespec64 *t
    ) { ... }

    @te depends on patch forall@
    identifier ts;
    local idexpression struct inode *inode_node;
    identifier i_xtime =~ "^i_[acm]time$";
    identifier ia_xtime =~ "^ia_[acm]time$";
    identifier fn_update_time =~ "update_time$";
    identifier fn;
    expression e, E3;
    local idexpression struct inode *node1;
    local idexpression struct inode *node2;
    local idexpression struct iattr *attr1;
    local idexpression struct iattr *attr2;
    local idexpression struct iattr attr;
    identifier i_xtime1 =~ "^i_[acm]time$";
    identifier i_xtime2 =~ "^i_[acm]time$";
    identifier ia_xtime1 =~ "^ia_[acm]time$";
    identifier ia_xtime2 =~ "^ia_[acm]time$";
    @@
    (
    (
    - struct timespec ts;
    + struct timespec64 ts;
    |
    - struct timespec ts = current_time(inode_node);
    + struct timespec64 ts = current_time(inode_node);
    )

    i_xtime, &ts)
    + timespec64_equal(&inode_node->i_xtime, &ts)
    |
    - timespec_equal(&ts, &inode_node->i_xtime)
    + timespec64_equal(&ts, &inode_node->i_xtime)
    |
    - timespec_compare(&inode_node->i_xtime, &ts)
    + timespec64_compare(&inode_node->i_xtime, &ts)
    |
    - timespec_compare(&ts, &inode_node->i_xtime)
    + timespec64_compare(&ts, &inode_node->i_xtime)
    |
    ts = current_time(e)
    |
    fn_update_time(..., &ts,...)
    |
    inode_node->i_xtime = ts
    |
    node1->i_xtime = ts
    |
    ts = inode_node->i_xtime
    |
    ia_xtime ...+> = ts
    |
    ts = attr1->ia_xtime
    |
    ts.tv_sec
    |
    ts.tv_nsec
    |
    btrfs_set_stack_timespec_sec(..., ts.tv_sec)
    |
    btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
    |
    - ts = timespec64_to_timespec(
    + ts =
    ...
    -)
    |
    - ts = ktime_to_timespec(
    + ts = ktime_to_timespec64(
    ...)
    |
    - ts = E3
    + ts = timespec_to_timespec64(E3)
    |
    - ktime_get_real_ts(&ts)
    + ktime_get_real_ts64(&ts)
    |
    fn(...,
    - ts
    + timespec64_to_timespec(ts)
    ,...)
    )
    ...+>
    (

    )
    |
    - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
    + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
    |
    - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
    + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
    |
    - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
    + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
    |
    node1->i_xtime1 =
    - timespec_trunc(attr1->ia_xtime1,
    + timespec64_trunc(attr1->ia_xtime1,
    ...)
    |
    - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
    + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2,
    ...)
    |
    - ktime_get_real_ts(&attr1->ia_xtime1)
    + ktime_get_real_ts64(&attr1->ia_xtime1)
    |
    - ktime_get_real_ts(&attr.ia_xtime1)
    + ktime_get_real_ts64(&attr.ia_xtime1)
    )

    @ depends on patch @
    struct inode *node;
    struct iattr *attr;
    identifier fn;
    identifier i_xtime =~ "^i_[acm]time$";
    identifier ia_xtime =~ "^ia_[acm]time$";
    expression e;
    @@
    (
    - fn(node->i_xtime);
    + fn(timespec64_to_timespec(node->i_xtime));
    |
    fn(...,
    - node->i_xtime);
    + timespec64_to_timespec(node->i_xtime));
    |
    - e = fn(attr->ia_xtime);
    + e = fn(timespec64_to_timespec(attr->ia_xtime));
    )

    @ depends on patch forall @
    struct inode *node;
    struct iattr *attr;
    identifier i_xtime =~ "^i_[acm]time$";
    identifier ia_xtime =~ "^ia_[acm]time$";
    identifier fn;
    @@
    {
    + struct timespec ts;
    i_xtime);
    fn (...,
    - &node->i_xtime,
    + &ts,
    ...);
    |
    + ts = timespec64_to_timespec(attr->ia_xtime);
    fn (...,
    - &attr->ia_xtime,
    + &ts,
    ...);
    )
    ...+>
    }

    @ depends on patch forall @
    struct inode *node;
    struct iattr *attr;
    struct kstat *stat;
    identifier ia_xtime =~ "^ia_[acm]time$";
    identifier i_xtime =~ "^i_[acm]time$";
    identifier xtime =~ "^[acm]time$";
    identifier fn, ret;
    @@
    {
    + struct timespec ts;
    i_xtime);
    ret = fn (...,
    - &node->i_xtime,
    + &ts,
    ...);
    |
    + ts = timespec64_to_timespec(node->i_xtime);
    ret = fn (...,
    - &node->i_xtime);
    + &ts);
    |
    + ts = timespec64_to_timespec(attr->ia_xtime);
    ret = fn (...,
    - &attr->ia_xtime,
    + &ts,
    ...);
    |
    + ts = timespec64_to_timespec(attr->ia_xtime);
    ret = fn (...,
    - &attr->ia_xtime);
    + &ts);
    |
    + ts = timespec64_to_timespec(stat->xtime);
    ret = fn (...,
    - &stat->xtime);
    + &ts);
    )
    ...+>
    }

    @ depends on patch @
    struct inode *node;
    struct inode *node2;
    identifier i_xtime1 =~ "^i_[acm]time$";
    identifier i_xtime2 =~ "^i_[acm]time$";
    identifier i_xtime3 =~ "^i_[acm]time$";
    struct iattr *attrp;
    struct iattr *attrp2;
    struct iattr attr ;
    identifier ia_xtime1 =~ "^ia_[acm]time$";
    identifier ia_xtime2 =~ "^ia_[acm]time$";
    struct kstat *stat;
    struct kstat stat1;
    struct timespec64 ts;
    identifier xtime =~ "^[acmb]time$";
    expression e;
    @@
    (
    ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ;
    |
    node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
    |
    node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
    |
    node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
    |
    stat->xtime = node2->i_xtime1;
    |
    stat1.xtime = node2->i_xtime1;
    |
    ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ;
    |
    ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
    |
    - e = node->i_xtime1;
    + e = timespec64_to_timespec( node->i_xtime1 );
    |
    - e = attrp->ia_xtime1;
    + e = timespec64_to_timespec( attrp->ia_xtime1 );
    |
    node->i_xtime1 = current_time(...);
    |
    node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
    - e;
    + timespec_to_timespec64(e);
    |
    node->i_xtime1 = node->i_xtime3 =
    - e;
    + timespec_to_timespec64(e);
    |
    - node->i_xtime1 = e;
    + node->i_xtime1 = timespec_to_timespec64(e);
    )

    Signed-off-by: Deepa Dinamani
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:

    Deepa Dinamani
     

04 Apr, 2018

2 commits

  • Move common code in NFSD's legacy SYMLINK decoders into a helper.
    The immediate benefits include:

    - one fewer data copies on transports that support DDP
    - consistent error checking across all versions
    - reduction of code duplication
    - support for both legal forms of SYMLINK requests on RDMA
    transports for all versions of NFS (in particular, NFSv2, for
    completeness)

    In the long term, this helper is an appropriate spot to perform a
    per-transport call-out to fill the pathname argument using, say,
    RDMA Reads.

    Filling the pathname in the proc function also means that eventually
    the incoming filehandle can be interpreted so that filesystem-
    specific memory can be allocated as a sink for the pathname
    argument, rather than using anonymous pages.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Move common code in NFSD's legacy NFS WRITE decoders into a helper.
    The immediate benefit is reduction of code duplication and some nice
    micro-optimizations (see below).

    In the long term, this helper can perform a per-transport call-out
    to fill the rq_vec (say, using RDMA Reads).

    The legacy WRITE decoders and procs are changed to work like NFSv4,
    which constructs the rq_vec just before it is about to call
    vfs_writev.

    Why? Calling a transport call-out from the proc instead of the XDR
    decoder means that the incoming FH can be resolved to a particular
    filesystem and file. This would allow pages from the backing file to
    be presented to the transport to be filled, rather than presenting
    anonymous pages and copying or flipping them into the file's page
    cache later.

    I also prefer using the pages in rq_arg.pages, instead of pulling
    the data pages directly out of the rqstp::rq_pages array. This is
    currently the way the NFSv3 write decoder works, but the other two
    do not seem to take this approach. Fixing this removes the only
    reference to rq_pages found in NFSD, eliminating an NFSD assumption
    about how transports use the pages in rq_pages.

    Lastly, avoid setting up the first element of rq_vec as a zero-
    length buffer. This happens with an RDMA transport when a normal
    Read chunk is present because the data payload is in rq_arg's
    page list (none of it is in the head buffer).

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

09 Feb, 2018

1 commit

  • The time values in stat and inode may differ for overlayfs and stat time
    values are the correct ones to use. This is also consistent with the fact
    that fill_post_wcc() also stores stat time values.

    This means introducing a stat call that could fail, where previously we
    were just copying values out of the inode. To be conservative about
    changing behavior, we fall back to copying values out of the inode in
    the error case. It might be better just to clear fh_pre_saved (though
    note the BUG_ON in set_change_info).

    Signed-off-by: Amir Goldstein
    Signed-off-by: J. Bruce Fields

    Amir Goldstein
     

19 Nov, 2017

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Lots of good bugfixes, including:

    - fix a number of races in the NFSv4+ state code

    - fix some shutdown crashes in multiple-network-namespace cases

    - relax our 4.1 session limits; if you've an artificially low limit
    to the number of 4.1 clients that can mount simultaneously, try
    upgrading"

    * tag 'nfsd-4.15' of git://linux-nfs.org/~bfields/linux: (22 commits)
    SUNRPC: Improve ordering of transport processing
    nfsd: deal with revoked delegations appropriately
    svcrdma: Enqueue after setting XPT_CLOSE in completion handlers
    nfsd: use nfs->ns.inum as net ID
    rpc: remove some BUG()s
    svcrdma: Preserve CB send buffer across retransmits
    nfds: avoid gettimeofday for nfssvc_boot time
    fs, nfsd: convert nfs4_file.fi_ref from atomic_t to refcount_t
    fs, nfsd: convert nfs4_cntl_odstate.co_odcount from atomic_t to refcount_t
    fs, nfsd: convert nfs4_stid.sc_count from atomic_t to refcount_t
    lockd: double unregister of inetaddr notifiers
    nfsd4: catch some false session retries
    nfsd4: fix cached replies to solo SEQUENCE compounds
    sunrcp: make function _svc_create_xprt static
    SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status
    nfsd: use ARRAY_SIZE
    nfsd: give out fewer session slots as limit approaches
    nfsd: increase DRC cache limit
    nfsd: remove unnecessary nofilehandle checks
    nfs_common: convert int to bool
    ...

    Linus Torvalds
     

08 Nov, 2017

1 commit

  • do_gettimeofday() is deprecated and we should generally use time64_t
    based functions instead.

    In case of nfsd, all three users of nfssvc_boot only use the initial
    time as a unique token, and are not affected by it overflowing, so they
    are not affected by the y2038 overflow.

    This converts the structure to timespec64 anyway and adds comments
    to all uses, to document that we have thought about it and avoid
    having to look at it again.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: J. Bruce Fields

    Arnd Bergmann
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

13 Jul, 2017

1 commit

  • Factoring ctime into the nfsv4 change attribute gives us better
    properties than just i_version alone.

    Eventually we'll likely also expose this (as opposed to raw i_version)
    to userspace, at which point we'll want to move it to a common helper,
    called from either userspace or individual filesystems. For now, nfsd
    is the only user.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

29 Jun, 2017

1 commit


17 May, 2017

1 commit

  • This reverts commit 51f567777799 "nfsd: check for oversized NFSv2/v3
    arguments", which breaks support for NFSv3 ACLs.

    That patch was actually an earlier draft of a fix for the problem that
    was eventually fixed by e6838a29ecb "nfsd: check for oversized NFSv2/v3
    arguments". But somehow I accidentally left this earlier draft in the
    branch that was part of my 2.12 pull request.

    Reported-by: Eryu Guan
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

15 May, 2017

3 commits


26 Apr, 2017

3 commits

  • A client can append random data to the end of an NFSv2 or NFSv3 RPC call
    without our complaining; we'll just stop parsing at the end of the
    expected data and ignore the rest.

    Encoded arguments and replies are stored together in an array of pages,
    and if a call is too large it could leave inadequate space for the
    reply. This is normally OK because NFS RPC's typically have either
    short arguments and long replies (like READ) or long arguments and short
    replies (like WRITE). But a client that sends an incorrectly long reply
    can violate those assumptions. This was observed to cause crashes.

    So, insist that the argument not be any longer than we expect.

    Also, several operations increment rq_next_page in the decode routine
    before checking the argument size, which can leave rq_next_page pointing
    well past the end of the page array, causing trouble later in
    svc_free_pages.

    As followup we may also want to rewrite the encoding routines to check
    more carefully that they aren't running off the end of the page array.

    Reported-by: Tuomas Haanpää
    Reported-by: Ari Kauppi
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The NFSv2/v3 code does not systematically check whether we decode past
    the end of the buffer. This generally appears to be harmless, but there
    are a few places where we do arithmetic on the pointers involved and
    don't account for the possibility that a length could be negative. Add
    checks to catch these.

    Reported-by: Tuomas Haanpää
    Reported-by: Ari Kauppi
    Reviewed-by: NeilBrown
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Use a couple shortcuts that will simplify a following bugfix.

    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

25 May, 2016

1 commit

  • Pull nfsd updates from Bruce Fields:
    "A very quiet cycle for nfsd, mainly just an RDMA update from Chuck
    Lever"

    * tag 'nfsd-4.7' of git://linux-nfs.org/~bfields/linux:
    sunrpc: fix stripping of padded MIC tokens
    svcrpc: autoload rdma module
    svcrdma: Generalize svc_rdma_xdr_decode_req()
    svcrdma: Eliminate code duplication in svc_rdma_recvfrom()
    svcrdma: Drain QP before freeing svcrdma_xprt
    svcrdma: Post Receives only for forward channel requests
    svcrdma: Remove superfluous line from rdma_read_chunks()
    svcrdma: svc_rdma_put_context() is invoked twice in Send error path
    svcrdma: Do not add XDR padding to xdr_buf page vector
    svcrdma: Support IPv6 with NFS/RDMA
    nfsd: handle seqid wraparound in nfsd4_preprocess_layout_stateid
    Remove unnecessary allocation

    Linus Torvalds
     

14 May, 2016

1 commit

  • An xdr_buf has a head, a vector of pages, and a tail. Each
    RPC request is presented to the NFS server contained in an
    xdr_buf.

    The RDMA transport would like to supply the NFS server with only
    the NFS WRITE payload bytes in the page vector. In some common
    cases, that would allow the NFS server to swap those pages right
    into the target file's page cache.

    Have the transport's RDMA Read logic put XDR pad bytes in the tail
    iovec, and not in the pages that hold the data payload.

    The NFSv3 WRITE XDR decoder is finicky about the lengths involved,
    so make sure it is looking in the correct places when computing
    the total length of the incoming NFS WRITE request.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

11 Apr, 2016

1 commit


09 Jan, 2016

1 commit

  • We need information about exports when crossing mountpoints during
    lookup or NFSv4 readdir. If we don't already have that information
    cached, we may have to ask (and wait for) rpc.mountd.

    In both cases we currently hold the i_mutex on the parent of the
    directory we're asking rpc.mountd about. We've seen situations where
    rpc.mountd performs some operation on that directory that tries to take
    the i_mutex again, resulting in deadlock.

    With some care, we may be able to avoid that in rpc.mountd. But it
    seems better just to avoid holding a mutex while waiting on userspace.

    It appears that lookup_one_len is pretty much the only operation that
    needs the i_mutex. So we could just drop the i_mutex elsewhere and do
    something like

    mutex_lock()
    lookup_one_len()
    mutex_unlock()

    In many cases though the lookup would have been cached and not required
    the i_mutex, so it's more efficient to create a lookup_one_len() variant
    that only takes the i_mutex when necessary.

    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    NeilBrown
     

13 Oct, 2015

1 commit


07 May, 2015

1 commit

  • The NFSv3 READDIRPLUS gets some of the returned attributes from the
    readdir, and some from an inode returned from a new lookup. The two
    objects could be different thanks to intervening renames.

    The attributes in READDIRPLUS are optional, so let's just skip them if
    we notice this case.

    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     

16 Apr, 2015

1 commit


23 Jun, 2014

1 commit


23 May, 2014

1 commit

  • Assignments should not happen inside an if conditional, but in the line
    before. This issue was reported by checkpatch.

    The semantic patch that makes this change is as follows
    (http://coccinelle.lip6.fr/):

    //

    @@
    identifier i1;
    expression e1;
    statement S;
    @@
    -if(!(i1 = e1)) S
    +i1 = e1;
    +if(!i1)
    +S

    //

    It has been tested by compilation.

    Signed-off-by: Benoit Taine
    Signed-off-by: J. Bruce Fields

    Benoit Taine
     

24 Jan, 2014

1 commit


11 Dec, 2013

1 commit

  • The Linux NFS server replies among other things to a "Check access permission"
    the following:

    NFS: File type = 2 (Directory)
    NFS: Mode = 040755

    A netapp server replies here:
    NFS: File type = 2 (Directory)
    NFS: Mode = 0755

    The RFC 1813 i read:
    fattr3

    struct fattr3 {
    ftype3 type;
    mode3 mode;
    uint32 nlink;
    ...
    For the mode bits only the lowest 9 are defined in the RFC

    As far as I can tell, knfsd has always done this, so apparently it's harmless.
    Nevertheless, it appears to be wrong.

    Note this is already correct in the NFSv4 case, only v2 and v3 need
    fixing.

    Signed-off-by: J. Bruce Fields

    Albert Fluegel
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds