24 Mar, 2019

2 commits

  • commit b7e5034cbecf5a65b7bfdc2b20a8378039577706 upstream.

    James Pearson found that an NFS server stopped responding to UDP
    requests if started with more than 1017 threads.

    sv_max_mesg is about 2^20, so that is probably where the calculation
    performed by

        svc_sock_setbufsize(svsk->sk_sock,
                            (serv->sv_nrthreads+3) * serv->sv_max_mesg,
                            (serv->sv_nrthreads+3) * serv->sv_max_mesg);

    starts to overflow an int.

    Reported-by: James Pearson
    Tested-by: James Pearson
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    J. Bruce Fields
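
The overflow described in commit b7e5034cbecf can be checked with a little arithmetic. The sketch below is not the kernel code: it assumes sv_max_mesg is 2^20 plus one 4 KiB page, and that the product is scaled by 2 before being stored in an int-sized field. Both values, and the names ASSUMED_MAX_MESG, bufsize_fits_int and scale, are assumptions made for illustration. Under those assumptions the result still fits in an int at 1017 threads and no longer fits at 1018.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: 1 MiB payload plus one 4 KiB page. */
#define ASSUMED_MAX_MESG ((int64_t)(1 << 20) + 4096)

/*
 * Compute the requested buffer size in 64 bits so the product itself
 * cannot wrap, then report whether the result still fits in an int.
 * 'scale' models any extra multiplier applied before the value is
 * stored in an int-sized field (hypothetical parameter).
 */
static bool bufsize_fits_int(int nrthreads, int64_t max_mesg, int scale)
{
    int64_t want;

    assert(nrthreads >= 0 && max_mesg > 0 && scale > 0);
    want = ((int64_t)nrthreads + 3) * max_mesg * scale;
    return want <= INT_MAX;
}
```

Doing the multiplication in a wider type and clamping before narrowing is one common way to avoid this class of bug; whether the upstream fix takes that exact form is not stated in the log.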
     
  • [ Upstream commit a4cb5bdb754afe21f3e9e7164213e8600cf69427 ]

    Make sure the device has at least 2 completion vectors
    before allocating to compvec#1

    Fixes: a4699f5647f3 (xprtrdma: Put Send CQ in IB_POLL_WORKQUEUE mode)
    Signed-off-by: Nicolas Morey-Chaisemartin
    Reviewed-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Sasha Levin

    Nicolas Morey-Chaisemartin
     

27 Feb, 2019

1 commit

  • [ Upstream commit 6e17f58c486d9554341f70aa5b63b8fbed07b3fa ]

    The clean up is handled by the caller, rpcrdma_buffer_create(), so this
    call to rpcrdma_sendctxs_destroy() leads to a double free.

    Fixes: ae72950abf99 ("xprtrdma: Add data structure to manage RDMA Send arguments")
    Signed-off-by: Dan Carpenter
    Reviewed-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Sasha Levin

    Dan Carpenter
     

23 Feb, 2019

1 commit

  • commit e7afe6c1d486b516ed586dcc10b3e7e3e85a9c2b upstream.

    While trying to reproduce a reported kernel panic on arm64, I discovered
    that AUTH_GSS basically doesn't work at all with older enctypes on arm64
    systems with CONFIG_VMAP_STACK enabled. It turns out there are still a few
    places using stack memory with scatterlists, causing krb5_encrypt() and
    krb5_decrypt() to produce incorrect results (or a BUG if CONFIG_DEBUG_SG
    is enabled).

    Tested with cthon on v4.0/v4.1/v4.2 with krb5/krb5i/krb5p using
    des3-cbc-sha1 and arcfour-hmac-md5.

    Signed-off-by: Scott Mayhew
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Scott Mayhew
     

15 Feb, 2019

3 commits

  • commit e248aa7be86e8179f20ac0931774ecd746f3f5bf upstream.

    Two and a half years ago, the client was changed to use gathered
    Send for larger inline messages, in commit 655fec6987b ("xprtrdma:
    Use gathered Send for large inline messages"). Several fixes were
    required because there are a few in-kernel device drivers whose
    max_sge is 3, and these were broken by the change.

    Apparently my memory is going, because some time later, I submitted
    commit 25fd86eca11c ("svcrdma: Don't overrun the SGE array in
    svc_rdma_send_ctxt"), and after that, commit f3c1fd0ee294 ("svcrdma:
    Reduce max_send_sges"). These too incorrectly assumed in-kernel
    device drivers would have more than a few Send SGEs available.

    The fix for the server side is not the same. This is because the
    fundamental problem on the server is that, whether or not the client
    has provisioned a chunk for the RPC reply, the server must squeeze
    even the most complex RPC replies into a single RDMA Send. Failing
    in the send path because of Send SGE exhaustion should never be an
    option.

    Therefore, instead of failing when the send path runs out of SGEs,
    switch to using a bounce buffer mechanism to handle RPC replies that
    are too complex for the device to send directly. That allows us to
    remove the max_sge check to enable drivers with small max_sge to
    work again.

    Reported-by: Don Dutile
    Fixes: 25fd86eca11c ("svcrdma: Don't overrun the SGE array in ...")
    Cc: stable@vger.kernel.org
    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
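
The bounce-buffer idea in commit e248aa7be86e can be sketched in plain C. This is a userspace illustration only, not the svcrdma implementation: struct seg, build_send and the bounce parameters are hypothetical names. When a reply needs more elements than the device allows, all segments are copied into one contiguous buffer and sent as a single element instead of failing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one scatter/gather element. */
struct seg {
    const void *addr;
    size_t len;
};

/*
 * Build the element list for one send. If the reply already fits in
 * max_sge elements it is passed through untouched. Otherwise every
 * segment is copied into the contiguous bounce buffer and a single
 * element describing that buffer is emitted.
 * Returns the number of elements written to out; 0 means nothing can
 * be sent (empty input, max_sge of 0, or bounce buffer too small).
 */
static size_t build_send(const struct seg *in, size_t n, size_t max_sge,
                         void *bounce, size_t bounce_len, struct seg *out)
{
    size_t i, off = 0;

    if (max_sge == 0)
        return 0;
    if (n <= max_sge) {
        for (i = 0; i < n; i++)
            out[i] = in[i];
        return n;
    }
    for (i = 0; i < n; i++) {
        if (in[i].len > bounce_len - off)
            return 0;               /* would overrun the bounce buffer */
        memcpy((char *)bounce + off, in[i].addr, in[i].len);
        off += in[i].len;
    }
    out[0].addr = bounce;
    out[0].len = off;
    return 1;
}
```

The copy costs CPU time, but it only happens for replies that the device could not have sent directly, which is the trade-off the commit message describes.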
     
  • commit f3c1fd0ee294abd4367dfa72d89f016c682202f0 upstream.

    There's no need to request a large number of send SGEs because the
    inline threshold already constrains the number of SGEs per Send.

    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Signed-off-by: J. Bruce Fields
    Cc: Don Dutile
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • This patch is only appropriate for stable kernels v4.16 - v4.19

    Since commit 9b30889c548a ("SUNRPC: Ensure we always close the socket after
    a connection shuts down"), and until commit c544577daddb ("SUNRPC: Clean up
    transport write space handling"), it is possible for the NFS client to spin
    in the following tight loop:

    269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_bind [sunrpc]
    269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_connect [sunrpc]
    269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_transmit [sunrpc]
    269.964085: xprt_transmit: peer=[10.0.1.82]:2049 xid=0x761d3f77 status=-32
    269.964085: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=-32 action=call_transmit_status [sunrpc]
    269.964085: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=-32 action=call_status [sunrpc]
    269.964085: rpc_call_status: task:43@0 status=-32

    The issue is that the path through call_transmit_status does not release
    the XPRT_LOCK when the transmit result is -EPIPE, so the socket cannot be
    properly shut down.

    The below commit fixed things up in mainline by unconditionally calling
    xprt_end_transmit() and releasing the XPRT_LOCK after every pass through
    call_transmit. However, the entirety of this commit is not appropriate for
    stable kernels because its original inclusion was part of a series that
    modifies the sunrpc code to use a different queueing model. As a result,
    there are machinations within this patch that are not needed for a stable
    fix and will not make sense without a larger backport of the mainline
    series.

    In this patch, we take the slightly modified bit of the mainline patch
    below, which is to release the XPRT_LOCK on transmission error should we
    detect that the transport is waiting to close.

    commit c544577daddb618c7dd5fa7fb98d6a41782f020e upstream
    Author: Trond Myklebust
    Date: Mon Sep 3 23:39:27 2018 -0400

    SUNRPC: Clean up transport write space handling

    Treat socket write space handling in the same way we now treat transport
    congestion: by denying the XPRT_LOCK until the transport signals that it
    has free buffer space.

    Signed-off-by: Trond Myklebust

    The original discussion of the problem is here:

    https://lore.kernel.org/linux-nfs/20181212135157.4489-1-dwysocha@redhat.com/T/#t

    This passes my usual cthon and xfstests on NFS as applied on v4.19 mainline.

    Reported-by: Dave Wysochanski
    Suggested-by: Trond Myklebust
    Signed-off-by: Benjamin Coddington
    Signed-off-by: Greg Kroah-Hartman

    Benjamin Coddington
     

23 Jan, 2019

1 commit

  • commit 81c88b18de1f11f70c97f28ced8d642c00bb3955 upstream.

    If we ignore the error we'll hit a null dereference a little later.

    Reported-by: syzbot+4b98281f2401ab849f4b@syzkaller.appspotmail.com
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    J. Bruce Fields
     

17 Jan, 2019

1 commit

  • commit d4b09acf924b84bae77cad090a9d108e70b43643 upstream.

    If a node has NFSv4.1+ mounts inside several net namespaces,
    it can lead to a use-after-free in svc_process_common():

    svc_process_common()
    /* Setup reply header */
    rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE

    svc_process_common() can use an incorrect rqstp->rq_xprt;
    its caller, bc_svc_process(), takes it from serv->sv_bc_xprt.
    The problem is that serv is a global structure but sv_bc_xprt
    is assigned per net namespace.

    According to Trond, the whole "let's set up rqstp->rq_xprt
    for the back channel" is nothing but a giant hack in order
    to work around the fact that svc_process_common() uses it
    to find the xpt_ops, and perform a couple of (meaningless
    for the back channel) tests of xpt_flags.

    All we really need in svc_process_common() is to be able to run
    rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr()

    Bruce J Fields points out that this xpo_prep_reply_hdr() call
    is an awfully roundabout way just to do "svc_putnl(resv, 0);"
    in the tcp case.

    This patch does not initialize rqstp->rq_xprt in bc_svc_process();
    it now calls svc_process_common() with rqstp->rq_xprt = NULL.

    To adjust the reply header, svc_process_common() just checks
    rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() for the tcp case.

    To handle the rqstp->rq_xprt = NULL case in functions called from
    svc_process_common(), the patch introduces a net namespace pointer,
    svc_rqst->rq_bc_net, and adjusts the SVC_NET() definition.
    Some other functions were also adapted to properly handle this case.

    Signed-off-by: Vasily Averin
    Cc: stable@vger.kernel.org
    Fixes: 23c20ecd4475 ("NFS: callback up - users counting cleanup")
    Signed-off-by: J. Bruce Fields
    v2: added lost extern svc_tcp_prep_reply_hdr()
    Signed-off-by: Vasily Averin
    Signed-off-by: Greg Kroah-Hartman

    Vasily Averin
     

13 Jan, 2019

3 commits

  • commit b8be5674fa9a6f3677865ea93f7803c4212f3e10 upstream.

    Signed-off-by: Vasily Averin
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Vasily Averin
     
  • commit 4ecd55ea074217473f94cfee21bb72864d39f8d7 upstream.

    After commit d202cce8963d, an expired cache_head can be removed from the
    cache_detail's hash.

    However, the expired cache_head may be waiting for a reply from a
    previously submitted request. Such a cache_head has an increased
    refcounter and therefore it won't be freed after cache_put(freeme).

    Because the cache_head was removed from the hash it cannot be found
    during cache_clean() and can be leaked forever, together with stalled
    cache_request and other taken resources.

    In our case we noticed it because an entry in the export cache was
    holding a reference on a filesystem.

    Fixes: d202cce8963d ("sunrpc: never return expired entries in sunrpc_cache_lookup")
    Cc: Pavel Tikhomirov
    Cc: stable@kernel.org # 2.6.35
    Signed-off-by: Vasily Averin
    Reviewed-by: NeilBrown
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Vasily Averin
     
  • [ Upstream commit cf76785d30712d90185455e752337acdb53d2a5d ]

    Ensure that we clear XPRT_CONNECTING before releasing the XPRT_LOCK so that
    we don't have races between the (asynchronous) socket setup code and
    tasks in xprt_connect().

    Signed-off-by: Trond Myklebust
    Tested-by: Chuck Lever
    Signed-off-by: Sasha Levin

    Trond Myklebust
     

10 Jan, 2019

1 commit

  • [ Upstream commit 3a0ed3e9619738067214871e9cb826fa23b2ddb9 ]

    Al Viro mentioned that there is probably a race condition
    lurking in accesses of sk_stamp on 32-bit machines.

    sock->sk_stamp is of type ktime_t which is always an s64.
    On a 32 bit architecture, we might run into situations of
    unsafe access as the access to the field becomes non atomic.

    Use seqlocks for synchronization.
    This allows us to avoid using spinlocks for readers as
    readers do not need mutual exclusion.

    Another approach to solve this is to require sk_lock for all
    modifications of the timestamps. The current approach allows
    for timestamps to have their own lock: sk_stamp_lock.
    This allows for the patch to not compete with already
    existing critical sections, and side effects are limited
    to the paths in the patch.

    The addition of the new field maintains the data locality
    optimizations from
    commit 9115e8cd2a0c ("net: reorganize struct sock for better data
    locality")

    Note that all the instances of the sk_stamp accesses
    are either through the ioctl or the syscall recvmsg.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Deepa Dinamani
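
The seqlock approach from commit 3a0ed3e96197 can be illustrated in userspace with C11 atomics. This is a sketch, not the kernel's seqlock_t: struct stamp, stamp_write and stamp_read are hypothetical names, the 64-bit value is deliberately split into two 32-bit halves to mimic a 32-bit machine, and writers are assumed to be serialized by the caller. Readers take no lock; they retry if the sequence counter was odd or changed while they were reading.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical analogue of a timestamp guarded by a sequence counter. */
struct stamp {
    _Atomic uint32_t seq;  /* odd while a write is in progress */
    _Atomic uint32_t lo;   /* low 32 bits of the 64-bit value  */
    _Atomic uint32_t hi;   /* high 32 bits                     */
};

/* Writers must be serialized by the caller (assumption of this sketch). */
static void stamp_write(struct stamp *s, int64_t v)
{
    uint32_t seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
    uint64_t u = (uint64_t)v;

    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->lo, (uint32_t)(u & 0xffffffffu),
                          memory_order_relaxed);
    atomic_store_explicit(&s->hi, (uint32_t)(u >> 32),
                          memory_order_relaxed);
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Lock-free read: retry until a stable, even sequence number is seen. */
static int64_t stamp_read(struct stamp *s)
{
    uint32_t s0, s1, lo, hi;

    do {
        s0 = atomic_load_explicit(&s->seq, memory_order_acquire);
        lo = atomic_load_explicit(&s->lo, memory_order_relaxed);
        hi = atomic_load_explicit(&s->hi, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s1 = atomic_load_explicit(&s->seq, memory_order_relaxed);
    } while ((s0 & 1u) || s0 != s1);

    return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

A reader can never return a value whose two halves come from different writes, which is exactly the torn access the commit message worries about on 32-bit architectures.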
     

21 Dec, 2018

1 commit

  • [ Upstream commit 0a9a4304f3614e25d9de9b63502ca633c01c0d70 ]

    If an asynchronous connection attempt completes while another task is
    in xprt_connect(), then the call to rpc_sleep_on() could end up
    racing with the call to xprt_wake_pending_tasks().
    So add a second test of the connection state after we've put the
    task to sleep and set the XPRT_CONNECTING flag, when we know that there
    can be no asynchronous connection attempts still in progress.

    Fixes: 0b9e79431377d ("SUNRPC: Move the test for XPRT_CONNECTING into...")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Sasha Levin

    Trond Myklebust
     

13 Dec, 2018

1 commit

  • commit 8dae5398ab1ac107b1517e8195ed043d5f422bd0 upstream.

    call_encode can be invoked more than once per RPC call. Ensure that
    each call to gss_wrap_req_priv does not overwrite pointers to
    previously allocated memory.

    Signed-off-by: Chuck Lever
    Cc: stable@kernel.org
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     

01 Dec, 2018

1 commit


27 Nov, 2018

1 commit


21 Nov, 2018

1 commit

  • commit 5d7a5bcb67c70cbc904057ef52d3fcfeb24420bb upstream.

    When truncating the encode buffer, the page_ptr is getting
    advanced, causing the next page to be skipped while encoding.
    The page is still included in the response, so the response
    contains a page of bogus data.

    We need to adjust the page_ptr backwards to ensure we encode
    the next page into the correct place.

    We saw this triggered when concurrent directory modifications caused
    nfsd4_encode_direct_fattr() to return nfserr_noent, and the resulting
    call to xdr_truncate_encode() corrupted the READDIR reply.

    Signed-off-by: Frank Sorenson
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Frank Sorenson
     

14 Nov, 2018

2 commits

  • commit bb6ad5572c0022e17e846b382d7413cdcf8055be upstream.

    In call_xpt_users(), we delete the entry from the list, but we
    do not reinitialise it. This triggers the list poisoning when
    we later call unregister_xpt_user() in nfsd4_del_conns().

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
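
The difference between deleting a list entry and deleting plus reinitialising it, which commit bb6ad5572c00 relies on, can be shown with a minimal userspace list. The sketch is modelled on the kernel's list_head but is not the kernel code; struct node and the node_* helpers are hypothetical names. After node_del_init() the entry points at itself, so a second delete of the same entry is a harmless no-op instead of following stale (or poisoned) pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct node {
    struct node *next, *prev;
};

static void node_init(struct node *n)
{
    n->next = n;
    n->prev = n;
}

/* Insert n right after head. */
static void node_add(struct node *n, struct node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink only: the entry keeps stale pointers into the list. */
static void node_del(struct node *n)
{
    n->next->prev = n->prev;
    n->prev->next = n->next;
}

/* Unlink and reinitialise: a later delete of the same entry is a no-op. */
static void node_del_init(struct node *n)
{
    node_del(n);
    node_init(n);
}
```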
     
  • [ Upstream commit ef739b2175dde9c05594f768cb78149f1ce2ac36 ]

    On a fresh connection, an RPC/RDMA client is supposed to send only
    one RPC Call until it gets a credit grant in the first RPC Reply
    from the server [RFC 8166, Section 3.3.3].

    There is a bug in the Linux client's credit accounting mechanism
    introduced by commit e7ce710a8802 ("xprtrdma: Avoid deadlock when
    credit window is reset"). On connect, it simply dumps all pending
    RPC Calls onto the new connection.

    Servers have been tolerant of this bad behavior. Currently no server
    implementation ever changes its credit grant over reconnects, and
    servers always repost enough Receives before connections are fully
    established.

    To correct this issue, ensure that the client resets both the credit
    grant _and_ the congestion window when handling a reconnect.

    Fixes: e7ce710a8802 ("xprtrdma: Avoid deadlock when credit ... ")
    Signed-off-by: Chuck Lever
    Cc: stable@kernel.org
    Signed-off-by: Anna Schumaker
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     

24 Aug, 2018

2 commits

  • Pull NFS client updates from Anna Schumaker:
    "These patches include adding async support for the v4.2 COPY
    operation. I think Bruce is planning to send the server patches for
    the next release, but I figured we could get the client side out of
    the way now since it's been in my tree for a while. This shouldn't
    cause any problems, since the server will still respond with
    synchronous copies even if the client requests async.

    Features:
    - Add support for asynchronous server-side COPY operations

    Stable bugfixes:
    - Fix an off-by-one in bl_map_stripe() (v3.17+)
    - NFSv4 client live hangs after live data migration recovery (v4.9+)
    - xprtrdma: Fix disconnect regression (v4.18+)
    - Fix locking in pnfs_generic_recover_commit_reqs (v4.14+)
    - Fix a sleep in atomic context in nfs4_callback_sequence() (v4.9+)

    Other bugfixes and cleanups:
    - Optimizations and fixes involving NFS v4.1 / pNFS layout handling
    - Optimize lseek(fd, SEEK_CUR, 0) on directories to avoid locking
    - Immediately reschedule writeback when the server replies with an
    error
    - Fix excessive attribute revalidation in nfs_execute_ok()
    - Add error checking to nfs_idmap_prepare_message()
    - Use new vm_fault_t return type
    - Return a delegation when reclaiming one that the server has
    recalled
    - Referrals should inherit proto setting from parents
    - Make rpc_auth_create_args a const
    - Improvements to rpc_iostats tracking
    - Fix a potential reference leak when there is an error processing a
    callback
    - Fix rmdir / mkdir / rename nlink accounting
    - Fix updating inode change attribute
    - Fix error handling in nfs4_sp4_select_mode()
    - Use an appropriate work queue for direct-write completion
    - Don't busy wait if NFSv4 session draining is interrupted"

    * tag 'nfs-for-4.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (54 commits)
    pNFS: Remove unwanted optimisation of layoutget
    pNFS/flexfiles: ff_layout_pg_init_read should exit on error
    pNFS: Treat RECALLCONFLICT like DELAY...
    pNFS: When updating the stateid in layoutreturn, also update the recall range
    NFSv4: Fix a sleep in atomic context in nfs4_callback_sequence()
    NFSv4: Fix locking in pnfs_generic_recover_commit_reqs
    NFSv4: Fix a typo in nfs4_init_channel_attrs()
    NFSv4: Don't busy wait if NFSv4 session draining is interrupted
    NFS recover from destination server reboot for copies
    NFS add a simple sync nfs4_proc_commit after async COPY
    NFS handle COPY ERR_OFFLOAD_NO_REQS
    NFS send OFFLOAD_CANCEL when COPY killed
    NFS export nfs4_async_handle_error
    NFS handle COPY reply CB_OFFLOAD call race
    NFS add support for asynchronous COPY
    NFS COPY xdr handle async reply
    NFS OFFLOAD_CANCEL xdr
    NFS CB_OFFLOAD xdr
    NFS: Use an appropriate work queue for direct-write completion
    NFSv4: Fix error handling in nfs4_sp4_select_mode()
    ...

    Linus Torvalds
     
  • Pull nfsd updates from Bruce Fields:
    "Chuck Lever fixed a problem with NFSv4.0 callbacks over GSS from
    multi-homed servers.

    The only new feature is a minor bit of protocol (change_attr_type)
    which the client doesn't even use yet.

    Other than that, various bugfixes and cleanup"

    * tag 'nfsd-4.19-1' of git://linux-nfs.org/~bfields/linux: (27 commits)
    sunrpc: Add comment defining gssd upcall API keywords
    nfsd: Remove callback_cred
    nfsd: Use correct credential for NFSv4.0 callback with GSS
    sunrpc: Extract target name into svc_cred
    sunrpc: Enable the kernel to specify the hostname part of service principals
    sunrpc: Don't use stack buffer with scatterlist
    rpc: remove unneeded variable 'ret' in rdma_listen_handler
    nfsd: use true and false for boolean values
    nfsd: constify write_op[]
    fs/nfsd: Delete invalid assignment statements in nfsd4_decode_exchange_id
    NFSD: Handle full-length symlinks
    NFSD: Refactor the generic write vector fill helper
    svcrdma: Clean up Read chunk path
    svcrdma: Avoid releasing a page in svc_xprt_release()
    nfsd: Mark expected switch fall-through
    sunrpc: remove redundant variables 'checksumlen','blocksize' and 'data'
    nfsd: fix leaked file lock with nfs exported overlayfs
    nfsd: don't advertise a SCSI layout for an unsupported request_queue
    nfsd: fix corrupted reply to badly ordered compound
    nfsd: clarify check_op_ordering
    ...

    Linus Torvalds
     

23 Aug, 2018

4 commits

  • During review, it was found that the target, service, and srchost
    keywords are easily conflated. Add an explainer.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • NFSv4.0 callback needs to know the GSS target name the client used
    when it established its lease. That information is available from
    the GSS context created by gssproxy. Make it available in each
    svc_cred.

    Note this will also give us access to the real target service
    principal name (which is typically "nfs", but spec does not require
    that).

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • A multi-homed NFS server may have more than one "nfs" key in its
    keytab. Enable the kernel to pick the key it wants as a machine
    credential when establishing a GSS context.

    This is useful for GSS-protected NFSv4.0 callbacks, which are
    required by RFC 7530 S3.3.3 to use the same principal as the service
    principal the client used when establishing its lease.

    A complementary modification to rpc.gssd is required to fully enable
    this feature.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Fedora got a bug report from NFS:

    kernel BUG at include/linux/scatterlist.h:143!
    ...
    RIP: 0010:sg_init_one+0x7d/0x90
    ..
    make_checksum+0x4e7/0x760 [rpcsec_gss_krb5]
    gss_get_mic_kerberos+0x26e/0x310 [rpcsec_gss_krb5]
    gss_marshal+0x126/0x1a0 [auth_rpcgss]
    ? __local_bh_enable_ip+0x80/0xe0
    ? call_transmit_status+0x1d0/0x1d0 [sunrpc]
    call_transmit+0x137/0x230 [sunrpc]
    __rpc_execute+0x9b/0x490 [sunrpc]
    rpc_run_task+0x119/0x150 [sunrpc]
    nfs4_run_exchange_id+0x1bd/0x250 [nfsv4]
    _nfs4_proc_exchange_id+0x2d/0x490 [nfsv4]
    nfs41_discover_server_trunking+0x1c/0xa0 [nfsv4]
    nfs4_discover_server_trunking+0x80/0x270 [nfsv4]
    nfs4_init_client+0x16e/0x240 [nfsv4]
    ? nfs_get_client+0x4c9/0x5d0 [nfs]
    ? _raw_spin_unlock+0x24/0x30
    ? nfs_get_client+0x4c9/0x5d0 [nfs]
    nfs4_set_client+0xb2/0x100 [nfsv4]
    nfs4_create_server+0xff/0x290 [nfsv4]
    nfs4_remote_mount+0x28/0x50 [nfsv4]
    mount_fs+0x3b/0x16a
    vfs_kern_mount.part.35+0x54/0x160
    nfs_do_root_mount+0x7f/0xc0 [nfsv4]
    nfs4_try_mount+0x43/0x70 [nfsv4]
    ? get_nfs_version+0x21/0x80 [nfs]
    nfs_fs_mount+0x789/0xbf0 [nfs]
    ? pcpu_alloc+0x6ca/0x7e0
    ? nfs_clone_super+0x70/0x70 [nfs]
    ? nfs_parse_mount_options+0xb40/0xb40 [nfs]
    mount_fs+0x3b/0x16a
    vfs_kern_mount.part.35+0x54/0x160
    do_mount+0x1fd/0xd50
    ksys_mount+0xba/0xd0
    __x64_sys_mount+0x21/0x30
    do_syscall_64+0x60/0x1f0
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    This is BUG_ON(!virt_addr_valid(buf)) triggered by using a stack
    allocated buffer with a scatterlist. Convert the buffer for
    rc4salt to be dynamically allocated instead.

    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1615258
    Signed-off-by: Laura Abbott
    Signed-off-by: J. Bruce Fields

    Laura Abbott
     

17 Aug, 2018

2 commits

  • rdma.git merge resolution for the 4.19 merge window

    Conflicts:
    drivers/infiniband/core/rdma_core.c
    - Use the rdma code and revise with the new spelling for
    atomic_fetch_add_unless
    drivers/nvme/host/rdma.c
    - Replace max_sge with max_send_sge in new blk code
    drivers/nvme/target/rdma.c
    - Use the blk code and revise to use NULL for ib_post_recv when
    appropriate
    - Replace max_sge with max_recv_sge in new blk code
    net/rds/ib_send.c
    - Use the net code and revise to use NULL for ib_post_recv when
    appropriate

    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • Resolve merge conflicts from the -rc cycle against the rdma.git tree:

    Conflicts:
    drivers/infiniband/core/uverbs_cmd.c
    - New ifs added to ib_uverbs_ex_create_flow in -rc and for-next
    - Merge removal of file->ucontext in for-next with new code in -rc
    drivers/infiniband/core/uverbs_main.c
    - for-next removed code from ib_uverbs_write() that was modified
    in for-rc

    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

10 Aug, 2018

6 commits

  • The ret is not modified after initialization, so just remove the variable
    and return 0.

    Signed-off-by: zhong jiang
    Signed-off-by: J. Bruce Fields

    zhong jiang
     
  • I've given up on the idea of zero-copy handling of SYMLINK on the
    server side. This is because the Linux VFS symlink API requires the
    symlink pathname to be in a NUL-terminated kmalloc'd buffer. The
    NUL-termination is going to be problematic (watching out for
    landing on a page boundary and dealing with a 4096-byte pathname).

    I don't believe that SYMLINK creation is on a performance path or is
    requested frequently enough that it will cause noticeable CPU cache
    pollution due to data copies.

    There will be two places where a transport callout will be necessary
    to fill in the rqstp: one will be in the svc_fill_symlink_pathname()
    helper that is used by NFSv2 and NFSv3, and the other will be in
    nfsd4_decode_create().

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • fill_in_write_vector() is nearly the same logic as
    svc_fill_write_vector(), but there are a few differences so that
    the former can handle multiple WRITE payloads in a single COMPOUND.

    svc_fill_write_vector() can be adjusted so that it can be used in
    the NFSv4 WRITE code path too. Instead of assuming the pages are
    coming from rq_args.pages, have the caller pass in the page list.

    The immediate benefit is a reduction of code duplication. It also
    prevents the NFSv4 WRITE decoder from passing an empty vector
    element when the transport has provided the payload in the xdr_buf's
    page array.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Simplify the error handling at the tail of recv_read_chunk() by
    re-arranging rq_pages[] housekeeping and documenting it properly.

    NB: In this path, svc_rdma_recvfrom returns zero. Therefore no
    subsequent reply processing is done on the svc_rqstp, and thus the
    rq_respages field does not need to be updated.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • svc_xprt_release() invokes svc_free_res_pages(), which releases
    pages between rq_respages and rq_next_page.

    Historically, the RPC/RDMA transport has set these two pointers to
    be different by one, which means:

    - one page gets released when svc_recv returns 0. This normally
    happens whenever one or more RDMA Reads need to be dispatched to
    complete construction of an RPC Call.

    - one page gets released after every call to svc_send.

    In both cases, this released page is immediately refilled by
    svc_alloc_arg. There does not seem to be a reason for releasing this
    page.

    To avoid this unnecessary memory allocator traffic, set rq_next_page
    more carefully.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Variables 'checksumlen','blocksize' and 'data' are being assigned,
    but are never used, hence they are redundant and can be removed.

    Fix the following warning:

    net/sunrpc/auth_gss/gss_krb5_wrap.c:443:7: warning: variable ‘blocksize’ set but not used [-Wunused-but-set-variable]
    net/sunrpc/auth_gss/gss_krb5_crypto.c:376:15: warning: variable ‘checksumlen’ set but not used [-Wunused-but-set-variable]
    net/sunrpc/xprtrdma/svc_rdma.c:97:9: warning: variable ‘data’ set but not used [-Wunused-but-set-variable]

    Signed-off-by: YueHaibing
    Reviewed-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    YueHaibing
     

09 Aug, 2018

1 commit

  • I found that injecting disconnects with v4.18-rc resulted in
    random failures of the multi-threaded git regression test.

    The root cause appears to be that, after a reconnect, the
    RPC/RDMA transport is waking pending RPCs before the transport has
    posted enough Receive buffers to receive the Replies. If a Reply
    arrives before enough Receive buffers are posted, the connection
    is dropped. A few connection drops happen in quick succession as
    the client and server struggle to regain credit synchronization.

    This regression was introduced with commit 7c8d9e7c8863 ("xprtrdma:
    Move Receive posting to Receive handler"). The client is supposed to
    post a single Receive when a connection is established because
    it's not supposed to send more than one RPC Call before it gets
    a fresh credit grant in the first RPC Reply [RFC 8166, Section
    3.3.3].

    Unfortunately there appears to be a longstanding bug in the Linux
    client's credit accounting mechanism. On connect, it simply dumps
    all pending RPC Calls onto the new connection. It's possible it has
    done this ever since the RPC/RDMA transport was added to the kernel
    ten years ago.

    Servers have so far been tolerant of this bad behavior. Currently no
    server implementation ever changes its credit grant over reconnects,
    and servers always repost enough Receives before connections are
    fully established.

    The Linux client implementation used to post a Receive before each
    of these Calls. This has covered up the flooding send behavior.

    I could try to correct this old bug so that the client sends exactly
    one RPC Call and waits for a Reply. Since we are so close to the
    next merge window, I'm going to instead provide a simple patch to
    post enough Receives before a reconnect completes (based on the
    number of credits granted to the previous connection).

    The spurious disconnects will be gone, but the client will still
    send multiple RPC Calls immediately after a reconnect.

    Addressing the latter problem will wait for a merge window because
    a) I expect it to be a large change requiring lots of testing, and
    b) obviously the Linux client has interoperated successfully since
    day zero while still being broken.

    Fixes: 7c8d9e7c8863 ("xprtrdma: Move Receive posting to ... ")
    Cc: stable@vger.kernel.org # v4.18+
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

05 Aug, 2018

1 commit


01 Aug, 2018

4 commits

  • Remove trailing whitespace and blank line at EOF

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Anna Schumaker

    Stephen Hemminger
     
  • After a live data migration event at the NFS server, the client may send
    I/O requests to the wrong server, causing a live hang due to repeated
    recovery events. On the wire, this will appear as an I/O request failing
    with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly.
    NFS4ERR_BADSESSION is returned because the session ID being used was
    issued by the other server and is not valid at the old server.

    The failure is caused by async worker threads having cached the transport
    (xprt) in the rpc_task structure. After the migration recovery completes,
    the task is redispatched and the task resends the request to the wrong
    server based on the old value still present in tk_xprt.

    The solution is to recompute the tk_xprt field of the rpc_task structure
    so that the request goes to the correct server.

    Signed-off-by: Bill Baker
    Reviewed-by: Chuck Lever
    Tested-by: Helen Chao
    Fixes: fb43d17210ba ("SUNRPC: Use the multipath iterator to assign a ...")
    Cc: stable@vger.kernel.org # v4.9+
    Signed-off-by: Anna Schumaker

    Bill Baker
     
  • Smatch complains that "num" can be uninitialized when kstrtoul() returns
    -ERANGE. It's true enough, but basically harmless in this case.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Anna Schumaker

    Dan Carpenter
     
  • The existing rpc_print_iostats has a few shortcomings. First, the naming
    is not consistent with other functions in the kernel that display stats.
    Second, it is really displaying stats for an rpc_clnt structure as it
    displays both xprt stats and per-op stats. Third, it does not handle
    rpc_clnt clones, which is important for the one in-kernel tree caller
    of this function, the NFS client's nfs_show_stats function.

    Fix all of the above by renaming the rpc_print_iostats to
    rpc_clnt_show_stats and looping through any rpc_clnt clones via
    cl_parent.

    Once this interface is fixed, this addresses a problem with NFSv4.
    Before this patch, the /proc/self/mountstats always showed incorrect
    counts for NFSv4 lease and session related opcodes such as SEQUENCE,
    RENEW, SETCLIENTID, CREATE_SESSION, etc. These counts were always 0
    even though many ops would go over the wire. The reason for this is
    there are multiple rpc_clnt structures allocated for any given NFSv4
    mount, and inside nfs_show_stats() we called into rpc_print_iostats()
    which only handled one of them, nfs_server->client. Fix these counts
    by calling sunrpc's new rpc_clnt_show_stats() function, which handles
    cloned rpc_clnt structs and prints the stats together.

    Note that one side-effect of the above is that multiple mounts from
    the same NFS server will show identical counts in the above ops due
    to the fact the one rpc_clnt (representing the NFSv4 client state)
    is shared across mounts.

    Signed-off-by: Dave Wysochanski
    Signed-off-by: Anna Schumaker

    Dave Wysochanski
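
The "loop through any rpc_clnt clones via cl_parent" step from the rpc_clnt_show_stats commit can be sketched as a walk up a parent chain that accumulates per-op counters. This is an illustration under assumptions, not the sunrpc code: struct clnt, NR_OPS and clnt_sum_stats are hypothetical names, and the sketch accepts either a NULL parent or a self-pointing parent as the end of the chain.

```c
#include <assert.h>
#include <stddef.h>

#define NR_OPS 4  /* hypothetical number of per-op counters */

/* Hypothetical, trimmed-down stand-in for an RPC client and its clone link. */
struct clnt {
    struct clnt *parent;            /* NULL or self at the root */
    unsigned long op_count[NR_OPS];
};

/* Sum per-op counters over a client and every ancestor it was cloned from. */
static void clnt_sum_stats(const struct clnt *c, unsigned long total[NR_OPS])
{
    size_t i;

    for (i = 0; i < NR_OPS; i++)
        total[i] = 0;
    while (c) {
        for (i = 0; i < NR_OPS; i++)
            total[i] += c->op_count[i];
        if (c->parent == c)
            break;                  /* root that points at itself */
        c = c->parent;
    }
}
```

Reporting only the first client in the chain would miss operations counted on the others, which matches the always-zero lease and session counters the commit message describes.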