22 Jul, 2020

1 commit

  • [ Upstream commit 912288442cb2f431bf3c8cb097a5de83bc6dbac1 ]

    Currently the header size calculations are using an assignment
    operator instead of a += operator when accumulating the header
    size, leading to incorrect sizes. Fix this by using the correct
    operator.

    Addresses-Coverity: ("Unused value")
    Fixes: 302d3deb2068 ("xprtrdma: Prevent inline overflow")
    Signed-off-by: Colin Ian King
    Reviewed-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Sasha Levin

    Colin Ian King
     

01 Jul, 2020

1 commit

  • commit 7b2182ec381f8ea15c7eb1266d6b5d7da620ad93 upstream.

    The RPC client currently doesn't handle ERR_CHUNK replies correctly.
    rpcrdma_complete_rqst() incorrectly passes a negative number to
    xprt_complete_rqst() as the number of bytes copied. Instead, set
    task->tk_status to the error value, and return zero bytes copied.

    In these cases, return -EIO rather than -EREMOTEIO. The RPC client's
    finite state machine doesn't know what to do with -EREMOTEIO.

    Additional clean ups:
    - Don't double-count RDMA_ERROR replies
    - Remove a stale comment

    Signed-off-by: Chuck Lever
    Cc:
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     

02 May, 2020

2 commits

  • commit 23cf1ee1f1869966b75518c59b5cbda4c6c92450 upstream.

    Utilize the xpo_release_rqst transport method to ensure that each
    rqstp's svc_rdma_recv_ctxt object is released even when the server
    cannot return a Reply for that rqstp.

    Without this fix, each RPC whose Reply cannot be sent leaks one
    svc_rdma_recv_ctxt. This is a 2.5KB structure, a 4KB DMA-mapped
    Receive buffer, and any pages that might be part of the Reply
    message.

    The leak is infrequent unless the network fabric is unreliable or
    Kerberos is in use, as GSS sequence window overruns, which result
    in connection loss, are more common on fast transports.

    Fixes: 3a88092ee319 ("svcrdma: Preserve Receive buffer until svc_rdma_sendto")
    Signed-off-by: Chuck Lever
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit e28b4fc652c1830796a4d3e09565f30c20f9a2cf upstream.

    I hit this while testing nfsd-5.7 with kernel memory debugging
    enabled on my server:

    Mar 30 13:21:45 klimt kernel: BUG: unable to handle page fault for address: ffff8887e6c279a8
    Mar 30 13:21:45 klimt kernel: #PF: supervisor read access in kernel mode
    Mar 30 13:21:45 klimt kernel: #PF: error_code(0x0000) - not-present page
    Mar 30 13:21:45 klimt kernel: PGD 3601067 P4D 3601067 PUD 87c519067 PMD 87c3e2067 PTE 800ffff8193d8060
    Mar 30 13:21:45 klimt kernel: Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
    Mar 30 13:21:45 klimt kernel: CPU: 2 PID: 1933 Comm: nfsd Not tainted 5.6.0-rc6-00040-g881e87a3c6f9 #1591
    Mar 30 13:21:45 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
    Mar 30 13:21:45 klimt kernel: RIP: 0010:svc_rdma_post_chunk_ctxt+0xab/0x284 [rpcrdma]
    Mar 30 13:21:45 klimt kernel: Code: c1 83 34 02 00 00 29 d0 85 c0 7e 72 48 8b bb a0 02 00 00 48 8d 54 24 08 4c 89 e6 48 8b 07 48 8b 40 20 e8 5a 5c 2b e1 41 89 c6 45 20 89 44 24 04 8b 05 02 e9 01 00 85 c0 7e 33 e9 5e 01 00 00
    Mar 30 13:21:45 klimt kernel: RSP: 0018:ffffc90000dfbdd8 EFLAGS: 00010286
    Mar 30 13:21:45 klimt kernel: RAX: 0000000000000000 RBX: ffff8887db8db400 RCX: 0000000000000030
    Mar 30 13:21:45 klimt kernel: RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000246
    Mar 30 13:21:45 klimt kernel: RBP: ffff8887e6c27988 R08: 0000000000000000 R09: 0000000000000004
    Mar 30 13:21:45 klimt kernel: R10: ffffc90000dfbdd8 R11: 00c068ef00000000 R12: ffff8887eb4e4a80
    Mar 30 13:21:45 klimt kernel: R13: ffff8887db8db634 R14: 0000000000000000 R15: ffff8887fc931000
    Mar 30 13:21:45 klimt kernel: FS: 0000000000000000(0000) GS:ffff88885bd00000(0000) knlGS:0000000000000000
    Mar 30 13:21:45 klimt kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Mar 30 13:21:45 klimt kernel: CR2: ffff8887e6c279a8 CR3: 000000081b72e002 CR4: 00000000001606e0
    Mar 30 13:21:45 klimt kernel: Call Trace:
    Mar 30 13:21:45 klimt kernel: ? svc_rdma_vec_to_sg+0x7f/0x7f [rpcrdma]
    Mar 30 13:21:45 klimt kernel: svc_rdma_send_write_chunk+0x59/0xce [rpcrdma]
    Mar 30 13:21:45 klimt kernel: svc_rdma_sendto+0xf9/0x3ae [rpcrdma]
    Mar 30 13:21:45 klimt kernel: ? nfsd_destroy+0x51/0x51 [nfsd]
    Mar 30 13:21:45 klimt kernel: svc_send+0x105/0x1e3 [sunrpc]
    Mar 30 13:21:45 klimt kernel: nfsd+0xf2/0x149 [nfsd]
    Mar 30 13:21:45 klimt kernel: kthread+0xf6/0xfb
    Mar 30 13:21:45 klimt kernel: ? kthread_queue_delayed_work+0x74/0x74
    Mar 30 13:21:45 klimt kernel: ret_from_fork+0x3a/0x50
    Mar 30 13:21:45 klimt kernel: Modules linked in: ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue ib_umad ib_ipoib mlx4_ib sb_edac x86_pkg_temp_thermal iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper crypto_simd cryptd pcspkr rpcrdma i2c_i801 rdma_ucm lpc_ich mfd_core ib_iser rdma_cm iw_cm ib_cm mei_me raid0 libiscsi mei sg scsi_transport_iscsi ioatdma wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs libcrc32c mlx4_en sd_mod sr_mod cdrom mlx4_core crc32c_intel igb nvme i2c_algo_bit ahci i2c_core libahci nvme_core dca libata t10_pi qedr dm_mirror dm_region_hash dm_log dm_mod dax qede qed crc8 ib_uverbs ib_core
    Mar 30 13:21:45 klimt kernel: CR2: ffff8887e6c279a8
    Mar 30 13:21:45 klimt kernel: ---[ end trace 87971d2ad3429424 ]---

    It's absolutely not safe to use resources pointed to by the @send_wr
    argument of ib_post_send() _after_ that function returns. Those
    resources are typically freed by the Send completion handler, which
    can run before ib_post_send() returns.

    Thus the trace points currently around ib_post_send() in the
    server's RPC/RDMA transport are a hazard, even when they are
    disabled. Rearrange them so that they touch the Work Request only
    _before_ ib_post_send() is invoked.

    Fixes: bd2abef33394 ("svcrdma: Trace key RDMA API events")
    Fixes: 4201c7464753 ("svcrdma: Introduce svc_rdma_send_ctxt")
    Signed-off-by: Chuck Lever
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     

29 Apr, 2020

1 commit

  • commit 6221f1d9b63fed6260273e59a2b89ab30537a811 upstream.

    Currently, after the forward channel connection goes away,
    backchannel operations are causing soft lockups on the server
    because call_transmit_status's SOFTCONN logic ignores ENOTCONN.
    Such backchannel Calls are aggressively retried until the client
    reconnects.

    Backchannel Calls should use RPC_TASK_NOCONNECT rather than
    RPC_TASK_SOFTCONN. If there is no forward connection, the server is
    not capable of establishing a connection back to the client, thus
    that backchannel request should fail before the server attempts to
    send it. Commit 58255a4e3ce5 ("NFSD: NFSv4 callback client should
    use RPC_TASK_SOFTCONN") was merged several years before
    RPC_TASK_NOCONNECT was available.

    Because setup_callback_client() explicitly sets NOPING, the NFSv4.0
    callback connection depends on the first callback RPC to initiate
    a connection to the client. Thus NFSv4.0 needs to continue to use
    RPC_TASK_SOFTCONN.

    Suggested-by: Trond Myklebust
    Signed-off-by: Chuck Lever
    Cc: # v4.20+
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     

20 Feb, 2020

1 commit

  • commit ca1c671302825182629d3c1a60363cee6f5455bb upstream.

    The @nents value that was passed to ib_dma_map_sg() has to be passed
    to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() chooses
    to concatenate sg entries, it will return a different nents value
    than it was passed.

    The bug was exposed by recent changes to the AMD IOMMU driver, which
    enabled sg entry concatenation.

    Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to
    new memory registration API") and reviewing other kernel ULPs, it's
    not clear that the frwr_map() logic was ever correct for this case.

    Reported-by: Andre Tomt
    Suggested-by: Robin Murphy
    Signed-off-by: Chuck Lever
    Cc: stable@vger.kernel.org
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     

26 Jan, 2020

1 commit

  • commit 8729aaba74626c4ebce3abf1b9e96bb62d2958ca upstream.

    I noticed that for callback requests, the reported backlog latency
    is always zero, and the rtt value is crazy big. The problem was that
    rqst->rq_xtime is never set for backchannel requests.

    Fixes: 78215759e20d ("SUNRPC: Make RTT measurement more ... ")
    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     

18 Jan, 2020

7 commits

  • commit 671c450b6fe0680ea1cb1cf1526d764fdd5a3d3f upstream.

    Since v5.4, a device removal occasionally triggered this oops:

    Dec 2 17:13:53 manet kernel: BUG: unable to handle page fault for address: 0000000c00000219
    Dec 2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode
    Dec 2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page
    Dec 2 17:13:53 manet kernel: PGD 0 P4D 0
    Dec 2 17:13:53 manet kernel: Oops: 0000 [#1] SMP
    Dec 2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G W 5.4.0-00050-g53717e43af61 #883
    Dec 2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
    Dec 2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
    Dec 2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma]
    Dec 2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0
    Dec 2 17:13:53 manet kernel: RSP: 0018:ffffc900035dfe00 EFLAGS: 00010246
    Dec 2 17:13:53 manet kernel: RAX: ffff888467290000 RBX: ffff88846c638400 RCX: 0000000000000048
    Dec 2 17:13:53 manet kernel: RDX: 0000000000000048 RSI: 00000000f942e000 RDI: 0000000c00000001
    Dec 2 17:13:53 manet kernel: RBP: ffff888467611b00 R08: ffff888464e4a3c4 R09: 0000000000000000
    Dec 2 17:13:53 manet kernel: R10: ffffc900035dfc88 R11: fefefefefefefeff R12: ffff888865af4428
    Dec 2 17:13:53 manet kernel: R13: ffff888466023000 R14: ffff88846c63f000 R15: 0000000000000010
    Dec 2 17:13:53 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
    Dec 2 17:13:53 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Dec 2 17:13:53 manet kernel: CR2: 0000000c00000219 CR3: 0000000002009002 CR4: 00000000001606e0
    Dec 2 17:13:53 manet kernel: Call Trace:
    Dec 2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core]
    Dec 2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core]
    Dec 2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd
    Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
    Dec 2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a
    Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
    Dec 2 17:13:53 manet kernel: kthread+0xf4/0xf9
    Dec 2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74
    Dec 2 17:13:53 manet kernel: ret_from_fork+0x24/0x30

    The proximal cause is that this rpcrdma_rep has a rr_rdmabuf that
    is still pointing to the old ib_device, which has been freed. The
    only way that is possible is if this rpcrdma_rep was not destroyed
    by rpcrdma_ia_remove.

    Debugging showed that was indeed the case: this rpcrdma_rep was
    still in use by a completing RPC at the time of the device removal,
    and thus wasn't on the rep free list. So, it was not found by
    rpcrdma_reps_destroy().

    The fix is to introduce a list of all rpcrdma_reps so that they all
    can be found when a device is removed. That list is used to perform
    only regbuf DMA unmapping, replacing that call to
    rpcrdma_reps_destroy().

    Meanwhile, to prevent corruption of this list, I've moved the
    destruction of temp rpcrdma_rep objects to rpcrdma_post_recvs().
    rpcrdma_xprt_drain() ensures that post_recvs (and thus rep_destroy) is
    not invoked while rpcrdma_reps_unmap is walking rb_all_reps, thus
    protecting the rb_all_reps list.

    Fixes: b0b227f071a0 ("xprtrdma: Use an llist to manage free rpcrdma_reps")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit 13cb886c591f341a8759f175292ddf978ef903a1 upstream.

    I've found that on occasion, "rmmod " will hang if an NFS mount
    is under load.

    Ensure that ri_remove_done is initialized only just before the
    transport is woken up to force a close. This avoids the completion
    possibly getting initialized again while the CM event handler is
    waiting for a wake-up.

    Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from under an NFS mount")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit b32b9ed493f938e191f790a0991d20b18b38c35b upstream.

    On device re-insertion, the RDMA device driver crashes trying to set
    up a new QP:

    Nov 27 16:32:06 manet kernel: BUG: kernel NULL pointer dereference, address: 00000000000001c0
    Nov 27 16:32:06 manet kernel: #PF: supervisor write access in kernel mode
    Nov 27 16:32:06 manet kernel: #PF: error_code(0x0002) - not-present page
    Nov 27 16:32:06 manet kernel: PGD 0 P4D 0
    Nov 27 16:32:06 manet kernel: Oops: 0002 [#1] SMP
    Nov 27 16:32:06 manet kernel: CPU: 1 PID: 345 Comm: kworker/u28:0 Tainted: G W 5.4.0 #852
    Nov 27 16:32:06 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
    Nov 27 16:32:06 manet kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
    Nov 27 16:32:06 manet kernel: RIP: 0010:atomic_try_cmpxchg+0x2/0x12
    Nov 27 16:32:06 manet kernel: Code: ff ff 48 8b 04 24 5a c3 c6 07 00 0f 1f 40 00 c3 31 c0 48 81 ff 08 09 68 81 72 0c 31 c0 48 81 ff 83 0c 68 81 0f 92 c0 c3 8b 06 0f b1 17 0f 94 c2 84 d2 75 02 89 06 88 d0 c3 53 ba 01 00 00 00
    Nov 27 16:32:06 manet kernel: RSP: 0018:ffffc900035abbf0 EFLAGS: 00010046
    Nov 27 16:32:06 manet kernel: RAX: 0000000000000000 RBX: 00000000000001c0 RCX: 0000000000000000
    Nov 27 16:32:06 manet kernel: RDX: 0000000000000001 RSI: ffffc900035abbfc RDI: 00000000000001c0
    Nov 27 16:32:06 manet kernel: RBP: ffffc900035abde0 R08: 000000000000000e R09: ffffffffffffc000
    Nov 27 16:32:06 manet kernel: R10: 0000000000000000 R11: 000000000002e800 R12: ffff88886169d9f8
    Nov 27 16:32:06 manet kernel: R13: ffff88886169d9f4 R14: 0000000000000246 R15: 0000000000000000
    Nov 27 16:32:06 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa40000(0000) knlGS:0000000000000000
    Nov 27 16:32:06 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Nov 27 16:32:06 manet kernel: CR2: 00000000000001c0 CR3: 0000000002009006 CR4: 00000000001606e0
    Nov 27 16:32:06 manet kernel: Call Trace:
    Nov 27 16:32:06 manet kernel: do_raw_spin_lock+0x2f/0x5a
    Nov 27 16:32:06 manet kernel: create_qp_common.isra.47+0x856/0xadf [mlx4_ib]
    Nov 27 16:32:06 manet kernel: ? slab_post_alloc_hook.isra.60+0xa/0x1a
    Nov 27 16:32:06 manet kernel: ? __kmalloc+0x125/0x139
    Nov 27 16:32:06 manet kernel: mlx4_ib_create_qp+0x57f/0x972 [mlx4_ib]

    The fix is to copy the qp_init_attr struct that was just created by
    rpcrdma_ep_create() instead of using the one from the previous
    connection instance.

    Fixes: 98ef77d1aaa7 ("xprtrdma: Send Queue size grows after a reconnect")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit 2ae50ad68cd79224198b525f7bd645c9da98b6ff upstream.

    A recent clean up attempted to separate Receive handling and RPC
    Reply processing, in the name of clean layering.

    Unfortunately, we can't do this because the Receive Queue has to be
    refilled _after_ the most recent credit update from the responder
    is parsed from the transport header, but _before_ we wake up the
    next RPC sender. That is right in the middle of
    rpcrdma_reply_handler().

    Usually this isn't a problem because current responder
    implementations don't vary their credit grant. The one exception is
    when a connection is established: the grant goes from one to a much
    larger number on the first Receive. The requester MUST post enough
    Receives right then so that any outstanding requests can be sent
    without risking RNR and connection loss.

    Fixes: 6ceea36890a0 ("xprtrdma: Refactor Receive accounting")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit c3700780a096fc66467c81076ddf7f3f11d639b5 upstream.

    Close some holes introduced by commit 6dc6ec9e04c4 ("xprtrdma: Cache
    free MRs in each rpcrdma_req") that could result in list corruption.

    In addition, the result that is tabulated in @count is no longer
    used, so @count is removed.

    Fixes: 6dc6ec9e04c4 ("xprtrdma: Cache free MRs in each rpcrdma_req")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit a31b2f939219dd9bffdf01a45bd91f209f8cc369 upstream.

    This is because xprt_request_get_cong() is allowing more than one
    RPC Call to be transmitted before the first Receive on the new
    connection. The first Receive fills the Receive Queue based on the
    server's credit grant. Before that Receive, there is only a single
    Receive WR posted because the client doesn't know the server's
    credit grant.

    The solution is to clear rq_cong on all outstanding rpc_rqsts when
    the cwnd is reset. This is because an RPC/RDMA credit is good for
    one connection instance only.

    Fixes: 75891f502f5f ("SUNRPC: Support for congestion control ... ")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit 4b93dab36f28e673725e5e6123ebfccf7697f96a upstream.

    When adding frwr_unmap_async way back when, I re-used the existing
    trace_xprtrdma_post_send() trace point to record the return code
    of ib_post_send.

    Unfortunately there are some cases where re-using that trace point
    causes a crash. Instead, construct a trace point specific to posting
    Local Invalidate WRs that will always be safe to use in that context,
    and will act as a trace log eye-catcher for Local Invalidation.

    Fixes: 847568942f93 ("xprtrdma: Remove fr_state")
    Fixes: d8099feda483 ("xprtrdma: Reduce context switching due ... ")
    Signed-off-by: Chuck Lever
    Tested-by: Bill Baker
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     

28 Sep, 2019

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Add a new knfsd file cache, so that we don't have to open and close
    on each (NFSv2/v3) READ or WRITE. This can speed up read and write
    in some cases. It also replaces our readahead cache.

    - Prevent silent data loss on write errors, by treating write errors
    like server reboots for the purposes of write caching, thus forcing
    clients to resend their writes.

    - Tweak the code that allocates sessions to be more forgiving, so
    that NFSv4.1 mounts are less likely to hang when a server already
    has a lot of clients.

    - Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be
    limited only by the backend filesystem and the maximum RPC size.

    - Allow the server to enforce use of the correct kerberos credentials
    when a client reclaims state after a reboot.

    And some miscellaneous smaller bugfixes and cleanup"

    * tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux: (34 commits)
    sunrpc: clean up indentation issue
    nfsd: fix nfs read eof detection
    nfsd: Make nfsd_reset_boot_verifier_locked static
    nfsd: degraded slot-count more gracefully as allocation nears exhaustion.
    nfsd: handle drc over-allocation gracefully.
    nfsd: add support for upcall version 2
    nfsd: add a "GetVersion" upcall for nfsdcld
    nfsd: Reset the boot verifier on all write I/O errors
    nfsd: Don't garbage collect files that might contain write errors
    nfsd: Support the server resetting the boot verifier
    nfsd: nfsd_file cache entries should be per net namespace
    nfsd: eliminate an unnecessary acl size limit
    Deprecate nfsd fault injection
    nfsd: remove duplicated include from filecache.c
    nfsd: Fix the documentation for svcxdr_tmpalloc()
    nfsd: Fix up some unused variable warnings
    nfsd: close cached files prior to a REMOVE or RENAME that would replace target
    nfsd: rip out the raparms cache
    nfsd: have nfsd_test_lock use the nfsd_file cache
    nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
    ...

    Linus Torvalds
     

27 Sep, 2019

1 commit

  • Pull NFS client updates from Anna Schumaker:
    "Stable bugfixes:
    - Dequeue the request from the receive queue while we're re-encoding
    # v4.20+
    - Fix buffer handling of GSS MIC without slack # 5.1

    Features:
    - Increase xprtrdma maximum transport header and slot table sizes
    - Add support for nfs4_call_sync() calls using a custom
    rpc_task_struct
    - Optimize the default readahead size
    - Enable pNFS filelayout LAYOUTGET on OPEN

    Other bugfixes and cleanups:
    - Fix possible null-pointer dereferences and memory leaks
    - Various NFS over RDMA cleanups
    - Various NFS over RDMA comment updates
    - Don't receive TCP data into a reset request buffer
    - Don't try to parse incomplete RPC messages
    - Fix congestion window race with disconnect
    - Clean up pNFS return-on-close error handling
    - Fixes for NFS4ERR_OLD_STATEID handling"

    * tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits)
    pNFS/filelayout: enable LAYOUTGET on OPEN
    NFS: Optimise the default readahead size
    NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
    NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
    NFSv4: Fix OPEN_DOWNGRADE error handling
    pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
    NFSv4: Add a helper to increment stateid seqids
    NFSv4: Handle RPC level errors in LAYOUTRETURN
    NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
    NFSv4: Clean up pNFS return-on-close error handling
    pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
    NFS: remove unused check for negative dentry
    NFSv3: use nfs_add_or_obtain() to create and reference inodes
    NFS: Refactor nfs_instantiate() for dentry referencing callers
    SUNRPC: Fix congestion window race with disconnect
    SUNRPC: Don't try to parse incomplete RPC messages
    SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic
    SUNRPC: Fix buffer handling of GSS MIC without slack
    SUNRPC: RPC level errors should always set task->tk_rpc_status
    SUNRPC: Don't receive TCP data into a request buffer that has been reset
    ...

    Linus Torvalds
     

27 Aug, 2019

3 commits

  • Eli Dorfman reports that after a series of idle disconnects, an
    RPC/RDMA transport becomes unusable (rdma_create_qp returns
    -ENOMEM). Problem was tracked down to increasing Send Queue size
    after each reconnect.

    The rdma_create_qp() API does not promise to leave its @qp_init_attr
    parameter unaltered. In fact, some drivers do modify one or more of
    its fields. Thus our calls to rdma_create_qp must use a fresh copy
    of ib_qp_init_attr each time.

    This fix is appropriate for kernels dating back to late 2007, though
    it will have to be adapted, as the connect code has changed over the
    years.

    Reported-by: Eli Dorfman
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Ensure that the re-establishment delay does not grow exponentially
    on each good reconnect. This probably should have been part of
    commit 675dd90ad093 ("xprtrdma: Modernize ops->connect").

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • The optimization done in "xprtrdma: Simplify rpcrdma_mr_pop" was a
    bit too optimistic. MRs left over after a reconnect still need to
    be recycled, not added back to the free list, since they could be
    in flight or actually fully registered.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

22 Aug, 2019

3 commits

  • Micro-optimization: In rpcrdma_post_recvs, since commit e340c2d6ef2a
    ("xprtrdma: Reduce the doorbell rate (Receive)"), the common case is
    to return without doing anything. Found with perf.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Micro-optimization: Save the cost of three function calls during
    transport header encoding.

    These were "noinline" before to generate more meaningful call stacks
    during debugging, but this code is now pretty stable.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • For the moment the returned value just happens to be correct because
    the current backchannel server implementation does not vary the
    number of credits it offers. The spec does permit this value to
    change during the lifetime of a connection, however.

    The actual maximum is fixed for all RPC/RDMA transports, because
    each transport instance has to pre-allocate the resources for
    processing BC requests. That's the value that should be returned.

    Fixes: 7402a4fedc2b ("SUNRPC: Fix up backchannel slot table ... ")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

21 Aug, 2019

11 commits

  • Clean up: The function name should match the documenting comment.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • rpcrdma_rep objects are removed from their free list by only a
    single thread: the Receive completion handler. Thus that free list
    can be converted to an llist, where a single-threaded consumer and
    a multi-threaded producer (rpcrdma_buffer_put) can both access the
    llist without the need for any serialization.

    This eliminates spin lock contention between the Receive completion
    handler and rpcrdma_buffer_get, and makes the rep consumer
    wait-free.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: Now that the free list is used sparingly, get rid of the
    separate spin lock protecting it.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Instead of a globally-contended MR free list, cache MRs in each
    rpcrdma_req as they are released. This means acquiring and releasing
    an MR will be lock-free in the common case, even outside the
    transport send lock.

    The original idea of per-rpcrdma_req MR free lists was suggested by
    Shirley Ma several years ago. I just now
    figured out how to make that idea work with on-demand MR allocation.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Probably would be good to also pass GFP flags to ib_alloc_mr.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Refactor: Retrieve an MR and handle error recovery entirely in
    rpc_rdma.c, as this is not a device-specific function.

    Note that since commit 89f90fe1ad8b ("SUNRPC: Allow calls to
    xprt_transmit() to drain the entire transmit queue"), the
    xprt_transmit function handles the cond_resched. The transport no
    longer has to do this itself.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up. There is only one remaining rpcrdma_mr_put call site, and
    it can be directly replaced with unmap_and_put because mr->mr_dir is
    set to DMA_NONE just before the call.

    Now all the call sites do a DMA unmap, and we can just rename
    mr_unmap_and_put to mr_put, which nicely matches mr_get.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: rpcrdma_mr_pop call sites check if the list is empty
    first. Let's replace the list_empty with less costly logic.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Commit 48be539dd44a ("xprtrdma: Introduce ->alloc_slot call-out for
    xprtrdma") added a separate alloc_slot and free_slot to the RPC/RDMA
    transport. Later, commit 75891f502f5f ("SUNRPC: Support for
    congestion control when queuing is enabled") modified the generic
    alloc/free_slot methods, but neglected the methods in xprtrdma.

    Found via code review.

    Fixes: 75891f502f5f ("SUNRPC: Support for congestion control ... ")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: There are other "all" list heads. For code clarity
    distinguish this one as for use only for MRs by renaming it.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Make the field name the same for all trace points that handle
    pointers to struct rpcrdma_rep. That makes it easy to grep for
    matching rep points in trace output.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

20 Aug, 2019

4 commits

  • Although I haven't seen any performance results that justify it,
    I've received several complaints that NFS/RDMA no longer supports
    a maximum rsize and wsize of 1MB. These days it is somewhat smaller.

    To simplify the logic that determines whether a chunk list is
    necessary, the implementation uses a fixed maximum size of the
    transport header. Currently that maximum size is 256 bytes, one
    quarter of the default inline threshold size for RPC/RDMA v1.

    Since commit a78868497c2e ("xprtrdma: Reduce max_frwr_depth"), the
    size of chunks is also smaller to take advantage of inline page
    lists in device internal MR data structures.

    The combination of these two design choices has reduced the maximum
    NFS rsize and wsize that can be used for most RNIC/HCAs. Increasing
    the maximum transport header size and the maximum number of RDMA
    segments it can contain increases the negotiated maximum rsize/wsize
    on common RNIC/HCAs.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
    Commit 302d3deb2068 ("xprtrdma: Prevent inline overflow") added this
    calculation back in 2016, but got it wrong. I tested only the lower
    bound, which is why there is a max_t there. The upper bound should be
    rounded up too.

    Now, when using DIV_ROUND_UP, that takes care of the lower bound as
    well.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Comment was made obsolete by commit 8cec3dba76a4 ("xprtrdma:
    rpcrdma_regbuf alignment").

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Things have changed since this comment was written. In particular,
    the reworking of connection closing, on-demand creation of MRs, and
    the removal of fr_state all mean that deferring MR recovery to
    frwr_map is no longer needed. The description is obsolete.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

19 Aug, 2019

2 commits