24 Feb, 2020

1 commit


20 Feb, 2020

1 commit

  • commit ca1c671302825182629d3c1a60363cee6f5455bb upstream.

    The @nents value that was passed to ib_dma_map_sg() has to be passed
    to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() chooses to
    concatenate sg entries, it will return a different nents value than
    it was passed.
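The rule above can be illustrated with a small userspace model. This is a sketch, not the xprtrdma fix itself: `struct sg_ent`, `mock_map_sg()`, `mock_unmap_sg()` and `map_register_unmap()` are hypothetical stand-ins for the scatterlist and the ib_dma_map_sg()/ib_dma_unmap_sg() pair, assuming a mapper that merges physically contiguous entries. It uses the count returned by the map call to build the registration and hands the original count back to unmap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified scatterlist entry: a bus address and a length. */
struct sg_ent {
    uint64_t addr;
    size_t len;
};

/* Hypothetical record of one mapping: what the caller passed and got back. */
struct mock_mapping {
    int orig_nents;   /* nents passed to map: the value unmap must receive */
    int mapped_nents; /* value map returned: may be smaller after merging */
};

/*
 * Hypothetical stand-in for ib_dma_map_sg(): counts the entries that
 * would remain if physically contiguous neighbors were concatenated,
 * as an IOMMU that merges sg entries might do.
 */
static int mock_map_sg(struct mock_mapping *m, const struct sg_ent *sg, int nents)
{
    int mapped = 0;

    for (int i = 0; i < nents; i++)
        if (i == 0 || sg[i - 1].addr + sg[i - 1].len != sg[i].addr)
            mapped++;
    m->orig_nents = nents;
    m->mapped_nents = mapped;
    return mapped;
}

/* Hypothetical stand-in for ib_dma_unmap_sg(): 0 if nents matches, else -1. */
static int mock_unmap_sg(const struct mock_mapping *m, int nents)
{
    return nents == m->orig_nents ? 0 : -1;
}

/* Register with the mapped count, but unmap with the original count. */
static int map_register_unmap(const struct sg_ent *sg, int nents, int *dma_nents)
{
    struct mock_mapping m;

    *dma_nents = mock_map_sg(&m, sg, nents);
    if (*dma_nents == 0)
        return -1;
    /* ... build the memory registration from *dma_nents entries ... */
    return mock_unmap_sg(&m, nents);   /* original nents, not *dma_nents */
}
```

With entries at 0x1000, 0x2000 and 0x8000 (each 0x1000 bytes long), the first two merge, so the map call returns 2, yet the unmap must still be given 3.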

    The bug was exposed by recent changes to the AMD IOMMU driver, which
    enabled sg entry concatenation.

    Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to
    new memory registration API") and reviewing other kernel ULPs, it's
    not clear that the frwr_map() logic was ever correct for this case.

    Reported-by: Andre Tomt
    Suggested-by: Robin Murphy
    Signed-off-by: Chuck Lever
    Cc: stable@vger.kernel.org
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     

11 Feb, 2020

1 commit

  • commit 3d96208c30f84d6edf9ab4fac813306ac0d20c10 upstream.

    When upcalling gssproxy, cache_head.expiry_time is set as a
    timeval, not seconds since boot. As such, RPC cache expiry
    logic will not clean expired objects created under
    auth.rpcsec.context cache.

    This has been shown to cause kernel memory leaks in the field. The fix
    uses the 64-bit variants of getboottime/timespec.
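To make the unit mismatch concrete, here is a minimal sketch assuming the gssproxy upcall reports an absolute wall-clock expiry (seconds since the Unix epoch) while the cache compares against seconds since boot. `wall_to_boot_seconds()` and `entry_expired()` are hypothetical helpers; in the kernel the boot time would come from getboottime64(). Without the subtraction, the wall-clock value is so far in the future, in boot-relative terms, that the entry never looks expired, which is how the objects leak.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert an absolute wall-clock expiry (seconds since
 * the Unix epoch) into seconds since boot. boot_time_sec is the wall-clock
 * time at which the system booted (getboottime64() in the kernel).
 */
static int64_t wall_to_boot_seconds(int64_t wall_expiry_sec, int64_t boot_time_sec)
{
    return wall_expiry_sec - boot_time_sec;
}

/* Hypothetical expiry check in boot-relative time: expired once now reaches expiry. */
static int entry_expired(int64_t expiry_boot_sec, int64_t now_since_boot_sec)
{
    return expiry_boot_sec <= now_since_boot_sec;
}
```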

    Expiration times have worked this way since 2010's c5b29f885afe "sunrpc:
    use seconds since boot in expiry cache". The gssproxy code introduced
    in 2012 added gss_proxy_save_rsc and introduced the bug. That's a while
    for this to lurk, but it required a bit of an extreme case to make it
    obvious.

    Signed-off-by: Roberto Bergantinos Corpas
    Cc: stable@vger.kernel.org
    Fixes: 030d794bf498 "SUNRPC: Use gssproxy upcall for server..."
    Tested-By: Frank Sorenson
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Roberto Bergantinos Corpas
     

26 Jan, 2020

3 commits

  • [ Upstream commit e8d70b321ecc9b23d09b8df63e38a2f73160c209 ]

    xdr_shrink_pagelen() BUGs when @len is larger than buf->page_len.
    This can happen when xdr_buf_read_mic() is given an xdr_buf with
    a small page array (like, only a few bytes).

    Instead, just cap the number of bytes that xdr_shrink_pagelen()
    will move.
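A minimal sketch of the capping idea, using a hypothetical `struct mini_xdr_buf` that tracks only lengths; the real xdr_shrink_pagelen() also moves data between the page array and the tail. The shift is clamped to `page_len`, so an oversized `len` shrinks the page array to zero instead of tripping a BUG().

```c
#include <stddef.h>

/* Hypothetical, reduced model of an xdr_buf: only the lengths matter here. */
struct mini_xdr_buf {
    size_t page_len;
    size_t tail_len;
};

/*
 * Move at most page_len bytes from the page array into the tail.
 * Returns the number of bytes actually moved.
 */
static size_t shrink_pagelen_capped(struct mini_xdr_buf *buf, size_t len)
{
    size_t shift = len < buf->page_len ? len : buf->page_len;

    buf->page_len -= shift;
    buf->tail_len += shift;
    return shift;
}
```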

    Fixes: 5f1bc39979d ("SUNRPC: Fix buffer handling of GSS MIC ... ")
    Signed-off-by: Chuck Lever
    Reviewed-by: Benjamin Coddington
    Signed-off-by: Trond Myklebust
    Signed-off-by: Sasha Levin

    Chuck Lever
     
  • commit 8729aaba74626c4ebce3abf1b9e96bb62d2958ca upstream.

    I noticed that for callback requests, the reported backlog latency
    is always zero, and the rtt value is crazy big. The problem was that
    rqst->rq_xtime is never set for backchannel requests.
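Why an unset transmit time produces a huge RTT can be shown with a hypothetical `struct mini_rqst` whose `xtime_ns` field stands in for rq_xtime (a sketch, not the SUNRPC code). If the timestamp stays at zero, the RTT degenerates to the absolute reply time; stamping the request when it is sent restores a sensible value.

```c
#include <stdint.h>

/* Hypothetical request record: only the transmit timestamp matters here. */
struct mini_rqst {
    uint64_t xtime_ns;  /* set when the request is transmitted; 0 if never */
};

/* RTT as "reply arrival time minus transmit time". */
static uint64_t rtt_ns(const struct mini_rqst *r, uint64_t reply_ns)
{
    return reply_ns - r->xtime_ns;
}

/* What the fix amounts to in this model: stamp the request at transmit. */
static void mark_transmitted(struct mini_rqst *r, uint64_t now_ns)
{
    r->xtime_ns = now_ns;
}
```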

    Fixes: 78215759e20d ("SUNRPC: Make RTT measurement more ... ")
    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit 5866efa8cbfbadf3905072798e96652faf02dbe8 upstream.

    gss_read_proxy_verf() assumes things about the XDR buffer containing
    the RPC Call that are not true for buffers generated by
    svc_rdma_recv().

    RDMA's buffers look more like what the upper layer generates for
    sending: head is a kmalloc'd buffer; it does not point to a page
    whose contents are contiguous with the first page in the buffer's
    page array. The result is that ACCEPT_SEC_CONTEXT via RPC/RDMA has
    stopped working on Linux NFS servers that use gssproxy.

    This does not affect clients that use only TCP to send their
    ACCEPT_SEC_CONTEXT operation (that's all Linux clients). Other
    clients, like Solaris NFS clients, send ACCEPT_SEC_CONTEXT on the
    same transport as they send all other NFS operations. Such clients
    can send ACCEPT_SEC_CONTEXT via RPC/RDMA.

    I thought I had found every direct reference in the server RPC code
    to the rqstp->rq_pages field, but evidently I missed at least one.

    Bug found at the 2019 Westford NFS bake-a-thon.

    Fixes: 3316f0631139 ("svcrdma: Persistently allocate and DMA- ... ")
    Signed-off-by: Chuck Lever
    Tested-by: Bill Baker
    Reviewed-by: Simo Sorce
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     

18 Jan, 2020

7 commits

  • commit 671c450b6fe0680ea1cb1cf1526d764fdd5a3d3f upstream.

    Since v5.4, a device removal occasionally triggered this oops:

    Dec 2 17:13:53 manet kernel: BUG: unable to handle page fault for address: 0000000c00000219
    Dec 2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode
    Dec 2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page
    Dec 2 17:13:53 manet kernel: PGD 0 P4D 0
    Dec 2 17:13:53 manet kernel: Oops: 0000 [#1] SMP
    Dec 2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G W 5.4.0-00050-g53717e43af61 #883
    Dec 2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
    Dec 2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
    Dec 2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma]
    Dec 2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0
    Dec 2 17:13:53 manet kernel: RSP: 0018:ffffc900035dfe00 EFLAGS: 00010246
    Dec 2 17:13:53 manet kernel: RAX: ffff888467290000 RBX: ffff88846c638400 RCX: 0000000000000048
    Dec 2 17:13:53 manet kernel: RDX: 0000000000000048 RSI: 00000000f942e000 RDI: 0000000c00000001
    Dec 2 17:13:53 manet kernel: RBP: ffff888467611b00 R08: ffff888464e4a3c4 R09: 0000000000000000
    Dec 2 17:13:53 manet kernel: R10: ffffc900035dfc88 R11: fefefefefefefeff R12: ffff888865af4428
    Dec 2 17:13:53 manet kernel: R13: ffff888466023000 R14: ffff88846c63f000 R15: 0000000000000010
    Dec 2 17:13:53 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
    Dec 2 17:13:53 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Dec 2 17:13:53 manet kernel: CR2: 0000000c00000219 CR3: 0000000002009002 CR4: 00000000001606e0
    Dec 2 17:13:53 manet kernel: Call Trace:
    Dec 2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core]
    Dec 2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core]
    Dec 2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd
    Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
    Dec 2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a
    Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
    Dec 2 17:13:53 manet kernel: kthread+0xf4/0xf9
    Dec 2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74
    Dec 2 17:13:53 manet kernel: ret_from_fork+0x24/0x30

    The proximal cause is that this rpcrdma_rep has an rr_rdmabuf that
    is still pointing to the old ib_device, which has been freed. The
    only way that is possible is if this rpcrdma_rep was not destroyed
    by rpcrdma_ia_remove.

    Debugging showed that was indeed the case: this rpcrdma_rep was
    still in use by a completing RPC at the time of the device removal,
    and thus wasn't on the rep free list. So, it was not found by
    rpcrdma_reps_destroy().

    The fix is to introduce a list of all rpcrdma_reps so that they all
    can be found when a device is removed. That list is used to perform
    only regbuf DMA unmapping, replacing that call to
    rpcrdma_reps_destroy().

    Meanwhile, to prevent corruption of this list, I've moved the
    destruction of temp rpcrdma_rep objects to rpcrdma_post_recvs().
    rpcrdma_xprt_drain() ensures that post_recvs (and thus rep_destroy) is
    not invoked while rpcrdma_reps_unmap is walking rb_all_reps, thus
    protecting the rb_all_reps list.
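The bookkeeping the fix introduces can be sketched with two independent singly linked lists: one of free reps and one of every rep ever created. All names here (`struct mini_rep`, `struct mini_buf`, `rep_create()`, `rep_get()`, `reps_unmap()`) are hypothetical and the model ignores locking; in the commit, rb_all_reps and rpcrdma_reps_unmap() play these roles, and rpcrdma_xprt_drain() provides the exclusion. A rep taken off the free list for an in-flight RPC is still reachable from the all-reps list, so device removal can unmap it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced rep with two independent list linkages. */
struct mini_rep {
    struct mini_rep *free_next;  /* linkage on the free list */
    struct mini_rep *all_next;   /* linkage on the list of all reps */
    bool dma_mapped;
};

struct mini_buf {
    struct mini_rep *free_reps;  /* reps not currently in use */
    struct mini_rep *all_reps;   /* every rep that has been created */
};

/* A new rep goes on both lists and starts out DMA-mapped. */
static void rep_create(struct mini_buf *b, struct mini_rep *r)
{
    r->dma_mapped = true;
    r->all_next = b->all_reps;
    b->all_reps = r;
    r->free_next = b->free_reps;
    b->free_reps = r;
}

/* Take a rep off the free list for a Receive that is in progress. */
static struct mini_rep *rep_get(struct mini_buf *b)
{
    struct mini_rep *r = b->free_reps;

    if (r)
        b->free_reps = r->free_next;
    return r;
}

/* On device removal, walk the all-reps list so in-use reps are not missed. */
static size_t reps_unmap(struct mini_buf *b)
{
    size_t n = 0;

    for (struct mini_rep *r = b->all_reps; r; r = r->all_next) {
        if (r->dma_mapped) {
            r->dma_mapped = false;
            n++;
        }
    }
    return n;
}
```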

    Fixes: b0b227f071a0 ("xprtrdma: Use an llist to manage free rpcrdma_reps")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit 13cb886c591f341a8759f175292ddf978ef903a1 upstream.

    I've found that, on occasion, "rmmod" of the RDMA device driver will
    hang while an NFS mount is under load.

    Ensure that ri_remove_done is initialized only just before the
    transport is woken up to force a close. This avoids the completion
    possibly getting initialized again while the CM event handler is
    waiting for a wake-up.

    Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from under an NFS mount")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit b32b9ed493f938e191f790a0991d20b18b38c35b upstream.

    On device re-insertion, the RDMA device driver crashes trying to set
    up a new QP:

    Nov 27 16:32:06 manet kernel: BUG: kernel NULL pointer dereference, address: 00000000000001c0
    Nov 27 16:32:06 manet kernel: #PF: supervisor write access in kernel mode
    Nov 27 16:32:06 manet kernel: #PF: error_code(0x0002) - not-present page
    Nov 27 16:32:06 manet kernel: PGD 0 P4D 0
    Nov 27 16:32:06 manet kernel: Oops: 0002 [#1] SMP
    Nov 27 16:32:06 manet kernel: CPU: 1 PID: 345 Comm: kworker/u28:0 Tainted: G W 5.4.0 #852
    Nov 27 16:32:06 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
    Nov 27 16:32:06 manet kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
    Nov 27 16:32:06 manet kernel: RIP: 0010:atomic_try_cmpxchg+0x2/0x12
    Nov 27 16:32:06 manet kernel: Code: ff ff 48 8b 04 24 5a c3 c6 07 00 0f 1f 40 00 c3 31 c0 48 81 ff 08 09 68 81 72 0c 31 c0 48 81 ff 83 0c 68 81 0f 92 c0 c3 8b 06 0f b1 17 0f 94 c2 84 d2 75 02 89 06 88 d0 c3 53 ba 01 00 00 00
    Nov 27 16:32:06 manet kernel: RSP: 0018:ffffc900035abbf0 EFLAGS: 00010046
    Nov 27 16:32:06 manet kernel: RAX: 0000000000000000 RBX: 00000000000001c0 RCX: 0000000000000000
    Nov 27 16:32:06 manet kernel: RDX: 0000000000000001 RSI: ffffc900035abbfc RDI: 00000000000001c0
    Nov 27 16:32:06 manet kernel: RBP: ffffc900035abde0 R08: 000000000000000e R09: ffffffffffffc000
    Nov 27 16:32:06 manet kernel: R10: 0000000000000000 R11: 000000000002e800 R12: ffff88886169d9f8
    Nov 27 16:32:06 manet kernel: R13: ffff88886169d9f4 R14: 0000000000000246 R15: 0000000000000000
    Nov 27 16:32:06 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa40000(0000) knlGS:0000000000000000
    Nov 27 16:32:06 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Nov 27 16:32:06 manet kernel: CR2: 00000000000001c0 CR3: 0000000002009006 CR4: 00000000001606e0
    Nov 27 16:32:06 manet kernel: Call Trace:
    Nov 27 16:32:06 manet kernel: do_raw_spin_lock+0x2f/0x5a
    Nov 27 16:32:06 manet kernel: create_qp_common.isra.47+0x856/0xadf [mlx4_ib]
    Nov 27 16:32:06 manet kernel: ? slab_post_alloc_hook.isra.60+0xa/0x1a
    Nov 27 16:32:06 manet kernel: ? __kmalloc+0x125/0x139
    Nov 27 16:32:06 manet kernel: mlx4_ib_create_qp+0x57f/0x972 [mlx4_ib]

    The fix is to copy the qp_init_attr struct that was just created by
    rpcrdma_ep_create() instead of using the one from the previous
    connection instance.

    Fixes: 98ef77d1aaa7 ("xprtrdma: Send Queue size grows after a reconnect")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit 2ae50ad68cd79224198b525f7bd645c9da98b6ff upstream.

    A recent clean up attempted to separate Receive handling and RPC
    Reply processing, in the name of clean layering.

    Unfortunately, we can't do this because the Receive Queue has to be
    refilled _after_ the most recent credit update from the responder
    is parsed from the transport header, but _before_ we wake up the
    next RPC sender. That is right in the middle of
    rpcrdma_reply_handler().

    Usually this isn't a problem because current responder
    implementations don't vary their credit grant. The one exception is
    when a connection is established: the grant goes from one to a much
    larger number on the first Receive. The requester MUST post enough
    Receives right then so that any outstanding requests can be sent
    without risking RNR and connection loss.
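The required ordering can be sketched as a hypothetical reply handler over a reduced transport state (`struct mini_xprt`, `refill_receives()` and `handle_reply()` are illustrative names, not xprtrdma functions). Raising `max_sends` stands in for waking the next RPC sender; the invariant is that senders are never released until enough Receives are posted to absorb their replies.

```c
/* Hypothetical reduced transport state. */
struct mini_xprt {
    unsigned int credits;      /* latest credit grant parsed from the responder */
    unsigned int posted_recvs; /* Receive WRs currently posted */
    unsigned int max_sends;    /* Calls senders may have outstanding */
};

/* Post enough Receives to cover the current credit grant. */
static void refill_receives(struct mini_xprt *x)
{
    if (x->posted_recvs < x->credits)
        x->posted_recvs = x->credits;
}

/* Parse the grant, refill the Receive Queue, and only then release senders. */
static void handle_reply(struct mini_xprt *x, unsigned int granted)
{
    if (x->posted_recvs > 0)
        x->posted_recvs--;          /* this Receive has been consumed */
    x->credits = granted;           /* 1. parse the credit update */
    refill_receives(x);             /* 2. refill the Receive Queue */
    x->max_sends = x->credits;      /* 3. only now wake the next sender(s) */
}
```

Starting from the single Receive posted at connect time, a first reply granting 32 credits leaves 32 Receives posted before more than one Call is allowed out.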

    Fixes: 6ceea36890a0 ("xprtrdma: Refactor Receive accounting")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit c3700780a096fc66467c81076ddf7f3f11d639b5 upstream.

    Close some holes introduced by commit 6dc6ec9e04c4 ("xprtrdma: Cache
    free MRs in each rpcrdma_req") that could result in list corruption.

    In addition, the result that is tabulated in @count is no longer
    used, so @count is removed.

    Fixes: 6dc6ec9e04c4 ("xprtrdma: Cache free MRs in each rpcrdma_req")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit a31b2f939219dd9bffdf01a45bd91f209f8cc369 upstream.

    This is because xprt_request_get_cong() is allowing more than one
    RPC Call to be transmitted before the first Receive on the new
    connection. The first Receive fills the Receive Queue based on the
    server's credit grant. Before that Receive, there is only a single
    Receive WR posted because the client doesn't know the server's
    credit grant.

    The solution is to clear rq_cong on all outstanding rpc_rqsts when
    the cwnd is reset, because an RPC/RDMA credit is good for one
    connection instance only.
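A sketch of the reset, assuming a simplified window counted in whole requests (the real SUNRPC code scales the window by RPC_CWNDSCALE). `struct mini_req`, `struct mini_cong`, `reset_cwnd()` and `get_cong()` are hypothetical names. Once every outstanding request has dropped its stale slot, only one Call can go out until the first Receive reports the new credit grant.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced request: does it hold a congestion-window slot? */
struct mini_req {
    bool rq_cong;
};

/* Hypothetical reduced congestion state, in whole requests. */
struct mini_cong {
    unsigned long cong;  /* slots currently in use */
    unsigned long cwnd;  /* congestion window */
};

/* Credits from the old connection are void: every request gives up its slot. */
static void reset_cwnd(struct mini_cong *c, struct mini_req *reqs, size_t n,
                       unsigned long initial_cwnd)
{
    for (size_t i = 0; i < n; i++)
        reqs[i].rq_cong = false;
    c->cong = 0;
    c->cwnd = initial_cwnd;
}

/* A request may transmit only if it already holds, or can take, a slot. */
static bool get_cong(struct mini_cong *c, struct mini_req *r)
{
    if (r->rq_cong)
        return true;
    if (c->cong >= c->cwnd)
        return false;
    r->rq_cong = true;
    c->cong++;
    return true;
}
```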

    Fixes: 75891f502f5f ("SUNRPC: Support for congestion control ... ")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit 4b93dab36f28e673725e5e6123ebfccf7697f96a upstream.

    When adding frwr_unmap_async way back when, I re-used the existing
    trace_xprtrdma_post_send() trace point to record the return code
    of ib_post_send.

    Unfortunately there are some cases where re-using that trace point
    causes a crash. Instead, construct a trace point specific to posting
    Local Invalidate WRs that will always be safe to use in that context,
    and will act as a trace log eye-catcher for Local Invalidation.

    Fixes: 847568942f93 ("xprtrdma: Remove fr_state")
    Fixes: d8099feda483 ("xprtrdma: Reduce context switching due ... ")
    Signed-off-by: Chuck Lever
    Tested-by: Bill Baker
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     

09 Jan, 2020

1 commit

  • commit 5fcaf6982d1167f1cd9b264704f6d1ef4c505d54 upstream.

    I was investigating a crash in our Virtuozzo7 kernel which happened in
    svcauth_unix_set_client. I found that we access the m_client field
    in the ip_map structure, which was received from sunrpc_cache_lookup (we
    have a somewhat older kernel; now the code is in sunrpc_cache_add_entry),
    and this field looks uninitialized (m_client == 0x74 doesn't look like a
    pointer), but in cache_head's flags we see 0x1, which is CACHE_VALID.

    It looks like the problem appeared from our previous fix to sunrpc (1):
    commit 4ecd55ea0742 ("sunrpc: fix cache_head leak due to queued
    request")

    And we've also found a patch already fixing our patch (2):
    commit d58431eacb22 ("sunrpc: don't mark uninitialised items as VALID.")

    Though the crash is eliminated, I think the core of the problem is not
    completely fixed:

    Neil, in patch (2), makes cache_head CACHE_NEGATIVE before
    cache_fresh_locked, which was added in (1) to fix the crash. This way
    cache_is_valid won't say the cache is valid anymore, and in
    svcauth_unix_set_client the function cache_check will return an error
    instead of 0, so we don't count the entry as initialized.

    But it looks like we need to remove cache_fresh_locked completely in
    sunrpc_cache_lookup:

    In (1) we only wanted to make cache_fresh_unlocked->cache_dequeue so
    that cache_requests with no readers also release the corresponding
    cache_head, to fix their leak. Vasily and I were not sure whether
    cache_fresh_locked and cache_fresh_unlocked should be used as a pair,
    so we guessed that they should be.

    Now we see that we don't want the CACHE_VALID bit set here by
    cache_fresh_locked, as "valid" means "initialized" and there is no
    initialization in sunrpc_cache_add_entry. Both expiry_time and
    last_refresh are not used in cache_fresh_unlocked code-path and also not
    required for the initial fix.

    So, to conclude, cache_fresh_locked was called by mistake, and we can
    safely remove it instead of propping it up with CACHE_NEGATIVE. That
    looks conceptually cleaner to me. I hope I'm not missing something here.

    Here is our crash backtrace:
    [13108726.326291] BUG: unable to handle kernel NULL pointer dereference at 0000000000000074
    [13108726.326365] IP: [] svcauth_unix_set_client+0x2ab/0x520 [sunrpc]
    [13108726.326448] PGD 0
    [13108726.326468] Oops: 0002 [#1] SMP
    [13108726.326497] Modules linked in: nbd isofs xfs loop kpatch_cumulative_81_0_r1(O) xt_physdev nfnetlink_queue bluetooth rfkill ip6table_nat nf_nat_ipv6 ip_vs_wrr ip_vs_wlc ip_vs_sh nf_conntrack_netlink ip_vs_sed ip_vs_pe_sip nf_conntrack_sip ip_vs_nq ip_vs_lc ip_vs_lblcr ip_vs_lblc ip_vs_ftp ip_vs_dh nf_nat_ftp nf_conntrack_ftp iptable_raw xt_recent nf_log_ipv6 xt_hl ip6t_rt nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_TCPMSS xt_tcpmss vxlan ip6_udp_tunnel udp_tunnel xt_statistic xt_NFLOG nfnetlink_log dummy xt_mark xt_REDIRECT nf_nat_redirect raw_diag udp_diag tcp_diag inet_diag netlink_diag af_packet_diag unix_diag rpcsec_gss_krb5 xt_addrtype ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 ebtable_nat ebtable_broute nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_raw nfsv4
    [13108726.327173] dns_resolver cls_u32 binfmt_misc arptable_filter arp_tables ip6table_filter ip6_tables devlink fuse_kio_pcs ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat iptable_nat nf_nat_ipv4 xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_wdog_tmo xt_multiport bonding xt_set xt_conntrack iptable_filter iptable_mangle kpatch(O) ebtable_filter ebt_among ebtables ip_set_hash_ip ip_set nfnetlink vfat fat skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass fuse pcspkr ses enclosure joydev sg mei_me hpwdt hpilo lpc_ich mei ipmi_si shpchp ipmi_devintf ipmi_msghandler xt_ipvs acpi_power_meter ip_vs_rr nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache nf_nat cls_fw sch_htb sch_cbq sch_sfq ip_vs em_u32 nf_conntrack tun br_netfilter veth overlay ip6_vzprivnet ip6_vznetstat ip_vznetstat
    [13108726.327817] ip_vzprivnet vziolimit vzevent vzlist vzstat vznetstat vznetdev vzmon vzdev bridge pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper scsi_transport_iscsi 8021q syscopyarea sysfillrect garp sysimgblt fb_sys_fops mrp stp ttm llc bnx2x crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel drm dm_multipath ghash_clmulni_intel uas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd tg3 smartpqi scsi_transport_sas mdio libcrc32c i2c_core usb_storage ptp pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kpatch_cumulative_82_0_r1]
    [13108726.328403] CPU: 35 PID: 63742 Comm: nfsd ve: 51332 Kdump: loaded Tainted: G W O ------------ 3.10.0-862.20.2.vz7.73.29 #1 73.29
    [13108726.328491] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 10/02/2018
    [13108726.328554] task: ffffa0a6a41b1160 ti: ffffa0c2a74bc000 task.ti: ffffa0c2a74bc000
    [13108726.328610] RIP: 0010:[] [] svcauth_unix_set_client+0x2ab/0x520 [sunrpc]
    [13108726.328706] RSP: 0018:ffffa0c2a74bfd80 EFLAGS: 00010246
    [13108726.328750] RAX: 0000000000000001 RBX: ffffa0a6183ae000 RCX: 0000000000000000
    [13108726.328811] RDX: 0000000000000074 RSI: 0000000000000286 RDI: ffffa0c2a74bfcf0
    [13108726.328864] RBP: ffffa0c2a74bfe00 R08: ffffa0bab8c22960 R09: 0000000000000001
    [13108726.328916] R10: 0000000000000001 R11: 0000000000000001 R12: ffffa0a32aa7f000
    [13108726.328969] R13: ffffa0a6183afac0 R14: ffffa0c233d88d00 R15: ffffa0c2a74bfdb4
    [13108726.329022] FS: 0000000000000000(0000) GS:ffffa0e17f9c0000(0000) knlGS:0000000000000000
    [13108726.329081] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [13108726.332311] CR2: 0000000000000074 CR3: 00000026a1b28000 CR4: 00000000007607e0
    [13108726.334606] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [13108726.336754] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [13108726.338908] PKRU: 00000000
    [13108726.341047] Call Trace:
    [13108726.343074] [] ? groups_alloc+0x34/0x110
    [13108726.344837] [] svc_set_client+0x24/0x30 [sunrpc]
    [13108726.346631] [] svc_process_common+0x241/0x710 [sunrpc]
    [13108726.348332] [] svc_process+0x103/0x190 [sunrpc]
    [13108726.350016] [] nfsd+0xdf/0x150 [nfsd]
    [13108726.351735] [] ? nfsd_destroy+0x80/0x80 [nfsd]
    [13108726.353459] [] kthread+0xd1/0xe0
    [13108726.355195] [] ? create_kthread+0x60/0x60
    [13108726.356896] [] ret_from_fork_nospec_begin+0x7/0x21
    [13108726.358577] [] ? create_kthread+0x60/0x60
    [13108726.360240] Code: 4c 8b 45 98 0f 8e 2e 01 00 00 83 f8 fe 0f 84 76 fe ff ff 85 c0 0f 85 2b 01 00 00 49 8b 50 40 b8 01 00 00 00 48 89 93 d0 1a 00 00 0f c1 02 83 c0 01 83 f8 01 0f 8e 53 02 00 00 49 8b 44 24 38
    [13108726.363769] RIP [] svcauth_unix_set_client+0x2ab/0x520 [sunrpc]
    [13108726.365530] RSP
    [13108726.367179] CR2: 0000000000000074

    Fixes: d58431eacb22 ("sunrpc: don't mark uninitialised items as VALID.")
    Signed-off-by: Pavel Tikhomirov
    Acked-by: NeilBrown
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Pavel Tikhomirov
     

13 Dec, 2019

1 commit

  • commit 66eb3add452aa1be65ad536da99fac4b8f620b74 upstream.

    Jon Hunter: "I have been tracking down another suspend/NFS related
    issue where again I am seeing random delays exiting suspend. The delays
    can be up to a couple minutes in the worst case and this is causing a
    suspend test we have to fail."

    Change the use of a deferrable work to a standard delayed one.

    Reported-by: Jon Hunter
    Tested-by: Jon Hunter
    Fixes: 7e0a0e38fcfea ("SUNRPC: Replace the queue timer with a delayed work function")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     

31 Oct, 2019

3 commits


11 Oct, 2019

1 commit

  • Since commit 4f8943f80883 ("SUNRPC: Replace direct task wakeups from
    softirq context") there has been a race on the value of sk_err if both
    XPRT_SOCK_WAKE_ERROR and XPRT_SOCK_WAKE_DISCONNECT are set. In that case,
    we may end up losing the sk_err value that existed when xs_error_report was
    called.

    Fix this by reverting to the previous behavior: instead of using SO_ERROR
    to retrieve the value at a later time (which might also return sk_err_soft),
    copy the sk_err value onto struct sock_xprt, and use that value to wake
    pending tasks.
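The difference between saving the error and re-reading it later can be modelled with two hypothetical structs (`struct mini_sock`, `struct mini_sock_xprt`) and helpers `error_report()` and `wake_pending_status()`; the field names are assumptions, not the fields the commit adds. The value is captured when the error is reported, so a later change to `sk_err` does not affect the status pending tasks are woken with.

```c
/* Hypothetical reduced socket: sk_err may be cleared or overwritten later. */
struct mini_sock {
    int sk_err;
};

/* Hypothetical reduced transport holding a copy of the reported error. */
struct mini_sock_xprt {
    int saved_err;
};

/* Capture the error at error-report time... */
static void error_report(struct mini_sock_xprt *t, const struct mini_sock *sk)
{
    t->saved_err = sk->sk_err;
}

/* ...and wake pending tasks later with the saved value, not a fresh read. */
static int wake_pending_status(const struct mini_sock_xprt *t)
{
    return -t->saved_err;
}
```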

    Signed-off-by: Benjamin Coddington
    Fixes: 4f8943f80883 ("SUNRPC: Replace direct task wakeups from softirq context")
    Signed-off-by: Anna Schumaker

    Benjamin Coddington
     

28 Sep, 2019

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Add a new knfsd file cache, so that we don't have to open and close
    on each (NFSv2/v3) READ or WRITE. This can speed up read and write
    in some cases. It also replaces our readahead cache.

    - Prevent silent data loss on write errors, by treating write errors
    like server reboots for the purposes of write caching, thus forcing
    clients to resend their writes.

    - Tweak the code that allocates sessions to be more forgiving, so
    that NFSv4.1 mounts are less likely to hang when a server already
    has a lot of clients.

    - Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be
    limited only by the backend filesystem and the maximum RPC size.

    - Allow the server to enforce use of the correct kerberos credentials
    when a client reclaims state after a reboot.

    And some miscellaneous smaller bugfixes and cleanup"

    * tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux: (34 commits)
    sunrpc: clean up indentation issue
    nfsd: fix nfs read eof detection
    nfsd: Make nfsd_reset_boot_verifier_locked static
    nfsd: degraded slot-count more gracefully as allocation nears exhaustion.
    nfsd: handle drc over-allocation gracefully.
    nfsd: add support for upcall version 2
    nfsd: add a "GetVersion" upcall for nfsdcld
    nfsd: Reset the boot verifier on all write I/O errors
    nfsd: Don't garbage collect files that might contain write errors
    nfsd: Support the server resetting the boot verifier
    nfsd: nfsd_file cache entries should be per net namespace
    nfsd: eliminate an unnecessary acl size limit
    Deprecate nfsd fault injection
    nfsd: remove duplicated include from filecache.c
    nfsd: Fix the documentation for svcxdr_tmpalloc()
    nfsd: Fix up some unused variable warnings
    nfsd: close cached files prior to a REMOVE or RENAME that would replace target
    nfsd: rip out the raparms cache
    nfsd: have nfsd_test_lock use the nfsd_file cache
    nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
    ...

    Linus Torvalds
     

27 Sep, 2019

1 commit

  • Pull NFS client updates from Anna Schumaker:
    "Stable bugfixes:
    - Dequeue the request from the receive queue while we're re-encoding # v4.20+
    - Fix buffer handling of GSS MIC without slack # 5.1

    Features:
    - Increase xprtrdma maximum transport header and slot table sizes
    - Add support for nfs4_call_sync() calls using a custom
    rpc_task_struct
    - Optimize the default readahead size
    - Enable pNFS filelayout LAYOUTGET on OPEN

    Other bugfixes and cleanups:
    - Fix possible null-pointer dereferences and memory leaks
    - Various NFS over RDMA cleanups
    - Various NFS over RDMA comment updates
    - Don't receive TCP data into a reset request buffer
    - Don't try to parse incomplete RPC messages
    - Fix congestion window race with disconnect
    - Clean up pNFS return-on-close error handling
    - Fixes for NFS4ERR_OLD_STATEID handling"

    * tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits)
    pNFS/filelayout: enable LAYOUTGET on OPEN
    NFS: Optimise the default readahead size
    NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
    NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
    NFSv4: Fix OPEN_DOWNGRADE error handling
    pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
    NFSv4: Add a helper to increment stateid seqids
    NFSv4: Handle RPC level errors in LAYOUTRETURN
    NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
    NFSv4: Clean up pNFS return-on-close error handling
    pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
    NFS: remove unused check for negative dentry
    NFSv3: use nfs_add_or_obtain() to create and reference inodes
    NFS: Refactor nfs_instantiate() for dentry referencing callers
    SUNRPC: Fix congestion window race with disconnect
    SUNRPC: Don't try to parse incomplete RPC messages
    SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic
    SUNRPC: Fix buffer handling of GSS MIC without slack
    SUNRPC: RPC level errors should always set task->tk_rpc_status
    SUNRPC: Don't receive TCP data into a request buffer that has been reset
    ...

    Linus Torvalds
     

25 Sep, 2019

1 commit


22 Sep, 2019

1 commit

  • Pull RDMA subsystem updates from Jason Gunthorpe:
    "This cycle mainly saw lots of bug fixes and clean up code across the
    core code and several drivers, few new functional changes were made.

    - Many cleanup and bug fixes for hns

    - Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed,
    bnxt_re, efa

    - Share the query_port code between all the iWarp drivers

    - General rework and cleanup of the ODP MR umem code to fit better
    with the mmu notifier get/put scheme

    - Support rdma netlink in non init_net name spaces

    - mlx5 support for XRC devx and DC ODP"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits)
    RDMA: Fix double-free in srq creation error flow
    RDMA/efa: Fix incorrect error print
    IB/mlx5: Free mpi in mp_slave mode
    IB/mlx5: Use the original address for the page during free_pages
    RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp"
    RDMA/hns: Package operations of rq inline buffer into separate functions
    RDMA/hns: Optimize cmd init and mode selection for hip08
    IB/hfi1: Define variables as unsigned long to fix KASAN warning
    IB/{rdmavt, hfi1, qib}: Add a counter for credit waits
    IB/hfi1: Add traces for TID RDMA READ
    RDMA/siw: Relax from kmap_atomic() use in TX path
    IB/iser: Support up to 16MB data transfer in a single command
    RDMA/siw: Fix page address mapping in TX path
    RDMA: Fix goto target to release the allocated memory
    RDMA/usnic: Avoid overly large buffers on stack
    RDMA/odp: Add missing cast for 32 bit
    RDMA/hns: Use devm_platform_ioremap_resource() to simplify code
    Documentation/infiniband: update name of some functions
    RDMA/cma: Fix false error message
    RDMA/hns: Fix wrong assignment of qp_access_flags
    ...

    Linus Torvalds
     

21 Sep, 2019

4 commits

  • If the congestion window closes just as the transport disconnects,
    a reconnect is never driven because:

    1. The XPRT_CONG_WAIT flag prevents tasks from taking the write lock
    2. There's no wake-up of the first task on the xprt->sending queue

    To address this, clear the congestion wait flag as part of
    completing a disconnect.
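A sketch of the fix using hypothetical flag bits (`MINI_CONG_WAIT`, `MINI_CONNECTED`) rather than the real xprt state bits: completing a disconnect clears the congestion-wait flag along with the connected flag, so the next task can take the write lock and drive the reconnect.

```c
#include <stdbool.h>

/* Hypothetical flag bits, loosely modeled on xprt state bits. */
enum { MINI_CONG_WAIT = 1u << 0, MINI_CONNECTED = 1u << 1 };

struct mini_xprt_state {
    unsigned int flags;
};

/* Tasks may take the write lock only when no congestion wait is pending. */
static bool can_take_write_lock(const struct mini_xprt_state *x)
{
    return !(x->flags & MINI_CONG_WAIT);
}

/* Completing a disconnect also clears the congestion-wait flag. */
static void disconnect_done(struct mini_xprt_state *x)
{
    x->flags &= ~(unsigned int)(MINI_CONNECTED | MINI_CONG_WAIT);
}
```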

    Fixes: 75891f502f5f ("SUNRPC: Support for congestion control ... ")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
    If the copy of the RPC reply into our buffers did not complete, we
    could end up with a truncated message. In that case, just resend
    the call.

    Fixes: a0584ee9aed80 ("SUNRPC: Use struct xdr_stream when decoding...")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • Let the name reflect the single use. The function now assumes the GSS MIC
    is the last object in the buffer.

    Signed-off-by: Benjamin Coddington
    Signed-off-by: Anna Schumaker

    Benjamin Coddington
     
  • The GSS Message Integrity Check data for krb5i may lie partially in the XDR
    reply buffer's pages and tail. If so, we try to copy the entire MIC into
    free space in the tail. But as the estimations of the slack space required
    for authentication and verification have improved there may be less free
    space in the tail to complete this copy -- see commit 2c94b8eca1a2
    ("SUNRPC: Use au_rslack when computing reply buffer size"). In fact, there
    may only be room in the tail for a single copy of the MIC, and not part of
    the MIC and then another complete copy.

    The real world failure reported is that `ls` of a directory on NFS may
    sometimes return -EIO, which can be traced back to xdr_buf_read_netobj()
    failing to find available free space in the tail to copy the MIC.

    Fix this by checking for the case of the MIC crossing the boundaries of
    head, pages, and tail. If so, shift the buffer until the MIC is contained
    completely within the pages or tail. This allows the remainder of the
    function to create a sub buffer that directly addresses the complete MIC.
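The boundary check can be sketched as a predicate over the three segment lengths of an xdr_buf (`struct seg_lens` and `crosses_boundary()` are hypothetical). It only detects whether the MIC at `[offset, offset + len)` spans more than one of head, pages and tail; the buffer shifting that the fix then performs is not shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lengths of the three xdr_buf segments. */
struct seg_lens {
    size_t head, pages, tail;
};

/*
 * Return true if [offset, offset + len) is not contained in a single
 * segment, i.e. it crosses at least one head/pages/tail boundary.
 */
static bool crosses_boundary(const struct seg_lens *s, size_t offset, size_t len)
{
    size_t starts[3] = { 0, s->head, s->head + s->pages };
    size_t ends[3] = { s->head, s->head + s->pages, s->head + s->pages + s->tail };

    for (int i = 0; i < 3; i++)
        if (offset >= starts[i] && offset + len <= ends[i])
            return false;   /* fully contained in segment i */
    return true;
}
```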

    Signed-off-by: Benjamin Coddington
    Cc: stable@vger.kernel.org # v5.1
    Reviewed-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Benjamin Coddington
     

19 Sep, 2019

1 commit

  • Pull vfs mount API infrastructure updates from Al Viro:
    "Infrastructure bits of mount API conversions.

    The rest is more of per-filesystem updates and that will happen
    in separate pull requests"

    * 'work.mount-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    mtd: Provide fs_context-aware mount_mtd() replacement
    vfs: Create fs_context-aware mount_bdev() replacement
    new helper: get_tree_keyed()
    vfs: set fs_context::user_ns for reconfigure

    Linus Torvalds
     

18 Sep, 2019

3 commits


14 Sep, 2019

1 commit


06 Sep, 2019

1 commit


05 Sep, 2019

1 commit


27 Aug, 2019

6 commits