09 Feb, 2017

1 commit

  • commit 034dd34ff4916ec1f8f74e39ca3efb04eab2f791 upstream.

    Olga Kornievskaia says: "I ran into this oops in the nfsd (below)
    (4.10-rc3 kernel). To trigger this I had a client (unsuccessfully) try
    to mount the server with krb5 where the server doesn't have the
    rpcsec_gss_krb5 module built."

    The problem is that rsci.cred is copied from a svc_cred structure that
    gss_proxy didn't properly initialize. Fix that.

    [120408.542387] general protection fault: 0000 [#1] SMP
    ...
    [120408.565724] CPU: 0 PID: 3601 Comm: nfsd Not tainted 4.10.0-rc3+ #16
    [120408.567037] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
    [120408.569225] task: ffff8800776f95c0 task.stack: ffffc90003d58000
    [120408.570483] RIP: 0010:gss_mech_put+0xb/0x20 [auth_rpcgss]
    ...
    [120408.584946] ? rsc_free+0x55/0x90 [auth_rpcgss]
    [120408.585901] gss_proxy_save_rsc+0xb2/0x2a0 [auth_rpcgss]
    [120408.587017] svcauth_gss_proxy_init+0x3cc/0x520 [auth_rpcgss]
    [120408.588257] ? __enqueue_entity+0x6c/0x70
    [120408.589101] svcauth_gss_accept+0x391/0xb90 [auth_rpcgss]
    [120408.590212] ? try_to_wake_up+0x4a/0x360
    [120408.591036] ? wake_up_process+0x15/0x20
    [120408.592093] ? svc_xprt_do_enqueue+0x12e/0x2d0 [sunrpc]
    [120408.593177] svc_authenticate+0xe1/0x100 [sunrpc]
    [120408.594168] svc_process_common+0x203/0x710 [sunrpc]
    [120408.595220] svc_process+0x105/0x1c0 [sunrpc]
    [120408.596278] nfsd+0xe9/0x160 [nfsd]
    [120408.597060] kthread+0x101/0x140
    [120408.597734] ? nfsd_destroy+0x60/0x60 [nfsd]
    [120408.598626] ? kthread_park+0x90/0x90
    [120408.599448] ret_from_fork+0x22/0x30

    Fixes: 1d658336b05f "SUNRPC: Add RPC based upcall mechanism for RPCGSS auth"
    Cc: Simo Sorce
    Reported-by: Olga Kornievskaia
    Tested-by: Olga Kornievskaia
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    J. Bruce Fields
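
The failure mode in commit 034dd34ff491 (a credential copied from a structure that was never put into a known state, then torn down on an error path) can be illustrated with a small userspace model. This is only a sketch: `cred_model`, `cred_model_init()` and `cred_model_release()` are hypothetical names, not the kernel's `svc_cred` API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a credential that owns one pointer. */
struct cred_model {
    void *mech;   /* NULL means "no mechanism attached" */
};

/* Put the credential into a known-empty state before anyone copies it. */
static void cred_model_init(struct cred_model *c)
{
    assert(c != NULL);
    c->mech = NULL;
}

/* Release whatever the credential owns; returns 1 if something was freed. */
static int cred_model_release(struct cred_model *c)
{
    assert(c != NULL);
    if (c->mech == NULL)
        return 0;          /* nothing attached: safe no-op */
    free(c->mech);
    c->mech = NULL;
    return 1;
}
```

With the init step in place, a release on an error path sees NULL rather than stack garbage, so it becomes a no-op instead of a fault.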
     

01 Feb, 2017

1 commit

  • commit c929ea0b910355e1876c64431f3d5802f95b3d75 upstream.

    After removing the sunrpc module, I get many kmemleak reports like this:
    unreferenced object 0xffff88003316b1e0 (size 544):
    comm "gssproxy", pid 2148, jiffies 4294794465 (age 4200.081s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    kmemleak_alloc+0x4a/0xa0
    kmem_cache_alloc+0x15e/0x1f0
    ida_pre_get+0xaa/0x150
    ida_simple_get+0xad/0x180
    nlmsvc_lookup_host+0x4ab/0x7f0 [lockd]
    lockd+0x4d/0x270 [lockd]
    param_set_timeout+0x55/0x100 [lockd]
    svc_defer+0x114/0x3f0 [sunrpc]
    svc_defer+0x2d7/0x3f0 [sunrpc]
    rpc_show_info+0x8a/0x110 [sunrpc]
    proc_reg_write+0x7f/0xc0
    __vfs_write+0xdf/0x3c0
    vfs_write+0xef/0x240
    SyS_write+0xad/0x130
    entry_SYSCALL_64_fastpath+0x1a/0xa9
    0xffffffffffffffff

    I found that the ida information (dynamically allocated memory) isn't
    cleaned up.

    Signed-off-by: Kinglong Mee
    Fixes: 2f048db4680a ("SUNRPC: Add an identifier for struct rpc_clnt")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Kinglong Mee
     

26 Jan, 2017

5 commits

  • commit 6d6bf72de914059b304f7b99530a7856e5c846aa upstream.

    Clean up: This message was intended to be a dprintk, as it is on the
    server-side.

    Fixes: 87cfb9a0c85c ('xprtrdma: Client-side support for ...')
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit 8d38de65644d900199f035277aa5f3da4aa9fc17 upstream.

    Verbs providers may perform house-keeping on the Send Queue during
    each signaled send completion. It is necessary therefore for a verbs
    consumer (like xprtrdma) to occasionally force a signaled send
    completion if it runs unsignaled most of the time.

    xprtrdma does not require signaled completions for Send or FastReg
    Work Requests, but does signal some LocalInv Work Requests. To
    ensure that Send Queue house-keeping can run before the Send Queue
    is more than half-consumed, xprtrdma forces a signaled completion
    on occasion by counting the number of Send Queue Entries it
    consumes. It currently does this by counting each ib_post_send as
    one Entry.

    Commit c9918ff56dfb ("xprtrdma: Add ro_unmap_sync method for FRWR")
    introduced the ability for frwr_op_unmap_sync to post more than one
    Work Request with a single post_send. Thus the underlying assumption
    of one Send Queue Entry per ib_post_send is no longer true.

    Also, FastReg Work Requests are currently never signaled. They
    should be signaled once in a while, just as Send is, to keep the
    accounting of consumed SQEs accurate.

    While we're here, convert the CQCOUNT macros to the currently
    preferred kernel coding style, which is inline functions.

    Fixes: c9918ff56dfb ("xprtrdma: Add ro_unmap_sync method for FRWR")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
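
The accounting described in commit 8d38de65644d (charge every Work Request, not every post_send call, and force a signaled completion before more than half of the Send Queue is consumed) can be sketched as a pair of inline functions. This is a userspace model under stated assumptions: `sq_budget`, `sq_budget_init()` and `sq_budget_charge()` are hypothetical names and do not reproduce the xprtrdma helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a Send Queue budget; not the xprtrdma structures. */
struct sq_budget {
    int interval;   /* SQEs allowed between signaled completions */
    int remaining;  /* SQEs left before the next forced signal */
};

static inline void sq_budget_init(struct sq_budget *b, int sq_depth)
{
    assert(sq_depth >= 2);
    b->interval = sq_depth / 2;   /* signal before half the queue is used */
    b->remaining = b->interval;
}

/*
 * Charge nr_wrs Work Requests posted by a single post_send call.
 * Returns true when the caller should request a signaled completion.
 */
static inline bool sq_budget_charge(struct sq_budget *b, int nr_wrs)
{
    b->remaining -= nr_wrs;
    if (b->remaining > 0)
        return false;
    b->remaining = b->interval;   /* start a new accounting window */
    return true;
}
```

Passing the number of chained Work Requests, rather than assuming one per call, is what keeps the count honest when one post_send carries several WRs.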
     
  • commit ce1ca7d2d140a1f4aaffd297ac487f246963dd2f upstream.

    In rdma_read_chunk_frmr() when ib_post_send() fails, the error code path
    invokes ib_dma_unmap_sg() to unmap the sg list. It then invokes
    svc_rdma_put_frmr() which in turn tries to unmap the same sg list through
    ib_dma_unmap_sg() again. This second unmap is invalid and could lead to
    problems when the iova being unmapped is subsequently reused. Remove
    the call to unmap in rdma_read_chunk_frmr() and let svc_rdma_put_frmr()
    handle it.

    Fixes: 412a15c0fe53 ("svcrdma: Port to new memory registration API")
    Signed-off-by: Sriharsha Basavapatna
    Reviewed-by: Chuck Lever
    Reviewed-by: Yuval Shaia
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Sriharsha Basavapatna
     
  • commit 78794d1890708cf94e3961261e52dcec2cc34722 upstream.

    Context expiry times are in units of seconds since boot, not unix time.

    The use of get_seconds() here therefore sets the expiry time decades in
    the future. This prevents timely freeing of contexts destroyed by
    client RPC_GSS_PROC_DESTROY requests. We'd still free them eventually
    (when the module is unloaded or the container shut down), but a lot of
    contexts could pile up before then.

    Fixes: c5b29f885afe "sunrpc: use seconds since boot in expiry cache"
    Reported-by: Andy Adamson
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    J. Bruce Fields
     
  • commit 546125d1614264d26080817d0c8cddb9b25081fa upstream.

    The inet6addr_chain is an atomic notifier chain, so we can't call
    anything that might sleep (like lock_sock)... instead of closing the
    socket from svc_age_temp_xprts_now (which is called by the notifier
    function), just have the rpc service threads do it instead.

    Fixes: c3d4879e01be "sunrpc: Add a function to close..."
    Signed-off-by: Scott Mayhew
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Scott Mayhew
     

15 Jan, 2017

1 commit


09 Jan, 2017

1 commit

  • commit 1cded9d2974fe4fe339fc0ccd6638b80d465ab2c upstream.

    There are two problems with refcounting of auth_gss messages.

    First, the reference on the pipe->pipe list (taken by a call
    to rpc_queue_upcall()) is not counted. It seems to be
    assumed that a message in pipe->pipe will always also be in
    pipe->in_downcall, where it is correctly reference counted.

    However, there is no guarantee of this. I have a report of a
    NULL dereference in rpc_pipe_read() which suggests that a msg
    that has been freed is still on the pipe->pipe list.

    One way I imagine this might happen is:
    - message is queued for uid=U and auth->service=S1
    - rpc.gssd reads this message and starts processing.
      This removes the message from pipe->pipe
    - message is queued for uid=U and auth->service=S2
    - rpc.gssd replies to the first message. gss_pipe_downcall()
      calls __gss_find_upcall(pipe, U, NULL) and it finds the
      *second* message, as new messages are placed at the head
      of ->in_downcall, and the service type is not checked.
    - This second message is removed from ->in_downcall and freed
      by gss_release_msg() (even though it is still on pipe->pipe)
    - rpc.gssd tries to read another message, and dereferences a pointer
      to this message that has just been freed.

    I fix this by incrementing the reference count before calling
    rpc_queue_upcall(), and decrementing it if that fails, or normally in
    gss_pipe_destroy_msg().

    It seems strange that the reply doesn't target the message more
    precisely, but I don't know all the details. In any case, I think the
    reference counting irregularity became a measurable bug when the
    extra arg was added to __gss_find_upcall(), hence the Fixes: line
    below.

    The second problem is that if rpc_queue_upcall() fails, the new
    message is not freed. gss_alloc_msg() set the ->count to 1,
    gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1,
    then the pointer is discarded so the memory never gets freed.

    Fixes: 9130b8dbc6ac ("SUNRPC: allow for upcalls for same uid but different gss service")
    Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250
    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    NeilBrown
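
The reference counts described in commit 1cded9d2974f can be traced with a small model: one reference for the creator, one for the ->in_downcall list, and (with the fix) one for the pipe->pipe list, all dropped again if queueing fails. The names `upcall_msg`, `msg_get()`, `msg_put()` and `queue_upcall()` are hypothetical; this sketches the counting only, not the auth_gss code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical message with a plain reference count. */
struct upcall_msg {
    int count;
    bool freed;
};

static void msg_init(struct upcall_msg *m)
{
    m->count = 1;          /* creator's reference */
    m->freed = false;
}

static void msg_get(struct upcall_msg *m)
{
    assert(!m->freed);
    m->count++;
}

static void msg_put(struct upcall_msg *m)
{
    assert(!m->freed && m->count > 0);
    if (--m->count == 0)
        m->freed = true;   /* stands in for freeing the memory */
}

/* Queue a message; queue_ok models whether the pipe accepted it. */
static bool queue_upcall(struct upcall_msg *m, bool queue_ok)
{
    msg_get(m);            /* reference held by the ->in_downcall list */
    msg_get(m);            /* reference held by the pipe->pipe list */
    if (!queue_ok) {
        msg_put(m);        /* undo the pipe reference */
        msg_put(m);        /* unhash from ->in_downcall */
        msg_put(m);        /* drop the creator's reference: no leak */
        return false;
    }
    return true;
}
```

On the success path the message survives a downcall and the creator's release, and is freed only when the pipe drops its own reference.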
     

19 Nov, 2016

1 commit


14 Nov, 2016

1 commit

  • This fixes the following panic that can occur with NFSoRDMA.

    general protection fault: 0000 [#1] SMP
    Modules linked in: rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi
    scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp
    scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
    mlx5_ib ib_core intel_powerclamp coretemp kvm_intel kvm sg ioatdma
    ipmi_devintf ipmi_ssif dcdbas iTCO_wdt iTCO_vendor_support pcspkr
    irqbypass sb_edac shpchp dca crc32_pclmul ghash_clmulni_intel edac_core
    lpc_ich aesni_intel lrw gf128mul glue_helper ablk_helper mei_me mei
    ipmi_si cryptd wmi ipmi_msghandler acpi_pad acpi_power_meter nfsd
    auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod
    crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper
    syscopyarea sysfillrect sysimgblt ahci fb_sys_fops ttm libahci mlx5_core
    tg3 crct10dif_pclmul drm crct10dif_common
    ptp i2c_core libata crc32c_intel pps_core fjes dm_mirror dm_region_hash
    dm_log dm_mod
    CPU: 1 PID: 120 Comm: kworker/1:1 Not tainted 3.10.0-514.el7.x86_64 #1
    Hardware name: Dell Inc. PowerEdge R320/0KM5PX, BIOS 2.4.2 01/29/2015
    Workqueue: events check_lifetime
    task: ffff88031f506dd0 ti: ffff88031f584000 task.ti: ffff88031f584000
    RIP: 0010:_raw_spin_lock_bh+0x17/0x50
    RSP: 0018:ffff88031f587ba8 EFLAGS: 00010206
    RAX: 0000000000020000 RBX: 20041fac02080072 RCX: ffff88031f587fd8
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 20041fac02080072
    RBP: ffff88031f587bb0 R08: 0000000000000008 R09: ffffffff8155be77
    R10: ffff880322a59b00 R11: ffffea000bf39f00 R12: 20041fac02080072
    R13: 000000000000000d R14: ffff8800c4fbd800 R15: 0000000000000001
    FS: 0000000000000000(0000) GS:ffff880322a40000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f3c52d4547e CR3: 00000000019ba000 CR4: 00000000001407e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
    20041fac02080002 ffff88031f587bd0 ffffffff81557830 20041fac02080002
    ffff88031f587c78 ffff88031f587c40 ffffffff8155ae08 000000010157df32
    0000000800000001 ffff88031f587c20 ffffffff81096acb ffffffff81aa37d0
    Call Trace:
    lock_sock_nested+0x20/0x50
    sock_setsockopt+0x78/0x940
    ? lock_timer_base.isra.33+0x2b/0x50
    kernel_setsockopt+0x4d/0x50
    svc_age_temp_xprts_now+0x174/0x1e0 [sunrpc]
    nfsd_inetaddr_event+0x9d/0xd0 [nfsd]
    notifier_call_chain+0x4c/0x70
    __blocking_notifier_call_chain+0x4d/0x70
    blocking_notifier_call_chain+0x16/0x20
    __inet_del_ifa+0x168/0x2d0
    check_lifetime+0x25f/0x270
    process_one_work+0x17b/0x470
    worker_thread+0x126/0x410
    ? rescuer_thread+0x460/0x460
    kthread+0xcf/0xe0
    ? kthread_create_on_node+0x140/0x140
    ret_from_fork+0x58/0x90
    ? kthread_create_on_node+0x140/0x140
    Code: ca 75 f1 5d c3 0f 1f 80 00 00 00 00 eb d9 66 0f 1f 44 00 00 0f 1f
    44 00 00 55 48 89 e5 53 48 89 fb e8 7e 04 a0 ff b8 00 00 02 00 0f
    c1 03 89 c2 c1 ea 10 66 39 c2 75 03 5b 5d c3 83 e2 fe 0f
    RIP _raw_spin_lock_bh+0x17/0x50
    RSP

    Signed-off-by: Scott Mayhew
    Fixes: c3d4879e ("sunrpc: Add a function to close temporary transports immediately")
    Reviewed-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
     

12 Nov, 2016

1 commit

  • Pull NFS client bugfixes from Anna Schumaker:
    "Most of these fix regressions in 4.9, and none are going to stable
    this time around.

    Bugfixes:
    - Trim extra slashes in v4 nfs_paths to fix tools that use this
    - Fix a -Wmaybe-uninitialized warning
    - Fix suspicious RCU usages
    - Fix Oops when mounting multiple servers at once
    - Suppress a false-positive pNFS error
    - Fix a DMAR failure in NFS over RDMA"

    * tag 'nfs-for-4.9-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
    xprtrdma: Fix DMAR failure in frwr_op_map() after reconnect
    fs/nfs: Fix used uninitialized warn in nfs4_slot_seqid_in_use()
    NFS: Don't print a pNFS error if we aren't using pNFS
    NFS: Ignore connections that have cl_rpcclient uninitialized
    SUNRPC: Fix suspicious RCU usage
    NFSv4.1: work around -Wmaybe-uninitialized warning
    NFS: Trim extra slash in v4 nfs_path

    Linus Torvalds
     

11 Nov, 2016

1 commit

  • When a LOCALINV WR is flushed, the frmr is marked STALE, then
    frwr_op_unmap_sync DMA-unmaps the frmr's SGL. These STALE frmrs
    are then recovered when frwr_op_map hunts for an INVALID frmr to
    use.

    All other cases that need frmr recovery leave that SGL DMA-mapped.
    The FRMR recovery path unconditionally DMA-unmaps the frmr's SGL.

    To avoid DMA unmapping the SGL twice for flushed LOCAL_INV WRs,
    alter the recovery logic (rather than the hot frwr_op_unmap_sync
    path) to distinguish among these cases. This solution also takes
    care of the case where multiple LOCAL_INV WRs are issued for the
    same rpcrdma_req, some complete successfully, but some are flushed.

    Reported-by: Vasco Steinmetz
    Signed-off-by: Chuck Lever
    Tested-by: Vasco Steinmetz
    Signed-off-by: Anna Schumaker

    Chuck Lever
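
The idea in this commit (let the recovery path tell apart memory regions whose SGL is still DMA-mapped from those already unmapped after a flushed LOCAL_INV) can be modeled with an explicit state. The enum values and function names below are hypothetical and chosen for illustration; they are not claimed to match the xprtrdma definitions.

```c
#include <assert.h>

/* Hypothetical states; the distinction is whether the SGL is still mapped. */
enum mr_state {
    MR_VALID,
    MR_FLUSHED_MAPPED,     /* needs recovery, SGL still DMA-mapped */
    MR_FLUSHED_UNMAPPED,   /* needs recovery, SGL already unmapped */
    MR_INVALID
};

struct mr_model {
    enum mr_state state;
    int unmap_calls;       /* counts DMA unmaps, to detect a double unmap */
};

static void mr_dma_unmap(struct mr_model *mr)
{
    mr->unmap_calls++;
}

/* Hot path after a flushed LOCAL_INV: unmaps and records that it did. */
static void mr_unmap_sync_flushed(struct mr_model *mr)
{
    assert(mr->state == MR_VALID);
    mr_dma_unmap(mr);
    mr->state = MR_FLUSHED_UNMAPPED;
}

/* Recovery path: unmap only when the SGL is known to be still mapped. */
static void mr_recover(struct mr_model *mr)
{
    if (mr->state == MR_FLUSHED_MAPPED)
        mr_dma_unmap(mr);
    mr->state = MR_INVALID;
}
```

Keeping the extra branch in recovery rather than in the unmap path matches the commit's stated preference for leaving the hot path alone.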
     

08 Nov, 2016

1 commit

  • We need to hold the rcu_read_lock() when calling rcu_dereference(),
    otherwise we can't guarantee that the object being dereferenced still
    exists.

    Fixes: 39e5d2df ("SUNRPC search xprt switch for sockaddr")
    Signed-off-by: Anna Schumaker

    Anna Schumaker
     

02 Nov, 2016

1 commit


29 Oct, 2016

1 commit

  • We've been seeing some crashes in testing that look like this:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: memcpy_orig+0x29/0x110
    PGD 212ca2067 PUD 212ca3067 PMD 0
    Oops: 0002 [#1] SMP
    Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ppdev parport_pc i2c_piix4 sg parport i2c_core virtio_balloon pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ata_generic pata_acpi virtio_scsi 8139too ata_piix libata 8139cp mii virtio_pci floppy virtio_ring serio_raw virtio
    CPU: 1 PID: 1540 Comm: nfsd Not tainted 4.9.0-rc1 #39
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
    task: ffff88020d7ed200 task.stack: ffff880211838000
    RIP: 0010:memcpy_orig+0x29/0x110
    RSP: 0018:ffff88021183bdd0 EFLAGS: 00010206
    RAX: 0000000000000000 RBX: ffff88020d7fa000 RCX: 000000f400000000
    RDX: 0000000000000014 RSI: ffff880212927020 RDI: 0000000000000000
    RBP: ffff88021183be30 R08: 01000000ef896996 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff880211704ca8
    R13: ffff88021473f000 R14: 00000000ef896996 R15: ffff880211704800
    FS: 0000000000000000(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 0000000212ca1000 CR4: 00000000000006e0
    Stack:
    ffffffffa01ea087 ffffffff63400001 ffff880215145e00 ffff880211bacd00
    ffff88021473f2b8 0000000000000004 00000000d0679d67 ffff880211bacd00
    ffff88020d7fa000 ffff88021473f000 0000000000000000 ffff88020d7faa30
    Call Trace:
    ? svc_tcp_recvfrom+0x5a7/0x790 [sunrpc]
    svc_recv+0xad8/0xbd0 [sunrpc]
    nfsd+0xde/0x160 [nfsd]
    ? nfsd_destroy+0x60/0x60 [nfsd]
    kthread+0xd8/0xf0
    ret_from_fork+0x1f/0x40
    ? kthread_park+0x60/0x60
    Code: 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 76 20 89 07 4c 89 4f 08 4c 89 57 10 4c 89 5f 18 48 8d 7f 20 73 d4
    RIP memcpy_orig+0x29/0x110
    RSP
    CR2: 0000000000000000

    Both Bruce and Eryu ran a bisect here and found that the problematic
    patch was 68778945e46 (SUNRPC: Separate buffer pointers for RPC Call and
    Reply messages).

    That patch changed rpc_xdr_encode to use a new rq_rbuffer pointer to
    set up the receive buffer, but didn't change all of the necessary
    codepaths to set it properly. In particular the backchannel setup was
    missing.

    We need to set rq_rbuffer whenever rq_buffer is set. Ensure that it is.

    Reviewed-by: Chuck Lever
    Tested-by: Chuck Lever
    Reported-by: Eryu Guan
    Tested-by: Eryu Guan
    Fixes: 68778945e46 "SUNRPC: Separate buffer pointers..."
    Reported-by: J. Bruce Fields
    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
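
The invariant stated in this commit ("set rq_rbuffer whenever rq_buffer is set") is easiest to keep when one helper assigns both pointers. A userspace sketch follows. The layout (receive area directly after the call area in a single allocation) and the names `rqst_model`, `rqst_alloc()` and `rqst_free()` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request with separate send and receive buffer pointers. */
struct rqst_model {
    char *buffer;    /* start of the call (send) area */
    char *rbuffer;   /* start of the reply (receive) area */
};

/* Allocate one region and set both pointers together; 0 on success. */
static int rqst_alloc(struct rqst_model *r, size_t callsize, size_t rcvsize)
{
    char *buf;

    assert(r != NULL && callsize > 0 && rcvsize > 0);
    buf = malloc(callsize + rcvsize);
    if (buf == NULL)
        return -1;
    r->buffer = buf;
    r->rbuffer = buf + callsize;   /* never leave this unset */
    return 0;
}

static void rqst_free(struct rqst_model *r)
{
    free(r->buffer);
    r->buffer = NULL;
    r->rbuffer = NULL;
}
```

Any setup path that fills in only the first pointer leaves the receive side pointing at NULL, which is consistent with the memcpy fault in the trace above.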
     

27 Oct, 2016

1 commit

  • As of ac4e97abce9b "scatterlist: sg_set_buf() argument must be in linear
    mapping", sg_set_buf hits a BUG when make_checksum_v2->xdr_process_buf,
    among other callers, passes it memory on the stack.

    We only need a scatterlist to pass this to the crypto code, and it seems
    like overkill to require kmalloc'd memory just to encrypt a few bytes,
    but for now this seems the best fix.

    Many of these callers are in the NFS write paths, so we allocate with
    GFP_NOFS. It might be possible to do without allocations here entirely,
    but that would probably be a bigger project.

    Cc: Rusty Russell
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

14 Oct, 2016

2 commits

  • Pull NFS client updates from Anna Schumaker:
    "Highlights include:

    Stable bugfixes:
    - sunrpc: fix write space race causing stalls
    - NFS: Fix inode corruption in nfs_prime_dcache()
    - NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
    - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
    - NFSv4: Open state recovery must account for file permission changes
    - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic

    Features:
    - Add support for tracking multiple layout types with an ordered list
    - Add support for using multiple backchannel threads on the client
    - Add support for pNFS file layout session trunking
    - Delay xprtrdma use of DMA API (for device driver removal)
    - Add support for xprtrdma remote invalidation
    - Add support for larger xprtrdma inline thresholds
    - Use a scatter/gather list for sending xprtrdma RPC calls
    - Add support for the CB_NOTIFY_LOCK callback
    - Improve hashing sunrpc auth_creds by using both uid and gid

    Bugfixes:
    - Fix xprtrdma use of DMA API
    - Validate filenames before adding to the dcache
    - Fix corruption of xdr->nwords in xdr_copy_to_scratch
    - Fix setting buffer length in xdr_set_next_buffer()
    - Don't deadlock the state manager on the SEQUENCE status flags
    - Various delegation and stateid related fixes
    - Retry operations if an interrupted slot receives EREMOTEIO
    - Make nfs boot time y2038 safe"

    * tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
    NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
    fs: nfs: Make nfs boot time y2038 safe
    sunrpc: replace generic auth_cred hash with auth-specific function
    sunrpc: add RPCSEC_GSS hash_cred() function
    sunrpc: add auth_unix hash_cred() function
    sunrpc: add generic_auth hash_cred() function
    sunrpc: add hash_cred() function to rpc_authops struct
    Retry operation on EREMOTEIO on an interrupted slot
    pNFS: Fix atime updates on pNFS clients
    sunrpc: queue work on system_power_efficient_wq
    NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
    NFSv4: If recovery failed for a specific open stateid, then don't retry
    NFSv4: Fix retry issues with nfs41_test/free_stateid
    NFSv4: Open state recovery must account for file permission changes
    NFSv4: Mark the lock and open stateids as invalid after freeing them
    NFSv4: Don't test open_stateid unless it is set
    NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
    NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
    NFSv4: Fix a race when updating an open_stateid
    NFSv4: Fix a race in nfs_inode_reclaim_delegation()
    ...

    Linus Torvalds
     
  • Pull nfsd updates from Bruce Fields:
    "Some RDMA work and some good bugfixes, and two new features that could
    benefit from user testing:

    - Anna Schumaker contributed a simple NFSv4.2 COPY implementation.
    COPY is already supported on the client side, so a call to
    copy_file_range() on a recent client should now result in a
    server-side copy that doesn't require all the data to make a round
    trip to the client and back.

    - Jeff Layton implemented callbacks to notify clients when contended
    locks become available, which should reduce latency on workloads
    with contended locks"

    * tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
    NFSD: Implement the COPY call
    nfsd: handle EUCLEAN
    nfsd: only WARN once on unmapped errors
    exportfs: be careful to only return expected errors.
    nfsd4: setclientid_confirm with unmatched verifier should fail
    nfsd: randomize SETCLIENTID reply to help distinguish servers
    nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
    nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
    nfsd: add a LRU list for blocked locks
    nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
    nfsd: plumb in a CB_NOTIFY_LOCK operation
    NFSD: fix corruption in notifier registration
    svcrdma: support Remote Invalidation
    svcrdma: Server-side support for rpcrdma_connect_private
    rpcrdma: RDMA/CM private message data structure
    svcrdma: Skip put_page() when send_reply() fails
    svcrdma: Tail iovec leaves an orphaned DMA mapping
    nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
    nfsd: eliminate cb_minorversion field
    nfsd: don't set a FL_LAYOUT lease for flexfiles layouts

    Linus Torvalds
     

11 Oct, 2016

2 commits

  • Pull more vfs updates from Al Viro:
    "->rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     
  • Al Viro
     

10 Oct, 2016

1 commit

  • Pull main rdma updates from Doug Ledford:
    "This is the main pull request for the rdma stack this release. The
    code has been through 0day and I had it tagged for linux-next testing
    for a couple days.

    Summary:

    - updates to mlx5

    - updates to mlx4 (two conflicts, both minor and easily resolved)

    - updates to iw_cxgb4 (one conflict, not so obvious to resolve,
    proper resolution is to keep the code in cxgb4_main.c as it is in
    Linus' tree as attach_uld was refactored and moved into
    cxgb4_uld.c)

    - improvements to uAPI (moved vendor specific API elements to uAPI
    area)

    - add hns-roce driver and hns and hns-roce ACPI reset support

    - conversion of all rdma code away from deprecated
    create_singlethread_workqueue

    - security improvement: remove unsafe ib_get_dma_mr (breaks lustre in
    staging)"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits)
    staging/lustre: Disable InfiniBand support
    iw_cxgb4: add fast-path for small REG_MR operations
    cxgb4: advertise support for FR_NSMR_TPTE_WR
    IB/core: correctly handle rdma_rw_init_mrs() failure
    IB/srp: Fix infinite loop when FMR sg[0].offset != 0
    IB/srp: Remove an unused argument
    IB/core: Improve ib_map_mr_sg() documentation
    IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets
    IB/mthca: Move user vendor structures
    IB/nes: Move user vendor structures
    IB/ocrdma: Move user vendor structures
    IB/mlx4: Move user vendor structures
    IB/cxgb4: Move user vendor structures
    IB/cxgb3: Move user vendor structures
    IB/mlx5: Move and decouple user vendor structures
    IB/{core,hw}: Add constant for node_desc
    ipoib: Make ipoib_warn ratelimited
    IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue
    IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue
    IB/ipoib: Remove deprecated create_singlethread_workqueue
    ...

    Linus Torvalds
     

08 Oct, 2016

1 commit

    The current supplementary groups code can massively overallocate memory,
    and it is implemented so that each individual gid is accessed through a
    2D array.

    If number of gids is
    Cc: Vasily Kulikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

01 Oct, 2016

4 commits


28 Sep, 2016

2 commits

  • CURRENT_TIME macro is not appropriate for filesystems as it
    doesn't use the right granularity for filesystem timestamps.
    Use current_time() instead.

    CURRENT_TIME is also not y2038 safe.

    This is also in preparation for the patch that transitions
    vfs timestamps to use 64 bit time and hence make them
    y2038 safe. As part of the effort current_time() will be
    extended to do range checks. Hence, it is necessary for all
    file system timestamps to use current_time(). Also,
    current_time() will be transitioned along with vfs to be
    y2038 safe.

    Note that whenever a single call to current_time() is used
    to change timestamps in different inodes, it is because they
    share the same time granularity.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Acked-by: Felipe Balbi
    Acked-by: Steven Whitehouse
    Acked-by: Ryusuke Konishi
    Acked-by: David Sterba
    Signed-off-by: Al Viro

    Deepa Dinamani
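
The granularity point in this commit (a filesystem timestamp should be truncated to what the filesystem can store) can be sketched as a truncation helper. The names `ts_model` and `ts_trunc()` are hypothetical, and the granularity is assumed to be given in nanoseconds; this is an illustration, not the kernel's implementation.

```c
#include <assert.h>

/* Hypothetical timestamp; sec is 64-bit, so it does not overflow in 2038. */
struct ts_model {
    long long sec;
    long nsec;
};

/* Truncate t to a granularity of gran_ns nanoseconds. */
static struct ts_model ts_trunc(struct ts_model t, long gran_ns)
{
    assert(gran_ns >= 1);
    if (gran_ns == 1)
        return t;                      /* nanosecond resolution: unchanged */
    if (gran_ns >= 1000000000L) {
        t.nsec = 0;                    /* one second or coarser */
        return t;
    }
    t.nsec -= t.nsec % gran_ns;        /* round down to a multiple */
    return t;
}
```

Two inodes on filesystems with the same granularity get identical results from one truncated value, which is why the commit notes that a single call can serve both.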
     
    sunrpc uses a workqueue to clean its cache regularly. There is no real
    need to execute this work on the cpu that queued it.

    On an idle system, especially a heterogeneous one like big.LITTLE, it is
    observed that the idle big cpu is woken up many times just to service
    this work, which goes against the principle of power saving. It would be
    better to schedule it on a cpu that the scheduler believes to be the most
    appropriate one.

    After applying this patch, system_wq is replaced by
    system_power_efficient_wq for sunrpc. This functionality is enabled when
    CONFIG_WQ_POWER_EFFICIENT is selected.

    Signed-off-by: Ke Wang
    Signed-off-by: Anna Schumaker

    Ke Wang
     

24 Sep, 2016

1 commit

  • Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or
    less unchecked, this moves the capability of creating a global rkey into
    the RDMA core, where it can be easily audited. It also prints a warning
    every time this feature is used.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Steve Wise
    Signed-off-by: Doug Ledford

    Christoph Hellwig
     

23 Sep, 2016

7 commits

  • Support Remote Invalidation. A private message is exchanged with
    the client upon RDMA transport connect that indicates whether
    Send With Invalidation may be used by the server to send RPC
    replies. The invalidate_rkey is arbitrarily chosen from among
    rkeys present in the RPC-over-RDMA header's chunk lists.

    Send With Invalidate improves performance only when clients can
    recognize, while processing an RPC reply, that an rkey has already
    been invalidated. That has been submitted as a separate change.

    In the future, the RPC-over-RDMA protocol might support Remote
    Invalidation properly. The protocol needs to enable signaling
    between peers to indicate when Remote Invalidation can be used
    for each individual RPC.

    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Prepare to receive an RDMA-CM private message when handling a new
    connection attempt, and send a similar message as part of connection
    acceptance.

    Both sides can communicate their various implementation limits.
    Implementations that don't support this sideband protocol ignore it.

    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Message from syslogd@klimt at Aug 18 17:00:37 ...
    kernel:page:ffffea0020639b00 count:0 mapcount:0 mapping: (null) index:0x0
    Aug 18 17:00:37 klimt kernel: flags: 0x2fffff80000000()
    Aug 18 17:00:37 klimt kernel: page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)

    Aug 18 17:00:37 klimt kernel: kernel BUG at /home/cel/src/linux/linux-2.6/include/linux/mm.h:445!
    Aug 18 17:00:37 klimt kernel: RIP: 0010:svc_rdma_sendto+0x641/0x820 [rpcrdma]

    send_reply() assigns its page argument as the first page of ctxt. On
    error, send_reply() already invokes svc_rdma_put_context(ctxt, 1);
    which does a put_page() on that very page. No need to do that again
    as svc_rdma_sendto exits.

    Fixes: 3e1eeb980822 ("svcrdma: Close connection when a send error occurs")
    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • The ctxt's count field is overloaded to mean the number of pages in
    the ctxt->page array and the number of SGEs in the ctxt->sge array.
    Typically these two numbers are the same.

    However, when an inline RPC reply is constructed from an xdr_buf
    with a tail iovec, the head and tail often occupy the same page,
    but each are DMA mapped independently. In that case, ->count equals
    the number of pages, but it does not equal the number of SGEs.
    There's one more SGE, for the tail iovec. Hence there is one more
    DMA mapping than there are pages in the ctxt->page array.

    This isn't a real problem until the server's iommu is enabled. Then
    each RPC reply that has content in that iovec orphans a DMA mapping
    that consists of real resources.

    krb5i and krb5p always populate that tail iovec. After a couple
    million sent krb5i/p RPC replies, the NFS server starts behaving
    erratically. Reboot is needed to clear the problem.

    Fixes: 9d11b51ce7c1 ("svcrdma: Fix send_reply() scatter/gather set-up")
    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
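
The overloaded counter described in this commit can be modeled by tracking pages and DMA-mapped SGEs separately, so that teardown unmaps every SGE even when two of them share a page. The names `ctxt_model`, `ctxt_add_page()`, `ctxt_map_sge()` and `ctxt_unmap_all()` are hypothetical; this is a sketch of the bookkeeping, not the svcrdma code.

```c
#include <assert.h>

/* Hypothetical context with separate page and SGE counters. */
struct ctxt_model {
    int count;         /* pages held in the page array */
    int mapped_sges;   /* SGEs that currently have a DMA mapping */
};

static void ctxt_add_page(struct ctxt_model *c)
{
    c->count++;
}

static void ctxt_map_sge(struct ctxt_model *c)
{
    c->mapped_sges++;
}

/* Unmap every mapped SGE; returns how many mappings were released. */
static int ctxt_unmap_all(struct ctxt_model *c)
{
    int released = 0;

    assert(c->mapped_sges >= 0);
    while (c->mapped_sges > 0) {
        c->mapped_sges--;   /* stands in for one DMA unmap */
        released++;
    }
    return released;
}
```

For an inline reply whose head and tail share one page, `count` is 1 while `mapped_sges` is 2; a loop bounded by `count` would leave one mapping behind on every such reply.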
     
  • There is only one waiter for the completion, therefore there
    is no need to use complete_all(). Let's make that clear by
    using complete() instead of complete_all().

    The usage pattern of the completion is:

    waiter context                          waker context

    frwr_op_unmap_sync()
      reinit_completion()
      ib_post_send()
      wait_for_completion()

                                            frwr_wc_localinv_wake()
                                              complete()

    Signed-off-by: Daniel Wagner
    Cc: Anna Schumaker
    Cc: Trond Myklebust
    Cc: Chuck Lever
    Cc: linux-nfs@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Anna Schumaker

    Daniel Wagner
     
  • Use xdr->nwords to tell us how much buffer remains.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
    When we copy the first part of the data, we need to ensure that the value
    of xdr->nwords is updated as well. Do so by calling __xdr_inline_decode().

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
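
The bookkeeping rule behind this commit (every path that consumes data must also reduce the count of remaining words) can be modeled by routing both halves of a split read through one consuming helper. The names `stream_model`, `stream_consume()` and `stream_read_split()` are hypothetical, and byte counts are assumed to round up to 4-byte words.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decode stream that tracks remaining 4-byte words. */
struct stream_model {
    size_t nwords;
};

static size_t words_for(size_t nbytes)
{
    return (nbytes + 3) / 4;   /* round up to whole words */
}

/* Consume nbytes; returns 0 on success, -1 if too few words remain. */
static int stream_consume(struct stream_model *s, size_t nbytes)
{
    size_t n = words_for(nbytes);

    assert(s != NULL);
    if (n > s->nwords)
        return -1;
    s->nwords -= n;
    return 0;
}

/* A read that spans two buffers: both parts update the word count. */
static int stream_read_split(struct stream_model *s, size_t first, size_t second)
{
    if (stream_consume(s, first) < 0)
        return -1;
    return stream_consume(s, second);
}
```

If the first part were copied without going through the helper, the stream would believe more data remains than actually does, and a later bounds check would pass when it should fail.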
     

20 Sep, 2016

3 commits

  • Write space becoming available may race with putting the task to sleep
    in xprt_wait_for_buffer_space(). The existing mechanism to avoid the
    race does not work.

    This (edited) partial trace illustrates the problem:

    [1] rpc_task_run_action: task:43546@5 ... action=call_transmit
    [2] xs_write_space snd_task (== 43546), but
    this has not yet been queued and the wake up is lost.

    [4] xs_nospace() is called which calls xprt_wait_for_buffer_space()
    which queues task 43546.

    [5] The call to sk->sk_write_space() at the end of xs_nospace() (which
    is supposed to handle the above race) does not call
    xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and
    thus the task is not woken.

    Fix the race by resetting the SOCKWQ_ASYNC_NOSPACE bit in xs_nospace()
    so the second call to sk->sk_write_space() calls xprt_write_space().

    Suggested-by: Trond Myklebust
    Signed-off-by: David Vrabel
    cc: stable@vger.kernel.org # 4.4
    Signed-off-by: Anna Schumaker

    David Vrabel
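
The lost wake-up and its fix can be replayed with a deterministic model of the steps above. All names here (`sock_model`, `write_space()`, `nospace()`) are hypothetical. The model assumes buffer space is available whenever `write_space()` runs and that the no-space bit was set before the first event.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical socket and task state for replaying the race. */
struct sock_model {
    bool nospace_bit;   /* stands in for SOCKWQ_ASYNC_NOSPACE */
    bool task_queued;   /* task is sleeping, waiting for buffer space */
    bool task_woken;
};

/* Write-space callback: acts only if the no-space bit is set. */
static void write_space(struct sock_model *s)
{
    if (!s->nospace_bit)
        return;
    s->nospace_bit = false;
    if (s->task_queued) {
        s->task_queued = false;
        s->task_woken = true;
    }
}

/* Out-of-space path: queue the task, then re-check for space. */
static void nospace(struct sock_model *s, bool rearm_bit)
{
    s->task_queued = true;
    if (rearm_bit)
        s->nospace_bit = true;   /* the fix: set the bit again */
    write_space(s);
}
```

Calling `write_space()` before `nospace()` reproduces the ordering in the trace: without re-arming the bit the task stays queued, and with it the second call wakes the task.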
     
  • Clean up: the extra layer of indirection doesn't add value.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: When converting xprtrdma to use the new CQ API, I missed a
    spot. The naming convention elsewhere is:

    {svc_rdma,rpcrdma}_wc_{operation}

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever