15 Jan, 2020

40 commits

  • Using signed 32-bit types for UTC time leads to the y2038 overflow,
    which is what happens in the sunrpc code at the moment.

    This changes the sunrpc code over to use time64_t where possible.
    The one exception is the gss_import_v{1,2}_context() function for
    kerberos5, which uses 32-bit timestamps in the protocol. Here,
    we can at least treat the numbers as 'unsigned', which extends the
    range from 2038 to 2106.
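
    The effect of the signed versus unsigned interpretation can be sketched
    in user-space C. The helper names below are hypothetical and are not
    part of the sunrpc code; the sketch only shows how the same 32 bits
    widen into a 64-bit time value under each reading.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not kernel API): widen a 32-bit on-the-wire
 * timestamp, in seconds since 1970-01-01 UTC, into a 64-bit value.
 */

/*
 * Signed reading: anything above 2^31 - 1 (January 2038) turns negative.
 * The uint32_t -> int32_t conversion is implementation-defined before C23,
 * but wraps on the usual two's-complement compilers.
 */
static int64_t wire_time_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}

/* Unsigned reading: stays positive up to 2^32 - 1 (February 2106). */
static int64_t wire_time_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}
```

    With these helpers, the raw value 0x80000000 becomes a negative
    (pre-1970) time under the signed reading and 2147483648 under the
    unsigned one.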

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Anna Schumaker

    Arnd Bergmann
     
  • Split out from commit "NFS: Add fs_context support."

    Add wrappers nfs_errorf(), nfs_invalf(), and nfs_warnf() which log error
    information to the fs_context. Convert some printk's to use these new
    wrappers instead.

    Signed-off-by: Scott Mayhew
    Signed-off-by: Anna Schumaker

    Scott Mayhew
     
  • Split out from commit "NFS: Add fs_context support."

    This patch adds additional refactoring for the conversion of NFS to use
    fs_context, namely:

    (*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context.
    nfs_clone_mount has had several fields removed, and nfs_mount_info
    has been removed altogether.
    (*) Various functions now take an fs_context as an argument instead
    of nfs_mount_info, nfs_fs_context, etc.

    Signed-off-by: Scott Mayhew
    Signed-off-by: Anna Schumaker

    Scott Mayhew
     
  • Add filesystem context support to NFS, parsing the options in advance and
    attaching the information to struct nfs_fs_context. The highlights are:

    (*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context. This
    structure represents NFS's superblock config.

    (*) Make use of the VFS's parsing support to split comma-separated lists.

    (*) Pin the NFS protocol module in the nfs_fs_context.

    (*) Attach supplementary error information to fs_context. This has the
    downside that these strings must be static and can't be formatted.

    (*) Remove the auxiliary file_system_type structs since the information
    necessary can be conveyed in the nfs_fs_context struct instead.

    (*) Root mounts are made by duplicating the config for the requested mount
    so as to have the same parameters. Submounts pick up their parameters
    from the parent superblock.

    [AV -- retrans is u32, not string]
    [SM -- Renamed cfg to ctx in a few functions in an earlier patch]
    [SM -- Moved fs_context mount option parsing to an earlier patch]
    [SM -- Moved fs_context error logging to a later patch]
    [SM -- Fixed printks in nfs4_try_get_tree() and nfs4_get_referral_tree()]
    [SM -- Added is_remount_fc() helper]
    [SM -- Deferred some refactoring to a later patch]
    [SM -- Fixed referral mounts, which were broken in the original patch]
    [SM -- Fixed leak of nfs_fattr when fs_context is freed]

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Scott Mayhew
    Signed-off-by: Anna Schumaker

    David Howells
     
  • Split out from commit "NFS: Add fs_context support."

    Convert existing mount option definitions to fs_parameter_enum's and
    fs_parameter_spec's. Parse mount options using fs_parse() and
    lookup_constant().

    Notes:

    1) Fixed a typo in the udp6 definition in nfs_xprt_protocol_tokens
    from the original commit.

    2) fs_parse() expects an fs_context as the first arg so that any
    errors can be logged to the fs_context. We're passing NULL for the
    fs_context (this will change in commit "NFS: Add fs_context support.")
    which is okay as it will cause logfc() to do a printk() instead.

    3) fs_parse() expects an fs_parameter as the third arg. We're
    building an fs_parameter manually in nfs_fs_context_parse_option(),
    which will go away in commit "NFS: Add fs_context support.".

    Signed-off-by: Scott Mayhew
    Signed-off-by: Anna Schumaker

    Scott Mayhew
     
  • Split out from commit "NFS: Add fs_context support."

    Rename cfg to ctx in nfs_init_server(), nfs_verify_authflavors(),
    and nfs_request_mount(). No functional changes.

    Signed-off-by: Scott Mayhew
    Signed-off-by: Anna Schumaker

    Scott Mayhew
     
  • Do some tidying of the parsing code, including:

    (*) Returning 0/error rather than true/false.

    (*) Putting the nfs_fs_context pointer first in some arg lists.

    (*) Unwrapping some lines that will now fit on one line.

    (*) Providing unioned sockaddr/sockaddr_storage fields to avoid casts.

    (*) nfs_parse_devname() can paste its return values directly into the
    nfs_fs_context struct as that's where the caller puts them.
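
    The unioned sockaddr/sockaddr_storage idea from the list above can be
    sketched in user-space C as follows. The struct and function names are
    hypothetical and do not reflect the actual nfs_fs_context layout; the
    point is only that the union gives a generic view and full-size storage
    without pointer casts at the use sites.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical context struct; field names are illustrative only. */
struct mount_ctx {
	union {
		struct sockaddr		address;	/* generic view */
		struct sockaddr_storage	_address;	/* room for any family */
	};
	socklen_t addrlen;
};

/* Store an IPv4 address in the full-size member. */
static void ctx_set_ipv4(struct mount_ctx *ctx, const struct sockaddr_in *sin)
{
	memset(&ctx->_address, 0, sizeof(ctx->_address));
	memcpy(&ctx->_address, sin, sizeof(*sin));
	ctx->addrlen = sizeof(*sin);
}

/* Read the family through the generic member; no cast is needed. */
static sa_family_t ctx_family(const struct mount_ctx *ctx)
{
	return ctx->address.sa_family;
}
```

    Callers that need a struct sockaddr pointer can pass &ctx->address
    directly, while the storage member guarantees enough space for larger
    address families.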

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    David Howells
     
  • Add a small buffer in nfs_fs_context to avoid string duplication when
    parsing numbers. Also make the parsing function wrapper place the parsed
    integer directly in the appropriate nfs_fs_context struct member.
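
    A minimal user-space sketch of this approach, assuming a fixed scratch
    buffer inside the context and a wrapper that writes the result through
    a pointer to the destination member. The struct layout, buffer size and
    function name are assumptions made for illustration, not the kernel
    code.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical context; only the scratch-buffer idea comes from the text. */
struct parse_ctx {
	char		buf[32];	/* scratch space, no heap copy per number */
	unsigned long	rsize;		/* example destination member */
};

/*
 * Parse the len-byte token (not NUL-terminated) into *dst.
 * Returns 0 on success or -EINVAL; *dst is untouched on failure.
 */
static int ctx_parse_ulong(struct parse_ctx *ctx, const char *tok, size_t len,
			   unsigned long *dst)
{
	char *end;
	unsigned long val;

	if (len == 0 || len >= sizeof(ctx->buf))
		return -EINVAL;
	memcpy(ctx->buf, tok, len);
	ctx->buf[len] = '\0';

	errno = 0;
	val = strtoul(ctx->buf, &end, 0);
	if (errno || *end != '\0')
		return -EINVAL;
	*dst = val;	/* written straight into the caller's struct member */
	return 0;
}
```

    A call such as ctx_parse_ulong(&ctx, "8192,wsize", 4, &ctx.rsize)
    parses only the first four bytes and stores 8192 in ctx.rsize.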

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    David Howells
     
  • Deindent nfs_fs_context_parse_option().

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    David Howells
     
  • Split nfs_parse_mount_options() to move the prologue, list-splitting and
    epilogue into one function and the per-option processing into another.
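
    The shape of such a split can be sketched in plain C. The option names,
    the context struct and the final validation rule below are made up for
    illustration; only the division into an outer list-splitting function
    and a per-option function follows the description above.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical option state; not the NFS mount data. */
struct opt_ctx {
	int soft;
	int seen;
};

/* Per-option processing: handles exactly one token. Returns 0 or -EINVAL. */
static int parse_one_option(struct opt_ctx *ctx, const char *opt)
{
	ctx->seen++;
	if (strcmp(opt, "soft") == 0)
		ctx->soft = 1;
	else if (strcmp(opt, "hard") == 0)
		ctx->soft = 0;
	else
		return -EINVAL;
	return 0;
}

/* Prologue, list splitting and epilogue. Modifies raw in place. */
static int parse_options(struct opt_ctx *ctx, char *raw)
{
	char *p = raw, *next;
	int ret;

	memset(ctx, 0, sizeof(*ctx));		/* prologue: defaults */
	while (p != NULL) {
		next = strchr(p, ',');
		if (next)
			*next++ = '\0';		/* terminate this token */
		if (*p != '\0') {		/* skip empty entries */
			ret = parse_one_option(ctx, p);
			if (ret < 0)
				return ret;
		}
		p = next;
	}
	return ctx->seen ? 0 : -EINVAL;		/* epilogue: illustrative check */
}
```

    Keeping the per-option function free of list handling means it can
    later be called with options that were split by some other layer.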

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    David Howells
     
  • Rename struct nfs_parsed_mount_data to struct nfs_fs_context and rename
    pointers to it to "ctx". At some point this will be pointed to by an
    fs_context struct's fs_private pointer.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    David Howells
     
  • The mount argument match tables should never be altered so constify them.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    David Howells
     
  • Split various bits relating to mount parameterisation out from
    fs/nfs/super.c into their own file to form the basis of filesystem context
    handling for NFS.

    No other changes are made to the code beyond removing 'static' qualifiers.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    David Howells
     
  • it's always either nfs_set_sb_security() or nfs_clone_sb_security(),
    the choice being controlled by mount_info->cloned != NULL. No need
    to add methods, especially when both instances live right next to
    the caller and are never accessed anywhere else.

    Reviewed-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    Al Viro
     
  • We used to check ->i_op for being nfs_dir_inode_operations. With
    separate inode_operations for v3 and v4 that became bogus, but
    rather than going for protocol-dependent comparison we could've
    just checked ->i_fop instead; _that_ is the same for all protocol
    versions.
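
    The difference between the two checks can be sketched with stand-in
    types. None of the names below are the kernel's; the sketch assumes one
    inode-operations table per protocol version and a single shared
    file-operations table for directories.

```c
/* Hypothetical stand-ins for inode_operations and file_operations. */
struct iops_sketch { int version; };
struct fops_sketch { int unused; };

static const struct iops_sketch v3_dir_iops = { 3 };
static const struct iops_sketch v4_dir_iops = { 4 };
static const struct fops_sketch dir_fops = { 0 };	/* shared by all versions */

struct dir_inode_sketch {
	const struct iops_sketch *i_op;
	const struct fops_sketch *i_fop;
};

/* Version-independent test: compare against the one shared table. */
static int is_dir_inode(const struct dir_inode_sketch *inode)
{
	return inode->i_fop == &dir_fops;
}
```

    Comparing i_op instead would need one comparison per protocol version,
    which is the protocol-dependent check the text argues against.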

    Reviewed-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    Al Viro
     
  • The only possible values are nfs_fill_super and nfs_clone_super. The
    latter is used only when crossing into a submount and it is almost
    identical to the former; the only differences are
      * ->s_time_gran unconditionally set to 1 (even for v2 mounts).
        Regression dating back to 2012, actually.
      * ->s_blocksize/->s_blocksize_bits set to that of parent.

    Rather than messing with the method, stash ->s_blocksize_bits in
    mount_info in submount case and after the (now unconditional)
    call of nfs_fill_super() override ->s_blocksize/->s_blocksize_bits
    if that has been set.
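
    The "fill unconditionally, then override" pattern can be sketched as
    below. The structs, the default block size and the convention that 0
    means "nothing stashed" are assumptions for illustration only.

```c
/* Hypothetical stand-ins; not the kernel's struct super_block. */
struct sb_sketch {
	unsigned long	blocksize;
	unsigned char	blocksize_bits;
};

struct mount_info_sketch {
	unsigned char	inherited_blocksize_bits;	/* 0: not a submount */
};

/* Common initialisation for every mount (the default size is made up). */
static void fill_super_sketch(struct sb_sketch *sb)
{
	sb->blocksize_bits = 12;
	sb->blocksize = 1UL << sb->blocksize_bits;
}

static void setup_super_sketch(struct sb_sketch *sb,
			       const struct mount_info_sketch *mi)
{
	fill_super_sketch(sb);			/* always called */
	if (mi->inherited_blocksize_bits) {	/* submount: take parent's value */
		sb->blocksize_bits = mi->inherited_blocksize_bits;
		sb->blocksize = 1UL << sb->blocksize_bits;
	}
}
```

    This removes the need for a second, nearly identical fill function
    selected through a method pointer.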

    Reviewed-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    Al Viro
     
  • pick it from mount_info

    Reviewed-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    Al Viro
     
  • Make it static, even. And remove a stale extern of (long-gone)
    nfs_xdev_mount_common() from internal.h, while we are at it.

    Reviewed-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    Al Viro
     
  • they are identical now...

    Reviewed-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    Al Viro
     
  • Reviewed-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    Al Viro
     
  • That will allow us to stop passing those references around in
    quite a few places. Moreover, it will allow merging the xdev and
    remote file_system_type.

    Reviewed-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    Al Viro
     
  • Do it in nfs_do_submount() instead. As a side benefit, nfs_clone_data
    doesn't need ->fh and ->fattr anymore.

    Reviewed-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    Al Viro
     
  • Reviewed-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    Al Viro
     
  • nothing in it will be looking at that thing anyway

    Reviewed-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    Al Viro
     
  • They are identical now.

    Reviewed-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    Al Viro
     
  • Do that (fhandle allocation, setting struct server up) in
    nfs4_referral_mount() and nfs4_try_mount() resp. and pass the
    server and pointer to mount_info into nfs_do_root_mount() so that
    nfs4_remote_referral_mount()/nfs_remote_mount() could be merged.

    Since we are moving stuff from ->mount() instances to the points
    prior to vfs_kern_mount() that would trigger those, we need to
    make sure that nfs_do_root_mount() will do the corresponding
    cleanup itself if it doesn't trigger those ->mount() instances.

    Reviewed-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    Al Viro
     
  • Reviewed-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    Al Viro
     
  • Allow it to take ERR_PTR() for server and return ERR_CAST() of it in
    such case. All callers used to open-code that...
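
    The error-pointer convention can be mimicked in user-space C. The
    lowercase helpers below are simplified imitations of the kernel's
    ERR_PTR(), PTR_ERR() and IS_ERR(), and the mount function and structs
    are hypothetical; the sketch shows a callee accepting an encoded error
    and passing it on, so callers no longer open-code the check.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* User-space imitations of the kernel helpers (assumed, simplified). */
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int is_err(const void *p)
{
	/* Error codes occupy the top MAX_ERRNO_SKETCH addresses. */
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

struct server_sketch { int id; };
struct mount_sketch { struct server_sketch *srv; };

/* Accepts either a valid server or an encoded error. */
static struct mount_sketch *do_mount_sketch(struct mount_sketch *slot,
					    struct server_sketch *srv)
{
	if (is_err(srv))
		return err_ptr(ptr_err(srv));	/* same error, new pointer type */
	slot->srv = srv;
	return slot;
}
```

    A caller can then write do_mount_sketch(&m, create_server()) and check
    the result once, whether the failure came from server creation or from
    the mount step; create_server() here is a placeholder name.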

    Reviewed-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Anna Schumaker

    Al Viro
     
  • Pull NFS client bugfixes from Anna Schumaker:
    "Three NFS over RDMA fixes for bugs Chuck found that can be hit during
    device removal:

    - Fix create_qp crash on device unload

    - Fix completion wait during device removal

    - Fix oops in receive handler after device removal"

    * tag 'nfs-for-5.5-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
    xprtrdma: Fix oops in Receive handler after device removal
    xprtrdma: Fix completion wait during device removal
    xprtrdma: Fix create_qp crash on device unload

    Linus Torvalds
     
  • Since v5.4, a device removal occasionally triggered this oops:

    Dec 2 17:13:53 manet kernel: BUG: unable to handle page fault for address: 0000000c00000219
    Dec 2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode
    Dec 2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page
    Dec 2 17:13:53 manet kernel: PGD 0 P4D 0
    Dec 2 17:13:53 manet kernel: Oops: 0000 [#1] SMP
    Dec 2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G W 5.4.0-00050-g53717e43af61 #883
    Dec 2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
    Dec 2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
    Dec 2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma]
    Dec 2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0
    Dec 2 17:13:53 manet kernel: RSP: 0018:ffffc900035dfe00 EFLAGS: 00010246
    Dec 2 17:13:53 manet kernel: RAX: ffff888467290000 RBX: ffff88846c638400 RCX: 0000000000000048
    Dec 2 17:13:53 manet kernel: RDX: 0000000000000048 RSI: 00000000f942e000 RDI: 0000000c00000001
    Dec 2 17:13:53 manet kernel: RBP: ffff888467611b00 R08: ffff888464e4a3c4 R09: 0000000000000000
    Dec 2 17:13:53 manet kernel: R10: ffffc900035dfc88 R11: fefefefefefefeff R12: ffff888865af4428
    Dec 2 17:13:53 manet kernel: R13: ffff888466023000 R14: ffff88846c63f000 R15: 0000000000000010
    Dec 2 17:13:53 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
    Dec 2 17:13:53 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Dec 2 17:13:53 manet kernel: CR2: 0000000c00000219 CR3: 0000000002009002 CR4: 00000000001606e0
    Dec 2 17:13:53 manet kernel: Call Trace:
    Dec 2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core]
    Dec 2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core]
    Dec 2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd
    Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
    Dec 2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a
    Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
    Dec 2 17:13:53 manet kernel: kthread+0xf4/0xf9
    Dec 2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74
    Dec 2 17:13:53 manet kernel: ret_from_fork+0x24/0x30

    The proximal cause is that this rpcrdma_rep has a rr_rdmabuf that
    is still pointing to the old ib_device, which has been freed. The
    only way that is possible is if this rpcrdma_rep was not destroyed
    by rpcrdma_ia_remove.

    Debugging showed that was indeed the case: this rpcrdma_rep was
    still in use by a completing RPC at the time of the device removal,
    and thus wasn't on the rep free list. So, it was not found by
    rpcrdma_reps_destroy().

    The fix is to introduce a list of all rpcrdma_reps so that they all
    can be found when a device is removed. That list is used to perform
    only regbuf DMA unmapping, replacing that call to
    rpcrdma_reps_destroy().

    Meanwhile, to prevent corruption of this list, I've moved the
    destruction of temp rpcrdma_rep objects to rpcrdma_post_recvs().
    rpcrdma_xprt_drain() ensures that post_recvs (and thus rep_destroy) is
    not invoked while rpcrdma_reps_unmap is walking rb_all_reps, thus
    protecting the rb_all_reps list.
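
    The idea of tracking every object on a second list can be sketched in
    single-threaded C. The types and functions below are hypothetical and
    omit all locking and the llist machinery; they only show why a walk of
    the "all" list reaches objects that are absent from the free list.

```c
#include <stddef.h>

struct rep_sketch {
	struct rep_sketch *next_all;	/* linked from creation onwards */
	struct rep_sketch *next_free;	/* linked only while idle */
	int dma_mapped;
};

struct buf_sketch {
	struct rep_sketch *all;
	struct rep_sketch *free;
};

/* New objects go on the "all" list immediately; they start out in use. */
static void rep_create(struct buf_sketch *b, struct rep_sketch *r)
{
	r->dma_mapped = 1;
	r->next_free = NULL;
	r->next_all = b->all;
	b->all = r;
}

/* Return an idle object to the free list. */
static void rep_put(struct buf_sketch *b, struct rep_sketch *r)
{
	r->next_free = b->free;
	b->free = r;
}

/* Device removal: visit every object, idle or in use, and only unmap it. */
static int reps_unmap(struct buf_sketch *b)
{
	struct rep_sketch *r;
	int n = 0;

	for (r = b->all; r != NULL; r = r->next_all) {
		if (r->dma_mapped) {
			r->dma_mapped = 0;
			n++;
		}
	}
	return n;
}
```

    An object that is still held by a completing request never appears on
    the free list, yet reps_unmap() still finds it through next_all.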

    Fixes: b0b227f071a0 ("xprtrdma: Use an llist to manage free rpcrdma_reps")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • I've found that, on occasion, "rmmod" will hang if an NFS mount
    is under load.

    Ensure that ri_remove_done is initialized only just before the
    transport is woken up to force a close. This avoids the completion
    possibly getting initialized again while the CM event handler is
    waiting for a wake-up.

    Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from under an NFS mount")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • On device re-insertion, the RDMA device driver crashes trying to set
    up a new QP:

    Nov 27 16:32:06 manet kernel: BUG: kernel NULL pointer dereference, address: 00000000000001c0
    Nov 27 16:32:06 manet kernel: #PF: supervisor write access in kernel mode
    Nov 27 16:32:06 manet kernel: #PF: error_code(0x0002) - not-present page
    Nov 27 16:32:06 manet kernel: PGD 0 P4D 0
    Nov 27 16:32:06 manet kernel: Oops: 0002 [#1] SMP
    Nov 27 16:32:06 manet kernel: CPU: 1 PID: 345 Comm: kworker/u28:0 Tainted: G W 5.4.0 #852
    Nov 27 16:32:06 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
    Nov 27 16:32:06 manet kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
    Nov 27 16:32:06 manet kernel: RIP: 0010:atomic_try_cmpxchg+0x2/0x12
    Nov 27 16:32:06 manet kernel: Code: ff ff 48 8b 04 24 5a c3 c6 07 00 0f 1f 40 00 c3 31 c0 48 81 ff 08 09 68 81 72 0c 31 c0 48 81 ff 83 0c 68 81 0f 92 c0 c3 8b 06 0f b1 17 0f 94 c2 84 d2 75 02 89 06 88 d0 c3 53 ba 01 00 00 00
    Nov 27 16:32:06 manet kernel: RSP: 0018:ffffc900035abbf0 EFLAGS: 00010046
    Nov 27 16:32:06 manet kernel: RAX: 0000000000000000 RBX: 00000000000001c0 RCX: 0000000000000000
    Nov 27 16:32:06 manet kernel: RDX: 0000000000000001 RSI: ffffc900035abbfc RDI: 00000000000001c0
    Nov 27 16:32:06 manet kernel: RBP: ffffc900035abde0 R08: 000000000000000e R09: ffffffffffffc000
    Nov 27 16:32:06 manet kernel: R10: 0000000000000000 R11: 000000000002e800 R12: ffff88886169d9f8
    Nov 27 16:32:06 manet kernel: R13: ffff88886169d9f4 R14: 0000000000000246 R15: 0000000000000000
    Nov 27 16:32:06 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa40000(0000) knlGS:0000000000000000
    Nov 27 16:32:06 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Nov 27 16:32:06 manet kernel: CR2: 00000000000001c0 CR3: 0000000002009006 CR4: 00000000001606e0
    Nov 27 16:32:06 manet kernel: Call Trace:
    Nov 27 16:32:06 manet kernel: do_raw_spin_lock+0x2f/0x5a
    Nov 27 16:32:06 manet kernel: create_qp_common.isra.47+0x856/0xadf [mlx4_ib]
    Nov 27 16:32:06 manet kernel: ? slab_post_alloc_hook.isra.60+0xa/0x1a
    Nov 27 16:32:06 manet kernel: ? __kmalloc+0x125/0x139
    Nov 27 16:32:06 manet kernel: mlx4_ib_create_qp+0x57f/0x972 [mlx4_ib]

    The fix is to copy the qp_init_attr struct that was just created by
    rpcrdma_ep_create() instead of using the one from the previous
    connection instance.

    Fixes: 98ef77d1aaa7 ("xprtrdma: Send Queue size grows after a reconnect")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Pull parisc fixes from Helge Deller:
    "A boot crash fix by Mike Rapoport and a printk fix by Krzysztof
    Kozlowski"

    * 'parisc-5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: fix map_pages() to actually populate upper directory
    parisc: Use proper printk format for resource_size_t

    Linus Torvalds
     
  • Pull asm-generic fixes from Arnd Bergmann:
    "Here are two bugfixes from Mike Rapoport, both fixing compile-time
    errors for the nds32 architecture that were recently introduced"

    * tag 'asm-generic-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
    nds32: fix build failure caused by page table folding updates
    asm-generic/nds32: don't redefine cacheflush primitives

    Linus Torvalds
     
  • Pull SCSI fixes from James Bottomley:
    "Two simple fixes in the upper drivers (so both fairly core), one in
    enclosures, which fixes replugging a device into an enclosure slot and
    one in the disk driver which fixes revalidating a drive with
    protection information (PI) to make it a non-PI drive ... previously
    we were still remembering the old PI state.

    Both fixed issues are quite rare in the field"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: enclosure: Fix stale device oops with hot replug
    scsi: sd: Clear sdkp->protection_type if disk is reformatted without PI

    Linus Torvalds
     
  • Merge misc fixes from David Howells.

    Two afs fixes and a key refcounting fix.

    * dhowells:
    afs: Fix afs_lookup() to not clobber the version on a new dentry
    afs: Fix use-after-loss-of-ref
    keys: Fix request_key() cache

    Linus Torvalds
     
  • Fix afs_lookup() to not clobber the version set on a new dentry by
    afs_do_lookup() - especially as it's using the wrong version of the
    version (we need to use the one given to us by whatever op the dir
    contents correspond to rather than what's in the afs_vnode).

    Fixes: 9dd0b82ef530 ("afs: Fix missing dentry data version updating")
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • afs_lookup() has a tracepoint to indicate the outcome of
    d_splice_alias(), passing it the inode to retrieve the fid from.
    However, the function gave up its ref on that inode when it called
    d_splice_alias(), which may have failed and dropped the inode.

    Fix this by caching the fid.
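
    The pattern of copying a value before giving up the reference that
    protects it can be sketched as follows. All names are hypothetical;
    consume_ref() merely stands in for a call that takes over the caller's
    reference and may drop the object.

```c
/* Hypothetical stand-ins for the fid and the inode that carries it. */
struct fid_sketch {
	unsigned long long	vid;
	unsigned long long	vnode;
	unsigned int		unique;
};

struct ref_inode_sketch {
	struct fid_sketch	fid;
	int			refs;
};

/* Consumes the caller's reference; the inode may be gone afterwards. */
static int consume_ref(struct ref_inode_sketch *inode, int fail)
{
	inode->refs--;
	return fail ? -1 : 0;
}

/* Returns the fid to report in the trace, taken from a private copy. */
static struct fid_sketch lookup_and_trace(struct ref_inode_sketch *inode,
					  int fail, int *result)
{
	struct fid_sketch fid = inode->fid;	/* copy while the ref is held */

	*result = consume_ref(inode, fail);
	return fid;				/* inode is not touched again */
}
```

    Because the copy is made before the hand-off, the reported fid is valid
    whether or not the call succeeded.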

    Fixes: 80548b03991f ("afs: Add more tracepoints")
    Reported-by: Al Viro
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • When the key cached by request_key() and co. is cleaned up on exit(),
    the code looks in the wrong task_struct, and so clears the wrong cache.
    This leads to anomalies in key refcounting when doing, say, a kernel
    build on an afs volume, that then trigger kasan to report a
    use-after-free when the key is viewed in /proc/keys.

    Fix this by making exit_creds() look in the passed-in task_struct rather
    than in current (the task_struct cleanup code is deferred by RCU and
    potentially run in another task).
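
    A toy model of the fix, using hypothetical types rather than the
    kernel's task_struct and key: the cleanup function operates on the task
    it is handed instead of on whichever task happens to be running.

```c
#include <stddef.h>

struct key_sketch { int refs; };
struct task_sketch { struct key_sketch *cached_key; };

/* Whichever task is running when the deferred cleanup executes. */
static struct task_sketch *current_task;

static void key_put_sketch(struct key_sketch *k)
{
	if (k)
		k->refs--;
}

/* Clear the cache of the task being torn down, not of current_task. */
static void exit_creds_sketch(struct task_sketch *tsk)
{
	key_put_sketch(tsk->cached_key);
	tsk->cached_key = NULL;
}
```

    If the function consulted current_task instead of tsk, it would drop a
    reference belonging to an unrelated task and leave the dead task's key
    cached.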

    Fixes: 7743c48e54ee ("keys: Cache result of request_key*() temporarily in task_struct")
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Merge misc fixes from Andrew Morton:
    "11 mm fixes"

    * emailed patches from Andrew Morton:
    mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE
    mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid
    mm/page-writeback.c: improve arithmetic divisions
    mm/page-writeback.c: use div64_ul() for u64-by-unsigned-long divide
    mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio()
    mm, debug_pagealloc: don't rely on static keys too early
    mm: memcg/slab: fix percpu slab vmstats flushing
    mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment
    mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment
    mm/memory_hotplug: don't free usage map when removing a re-added early section
    mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations

    Linus Torvalds