15 Jan, 2020

40 commits

  • Trace layout errors for pNFS/flexfiles on read/write/commit operations.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • Clean up the generic file commit tracepoints to use a 64-bit value
    for the verifier, and to display the pNFS filehandle, if it exists.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
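
    As a rough userspace illustration of the 64-bit verifier idea: an NFS
    write verifier is 8 opaque bytes, so it fits in a single 64-bit value
    for display. The names verifier_to_u64() and format_verifier() are
    made up for this sketch and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define VERIFIER_SIZE 8	/* an NFS write verifier is 8 opaque bytes */

/* Hypothetical helper: view the opaque verifier as one 64-bit value. */
static uint64_t verifier_to_u64(const unsigned char *verf)
{
	uint64_t v;

	assert(verf);
	memcpy(&v, verf, VERIFIER_SIZE);	/* no unaligned or aliasing access */
	return v;
}

/* Hypothetical helper: format the verifier the way a trace line might. */
static int format_verifier(char *buf, size_t len, const unsigned char *verf)
{
	return snprintf(buf, len, "verifier=0x%016llx",
			(unsigned long long)verifier_to_u64(verf));
}
```

    The value is only ever compared for equality, so the host byte order
    used by memcpy() does not matter for this purpose.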
     
  • Clean up the generic writeback tracepoints so that they pass the
    full structures as arguments. Also ensure we report the number
    of bytes actually written.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • Clean up the generic file read tracepoints so that they pass the
    full structures as arguments. Also ensure we report the number
    of bytes actually read.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • If the attempt to do pNFS fails, then record what action we
    take to recover (resend, reset to pnfs or reset to mds).

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • Casting a negative value to an unsigned long is not the same as
    converting it to its absolute value.

    Fixes: 96650e2effa2 ("NFS: Fix show_nfs_errors macros again")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
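
    The difference between the two conversions can be seen in a small
    standalone C sketch. The helper names are invented for illustration
    and do not correspond to the macros touched by the patch.

```c
#include <assert.h>
#include <limits.h>

/* What a plain cast does: the value is converted modulo 2^N. */
static unsigned long err_as_cast(long err)
{
	return (unsigned long)err;
}

/* What an error-name lookup usually wants: the magnitude of the value. */
static unsigned long err_as_magnitude(long err)
{
	/* Negate in unsigned arithmetic so that LONG_MIN cannot overflow. */
	return err < 0 ? 0UL - (unsigned long)err : (unsigned long)err;
}

/* Example: -5 has magnitude 5, but becomes a huge number after a cast. */
static void cast_vs_magnitude_demo(void)
{
	assert(err_as_magnitude(-5) == 5UL);
	assert(err_as_cast(-5) == ULONG_MAX - 4UL);
}
```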
     
  • Ensure we always return the number of bytes read/written. Also display
    the pnfs filehandle if it is in use.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • Instead of making assumptions about the commit verifier contents, change
    the commit code to ensure we always check that the verifier was set
    by the XDR code.

    Fixes: f54bcf2ecee9 ("pnfs: Prepare for flexfiles by pulling out common code")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • Don't clear the NFS_CONTEXT_RESEND_WRITES flag until after calling
    nfs_commit_inode(). Otherwise, if nfs_commit_inode() returns an
    error, we end up with dirty pages in the page cache, but no tag
    to tell us that those pages need resending.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
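
    The ordering described above can be sketched in userspace C. This is
    an illustration only: struct open_ctx, CTX_RESEND_WRITES and the
    commit callbacks are hypothetical stand-ins, not the kernel's types.

```c
#include <assert.h>
#include <errno.h>

#define CTX_RESEND_WRITES 0x1UL	/* hypothetical stand-in for the flag */

struct open_ctx {
	unsigned long flags;
};

/* Hypothetical commit callback: returns 0 on success, -errno on failure. */
typedef int (*commit_fn)(void);

static int flush_ctx(struct open_ctx *ctx, commit_fn commit)
{
	int status;

	assert(ctx && commit);
	if (!(ctx->flags & CTX_RESEND_WRITES))
		return 0;

	status = commit();
	if (status < 0)
		return status;	/* flag stays set: pages still need resending */

	ctx->flags &= ~CTX_RESEND_WRITES;	/* clear only after success */
	return 0;
}

static int commit_ok(void)   { return 0; }
static int commit_fail(void) { return -EIO; }
```

    If the flag were cleared before calling the commit function, a failed
    commit would leave no record that a resend is still required.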
     
  • Remove gss_mech_list_pseudoflavors() and its callers. This is part of
    an unused API, and could leak an RCU reference if it were ever called.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • If a write or commit failed, and the mapping sees a fatal error, we
    need to revalidate the contents of that mapping.

    Fixes: 06c9fdf3b9f1 ("NFS: On fatal writeback errors, we need to call nfs_inode_remove_request()")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • If we suffer a fatal error upon writing a file, which causes us to
    need to revalidate the entire mapping, then we should also revalidate
    the file size.

    Fixes: d2ceb7e57086 ("NFS: Don't use page_file_mapping after removing the page")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • Clean up: This simplifies the logic in rpcrdma_post_recvs.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • To safely get rid of all rpcrdma_reps from a particular connection
    instance, xprtrdma has to wait until each of those reps is finished
    being used. A rep may be backing the rq_rcv_buf of an RPC that has
    just completed, for example.

    Since it is safe to invoke rpcrdma_rep_destroy() only in the Receive
    completion handler, simply mark reps remaining in the rb_all_reps
    list after the transport is drained. These will then be deleted as
    rpcrdma_post_recvs pulls them off the rep free list.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
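
    A minimal userspace sketch of the "mark now, destroy when pulled off
    the free list" pattern follows. The struct layout, the stale flag and
    the pool_* helpers are assumptions made for illustration; they are
    not the xprtrdma data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct rep {
	struct rep *next;
	bool stale;	/* hypothetical mark: belongs to a drained connection */
};

struct rep_pool {
	struct rep *free_list;
};

/* Return a rep to the free list once its user is finished with it. */
static void pool_put(struct rep_pool *pool, struct rep *rep)
{
	assert(pool && rep);
	rep->next = pool->free_list;
	pool->free_list = rep;
}

/* Pop the next usable rep, destroying marked ones as they surface. */
static struct rep *pool_get(struct rep_pool *pool)
{
	struct rep *rep;

	assert(pool);
	while ((rep = pool->free_list) != NULL) {
		pool->free_list = rep->next;
		if (!rep->stale)
			return rep;
		free(rep);	/* off the list, so no other user can hold it */
	}
	return NULL;
}
```

    Deferring the free to the single place that consumes the list avoids
    having to wait for every in-flight user at drain time.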
     
  • This reduces the hardware and memory footprint of an unconnected
    transport.

    At some point in the future, transport reconnect will allow
    resolving the destination IP address through a different device. The
    current change enables reps for the new connection to be allocated
    on whichever NUMA node the new device affines to after a reconnect.

    Note that this does not destroy _all_ the transport's reps... there
    will be a few that are still part of a running RPC completion.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Currently the underlying RDMA device is chosen at transport set-up
    time. But it will soon be at connect time instead.

    The maximum size of a transport header is based on device
    capabilities. Thus transport header buffers have to be allocated
    _after_ the underlying device has been chosen (via address and route
    resolution); i.e., in the connect worker.

    Thus, move the allocation of transport header buffers to the connect
    worker, after the point at which the underlying RDMA device has been
    chosen.

    This also means the RDMA device is available to do a DMA mapping of
    these buffers at connect time, instead of in the hot I/O path. Make
    that optimization as well.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Refactor: Perform the "is supported" check in rpcrdma_ep_create()
    instead of in rpcrdma_ia_open(). frwr_open() is where most of the
    logic to query device attributes is already located.

    The current code displays a redundant error message when the device
    does not support FRWR. As an additional clean-up, this patch removes
    the extra message.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • To support device hotplug and migrating a connection between devices
    of different capabilities, we have to guarantee that all in-kernel
    devices can support the same max NFS payload size (1 megabyte).

    This means that possibly one or two in-tree devices are no longer
    supported for NFS/RDMA because they cannot support 1MB rsize/wsize.
    The only one I confirmed was cxgb3, but it has already been removed
    from the kernel.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: there is no need to keep two copies of the same value.
    Also, in subsequent patches, rpcrdma_ep_create() will be called in
    the connect worker rather than at set-up time.

    Minor fix: Initialize the transport's sendctx to the value based on
    the capabilities of the underlying device, not the maximum setting.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • The size of the sendctx queue depends on the value stored in
    ia->ri_max_send_sges. This value is determined by querying the
    underlying device.

    Eventually, rpcrdma_ia_open() and rpcrdma_ep_create() will be called
    in the connect worker rather than at transport set-up time. The
    underlying device will not have been chosen at transport set-up time.

    The sendctx queue will thus have to be created after the underlying
    device has been chosen via address and route resolution; in other
    words, in the connect worker.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean-up. The max_send_sge value also happens to be stored in
    ep->rep_attr. Let's keep just a single copy.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Currently the allocation of buf is not being null checked and
    a null pointer dereference can occur when the memory allocation fails.
    Fix this by adding a check and returning -ENOMEM.

    Addresses-Coverity: ("Dereference null return")
    Fixes: 6d972518b821 ("NFS: Add fs_context support.")
    Signed-off-by: Colin Ian King
    Signed-off-by: Anna Schumaker

    Colin Ian King
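
    The shape of the fix, sketched in userspace C with malloc() standing
    in for the kernel allocator. dup_option() is a hypothetical helper
    and is not the function changed by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate an option string into *out. */
static int dup_option(const char *src, char **out)
{
	size_t len;
	char *buf;

	assert(src && out);
	len = strlen(src) + 1;
	buf = malloc(len);
	if (!buf)
		return -ENOMEM;	/* report failure instead of dereferencing NULL */

	memcpy(buf, src, len);
	*out = buf;
	return 0;
}
```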
     
  • If CONFIG_SWAP=n, it does not make much sense to offer the user the
    option to enable support for swapping over NFS, as that will still fail
    at run time:

    # swapon /swap
    swapon: /swap: swapon failed: Function not implemented

    Fix this by adding a dependency on CONFIG_SWAP.

    Fixes: a564b8f0398636ba ("nfs: enable swap on NFS")
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Anna Schumaker

    Geert Uytterhoeven
     
  • The empty_iov structure is only copied into another structure,
    so make it const.

    The opportunity for this change was found using Coccinelle.

    Signed-off-by: Julia Lawall
    Signed-off-by: Anna Schumaker

    Julia Lawall
     
  • swapon over NFS does not go through the generic_swapfile_activate
    code path when setting up extents. This makes holes in NFS
    swapfiles possible, which swapon does not expect.

    Signed-off-by: Murphy Zhou
    Signed-off-by: Anna Schumaker

    Murphy Zhou
     
  • The xprtrdma connect logic can return -EPROTO if the underlying
    device or network path does not support RDMA. This can happen
    after a device removal/insertion.

    - When SOFTCONN is set, EPROTO is a permanent error.

    - When SOFTCONN is not set, EPROTO is treated as a temporary error.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
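
    The two cases listed above amount to a small decision function. The
    sketch below is a userspace illustration; the enum and the function
    name are hypothetical and do not mirror the RPC client code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum connect_action {
	CONNECT_FAIL,	/* give up and report the error to the caller */
	CONNECT_RETRY,	/* wait, then try to connect again */
};

/* Hypothetical helper: decide what to do with a failed connect attempt. */
static enum connect_action on_connect_error(int status, bool softconn)
{
	assert(status < 0);
	if (status == -EPROTO)
		return softconn ? CONNECT_FAIL : CONNECT_RETRY;
	return CONNECT_FAIL;	/* other errors are out of scope for this sketch */
}
```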
     
  • This seems to be a somewhat common issue with Kerberos NFSv4.0
    set-ups.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Try to capture the reason for the writeback path tagging an error on
    a page.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • In nfs3_proc_lookup, if nfs_alloc_fattr fails, we only print
    "NFS call lookup". This may be confusing, so move the dprintk after
    nfs_alloc_fattr.

    Reported-by: Hulk Robot
    Signed-off-by: zhengbin
    Signed-off-by: Anna Schumaker

    zhengbin
     
  • Fix the following coccicheck warnings:

    fs/nfs/nfs4state.c:1138:2-3: Unneeded semicolon
    fs/nfs/nfs4proc.c:6862:2-3: Unneeded semicolon
    fs/nfs/nfs4proc.c:8629:2-3: Unneeded semicolon

    Reported-by: Hulk Robot
    Signed-off-by: zhengbin
    Signed-off-by: Anna Schumaker

    zhengbin
     
  • On 32-bit architectures, xdr_encode_nfstime4() needlessly
    truncates timestamps to a 32-bit value in the range between
    year 1902 and 2038.

    Change it to use 'struct timespec64' to allow the entire range
    of values supported by the server.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Anna Schumaker

    Arnd Bergmann
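
    For illustration, here is a userspace sketch of encoding a 64-bit
    seconds value in the nfstime4 wire layout (64-bit signed seconds
    followed by 32-bit nanoseconds, both big-endian). The function names
    are invented here and the byte-buffer interface is an assumption; the
    kernel's XDR helpers work differently.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value in big-endian (XDR) byte order. */
static uint8_t *put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
	return p + 4;
}

/* Encode seconds as a full 64-bit quantity, then the nanoseconds. */
static uint8_t *encode_time(uint8_t *p, int64_t sec, uint32_t nsec)
{
	uint64_t s = (uint64_t)sec;

	assert(p && nsec < 1000000000u);
	p = put_be32(p, (uint32_t)(s >> 32));
	p = put_be32(p, (uint32_t)s);
	return put_be32(p, nsec);
}

/* Read the seconds field back (non-negative values only in this sketch). */
static int64_t decode_sec(const uint8_t *p)
{
	uint64_t s = 0;
	int i;

	for (i = 0; i < 8; i++)
		s = (s << 8) | p[i];
	return (int64_t)s;
}
```

    A timestamp such as 4102444800 (1 January 2100 UTC) survives the
    round trip, whereas it would not fit in a signed 32-bit intermediate.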
     
  • For NFSv2 and NFSv3, timestamps are stored using 32-bit entities
    and overflow in y2038. For historic reasons we truncate the
    64-bit timestamps by converting from a timespec64 to a timespec
    first.

    Remove this unnecessary conversion step and do the truncation
    in the final functions that take a timestamp.

    This is transparent to users, but avoids one of the last uses
    of 'timespec' and lets us remove it later.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Anna Schumaker

    Arnd Bergmann
     
  • nfs currently behaves differently on 32-bit and 64-bit kernels regarding
    the on-disk format of nfs_fscache_inode_auxdata.

    That format should really be the same on any kernel, and we should avoid
    the 'timespec' type in order to remove that from the kernel later on.

    Using plain 'timespec64' would not be good here, since that includes
    implied padding and would possibly leak kernel stack data to the on-disk
    format on 32-bit architectures.

    struct __kernel_timespec would work as a replacement, but open-coding
    the two struct members in nfs_fscache_inode_auxdata makes it more
    obvious what's going on here, and keeps the current format for 64-bit
    architectures.

    Cc: David Howells
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Anna Schumaker

    Arnd Bergmann
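
    A sketch of what open-coding the members buys: with only fixed-width
    8-byte fields, the record has the same size on 32-bit and 64-bit
    builds and contains no implied padding. The struct and field names
    below are assumptions for illustration, not a copy of
    nfs_fscache_inode_auxdata.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of an on-disk record with open-coded, fixed-width time fields. */
struct auxdata_sketch {
	int64_t  mtime_sec;
	int64_t  mtime_nsec;
	int64_t  ctime_sec;
	int64_t  ctime_nsec;
	uint64_t change_attr;
};

/* Five 8-byte members, so there is no room for implied padding. */
_Static_assert(sizeof(struct auxdata_sketch) == 40, "unexpected padding");

/* Hypothetical helper: populate the record from separate time values. */
static void auxdata_fill(struct auxdata_sketch *aux,
			 int64_t msec, long mnsec,
			 int64_t csec, long cnsec, uint64_t change)
{
	assert(aux);
	memset(aux, 0, sizeof(*aux));	/* never write stale stack bytes */
	aux->mtime_sec = msec;
	aux->mtime_nsec = mnsec;
	aux->ctime_sec = csec;
	aux->ctime_nsec = cnsec;
	aux->change_attr = change;
}
```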
     
  • Push down the use of timespec64 into NFS nfs_fattr, to avoid needless
    conversions, and get closer to having 64-bit time_t support on 32-bit
    NFSv4 and removing some old interfaces from the kernel.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Anna Schumaker

    Arnd Bergmann
     
  • Using signed 32-bit types for UTC time leads to the y2038 overflow,
    which is what happens in the sunrpc code at the moment.

    This changes the sunrpc code over to use time64_t where possible.
    The one exception is the gss_import_v{1,2}_context() function for
    kerberos5, which uses 32-bit timestamps in the protocol. Here,
    we can at least treat the numbers as 'unsigned', which extends the
    range from 2038 to 2106.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Anna Schumaker

    Arnd Bergmann
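
    The effect of reading a 32-bit wire timestamp as unsigned can be
    shown with two small conversion helpers. These are illustrative
    userspace functions with made-up names, not the sunrpc code.

```c
#include <assert.h>
#include <stdint.h>

/* Signed interpretation: values with the top bit set land before 1970. */
static int64_t wire_to_time_signed(uint32_t wire)
{
	return (wire & 0x80000000u) ? (int64_t)wire - 4294967296LL
				    : (int64_t)wire;
}

/* Unsigned interpretation: the same bits continue past 2038 up to 2106. */
static int64_t wire_to_time_unsigned(uint32_t wire)
{
	return (int64_t)wire;
}

/* 0x80000000 is one second past the signed 32-bit limit. */
static void y2038_demo(void)
{
	assert(wire_to_time_signed(0x80000000u) < 0);
	assert(wire_to_time_unsigned(0x80000000u) == 2147483648LL);
}
```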
     
  • Split out from commit "NFS: Add fs_context support."

    Add wrappers nfs_errorf(), nfs_invalf(), and nfs_warnf() which log error
    information to the fs_context. Convert some printk's to use these new
    wrappers instead.

    Signed-off-by: Scott Mayhew
    Signed-off-by: Anna Schumaker

    Scott Mayhew
     
  • Split out from commit "NFS: Add fs_context support."

    This patch adds additional refactoring for the conversion of NFS to use
    fs_context, namely:

    (*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context.
    nfs_clone_mount has had several fields removed, and nfs_mount_info
    has been removed altogether.
    (*) Various functions now take an fs_context as an argument instead
    of nfs_mount_info, nfs_fs_context, etc.

    Signed-off-by: Scott Mayhew
    Signed-off-by: Anna Schumaker

    Scott Mayhew
     
  • Add filesystem context support to NFS, parsing the options in advance and
    attaching the information to struct nfs_fs_context. The highlights are:

    (*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context. This
    structure represents NFS's superblock config.

    (*) Make use of the VFS's parsing support to split comma-separated lists.

    (*) Pin the NFS protocol module in the nfs_fs_context.

    (*) Attach supplementary error information to fs_context. This has the
    downside that these strings must be static and can't be formatted.

    (*) Remove the auxiliary file_system_type structs since the information
    necessary can be conveyed in the nfs_fs_context struct instead.

    (*) Root mounts are made by duplicating the config for the requested mount
    so as to have the same parameters. Submounts pick up their parameters
    from the parent superblock.

    [AV -- retrans is u32, not string]
    [SM -- Renamed cfg to ctx in a few functions in an earlier patch]
    [SM -- Moved fs_context mount option parsing to an earlier patch]
    [SM -- Moved fs_context error logging to a later patch]
    [SM -- Fixed printks in nfs4_try_get_tree() and nfs4_get_referral_tree()]
    [SM -- Added is_remount_fc() helper]
    [SM -- Deferred some refactoring to a later patch]
    [SM -- Fixed referral mounts, which were broken in the original patch]
    [SM -- Fixed leak of nfs_fattr when fs_context is freed]

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Scott Mayhew
    Signed-off-by: Anna Schumaker

    David Howells
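
    To make the "split comma-separated lists" step concrete, here is a
    userspace analogue. It is only a sketch: next_option() is a
    hypothetical helper and says nothing about how the VFS implements
    the splitting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the next option from a mutable,
 * comma-separated string and advance *cursor past it. Returns NULL
 * once the string is exhausted.
 */
static char *next_option(char **cursor)
{
	char *start, *comma;

	assert(cursor);
	start = *cursor;
	if (!start)
		return NULL;

	comma = strchr(start, ',');
	if (comma) {
		*comma = '\0';		/* terminate this option in place */
		*cursor = comma + 1;
	} else {
		*cursor = NULL;		/* this was the last option */
	}
	return start;
}
```

    Each returned string, such as "vers=4.2", can then be handed to a
    per-option parser.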
     
  • Split out from commit "NFS: Add fs_context support."

    Convert existing mount option definitions to fs_parameter_enum's and
    fs_parameter_spec's. Parse mount options using fs_parse() and
    lookup_constant().

    Notes:

    1) Fixed a typo in the udp6 definition in nfs_xprt_protocol_tokens
    from the original commit.

    2) fs_parse() expects an fs_context as the first arg so that any
    errors can be logged to the fs_context. We're passing NULL for the
    fs_context (this will change in commit "NFS: Add fs_context support.")
    which is okay as it will cause logfc() to do a printk() instead.

    3) fs_parse() expects an fs_parameter as the third arg. We're
    building an fs_parameter manually in nfs_fs_context_parse_option(),
    which will go away in commit "NFS: Add fs_context support.".

    Signed-off-by: Scott Mayhew
    Signed-off-by: Anna Schumaker

    Scott Mayhew
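
    The idea of mapping an option's string value to a constant through a
    table can be sketched in userspace as follows. This is an analogue
    only: struct name_value, lookup_name() and the XPRT_* values are
    invented for the example and are not the kernel's lookup_constant()
    interface or the NFS token tables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct name_value {
	const char *name;
	int value;
};

/* Hypothetical transport constants for the example table. */
enum { XPRT_UNKNOWN = -1, XPRT_UDP, XPRT_TCP, XPRT_RDMA };

static const struct name_value xprt_table[] = {
	{ "udp",  XPRT_UDP },
	{ "tcp",  XPRT_TCP },
	{ "rdma", XPRT_RDMA },
};

/* Return the value paired with name, or not_found if it is absent. */
static int lookup_name(const struct name_value *tbl, size_t n,
		       const char *name, int not_found)
{
	size_t i;

	assert(tbl && name);
	for (i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return tbl[i].value;
	return not_found;
}
```

    Keeping the names in a table means adding a new value touches only
    the table, not the parsing logic.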