05 Apr, 2020

1 commit


29 Mar, 2020

1 commit


27 Mar, 2020

12 commits

  • Change the rpcrdma_xprt_disconnect() function so that it no longer
    waits for the DISCONNECTED event. This prevents blocking if the
    remote is unresponsive.

    In rpcrdma_xprt_disconnect(), the transport's rpcrdma_ep is
    detached. Upon return from rpcrdma_xprt_disconnect(), the transport
    (r_xprt) is ready immediately for a new connection.

    The RDMA_CM_DEVICE_REMOVAL and RDMA_CM_DISCONNECTED events are now
    handled almost identically.

    However, because the lifetimes of rpcrdma_xprt structures and
    rpcrdma_ep structures are now independent, creating an rpcrdma_ep
    needs to take a module ref count. The ep now owns most of the
    hardware resources for a transport.

    Also, a kref is needed to ensure that rpcrdma_ep sticks around
    long enough for the cm_event_handler to finish.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • rpcrdma_cm_event_handler() is always passed an @id pointer that is
    valid. However, in a subsequent patch, we won't be able to extract
    an r_xprt in every case. So instead of using the r_xprt's
    presentation address strings, extract them from struct rdma_cm_id.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • I eventually want to allocate rpcrdma_ep separately from struct
    rpcrdma_xprt so that on occasion there can be more than one ep per
    xprt.

    The new struct rpcrdma_ep will contain all the fields currently in
    rpcrdma_ia and in rpcrdma_ep. This is all the device and CM settings
    for the connection, in addition to per-connection settings
    negotiated with the remote.

    Take this opportunity to rename the existing ep fields from rep_* to
    re_* to disambiguate these from struct rpcrdma_rep.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Completion errors after a disconnect often occur much sooner than a
    CM_DISCONNECT event. Use this to try to detect connection loss more
    quickly.

    Note that other kernel ULPs do take care to disconnect explicitly
    when a WR is flushed.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up:
    The upper layer serializes calls to xprt_rdma_close, so there is no
    need for an atomic bit operation, saving 8 bytes in rpcrdma_ia.

    This enables merging rpcrdma_ia_remove directly into the disconnect
    logic.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Move rdma_cm_id creation into rpcrdma_ep_create() so that it is now
    responsible for allocating all per-connection hardware resources.

    With this clean-up, all three arms of the switch statement in
    rpcrdma_ep_connect are exactly the same now, thus the switch can be
    removed.

    Because device removal behaves a little differently than
    disconnection, there is a little more work to be done before
    rpcrdma_ep_destroy() can release the connection's rdma_cm_id. So
    it is not quite symmetrical with rpcrdma_ep_create() yet.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Make a Protection Domain (PD) a per-connection resource rather than
    a per-transport resource. In other words, when the connection
    terminates, the PD is destroyed.

    Thus there is one less HW resource that remains allocated to a
    transport after a connection is closed.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: Simplify the synopses of functions in the connect and
    disconnect paths in preparation for combining the rpcrdma_ia and
    rpcrdma_ep structures.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: Simplify the synopses of functions in the post_send path
    by combining the struct rpcrdma_ia and struct rpcrdma_ep arguments.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: prepare for combining the rpcrdma_ia and rpcrdma_ep
    structures. Take the opportunity to rename the function to be
    consistent with the "subsystem _ object _ verb" naming scheme.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Refactor rpcrdma_ep_create(), rpcrdma_ep_disconnect(), and
    rpcrdma_ep_destroy().

    rpcrdma_ep_create will be invoked at connect time instead of at
    transport set-up time. It will be responsible for allocating per-
    connection resources. In this patch it allocates the CQs and
    creates a QP. More to come.

    rpcrdma_ep_destroy() is the inverse functionality that is
    invoked at disconnect time. It will be responsible for releasing
    the CQs and QP.

    These changes should be safe to do because both connect and
    disconnect are guaranteed to be serialized by the transport send
    lock.

    This takes us another step closer to resolving the address and route
    only at connect time so that connection failover to another device
    will work correctly.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Two changes:
    - Show the number of SG entries that were mapped. This helps debug
      DMA-related problems.
    - Record the MR's resource ID instead of its memory address. This
      groups each MR with its associated rdma-tool output, and reduces
      needless exposure of memory addresses.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
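
The lifetime rule described in the first commit of this group (the ep must stay around until a running cm_event_handler finishes, even after the transport has detached it) can be illustrated with a small userspace sketch. This is not the kernel code: struct ep, struct xprt, ep_create(), ep_get(), ep_put() and xprt_disconnect() are hypothetical names, a plain integer stands in for the kernel's kref, and no locking or atomics are modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for rpcrdma_ep; a plain counter replaces kref. */
struct ep {
	int refcount;
	int *destroyed;		/* set to 1 when the ep is finally freed */
};

/* Hypothetical stand-in for rpcrdma_xprt: it only points at its ep. */
struct xprt {
	struct ep *ep;
};

static struct ep *ep_create(int *destroyed)
{
	struct ep *ep = malloc(sizeof(*ep));

	if (!ep)
		return NULL;
	ep->refcount = 1;	/* reference owned by the transport */
	ep->destroyed = destroyed;
	*destroyed = 0;
	return ep;
}

/* Take an extra reference, e.g. on behalf of an event handler. */
static void ep_get(struct ep *ep)
{
	assert(ep->refcount > 0);
	ep->refcount++;
}

/* Drop one reference; free the ep when the last one goes away. */
static void ep_put(struct ep *ep)
{
	assert(ep->refcount > 0);
	if (--ep->refcount == 0) {
		*ep->destroyed = 1;
		free(ep);
	}
}

/* Detach the ep without waiting; the transport is reusable at once. */
static void xprt_disconnect(struct xprt *xprt)
{
	struct ep *ep = xprt->ep;

	xprt->ep = NULL;
	if (ep)
		ep_put(ep);
}
```

In this sketch, a handler that called ep_get() before xprt_disconnect() ran can keep using the ep safely; the memory is released only when the handler calls ep_put().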
     

26 Mar, 2020

1 commit

  • Commit 2c94b8eca1a2 ("SUNRPC: Use au_rslack when computing reply
    buffer size") changed how "req->rq_rcvsize" is calculated. It used
    to use the au_cslack value, which was nice and large; that commit
    switched it to the au_rslack value, which turns out to be too small.

    Since 5.1, a v3 mount with sec=krb5p fails against an Ontap server
    because the client's receive buffer is too small.

    For gss krb5p, we need to account for the mic token in the verifier,
    and the wrap token in the wrap token.

    RFC 4121 defines:
    mic token
    Octet no   Name        Description
    --------------------------------------------------------------
    0..1       TOK_ID      Identification field. Tokens emitted by
                           GSS_GetMIC() contain the hex value 04 04
                           expressed in big-endian order in this
                           field.
    2          Flags       Attributes field, as described in section
                           4.2.2.
    3..7       Filler      Contains five octets of hex value FF.
    8..15      SND_SEQ     Sequence number field in clear text,
                           expressed in big-endian order.
    16..last   SGN_CKSUM   Checksum of the "to-be-signed" data and
                           octet 0..15, as described in section 4.2.4.

    That's 16 bytes (GSS_KRB5_TOK_HDR_LEN) + checksum.

    wrap token
    Octet no   Name        Description
    --------------------------------------------------------------
    0..1       TOK_ID      Identification field. Tokens emitted by
                           GSS_Wrap() contain the hex value 05 04
                           expressed in big-endian order in this
                           field.
    2          Flags       Attributes field, as described in section
                           4.2.2.
    3          Filler      Contains the hex value FF.
    4..5       EC          Contains the "extra count" field, in big-
                           endian order as described in section 4.2.3.
    6..7       RRC         Contains the "right rotation count" in big-
                           endian order, as described in section
                           4.2.5.
    8..15      SND_SEQ     Sequence number field in clear text,
                           expressed in big-endian order.
    16..last   Data        Encrypted data for Wrap tokens with
                           confidentiality, or plaintext data followed
                           by the checksum for Wrap tokens without
                           confidentiality, as described in section
                           4.2.4.

    That's also 16 bytes of header (GSS_KRB5_TOK_HDR_LEN), plus the
    encrypted data and the checksum (and other things like padding).

    RFC 3961 defines known cksum sizes:
    Checksum type          sumtype   checksum   section or
                           value     size       reference
    ---------------------------------------------------------------------
    CRC32                  1         4          6.1.3
    rsa-md4                2         16         6.1.2
    rsa-md4-des            3         24         6.2.5
    des-mac                4         16         6.2.7
    des-mac-k              5         8          6.2.8
    rsa-md4-des-k          6         16         6.2.6
    rsa-md5                7         16         6.1.1
    rsa-md5-des            8         24         6.2.4
    rsa-md5-des3           9         24         ??
    sha1 (unkeyed)         10        20         ??
    hmac-sha1-des3-kd      12        20         6.3
    hmac-sha1-des3         13        20         ??
    sha1 (unkeyed)         14        20         ??
    hmac-sha1-96-aes128    15        20         [KRB5-AES]
    hmac-sha1-96-aes256    16        20         [KRB5-AES]
    [reserved]             0x8003    ?          [GSS-KRB5]

    The Linux kernel now mainly supports types 15 and 16, so the maximum
    checksum size is 20 bytes (GSS_KRB5_MAX_CKSUM_LEN).

    Reuse the existing GSS_KRB5_MAX_SLACK_NEEDED define, which is already
    used for encoding the gss_wrap tokens (the same tokens are used in
    the reply).

    Fixes: 2c94b8eca1a2 ("SUNRPC: Use au_rslack when computing reply buffer size")
    Signed-off-by: Olga Kornievskaia
    Reviewed-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Olga Kornievskaia
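
The size arithmetic in the commit above can be worked through in a few lines of C. This is only a sketch: the two constants mirror the values quoted in the commit message (16-byte token header, 20-byte maximum checksum), while mic_token_max_len() and reply_fits() are hypothetical helpers, not kernel functions. The real GSS_KRB5_MAX_SLACK_NEEDED define covers more than this sum, and its value is not reproduced here.

```c
#include <stddef.h>

/* Values quoted in the commit message above. */
#define TOK_HDR_LEN	16	/* GSS_KRB5_TOK_HDR_LEN: octets 0..15 */
#define MAX_CKSUM_LEN	20	/* GSS_KRB5_MAX_CKSUM_LEN: types 15 and 16 */

/* Largest MIC token: the fixed header followed by SGN_CKSUM. */
static size_t mic_token_max_len(void)
{
	return TOK_HDR_LEN + MAX_CKSUM_LEN;
}

/*
 * Hypothetical check: a reply fits only if the receive buffer has
 * room for the payload plus the slack reserved for GSS tokens.
 */
static int reply_fits(size_t rcvsize, size_t payload_len, size_t slack)
{
	return payload_len + slack <= rcvsize;
}
```

With a 64-byte payload, a receive buffer sized without the 36 bytes of MIC slack is too small by exactly that amount, which is the kind of shortfall the commit describes.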
     

16 Mar, 2020

6 commits

  • By preventing compiler inlining of the integrity and privacy
    helpers, stack utilization for the common case (authentication only)
    goes way down.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean up: this function is no longer used.

    Signed-off-by: Chuck Lever
    Reviewed-by: Benjamin Coddington
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • xdr_buf_read_mic() tries to find unused contiguous space in a
    received xdr_buf in order to linearize the checksum for the call
    to gss_verify_mic. However, the corner cases in this code are
    numerous and we seem to keep missing them. I've just hit yet
    another buffer overrun related to it.

    This overrun is at the end of xdr_buf_read_mic():

        if (buf->tail[0].iov_len != 0)
            mic->data = buf->tail[0].iov_base + buf->tail[0].iov_len;
        else
            mic->data = buf->head[0].iov_base + buf->head[0].iov_len;
        __read_bytes_from_xdr_buf(&subbuf, mic->data, mic->len);
        return 0;

    This logic assumes the transport has set the length of the tail
    based on the size of the received message. base + len is then
    supposed to be off the end of the message but still within the
    actual buffer.

    In fact, the length of the tail is set by the upper layer when the
    Call is encoded so that the end of the tail is actually the end of
    the allocated buffer itself. This causes the logic above to set
    mic->data to point past the end of the receive buffer.

    The "mic->data = head" arm of this if statement is no less fragile.

    As near as I can tell, this has been a problem forever. I'm not sure
    that minimizing au_rslack recently changed this pathology much.

    So instead, let's use a more straightforward approach: kmalloc a
    separate buffer to linearize the checksum. This is similar to
    how gss_validate() currently works.

    Coming back to this code, I had some trouble understanding what
    was going on. So I've cleaned up the variable naming and added
    a few comments that point back to the XDR definition in RFC 2203
    to help guide future spelunkers, including myself.

    As an added clean up, the functionality that was in
    xdr_buf_read_mic() is folded directly into gss_unwrap_resp_integ(),
    as that is its only caller.

    Signed-off-by: Chuck Lever
    Reviewed-by: Benjamin Coddington
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • The variable status is being initialized with a value that is never
    read and it is being updated later with a new value. The initialization
    is redundant and can be removed.

    Addresses-Coverity: ("Unused value")
    Signed-off-by: Colin Ian King
    Signed-off-by: Trond Myklebust

    Colin Ian King
     
  • If the RPC call is synchronous, assume the cred is already pinned
    by the caller.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Add a flag to signal to the RPC layer that the credential is already
    pinned for the duration of the RPC call.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
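
The xdr_buf_read_mic() commit in this group replaces "find spare room inside the receive buffer" with "allocate a separate buffer and copy the checksum into it". A minimal userspace sketch of that idea follows. It assumes a simplified three-segment buffer; struct seg, struct rbuf and linearize() are hypothetical names, and malloc() stands in for kmalloc().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a multi-segment receive buffer. */
struct seg {
	const unsigned char *base;
	size_t len;
};

struct rbuf {
	struct seg segs[3];	/* head, page data, tail */
};

/*
 * Copy @len bytes starting at @offset into a freshly allocated linear
 * buffer. Returns NULL if the range runs past the received data or if
 * the allocation fails. The caller frees the result.
 */
static unsigned char *linearize(const struct rbuf *buf, size_t offset,
				size_t len)
{
	unsigned char *out, *p;
	size_t remaining = len;
	size_t total = 0;
	int i;

	for (i = 0; i < 3; i++)
		total += buf->segs[i].len;
	if (offset > total || len > total - offset)
		return NULL;	/* never read beyond what was received */

	out = malloc(len ? len : 1);
	if (!out)
		return NULL;

	p = out;
	for (i = 0; i < 3 && remaining; i++) {
		const struct seg *s = &buf->segs[i];
		size_t n;

		if (offset >= s->len) {
			offset -= s->len;	/* range starts in a later segment */
			continue;
		}
		n = s->len - offset;
		if (n > remaining)
			n = remaining;
		memcpy(p, s->base + offset, n);
		p += n;
		remaining -= n;
		offset = 0;
	}
	return out;
}
```

Because the destination is a separate allocation of exactly the requested length, the copy cannot land past the end of the receive buffer, whatever lengths the head and tail happen to carry.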
     

14 Feb, 2020

1 commit

  • The @nents value that was passed to ib_dma_map_sg() has to be passed
    to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() chooses to
    concatenate sg entries, it will return a different nents value than
    it was passed.

    The bug was exposed by recent changes to the AMD IOMMU driver, which
    enabled sg entry concatenation.

    Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to
    new memory registration API") and reviewing other kernel ULPs, it's
    not clear that the frwr_map() logic was ever correct for this case.

    Reported-by: Andre Tomt
    Suggested-by: Robin Murphy
    Signed-off-by: Chuck Lever
    Cc: stable@vger.kernel.org
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Anna Schumaker

    Chuck Lever
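
The nents mismatch described above can be reproduced with a toy model. The sketch below does not call the real DMA API: toy_map_sg(), toy_unmap_sg() and toy_count_mapped() are hypothetical stand-ins that only track which entries are mapped, and the concatenation rule (merge entries whose addresses are adjacent) is an assumption made for illustration.

```c
/* Hypothetical scatterlist entry; addresses are plain integers here. */
struct sg_entry {
	unsigned long addr;
	unsigned long len;
	int mapped;
};

/*
 * Toy stand-in for ib_dma_map_sg(): marks all @nents entries as mapped
 * and returns how many DMA segments result once adjacent entries are
 * concatenated. The return value can be smaller than @nents.
 */
static int toy_map_sg(struct sg_entry *sg, int nents)
{
	int i, segs = 0;

	for (i = 0; i < nents; i++) {
		sg[i].mapped = 1;
		if (i == 0 || sg[i - 1].addr + sg[i - 1].len != sg[i].addr)
			segs++;
	}
	return segs;
}

/* Toy stand-in for ib_dma_unmap_sg(): releases exactly @nents entries. */
static void toy_unmap_sg(struct sg_entry *sg, int nents)
{
	int i;

	for (i = 0; i < nents; i++)
		sg[i].mapped = 0;
}

/* Count entries still mapped, i.e. left behind by a short unmap. */
static int toy_count_mapped(const struct sg_entry *sg, int nents)
{
	int i, n = 0;

	for (i = 0; i < nents; i++)
		n += sg[i].mapped;
	return n;
}
```

In this model, unmapping with the value returned by the map call leaves one of three entries mapped when two of them were concatenated; unmapping with the original count releases all of them.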
     

08 Feb, 2020

3 commits

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Server-to-server copy code from Olga.

    To use it, client and both servers must have support, the target
    server must be able to access the source server over NFSv4.2, and
    the target server must have the inter_copy_offload_enable module
    parameter set.

    - Improvements and bugfixes for the new filehandle cache, especially
    in the container case, from Trond

    - Also from Trond, better reporting of write errors.

    - Y2038 work from Arnd"

    * tag 'nfsd-5.6' of git://linux-nfs.org/~bfields/linux: (55 commits)
    sunrpc: expiry_time should be seconds not timeval
    nfsd: make nfsd_filecache_wq variable static
    nfsd4: fix double free in nfsd4_do_async_copy()
    nfsd: convert file cache to use over/underflow safe refcount
    nfsd: Define the file access mode enum for tracing
    nfsd: Fix a perf warning
    nfsd: Ensure sampling of the write verifier is atomic with the write
    nfsd: Ensure sampling of the commit verifier is atomic with the commit
    sunrpc: clean up cache entry add/remove from hashtable
    sunrpc: Fix potential leaks in sunrpc_cache_unhash()
    nfsd: Ensure exclusion between CLONE and WRITE errors
    nfsd: Pass the nfsd_file as arguments to nfsd4_clone_file_range()
    nfsd: Update the boot verifier on stable writes too.
    nfsd: Fix stable writes
    nfsd: Allow nfsd_vfs_write() to take the nfsd_file as an argument
    nfsd: Fix a soft lockup race in nfsd_file_mark_find_or_create()
    nfsd: Reduce the number of calls to nfsd_file_gc()
    nfsd: Schedule the laundrette regularly irrespective of file errors
    nfsd: Remove unused constant NFSD_FILE_LRU_RESCAN
    nfsd: Containerise filecache laundrette
    ...

    Linus Torvalds
     
  • Pull NFS client updates from Anna Schumaker:
    "Stable bugfixes:
    - Fix memory leaks and corruption in readdir # v2.6.37+
    - Directory page cache needs to be locked when read # v2.6.37+

    New features:
    - Convert NFS to use the new mount API
    - Add "softreval" mount option to let clients use cache if server goes down
    - Add a config option to compile without UDP support
    - Limit the number of inactive delegations the client can cache at once
    - Improved readdir concurrency using iterate_shared()

    Other bugfixes and cleanups:
    - More 64-bit time conversions
    - Add additional diagnostic tracepoints
    - Check for holes in swapfiles, and add dependency on CONFIG_SWAP
    - Various xprtrdma cleanups to prepare for 5.7's changes
    - Several fixes for NFS writeback and commit handling
    - Fix acls over krb5i/krb5p mounts
    - Recover from premature loss of openstateids
    - Fix NFS v3 chacl and chmod bug
    - Compare creds using cred_fscmp()
    - Use kmemdup_nul() in more places
    - Optimize readdir cache page invalidation
    - Lease renewal and recovery fixes"

    * tag 'nfs-for-5.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (93 commits)
    NFSv4.0: nfs4_do_fsinfo() should not do implicit lease renewals
    NFSv4: try lease recovery on NFS4ERR_EXPIRED
    NFS: Fix memory leaks
    nfs: optimise readdir cache page invalidation
    NFS: Switch readdir to using iterate_shared()
    NFS: Use kmemdup_nul() in nfs_readdir_make_qstr()
    NFS: Directory page cache pages need to be locked when read
    NFS: Fix memory leaks and corruption in readdir
    SUNRPC: Use kmemdup_nul() in rpc_parse_scope_id()
    NFS: Replace various occurrences of kstrndup() with kmemdup_nul()
    NFSv4: Limit the total number of cached delegations
    NFSv4: Add accounting for the number of active delegations held
    NFSv4: Try to return the delegation immediately when marked for return on close
    NFS: Clear NFS_DELEGATION_RETURN_IF_CLOSED when the delegation is returned
    NFSv4: nfs_inode_evict_delegation() should set NFS_DELEGATION_RETURNING
    NFS: nfs_find_open_context() should use cred_fscmp()
    NFS: nfs_access_get_cached_rcu() should use cred_fscmp()
    NFSv4: pnfs_roc() must use cred_fscmp() to compare creds
    NFS: remove unused macros
    nfs: Return EINVAL rather than ERANGE for mount parse errors
    ...

    Linus Torvalds
     
  • When upcalling gssproxy, cache_head.expiry_time is set as a
    timeval, not seconds since boot. As such, RPC cache expiry
    logic will not clean expired objects created under
    auth.rpcsec.context cache.

    This has been shown to cause kernel memory leaks in the field. Use
    the 64-bit variants of getboottime/timespec.

    Expiration times have worked this way since 2010's c5b29f885afe "sunrpc:
    use seconds since boot in expiry cache". The gssproxy code introduced
    in 2012 added gss_proxy_save_rsc and introduced the bug. That's a while
    for this to lurk, but it required a bit of an extreme case to make it
    obvious.

    Signed-off-by: Roberto Bergantinos Corpas
    Cc: stable@vger.kernel.org
    Fixes: 030d794bf498 "SUNRPC: Use gssproxy upcall for server..."
    Tested-By: Frank Sorenson
    Signed-off-by: J. Bruce Fields

    Roberto Bergantinos Corpas
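
The unit mismatch behind the gssproxy leak can be shown with plain integers. This is a sketch under stated assumptions: expiry_since_boot() and cache_entry_expired() are hypothetical helpers, and the comparison is a simplified version of what the cache expiry logic is described as doing (comparing against seconds since boot).

```c
#include <stdint.h>

/*
 * Convert an absolute wall-clock expiry (seconds since the epoch) into
 * seconds since boot. @boot_wall is the wall-clock time of boot.
 */
static int64_t expiry_since_boot(int64_t expiry_wall, int64_t boot_wall)
{
	return expiry_wall - boot_wall;
}

/* An entry has expired once the uptime reaches its expiry time. */
static int cache_entry_expired(int64_t expiry_time, int64_t uptime)
{
	return uptime >= expiry_time;
}
```

If the wall-clock value is stored without conversion, the uptime would have to reach roughly 1.58 billion seconds before the entry counts as expired, so in practice the entry is never cleaned up.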
     

04 Feb, 2020

2 commits

  • The most notable change is DEFINE_SHOW_ATTRIBUTE macro split in
    seq_file.h.

    Conversion rule is:

    llseek => proc_lseek
    unlocked_ioctl => proc_ioctl

    xxx => proc_xxx

    delete ".owner = THIS_MODULE" line

    [akpm@linux-foundation.org: fix drivers/isdn/capi/kcapi_proc.c]
    [sfr@canb.auug.org.au: fix kernel/sched/psi.c]
    Link: http://lkml.kernel.org/r/20200122180545.36222f50@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Using kmemdup_nul() is more efficient when the length is known.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     

30 Jan, 2020

1 commit

  • …/kernel/git/arnd/playground

    Pull y2038 updates from Arnd Bergmann:
    "Core, driver and file system changes

    These are updates to device drivers and file systems that for some
    reason or another were not included in the kernel in the previous
    y2038 series.

    I've gone through all users of time_t again to make sure the kernel is
    in a long-term maintainable state, replacing all remaining references
    to time_t with safe alternatives.

    Some related parts of the series were picked up into the nfsd, xfs,
    alsa and v4l2 trees. A final set of patches in linux-mm removes the
    now unused time_t/timeval/timespec types and helper functions after
    all five branches are merged for linux-5.6, ensuring that no new users
    get merged.

    As a result, linux-5.6, or my backport of the patches to 5.4 [1],
    should be the first release that can serve as a base for a 32-bit
    system designed to run beyond year 2038, with a few remaining caveats:

    - All user space must be compiled with a 64-bit time_t, which will be
    supported in the coming musl-1.2 and glibc-2.32 releases, along
    with installed kernel headers from linux-5.6 or higher.

    - Applications that use the system call interfaces directly need to
    be ported to use the time64 syscalls added in linux-5.1 in place of
    the existing system calls. This impacts most users of futex() and
    seccomp() as well as programming languages that have their own
    runtime environment not based on libc.

    - Applications that use a private copy of kernel uapi header files or
    their contents may need to update to the linux-5.6 version, in
    particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h,
    linux/elfcore.h, linux/sockios.h, linux/timex.h and
    linux/can/bcm.h.

    - A few remaining interfaces cannot be changed to pass a 64-bit
    time_t in a compatible way, so they must be configured to use
    CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit
    timestamps. Most importantly this impacts all users of 'struct
    input_event'.

    - All y2038 problems that are present on 64-bit machines also apply
    to 32-bit machines. In particular this affects file systems with
    on-disk timestamps using signed 32-bit seconds: ext4 with
    ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs"

    [1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame

    * tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (21 commits)
    Revert "drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC"
    y2038: sh: remove timeval/timespec usage from headers
    y2038: sparc: remove use of struct timex
    y2038: rename itimerval to __kernel_old_itimerval
    y2038: remove obsolete jiffies conversion functions
    nfs: fscache: use timespec64 in inode auxdata
    nfs: fix timstamp debug prints
    nfs: use time64_t internally
    sunrpc: convert to time64_t for expiry
    drm/etnaviv: avoid deprecated timespec
    drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC
    drm/msm: avoid using 'timespec'
    hfs/hfsplus: use 64-bit inode timestamps
    hostfs: pass 64-bit timestamps to/from user space
    packet: clarify timestamp overflow
    tsacct: add 64-bit btime field
    acct: stop using get_seconds()
    um: ubd: use 64-bit time_t where possible
    xtensa: ISS: avoid struct timeval
    dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD
    ...

    Linus Torvalds
     

23 Jan, 2020

2 commits


15 Jan, 2020

10 commits

  • Remove gss_mech_list_pseudoflavors() and its callers. This is part of
    an unused API, and could leak an RCU reference if it were ever called.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • Clean up: This simplifies the logic in rpcrdma_post_recvs.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • To safely get rid of all rpcrdma_reps from a particular connection
    instance, xprtrdma has to wait until each of those reps is finished
    being used. A rep may be backing the rq_rcv_buf of an RPC that has
    just completed, for example.

    Since it is safe to invoke rpcrdma_rep_destroy() only in the Receive
    completion handler, simply mark reps remaining in the rb_all_reps
    list after the transport is drained. These will then be deleted as
    rpcrdma_post_recvs pulls them off the rep free list.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • This reduces the hardware and memory footprint of an unconnected
    transport.

    At some point in the future, transport reconnect will allow
    resolving the destination IP address through a different device. The
    current change enables reps for the new connection to be allocated
    on whichever NUMA node the new device affines to after a reconnect.

    Note that this does not destroy _all_ the transport's reps... there
    will be a few that are still part of a running RPC completion.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Currently the underlying RDMA device is chosen at transport set-up
    time. But it will soon be at connect time instead.

    The maximum size of a transport header is based on device
    capabilities. Thus transport header buffers have to be allocated
    _after_ the underlying device has been chosen (via address and route
    resolution); ie, in the connect worker.

    Thus, move the allocation of transport header buffers to the connect
    worker, after the point at which the underlying RDMA device has been
    chosen.

    This also means the RDMA device is available to do a DMA mapping of
    these buffers at connect time, instead of in the hot I/O path. Make
    that optimization as well.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Refactor: Perform the "is supported" check in rpcrdma_ep_create()
    instead of in rpcrdma_ia_open(). frwr_open() is where most of the
    logic to query device attributes is already located.

    The current code displays a redundant error message when the device
    does not support FRWR. As an additional clean-up, this patch removes
    the extra message.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • To support device hotplug and migrating a connection between devices
    of different capabilities, we have to guarantee that all in-kernel
    devices can support the same max NFS payload size (1 megabyte).

    This means that possibly one or two in-tree devices are no longer
    supported for NFS/RDMA because they cannot support 1MB rsize/wsize.
    The only one I confirmed was cxgb3, but it has already been removed
    from the kernel.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: there is no need to keep two copies of the same value.
    Also, in subsequent patches, rpcrdma_ep_create() will be called in
    the connect worker rather than at set-up time.

    Minor fix: Initialize the transport's sendctx to the value based on
    the capabilities of the underlying device, not the maximum setting.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • The size of the sendctx queue depends on the value stored in
    ia->ri_max_send_sges. This value is determined by querying the
    underlying device.

    Eventually, rpcrdma_ia_open() and rpcrdma_ep_create() will be called
    in the connect worker rather than at transport set-up time. The
    underlying device will not have been chosen at transport set-up time.

    The sendctx queue will thus have to be created after the underlying
    device has been chosen via address and route resolution; in other
    words, in the connect worker.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean-up. The max_send_sge value also happens to be stored in
    ep->rep_attr. Let's keep just a single copy.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
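
One commit in this group describes marking the reps that remain after the transport is drained and deleting them later, as they are pulled off the free list. A userspace sketch of that deferred-destruction pattern follows. Every name in it (struct rep, struct pool, pool_add(), pool_put(), pool_mark_stale(), pool_get()) is hypothetical, the fixed-size array replaces the kernel's lists, and no locking is modelled.

```c
#include <stdlib.h>

#define POOL_MAX 8

/* Hypothetical stand-in for rpcrdma_rep. */
struct rep {
	struct rep *next_free;	/* free-list linkage */
	int stale;		/* belongs to a torn-down connection */
};

struct pool {
	struct rep *all[POOL_MAX];	/* every live rep, busy or free */
	struct rep *free_list;
	int destroyed;			/* reps destroyed so far */
};

/* Allocate a rep and track it; the caller owns it until pool_put(). */
static struct rep *pool_add(struct pool *p)
{
	int i;

	for (i = 0; i < POOL_MAX; i++) {
		if (!p->all[i]) {
			p->all[i] = calloc(1, sizeof(struct rep));
			return p->all[i];
		}
	}
	return NULL;
}

/* Return a rep to the free list once its user is finished with it. */
static void pool_put(struct pool *p, struct rep *r)
{
	r->next_free = p->free_list;
	p->free_list = r;
}

/* After a drain: mark every rep, but destroy nothing yet. */
static void pool_mark_stale(struct pool *p)
{
	int i;

	for (i = 0; i < POOL_MAX; i++)
		if (p->all[i])
			p->all[i]->stale = 1;
}

static void rep_destroy(struct pool *p, struct rep *r)
{
	int i;

	for (i = 0; i < POOL_MAX; i++)
		if (p->all[i] == r)
			p->all[i] = NULL;
	free(r);
	p->destroyed++;
}

/* Pull a usable rep off the free list, destroying stale ones on the way. */
static struct rep *pool_get(struct pool *p)
{
	struct rep *r;

	while ((r = p->free_list) != NULL) {
		p->free_list = r->next_free;
		if (!r->stale)
			return r;
		rep_destroy(p, r);
	}
	return NULL;
}
```

A rep that is still in use when the mark happens is left untouched; it is destroyed only after its user puts it back and a later pool_get() encounters it.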