27 Oct, 2015

1 commit

  • commit c91aed9896946721bb30705ea2904edb3725dd61 upstream.

    The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
    were not taking into account the initial page_offset when determining
    the rdma read length. This resulted in a read whose starting address
    and length exceeded the base/bounds of the frmr.

    The server gets an async error from the rdma device and kills the
    connection, and the client then reconnects and resends. This repeats
    indefinitely, and the application hangs.

    Most work loads don't tickle this bug apparently, but one test hit it
    every time: building the linux kernel on a 16 core node with 'make -j
    16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.

    This bug seems to only be tripped with devices having small fastreg page
    list depths. I didn't see it with mlx4, for instance.

    Fixes: 0bf4828983df ('svcrdma: refactor marshalling logic')
    Signed-off-by: Steve Wise
    Tested-by: Chuck Lever
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Steve Wise
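
The page_offset accounting described in the commit above can be sketched in plain C. This is a minimal illustration, not the svcrdma code: `sketch_read_len`, its parameters, and the 4096-byte page size are all assumptions made for the example.

```c
#include <stddef.h>

/* Assumed page size, for illustration only. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/*
 * Hypothetical helper: number of bytes a single RDMA read may cover when
 * the registration spans `pages` pages and the data begins `page_offset`
 * bytes into the first page. Ignoring page_offset would overstate the
 * capacity by exactly page_offset bytes.
 */
static size_t sketch_read_len(size_t remaining, size_t page_offset, size_t pages)
{
    size_t span = pages * SKETCH_PAGE_SIZE;
    size_t capacity;

    if (page_offset >= span)
        return 0;                       /* nothing fits in this registration */
    capacity = span - page_offset;      /* bytes actually inside the bounds */
    return remaining < capacity ? remaining : capacity;
}
```

With four pages and an offset of 100 bytes, the usable length is 16284 bytes rather than 16384, which is the difference that pushed the read past the end of the registration.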
     

23 Oct, 2015

1 commit

  • commit 9d11b51ce7c150a69e761e30518f294fc73d55ff upstream.

    The Linux NFS server returns garbage in the data payload of inline
    NFS/RDMA READ replies. These are READs of under 1000 bytes or so
    where the client has not provided either a reply chunk or a write
    list.

    The NFS server delivers the data payload for an NFS READ reply to
    the transport in an xdr_buf page list. If the NFS client did not
    provide a reply chunk or a write list, send_reply() is supposed to
    set up a separate sge for the page containing the READ data, and
    another sge for XDR padding if needed, then post all of the sges via
    a single SEND Work Request.

    The problem is send_reply() does not advance through the xdr_buf
    when setting up scatter/gather entries for SEND WR. It always calls
    dma_map_xdr with xdr_off set to zero. When there's more than one
    sge, dma_map_xdr() sets up the SEND sge's so they all point to the
    xdr_buf's head.

    The current Linux NFS/RDMA client always provides a reply chunk or
    a write list when performing an NFS READ over RDMA. Therefore, it
    does not exercise this particular case. The Linux server has never
    had to use more than one extra sge for building RPC/RDMA replies
    with a Linux client.

    However, an NFS/RDMA client _is_ allowed to send small NFS READs
    without setting up a write list or reply chunk. The NFS READ reply
    fits entirely within the inline reply buffer in this case. This is
    perhaps a more efficient way of performing NFS READs that the Linux
    NFS/RDMA client may some day adopt.

    Fixes: b432e6b3d9c1 ('svcrdma: Change DMA mapping logic to . . .')
    BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=285
    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
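
The fix described above amounts to advancing an offset through the xdr_buf while building scatter/gather entries. The sketch below shows the idea with simplified stand-in types; `sketch_xdr_buf`, `sketch_sge`, and `sketch_build_sges` are hypothetical names, not the kernel's structures or functions.

```c
#include <stddef.h>

/* Simplified stand-in for an xdr_buf: lengths of head, page data, tail. */
struct sketch_xdr_buf {
    size_t head_len;
    size_t page_len;
    size_t tail_len;
};

/* Simplified stand-in for one scatter/gather entry. */
struct sketch_sge {
    size_t xdr_off;   /* offset into the xdr_buf this entry maps */
    size_t length;
};

/* Fill sge[] with one entry per non-empty region; returns the entry count. */
static int sketch_build_sges(const struct sketch_xdr_buf *xdr,
                             struct sketch_sge sge[3])
{
    size_t lens[3];
    size_t xdr_off = 0;
    int n = 0;
    int i;

    lens[0] = xdr->head_len;
    lens[1] = xdr->page_len;
    lens[2] = xdr->tail_len;

    for (i = 0; i < 3; i++) {
        if (lens[i] == 0)
            continue;
        sge[n].xdr_off = xdr_off;
        sge[n].length = lens[i];
        xdr_off += lens[i];   /* without this step every entry maps offset 0 */
        n++;
    }
    return n;
}
```

If the offset were left at zero, every entry after the first would point back at the head, which matches the garbage payload described in the commit message.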
     

30 Sep, 2015

4 commits

  • commit 79234c3db6842a3de03817211d891e0c2878f756 upstream.

    Avoid all races with the connect/disconnect handlers by taking the
    transport lock.

    Reported-by: "Suzuki K. Poulose"
    Acked-by: Jeff Layton
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 0fdea1e8a2853f79d39b8555cc9de16a7e0ab26f upstream.

    Commit 718ba5b87343 moved the responsibility for unlocking the socket to
    xs_tcp_setup_socket, meaning that the socket will be unlocked before we
    know that it has finished trying to connect. The following patch is based on
    an initial patch by Russell King to ensure that we delay clearing the
    XPRT_CONNECTING flag until we either know that we failed to initiate
    a connection attempt, or the connection attempt itself failed.

    Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing")
    Reported-by: Russell King
    Tested-by: Russell King
    Tested-by: Benjamin Coddington
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 0c78789e3a030615c6650fde89546cadf40ec2cc upstream.

    In case the reconnection attempt fails.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 99b1a4c32ad22024ac6198a4337aaec5ea23168f upstream.

    It is rather pointless to test the value of transport->inet after
    calling xs_reset_transport(), since it will always be zero, and
    so we will never see any exponential back off behaviour.
    Also don't force early connections for SOFTCONN tasks. If the server
    disconnects us, we should respect the exponential backoff.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
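
The exponential backoff mentioned above only works if the delay is kept in state that survives the transport reset. A minimal sketch of the doubling step follows; the function name and the 3 second and 300 second bounds are assumptions chosen for illustration, not values taken from this commit.

```c
/* Assumed bounds in seconds, for illustration only. */
#define SKETCH_INIT_DELAY 3u
#define SKETCH_MAX_DELAY  300u

/*
 * Hypothetical helper: given the delay used for the previous reconnect
 * attempt (0 if there was none), return the delay for the next attempt.
 */
static unsigned int sketch_next_delay(unsigned int cur)
{
    if (cur == 0)
        return SKETCH_INIT_DELAY;       /* first retry after a disconnect */
    if (cur >= SKETCH_MAX_DELAY / 2)
        return SKETCH_MAX_DELAY;        /* clamp instead of overflowing the cap */
    return cur * 2;                     /* double on every further failure */
}
```

The caller would store the returned value alongside the transport and must not derive it from a field that the reset path clears.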
     

04 Aug, 2015

1 commit


05 May, 2015

1 commit

  • In an environment where the KDC is running Active Directory, the
    exported composite name field returned in the context could be large
    enough to span a page boundary. Attaching a scratch buffer to the
    decoding xdr_stream helps deal with those cases.

    The case where we saw this was actually due to behavior that's been
    fixed in newer gss-proxy versions, but we're fixing it here too.

    Signed-off-by: Scott Mayhew
    Cc: stable@vger.kernel.org
    Reviewed-by: Simo Sorce
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
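
Why a scratch buffer helps can be shown with a small user-space model. This is a sketch under stated assumptions: `sketch_stream` and `sketch_decode` are hypothetical, and the 8-byte page size is deliberately tiny so that a field can straddle a boundary.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE 8u  /* tiny page size, for illustration only */

/* Hypothetical decoding stream over an array of fixed-size pages. */
struct sketch_stream {
    unsigned char (*pages)[SKETCH_PAGE];
    size_t npages;
    size_t pos;              /* current byte position in the stream */
    unsigned char *scratch;  /* optional bounce buffer, may be NULL */
    size_t scratch_len;
};

/*
 * Return a pointer to `len` contiguous bytes at the current position, or
 * NULL if that is not possible. A field that crosses a page boundary can
 * only be returned contiguously by copying it into the scratch buffer.
 */
static const unsigned char *sketch_decode(struct sketch_stream *s, size_t len)
{
    size_t total = s->npages * SKETCH_PAGE;
    size_t page, off, done = 0;

    if (len == 0 || s->pos > total || len > total - s->pos)
        return NULL;
    page = s->pos / SKETCH_PAGE;
    off = s->pos % SKETCH_PAGE;

    if (off + len <= SKETCH_PAGE) {       /* fits inside one page */
        s->pos += len;
        return &s->pages[page][off];
    }
    if (s->scratch == NULL || len > s->scratch_len)
        return NULL;                      /* spans pages, nowhere to copy */

    while (done < len) {                  /* gather the pieces page by page */
        size_t chunk = SKETCH_PAGE - off;
        if (chunk > len - done)
            chunk = len - done;
        memcpy(s->scratch + done, &s->pages[page][off], chunk);
        done += chunk;
        page++;
        off = 0;
    }
    s->pos += len;
    return s->scratch;
}
```

Without a scratch buffer the spanning field cannot be decoded at all, which mirrors the failure seen with large exported composite names.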
     

27 Apr, 2015

2 commits

  • Pull NFS client updates from Trond Myklebust:
    "Another set of mainly bugfixes and a couple of cleanups. No new
    functionality in this round.

    Highlights include:

    Stable patches:
    - Fix a regression in /proc/self/mountstats
    - Fix the pNFS flexfiles O_DIRECT support
    - Fix high load average due to callback thread sleeping

    Bugfixes:
    - Various patches to fix the pNFS layoutcommit support
    - Do not cache pNFS deviceids unless server notifications are enabled
    - Fix a SUNRPC transport reconnection regression
    - make debugfs file creation failure non-fatal in SUNRPC
    - Another fix for circular directory warnings on NFSv4 "junctioned"
    mountpoints
    - Fix locking around NFSv4.2 fallocate() support
    - Truncating NFSv4 file opens should also sync O_DIRECT writes
    - Prevent infinite loop in rpcrdma_ep_create()

    Features:
    - Various improvements to the RDMA transport code's handling of
    memory registration
    - Various code cleanups"

    * tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits)
    fs/nfs: fix new compiler warning about boolean in switch
    nfs: Remove unneeded casts in nfs
    NFS: Don't attempt to decode missing directory entries
    Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one"
    NFS: Rename idmap.c to nfs4idmap.c
    NFS: Move nfs_idmap.h into fs/nfs/
    NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h
    NFS: Add a stub for GETDEVICELIST
    nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes
    nfs: fix DIO good bytes calculation
    nfs: Fetch MOUNTED_ON_FILEID when updating an inode
    sunrpc: make debugfs file creation failure non-fatal
    nfs: fix high load average due to callback thread sleeping
    NFS: Reduce time spent holding the i_mutex during fallocate()
    NFS: Don't zap caches on fallocate()
    xprtrdma: Make rpcrdma_{un}map_one() into inline functions
    xprtrdma: Handle non-SEND completions via a callout
    xprtrdma: Add "open" memreg op
    xprtrdma: Add "destroy MRs" memreg op
    xprtrdma: Add "reset MRs" memreg op
    ...

    Linus Torvalds
     
  • Pull fourth vfs update from Al Viro:
    "d_inode() annotations from David Howells (sat in for-next since before
    the beginning of merge window) + four assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    RCU pathwalk breakage when running into a symlink overmounting something
    fix I_DIO_WAKEUP definition
    direct-io: only inc/dec inode->i_dio_count for file systems
    fs/9p: fix readdir()
    VFS: assorted d_backing_inode() annotations
    VFS: fs/inode.c helpers: d_inode() annotations
    VFS: fs/cachefiles: d_backing_inode() annotations
    VFS: fs library helpers: d_inode() annotations
    VFS: assorted weird filesystems: d_inode() annotations
    VFS: normal filesystems (and lustre): d_inode() annotations
    VFS: security/: d_inode() annotations
    VFS: security/: d_backing_inode() annotations
    VFS: net/: d_inode() annotations
    VFS: net/unix: d_backing_inode() annotations
    VFS: kernel/: d_inode() annotations
    VFS: audit: d_backing_inode() annotations
    VFS: Fix up some ->d_inode accesses in the chelsio driver
    VFS: Cachefiles should perform fs modifications on the top layer only
    VFS: AF_UNIX sockets should call mknod on the top layer only

    Linus Torvalds
     

24 Apr, 2015

3 commits

  • NFS: NFSoRDMA Client Changes

    This patch series creates an operation vector for each of the different
    memory registration modes. This should make it easier to one day increase
    credit limit, rsize, and wsize.

    Signed-off-by: Anna Schumaker

    Trond Myklebust
     
  • * bugfixes:
    NFSv4: Return delegations synchronously in evict_inode
    SUNRPC: Fix a regression when reconnecting
    NFS: remount with security change should return EINVAL
    nfs: do not export discarded symbols
    NFSv4.1: don't export static symbol

    Trond Myklebust
     
  • v2: gracefully handle the case where some dentry pointers end up NULL
    and be more diligent about zeroing out dentry pointers

    We currently have a problem that SELinux policy is being enforced when
    creating debugfs files. If a debugfs file is created as a side effect of
    doing some syscall, then that creation can fail if the SELinux policy
    for that process prevents it.

    This seems wrong. We don't do that for files under /proc, for instance,
    so Bruce has proposed a patch to fix that.

    While discussing that patch however, Greg K.H. stated:

    "No kernel code should care / fail if a debugfs function fails, so
    please fix up the sunrpc code first."

    This patch converts all of the sunrpc debugfs setup code to be void
    return functions, and the callers to not look for errors from those
    functions.

    This should allow rpc_clnt and rpc_xprt creation to work, even if the
    kernel fails to create debugfs files for some reason.

    Cc: Greg Kroah-Hartman
    Acked-by: "J. Bruce Fields"
    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     

16 Apr, 2015

4 commits

  • The current semantics of string_escape_mem are inadequate for one of its
    current users, vsnprintf(). If that is to honour its contract, it must
    know how much space would be needed for the entire escaped buffer, and
    string_escape_mem provides no way of obtaining that (short of allocating a
    large enough buffer (~4 times input string) to let it play with, and
    that's definitely a big no-no inside vsnprintf).

    So change the semantics for string_escape_mem to be more snprintf-like:
    Return the size of the output that would be generated if the destination
    buffer was big enough, but of course still only write to the part of dst
    it is allowed to, and (contrary to snprintf) don't do '\0'-termination.
    It is then up to the caller to detect whether output was truncated and to
    append a '\0' if desired. Also, we must output partial escape sequences,
    otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause
    printf to write a \0 to buf[2] but leave buf[0] and buf[1] with whatever
    they previously contained.

    This also fixes a bug in the escaped_string() helper function, which used
    to unconditionally pass a length of "end-buf" to string_escape_mem();
    since the latter doesn't check osz for being insanely large, it would
    happily write to dst. For example, kasprintf(GFP_KERNEL, "something and
    then %pE", ...); is an easy way to trigger an oops.

    In test-string_helpers.c, the -ENOMEM test is replaced with testing for
    getting the expected return value even if the buffer is too small. We
    also ensure that nothing is written (by relying on a NULL pointer deref)
    if the output size is 0 by passing NULL - this has to work for
    kasprintf("%pE") to work.

    In net/sunrpc/cache.c, I think qword_add still has the same semantics.
    Someone should definitely double-check this.

    In fs/proc/array.c, I made the minimum possible change, but longer-term it
    should stop poking around in seq_file internals.

    [andriy.shevchenko@linux.intel.com: simplify qword_add]
    [andriy.shevchenko@linux.intel.com: add missed curly braces]
    Signed-off-by: Rasmus Villemoes
    Acked-by: Andy Shevchenko
    Signed-off-by: Andy Shevchenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
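
The snprintf-like contract described above can be sketched as follows. This is not the kernel's string_escape_mem(): `sketch_escape_mem` is a hypothetical stand-in that only escapes non-printable ASCII bytes as three-digit octal sequences, but it follows the same rules (return the full required size, write only within `osz`, emit partial sequences, never add a terminator).

```c
#include <stddef.h>

/*
 * Hypothetical escaper: printable ASCII is copied through, every other
 * byte becomes a four-character \ooo sequence. Returns the size a complete
 * output would need, regardless of how much room dst actually has.
 */
static size_t sketch_escape_mem(const unsigned char *src, size_t isz,
                                char *dst, size_t osz)
{
    size_t out = 0;
    size_t i, j;

    for (i = 0; i < isz; i++) {
        unsigned char c = src[i];
        char seq[4];
        size_t n;

        if (c >= 0x20 && c < 0x7f) {
            seq[0] = (char)c;
            n = 1;
        } else {
            seq[0] = '\\';
            seq[1] = (char)('0' + ((c >> 6) & 3));
            seq[2] = (char)('0' + ((c >> 3) & 7));
            seq[3] = (char)('0' + (c & 7));
            n = 4;
        }
        for (j = 0; j < n; j++, out++) {
            if (out < osz)
                dst[out] = seq[j];   /* partial sequences are still written */
        }
    }
    return out;                      /* no '\0' is appended */
}
```

A caller detects truncation by comparing the return value with `osz`, and may pass a NULL destination with `osz` of 0 to learn the required size, since nothing is written in that case.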
     
  • There are a lot of embedded systems that run most or all of their
    functionality in init, running as root:root. For these systems,
    supporting multiple users is not necessary.

    This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
    non-root users, non-root groups, and capabilities optional. It is enabled
    under the CONFIG_EXPERT menu.

    When this symbol is not defined, UID and GID are zero in any possible case
    and processes always have all capabilities.

    The following syscalls are compiled out: setuid, setregid, setgid,
    setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
    getgroups, setfsuid, setfsgid, capget, capset.

    Also, groups.c is compiled out completely.

    In kernel/capability.c, the capable() function was moved in order to avoid
    adding two ifdef blocks.

    This change saves about 25 KB on a defconfig build. The most minimal
    kernels have total text sizes in the high hundreds of kB rather than
    low MB. (The 25k goes down a bit with allnoconfig, but not that much.)

    The kernel was booted in Qemu. All the common functionalities work.
    Adding users/groups is not possible, failing with -ENOSYS.

    Bloat-o-meter output:
    add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Iulia Manda
    Reviewed-by: Josh Triplett
    Acked-by: Geert Uytterhoeven
    Tested-by: Paul E. McKenney
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Iulia Manda
     
  • socket inodes and sunrpc filesystems - inodes owned by that code

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Pull networking updates from David Miller:

    1) Add BQL support to via-rhine, from Tino Reichardt.

    2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers
    can support hw switch offloading. From Florian Fainelli.

    3) Allow 'ip address' commands to initiate multicast group join/leave,
    from Madhu Challa.

    4) Many ipv4 FIB lookup optimizations from Alexander Duyck.

    5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel
    Borkmann.

    6) Remove the ugly compat support in ARP for ugly layers like ax25,
    rose, etc. And use this to clean up the neigh layer, then use it to
    implement MPLS support. All from Eric Biederman.

    7) Support L3 forwarding offloading in switches, from Scott Feldman.

    8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed
    up route lookups even further. From Alexander Duyck.

    9) Many improvements and bug fixes to the rhashtable implementation,
    from Herbert Xu and Thomas Graf. In particular, in the case where
    an rhashtable user bulk adds a large number of items into an empty
    table, we expand the table much more sanely.

    10) Don't make the tcp_metrics hash table per-namespace, from Eric
    Biederman.

    11) Extend EBPF to access SKB fields, from Alexei Starovoitov.

    12) Split out new connection request sockets so that they can be
    established in the main hash table. Much less false sharing since
    hash lookups go direct to the request sockets instead of having to
    go first to the listener then to the request socks hashed
    underneath. From Eric Dumazet.

    13) Add async I/O support for crypto AF_ALG sockets, from Tadeusz Struk.

    14) Support stable privacy address generation for RFC7217 in IPV6. From
    Hannes Frederic Sowa.

    15) Hash network namespace into IP frag IDs, also from Hannes Frederic
    Sowa.

    16) Convert PTP get/set methods to use 64-bit time, from Richard
    Cochran.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits)
    fm10k: Bump driver version to 0.15.2
    fm10k: corrected VF multicast update
    fm10k: mbx_update_max_size does not drop all oversized messages
    fm10k: reset head instead of calling update_max_size
    fm10k: renamed mbx_tx_dropped to mbx_tx_oversized
    fm10k: update xcast mode before synchronizing multicast addresses
    fm10k: start service timer on probe
    fm10k: fix function header comment
    fm10k: comment next_vf_mbx flow
    fm10k: don't handle mailbox events in iov_event path and always process mailbox
    fm10k: use separate workqueue for fm10k driver
    fm10k: Set PF queues to unlimited bandwidth during virtualization
    fm10k: expose tx_timeout_count as an ethtool stat
    fm10k: only increment tx_timeout_count in Tx hang path
    fm10k: remove extraneous "Reset interface" message
    fm10k: separate PF only stats so that VF does not display them
    fm10k: use hw->mac.max_queues for stats
    fm10k: only show actual queues, not the maximum in hardware
    fm10k: allow creation of VLAN on default vid
    fm10k: fix unused warnings
    ...

    Linus Torvalds
     

15 Apr, 2015

1 commit

  • Pull trivial tree from Jiri Kosina:
    "Usual trivial tree updates. Nothing outstanding -- mostly printk()
    and comment fixes and unused identifier removals"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    goldfish: goldfish_tty_probe() is not using 'i' any more
    powerpc: Fix comment in smu.h
    qla2xxx: Fix printks in ql_log message
    lib: correct link to the original source for div64_u64
    si2168, tda10071, m88ds3103: Fix firmware wording
    usb: storage: Fix printk in isd200_log_config()
    qla2xxx: Fix printk in qla25xx_setup_mode
    init/main: fix reset_device comment
    ipwireless: missing assignment
    goldfish: remove unreachable line of code
    coredump: Fix do_coredump() comment
    stacktrace.h: remove duplicate declaration task_struct
    smpboot.h: Remove unused function prototype
    treewide: Fix typo in printk messages
    treewide: Fix typo in printk messages
    mod_devicetable: fix comment for match_flags

    Linus Torvalds
     

12 Apr, 2015

1 commit


01 Apr, 2015

1 commit

  • We currently have a problem that SELinux policy is being enforced when
    creating debugfs files. If a debugfs file is created as a side effect of
    doing some syscall, then that creation can fail if the SELinux policy
    for that process prevents it.

    This seems wrong. We don't do that for files under /proc, for instance,
    so Bruce has proposed a patch to fix that.

    While discussing that patch however, Greg K.H. stated:

    "No kernel code should care / fail if a debugfs function fails, so
    please fix up the sunrpc code first."

    This patch converts all of the sunrpc debugfs setup code to be void
    return functions, and the callers to not look for errors from those
    functions.

    This should allow rpc_clnt and rpc_xprt creation to work, even if the
    kernel fails to create debugfs files for some reason.

    Symptoms were failing krb5 mounts on systems using gss-proxy and
    selinux.

    Fixes: 388f0c776781 "sunrpc: add a debugfs rpc_xprt directory..."
    Cc: stable@vger.kernel.org
    Signed-off-by: Jeff Layton
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     

31 Mar, 2015

14 commits

  • These functions are called in a loop for each page transferred via
    RDMA READ or WRITE. Extract loop invariants and inline them to
    reduce CPU overhead.

    Signed-off-by: Chuck Lever
    Tested-by: Devesh Sharma
    Tested-by: Meghana Cheripady
    Tested-by: Veeresh U. Kokatnur
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Allow each memory registration mode to plug in a callout that handles
    the completion of a memory registration operation.

    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Tested-by: Devesh Sharma
    Tested-by: Meghana Cheripady
    Tested-by: Veeresh U. Kokatnur
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • The open op determines the size of various transport data structures
    based on device capabilities and memory registration mode.

    Signed-off-by: Chuck Lever
    Tested-by: Devesh Sharma
    Tested-by: Meghana Cheripady
    Tested-by: Veeresh U. Kokatnur
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Memory Region objects associated with a transport instance are
    destroyed before the instance is shut down and destroyed.

    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Tested-by: Devesh Sharma
    Tested-by: Meghana Cheripady
    Tested-by: Veeresh U. Kokatnur
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • This method is invoked when a transport instance is about to be
    reconnected. Each Memory Region object is reset to its initial
    state.

    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Tested-by: Devesh Sharma
    Tested-by: Meghana Cheripady
    Tested-by: Veeresh U. Kokatnur
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • This method is used when setting up a new transport instance to
    create a pool of Memory Region objects that will be used to register
    memory during operation.

    Memory Regions are not needed for "physical" registration, since
    ->prepare and ->release are no-ops for that mode.

    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Tested-by: Devesh Sharma
    Tested-by: Meghana Cheripady
    Tested-by: Veeresh U. Kokatnur
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • There is very little common processing among the different external
    memory deregistration functions.

    Signed-off-by: Chuck Lever
    Tested-by: Devesh Sharma
    Tested-by: Meghana Cheripady
    Tested-by: Veeresh U. Kokatnur
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • There is very little common processing among the different external
    memory registration functions. Have rpcrdma_create_chunks() call
    the registration method directly. This removes a stack frame and a
    switch statement from the external registration path.

    Signed-off-by: Chuck Lever
    Tested-by: Devesh Sharma
    Tested-by: Meghana Cheripady
    Tested-by: Veeresh U. Kokatnur
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • The max_payload computation is generalized to ensure that the
    payload maximum is the lesser of RPC_MAX_DATA_SEGS and the number of
    data segments that can be transmitted in an inline buffer.

    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Tested-by: Devesh Sharma
    Tested-by: Meghana Cheripady
    Tested-by: Veeresh U. Kokatnur
    Signed-off-by: Anna Schumaker

    Chuck Lever
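
The "lesser of" rule above can be illustrated with a small calculation. The sketch assumes that the inline buffer holds a fixed-size header followed by fixed-size segment entries; that layout, the function name, and all parameters are assumptions made for the example, not details from the commit.

```c
/*
 * Hypothetical helper: the number of data segments usable per request is
 * the smaller of a compile-time limit and what fits in the inline buffer
 * after the header.
 */
static unsigned int sketch_max_data_segs(unsigned int seg_limit,
                                         unsigned int inline_bytes,
                                         unsigned int hdr_bytes,
                                         unsigned int seg_entry_bytes)
{
    unsigned int fit;

    if (seg_entry_bytes == 0 || inline_bytes <= hdr_bytes)
        return 0;                                   /* no room for any segment */
    fit = (inline_bytes - hdr_bytes) / seg_entry_bytes;
    return fit < seg_limit ? fit : seg_limit;       /* the lesser of the two */
}
```

The maximum payload would then follow from the segment count and the amount of data each segment can describe.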
     
  • Instead of employing switch() statements, let's use the typical
    Linux kernel idiom for handling behavioral variation: virtual
    functions.

    Start by defining a vector of operations for each supported memory
    registration mode, and by adding a source file for each mode.

    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Tested-by: Devesh Sharma
    Tested-by: Meghana Cheripady
    Tested-by: Veeresh U. Kokatnur
    Signed-off-by: Anna Schumaker

    Chuck Lever
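
The operations-vector idiom described above looks roughly like this in C. All names here are hypothetical and the two "modes" are toy implementations; the sketch only shows how a switch at setup time replaces switches on every call.

```c
#include <stddef.h>

/* Hypothetical registration modes. */
enum sketch_mode { SKETCH_MODE_FRWR, SKETCH_MODE_PHYSICAL };

/* One vector of operations per mode. */
struct sketch_memreg_ops {
    int (*map)(int nsegs);      /* returns how many segments one call covers */
    const char *displayname;
};

/* Toy FRWR-like mode: covers up to an assumed depth of 16 per call. */
static int sketch_frwr_map(int nsegs)
{
    if (nsegs <= 0)
        return 0;
    return nsegs < 16 ? nsegs : 16;
}

/* Toy physical-like mode: one segment per call. */
static int sketch_physical_map(int nsegs)
{
    return nsegs > 0 ? 1 : 0;
}

static const struct sketch_memreg_ops sketch_frwr_ops = {
    sketch_frwr_map, "frwr"
};
static const struct sketch_memreg_ops sketch_physical_ops = {
    sketch_physical_map, "physical"
};

/* The only switch: choose the vector once, when the transport is set up. */
static const struct sketch_memreg_ops *sketch_select(enum sketch_mode mode)
{
    switch (mode) {
    case SKETCH_MODE_FRWR:
        return &sketch_frwr_ops;
    case SKETCH_MODE_PHYSICAL:
        return &sketch_physical_ops;
    }
    return NULL;
}

/* Hot path: no switch, just an indirect call through the chosen vector. */
static int sketch_register(const struct sketch_memreg_ops *ops, int nsegs)
{
    return ops->map(nsegs);
}
```

Adding a new mode then means adding one source file with its own vector instead of touching every switch statement.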
     
  • If a provider advertizes a zero max_fast_reg_page_list_len, FRWR
    depth detection loops forever. Instead of just failing the mount,
    try other memory registration modes.

    Fixes: 0fc6c4e7bb28 ("xprtrdma: mind the device's max fast . . .")
    Reported-by: Devesh Sharma
    Signed-off-by: Chuck Lever
    Tested-by: Devesh Sharma
    Tested-by: Meghana Cheripady
    Tested-by: Veeresh U. Kokatnur
    Signed-off-by: Anna Schumaker

    Chuck Lever
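
The hang described above is the classic case of a loop whose step size can be zero. The following sketch shows a guarded version of such a depth computation; the function name and the return convention (-1 meaning "fall back to another registration mode") are assumptions for illustration, not the actual xprtrdma code.

```c
/*
 * Hypothetical helper: how many registrations of at most `dev_depth`
 * pages are needed to cover `max_segs` segments. Returns -1 when the
 * device reports a depth of zero, so the caller can try another mode
 * instead of looping forever.
 */
static int sketch_frwr_depth(unsigned int max_segs, unsigned int dev_depth)
{
    unsigned int remaining;
    int depth = 1;

    if (dev_depth == 0)
        return -1;                  /* guard: subtracting 0 never terminates */
    if (dev_depth >= max_segs)
        return depth;

    remaining = max_segs - dev_depth;
    while (remaining > 0) {
        depth++;
        remaining = remaining > dev_depth ? remaining - dev_depth : 0;
    }
    return depth;
}
```

A caller that sees -1 would move on to the next memory registration mode it supports rather than failing the mount.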
     
  • The RPC/RDMA transport's FRWR registration logic registers whole
    pages. This means areas in the first and last pages that are not
    involved in the RDMA I/O are needlessly exposed to the server.

    Buffered I/O is typically page-aligned, so not a problem there. But
    for direct I/O, which can be byte-aligned, and for reply chunks,
    which are nearly always smaller than a page, the transport could
    expose memory outside the I/O buffer.

    FRWR allows byte-aligned memory registration, so let's use it as
    it was intended.

    Reported-by: Sagi Grimberg
    Signed-off-by: Chuck Lever
    Tested-by: Devesh Sharma
    Tested-by: Meghana Cheripady
    Tested-by: Veeresh U. Kokatnur
    Signed-off-by: Anna Schumaker

    Chuck Lever
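
The difference between whole-page and byte-aligned registration can be made concrete with a little arithmetic. In this sketch the `sketch_region` type, both helper functions, and the 4096-byte page size are assumptions used only to show how much extra memory a whole-page registration exposes.

```c
#include <stddef.h>

/* Assumed page size, for illustration only. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Hypothetical description of the memory exposed to the remote peer. */
struct sketch_region {
    size_t start;
    size_t length;
};

/* Whole-page registration: round the start down and the end up. */
static struct sketch_region sketch_whole_pages(size_t addr, size_t len)
{
    struct sketch_region r;
    size_t end = (addr + len + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);

    r.start = addr & ~(SKETCH_PAGE_SIZE - 1);
    r.length = end - r.start;
    return r;
}

/* Byte-aligned registration: expose exactly the I/O buffer. */
static struct sketch_region sketch_byte_aligned(size_t addr, size_t len)
{
    struct sketch_region r;

    r.start = addr;
    r.length = len;
    return r;
}
```

For a 200-byte reply chunk that starts 100 bytes into a page, the whole-page variant exposes 4096 bytes while the byte-aligned variant exposes only the 200 bytes involved in the I/O.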
     
  • Commit 6ab59945f292 ("xprtrdma: Update rkeys after transport
    reconnect") added logic in the ->send_request path to update the
    chunk list when an RPC/RDMA request is retransmitted.

    Note that rpc_xdr_encode() resets and re-encodes the entire RPC
    send buffer for each retransmit of an RPC. The RPC send buffer
    is not preserved from the previous transmission of an RPC.

    Revert 6ab59945f292, and instead, just force each request to be
    fully marshaled every time through ->send_request. This should
    preserve the fix from 6ab59945f292, while also performing pullup
    during retransmits.

    Signed-off-by: Chuck Lever
    Acked-by: Sagi Grimberg
    Tested-by: Devesh Sharma
    Tested-by: Meghana Cheripady
    Tested-by: Veeresh U. Kokatnur
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Tested-by: Devesh Sharma
    Tested-by: Meghana Cheripady
    Tested-by: Veeresh U. Kokatnur
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

28 Mar, 2015

1 commit

  • If the task needs to give up the socket lock in order to allow a
    reconnect to occur, then it must also clear the 'rq_bytes_sent' field
    so that when it retransmits, it knows to start from the beginning.

    Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing")
    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

13 Mar, 2015

1 commit


12 Mar, 2015

1 commit


09 Mar, 2015

1 commit

  • POLL_OUT isn't what callers of ->poll() are expecting to see; it's
    actually __SI_POLL | 2 and it's a siginfo code, not a poll bitmap
    bit...

    Signed-off-by: Al Viro
    Cc: stable@vger.kernel.org
    Cc: Bruce Fields
    Signed-off-by: Linus Torvalds

    Al Viro
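
The two similarly named constants live in different namespaces: POLLOUT from <poll.h> is an event bit for poll(), while POLL_OUT is an si_code value delivered with SIGPOLL. A user-space sketch of the intended usage follows; `sketch_writable_mask` is a hypothetical function standing in for a ->poll()-style routine.

```c
#include <poll.h>

/*
 * Hypothetical helper: return the event mask a poll-style routine should
 * report. The writable case uses the POLLOUT event bit; the siginfo code
 * POLL_OUT must not be used here because it is not a poll bitmap bit.
 */
static short sketch_writable_mask(int writable)
{
    return writable ? (short)POLLOUT : (short)0;
}
```

Callers test the result with a bitwise AND against POLLOUT, which is why returning a siginfo code instead would silently report the wrong events.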
     

07 Mar, 2015

2 commits

  • This patch fixes spelling typos in printk messages.

    Signed-off-by: Masanari Iida
    Acked-by: Randy Dunlap
    Signed-off-by: Jiri Kosina

    Masanari Iida
     
  • Pull NFS client bugfixes from Trond Myklebust:
    "Highlights include:

    - Fix a regression in the NFSv4 open state recovery code
    - Fix a regression in the NFSv4 close code
    - Fix regressions and side-effects of the loop-back mounted NFS fixes
    in 3.18, that cause the NFS read() syscall to return EBUSY.
    - Fix regressions around the readdirplus code and how it interacts
    with the VFS lazy unmount changes that went into v3.18.
    - Fix issues with out-of-order RPC call replies replacing updated
    attributes with stale ones (particularly after a truncate()).
    - Fix an underflow checking issue with RPC/RDMA credits
    - Fix a number of issues with the NFSv4 delegation return/free code.
    - Fix issues around stale NFSv4.1 leases when doing a mount"

    * tag 'nfs-for-4.0-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits)
    NFSv4.1: Clear the old state by our client id before establishing a new lease
    NFSv4: Fix a race in NFSv4.1 server trunking discovery
    NFS: Don't write enable new pages while an invalidation is proceeding
    NFS: Fix a regression in the read() syscall
    NFSv4: Ensure we skip delegations that are already being returned
    NFSv4: Pin the superblock while we're returning the delegation
    NFSv4: Ensure we honour NFS_DELEGATION_RETURNING in nfs_inode_set_delegation()
    NFSv4: Ensure that we don't reap a delegation that is being returned
    NFS: Fix stateid used for NFS v4 closes
    NFSv4: Don't call put_rpccred() under the rcu_read_lock()
    NFS: Don't require a filehandle to refresh the inode in nfs_prime_dcache()
    NFSv3: Use the readdir fileid as the mounted-on-fileid
    NFS: Don't invalidate a submounted dentry in nfs_prime_dcache()
    NFSv4: Set a barrier in the update_changeattr() helper
    NFS: Fix nfs_post_op_update_inode() to set an attribute barrier
    NFS: Remove size hack in nfs_inode_attrs_need_update()
    NFSv4: Add attribute update barriers to delegreturn and pNFS layoutcommit
    NFS: Add attribute update barriers to NFS writebacks
    NFS: Set an attribute barrier on all updates
    NFS: Add attribute update barriers to nfs_setattr_update_inode()
    ...

    Linus Torvalds