Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

27 Oct, 2015

1 commit

23a0f8cd3 svcrdma: handle rdma read with a non-zero initial page offset

commit c91aed9896946721bb30705ea2904edb3725dd61 upstream.

The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
were not taking into account the initial page_offset when determining
the rdma read length. This resulted in a read whose starting address
and length exceeded the base/bounds of the frmr.
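
As a rough user-space illustration of the bounds arithmetic described above (not the svcrdma code itself), the read length has to shrink by the initial offset. The helper name, its parameters, and the 4096-byte page size are assumptions made for this sketch.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/*
 * Hypothetical helper: the largest read that fits in `pages` pages when
 * the data starts `page_offset` bytes into the first page.
 * Assumes page_offset is smaller than pages * SKETCH_PAGE_SIZE.
 */
static size_t sketch_read_len(size_t remaining, size_t pages, size_t page_offset)
{
    /* Ignoring page_offset here would overrun the registered region. */
    size_t capacity = pages * SKETCH_PAGE_SIZE - page_offset;

    return remaining < capacity ? remaining : capacity;
}
```

With four pages and an offset of 512 bytes, the sketch caps the read at 15872 bytes rather than the full 16384.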

The server gets an async error from the rdma device and kills the
connection, and the client then reconnects and resends. This repeats
indefinitely, and the application hangs.

Most workloads don't tickle this bug apparently, but one test hit it
every time: building the linux kernel on a 16 core node with 'make -j
16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.

This bug seems to only be tripped with devices having small fastreg page
list depths. I didn't see it with mlx4, for instance.

Fixes: 0bf4828983df ('svcrdma: refactor marshalling logic')
Signed-off-by: Steve Wise
Tested-by: Chuck Lever
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Steve Wise
2015-10-27 08:51:59 +0800

23 Oct, 2015

1 commit

6b27c668e svcrdma: Fix send_reply() scatter/gather set-up

commit 9d11b51ce7c150a69e761e30518f294fc73d55ff upstream.

The Linux NFS server returns garbage in the data payload of inline
NFS/RDMA READ replies. These are READs of under 1000 bytes or so
where the client has not provided either a reply chunk or a write
list.

The NFS server delivers the data payload for an NFS READ reply to
the transport in an xdr_buf page list. If the NFS client did not
provide a reply chunk or a write list, send_reply() is supposed to
set up a separate sge for the page containing the READ data, and
another sge for XDR padding if needed, then post all of the sges via
a single SEND Work Request.

The problem is send_reply() does not advance through the xdr_buf
when setting up scatter/gather entries for SEND WR. It always calls
dma_map_xdr() with xdr_off set to zero. When there's more than one
sge, dma_map_xdr() sets up the SEND sge's so they all point to the
xdr_buf's head.
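
The idea of advancing an offset while building scatter/gather entries can be sketched in plain C. This is only an illustration under assumed names: `struct sketch_sge` and `sketch_map_sges()` are hypothetical and do not correspond to the kernel's data structures or to dma_map_xdr().

```c
#include <stddef.h>

/* Hypothetical stand-in for a scatter/gather entry within one buffer. */
struct sketch_sge {
    size_t off;  /* where this entry starts in the buffer */
    size_t len;  /* how many bytes it covers */
};

/*
 * Fill the entries so each one starts where the previous one ended.
 * The bug described above corresponds to leaving every offset at zero,
 * which makes all entries point at the start of the buffer.
 */
static void sketch_map_sges(struct sketch_sge *sge, const size_t *lens, size_t n)
{
    size_t xdr_off = 0;

    for (size_t i = 0; i < n; i++) {
        sge[i].off = xdr_off;
        sge[i].len = lens[i];
        xdr_off += lens[i];  /* advance through the buffer */
    }
}
```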

The current Linux NFS/RDMA client always provides a reply chunk or
a write list when performing an NFS READ over RDMA. Therefore, it
does not exercise this particular case. The Linux server has never
had to use more than one extra sge for building RPC/RDMA replies
with a Linux client.

However, an NFS/RDMA client _is_ allowed to send small NFS READs
without setting up a write list or reply chunk. The NFS READ reply
fits entirely within the inline reply buffer in this case. This is
perhaps a more efficient way of performing NFS READs that the Linux
NFS/RDMA client may some day adopt.

Fixes: b432e6b3d9c1 ('svcrdma: Change DMA mapping logic to . . .')
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=285
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2015-10-23 05:43:16 +0800

30 Sep, 2015

4 commits

85d1ba73e SUNRPC: Lock the transport layer on shutdown

commit 79234c3db6842a3de03817211d891e0c2878f756 upstream.

Avoid all races with the connect/disconnect handlers by taking the
transport lock.

Reported-by: "Suzuki K. Poulose"
Acked-by: Jeff Layton
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

Trond Myklebust
2015-09-30 01:26:11 +0800
77bb3c931 SUNRPC: Ensure that we wait for connections to complete before retrying

commit 0fdea1e8a2853f79d39b8555cc9de16a7e0ab26f upstream.

Commit 718ba5b87343 moved the responsibility for unlocking the socket to
xs_tcp_setup_socket, meaning that the socket will be unlocked before we
know that it has finished trying to connect. The following patch is based on
an initial patch by Russell King to ensure that we delay clearing the
XPRT_CONNECTING flag until we either know that we failed to initiate
a connection attempt, or the connection attempt itself failed.

Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing")
Reported-by: Russell King
Tested-by: Russell King
Tested-by: Benjamin Coddington
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

Trond Myklebust
2015-09-30 01:26:10 +0800
f160db25e SUNRPC: xs_reset_transport must mark the connection as disconnected

commit 0c78789e3a030615c6650fde89546cadf40ec2cc upstream.

In case the reconnection attempt fails.

Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

Trond Myklebust
2015-09-30 01:26:10 +0800
fc56e1157 SUNRPC: Fix a thinko in xs_connect()

commit 99b1a4c32ad22024ac6198a4337aaec5ea23168f upstream.

It is rather pointless to test the value of transport->inet after
calling xs_reset_transport(), since it will always be zero, and
so we will never see any exponential back off behaviour.
Also don't force early connections for SOFTCONN tasks. If the server
disconnects us, we should respect the exponential backoff.
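
For readers unfamiliar with the term, exponential backoff doubles the wait between reconnect attempts up to a ceiling. The following is a generic sketch, not the xs_connect() logic; the function name and both delay bounds are assumptions chosen for illustration.

```c
/* Hypothetical bounds in milliseconds; not taken from the kernel source. */
#define SKETCH_INIT_DELAY_MS 3000u
#define SKETCH_MAX_DELAY_MS  300000u

/*
 * Return the delay to use for the next reconnect attempt, given the
 * delay used for the previous one (0 means "no previous attempt").
 */
static unsigned int sketch_next_delay(unsigned int cur_ms)
{
    if (cur_ms == 0)
        return SKETCH_INIT_DELAY_MS;
    if (cur_ms >= SKETCH_MAX_DELAY_MS / 2)
        return SKETCH_MAX_DELAY_MS;  /* clamp at the ceiling */
    return cur_ms * 2;               /* double on each retry */
}
```

If the state that records a previous attempt is cleared before this decision is made, every attempt looks like the first one and the delay never grows, which is the kind of mistake the commit message describes.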

Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

Trond Myklebust
2015-09-30 01:26:10 +0800

04 Aug, 2015

1 commit

213f7d2bb SUNRPC: Fix a memory leak in the backchannel code

commit 88de6af24f2b48b06c514d3c3d0a8f22fafe30bd upstream.

req->rq_private_buf isn't initialised when xprt_setup_backchannel calls
xprt_free_allocation.

Fixes: fb7a0b9addbdb ("nfs41: New backchannel helper routines")
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

Trond Myklebust
2015-08-04 00:29:17 +0800

05 May, 2015

1 commit

9507271d9 svcrpc: fix potential GSSX_ACCEPT_SEC_CONTEXT decoding failures

In an environment where the KDC is running Active Directory, the
exported composite name field returned in the context could be large
enough to span a page boundary. Attaching a scratch buffer to the
decoding xdr_stream helps deal with those cases.

The case where we saw this was actually due to behavior that's been
fixed in newer gss-proxy versions, but we're fixing it here too.

Signed-off-by: Scott Mayhew
Cc: stable@vger.kernel.org
Reviewed-by: Simo Sorce
Signed-off-by: J. Bruce Fields

Scott Mayhew
2015-05-05 00:02:40 +0800

27 Apr, 2015

2 commits

59953fba8 Merge tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
"Another set of mainly bugfixes and a couple of cleanups. No new
functionality in this round.

Highlights include:

Stable patches:
- Fix a regression in /proc/self/mountstats
- Fix the pNFS flexfiles O_DIRECT support
- Fix high load average due to callback thread sleeping

Bugfixes:
- Various patches to fix the pNFS layoutcommit support
- Do not cache pNFS deviceids unless server notifications are enabled
- Fix a SUNRPC transport reconnection regression
- make debugfs file creation failure non-fatal in SUNRPC
- Another fix for circular directory warnings on NFSv4 "junctioned"
mountpoints
- Fix locking around NFSv4.2 fallocate() support
- Truncating NFSv4 file opens should also sync O_DIRECT writes
- Prevent infinite loop in rpcrdma_ep_create()

Features:
- Various improvements to the RDMA transport code's handling of
memory registration
- Various code cleanups"

* tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits)
fs/nfs: fix new compiler warning about boolean in switch
nfs: Remove unneeded casts in nfs
NFS: Don't attempt to decode missing directory entries
Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one"
NFS: Rename idmap.c to nfs4idmap.c
NFS: Move nfs_idmap.h into fs/nfs/
NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h
NFS: Add a stub for GETDEVICELIST
nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes
nfs: fix DIO good bytes calculation
nfs: Fetch MOUNTED_ON_FILEID when updating an inode
sunrpc: make debugfs file creation failure non-fatal
nfs: fix high load average due to callback thread sleeping
NFS: Reduce time spent holding the i_mutex during fallocate()
NFS: Don't zap caches on fallocate()
xprtrdma: Make rpcrdma_{un}map_one() into inline functions
xprtrdma: Handle non-SEND completions via a callout
xprtrdma: Add "open" memreg op
xprtrdma: Add "destroy MRs" memreg op
xprtrdma: Add "reset MRs" memreg op
...

Linus Torvalds
2015-04-27 08:33:59 +0800
9ec3a646f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only

Linus Torvalds
2015-04-27 08:22:07 +0800

24 Apr, 2015

3 commits

f139b6c67 Merge tag 'nfs-rdma-for-4.1-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma

NFS: NFSoRDMA Client Changes

This patch series creates an operation vector for each of the different
memory registration modes. This should make it easier to one day increase
credit limit, rsize, and wsize.

Signed-off-by: Anna Schumaker

Trond Myklebust
2015-04-24 03:16:37 +0800
21330b667 Merge branch 'bugfixes'

* bugfixes:
NFSv4: Return delegations synchronously in evict_inode
SUNRPC: Fix a regression when reconnecting
NFS: remount with security change should return EINVAL
nfs: do not export discarded symbols
NFSv4.1: don't export static symbol

Trond Myklebust
2015-04-24 03:16:27 +0800
3f9400981 sunrpc: make debugfs file creation failure non-fatal

v2: gracefully handle the case where some dentry pointers end up NULL
and be more diligent about zeroing out dentry pointers

We currently have a problem that SELinux policy is being enforced when
creating debugfs files. If a debugfs file is created as a side effect of
doing some syscall, then that creation can fail if the SELinux policy
for that process prevents it.

This seems wrong. We don't do that for files under /proc, for instance,
so Bruce has proposed a patch to fix that.

While discussing that patch however, Greg K.H. stated:

"No kernel code should care / fail if a debugfs function fails, so
please fix up the sunrpc code first."

This patch converts all of the sunrpc debugfs setup code to be void
return functions, and the callers to not look for errors from those
functions.

This should allow rpc_clnt and rpc_xprt creation to work, even if the
kernel fails to create debugfs files for some reason.

Cc: Greg Kroah-Hartman
Acked-by: "J. Bruce Fields"
Signed-off-by: Jeff Layton
Signed-off-by: Trond Myklebust

Jeff Layton
2015-04-24 02:42:27 +0800

16 Apr, 2015

4 commits

41416f233 lib/string_helpers.c: change semantics of string_escape_mem

The current semantics of string_escape_mem are inadequate for one of its
current users, vsnprintf(). If that is to honour its contract, it must
know how much space would be needed for the entire escaped buffer, and
string_escape_mem provides no way of obtaining that (short of allocating a
large enough buffer (~4 times input string) to let it play with, and
that's definitely a big no-no inside vsnprintf).

So change the semantics for string_escape_mem to be more snprintf-like:
Return the size of the output that would be generated if the destination
buffer was big enough, but of course still only write to the part of dst
it is allowed to, and (contrary to snprintf) don't do '\0'-termination.
It is then up to the caller to detect whether output was truncated and to
append a '\0' if desired. Also, we must output partial escape sequences,
otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause
printf to write a \0 to buf[2] but leaving buf[0] and buf[1] with whatever
they previously contained.
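
The snprintf-like contract described above can be sketched with a toy escaper. This is not the kernel's string_escape_mem(): the function name, its signature, and the choice to escape every byte as a four-character octal sequence are assumptions made to keep the example small.

```c
#include <stddef.h>

/*
 * Hypothetical escaper: every input byte becomes a 4-character octal
 * escape such as \123. It writes at most osz bytes to dst, never
 * NUL-terminates, and returns the size the full output would need.
 */
static size_t sketch_escape_octal(const unsigned char *src, size_t isz,
                                  char *dst, size_t osz)
{
    size_t out = 0;

    for (size_t i = 0; i < isz; i++) {
        char seq[4];

        seq[0] = '\\';
        seq[1] = (char)('0' + ((src[i] >> 6) & 3));
        seq[2] = (char)('0' + ((src[i] >> 3) & 7));
        seq[3] = (char)('0' + (src[i] & 7));

        for (size_t j = 0; j < 4; j++, out++) {
            if (out < osz)
                dst[out] = seq[j];  /* partial sequences are still written */
        }
    }
    return out;  /* the caller compares this with osz to detect truncation */
}
```

Because dst is only touched while `out < osz`, passing a NULL destination with a size of zero is safe in this sketch and simply reports the required length.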

This also fixes a bug in the escaped_string() helper function, which used
to unconditionally pass a length of "end-buf" to string_escape_mem();
since the latter doesn't check osz for being insanely large, it would
happily write to dst. For example, kasprintf(GFP_KERNEL, "something and
then %pE", ...); is an easy way to trigger an oops.

In test-string_helpers.c, the -ENOMEM test is replaced with testing for
getting the expected return value even if the buffer is too small. We
also ensure that nothing is written (by relying on a NULL pointer deref)
if the output size is 0 by passing NULL - this has to work for
kasprintf("%pE") to work.

In net/sunrpc/cache.c, I think qword_add still has the same semantics.
Someone should definitely double-check this.

In fs/proc/array.c, I made the minimum possible change, but longer-term it
should stop poking around in seq_file internals.

[andriy.shevchenko@linux.intel.com: simplify qword_add]
[andriy.shevchenko@linux.intel.com: add missed curly braces]
Signed-off-by: Rasmus Villemoes
Acked-by: Andy Shevchenko
Signed-off-by: Andy Shevchenko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rasmus Villemoes
2015-04-16 07:35:24 +0800
2813893f8 kernel: conditionally support non-root users, groups and capabilities

There are a lot of embedded systems that run most or all of their
functionality in init, running as root:root. For these systems,
supporting multiple users is not necessary.

This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
non-root users, non-root groups, and capabilities optional. It is enabled
under the CONFIG_EXPERT menu.

When this symbol is not defined, UID and GID are zero in any possible case
and processes always have all capabilities.

The following syscalls are compiled out: setuid, setregid, setgid,
setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
getgroups, setfsuid, setfsgid, capget, capset.

Also, groups.c is compiled out completely.

In kernel/capability.c, the capable() function was moved in order to avoid
adding two ifdef blocks.

This change saves about 25 KB on a defconfig build. The most minimal
kernels have total text sizes in the high hundreds of kB rather than
low MB. (The 25k goes down a bit with allnoconfig, but not that much.)

The kernel was booted in Qemu. All the common functionalities work.
Adding users/groups is not possible, failing with -ENOSYS.

Bloat-o-meter output:
add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Iulia Manda
Reviewed-by: Josh Triplett
Acked-by: Geert Uytterhoeven
Tested-by: Paul E. McKenney
Reviewed-by: Paul E. McKenney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Iulia Manda
2015-04-16 07:35:22 +0800
c5ef60352 VFS: net/: d_inode() annotations

socket inodes and sunrpc filesystems - inodes owned by that code

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2015-04-16 03:06:56 +0800
6c373ca89 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking updates from David Miller:

1) Add BQL support to via-rhine, from Tino Reichardt.

2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers
can support hw switch offloading. From Florian Fainelli.

3) Allow 'ip address' commands to initiate multicast group join/leave,
from Madhu Challa.

4) Many ipv4 FIB lookup optimizations from Alexander Duyck.

5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel
Borkmann.

6) Remove the ugly compat support in ARP for ugly layers like ax25,
rose, etc. And use this to clean up the neigh layer, then use it to
implement MPLS support. All from Eric Biederman.

7) Support L3 forwarding offloading in switches, from Scott Feldman.

8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed
up route lookups even further. From Alexander Duyck.

9) Many improvements and bug fixes to the rhashtable implementation,
from Herbert Xu and Thomas Graf. In particular, in the case where
an rhashtable user bulk adds a large number of items into an empty
table, we expand the table much more sanely.

10) Don't make the tcp_metrics hash table per-namespace, from Eric
Biederman.

11) Extend EBPF to access SKB fields, from Alexei Starovoitov.

12) Split out new connection request sockets so that they can be
established in the main hash table. Much less false sharing since
hash lookups go direct to the request sockets instead of having to
go first to the listener then to the request socks hashed
underneath. From Eric Dumazet.

13) Add async I/O support for crypto AF_ALG sockets, from Tadeusz Struk.

14) Support stable privacy address generation for RFC7217 in IPV6. From
Hannes Frederic Sowa.

15) Hash network namespace into IP frag IDs, also from Hannes Frederic
Sowa.

16) Convert PTP get/set methods to use 64-bit time, from Richard
Cochran.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits)
fm10k: Bump driver version to 0.15.2
fm10k: corrected VF multicast update
fm10k: mbx_update_max_size does not drop all oversized messages
fm10k: reset head instead of calling update_max_size
fm10k: renamed mbx_tx_dropped to mbx_tx_oversized
fm10k: update xcast mode before synchronizing multicast addresses
fm10k: start service timer on probe
fm10k: fix function header comment
fm10k: comment next_vf_mbx flow
fm10k: don't handle mailbox events in iov_event path and always process mailbox
fm10k: use separate workqueue for fm10k driver
fm10k: Set PF queues to unlimited bandwidth during virtualization
fm10k: expose tx_timeout_count as an ethtool stat
fm10k: only increment tx_timeout_count in Tx hang path
fm10k: remove extraneous "Reset interface" message
fm10k: separate PF only stats so that VF does not display them
fm10k: use hw->mac.max_queues for stats
fm10k: only show actual queues, not the maximum in hardware
fm10k: allow creation of VLAN on default vid
fm10k: fix unused warnings
...

Linus Torvalds
2015-04-16 00:00:47 +0800

15 Apr, 2015

1 commit

d0bbe0dd3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial

Pull trivial tree from Jiri Kosina:
"Usual trivial tree updates. Nothing outstanding -- mostly printk()
and comment fixes and unused identifier removals"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
goldfish: goldfish_tty_probe() is not using 'i' any more
powerpc: Fix comment in smu.h
qla2xxx: Fix printks in ql_log message
lib: correct link to the original source for div64_u64
si2168, tda10071, m88ds3103: Fix firmware wording
usb: storage: Fix printk in isd200_log_config()
qla2xxx: Fix printk in qla25xx_setup_mode
init/main: fix reset_device comment
ipwireless: missing assignment
goldfish: remove unreachable line of code
coredump: Fix do_coredump() comment
stacktrace.h: remove duplicate declaration task_struct
smpboot.h: Remove unused function prototype
treewide: Fix typo in printk messages
treewide: Fix typo in printk messages
mod_devicetable: fix comment for match_flags

Linus Torvalds
2015-04-15 00:50:27 +0800

12 Apr, 2015

1 commit

d8725c86a get rid of the size argument of sock_sendmsg()

it's equal to iov_iter_count(&msg->msg_iter) in all cases

Signed-off-by: Al Viro

Al Viro
2015-04-12 03:27:37 +0800

01 Apr, 2015

1 commit

f9c72d10d sunrpc: make debugfs file creation failure non-fatal

We currently have a problem that SELinux policy is being enforced when
creating debugfs files. If a debugfs file is created as a side effect of
doing some syscall, then that creation can fail if the SELinux policy
for that process prevents it.

This seems wrong. We don't do that for files under /proc, for instance,
so Bruce has proposed a patch to fix that.

While discussing that patch however, Greg K.H. stated:

"No kernel code should care / fail if a debugfs function fails, so
please fix up the sunrpc code first."

This patch converts all of the sunrpc debugfs setup code to be void
return functions, and the callers to not look for errors from those
functions.

This should allow rpc_clnt and rpc_xprt creation to work, even if the
kernel fails to create debugfs files for some reason.

Symptoms were failing krb5 mounts on systems using gss-proxy and
selinux.

Fixes: 388f0c776781 "sunrpc: add a debugfs rpc_xprt directory..."
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton
Acked-by: Greg Kroah-Hartman
Signed-off-by: J. Bruce Fields

Jeff Layton
2015-04-01 02:15:08 +0800

31 Mar, 2015

14 commits

d654788e9 xprtrdma: Make rpcrdma_{un}map_one() into inline functions

These functions are called in a loop for each page transferred via
RDMA READ or WRITE. Extract loop invariants and inline them to
reduce CPU overhead.

Signed-off-by: Chuck Lever
Tested-by: Devesh Sharma
Tested-by: Meghana Cheripady
Tested-by: Veeresh U. Kokatnur
Signed-off-by: Anna Schumaker

Chuck Lever
2015-03-31 21:52:53 +0800
e46ac34c3 xprtrdma: Handle non-SEND completions via a callout

Allow each memory registration mode to plug in a callout that handles
the completion of a memory registration operation.

Signed-off-by: Chuck Lever
Reviewed-by: Sagi Grimberg
Tested-by: Devesh Sharma
Tested-by: Meghana Cheripady
Tested-by: Veeresh U. Kokatnur
Signed-off-by: Anna Schumaker

Chuck Lever
2015-03-31 21:52:53 +0800
3968cb585 xprtrdma: Add "open" memreg op

The open op determines the size of various transport data structures
based on device capabilities and memory registration mode.

Signed-off-by: Chuck Lever
Tested-by: Devesh Sharma
Tested-by: Meghana Cheripady
Tested-by: Veeresh U. Kokatnur
Signed-off-by: Anna Schumaker

Chuck Lever
2015-03-31 21:52:53 +0800
4561f347d xprtrdma: Add "destroy MRs" memreg op

Memory Region objects associated with a transport instance are
destroyed before the instance is shut down and destroyed.

Signed-off-by: Chuck Lever
Reviewed-by: Sagi Grimberg
Tested-by: Devesh Sharma
Tested-by: Meghana Cheripady
Tested-by: Veeresh U. Kokatnur
Signed-off-by: Anna Schumaker

Chuck Lever
2015-03-31 21:52:53 +0800
31a701a94 xprtrdma: Add "reset MRs" memreg op

This method is invoked when a transport instance is about to be
reconnected. Each Memory Region object is reset to its initial
state.

Signed-off-by: Chuck Lever
Reviewed-by: Sagi Grimberg
Tested-by: Devesh Sharma
Tested-by: Meghana Cheripady
Tested-by: Veeresh U. Kokatnur
Signed-off-by: Anna Schumaker

Chuck Lever
2015-03-31 21:52:53 +0800
91e70e70e xprtrdma: Add "init MRs" memreg op

This method is used when setting up a new transport instance to
create a pool of Memory Region objects that will be used to register
memory during operation.

Memory Regions are not needed for "physical" registration, since
->prepare and ->release are no-ops for that mode.

Signed-off-by: Chuck Lever
Reviewed-by: Sagi Grimberg
Tested-by: Devesh Sharma
Tested-by: Meghana Cheripady
Tested-by: Veeresh U. Kokatnur
Signed-off-by: Anna Schumaker

Chuck Lever
2015-03-31 21:52:52 +0800
6814baead xprtrdma: Add a "deregister_external" op for each memreg mode

There is very little common processing among the different external
memory deregistration functions.

Signed-off-by: Chuck Lever
Tested-by: Devesh Sharma
Tested-by: Meghana Cheripady
Tested-by: Veeresh U. Kokatnur
Signed-off-by: Anna Schumaker

Chuck Lever
2015-03-31 21:52:52 +0800
9c1b4d775 xprtrdma: Add a "register_external" op for each memreg mode

There is very little common processing among the different external
memory registration functions. Have rpcrdma_create_chunks() call
the registration method directly. This removes a stack frame and a
switch statement from the external registration path.

Signed-off-by: Chuck Lever
Tested-by: Devesh Sharma
Tested-by: Meghana Cheripady
Tested-by: Veeresh U. Kokatnur
Signed-off-by: Anna Schumaker

Chuck Lever
2015-03-31 21:52:52 +0800
1c9351ee0 xprtrdma: Add a "max_payload" op for each memreg mode

The max_payload computation is generalized to ensure that the
payload maximum is the lesser of RPC_MAX_DATA_SEGS and the number of
data segments that can be transmitted in an inline buffer.

Signed-off-by: Chuck Lever
Reviewed-by: Sagi Grimberg
Tested-by: Devesh Sharma
Tested-by: Meghana Cheripady
Tested-by: Veeresh U. Kokatnur
Signed-off-by: Anna Schumaker

Chuck Lever
2015-03-31 21:52:52 +0800
a0ce85f59 xprtrdma: Add vector of ops for each memory registration strategy

Instead of employing switch() statements, let's use the typical
Linux kernel idiom for handling behavioral variation: virtual
functions.

Start by defining a vector of operations for each supported memory
registration mode, and by adding a source file for each mode.
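
The "vector of operations" idiom is a struct of function pointers that callers dispatch through. The sketch below only illustrates the pattern: the struct, its members, and the per-mode payload figures are hypothetical and are not the definitions added by this commit.

```c
#include <stddef.h>

/* Hypothetical operations vector; the member names are illustrative only. */
struct sketch_memreg_ops {
    const char *displayname;
    size_t (*max_payload)(size_t nsegs);
};

/* Arbitrary per-mode behaviour, chosen only so the two modes differ. */
static size_t sketch_frwr_max_payload(size_t nsegs)     { return nsegs * 4096u; }
static size_t sketch_physical_max_payload(size_t nsegs) { return nsegs * 1024u; }

static const struct sketch_memreg_ops sketch_frwr_ops = {
    .displayname = "frwr",
    .max_payload = sketch_frwr_max_payload,
};

static const struct sketch_memreg_ops sketch_physical_ops = {
    .displayname = "physical",
    .max_payload = sketch_physical_max_payload,
};

/* Callers dispatch through the vector instead of switching on the mode. */
static size_t sketch_max_payload(const struct sketch_memreg_ops *ops, size_t nsegs)
{
    return ops->max_payload(nsegs);
}
```

Adding a new mode then means adding one more ops instance rather than touching every switch() statement.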

Signed-off-by: Chuck Lever
Reviewed-by: Sagi Grimberg
Tested-by: Devesh Sharma
Tested-by: Meghana Cheripady
Tested-by: Veeresh U. Kokatnur
Signed-off-by: Anna Schumaker

Chuck Lever
2015-03-31 21:52:52 +0800
41f970289 xprtrdma: Prevent infinite loop in rpcrdma_ep_create()

If a provider advertises a zero max_fast_reg_page_list_len, FRWR
depth detection loops forever. Instead of just failing the mount,
try other memory registration modes.
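
A loop that counts how many registrations are needed never terminates when each registration covers zero pages. The following sketch shows that failure mode and a guard against it; the function and its parameters are hypothetical and do not reproduce the code in rpcrdma_ep_create().

```c
/*
 * Hypothetical depth calculation: how many registrations of `per_reg`
 * pages are needed to cover `total` pages. Returns 0 when the device
 * reports a zero limit so the caller can fall back to another mode.
 */
static unsigned int sketch_frwr_depth(unsigned int total, unsigned int per_reg)
{
    unsigned int depth = 0;
    unsigned int covered = 0;

    if (per_reg == 0)
        return 0;  /* without this guard the loop below never ends */

    while (covered < total) {
        covered += per_reg;
        depth++;
    }
    return depth;
}
```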

Fixes: 0fc6c4e7bb28 ("xprtrdma: mind the device's max fast . . .")
Reported-by: Devesh Sharma
Signed-off-by: Chuck Lever
Tested-by: Devesh Sharma
Tested-by: Meghana Cheripady
Tested-by: Veeresh U. Kokatnur
Signed-off-by: Anna Schumaker

Chuck Lever
2015-03-31 21:52:52 +0800
805272406 xprtrdma: Byte-align FRWR registration

The RPC/RDMA transport's FRWR registration logic registers whole
pages. This means areas in the first and last pages that are not
involved in the RDMA I/O are needlessly exposed to the server.

Buffered I/O is typically page-aligned, so not a problem there. But
for direct I/O, which can be byte-aligned, and for reply chunks,
which are nearly always smaller than a page, the transport could
expose memory outside the I/O buffer.

FRWR allows byte-aligned memory registration, so let's use it as
it was intended.

Reported-by: Sagi Grimberg
Signed-off-by: Chuck Lever
Tested-by: Devesh Sharma
Tested-by: Meghana Cheripady
Tested-by: Veeresh U. Kokatnur
Signed-off-by: Anna Schumaker

Chuck Lever
2015-03-31 21:52:52 +0800
e23779451 xprtrdma: Perform a full marshal on retransmit

Commit 6ab59945f292 ("xprtrdma: Update rkeys after transport
reconnect") added logic in the ->send_request path to update the
chunk list when an RPC/RDMA request is retransmitted.

Note that rpc_xdr_encode() resets and re-encodes the entire RPC
send buffer for each retransmit of an RPC. The RPC send buffer
is not preserved from the previous transmission of an RPC.

Revert 6ab59945f292, and instead, just force each request to be
fully marshaled every time through ->send_request. This should
preserve the fix from 6ab59945f292, while also performing pullup
during retransmits.

Signed-off-by: Chuck Lever
Acked-by: Sagi Grimberg
Tested-by: Devesh Sharma
Tested-by: Meghana Cheripady
Tested-by: Veeresh U. Kokatnur
Signed-off-by: Anna Schumaker

Chuck Lever
2015-03-31 21:52:52 +0800
0dd39cae2 xprtrdma: Display IPv6 addresses and port numbers correctly

Signed-off-by: Chuck Lever
Reviewed-by: Sagi Grimberg
Tested-by: Devesh Sharma
Tested-by: Meghana Cheripady
Tested-by: Veeresh U. Kokatnur
Signed-off-by: Anna Schumaker

Chuck Lever
2015-03-31 21:52:52 +0800

28 Mar, 2015

1 commit

0695314ef SUNRPC: Fix a regression when reconnecting

If the task needs to give up the socket lock in order to allow a
reconnect to occur, then it must also clear the 'rq_bytes_sent' field
so that when it retransmits, it knows to start from the beginning.

Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing")
Signed-off-by: Trond Myklebust

Trond Myklebust
2015-03-28 00:24:36 +0800

13 Mar, 2015

1 commit

55cc1d780 SUNRPC: fix build-warning due to format missmatch

fix build-warning introduced by commit: f0eede10fd4 ("SUNRPC: use
jiffies_to_msecs for converting jiffies") which did not fixup
the format properly (my bad).

Signed-off-by: Nicholas Mc Guire
Signed-off-by: Trond Myklebust

Nicholas Mc Guire
2015-03-13 21:05:20 +0800

12 Mar, 2015

1 commit

f0eede10f SUNRPC: use jiffies_to_msecs for converting jiffies

Use jiffies_to_msecs for converting jiffies as it handles all of the corner
cases reliably and also helps readability.
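
One corner case that hand-rolled conversions tend to get wrong is a tick rate that does not divide 1000 evenly. The sketch below is a user-space illustration only: both functions are hypothetical, take the tick rate as a parameter, and are not the kernel's jiffies_to_msecs() implementation.

```c
/* Naive conversion: truncates 1000 / hz first, losing precision. */
static unsigned long long sketch_naive_msecs(unsigned long ticks, unsigned int hz)
{
    return (unsigned long long)ticks * (1000u / hz);
}

/* Multiply before dividing so the tick rate need not divide 1000. */
static unsigned long long sketch_ticks_to_msecs(unsigned long ticks, unsigned int hz)
{
    return (unsigned long long)ticks * 1000u / hz;
}
```

At an assumed rate of 300 ticks per second, 300 ticks is one second: the second function returns 1000, while the naive one returns 900.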

Signed-off-by: Nicholas Mc Guire
Signed-off-by: Trond Myklebust

Nicholas Mc Guire
2015-03-12 23:53:55 +0800

09 Mar, 2015

1 commit

1711fd9ad sunrpc: fix braino in ->poll()

POLL_OUT isn't what callers of ->poll() are expecting to see; it's
actually __SI_POLL | 2 and it's a siginfo code, not a poll bitmap
bit...
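
The same distinction exists in user space, where POLLOUT from <poll.h> is the event-mask bit and the similarly named POLL_OUT is a siginfo code. A minimal sketch of testing the right constant follows; the helper name is an assumption made for this example.

```c
#include <poll.h>

/*
 * Returns nonzero if a poll() result mask reports writability.
 * The mask bit is POLLOUT; POLL_OUT (with an underscore) is a
 * siginfo code and must not be used as a poll bitmap bit.
 */
static int sketch_is_writable(short revents)
{
    return (revents & POLLOUT) != 0;
}
```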

Signed-off-by: Al Viro
Cc: stable@vger.kernel.org
Cc: Bruce Fields
Signed-off-by: Linus Torvalds

Al Viro
2015-03-09 03:53:46 +0800

07 Mar, 2015

2 commits

d939be3ad treewide: Fix typo in printk messages

This patch fixes spelling typos in printk messages.

Signed-off-by: Masanari Iida
Acked-by: Randy Dunlap
Signed-off-by: Jiri Kosina

Masanari Iida
2015-03-07 06:05:39 +0800
1b1bd5619 Merge tag 'nfs-for-4.0-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:

- Fix a regression in the NFSv4 open state recovery code
- Fix a regression in the NFSv4 close code
- Fix regressions and side-effects of the loop-back mounted NFS fixes
in 3.18, that cause the NFS read() syscall to return EBUSY.
- Fix regressions around the readdirplus code and how it interacts
with the VFS lazy unmount changes that went into v3.18.
- Fix issues with out-of-order RPC call replies replacing updated
attributes with stale ones (particularly after a truncate()).
- Fix an underflow checking issue with RPC/RDMA credits
- Fix a number of issues with the NFSv4 delegation return/free code.
- Fix issues around stale NFSv4.1 leases when doing a mount"

* tag 'nfs-for-4.0-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits)
NFSv4.1: Clear the old state by our client id before establishing a new lease
NFSv4: Fix a race in NFSv4.1 server trunking discovery
NFS: Don't write enable new pages while an invalidation is proceeding
NFS: Fix a regression in the read() syscall
NFSv4: Ensure we skip delegations that are already being returned
NFSv4: Pin the superblock while we're returning the delegation
NFSv4: Ensure we honour NFS_DELEGATION_RETURNING in nfs_inode_set_delegation()
NFSv4: Ensure that we don't reap a delegation that is being returned
NFS: Fix stateid used for NFS v4 closes
NFSv4: Don't call put_rpccred() under the rcu_read_lock()
NFS: Don't require a filehandle to refresh the inode in nfs_prime_dcache()
NFSv3: Use the readdir fileid as the mounted-on-fileid
NFS: Don't invalidate a submounted dentry in nfs_prime_dcache()
NFSv4: Set a barrier in the update_changeattr() helper
NFS: Fix nfs_post_op_update_inode() to set an attribute barrier
NFS: Remove size hack in nfs_inode_attrs_need_update()
NFSv4: Add attribute update barriers to delegreturn and pNFS layoutcommit
NFS: Add attribute update barriers to NFS writebacks
NFS: Set an attribute barrier on all updates
NFS: Add attribute update barriers to nfs_setattr_update_inode()
...

Linus Torvalds
2015-03-07 02:09:57 +0800