Eric Lee / smarc-fsl-linux-kernel

24 Mar, 2019

2 commits

43bceddcd svcrpc: fix UDP on servers with lots of threads ... Browse Code »

commit b7e5034cbecf5a65b7bfdc2b20a8378039577706 upstream.

James Pearson found that an NFS server stopped responding to UDP
requests if started with more than 1017 threads.

sv_max_mesg is about 2^20, so that is probably where the calculation
performed by

svc_sock_setbufsize(svsk->sk_sock,
(serv->sv_nrthreads+3) * serv->sv_max_mesg,
(serv->sv_nrthreads+3) * serv->sv_max_mesg);

starts to overflow an int.

Reported-by: James Pearson
Tested-by: James Pearson
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

J. Bruce Fields
2019-03-24 03:10:10 +0800
d84bc704b xprtrdma: Make sure Send CQ is allocated on an existing compvec ... Browse Code »

[ Upstream commit a4cb5bdb754afe21f3e9e7164213e8600cf69427 ]

Make sure the device has at least 2 completion vectors
before allocating to compvec#1

Fixes: a4699f5647f3 (xprtrdma: Put Send CQ in IB_POLL_WORKQUEUE mode)
Signed-off-by: Nicolas Morey-Chaisemartin
Reviewed-by: Chuck Lever
Signed-off-by: Anna Schumaker
Signed-off-by: Sasha Levin

Nicolas Morey-Chaisemartin
2019-03-24 03:09:45 +0800

27 Feb, 2019

1 commit

7169938b0 xprtrdma: Double free in rpcrdma_sendctxs_create() ... Browse Code »

[ Upstream commit 6e17f58c486d9554341f70aa5b63b8fbed07b3fa ]

The clean up is handled by the caller, rpcrdma_buffer_create(), so this
call to rpcrdma_sendctxs_destroy() leads to a double free.

Fixes: ae72950abf99 ("xprtrdma: Add data structure to manage RDMA Send arguments")
Signed-off-by: Dan Carpenter
Reviewed-by: Chuck Lever
Signed-off-by: Anna Schumaker
Signed-off-by: Sasha Levin

Dan Carpenter
2019-02-27 17:08:53 +0800

23 Feb, 2019

1 commit

a7e0b9680 sunrpc: fix 4 more call sites that were using stack memory with a scatterlist ... Browse Code »

commit e7afe6c1d486b516ed586dcc10b3e7e3e85a9c2b upstream.

While trying to reproduce a reported kernel panic on arm64, I discovered
that AUTH_GSS basically doesn't work at all with older enctypes on arm64
systems with CONFIG_VMAP_STACK enabled. It turns out there still a few
places using stack memory with scatterlists, causing krb5_encrypt() and
krb5_decrypt() to produce incorrect results (or a BUG if CONFIG_DEBUG_SG
is enabled).

Tested with cthon on v4.0/v4.1/v4.2 with krb5/krb5i/krb5p using
des3-cbc-sha1 and arcfour-hmac-md5.

Signed-off-by: Scott Mayhew
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Scott Mayhew
2019-02-23 16:07:27 +0800

15 Feb, 2019

3 commits

9b65b18f8 svcrdma: Remove max_sge check at connect time ... Browse Code »

commit e248aa7be86e8179f20ac0931774ecd746f3f5bf upstream.

Two and a half years ago, the client was changed to use gathered
Send for larger inline messages, in commit 655fec6987b ("xprtrdma:
Use gathered Send for large inline messages"). Several fixes were
required because there are a few in-kernel device drivers whose
max_sge is 3, and these were broken by the change.

Apparently my memory is going, because some time later, I submitted
commit 25fd86eca11c ("svcrdma: Don't overrun the SGE array in
svc_rdma_send_ctxt"), and after that, commit f3c1fd0ee294 ("svcrdma:
Reduce max_send_sges"). These too incorrectly assumed in-kernel
device drivers would have more than a few Send SGEs available.

The fix for the server side is not the same. This is because the
fundamental problem on the server is that, whether or not the client
has provisioned a chunk for the RPC reply, the server must squeeze
even the most complex RPC replies into a single RDMA Send. Failing
in the send path because of Send SGE exhaustion should never be an
option.

Therefore, instead of failing when the send path runs out of SGEs,
switch to using a bounce buffer mechanism to handle RPC replies that
are too complex for the device to send directly. That allows us to
remove the max_sge check to enable drivers with small max_sge to
work again.

Reported-by: Don Dutile
Fixes: 25fd86eca11c ("svcrdma: Don't overrun the SGE array in ...")
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2019-02-15 15:10:13 +0800
4d376ab80 svcrdma: Reduce max_send_sges ... Browse Code »

commit f3c1fd0ee294abd4367dfa72d89f016c682202f0 upstream.

There's no need to request a large number of send SGEs because the
inline threshold already constrains the number of SGEs per Send.

Signed-off-by: Chuck Lever
Reviewed-by: Sagi Grimberg
Signed-off-by: J. Bruce Fields
Cc: Don Dutile
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2019-02-15 15:10:13 +0800
2440f3ceb SUNRPC: Always drop the XPRT_LOCK on XPRT_CLOSE_WAIT ... Browse Code »

This patch is only appropriate for stable kernels v4.16 - v4.19

Since commit 9b30889c548a ("SUNRPC: Ensure we always close the socket after
a connection shuts down"), and until commit c544577daddb ("SUNRPC: Clean up
transport write space handling"), it is possible for the NFS client to spin
in the following tight loop:

269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_bind [sunrpc]
269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_connect [sunrpc]
269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_transmit [sunrpc]
269.964085: xprt_transmit: peer=[10.0.1.82]:2049 xid=0x761d3f77 status=-32
269.964085: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=-32 action=call_transmit_status [sunrpc]
269.964085: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=-32 action=call_status [sunrpc]
269.964085: rpc_call_status: task:43@0 status=-32

The issue is that the path through call_transmit_status does not release
the XPRT_LOCK when the transmit result is -EPIPE, so the socket cannot be
properly shut down.

The below commit fixed things up in mainline by unconditionally calling
xprt_end_transmit() and releasing the XPRT_LOCK after every pass through
call_transmit. However, the entirety of this commit is not appropriate for
stable kernels because its original inclusion was part of a series that
modifies the sunrpc code to use a different queueing model. As a result,
there are machinations within this patch that are not needed for a stable
fix and will not make sense without a larger backport of the mainline
series.

In this patch, we take the slightly modified bit of the mainline patch
below, which is to release the XPRT_LOCK on transmission error should we
detect that the transport is waiting to close.

commit c544577daddb618c7dd5fa7fb98d6a41782f020e upstream
Author: Trond Myklebust
Date: Mon Sep 3 23:39:27 2018 -0400

SUNRPC: Clean up transport write space handling

Treat socket write space handling in the same way we now treat transport
congestion: by denying the XPRT_LOCK until the transport signals that it
has free buffer space.

Signed-off-by: Trond Myklebust

The original discussion of the problem is here:

https://lore.kernel.org/linux-nfs/20181212135157.4489-1-dwysocha@redhat.com/T/#t

This passes my usual cthon and xfstests on NFS as applied on v4.19 mainline.

Reported-by: Dave Wysochanski
Suggested-by: Trond Myklebust
Signed-off-by: Benjamin Coddington
Signed-off-by: Greg Kroah-Hartman

Benjamin Coddington
2019-02-15 15:10:13 +0800

23 Jan, 2019

1 commit

61b29bed9 sunrpc: handle ENOMEM in rpcb_getport_async ... Browse Code »

commit 81c88b18de1f11f70c97f28ced8d642c00bb3955 upstream.

If we ignore the error we'll hit a null dereference a little later.

Reported-by: syzbot+4b98281f2401ab849f4b@syzkaller.appspotmail.com
Signed-off-by: J. Bruce Fields
Signed-off-by: Anna Schumaker
Signed-off-by: Greg Kroah-Hartman

J. Bruce Fields
2019-01-23 04:40:35 +0800

17 Jan, 2019

1 commit

44e7bab39 sunrpc: use-after-free in svc_process_common() ... Browse Code »

commit d4b09acf924b84bae77cad090a9d108e70b43643 upstream.

if node have NFSv41+ mounts inside several net namespaces
it can lead to use-after-free in svc_process_common()

svc_process_common()
/* Setup reply header */
rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE

svc_process_common() can use incorrect rqstp->rq_xprt,
its caller function bc_svc_process() takes it from serv->sv_bc_xprt.
The problem is that serv is global structure but sv_bc_xprt
is assigned per-netnamespace.

According to Trond, the whole "let's set up rqstp->rq_xprt
for the back channel" is nothing but a giant hack in order
to work around the fact that svc_process_common() uses it
to find the xpt_ops, and perform a couple of (meaningless
for the back channel) tests of xpt_flags.

All we really need in svc_process_common() is to be able to run
rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr()

Bruce J Fields points that this xpo_prep_reply_hdr() call
is an awfully roundabout way just to do "svc_putnl(resv, 0);"
in the tcp case.

This patch does not initialiuze rqstp->rq_xprt in bc_svc_process(),
now it calls svc_process_common() with rqstp->rq_xprt = NULL.

To adjust reply header svc_process_common() just check
rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() for tcp case.

To handle rqstp->rq_xprt = NULL case in functions called from
svc_process_common() patch intruduces net namespace pointer
svc_rqst->rq_bc_net and adjust SVC_NET() definition.
Some other function was also adopted to properly handle described case.

Signed-off-by: Vasily Averin
Cc: stable@vger.kernel.org
Fixes: 23c20ecd4475 ("NFS: callback up - users counting cleanup")
Signed-off-by: J. Bruce Fields
v2: added lost extern svc_tcp_prep_reply_hdr()
Signed-off-by: Vasily Averin
Signed-off-by: Greg Kroah-Hartman

Vasily Averin
2019-01-17 05:04:37 +0800

13 Jan, 2019

3 commits

668ecd6b1 sunrpc: use SVC_NET() in svcauth_gss_* functions ... Browse Code »

commit b8be5674fa9a6f3677865ea93f7803c4212f3e10 upstream.

Signed-off-by: Vasily Averin
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Vasily Averin
2019-01-13 16:51:05 +0800
807957cec sunrpc: fix cache_head leak due to queued request ... Browse Code »

commit 4ecd55ea074217473f94cfee21bb72864d39f8d7 upstream.

After commit d202cce8963d, an expired cache_head can be removed from the
cache_detail's hash.

However, the expired cache_head may be waiting for a reply from a
previously submitted request. Such a cache_head has an increased
refcounter and therefore it won't be freed after cache_put(freeme).

Because the cache_head was removed from the hash it cannot be found
during cache_clean() and can be leaked forever, together with stalled
cache_request and other taken resources.

In our case we noticed it because an entry in the export cache was
holding a reference on a filesystem.

Fixes d202cce8963d ("sunrpc: never return expired entries in sunrpc_cache_lookup")
Cc: Pavel Tikhomirov
Cc: stable@kernel.org # 2.6.35
Signed-off-by: Vasily Averin
Reviewed-by: NeilBrown
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Vasily Averin
2019-01-13 16:51:05 +0800
233ba13db SUNRPC: Fix a race with XPRT_CONNECTING ... Browse Code »

[ Upstream commit cf76785d30712d90185455e752337acdb53d2a5d ]

Ensure that we clear XPRT_CONNECTING before releasing the XPRT_LOCK so that
we don't have races between the (asynchronous) socket setup code and
tasks in xprt_connect().

Signed-off-by: Trond Myklebust
Tested-by: Chuck Lever
Signed-off-by: Sasha Levin

Trond Myklebust
2019-01-13 16:51:01 +0800

10 Jan, 2019

1 commit

60f05dddf sock: Make sock->sk_stamp thread-safe ... Browse Code »

[ Upstream commit 3a0ed3e9619738067214871e9cb826fa23b2ddb9 ]

Al Viro mentioned (Message-ID
)
that there is probably a race condition
lurking in accesses of sk_stamp on 32-bit machines.

sock->sk_stamp is of type ktime_t which is always an s64.
On a 32 bit architecture, we might run into situations of
unsafe access as the access to the field becomes non atomic.

Use seqlocks for synchronization.
This allows us to avoid using spinlocks for readers as
readers do not need mutual exclusion.

Another approach to solve this is to require sk_lock for all
modifications of the timestamps. The current approach allows
for timestamps to have their own lock: sk_stamp_lock.
This allows for the patch to not compete with already
existing critical sections, and side effects are limited
to the paths in the patch.

The addition of the new field maintains the data locality
optimizations from
commit 9115e8cd2a0c ("net: reorganize struct sock for better data
locality")

Note that all the instances of the sk_stamp accesses
are either through the ioctl or the syscall recvmsg.

Signed-off-by: Deepa Dinamani
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Deepa Dinamani
2019-01-10 00:38:33 +0800

21 Dec, 2018

1 commit

20595815b SUNRPC: Fix a potential race in xprt_connect() ... Browse Code »

[ Upstream commit 0a9a4304f3614e25d9de9b63502ca633c01c0d70 ]

If an asynchronous connection attempt completes while another task is
in xprt_connect(), then the call to rpc_sleep_on() could end up
racing with the call to xprt_wake_pending_tasks().
So add a second test of the connection state after we've put the
task to sleep and set the XPRT_CONNECTING flag, when we know that there
can be no asynchronous connection attempts still in progress.

Fixes: 0b9e79431377d ("SUNRPC: Move the test for XPRT_CONNECTING into...")
Signed-off-by: Trond Myklebust
Signed-off-by: Sasha Levin

Trond Myklebust
2018-12-21 21:15:17 +0800

13 Dec, 2018

1 commit

33154a299 SUNRPC: Fix leak of krb5p encode pages ... Browse Code »

commit 8dae5398ab1ac107b1517e8195ed043d5f422bd0 upstream.

call_encode can be invoked more than once per RPC call. Ensure that
each call to gss_wrap_req_priv does not overwrite pointers to
previously allocated memory.

Signed-off-by: Chuck Lever
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2018-12-13 16:16:19 +0800

01 Dec, 2018

1 commit

ab1a52066 SUNRPC: Fix a bogus get/put in generic_key_to_expire() ... Browse Code »

[ Upstream commit e3d5e573a54dabdc0f9f3cb039d799323372b251 ]

Signed-off-by: Trond Myklebust
Signed-off-by: Sasha Levin

Trond Myklebust
2018-12-01 16:37:33 +0800

27 Nov, 2018

1 commit

7818cf3d9 SUNRPC: drop pointless static qualifier in xdr_get_next_encode_buffer() ... Browse Code »

[ Upstream commit 025911a5f4e36955498ed50806ad1b02f0f76288 ]

There is no need to have the '__be32 *p' variable static since new value
always be assigned before use it.

Signed-off-by: YueHaibing
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields
Signed-off-by: Sasha Levin

YueHaibing
2018-11-27 23:13:08 +0800

21 Nov, 2018

1 commit

03c91663c sunrpc: correct the computation for page_ptr when truncating ... Browse Code »

commit 5d7a5bcb67c70cbc904057ef52d3fcfeb24420bb upstream.

When truncating the encode buffer, the page_ptr is getting
advanced, causing the next page to be skipped while encoding.
The page is still included in the response, so the response
contains a page of bogus data.

We need to adjust the page_ptr backwards to ensure we encode
the next page into the correct place.

We saw this triggered when concurrent directory modifications caused
nfsd4_encode_direct_fattr() to return nfserr_noent, and the resulting
call to xdr_truncate_encode() corrupted the READDIR reply.

Signed-off-by: Frank Sorenson
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Frank Sorenson
2018-11-21 16:19:23 +0800

14 Nov, 2018

2 commits

18c1d28f3 nfsd: Fix an Oops in free_session() ... Browse Code »

commit bb6ad5572c0022e17e846b382d7413cdcf8055be upstream.

In call_xpt_users(), we delete the entry from the list, but we
do not reinitialise it. This triggers the list poisoning when
we later call unregister_xpt_user() in nfsd4_del_conns().

Signed-off-by: Trond Myklebust
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

Trond Myklebust
2018-11-14 03:08:49 +0800
5fac1c6cc xprtrdma: Reset credit grant properly after a disconnect ... Browse Code »

[ Upstream commit ef739b2175dde9c05594f768cb78149f1ce2ac36 ]

On a fresh connection, an RPC/RDMA client is supposed to send only
one RPC Call until it gets a credit grant in the first RPC Reply
from the server [RFC 8166, Section 3.3.3].

There is a bug in the Linux client's credit accounting mechanism
introduced by commit e7ce710a8802 ("xprtrdma: Avoid deadlock when
credit window is reset"). On connect, it simply dumps all pending
RPC Calls onto the new connection.

Servers have been tolerant of this bad behavior. Currently no server
implementation ever changes its credit grant over reconnects, and
servers always repost enough Receives before connections are fully
established.

To correct this issue, ensure that the client resets both the credit
grant _and_ the congestion window when handling a reconnect.

Fixes: e7ce710a8802 ("xprtrdma: Avoid deadlock when credit ... ")
Signed-off-by: Chuck Lever
Cc: stable@kernel.org
Signed-off-by: Anna Schumaker
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Chuck Lever
2018-11-14 03:08:34 +0800

24 Aug, 2018

2 commits

53a01c9a5 Merge tag 'nfs-for-4.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs ... Browse Code »

Pull NFS client updates from Anna Schumaker:
"These patches include adding async support for the v4.2 COPY
operation. I think Bruce is planning to send the server patches for
the next release, but I figured we could get the client side out of
the way now since it's been in my tree for a while. This shouldn't
cause any problems, since the server will still respond with
synchronous copies even if the client requests async.

Features:
- Add support for asynchronous server-side COPY operations

Stable bufixes:
- Fix an off-by-one in bl_map_stripe() (v3.17+)
- NFSv4 client live hangs after live data migration recovery (v4.9+)
- xprtrdma: Fix disconnect regression (v4.18+)
- Fix locking in pnfs_generic_recover_commit_reqs (v4.14+)
- Fix a sleep in atomic context in nfs4_callback_sequence() (v4.9+)

Other bugfixes and cleanups:
- Optimizations and fixes involving NFS v4.1 / pNFS layout handling
- Optimize lseek(fd, SEEK_CUR, 0) on directories to avoid locking
- Immediately reschedule writeback when the server replies with an
error
- Fix excessive attribute revalidation in nfs_execute_ok()
- Add error checking to nfs_idmap_prepare_message()
- Use new vm_fault_t return type
- Return a delegation when reclaiming one that the server has
recalled
- Referrals should inherit proto setting from parents
- Make rpc_auth_create_args a const
- Improvements to rpc_iostats tracking
- Fix a potential reference leak when there is an error processing a
callback
- Fix rmdir / mkdir / rename nlink accounting
- Fix updating inode change attribute
- Fix error handling in nfsn4_sp4_select_mode()
- Use an appropriate work queue for direct-write completion
- Don't busy wait if NFSv4 session draining is interrupted"

* tag 'nfs-for-4.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (54 commits)
pNFS: Remove unwanted optimisation of layoutget
pNFS/flexfiles: ff_layout_pg_init_read should exit on error
pNFS: Treat RECALLCONFLICT like DELAY...
pNFS: When updating the stateid in layoutreturn, also update the recall range
NFSv4: Fix a sleep in atomic context in nfs4_callback_sequence()
NFSv4: Fix locking in pnfs_generic_recover_commit_reqs
NFSv4: Fix a typo in nfs4_init_channel_attrs()
NFSv4: Don't busy wait if NFSv4 session draining is interrupted
NFS recover from destination server reboot for copies
NFS add a simple sync nfs4_proc_commit after async COPY
NFS handle COPY ERR_OFFLOAD_NO_REQS
NFS send OFFLOAD_CANCEL when COPY killed
NFS export nfs4_async_handle_error
NFS handle COPY reply CB_OFFLOAD call race
NFS add support for asynchronous COPY
NFS COPY xdr handle async reply
NFS OFFLOAD_CANCEL xdr
NFS CB_OFFLOAD xdr
NFS: Use an appropriate work queue for direct-write completion
NFSv4: Fix error handling in nfs4_sp4_select_mode()
...

Linus Torvalds
2018-08-24 07:03:58 +0800
9157141c9 Merge tag 'nfsd-4.19-1' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd updates from Bruce Fields:
"Chuck Lever fixed a problem with NFSv4.0 callbacks over GSS from
multi-homed servers.

The only new feature is a minor bit of protocol (change_attr_type)
which the client doesn't even use yet.

Other than that, various bugfixes and cleanup"

* tag 'nfsd-4.19-1' of git://linux-nfs.org/~bfields/linux: (27 commits)
sunrpc: Add comment defining gssd upcall API keywords
nfsd: Remove callback_cred
nfsd: Use correct credential for NFSv4.0 callback with GSS
sunrpc: Extract target name into svc_cred
sunrpc: Enable the kernel to specify the hostname part of service principals
sunrpc: Don't use stack buffer with scatterlist
rpc: remove unneeded variable 'ret' in rdma_listen_handler
nfsd: use true and false for boolean values
nfsd: constify write_op[]
fs/nfsd: Delete invalid assignment statements in nfsd4_decode_exchange_id
NFSD: Handle full-length symlinks
NFSD: Refactor the generic write vector fill helper
svcrdma: Clean up Read chunk path
svcrdma: Avoid releasing a page in svc_xprt_release()
nfsd: Mark expected switch fall-through
sunrpc: remove redundant variables 'checksumlen','blocksize' and 'data'
nfsd: fix leaked file lock with nfs exported overlayfs
nfsd: don't advertise a SCSI layout for an unsupported request_queue
nfsd: fix corrupted reply to badly ordered compound
nfsd: clarify check_op_ordering
...

Linus Torvalds
2018-08-24 07:00:10 +0800

23 Aug, 2018

4 commits

108b833cd sunrpc: Add comment defining gssd upcall API keywords ... Browse Code »

During review, it was found that the target, service, and srchost
keywords are easily conflated. Add an explainer.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2018-08-23 06:32:07 +0800
9abdda5dd sunrpc: Extract target name into svc_cred ... Browse Code »

NFSv4.0 callback needs to know the GSS target name the client used
when it established its lease. That information is available from
the GSS context created by gssproxy. Make it available in each
svc_cred.

Note this will also give us access to the real target service
principal name (which is typically "nfs", but spec does not require
that).

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2018-08-23 06:32:07 +0800
a1a237775 sunrpc: Enable the kernel to specify the hostname part of service principals ... Browse Code »

A multi-homed NFS server may have more than one "nfs" key in its
keytab. Enable the kernel to pick the key it wants as a machine
credential when establishing a GSS context.

This is useful for GSS-protected NFSv4.0 callbacks, which are
required by RFC 7530 S3.3.3 to use the same principal as the service
principal the client used when establishing its lease.

A complementary modification to rpc.gssd is required to fully enable
this feature.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2018-08-23 06:32:07 +0800
44090cc87 sunrpc: Don't use stack buffer with scatterlist ... Browse Code »

Fedora got a bug report from NFS:

kernel BUG at include/linux/scatterlist.h:143!
...
RIP: 0010:sg_init_one+0x7d/0x90
..
make_checksum+0x4e7/0x760 [rpcsec_gss_krb5]
gss_get_mic_kerberos+0x26e/0x310 [rpcsec_gss_krb5]
gss_marshal+0x126/0x1a0 [auth_rpcgss]
? __local_bh_enable_ip+0x80/0xe0
? call_transmit_status+0x1d0/0x1d0 [sunrpc]
call_transmit+0x137/0x230 [sunrpc]
__rpc_execute+0x9b/0x490 [sunrpc]
rpc_run_task+0x119/0x150 [sunrpc]
nfs4_run_exchange_id+0x1bd/0x250 [nfsv4]
_nfs4_proc_exchange_id+0x2d/0x490 [nfsv4]
nfs41_discover_server_trunking+0x1c/0xa0 [nfsv4]
nfs4_discover_server_trunking+0x80/0x270 [nfsv4]
nfs4_init_client+0x16e/0x240 [nfsv4]
? nfs_get_client+0x4c9/0x5d0 [nfs]
? _raw_spin_unlock+0x24/0x30
? nfs_get_client+0x4c9/0x5d0 [nfs]
nfs4_set_client+0xb2/0x100 [nfsv4]
nfs4_create_server+0xff/0x290 [nfsv4]
nfs4_remote_mount+0x28/0x50 [nfsv4]
mount_fs+0x3b/0x16a
vfs_kern_mount.part.35+0x54/0x160
nfs_do_root_mount+0x7f/0xc0 [nfsv4]
nfs4_try_mount+0x43/0x70 [nfsv4]
? get_nfs_version+0x21/0x80 [nfs]
nfs_fs_mount+0x789/0xbf0 [nfs]
? pcpu_alloc+0x6ca/0x7e0
? nfs_clone_super+0x70/0x70 [nfs]
? nfs_parse_mount_options+0xb40/0xb40 [nfs]
mount_fs+0x3b/0x16a
vfs_kern_mount.part.35+0x54/0x160
do_mount+0x1fd/0xd50
ksys_mount+0xba/0xd0
__x64_sys_mount+0x21/0x30
do_syscall_64+0x60/0x1f0
entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is BUG_ON(!virt_addr_valid(buf)) triggered by using a stack
allocated buffer with a scatterlist. Convert the buffer for
rc4salt to be dynamically allocated instead.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1615258
Signed-off-by: Laura Abbott
Signed-off-by: J. Bruce Fields

Laura Abbott
2018-08-23 06:32:07 +0800

17 Aug, 2018

2 commits

0a3173a5f Merge branch 'linus/master' into rdma.git for-next ... Browse Code »

rdma.git merge resolution for the 4.19 merge window

Conflicts:
drivers/infiniband/core/rdma_core.c
- Use the rdma code and revise with the new spelling for
atomic_fetch_add_unless
drivers/nvme/host/rdma.c
- Replace max_sge with max_send_sge in new blk code
drivers/nvme/target/rdma.c
- Use the blk code and revise to use NULL for ib_post_recv when
appropriate
- Replace max_sge with max_recv_sge in new blk code
net/rds/ib_send.c
- Use the net code and revise to use NULL for ib_post_recv when
appropriate

Signed-off-by: Jason Gunthorpe

Jason Gunthorpe
2018-08-17 04:21:29 +0800
89982f7cc Merge tag 'v4.18' into rdma.git for-next ... Browse Code »

Resolve merge conflicts from the -rc cycle against the rdma.git tree:

Conflicts:
drivers/infiniband/core/uverbs_cmd.c
- New ifs added to ib_uverbs_ex_create_flow in -rc and for-next
- Merge removal of file->ucontext in for-next with new code in -rc
drivers/infiniband/core/uverbs_main.c
- for-next removed code from ib_uverbs_write() that was modified
in for-rc

Signed-off-by: Jason Gunthorpe

Jason Gunthorpe
2018-08-17 03:12:00 +0800

10 Aug, 2018

6 commits

ac5bb5b3b rpc: remove unneeded variable 'ret' in rdma_listen_handler ... Browse Code »

The ret is not modified after initalization, So just remove the variable
and return 0.

Signed-off-by: zhong jiang
Signed-off-by: J. Bruce Fields

zhong jiang
2018-08-10 04:11:21 +0800
11b4d66ea NFSD: Handle full-length symlinks ... Browse Code »

I've given up on the idea of zero-copy handling of SYMLINK on the
server side. This is because the Linux VFS symlink API requires the
symlink pathname to be in a NUL-terminated kmalloc'd buffer. The
NUL-termination is going to be problematic (watching out for
landing on a page boundary and dealing with a 4096-byte pathname).

I don't believe that SYMLINK creation is on a performance path or is
requested frequently enough that it will cause noticeable CPU cache
pollution due to data copies.

There will be two places where a transport callout will be necessary
to fill in the rqstp: one will be in the svc_fill_symlink_pathname()
helper that is used by NFSv2 and NFSv3, and the other will be in
nfsd4_decode_create().

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2018-08-10 04:11:21 +0800
3fd9557ae NFSD: Refactor the generic write vector fill helper ... Browse Code »

fill_in_write_vector() is nearly the same logic as
svc_fill_write_vector(), but there are a few differences so that
the former can handle multiple WRITE payloads in a single COMPOUND.

svc_fill_write_vector() can be adjusted so that it can be used in
the NFSv4 WRITE code path too. Instead of assuming the pages are
coming from rq_args.pages, have the caller pass in the page list.

The immediate benefit is a reduction of code duplication. It also
prevents the NFSv4 WRITE decoder from passing an empty vector
element when the transport has provided the payload in the xdr_buf's
page array.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2018-08-10 04:11:21 +0800
07d0ff3b0 svcrdma: Clean up Read chunk path ... Browse Code »

Simplify the error handling at the tail of recv_read_chunk() by
re-arranging rq_pages[] housekeeping and documenting it properly.

NB: In this path, svc_rdma_recvfrom returns zero. Therefore no
subsequent reply processing is done on the svc_rqstp, and thus the
rq_respages field does not need to be updated.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2018-08-10 04:11:21 +0800
a53d5cb06 svcrdma: Avoid releasing a page in svc_xprt_release() ... Browse Code »

svc_xprt_release() invokes svc_free_res_pages(), which releases
pages between rq_respages and rq_next_page.

Historically, the RPC/RDMA transport has set these two pointers to
be different by one, which means:

- one page gets released when svc_recv returns 0. This normally
happens whenever one or more RDMA Reads need to be dispatched to
complete construction of an RPC Call.

- one page gets released after every call to svc_send.

In both cases, this released page is immediately refilled by
svc_alloc_arg. There does not seem to be a reason for releasing this
page.

To avoid this unnecessary memory allocator traffic, set rq_next_page
more carefully.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2018-08-10 04:11:21 +0800
9cc3b98d1 sunrpc: remove redundant variables 'checksumlen','blocksize' and 'data' ... Browse Code »

Variables 'checksumlen','blocksize' and 'data' are being assigned,
but are never used, hence they are redundant and can be removed.

Fix the following warning:

net/sunrpc/auth_gss/gss_krb5_wrap.c:443:7: warning: variable ‘blocksize’ set but not used [-Wunused-but-set-variable]
net/sunrpc/auth_gss/gss_krb5_crypto.c:376:15: warning: variable ‘checksumlen’ set but not used [-Wunused-but-set-variable]
net/sunrpc/xprtrdma/svc_rdma.c:97:9: warning: variable ‘data’ set but not used [-Wunused-but-set-variable]

Signed-off-by: YueHaibing
Reviewed-by: Chuck Lever
Signed-off-by: J. Bruce Fields

YueHaibing
2018-08-10 04:11:21 +0800

09 Aug, 2018

1 commit

8d4fb8ff4 xprtrdma: Fix disconnect regression ... Browse Code »

I found that injecting disconnects with v4.18-rc resulted in
random failures of the multi-threaded git regression test.

The root cause appears to be that, after a reconnect, the
RPC/RDMA transport is waking pending RPCs before the transport has
posted enough Receive buffers to receive the Replies. If a Reply
arrives before enough Receive buffers are posted, the connection
is dropped. A few connection drops happen in quick succession as
the client and server struggle to regain credit synchronization.

This regression was introduced with commit 7c8d9e7c8863 ("xprtrdma:
Move Receive posting to Receive handler"). The client is supposed to
post a single Receive when a connection is established because
it's not supposed to send more than one RPC Call before it gets
a fresh credit grant in the first RPC Reply [RFC 8166, Section
3.3.3].

Unfortunately there appears to be a longstanding bug in the Linux
client's credit accounting mechanism. On connect, it simply dumps
all pending RPC Calls onto the new connection. It's possible it has
done this ever since the RPC/RDMA transport was added to the kernel
ten years ago.

Servers have so far been tolerant of this bad behavior. Currently no
server implementation ever changes its credit grant over reconnects,
and servers always repost enough Receives before connections are
fully established.

The Linux client implementation used to post a Receive before each
of these Calls. This has covered up the flooding send behavior.

I could try to correct this old bug so that the client sends exactly
one RPC Call and waits for a Reply. Since we are so close to the
next merge window, I'm going to instead provide a simple patch to
post enough Receives before a reconnect completes (based on the
number of credits granted to the previous connection).

The spurious disconnects will be gone, but the client will still
send multiple RPC Calls immediately after a reconnect.

Addressing the latter problem will wait for a merge window because
a) I expect it to be a large change requiring lots of testing, and
b) obviously the Linux client has interoperated successfully since
day zero while still being broken.

Fixes: 7c8d9e7c8863 ("xprtrdma: Move Receive posting to ... ")
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Chuck Lever
Signed-off-by: Anna Schumaker

Chuck Lever
2018-08-09 04:49:47 +0800

05 Aug, 2018

1 commit

07d53ae4f net: Remove some unneeded semicolon ... Browse Code »

These semicolons are not needed. Just remove them.

Signed-off-by: zhong jiang
Signed-off-by: David S. Miller

zhong jiang
2018-08-05 04:05:39 +0800

01 Aug, 2018

4 commits

8fdee4cc9 sunrpc: whitespace fixes ... Browse Code »

Remove trailing whitespace and blank line at EOF

Signed-off-by: Stephen Hemminger
Signed-off-by: Anna Schumaker

Stephen Hemminger
2018-08-01 00:53:40 +0800
0f90be132 NFSv4 client live hangs after live data migration recovery ... Browse Code »

After a live data migration event at the NFS server, the client may send
I/O requests to the wrong server, causing a live hang due to repeated
recovery events. On the wire, this will appear as an I/O request failing
with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly.
NFS4ERR_BADSSESSION is returned because the session ID being used was
issued by the other server and is not valid at the old server.

The failure is caused by async worker threads having cached the transport
(xprt) in the rpc_task structure. After the migration recovery completes,
the task is redispatched and the task resends the request to the wrong
server based on the old value still present in tk_xprt.

The solution is to recompute the tk_xprt field of the rpc_task structure
so that the request goes to the correct server.

Signed-off-by: Bill Baker
Reviewed-by: Chuck Lever
Tested-by: Helen Chao
Fixes: fb43d17210ba ("SUNRPC: Use the multipath iterator to assign a ...")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Anna Schumaker

Bill Baker
2018-08-01 00:53:40 +0800
1a54c0cfc sunrpc: kstrtoul() can also return -ERANGE ... Browse Code »

Smatch complains that "num" can be uninitialized when kstrtoul() returns
-ERANGE. It's true enough, but basically harmless in this case.

Signed-off-by: Dan Carpenter
Signed-off-by: Anna Schumaker

Dan Carpenter
2018-08-01 00:53:40 +0800
016583d70 sunrpc: Change rpc_print_iostats to rpc_clnt_show_stats and handle rpc_clnt clones ... Browse Code »

The existing rpc_print_iostats has a few shortcomings. First, the naming
is not consistent with other functions in the kernel that display stats.
Second, it is really displaying stats for an rpc_clnt structure as it
displays both xprt stats and per-op stats. Third, it does not handle
rpc_clnt clones, which is important for the one in-kernel tree caller
of this function, the NFS client's nfs_show_stats function.

Fix all of the above by renaming the rpc_print_iostats to
rpc_clnt_show_stats and looping through any rpc_clnt clones via
cl_parent.

Once this interface is fixed, this addresses a problem with NFSv4.
Before this patch, the /proc/self/mountstats always showed incorrect
counts for NFSv4 lease and session related opcodes such as SEQUENCE,
RENEW, SETCLIENTID, CREATE_SESSION, etc. These counts were always 0
even though many ops would go over the wire. The reason for this is
there are multiple rpc_clnt structures allocated for any given NFSv4
mount, and inside nfs_show_stats() we callled into rpc_print_iostats()
which only handled one of them, nfs_server->client. Fix these counts
by calling sunrpc's new rpc_clnt_show_stats() function, which handles
cloned rpc_clnt structs and prints the stats together.

Note that one side-effect of the above is that multiple mounts from
the same NFS server will show identical counts in the above ops due
to the fact the one rpc_clnt (representing the NFSv4 client state)
is shared across mounts.

Signed-off-by: Dave Wysochanski
Signed-off-by: Anna Schumaker

Dave Wysochanski
2018-08-01 00:53:35 +0800