30 Dec, 2020
1 commit
-
[ Upstream commit 9b82d88d5976e5f2b8015d58913654856576ace5 ]
NLM uses an interval-based rebinding, i.e. it clears the transport's
binding under certain conditions if more than 60 seconds have elapsed
since the connection was last bound.This rebinding is not necessary for an autobind RPC client over a
connection-oriented protocol like TCP.It can also cause problems: it is possible for nlm_bind_host() to clear
XPRT_BOUND whilst a connection worker is in the middle of trying to
reconnect, after it had already been checked in xprt_connect().When the connection worker notices that XPRT_BOUND has been cleared
under it, in xs_tcp_finish_connecting(), that results in:xs_tcp_setup_socket: connect returned unhandled error -107
Worse, it's possible that the two can get into lockstep, resulting in
the same behaviour repeated indefinitely, with the above error every
300 seconds, without ever recovering, and the connection never being
established. This has been seen in practice, with a large number of NLM
client tasks, following a server restart.The existing callers of nlm_bind_host & nlm_rebind_host should not need
to force the rebind, for TCP, so restrict the interval-based rebinding
to UDP only.For TCP, we will still rebind when needed, e.g. on timeout, and connection
error (including closure), since connection-related errors on an existing
connection, ECONNREFUSED when trying to connect, and rpc_check_timeout(),
already unconditionally clear XPRT_BOUND.To avoid having to add the fix, and explanation, to both nlm_bind_host()
and nlm_rebind_host(), remove the duplicate code from the former, and
have it call the latter.Drop the dprintk, which adds no value over a trace.
Signed-off-by: Calum Mackay
Fixes: 35f5a422ce1a ("SUNRPC: new interface to force an RPC rebind")
Signed-off-by: Trond Myklebust
Signed-off-by: Sasha Levin
04 Nov, 2019
1 commit
-
NFSv2, v3 and NFSv4 servers often have duplicate replay caches that look
at the source port when deciding whether or not an RPC call is a replay
of a previous call. This requires clients to perform strange TCP gymnastics
in order to ensure that when they reconnect to the server, they bind
to the same source port.NFSv4.1 and NFSv4.2 have sessions that provide proper replay semantics,
that do not look at the source port of the connection. This patch therefore
ensures they can ignore the rebind requirement.Signed-off-by: Trond Myklebust
27 Apr, 2019
2 commits
-
When we create a new lockd client, we want to be able to pass the
correct credential of the process that created the struct nlm_host.Signed-off-by: Trond Myklebust
Signed-off-by: Anna Schumaker -
When converting kuids to AUTH_UNIX creds, etc we will want to use the
same user namespace as the process that created the rpc client.Signed-off-by: Trond Myklebust
Signed-off-by: Anna Schumaker
19 Mar, 2019
1 commit
-
If the last NFSv3 unmount from a given host races with a mount from the
same host, we can destroy an nlm_host that is still in use.Specifically nlmclnt_lookup_host() can increment h_count on
an nlm_host that nlmclnt_release_host() has just successfully called
refcount_dec_and_test() on.
Once nlmclnt_lookup_host() drops the mutex, nlm_destroy_host_lock()
will be called to destroy the nlmclnt which is now in use again.The cause of the problem is that the dec_and_test happens outside the
locked region. This is easily fixed by using
refcount_dec_and_mutex_lock().Fixes: 8ea6ecc8b075 ("lockd: Create client-side nlm_host cache")
Cc: stable@vger.kernel.org (v2.6.38+)
Signed-off-by: NeilBrown
Signed-off-by: Trond Myklebust
30 Oct, 2018
1 commit
-
printk format used %*s instead of %.*s, so hostname_len does not limit
the number of bytes accessed from hostname.Signed-off-by: Amir Goldstein
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields
25 Jan, 2018
1 commit
-
The server shouldn't actually delete the struct nlm_host until it hits
the garbage collector. In order to make that work correctly with the
refcount API, we can bump the refcount by one, and then use
refcount_dec_if_one() in the garbage collector.Signed-off-by: Trond Myklebust
Acked-by: J. Bruce Fields
15 Jan, 2018
2 commits
-
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.The variable nsm_handle.sm_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.**Important note for maintainers:
Some functions from refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.
The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
in state to be merged to the documentation tree.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.
Please double check that you don't have some undocumented
memory guarantees for this variable usage.For the nsm_handle.sm_count it might make a difference
in following places:
- nsm_release(): decrement in refcount_dec_and_lock() only
provides RELEASE ordering, control dependency on success
and holds a spin lock on success vs. fully ordered atomic
counterpart. No change for the spin lock guarantees.Suggested-by: Kees Cook
Reviewed-by: David Windsor
Reviewed-by: Hans Liljestrand
Signed-off-by: Elena Reshetova
Signed-off-by: Trond Myklebust -
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.The variable nlm_host.h_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.**Important note for maintainers:
Some functions from refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.
The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
in state to be merged to the documentation tree.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.
Please double check that you don't have some undocumented
memory guarantees for this variable usage.For the nlm_host.h_count it might make a difference
in following places:
- nlmsvc_release_host(): decrement in refcount_dec()
provides RELEASE ordering, while original atomic_dec()
was fully unordered. Since the change is for better, it
should not matter.
- nlmclnt_release_host(): decrement in refcount_dec_and_test() only
provides RELEASE ordering and control dependency on success
vs. fully ordered atomic counterpart. It doesn't seem to
matter in this case since object freeing happens under mutex
lock anyway.Suggested-by: Kees Cook
Reviewed-by: David Windsor
Reviewed-by: Hans Liljestrand
Signed-off-by: Elena Reshetova
Signed-off-by: Trond Myklebust
28 Nov, 2017
2 commits
-
nlm_complain_hosts() walks through nlm_server_hosts hlist, which should
be protected by nlm_host_mutex.Signed-off-by: Vasily Averin
Reviewed-by: Jeff Layton
Signed-off-by: J. Bruce Fields -
Publishing of net pointer is not safe,
use net->ns.inum as net ID in debug messages[ 171.757678] lockd_up_net: per-net data created; net=f00001e7
[ 171.767188] NFSD: starting 90-second grace period (net f00001e7)
[ 300.653313] lockd: nuking all hosts in net f00001e7...
[ 300.653641] lockd: host garbage collection for net f00001e7
[ 300.653968] lockd: nlmsvc_mark_resources for net f00001e7
[ 300.711483] lockd_down_net: per-net data destroyed; net=f00001e7
[ 300.711847] lockd: nuking all hosts in net 0...
[ 300.711847] lockd: host garbage collection for net 0
[ 300.711848] lockd: nlmsvc_mark_resources for net 0Signed-off-by: Vasily Averin
Signed-off-by: J. Bruce Fields
02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.By default all files without license information are under the default
license of the kernel, which is GPL version 2.Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
24 Oct, 2015
1 commit
-
Currently we have reference-counted per-net NSM RPC client
which created on the first monitor request and destroyed
after the last unmonitor request. It's needed because
RPC client need to know 'utsname()->nodename', but utsname()
might be NULL when nsm_unmonitor() called.So instead of holding the rpc client we could just save nodename
in struct nlm_host and pass it to the rpc_create().
Thus ther is no need in keeping rpc client until last
unmonitor request. We could create separate RPC clients
for each monitor/unmonitor requests.Signed-off-by: Andrey Ryabinin
Signed-off-by: J. Bruce Fields
13 Oct, 2015
1 commit
-
Commit cb7323fffa85 ("lockd: create and use per-net NSM
RPC clients on MON/UNMON requests") introduced per-net
NSM RPC clients. Unfortunately this doesn't make any sense
without per-net nsm_handle.E.g. the following scenario could happen
Two hosts (X and Y) in different namespaces (A and B) share
the same nsm struct.1. nsm_monitor(host_X) called => NSM rpc client created,
nsm->sm_monitored bit set.
2. nsm_mointor(host-Y) called => nsm->sm_monitored already set,
we just exit. Thus in namespace B ln->nsm_clnt == NULL.
3. host X destroyed => nsm->sm_count decremented to 1
4. host Y destroyed => nsm_unmonitor() => nsm_mon_unmon() => NULL-ptr
dereference of *ln->nsm_clntSo this could be fixed by making per-net nsm_handles list,
instead of global. Thus different net namespaces will not be able
share the same nsm_handle.Signed-off-by: Andrey Ryabinin
Cc:
Signed-off-by: J. Bruce Fields
01 Mar, 2013
1 commit
-
Pull nfsd changes from J Bruce Fields:
"Miscellaneous bugfixes, plus:- An overhaul of the DRC cache by Jeff Layton. The main effect is
just to make it larger. This decreases the chances of intermittent
errors especially in the UDP case. But we'll need to watch for any
reports of performance regressions.- Containerized nfsd: with some limitations, we now support
per-container nfs-service, thanks to extensive work from Stanislav
Kinsbursky over the last year."Some notes about conflicts, since there were *two* non-data semantic
conflicts here:- idr_remove_all() had been added by a memory leak fix, but has since
become deprecated since idr_destroy() does it for us now.- xs_local_connect() had been added by this branch to make AF_LOCAL
connections be synchronous, but in the meantime Trond had changed the
calling convention in order to avoid a RCU dereference.There were a couple of more obvious actual source-level conflicts due to
the hlist traversal changes and one just due to code changes next to
each other, but those were trivial.* 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits)
SUNRPC: make AF_LOCAL connect synchronous
nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum
svcrpc: fix rpc server shutdown races
svcrpc: make svc_age_temp_xprts enqueue under sv_lock
lockd: nlmclnt_reclaim(): avoid stack overflow
nfsd: enable NFSv4 state in containers
nfsd: disable usermode helper client tracker in container
nfsd: use proper net while reading "exports" file
nfsd: containerize NFSd filesystem
nfsd: fix comments on nfsd_cache_lookup
SUNRPC: move cache_detail->cache_request callback call to cache_read()
SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function
SUNRPC: rework cache upcall logic
SUNRPC: introduce cache_detail->cache_request callback
NFS: simplify and clean cache library
NFS: use SUNRPC cache creation and destruction helper for DNS cache
nfsd4: free_stid can be static
nfsd: keep a checksum of the first 256 bytes of request
sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer
sunrpc: fix comment in struct xdr_buf definition
...
28 Feb, 2013
1 commit
-
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;type T;
expression a,c,d,e;
identifier b;
statement S;
@@-T b;
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin
Acked-by: Paul E. McKenney
Signed-off-by: Sasha Levin
Cc: Wu Fengguang
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
05 Feb, 2013
1 commit
-
These routines are used by server and client code, so having them in a
separate header would be best.Signed-off-by: Jeff Layton
Acked-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
05 Nov, 2012
1 commit
-
- Convert the non-trivial ones into WARN_ON_ONCE().
- Remove the trivial refcounting BUGsSigned-off-by: Trond Myklebust
28 Jul, 2012
7 commits
-
This is a cleanup patch - makes code looks simplier.
It replaces widely used rqstp->rq_xprt->xpt_net by introduced SVC_NET(rqstp).Signed-off-by: Stanislav Kinsbursky
Signed-off-by: J. Bruce Fields -
Signed-off-by: Stanislav Kinsbursky
Signed-off-by: J. Bruce Fields -
Just a small cleanup.
Signed-off-by: Stanislav Kinsbursky
Signed-off-by: J. Bruce Fields -
This patch introduces moves nrhosts in per-net data.
It also adds kernel warning to nlm_shutdown_hosts_net() about remaining hosts
in specified network namespace context.Signed-off-by: Stanislav Kinsbursky
Signed-off-by: J. Bruce Fields -
This patch moves next_gc to per-net data.
Note: passed network can be NULL (when Lockd kthread is exiting of Lockd
module is removing).Signed-off-by: Stanislav Kinsbursky
Signed-off-by: J. Bruce Fields -
Signed-off-by: J. Bruce Fields
-
This is required for per-network NLM shutdown and cleanup.
This patch passes init_net for a while.Signed-off-by: Stanislav Kinsbursky
Signed-off-by: J. Bruce Fields
15 Feb, 2012
2 commits
-
Lockd now managed in network namespace context. And this patch introduces
network namespace related NLM hosts shutdown in case of releasing per-net Lockd
resources.Signed-off-by: Stanislav Kinsbursky
Signed-off-by: Trond Myklebust -
This object depends on RPC client, and thus on network namespace.
So let's make it's allocation and lookup in network namespace context.Signed-off-by: Stanislav Kinsbursky
Signed-off-by: Trond Myklebust
14 Sep, 2011
1 commit
-
For IPv6 local address, lockd can not callback to client for
missing scope id when binding address at inet6_bind:324 if (addr_type & IPV6_ADDR_LINKLOCAL) {
325 if (addr_len >= sizeof(struct sockaddr_in6) &&
326 addr->sin6_scope_id) {
327 /* Override any existing binding, if another one
328 * is supplied by user.
329 */
330 sk->sk_bound_dev_if = addr->sin6_scope_id;
331 }
332
333 /* Binding to link-local address requires an interface */
334 if (!sk->sk_bound_dev_if) {
335 err = -EINVAL;
336 goto out_unlock;
337 }Replacing svc_addr_u by sockaddr_storage, let rqstp->rq_daddr contains more info
besides address.Reviewed-by: Jeff Layton
Reviewed-by: Chuck Lever
Signed-off-by: Mi Jinlong
Signed-off-by: J. Bruce Fields
26 Jan, 2011
1 commit
-
Nick Bowler reports:
> We were just having some NFS server troubles, and my client machine
> running 2.6.38-rc1+ (specifically, commit 2b1caf6ed7b888c95) crashed
> hard (syslog output appended to this mail).
>
> I'm not sure what the exact timeline was or how to reproduce this,
> but the server was rebooted during all this. Since I've never seen
> this happen before, it is possibly a regression from previous kernel
> releases. However, I recently updated my nfs-utils (on the client) to
> version 1.2.3, so that might be related as well.[ BUG output redacted ]
When done searching, the for_each_host loop in next_host_state() falls
through and returns the final host on the host chain without bumping
it's reference count.Since the host's ref count is only one at that point, releasing the
host in nlm_host_rebooted() attempts to destroy the host prematurely,
and therefore hits a BUG().Likely, the original intent of the for_each_host behavior in
next_host_state() was to handle the case when the host chain is empty.
Searching the chain and finding no suitable host to return needs to be
handled as well.Defensively restructure next_host_state() always to return NULL when
the loop falls through.Introduced by commit b10e30f6 "lockd: reorganize nlm_host_rebooted".
Cc: J. Bruce Fields
Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust
05 Jan, 2011
1 commit
-
We unlock again after we goto out.
Signed-off-by: Dan Carpenter
Signed-off-by: Trond Myklebust
17 Dec, 2010
10 commits
-
Clean up.
The contents of the src_sap field is not used in nlm_alloc_host().
Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust -
Clean up.
Remove the now unused helper nlm_lookup_host().
Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust -
Clean up.
Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust -
Clean up.
nlm_hosts now contains only server-side entries. Rename it to match
convention of client side cache.Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust -
Clean up.
Change nlmsvc_lookup_host() to be purpose-built for server-side
nlm_host management. This replaces the generic nlm_lookup_host()
helper function, just like on the client side. The lookup logic is
specialized for server host lookups.The server side cache also gets its own specialized equivalent of the
nlm_release_host() function.Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust -
NFS clients don't need the garbage collection processing that is
performed on nlm_host structures. The client picks up an nlm_host at
mount time and holds a reference to it until the file system is
unmounted.Servers, on the other hand, don't have a precise way to tell when an
nlm_host is no longer being used, so zero refcount nlm_host entries
are left to expire in the cache after a time.Basically there's nothing holding a reference to an nlm_host between
individual server-side NLM requests, but we can't afford the expense
of recreating them for every new NLM request from a client. The
nlm_host cache adds some lifetime hysteresis to entries in the cache
so the next time a particular nlm_host is needed, it's likely to be
discovered by a lookup rather than created from whole cloth.With the new implementation, client nlm_host cache items are no longer
garbage collected, and are destroyed directly by a new release
function specialized for client entries, nlmclnt_release_host(). They
are cached in their own data structure, and have their own lookup
logic, simplified and specialized for client nlm_host entries.However, the client nlm_host cache still shares reboot recovery logic
with the server nlm_host cache. The NSM "peer rebooted" downcall for
clients and servers still come through the same RPC call. This is a
legacy formal API that would be difficult to alter, and besides, the
user space NSM implementation can't tell the difference between peers
that are clients or servers.For this reason, the client cache continues to share the
nlm_host_mutex (and reboot recovery logic) with the server cache.Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust -
Refactor the tail of nlm_gc_hosts() into nlm_destroy_host() so that
this logic can be used separately from garbage collection.Rename it _locked() to document that it must be called with the hosts
cache mutex held.Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust -
Refactor nlm_host allocation and initialization into a separate
function. This will be the common piece of server and client nlm_host
lookup logic after the nlm_host cache is split.Small change: use kmalloc() instead of kzalloc(), as we're overwriting
almost all fields in the new nlm_host struct with non-zero values
immediately after it is allocated. An added benefit is we now have an
explicit reference to each field name where it is initialized (for all
you cscope fans out there).Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust -
Minor reorganization; no change in behavior. This will save some
duplicated code after we split the client and server host caches.Signed-off-by: J. Bruce Fields
[ cel: Forward-ported to 2.6.37 ]
Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust -
We've got a lot of loops like this, and I find them a little easier to
read with the macros. More such loops are coming.Signed-off-by: J. Bruce Fields
[ cel: Forward-ported to 2.6.37 ]
Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust