30 Dec, 2020

1 commit

  • [ Upstream commit 9b82d88d5976e5f2b8015d58913654856576ace5 ]

    NLM uses an interval-based rebinding, i.e. it clears the transport's
    binding under certain conditions if more than 60 seconds have elapsed
    since the connection was last bound.

    This rebinding is not necessary for an autobind RPC client over a
    connection-oriented protocol like TCP.

    It can also cause problems: it is possible for nlm_bind_host() to clear
    XPRT_BOUND whilst a connection worker is in the middle of trying to
    reconnect, after it had already been checked in xprt_connect().

    When the connection worker notices that XPRT_BOUND has been cleared
    under it, in xs_tcp_finish_connecting(), that results in:

    xs_tcp_setup_socket: connect returned unhandled error -107

    Worse, it's possible that the two can get into lockstep, resulting in
    the same behaviour repeated indefinitely, with the above error every
    300 seconds, without ever recovering, and the connection never being
    established. This has been seen in practice, with a large number of NLM
    client tasks, following a server restart.

    The existing callers of nlm_bind_host & nlm_rebind_host should not need
    to force the rebind, for TCP, so restrict the interval-based rebinding
    to UDP only.

    For TCP, we will still rebind when needed, e.g. on timeout, and connection
    error (including closure), since connection-related errors on an existing
    connection, ECONNREFUSED when trying to connect, and rpc_check_timeout(),
    already unconditionally clear XPRT_BOUND.

    To avoid having to add the fix, and explanation, to both nlm_bind_host()
    and nlm_rebind_host(), remove the duplicate code from the former, and
    have it call the latter.

    Drop the dprintk, which adds no value over a trace.
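
The rebinding policy described above can be sketched in plain C. This is a minimal userspace model, not the kernel code: `struct host`, `enum proto`, `rebind_host()` and `bind_host()` are hypothetical stand-ins for the nlm_host state and for nlm_rebind_host()/nlm_bind_host(), and time is a plain seconds counter rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the transport state; not the kernel structs. */
enum proto { PROTO_UDP, PROTO_TCP };

struct host {
    enum proto proto;
    bool bound;                /* models XPRT_BOUND */
    unsigned long next_rebind; /* earliest time (seconds) of the next forced rebind */
};

#define REBIND_INTERVAL 60UL   /* seconds, as stated in the commit text */

/* Clear the binding if the interval has elapsed; returns true if cleared. */
bool rebind_host(struct host *h, unsigned long now)
{
    assert(h);
    /* Connection-oriented transports already rebind on errors and timeouts. */
    if (h->proto == PROTO_TCP)
        return false;
    if (now < h->next_rebind)
        return false;
    h->bound = false;
    h->next_rebind = now + REBIND_INTERVAL;
    return true;
}

/* The bind path calls the rebind helper instead of duplicating the check. */
bool bind_host(struct host *h, unsigned long now)
{
    return rebind_host(h, now);
}
```

In this model a TCP host is never unbound by the interval timer, so a connection worker cannot see the bound flag cleared under it by this path.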

    Signed-off-by: Calum Mackay
    Fixes: 35f5a422ce1a ("SUNRPC: new interface to force an RPC rebind")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Sasha Levin

    Calum Mackay
     

04 Nov, 2019

1 commit

  • NFSv2, v3 and NFSv4 servers often have duplicate replay caches that look
    at the source port when deciding whether or not an RPC call is a replay
    of a previous call. This requires clients to perform strange TCP gymnastics
    in order to ensure that when they reconnect to the server, they bind
    to the same source port.

    NFSv4.1 and NFSv4.2 have sessions that provide proper replay semantics,
    that do not look at the source port of the connection. This patch therefore
    ensures they can ignore the rebind requirement.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

27 Apr, 2019

2 commits


19 Mar, 2019

1 commit

  • If the last NFSv3 unmount from a given host races with a mount from the
    same host, we can destroy an nlm_host that is still in use.

    Specifically nlmclnt_lookup_host() can increment h_count on
    an nlm_host that nlmclnt_release_host() has just successfully called
    refcount_dec_and_test() on.
    Once nlmclnt_lookup_host() drops the mutex, nlm_destroy_host_lock()
    will be called to destroy the nlmclnt which is now in use again.

    The cause of the problem is that the dec_and_test happens outside the
    locked region. This is easily fixed by using
    refcount_dec_and_mutex_lock().
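
The decrement-under-lock idea can be modelled in userspace with C11 atomics and a pthread mutex. This is only a sketch under stated assumptions: `dec_not_one()`, `dec_and_mutex_lock()`, `lookup_host()` and `release_host()` are hypothetical names that mimic the shape of the kernel helpers, and the `destroyed` flag stands in for the real teardown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct host {
    atomic_int count;
    bool destroyed;   /* stands in for the real teardown */
};

static pthread_mutex_t host_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Decrement only if the result stays above zero; true if decremented. */
static bool dec_not_one(atomic_int *r)
{
    int old = atomic_load(r);

    while (old > 1) {
        if (atomic_compare_exchange_weak(r, &old, old - 1))
            return true;
    }
    return false;
}

/* Returns true, with the mutex held, only if the count dropped to zero. */
bool dec_and_mutex_lock(atomic_int *r, pthread_mutex_t *m)
{
    if (dec_not_one(r))
        return false;
    pthread_mutex_lock(m);
    if (atomic_fetch_sub(r, 1) != 1) {
        pthread_mutex_unlock(m);
        return false;
    }
    return true;
}

/* Lookup takes its reference under the same mutex. */
void lookup_host(struct host *h)
{
    pthread_mutex_lock(&host_mutex);
    assert(atomic_load(&h->count) > 0); /* a zero-count host is never visible here */
    atomic_fetch_add(&h->count, 1);
    pthread_mutex_unlock(&host_mutex);
}

/* Release destroys the host only when the final decrement happened under the mutex. */
bool release_host(struct host *h)
{
    if (!dec_and_mutex_lock(&h->count, &host_mutex))
        return false;
    h->destroyed = true;
    pthread_mutex_unlock(&host_mutex);
    return true;
}
```

Because the final decrement and the teardown share one critical section, a concurrent lookup either runs before the count reaches zero or after the host is gone from the cache.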

    Fixes: 8ea6ecc8b075 ("lockd: Create client-side nlm_host cache")
    Cc: stable@vger.kernel.org (v2.6.38+)
    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    NeilBrown
     

30 Oct, 2018

1 commit


25 Jan, 2018

1 commit

  • The server shouldn't actually delete the struct nlm_host until it hits
    the garbage collector. In order to make that work correctly with the
    refcount API, we can bump the refcount by one, and then use
    refcount_dec_if_one() in the garbage collector.
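
A userspace sketch of the "biased by one" scheme follows, using C11 atomics. The names `host_init()`, `host_get()`, `host_put()` and `gc_try_reap()` are hypothetical; only the idea of a dec-if-one step in the garbage collector comes from the commit text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct host {
    atomic_int count;   /* 1 means idle: only the cache's own reference remains */
};

void host_init(struct host *h)
{
    atomic_init(&h->count, 1);   /* the extra reference held on behalf of the cache */
}

void host_get(struct host *h)
{
    atomic_fetch_add(&h->count, 1);
}

/* Users drop their reference; this never frees the host. */
void host_put(struct host *h)
{
    int old = atomic_fetch_sub(&h->count, 1);

    assert(old > 1);   /* the cache's reference must still be there */
}

/* Move 1 -> 0 atomically; fails if anyone else holds a reference. */
bool dec_if_one(atomic_int *r)
{
    int expected = 1;

    return atomic_compare_exchange_strong(r, &expected, 0);
}

/* Garbage collector: reap only hosts that are idle. */
bool gc_try_reap(struct host *h)
{
    return dec_if_one(&h->count);
}
```

With this layout the count never touches zero outside the garbage collector, so ordinary get/put pairs cannot trigger a free.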

    Signed-off-by: Trond Myklebust
    Acked-by: J. Bruce Fields

    Trond Myklebust
     

15 Jan, 2018

2 commits

  • atomic_t variables are currently used to implement reference
    counters with the following properties:
    - counter is initialized to 1 using atomic_set()
    - a resource is freed upon counter reaching zero
    - once counter reaches zero, its further
      increments aren't allowed
    - counter schema uses basic atomic operations
      (set, inc, inc_not_zero, dec_and_test, etc.)

    Such atomic variables should be converted to a newly provided
    refcount_t type and API that prevents accidental counter overflows
    and underflows. This is important since overflows and underflows
    can lead to use-after-free situations and be exploitable.

    The variable nsm_handle.sm_count is used as pure reference counter.
    Convert it to refcount_t and fix up the operations.
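
The overflow protection mentioned above can be illustrated with a saturating counter. This is a deliberately simplified, single-threaded sketch: `struct ref`, `REF_SATURATED`, `ref_inc_not_zero()` and `ref_dec_and_test()` are hypothetical names, and the real refcount_t uses atomic operations and its own saturation value.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define REF_SATURATED UINT_MAX   /* hypothetical ceiling for this sketch */

struct ref {
    unsigned int v;
};

/* Take a reference unless the object is already dead (count == 0). */
bool ref_inc_not_zero(struct ref *r)
{
    if (r->v == 0)
        return false;          /* increments after zero are refused */
    if (r->v != REF_SATURATED)
        r->v++;                /* pins at the ceiling instead of wrapping to 0 */
    return true;
}

/* Drop a reference; true means the caller must free the object. */
bool ref_dec_and_test(struct ref *r)
{
    if (r->v == REF_SATURATED)
        return false;          /* saturated: leak the object rather than free it */
    assert(r->v > 0);          /* underflow would be a bug */
    return --r->v == 0;
}
```

A saturated counter trades a possible memory leak for the guarantee that the object is never freed while references may still exist.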

    **Important note for maintainers:

    Some functions from refcount_t API defined in lib/refcount.c
    have different memory ordering guarantees than their atomic
    counterparts.
    The full comparison can be seen in
    https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
    in state to be merged to the documentation tree.
    Normally the differences should not matter since refcount_t provides
    enough guarantees to satisfy the refcounting use cases, but in
    some rare cases it might matter.
    Please double check that you don't have some undocumented
    memory guarantees for this variable usage.

    For the nsm_handle.sm_count it might make a difference
    in following places:
    - nsm_release(): decrement in refcount_dec_and_lock() only
      provides RELEASE ordering, control dependency on success
      and holds a spin lock on success vs. fully ordered atomic
      counterpart. No change for the spin lock guarantees.

    Suggested-by: Kees Cook
    Reviewed-by: David Windsor
    Reviewed-by: Hans Liljestrand
    Signed-off-by: Elena Reshetova
    Signed-off-by: Trond Myklebust

    Elena Reshetova
     
  • atomic_t variables are currently used to implement reference
    counters with the following properties:
    - counter is initialized to 1 using atomic_set()
    - a resource is freed upon counter reaching zero
    - once counter reaches zero, its further
      increments aren't allowed
    - counter schema uses basic atomic operations
      (set, inc, inc_not_zero, dec_and_test, etc.)

    Such atomic variables should be converted to a newly provided
    refcount_t type and API that prevents accidental counter overflows
    and underflows. This is important since overflows and underflows
    can lead to use-after-free situations and be exploitable.

    The variable nlm_host.h_count is used as pure reference counter.
    Convert it to refcount_t and fix up the operations.

    **Important note for maintainers:

    Some functions from refcount_t API defined in lib/refcount.c
    have different memory ordering guarantees than their atomic
    counterparts.
    The full comparison can be seen in
    https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
    in state to be merged to the documentation tree.
    Normally the differences should not matter since refcount_t provides
    enough guarantees to satisfy the refcounting use cases, but in
    some rare cases it might matter.
    Please double check that you don't have some undocumented
    memory guarantees for this variable usage.

    For the nlm_host.h_count it might make a difference
    in following places:
    - nlmsvc_release_host(): decrement in refcount_dec()
      provides RELEASE ordering, while original atomic_dec()
      was fully unordered. Since the change is for better, it
      should not matter.
    - nlmclnt_release_host(): decrement in refcount_dec_and_test() only
      provides RELEASE ordering and control dependency on success
      vs. fully ordered atomic counterpart. It doesn't seem to
      matter in this case since object freeing happens under mutex
      lock anyway.

    Suggested-by: Kees Cook
    Reviewed-by: David Windsor
    Reviewed-by: Hans Liljestrand
    Signed-off-by: Elena Reshetova
    Signed-off-by: Trond Myklebust

    Elena Reshetova
     

28 Nov, 2017

2 commits

  • nlm_complain_hosts() walks through nlm_server_hosts hlist, which should
    be protected by nlm_host_mutex.

    Signed-off-by: Vasily Averin
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Vasily Averin
     
  • Publishing the net pointer is not safe;
    use net->ns.inum as the net ID in debug messages.

    [ 171.757678] lockd_up_net: per-net data created; net=f00001e7
    [ 171.767188] NFSD: starting 90-second grace period (net f00001e7)
    [ 300.653313] lockd: nuking all hosts in net f00001e7...
    [ 300.653641] lockd: host garbage collection for net f00001e7
    [ 300.653968] lockd: nlmsvc_mark_resources for net f00001e7
    [ 300.711483] lockd_down_net: per-net data destroyed; net=f00001e7
    [ 300.711847] lockd: nuking all hosts in net 0...
    [ 300.711847] lockd: host garbage collection for net 0
    [ 300.711848] lockd: nlmsvc_mark_resources for net 0

    Signed-off-by: Vasily Averin
    Signed-off-by: J. Bruce Fields

    Vasily Averin
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

24 Oct, 2015

1 commit

  • Currently we have a reference-counted per-net NSM RPC client
    which is created on the first monitor request and destroyed
    after the last unmonitor request. It's needed because the
    RPC client needs to know 'utsname()->nodename', but utsname()
    might be NULL when nsm_unmonitor() is called.

    So instead of holding the rpc client we could just save nodename
    in struct nlm_host and pass it to rpc_create().
    Thus there is no need to keep the rpc client until the last
    unmonitor request. We could create separate RPC clients
    for each monitor/unmonitor request.

    Signed-off-by: Andrey Ryabinin
    Signed-off-by: J. Bruce Fields

    Andrey Ryabinin
     

13 Oct, 2015

1 commit

  • Commit cb7323fffa85 ("lockd: create and use per-net NSM
    RPC clients on MON/UNMON requests") introduced per-net
    NSM RPC clients. Unfortunately this doesn't make any sense
    without per-net nsm_handle.

    E.g. the following scenario could happen
    Two hosts (X and Y) in different namespaces (A and B) share
    the same nsm struct.

    1. nsm_monitor(host_X) called => NSM rpc client created,
    nsm->sm_monitored bit set.
    2. nsm_monitor(host_Y) called => nsm->sm_monitored already set,
    we just exit. Thus in namespace B ln->nsm_clnt == NULL.
    3. host X destroyed => nsm->sm_count decremented to 1
    4. host Y destroyed => nsm_unmonitor() => nsm_mon_unmon() => NULL-ptr
    dereference of *ln->nsm_clnt

    So this could be fixed by making the nsm_handles list per-net
    instead of global. Thus different net namespaces will not be able
    to share the same nsm_handle.
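
The effect of a per-net handle list can be shown with a small userspace model. Everything here is an assumption made for illustration: `struct lockd_net`, the fixed-size array, and `nsm_get_handle()` keyed by a peer name are hypothetical and far simpler than the kernel's lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HANDLES 8

struct nsm_handle {
    char name[32];
    int monitored;
};

/* Hypothetical per-namespace state: each net owns its own handle list. */
struct lockd_net {
    struct nsm_handle handles[MAX_HANDLES];
    size_t nr;
};

/* Find or create the handle for a peer, searching only this namespace. */
struct nsm_handle *nsm_get_handle(struct lockd_net *ln, const char *name)
{
    assert(ln && name);
    for (size_t i = 0; i < ln->nr; i++)
        if (strcmp(ln->handles[i].name, name) == 0)
            return &ln->handles[i];

    if (ln->nr == MAX_HANDLES)
        return NULL;

    struct nsm_handle *h = &ln->handles[ln->nr++];
    strncpy(h->name, name, sizeof(h->name) - 1);
    h->name[sizeof(h->name) - 1] = '\0';
    h->monitored = 0;
    return h;
}
```

Two namespaces that talk to the same peer now get distinct handles, so the monitored flag set in one namespace cannot short-circuit monitoring in the other.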

    Signed-off-by: Andrey Ryabinin
    Cc:
    Signed-off-by: J. Bruce Fields

    Andrey Ryabinin
     

01 Mar, 2013

1 commit

  • Pull nfsd changes from J Bruce Fields:
    "Miscellaneous bugfixes, plus:

    - An overhaul of the DRC cache by Jeff Layton. The main effect is
    just to make it larger. This decreases the chances of intermittent
    errors especially in the UDP case. But we'll need to watch for any
    reports of performance regressions.

    - Containerized nfsd: with some limitations, we now support
    per-container nfs-service, thanks to extensive work from Stanislav
    Kinsbursky over the last year."

    Some notes about conflicts, since there were *two* non-data semantic
    conflicts here:

    - idr_remove_all() had been added by a memory leak fix, but has since
    become deprecated since idr_destroy() does it for us now.

    - xs_local_connect() had been added by this branch to make AF_LOCAL
    connections be synchronous, but in the meantime Trond had changed the
    calling convention in order to avoid a RCU dereference.

    There were a couple of more obvious actual source-level conflicts due to
    the hlist traversal changes and one just due to code changes next to
    each other, but those were trivial.

    * 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits)
    SUNRPC: make AF_LOCAL connect synchronous
    nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum
    svcrpc: fix rpc server shutdown races
    svcrpc: make svc_age_temp_xprts enqueue under sv_lock
    lockd: nlmclnt_reclaim(): avoid stack overflow
    nfsd: enable NFSv4 state in containers
    nfsd: disable usermode helper client tracker in container
    nfsd: use proper net while reading "exports" file
    nfsd: containerize NFSd filesystem
    nfsd: fix comments on nfsd_cache_lookup
    SUNRPC: move cache_detail->cache_request callback call to cache_read()
    SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function
    SUNRPC: rework cache upcall logic
    SUNRPC: introduce cache_detail->cache_request callback
    NFS: simplify and clean cache library
    NFS: use SUNRPC cache creation and destruction helper for DNS cache
    nfsd4: free_stid can be static
    nfsd: keep a checksum of the first 256 bytes of request
    sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer
    sunrpc: fix comment in struct xdr_buf definition
    ...

    Linus Torvalds
     

28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.
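
An entry iterator without the extra node cursor can be sketched in userspace as follows. This is an illustration, not the kernel macro: `sketch_hlist_for_each_entry` and `entry_or_null` are hypothetical names, the node type omits the kernel's pprev pointer, and `__typeof__` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified node: the kernel's hlist_node also carries a pprev pointer. */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

/* Map a node pointer back to its containing object, or NULL at list end. */
#define entry_or_null(ptr, type, member) \
    ((ptr) ? (type *)((char *)(ptr) - offsetof(type, member)) : NULL)

/* Three-argument form: the object pointer itself is the only cursor. */
#define sketch_hlist_for_each_entry(pos, head, member)                      \
    for (pos = entry_or_null((head)->first, __typeof__(*(pos)), member);    \
         pos;                                                               \
         pos = entry_or_null((pos)->member.next, __typeof__(*(pos)), member))

struct item {
    int id;
    struct hlist_node link;
};

void push_front(struct hlist_head *head, struct item *it)
{
    assert(head && it);
    it->link.next = head->first;
    head->first = &it->link;
}

int sum_ids(struct hlist_head *head)
{
    struct item *it;
    int sum = 0;

    sketch_hlist_for_each_entry(it, head, link)
        sum += it->id;
    return sum;
}
```

The call site in `sum_ids()` has the same shape as a list_for_each_entry() loop, which is the symmetry the change is after.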

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small number of places were using the 'node' parameter; these
    were modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foudnation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

05 Feb, 2013

1 commit


05 Nov, 2012

1 commit


28 Jul, 2012

7 commits


15 Feb, 2012

2 commits


14 Sep, 2011

1 commit

  • For an IPv6 link-local address, lockd cannot call back to the client
    because the scope id is missing when binding the address in inet6_bind:

    if (addr_type & IPV6_ADDR_LINKLOCAL) {
            if (addr_len >= sizeof(struct sockaddr_in6) &&
                addr->sin6_scope_id) {
                    /* Override any existing binding, if another one
                     * is supplied by user.
                     */
                    sk->sk_bound_dev_if = addr->sin6_scope_id;
            }

            /* Binding to link-local address requires an interface */
            if (!sk->sk_bound_dev_if) {
                    err = -EINVAL;
                    goto out_unlock;
            }

    Replace svc_addr_u with sockaddr_storage so that rqstp->rq_daddr can
    carry more information than just the address.

    Reviewed-by: Jeff Layton
    Reviewed-by: Chuck Lever
    Signed-off-by: Mi Jinlong
    Signed-off-by: J. Bruce Fields

    Mi Jinlong
     

26 Jan, 2011

1 commit

  • Nick Bowler reports:

    > We were just having some NFS server troubles, and my client machine
    > running 2.6.38-rc1+ (specifically, commit 2b1caf6ed7b888c95) crashed
    > hard (syslog output appended to this mail).
    >
    > I'm not sure what the exact timeline was or how to reproduce this,
    > but the server was rebooted during all this. Since I've never seen
    > this happen before, it is possibly a regression from previous kernel
    > releases. However, I recently updated my nfs-utils (on the client) to
    > version 1.2.3, so that might be related as well.

    [ BUG output redacted ]

    When done searching, the for_each_host loop in next_host_state() falls
    through and returns the final host on the host chain without bumping
    its reference count.

    Since the host's ref count is only one at that point, releasing the
    host in nlm_host_rebooted() attempts to destroy the host prematurely,
    and therefore hits a BUG().

    Likely, the original intent of the for_each_host behavior in
    next_host_state() was to handle the case when the host chain is empty.
    Searching the chain and finding no suitable host to return needs to be
    handled as well.

    Defensively restructure next_host_state() always to return NULL when
    the loop falls through.
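
The defensive shape described above can be sketched in plain C. This is a simplified model with hypothetical fields (`needs_recovery`, `refs`) on a singly linked chain; the real function walks hash chains with for_each_host and also deals with locking.

```c
#include <assert.h>
#include <stddef.h>

struct host {
    int needs_recovery;   /* hypothetical flag: host still awaits reboot handling */
    int refs;
    struct host *next;
};

/*
 * Return the next host that needs recovery, with a reference taken,
 * or NULL when the search falls off the end of the chain.
 */
struct host *next_host_state(struct host *chain)
{
    struct host *h;

    for (h = chain; h != NULL; h = h->next) {
        if (h->needs_recovery) {
            h->needs_recovery = 0;
            h->refs++;          /* caller owns this reference */
            return h;
        }
    }
    /* Covers both the empty chain and "nothing suitable found". */
    return NULL;
}

/* Caller side: every non-NULL result is matched by exactly one release. */
int recover_all(struct host *chain)
{
    struct host *h;
    int handled = 0;

    while ((h = next_host_state(chain)) != NULL) {
        handled++;
        assert(h->refs > 1);
        h->refs--;              /* release the reference taken above */
    }
    return handled;
}
```

Returning NULL explicitly after the loop means the caller never releases a host whose reference count was not bumped.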

    Introduced by commit b10e30f6 "lockd: reorganize nlm_host_rebooted".

    Cc: J. Bruce Fields
    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     

05 Jan, 2011

1 commit


17 Dec, 2010

10 commits

  • Clean up.

    The contents of the src_sap field is not used in nlm_alloc_host().

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean up.

    Remove the now unused helper nlm_lookup_host().

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean up.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean up.

    nlm_hosts now contains only server-side entries. Rename it to match
    convention of client side cache.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean up.

    Change nlmsvc_lookup_host() to be purpose-built for server-side
    nlm_host management. This replaces the generic nlm_lookup_host()
    helper function, just like on the client side. The lookup logic is
    specialized for server host lookups.

    The server side cache also gets its own specialized equivalent of the
    nlm_release_host() function.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • NFS clients don't need the garbage collection processing that is
    performed on nlm_host structures. The client picks up an nlm_host at
    mount time and holds a reference to it until the file system is
    unmounted.

    Servers, on the other hand, don't have a precise way to tell when an
    nlm_host is no longer being used, so zero refcount nlm_host entries
    are left to expire in the cache after a time.

    Basically there's nothing holding a reference to an nlm_host between
    individual server-side NLM requests, but we can't afford the expense
    of recreating them for every new NLM request from a client. The
    nlm_host cache adds some lifetime hysteresis to entries in the cache
    so the next time a particular nlm_host is needed, it's likely to be
    discovered by a lookup rather than created from whole cloth.

    With the new implementation, client nlm_host cache items are no longer
    garbage collected, and are destroyed directly by a new release
    function specialized for client entries, nlmclnt_release_host(). They
    are cached in their own data structure, and have their own lookup
    logic, simplified and specialized for client nlm_host entries.

    However, the client nlm_host cache still shares reboot recovery logic
    with the server nlm_host cache. The NSM "peer rebooted" downcall for
    clients and servers still come through the same RPC call. This is a
    legacy formal API that would be difficult to alter, and besides, the
    user space NSM implementation can't tell the difference between peers
    that are clients or servers.

    For this reason, the client cache continues to share the
    nlm_host_mutex (and reboot recovery logic) with the server cache.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Refactor the tail of nlm_gc_hosts() into nlm_destroy_host() so that
    this logic can be used separately from garbage collection.

    Rename it _locked() to document that it must be called with the hosts
    cache mutex held.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Refactor nlm_host allocation and initialization into a separate
    function. This will be the common piece of server and client nlm_host
    lookup logic after the nlm_host cache is split.

    Small change: use kmalloc() instead of kzalloc(), as we're overwriting
    almost all fields in the new nlm_host struct with non-zero values
    immediately after it is allocated. An added benefit is we now have an
    explicit reference to each field name where it is initialized (for all
    you cscope fans out there).

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Minor reorganization; no change in behavior. This will save some
    duplicated code after we split the client and server host caches.

    Signed-off-by: J. Bruce Fields
    [ cel: Forward-ported to 2.6.37 ]
    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    J. Bruce Fields
     
  • We've got a lot of loops like this, and I find them a little easier to
    read with the macros. More such loops are coming.

    Signed-off-by: J. Bruce Fields
    [ cel: Forward-ported to 2.6.37 ]
    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    J. Bruce Fields