26 Sep, 2020

1 commit

  • It was an interesting idea but nobody seems to be using it, it's buggy
    at this point, and nfs4state.c is already complicated enough without it.
    The new nfsd/clients/ code provides some of the same functionality, and
    could probably do more if desired.

    This feature has been deprecated since 9d60d93198c6 ("Deprecate nfsd
    fault injection").

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

21 May, 2020

1 commit


20 Dec, 2019

3 commits

  • A couple of time_t variables are only used to track the state of the
    lease time and its expiration. The code correctly uses the 'time_after()'
    macro to make this work on 32-bit architectures even beyond year 2038,
    but the get_seconds() function and the time_t type itself are deprecated
    as they behave inconsistently between 32-bit and 64-bit architectures
    and often lead to code that is not y2038 safe.

    As a minor issue, using get_seconds() leads to problems with concurrent
    settimeofday() or clock_settime() calls; in the worst case a timeout never
    triggers after the time has been set backwards.

    Change nfsd to use time64_t and ktime_get_boottime_seconds() here. This
    is clearly excessive, as boottime by itself means we never go beyond 32
    bits, but it does mean we handle this correctly and consistently without
    having to worry about corner cases and should be no more expensive than
    the previous implementation on 64-bit architectures.

    The max_cb_time() function gets changed in order to avoid an expensive
    64-bit division operation, but as the lease time is at most one hour,
    there is no change in behavior.

    Also do the same for server-to-server copy expiration time.

    Signed-off-by: Arnd Bergmann
    [bfields@redhat.com: fix up copy expiration]
    Signed-off-by: J. Bruce Fields

    Arnd Bergmann
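
The wraparound-safe comparison that this message credits to time_after() can be illustrated in userspace. What follows is a minimal sketch, assuming a 32-bit counter; `after32` is a hypothetical stand-in written for this note and is not the kernel macro itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for time_after(): true if a is later
 * than b, assuming the two values are less than 2^31 ticks apart. */
bool after32(uint32_t a, uint32_t b)
{
    /* The unsigned subtraction wraps around. Reading the difference as
     * a signed value (two's complement on all common compilers) gives a
     * negative number exactly when a is ahead of b. */
    return (int32_t)(b - a) < 0;
}
```

A plain `a > b` would give the wrong answer once the counter has wrapped between the two samples, while the subtraction form still orders them correctly.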
     
  • The nfsd4_blocked_lock->nbl_time timestamp is recorded in jiffies,
    but then compared to a CLOCK_REALTIME timestamp later on, which makes
    no sense.

    For consistency with the other timestamps, change this to use a time_t.

    This is a change in behavior, which may cause regressions, but the
    current code is not sensible. On a system with CONFIG_HZ=1000,
    the 'time_after((unsigned long)nbl->nbl_time, (unsigned long)cutoff))'
    check is false for roughly the first 18 days of uptime and then true
    for the next 49 days.

    Fixes: 7919d0a27f1e ("nfsd: add a LRU list for blocked locks")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: J. Bruce Fields

    Arnd Bergmann
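
The two durations quoted above can be roughly cross-checked with a little arithmetic. This is a sketch under assumptions that the commit message does not spell out: a 32-bit tick counter, and a CLOCK_REALTIME value of about 1576800000 seconds (20 December 2019) being read as if it were a count of 1 ms ticks. `ticks_to_days` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a tick count to days at a given tick rate (HZ). */
double ticks_to_days(uint64_t ticks, unsigned int hz)
{
    assert(hz > 0);
    return (double)ticks / hz / 86400.0;
}
```

With HZ=1000, 1576800000 ticks come to 18.25 days, and the 2^32 ticks of a full 32-bit wrap come to about 49.7 days.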
     
  • The nii_time field gets truncated to 'time_t' on 32-bit architectures
    before printing.

    Remove the use of 'struct timespec' to produce the correct output
    beyond 2038.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: J. Bruce Fields

    Arnd Bergmann
     

10 Dec, 2019

2 commits

  • Given a universal address, mount the source server from the destination
    server. Use an internal mount. Call the NFS client nfs42_ssc_open to
    obtain the NFS struct file suitable for nfsd_copy_range.

    The ability to do "inter" server-to-server copy depends on the nfsd kernel
    parameter "inter_copy_offload_enable".

    Signed-off-by: Olga Kornievskaia

    Olga Kornievskaia
     
  • Introducing the COPY_NOTIFY operation.

    Create a new unique stateid that will keep track of the copy
    state and the upcoming READs that will use that stateid.
    Each associated parent stateid has a list of copy
    notify stateids. A copy notify structure makes a copy of
    the parent stateid and a clientid and will use it to look
    up the parent stateid during the READ request (suggested
    by Trond Myklebust).

    At nfs4_put_stid() time, we walk the list of the associated
    copy notify stateids and delete them.

    The laundromat thread traverses the globally stored copy notify
    stateids in the idr and removes any that haven't been referenced
    within the lease period.

    Return a single netaddr to advertise to the copy.

    Suggested-by: Trond Myklebust
    Signed-off-by: Olga Kornievskaia
    Signed-off-by: Andy Adamson

    Olga Kornievskaia
     

09 Nov, 2019

1 commit


10 Sep, 2019

1 commit

  • Version 2 upcalls will allow the nfsd to include a hash of the kerberos
    principal string in the Cld_Create upcall. If a principal is present in
    the svc_cred, then the hash will be included in the Cld_Create upcall.
    We attempt to use the svc_cred.cr_raw_principal (which is returned by
    gssproxy) first, and then fall back to using the svc_cred.cr_principal
    (which is returned by both gssproxy and rpc.svcgssd). Upon a subsequent
    restart, the hash will be returned in the Cld_Gracestart downcall and
    stored in the reclaim_str_hashtbl so it can be used when handling
    reclaim opens.

    Signed-off-by: Scott Mayhew
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
     

19 Aug, 2019

3 commits


13 Jul, 2019

1 commit

  • Pull driver core and debugfs updates from Greg KH:
    "Here is the "big" driver core and debugfs changes for 5.3-rc1

    It's a lot of different patches, all across the tree due to some api
    changes and lots of debugfs cleanups.

    Other than the debugfs cleanups, in this set of changes we have:

    - bus iteration function cleanups

    - scripts/get_abi.pl tool to display and parse Documentation/ABI
    entries in a simple way

    - cleanups to Documentation/ABI/ entries to make them parse easier
    due to typos and other minor things

    - default_attrs use for some ktype users

    - driver model documentation file conversions to .rst

    - compressed firmware file loading

    - deferred probe fixes

    All of these have been in linux-next for a while, with a bunch of
    merge issues that Stephen has been patient with me for"

    * tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (102 commits)
    debugfs: make error message a bit more verbose
    orangefs: fix build warning from debugfs cleanup patch
    ubifs: fix build warning after debugfs cleanup patch
    driver: core: Allow subsystems to continue deferring probe
    drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT
    arch_topology: Remove error messages on out-of-memory conditions
    lib: notifier-error-inject: no need to check return value of debugfs_create functions
    swiotlb: no need to check return value of debugfs_create functions
    ceph: no need to check return value of debugfs_create functions
    sunrpc: no need to check return value of debugfs_create functions
    ubifs: no need to check return value of debugfs_create functions
    orangefs: no need to check return value of debugfs_create functions
    nfsd: no need to check return value of debugfs_create functions
    lib: 842: no need to check return value of debugfs_create functions
    debugfs: provide pr_fmt() macro
    debugfs: log errors when something goes wrong
    drivers: s390/cio: Fix compilation warning about const qualifiers
    drivers: Add generic helper to match by of_node
    driver_find_device: Unify the match function with class_find_device()
    bus_find_device: Unify the match callback with class_find_device
    ...

    Linus Torvalds
     

04 Jul, 2019

4 commits

  • Decode the implementation ID and display it in nfsd/clients/#/info. It may
    help identify the client. It won't be used otherwise.

    (When this went into the protocol, I thought the implementation ID would
    be a slippery slope towards implementation-specific workarounds as with
    the http user-agent. But I guess I was wrong, the risk seems pretty low
    now.)

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • I plan to expose some information about nfsv4 clients here.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Keep a second reference count which is what is really used to decide
    when to free the client's memory.

    Next I'm going to add an nfsd/clients/ directory with a subdirectory for
    each NFSv4 client. File objects under nfsd/clients/ will hold these
    references.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Rename this to a more descriptive name: it counts the number of
    in-progress rpc's referencing this client.

    Next I'm going to add a second refcount with a slightly different use.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

03 Jul, 2019

1 commit

  • When calling debugfs functions, there is no need to ever check the
    return value. The function can work or not, but the code logic should
    never do something different based on this.

    Cc: "J. Bruce Fields"
    Cc: Jeff Layton
    Cc: linux-nfs@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman
    Link: https://lore.kernel.org/r/20190612152603.GB18440@kroah.com
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

24 Apr, 2019

1 commit


09 Apr, 2019

1 commit

  • If there are multiple callbacks queued, waiting for the callback
    slot when the callback gets shut down, then they all currently
    end up acting as if they hold the slot, and call
    nfsd4_cb_sequence_done() resulting in interesting side-effects.

    In addition, the 'retry_nowait' path in nfsd4_cb_sequence_done()
    causes a loop back to nfsd4_cb_prepare() without first freeing the
    slot, which causes a deadlock when nfsd41_cb_get_slot() gets called
    a second time.

    This patch therefore adds a boolean to track whether or not the
    callback did pick up the slot, so that it can do the right thing
    in these 2 cases.

    Cc: stable@vger.kernel.org
    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
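
The idea of tracking slot ownership with a boolean can be sketched as follows. This is a simplified single-slot model, not the nfsd code: `struct cb_slot`, `struct callback`, `cb_get_slot` and `cb_release_slot` are hypothetical names, and waiting is reduced to a false return value.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a single callback slot. */
struct cb_slot {
    bool busy;
};

struct callback {
    bool holds_slot;    /* set only when this callback took the slot */
};

/* Try to take the slot and record ownership on success. */
bool cb_get_slot(struct cb_slot *slot, struct callback *cb)
{
    if (cb->holds_slot)
        return true;    /* retry path: already ours, do not wait again */
    if (slot->busy)
        return false;   /* someone else holds it; the caller must wait */
    slot->busy = true;
    cb->holds_slot = true;
    return true;
}

/* Release the slot only if this callback actually holds it. */
void cb_release_slot(struct cb_slot *slot, struct callback *cb)
{
    if (!cb->holds_slot)
        return;         /* a waiter that never got the slot: nothing to do */
    cb->holds_slot = false;
    slot->busy = false;
}
```

The flag covers both cases described above: a callback that was only waiting cannot release a slot it never held, and a callback looping back through the prepare step does not block on a slot it already owns.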
     

20 Dec, 2018

1 commit

  • SUNRPC has two sorts of credentials, both of which appear as
    "struct rpc_cred".
    There are "generic credentials" which are supplied by clients
    such as NFS and passed in 'struct rpc_message' to indicate
    which user should be used to authorize the request, and there
    are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS
    which describe the credential to be sent over the wires.

    This patch replaces all the generic credentials by 'struct cred'
    pointers - the credential structure used throughout Linux.

    For machine credentials, there is a special 'struct cred *' pointer
    which is statically allocated and recognized where needed as
    having a special meaning. A look-up of a low-level cred will
    map this to a machine credential.

    Signed-off-by: NeilBrown
    Acked-by: J. Bruce Fields
    Signed-off-by: Anna Schumaker

    NeilBrown
     

26 Sep, 2018

2 commits

  • Upon receiving a request for async copy, create a new kthread. If we
    get an asynchronous request, make sure to copy the needed arguments/state
    from the stack before starting the copy. Then start the thread and reply
    back to the client indicating copy is asynchronous.

    nfsd_copy_file_range() will copy in a loop over the total number of
    bytes that need to be copied. In case a failure happens in the middle, we
    ignore the error and return how much we copied so far. Once done,
    create a work item for the callback workqueue and send CB_OFFLOAD with
    the results.

    The lifetime of the copy stateid is bound to the vfs copy. This way we
    don't need to keep the nfsd_net structure for the callback. We could
    keep it around longer so that an OFFLOAD_STATUS that came late would
    still get results, but clients should be able to deal without that.

    We handle OFFLOAD_CANCEL by sending a signal to the copy thread and
    calling kthread_stop.

    A client should cancel any ongoing copies before calling DESTROY_CLIENT;
    if not, we return a CLIENT_BUSY error.

    If the client is destroyed for some other reason (lease expiration, or
    server shutdown), we must clean up any ongoing copies ourselves.

    Signed-off-by: Olga Kornievskaia
    [colin.king@canonical.com: fix leak in error case]
    [bfields@fieldses.org: remove signalling, merge patches]
    Signed-off-by: J. Bruce Fields

    Olga Kornievskaia
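
The copy loop described above, which reports partial progress instead of a mid-copy error, might look roughly like this in userspace. It is a sketch only: `copy_chunk_fn`, `copy_range_loop` and `fail_at_limit` are hypothetical names, and returning the error when nothing at all was copied is an assumption made here, not something the message states.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-chunk copy callback: returns the number of bytes
 * copied, or a negative error code. */
typedef ssize_t (*copy_chunk_fn)(size_t offset, size_t len, void *ctx);

/* Copy `total` bytes in chunks. If a chunk fails after some progress,
 * the error is dropped and the partial byte count is returned. The
 * error is passed back only when nothing was copied (an assumption). */
ssize_t copy_range_loop(size_t total, size_t chunk, copy_chunk_fn fn, void *ctx)
{
    size_t done = 0;

    while (done < total) {
        size_t want = total - done < chunk ? total - done : chunk;
        ssize_t n = fn(done, want, ctx);

        if (n < 0)
            return done ? (ssize_t)done : n;
        if (n == 0)
            break;      /* no progress; stop rather than spin */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Example callback: succeeds until the offset reaches *limit, then fails. */
ssize_t fail_at_limit(size_t offset, size_t len, void *ctx)
{
    size_t limit = *(size_t *)ctx;

    if (offset >= limit)
        return -5;      /* arbitrary negative error for the demo */
    return (ssize_t)(offset + len > limit ? limit - offset : len);
}
```

With a limit of 10, a request for 25 bytes in 4-byte chunks returns 10: the failure at offset 10 is swallowed and the caller sees only how far the copy got.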
     
  • Signed-off-by: Olga Kornievskaia
    Signed-off-by: J. Bruce Fields

    Olga Kornievskaia
     

23 Aug, 2018

1 commit


08 Nov, 2017

5 commits

  • atomic_t variables are currently used to implement reference
    counters with the following properties:
    - counter is initialized to 1 using atomic_set()
    - a resource is freed upon counter reaching zero
    - once counter reaches zero, its further
    increments aren't allowed
    - counter schema uses basic atomic operations
    (set, inc, inc_not_zero, dec_and_test, etc.)

    Such atomic variables should be converted to a newly provided
    refcount_t type and API that prevents accidental counter overflows
    and underflows. This is important since overflows and underflows
    can lead to use-after-free situation and be exploitable.

    The variable nfs4_file.fi_ref is used as pure reference counter.
    Convert it to refcount_t and fix up the operations.

    Suggested-by: Kees Cook
    Reviewed-by: David Windsor
    Reviewed-by: Hans Liljestrand
    Signed-off-by: Elena Reshetova
    Signed-off-by: J. Bruce Fields

    Elena Reshetova
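
The counter schema listed in this message (start at 1, free at zero, no increments from zero) can be sketched with C11 atomics. This is an illustration of the pattern, not the kernel's refcount_t: `ref_inc_not_zero` and `ref_dec_and_test` are hypothetical names, and the overflow and underflow protection that motivates refcount_t is deliberately left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference unless the count has already dropped to zero. */
bool ref_inc_not_zero(atomic_uint *ref)
{
    unsigned int old = atomic_load(ref);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(ref, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference. Returns true when the caller dropped the last one
 * and is therefore responsible for freeing the object. */
bool ref_dec_and_test(atomic_uint *ref)
{
    return atomic_fetch_sub(ref, 1) == 1;
}
```

Once the count reaches zero, `ref_inc_not_zero` refuses to revive the object, which is the property that keeps a lookup from racing with the final free.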
     
  • atomic_t variables are currently used to implement reference
    counters with the following properties:
    - counter is initialized to 1 using atomic_set()
    - a resource is freed upon counter reaching zero
    - once counter reaches zero, its further
    increments aren't allowed
    - counter schema uses basic atomic operations
    (set, inc, inc_not_zero, dec_and_test, etc.)

    Such atomic variables should be converted to a newly provided
    refcount_t type and API that prevents accidental counter overflows
    and underflows. This is important since overflows and underflows
    can lead to use-after-free situation and be exploitable.

    The variable nfs4_cntl_odstate.co_odcount is used as pure reference counter.
    Convert it to refcount_t and fix up the operations.

    Suggested-by: Kees Cook
    Reviewed-by: David Windsor
    Reviewed-by: Hans Liljestrand
    Signed-off-by: Elena Reshetova
    Signed-off-by: J. Bruce Fields

    Elena Reshetova
     
  • atomic_t variables are currently used to implement reference
    counters with the following properties:
    - counter is initialized to 1 using atomic_set()
    - a resource is freed upon counter reaching zero
    - once counter reaches zero, its further
    increments aren't allowed
    - counter schema uses basic atomic operations
    (set, inc, inc_not_zero, dec_and_test, etc.)

    Such atomic variables should be converted to a newly provided
    refcount_t type and API that prevents accidental counter overflows
    and underflows. This is important since overflows and underflows
    can lead to use-after-free situation and be exploitable.

    The variable nfs4_stid.sc_count is used as pure reference counter.
    Convert it to refcount_t and fix up the operations.

    Suggested-by: Kees Cook
    Reviewed-by: David Windsor
    Reviewed-by: Hans Liljestrand
    Signed-off-by: Elena Reshetova
    Signed-off-by: J. Bruce Fields

    Elena Reshetova
     
  • The spec allows us to return NFS4ERR_SEQ_FALSE_RETRY if we notice that
    the client is making a call that matches a previous (slot, seqid) pair
    but that *isn't* actually a replay, because some detail of the call
    doesn't actually match the previous one.

    Catching every such case is difficult, but we may as well catch a few
    easy ones. This also handles the case described in the previous patch,
    in a different way.

    The spec does however require us to catch the case where the difference
    is in the rpc credentials. This prevents somebody from snooping another
    user's replies by fabricating retries.

    (But the practical value of the attack is limited by the fact that the
    replies with the most sensitive data are READ replies, which are not
    normally cached.)

    Tested-by: Olga Kornievskaia
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Currently our handling of 4.1+ requests without "cachethis" set is
    confusing and not quite correct.

    Suppose a client sends a compound consisting of only a single SEQUENCE
    op, and it matches the seqid in a session slot (so it's a retry), but
    the previous request with that seqid did not have "cachethis" set.

    The obvious thing to do might be to return NFS4ERR_RETRY_UNCACHED_REP,
    but the protocol only allows that to be returned on the op following the
    SEQUENCE, and there is no such op in this case.

    The protocol permits us to cache replies even if the client didn't ask
    us to. And it's easy to do so in the case of solo SEQUENCE compounds.

    So, when we get a solo SEQUENCE, we can either return the previously
    cached reply or NFS4ERR_SEQ_FALSE_RETRY if we notice it differs in some
    way from the original call.

    Currently, we're returning a corrupt reply in the case where a solo SEQUENCE
    matches a previous compound with more ops. This actually matters
    because the Linux client recently started doing this as a way to recover
    from lost replies to idempotent operations in the case where the process
    making the original call was killed: in that case it's difficult to keep the
    original arguments around to do a real retry, and the client no longer
    cares what the result is anyway, but it would like to make sure that the
    slot's sequence id has been incremented, and the solo SEQUENCE assures
    that: if the server never got the original request, it will increment the
    sequence id. If it did get the original request, it won't increment, and
    nothing else about the reply really matters much. But we can at
    least attempt to return valid xdr!

    Tested-by: Olga Kornievskaia
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
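
The slot check that decides whether a SEQUENCE is a retry can be sketched like this. It follows the general NFSv4.1 rule (same seqid is a replay, seqid plus one is a new request, anything else is misordered); `enum seq_result` and `check_slot_seqid` are hypothetical names, and the false-retry and "cachethis" handling discussed above are not modelled.

```c
#include <stdint.h>

enum seq_result { SEQ_NEW, SEQ_REPLAY, SEQ_MISORDERED };

/* Classify an incoming SEQUENCE seqid against the slot's last seqid. */
enum seq_result check_slot_seqid(uint32_t seqid, uint32_t slot_seqid)
{
    if (seqid == slot_seqid)
        return SEQ_REPLAY;      /* retry of the last request on this slot */
    if (seqid == (uint32_t)(slot_seqid + 1))
        return SEQ_NEW;         /* next request; the cast keeps 32-bit wrap */
    return SEQ_MISORDERED;
}
```

A solo SEQUENCE that lands in the replay branch is where the server has to choose between returning the cached reply and reporting a false retry.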
     

18 Feb, 2017

1 commit


01 Feb, 2017

1 commit

  • nfsd assigns nfs4_free_lock_stateid to .sc_free in init_lock_stateid().

    If nfsd doesn't go through init_lock_stateid() and puts the stateid at the
    end, there is a NULL dereference of .sc_free when calling nfs4_put_stid(ns).

    This patch moves the nfs4_stid.sc_free assignment into nfs4_alloc_stid().

    Cc: stable@vger.kernel.org
    Fixes: 356a95ece7aa "nfsd: clean up races in lock stateid searching..."
    Signed-off-by: Kinglong Mee
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Kinglong Mee
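
Installing the destructor at allocation time, rather than in a later init step, can be sketched as below. The names `struct stid`, `stid_alloc`, `stid_put`, `stid_free_plain` and `stid_frees` are hypothetical and only mirror the shape of the fix; reference counting is omitted.

```c
#include <assert.h>
#include <stdlib.h>

struct stid {
    void (*sc_free)(struct stid *);     /* destructor, set at allocation */
};

int stid_frees;     /* counts destructor calls, for the demo only */

void stid_free_plain(struct stid *s)
{
    stid_frees++;
    free(s);
}

/* Allocate with the destructor already installed, so a later put is
 * safe even if type-specific initialisation never runs. */
struct stid *stid_alloc(void (*sc_free)(struct stid *))
{
    struct stid *s = calloc(1, sizeof(*s));

    if (s)
        s->sc_free = sc_free;
    return s;
}

void stid_put(struct stid *s)
{
    assert(s && s->sc_free);    /* holds for anything from stid_alloc() */
    s->sc_free(s);
}
```

Because every object leaves the allocator with a valid callback, an early error path can call the put function without knowing how far initialisation got.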
     

27 Sep, 2016

3 commits

  • It's possible for a client to call in on a lock that is blocked for a
    long time, but discontinue polling for it. A malicious client could
    even set a lock on a file, and then spam the server with failing lock
    requests from different lockowners that pile up in a DoS attack.

    Add the blocked lock structures to a per-net namespace LRU when hashing
    them, and timestamp them. If the lock request is not revisited after a
    lease period, we'll drop it under the assumption that the client is no
    longer interested.

    This also gives us a mechanism to clean up these objects at server
    shutdown time as well.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
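
Expiring blocked locks that were not revisited within a lease period can be sketched with an oldest-first array standing in for the LRU list. This is an illustration under assumptions: `expire_blocked` is a hypothetical helper, times are plain seconds, and the strict "older than the lease" comparison is a choice made here.

```c
#include <stddef.h>
#include <stdint.h>

struct blocked_lock {
    int64_t nbl_time;   /* when the request was last seen, in seconds */
};

/* Entries are kept oldest-first, as on an LRU list. Drop every entry
 * not revisited within `lease` seconds and return how many remain. */
size_t expire_blocked(struct blocked_lock *lru, size_t n,
                      int64_t now, int64_t lease)
{
    size_t first_live = 0;

    /* Because the list is ordered, scanning stops at the first live entry. */
    while (first_live < n && now - lru[first_live].nbl_time > lease)
        first_live++;
    for (size_t i = first_live; i < n; i++)
        lru[i - first_live] = lru[i];
    return n - first_live;
}
```

Keeping the entries in timestamp order is what makes the periodic sweep cheap: it only touches the expired prefix instead of every blocked lock.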
     
  • Create a new per-lockowner+per-inode structure that contains a
    file_lock. Have nfsd4_lock add this structure to the lockowner's list
    prior to setting the lock. Then call the vfs and request a blocking lock
    (by setting FL_SLEEP). If we get anything besides FILE_LOCK_DEFERRED
    back, then we dequeue the block structure and free it. When the next
    lock request comes in, we'll look for an existing block for the same
    filehandle and dequeue and reuse it if there is one.

    When the lock comes free (a la an lm_notify call), we dequeue it
    from the lockowner's list and kick off a CB_NOTIFY_LOCK callback to
    inform the client that it should retry the lock request.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Add the encoding/decoding for CB_NOTIFY_LOCK operations.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     

17 Sep, 2016

1 commit


14 Jul, 2016

1 commit

  • This addresses the conundrum referenced in RFC5661 18.35.3,
    and will allow clients to return state to the server using the
    machine credentials.

    The biggest part of the problem is that we need to allow the client
    to send a compound op with integrity/privacy on mounts that don't
    have it enabled.

    Add server support for properly decoding and using spo_must_enforce
    and spo_must_allow bits. Add support for machine credentials to be
    used for CLOSE, OPEN_DOWNGRADE, LOCKU, DELEGRETURN,
    and TEST/FREE STATEID.
    Implement a check so as to not throw WRONGSEC errors when these
    operations are used if integrity/privacy isn't turned on.

    Without this, Linux clients with credentials that expired while holding
    delegations were getting stuck in an endless loop.

    Signed-off-by: Andrew Elble
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Andrew Elble
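
A check against an operation bitmap such as spo_must_allow can be sketched as follows. The structure and helpers (`struct op_bitmap`, `op_set`, `op_test`, `may_skip_wrongsec`) are hypothetical and do not reflect how nfsd stores these bits; the operation numbers are the ones assigned in RFC 5661.

```c
#include <stdbool.h>
#include <stdint.h>

#define OP_WORDS 3      /* room for operation numbers 0..95 */

enum { OP_CLOSE = 4, OP_READ = 25, OP_TEST_STATEID = 55 };

/* Hypothetical stand-in for an operation bitmap such as spo_must_allow. */
struct op_bitmap {
    uint32_t words[OP_WORDS];
};

void op_set(struct op_bitmap *bm, unsigned int op)
{
    if (op < OP_WORDS * 32)
        bm->words[op / 32] |= UINT32_C(1) << (op % 32);
}

bool op_test(const struct op_bitmap *bm, unsigned int op)
{
    return op < OP_WORDS * 32 && ((bm->words[op / 32] >> (op % 32)) & 1);
}

/* The security-flavor error is skipped only for a machine credential
 * and only for operations the client listed in the bitmap. */
bool may_skip_wrongsec(const struct op_bitmap *must_allow,
                       unsigned int op, bool is_machine_cred)
{
    return is_machine_cred && op_test(must_allow, op);
}
```

In this model a CLOSE sent with the machine credential passes when the CLOSE bit is set, while a READ, which is not in the bitmap, still goes through the normal security check.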
     

16 Jun, 2016

1 commit

  • It used to be the case that state had an rwlock that was locked for write
    by downgrades, but for read by upgrades (opens). The problem is that
    if there are two competing opens for the same state, they step on
    each other's toes, potentially leaking file descriptors
    from the state structure, since access mode is a bitmap only set once.

    Signed-off-by: Oleg Drokin
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    Oleg Drokin
     

14 May, 2016

1 commit


16 Jan, 2016

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Smaller bugfixes and cleanup, including a fix for failures of
    kerberized NFSv4.1 mounts, and Scott Mayhew's work addressing ACK
    storms that can affect some high-availability NFS setups"

    * tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux:
    nfsd: add new io class tracepoint
    nfsd: give up on CB_LAYOUTRECALLs after two lease periods
    nfsd: Fix nfsd leaks sunrpc module references
    lockd: constify nlmsvc_binding structure
    lockd: use to_delayed_work
    nfsd: use to_delayed_work
    Revert "svcrdma: Do not send XDR roundup bytes for a write chunk"
    lockd: Register callbacks on the inetaddr_chain and inet6addr_chain
    nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain
    sunrpc: Add a function to close temporary transports immediately
    nfsd: don't base cl_cb_status on stale information
    nfsd4: fix gss-proxy 4.1 mounts for some AD principals
    nfsd: fix unlikely NULL deref in mach_creds_match
    nfsd: minor consolidation of mach_cred handling code
    nfsd: helper for dup of possibly NULL string
    svcrpc: move some initialization to common code
    nfsd: fix a warning message
    nfsd: constify nfsd4_callback_ops structure
    nfsd: recover: constify nfsd4_client_tracking_ops structures
    svcrdma: Do not send XDR roundup bytes for a write chunk

    Linus Torvalds
     

08 Dec, 2015

1 commit