11 Dec, 2012

5 commits

  • Precursor patch. Hard-coded "init_net" will be replaced by proper one in
    future.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     
  • There could be a situation, when NFSd was started in one network namespace, but
    stopped in another one.
    This will trigger kernel panic, because RPCBIND client is stored on per-net
    NFSd data, and will be NULL on NFSd shutdown.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     
  • With NFSv4, if we create a file then open it we explicit avoid checking
    the permissions on the file during the open because the fact that we
    created it ensures we should be allow to open it (the create and the
    open should appear to be a single operation).

    However if the reply to an EXCLUSIVE create gets lots and the client
    resends the create, the current code will perform the permission check -
    because it doesn't realise that it did the open already..

    This patch should fix this.

    Note that I haven't actually seen this cause a problem. I was just
    looking at the code trying to figure out a different EXCLUSIVE open
    related issue, and this looked wrong.

    (Fix confirmed with pynfs 4.0 test OPEN4--bfields)

    Cc: stable@kernel.org
    Signed-off-by: NeilBrown
    [bfields: use OWNER_OVERRIDE and update for 4.1]
    Signed-off-by: J. Bruce Fields

    Neil Brown
     
  • This is a cleanup patch.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     
  • Pointer to client tracking operations - client_tracking_ops - have to be
    containerized, because different environment can support different trackers
    (for example, legacy tracker currently is not suported in container).

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     

04 Dec, 2012

6 commits

  • Fix nfsd4_lockt and release_lockowner to lookup the referenced client,
    so that it can renew it, or correctly return "expired", as appropriate.

    Also share some code while we're here.

    Reported-by: Frank Filz
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Over TCP, RPC's are preceded by a single 4-byte field telling you how
    long the rpc is (in bytes). The spec also allows you to send an RPC in
    multiple such records (the high bit of the length field is used to tell
    you whether this is the final record).

    We've survived for years without supporting this because in practice the
    clients we care about don't use it. But the userland rpc libraries do,
    and every now and then an experimental client will run into this. (Most
    recently I noticed it while trying to write a pynfs check.) And we're
    really on the wrong side of the spec here--let's fix this.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Keep a separate field, sk_datalen, that tracks only the data contained
    in a fragment, not including the fragment header.

    For now, this is always just max(0, sk_tcplen - 4), but after we allow
    multiple fragments sk_datalen will accumulate the total rpc data size
    while sk_tcplen only tracks progress receiving the current fragment.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The full reclen doesn't include the fragment header, but sk_tcplen does.
    Fix this to make it an apples-to-apples comparison.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Soon we want to support multiple fragments, in which case it may be
    legal for a single fragment to be smaller than 8 bytes, so we'll want to
    delay this check till we've reached the last fragment.

    Also fix an outdated comment.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Byte-swapping in place is always a little dubious.

    Let's instead define this field to always be big-endian, and do the
    swapping on demand where we need it.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

03 Dec, 2012

10 commits


29 Nov, 2012

1 commit

  • There were only a small number of functions in this file and since they
    all affect stored state I think it makes sense to put them in state.h
    instead. I also dropped most static inline declarations since there are
    no callers when fault injection is not enabled.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: J. Bruce Fields

    Bryan Schumaker
     

28 Nov, 2012

13 commits

  • Grace time is a part of NFSv4 state engine, which is constructed per network
    namespace.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     
  • Lease time is a part of NFSv4 state engine, which is constructed per network
    namespace.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     
  • This is a cleanup patch. Functions nfsd_pool_stats_open() and
    nfsd_pool_stats_release() are declared in fs/nfsd/nfsd.h.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     
  • Flag in_grace is a part of client tracking state, which is network namesapce
    aware. So let'a replace global static variable with per-net one.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     
  • Opening and closing of this file is done in client tracking init and exit
    operations.
    Client tracking is done in network namespace context already. So let's make
    this file opened and closed per network context - this will simlify it's
    management.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     
  • Split NFSv4 state init and shutdown into two different calls: per-net one and
    generic one.
    Per-net cwinit/shutdown pair have to be called for any namespace, generic pair
    - only once on NSFd kthreads start and shutdown respectively.

    Refresh of diff-nfsd-call-state-init-twice

    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     
  • This patch renames nfs4_state_start_net() into nfs4_state_create_net(), where
    get_net() now performed.
    Also it introduces new nfs4_state_start_net(), which is now responsible for
    state creation and initializing all per-net data and which is now called from
    nfs4_state_start().

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     
  • This patch renames __nfs4_state_shutdown_net() into nfs4_state_shutdown_net(),
    __nfs4_state_shutdown() into nfs4_state_shutdown_net() and moves all network
    related shutdown operations to nfs4_state_shutdown_net().

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     
  • NFSv4 delegations are stored in global list. But they are nfs4_client
    dependent, which is network namespace aware already.
    State shutdown and laundromat are done per network namespace as well.
    So, delegations unhash have to be done in network namespace context.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     
  • This lock protects the client lru list and session hash table, which are
    allocated per network namespace already.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     
  • Protection of __nfs4_state_shutdown() with nfs4_lock_state() looks redundant.

    This function is called by the last NFSd thread on it's exit and state lock
    protects actually two functions (del_recall_lru is protected by recall_lock):
    1) nfsd4_client_tracking_exit
    2) __nfs4_state_shutdown_net

    "nfsd4_client_tracking_exit" doesn't require state lock protection, because it's
    state can be modified only by tracker callbacks.
    Here a re they:
    1) create: is called only from nfsd4_proc_compound.
    2) remove: is called from either nfsd4_proc_compound or nfs4_laundromat.
    3) check: is called only from nfsd4_proc_compound.
    4) grace_done; called only from nfs4_laundromat.

    nfsd4_proc_compound is called onll by NFSd kthread, which is exiting right
    now.
    nfs4_laundromat is called by laundry_wq. But laundromat_work was canceled
    already.

    "__nfs4_state_shutdown_net" also doesn't require state lock protection,
    because all NFSd kthreads are dead, and no race can happen with NFSd start,
    because "nfsd_up" flag is still set.
    Moreover, all Nfsd shutdown is protected with global nfsd_mutex.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     
  • That function is only called under nfsd_mutex: we know that because the
    only caller is nfsd_svc, via

    nfsd_svc
    nfsd_startup
    nfs4_state_start
    nfsd4_client_tracking_init
    client_tracking_ops->init == nfsd4_load_reboot_recovery_data

    The shared state accessed here includes:

    - user_recovery_dirname: used here, modified only by
    nfs4_reset_recoverydir, which can be verified to only be
    called under nfsd_mutex.
    - filesystem state, protected by i_mutex (handwaving slightly
    here)
    - rec_file, reclaim_str_hashtbl, reclaim_str_hashtbl_size: other
    than here, used only from code called from nfsd or laundromat
    threads, both of which should be started only after this runs
    (see nfsd_svc) and stopped before this could run again (see
    nfsd_shutdown, called from nfsd_last_thread).

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The spec requires badname, not inval, in these cases.

    Some callers want us to return enoent, but I can see no justification
    for that.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

26 Nov, 2012

5 commits

  • Linus has pointed out that indiscriminate use of BUG's can make it
    harder to diagnose bugs because they can bring a machine down, often
    before we manage to get any useful debugging information to the logs.
    (Consider, for example, a BUG() that fires in a workqueue, or while
    holding a spinlock).

    Most of these BUG's won't do much more than kill an nfsd thread, but it
    would still probably be safer to get out the warning without dying.

    There's still more of this to do in nfsd/.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Our server rejects compounds containing more than one write operation.
    It's unclear whether this is really permitted by the spec; with 4.0,
    it's possibly OK, with 4.1 (which has clearer limits on compound
    parameters), it's probably not OK. No client that we're aware of has
    ever done this, but in theory it could be useful.

    The source of the limitation: we need an array of iovecs to pass to the
    write operation. In the worst case that array of iovecs could have
    hundreds of elements (the maximum rwsize divided by the page size), so
    it's too big to put on the stack, or in each compound op. So we instead
    keep a single such array in the compound argument.

    We fill in that array at the time we decode the xdr operation.

    But we decode every op in the compound before executing any of them. So
    once we've used that array we can't decode another write.

    If we instead delay filling in that array till the time we actually
    perform the write, we can reuse it.

    Another option might be to switch to decoding compound ops one at a
    time. I considered doing that, but it has a number of other side
    effects, and I'd rather fix just this one problem for now.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • In preparation for moving some of this elsewhere.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • In preparation for moving some of it elsewhere.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The comment here is totally bogus:
    - OP_WRITE + 1 is RELEASE_LOCKOWNER. Maybe there was some older
    version of the spec in which that served as a sort of
    OP_ILLEGAL? No idea, but it's clearly wrong now.
    - In any case, I can't see that the spec says anything about
    what to do if the client sends us less ops than promised.
    It's clearly nutty client behavior, and we should do
    whatever's easiest: returning an xdr error (even though it
    won't be consistent with the error on the last op returned)
    seems fine to me.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields