07 Aug, 2010

5 commits

  • It's harmless to set this after the server is created, but also
    ineffective, since the value is only used at the time of
    svc_create_pooled(). So fail the attempt, in keeping with the pattern
    set by write_versions, write_{lease,grace}time and write_recoverydir.

    (This could break userspace that tried to write to nfsd/max_block_size
    between setting up sockets and starting the server. However, such code
    wouldn't have worked anyway, and I don't know of any examples--rpc.nfsd
    in nfs-utils, probably the only user of the interface, doesn't do that.)

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Commit 59db4a0c102e0de226a3395dbf25ea51bf845937 "nfsd: move more into
    nfsd_startup()" inadvertently moved nfsd_versions after
    nfsd_create_svc(). On older distributions using an rpc.nfsd that does
    not explicitly set the list of nfsd versions, this results in
    svc-create_pooled() being called with an empty versions array. The
    resulting incomplete initialization leads to a NULL dereference in
    svc_process_common() the first time a client accesses the server.

    Move nfsd_reset_versions() back before the svc_create_pooled(); this
    time, put it closer to the svc_create_pooled() call, to make this
    mistake more difficult in the future.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Signed-off-by: Andrea Gelmini
    Signed-off-by: J. Bruce Fields

    Andrea Gelmini
     
  • If a callback is retried at nfsd4_cb_recall_done() due to
    some error, the returned rpc reply crashes here:

    @@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream *xdr, struct nfsd4_cb_sequence *res,
    u32 dummy;
    __be32 *p;

    + BUG_ON(!res);
    if (res->cbs_minorversion == 0)
    return 0;

    [BUG_ON added for demonstration]

    This is because the nfsd4_cb_done_sequence() has NULLed out
    the task->tk_msg.rpc_resp pointer.

    Also eventually the rpc would use the new slot without making
    sure it is free by calling nfsd41_cb_setup_sequence().

    This problem was introduced by a 4.1 protocol addition patch:
    [0421b5c5] nfsd41: Backchannel: Implement cb_recall over NFSv4.1

    Which was overlooking the possibility of an RPC callback retries.
    For not-4.1 case redoing the _prepare is harmless.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: J. Bruce Fields

    Boaz Harrosh
     
  • We must create the server before we can call init_socks or check the
    number of threads.

    Symptoms were a NULL pointer dereference in nfsd_svc(). Problem
    identified by Jeff Layton.

    Also fix a minor cleanup-on-error case in nfsd_startup().

    Reported-by: Tetsuo Handa
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

31 Jul, 2010

1 commit


30 Jul, 2010

5 commits

  • Fixes at least one real minor bug: the nfs4 recovery dir sysctl
    would not return its status properly.

    Also I finished Al's 1e41568d7378d ("Take ima_path_check() in nfsd
    past dentry_open() in nfsd_open()") commit, it moved the IMA
    code, but left the old path initializer in there.

    The rest is just dead code removed I think, although I was not
    fully sure about the "is_borc" stuff. Some more review
    would be still good.

    Found by gcc 4.6's new warnings.

    Signed-off-by: Andi Kleen
    Cc: Al Viro
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: J. Bruce Fields

    Andi Kleen
     
  • The vfs doesn't really allow us to "upgrade" a file descriptor from
    read-only to read-write, and our attempt to do so in nfs4_upgrade_open
    is ugly and incomplete.

    Move to a different scheme where we keep multiple opens, shared between
    open stateid's, in the nfs4_file struct. Each file will be opened at
    most 3 times (for read, write, and read-write), and those opens will be
    shared between all clients and openers. On upgrade we will do another
    open if necessary instead of attempting to upgrade an existing open.
    We keep count of the number of readers and writers so we know when to
    close the shared files.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • It is legal to perform a write using the lock stateid that was
    originally associated with a read lock, or with a file that was
    originally opened for read, but has since been upgraded.

    So, when checking the openmode, check the mode associated with the
    open stateid from which the lock was derived.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Move more work into helper functions.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The delegation code mostly pretends to support either read or write
    delegations. However, correct support for write delegations would
    require, for example, breaking of delegations (and/or implementation of
    cb_getattr) on stat. Currently all that stops us from handing out
    delegations is a subtle reference-counting issue.

    Avoid confusion by adding an earlier check that explicitly refuses write
    delegations.

    For now, though, I'm not going so far as to rip out existing
    half-support for write delegations, in case we get around to using that
    soon.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

28 Jul, 2010

1 commit

  • The readahead cache compensates for the fact that the NFS server
    currently does an open and close on every IO operation in the NFSv2 and
    NFSv3 case.

    In the NFSv4 case we have long-lived struct files associated with client
    opens, so there's no need for this. In fact, concurrent IO's using
    trying to modify the same file->f_ra may cause problems.

    So, don't bother with the readahead cache in that case.

    Note eventually we'll likely do this in the v2/v3 case as well by
    keeping a cache of struct files instead of struct file_ra_state's.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

23 Jul, 2010

8 commits

  • More idiomatic to put the error case in the if clause.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • This is just cleanup--it's harmless to call nfsd_rachache_init,
    nfsd_init_socks, and nfsd_reset_versions more than once. But there's no
    point to it.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Right now, nfsd keeps a lockd reference for each socket that it has
    open. This is unnecessary and complicates the error handling on
    startup and shutdown. Change it to just do a lockd_up when starting
    the first nfsd thread just do a single lockd_down when taking down the
    last nfsd thread. Because of the strange way the sv_count is handled
    this requires an extra flag to tell whether the nfsd_serv holds a
    reference for lockd or not.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • There doesn't seem to be any need to reset the nfssvc_boot time if the
    nfsd startup failed.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • __write_ports_addxprt calls nfsd_create_serv. That increases the
    refcount of nfsd_serv (which is tracked in sv_nrthreads). The service
    only decrements the thread count on error, not on success like
    __write_ports_addfd does, so using this interface leaves the nfsd
    thread count high.

    Fix this by having this function call svc_destroy() on error to release
    the reference (and possibly to tear down the service) and simply
    decrement the refcount without tearing down the service on success.

    This makes the sv_threads handling work basically the same in both
    __write_ports_addxprt and __write_ports_addfd.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • The refcounting for nfsd is a little goofy. What happens is that we
    create the nfsd RPC service, attach sockets to it but don't actually
    start the threads until someone writes to the "threads" procfile. To do
    this, __write_ports_addfd will create the nfsd service and then will
    decrement the refcount when exiting but won't actually destroy the
    service.

    This is fine when there aren't errors, but when there are this can
    cause later attempts to start nfsd to fail. nfsd_serv will be set,
    and that causes __write_versions to return EBUSY.

    Fix this by calling svc_destroy on nfsd_serv when this function is
    going to return error.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • If someone tries to shut down the laundry_wq while it isn't up it'll
    cause an oops.

    This can happen because write_ports can create a nfsd_svc before we
    really start the nfs server, and we may fail before the server is ever
    started.

    Also make sure state is shutdown on error paths in nfsd_svc().

    Use a common global nfsd_up flag instead of nfs4_init, and create common
    helper functions for nfsd start/shutdown, as there will be other work
    that we want done only when we the number of nfsd threads transitions
    between zero and nonzero.

    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Trivial cleanup, since "dest" is never used.

    Reported-by: Anshul Madan
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

08 Jul, 2010

1 commit

  • Some well-known NFSv3 clients drop their directory entry caches when
    they receive replies with no WCC data. Without this data, they
    employ extra READ, LOOKUP, and GETATTR requests to ensure their
    directory entry caches are up to date, causing performance to suffer
    needlessly.

    In order to return WCC data, our server has to have both the pre-op
    and the post-op attribute data on hand when a reply is XDR encoded.
    The pre-op data is filled in when the incoming fh is locked, and the
    post-op data is filled in when the fh is unlocked.

    Unfortunately, for REMOVE, RMDIR, MKNOD, and MKDIR, the directory fh
    is not unlocked until well after the reply has been XDR encoded. This
    means that encode_wcc_data() does not have wcc_data for the parent
    directory, so none is returned to the client after these operations
    complete.

    By unlocking the parent directory fh immediately after the internal
    operations for each NFS procedure is complete, the post-op data is
    filled in before XDR encoding starts, so it can be returned to the
    client properly.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

07 Jul, 2010

2 commits

  • Reported-by: "Madan, Anshul"
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • This patch makes the cache_cleaner workqueue deferrable, to prevent
    unnecessary system wake-ups, which is very important for embedded
    battery-powered devices.

    do_cache_clean() is called every 30 seconds at the moment, and often
    makes the system wake up from its power-save sleep state. With this
    change, when the workqueue uses a deferrable timer, the
    do_cache_clean() invocation will be delayed and combined with the
    closest "real" wake-up. This improves the power consumption situation.

    Note, I tried to create a DECLARE_DELAYED_WORK_DEFERRABLE() helper
    macro, similar to DECLARE_DELAYED_WORK(), but failed because of the
    way the timer wheel core stores the deferrable flag (it is the
    LSBit in the time->base pointer). My attempt to define a static
    variable with this bit set ended up with the "initializer element is
    not constant" error.

    Thus, I have to use run-time initialization, so I created a new
    cache_initialize() function which is called once when sunrpc is
    being initialized.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: J. Bruce Fields

    Artem Bityutskiy
     

25 Jun, 2010

2 commits


23 Jun, 2010

4 commits


01 Jun, 2010

4 commits


31 May, 2010

7 commits