15 Jan, 2012

1 commit

  • * 'for-3.3' of git://linux-nfs.org/~bfields/linux: (31 commits)
    nfsd4: nfsd4_create_clid_dir return value is unused
    NFSD: Change name of extended attribute containing junction
    svcrpc: don't revert to SVC_POOL_DEFAULT on nfsd shutdown
    svcrpc: fix double-free on shutdown of nfsd after changing pool mode
    nfsd4: be forgiving in the absence of the recovery directory
    nfsd4: fix spurious 4.1 post-reboot failures
    NFSD: forget_delegations should use list_for_each_entry_safe
    NFSD: Only reinitilize the recall_lru list under the recall lock
    nfsd4: initialize special stateid's at compile time
    NFSd: use network-namespace-aware cache registering routines
    SUNRPC: create svc_xprt in proper network namespace
    svcrpc: update outdated BKL comment
    nfsd41: allow non-reclaim open-by-fh's in 4.1
    svcrpc: avoid memory-corruption on pool shutdown
    svcrpc: destroy server sockets all at once
    svcrpc: make svc_delete_xprt static
    nfsd: Fix oops when parsing a 0 length export
    nfsd4: Use kmemdup rather than duplicating its implementation
    nfsd4: add a separate (lockowner, inode) lookup
    nfsd4: fix CONFIG_NFSD_FAULT_INJECTION compile error
    ...

    Linus Torvalds
     

12 Dec, 2011

1 commit


07 Dec, 2011

4 commits

  • This patch makes svc_xprt inherit network namespace link from its socket.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    Stanislav Kinsbursky
     
  • Socket callbacks use svc_xprt_enqueue() to add an xprt to a
    pool->sp_sockets list. In normal operation a server thread will later
    come along and take the xprt off that list. On shutdown, after all the
    threads have exited, we instead manually walk the sv_tempsocks and
    sv_permsocks lists to find all the xprt's and delete them.

    So the sp_sockets lists don't really matter any more. As a result,
    we've mostly just ignored them and hoped they would go away.

    Which has gotten us into trouble; witness for example ebc63e531cc6
    "svcrpc: fix list-corrupting race on nfsd shutdown", the result of Ben
    Greear noticing that a still-running svc_xprt_enqueue() could re-add an
    xprt to an sp_sockets list just before it was deleted. The fix was to
    remove it from the list at the end of svc_delete_xprt(). But that only
    made corruption less likely--I can see nothing that prevents a
    svc_xprt_enqueue() from adding another xprt to the list at the same
    moment that we're removing this xprt from the list. In fact, despite
    the earlier xpo_detach(), I don't even see what guarantees that
    svc_xprt_enqueue() couldn't still be running on this xprt.

    So, instead, note that svc_xprt_enqueue() essentially does:
    lock sp_lock
    if XPT_BUSY unset
    add to sp_sockets
    unlock sp_lock

    So, if we do:

    set XPT_BUSY on every xprt.
    Empty every sp_sockets list, under the sp_socks locks.

    Then we're left knowing that the sp_sockets lists are all empty and will
    stay that way, since any svc_xprt_enqueue() will check XPT_BUSY under
    the sp_lock and see it set.

    And *then* we can continue deleting the xprt's.

    (Thanks to Jeff Layton for being correctly suspicious of this code....)

    Cc: Ben Greear
    Cc: Jeff Layton
    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • There's no reason I can see that we need to call sv_shutdown between
    closing the two lists of sockets.

    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

01 Nov, 2011

1 commit


14 Sep, 2011

1 commit

  • For IPv6 local address, lockd can not callback to client for
    missing scope id when binding address at inet6_bind:

    324 if (addr_type & IPV6_ADDR_LINKLOCAL) {
    325 if (addr_len >= sizeof(struct sockaddr_in6) &&
    326 addr->sin6_scope_id) {
    327 /* Override any existing binding, if another one
    328 * is supplied by user.
    329 */
    330 sk->sk_bound_dev_if = addr->sin6_scope_id;
    331 }
    332
    333 /* Binding to link-local address requires an interface */
    334 if (!sk->sk_bound_dev_if) {
    335 err = -EINVAL;
    336 goto out_unlock;
    337 }

    Replacing svc_addr_u by sockaddr_storage, let rqstp->rq_daddr contains more info
    besides address.

    Reviewed-by: Jeff Layton
    Reviewed-by: Chuck Lever
    Signed-off-by: Mi Jinlong
    Signed-off-by: J. Bruce Fields

    Mi Jinlong
     

16 Jul, 2011

1 commit

  • After commit 3262c816a3d7fb1eaabce633caa317887ed549ae "[PATCH] knfsd:
    split svc_serv into pools", svc_delete_xprt (then svc_delete_socket) no
    longer removed its xpt_ready (then sk_ready) field from whatever list it
    was on, noting that there was no point since the whole list was about to
    be destroyed anyway.

    That was mostly true, but forgot that a few svc_xprt_enqueue()'s might
    still be hanging around playing with the about-to-be-destroyed list, and
    could get themselves into trouble writing to freed memory if we left
    this xprt on the list after freeing it.

    (This is actually functionally identical to a patch made first by Ben
    Greear, but with more comments.)

    Cc: stable@kernel.org
    Cc: gnb@fmeh.org
    Reported-by: Ben Greear
    Tested-by: Ben Greear
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

12 Jan, 2011

1 commit

  • Multiple backchannels can share the same tcp connection; from rfc 5661 section
    2.10.3.1:

    A connection's association with a session is not exclusive. A
    connection associated with the channel(s) of one session may be
    simultaneously associated with the channel(s) of other sessions
    including sessions associated with other client IDs.

    However, multiple backchannels share a connection, they must all share
    the same xid stream (hence the same rpc_xprt); the only way we have to
    match replies with calls at the rpc layer is using the xid.

    So, keep the rpc_xprt around as long as the connection lasts, in case
    we're asked to use the connection as a backchannel again.

    Requests to create new backchannel clients over a given server
    connection should results in creating new clients that reuse the
    existing rpc_xprt.

    But to start, just reject attempts to associate multiple rpc_xprt's with
    the same underlying bc_xprt.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

05 Jan, 2011

1 commit

  • Currently we use -EAGAIN returns to determine when to drop a deferred
    request. On its own, that is error-prone, as it makes us treat -EAGAIN
    returns from other functions specially to prevent inadvertent dropping.

    So, use a flag on the request instead.

    Returning an error on request deferral is still required, to prevent
    further processing, but we no longer need worry that an error return on
    its own could result in a drop.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

18 Dec, 2010

2 commits

  • The xpt_pool field is only used for reporting BUGs.
    And it isn't used correctly.

    In particular, when it is cleared in svc_xprt_received before
    XPT_BUSY is cleared, there is no guarantee that either the
    compiler or the CPU might not re-order to two assignments, just
    setting xpt_pool to NULL after XPT_BUSY is cleared.

    If a different cpu were running svc_xprt_enqueue at this moment,
    it might see XPT_BUSY clear and then xpt_pool non-NULL, and
    so BUG.

    This could be fixed by calling
    smp_mb__before_clear_bit()
    before the clear_bit. However as xpt_pool isn't really used,
    it seems safest to simply remove xpt_pool.

    Another alternate would be to change the clear_bit to
    clear_bit_unlock, and the test_and_set_bit to test_and_set_bit_lock.

    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     
  • J. Bruce Fields
     

08 Dec, 2010

1 commit

  • When an xprt is created, it has a refcount of 1, and XPT_BUSY is set.
    The refcount is *not* owned by the thread that created the xprt
    (as is clear from the fact that creators never put the reference).
    Rather, it is owned by the absence of XPT_DEAD. Once XPT_DEAD is set,
    (And XPT_BUSY is clear) that initial reference is dropped and the xprt
    can be freed.

    So when a creator clears XPT_BUSY it is dropping its only reference and
    so must not touch the xprt again.

    However svc_recv, after calling ->xpo_accept (and so getting an XPT_BUSY
    reference on a new xprt), calls svc_xprt_recieved. This clears
    XPT_BUSY and then svc_xprt_enqueue - this last without owning a reference.
    This is dangerous and has been seen to leave svc_xprt_enqueue working
    with an xprt containing garbage.

    So we need to hold an extra counted reference over that call to
    svc_xprt_received.

    For safety, any time we clear XPT_BUSY and then use the xprt again, we
    first get a reference, and the put it again afterwards.

    Note that svc_close_all does not need this extra protection as there are
    no threads running, and the final free can only be called asynchronously
    from such a thread.

    Signed-off-by: NeilBrown
    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    NeilBrown
     

20 Nov, 2010

5 commits

  • We call svc_xprt_enqueue() after something happens which we think may
    require handling from a server thread. To avoid such events being lost,
    svc_xprt_enqueue() must guarantee that there will be a svc_serv() call
    from a server thread following any such event. It does that by either
    waking up a server thread itself, or checking that XPT_BUSY is set (in
    which case somebody else is doing it).

    But the check of XPT_BUSY could occur just as someone finishes
    processing some other event, and just before they clear XPT_BUSY.

    Therefore it's important not to clear XPT_BUSY without subsequently
    doing another svc_export_enqueue() to check whether the xprt should be
    requeued.

    The xpo_wspace() check in svc_xprt_enqueue() breaks this rule, allowing
    an event to be missed in situations like:

    data arrives
    call svc_tcp_data_ready():
    call svc_xprt_enqueue():
    set BUSY
    find no write space
    svc_reserve():
    free up write space
    call svc_enqueue():
    test BUSY
    clear BUSY

    So, instead, check wspace in the same places that the state flags are
    checked: before taking BUSY, and in svc_receive().

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Neil Brown had to explain to me why we do this here; record the answer
    for posterity.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • There's no need to be fooling with XPT_BUSY now that all the threads
    are gone.

    The list_del_init() here could execute at the same time as the
    svc_xprt_enqueue()'s list_add_tail(), with undefined results. We don't
    really care at this point, but it might result in a spurious
    list-corruption warning or something.

    And svc_close() isn't adding any value; just call svc_delete_xprt()
    directly.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Follow up on b48fa6b99100dc7772af3cd276035fcec9719ceb by moving all the
    svc_xprt_received() calls for the main xprt to one place. The clearing
    of XPT_BUSY here is critical to the correctness of the server, so I'd
    prefer it to be obvious where we do it.

    The only substantive result is moving svc_xprt_received() after
    svc_receive_deferred(). Other than a (likely insignificant) delay
    waking up the next thread, that should be harmless.

    Also reshuffle the exit code a little to skip a few other steps that we
    don't care about the in the svc_delete_xprt() case.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • There's no harm to doing this, since the only caller will immediately
    call svc_enqueue() afterwards, ensuring we don't miss the remaining
    deferred requests just because XPT_DEFERRED was briefly cleared.

    But why not just do this the simple way?

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

18 Nov, 2010

1 commit


26 Oct, 2010

3 commits


19 Oct, 2010

1 commit


02 Oct, 2010

3 commits


27 Sep, 2010

2 commits


08 Sep, 2010

2 commits

  • Pull out some code into helper functions, fix a typo.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The current practice of waiting for cache updates by queueing the
    whole request to be retried has (at least) two problems.

    1/ With NFSv4, requests can be quite complex and re-trying a whole
    request when a later part fails should only be a last-resort, not a
    normal practice.

    2/ Large requests, and in particular any 'write' request, will not be
    queued by the current code and doing so would be undesirable.

    In many cases only a very sort wait is needed before the cache gets
    valid data.

    So, providing the underlying transport permits it by setting
    ->thread_wait,
    arrange to wait briefly for an upcall to be completed (as reflected in
    the clearing of CACHE_PENDING).
    If the short wait was not long enough and CACHE_PENDING is still set,
    fall back on the old approach.

    The 'thread_wait' value is set to 5 seconds when there are spare
    threads, and 1 second when there are no spare threads.

    These values are probably much higher than needed, but will ensure
    some forward progress.

    Note that as we only request an update for a non-valid item, and as
    non-valid items are updated in place it is extremely unlikely that
    cache_check will return -ETIMEDOUT. Normally cache_defer_req will
    sleep for a short while and then find that the item is_valid.

    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     

04 May, 2010

1 commit


03 May, 2010

1 commit

  • svc_xprt_received must be called when ->xpo_recvfrom has finished
    receiving a message, so that the XPT_BUSY flag will be cleared and
    if necessary, requeued for further work.

    This call is currently made in each ->xpo_recvfrom function, often
    from multiple different points. In each case it is the earliest point
    on a particular path where it is known that the protection provided by
    XPT_BUSY is no longer needed.

    However there are (still) some error paths which do not call
    svc_xprt_received, and requiring each ->xpo_recvfrom to make the call
    does not encourage robustness.

    So: move the svc_xprt_received call to be made just after the
    call to ->xpo_recvfrom(), and move it of the various ->xpo_recvfrom
    methods.

    This means that it may not be called at the earliest possible instant,
    but this is unlikely to be a measurable performance issue.

    Note that there are still other calls to svc_xprt_received as it is
    also needed when an xprt is newly created.

    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    Neil Brown
     

30 Mar, 2010

2 commits

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     
  • svc_xprt_put() can call tcp_close(), which can sleep, so we shouldn't be
    holding this lock.

    In fact, only the xpt_list removal and the sv_tmpcnt decrement should
    need the sv_lock here.

    Reported-by: Mi Jinlong
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

01 Mar, 2010

2 commits


27 Feb, 2010

1 commit

  • The 'struct svc_deferred_req's on the xpt_deferred queue do not
    own a reference to the owning xprt. This is seen in svc_revisit
    which is where things are added to this queue. dr->xprt is set to
    NULL and the reference to the xprt it put.

    So when this list is cleaned up in svc_delete_xprt, we mustn't
    put the reference.

    Also, replace the 'for' with a 'while' which is arguably
    simpler and more likely to compile efficiently.

    Cc: Tom Tucker
    Signed-off-by: NeilBrown
    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    Neil Brown
     

27 Jan, 2010

2 commits

  • write_ports() converts svc_create_xprt()'s ENOENT error return to
    EPROTONOSUPPORT so that rpc.nfsd (in user space) can report an error
    message that makes sense.

    It turns out that several of the other kernel APIs rpc.nfsd use can
    also return ENOENT from svc_create_xprt(), by way of lockd_up().

    On the client side, an NFSv2 or NFSv3 mount request can also return
    the result of lockd_up(). This error may also be returned during an
    NFSv4 mount request, since the NFSv4 callback service uses
    svc_create_xprt() to create the callback listener. An ENOENT error
    return results in a confusing error message from the mount command.

    Let's have svc_create_xprt() return EPROTONOSUPPORT instead of ENOENT.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up: Bruce observed we have more or less common logic in each of
    svc_create_xprt()'s callers: the check to create an IPv6 RPC listener
    socket only if CONFIG_IPV6 is set. I'm about to add another case
    that does just the same.

    If we move the ifdefs into __svc_xpo_create(), then svc_create_xprt()
    call sites can get rid of the "#ifdef" ugliness, and can use the same
    logic with or without IPv6 support available in the kernel.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever