07 Aug, 2010

1 commit

  • It's harmless to set max_block_size after the server is created, but also
    ineffective, since the value is only used at the time of
    svc_create_pooled(). So fail the attempt, in keeping with the pattern
    set by write_versions, write_{lease,grace}time and write_recoverydir.

    (This could break userspace that tried to write to nfsd/max_block_size
    between setting up sockets and starting the server. However, such code
    wouldn't have worked anyway, and I don't know of any examples--rpc.nfsd
    in nfs-utils, probably the only user of the interface, doesn't do that.)

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
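A minimal user-space sketch of the "fail once the server exists" pattern described above. Every name here (`toy_serv`, `toy_nfsd_serv`, `toy_write_maxblksize`) is a hypothetical stand-in and not the kernel's code; the EBUSY return is an assumption that mirrors what the write_versions entry elsewhere in this log reports.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's server pointer and block size. */
struct toy_serv { int unused; };

struct toy_serv *toy_nfsd_serv;          /* NULL until the server is created */
unsigned int toy_max_blksize = 32768;    /* illustrative default only */

/* Returns the value now in effect, or -EBUSY once the server exists. */
long toy_write_maxblksize(unsigned int requested)
{
    if (toy_nfsd_serv != NULL)
        return -EBUSY;   /* too late: the value was consumed at creation time */

    toy_max_blksize = requested;
    return (long)toy_max_blksize;
}
```

Under these assumptions, a write before the server is created updates the value, while a later write is rejected and leaves the old value in place.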
     

30 Jul, 2010

1 commit

  • Fixes at least one real minor bug: the nfs4 recovery dir sysctl
    would not return its status properly.

    Also, I finished Al's commit 1e41568d7378d ("Take ima_path_check() in
    nfsd past dentry_open() in nfsd_open()"), which moved the IMA code
    but left the old path initializer in there.

    The rest is just dead-code removal, I think, although I was not
    fully sure about the "is_borc" stuff. Some more review
    would still be good.

    Found by gcc 4.6's new warnings.

    Signed-off-by: Andi Kleen
    Cc: Al Viro
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: J. Bruce Fields

    Andi Kleen
     

23 Jul, 2010

3 commits

  • Right now, nfsd keeps a lockd reference for each socket that it has
    open. This is unnecessary and complicates the error handling on
    startup and shutdown. Change it to do a single lockd_up when starting
    the first nfsd thread and a single lockd_down when taking down the
    last nfsd thread. Because of the strange way the sv_count is handled
    this requires an extra flag to tell whether the nfsd_serv holds a
    reference for lockd or not.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
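The single-reference scheme above can be modelled in a few lines of user-space C. This is only a sketch: the counter, the flag and `toy_set_threads()` are hypothetical names, and the real code deals with sv_count details that are ignored here.

```c
#include <stdbool.h>

int  toy_lockd_users;        /* models lockd's own user count */
bool toy_nfsd_holds_lockd;   /* the "extra flag" the message mentions */
int  toy_nfsd_threads;       /* current number of nfsd threads */

void toy_lockd_up(void)   { toy_lockd_users++; }
void toy_lockd_down(void) { toy_lockd_users--; }

/* Set the thread count, taking or dropping the one lockd reference. */
void toy_set_threads(int n)
{
    if (n > 0 && !toy_nfsd_holds_lockd) {
        toy_lockd_up();                  /* first thread is starting */
        toy_nfsd_holds_lockd = true;
    }

    toy_nfsd_threads = n;

    if (n == 0 && toy_nfsd_holds_lockd) {
        toy_lockd_down();                /* last thread is gone */
        toy_nfsd_holds_lockd = false;
    }
}
```

The flag makes both transitions idempotent: growing from 4 to 8 threads takes no second reference, and writing 0 twice drops the reference only once.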
     
  • __write_ports_addxprt calls nfsd_create_serv. That increases the
    refcount of nfsd_serv (which is tracked in sv_nrthreads). The service
    only decrements the thread count on error, not on success like
    __write_ports_addfd does, so using this interface leaves the nfsd
    thread count high.

    Fix this by having this function call svc_destroy() on error to release
    the reference (and possibly to tear down the service) and simply
    decrement the refcount without tearing down the service on success.

    This makes the sv_nrthreads handling work basically the same in both
    __write_ports_addxprt and __write_ports_addfd.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • The refcounting for nfsd is a little goofy. What happens is that we
    create the nfsd RPC service, attach sockets to it but don't actually
    start the threads until someone writes to the "threads" procfile. To do
    this, __write_ports_addfd will create the nfsd service and then will
    decrement the refcount when exiting but won't actually destroy the
    service.

    This is fine when there aren't errors, but when there are this can
    cause later attempts to start nfsd to fail. nfsd_serv will be set,
    and that causes __write_versions to return EBUSY.

    Fix this by calling svc_destroy on nfsd_serv when this function is
    going to return error.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
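The reference handling described in the last two entries can be sketched as follows. This is a user-space model with hypothetical names (`toy_create_serv`, `toy_destroy`, `toy_ports_addfd`); the `add_err` parameter merely simulates whether attaching the socket succeeded.

```c
#include <errno.h>
#include <stdlib.h>

struct toy_serv { int nrthreads; };   /* doubles as the reference count */
struct toy_serv *toy_serv;            /* NULL when no service exists */

/* Create the service, or take another reference if it already exists. */
int toy_create_serv(void)
{
    if (toy_serv) {
        toy_serv->nrthreads++;
        return 0;
    }
    toy_serv = calloc(1, sizeof(*toy_serv));
    if (!toy_serv)
        return -ENOMEM;
    toy_serv->nrthreads = 1;
    return 0;
}

/* Drop one reference; free the service when the last one goes away. */
void toy_destroy(void)
{
    if (--toy_serv->nrthreads == 0) {
        free(toy_serv);
        toy_serv = NULL;
    }
}

/* add_err simulates the result of attaching the socket (0 or -errno). */
int toy_ports_addfd(int add_err)
{
    int err = toy_create_serv();

    if (err)
        return err;
    if (add_err < 0) {
        toy_destroy();          /* error: release the ref, maybe tear down */
        return add_err;
    }
    toy_serv->nrthreads--;      /* success: drop the ref, keep the service */
    return 0;
}
```

In this model a failed attempt leaves no service behind, so a later configuration write would not see a stale service pointer.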
     

25 May, 2010

1 commit


04 May, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    7, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

07 Mar, 2010

5 commits


28 Jan, 2010

1 commit

  • Try to create a PF_INET6 listener for NFSD, if IPv6 is enabled in the
    kernel.

    Make sure nfsd_serv's reference count is decreased if
    __write_ports_addxprt() failed to create a listener. See
    __write_ports_addfd().

    Our current plan is to rely on rpc.nfsd to create appropriate IPv6
    listeners when server-side NFS/IPv6 support is desired. Legacy
    behavior, via the write_threads or write_svc kernel APIs, will remain
    the same -- only IPv4 listeners are created.

    Signed-off-by: Chuck Lever
    [bfields@citi.umich.edu: Move error-handling code to end]
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

27 Jan, 2010

1 commit

  • write_ports() converts svc_create_xprt()'s ENOENT error return to
    EPROTONOSUPPORT so that rpc.nfsd (in user space) can report an error
    message that makes sense.

    It turns out that several of the other kernel APIs rpc.nfsd uses can
    also return ENOENT from svc_create_xprt(), by way of lockd_up().

    On the client side, an NFSv2 or NFSv3 mount request can also return
    the result of lockd_up(). This error may also be returned during an
    NFSv4 mount request, since the NFSv4 callback service uses
    svc_create_xprt() to create the callback listener. An ENOENT error
    return results in a confusing error message from the mount command.

    Let's have svc_create_xprt() return EPROTONOSUPPORT instead of ENOENT.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
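A small sketch of the error-code change described above, assuming a hypothetical table of registered transport class names. `toy_create_xprt()` is not the kernel function; it only shows an unknown transport name producing EPROTONOSUPPORT rather than ENOENT.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical list of registered transport classes. */
const char *toy_xprt_classes[] = { "tcp", "udp" };

/* Returns 0 if the named transport class is known, else a negative errno. */
int toy_create_xprt(const char *name)
{
    size_t i;
    size_t n = sizeof(toy_xprt_classes) / sizeof(toy_xprt_classes[0]);

    for (i = 0; i < n; i++)
        if (strcmp(toy_xprt_classes[i], name) == 0)
            return 0;

    return -EPROTONOSUPPORT;    /* previously -ENOENT */
}
```

On glibc, strerror(EPROTONOSUPPORT) reads "Protocol not supported", whereas strerror(ENOENT) reads "No such file or directory", which is the confusing message the change avoids.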
     

16 Dec, 2009

1 commit


15 Dec, 2009

3 commits


02 Oct, 2009

1 commit


26 Aug, 2009

1 commit

  • lock_kernel() in knfsd was replaced with a mutex. The later
    commit 03cf6c9f49a8fea953d38648d016e3f46e814991 ("knfsd:
    add file to export stats about nfsd pools") did not follow
    that change. This patch fixes the issue.

    Also move the get and put of nfsd_serv to the open and close methods
    (instead of start and stop methods) to allow atomic check and increment
    of reference count in the open method (where we can still return an
    error).

    Signed-off-by: Ryusei Yamaguchi
    Signed-off-by: Isaku Yamahata
    Signed-off-by: YOSHIFUJI Hideaki
    Cc: Greg Banks
    Signed-off-by: J. Bruce Fields

    Ryusei Yamaguchi
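The point of moving the get and put into open and close can be illustrated with a user-space model. All names are hypothetical, the mutex is reduced to a flag, and the ENODEV return is an assumption: the message above only says that open can still return an error.

```c
#include <errno.h>
#include <stddef.h>

struct toy_serv { int refs; };

struct toy_serv *toy_serv;   /* NULL when the service is not running */
int toy_mutex_held;          /* stands in for the nfsd mutex */

/* Check for the service and take a reference in one locked section. */
int toy_stats_open(void)
{
    toy_mutex_held = 1;
    if (toy_serv == NULL) {
        toy_mutex_held = 0;
        return -ENODEV;      /* assumed error code; open may still fail */
    }
    toy_serv->refs++;
    toy_mutex_held = 0;
    return 0;
}

/* Drop the reference taken by a successful open. */
void toy_stats_release(void)
{
    toy_mutex_held = 1;
    toy_serv->refs--;
    toy_mutex_held = 0;
}
```

Because the check and the increment happen under the same lock, the service cannot disappear between them, and a missing service is reported to the caller at open time.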
     

21 Aug, 2009

1 commit


10 Aug, 2009

1 commit


29 Jul, 2009

2 commits

  • Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • kmemleak produces the following warning

    unreferenced object 0xc9ec02a0 (size 8):
    comm "cat", pid 19048, jiffies 730243
    backtrace:
    [] create_object+0x100/0x240
    [] kmemleak_alloc+0x2b/0x60
    [] __kmalloc+0x14b/0x270
    [] write_pool_threads+0x87/0x1d0
    [] nfsctl_transaction_write+0x58/0x70
    [] nfsctl_transaction_read+0x4f/0x60
    [] vfs_read+0x94/0x150
    [] sys_read+0x3d/0x70
    [] sysenter_do_call+0x12/0x32
    [] 0xffffffff

    write_pool_threads() only frees nthreads on error paths, in the success case
    we leak it.

    Signed-off-by: Eric Sesterhenn
    Reviewed-by: Catalin Marinas
    Signed-off-by: J. Bruce Fields

    Eric Sesterhenn
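The leak pattern above, and the usual single-exit fix, can be sketched in user space. The allocation counter and all function names are hypothetical; `fail` simulates an error partway through the function.

```c
#include <errno.h>
#include <stdlib.h>

int toy_live_allocs;   /* outstanding allocations, for the demonstration */

int *toy_alloc(size_t n)
{
    int *p = calloc(n, sizeof(*p));

    if (p)
        toy_live_allocs++;
    return p;
}

void toy_free(int *p)
{
    if (p) {
        free(p);
        toy_live_allocs--;
    }
}

/* Returns the number of pools handled, or a negative errno. */
int toy_write_pool_threads(size_t npools, int fail)
{
    int rv;
    int *nthreads = toy_alloc(npools);

    if (!nthreads)
        return -ENOMEM;

    rv = fail ? -EINVAL : (int)npools;

    /* Single exit: the array is freed on success as well as on error. */
    toy_free(nthreads);
    return rv;
}
```

Routing both outcomes through the same cleanup keeps the success path from being forgotten, which is the case the kmemleak report caught.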
     

13 Jul, 2009

1 commit

  • * Remove smp_lock.h from files which don't need it (including some headers!)
    * Add smp_lock.h to files which do need it
    * Make smp_lock.h include conditional in hardirq.h
    It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

    This will make hardirq.h inclusion cheaper for every PREEMPT=n config
    (which includes allmodconfig/allyesconfig, BTW)

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

19 Jun, 2009

1 commit

  • Currently when we write a number to 'threads' in nfsdfs,
    we take the nfsd_mutex, update the number of threads, then take the
    mutex again to read the number of threads.

    Mostly this isn't a big deal. However, if we write '0', and
    portmap happens to be dead, then we can get unpredictable behaviour.
    If the nfsd threads all got killed quickly and the last thread is
    waiting for portmap to respond, then the second time we take the mutex
    we will block waiting for the last thread.
    However if the nfsd threads didn't die quite that fast, then there
    will be no contention when we try to take the mutex again.

    Unpredictability isn't fun, and waiting for the last thread to exit is
    pointless, so avoid taking the lock twice.
    To achieve this, have nfsd_svc return a non-negative number of active
    threads when not returning a negative error.

    Signed-off-by: NeilBrown

    NeilBrown
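A sketch of the return convention described above: the thread count is sampled while the lock is still held and returned to the caller, so no second acquisition is needed. The lock is modelled as a counter, and every name is hypothetical.

```c
#include <errno.h>

int toy_lock_count;   /* how many times the mutex was taken */
int toy_nrthreads;    /* current number of threads */

void toy_lock(void)   { toy_lock_count++; }
void toy_unlock(void) { }

/* Returns the resulting thread count (>= 0) or a negative errno. */
int toy_nfsd_svc(int nrservs, int simulate_error)
{
    int ret;

    toy_lock();
    if (simulate_error) {
        ret = -ENOMEM;
    } else {
        toy_nrthreads = nrservs;
        ret = toy_nrthreads;    /* sampled while the lock is still held */
    }
    toy_unlock();
    return ret;
}
```

A caller can report the return value directly when it is non-negative, which in this model costs exactly one lock acquisition per write.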
     

29 Apr, 2009

13 commits

  • Clean up: For consistency, handle output buffer size checking in the
    other nfsctl functions the same way it's done for write_versions().

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • While it's not likely today that there are enough NFS versions to
    overflow the output buffer in write_versions(), we should be more
    careful about detecting the end of the buffer.

    The number of NFS versions will only increase as NFSv4 minor versions
    are added.

    Note that this API doesn't behave the same as portlist. Here we
    attempt to display as many versions as will fit in the buffer, and do
    not provide any indication that an overflow would have occurred. I
    don't have any good rationale for that.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
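The "display as many as will fit" behaviour can be sketched with snprintf(). This is a user-space illustration only: `toy_versions()` and its "+N" output format are assumptions, not the kernel's exact output.

```c
#include <stdio.h>

/* Write "+v1 +v2 ..." into buf; silently stop at the first entry that
 * does not fit. Returns the number of bytes written (excluding the NUL). */
int toy_versions(const int *vers, size_t n, char *buf, size_t buflen)
{
    size_t used = 0, i;

    if (buflen == 0)
        return 0;
    buf[0] = '\0';

    for (i = 0; i < n; i++) {
        int len = snprintf(buf + used, buflen - used, "%s+%d",
                           i ? " " : "", vers[i]);

        if (len < 0 || (size_t)len >= buflen - used) {
            buf[used] = '\0';   /* drop the partial entry, keep what fit */
            break;
        }
        used += (size_t)len;
    }
    return (int)used;
}
```

As the message notes, the caller gets no indication that entries were left out; it only sees a shorter string.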
     
  • While it's not likely a pathname will be longer than
    SIMPLE_TRANSACTION_SIZE, we should be more careful about just
    plopping it into the output buffer without bounds checking.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Adjust the synopsis of svc_sock_names() to pass in the size of the
    output buffer. Add a documenting comment.

    This is a cosmetic change for now. A subsequent patch will make sure
    the buffer length is passed to one_sock_name(), where the length will
    actually be useful.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Adjust the synopsis of svc_addsock() to pass in the size of the output
    buffer. Add a documenting comment.

    This is a cosmetic change for now. A subsequent patch will make sure
    the buffer length is passed to one_sock_name(), where the length will
    actually be useful.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • The svc_xprt_names() function can overflow its buffer if it's so near
    the end of the passed in buffer that the "name too long" string still
    doesn't fit. Of course, it could never tell if it was near the end
    of the passed in buffer, since its only caller passes in zero as the
    buffer length.

    Let's make this API a little safer.

    Change svc_xprt_names() so it *always* checks for a buffer overflow,
    and change its only caller to pass in the correct buffer length.

    If svc_xprt_names() does overflow its buffer, it now fails with an
    ENAMETOOLONG errno, instead of trying to write a message at the end
    of the buffer. I don't like this much, but I can't figure out a clean
    way that's always safe to return some of the names, *and* an
    indication that the buffer was not long enough.

    The displayed error when doing a 'cat /proc/fs/nfsd/portlist' is
    "File name too long".

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
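The always-checked variant described above can be sketched like this. The `toy_xprt` structure and the "name port" line format are assumptions for illustration; the point is that an overflow yields ENAMETOOLONG instead of partial output.

```c
#include <errno.h>
#include <stdio.h>

struct toy_xprt {
    const char *name;
    unsigned short port;
};

/* Write one "name port\n" line per transport. Returns the number of bytes
 * written, or -ENAMETOOLONG if the output does not fit in buf. */
int toy_xprt_names(const struct toy_xprt *x, size_t n,
                   char *buf, size_t buflen)
{
    size_t used = 0, i;

    if (buflen == 0)
        return -ENAMETOOLONG;
    buf[0] = '\0';

    for (i = 0; i < n; i++) {
        int len = snprintf(buf + used, buflen - used, "%s %u\n",
                           x[i].name, (unsigned int)x[i].port);

        if (len < 0 || (size_t)len >= buflen - used) {
            buf[0] = '\0';          /* fail cleanly rather than truncate */
            return -ENAMETOOLONG;
        }
        used += (size_t)len;
    }
    return (int)used;
}
```

The length is checked on every iteration, so the function never relies on the caller having passed a meaningful size only some of the time.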
     
  • Clean up.

    A couple of years ago, a series of commits, finishing with commit
    5680c446, swapped the order of the lockd_up() and svc_addsock() calls
    in __write_ports(). At that time lockd_up() needed to know the
    transport protocol of the passed-in socket to start a listener on the
    same transport protocol.

    These days, lockd_up() doesn't take a protocol argument; it always
    starts both a UDP and TCP listener. It's now more straightforward to
    try the lockd_up() first, then do a lockd_down() if the svc_addsock()
    fails.

    Careful review of this code shows that the svc_sock_names() call is
    used only to close the just-opened socket in case lockd_up() fails.
    So it is no longer needed if lockd_up() is done first.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up: Refactor transport name listing out of __write_ports() to
    make it easier to understand and maintain.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • User space must call listen(3) on SOCK_STREAM sockets passed into
    /proc/fs/nfsd/portlist, otherwise that listener is ignored. Document
    this.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
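From the user-space side, the requirement above amounts to calling listen() before handing the descriptor over. The helper below is a hypothetical sketch that binds to the loopback address for demonstration; it stops short of writing the descriptor number to /proc/fs/nfsd/portlist.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket, bind it, and put it into the listening state.
 * Returns the descriptor, or -1 on failure. Port 0 picks any free port. */
int toy_make_listener(unsigned short port)
{
    struct sockaddr_in sin;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* demo address only */
    sin.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(fd, 64) < 0) {       /* without listen(), the socket is ignored */
        close(fd);
        return -1;
    }
    return fd;
}
```

A real caller would bind to the address it actually wants to serve and then write the descriptor number to the portlist file; the backlog of 64 is an arbitrary choice here.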
     
  • Clean up: Refactor the socket creation logic out of __write_ports() to
    make it easier to understand and maintain.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up: Refactor the socket closing logic out of __write_ports() to
    make it easier to understand and maintain.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up: Refactor transport addition out of __write_ports() to make
    it easier to understand and maintain.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up: Refactor transport removal out of __write_ports() to make it
    easier to understand and maintain.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever