20 Aug, 2011

1 commit

  • Use NUMA aware allocations to reduce latencies and increase throughput.

    sunrpc kthreads can use kthread_create_on_node() if pool_mode is
    "percpu" or "pernode", and svc_prepare_thread()/svc_init_buffer() can
    also take into account NUMA node affinity for memory allocations.

    Signed-off-by: Eric Dumazet
    CC: "J. Bruce Fields"
    CC: Neil Brown
    CC: David Miller
    Reviewed-by: Greg Banks
    [bfields@redhat.com: fix up caller nfs41_callback_up]
    Signed-off-by: J. Bruce Fields

    Eric Dumazet
     

28 Oct, 2010

1 commit

  • lockd should use lock_flocks() instead of lock_kernel()
    to lock against posix locks accessing the i_flock list.

    This is a prerequisite to turning lock_flocks into a
    spinlock.

    Signed-off-by: Arnd Bergmann
    Acked-by: J. Bruce Fields

    Arnd Bergmann
     

02 Oct, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

27 Jan, 2010

1 commit

  • Clean up: Bruce observed we have more or less common logic in each of
    svc_create_xprt()'s callers: the check to create an IPv6 RPC listener
    socket only if CONFIG_IPV6 is set. I'm about to add another case
    that does just the same.

    If we move the ifdefs into __svc_xpo_create(), then svc_create_xprt()
    call sites can get rid of the "#ifdef" ugliness, and can use the same
    logic with or without IPv6 support available in the kernel.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

19 Nov, 2009

1 commit


12 Nov, 2009

1 commit


07 May, 2009

1 commit

  • If lockd is signalled soon enough after restart then locks_start_grace()
    will try to re-add an entry to a list and trigger a lock corruption
    warning.

    Thanks to Wang Chen for the problem report and diagnosis.

    WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
    ...
    list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
    ...
    Pid: 23062, comm: lockd Tainted: G W 2.6.30-rc2 #3
    Call Trace:
    warn_slowpath+0x71/0xa0
    ? update_curr+0x11d/0x125
    ? trace_hardirqs_on_caller+0x18/0x150
    ? trace_hardirqs_on+0xb/0xd
    ? _raw_spin_lock+0x53/0xfa
    __list_add+0x27/0x5c
    locks_start_grace+0x22/0x30 [lockd]
    set_grace_period+0x39/0x53 [lockd]
    ? lock_kernel+0x1c/0x28
    lockd+0x64/0x164 [lockd]
    ? trace_hardirqs_on_caller+0x18/0x150
    ? complete+0x34/0x3e
    ? lockd+0x0/0x164 [lockd]
    ? lockd+0x0/0x164 [lockd]
    kthread+0x45/0x6b
    ? kthread+0x0/0x6b
    kernel_thread_helper+0x7/0x10

    Reported-by: Wang Chen
    Signed-off-by: J. Bruce Fields
    Cc: stable@kernel.org

    J. Bruce Fields
     

29 Mar, 2009

4 commits

  • Apparently a lot of people need to disable IPv6 completely on their
    distributor-built systems, which have CONFIG_IPV6_MODULE enabled at
    build time.

    They do this by blacklisting the ipv6.ko module. This causes the
    creation of the lockd service listener to fail if CONFIG_IPV6_MODULE
    is set, but the module cannot be loaded.

    Now that the kernel's PF_INET6 RPC listeners are completely separate
    from PF_INET listeners, we can always start PF_INET. Then lockd can
    try to start PF_INET6, but it isn't required to be available.

    Note this has the added benefit that NLM callbacks from AF_INET6
    servers will never come from AF_INET remotes. We no longer have to
    worry about matching mapped IPv4 addresses to AF_INET when comparing
    addresses.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • We're about to convert over to using separate PF_INET and PF_INET6
    listeners, instead of a single PF_INET6 listener that also receives
    AF_INET requests and maps them to AF_INET6.

    Clear the way by removing the logic in lockd and the NFSv4 callback
    server that creates an AF_INET6 service listener.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Since an RPC service listener's protocol family is specified now via
    svc_create_xprt(), it no longer needs to be passed to svc_create() or
    svc_create_pooled(). Remove that argument from the synopsis of those
    functions, and remove the sv_family field from the svc_serv struct.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • The sv_family field is going away. Pass a protocol family argument to
    svc_create_xprt() instead of extracting the family from the passed-in
    svc_serv struct.

    Again, as this is a listener socket and not an address, we make this
    new argument an "int" protocol family, instead of an "sa_family_t."

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
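
The first commit in this group makes the PF_INET listener mandatory and the PF_INET6 listener optional. A minimal userspace sketch of that control flow follows. It assumes that a missing IPv6 stack is reported as -EAFNOSUPPORT; every name ending in `_sketch` and the family constants are hypothetical stand-ins, not kernel identifiers.

```c
#include <errno.h>

/* Hypothetical stand-ins for the two protocol families. */
enum { SKETCH_PF_INET = 4, SKETCH_PF_INET6 = 6 };

typedef int (*create_listener_fn)(int family);

/* Start the IPv4 listener unconditionally, then try IPv6.
 * Only an "address family not supported" failure is tolerated for IPv6. */
int make_listeners_sketch(create_listener_fn create)
{
    int err = create(SKETCH_PF_INET);
    if (err < 0)
        return err;                 /* IPv4 is required */

    err = create(SKETCH_PF_INET6);
    if (err < 0 && err != -EAFNOSUPPORT)
        return err;                 /* a real IPv6 failure is still fatal */
    return 0;
}

/* Fake back ends used to exercise the sketch. */
int create_both_sketch(int family)
{
    (void)family;
    return 0;
}

int create_v4_only_sketch(int family)
{
    return family == SKETCH_PF_INET ? 0 : -EAFNOSUPPORT;
}

int create_none_sketch(int family)
{
    (void)family;
    return -EAFNOSUPPORT;
}
```

With `create_v4_only_sketch`, which models a blacklisted ipv6.ko, the call still succeeds. With `create_none_sketch` it fails, because the IPv4 listener is not optional.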
     

08 Jan, 2009

2 commits


07 Jan, 2009

4 commits

  • If the kernel is configured to support IPv6 and the RPC server can register
    services via rpcbindv4, we are all set to enable IPv6 support for lockd.

    Signed-off-by: Chuck Lever
    Cc: Aime Le Rouzic
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up.

    Treat the nsm_use_hostnames global variable like nsm_local_state.
    Note that the default value of nsm_use_hostnames is still zero.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up: The include/linux/lockd/sm_inter.h header is nearly empty
    now. Remove it.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • The default method for calculating the number of connections allowed
    per RPC service arbitrarily limits single-threaded services to 80
    connections. This is too low for services like lockd and artificially
    limits the number of TCP clients that it can support.

    Have lockd set a default sv_maxconn value of 1024 (which is the typical
    default value for RLIMIT_NOFILE). Also add a module parameter to allow an
    admin to set this to an arbitrary value.

    Signed-off-by: Jeff Layton
    Acked-by: Neil Brown
    Signed-off-by: J. Bruce Fields

    Jeff Layton
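
The 80-connection figure in the sv_maxconn commit above is consistent with a default of (threads + 3) * 20 for a service with one thread. The sketch below assumes that formula and assumes that a zero sv_maxconn means "use the default". `conn_limit_sketch` is a hypothetical name, not the kernel function.

```c
/* Connection limit for a service: an explicit maximum wins,
 * otherwise fall back to the assumed per-thread default. */
unsigned int conn_limit_sketch(unsigned int sv_maxconn, unsigned int nrthreads)
{
    if (sv_maxconn != 0)
        return sv_maxconn;
    return (nrthreads + 3) * 20;
}
```

Under these assumptions a single-threaded service with no explicit maximum gets (1 + 3) * 20 = 80 connections, while setting the maximum to 1024 overrides the thread-based value.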
     

24 Dec, 2008

1 commit


25 Nov, 2008

1 commit


05 Oct, 2008

2 commits

  • Clean up: Now that lockd_up() starts listeners for both transports, the
    "proto" argument is no longer needed.

    Signed-off-by: Chuck Lever
    Cc: Neil Brown
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Commit 24e36663, which first appeared in 2.6.19, changed lockd so that
    the client side starts a UDP listener only if there is a UDP NFSv2/v3
    mount. Its description notes:

    This... means that lockd will *not* listen on UDP if the only
    mounts are TCP mount (and nfsd hasn't started).

    The latter is the only one that concerns me at all - I don't know
    if this might be a problem with some servers.

    Unfortunately it is a problem for Linux itself. The rpc.statd daemon
    on Linux uses UDP for contacting the local lockd, no matter which
    protocol is used for NFS mounts. Without a local lockd UDP listener,
    NFSv2/v3 lock recovery from Linux NFS clients always fails.

    Revert parts of commit 24e36663 so lockd_up() always starts both
    listeners.

    Signed-off-by: Chuck Lever
    Cc: Neil Brown
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

04 Oct, 2008

1 commit

  • Rewrite grace period code to unify management of grace period across
    lockd and nfsd. The current code has lockd and nfsd cooperate to
    compute a grace period which is satisfactory to them both, and then
    individually enforce it. This creates a slight race condition, since
    the enforcement is not coordinated. It's also more complicated than
    necessary.

    Here instead we have lockd and nfsd each inform common code when they
    enter the grace period, and when they're ready to leave the grace
    period, and allow normal locking only after both of them are ready to
    leave.

    We also expect the locks_start_grace()/locks_end_grace() interface here
    to be simpler to build on for future cluster/high-availability work,
    which may require (for example) putting individual filesystems into
    grace, or enforcing grace periods across multiple cluster nodes.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
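
The idea of the commit above, that each lock manager announces when it enters and leaves its grace period and normal locking resumes only after all of them have left, can be modeled in userspace as a list of active participants. This is a simplified sketch, not the kernel implementation: all names are hypothetical, there is no locking, and the double-add guard is an addition of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* One entry per lock manager (for example lockd or nfsd). */
struct lock_manager_sketch {
    struct lock_manager_sketch *next;
    bool in_list;
};

/* Managers that are currently in their grace period. */
static struct lock_manager_sketch *grace_list;

void sketch_start_grace(struct lock_manager_sketch *lm)
{
    if (lm->in_list)
        return;                     /* already registered: do not re-add */
    lm->next = grace_list;
    grace_list = lm;
    lm->in_list = true;
}

void sketch_end_grace(struct lock_manager_sketch *lm)
{
    struct lock_manager_sketch **pp = &grace_list;

    while (*pp != NULL) {
        if (*pp == lm) {
            *pp = lm->next;         /* unlink this manager */
            lm->next = NULL;
            lm->in_list = false;
            return;
        }
        pp = &(*pp)->next;
    }
}

/* Normal locking is allowed only when nobody is left in grace. */
bool sketch_in_grace(void)
{
    return grace_list != NULL;
}
```

If two managers start grace and only one ends it, `sketch_in_grace()` stays true until the second one ends it as well.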
     

30 Sep, 2008

3 commits

  • End lockd's grace period using schedule_delayed_work() instead of a
    check on every pass through the main loop.

    After a later patch, we'll depend on lockd to end its grace period even
    if it's not currently handling requests; so it shouldn't depend on being
    woken up from the main loop to do so.

    Also, Nakano Hiroaki (who independently produced a similar patch)
    noticed that the current behavior is buggy in the face of jiffies
    wraparound:

    "lockd uses time_before() to determine whether the grace period
    has expired. This would seem to be enough to avoid timer
    wrap-around issues, but, unfortunately, that is not the case.
    The time_* family of comparison functions can be safely used to
    compare jiffies relatively close in time, but they stop working
    after approximately LONG_MAX/2 ticks. nfsd can suffer this
    problem because the time_before() comparison in lockd() is not
    performed until the first request comes in, which means that if
    there is no lockd traffic for more than LONG_MAX/2 ticks we are
    screwed.

    "The implication of this is that once time_before() starts
    misbehaving any attempt from a NFS client to execute fcntl()
    will be received with a NLM_LCK_DENIED_GRACE_PERIOD message for
    25 days (assuming HZ=1000). In other words, the 50 seconds grace
    period could turn into a grace period of 50 days or more.

    "Note: This bug was analyzed independently by Oda-san
    and myself."

    Signed-off-by: J. Bruce Fields
    Cc: Nakano Hiroaki
    Cc: Itsuro Oda

    J. Bruce Fields
     
  • The check here is currently harmless but unnecessary, since, as the
    comment notes, there aren't any blocked-lock callbacks to process
    during the grace period anyway.

    And eventually we want to allow multiple grace periods that come and go
    for different filesystems over the course of the lifetime of lockd, at
    which point this check is just going to get in the way.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Introduce and initialize an address family field in the svc_serv structure.

    This field will determine what family to use for the service's listener
    sockets and what families are advertised via the local rpcbind daemon.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

24 Jun, 2008

1 commit


25 Apr, 2008

1 commit

  • * git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (80 commits)
    SUNRPC: Invalidate the RPCSEC_GSS session if the server dropped the request
    make nfs_automount_list static
    NFS: remove duplicate flags assignment from nfs_validate_mount_data
    NFS - fix potential NULL pointer dereference v2
    SUNRPC: Don't change the RPCSEC_GSS context on a credential that is in use
    SUNRPC: Fix a race in gss_refresh_upcall()
    SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requests
    SUNRPC: Remove the unused export of xprt_force_disconnect
    SUNRPC: remove XS_SENDMSG_RETRY
    SUNRPC: Protect creds against early garbage collection
    NFSv4: Attempt to use machine credentials in SETCLIENTID calls
    NFSv4: Reintroduce machine creds
    NFSv4: Don't use cred->cr_ops->cr_name in nfs4_proc_setclientid()
    nfs: fix printout of multiword bitfields
    nfs: return negative error value from nfs{,4}_stat_to_errno
    NLM/lockd: Ensure client locking calls use correct credentials
    NFS: Remove the buggy lock-if-signalled case from do_setlk()
    NLM/lockd: Fix a race when cancelling a blocking lock
    NLM/lockd: Ensure that nlmclnt_cancel() returns results of the CANCEL call
    NLM: Remove the signal masking in nlmclnt_proc/nlmclnt_cancel
    ...

    Linus Torvalds
     

24 Apr, 2008

2 commits

  • When svc_recv returns an unexpected error, lockd will print a warning
    and exit. This is problematic for several reasons. In particular, it will
    cause the reference counts for the thread to be wrong, and can lead to a
    potential BUG() call.

    Rather than exiting on error from svc_recv, have the thread do a 1s
    sleep and then retry the loop. This is unlikely to cause any harm, and
    if the error turns out to be something temporary then it may be able to
    recover.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Have lockd_up start lockd using kthread_run. With this change,
    lockd_down now blocks until lockd actually exits, so there's no longer
    need for the waitqueue code at the end of lockd_down. This also means
    that only one lockd can be running at a time which simplifies the code
    within lockd's main loop.

    This also adds a check for kthread_should_stop in the main loop of
    nlmsvc_retry_blocked and after that function returns. There's no sense
    continuing to retry blocks if lockd is coming down anyway.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
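
The first commit in this group replaces "exit on unexpected error" with "sleep one second and retry". A minimal userspace sketch of such a loop follows. It assumes that -EAGAIN and -EINTR are the expected "nothing to do" results; the scripted source stands in for the real receive call and the sleep is only counted, not performed. All names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Test double: hands out a fixed sequence of receive results. */
struct scripted_source {
    const int *results;   /* values to return, in order */
    size_t count;
    size_t next;
    unsigned int slept;   /* total seconds the loop asked to sleep */
};

static int scripted_recv(struct scripted_source *src)
{
    return src->next < src->count ? src->results[src->next++] : -EAGAIN;
}

static void scripted_sleep(struct scripted_source *src, unsigned int seconds)
{
    src->slept += seconds;
}

/* Returns the number of requests handled in the given number of passes. */
int run_service_loop_sketch(struct scripted_source *src, size_t iterations)
{
    int handled = 0;

    for (size_t i = 0; i < iterations; i++) {
        int err = scripted_recv(src);

        if (err == -EAGAIN || err == -EINTR)
            continue;               /* expected: just go around again */
        if (err < 0) {
            scripted_sleep(src, 1); /* unexpected: back off, then retry */
            continue;
        }
        handled++;                  /* a request arrived and was processed */
    }
    return handled;
}
```

The loop never exits on an error, so thread reference counts stay consistent, and a temporary failure costs one second per occurrence.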
     

20 Mar, 2008

1 commit

  • Bruce Fields says:
    "By the way, we've got another config-related nit here:

    http://bugzilla.linux-nfs.org/show_bug.cgi?id=156

    You can build lockd without CONFIG_SYSCTL set, but then the module will
    fail to load."

    For now, disable the sysctl registration calls in lockd if CONFIG_SYSCTL
    is not enabled. This allows the kernel to build properly if PROC_FS or
    SYSCTL is not enabled, but an NFS client is desired.

    In the long run, we would like to be able to build the kernel with an
    NFS client but without lockd. This makes sense, for example, if you want
    an NFSv4-only NFS client, as NFSv4 doesn't use NLM at all.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     

22 Feb, 2008

1 commit

  • Sorry for the noise, but here's the v3 of this compilation fix :)

    There are some places, which declare the char buf[...] on the stack
    to push it later into dprintk(). Since the dprintk sometimes (if the
    CONFIG_SYSCTL=n) becomes an empty do { } while (0) stub, these buffers
    cause gcc to produce appropriate warnings.

    Wrap these buffers with RPC_IFDEBUG macro, as Trond proposed, to
    compile them out when not needed.

    Signed-off-by: Pavel Emelyanov
    Acked-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Pavel Emelyanov
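
The pattern described in the commit above can be shown with a pair of userspace macros. This sketch uses hypothetical `SKETCH_` names in place of the kernel's macros: when debugging is compiled out, both the buffer declaration and the code that fills it disappear, so the compiler has no unused variable to warn about.

```c
#include <stdio.h>

/* Build with -DSKETCH_DEBUG to enable the debugging output. */
#ifdef SKETCH_DEBUG
# define SKETCH_IFDEBUG(x)   x
# define sketch_dprintk(...) printf(__VA_ARGS__)
#else
# define SKETCH_IFDEBUG(x)
# define sketch_dprintk(...) do { } while (0)
#endif

int handle_request_sketch(int id)
{
    /* The buffer exists only in debug builds. */
    SKETCH_IFDEBUG(char buf[32]);

    SKETCH_IFDEBUG(snprintf(buf, sizeof(buf), "req-%d", id));
    sketch_dprintk("handling %s\n", buf);
    return id;
}
```

Without the wrapper, a non-debug build would still declare `buf` and then never use it, which is the warning the commit removes.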
     

02 Feb, 2008

4 commits

  • Update the write handler for the portlist file to allow creating new
    listening endpoints on a transport. The general form of the string is:

    For example:

    echo "tcp 2049" > /proc/fs/nfsd/portlist

    This is intended to support the creation of a listening endpoint for
    RDMA transports without adding #ifdef code to the nfssvc.c file.

    Transports can also be removed as follows:

    '-'

    For example:

    echo "-tcp 2049" > /proc/fs/nfsd/portlist

    Attempting to add a listener with an invalid transport string results
    in EPROTONOSUPPORT and a perror string of "Protocol not supported".

    Attempting to remove a non-existent listener (e.g. bad proto or port)
    results in ENOTCONN and a perror string of
    "Transport endpoint is not connected"

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • Add a new svc function that allows a service to query whether a
    transport instance has already been created. This is used in lockd
    to determine whether or not a transport needs to be created when
    a lockd instance is brought up.

    Specifying 0 for the address family or port is effectively a wild-card,
    and will result in matching the first transport in the service's list
    that has a matching class name.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • Move sk_list and sk_ready to svc_xprt. This involves close because these
    lists are walked by svcs when closing all their transports. So I combined
    the moving of these lists to svc_xprt with making close transport independent.

    The svc_force_sock_close has been changed to svc_close_all and takes a list
    as an argument. This removes some svc internals knowledge from the svcs.

    This code races with module removal and transport addition.

    Thanks to Simon Holm Thøgersen for a compile fix.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields
    Cc: Simon Holm Thøgersen

    Tom Tucker
     
  • Modify the various kernel RPC svcs to use the svc_create_xprt service.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
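
The wildcard lookup described in the second commit of this group (0 for the address family or port matches anything, and the first transport with a matching class name wins) can be sketched over a plain array. The structure, the family constants and the function name are hypothetical; the kernel keeps transports on a list inside the service.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical address family values; 0 is reserved as the wildcard. */
enum { SKETCH_FAM_V4 = 1, SKETCH_FAM_V6 = 2 };

struct xprt_sketch {
    const char *class_name;   /* for example "tcp" or "udp" */
    int family;
    unsigned short port;
};

/* Return the first transport whose class name matches and whose family
 * and port match the request, treating 0 as "any". NULL if none match. */
const struct xprt_sketch *find_xprt_sketch(const struct xprt_sketch *list,
                                           size_t n, const char *class_name,
                                           int family, unsigned short port)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i].class_name, class_name) != 0)
            continue;
        if (family != 0 && list[i].family != family)
            continue;
        if (port != 0 && list[i].port != port)
            continue;
        return &list[i];
    }
    return NULL;
}
```

A caller such as lockd can use this kind of query to decide whether a listener for a given transport class already exists before creating another one.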
     

18 Jul, 2007

2 commits

  • Both lockd and (in the nfsv4 case) nfsd enforce a "grace period" after reboot,
    during which clients may reclaim locks from the previous server instance, but
    may not acquire new locks.

    Currently the lockd and nfsd enforce grace periods of different lengths. This
    may cause problems when we reboot a server with both v2/v3 and v4 clients.
    For example, if the lockd grace period is shorter (as is likely the case),
    then a v3 client might acquire a new lock that conflicts with a lock already
    held (but not yet reclaimed) by a v4 client.

    This patch calculates a lease time that lockd and nfsd can both use.

    Signed-off-by: Marc Eshel
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marc Eshel
     
  • Currently, the freezer treats all tasks as freezable, except for the kernel
    threads that explicitly set the PF_NOFREEZE flag for themselves. This
    approach is problematic, since it requires every kernel thread to either
    set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
    care for the freezing of tasks at all.

    It seems better to only require the kernel threads that want to or need to
    be frozen to use some freezer-related code and to remove any
    freezer-related code from the other (nonfreezable) kernel threads, which is
    done in this patch.

    The patch causes all kernel threads to be nonfreezable by default (ie. to
    have PF_NOFREEZE set by default) and introduces the set_freezable()
    function that should be called by the freezable kernel threads in order to
    unset PF_NOFREEZE. It also makes all of the currently freezable kernel
    threads call set_freezable(), so it shouldn't cause any (intentional)
    change of behaviour to appear. Additionally, it updates documentation to
    describe the freezing of tasks more accurately.

    [akpm@linux-foundation.org: build fixes]
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Nigel Cunningham
    Cc: Pavel Machek
    Cc: Oleg Nesterov
    Cc: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
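
The first commit in this group computes one grace period for lockd and nfsd to share. The commit text does not say how the value is chosen. The sketch below assumes it is the larger of the two periods, which is the smallest value that satisfies both. The function name and the second-based units are hypothetical.

```c
/* Shared grace period, assuming "the longer of the two" is the rule. */
unsigned long common_grace_sketch(unsigned long lockd_secs,
                                  unsigned long nfsd_secs)
{
    return lockd_secs > nfsd_secs ? lockd_secs : nfsd_secs;
}
```

Under that assumption, a v3 client cannot acquire a new lock through lockd while a v4 client is still entitled to reclaim one through nfsd, because both services leave grace at the same time.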
     

11 Jul, 2007

1 commit


18 Feb, 2007

1 commit