20 Aug, 2011
1 commit
-
Use NUMA aware allocations to reduce latencies and increase throughput.
sunrpc kthreads can use kthread_create_on_node() if pool_mode is
"percpu" or "pernode", and svc_prepare_thread()/svc_init_buffer() can
also take into account NUMA node affinity for memory allocations.

Signed-off-by: Eric Dumazet
CC: "J. Bruce Fields"
CC: Neil Brown
CC: David Miller
Reviewed-by: Greg Banks
[bfields@redhat.com: fix up caller nfs41_callback_up]
Signed-off-by: J. Bruce Fields
28 Oct, 2010
1 commit
-
lockd should use lock_flocks() instead of lock_kernel()
to lock against posix locks accessing the i_flock list.

This is a prerequisite to turning lock_flocks into a
spinlock.

Signed-off-by: Arnd Bergmann
Acked-by: J. Bruce Fields
02 Oct, 2010
1 commit
-
Signed-off-by: Pavel Emelyanov
Signed-off-by: J. Bruce Fields
30 Mar, 2010
1 commit
-
…it slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include
those headers directly instead of assuming availability. As this
conversion needs to touch a large number of source files, the
following script is used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there, i.e. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and tries to put the new include such that its order conforms
  to its surroundings. It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have a fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions. The script emitted errors for ~400
   files.

2. Each error was manually checked. Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others. This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell. Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   widely available and often used in preprocessor macros. Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
   were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
27 Jan, 2010
1 commit
-
Clean up: Bruce observed we have more or less common logic in each of
svc_create_xprt()'s callers: the check to create an IPv6 RPC listener
socket only if CONFIG_IPV6 is set. I'm about to add another case
that does just the same.

If we move the ifdefs into __svc_xpo_create(), then svc_create_xprt()
call sites can get rid of the "#ifdef" ugliness, and can use the same
logic with or without IPv6 support available in the kernel.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
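
The clean-up described above, keeping the configuration check inside one helper so that call sites stay unconditional, can be sketched in userspace C. This is only an illustration: HAVE_IPV6_SKETCH, create_listener() and start_listeners() are hypothetical names, not the kernel's __svc_xpo_create() or svc_create_xprt(), and treating a missing IPv6 family as non-fatal is an assumption of the sketch.

```c
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical build-time switch standing in for CONFIG_IPV6. */
#define HAVE_IPV6_SKETCH 1

/*
 * Hypothetical helper: the only place that knows whether IPv6 support
 * was compiled in. Returns 0 on success or a negative errno value.
 */
static int create_listener(int family)
{
	switch (family) {
	case PF_INET:
		return 0;
#if HAVE_IPV6_SKETCH
	case PF_INET6:
		return 0;
#endif
	default:
		return -EAFNOSUPPORT;
	}
}

/* Call sites need no #ifdef: they simply try both families. */
static int start_listeners(void)
{
	int err = create_listener(PF_INET);

	if (err < 0)
		return err;
	err = create_listener(PF_INET6);
	if (err < 0 && err != -EAFNOSUPPORT)
		return err;	/* assumption: a missing family is not fatal */
	return 0;
}
```

Setting HAVE_IPV6_SKETCH to 0 changes what create_listener() accepts, while start_listeners() compiles and behaves sensibly either way.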
19 Nov, 2009
1 commit
-
For consistency drop & in front of every proc_handler. Explicitly
taking the address is unnecessary and it prevents optimizations
like stubbing the proc_handlers to NULL.

Cc: Alexey Dobriyan
Cc: Ingo Molnar
Cc: Joe Perches
Signed-off-by: Eric W. Biederman
12 Nov, 2009
1 commit
-
Now that sys_sysctl is a generic wrapper around /proc/sys, the .ctl_name
and .strategy members of sysctl tables are dead code. Remove them.

Cc: Jan Harkes
Signed-off-by: Eric W. Biederman
07 May, 2009
1 commit
-
If lockd is signalled soon enough after restart then locks_start_grace()
will try to re-add an entry to a list and trigger a list corruption
warning.

Thanks to Wang Chen for the problem report and diagnosis.

WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
...
list_add corruption. next->prev should be prev (ef8fe958), but was ef8ff128. (next=ef8ff128).
...
Pid: 23062, comm: lockd Tainted: G W 2.6.30-rc2 #3
Call Trace:
[] warn_slowpath+0x71/0xa0
[] ? update_curr+0x11d/0x125
[] ? trace_hardirqs_on_caller+0x18/0x150
[] ? trace_hardirqs_on+0xb/0xd
[] ? _raw_spin_lock+0x53/0xfa
[] __list_add+0x27/0x5c
[] locks_start_grace+0x22/0x30 [lockd]
[] set_grace_period+0x39/0x53 [lockd]
[] ? lock_kernel+0x1c/0x28
[] lockd+0x64/0x164 [lockd]
[] ? trace_hardirqs_on_caller+0x18/0x150
[] ? complete+0x34/0x3e
[] ? lockd+0x0/0x164 [lockd]
[] ? lockd+0x0/0x164 [lockd]
[] kthread+0x45/0x6b
[] ? kthread+0x0/0x6b
[] kernel_thread_helper+0x7/0x10

Reported-by: Wang Chen
Signed-off-by: J. Bruce Fields
Cc: stable@kernel.org
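
Why re-adding an already linked entry trips the list debugging check can be seen with a small userspace model of a circular doubly linked list. The sketch below is not the fix that was applied to lockd; struct node and restart_grace_sketch() are hypothetical, and the guard shown (unlink first if the entry is still linked) is just one way to make a repeated "start" call safe.

```c
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *next, *prev;
};

static void node_init(struct node *n)
{
	n->next = n;
	n->prev = n;
}

/* An initialised node that is on no list points at itself. */
static bool node_unlinked(const struct node *n)
{
	return n->next == n;
}

static void node_add(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void node_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/*
 * Hypothetical "start grace" that tolerates being called twice.
 * Without the guard, the second node_add() would make the entry
 * point at itself and leave the list corrupted.
 */
static void restart_grace_sketch(struct node *entry, struct node *head)
{
	if (!node_unlinked(entry))
		node_del_init(entry);
	node_add(entry, head);
}

static size_t list_len(const struct node *head)
{
	size_t n = 0;

	for (const struct node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

Calling restart_grace_sketch() twice on the same entry leaves exactly one copy on the list, which is the invariant the warning in the trace above is checking for.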
29 Mar, 2009
4 commits
-
Apparently a lot of people need to disable IPv6 completely on their
distributor-built systems, which have CONFIG_IPV6_MODULE enabled at
build time.

They do this by blacklisting the ipv6.ko module. This causes the
creation of the lockd service listener to fail if CONFIG_IPV6_MODULE
is set, but the module cannot be loaded.

Now that the kernel's PF_INET6 RPC listeners are completely separate
from PF_INET listeners, we can always start PF_INET. Then lockd can
try to start PF_INET6, but it isn't required to be available.

Note this has the added benefit that NLM callbacks from AF_INET6
servers will never come from AF_INET remotes. We no longer have to
worry about matching mapped IPv4 addresses to AF_INET when comparing
addresses.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust
-
We're about to convert over to using separate PF_INET and PF_INET6
listeners, instead of a single PF_INET6 listener that also receives
AF_INET requests and maps them to AF_INET6.

Clear the way by removing the logic in lockd and the NFSv4 callback
server that creates an AF_INET6 service listener.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust
-
Since an RPC service listener's protocol family is specified now via
svc_create_xprt(), it no longer needs to be passed to svc_create() or
svc_create_pooled(). Remove that argument from the synopsis of those
functions, and remove the sv_family field from the svc_serv struct.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust
-
The sv_family field is going away. Pass a protocol family argument to
svc_create_xprt() instead of extracting the family from the passed-in
svc_serv struct.

Again, as this is a listener socket and not an address, we make this
new argument an "int" protocol family, instead of an "sa_family_t."

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust
08 Jan, 2009
2 commits
-
Clean up: Use Bruce's preferred control flow style in make_socks().
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
-
Clean up: extract common logic in NLM's make_socks() function
into a helper.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
07 Jan, 2009
4 commits
-
If the kernel is configured to support IPv6 and the RPC server can register
services via rpcbindv4, we are all set to enable IPv6 support for lockd.

Signed-off-by: Chuck Lever
Cc: Aime Le Rouzic
Signed-off-by: J. Bruce Fields
-
Clean up.
Treat the nsm_use_hostnames global variable like nsm_local_state.
Note that the default value of nsm_use_hostnames is still zero.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
-
Clean up: The include/linux/lockd/sm_inter.h header is nearly empty
now. Remove it.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
-
The default method for calculating the number of connections allowed
per RPC service arbitrarily limits single-threaded services to 80
connections. This is too low for services like lockd and artificially
limits the number of TCP clients that it can support.

Have lockd set a default sv_maxconn value of 1024 (which is the typical
default value for RLIMIT_NOFILE). Also add a module parameter to allow an
admin to set this to an arbitrary value.

Signed-off-by: Jeff Layton
Acked-by: Neil Brown
Signed-off-by: J. Bruce Fields
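
The arithmetic behind the figure of 80 can be sketched as follows. The formula (threads + 3) * 20 is an assumption chosen because it reproduces the limit quoted above for a single-threaded service; conn_limit() is a hypothetical helper, and its second argument stands in for sv_maxconn, where 0 means "use the default calculation".

```c
/*
 * Assumed default: (threads + 3) * 20, which gives 80 for one thread.
 * A non-zero override takes precedence over the calculated value.
 */
static unsigned int conn_limit(unsigned int nrthreads, unsigned int maxconn)
{
	return maxconn ? maxconn : (nrthreads + 3) * 20;
}
```

Under that assumption, conn_limit(1, 0) yields 80, while conn_limit(1, 1024) yields the 1024 that lockd now sets by default.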
24 Dec, 2008
1 commit
-
Signed-off-by: Trond Myklebust
25 Nov, 2008
1 commit
-
If nfsd was shut down before the grace period ended, we could end up
with a freed object still on grace_list. Thanks to Jeff Moyer for
reporting the resulting list corruption warnings.

Signed-off-by: J. Bruce Fields
Tested-by: Jeff Moyer
05 Oct, 2008
2 commits
-
Clean up: Now that lockd_up() starts listeners for both transports, the
"proto" argument is no longer needed.

Signed-off-by: Chuck Lever
Cc: Neil Brown
Signed-off-by: J. Bruce Fields
-
Commit 24e36663, which first appeared in 2.6.19, changed lockd so that
the client side starts a UDP listener only if there is a UDP NFSv2/v3
mount. Its description notes:

    This... means that lockd will *not* listen on UDP if the only
    mounts are TCP mount (and nfsd hasn't started).

    The latter is the only one that concerns me at all - I don't know
    if this might be a problem with some servers.

Unfortunately it is a problem for Linux itself. The rpc.statd daemon
on Linux uses UDP for contacting the local lockd, no matter which
protocol is used for NFS mounts. Without a local lockd UDP listener,
NFSv2/v3 lock recovery from Linux NFS clients always fails.

Revert parts of commit 24e36663 so lockd_up() always starts both
listeners.

Signed-off-by: Chuck Lever
Cc: Neil Brown
Signed-off-by: J. Bruce Fields
04 Oct, 2008
1 commit
-
Rewrite grace period code to unify management of grace period across
lockd and nfsd. The current code has lockd and nfsd cooperate to
compute a grace period which is satisfactory to them both, and then
individually enforce it. This creates a slight race condition, since
the enforcement is not coordinated. It's also more complicated than
necessary.

Here instead we have lockd and nfsd each inform common code when they
enter the grace period, and when they're ready to leave the grace
period, and allow normal locking only after both of them are ready to
leave.

We also expect the locks_start_grace()/locks_end_grace() interface here
to be simpler to build on for future cluster/high-availability work,
which may require (for example) putting individual filesystems into
grace, or enforcing grace periods across multiple cluster nodes.

Signed-off-by: J. Bruce Fields
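
The coordination described above, where normal locking resumes only after every participant has left its grace period, can be modelled in a few lines. This is a single-threaded sketch with no locking; struct grace_user, grace_start(), grace_end() and grace_active() are hypothetical names, and a counter is used here purely for brevity rather than as a description of how the kernel tracks participants.

```c
#include <stdbool.h>

/* One instance per participant, e.g. one for lockd and one for nfsd. */
struct grace_user {
	bool in_grace;
};

static int grace_users;	/* participants currently in grace */

static void grace_start(struct grace_user *u)
{
	if (!u->in_grace) {
		u->in_grace = true;
		grace_users++;
	}
}

static void grace_end(struct grace_user *u)
{
	if (u->in_grace) {
		u->in_grace = false;
		grace_users--;
	}
}

/* Normal locking is allowed only once this returns false. */
static bool grace_active(void)
{
	return grace_users > 0;
}
```

Because both start and end are idempotent per participant, a service that ends its grace period twice, or shuts down early, cannot leave the shared state inconsistent.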
30 Sep, 2008
3 commits
-
End lockd's grace period using schedule_delayed_work() instead of a
check on every pass through the main loop.

After a later patch, we'll depend on lockd to end its grace period even
if it's not currently handling requests; so it shouldn't depend on being
woken up from the main loop to do so.

Also, Nakano Hiroaki (who independently produced a similar patch)
noticed that the current behavior is buggy in the face of jiffies
wraparound:

    "lockd uses time_before() to determine whether the grace period
    has expired. This would seem to be enough to avoid timer
    wrap-around issues, but, unfortunately, that is not the case.
    The time_* family of comparison functions can be safely used to
    compare jiffies relatively close in time, but they stop working
    after approximately LONG_MAX/2 ticks. nfsd can suffer this
    problem because the time_before() comparison in lockd() is not
    performed until the first request comes in, which means that if
    there is no lockd traffic for more than LONG_MAX/2 ticks we are
    screwed.

    "The implication of this is that once time_before() starts
    misbehaving any attempt from a NFS client to execute fcntl()
    will be received with a NLM_LCK_DENIED_GRACE_PERIOD message for
    25 days (assuming HZ=1000). In other words, the 50 seconds grace
    period could turn into a grace period of 50 days or more.

    "Note: This bug was analyzed independently by Oda-san
    and myself."

Signed-off-by: J. Bruce Fields
Cc: Nakano Hiroaki
Cc: Itsuro Oda
-
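
The jiffies wraparound problem quoted in the preceding entry can be reproduced with a 32-bit model of the signed-difference comparison that time_before() relies on. time_before32() is a hypothetical stand-in written for this sketch. With a 32-bit counter and HZ=1000, 2^31 ticks is roughly 24.9 days, which matches the "25 days" in the quote.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * 32-bit model of a wrap-safe "a is before b" test: take the unsigned
 * difference and look at its sign. The conversion to int32_t assumes
 * the usual two's complement behaviour.
 */
static bool time_before32(uint32_t a, uint32_t b)
{
	return (int32_t)(uint32_t)(a - b) < 0;
}

/* Lazy check: only meaningful if it is evaluated often enough. */
static bool still_in_grace(uint32_t now, uint32_t grace_end)
{
	return time_before32(now, grace_end);
}
```

The comparison is correct across a counter wrap as long as the two values are less than 2^31 ticks apart. If nothing evaluates still_in_grace() until more than 2^31 ticks after grace_end, the difference changes sign and the check reports "still in grace" again, which is why ending the grace period from delayed work is more robust than testing it lazily.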
The check here is currently harmless but unnecessary, since, as the
comment notes, there aren't any blocked-lock callbacks to process
during the grace period anyway.

And eventually we want to allow multiple grace periods that come and go
for different filesystems over the course of the lifetime of lockd, at
which point this check is just going to get in the way.

Signed-off-by: J. Bruce Fields
-
Introduce and initialize an address family field in the svc_serv structure.
This field will determine what family to use for the service's listener
sockets and what families are advertised via the local rpcbind daemon.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
24 Jun, 2008
1 commit
-
If lockd_down is called very rapidly after lockd_up returns, then
there is a slim chance that lockd() will never be called. kthread()
will return before calling the function, so we'll end up never
actually calling the cleanup functions for the thread.

Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields
25 Apr, 2008
1 commit
-
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (80 commits)
SUNRPC: Invalidate the RPCSEC_GSS session if the server dropped the request
make nfs_automount_list static
NFS: remove duplicate flags assignment from nfs_validate_mount_data
NFS - fix potential NULL pointer dereference v2
SUNRPC: Don't change the RPCSEC_GSS context on a credential that is in use
SUNRPC: Fix a race in gss_refresh_upcall()
SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requests
SUNRPC: Remove the unused export of xprt_force_disconnect
SUNRPC: remove XS_SENDMSG_RETRY
SUNRPC: Protect creds against early garbage collection
NFSv4: Attempt to use machine credentials in SETCLIENTID calls
NFSv4: Reintroduce machine creds
NFSv4: Don't use cred->cr_ops->cr_name in nfs4_proc_setclientid()
nfs: fix printout of multiword bitfields
nfs: return negative error value from nfs{,4}_stat_to_errno
NLM/lockd: Ensure client locking calls use correct credentials
NFS: Remove the buggy lock-if-signalled case from do_setlk()
NLM/lockd: Fix a race when cancelling a blocking lock
NLM/lockd: Ensure that nlmclnt_cancel() returns results of the CANCEL call
NLM: Remove the signal masking in nlmclnt_proc/nlmclnt_cancel
...
24 Apr, 2008
2 commits
-
When svc_recv returns an unexpected error, lockd will print a warning
and exit. This is problematic for several reasons. In particular, it will
cause the reference counts for the thread to be wrong, and can lead to a
potential BUG() call.

Rather than exiting on error from svc_recv, have the thread do a 1s
sleep and then retry the loop. This is unlikely to cause any harm, and
if the error turns out to be something temporary then it may be able to
recover.

Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields
-
Have lockd_up start lockd using kthread_run. With this change,
lockd_down now blocks until lockd actually exits, so there's no longer
need for the waitqueue code at the end of lockd_down. This also means
that only one lockd can be running at a time which simplifies the code
within lockd's main loop.

This also adds a check for kthread_should_stop in the main loop of
nlmsvc_retry_blocked and after that function returns. There's no sense
continuing to retry blocks if lockd is coming down anyway.

Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields
20 Mar, 2008
1 commit
-
Bruce Fields says:

    "By the way, we've got another config-related nit here:

    http://bugzilla.linux-nfs.org/show_bug.cgi?id=156

    You can build lockd without CONFIG_SYSCTL set, but then the module will
    fail to load."

For now, disable the sysctl registration calls in lockd if CONFIG_SYSCTL
is not enabled. This allows the kernel to build properly if PROC_FS or
SYSCTL is not enabled, but an NFS client is desired.

In the long run, we would like to be able to build the kernel with an
NFS client but without lockd. This makes sense, for example, if you want
an NFSv4-only NFS client, as NFSv4 doesn't use NLM at all.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust
22 Feb, 2008
1 commit
-
Sorry for the noise, but here's the v3 of this compilation fix :)
There are some places, which declare the char buf[...] on the stack
to push it later into dprintk(). Since the dprintk sometimes (if the
CONFIG_SYSCTL=n) becomes an empty do { } while (0) stub, these buffers
cause gcc to produce appropriate warnings.

Wrap these buffers with RPC_IFDEBUG macro, as Trond proposed, to
compile them out when not needed.

Signed-off-by: Pavel Emelyanov
Acked-by: J. Bruce Fields
Signed-off-by: Trond Myklebust
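
The idea of compiling a debug-only buffer out together with the print that uses it can be sketched in userspace as below. SKETCH_DEBUG, IFDEBUG(), this local dprintk() and describe_port() are hypothetical stand-ins written for illustration; they are not the sunrpc definitions.

```c
#include <stdio.h>

/* Hypothetical switch standing in for the kernel's debug configuration. */
#ifndef SKETCH_DEBUG
#define SKETCH_DEBUG 0
#endif

#if SKETCH_DEBUG
#define IFDEBUG(x)	x
#define dprintk(...)	fprintf(stderr, __VA_ARGS__)
#else
#define IFDEBUG(x)
#define dprintk(...)	do { } while (0)
#endif

static int describe_port(int port)
{
	/* The buffer exists only when dprintk() will actually use it. */
	IFDEBUG(char buf[32];)

	IFDEBUG(snprintf(buf, sizeof(buf), "port %d", port);)
	dprintk("listening on %s\n", buf);
	return port;
}
```

With SKETCH_DEBUG set to 0 the declaration, the formatting call and the print all disappear, so the compiler never sees an unused variable. With it set to 1 the same source prints the formatted message.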
02 Feb, 2008
4 commits
-
Update the write handler for the portlist file to allow creating new
listening endpoints on a transport. The general form of the string is:

For example:

    echo "tcp 2049" > /proc/fs/nfsd/portlist

This is intended to support the creation of a listening endpoint for
RDMA transports without adding #ifdef code to the nfssvc.c file.

Transports can also be removed as follows:

    '-'

For example:

    echo "-tcp 2049" > /proc/fs/nfsd/portlist

Attempting to add a listener with an invalid transport string results
in EPROTONOSUPPORT and a perror string of "Protocol not supported".

Attempting to remove a non-existent listener (e.g. bad proto or port)
results in ENOTCONN and a perror string of
"Transport endpoint is not connected"

Signed-off-by: Tom Tucker
Acked-by: Neil Brown
Reviewed-by: Chuck Lever
Reviewed-by: Greg Banks
Signed-off-by: J. Bruce Fields
-
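
Parsing strings such as the "tcp 2049" and "-tcp 2049" examples in the preceding entry might look like the userspace sketch below. The grammar (an optional leading '-', a transport name, whitespace, a decimal port) is inferred from those two examples only, and struct port_request and parse_port_request() are hypothetical; this is not the nfsd write handler.

```c
#include <stdio.h>

struct port_request {
	int remove;		/* non-zero if the string started with '-' */
	char transport[16];
	int port;
};

/* Returns 0 on success, -1 if the string does not fit the assumed grammar. */
static int parse_port_request(const char *s, struct port_request *req)
{
	req->remove = (*s == '-');
	if (req->remove)
		s++;
	/* %15s leaves room for the terminating NUL in transport[16]. */
	if (sscanf(s, "%15s %d", req->transport, &req->port) != 2)
		return -1;
	if (req->port < 1 || req->port > 65535)
		return -1;
	return 0;
}
```

A caller would then map an unknown transport name or a missing listener to whatever error codes its own interface defines.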
Add a new svc function that allows a service to query whether a
transport instance has already been created. This is used in lockd
to determine whether or not a transport needs to be created when
a lockd instance is brought up.

Specifying 0 for the address family or port is effectively a wild-card,
and will result in matching the first transport in the service's list
that has a matching class name.

Signed-off-by: Tom Tucker
Acked-by: Neil Brown
Reviewed-by: Chuck Lever
Reviewed-by: Greg Banks
Signed-off-by: J. Bruce Fields
-
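
The wild-card lookup described in the preceding entry, where 0 for the address family or port matches anything and the first transport with a matching class name wins, can be sketched over a plain array. struct xprt_sketch and find_xprt_sketch() are hypothetical; the real function walks the service's transport list rather than an array.

```c
#include <stddef.h>
#include <string.h>

struct xprt_sketch {
	const char *class_name;
	int family;
	unsigned short port;
};

/*
 * Return the first entry whose class name matches and whose family and
 * port match the request; a requested family or port of 0 matches any.
 */
static const struct xprt_sketch *
find_xprt_sketch(const struct xprt_sketch *list, size_t n,
		 const char *name, int family, unsigned short port)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(list[i].class_name, name) != 0)
			continue;
		if (family != 0 && list[i].family != family)
			continue;
		if (port != 0 && list[i].port != port)
			continue;
		return &list[i];
	}
	return NULL;
}
```

A caller that only wants to know whether any transport of a given class exists passes 0 for both family and port.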
Move sk_list and sk_ready to svc_xprt. This involves close because these
lists are walked by svcs when closing all their transports. So I combined
the moving of these lists to svc_xprt with making close transport independent.

The svc_force_sock_close has been changed to svc_close_all and takes a list
as an argument. This removes some svc internals knowledge from the svcs.

This code races with module removal and transport addition.
Thanks to Simon Holm Thøgersen for a compile fix.
Signed-off-by: Tom Tucker
Acked-by: Neil Brown
Reviewed-by: Chuck Lever
Reviewed-by: Greg Banks
Signed-off-by: J. Bruce Fields
Cc: Simon Holm Thøgersen
-
Modify the various kernel RPC svcs to use the svc_create_xprt service.
Signed-off-by: Tom Tucker
Acked-by: Neil Brown
Reviewed-by: Chuck Lever
Reviewed-by: Greg Banks
Signed-off-by: J. Bruce Fields
18 Jul, 2007
2 commits
-
Both lockd and (in the nfsv4 case) nfsd enforce a "grace period" after reboot,
during which clients may reclaim locks from the previous server instance, but
may not acquire new locks.

Currently the lockd and nfsd enforce grace periods of different lengths. This
may cause problems when we reboot a server with both v2/v3 and v4 clients.
For example, if the lockd grace period is shorter (as is likely the case),
then a v3 client might acquire a new lock that conflicts with a lock already
held (but not yet reclaimed) by a v4 client.

This patch calculates a lease time that lockd and nfsd can both use.
Signed-off-by: Marc Eshel
Signed-off-by: J. Bruce Fields
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.

It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.

The patch causes all kernel threads to be nonfreezable by default (ie. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki
Acked-by: Nigel Cunningham
Cc: Pavel Machek
Cc: Oleg Nesterov
Cc: Gautham R Shenoy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
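
The opt-in model this patch introduces can be illustrated with a flag word: every kernel thread starts out nonfreezable, and only those that ask for it clear the flag. The sketch below is a userspace model; struct task_sketch, the helper names and the bit value of PF_NOFREEZE_SKETCH are all hypothetical and chosen for illustration only.

```c
#include <stdbool.h>

/* Illustrative bit value; not the kernel's PF_NOFREEZE. */
#define PF_NOFREEZE_SKETCH 0x1u

struct task_sketch {
	unsigned int flags;
};

/* New default: every kernel thread starts out nonfreezable. */
static void kthread_init_sketch(struct task_sketch *t)
{
	t->flags |= PF_NOFREEZE_SKETCH;
}

/* Threads that want to take part in freezing opt in explicitly. */
static void set_freezable_sketch(struct task_sketch *t)
{
	t->flags &= ~PF_NOFREEZE_SKETCH;
}

static bool is_freezable_sketch(const struct task_sketch *t)
{
	return !(t->flags & PF_NOFREEZE_SKETCH);
}
```

A thread that never mentions the freezer now gets the safe default, instead of having to remember to opt out.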
11 Jul, 2007
1 commit
-
Signed-off-by: Trond Myklebust
18 Feb, 2007
1 commit
-
Globally, s/driverfs/sysfs/g.
Signed-off-by: Robert P. J. Day
Signed-off-by: Adrian Bunk