Doug / smarc-fsl-linux-kernel | Embedian Git Server

08 Dec, 2010

1 commit

ed2849d3e sunrpc: prevent use-after-free on clearing XPT_BUSY ... Browse Code »

When an xprt is created, it has a refcount of 1, and XPT_BUSY is set.
The refcount is *not* owned by the thread that created the xprt
(as is clear from the fact that creators never put the reference).
Rather, it is owned by the absence of XPT_DEAD. Once XPT_DEAD is set,
(And XPT_BUSY is clear) that initial reference is dropped and the xprt
can be freed.

So when a creator clears XPT_BUSY it is dropping its only reference and
so must not touch the xprt again.

However svc_recv, after calling ->xpo_accept (and so getting an XPT_BUSY
reference on a new xprt), calls svc_xprt_recieved. This clears
XPT_BUSY and then svc_xprt_enqueue - this last without owning a reference.
This is dangerous and has been seen to leave svc_xprt_enqueue working
with an xprt containing garbage.

So we need to hold an extra counted reference over that call to
svc_xprt_received.

For safety, any time we clear XPT_BUSY and then use the xprt again, we
first get a reference, and the put it again afterwards.

Note that svc_close_all does not need this extra protection as there are
no threads running, and the final free can only be called asynchronously
from such a thread.

Signed-off-by: NeilBrown
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields

NeilBrown
2010-12-08 09:39:55 +0800

18 Nov, 2010

1 commit

451a3c24b BKL: remove extraneous #include <smp_lock.h> ... Browse Code »

The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.

Signed-off-by: Arnd Bergmann
Signed-off-by: Linus Torvalds

Arnd Bergmann
2010-11-18 00:59:32 +0800

26 Oct, 2010

3 commits

01dba075d svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue ... Browse Code »

If any xprt marked DEAD is also left BUSY for the rest of its life, then
the XPT_DEAD check here is superfluous--we'll get the same result from
the XPT_BUSY check just after.

Signed-off-by: J. Bruce Fields

J. Bruce Fields
2010-10-26 05:59:33 +0800
ac9303eb7 svcrpc: assume svc_delete_xprt() called only once ... Browse Code »

As long as DEAD exports are left BUSY, and svc_delete_xprt is called
only with BUSY held, then svc_delete_xprt() will never be called on an
xprt that is already DEAD.

Signed-off-by: J. Bruce Fields

J. Bruce Fields
2010-10-26 05:59:32 +0800
7e4fdd074 svcrpc: never clear XPT_BUSY on dead xprt ... Browse Code »

Once an xprt has been deleted, there's no reason to allow it to be
enqueued--at worst, that might cause the xprt to be re-added to some
global list, resulting in later corruption.

Also, note this leaves us with no need for the reference-count
manipulation here.

Reviewed-by: NeilBrown
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2010-10-26 05:58:40 +0800

19 Oct, 2010

1 commit

8f3a6de31 sunrpc: Turn list_for_each-s into the ..._entry-s ... Browse Code »

Saves some lines of code and some branticks when reading one.

Signed-off-by: Pavel Emelyanov
Reviewed-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Pavel Emelyanov
2010-10-19 22:48:16 +0800

02 Oct, 2010

3 commits

edc7a8940 nfsd: provide callbacks on svc_xprt deletion ... Browse Code »

NFSv4.1 needs warning when a client tcp connection goes down, if that
connection is being used as a backchannel, so that it can warn the
client that it has lost the backchannel connection.

Signed-off-by: J. Bruce Fields

J. Bruce Fields
2010-10-02 07:29:44 +0800
62832c039 sunrpc: Pull net argument downto svc_create_socket ... Browse Code »

After this the socket creation in it knows the context.

Signed-off-by: Pavel Emelyanov
Signed-off-by: J. Bruce Fields

Pavel Emelyanov
2010-10-02 05:18:55 +0800
fc5d00b04 sunrpc: Add net argument to svc_create_xprt ... Browse Code »

Signed-off-by: Pavel Emelyanov
Signed-off-by: J. Bruce Fields

Pavel Emelyanov
2010-10-02 05:18:54 +0800

27 Sep, 2010

2 commits

4fb8518bd sunrpc: Tag svc_xprt with net ... Browse Code »

The transport representation should be per-net of course.

Signed-off-by: Pavel Emelyanov
Signed-off-by: J. Bruce Fields

Pavel Emelyanov
2010-09-27 22:16:12 +0800
e3bfca01c sunrpc: Make xprt auth cache release work with the xprt ... Browse Code »

This is done in order to facilitate getting the ip_map_cache from
which to put the ip_map.

Signed-off-by: Pavel Emelyanov
Signed-off-by: J. Bruce Fields

Pavel Emelyanov
2010-09-27 22:16:11 +0800

08 Sep, 2010

2 commits

6610f720e svcrpc: minor cache cleanup ... Browse Code »

Pull out some code into helper functions, fix a typo.

Signed-off-by: J. Bruce Fields

J. Bruce Fields
2010-09-08 08:19:12 +0800
f16b6e8d8 sunrpc/cache: allow threads to block while waiting for cache update. ... Browse Code »

The current practice of waiting for cache updates by queueing the
whole request to be retried has (at least) two problems.

1/ With NFSv4, requests can be quite complex and re-trying a whole
request when a later part fails should only be a last-resort, not a
normal practice.

2/ Large requests, and in particular any 'write' request, will not be
queued by the current code and doing so would be undesirable.

In many cases only a very sort wait is needed before the cache gets
valid data.

So, providing the underlying transport permits it by setting
->thread_wait,
arrange to wait briefly for an upcall to be completed (as reflected in
the clearing of CACHE_PENDING).
If the short wait was not long enough and CACHE_PENDING is still set,
fall back on the old approach.

The 'thread_wait' value is set to 5 seconds when there are spare
threads, and 1 second when there are no spare threads.

These values are probably much higher than needed, but will ensure
some forward progress.

Note that as we only request an update for a non-valid item, and as
non-valid items are updated in place it is extremely unlikely that
cache_check will return -ETIMEDOUT. Normally cache_defer_req will
sleep for a short while and then find that the item is_valid.

Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields

NeilBrown
2010-09-08 07:22:07 +0800

04 May, 2010

1 commit

5306293c9 Merge commit 'v2.6.34-rc6' ... Browse Code »

Conflicts:
fs/nfsd/nfs4callback.c

J. Bruce Fields
2010-05-04 23:29:05 +0800

03 May, 2010

1 commit

b48fa6b99 sunrpc: centralise most calls to svc_xprt_received ... Browse Code »

svc_xprt_received must be called when ->xpo_recvfrom has finished
receiving a message, so that the XPT_BUSY flag will be cleared and
if necessary, requeued for further work.

This call is currently made in each ->xpo_recvfrom function, often
from multiple different points. In each case it is the earliest point
on a particular path where it is known that the protection provided by
XPT_BUSY is no longer needed.

However there are (still) some error paths which do not call
svc_xprt_received, and requiring each ->xpo_recvfrom to make the call
does not encourage robustness.

So: move the svc_xprt_received call to be made just after the
call to ->xpo_recvfrom(), and move it of the various ->xpo_recvfrom
methods.

This means that it may not be called at the earliest possible instant,
but this is unlikely to be a measurable performance issue.

Note that there are still other calls to svc_xprt_received as it is
also needed when an xprt is newly created.

Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields

Neil Brown
2010-05-03 20:33:00 +0800

30 Mar, 2010

2 commits

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800
788e69e54 svcrpc: don't hold sv_lock over svc_xprt_put() ... Browse Code »

svc_xprt_put() can call tcp_close(), which can sleep, so we shouldn't be
holding this lock.

In fact, only the xpt_list removal and the sv_tmpcnt decrement should
need the sv_lock here.

Reported-by: Mi Jinlong
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2010-03-30 09:02:31 +0800

01 Mar, 2010

2 commits

1b644b6e6 Revert "sunrpc: move the close processing after do recvfrom method" ... Browse Code »

This reverts commit b0401d725334a94d57335790b8ac2404144748ee, which
moved svc_delete_xprt() outside of XPT_BUSY, and allowed it to be called
after svc_xpt_recived(), removing its last reference and destroying it
after it had already been queued for future processing.

Signed-off-by: J. Bruce Fields

J. Bruce Fields
2010-03-01 05:39:30 +0800
f5822754e Revert "sunrpc: fix peername failed on closed listener" ... Browse Code »

This reverts commit b292cf9ce70d221c3f04ff62db5ab13d9a249ca8. The
commit that it attempted to patch up,
b0401d725334a94d57335790b8ac2404144748ee, was fundamentally wrong, and
will also be reverted.

Signed-off-by: J. Bruce Fields

J. Bruce Fields
2010-03-01 05:39:15 +0800

27 Feb, 2010

1 commit

ab1b18f70 sunrpc: remove unnecessary svc_xprt_put ... Browse Code »

The 'struct svc_deferred_req's on the xpt_deferred queue do not
own a reference to the owning xprt. This is seen in svc_revisit
which is where things are added to this queue. dr->xprt is set to
NULL and the reference to the xprt it put.

So when this list is cleaned up in svc_delete_xprt, we mustn't
put the reference.

Also, replace the 'for' with a 'while' which is arguably
simpler and more likely to compile efficiently.

Cc: Tom Tucker
Signed-off-by: NeilBrown
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields

Neil Brown
2010-02-27 06:42:46 +0800

27 Jan, 2010

2 commits

687179081 SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found" ... Browse Code »

write_ports() converts svc_create_xprt()'s ENOENT error return to
EPROTONOSUPPORT so that rpc.nfsd (in user space) can report an error
message that makes sense.

It turns out that several of the other kernel APIs rpc.nfsd use can
also return ENOENT from svc_create_xprt(), by way of lockd_up().

On the client side, an NFSv2 or NFSv3 mount request can also return
the result of lockd_up(). This error may also be returned during an
NFSv4 mount request, since the NFSv4 callback service uses
svc_create_xprt() to create the callback listener. An ENOENT error
return results in a confusing error message from the mount command.

Let's have svc_create_xprt() return EPROTONOSUPPORT instead of ENOENT.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2010-01-27 06:59:21 +0800
d6783b2b6 SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt() ... Browse Code »

Clean up: Bruce observed we have more or less common logic in each of
svc_create_xprt()'s callers: the check to create an IPv6 RPC listener
socket only if CONFIG_IPV6 is set. I'm about to add another case
that does just the same.

If we move the ifdefs into __svc_xpo_create(), then svc_create_xprt()
call sites can get rid of the "#ifdef" ugliness, and can use the same
logic with or without IPv6 support available in the kernel.

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2010-01-27 06:56:43 +0800

07 Jan, 2010

2 commits

93939f4e5 Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux ... Browse Code »

* 'for-2.6.33' of git://linux-nfs.org/~bfields/linux:
sunrpc: fix peername failed on closed listener
nfsd: make sure data is on disk before calling ->fsync
nfsd: fix "insecure" export option

Linus Torvalds
2010-01-07 10:10:15 +0800
b292cf9ce sunrpc: fix peername failed on closed listener ... Browse Code »

There're some warnings of "nfsd: peername failed (err 107)!"
socket error -107 means Transport endpoint is not connected.
This warning message was outputed by svc_tcp_accept() [net/sunrpc/svcsock.c],
when kernel_getpeername returns -107. This means socket might be CLOSED.

And svc_tcp_accept was called by svc_recv() [net/sunrpc/svc_xprt.c]

if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {

newxpt = xprt->xpt_ops->xpo_accept(xprt);

So this might happen when xprt->xpt_flags has both XPT_LISTENER and XPT_CLOSE.

Let's take a look at commit b0401d72, this commit has moved the close
processing after do recvfrom method, but this commit also introduces this
warnings, if the xpt_flags has both XPT_LISTENER and XPT_CLOSED, we should
close it, not accpet then close.

Signed-off-by: Xiaotian Feng
Cc: J. Bruce Fields
Cc: Neil Brown
Cc: Trond Myklebust
Cc: David S. Miller
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields

Xiaotian Feng
2010-01-07 06:38:04 +0800

17 Dec, 2009

1 commit

37c24b37f Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux ... Browse Code »

* 'for-2.6.33' of git://linux-nfs.org/~bfields/linux: (42 commits)
nfsd: remove pointless paths in file headers
nfsd: move most of nfsfh.h to fs/nfsd
nfsd: remove unused field rq_reffh
nfsd: enable V4ROOT exports
nfsd: make V4ROOT exports read-only
nfsd: restrict filehandles accepted in V4ROOT case
nfsd: allow exports of symlinks
nfsd: filter readdir results in V4ROOT case
nfsd: filter lookup results in V4ROOT case
nfsd4: don't continue "under" mounts in V4ROOT case
nfsd: introduce export flag for v4 pseudoroot
nfsd: let "insecure" flag vary by pseudoflavor
nfsd: new interface to advertise export features
nfsd: Move private headers to source directory
vfs: nfsctl.c un-used nfsd #includes
lockd: Remove un-used nfsd headers #includes
s390: remove un-used nfsd #includes
sparc: remove un-used nfsd #includes
parsic: remove un-used nfsd #includes
compat.c: Remove dependence on nfsd private headers
...

Linus Torvalds
2009-12-17 02:43:34 +0800

30 Nov, 2009

1 commit

f64f9e719 net: Move && and || to end of previous line ... Browse Code »

Not including net/atm/

Compiled tested x86 allyesconfig only
Added a > 80 column line or two, which I ignored.
Existing checkpatch plaints willfully, cheerfully ignored.

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller

Joe Perches
2009-11-30 08:55:45 +0800

24 Nov, 2009

1 commit

78c210efd Revert "knfsd: avoid overloading the CPU scheduler with enormous load averages" ... Browse Code »

This reverts commit 59a252ff8c0f2fa32c896f69d56ae33e641ce7ad.

This helps in an entirely cached workload but not necessarily in
workloads that require waiting on disk.

Conflicts:

include/linux/sunrpc/svc.h
net/sunrpc/svc_xprt.c

Reported-by: Simon Kirby
Tested-by: Jesper Krogh
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2009-11-24 01:34:05 +0800

12 Sep, 2009

1 commit

4cfc7e601 nfsd41: sunrpc: Added rpc server-side backchannel handling ... Browse Code »

When the call direction is a reply, copy the xid and call direction into the
req->rq_private_buf.head[0].iov_base otherwise rpc_verify_header returns
rpc_garbage.

Signed-off-by: Rahul Iyer
Signed-off-by: Mike Sager
Signed-off-by: Marc Eshel
Signed-off-by: Benny Halevy
Signed-off-by: Ricardo Labiaga
Signed-off-by: Andy Adamson
Signed-off-by: Benny Halevy
[get rid of CONFIG_NFSD_V4_1]
[sunrpc: refactoring of svc_tcp_recvfrom]
[nfsd41: sunrpc: create common send routine for the fore and the back channels]
[nfsd41: sunrpc: Use free_page() to free server backchannel pages]
[nfsd41: sunrpc: Document server backchannel locking]
[nfsd41: sunrpc: remove bc_connect_worker()]
[nfsd41: sunrpc: Define xprt_server_backchannel()[
[nfsd41: sunrpc: remove bc_close and bc_init_auto_disconnect dummy functions]
[nfsd41: sunrpc: eliminate unneeded switch statement in xs_setup_tcp()]
[nfsd41: sunrpc: Don't auto close the server backchannel connection]
[nfsd41: sunrpc: Remove unused functions]
Signed-off-by: Alexandros Batsakis
Signed-off-by: Ricardo Labiaga
Signed-off-by: Benny Halevy
[nfsd41: change bc_sock to bc_xprt]
[nfsd41: sunrpc: move struct rpc_buffer def into a common header file]
[nfsd41: sunrpc: use rpc_sleep in bc_send_request so not to block on mutex]
[removed cosmetic changes]
Signed-off-by: Benny Halevy
[sunrpc: add new xprt class for nfsv4.1 backchannel]
[sunrpc: v2.1 change handling of auto_close and init_auto_disconnect operations for the nfsv4.1 backchannel]
Signed-off-by: Alexandros Batsakis
[reverted more cosmetic leftovers]
[got rid of xprt_server_backchannel]
[separated "nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel"]
Signed-off-by: Benny Halevy
Cc: Trond Myklebust
[sunrpc: change idle timeout value for the backchannel]
Signed-off-by: Alexandros Batsakis
Signed-off-by: Benny Halevy
Acked-by: Trond Myklebust
Signed-off-by: J. Bruce Fields

Rahul Iyer
2009-09-12 03:04:16 +0800

28 Aug, 2009

1 commit

b0401d725 sunrpc: move the close processing after do recvfrom method ... Browse Code »

sunrpc: "Move close processing to a single place"
(d7979ae4a050a45b78af51832475001b68263d2a) moved the
close processing before the recvfrom method. This may
cause the close processing never to execute. So this
patch moves it to the right place.

Signed-off-by: Wei Yongjun
Signed-off-by: J. Bruce Fields

Wei Yongjun
2009-08-28 05:18:38 +0800

26 Aug, 2009

1 commit

ed2d8aed5 knfsd: Replace lock_kernel with a mutex in nfsd pool stats. ... Browse Code »

lock_kernel() in knfsd was replaced with a mutex. The later
commit 03cf6c9f49a8fea953d38648d016e3f46e814991 ("knfsd:
add file to export stats about nfsd pools") did not follow
that change. This patch fixes the issue.

Also move the get and put of nfsd_serv to the open and close methods
(instead of start and stop methods) to allow atomic check and increment
of reference count in the open method (where we can still return an
error).

Signed-off-by: Ryusei Yamaguchi
Signed-off-by: Isaku Yamahata
Signed-off-by: YOSHIFUJI Hideaki
Cc: Greg Banks
Signed-off-by: J. Bruce Fields

Ryusei Yamaguchi
2009-08-26 00:39:37 +0800

13 Jul, 2009

1 commit

405f55712 headers: smp_lock.h redux ... Browse Code »

* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)

Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-07-13 03:22:34 +0800

29 Apr, 2009

2 commits

335c54bdc NFSD: Prevent a buffer overflow in svc_xprt_names() ... Browse Code »

The svc_xprt_names() function can overflow its buffer if it's so near
the end of the passed in buffer that the "name too long" string still
doesn't fit. Of course, it could never tell if it was near the end
of the passed in buffer, since its only caller passes in zero as the
buffer length.

Let's make this API a little safer.

Change svc_xprt_names() so it *always* checks for a buffer overflow,
and change its only caller to pass in the correct buffer length.

If svc_xprt_names() does overflow its buffer, it now fails with an
ENAMETOOLONG errno, instead of trying to write a message at the end
of the buffer. I don't like this much, but I can't figure out a clean
way that's always safe to return some of the names, *and* an
indication that the buffer was not long enough.

The displayed error when doing a 'cat /proc/fs/nfsd/portlist' is
"File name too long".

Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields

Chuck Lever
2009-04-29 01:54:28 +0800
dcf1a3573 net/sunrpc/svc_xprt.c: fix sparse warnings ... Browse Code »

Fix the following sparse warnings in net/sunrpc/svc_xprt.c.

warning: symbol 'svc_recv' was not declared. Should it be static?
warning: symbol 'svc_drop' was not declared. Should it be static?
warning: symbol 'svc_send' was not declared. Should it be static?
warning: symbol 'svc_close_all' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten
Signed-off-by: J. Bruce Fields

H Hartley Sweeten
2009-04-29 01:00:02 +0800

07 Apr, 2009

1 commit

a63856252 Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux ... Browse Code »

* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: (81 commits)
nfsd41: define nfsd4_set_statp as noop for !CONFIG_NFSD_V4
nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc
nfsd41: Documentation/filesystems/nfs41-server.txt
nfsd41: CREATE_EXCLUSIVE4_1
nfsd41: SUPPATTR_EXCLCREAT attribute
nfsd41: support for 3-word long attribute bitmask
nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify
nfsd41: pass writable attrs mask to nfsd4_decode_fattr
nfsd41: provide support for minor version 1 at rpc level
nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions
nfsd41: add OPEN4_SHARE_ACCESS_WANT nfs4_stateid bmap
nfsd41: access_valid
nfsd41: clientid handling
nfsd41: check encode size for sessions maxresponse cached
nfsd41: stateid handling
nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op
nfsd41: destroy_session operation
nfsd41: non-page DRC for solo sequence responses
nfsd41: Add a create session replay cache
nfsd41: create_session operation
...

Linus Torvalds
2009-04-07 04:25:56 +0800

04 Apr, 2009

1 commit

2f425878b nfsd: don't use the deferral service, return NFS4ERR_DELAY ... Browse Code »

On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be
returned. It is up to the NFSv4.1 client to resend only the operations that
have not been processed.

Initialize rq_usedeferral to 1 in svc_process(). It sill be turned off in
nfsd4_proc_compound() only when NFSv4.1 Sessions are used.

Note: this isn't an adequate solution on its own. It's acceptable as a way
to get some minimal 4.1 up and working, but we're going to have to find a
way to avoid returning DELAY in all common cases before 4.1 can really be
considered ready.

Signed-off-by: Andy Adamson
Signed-off-by: Benny Halevy
[nfsd41: reverse rq_nodeferral negative logic]
Signed-off-by: Benny Halevy
[sunrpc: initialize rq_usedeferral]
Signed-off-by: Andy Adamson
Signed-off-by: Benny Halevy
Signed-off-by: J. Bruce Fields

Andy Adamson
2009-04-04 08:41:12 +0800

29 Mar, 2009

2 commits

9652ada3f SUNRPC: Change svc_create_xprt() to take a @family argument ... Browse Code »

The sv_family field is going away. Pass a protocol family argument to
svc_create_xprt() instead of extracting the family from the passed-in
svc_serv struct.

Again, as this is a listener socket and not an address, we make this
new argument an "int" protocol family, instead of an "sa_family_t."

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2009-03-29 03:54:36 +0800
156e62094 SUNRPC: Clean up svc_find_xprt() calling sequence ... Browse Code »

Clean up: add documentating comment and use appropriate data types for
svc_find_xprt()'s arguments.

This also eliminates a mixed sign comparison: @port was an int, while
the return value of svc_xprt_local_port() is an unsigned short.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2009-03-29 03:53:57 +0800

19 Mar, 2009

2 commits

03cf6c9f4 knfsd: add file to export stats about nfsd pools ... Browse Code »

Add /proc/fs/nfsd/pool_stats to export to userspace various
statistics about the operation of rpc server thread pools.

This patch is based on a forward-ported version of
knfsd-add-pool-thread-stats which has been shipping in the SGI
"Enhanced NFS" product since 2006 and which was previously
posted:

http://article.gmane.org/gmane.linux.nfs/10375

It has also been updated thus:

* moved EXPORT_SYMBOL() to near the function it exports
* made the new struct struct seq_operations const
* used SEQ_START_TOKEN instead of ((void *)1)
* merged fix from SGI PV 990526 "sunrpc: use dprintk instead of
printk in svc_pool_stats_*()" by Harshula Jayasuriya.
* merged fix from SGI PV 964001 "Crash reading pool_stats before
nfsds are started".

Signed-off-by: Greg Banks
Signed-off-by: Harshula Jayasuriya
Signed-off-by: J. Bruce Fields

Greg Banks
2009-03-19 05:38:42 +0800
59a252ff8 knfsd: avoid overloading the CPU scheduler with enormous load averages ... Browse Code »

Avoid overloading the CPU scheduler with enormous load averages
when handling high call-rate NFS loads. When the knfsd bottom half
is made aware of an incoming call by the socket layer, it tries to
choose an nfsd thread and wake it up. As long as there are idle
threads, one will be woken up.

If there are lot of nfsd threads (a sensible configuration when
the server is disk-bound or is running an HSM), there will be many
more nfsd threads than CPUs to run them. Under a high call-rate
low service-time workload, the result is that almost every nfsd is
runnable, but only a handful are actually able to run. This situation
causes two significant problems:

1. The CPU scheduler takes over 10% of each CPU, which is robbing
the nfsd threads of valuable CPU time.

2. At a high enough load, the nfsd threads starve userspace threads
of CPU time, to the point where daemons like portmap and rpc.mountd
do not schedule for tens of seconds at a time. Clients attempting
to mount an NFS filesystem timeout at the very first step (opening
a TCP connection to portmap) because portmap cannot wake up from
select() and call accept() in time.

Disclaimer: these effects were observed on a SLES9 kernel, modern
kernels' schedulers may behave more gracefully.

The solution is simple: keep in each svc_pool a counter of the number
of threads which have been woken but have not yet run, and do not wake
any more if that count reaches an arbitrary small threshold.

Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16
synthetic client threads simulating an rsync (i.e. recursive directory
listing) workload reading from an i386 RH9 install image (161480
regular files in 10841 directories) on the server. That tree is small
enough to fill in the server's RAM so no disk traffic was involved.
This setup gives a sustained call rate in excess of 60000 calls/sec
before being CPU-bound on the server. The server was running 128 nfsds.

Profiling showed schedule() taking 6.7% of every CPU, and __wake_up()
taking 5.2%. This patch drops those contributions to 3.0% and 2.2%.
Load average was over 120 before the patch, and 20.9 after.

This patch is a forward-ported version of knfsd-avoid-nfsd-overload
which has been shipping in the SGI "Enhanced NFS" product since 2006.
It has been posted before:

http://article.gmane.org/gmane.linux.nfs/10374

Signed-off-by: Greg Banks
Signed-off-by: J. Bruce Fields

Greg Banks
2009-03-19 05:38:41 +0800

08 Jan, 2009

1 commit

24c3767e4 SUNRPC: The sunrpc server code should not be used by out-of-tree modules ... Browse Code »

Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields

Trond Myklebust
2009-01-08 06:18:42 +0800