26 Jan, 2011
1 commit
-
The information required to find the nfs_client corresponding to the incoming
back channel request is contained in the NFS layer. Perform minimal checking
in the RPC layer pg_authenticate method, and push more detailed checking into
the NFS layer where the nfs_client can be found.
Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust
15 Jan, 2011
1 commit
-
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits)
nfsd4: fix callback restarting
nfsd: break lease on unlink, link, and rename
nfsd4: break lease on nfsd setattr
nfsd: don't support msnfs export option
nfsd4: initialize cb_per_client
nfsd4: allow restarting callbacks
nfsd4: simplify nfsd4_cb_prepare
nfsd4: give out delegations more quickly in 4.1 case
nfsd4: add helper function to run callbacks
nfsd4: make sure sequence flags are set after destroy_session
nfsd4: re-probe callback on connection loss
nfsd4: set sequence flag when backchannel is down
nfsd4: keep finer-grained callback status
rpc: allow xprt_class->setup to return a preexisting xprt
rpc: keep backchannel xprt as long as server connection
rpc: move sk_bc_xprt to svc_xprt
nfsd4: allow backchannel recovery
nfsd4: support BIND_CONN_TO_SESSION
nfsd4: modify session list under cl_lock
Documentation: fl_mylease no longer exists
...
Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work. The
vfs-scale work touched some msnfs cases, and this merge removes support
for that entirely, so the conflict was trivial to resolve.
12 Jan, 2011
4 commits
-
* 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (89 commits)
NFS fix the setting of exchange id flag
NFS: Don't use vm_map_ram() in readdir
NFSv4: Ensure continued open and lockowner name uniqueness
NFS: Move cl_delegations to the nfs_server struct
NFS: Introduce nfs_detach_delegations()
NFS: Move cl_state_owners and related fields to the nfs_server struct
NFS: Allow walking nfs_client.cl_superblocks list outside client.c
pnfs: layout roc code
pnfs: update nfs4_callback_recallany to handle layouts
pnfs: add CB_LAYOUTRECALL handling
pnfs: CB_LAYOUTRECALL xdr code
pnfs: change lo refcounting to atomic_t
pnfs: check that partial LAYOUTGET return is ignored
pnfs: add layout to client list before sending rpc
pnfs: serialize LAYOUTGET(openstateid)
pnfs: layoutget rpc code cleanup
pnfs: change how lsegs are removed from layout list
pnfs: change layout state seqlock to a spinlock
pnfs: add prefix to struct pnfs_layout_hdr fields
pnfs: add prefix to struct pnfs_layout_segment fields
...
-
This allows us to reuse the xprt associated with a server connection if
one has already been set up.
Signed-off-by: J. Bruce Fields
-
Multiple backchannels can share the same tcp connection; from rfc 5661 section
2.10.3.1:
A connection's association with a session is not exclusive. A
connection associated with the channel(s) of one session may be
simultaneously associated with the channel(s) of other sessions
including sessions associated with other client IDs.
However, if multiple backchannels share a connection, they must all share
the same xid stream (hence the same rpc_xprt); the only way we have to
match replies with calls at the rpc layer is using the xid.
So, keep the rpc_xprt around as long as the connection lasts, in case
we're asked to use the connection as a backchannel again.
Requests to create new backchannel clients over a given server
connection should result in creating new clients that reuse the
existing rpc_xprt.
But to start, just reject attempts to associate multiple rpc_xprts with
the same underlying bc_xprt.
Signed-off-by: J. Bruce Fields
-
This seems obviously transport-level information even if it's currently
used only by the server socket code.
Signed-off-by: J. Bruce Fields
11 Jan, 2011
2 commits
-
Conflicts:
fs/nfs/nfs2xdr.c
fs/nfs/nfs3xdr.c
fs/nfs/nfs4xdr.c
-
vm_map_ram() is not available on NOMMU platforms, and causes trouble
on incoherent architectures such as ARM when we access the page data
through both the direct and the virtual mapping.
The alternative is to use the direct mapping to access page data
for the case when we are not crossing a page boundary, but to copy
the data into a linear scratch buffer when we are accessing data
that spans page boundaries.
Signed-off-by: Trond Myklebust
Tested-by: Marc Kleine-Budde
Cc: stable@kernel.org [2.6.37]
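The direct-mapping-or-scratch-copy approach can be sketched in userspace as below. This is a minimal illustration only, assuming a tiny hypothetical page size; `span_read`, `struct page_span`, and `SKETCH_PAGE_SIZE` are hypothetical names, not the NFS readdir code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical page size, kept tiny so the example is easy to follow. */
#define SKETCH_PAGE_SIZE 16

/* Hypothetical stand-in for an array of separately mapped pages. */
struct page_span {
	char **pages;
	size_t npages;
};

/*
 * Return a pointer to len bytes starting at byte offset off.
 * Within a single page: point straight into that page (direct mapping).
 * Across a page boundary: copy the bytes into the linear scratch buffer.
 * Returns NULL if the range is empty, runs past the last page, or does
 * not fit in the scratch buffer.
 */
static const char *span_read(const struct page_span *s, size_t off,
			     size_t len, char *scratch, size_t scratch_len)
{
	size_t pgno = off / SKETCH_PAGE_SIZE;
	size_t pgoff = off % SKETCH_PAGE_SIZE;
	size_t copied = 0;

	assert(s != NULL && scratch != NULL);
	if (len == 0 || off + len > s->npages * SKETCH_PAGE_SIZE)
		return NULL;

	/* Fast path: no page boundary is crossed. */
	if (pgoff + len <= SKETCH_PAGE_SIZE)
		return s->pages[pgno] + pgoff;

	if (len > scratch_len)
		return NULL;

	/* Slow path: gather each piece into the scratch buffer. */
	while (copied < len) {
		size_t chunk = SKETCH_PAGE_SIZE - pgoff;

		if (chunk > len - copied)
			chunk = len - copied;
		memcpy(scratch + copied, s->pages[pgno] + pgoff, chunk);
		copied += chunk;
		pgno++;
		pgoff = 0;
	}
	return scratch;
}
```

The caller always gets one contiguous pointer, so decoding code need not care whether it is looking at a page or at the scratch copy.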
08 Jan, 2011
2 commits
-
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (33 commits)
usb: don't use flush_scheduled_work()
speedtch: don't abuse struct delayed_work
media/video: don't use flush_scheduled_work()
media/video: explicitly flush request_module work
ioc4: use static work_struct for ioc4_load_modules()
init: don't call flush_scheduled_work() from do_initcalls()
s390: don't use flush_scheduled_work()
rtc: don't use flush_scheduled_work()
mmc: update workqueue usages
mfd: update workqueue usages
dvb: don't use flush_scheduled_work()
leds-wm8350: don't use flush_scheduled_work()
mISDN: don't use flush_scheduled_work()
macintosh/ams: don't use flush_scheduled_work()
vmwgfx: don't use flush_scheduled_work()
tpm: don't use flush_scheduled_work()
sonypi: don't use flush_scheduled_work()
hvsi: don't use flush_scheduled_work()
xen: don't use flush_scheduled_work()
gdrom: don't use flush_scheduled_work()
...
Fixed up trivial conflict in drivers/media/video/bt8xx/bttv-input.c
as per Tejun.
-
…t/npiggin/linux-npiggin
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
fs: scale mntget/mntput
fs: rename vfsmount counter helpers
fs: implement faster dentry memcmp
fs: prefetch inode data in dcache lookup
fs: improve scalability of pseudo filesystems
fs: dcache per-inode inode alias locking
fs: dcache per-bucket dcache hash locking
bit_spinlock: add required includes
kernel: add bl_list
xfs: provide simple rcu-walk ACL implementation
btrfs: provide simple rcu-walk ACL implementation
ext2,3,4: provide simple rcu-walk ACL implementation
fs: provide simple rcu-walk generic_check_acl implementation
fs: provide rcu-walk aware permission i_ops
fs: rcu-walk aware d_revalidate method
fs: cache optimise dentry and inode for rcu-walk
fs: dcache reduce branches in lookup path
fs: dcache remove d_mounted
fs: fs_struct use seqlock
fs: rcu-walk for path lookup
...
07 Jan, 2011
9 commits
-
Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.
Patched with:
git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
Signed-off-by: Nick Piggin
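The idea of caching "which d_op methods exist" in the dentry flags can be sketched as follows. This is a simplified userspace model under stated assumptions: `struct dentry_sketch`, `sketch_set_d_op`, and the `SK_OP_*` bits are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct dentry_sketch;

/* Simplified stand-in for struct dentry_operations. */
struct dentry_ops_sketch {
	unsigned int (*d_hash)(const struct dentry_sketch *d);
	int (*d_revalidate)(struct dentry_sketch *d);
};

/* Hypothetical flag bits recording which methods are present. */
#define SK_OP_HASH       0x1u
#define SK_OP_REVALIDATE 0x2u

struct dentry_sketch {
	const struct dentry_ops_sketch *d_op;
	unsigned int d_flags;
};

/* Set the ops once and cache which methods exist in d_flags. */
static void sketch_set_d_op(struct dentry_sketch *d,
			    const struct dentry_ops_sketch *op)
{
	assert(d != NULL);
	d->d_op = op;
	d->d_flags &= ~(SK_OP_HASH | SK_OP_REVALIDATE);
	if (op == NULL)
		return;
	if (op->d_hash)
		d->d_flags |= SK_OP_HASH;
	if (op->d_revalidate)
		d->d_flags |= SK_OP_REVALIDATE;
}

/* Lookup-side check: one test of d_flags replaces loading d->d_op and
 * then d->d_op->d_revalidate. */
static int sketch_revalidate(struct dentry_sketch *d)
{
	if (!(d->d_flags & SK_OP_REVALIDATE))
		return 1; /* no method: treat the dentry as valid */
	return d->d_op->d_revalidate(d);
}

/* Example method for the usage below: always reports "invalid". */
static int revalidate_never(struct dentry_sketch *d)
{
	(void)d;
	return 0;
}
```

This is also why the patch funnels every assignment through a setter: the flags are only trustworthy if nothing writes `d_op` directly.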
-
RCU free the struct inode. This will allow:
- Subsequent store-free path walking patch. The inode must be consulted for
permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
page lock to follow page->mapping.
The downside of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.
In cases where inode lifetimes are longer (i.e. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.
The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: Nick Piggin
-
Change d_delete from a dentry deletion notification to dentry caching
advice, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.
This makes fine-grained dentry locking of dput and dentry LRU scanning
much simpler.
Signed-off-by: Nick Piggin
-
Differentiate from server backchannel
Signed-off-by: Andy Adamson
Acked-by: Bruce Fields
Signed-off-by: Trond Myklebust
-
The sessions-based callback service is started prior to the CREATE_SESSION call
so that it can handle CB_NULL requests, which can be sent before the
CREATE_SESSION call returns and the session ID is known.
Set the callback sessionid after a successful CREATE_SESSION.
Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust
-
Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust
-
Move the current sock create and destroy routines into the new transport ops.
The back channel socket will be destroyed by the svc_close_all call in svc_destroy.
Added check: only TCP is supported on the shared back channel.
Signed-off-by: Andy Adamson
Acked-by: Bruce Fields
Signed-off-by: Trond Myklebust
-
Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust
-
The NFSv4.1 shared back channel does not need to call svc_drop because the
callback service never outlives the single connection it services, and it
reuses its buffers and keeps the transport.
Signed-off-by: Andy Adamson
Acked-by: Bruce Fields
Signed-off-by: Trond Myklebust
05 Jan, 2011
7 commits
-
Suppose cache_check runs simultaneously with an update on a different
CPU:
cache_check                          task doing update
^^^^^^^^^^^                          ^^^^^^^^^^^^^^^^^
1. test for CACHE_VALID &            1'. set entry->data
   !CACHE_NEGATIVE
2. use entry->data                   2'. set CACHE_VALID
If the two memory writes performed in steps 1' and 2' appear misordered
with respect to the reads in steps 1 and 2, then the caller could get
stale data at step 2 even though it saw CACHE_VALID set on the cache
entry.
Add memory barriers to prevent this.
Reviewed-by: NeilBrown
Signed-off-by: J. Bruce Fields
-
We attempt to turn a cache entry negative in place. But that entry may
already have been filled in by some other task since we last checked
whether it was valid, so we could be modifying an already-valid entry.
If nothing else there's a likely leak in such a case when the entry is
eventually put() and contents are not freed because it has
CACHE_NEGATIVE set.
So, take the cache_lock just as sunrpc_cache_update() does.
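The fix amounts to re-checking validity under the same lock the updater takes. A minimal userspace sketch, assuming a simple spinlock in place of the kernel's cache_lock; `struct sk_entry`, `sk_update`, `sk_make_negative`, and the `SK_*` bits are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for CACHE_VALID / CACHE_NEGATIVE. */
#define SK_VALID    0x1u
#define SK_NEGATIVE 0x2u

struct sk_entry {
	unsigned int flags;
	const char *content;
};

/* A tiny spinlock standing in for the cache_lock. */
static atomic_flag sk_cache_lock = ATOMIC_FLAG_INIT;

static void sk_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&sk_cache_lock,
						 memory_order_acquire))
		; /* spin */
}

static void sk_unlock(void)
{
	atomic_flag_clear_explicit(&sk_cache_lock, memory_order_release);
}

/* Updater: fill in the entry under the lock. */
static void sk_update(struct sk_entry *e, const char *content)
{
	sk_lock();
	e->content = content;
	e->flags |= SK_VALID;
	sk_unlock();
}

/* Turn the entry negative only if nobody made it valid since we last
 * checked; re-testing under the lock closes the race. */
static bool sk_make_negative(struct sk_entry *e)
{
	bool marked = false;

	assert(e != NULL);
	sk_lock();
	if (!(e->flags & SK_VALID)) {
		e->flags |= SK_VALID | SK_NEGATIVE;
		marked = true;
	}
	sk_unlock();
	return marked;
}
```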
Reviewed-by: NeilBrown
Signed-off-by: J. Bruce Fields
-
Currently we use -EAGAIN returns to determine when to drop a deferred
request. On its own, that is error-prone, as it makes us treat -EAGAIN
returns from other functions specially to prevent inadvertent dropping.
So, use a flag on the request instead.
Returning an error on request deferral is still required, to prevent
further processing, but we no longer need to worry that an error return on
its own could result in a drop.
Signed-off-by: J. Bruce Fields
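The difference between "drop on -EAGAIN" and "drop on an explicit flag" can be illustrated with a hypothetical request structure. This is a sketch only; `struct sk_request`, its `dropme` field, and the helper functions are assumptions, not the sunrpc API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request structure with an explicit drop flag. */
struct sk_request {
	bool dropme;
};

/* Deferral still returns an error so the caller stops processing,
 * but it also records the drop decision on the request itself. */
static int sk_defer(struct sk_request *rq)
{
	rq->dropme = true;
	return -EAGAIN;
}

/* An unrelated helper that happens to fail with -EAGAIN. */
static int sk_unrelated_helper(void)
{
	return -EAGAIN;
}

/* Drop only when deferral said so; the error code alone is not enough. */
static bool sk_should_drop(const struct sk_request *rq)
{
	assert(rq != NULL);
	return rq->dropme;
}
```

Both paths return the same error, but only the deferred request is dropped, which is exactly the ambiguity the flag removes.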
-
Commit d29068c431599fa "sunrpc: Simplify cache_defer_req and related
functions." asserted that cache_check() could determine success or
failure of cache_defer_req() by checking the CACHE_PENDING bit.
This isn't quite right.
We need to know whether cache_defer_req() created a deferred request,
in which case sending an rpc reply has become the responsibility of the
deferred request, and it is important that we not send our own reply,
resulting in two different replies to the same request.
And the CACHE_PENDING bit doesn't tell us that; we could have
successfully created a deferred request at the same time as another
thread cleared the CACHE_PENDING bit.
So, partially revert that commit, to ensure that cache_check() returns
-EAGAIN if and only if a deferred request has been created.
Signed-off-by: J. Bruce Fields
Acked-by: NeilBrown
-
Signed-off-by: NeilBrown
[bfields@redhat.com: moved svcauth_unix_purge outside ifdef's.]
Signed-off-by: J. Bruce Fields
-
Once a sunrpc cache entry is VALID, we should be replacing it (and
allowing any concurrent users to destroy it on last put) instead of
trying to update it in place.
Otherwise someone referencing the ip_map we're modifying here could try
to use the m_client just as we're putting the last reference.
The bug should only be seen by users of the legacy nfsd interfaces.
(Thanks to Neil for the suggestion to use sunrpc_invalidate.)
Reviewed-by: NeilBrown
Signed-off-by: J. Bruce Fields
-
On Tue, 2010-12-14 at 16:58 +0800, Mi Jinlong wrote:
> Hi,
>
> When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
> at NFS client's __rpc_create_common function.
>
> The panic place is:
> rpc_mkpipe
> __rpc_lookup_create() __rpc_mkpipe() __rpc_create_common()
> ****** BUG_ON(!d_unhashed(dentry)); ****** *panic*
>
> It means that the dentry's d_flags have be set DCACHE_UNHASHED,
> but it should not be set here.
>
> Is someone known this bug? or give me some idea?
>
> A reproduce program is append, but it can't reproduce the bug every time.
> the export is: "/nfsroot *(rw,no_root_squash,fsid=0,insecure)"
>
> And the panic message is append.
>
> ============================================================================
> #!/bin/sh
>
> LOOPTOTAL=768
> LOOPCOUNT=0
> ret=0
>
> while [ $LOOPCOUNT -ne $LOOPTOTAL ]
> do
> ((LOOPCOUNT += 1))
> service nfs restart
> /usr/sbin/rpc.idmapd
> mount -t nfs4 127.0.0.1:/ /mnt|| return 1;
> ls -l /var/lib/nfs/rpc_pipefs/nfs/*/
> umount /mnt
> echo $LOOPCOUNT
> done
>
> ===============================================================================
> Code: af 60 01 00 00 89 fa 89 f0 e8 64 cf 89 f0 e8 5c 7c 64 cf 31 c0 8b 5c 24 10 8b
> 74 24 14 8b 7c 24 18 8b 6c 24 1c 83 c4 20 c3 0b eb fc 8b 46 28 c7 44 24 08 20
> de ee f0 c7 44 24 04 56 ea
> EIP:[] __rpc_create_common+0x8a/0xc0 [sunrpc] SS:ESP 0068:eccb5d28
> ---[ end trace 8f5606cd08928ed2]---
> Kernel panic - not syncing: Fatal exception
> Pid:7131, comm: mount.nfs4 Tainted: G D -------------------2.6.32 #1
> Call Trace:
> [] ? panic+0x42/0xed
> [] ? oops_end+0xbc/0xd0
> [] ? do_invalid_op+0x0/0x90
> [] ? do_invalid_op+0x7f/0x90
> [] ? __rpc_create_common+0x8a/0xc0[sunrpc]
> [] ? rpc_free_task+0x33/0x70[sunrpc]
> [] ? prc_call_sync+0x48/0x60[sunrpc]
> [] ? rpc_ping+0x4e/0x60[sunrpc]
> [] ? rpc_create+0x38f/0x4f0[sunrpc]
> [] ? error_code+0x73/0x78
> [] ? __rpc_create_common+0x8a/0xc0[sunrpc]
> [] ? d_lookup+0x2a/0x40
> [] ? rpc_mkpipe+0x111/0x1b0[sunrpc]
> [] ? nfs_create_rpc_client+0xb4/0xf0[nfs]
> [] ? nfs_fscache_get_client_cookie+0x1d/0x50[nfs]
> [] ? nfs_idmap_new+0x7b/0x140[nfs]
> [] ? strlcpy+0x3a/0x60
> [] ? nfs4_set_client+0xea/0x2b0[nfs]
> [] ? nfs4_create_server+0xac/0x1b0[nfs]
> [] ? krealloc+0x40/0x50
> [] ? nfs4_remote_get_sb+0x6b/0x250[nfs]
> [] ? kstrdup+0x3c/0x60
> [] ? vfs_kern_mount+0x69/0x170
> [] ? nfs_do_root_mount+0x6c/0xa0[nfs]
> [] ? nfs4_try_mount+0x37/0xa0[nfs]
> [] ? nfs4_validate_text_mount_data+-x7d/0xf0[nfs]
> [] ? nfs4_get_sb+0x92/0x2f0
> [] ? vfs_kern_mount+0x69/0x170
> [] ? get_fs_type+0x32/0xb0
> [] ? do_kern_mount+0x3f/0xe0
> [] ? do_mount+0x2ef/0x740
> [] ? copy_mount_options+0xb0/0x120
> [] ? sys_mount+0x6e/0xa0
Hi,
Does the following patch fix the problem?
Cheers
Trond
--------------------------
SUNRPC: Fix a BUG in __rpc_create_common
From: Trond Myklebust
Mi Jinlong reports:
When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
at NFS client's __rpc_create_common function.
The panic place is:
rpc_mkpipe
__rpc_lookup_create()
Signed-off-by: Trond Myklebust
27 Dec, 2010
1 commit
-
Conflicts:
net/ipv4/fib_frontend.c
22 Dec, 2010
1 commit
-
Signed-off-by: Joe Perches
Signed-off-by: Trond Myklebust
18 Dec, 2010
5 commits
-
And remove unnecessary double semicolon too.
No effect on code, as the test is != 0.
Signed-off-by: Joe Perches
Signed-off-by: J. Bruce Fields
-
These macros have not been used for several years.
Signed-off-by: Shan Wei
Signed-off-by: J. Bruce Fields
-
Currently svc_sock_names calls svc_close_xprt on a svc_sock for
which it does not hold a reference.
As soon as svc_close_xprt sets XPT_CLOSE, the socket could be
freed by a separate thread (though this is a very unlikely race).
It is safer to hold a reference while calling svc_close_xprt.
Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields
-
The xpt_pool field is only used for reporting BUGs.
And it isn't used correctly.
In particular, when it is cleared in svc_xprt_received before
XPT_BUSY is cleared, nothing prevents the compiler or the CPU
from reordering the two assignments, so that xpt_pool is set to
NULL only after XPT_BUSY is cleared.
If a different cpu were running svc_xprt_enqueue at this moment,
it might see XPT_BUSY clear and then xpt_pool non-NULL, and
so BUG.
This could be fixed by calling
smp_mb__before_clear_bit()
before the clear_bit. However as xpt_pool isn't really used,
it seems safest to simply remove xpt_pool.
Another alternative would be to change the clear_bit to
clear_bit_unlock, and the test_and_set_bit to test_and_set_bit_lock.
Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields
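The alternative mentioned last (lock-style bit operations) can be modelled in userspace with C11 atomics: an acquire-ordered fetch-or plays the role of test_and_set_bit_lock(), and a release-ordered fetch-and plays the role of clear_bit_unlock(). The struct, field, and function names below are hypothetical; the commit itself chose to remove xpt_pool instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_BUSY_BIT 0x1UL  /* stand-in for XPT_BUSY */

struct sk_xprt {
	atomic_ulong flags;
	void *pool;            /* stand-in for the xpt_pool field */
};

/* Like test_and_set_bit_lock(): acquire ordering, so the new owner's
 * reads cannot be satisfied before it sees the bit as taken. */
static bool sk_busy_trylock(struct sk_xprt *x)
{
	unsigned long old = atomic_fetch_or_explicit(&x->flags, SK_BUSY_BIT,
						     memory_order_acquire);
	return !(old & SK_BUSY_BIT);
}

/* Like clear_bit_unlock(): release ordering, so the store to x->pool is
 * visible before any other CPU can observe BUSY clear. */
static void sk_busy_unlock(struct sk_xprt *x)
{
	assert(x != NULL);
	x->pool = NULL;
	atomic_fetch_and_explicit(&x->flags, ~SK_BUSY_BIT,
				  memory_order_release);
}
```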
17 Dec, 2010
4 commits
-
Now that all client-side XDR decoder routines use xdr_streams, there
should be no need to support the legacy calling sequence [rpc_rqst *,
__be32 *, RPC res *] anywhere. We can construct an xdr_stream in the
generic RPC code, instead of in each decoder function.
This is a refactoring change. It should not cause different behavior.
Signed-off-by: Chuck Lever
Tested-by: J. Bruce Fields
Signed-off-by: Trond Myklebust
-
Now that all client-side XDR encoder routines use xdr_streams, there
should be no need to support the legacy calling sequence [rpc_rqst *,
__be32 *, RPC arg *] anywhere. We can construct an xdr_stream in the
generic RPC code, instead of in each encoder function.
Also, all the client-side encoder functions return 0 now, making a
return value superfluous. Take this opportunity to convert them to
return void instead.
This is a refactoring change. It should not cause different behavior.
Signed-off-by: Chuck Lever
Tested-by: J. Bruce Fields
Signed-off-by: Trond Myklebust
-
Clean up.
Just fixed a panic where the nrprocs field in a different upper layer
client was set by hand incorrectly. Use the compiler-generated method
used by the other upper layer protocols.
Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust
-
Clean up.
The trend in the other XDR encoder functions is to BUG() when encoding
problems occur, since a problem here is always due to a local coding
error. Then, instead of a status, zero is unconditionally returned.
Update the rpcbind XDR encoders to behave this way.
To finish the update, use the new-style be32_to_cpup() and
cpu_to_be32() macros, and compute the buffer sizes using raw integers
instead of sizeof(). This matches the conventions used in other XDR
functions.
Signed-off-by: Chuck Lever
Tested-by: J. Bruce Fields
Signed-off-by: Trond Myklebust
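The encoder conventions described here (void return, hard failure on a local sizing error, big-endian words, sizes counted in 32-bit words) can be sketched in userspace. Here `htonl()` stands in for `cpu_to_be32()` and `abort()` for `BUG()`; `struct xdr_sketch`, `reserve`, `encode_u32`, and `encode_string` are hypothetical, not the kernel's xdr_stream API.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stream: a cursor into a buffer of 32-bit XDR words. */
struct xdr_sketch {
	uint32_t *p;
	uint32_t *end;
};

/* Reserve nwords words. Running out of space is a local coding error,
 * so fail hard (as the kernel encoders BUG()) instead of returning a status. */
static uint32_t *reserve(struct xdr_sketch *xdr, size_t nwords)
{
	uint32_t *p = xdr->p;

	if ((size_t)(xdr->end - xdr->p) < nwords)
		abort();
	xdr->p += nwords;
	return p;
}

static void encode_u32(struct xdr_sketch *xdr, uint32_t v)
{
	*reserve(xdr, 1) = htonl(v);   /* cpu_to_be32() in the kernel */
}

/* XDR string: 4-byte length, then the bytes, zero-padded to a word. */
static void encode_string(struct xdr_sketch *xdr, const char *s)
{
	uint32_t len;
	size_t nwords;
	uint32_t *p;

	assert(s != NULL);
	len = (uint32_t)strlen(s);
	nwords = (len + 3) / 4;
	encode_u32(xdr, len);
	p = reserve(xdr, nwords);
	memset(p, 0, nwords * 4);
	memcpy(p, s, len);
}
```

For example, encoding the value 100000 followed by the string "tcp" takes exactly three words: one for the number, one for the length, one for the padded bytes.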
15 Dec, 2010
1 commit
-
cancel_rearming_delayed_work[queue]() was superseded by
cancel_delayed_work_sync() quite some time ago. Convert all the
in-kernel users. The conversions are completely equivalent and
trivial.
Signed-off-by: Tejun Heo
Acked-by: "David S. Miller"
Acked-by: Greg Kroah-Hartman
Acked-by: Evgeniy Polyakov
Cc: Jeff Garzik
Cc: Benjamin Herrenschmidt
Cc: Mauro Carvalho Chehab
Cc: netdev@vger.kernel.org
Cc: Anton Vorontsov
Cc: David Woodhouse
Cc: "J. Bruce Fields"
Cc: Neil Brown
Cc: Alex Elder
Cc: xfs-masters@oss.sgi.com
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: Andrew Morton
Cc: netfilter-devel@vger.kernel.org
Cc: Trond Myklebust
Cc: linux-nfs@vger.kernel.org
08 Dec, 2010
1 commit
-
When an xprt is created, it has a refcount of 1, and XPT_BUSY is set.
The refcount is *not* owned by the thread that created the xprt
(as is clear from the fact that creators never put the reference).
Rather, it is owned by the absence of XPT_DEAD. Once XPT_DEAD is set
(and XPT_BUSY is clear), that initial reference is dropped and the xprt
can be freed.
So when a creator clears XPT_BUSY it is dropping its only reference and
so must not touch the xprt again.
However svc_recv, after calling ->xpo_accept (and so getting an XPT_BUSY
reference on a new xprt), calls svc_xprt_received. This clears
XPT_BUSY and then calls svc_xprt_enqueue, this last without owning a reference.
This is dangerous and has been seen to leave svc_xprt_enqueue working
with an xprt containing garbage.
So we need to hold an extra counted reference over that call to
svc_xprt_received.
For safety, any time we clear XPT_BUSY and then use the xprt again, we
first get a reference, and then put it again afterwards.
Note that svc_close_all does not need this extra protection as there are
no threads running, and the final free can only be called asynchronously
from such a thread.
Signed-off-by: NeilBrown
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields
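The "get a reference, clear BUSY, use, put" rule can be sketched as a small userspace refcount model. All names here (`struct sk_xprt2`, `sk_get`, `sk_put`, `sk_enqueue`, `sk_received`) are hypothetical stand-ins for the svc_xprt reference helpers, and the single-threaded usage only demonstrates the bookkeeping, not the race itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define SK_XB_BUSY 0x1UL  /* stand-in for XPT_BUSY */

struct sk_xprt2 {
	atomic_int ref;
	atomic_ulong flags;
	int enqueued;          /* counts enqueue calls, for the demo */
};

static void sk_get(struct sk_xprt2 *x)
{
	atomic_fetch_add(&x->ref, 1);
}

/* Drop a reference and free on the last put. Returns true if freed. */
static bool sk_put(struct sk_xprt2 *x)
{
	if (atomic_fetch_sub(&x->ref, 1) == 1) {
		free(x);
		return true;
	}
	return false;
}

static void sk_enqueue(struct sk_xprt2 *x)
{
	x->enqueued++;
}

/* Once BUSY is cleared, another thread may mark the transport dead and
 * drop the initial reference, so pin it with our own reference first. */
static void sk_received(struct sk_xprt2 *x)
{
	assert(x != NULL);
	sk_get(x);
	atomic_fetch_and(&x->flags, ~SK_XB_BUSY);
	sk_enqueue(x);     /* safe: we still hold a counted reference */
	sk_put(x);
}
```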
23 Nov, 2010
1 commit
-
If the rpcauth_refreshcred() call returns an error other than
EACCES, ENOMEM or ETIMEDOUT, we currently end up looping forever
between call_refresh and call_refreshresult.
The correct thing to do here is to exit on all errors except
EAGAIN and ETIMEDOUT, in which case we retry 3 times, then
return EACCES.
Signed-off-by: Trond Myklebust
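The intended decision logic after a refresh attempt can be sketched as a small state function. This is an illustration under assumptions: `struct task_sketch`, its retry counter, `refresh_result`, and the `STEP_*` values are hypothetical, and the real logic in call_refresh/call_refreshresult may differ in detail (for example, in how it delays before retrying).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-task state; starts with 3 refresh retries. */
struct task_sketch {
	int refresh_retries;
};

enum next_step { STEP_DONE, STEP_RETRY_REFRESH, STEP_EXIT };

/* Decide what to do after a credential refresh returned status.
 * Sets *exit_status whenever the task should exit. */
static enum next_step refresh_result(struct task_sketch *t, int status,
				     int *exit_status)
{
	assert(t != NULL && exit_status != NULL);
	if (status == 0)
		return STEP_DONE;
	if (status == -EAGAIN || status == -ETIMEDOUT) {
		if (t->refresh_retries-- > 0)
			return STEP_RETRY_REFRESH;
		*exit_status = -EACCES;  /* retries exhausted */
		return STEP_EXIT;
	}
	*exit_status = status;  /* every other error ends the call */
	return STEP_EXIT;
}
```

Because every path either returns STEP_DONE, consumes a retry, or exits, the loop between refresh and result handling is bounded.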