Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

01 Oct, 2014

1 commit

72c23f081 Merge branch 'bugfixes' into linux-next ... Browse Code »

* bugfixes:
NFSv4.1: Fix an NFSv4.1 state renewal regression
NFSv4: fix open/lock state recovery error handling
NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
NFS: Fabricate fscache server index key correctly
SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT
nfs: fix duplicate proc entries

Trond Myklebust
2014-10-01 05:21:41 +0800

29 Sep, 2014

2 commits

df817ba35 NFSv4: fix open/lock state recovery error handling ... Browse Code »
5

The current open/lock state recovery unfortunately does not handle errors
such as NFS4ERR_CONN_NOT_BOUND_TO_SESSION correctly. Instead of looping,
just proceeds as if the state manager is finished recovering.
This patch ensures that we loop back, handle higher priority errors
and complete the open/lock state recovery.

Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust

Trond Myklebust
2014-09-29 04:03:04 +0800
a4339b7b6 NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails ... Browse Code »
5

If a NFSv4.x server returns NFS4ERR_STALE_CLIENTID in response to a
CREATE_SESSION or SETCLIENTID_CONFIRM in order to tell us that it rebooted
a second time, then the client will currently take this to mean that it must
declare all locks to be stale, and hence ineligible for reboot recovery.

RFC3530 and RFC5661 both suggest that the client should instead rely on the
server to respond to inelegible open share, lock and delegation reclaim
requests with NFS4ERR_NO_GRACE in this situation.

Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust

Trond Myklebust
2014-09-29 04:03:03 +0800

25 Sep, 2014

1 commit

8faaa6d5d Fixing lease renewal ... Browse Code »
5

Commit c9fdeb28 removed a 'continue' after checking if the lease needs
to be renewed. However, if client hasn't moved, the code falls down to
starting reboot recovery erroneously (ie., sends open reclaim and gets
back stale_clientid error) before recovering from getting stale_clientid
on the renew operation.

Signed-off-by: Olga Kornievskaia
Fixes: c9fdeb280b8c (NFS: Add basic migration support to state manager thread)
Cc: stable@vger.kernel.org # 3.13+
Signed-off-by: Trond Myklebust

Olga Kornievskaia
2014-09-25 11:03:15 +0800

09 Sep, 2014

1 commit

0c0e0d3c0 nfs: revert "nfs4: queue free_lock_state job submission to nfsiod" ... Browse Code »

This reverts commit 49a4bda22e186c4d0eb07f4a36b5b1a378f9398d.

Christoph reported an oops due to the above commit:

generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1]
SMP
[ 2187.042899] Modules linked in:
[ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151
[ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 2187.044287] Workqueue: nfsiod free_lock_state_work
[ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000
[ 2187.044287] RIP: 0010:[] [] free_lock_state_work+0x16/0x30
[ 2187.044287] RSP: 0018:ffff88007a4efd58 EFLAGS: 00010296
[ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000
[ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8
[ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000
[ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240
[ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000
[ 2187.044287] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 2187.044287] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0
[ 2187.044287] Stack:
[ 2187.044287] ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258
[ 2187.044287] 000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0
[ 2187.044287] 0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240
[ 2187.044287] Call Trace:
[ 2187.044287] [] process_one_work+0x1c7/0x490
[ 2187.044287] [] ? process_one_work+0x15d/0x490
[ 2187.044287] [] worker_thread+0x119/0x4f0
[ 2187.044287] [] ? trace_hardirqs_on+0xd/0x10
[ 2187.044287] [] ? init_pwq+0x190/0x190
[ 2187.044287] [] kthread+0xdf/0x100
[ 2187.044287] [] ? __init_kthread_worker+0x70/0x70
[ 2187.044287] [] ret_from_fork+0x7c/0xb0
[ 2187.044287] [] ? __init_kthread_worker+0x70/0x70
[ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3
[ 2187.044287] RIP [] free_lock_state_work+0x16/0x30
[ 2187.044287] RSP
[ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]---

The original reason for this patch was because the fl_release_private
operation couldn't sleep. With commit ed9814d85810 (locks: defer freeing
locks in locks_delete_lock until after i_lock has been dropped), this is
no longer a problem so we can revert this patch.

Reported-by: Christoph Hellwig
Signed-off-by: Jeff Layton
Reviewed-by: Christoph Hellwig
Tested-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Jeff Layton
2014-09-09 08:00:32 +0800

14 Aug, 2014

1 commit

06b8ab552 Merge tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client updates from Trond Myklebust:
"Highlights include:

- stable fix for a bug in nfs3_list_one_acl()
- speed up NFS path walks by supporting LOOKUP_RCU
- more read/write code cleanups
- pNFS fixes for layout return on close
- fixes for the RCU handling in the rpcsec_gss code
- more NFS/RDMA fixes"

* tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
nfs: reject changes to resvport and sharecache during remount
NFS: Avoid infinite loop when RELEASE_LOCKOWNER getting expired error
SUNRPC: remove all refcounting of groupinfo from rpcauth_lookupcred
NFS: fix two problems in lookup_revalidate in RCU-walk
NFS: allow lockless access to access_cache
NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU
NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU
NFS: support RCU_WALK in nfs_permission()
sunrpc/auth: allow lockless (rcu) lookup of credential cache.
NFS: prepare for RCU-walk support but pushing tests later in code.
NFS: nfs4_lookup_revalidate: only evaluate parent if it will be used.
NFS: add checks for returned value of try_module_get()
nfs: clear_request_commit while holding i_lock
pnfs: add pnfs_put_lseg_async
pnfs: find swapped pages on pnfs commit lists too
nfs: fix comment and add warn_on for PG_INODE_REF
nfs: check wait_on_bit_lock err in page_group_lock
sunrpc: remove "ec" argument from encrypt_v2 operation
sunrpc: clean up sparse endianness warnings in gss_krb5_wrap.c
sunrpc: clean up sparse endianness warnings in gss_krb5_seal.c
...

Linus Torvalds
2014-08-14 08:13:19 +0800

16 Jul, 2014

1 commit

743162013 sched: Remove proliferation of wait_on_bit() action functions ... Browse Code »
23

The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().

So:
Rename wait_on_bit and wait_on_bit_lock to
wait_on_bit_action and wait_on_bit_lock_action
to make it explicit that they need an action function.

Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
which are *not* given an action function but implicitly use
a standard one.
The decision to error-out if a signal is pending is now made
based on the 'mode' argument rather than being encoded in the action
function.

All instances of the old wait_on_bit and wait_on_bit_lock which
can use the new version have been changed accordingly and their
action functions have been discarded.
wait_on_bit{_lock} does not return any specific error code in the
event of a signal so the caller must check for non-zero and
interpolate their own error code as appropriate.

The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"

The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.

A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack. So
the distinction will still be visible, only with different
function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).

Since first version of this patch (against 3.15) two new action
functions appeared, on in NFS and one in CIFS. CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.

Signed-off-by: NeilBrown
Acked-by: David Howells (fscache, keys)
Acked-by: Steven Whitehouse (gfs2)
Acked-by: Peter Zijlstra
Cc: Oleg Nesterov
Cc: Steve French
Cc: Linus Torvalds
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar

NeilBrown
2014-07-16 21:10:39 +0800

13 Jul, 2014

2 commits

49a4bda22 nfs4: queue free_lock_state job submission to nfsiod ... Browse Code »
26

We got a report of the following warning in Fedora:

BUG: sleeping function called from invalid context at mm/slub.c:969
in_atomic(): 1, irqs_disabled(): 0, pid: 533, name: bash
3 locks held by bash/533:
#0: (&sp->so_delegreturn_mutex){+.+...}, at: [] nfs4_proc_lock+0x262/0x910 [nfsv4]
#1: (&nfsi->rwsem){.+.+.+}, at: [] nfs4_proc_lock+0x26a/0x910 [nfsv4]
#2: (&sb->s_type->i_lock_key#23){+.+...}, at: [] flock_lock_file_wait+0x8c/0x3a0
CPU: 0 PID: 533 Comm: bash Not tainted 3.15.0-0.rc1.git1.1.fc21.x86_64 #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
0000000000000000 00000000d664ff3c ffff880078b69a70 ffffffff817e82e0
0000000000000000 ffff880078b69a98 ffffffff810cf1a4 0000000000000050
0000000000000050 ffff88007cc01a00 ffff880078b69ad8 ffffffff8121449e
Call Trace:
[] dump_stack+0x4d/0x66
[] __might_sleep+0x184/0x240
[] kmem_cache_alloc_trace+0x4e/0x330
[] ? nfs4_release_lockowner+0x74/0x110 [nfsv4]
[] nfs4_release_lockowner+0x74/0x110 [nfsv4]
[] nfs4_put_lock_state+0x90/0xb0 [nfsv4]
[] nfs4_fl_release_lock+0x15/0x20 [nfsv4]
[] locks_free_lock+0x45/0x90
[] flock_lock_file_wait+0x11c/0x3a0
[] ? nfs4_proc_lock+0x26a/0x910 [nfsv4]
[] do_vfs_lock+0x1e/0x30 [nfsv4]
[] nfs4_proc_lock+0x279/0x910 [nfsv4]
[] ? local_clock+0x16/0x30
[] ? lock_release_holdtime.part.28+0xf/0x200
[] do_unlk+0x8c/0xc0 [nfs]
[] nfs_flock+0xa5/0xf0 [nfs]
[] locks_remove_file+0xb6/0x1e0
[] ? kfree+0xd8/0x2d0
[] __fput+0xd3/0x210
[] ____fput+0xe/0x10
[] task_work_run+0xcd/0xf0
[] do_notify_resume+0x61/0x90
[] int_signal+0x12/0x17

The problem is that NFSv4 is trying to do an allocation from
fl_release_private (in order to send a RELEASE_LOCKOWNER call). That
function can be called while holding the inode->i_lock, and it's
currently set up to do __GFP_WAIT allocations. v4.1 code has a
similar problem.

This patch adds a work_struct to the nfs4_lock_state and has the code
queue the free_lock_state operation to nfsiod.

Reported-by: Josh Stone
Signed-off-by: Jeff Layton
Signed-off-by: Trond Myklebust

Jeff Layton
2014-07-13 06:36:35 +0800
8003d3c4a nfs4: treat lock owners as opaque values ... Browse Code »

Do the following set of ops with a file on a NFSv4 mount:

exec 3>>/file/on/nfsv4
flock -x 3
exec 3>&-

You'll see the LOCK request go across the wire, but no LOCKU when the
file is closed.

What happens is that the fd is passed across a fork, and the final close
is done in a different process than the opener. That makes
__nfs4_find_lock_state miss finding the correct lock state because it
uses the fl_pid as a search key. A new one is created, and the locking
code treats it as a delegation stateid (because NFS_LOCK_INITIALIZED
isn't set).

The root cause of this breakage seems to be commit 77041ed9b49a9e
(NFSv4: Ensure the lockowners are labelled using the fl_owner and/or
fl_pid).

That changed it so that flock lockowners are allocated based on the
fl_pid. I think this is incorrect. flock locks should be "owned" by the
struct file, and that is already accounted for in the fl_owner field of
the lock request when it comes through nfs_flock.

This patch basically reverts the above commit and with it, a LOCKU is
sent in the above reproducer.

Signed-off-by: Jeff Layton
Signed-off-by: Trond Myklebust

Jeff Layton
2014-07-13 06:36:31 +0800

11 Jun, 2014

1 commit

d1e1cda86 Merge tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client updates from Trond Myklebust:
"Highlights include:

- massive cleanup of the NFS read/write code by Anna and Dros
- support multiple NFS read/write requests per page in order to deal
with non-page aligned pNFS striping. Also cleans up the r/wsize <
page size code nicely.
- stable fix for ensuring inode is declared uptodate only after all
the attributes have been checked.
- stable fix for a kernel Oops when remounting
- NFS over RDMA client fixes
- move the pNFS files layout driver into its own subdirectory"

* tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
NFS: populate ->net in mount data when remounting
pnfs: fix lockup caused by pnfs_generic_pg_test
NFSv4.1: Fix typo in dprintk
NFSv4.1: Comment is now wrong and redundant to code
NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state
xprtrdma: Disconnect on registration failure
xprtrdma: Remove BUG_ON() call sites
xprtrdma: Avoid deadlock when credit window is reset
SUNRPC: Move congestion window constants to header file
xprtrdma: Reset connection timeout after successful reconnect
xprtrdma: Use macros for reconnection timeout constants
xprtrdma: Allocate missing pagelist
xprtrdma: Remove Tavor MTU setting
xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting
xprtrdma: Reduce the number of hardway buffer allocations
xprtrdma: Limit work done by completion handler
xprtrmda: Reduce calls to ib_poll_cq() in completion handlers
xprtrmda: Reduce lock contention in completion handlers
xprtrdma: Split the completion queue
xprtrdma: Make rpcrdma_ep_destroy() return void
...

Linus Torvalds
2014-06-11 06:02:42 +0800

06 Jun, 2014

1 commit

abbec2da1 NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state ... Browse Code »
5

The addition of lockdep code to write_seqcount_begin/end has lead to
a bunch of false positive claims of ABBA deadlocks with the so_lock
spinlock. Audits show that this simply cannot happen because the
read side code does not spin while holding so_lock.

Cc: # 3.13.x
Signed-off-by: Trond Myklebust

Trond Myklebust
2014-06-06 01:02:47 +0800

18 Apr, 2014

1 commit

4e857c58e arch: Mass conversion of smp_mb__*() ... Browse Code »

Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra
Acked-by: Paul E. McKenney
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar

Peter Zijlstra
2014-04-18 20:20:48 +0800

19 Mar, 2014

1 commit

150e7260f NFSv4: Ensure we respect soft mount timeouts during trunking discovery ... Browse Code »

Tested-by: Steve Dickson
Signed-off-by: Trond Myklebust

Trond Myklebust
2014-03-19 20:34:40 +0800

18 Mar, 2014

1 commit

bd0f725c4 Merge branch 'devel' into linux-next Browse Code »

Trond Myklebust
2014-03-18 03:15:21 +0800

06 Mar, 2014

1 commit

927864cd9 NFSv4: Fix the return value of nfs4_select_rw_stateid ... Browse Code »

In commit 5521abfdcf4d6 (NFSv4: Resend the READ/WRITE RPC call
if a stateid change causes an error), we overloaded the return value of
nfs4_select_rw_stateid() to cause it to return -EWOULDBLOCK if an RPC
call is outstanding that would cause the NFSv4 lock or open stateid
to change.
That is all redundant when we actually copy the stateid used in the
read/write RPC call that failed, and check that against the current
stateid. It is doubly so, when we consider that in the NFSv4.1 case,
we also set the stateid's seqid to the special value '0', which means
'match the current valid stateid'.

Reported-by: Andy Adamson
Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com
Signed-off-by: Trond Myklebust

Trond Myklebust
2014-03-06 00:55:24 +0800

20 Feb, 2014

1 commit

4f14c194a NFSv4: Clear the open state flags if the new stateid does not match ... Browse Code »

RFC3530 and RFC5661 both prescribe that the 'opaque' field of the
open stateid returned by new OPEN/OPEN_DOWNGRADE/CLOSE calls for
the same file and open owner should match.
If this is not the case, assume that the open state has been lost,
and that we need to recover it.

Signed-off-by: Trond Myklebust

Trond Myklebust
2014-02-20 10:21:07 +0800

19 Feb, 2014

1 commit

146d70caa NFS fix error return in nfs4_select_rw_stateid ... Browse Code »

Do not return an error when nfs4_copy_delegation_stateid succeeds.

Signed-off-by: Andy Adamson
Link: http://lkml.kernel.org/r/1392737765-41942-1-git-send-email-andros@netapp.com
Fixes: ef1820f9be27b (NFSv4: Don't try to recover NFSv4 locks when...)
Cc: NeilBrown
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Trond Myklebust

Andy Adamson
2014-02-19 22:31:56 +0800

06 Jan, 2014

1 commit

16a6ddc70 point to the right include file in a comment (left over from a9004abc3 ) ... Browse Code »

Signed-off-by: Toralf Förster
Signed-off-by: Trond Myklebust

Toralf Förster
2014-01-06 04:51:35 +0800

17 Nov, 2013

1 commit

673fdfe3f Merge tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client bugfixes:
- Stable fix for data corruption when retransmitting O_DIRECT writes
- Stable fix for a deep recursion/stack overflow bug in rpc_release_client
- Stable fix for infinite looping when mounting a NFSv4.x volume
- Fix a typo in the nfs mount option parser
- Allow pNFS layouts to be compiled into the kernel when NFSv4.1 is

* tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: fix pnfs Kconfig defaults
NFS: correctly report misuse of "migration" mount option.
nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once
SUNRPC: Avoid deep recursion in rpc_release_client
SUNRPC: Fix a data corruption issue when retransmitting RPC calls

Linus Torvalds
2013-11-17 05:14:56 +0800

15 Nov, 2013

1 commit

16735d022 tree-wide: use reinit_completion instead of INIT_COMPLETION ... Browse Code »

Use this new function to make code more comprehensible, since we are
reinitialzing the completion, not initializing.

[akpm@linux-foundation.org: linux-next resyncs]
Signed-off-by: Wolfram Sang
Acked-by: Linus Walleij (personally at LCE13)
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wolfram Sang
2013-11-15 08:32:21 +0800

14 Nov, 2013

1 commit

6d769f1e1 nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once ... Browse Code »
2

Currently, when we try to mount and get back NFS4ERR_CLID_IN_USE or
NFS4ERR_WRONGSEC, we create a new rpc_clnt and then try the call again.
There is no guarantee that doing so will work however, so we can end up
retrying the call in an infinite loop.

Worse yet, we create the new client using rpc_clone_client_set_auth,
which creates the new client as a child of the old one. Thus, we can end
up with a *very* long lineage of rpc_clnts. When we go to put all of the
references to them, we can end up with a long call chain that can smash
the stack as each rpc_free_client() call can recurse back into itself.

This patch fixes this by simply ensuring that the SETCLIENTID call will
only be retried in this situation if the last attempt did not use
RPC_AUTH_UNIX.

Note too that with this change, we don't need the (i > 2) check in the
-EACCES case since we now have a more reliable test as to whether we
should reattempt.

Cc: stable@vger.kernel.org # v3.10+
Cc: Chuck Lever
Tested-by/Acked-by: Weston Andros Adamson
Signed-off-by: Jeff Layton
Signed-off-by: Trond Myklebust

Jeff Layton
2013-11-14 08:21:05 +0800

01 Nov, 2013

1 commit

1acd1c301 nfs: fix inverted test for delegation in nfs4_reclaim_open_state ... Browse Code »
2

commit 6686390bab6a0e0 (NFS: remove incorrect "Lock reclaim failed!"
warning.) added a test for a delegation before checking to see if any
reclaimed locks failed. The test however is backward and is only doing
that check when a delegation is held instead of when one isn't.

Cc: NeilBrown
Signed-off-by: Jeff Layton
Fixes: 6686390bab6a: NFS: remove incorrect "Lock reclaim failed!" warning.
Cc: stable@vger.kernel.org # 3.12
Signed-off-by: Trond Myklebust

Jeff Layton
2013-11-01 01:24:07 +0800

29 Oct, 2013

5 commits

0625c2dd6 NFS: Fix possible endless state recovery wait ... Browse Code »

In nfs4_wait_clnt_recover(), hold a reference to the clp being
waited on. The state manager can reduce clp->cl_count to 1, in
which case the nfs_put_client() in nfs4_run_state_manager() can
free *clp before wait_on_bit() returns and allows
nfs4_wait_clnt_recover() to run again.

The behavior at that point is non-deterministic. If the waited-on
bit still happens to be zero, wait_on_bit() will wake the waiter as
expected. If the bit is set again (say, if the memory was poisoned
when freed) wait_on_bit() can leave the waiter asleep.

This is a narrow fix which ensures the safety of accessing *clp in
nfs4_wait_clnt_recover(), but does not address the continued use
of a possibly freed *clp after nfs4_wait_clnt_recover() returns
(see nfs_end_delegation_return(), for example).

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:31:55 +0800
d1c2331e7 NFS: Handle SEQ4_STATUS_LEASE_MOVED ... Browse Code »

With the advent of NFSv4 sessions in NFSv4.1 and following, a "lease
moved" condition is reported differently than it is in NFSv4.0.

NFSv4 minor version 0 servers return an error status code,
NFS4ERR_LEASE_MOVED, to signal that a lease has moved. This error
causes the whole compound operation to fail. Normal compounds
against this server continue to fail until the client performs
migration recovery on the migrated share.

Minor version 1 and later servers assert a bit flag in the reply to
a compound's SEQUENCE operation to signal LEASE_MOVED. This is not
a fatal condition: operations against this server continue normally.
The server asserts this flag until the client performs migration
recovery on the migrated share.

Note that servers MUST NOT return NFS4ERR_LEASE_MOVED to NFSv4
clients not using NFSv4.0.

After the server asserts any of the sr_status_flags in the SEQUENCE
operation in a typical compound, our client initiates standard lease
recovery. For NFSv4.1+, a stand-alone SEQUENCE operation is
performed to discover what recovery is needed.

If SEQ4_STATUS_LEASE_MOVED is asserted in this stand-alone SEQUENCE
operation, our client attempts to discover which FSIDs have been
migrated, and then performs migration recovery on each.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:31:07 +0800
b7f7a66e4 NFS: Support NFS4ERR_LEASE_MOVED recovery in state manager ... Browse Code »

A migration on the FSID in play for the current NFS operation
is reported via the error status code NFS4ERR_MOVED.

"Lease moved" means that a migration has occurred on some other
FSID than the one for the current operation. It's a signal that
the client should take action immediately to handle a migration
that it may not have noticed otherwise. This is so that the
client's lease does not expire unnoticed on the destination server.

In NFSv4.0, a moved lease is reported with the NFS4ERR_LEASE_MOVED
error status code.

To recover from NFS4ERR_LEASE_MOVED, check each FSID for that server
to see if it is still present. Invoke nfs4_try_migration() if the
FSID is no longer present on the server.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:30:21 +0800
c9fdeb280 NFS: Add basic migration support to state manager thread ... Browse Code »
36

Migration recovery and state recovery must be serialized, so handle
both in the state manager thread.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:24:40 +0800
3660cd432 NFSv4 Remove zeroing state kern warnings ... Browse Code »

As of commit 5d422301f97b821301efcdb6fc9d1a83a5c102d6 we no longer zero the
state.

Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust

Andy Adamson
2013-10-29 02:28:53 +0800

05 Sep, 2013

2 commits

ef1820f9b NFSv4: Don't try to recover NFSv4 locks when they are lost. ... Browse Code »

When an NFSv4 client loses contact with the server it can lose any
locks that it holds.

Currently when it reconnects to the server it simply tries to reclaim
those locks. This might succeed even though some other client has
held and released a lock in the mean time. So the first client might
think the file is unchanged, but it isn't. This isn't good.

If, when recovery happens, the locks cannot be claimed because some
other client still holds the lock, then we get a message in the kernel
logs, but the client can still write. So two clients can both think
they have a lock and can both write at the same time. This is equally
not good.

There was a patch a while ago
http://comments.gmane.org/gmane.linux.nfs/41917

which tried to address some of this, but it didn't seem to go
anywhere. That patch would also send a signal to the process. That
might be useful but for now this patch just causes writes to fail.

For NFSv4 (unlike v2/v3) there is a strong link between the lock and
the write request so we can fairly easily fail any IO of the lock is
gone. While some applications might not expect this, it is still
safer than allowing the write to succeed.

Because this is a fairly big change in behaviour a module parameter,
"recover_locks", is introduced which defaults to true (the current
behaviour) but can be set to "false" to tell the client not to try to
recover things that were lost.

Signed-off-by: NeilBrown
Signed-off-by: Trond Myklebust

NeilBrown
2013-09-05 00:26:32 +0800
b6a85258d NFS: Fix warning introduced by NFSv4.0 transport blocking patches ... Browse Code »

When CONFIG_NFS_V4_1 is not enabled, gcc emits this warning:

linux/fs/nfs/nfs4state.c:255:12: warning:
‘nfs4_begin_drain_session’ defined but not used [-Wunused-function]
static int nfs4_begin_drain_session(struct nfs_client *clp)
^

Eventually NFSv4.0 migration recovery will invoke this function, but
that has not yet been merged. Hide nfs4_begin_drain_session()
behind CONFIG_NFS_V4_1 for now.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-09-05 00:26:30 +0800

04 Sep, 2013

2 commits

2cf8bca8b NFS: Update session draining barriers for NFSv4.0 transport blocking ... Browse Code »

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-09-04 03:26:38 +0800
9d33059c1 NFS: Enable slot table helpers for NFSv4.0 ... Browse Code »

I'd like to re-use NFSv4.1's slot table machinery for NFSv4.0
transport blocking. Re-organize some of nfs4session.c so the slot
table code is built even when NFS_V4_1 is disabled.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-09-04 03:26:33 +0800

23 Aug, 2013

1 commit

6686390ba NFS: remove incorrect "Lock reclaim failed!" warning. ... Browse Code »
4

After reclaiming state that was lost, the NFS client tries to reclaim
any locks, and then checks that each one has NFS_LOCK_INITIALIZED set
(which means that the server has confirmed the lock).
However if the client holds a delegation, nfs_reclaim_locks() simply aborts
(or more accurately it called nfs_lock_reclaim() and that returns without
doing anything).

This is because when a delegation is held, the server doesn't need to
know about locks.

So if a delegation is held, NFS_LOCK_INITIALIZED is not expected, and
its absence is certainly not an error.

So don't print the warnings if NFS_DELGATED_STATE is set.

Signed-off-by: NeilBrown
Signed-off-by: Trond Myklebust

NeilBrown
2013-08-23 02:34:14 +0800

08 Aug, 2013

2 commits

73d8bde5e NFS: Never use user credentials for lease renewal ... Browse Code »

Never try to use a non-UID 0 user credential for lease management,
as that credential can change out from under us. The server will
block NFSv4 lease recovery with NFS4ERR_CLID_INUSE.

Since the mechanism to acquire a credential for lease management
is now the same for all minor versions, replace the minor version-
specific callout with a single function.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-08-08 01:06:08 +0800
d688f7b8f NFS: Use root's credential for lease management when keytab is missing ... Browse Code »

Commit 05f4c350 "NFS: Discover NFSv4 server trunking when mounting"
Fri Sep 14 17:24:32 2012 introduced Uniform Client String support,
which forces our NFS client to establish a client ID immediately
during a mount operation rather than waiting until a user wants to
open a file.

Normally machine credentials (eg. from a keytab) are used to perform
a mount operation that is protected by Kerberos. Before 05fc350,
SETCLIENTID used a machine credential, or fell back to a regular
user's credential if no keytab is available.

On clients that don't have a keytab, performing SETCLIENTID early
means there's no user credential to fall back on, since no regular
user has kinit'd yet. 05f4c350 seems to have broken the ability
to mount with sec=krb5 on clients that don't have a keytab in
kernels 3.7 - 3.10.

To address this regression, commit 4edaa308 (NFS: Use "krb5i" to
establish NFSv4 state whenever possible), Sat Mar 16 15:56:20 2013,
was merged in 3.10. This commit forces the NFS client to fall back
to AUTH_SYS for lease management operations if no keytab is
available.

Neil Brown noticed that, since root is required to kinit to do a
sec=krb5 mount when a client doesn't have a keytab, we can try to
use root's Kerberos credential before AUTH_SYS.

Now, when determining a principal and flavor to use for lease
management, the NFS client tries in this order:

1. Flavor: AUTH_GSS, krb5i
Principal: service principal (via keytab)

2. Flavor: AUTH_GSS, krb5i
Principal: user principal established for UID 0 (via kinit)

3. Flavor: AUTH_SYS
Principal: UID 0 / GID 0

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-08-08 01:05:10 +0800

24 Jul, 2013

1 commit

b14b7979d NFS: Fix return type of nfs4_end_drain_session() stub ... Browse Code »

Clean up: when NFSv4.1 support is compiled out,
nfs4_end_drain_session() becomes a stub. Make the synopsis of the
stub match the synopsis of the real version of the function.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-07-24 06:18:53 +0800

10 Jul, 2013

1 commit

be0c5d8c0 Merge tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client updates from Trond Myklebust:
"Feature highlights include:
- Add basic client support for NFSv4.2
- Add basic client support for Labeled NFS (selinux for NFSv4.2)
- Fix the use of credentials in NFSv4.1 stateful operations, and add
support for NFSv4.1 state protection.

Bugfix highlights:
- Fix another NFSv4 open state recovery race
- Fix an NFSv4.1 back channel session regression
- Various rpc_pipefs races
- Fix another issue with NFSv3 auth negotiation

Please note that Labeled NFS does require some additional support from
the security subsystem. The relevant changesets have all been
reviewed and acked by James Morris."

* tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (54 commits)
NFS: Set NFS_CS_MIGRATION for NFSv4 mounts
NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs
nfs: have NFSv3 try server-specified auth flavors in turn
nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it
nfs: move server_authlist into nfs_try_mount_request
nfs: refactor "need_mount" code out of nfs_try_mount
SUNRPC: PipeFS MOUNT notification optimization for dying clients
SUNRPC: split client creation routine into setup and registration
SUNRPC: fix races on PipeFS UMOUNT notifications
SUNRPC: fix races on PipeFS MOUNT notifications
NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
NFS: Improve legacy idmapping fallback
NFSv4.1 end back channel session draining
NFS: Apply v4.1 capabilities to v4.2
NFSv4.1: Clean up layout segment comparison helper names
NFSv4.1: layout segment comparison helpers should take 'const' parameters
NFSv4: Move the DNS resolver into the NFSv4 module
rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set
...

Linus Torvalds
2013-07-10 03:09:43 +0800

04 Jul, 2013

1 commit

f170168b9 drivers: avoid parsing names as kthread_run() format strings ... Browse Code »

Calling kthread_run with a single name parameter causes it to be handled
as a format string. Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents.

Signed-off-by: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2013-07-04 07:07:41 +0800

29 Jun, 2013

1 commit

1c8c601a8 locks: protect most of the file_lock handling with i_lock ... Browse Code »

Having a global lock that protects all of this code is a clear
scalability problem. Instead of doing that, move most of the code to be
protected by the i_lock instead. The exceptions are the global lists
that the ->fl_link sits on, and the ->fl_block list.

->fl_link is what connects these structures to the
global lists, so we must ensure that we hold those locks when iterating
over or updating these lists.

Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.

For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
checking and update of the blocked_list is done without dropping the
lock in between.

On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.

With this, deadlock detection should be race free while we minimize
excessive file_lock_lock thrashing.

Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2013-06-29 16:57:42 +0800

20 Jun, 2013

1 commit

62f288a02 NFSv4.1 end back channel session draining ... Browse Code »

We need to ensure that we clear NFS4_SLOT_TBL_DRAINING on the back
channel when we're done recovering the session.

Regression introduced by commit 774d5f14e (NFSv4.1 Fix a pNFS session
draining deadlock)

Signed-off-by: Andy Adamson
[Trond: Changed order to start back-channel first. Minor code cleanup]
Signed-off-by: Trond Myklebust
Cc: stable@vger.kernel.org [>=3.10]

Andy Adamson
2013-06-20 22:19:21 +0800

07 Jun, 2013

1 commit

965e9c23d NFSv4.1: Ensure that reclaim_complete uses the right credential ... Browse Code »

We want to use the same credential for reclaim_complete as we used
for the exchange_id call.

Signed-off-by: Trond Myklebust

Trond Myklebust
2013-06-07 04:24:35 +0800