Doug / smarc-fsl-linux-kernel | Embedian Git Server

29 Oct, 2013

32 commits

4d4b69dd8 NFS: add support for multiple sec= mount options

This patch adds support for multiple security options which can be
specified using a colon-delimited list of security flavors (the same
syntax as nfsd's exports file).

This is useful, for instance, when NFSv4.x mounts cross SECINFO
boundaries. With this patch a user can use "sec=krb5i:krb5p"
to mount a remote filesystem using krb5i, but can still cross
into krb5p-only exports.

New mounts will try all security options before failing. NFSv4.x
SECINFO results will be compared against the sec= flavors to
find the first flavor in both lists or if no match is found will
return -EPERM.
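
The matching rule described above can be sketched in user-space C. This is only an illustration under stated assumptions: `flavor_in_list()` and `pick_flavor()` are hypothetical names, flavors are modeled as strings rather than RPC pseudoflavor numbers, and walking the SECINFO results in server order is an assumption about which list takes priority.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` is one token of the
 * colon-delimited `list` (e.g. "krb5i:krb5p"), 0 otherwise. */
int flavor_in_list(const char *list, const char *name)
{
    size_t n = strlen(name);
    const char *p = list;

    while (*p) {
        size_t len = strcspn(p, ":");   /* length of the current token */

        if (len == n && strncmp(p, name, n) == 0)
            return 1;
        p += len;
        if (*p == ':')
            p++;                        /* skip the delimiter */
    }
    return 0;
}

/* Walk the SECINFO results in order and return the index of the first
 * flavor that also appears in the sec= list, or -EPERM if none match. */
int pick_flavor(const char *sec_opt, const char *const *secinfo, size_t nsecinfo)
{
    assert(sec_opt != NULL && secinfo != NULL);
    for (size_t i = 0; i < nsecinfo; i++)
        if (flavor_in_list(sec_opt, secinfo[i]))
            return (int)i;
    return -EPERM;
}
```

With a server that offers "sys" and "krb5p", a mount using "krb5i:krb5p" would select index 1 in this sketch, while a mount using only "krb5i" would get -EPERM.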

Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust

Weston Andros Adamson
2013-10-29 03:38:02 +0800
5837f6dfc NFS: stop using NFS_MOUNT_SECFLAVOUR server flag

Since the parsed sec= flavor is now stored in nfs_server->auth_info,
we no longer need an nfs_server flag to determine if a sec= option was
used.

This flag has not been completely removed because it is still needed for
the (old but still supported) non-text parsed mount options ABI
compatibility.

Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust

Weston Andros Adamson
2013-10-29 03:37:56 +0800
0f5f49b8b NFS: cache parsed auth_info in nfs_server

Cache the auth_info structure in nfs_server and pass these values to submounts.

This lays the groundwork for supporting multiple sec= options.

Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust

Weston Andros Adamson
2013-10-29 03:37:43 +0800
a3f73c27a NFS: separate passed security flavs from selected

When filling parsed_mount_data, store the parsed sec= mount option in
the new struct nfs_auth_info and the chosen flavor in selected_flavor.

This patch lays the groundwork for supporting multiple sec= options.

Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust

Weston Andros Adamson
2013-10-29 03:36:58 +0800
47fd88e6b NFSv4: make nfs_find_best_sec static

It's not used outside of nfs4namespace.c anymore.

Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust

Weston Andros Adamson
2013-10-29 03:33:34 +0800
0625c2dd6 NFS: Fix possible endless state recovery wait

In nfs4_wait_clnt_recover(), hold a reference to the clp being
waited on. The state manager can reduce clp->cl_count to 1, in
which case the nfs_put_client() in nfs4_run_state_manager() can
free *clp before wait_on_bit() returns and allows
nfs4_wait_clnt_recover() to run again.

The behavior at that point is non-deterministic. If the waited-on
bit still happens to be zero, wait_on_bit() will wake the waiter as
expected. If the bit is set again (say, if the memory was poisoned
when freed) wait_on_bit() can leave the waiter asleep.

This is a narrow fix which ensures the safety of accessing *clp in
nfs4_wait_clnt_recover(), but does not address the continued use
of a possibly freed *clp after nfs4_wait_clnt_recover() returns
(see nfs_end_delegation_return(), for example).
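
The "hold a reference while waiting" pattern can be sketched as follows. This is a user-space illustration, not the kernel code: `struct client`, `client_get()`, `client_put()` and the `wait_fn` callback (standing in for wait_on_bit()) are all hypothetical, and the object records a `freed` flag instead of actually being freed so the outcome can be inspected.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nfs_client; only the refcount is modeled. */
struct client {
    atomic_int cl_count;
    int freed;              /* set instead of calling free() */
};

void client_get(struct client *clp)
{
    atomic_fetch_add(&clp->cl_count, 1);
}

void client_put(struct client *clp)
{
    /* atomic_fetch_sub() returns the previous value; 1 means last reference */
    if (atomic_fetch_sub(&clp->cl_count, 1) == 1)
        clp->freed = 1;
}

/* Simulates the state manager dropping its reference while the waiter sleeps.
 * Returns whether the client was freed during the wait. */
int state_manager_finishes(struct client *clp)
{
    client_put(clp);
    return clp->freed;
}

/* The waiter pins the client, waits, then unpins it. */
int wait_clnt_recover(struct client *clp, int (*wait_fn)(struct client *))
{
    int ret;

    assert(clp != NULL && wait_fn != NULL);
    client_get(clp);        /* pin before sleeping */
    ret = wait_fn(clp);     /* other references may be dropped here */
    client_put(clp);        /* unpin; this may be the final put */
    return ret;
}
```

Because the waiter holds its own reference, the put performed during the wait cannot be the last one, so the object stays valid until the waiter itself lets go.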

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:31:55 +0800
cd3fadece NFS: Set EXCHGID4_FLAG_SUPP_MOVED_MIGR

Broadly speaking, v4.1 migration is untested. There are no servers
in the wild that support NFSv4.1 migration. However, as server
implementations become available, we do want to enable testing by
developers, while leaving it disabled for environments for which
broken migration support would be an unpleasant surprise.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:31:25 +0800
d1c2331e7 NFS: Handle SEQ4_STATUS_LEASE_MOVED

With the advent of NFSv4 sessions in NFSv4.1 and following, a "lease
moved" condition is reported differently than it is in NFSv4.0.

NFSv4 minor version 0 servers return an error status code,
NFS4ERR_LEASE_MOVED, to signal that a lease has moved. This error
causes the whole compound operation to fail. Normal compounds
against this server continue to fail until the client performs
migration recovery on the migrated share.

Minor version 1 and later servers assert a bit flag in the reply to
a compound's SEQUENCE operation to signal LEASE_MOVED. This is not
a fatal condition: operations against this server continue normally.
The server asserts this flag until the client performs migration
recovery on the migrated share.

Note that servers MUST NOT return NFS4ERR_LEASE_MOVED to NFSv4
clients not using NFSv4.0.

After the server asserts any of the sr_status_flags in the SEQUENCE
operation in a typical compound, our client initiates standard lease
recovery. For NFSv4.1+, a stand-alone SEQUENCE operation is
performed to discover what recovery is needed.

If SEQ4_STATUS_LEASE_MOVED is asserted in this stand-alone SEQUENCE
operation, our client attempts to discover which FSIDs have been
migrated, and then performs migration recovery on each.
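
The difference between the two reporting styles can be sketched as a small classifier. The numeric values are taken from the NFSv4.0 and NFSv4.1 protocol specifications as I understand them; the function name, the `enum action` type, and the idea of funnelling both cases through one function are illustrative assumptions, not the client's actual structure.

```c
#include <stdint.h>

#define NFS4ERR_LEASE_MOVED      10031        /* NFSv4.0 error status code */
#define SEQ4_STATUS_LEASE_MOVED  0x00000080u  /* NFSv4.1 sr_status_flags bit */

enum action { CONTINUE, RECOVER_LEASE_MOVED };

/* Hypothetical classifier: decide whether a reply signals "lease moved".
 * For minor version 0 the signal is the compound's error status; for later
 * minor versions it is a flag bit in the SEQUENCE reply and the compound
 * itself does not fail. */
enum action classify_reply(unsigned minorversion, int status,
                           uint32_t sr_status_flags)
{
    if (minorversion == 0)
        return status == NFS4ERR_LEASE_MOVED ? RECOVER_LEASE_MOVED : CONTINUE;
    return (sr_status_flags & SEQ4_STATUS_LEASE_MOVED)
               ? RECOVER_LEASE_MOVED : CONTINUE;
}
```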

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:31:07 +0800
f8aba1e8d NFS: Handle NFS4ERR_LEASE_MOVED during async RENEW

With NFSv4 minor version 0, the asynchronous lease RENEW
heartbeat can return NFS4ERR_LEASE_MOVED. Error recovery logic for
async RENEW is a separate code path from the generic NFS proc paths,
so it must be updated to handle NFS4ERR_LEASE_MOVED as well.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:30:52 +0800
60ea68129 NFS: Migration support for RELEASE_LOCKOWNER

Currently the Linux NFS client ignores the operation status code for
the RELEASE_LOCKOWNER operation. Like NFSv3's UMNT operation,
RELEASE_LOCKOWNER is a courtesy to help servers manage their
resources, and the outcome is not consequential for the client.

During a migration, a server may report NFS4ERR_LEASE_MOVED, in
which case the client really should retry, since typically
LEASE_MOVED has nothing to do with the current operation, but does
prevent it from going forward.

Also, it's important for a client to respond as soon as possible to
a moved lease condition, since the client's lease could expire on
the destination without further action by the client.

NFS4ERR_DELAY is not included in the list of valid status codes for
RELEASE_LOCKOWNER in RFC 3530bis. However, rfc3530-migration-update
does permit migration-capable servers to return DELAY to clients,
but only in the context of an ongoing migration. In this case the
server has frozen lock state in preparation for migration, and a
client retry would help the destination server purge unneeded state
once migration recovery is complete.

Interestingly, NFS4ERR_MOVED is not valid for RELEASE_LOCKOWNER, even
though lock owners can be migrated with Transparent State Migration.

Note that RFC 3530bis section 9.5 includes RELEASE_LOCKOWNER in the
list of operations that renew a client's lease on the server if they
succeed. Now that our client pays attention to the operation's
status code, we can note that renewal appropriately.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:30:46 +0800
8ef2f8d46 NFS: Implement support for NFS4ERR_LEASE_MOVED

Trigger lease-moved recovery when a request returns
NFS4ERR_LEASE_MOVED.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:30:27 +0800
b7f7a66e4 NFS: Support NFS4ERR_LEASE_MOVED recovery in state manager

A migration on the FSID in play for the current NFS operation
is reported via the error status code NFS4ERR_MOVED.

"Lease moved" means that a migration has occurred on some other
FSID than the one for the current operation. It's a signal that
the client should take action immediately to handle a migration
that it may not have noticed otherwise. This is so that the
client's lease does not expire unnoticed on the destination server.

In NFSv4.0, a moved lease is reported with the NFS4ERR_LEASE_MOVED
error status code.

To recover from NFS4ERR_LEASE_MOVED, check each FSID for that server
to see if it is still present. Invoke nfs4_try_migration() if the
FSID is no longer present on the server.
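
The recovery walk described above might look roughly like this. It is a sketch under assumptions: `struct fs`, `fsid_present()` and `try_migration()` are hypothetical stand-ins (the latter for nfs4_try_migration()), and the presence probe is reduced to reading a field instead of sending a request to the server.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one FSID known to live on a server. */
struct fs {
    unsigned long fsid;
    int present;        /* what a probe of the server would report */
    int migrated;       /* set once migration recovery was attempted */
};

/* Stand-in for the on-the-wire presence probe. */
int fsid_present(const struct fs *fs)
{
    return fs->present;
}

/* Stand-in for nfs4_try_migration(). */
void try_migration(struct fs *fs)
{
    fs->migrated = 1;
}

/* Check every FSID for this server; start migration recovery for each one
 * that is no longer present. Returns how many recoveries were started. */
size_t handle_lease_moved(struct fs *list, size_t n)
{
    size_t started = 0;

    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (fsid_present(&list[i]))
            continue;
        try_migration(&list[i]);
        started++;
    }
    return started;
}
```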

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:30:21 +0800
44c999338 NFS: Add method to detect whether an FSID is still on the server

Introduce a mechanism for probing a server to determine if an FSID
is present or absent.

The on-the-wire compound is different between minor version 0 and 1.
Minor version 0 appends a RENEW operation to identify which client
ID is probing. Minor version 1 has a SEQUENCE operation in the
compound which effectively carries the same information.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:30:03 +0800
352297b91 NFS: Handle NFS4ERR_MOVED during delegation recall

When a server returns NFS4ERR_MOVED during a delegation recall,
trigger the new migration recovery logic in the state manager.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:25:30 +0800
519ae255d NFS: Add migration recovery callouts in nfs4proc.c

When a server returns NFS4ERR_MOVED, trigger the new migration
recovery logic in the state manager.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:25:23 +0800
9f51a78e3 NFS: Rename "stateid_invalid" label

I'm going to use this exit label also for migration recovery
failures.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:25:10 +0800
f1478c13c NFS: Re-use exit code in nfs4_async_handle_error()

Clean up.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:24:55 +0800
c9fdeb280 NFS: Add basic migration support to state manager thread

Migration recovery and state recovery must be serialized, so handle
both in the state manager thread.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:24:40 +0800
ce6cda184 NFS: Add a super_block backpointer to the nfs_server struct

NFS_SB() returns the pointer to an nfs_server struct, given a
pointer to a super_block. But we have no way to go back the other
way.

Add a super_block backpointer field so that, given an nfs_server
struct, it is easy to get to the filesystem's root dentry.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:24:26 +0800
b03d735b4 NFS: Add method to retrieve fs_locations during migration recovery

The nfs4_proc_fs_locations() function is invoked during referral
processing to perform a GETATTR(fs_locations) on an object's parent
directory in order to discover the target of the referral. It
performs a LOOKUP in the compound, so the client needs to know the
parent's file handle a priori.

Unfortunately this function is not adequate for handling migration
recovery. We need to probe fs_locations information on an FSID, but
there's no parent directory available for many operations that
can return NFS4ERR_MOVED.

Another subtlety: recovering from NFS4ERR_LEASE_MOVED is a process
of walking over a list of known FSIDs that reside on the server, and
probing whether they have migrated. Once the server has detected
that the client has probed all migrated file systems, it stops
returning NFS4ERR_LEASE_MOVED.

A minor version zero server needs to know what client ID is
requesting fs_locations information so it can clear the flag that
forces it to continue returning NFS4ERR_LEASE_MOVED. This flag is
set per client ID and per FSID. However, the client ID is not an
argument of either the PUTFH or GETATTR operations. Later minor
versions have client ID information embedded in the compound's
SEQUENCE operation.

Therefore, by convention, minor version zero clients send a RENEW
operation in the same compound as the GETATTR(fs_locations), since
RENEW's one argument is a clientid4. This allows a minor version
zero server to identify correctly the client that is probing for a
migration.
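
As an illustration of the convention above, the following sketch assembles the list of operations for the probe compound. The `enum op` values and `build_probe_compound()` are hypothetical, and the exact ordering of operations is an assumption: the text only says that minor version 0 sends a RENEW in the same compound, while later minor versions carry the client identity in SEQUENCE, which is placed first here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation tags; not the protocol's numeric opcodes. */
enum op { OP_SEQUENCE, OP_PUTFH, OP_GETATTR_FS_LOCATIONS, OP_RENEW };

/* Fill `ops` with the operations of a fs_locations probe compound and
 * return how many were written. `cap` must be at least 3. */
size_t build_probe_compound(unsigned minorversion, enum op *ops, size_t cap)
{
    size_t n = 0;

    assert(ops != NULL && cap >= 3);
    if (minorversion >= 1)
        ops[n++] = OP_SEQUENCE;             /* identifies the client */
    ops[n++] = OP_PUTFH;
    ops[n++] = OP_GETATTR_FS_LOCATIONS;
    if (minorversion == 0)
        ops[n++] = OP_RENEW;                /* carries the clientid4 */
    return n;
}
```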

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:24:00 +0800
9e6ee76df NFS: Export _nfs_display_fhandle()

Allow code in nfsv4.ko to use _nfs_display_fhandle().

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:23:35 +0800
ec011fe84 NFS: Introduce a vector of migration recovery ops

The differences between minor version 0 and minor version 1
migration will be abstracted by the addition of a set of migration
recovery ops.
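
A vector of ops of this kind is usually a struct of function pointers selected by minor version. The sketch below shows the general shape only; the struct name, its single field, and the selector function are hypothetical and do not reflect the actual fields added by this commit.

```c
#include <stddef.h>

/* Hypothetical ops vector: one callback describing which operation
 * identifies the client in a migration probe. */
struct mig_recovery_ops {
    const char *(*client_id_carrier)(void);
};

const char *v40_carrier(void) { return "RENEW"; }
const char *v41_carrier(void) { return "SEQUENCE"; }

const struct mig_recovery_ops v40_mig_ops = {
    .client_id_carrier = v40_carrier,
};

const struct mig_recovery_ops v41_mig_ops = {
    .client_id_carrier = v41_carrier,
};

/* Callers pick the vector once and stay independent of the minor version. */
const struct mig_recovery_ops *mig_ops_for(unsigned minorversion)
{
    return minorversion == 0 ? &v40_mig_ops : &v41_mig_ops;
}
```

Generic recovery code would then call through the returned vector, for example `mig_ops_for(mv)->client_id_carrier()`, without branching on the minor version itself.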

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:23:17 +0800
800c06a5b NFS: Add functions to swap transports during migration recovery

Introduce functions that can walk through an array of returned
fs_locations information and connect a transport to one of the
destination servers listed therein.

Note that NFS minor version 1 introduces "fs_locations_info" which
extends the locations array sorting criteria available to clients.
This is not supported yet.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:23:07 +0800
32e62b7c3 NFS: Add nfs4_update_server

New function nfs4_update_server() moves an nfs_server to a different
nfs_client. This is done as part of migration recovery.

Though it may be appealing to think of them as the same thing,
migration recovery is not the same as following a referral.

For a referral, the client has not descended into the file system
yet: it has no nfs_server, no super block, no inodes or open state.
It is enough to simply instantiate the nfs_server and super block,
and perform a referral mount.

For a migration, however, we have all of those things already, and
they have to be moved to a different nfs_client. No local namespace
changes are needed here.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2013-10-29 03:22:29 +0800
d2bfda2e7 NFSv4: don't reprocess cached open CLAIM_PREVIOUS

Cached opens have already been handled by _nfs4_opendata_reclaim_to_nfs4_state
and can safely skip being reprocessed, but must still call update_open_stateid
to make sure that all active fmodes are recovered.

Signed-off-by: Weston Andros Adamson
Cc: stable@vger.kernel.org # 3.7.x: f494a6071d3: NFSv4: fix NULL dereference
Cc: stable@vger.kernel.org # 3.7.x: a43ec98b72a: NFSv4: don't fail on missin
Cc: stable@vger.kernel.org # 3.7.x
Signed-off-by: Trond Myklebust

Weston Andros Adamson
2013-10-29 03:10:56 +0800
d49f042ae NFSv4: Fix state reference counting in _nfs4_opendata_reclaim_to_nfs4_state

Currently, if the call to nfs_refresh_inode fails, then we end up leaking
a reference count, due to the call to nfs4_get_open_state.
While we're at it, replace nfs4_get_open_state with a simple call to
atomic_inc(); there is no need to do a full lookup of the struct nfs_state
since it is passed as an argument in the struct nfs4_opendata, and
is already assigned to the variable 'state'.

Cc: stable@vger.kernel.org # 3.7.x: a43ec98b72a: NFSv4: don't fail on missing
Cc: stable@vger.kernel.org # 3.7.x
Signed-off-by: Trond Myklebust

Trond Myklebust
2013-10-29 02:57:12 +0800
a43ec98b7 NFSv4: don't fail on missing fattr in open recover

This is an unneeded check that could cause the client to fail to recover
opens.

Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust

Weston Andros Adamson
2013-10-29 02:54:03 +0800
f494a6071 NFSv4: fix NULL dereference in open recover

_nfs4_opendata_reclaim_to_nfs4_state doesn't expect to see a cached
open CLAIM_PREVIOUS, but this can happen. An example is when there are
RDWR openers and RDONLY openers on a delegation stateid. The recovery
path will first try an open CLAIM_PREVIOUS for the RDWR openers; this
marks the delegation as not needing RECLAIM anymore, so the open
CLAIM_PREVIOUS for the RDONLY openers will not actually send an rpc.

The NULL dereference is due to _nfs4_opendata_reclaim_to_nfs4_state
returning ERR_PTR(rpc_status) when !rpc_done. When the open is
cached, rpc_done == 0 and rpc_status == 0, thus
_nfs4_opendata_reclaim_to_nfs4_state returns NULL - this is unexpected
by callers of nfs4_opendata_to_nfs4_state().
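
Why a zero status turns into NULL can be shown with simplified user-space re-creations of the kernel's ERR_PTR() and IS_ERR() helpers. The helper bodies below are approximations written for this sketch, and `reclaim_to_state()` is a hypothetical reduction of the function discussed above.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified user-space versions of the kernel helpers: an error code is
 * encoded as a pointer value in the top MAX_ERRNO addresses. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical reduction of the reclaim path: when the RPC was not done,
 * the status is returned as an encoded pointer. */
void *reclaim_to_state(int rpc_done, int rpc_status, void *state)
{
    if (!rpc_done)
        return ERR_PTR(rpc_status);   /* rpc_status == 0 yields NULL */
    return state;
}
```

For a cached open, `reclaim_to_state(0, 0, NULL)` returns NULL, and IS_ERR(NULL) is false, so a caller that only checks IS_ERR() goes on to dereference a NULL pointer.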

This can be reproduced easily by opening the same file two times on an
NFSv4.0 mount with delegations enabled, once as RDWR and once as RDONLY then
sleeping for a long time. While the files are held open, kick off state
recovery and this NULL dereference will be hit every time.

An example OOPS:

[ 65.003602] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[ 65.005312] IP: [] __nfs4_close+0x1e/0x160 [nfsv4]
[ 65.006820] PGD 7b0ea067 PUD 791ff067 PMD 0
[ 65.008075] Oops: 0000 [#1] SMP
[ 65.008802] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache snd_ens1371 gameport nfsd snd_rawmidi snd_ac97_codec ac97_bus btusb snd_seq snd_seq_device snd_pcm ppdev bluetooth auth_rpcgss coretemp snd_page_alloc crc32_pclmul crc32c_intel ghash_clmulni_intel microcode rfkill nfs_acl vmw_balloon serio_raw snd_timer lockd parport_pc e1000 snd soundcore parport i2c_piix4 shpchp vmw_vmci sunrpc ata_generic mperf pata_acpi mptspi vmwgfx ttm scsi_transport_spi drm mptscsih mptbase i2c_core
[ 65.018684] CPU: 0 PID: 473 Comm: 192.168.10.85-m Not tainted 3.11.2-201.fc19.x86_64 #1
[ 65.020113] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 65.022012] task: ffff88003707e320 ti: ffff88007b906000 task.ti: ffff88007b906000
[ 65.023414] RIP: 0010:[] [] __nfs4_close+0x1e/0x160 [nfsv4]
[ 65.025079] RSP: 0018:ffff88007b907d10 EFLAGS: 00010246
[ 65.026042] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 65.027321] RDX: 0000000000000050 RSI: 0000000000000001 RDI: 0000000000000000
[ 65.028691] RBP: ffff88007b907d38 R08: 0000000000016f60 R09: 0000000000000000
[ 65.029990] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 65.031295] R13: 0000000000000050 R14: 0000000000000000 R15: 0000000000000001
[ 65.032527] FS: 0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000
[ 65.033981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 65.035177] CR2: 0000000000000030 CR3: 000000007b27f000 CR4: 00000000000407f0
[ 65.036568] Stack:
[ 65.037011] 0000000000000000 0000000000000001 ffff88007b907d90 ffff88007a880220
[ 65.038472] ffff88007b768de8 ffff88007b907d48 ffffffffa037e4a5 ffff88007b907d80
[ 65.039935] ffffffffa036a6c8 ffff880037020e40 ffff88007a880000 ffff880037020e40
[ 65.041468] Call Trace:
[ 65.042050] [] nfs4_close_state+0x15/0x20 [nfsv4]
[ 65.043209] [] nfs4_open_recover_helper+0x148/0x1f0 [nfsv4]
[ 65.044529] [] nfs4_open_recover+0x116/0x150 [nfsv4]
[ 65.045730] [] nfs4_open_reclaim+0xad/0x150 [nfsv4]
[ 65.046905] [] nfs4_do_reclaim+0x149/0x5f0 [nfsv4]
[ 65.048071] [] nfs4_run_state_manager+0x3bc/0x670 [nfsv4]
[ 65.049436] [] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4]
[ 65.050686] [] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4]
[ 65.051943] [] kthread+0xc0/0xd0
[ 65.052831] [] ? insert_kthread_work+0x40/0x40
[ 65.054697] [] ret_from_fork+0x7c/0xb0
[ 65.056396] [] ? insert_kthread_work+0x40/0x40
[ 65.058208] Code: 5c 41 5d 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 89 d5 41 54 53 48 89 fb 8b 67 30 f0 41 ff 44 24 44 49 8d 7c 24 40 e8 0e 0a 2d e1 44
[ 65.065225] RIP [] __nfs4_close+0x1e/0x160 [nfsv4]
[ 65.067175] RSP
[ 65.068570] CR2: 0000000000000030
[ 65.070098] ---[ end trace 0d1fe4f5c7dd6f8b ]---

Cc: #3.7+
Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust

Weston Andros Adamson
2013-10-29 02:53:32 +0800
83c78eb04 NFSv4.1: Don't change the security label as part of open reclaim.

The current caching model calls for the security label to be set on
first lookup and/or on any subsequent label changes. There is no
need to do it as part of an open reclaim.

Signed-off-by: Trond Myklebust

Trond Myklebust
2013-10-29 02:50:38 +0800
1966903f8 nfs: fix handling of invalid mount options in nfs_remount

nfs_parse_mount_options returns 0 on error, not -errno.
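
The return-convention mismatch can be illustrated with a small sketch. Everything here is hypothetical: `parse_opts()` only mimics the "0 on error, nonzero on success" convention stated above, and the two callers show how a check written for -errno returns silently misses the failure.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser following the stated convention:
 * returns 1 on success and 0 on error (never a negative errno). */
int parse_opts(const char *opts)
{
    return opts != NULL && strcmp(opts, "bogus") != 0;
}

/* Caller that wrongly expects -errno: the error branch is never taken. */
int remount_buggy(const char *opts)
{
    int error = parse_opts(opts);

    if (error < 0)
        return error;
    return 0;           /* invalid options are accepted */
}

/* Caller that matches the convention. */
int remount_fixed(const char *opts)
{
    if (!parse_opts(opts))
        return -EINVAL;
    return 0;
}
```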

Reported-by: Karel Zak
Signed-off-by: Jeff Layton
Signed-off-by: Trond Myklebust

Jeff Layton
2013-10-29 02:35:07 +0800
57acc40d7 nfs: reject version and minorversion changes on remount attempts

Reported-by: Eric Doutreleau
Signed-off-by: Jeff Layton
Signed-off-by: Trond Myklebust

Jeff Layton
2013-10-29 02:30:23 +0800
3660cd432 NFSv4 Remove zeroing state kern warnings

As of commit 5d422301f97b821301efcdb6fc9d1a83a5c102d6 we no longer zero the
state.

Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust

Andy Adamson
2013-10-29 02:28:53 +0800

02 Oct, 2013

2 commits

99875249b NFSv4: Ensure that we disable the resend timeout for NFSv4

The spec states that the client should not resend requests because
the server will disconnect if it needs to drop an RPC request.

Signed-off-by: Trond Myklebust

Trond Myklebust
2013-10-02 06:22:11 +0800
a6f951ddb NFSv4: Fix a use-after-free situation in _nfs4_proc_getlk()

In nfs4_proc_getlk(), when some error causes a retry of the call to
_nfs4_proc_getlk(), we can end up with Oopses of the form

BUG: unable to handle kernel NULL pointer dereference at 0000000000000134
IP: [] _raw_spin_lock+0xe/0x30

Call Trace:
[] _atomic_dec_and_lock+0x4d/0x70
[] nfs4_put_lock_state+0x32/0xb0 [nfsv4]
[] nfs4_fl_release_lock+0x15/0x20 [nfsv4]
[] _nfs4_proc_getlk.isra.40+0x146/0x170 [nfsv4]
[] nfs4_proc_lock+0x399/0x5a0 [nfsv4]

The problem is that we don't clear the request->fl_ops after the first
try and so when we retry, nfs4_set_lock_state() exits early without
setting the lock stateid.
Regression introduced by commit 70cc6487a4e08b8698c0e2ec935fb48d10490162
(locks: make ->lock release private data before returning in GETLK case)
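
The retry problem can be modeled in a few lines. This is a loose sketch, not the kernel code: `struct lock_req`, `set_lock_state()`, `release_private()` and the `clear_ops` switch are hypothetical, and the model only captures the idea that a stale fl_ops pointer makes the setup step exit early on the second attempt.

```c
#include <assert.h>
#include <stddef.h>

struct lock_ops { int placeholder; };
static const struct lock_ops sketch_lock_ops = { 0 };

/* Hypothetical lock request: fl_ops is non-NULL once private state exists. */
struct lock_req {
    const struct lock_ops *fl_ops;
    int has_lock_state;
};

/* Exits early when fl_ops is already set, assuming state is still there. */
void set_lock_state(struct lock_req *req)
{
    if (req->fl_ops != NULL)
        return;
    req->has_lock_state = 1;
    req->fl_ops = &sketch_lock_ops;
}

/* Releases the private state; the fix corresponds to clear_ops != 0. */
void release_private(struct lock_req *req, int clear_ops)
{
    req->has_lock_state = 0;
    if (clear_ops)
        req->fl_ops = NULL;
}

/* One GETLK attempt; returns whether lock state was available during it. */
int getlk_attempt(struct lock_req *req, int clear_ops)
{
    int had_state;

    assert(req != NULL);
    set_lock_state(req);
    had_state = req->has_lock_state;
    release_private(req, clear_ops);
    return had_state;
}
```

Calling `getlk_attempt()` twice on the same request with `clear_ops` set to 0 leaves the second attempt without lock state; with `clear_ops` set to 1 both attempts see it.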

Reported-by: Weston Andros Adamson
Reported-by: Jorge Mora
Signed-off-by: Trond Myklebust
Cc: #2.6.22+

Trond Myklebust
2013-10-02 06:21:28 +0800

01 Oct, 2013

4 commits

f92731884 Merge tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
- Stable fix for Oopses in the pNFS files layout driver
- Fix a regression when doing a non-exclusive file create on NFSv4.x
- NFSv4.1 security negotiation fixes when looking up the root
filesystem
- Fix a memory ordering issue in the pNFS files layout driver

* tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Give "flavor" an initial value to fix a compile warning
NFSv4.1: try SECINFO_NO_NAME flavs until one works
NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds
NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method

Linus Torvalds
2013-10-01 08:10:26 +0800
522d6d38f Merge branch 'akpm' (fixes from Andrew Morton)

Merge misc fixes from Andrew Morton.

* emailed patches from Andrew Morton: (22 commits)
pidns: fix free_pid() to handle the first fork failure
ipc,msg: prevent race with rmid in msgsnd,msgrcv
ipc/sem.c: update sem_otime for all operations
mm/hwpoison: fix the lack of one reference count against poisoned page
mm/hwpoison: fix false report on 2nd attempt at page recovery
mm/hwpoison: fix test for a transparent huge page
mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
block: change config option name for cmdline partition parsing
mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration
mm: avoid reinserting isolated balloon pages into LRU lists
arch/parisc/mm/fault.c: fix uninitialized variable usage
include/asm-generic/vtime.h: avoid zero-length file
nilfs2: fix issue with race condition of competition between segments for dirty blocks
Documentation/kernel-parameters.txt: replace kernelcore with Movable
mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored
kernel/kmod.c: check for NULL in call_usermodehelper_exec()
ipc/sem.c: synchronize the proc interface
ipc/sem.c: optimize sem_lock()
ipc/sem.c: fix race in sem_lock()
mm/compaction.c: periodically schedule when freeing pages
...

Linus Torvalds
2013-10-01 05:32:32 +0800
7f42ec394 nilfs2: fix issue with race condition of competition between segments for dirty blocks

Many NILFS2 users have reported strange file system corruption
(for example):

NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)

But such error messages are a consequence of a file system issue that
occurs much earlier. Fortunately, Jerome Poulin and Anton Eliasson
reported another issue not long ago. Their reports describe a crash of
the segctor thread:

BUG: unable to handle kernel paging request at 0000000000004c83
IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]

Call Trace:
nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
nilfs_segctor_construct+0x17b/0x290 [nilfs2]
nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
kthread+0xc0/0xd0
ret_from_fork+0x7c/0xb0

These two issues share one root cause. The same cause can trigger a
third issue too: the segctor thread hangs while consuming 100% CPU.

REPRODUCING PATH:

One possible way to reproduce the issue was described by
Jerome Poulin:

1. init S to get to single user mode.
2. sysrq+E to make sure only my shell is running
3. start network-manager to get my wifi connection up
4. login as root and launch "screen"
5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
6. lscp | xz -9e > lscp.txt.xz
7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
9. start a screen and launch strace -f -o find-cat.log -t find /mnt/nilfs -type f -exec cat {} > /dev/null \;
10. start a screen and launch strace -f -o apt-get.log -t apt-get update
11. launch the last command again as it did not crash the first time
12. apt-get crashes
13. ps aux > ps-aux-crashed.log
14. sysrq+W
15. sysrq+E wait for everything to terminate
16. sysrq+SUSB

A simplified way to reproduce the issue is to run a kernel compilation
and "apt-get update" in parallel.

REPRODUCIBILITY:

The issue does not reproduce reliably (60% - 80% of attempts). It is very
important to have the proper environment for reproducing it. The critical
conditions for successful reproduction:

(1) There should be a big file modified via mmap().

(2) From time to time during processing, this file should have more dirty
blocks than fit in several segments (for example, two or three).

(3) There should be intensive background file modification activity
in another thread.

INVESTIGATION:

First of all, it is possible to see that the reason for the crash is an
invalid page address:

NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783

Moreover, the value of b_page (0x1a82) is 6786, which looks like a segment
number, and the b_blocknr and b_size values look like block numbers. So
the buffer_head pointer points to an improper address.

Detailed investigation of the issue revealed the following picture:

[-----------------------------SEGMENT 6783-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783

[-----------------------------SEGMENT 6784-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
[----------] ditto
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15

[-----------------------------SEGMENT 6785-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
[----------] ditto
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12

NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785

NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82

BUG: unable to handle kernel paging request at 0000000000001a82
IP: [] nilfs_end_page_io+0x12/0xd0 [nilfs2]

Usually, for every segment we collect dirty files in a list. Then dirty
blocks are gathered for every dirty file, prepared for write, and
submitted by means of a nilfs_segbuf_submit_bh() call. Finally, the
complete write phase takes place after nilfs_end_bio_write() is called on
the block layer. In the final phase, buffers/pages are marked as not dirty
and processed files are removed from the list of dirty files.

It is possible to see that we had three prepare_write and submit_bio
phases before the segbuf_wait and complete_write phase. Moreover, segments
compete with each other for dirty blocks because on every iteration
of segment processing dirty buffer_heads are added to several lists of
payload_buffers:

[SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
[SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8

The next pointer is the same but the prev pointer has changed. It means
that the buffer_head has a next pointer from one list but a prev pointer
from another. Such modification can be made several times. And, finally,
it can result in various issues: (1) segctor hanging, (2) segctor
crashing, (3) file system metadata corruption.

FIX:
This patch adds:

(1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
for every processed dirty block;

(2) checking of BH_Async_Write flag in
nilfs_lookup_dirty_data_buffers() and
nilfs_lookup_dirty_node_buffers();

(3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().
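
The three steps of the fix can be sketched as a set, check, and clear cycle on a per-buffer flag. This is a user-space model under assumptions: `struct buf`, the `BUF_*` bits and the three functions are hypothetical stand-ins for buffer_head, BH_Async_Write and the nilfs2 functions named above.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_DIRTY        (1u << 0)
#define BUF_ASYNC_WRITE  (1u << 1)   /* stand-in for BH_Async_Write */

struct buf { unsigned flags; };

/* (2) Lookup: collect dirty buffers, skipping those already claimed by a
 * segment whose write is still in flight. Returns the number collected. */
size_t lookup_dirty(struct buf *bufs, size_t n, struct buf **out)
{
    size_t k = 0;

    assert(bufs != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!(bufs[i].flags & BUF_DIRTY))
            continue;
        if (bufs[i].flags & BUF_ASYNC_WRITE)
            continue;
        out[k++] = &bufs[i];
    }
    return k;
}

/* (1) Prepare write: mark every buffer of the segment as in flight. */
void prepare_write(struct buf **seg, size_t n)
{
    for (size_t i = 0; i < n; i++)
        seg[i]->flags |= BUF_ASYNC_WRITE;
}

/* (3) Complete write: the buffer is clean and no longer in flight. */
void complete_write(struct buf **seg, size_t n)
{
    for (size_t i = 0; i < n; i++)
        seg[i]->flags &= ~(BUF_DIRTY | BUF_ASYNC_WRITE);
}
```

In this model a second segment that runs its lookup between prepare and complete finds no buffers, so no buffer ends up on two payload lists at once.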

Reported-by: Jerome Poulin
Reported-by: Anton Eliasson
Cc: Paul Fertser
Cc: ARAI Shun-ichi
Cc: Piotr Szymaniak
Cc: Juan Barry Manuel Canham
Cc: Zahid Chowdhury
Cc: Elmer Zhang
Cc: Kenneth Langga
Signed-off-by: Vyacheslav Dubeyko
Acked-by: Ryusuke Konishi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vyacheslav Dubeyko
2013-10-01 05:31:02 +0800
720236569 fs/binfmt_elf.c: prevent a coredump with a large vm_map_count from Oopsing

A high setting of max_map_count, and a process core-dumping with a large
enough vm_map_count could result in an NT_FILE note not being written,
and the kernel crashing immediately later because it has assumed
otherwise.

Reproduction of the oops-causing bug described here:

https://lkml.org/lkml/2013/8/30/50

The issue originated in commit 2aa362c49c31 ("coredump: extend core dump
note section to contain file names of mapped file") from Oct 4, 2012.

This patch makes that section optional in that case. fill_files_note()
should signify the error, and also let the info struct in
elf_core_dump() be zero-initialized so that we can check for the
optionally written note.
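
The "optional note" idea can be sketched as follows. The struct layouts, the size limit parameter, the -EINVAL return value and `notes_to_write()` are all assumptions made for illustration; only the two ideas from the text are modeled: the fill function reports failure, and the zero-initialized info struct lets later code test whether the note exists.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct note {
    const void *data;   /* NULL means "this note was not filled in" */
    size_t datasz;
};

struct core_info {
    struct note psinfo;
    struct note files;
};

/* Hypothetical fill function: fails without touching the note when the
 * number of mappings exceeds the limit. */
int fill_files_note(struct note *note, size_t map_count, size_t limit)
{
    static const char payload[] = "mapped files";

    assert(note != NULL);
    if (map_count > limit)
        return -EINVAL;
    note->data = payload;
    note->datasz = sizeof(payload);
    return 0;
}

/* Count the notes that would be written to the core file. */
size_t notes_to_write(size_t map_count, size_t limit)
{
    static const char ps[] = "psinfo";
    struct core_info info;
    size_t count = 0;

    memset(&info, 0, sizeof(info));     /* absent notes stay NULL */
    info.psinfo.data = ps;
    info.psinfo.datasz = sizeof(ps);
    (void)fill_files_note(&info.files, map_count, limit);

    if (info.psinfo.data)
        count++;
    if (info.files.data)                /* only write the note if present */
        count++;
    return count;
}
```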

[akpm@linux-foundation.org: avoid abusing E2BIG, remove a couple of not-really-needed local variables]
[akpm@linux-foundation.org: fix sparse warning]
Signed-off-by: Dan Aloni
Cc: Al Viro
Cc: Denys Vlasenko
Reported-by: Martin MOKREJS
Tested-by: Martin MOKREJS
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Aloni
2013-10-01 05:31:01 +0800

30 Sep, 2013

2 commits

13f358389 afs: dget_parent() can't return a negative dentry

Signed-off-by: Al Viro

Al Viro
2013-09-30 10:02:24 +0800
7b9a2378b ocfs2: needs ->d_lock to poke in ->d_parent->d_inode from ->d_revalidate()

Signed-off-by: Al Viro

Al Viro
2013-09-30 10:02:20 +0800