Eric Lee / smarc-fsl-linux-kernel

01 Feb, 2011

2 commits

0fd08c554 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6

* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: NFSv4 readdir loses entries
NFS: Micro-optimize nfs4_decode_dirent()
NFS: Fix an NFS client lockdep issue
NFS construct consistent co_ownerid for v4.1
NFS: nfs_wcc_update_inode() should set nfsi->attr_gencount
NFS improve pnfs_put_deviceid_cache debug print
NFS fix cb_sequence error processing
NFS do not find client in NFSv4 pg_authenticate
NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!"
NFS: Prevent memory allocation failure in nfsacl_encode()
NFS: nfsacl_{encode,decode} should return signed integer
NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"
NFS: Fix "kernel BUG at fs/aio.c:554!"
NFS4: Avoid potential NULL pointer dereference in decode_and_add_ds().
NFS: fix handling of malloc failure during nfs_flush_multi()

Linus Torvalds
2011-02-01 07:41:02 +0800
fb9f1f17e Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: xfs_bmap_add_extent_delay_real should init br_startblock
xfs: fix dquot shaker deadlock
xfs: handle CIl transaction commit failures correctly
xfs: limit extsize to size of AGs and/or MAXEXTLEN
xfs: prevent extsize alignment from exceeding maximum extent size
xfs: limit extent length for allocation to AG size
xfs: speculative delayed allocation uses rounddown_power_of_2 badly
xfs: fix efi item leak on forced shutdown
xfs: fix log ticket leak on forced shutdown.

Linus Torvalds
2011-02-01 06:15:40 +0800

31 Jan, 2011

2 commits

af5eb745e NTFS: Fix invalid pointer dereference in ntfs_mft_record_alloc().

In ntfs_mft_record_alloc() when mapping the new extent mft record with
map_extent_mft_record() we overwrite @m with the return value and on
error, we then try to use the old @m but that is no longer there as @m
now contains an error code instead so we crash when dereferencing the
error code as if it were a pointer.

The simple fix is to use a temporary variable to store the return value
thus preserving the original @m for later use. This is a backport from
the commercial Tuxera-NTFS driver and is well tested...
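
The temporary-variable pattern described above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the actual NTFS code: err_ptr(), is_err(), map_record() and alloc_record() are hypothetical stand-ins modelled loosely on the kernel's ERR_PTR()/IS_ERR() idiom.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers. */
#define MAX_ERRNO 4095

static void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct record {
    int in_use;
};

/* Hypothetical mapping step: returns an encoded -ENOMEM when 'fail' is set. */
static struct record *map_record(struct record *r, int fail)
{
    return fail ? err_ptr(-ENOMEM) : r;
}

/*
 * Keep the original pointer in 'm' and store the result of the mapping
 * step in a temporary, so the error path can still clean up through 'm'.
 * Returns 0 on success, -1 on failure.
 */
static int alloc_record(struct record *m, int fail)
{
    struct record *tmp;

    m->in_use = 1;
    tmp = map_record(m, fail);
    if (is_err(tmp)) {
        /* 'm' is still a valid pointer here; only 'tmp' holds the error. */
        m->in_use = 0;
        return -1;
    }
    return 0;
}
```

Had the result been assigned to 'm' directly, the cleanup line in the error branch would dereference an encoded error code, which is the crash the commit message describes.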

Thanks go to Julia Lawall for pointing this out (whilst I had fixed it
in the commercial driver I had failed to fix it in the Linux kernel).

Signed-off-by: Anton Altaparmakov
Signed-off-by: Linus Torvalds

Anton Altaparmakov
2011-01-31 10:58:11 +0800
9fbf0c08d Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: More crypto cleanup (try #2)
CIFS: Add strictcache mount option
CIFS: Implement cifs_strict_writev (try #4)
[CIFS] Replace cifs md5 hashing functions with kernel crypto APIs

Linus Torvalds
2011-01-31 10:56:27 +0800

29 Jan, 2011

3 commits

d1205f87b NFS: NFSv4 readdir loses entries

On recent 2.6.38-rc kernels, connectathon basic test 6 fails on
NFSv4 mounts of OpenSolaris with something like:

> ./test6: readdir
> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir entry, pass 0
> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir entry, pass 0
> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir entry, pass 0
> ./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors
> basic tests failed
> Tests failed, leaving /mnt/klimt mounted
> [cel@matisse cthon04]$

I narrowed the problem down to nfs4_decode_dirent() reporting that the
decode buffer had overflowed while decoding the entries for those
missing files.

verify_attr_len() assumes both its pointer arguments reside on the
same page. When these arguments point to locations on two different
pages, verify_attr_len() can report false errors. This can happen now
that a large NFSv4 readdir result can span pages.

We have reasonably good checking in nfs4_decode_dirent() anyway, so
it should be safe to simply remove the extra checking.

At a guess, this was introduced by commit 6650239a, "NFS: Don't use
vm_map_ram() in readdir".

Cc: stable@kernel.org [2.6.37]
Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2011-01-29 02:41:35 +0800
c08e76d0c NFS: Micro-optimize nfs4_decode_dirent()

Make the decoding of NFSv4 directory entries slightly more efficient
by:

1. Avoiding unnecessary byte swapping when checking XDR booleans,
and

2. Not bumping "p" when its value will be immediately replaced by
xdr_inline_decode()

This commit makes nfs4_decode_dirent() consistent with similar logic
in the other two decode_dirent() functions.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2011-01-29 02:37:35 +0800
e00b8a240 NFS: Fix an NFS client lockdep issue

There is no reason to be freeing the delegation cred in the rcu callback,
and doing so is resulting in a lockdep complaint that rpc_credcache_lock
is being called from both softirq and non-softirq contexts.

Reported-by: Chuck Lever
Signed-off-by: Trond Myklebust
Cc: stable@kernel.org

Trond Myklebust
2011-01-29 02:37:09 +0800

28 Jan, 2011

10 commits

24446fc66 xfs: xfs_bmap_add_extent_delay_real should init br_startblock

When filling in the middle of a previous delayed allocation in
xfs_bmap_add_extent_delay_real, set br_startblock of the new delay
extent to the right to nullstartblock instead of 0 before inserting
the extent into the ifork (xfs_iext_insert), rather than setting
br_startblock afterward.

Adding the extent into the ifork with br_startblock=0 can lead to
the extent being copied into the btree by xfs_bmap_extent_to_btree
if we happen to convert from extents format to btree format before
updating br_startblock with the correct value. The unexpected
addition of this delay extent to the btree can cause subsequent
XFS_WANT_CORRUPTED_GOTO filesystem shutdown in several
xfs_bmap_add_extent_delay_real cases where we are converting a delay
extent to real and unexpectedly find an extent already inserted.
For example:

911 case BMAP_LEFT_FILLING:
912 /*
913 * Filling in the first part of a previous delayed allocation.
914 * The left neighbor is not contiguous.
915 */
916 trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_);
917 xfs_bmbt_set_startoff(ep, new_endoff);
918 temp = PREV.br_blockcount - new->br_blockcount;
919 xfs_bmbt_set_blockcount(ep, temp);
920 xfs_iext_insert(ip, idx, 1, new, state);
921 ip->i_df.if_lastex = idx;
922 ip->i_d.di_nextents++;
923 if (cur == NULL)
924 rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
925 else {
926 rval = XFS_ILOG_CORE;
927 if ((error = xfs_bmbt_lookup_eq(cur, new->br_startoff,
928 new->br_startblock, new->br_blockcount,
929 &i)))
930 goto done;
931 XFS_WANT_CORRUPTED_GOTO(i == 0, done);

With the bogus extent in the btree we shutdown the filesystem at
931. The conversion from extents to btree format happens when the
number of extents in the inode increases above ip->i_df.if_ext_max.
xfs_bmap_extent_to_btree copies extents from the ifork into the
btree, ignoring all delalloc extents which are denoted by
br_startblock having some value of nullstartblock.

SGI-PV: 1013221

Signed-off-by: Ben Myers
Reviewed-by: Dave Chinner
Signed-off-by: Alex Elder

bpm@sgi.com
2011-01-28 23:13:29 +0800
0fbca4d1c xfs: fix dquot shaker deadlock

Commit 368e136 ("xfs: remove duplicate code from dquot reclaim") fails
to unlock the dquot freelist when the number of loop restarts is
exceeded in xfs_qm_dqreclaim_one(). This causes hangs in memory
reclaim.

Rework the loop control logic into an unwind stack that all the
different cases jump into. This means there is only one set of code
that processes the loop exit criteria, and simplifies the unlocking
of all the items from different points in the loop. It also fixes a
double increment of the restart counter from the qi_dqlist_lock
case.

Reported-by: Malcolm Scott
Signed-off-by: Dave Chinner
Reviewed-by: Alex Elder

Dave Chinner
2011-01-28 23:05:36 +0800
c6f990d1f xfs: handle CIl transaction commit failures correctly

Failure to commit a transaction into the CIL is not handled
correctly. This currently can only happen when racing with a
shutdown and requires an explicit shutdown check, so it is rare and can
be avoided. Remove the shutdown check and make the CIL commit a void
function to indicate it will always succeed, thereby removing the
incorrectly handled failure case.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Alex Elder

Dave Chinner
2011-01-28 23:05:36 +0800
5315837da xfs: limit extsize to size of AGs and/or MAXEXTLEN

The extent size hint can be set to larger than an AG. This means
that the alignment process can push the range to be allocated
outside the bounds of the AG, resulting in assert failures or
corrupted bmbt records. Similarly, if the extsize is larger than the
maximum extent size supported, the alignment process will produce
extents that are too large to fit into the bmbt records, resulting
in a different type of assert/corruption failure.

Fix this by limiting extsize at the time it is set firstly to be
less than MAXEXTLEN, then to be a maximum of half the size of the
AGs in the filesystem for non-realtime inodes. Realtime inodes do
not allocate out of AGs, so don't have to be restricted by the size
of AGs.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Alex Elder

Dave Chinner
2011-01-28 23:05:36 +0800
4ce159890 xfs: prevent extsize alignment from exceeding maximum extent size

When doing delayed allocation, if the allocation size is for a
maximally sized extent, extent size alignment can push it over this
limit. This results in an assert failure in xfs_bmbt_set_allf() as
the extent length is too large to fit in the extent record.

Fix this by ensuring that we allow for space that extent size
alignment requires (up to 2 * (extsize -1) blocks as we have to
handle both head and tail alignment) when limiting the maximum size
of the extent.
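
The 2 * (extsize - 1) headroom above can be checked with a little arithmetic. This is a sketch under stated assumptions, not the XFS code: SKETCH_MAXEXTLEN assumes a 21-bit extent length field, and aligned_len() and max_request_len() are hypothetical helpers.

```c
#include <stdint.h>

/* Assumed value: largest extent length a bmbt record can hold (21 bits). */
#define SKETCH_MAXEXTLEN 0x1fffffu

/*
 * Length of the range [off, off + len) after aligning both ends outward
 * to multiples of extsize (extsize must be non-zero).
 */
static uint64_t aligned_len(uint64_t off, uint64_t len, uint64_t extsize)
{
    uint64_t start = off - (off % extsize);   /* round the head down */
    uint64_t end = off + len;
    uint64_t rem = end % extsize;

    if (rem)
        end += extsize - rem;                 /* round the tail up */
    return end - start;
}

/* Largest request that still fits after worst-case head and tail alignment. */
static uint64_t max_request_len(uint64_t extsize)
{
    return SKETCH_MAXEXTLEN - 2 * (extsize - 1);
}
```

With an extsize of 16 blocks, a maximally sized request starting at offset 15 grows past the limit once aligned, whereas a request capped by max_request_len(16) stays within it.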

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Alex Elder

Dave Chinner
2011-01-28 23:05:36 +0800
14b064cea xfs: limit extent length for allocation to AG size

Delayed allocation extents can be larger than AGs, so when trying to
convert a large range we may scan every AG inside
xfs_bmap_alloc_nullfb() trying to find an AG with a size larger than
an AG. We should stop when we find the first AG with a maximum
possible allocation size. This causes excessive CPU usage when there
are lots of AGs.

The same problem occurs when doing preallocation of a range larger
than an AG.

Fix the problem by limiting real allocation lengths to the maximum
that an AG can support. This means if we have empty AGs, we'll stop
the search at the first of them. If there are no empty AGs, we'll
still scan them all, but that is a different problem....

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Alex Elder

Dave Chinner
2011-01-28 23:05:35 +0800
b8fc82630 xfs: speculative delayed allocation uses rounddown_power_of_2 badly

rounddown_power_of_2() returns an undefined result when passed a
value of zero. The speculative delayed allocation code is doing this
when the inode is zero length. Hence occasionally the preallocation
is much, much larger than is necessary (e.g. 8GB for a 270 _byte_
file). Ensure we never pass a zero value to this function so
the result of preallocation is always the desired size.
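
A guard of the kind described above might look like the following userspace sketch. rounddown_pow2() is a stand-in written for this example rather than the kernel helper, and prealloc_size() with its default_size fallback is a hypothetical caller, not the actual XFS fix.

```c
#include <stdint.h>

/*
 * Userspace stand-in for rounding down to a power of two. As with the
 * kernel helper discussed above, the result is only meaningful for n > 0.
 */
static uint64_t rounddown_pow2(uint64_t n)
{
    uint64_t p = 1;

    while (p <= n / 2)
        p *= 2;
    return p;
}

/* Hypothetical caller: never hand a zero size to the rounding helper. */
static uint64_t prealloc_size(uint64_t file_size, uint64_t default_size)
{
    if (file_size == 0)
        return default_size;
    return rounddown_pow2(file_size);
}
```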

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Alex Elder

Dave Chinner
2011-01-28 23:05:35 +0800
e34a314c5 xfs: fix efi item leak on forced shutdown

After test 139, kmemleak shows:

unreferenced object 0xffff880078b405d8 (size 400):
comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
hex dump (first 32 bytes):
60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff `..y....`..y....
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[] kmemleak_alloc+0x2d/0x60
[] kmem_cache_alloc+0x13f/0x2b0
[] kmem_zone_alloc+0x77/0xf0
[] kmem_zone_zalloc+0x1e/0x50
[] xfs_efi_init+0x4b/0xb0
[] xfs_trans_get_efi+0x58/0x90
[] xfs_bmap_finish+0x8b/0x1d0
[] xfs_itruncate_finish+0x2c4/0x5d0
[] xfs_setattr+0x8df/0xa70
[] xfs_vn_setattr+0x1b/0x20
[] notify_change+0x170/0x2e0
[] do_truncate+0x66/0xa0
[] sys_ftruncate+0xdb/0xe0
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff

The cause of the leak is that the "remove" parameter of IOP_UNPIN()
is never set when a CIL push is aborted. This means that the EFI
item is never freed if it was in the push being cancelled. The
problem is specific to delayed logging, but has uncovered a couple
of problems with the handling of IOP_UNPIN(remove).

Firstly, we cannot safely call xfs_trans_del_item() from IOP_UNPIN()
in the CIL commit failure path or the iclog write failure path
because for delayed logging we have no transaction context. Hence we
must only call xfs_trans_del_item() if the log item being unpinned
has an active log item descriptor.

Secondly, xfs_trans_uncommit() does not handle log item descriptor
freeing during the traversal of log items on a transaction. It can
reference a freed log item descriptor when unpinning an EFI item.
Hence it needs to use a safe list traversal method to allow items to
be removed from the transaction during IOP_UNPIN().
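
The "safe list traversal" requirement can be sketched in plain C. This is an illustration only: struct item, its remove_on_unpin flag and unpin_all() are hypothetical, and the kernel would use its own list helpers rather than a hand-rolled singly linked list.

```c
#include <stdlib.h>

struct item {
    struct item *next;
    int remove_on_unpin;
};

/*
 * Walk a singly linked list and free the items flagged for removal.
 * The next pointer is read before the current item can be freed, which
 * is what makes the traversal safe. Returns the number of items freed.
 */
static int unpin_all(struct item **head)
{
    struct item **link = head;
    struct item *cur, *next;
    int freed = 0;

    for (cur = *head; cur != NULL; cur = next) {
        next = cur->next;       /* saved before 'cur' may go away */
        if (cur->remove_on_unpin) {
            *link = next;       /* unlink, then free */
            free(cur);
            freed++;
        } else {
            link = &cur->next;
        }
    }
    return freed;
}
```

An unsafe loop would advance with cur = cur->next after the body has run, reading from memory that has already been freed.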

Signed-off-by: Dave Chinner
Reviewed-by: Alex Elder

Dave Chinner
2011-01-28 23:01:33 +0800
b12ece7d8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: avoid picking MDS that is not active
ceph: avoid immediate cap check after import
ceph: fix flushing of caps vs cap import
ceph: fix erroneous cap flush to non-auth mds
ceph: fix cap_wanted_delay_{min,max} mount option initialization
ceph: fix xattr rbtree search
ceph: fix getattr on directory when using norbytes

Linus Torvalds
2011-01-28 10:12:58 +0800
ee2c92585 cifs: More crypto cleanup (try #2)

Replaced md4 hashing function local to cifs module with kernel crypto APIs.
As a result, md4 hashing function and its supporting functions in
file md4.c are not needed anymore.

Cleaned up function declarations, removed forward function declarations,
and removed the include of a header file that is being deleted.

Verified that sec=ntlm/i, sec=ntlmv2/i, and sec=ntlmssp/i work correctly.

Signed-off-by: Shirish Pargaonkar
Reviewed-by: Jeff Layton
Signed-off-by: Steve French

Shirish Pargaonkar
2011-01-28 03:58:13 +0800

27 Jan, 2011

1 commit

7db37c5e6 xfs: fix log ticket leak on forced shutdown.

The kmemleak detector shows this after test 139:

unreferenced object 0xffff880079b88bb0 (size 264):
comm "xfs_io", pid 4904, jiffies 4294909382 (age 276.824s)
hex dump (first 32 bytes):
00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
ff ff ff ff ff ff ff ff 48 7b c9 82 ff ff ff ff ........H{......
backtrace:
[] kmemleak_alloc+0x2d/0x60
[] kmem_cache_alloc+0x13f/0x2b0
[] kmem_zone_alloc+0x77/0xf0
[] kmem_zone_zalloc+0x1e/0x50
[] xlog_ticket_alloc+0x34/0x170
[] xlog_cil_push+0xa4/0x3f0
[] xlog_cil_force_lsn+0x15a/0x160
[] _xfs_log_force_lsn+0x75/0x2d0
[] _xfs_trans_commit+0x2bd/0x2f0
[] xfs_iomap_write_allocate+0x1ad/0x350
[] xfs_map_blocks+0x21f/0x370
[] xfs_vm_writepage+0x1c7/0x550
[] __writepage+0x1a/0x50
[] write_cache_pages+0x1c2/0x4c0
[] generic_writepages+0x27/0x30
[] xfs_vm_writepages+0x5d/0x80

By inspection, the leak occurs when xlog_write() returns an error
and we jump to the abort path without dropping the reference on the
active ticket.
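
The shape of such a leak, and of its fix, can be sketched as follows. This is a simplified model, not the XFS log code: struct ticket, ticket_put() and push() are hypothetical, and write_error merely stands in for a failing xlog_write().

```c
struct ticket {
    int refcount;
};

static void ticket_put(struct ticket *t)
{
    t->refcount--;
}

/* Hypothetical push step: 'write_error' models the write failing. */
static int push(struct ticket *t, int write_error)
{
    t->refcount++;          /* reference held for the duration of the push */

    if (write_error)
        goto out_abort;

    ticket_put(t);          /* the normal path drops the reference */
    return 0;

out_abort:
    ticket_put(t);          /* the abort path has to drop it as well */
    return -1;
}
```

Omitting the ticket_put() under out_abort leaves the count permanently raised after a failed write, which is the kind of leak kmemleak reports.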

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Alex Elder

Dave Chinner
2011-01-27 09:02:00 +0800

26 Jan, 2011

18 commits

c7a360b05 NFS construct consistent co_ownerid for v4.1

As stated in section 2.4 of RFC 5661, subsequent instances of the client need
to present the same co_ownerid. Concatenate the client's IP dot address,
host name, and the rpc_auth pseudoflavor to form the co_ownerid.

Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust

Andy Adamson
2011-01-26 11:49:14 +0800
ac751efa6 console: rename acquire/release_console_sem() to console_lock/unlock()

The -rt patches change the console_semaphore to console_mutex. As a
result, a quite large chunk of the patches changes all
acquire/release_console_sem() to acquire/release_console_mutex()

This commit makes things use more neutral function names which don't make
implications about the underlying lock.

The only real change is the return value of console_trylock which is
inverted from try_acquire_console_sem()
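
The inverted return value matters at every call site. The sketch below models it with a plain counter instead of a real semaphore. old_try_acquire(), new_trylock() and unlock() are hypothetical stand-ins, and the assumption that the old function returned 0 on success (as a semaphore trylock does) is inferred from the note that the new one returns 1 on success.

```c
static int sem_count = 1;   /* 1 = free, 0 = held */

/* Old-style convention (assumed): 0 on success, non-zero on failure. */
static int old_try_acquire(void)
{
    if (sem_count > 0) {
        sem_count--;
        return 0;
    }
    return 1;
}

/* New-style convention: 1 on success, 0 on failure. */
static int new_trylock(void)
{
    return !old_try_acquire();
}

static void unlock(void)
{
    sem_count++;
}
```

Under these conventions a caller that used to bail out with "if (old_try_acquire()) return;" now has to test "if (!new_trylock()) return;".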

This patch also paves the way to switching console_sem from a semaphore to
a mutex.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: make console_trylock return 1 on success, per Geert]
Signed-off-by: Torben Hohn
Cc: Thomas Gleixner
Cc: Greg KH
Cc: Ingo Molnar
Cc: Geert Uytterhoeven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Torben Hohn
2011-01-26 08:50:06 +0800
3689456b4 squashfs: fix use of uninitialised variable in zlib & xz decompressors

Fix potential use of uninitialised variable caused by recent
decompressor code optimisations.

In zlib_uncompress (zlib_wrapper.c) we have

int zlib_err, zlib_init = 0;
...
do {
...
if (avail == 0) {
offset = 0;
put_bh(bh[k++]);
continue;
}
...
zlib_err = zlib_inflate(stream, Z_SYNC_FLUSH);
...
} while (zlib_err == Z_OK);

If continue is executed (avail == 0) then the while condition will be
evaluated testing zlib_err, which is uninitialised first time around the
loop.

Fix this by getting rid of the 'if (avail == 0)' condition test, this
edge condition should not be being handled in the decompressor code, and
instead handle it generically in the caller code.

Similarly for xz_wrapper.c.

Incidentally, on most architectures (bar Mips and Parisc), no
uninitialised variable warning is generated by gcc, this is because the
while condition test on continue is optimised out and not performed
(when executing continue zlib_err has not been changed since entering
the loop, and logically if the while condition was true previously, then
it's still true).
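
The control flow at issue is that continue inside a do/while jumps to the loop condition, not to the top of the body. The sketch below counts how often the condition is evaluated before the status variable has been assigned by the body. All names are hypothetical, and status is initialised here purely so that the sketch itself has defined behaviour.

```c
/* Evaluates the loop condition, noting reads that precede any assignment. */
static int still_ok(int status, int assigned, int *early_reads)
{
    if (!assigned)
        (*early_reads)++;
    return status == 0;
}

/*
 * 'skip_first' models the 'avail == 0' case that executes 'continue' on
 * the first pass. Returns how many times the condition read 'status'
 * before the loop body had assigned it.
 */
static int reads_before_assignment(int skip_first)
{
    int status = 0;         /* initialised only to keep the sketch defined */
    int assigned = 0;
    int early_reads = 0;
    int pass = 0;

    do {
        if (skip_first && pass++ == 0)
            continue;       /* jumps straight to the condition below */
        status = 1;         /* stands in for the inflate call finishing */
        assigned = 1;
    } while (still_ok(status, assigned, &early_reads));

    return early_reads;
}
```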

Signed-off-by: Phillip Lougher
Reported-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Phillip Lougher
2011-01-26 08:50:05 +0800
3af03655e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
nilfs2: fix crash after one superblock became unavailable

Linus Torvalds
2011-01-26 07:03:36 +0800
27dc1cd3a NFS: nfs_wcc_update_inode() should set nfsi->attr_gencount

If the call to nfs_wcc_update_inode() results in an attribute update, we
need to ensure that the inode's attr_gencount gets bumped too, otherwise
we are not protected against races with other GETATTR calls.

Signed-off-by: Trond Myklebust

Trond Myklebust
2011-01-26 04:28:21 +0800
b2a2897dc NFS improve pnfs_put_deviceid_cache debug print

What we really want to know is the ref count.

Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust

Andy Adamson
2011-01-26 04:26:51 +0800
2c4cdf8f6 NFS fix cb_sequence error processing

Always assign the cb_process_state nfs_client pointer so a processing error
in cb_sequence after the nfs_client is found and referenced returns
a non-NULL cb_process_state nfs_client and the matching nfs_put_client in
nfs4_callback_compound dereferences the client.

Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust

Andy Adamson
2011-01-26 04:26:51 +0800
778be232a NFS do not find client in NFSv4 pg_authenticate

The information required to find the nfs_client corresponding to the incoming
back channel request is contained in the NFS layer. Perform minimal checking
in the RPC layer pg_authenticate method, and push more detailed checking into
the NFS layer where the nfs_client can be found.

Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust

Andy Adamson
2011-01-26 04:26:51 +0800
80c30e8de NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!"

Nick Bowler reports:

> We were just having some NFS server troubles, and my client machine
> running 2.6.38-rc1+ (specifically, commit 2b1caf6ed7b888c95) crashed
> hard (syslog output appended to this mail).
>
> I'm not sure what the exact timeline was or how to reproduce this,
> but the server was rebooted during all this. Since I've never seen
> this happen before, it is possibly a regression from previous kernel
> releases. However, I recently updated my nfs-utils (on the client) to
> version 1.2.3, so that might be related as well.

[ BUG output redacted ]

When done searching, the for_each_host loop in next_host_state() falls
through and returns the final host on the host chain without bumping
its reference count.

Since the host's ref count is only one at that point, releasing the
host in nlm_host_rebooted() attempts to destroy the host prematurely,
and therefore hits a BUG().

Likely, the original intent of the for_each_host behavior in
next_host_state() was to handle the case when the host chain is empty.
Searching the chain and finding no suitable host to return needs to be
handled as well.

Defensively restructure next_host_state() always to return NULL when
the loop falls through.
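
A defensively structured search of this kind could be sketched as below. The structure and the matching rule are simplified, hypothetical stand-ins: struct host and next_stale_host() are not the lockd definitions, and the real function walks a hash table rather than a single chain.

```c
#include <stddef.h>

struct host {
    struct host *next;
    int state;
    int refcount;
};

/*
 * Return the first host whose state differs from 'new_state', taking a
 * reference on it. Return NULL when the chain is empty or when no host
 * qualifies, so the caller never releases a host it holds no reference on.
 */
static struct host *next_stale_host(struct host *chain, int new_state)
{
    struct host *h;

    for (h = chain; h != NULL; h = h->next) {
        if (h->state != new_state) {
            h->state = new_state;
            h->refcount++;
            return h;
        }
    }
    return NULL;    /* the loop fell through: nothing to hand back */
}
```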

Introduced by commit b10e30f6 "lockd: reorganize nlm_host_rebooted".

Cc: J. Bruce Fields
Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2011-01-26 04:24:47 +0800
f61f6da0d NFS: Prevent memory allocation failure in nfsacl_encode()

nfsacl_encode() allocates memory in certain cases. This of course
is not guaranteed to work.

Since commit 9f06c719 "SUNRPC: New xdr_streams XDR encoder API", the
kernel's XDR encoders can't return a result indicating a possible
failure, so a memory allocation failure in nfsacl_encode() has become
fatal (ie, the XDR code Oopses) in some cases.

However, the allocated memory is a tiny fixed amount, on the order
of 40-50 bytes. We can easily use a stack-allocated buffer for
this, with only a wee bit of nose-holding.

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2011-01-26 04:24:47 +0800
731f3f482 NFS: nfsacl_{encode,decode} should return signed integer

Clean up.

The nfsacl_encode() and nfsacl_decode() functions return negative
errno values, and each call site verifies that the returned value
is not negative. Change the synopsis of both of these functions
to reflect this usage.

Document the synopsis and return values.

Reported-by: Trond Myklebust
Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2011-01-26 04:24:47 +0800
ee5dc7732 NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"

Milan Broz reports:

> on today Linus' tree I get OOps if using nfs.
>
> server (2.6.36) exports dir:
> /dir 172.16.1.0/24(rw,async,all_squash,no_subtree_check,anonuid=500,anongid=500)
>
> on client it is mounted in fstab
> server:/dir /mnt/tst nfs rw,soft 0 0
>
> and these commands OOpses it (simplified from a configure script):
>
> cd /dir
> touch x
> install x y
>
> [ 105.327701] ------------[ cut here ]------------
> [ 105.327979] kernel BUG at fs/nfs/nfs3xdr.c:1338!
> [ 105.328075] invalid opcode: 0000 [#1] PREEMPT SMP
> [ 105.328223] last sysfs file: /sys/devices/virtual/bdi/0:16/uevent
> [ 105.328349] Modules linked in: usbcore dm_mod
> [ 105.328553]
> [ 105.328678] Pid: 3710, comm: install Not tainted 2.6.37+ #423 440BX Desktop Reference Platform/VMware Virtual Platform
> [ 105.328853] EIP: 0060:[] EFLAGS: 00010282 CPU: 0
> [ 105.329152] EIP is at nfs3_xdr_enc_setacl3args+0x61/0x98
> [ 105.329249] EAX: ffffffea EBX: ce941d98 ECX: 00000000 EDX: 00000004
> [ 105.329340] ESI: ce941cd0 EDI: 000000a4 EBP: ce941cc0 ESP: ce941cb4
> [ 105.329431] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [ 105.329525] Process install (pid: 3710, ti=ce940000 task=ced36f20 task.ti=ce940000)
> [ 105.336600] Stack:
> [ 105.336693] ce941cd0 ce9dc000 00000000 ce941cf8 c12ecd02 c12f43e0 c116c00b cf754158
> [ 105.336982] ce9dc004 cf754284 ce9dc004 cf7ffee8 ceff9978 ce9dc000 cf7ffee8 ce9dc000
> [ 105.337182] ce9dc000 ce941d14 c12e698d cf75412c ce941d98 cf7ffee8 cf7fff20 00000000
> [ 105.337405] Call Trace:
> [ 105.337695] [] rpcauth_wrap_req+0x75/0x7f
> [ 105.337806] [] ? xdr_encode_opaque+0x12/0x15
> [ 105.337898] [] ? nfs3_xdr_enc_setacl3args+0x0/0x98
> [ 105.337988] [] call_transmit+0x17e/0x1e8
> [ 105.338072] [] __rpc_execute+0x6d/0x1a6
> [ 105.338155] [] rpc_execute+0x34/0x37
> [ 105.338235] [] rpc_run_task+0xb5/0xbd
> [ 105.338316] [] rpc_call_sync+0x3d/0x58
> [ 105.338402] [] nfs3_proc_setacls+0x18e/0x24f
> [ 105.338493] [] ? __kmalloc+0x148/0x1c4
> [ 105.338579] [] ? posix_acl_alloc+0x12/0x22
> [ 105.338665] [] nfs3_proc_setacl+0xa0/0xca
> [ 105.338748] [] nfs3_setxattr+0x62/0x88
> [ 105.338834] [] ? sub_preempt_count+0x7c/0x89
> [ 105.338926] [] ? nfs3_setxattr+0x0/0x88
> [ 105.339026] [] __vfs_setxattr_noperm+0x26/0x95
> [ 105.339114] [] vfs_setxattr+0x5b/0x76
> [ 105.339211] [] setxattr+0x9d/0xc3
> [ 105.339298] [] ? handle_pte_fault+0x258/0x5cb
> [ 105.339428] [] ? __free_pages+0x1a/0x23
> [ 105.339517] [] ? up_read+0x16/0x2c
> [ 105.339599] [] ? fget+0x0/0xa3
> [ 105.339677] [] ? fget+0x0/0xa3
> [ 105.339760] [] ? get_parent_ip+0xb/0x31
> [ 105.339843] [] ? sub_preempt_count+0x7c/0x89
> [ 105.339931] [] sys_fsetxattr+0x51/0x79
> [ 105.340014] [] sysenter_do_call+0x12/0x32
> [ 105.340133] Code: 2e 76 18 00 58 31 d2 8b 7f 28 f6 43 04 01 74 03 8b 53 08 6a 00 8b 46 04 6a 01 8b 0b 52 89 fa e8 85 10 f8 ff 83 c4 0c 85 c0 79 04 0b eb fe 31 c9 f6 43 04 04 74 03 8b 4b 0c 68 00 10 00 00 8d
> [ 105.350321] EIP: [] nfs3_xdr_enc_setacl3args+0x61/0x98 SS:ESP 0068:ce941cb4
> [ 105.364385] ---[ end trace 01fcfe7f0f7f6e4a ]---

nfs3_xdr_enc_setacl3args() is not properly setting up the target
buffer before nfsacl_encode() attempts to encode the ACL.

Introduced by commit d9c407b1 "NFS: Introduce new-style XDR encoding
functions for NFSv3."

Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2011-01-26 04:24:47 +0800
839f7ad69 NFS: Fix "kernel BUG at fs/aio.c:554!"

Nick Piggin reports:

> I'm getting use after frees in aio code in NFS
>
> [ 2703.396766] Call Trace:
> [ 2703.396858] [] ? native_sched_clock+0x27/0x80
> [ 2703.396959] [] ? put_lock_stats+0xe/0x40
> [ 2703.397058] [] ? lock_release_holdtime+0xa8/0x140
> [ 2703.397159] [] lock_acquire+0x95/0x1b0
> [ 2703.397260] [] ? aio_put_req+0x2b/0x60
> [ 2703.397361] [] ? get_parent_ip+0x11/0x50
> [ 2703.397464] [] _raw_spin_lock_irq+0x41/0x80
> [ 2703.397564] [] ? aio_put_req+0x2b/0x60
> [ 2703.397662] [] aio_put_req+0x2b/0x60
> [ 2703.397761] [] do_io_submit+0x2be/0x7c0
> [ 2703.397895] [] sys_io_submit+0xb/0x10
> [ 2703.397995] [] system_call_fastpath+0x16/0x1b
>
> Adding some tracing, it is due to nfs completing the request then
> returning something other than -EIOCBQUEUED, so aio.c
> also completes the request.

To address this, prevent the NFS direct I/O engine from completing
async iocbs when the forward path returns an error without starting
any I/O.

This fix appears to survive ^C during both "xfstest no. 208" and "fsx
-Z."

It's likely this bug has existed for a very long while, as we are seeing
very similar symptoms in OEL 5. Copying stable.

Cc: Stable
Signed-off-by: Chuck Lever
Signed-off-by: Trond Myklebust

Chuck Lever
2011-01-26 04:24:47 +0800
ad3d2eedf NFS4: Avoid potential NULL pointer dereference in decode_and_add_ds().

On Mon, 17 Jan 2011, Mi Jinlong wrote:

>
>
> Jesper Juhl:
> > strrchr() can return NULL if nothing is found. If this happens we'll
> > dereference a NULL pointer in
> > fs/nfs/nfs4filelayoutdev.c::decode_and_add_ds().
> >
> > I tried to find some other code that guarantees that this can never
> > happen but I was unsuccessful. So, unless someone else can point to some
> > code that ensures this can never be a problem, I believe this patch is
> > needed.
> >
> > While I was changing this code I also noticed that all the dprintk()
> > statements, except one, start with "%s:". The one missing the ":" I added
> > it to.
>
> Maybe another one also should be changed at decode_and_add_ds() at line 243:
>
> 243 printk("%s Decoded address and port %s\n", __func__, buf);
>
Missed that one. Thanks.

Signed-off-by: Jesper Juhl
Signed-off-by: Trond Myklebust

Jesper Juhl
2011-01-26 04:24:46 +0800
d39454ffe CIFS: Add strictcache mount option

Used to switch on strict cache mode. In this mode the client reads
from the cache whenever it holds a Level II oplock; otherwise it reads
from the server. For writes, the client stores data in the cache when
it holds an Exclusive oplock; otherwise it writes directly to the
server.

Signed-off-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Steve French

Pavel Shilovsky
2011-01-26 03:31:38 +0800
72432ffcf CIFS: Implement cifs_strict_writev (try #4)

If we don't have an Exclusive oplock, we write the data to the server.
Also set invalidate_mapping flag on the inode if we wrote something
to the server. Add cifs_iovec_write to let the client write iovec
buffers through CIFSSMBWrite2.

Signed-off-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Steve French

Pavel Shilovsky
2011-01-26 03:30:13 +0800
93c100c0b [CIFS] Replace cifs md5 hashing functions with kernel crypto APIs

Replace remaining use of md5 hash functions local to cifs module
with kernel crypto APIs.
Remove header and source file containing those local functions.

Signed-off-by: Shirish Pargaonkar
Reviewed-by: Jeff Layton
Signed-off-by: Steve French

Steve French
2011-01-26 03:28:43 +0800
d66bbd441 ceph: avoid picking MDS that is not active

Ignore replication or auth frag data if it indicates an MDS that is not
active. This can happen if the MDS shuts down and the client has stale
data about the namespace distribution across the MDS cluster. If that's
the case, fall back to directing the request based on the auth cap (which
should always be accurate).

Signed-off-by: Sage Weil

Sage Weil
2011-01-26 00:16:37 +0800

25 Jan, 2011

1 commit

c723fdab8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
Make CIFS mount work in a container.
CIFS: Remove pointless variable assignment in cifs_dfs_do_automount()

Linus Torvalds
2011-01-25 12:23:54 +0800

24 Jan, 2011

2 commits

f1d0c9986 Make CIFS mount work in a container.

Teach cifs about network namespaces, so mounting uses addresses/routing
visible from the container rather than from init context.

A container is a chroot on steroids that changes more than just the root
filesystem the new processes see. One thing containers can isolate is
"network namespaces", meaning each container can have its own set of
ethernet interfaces, each with its own IP address and routing to the
outside world. And if you open a socket in _userspace_ from processes
within such a container, this works fine.

But sockets opened from within the kernel still use a single global
networking context in a lot of places, meaning the new socket's address
and routing are correct for PID 1 on the host, but are _not_ what
userspace processes in the container get to use.

So when you mount a network filesystem from within a container, the
mount code in the CIFS driver uses the host's networking context and not
the container's networking context, so it gets the wrong address, uses
the wrong routing, and may even try to go out an interface that the
container can't even access... Bad stuff.

This patch copies the mount process's network context into the CIFS
structure that stores the rest of the server information for that mount
point, and changes the socket open code to use the saved network context
instead of the global network context. I.E. "when you attempt to use
these addresses, do so relative to THIS set of network interfaces and
routing rules, not the old global context from back before we supported
containers".

The big long HOWTO sets up a test environment on the assumption you've
never used containers before. It basically says:

1) configure and build a new kernel that has container support
2) build a new root filesystem that includes the userspace container
control package (LXC)
3) package/run them under KVM (so you don't have to mess up your host
system in order to play with containers).
4) set up some containers under the KVM system
5) set up contradictory routing in the KVM system and the container so
that the host and the container see different things for the same address
6) try to mount a CIFS share from both contexts so you can both force it
to work and force it to fail.

For a long drawn out test reproduction sequence, see:

http://landley.livejournal.com/47024.html
http://landley.livejournal.com/47205.html
http://landley.livejournal.com/47476.html

Signed-off-by: Rob Landley
Reviewed-by: Jeff Layton
Signed-off-by: Steve French

Rob Landley
2011-01-24 12:28:51 +0800
3f391c79b CIFS: Remove pointless variable assignment in cifs_dfs_do_automount()

In fs/cifs/cifs_dfs_ref.c::cifs_dfs_do_automount() we have this code:

...
mnt = ERR_PTR(-EINVAL);
if (IS_ERR(tlink)) {
mnt = ERR_CAST(tlink);
goto free_full_path;
}
ses = tlink_tcon(tlink)->ses;

rc = get_dfs_path(xid, ses, full_path + 1, cifs_sb->local_nls,
&num_referrals, &referrals,
cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MAP_SPECIAL_CHR);

cifs_put_tlink(tlink);

mnt = ERR_PTR(-ENOENT);
...

The assignment of 'mnt = ERR_PTR(-EINVAL);' is completely pointless. If we
take the 'if (IS_ERR(tlink))' branch we'll set 'mnt' again and we'll also
do so if we do not take the branch. There is no way we'll ever use 'mnt'
with the assigned 'ERR_PTR(-EINVAL)' value, so we may as well just remove
the pointless assignment.

Signed-off-by: Jesper Juhl
Signed-off-by: Steve French

Jesper Juhl
2011-01-24 11:32:01 +0800

23 Jan, 2011

1 commit

ff5fdb614 fs: fix new dcache.c kernel-doc warnings

Fix new fs/dcache.c kernel-doc warnings:

Warning(fs/dcache.c:184): No description found for parameter 'dentry'
Warning(fs/dcache.c:296): No description found for parameter 'parent'
Warning(fs/dcache.c:1985): No description found for parameter 'dparent'
Warning(fs/dcache.c:1985): Excess function parameter 'parent' description in 'd_validate'

Signed-off-by: Randy Dunlap
Cc: Alexander Viro
Cc: Nick Piggin
Signed-off-by: Linus Torvalds

Randy Dunlap
2011-01-23 12:32:38 +0800