Eric Lee / smarc-fsl-linux-kernel

14 Mar, 2011

2 commits

c9c6cac0c kill path_lookup()

all remaining callers pass LOOKUP_PARENT to it, so
flags argument can die; renamed to kern_path_parent()

Signed-off-by: Al Viro

Al Viro
2011-03-14 21:15:23 +0800
586ce098a compat breakage in preadv() and pwritev()

Fix for a dumb preadv()/pwritev() compat bug - unlike the native
variants, compat_... ones forget to check FMODE_P{READ,WRITE}, so e.g.
on pipe the native preadv() will fail with -ESPIPE and compat one will
act as readv() and succeed. Not critical, but it's a clear bug with trivial
fix.
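
The native behaviour described above can be observed from user space. The following is a minimal sketch, assuming a Linux system with glibc; the helper name preadv_errno_on_pipe() is hypothetical and is not part of the patch. It reports the errno that preadv() produces on the read end of a pipe.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the errno set by preadv() on the read end
 * of a fresh pipe, 0 if the call unexpectedly succeeds, or -1 if the pipe
 * could not be set up.
 */
int preadv_errno_on_pipe(void)
{
    int fds[2];
    char buf[8];
    struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
    int result;

    if (pipe(fds) != 0)
        return -1;

    /* Put one byte in the pipe so that a plain readv() would succeed. */
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* A positional read on a non-seekable object is expected to fail. */
    if (preadv(fds[0], &iov, 1, 0) < 0)
        result = errno;
    else
        result = 0;

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Built natively, the helper is expected to return ESPIPE. Per the message above, the same call made through the compat entry point on an unpatched kernel would instead behave like readv() and succeed.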

Signed-off-by: Al Viro

Al Viro
2011-03-14 07:21:26 +0800

10 Mar, 2011

12 commits

d891eedbc fs/dcache: allow d_obtain_alias() to return unhashed dentries

Without this patch, inodes are not promptly freed on last close of an
unlinked file by an nfs client:

client$ mount -tnfs4 server:/export/ /mnt/
client$ tail -f /mnt/FOO
...
server$ df -i /export
server$ rm /export/FOO
(^C the tail -f)
server$ df -i /export
server$ echo 2 >/proc/sys/vm/drop_caches
server$ df -i /export

the df's will show that the inode is not freed on the filesystem until
the last step, when it could have been freed after killing the client's
tail -f. On-disk data won't be deallocated either, leading to possible
spurious ENOSPC.

This occurs because when the client does the close, it arrives in a
compound with a putfh and a close, processed like:

- putfh: look up the filehandle. The only alias found for the
  inode will be the DCACHE_UNHASHED alias referenced by the filp,
  so it creates a new DCACHE_DISCONNECTED dentry and returns
  that instead.
- close: closes the existing filp, whose dentry is destroyed
  immediately by dput() since it's DCACHE_UNHASHED.
- end of the compound: release the reference to the current
  filehandle, and dput() the new DCACHE_DISCONNECTED dentry,
  which gets put on the unused list instead of being destroyed
  immediately.

Nick Piggin suggested fixing this by allowing d_obtain_alias to return
the unhashed dentry that is referenced by the filp, instead of making it
create a new dentry.

Leave __d_find_alias() alone to avoid changing behavior of other
callers.

Also nfsd doesn't need all the checks of __d_find_alias(); any dentry,
hashed or unhashed, disconnected or not, should work.

Signed-off-by: J. Bruce Fields
Signed-off-by: Al Viro

J. Bruce Fields
2011-03-10 18:18:54 +0800
1ca551c6c Check for immutable/append flag in fallocate path

In the fallocate path the kernel doesn't check for the immutable/append
flag. A race is possible in this scenario: an application opens a file
read/write and works on it; meanwhile root sets the immutable flag on
the file, yet the application can still call fallocate successfully.
In addition, on an append-only file we allow only the reserve operation,
not any unreserve operation.
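
The rule stated above can be modelled in a few lines. This is only a sketch under stated assumptions: struct sketch_inode, SKETCH_FALLOC_UNRESERVE and sketch_fallocate_permitted() are hypothetical stand-ins for the kernel's inode flags and fallocate mode bits, not the code from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mode bit standing in for an "unreserve" request. */
#define SKETCH_FALLOC_UNRESERVE 0x02

/* Hypothetical stand-in for the two inode flags the message mentions. */
struct sketch_inode {
    bool immutable;
    bool append_only;
};

/*
 * Returns 0 if the request may proceed, -EPERM otherwise.
 * - An immutable file accepts no fallocate request at all.
 * - An append-only file accepts reserve requests but not unreserve ones.
 */
int sketch_fallocate_permitted(const struct sketch_inode *inode, int mode)
{
    if (inode->immutable)
        return -EPERM;

    if (inode->append_only && (mode & SKETCH_FALLOC_UNRESERVE))
        return -EPERM;

    return 0;
}
```

Performing the check at fallocate time, rather than only at open time, is what closes the window in which the flag is set after the file was opened.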

Signed-off-by: Marco Stornelli
Signed-off-by: Al Viro

Marco Stornelli
2011-03-10 17:22:15 +0800
9177ada99 fat: fix d_revalidate oopsen on NFS exports

can't blindly check nd->flags in ->d_revalidate()
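
The message does not spell out why the check oopses. Assuming the NFS export path can call ->d_revalidate() with a null nd, the defensive pattern looks roughly like the sketch below. All names here (struct sketch_nameidata, SKETCH_LOOKUP_RCU, sketch_d_revalidate()) are hypothetical user-space stand-ins, not the kernel's definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag value standing in for an RCU-walk lookup flag. */
#define SKETCH_LOOKUP_RCU 0x0040u

/* Hypothetical stand-in for the lookup context passed to ->d_revalidate(). */
struct sketch_nameidata {
    unsigned int flags;
};

/*
 * Returns -ECHILD to request a retry outside RCU-walk mode, or 1 to treat
 * the dentry as valid. The pointer is tested before nd->flags is read,
 * so a caller that passes no lookup context does not fault.
 */
int sketch_d_revalidate(const struct sketch_nameidata *nd)
{
    if (nd != NULL && (nd->flags & SKETCH_LOOKUP_RCU))
        return -ECHILD;

    return 1;
}
```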

Signed-off-by: Al Viro

Al Viro
2011-03-10 16:45:49 +0800
8ce84eeb5 jfs: fix d_revalidate oopsen on NFS exports

can't blindly check nd->flags in ->d_revalidate()

Signed-off-by: Al Viro

Al Viro
2011-03-10 16:45:28 +0800
4714e6373 ocfs2: fix d_revalidate oopsen on NFS exports

can't blindly check nd->flags in ->d_revalidate()

Signed-off-by: Al Viro

Al Viro
2011-03-10 16:45:07 +0800
53fe92416 gfs2: fix d_revalidate oopsen on NFS exports

can't blindly check nd->flags in ->d_revalidate()

Signed-off-by: Al Viro

Al Viro
2011-03-10 16:44:48 +0800
529c5f958 fuse: fix d_revalidate oopsen on NFS exports

can't blindly check nd->flags in ->d_revalidate()

Signed-off-by: Al Viro

Al Viro
2011-03-10 16:44:31 +0800
0eb980e31 ceph: fix d_revalidate oopsen on NFS exports

can't blindly check nd->flags in ->d_revalidate()

Signed-off-by: Al Viro

Al Viro
2011-03-10 16:44:05 +0800
c78f4cc5e reiserfs xattr ->d_revalidate() shouldn't care about RCU

... it returns an error unconditionally

Signed-off-by: Al Viro

Al Viro
2011-03-10 16:42:01 +0800
ae50adcb0 /proc/self is never going to be invalidated...

Signed-off-by: Al Viro

Al Viro
2011-03-10 16:41:53 +0800
397949170 Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux

* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux:
nfsd: wrong index used in inner loop
nfsd4: fix bad pointer on failure to find delegation
NFSD: fix decode_cb_sequence4resok

Linus Torvalds
2011-03-10 06:52:09 +0800
78833dd70 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
nd->inode is not set on the second attempt in path_walk()
unfuck proc_sysctl ->d_compare()
minimal fix for do_filp_open() race

Linus Torvalds
2011-03-10 05:55:51 +0800

09 Mar, 2011

2 commits

b306419ae nd->inode is not set on the second attempt in path_walk()

We leave it at whatever it had been pointing to after the
first link_path_walk() had failed with -ESTALE. Things
do not work well after that...

Signed-off-by: Al Viro

Al Viro
2011-03-09 10:16:28 +0800
3ec07aa95 nfsd: wrong index used in inner loop

Index i was already used in the outer loop
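
The general class of bug is easy to reproduce outside the kernel. The sketch below is not the nfsd code; the function names and the ROWS/COLS sizes are made up for illustration. It contrasts a nested loop that reuses one index with one that uses separate indices.

```c
enum { ROWS = 3, COLS = 4 };

/*
 * Buggy pattern: the inner loop reuses i. After the first inner pass i
 * equals COLS, so the outer condition i < ROWS fails and the outer loop
 * stops after a single row.
 */
int count_cells_shared_index(void)
{
    int visited = 0;
    int i;

    for (i = 0; i < ROWS; i++)
        for (i = 0; i < COLS; i++)
            visited++;

    return visited;
}

/* Fixed pattern: each loop has its own index, so every cell is visited. */
int count_cells_separate_indices(void)
{
    int visited = 0;
    int i, j;

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            visited++;

    return visited;
}
```

With these sizes the shared-index version visits 4 cells and the fixed version visits all 12.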

Cc: stable@kernel.org
Signed-off-by: Roel Kluin
Signed-off-by: J. Bruce Fields

roel
2011-03-09 08:46:10 +0800

08 Mar, 2011

2 commits

dfef6dcd3 unfuck proc_sysctl ->d_compare()

a) struct inode is not going to be freed under ->d_compare();
however, the thing PROC_I(inode)->sysctl points to just might.
Fortunately, it's enough to make freeing that sucker delayed,
provided that we don't step on its ->unregistering, clear
the pointer to it in PROC_I(inode) before dropping the reference
and check if it's NULL in ->d_compare().

b) I'm not sure that we *can* walk into NULL inode here (we recheck
dentry->seq between verifying that it's still hashed / fetching
dentry->d_inode and passing it to ->d_compare() and there's no
negative hashed dentries in /proc/sys/*), but if we can walk into
that, we really should not have ->d_compare() return 0 on it!
Said that, I really suspect that this check can be simply killed.
Nick?

Signed-off-by: Al Viro

Al Viro
2011-03-08 15:22:27 +0800
32b007b4e nfsd4: fix bad pointer on failure to find delegation

In case of a nonempty list, the return on error here is obviously bogus;
it ends up being a pointer to the list head instead of to any valid
delegation on the list.

In particular, if nfsd4_delegreturn() hits this case, and you're quite unlucky,
then renew_client may oops, and it may take an embarrassingly long time to
figure out why. Facepalm.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
IP: [] nfsd4_delegreturn+0x125/0x200
...
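
The failure mode described above is typical of searches over intrusive circular lists: when the loop falls through without a match, the cursor refers to the list head, which is not embedded in any real entry. The sketch below shows the safe shape of such a search. It is a user-space model with hypothetical names (struct list_node, struct deleg, find_deleg()), not the nfsd4 code.

```c
#include <stddef.h>

/* Minimal circular singly linked list node (hypothetical stand-in). */
struct list_node {
    struct list_node *next;
};

/* Hypothetical entry type with an embedded list node. */
struct deleg {
    int id;
    struct list_node link;
};

/* Recover the enclosing struct deleg from a pointer to its link member. */
#define node_to_deleg(ptr) \
    ((struct deleg *)((char *)(ptr) - offsetof(struct deleg, link)))

/*
 * Search the list anchored at head for an entry with the given id.
 * Only real nodes are converted to entries. When the loop ends without
 * a match, pos == head, and head belongs to no struct deleg, so the
 * function returns NULL instead of a pointer derived from the head.
 */
struct deleg *find_deleg(struct list_node *head, int id)
{
    struct list_node *pos;

    for (pos = head->next; pos != head; pos = pos->next) {
        struct deleg *dp = node_to_deleg(pos);

        if (dp->id == id)
            return dp;
    }

    return NULL;
}
```

Callers can then test the result against NULL rather than dereferencing a pointer that only looks like a valid entry.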

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2011-03-08 00:44:53 +0800

06 Mar, 2011

1 commit

fb62c00a6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: no .snap inside of snapped namespace
libceph: fix msgr standby handling
libceph: fix msgr keepalive flag
libceph: fix msgr backoff
libceph: retry after authorization failure
libceph: fix handling of short returns from get_user_pages
ceph: do not clear I_COMPLETE from d_release
ceph: do not set I_COMPLETE
Revert "ceph: keep reference to parent inode on ceph_dentry"

Linus Torvalds
2011-03-06 02:43:22 +0800

05 Mar, 2011

3 commits

e9e3d724e nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3)

The "bad_page()" page allocator sanity check was reported recently (call
chain as follows):

bad_page+0x69/0x91
free_hot_cold_page+0x81/0x144
skb_release_data+0x5f/0x98
__kfree_skb+0x11/0x1a
tcp_ack+0x6a3/0x1868
tcp_rcv_established+0x7a6/0x8b9
tcp_v4_do_rcv+0x2a/0x2fa
tcp_v4_rcv+0x9a2/0x9f6
do_timer+0x2df/0x52c
ip_local_deliver+0x19d/0x263
ip_rcv+0x539/0x57c
netif_receive_skb+0x470/0x49f
:virtio_net:virtnet_poll+0x46b/0x5c5
net_rx_action+0xac/0x1b3
__do_softirq+0x89/0x133
call_softirq+0x1c/0x28
do_softirq+0x2c/0x7d
do_IRQ+0xec/0xf5
default_idle+0x0/0x50
ret_from_intr+0x0/0xa
default_idle+0x29/0x50
cpu_idle+0x95/0xb8
start_kernel+0x220/0x225
_sinittext+0x22f/0x236

It occurs because an skb with a fraglist was freed from the tcp
retransmit queue when it was acked, but a page on that fraglist had
PG_slab set (indicating it was allocated from the slab allocator), which
means the free path above can't safely free it via put_page.

We tracked this back to an nfsv4 setacl operation, in which the nfs code
attempted to convert the passed-in buffer to an array of pages in
__nfs4_proc_set_acl, which gets used by the skb->frags list in
xs_sendpages. __nfs4_proc_set_acl just converts each page in the buffer
to a page struct via virt_to_page, but the vfs allocates the buffer via
kmalloc, meaning the PG_slab bit is set. We can't create a buffer with
kmalloc and free it later in the tcp ack path with put_page, so we need
to either:

1) ensure that when we create the list of pages, no page struct has
PG_slab set

or

2) not use a page list to send this data

Given that these buffers can be multiple pages and arbitrarily sized, I
think (1) is the right way to go. I've written the below patch to
allocate a page from the buddy allocator directly and copy the data over
to it. This ensures that we have a put_page free-able page for every
entry that winds up on an skb frag list, so it can be safely freed when
the frame is acked. We do a put_page on each entry after the
rpc_call_sync call so as to drop our own reference count to the page,
leaving only the ref count taken by tcp_sendpages. This way the data
will be properly freed when the ack comes in.

Successfully tested by myself to solve the above oops.

Note, as this is the result of a setacl operation that exceeded a page
of data, I think this amounts to a local DoS triggerable by an
unprivileged user, so I'm CCing security on this as well.

Signed-off-by: Neil Horman
CC: Trond Myklebust
CC: security@kernel.org
CC: Jeff Layton
Signed-off-by: Linus Torvalds

Neil Horman
2011-03-05 09:28:52 +0800
455cec0ab ceph: no .snap inside of snapped namespace

Otherwise you can do things like

# mkdir .snap/foo
# cd .snap/foo/.snap
# ls

Signed-off-by: Sage Weil

Sage Weil
2011-03-05 04:25:09 +0800
1858efd47 minimal fix for do_filp_open() race

failure exits on the no-O_CREAT side of do_filp_open() merge with
those of O_CREAT one; unfortunately, if do_path_lookup() returns
-ESTALE, we'll get out_filp:, notice that we are about to return
-ESTALE without having tried to create the sucker with LOOKUP_REVAL
and jump right into the O_CREAT side of code. And proceed to try
and create a file. Usually that'll fail with -ESTALE again, but
we can race and get that attempt of pathname resolution to succeed.

open() without O_CREAT really shouldn't end up creating files, races
or not. The real fix is to rearchitect the whole do_filp_open(),
but for now splitting the failure exits will do.

Signed-off-by: Al Viro

Al Viro
2011-03-05 02:14:21 +0800

04 Mar, 2011

6 commits

833602694 Merge branch 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
hfs: fix rename() over non-empty directory
udf: fix i_nlink limit
fix reiserfs mkdir() breakage
exofs: i_nlink races in rename()
nilfs2: i_nlink races in rename()
minix: i_nlink races in rename()
ufs: i_nlink races in rename()
sysv: i_nlink races in rename()

Linus Torvalds
2011-03-04 07:37:59 +0800
4c7fd114c Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: zero proper structure size for geometry calls

Linus Torvalds
2011-03-04 04:44:22 +0800
c640e13f8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
nilfs2: fix regression that i-flag is not set on changeless checkpoints

Linus Torvalds
2011-03-04 04:42:48 +0800
16a8b70a5 ceph: do not clear I_COMPLETE from d_release

First, this was racy anyway: d_release isn't called until well after the
dentry is unhashed. Second, this runs afoul of the recent dcache change
that clears d_parent prior to calling d_release (949854d0), causing a NULL
pointer dereference.

Signed-off-by: Sage Weil

Sage Weil
2011-03-04 02:09:52 +0800
b545cc150 ceph: do not set I_COMPLETE

Do not set the I_COMPLETE flag on directories until we resolve races with
dcache pruning.

Signed-off-by: Sage Weil

Sage Weil
2011-03-04 02:09:51 +0800
9bde178d0 Revert "ceph: keep reference to parent inode on ceph_dentry"

This reverts commit 97d79b403ef03f729883246208ef5d8a2ebc4d68.

This fails to account for d_parent changes due to rename or disconnected
dentries due to submounts or NFS reexports.

Signed-off-by: Sage Weil

Sage Weil
2011-03-04 02:09:50 +0800

03 Mar, 2011

10 commits

69102e9b4 hfs: fix rename() over non-empty directory

merge hfs_unlink() and hfs_rmdir(), while we are at it.

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:40 +0800
810c1b2e4 udf: fix i_nlink limit

(256 << sizeof(x)) - 1 is not the maximal possible value of x...
In reality, the maximal allowed value for UDF FileLinkCount is
65535.
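
The arithmetic behind this remark can be checked directly. The sketch below assumes, consistent with the 65535 limit quoted above, that the field in question is 16 bits wide; the function names are hypothetical. sizeof yields a size in bytes, so shifting 256 by it does not produce the largest value the type can hold.

```c
#include <stdint.h>

/*
 * The expression from the message, evaluated for a 16-bit field:
 * sizeof(uint16_t) is 2 (bytes), so this computes (256 << 2) - 1.
 */
unsigned long limit_from_sizeof_shift(void)
{
    return (256UL << sizeof(uint16_t)) - 1;
}

/* The actual largest value a 16-bit unsigned field can hold. */
unsigned long limit_from_type_max(void)
{
    return UINT16_MAX;
}
```

For a 16-bit field the first expression gives 1023, well below the real maximum of 65535.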

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:40 +0800
99890a3be fix reiserfs mkdir() breakage

if directory has so many subdirectories that its link count is set
to 1 (i.e. "can't tell accurately") and reiserfs_new_inode() fails,
we shouldn't decrement the parent's link count in cleanup path;
that's what DEC_DIR_INODE_NLINK() is for. As it is, we end up
with parent suddenly getting zero i_nlink, with very unpleasant
effects.

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:40 +0800
babfe5604 exofs: i_nlink races in rename()

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:17 +0800
30eb43d31 nilfs2: i_nlink races in rename()

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:17 +0800
6f88049ca minix: i_nlink races in rename()

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:16 +0800
37750cdda ufs: i_nlink races in rename()

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:16 +0800
4787d45fa sysv: i_nlink races in rename()

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:16 +0800
f7d222ea2 Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6

* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
of/promtree: allow DT device matching by fixing 'name' brokenness (v5)
x86: OLPC: have prom_early_alloc BUG rather than return NULL
of/flattree: Drop an uninteresting message to pr_debug level
of: Add missing of_address.h to xilinx ehci driver

Linus Torvalds
2011-03-03 12:01:57 +0800
8aaccf7fa of/flattree: Drop an uninteresting message to pr_debug level

This message looks like an error (which it isn't) when booting with a
flattened device tree. Remove the message from normal kernel builds.

Signed-off-by: Paul Bolle
Signed-off-by: Grant Likely

Paul Bolle
2011-03-03 04:45:18 +0800

02 Mar, 2011

2 commits

e8a80c6f7 ext2: Fix link count corruption under heavy link+rename load

vfs_rename_other() does not lock renamed inode with i_mutex. Thus changing
i_nlink in a non-atomic manner (which happens in ext2_rename()) can corrupt
it as reported and analyzed by Josh.

In fact, there is no good reason to mess with i_nlink of the moved file.
We did it presumably to simulate linking into the new directory and unlinking
from an old one. But the practical effect of this is disputable because fsck
may treat the file as properly linked into both directories without
reporting any error, which is confusing. So we just stop increment-decrement
games with i_nlink which also fixes the corruption.

CC: stable@kernel.org
CC: Al Viro
Signed-off-by: Josh Hunt
Signed-off-by: Jan Kara

Josh Hunt
2011-03-02 18:03:52 +0800
af24ee9ea xfs: zero proper structure size for geometry calls

Commit 493f3358cb289ccf716c5a14fa5bb52ab75943e5 added this call to
xfs_fs_geometry() in order to avoid passing kernel stack data back
to user space:

+ memset(geo, 0, sizeof(*geo));

Unfortunately, one of the callers of that function passes the
address of a smaller data type, cast to fit the type that
xfs_fs_geometry() requires. As a result, this can happen:

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
in: f87aca93

Pid: 262, comm: xfs_fsr Not tainted 2.6.38-rc6-493f3358cb2+ #1
Call Trace:

[] ? panic+0x50/0x150
[] ? __stack_chk_fail+0x10/0x18
[] ? xfs_ioc_fsgeometry_v1+0x56/0x5d [xfs]

Fix this by fixing that one caller to pass the right type and then
copy out the subset it is interested in.
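
The shape of that fix can be sketched in user space. The structures and functions below (struct geom_full, struct geom_v1, fill_geometry(), get_geometry_v1()) are hypothetical stand-ins, not the XFS definitions. The point is that the callee zeroes sizeof(*geo) bytes, so the caller must hand it an object of the full type and copy out only the fields it needs.

```c
#include <string.h>

/* Hypothetical full geometry record. */
struct geom_full {
    unsigned int blocksize;
    unsigned int agcount;
    unsigned int version;
    unsigned int logsunit;
};

/* Hypothetical older, shorter layout holding only the leading fields. */
struct geom_v1 {
    unsigned int blocksize;
    unsigned int agcount;
};

/*
 * Zeroes the whole record before filling it. This writes
 * sizeof(struct geom_full) bytes, so geo must really point to an object
 * of that type; a cast pointer to a smaller object would be overrun.
 */
void fill_geometry(struct geom_full *geo)
{
    memset(geo, 0, sizeof(*geo));
    geo->blocksize = 4096;
    geo->agcount = 4;
    geo->version = 2;
}

/*
 * Caller following the fix: use a full-sized temporary, then copy out
 * only the subset the smaller layout carries.
 */
void get_geometry_v1(struct geom_v1 *out)
{
    struct geom_full full;

    fill_geometry(&full);
    out->blocksize = full.blocksize;
    out->agcount = full.agcount;
}
```

Passing a struct geom_v1 cast to struct geom_full * would instead let the memset() write past the end of the smaller object, which is the kind of stack corruption the panic above reports.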

Note: This patch is an alternative to one originally proposed by
Eric Sandeen.

Reported-by: Jeffrey Hundstad
Signed-off-by: Alex Elder
Reviewed-by: Eric Sandeen
Tested-by: Jeffrey Hundstad

Alex Elder
2011-03-02 11:21:13 +0800