Eric Lee / linux-smarc-t335x-v3.2

20 Jun, 2011

2 commits

8e833fd2e fix comment in generic_permission() ... Browse Code »

CAP_DAC_OVERRIDE is enough for MAY_EXEC on directory, even if
no exec bits are set.

Signed-off-by: Al Viro

Al Viro
2011-06-20 22:45:56 +0800
6291176bc kill obsolete comment for follow_down() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2011-06-20 22:45:49 +0800

16 Jun, 2011

2 commits

8aef18845 VFS: Fix vfsmount overput on simultaneous automount ... Browse Code »

[Kudos to dhowells for tracking that crap down]

If two processes attempt to cause automounting on the same mountpoint at the
same time, the vfsmount holding the mountpoint will be left with one too few
references on it, causing a BUG when the kernel tries to clean up.

The problem is that lock_mount() drops the caller's reference to the
mountpoint's vfsmount in the case where it finds something already mounted on
the mountpoint as it transits to the mounted filesystem and replaces path->mnt
with the new mountpoint vfsmount.

During a pathwalk, however, we don't take a reference on the vfsmount if it is
the same as the one in the nameidata struct, but do_add_mount() doesn't know
this.

The fix is to make sure we have a ref on the vfsmount of the mountpoint before
calling do_add_mount(). However, if lock_mount() doesn't transit, we're then
left with an extra ref on the mountpoint vfsmount which needs releasing.
We can handle that in follow_managed() by not making assumptions about what
we can and what we cannot get from lookup_mnt() as the current code does.

The callers of follow_managed() expect that reference to path->mnt will be
grabbed iff path->mnt has been changed. follow_managed() and follow_automount()
keep track of whether such reference has been grabbed and assume that it'll
happen in those and only those cases that'll have us return with changed
path->mnt. That assumption is almost correct - it breaks in case of
racing automounts and in even harder to hit race between following a mountpoint
and a couple of mount --move. The thing is, we don't need to make that
assumption at all - after the end of loop in follow_manage() we can check
if path->mnt has ended up unchanged and do mntput() if needed.

The BUG can be reproduced with the following test program:

#include
#include
#include
#include
#include
int main(int argc, char **argv)
{
int pid, ws;
struct stat buf;
pid = fork();
stat(argv[1], &buf);
if (pid > 0) wait(&ws);
return 0;
}

and the following procedure:

(1) Mount an NFS volume that on the server has something else mounted on a
subdirectory. For instance, I can mount / from my server:

mount warthog:/ /mnt -t nfs4 -r

On the server /data has another filesystem mounted on it, so NFS will see
a change in FSID as it walks down the path, and will mark /mnt/data as
being a mountpoint. This will cause the automount code to be triggered.

!!! Do not look inside the mounted fs at this point !!!

(2) Run the above program on a file within the submount to generate two
simultaneous automount requests:

/tmp/forkstat /mnt/data/testfile

(3) Unmount the automounted submount:

umount /mnt/data

(4) Unmount the original mount:

umount /mnt

At this point the kernel should throw a BUG with something like the
following:

BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]

Note that the bug appears on the root dentry of the original mount, not the
mountpoint and not the submount because sys_umount() hasn't got to its final
mntput_no_expire() yet, but this isn't so obvious from the call trace:

[] shrink_dcache_for_umount+0x69/0x82
[] generic_shutdown_super+0x37/0x15b
[] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
[] kill_anon_super+0x1d/0x7e
[] nfs4_kill_super+0x60/0xb6 [nfs]
[] deactivate_locked_super+0x34/0x83
[] deactivate_super+0x6f/0x7b
[] mntput_no_expire+0x18d/0x199
[] mntput+0x3b/0x44
[] release_mounts+0xa2/0xbf
[] sys_umount+0x47a/0x4ba
[] ? trace_hardirqs_on_caller+0x1fd/0x22f
[] system_call_fastpath+0x16/0x1b

as do_umount() is inlined. However, you can see release_mounts() in there.

Note also that it may be necessary to have multiple CPU cores to be able to
trigger this bug.

Tested-by: Jeff Layton
Tested-by: Ian Kent
Signed-off-by: David Howells
Signed-off-by: Al Viro

Al Viro
2011-06-16 23:28:16 +0800
50338b889 fix wrong iput on d_inode introduced by e6bc45d65d ... Browse Code »

Git bisection shows that commit e6bc45d65df8599fdbae73be9cec4ceed274db53 causes
BUG_ONs under high I/O load:

kernel BUG at fs/inode.c:1368!
[ 2862.501007] Call Trace:
[ 2862.501007] [] d_kill+0xf8/0x140
[ 2862.501007] [] dput+0xc9/0x190
[ 2862.501007] [] fput+0x15f/0x210
[ 2862.501007] [] filp_close+0x61/0x90
[ 2862.501007] [] sys_close+0xb1/0x110
[ 2862.501007] [] system_call_fastpath+0x16/0x1b

A reliable way to reproduce this bug is:
Login to KDE, run 'rsnapshot sync', and apt-get install openjdk-6-jdk,
and apt-get remove openjdk-6-jdk.

The buggy part of the patch is this:
struct inode *inode = NULL;
.....
- if (nd.last.name[nd.last.len])
- goto slashes;
inode = dentry->d_inode;
- if (inode)
- ihold(inode);
+ if (nd.last.name[nd.last.len] || !inode)
+ goto slashes;
+ ihold(inode)
...
if (inode)
iput(inode); /* truncate the inode here */

If nd.last.name[nd.last.len] is nonzero (and thus goto slashes branch is taken),
and dentry->d_inode is non-NULL, then this code now does an additional iput on
the inode, which is wrong.

Fix this by only setting the inode variable if nd.last.name[nd.last.len] is 0.

Reference: https://lkml.org/lkml/2011/6/15/50
Reported-by: Norbert Preining
Reported-by: Török Edwin
Cc: "Theodore Ts'o"
Cc: Al Viro
Signed-off-by: Török Edwin
Signed-off-by: Al Viro

Török Edwin
2011-06-16 23:27:39 +0800

07 Jun, 2011

1 commit

e6bc45d65 vfs: make unlink() and rmdir() return ENOENT in preference to EROFS ... Browse Code »

If user space attempts to remove a non-existent file or directory, and
the file system is mounted read-only, return ENOENT instead of EROFS.
Either error code is arguably valid/correct, but ENOENT is a more
specific error message.

Reported-by: Michael Tokarev
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Al Viro

Theodore Ts'o
2011-06-07 20:51:14 +0800

30 May, 2011

1 commit

3cebde241 vfs: shrink_dcache_parent before rmdir, dir rename ... Browse Code »

The dentry_unhash push-down series missed that shink_dcache_parent needs to
be called prior to rmdir or dir rename to clear DCACHE_REFERENCED and
allow efficient dentry reclaim.

Reported-by: Dave Chinner
Signed-off-by: Sage Weil
Signed-off-by: Al Viro

Sage Weil
2011-05-30 13:48:27 +0800

27 May, 2011

4 commits

d6e9bd256 Lift the check for automount points into do_lookup() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2011-05-27 19:03:15 +0800
dea393761 Trim excessive arguments of follow_mount_rcu() ... Browse Code »

... and kill a useless local variable in follow_dotdot_rcu(), while
we are at it - follow_mount_rcu(nd, path, inode) *always* assigned
value to *inode, and always it had been path->dentry->d_inode (aka
nd->path.dentry->d_inode, since it always got &nd->path as the second
argument).

Signed-off-by: Al Viro

Al Viro
2011-05-27 19:01:49 +0800
287548e46 split __follow_mount_rcu() into normal and .. cases ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2011-05-27 18:51:56 +0800
32e51f141 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (25 commits)
cifs: remove unnecessary dentry_unhash on rmdir/rename_dir
ocfs2: remove unnecessary dentry_unhash on rmdir/rename_dir
exofs: remove unnecessary dentry_unhash on rmdir/rename_dir
nfs: remove unnecessary dentry_unhash on rmdir/rename_dir
ext2: remove unnecessary dentry_unhash on rmdir/rename_dir
ext3: remove unnecessary dentry_unhash on rmdir/rename_dir
ext4: remove unnecessary dentry_unhash on rmdir/rename_dir
btrfs: remove unnecessary dentry_unhash in rmdir/rename_dir
ceph: remove unnecessary dentry_unhash calls
vfs: clean up vfs_rename_other
vfs: clean up vfs_rename_dir
vfs: clean up vfs_rmdir
vfs: fix vfs_rename_dir for FS_RENAME_DOES_D_MOVE filesystems
libfs: drop unneeded dentry_unhash
vfs: update dentry_unhash() comment
vfs: push dentry_unhash on rename_dir into file systems
vfs: push dentry_unhash on rmdir into file systems
vfs: remove dget() from dentry_unhash()
vfs: dentry_unhash immediately prior to rmdir
vfs: Block mmapped writes while the fs is frozen
...

Linus Torvalds
2011-05-27 00:52:14 +0800

26 May, 2011

11 commits

51892bbb5 vfs: clean up vfs_rename_other ... Browse Code »

Simplify control flow to match vfs_rename_dir.

Signed-off-by: Sage Weil
Signed-off-by: Al Viro

Sage Weil
2011-05-26 19:26:53 +0800
9055cba71 vfs: clean up vfs_rename_dir ... Browse Code »

Simplify control flow through vfs_rename_dir.

Signed-off-by: Sage Weil
Signed-off-by: Al Viro

Sage Weil
2011-05-26 19:26:52 +0800
912dbc15d vfs: clean up vfs_rmdir ... Browse Code »

Simplify the control flow with an out label.

Signed-off-by: Sage Weil
Signed-off-by: Al Viro

Sage Weil
2011-05-26 19:26:51 +0800
b5afd2c40 vfs: fix vfs_rename_dir for FS_RENAME_DOES_D_MOVE filesystems ... Browse Code »

vfs_rename_dir() doesn't properly account for filesystems with
FS_RENAME_DOES_D_MOVE. If new_dentry has a target inode attached, it
unhashes the new_dentry prior to the rename() iop and rehashes it after,
but doesn't account for the possibility that rename() may have swapped
{old,new}_dentry. For FS_RENAME_DOES_D_MOVE filesystems, it rehashes
new_dentry (now the old renamed-from name, which d_move() expected to go
away), such that a subsequent lookup will find it. Currently all
FS_RENAME_DOES_D_MOVE filesystems compensate for this by failing in
d_revalidate.

The bug was introduced by: commit 349457ccf2592c14bdf13b6706170ae2e94931b1
"[PATCH] Allow file systems to manually d_move() inside of ->rename()"

Fix by not rehashing the new dentry. Rehashing used to be needed by
d_move() but isn't anymore.

Reported-by: Sage Weil
Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2011-05-26 19:26:50 +0800
a71905f0d vfs: update dentry_unhash() comment ... Browse Code »

The helper is now only called by file systems, not the VFS.

Signed-off-by: Sage Weil
Signed-off-by: Al Viro

Sage Weil
2011-05-26 19:26:49 +0800
e4eaac06b vfs: push dentry_unhash on rename_dir into file systems ... Browse Code »

Only a few file systems need this. Start by pushing it down into each
rename method (except gfs2 and xfs) so that it can be dealt with on a
per-fs basis.

Acked-by: Christoph Hellwig
Signed-off-by: Sage Weil
Signed-off-by: Al Viro

Sage Weil
2011-05-26 19:26:48 +0800
79bf7c732 vfs: push dentry_unhash on rmdir into file systems ... Browse Code »

Only a few file systems need this. Start by pushing it down into each
fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
basis.

This does not change behavior for any in-tree file systems.

Acked-by: Christoph Hellwig
Signed-off-by: Sage Weil
Signed-off-by: Al Viro

Sage Weil
2011-05-26 19:26:47 +0800
64252c75a vfs: remove dget() from dentry_unhash() ... Browse Code »

This serves no useful purpose that I can discern. All callers (rename,
rmdir) hold their own reference to the dentry.

A quick audit of all file systems showed no relevant checks on the value
of d_count in vfs_rmdir/vfs_rename_dir paths.

Signed-off-by: Sage Weil
Signed-off-by: Al Viro

Sage Weil
2011-05-26 19:26:46 +0800
48293699a vfs: dentry_unhash immediately prior to rmdir ... Browse Code »

This presumes that there is no reason to unhash a dentry if we fail because
it is a mountpoint or the LSM check fails, and that the LSM checks do not
depend on the dentry being unhashed.

Signed-off-by: Sage Weil
Signed-off-by: Al Viro

Sage Weil
2011-05-26 19:26:46 +0800
9f1fafee9 merge handle_reval_dot and nameidata_drop_rcu_last ... Browse Code »

new helper: complete_walk(). Done on successful completion
of walk, drops out of RCU mode, does d_revalidate of final
result if that hadn't been done already.

handle_reval_dot() and nameidata_drop_rcu_last() subsumed into
that one; callers converted to use of complete_walk().

Signed-off-by: Al Viro

Al Viro
2011-05-26 19:26:32 +0800
19660af73 consolidate nameidata_..._drop_rcu() ... Browse Code »

Merge these into a single function (unlazy_walk(nd, dentry)),
kill ..._maybe variants

Signed-off-by: Al Viro

Al Viro
2011-05-26 19:26:02 +0800

21 May, 2011

1 commit

1a4022f88 VFS: move BUG_ON test for symlink nd->depth after current->link_count test ... Browse Code »

This solves a serious VFS-level bug in nested_symlink (which was
rewritten from do_follow_link), and follows the order of depth tests
that existed before.

The bug triggers a BUG_ON in fs/namei.c:1381, when running racer with
symlink and rename ops.

Signed-off-by: Erez Zadok
Acked-by: Miklos Szeredi
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Erez Zadok
2011-05-21 15:12:16 +0800

14 May, 2011

1 commit

26cf46be9 vfs: micro-optimize acl_permission_check() ... Browse Code »

It's a hot function, and we're better off not mixing types in the mask
calculations. The compiler just ends up mixing 16-bit and 32-bit
operations, for no good reason.

So do everything in 'unsigned int' rather than mixing 'unsigned int'
masking with a 'umode_t' (16-bit) mode variable.

This, together with the parent commit (47a150edc2ae: "Cache user_ns in
struct cred") makes acl_permission_check() much nicer.

Signed-off-by: Linus Torvalds

Linus Torvalds
2011-05-14 02:51:01 +0800

16 Apr, 2011

1 commit

c1530019e vfs: Fix absolute RCU path walk failures due to uninitialized seq number ... Browse Code »

During RCU walk in path_lookupat and path_openat, the rcu lookup
frequently failed if looking up an absolute path, because when root
directory was looked up, seq number was not properly set in nameidata.

We dropped out of RCU walk in nameidata_drop_rcu due to mismatch in
directory entry's seq number. We reverted to slow path walk that need
to take references.

With the following patch, I saw a 50% increase in an exim mail server
benchmark throughput on a 4-socket Nehalem-EX system.

Signed-off-by: Tim Chen
Reviewed-by: Andi Kleen
Cc: stable@kernel.org (v2.6.38)
Signed-off-by: Linus Torvalds

Tim Chen
2011-04-16 06:28:12 +0800

31 Mar, 2011

1 commit

25985edce Fix common misspellings ... Browse Code »

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi

Lucas De Marchi
2011-03-31 22:26:23 +0800

25 Mar, 2011

1 commit

62a7375e5 vfs - check non-mountpoint dentry might block in __follow_mount_rcu() ... Browse Code »

When following a mount in rcu-walk mode we must check if the incoming dentry
is telling us it may need to block, even if it isn't actually a mountpoint.

Signed-off-by: Ian Kent
Signed-off-by: Al Viro

Ian Kent
2011-03-25 02:24:32 +0800

24 Mar, 2011

3 commits

b81a618dc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
deal with races in /proc/*/{syscall,stack,personality}
proc: enable writing to /proc/pid/mem
proc: make check_mem_permission() return an mm_struct on success
proc: hold cred_guard_mutex in check_mem_permission()
proc: disable mem_write after exec
mm: implement access_remote_vm
mm: factor out main logic of access_process_vm
mm: use mm_struct to resolve gate vma's in __get_user_pages
mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm
mm: arch: make in_gate_area take an mm_struct instead of a task_struct
mm: arch: make get_gate_vma take an mm_struct instead of a task_struct
x86: mark associated mm when running a task in 32 bit compatibility mode
x86: add context tag to mark mm when running a task in 32-bit compatibility mode
auxv: require the target to be tracable (or yourself)
close race in /proc/*/environ
report errors in /proc/*/*map* sanely
pagemap: close races with suid execve
make sessionid permissions in /proc/*/task/* match those in /proc/*
fix leaks in path_lookupat()

Fix up trivial conflicts in fs/proc/base.c

Linus Torvalds
2011-03-24 11:51:42 +0800
2e1496707 userns: rename is_owner_or_cap to inode_owner_or_capable ... Browse Code »

And give it a kernel-doc comment.

[akpm@linux-foundation.org: btrfs changed in linux-next]
Signed-off-by: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2011-03-24 10:47:13 +0800
e795b7179 userns: userns: check user namespace for task->file uid equivalence checks ... Browse Code »

Cheat for now and say all files belong to init_user_ns. Next step will be
to let superblocks belong to a user_ns, and derive inode_userns(inode)
from inode->i_sb->s_user_ns. Finally we'll introduce more flexible
arrangements.

Changelog:
Feb 15: make is_owner_or_cap take const struct inode
Feb 23: make is_owner_or_cap bool

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Serge E. Hallyn
Acked-by: "Eric W. Biederman"
Acked-by: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2011-03-24 10:47:08 +0800

23 Mar, 2011

1 commit

bd23a539d fix leaks in path_lookupat() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2011-03-23 21:56:55 +0800

18 Mar, 2011

2 commits

1aed3e420 lose 'mounting_here' argument in ->d_manage() ... Browse Code »

it's always false...

Signed-off-by: Al Viro

Al Viro
2011-03-18 22:01:59 +0800
7cc90cc3f don't pass 'mounting_here' flag to follow_down() ... Browse Code »

it's always false now

Signed-off-by: Al Viro

Al Viro
2011-03-18 21:04:20 +0800

16 Mar, 2011

8 commits

0e794589e fix follow_link() breakage ... Browse Code »

commit 574197e0de46a8a4db5c54ef7b65e43ffa8873a7 had a missing
piece, breaking the loop detection ;-/

Signed-off-by: Al Viro

Al Viro
2011-03-16 16:57:03 +0800
574197e0d tidy the trailing symlinks traversal up ... Browse Code »

* pull the handling of current->total_link_count into
__do_follow_link()
* put the common "do ->put_link() if needed and path_put() the link"
stuff into a helper (put_link(nd, link, cookie))
* rename __do_follow_link() to follow_link(), while we are at it

Signed-off-by: Al Viro

Al Viro
2011-03-16 05:16:25 +0800
b356379a0 Turn resolution of trailing symlinks iterative everywhere ... Browse Code »

The last remaining place (resolution of nested symlink) converted
to the loop of the same kind we have in path_lookupat() and
path_openat().

Note that we still *do* have a recursion in pathname resolution;
can't avoid it, really. However, it's strictly for nested symlinks
now - i.e. ones in the middle of a pathname.

link_path_walk() has lost the tail now - it always walks everything
except the last component.

do_follow_link() renamed to nested_symlink() and moved down.

Signed-off-by: Al Viro

Al Viro
2011-03-16 05:16:25 +0800
ce0525449 simplify link_path_walk() tail ... Browse Code »

Now that link_path_walk() is called without LOOKUP_PARENT
only from do_follow_link(), we can simplify the checks in
last component handling. First of all, checking if we'd
arrived to a directory is not needed - the caller will check
it anyway. And LOOKUP_FOLLOW is guaranteed to be there,
since we only get to that place with nd->depth > 0.

Signed-off-by: Al Viro

Al Viro
2011-03-16 05:16:25 +0800
bd92d7fed Make trailing symlink resolution in path_lookupat() iterative ... Browse Code »

Now the only caller of link_path_walk() that does *not* pass
LOOKUP_PARENT is do_follow_link()

Signed-off-by: Al Viro

Al Viro
2011-03-16 05:16:25 +0800
b21041d0f update nd->inode in __do_follow_link() instead of after do_follow_link() ... Browse Code »

... and note that we only need to do it for LAST_BIND symlinks

Signed-off-by: Al Viro

Al Viro
2011-03-16 05:16:25 +0800
ce57dfc17 pull handling of one pathname component into a helper ... Browse Code »

new helper: walk_component(). Handles everything except symlinks;
returns negative on error, 0 on success and 1 on symlinks we decided
to follow. Drops out of RCU mode on such symlinks.

link_path_walk() and do_last() switched to using that.

Signed-off-by: Al Viro

Al Viro
2011-03-16 05:16:20 +0800
11a7b371b fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH ... Browse Code »

We don't want to allow creation of private hardlinks by different application
using the fd passed to them via SCM_RIGHTS. So limit the null relative name
usage in linkat syscall to CAP_DAC_READ_SEARCH

Signed-off-by: Aneesh Kumar K.V

Aneesh Kumar K.V
2011-03-16 05:16:05 +0800