Doug / smarc-fsl-linux-kernel | Embedian Git Server

22 Aug, 2012

1 commit

55852635a fs: fix fs/namei.c kernel-doc warnings ... Browse Code »

Fix kernel-doc warnings in fs/namei.c:

Warning(fs/namei.c:360): No description found for parameter 'inode'
Warning(fs/namei.c:672): No description found for parameter 'nd'

Signed-off-by: Randy Dunlap
Cc: Alexander Viro
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Al Viro

Randy Dunlap
2012-08-22 22:30:10 +0800

17 Aug, 2012

1 commit

62b2ce964 vfs: fix propagation of atomic_open create error on negative dentry ... Browse Code »

If ->atomic_open() returns -ENOENT, we take care to return the create
error (e.g., EACCES), if any. Do the same when ->atomic_open() returns 1
and provides a negative dentry.

This fixes a regression where an unprivileged open O_CREAT fails with
ENOENT instead of EACCES, introduced with the new atomic_open code. It
is tested by the open/08.t test in the pjd posix test suite, and was
observed on top of fuse (backed by ceph-fuse).

Signed-off-by: Sage Weil
Signed-off-by: Miklos Szeredi

Sage Weil
2012-08-17 01:29:09 +0800

15 Aug, 2012

2 commits

38227f78a vfs: pass right create mode to may_o_create() ... Browse Code »

Pass the umask-ed create mode to may_o_create() instead of the original one.

Signed-off-by: Miklos Szeredi
Tested-by: Richard W.M. Jones

Miklos Szeredi
2012-08-15 19:01:24 +0800
62b259d8b vfs: atomic_open(): fix create mode usage ... Browse Code »

Don't mask S_ISREG off the create mode before passing to ->atomic_open(). Other
methods (->create, ->mknod) also get the complete file mode and filesystems
expect it.

Reported-by: Steve
Reported-by: Richard W.M. Jones
Signed-off-by: Miklos Szeredi
Tested-by: Richard W.M. Jones

Miklos Szeredi
2012-08-15 19:01:24 +0800

31 Jul, 2012

2 commits

c30dabfe5 fs: Push mnt_want_write() outside of i_mutex ... Browse Code »

Currently, mnt_want_write() is sometimes called with i_mutex held and sometimes
without it. This isn't really a problem because mnt_want_write() is a
non-blocking operation (essentially has a trylock semantics) but when the
function starts to handle also frozen filesystems, it will get a full lock
semantics and thus proper lock ordering has to be established. So move
all mnt_want_write() calls outside of i_mutex.

One non-trivial case needing conversion is kern_path_create() /
user_path_create() which didn't include mnt_want_write() but now needs to
because it acquires i_mutex. Because there are virtual file systems which
don't bother with freeze / remount-ro protection we actually provide both
versions of the function - one which calls mnt_want_write() and one which does
not.

[AV: scratch the previous, mnt_want_write() has been moved to kern_path_create()
by now]

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2012-07-31 05:02:49 +0800
64894cf84 simplify lookup_open()/atomic_open() - do the temporary mnt_want_write() early ... Browse Code »

The write ref to vfsmount taken in lookup_open()/atomic_open() is going to
be dropped; we take the one to stay in dentry_open(). Just grab the temporary
in caller if it looks like we are going to need it (create/truncate/writable open)
and pass (by value) "has it succeeded" flag. Instead of doing mnt_want_write()
inside, check that flag and treat "false" as "mnt_want_write() has just failed".
mnt_want_write() is cheap and the things get considerably simpler and more robust
that way - we get it and drop it in the same function, to start with, rather
than passing a "has something in the guts of really scary functions taken it"
back to caller.

Signed-off-by: Al Viro

Al Viro
2012-07-31 04:53:35 +0800

30 Jul, 2012

7 commits

f8310c592 fix O_EXCL handling for devices ... Browse Code »

O_EXCL without O_CREAT has different semantics; it's "fail if already opened",
not "fail if already exists". commit 71574865 broke that...

Signed-off-by: Al Viro

Al Viro
2012-07-30 15:50:30 +0800
a51d9eaa4 fs: add link restriction audit reporting ... Browse Code »

Adds audit messages for unexpected link restriction violations so that
system owners will have some sort of potentially actionable information
about misbehaving processes.

Signed-off-by: Kees Cook
Signed-off-by: Al Viro

Kees Cook
2012-07-30 01:43:08 +0800
800179c9b fs: add link restrictions ... Browse Code »

This adds symlink and hardlink restrictions to the Linux VFS.

Symlinks:

A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp

The solution is to permit symlinks to only be followed when outside
a sticky world-writable directory, or when the uid of the symlink and
follower match, or when the directory owner matches the symlink's owner.

Some pointers to the history of earlier discussion that I could find:

1996 Aug, Zygo Blaxell
http://marc.info/?l=bugtraq&m=87602167419830&w=2
1996 Oct, Andrew Tridgell
http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
1997 Dec, Albert D Cahalan
http://lkml.org/lkml/1997/12/16/4
2005 Feb, Lorenzo Hernández García-Hierro
http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
2010 May, Kees Cook
https://lkml.org/lkml/2010/5/30/144

Past objections and rebuttals could be summarized as:

- Violates POSIX.
- POSIX didn't consider this situation and it's not useful to follow
a broken specification at the cost of security.
- Might break unknown applications that use this feature.
- Applications that break because of the change are easy to spot and
fix. Applications that are vulnerable to symlink ToCToU by not having
the change aren't. Additionally, no applications have yet been found
that rely on this behavior.
- Applications should just use mkstemp() or O_CREATE|O_EXCL.
- True, but applications are not perfect, and new software is written
all the time that makes these mistakes; blocking this flaw at the
kernel is a single solution to the entire class of vulnerability.
- This should live in the core VFS.
- This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
- This should live in an LSM.
- This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)

Hardlinks:

On systems that have user-writable directories on the same partition
as system files, a long-standing class of security issues is the
hardlink-based time-of-check-time-of-use race, most commonly seen in
world-writable directories like /tmp. The common method of exploitation
of this flaw is to cross privilege boundaries when following a given
hardlink (i.e. a root process follows a hardlink created by another
user). Additionally, an issue exists where users can "pin" a potentially
vulnerable setuid/setgid file so that an administrator will not actually
upgrade a system fully.

The solution is to permit hardlinks to only be created when the user is
already the existing file's owner, or if they already have read/write
access to the existing file.

Many Linux users are surprised when they learn they can link to files
they have no access to, so this change appears to follow the doctrine
of "least surprise". Additionally, this change does not violate POSIX,
which states "the implementation may require that the calling process
has permission to access the existing file"[1].

This change is known to break some implementations of the "at" daemon,
though the version used by Fedora and Ubuntu has been fixed[2] for
a while. Otherwise, the change has been undisruptive while in use in
Ubuntu for the last 1.5 years.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279

This patch is based on the patches in Openwall and grsecurity, along with
suggestions from Al Viro. I have added a sysctl to enable the protected
behavior, and documentation.

Signed-off-by: Kees Cook
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Kees Cook
2012-07-30 01:37:58 +0800
3134f37e9 vfs: don't let do_last pass negative dentry to audit_inode ... Browse Code »

I can reliably reproduce the following panic by simply setting an audit
rule on a recent 3.5.0+ kernel:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: [] audit_copy_inode+0x10/0x90
PGD 7acd9067 PUD 7b8fb067 PMD 0
Oops: 0000 [#86] SMP
Modules linked in: nfs nfs_acl auth_rpcgss fscache lockd sunrpc tpm_bios btrfs zlib_deflate libcrc32c kvm_amd kvm joydev virtio_net pcspkr i2c_piix4 floppy virtio_balloon microcode virtio_blk cirrus drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
CPU 0
Pid: 1286, comm: abrt-dump-oops Tainted: G D 3.5.0+ #1 Bochs Bochs
RIP: 0010:[] [] audit_copy_inode+0x10/0x90
RSP: 0018:ffff88007aebfc38 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88003692d860 RCX: 00000000000038c4
RDX: 0000000000000000 RSI: ffff88006baf5d80 RDI: ffff88003692d860
RBP: ffff88007aebfc68 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: ffff880036d30f00 R14: ffff88006baf5d80 R15: ffff88003692d800
FS: 00007f7562634740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 000000003643d000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process abrt-dump-oops (pid: 1286, threadinfo ffff88007aebe000, task ffff880079614530)
Stack:
ffff88007aebfdf8 ffff88007aebff28 ffff88007aebfc98 ffffffff81211358
ffff88003692d860 0000000000000000 ffff88007aebfcc8 ffffffff810d4968
ffff88007aebfcc8 ffff8800000038c4 0000000000000000 0000000000000000
Call Trace:
[] ? ext4_lookup+0xe8/0x160
[] __audit_inode+0x118/0x2d0
[] do_last+0x999/0xe80
[] ? inode_permission+0x18/0x50
[] ? kmem_cache_alloc_trace+0x11a/0x130
[] path_openat+0xba/0x420
[] do_filp_open+0x41/0xa0
[] ? alloc_fd+0x4d/0x120
[] do_sys_open+0xed/0x1c0
[] ? __audit_syscall_entry+0xcc/0x300
[] sys_open+0x21/0x30
[] system_call_fastpath+0x16/0x1b
RSP
CR2: 0000000000000040

The problem is that do_last is passing a negative dentry to audit_inode.
The comments on lookup_open note that it can pass back a negative dentry
if O_CREAT is not set.

This patch fixes the oops, but I'm not clear on whether there's a better
approach.

Cc: Miklos Szeredi
Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-07-30 01:27:03 +0800
a8104a9fc pull mnt_want_write()/mnt_drop_write() into kern_path_create()/done_path_create() resp. ... Browse Code »

One side effect - attempt to create a cross-device link on a read-only fs fails
with EROFS instead of EXDEV now. Makes more sense, POSIX allows, etc.

Signed-off-by: Al Viro

Al Viro
2012-07-30 01:24:15 +0800
8e4bfca1d mknod: take sanity checks on mode into the very beginning ... Browse Code »

Note that applying umask can't affect their results. While
that affects errno in cases like
mknod("/no_such_directory/a", 030000)
yielding -EINVAL (due to impossible mode_t) instead of
-ENOENT (due to inexistent directory), IMO that makes a lot
more sense, POSIX allows to return either and any software
that relies on getting -ENOENT instead of -EINVAL in that
case deserves everything it gets.

Signed-off-by: Al Viro

Al Viro
2012-07-30 01:24:14 +0800
921a1650d new helper: done_path_create() ... Browse Code »

releases what needs to be released after {kern,user}_path_create()

Signed-off-by: Al Viro

Al Viro
2012-07-30 01:24:13 +0800

23 Jul, 2012

3 commits

32a7991b6 tidy up namei.c a bit ... Browse Code »

locking/unlocking for rcu walk taken to a couple of inline helpers

Signed-off-by: Al Viro

Al Viro
2012-07-23 04:00:55 +0800
3c0a61636 unobfuscate follow_up() a bit ... Browse Code »

really convoluted test in there has grown up during struct mount
introduction; what it checks is that we'd reached the root of
mount tree.

Al Viro
2012-07-23 04:00:45 +0800
1e0ea0014 use __lookup_hash() in kern_path_parent() ... Browse Code »

No need to bother with lookup_one_len() here - it's an overkill

Signed-off-by Al Viro

Al Viro
2012-07-23 03:57:53 +0800

14 Jul, 2012

24 commits

0bdaea901 VFS: Split inode_permission() ... Browse Code »

Split inode_permission() into inode- and superblock-dependent parts.

This is aimed at unionmounts where the superblock from the upper layer has to
be checked rather than the superblock from the lower layer as the upper layer
may be writable, thus allowing an unwritable file from the lower layer to be
copied up and modified.

Original-author: Valerie Aurora
Signed-off-by: David Howells (Further development)
Signed-off-by: Al Viro

David Howells
2012-07-14 20:38:36 +0800
f015f1267 VFS: Comment mount following code ... Browse Code »

Add comments describing what the directions "up" and "down" mean and ref count
handling to the VFS mount following family of functions.

Signed-off-by: Valerie Aurora (Original author)
Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2012-07-14 20:38:32 +0800
b5fb63c18 fs: add nd_jump_link ... Browse Code »

Add a helper that abstracts out the jump to an already parsed struct path
from ->follow_link operation from procfs. Not only does this clean up
the code by moving the two sides of this game into a single helper, but
it also prepares for making struct nameidata private to namei.c

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2012-07-14 20:35:40 +0800
408ef013c fs: move path_put on failure out of ->follow_link ... Browse Code »

Currently the non-nd_set_link based versions of ->follow_link are expected
to do a path_put(&nd->path) on failure. This calling convention is unexpected,
undocumented and doesn't match what the nd_set_link-based instances do.

Move the path_put out of the only non-nd_set_link based ->follow_link
instance into the caller.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2012-07-14 20:35:35 +0800
79714f72d get rid of kern_path_parent() ... Browse Code »

all callers want the same thing, actually - a kinda-sorta analog of
kern_path_create(). I.e. they want parent vfsmount/dentry (with
->i_mutex held, to make sure the child dentry is still their child)
+ the child dentry.

Signed-off-by Al Viro

Al Viro
2012-07-14 20:35:02 +0800
1acf0af9b VFS: Fix the banner comment on lookup_open() ... Browse Code »

Since commit 197e37d9, the banner comment on lookup_open() no longer matches
what the function returns. It used to return a struct file pointer or NULL and
now it returns an integer and is passed the struct file pointer it is to use
amongst its arguments. Update the comment to reflect this.

Also add a banner comment to atomic_open().

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2012-07-14 20:34:57 +0800
312b63fba don't pass nameidata * to vfs_create() ... Browse Code »

all we want is a boolean flag, same as the method gets now

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:34:50 +0800
ebfc3b49a don't pass nameidata to ->create() ... Browse Code »

boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:34:47 +0800
72bd866a0 fs/namei.c: don't pass nameidata to __lookup_hash() and lookup_real() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:34:40 +0800
00cd8dd3b stop passing nameidata to ->lookup() ... Browse Code »

Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument. And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:34:32 +0800
201f956e4 fs/namei.c: don't pass namedata to lookup_dcache() ... Browse Code »

just the flags...

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:34:25 +0800
4ce16ef3f fs/namei.c: don't pass nameidata to d_revalidate() ... Browse Code »

since the method wrapped by it doesn't need that anymore...

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:34:21 +0800
0b728e191 stop passing nameidata * to ->d_revalidate() ... Browse Code »

Just the lookup flags. Die, bastard, die...

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:34:14 +0800
2675a4eb6 fs/namei.c: get do_last() and friends return int ... Browse Code »

Same conventions as for ->atomic_open(). Trimmed the
forest of labels a bit, while we are at it...

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:33:43 +0800
30d904947 kill struct opendata ... Browse Code »

Just pass struct file *. Methods are happier that way...
There's no need to return struct file * from finish_open() now,
so let it return int. Next: saner prototypes for parts in
namei.c

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:33:39 +0800
a4a3bdd77 kill opendata->{mnt,dentry} ... Browse Code »

->filp->f_path is there for purpose...

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:33:37 +0800
d95852777 make ->atomic_open() return int ... Browse Code »

Change of calling conventions:
old new
NULL 1
file 0
ERR_PTR(-ve) -ve

Caller *knows* that struct file *; no need to return it.

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:33:35 +0800
3d8a00d20 don't modify od->filp at all ... Browse Code »

make put_filp() conditional on flag set by finish_open()

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:33:33 +0800
47237687d ->atomic_open() prototype change - pass int * instead of bool * ... Browse Code »

... and let finish_open() report having opened the file via that sucker.
Next step: don't modify od->filp at all.

[AV: FILE_CREATE was already used by cifs; Miklos' fix folded]

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:33:31 +0800
a8277b9ba vfs: move O_DIRECT check to common code ... Browse Code »

Perform open_check_o_direct() in a common place in do_last after opening the
file.

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2012-07-14 20:33:28 +0800
f60dc3db6 vfs: do_last(): clean up retry ... Browse Code »

Move the lookup retry logic to the bottom of the function to make the normal
case simpler to read.

Reported-by: David Howells
Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2012-07-14 20:33:26 +0800
77d660a8a vfs: do_last(): clean up bool ... Browse Code »

Consistently use bool for boolean values in do_last().

Reported-by: David Howells
Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2012-07-14 20:33:24 +0800
e83db1672 vfs: do_last(): clean up labels ... Browse Code »

Reported-by: David Howells
Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2012-07-14 20:33:21 +0800
aa4caadb7 vfs: do_last(): clean up error handling ... Browse Code »

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2012-07-14 20:33:20 +0800