Doug / smarc-fsl-linux-kernel | Embedian Git Server

15 Aug, 2012

1 commit

e68726ff7 vfs: canonicalize create mode in build_open_flags()

Userspace can pass weird create mode in open(2) that we canonicalize to
"(mode & S_IALLUGO) | S_IFREG" in vfs_create().

The problem is that we use the uncanonicalized mode before calling vfs_create()
with unforeseen consequences.

So do the canonicalization early in build_open_flags().

Signed-off-by: Miklos Szeredi
Tested-by: Richard W.M. Jones
CC: stable@vger.kernel.org

Miklos Szeredi
2012-08-15 19:01:24 +0800

04 Aug, 2012

1 commit

fe7c80518 missed mnt_drop_write() in do_dentry_open()

This one ought to be __mnt_drop_write(), to match __mnt_want_write()
in the beginning...

Signed-off-by: Al Viro

Al Viro
2012-08-04 16:15:41 +0800

31 Jul, 2012

2 commits

14da92001 fs: Protect write paths by sb_start_write - sb_end_write

There are several entry points which dirty pages in a filesystem. mmap
(handled by block_page_mkwrite()), buffered write (handled by
__generic_file_aio_write()), splice write (generic_file_splice_write),
truncate, and fallocate (these can dirty last partial page - handled inside
each filesystem separately). Protect these places with sb_start_write() and
sb_end_write().

->page_mkwrite() calls are particularly complex since they are called with
mmap_sem held and thus we cannot use standard sb_start_write() due to lock
ordering constraints. We solve the problem by using a special freeze protection
sb_start_pagefault() which ranks below mmap_sem.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa
Tested-by: Peter M. Petrakis
Tested-by: Dann Frazier
Tested-by: Massimo Morana
Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2012-07-31 13:45:47 +0800
eb04c2828 fs: Add freezing handling to mnt_want_write() / mnt_drop_write()

Most of the places where we want freeze protection coincide with the places where
we also have remount-ro protection. So make mnt_want_write() and
mnt_drop_write() (and their _file alternative) prevent freezing as well.
For the few cases that are really interested only in remount-ro protection
provide new function variants.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa
Tested-by: Peter M. Petrakis
Tested-by: Dann Frazier
Tested-by: Massimo Morana
Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2012-07-31 13:40:38 +0800

30 Jul, 2012

1 commit

b5bcdda32 take grabbing f->f_path to do_dentry_open()

Signed-off-by: Al Viro

Al Viro
2012-07-30 01:24:18 +0800

23 Jul, 2012

1 commit

765927b2d switch dentry_open() to struct path, make it grab references itself

Signed-off-by: Al Viro

Al Viro
2012-07-23 04:01:29 +0800

14 Jul, 2012

12 commits

55e4def0a VFS: Make chown() and lchown() call fchownat()

Make the chown() and lchown() syscalls jump to the fchownat() syscall with the
appropriate extra arguments.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2012-07-14 20:35:54 +0800
c3c4f6942 do_dentry_open(): close the race with mark_files_ro() in failure exit

we want to take it out of mark_files_ro() reach *before* we start
checking if we ought to drop write access.

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:35:50 +0800
02e5180d9 do_dentry_open(): take initialization of file->f_path to caller

... and get rid of a couple of arguments and a pointless reassignment
in finish_open() case.

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:33:54 +0800
2a027e7a1 fold __dentry_open() into its sole caller

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:33:52 +0800
96b7e579a switch do_dentry_open() to returning int

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:33:49 +0800
e45198a6a make finish_no_open() return int

namely, 1 ;-) That's what we want to return from ->atomic_open()
instances after finish_no_open().

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:33:45 +0800
30d904947 kill struct opendata

Just pass struct file *. Methods are happier that way...
There's no need to return struct file * from finish_open() now,
so let it return int. Next: saner prototypes for parts in
namei.c

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:33:39 +0800
a4a3bdd77 kill opendata->{mnt,dentry}

->filp->f_path is there for purpose...

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:33:37 +0800
3d8a00d20 don't modify od->filp at all

make put_filp() conditional on flag set by finish_open()

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:33:33 +0800
47237687d ->atomic_open() prototype change - pass int * instead of bool *

... and let finish_open() report having opened the file via that sucker.
Next step: don't modify od->filp at all.

[AV: FILE_CREATE was already used by cifs; Miklos' fix folded]

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:33:31 +0800
015c3bbcd vfs: remove open intents from nameidata

All users of open intents have been converted to use ->atomic_{open,create}.

This patch gets rid of nd->intent.open and related infrastructure.

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2012-07-14 20:33:18 +0800
d18e9008c vfs: add i_op->atomic_open()

Add a new inode operation which is called on the last component of an open.
Using this the filesystem can look up, possibly create and open the file in one
atomic operation. If it cannot perform this (e.g. the file type turned out to
be wrong) it may signal this by returning NULL instead of an open struct file
pointer.

i_op->atomic_open() is only called if the last component is negative or needs
lookup. Handling cached positive dentries here doesn't add much value: these
can be opened using f_op->open(). If the cached file turns out to be invalid,
the open can be retried, this time using ->atomic_open() with a fresh dentry.

For now leave the old way of using open intents in lookup and revalidate in
place. This will be removed once all the users are converted.

David Howells noticed that if ->atomic_open() opens the file but does not create
it, handle_truncate() will be called on it even if it is not a regular file.
Fix this by checking the file type in this case too.

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2012-07-14 20:33:04 +0800

08 Jul, 2012

1 commit

332a2e124 vfs: make O_PATH file descriptors usable for 'fchdir()'

We already use them for openat() and friends, but fchdir() also wants to
be able to use O_PATH file descriptors. This should make it comparable
to the O_SEARCH of Solaris. In particular, O_PATH allows you to access
(not-quite-open) a directory you don't have read permission to, only
execute permission.

Noticed during development of multithread support for ksh93.

Reported-by: ольга крыжановская
Cc: Al Viro
Cc: stable@kernel.org # O_PATH introduced in 3.0+
Signed-off-by: Linus Torvalds

Linus Torvalds
2012-07-08 08:19:02 +0800
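
A minimal userspace sketch of what 332a2e124 enables: changing directory through a descriptor opened with O_PATH. It assumes a Linux kernel that includes this change and glibc headers that expose O_PATH under _GNU_SOURCE. chdir_via_o_path() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success, -1 with errno set on failure. */
int chdir_via_o_path(const char *dir)
{
    /* O_PATH resolves the name without opening the directory for reading. */
    int fd = open(dir, O_PATH | O_DIRECTORY);
    if (fd < 0)
        return -1;

    int rc = fchdir(fd);
    int saved_errno = errno;   /* keep fchdir()'s error across close() */
    close(fd);
    errno = saved_errno;
    return rc;
}
```

Because the directory is never opened for reading, this path only needs execute (search) permission on the target, which is the case the commit message compares to O_SEARCH on Solaris.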

02 Jun, 2012

4 commits

50ee93afc vfs: nameidata_to_filp(): don't throw away file on error

If open fails, don't put the file. This allows it to be reused if open needs to
be retried.

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2012-06-02 00:12:01 +0800
91daee988 vfs: nameidata_to_filp(): inline __dentry_open()

Copy __dentry_open() into nameidata_to_filp().

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2012-06-02 00:12:01 +0800
78f71eff3 vfs: do_dentry_open(): don't put filp

Move put_filp() out to __dentry_open(), the only caller now.

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2012-06-02 00:12:00 +0800
90ad1a8ec vfs: split __dentry_open()

Split __dentry_open() into two functions:

do_dentry_open() - does most of the actual work, doesn't put file on failure
open_check_o_direct() - after a successful open, checks direct_IO method

This will allow i_op->atomic_open to do just the file initialization and leave
the direct_IO checking to the VFS.

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2012-06-02 00:12:00 +0800

24 May, 2012

1 commit

644473e9c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull user namespace enhancements from Eric Biederman:
"This is a course correction for the user namespace, so that we can
reach an inexpensive, maintainable, and reasonably complete
implementation.

Highlights:
- Config guards make it impossible to enable the user namespace and
code that has not been converted to be user namespace safe.

- Use of the new kuid_t type ensures that if you somehow get past the
config guards the kernel will encounter type errors if you enable
user namespaces and attempt to compile in code whose permission
checks have not been updated to be user namespace safe.

- All uids from child user namespaces are mapped into the initial
user namespace before they are processed, removing the need to add
an additional check to see if the user namespace of the compared
uids remains the same.

- With the user namespaces compiled out the performance is as good or
better than it is today.

- For most operations absolutely nothing changes, in performance or
operationally, with the user namespace enabled.

- The worst case performance I could come up with was timing 1
billion cache cold stat operations with the user namespace code
enabled. This went from 156s to 164s on my laptop (or 156ns to
164ns per stat operation).

- (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
Most uid/gid setting system calls treat these values specially
anyway so attempting to use -1 as a uid would likely cause
entertaining failures in userspace.

- If setuid is called with a uid that can not be mapped setuid fails.
I have looked at sendmail, login, ssh and every other program I
could think of that would call setuid and they all check for and
handle the case where setuid fails.

- If stat or a similar system call is called from a context in which
we can not map a uid we lie and return overflowuid. The LFS
experience suggests not lying and returning an error code might be
better, but the historical precedent with uids is different and I
can not think of anything that would break by lying about a uid we
can't map.

- Capabilities are localized to the current user namespace making it
safe to give the initial user in a user namespace all capabilities.

My git tree covers all of the modifications needed to convert the core
kernel and enough changes to make a system bootable to runlevel 1."

Fix up trivial conflicts due to nearby independent changes in fs/stat.c

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
userns: Silence silly gcc warning.
cred: use correct cred accessor with regards to rcu read lock
userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
userns: Convert cgroup permission checks to use uid_eq
userns: Convert tmpfs to use kuid and kgid where appropriate
userns: Convert sysfs to use kgid/kuid where appropriate
userns: Convert sysctl permission checks to use kuid and kgids.
userns: Convert proc to use kuid/kgid where appropriate
userns: Convert ext4 to user kuid/kgid where appropriate
userns: Convert ext3 to use kuid/kgid where appropriate
userns: Convert ext2 to use kuid/kgid where appropriate.
userns: Convert devpts to use kuid/kgid where appropriate
userns: Convert binary formats to use kuid/kgid where appropriate
userns: Add negative depends on entries to avoid building code that is userns unsafe
userns: signal remove unnecessary map_cred_ns
userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
userns: Convert stat to return values mapped from kuids and kgids
userns: Convert user specfied uids and gids in chown into kuids and kgid
userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
...

Linus Torvalds
2012-05-24 08:42:39 +0800

03 May, 2012

2 commits

52137abe1 userns: Convert user specfied uids and gids in chown into kuids and kgid

Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-05-03 18:29:34 +0800
18815a180 userns: Convert capabilities related permsion checks

- Use uid_eq when comparing kuids
Use gid_eq when comparing kgids
- Use make_kuid(user_ns, 0) to talk about the user_namespace root uid

Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-05-03 18:28:40 +0800

10 Apr, 2012

1 commit

83d498569 SELinux: rename dentry_open to file_open

dentry_open takes a file, rename it to file_open

Signed-off-by: Eric Paris

Eric Paris
2012-04-10 00:22:50 +0800

20 Feb, 2012

1 commit

1dce27c5a Wrap accesses to the fd_sets in struct fdtable

Wrap accesses to the fd_sets in struct fdtable (for recording open files and
close-on-exec flags) so that we can move away from using fd_sets since we
abuse the fd_set structs by not allocating the full-sized structure under
normal circumstances and by non-core code looking at the internals of the
fd_sets.

The first abuse means that use of FD_ZERO() on these fd_sets is not permitted,
since that cannot be told about their abnormal lengths.

This introduces six wrapper functions for setting, clearing and testing
close-on-exec flags and fd-is-open flags:

void __set_close_on_exec(int fd, struct fdtable *fdt);
void __clear_close_on_exec(int fd, struct fdtable *fdt);
bool close_on_exec(int fd, const struct fdtable *fdt);
void __set_open_fd(int fd, struct fdtable *fdt);
void __clear_open_fd(int fd, struct fdtable *fdt);
bool fd_is_open(int fd, const struct fdtable *fdt);

Note that I've prepended '__' to the names of the set/clear functions because
they require the caller to hold a lock to use them.

Note also that I haven't added wrappers for looking behind the scenes at the
array. Possibly that should exist too.

Signed-off-by: David Howells
Link: http://lkml.kernel.org/r/20120216174942.23314.1364.stgit@warthog.procyon.org.uk
Signed-off-by: H. Peter Anvin
Cc: Al Viro

David Howells
2012-02-20 02:30:52 +0800
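
The idea behind the wrappers in 1dce27c5a can be sketched in userspace with plain bitmaps: callers go through small accessor functions instead of touching the bit arrays directly, so the underlying representation can change later. Everything below is an illustrative stand-in. struct fdtable_sketch, the sketch_ prefixed functions and the fixed 256-descriptor limit are assumptions and do not reproduce the kernel's struct fdtable or its locking rules.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_MAX_FDS 256
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_WORDS (SKETCH_MAX_FDS / SKETCH_BITS_PER_LONG)

/* Hypothetical stand-in for the two bitmaps kept in struct fdtable. */
struct fdtable_sketch {
    unsigned long open_fds[SKETCH_WORDS];
    unsigned long close_on_exec[SKETCH_WORDS];
};

/* Bit helpers; callers must keep 0 <= fd < SKETCH_MAX_FDS. */
static inline void sketch_set_bit(unsigned long *map, int fd)
{
    map[(unsigned)fd / SKETCH_BITS_PER_LONG] |= 1UL << ((unsigned)fd % SKETCH_BITS_PER_LONG);
}

static inline void sketch_clear_bit(unsigned long *map, int fd)
{
    map[(unsigned)fd / SKETCH_BITS_PER_LONG] &= ~(1UL << ((unsigned)fd % SKETCH_BITS_PER_LONG));
}

static inline bool sketch_test_bit(const unsigned long *map, int fd)
{
    return (map[(unsigned)fd / SKETCH_BITS_PER_LONG] >> ((unsigned)fd % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Six accessors shaped like the ones listed in the commit message. */
void sketch_set_close_on_exec(int fd, struct fdtable_sketch *fdt)
{
    sketch_set_bit(fdt->close_on_exec, fd);
}

void sketch_clear_close_on_exec(int fd, struct fdtable_sketch *fdt)
{
    sketch_clear_bit(fdt->close_on_exec, fd);
}

bool sketch_close_on_exec(int fd, const struct fdtable_sketch *fdt)
{
    return sketch_test_bit(fdt->close_on_exec, fd);
}

void sketch_set_open_fd(int fd, struct fdtable_sketch *fdt)
{
    sketch_set_bit(fdt->open_fds, fd);
}

void sketch_clear_open_fd(int fd, struct fdtable_sketch *fdt)
{
    sketch_clear_bit(fdt->open_fds, fd);
}

bool sketch_fd_is_open(int fd, const struct fdtable_sketch *fdt)
{
    return sketch_test_bit(fdt->open_fds, fd);
}
```

The sketch does no locking and no bounds checking; in the kernel the '__' prefixed variants require the caller to hold the appropriate lock, as the commit message notes.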

07 Jan, 2012

1 commit

cdcf116d4 switch security_path_chmod() to struct path *

Signed-off-by: Al Viro

Al Viro
2012-01-07 12:16:53 +0800

04 Jan, 2012

3 commits

a218d0fdc switch open and mkdir syscalls to umode_t

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:55:19 +0800
49f0a0767 switch sys_chmod()/sys_fchmod()/sys_fchmodat() to umode_t

SYSCALLx magic should take care of things, according to Linus...

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:55:12 +0800
2a79f17e4 vfs: mnt_drop_write_file()

new helper (wrapper around mnt_drop_write()) to be used in pair with
mnt_want_write_file().

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:52:40 +0800

28 Oct, 2011

1 commit

f3c7691e8 leases: fix write-open/read-lease race

In setlease, we use i_writecount to decide whether we can give out a
read lease.

In open, we break leases before incrementing i_writecount.

There is therefore a window between the break lease and the i_writecount
increment when setlease could add a new read lease.

This would leave us with a simultaneous write open and read lease, which
shouldn't happen.

Signed-off-by: J. Bruce Fields
Signed-off-by: Christoph Hellwig

J. Bruce Fields
2011-10-28 20:59:00 +0800

27 Jul, 2011

1 commit

e57712ebe merge fchmod() and fchmodat() guts, kill ancient broken kludge

The kludge in question is undocumented and doesn't work for 32bit
binaries on amd64, sparc64 and s390. Passing (mode_t)-1 as
mode has (since 0.99.14v and contrary to behaviour of any
other Unix, prescriptions of POSIX, SuS and our own manpages)
been a kinda-sorta no-op. Note that any software relying on
that (and looking for examples shows none) would be visibly
broken on sparc64, where practically all userland is built
32bit. No such complaints noticed...

Signed-off-by: Al Viro

Al Viro
2011-07-27 03:07:43 +0800

23 Jul, 2011

1 commit

5a9a43646 vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp

Replace unclear (struct dentry *) to (struct file *) typecast with ERR_CAST() macro.

Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Al Viro

Konstantin Khlebnikov
2011-07-23 07:42:13 +0800

21 Mar, 2011

1 commit

c212f9aaf fs: Use BUG_ON(!mnt) at dentry_open().

dentry_open() requires callers to pass a valid vfsmount.

Signed-off-by: Tetsuo Handa
Signed-off-by: Al Viro

Tetsuo Handa
2011-03-21 13:10:41 +0800

17 Mar, 2011

1 commit

0f6e0e844 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits)
AppArmor: kill unused macros in lsm.c
AppArmor: cleanup generated files correctly
KEYS: Add an iovec version of KEYCTL_INSTANTIATE
KEYS: Add a new keyctl op to reject a key with a specified error code
KEYS: Add a key type op to permit the key description to be vetted
KEYS: Add an RCU payload dereference macro
AppArmor: Cleanup make file to remove cruft and make it easier to read
SELinux: implement the new sb_remount LSM hook
LSM: Pass -o remount options to the LSM
SELinux: Compute SID for the newly created socket
SELinux: Socket retains creator role and MLS attribute
SELinux: Auto-generate security_is_socket_class
TOMOYO: Fix memory leak upon file open.
Revert "selinux: simplify ioctl checking"
selinux: drop unused packet flow permissions
selinux: Fix packet forwarding checks on postrouting
selinux: Fix wrong checks for selinux_policycap_netpeer
selinux: Fix check for xfrm selinux context algorithm
ima: remove unnecessary call to ima_must_measure
IMA: remove IMA imbalance checking
...

Linus Torvalds
2011-03-17 00:15:43 +0800

16 Mar, 2011

1 commit

a002951c9 Merge branch 'next' into for-linus

James Morris
2011-03-16 06:41:17 +0800

15 Mar, 2011

2 commits

65cfc6722 readlinkat(), fchownat() and fstatat() with empty relative pathnames

For readlinkat() we simply allow empty pathname; it will fail unless
we have dfd equal to O_PATH-opened symlink, so we are outside of
POSIX scope here. For fchownat() and fstatat() we allow AT_EMPTY_PATH;
let the caller explicitly ask for such behaviour.

Signed-off-by: Al Viro

Al Viro
2011-03-15 14:21:45 +0800
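
As an illustration of the AT_EMPTY_PATH behaviour that 65cfc6722 describes for fstatat(), the sketch below stats a descriptor by passing an empty relative pathname. It assumes Linux with glibc headers that expose O_PATH and AT_EMPTY_PATH under _GNU_SOURCE. is_directory_via_empty_path() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: 1 if path is a directory, 0 if not, -1 on error. */
int is_directory_via_empty_path(const char *path)
{
    struct stat st;

    int fd = open(path, O_PATH);
    if (fd < 0)
        return -1;

    /* Empty pathname plus AT_EMPTY_PATH: operate on fd itself. */
    int rc = fstatat(fd, "", &st, AT_EMPTY_PATH);
    close(fd);
    if (rc < 0)
        return -1;

    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

Without AT_EMPTY_PATH the same call would be rejected because of the empty pathname, which is why the commit makes the caller ask for this behaviour explicitly.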
1abf0c718 New kind of open files - "location only".

New flag for open(2) - O_PATH. Semantics:
* pathname is resolved, but the file itself is _NOT_ opened
as far as filesystem is concerned.
* almost all operations on the resulting descriptors shall
fail with -EBADF. Exceptions are:
1) operations on descriptors themselves (i.e.
close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD),
fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD),
fcntl(fd, F_SETFD, ...))
2) fcntl(fd, F_GETFL), for a common non-destructive way to
check if descriptor is open
3) "dfd" arguments of ...at(2) syscalls, i.e. the starting
points of pathname resolution
* closing such descriptor does *NOT* affect dnotify or
posix locks.
* permissions are checked as usual along the way to file;
no permission checks are applied to the file itself. Of course,
giving such thing to syscall will result in permission checks (at
the moment it means checking that starting point of ...at() is
a directory and caller has exec permissions on it).

fget() and fget_light() return NULL on such descriptors; use of
fget_raw() and fget_raw_light() is needed to get them. That protects
existing code from dealing with those things.

There are two things still missing (they come in the next commits):
one is handling of symlinks (right now we refuse to open them that
way; see the next commit for semantics related to those) and another
is descriptor passing via SCM_RIGHTS datagrams.

Signed-off-by: Al Viro

Al Viro
2011-03-15 14:21:45 +0800
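
Two of the O_PATH semantics listed in 1abf0c718 can be observed from userspace with a short sketch: ordinary I/O on such a descriptor fails with EBADF, while fcntl(fd, F_GETFL) still works. The helper names are hypothetical, and the code assumes Linux with glibc headers that expose O_PATH under _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to read one byte through an O_PATH descriptor.
 * Returns the errno from read(), 0 if read() unexpectedly succeeded,
 * or -1 if the open itself failed.
 */
int o_path_read_errno(const char *path)
{
    char byte;

    int fd = open(path, O_PATH);
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, &byte, 1);
    int err = (n < 0) ? errno : 0;
    close(fd);
    return err;
}

/*
 * Hypothetical helper: 1 if F_GETFL succeeds and reports O_PATH,
 * 0 if it succeeds without O_PATH, -1 on any failure.
 */
int o_path_getfl_has_flag(const char *path)
{
    int fd = open(path, O_PATH);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL);
    close(fd);
    if (flags < 0)
        return -1;

    return (flags & O_PATH) ? 1 : 0;
}
```

The first helper exercises the "almost all operations fail with -EBADF" rule; the second exercises exception 2), the non-destructive check that a descriptor is open.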