Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

15 Dec, 2012

1 commit

5e4a08476 userns: Require CAP_SYS_ADMIN for most uses of setns.

Andy Lutomirski found a nasty little bug in
the permissions of setns. With unprivileged user namespaces it
became possible to create new namespaces without privilege.

However, the setns calls were relaxed to only require CAP_SYS_ADMIN in
the user namespace of the target namespace.

This made the following nasty sequence possible:

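pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
if (pid == 0) { /* child */
	/* Inside the new mount namespace, replace /etc/passwd. */
	system("mount --bind /home/me/passwd /etc/passwd");
} else { /* parent */
	char path[PATH_MAX];
	/* Path to the child's mount namespace file. */
	snprintf(path, sizeof(path), "/proc/%u/ns/mnt", pid);
	fd = open(path, O_RDONLY);
	/* Join the child's mount namespace, then run a setuid binary there. */
	setns(fd, 0);
	system("su -");
}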

Prevent this possibility by requiring CAP_SYS_ADMIN
in the current user namespace when joining all but the user namespace.

Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-12-15 08:12:03 +0800
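
The difference between the relaxed check and the fixed check described in 5e4a08476 can be sketched with a toy model. This is not kernel code: every type and function below (toy_userns, toy_task, toy_ns_capable and the two toy_may_setns_* helpers) is hypothetical, and the privilege rule is a deliberate simplification in which the creator of a child user namespace counts as privileged inside it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and helpers are hypothetical, not kernel code. */
struct toy_userns {
    const struct toy_userns *parent; /* NULL for the initial user namespace */
    unsigned int owner_uid;          /* uid that created this namespace */
};

struct toy_task {
    const struct toy_userns *userns; /* user namespace the task runs in */
    unsigned int euid;
    bool cap_sys_admin;              /* capability held in its own userns */
};

/* Simplified "is the task privileged over target?" rule. */
static bool toy_ns_capable(const struct toy_task *t,
                           const struct toy_userns *target)
{
    for (const struct toy_userns *ns = target; ns != NULL; ns = ns->parent) {
        if (ns == t->userns)
            return t->cap_sys_admin;
        /* The creator of a child namespace is privileged inside it. */
        if (ns->parent == t->userns && ns->owner_uid == t->euid)
            return true;
    }
    return false;
}

/* Relaxed rule: privilege over the target namespace's owning userns suffices. */
static bool toy_may_setns_relaxed(const struct toy_task *t,
                                  const struct toy_userns *target_owner)
{
    return toy_ns_capable(t, target_owner);
}

/* Fixed rule: also require the capability in the caller's own userns. */
static bool toy_may_setns_fixed(const struct toy_task *t,
                                const struct toy_userns *target_owner)
{
    return toy_ns_capable(t, target_owner) && toy_ns_capable(t, t->userns);
}
```

In this model, an unprivileged uid 1000 task in the initial namespace passes the relaxed check for a namespace owned by a user namespace it created, but fails the fixed check because it holds no capability in its own user namespace.
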

20 Nov, 2012

1 commit

98f842e67 proc: Usable inode numbers for the namespace file descriptors.

Assign a unique proc inode to each namespace, and use that
inode number to ensure we only allocate at most one proc
inode for every namespace in proc.

A single proc inode per namespace allows userspace to test
to see if two processes are in the same namespace.

This has been a long requested feature and only blocked because
a naive implementation would put the id in a global space and
would ultimately require having a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.

We still don't have per superblock inode numbers for proc, which
appears necessary for application unaware checkpoint/restart and
migrations (if the application is using namespace file descriptors)
but that is now allowed by the design if it becomes important.

I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.

Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-11-20 20:19:49 +0800
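
With one proc inode per namespace (98f842e67), a userspace check for "same namespace" could look roughly like the sketch below. It assumes a Linux system with /proc mounted and compares the device and inode numbers reported by stat() for the two /proc/<pid>/ns/<kind> entries. The helper names same_namespace and shares_namespace_with_self are made up for this example.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if both processes are in the same namespace of the given kind
 * (for example "mnt", "uts", "ipc" or "net"), 0 if not, and -1 on error.
 */
static int same_namespace(pid_t a, pid_t b, const char *kind)
{
    char path_a[64], path_b[64];
    struct stat st_a, st_b;

    snprintf(path_a, sizeof(path_a), "/proc/%ld/ns/%s", (long)a, kind);
    snprintf(path_b, sizeof(path_b), "/proc/%ld/ns/%s", (long)b, kind);

    /* stat() follows symbolic links, so we see the namespace's own inode. */
    if (stat(path_a, &st_a) != 0 || stat(path_b, &st_b) != 0)
        return -1;

    return st_a.st_dev == st_b.st_dev && st_a.st_ino == st_b.st_ino;
}

/* Convenience wrapper: compare another process against the caller. */
static int shares_namespace_with_self(pid_t other, const char *kind)
{
    return same_namespace(getpid(), other, kind);
}
```

Reading another process's namespace entries may require suitable permissions, in which case the helper reports -1 rather than guessing.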

19 Nov, 2012

5 commits

ae11e0f18 userns: fix return value on mntns_install() failure

Change return value from -EINVAL to -EPERM when the permission check fails.

Signed-off-by: Zhao Hongjiang
Signed-off-by: Eric W. Biederman

Zhao Hongjiang
2012-11-19 21:59:22 +0800
0c55cfc41 vfs: Allow unprivileged manipulation of the mount namespace.

- Add a filesystem flag to mark filesystems that are safe to mount as
an unprivileged user.

- Add a filesystem flag to mark filesystems that don't need MNT_NODEV
when mounted by an unprivileged user.

- Relax the permission checks to allow unprivileged users that have
CAP_SYS_ADMIN permissions in the user namespace referred to by the
current mount namespace to be allowed to mount, unmount, and move
filesystems.

Acked-by: "Serge E. Hallyn"
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-11-19 21:59:21 +0800
7a472ef4b vfs: Only support slave subtrees across different user namespaces

Sharing mount subtrees with mount namespaces created by unprivileged
users allows unprivileged mounts created by unprivileged users to
propagate to mount namespaces controlled by privileged users.

Prevent nasty consequences by changing shared subtrees to slave
subtrees when an unprivileged user creates a new mount namespace.

Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-11-19 21:59:20 +0800
771b13716 vfs: Add a user namespace reference from struct mnt_namespace

This will allow for support for unprivileged mounts in a new user namespace.

Acked-by: "Serge E. Hallyn"
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-11-19 21:59:19 +0800
8823c079b vfs: Add setns support for the mount namespace

setns support for the mount namespace is a little tricky as an
arbitrary decision must be made about what to set fs->root and
fs->pwd to, as there is no expectation of a relationship between
the two mount namespaces. Therefore I arbitrarily find the root
mount point, and follow every mount on top of it to find the top
of the mount stack. Then I set fs->root and fs->pwd to that
location. The topmost root of the mount stack seems like a
reasonable place to be.

Bind mount support for the mount namespace inodes has the
possibility of creating circular dependencies between mount
namespaces. Circular dependencies can result in loops that
prevent mount namespaces from ever being freed. I avoid
creating those circular dependencies by adding a sequence number
to the mount namespace and requiring all bind mounts be of a
younger mount namespace into an older mount namespace.

Add a helper function proc_ns_inode so it is possible to
detect when we are attempting to bind mount a namespace inode.

Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-11-19 21:59:18 +0800
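
The sequence-number rule from 8823c079b can be illustrated with a small toy model. The names below (toy_mnt_ns, toy_new_mnt_ns, toy_may_bind_ns) are hypothetical and the counter is a plain global rather than anything the kernel uses. The sketch only shows why a strict "younger into older" ordering rules out cycles.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a mount namespace; only the sequence number matters. */
struct toy_mnt_ns {
    uint64_t seq; /* taken from a monotonically increasing counter */
};

static uint64_t toy_next_seq = 1;

/* Each new namespace gets a larger sequence number than all earlier ones. */
static struct toy_mnt_ns toy_new_mnt_ns(void)
{
    struct toy_mnt_ns ns = { toy_next_seq++ };
    return ns;
}

/*
 * A namespace file may be bind mounted inside `host` only if it refers to a
 * strictly younger namespace (one with a larger sequence number).
 */
static bool toy_may_bind_ns(const struct toy_mnt_ns *host,
                            const struct toy_mnt_ns *bound)
{
    return bound->seq > host->seq;
}
```

Because every permitted reference points from an older namespace to a strictly younger one, following references always increases the sequence number, so no chain of references can return to its starting namespace.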

13 Oct, 2012

1 commit

91a27b2a7 vfs: define struct filename and have getname() return it

getname() is intended to copy pathname strings from userspace into a
kernel buffer. The result is just a string in kernel space. It would
however be quite helpful to be able to attach some ancillary info to
the string.

For instance, we could attach some audit-related info to reduce the
amount of audit-related processing needed. When auditing is enabled,
we could also call getname() on the string more than once and not
need to recopy it from userspace.

This patchset converts the getname()/putname() interfaces to return
a struct instead of a string. For now, the struct just tracks the
string in kernel space and the original userland pointer for it.

Later, we'll add other information to the struct as it becomes
convenient.

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-10-13 08:14:55 +0800
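
As a rough userspace analogy for the change in 91a27b2a7, the sketch below returns a small struct instead of a bare string. The struct keeps a private copy of the path together with the caller's original pointer. The names toy_filename, toy_getname and toy_putname are invented for this example, and plain malloc/memcpy stand in for the real copy from userspace.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for the struct described above. */
struct toy_filename {
    char *name;       /* private copy of the pathname */
    const char *uptr; /* the caller's original pointer, kept for reference */
};

/* Copy the caller's string and remember where it came from. NULL on failure. */
static struct toy_filename *toy_getname(const char *user_path)
{
    struct toy_filename *fn = malloc(sizeof(*fn));
    if (fn == NULL)
        return NULL;

    size_t len = strlen(user_path) + 1;
    fn->name = malloc(len);
    if (fn->name == NULL) {
        free(fn);
        return NULL;
    }
    memcpy(fn->name, user_path, len);
    fn->uptr = user_path;
    return fn;
}

/* Release both the copy and the wrapper. */
static void toy_putname(struct toy_filename *fn)
{
    if (fn != NULL) {
        free(fn->name);
        free(fn);
    }
}
```

Returning a struct leaves room to attach further fields later without changing every caller again, which is the motivation the commit message gives.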

12 Oct, 2012

1 commit

808d4e3cf constify do_mount() arguments

Signed-off-by: Al Viro

Al Viro
2012-10-12 08:02:04 +0800

23 Sep, 2012

1 commit

156cacb1d do_add_mount()/umount -l races

Normally we deal with lock_mount()/umount races by checking that the
mountpoint-to-be is still in our namespace after lock_mount() has
been done. However, do_add_mount() skips that check when called
with MNT_SHRINKABLE in flags (i.e. from finish_automount()). The
reason is that ->mnt_ns may be a temporary namespace created exactly
to contain automounts a-la NFS4 referral handling. It's not the
namespace of the caller, though, so check_mnt() would fail here.
We still need to check that ->mnt_ns is non-NULL in that case,
though.

Signed-off-by: Al Viro

Al Viro
2012-09-23 08:48:18 +0800

31 Jul, 2012

1 commit

eb04c2828 fs: Add freezing handling to mnt_want_write() / mnt_drop_write()

Most of the places where we want freeze protection coincide with the places where
we also have remount-ro protection. So make mnt_want_write() and
mnt_drop_write() (and their _file alternatives) prevent freezing as well.
For the few cases that are really interested only in remount-ro protection,
provide new function variants.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa
Tested-by: Peter M. Petrakis
Tested-by: Dann Frazier
Tested-by: Massimo Morana
Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2012-07-31 13:40:38 +0800

14 Jul, 2012

4 commits

f015f1267 VFS: Comment mount following code

Add comments describing what the directions "up" and "down" mean and ref count
handling to the VFS mount following family of functions.

Signed-off-by: Valerie Aurora (Original author)
Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2012-07-14 20:38:32 +0800
be34d1a3b VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors

copy_tree() can theoretically fail in a case other than ENOMEM, but always
returns NULL which is interpreted by callers as -ENOMEM. Change it to return
an explicit error.

Also change clone_mnt() for consistency and because union mounts will add new
error cases.

Thanks to Andreas Gruenbacher for a bug fix.
[AV: folded braino fix by Dan Carpenter]

Original-author: Valerie Aurora
Signed-off-by: David Howells
Cc: Valerie Aurora
Cc: Andreas Gruenbacher
Signed-off-by: Al Viro

David Howells
2012-07-14 20:37:27 +0800
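
One common way for a pointer-returning function to report an explicit error, instead of a bare NULL, is to encode a negative errno value in the pointer itself. The sketch below is a userspace re-creation of that idiom and assumes this is the style of explicit error meant in be34d1a3b. All toy_* names, including toy_clone_mnt and its `permitted` flag, are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *toy_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long toy_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top TOY_MAX_ERRNO addresses. */
static inline bool toy_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_mount { int id; };

/* Can fail for two distinct reasons, and the caller can tell them apart. */
static struct toy_mount *toy_clone_mnt(const struct toy_mount *old,
                                       bool permitted)
{
    if (!permitted)
        return toy_err_ptr(-EINVAL);

    struct toy_mount *mnt = malloc(sizeof(*mnt));
    if (mnt == NULL)
        return toy_err_ptr(-ENOMEM);

    mnt->id = old->id;
    return mnt;
}
```

A caller checks toy_is_err() on the result and, on failure, propagates toy_ptr_err() rather than assuming that every failure means out of memory.
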
6ce6e24e7 get rid of magic in proc_namespace.c

Don't rely on proc_mounts->m being the first field; container_of()
is there for a reason. No need to bother with ->private, while
we are at it - the same container_of will do nicely.

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:32:48 +0800
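
The point of 6ce6e24e7 can be shown with a userspace version of the container_of idiom. The macro and struct names below (toy_container_of, toy_seq_file, toy_proc_mounts) are stand-ins invented for this sketch. The embedded member is deliberately not the first field, so a plain cast from the member pointer would give the wrong address.

```c
#include <stddef.h>

/* Recover a pointer to the enclosing struct from a pointer to one member. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_seq_file { int pos; };

struct toy_proc_mounts {
    int ns_id;             /* deliberately placed before the embedded member */
    struct toy_seq_file m;
};

/* Works wherever `m` sits inside struct toy_proc_mounts. */
static struct toy_proc_mounts *toy_mounts_from_seq(struct toy_seq_file *seq)
{
    return toy_container_of(seq, struct toy_proc_mounts, m);
}
```

Since the offset is computed by the compiler, reordering the fields of the outer struct does not break the lookup.
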
f7a99c5b7 get rid of ->mnt_longterm

it's enough to set ->mnt_ns of internal vfsmounts to something
distinct from all struct mnt_namespace out there; then we can
just use the check for ->mnt_ns != NULL in the fast path of
mntput_no_expire()

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:32:47 +0800

31 May, 2012

1 commit

63d37a84a vfs: umount_tree() might be called on subtree that had never made it

__mnt_make_shortterm() in there undoes the effect of __mnt_make_longterm()
we'd done back when we set ->mnt_ns non-NULL; it should not be done to
vfsmounts that had never gone through commit_tree() and friends. Kudos to
lczerner for catching that one...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro

Al Viro
2012-05-31 09:04:55 +0800

30 May, 2012

1 commit

962830df3 brlocks/lglocks: API cleanups

lglocks and brlocks are currently generated with some complicated macros
in lglock.h. But there's no reason to not just use common utility
functions and put all the data into a common data structure.

In preparation, this patch changes the API to look more like normal
function calls with pointers, not magic macros.

The patch is rather large because I move over all users in one go to keep
it bisectable. This impacts the VFS somewhat in terms of lines changed.
But no actual behaviour change.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Andi Kleen
Cc: Al Viro
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Rusty Russell
Signed-off-by: Al Viro

Andi Kleen
2012-05-30 11:28:41 +0800

09 Jan, 2012

1 commit

98793265b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
Kconfig: acpi: Fix typo in comment.
misc latin1 to utf8 conversions
devres: Fix a typo in devm_kfree comment
btrfs: free-space-cache.c: remove extra semicolon.
fat: Spelling s/obsolate/obsolete/g
SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
tools/power turbostat: update fields in manpage
mac80211: drop spelling fix
types.h: fix comment spelling for 'architectures'
typo fixes: aera -> area, exntension -> extension
devices.txt: Fix typo of 'VMware'.
sis900: Fix enum typo 'sis900_rx_bufer_status'
decompress_bunzip2: remove invalid vi modeline
treewide: Fix comment and string typo 'bufer'
hyper-v: Update MAINTAINERS
treewide: Fix typos in various parts of the kernel, and fix some comments.
clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
leds: Kconfig: Fix typo 'D2NET_V2'
sound: Kconfig: drop unknown symbol ARCH_CLPS7500
...

Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
kconfig additions, close to removed commented-out old ones)

Linus Torvalds
2012-01-09 05:21:22 +0800

07 Jan, 2012

4 commits

8e8b87964 vfs: prevent remount read-only if pending removes

If there are any inodes on the super block that have been unlinked
(i_nlink == 0) but have not yet been deleted, then prevent
remounting the super block read-only.

Reported-by: Toshiyuki Okajima
Signed-off-by: Miklos Szeredi
Tested-by: Toshiyuki Okajima
Signed-off-by: Al Viro

Miklos Szeredi
2012-01-07 12:20:13 +0800
4ed5e82fe vfs: protect remounting superblock read-only

Currently remounting superblock read-only is racy in a major way.

With the per mount read-only infrastructure it is now possible to
prevent most races, which this patch attempts.

Before starting the remount read-only, iterate through all mounts
belonging to the superblock and if none of them have any pending
writes, set sb->s_readonly_remount. This indicates that remount is in
progress and no further write requests are allowed. If the remount
succeeds set MS_RDONLY and reset s_readonly_remount.

If the remounting is unsuccessful just reset s_readonly_remount.
This can result in transient EROFS errors, despite the fact the
remount failed. Unfortunately holding off writes is difficult as
remount itself may touch the filesystem (e.g. through load_nls())
which would deadlock.

A later patch deals with delayed writes due to nlink going to zero.

Signed-off-by: Miklos Szeredi
Tested-by: Toshiyuki Okajima
Signed-off-by: Al Viro

Miklos Szeredi
2012-01-07 12:20:12 +0800
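
The remount sequence described in 4ed5e82fe can be sketched as a single-threaded toy model. Everything here is hypothetical: the toy_* types, the callback standing in for the filesystem's own remount step, and the choice of -EBUSY for "writers pending" are assumptions of this example, and the locking and memory ordering that the real race protection depends on are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mount { int writers; };

struct toy_super_block {
    struct toy_mount *mounts;
    size_t nr_mounts;
    bool read_only;
    bool readonly_remount; /* set while a remount read-only is in progress */
};

/* A new writer is refused once the filesystem is, or is becoming, read-only. */
static int toy_want_write(struct toy_super_block *sb, struct toy_mount *mnt)
{
    if (sb->read_only || sb->readonly_remount)
        return -EROFS;
    mnt->writers++;
    return 0;
}

/* Stand-ins for the filesystem-specific remount step. */
static int toy_fs_remount_ok(void) { return 0; }
static int toy_fs_remount_fail(void) { return -EIO; }

static int toy_remount_ro(struct toy_super_block *sb, int (*fs_remount)(void))
{
    /* Refuse if any mount of this superblock still has pending writers. */
    for (size_t i = 0; i < sb->nr_mounts; i++)
        if (sb->mounts[i].writers > 0)
            return -EBUSY;

    sb->readonly_remount = true;   /* block new writers from here on */
    int err = fs_remount();
    if (err == 0)
        sb->read_only = true;      /* remount succeeded */
    sb->readonly_remount = false;  /* reset on success and on failure */
    return err;
}
```

While readonly_remount is set, toy_want_write() returns -EROFS even if the remount later fails, which mirrors the transient EROFS errors the commit message mentions.
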
39f7c4db1 vfs: keep list of mounts for each superblock

Keep track of vfsmounts belonging to a superblock. List is protected
by vfsmount_lock.

Signed-off-by: Miklos Szeredi
Tested-by: Toshiyuki Okajima
Signed-off-by: Al Viro

Miklos Szeredi
2012-01-07 12:20:12 +0800
34c80b1d9 vfs: switch ->show_options() to struct dentry *

Signed-off-by: Al Viro

Al Viro
2012-01-07 12:19:54 +0800

04 Jan, 2012

18 commits

d10577a8d vfs: trim includes a bit

[folded fix for missing magic.h from Tetsuo Handa]

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:13 +0800
be08d6d26 switch mnt_namespace ->root to struct mount

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:13 +0800
0226f4923 vfs: take /proc/*/mounts and friends to fs/proc_namespace.c

rationale: that stuff is far tighter bound to fs/namespace.c than to
the guts of procfs proper.

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:13 +0800
3a2393d71 vfs: opencode mntget() mnt_set_mountpoint()

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:12 +0800
909b0a88e vfs: spread struct mount - remaining argument of next_mnt()

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:12 +0800
c63181e6b vfs: move fsnotify junk to struct mount

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:12 +0800
52ba1621d vfs: move mnt_devname

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:11 +0800
1a4eeaf2a vfs: move mnt_list to struct mount

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:11 +0800
fc7be130c vfs: switch pnode.h macros to struct mount *

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:11 +0800
863d684f9 vfs: move the rest of int fields to struct mount

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:10 +0800
15169fe78 vfs: mnt_id/mnt_group_id moved

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:10 +0800
143c8c91c vfs: mnt_ns moved to struct mount

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:09 +0800
900148dca vfs: spread struct mount - mntput_no_expire

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:09 +0800
95bc5f25c vfs: spread struct mount - do_add_mount and graft_tree

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:09 +0800
6776db3d3 vfs: take mnt_share/mnt_slave/mnt_slave_list and mnt_expire to struct mount

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:08 +0800
32301920f vfs: and now we can make ->mnt_master point to struct mount

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:08 +0800
d10e8def0 vfs: take mnt_master to struct mount

make IS_MNT_SLAVE take struct mount * at the same time

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:08 +0800
14cf1fa8f vfs: spread struct mount - remaining argument of mnt_set_mountpoint()

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:57:07 +0800