Eric Lee / smarc-fsl-linux-kernel

05 May, 2013

2 commits

b1983cd89 create_mnt_ns: unidiomatic use of list_add() ... Browse Code »

while list_add(A, B) and list_add(B, A) are equivalent when both A and B
are guaranteed to be empty, the usual idiom is list_add(what, where),
not the other way round... Not a bug per se, but only by accident and
it makes RTFS harder for no good reason.

Spotted-by: Rajat Sharma
Signed-off-by: Al Viro

Al Viro
2013-05-05 03:18:53 +0800
0d5cadb87 do_mount(): fix a leak introduced in 3.9 ("mount: consolidate permission checks") ... Browse Code »

Cc: stable@vger.kernel.org
Bisected-by: Michael Leun
Signed-off-by: Al Viro

Al Viro
2013-05-05 02:40:51 +0800

02 May, 2013

2 commits

20b4fb485 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
don't bother with deferred freeing of fdtables
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
proc: Make the PROC_I() and PDE() macros internal to procfs
proc: Supply a function to remove a proc entry by PDE
take cgroup_open() and cpuset_open() to fs/proc/base.c
ppc: Clean up scanlog
ppc: Clean up rtas_flash driver somewhat
hostap: proc: Use remove_proc_subtree()
drm: proc: Use remove_proc_subtree()
drm: proc: Use minor->index to label things, not PDE->name
drm: Constify drm_proc_list[]
zoran: Don't print proc_dir_entry data in debug
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
proc: Supply an accessor for getting the data from a PDE's parent
airo: Use remove_proc_subtree()
rtl8192u: Don't need to save device proc dir PDE
rtl8187se: Use a dir under /proc/net/r8180/
proc: Add proc_mkdir_data()
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
proc: Move PDE_NET() to fs/proc/proc_net.c
...

Linus Torvalds
2013-05-02 08:51:54 +0800
0bb80f240 proc: Split the namespace stuff out into linux/proc_ns.h ... Browse Code »

Split the proc namespace stuff out into linux/proc_ns.h.

Signed-off-by: David Howells
cc: netdev@vger.kernel.org
cc: Serge E. Hallyn
cc: Eric W. Biederman
Signed-off-by: Al Viro

David Howells
2013-05-02 05:29:39 +0800

10 Apr, 2013

8 commits

e8f2b548d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs fixes from Al Viro:
"A nasty bug in fs/namespace.c caught by Andrey + a couple of less
serious unpleasantness - ecryptfs misc device playing hopeless games
with try_module_get() and palinfo procfs support being... not quite
correctly done, to be polite."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
mnt: release locks on error path in do_loopback
palinfo fixes
procfs: add proc_remove_subtree()
ecryptfs: close rmmod race

Linus Torvalds
2013-04-10 03:22:49 +0800
97216be09 fold release_mounts() into namespace_unlock() ... Browse Code »

... and provide namespace_lock() as a trivial wrapper;
switch to those two consistently.

Result is patterned after rtnl_lock/rtnl_unlock pair.

Signed-off-by: Al Viro

Al Viro
2013-04-10 02:12:54 +0800
328e6d901 switch unlock_mount() to namespace_unlock(), convert all umount_tree() callers ... Browse Code »

which allows to kill the last argument of umount_tree() and make release_mounts()
static.

Signed-off-by: Al Viro

Al Viro
2013-04-10 02:12:53 +0800
3ab6abee5 more conversions to namespace_unlock() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-04-10 02:12:53 +0800
b54b9be78 get rid of the second argument of shrink_submounts() ... Browse Code »

... it's always &unmounted.

Signed-off-by: Al Viro

Al Viro
2013-04-10 02:12:53 +0800
e3197d83d saner umount_tree()/release_mounts(), part 1 ... Browse Code »

global list of release_mounts() fodder, protected by namespace_sem;
eventually, all umount_tree() callers will use it as kill list.
Helper picking the contents of that list, releasing namespace_sem
and doing release_mounts() on what it got.

Signed-off-by: Al Viro

Al Viro
2013-04-10 02:12:52 +0800
84d17192d get rid of full-hash scan on detaching vfsmounts ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-04-10 02:12:52 +0800
e9c5d8a56 mnt: release locks on error path in do_loopback ... Browse Code »

do_loopback calls lock_mount(path) and forget to unlock_mount
if clone_mnt or copy_mnt fails.

[ 77.661566] ================================================
[ 77.662939] [ BUG: lock held when returning to user space! ]
[ 77.664104] 3.9.0-rc5+ #17 Not tainted
[ 77.664982] ------------------------------------------------
[ 77.666488] mount/514 is leaving the kernel with locks still held!
[ 77.668027] 2 locks held by mount/514:
[ 77.668817] #0: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [] lock_mount+0x32/0xe0
[ 77.671755] #1: (&namespace_sem){+++++.}, at: [] lock_mount+0x4a/0xe0

Signed-off-by: Andrey Vagin
Signed-off-by: Al Viro

Andrey Vagin
2013-04-10 02:09:50 +0800

27 Mar, 2013

4 commits

87a8ebd63 userns: Restrict when proc and sysfs can be mounted ... Browse Code »

Only allow unprivileged mounts of proc and sysfs if they are already
mounted when the user namespace is created.

proc and sysfs are interesting because they have content that is
per namespace, and so fresh mounts are needed when new namespaces
are created while at the same time proc and sysfs have content that
is shared between every instance.

Respect the policy of who may see the shared content of proc and sysfs
by only allowing new mounts if there was an existing mount at the time
the user namespace was created.

In practice there are only two interesting cases: proc and sysfs are
mounted at their usual places, proc and sysfs are not mounted at all
(some form of mount namespace jail).

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-03-27 22:50:08 +0800
132c94e31 vfs: Carefully propogate mounts across user namespaces ... Browse Code »

As a matter of policy MNT_READONLY should not be changable if the
original mounter had more privileges than creator of the mount
namespace.

Add the flag CL_UNPRIVILEGED to note when we are copying a mount from
a mount namespace that requires more privileges to a mount namespace
that requires fewer privileges.

When the CL_UNPRIVILEGED flag is set cause clone_mnt to set MNT_NO_REMOUNT
if any of the mnt flags that should never be changed are set.

This protects both mount propagation and the initial creation of a less
privileged mount namespace.

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn
Reported-by: Andy Lutomirski
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-03-27 22:50:05 +0800
90563b198 vfs: Add a mount flag to lock read only bind mounts ... Browse Code »

When a read-only bind mount is copied from mount namespace in a higher
privileged user namespace to a mount namespace in a lesser privileged
user namespace, it should not be possible to remove the the read-only
restriction.

Add a MNT_LOCK_READONLY mount flag to indicate that a mount must
remain read-only.

CC: stable@vger.kernel.org
Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-03-27 22:50:04 +0800
3151527ee userns: Don't allow creation if the user is chrooted ... Browse Code »

Guarantee that the policy of which files may be access that is
established by setting the root directory will not be violated
by user namespaces by verifying that the root directory points
to the root of the mount namespace at the time of user namespace
creation.

Changing the root is a privileged operation, and as a matter of policy
it serves to limit unprivileged processes to files below the current
root directory.

For reasons of simplicity and comprehensibility the privilege to
change the root directory is gated solely on the CAP_SYS_CHROOT
capability in the user namespace. Therefore when creating a user
namespace we must ensure that the policy of which files may be access
can not be violated by changing the root directory.

Anyone who runs a processes in a chroot and would like to use user
namespace can setup the same view of filesystems with a mount
namespace instead. With this result that this is not a practical
limitation for using user namespaces.

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn
Reported-by: Andy Lutomirski
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-03-27 22:49:29 +0800

23 Feb, 2013

3 commits

496ad9aa8 new helper: file_inode(file) ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:31 +0800
57eccb830 mount: consolidate permission checks ... Browse Code »

... and ask for global CAP_SYS_ADMIN only for superblock-level remounts

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:31 +0800
9b40bc90a get rid of unprotected dereferencing of mnt->mnt_ns ... Browse Code »

It's safe only under namespace_sem or vfsmount_lock; all places
in fs/namespace.c that want mnt->mnt_ns->user_ns actually want to use
current->nsproxy->mnt_ns->user_ns (note the calls of check_mnt() in
there).

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:05 +0800

21 Dec, 2012

1 commit

1e75529e3 vfs, freeze: use ACCESS_ONCE() to guard access to ->mnt_flags ... Browse Code »

The compiler may optimize the while loop and make the check just be done once,
so we should use ACCESS_ONCE() to guard access to ->mnt_flags

Signed-off-by: Miao Xie
Signed-off-by: Al Viro

Miao Xie
2012-12-21 02:36:18 +0800

15 Dec, 2012

1 commit

5e4a08476 userns: Require CAP_SYS_ADMIN for most uses of setns. ... Browse Code »

Andy Lutomirski found a nasty little bug in
the permissions of setns. With unprivileged user namespaces it
became possible to create new namespaces without privilege.

However the setns calls were relaxed to only require CAP_SYS_ADMIN in
the user nameapce of the targed namespace.

Which made the following nasty sequence possible.

pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
if (pid == 0) { /* child */
system("mount --bind /home/me/passwd /etc/passwd");
}
else if (pid != 0) { /* parent */
char path[PATH_MAX];
snprintf(path, sizeof(path), "/proc/%u/ns/mnt");
fd = open(path, O_RDONLY);
setns(fd, 0);
system("su -");
}

Prevent this possibility by requiring CAP_SYS_ADMIN
in the current user namespace when joing all but the user namespace.

Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-12-15 08:12:03 +0800

20 Nov, 2012

1 commit

98f842e67 proc: Usable inode numbers for the namespace file descriptors. ... Browse Code »

Assign a unique proc inode to each namespace, and use that
inode number to ensure we only allocate at most one proc
inode for every namespace in proc.

A single proc inode per namespace allows userspace to test
to see if two processes are in the same namespace.

This has been a long requested feature and only blocked because
a naive implementation would put the id in a global space and
would ultimately require having a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.

We still don't have per superblock inode numbers for proc, which
appears necessary for application unaware checkpoint/restart and
migrations (if the application is using namespace file descriptors)
but that is now allowd by the design if it becomes important.

I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.

Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-11-20 20:19:49 +0800

19 Nov, 2012

5 commits

ae11e0f18 userns: fix return value on mntns_install() failure ... Browse Code »

Change return value from -EINVAL to -EPERM when the permission check fails.

Signed-off-by: Zhao Hongjiang
Signed-off-by: Eric W. Biederman

Zhao Hongjiang
2012-11-19 21:59:22 +0800
0c55cfc41 vfs: Allow unprivileged manipulation of the mount namespace. ... Browse Code »

- Add a filesystem flag to mark filesystems that are safe to mount as
an unprivileged user.

- Add a filesystem flag to mark filesystems that don't need MNT_NODEV
when mounted by an unprivileged user.

- Relax the permission checks to allow unprivileged users that have
CAP_SYS_ADMIN permissions in the user namespace referred to by the
current mount namespace to be allowed to mount, unmount, and move
filesystems.

Acked-by: "Serge E. Hallyn"
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-11-19 21:59:21 +0800
7a472ef4b vfs: Only support slave subtrees across different user namespaces ... Browse Code »

Sharing mount subtress with mount namespaces created by unprivileged
users allows unprivileged mounts created by unprivileged users to
propagate to mount namespaces controlled by privileged users.

Prevent nasty consequences by changing shared subtrees to slave
subtress when an unprivileged users creates a new mount namespace.

Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-11-19 21:59:20 +0800
771b13716 vfs: Add a user namespace reference from struct mnt_namespace ... Browse Code »

This will allow for support for unprivileged mounts in a new user namespace.

Acked-by: "Serge E. Hallyn"
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-11-19 21:59:19 +0800
8823c079b vfs: Add setns support for the mount namespace ... Browse Code »

setns support for the mount namespace is a little tricky as an
arbitrary decision must be made about what to set fs->root and
fs->pwd to, as there is no expectation of a relationship between
the two mount namespaces. Therefore I arbitrarily find the root
mount point, and follow every mount on top of it to find the top
of the mount stack. Then I set fs->root and fs->pwd to that
location. The topmost root of the mount stack seems like a
reasonable place to be.

Bind mount support for the mount namespace inodes has the
possibility of creating circular dependencies between mount
namespaces. Circular dependencies can result in loops that
prevent mount namespaces from every being freed. I avoid
creating those circular dependencies by adding a sequence number
to the mount namespace and require all bind mounts be of a
younger mount namespace into an older mount namespace.

Add a helper function proc_ns_inode so it is possible to
detect when we are attempting to bind mound a namespace inode.

Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-11-19 21:59:18 +0800

13 Oct, 2012

1 commit

91a27b2a7 vfs: define struct filename and have getname() return it ... Browse Code »

getname() is intended to copy pathname strings from userspace into a
kernel buffer. The result is just a string in kernel space. It would
however be quite helpful to be able to attach some ancillary info to
the string.

For instance, we could attach some audit-related info to reduce the
amount of audit-related processing needed. When auditing is enabled,
we could also call getname() on the string more than once and not
need to recopy it from userspace.

This patchset converts the getname()/putname() interfaces to return
a struct instead of a string. For now, the struct just tracks the
string in kernel space and the original userland pointer for it.

Later, we'll add other information to the struct as it becomes
convenient.

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-10-13 08:14:55 +0800

12 Oct, 2012

1 commit

808d4e3cf consitify do_mount() arguments ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-10-12 08:02:04 +0800

23 Sep, 2012

1 commit

156cacb1d do_add_mount()/umount -l races ... Browse Code »

normally we deal with lock_mount()/umount races by checking that
mountpoint to be is still in our namespace after lock_mount() has
been done. However, do_add_mount() skips that check when called
with MNT_SHRINKABLE in flags (i.e. from finish_automount()). The
reason is that ->mnt_ns may be a temporary namespace created exactly
to contain automounts a-la NFS4 referral handling. It's not the
namespace of the caller, though, so check_mnt() would fail here.
We still need to check that ->mnt_ns is non-NULL in that case,
though.

Signed-off-by: Al Viro

Al Viro
2012-09-23 08:48:18 +0800

31 Jul, 2012

1 commit

eb04c2828 fs: Add freezing handling to mnt_want_write() / mnt_drop_write() ... Browse Code »

Most of places where we want freeze protection coincides with the places where
we also have remount-ro protection. So make mnt_want_write() and
mnt_drop_write() (and their _file alternative) prevent freezing as well.
For the few cases that are really interested only in remount-ro protection
provide new function variants.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa
Tested-by: Peter M. Petrakis
Tested-by: Dann Frazier
Tested-by: Massimo Morana
Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2012-07-31 13:40:38 +0800

14 Jul, 2012

4 commits

f015f1267 VFS: Comment mount following code ... Browse Code »

Add comments describing what the directions "up" and "down" mean and ref count
handling to the VFS mount following family of functions.

Signed-off-by: Valerie Aurora (Original author)
Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2012-07-14 20:38:32 +0800
be34d1a3b VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors ... Browse Code »

copy_tree() can theoretically fail in a case other than ENOMEM, but always
returns NULL which is interpreted by callers as -ENOMEM. Change it to return
an explicit error.

Also change clone_mnt() for consistency and because union mounts will add new
error cases.

Thanks to Andreas Gruenbacher for a bug fix.
[AV: folded braino fix by Dan Carpenter]

Original-author: Valerie Aurora
Signed-off-by: David Howells
Cc: Valerie Aurora
Cc: Andreas Gruenbacher
Signed-off-by: Al Viro

David Howells
2012-07-14 20:37:27 +0800
6ce6e24e7 get rid of magic in proc_namespace.c ... Browse Code »

don't rely on proc_mounts->m being the first field; container_of()
is there for purpose. No need to bother with ->private, while
we are at it - the same container_of will do nicely.

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:32:48 +0800
f7a99c5b7 get rid of ->mnt_longterm ... Browse Code »

it's enough to set ->mnt_ns of internal vfsmounts to something
distinct from all struct mnt_namespace out there; then we can
just use the check for ->mnt_ns != NULL in the fast path of
mntput_no_expire()

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:32:47 +0800

31 May, 2012

1 commit

63d37a84a vfs: umount_tree() might be called on subtree that had never made it ... Browse Code »
1

__mnt_make_shortterm() in there undoes the effect of __mnt_make_longterm()
we'd done back when we set ->mnt_ns non-NULL; it should not be done to
vfsmounts that had never gone through commit_tree() and friends. Kudos to
lczerner for catching that one...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro

Al Viro
2012-05-31 09:04:55 +0800

30 May, 2012

1 commit

962830df3 brlocks/lglocks: API cleanups ... Browse Code »

lglocks and brlocks are currently generated with some complicated macros
in lglock.h. But there's no reason to not just use common utility
functions and put all the data into a common data structure.

In preparation, this patch changes the API to look more like normal
function calls with pointers, not magic macros.

The patch is rather large because I move over all users in one go to keep
it bisectable. This impacts the VFS somewhat in terms of lines changed.
But no actual behaviour change.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Andi Kleen
Cc: Al Viro
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Rusty Russell
Signed-off-by: Al Viro

Andi Kleen
2012-05-30 11:28:41 +0800

09 Jan, 2012

1 commit

98793265b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
Kconfig: acpi: Fix typo in comment.
misc latin1 to utf8 conversions
devres: Fix a typo in devm_kfree comment
btrfs: free-space-cache.c: remove extra semicolon.
fat: Spelling s/obsolate/obsolete/g
SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
tools/power turbostat: update fields in manpage
mac80211: drop spelling fix
types.h: fix comment spelling for 'architectures'
typo fixes: aera -> area, exntension -> extension
devices.txt: Fix typo of 'VMware'.
sis900: Fix enum typo 'sis900_rx_bufer_status'
decompress_bunzip2: remove invalid vi modeline
treewide: Fix comment and string typo 'bufer'
hyper-v: Update MAINTAINERS
treewide: Fix typos in various parts of the kernel, and fix some comments.
clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
leds: Kconfig: Fix typo 'D2NET_V2'
sound: Kconfig: drop unknown symbol ARCH_CLPS7500
...

Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
kconfig additions, close to removed commented-out old ones)

Linus Torvalds
2012-01-09 05:21:22 +0800

07 Jan, 2012

2 commits

8e8b87964 vfs: prevent remount read-only if pending removes ... Browse Code »

If there are any inodes on the super block that have been unlinked
(i_nlink == 0) but have not yet been deleted then prevent the
remounting the super block read-only.

Reported-by: Toshiyuki Okajima
Signed-off-by: Miklos Szeredi
Tested-by: Toshiyuki Okajima
Signed-off-by: Al Viro

Miklos Szeredi
2012-01-07 12:20:13 +0800
4ed5e82fe vfs: protect remounting superblock read-only ... Browse Code »

Currently remouting superblock read-only is racy in a major way.

With the per mount read-only infrastructure it is now possible to
prevent most races, which this patch attempts.

Before starting the remount read-only, iterate through all mounts
belonging to the superblock and if none of them have any pending
writes, set sb->s_readonly_remount. This indicates that remount is in
progress and no further write requests are allowed. If the remount
succeeds set MS_RDONLY and reset s_readonly_remount.

If the remounting is unsuccessful just reset s_readonly_remount.
This can result in transient EROFS errors, despite the fact the
remount failed. Unfortunately hodling off writes is difficult as
remount itself may touch the filesystem (e.g. through load_nls())
which would deadlock.

A later patch deals with delayed writes due to nlink going to zero.

Signed-off-by: Miklos Szeredi
Tested-by: Toshiyuki Okajima
Signed-off-by: Al Viro

Miklos Szeredi
2012-01-07 12:20:12 +0800