15 Dec, 2012

1 commit

  • Andy Lutomirski found a nasty little bug in
    the permissions of setns. With unprivileged user namespaces it
    became possible to create new namespaces without privilege.

    However, the setns calls were relaxed to only require CAP_SYS_ADMIN in
    the user namespace of the targeted namespace.

    That made the following nasty sequence possible:

    pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
    if (pid == 0) { /* child: owns the new user and mount namespaces */
        system("mount --bind /home/me/passwd /etc/passwd");
    }
    else if (pid > 0) { /* parent: joins the child's mount namespace */
        char path[PATH_MAX];
        snprintf(path, sizeof(path), "/proc/%d/ns/mnt", pid);
        fd = open(path, O_RDONLY);
        setns(fd, 0);
        system("su -");
    }

    Prevent this possibility by requiring CAP_SYS_ADMIN
    in the current user namespace when joining all but the user namespace.

    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
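
    The rule introduced by this fix can be modelled in a few lines of
    userspace C. This is only a sketch under stated assumptions, not the
    kernel code: enum ns_kind, struct caller and may_setns() are
    hypothetical names, and the two booleans stand in for the kernel's
    capability checks.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model types; none of these names come from the kernel. */
enum ns_kind { NS_USER, NS_MNT, NS_NET, NS_IPC, NS_UTS };

struct caller {
    bool admin_in_target_userns;  /* CAP_SYS_ADMIN in the user namespace of the target */
    bool admin_in_current_userns; /* CAP_SYS_ADMIN in the caller's own user namespace */
};

/* Returns 0 if the join is allowed, -EPERM otherwise. */
int may_setns(const struct caller *c, enum ns_kind kind)
{
    /* The relaxed check that already existed before the fix. */
    if (!c->admin_in_target_userns)
        return -EPERM;

    /* The fix: joining anything but a user namespace additionally
     * needs CAP_SYS_ADMIN in the current user namespace. */
    if (kind != NS_USER && !c->admin_in_current_userns)
        return -EPERM;

    return 0;
}
```

    In the sequence above, the parent has CAP_SYS_ADMIN over the child's
    user namespace but not in its own, so in this model its attempt to
    join the child's mount namespace is refused.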
     

20 Nov, 2012

1 commit

  • Assign a unique proc inode to each namespace, and use that
    inode number to ensure we only allocate at most one proc
    inode for every namespace in proc.

    A single proc inode per namespace allows userspace to test
    to see if two processes are in the same namespace.

    This has been a long requested feature and only blocked because
    a naive implementation would put the id in a global space and
    would ultimately require having a namespace for the names of
    namespaces, making migration and certain virtualization tricks
    impossible.

    We still don't have per superblock inode numbers for proc, which
    appears necessary for application unaware checkpoint/restart and
    migrations (if the application is using namespace file descriptors)
    but that is now allowed by the design if it becomes important.

    I have preallocated the ipc and uts initial proc inode numbers so
    their structures can be statically initialized.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
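
    From userspace, the test described above amounts to comparing the
    device and inode numbers of two /proc/<pid>/ns/<type> entries. A
    minimal sketch, assuming a mounted /proc and sufficient permission
    to inspect both processes; same_namespace() is a hypothetical
    helper name, not an existing API.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Returns 1 if both processes are in the same namespace of the given
 * type (for example "mnt", "uts", "ipc" or "net"), 0 if they are not,
 * and -1 if either entry cannot be inspected.
 */
int same_namespace(pid_t a, pid_t b, const char *ns)
{
    char path_a[64], path_b[64];
    struct stat st_a, st_b;

    snprintf(path_a, sizeof(path_a), "/proc/%ld/ns/%s", (long)a, ns);
    snprintf(path_b, sizeof(path_b), "/proc/%ld/ns/%s", (long)b, ns);

    /* stat() resolves each entry to the namespace's proc inode. */
    if (stat(path_a, &st_a) != 0 || stat(path_b, &st_b) != 0)
        return -1;

    /* One inode per namespace: equal (device, inode) means same namespace. */
    return st_a.st_dev == st_b.st_dev && st_a.st_ino == st_b.st_ino;
}
```

    A caller would typically pass getpid() and the pid of interest, and
    treat -1 as "unknown" rather than "different".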
     

19 Nov, 2012

5 commits

  • Change return value from -EINVAL to -EPERM when the permission check fails.

    Signed-off-by: Zhao Hongjiang
    Signed-off-by: Eric W. Biederman

    Zhao Hongjiang
     
  • - Add a filesystem flag to mark filesystems that are safe to mount as
    an unprivileged user.

    - Add a filesystem flag to mark filesystems that don't need MNT_NODEV
    when mounted by an unprivileged user.

    - Relax the permission checks to allow unprivileged users that have
    CAP_SYS_ADMIN permissions in the user namespace referred to by the
    current mount namespace to be allowed to mount, unmount, and move
    filesystems.

    Acked-by: "Serge E. Hallyn"
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Sharing mount subtrees with mount namespaces created by unprivileged
    users allows unprivileged mounts created by unprivileged users to
    propagate to mount namespaces controlled by privileged users.

    Prevent nasty consequences by changing shared subtrees to slave
    subtrees when an unprivileged user creates a new mount namespace.

    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
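
    The downgrade can be pictured as a small mapping applied to each
    mount while the new namespace's copy of the tree is built. This is
    a simplified userspace model, assuming only three propagation
    states; enum propagation and propagation_in_new_ns() are
    hypothetical names.

```c
#include <stdbool.h>

/* Simplified propagation states for one mount (model only). */
enum propagation { PROP_PRIVATE, PROP_SLAVE, PROP_SHARED };

/*
 * Propagation type a mount gets in a freshly created mount namespace.
 * Shared mounts become slaves when the creator is unprivileged, so
 * events still flow into the new namespace but never back out of it.
 */
enum propagation propagation_in_new_ns(enum propagation src, bool creator_privileged)
{
    if (src == PROP_SHARED && !creator_privileged)
        return PROP_SLAVE;
    return src;
}
```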
     
  • This will allow for support for unprivileged mounts in a new user namespace.

    Acked-by: "Serge E. Hallyn"
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • setns support for the mount namespace is a little tricky as an
    arbitrary decision must be made about what to set fs->root and
    fs->pwd to, as there is no expectation of a relationship between
    the two mount namespaces. Therefore I arbitrarily find the root
    mount point, and follow every mount on top of it to find the top
    of the mount stack. Then I set fs->root and fs->pwd to that
    location. The topmost root of the mount stack seems like a
    reasonable place to be.

    Bind mount support for the mount namespace inodes has the
    possibility of creating circular dependencies between mount
    namespaces. Circular dependencies can result in loops that
    prevent mount namespaces from ever being freed. I avoid
    creating those circular dependencies by adding a sequence number
    to the mount namespace and requiring that all bind mounts be of a
    younger mount namespace into an older mount namespace.

    Add a helper function proc_ns_inode so it is possible to
    detect when we are attempting to bind mount a namespace inode.

    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
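
    The sequence-number rule can be sketched as follows. This is a
    userspace model, not the kernel implementation: struct
    mnt_ns_model, mnt_ns_model_init() and may_bind_ns_inode() are
    hypothetical names, and a plain counter stands in for whatever the
    kernel uses to hand out sequence numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of a mount namespace; only the sequence number matters here. */
struct mnt_ns_model {
    uint64_t seq; /* handed out at creation; a larger value means younger */
};

static uint64_t next_seq = 1;

void mnt_ns_model_init(struct mnt_ns_model *ns)
{
    ns->seq = next_seq++;
}

/*
 * May the namespace inode of `target` be bind mounted inside `host`?
 * Only a strictly younger namespace may be mounted into an older one,
 * so references always point from older to younger and cannot loop.
 */
bool may_bind_ns_inode(const struct mnt_ns_model *host,
                       const struct mnt_ns_model *target)
{
    return target->seq > host->seq;
}
```

    Because the comparison is strict, a namespace can never hold a bind
    mount of its own inode either.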
     

13 Oct, 2012

1 commit

  • getname() is intended to copy pathname strings from userspace into a
    kernel buffer. The result is just a string in kernel space. It would
    however be quite helpful to be able to attach some ancillary info to
    the string.

    For instance, we could attach some audit-related info to reduce the
    amount of audit-related processing needed. When auditing is enabled,
    we could also call getname() on the string more than once and not
    need to recopy it from userspace.

    This patchset converts the getname()/putname() interfaces to return
    a struct instead of a string. For now, the struct just tracks the
    string in kernel space and the original userland pointer for it.

    Later, we'll add other information to the struct as it becomes
    convenient.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
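
    The shape of the new interface can be illustrated in userspace. The
    sketch below only mirrors the two fields the message mentions (the
    copied string and the original pointer); struct filename_model,
    getname_model() and putname_model() are hypothetical names, and a
    plain memcpy() stands in for the copy from userspace.

```c
#include <stdlib.h>
#include <string.h>

/* Model of the struct returned instead of a bare string. */
struct filename_model {
    char *name;       /* private copy of the pathname */
    const char *uptr; /* the caller's original pointer, kept for reference */
};

/* Copy `upath` and wrap it; returns NULL on allocation failure. */
struct filename_model *getname_model(const char *upath)
{
    struct filename_model *fn = malloc(sizeof(*fn));
    if (!fn)
        return NULL;

    size_t len = strlen(upath) + 1;
    fn->name = malloc(len);
    if (!fn->name) {
        free(fn);
        return NULL;
    }
    memcpy(fn->name, upath, len);
    fn->uptr = upath;
    return fn;
}

/* Release a wrapper obtained from getname_model(); NULL is accepted. */
void putname_model(struct filename_model *fn)
{
    if (!fn)
        return;
    free(fn->name);
    free(fn);
}
```

    Keeping the original pointer next to the copy is what lets a later
    lookup recognise a path that has already been copied.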
     

12 Oct, 2012

1 commit


23 Sep, 2012

1 commit

  • normally we deal with lock_mount()/umount races by checking that
    the mountpoint-to-be is still in our namespace after lock_mount() has
    been done. However, do_add_mount() skips that check when called
    with MNT_SHRINKABLE in flags (i.e. from finish_automount()). The
    reason is that ->mnt_ns may be a temporary namespace created exactly
    to contain automounts a-la NFS4 referral handling. It's not the
    namespace of the caller, though, so check_mnt() would fail here.
    We still need to check that ->mnt_ns is non-NULL in that case,
    though.

    Signed-off-by: Al Viro

    Al Viro
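
    The resulting decision can be modelled as below. This is a
    userspace sketch of the logic as described in the message, not the
    kernel function: MODEL_SHRINKABLE, struct ns_model, struct
    mount_model and parent_still_usable() are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SHRINKABLE 0x1 /* stand-in for MNT_SHRINKABLE */

struct ns_model { int id; };

struct mount_model {
    struct ns_model *mnt_ns; /* NULL once the mount has been unmounted */
};

/* Is it still safe to attach a new mount on top of `parent`? */
bool parent_still_usable(const struct mount_model *parent,
                         const struct ns_model *caller_ns, int flags)
{
    /* Normal case: the mountpoint is still in the caller's namespace. */
    if (parent->mnt_ns == caller_ns)
        return true;

    /* A foreign namespace is tolerated only for automounts ... */
    if (!(flags & MODEL_SHRINKABLE))
        return false;

    /* ... and even then the mountpoint must not have been unmounted. */
    return parent->mnt_ns != NULL;
}
```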
     

31 Jul, 2012

1 commit

  • Most of the places where we want freeze protection coincide with the places
    where we also have remount-ro protection. So make mnt_want_write() and
    mnt_drop_write() (and their _file alternatives) prevent freezing as well.
    For the few cases that are really interested only in remount-ro protection,
    provide new function variants.

    BugLink: https://bugs.launchpad.net/bugs/897421
    Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

14 Jul, 2012

4 commits

  • Add comments describing what the directions "up" and "down" mean and ref count
    handling to the VFS mount following family of functions.

    Signed-off-by: Valerie Aurora (Original author)
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • copy_tree() can theoretically fail in a case other than ENOMEM, but always
    returns NULL which is interpreted by callers as -ENOMEM. Change it to return
    an explicit error.

    Also change clone_mnt() for consistency and because union mounts will add new
    error cases.

    Thanks to Andreas Gruenbacher for a bug fix.
    [AV: folded braino fix by Dan Carpenter]

    Original-author: Valerie Aurora
    Signed-off-by: David Howells
    Cc: Valerie Aurora
    Cc: Andreas Gruenbacher
    Signed-off-by: Al Viro

    David Howells
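
    Returning an explicit error through a pointer follows the kernel's
    ERR_PTR()/IS_ERR()/PTR_ERR() idiom. The sketch below re-creates
    that idiom in userspace with lower-case helpers so it can be
    compiled on its own; struct tree_model and copy_tree_model() are
    hypothetical, and the -EINVAL case is invented purely to show a
    failure that is not -ENOMEM.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the error-pointer helpers. */
static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int is_err(const void *ptr)
{
    /* Error codes occupy the last MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct tree_model { int nodes; };

/* Returns a heap copy of `src`, or an error pointer saying why not. */
struct tree_model *copy_tree_model(const struct tree_model *src)
{
    if (src->nodes < 0)
        return err_ptr(-EINVAL); /* illustrative non-ENOMEM failure */

    struct tree_model *copy = malloc(sizeof(*copy));
    if (!copy)
        return err_ptr(-ENOMEM);

    *copy = *src;
    return copy;
}
```

    Callers test the result with is_err() and extract the code with
    ptr_err(), instead of mapping every NULL to -ENOMEM.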
     
  • don't rely on proc_mounts->m being the first field; container_of()
    is there for a purpose. No need to bother with ->private, while
    we are at it - the same container_of will do nicely.

    Signed-off-by: Al Viro

    Al Viro
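
    A userspace illustration of why container_of() removes the
    dependency on field order. The struct layout here is hypothetical
    and deliberately places the embedded member second; the macro is a
    simplified version without the kernel's type checking.

```c
#include <stddef.h>

/* Simplified container_of(): step back from a member to its container. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct seq_model { int pos; };

/* Hypothetical layout: the embedded member is NOT the first field. */
struct proc_mounts_model {
    int ns_id;
    struct seq_model m;
};

/* Recover the container from a pointer to its embedded member. */
struct proc_mounts_model *to_proc_mounts(struct seq_model *m)
{
    return container_of(m, struct proc_mounts_model, m);
}
```

    A plain cast from the member pointer to the container type would
    only be correct while the member stays at offset zero.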
     
  • it's enough to set ->mnt_ns of internal vfsmounts to something
    distinct from all struct mnt_namespace out there; then we can
    just use the check for ->mnt_ns != NULL in the fast path of
    mntput_no_expire()

    Signed-off-by: Al Viro

    Al Viro
     

31 May, 2012

1 commit


30 May, 2012

1 commit

  • lglocks and brlocks are currently generated with some complicated macros
    in lglock.h. But there's no reason to not just use common utility
    functions and put all the data into a common data structure.

    In preparation, this patch changes the API to look more like normal
    function calls with pointers, not magic macros.

    The patch is rather large because I move over all users in one go to keep
    it bisectable. This impacts the VFS somewhat in terms of lines changed.
    But no actual behaviour change.

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Andi Kleen
    Cc: Al Viro
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Rusty Russell
    Signed-off-by: Al Viro

    Andi Kleen
     

09 Jan, 2012

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
    Kconfig: acpi: Fix typo in comment.
    misc latin1 to utf8 conversions
    devres: Fix a typo in devm_kfree comment
    btrfs: free-space-cache.c: remove extra semicolon.
    fat: Spelling s/obsolate/obsolete/g
    SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
    tools/power turbostat: update fields in manpage
    mac80211: drop spelling fix
    types.h: fix comment spelling for 'architectures'
    typo fixes: aera -> area, exntension -> extension
    devices.txt: Fix typo of 'VMware'.
    sis900: Fix enum typo 'sis900_rx_bufer_status'
    decompress_bunzip2: remove invalid vi modeline
    treewide: Fix comment and string typo 'bufer'
    hyper-v: Update MAINTAINERS
    treewide: Fix typos in various parts of the kernel, and fix some comments.
    clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
    gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
    leds: Kconfig: Fix typo 'D2NET_V2'
    sound: Kconfig: drop unknown symbol ARCH_CLPS7500
    ...

    Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
    kconfig additions, close to removed commented-out old ones)

    Linus Torvalds
     

07 Jan, 2012

4 commits

  • If there are any inodes on the super block that have been unlinked
    (i_nlink == 0) but have not yet been deleted, then prevent
    remounting the super block read-only.

    Reported-by: Toshiyuki Okajima
    Signed-off-by: Miklos Szeredi
    Tested-by: Toshiyuki Okajima
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • Currently remounting a superblock read-only is racy in a major way.

    With the per mount read-only infrastructure it is now possible to
    prevent most races, which this patch attempts.

    Before starting the remount read-only, iterate through all mounts
    belonging to the superblock and if none of them have any pending
    writes, set sb->s_readonly_remount. This indicates that remount is in
    progress and no further write requests are allowed. If the remount
    succeeds set MS_RDONLY and reset s_readonly_remount.

    If the remounting is unsuccessful just reset s_readonly_remount.
    This can result in transient EROFS errors, despite the fact the
    remount failed. Unfortunately holding off writes is difficult as
    remount itself may touch the filesystem (e.g. through load_nls())
    which would deadlock.

    A later patch deals with delayed writes due to nlink going to zero.

    Signed-off-by: Miklos Szeredi
    Tested-by: Toshiyuki Okajima
    Signed-off-by: Al Viro

    Miklos Szeredi
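
    The protocol can be walked through with a single-threaded model. It
    ignores locking and memory ordering entirely, collapses the
    per-mount writer counts into one counter, and uses hypothetical
    names throughout; the -EBUSY result for a busy superblock is an
    assumption, since the message does not name the error.

```c
#include <errno.h>
#include <stdbool.h>

struct sb_model {
    bool read_only;        /* stands in for MS_RDONLY */
    bool readonly_remount; /* stands in for sb->s_readonly_remount */
    int pending_writers;   /* pending writes summed over all mounts */
};

/* A writer asks for write access; refused while a remount is in flight. */
int want_write(struct sb_model *sb)
{
    if (sb->read_only || sb->readonly_remount)
        return -EROFS;
    sb->pending_writers++;
    return 0;
}

void drop_write(struct sb_model *sb)
{
    sb->pending_writers--;
}

/* Step 1: only proceed if nobody holds write access. */
int remount_ro_begin(struct sb_model *sb)
{
    if (sb->pending_writers > 0)
        return -EBUSY; /* assumed error code */
    sb->readonly_remount = true;
    return 0;
}

/* Step 2: commit on success, and clear the in-progress flag either way. */
void remount_ro_end(struct sb_model *sb, bool succeeded)
{
    if (succeeded)
        sb->read_only = true;
    sb->readonly_remount = false;
}
```

    Calling want_write() between remount_ro_begin() and a failed
    remount_ro_end() yields the transient EROFS mentioned above.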
     
  • Keep track of vfsmounts belonging to a superblock. List is protected
    by vfsmount_lock.

    Signed-off-by: Miklos Szeredi
    Tested-by: Toshiyuki Okajima
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • Signed-off-by: Al Viro

    Al Viro
     

04 Jan, 2012

18 commits