26 Oct, 2010

1 commit

  • If clone_mnt() happens while mnt_make_readonly() is running, the
    cloned mount might have MNT_WRITE_HOLD flag set, which results in
    mnt_want_write() spinning forever on this mount.

    Triggering this deliberately requires CAP_SYS_ADMIN, and it is unlikely
    to happen accidentally. But if it does happen it can hang the machine.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
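
    The commit text above names the problem but not the change itself. One
    plausible shape of a fix is to mask the transient hold bit out while
    copying flags into the clone. The sketch below is userspace C for
    illustration only: struct sketch_mount, sketch_clone_mount() and the
    SKETCH_* flag values are assumptions, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical flag values, chosen only for this illustration. */
#define SKETCH_MNT_READONLY   0x40u
#define SKETCH_MNT_WRITE_HOLD 0x200u

/* Hypothetical stand-in for a mount; only the flags word matters here. */
struct sketch_mount {
    unsigned int flags;
};

/*
 * Copy the flags of an existing mount into a new one, dropping the
 * transient "write hold" bit so the clone never starts out held.
 */
static struct sketch_mount sketch_clone_mount(const struct sketch_mount *old)
{
    struct sketch_mount clone;

    assert(old != 0);
    clone.flags = old->flags & ~SKETCH_MNT_WRITE_HOLD;
    return clone;
}
```

    With the bit masked out at copy time, a writer waiting on the clone has
    nothing to spin on, whatever state the original was in.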
     

05 Oct, 2010

1 commit

  • After pushing down the BKL to the get_sb/fill_super operations of the
    filesystems that still make use of the BKL, it is safe to remove it from
    do_new_mount().

    I've read through all the code formerly covered by the BKL inside
    do_kern_mount() and have satisfied myself that it doesn't need the BKL
    any more.

    Signed-off-by: Jan Blunck
    Cc: Matthew Wilcox
    Signed-off-by: Arnd Bergmann

    Jan Blunck
     

08 Sep, 2010

1 commit

  • Sanity check the flags passed to change_mnt_propagation(). Exactly
    one flag should be set. Return EINVAL otherwise.

    Userspace can pass in arbitrary combinations of MS_* flags to mount().
    do_change_type() is called if any of MS_SHARED, MS_PRIVATE, MS_SLAVE,
    or MS_UNBINDABLE is set. do_change_type() clears MS_REC and then
    calls change_mnt_propagation() with the rest of the user-supplied
    flags. change_mnt_propagation() clearly assumes only one flag is set
    but do_change_type() does not check that this is true. For example,
    mount() with flags MS_SHARED | MS_RDONLY does not actually make the
    mount shared or read-only but does clear MNT_UNBINDABLE.

    Signed-off-by: Valerie Aurora
    Signed-off-by: Linus Torvalds

    Valerie Aurora
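
    The "exactly one flag" rule described above can be sketched in userspace
    C as follows. The SK_MS_* constants and sketch_propagation_type() are
    illustrative stand-ins and assumptions of this sketch, not the kernel's
    code.

```c
#include <errno.h>

/* Illustrative stand-ins for the MS_* bits; values are assumptions. */
#define SK_MS_RDONLY      1UL
#define SK_MS_REC         16384UL
#define SK_MS_UNBINDABLE  (1UL << 17)
#define SK_MS_PRIVATE     (1UL << 18)
#define SK_MS_SLAVE       (1UL << 19)
#define SK_MS_SHARED      (1UL << 20)

/*
 * Return the single propagation type contained in flags, or -EINVAL if
 * the flags (ignoring the recursion bit) are not exactly one such type.
 */
static long sketch_propagation_type(unsigned long flags)
{
    unsigned long type = flags & ~SK_MS_REC;

    /* Anything other than the four propagation types is rejected. */
    if (type & ~(SK_MS_SHARED | SK_MS_PRIVATE | SK_MS_SLAVE | SK_MS_UNBINDABLE))
        return -EINVAL;

    /* Exactly one bit set: non-zero and a power of two. */
    if (type == 0 || (type & (type - 1)) != 0)
        return -EINVAL;

    return (long)type;
}
```

    Under this check, the MS_SHARED | MS_RDONLY combination from the example
    above is rejected instead of silently clearing MNT_UNBINDABLE.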
     

18 Aug, 2010

1 commit

  • fs: brlock vfsmount_lock

    Use a brlock for the vfsmount lock. It must be taken for write whenever
    modifying the mount hash or associated fields, and may be taken for read when
    performing mount hash lookups.

    A new lock is added for the mnt-id allocator, so it doesn't need to take
    the heavy vfsmount write-lock.

    The number of atomics should remain the same for fastpath rlock cases, though
    code would be slightly slower due to per-cpu access. Scalability will not be
    much improved in common cases yet, due to other locks (i.e. dcache_lock) getting
    in the way. However, path lookups crossing mountpoints should be one case where
    scalability is improved (currently requiring the global lock).

    The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node
    Altix system (high latency to remote nodes), a simple umount microbenchmark
    (mount --bind mnt mnt2 ; umount mnt2, looped 1000 times) took 6.8s before this
    patch and 7.1s afterwards, about 5% slower.

    Cc: Al Viro
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     

11 Aug, 2010

3 commits

  • Commit d0adde574b8487ef30f69e2d08bba769e4be513f added MNT_STRICTATIME
    but it isn't actually used (MS_STRICTATIME clears MNT_RELATIME and
    MNT_NOATIME rather than setting any mount flag).

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • Add three helpers that retrieve a refcounted copy of the root and cwd
    from the supplied fs_struct.

    get_fs_root()
    get_fs_pwd()
    get_fs_root_and_pwd()

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • * 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits)
    fanotify: use both marks when possible
    fsnotify: pass both the vfsmount mark and inode mark
    fsnotify: walk the inode and vfsmount lists simultaneously
    fsnotify: rework ignored mark flushing
    fsnotify: remove global fsnotify groups lists
    fsnotify: remove group->mask
    fsnotify: remove the global masks
    fsnotify: cleanup should_send_event
    fanotify: use the mark in handler functions
    audit: use the mark in handler functions
    dnotify: use the mark in handler functions
    inotify: use the mark in handler functions
    fsnotify: send fsnotify_mark to groups in event handling functions
    fsnotify: Exchange list heads instead of moving elements
    fsnotify: srcu to protect read side of inode and vfsmount locks
    fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called
    fsnotify: use _rcu functions for mark list traversal
    fsnotify: place marks on object in order of group memory address
    vfs/fsnotify: fsnotify_close can delay the final work in fput
    fsnotify: store struct file not struct path
    ...

    Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.

    Linus Torvalds
     

10 Aug, 2010

1 commit

  • If sget() finds a matching superblock being set up, it'll
    grab an active reference to it and grab s_umount. That's
    fine - we'll wait for completion of foofs_get_sb() that way.
    However, if said foofs_get_sb() fails we'll end up holding
    the halfway-created superblock. deactivate_locked_super()
    called by foofs_get_sb() will just unlock the sucker since
    we are holding another active reference to it.

    What we need is a way to tell if superblock has been successfully
    set up. Unfortunately, neither ->s_root nor the check for
    MS_ACTIVE quite fit. Cheap and easy way, suitable for backport:
    new flag set by the (only) caller of ->get_sb(). If that flag
    isn't present by the time sget() grabbed s_umount on preexisting
    superblock it has found, it's seeing a stillborn and should
    just bury it with deactivate_locked_super() (and repeat the search).

    Longer term we want to set that flag in ->get_sb() instances (and
    check for it to distinguish between "sget() found us a live sb"
    and "sget() has allocated an sb, we need to set it up" in there,
    instead of checking ->s_root as we do now).

    Signed-off-by: Al Viro
    Cc: stable@kernel.org

    Al Viro
     

28 Jul, 2010

2 commits


18 May, 2010

1 commit


15 May, 2010

1 commit

  • 1) i_flags simply doesn't work for mount/unlink race prevention;
    we may have many links to file and rm on one of those obviously
    shouldn't prevent bind on top of another later on. To fix it the
    right way we need to mark _dentry_ as unsuitable for mounting
    upon; new flag (DCACHE_CANT_MOUNT) is protected by d_flags and
    i_mutex on the inode in question. Set it (with dont_mount(dentry))
    in unlink/rmdir/etc., check (with cant_mount(dentry)) in places
    in namespace.c that used to check for S_DEAD. Setting S_DEAD
    is still needed in places where we used to set it (for directories
    getting killed), since we rely on it for readdir/rmdir race
    prevention.

    2) rename()/mount() protection has another bogosity - we unhash
    the target before we'd checked that it's not a mountpoint. Fixed.

    3) ancient bogosity in pivot_root() - we locked i_mutex on the
    right directory, but checked S_DEAD on the different (and wrong)
    one. Noticed and fixed.

    Signed-off-by: Al Viro

    Al Viro
     

12 Apr, 2010

6 commits


04 Mar, 2010

7 commits

  • Add a new UMOUNT_NOFOLLOW flag to umount(2). This is needed to prevent
    symlink attacks in unprivileged unmounts (fuse, samba, ncpfs).

    Additionally, return -EINVAL if an unknown flag is used (and specify
    an explicitly unused flag: UMOUNT_UNUSED). This makes it possible for
    the caller to determine if a flag is supported or not.

    CC: Eugene Teo
    CC: Michael Kerrisk
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
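
    The unknown-flag rejection described above amounts to masking against
    the set of known flags. A minimal userspace sketch, assuming the SK_*
    values below and the hypothetical helper sketch_check_umount_flags()
    (neither is taken from the source):

```c
#include <errno.h>

/* Illustrative stand-ins for the umount(2) flag bits; values are assumptions. */
#define SK_MNT_FORCE        0x1u
#define SK_MNT_DETACH       0x2u
#define SK_MNT_EXPIRE       0x4u
#define SK_UMOUNT_NOFOLLOW  0x8u
#define SK_UMOUNT_UNUSED    0x80000000u  /* explicitly never a valid flag */

/* Return 0 if every bit in flags is a known flag, -EINVAL otherwise. */
static int sketch_check_umount_flags(unsigned int flags)
{
    const unsigned int known = SK_MNT_FORCE | SK_MNT_DETACH |
                               SK_MNT_EXPIRE | SK_UMOUNT_NOFOLLOW;

    if (flags & ~known)
        return -EINVAL;
    return 0;
}
```

    Because the explicitly unused bit is always rejected, a caller that sees
    it accepted knows the implementation does no flag validation at all and
    cannot be relied on to honour newer flags.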
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • It hadn't been needed since we'd sanitized the logics in
    mark_mounts_for_expiry() (which, in turn, used to be a
    rudiment of bad old times when namespace_sem was per-ns).

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Apply a function to each vfsmount in the set returned by
    collect_mounts(); stop as soon as it returns non-zero.

    Signed-off-by: Al Viro

    Al Viro
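
    The iterate-and-stop pattern described above can be sketched in
    userspace C. The list type, sketch_iterate_mounts() and the example
    callback are hypothetical names for illustration; the kernel's own data
    structures and signature may differ.

```c
#include <stddef.h>

/* Hypothetical stand-in for one mount in a collected set. */
struct sketch_mnt {
    int id;
    struct sketch_mnt *next;
};

/*
 * Call fn on each mount in the list, in order. Stop at the first
 * non-zero return value and pass it back to the caller.
 */
static int sketch_iterate_mounts(int (*fn)(struct sketch_mnt *, void *),
                                 void *arg, struct sketch_mnt *head)
{
    struct sketch_mnt *m;

    for (m = head; m != NULL; m = m->next) {
        int res = fn(m, arg);

        if (res)
            return res;
    }
    return 0;
}

/* Example callback: count visits and fail on the mount with id 2. */
static int sketch_stop_at_two(struct sketch_mnt *m, void *arg)
{
    int *visited = arg;

    (*visited)++;
    return m->id == 2 ? -1 : 0;
}
```

    Returning the callback's own value lets the caller tell apart "walked
    everything" (zero) from "stopped early, and why".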
     
  • The handling of mount flags in set_mnt_shared() got a little tangled
    up during previous cleanups, with the following problems:

    * MNT_PNODE_MASK is defined as a literal constant when it should be a
    bitwise OR of other MNT_* flags
    * set_mnt_shared() clears and then sets MNT_SHARED (part of MNT_PNODE_MASK)
    * MNT_PNODE_MASK could use a comment in mount.h
    * MNT_PNODE_MASK is a terrible name, change to MNT_SHARED_MASK

    This patch fixes these problems.

    Signed-off-by: Al Viro

    Valerie Aurora
     
  • First of all, get_source() never results in CL_PROPAGATION
    alone. We either get CL_MAKE_SHARED (for the continuation
    of peer group) or CL_SLAVE (slave that is not shared) or both
    (beginning of peer group among slaves). Massage the code to
    make that explicit, kill CL_PROPAGATION test in clone_mnt()
    (nothing sets CL_MAKE_SHARED without CL_PROPAGATION and in
    clone_mnt() we are checking CL_PROPAGATION after we'd found
    that there's no CL_SLAVE, so the check for CL_MAKE_SHARED
    would do just as well).

    Fix comments, while we are at it...

    Signed-off-by: Al Viro

    Al Viro
     

17 Jan, 2010

4 commits


18 Dec, 2009

1 commit

  • This reverts commit e9496ff46a20a8592fdc7bdaaf41b45eb808d310. Quoth Al:

    "it's dependent on a lot of other stuff not currently in mainline
    and badly broken with current fs/namespace.c. Sorry, badly
    out-of-order cherry-pick from old queue.

    PS: there's a large pending series reworking the refcounting and
    lifetime rules for vfsmounts that will, among other things, allow to
    rip a subtree away _without_ dissolving connections in it, to be
    garbage-collected when all active references are gone. It's
    considerably saner wrt "is the subtree busy" logics, but it's nowhere
    near being ready for merge at the moment; this changeset is one of the
    things becoming possible with that sucker, but it certainly shouldn't
    have been picked during this cycle. My apologies..."

    Noticed-by: Eric Paris
    Requested-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

17 Dec, 2009

1 commit


12 Oct, 2009

1 commit

  • This patch allows LSM modules to make decisions based on the original
    mount flags passed to mount(). An LSM module can get the masked mount
    flags (if needed) by

    flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
               MS_NOATIME | MS_NODIRATIME | MS_RELATIME | MS_KERNMOUNT |
               MS_STRICTATIME);

    Signed-off-by: Tetsuo Handa
    Signed-off-by: James Morris

    Tetsuo Handa
     

24 Sep, 2009

1 commit

  • sys_mount() reads/copies a whole page for its "type" parameter. When
    do_mount_root() passes a kernel address that points to an object which is
    smaller than a whole page, copy_mount_options() will happily go past this
    memory object, possibly dereferencing "wild" pointers that could be in any
    state (hence the kmemcheck warning, which shows that parts of the next
    page are not even allocated).

    (The likelihood of something going wrong here is pretty low -- first of
    all this only applies to kernel calls to sys_mount(), which are mostly
    found in the boot code. Secondly, I guess if the page was not mapped,
    exact_copy_from_user() _would_ in fact handle it correctly because of its
    access_ok(), etc. checks.)

    But it is much nicer to avoid the dubious reads altogether, by stopping as
    soon as we find a NUL byte. Is there a good reason why we can't do
    something like this, using the already existing strndup_from_user()?

    [akpm@linux-foundation.org: make copy_mount_string() static]
    [AV: fix compat mount breakage, which involves undoing akpm's change above]

    Reported-by: Ingo Molnar
    Signed-off-by: Vegard Nossum
    Cc: Al Viro
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: al

    Vegard Nossum
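
    The idea of stopping at the first NUL byte instead of copying a whole
    page can be sketched in userspace C. This is only an analogue: it uses
    malloc() and plain memory reads rather than user-space access helpers,
    and sketch_copy_string() is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate the string at src, reading at most max bytes and stopping
 * at the first NUL. Returns a NUL-terminated heap copy, or NULL if src
 * is NULL or allocation fails. The caller frees the result.
 */
static char *sketch_copy_string(const char *src, size_t max)
{
    size_t len;
    char *dst;

    if (src == NULL)
        return NULL;

    /* Never look past the terminator or past src[max - 1]. */
    for (len = 0; len < max && src[len] != '\0'; len++)
        ;

    dst = malloc(len + 1);
    if (dst == NULL)
        return NULL;

    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}
```

    For a short filesystem type name, only those few bytes and the
    terminator are ever read, so memory beyond the source object is never
    touched.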
     

08 Aug, 2009

1 commit

  • I suspect that mnt_want_write_file() may have wrong assumption. I think
    mnt_want_write_file() is assuming it increments ->mnt_writers if
    (file->f_mode & FMODE_WRITE). But, if it's special_file(), it is false?

    Signed-off-by: OGAWA Hirofumi
    Acked-by: Dave Hansen
    Cc: Al Viro
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     

09 Jul, 2009

1 commit

  • Fix various silly problems wrt mnt_namespace.h:

    - exit_mnt_ns() isn't used, remove it
    - with that done, sched.h and nsproxy.h inclusions aren't needed
    - mount.h inclusion was needed for vfsmount_lock, but no longer
    - remove mnt_namespace.h inclusion from files which don't use anything
    from mnt_namespace.h

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

24 Jun, 2009

2 commits


23 Jun, 2009

2 commits

  • The purpose of this patch is to improve the remote mount path lookup
    support for distributed filesystems such as the NFSv4 client.

    When given a mount command of the form "mount server:/foo/bar /mnt", the
    NFSv4 client is required to look up the filehandle for "server:/", and
    then look up each component of the remote mount path "foo/bar" in order
    to find the directory that is actually going to be mounted on /mnt.
    Following that remote mount path may involve following symlinks,
    crossing server-side mount points and even following referrals to
    filesystem volumes on other servers.

    Since the standard VFS path lookup code already supports walking paths
    that contain all these features (using in-kernel automounts for
    following referrals) we would like to be able to reuse that rather than
    duplicate the full path traversal functionality in the NFSv4 client code.

    This patch therefore defines a VFS helper function create_mnt_ns() that
    sets up a temporary filesystem namespace and attaches a root filesystem to
    it. It exports the create_mnt_ns() and put_mnt_ns() functions for use by
    filesystem modules.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     
  • In order to allow modules to use it without having to export vfsmount_lock.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     

12 Jun, 2009

1 commit