18 May, 2010

1 commit


15 May, 2010

1 commit

  • 1) i_flags simply doesn't work for mount/unlink race prevention;
    we may have many links to a file and rm on one of those obviously
    shouldn't prevent bind on top of another later on. To fix it the
    right way we need to mark _dentry_ as unsuitable for mounting
    upon; new flag (DCACHE_CANT_MOUNT) lives in d_flags and is protected
    by d_lock and i_mutex on the inode in question. Set it (with
    dont_mount(dentry))
    in unlink/rmdir/etc., check (with cant_mount(dentry)) in places
    in namespace.c that used to check for S_DEAD. Setting S_DEAD
    is still needed in places where we used to set it (for directories
    getting killed), since we rely on it for readdir/rmdir race
    prevention.

    2) rename()/mount() protection has another bogosity - we unhash
    the target before we'd checked that it's not a mountpoint. Fixed.

    3) ancient bogosity in pivot_root() - we locked i_mutex on the
    right directory, but checked S_DEAD on the different (and wrong)
    one. Noticed and fixed.

    Signed-off-by: Al Viro

    Al Viro
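
A minimal user-space sketch of point 1, showing why the marker has to live on the dentry rather than on the inode. Every name prefixed with toy_ or TOY_ is hypothetical, the flag value is illustrative, and the locking described in the commit message is omitted entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative value only; not taken from the kernel headers. */
#define TOY_DCACHE_CANT_MOUNT 0x0100u

/* One inode may be reachable through several names (hard links). */
struct toy_inode  { int nlink; };
struct toy_dentry { unsigned int d_flags; struct toy_inode *d_inode; };

/* Mark this particular name as unsuitable for mounting upon. */
static void toy_dont_mount(struct toy_dentry *dentry)
{
    assert(dentry != NULL);
    dentry->d_flags |= TOY_DCACHE_CANT_MOUNT;
}

/* Check used by the (modelled) mount path. */
static bool toy_cant_mount(const struct toy_dentry *dentry)
{
    assert(dentry != NULL);
    return (dentry->d_flags & TOY_DCACHE_CANT_MOUNT) != 0;
}

/* Removing one name flags only that name; the inode stays untouched
 * apart from its link count, so other links remain mountable. */
static void toy_unlink(struct toy_dentry *dentry)
{
    toy_dont_mount(dentry);
    dentry->d_inode->nlink--;
}
```

With two dentries pointing at the same inode, calling toy_unlink() on one leaves toy_cant_mount() false for the other, which is the behaviour an inode-wide flag could not provide.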
     

12 Apr, 2010

6 commits


04 Mar, 2010

7 commits

  • Add a new UMOUNT_NOFOLLOW flag to umount(2). This is needed to prevent
    symlink attacks in unprivileged unmounts (fuse, samba, ncpfs).

    Additionally, return -EINVAL if an unknown flag is used (and specify
    an explicitly unused flag: UMOUNT_UNUSED). This makes it possible for
    the caller to determine if a flag is supported or not.

    CC: Eugene Teo
    CC: Michael Kerrisk
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
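
The flag check described above can be modelled in user space as follows. This is a sketch under assumptions, not the kernel code: the TOY_ constants mirror the values commonly found in the Linux headers, and toy_check_umount_flags() is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>

#define TOY_MNT_FORCE        0x00000001u
#define TOY_MNT_DETACH       0x00000002u
#define TOY_MNT_EXPIRE       0x00000004u
#define TOY_UMOUNT_NOFOLLOW  0x00000008u
/* Reserved bit that is guaranteed never to become a valid flag. */
#define TOY_UMOUNT_UNUSED    0x80000000u

/* Reject any bit outside the known set, so that a caller can tell
 * "flag not supported" (-EINVAL) apart from other failures. */
static int toy_check_umount_flags(unsigned int flags)
{
    const unsigned int known = TOY_MNT_FORCE | TOY_MNT_DETACH |
                               TOY_MNT_EXPIRE | TOY_UMOUNT_NOFOLLOW;

    /* The reserved bit must never be part of the accepted set. */
    assert((known & TOY_UMOUNT_UNUSED) == 0);

    if (flags & ~known)
        return -EINVAL;
    return 0;
}
```

A caller that passes only the reserved bit and gets -EINVAL back can conclude that the kernel validates flags, and therefore that a later success with UMOUNT_NOFOLLOW means the flag was really honoured.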
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • It hadn't been needed since we'd sanitized the logics in
    mark_mounts_for_expiry() (which, in turn, used to be a
    rudiment of bad old times when namespace_sem was per-ns).

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • apply function to vfsmounts in set returned by collect_mounts(),
    stop if it returns non-zero.

    Signed-off-by: Al Viro

    Al Viro
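
The iteration contract described above (call the function on each mount, stop at the first non-zero return) can be sketched like this. The singly linked list and all toy_ names are assumptions made for illustration; the kernel keeps the collected mounts in its own list structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the collected set. */
struct toy_mount { const char *name; struct toy_mount *next; };

/* Apply f to every mount in the set; the first non-zero return value
 * ends the walk and is passed back to the caller. */
static int toy_iterate_mounts(int (*f)(struct toy_mount *, void *),
                              void *arg, struct toy_mount *head)
{
    assert(f != NULL);
    for (struct toy_mount *m = head; m != NULL; m = m->next) {
        int res = f(m, arg);
        if (res)
            return res;
    }
    return 0;
}

/* Example callback: count visits and stop at a given name. */
struct toy_walk { int visited; const char *stop_at; };

static int toy_visit(struct toy_mount *m, void *arg)
{
    struct toy_walk *w = arg;

    w->visited++;
    return w->stop_at != NULL && strcmp(m->name, w->stop_at) == 0;
}
```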
     
  • The handling of mount flags in set_mnt_shared() got a little tangled
    up during previous cleanups, with the following problems:

    * MNT_PNODE_MASK is defined as a literal constant when it should be a
    bitwise OR of other MNT_* flags
    * set_mnt_shared() clears and then sets MNT_SHARED (part of MNT_PNODE_MASK)
    * MNT_PNODE_MASK could use a comment in mount.h
    * MNT_PNODE_MASK is a terrible name, change to MNT_SHARED_MASK

    This patch fixes these problems.

    Signed-off-by: Al Viro

    Valerie Aurora
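
A sketch of the cleaned-up shape, for illustration only. The TOY_ flag values and the exact composition of the mask are assumptions and not a copy of mount.h; the point is that the mask is built from named flags and no longer contains the bit that is set right afterwards.

```c
#include <assert.h>

/* Illustrative flag values. */
#define TOY_MNT_NOSUID      0x0001u
#define TOY_MNT_SHARED      0x1000u
#define TOY_MNT_UNBINDABLE  0x2000u

/* Flags that must be cleared when a mount becomes shared, expressed
 * through named flags instead of a literal constant. */
#define TOY_MNT_SHARED_MASK (TOY_MNT_UNBINDABLE)

/* Clear the conflicting propagation flags, then mark as shared. */
static unsigned int toy_set_mnt_shared(unsigned int mnt_flags)
{
    /* The mask no longer includes the bit we are about to set. */
    assert((TOY_MNT_SHARED_MASK & TOY_MNT_SHARED) == 0);

    mnt_flags &= ~TOY_MNT_SHARED_MASK;
    mnt_flags |= TOY_MNT_SHARED;
    return mnt_flags;
}
```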
     
  • First of all, get_source() never results in CL_PROPAGATION
    alone. We either get CL_MAKE_SHARED (for the continuation
    of peer group) or CL_SLAVE (slave that is not shared) or both
    (beginning of peer group among slaves). Massage the code to
    make that explicit, kill CL_PROPAGATION test in clone_mnt()
    (nothing sets CL_MAKE_SHARED without CL_PROPAGATION and in
    clone_mnt() we are checking CL_PROPAGATION after we'd found
    that there's no CL_SLAVE, so the check for CL_MAKE_SHARED
    would do just as well).

    Fix comments, while we are at it...

    Signed-off-by: Al Viro

    Al Viro
     

17 Jan, 2010

4 commits


18 Dec, 2009

1 commit

  • This reverts commit e9496ff46a20a8592fdc7bdaaf41b45eb808d310. Quoth Al:

    "it's dependent on a lot of other stuff not currently in mainline
    and badly broken with current fs/namespace.c. Sorry, badly
    out-of-order cherry-pick from old queue.

    PS: there's a large pending series reworking the refcounting and
    lifetime rules for vfsmounts that will, among other things, allow to
    rip a subtree away _without_ dissolving connections in it, to be
    garbage-collected when all active references are gone. It's
    considerably saner wrt "is the subtree busy" logics, but it's nowhere
    near being ready for merge at the moment; this changeset is one of the
    things becoming possible with that sucker, but it certainly shouldn't
    have been picked during this cycle. My apologies..."

    Noticed-by: Eric Paris
    Requested-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

17 Dec, 2009

1 commit


12 Oct, 2009

1 commit

  • This patch allows LSM modules to make decisions based on the original
    mount flags passed to mount(). An LSM module can get the masked mount
    flags (if needed) by

    flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
               MS_NOATIME | MS_NODIRATIME | MS_RELATIME | MS_KERNMOUNT |
               MS_STRICTATIME);

    Signed-off-by: Tetsuo Handa
    Signed-off-by: James Morris

    Tetsuo Handa
     

24 Sep, 2009

1 commit

  • sys_mount() reads/copies a whole page for its "type" parameter. When
    do_mount_root() passes a kernel address that points to an object which is
    smaller than a whole page, copy_mount_options() will happily go past this
    memory object, possibly dereferencing "wild" pointers that could be in any
    state (hence the kmemcheck warning, which shows that parts of the next
    page are not even allocated).

    (The likelihood of something going wrong here is pretty low -- first of
    all this only applies to kernel calls to sys_mount(), which are mostly
    found in the boot code. Secondly, I guess if the page was not mapped,
    exact_copy_from_user() _would_ in fact handle it correctly because of its
    access_ok(), etc. checks.)

    But it is much nicer to avoid the dubious reads altogether, by stopping as
    soon as we find a NUL byte. Is there a good reason why we can't do
    something like this, using the already existing strndup_from_user()?

    [akpm@linux-foundation.org: make copy_mount_string() static]
    [AV: fix compat mount breakage, which involves undoing akpm's change above]

    Reported-by: Ingo Molnar
    Signed-off-by: Vegard Nossum
    Cc: Al Viro
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: al

    Vegard Nossum
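
The idea of stopping at the first NUL byte, rather than copying a whole page, can be sketched in user space as below. This is not the kernel's strndup_from_user(); toy_copy_mount_string() and its limit parameter are hypothetical, and malloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy a NUL-terminated string of fewer than `limit` bytes.
 * Reads stop at the terminator, so a short source object is never
 * overrun. Returns a malloc'd copy, or NULL if src is NULL, no
 * terminator is found within the limit, or allocation fails. */
static char *toy_copy_mount_string(const char *src, size_t limit)
{
    size_t len;
    char *dst;

    assert(limit > 0);
    if (src == NULL)
        return NULL;

    /* Look at one byte at a time and never past the NUL. */
    for (len = 0; len < limit; len++)
        if (src[len] == '\0')
            break;
    if (len == limit)
        return NULL;

    dst = malloc(len + 1);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len + 1);  /* includes the terminator */
    return dst;
}
```

For a five-byte object such as "ext3", only those five bytes are ever read, regardless of how large the limit is.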
     

08 Aug, 2009

1 commit

  • I suspect that mnt_want_write_file() may rely on a wrong assumption. I
    think mnt_want_write_file() assumes that ->mnt_writers was incremented if
    (file->f_mode & FMODE_WRITE). But for a special_file(), is that false?

    Signed-off-by: OGAWA Hirofumi
    Acked-by: Dave Hansen
    Cc: Al Viro
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
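
One way to picture the concern is the decision sketched below: the cheap path is only safe when the file is open for write and is not a special file. This is a hedged user-space model, not the actual fix; the toy_ names are hypothetical and the TOY_ constants merely follow the usual FMODE_* and S_IF* values.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_FMODE_READ   0x1u
#define TOY_FMODE_WRITE  0x2u

/* File type bits, following the conventional octal encoding. */
#define TOY_S_IFMT   0170000u
#define TOY_S_IFIFO  0010000u
#define TOY_S_IFCHR  0020000u
#define TOY_S_IFBLK  0060000u
#define TOY_S_IFREG  0100000u
#define TOY_S_IFSOCK 0140000u

enum toy_write_path { TOY_SLOW_WANT_WRITE, TOY_FAST_CLONE_WRITE };

/* Device nodes, FIFOs and sockets count as special files. */
static bool toy_special_file(unsigned int i_mode)
{
    unsigned int type = i_mode & TOY_S_IFMT;

    return type == TOY_S_IFCHR || type == TOY_S_IFBLK ||
           type == TOY_S_IFIFO || type == TOY_S_IFSOCK;
}

/* The fast path assumes a write count is already held on the mount;
 * in this model that is only assumed for non-special files that are
 * open for write. */
static enum toy_write_path toy_pick_write_path(unsigned int f_mode,
                                               unsigned int i_mode)
{
    /* The toy model knows only these two mode bits. */
    assert((f_mode & ~(TOY_FMODE_READ | TOY_FMODE_WRITE)) == 0);

    if (!(f_mode & TOY_FMODE_WRITE) || toy_special_file(i_mode))
        return TOY_SLOW_WANT_WRITE;
    return TOY_FAST_CLONE_WRITE;
}
```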
     

09 Jul, 2009

1 commit

  • Fix various silly problems wrt mnt_namespace.h:

    - exit_mnt_ns() isn't used, remove it
    - done that, sched.h and nsproxy.h inclusions aren't needed
    - mount.h inclusion was needed for vfsmount_lock, but no longer
    - remove mnt_namespace.h inclusion from files which don't use anything
    from mnt_namespace.h

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

24 Jun, 2009

2 commits


23 Jun, 2009

2 commits

  • The purpose of this patch is to improve the remote mount path lookup
    support for distributed filesystems such as the NFSv4 client.

    When given a mount command of the form "mount server:/foo/bar /mnt", the
    NFSv4 client is required to look up the filehandle for "server:/", and
    then look up each component of the remote mount path "foo/bar" in order
    to find the directory that is actually going to be mounted on /mnt.
    Following that remote mount path may involve following symlinks,
    crossing server-side mount points and even following referrals to
    filesystem volumes on other servers.

    Since the standard VFS path lookup code already supports walking paths
    that contain all these features (using in-kernel automounts for
    following referrals) we would like to be able to reuse that rather than
    duplicate the full path traversal functionality in the NFSv4 client code.

    This patch therefore defines a VFS helper function create_mnt_ns(), that
    sets up a temporary filesystem namespace and attaches a root filesystem to
    it. It exports the create_mnt_ns() and put_mnt_ns() functions for use by
    filesystem modules.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     
  • In order to allow modules to use it without having to export vfsmount_lock.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     

12 Jun, 2009

10 commits

  • [folded fix from Jiri Slaby]

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • d_unlinked() will be used in the medium term to ban checkpointing when
    an opened but unlinked file is detected and, in the long term, to detect
    such a situation and special-case it.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Al Viro

    Alexey Dobriyan
     
  • This patch speeds up lmbench lat_mmap test by about another 2% after the
    first patch.

    Before:
    avg = 462.286
    std = 5.46106

    After:
    avg = 453.12
    std = 9.58257

    (50 runs of each, stddev gives a reasonable confidence)

    It does this by introducing mnt_clone_write, which avoids some heavyweight
    operations of mnt_want_write if called on a vfsmount which we know already
    has a write count; and mnt_want_write_file, which can call mnt_clone_write
    if the file is open for write.

    After these two patches, mnt_want_write and mnt_drop_write go from 7% on
    the profile down to 1.3% (including mnt_clone_write).

    [AV: mnt_want_write_file() should take file alone and derive mnt from it;
    not only all callers have that form, but that's the only mnt about which
    we know that it's already held for write if file is opened for write]

    Cc: Dave Hansen
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
     
  • This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up
    basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it.
    A microbenchmark yes, but it exercises some important paths in the mm.

    Before:
    avg = 501.9
    std = 14.7773

    After:
    avg = 462.286
    std = 5.46106

    (50 runs of each, stddev gives a reasonable confidence, but there is quite
    a bit of variation there still)

    It does this by removing the complex per-cpu locking and counter-cache and
    replaces it with a percpu counter in struct vfsmount. This makes the code
    much simpler, and avoids spinlocks (although the msync is still pretty
    costly, unfortunately). It results in about 900 bytes smaller code too. It
    does increase the size of a vfsmount, however.

    It should also give a speedup on large systems if CPUs are frequently operating
    on different mounts (because the existing scheme has to operate on an atomic in
    the struct vfsmount when switching between mounts). But I'm most interested in
    the single threaded path performance for the moment.

    [AV: minor cleanup]

    Cc: Dave Hansen
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
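
The per-CPU counter idea can be modelled in a few lines. This sketch ignores preemption, memory ordering and the read-only transition, and every toy_ name is hypothetical; it only shows that each CPU touches its own slot and that the sum, not any single slot, carries the meaning.

```c
#include <assert.h>

#define TOY_NR_CPUS 4

/* One writer-count slot per CPU, embedded in the mount itself. */
struct toy_vfsmount { int writers[TOY_NR_CPUS]; };

static void toy_inc_writers(struct toy_vfsmount *mnt, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    mnt->writers[cpu]++;    /* no shared cache line, no spinlock */
}

static void toy_dec_writers(struct toy_vfsmount *mnt, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    mnt->writers[cpu]--;    /* may run on a different CPU than the inc */
}

/* Slow path: the total is the sum over all CPUs. An individual slot
 * can be negative when a writer was dropped on another CPU. */
static int toy_count_writers(const struct toy_vfsmount *mnt)
{
    int sum = 0;

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        sum += mnt->writers[cpu];
    return sum;
}
```

The trade-off matches the one named in the message: the fast paths become simple increments, while struct vfsmount grows by one slot per CPU.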
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • These guys are what we add as submounts; checks for "is that attached in
    our namespace" are simply irrelevant for those and counterproductive for
    use of private vfsmount trees a-la what NFS folks want.

    Signed-off-by: Al Viro

    Al Viro
     

09 May, 2009

1 commit

  • Put generic_show_options read access to s_options under rcu_read_lock,
    split save_mount_options() into "we are setting it the first time"
    (uses in foo_fill_super()) and "we are replacing and freeing the old one",
    synchronize_rcu() before kfree() in the latter.

    Signed-off-by: Al Viro

    Al Viro