02 Apr, 2014

1 commit


31 Mar, 2014

2 commits

  • fixes RCU bug - walking through hlist is safe in face of element moves,
    since it's self-terminating. Cyclic lists are not - if we end up jumping
    to another hash chain, we'll loop infinitely without ever hitting the
    original list head.

    [fix for dumb braino folded]

    Spotted by: Max Kellermann
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     
  • * switch allocation to alloc_large_system_hash()
    * make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
    * switch mountpoint_hashtable from list_head to hlist_head

    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     

26 Jan, 2014

1 commit

  • A bug was introduced with the is_mounted helper function in
    commit f7a99c5b7c8bd3d3f533c8b38274e33f3da9096e
    Author: Al Viro
    Date: Sat Jun 9 00:59:08 2012 -0400

    get rid of ->mnt_longterm

    it's enough to set ->mnt_ns of internal vfsmounts to something
    distinct from all struct mnt_namespace out there; then we can
    just use the check for ->mnt_ns != NULL in the fast path of
    mntput_no_expire()

    Signed-off-by: Al Viro

    The intent was to test if the real_mount(vfsmount)->mnt_ns was
    NULL_OR_ERR but the code is actually testing real_mount(vfsmount)
    and always returning true.

    The result is d_absolute_path returning paths it should be hiding.

    Cc: stable@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Al Viro

    Eric W. Biederman
     

09 Nov, 2013

1 commit

  • * RCU-delayed freeing of vfsmounts
    * vfsmount_lock replaced with a seqlock (mount_lock)
    * sequence number from mount_lock is stored in nameidata->m_seq and
    used when we exit RCU mode
    * new vfsmount flag - MNT_SYNC_UMOUNT. Set by umount_tree() when its
    caller knows that vfsmount will have no surviving references.
    * synchronize_rcu() done between unlocking namespace_sem in namespace_unlock()
    and doing pending mntput().
    * new helper: legitimize_mnt(mnt, seq). Checks the mount_lock sequence
    number against seq, then grabs reference to mnt. Then it rechecks mount_lock
    again to close the race and either returns success or drops the reference it
    has acquired. The subtle point is that in case of MNT_SYNC_UMOUNT we can
    simply decrement the refcount and sod off - aforementioned synchronize_rcu()
    makes sure that final mntput() won't come until we leave RCU mode. We need
    that, since we don't want to end up with some lazy pathwalk racing with
    umount() and stealing the final mntput() from it - caller of umount() may
    expect it to return only once the fs is shut down and we don't want to break
    that. In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do
    full-blown mntput() in case of mount_lock sequence number mismatch happening
    just as we'd grabbed the reference, but in those cases we won't be stealing
    the final mntput() from anything that would care.
    * mntput_no_expire() doesn't lock anything on the fast path now. Incidentally,
    SMP and UP cases are handled the same way - no ifdefs there.
    * normal pathname resolution does *not* do any writes to mount_lock. It does,
    of course, bump the refcounts of vfsmount and dentry in the very end, but that's
    it.

    Signed-off-by: Al Viro

    Al Viro
     

25 Oct, 2013

3 commits


10 Apr, 2013

1 commit


20 Nov, 2012

1 commit

  • Assign a unique proc inode to each namespace, and use that
    inode number to ensure we only allocate at most one proc
    inode for every namespace in proc.

    A single proc inode per namespace allows userspace to test
    to see if two processes are in the same namespace.

    This has been a long requested feature and only blocked because
    a naive implementation would put the id in a global space and
    would ultimately require having a namespace for the names of
    namespaces, making migration and certain virtualization tricks
    impossible.

    We still don't have per superblock inode numbers for proc, which
    appears necessary for application unaware checkpoint/restart and
    migrations (if the application is using namespace file descriptors)
    but that is now allowd by the design if it becomes important.

    I have preallocated the ipc and uts initial proc inode numbers so
    their structures can be statically initialized.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

19 Nov, 2012

2 commits

  • This will allow for support for unprivileged mounts in a new user namespace.

    Acked-by: "Serge E. Hallyn"
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • setns support for the mount namespace is a little tricky as an
    arbitrary decision must be made about what to set fs->root and
    fs->pwd to, as there is no expectation of a relationship between
    the two mount namespaces. Therefore I arbitrarily find the root
    mount point, and follow every mount on top of it to find the top
    of the mount stack. Then I set fs->root and fs->pwd to that
    location. The topmost root of the mount stack seems like a
    reasonable place to be.

    Bind mount support for the mount namespace inodes has the
    possibility of creating circular dependencies between mount
    namespaces. Circular dependencies can result in loops that
    prevent mount namespaces from every being freed. I avoid
    creating those circular dependencies by adding a sequence number
    to the mount namespace and require all bind mounts be of a
    younger mount namespace into an older mount namespace.

    Add a helper function proc_ns_inode so it is possible to
    detect when we are attempting to bind mound a namespace inode.

    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

14 Jul, 2012

2 commits

  • don't rely on proc_mounts->m being the first field; container_of()
    is there for purpose. No need to bother with ->private, while
    we are at it - the same container_of will do nicely.

    Signed-off-by: Al Viro

    Al Viro
     
  • it's enough to set ->mnt_ns of internal vfsmounts to something
    distinct from all struct mnt_namespace out there; then we can
    just use the check for ->mnt_ns != NULL in the fast path of
    mntput_no_expire()

    Signed-off-by: Al Viro

    Al Viro
     

07 Jan, 2012

1 commit


04 Jan, 2012

21 commits