14 Jul, 2012

1 commit

  • copy_tree() can theoretically fail in a case other than ENOMEM, but it always
    returns NULL, which callers interpret as -ENOMEM. Change it to return an
    explicit error.

    Also change clone_mnt() for consistency and because union mounts will add new
    error cases.
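
    The change described above follows the usual pointer-encoded error idiom.
    The following is a minimal userspace sketch of that idiom, not the patch
    itself: err_ptr(), ptr_err() and is_err() are stand-ins modelled on the
    kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and demo_clone(), struct
    demo_mount and the -EINVAL case are hypothetical names chosen for
    illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel helpers (assumption: same encoding,
 * the top 4095 address values are reserved for negative errno codes). */
#define DEMO_MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Hypothetical mount object, for illustration only. */
struct demo_mount {
    int id;
};

/* Hypothetical clone routine: instead of returning NULL for every failure,
 * it encodes the specific reason in the returned pointer. */
static struct demo_mount *demo_clone(const struct demo_mount *old, int refuse)
{
    struct demo_mount *m;

    assert(old != NULL);
    if (refuse)
        return err_ptr(-EINVAL);   /* a failure that is not ENOMEM */

    m = malloc(sizeof(*m));
    if (!m)
        return err_ptr(-ENOMEM);   /* allocation failure */

    m->id = old->id + 1;
    return m;
}
```

    A caller then checks is_err() on the result and propagates ptr_err()
    rather than assuming -ENOMEM whenever it sees NULL.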

    Thanks to Andreas Gruenbacher for a bug fix.
    [AV: folded braino fix by Dan Carpenter]

    Original-author: Valerie Aurora
    Signed-off-by: David Howells
    Cc: Valerie Aurora
    Cc: Andreas Gruenbacher
    Signed-off-by: Al Viro

    David Howells
     

30 May, 2012

1 commit

  • lglocks and brlocks are currently generated with some complicated macros
    in lglock.h. But there's no reason to not just use common utility
    functions and put all the data into a common data structure.

    In preparation, this patch changes the API to look more like normal
    function calls with pointers, not magic macros.

    The patch is rather large because I move over all users in one go to keep
    it bisectable. This impacts the VFS somewhat in terms of lines changed,
    but there is no actual behaviour change.

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Andi Kleen
    Cc: Al Viro
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Rusty Russell
    Signed-off-by: Al Viro

    Andi Kleen
     

04 Jan, 2012

29 commits


07 Jan, 2011

1 commit

  • The problem that this patch aims to fix is vfsmount refcounting scalability.
    We need to take a reference on the vfsmount for every successful path lookup,
    and these lookups often go to the same mount point.

    The fundamental difficulty is that a "simple" reference count can never be made
    scalable, because any time a reference is dropped, we must check whether that
    was the last reference. To do that requires communication with all other CPUs
    that may have taken a reference count.

    We can make refcounts more scalable in a couple of ways, involving keeping
    distributed counters, and checking for the global-zero condition less
    frequently.

    - check the global sum once every interval (this will delay zero detection
      for some interval, so it's probably a showstopper for vfsmounts).

    - keep a local count and only take the global sum when the local count
      reaches 0 (this is difficult for vfsmounts, because we can't hold preempt
      off for the life of a reference, so a counter would need to be per-thread
      or tied strongly to a particular CPU, which requires more locking).

    - keep a local difference of increments and decrements, which allows us to sum
      the total difference and hence find the refcount when summing all CPUs. Then,
      keep a single integer "long" refcount for slow and long lasting references,
      and only take the global sum of local counters when the long refcount is 0.

    This last scheme is what I implemented here. Attached mounts and process root
    and working directory references are "long" references, and everything else is
    a short reference.

    This allows scalable vfsmount references during path walking over mounted
    subtrees and unattached (lazy umounted) mounts with processes still running
    in them.

    This results in one fewer atomic op in the fastpath: mntget is now just a
    per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
    and non-atomic decrement in the common case. However code is otherwise bigger
    and heavier, so single threaded performance is basically a wash.
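
    The third scheme can be illustrated with a small single-threaded model.
    This is only a sketch under stated assumptions, not the kernel code: the
    "CPUs" are plain array slots, there is no locking or preemption handling,
    and struct demo_mnt and the demo_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 4

/* Hypothetical model of a mount with split reference counts. */
struct demo_mnt {
    int short_delta[DEMO_NR_CPUS]; /* per-CPU increments minus decrements */
    int long_refs;                 /* slow, long-lasting references */
};

/* Sum of all per-CPU differences: the number of live short references. */
static int demo_sum(const struct demo_mnt *m)
{
    int cpu, sum = 0;

    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        sum += m->short_delta[cpu];
    return sum;
}

/* Short reference: touches only the local slot. */
static void demo_mntget(struct demo_mnt *m, int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    m->short_delta[cpu]++;
}

/* Returns true if this put dropped the last reference. The global sum is
 * only taken when no long reference pins the mount. */
static bool demo_mntput(struct demo_mnt *m, int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    m->short_delta[cpu]--;
    if (m->long_refs > 0)
        return false;              /* common case: no summing needed */
    return demo_sum(m) == 0;
}

static void demo_mntget_long(struct demo_mnt *m)
{
    m->long_refs++;
}

/* Dropping a long reference also has to check the short references. */
static bool demo_mntput_long(struct demo_mnt *m)
{
    m->long_refs--;
    return m->long_refs == 0 && demo_sum(m) == 0;
}
```

    Note that a get on one slot and a put on another leave individual slots
    non-zero (even negative), which is why only the sum is meaningful.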

    Signed-off-by: Nick Piggin

    Nick Piggin
     

18 Aug, 2010

1 commit

  • fs: brlock vfsmount_lock

    Use a brlock for the vfsmount lock. It must be taken for write whenever
    modifying the mount hash or associated fields, and may be taken for read when
    performing mount hash lookups.
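
    The read/write asymmetry of such a lock can be sketched in userspace as
    follows. This is an illustrative model only, assuming one spinlock per
    CPU built from C11 atomic_flag; struct demo_brlock and the demo_br_*
    functions are hypothetical and are not the kernel's brlock API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 4

/* Hypothetical big-reader lock: one spinlock per CPU. */
struct demo_brlock {
    atomic_flag cpu_lock[DEMO_NR_CPUS];
};

static void demo_br_init(struct demo_brlock *l)
{
    for (int i = 0; i < DEMO_NR_CPUS; i++)
        atomic_flag_clear(&l->cpu_lock[i]);
}

/* Readers take only their own CPU's lock, so readers on different CPUs
 * never touch the same cache line. */
static bool demo_br_read_trylock(struct demo_brlock *l, int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    return !atomic_flag_test_and_set_explicit(&l->cpu_lock[cpu],
                                              memory_order_acquire);
}

static void demo_br_read_lock(struct demo_brlock *l, int cpu)
{
    while (!demo_br_read_trylock(l, cpu))
        ;                          /* spin until the local lock is free */
}

static void demo_br_read_unlock(struct demo_brlock *l, int cpu)
{
    atomic_flag_clear_explicit(&l->cpu_lock[cpu], memory_order_release);
}

/* Writers take every per-CPU lock in a fixed order, which excludes all
 * readers and is why the write side is the slow path. */
static void demo_br_write_lock(struct demo_brlock *l)
{
    for (int i = 0; i < DEMO_NR_CPUS; i++)
        demo_br_read_lock(l, i);
}

static void demo_br_write_unlock(struct demo_brlock *l)
{
    for (int i = DEMO_NR_CPUS - 1; i >= 0; i--)
        demo_br_read_unlock(l, i);
}
```

    The write side costs one acquisition per CPU, which is consistent with
    the slowpath getting slower as described further down.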

    A new lock is added for the mnt-id allocator, so it doesn't need to take
    the heavy vfsmount write-lock.

    The number of atomics should remain the same for fastpath rlock cases, though
    code would be slightly slower due to per-cpu access. Scalability will not be
    much improved in common cases yet, due to other locks (i.e. dcache_lock) getting
    in the way. However, path lookups crossing mountpoints should be one case where
    scalability is improved (currently requiring the global lock).

    The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node
    Altix system (high latency to remote nodes), a simple umount microbenchmark
    (mount --bind mnt mnt2 ; umount mnt2, looped 1000 times) took 6.8s before
    this patch and 7.1s afterwards, about 5% slower.

    Cc: Al Viro
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     

04 Mar, 2010

1 commit

  • First of all, get_source() never results in CL_PROPAGATION
    alone. We either get CL_MAKE_SHARED (for the continuation
    of peer group) or CL_SLAVE (slave that is not shared) or both
    (beginning of peer group among slaves). Massage the code to
    make that explicit, kill CL_PROPAGATION test in clone_mnt()
    (nothing sets CL_MAKE_SHARED without CL_PROPAGATION and in
    clone_mnt() we are checking CL_PROPAGATION after we'd found
    that there's no CL_SLAVE, so the check for CL_MAKE_SHARED
    would do just as well).

    Fix comments, while we are at it...

    Signed-off-by: Al Viro

    Al Viro
     

23 Apr, 2008

2 commits

  • Show peer group ID of nearest dominating group that has intersection
    with the mount's namespace.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • Add a unique ID to each peer group using the IDR infrastructure. The
    identifiers are reused after the peer group dissolves.

    The IDR structures are protected by holding namespace_sem for write
    while allocating or deallocating IDs.

    IDs are allocated when a previously unshared vfsmount becomes the
    first member of a peer group. When a new member is added to an
    existing group, the ID is copied from one of the old members.

    IDs are freed when the last member of a peer group is unshared.

    Setting the MNT_SHARED flag on members of a subtree is done as a
    separate step, after all the IDs have been allocated. This way an
    allocation failure can be cleaned up easily, without affecting the
    propagation state.
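
    The two-step approach can be sketched as below. This is a simplified
    model, not the patch: a small fixed-size table stands in for the IDR
    allocator, the subtree is a flat array, and struct demo_pg_mnt and the
    demo_* functions are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEMO_MAX_IDS 8

/* Stand-in for the IDR allocator; index 0 is unused because group ID 0
 * means "not in a peer group" in this sketch. */
static bool demo_id_used[DEMO_MAX_IDS + 1];

/* Returns the lowest free ID, or -1 if none is left. Freed IDs are reused. */
static int demo_alloc_id(void)
{
    for (int id = 1; id <= DEMO_MAX_IDS; id++) {
        if (!demo_id_used[id]) {
            demo_id_used[id] = true;
            return id;
        }
    }
    return -1;
}

static void demo_free_id(int id)
{
    assert(id >= 1 && id <= DEMO_MAX_IDS);
    demo_id_used[id] = false;
}

struct demo_pg_mnt {
    int group_id;  /* 0 if the mount has no peer group */
    bool shared;   /* models the MNT_SHARED flag */
};

static int demo_make_shared(struct demo_pg_mnt *tree, int n)
{
    int i;

    /* Step 1: allocate IDs only; no propagation state changes yet. */
    for (i = 0; i < n; i++) {
        if (tree[i].group_id)
            continue;              /* already a member of a group */
        int id = demo_alloc_id();
        if (id < 0)
            goto undo;
        tree[i].group_id = id;
    }

    /* Step 2: cannot fail, so the flags are set all at once. */
    for (i = 0; i < n; i++)
        tree[i].shared = true;
    return 0;

undo:
    /* Release IDs handed out in step 1; mounts that were already shared
     * keep theirs, and no flag has been touched. */
    while (i-- > 0) {
        if (tree[i].group_id && !tree[i].shared) {
            demo_free_id(tree[i].group_id);
            tree[i].group_id = 0;
        }
    }
    return -ENOMEM;
}
```

    Because step 1 never sets the shared flag, the cleanup path can tell
    newly allocated IDs from pre-existing ones just by looking at that flag.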

    Based on design sketch by Al Viro.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     

22 Apr, 2008

2 commits


28 Mar, 2008

1 commit


07 Feb, 2008

1 commit

  • Some time ago ( http://lkml.org/lkml/2007/6/19/128 ) I wrote about
    MNT_UNBINDABLE that it felt like a bug that it is not reset by "mount
    --make-private".

    Today I happened to look at mount(8) and Documentation/sharedsubtree.txt,
    and both document the behaviour obtained by applying the little patch
    given in the above message (and again below).

    So, the present kernel code is not according to specs and must be regarded
    as buggy.

    Specification in Documentation/sharedsubtree.txt:
    See state diagram: unbindable should become private upon make-private.

    Specification in mount(8):
    ... It's also possible to set up uni-directional propagation (with
    --make-slave), to make a mount point unavailable for --bind/--rbind
    (with --make-unbindable), and to undo any of these (with --make-private).
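
    The specified behaviour can be modelled in a few lines. This is a
    hypothetical sketch of the state change only, not the kernel patch: the
    DEMO_* flags, struct demo_prop_mnt and both functions are invented for
    illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical propagation flags for this sketch. */
enum {
    DEMO_SHARED     = 1 << 0,
    DEMO_UNBINDABLE = 1 << 1,
};

struct demo_prop_mnt {
    unsigned int flags;
    struct demo_prop_mnt *master;  /* non-NULL while the mount is a slave */
};

/* make-private undoes shared, slave and unbindable alike, as both
 * specifications quoted above require. */
static void demo_make_private(struct demo_prop_mnt *m)
{
    assert(m != NULL);
    m->flags &= ~(unsigned int)(DEMO_SHARED | DEMO_UNBINDABLE);
    m->master = NULL;
}

/* A mount may be used as a --bind source unless it is unbindable. */
static bool demo_can_bind(const struct demo_prop_mnt *m)
{
    return !(m->flags & DEMO_UNBINDABLE);
}
```

    In this model the reported bug corresponds to a make-private that
    leaves DEMO_UNBINDABLE set.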

    Repeat of old fix-shared-subtrees-make-private.patch
    (due to Dirk Gerrits, René Gabriëls, Peter Kooijmans):

    Acked-by: Ram Pai
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andries E. Brouwer