Eric Lee / linux-smarc-t335x-v3.2

17 Jan, 2011

1 commit

f03c65993 sanitize vfsmount refcounting changes

Instead of splitting refcount between (per-cpu) mnt_count
and (SMP-only) mnt_longrefs, make all references contribute
to mnt_count again and keep track of how many are longterm
ones.

Accounting rules for longterm count:
* 1 for each fs_struct.root.mnt
* 1 for each fs_struct.pwd.mnt
* 1 for having non-NULL ->mnt_ns
* decrement to 0 happens only under vfsmount lock exclusive

That allows nice common case for mntput() - since we can't drop the
final reference until after mnt_longterm has reached 0 due to the rules
above, mntput() can grab vfsmount lock shared and check mnt_longterm.
If it turns out to be non-zero (which is the common case), we know
that this is not the final mntput() and can just blindly decrement
percpu mnt_count. Otherwise we grab vfsmount lock exclusive and
do usual decrement-and-check of percpu mnt_count.
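The fast path described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: the `demo_` names and `NR_CPUS_DEMO` are hypothetical, the per-CPU counters are a plain array, and the vfsmount lock is elided (comments mark where the shared and exclusive acquisitions would go).

```c
#include <stdbool.h>

#define NR_CPUS_DEMO 4  /* hypothetical CPU count for the sketch */

struct demo_mnt {
    int pcp_count[NR_CPUS_DEMO]; /* per-CPU deltas; only their sum is meaningful */
    int longterm;                /* how many of the references are long-term */
};

/* Sum of all per-CPU deltas, i.e. the total reference count. */
static int demo_mnt_count(const struct demo_mnt *m)
{
    int sum = 0;
    for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
        sum += m->pcp_count[cpu];
    return sum;
}

static void demo_mntget(struct demo_mnt *m, int cpu)
{
    m->pcp_count[cpu]++;
}

/* Mark an already-held reference as long-term (or undo that). */
static void demo_make_longterm(struct demo_mnt *m)  { m->longterm++; }
static void demo_make_shortterm(struct demo_mnt *m) { m->longterm--; }

/* Returns true if this call dropped the final reference. */
static bool demo_mntput(struct demo_mnt *m, int cpu)
{
    /* (vfsmount lock would be taken shared here) */
    if (m->longterm > 0) {
        /* A long-term reference still exists, so this cannot be the last put. */
        m->pcp_count[cpu]--;
        return false;
    }
    /* (vfsmount lock would be taken exclusive here) */
    m->pcp_count[cpu]--;
    return demo_mnt_count(m) == 0;
}
```

Note that a reference taken on one CPU may be dropped on another, so individual per-CPU values can go negative; only the sum is checked, and only on the slow path.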

For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm();
namespace.c uses the latter in places where we don't already hold
vfsmount lock exclusive and opencodes a few remaining spots where
we need to manipulate mnt_longterm.

Note that we mostly revert the code outside of fs/namespace.c back
to what we used to have; in particular, normal code doesn't need
to care about two kinds of references, etc. And we get to keep
the optimization Nick's variant had bought us...

Signed-off-by: Al Viro

Al Viro
2011-01-17 02:47:07 +0800

16 Jan, 2011

1 commit

ea5b778a8 Unexport do_add_mount() and add in follow_automount(), not ->d_automount()

Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
added rather than calling do_add_mount() itself. follow_automount() will then
do the addition.

This slightly complicates things as ->d_automount() normally wants to add the
new vfsmount to an expiration list and start an expiration timer. The problem
with that is that the vfsmount will be deleted if it has a refcount of 1 and
the timer will not repeat if the expiration list is empty.

To this end, we require the vfsmount to be returned from d_automount() with a
refcount of (at least) 2. One of these refs will be dropped unconditionally.
In addition, follow_automount() must get a 3rd ref around the call to
do_add_mount() lest it eat a ref and return an error, leaving the mount we
have open to being expired as we would otherwise have only 1 ref on it.

d_automount() should also add the vfsmount to the expiration list (by
calling mnt_set_expiry()) and start the expiration timer before returning, if
this mechanism is to be used. The vfsmount will be unlinked from the
expiration list by follow_automount() if do_add_mount() fails.

This patch also fixes the call to do_add_mount() for AFS to propagate the mount
flags from the parent vfsmount.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2011-01-16 09:07:48 +0800

07 Jan, 2011

1 commit

b3e19d924 fs: scale mntget/mntput

The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
and those lookups often go to the same mount point.

The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. To do that requires communication with all other CPUs
that may have taken a reference count.

We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.

- check the global sum once every interval (this will delay zero detection
for some interval, so it's probably a showstopper for vfsmounts).

- keep a local count and only take the global sum when local reaches 0 (this
is difficult for vfsmounts, because we can't hold preempt off for the life of
a reference, so a counter would need to be per-thread or tied strongly to a
particular CPU which requires more locking).

- keep a local difference of increments and decrements, which allows us to sum
the total difference and hence find the refcount when summing all CPUs. Then,
keep a single integer "long" refcount for slow and long lasting references,
and only take the global sum of local counters when the long refcount is 0.

This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.
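A minimal userspace sketch of that third scheme, assuming hypothetical `demo_` names and a fixed `DEMO_CPUS`, with all locking and atomicity left out. Here long references live in their own counter (unlike the later rework, where they also contribute to mnt_count), and the `sums_taken` field exists only to show how rarely the global sum is needed.

```c
#define DEMO_CPUS 4  /* hypothetical CPU count */

struct demo_ref {
    int local[DEMO_CPUS]; /* per-CPU (increments - decrements); may go negative */
    int long_refs;        /* single shared count of long-lived references */
    int sums_taken;       /* how often the expensive global sum was needed */
};

/* Expensive step: visit every CPU's counter and test the total. */
static int demo_sum_is_zero(struct demo_ref *r)
{
    int total = 0;
    r->sums_taken++;
    for (int cpu = 0; cpu < DEMO_CPUS; cpu++)
        total += r->local[cpu];
    return total == 0;
}

static void demo_get(struct demo_ref *r, int cpu) { r->local[cpu]++; }
static void demo_get_long(struct demo_ref *r)     { r->long_refs++; }

/* Drop a short reference; returns 1 if the object is now unreferenced. */
static int demo_put(struct demo_ref *r, int cpu)
{
    r->local[cpu]--;
    if (r->long_refs > 0)
        return 0;              /* common case: no global sum needed */
    return demo_sum_is_zero(r);
}

/* Drop a long reference; returns 1 if the object is now unreferenced. */
static int demo_put_long(struct demo_ref *r)
{
    if (--r->long_refs > 0)
        return 0;
    return demo_sum_is_zero(r);
}
```

As long as a mount is attached or is some process's root or working directory, `demo_put()` never walks the per-CPU array.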

This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.

This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However code is otherwise bigger
and heavier, so single threaded performance is basically a wash.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:33 +0800

11 Aug, 2010

1 commit

532490f0a vfs: remove unused MNT_STRICTATIME

Commit d0adde574b8487ef30f69e2d08bba769e4be513f added MNT_STRICTATIME
but it isn't actually used (MS_STRICTATIME clears MNT_RELATIME and
MNT_NOATIME rather than setting any mount flag).

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2010-08-11 12:29:47 +0800

28 Jul, 2010

1 commit

2504c5d63 fsnotify/vfsmount: add fsnotify fields to struct vfsmount

This patch adds the list and mask fields needed to support vfsmount marks.
These are the same fields fsnotify needs on an inode. They are not used,
just declared and we note where the cleanup hook should be (the function is
not yet defined)

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Eric Paris

Andreas Gruenbacher
2010-07-28 21:58:57 +0800

05 Mar, 2010

1 commit

0f2cc4ecd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
init: Open /dev/console from rootfs
mqueue: fix typo "failues" -> "failures"
mqueue: only set error codes if they are really necessary
mqueue: simplify do_open() error handling
mqueue: apply mathematics distributivity on mq_bytes calculation
mqueue: remove unneeded info->messages initialization
mqueue: fix mq_open() file descriptor leak on user-space processes
fix race in d_splice_alias()
set S_DEAD on unlink() and non-directory rename() victims
vfs: add NOFOLLOW flag to umount(2)
get rid of ->mnt_parent in tomoyo/realpath
hppfs can use existing proc_mnt, no need for do_kern_mount() in there
Mirror MS_KERNMOUNT in ->mnt_flags
get rid of useless vfsmount_lock use in put_mnt_ns()
Take vfsmount_lock to fs/internal.h
get rid of insanity with namespace roots in tomoyo
take check for new events in namespace (guts of mounts_poll()) to namespace.c
Don't mess with generic_permission() under ->d_lock in hpfs
sanitize const/signedness for udf
nilfs: sanitize const/signedness in dealing with ->d_name.name
...

Fix up fairly trivial (famous last words...) conflicts in
drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c

Linus Torvalds
2010-03-05 00:15:33 +0800

04 Mar, 2010

3 commits

8089352a1 Mirror MS_KERNMOUNT in ->mnt_flags

Signed-off-by: Al Viro

Al Viro
2010-03-04 03:08:00 +0800
47cd813f2 Take vfsmount_lock to fs/internal.h

no more users left outside of fs/*.c (and very few outside of
fs/namespace.c, actually)

Signed-off-by: Al Viro

Al Viro
2010-03-04 03:07:59 +0800
495d6c9c6 VFS: Clean up shared mount flag propagation

The handling of mount flags in set_mnt_shared() got a little tangled
up during previous cleanups, with the following problems:

* MNT_PNODE_MASK is defined as a literal constant when it should be a
bitwise OR of other MNT_* flags
* set_mnt_shared() clears and then sets MNT_SHARED (part of MNT_PNODE_MASK)
* MNT_PNODE_MASK could use a comment in mount.h
* MNT_PNODE_MASK is a terrible name, change to MNT_SHARED_MASK

This patch fixes these problems.
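The general idea of building a mask from named flags, and of not clearing a bit only to set it again, can be sketched as follows. The flag values and `DEMO_` names are hypothetical and are not the definitions from mount.h.

```c
/* Hypothetical flag values, for illustration only. */
#define DEMO_MNT_NOATIME    0x0008
#define DEMO_MNT_SHARED     0x1000
#define DEMO_MNT_UNBINDABLE 0x2000

/*
 * Bits that must be cleared when a mount becomes shared. Composed from
 * named flags, so it stays correct if a flag value ever changes, and it
 * deliberately excludes DEMO_MNT_SHARED itself.
 */
#define DEMO_MNT_SHARED_MASK (DEMO_MNT_UNBINDABLE)

/* Return the flags of a mount after it has been made shared. */
static int demo_set_shared(int flags)
{
    flags &= ~DEMO_MNT_SHARED_MASK; /* drop propagation types that conflict */
    flags |= DEMO_MNT_SHARED;       /* set the shared bit exactly once */
    return flags;
}
```

Unrelated bits such as `DEMO_MNT_NOATIME` pass through untouched.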

Signed-off-by: Al Viro

Valerie Aurora
2010-03-04 03:07:55 +0800

17 Feb, 2010

1 commit

003cb608a percpu: add __percpu sparse annotations to fs

Add __percpu sparse annotations to fs.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors. This patch doesn't affect normal builds.

Signed-off-by: Tejun Heo
Cc: "Theodore Ts'o"
Cc: Trond Myklebust
Cc: Alex Elder
Cc: Christoph Hellwig
Cc: Alexander Viro

Tejun Heo
2010-02-17 10:17:38 +0800

12 Jun, 2009

2 commits

96029c4e0 fs: introduce mnt_clone_write

This patch speeds up lmbench lat_mmap test by about another 2% after the
first patch.

Before:
avg = 462.286
std = 5.46106

After:
avg = 453.12
std = 9.58257

(50 runs of each, stddev gives a reasonable confidence)

It does this by introducing mnt_clone_write, which avoids some heavyweight
operations of mnt_want_write if called on a vfsmount which we know already
has a write count; and mnt_want_write_file, which can call mnt_clone_write
if the file is open for write.

After these two patches, mnt_want_write and mnt_drop_write go from 7% on
the profile down to 1.3% (including mnt_clone_write).

[AV: mnt_want_write_file() should take file alone and derive mnt from it;
not only all callers have that form, but that's the only mnt about which
we know that it's already held for write if file is opened for write]

Cc: Dave Hansen
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro

npiggin@suse.de
2009-06-12 09:36:02 +0800
d3ef3d735 fs: mnt_want_write speedup

This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up
basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it.
A microbenchmark yes, but it exercises some important paths in the mm.

Before:
avg = 501.9
std = 14.7773

After:
avg = 462.286
std = 5.46106

(50 runs of each, stddev gives a reasonable confidence, but there is quite
a bit of variation there still)

It does this by removing the complex per-cpu locking and counter-cache and
replacing them with a percpu counter in struct vfsmount. This makes the code
much simpler, and avoids spinlocks (although the msync is still pretty
costly, unfortunately). It results in about 900 bytes smaller code too. It
does increase the size of a vfsmount, however.

It should also give a speedup on large systems if CPUs are frequently operating
on different mounts (because the existing scheme has to operate on an atomic in
the struct vfsmount when switching between mounts). But I'm most interested in
the single threaded path performance for the moment.

[AV: minor cleanup]

Cc: Dave Hansen
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro

npiggin@suse.de
2009-06-12 09:36:02 +0800

27 Mar, 2009

1 commit

d0adde574 Add a strictatime mount option

Add support for explicitly requesting full atime updates. This makes it
possible for kernels to default to relatime but still allow userspace to
override it.

Signed-off-by: Matthew Garrett
Signed-off-by: Linus Torvalds

Matthew Garrett
2009-03-27 01:56:35 +0800

17 Oct, 2008

1 commit

693ac3893 include/linux/mount.h: remove CVS keyword

Remove a CVS keyword that wasn't updated for a long time from a comment.

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2008-10-17 02:21:30 +0800

01 Aug, 2008

1 commit

8d66bf548 [PATCH] pass struct path * to do_add_mount()

Signed-off-by: Al Viro

Al Viro
2008-08-01 23:25:32 +0800

27 Jul, 2008

1 commit

88b387824 [PATCH] vfs: use kstrdup() and check failing allocation

- use kstrdup() instead of kmalloc() + memcpy()
- return NULL if allocating ->mnt_devname failed
- mnt_devname should be const
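The three points above can be illustrated in userspace C. This is a sketch under assumptions: `demo_strdup()` stands in for kstrdup(), and the `demo_mount` structure and its helpers are hypothetical, not the kernel's alloc_vfsmnt().

```c
#include <stdlib.h>
#include <string.h>

struct demo_mount {
    const char *devname; /* const: never modified after allocation */
};

/* Userspace stand-in for kstrdup(): one call instead of malloc + memcpy. */
static char *demo_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy)
        memcpy(copy, s, len);
    return copy;
}

/* Returns NULL if either the mount or its device name cannot be allocated. */
static struct demo_mount *demo_alloc_mount(const char *name)
{
    struct demo_mount *m = calloc(1, sizeof(*m));
    if (!m)
        return NULL;
    if (name) {
        m->devname = demo_strdup(name);
        if (!m->devname) {
            free(m);     /* do not hand back a half-initialised mount */
            return NULL;
        }
    }
    return m;
}

static void demo_free_mount(struct demo_mount *m)
{
    if (!m)
        return;
    free((void *)m->devname); /* cast away const only to release the copy */
    free(m);
}
```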

Signed-off-by: Li Zefan
Acked-by: Cyrill Gorcunov
Signed-off-by: Al Viro

Li Zefan
2008-07-27 08:53:24 +0800

30 Apr, 2008

1 commit

735643ee6 Remove "#ifdef __KERNEL__" checks from unexported headers

Remove the "#ifdef __KERNEL__" tests from unexported header files in
linux/include whose entire contents are wrapped in that preprocessor
test.

Signed-off-by: Robert P. J. Day
Cc: David Woodhouse
Cc: Sam Ravnborg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Robert P. J. Day
2008-04-30 23:29:54 +0800

23 Apr, 2008

2 commits

719f5d7f0 [patch 4/7] vfs: mountinfo: add mount peer group ID

Add a unique ID to each peer group using the IDR infrastructure. The
identifiers are reused after the peer group dissolves.

The IDR structures are protected by holding namespace_sem for write
while allocating or deallocating IDs.

IDs are allocated when a previously unshared vfsmount becomes the
first member of a peer group. When a new member is added to an
existing group, the ID is copied from one of the old members.

IDs are freed when the last member of a peer group is unshared.

Setting the MNT_SHARED flag on members of a subtree is done as a
separate step, after all the IDs have been allocated. This way an
allocation failure can be cleaned up easily, without affecting the
propagation state.
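A rough sketch of the two-step approach, in userspace C. The tiny table-based allocator is only a stand-in for the IDR infrastructure, and every `demo_`/`DEMO_` name is hypothetical; locking (namespace_sem) is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_IDS 4   /* deliberately tiny so exhaustion is easy to hit */
#define DEMO_SHARED  0x1

/* IDs run from 1 to DEMO_MAX_IDS; 0 means "no peer group". */
static bool demo_id_used[DEMO_MAX_IDS + 1];

static int demo_alloc_id(void)
{
    for (int id = 1; id <= DEMO_MAX_IDS; id++) {
        if (!demo_id_used[id]) {
            demo_id_used[id] = true; /* freed IDs are reused */
            return id;
        }
    }
    return 0; /* allocation failure */
}

static void demo_free_id(int id) { demo_id_used[id] = false; }

struct demo_mnt {
    int group_id;
    int flags;
};

/* Make n mounts shared; on failure nothing is changed and false is returned. */
static bool demo_make_shared(struct demo_mnt *mnts, size_t n)
{
    size_t i;

    /* Step 1: allocate an ID for every mount that does not have one yet. */
    for (i = 0; i < n; i++) {
        if (mnts[i].group_id)
            continue;
        mnts[i].group_id = demo_alloc_id();
        if (!mnts[i].group_id)
            goto undo;
    }

    /* Step 2: only now touch the propagation state. */
    for (i = 0; i < n; i++)
        mnts[i].flags |= DEMO_SHARED;
    return true;

undo:
    /* An ID without the shared flag can only come from step 1 above. */
    while (i-- > 0) {
        if (mnts[i].group_id && !(mnts[i].flags & DEMO_SHARED)) {
            demo_free_id(mnts[i].group_id);
            mnts[i].group_id = 0;
        }
    }
    return false;
}
```

Because the flag is set only after every allocation has succeeded, the cleanup path never has to restore propagation state.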

Based on design sketch by Al Viro.

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2008-04-23 12:04:51 +0800
73cd49ecd [patch 3/7] vfs: mountinfo: add mount ID

Add a unique ID to each vfsmount using the IDR infrastructure. The
identifiers are reused after the vfsmount is freed.

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2008-04-23 12:04:45 +0800

22 Apr, 2008

1 commit

6d59e7f58 [PATCH] move a bunch of declarations to fs/internal.h

Signed-off-by: Al Viro

Al Viro
2008-04-22 11:11:01 +0800

19 Apr, 2008

3 commits

2e4b7fcd9 [PATCH] r/o bind mounts: honor mount writer counts at remount

Originally from: Herbert Poetzl

This is the core of the read-only bind mount patch set.

Note that this does _not_ add a "ro" option directly to the bind mount
operation. If you require such a mount, you must first do the bind, then
follow it up with a 'mount -o remount,ro' operation:

If you wish to have a r/o bind mount of /foo on bar:

mount --bind /foo /bar
mount -o remount,ro /bar

Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Dave Hansen
2008-04-19 12:29:27 +0800
3d733633a [PATCH] r/o bind mounts: track numbers of writers to mounts

This is the real meat of the entire series. It actually
implements the tracking of the number of writers to a mount.
However, it causes scalability problems because there can be
hundreds of cpus doing open()/close() on files on the same mnt at
the same time. Even an atomic_t in the mnt has massive scaling
problems because the cacheline gets so terribly contended.

This uses a statically-allocated percpu variable. All want/drop
operations are local to a cpu as long as that cpu operates on the same
mount, and there are no writer count imbalances. Writer count
imbalances happen when a write is taken on one cpu, and released
on another, like when an open/close pair is performed on two different cpus.

Upon a remount,ro request, all of the data from the percpu
variables is collected (expensive, but very rare) and we determine
if there are any outstanding writers to the mount.

I've written a little benchmark to sit in a loop for a couple of
seconds in several cpus in parallel doing open/write/close loops.

http://sr71.net/~dave/linux/openbench.c

The code in here is a worst-possible case for this patch. It
does opens on a _pair_ of files in two different mounts in parallel.
This should cause my code to lose its "operate on the same mount"
optimization completely. This worst-case scenario causes a 3%
degradation in the benchmark.

I could probably get rid of even this 3%, but it would be more
complex than what I have here, and I think this is getting into
acceptable territory. In practice, I expect writing more than 3
bytes to a file, as well as disk I/O to mask any effects that this
has.

(To get rid of that 3%, we could have an #defined number of mounts
in the percpu variable. So, instead of a CPU getting to operate only
on percpu data when it accesses only one mount, it could stay on
percpu data when it only accesses N or fewer mounts.)

[AV] merged fix for __clear_mnt_mount() stepping on freed vfsmount

Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Dave Hansen
2008-04-19 12:29:27 +0800
8366025eb [PATCH] r/o bind mounts: stub functions

This patch adds two functions, mnt_want_write() and mnt_drop_write(). These are
used like a lock pair around any fs operations that might cause a write to the
filesystem.

Before these can become useful, we must first cover each place in the VFS
where writes are performed with a want/drop pair. When that is complete, we
can actually introduce code that will safely check the counts before allowing
r/w<->r/o transitions to occur.
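The intended usage pattern, and the kind of check the counts later make possible, might look like this in a userspace sketch. The `demo_` names, the single (non-percpu) writer counter, and the choice of error codes are assumptions made for illustration; no locking is shown.

```c
#include <errno.h>
#include <stdbool.h>

struct demo_mnt {
    bool readonly;
    int  writers;   /* outstanding want/drop pairs */
};

/* Ask for write access; fails if the mount is read-only. */
static int demo_want_write(struct demo_mnt *m)
{
    if (m->readonly)
        return -EROFS;
    m->writers++;
    return 0;
}

/* Must be called once for every successful demo_want_write(). */
static void demo_drop_write(struct demo_mnt *m)
{
    m->writers--;
}

/* A hypothetical operation that modifies the filesystem. */
static int demo_touch(struct demo_mnt *m)
{
    int err = demo_want_write(m);
    if (err)
        return err;
    /* ... the actual modification would happen here ... */
    demo_drop_write(m);
    return 0;
}

/* Going read-only is refused while any writer is outstanding. */
static int demo_remount_ro(struct demo_mnt *m)
{
    if (m->writers > 0)
        return -EBUSY;
    m->readonly = true;
    return 0;
}
```

Every write path is bracketed the way `demo_touch()` is, which is why all such places in the VFS must be covered before the transition check can be trusted.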

Acked-by: Serge Hallyn
Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Dave Hansen
Signed-off-by: Al Viro

Dave Hansen
2008-04-19 12:25:32 +0800

28 Mar, 2008

2 commits

c35038bec [PATCH] do shrink_submounts() for all fs types

... and take it out of ->umount_begin() instances. Call with all locks
already taken (by do_umount()) and leave calling release_mounts() to
caller (it will do release_mounts() anyway, so we can just put into
the same list).

Signed-off-by: Al Viro

Al Viro
2008-03-28 08:47:58 +0800
7c4b93d82 [PATCH] count ghost references to vfsmounts

make propagate_mount_busy() exclude references from the vfsmounts
that had been isolated by umount_tree() and are just waiting for
release_mounts() to dispose of their ->mnt_parent/->mnt_mountpoint.

Signed-off-by: Al Viro

Al Viro
2008-03-28 08:47:46 +0800

09 May, 2007

1 commit

beb7dd86a Fix misspellings collected by members of KJ list.

Fix the misspellings of "propogate", "writting" and (oh, the shame
:-) "kenrel" in the source tree.

Signed-off-by: Robert P. J. Day
Signed-off-by: Adrian Bunk

Robert P. J. Day
2007-05-09 13:14:03 +0800

12 Feb, 2007

1 commit

4ba4d4c0c [PATCH] struct vfsmount: keep mnt_count & mnt_expiry_mark away from mnt_flags

I noticed cache misses in touch_atime() that can be avoided if we keep
mnt_count & mnt_expiry_mark in a different cache line than mnt_flags
(mostly read)

mnt_count & mnt_expiry_mark are modified each time a file is opened/closed
in a file system.

touch_atime() is called each time a file is read, and generally needs to
read mnt_flags.

Other fields of struct vfsmount are mostly read so I chose to move
mnt_count & mnt_expiry_mark at the end of struct vfsmount. And adding a
comment so that nobody tries to re-arrange fields to fill the holes :)

On 64-bit platforms, the new offsetof(mnt_count) is 0xC0
On 32-bit platforms, it is 0x60, so I did not add a
____cacheline_aligned_in_smp because it would have a too big impact on the
size of this object (in particular if CONFIG_X86_L1_CACHE_SHIFT=7)
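The layout idea can be checked with offsetof() in a small sketch. The structure below is hypothetical (a padding array stands in for the other mostly-read fields), and a 64-byte cache line is assumed; the comparison is only meaningful if the object itself starts on a cache-line boundary.

```c
#include <stddef.h>

#define DEMO_CACHE_LINE 64  /* assumed line size */

struct demo_vfsmount {
    int  flags;   /* mostly read, e.g. on every atime check */
    /* stand-in for the remaining mostly-read fields */
    char mostly_read[2 * DEMO_CACHE_LINE - sizeof(int)];
    /*
     * Frequently written fields are kept at the end, away from flags.
     * Do not move them up to fill holes.
     */
    int  count;
    int  expiry_mark;
};

/* Index of the cache line an offset falls into, relative to object start. */
static size_t demo_line_of(size_t offset)
{
    return offset / DEMO_CACHE_LINE;
}
```

Writes to `count` then do not invalidate the line that holds `flags` on other CPUs.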

Signed-off-by: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2007-02-12 02:51:25 +0800

14 Dec, 2006

1 commit

47ae32d6a [PATCH] relative atime

Add "relatime" (relative atime) support. Relative atime only updates the
atime if the previous atime is older than the mtime or ctime. Like
noatime, but useful for applications like mutt that need to know when a
file has been read since it was last modified.
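The rule as stated in this description can be written as a small predicate. This is a sketch of the description only: the function name is hypothetical, timestamps are plain seconds, and "older than" is taken as a strict comparison, which may differ from what the kernel actually implements.

```c
#include <stdbool.h>

/*
 * Decide whether a read should update atime under relatime:
 * only when the stored atime predates the last modification (mtime)
 * or the last status change (ctime).
 */
static bool demo_relatime_need_update(long long atime_sec,
                                      long long mtime_sec,
                                      long long ctime_sec)
{
    return atime_sec < mtime_sec || atime_sec < ctime_sec;
}
```

A reader such as mutt can still tell "read since last modified" apart from "not read", because the first read after a modification always moves atime past mtime.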

A corresponding patch against mount(8) is available at
http://userweb.kernel.org/~akpm/mount-relative-atime.txt

Signed-off-by: Valerie Henson
Cc: Mark Fasheh
Cc: Al Viro
Cc: Christoph Hellwig
Cc: Karel Zak
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Valerie Henson
2006-12-14 01:05:50 +0800

09 Dec, 2006

1 commit

6b3286ed1 [PATCH] rename struct namespace to struct mnt_namespace

Rename 'struct namespace' to 'struct mnt_namespace' to avoid confusion with
other namespaces being developed for the containers: pid, uts, ipc, etc.
'namespace' variables and attributes are also renamed to 'mnt_ns'

Signed-off-by: Kirill Korotaev
Signed-off-by: Cedric Le Goater
Cc: Eric W. Biederman
Cc: Herbert Poetzl
Cc: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill Korotaev
2006-12-09 00:28:51 +0800

25 Jun, 2006

1 commit

816724e65 Merge branch 'master' of /home/trondmy/kernel/linux-2.6/

Conflicts:

fs/nfs/inode.c
fs/super.c

Fix conflicts between patch 'NFS: Split fs/nfs/inode.c' and patch
'VFS: Permit filesystem to override root dentry on mount'

Trond Myklebust
2006-06-25 01:07:53 +0800

23 Jun, 2006

1 commit

726c33422 [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry

Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch. That reduced the significance of
sb->s_root, allowing NFS to place a fake root there. However, NFS does
require a dentry to use as a target for the statfs operation. This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.

Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-06-23 22:42:45 +0800

09 Jun, 2006

2 commits

5528f911b VFS: Add shrink_submounts()

Allow a submount to be marked as being 'shrinkable' by means of the
vfsmount->mnt_flags, and then add a function 'shrink_submounts()' which
attempts to recursively unmount these submounts.

Signed-off-by: Trond Myklebust

Trond Myklebust
2006-06-09 21:34:17 +0800
bb4a58bf4 VFS: Add GPL_EXPORTED function vfs_kern_mount()

do_kern_mount() does not allow the kernel to use private mount interfaces
without exposing the same interfaces to userland. The problem is that the
filesystem is referenced by name, thus meaning that it and its mount
interface must be registered in the global filesystem list.

vfs_kern_mount() passes the struct file_system_type as an explicit
parameter in order to overcome this limitation.

Signed-off-by: Trond Myklebust

Trond Myklebust
2006-06-09 21:34:15 +0800

11 Jan, 2006

1 commit

fc33a7bb9 [PATCH] per-mountpoint noatime/nodiratime

Turn noatime and nodiratime into per-mount instead of per-sb flags.

After all the preparations this is a rather trivial patch. The mount code
needs to treat the two options as per-mount instead of per-superblock, and
touch_atime needs to be changed to check the new MNT_ flags in addition to
the MS_ flags that are kept for filesystems that are always
noatime/nodiratime but not user settable anymore. Besides that core code
only nfs needed an update because it's leaving atime updates to the server
and thus sets the S_NOATIME flag on every inode, but needs to know whether
it's a real noatime mount for a getattr optimization.

While we're at it I've killed the IS_NOATIME/IS_NODIRATIME macros that were
only used by touch_atime.

Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2006-01-11 00:01:34 +0800

09 Jan, 2006

1 commit

bf066c7db [PATCH] shared mounts: cleanup

Small cleanups in shared mounts code.

Signed-off-by: Miklos Szeredi
Cc: Ram Pai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2006-01-09 12:13:56 +0800

08 Nov, 2005

5 commits

9676f0c63 [PATCH] unbindable mounts

An unbindable mount does not forward or receive propagation. Also
unbindable mount disallows bind mounts. The semantics is as follows.

Bind semantics:
It is invalid to bind mount an unbindable mount.

Move semantics:
It is invalid to move an unbindable mount under shared mount.

Clone-namespace semantics:
If a mount is unbindable in the parent namespace, the corresponding
cloned mount in the child namespace becomes unbindable too. Note:
there is subtle difference, unbindable mounts cannot be bind mounted
but can be cloned during clone-namespace.

Signed-off-by: Ram Pai
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Ram Pai
2005-11-08 10:18:11 +0800
a58b0eb8e [PATCH] introduce slave mounts

A slave mount always has a master mount from which it receives
mount/umount events. Unlike shared mount the event propagation does not
flow from the slave mount to the master.

Signed-off-by: Ram Pai
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Ram Pai
2005-11-08 10:18:11 +0800
03e06e68f [PATCH] introduce shared mounts

This creates shared mounts. A shared mount when bind-mounted to some
mountpoint, propagates mount/umount events to each other. All the
shared mounts that propagate events to each other belong to the same
peer-group.

Signed-off-by: Ram Pai
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Ram Pai
2005-11-08 10:18:10 +0800
07b20889e [PATCH] beginning of the shared-subtree proper

A private mount does not forward or receive propagation. This patch
provides user the ability to convert any mount to private.

Signed-off-by: Ram Pai
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Ram Pai
2005-11-08 10:18:10 +0800
7b7b1ace2 [PATCH] saner handling of auto_acct_off() and DQUOT_OFF() in umount

The way we currently deal with quota and process accounting that might
keep vfsmount busy at umount time is inherently broken; we try to turn
them off just in case (not quite correctly, at that) and

a) pray umount doesn't fail (otherwise they'll stay turned off)
b) pray nobody does anything funny just as we turn quota off

Moreover, LSM provides hooks for doing the same sort of broken logics.

The proper way to deal with that is to introduce the second kind of
reference to vfsmount. Semantics:

- when the last normal reference is dropped, all special ones are
converted to normal ones and if there had been any, cleanup is done.
- normal reference can be cloned into a special one
- special reference can be converted to normal one; that's a no-op if
we'd already passed the point of no return (i.e. mntput() had
converted special references to normal and started cleanup).

The way it works: e.g. starting process accounting converts the vfsmount
reference pinned by the opened file into special one and turns it back
to normal when it gets shut down; acct_auto_close() is done when no
normal references are left. That way it does *not* obstruct umount(2)
and it silently gets turned off when the last normal reference to
vfsmount is gone. Which is exactly what we want...

The same should be done by LSM module that holds some internal
references to vfsmount and wants to shut them down on umount - it should
make them special and security_sb_umount_close() will be called exactly
when the last normal reference to vfsmount is gone.

quota handling is even simpler - we don't use normal file IO anymore, so
there's no need to hold vfsmounts at all. DQUOT_OFF() is done from
deactivate_super(), where it really belongs.

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2005-11-08 10:18:09 +0800