Eric Lee / smarc-fsl-linux-kernel

25 Apr, 2008

1 commit

42faad996 [PATCH] restore sane ->umount_begin() API

Signed-off-by: Al Viro

Al Viro
2008-04-25 21:23:25 +0800

23 Apr, 2008

5 commits

97e7e0f71 [patch 7/7] vfs: mountinfo: show dominating group id

Show peer group ID of nearest dominating group that has intersection
with the mount's namespace.

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2008-04-23 12:05:09 +0800
2d4d4864a [patch 6/7] vfs: mountinfo: add /proc/<pid>/mountinfo

[mszeredi@suse.cz] rewrite and split big patch into manageable chunks

/proc/mounts in its current form lacks important information:

- propagation state
- root of mount for bind mounts
- the st_dev value used within the filesystem
- identifier for each mount and its parent

It also suffers from the following problems:

- not easily extendable
- ambiguity of mountpoints within a chrooted environment
- doesn't distinguish between filesystem dependent and independent options
- doesn't distinguish between per mount and per super block options

This patch introduces /proc/<pid>/mountinfo which attempts to address
all these deficiencies.

Code shared between /proc/<pid>/mounts and /proc/<pid>/mountinfo is
extracted into separate functions.
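
As a rough illustration of what the new file offers to userspace, the sketch below parses one mountinfo line. It assumes the field layout documented in proc(5) for current kernels (mount ID, parent ID, major:minor, root, mount point, per-mount options, optional fields, a lone "-" separator, then filesystem type, source and superblock options). The struct and function names are hypothetical, and escaped characters such as \040 in paths are not decoded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the fields of one mountinfo line. */
struct mountinfo_entry {
    int mount_id;
    int parent_id;
    unsigned int major, minor;   /* st_dev of files on this filesystem */
    char root[256];              /* root of the mount within the filesystem */
    char mount_point[256];       /* mount point relative to the process root */
    char options[256];           /* per-mount options */
    char fstype[64];
    char source[256];
    char super_options[256];     /* per-superblock options */
};

/* Returns 0 on success, -1 if the line does not match the assumed layout. */
int parse_mountinfo_line(const char *line, struct mountinfo_entry *e)
{
    int consumed = 0;
    const char *sep;

    assert(line != NULL && e != NULL);

    if (sscanf(line, "%d %d %u:%u %255s %255s %255s%n",
               &e->mount_id, &e->parent_id, &e->major, &e->minor,
               e->root, e->mount_point, e->options, &consumed) != 7)
        return -1;

    /* Optional fields (e.g. "shared:N", "master:N") run up to a lone "-". */
    sep = strstr(line + consumed, " - ");
    if (sep == NULL)
        return -1;

    if (sscanf(sep + 3, "%63s %255s %255s",
               e->fstype, e->source, e->super_options) != 3)
        return -1;
    return 0;
}
```

Skipping everything up to the " - " separator is what lets a parser like this keep working when further optional fields are added, which is the extensibility problem the list above attributes to /proc/mounts.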

Thanks to Al Viro for the help in getting the design right.

Signed-off-by: Ram Pai
Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Ram Pai
2008-04-23 12:05:03 +0800
a1a2c409b [patch 5/7] vfs: mountinfo: allow using process root

Allow /proc/<pid>/mountinfo to use the root of <pid> to calculate
mountpoints.

- move definition of 'struct proc_mounts' to
- add the process's namespace and root to this structure
- pass a pointer to 'struct proc_mounts' into seq_operations

In addition the following cleanups are made:

- use a common open function for /proc/<pid>/{mounts,mountstats}
- surround namespace.c part of these proc files with #ifdef CONFIG_PROC_FS
- make the seq_operations structures const

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2008-04-23 12:04:57 +0800
719f5d7f0 [patch 4/7] vfs: mountinfo: add mount peer group ID

Add a unique ID to each peer group using the IDR infrastructure. The
identifiers are reused after the peer group dissolves.

The IDR structures are protected by holding namespace_sem for write
while allocating or deallocating IDs.

IDs are allocated when a previously unshared vfsmount becomes the
first member of a peer group. When a new member is added to an
existing group, the ID is copied from one of the old members.

IDs are freed when the last member of a peer group is unshared.

Setting the MNT_SHARED flag on members of a subtree is done as a
separate step, after all the IDs have been allocated. This way an
allocation failure can be cleaned up easily, without affecting the
propagation state.
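
The two-step scheme described above can be sketched in plain userspace C. This is only a toy model: the fixed-size ID pool stands in for the IDR infrastructure, and every name in it (toy_mount, group_id_alloc, make_subtree_shared) is hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_GROUP_IDS 4          /* tiny pool so exhaustion is easy to trigger */

static bool id_in_use[MAX_GROUP_IDS + 1];   /* index 0 means "no ID" */

/* Hand out the lowest free ID, or 0 when the pool is exhausted. */
static int group_id_alloc(void)
{
    for (int id = 1; id <= MAX_GROUP_IDS; id++) {
        if (!id_in_use[id]) {
            id_in_use[id] = true;
            return id;
        }
    }
    return 0;
}

static void group_id_free(int id)
{
    assert(id >= 1 && id <= MAX_GROUP_IDS);
    id_in_use[id] = false;       /* the ID becomes available for reuse */
}

struct toy_mount {
    int group_id;                /* 0 while the mount is not in a peer group */
    bool shared;                 /* stands in for MNT_SHARED */
};

/* Make every mount in the array shared, or change nothing at all. */
int make_subtree_shared(struct toy_mount *m, size_t n)
{
    size_t i;

    /* Step 1: allocate IDs only; no flags are touched yet. */
    for (i = 0; i < n; i++) {
        if (m[i].group_id)
            continue;            /* already a member of a peer group */
        m[i].group_id = group_id_alloc();
        if (!m[i].group_id)
            goto undo;
    }

    /* Step 2: cannot fail, so the flags are only set once all IDs exist. */
    for (i = 0; i < n; i++)
        m[i].shared = true;
    return 0;

undo:
    /* Release only the IDs handed out in step 1 of this call. */
    while (i-- > 0) {
        if (!m[i].shared && m[i].group_id) {
            group_id_free(m[i].group_id);
            m[i].group_id = 0;
        }
    }
    return -1;
}
```

Because the shared flag is never set before all allocations have succeeded, the undo path can tell freshly allocated IDs apart from pre-existing ones simply by checking the flag.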

Based on design sketch by Al Viro.

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2008-04-23 12:04:51 +0800
73cd49ecd [patch 3/7] vfs: mountinfo: add mount ID

Add a unique ID to each vfsmount using the IDR infrastructure. The
identifiers are reused after the vfsmount is freed.

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2008-04-23 12:04:45 +0800

22 Apr, 2008

3 commits

8c3ee42e8 [PATCH] get rid of more nameidata passing in namespace.c

Further reduction of stack footprint (sys_pivot_root());
lose useless BKL in there, while we are at it.

Signed-off-by: Al Viro

Al Viro
2008-04-22 11:13:47 +0800
b5266eb4c [PATCH] switch a bunch of LSM hooks from nameidata to path

Namely, ones from namespace.c

Signed-off-by: Al Viro

Al Viro
2008-04-22 11:13:23 +0800
1a60a2807 [PATCH] lock exclusively in collect_mounts() and drop_collected_mounts()

Taking namespace_sem shared there isn't worth the trouble, especially with
vfsmount ID allocation about to be added. That way we know that umount_tree(),
copy_tree() and clone_mnt() are _always_ serialized by namespace_sem.
umount_tree() still needs vfsmount_lock (it manipulates hash chains, among
other things), but that's a separate story.

Signed-off-by: Al Viro

Al Viro
2008-04-22 11:11:09 +0800

19 Apr, 2008

3 commits

2e4b7fcd9 [PATCH] r/o bind mounts: honor mount writer counts at remount

Originally from: Herbert Poetzl

This is the core of the read-only bind mount patch set.

Note that this does _not_ add a "ro" option directly to the bind mount
operation. If you require such a mount, you must first do the bind, then
follow it up with a 'mount -o remount,ro' operation:

If you wish to have a r/o bind mount of /foo on bar:

mount --bind /foo /bar
mount -o remount,ro /bar
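
For callers that use the system call directly rather than mount(8), the same two-step sequence might look like the sketch below. It is an assumption on my part that the per-mount behaviour is wanted, so the second call uses the MS_REMOUNT | MS_BIND | MS_RDONLY combination described in mount(2), which corresponds to 'mount -o remount,bind,ro'. The function name is hypothetical, and the code is Linux-specific.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Bind-mount src on dst, then make only that bind mount read-only.
 * Returns 0 on success and -1 (errno set by mount(2)) on failure.
 * Needs CAP_SYS_ADMIN, so expect EPERM when run unprivileged.
 */
int bind_mount_readonly(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    /* Step 1, as in: mount --bind src dst */
    if (mount(src, dst, NULL, MS_BIND, NULL) != 0)
        return -1;

    /* Step 2: MS_BIND limits the read-only remount to this mount point. */
    if (mount(NULL, dst, NULL, MS_REMOUNT | MS_BIND | MS_RDONLY, NULL) != 0) {
        int saved = errno;
        umount(dst);             /* do not leave a writable bind mount behind */
        errno = saved;
        return -1;
    }
    return 0;
}
```

A call such as bind_mount_readonly("/foo", "/bar") would mirror the two shell commands above.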

Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Dave Hansen
2008-04-19 12:29:27 +0800
3d733633a [PATCH] r/o bind mounts: track numbers of writers to mounts

This is the real meat of the entire series. It actually
implements the tracking of the number of writers to a mount.
However, it causes scalability problems because there can be
hundreds of cpus doing open()/close() on files on the same mnt at
the same time. Even an atomic_t in the mnt has massive scaling
problems because the cacheline gets so terribly contended.

This uses a statically-allocated percpu variable. All want/drop
operations are local to a cpu as long as that cpu operates on the same
mount, and there are no writer count imbalances. Writer count
imbalances happen when a write is taken on one cpu, and released
on another, like when an open/close pair is performed on two
different cpus.

Upon a remount,ro request, all of the data from the percpu
variables is collected (expensive, but very rare) and we determine
if there are any outstanding writers to the mount.
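
The idea of cheap per-cpu updates with an expensive but rare summation can be modelled in userspace as follows. This is a deliberately simplified sketch: it tracks a single mount, uses a plain array indexed by a caller-supplied cpu number instead of real percpu variables, and omits all locking. All names are hypothetical.

```c
#include <assert.h>

#define NR_TOY_CPUS 4            /* hypothetical CPU count for the sketch */

/* One slot per CPU; each CPU only ever touches its own slot. */
static long writers[NR_TOY_CPUS];

void toy_want_write(int cpu)
{
    assert(cpu >= 0 && cpu < NR_TOY_CPUS);
    writers[cpu]++;
}

void toy_drop_write(int cpu)
{
    assert(cpu >= 0 && cpu < NR_TOY_CPUS);
    writers[cpu]--;              /* may go negative on this CPU */
}

/* The rare, expensive path: fold every slot into one total. */
long toy_count_writers(void)
{
    long total = 0;

    for (int cpu = 0; cpu < NR_TOY_CPUS; cpu++)
        total += writers[cpu];
    return total;
}
```

A write taken on cpu 0 and released on cpu 1 leaves +1 in one slot and -1 in the other. Neither slot is meaningful by itself, but the sum is still correct, which is why only the total is consulted on a remount,ro request.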

I've written a little benchmark to sit in a loop for a couple of
seconds in several cpus in parallel doing open/write/close loops.

http://sr71.net/~dave/linux/openbench.c

The code in here is a worst-possible case for this patch. It
does opens on a _pair_ of files in two different mounts in parallel.
This should cause my code to lose its "operate on the same mount"
optimization completely. This worst-case scenario causes a 3%
degradation in the benchmark.

I could probably get rid of even this 3%, but it would be more
complex than what I have here, and I think this is getting into
acceptable territory. In practice, I expect writing more than 3
bytes to a file, as well as disk I/O to mask any effects that this
has.

(To get rid of that 3%, we could have a #defined number of mounts
in the percpu variable. So, instead of a CPU getting to operate only
on percpu data when it accesses only one mount, it could stay on
percpu data when it only accesses N or fewer mounts.)

[AV] merged fix for __clear_mnt_mount() stepping on freed vfsmount

Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Dave Hansen
2008-04-19 12:29:27 +0800
8366025eb [PATCH] r/o bind mounts: stub functions

This patch adds two functions, mnt_want_write() and mnt_drop_write(). These are
used like a lock pair around any fs operations that might cause a write to the
filesystem.

Before these can become useful, we must first cover each place in the VFS
where writes are performed with a want/drop pair. When that is complete, we
can actually introduce code that will safely check the counts before allowing
r/w <-> r/o transitions to occur.
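
A toy model of the intended usage pattern is sketched below. The toy_ prefixed names and the struct are hypothetical; only the want/drop pairing and the idea of checking the count before a transition to read-only come from the description above. The error codes are my choice for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mnt {
    bool readonly;
    int writers;                 /* outstanding want_write calls */
};

/* Ask for write access; refused with -EROFS on a read-only mount. */
int toy_mnt_want_write(struct toy_mnt *m)
{
    assert(m != NULL);
    if (m->readonly)
        return -EROFS;
    m->writers++;
    return 0;
}

void toy_mnt_drop_write(struct toy_mnt *m)
{
    assert(m != NULL && m->writers > 0);
    m->writers--;
}

/* The r/w to r/o transition is refused while any writer is outstanding. */
int toy_mnt_make_readonly(struct toy_mnt *m)
{
    assert(m != NULL);
    if (m->writers > 0)
        return -EBUSY;
    m->readonly = true;
    return 0;
}

/* Typical caller: bracket the modifying operation with the pair. */
int toy_touch_file(struct toy_mnt *m)
{
    int err = toy_mnt_want_write(m);

    if (err)
        return err;
    /* ... the operation that might write to the filesystem ... */
    toy_mnt_drop_write(m);
    return 0;
}
```

The model only works if every write path is bracketed, which is why the message insists on covering each place in the VFS before the counts are trusted.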

Acked-by: Serge Hallyn
Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Dave Hansen
Signed-off-by: Al Viro

Dave Hansen
2008-04-19 12:25:32 +0800

28 Mar, 2008

5 commits

6758f953d [PATCH] mnt_expire is protected by namespace_sem, no need for vfsmount_lock

Signed-off-by: Al Viro

Al Viro
2008-03-28 08:48:04 +0800
c35038bec [PATCH] do shrink_submounts() for all fs types

... and take it out of ->umount_begin() instances. Call with all locks
already taken (by do_umount()) and leave calling release_mounts() to
caller (it will do release_mounts() anyway, so we can just put into
the same list).

Signed-off-by: Al Viro

Al Viro
2008-03-28 08:47:58 +0800
bcc5c7d2b [PATCH] sanitize locking in mark_mounts_for_expiry() and shrink_submounts()

... and fix a race on access of ->mnt_share et al. without namespace_sem
in the latter.

Signed-off-by: Al Viro

Al Viro
2008-03-28 08:47:52 +0800
7c4b93d82 [PATCH] count ghost references to vfsmounts

make propagate_mount_busy() exclude references from the vfsmounts
that had been isolated by umount_tree() and are just waiting for
release_mounts() to dispose of their ->mnt_parent/->mnt_mountpoint.

Signed-off-by: Al Viro

Al Viro
2008-03-28 08:47:46 +0800
1a3906895 [PATCH] reduce stack footprint in namespace.c

A lot of places misuse struct nameidata when they need struct path.

Signed-off-by: Al Viro

Al Viro
2008-03-28 08:47:40 +0800

15 Feb, 2008

6 commits

c32c2f63a d_path: Make seq_path() use a struct path argument

seq_path() is always called with a dentry and a vfsmount from a struct path.
Make seq_path() take it directly as an argument.

Signed-off-by: Jan Blunck
Cc: Christoph Hellwig
Cc: Al Viro
Cc: "J. Bruce Fields"
Cc: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Blunck
2008-02-15 13:17:08 +0800
ac748a09f Make set_fs_{root,pwd} take a struct path

In nearly all cases the set_fs_{root,pwd}() calls work on a struct
path. Change the function to reflect this and use path_get() here.

Signed-off-by: Jan Blunck
Signed-off-by: Andreas Gruenbacher
Acked-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Blunck
2008-02-15 13:13:33 +0800
6ac08c39a Use struct path in fs_struct

* Use struct path in fs_struct.

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Jan Blunck
Acked-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Blunck
2008-02-15 13:13:33 +0800
1d957f9bf Introduce path_put()

* Add path_put() functions for releasing a reference to the dentry and
vfsmount of a struct path in the right order

* Switch from path_release(nd) to path_put(&nd->path)

* Rename dput_path() to path_put_conditional()
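
To make "the right order" concrete, here is a small userspace mock. As far as I know the kernel helper drops the dentry reference before the vfsmount reference; the toy_ types and the sequence counter below are hypothetical scaffolding that only exists to make that order observable.

```c
#include <assert.h>
#include <stddef.h>

static int release_seq;          /* global counter recording release order */

struct toy_dentry   { int refcount; int released_at; };
struct toy_vfsmount { int refcount; int released_at; };

/* Mirrors the pairing that struct path bundles together. */
struct toy_path {
    struct toy_vfsmount *mnt;
    struct toy_dentry *dentry;
};

static void toy_dput(struct toy_dentry *d)
{
    d->refcount--;
    d->released_at = ++release_seq;
}

static void toy_mntput(struct toy_vfsmount *m)
{
    m->refcount--;
    m->released_at = ++release_seq;
}

/* Drop both references: dentry first, then the mount it lives on. */
void toy_path_put(struct toy_path *path)
{
    assert(path != NULL && path->dentry != NULL && path->mnt != NULL);
    toy_dput(path->dentry);
    toy_mntput(path->mnt);
}
```

Putting both puts behind one helper means callers can no longer get the order wrong by hand.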

[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck
Signed-off-by: Andreas Gruenbacher
Acked-by: Christoph Hellwig
Cc:
Cc: Al Viro
Cc: Steven French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Blunck
2008-02-15 13:13:33 +0800
4ac913785 Embed a struct path into struct nameidata instead of nd->{dentry,mnt}

This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.

Together with the other patches of this series
- it enforces the correct order of getting/releasing the reference count on
<dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
struct path in every place where the stack can be traversed
- it reduces the overall code size:

without patch series:
   text    data     bss     dec     hex filename
5321639  858418  715768 6895825  6938d1 vmlinux

with patch series:
   text    data     bss     dec     hex filename
5320026  858418  715768 6894212  693284 vmlinux

This patch:

Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck
Signed-off-by: Andreas Gruenbacher
Acked-by: Christoph Hellwig
Cc: Al Viro
Cc: Casey Schaufler
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Blunck
2008-02-15 13:13:33 +0800
429731b15 Remove path_release_on_umount()

path_release_on_umount() should only be called from sys_umount(). I merged the
function into sys_umount() instead of having it in namei.c.

Signed-off-by: Jan Blunck
Acked-by: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Blunck
2008-02-15 13:13:32 +0800

09 Feb, 2008

2 commits

2dafe1c4d reduce large do_mount stack usage with noinlines

do_mount() uses a whopping 616 bytes of stack on x86_64 in 2.6.24-mm1,
largely thanks to gcc inlining the various helper functions.

noinlining these can slim it down a lot; on my box this patch gets it down
to 168, which is mostly the struct nameidata nd; left on the stack.

These functions are called only as do_mount() helpers; none of them should
be in any path that would see a performance benefit from inlining...

Signed-off-by: Eric Sandeen
Cc: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Sandeen
2008-02-09 01:22:44 +0800
b3b304a23 mount options: add generic_show_options()

Add a new s_options field to struct super_block. Filesystems can save
mount options passed to them in mount or remount. It is automatically
freed when the superblock is destroyed.

A new helper function, generic_show_options() is introduced, which uses
this field to display the mount options in /proc/mounts.

Another helper function, save_mount_options() may be used by
filesystems to save the options in the super block.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-02-09 01:22:39 +0800

07 Feb, 2008

1 commit

13f14b4d8 Use ilog2() in fs/namespace.c

We can use ilog2() in fs/namespace.c to compute hash_bits and hash_mask at
compile time, not runtime.
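
A userspace sketch of the compile-time approach follows. The LOG2_* macros are a hypothetical stand-in for the kernel's ilog2() when given a constant argument, and the page size and table layout are assumptions made for illustration rather than the values used in fs/namespace.c.

```c
#include <assert.h>

/* Constant-expression floor(log2(n)) for 1 <= n < 2^32, built by halving. */
#define LOG2_2(n)  ((n) >= 2 ? 1 : 0)
#define LOG2_4(n)  ((n) >= 4 ? 2 + LOG2_2((n) >> 2) : LOG2_2(n))
#define LOG2_8(n)  ((n) >= 16 ? 4 + LOG2_4((n) >> 4) : LOG2_4(n))
#define LOG2_16(n) ((n) >= 256 ? 8 + LOG2_8((n) >> 8) : LOG2_8(n))
#define LOG2_32(n) ((n) >= 65536 ? 16 + LOG2_16((n) >> 16) : LOG2_16(n))

#define TOY_PAGE_SIZE 4096UL     /* assumed page size */

struct toy_list_head { struct toy_list_head *next, *prev; };

/* All three values are integer constants fixed at compile time. */
enum {
    TOY_HASH_SHIFT = LOG2_32(TOY_PAGE_SIZE / sizeof(struct toy_list_head)),
    TOY_HASH_SIZE  = 1 << TOY_HASH_SHIFT,
    TOY_HASH_MASK  = TOY_HASH_SIZE - 1
};

static_assert(TOY_HASH_SIZE * sizeof(struct toy_list_head) <= TOY_PAGE_SIZE,
              "hash table must fit in one page");

/* Being an enum constant, the size can dimension a static array. */
static struct toy_list_head toy_hash_table[TOY_HASH_SIZE];

struct toy_list_head *toy_hash_bucket(unsigned long key)
{
    return &toy_hash_table[key & TOY_HASH_MASK];
}
```

With the shift and mask known to the compiler, no initialisation code has to compute them at boot, and the mask can be folded directly into each lookup.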

[akpm@linux-foundation.org: clean it all up]
Signed-off-by: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2008-02-07 02:41:09 +0800

25 Jan, 2008

2 commits

00d266662 kobject: convert main fs kobject to use kobject_create

This also renames fs_subsys to fs_kobj to catch all current users with a
build error instead of a build warning which can easily be missed.

Cc: Kay Sievers
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2008-01-25 12:40:13 +0800
3514faca1 kobject: remove struct kobj_type from struct kset

We don't need a "default" ktype for a kset. We should set this
explicitly every time for each kset. This change is needed so that we
can make ksets dynamic, and cleans up one of the odd, undocumented
assumptions that the kset/kobject/ktype model has.

This patch is based on a lot of help from Kay Sievers.

Nasty bug in the block code was found by Dave Young

Cc: Kay Sievers
Cc: Dave Young
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2008-01-25 12:40:10 +0800

21 Oct, 2007

1 commit

8aec08094 [PATCH] new helpers - collect_mounts() and release_collected_mounts()

Get a snapshot of a subtree, creating private clones of vfsmounts
for all its components and release such snapshot resp.

Signed-off-by: Al Viro

Al Viro
2007-10-21 14:37:25 +0800

20 Oct, 2007

1 commit

8bf9725c2 pid namespaces: introduce MS_KERNMOUNT flag

This flag tells the .get_sb callback that this is a kern_mount() call so that
it can trust the *data pointer to be a valid in-kernel one. If this flag is passed
from a user process, it is cleared since the *data pointer is not a valid
kernel object.

Running a few steps forward - this will be needed for proc to create the
superblock and store a valid pid namespace on it during the namespace
creation. The reason why the namespace cannot live without a proc mount is
described in the appropriate patch.

Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:38 +0800

17 Oct, 2007

1 commit

74bf17cff fs: remove the unused mempages parameter

Since the mempages parameter is not actually used, it should be removed.

Now only files_init() uses the mempages parameter,

files_init(mempages);

but I don't think the adaptation to mempages in files_init is really
useful; and if files_init also changed to the prototype void (*func)(void),
the wrapper vfs_caches_init would also not need the mempages parameter.

Signed-off-by: Denis Cheng
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Denis Cheng
2007-10-17 23:42:49 +0800

20 Jul, 2007

1 commit

20c2df83d mm: Remove slab destructors from kmem_cache_create().

Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt

Paul Mundt
2007-07-20 09:11:58 +0800

17 Jul, 2007

4 commits

948730b0e fs/namespace.c should #include "internal.h"

Every file should include the headers containing the prototypes for
its global functions.

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2007-07-17 00:05:50 +0800
213dd266d namespace: ensure clone_flags are always stored in an unsigned long

While working on unshare support for the network namespace I noticed we
were putting clone flags in an int. Which is weird because the syscall
uses unsigned long and we at least need an unsigned to properly hold all of
the unshare flags.

So to make the code consistent, this patch updates the code to use
unsigned long instead of int for the clone flags in those places
where we get it wrong today.

Signed-off-by: Eric W. Biederman
Acked-by: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-07-17 00:05:48 +0800
467e9f4b5 fix create_new_namespaces() return value

dup_mnt_ns() and clone_uts_ns() return NULL on failure. This is wrong,
create_new_namespaces() uses ERR_PTR() to catch an error. This means that the
subsequent create_new_namespaces() will hit BUG_ON() in copy_mnt_ns() or
copy_utsname().

Modify create_new_namespaces() to also use the errors returned by the
copy_*_ns routines and not to systematically return ENOMEM.
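
The mismatch described above is easier to see with a userspace imitation of the error-pointer convention. The toy_ helpers below are my own reimplementation for illustration (they mimic ERR_PTR(), PTR_ERR() and IS_ERR() but are not the kernel's definitions), and toy_copy_ns() is a hypothetical copy routine.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value in the top page of the address space. */
static inline void *toy_err_ptr(long error)      { return (void *)(intptr_t)error; }
static inline long  toy_ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int   toy_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_ns { int id; };

/* Copy routine following the error-pointer convention: never returns NULL. */
struct toy_ns *toy_copy_ns(const struct toy_ns *old, int simulate_oom)
{
    struct toy_ns *ns;

    assert(old != NULL);
    ns = simulate_oom ? NULL : malloc(sizeof(*ns));
    if (ns == NULL)
        return toy_err_ptr(-ENOMEM);
    ns->id = old->id + 1;
    return ns;
}

/* The caller propagates whatever error the copy routine reported. */
int toy_create_new_namespaces(const struct toy_ns *old, int simulate_oom,
                              struct toy_ns **out)
{
    struct toy_ns *ns = toy_copy_ns(old, simulate_oom);

    if (toy_is_err(ns))
        return (int)toy_ptr_err(ns);
    *out = ns;
    return 0;
}
```

Note that toy_is_err(NULL) is false. A callee that signals failure with NULL therefore slips past a caller that only tests for error pointers, which is the kind of mismatch the message describes.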

[oleg@tv-sign.ru: better changelog]
Signed-off-by: Cedric Le Goater
Cc: Serge E. Hallyn
Cc: Badari Pulavarty
Cc: Pavel Emelianov
Cc: Herbert Poetzl
Cc: Eric W. Biederman
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2007-07-17 00:05:47 +0800
b0765fb85 Make /proc/self/mounts(tats) use seq_list_xxx helpers

One more simple and stupid switching to the new API.

Signed-off-by: Pavel Emelianov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelianov
2007-07-17 00:05:42 +0800

09 May, 2007

4 commits

ee6f95829 check privileges before setting mount propagation

There's a missing check for CAP_SYS_ADMIN in do_change_type().

Signed-off-by: Miklos Szeredi
Cc: Al Viro
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-05-09 02:15:12 +0800
b5e618181 Introduce a handy list_first_entry macro

There are many places in the kernel where a construction like

foo = list_entry(head->next, struct foo_struct, list);

is used.
The code might look more descriptive and neat if it used the macro

list_first_entry(head, type, member) \
list_entry((head)->next, type, member)

Here is the macro itself and the examples of its usage in the generic code.
If it turns out to be useful, I can prepare a set of patches to
inject it into arch-specific code, drivers, networking, etc.
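
For readers without the kernel headers at hand, the macro can be tried out in userspace with a minimal list implementation. The list_head, container_of and list_entry definitions below are simplified stand-ins written for this sketch (the kernel versions include extra type checking), and first_value() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Simplified stand-ins for the kernel macros of the same names. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)

static void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

struct foo_struct {
    int value;
    struct list_head list;
};

/* Value of the first element; the list must not be empty. */
int first_value(struct list_head *head)
{
    assert(head->next != head);
    return list_first_entry(head, struct foo_struct, list)->value;
}
```

Like the open-coded list_entry(head->next, ...) form, the macro does not check for an empty list, so callers still have to rule that case out themselves.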

Signed-off-by: Pavel Emelianov
Signed-off-by: Kirill Korotaev
Cc: Randy Dunlap
Cc: Andi Kleen
Cc: Zach Brown
Cc: Davide Libenzi
Cc: John McCutchan
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: john stultz
Cc: Ram Pai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelianov
2007-05-09 02:15:11 +0800
79c0b2df7 add filesystem subtype support

There's a slight problem with filesystem type representation in fuse
based filesystems.

From the kernel's view, there are just two filesystem types: fuse and
fuseblk. From the user's view there are lots of different filesystem
types. The user is not even much concerned if the filesystem is fuse based
or not. So there's a conflict of interest in how this should be
represented in fstab, mtab and /proc/mounts.

The current scheme is to encode the real filesystem type in the mount
source. So an sshfs mount looks like this:

sshfs#user@server:/ /mnt/server fuse rw,nosuid,nodev,...

This url-ish syntax works OK for sshfs and similar filesystems. However
for block device based filesystems (ntfs-3g, zfs) it doesn't work, since
the kernel expects the mount source to be a real device name.

A possibly better scheme would be to encode the real type in the type
field as "type.subtype". So fuse mounts would look like this:

/dev/hda1 /mnt/windows fuseblk.ntfs-3g rw,...
user@server:/ /mnt/server fuse.sshfs rw,nosuid,nodev,...

This patch adds the necessary code to the kernel so that this can be
correctly displayed in /proc/mounts.
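
A tool reading the type field under this scheme would need to split it at the dot. The helper below is a hypothetical userspace sketch of that step. It is not code from the patch, and it assumes the first dot is the separator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Split "type.subtype" at the first dot. Each output is truncated to its
 * buffer size; subtype becomes "" when no dot is present.
 */
void split_fstype(const char *full, char *type, size_t type_len,
                  char *subtype, size_t subtype_len)
{
    const char *dot;

    assert(full != NULL && type != NULL && subtype != NULL);
    assert(type_len > 0 && subtype_len > 0);

    dot = strchr(full, '.');
    if (dot != NULL) {
        snprintf(type, type_len, "%.*s", (int)(dot - full), full);
        snprintf(subtype, subtype_len, "%s", dot + 1);
    } else {
        snprintf(type, type_len, "%s", full);
        subtype[0] = '\0';
    }
}
```

For "fuseblk.ntfs-3g" this yields the type "fuseblk" and the subtype "ntfs-3g", while a plain type such as "ext3" is returned unchanged with an empty subtype.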

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-05-09 02:15:01 +0800
e3222c4ec Merge sys_clone()/sys_unshare() nsproxy and namespace handling

sys_clone() and sys_unshare() both make copies of nsproxy and its associated
namespaces, but they use different code paths.

This patch merges all the nsproxy and its associated namespace copy/clone
handling (as much as possible). Posted on container list earlier for
feedback.

- Create a new nsproxy and its associated namespaces and pass it back to
caller to attach it to right process.

- Changed all copy_*_ns() routines to return a new copy of namespace
instead of attaching it to task->nsproxy.

- Moved the CAP_SYS_ADMIN checks out of copy_*_ns() routines.

- Removed unnecessary !ns checks from copy_*_ns() and added BUG_ON()
just in case.

- Get rid of all individual unshare_*_ns() routines and make use of
copy_*_ns() instead.

[akpm@osdl.org: cleanups, warning fix]
[clg@fr.ibm.com: remove dup_namespaces() declaration]
[serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval]
[akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n]
Signed-off-by: Badari Pulavarty
Signed-off-by: Serge Hallyn
Cc: Cedric Le Goater
Cc: "Eric W. Biederman"
Cc:
Signed-off-by: Cedric Le Goater
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2007-05-09 02:15:00 +0800