Eric Lee / smarc-fsl-linux-kernel

09 May, 2007

1 commit

c23fbb6bc VFS: delay the dentry name generation on sockets and pipes ... Browse Code »

1) Introduces a new method in 'struct dentry_operations'. This method
called d_dname() might be called from d_path() to build a pathname for
special filesystems. It is called without locks.

Future patches (if we succeed in having one common dentry for all
pipes/sockets) may need to change prototype of this method, but we now
use : char *d_dname(struct dentry *dentry, char *buffer, int buflen);

2) Adds a dynamic_dname() helper function that eases d_dname() implementations

3) Defines d_dname method for sockets : No more sprintf() at socket
creation. This is delayed up to the moment someone does an access to
/proc/pid/fd/...

4) Defines d_dname method for pipes : No more sprintf() at pipe
creation. This is delayed up to the moment someone does an access to
/proc/pid/fd/...

A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a
*nice* speedup on my Pentium(M) 1.6 Ghz :

3.090 s instead of 3.450 s

Signed-off-by: Eric Dumazet
Acked-by: Christoph Hellwig
Acked-by: Linus Torvalds
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2007-05-09 02:15:03 +0800

12 Oct, 2006

1 commit

c636ebdb1 [PATCH] VFS: Destroy the dentries contributed by a superblock on unmounting ... Browse Code »

The attached patch destroys all the dentries attached to a superblock in one go
by:

(1) Destroying the tree rooted at s_root.

(2) Destroying every entry in the anon list, one at a time.

(3) Each entry in the anon list has its subtree consumed from the leaves
inwards.

This reduces the amount of work generic_shutdown_super() does, and avoids
iterating through the dentry_unused list.

Note that locking is almost entirely absent in the shrink_dcache_for_umount*()
functions added by this patch. This is because:

(1) at the point the filesystem calls generic_shutdown_super(), it is not
permitted to further touch the superblock's set of dentries, and nor may
it remove aliases from inodes;

(2) the dcache memory shrinker now skips dentries that are being unmounted;
and

(3) the superblock no longer has any external references through which the VFS
can reach it.

Given these points, the only locking we need to do is when we remove dentries
from the unused list and the name hashes, which we do a directory's worth at a
time.

We also don't need to guard against reference counts going to zero unexpectedly
and removing bits of the tree we're working on as nothing else can call dput().

A cut down version of dentry_iput() has been folded into
shrink_dcache_for_umount_subtree() function. Apart from not needing to unlock
things, it also doesn't need to check for inotify watches.

In this version of the patch, the complaint about a dentry still being in use
has been expanded from a single BUG_ON() and now gives much more information.

Signed-off-by: David Howells
Acked-by: NeilBrown
Acked-by: Ian Kent
Cc: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-10-12 02:14:25 +0800

23 Sep, 2006

1 commit

770bfad84 NFS: Add dentry materialisation op ... Browse Code »

The attached patch adds a new directory cache management function that prepares
a disconnected anonymous function to be connected into the dentry tree. The
anonymous dentry is transferred the name and parentage from another dentry.

The following changes were made in [try #2]:

(*) d_materialise_dentry() now switches the parentage of the two nodes around
correctly when one or other of them is self-referential.

The following changes were made in [try #7]:

(*) d_instantiate_unique() has had the interior part split out as function
__d_instantiate_unique(). Callers of this latter function must be holding
the appropriate locks.

(*) _d_rehash() has been added as a wrapper around __d_rehash() to call it
with the most obvious hash list (the one from the name). d_rehash() now
calls _d_rehash().

(*) d_materialise_dentry() is now __d_materialise_dentry() and is static.

(*) d_materialise_unique() added to perform the combination of d_find_alias(),
d_materialise_dentry() and d_add_unique() that the NFS client was doing
twice, all within a single dcache_lock critical section. This reduces the
number of times two different spinlocks were being accessed.

The following further changes were made:

(*) Add the dentries onto their parents d_subdirs lists.

Signed-Off-By: David Howells
Signed-off-by: Trond Myklebust

David Howells
2006-09-23 11:24:30 +0800

04 Jul, 2006

1 commit

a90b9c05d [PATCH] lockdep: annotate dcache ... Browse Code »

Teach special (recursive) locking code to the lock validator. Has no effect
on non-lockdep kernels.

Signed-off-by: Ingo Molnar
Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2006-07-04 06:27:06 +0800

23 Jun, 2006

2 commits

454e2398b [PATCH] VFS: Permit filesystem to override root dentry on mount ... Browse Code »

Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.

The patch also makes the following changes:

(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.

(*) If one of the convenience function is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().

(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().

This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.

However, with the way NFS superblock sharing are currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.

[*] Anonymous until discovered from another tree.

(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Cc: Roland Dreier
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-06-23 22:42:45 +0800
0feae5c47 [PATCH] Fix dcache race during umount ... Browse Code »

The race is that the shrink_dcache_memory shrinker could get called while a
filesystem is being unmounted, and could try to prune a dentry belonging to
that filesystem.

If it does, then it will call in to iput on the inode while the dentry is
no longer able to be found by the umounting process. If iput takes a
while, generic_shutdown_super could get all the way though
shrink_dcache_parent and shrink_dcache_anon and invalidate_inodes without
ever waiting on this particular inode.

Eventually the superblock gets freed anyway and if the iput tried to touch
it (which some filesystems certainly do), it will lose. The promised
"Self-destruct in 5 seconds" doesn't lead to a nice day.

The race is closed by holding s_umount while calling prune_one_dentry on
someone else's dentry. As a down_read_trylock is used,
shrink_dcache_memory will no longer try to prune the dentry of a filesystem
that is being unmounted, and unmount will not be able to start until any
such active prune_one_dentry completes.

This requires that prune_dcache *knows* which filesystem (if any) it is
doing the prune on behalf of so that it can be careful of other
filesystems. shrink_dcache_memory isn't called it on behalf of any
filesystem, and so is careful of everything.

shrink_dcache_anon is now passed a super_block rather than the s_anon list
out of the superblock, so it can get the s_anon list itself, and can pass
the superblock down to prune_dcache.

If prune_dcache finds a dentry that it cannot free, it leaves it where it
is (at the tail of the list) and exits, on the assumption that some other
thread will be removing that dentry soon. To try to make sure that some
work gets done, a limited number of dnetries which are untouchable are
skipped over while choosing the dentry to work on.

I believe this race was first found by Kirill Korotaev.

Cc: Jan Blunck
Acked-by: Kirill Korotaev
Cc: Olaf Hering
Acked-by: Balbir Singh
Signed-off-by: Neil Brown
Signed-off-by: Balbir Singh
Acked-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-06-23 06:05:57 +0800

01 Apr, 2006

1 commit

3e7e241f8 [PATCH] dcache: Add helper d_hash_and_lookup ... Browse Code »

It is very common to hash a dentry and then to call lookup. If we take fs
specific hash functions into account the full hash logic can get ugly.
Further full_name_hash as an inline function is almost 100 bytes on x86 so
having a non-inline choice in some cases can measurably decrease code size.

Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-04-01 04:19:00 +0800

26 Mar, 2006

1 commit

c32ccd87b [PATCH] inotify: lock avoidance with parent watch status in dentry ... Browse Code »

Previous inotify work avoidance is good when inotify is completely unused,
but it breaks down if even a single watch is in place anywhere in the
system. Robin Holt notices that udev is one such culprit - it slows down a
512-thread application on a 512 CPU system from 6 seconds to 22 minutes.

Solve this by adding a flag in the dentry that tells inotify whether or not
its parent inode has a watch on it. Event queueing to parent will skip
taking locks if this flag is cleared. Setting and clearing of this flag on
all child dentries versus event delivery: this is no in terms of race
cases, and that was shown to be equivalent to always performing the check.

The essential behaviour is that activity occuring _after_ a watch has been
added and _before_ it has been removed, will generate events.

Signed-off-by: Nick Piggin
Cc: Robert Love
Cc: John McCutchan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2006-03-26 00:22:53 +0800

08 Feb, 2006

1 commit

1b8623545 [PATCH] remove bogus asm/bug.h includes. ... Browse Code »

A bunch of asm/bug.h includes are both not needed (since it will get
pulled anyway) and bogus (since they are done too early). Removed.

Signed-off-by: Al Viro

Al Viro
2006-02-08 09:56:35 +0800

04 Feb, 2006

1 commit

47ba87e0b [PATCH] make "struct d_cookie" depend on CONFIG_PROFILING ... Browse Code »

Shrinks "struct dentry" from 128 bytes to 124 on x86, allowing 31 objects
per slab instead of 30.

Cc: John Levon
Cc: Philippe Elie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Marcelo Tosatti
2006-02-04 00:32:04 +0800

09 Jan, 2006

1 commit

5160ee6fc [PATCH] shrink dentry struct ... Browse Code »

Some long time ago, dentry struct was carefully tuned so that on 32 bits
UP, sizeof(struct dentry) was exactly 128, ie a power of 2, and a multiple
of memory cache lines.

Then RCU was added and dentry struct enlarged by two pointers, with nice
results for SMP, but not so good on UP, because breaking the above tuning
(128 + 8 = 136 bytes)

This patch reverts this unwanted side effect, by using an union (d_u),
where d_rcu and d_child are placed so that these two fields can share their
memory needs.

At the time d_free() is called (and d_rcu is really used), d_child is known
to be empty and not touched by the dentry freeing.

Lockless lookups only access d_name, d_parent, d_lock, d_op, d_flags (so
the previous content of d_child is not needed if said dentry was unhashed
but still accessed by a CPU because of RCU constraints)

As dentry cache easily contains millions of entries, a size reduction is
worth the extra complexity of the ugly C union.

Signed-off-by: Eric Dumazet
Cc: Dipankar Sarma
Cc: Maneesh Soni
Cc: Miklos Szeredi
Cc: "Paul E. McKenney"
Cc: Ian Kent
Cc: Paul Jackson
Cc: Al Viro
Cc: Christoph Hellwig
Cc: Trond Myklebust
Cc: Neil Brown
Cc: James Morris
Cc: Stephen Smalley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2006-01-09 12:13:58 +0800

08 Nov, 2005

1 commit

a05964f39 [PATCH] shared mounts handling: umount ... Browse Code »

An unmount of a mount creates a umount event on the parent. If the
parent is a shared mount, it gets propagated to all mounts in the peer
group.

Signed-off-by: Ram Pai
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Ram Pai
2005-11-08 10:18:11 +0800

08 Sep, 2005

1 commit

3f4bb1f41 [PATCH] struct dentry: place d_hash close to d_parent and d_name to speedup lookups ... Browse Code »

dentry cache uses sophisticated RCU technology (and prefetching if
available) but touches 2 cache lines per dentry during hlist lookup.

This patch moves d_hash in the same cache line than d_parent and d_name
fields so that :

1) One cache line is needed instead of two.

2) the hlist_for_each_rcu() prefetching has a chance to bring all the
needed data in advance, not only the part that includes d_hash.next.

I also changed one old comment that was wrong for 64bits.

A further optimisation would be to separate dentry in two parts, one that
is mostly read, and one writen (d_count/d_lock) to avoid false sharing on
SMP/NUMA but this would need different field placement depending on 32bits
or 64bits platform.

Signed-off-by: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2005-09-08 07:57:41 +0800

17 Apr, 2005

1 commit

1da177e4c Linux-2.6.12-rc2 ... Browse Code »

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

Linus Torvalds
2005-04-17 06:20:36 +0800