09 May, 2007

1 commit

  • 1) Introduces a new method, d_dname(), in 'struct dentry_operations'.
    d_path() may call it to build a pathname for special filesystems. It is
    called without locks held.

    Future patches (if we succeed in having one common dentry for all
    pipes/sockets) may need to change the prototype of this method, but for now
    we use: char *d_dname(struct dentry *dentry, char *buffer, int buflen);

    2) Adds a dynamic_dname() helper function that eases d_dname() implementations

    3) Defines a d_dname method for sockets: no more sprintf() at socket
    creation. This is deferred until someone accesses
    /proc/pid/fd/...

    4) Defines a d_dname method for pipes: no more sprintf() at pipe
    creation. This is deferred until someone accesses
    /proc/pid/fd/...
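    The lazy naming described in 2) to 4) can be sketched in userspace C. This is
    a minimal illustration, not the kernel code: struct fake_dentry,
    sketch_dynamic_dname() and sketch_pipe_dname() are hypothetical names, the
    64-byte temporary buffer is an assumption, and a NULL return stands in for
    whatever error the real helper reports. The name is written at the tail of the
    caller's buffer, on the assumption that the caller (like d_path()) assembles
    paths from the end of the buffer. The "pipe:[<inode>]" form matches what
    readlink() reports for pipe entries under /proc/pid/fd.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct dentry: only the inode number is needed. */
struct fake_dentry {
	unsigned long ino;
};

/*
 * Sketch of a dynamic_dname()-style helper: format the name into a small
 * temporary buffer, then copy it (including the NUL) to the tail of the
 * caller's buffer. Returns NULL if the name does not fit.
 */
static char *sketch_dynamic_dname(char *buffer, int buflen, const char *fmt, ...)
{
	char temp[64];
	va_list args;
	int sz;

	assert(buffer != NULL && buflen > 0);

	va_start(args, fmt);
	sz = vsnprintf(temp, sizeof(temp), fmt, args) + 1; /* +1 for the NUL */
	va_end(args);

	/* Encoding error, truncated in temp, or too big for the caller. */
	if (sz <= 0 || sz > (int)sizeof(temp) || sz > buflen)
		return NULL;

	buffer += buflen - sz;
	return memcpy(buffer, temp, (size_t)sz);
}

/* d_dname-style method for pipes: the name is only built when asked for. */
static char *sketch_pipe_dname(struct fake_dentry *dentry, char *buffer, int buflen)
{
	return sketch_dynamic_dname(buffer, buflen, "pipe:[%lu]", dentry->ino);
}
```

    Nothing is formatted until the method is called, which is the point of
    moving the sprintf() out of pipe creation.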

    A benchmark consisting of 1,000,000 calls to pipe()/close()/close() gives a
    *nice* speedup on my Pentium(M) 1.6 GHz:

    3.090 s instead of 3.450 s
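    A loop of the same shape as that benchmark might look like the following.
    This is a sketch: time_pipe_cycles() is a hypothetical name, and absolute
    timings depend on the CPU and kernel, so no particular result is implied.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Time n rounds of pipe()/close()/close(); returns seconds, or -1.0 on error. */
static double time_pipe_cycles(long n)
{
	struct timespec start, end;
	int fds[2];

	assert(n >= 0);
	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1.0;
	for (long i = 0; i < n; i++) {
		if (pipe(fds) != 0)
			return -1.0;
		close(fds[0]);
		close(fds[1]);
	}
	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return -1.0;
	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

    Calling time_pipe_cycles(1000000) on kernels with and without such a change
    is one way to compare them.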

    Signed-off-by: Eric Dumazet
    Acked-by: Christoph Hellwig
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

12 Oct, 2006

1 commit

  • The attached patch destroys all the dentries attached to a superblock in one go
    by:

    (1) Destroying the tree rooted at s_root.

    (2) Destroying every entry in the anon list, one at a time.

    (3) Each entry in the anon list has its subtree consumed from the leaves
    inwards.

    This reduces the amount of work generic_shutdown_super() does, and avoids
    iterating through the dentry_unused list.
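    The three steps above can be sketched with a simplified tree in userspace C.
    struct node, new_node(), destroy_subtree() and destroy_all() are hypothetical;
    the tree is held as first-child/next-sibling links, and the walk frees leaves
    first without recursion. As in the patch's reasoning, no locking is taken,
    which is only safe because nothing else can reach these nodes any more.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, minimal dentry-like node: parent pointer plus child/sibling links. */
struct node {
	struct node *parent;
	struct node *children; /* first child, or NULL */
	struct node *next;     /* next sibling, or NULL */
};

/* Allocate a node and push it onto the front of its parent's child list. */
static struct node *new_node(struct node *parent)
{
	struct node *n = calloc(1, sizeof(*n));

	if (n && parent) {
		n->parent = parent;
		n->next = parent->children;
		parent->children = n;
	}
	return n;
}

/* Free the whole subtree rooted at root, leaves first; returns nodes freed. */
static int destroy_subtree(struct node *root)
{
	struct node *n = root;
	int freed = 0;

	assert(root != NULL);
	for (;;) {
		/* Walk down first-child links until we reach a leaf. */
		while (n->children)
			n = n->children;

		if (n == root) {
			free(n);
			return freed + 1;
		}

		/* n is its parent's first child: unlink it, free it, step back up. */
		struct node *parent = n->parent;
		parent->children = n->next;
		free(n);
		freed++;
		n = parent;
	}
}

/* Step (1): the tree at s_root; step (2): each anonymous tree in turn. */
static int destroy_all(struct node *s_root, struct node **anon, int nanon)
{
	int freed = s_root ? destroy_subtree(s_root) : 0;

	for (int i = 0; i < nanon; i++)
		freed += destroy_subtree(anon[i]);
	return freed;
}
```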

    Note that locking is almost entirely absent in the shrink_dcache_for_umount*()
    functions added by this patch. This is because:

    (1) at the point the filesystem calls generic_shutdown_super(), it is not
    permitted to further touch the superblock's set of dentries, and nor may
    it remove aliases from inodes;

    (2) the dcache memory shrinker now skips dentries that are being unmounted;
    and

    (3) the superblock no longer has any external references through which the VFS
    can reach it.

    Given these points, the only locking we need to do is when we remove dentries
    from the unused list and the name hashes, which we do a directory's worth at a
    time.

    We also don't need to guard against reference counts going to zero unexpectedly
    and removing bits of the tree we're working on as nothing else can call dput().

    A cut down version of dentry_iput() has been folded into
    shrink_dcache_for_umount_subtree() function. Apart from not needing to unlock
    things, it also doesn't need to check for inotify watches.

    In this version of the patch, the complaint about a dentry still being in use
    has been expanded from a single BUG_ON() and now gives much more information.

    Signed-off-by: David Howells
    Acked-by: NeilBrown
    Acked-by: Ian Kent
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

23 Sep, 2006

1 commit

  • The attached patch adds a new directory cache management function that prepares
    a disconnected anonymous dentry to be connected into the dentry tree. The
    anonymous dentry is given the name and parentage of another dentry.

    The following changes were made in [try #2]:

    (*) d_materialise_dentry() now switches the parentage of the two nodes around
    correctly when one or other of them is self-referential.

    The following changes were made in [try #7]:

    (*) d_instantiate_unique() has had the interior part split out as function
    __d_instantiate_unique(). Callers of this latter function must be holding
    the appropriate locks.

    (*) _d_rehash() has been added as a wrapper around __d_rehash() to call it
    with the most obvious hash list (the one from the name). d_rehash() now
    calls _d_rehash().

    (*) d_materialise_dentry() is now __d_materialise_dentry() and is static.

    (*) d_materialise_unique() added to perform the combination of d_find_alias(),
    d_materialise_dentry() and d_add_unique() that the NFS client was doing
    twice, all within a single dcache_lock critical section. This reduces the
    number of times two different spinlocks were being accessed.

    The following further changes were made:

    (*) Add the dentries onto their parents' d_subdirs lists.
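    The split between locked interior helpers and locking wrappers described
    above can be sketched in userspace C. This is a hypothetical model, not the
    dcache: a fixed-size hash table of named entries, a C11 atomic_flag spinlock
    standing in for dcache_lock, and a toy string hash. rehash_into(),
    rehash_locked() and instantiate_unique_locked() play the roles of
    __d_rehash(), _d_rehash() and __d_instantiate_unique(); all names are
    illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 16

/* Hypothetical hashed entry standing in for a dentry. */
struct entry {
	const char *name;
	struct entry *hash_next;
};

static struct entry *buckets[NBUCKETS];
static atomic_flag table_lock = ATOMIC_FLAG_INIT; /* a simple spinlock */

static void lock_table(void)
{
	while (atomic_flag_test_and_set(&table_lock)) {
		/* spin */
	}
}

static void unlock_table(void) { atomic_flag_clear(&table_lock); }

/* Toy string hash reduced to a bucket index. */
static unsigned int name_bucket(const char *name)
{
	unsigned int h = 0;

	while (*name)
		h = h * 31 + (unsigned char)*name++;
	return h % NBUCKETS;
}

/* Lowest level: insert into an explicitly chosen bucket. Caller holds the lock. */
static void rehash_into(struct entry *e, struct entry **bucket)
{
	e->hash_next = *bucket;
	*bucket = e;
}

/* Picks the obvious bucket (the one from the name). Caller holds the lock. */
static void rehash_locked(struct entry *e)
{
	rehash_into(e, &buckets[name_bucket(e->name)]);
}

/* Interior part: return an existing same-named entry, or hash e. Caller holds the lock. */
static struct entry *instantiate_unique_locked(struct entry *e)
{
	for (struct entry *p = buckets[name_bucket(e->name)]; p; p = p->hash_next)
		if (strcmp(p->name, e->name) == 0)
			return p;
	rehash_locked(e);
	return e;
}

/* Public entry points take the lock once around the interior work. */
static void rehash(struct entry *e)
{
	lock_table();
	rehash_locked(e);
	unlock_table();
}

static struct entry *instantiate_unique(struct entry *e)
{
	lock_table();
	struct entry *res = instantiate_unique_locked(e);
	unlock_table();
	return res;
}
```

    Because the interior helpers assume the lock is held, a combined operation
    (in the spirit of d_materialise_unique()) can call several of them inside
    one critical section instead of taking and dropping the lock for each step.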

    Signed-Off-By: David Howells
    Signed-off-by: Trond Myklebust

    David Howells
     

04 Jul, 2006

1 commit

  • Teach special (recursive) locking code to the lock validator. Has no effect
    on non-lockdep kernels.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

23 Jun, 2006

2 commits

  • Extend the get_sb() filesystem operation to take an extra argument that
    permits the VFS to pass in the target vfsmount that defines the mountpoint.

    The filesystem is then required to manually set the superblock and root dentry
    pointers. For most filesystems, this should be done with simple_set_mnt()
    which will set the superblock pointer and then set the root dentry to the
    superblock's s_root (as per the old default behaviour).

    The get_sb() op now returns an integer as there's now no need to return the
    superblock pointer.

    This patch permits a superblock to be implicitly shared amongst several mount
    points, such as can be done with NFS to avoid potential inode aliasing. In
    such a case, simple_set_mnt() would not be called, and instead the mnt_root
    and mnt_sb would be set directly.

    The patch also makes the following changes:

    (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
    pointer argument and return an integer, so most filesystems have to change
    very little.

    (*) If one of the convenience functions is not used, then get_sb() should
    normally call simple_set_mnt() to instantiate the vfsmount. This will
    always return 0, and so can be tail-called from get_sb().

    (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
    dcache upon superblock destruction rather than shrink_dcache_anon().

    This is required because the superblock may now have multiple trees that
    aren't actually bound to s_root, but that still need to be cleaned up. The
    currently called functions assume that the whole tree is rooted at s_root,
    and that anonymous dentries are not the roots of trees, which results in
    dentries being left unculled.

    However, with the way NFS superblock sharing is currently set to be
    implemented, these assumptions are violated: the root of the filesystem is
    simply a dummy dentry and inode (the real inode for '/' may well be
    inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
    with child trees.

    [*] Anonymous until discovered from another tree.

    (*) The documentation has been adjusted, including the additional bit of
    changing ext2_* into foo_* in the documentation.
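    The new calling convention can be sketched with heavily simplified stand-in
    structures (hypothetical; the real struct super_block and struct vfsmount
    carry far more state). sketch_simple_set_mnt() mirrors the described default
    behaviour of simple_set_mnt(), and sketch_get_sb_shared() shows a second
    mount rooted on a different dentry of the same superblock, as in the NFS case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct dentry { const char *name; };
struct super_block { struct dentry *s_root; };
struct vfsmount { struct super_block *mnt_sb; struct dentry *mnt_root; };

/* Default behaviour: point the mount at the superblock and its s_root. */
static int sketch_simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
{
	assert(mnt != NULL && sb != NULL);
	mnt->mnt_sb = sb;
	mnt->mnt_root = sb->s_root;
	return 0; /* always succeeds, so get_sb() can tail-call it */
}

/* New-style get_sb(): fills in the vfsmount and returns an error code. */
static int sketch_get_sb(struct super_block *sb, struct vfsmount *mnt)
{
	/* A real filesystem would find or set up sb here. */
	return sketch_simple_set_mnt(mnt, sb);
}

/* Shared superblock: root the mount on another dentry of the same sb. */
static int sketch_get_sb_shared(struct super_block *sb, struct dentry *subtree,
				struct vfsmount *mnt)
{
	mnt->mnt_sb = sb;
	mnt->mnt_root = subtree;
	return 0;
}
```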

    [akpm@osdl.org: convert ipath_fs, do other stuff]
    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Nathan Scott
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • The race is that the shrink_dcache_memory shrinker could get called while a
    filesystem is being unmounted, and could try to prune a dentry belonging to
    that filesystem.

    If it does, then it will call in to iput on the inode while the dentry is
    no longer able to be found by the umounting process. If iput takes a
    while, generic_shutdown_super could get all the way through
    shrink_dcache_parent and shrink_dcache_anon and invalidate_inodes without
    ever waiting on this particular inode.

    Eventually the superblock gets freed anyway and if the iput tried to touch
    it (which some filesystems certainly do), it will lose. The promised
    "Self-destruct in 5 seconds" doesn't lead to a nice day.

    The race is closed by holding s_umount while calling prune_one_dentry on
    someone else's dentry. As a down_read_trylock is used,
    shrink_dcache_memory will no longer try to prune the dentry of a filesystem
    that is being unmounted, and unmount will not be able to start until any
    such active prune_one_dentry completes.

    This requires that prune_dcache *knows* which filesystem (if any) it is
    doing the prune on behalf of so that it can be careful of other
    filesystems. shrink_dcache_memory isn't called on behalf of any
    filesystem, and so is careful of everything.

    shrink_dcache_anon is now passed a super_block rather than the s_anon list
    out of the superblock, so it can get the s_anon list itself, and can pass
    the superblock down to prune_dcache.

    If prune_dcache finds a dentry that it cannot free, it leaves it where it
    is (at the tail of the list) and exits, on the assumption that some other
    thread will be removing that dentry soon. To try to make sure that some
    work gets done, a limited number of dentries which are untouchable are
    skipped over while choosing the dentry to work on.

    I believe this race was first found by Kirill Korotaev.

    Cc: Jan Blunck
    Acked-by: Kirill Korotaev
    Cc: Olaf Hering
    Acked-by: Balbir Singh
    Signed-off-by: Neil Brown
    Signed-off-by: Balbir Singh
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

01 Apr, 2006

1 commit

  • It is very common to hash a name and then to call lookup. Once
    filesystem-specific hash functions are taken into account, the full hashing
    logic can get ugly. Furthermore, full_name_hash as an inline function is
    almost 100 bytes on x86, so having a non-inline choice in some cases can
    measurably decrease code size.
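    A helper of the kind described might look like this userspace sketch. The
    structures, the case-insensitive example hooks and sketch_hash_and_lookup()
    are hypothetical, and the generic string hash is an illustrative choice
    rather than a claim about full_name_hash(). The point is that callers make
    one out-of-line call instead of open-coding the choice between a
    filesystem-specific hash and the generic one.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct qstr { const char *name; size_t len; unsigned int hash; };

struct dir {
	/* Optional filesystem-specific hooks; NULL means "use the generic one". */
	unsigned int (*d_hash)(const struct qstr *q);
	int (*d_compare)(const struct qstr *a, const struct qstr *b); /* 0 = equal */
	const struct qstr *children; /* cached child names, hashes precomputed */
	size_t nchildren;
};

/* Illustrative generic string hash (an assumption, not the kernel's). */
static unsigned int generic_hash(const struct qstr *q)
{
	unsigned int h = 0;

	for (size_t i = 0; i < q->len; i++) {
		unsigned int c = (unsigned char)q->name[i];
		h = (h + (c << 4) + (c >> 4)) * 11;
	}
	return h;
}

static int generic_compare(const struct qstr *a, const struct qstr *b)
{
	return a->len != b->len || memcmp(a->name, b->name, a->len) != 0;
}

/* Out-of-line helper: pick the right hash, then look the name up. */
static const struct qstr *sketch_hash_and_lookup(const struct dir *dir, struct qstr *name)
{
	assert(dir != NULL && name != NULL);
	name->hash = dir->d_hash ? dir->d_hash(name) : generic_hash(name);
	int (*cmp)(const struct qstr *, const struct qstr *) =
		dir->d_compare ? dir->d_compare : generic_compare;

	for (size_t i = 0; i < dir->nchildren; i++) {
		const struct qstr *c = &dir->children[i];
		if (c->hash == name->hash && cmp(c, name) == 0)
			return c;
	}
	return NULL;
}

/* Example filesystem-specific hooks: case-insensitive names. */
static unsigned int ci_hash(const struct qstr *q)
{
	unsigned int h = 0;

	for (size_t i = 0; i < q->len; i++) {
		unsigned int c = (unsigned char)tolower((unsigned char)q->name[i]);
		h = (h + (c << 4) + (c >> 4)) * 11;
	}
	return h;
}

static int ci_compare(const struct qstr *a, const struct qstr *b)
{
	if (a->len != b->len)
		return 1;
	for (size_t i = 0; i < a->len; i++)
		if (tolower((unsigned char)a->name[i]) != tolower((unsigned char)b->name[i]))
			return 1;
	return 0;
}
```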

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

26 Mar, 2006

1 commit

  • Previous inotify work avoidance is good when inotify is completely unused,
    but it breaks down if even a single watch is in place anywhere in the
    system. Robin Holt noticed that udev is one such culprit: it slows a
    512-thread application on a 512-CPU system down from 6 seconds to 22 minutes.

    Solve this by adding a flag in the dentry that tells inotify whether or not
    its parent inode has a watch on it. Event queueing to the parent will skip
    taking locks if this flag is cleared. As for setting and clearing this flag
    on all child dentries versus event delivery: this is no worse in terms of
    race cases, and was shown to be equivalent to always performing the check.

    The essential behaviour is that activity occurring _after_ a watch has been
    added and _before_ it has been removed will generate events.
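    The flag check can be sketched as follows. SKETCH_PARENT_WATCHED and the
    functions are hypothetical names, and the slow path is only indicated by a
    return value; the point is that a child whose flag is clear returns before
    any lock would be taken.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PARENT_WATCHED 0x1u /* hypothetical flag bit */

struct sketch_dentry {
	unsigned int d_flags;
	struct sketch_dentry *d_parent;
};

/* Called when the parent gains its first watch or loses its last one. */
static void sketch_set_children_watched(struct sketch_dentry **children, size_t n,
					int watched)
{
	for (size_t i = 0; i < n; i++) {
		if (watched)
			children[i]->d_flags |= SKETCH_PARENT_WATCHED;
		else
			children[i]->d_flags &= ~SKETCH_PARENT_WATCHED;
	}
}

/*
 * Event path for a child. Returns 1 if the locked slow path (queueing an
 * event to the parent's watches) would run, 0 if it was skipped lock-free.
 */
static int sketch_notify_parent(const struct sketch_dentry *d)
{
	assert(d->d_parent != NULL);
	if (!(d->d_flags & SKETCH_PARENT_WATCHED))
		return 0; /* common case: no watch on the parent, no locks taken */
	/* Slow path: a real implementation would take the parent's locks here. */
	return 1;
}
```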

    Signed-off-by: Nick Piggin
    Cc: Robert Love
    Cc: John McCutchan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

08 Feb, 2006

1 commit


04 Feb, 2006

1 commit


09 Jan, 2006

1 commit

  • A long time ago, struct dentry was carefully tuned so that on 32-bit UP,
    sizeof(struct dentry) was exactly 128, i.e. a power of 2 and a multiple
    of the memory cache line size.

    Then RCU was added and struct dentry was enlarged by two pointers, with
    nice results for SMP but not so good on UP, because it broke the above
    tuning (128 + 8 = 136 bytes).

    This patch reverts this unwanted side effect by using a union (d_u) in
    which d_rcu and d_child are placed so that the two fields share the same
    memory.

    At the time d_free() is called (and d_rcu is really used), d_child is known
    to be empty and not touched by the dentry freeing.

    Lockless lookups only access d_name, d_parent, d_lock, d_op and d_flags,
    so the previous content of d_child is not needed if said dentry was
    unhashed but is still accessed by a CPU because of RCU constraints.

    As the dentry cache easily contains millions of entries, a size reduction
    is worth the extra complexity of the ugly C union.
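    The effect of the union can be checked with a cut-down userspace model. The
    structures below are hypothetical (two pointer-sized fields plus the two
    list members) and far smaller than the real struct dentry; they only show
    that overlapping d_child and d_rcu saves the size of one of the two members.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };
struct sketch_rcu_head {
	struct sketch_rcu_head *next;
	void (*func)(struct sketch_rcu_head *head);
};

/* Before: both members live side by side. */
struct dentry_before {
	struct dentry_before *d_parent;
	const char *d_name;
	struct list_head d_child;     /* link in the parent's list of children */
	struct sketch_rcu_head d_rcu; /* only used once the dentry is being freed */
};

/* After: d_child and d_rcu are never needed at the same time, so they overlap. */
struct dentry_after {
	struct dentry_after *d_parent;
	const char *d_name;
	union {
		struct list_head d_child;
		struct sketch_rcu_head d_rcu;
	} d_u;
};

/* Bytes saved per object by the union. */
static size_t sketch_union_saving(void)
{
	return sizeof(struct dentry_before) - sizeof(struct dentry_after);
}
```

    On a 32-bit build the rcu head here is two 4-byte pointers, i.e. the 8 bytes
    mentioned above.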

    Signed-off-by: Eric Dumazet
    Cc: Dipankar Sarma
    Cc: Maneesh Soni
    Cc: Miklos Szeredi
    Cc: "Paul E. McKenney"
    Cc: Ian Kent
    Cc: Paul Jackson
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Trond Myklebust
    Cc: Neil Brown
    Cc: James Morris
    Cc: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

08 Nov, 2005

1 commit

  • An unmount of a mount creates a umount event on the parent. If the
    parent is a shared mount, it gets propagated to all mounts in the peer
    group.
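    A toy model of that rule, with a peer group kept as a circular list, is
    sketched below. struct mount and sketch_umount_event() are hypothetical, and
    the "event" is just a counter, whereas the real code acts on the
    corresponding child mount under each peer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mount: peers form a circular list through next_peer. */
struct mount {
	struct mount *next_peer;
	int shared;
	int umount_events;
};

/* Record a umount event on parent, and on every peer if parent is shared. */
static int sketch_umount_event(struct mount *parent)
{
	assert(parent != NULL);
	if (!parent->shared) {
		parent->umount_events++;
		return 1;
	}

	int n = 0;
	struct mount *m = parent;
	do {
		m->umount_events++;
		n++;
		m = m->next_peer;
	} while (m != parent);
	return n;
}
```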

    Signed-off-by: Ram Pai
    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Ram Pai
     

08 Sep, 2005

1 commit

  • The dentry cache uses sophisticated RCU technology (and prefetching if
    available) but touches 2 cache lines per dentry during hlist lookup.

    This patch moves d_hash into the same cache line as the d_parent and
    d_name fields so that:

    1) One cache line is needed instead of two.

    2) the hlist_for_each_rcu() prefetching has a chance to bring all the
    needed data in advance, not only the part that includes d_hash.next.

    I also changed one old comment that was wrong for 64-bit.

    A further optimisation would be to split the dentry into two parts, one
    that is mostly read and one that is written (d_count/d_lock), to avoid
    false sharing on SMP/NUMA, but this would need different field placement
    on 32-bit and 64-bit platforms.
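    Whether the lookup fields share a cache line can be checked with offsetof().
    This sketch assumes 64-byte cache lines and an object that starts on a
    cache-line boundary; struct dentry_sketch is a hypothetical, cut-down
    layout, not the real struct dentry.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHE_LINE 64 /* assumption: a typical cache line size */

struct hlist_node { struct hlist_node *next, **pprev; };
struct qstr { unsigned int hash, len; const unsigned char *name; };

/* Hypothetical cut-down layout with the lookup fields kept together. */
struct dentry_sketch {
	unsigned int d_count;
	unsigned int d_flags;
	struct hlist_node d_hash;        /* walked during hash lookup */
	struct dentry_sketch *d_parent;  /* read during lookup */
	struct qstr d_name;              /* read during lookup */
	/* colder fields would follow here */
};

/* True if d_hash through the end of d_name falls within one cache line. */
static int sketch_lookup_fields_share_line(void)
{
	size_t first = offsetof(struct dentry_sketch, d_hash);
	size_t last = offsetof(struct dentry_sketch, d_name) + sizeof(struct qstr) - 1;

	assert(first < last);
	return first / SKETCH_CACHE_LINE == last / SKETCH_CACHE_LINE;
}
```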

    Signed-off-by: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds