23 Oct, 2008

9 commits


29 Sep, 2008

1 commit

  • The VFS interface for the 'd_compare()' is a bit special (read: 'odd'),
    because it really just essentially replaces a memcmp(). The filesystem
    is supposed to just compare the two names with whatever case-independent
    or other function.

    And when I say 'is supposed to', I obviously mean that 'procfs does odd
    things, and actually looks at the dentry that we don't even pass down,
    rather than just the name'. Which results in problems, because we
    actually call d_compare before we have even verified that the dentry is
    still hashed at all.

    And that causes a problem since the inode that procfs looks at may have
    been free'd and the d_inode pointer is NULL. procfs just assumes that
    all dentries are positive, since procfs itself never generates a
    negative one. But memory pressure will still result in the dentry
    getting torn down, and as it is removed by RCU, it still remains visible
    on some lists - and to d_compare.

    If the filesystem just did a name comparison, we wouldn't care. And we
    could just fix procfs to know about negative dentries too. But rather
    than have the low-level filesystems know about internal VFS details,
    just move the check for an unhashed dentry up a bit, so that we will only
    call d_compare on dentries that are still active.

    The actual oops this caused didn't look like a NULL pointer dereference
    because procfs did a 'container_of(inode, struct proc_inode, vfs_inode)'
    to get at its internal proc_inode information from the inode pointer,
    and accessed a field below the inode. So the oops would look something
    like

    BUG: unable to handle kernel paging request at fffffffffffffff0
    IP: [] proc_sys_compare+0x36/0x50

    and was seen on both x86-64 (Alexey Dobriyan and Hugh Dickins) and
    ppc64 (Hugh Dickins).

    Reported-by: Alexey Dobriyan
    Acked-by: Hugh Dickins
    Cc: Al Viro
    Reviewed-by: "Eric W. Biederman"
    Signed-off-by: Linus Torvalds

    Linus Torvalds
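
    The odd-looking fault address follows from the container_of() arithmetic
    described above. The sketch below is a userspace model, not the kernel
    code: model_inode, model_proc_inode and their field names are
    hypothetical stand-ins, and the layout is chosen only so that the
    embedded inode sits two pointers into the container.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real struct inode and struct proc_inode differ. */
struct model_inode {
    void *i_private;
};

struct model_proc_inode {
    void *sysctl;                  /* field the compare callback reads */
    void *sysctl_entry;
    struct model_inode vfs_inode;  /* inode embedded in the container */
};

/*
 * Address that would be touched when reading ->sysctl after container_of()
 * has been applied to a NULL inode pointer. Plain integer arithmetic is
 * used so that no invalid pointer is ever formed.
 */
static uintptr_t fault_address_for_null_inode(void)
{
    uintptr_t inode = 0; /* models dentry->d_inode == NULL */
    uintptr_t container;

    assert(offsetof(struct model_proc_inode, vfs_inode) > 0);
    /* container_of() subtracts the offset of the embedded member. */
    container = inode - offsetof(struct model_proc_inode, vfs_inode);
    /* Accessing a field adds that field's offset back. */
    return container + offsetof(struct model_proc_inode, sysctl);
}
```

    With 8-byte pointers this model yields 0xfffffffffffffff0: a small
    negative offset from NULL that wraps to the top of the address space,
    which is why the oops reads as a paging request rather than a NULL
    pointer dereference.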
     

25 Aug, 2008

1 commit


28 Jul, 2008

1 commit

  • This adds a dcache entry for lookup, but uses the name that is
    associated with the entry rather than the one passed in to the
    lookup routine.

    First, it sees if the case-exact match already exists in the dcache and
    uses it if one exists. Otherwise, it allocates a new node with the new
    name and splices it into the dcache.

    Original code from ntfs_lookup in fs/ntfs/namei.c by Anton Altaparmakov.

    Signed-off-by: Barry Naujok
    Signed-off-by: Anton Altaparmakov
    Acked-by: Christoph Hellwig

    Barry Naujok
     

27 Jul, 2008

1 commit


25 Jul, 2008

1 commit

  • [Summary]

    Split LRU-list of unused dentries to one per superblock to avoid soft
    lock up during NFS mounts and remounting of any filesystem.

    Previously I posted here:
    http://lkml.org/lkml/2008/3/5/590

    [Descriptions]

    - background

    dentry_unused is a list of dentries which are not referenced.
    dentry_unused grows when references on directories or files are
    released. This list can be very long if there is a lot of free memory.

    - the problem

    When shrink_dcache_sb() is called, it scans all of dentry_unused linearly
    under spin_lock(), and if dentry->d_sb is different from the given
    superblock, it moves on to the next dentry. This scan is very costly if
    there are many entries, and very inefficient if there are many
    superblocks.

    IOW, we need to shrink the unused dentries of one superblock, but we scan
    the unused dentries of all superblocks in the system. For example, we
    need to scan 500 dentries to unmount a filesystem, but end up scanning
    1,000,000 or more unused dentries on other superblocks.

    In our case, when mounting NFS*, shrink_dcache_sb() is called to shrink
    unused dentries on NFS, but it scans 100,000,000 unused dentries on
    superblocks in the system such as local ext3 filesystems. I hear NFS
    mounting took 1 min on a system in use.

    * : NFS uses a virtual filesystem in the rpc layer, so NFS is affected by
    this problem.

    100,000,000 is a possible number on large systems.

    A per-superblock LRU of unused dentries can reduce the cost in a
    reasonable manner.

    - How to fix

    I found this problem is solved by David Chinner's "Per-superblock
    unused dentry LRU lists V3"(1), so I rebased it and added some fixes to
    reclaim with fairness, as suggested in Andrew Morton's comments(2).

    1) http://lkml.org/lkml/2006/5/25/318
    2) http://lkml.org/lkml/2006/5/25/320

    Split the LRU list of unused dentries into one list per superblock. Then
    NFS mounting will check only the dentries under one superblock instead
    of all of them. But this splitting breaks the global LRU order of
    dentry_unused. So I've attempted to reclaim unused dentries fairly by
    calculating the number of dentries to scan on each sb as follows:

    number of dentries to scan on this sb =
    count * (number of dentries on this sb / number of dentries in the machine)

    - ToDo
    - I have to measure performance numbers and do stress tests.

    - When an unmount occurs during prune_dcache() while it is scanning the
    same superblock, it is unable to reach the next superblock because the
    current one has gone away. We restart scanning from the first
    superblock, which causes unfair reclaim of unused dentries on the first
    superblock. But I think this happens very rarely.

    - Test Results

    Result on 6GB boxes with excessive unused dentries.

    Without patch:

    $ cat /proc/sys/fs/dentry-state
    10181835 10180203 45 0 0 0
    # mount -t nfs 10.124.60.70:/work/kernel-src nfs
    real 0m1.830s
    user 0m0.001s
    sys 0m1.653s

    With this patch:
    $ cat /proc/sys/fs/dentry-state
    10236610 10234751 45 0 0 0
    # mount -t nfs 10.124.60.70:/work/kernel-src nfs
    real 0m0.106s
    user 0m0.002s
    sys 0m0.032s

    [akpm@linux-foundation.org: fix comments]
    Signed-off-by: Kentaro Makita
    Cc: Neil Brown
    Cc: Trond Myklebust
    Cc: David Chinner
    Cc: "J. Bruce Fields"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kentaro Makita
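
    The proportional formula in the "How to fix" section can be sketched in
    integer arithmetic. This is only an illustration of the stated formula,
    not the code from the patch; dentries_to_scan() and its parameter names
    are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Share of a reclaim request that one superblock should scan:
 *   count * (unused dentries on this sb / unused dentries in the machine)
 * The multiplication is done first so that the ratio is not truncated to
 * zero by integer division. Assumes count * sb_unused fits in 64 bits.
 */
static uint64_t dentries_to_scan(uint64_t count, uint64_t sb_unused,
                                 uint64_t total_unused)
{
    assert(sb_unused <= total_unused);
    if (total_unused == 0)
        return 0; /* nothing is unused anywhere, so nothing to scan */
    return (count * sb_unused) / total_unused;
}
```

    For example, a request for 1000 dentries on a machine where this
    superblock holds 250 of 1000 unused dentries gives a share of 250. The
    result rounds down, so a superblock with very few unused dentries can
    get a share of zero; an implementation that wants every superblock to
    make progress would have to round up or add a minimum.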
     

24 Jun, 2008

3 commits

  • Comment from Al Viro: add prepend_name() wrapper.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • Fix the following sparse warnings:

    fs/dcache.c:2183:19: warning: symbol 'filp_cachep' was not declared. Should it be static?
    fs/dcache.c:115:3: warning: context imbalance in 'dentry_iput' - unexpected unlock
    fs/dcache.c:188:2: warning: context imbalance in 'dput' - different lock contexts for basic block
    fs/dcache.c:400:2: warning: context imbalance in 'prune_one_dentry' - different lock contexts for basic block
    fs/dcache.c:431:22: warning: context imbalance in 'prune_dcache' - different lock contexts for basic block
    fs/dcache.c:563:2: warning: context imbalance in 'shrink_dcache_sb' - different lock contexts for basic block
    fs/dcache.c:1385:6: warning: context imbalance in 'd_delete' - wrong count at exit
    fs/dcache.c:1636:2: warning: context imbalance in '__d_unalias' - unexpected unlock
    fs/dcache.c:1735:2: warning: context imbalance in 'd_materialise_unique' - different lock contexts for basic block

    Signed-off-by: Miklos Szeredi
    Reviewed-by: Matthew Wilcox
    Acked-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • The path that __d_path() computes can become slightly inconsistent when it
    races with mount operations: it grabs the vfsmount_lock when traversing mount
    points but immediately drops it again, only to re-grab it when it reaches the
    next mount point. The result is that the filename computed is not always
    consistent, and the file may never have had that name. (This is unlikely, but
    still possible.)

    Fix this by grabbing the vfsmount_lock for the whole duration of
    __d_path().

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: John Johansen
    Signed-off-by: Miklos Szeredi
    Acked-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

23 Jun, 2008

1 commit


23 Apr, 2008

2 commits

  • Add a new function:

    seq_file_root()

    This is similar to seq_path(), but calculates the path relative to the
    given root, instead of current->fs->root. If the path was unreachable
    from root, then modify the root parameter to reflect this.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • [mszeredi@suse.cz] split big patch into manageable chunks

    Add the following functions:

    dentry_path()
    seq_dentry()

    These are similar to d_path() and seq_path(). But instead of
    calculating the path within a mount namespace, they calculate the path
    from the root of the filesystem to a given dentry, ignoring mounts
    completely.

    Signed-off-by: Ram Pai
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Ram Pai
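
    Computing a path from the filesystem root while ignoring mounts comes
    down to walking parent pointers and prepending each name into the tail
    of a buffer. The sketch below is a userspace model under that
    assumption; model_dentry, model_prepend() and model_dentry_path() are
    hypothetical names, and the model does no locking and returns NULL where
    kernel code would return an error pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry: a name and a parent link. */
struct model_dentry {
    const char *name;
    struct model_dentry *parent; /* the root points to itself */
};

/* Copy len bytes of s in front of *cursor; -1 if the buffer is exhausted. */
static int model_prepend(char **cursor, size_t *remaining,
                         const char *s, size_t len)
{
    if (*remaining < len)
        return -1;
    *remaining -= len;
    *cursor -= len;
    memcpy(*cursor, s, len);
    return 0;
}

/*
 * Build the path of d relative to its filesystem root. The string is
 * assembled backwards from the end of buf; the return value points at its
 * first character inside buf, or is NULL if buf is too small.
 */
static char *model_dentry_path(const struct model_dentry *d,
                               char *buf, size_t buflen)
{
    char *cursor;
    size_t remaining;

    assert(d != NULL && buf != NULL);
    if (buflen < 2)
        return NULL;
    cursor = buf + buflen - 1;
    *cursor = '\0';
    remaining = buflen - 1;

    if (d->parent == d) /* the root itself is just "/" */
        return model_prepend(&cursor, &remaining, "/", 1) ? NULL : cursor;

    while (d->parent != d) {
        if (model_prepend(&cursor, &remaining, d->name, strlen(d->name)) ||
            model_prepend(&cursor, &remaining, "/", 1))
            return NULL;
        d = d->parent;
    }
    return cursor;
}
```

    Building the string from the end avoids having to know the depth of the
    dentry in advance: each component is written once, and the caller simply
    uses the returned pointer instead of the start of the buffer.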
     

15 Feb, 2008

5 commits

  • Extract the common code to remove a dentry from the lru into a new function
    dentry_lru_remove().

    Two call sites used list_del() instead of list_del_init(). AFAIK the
    performance of both is the same. dentry_lru_remove() does a list_del_init().

    As a result dentry->d_lru is now always empty when a dentry is freed.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
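
    The practical difference between list_del() and list_del_init() is that
    the latter leaves the removed node pointing at itself, so a later
    emptiness check on it is meaningful. The sketch below models this in
    userspace; the model_* list helpers, struct lru_dentry and the counter
    are hypothetical stand-ins that follow the usual circular doubly linked
    list convention.

```c
#include <assert.h>
#include <stdbool.h>

struct model_list_head {
    struct model_list_head *next, *prev;
};

static void model_list_init(struct model_list_head *h)
{
    h->next = h;
    h->prev = h;
}

static bool model_list_empty(const struct model_list_head *h)
{
    return h->next == h;
}

static void model_list_add(struct model_list_head *entry,
                           struct model_list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink the entry and reinitialise it so it reads as an empty list. */
static void model_list_del_init(struct model_list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    model_list_init(entry);
}

/* Hypothetical stand-in for the LRU-related part of a dentry. */
struct lru_dentry {
    struct model_list_head d_lru;
};

static unsigned long model_nr_unused;

static void model_dentry_lru_add(struct lru_dentry *d,
                                 struct model_list_head *lru)
{
    model_list_add(&d->d_lru, lru);
    model_nr_unused++;
}

/* Safe to call whether or not the dentry is currently on the LRU. */
static void model_dentry_lru_remove(struct lru_dentry *d)
{
    if (!model_list_empty(&d->d_lru)) {
        model_list_del_init(&d->d_lru);
        model_nr_unused--;
    }
}
```

    Because the node is reinitialised on removal, calling the remove helper
    twice is harmless, and the "d_lru is always empty when a dentry is
    freed" property can be checked with a simple emptiness test.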
     
  • d_path() is used on a <dentry, vfsmount> pair. Let's use a struct path to
    reflect this.

    [akpm@linux-foundation.org: fix build in mm/memory.c]
    Signed-off-by: Jan Blunck
    Acked-by: Bryan Wu
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Cc: "J. Bruce Fields"
    Cc: Neil Brown
    Cc: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
     
  • Move and update d_path() kernel API documentation.

    Signed-off-by: Jan Blunck
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Cc: "J. Bruce Fields"
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
     
  • All callers to __d_path pass the dentry and vfsmount of a struct path to
    __d_path. Pass the struct path directly, instead.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Jan Blunck
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Cc: "J. Bruce Fields"
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
     
  • * Use struct path in fs_struct.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Jan Blunck
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
     

07 Feb, 2008

2 commits

  • The inotify debugging code is supposed to verify that the
    DCACHE_INOTIFY_PARENT_WATCHED scalability optimisation does not result in
    notifications getting lost nor extra needless locking generated.

    Unfortunately there are also some races in the debugging code. And it isn't
    very good at finding problems anyway. So remove it for now.

    Signed-off-by: Nick Piggin
    Cc: Robert Love
    Cc: John McCutchan
    Cc: Jan Kara
    Cc: Yan Zheng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Use hlist_unhashed() instead of opencoded equivalent.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
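
    An hlist node records the address of the pointer that points at it, so
    "is this node on a hash chain" reduces to a NULL check on that back
    pointer. The sketch below models the idea in userspace; the model_hlist_*
    names are hypothetical and only mirror the usual hlist conventions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_hlist_node {
    struct model_hlist_node *next;
    struct model_hlist_node **pprev; /* address of the pointer to this node */
};

struct model_hlist_head {
    struct model_hlist_node *first;
};

/* A node is unhashed when nothing points at it. */
static bool model_hlist_unhashed(const struct model_hlist_node *n)
{
    return n->pprev == NULL;
}

static void model_hlist_add_head(struct model_hlist_node *n,
                                 struct model_hlist_head *h)
{
    n->next = h->first;
    if (h->first != NULL)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink the node and mark it unhashed; a no-op if already unhashed. */
static void model_hlist_del_init(struct model_hlist_node *n)
{
    if (model_hlist_unhashed(n))
        return;
    *n->pprev = n->next;
    if (n->next != NULL)
        n->next->pprev = n->pprev;
    n->next = NULL;
    n->pprev = NULL;
}
```

    Using the named helper instead of an open-coded `!n->pprev` test keeps
    the representation detail in one place.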
     

22 Oct, 2007

1 commit

  • Well, it's not especially important that target->d_iname get the contents
    of dentry->d_iname, but it's important that it get initialized with
    *something*, otherwise we're just exposing some random piece of memory to
    anyone who reads the link at /proc//fd/ for the deleted file, when
    it's still held open by someone.

    I've run a test program that copies a short (=36 character) name and see
    that the first time I run it, without this patch, I get unpredictable
    results out of /proc//fd/.

    Signed-off-by: J. Bruce Fields
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     

21 Oct, 2007

1 commit

  • New kind of audit rule predicates: "object is visible in given subtree".
    The part that can be sanely implemented, that is. Limitations:
    * if you have hardlink from outside of tree, you'd better watch
    it too (or just watch the object itself, obviously)
    * if you mount something under a watched tree, tell audit
    that new chunk should be added to watched subtrees
    * if you umount something in a watched tree and it's still mounted
    elsewhere, you will get matches on events happening there. New command
    tells audit to recalculate the trees, trimming such sources of false
    positives.

    Note that it's _not_ about path - if something is mounted in several places
    (multiple mount, bindings, different namespaces, etc.), the match does
    _not_ depend on which one we are using for access.

    Signed-off-by: Al Viro

    Al Viro
     

17 Oct, 2007

6 commits

  • Signed-off-by: Denis Cheng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denis Cheng
     
  • This patch makes shrink_dcache_sb consistent with dentry pruning policy.

    On the first pass we iterate over the dentry unused list and prepare some
    dentries for removal.

    However, since the existing code moves evicted dentries to the beginning of
    the LRU, it can happen that fresh dentries from other superblocks will be
    inserted *before* our dentries.

    This can result in a significant slowdown of shrink_dcache_sb(). Moreover,
    for virtual filesystems like unionfs, which can call dput() while dentries
    are being killed, the existing code results in O(n^2) complexity.

    We observed shrink_dcache_sb() taking 2 minutes with only 35000 dentries.

    To avoid these effects we propose to isolate sb dentries at the end
    of the LRU list.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: Kirill Korotaev
    Signed-off-by: Andrey Mirkin
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denis V. Lunev
     
  • As it stands this comment is confusing, and not quite grammatical.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • It looks like in the end all pruners want parents removed.

    So remove unused code and function arguments.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Since the mempages parameter is actually not used, it should be removed.

    Now only files_init uses the mempages parameter,

    files_init(mempages);

    but I don't think the adaptation to mempages in files_init is really
    useful; and if files_init were also changed to the prototype
    void (*func)(void), the wrapper vfs_caches_init would not need the
    mempages parameter either.

    Signed-off-by: Denis Cheng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denis Cheng
     
  • This patch marks a number of allocations that are either short-lived such as
    network buffers or are reclaimable such as inode allocations. When something
    like updatedb is called, long-lived and unmovable kernel allocations tend to
    be spread throughout the address space which increases fragmentation.

    This patch groups these allocations together as much as possible by adding a
    new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be
    reclaimed on demand, but not moved. i.e. they can be migrated by deleting
    them and re-reading the information from elsewhere.

    Signed-off-by: Mel Gorman
    Cc: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

18 Jul, 2007

1 commit

  • I can never remember what the function to register to receive VM pressure
    is called. I have to trace down from __alloc_pages() to find it.

    It's called "set_shrinker()", and it needs Your Help.

    1) Don't hide struct shrinker. It contains no magic.
    2) Don't allocate "struct shrinker". It's not helpful.
    3) Call them "register_shrinker" and "unregister_shrinker".
    4) Call the function "shrink" not "shrinker".
    5) Reduce the 17 lines of waffly comments to 13, but document it properly.

    Signed-off-by: Rusty Russell
    Cc: David Chinner
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
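
    The shape of the interface described in points 1) to 4) can be sketched
    as a userspace model: the caller owns a visible struct, registers and
    unregisters it by name, and supplies a callback called "shrink". This is
    not the kernel definition; model_shrinker and the other model_* names
    are hypothetical, and details such as allocation flags and locking are
    left out.

```c
#include <assert.h>
#include <stddef.h>

/* Caller-owned and fully visible: no hidden allocation, no magic. */
struct model_shrinker {
    int (*shrink)(int nr_to_scan); /* frees up to nr_to_scan, returns what is left */
    int seeks;                     /* relative cost of recreating an object */
    struct model_shrinker *next;   /* registry linkage */
};

static struct model_shrinker *model_shrinker_list;

static void model_register_shrinker(struct model_shrinker *s)
{
    assert(s != NULL && s->shrink != NULL);
    s->next = model_shrinker_list;
    model_shrinker_list = s;
}

static void model_unregister_shrinker(struct model_shrinker *s)
{
    struct model_shrinker **pp = &model_shrinker_list;

    while (*pp != NULL) {
        if (*pp == s) {
            *pp = s->next;
            s->next = NULL;
            return;
        }
        pp = &(*pp)->next;
    }
}

/* Model of memory pressure: ask every registered cache to give back objects. */
static int model_shrink_all(int nr_to_scan)
{
    const struct model_shrinker *s;
    int remaining = 0;

    for (s = model_shrinker_list; s != NULL; s = s->next)
        remaining += s->shrink(nr_to_scan);
    return remaining;
}

/* Example cache with 100 reclaimable objects. */
static int model_cache_objects = 100;

static int model_cache_shrink(int nr_to_scan)
{
    if (nr_to_scan > model_cache_objects)
        nr_to_scan = model_cache_objects;
    model_cache_objects -= nr_to_scan;
    return model_cache_objects;
}
```

    A cache would typically embed or statically define its shrinker struct,
    register it at init time, and unregister it on teardown, so there is
    nothing to allocate or free around registration.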
     

09 May, 2007

3 commits

  • Remove includes of where it is not used/needed.
    Suggested by Al Viro.

    Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
    sparc64, and arm (all 59 defconfigs).

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • 1) Introduces a new method in 'struct dentry_operations'. This method
    called d_dname() might be called from d_path() to build a pathname for
    special filesystems. It is called without locks.

    Future patches (if we succeed in having one common dentry for all
    pipes/sockets) may need to change prototype of this method, but we now
    use : char *d_dname(struct dentry *dentry, char *buffer, int buflen);

    2) Adds a dynamic_dname() helper function that eases d_dname() implementations

    3) Defines d_dname method for sockets : No more sprintf() at socket
    creation. This is delayed up to the moment someone does an access to
    /proc/pid/fd/...

    4) Defines d_dname method for pipes : No more sprintf() at pipe
    creation. This is delayed up to the moment someone does an access to
    /proc/pid/fd/...

    A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a
    *nice* speedup on my Pentium(M) 1.6 GHz :

    3.090 s instead of 3.450 s

    Signed-off-by: Eric Dumazet
    Acked-by: Christoph Hellwig
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
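
    The saving comes from formatting the name only when someone asks for it.
    The sketch below is a userspace model of a helper in the spirit of
    dynamic_dname(): it formats into a temporary buffer and copies the
    result to the tail of the caller's buffer. The model_* names, the
    "pipe:[%lu]" format and the 64-byte temporary are assumptions for the
    example, and NULL stands in for the error pointer kernel code would
    return.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a name and place it at the end of buffer, as path-building code
 * expects. Returns a pointer to the start of the name inside buffer, or
 * NULL if it does not fit.
 */
static char *model_dynamic_dname(char *buffer, int buflen,
                                 const char *fmt, ...)
{
    char temp[64];
    va_list args;
    int len;

    assert(buffer != NULL && fmt != NULL);
    va_start(args, fmt);
    len = vsnprintf(temp, sizeof(temp), fmt, args);
    va_end(args);
    if (len < 0)
        return NULL;
    len += 1; /* account for the terminating NUL */
    if ((size_t)len > sizeof(temp) || len > buflen)
        return NULL;

    buffer += buflen - len;
    memcpy(buffer, temp, (size_t)len);
    return buffer;
}

/* What a pipe's name callback could look like: nothing is stored at creation. */
static char *model_pipe_dname(unsigned long ino, char *buffer, int buflen)
{
    return model_dynamic_dname(buffer, buflen, "pipe:[%lu]", ino);
}
```

    Creating a pipe then costs no string formatting at all; the name is
    produced only on the comparatively rare path where something reads the
    descriptor's link.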
     
  • Teach the dentry slab shrinker to aggressively shrink parent dentries when
    shrinking the dentry cache.

    This is done to attempt to improve the situation where the dentry slab cache
    gets a lot of internal fragmentation due to pages containing directory
    dentries. It is expected that this change will cause some of those dentries
    to be reaped earlier, and with less scanning.

    Needs careful testing.

    Cc: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton