19 Oct, 2010

1 commit

  • Under heavy multi-way parallel create workloads, the VFS struggles
    to write back all the inodes that have been changed in age order.
    The bdi flusher thread becomes CPU bound, spending 85% of it's time
    in the VFS code, mostly traversing the superblock dirty inode list
    to separate dirty inodes old enough to flush.

    We already keep an index of all metadata changes in age order - in
    the AIL - and continued log pressure will do age ordered writeback
    without any extra overhead at all. If there is no pressure on the
    log, the xfssyncd will periodically write back metadata in ascending
    disk address offset order so will be very efficient.

    Hence we can stop marking VFS inodes dirty during transaction commit
    or when changing timestamps during transactions. This will keep the
    inodes in the superblock dirty list to those containing data or
    unlogged metadata changes.

    However, the timstamp changes are slightly more complex than this -
    there are a couple of places that do unlogged updates of the
    timestamps, and the VFS need to be informed of these. Hence add a
    new function xfs_trans_ichgtime() for transactional changes,
    and leave xfs_ichgtime() for the non-transactional changes.

    Signed-off-by: Dave Chinner
    Reviewed-by: Alex Elder
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

27 Jul, 2010

4 commits

  • Currently we need to either call IHOLD or xfs_trans_ihold on an inode when
    joining it to a transaction via xfs_trans_ijoin.

    This patches instead makes xfs_trans_ijoin usable on it's own by doing
    an implicity xfs_trans_ihold, which also allows us to drop the third
    argument. For the case where we want to hold a reference on the inode
    a xfs_trans_ijoin_ref wrapper is added which does the IHOLD and marks
    the inode for needing an xfs_iput. In addition to the cleaner interface
    to the caller this also simplifies the implementation.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • Currently we track log item descriptor belonging to a transaction using a
    complex opencoded chunk allocator. This code has been there since day one
    and seems to work around the lack of an efficient slab allocator.

    This patch replaces it with dynamically allocated log item descriptors
    from a dedicated slab pool, linked to the transaction by a linked list.

    This allows to greatly simplify the log item descriptor tracking to the
    point where it's just a couple hundred lines in xfs_trans.c instead of
    a separate file. The external API has also been simplified while we're
    at it - the xfs_trans_add_item and xfs_trans_del_item functions to add/
    delete items from a transaction have been simplified to the bare minium,
    and the xfs_trans_find_item function is replaced with a direct dereference
    of the li_desc field. All debug code walking the list of log items in
    a transaction is down to a simple list_for_each_entry.

    Note that we could easily use a singly linked list here instead of the
    double linked list from list.h as the fastpath only does deletion from
    sequential traversal. But given that we don't have one available as
    a library function yet I use the list.h functions for simplicity.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • Dmapi support was never merged upstream, but we still have a lot of hooks
    bloating XFS for it, all over the fast pathes of the filesystem.

    This patch drops over 700 lines of dmapi overhead. If we'll ever get HSM
    support in mainline at least the namespace events can be done much saner
    in the VFS instead of the individual filesystem, so it's not like this
    is much help for future work.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     

24 Jun, 2010

1 commit

  • The block number comes from bulkstat based inode lookups to shortcut
    the mapping calculations. We ar enot able to trust anything from
    bulkstat, so drop the block number as well so that the correct
    lookups and mappings are always done.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

02 Sep, 2009

1 commit

  • xfs_trans_iget is a wrapper for xfs_iget that adds the inode to the
    transaction after it is read. Except when the inode already is in the
    inode cache, in which case it returns the existing locked inode with
    increment lock recursion counts.

    Now, no one in the tree every decrements these lock recursion counts,
    so any user of this gets a potential double unlock when both the original
    owner of the inode and the xfs_trans_iget caller unlock it. When looking
    back in a git bisect in the historic XFS tree there was only one place
    that decremented these counts, xfs_trans_iput. Introduced in commit
    ca25df7a840f426eb566d52667b6950b92bb84b5 by Adam Sweeney in 1993,
    and removed in commit 19f899a3ab155ff6a49c0c79b06f2f61059afaf3 by
    Steve Lord in 2003. And as long as it didn't slip through git bisects
    cracks never actually used in that time frame.

    A quick audit of the callers of xfs_trans_iget shows that no caller
    really relies on this behaviour fortunately - xfs_ialloc allows this
    inode from disk so it must not be there before, and all the RT allocator
    routines only every add each RT bitmap inode once.

    In addition to removing lots of code and reducing the size of the inode
    item this patch also avoids the double inode cache lookup in each
    create/mkdir/mknod transaction.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Felix Blyakher

    Christoph Hellwig
     

04 Dec, 2008

1 commit

  • Use xfs_trans_ijoin in xfs_trans_iget in case we need to join an inode into
    a transaction instead of opencoding it. Based on a discussion with and an
    incomplete patch from Niv Sardi.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     

28 Jul, 2008

1 commit

  • kmem_free() function takes (ptr, size) arguments but doesn't actually use
    second one.

    This patch removes size argument from all callsites.

    SGI-PV: 981498
    SGI-Modid: xfs-linux-melb:xfs-kern:31050a

    Signed-off-by: Denys Vlasenko
    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    Denys Vlasenko
     

29 Apr, 2008

1 commit

  • The writer field is not needed for non_DEBU builds so remove it. While
    we're at i also clean up the interface for is locked asserts to go through
    and xfs_iget.c helper with an interface like the xfs_ilock routines to
    isolated the XFS codebase from mrlock internals. That way we can kill
    mrlock_t entirely once rw_semaphores grow an islocked facility. Also
    remove unused flags to the ilock family of functions.

    SGI-PV: 976035
    SGI-Modid: xfs-linux-melb:xfs-kern:30902a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     

20 Jun, 2006

1 commit


29 Mar, 2006

1 commit


02 Nov, 2005

2 commits


21 Jun, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds