28 Sep, 2016

1 commit

  • current_fs_time() uses struct super_block* as an argument.
    As per Linus's suggestion, this is changed to take struct
    inode* as a parameter instead. This is because the function
    is primarily meant for vfs inode timestamps.
    Also the function was renamed as per Arnd's suggestion.

    Change all calls to current_fs_time() to use the new
    current_time() function instead. current_fs_time() will be
    deleted.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Al Viro

    Deepa Dinamani
     

09 Feb, 2016

2 commits

  • We can store the di_changecount in the i_version field of the VFS
    inode and remove another 8 bytes from the xfs_icdinode.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • The struct xfs_inode has two copies of the current timestamps in it,
    one in the vfs inode and one in the struct xfs_icdinode. Now that we
    no longer log the struct xfs_icdinode directly, we don't need to
    keep the timestamps in this structure. Instead we can copy them
    straight out of the VFS inode when formatting the inode log item or
    the on-disk inode.

    This reduces the struct xfs_inode in size by 24 bytes.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Dave Chinner
     

03 Nov, 2015

1 commit

  • xfs: timestamp updates cause excessive fdatasync log traffic

    Sage Weil reported that a ceph test workload was writing to the
    log on every fdatasync during an overwrite workload. Event tracing
    showed that the only metadata modification being made was the
    timestamp updates during the write(2) syscall, but fdatasync(2)
    is supposed to ignore them. The key observation was that the
    transactions in the log all looked like this:

    INODE: #regs: 4 ino: 0x8b flags: 0x45 dsize: 32

    And contained a flags field of 0x45 or 0x85, and had data and
    attribute forks following the inode core. This means that the
    timestamp updates were triggering dirty relogging of previously
    logged parts of the inode that hadn't yet been flushed back to
    disk.

    There are two parts to this problem. The first is that XFS relogs
    dirty regions in subsequent transactions, so it carries around the
    fields that have been dirtied since the last time the inode was
    written back to disk, not since the last time the inode was forced
    into the log.

    The second part is that on v5 filesystems, the inode change count
    update during inode dirtying also sets the XFS_ILOG_CORE flag, so
    on v5 filesystems this makes a timestamp update dirty the entire
    inode.

    As a result when fdatasync is run, it looks at the dirty fields in
    the inode, and sees more than just the timestamp flag, even though
    the only metadata change since the last fdatasync was just the
    timestamps. Hence we force the log on every subsequent fdatasync
    even though it is not needed.

    To fix this, add a new field to the inode log item that tracks
    changes since the last time fsync/fdatasync forced the log to flush
    the changes to the journal. This field is updated when we dirty the
    inode, but we do it before updating the change count so it does not
    carry the "core dirty" flag from timestamp updates. The fields are
    zeroed when the inode is marked clean (due to writeback/freeing) or
    when an fsync/fdatasync forces the log. Hence if we only dirty the
    timestamps on the inode between fsync/fdatasync calls, the fdatasync
    will not trigger another log force.

    Over 100 runs of the test program:

    Ext4 baseline:
    runtime: 1.63s +/- 0.24s
    avg lat: 1.59ms +/- 0.24ms
    iops: ~2000

    XFS, vanilla kernel:
    runtime: 2.45s +/- 0.18s
    avg lat: 2.39ms +/- 0.18ms
    log forces: ~400/s
    iops: ~1000

    XFS, patched kernel:
    runtime: 1.49s +/- 0.26s
    avg lat: 1.46ms +/- 0.25ms
    log forces: ~30/s
    iops: ~1500

    Reported-by: Sage Weil
    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
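The fdatasync fix above can be illustrated with a small userspace model. This is a sketch only: `struct log_item`, the `LOG_*` flag values and the helper functions are hypothetical stand-ins, not the real XFS definitions. It shows the two masks the commit describes: fsync-relevant changes are recorded before the v5 change-count update adds the core flag, and fdatasync forces the log only when something other than the timestamp bit is set.

```c
#include <stdbool.h>

/* Illustrative flag values only; not the real XFS_ILOG_* definitions. */
#define LOG_CORE      0x1u   /* whole inode core dirty */
#define LOG_TIMESTAMP 0x2u   /* timestamps only */

/* Hypothetical, simplified inode log item. */
struct log_item {
	unsigned int fields;        /* dirty since the inode was last written back */
	unsigned int fsync_fields;  /* dirty since the last fsync/fdatasync log force */
};

/* Dirty the inode in a transaction on a v5-style filesystem. */
static void item_dirty(struct log_item *li, unsigned int flags)
{
	/* Record fsync-relevant changes before the change-count bump... */
	li->fsync_fields |= flags;
	/* ...because on v5 the change-count update also marks the core dirty. */
	li->fields |= flags | LOG_CORE;
}

/* fdatasync only needs a log force for non-timestamp metadata changes. */
static bool datasync_needs_log_force(const struct log_item *li)
{
	return (li->fsync_fields & ~LOG_TIMESTAMP) != 0;
}

/* fsync/fdatasync forced the log: reset only the fsync tracking. */
static void item_log_forced(struct log_item *li)
{
	li->fsync_fields = 0;
}

/* Inode written back or freed: everything is clean. */
static void item_clean(struct log_item *li)
{
	li->fields = 0;
	li->fsync_fields = 0;
}
```

Note that `fields` survives `item_log_forced()`, mirroring the relogging behaviour the commit identifies as the first part of the problem; only writeback (`item_clean()`) clears it.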
     

28 Nov, 2014

2 commits


02 Oct, 2014

1 commit

  • The typedef for timespecs and nanotime() are completely unnecessary,
    and delay() can be moved to fs/xfs/linux.h, which means this file
    can go away.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Dave Chinner
     

18 Nov, 2013

1 commit

  • Michael L Semon reported that generic/069 runtime increased on v5
    superblocks by 100% compared to v4 superblocks. His perf-based
    analysis pointed directly at the timestamp updates being done by the
    write path in this workload. The append writers are doing 4-byte
    writes, so there are lots of timestamp updates occurring.

    The thing is, they aren't being triggered by timestamp changes -
    they are being triggered by the inode change counter needing to be
    updated. That is, every write(2) system call needs to bump the inode
    version count, and it does that through the timestamp update
    mechanism. Hence for v5 filesystems, test generic/069 is running 3
    orders of magnitude more timestamp update transactions due to the
    fact that it does a huge number of *4 byte* write(2) calls.

    This isn't a real world scenario we really need to address - anyone
    doing such sequential IO should be using fwrite(3), not write(2).
    i.e. fwrite(3) buffers the writes in userspace to minimise the
    number of write(2) syscalls, and the problem goes away.

    However, there is a small change we can make to improve the
    situation - removing the expensive lock operation on the change
    counter update. All inode version counter changes in XFS occur
    under the ip->i_ilock during a transaction, and therefore we
    don't actually need the spin lock that provides exclusive access to
    it through inc_inode_iversion().

    Hence avoid the lock and just open code the increment ourselves when
    logging the inode.

    Reported-by: Michael L. Semon
    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
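A minimal userspace sketch of why the second lock is redundant, assuming a pthread mutex stands in for `ip->i_ilock`; `struct model_inode`, `model_ilock()`, `log_inode_bump_version()` and `writer()` are hypothetical names. Because every transaction already holds the inode lock while it logs the inode, a plain increment is serialized without any further locking.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define ITERATIONS 10000

struct model_inode {
	pthread_mutex_t ilock;  /* stands in for ip->i_ilock */
	int ilock_held;         /* debug flag, only touched under ilock */
	uint64_t version;       /* stands in for the inode version counter */
};

static void model_ilock(struct model_inode *ip)
{
	pthread_mutex_lock(&ip->ilock);
	ip->ilock_held = 1;
}

static void model_iunlock(struct model_inode *ip)
{
	ip->ilock_held = 0;
	pthread_mutex_unlock(&ip->ilock);
}

/*
 * Called while logging the inode. The caller must hold ilock, so the
 * increment is already serialized and no extra lock is taken here.
 */
static void log_inode_bump_version(struct model_inode *ip)
{
	assert(ip->ilock_held);
	ip->version++;
}

/* Each "transaction" takes ilock, logs the inode, then drops ilock. */
static void *writer(void *arg)
{
	struct model_inode *ip = arg;

	for (int i = 0; i < ITERATIONS; i++) {
		model_ilock(ip);
		log_inode_bump_version(ip);
		model_iunlock(ip);
	}
	return NULL;
}
```

Running several `writer()` threads concurrently should leave `version` equal to the total number of increments, since the inode lock alone orders them.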
     

24 Oct, 2013

3 commits

  • Currently the xfs_inode.h header has a dependency on the definition
    of the BMAP btree records as the inode fork includes an array of
    xfs_bmbt_rec_host_t objects in its definition.

    Move all the btree format definitions from xfs_btree.h,
    xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
    xfs_format.h to continue the process of centralising the on-disk
    format definitions. With this done, the xfs inode definitions are no
    longer dependent on btree header files.

    This enables a massive culling of unnecessary includes, with close to
    200 #include directives removed from the XFS kernel code base.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • xfs_trans.h has a dependency on xfs_log.h for a couple of
    structures. Most code that does transactions doesn't need to know
    anything about the log, but this dependency means that they have to
    include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
    files and clean up the includes to be in dependency order.

    In doing this, remove the direct include of xfs_trans_reserve.h from
    xfs_trans.h so that we remove the dependency between xfs_trans.h and
    xfs_mount.h. Hence the xfs_trans.h include can be moved so that it
    indicates the actual dependencies other header files have on it.

    Note that these are kernel only header files, so this does not
    translate to any userspace changes at all.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • All of the buffer operations structures need to be exported
    for xfs_db, so move them all to a common location rather than
    spreading them all over the place. They are verifying the on-disk
    format, so while xfs_format.h might be a good place, it is not part
    of the on disk format.

    Hence we need to create a new header file in which we centralise these
    related definitions. Start by moving the buffer operations
    structures, and then also move all the other definitions that have
    crept into xfs_log_format.h and xfs_format.h as there was no other
    shared header file to put them in.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     

29 Jun, 2013

1 commit

  • For CRC enabled filesystems, add support for the monotonic inode
    version change counter that is needed by protocols like NFSv4 for
    determining if the inode has changed in any way at all between two
    unrelated operations on the inode.

    This bumps the change count the first time an inode is dirtied in a
    transaction. Since all modifications to the inode are logged, this
    will catch all changes that are made to the inode, including
    timestamp updates that occur during data writes.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Reviewed-by: Chandra Seetharaman
    Signed-off-by: Ben Myers

    Dave Chinner
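The "bump on first dirty" rule can be sketched as follows. The names are hypothetical, and a per-inode boolean stands in for however the real code detects that the inode is already dirty in the current transaction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified inode state. */
struct model_inode {
	uint64_t changecount;  /* monotonic change counter */
	bool dirty_in_trans;   /* already dirtied in the current transaction? */
	bool crc_enabled;      /* CRC-enabled (v5) filesystem? */
};

/* A new transaction that the inode has joined starts clean. */
static void trans_begin(struct model_inode *ip)
{
	ip->dirty_in_trans = false;
}

/* Log a modification; only the first one in a transaction bumps the counter. */
static void trans_log_inode(struct model_inode *ip)
{
	if (ip->crc_enabled && !ip->dirty_in_trans)
		ip->changecount++;
	ip->dirty_in_trans = true;
}
```

Since every inode modification goes through the logging path, any transaction that touches the inode advances the counter exactly once, which is what a client comparing two observations needs.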
     

18 Dec, 2012

1 commit


15 May, 2012

2 commits

  • With the removal of xfs_rw.h and other changes over time, xfs_bit.h
    is being included in many files that don't actually need it. Clean
    up the includes as necessary.

    Also move the only-used-once xfs_ialloc_find_free() static inline
    function out of a header file that is widely included to reduce
    the number of needless dependencies on xfs_bit.h.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Untangle the header file includes a bit by moving the definition of
    xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
    xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
    xfs_ag.h.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

14 Mar, 2012

2 commits

  • Add a new ili_fields member to the inode log item to isolate the in-memory
    flags from the ones that actually go to the log. This will allow tracking
    timestamp-only updates for fdatasync and O_DSYNC in the next patch and
    prepares for divorcing the on-disk log format from the in-memory log item
    a little further down the road.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Timestamps on regular files are the last metadata that XFS does not update
    transactionally. Now that we use the delaylog mode exclusively and made
    the log code scale extremely well, there is no need to bypass that code for
    timestamp updates. Logging all updates allows us to drop a lot of code, and
    will allow for further performance improvements later on.

    Note that this patch drops optimized handling of fdatasync - it will be
    added back in a separate commit.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

12 Oct, 2011

1 commit

  • There is no reason to keep a reference to the inode even if we unlock
    it during transaction commit because we never drop a reference between
    the ijoin and commit. Also use this fact to merge xfs_trans_ijoin_ref
    back into xfs_trans_ijoin - the third argument decides if an unlock
    is needed now.

    I'm actually starting to wonder if allowing inodes to be unlocked
    at transaction commit really is worth the effort. The only real
    benefit is that they can be unlocked earlier when committing a
    synchronous transaction, but that could be solved by doing the
    log force manually after the unlock, too.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
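A sketch of the join/commit contract described above, using hypothetical model types rather than the real XFS structures: the join call records a `lock_flags` value, commit unlocks only when that value is non-zero, and no inode reference is taken or dropped on either side. The flag value is illustrative only.

```c
#include <stddef.h>

#define MODEL_ILOCK_EXCL 0x1u  /* illustrative value only */

struct model_inode {
	unsigned int locked;        /* lock flags currently held */
	unsigned int unlock_flags;  /* flags to drop at commit; 0 = caller unlocks */
	int refcount;               /* never touched by join or commit */
};

struct model_trans {
	struct model_inode *joined;  /* a single joined inode, for brevity */
};

/* Join a locked inode; a non-zero lock_flags asks commit to unlock it. */
static void model_trans_ijoin(struct model_trans *tp, struct model_inode *ip,
			      unsigned int lock_flags)
{
	tp->joined = ip;
	ip->unlock_flags = lock_flags;
}

/* Commit: unlock only if requested at join time; no reference is dropped. */
static void model_trans_commit(struct model_trans *tp)
{
	struct model_inode *ip = tp->joined;

	if (ip && ip->unlock_flags) {
		ip->locked &= ~ip->unlock_flags;
		ip->unlock_flags = 0;
	}
	tp->joined = NULL;
}
```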
     

08 Jul, 2011

1 commit

  • Remove the transaction pointer in the inode. It's only used to avoid
    passing down an argument in the bmap code, and for a few asserts in
    the transaction code right now.

    Also use the local variable ip in a few more places in xfs_inode_item_unlock,
    so that it isn't only used for debug builds after the above change.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     

31 Mar, 2011

1 commit


23 Feb, 2011

1 commit

  • Currently we return inodes from xfs_ialloc with just a single reference held.
    But we need two references, as one is dropped during transaction commit and
    the second needs to be transferred to the VFS. Change xfs_ialloc to use
    xfs_iget plus xfs_trans_ijoin_ref to grab two references to the inode,
    and remove the now superfluous IHOLD calls from all callers. This also
    greatly simplifies the error handling in xfs_create and also allows us to remove
    xfs_trans_iget as no other callers are left.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

19 Oct, 2010

1 commit

  • Under heavy multi-way parallel create workloads, the VFS struggles
    to write back all the inodes that have been changed in age order.
    The bdi flusher thread becomes CPU bound, spending 85% of its time
    in the VFS code, mostly traversing the superblock dirty inode list
    to separate dirty inodes old enough to flush.

    We already keep an index of all metadata changes in age order - in
    the AIL - and continued log pressure will do age ordered writeback
    without any extra overhead at all. If there is no pressure on the
    log, the xfssyncd will periodically write back metadata in ascending
    disk address offset order so will be very efficient.

    Hence we can stop marking VFS inodes dirty during transaction commit
    or when changing timestamps during transactions. This will keep the
    inodes in the superblock dirty list to those containing data or
    unlogged metadata changes.

    However, the timestamp changes are slightly more complex than this -
    there are a couple of places that do unlogged updates of the
    timestamps, and the VFS needs to be informed of these. Hence add a
    new function xfs_trans_ichgtime() for transactional changes,
    and leave xfs_ichgtime() for the non-transactional changes.

    Signed-off-by: Dave Chinner
    Reviewed-by: Alex Elder
    Reviewed-by: Christoph Hellwig

    Dave Chinner
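A simplified model of the split between the two helpers, with hypothetical names (`model_trans_ichgtime()`, `model_ichgtime()`) and illustrative flag values: a transactional update leaves writeback ordering to the log, while an unlogged update must mark the VFS inode dirty.

```c
#include <stdbool.h>
#include <time.h>

#define CHGTIME_MOD 0x1u  /* update mtime (illustrative value) */
#define CHGTIME_CHG 0x2u  /* update ctime (illustrative value) */

struct model_inode {
	time_t mtime, ctime;
	bool vfs_dirty;  /* on the superblock dirty-inode list? */
	bool logged;     /* change recorded in the running transaction? */
};

static void set_times(struct model_inode *ip, unsigned int flags, time_t now)
{
	if (flags & CHGTIME_MOD)
		ip->mtime = now;
	if (flags & CHGTIME_CHG)
		ip->ctime = now;
}

/* Transactional update: the log and AIL take care of writeback ordering. */
static void model_trans_ichgtime(struct model_inode *ip, unsigned int flags,
				 time_t now)
{
	set_times(ip, flags, now);
	ip->logged = true;  /* no VFS dirtying needed */
}

/* Unlogged update: the VFS must be told, so mark the inode dirty. */
static void model_ichgtime(struct model_inode *ip, unsigned int flags,
			   time_t now)
{
	set_times(ip, flags, now);
	ip->vfs_dirty = true;
}
```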
     

27 Jul, 2010

4 commits

  • Currently we need to either call IHOLD or xfs_trans_ihold on an inode when
    joining it to a transaction via xfs_trans_ijoin.

    This patch instead makes xfs_trans_ijoin usable on its own by doing
    an implicit xfs_trans_ihold, which also allows us to drop the third
    argument. For the case where we want to hold a reference on the inode
    a xfs_trans_ijoin_ref wrapper is added which does the IHOLD and marks
    the inode for needing an xfs_iput. In addition to the cleaner interface
    to the caller this also simplifies the implementation.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • Currently we track log item descriptors belonging to a transaction using a
    complex opencoded chunk allocator. This code has been there since day one
    and seems to work around the lack of an efficient slab allocator.

    This patch replaces it with dynamically allocated log item descriptors
    from a dedicated slab pool, linked to the transaction by a linked list.

    This allows us to greatly simplify the log item descriptor tracking to the
    point where it's just a couple hundred lines in xfs_trans.c instead of
    a separate file. The external API has also been simplified while we're
    at it - the xfs_trans_add_item and xfs_trans_del_item functions to add/
    delete items from a transaction have been simplified to the bare minimum,
    and the xfs_trans_find_item function is replaced with a direct dereference
    of the li_desc field. All debug code walking the list of log items in
    a transaction is down to a simple list_for_each_entry.

    Note that we could easily use a singly linked list here instead of the
    double linked list from list.h as the fastpath only does deletion from
    sequential traversal. But given that we don't have one available as
    a library function yet I use the list.h functions for simplicity.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
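A userspace sketch of the descriptor scheme the commit describes, assuming `malloc()` stands in for the dedicated slab cache and a hand-rolled circular list stands in for the kernel's list.h. The structure and function names are simplified and hypothetical; the point is that deletion needs no search because each item points straight at its descriptor.

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list in the style of list.h. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

struct log_item_desc;

struct log_item {
	struct log_item_desc *li_desc;  /* back pointer replaces a lookup */
	int id;
};

struct log_item_desc {
	struct list_head lid_trans;     /* link in the transaction's item list */
	struct log_item *lid_item;
};

struct trans {
	struct list_head t_items;
};

/* Allocate a descriptor (malloc stands in for a slab cache) and link it. */
static int trans_add_item(struct trans *tp, struct log_item *lip)
{
	struct log_item_desc *lidp = malloc(sizeof(*lidp));

	if (!lidp)
		return -1;
	lidp->lid_item = lip;
	list_add_tail(&lidp->lid_trans, &tp->t_items);
	lip->li_desc = lidp;
	return 0;
}

/* No search needed: the item points straight at its descriptor. */
static void trans_del_item(struct log_item *lip)
{
	list_del(&lip->li_desc->lid_trans);
	free(lip->li_desc);
	lip->li_desc = NULL;
}

/* Walk the transaction's items, as debug code would. */
static int trans_count_items(struct trans *tp)
{
	int n = 0;

	for (struct list_head *p = tp->t_items.next; p != &tp->t_items; p = p->next)
		n++;
	return n;
}
```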
     
  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • Dmapi support was never merged upstream, but we still have a lot of hooks
    bloating XFS for it, all over the fast paths of the filesystem.

    This patch drops over 700 lines of dmapi overhead. If we ever get HSM
    support in mainline at least the namespace events can be done much saner
    in the VFS instead of the individual filesystem, so it's not like this
    is much help for future work.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     

24 Jun, 2010

1 commit

  • The block number comes from bulkstat based inode lookups to shortcut
    the mapping calculations. We are not able to trust anything from
    bulkstat, so drop the block number as well so that the correct
    lookups and mappings are always done.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

02 Sep, 2009

1 commit

  • xfs_trans_iget is a wrapper for xfs_iget that adds the inode to the
    transaction after it is read. Except when the inode already is in the
    inode cache, in which case it returns the existing locked inode with
    incremented lock recursion counts.

    Now, no one in the tree ever decrements these lock recursion counts,
    so any user of this gets a potential double unlock when both the original
    owner of the inode and the xfs_trans_iget caller unlock it. When looking
    back in a git bisect in the historic XFS tree there was only one place
    that decremented these counts, xfs_trans_iput. Introduced in commit
    ca25df7a840f426eb566d52667b6950b92bb84b5 by Adam Sweeney in 1993,
    and removed in commit 19f899a3ab155ff6a49c0c79b06f2f61059afaf3 by
    Steve Lord in 2003. And unless it slipped through the cracks of git
    bisect, it was never actually used in that time frame.

    A quick audit of the callers of xfs_trans_iget shows that no caller
    really relies on this behaviour, fortunately - xfs_ialloc allocates this
    inode from disk so it cannot already be in the cache, and all the RT allocator
    routines only ever add each RT bitmap inode once.

    In addition to removing lots of code and reducing the size of the inode
    item this patch also avoids the double inode cache lookup in each
    create/mkdir/mknod transaction.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Felix Blyakher

    Christoph Hellwig
     

04 Dec, 2008

1 commit

  • Use xfs_trans_ijoin in xfs_trans_iget in case we need to join an inode into
    a transaction instead of opencoding it. Based on a discussion with and an
    incomplete patch from Niv Sardi.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     

28 Jul, 2008

1 commit

  • The kmem_free() function takes (ptr, size) arguments but doesn't actually use
    the second one.

    This patch removes size argument from all callsites.

    SGI-PV: 981498
    SGI-Modid: xfs-linux-melb:xfs-kern:31050a

    Signed-off-by: Denys Vlasenko
    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    Denys Vlasenko
     

29 Apr, 2008

1 commit

  • The writer field is not needed for non-DEBUG builds so remove it. While
    we're at it, also clean up the interface for is-locked asserts to go through
    an xfs_iget.c helper with an interface like the xfs_ilock routines to
    isolate the XFS codebase from mrlock internals. That way we can kill
    mrlock_t entirely once rw_semaphores grow an islocked facility. Also
    remove unused flags to the ilock family of functions.

    SGI-PV: 976035
    SGI-Modid: xfs-linux-melb:xfs-kern:30902a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     

20 Jun, 2006

1 commit


29 Mar, 2006

1 commit


02 Nov, 2005

2 commits


21 Jun, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds