30 Jul, 2012

2 commits

  • An inode that enters xfs_inactive has been removed from all global
    lists but the inode hash, and can't be recycled in xfs_iget before
    it has been marked reclaimable. Thus taking the iolock in here
    is not nessecary at all, and given the amount of lockdep false
    positives it has triggered already I'd rather remove the locking.

    The only change outside of xfs_inactive is relaxing an assert in
    xfs_itruncate_extents.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Rich Johnston
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • We can simplify check the IO_agbp pointer for being non-NULL instead of
    passing another argument through two layers of function calls.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

22 Jul, 2012

2 commits

  • There is no need to keep this helper around, opencoding it in the only
    caller is just as clear.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • All callers of xfs_imap_to_bp want the dinode pointer, so let's calculate it
    inside xfs_imap_to_bp. Once that is done xfs_itobp becomes a fairly pointless
    wrapper which can be replaced with direct calls to xfs_imap_to_bp.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

15 Jun, 2012

1 commit

  • XFS_MAXIOFFSET() is just a simple macro that resolves to
    mp->m_maxioffset. It doesn't need to exist, and it just makes the
    code unnecessarily loud and shouty.

    Make it quiet and easy to read.

    Signed-off-by: Dave Chinner
    Reviewed-by: Eric Sandeen
    Signed-off-by: Ben Myers

    Dave Chinner
     

15 May, 2012

8 commits

  • Rather than specifying XBF_MAPPED for almost all buffers, introduce
    XBF_UNMAPPED for the couple of users that use unmapped buffers.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • With the removal of xfs_rw.h and other changes over time, xfs_bit.h
    is being included in many files that don't actually need it. Clean
    up the includes as necessary.

    Also move the only-used-once xfs_ialloc_find_free() static inline
    function out of a header file that is widely included to reduce
    the number of needless dependencies on xfs_bit.h.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • The only thing left in xfs_rw.h is a function prototype for an inode
    function. Move that to xfs_inode.h, and kill xfs_rw.h.

    Also move the function implementing the prototype from xfs_rw.c to
    xfs_inode.c so we only have one function left in xfs_rw.c

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Buffers are always returned locked from the lookup routines. Hence
    we don't need to tell the lookup routines to return locked buffers,
    on to try and lock them. Remove XBF_LOCK from all the callers and
    from internal buffer cache usage.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • xfs_trans_ail_delete_bulk() can be called from different contexts so
    if the item is not in the AIL we need different shutdown for each
    context. Pass in the shutdown method needed so the correct action
    can be taken.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
    and write back the buffers per-process instead of by waking up xfsbufd.

    This is now easily doable given that we have very few places left that write
    delwri buffers:

    - log recovery:
    Only done at mount time, and already forcing out the buffers
    synchronously using xfs_flush_buftarg

    - quotacheck:
    Same story.

    - dquot reclaim:
    Writes out dirty dquots on the LRU under memory pressure. We might
    want to look into doing more of this via xfsaild, but it's already
    more optimal than the synchronous inode reclaim that writes each
    buffer synchronously.

    - xfsaild:
    This is the main beneficiary of the change. By keeping a local list
    of buffers to write we reduce latency of writing out buffers, and
    more importably we can remove all the delwri list promotions which
    were hitting the buffer cache hard under sustained metadata loads.

    The implementation is very straight forward - xfs_buf_delwri_queue now gets
    a new list_head pointer that it adds the delwri buffers to, and all callers
    need to eventually submit the list using xfs_buf_delwi_submit or
    xfs_buf_delwi_submit_nowait. Buffers that already are on a delwri list are
    skipped in xfs_buf_delwri_queue, assuming they already are on another delwri
    list. The biggest change to pass down the buffer list was done to the AIL
    pushing. Now that we operate on buffers the trylock, push and pushbuf log
    item methods are merged into a single push routine, which tries to lock the
    item, and if possible add the buffer that needs writeback to the buffer list.
    This leads to much simpler code than the previous split but requires the
    individual IOP_PUSH instances to unlock and reacquire the AIL around calls
    to blocking routines.

    Given that xfsailds now also handle writing out buffers, the conditions for
    log forcing and the sleep times needed some small changes. The most
    important one is that we consider an AIL busy as long we still have buffers
    to push, and the other one is that we do increment the pushed LSN for
    buffers that are under flushing at this moment, but still count them towards
    the stuck items for restart purposes. Without this we could hammer on stuck
    items without ever forcing the log and not make progress under heavy random
    delete workloads on fast flash storage devices.

    [ Dave Chinner:
    - rebase on previous patches.
    - improved comments for XBF_DELWRI_Q handling
    - fix XBF_ASYNC handling in queue submission (test 106 failure)
    - rename delwri submit function buffer list parameters for clarity
    - xfs_efd_item_push() should return XFS_ITEM_PINNED ]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Instead of writing the buffer directly from inside xfs_iflush return it to
    the caller and let the caller decide what to do with the buffer. Also
    remove the pincount check in xfs_iflush that all non-blocking callers already
    implement and the now unused flags parameter.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • If a filesystem has been forced shutdown we are never going to write inodes
    to disk, which means the inode items will stay in the AIL until we free
    the inode. Currently that is not a problem, but a pending change requires us
    to empty the AIL before shutting down the filesystem. In that case leaving
    the inode in the AIL is lethal. Make sure to remove the log item from the AIL
    to allow emptying the AIL on shutdown filesystems.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

14 Mar, 2012

2 commits

  • Add a new ili_fields member to the inode log item to isolate the in-memory
    flags from the ones that actually go to the log. This will allow tracking
    timestamp-only updates for fdatasync and O_DSYNC in the next patch and
    prepares for divorcing the on-disk log format from the in-memory log item
    a little further down the road.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Timestamps on regular files are the last metadata that XFS does not update
    transactionally. Now that we use the delaylog mode exclusively and made
    the log scode scale extremly well there is no need to bypass that code for
    timestamp updates. Logging all updates allows to drop a lot of code, and
    will allow for further performance improvements later on.

    Note that this patch drops optimized handling of fdatasync - it will be
    added back in a separate commit.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

18 Jan, 2012

4 commits

  • There is no fundamental need to keep an in-memory inode size copy in the XFS
    inode. We already have the on-disk value in the dinode, and the separate
    in-memory copy that we need for regular files only in the XFS inode.

    Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the
    VFS inode i_size field for regular files. Switch code that was directly
    accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where
    we are limited to regular files direct access of the VFS inode i_size field.

    This also allows dropping some fairly complicated code in the write path
    which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size
    that is getting updated inside ->write_end.

    Note that we do not bother resetting the VFS i_size when truncating a file
    that gets freed to zero as there is no point in doing so because the VFS inode
    is no longer in use at this point. Just relax the assert in xfs_ifree to
    only check the on-disk size instead.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Replace i_pin_wait, which is only used during synchronous inode flushing
    with a bit waitqueue. This trades off a much smaller inode against
    slightly slower wakeup performance, and saves 12 (32-bit) or 20 (64-bit)
    bytes in the XFS inode.

    Reviewed-by: Alex Elder
    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • We almost never block on i_flock, the exception is synchronous inode
    flushing. Instead of bloating the inode with a 16/24-byte completion
    that we abuse as a semaphore just implement it as a bitlock that uses
    a bit waitqueue for the rare sleeping path. This primarily is a
    tradeoff between a much smaller inode and a faster non-blocking
    path vs faster wakeups, and we are much better off with the former.

    A small downside is that we will lose lockdep checking for i_flock, but
    given that it's always taken inside the ilock that should be acceptable.

    Note that for example the inode writeback locking is implemented in a
    very similar way.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • We spent a lot of effort to maintain this field, but it always equals to the
    fork size divided by the constant size of an extent. The prime use of it is
    to assert that the two stay in sync. Just divide the fork size by the extent
    size in the few places that we actually use it and remove the overhead
    of maintaining it. Also introduce a few helpers to consolidate the places
    where we actually care about the value.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

14 Jan, 2012

1 commit

  • This wrapper isn't overly useful, not to say rather confusing.

    Around the call to xfs_itruncate_extents it does:

    - add tracing
    - add a few asserts in debug builds
    - conditionally update the inode size in two places
    - log the inode

    Both the tracing and the inode logging can be moved to xfs_itruncate_extents
    as they are useful for the attribute fork as well - in fact the attr code
    already does an equivalent xfs_trans_log_inode call just after calling
    xfs_itruncate_extents. The conditional size updates are a mess, and there
    was no reason to do them in two places anyway, as the first one was
    conditional on the inode having extents - but without extents we
    xfs_itruncate_extents would be a no-op and the placement wouldn't matter
    anyway. Instead move the size assignments and the asserts that make sense
    to the callers that want it.

    As a side effect of this clean up xfs_setattr_size by introducing variables
    for the old and new inode size, and moving the size updates into a common
    place.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

04 Jan, 2012

1 commit


30 Nov, 2011

1 commit

  • If we are doing synchronous inode reclaim we block the VM from making
    progress in memory reclaim. So if we encouter a flush locked inode
    promote it in the delwri list and wake up xfsbufd to write it out now.
    Without this we can get hangs of up to 30 seconds during workloads hitting
    synchronous inode reclaim.

    The scheme is copied from what we do for dquot reclaims.

    Reported-by: Simon Kirby
    Signed-off-by: Christoph Hellwig
    Tested-by: Simon Kirby
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

12 Oct, 2011

7 commits

  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • There is no reason to keep a reference to the inode even if we unlock
    it during transaction commit because we never drop a reference between
    the ijoin and commit. Also use this fact to merge xfs_trans_ijoin_ref
    back into xfs_trans_ijoin - the third argument decides if an unlock
    is needed now.

    I'm actually starting to wonder if allowing inodes to be unlocked
    at transaction commit really is worth the effort. The only real
    benefit is that they can be unlocked earlier when commiting a
    synchronous transactions, but that could be solved by doing the
    log force manually after the unlock, too.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • xfs_bmapi() currently handles both extent map reading and
    allocation. As a result, the code is littered with "if (wr)"
    branches to conditionally do allocation operations if required.
    This makes the code much harder to follow and causes significant
    indent issues with the code.

    Given that read mapping is much simpler than allocation, we can
    split out read mapping from xfs_bmapi() and reuse the logic that
    we have already factored out do do all the hard work of handling the
    extent map manipulations. The results in a much simpler function for
    the common extent read operations, and will allow the allocation
    code to be simplified in another commit.

    Once xfs_bmapi_read() is implemented, convert all the callers of
    xfs_bmapi() that are only reading extents to use the new function.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • Check the return value of xfs_trans_get_buf() and fail
    appropriately.

    Signed-off-by: Chandra Seetharaman
    Signed-off-by: Alex Elder

    Chandra Seetharaman
     
  • Remove the xfs_buf_relse from xfs_bwrite and let the caller handle it to
    mirror the delwri and read paths.

    Also remove the mount pointer passed to xfs_bwrite, which is superflous now
    that we have a mount pointer in the buftarg.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Unify the ways we add buffers to the delwri queue by always calling
    xfs_buf_delwri_queue directly. The xfs_bdwrite functions is removed and
    opencoded in its callers, and the two places setting XBF_DELWRI while a
    buffer is locked and expecting xfs_buf_unlock to pick it up are converted
    to call xfs_buf_delwri_queue directly, too. Also replace the
    XFS_BUF_UNDELAYWRITE macro with direct calls to xfs_buf_delwri_dequeue
    to make the explicit queuing/dequeuing more obvious.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

08 Aug, 2011

1 commit


27 Jul, 2011

1 commit


26 Jul, 2011

2 commits


13 Jul, 2011

3 commits


08 Jul, 2011

4 commits

  • Micro-optimize various comparisms by always byteswapping the constant
    instead of the variable, which allows to do the swap at compile instead
    of runtime.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • Remove the transaction pointer in the inode. It's only used to avoid
    passing down an argument in the bmap code, and for a few asserts in
    the transaction code right now.

    Also use the local variable ip in a few more places in xfs_inode_item_unlock,
    so that it isn't only used for debug builds after the above change.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • Split the guts of xfs_itruncate_finish that loop over the existing extents
    and calls xfs_bunmapi on them into a new helper, xfs_itruncate_externs.
    Make xfs_attr_inactive call it directly instead of xfs_itruncate_finish,
    which allows to simplify the latter a lot, by only letting it deal with
    the data fork. As a result xfs_itruncate_finish is renamed to
    xfs_itruncate_data to make its use case more obvious.

    Also remove the sync parameter from xfs_itruncate_data, which has been
    unessecary since the introduction of the busy extent list in 2002, and
    completely dead code since 2003 when the XFS_BMAPI_ASYNC parameter was
    made a no-op.

    I can't actually see why the xfs_attr_inactive needs to set the transaction
    sync, but let's keep this patch simple and without changes in behaviour.

    Also avoid passing a useless argument to xfs_isize_check, and make it
    private to xfs_inode.c.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • xfs_itruncate_start is a rather length wrapper that evaluates to a call
    to xfs_ioend_wait and xfs_tosspages, and only has two callers.

    Instead of using the complicated checks left over from IRIX where we
    can to truncate the pagecache just call xfs_tosspages
    (aka truncate_inode_pages) directly as we want to get rid of all data
    after i_size, and truncate_inode_pages handles incorrect alignments
    and too large offsets just fine.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Reviewed-by: Dave Chinner

    Christoph Hellwig