12 Oct, 2011

8 commits

  • Directories are only updated transactionally, which means fsync only
    needs to flush the log if the inode is currently dirty, but does not
    have to bother checking for dirty data or non-transactional updates,
    and most importantly doesn't have to flush disk caches except as part
    of a transaction commit.

    While the first two optimizations can't easily be measured, the
    latter actually makes a difference when doing lots of fsyncs that do
    not actually have to commit the inode, e.g. because an earlier fsync
    already pushed the log far enough.

    The new xfs_dir_fsync is identical to xfs_nfs_commit_metadata except
    for the prototype, but I'm not sure creating a common helper for the
    two is worth it given how simple the functions are.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
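
    A minimal sketch of the directory fsync pattern this entry describes,
    assuming the 3.1-era helpers xfs_ipincount(), ili_last_lsn and
    _xfs_log_force_lsn(); an illustration of the idea, not the actual patch:

        static int
        xfs_dir_fsync(struct file *file, loff_t start, loff_t end, int datasync)
        {
                struct xfs_inode *ip = XFS_I(file->f_mapping->host);
                xfs_lsn_t lsn = 0;

                /*
                 * Directories are only dirtied by transactions, so all we
                 * may need to do is push the log to the last commit LSN
                 * the inode was logged with.
                 */
                xfs_ilock(ip, XFS_ILOCK_SHARED);
                if (xfs_ipincount(ip))
                        lsn = ip->i_itemp->ili_last_lsn;
                xfs_iunlock(ip, XFS_ILOCK_SHARED);

                if (!lsn)
                        return 0;       /* nothing pinned in the log */
                return _xfs_log_force_lsn(ip->i_mount, lsn, XFS_LOG_SYNC, NULL);
        }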
     
  • There is no reason to keep a reference to the inode even if we unlock
    it during transaction commit because we never drop a reference between
    the ijoin and commit. Also use this fact to merge xfs_trans_ijoin_ref
    back into xfs_trans_ijoin - the third argument decides if an unlock
    is needed now.

    I'm actually starting to wonder if allowing inodes to be unlocked
    at transaction commit really is worth the effort. The only real
    benefit is that they can be unlocked earlier when committing a
    synchronous transaction, but that could be solved by doing the
    log force manually after the unlock, too.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
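
    A usage sketch of the merged interface this entry describes; the
    surrounding transaction setup is illustrative only:

        xfs_ilock(ip, XFS_ILOCK_EXCL);
        xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);  /* commit/cancel unlocks the inode */

        /* ...or keep the inode locked across the commit: */
        xfs_ilock(ip, XFS_ILOCK_EXCL);
        xfs_trans_ijoin(tp, ip, 0);               /* caller drops the lock itself */
        error = xfs_trans_commit(tp, 0);
        xfs_iunlock(ip, XFS_ILOCK_EXCL);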
     
  • Only read the LSN we need to push to with the ilock held, and then release
    it before we do the log force to improve concurrency.

    This also removes the only direct caller of _xfs_trans_commit, thus
    allowing it to be merged into the plain xfs_trans_commit again.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
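
    The same capture-then-force pattern as in the directory fsync sketch
    above, shown only to highlight the ordering this commit changes;
    illustrative, not the actual patch:

        xfs_lsn_t lsn = 0;
        int error = 0, log_flushed = 0;

        xfs_ilock(ip, XFS_ILOCK_SHARED);
        if (xfs_ipincount(ip))
                lsn = ip->i_itemp->ili_last_lsn;  /* the LSN we must push to */
        xfs_iunlock(ip, XFS_ILOCK_SHARED);        /* drop the lock first... */

        if (lsn)                                  /* ...then issue the log force */
                error = _xfs_log_force_lsn(ip->i_mount, lsn, XFS_LOG_SYNC,
                                           &log_flushed);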
     
  • xfs_bmapi() currently handles both extent map reading and
    allocation. As a result, the code is littered with "if (wr)"
    branches to conditionally do allocation operations if required.
    This makes the code much harder to follow and causes significant
    indent issues with the code.

    Given that read mapping is much simpler than allocation, we can
    split out read mapping from xfs_bmapi() and reuse the logic that
    we have already factored out to do all the hard work of handling the
    extent map manipulations. This results in a much simpler function for
    the common extent read operations, and will allow the allocation
    code to be simplified in another commit.

    Once xfs_bmapi_read() is implemented, convert all the callers of
    xfs_bmapi() that are only reading extents to use the new function.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
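
    A sketch of what a converted read-only caller looks like; the call
    matches the xfs_bmapi_read() prototype this series introduces, while
    the surrounding variables (offset_fsb, end_fsb) are illustrative:

        struct xfs_bmbt_irec imap;
        int nimaps = 1;
        int error;

        /* Pure lookup: no transaction, no firstblock, no free list. */
        error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
                               &imap, &nimaps, 0);
        if (error)
                return error;
        if (!nimaps || imap.br_startblock == HOLESTARTBLOCK)
                return 0;       /* hole: nothing mapped on disk */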
     
  • Currently a buffered reader or writer can add pages to the pagecache
    while we are waiting for the iolock in xfs_file_dio_aio_write. Prevent
    this by re-checking mapping->nrpages after we have got the iolock, and
    if necessary upgrade the lock to exclusive mode. To simplify this a bit,
    only take the ilock inside of xfs_file_aio_write_checks.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
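
    A sketch of the re-check this entry describes, assuming the
    xfs_rw_ilock() wrappers used in xfs_file.c in this era; not the
    actual patch:

        struct address_space *mapping = VFS_I(ip)->i_mapping;
        int iolock = XFS_IOLOCK_SHARED;

        xfs_rw_ilock(ip, iolock);

        /*
         * A buffered reader or writer may have added pages while we were
         * waiting for the iolock, so re-check and upgrade if necessary.
         */
        if (mapping->nrpages && iolock == XFS_IOLOCK_SHARED) {
                xfs_rw_iunlock(ip, iolock);
                iolock = XFS_IOLOCK_EXCL;
                xfs_rw_ilock(ip, iolock);
        }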
     
  • We now have an i_dio_count field and surrounding infrastructure to wait
    for direct I/O completion instead of i_iocount, and we have never needed
    the iocount waits for buffered I/O given that we only set the page uptodate
    after finishing all required work. Thus remove i_iocount, and replace
    the actually needed waits with calls to inode_dio_wait.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
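
    The replacement wait is the stock VFS helper; a one-line sketch of the
    call that takes the place of the old i_iocount waits:

        /* Wait for all outstanding direct I/O (i_dio_count) to drain. */
        inode_dio_wait(VFS_I(ip));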
     
  • For append write workloads, extending the file requires a certain
    amount of exclusive locking to be done up front to ensure sanity in
    things like ensuring that we've zeroed any allocated regions
    between the old EOF and the start of the new IO.

    For single threads, this typically isn't a problem, and for large
    IOs we don't serialise enough for it to be a problem for two
    threads on really fast block devices. However for smaller IO and
    larger thread counts we have a problem.

    Take 4 concurrent sequential, single block sized and aligned IOs.
    After the first IO is submitted but before it completes, we end up
    with this state:

          IO 1    IO 2    IO 3    IO 4
        +-------+-------+-------+-------+
        ^       ^
        |       |
        |       |
        |       |
        |       \- ip->i_new_size
        \- ip->i_size

    And the IO is done without exclusive locking because offset <=
    ip->i_size. When we submit IO 2, we see offset > ip->i_size, and
    grab the IO lock exclusive, because there is a chance we need to do
    EOF zeroing. However, there is already an IO in progress that avoids
    the need for EOF zeroing because offset <= ip->i_new_size, hence we
    could avoid holding the IO lock exclusive for this. Hence after
    submission of the second IO, we'd end up in this state:

          IO 1    IO 2    IO 3    IO 4
        +-------+-------+-------+-------+
                ^       ^
                |       |
                |       |
                |       |
                |       \- ip->i_new_size
                \- ip->i_size

    There is no need to grab the i_mutex or the IO lock in exclusive
    mode if we don't need to invalidate the page cache. Taking these
    locks on every direct IO effectively serialises them, as taking the
    IO lock in exclusive mode has to wait for all shared holders to drop
    the lock. That only happens when IO is complete, so effectively it
    prevents dispatch of concurrent direct IO writes to the same inode.

    And so you can see that for the third concurrent IO, we'd avoid
    exclusive locking for the same reason we avoided the exclusive lock
    for the second IO.

    Fixing this is a bit more complex than that, because we need to hold
    a write-submission local value of ip->i_new_size so that clearing
    the value is only done if no other thread has updated it before our
    IO completes...

    Signed-off-by: Dave Chinner
    Signed-off-by: Alex Elder

    Dave Chinner
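
    A sketch of the locking decision this entry walks through; the field
    names follow the text above, but the code is illustrative rather than
    the actual patch:

        xfs_fsize_t new_size = offset + count;
        int iolock;

        /*
         * Exclusive locking is only needed if the write starts beyond
         * both the current EOF and the EOF any already-submitted append
         * will produce; otherwise that in-flight IO covers the zeroing.
         */
        if (offset > ip->i_size && offset > ip->i_new_size)
                iolock = XFS_IOLOCK_EXCL;
        else
                iolock = XFS_IOLOCK_SHARED;

        /* Publish our new EOF target (done under the ilock in real code). */
        if (new_size > ip->i_new_size)
                ip->i_new_size = new_size;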
     
  • There is no need to grab the i_mutex or the IO lock in exclusive
    mode if we don't need to invalidate the page cache. Taking these
    locks on every direct IO effectively serialises them, as taking the
    IO lock in exclusive mode has to wait for all shared holders to drop
    the lock. That only happens when IO is complete, so effectively it
    prevents dispatch of concurrent direct IO reads to the same inode.

    Fix this by taking the IO lock shared to check the page cache state,
    and only then drop it and take the IO lock exclusively if there is
    work to be done. Hence for the normal direct IO case, no exclusive
    locking will occur.

    Signed-off-by: Dave Chinner
    Tested-by: Joern Engel
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
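
    A sketch of the shared-first check this entry describes, again
    assuming the xfs_rw_ilock() wrappers; not the actual patch:

        xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
        if (VFS_I(ip)->i_mapping->nrpages) {
                /*
                 * Rare case: cached pages must be written back and
                 * invalidated before issuing direct IO against the range,
                 * so upgrade to the exclusive iolock for that work only.
                 */
                xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);
                xfs_rw_ilock(ip, XFS_IOLOCK_EXCL);
                /* ...flush and invalidate the page cache here... */
                xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
        }
        /* the direct IO read itself runs under the shared iolock */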
     

13 Aug, 2011

1 commit

  • Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
    annoying subdirectories in the XFS source code. Besides the large
    number of file renames, the only changes are to the Makefile, a few
    files including headers with the subdirectory prefix, and the binary
    sysctl compat code that includes a header under fs/xfs/ from
    kernel/.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig