27 Sep, 2017

1 commit

  • Since commit d531d91d6990 ("xfs: always use unwritten extents for
    direct I/O writes"), we start allocating unwritten extents for all
    direct writes to allow appending aio in XFS.

    But for dio writes that could extend file size we update the in-core
    inode size first, then convert the unwritten extents to real
    allocations at dio completion time in xfs_dio_write_end_io(). Thus a
    racing direct read could see the new i_size and find the unwritten
    extents first and read zeros instead of actual data, if the direct
    writer also takes a shared iolock.

    Fix it by updating the in-core inode size after the unwritten extent
    conversion. To do this, introduce a new boolean argument to
    xfs_iomap_write_unwritten() to tell if we want to update in-core
    i_size or not.

    Suggested-by: Brian Foster
    Reviewed-by: Brian Foster
    Signed-off-by: Eryu Guan
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Eryu Guan
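
The ordering problem described above can be illustrated with a toy model. This is only a sketch, not kernel code: `ToyFile`, `dio_read`, and `append_write` are hypothetical names, extents are reduced to a per-block state list, and the racing reader is simply called between the two completion steps.

```python
# Toy model of the race; every name here is hypothetical, not an XFS API.
class ToyFile:
    def __init__(self):
        self.i_size = 0      # in-core size, counted in blocks
        self.blocks = []     # per-block state: "unwritten" or the data bytes


def dio_read(f, blk):
    """Return what a racing direct read would see for block `blk`."""
    if blk >= f.i_size:
        return None          # beyond EOF: nothing is returned
    state = f.blocks[blk]
    # An unwritten extent reads back as zeros.
    return b"\0" if state == "unwritten" else state


def append_write(f, data, size_first, reader):
    """Append one block; `reader` runs between the two completion steps."""
    blk = len(f.blocks)
    f.blocks.append("unwritten")     # direct writes allocate unwritten extents
    if size_first:
        f.i_size = blk + 1           # old order: publish the size first
        seen = reader(f, blk)
        f.blocks[blk] = data         # then convert the extent
    else:
        f.blocks[blk] = data         # fixed order: convert the extent first
        seen = reader(f, blk)
        f.i_size = blk + 1           # then publish the size
    return seen
```

With `size_first=True` the reader observes zeros; with the fixed order it sees the block as still beyond EOF and never reads stale zeros.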
     

12 Sep, 2017

1 commit

  • Pull libnvdimm from Dan Williams:
    "A rework of media error handling in the BTT driver and other updates.
    It has appeared in a few -next releases and collected some late-
    breaking build-error and warning fixups as a result.

    Summary:

    - Media error handling support in the Block Translation Table (BTT)
    driver is reworked to address sleeping-while-atomic locking and
    memory-allocation-context conflicts.

    - The dax_device lookup overhead for xfs and ext4 is moved out of the
    iomap hot-path to a mount-time lookup.

    - A new 'ecc_unit_size' sysfs attribute is added to advertise the
    read-modify-write boundary property of a persistent memory range.

    - Preparatory fix-ups for arm and powerpc pmem support are included
    along with other miscellaneous fixes"

    * tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits)
    libnvdimm, btt: fix format string warnings
    libnvdimm, btt: clean up warning and error messages
    ext4: fix null pointer dereference on sbi
    libnvdimm, nfit: move the check on nd_reserved2 to the endpoint
    dax: fix FS_DAX=n BLOCK=y compilation
    libnvdimm: fix integer overflow static analysis warning
    libnvdimm, nd_blk: remove mmio_flush_range()
    libnvdimm, btt: rework error clearing
    libnvdimm: fix potential deadlock while clearing errors
    libnvdimm, btt: cache sector_size in arena_info
    libnvdimm, btt: ensure that flags were also unchanged during a map_read
    libnvdimm, btt: refactor map entry operations with macros
    libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path
    libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute
    ext4: perform dax_device lookup at mount
    ext2: perform dax_device lookup at mount
    xfs: perform dax_device lookup at mount
    dax: introduce a fs_dax_get_by_bdev() helper
    libnvdimm, btt: check memory allocation failure
    libnvdimm, label: fix index block size calculation
    ...

    Linus Torvalds
     

02 Sep, 2017

2 commits


01 Sep, 2017

1 commit

  • The ->iomap_begin() operation is a hot path, so cache the
    fs_dax_get_by_host() result at mount time to avoid incurring the
    hash lookup overhead on a per-i/o basis.

    Reported-by: Christoph Hellwig
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Dan Williams

    Dan Williams
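
The idea of moving a lookup out of the hot path can be sketched as follows. This is a generic illustration under stated assumptions: `DaxRegistry`, `Mount`, and `toy_iomap_begin` are made-up names, and the dictionary stands in for whatever hash table the real lookup uses.

```python
# Generic sketch of caching a lookup at mount time; all names are made up.
class DaxRegistry:
    def __init__(self, hosts):
        self.hosts = dict(hosts)
        self.lookups = 0             # counts how often the hash lookup runs

    def get_by_host(self, host):
        self.lookups += 1
        return self.hosts.get(host)


class Mount:
    def __init__(self, registry, host):
        # One lookup when the filesystem is mounted.
        self.dax_dev = registry.get_by_host(host)


def toy_iomap_begin(mount, offset, length):
    # Hot path: reuse the cached handle instead of looking it up per I/O.
    return {"offset": offset, "length": length, "dax_dev": mount.dax_dev}
```

However many mappings are requested, the registry is consulted once per mount.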
     

11 Jul, 2017

1 commit

  • Pull XFS updates from Darrick Wong:
    "Here are some changes for you for 4.13. For the most part it's fixes
    for bugs and deadlock problems, and preparation for online fsck in
    some future merge window.

    - Avoid quotacheck deadlocks

    - Fix transaction overflows when bunmapping fragmented files

    - Refactor directory readahead

    - Allow admin to configure if ASSERT is fatal

    - Improve transaction usage detail logging during overflows

    - Minor cleanups

    - Don't leak log items when the log shuts down

    - Remove double-underscore typedefs

    - Various preparation for online scrubbing

    - Introduce new error injection configuration sysfs knobs

    - Refactor dq_get_next to use extent map directly

    - Fix problems with iterating the page cache for unwritten data

    - Implement SEEK_{HOLE,DATA} via iomap

    - Refactor XFS to use iomap SEEK_HOLE and SEEK_DATA

    - Don't use MAXPATHLEN to check on-disk symlink target lengths"

    * tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (48 commits)
    xfs: don't crash on unexpected holes in dir/attr btrees
    xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLEN
    xfs: fix contiguous dquot chunk iteration livelock
    xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA
    vfs: Add iomap_seek_hole and iomap_seek_data helpers
    vfs: Add page_cache_seek_hole_data helper
    xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk
    xfs: rewrite xfs_dq_get_next_id using xfs_iext_lookup_extent
    xfs: Check for m_errortag initialization in xfs_errortag_test
    xfs: grab dquots without taking the ilock
    xfs: fix semicolon.cocci warnings
    xfs: Don't clear SGID when inheriting ACLs
    xfs: free cowblocks and retry on buffered write ENOSPC
    xfs: replace log_badcrc_factor knob with error injection tag
    xfs: convert drop_writes to use the errortag mechanism
    xfs: remove unneeded parameter from XFS_TEST_ERROR
    xfs: expose errortag knobs via sysfs
    xfs: make errortag a per-mountpoint structure
    xfs: free uncommitted transactions during log recovery
    xfs: don't allow bmap on rt files
    ...

    Linus Torvalds
     

28 Jun, 2017

2 commits


20 Jun, 2017

1 commit

  • If IOCB_NOWAIT is set, bail if the i_rwsem is not lockable
    immediately.

    If IOMAP_NOWAIT is set, return -EAGAIN in xfs_file_iomap_begin if it
    needs allocation due to file extension, writing to a hole, or COW, or
    if it would have to wait for other DIOs to finish.

    Return -EAGAIN if we don't have the extent list in memory.

    Signed-off-by: Goldwyn Rodrigues
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Jens Axboe

    Goldwyn Rodrigues
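
The non-blocking behaviour can be sketched in userspace terms. This is an analogy only: `nowait_write_begin` is a hypothetical function, a `threading.Lock` stands in for i_rwsem, and a boolean stands in for "this mapping needs allocation".

```python
import errno
import threading


def nowait_write_begin(lock, nowait, needs_allocation):
    """Return 0 on success, or -EAGAIN if the request would have to block."""
    if nowait:
        # Try the lock once instead of sleeping on it.
        if not lock.acquire(blocking=False):
            return -errno.EAGAIN
    else:
        lock.acquire()
    try:
        # Allocation may block, so a NOWAIT request bails out here too.
        if nowait and needs_allocation:
            return -errno.EAGAIN
        return 0
    finally:
        lock.release()
```

A caller that gets `-EAGAIN` is expected to retry from a context where blocking is acceptable.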
     

14 May, 2017

1 commit

  • Tetsuo reports:

    fs/built-in.o: In function `xfs_file_iomap_end':
    xfs_iomap.c:(.text+0xe0ef9): undefined reference to `put_dax'
    fs/built-in.o: In function `xfs_file_iomap_begin':
    xfs_iomap.c:(.text+0xe1a7f): undefined reference to `dax_get_by_host'
    make: *** [vmlinux] Error 1
    $ grep DAX .config
    CONFIG_DAX=m
    # CONFIG_DEV_DAX is not set
    # CONFIG_FS_DAX is not set

    When FS_DAX=n we can/must throw away the dax code in filesystems.
    Implement 'fs_' versions of dax_get_by_host() and put_dax() that are
    nops in the FS_DAX=n case.

    Cc: Jan Kara
    Cc: "Theodore Ts'o"
    Cc: "Darrick J. Wong"
    Cc: Ross Zwisler
    Tested-by: Tony Luck
    Fixes: ef51042472f5 ("block, dax: move 'select DAX' from BLOCK to FS_DAX")
    Reported-by: Tetsuo Handa
    Signed-off-by: Dan Williams

    Dan Williams
     

07 May, 2017

1 commit

  • Pull xfs updates from Darrick Wong:
    "Here are the XFS changes for 4.12. The big new feature for this
    release is the new space mapping ioctl that we've been discussing
    since LSF2016, but other than that most of the patches are larger bug
    fixes, memory corruption prevention, and other cleanups.

    Summary:
    - various code cleanups
    - introduce GETFSMAP ioctl
    - various refactoring
    - avoid dio reads past eof
    - fix memory corruption and other errors with fragmented directory blocks
    - fix accidental userspace memory corruptions
    - publish fs uuid in superblock
    - make fstrim terminatable
    - fix race between quotaoff and in-core inode creation
    - avoid use-after-free when finishing up w/ buffer heads
    - reserve enough space to handle bmap tree resizing during cow remap"

    * tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (53 commits)
    xfs: fix use-after-free in xfs_finish_page_writeback
    xfs: reserve enough blocks to handle btree splits when remapping
    xfs: wait on new inodes during quotaoff dquot release
    xfs: update ag iterator to support wait on new inodes
    xfs: support ability to wait on new inodes
    xfs: publish UUID in struct super_block
    xfs: Allow user to kill fstrim process
    xfs: better log intent item refcount checking
    xfs: fix up quotacheck buffer list error handling
    xfs: remove xfs_trans_ail_delete_bulk
    xfs: don't use bool values in trace buffers
    xfs: fix getfsmap userspace memory corruption while setting OF_LAST
    xfs: fix __user annotations for xfs_ioc_getfsmap
    xfs: corruption needs to respect endianess too!
    xfs: use NULL instead of 0 to initialize a pointer in xfs_ioc_getfsmap
    xfs: use NULL instead of 0 to initialize a pointer in xfs_getfsmap
    xfs: simplify validation of the unwritten extent bit
    xfs: remove unused values from xfs_exntst_t
    xfs: remove the unused XFS_MAXLINK_1 define
    xfs: more do_div cleanups
    ...

    Linus Torvalds
     

26 Apr, 2017

1 commit


07 Apr, 2017

1 commit


04 Apr, 2017

1 commit


09 Mar, 2017

1 commit

  • Commit fa7f138 ("xfs: clear delalloc and cache on buffered write
    failure") fixed one regression in the iomap error handling code and
    exposed another. The fundamental problem is that if a buffered write
    is a rewrite of preexisting delalloc blocks and the write fails, the
    failure handling code can punch out preexisting blocks with valid
    file data.

    This was reproduced directly by sub-block writes in the LTP
    kernel/syscalls/write/write03 test. A first 100 byte write allocates
    a single block in a file. A subsequent 100 byte write fails and
    punches out the block, including the data successfully written by
    the previous write.

    To address this problem, update the ->iomap_begin() handler to
    distinguish newly allocated delalloc blocks from preexisting
    delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
    ->iomap_end() handler to decide when a failed or short write should
    punch out delalloc blocks.

    This introduces the subtle requirement that ->iomap_begin() should
    never combine newly allocated delalloc blocks with existing blocks
    in the resulting iomap descriptor. This can occur when a new
    delalloc reservation merges with a neighboring extent that is part
    of the current write, for example. Therefore, drop the
    post-allocation extent lookup from xfs_bmapi_reserve_delalloc() and
    just return the record inserted into the fork. This ensures only new
    blocks are returned and thus that preexisting delalloc blocks are
    always handled as "found" blocks and not punched out on a failed
    rewrite.

    Reported-by: Xiong Zhou
    Signed-off-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
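
The role of the "new" flag can be shown with a small model. This is a sketch, not the XFS implementation: `begin_write` and `end_write` are hypothetical stand-ins for the ->iomap_begin() and ->iomap_end() handlers, the delalloc state is a plain set of block numbers, and the flag value is chosen for the example.

```python
IOMAP_F_NEW = 0x01   # flag value chosen for this sketch


def begin_write(delalloc, block):
    """Reserve `block` if needed; flag the mapping only when it is new."""
    if block in delalloc:
        return 0                 # preexisting delalloc block: a "found" block
    delalloc.add(block)
    return IOMAP_F_NEW


def end_write(delalloc, block, flags, written):
    """On a failed write, punch out only blocks that this write allocated."""
    if written == 0 and flags & IOMAP_F_NEW:
        delalloc.discard(block)
```

Replaying the write03 scenario: the first write reserves block 0 and succeeds, the second write finds the block, fails, and leaves it alone.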
     

17 Feb, 2017

2 commits

  • A debug mode write failure mechanism was introduced to XFS in commit
    801cc4e17a ("xfs: debug mode forced buffered write failure") to
    facilitate targeted testing of delalloc indirect reservation management
    from userspace. This code was subsequently rendered ineffective by the
    move to iomap based buffered writes in commit 68a9f5e700 ("xfs:
    implement iomap based buffered write path"). This likely went unnoticed
    because the associated userspace code had not made it into xfstests.

    Resurrect this mechanism to facilitate effective indlen reservation
    testing from xfstests. The move to iomap based buffered writes relocated
    the hook this mechanism needs to return write failure from XFS to
    generic code. The failure trigger must remain in XFS. Given that
    limitation, convert this from a write failure mechanism to one that
    simply drops writes without returning failure to userspace. Rename all
    "fail_writes" references to "drop_writes" to illustrate the point. This
    is more hacky than preferred, but still triggers the XFS error handling
    behavior required to drive the indlen tests. This is only available in
    DEBUG mode and for testing purposes only.

    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     
  • The buffered write failure handling code in
    xfs_file_iomap_end_delalloc() has a couple minor problems. First, if
    written == 0, start_fsb is not rounded down and it fails to kill off a
    delalloc block if the start offset is block unaligned. This results in a
    lingering delalloc block and broken delalloc block accounting detected
    at unmount time. Fix this by rounding down start_fsb in the unlikely
    event that written == 0.

    Second, it is possible for a failed overwrite of a delalloc extent to
    leave dirty pagecache around over a hole in the file. This is because it
    is possible to hit ->iomap_end() on write failure before the iomap code has
    attempted to allocate pagecache, and thus has no need to clean it up. If
    the targeted delalloc extent was successfully written by a previous
    write, however, then it does still have dirty pages when ->iomap_end()
    punches out the underlying blocks. This ultimately results in writeback
    over a hole. To fix this problem, unconditionally punch out the
    pagecache from XFS before the associated delalloc range.

    Signed-off-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
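
The first problem is a rounding issue and can be worked through numerically. The sketch below assumes a 4096-byte block size; `b_to_fsb`, `b_to_fsbt`, and `punch_range` are illustrative helper names for "round up to blocks", "truncate to blocks", and the range computation.

```python
BLOCK_SIZE = 4096   # assumed block size for the example


def b_to_fsb(nbytes):
    """Bytes to blocks, rounding up."""
    return (nbytes + BLOCK_SIZE - 1) // BLOCK_SIZE


def b_to_fsbt(nbytes):
    """Bytes to blocks, truncating."""
    return nbytes // BLOCK_SIZE


def punch_range(offset, length, written):
    """Return (start_fsb, end_fsb) of the delalloc range to punch out."""
    start_fsb = b_to_fsb(offset + written)
    if written == 0:
        # Nothing was written: include the block containing `offset`.
        start_fsb = b_to_fsbt(offset)
    end_fsb = b_to_fsb(offset + length)
    return start_fsb, end_fsb
```

For a failed 100-byte write at offset 100, rounding up alone would give the empty range (1, 1); rounding down yields (0, 1), which covers the lingering block.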
     

07 Feb, 2017

3 commits

  • Instead of preallocating all the required COW blocks in the high-level
    write code do it inside the iomap code, like we do for all other I/O.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Factor a helper to calculate the extent-size aligned block out of the
    iomap code, so that it can be reused by the upcoming reflink dio code.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
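
One plausible shape for such an alignment helper is sketched below. The name `aligned_fsb_count` and the exact arithmetic are assumptions for illustration; the commit message only states that a helper computes the extent-size aligned block.

```python
def aligned_fsb_count(offset_fsb, count_fsb, extsz):
    """Grow `count_fsb` so the range, with its start rounded down to an
    extsz boundary, spans a whole number of extsz units."""
    if extsz:
        count_fsb += offset_fsb % extsz      # cover the rounded-down start
        tail = count_fsb % extsz
        if tail:
            count_fsb += extsz - tail        # round the length up
    return count_fsb
```

For example, blocks 6..10 with a hint of 4 expand to the aligned range 4..11, which is 8 blocks.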
     
  • We currently fall back from direct to buffered writes if we detect a
    remaining shared extent in the iomap_begin callback. But by the time
    iomap_begin is called for the potentially unaligned end block we might
    have already written most of the data to disk, which we'd now write
    again using buffered I/O. To avoid this reject all writes to reflinked
    files before starting I/O so that we are guaranteed to only write the
    data once.

    The alternative would be to unshare the unaligned start and/or end block
    before doing the I/O. I think that's doable, and will actually be
    required to support reflinks on DAX file systems. But it will take a
    little more time and I'd rather get rid of the double write ASAP.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

03 Feb, 2017

1 commit

  • Christoph Hellwig pointed out that there's a potentially nasty race when
    performing simultaneous nearby directio cow writes:

    "Thread 1 writes a range from B to C

    " B --------- C
    p

    "a little later thread 2 writes from A to B

    " A --------- B
    p

    [editor's note: the 'p' denote cowextsize boundaries, which I added to
    make this more clear]

    "but the code preallocates beyond B into the range where thread
    "1 has just written, but ->end_io hasn't been called yet.
    "But once ->end_io is called thread 2 has already allocated
    "up to the extent size hint into the write range of thread 1,
    "so the end_io handler will splice the uninitialized blocks from
    "that preallocation back into the file right after B."

    We can avoid this race by ensuring that thread 1 cannot accidentally
    remap the blocks that thread 2 allocated (as part of speculative
    preallocation) as part of t2's write preparation in t1's end_io handler.
    The way we make this happen is by taking advantage of the unwritten
    extent flag as an intermediate step.

    Recall that when we begin the process of writing data to shared blocks,
    we create a delayed allocation extent in the CoW fork:

    D: --RRRRRRSSSRRRRRRRR---
    C: ------DDDDDDD---------

    When a thread prepares to CoW some dirty data out to disk, it will now
    convert the delalloc reservation into an /unwritten/ allocated extent in
    the cow fork. The da conversion code tries to opportunistically
    allocate as much of a (speculatively prealloc'd) extent as possible, so
    we may end up allocating a larger extent than we're actually writing
    out:

    D: --RRRRRRSSSRRRRRRRR---
    U: ------UUUUUUU---------

    Next, we convert only the part of the extent that we're actively
    planning to write to normal (i.e. not unwritten) status:

    D: --RRRRRRSSSRRRRRRRR---
    U: ------UURRUUU---------

    If the write succeeds, the end_cow function will now scan the relevant
    range of the CoW fork for real extents and remap only the real extents
    into the data fork:

    D: --RRRRRRRRSRRRRRRRR---
    U: ------UU--UUU---------

    This ensures that we never obliterate valid data fork extents with
    unwritten blocks from the CoW fork.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
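
The final remap step can be replayed on the diagrams above. This is a toy model: `end_cow` here is a hypothetical function that treats each fork as a string with one character per block, using the same letters as the diagrams (R real, S shared, U unwritten, - hole).

```python
def end_cow(data_fork, cow_fork, start, end):
    """Remap only real ('R') CoW fork blocks in [start, end) into the data fork."""
    data, cow = list(data_fork), list(cow_fork)
    for i in range(start, end):
        if cow[i] == "R":
            data[i] = "R"    # written data replaces the shared block
            cow[i] = "-"     # and leaves the CoW fork
        # Unwritten ('U') preallocation stays in the CoW fork untouched.
    return "".join(data), "".join(cow)


d, u = end_cow("--RRRRRRSSSRRRRRRRR---", "------UURRUUU---------", 6, 13)
print("D:", d)
print("U:", u)
```

Scanning the whole preallocated range still moves only the two blocks that were converted to real status, which is the property the commit relies on.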
     

31 Jan, 2017

1 commit


24 Jan, 2017

1 commit

  • Due to the way xfs_iomap_write_allocate tries to convert the whole
    found extents from delalloc to real space we can run into a race
    condition with multiple threads doing writes to the same extent.
    For the non-COW case that is harmless as the only thing that can happen
    is that we call xfs_bmapi_write on an extent that has already been
    converted to a real allocation. For COW writes where we move the extent
    from the COW to the data fork after I/O completion the race is, however,
    not quite as harmless. In the worst case we are now calling
    xfs_bmapi_write on a region that contains a hole in the COW fork, which
    will trip up an assert in debug builds or lead to file system corruption
    in non-debug builds. This seems to be reproducible with workloads of
    small O_DSYNC writes, although so far I've not managed to come up with
    an isolated reproducer.

    The fix for the issue is relatively simple: tell xfs_bmapi_write
    that we are only asked to convert delayed allocations and skip holes
    in that case.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
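
The effect of a "convert only" mode can be sketched on a string model of a fork. `toy_bmapi_write` and its `convert_only` parameter are hypothetical; the model assumes that, without the restriction, a write mapping call would allocate into any hole it encounters.

```python
def toy_bmapi_write(fork, start, end, convert_only):
    """Model a fork as a string: 'D' delalloc, 'R' real, '-' hole."""
    out = list(fork)
    for i in range(start, end):
        if out[i] == "D":
            out[i] = "R"                     # convert the delayed allocation
        elif out[i] == "-" and not convert_only:
            out[i] = "R"                     # unrestricted mode fills the hole
    return "".join(out)
```

If another thread has already moved part of the extent out of the fork, the restricted mode leaves the resulting hole alone instead of allocating into it.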
     

30 Nov, 2016

1 commit

  • Straight switch over to using iomap for direct I/O - we already have the
    non-COW dio path in write_begin for DAX and files with extent size hints,
    so nothing to add there. The COW path is ported over from the old
    get_blocks version and a bit of a mess, but I have some work in progress
    to make it look more like the buffered I/O COW path.

    This gets rid of xfs_get_blocks_direct and the last caller of
    xfs_get_blocks with the create flag set, so all that code can be removed.

    Last but not least I've removed a comment in xfs_filemap_fault that
    refers to xfs_get_blocks entirely instead of updating it - while the
    reference is correct, the whole DAX fault path looks different than
    the non-DAX one, so it seems rather pointless.

    Signed-off-by: Christoph Hellwig
    Tested-by: Jens Axboe
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     

28 Nov, 2016

2 commits

  • xfs_file_iomap_begin_delay() implements post-eof speculative
    preallocation by extending the block count of the requested delayed
    allocation. Now that xfs_bmapi_reserve_delalloc() has been updated to
    handle prealloc blocks separately and tag the inode, update
    xfs_file_iomap_begin_delay() to use the new parameter and rely on the
    former to tag the inode.

    Note that this patch does not change behavior.

    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Brian Foster
     
  • Speculative preallocation is currently processed entirely by the callers
    of xfs_bmapi_reserve_delalloc(). The caller determines how much
    preallocation to include, adjusts the extent length and passes down the
    resulting request.

    While this works fine for post-eof speculative preallocation, it is not
    as reliable for COW fork preallocation. COW fork preallocation is
    implemented via the cowextszhint, which aligns the start offset as well
    as the length of the extent. Further, it is difficult for the caller to
    accurately identify when preallocation occurs because the returned
    extent could have been merged with neighboring extents in the fork.

    To simplify this situation and facilitate further COW fork preallocation
    enhancements, update xfs_bmapi_reserve_delalloc() to take a separate
    preallocation parameter to incorporate into the allocation request. The
    preallocation blocks value is tacked onto the end of the request and
    adjusted to accommodate neighboring extents and extent size limits.
    Since xfs_bmapi_reserve_delalloc() now knows precisely how much
    preallocation was included in the allocation, it can also tag the inodes
    appropriately to support preallocation reclaim.

    Note that xfs_bmapi_reserve_delalloc() callers are not yet updated to
    use the preallocation mechanism. This patch should not change behavior
    outside of correctly tagging reflink inodes when start offset
    preallocation occurs (which the caller does not handle correctly).

    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Brian Foster
     

24 Nov, 2016

2 commits


20 Oct, 2016

2 commits

  • Instead of reserving space as the first thing in write_begin move it past
    reading the extent in the data fork. That way we only have to read from
    the data fork once and can reuse that information for trimming the extent
    to the shared/unshared boundary. Additionally this allows us to easily
    limit the actual write size to said boundary, and avoid a roundtrip on the
    ilock.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • There is no need to trim an extent into a shared or non-shared one, or
    report any flags for plain old reads.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     

06 Oct, 2016

2 commits

  • Create a per-inode extent size allocator hint for copy-on-write. This
    hint is separate from the existing extent size hint so that CoW can
    take advantage of the fragmentation-reducing properties of extent size
    hints without disabling delalloc for regular writes.

    The extent size hint that's fed to the allocator during a copy on
    write operation is the greater of the cowextsize and regular extsize
    hint.

    During reflink, if we're sharing the entire source file to the entire
    destination file and the destination file doesn't already have a
    cowextsize hint, propagate the source file's cowextsize hint to the
    destination file.

    Furthermore, zero the bulkstat buffer prior to setting the fields
    so that we don't copy kernel memory contents into userspace.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
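
The two hint rules stated above are small enough to write down directly. This is a sketch with hypothetical function names; hints are plain integers in blocks, and 0 means "no hint set".

```python
def cow_alloc_hint(cowextsize, extsize):
    """Hint fed to the allocator for CoW: the greater of the two hints."""
    return max(cowextsize, extsize)


def reflink_dest_hint(src_cowextsize, dst_cowextsize, whole_file):
    """Propagate the source hint only for a whole-file reflink into a
    destination that has no cowextsize hint of its own."""
    if whole_file and dst_cowextsize == 0:
        return src_cowextsize
    return dst_cowextsize
```

A partial reflink, or a destination that already carries a hint, keeps the destination's value.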
     
  • Report shared extents through the iomap interface so that FIEMAP flags
    shared blocks accurately. Have xfs_vm_bmap return zero for reflinked
    files because the bmap-based swap code requires static block mappings,
    which is incompatible with copy on write.

    NOTE: Existing userspace bmap users such as lilo will have the same
    problem with reflink files.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Darrick J. Wong

    Darrick J. Wong
     

05 Oct, 2016

3 commits

  • Modify xfs_bmap_add_extent_delay_real() so that we can convert delayed
    allocation extents in the CoW fork to real allocations, and wire this
    up all the way back to xfs_iomap_write_allocate(). In a subsequent
    patch, we'll modify the writepage handler to call this.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • Wire up iomap_begin to detect shared extents and create delayed allocation
    extents in the CoW fork:

    1) Check if we already have an extent in the COW fork for the area.
    If so nothing to do, we can move along.
    2) Look up the block number for the current extent; if there is none,
    it's not shared, so move along.
    3) Unshare the current extent as far as we are going to write into it.
    For this we avoid an additional COW fork lookup and use the
    information we set aside in step 1) above.
    4) Goto 1) unless we've covered the whole range.

    Last but not least, this updates the xfs_reflink_reserve_cow_range calling
    convention to pass a byte offset and length, as that is what both callers
    expect anyway. This patch has been refactored considerably as part of the
    iomap transition.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Christoph Hellwig

    Darrick J. Wong
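
The four steps above can be sketched as a loop. This is a simplified model that works block by block rather than extent by extent: `reserve_cow_range` here is a hypothetical function, `shared` maps mapped block numbers to a shared flag (holes are simply absent), and `cow` is the set of blocks that already have a CoW fork reservation.

```python
def reserve_cow_range(shared, cow, start, end):
    """Reserve CoW fork space for shared blocks in [start, end).

    Returns the list of blocks newly reserved by this call."""
    new = []
    blk = start
    while blk < end:                          # step 4: repeat until covered
        if blk in cow:
            pass                              # step 1: already in the CoW fork
        elif not shared.get(blk, False):
            pass                              # step 2: hole or unshared block
        else:
            cow.add(blk)                      # step 3: reserve for this block
            new.append(blk)
        blk += 1
    return new
```

Only blocks that are both mapped and shared, and not yet present in the CoW fork, receive a new reservation.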
     
  • Allow the creation of delayed allocation extents in the CoW fork. In
    a subsequent patch we'll wire up iomap_begin to actually do this via
    reflink helper functions.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     

19 Sep, 2016

5 commits

  • Another user of buffer_heads bites the dust.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ross Zwisler
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • We always just read the extent first, and will later lock exclusively
    after first dropping the lock in case we actually allocate blocks.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Ross Zwisler
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • Currently xfs_iomap_write_delay does up to lookups in the inode
    extent tree, which is rather costly especially with the new iomap
    based write path and small write sizes.

    But it turns out that the low-level xfs_bmap_search_extents gives us
    all the information we need in the regular delalloc buffered write
    path:

    - it will return us an extent covering the block we are looking up
    if it exists. In that case we can simply return that extent to
    the caller and are done
    - it will tell us if we are beyond the last current allocated
    block with an eof return parameter. In that case we can create a
    delalloc reservation and use the also returned information about
    the last extent in the file as the hint to size our delalloc
    reservation.
    - it can tell us that we are writing into a hole, but that there is
    an extent beyond this hole. In this case we can create a
    delalloc reservation that covers the requested size (possibly
    capped to the next existing allocation).

    All that can be done in one single routine instead of bouncing up
    and down a few layers. This reduced the CPU overhead of the block
    mapping routines and also simplified the code a lot.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
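
The three outcomes of the single lookup can be sketched as one routine. This is a simplified model: `map_for_write` is a hypothetical name, extents are a sorted list of `(start, length)` tuples, and the end-of-file sizing hint is reduced to a fixed `prealloc` argument rather than being derived from the last extent.

```python
def map_for_write(extents, block, count, prealloc):
    """Return (kind, start, length) for a write of `count` blocks at `block`."""
    for start, length in extents:
        if start <= block < start + length:
            # An extent covers the block: hand it back as is.
            return ("found", start, length)
        if start > block:
            # Hole with a later extent: cap the reservation at that extent.
            return ("delalloc", block, min(count, start - block))
    # Beyond the last allocated block: reserve with extra preallocation.
    return ("delalloc", block, count + prealloc)
```

All three cases are decided in one pass over the extent list, mirroring the single-lookup structure the commit describes.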
     
  • And drop the pointless mp argument to xfs_iomap_eof_align_last_fsb,
    while we're at it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig