Eric Lee / smarc-fsl-linux-kernel

27 Sep, 2017

1 commit

ee70daaba xfs: update i_size after unwritten conversion in dio completion ... Browse Code »

Since commit d531d91d6990 ("xfs: always use unwritten extents for
direct I/O writes"), we start allocating unwritten extents for all
direct writes to allow appending aio in XFS.

But for dio writes that could extend file size we update the in-core
inode size first, then convert the unwritten extents to real
allocations at dio completion time in xfs_dio_write_end_io(). Thus a
racing direct read could see the new i_size and find the unwritten
extents first and read zeros instead of actual data, if the direct
writer also takes a shared iolock.

Fix it by updating the in-core inode size after the unwritten extent
conversion. To do this, introduce a new boolean argument to
xfs_iomap_write_unwritten() to tell if we want to update in-core
i_size or not.

Suggested-by: Brian Foster
Reviewed-by: Brian Foster
Signed-off-by: Eryu Guan
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Eryu Guan
2017-09-27 01:55:19 +0800

12 Sep, 2017

1 commit

89fd915c4 Merge tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm ... Browse Code »

Pull libnvdimm from Dan Williams:
"A rework of media error handling in the BTT driver and other updates.
It has appeared in a few -next releases and collected some late-
breaking build-error and warning fixups as a result.

Summary:

- Media error handling support in the Block Translation Table (BTT)
driver is reworked to address sleeping-while-atomic locking and
memory-allocation-context conflicts.

- The dax_device lookup overhead for xfs and ext4 is moved out of the
iomap hot-path to a mount-time lookup.

- A new 'ecc_unit_size' sysfs attribute is added to advertise the
read-modify-write boundary property of a persistent memory range.

- Preparatory fix-ups for arm and powerpc pmem support are included
along with other miscellaneous fixes"

* tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits)
libnvdimm, btt: fix format string warnings
libnvdimm, btt: clean up warning and error messages
ext4: fix null pointer dereference on sbi
libnvdimm, nfit: move the check on nd_reserved2 to the endpoint
dax: fix FS_DAX=n BLOCK=y compilation
libnvdimm: fix integer overflow static analysis warning
libnvdimm, nd_blk: remove mmio_flush_range()
libnvdimm, btt: rework error clearing
libnvdimm: fix potential deadlock while clearing errors
libnvdimm, btt: cache sector_size in arena_info
libnvdimm, btt: ensure that flags were also unchanged during a map_read
libnvdimm, btt: refactor map entry operations with macros
libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path
libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute
ext4: perform dax_device lookup at mount
ext2: perform dax_device lookup at mount
xfs: perform dax_device lookup at mount
dax: introduce a fs_dax_get_by_bdev() helper
libnvdimm, btt: check memory allocation failure
libnvdimm, label: fix index block size calculation
...

Linus Torvalds
2017-09-12 04:10:57 +0800

02 Sep, 2017

2 commits

f91fb956f xfs: remove unused flags arg from xfs_file_iomap_begin_delay ... Browse Code »

Signed-off-by: Eric Sandeen
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Eric Sandeen
2017-09-02 04:08:26 +0800
8ad7c629b xfs: remove the ip argument to xfs_defer_finish ... Browse Code »

And instead require callers to explicitly join the inode using
xfs_defer_ijoin. Also consolidate the defer error handling in
a few places using a goto label.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2017-09-02 01:55:30 +0800

01 Sep, 2017

1 commit

486aff5e0 xfs: perform dax_device lookup at mount ... Browse Code »

The ->iomap_begin() operation is a hot path, so cache the
fs_dax_get_by_host() result at mount time to avoid the incurring the
hash lookup overhead on a per-i/o basis.

Reported-by: Christoph Hellwig
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Dan Williams

Dan Williams
2017-09-01 00:31:47 +0800

11 Jul, 2017

1 commit

642338ba3 Merge tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux ... Browse Code »

Pull XFS updates from Darrick Wong:
"Here are some changes for you for 4.13. For the most part it's fixes
for bugs and deadlock problems, and preparation for online fsck in
some future merge window.

- Avoid quotacheck deadlocks

- Fix transaction overflows when bunmapping fragmented files

- Refactor directory readahead

- Allow admin to configure if ASSERT is fatal

- Improve transaction usage detail logging during overflows

- Minor cleanups

- Don't leak log items when the log shuts down

- Remove double-underscore typedefs

- Various preparation for online scrubbing

- Introduce new error injection configuration sysfs knobs

- Refactor dq_get_next to use extent map directly

- Fix problems with iterating the page cache for unwritten data

- Implement SEEK_{HOLE,DATA} via iomap

- Refactor XFS to use iomap SEEK_HOLE and SEEK_DATA

- Don't use MAXPATHLEN to check on-disk symlink target lengths"

* tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (48 commits)
xfs: don't crash on unexpected holes in dir/attr btrees
xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLEN
xfs: fix contiguous dquot chunk iteration livelock
xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA
vfs: Add iomap_seek_hole and iomap_seek_data helpers
vfs: Add page_cache_seek_hole_data helper
xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk
xfs: rewrite xfs_dq_get_next_id using xfs_iext_lookup_extent
xfs: Check for m_errortag initialization in xfs_errortag_test
xfs: grab dquots without taking the ilock
xfs: fix semicolon.cocci warnings
xfs: Don't clear SGID when inheriting ACLs
xfs: free cowblocks and retry on buffered write ENOSPC
xfs: replace log_badcrc_factor knob with error injection tag
xfs: convert drop_writes to use the errortag mechanism
xfs: remove unneeded parameter from XFS_TEST_ERROR
xfs: expose errortag knobs via sysfs
xfs: make errortag a per-mountpoint structure
xfs: free uncommitted transactions during log recovery
xfs: don't allow bmap on rt files
...

Linus Torvalds
2017-07-11 01:51:53 +0800

28 Jun, 2017

2 commits

f8c47250b xfs: convert drop_writes to use the errortag mechanism ... Browse Code »

We now have enhanced error injection that can control the frequency
with which errors happen, so convert drop_writes to use this.

Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster
Reviewed-by: Carlos Maiolino

Darrick J. Wong
2017-06-28 09:23:20 +0800
9e24cfd04 xfs: remove unneeded parameter from XFS_TEST_ERROR ... Browse Code »

Since we moved the injected error frequency controls to the mountpoint,
we can get rid of the last argument to XFS_TEST_ERROR.

Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster
Reviewed-by: Carlos Maiolino

Darrick J. Wong
2017-06-28 09:23:20 +0800

20 Jun, 2017

1 commit

29a5d29ec xfs: nowait aio support ... Browse Code »

If IOCB_NOWAIT is set, bail if the i_rwsem is not lockable
immediately.

IF IOMAP_NOWAIT is set, return EAGAIN in xfs_file_iomap_begin
if it needs allocation either due to file extension, writing to a hole,
or COW or waiting for other DIOs to finish.

Return -EAGAIN if we don't have extent list in memory.

Signed-off-by: Goldwyn Rodrigues
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Jens Axboe

Goldwyn Rodrigues
2017-06-20 21:12:03 +0800

14 May, 2017

1 commit

f5705aa8c dax, xfs, ext4: compile out iomap-dax paths in the FS_DAX=n case ... Browse Code »

Tetsuo reports:

fs/built-in.o: In function `xfs_file_iomap_end':
xfs_iomap.c:(.text+0xe0ef9): undefined reference to `put_dax'
fs/built-in.o: In function `xfs_file_iomap_begin':
xfs_iomap.c:(.text+0xe1a7f): undefined reference to `dax_get_by_host'
make: *** [vmlinux] Error 1
$ grep DAX .config
CONFIG_DAX=m
# CONFIG_DEV_DAX is not set
# CONFIG_FS_DAX is not set

When FS_DAX=n we can/must throw away the dax code in filesystems.
Implement 'fs_' versions of dax_get_by_host() and put_dax() that are
nops in the FS_DAX=n case.

Cc:
Cc:
Cc: Jan Kara
Cc: "Theodore Ts'o"
Cc: "Darrick J. Wong"
Cc: Ross Zwisler
Tested-by: Tony Luck
Fixes: ef51042472f5 ("block, dax: move 'select DAX' from BLOCK to FS_DAX")
Reported-by: Tetsuo Handa
Signed-off-by: Dan Williams

Dan Williams
2017-05-14 08:52:16 +0800

07 May, 2017

1 commit

d484467c8 Merge tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux ... Browse Code »

Pull xfs updates from Darrick Wong:
"Here are the XFS changes for 4.12. The big new feature for this
release is the new space mapping ioctl that we've been discussing
since LSF2016, but other than that most of the patches are larger bug
fixes, memory corruption prevention, and other cleanups.

Summary:
- various code cleanups
- introduce GETFSMAP ioctl
- various refactoring
- avoid dio reads past eof
- fix memory corruption and other errors with fragmented directory blocks
- fix accidental userspace memory corruptions
- publish fs uuid in superblock
- make fstrim terminatable
- fix race between quotaoff and in-core inode creation
- avoid use-after-free when finishing up w/ buffer heads
- reserve enough space to handle bmap tree resizing during cow remap"

* tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (53 commits)
xfs: fix use-after-free in xfs_finish_page_writeback
xfs: reserve enough blocks to handle btree splits when remapping
xfs: wait on new inodes during quotaoff dquot release
xfs: update ag iterator to support wait on new inodes
xfs: support ability to wait on new inodes
xfs: publish UUID in struct super_block
xfs: Allow user to kill fstrim process
xfs: better log intent item refcount checking
xfs: fix up quotacheck buffer list error handling
xfs: remove xfs_trans_ail_delete_bulk
xfs: don't use bool values in trace buffers
xfs: fix getfsmap userspace memory corruption while setting OF_LAST
xfs: fix __user annotations for xfs_ioc_getfsmap
xfs: corruption needs to respect endianess too!
xfs: use NULL instead of 0 to initialize a pointer in xfs_ioc_getfsmap
xfs: use NULL instead of 0 to initialize a pointer in xfs_getfsmap
xfs: simplify validation of the unwritten extent bit
xfs: remove unused values from xfs_exntst_t
xfs: remove the unused XFS_MAXLINK_1 define
xfs: more do_div cleanups
...

Linus Torvalds
2017-05-07 02:46:16 +0800

26 Apr, 2017

1 commit

fa5d932c3 ext2, ext4, xfs: retrieve dax_device for iomap operations ... Browse Code »

In preparation for converting fs/dax.c to use dax_direct_access()
instead of bdev_direct_access(), add the plumbing to retrieve the
dax_device associated with a given block_device.

Signed-off-by: Dan Williams

Dan Williams
2017-04-26 04:20:46 +0800

07 Apr, 2017

1 commit

84358536d xfs: actually report xattr extents via iomap ... Browse Code »

Apparently FIEMAP for xattrs has been broken since we switched to
the iomap backend because of an incorrect check for xattr presence.
Also fix the broken locking.

Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2017-04-07 07:00:39 +0800

04 Apr, 2017

1 commit

63fbb4c18 xfs: remove the ISUNWRITTEN macro ... Browse Code »

Opencoding the trivial checks makes it much easier to read (and grep..).

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2017-04-04 06:18:16 +0800

09 Mar, 2017

1 commit

f65e6fad2 xfs: use iomap new flag for newly allocated delalloc blocks ... Browse Code »

Commit fa7f138 ("xfs: clear delalloc and cache on buffered write
failure") fixed one regression in the iomap error handling code and
exposed another. The fundamental problem is that if a buffered write
is a rewrite of preexisting delalloc blocks and the write fails, the
failure handling code can punch out preexisting blocks with valid
file data.

This was reproduced directly by sub-block writes in the LTP
kernel/syscalls/write/write03 test. A first 100 byte write allocates
a single block in a file. A subsequent 100 byte write fails and
punches out the block, including the data successfully written by
the previous write.

To address this problem, update the ->iomap_begin() handler to
distinguish newly allocated delalloc blocks from preexisting
delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
->iomap_end() handler to decide when a failed or short write should
punch out delalloc blocks.

This introduces the subtle requirement that ->iomap_begin() should
never combine newly allocated delalloc blocks with existing blocks
in the resulting iomap descriptor. This can occur when a new
delalloc reservation merges with a neighboring extent that is part
of the current write, for example. Therefore, drop the
post-allocation extent lookup from xfs_bmapi_reserve_delalloc() and
just return the record inserted into the fork. This ensures only new
blocks are returned and thus that preexisting delalloc blocks are
always handled as "found" blocks and not punched out on a failed
rewrite.

Reported-by: Xiong Zhou
Signed-off-by: Brian Foster
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Brian Foster
2017-03-09 01:58:08 +0800

17 Feb, 2017

2 commits

9dbddd7b0 xfs: resurrect debug mode drop buffered writes mechanism ... Browse Code »

A debug mode write failure mechanism was introduced to XFS in commit
801cc4e17a ("xfs: debug mode forced buffered write failure") to
facilitate targeted testing of delalloc indirect reservation management
from userspace. This code was subsequently rendered ineffective by the
move to iomap based buffered writes in commit 68a9f5e700 ("xfs:
implement iomap based buffered write path"). This likely went unnoticed
because the associated userspace code had not made it into xfstests.

Resurrect this mechanism to facilitate effective indlen reservation
testing from xfstests. The move to iomap based buffered writes relocated
the hook this mechanism needs to return write failure from XFS to
generic code. The failure trigger must remain in XFS. Given that
limitation, convert this from a write failure mechanism to one that
simply drops writes without returning failure to userspace. Rename all
"fail_writes" references to "drop_writes" to illustrate the point. This
is more hacky than preferred, but still triggers the XFS error handling
behavior required to drive the indlen tests. This is only available in
DEBUG mode and for testing purposes only.

Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Brian Foster
2017-02-17 09:19:15 +0800
fa7f138ac xfs: clear delalloc and cache on buffered write failure ... Browse Code »

The buffered write failure handling code in
xfs_file_iomap_end_delalloc() has a couple minor problems. First, if
written == 0, start_fsb is not rounded down and it fails to kill off a
delalloc block if the start offset is block unaligned. This results in a
lingering delalloc block and broken delalloc block accounting detected
at unmount time. Fix this by rounding down start_fsb in the unlikely
event that written == 0.

Second, it is possible for a failed overwrite of a delalloc extent to
leave dirty pagecache around over a hole in the file. This is because is
possible to hit ->iomap_end() on write failure before the iomap code has
attempted to allocate pagecache, and thus has no need to clean it up. If
the targeted delalloc extent was successfully written by a previous
write, however, then it does still have dirty pages when ->iomap_end()
punches out the underlying blocks. This ultimately results in writeback
over a hole. To fix this problem, unconditionally punch out the
pagecache from XFS before the associated delalloc range.

Signed-off-by: Brian Foster
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Brian Foster
2017-02-17 09:19:12 +0800

07 Feb, 2017

3 commits

3c68d44a2 xfs: allocate direct I/O COW blocks in iomap_begin ... Browse Code »

Instead of preallocating all the required COW blocks in the high-level
write code do it inside the iomap code, like we do for all other I/O.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2017-02-07 09:47:47 +0800
f13eb2055 xfs: introduce xfs_aligned_fsb_count ... Browse Code »

Factor a helper to calculate the extent-size aligned block out of the
iomap code, so that it can be reused by the upcoming reflink dio code.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2017-02-07 09:47:46 +0800
54a4ef8af xfs: reject all unaligned direct writes to reflinked files ... Browse Code »

We currently fall back from direct to buffered writes if we detect a
remaining shared extent in the iomap_begin callback. But by the time
iomap_begin is called for the potentially unaligned end block we might
have already written most of the data to disk, which we'd now write
again using buffered I/O. To avoid this reject all writes to reflinked
files before starting I/O so that we are guaranteed to only write the
data once.

The alternative would be to unshare the unaligned start and/or end block
before doing the I/O. I think that's doable, and will actually be
required to support reflinks on DAX file system. But it will take a
little more time and I'd rather get rid of the double write ASAP.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2017-02-07 09:47:46 +0800

03 Feb, 2017

1 commit

5eda43000 xfs: mark speculative prealloc CoW fork extents unwritten ... Browse Code »

Christoph Hellwig pointed out that there's a potentially nasty race when
performing simultaneous nearby directio cow writes:

"Thread 1 writes a range from B to c

" B --------- C
p

"a little later thread 2 writes from A to B

" A --------- B
p

[editor's note: the 'p' denote cowextsize boundaries, which I added to
make this more clear]

"but the code preallocates beyond B into the range where thread
"1 has just written, but ->end_io hasn't been called yet.
"But once ->end_io is called thread 2 has already allocated
"up to the extent size hint into the write range of thread 1,
"so the end_io handler will splice the unintialized blocks from
"that preallocation back into the file right after B."

We can avoid this race by ensuring that thread 1 cannot accidentally
remap the blocks that thread 2 allocated (as part of speculative
preallocation) as part of t2's write preparation in t1's end_io handler.
The way we make this happen is by taking advantage of the unwritten
extent flag as an intermediate step.

Recall that when we begin the process of writing data to shared blocks,
we create a delayed allocation extent in the CoW fork:

D: --RRRRRRSSSRRRRRRRR---
C: ------DDDDDDD---------

When a thread prepares to CoW some dirty data out to disk, it will now
convert the delalloc reservation into an /unwritten/ allocated extent in
the cow fork. The da conversion code tries to opportunistically
allocate as much of a (speculatively prealloc'd) extent as possible, so
we may end up allocating a larger extent than we're actually writing
out:

D: --RRRRRRSSSRRRRRRRR---
U: ------UUUUUUU---------

Next, we convert only the part of the extent that we're actively
planning to write to normal (i.e. not unwritten) status:

D: --RRRRRRSSSRRRRRRRR---
U: ------UURRUUU---------

If the write succeeds, the end_cow function will now scan the relevant
range of the CoW fork for real extents and remap only the real extents
into the data fork:

D: --RRRRRRRRSRRRRRRRR---
U: ------UU--UUU---------

This ensures that we never obliterate valid data fork extents with
unwritten blocks from the CoW fork.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2017-02-03 07:14:02 +0800

31 Jan, 2017

1 commit

8ff6daa17 iomap: constify struct iomap_ops ... Browse Code »

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2017-01-31 08:32:25 +0800

24 Jan, 2017

1 commit

d2b3964a0 xfs: fix COW writeback race ... Browse Code »

Due to the way how xfs_iomap_write_allocate tries to convert the whole
found extents from delalloc to real space we can run into a race
condition with multiple threads doing writes to this same extent.
For the non-COW case that is harmless as the only thing that can happen
is that we call xfs_bmapi_write on an extent that has already been
converted to a real allocation. For COW writes where we move the extent
from the COW to the data fork after I/O completion the race is, however,
not quite as harmless. In the worst case we are now calling
xfs_bmapi_write on a region that contains hole in the COW work, which
will trip up an assert in debug builds or lead to file system corruption
in non-debug builds. This seems to be reproducible with workloads of
small O_DSYNC write, although so far I've not managed to come up with
a with an isolated reproducer.

The fix for the issue is relatively simple: tell xfs_bmapi_write
that we are only asked to convert delayed allocations and skip holes
in that case.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2017-01-24 02:55:07 +0800

30 Nov, 2016

1 commit

acdda3aae xfs: use iomap_dio_rw ... Browse Code »

Straight switch over to using iomap for direct I/O - we already have the
non-COW dio path in write_begin for DAX and files with extent size hints,
so nothing to add there. The COW path is ported over from the old
get_blocks version and a bit of a mess, but I have some work in progress
to make it look more like the buffered I/O COW path.

This gets rid of xfs_get_blocks_direct and the last caller of
xfs_get_blocks with the create flag set, so all that code can be removed.

Last but not least I've removed a comment in xfs_filemap_fault that
refers to xfs_get_blocks entirely instead of updating it - while the
reference is correct, the whole DAX fault path looks different than
the non-DAX one, so it seems rather pointless.

Signed-off-by: Christoph Hellwig
Tested-by: Jens Axboe
Reviewed-by: Darrick J. Wong
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-11-30 11:37:15 +0800

28 Nov, 2016

2 commits

f782088c9 xfs: pass post-eof speculative prealloc blocks to bmapi ... Browse Code »

xfs_file_iomap_begin_delay() implements post-eof speculative
preallocation by extending the block count of the requested delayed
allocation. Now that xfs_bmapi_reserve_delalloc() has been updated to
handle prealloc blocks separately and tag the inode, update
xfs_file_iomap_begin_delay() to use the new parameter and rely on the
former to tag the inode.

Note that this patch does not change behavior.

Signed-off-by: Brian Foster
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Brian Foster
2016-11-28 11:57:42 +0800
974ae922e xfs: track preallocation separately in xfs_bmapi_reserve_delalloc() ... Browse Code »

Speculative preallocation is currently processed entirely by the callers
of xfs_bmapi_reserve_delalloc(). The caller determines how much
preallocation to include, adjusts the extent length and passes down the
resulting request.

While this works fine for post-eof speculative preallocation, it is not
as reliable for COW fork preallocation. COW fork preallocation is
implemented via the cowextszhint, which aligns the start offset as well
as the length of the extent. Further, it is difficult for the caller to
accurately identify when preallocation occurs because the returned
extent could have been merged with neighboring extents in the fork.

To simplify this situation and facilitate further COW fork preallocation
enhancements, update xfs_bmapi_reserve_delalloc() to take a separate
preallocation parameter to incorporate into the allocation request. The
preallocation blocks value is tacked onto the end of the request and
adjusted to accommodate neighboring extents and extent size limits.
Since xfs_bmapi_reserve_delalloc() now knows precisely how much
preallocation was included in the allocation, it can also tag the inodes
appropriately to support preallocation reclaim.

Note that xfs_bmapi_reserve_delalloc() callers are not yet updated to
use the preallocation mechanism. This patch should not change behavior
outside of correctly tagging reflink inodes when start offset
preallocation occurs (which the caller does not handle correctly).

Signed-off-by: Brian Foster
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Brian Foster
2016-11-28 11:57:42 +0800

24 Nov, 2016

2 commits

656152e55 xfs: use new extent lookup helpers xfs_file_iomap_begin_delay ... Browse Code »

And only lookup the previous extent inside xfs_iomap_prealloc_size
if we actually need it.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-11-24 08:39:44 +0800
65c5f4197 xfs: remove prev argument to xfs_bmapi_reserve_delalloc ... Browse Code »

We can easily lookup the previous extent for the cases where we need it,
which saves the callers from looking it up for us later in the series.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-11-24 08:39:44 +0800

20 Oct, 2016

2 commits

3ba020bef xfs: optimize writes to reflink files ... Browse Code »

Instead of reserving space as the first thing in write_begin move it past
reading the extent in the data fork. That way we only have to read from
the data fork once and can reuse that information for trimming the extent
to the shared/unshared boundary. Additionally this allows to easily
limit the actual write size to said boundary, and avoid a roundtrip on the
ilock.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-10-20 12:53:50 +0800
5f9268ca5 xfs: don't bother looking at the refcount tree for reads ... Browse Code »

There is no need to trim an extent into a shared or non-shared one, or
report any flags for plain old reads.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-10-20 12:53:32 +0800

06 Oct, 2016

2 commits

f7ca35227 xfs: create a separate cow extent size hint for the allocator ... Browse Code »

Create a per-inode extent size allocator hint for copy-on-write. This
hint is separate from the existing extent size hint so that CoW can
take advantage of the fragmentation-reducing properties of extent size
hints without disabling delalloc for regular writes.

The extent size hint that's fed to the allocator during a copy on
write operation is the greater of the cowextsize and regular extsize
hint.

During reflink, if we're sharing the entire source file to the entire
destination file and the destination file doesn't already have a
cowextsize hint, propagate the source file's cowextsize hint to the
destination file.

Furthermore, zero the bulkstat buffer prior to setting the fields
so that we don't copy kernel memory contents into userspace.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2016-10-06 07:26:26 +0800
db1327b16 xfs: report shared extent mappings to userspace correctly ... Browse Code »

Report shared extents through the iomap interface so that FIEMAP flags
shared blocks accurately. Have xfs_vm_bmap return zero for reflinked
files because the bmap-based swap code requires static block mappings,
which is incompatible with copy on write.

NOTE: Existing userspace bmap users such as lilo will have the same
problem with reflink files.

Signed-off-by: Christoph Hellwig
Signed-off-by: Darrick J. Wong

Darrick J. Wong
2016-10-06 07:26:04 +0800

05 Oct, 2016

3 commits

60b4984fc xfs: support allocating delayed extents in CoW fork ... Browse Code »

Modify xfs_bmap_add_extent_delay_real() so that we can convert delayed
allocation extents in the CoW fork to real allocations, and wire this
up all the way back to xfs_iomap_write_allocate(). In a subsequent
patch, we'll modify the writepage handler to call this.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2016-10-05 09:06:41 +0800
2a06705cd xfs: create delalloc extents in CoW fork ... Browse Code »

Wire up iomap_begin to detect shared extents and create delayed allocation
extents in the CoW fork:

1) Check if we already have an extent in the COW fork for the area.
If so nothing to do, we can move along.
2) Look up block number for the current extent, and if there is none
it's not shared move along.
3) Unshare the current extent as far as we are going to write into it.
For this we avoid an additional COW fork lookup and use the
information we set aside in step 1) above.
4) Goto 1) unless we've covered the whole range.

Last but not least, this updates the xfs_reflink_reserve_cow_range calling
convention to pass a byte offset and length, as that is what both callers
expect anyway. This patch has been refactored considerably as part of the
iomap transition.

Signed-off-by: Darrick J. Wong
Signed-off-by: Christoph Hellwig

Darrick J. Wong
2016-10-05 09:06:40 +0800
be51f8119 xfs: support bmapping delalloc extents in the CoW fork ... Browse Code »

Allow the creation of delayed allocation extents in the CoW fork. In
a subsequent patch we'll wire up iomap_begin to actually do this via
reflink helper functions.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2016-10-05 09:06:40 +0800

19 Sep, 2016

5 commits

6c31f495d xfs: use iomap to implement DAX ... Browse Code »

Another users of buffer_heads bytes the dust.

Signed-off-by: Christoph Hellwig
Reviewed-by: Ross Zwisler
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-09-19 09:28:38 +0800
66642c5c1 xfs: take the ilock shared if possible in xfs_file_iomap_begin ... Browse Code »

We always just read the extent first, and will later lock exlusively
after first dropping the lock in case we actually allocate blocks.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-09-19 09:26:39 +0800
ecd50729f iomap: add IOMAP_F_NEW flag ... Browse Code »

Signed-off-by: Christoph Hellwig
Reviewed-by: Ross Zwisler
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-09-19 09:24:37 +0800
51446f5ba xfs: rewrite and optimize the delalloc write path ... Browse Code »

Currently xfs_iomap_write_delay does up to lookups in the inode
extent tree, which is rather costly especially with the new iomap
based write path and small write sizes.

But it turns out that the low-level xfs_bmap_search_extents gives us
all the information we need in the regular delalloc buffered write
path:

- it will return us an extent covering the block we are looking up
if it exists. In that case we can simply return that extent to
the caller and are done
- it will tell us if we are beyoned the last current allocated
block with an eof return parameter. In that case we can create a
delalloc reservation and use the also returned information about
the last extent in the file as the hint to size our delalloc
reservation.
- it can tell us that we are writing into a hole, but that there is
an extent beyoned this hole. In this case we can create a
delalloc reservation that covers the requested size (possible
capped to the next existing allocation).

All that can be done in one single routine instead of bouncing up
and down a few layers. This reduced the CPU overhead of the block
mapping routines and also simplified the code a lot.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-09-19 09:10:21 +0800
f8e3a8257 xfs: factor our a helper to calculate the EOF alignment ... Browse Code »

And drop the pointless mp argument to xfs_iomap_eof_align_last_fsb,
while we're at it.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-09-19 09:09:28 +0800