Eric Lee / smarc-fsl-linux-kernel

04 Feb, 2017

19 commits

b5b4d4a91 xfs: fix bmv_count confusion w/ shared extents

commit c364b6d0b6cda1cd5d9ab689489adda3e82529aa upstream.

In a bmapx call, bmv_count is the total size of the array, including the
zeroth element that userspace uses to supply the search key. The output
array starts at offset 1 so that we can set up the user for the next
invocation. Since we now can split an extent into multiple bmap records
due to shared/unshared status, we have to be careful that we don't
overflow the output array.

In the original patch f86f403794b ("xfs: teach get_bmapx about shared
extents and the CoW fork") I used cur_ext (the output index) to check
for overflows, albeit with an off-by-one error. Since nexleft no longer
describes the number of unfilled slots in the output, we can rip all
that out and use cur_ext for the overflow check directly.

Failure to do this causes heap corruption in bmapx callers such as
xfs_io and xfs_scrub. xfs/328 can reproduce this problem.
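
The overflow check described above can be sketched in plain C. This is a simplified userspace illustration, not the kernel code: `struct rec`, `fill_records`, and the `pieces` input are hypothetical names. Only two ideas come from the message: slot 0 of the array holds the search key, and the output index `cur_ext` is what gets compared against `bmv_count`.

```c
#include <stddef.h>

/* Hypothetical stand-in for one bmap record. */
struct rec {
    long offset;
    long length;
};

/*
 * Copy up to npieces records into out[1..bmv_count-1] and return the number
 * of records written. out[0] is reserved for the caller's search key, so at
 * most bmv_count - 1 records fit.
 */
size_t fill_records(struct rec *out, size_t bmv_count,
                    const struct rec *pieces, size_t npieces)
{
    size_t cur_ext = 0;    /* number of output records written so far */

    for (size_t i = 0; i < npieces; i++) {
        /* Stop before writing past the end of the caller's array. */
        if (cur_ext + 1 >= bmv_count)
            break;
        out[cur_ext + 1] = pieces[i];
        cur_ext++;
    }
    return cur_ext;
}
```

Because the bound is checked against the output index itself, it does not matter how many records a single extent was split into before reaching the loop.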

Reviewed-by: Eric Sandeen
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Darrick J. Wong
2017-02-04 16:47:13 +0800
5d44dd54b xfs: clear _XBF_PAGES from buffers when readahead page

commit 2aa6ba7b5ad3189cc27f14540aa2f57f0ed8df4b upstream.

If we try to allocate memory pages to back an xfs_buf that we're trying
to read, it's possible that we'll be so short on memory that the page
allocation fails. For a blocking read we'll just wait, but for
readahead we simply dump all the pages we've collected so far.

Unfortunately, after dumping the pages we neglect to clear the
_XBF_PAGES state, which means that the subsequent call to xfs_buf_free
thinks that b_pages still points to pages we own. It then double-frees
the b_pages pages.

This results in screaming about negative page refcounts from the memory
manager, which xfs oughtn't be triggering. To reproduce this case,
mount a filesystem where the size of the inodes far outweighs the
available memory (a ~500M inode filesystem on a VM with 300MB memory
did the trick here) and run bulkstat in parallel with other memory
eating processes to put a huge load on the system. The "check summary"
phase of xfs_scrub also works for this purpose.

Signed-off-by: Darrick J. Wong
Reviewed-by: Eric Sandeen
Signed-off-by: Greg Kroah-Hartman

Darrick J. Wong
2017-02-04 16:47:12 +0800
29f96b7e9 xfs: extsize hints are not unlikely in xfs_bmap_btalloc

commit 493611ebd62673f39e2f52c2561182c558a21cb6 upstream.

With COW files they are the hotpath, just like for files with the
extent size hint attribute. We really shouldn't micro-manage anything
but failure cases with unlikely.

Additionally Arnd Bergmann recently reported that one of these two
unlikely annotations causes link failures together with an upcoming
kernel instrumentation patch, so let's get rid of it ASAP.

Signed-off-by: Christoph Hellwig
Reported-by: Arnd Bergmann
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Christoph Hellwig
2017-02-04 16:47:12 +0800
aab858dab xfs: remove racy hasattr check from attr ops

commit 5a93790d4e2df73e30c965ec6e49be82fc3ccfce upstream.

xfs_attr_[get|remove]() have unlocked attribute fork checks to optimize
away a lock cycle in cases where the fork does not exist or is otherwise
empty. This check is not safe, however, because an attribute fork short
form to extent format conversion includes a transient state that causes
the xfs_inode_hasattr() check to fail. Specifically,
xfs_attr_shortform_to_leaf() creates an empty extent format attribute
fork and then adds the existing shortform attributes to it.

This means that lookup of an existing xattr can spuriously return
-ENOATTR when racing against a setxattr that causes the associated
format conversion. This was originally reproduced by an untar on a
particularly configured glusterfs volume, but can also be reproduced on
demand with properly crafted xattr requests.

The format conversion occurs under the exclusive ilock. xfs_attr_get()
and xfs_attr_remove() already have the proper locking and checks further
down in the functions to handle this situation correctly. Drop the
unlocked checks to avoid the spurious failure and rely on the existing
logic.

Signed-off-by: Brian Foster
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Brian Foster
2017-02-04 16:47:12 +0800
29094164e xfs: verify dirblocklog correctly

commit 83d230eb5c638949350f4761acdfc0af5cb1bc00 upstream.

sb_dirblklog is added to sb_blocklog to compute the directory block size
in bytes. Therefore, we must compare the sum of both those values
against XFS_MAX_BLOCKSIZE_LOG, not just dirblklog.
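
A minimal sketch of the corrected comparison, assuming a limit of 16 (a 64 KiB block) for illustration. `MAX_BLOCKSIZE_LOG` and `dirblklog_valid` are hypothetical names, not the kernel's verifier.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit for this sketch: log2 of the largest block size (64 KiB). */
#define MAX_BLOCKSIZE_LOG 16

/*
 * The directory block size in bytes is 1 << (sb_blocklog + sb_dirblklog),
 * so the sum of both logs is what must stay within the limit.
 */
bool dirblklog_valid(uint8_t sb_blocklog, uint8_t sb_dirblklog)
{
    return (unsigned int)sb_blocklog + sb_dirblklog <= MAX_BLOCKSIZE_LOG;
}
```

With 4 KiB blocks (`sb_blocklog` of 12), a `sb_dirblklog` of 5 would pass a check on `sb_dirblklog` alone but is rejected here because 12 + 5 exceeds the limit.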

Signed-off-by: Darrick J. Wong
Reviewed-by: Eric Sandeen
Reviewed-by: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Darrick J. Wong
2017-02-04 16:47:12 +0800
214d55efa xfs: fix COW writeback race

commit d2b3964a0780d2d2994eba57f950d6c9fe489ed8 upstream.

Due to the way xfs_iomap_write_allocate tries to convert the whole
found extents from delalloc to real space we can run into a race
condition with multiple threads doing writes to this same extent.
For the non-COW case that is harmless as the only thing that can happen
is that we call xfs_bmapi_write on an extent that has already been
converted to a real allocation. For COW writes where we move the extent
from the COW to the data fork after I/O completion the race is, however,
not quite as harmless. In the worst case we are now calling
xfs_bmapi_write on a region that contains a hole in the COW fork, which
will trip up an assert in debug builds or lead to file system corruption
in non-debug builds. This seems to be reproducible with workloads of
small O_DSYNC writes, although so far I've not managed to come up with
an isolated reproducer.

The fix for the issue is relatively simple: tell xfs_bmapi_write
that we are only asked to convert delayed allocations and skip holes
in that case.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Christoph Hellwig
2017-02-04 16:47:12 +0800
29f319275 xfs: fix xfs_mode_to_ftype() prototype

commit fd29f7af75b7adf250beccffa63746c6a88e2b74 upstream.

A harmless warning just got introduced:

fs/xfs/libxfs/xfs_dir2.h:40:8: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]

Removing the 'const' modifier avoids the warning and has no
other effect.

Fixes: 1fc4d33fed12 ("xfs: replace xfs_mode_to_ftype table with switch statement")
Signed-off-by: Arnd Bergmann
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Arnd Bergmann
2017-02-04 16:47:12 +0800
d062d90c3 xfs: don't wrap ID in xfs_dq_get_next_id

commit 657bdfb7f5e68ca5e2ed009ab473c429b0d6af85 upstream.

The GETNEXTQUOTA ioctl takes whatever ID is sent in,
and looks for the next active quota for a user
equal to or higher than that ID.

But if we are at the maximum ID and then ask for the "next"
one, we may wrap back to zero. In this case, userspace
may loop forever, because it will start querying again
at zero.

We'll fix this in userspace as well, but for the kernel,
return -ENOENT if we ask for the next quota ID
past UINT_MAX so the caller knows to stop.
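
The wrap check can be illustrated with a small standalone helper. This is a sketch that assumes a 32-bit ID; `next_quota_id` is a hypothetical name and not the kernel function.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Compute the ID to query after "id". Returns 0 and stores the result in
 * *next, or -ENOENT when id is already the maximum value and incrementing
 * it would wrap back to zero. *next is left untouched on error.
 */
int next_quota_id(uint32_t id, uint32_t *next)
{
    if (id == UINT32_MAX)
        return -ENOENT;
    *next = id + 1;
    return 0;
}
```

A caller that loops over IDs can then treat `-ENOENT` as the end of the iteration instead of starting over at zero.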

Signed-off-by: Eric Sandeen
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Eric Sandeen
2017-02-04 16:47:12 +0800
d3201a14b xfs: sanity check inode di_mode

commit a324cbf10a3c67aaa10c9f47f7b5801562925bc2 upstream.

Check for invalid file type in xfs_dinode_verify()
and fail to load the inode structure from disk.

Reviewed-by: Darrick J. Wong
Signed-off-by: Amir Goldstein
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Amir Goldstein
2017-02-04 16:47:12 +0800
43ce59217 xfs: sanity check inode mode when creating new dentry

commit fab8eef86c814c3dd46bc5d760b6e4a53d5fc5a6 upstream.

The helper xfs_dentry_to_name() is used by 2 different
classes of callers: callers that pass zero mode and don't care
about the returned name.type field, and callers that pass
non-zero mode and do care about the name.type field.

Change xfs_dentry_to_name() to not take the mode argument and
change the call sites of the first class to not pass the mode
argument.

Create a new helper xfs_dentry_mode_to_name() which does pass
the mode argument and returns -EFSCORRUPTED if mode is invalid.
Callers that translate non-zero mode to on-disk file type now
check the return value and will report the error to the user instead
of staging an invalid file type to be written to the directory entry.

Signed-off-by: Amir Goldstein
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Amir Goldstein
2017-02-04 16:47:12 +0800
b5f68e24c xfs: replace xfs_mode_to_ftype table with switch statement

commit 1fc4d33fed124fb182e8e6c214e973a29389ae83.

The size of the xfs_mode_to_ftype[] conversion table
was too small to handle an invalid value of mode=S_IFMT.

Instead of fixing the table size, replace the conversion table
with a conversion helper that uses a switch statement.
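
As a rough illustration of why a switch is more robust than an indexed table, here is a standalone sketch. The `enum ftype` codes and the `mode_to_ftype` name are hypothetical and are not the on-disk values. The `MODE_*` constants spell out the usual Linux `S_IF*` bit patterns so the example needs no system headers.

```c
/* Linux values of the S_IF* file type bits, spelled out for this sketch. */
#define MODE_IFMT   0170000u
#define MODE_IFSOCK 0140000u
#define MODE_IFLNK  0120000u
#define MODE_IFREG  0100000u
#define MODE_IFBLK  0060000u
#define MODE_IFDIR  0040000u
#define MODE_IFCHR  0020000u
#define MODE_IFIFO  0010000u

/* Hypothetical file type codes; 0 means "unknown or invalid". */
enum ftype {
    FT_UNKNOWN = 0,
    FT_REG_FILE,
    FT_DIR,
    FT_CHRDEV,
    FT_BLKDEV,
    FT_FIFO,
    FT_SOCK,
    FT_SYMLINK
};

/* Map the file type bits of a mode to a type code. */
enum ftype mode_to_ftype(unsigned int mode)
{
    switch (mode & MODE_IFMT) {
    case MODE_IFREG:  return FT_REG_FILE;
    case MODE_IFDIR:  return FT_DIR;
    case MODE_IFCHR:  return FT_CHRDEV;
    case MODE_IFBLK:  return FT_BLKDEV;
    case MODE_IFIFO:  return FT_FIFO;
    case MODE_IFSOCK: return FT_SOCK;
    case MODE_IFLNK:  return FT_SYMLINK;
    default:          return FT_UNKNOWN;  /* includes mode == MODE_IFMT */
    }
}
```

The `default` arm covers every bit pattern that is not a known file type, so there is no array bound to get wrong.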

Suggested-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Signed-off-by: Amir Goldstein
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Amir Goldstein
2017-02-04 16:47:12 +0800
4fac84ba1 xfs: add missing include dependencies to xfs_dir2.h

commit b597dd5373a1ccc08218665dc8417433b1c09550 upstream.

xfs_dir2.h dereferences some data types in inline functions
and fails to include those type definitions, e.g.:
xfs_dir2_data_aoff_t, struct xfs_da_geometry.

Signed-off-by: Amir Goldstein
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Amir Goldstein
2017-02-04 16:47:12 +0800
e5325fcf7 xfs: sanity check directory inode di_size

commit 3c6f46eacd876bd723a9bad3c6882714c052fd8e upstream.

This change fixes an assertion hit when fuzzing on-disk
i_mode values.

The easy case to fix is when changing an empty file
i_mode to S_IFDIR. In this case, xfs_dinode_verify()
detects an illegal zero size for directory and fails
to load the inode structure from disk.

For the case of non empty file whose i_mode is changed
to S_IFDIR, the ASSERT() statement in xfs_dir2_isblock()
is replaced with return -EFSCORRUPTED, to avoid interacting
with corrupted junk also when XFS_DEBUG is disabled.

Suggested-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Signed-off-by: Amir Goldstein
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Amir Goldstein
2017-02-04 16:47:11 +0800
624e54b5a xfs: make the ASSERT() condition likely

commit bf46ecc3d8cca05f2907cf482755c42c2b11a79d upstream.

The ASSERT() condition is the normal case, not the exception,
so testing the condition should be likely(), not unlikely().
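
A small userspace sketch of the idea, assuming a GCC or Clang compiler for `__builtin_expect`. The `assert_failed` helper and the failure counter are hypothetical and exist only so the example can keep running after a failed check; the kernel's own macro differs.

```c
#include <stdio.h>

/* Branch prediction hint; __builtin_expect is a GCC/Clang builtin. */
#define likely(x) __builtin_expect(!!(x), 1)

/* Counts failed assertions so the sketch can continue after a failure. */
static int assert_failures;

static void assert_failed(const char *expr, const char *file, int line)
{
    assert_failures++;
    fprintf(stderr, "Assertion failed: %s, file: %s, line: %d\n",
            expr, file, line);
}

/* The asserted expression normally holds, so it is hinted as likely. */
#define ASSERT(expr) \
    (likely(expr) ? (void)0 : assert_failed(#expr, __FILE__, __LINE__))
```

The hint only affects code layout and branch prediction; the result of the check is the same either way.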

Reviewed-by: Christoph Hellwig
Signed-off-by: Amir Goldstein
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Amir Goldstein
2017-02-04 16:47:11 +0800
4f4d5082e xfs: don't print warnings when xfs_log_force fails

commit 84a4620cfe97c9d57e39b2369bfb77faff55063d upstream.

There are only two reasons for xfs_log_force / xfs_log_force_lsn to fail:
one is an I/O error, for which xlog_bdstrat already logs a warning, and
the second is an already shutdown log due to a previous I/O error. In
the latter case we'll already have a previous indication for the actual
error, but the large stream of misleading warnings from xfs_log_force
will probably scroll it out of the message buffer.

Simply removing the warnings thus makes the XFS log reporting significantly
better.

Signed-off-by: Christoph Hellwig
Reviewed-by: Carlos Maiolino
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Christoph Hellwig
2017-02-04 16:47:11 +0800
e9b776519 xfs: don't rely on ->total in xfs_alloc_space_available

commit 12ef830198b0d71668eb9b59f9ba69d32951a48a upstream.

->total is a bit of an odd parameter passed down to the low-level
allocator all the way from the high-level callers. It's supposed to
contain the maximum number of blocks to be allocated for the whole
transaction [1].

But in xfs_iomap_write_allocate we only convert existing delayed
allocations and thus only have a minimal block reservation for the
current transaction, so xfs_alloc_space_available can't use it for
the allocation decisions. Use the maximum of args->total and the
calculated block requirement to make a decision. We probably should
get rid of args->total eventually and instead apply ->minleft more
broadly, but that will require some extensive changes all over.

[1] which creates lots of confusion as most callers don't decrement it
once doing a first allocation. But that's for a separate series.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Christoph Hellwig
2017-02-04 16:47:11 +0800
6b81365b1 xfs: adjust allocation length in xfs_alloc_space_available

commit 54fee133ad59c87ab01dd84ab3e9397134b32acb upstream.

We must decide in xfs_alloc_fix_freelist whether an allocation from a
given AG is possible, based on the available space, and should not
fail the allocation past that point on a healthy file system.

But currently we have two additional places that second-guess
xfs_alloc_fix_freelist: xfs_alloc_ag_vextent tries to adjust the
maxlen parameter to remove the reservation before doing the
allocation (but ignores the various minimum freespace requirements),
and xfs_alloc_fix_minleft tries to fix up the allocated length
after we've found an extent, but ignores the reservations and also
doesn't take the AGFL into account (and thus fails allocations
for not matching minlen in some cases).

Remove all these later fixups and just correct the maxlen argument
inside xfs_alloc_fix_freelist once we have the AGF buffer locked.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Christoph Hellwig
2017-02-04 16:47:11 +0800
c63f4d3aa xfs: fix bogus minleft manipulations

commit 255c516278175a6dc7037d1406307f35237d8688 upstream.

We can't just set minleft to 0 when we're low on space - that's exactly
what we need minleft for: to protect space in the AG for btree block
allocations when we are low on free space.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Christoph Hellwig
2017-02-04 16:47:11 +0800
d20e4ad06 xfs: bump up reserved blocks in xfs_alloc_set_aside

commit 5149fd327f16e393c1d04fa5325ab072c32472bf upstream.

Setting aside 4 blocks globally for bmbt splits isn't all that useful,
as different threads can allocate space in parallel. Bump it to 4
blocks per AG to allow each thread that is currently doing an
allocation to dip into it separately. Without that we may not have
enough reserved blocks if there are enough parallel transactions
in an almost out of space file system that all run into bmap btree
splits.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Christoph Hellwig
2017-02-04 16:47:11 +0800

01 Feb, 2017

1 commit

485952414 xfs: prevent quotacheck from overloading inode lru

commit e0d76fa4475ef2cf4b52d18588b8ce95153d021b upstream.

Quotacheck runs at mount time in situations where quota accounting must
be recalculated. In doing so, it uses bulkstat to visit every inode in
the filesystem. Historically, every inode processed during quotacheck
was released and immediately tagged for reclaim because quotacheck runs
before the superblock is marked active by the VFS. In other words,
the final iput() led to an immediate ->destroy_inode() call, which
allowed the XFS background reclaim worker to start reclaiming inodes.

Commit 17c12bcd3 ("xfs: when replaying bmap operations, don't let
unlinked inodes get reaped") marks the XFS superblock active sooner as
part of the mount process to support caching inodes processed during log
recovery. This occurs before quotacheck and thus means all inodes
processed by quotacheck are inserted to the LRU on release. The
s_umount lock is held until the mount has completed and thus prevents
the shrinkers from operating on the sb. This means that quotacheck can
excessively populate the inode LRU and lead to OOM conditions on systems
without sufficient RAM.

Update the quotacheck bulkstat handler to set XFS_IGET_DONTCACHE on
inodes processed by quotacheck. This causes ->drop_inode() to return 1
and in turn causes iput_final() to evict the inode. This preserves the
original quotacheck behavior and prevents it from overloading the LRU
and running out of memory.

Reported-by: Martin Svec
Signed-off-by: Brian Foster
Reviewed-by: Eric Sandeen
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Brian Foster
2017-02-01 15:33:05 +0800

20 Jan, 2017

1 commit

6ba35da69 xfs: Timely free truncated dirty pages

commit 0a417b8dc1f10b03e8f558b8a831f07ec4c23795 upstream.

Commit 99579ccec4e2 "xfs: skip dirty pages in ->releasepage()" started
to skip dirty pages in xfs_vm_releasepage() which also has the effect
that if a dirty page is truncated, it does not get freed by
block_invalidatepage() and is lingering in LRU list waiting for reclaim.
So a simple loop like:

while true; do
    dd if=/dev/zero of=file bs=1M count=100
    rm file
done

will keep using more and more memory until we hit low watermarks and
start pagecache reclaim which will eventually also reclaim the truncated
pages. Keeping these truncated (and thus never usable) pages in memory
is just a waste of memory, is unnecessarily stressing page cache
reclaim, and reportedly also leads to anonymous mmap(2) returning ENOMEM
prematurely.

So instead of just skipping dirty pages in xfs_vm_releasepage(), return
to old behavior of skipping them only if they have delalloc or unwritten
buffers and fix the spurious warnings by warning only if the page is
clean.

CC: Brian Foster
CC: Vlastimil Babka
Reported-by: Petr Tůma
Fixes: 99579ccec4e271c3d4d4e7c946058766812afdab
Signed-off-by: Jan Kara
Reviewed-by: Brian Foster
Signed-off-by: Darrick J. Wong
Signed-off-by: Greg Kroah-Hartman

Jan Kara
2017-01-20 03:18:00 +0800

12 Jan, 2017

19 commits

1b9c25568 xfs: fix max_retries _show and _store functions

commit ff97f2399edac1e0fb3fa7851d5fbcbdf04717cf upstream.

max_retries _show and _store functions should test against cfg->max_retries,
not cfg->retry_timeout

Signed-off-by: Carlos Maiolino
Reviewed-by: Eric Sandeen
Signed-off-by: Darrick J. Wong
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Carlos Maiolino
2017-01-12 18:39:45 +0800
91192ae41 xfs: fix crash and data corruption due to removal of busy COW extents

commit a1b7a4dea6166cf46be895bce4aac67ea5160fe8 upstream.

There is a race window between write_cache_pages calling
clear_page_dirty_for_io and XFS calling set_page_writeback, in which
the mapping for an inode is tagged neither as dirty, nor as writeback.

If the COW shrinker hits in exactly that window we'll remove the delayed
COW extents while writepages is trying to write them back, which in release
kernels will manifest as corruption of the bmap btree, and in debug
kernels will trip the ASSERT about not calling xfs_bmapi_write with the
COWFORK flag for holes. A complex customer load manages to hit this
window fairly reliably, probably by always having COW writeback in flight
while the cow shrinker runs.

This patch adds another check for having the I_DIRTY_PAGES flag set,
which is still set during this race window. While this fixes the problem
I'm still not overly happy about the way the COW shrinker works as it
still seems a bit fragile.

Signed-off-by: Christoph Hellwig
Signed-off-by: Darrick J. Wong
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Christoph Hellwig
2017-01-12 18:39:45 +0800
b96e4e87d xfs: use the actual AG length when reserving blocks

commit 20e73b000bcded44a91b79429d8fa743247602ad upstream.

We need to use the actual AG length when making per-AG reservations,
since we could otherwise end up reserving more blocks out of the last
AG than there are actual blocks.
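
To see why the last AG needs special handling, consider this standalone sketch. The `ag_length` helper and its parameter names are hypothetical; it only illustrates that the final allocation group holds the remainder of the filesystem rather than a full nominal AG.

```c
#include <stdint.h>

/*
 * Number of blocks in allocation group agno, for a filesystem of dblocks
 * total blocks split into agcount groups of nominally agblocks blocks each.
 * Assumes agcount >= 1 and agno < agcount.
 */
uint64_t ag_length(uint64_t dblocks, uint32_t agcount,
                   uint32_t agblocks, uint32_t agno)
{
    if (agno < agcount - 1)
        return agblocks;
    /* The last AG holds whatever remains, which may be less than agblocks. */
    return dblocks - (uint64_t)agno * agblocks;
}
```

For a 1000-block filesystem with four AGs of nominally 300 blocks, the last AG has only 100 blocks, so a reservation sized from the nominal 300 could exceed what actually exists.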

Complained-about-by: Brian Foster
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Darrick J. Wong
2017-01-12 18:39:44 +0800
d9c7c9fa6 xfs: fix double-cleanup when CUI recovery fails

commit 7a21272b088894070391a94fdd1c67014020fa1d upstream.

Dan Carpenter reported a double-free of rcur if _defer_finish fails
while we're recovering CUI items. Fix the error recovery to prevent
this.

Reported-by: Dan Carpenter
Signed-off-by: Darrick J. Wong
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Darrick J. Wong
2017-01-12 18:39:44 +0800
aa38f370b xfs: use GPF_NOFS when allocating btree cursors

commit b24a978c377be5f14e798cb41238e66fe51aab2f upstream.

Use NOFS for allocating btree cursors, since they can be called
under the ilock.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Darrick J. Wong
2017-01-12 18:39:44 +0800
3c382dda4 xfs: ignore leaf attr ichdr.count in verifier during log replay

commit 2e1d23370e75d7d89350d41b4ab58c7f6a0e26b2 upstream.

When we create a new attribute, we first create a shortform
attribute, and try to fit the new attribute into it.
If that fails, we copy the (empty) attribute into a leaf attribute,
and do the copy again. Thus there can be a transient state where
we have an empty leaf attribute.

If we encounter this during log replay, the verifier will fail.
So add a test to ignore this part of the leaf attr verification
during log replay.

Thanks as usual to dchinner for spotting the problem.

Signed-off-by: Eric Sandeen
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Eric Sandeen
2017-01-12 18:39:44 +0800
c00203386 xfs: don't cap maximum dedupe request length

commit 1bb33a98702d8360947f18a44349df75ba555d5d upstream.

After various discussions on linux-fsdevel, it has been decided that it
is not necessary to cap the length of a dedupe request, and that
correctly-written userspace client programs will be able to absorb the
change. Therefore, remove the length clamping behavior.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Darrick J. Wong
2017-01-12 18:39:44 +0800
f8b20705a xfs: don't allow di_size with high bit set

commit ef388e2054feedaeb05399ed654bdb06f385d294 upstream.

The on-disk field di_size is used to set i_size, which is a signed
integer of type loff_t. If the high bit of di_size is set, we'll end up with
a negative i_size, which will cause all sorts of problems. Since the
VFS won't let us create a file with such length, we should catch them
here in the verifier too.
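
The check amounts to testing the top bit of a 64-bit value. A minimal sketch, assuming a 64-bit on-disk size field; `di_size_valid` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A size is acceptable only if it stays non-negative when reinterpreted
 * as a signed 64-bit value, i.e. if bit 63 is clear.
 */
bool di_size_valid(uint64_t di_size)
{
    return (di_size & (UINT64_C(1) << 63)) == 0;
}
```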

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Darrick J. Wong
2017-01-12 18:39:43 +0800
12815dd15 xfs: error out if trying to add attrs and anextents > 0

commit 0f352f8ee8412bd9d34fb2a6411241da61175c0e upstream.

We shouldn't assert if somehow we end up trying to add an attr fork to
an inode that apparently already has attr extents because this is an
indication of on-disk corruption. Instead, return an error code to
userspace.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Darrick J. Wong
2017-01-12 18:39:43 +0800
cd4bf1d41 xfs: don't crash if reading a directory results in an unexpected hole

commit 96a3aefb8ffde23180130460b0b2407b328eb727 upstream.

In xfs_dir3_data_read, we can encounter the situation where err == 0 and
*bpp == NULL if the given bno offset happens to be a hole; this leads to
a crash if we try to set the buffer type after the _da_read_buf call.
Holes can happen due to corrupt or malicious entries in the bmbt data,
so be a little more careful when we're handling buffers.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Darrick J. Wong
2017-01-12 18:39:43 +0800
b88398de1 xfs: complain if we don't get nextents bmap records

commit 356a3225222e5bc4df88aef3419fb6424f18ab69 upstream.

When reading into memory all extents of a btree-format inode fork,
complain if the number of extents we find is not the same as the number
of extents reported in the inode core. This is needed to stop an IO
action from accessing the garbage areas of the in-core fork.

[dchinner: removed redundant assert]

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Darrick J. Wong
2017-01-12 18:39:43 +0800
4bb31bcce xfs: check for bogus values in btree block headers

commit bb3be7e7c1c18e1b141d4cadeb98cc89ecf78099 upstream.

When we're reading a btree block, make sure that what we retrieved
matches the owner and level; and has a plausible number of records.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Darrick J. Wong
2017-01-12 18:39:43 +0800
b85f32481 xfs: forbid AG btrees with level == 0

commit d2a047f31e86941fa896e0e3271536d50aba415e upstream.

There is no such thing as a zero-level AG btree since even a single-node
zero-records btree has one level. Btree cursor constructors read
cur_nlevels straight from disk and then access things like
cur_bufs[cur_nlevels - 1] which is /really/ bad if cur_nlevels is zero!
Therefore, strengthen the verifiers to prevent this possibility.
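
A sketch of the kind of range check this implies, assuming an upper bound of 8 levels purely for illustration. `MAX_LEVELS`, `btree_levels_valid`, and `top_level_index` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound on btree height for this sketch. */
#define MAX_LEVELS 8

/* Even an empty single-node btree has one level, so zero is never valid. */
bool btree_levels_valid(uint32_t nlevels)
{
    return nlevels >= 1 && nlevels <= MAX_LEVELS;
}

/*
 * Index of the root slot in a per-level array such as cur_bufs[]. With an
 * unsigned nlevels of zero, nlevels - 1 would wrap to a huge index, which
 * is why the value must be validated before it is used.
 */
uint32_t top_level_index(uint32_t nlevels)
{
    return nlevels - 1;
}
```

Callers would invoke `btree_levels_valid()` on the value read from disk before ever computing `top_level_index()`.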

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Darrick J. Wong
2017-01-12 18:39:42 +0800
4081d4a79 xfs: handle cow fork in xfs_bmap_trace_exlist

commit c44a1f22626c153976289e1cd67bdcdfefc16e1f upstream.

By inspection, xfs_bmap_trace_exlist isn't handling cow forks,
and will trace the data fork instead.

Fix this by setting state appropriately if whichfork
== XFS_COW_FORK.

()___()
< @ @ >
| |
{o_o}
(|)

Signed-off-by: Eric Sandeen
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Eric Sandeen
2017-01-12 18:39:42 +0800
a585e1c4e xfs: pass state not whichfork to trace_xfs_extlist

commit 7710517fc37b1899722707883b54694ea710b3c0 upstream.

When xfs_bmap_trace_exlist called trace_xfs_extlist,
it sent in the "whichfork" var instead of the bmap "state"
as expected (even though state was already set up for this
purpose).

As a result, the xfs_bmap_class in tracing code used
"whichfork" not state in xfs_iext_state_to_fork(), and got
the wrong ifork pointer. It all goes downhill from
there, including an ASSERT when ifp_bytes is empty
by the time it reaches xfs_iext_get_ext():

XFS: Assertion failed: idx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t)

Signed-off-by: Eric Sandeen
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Eric Sandeen
2017-01-12 18:39:42 +0800
bdbfd4ee6 xfs: Move AGI buffer type setting to xfs_read_agi

commit 200237d6746faaeaf7f4ff4abbf13f3917cee60a upstream.

We've missed properly setting the buffer type for
an AGI transaction in 3 spots now, so just move it
into xfs_read_agi() and set it if we are in a transaction
to avoid the problem in the future.

This is similar to how it is done in, e.g., the dir3
and attr3 read functions.

Signed-off-by: Eric Sandeen
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Eric Sandeen
2017-01-12 18:39:42 +0800
06ac11df9 xfs: pass post-eof speculative prealloc blocks to bmapi

commit f782088c9e5d08e9494c63e68b4e85716df3e5f8 upstream.

xfs_file_iomap_begin_delay() implements post-eof speculative
preallocation by extending the block count of the requested delayed
allocation. Now that xfs_bmapi_reserve_delalloc() has been updated to
handle prealloc blocks separately and tag the inode, update
xfs_file_iomap_begin_delay() to use the new parameter and rely on the
former to tag the inode.

Note that this patch does not change behavior.

Signed-off-by: Brian Foster
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Brian Foster
2017-01-12 18:39:42 +0800
553937d3c xfs: use new extent lookup helpers xfs_file_iomap_begin_delay

commit 656152e552e5cbe0c11ad261b524376217c2fb13 upstream.

And only lookup the previous extent inside xfs_iomap_prealloc_size
if we actually need it.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Christoph Hellwig
2017-01-12 18:39:41 +0800
3d6e3b12b xfs: clean up cow fork reservation and tag inodes correctly

commit 0260d8ff5f76617e3a55a1c471383ecb4404c3ad upstream.

COW fork reservation is implemented via delayed allocation. The code is
modeled after the traditional delalloc allocation code, but is slightly
different in terms of how preallocation occurs. Rather than post-eof
speculative preallocation, COW fork preallocation is implemented via a
COW extent size hint that is designed to minimize fragmentation as a
reflinked file is split over time.

xfs_reflink_reserve_cow() still uses logic that is oriented towards
dealing with post-eof speculative preallocation, however, and is stale
or not necessarily correct. First, the EOF alignment to the COW extent
size hint is implemented in xfs_bmapi_reserve_delalloc() (which does so
correctly by aligning the start and end offsets) and so is not necessary
in xfs_reflink_reserve_cow(). The backoff and retry logic on ENOSPC is
also ineffective for the same reason, as xfs_bmapi_reserve_delalloc()
will simply perform the same allocation request on the retry. Finally,
since the COW extent size hint aligns the start and end offset of the
range to allocate, the end_fsb != orig_end_fsb logic is not sufficient.
Indeed, if a write request happens to end on an aligned offset, it is
possible that we do not tag the inode for COW preallocation even though
xfs_bmapi_reserve_delalloc() may have preallocated at the start offset.

Kill the unnecessary, duplicate code in xfs_reflink_reserve_cow().
Remove the inode tag logic as well since xfs_bmapi_reserve_delalloc()
has been updated to tag the inode correctly.

Signed-off-by: Brian Foster
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Brian Foster
2017-01-12 18:39:41 +0800