Eric Lee / smarc-fsl-linux-kernel

23 Nov, 2020

1 commit

03d720f4b Merge a349e4c65960 ("Merge tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/sc… ... Browse Code »

…m/fs/xfs/xfs-linux") into android-mainline

Steps on the way to 5.10-rc5

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idd51203521e6bc05f6648743b2b10c92beba865d

Greg Kroah-Hartman
2020-11-23 15:00:16 +0800

20 Nov, 2020

2 commits

eb8409071 xfs: revert "xfs: fix rmap key and record comparison functions" ... Browse Code »

This reverts commit 6ff646b2ceb0eec916101877f38da0b73e3a5b7f.

Your maintainer committed a major braino in the rmap code by adding the
attr fork, bmbt, and unwritten extent usage bits into rmap record key
comparisons. While XFS uses the usage bits *in the rmap records* for
cross-referencing metadata in xfs_scrub and xfs_repair, it only needs
the owner and offset information to distinguish between reverse mappings
of the same physical extent into the data fork of a file at multiple
offsets. The other bits are not important for key comparisons for index
lookups, and never have been.

Eric Sandeen reports that this causes regressions in generic/299, so
undo this patch before it does more damage.

Reported-by: Eric Sandeen
Fixes: 6ff646b2ceb0 ("xfs: fix rmap key and record comparison functions")
Signed-off-by: Darrick J. Wong
Reviewed-by: Eric Sandeen

Darrick J. Wong
2020-11-20 07:17:50 +0800
883a790a8 xfs: don't allow NOWAIT DIO across extent boundaries ... Browse Code »

Jens has reported a situation where partial direct IOs can be issued
and completed yet still return -EAGAIN. We don't want this to report
a short IO as we want XFS to complete user DIO entirely or not at
all.

This partial IO situation can occur on a write IO that is split
across an allocated extent and a hole, and the second mapping is
returning EAGAIN because allocation would be required.

The trivial reproducer:

$ sudo xfs_io -fdt -c "pwrite 0 4k" -c "pwrite -V 1 -b 8k -N 0 8k" /mnt/scr/foo
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0001 sec (27.509 MiB/sec and 7042.2535 ops/sec)
pwrite: Resource temporarily unavailable
$

The pwritev2(0, 8kB, RWF_NOWAIT) call returns EAGAIN having done
the first 4kB write:

xfs_file_direct_write: dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 0x2000
iomap_apply: dev 259:1 ino 0x83 pos 0 length 8192 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor
xfs_ilock_nowait: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap
xfs_iunlock: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin
xfs_iomap_found: dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 8192 fork data startoff 0x0 startblock 24 blockcount 0x1
iomap_apply_dstmap: dev 259:1 ino 0x83 bdev 259:1 addr 102400 offset 0 length 4096 type MAPPED flags DIRTY

Here the first iomap loop has mapped the first 4kB of the file and
issued the IO, and we enter the second iomap_apply loop:

iomap_apply: dev 259:1 ino 0x83 pos 4096 length 4096 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor
xfs_ilock_nowait: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap
xfs_iunlock: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin

And we exit with -EAGAIN out because we hit the allocate case trying
to make the second 4kB block.

Then IO completes on the first 4kB and the original IO context
completes and unlocks the inode, returning -EAGAIN to userspace:

xfs_end_io_direct_write: dev 259:1 ino 0x83 isize 0x1000 disize 0x1000 offset 0x0 count 4096
xfs_iunlock: dev 259:1 ino 0x83 flags IOLOCK_SHARED caller xfs_file_dio_aio_write

There are other vectors to the same problem when we re-enter the
mapping code if we have to make multiple mappinfs under NOWAIT
conditions. e.g. failing trylocks, COW extents being found,
allocation being required, and so on.

Avoid all these potential problems by only allowing IOMAP_NOWAIT IO
to go ahead if the mapping we retrieve for the IO spans an entire
allocated extent. This avoids the possibility of subsequent mappings
to complete the IO from triggering NOWAIT semantics by any means as
NOWAIT IO will now only enter the mapping code once per NOWAIT IO.

Reported-and-tested-by: Jens Axboe
Signed-off-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Dave Chinner
2020-11-20 00:59:11 +0800

19 Nov, 2020

6 commits

595189c25 xfs: return corresponding errcode if xfs_initialize_perag() fail ... Browse Code »

In xfs_initialize_perag(), if kmem_zalloc(), xfs_buf_hash_init(), or
radix_tree_preload() failed, the returned value 'error' is not set
accordingly.

Reported-as-fixing: 8b26c5825e02 ("xfs: handle ENOMEM correctly during initialisation of perag structures")
Fixes: 9b2471797942 ("xfs: cache unlinked pointers in an rhashtable")
Reported-by: Hulk Robot
Signed-off-by: Yu Kuai
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Yu Kuai
2020-11-19 01:23:51 +0800
27c14b5da xfs: ensure inobt record walks always make forward progress ... Browse Code »

The aim of the inode btree record iterator function is to call a
callback on every record in the btree. To avoid having to tear down and
recreate the inode btree cursor around every callback, it caches a
certain number of records in a memory buffer. After each batch of
callback invocations, we have to perform a btree lookup to find the
next record after where we left off.

However, if the keys of the inode btree are corrupt, the lookup might
put us in the wrong part of the inode btree, causing the walk function
to loop forever. Therefore, we add extra cursor tracking to make sure
that we never go backwards neither when performing the lookup nor when
jumping to the next inobt record. This also fixes an off by one error
where upon resume the lookup should have been for the inode /after/ the
point at which we stopped.

Found by fuzzing xfs/460 with keys[2].startino = ones causing bulkstat
and quotacheck to hang.

Fixes: a211432c27ff ("xfs: create simplified inode walk function")
Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R

Darrick J. Wong
2020-11-19 01:23:51 +0800
ada49d64f xfs: fix forkoff miscalculation related to XFS_LITINO(mp) ... Browse Code »

Currently, commit e9e2eae89ddb dropped a (int) decoration from
XFS_LITINO(mp), and since sizeof() expression is also involved,
the result of XFS_LITINO(mp) is simply as the size_t type
(commonly unsigned long).

Considering the expression in xfs_attr_shortform_bytesfit():
offset = (XFS_LITINO(mp) - bytes) >> 3;
let "bytes" be (int)340, and
"XFS_LITINO(mp)" be (unsigned long)336.

on 64-bit platform, the expression is
offset = ((unsigned long)336 - (int)340) >> 3 =
(int)(0xfffffffffffffffcUL >> 3) = -1

but on 32-bit platform, the expression is
offset = ((unsigned long)336 - (int)340) >> 3 =
(int)(0xfffffffcUL >> 3) = 0x1fffffff
instead.

so offset becomes a large positive number on 32-bit platform, and
cause xfs_attr_shortform_bytesfit() returns maxforkoff rather than 0.

Therefore, one result is
"ASSERT(new_size < bytes" in advance suggested by Eric and a misleading
comment together with this bugfix suggested by Darrick. It seems the
other users of XFS_LITINO(mp) are not impacted.

Fixes: e9e2eae89ddb ("xfs: only check the superblock version for dinode size calculation")
Cc: # 5.7+
Reported-and-tested-by: Dennis Gilmore
Reviewed-by: Christoph Hellwig
Signed-off-by: Gao Xiang
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Gao Xiang
2020-11-19 01:23:51 +0800
6b48e5b8a xfs: directory scrub should check the null bestfree entries too ... Browse Code »

Teach the directory scrubber to check all the bestfree entries,
including the null ones. We want to be able to detect the case where
the entry is null but there actually /is/ a directory data block.

Found by fuzzing lbests[0] = ones in xfs/391.

Fixes: df481968f33b ("xfs: scrub directory freespace")
Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-11-19 01:23:50 +0800
498fe261f xfs: strengthen rmap record flags checking ... Browse Code »

We always know the correct state of the rmap record flags (attr, bmbt,
unwritten) so check them by direct comparison.

Fixes: d852657ccfc0 ("xfs: cross-reference reverse-mapping btree")
Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-11-19 01:23:50 +0800
e95b6c3ef xfs: fix the minrecs logic when dealing with inode root child blocks ... Browse Code »

The comment and logic in xchk_btree_check_minrecs for dealing with
inode-rooted btrees isn't quite correct. While the direct children of
the inode root are allowed to have fewer records than what would
normally be allowed for a regular ondisk btree block, this is only true
if there is only one child block and the number of records don't fit in
the inode root.

Fixes: 08a3a692ef58 ("xfs: btree scrub should check minrecs")
Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-11-19 01:23:50 +0800

14 Nov, 2020

1 commit

73936cf2d Merge f01c30de86f1 ("Merge tag 'vfs-5.10-fixes-2' of git://git.kernel.org/pub/sc… ... Browse Code »

…m/fs/xfs/xfs-linux") into android-mainline

Steps on the way to 5.10-rc4

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iba36d2244c4229d44e3d391ed23dc25c6022f917

Greg Kroah-Hartman
2020-11-14 17:29:08 +0800

12 Nov, 2020

1 commit

2bd3fa793 xfs: fix a missing unlock on error in xfs_fs_map_blocks ... Browse Code »

We also need to drop the iolock when invalidate_inode_pages2 fails, not
only on all other error or successful cases.

Fixes: 527851124d10 ("xfs: implement pNFS export operations")
Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2020-11-12 00:07:37 +0800

11 Nov, 2020

4 commits

54e9b09e1 xfs: fix brainos in the refcount scrubber's rmap fragment processor ... Browse Code »

Fix some serious WTF in the reference count scrubber's rmap fragment
processing. The code comment says that this loop is supposed to move
all fragment records starting at or before bno onto the worklist, but
there's no obvious reason why nr (the number of items added) should
increment starting from 1, and breaking the loop when we've added the
target number seems dubious since we could have more rmap fragments that
should have been added to the worklist.

This seems to manifest in xfs/411 when adding one to the refcount field.

Fixes: dbde19da9637 ("xfs: cross-reference the rmapbt data with the refcountbt")
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-11-11 08:48:03 +0800
6ff646b2c xfs: fix rmap key and record comparison functions ... Browse Code »

Keys for extent interval records in the reverse mapping btree are
supposed to be computed as follows:

(physical block, owner, fork, is_btree, is_unwritten, offset)

This provides users the ability to look up a reverse mapping from a bmbt
record -- start with the physical block; then if there are multiple
records for the same block, move on to the owner; then the inode fork
type; and so on to the file offset.

However, the key comparison functions incorrectly remove the
fork/btree/unwritten information that's encoded in the on-disk offset.
This means that lookup comparisons are only done with:

(physical block, owner, offset)

This means that queries can return incorrect results. On consistent
filesystems this hasn't been an issue because blocks are never shared
between forks or with bmbt blocks; and are never unwritten. However,
this bug means that online repair cannot always detect corruption in the
key information in internal rmapbt nodes.

Found by fuzzing keys[1].attrfork = ones on xfs/371.

Fixes: 4b8ed67794fe ("xfs: add rmap btree operations")
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-11-11 08:47:56 +0800
5dda3897f xfs: set the unwritten bit in rmap lookup flags in xchk_bmap_get_rmapextents ... Browse Code »

When the bmbt scrubber is looking up rmap extents, we need to set the
extent flags from the bmbt record fully. This will matter once we fix
the rmap btree comparison functions to check those flags correctly.

Fixes: d852657ccfc0 ("xfs: cross-reference reverse-mapping btree")
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-11-11 08:47:51 +0800
ea8439899 xfs: fix flags argument to rmap lookup when converting shared file rmaps ... Browse Code »

Pass the same oldext argument (which contains the existing rmapping's
unwritten state) to xfs_rmap_lookup_le_range at the start of
xfs_rmap_convert_shared. At this point in the code, flags is zero,
which means that we perform lookups using the wrong key.

Fixes: 3f165b334e51 ("xfs: convert unwritten status of reverse mappings for shared files")
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-11-11 08:47:34 +0800

09 Nov, 2020

1 commit

2cfc344f8 Merge 5.10-rc3 into android-mainline ... Browse Code »

Linux 5.10-rc3

Signed-off-by: Greg Kroah-Hartman
Change-Id: I7884051ea7b86204b2685b51462368e122ad0772

Greg Kroah-Hartman
2020-11-09 19:49:27 +0800

05 Nov, 2020

5 commits

46afb0628 xfs: only flush the unshared range in xfs_reflink_unshare ... Browse Code »

There's no reason to flush an entire file when we're unsharing part of
a file. Therefore, only initiate writeback on the selected range.

Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R

Darrick J. Wong
2020-11-05 09:41:56 +0800
c1f6b1ac0 xfs: fix scrub flagging rtinherit even if there is no rt device ... Browse Code »

The kernel has always allowed directories to have the rtinherit flag
set, even if there is no rt device, so this check is wrong.

Fixes: 80e4e1268802 ("xfs: scrub inodes")
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-11-05 00:52:47 +0800
c2f09217a xfs: fix missing CoW blocks writeback conversion retry ... Browse Code »

In commit 7588cbeec6df, we tried to fix a race stemming from the lack of
coordination between higher level code that wants to allocate and remap
CoW fork extents into the data fork. Christoph cites as examples the
always_cow mode, and a directio write completion racing with writeback.

According to the comments before the goto retry, we want to restart the
lookup to catch the extent in the data fork, but we don't actually reset
whichfork or cow_fsb, which means the second try executes using stale
information. Up until now I think we've gotten lucky that either
there's something left in the CoW fork to cause cow_fsb to be reset, or
either data/cow fork sequence numbers have advanced enough to force a
fresh lookup from the data fork. However, if we reach the retry with an
empty stable CoW fork and a stable data fork, neither of those things
happens. The retry foolishly re-calls xfs_convert_blocks on the CoW
fork which fails again. This time, we toss the write.

I've recently been working on extending reflink to the realtime device.
When the realtime extent size is larger than a single block, we have to
force the page cache to CoW the entire rt extent if a write (or
fallocate) are not aligned with the rt extent size. The strategy I've
chosen to deal with this is derived from Dave's blocksize > pagesize
series: dirtying around the write range, and ensuring that writeback
always starts mapping on an rt extent boundary. This has brought this
race front and center, since generic/522 blows up immediately.

However, I'm pretty sure this is a bug outright, independent of that.

Fixes: 7588cbeec6df ("xfs: retry COW fork delalloc conversion when no extent was found")
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-11-05 00:52:47 +0800
763e4cdc0 iomap: support partial page discard on writeback block mapping failure ... Browse Code »

iomap writeback mapping failure only calls into ->discard_page() if
the current page has not been added to the ioend. Accordingly, the
XFS callback assumes a full page discard and invalidation. This is
problematic for sub-page block size filesystems where some portion
of a page might have been mapped successfully before a failure to
map a delalloc block occurs. ->discard_page() is not called in that
error scenario and the bio is explicitly failed by iomap via the
error return from ->prepare_ioend(). As a result, the filesystem
leaks delalloc blocks and corrupts the filesystem block counters.

Since XFS is the only user of ->discard_page(), tweak the semantics
to invoke the callback unconditionally on mapping errors and provide
the file offset that failed to map. Update xfs_discard_page() to
discard the corresponding portion of the file and pass the range
along to iomap_invalidatepage(). The latter already properly handles
both full and sub-page scenarios by not changing any iomap or page
state on sub-page invalidations.

Signed-off-by: Brian Foster
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Brian Foster
2020-11-05 00:52:46 +0800
869ae85da xfs: flush new eof page on truncate to avoid post-eof corruption ... Browse Code »

It is possible to expose non-zeroed post-EOF data in XFS if the new
EOF page is dirty, backed by an unwritten block and the truncate
happens to race with writeback. iomap_truncate_page() will not zero
the post-EOF portion of the page if the underlying block is
unwritten. The subsequent call to truncate_setsize() will, but
doesn't dirty the page. Therefore, if writeback happens to complete
after iomap_truncate_page() (so it still sees the unwritten block)
but before truncate_setsize(), the cached page becomes inconsistent
with the on-disk block. A mapped read after the associated page is
reclaimed or invalidated exposes non-zero post-EOF data.

For example, consider the following sequence when run on a kernel
modified to explicitly flush the new EOF page within the race
window:

$ xfs_io -fc "falloc 0 4k" -c fsync /mnt/file
$ xfs_io -c "pwrite 0 4k" -c "truncate 1k" /mnt/file
...
$ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file
00000400: 00 00 00 00 00 00 00 00 ........
$ umount /mnt/; mount /mnt/
$ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file
00000400: cd cd cd cd cd cd cd cd ........

Update xfs_setattr_size() to explicitly flush the new EOF page prior
to the page truncate to ensure iomap has the latest state of the
underlying block.

Fixes: 68a9f5e7007c ("xfs: implement iomap based buffered write path")
Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Brian Foster
2020-11-05 00:52:46 +0800

29 Oct, 2020

3 commits

2c334e12f xfs: set xefi_discard when creating a deferred agfl free log intent item ... Browse Code »

Make sure that we actually initialize xefi_discard when we're scheduling
a deferred free of an AGFL block. This was (eventually) found by the
UBSAN while I was banging on realtime rmap problems, but it exists in
the upstream codebase. While we're at it, rearrange the structure to
reduce the struct size from 64 to 56 bytes.

Fixes: fcb762f5de2e ("xfs: add bmapi nodiscard flag")
Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster

Darrick J. Wong
2020-10-29 23:19:18 +0800
67d3ed576 Merge 'v5.10-rc1' into android-mainline ... Browse Code »

Linux 5.10-rc1

Signed-off-by: Greg Kroah-Hartman
Change-Id: Iace3fc84a00d3023c75caa086a266de17dc1847c

Greg Kroah-Hartman
2020-10-29 13:32:38 +0800
338f8e80b Merge 0746c4a9f3d3 ("Merge branch 'i2c/for-5.10' of git://git.kernel.org/pub/scm… ... Browse Code »

…/linux/kernel/git/wsa/linux") into android-mainline

Steps on the way to 5.10-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iec426c6de4a59a517e5fa575a9424b883d958f08

Greg Kroah-Hartman
2020-10-29 04:20:34 +0800

27 Oct, 2020

1 commit

79c83f152 Merge 1f70935f637d ("Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/li… ... Browse Code »

…nux/kernel/git/soc/soc") into android-mainline

Steps on the way to 5.10-rc1

Resolves conflicts in:
Documentation/admin-guide/sysctl/vm.rst

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic58f28718f28dae42948c935dfb0c62122fe86fc

Greg Kroah-Hartman
2020-10-27 18:47:16 +0800

26 Oct, 2020

2 commits

0a824c225 Merge 38525c6919e2 ("Merge tag 'for-v5.10' of git://git.kernel.org/pub/scm/linux… ... Browse Code »

…/kernel/git/sre/linux-power-supply") into android-mainline

Steps on the way to 5.10-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie67a4cdc4b30c466c302c46bd1348b3e94ea8f91

Greg Kroah-Hartman
2020-10-26 17:16:46 +0800
33def8498 treewide: Convert macro and uses of __section(foo) to __section("foo") ... Browse Code »

Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.

Remove the quote operator # from compiler_attributes.h __section macro.

Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.

Conversion done using the script at:

https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl

Signed-off-by: Joe Perches
Reviewed-by: Nick Desaulniers
Reviewed-by: Miguel Ojeda
Signed-off-by: Linus Torvalds

Joe Perches
2020-10-26 05:51:49 +0800

25 Oct, 2020

2 commits

8c3d23ed9 Merge 6e4dc3d59284 ("Merge tag 'for-linus-5.10-1' of git://github.com/cminyard/l… ... Browse Code »

…inux-ipmi") into android-mainline

Steps on the way to 5.10-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idbd0577a495237bf5628333110e2c98a77b39c77

Greg Kroah-Hartman
2020-10-25 23:26:30 +0800
0eac1102e Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull misc vfs updates from Al Viro:
"Assorted stuff all over the place (the largest group here is
Christoph's stat cleanups)"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: remove KSTAT_QUERY_FLAGS
fs: remove vfs_stat_set_lookup_flags
fs: move vfs_fstatat out of line
fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat
fs: remove vfs_statx_fd
fs: omfs: use kmemdup() rather than kmalloc+memcpy
[PATCH] reduce boilerplate in fsid handling
fs: Remove duplicated flag O_NDELAY occurring twice in VALID_OPEN_FLAGS
selftests: mount: add nosymfollow tests
Add a "nosymfollow" mount option.

Linus Torvalds
2020-10-25 03:26:05 +0800

24 Oct, 2020

1 commit

f11901ed7 Merge tag 'xfs-5.10-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux ... Browse Code »

Pull xfs fixes from Darrick Wong:
"Two bug fixes that trickled in during the merge window:

- Make fallocate check the alignment of its arguments against the
fundamental allocation unit of the volume the file lives on, so
that we don't trigger the fs' alignment checks.

- Cancel unprocessed log intents immediately when log recovery fails,
to avoid a log deadlock"

* tag 'xfs-5.10-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: cancel intents immediately if process_intents fails
xfs: fix fallocate functions when rtextsize is larger than 1

Linus Torvalds
2020-10-24 08:15:06 +0800

22 Oct, 2020

2 commits

2e76f188f xfs: cancel intents immediately if process_intents fails ... Browse Code »

If processing recovered log intent items fails, we need to cancel all
the unprocessed recovered items immediately so that a subsequent AIL
push in the bail out path won't get wedged on the pinned intent items
that didn't get processed.

This can happen if the log contains (1) an intent that gets and releases
an inode, (2) an intent that cannot be recovered successfully, and (3)
some third intent item. When recovery of (2) fails, we leave (3) pinned
in memory. Inode reclamation is called in the error-out path of
xfs_mountfs before xfs_log_cancel_mount. Reclamation calls
xfs_ail_push_all_sync, which gets stuck waiting for (3).

Therefore, call xlog_recover_cancel_intents if _process_intents fails.

Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster

Darrick J. Wong
2020-10-22 07:28:46 +0800
25219dbfa xfs: fix fallocate functions when rtextsize is larger than 1 ... Browse Code »

In commit fe341eb151ec, I forgot that xfs_free_file_space isn't strictly
a "remove mapped blocks" function. It is actually a function to zero
file space by punching out the middle and writing zeroes to the
unaligned ends of the specified range. Therefore, putting a rtextsize
alignment check in that function is wrong because that breaks unaligned
ZERO_RANGE on the realtime volume.

Furthermore, xfs_file_fallocate already has alignment checks for the
functions require the file range to be aligned to the size of a
fundamental allocation unit (which is 1 FSB on the data volume and 1 rt
extent on the realtime volume). Create a new helper to check fallocate
arguments against the realtiem allocation unit size, fix the fallocate
frontend to use it, fix free_file_space to delete the correct range, and
remove a now redundant check from insert_file_space.

NOTE: The realtime extent size is not required to be a power of two!

Fixes: fe341eb151ec ("xfs: ensure that fpunch, fcollapse, and finsert operations are aligned to rt extent size")
Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R

Darrick J. Wong
2020-10-22 00:05:19 +0800

20 Oct, 2020

1 commit

bbe85027c Merge tag 'xfs-5.10-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux ... Browse Code »

Pull more xfs updates from Darrick Wong:
"The second large pile of new stuff for 5.10, with changes even more
monumental than last week!

We are formally announcing the deprecation of the V4 filesystem format
in 2030. All users must upgrade to the V5 format, which contains
design improvements that greatly strengthen metadata validation,
supports reflink and online fsck, and is the intended vehicle for
handling timestamps past 2038. We're also deprecating the old Irix
behavioral tweaks in September 2025.

Coming along for the ride are two design changes to the deferred
metadata ops subsystem. One of the improvements is to retain correct
logical ordering of tasks and subtasks, which is a more logical design
for upper layers of XFS and will become necessary when we add atomic
file range swaps and commits. The second improvement to deferred ops
improves the scalability of the log by helping the log tail to move
forward during long-running operations. This reduces log contention
when there are a large number of threads trying to run transactions.

In addition to that, this fixes numerous small bugs in log recovery;
refactors logical intent log item recovery to remove the last
remaining place in XFS where we could have nested transactions; fixes
a couple of ways that intent log item recovery could fail in ways that
wouldn't have happened in the regular commit paths; fixes a deadlock
vector in the GETFSMAP implementation (which improves its performance
by 20%); and fixes serious bugs in the realtime growfs, fallocate, and
bitmap handling code.

Summary:

- Deprecate the V4 filesystem format, some disused mount options, and
some legacy sysctl knobs now that we can support dates into the
25th century. Note that removal of V4 support will not happen until
the early 2030s.

- Fix some probles with inode realtime flag propagation.

- Fix some buffer handling issues when growing a rt filesystem.

- Fix a problem where a BMAP_REMAP unmap call would free rt extents
even though the purpose of BMAP_REMAP is to avoid freeing the
blocks.

- Strengthen the dabtree online scrubber to check hash values on
child dabtree blocks.

- Actually log new intent items created as part of recovering log
intent items.

- Fix a bug where quotas weren't attached to an inode undergoing bmap
intent item recovery.

- Fix a buffer overrun problem with specially crafted log buffer
headers.

- Various cleanups to type usage and slightly inaccurate comments.

- More cleanups to the xattr, log, and quota code.

- Don't run the (slower) shared-rmap operations on attr fork
mappings.

- Fix a bug where we failed to check the LSN of finobt blocks during
replay and could therefore overwrite newer data with older data.

- Clean up the ugly nested transaction mess that log recovery uses to
stage intent item recovery in the correct order by creating a
proper data structure to capture recovered chains.

- Use the capture structure to resume intent item chains with the
same log space and block reservations as when they were captured.

- Fix a UAF bug in bmap intent item recovery where we failed to
maintain our reference to the incore inode if the bmap operation
needed to relog itself to continue.

- Rearrange the defer ops mechanism to finish newly created subtasks
of a parent task before moving on to the next parent task.

- Automatically relog intent items in deferred ops chains if doing so
would help us avoid pinning the log tail. This will help fix some
log scaling problems now and will facilitate atomic file updates
later.

- Fix a deadlock in the GETFSMAP implementation by using an internal
memory buffer to reduce indirect calls and copies to userspace,
thereby improving its performance by ~20%.

- Fix various problems when calling growfs on a realtime volume would
not fully update the filesystem metadata.

- Fix broken Kconfig asking about deprecated XFS when XFS is
disabled"

* tag 'xfs-5.10-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (48 commits)
xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=n
xfs: fix high key handling in the rt allocator's query_range function
xfs: annotate grabbing the realtime bitmap/summary locks in growfs
xfs: make xfs_growfs_rt update secondary superblocks
xfs: fix realtime bitmap/summary file truncation when growing rt volume
xfs: fix the indent in xfs_trans_mod_dquot
xfs: do the ASSERT for the arguments O_{u,g,p}dqpp
xfs: fix deadlock and streamline xfs_getfsmap performance
xfs: limit entries returned when counting fsmap records
xfs: only relog deferred intent items if free space in the log gets low
xfs: expose the log push threshold
xfs: periodically relog deferred intent items
xfs: change the order in which child and parent defer ops are finished
xfs: fix an incore inode UAF in xfs_bui_recover
xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering
xfs: clean up bmap intent item recovery checking
xfs: xfs_defer_capture should absorb remaining transaction reservation
xfs: xfs_defer_capture should absorb remaining block reservations
xfs: proper replay of deferred ops queued during log recovery
xfs: remove XFS_LI_RECOVERED
...

Linus Torvalds
2020-10-20 05:38:46 +0800

17 Oct, 2020

2 commits

894645546 xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=n ... Browse Code »

Pavel Machek complained that the question about supporting deprecated
XFS v4 comes up even when XFS is disabled. This clearly makes no sense,
so fix Kconfig.

Reported-by: Pavel Machek
Signed-off-by: Darrick J. Wong
Reviewed-by: Eric Sandeen

Darrick J. Wong
2020-10-17 06:34:28 +0800
d88850bd5 xfs: fix high key handling in the rt allocator's query_range function ... Browse Code »

Fix some off-by-one errors in xfs_rtalloc_query_range. The highest key
in the realtime bitmap is always one less than the number of rt extents,
which means that the key clamp at the start of the function is wrong.
The 4th argument to xfs_rtfind_forw is the highest rt extent that we
want to probe, which means that passing 1 less than the high key is
wrong. Finally, drop the rem variable that controls the loop because we
can compare the iteration point (rtstart) against the high key directly.

The sordid history of this function is that the original commit (fb3c3)
incorrectly passed (high_rec->ar_startblock - 1) as the 'limit' parameter
to xfs_rtfind_forw. This was wrong because the "high key" is supposed
to be the largest key for which the caller wants result rows, not the
key for the first row that could possibly be outside the range that the
caller wants to see.

A subsequent attempt (8ad56) to strengthen the parameter checking added
incorrect clamping of the parameters to the number of rt blocks in the
system (despite the bitmap functions all taking units of rt extents) to
avoid querying ranges past the end of rt bitmap file but failed to fix
the incorrect _rtfind_forw parameter. The original _rtfind_forw
parameter error then survived the conversion of the startblock and
blockcount fields to rt extents (a0e5c), and the most recent off-by-one
fix (a3a37) thought it was patching a problem when the end of the rt
volume is not in use, but none of these fixes actually solved the
original problem that the author was confused about the "limit" argument
to xfs_rtfind_forw.

Sadly, all four of these patches were written by this author and even
his own usage of this function and rt testing were inadequate to get
this fixed quickly.

Original-problem: fb3c3de2f65c ("xfs: add a couple of queries to iterate free extents in the rtbitmap")
Not-fixed-by: 8ad560d2565e ("xfs: strengthen rtalloc query range checks")
Not-fixed-by: a0e5c435babd ("xfs: fix xfs_rtalloc_rec units")
Fixes: a3a374bf1889 ("xfs: fix off-by-one error in xfs_rtalloc_query_range")
Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R

Darrick J. Wong
2020-10-17 06:34:28 +0800

15 Oct, 2020

1 commit

2fc61f25f Merge tag 'xfs-5.10-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux ... Browse Code »

Pull xfs updates from Darrick Wong:
"The biggest changes are two new features for the ondisk metadata: one
to record the sizes of the inode btrees in the AG to increase
redundancy checks and to improve mount times; and a second new feature
to support timestamps until the year 2486.

We also fixed a problem where reflinking into a file that requires
synchronous writes wouldn't actually flush the updates to disk; clean
up a fair amount of cruft; and started fixing some bugs in the
realtime volume code.

Summary:

- Clean up the buffer ioend calling path so that the retry strategy
isn't quite so scattered everywhere.

- Clean up m_sb_bp handling.

- New feature: storing inode btree counts in the AGI to speed up
certain mount time per-AG block reservation operatoins and add a
little more metadata redundancy.

- New feature: Widen inode timestamps and quota grace expiration
timestamps to support dates through the year 2486.

- Get rid of more of our custom buffer allocation API wrappers.

- Use a proper VLA for shortform xattr structure namevals.

- Force the log after reflinking or deduping into a file that is
opened with O_SYNC or O_DSYNC.

- Fix some math errors in the realtime allocator"

* tag 'xfs-5.10-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (42 commits)
xfs: ensure that fpunch, fcollapse, and finsert operations are aligned to rt extent size
xfs: make sure the rt allocator doesn't run off the end
xfs: Remove unneeded semicolon
xfs: force the log after remapping a synchronous-writes file
xfs: Convert xfs_attr_sf macros to inline functions
xfs: Use variable-size array for nameval in xfs_attr_sf_entry
xfs: Remove typedef xfs_attr_shortform_t
xfs: remove typedef xfs_attr_sf_entry_t
xfs: Remove kmem_zalloc_large()
xfs: enable big timestamps
xfs: trace timestamp limits
xfs: widen ondisk quota expiration timestamps to handle y2038+
xfs: widen ondisk inode timestamps to deal with y2038+
xfs: redefine xfs_ictimestamp_t
xfs: redefine xfs_timestamp_t
xfs: move xfs_log_dinode_to_disk to the log recovery code
xfs: refactor quota timestamp coding
xfs: refactor default quota grace period setting code
xfs: refactor quota expiration timer modification
xfs: explicitly define inode timestamp range
...

Linus Torvalds
2020-10-15 05:06:06 +0800

13 Oct, 2020

3 commits

ace74e797 xfs: annotate grabbing the realtime bitmap/summary locks in growfs ... Browse Code »

Use XFS_ILOCK_RT{BITMAP,SUM} to annotate grabbing the rt bitmap and
summary locks when we grow the realtime volume, just like we do most
everywhere else. This shuts up lockdep warnings about grabbing the
ILOCK class of locks recursively:

============================================
WARNING: possible recursive locking detected
5.9.0-rc4-djw #rc4 Tainted: G O
--------------------------------------------
xfs_growfs/4841 is trying to acquire lock:
ffff888035acc230 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0xac/0x1a0 [xfs]

but task is already holding lock:
ffff888035acedb0 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0xac/0x1a0 [xfs]

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&xfs_nondir_ilock_class);
lock(&xfs_nondir_ilock_class);

*** DEADLOCK ***

May be due to missing lock nesting notation

Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R

Darrick J. Wong
2020-10-13 23:41:31 +0800
7249c95a3 xfs: make xfs_growfs_rt update secondary superblocks ... Browse Code »

When we call growfs on the data device, we update the secondary
superblocks to reflect the updated filesystem geometry. We need to do
this for growfs on the realtime volume too, because a future xfs_repair
run could try to fix the filesystem using a backup superblock.

This was observed by the online superblock scrubbers while running
xfs/233. One can also trigger this by growing an rt volume, cycling the
mount, and creating new rt files.

Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R

Darrick J. Wong
2020-10-13 23:41:31 +0800
f4c32e87d xfs: fix realtime bitmap/summary file truncation when growing rt volume ... Browse Code »

The realtime bitmap and summary files are regular files that are hidden
away from the directory tree. Since they're regular files, inode
inactivation will try to purge what it thinks are speculative
preallocations beyond the incore size of the file. Unfortunately,
xfs_growfs_rt forgets to update the incore size when it resizes the
inodes, with the result that inactivating the rt inodes at unmount time
will cause their contents to be truncated.

Fix this by updating the incore size when we change the ondisk size as
part of updating the superblock. Note that we don't do this when we're
allocating blocks to the rt inodes because we actually want those blocks
to get purged if the growfs fails.

This fixes corruption complaints from the online rtsummary checker when
running xfs/233. Since that test requires rmap, one can also trigger
this by growing an rt volume, cycling the mount, and creating rt files.

Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R

Darrick J. Wong
2020-10-13 23:41:31 +0800

07 Oct, 2020

1 commit

e5b23740d xfs: fix the indent in xfs_trans_mod_dquot ... Browse Code »

The formatting is strange in xfs_trans_mod_dquot, so do a reindent.

Signed-off-by: Kaixu Xia
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Kaixu Xia
2020-10-07 23:40:29 +0800