05 Nov, 2020

2 commits

  • The iomap writepage error handling logic is a mash of old and
    slightly broken XFS writepage logic. When keepwrite writeback state
    tracking was introduced in XFS in commit 0d085a529b42 ("xfs: ensure
    WB_SYNC_ALL writeback handles partial pages correctly"), XFS had an
    additional cluster writeback context that scanned ahead of
    ->writepage() to process dirty pages over the current ->writepage()
    extent mapping. This context expected a dirty page and required
    retention of the TOWRITE tag on partial page processing so the
    higher level writeback context would revisit the page (in contrast
    to ->writepage(), which passes a page with the dirty bit already
    cleared).

    The cluster writeback mechanism was eventually removed and some of
    the error handling logic folded into the primary writeback path in
    commit 150d5be09ce4 ("xfs: remove xfs_cancel_ioend"). This patch
    accidentally conflated the two contexts by using the keepwrite logic
    in ->writepage() without accounting for the fact that the page is
    not dirty. Further, the keepwrite logic has no practical effect on
    the core ->writepage() caller (write_cache_pages()) because it never
    revisits a page in the current function invocation.

    Technically, the page should be redirtied for the keepwrite logic to
    have any effect. Otherwise, write_cache_pages() may find the tagged
    page but will skip it since it is clean. Even if the page was
    redirtied, however, there is still no practical effect to keepwrite
    since write_cache_pages() does not wrap around within a single
    invocation of the function. Therefore, the dirty page would simply
    end up retagged on the next writeback sequence over the associated
    range.

    All that being said, none of this really matters because redirtying
    a partially processed page introduces a potential infinite redirty
    -> writeback failure loop that deviates from the current design
    principle of clearing the dirty state on writepage failure to avoid
    building up too much dirty, unreclaimable memory on the system.
    Therefore, drop the spurious keepwrite usage and dirty state
    clearing logic from iomap_writepage_map(), treat the partially
    processed page the same as a fully processed page, and let the
    imminent ioend failure clean up the writeback state.

    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     
  • iomap writeback mapping failure only calls into ->discard_page() if
    the current page has not been added to the ioend. Accordingly, the
    XFS callback assumes a full page discard and invalidation. This is
    problematic for sub-page block size filesystems where some portion
    of a page might have been mapped successfully before a failure to
    map a delalloc block occurs. ->discard_page() is not called in that
    error scenario and the bio is explicitly failed by iomap via the
    error return from ->prepare_ioend(). As a result, the filesystem
    leaks delalloc blocks and corrupts the filesystem block counters.

    Since XFS is the only user of ->discard_page(), tweak the semantics
    to invoke the callback unconditionally on mapping errors and provide
    the file offset that failed to map. Update xfs_discard_page() to
    discard the corresponding portion of the file and pass the range
    along to iomap_invalidatepage(). The latter already properly handles
    both full and sub-page scenarios by not changing any iomap or page
    state on sub-page invalidations.
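
    A sketch of the resulting error path in iomap_writepage_map(), based
    on the iomap code of that era (illustrative rather than the exact
    patch):

      if (unlikely(error)) {
              /*
               * Let the filesystem know what portion of the current page
               * failed to map. If the page hasn't been added to an ioend,
               * it won't be affected by I/O completion, so discard and
               * unmap it manually.
               */
              if (wpc->ops->discard_page)
                      wpc->ops->discard_page(page, file_offset);
              if (!count) {
                      ClearPageUptodate(page);
                      unlock_page(page);
                      goto done;
              }
      }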

    Signed-off-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     

28 Sep, 2020

3 commits

  • The iomap completion routine can deadlock with btrfs_fallocate()
    because of the call to generic_write_sync():

    P0                                P1
    inode_lock()                      fallocate(FALLOC_FL_ZERO_RANGE)
    __iomap_dio_rw()                    inode_lock()      <blocked>

    inode_unlock()                      <acquires lock>
                                        inode_dio_wait()  <blocked>
    iomap_dio_complete()
      generic_write_sync()
        btrfs_file_fsync()
          inode_lock()    <deadlock>

    inode_dio_end() is used to notify the end of DIO data in order
    to synchronize with truncate. Call inode_dio_end() before calling
    generic_write_sync(), so filesystems can lock i_rwsem during a sync.

    This matches the way it is done in fs/direct-io.c:dio_complete().
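
    A minimal sketch of the reordering inside iomap_dio_complete()
    (assuming the IOMAP_DIO_NEED_SYNC flag and the surrounding structure
    of the function at the time):

      /* end the DIO first so ->fsync can take i_rwsem without deadlocking */
      inode_dio_end(file_inode(iocb->ki_filp));

      /* only now push a DSYNC write out to stable storage */
      if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
              ret = generic_write_sync(iocb, ret);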

    Signed-off-by: Goldwyn Rodrigues
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Josef Bacik
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Goldwyn Rodrigues
     
  • This is to avoid the deadlock caused in btrfs because of O_DIRECT |
    O_DSYNC.

    Filesystems such as btrfs require i_rwsem while performing a sync on
    a file. iomap_dio_rw() is called with i_rwsem held. This leads to a
    deadlock because of:

    iomap_dio_complete()
      generic_write_sync()
        btrfs_sync_file()

    Separate out iomap_dio_complete() from iomap_dio_rw(), so filesystems
    can call iomap_dio_complete() after unlocking i_rwsem.
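
    A filesystem can then structure its direct write roughly as follows
    (a sketch; the exact __iomap_dio_rw() parameter list is elided and
    the iomap/dio ops names are placeholders):

      struct iomap_dio *dio;

      dio = __iomap_dio_rw(iocb, iter, &fs_iomap_ops, &fs_dio_ops, ...);
      inode_unlock(inode);                 /* drop i_rwsem first */
      if (IS_ERR_OR_NULL(dio))
              return PTR_ERR_OR_ZERO(dio);
      return iomap_dio_complete(dio);      /* may call generic_write_sync() */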

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Josef Bacik
    Signed-off-by: Goldwyn Rodrigues
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • For filesystems with block size < page size, we need to set all the
    per-block uptodate bits if the page was already uptodate at the time
    we create the per-block metadata. This can happen if the page is
    invalidated (eg by a write to drop_caches) but ultimately not removed
    from the page cache.

    This is a data corruption issue as page writeback skips blocks which
    are marked !uptodate.
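
    The fix amounts to filling the whole per-block bitmap when the
    metadata is created for an already-uptodate page, along these lines
    (a sketch of iomap_page_create()):

      if (PageUptodate(page))
              bitmap_fill(iop->uptodate, nr_blocks);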

    Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
    Signed-off-by: Matthew Wilcox (Oracle)
    Reported-by: Qian Cai
    Cc: Brian Foster
    Reviewed-by: Gao Xiang
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Matthew Wilcox (Oracle)
     

21 Sep, 2020

10 commits

  • Pass the full length to iomap_zero() and dax_iomap_zero(), and have
    them return how many bytes they actually handled. This is preparatory
    work for handling THP, although it looks like DAX could actually take
    advantage of it if there's a larger contiguous area.
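
    Under the new convention, the zeroing loop looks roughly like this
    (a sketch; both helpers now return a byte count on success or a
    negative errno):

      do {
              s64 bytes;

              if (IS_DAX(inode))
                      bytes = dax_iomap_zero(pos, length, iomap);
              else
                      bytes = iomap_zero(inode, pos, length, iomap, srcmap);
              if (bytes < 0)
                      return bytes;

              pos += bytes;
              length -= bytes;
      } while (length > 0);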

    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Matthew Wilcox (Oracle)
     
  • iomap_write_end cannot return an error, so switch it to return
    size_t instead of int and remove the error checking from the callers.
    Also convert the arguments to size_t from unsigned int, in case anyone
    ever wants to support a page size larger than 2GB.
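
    The resulting prototype, approximately:

      static size_t iomap_write_end(struct inode *inode, loff_t pos,
                      size_t len, size_t copied, struct page *page,
                      struct iomap *iomap, struct iomap *srcmap);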

    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Matthew Wilcox (Oracle)
     
  • Instead of counting bio segments, count the number of bytes submitted.
    This insulates us from the block layer's definition of what a 'same page'
    is, which is not necessarily clear once THPs are involved.
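
    For the read path, the pattern looks roughly like this (a sketch;
    the byte counter's field name is assumed):

      /* submission: account the bytes added to the bio */
      if (iop)
              atomic_add(plen, &iop->read_bytes_pending);

      /* completion: the page is done once all pending bytes complete */
      if (!iop || atomic_sub_and_test(bvec->bv_len, &iop->read_bytes_pending))
              unlock_page(page);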

    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Matthew Wilcox (Oracle)
     
  • Instead of counting bio segments, count the number of bytes submitted.
    This insulates us from the block layer's definition of what a 'same page'
    is, which is not necessarily clear once THPs are involved.
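
    The write-side counterpart, at ioend completion (again a sketch with
    an assumed field name):

      if (!iop || atomic_sub_and_test(len, &iop->write_bytes_pending))
              end_page_writeback(page);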

    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Matthew Wilcox (Oracle)
     
  • Size the uptodate array dynamically to support larger pages in the
    page cache. With a 64kB page, we're only saving 8 bytes per page today,
    but with a 2MB maximum page size, we'd have to allocate more than 4kB
    per page. Add a few debugging assertions.
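
    The struct then ends in a flexible array sized by the number of
    blocks in the page, something like this sketch:

      struct iomap_page {
              atomic_t        read_bytes_pending;
              atomic_t        write_bytes_pending;
              spinlock_t      uptodate_lock;
              unsigned long   uptodate[];     /* sized at allocation time */
      };

      iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
                    GFP_NOFS | __GFP_NOFAIL);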

    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Matthew Wilcox (Oracle)
     
  • Now that the bitmap is protected by a spinlock, we can use the
    more efficient bitmap ops instead of individual test/set bit ops.
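
    For example, marking a range of blocks uptodate becomes a single
    bitmap operation under the lock (sketch):

      spin_lock_irqsave(&iop->uptodate_lock, flags);
      bitmap_set(iop->uptodate, first_blk, nr_blks);
      if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
              SetPageUptodate(page);
      spin_unlock_irqrestore(&iop->uptodate_lock, flags);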

    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Matthew Wilcox (Oracle)
     
  • We can skip most of the initialisation, although spinlocks still
    need explicit initialisation as architectures may use a non-zero
    value to indicate unlocked. The comment is no longer useful as
    attach_page_private() handles the refcount now.
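
    The create path then reduces to roughly (sketch):

      iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
                    GFP_NOFS | __GFP_NOFAIL); /* zeroes counters and bitmap */
      spin_lock_init(&iop->uptodate_lock);    /* zero may not mean unlocked */
      attach_page_private(page, iop);         /* takes the page reference */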

    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Matthew Wilcox (Oracle)
     
  • This helper is useful for both THPs and for supporting block size larger
    than page size. Convert all users that I could find (we have a few
    different ways of writing this idiom, and I may have missed some).
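
    The helper in question is presumably i_blocks_per_page(); its
    definition is essentially one line (sketch):

      static inline unsigned int i_blocks_per_page(struct inode *inode,
                      struct page *page)
      {
              /* thp_size() evaluates to PAGE_SIZE for non-THP pages */
              return thp_size(page) >> inode->i_blkbits;
      }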

    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Acked-by: Dave Kleikamp

    Matthew Wilcox (Oracle)
     
  • If iomap_unshare_actor() unshares to an inline iomap, the page was
    not being flushed. block_write_end() and __iomap_write_end() already
    contain flushes, so adding it to iomap_write_end_inline() seems like
    the best place. That means we can remove it from iomap_write_actor().
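
    A sketch of iomap_write_end_inline() with the flush added, based on
    the code of that era:

      static size_t iomap_write_end_inline(struct inode *inode,
                      struct page *page, struct iomap *iomap, loff_t pos,
                      size_t copied)
      {
              void *addr;

              flush_dcache_page(page);  /* moved from iomap_write_actor() */
              addr = kmap_atomic(page);
              memcpy(iomap->inline_data + pos, addr + pos, copied);
              kunmap_atomic(addr);

              mark_inode_dirty(inode);
              return copied;
      }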

    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Darrick J. Wong

    Matthew Wilcox (Oracle)
     
  • Signed-off-by: Nikolay Borisov
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Nikolay Borisov
     

10 Sep, 2020

4 commits

  • When bringing (portions of) a page uptodate, we were marking blocks that
    were zeroed as being uptodate, but not blocks that were read from storage.

    Like the previous commit, this problem was found with generic/127 and
    a kernel which failed readahead I/Os. This bug causes writes to be
    silently lost when working with flaky storage.
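
    The fix moves the uptodate marking so it covers blocks read from
    storage as well as zeroed ones, roughly as in this sketch of the
    __iomap_write_begin() loop body:

      if (iomap_block_needs_zeroing(inode, srcmap, block_start)) {
              zero_user_segments(page, poff, from, to, poff + plen);
      } else {
              int status = iomap_read_page_sync(block_start, page,
                              poff, plen, srcmap);
              if (status)
                      return status;
      }
      iomap_set_range_uptodate(page, poff, plen);  /* now for reads too */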

    Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Matthew Wilcox (Oracle)
     
  • If we find a page in write_begin which is !Uptodate, we need
    to clear any error on the page before starting to read data
    into it. This matches how filemap_fault(), do_read_cache_page()
    and generic_file_buffered_read() handle PageError on !Uptodate pages.
    When calling iomap_set_range_uptodate() in __iomap_write_begin(), blocks
    were not being marked as uptodate.

    This was found with generic/127 and a specially modified kernel which
    would fail (some) readahead I/Os. The test read some bytes in a prior
    page which caused readahead to extend into page 0x34. There was
    a subsequent write to page 0x34, followed by a read to page 0x34.
    Because the blocks were still marked as !Uptodate, the read caused all
    blocks to be re-read, overwriting the write. With this change, and the
    next one, the bytes which were written are marked as being Uptodate, so
    even though the page is still marked as !Uptodate, the blocks containing
    the written data are not re-read from storage.
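
    A sketch of the added clearing in iomap_write_begin(), after the
    page is grabbed:

      page = grab_cache_page_write_begin(inode->i_mapping,
                      pos >> PAGE_SHIFT, AOP_FLAG_NOFS);
      if (!page)
              return -ENOMEM;

      /* drop any stale error so the upcoming read isn't falsely failed */
      if (!PageUptodate(page))
              ClearPageError(page);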

    Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Matthew Wilcox (Oracle)
     
  • When a direct I/O write falls back to buffered I/O entirely, dio->size
    will be 0 in iomap_dio_complete. Function invalidate_inode_pages2_range
    will try to invalidate the rest of the address space. If there are any
    dirty pages in that range, the write will fail and a "Page cache
    invalidation failure on direct I/O" error will be logged.

    On gfs2, this can be reproduced as follows:

    xfs_io \
    -c "open -ft foo" -c "pwrite 4k 4k" -c "close" \
    -c "open -d foo" -c "pwrite 0 4k"

    Fix this by recognizing 0-length writes.
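
    The fix is essentially one added condition in iomap_dio_complete()
    (sketch):

      /* skip invalidation entirely for 0-byte (fully fallen-back) writes */
      if (!dio->error && dio->size && (dio->flags & IOMAP_DIO_WRITE) &&
          inode->i_mapping->nrpages) {
              err = invalidate_inode_pages2_range(inode->i_mapping,
                              offset >> PAGE_SHIFT,
                              (offset + dio->size - 1) >> PAGE_SHIFT);
              if (err)
                      dio_warn_stale_pagecache(iocb->ki_filp);
      }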

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Andreas Gruenbacher
     
  • It is trivial for unprivileged users to trigger a WARN_ON_ONCE(1) in
    iomap_dio_actor(), which would taint the kernel, or worse, panic if
    panic_on_warn or panic_on_taint is set. Hence, just convert it to
    pr_warn_ratelimited() to let users know their workloads are racing.
    Thanks to Dave Chinner for the initial analysis of the racing
    reproducers.
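
    The conversion is a one-liner in spirit (sketch; the exact message
    text may differ from what was merged):

      /* was: WARN_ON_ONCE(1); */
      pr_warn_ratelimited("Direct I/O collision with buffered writes!\n");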

    Signed-off-by: Qian Cai
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Qian Cai
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants
    with the new pseudo-keyword macro fallthrough [1]. Also, remove
    fall-through markings where they are unnecessary.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
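
    For example (a hypothetical switch; fallthrough expands to a
    compiler attribute where one is available):

      switch (state) {
      case STATE_INIT:
              setup();
              fallthrough;    /* was a "fall through" comment */
      case STATE_RUN:
              run();
              break;
      }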

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

06 Aug, 2020

2 commits

  • Failing to invalidate the page cache means data is incoherent, which
    is a very bad state for the system. Always fall back to buffered I/O
    through the page cache if we can't invalidate mappings.
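
    A sketch of the write-side submission path after this change, per
    the commit's description:

      /*
       * Try to invalidate cache pages for the range we are writing.
       * If this invalidation fails, let the caller fall back to
       * buffered I/O.
       */
      if (invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
                      end >> PAGE_SHIFT)) {
              ret = -ENOTBLK;   /* callers treat this as "retry buffered" */
              goto out_free_dio;
      }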

    Signed-off-by: Christoph Hellwig
    Acked-by: Dave Chinner
    Reviewed-by: Goldwyn Rodrigues
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Acked-by: Bob Peterson
    Acked-by: Damien Le Moal
    Reviewed-by: Theodore Ts'o # for ext4
    Reviewed-by: Andreas Gruenbacher # for gfs2
    Reviewed-by: Ritesh Harjani

    Christoph Hellwig
     
  • The historic requirement for XFS to invalidate cached pages on
    direct IO reads has been lost in the twisty pages of history - it was
    inherited from Irix, which implemented page cache invalidation on
    read as a method of working around problems synchronising page
    cache state with uncached IO.

    XFS has carried this ever since. In the initial linux ports it was
    necessary to get mmap and DIO to play "ok" together and not
    immediately corrupt data. This was the state of play until the linux
    kernel had infrastructure to track unwritten extents and synchronise
    page faults with allocations and unwritten extent conversions
    (->page_mkwrite infrastructure). IOWs, the page cache invalidation
    on DIO read was necessary to prevent trivial data corruptions. This
    didn't solve all the problems, though.

    There were performance problems if we didn't invalidate the entire
    page cache over the file on read - we couldn't easily determine if
    the cached pages were over the range of the IO, and invalidation
    required taking a serialising lock (i_mutex) on the inode. This
    serialising lock was an issue for XFS, as it was the only exclusive
    lock in the direct IO read path.

    Hence if there were any cached pages, we'd just invalidate the
    entire file in one go so that subsequent IOs didn't need to take the
    serialising lock. This was a problem that prevented ranged
    invalidation from being particularly useful for avoiding the
    remaining coherency issues. This was solved with the conversion of
    i_mutex to i_rwsem and the conversion of the XFS inode IO lock to
    use i_rwsem. Hence we could now just do ranged invalidation and the
    performance problem went away.

    However, page cache invalidation was still needed to serialise
    sub-page/sub-block zeroing via direct IO against buffered IO because
    bufferhead state attached to the cached page could get out of whack
    when direct IOs were issued. We've removed bufferheads from the
    XFS code, and we don't carry any extent state on the cached pages
    anymore, and so this problem has gone away, too.

    IOWs, it would appear that we don't have any good reason to be
    invalidating the page cache on DIO reads anymore. Hence remove the
    invalidation on read: it is unnecessary overhead, it is no longer
    needed to maintain coherency between mmap/buffered access and direct
    IO, and removing it prevents anyone from using direct IO reads to
    intentionally invalidate the page cache of a file.

    Signed-off-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Matthew Wilcox (Oracle)
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

07 Jul, 2020

1 commit

  • Make sure iomap_end is always called when iomap_begin succeeds.

    Without this fix, iomap_end won't be called when a filesystem's
    iomap_begin operation returns an invalid mapping, bypassing any
    unlocking done in iomap_end. With this fix, the unlocking will still
    happen.
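
    A condensed sketch of the fixed control flow in iomap_apply():

      ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
      if (ret)
              return ret;
      if (WARN_ON(iomap.offset > pos) || WARN_ON(iomap.length == 0)) {
              written = -EIO;
              goto out;  /* previously returned here, skipping iomap_end */
      }

      written = actor(inode, pos, length, data, &iomap, &srcmap);

      out:
      if (ops->iomap_end)
              ret = ops->iomap_end(inode, pos, length,
                                   written > 0 ? written : 0, flags, &iomap);
      return written ? written : ret;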

    This bug was found by Bob Peterson during code review. It's unlikely
    that such iomap_begin bugs will survive to affect users, so backporting
    this fix seems unnecessary.

    Fixes: ae259a9c8593 ("fs: introduce iomap infrastructure")
    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Andreas Gruenbacher
     

06 Jun, 2020

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "A lot of bug fixes and cleanups for ext4, including:

    - Fix performance problems found in dioread_nolock now that it is the
    default, caused by transaction leaks.

    - Clean up fiemap handling in ext4

    - Clean up and refactor multiple block allocator (mballoc) code

    - Fix a problem with mballoc on smaller file systems running out of
    blocks because they couldn't properly use blocks that had been
    reserved by inode preallocation.

    - Fix a race in ext4_sync_parent() versus rename()

    - Simplify the error handling in the extent manipulation code

    - Make sure all metadata I/O errors are reflected to
    ext4_ext_dirty()'s and ext4_make_inode_dirty()'s callers.

    - Avoid passing an error pointer to brelse in ext4_xattr_set()

    - Fix a race which could result in freeing an inode on the dirty
    list in data=journal mode.

    - Fix refcount handling if ext4_iget() fails

    - Fix a crash in generic/019 caused by a corrupted extent node"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits)
    ext4: avoid unnecessary transaction starts during writeback
    ext4: don't block for O_DIRECT if IOCB_NOWAIT is set
    ext4: remove the access_ok() check in ext4_ioctl_get_es_cache
    fs: remove the access_ok() check in ioctl_fiemap
    fs: handle FIEMAP_FLAG_SYNC in fiemap_prep
    fs: move fiemap range validation into the file systems instances
    iomap: fix the iomap_fiemap prototype
    fs: move the fiemap definitions out of fs.h
    fs: mark __generic_block_fiemap static
    ext4: remove the call to fiemap_check_flags in ext4_fiemap
    ext4: split _ext4_fiemap
    ext4: fix fiemap size checks for bitmap files
    ext4: fix EXT4_MAX_LOGICAL_BLOCK macro
    add comment for ext4_dir_entry_2 file_type member
    jbd2: avoid leaking transaction credits when unreserving handle
    ext4: drop ext4_journal_free_reserved()
    ext4: mballoc: use lock for checking free blocks while retrying
    ext4: mballoc: refactor ext4_mb_good_group()
    ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling
    ext4: mballoc: refactor ext4_mb_discard_preallocations()
    ...

    Linus Torvalds
     

04 Jun, 2020

4 commits

  • By moving FIEMAP_FLAG_SYNC handling to fiemap_prep we ensure it is
    handled once instead of duplicated, but can still be done under fs locks,
    like xfs/iomap intended with its duplicate handling. Also make sure the
    error value of filemap_write_and_wait is propagated to user space.
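
    A condensed sketch of fiemap_prep(), per its description here:

      int fiemap_prep(struct inode *inode, struct fiemap_extent_info *fieinfo,
                      u64 start, u64 *len, u32 supported_flags)
      {
              int ret = 0;

              /* ... validate and truncate the range against s_maxbytes ... */

              if (fieinfo->fi_flags & FIEMAP_FLAG_SYNC)
                      ret = filemap_write_and_wait(inode->i_mapping);
              return ret;     /* any writeback error reaches user space */
      }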

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Amir Goldstein
    Reviewed-by: Darrick J. Wong
    Link: https://lore.kernel.org/r/20200523073016.2944131-8-hch@lst.de
    Signed-off-by: Theodore Ts'o

    Christoph Hellwig
     
  • Replace fiemap_check_flags with a fiemap_prep helper that also takes the
    inode and mapped range, and performs the sanity check and truncation
    previously done in fiemap_check_range. This way the validation is inside
    the file system itself and thus properly works for the stacked overlayfs
    case as well.
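
    A filesystem's ->fiemap then starts with something like (sketch):

      ret = fiemap_prep(inode, fieinfo, start, &len, FIEMAP_FLAG_XATTR);
      if (ret)
              return ret;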

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Amir Goldstein
    Reviewed-by: Darrick J. Wong
    Link: https://lore.kernel.org/r/20200523073016.2944131-7-hch@lst.de
    Signed-off-by: Theodore Ts'o

    Christoph Hellwig
     
  • iomap_fiemap should take u64 start and len arguments, just like the
    ->fiemap prototype.
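
    The fixed prototype, approximately:

      int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi,
                      u64 start, u64 len, const struct iomap_ops *ops);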

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ritesh Harjani
    Reviewed-by: Darrick J. Wong
    Link: https://lore.kernel.org/r/20200523073016.2944131-6-hch@lst.de
    Signed-off-by: Theodore Ts'o

    Christoph Hellwig
     
  • No need to pull the fiemap definitions into almost every file in the
    kernel build.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ritesh Harjani
    Reviewed-by: Darrick J. Wong
    Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de
    Signed-off-by: Theodore Ts'o

    Christoph Hellwig
     

03 Jun, 2020

5 commits

  • Pull btrfs updates from David Sterba:
    "Highlights:

    - speed up dead root detection during orphan cleanup, e.g. when
    there are many deleted subvolumes waiting to be cleaned, the trees
    are now looked up in a radix tree instead of with an O(N^2) search

    - snapshot creation with inherited qgroup will mark the qgroup
    inconsistent, requires a rescan

    - send will emit file capabilities after chown, this produces a
    stream that does not need postprocessing to set the capabilities
    again

    - direct io ported to iomap infrastructure, cleaned up and simplified
    code, notably removing last use of struct buffer_head in btrfs code

    Core changes:

    - factor out backreference iteration, to be used by ordinary
    backreferences and relocation code

    - improved global block reserve utilization
    * better logic to serialize requests
    * increased maximum available for unlink
    * improved handling on large pages (64K)

    - direct io cleanups and fixes
    * simplify layering, where cloned bios were unnecessarily created
    for some cases
    * error handling fixes (submit, endio)
    * remove repair worker thread, used to avoid deadlocks during
    repair

    - refactored block group reading code, preparatory work for new type
    of block group storage that should improve mount time on large
    filesystems

    Cleanups:

    - cleaned up (and slightly sped up) set/get helpers for metadata data
    structure members

    - root bit REF_COWS got renamed to SHAREABLE to reflect that the
    blocks of the tree get shared either among subvolumes or with the
    relocation trees

    Fixes:

    - when subvolume deletion fails due to ENOSPC, the filesystem is not
    turned read-only

    - device scan deals with devices from other filesystems that changed
    ownership due to overwrite (mkfs)

    - fix a race between scrub and block group removal/allocation

    - fix long standing bug of a runaway balance operation, printing the
    same line to the syslog, caused by a stale status bit on a reloc
    tree that prevented progress

    - fix corrupt log due to concurrent fsync of inodes with shared
    extents

    - fix space underflow for NODATACOW and buffered writes when it for
    some reason needs to fall back to COW mode

    * tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (133 commits)
    btrfs: fix space_info bytes_may_use underflow during space cache writeout
    btrfs: fix space_info bytes_may_use underflow after nocow buffered write
    btrfs: fix wrong file range cleanup after an error filling dealloc range
    btrfs: remove redundant local variable in read_block_for_search
    btrfs: open code key_search
    btrfs: split btrfs_direct_IO to read and write part
    btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
    fs: remove dio_end_io()
    btrfs: switch to iomap_dio_rw() for dio
    iomap: remove lockdep_assert_held()
    iomap: add a filesystem hook for direct I/O bio submission
    fs: export generic_file_buffered_read()
    btrfs: turn space cache writeout failure messages into debug messages
    btrfs: include error on messages about failure to write space/inode caches
    btrfs: remove useless 'fail_unlock' label from btrfs_csum_file_blocks()
    btrfs: do not ignore error from btrfs_next_leaf() when inserting checksums
    btrfs: make checksum item extension more efficient
    btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents
    btrfs: unexport btrfs_compress_set_level()
    btrfs: simplify iget helpers
    ...

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:
    "Core block changes that have been queued up for this release:

    - Remove dead blk-throttle and blk-wbt code (Guoqing)

    - Include pid in blktrace note traces (Jan)

    - Don't spew I/O errors on wouldblock termination (me)

    - Zone append addition (Johannes, Keith, Damien)

    - IO accounting improvements (Konstantin, Christoph)

    - blk-mq hardware map update improvements (Ming)

    - Scheduler dispatch improvement (Salman)

    - Inline block encryption support (Satya)

    - Request map fixes and improvements (Weiping)

    - blk-iocost tweaks (Tejun)

    - Fix for timeout failing with error injection (Keith)

    - Queue re-run fixes (Douglas)

    - CPU hotplug improvements (Christoph)

    - Queue entry/exit improvements (Christoph)

    - Move DMA drain handling to the few drivers that use it (Christoph)

    - Partition handling cleanups (Christoph)"

    * tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits)
    block: mark bio_wouldblock_error() bio with BIO_QUIET
    blk-wbt: rename __wbt_update_limits to wbt_update_limits
    blk-wbt: remove wbt_update_limits
    blk-throttle: remove tg_drain_bios
    blk-throttle: remove blk_throtl_drain
    null_blk: force complete for timeout request
    blk-mq: drain I/O when all CPUs in a hctx are offline
    blk-mq: add blk_mq_all_tag_iter
    blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx
    blk-mq: use BLK_MQ_NO_TAG in more places
    blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG
    blk-mq: move more request initialization to blk_mq_rq_ctx_init
    blk-mq: simplify the blk_mq_get_request calling convention
    blk-mq: remove the bio argument to ->prepare_request
    nvme: force complete cancelled requests
    blk-mq: blk-mq: provide forced completion method
    block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds
    block: blk-crypto-fallback: remove redundant initialization of variable err
    block: reduce part_stat_lock() scope
    block: use __this_cpu_add() instead of access by smp_processor_id()
    ...

    Linus Torvalds
     
  • Since the new helper pair attach_page_private()/detach_page_private()
    has been introduced, we can call them to clean up the code in iomap.
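
    Usage is symmetric (sketch):

      /* attach: sets PagePrivate, takes a page ref, stores the pointer */
      attach_page_private(page, iop);

      /* detach: clears PagePrivate, drops the ref, returns the pointer */
      iop = detach_page_private(page);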

    Signed-off-by: Guoqing Jiang
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Darrick J. Wong
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Link: http://lkml.kernel.org/r/20200517214718.468-7-guoqing.jiang@cloud.ionos.com
    Signed-off-by: Linus Torvalds

    Guoqing Jiang
     
  • Use the new readahead operation in iomap. Convert XFS and ZoneFS to use
    it.
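
    The conversion pairs the new address_space operation with the iomap
    entry point, roughly (a sketch using the XFS names):

      /* in the aops table */
      .readahead = xfs_vm_readahead,

      static void
      xfs_vm_readahead(struct readahead_control *rac)
      {
              iomap_readahead(rac, &xfs_read_iomap_ops);
      }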

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Reviewed-by: William Kucharski
    Cc: Chao Yu
    Cc: Cong Wang
    Cc: Dave Chinner
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: John Hubbard
    Cc: Joseph Qi
    Cc: Junxiao Bi
    Cc: Michal Hocko
    Cc: Zi Yan
    Cc: Johannes Thumshirn
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-26-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Implement the new readahead aop and convert all callers (block_dev,
    exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6,
    reiserfs & udf).

    The callers are all trivial except for GFS2 & OCFS2.
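
    For the trivial block-based filesystems the conversion follows one
    pattern (sketch with hypothetical names):

      static void myfs_readahead(struct readahead_control *rac)
      {
              mpage_readahead(rac, myfs_get_block);
      }

      /* in myfs_aops: */
      .readahead = myfs_readahead,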

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: Junxiao Bi # ocfs2
    Reviewed-by: Joseph Qi # ocfs2
    Reviewed-by: Dave Chinner
    Reviewed-by: John Hubbard
    Reviewed-by: Christoph Hellwig
    Reviewed-by: William Kucharski
    Cc: Chao Yu
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: Michal Hocko
    Cc: Zi Yan
    Cc: Johannes Thumshirn
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-17-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

25 May, 2020

2 commits

  • Filesystems such as btrfs can perform direct I/O without holding
    inode->i_rwsem in some cases, such as writing within i_size. So,
    remove the lockdep_assert_held() check in iomap_dio_rw().

    Reviewed-by: Darrick J. Wong
    Signed-off-by: Goldwyn Rodrigues
    Signed-off-by: David Sterba

    Goldwyn Rodrigues
     
  • This helps filesystems perform tasks on the bio while submitting it
    for I/O. These could be post-write operations such as data CRC or
    data replication for fs-handled RAID.
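
    The hook lives in struct iomap_dio_ops, roughly as follows (a
    sketch; the parameter lists are approximate):

      struct iomap_dio_ops {
              int (*end_io)(struct kiocb *iocb, ssize_t size, int error,
                            unsigned flags);
              blk_qc_t (*submit_io)(struct inode *inode, struct iomap *iomap,
                            struct bio *bio, loff_t file_offset);
      };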

    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Nikolay Borisov
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Goldwyn Rodrigues
    Signed-off-by: David Sterba

    Goldwyn Rodrigues
     

13 May, 2020

1 commit

  • A sync dio could be big, or may take a long time in discard or in
    case of IO failure.

    We have prevented task hung warnings in submit_bio_wait() and
    blk_execute_rq(), so apply the same trick to prevent them from
    happening in sync dio.

    Add a blk_io_schedule() helper that uses io_schedule_timeout() to
    prevent the task hung warning.
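
    The helper is essentially the following (sketch):

      static inline void blk_io_schedule(void)
      {
              /* bound the sleep so the hung task detector stays quiet */
              unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;

              if (timeout)
                      io_schedule_timeout(timeout);
              else
                      io_schedule();
      }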

    Signed-off-by: Ming Lei
    Reviewed-by: Bart Van Assche
    Cc: Salman Qazi
    Cc: Jesse Barnes
    Cc: Christoph Hellwig
    Cc: Bart Van Assche
    Cc: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Ming Lei
     

30 Apr, 2020

1 commit

  • We had better warn the fibmap user and not return a truncated, and
    therefore incorrect, block map address if the block address returned
    by bmap() is greater than INT_MAX (since the user supplies an
    integer pointer).

    It's better to pr_warn() for all users of ioctl_fibmap() and return
    a proper error code rather than silently letting an FS corruption
    happen if the user tries to fiddle around with the returned block
    map address.

    We fix this by returning an error code of -ERANGE and returning 0 as
    the block mapping address when it is greater than INT_MAX.

    iomap_bmap() can be called from either of two paths: by a user
    calling the ioctl_fibmap() interface to get the block mapping
    address, or by some filesystem via the internal bmap() kernel API.
    The bmap() kernel API is well equipped to handle u64 addresses.

    The WARN condition in iomap_bmap_actor() was mainly added to warn
    all fibmap users. Now that we warn all fibmap users directly and
    make sure to return 0 as the block map address when it is greater
    than INT_MAX, we can remove this logic from iomap_bmap_actor().
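
    A sketch of the resulting check in ioctl_fibmap() (the warning text
    is abbreviated):

      error = bmap(inode, &block);
      if (error)
              ur_block = 0;
      else if (block > INT_MAX) {
              error = -ERANGE;
              ur_block = 0;
              pr_warn_ratelimited("fibmap: would truncate block address\n");
      } else {
              ur_block = block;
      }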

    Signed-off-by: Ritesh Harjani
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Ritesh Harjani
     

09 Apr, 2020

1 commit