14 Jan, 2021

1 commit

  • When a buffer is added to the LRU list, a reference is taken which is
    not dropped until the buffer is evicted from the LRU list. This is the
    correct behavior; however, this LRU reference will prevent the buffer
    from being dropped. This means that the buffer can't actually be dropped
    until it is selected for eviction. There's no bound on the time spent
    on the LRU list, which means that the buffer may be undroppable for
    very long periods of time. Given that migration involves dropping
    buffers, the associated page is now unmigratable for long periods of
    time as well. CMA relies on being able to migrate a specific range
    of pages, so these types of failures make CMA significantly
    less reliable, especially under high filesystem usage.

    Rather than waiting for the LRU algorithm to eventually kick out
    the buffer, explicitly remove the buffer from the LRU list when trying
    to drop it. There is still the possibility that the buffer
    could be added back to the list, but that indicates the buffer is
    still in use and would probably have other 'in use' indicators
    preventing it from being dropped.

    Note: a bug reported by "kernel test robot" led to a switch from
    using xas_for_each() to xa_for_each().
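
    A minimal sketch of the idea, assuming the per-CPU LRU arrays in
    fs/buffer.c (the helper name and exact eviction logic here are
    hypothetical):

    /* Hypothetical sketch: evict one buffer_head from this CPU's LRU so
     * its LRU reference is dropped immediately instead of at eviction. */
    static void __evict_bh_lru(void *arg)
    {
            struct bh_lru *b = this_cpu_ptr(&bh_lrus);
            struct buffer_head *bh = arg;   /* buffer we want to drop */
            int i;

            for (i = 0; i < BH_LRU_SIZE; i++) {
                    if (b->bhs[i] == bh) {
                            b->bhs[i] = NULL;
                            brelse(bh);     /* drop the LRU reference */
                            break;
                    }
            }
    }

    Such a helper would be invoked on every CPU (e.g. via on_each_cpu())
    before the final attempt to drop the buffer.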

    Bug: 174118021
    Link: https://lore.kernel.org/linux-mm/cover.1610572007.git.cgoldswo@codeaurora.org/
    Signed-off-by: Laura Abbott
    Signed-off-by: Chris Goldsworthy
    Cc: Matthew Wilcox
    Reported-by: kernel test robot
    Change-Id: I4a93c4ed81c57874764d12f3beea1194a30c13b2

    Laura Abbott
     

19 Oct, 2020

1 commit

  • Currently the remote memcg charging API consists of two functions:
    memalloc_use_memcg() and memalloc_unuse_memcg(), which set and clear the
    active memcg value, overriding the memcg of the current task.

    memalloc_use_memcg(target_memcg);

    memalloc_unuse_memcg();

    It works perfectly for allocations performed from a normal context,
    however an attempt to call it from an interrupt context or to nest two
    remote charging blocks will lead to incorrect accounting. On exit from
    the inner block the active memcg will be cleared instead of being
    restored.

    memalloc_use_memcg(target_memcg);

    memalloc_use_memcg(target_memcg_2);

    memalloc_unuse_memcg();

    Error: allocations here are charged to the memcg of the current
    process instead of target_memcg.

    memalloc_unuse_memcg();

    This patch extends the remote charging API by switching to a single
    function: struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg),
    which sets the new value and returns the old one. So a remote charging
    block will look like:

    old_memcg = set_active_memcg(target_memcg);

    set_active_memcg(old_memcg);
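
    With set_active_memcg(), nesting works because each block restores the
    value it replaced; a short sketch based on the pattern above:

    struct mem_cgroup *old_memcg, *old_memcg_2;

    old_memcg = set_active_memcg(target_memcg);
    /* allocations here are charged to target_memcg */
    old_memcg_2 = set_active_memcg(target_memcg_2);
    /* allocations here are charged to target_memcg_2 */
    set_active_memcg(old_memcg_2);  /* back to target_memcg */
    set_active_memcg(old_memcg);    /* back to the original memcg */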

    This patch is heavily based on the patch by Johannes Weiner, which can be
    found here: https://lkml.org/lkml/2020/5/28/806 .

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Shakeel Butt
    Cc: Johannes Weiner
    Cc: Dan Schatzberg
    Link: https://lkml.kernel.org/r/20200821212056.3769116-1-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     

08 Sep, 2020

1 commit

  • If block_write_full_page() is called for a page that is beyond current
    inode size, it will truncate page buffers for the page and return 0.
    This logic has been added in 2.5.62 in commit 81eb69062588 ("fix ext3
    BUG due to race with truncate") in history.git tree to fix a problem
    with ext3 in data=ordered mode. This particular problem doesn't exist
    anymore because ext3 is long gone and ext4 handles ordered data
    differently. Also normally buffers are invalidated by truncate code and
    there's no need to specially handle this in ->writepage() code.

    This invalidation of page buffers in block_write_full_page() is causing
    issues for filesystems (e.g. ext4 or ocfs2) when the block device is
    shrunk under the filesystem's hands and metadata buffers get discarded
    while being tracked by the journalling layer. Although it is obviously
    "not supported", it can cause kernel crashes like:

    [ 7986.689400] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    [ 7986.697197] PGD 0 P4D 0
    [ 7986.699724] Oops: 0002 [#1] SMP PTI
    [ 7986.703200] CPU: 4 PID: 203778 Comm: jbd2/dm-3-8 Kdump: loaded Tainted: G O --------- - - 4.18.0-147.5.0.5.h126.eulerosv2r9.x86_64 #1
    [ 7986.716438] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 1.57 08/11/2015
    [ 7986.723462] RIP: 0010:jbd2_journal_grab_journal_head+0x1b/0x40 [jbd2]
    ...
    [ 7986.810150] Call Trace:
    [ 7986.812595] __jbd2_journal_insert_checkpoint+0x23/0x70 [jbd2]
    [ 7986.818408] jbd2_journal_commit_transaction+0x155f/0x1b60 [jbd2]
    [ 7986.836467] kjournald2+0xbd/0x270 [jbd2]

    which is not great. The crash happens because bh->b_private is suddenly
    NULL although the BH_JBD flag is still set (this is because
    block_invalidatepage() cleared the BH_Mapped flag and a subsequent bh
    lookup found the buffer without BH_Mapped set, called
    init_page_buffers(), which rewrote bh->b_private). So just remove the
    invalidation in block_write_full_page().

    Note that the buffer cache invalidation when block device changes size
    is already careful to avoid similar problems by using
    invalidate_mapping_pages() which skips busy buffers so it was only this
    odd block_write_full_page() behavior that could tear down bdev buffers
    under filesystem's hands.
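
    A sketch of the hunk being removed, reconstructed from the description
    above (the exact code in fs/buffer.c may differ slightly):

    /* Is the page fully outside i_size? (truncate in progress) */
    if (page->index >= end_index + 1 || !offset) {
            do_invalidatepage(page, 0, PAGE_SIZE);  /* <-- dropped */
            unlock_page(page);
            return 0;       /* don't care */
    }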

    Reported-by: Ye Bin
    Signed-off-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    CC: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Jan Kara
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough [1]. Also, remove unnecessary
    fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
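
    A hypothetical switch statement showing the conversion (the case labels
    and helpers here are made up):

    switch (bh_state) {
    case BH_STATE_NEW:
            setup_buffer(bh);
            fallthrough;    /* replaces the old fall-through comment */
    case BH_STATE_MAPPED:
            submit_buffer(bh);
            break;
    }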

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

22 Aug, 2020

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Improvements to ext4's block allocator performance for very large file
    systems, especially when the file system or files which are highly
    fragmented. There is a new mount option, prefetch_block_bitmaps which
    will pull in the block bitmaps and set up the in-memory buddy bitmaps
    when the file system is initially mounted.

    Beyond that, a lot of bug fixes and cleanups. In particular, a number
    of changes to make ext4 more robust in the face of write errors or
    file system corruptions"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (46 commits)
    ext4: limit the length of per-inode prealloc list
    ext4: reorganize if statement of ext4_mb_release_context()
    ext4: add mb_debug logging when there are lost chunks
    ext4: Fix comment typo "the the".
    jbd2: clean up checksum verification in do_one_pass()
    ext4: change to use fallthrough macro
    ext4: remove unused parameter of ext4_generic_delete_entry function
    mballoc: replace seq_printf with seq_puts
    ext4: optimize the implementation of ext4_mb_good_group()
    ext4: delete invalid comments near ext4_mb_check_limits()
    ext4: fix typos in ext4_mb_regular_allocator() comment
    ext4: fix checking of directory entry validity for inline directories
    fs: prevent BUG_ON in submit_bh_wbc()
    ext4: correctly restore system zone info when remount fails
    ext4: handle add_system_zone() failure in ext4_setup_system_zone()
    ext4: fold ext4_data_block_valid_rcu() into the caller
    ext4: check journal inode extents more carefully
    ext4: don't allow overlapping system zones
    ext4: handle error of ext4_setup_system_zone() on remount
    ext4: delete the invalid BUGON in ext4_mb_load_buddy_gfp()
    ...

    Linus Torvalds
     

08 Aug, 2020

1 commit

  • If a device is hot-removed --- for example, when a physical device is
    unplugged from a PCIe slot or an nbd device's network is shut down ---
    this can result in a BUG_ON() crash in submit_bh_wbc(). This is
    because when the block device dies, the buffer heads will have
    their Buffer_Mapped flag cleared, leading to the crash in
    submit_bh_wbc().

    We had attempted to work around this problem in commit a17712c8
    ("ext4: check superblock mapped prior to committing"). Unfortunately,
    it's still possible to hit the BUG_ON(!buffer_mapped(bh)) if the
    device dies between the work-around check in ext4_commit_super()
    and the eventual call to submit_bh_wbc():

    Code path:
    ext4_commit_super
        check if 'buffer_mapped(sbh)' is false, return

    Xianting Tian
     

04 Aug, 2020

1 commit

  • Pull core block updates from Jens Axboe:
    "Good amount of cleanups and tech debt removals in here, and as a
    result, the diffstat shows a nice net reduction in code.

    - Softirq completion cleanups (Christoph)

    - Stop using ->queuedata (Christoph)

    - Cleanup bd claiming (Christoph)

    - Use check_events, moving away from the legacy media change
    (Christoph)

    - Use inode i_blkbits consistently (Christoph)

    - Remove old unused writeback congestion bits (Christoph)

    - Cleanup/unify submission path (Christoph)

    - Use bio_uninit consistently, instead of bio_disassociate_blkg
    (Christoph)

    - sbitmap cleared bits handling (John)

    - Request merging blktrace event addition (Jan)

    - sysfs add/remove race fixes (Luis)

    - blk-mq tag fixes/optimizations (Ming)

    - Duplicate words in comments (Randy)

    - Flush deferral cleanup (Yufen)

    - IO context locking/retry fixes (John)

    - struct_size() usage (Gustavo)

    - blk-iocost fixes (Chengming)

    - blk-cgroup IO stats fixes (Boris)

    - Various little fixes"

    * tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block: (135 commits)
    block: blk-timeout: delete duplicated word
    block: blk-mq-sched: delete duplicated word
    block: blk-mq: delete duplicated word
    block: genhd: delete duplicated words
    block: elevator: delete duplicated word and fix typos
    block: bio: delete duplicated words
    block: bfq-iosched: fix duplicated word
    iocost_monitor: start from the oldest usage index
    iocost: Fix check condition of iocg abs_vdebt
    block: Remove callback typedefs for blk_mq_ops
    block: Use non _rcu version of list functions for tag_set_list
    blk-cgroup: show global disk stats in root cgroup io.stat
    blk-cgroup: make iostat functions visible to stat printing
    block: improve discard bio alignment in __blkdev_issue_discard()
    block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers
    block: defer flush request no matter whether we have elevator
    block: make blk_timeout_init() static
    block: remove retry loop in ioc_release_fn()
    block: remove unnecessary ioc nested locking
    block: integrate bd_start_claiming into __blkdev_get
    ...

    Linus Torvalds
     

09 Jul, 2020

1 commit

  • Wire up ext4 to support inline encryption via the helper functions which
    fs/crypto/ now provides. This includes:

    - Adding a mount option 'inlinecrypt' which enables inline encryption
    on encrypted files where it can be used.

    - Setting the bio_crypt_ctx on bios that will be submitted to an
    inline-encrypted file.

    Note: submit_bh_wbc() in fs/buffer.c also needed to be patched for
    this part, since ext4 sometimes uses ll_rw_block() on file data; see
    the sketch after this list.

    - Not adding logically discontiguous data to bios that will be submitted
    to an inline-encrypted file.

    - Not doing filesystem-layer crypto on inline-encrypted files.
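
    A sketch of that submit_bh_wbc() hunk, assuming the
    fscrypt_set_bio_crypt_ctx_bh() helper exposed by fs/crypto/:

    bio = bio_alloc(GFP_NOIO, 1);

    /* tag the bio with the inline-encryption context, if any, so the
     * block layer can program the hardware keyslot for this IO */
    fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);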

    Co-developed-by: Satya Tangirala
    Signed-off-by: Satya Tangirala
    Reviewed-by: Theodore Ts'o
    Link: https://lore.kernel.org/r/20200702015607.1215430-5-satyat@google.com
    Signed-off-by: Eric Biggers

    Eric Biggers
     

01 Jul, 2020

1 commit


03 Jun, 2020

2 commits

  • Since the new pair of functions has been introduced, we can use them to
    clean up the code in buffer.c.

    Signed-off-by: Guoqing Jiang
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Cc: Alexander Viro
    Link: http://lkml.kernel.org/r/20200517214718.468-5-guoqing.jiang@cloud.ionos.com
    Signed-off-by: Linus Torvalds

    Guoqing Jiang
     
  • When syncing out a block device (à la __sync_blockdev), any error
    encountered will only be recorded in the bd_inode's mapping. When the
    blockdev contains a filesystem however, we'd like to also record the
    error in the super_block that's stored there.

    Make mark_buffer_write_io_error also record the error in the
    corresponding super_block when a writeback error occurs and the block
    device contains a mounted superblock.

    Since superblocks are RCU freed, hold the rcu_read_lock to ensure that
    the superblock doesn't go away while we're marking it.
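
    A sketch of the change as described, assuming the bd_super field and
    errseq_set() of that era (the existing per-mapping error recording is
    elided):

    void mark_buffer_write_io_error(struct buffer_head *bh)
    {
            struct super_block *sb;

            set_buffer_write_io_error(bh);
            /* ... existing mapping_set_error() calls elided ... */

            /* superblocks are RCU-freed: hold rcu_read_lock while
             * recording the writeback error in the mounted sb */
            rcu_read_lock();
            sb = READ_ONCE(bh->b_bdev->bd_super);
            if (sb)
                    errseq_set(&sb->s_wb_err, -EIO);
            rcu_read_unlock();
    }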

    Signed-off-by: Jeff Layton
    Signed-off-by: Andrew Morton
    Reviewed-by: Jan Kara
    Cc: Al Viro
    Cc: Andres Freund
    Cc: Matthew Wilcox
    Cc: David Howells
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Link: http://lkml.kernel.org/r/20200428135155.19223-3-jlayton@kernel.org
    Signed-off-by: Linus Torvalds

    Jeff Layton
     

25 Apr, 2020

1 commit

  • Pull block fixes from Jens Axboe:
    "A few fixes/changes that should go into this release:

    - null_blk zoned fixes (Damien)

    - blkdev_close() sync improvement (Douglas)

    - Fix regression in blk-iocost that impacted (at least) systemtap
    (Waiman)

    - Comment fix, header removal (Zhiqiang, Jianpeng)"

    * tag 'block-5.7-2020-04-24' of git://git.kernel.dk/linux-block:
    null_blk: Cleanup zoned device initialization
    null_blk: Fix zoned command handling
    block: remove unused header
    blk-iocost: Fix error on iocost_ioc_vrate_adj
    bdev: Reduce time holding bd_mutex in sync in blkdev_close()
    buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason.

    Linus Torvalds
     

18 Apr, 2020

1 commit

  • The free_more_memory() function was completely removed in commit
    bc48f001de12 ("buffer: eliminate the need to call free_more_memory()
    in __getblk_slow()").

    So the comment and the `WB_REASON_FREE_MORE_MEM` reason referring to
    free_more_memory() are no longer needed.

    Fixes: bc48f001de12 ("buffer: eliminate the need to call free_more_memory() in __getblk_slow()")
    Reviewed-by: Jan Kara
    Signed-off-by: Zhiqiang Liu
    Signed-off-by: Jens Axboe

    Zhiqiang Liu
     

16 Apr, 2020

1 commit

  • Since commit a8ac900b8163 ("ext4: use non-movable memory for the
    superblock"), buffers for the ext4 superblock were allocated using
    the sb_bread_unmovable() helper, which allocates buffer heads
    out of non-movable memory blocks. This was necessary to avoid blocking
    page migration and causing CMA allocation failures.

    However commit 85c8f176a611 ("ext4: preload block group descriptors")
    broke this by introducing pre-reading of the ext4 superblock.
    The problem is that __breadahead() is using __getblk() underneath,
    which allocates buffer heads out of movable memory.

    It resulted in page migration failures I've seen on a machine
    with an ext4 partition and a preallocated cma area.

    Fix this by introducing sb_breadahead_unmovable() and
    __breadahead_gfp() helpers which use non-movable memory for buffer
    head allocations and use them for the ext4 superblock readahead.
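
    A sketch of the two helpers, closely following the description (passing
    a gfp of 0 avoids the __GFP_MOVABLE flag that plain __getblk() adds):

    void __breadahead_gfp(struct block_device *bdev, sector_t block,
                          unsigned size, gfp_t gfp)
    {
            struct buffer_head *bh = __getblk_gfp(bdev, block, size, gfp);

            if (likely(bh)) {
                    ll_rw_block(REQ_OP_READ, REQ_RAHEAD, 1, &bh);
                    brelse(bh);
            }
    }

    static inline void sb_breadahead_unmovable(struct super_block *sb,
                                               sector_t block)
    {
            __breadahead_gfp(sb->s_bdev, block, sb->s_blocksize, 0);
    }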

    Reviewed-by: Andreas Dilger
    Fixes: 85c8f176a611 ("ext4: preload block group descriptors")
    Signed-off-by: Roman Gushchin
    Link: https://lore.kernel.org/r/20200229001411.128010-1-guro@fb.com
    Signed-off-by: Theodore Ts'o

    Roman Gushchin
     

31 Mar, 2020

1 commit

  • Pull locking updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Continued user-access cleanups in the futex code.

    - percpu-rwsem rewrite that uses its own waitqueue and atomic_t
    instead of an embedded rwsem. This addresses a couple of
    weaknesses, but the primary motivation was complications on the -rt
    kernel.

    - Introduce raw lock nesting detection on lockdep
    (CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal
    lock differences. This too originates from -rt.

    - Reuse lockdep zapped chain_hlocks entries, to conserve RAM
    footprint on distro-ish kernels running into the "BUG:
    MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep
    chain-entries pool.

    - Misc cleanups, smaller fixes and enhancements - see the changelog
    for details"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
    fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
    thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
    Documentation/locking/locktypes: Minor copy editor fixes
    Documentation/locking/locktypes: Further clarifications and wordsmithing
    m68knommu: Remove mm.h include from uaccess_no.h
    x86: get rid of user_atomic_cmpxchg_inatomic()
    generic arch_futex_atomic_op_inuser() doesn't need access_ok()
    x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
    x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
    objtool: whitelist __sanitizer_cov_trace_switch()
    [parisc, s390, sparc64] no need for access_ok() in futex handling
    sh: no need of access_ok() in arch_futex_atomic_op_inuser()
    futex: arch_futex_atomic_op_inuser() calling conventions change
    completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
    lockdep: Add posixtimer context tracing bits
    lockdep: Annotate irq_work
    lockdep: Add hrtimer context tracing bits
    lockdep: Introduce wait-type checks
    completion: Use simple wait queues
    sched/swait: Prepare usage in completions
    ...

    Linus Torvalds
     

28 Mar, 2020

1 commit

  • Bit spinlocks are problematic if PREEMPT_RT is enabled, because they
    disable preemption, which is undesired for latency reasons and breaks when
    regular spinlocks are taken within the bit_spinlock locked region because
    regular spinlocks are converted to 'sleeping spinlocks' on RT.

    PREEMPT_RT replaced the bit spinlocks with regular spinlocks to avoid this
    problem. The replacement was done conditionally at compile time, but
    Christoph requested to do an unconditional conversion.

    Jan suggested to move the spinlock into an existing padding hole, which
    avoids a size increase of struct buffer_head on production kernels.

    As a benefit the lock gains lockdep coverage.

    [ bigeasy: Remove the wrapper and use always spinlock_t and move it into
    the padding hole ]
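
    A before/after sketch of the conversion in the async-read completion
    path, with b_uptodate_lock being the new spinlock_t in the padding hole:

    /* before: bit spinlock in b_state, preemption and interrupts off */
    local_irq_save(flags);
    bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
    /* ... walk the page's buffers ... */
    bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
    local_irq_restore(flags);

    /* after: a regular spinlock_t, which also gains lockdep coverage */
    spin_lock_irqsave(&first->b_uptodate_lock, flags);
    /* ... walk the page's buffers ... */
    spin_unlock_irqrestore(&first->b_uptodate_lock, flags);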

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Jan Kara
    Cc: Christoph Hellwig
    Link: https://lkml.kernel.org/r/20191118132824.rclhrbujqh4b4g4d@linutronix.de

    Thomas Gleixner
     

25 Mar, 2020

1 commit


25 Jan, 2020

1 commit


09 Jan, 2020

1 commit

  • Commit 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod")
    adds bio_truncate() for handling bio EOD. However, bio_truncate()
    doesn't use the passed 'op' parameter from guard_bio_eod's callers.

    So bio_truncate() may retrieve the wrong 'op', and zeroing pages may
    not be done for a READ bio.

    Fix this issue by moving guard_bio_eod() after bio_set_op_attrs()
    in submit_bh_wbc() so that bio_truncate() can always retrieve the
    correct op info.

    Also remove the 'op' parameter from guard_bio_eod() since it isn't
    used any more.
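
    A sketch of the reordering in submit_bh_wbc(), assuming the call sites
    of that era:

    bio_set_op_attrs(bio, op, op_flags);

    /* op is now set on the bio, so bio_truncate() can tell whether this
     * is a READ and zero the truncated pages accordingly */
    guard_bio_eod(bio);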

    Cc: Carlos Maiolino
    Cc: linux-fsdevel@vger.kernel.org
    Fixes: 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod")
    Signed-off-by: Ming Lei

    Fold in kerneldoc and bio_op() change.

    Signed-off-by: Jens Axboe

    Ming Lei
     

29 Dec, 2019

1 commit

  • Some filesystems, such as vfat, may send a bio which crosses the device
    boundary, and the worse thing is that an IO request starting within
    device boundaries can contain more than one segment past EOD.

    Commit dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors")
    tries to fix this issue by returning -EIO for this situation. However,
    this way fs user code loses the chance to handle -EIO, so
    sync_inodes_sb() may hang forever.

    Also the current truncating of the last segment is dangerous, because
    updating the last bvec means the bvec table is no longer immutable, and
    fs bio users may not retrieve the truncated pages via
    bio_for_each_segment_all() in their .end_io callbacks.

    Fix this issue by supporting multi-segment truncating, and the
    approach is simpler (see the sketch after this list):

    - just update the bio size, since the block layer can build correct
    bvecs from the updated bio size. The bvec table then stays truly
    immutable.

    - zero all truncated segments for a read bio
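
    A sketch of a multi-segment truncate along these lines (close to, but
    not necessarily identical to, the upstream bio_truncate()):

    void bio_truncate(struct bio *bio, unsigned int new_size)
    {
            struct bio_vec bv;
            struct bvec_iter iter;
            unsigned int done = 0;
            bool truncated = false;

            if (new_size >= bio->bi_iter.bi_size)
                    return;

            if (bio_op(bio) != REQ_OP_READ)
                    goto exit;

            /* zero every part of a read that now lies past new_size */
            bio_for_each_segment(bv, bio, iter) {
                    if (done + bv.bv_len > new_size) {
                            unsigned offset = truncated ? 0 : new_size - done;

                            zero_user(bv.bv_page, bv.bv_offset + offset,
                                      bv.bv_len - offset);
                            truncated = true;
                    }
                    done += bv.bv_len;
            }
    exit:
            /* shrinking bi_size is enough; the bvec table stays immutable */
            bio->bi_iter.bi_size = new_size;
    }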

    Cc: Carlos Maiolino
    Cc: linux-fsdevel@vger.kernel.org
    Fixed-by: dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors")
    Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

01 Dec, 2019

2 commits

  • The declarations of __block_write_begin_int and guard_bio_eod are needed
    from internal.h, so include it to fix the following sparse warnings:

    fs/buffer.c:1930:5: warning: symbol '__block_write_begin_int' was not declared. Should it be static?
    fs/buffer.c:2994:6: warning: symbol 'guard_bio_eod' was not declared. Should it be static?
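
    The fix is presumably just pulling in the header that declares them:

    /* fs/buffer.c */
    #include "internal.h"   /* __block_write_begin_int, guard_bio_eod */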

    Link: http://lkml.kernel.org/r/20191011170039.16100-1-ben.dooks@codethink.co.uk
    Signed-off-by: Ben Dooks
    Reviewed-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Dooks
     
  • Use true/false for bool return type of has_bh_in_lru().

    Link: http://lkml.kernel.org/r/20191029040529.GA7625@saurav
    Signed-off-by: Saurav Girepunje
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Saurav Girepunje
     

15 Nov, 2019

1 commit

  • After each filesystem block (as represented by a buffer_head) has been
    read from disk by block_read_full_page(), decrypt it if needed. The
    decryption is done on the fscrypt_read_workqueue.

    This is the final change needed to support ext4 encryption with
    blocksize != PAGE_SIZE, and it's a fairly small change now that
    CONFIG_FS_ENCRYPTION is a bool and fs/crypto/ exposes functions to
    decrypt individual blocks and to enqueue work on the fscrypt workqueue.

    Don't try to add fs-verity support yet, as the fs/verity/ support layer
    isn't ready for sub-page blocks yet. Just add fscrypt support for now.

    Almost all the new code is compiled away when CONFIG_FS_ENCRYPTION=n.
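
    A sketch of the post-read decryption step, assuming the helpers exposed
    by fs/crypto/ (the context struct here is illustrative):

    struct decrypt_bh_ctx {
            struct work_struct work;
            struct buffer_head *bh;
    };

    static void decrypt_bh(struct work_struct *work)
    {
            struct decrypt_bh_ctx *ctx =
                    container_of(work, struct decrypt_bh_ctx, work);
            struct buffer_head *bh = ctx->bh;
            int err;

            /* decrypt one filesystem block in place */
            err = fscrypt_decrypt_pagecache_blocks(bh->b_page, bh->b_size,
                                                   bh_offset(bh));
            end_buffer_async_read(bh, err == 0);
            kfree(ctx);
    }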

    Cc: Chandan Rajendra
    Signed-off-by: Eric Biggers
    Link: https://lore.kernel.org/r/20191023033312.361355-2-ebiggers@kernel.org
    Signed-off-by: Theodore Ts'o

    Eric Biggers
     

16 Jul, 2019

1 commit

  • Pull more block updates from Jens Axboe:
    "A later pull request with some followup items. I had some vacation
    coming up to the merge window, so certain things items were delayed a
    bit. This pull request also contains fixes that came in within the
    last few days of the merge window, which I didn't want to push right
    before sending you a pull request.

    This contains:

    - NVMe pull request, mostly fixes, but also a few minor items on the
    feature side that were timing constrained (Christoph et al)

    - Report zones fixes (Damien)

    - Removal of dead code (Damien)

    - Turn on cgroup psi memstall (Josef)

    - block cgroup MAINTAINERS entry (Konstantin)

    - Flush init fix (Josef)

    - blk-throttle low iops timing fix (Konstantin)

    - nbd resize fixes (Mike)

    - nbd 0 blocksize crash fix (Xiubo)

    - block integrity error leak fix (Wenwen)

    - blk-cgroup writeback and priority inheritance fixes (Tejun)"

    * tag 'for-linus-20190715' of git://git.kernel.dk/linux-block: (42 commits)
    MAINTAINERS: add entry for block io cgroup
    null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONED
    block: Limit zone array allocation size
    sd_zbc: Fix report zones buffer allocation
    block: Kill gfp_t argument of blkdev_report_zones()
    block: Allow mapping of vmalloc-ed buffers
    block/bio-integrity: fix a memory leak bug
    nvme: fix NULL deref for fabrics options
    nbd: add netlink reconfigure resize support
    nbd: fix crash when the blksize is zero
    block: Disable write plugging for zoned block devices
    block: Fix elevator name declaration
    block: Remove unused definitions
    nvme: fix regression upon hot device removal and insertion
    blk-throttle: fix zero wait time for iops throttled group
    block: Fix potential overflow in blk_report_zones()
    blkcg: implement REQ_CGROUP_PUNT
    blkcg, writeback: Implement wbc_blkcg_css()
    blkcg, writeback: Add wbc->no_cgroup_owner
    blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner()
    ...

    Linus Torvalds
     

10 Jul, 2019

1 commit


28 Jun, 2019

1 commit


21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

01 May, 2019

2 commits

  • In iomap_write_end, we're not holding a page reference anymore when
    calling the page_done callback, but the callback needs that reference to
    access the page. To fix that, move the put_page call in
    __generic_write_end into the callers of __generic_write_end. Then, in
    iomap_write_end, put the page after calling the page_done callback.
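
    A sketch of the resulting order in iomap_write_end, assuming the
    page_done hook as added by 63899c6f8851:

    __generic_write_end(inode, pos, copied, page);  /* no put_page() inside */
    if (iomap->page_done)
            iomap->page_done(inode, pos, copied, page, iomap);
    put_page(page);         /* dropped only after the callback has run */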

    Reported-by: Jan Kara
    Fixes: 63899c6f8851 ("iomap: add a page_done callback")
    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Andreas Gruenbacher
     
  • The VFS-internal __generic_write_end helper always returns the value of
    its @copied argument. This can be confusing, and it isn't very useful
    anyway, so turn __generic_write_end into a function returning void
    instead.
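
    In other words, the prototype change (sketch):

    /* before: always returned @copied */
    int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
                            struct page *page);

    /* after: callers already know @copied, so return nothing */
    void __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
                             struct page *page);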

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Andreas Gruenbacher
     

01 Mar, 2019

1 commit

  • guard_bio_eod() can truncate a segment in a bio to allow it to do IO on
    the odd last sectors of a device.

    It already checks if the IO starts past EOD, but it does not consider
    the possibility that an IO request starting within device boundaries
    can contain more than one segment past EOD.

    In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
    underflow bvec->bv_len.

    Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
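
    A sketch of that guard (the exact form upstream may differ):

    if (unlikely(truncated_bytes > PAGE_SIZE)) {
            /* more than one segment spans EOD: leave the bio alone and
             * let the block layer fail it with -EIO */
            return;
    }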

    This situation has been found on filesystems such as isofs and vfat,
    which don't check the device size before mount. If the device is
    smaller than the filesystem itself, a readahead on such a filesystem
    that spans EOD can trigger this situation, leading to a call to
    zero_user() with a wrong size, possibly corrupting memory.

    I didn't see any crash, or didn't let the system run long enough to
    check if memory corruption would be hit somewhere, but adding
    instrumentation to guard_bio_eod() to check the truncated_bytes size
    was enough to see the error.

    The following script can trigger the error.

    MNT=/mnt
    IMG=./DISK.img
    DEV=/dev/loop0

    mkfs.vfat $IMG
    mount $IMG $MNT
    cp -R /etc $MNT &> /dev/null
    umount $MNT

    losetup -D

    losetup --find --show --sizelimit 16247280 $IMG
    mount $DEV $MNT

    find $MNT -type f -exec cat {} + >/dev/null

    Kudos to Eric Sandeen for coming up with the reproducer above

    Reviewed-by: Ming Lei
    Signed-off-by: Carlos Maiolino
    Signed-off-by: Jens Axboe

    Carlos Maiolino
     

15 Feb, 2019

2 commits

  • Pull in 5.0-rc6 to avoid a dumb merge conflict with fs/iomap.c.
    This is needed since io_uring is now based on the block branch,
    to avoid a conflict between the multi-page bvecs and the bits
    of io_uring that touch the core block parts.

    * tag 'v5.0-rc6': (525 commits)
    Linux 5.0-rc6
    x86/mm: Make set_pmd_at() paravirt aware
    MAINTAINERS: Update the ocores i2c bus driver maintainer, etc
    blk-mq: remove duplicated definition of blk_mq_freeze_queue
    Blk-iolatency: warn on negative inflight IO counter
    blk-iolatency: fix IO hang due to negative inflight counter
    MAINTAINERS: unify reference to xen-devel list
    x86/mm/cpa: Fix set_mce_nospec()
    futex: Handle early deadlock return correctly
    futex: Fix barrier comment
    net: dsa: b53: Fix for failure when irq is not defined in dt
    blktrace: Show requests without sector
    mips: cm: reprime error cause
    mips: loongson64: remove unreachable(), fix loongson_poweroff().
    sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
    geneve: should not call rt6_lookup() when ipv6 was disabled
    KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221)
    KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222)
    kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974)
    signal: Better detection of synchronous signals
    ...

    Jens Axboe
     
  • Once multi-page bvec is enabled, the last bvec may include more than one
    page; this patch uses mp_bvec_last_segment() to truncate the bio.
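
    A sketch of the change in guard_bio_eod(), assuming the
    mp_bvec_last_segment() helper from that series:

    /* truncate the last bvec, then clear its tail for reads */
    bio->bi_iter.bi_size -= truncated_bytes;
    bvec->bv_len -= truncated_bytes;

    if (op == REQ_OP_READ) {
            struct bio_vec bv;

            /* get the last single-page segment of the multi-page bvec */
            mp_bvec_last_segment(bvec, &bv);
            zero_user(bv.bv_page, bv.bv_offset + bv.bv_len,
                      truncated_bytes);
    }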

    Reviewed-by: Omar Sandoval
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

07 Feb, 2019

1 commit

  • When something lets __find_get_block_slow() hit the all_mapped path, it
    calls printk() 100+ times per second. But there is no need to print the
    same message with such high frequency; it is just asking for a stall
    warning, or at least bloating log files.

    [ 399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
    [ 399.873324][T15342] b_state=0x00000029, b_size=512
    [ 399.878403][T15342] device loop0 blocksize: 4096
    [ 399.883296][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
    [ 399.890400][T15342] b_state=0x00000029, b_size=512
    [ 399.895595][T15342] device loop0 blocksize: 4096
    [ 399.900556][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
    [ 399.907471][T15342] b_state=0x00000029, b_size=512
    [ 399.912506][T15342] device loop0 blocksize: 4096

    This patch reduces the frequency to at most once per second, in addition
    to concatenating the three lines into one.

    [ 399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8, b_state=0x00000029, b_size=512, device loop0 blocksize: 4096
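
    A sketch of the rate-limited form, using the standard
    DEFINE_RATELIMIT_STATE pattern:

    static DEFINE_RATELIMIT_STATE(last_warned, HZ, 1);

    if (all_mapped && __ratelimit(&last_warned)) {
            printk("__find_get_block_slow() failed. block=%llu, "
                   "b_blocknr=%llu, b_state=0x%08lx, b_size=%zu, "
                   "device %pg blocksize: %d\n",
                   (unsigned long long)block,
                   (unsigned long long)bh->b_blocknr,
                   bh->b_state, bh->b_size, bdev,
                   1 << bd_inode->i_blkbits);
    }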

    Signed-off-by: Tetsuo Handa
    Reviewed-by: Jan Kara
    Cc: Dmitry Vyukov
    Signed-off-by: Jens Axboe

    Tetsuo Handa
     

05 Jan, 2019

1 commit


08 Dec, 2018

1 commit

  • One of the goals of this series is to remove a separate reference to
    the css of the bio. This can and should be accessed via bio_blkcg(). In
    this patch, wbc_init_bio() now requires a bio to have a device
    associated with it.
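
    So a writeback submission path presumably looks like this sketch:

    bio = bio_alloc(GFP_NOIO, 1);
    bio_set_dev(bio, bh->b_bdev);   /* device must be associated first */
    wbc_init_bio(wbc, bio);         /* css then reachable via bio_blkcg() */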

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou
     

03 Nov, 2018

1 commit

  • Pull block layer fixes from Jens Axboe:
    "The biggest part of this pull request is the revert of the blkcg
    cleanup series. It had one fix earlier for a stacked device issue, but
    another one was reported. Rather than play whack-a-mole with this,
    revert the entire series and try again for the next kernel release.

    Apart from that, only small fixes/changes.

    Summary:

    - Indentation fixup for mtip32xx (Colin Ian King)

    - The blkcg cleanup series revert (Dennis Zhou)

    - Two NVMe fixes. One fixing a regression in the nvme request
    initialization in this merge window, causing nvme-fc to not work.
    The other is a suspend/resume p2p resource issue (James, Keith)

    - Fix sg discard merge, allowing us to merge in cases where we didn't
    before (Jianchao Wang)

    - Call rq_qos_exit() after the queue is frozen, preventing a hang
    (Ming)

    - Fix brd queue setup, fixing an oops if we fail setting up all
    devices (Ming)"

    * tag 'for-linus-20181102' of git://git.kernel.dk/linux-block:
    nvme-pci: fix conflicting p2p resource adds
    nvme-fc: fix request private initialization
    blkcg: revert blkcg cleanups series
    block: brd: associate with queue until adding disk
    block: call rq_qos_exit() after queue is frozen
    mtip32xx: clean an indentation issue, remove extraneous tabs
    block: fix the DISCARD request merge

    Linus Torvalds
     

02 Nov, 2018

1 commit

  • This reverts a series committed earlier due to a null pointer exception
    bug reported in [1]. It seems there are edge case interactions that I
    did not consider and will need some time to understand what causes the
    adverse interactions.

    The original series can be found in [2] with a follow up series in [3].

    [1] https://www.spinics.net/lists/cgroups/msg20719.html
    [2] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/
    [3] https://lore.kernel.org/lkml/20181020185612.51587-1-dennis@kernel.org/

    This reverts the following commits:
    d459d853c2ed, b2c3fa546705, 101246ec02b5, b3b9f24f5fcc, e2b0989954ae,
    f0fcb3ec89f3, c839e7a03f92, bdc2491708c4, 74b7c02a9bc1, 5bf9a1f3b4ef,
    a7b39b4e961c, 07b05bcc3213, 49f4c2dc2b50, 27e6fa996c53

    Signed-off-by: Dennis Zhou
    Signed-off-by: Jens Axboe

    Dennis Zhou
     

29 Oct, 2018

1 commit

  • Pull XArray conversion from Matthew Wilcox:
    "The XArray provides an improved interface to the radix tree data
    structure, providing locking as part of the API, specifying GFP flags
    at allocation time, eliminating preloading, less re-walking the tree,
    more efficient iterations and not exposing RCU-protected pointers to
    its users.

    This patch set

    1. Introduces the XArray implementation

    2. Converts the pagecache to use it

    3. Converts memremap to use it

    The page cache is the most complex and important user of the radix
    tree, so converting it was most important. Converting the memremap
    code removes the only other user of the multiorder code, which allows
    us to remove the radix tree code that supported it.

    I have 40+ followup patches to convert many other users of the radix
    tree over to the XArray, but I'd like to get this part in first. The
    other conversions haven't been in linux-next and aren't suitable for
    applying yet, but you can see them in the xarray-conv branch if you're
    interested"

    * 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits)
    radix tree: Remove multiorder support
    radix tree test: Convert multiorder tests to XArray
    radix tree tests: Convert item_delete_rcu to XArray
    radix tree tests: Convert item_kill_tree to XArray
    radix tree tests: Move item_insert_order
    radix tree test suite: Remove multiorder benchmarking
    radix tree test suite: Remove __item_insert
    memremap: Convert to XArray
    xarray: Add range store functionality
    xarray: Move multiorder_check to in-kernel tests
    xarray: Move multiorder_shrink to kernel tests
    xarray: Move multiorder account test in-kernel
    radix tree test suite: Convert iteration test to XArray
    radix tree test suite: Convert tag_tagged_items to XArray
    radix tree: Remove radix_tree_clear_tags
    radix tree: Remove radix_tree_maybe_preload_order
    radix tree: Remove split/join code
    radix tree: Remove radix_tree_update_node_t
    page cache: Finish XArray conversion
    dax: Convert page fault handlers to XArray
    ...

    Linus Torvalds
     

21 Oct, 2018

1 commit


22 Sep, 2018

1 commit

  • One of the goals of this series is to remove a separate reference to
    the css of the bio. This can and should be accessed via bio_blkcg. In
    this patch, the wbc_init_bio call is changed such that it must be called
    after a queue has been associated with the bio.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)