Eric Lee / smarc-fsl-linux-kernel

25 Aug, 2022

1 commit

5a01e45b9 f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page()

[ Upstream commit 141170b759e03958f296033bb7001be62d1d363b ]

As Dipanjan Das reported, syzkaller found an f2fs bug as below:

RIP: 0010:f2fs_new_node_page+0x19ac/0x1fc0 fs/f2fs/node.c:1295
Call Trace:
write_all_xattrs fs/f2fs/xattr.c:487 [inline]
__f2fs_setxattr+0xe76/0x2e10 fs/f2fs/xattr.c:743
f2fs_setxattr+0x233/0xab0 fs/f2fs/xattr.c:790
f2fs_xattr_generic_set+0x133/0x170 fs/f2fs/xattr.c:86
__vfs_setxattr+0x115/0x180 fs/xattr.c:182
__vfs_setxattr_noperm+0x125/0x5f0 fs/xattr.c:216
__vfs_setxattr_locked+0x1cf/0x260 fs/xattr.c:277
vfs_setxattr+0x13f/0x330 fs/xattr.c:303
setxattr+0x146/0x160 fs/xattr.c:611
path_setxattr+0x1a7/0x1d0 fs/xattr.c:630
__do_sys_lsetxattr fs/xattr.c:653 [inline]
__se_sys_lsetxattr fs/xattr.c:649 [inline]
__x64_sys_lsetxattr+0xbd/0x150 fs/xattr.c:649
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0

The NAT entry and the nat bitmap can be inconsistent, e.g. a nid is free
in the nat bitmap while blkaddr in its NAT entry is not NULL_ADDR. This
may trigger BUG_ON() in f2fs_new_node_page(); fix it.
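
The idea of reporting the inconsistency instead of hitting BUG_ON() can be sketched as follows. This is only an illustration: the constant values, the error code, and the helper name are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

/* Assumed "no block assigned" value; illustrative only. */
#define NULL_ADDR_SKETCH     ((uint32_t)0)
/* Hypothetical error code used by this sketch to signal corruption. */
#define ERR_CORRUPTED_SKETCH (-1)

/*
 * Returns 0 if the nid may back a new node page, or an error if its NAT
 * entry already points at a block even though the bitmap says it is free.
 */
static inline int check_new_node_blkaddr(uint32_t blkaddr)
{
	if (blkaddr != NULL_ADDR_SKETCH)
		return ERR_CORRUPTED_SKETCH; /* report corruption, do not crash */
	return 0;
}
```

A caller would propagate the error upward so that setxattr fails cleanly on a corrupted image.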

Reported-by: Dipanjan Das
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
Signed-off-by: Sasha Levin

Chao Yu
2022-08-25 17:40:44 +0800

08 Apr, 2022

1 commit

2911ad024 f2fs: fix to avoid potential deadlock

[ Upstream commit 344150999b7fc88502a65bbb147a47503eca2033 ]

Quoted from Jing Xia's report, a potential deadlock may happen
between kworker and checkpoint as below:

[T:writeback]                           [T:checkpoint]
- wb_writeback
 - blk_start_plug
bio contains NodeA was plugged in writeback threads
                                        - do_writepages -- sync write inodeB, inc wb_sync_req[DATA]
                                         - f2fs_write_data_pages
                                          - f2fs_write_single_data_page -- write last dirty page
                                           - f2fs_do_write_data_page
                                            - set_page_writeback -- clear page dirty flag and
                                              PAGECACHE_TAG_DIRTY tag in radix tree
                                            - f2fs_outplace_write_data
                                             - f2fs_update_data_blkaddr
                                              - f2fs_wait_on_page_writeback -- wait NodeA to writeback here
                                           - inode_dec_dirty_pages
 - writeback_sb_inodes
  - writeback_single_inode
   - do_writepages
    - f2fs_write_data_pages -- skip writepages due to wb_sync_req[DATA]
     - wbc->pages_skipped += get_dirty_pages() -- PAGECACHE_TAG_DIRTY is not set but get_dirty_pages() returns one
  - requeue_inode -- requeue inode to wb->b_dirty queue due to non-zero.pages_skipped
 - blk_finish_plug

Let's try to avoid the deadlock condition by forcing the previous bio to be
unplugged via blk_finish_plug(current->plug) once we've skipped writeback in
writepages() due to valid sbi->wb_sync_req[DATA/NODE].

Fixes: 687de7f1010c ("f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE")
Signed-off-by: Zhiguo Niu
Signed-off-by: Jing Xia
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
Signed-off-by: Sasha Levin

Chao Yu
2022-04-08 20:23:11 +0800

01 Dec, 2021

1 commit

8984bba3b f2fs: set SBI_NEED_FSCK flag when inconsistent node block found

[ Upstream commit 6663b138ded1a59e630c9e605e42aa7fde490cdc ]

An inconsistent node block will cause a file to fail to open or read,
which could make the user process crash or get stuck. Let's set the
SBI_NEED_FSCK flag to trigger a fix at next fsck time. After
unlinking the corrupted file, the user process could regenerate
a new one and work correctly.

Signed-off-by: Weichao Guo
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
Signed-off-by: Sasha Levin

Weichao Guo
2021-12-01 16:04:55 +0800

24 Aug, 2021

2 commits

94c821fb2 f2fs: rebuild nat_bits during umount

If all free_nat_bitmap are available, we can rebuild nat_bits from
free_nat_bitmap entirely during umount, giving the image another chance
to re-enable nat_bits.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2021-08-24 01:25:52 +0800

521187439 f2fs: separate out iostat feature

Added the F2FS_IOSTAT config option to support getting IO statistics through
sysfs and printing periodic IO statistics tracepoint events, and
moved I/O statistics related code into separate files for better
maintenance.

Signed-off-by: Daeho Jeong
Reviewed-by: Chao Yu
[Jaegeuk Kim: set default=y]
Signed-off-by: Jaegeuk Kim

Daeho Jeong
2021-08-24 01:25:51 +0800

18 Aug, 2021

1 commit

324105775 f2fs: support fault injection for f2fs_kmem_cache_alloc()

This patch adds support for injecting faults into f2fs_kmem_cache_alloc().

Usage:
a) echo 32768 > /sys/fs/f2fs//inject_type or
b) mount -o fault_type=32768
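
The value 32768 equals 1 << 15, i.e. a single bit in a bitmask. A minimal sketch of how such a mask value might be composed, assuming each fault type occupies one bit; the bit index 15 is inferred from the value in the usage lines, and the helper and enum names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: one bit per fault type, as 32768 == 1 << 15 suggests. */
static inline uint32_t fault_type_mask(unsigned int bit_index)
{
	return (uint32_t)1 << bit_index;
}

/* Assumed bit index for the slab-allocation fault, derived from 32768. */
enum { SLAB_ALLOC_FAULT_BIT = 15 };
```

Under this assumption, several fault types could be enabled at once by OR-ing their masks together.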

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2021-08-18 02:59:05 +0800

06 Aug, 2021

1 commit

94afd6d6e f2fs: extent cache: support unaligned extent

A compressed inode may suffer from poor read performance because it cannot
use the extent cache, so I propose adding unaligned extent support
to improve it.

Currently, it only works in readonly format f2fs image.

Unaligned extent: in one compressed cluster, the number of physical blocks
will be less than the number of logical blocks, so we add an extra physical
block length to extent info in order to indicate such extent status.

The idea is that if all blocks of a cluster are physically contiguous,
once its mapping info has been read for the first time, we cache an
unaligned (or aligned) extent info entry in the extent cache, expecting
the mapping info to be hit when the cluster is reread.

Merge policy:
- Aligned extents can be merged.
- Aligned extent and unaligned extent can not be merged.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2021-08-06 02:26:11 +0800

03 Aug, 2021

1 commit

b7ec20617 f2fs: do not submit NEW_ADDR to read node block

After the patch below, if cp is in an error state, we drop dirty node pages.
This can hand NEW_ADDR to the node page read path. Don't do WARN_ON(), which
causes a generic/475 failure.

Fixes: 28607bf3aa6f ("f2fs: drop dirty node pages when cp is in error status")
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2021-08-03 02:24:25 +0800

25 Jul, 2021

1 commit

2eeb0dce7 f2fs: don't sleep while grabing nat_tree_lock

This tries to fix priority inversion in the below condition resulting in
long checkpoint delay.

f2fs_get_node_info()
- nat_tree_lock
-> sleep to grab journal_rwsem by contention

checkpoint
- waiting for nat_tree_lock

In order to let checkpoint go, let's release nat_tree_lock, if there's a
journal_rwsem contention.

Signed-off-by: Daeho Jeong
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2021-07-25 23:42:38 +0800

07 Jul, 2021

1 commit

28607bf3a f2fs: drop dirty node pages when cp is in error status

Otherwise, writeback is going to fall in a loop to flush dirty inode forever
before getting SBI_CLOSING.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2021-07-07 13:05:06 +0800

23 Jun, 2021

1 commit

6ce19aff0 f2fs: compress: add compress_inode to cache compressed blocks

Support using the address space of an inner inode to cache compressed blocks,
in order to improve the cache hit ratio of random reads.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2021-06-23 16:09:35 +0800

15 May, 2021

1 commit

b763f3bed f2fs: restructure f2fs page.private layout

Restructure the f2fs page private layout for the reasons below:

There are some cases that f2fs wants to set a flag in a page to
indicate a specified status of page:
a) page is in transaction list for atomic write
b) page contains dummy data for aligned write
c) page is migrating for GC
d) page contains inline data for inline inode flush
e) page belongs to merkle tree, and is verified for fsverity
f) page is dirty and has filesystem/inode reference count for writeback
g) page is temporary and has decompress io context reference for compression

There are existing places in the page structure we can use to store
f2fs private status/data:
- page.flags: PG_checked, PG_private
- page.private

However, using them has been messy, which may cause conflicts:

            page.private  PG_private  PG_checked  page._refcount (+1 at most)
a)          -1            set                     +1
b)          -2            set
c), d), e)                            set
f)          0             set                     +1
g)          pointer       set

The other problem is that page.flags has no free slot. If we can avoid
setting page.private to zero while setting the PG_private flag, then a
non-zero value can indicate PG_private status, so that we may have a chance
to reclaim the PG_private slot for other usage. [1]

The other concern is that f2fs scales poorly when it needs to indicate
more page statuses.

So in this patch, let's restructure f2fs' page.private as below to
solve above issues:

Layout A: lowest bit should be 1
| bit0 = 1 | bit1 | bit2 | ... | bit MAX | private data .... |
bit 0 PAGE_PRIVATE_NOT_POINTER
bit 1 PAGE_PRIVATE_ATOMIC_WRITE
bit 2 PAGE_PRIVATE_DUMMY_WRITE
bit 3 PAGE_PRIVATE_ONGOING_MIGRATION
bit 4 PAGE_PRIVATE_INLINE_INODE
bit 5 PAGE_PRIVATE_REF_RESOURCE
bit 6- f2fs private data

Layout B: lowest bit should be 0
page.private is a wrapped pointer.

After the change:

            page.private  PG_private  PG_checked  page._refcount (+1 at most)
a)          11            set                     +1
b)          101           set                     +1
c)          1001          set                     +1
d)          10001         set                     +1
e)                                    set
f)          100001        set                     +1
g)          pointer       set                     +1
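
The binary values in the table follow from OR-ing bit 0 with the flag's own bit. A minimal sketch, using the bit names from Layout A; the two helper functions are hypothetical and only illustrate the encoding, not the kernel's accessors:

```c
#include <stdbool.h>

/* Bit positions as listed in Layout A. */
enum {
	PAGE_PRIVATE_NOT_POINTER,       /* bit 0 */
	PAGE_PRIVATE_ATOMIC_WRITE,      /* bit 1 */
	PAGE_PRIVATE_DUMMY_WRITE,       /* bit 2 */
	PAGE_PRIVATE_ONGOING_MIGRATION, /* bit 3 */
	PAGE_PRIVATE_INLINE_INODE,      /* bit 4 */
	PAGE_PRIVATE_REF_RESOURCE,      /* bit 5 */
};

/* Hypothetical helper: set one flag together with the marker bit 0. */
static inline unsigned long private_set_flag(unsigned long priv, int flag)
{
	return priv | (1UL << PAGE_PRIVATE_NOT_POINTER) | (1UL << flag);
}

/* Layout B: a non-zero word with bit 0 clear is treated as a pointer. */
static inline bool private_is_pointer(unsigned long priv)
{
	return priv != 0 && !(priv & (1UL << PAGE_PRIVATE_NOT_POINTER));
}
```

For example, setting PAGE_PRIVATE_ATOMIC_WRITE on an empty word gives binary 11, matching row a).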

[1] https://lore.kernel.org/linux-f2fs-devel/20210422154705.GO3596236@casper.infradead.org/T/#u

Cc: Matthew Wilcox
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2021-05-15 02:22:08 +0800

11 Apr, 2021

1 commit

5f029c045 f2fs: clean up build warnings

This patch combined the below three clean-up patches.

- modify open brace '{' following function definitions
- ERROR: spaces required around that ':'
- ERROR: spaces required before the open parenthesis '('
- ERROR: spaces prohibited before that ','
- Made suggested modifications from checkpatch in reference to WARNING:
Missing a blank line after declarations

Signed-off-by: Yi Zhuang
Signed-off-by: Jia Yang
Signed-off-by: Jack Qiu
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Yi Zhuang
2021-04-11 01:36:39 +0800

27 Mar, 2021

1 commit

d6d2b491a f2fs: allow to change discard policy based on cached discard cmds

With the default DPOLICY_BG, the discard thread is ioaware, which prevents
the discard thread from issuing the discard commands. On low RAM setups,
it is observed that these discard commands in the cache consume a lot of
memory. This patch aims to relax the memory pressure on the system
due to f2fs pending discard cmds by changing the policy to DPOLICY_FORCE
based on the nm_i->ram_thresh configured.

Signed-off-by: Sahitya Tummala
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Sahitya Tummala
2021-03-27 01:27:44 +0800

26 Mar, 2021

1 commit

b862676e3 f2fs: fix to avoid out-of-bounds memory access

butt3rflyh4ck reported a bug found by
syzkaller fuzzer with custom modifications in 5.12.0-rc3+ [1]:

dump_stack+0xfa/0x151 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x82/0x32c mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
f2fs_test_bit fs/f2fs/f2fs.h:2572 [inline]
current_nat_addr fs/f2fs/node.h:213 [inline]
get_next_nat_page fs/f2fs/node.c:123 [inline]
__flush_nat_entry_set fs/f2fs/node.c:2888 [inline]
f2fs_flush_nat_entries+0x258e/0x2960 fs/f2fs/node.c:2991
f2fs_write_checkpoint+0x1372/0x6a70 fs/f2fs/checkpoint.c:1640
f2fs_issue_checkpoint+0x149/0x410 fs/f2fs/checkpoint.c:1807
f2fs_sync_fs+0x20f/0x420 fs/f2fs/super.c:1454
__sync_filesystem fs/sync.c:39 [inline]
sync_filesystem fs/sync.c:67 [inline]
sync_filesystem+0x1b5/0x260 fs/sync.c:48
generic_shutdown_super+0x70/0x370 fs/super.c:448
kill_block_super+0x97/0xf0 fs/super.c:1394

The root cause is that if a nat entry in the checkpoint journal area is
corrupted, e.g. the nid of a journalled nat entry exceeds the max nid value,
then during checkpoint, once it tries to flush the nat journal to the NAT
area, get_next_nat_page() may access out-of-bounds memory on nat_bitmap
because it uses the wrong nid value as the bitmap offset.
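
A range check of this kind can be sketched as below. The structure and function names are hypothetical stand-ins; the sketch only shows the principle of validating a journalled nid before it is used as a bitmap offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the node manager; only the field used here. */
struct nm_info_sketch {
	uint32_t max_nid; /* valid nids are in the range [0, max_nid) */
};

/* Reject a journalled nid before it is used as a bitmap offset. */
static inline bool nid_in_range(const struct nm_info_sketch *nm, uint32_t nid)
{
	return nid < nm->max_nid;
}
```

A caller would skip or reject any journal entry for which the check fails rather than indexing the bitmap with it.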

[1] https://lore.kernel.org/lkml/CAFcO6XOMWdr8pObek6eN6-fs58KG9doRFadgJj-FnF-1x43s2g@mail.gmail.com/T/#u

Reported-and-tested-by: butt3rflyh4ck
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2021-03-26 09:20:51 +0800

27 Feb, 2021

1 commit

5f7136db8 block: Add bio_max_segs

It's often inconvenient to use BIO_MAX_PAGES due to min() requiring the
sign to be the same. Introduce bio_max_segs() and change BIO_MAX_PAGES to
be unsigned to make it easier for the users.
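
A clamp helper of this shape might look like the sketch below. The `_SKETCH`/`_sketch` names and the value 256 are assumptions for illustration, not the kernel's definitions; the point is that an unsigned limit compares cleanly against an unsigned count.

```c
/* Assumed limit for illustration; unsigned so no sign mismatch arises. */
#define BIO_MAX_PAGES_SKETCH 256U

/* Clamp a requested segment count to the per-bio maximum. */
static inline unsigned int bio_max_segs_sketch(unsigned int nr_segs)
{
	return nr_segs < BIO_MAX_PAGES_SKETCH ? nr_segs : BIO_MAX_PAGES_SKETCH;
}
```

Callers then pass their page count through the helper instead of writing min() with casts at every call site.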

Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Jens Axboe

Matthew Wilcox (Oracle)
2021-02-27 06:49:51 +0800

28 Jan, 2021

2 commits

d5f7bc006 f2fs: deprecate f2fs_trace_io

This patch deprecates f2fs_trace_io, since f2fs uses page->private more broadly,
resulting in more buggy cases.

Acked-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2021-01-28 07:20:07 +0800

36218b81f f2fs: Replace expression with offsetof()

Use the existing offsetof() macro instead of duplicating code.
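
For illustration, the two forms compare as follows. The structure is hypothetical and unrelated to the f2fs structure touched by the patch; the hand-rolled macro is shown only for comparison and relies on compiler behaviour.

```c
#include <stddef.h>

/* Hypothetical structure, used only to compare the two forms. */
struct entry_sketch {
	unsigned int id;
	unsigned char flags;
	unsigned long payload;
};

/* Hand-rolled form of the kind that offsetof() replaces. */
#define PAYLOAD_OFFSET_MANUAL ((size_t)&((struct entry_sketch *)0)->payload)

/* Equivalent form using the standard macro. */
#define PAYLOAD_OFFSET offsetof(struct entry_sketch, payload)
```

The offsetof() form is shorter, well defined, and accounts for padding automatically.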

Signed-off-by: Zheng Yongjun
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Zheng Yongjun
2021-01-28 07:20:00 +0800

09 Dec, 2020

1 commit

96dd02519 f2fs: fix to account inline xattr correctly during recovery

During recovery, we may fail to update the inline xattr count correctly;
fix it.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-12-09 06:25:41 +0800

03 Dec, 2020

1 commit

a95ba66ac f2fs: avoid race condition for shrinker count

Light reported that the shrinker sometimes gets nat_cnt < dirty_nat_cnt,
resulting in wrong do_shrinker work. Let's avoid returning an insane
overflowed value by adding a single tracking value.
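
Why a racy subtraction of two unsigned counters yields an "insane" value, and how a directly tracked counter avoids it, can be sketched as follows. The structure and field names are hypothetical; the real field names are not given in this message.

```c
/* Hypothetical counters for illustration only. */
struct nat_counts_sketch {
	unsigned int total;       /* all cached NAT entries */
	unsigned int dirty;       /* dirty NAT entries */
	unsigned int reclaimable; /* clean entries, tracked directly */
};

/* Racy derivation: wraps to a huge value if total is observed below dirty. */
static inline unsigned int reclaimable_derived(const struct nat_counts_sketch *c)
{
	return c->total - c->dirty;
}

/* Single tracked value: one read, so there is no subtraction that can wrap. */
static inline unsigned int reclaimable_tracked(const struct nat_counts_sketch *c)
{
	return c->reclaimable;
}
```

With total = 3 and dirty = 5 observed mid-update, the derived count wraps around to a value near UINT_MAX, while the tracked count stays sane.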

Reported-by: Light Hsieh
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-12-03 16:59:26 +0800

03 Nov, 2020

1 commit

3acc4522d f2fs: call f2fs_get_meta_page_retry for nat page

When running fault injection test, if we don't stop checkpoint, some stale
NAT entries were flushed which breaks consistency.

Fixes: 86f33603f8c5 ("f2fs: handle errors of f2fs_get_meta_page_nofail")
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-11-03 00:33:01 +0800

14 Oct, 2020

1 commit

86f33603f f2fs: handle errors of f2fs_get_meta_page_nofail

First problem is we hit BUG_ON() in f2fs_get_sum_page given EIO on
f2fs_get_meta_page_nofail().

The quick fix was not to return any error and loop infinitely, but syzbot
caught a case where it enters that loop from a fuzzed image. It turned out we
abused f2fs_get_meta_page_nofail() as in the call stack below.

- f2fs_fill_super
- f2fs_build_segment_manager
- build_sit_entries
- get_current_sit_page

INFO: task syz-executor178:6870 can't die for more than 143 seconds.
task:syz-executor178 state:R
stack:26960 pid: 6870 ppid: 6869 flags:0x00004006
Call Trace:

Showing all locks held in the system:
1 lock held by khungtaskd/1179:
#0: ffffffff8a554da0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6242
1 lock held by systemd-journal/3920:
1 lock held by in:imklog/6769:
#0: ffff88809eebc130 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:930
1 lock held by syz-executor178/6870:
#0: ffff8880925120e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0x201/0xaf0 fs/super.c:229

Actually, we didn't have to use _nofail in this case, since we could return
error to mount(2) already with the error handler.

As a result, this patch tries to 1) remove _nofail callers as much as possible,
2) deal with error case in last remaining caller, f2fs_get_sum_page().

Reported-by: syzbot+ee250ac8137be41d7b13@syzkaller.appspotmail.com
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-10-14 14:23:29 +0800

29 Sep, 2020

1 commit

e6e421870 f2fs: remove unused check on version_bitmap

__bitmap_ptr() will not return NULL here.
Remove the unused check.

Signed-off-by: Wang Xiaojun
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Wang Xiaojun
2020-09-29 16:48:33 +0800

15 Sep, 2020

1 commit

c8eb70248 f2fs: clean up kvfree

After commit 0b6d4ca04a86 ("f2fs: don't return vmalloc() memory from
f2fs_kmalloc()"), f2fs_k{m,z}alloc() will not return vmalloc()'ed
memory, so clean up to use kfree() instead of kvfree() to free
memory allocated by them.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-09-15 02:15:37 +0800

09 Sep, 2020

1 commit

e2cab031b f2fs: fix indefinite loop scanning for free nid

If the sbi->ckpt->next_free_nid is not NAT block aligned and if there
are free nids in that NAT block between the start of the block and
next_free_nid, then those free nids will not be scanned in scan_nat_page().
This results in a mismatch between nm_i->available_nids and the sum of
nm_i->free_nid_count of all NAT blocks scanned, and nm_i->available_nids
will always be greater than the sum of free nids in all the blocks.
Under this condition, if we use all the currently scanned free nids,
then it will loop forever in f2fs_alloc_nid() as nm_i->available_nids
is still not zero but nm_i->free_nid_count of that partially scanned
NAT block is zero.

Fix this to align the nm_i->next_scan_nid to the first nid of the
corresponding NAT block.
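
Aligning a nid down to the first nid of its NAT block is a round-down to a multiple of the per-block entry count. A minimal sketch, assuming 455 entries per NAT block; that value and the names below are assumptions for illustration:

```c
#include <stdint.h>

/* Assumed number of NAT entries per NAT block; illustrative value. */
#define NAT_ENTRY_PER_BLOCK_SKETCH 455U

/* Round a nid down to the first nid of the NAT block that contains it. */
static inline uint32_t nat_block_start_nid(uint32_t nid)
{
	return (nid / NAT_ENTRY_PER_BLOCK_SKETCH) * NAT_ENTRY_PER_BLOCK_SKETCH;
}
```

With this alignment, a scan starting from nid 460 begins at nid 455, so free nids 455 through 459 are no longer skipped.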

Signed-off-by: Sahitya Tummala
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Sahitya Tummala
2020-09-09 11:31:33 +0800

24 Aug, 2020

1 commit

df561f668 treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.
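
As an illustration of the pattern, a pseudo-keyword like this can be built on the compiler's statement attribute. This is a sketch; the kernel's own definition may differ, and the example function is hypothetical.

```c
/* Sketch of a fallthrough pseudo-keyword; the kernel's definition may differ. */
#if defined(__GNUC__) && __GNUC__ >= 7
#define fallthrough __attribute__((__fallthrough__))
#else
#define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: each level also performs the work of lower levels. */
static inline int cleanup_steps(int level)
{
	int steps = 0;

	switch (level) {
	case 2:
		steps++;
		fallthrough;
	case 1:
		steps++;
		fallthrough;
	default:
		steps++;
	}
	return steps;
}
```

Unlike a comment, the attribute is visible to the compiler, so intentional fall-through can be told apart from a missing break.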

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva

Gustavo A. R. Silva
2020-08-24 06:36:59 +0800

26 Jul, 2020

1 commit

a87aff1d4 f2fs: space related cleanup

Just for code style, no logic change
1. delete useless space
2. change spaces into tab

Signed-off-by: Jack Qiu
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jack Qiu
2020-07-26 23:15:40 +0800

24 Jul, 2020

1 commit

68e79baf4 f2fs: Change the type of f2fs_flush_inline_data() to void

The return value of f2fs_flush_inline_data() is not used,
so delete it.

Signed-off-by: Jia Yang
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jia Yang
2020-07-24 11:22:37 +0800

22 Jul, 2020

1 commit

b0f3b87fb f2fs: should avoid inode eviction in synchronous path

https://bugzilla.kernel.org/show_bug.cgi?id=208565

PID: 257 TASK: ecdd0000 CPU: 0 COMMAND: "init"
#0 [] (__schedule) from []
#1 [] (schedule) from []
#2 [] (rwsem_down_read_failed) from []
#3 [] (down_read) from []
#4 [] (f2fs_truncate_blocks) from []
#5 [] (f2fs_truncate) from []
#6 [] (f2fs_evict_inode) from []
#7 [] (evict) from []
#8 [] (iput) from []
#9 [] (f2fs_sync_node_pages) from []
#10 [] (f2fs_write_checkpoint) from []
#11 [] (f2fs_sync_fs) from []
#12 [] (f2fs_do_sync_file) from []
#13 [] (f2fs_sync_file) from []
#14 [] (vfs_fsync_range) from []
#15 [] (do_fsync) from []
#16 [] (sys_fsync) from []

This can be caused by flush_dirty_inode() in f2fs_sync_node_pages() where
iput() requires f2fs_lock_op() again resulting in livelock.

Reported-by: Zhiguo Niu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-07-22 03:55:54 +0800

09 Jul, 2020

1 commit

9627a7b31 f2fs: fix error path in do_recover_data()

- don't panic kernel if f2fs_get_node_page() fails in
f2fs_recover_inline_data() or f2fs_recover_inline_xattr();
- return error number of f2fs_truncate_blocks() to
f2fs_recover_inline_data()'s caller;

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-07-09 01:11:19 +0800

08 Jul, 2020

2 commits

9039d8355 f2fs: lost matching-pair of trace in f2fs_truncate_inode_blocks

If get_node_path() returns -E2BIG while the
f2fs_truncate_inode_blocks_enter/exit tracepoints are enabled,
the matching trace_exit event is lost from the log.

Signed-off-by: Yubo Feng
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Yubo Feng
2020-07-08 12:51:47 +0800

b815bdc78 f2fs: remove useless parameter of __insert_free_nid()

In current version, @state will only be FREE_NID. This parameter
has no real effect so remove it to keep clean.

Signed-off-by: Liu Song
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Liu Song
2020-07-08 12:51:45 +0800

09 Jun, 2020

1 commit

0b6d4ca04 f2fs: don't return vmalloc() memory from f2fs_kmalloc()

kmalloc() returns kmalloc'ed memory, and kvmalloc() returns either
kmalloc'ed or vmalloc'ed memory. But the f2fs wrappers, f2fs_kmalloc()
and f2fs_kvmalloc(), both return both kinds of memory.

It's redundant to have two functions that do the same thing, and also
breaking the standard naming convention is causing bugs since people
assume it's safe to kfree() memory allocated by f2fs_kmalloc(). See
e.g. the various allocations in fs/f2fs/compress.c.

Fix this by making f2fs_kmalloc() just use kmalloc(). And to avoid
re-introducing the allocation failures that the vmalloc fallback was
intended to fix, convert the largest allocations to use f2fs_kvmalloc().

Signed-off-by: Eric Biggers
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Eric Biggers
2020-06-09 11:34:58 +0800

25 May, 2020

1 commit

6d7c865c2 f2fs: avoid inifinite loop to wait for flushing node pages at cp_error

The shutdown test sometimes hangs, since it keeps trying to flush dirty node
pages in an infinite loop. Let's drop dirty pages at umount in that case.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-05-25 11:54:34 +0800

12 May, 2020

2 commits

34c061ad8 f2fs: Avoid double lock for cp_rwsem during checkpoint

There could be a scenario where f2fs_sync_node_pages gets
called during checkpoint, which in turn tries to flush
inline data and calls iput(). This results in deadlock as
iput() tries to hold cp_rwsem, which is already held at the
beginning by checkpoint->block_operations().

Call stack :

Thread A                                Thread B
f2fs_write_checkpoint()
- block_operations(sbi)
 - f2fs_lock_all(sbi);
  - down_write(&sbi->cp_rwsem);

                                        - open()
                                         - igrab()
                                        - write() write inline data
                                        - unlink()
- f2fs_sync_node_pages()
 - if (is_inline_node(page))
  - flush_inline_data()
   - ilookup()
     page = f2fs_pagecache_get_page()
     if (!page)
      goto iput_out;
     iput_out:
                                        -close()
                                        -iput()
       iput(inode);
       - f2fs_evict_inode()
        - f2fs_truncate_blocks()
         - f2fs_lock_op()
          - down_read(&sbi->cp_rwsem);

Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
Signed-off-by: Sayali Lokhande
Signed-off-by: Jaegeuk Kim

Sayali Lokhande
2020-05-12 11:36:47 +0800

042be373a f2fs: shrink spinlock coverage

In f2fs_try_to_free_nids(), the .nid_list_lock spinlock critical region
grows as the expected shrink number increases. To avoid making other CPUs
spin for a long time, release nid caches in small batches, each under
.nid_list_lock coverage.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-05-12 11:36:46 +0800

18 Apr, 2020

1 commit

8b83ac81f f2fs: support read iostat

Add support for accounting read IOs from userspace/kernel.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-04-18 00:17:00 +0800

31 Mar, 2020

1 commit

7bcd0cfa7 f2fs: don't trigger data flush in foreground operation

Data flush can generate heavy IO and cause long latency during
flush, so it's not appropriate to trigger it in foreground
operation.

And also, we may face below potential deadlock during data flush:
- f2fs_write_multi_pages
- f2fs_write_raw_pages
- f2fs_write_single_data_page
- f2fs_balance_fs
- f2fs_balance_fs_bg
- f2fs_sync_dirty_inodes
- filemap_fdatawrite -- stuck on flush same cluster

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-03-31 11:46:24 +0800

20 Mar, 2020

2 commits

985100035 f2fs: add prefix for f2fs slab cache name

In order to avoid polluting global slab cache namespace.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-03-20 02:41:26 +0800

5df7731f6 f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT that wraps a fixed
value, msecs_to_jiffies(20), to replace HZ/50 and avoid this issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."
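
The arithmetic behind the quote can be sketched as follows. The rounding helper is a simplified stand-in written for this illustration, not the kernel's msecs_to_jiffies() implementation, and both function names are hypothetical.

```c
/* Naive form: integer division yields 0 whenever hz < 50. */
static inline unsigned long timeout_naive(unsigned long hz)
{
	return hz / 50;
}

/*
 * Simplified stand-in for a millisecond-to-tick conversion that rounds up,
 * so any non-zero millisecond value maps to at least one tick.
 */
static inline unsigned long ms_to_ticks_round_up(unsigned long ms,
						 unsigned long hz)
{
	return (ms * hz + 999) / 1000;
}
```

With hz = 24, the naive form gives 0 ticks, while the rounding conversion of 20 ms gives 1 tick; with hz = 100 both give 2.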

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-03-20 02:41:26 +0800