Eric Lee / smarc-fsl-linux-kernel

22 Dec, 2020

4 commits

317c4f58d f2fs: avoid race condition for shrinker count

Light reported that the shrinker sometimes gets nat_cnt < dirty_nat_cnt,
resulting in wrong do_shrinker work. Let's avoid returning an insane
overflowed value by adding a single tracking value.
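
The failure mode described above can be sketched in userspace C. This is a hypothetical model, not the f2fs code: the struct and all function names (`nat_counters`, `count_by_subtraction`, `count_tracked`, `mark_dirty`) are invented for illustration. It shows how subtracting two unsigned counters that were observed at an inconsistent moment wraps around to a huge value, while a single directly maintained counter has no subtraction that could underflow.

```c
#include <assert.h>

/* Hypothetical stand-in for the NAT cache counters. */
struct nat_counters {
	unsigned int nat_cnt;        /* all cached NAT entries */
	unsigned int dirty_nat_cnt;  /* dirty subset of those entries */
	unsigned int reclaimable;    /* single tracked value: clean entries */
};

/* Derive the shrinkable count from two separately updated counters.
 * If a racing update makes nat_cnt < dirty_nat_cnt for a moment,
 * the unsigned subtraction wraps around to a huge number. */
static unsigned long count_by_subtraction(const struct nat_counters *c)
{
	return c->nat_cnt - c->dirty_nat_cnt;
}

/* Read the one value that is maintained directly. */
static unsigned long count_tracked(const struct nat_counters *c)
{
	return c->reclaimable;
}

/* Move one entry from clean to dirty, keeping the tracked value in step. */
static void mark_dirty(struct nat_counters *c)
{
	assert(c->reclaimable > 0);
	c->reclaimable--;
	c->dirty_nat_cnt++;
}
```

With `nat_cnt = 3` and `dirty_nat_cnt = 4`, `count_by_subtraction()` returns the wrapped-around maximum of `unsigned int`, which is the kind of "insane" count the message refers to.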

Reported-by: Light Hsieh
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-12-22 05:33:17 +0800
5a1197544 f2fs: fix kbytes written stat for multi-device case

In the multi-device case, one f2fs image spans multiple devices, so the
stat needs to account for bytes written to all block devices belonging to
the image rather than only the main block device. Fix it.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-12-22 05:33:15 +0800
301e31717 f2fs: change to use rwsem for cp_mutex

Use rwsem to ensure serialization of the callers and to avoid
starvation of high priority tasks, when the system is under
heavy IO workload.

Signed-off-by: Sahitya Tummala
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Sahitya Tummala
2020-12-22 05:33:14 +0800
c60814e9f f2fs: Remove the redundancy initialization

Two assignments are meaningless, so remove them.

Signed-off-by: Zhang Qilong
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Zhang Qilong
2020-12-22 05:33:13 +0800

14 Oct, 2020

1 commit

86f33603f f2fs: handle errors of f2fs_get_meta_page_nofail

First problem is we hit BUG_ON() in f2fs_get_sum_page given EIO on
f2fs_get_meta_page_nofail().

The quick fix was to loop indefinitely instead of returning any error, but
syzbot caught a case where a fuzzed image sends it into that loop. It turned
out we abused f2fs_get_meta_page_nofail(), as in the call stack below.

- f2fs_fill_super
- f2fs_build_segment_manager
- build_sit_entries
- get_current_sit_page

INFO: task syz-executor178:6870 can't die for more than 143 seconds.
task:syz-executor178 state:R
stack:26960 pid: 6870 ppid: 6869 flags:0x00004006
Call Trace:

Showing all locks held in the system:
1 lock held by khungtaskd/1179:
#0: ffffffff8a554da0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6242
1 lock held by systemd-journal/3920:
1 lock held by in:imklog/6769:
#0: ffff88809eebc130 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:930
1 lock held by syz-executor178/6870:
#0: ffff8880925120e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0x201/0xaf0 fs/super.c:229

Actually, we didn't have to use _nofail in this case, since we could return
error to mount(2) already with the error handler.

As a result, this patch tries to 1) remove _nofail callers as much as possible,
2) deal with error case in last remaining caller, f2fs_get_sum_page().

Reported-by: syzbot+ee250ac8137be41d7b13@syzkaller.appspotmail.com
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-10-14 14:23:29 +0800

29 Sep, 2020

2 commits

6a257471f f2fs: fix to check segment boundary during SIT page readahead

As syzbot reported:

kernel BUG at fs/f2fs/segment.h:657!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 16220 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:f2fs_ra_meta_pages+0xa51/0xdc0 fs/f2fs/segment.h:657
Call Trace:
build_sit_entries fs/f2fs/segment.c:4195 [inline]
f2fs_build_segment_manager+0x4b8a/0xa3c0 fs/f2fs/segment.c:4779
f2fs_fill_super+0x377d/0x6b80 fs/f2fs/super.c:3633
mount_bdev+0x32e/0x3f0 fs/super.c:1417
legacy_get_tree+0x105/0x220 fs/fs_context.c:592
vfs_get_tree+0x89/0x2f0 fs/super.c:1547
do_new_mount fs/namespace.c:2875 [inline]
path_mount+0x1387/0x2070 fs/namespace.c:3192
do_mount fs/namespace.c:3205 [inline]
__do_sys_mount fs/namespace.c:3413 [inline]
__se_sys_mount fs/namespace.c:3390 [inline]
__x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9

@blkno in f2fs_ra_meta_pages() could exceed the max segment count, causing a
panic in the following sanity check in current_sit_addr(). Add a check
condition to avoid this issue.

Reported-by: syzbot+3698081bcf0bb2d12174@syzkaller.appspotmail.com
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-09-29 17:12:41 +0800
9b6648228 f2fs: add trace exit in exception path

Missing the trace exit in f2fs_sync_dirty_inodes

Signed-off-by: Zhang Qilong
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Zhang Qilong
2020-09-29 16:48:33 +0800

12 Sep, 2020

1 commit

093749e29 f2fs: support age threshold based garbage collection

There are several issues in the current background GC algorithm:
- valid blocks is one of the key factors in cost overhead calculation,
so if a segment has fewer valid blocks, the CB algorithm will still
choose it as victim even if its age is young or it is located in a hot
segment, which is not appropriate.
- GCed data/node will go to existing logs, no matter whether the update
frequency of the data already there is the same or not, so it may mix
hot and cold data again.
- the GC allocator mainly uses LFS type segments, so it consumes free
segments more quickly.

This patch introduces a new algorithm named age threshold based
garbage collection to solve the above issues. There are three main
steps:

1. select a source victim:
- set an age threshold, and select candidates based on the threshold:
e.g.
0 means youngest, 100 means oldest; if we set the age threshold to 80,
then select dirty segments whose age is in the range [80, 100] as
candidates;
- set a candidate_ratio threshold, and select candidates based on the
ratio, so that we can shrink candidates to the oldest segments;
- select the target segment with the fewest valid blocks in order to
migrate blocks with minimum cost;

2. select a target victim:
- select candidates based on the age threshold;
- set a candidate_radius threshold, and search candidates whose age is
around the source victim's; the searching radius should be less than
the radius threshold.
- select the target segment with the most valid blocks in order to avoid
migrating the current target segment.

3. merge valid blocks from the source victim into the target victim with
the SSR allocator.
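
Steps 1 and 2 above can be sketched as a pair of linear scans. This is a simplified, hypothetical model for illustration only: `struct seg`, `pick_source_victim()` and `pick_target_victim()` are invented names, the candidate_ratio trimming is left out, and the real implementation does not necessarily scan an array like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a dirty segment. */
struct seg {
	unsigned int age;          /* 0 = youngest, 100 = oldest */
	unsigned int valid_blocks; /* blocks that would have to be moved */
};

/* Step 1: among segments with age >= age_threshold, return the index of
 * the one with the fewest valid blocks, or -1 if none qualifies. */
static int pick_source_victim(const struct seg *segs, size_t n,
			      unsigned int age_threshold)
{
	int best = -1;

	assert(age_threshold <= 100);
	for (size_t i = 0; i < n; i++) {
		if (segs[i].age < age_threshold)
			continue;
		if (best < 0 || segs[i].valid_blocks < segs[best].valid_blocks)
			best = (int)i;
	}
	return best;
}

/* Step 2: among the other segments that pass the age threshold and whose
 * age differs from the source's by less than `radius`, return the index
 * of the one with the most valid blocks, or -1 if none qualifies. */
static int pick_target_victim(const struct seg *segs, size_t n, int src,
			      unsigned int age_threshold, unsigned int radius)
{
	int best = -1;

	assert(src >= 0 && (size_t)src < n);
	for (size_t i = 0; i < n; i++) {
		unsigned int a = segs[i].age, b = segs[src].age;
		unsigned int dist = a > b ? a - b : b - a;

		if ((int)i == src || a < age_threshold || dist >= radius)
			continue;
		if (best < 0 || segs[i].valid_blocks > segs[best].valid_blocks)
			best = (int)i;
	}
	return best;
}
```

Picking the emptiest old segment as source and the fullest nearby-aged segment as target keeps the number of migrated blocks small while grouping data of similar age.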

Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* the rest have 384 valid blocks per segment
- run background GC

Benefit: both GC count and block movement count decrease noticeably:

- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)

GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)

IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments

- After:

- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)

GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)

IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments

Signed-off-by: Chao Yu
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-09-12 02:11:15 +0800

11 Sep, 2020

1 commit

d0b9e42ab f2fs: introduce inmem curseg

Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;

In this patch, we introduce a new type of log named 'inmem curseg'.
The differences from a normal curseg are:
- it reuses an existing segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory; its segno, blkofs and summary will not be
persisted into the checkpoint area;

With this new feature, we can enhance the scalability of logs; special
allocators can be created for these purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature

So let's update aligned pinfile allocation to use this new
inmem curseg framework.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-09-11 05:03:30 +0800

04 Aug, 2020

3 commits

828add774 f2fs: prepare a waiter before entering io_schedule

This is to avoid sleep() in the waiter thread.

[ 20.157753] ------------[ cut here ]------------
[ 20.158393] do not call blocking ops when !TASK_RUNNING; state=2 set at [] prepare_to_wait+0xcd/0x430
[ 20.159858] WARNING: CPU: 1 PID: 1152 at kernel/sched/core.c:7142 __might_sleep+0x149/0x1a0
...
[ 20.176110] __submit_merged_write_cond+0x191/0x310
[ 20.176739] f2fs_submit_merged_write+0x18/0x20
[ 20.177323] f2fs_wait_on_all_pages+0x269/0x2d0
[ 20.177899] ? block_operations+0x980/0x980
[ 20.178441] ? __kasan_check_read+0x11/0x20
[ 20.178975] ? finish_wait+0x260/0x260
[ 20.179488] ? percpu_counter_set+0x147/0x230
[ 20.180049] do_checkpoint+0x1757/0x2a50
[ 20.180558] f2fs_write_checkpoint+0x840/0xaf0
[ 20.181126] f2fs_sync_fs+0x287/0x4a0

Reported-by: Eric Biggers
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-08-04 11:54:58 +0800
1fd280188 f2fs: fix deadlock between quota writes and checkpoint

f2fs_write_data_pages(quota_mapping)
 __f2fs_write_data_pages             f2fs_write_checkpoint
  * blk_start_plug(&plug);
  * add bio in write_io[DATA]
                                      - block_operations
                                      - skip syncing quota by
                                        >DEFAULT_RETRY_QUOTA_FLUSH_COUNT
                                      - down_write(&sbi->node_write);
  - f2fs_write_single_data_page
   - down_read(node_write)
                                      - f2fs_wait_on_all_pages(F2FS_WB_CP_DATA);

Signed-off-by: Daeho Jeong
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-08-04 01:32:51 +0800
1f07cc58b f2fs: correct comment of f2fs_exist_written_data

Function parameter mode could be TRANS_DIR_INO.

Signed-off-by: Jack Qiu
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jack Qiu
2020-08-04 01:32:43 +0800

26 Jul, 2020

1 commit

a87aff1d4 f2fs: space related cleanup

Just for code style, no logic change
1. delete useless space
2. change spaces into tab

Signed-off-by: Jack Qiu
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jack Qiu
2020-07-26 23:15:40 +0800

09 Jul, 2020

1 commit

aff6fbbe8 f2fs: don't keep meta inode pages used for compressed block migration

meta inode's pages are used for encrypted, verity and compressed blocks,
so the meta inode's cache invalidation condition in do_checkpoint() should
consider compression as well, not just for verity and encryption, fix it.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-07-09 13:28:34 +0800

09 Jun, 2020

1 commit

0b6d4ca04 f2fs: don't return vmalloc() memory from f2fs_kmalloc()

kmalloc() returns kmalloc'ed memory, and kvmalloc() returns either
kmalloc'ed or vmalloc'ed memory. But the f2fs wrappers, f2fs_kmalloc()
and f2fs_kvmalloc(), both return both kinds of memory.

It's redundant to have two functions that do the same thing, and also
breaking the standard naming convention is causing bugs since people
assume it's safe to kfree() memory allocated by f2fs_kmalloc(). See
e.g. the various allocations in fs/f2fs/compress.c.

Fix this by making f2fs_kmalloc() just use kmalloc(). And to avoid
re-introducing the allocation failures that the vmalloc fallback was
intended to fix, convert the largest allocations to use f2fs_kvmalloc().

Signed-off-by: Eric Biggers
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Eric Biggers
2020-06-09 11:34:58 +0800

19 May, 2020

1 commit

9c30df7c5 f2fs: flush dirty meta pages when flushing them

Let's guarantee flushing dirty meta pages to avoid an infinite loop.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-05-19 01:47:24 +0800

12 May, 2020

3 commits

b4b10061e f2fs: refactor resize_fs to avoid meta updates in progress

Sahitya raised an issue:
- prevent meta updates while checkpoint is in progress

allocate_segment_for_resize() can cause metapage updates if
it requires to change the current node/data segments for resizing.
Stop these meta updates when there is a checkpoint already
in progress to prevent inconsistent CP data.

Signed-off-by: Sahitya Tummala
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-05-12 11:37:13 +0800
34c061ad8 f2fs: Avoid double lock for cp_rwsem during checkpoint

There could be a scenario where f2fs_sync_node_pages gets
called during checkpoint, which in turn tries to flush
inline data and calls iput(). This results in deadlock as
iput() tries to hold cp_rwsem, which is already held at the
beginning by checkpoint->block_operations().

Call stack :

Thread A                                Thread B
f2fs_write_checkpoint()
- block_operations(sbi)
 - f2fs_lock_all(sbi);
  - down_write(&sbi->cp_rwsem);

                                        - open()
                                         - igrab()
                                        - write() write inline data
                                        - unlink()
- f2fs_sync_node_pages()
 - if (is_inline_node(page))
  - flush_inline_data()
   - ilookup()
     page = f2fs_pagecache_get_page()
     if (!page)
      goto iput_out;
     iput_out:
                                        -close()
                                        -iput()
       iput(inode);
       - f2fs_evict_inode()
        - f2fs_truncate_blocks()
         - f2fs_lock_op()
           - down_read(&sbi->cp_rwsem);

Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
Signed-off-by: Sayali Lokhande
Signed-off-by: Jaegeuk Kim

Sayali Lokhande
2020-05-12 11:36:47 +0800
1f5f11a3c f2fs: remove blk_plugging in block_operations

blk_plugging doesn't seem to give any benefit.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-05-12 11:36:47 +0800

18 Apr, 2020

2 commits

8b83ac81f f2fs: support read iostat

Add support for accounting read IOs from userspace/kernel.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-04-18 00:17:00 +0800
ce4c638cd f2fs: fix to handle error path of f2fs_ra_meta_pages()

In f2fs_ra_meta_pages(), if f2fs_submit_page_bio() failed, we need to
unlock page, fix it.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-04-18 00:17:00 +0800

23 Mar, 2020

1 commit

c84ef3c5e f2fs: Add a new CP flag to help fsck fix resize SPO issues

Add and set a new CP flag CP_RESIZEFS_FLAG during
online resize FS to help fsck fix the metadata mismatch
that may happen due to SPO during resize, where SB
got updated but CP data couldn't be written yet.

fsck errors -
Info: CKPT version = 6ed7bccb
Wrong user_block_count(2233856)
[f2fs_do_mount:3365] Checkpoint is polluted

Signed-off-by: Sahitya Tummala
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Sahitya Tummala
2020-03-23 12:16:28 +0800

20 Mar, 2020

1 commit

5df7731f6 f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT that wraps a fixed
value, msecs_to_jiffies(20), to replace HZ/50 and avoid this issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."
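
The truncation problem in the quote can be reproduced with plain integer arithmetic. The following is a userspace sketch under stated assumptions: `old_timeout()` and `ms_to_ticks_round_up()` are hypothetical helpers, and the rounding-up conversion is only a simplified model of the behaviour described above, not the kernel's msecs_to_jiffies() implementation.

```c
#include <assert.h>

/* The old expression: integer division truncates toward zero,
 * so any tick rate below 50 yields a timeout of 0 ticks. */
static unsigned long old_timeout(unsigned int hz)
{
	return hz / 50;
}

/* Simplified model of a millisecond-to-tick conversion that rounds up,
 * so a non-zero duration never becomes a zero-tick timeout. */
static unsigned long ms_to_ticks_round_up(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	return ((unsigned long)ms * hz + 999) / 1000;
}
```

For a tick rate of 24, `old_timeout(24)` is 0 while `ms_to_ticks_round_up(20, 24)` is 1; for a tick rate of 100 both give 2.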

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-03-20 02:41:26 +0800

11 Mar, 2020

1 commit

7a88ddb56 f2fs: fix inconsistent comments

Lack of maintenance on comments may mislead developers, fix them.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-03-11 00:18:33 +0800

28 Feb, 2020

1 commit

bf22c3cc8 f2fs: fix the panic in do_checkpoint()

There could be a scenario where f2fs_sync_meta_pages() will not
ensure that all F2FS_DIRTY_META pages are submitted for IO. Thus,
resulting in the below panic in do_checkpoint() -

f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) &&
!f2fs_cp_error(sbi));

This can happen in a low-memory condition, where shrinker could
also be doing the writepage operation (stack shown below)
at the same time when checkpoint is running on another core.

schedule
down_write
f2fs_submit_page_write -> by this time, this page in page cache is tagged
as PAGECACHE_TAG_WRITEBACK and PAGECACHE_TAG_DIRTY
is cleared, due to which f2fs_sync_meta_pages()
cannot sync this page in do_checkpoint() path.
f2fs_do_write_meta_page
__f2fs_write_meta_page
f2fs_write_meta_page
shrink_page_list
shrink_inactive_list
shrink_node_memcg
shrink_node
kswapd

Signed-off-by: Sahitya Tummala
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Sahitya Tummala
2020-02-28 02:16:44 +0800

16 Jan, 2020

1 commit

542989b67 f2fs: don't keep META_MAPPING pages used for moving verity file blocks

META_MAPPING is used to move blocks for both encrypted and verity files.
So the META_MAPPING invalidation condition in do_checkpoint() should
consider verity too, not just encrypt.

Signed-off-by: Eric Biggers
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Eric Biggers
2020-01-16 05:43:48 +0800

20 Nov, 2019

1 commit

c45d6002f f2fs: show f2fs instance in printk_ratelimited

As Eric mentioned, bare printk{,_ratelimited} won't show which
filesystem instance these messages are coming from. This patch tries
to show the fs instance with the sb->s_id field in all places we missed
before.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2019-11-20 06:41:21 +0800

03 Jul, 2019

4 commits

db6ec53b7 f2fs: add a rw_sem to cover quota flag changes

Two paths to update quota and f2fs_lock_op:

1.
- lock_op
| - quota_update
`- unlock_op

2.
- quota_update
- lock_op
`- unlock_op

But, we need to make a transaction on quota_update + lock_op in #2 case.
So, this patch introduces:
1. lock_op
2. down_write
3. check __need_flush
4. up_write
5. if there are dirty quota entries, flush them
6. otherwise, good to go

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2019-07-03 06:40:41 +0800
10f966bbf f2fs: use generic EFSBADCRC/EFSCORRUPTED

f2fs uses EFAULT as error number to indicate filesystem is corrupted
all the time, but generic filesystems use EUCLEAN for such condition,
we need to change to follow others.

This patch adds two new macros as below to wrap more generic error
code macros, and spread them in code.

EFSBADCRC EBADMSG /* Bad CRC detected */
EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
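
As a rough illustration of how the two macros might be used, here is a userspace sketch. Only the two `#define` lines come from the message above; `struct blk_hdr`, `SKETCH_MAGIC` and `check_block()` are hypothetical and exist only for this example. It assumes a Linux `<errno.h>`, which provides EUCLEAN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */

/* Hypothetical on-disk header, used only for this sketch. */
struct blk_hdr {
	uint32_t magic;
	uint32_t stored_crc;
};

#define SKETCH_MAGIC 0x12345678u	/* made-up value */

/* Return 0 if the header looks sane, or a negative error code that
 * distinguishes structural corruption from a checksum mismatch. */
static int check_block(const struct blk_hdr *h, uint32_t computed_crc)
{
	assert(h != NULL);
	if (h->magic != SKETCH_MAGIC)
		return -EFSCORRUPTED;
	if (h->stored_crc != computed_crc)
		return -EFSBADCRC;
	return 0;
}
```

Callers can then tell the two failure classes apart without overloading EFAULT for both.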

Reported-by: Pavel Machek
Signed-off-by: Chao Yu
Acked-by: Pavel Machek
Signed-off-by: Jaegeuk Kim

Chao Yu
2019-07-03 06:40:41 +0800
dcbb4c10e f2fs: introduce f2fs_<level> macros to wrap f2fs_printk()

- Add and use f2fs_<level> macros
- Convert f2fs_msg to f2fs_printk
- Remove level from f2fs_printk and embed the level in the format
- Coalesce formats and align multi-line arguments
- Remove unnecessary duplicate extern f2fs_msg f2fs.h

Signed-off-by: Joe Perches
Signed-off-by: Chao Yu
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Joe Perches
2019-07-03 06:40:40 +0800
04f0b2eaa f2fs: ioctl for removing a range from F2FS

This ioctl shrinks a given length (aligned to sections) from end of the
main area. Any cursegs and valid blocks will be moved out before
invalidating the range.

This feature can be used for adjusting partition sizes online.

History of the patch:

Sahitya Tummala:
- Add this ioctl for f2fs_compat_ioctl() as well.
- Fix debugfs status to reflect the online resize changes.
- Fix potential race between online resize path and allocate new data
block path or gc path.

Others:
- Rename some identifiers.
- Add some error handling branches.
- Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range.
- Implement this interface as ext4's, and change the parameter from shrunk
bytes to new block count of F2FS.
- During resizing, force to empty sit_journal and forbid adding new
entries to it, in order to avoid invalid segno in journal after resize.
- Reduce sbi->user_block_count before resize starts.
- Commit the updated superblock first, and then update in-memory metadata
only when the former succeeds.
- Target block count must align to sections.
- Write checkpoint before and after committing the new superblock, w/o
CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if
resize fails after the new superblock is committed.
- In free_segment_range(), reduce granularity of gc_mutex.
- Add protection on curseg migration.
- Add freeze_bdev() and thaw_bdev() for resize fs.
- Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation.
- Recover super_block and FS metadata when resize fails.
- No need to clear CP_FSCK_FLAG in update_ckpt_flags().
- Clean up the sb and fs metadata update functions for resize_fs.

Geert Uytterhoeven:
- Use div_u64*() for 64-bit divisions

Arnd Bergmann:
- Not all architectures support get_user() with a 64-bit argument:
ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined!
Use copy_from_user() here, this will always work.

Signed-off-by: Qiuyang Sun
Signed-off-by: Chao Yu
Signed-off-by: Sahitya Tummala
Signed-off-by: Geert Uytterhoeven
Signed-off-by: Arnd Bergmann
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Qiuyang Sun
2019-07-03 06:39:24 +0800

23 May, 2019

2 commits

5dae2d390 f2fs: fix to check layout on last valid checkpoint park

As Ju Hyung reported:

"
I was semi-forced today to use the new kernel and test f2fs.

My Ubuntu initramfs got a bit wonky and I had to boot into live CD and
fix some stuffs. The live CD was using 4.15 kernel, and just mounting
the f2fs partition there corrupted f2fs and my 4.19(with 5.1-rc1-4.19
f2fs-stable merged) refused to mount with "SIT is corrupted node"
message.

I used the latest f2fs-tools sent by Chao including "fsck.f2fs: fix to
repair cp_loads blocks at correct position"

It spit out 140M worth of output, but at least I didn't have to run it
twice. Everything returned "Ok" in the 2nd run.
The new log is at
http://arter97.com/f2fs/final

After fixing the image, I used my 4.19 kernel with 5.2-rc1-4.19
f2fs-stable merged and it mounted.

But, I got this:
[ 1.047791] F2FS-fs (nvme0n1p3): layout of large_nat_bitmap is
deprecated, run fsck to repair, chksum_offset: 4092
[ 1.081307] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
[ 1.161520] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
[ 1.162418] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7e00

But after doing a reboot, the message is gone:
[ 1.098423] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
[ 1.177771] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
[ 1.178365] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7eda

I'm not exactly sure why the kernel detected that I'm still using the
old layout on the first boot. Maybe fsck didn't fix it properly, or
the check from the kernel is improper.
"

Although we have rebuilt the old deprecated checkpoint with the new layout
during repair, we only repaired the last checkpoint park; the other, older
one remains.

Once the image is mounted, we 1) sanity check the layout and 2) decide
which checkpoint park to use according to cp_ver. As a result, the reported
message is printed unnecessarily at step 1). To avoid it, we simply move the
layout check into f2fs_sanity_check_ckpt(), after step 2).

Reported-by: Park Ju Hyung
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2019-05-23 22:03:18 +0800
bc88ac96a f2fs: link f2fs quota ops for sysfile

This patch reverts:
commit fb40d618b039 ("f2fs: don't clear CP_QUOTA_NEED_FSCK_FLAG").

We were missing error handlers used in f2fs quota ops.

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2019-05-23 22:03:11 +0800

09 May, 2019

7 commits

c9c8ed50d f2fs: fix to avoid potential race on sbi->unusable_block_count access/update

Use sbi.stat_lock to protect sbi->unusable_block_count access/update, in
order to avoid a potential race on it.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2019-05-09 12:23:13 +0800
93770ab7a f2fs: introduce DATA_GENERIC_ENHANCE

Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) only checked
whether @blkaddr is located in the main area or not.

That check is weak, since a block address within the range of the main area
can point to an address which is not valid in the segment info table. We
cannot detect such a condition, and we may suffer worse corruption as the
system continues running.

So this patch introduces DATA_GENERIC_ENHANCE to enhance the sanity check,
which triggers a SIT bitmap check rather than only a range check.
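
The difference between the two checks can be sketched with a tiny model. Everything here is hypothetical and simplified for illustration: `struct area`, `in_main_area()` and `is_valid_enhanced()` are invented names, and the one-bit-per-block, least-significant-bit-first map is an assumption that need not match the on-disk SIT bitmap layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a main area with a per-block validity bitmap. */
struct area {
	uint32_t main_start;      /* first block address of the main area */
	uint32_t main_end;        /* one past the last block address */
	const uint8_t *valid_map; /* one bit per block, relative to main_start */
};

/* Weak check: the address only has to fall inside the main area. */
static bool in_main_area(const struct area *a, uint32_t blkaddr)
{
	return blkaddr >= a->main_start && blkaddr < a->main_end;
}

/* Enhanced check: the address must be in range AND marked valid. */
static bool is_valid_enhanced(const struct area *a, uint32_t blkaddr)
{
	uint32_t off;

	assert(a->valid_map != NULL);
	if (!in_main_area(a, blkaddr))
		return false;
	off = blkaddr - a->main_start;
	return (a->valid_map[off / 8] >> (off % 8)) & 1;
}
```

An in-range address whose bit is clear passes the weak check but fails the enhanced one, which is the case the range-only check could not detect.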

This patch also makes the changes below:
- set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
- get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
- introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
- spread blkaddr check in:
* f2fs_get_node_info()
* __read_out_blkaddrs()
* f2fs_submit_page_read()
* ra_data_block()
* do_recover_data()

This patch can fix bug reported from bugzilla below:

https://bugzilla.kernel.org/show_bug.cgi?id=203215
https://bugzilla.kernel.org/show_bug.cgi?id=203223
https://bugzilla.kernel.org/show_bug.cgi?id=203231
https://bugzilla.kernel.org/show_bug.cgi?id=203235
https://bugzilla.kernel.org/show_bug.cgi?id=203241

= Update by Jaegeuk Kim =

DATA_GENERIC_ENHANCE was enhanced to validate block addresses on read/write
paths. But xfstest/generic/446 complains about generated kernel messages
saying an invalid bitmap was detected when reading a block. The reason is
that, when we get the block addresses from extent_cache, there is no lock to
synchronize it against truncating the blocks in parallel.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2019-05-09 12:23:13 +0800
f5a131bb2 f2fs: fix to be aware of readonly device in write_checkpoint()

As Park Ju Hyung reported:

Probably unrelated but a similar issue:
Warning appears upon unmounting a corrupted R/O f2fs loop image.

Should be a trivial issue to fix as well :)

[ 2373.758424] ------------[ cut here ]------------
[ 2373.758428] generic_make_request: Trying to write to read-only
block-device loop1 (partno 0)
[ 2373.758455] WARNING: CPU: 1 PID: 13950 at block/blk-core.c:2174
generic_make_request_checks+0x590/0x630
[ 2373.758556] CPU: 1 PID: 13950 Comm: umount Tainted: G O
4.19.35-zen+ #1
[ 2373.758558] Hardware name: System manufacturer System Product
Name/ROG MAXIMUS X HERO (WI-FI AC), BIOS 1704 09/14/2018
[ 2373.758564] RIP: 0010:generic_make_request_checks+0x590/0x630
[ 2373.758567] Code: 5c 03 00 00 48 8d 74 24 08 48 89 df c6 05 b5 cd
36 01 01 e8 c2 90 01 00 48 89 c6 44 89 ea 48 c7 c7 98 64 59 82 e8 d5
9b a7 ff 0b 48 8b 7b 08 e9 f2 fa ff ff 41 8b 86 98 02 00 00 49 8b
16 89
[ 2373.758570] RSP: 0018:ffff8882bdb43950 EFLAGS: 00010282
[ 2373.758573] RAX: 0000000000000050 RBX: ffff8887244c6700 RCX: 0000000000000006
[ 2373.758575] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff88884ec56340
[ 2373.758577] RBP: ffff888849c426c0 R08: 0000000000000004 R09: 00000000000003ba
[ 2373.758579] R10: 0000000000000001 R11: 0000000000000029 R12: 0000000000001000
[ 2373.758581] R13: 0000000000000000 R14: ffff888844a2e800 R15: ffff8882bdb43ac0
[ 2373.758584] FS: 00007fc0d114f8c0(0000) GS:ffff88884ec40000(0000)
knlGS:0000000000000000
[ 2373.758586] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2373.758588] CR2: 00007fc0d1ad12c0 CR3: 00000002bdb82003 CR4: 00000000003606e0
[ 2373.758590] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2373.758592] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2373.758593] Call Trace:
[ 2373.758602] ? generic_make_request+0x46/0x3d0
[ 2373.758608] ? wait_woken+0x80/0x80
[ 2373.758612] ? mempool_alloc+0xb7/0x1a0
[ 2373.758618] ? submit_bio+0x30/0x110
[ 2373.758622] ? bvec_alloc+0x7c/0xd0
[ 2373.758628] ? __submit_merged_bio+0x68/0x390
[ 2373.758633] ? f2fs_submit_page_write+0x1bb/0x7f0
[ 2373.758638] ? f2fs_do_write_meta_page+0x7f/0x160
[ 2373.758642] ? __f2fs_write_meta_page+0x70/0x140
[ 2373.758647] ? f2fs_sync_meta_pages+0x140/0x250
[ 2373.758653] ? f2fs_write_checkpoint+0x5c5/0x17b0
[ 2373.758657] ? f2fs_sync_fs+0x9c/0x110
[ 2373.758664] ? sync_filesystem+0x66/0x80
[ 2373.758667] ? generic_shutdown_super+0x1d/0x100
[ 2373.758670] ? kill_block_super+0x1c/0x40
[ 2373.758674] ? kill_f2fs_super+0x64/0xb0
[ 2373.758678] ? deactivate_locked_super+0x2d/0xb0
[ 2373.758682] ? cleanup_mnt+0x65/0xa0
[ 2373.758688] ? task_work_run+0x7f/0xa0
[ 2373.758693] ? exit_to_usermode_loop+0x9c/0xa0
[ 2373.758698] ? do_syscall_64+0xc7/0xf0
[ 2373.758703] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2373.758706] ---[ end trace 5d3639907c56271b ]---
[ 2373.758780] print_req_error: I/O error, dev loop1, sector 143048
[ 2373.758800] print_req_error: I/O error, dev loop1, sector 152200
[ 2373.758808] print_req_error: I/O error, dev loop1, sector 8192
[ 2373.758819] print_req_error: I/O error, dev loop1, sector 12272

This patch detects a readonly device in write_checkpoint() to avoid
triggering write IOs on it.

Reported-by: Park Ju Hyung
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2019-05-09 12:23:12 +0800
b61af314c f2fs: fix to skip recovery on readonly device

As Park Ju Hyung reported in mailing list:

https://sourceforge.net/p/linux-f2fs/mailman/message/36639787/

generic_make_request: Trying to write to read-only block-device loop0 (partno 0)
WARNING: CPU: 0 PID: 23437 at block/blk-core.c:2174 generic_make_request_checks+0x594/0x630

generic_make_request+0x46/0x3d0
submit_bio+0x30/0x110
__submit_merged_bio+0x68/0x390
f2fs_submit_page_write+0x1bb/0x7f0
f2fs_do_write_meta_page+0x7f/0x160
__f2fs_write_meta_page+0x70/0x140
f2fs_sync_meta_pages+0x140/0x250
f2fs_write_checkpoint+0x5c5/0x17b0
f2fs_sync_fs+0x9c/0x110
sync_filesystem+0x66/0x80
f2fs_recover_fsync_data+0x790/0xa30
f2fs_fill_super+0xe4e/0x1980
mount_bdev+0x518/0x610
mount_fs+0x34/0x13f
vfs_kern_mount.part.11+0x4f/0x120
do_mount+0x2d1/0xe40
__x64_sys_mount+0xbf/0xe0
do_syscall_64+0x4a/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9

print_req_error: I/O error, dev loop0, sector 4096

If the block device is readonly, we should never trigger write IO from
the filesystem layer, but previously, orphan and journal recovery didn't
consider this condition, resulting in the above warning. Fix it.

Reported-by: Park Ju Hyung
Tested-by: Park Ju Hyung
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2019-05-09 12:23:12 +0800
b471eb99e f2fs: relocate chksum_offset for large_nat_bitmap feature

For large_nat_bitmap feature, there is a design flaw:

Previous:

struct f2fs_checkpoint layout:
+--------------------------+ 0x0000
| checkpoint_ver |
| ...... |
| checksum_offset |------+
| ...... | |
| sit_nat_version_bitmap[] |
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2019-05-09 12:23:11 +0800
d7eb8f1cd f2fs: allow unfixed f2fs_checkpoint.checksum_offset

Previously, f2fs_checkpoint.checksum_offset pointed to a fixed position in
the f2fs_checkpoint structure:

"#define CP_CHKSUM_OFFSET 4092"

It is unnecessary, and it breaks the consecutiveness of nat and sit
bitmap stored across checkpoint park block and payload blocks.

This patch allows f2fs to handle unfixed .checksum_offset.

In addition, for the case where the checksum value is stored in the middle
of the checkpoint park, calculate the checksum value with the superposition
method, as we did for inode_checksum.
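
One way to checksum a block whose checksum field sits in the middle is to feed the CRC with the bytes before the field and then continue over the bytes after it. The sketch below assumes that this is what "superposition" refers to; it is not taken from the patch. `crc32_update()` and `block_checksum()` are hypothetical helpers, and the bitwise CRC-32 is only a stand-in for whatever CRC the filesystem actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain bitwise CRC-32 update (reflected polynomial 0xEDB88320) that
 * continues from the running value `crc`. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Checksum a block whose 4-byte checksum field starts at `off`:
 * cover the bytes before the field, skip the field itself, then
 * continue the same running CRC over the bytes after it. */
static uint32_t block_checksum(const uint8_t *blk, size_t len, size_t off)
{
	uint32_t crc;

	assert(off + sizeof(uint32_t) <= len);
	crc = crc32_update(0xFFFFFFFFu, blk, off);
	off += sizeof(uint32_t);
	return crc32_update(crc, blk + off, len - off);
}
```

Because the field is skipped, the result does not depend on what is currently stored in it, so the same routine can be used both to compute the value to store and to verify it later. When the field is at the very end of the block, the second call covers zero bytes and the result is the same as a single pass over the preceding bytes.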

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2019-05-09 12:23:11 +0800
6dc3a1266 f2fs: fix wrong __is_meta_io() macro

This patch changes codes as below:
- don't use is_read_io() as a condition to judge the meta IO.
- use .is_por to replace .is_meta to indicate IO is from recovery explicitly.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2019-05-09 12:23:07 +0800