22 Dec, 2020
4 commits
-
Light reported sometimes shinker gets nat_cnt < dirty_nat_cnt resulting in
wrong do_shinker work. Let's avoid to return insane overflowed value by adding
single tracking value.Reported-by: Light Hsieh
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
For multi-device case, one f2fs image includes multi devices, so it
needs to account bytes written of all block devices belong to the image
rather than one main block device, fix it.Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
Use rwsem to ensure serialization of the callers and to avoid
starvation of high priority tasks, when the system is under
heavy IO workload.Signed-off-by: Sahitya Tummala
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
There are two assignments are meaningless, and remove them.
Signed-off-by: Zhang Qilong
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
14 Oct, 2020
1 commit
-
First problem is we hit BUG_ON() in f2fs_get_sum_page given EIO on
f2fs_get_meta_page_nofail().Quick fix was not to give any error with infinite loop, but syzbot caught
a case where it goes to that loop from fuzzed image. In turned out we abused
f2fs_get_meta_page_nofail() like in the below call stack.- f2fs_fill_super
- f2fs_build_segment_manager
- build_sit_entries
- get_current_sit_pageINFO: task syz-executor178:6870 can't die for more than 143 seconds.
task:syz-executor178 state:R
stack:26960 pid: 6870 ppid: 6869 flags:0x00004006
Call Trace:Showing all locks held in the system:
1 lock held by khungtaskd/1179:
#0: ffffffff8a554da0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6242
1 lock held by systemd-journal/3920:
1 lock held by in:imklog/6769:
#0: ffff88809eebc130 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:930
1 lock held by syz-executor178/6870:
#0: ffff8880925120e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0x201/0xaf0 fs/super.c:229Actually, we didn't have to use _nofail in this case, since we could return
error to mount(2) already with the error handler.As a result, this patch tries to 1) remove _nofail callers as much as possible,
2) deal with error case in last remaining caller, f2fs_get_sum_page().Reported-by: syzbot+ee250ac8137be41d7b13@syzkaller.appspotmail.com
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
29 Sep, 2020
2 commits
-
As syzbot reported:
kernel BUG at fs/f2fs/segment.h:657!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 16220 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:f2fs_ra_meta_pages+0xa51/0xdc0 fs/f2fs/segment.h:657
Call Trace:
build_sit_entries fs/f2fs/segment.c:4195 [inline]
f2fs_build_segment_manager+0x4b8a/0xa3c0 fs/f2fs/segment.c:4779
f2fs_fill_super+0x377d/0x6b80 fs/f2fs/super.c:3633
mount_bdev+0x32e/0x3f0 fs/super.c:1417
legacy_get_tree+0x105/0x220 fs/fs_context.c:592
vfs_get_tree+0x89/0x2f0 fs/super.c:1547
do_new_mount fs/namespace.c:2875 [inline]
path_mount+0x1387/0x2070 fs/namespace.c:3192
do_mount fs/namespace.c:3205 [inline]
__do_sys_mount fs/namespace.c:3413 [inline]
__se_sys_mount fs/namespace.c:3390 [inline]
__x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9@blkno in f2fs_ra_meta_pages could exceed max segment count, causing panic
in following sanity check in current_sit_addr(), add check condition to
avoid this issue.Reported-by: syzbot+3698081bcf0bb2d12174@syzkaller.appspotmail.com
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
Missing the trace exit in f2fs_sync_dirty_inodes
Signed-off-by: Zhang Qilong
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
12 Sep, 2020
1 commit
-
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.3. merge valid blocks from source victim into target victim with
SSR alloctor.Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GCBenefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segmentsSigned-off-by: Chao Yu
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim
11 Sep, 2020
1 commit
-
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later featureSo that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
04 Aug, 2020
3 commits
-
This is to avoid sleep() in the waiter thread.
[ 20.157753] ------------[ cut here ]------------
[ 20.158393] do not call blocking ops when !TASK_RUNNING; state=2 set at [] prepare_to_wait+0xcd/0x430
[ 20.159858] WARNING: CPU: 1 PID: 1152 at kernel/sched/core.c:7142 __might_sleep+0x149/0x1a0
...
[ 20.176110] __submit_merged_write_cond+0x191/0x310
[ 20.176739] f2fs_submit_merged_write+0x18/0x20
[ 20.177323] f2fs_wait_on_all_pages+0x269/0x2d0
[ 20.177899] ? block_operations+0x980/0x980
[ 20.178441] ? __kasan_check_read+0x11/0x20
[ 20.178975] ? finish_wait+0x260/0x260
[ 20.179488] ? percpu_counter_set+0x147/0x230
[ 20.180049] do_checkpoint+0x1757/0x2a50
[ 20.180558] f2fs_write_checkpoint+0x840/0xaf0
[ 20.181126] f2fs_sync_fs+0x287/0x4a0Reported-by: Eric Biggers
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
f2fs_write_data_pages(quota_mapping)
__f2fs_write_data_pages f2fs_write_checkpoint
* blk_start_plug(&plug);
* add bio in write_io[DATA]
- block_operations
- skip syncing quota by
>DEFAULT_RETRY_QUOTA_FLUSH_COUNT
- down_write(&sbi->node_write);
- f2fs_write_single_data_page
- down_read(node_write)
- f2fs_wait_on_all_pages(F2FS_WB_CP_DATA);Signed-off-by: Daeho Jeong
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
Function parameter mode could be TRANS_DIR_INO.
Signed-off-by: Jack Qiu
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
26 Jul, 2020
1 commit
-
Just for code style, no logic change
1. delete useless space
2. change spaces into tabSigned-off-by: Jack Qiu
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
09 Jul, 2020
1 commit
-
meta inode's pages are used for encrypted, verity and compressed blocks,
so the meta inode's cache invalidation condition in do_checkpoint() should
consider compression as well, not just for verity and encryption, fix it.Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
09 Jun, 2020
1 commit
-
kmalloc() returns kmalloc'ed memory, and kvmalloc() returns either
kmalloc'ed or vmalloc'ed memory. But the f2fs wrappers, f2fs_kmalloc()
and f2fs_kvmalloc(), both return both kinds of memory.It's redundant to have two functions that do the same thing, and also
breaking the standard naming convention is causing bugs since people
assume it's safe to kfree() memory allocated by f2fs_kmalloc(). See
e.g. the various allocations in fs/f2fs/compress.c.Fix this by making f2fs_kmalloc() just use kmalloc(). And to avoid
re-introducing the allocation failures that the vmalloc fallback was
intended to fix, convert the largest allocations to use f2fs_kvmalloc().Signed-off-by: Eric Biggers
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
19 May, 2020
1 commit
-
Let's guarantee flusing dirty meta pages to avoid infinite loop.
Signed-off-by: Jaegeuk Kim
12 May, 2020
3 commits
-
Sahitya raised an issue:
- prevent meta updates while checkpoint is in progressallocate_segment_for_resize() can cause metapage updates if
it requires to change the current node/data segments for resizing.
Stop these meta updates when there is a checkpoint already
in progress to prevent inconsistent CP data.Signed-off-by: Sahitya Tummala
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
There could be a scenario where f2fs_sync_node_pages gets
called during checkpoint, which in turn tries to flush
inline data and calls iput(). This results in deadlock as
iput() tries to hold cp_rwsem, which is already held at the
beginning by checkpoint->block_operations().Call stack :
Thread A Thread B
f2fs_write_checkpoint()
- block_operations(sbi)
- f2fs_lock_all(sbi);
- down_write(&sbi->cp_rwsem);- open()
- igrab()
- write() write inline data
- unlink()
- f2fs_sync_node_pages()
- if (is_inline_node(page))
- flush_inline_data()
- ilookup()
page = f2fs_pagecache_get_page()
if (!page)
goto iput_out;
iput_out:
-close()
-iput()
iput(inode);
- f2fs_evict_inode()
- f2fs_truncate_blocks()
- f2fs_lock_op()
- down_read(&sbi->cp_rwsem);Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
Signed-off-by: Sayali Lokhande
Signed-off-by: Jaegeuk Kim -
blk_plugging doesn't seem to give any benefit.
Signed-off-by: Jaegeuk Kim
18 Apr, 2020
2 commits
-
Adds to support accounting read IOs from userspace/kernel.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
In f2fs_ra_meta_pages(), if f2fs_submit_page_bio() failed, we need to
unlock page, fix it.Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
23 Mar, 2020
1 commit
-
Add and set a new CP flag CP_RESIZEFS_FLAG during
online resize FS to help fsck fix the metadata mismatch
that may happen due to SPO during resize, where SB
got updated but CP data couldn't be written yet.fsck errors -
Info: CKPT version = 6ed7bccb
Wrong user_block_count(2233856)
[f2fs_do_mount:3365] Checkpoint is pollutedSigned-off-by: Sahitya Tummala
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
20 Mar, 2020
1 commit
-
As Geert Uytterhoeven reported:
for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);
On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.Quoted from Geert Uytterhoeven:
"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
11 Mar, 2020
1 commit
-
Lack of maintenance on comments may mislead developers, fix them.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
28 Feb, 2020
1 commit
-
There could be a scenario where f2fs_sync_meta_pages() will not
ensure that all F2FS_DIRTY_META pages are submitted for IO. Thus,
resulting in the below panic in do_checkpoint() -f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) &&
!f2fs_cp_error(sbi));This can happen in a low-memory condition, where shrinker could
also be doing the writepage operation (stack shown below)
at the same time when checkpoint is running on another core.schedule
down_write
f2fs_submit_page_write -> by this time, this page in page cache is tagged
as PAGECACHE_TAG_WRITEBACK and PAGECACHE_TAG_DIRTY
is cleared, due to which f2fs_sync_meta_pages()
cannot sync this page in do_checkpoint() path.
f2fs_do_write_meta_page
__f2fs_write_meta_page
f2fs_write_meta_page
shrink_page_list
shrink_inactive_list
shrink_node_memcg
shrink_node
kswapdSigned-off-by: Sahitya Tummala
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
16 Jan, 2020
1 commit
-
META_MAPPING is used to move blocks for both encrypted and verity files.
So the META_MAPPING invalidation condition in do_checkpoint() should
consider verity too, not just encrypt.Signed-off-by: Eric Biggers
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
20 Nov, 2019
1 commit
-
As Eric mentioned, bare printk{,_ratelimited} won't show which
filesystem instance these message is coming from, this patch tries
to show fs instance with sb->s_id field in all places we missed
before.Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
03 Jul, 2019
4 commits
-
Two paths to update quota and f2fs_lock_op:
1.
- lock_op
| - quota_update
`- unlock_op2.
- quota_update
- lock_op
`- unlock_opBut, we need to make a transaction on quota_update + lock_op in #2 case.
So, this patch introduces:
1. lock_op
2. down_write
3. check __need_flush
4. up_write
5. if there is dirty quota entries, flush them
6. otherwise, good to goSigned-off-by: Jaegeuk Kim
-
f2fs uses EFAULT as error number to indicate filesystem is corrupted
all the time, but generic filesystems use EUCLEAN for such condition,
we need to change to follow others.This patch adds two new macros as below to wrap more generic error
code macros, and spread them in code.EFSBADCRC EBADMSG /* Bad CRC detected */
EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */Reported-by: Pavel Machek
Signed-off-by: Chao Yu
Acked-by: Pavel Machek
Signed-off-by: Jaegeuk Kim -
- Add and use f2fs_ macros
- Convert f2fs_msg to f2fs_printk
- Remove level from f2fs_printk and embed the level in the format
- Coalesce formats and align multi-line arguments
- Remove unnecessary duplicate extern f2fs_msg f2fs.hSigned-off-by: Joe Perches
Signed-off-by: Chao Yu
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
This ioctl shrinks a given length (aligned to sections) from end of the
main area. Any cursegs and valid blocks will be moved out before
invalidating the range.This feature can be used for adjusting partition sizes online.
History of the patch:
Sahitya Tummala:
- Add this ioctl for f2fs_compat_ioctl() as well.
- Fix debugfs status to reflect the online resize changes.
- Fix potential race between online resize path and allocate new data
block path or gc path.Others:
- Rename some identifiers.
- Add some error handling branches.
- Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range.
- Implement this interface as ext4's, and change the parameter from shrunk
bytes to new block count of F2FS.
- During resizing, force to empty sit_journal and forbid adding new
entries to it, in order to avoid invalid segno in journal after resize.
- Reduce sbi->user_block_count before resize starts.
- Commit the updated superblock first, and then update in-memory metadata
only when the former succeeds.
- Target block count must align to sections.
- Write checkpoint before and after committing the new superblock, w/o
CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if
resize fails after the new superblock is committed.
- In free_segment_range(), reduce granularity of gc_mutex.
- Add protection on curseg migration.
- Add freeze_bdev() and thaw_bdev() for resize fs.
- Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation.
- Recover super_block and FS metadata when resize fails.
- No need to clear CP_FSCK_FLAG in update_ckpt_flags().
- Clean up the sb and fs metadata update functions for resize_fs.Geert Uytterhoeven:
- Use div_u64*() for 64-bit divisionsArnd Bergmann:
- Not all architectures support get_user() with a 64-bit argument:
ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined!
Use copy_from_user() here, this will always work.Signed-off-by: Qiuyang Sun
Signed-off-by: Chao Yu
Signed-off-by: Sahitya Tummala
Signed-off-by: Geert Uytterhoeven
Signed-off-by: Arnd Bergmann
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
23 May, 2019
2 commits
-
As Ju Hyung reported:
"
I was semi-forced today to use the new kernel and test f2fs.My Ubuntu initramfs got a bit wonky and I had to boot into live CD and
fix some stuffs. The live CD was using 4.15 kernel, and just mounting
the f2fs partition there corrupted f2fs and my 4.19(with 5.1-rc1-4.19
f2fs-stable merged) refused to mount with "SIT is corrupted node"
message.I used the latest f2fs-tools sent by Chao including "fsck.f2fs: fix to
repair cp_loads blocks at correct position"It spit out 140M worth of output, but at least I didn't have to run it
twice. Everything returned "Ok" in the 2nd run.
The new log is at
http://arter97.com/f2fs/finalAfter fixing the image, I used my 4.19 kernel with 5.2-rc1-4.19
f2fs-stable merged and it mounted.But, I got this:
[ 1.047791] F2FS-fs (nvme0n1p3): layout of large_nat_bitmap is
deprecated, run fsck to repair, chksum_offset: 4092
[ 1.081307] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
[ 1.161520] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
[ 1.162418] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7e00But after doing a reboot, the message is gone:
[ 1.098423] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
[ 1.177771] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
[ 1.178365] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7edaI'm not exactly sure why the kernel detected that I'm still using the
old layout on the first boot. Maybe fsck didn't fix it properly, or
the check from the kernel is improper.
"Although we have rebuild the old deprecated checkpoint with new layout
during repair, we only repair last checkpoint park, the other old one is
remained.Once the image was mounted, we will 1) sanity check layout and 2) decide
which checkpoint park to use according to cp_ver. So that we will print
reported message unnecessarily at step 1), to avoid it, we simply move
layout check into f2fs_sanity_check_ckpt() after step 2).Reported-by: Park Ju Hyung
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
This patch reverts:
commit fb40d618b039 ("f2fs: don't clear CP_QUOTA_NEED_FSCK_FLAG").We were missing error handlers used in f2fs quota ops.
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
09 May, 2019
7 commits
-
Use sbi.stat_lock to protect sbi->unusable_block_count accesss/udpate, in
order to avoid potential race on it.Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
whether @blkaddr locates in main area or not.That check is weak, since the block address in range of main area can
point to the address which is not valid in segment info table, and we
can not detect such condition, we may suffer worse corruption as system
continues running.So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
which trigger SIT bitmap check rather than only range check.This patch did below changes as wel:
- set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
- get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
- introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
- spread blkaddr check in:
* f2fs_get_node_info()
* __read_out_blkaddrs()
* f2fs_submit_page_read()
* ra_data_block()
* do_recover_data()This patch can fix bug reported from bugzilla below:
https://bugzilla.kernel.org/show_bug.cgi?id=203215
https://bugzilla.kernel.org/show_bug.cgi?id=203223
https://bugzilla.kernel.org/show_bug.cgi?id=203231
https://bugzilla.kernel.org/show_bug.cgi?id=203235
https://bugzilla.kernel.org/show_bug.cgi?id=203241= Update by Jaegeuk Kim =
DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
But, xfstest/generic/446 compalins some generated kernel messages saying invalid
bitmap was detected when reading a block. The reaons is, when we get the
block addresses from extent_cache, there is no lock to synchronize it from
truncating the blocks in parallel.Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
As Park Ju Hyung reported:
Probably unrelated but a similar issue:
Warning appears upon unmounting a corrupted R/O f2fs loop image.Should be a trivial issue to fix as well :)
[ 2373.758424] ------------[ cut here ]------------
[ 2373.758428] generic_make_request: Trying to write to read-only
block-device loop1 (partno 0)
[ 2373.758455] WARNING: CPU: 1 PID: 13950 at block/blk-core.c:2174
generic_make_request_checks+0x590/0x630
[ 2373.758556] CPU: 1 PID: 13950 Comm: umount Tainted: G O
4.19.35-zen+ #1
[ 2373.758558] Hardware name: System manufacturer System Product
Name/ROG MAXIMUS X HERO (WI-FI AC), BIOS 1704 09/14/2018
[ 2373.758564] RIP: 0010:generic_make_request_checks+0x590/0x630
[ 2373.758567] Code: 5c 03 00 00 48 8d 74 24 08 48 89 df c6 05 b5 cd
36 01 01 e8 c2 90 01 00 48 89 c6 44 89 ea 48 c7 c7 98 64 59 82 e8 d5
9b a7 ff 0b 48 8b 7b 08 e9 f2 fa ff ff 41 8b 86 98 02 00 00 49 8b
16 89
[ 2373.758570] RSP: 0018:ffff8882bdb43950 EFLAGS: 00010282
[ 2373.758573] RAX: 0000000000000050 RBX: ffff8887244c6700 RCX: 0000000000000006
[ 2373.758575] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff88884ec56340
[ 2373.758577] RBP: ffff888849c426c0 R08: 0000000000000004 R09: 00000000000003ba
[ 2373.758579] R10: 0000000000000001 R11: 0000000000000029 R12: 0000000000001000
[ 2373.758581] R13: 0000000000000000 R14: ffff888844a2e800 R15: ffff8882bdb43ac0
[ 2373.758584] FS: 00007fc0d114f8c0(0000) GS:ffff88884ec40000(0000)
knlGS:0000000000000000
[ 2373.758586] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2373.758588] CR2: 00007fc0d1ad12c0 CR3: 00000002bdb82003 CR4: 00000000003606e0
[ 2373.758590] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2373.758592] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2373.758593] Call Trace:
[ 2373.758602] ? generic_make_request+0x46/0x3d0
[ 2373.758608] ? wait_woken+0x80/0x80
[ 2373.758612] ? mempool_alloc+0xb7/0x1a0
[ 2373.758618] ? submit_bio+0x30/0x110
[ 2373.758622] ? bvec_alloc+0x7c/0xd0
[ 2373.758628] ? __submit_merged_bio+0x68/0x390
[ 2373.758633] ? f2fs_submit_page_write+0x1bb/0x7f0
[ 2373.758638] ? f2fs_do_write_meta_page+0x7f/0x160
[ 2373.758642] ? __f2fs_write_meta_page+0x70/0x140
[ 2373.758647] ? f2fs_sync_meta_pages+0x140/0x250
[ 2373.758653] ? f2fs_write_checkpoint+0x5c5/0x17b0
[ 2373.758657] ? f2fs_sync_fs+0x9c/0x110
[ 2373.758664] ? sync_filesystem+0x66/0x80
[ 2373.758667] ? generic_shutdown_super+0x1d/0x100
[ 2373.758670] ? kill_block_super+0x1c/0x40
[ 2373.758674] ? kill_f2fs_super+0x64/0xb0
[ 2373.758678] ? deactivate_locked_super+0x2d/0xb0
[ 2373.758682] ? cleanup_mnt+0x65/0xa0
[ 2373.758688] ? task_work_run+0x7f/0xa0
[ 2373.758693] ? exit_to_usermode_loop+0x9c/0xa0
[ 2373.758698] ? do_syscall_64+0xc7/0xf0
[ 2373.758703] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2373.758706] ---[ end trace 5d3639907c56271b ]---
[ 2373.758780] print_req_error: I/O error, dev loop1, sector 143048
[ 2373.758800] print_req_error: I/O error, dev loop1, sector 152200
[ 2373.758808] print_req_error: I/O error, dev loop1, sector 8192
[ 2373.758819] print_req_error: I/O error, dev loop1, sector 12272This patch adds to detect readonly device in write_checkpoint() to avoid
trigger write IOs on it.Reported-by: Park Ju Hyung
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
As Park Ju Hyung reported in mailing list:
https://sourceforge.net/p/linux-f2fs/mailman/message/36639787/
generic_make_request: Trying to write to read-only block-device loop0 (partno 0)
WARNING: CPU: 0 PID: 23437 at block/blk-core.c:2174 generic_make_request_checks+0x594/0x630generic_make_request+0x46/0x3d0
submit_bio+0x30/0x110
__submit_merged_bio+0x68/0x390
f2fs_submit_page_write+0x1bb/0x7f0
f2fs_do_write_meta_page+0x7f/0x160
__f2fs_write_meta_page+0x70/0x140
f2fs_sync_meta_pages+0x140/0x250
f2fs_write_checkpoint+0x5c5/0x17b0
f2fs_sync_fs+0x9c/0x110
sync_filesystem+0x66/0x80
f2fs_recover_fsync_data+0x790/0xa30
f2fs_fill_super+0xe4e/0x1980
mount_bdev+0x518/0x610
mount_fs+0x34/0x13f
vfs_kern_mount.part.11+0x4f/0x120
do_mount+0x2d1/0xe40
__x64_sys_mount+0xbf/0xe0
do_syscall_64+0x4a/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9print_req_error: I/O error, dev loop0, sector 4096
If block device is readonly, we should never trigger write IO from
filesystem layer, but previously, orphan and journal recovery didn't
consider such condition, result in triggering above warning, fix it.Reported-by: Park Ju Hyung
Tested-by: Park Ju Hyung
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
For large_nat_bitmap feature, there is a design flaw:
Previous:
struct f2fs_checkpoint layout:
+--------------------------+ 0x0000
| checkpoint_ver |
| ...... |
| checksum_offset |------+
| ...... | |
| sit_nat_version_bitmap[] |
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
Previously, f2fs_checkpoint.checksum_offset points fixed position of
f2fs_checkpoint structure:"#define CP_CHKSUM_OFFSET 4092"
It is unnecessary, and it breaks the consecutiveness of nat and sit
bitmap stored across checkpoint park block and payload blocks.This patch allows f2fs to handle unfixed .checksum_offset.
In addition, for the case checksum value is stored in the middle of
checkpoint park, calculating checksum value with superposition method
like we did for inode_checksum.Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim -
This patch changes codes as below:
- don't use is_read_io() as a condition to judge the meta IO.
- use .is_por to replace .is_meta to indicate IO is from recovery explicitly.Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim