22 Dec, 2020

4 commits


14 Oct, 2020

1 commit

  • The first problem is that we hit BUG_ON() in f2fs_get_sum_page given EIO on
    f2fs_get_meta_page_nofail().

    The quick fix was to loop forever instead of returning an error, but syzbot
    caught a case where a fuzzed image gets stuck in that loop. It turned out we
    abused f2fs_get_meta_page_nofail(), as in the call stack below.

    - f2fs_fill_super
    - f2fs_build_segment_manager
    - build_sit_entries
    - get_current_sit_page

    INFO: task syz-executor178:6870 can't die for more than 143 seconds.
    task:syz-executor178 state:R
    stack:26960 pid: 6870 ppid: 6869 flags:0x00004006
    Call Trace:

    Showing all locks held in the system:
    1 lock held by khungtaskd/1179:
    #0: ffffffff8a554da0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6242
    1 lock held by systemd-journal/3920:
    1 lock held by in:imklog/6769:
    #0: ffff88809eebc130 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:930
    1 lock held by syz-executor178/6870:
    #0: ffff8880925120e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0x201/0xaf0 fs/super.c:229

    Actually, we didn't have to use _nofail in this case, since we could already
    return an error to mount(2) through the error handler.

    As a result, this patch tries to 1) remove _nofail callers as much as possible,
    and 2) deal with the error case in the last remaining caller, f2fs_get_sum_page().

    Reported-by: syzbot+ee250ac8137be41d7b13@syzkaller.appspotmail.com
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

29 Sep, 2020

2 commits

  • As syzbot reported:

    kernel BUG at fs/f2fs/segment.h:657!
    invalid opcode: 0000 [#1] PREEMPT SMP KASAN
    CPU: 1 PID: 16220 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:f2fs_ra_meta_pages+0xa51/0xdc0 fs/f2fs/segment.h:657
    Call Trace:
    build_sit_entries fs/f2fs/segment.c:4195 [inline]
    f2fs_build_segment_manager+0x4b8a/0xa3c0 fs/f2fs/segment.c:4779
    f2fs_fill_super+0x377d/0x6b80 fs/f2fs/super.c:3633
    mount_bdev+0x32e/0x3f0 fs/super.c:1417
    legacy_get_tree+0x105/0x220 fs/fs_context.c:592
    vfs_get_tree+0x89/0x2f0 fs/super.c:1547
    do_new_mount fs/namespace.c:2875 [inline]
    path_mount+0x1387/0x2070 fs/namespace.c:3192
    do_mount fs/namespace.c:3205 [inline]
    __do_sys_mount fs/namespace.c:3413 [inline]
    __se_sys_mount fs/namespace.c:3390 [inline]
    __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    @blkno in f2fs_ra_meta_pages() could exceed the max segment count, causing
    a panic in the following sanity check in current_sit_addr(). Add a check
    condition to avoid this issue.

    Reported-by: syzbot+3698081bcf0bb2d12174@syzkaller.appspotmail.com
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Missing the trace exit in f2fs_sync_dirty_inodes

    Signed-off-by: Zhang Qilong
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Zhang Qilong
     

12 Sep, 2020

1 commit

  • There are several issues in the current background GC algorithm:
    - the valid block count is one of the key factors in the cost
    calculation, so if a segment has few valid blocks, the CB algorithm
    will still choose it as a victim even if it is young or it is a hot
    segment, which is not appropriate.
    - GCed data/nodes go to existing logs whether or not the data already
    there has the same update frequency, so hot and cold data may get
    mixed again.
    - the GC allocator mainly uses LFS type segments, which consumes free
    segments more quickly.

    This patch introduces a new algorithm named age threshold based
    garbage collection to solve the above issues. There are three main
    steps:

    1. select a source victim:
    - set an age threshold, and select candidates based on the threshold:
    e.g.
    0 means youngest, 100 means oldest; if we set the age threshold to 80,
    then select dirty segments whose age is in the range [80, 100] as
    candidates;
    - set a candidate_ratio threshold, and select candidates based on the
    ratio, so that we can shrink the candidates to the oldest segments;
    - select the target segment with the fewest valid blocks in order to
    migrate blocks with minimum cost;

    2. select a target victim:
    - select candidates based on the age threshold;
    - set a candidate_radius threshold, and search for candidates whose age
    is around the source victim's; the searching radius should be less than
    the radius threshold.
    - select the target segment with the most valid blocks in order to avoid
    migrating the current target segment.

    3. merge valid blocks from the source victim into the target victim with
    the SSR allocator.
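
    Steps 1 and 2 above can be sketched in plain C. This is a minimal
    illustration rather than the patch's code: struct seg_info,
    select_source() and select_target() are hypothetical names, the
    candidate_ratio step is left out, and "radius" is read as a strict
    upper bound on the age distance.

```c
#include <stddef.h>

/* Hypothetical per-segment summary used only for this sketch. */
struct seg_info {
    unsigned int age;          /* 0 = youngest, 100 = oldest */
    unsigned int valid_blocks; /* blocks still in use in the segment */
};

/*
 * Step 1: among segments at least as old as age_threshold, pick the one
 * with the fewest valid blocks. Returns its index, or -1 if none qualify.
 */
int select_source(const struct seg_info *segs, size_t n,
                  unsigned int age_threshold)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (segs[i].age < age_threshold)
            continue;
        if (best < 0 || segs[i].valid_blocks < segs[best].valid_blocks)
            best = (int)i;
    }
    return best;
}

/*
 * Step 2: among old-enough segments whose age differs from the source's
 * by less than radius, pick the one with the most valid blocks.
 * Returns its index, or -1 if none qualify.
 */
int select_target(const struct seg_info *segs, size_t n,
                  unsigned int age_threshold, unsigned int radius, int src)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        unsigned int a = segs[i].age, s = segs[src].age;
        unsigned int dist = a > s ? a - s : s - a;

        if ((int)i == src || a < age_threshold || dist >= radius)
            continue;
        if (best < 0 || segs[i].valid_blocks > segs[best].valid_blocks)
            best = (int)i;
    }
    return best;
}
```

    A young segment with very few valid blocks is skipped by
    select_source() because it fails the age test, which is the case the
    cost-benefit algorithm is said to handle badly.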

    Test steps:
    - create 160 dirty segments:
    * half of them have 128 valid blocks per segment
    * the rest have 384 valid blocks per segment
    - run background GC

    Benefit: both the GC count and the block movement count decrease significantly:

    - Before:
    - Valid: 86
    - Dirty: 1
    - Prefree: 11
    - Free: 6001 (6001)

    GC calls: 162 (BG: 220)
    - data segments : 160 (160)
    - node segments : 2 (2)
    Try to move 41454 blocks (BG: 41454)
    - data blocks : 40960 (40960)
    - node blocks : 494 (494)

    IPU: 0 blocks
    SSR: 0 blocks in 0 segments
    LFS: 41364 blocks in 81 segments

    - After:

    - Valid: 87
    - Dirty: 0
    - Prefree: 4
    - Free: 6008 (6008)

    GC calls: 75 (BG: 76)
    - data segments : 74 (74)
    - node segments : 1 (1)
    Try to move 12813 blocks (BG: 12813)
    - data blocks : 12544 (12544)
    - node blocks : 269 (269)

    IPU: 0 blocks
    SSR: 12032 blocks in 77 segments
    LFS: 855 blocks in 2 segments

    Signed-off-by: Chao Yu
    [Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

11 Sep, 2020

1 commit

  • The previous implementation of aligned pinfile allocation will:
    - allocate a new segment on the cold data log no matter whether the last
    used segment is partially used or not, which makes IOs more random;
    - force concurrent cold data/GCed IO to go into the warm data area, which
    hurts hot/cold data separation;

    In this patch, we introduce a new type of log named 'inmem curseg'.
    Its differences from a normal curseg are:
    - it reuses an existing segment type (CURSEG_XXX_NODE/DATA);
    - it only exists in memory; its segno, blkofs and summary will not be
    persisted into the checkpoint area;

    With this new feature, we can enhance the scalability of logs, and special
    allocators can be created for specific purposes:
    - a pure lfs allocator for aligned pinfile allocation or file
    defragmentation
    - a pure ssr allocator for a later feature

    So let's update aligned pinfile allocation to use this new
    inmem curseg framework.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

04 Aug, 2020

3 commits

  • This is to avoid sleep() in the waiter thread.

    [ 20.157753] ------------[ cut here ]------------
    [ 20.158393] do not call blocking ops when !TASK_RUNNING; state=2 set at [] prepare_to_wait+0xcd/0x430
    [ 20.159858] WARNING: CPU: 1 PID: 1152 at kernel/sched/core.c:7142 __might_sleep+0x149/0x1a0
    ...
    [ 20.176110] __submit_merged_write_cond+0x191/0x310
    [ 20.176739] f2fs_submit_merged_write+0x18/0x20
    [ 20.177323] f2fs_wait_on_all_pages+0x269/0x2d0
    [ 20.177899] ? block_operations+0x980/0x980
    [ 20.178441] ? __kasan_check_read+0x11/0x20
    [ 20.178975] ? finish_wait+0x260/0x260
    [ 20.179488] ? percpu_counter_set+0x147/0x230
    [ 20.180049] do_checkpoint+0x1757/0x2a50
    [ 20.180558] f2fs_write_checkpoint+0x840/0xaf0
    [ 20.181126] f2fs_sync_fs+0x287/0x4a0

    Reported-by: Eric Biggers
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • f2fs_write_data_pages(quota_mapping)
     __f2fs_write_data_pages             f2fs_write_checkpoint
      * blk_start_plug(&plug);
      * add bio in write_io[DATA]
                                         - block_operations
                                         - skip syncing quota by
                                           >DEFAULT_RETRY_QUOTA_FLUSH_COUNT
                                         - down_write(&sbi->node_write);
      - f2fs_write_single_data_page
       - down_read(node_write)
                                         - f2fs_wait_on_all_pages(F2FS_WB_CP_DATA);

    Signed-off-by: Daeho Jeong
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Function parameter mode could be TRANS_DIR_INO.

    Signed-off-by: Jack Qiu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jack Qiu
     

26 Jul, 2020

1 commit


09 Jul, 2020

1 commit


09 Jun, 2020

1 commit

  • kmalloc() returns kmalloc'ed memory, and kvmalloc() returns either
    kmalloc'ed or vmalloc'ed memory. But the f2fs wrappers, f2fs_kmalloc()
    and f2fs_kvmalloc(), both return both kinds of memory.

    It's redundant to have two functions that do the same thing, and also
    breaking the standard naming convention is causing bugs since people
    assume it's safe to kfree() memory allocated by f2fs_kmalloc(). See
    e.g. the various allocations in fs/f2fs/compress.c.

    Fix this by making f2fs_kmalloc() just use kmalloc(). And to avoid
    re-introducing the allocation failures that the vmalloc fallback was
    intended to fix, convert the largest allocations to use f2fs_kvmalloc().

    Signed-off-by: Eric Biggers
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Eric Biggers
     

19 May, 2020

1 commit


12 May, 2020

3 commits

  • Sahitya raised an issue:
    - prevent meta updates while checkpoint is in progress

    allocate_segment_for_resize() can cause metapage updates if
    it requires to change the current node/data segments for resizing.
    Stop these meta updates when there is a checkpoint already
    in progress to prevent inconsistent CP data.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • There could be a scenario where f2fs_sync_node_pages gets
    called during checkpoint, which in turn tries to flush
    inline data and calls iput(). This results in deadlock as
    iput() tries to hold cp_rwsem, which is already held at the
    beginning by checkpoint->block_operations().

    Call stack :

    Thread A                                Thread B
    f2fs_write_checkpoint()
    - block_operations(sbi)
     - f2fs_lock_all(sbi);
      - down_write(&sbi->cp_rwsem);

                                            - open()
                                             - igrab()
                                            - write() write inline data
                                            - unlink()
    - f2fs_sync_node_pages()
     - if (is_inline_node(page))
      - flush_inline_data()
       - ilookup()
         page = f2fs_pagecache_get_page()
         if (!page)
          goto iput_out;
         iput_out:
                                            -close()
                                            -iput()
           iput(inode);
           - f2fs_evict_inode()
            - f2fs_truncate_blocks()
             - f2fs_lock_op()
               - down_read(&sbi->cp_rwsem);

    Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
    Signed-off-by: Sayali Lokhande
    Signed-off-by: Jaegeuk Kim

    Sayali Lokhande
     
  • blk_plugging doesn't seem to give any benefit.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

18 Apr, 2020

2 commits


23 Mar, 2020

1 commit

  • Add and set a new CP flag CP_RESIZEFS_FLAG during
    online resize FS to help fsck fix the metadata mismatch
    that may happen due to SPO during resize, where SB
    got updated but CP data couldn't be written yet.

    fsck errors -
    Info: CKPT version = 6ed7bccb
    Wrong user_block_count(2233856)
    [f2fs_do_mount:3365] Checkpoint is polluted

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     

20 Mar, 2020

1 commit

  • As Geert Uytterhoeven reported:

    for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

    On some platforms, HZ can be less than 50, in which case an unexpected
    timeout of 0 jiffies will be set in congestion_wait().

    This patch introduces a macro DEFAULT_IO_TIMEOUT that wraps a fixed
    value, msecs_to_jiffies(20), to replace HZ/50 and avoid this issue.

    Quoted from Geert Uytterhoeven:

    "A timeout of HZ means 1 second.
    HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

    If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
    as that takes care of the special cases, and never returns 0."
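
    The truncation is easy to reproduce outside the kernel. The sketch
    below is a userspace illustration only: msecs_to_ticks() is a
    hypothetical stand-in that rounds up the way the quote describes, not
    the kernel's msecs_to_jiffies(), and the tick rate is passed as a
    parameter instead of being the compile-time HZ.

```c
#include <stdio.h>

/* The old expression: integer division truncates to 0 when hz < 50. */
unsigned long hz_over_50(unsigned int hz)
{
    return hz / 50;
}

/*
 * Hypothetical stand-in for a milliseconds-to-ticks conversion that
 * rounds up, so a non-zero duration never becomes a zero timeout.
 */
unsigned long msecs_to_ticks(unsigned int msecs, unsigned int hz)
{
    return ((unsigned long)msecs * hz + 999UL) / 1000UL;
}

/* Print both timeouts for a few tick rates; 24 is an arbitrary low value. */
void compare_timeouts(void)
{
    static const unsigned int rates[] = { 24, 100, 250, 1000 };

    for (size_t i = 0; i < sizeof(rates) / sizeof(rates[0]); i++)
        printf("hz=%u  hz/50=%lu  20ms=%lu ticks\n", rates[i],
               hz_over_50(rates[i]), msecs_to_ticks(20, rates[i]));
}
```

    With a tick rate below 50 the first column drops to 0 while the
    rounded-up conversion still yields one tick.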

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

11 Mar, 2020

1 commit


28 Feb, 2020

1 commit

  • There could be a scenario where f2fs_sync_meta_pages() will not
    ensure that all F2FS_DIRTY_META pages are submitted for IO. Thus,
    resulting in the below panic in do_checkpoint() -

    f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) &&
    !f2fs_cp_error(sbi));

    This can happen in a low-memory condition, where shrinker could
    also be doing the writepage operation (stack shown below)
    at the same time when checkpoint is running on another core.

    schedule
    down_write
    f2fs_submit_page_write -> by this time, this page in page cache is tagged
    as PAGECACHE_TAG_WRITEBACK and PAGECACHE_TAG_DIRTY
    is cleared, due to which f2fs_sync_meta_pages()
    cannot sync this page in do_checkpoint() path.
    f2fs_do_write_meta_page
    __f2fs_write_meta_page
    f2fs_write_meta_page
    shrink_page_list
    shrink_inactive_list
    shrink_node_memcg
    shrink_node
    kswapd

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     

16 Jan, 2020

1 commit


20 Nov, 2019

1 commit

  • As Eric mentioned, bare printk{,_ratelimited} won't show which
    filesystem instance these messages are coming from. This patch tries
    to show the fs instance with the sb->s_id field in all the places we
    missed before.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

03 Jul, 2019

4 commits

  • Two paths to update quota and f2fs_lock_op:

    1.
    - lock_op
    | - quota_update
    `- unlock_op

    2.
    - quota_update
    - lock_op
    `- unlock_op

    But we need to make quota_update + lock_op a single transaction in case #2.
    So, this patch introduces:
    1. lock_op
    2. down_write
    3. check __need_flush
    4. up_write
    5. if there are dirty quota entries, flush them
    6. otherwise, good to go

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • f2fs always uses EFAULT as the error number to indicate that the
    filesystem is corrupted, but generic filesystems use EUCLEAN for this
    condition, so we need to change to follow the others.

    This patch adds the two new macros below, which wrap the more generic
    error code macros, and spreads them through the code.

    EFSBADCRC EBADMSG /* Bad CRC detected */
    EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
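
    As a rough illustration of how such wrappers read at a call site, the
    sketch below defines the two macros as listed above and returns them
    in the kernel's negative-errno style. check_block() and its parameters
    are hypothetical and exist only for this example; EUCLEAN is a
    Linux-specific errno value.

```c
#include <errno.h>

#define EFSBADCRC    EBADMSG  /* Bad CRC detected */
#define EFSCORRUPTED EUCLEAN  /* Filesystem is corrupted */

/*
 * Hypothetical validator: returns 0 on success or a negative errno.
 * A checksum mismatch and a structural inconsistency are reported
 * with different codes so callers can tell them apart.
 */
int check_block(unsigned int stored_crc, unsigned int computed_crc,
                int layout_ok)
{
    if (stored_crc != computed_crc)
        return -EFSBADCRC;
    if (!layout_ok)
        return -EFSCORRUPTED;
    return 0;
}
```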

    Reported-by: Pavel Machek
    Signed-off-by: Chao Yu
    Acked-by: Pavel Machek
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • - Add and use f2fs_ macros
    - Convert f2fs_msg to f2fs_printk
    - Remove level from f2fs_printk and embed the level in the format
    - Coalesce formats and align multi-line arguments
    - Remove unnecessary duplicate extern f2fs_msg f2fs.h

    Signed-off-by: Joe Perches
    Signed-off-by: Chao Yu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Joe Perches
     
  • This ioctl shrinks a given length (aligned to sections) from end of the
    main area. Any cursegs and valid blocks will be moved out before
    invalidating the range.

    This feature can be used for adjusting partition sizes online.

    History of the patch:

    Sahitya Tummala:
    - Add this ioctl for f2fs_compat_ioctl() as well.
    - Fix debugfs status to reflect the online resize changes.
    - Fix potential race between online resize path and allocate new data
    block path or gc path.

    Others:
    - Rename some identifiers.
    - Add some error handling branches.
    - Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range.
    - Implement this interface as ext4's, and change the parameter from shrunk
    bytes to new block count of F2FS.
    - During resizing, force to empty sit_journal and forbid adding new
    entries to it, in order to avoid invalid segno in journal after resize.
    - Reduce sbi->user_block_count before resize starts.
    - Commit the updated superblock first, and then update in-memory metadata
    only when the former succeeds.
    - Target block count must align to sections.
    - Write checkpoint before and after committing the new superblock, w/o
    CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if
    resize fails after the new superblock is committed.
    - In free_segment_range(), reduce granularity of gc_mutex.
    - Add protection on curseg migration.
    - Add freeze_bdev() and thaw_bdev() for resize fs.
    - Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation.
    - Recover super_block and FS metadata when resize fails.
    - No need to clear CP_FSCK_FLAG in update_ckpt_flags().
    - Clean up the sb and fs metadata update functions for resize_fs.

    Geert Uytterhoeven:
    - Use div_u64*() for 64-bit divisions

    Arnd Bergmann:
    - Not all architectures support get_user() with a 64-bit argument:
    ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined!
    Use copy_from_user() here, this will always work.

    Signed-off-by: Qiuyang Sun
    Signed-off-by: Chao Yu
    Signed-off-by: Sahitya Tummala
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Qiuyang Sun
     

23 May, 2019

2 commits

  • As Ju Hyung reported:

    "
    I was semi-forced today to use the new kernel and test f2fs.

    My Ubuntu initramfs got a bit wonky and I had to boot into live CD and
    fix some stuffs. The live CD was using 4.15 kernel, and just mounting
    the f2fs partition there corrupted f2fs and my 4.19(with 5.1-rc1-4.19
    f2fs-stable merged) refused to mount with "SIT is corrupted node"
    message.

    I used the latest f2fs-tools sent by Chao including "fsck.f2fs: fix to
    repair cp_loads blocks at correct position"

    It spit out 140M worth of output, but at least I didn't have to run it
    twice. Everything returned "Ok" in the 2nd run.
    The new log is at
    http://arter97.com/f2fs/final

    After fixing the image, I used my 4.19 kernel with 5.2-rc1-4.19
    f2fs-stable merged and it mounted.

    But, I got this:
    [ 1.047791] F2FS-fs (nvme0n1p3): layout of large_nat_bitmap is
    deprecated, run fsck to repair, chksum_offset: 4092
    [ 1.081307] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
    [ 1.161520] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
    [ 1.162418] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7e00

    But after doing a reboot, the message is gone:
    [ 1.098423] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
    [ 1.177771] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
    [ 1.178365] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7eda

    I'm not exactly sure why the kernel detected that I'm still using the
    old layout on the first boot. Maybe fsck didn't fix it properly, or
    the check from the kernel is improper.
    "

    Although we rebuild the old deprecated checkpoint with the new layout
    during repair, we only repair the last checkpoint pack; the other, old
    one remains.

    Once the image is mounted, we 1) sanity check the layout and 2) decide
    which checkpoint pack to use according to cp_ver. As a result, we print
    the reported message unnecessarily at step 1). To avoid it, we simply move
    the layout check into f2fs_sanity_check_ckpt(), after step 2).

    Reported-by: Park Ju Hyung
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch reverts:
    commit fb40d618b039 ("f2fs: don't clear CP_QUOTA_NEED_FSCK_FLAG").

    We were missing error handlers used in f2fs quota ops.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

09 May, 2019

7 commits

  • Use sbi.stat_lock to protect sbi->unusable_block_count access/update, in
    order to avoid a potential race on it.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) would check
    whether @blkaddr is located in the main area or not.

    That check is weak, since a block address within the range of the main
    area can point to an address that is not valid in the segment info table.
    We cannot detect such a condition, and may suffer worse corruption as the
    system continues running.

    So this patch introduces DATA_GENERIC_ENHANCE to enhance the sanity check,
    which triggers a SIT bitmap check rather than only a range check.
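
    The difference between the two checks can be shown with a small
    userspace sketch. Everything here is a simplification for
    illustration: struct main_area and both functions are hypothetical,
    and a flat bitmap stands in for the per-segment validity information
    kept in the SIT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the main area. */
struct main_area {
    uint32_t start;           /* first block address of the main area */
    uint32_t nblocks;         /* number of blocks in the main area */
    const uint8_t *valid_map; /* bit i set => block start + i is valid */
};

/* Weak check: only asks whether the address falls inside the main area. */
bool blkaddr_in_range(const struct main_area *ma, uint32_t blkaddr)
{
    return blkaddr >= ma->start && blkaddr - ma->start < ma->nblocks;
}

/* Enhanced check: range check first, then consult the validity bitmap. */
bool blkaddr_valid_enhanced(const struct main_area *ma, uint32_t blkaddr)
{
    uint32_t off;

    if (!blkaddr_in_range(ma, blkaddr))
        return false;
    off = blkaddr - ma->start;
    return (ma->valid_map[off / 8] >> (off % 8)) & 1;
}
```

    An address can pass blkaddr_in_range() and still fail
    blkaddr_valid_enhanced(), which is the case the range-only check
    could not catch.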

    This patch also makes the changes below:
    - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
    - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
    - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
    - spread blkaddr check in:
    * f2fs_get_node_info()
    * __read_out_blkaddrs()
    * f2fs_submit_page_read()
    * ra_data_block()
    * do_recover_data()

    This patch can fix bug reported from bugzilla below:

    https://bugzilla.kernel.org/show_bug.cgi?id=203215
    https://bugzilla.kernel.org/show_bug.cgi?id=203223
    https://bugzilla.kernel.org/show_bug.cgi?id=203231
    https://bugzilla.kernel.org/show_bug.cgi?id=203235
    https://bugzilla.kernel.org/show_bug.cgi?id=203241

    = Update by Jaegeuk Kim =

    DATA_GENERIC_ENHANCE was enhanced to validate block addresses on read/write
    paths. But xfstest/generic/446 complains about some generated kernel messages
    saying an invalid bitmap was detected when reading a block. The reason is
    that, when we get the block addresses from extent_cache, there is no lock to
    synchronize against the blocks being truncated in parallel.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Park Ju Hyung reported:

    Probably unrelated but a similar issue:
    Warning appears upon unmounting a corrupted R/O f2fs loop image.

    Should be a trivial issue to fix as well :)

    [ 2373.758424] ------------[ cut here ]------------
    [ 2373.758428] generic_make_request: Trying to write to read-only
    block-device loop1 (partno 0)
    [ 2373.758455] WARNING: CPU: 1 PID: 13950 at block/blk-core.c:2174
    generic_make_request_checks+0x590/0x630
    [ 2373.758556] CPU: 1 PID: 13950 Comm: umount Tainted: G O
    4.19.35-zen+ #1
    [ 2373.758558] Hardware name: System manufacturer System Product
    Name/ROG MAXIMUS X HERO (WI-FI AC), BIOS 1704 09/14/2018
    [ 2373.758564] RIP: 0010:generic_make_request_checks+0x590/0x630
    [ 2373.758567] Code: 5c 03 00 00 48 8d 74 24 08 48 89 df c6 05 b5 cd
    36 01 01 e8 c2 90 01 00 48 89 c6 44 89 ea 48 c7 c7 98 64 59 82 e8 d5
    9b a7 ff 0b 48 8b 7b 08 e9 f2 fa ff ff 41 8b 86 98 02 00 00 49 8b
    16 89
    [ 2373.758570] RSP: 0018:ffff8882bdb43950 EFLAGS: 00010282
    [ 2373.758573] RAX: 0000000000000050 RBX: ffff8887244c6700 RCX: 0000000000000006
    [ 2373.758575] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff88884ec56340
    [ 2373.758577] RBP: ffff888849c426c0 R08: 0000000000000004 R09: 00000000000003ba
    [ 2373.758579] R10: 0000000000000001 R11: 0000000000000029 R12: 0000000000001000
    [ 2373.758581] R13: 0000000000000000 R14: ffff888844a2e800 R15: ffff8882bdb43ac0
    [ 2373.758584] FS: 00007fc0d114f8c0(0000) GS:ffff88884ec40000(0000)
    knlGS:0000000000000000
    [ 2373.758586] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 2373.758588] CR2: 00007fc0d1ad12c0 CR3: 00000002bdb82003 CR4: 00000000003606e0
    [ 2373.758590] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 2373.758592] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 2373.758593] Call Trace:
    [ 2373.758602] ? generic_make_request+0x46/0x3d0
    [ 2373.758608] ? wait_woken+0x80/0x80
    [ 2373.758612] ? mempool_alloc+0xb7/0x1a0
    [ 2373.758618] ? submit_bio+0x30/0x110
    [ 2373.758622] ? bvec_alloc+0x7c/0xd0
    [ 2373.758628] ? __submit_merged_bio+0x68/0x390
    [ 2373.758633] ? f2fs_submit_page_write+0x1bb/0x7f0
    [ 2373.758638] ? f2fs_do_write_meta_page+0x7f/0x160
    [ 2373.758642] ? __f2fs_write_meta_page+0x70/0x140
    [ 2373.758647] ? f2fs_sync_meta_pages+0x140/0x250
    [ 2373.758653] ? f2fs_write_checkpoint+0x5c5/0x17b0
    [ 2373.758657] ? f2fs_sync_fs+0x9c/0x110
    [ 2373.758664] ? sync_filesystem+0x66/0x80
    [ 2373.758667] ? generic_shutdown_super+0x1d/0x100
    [ 2373.758670] ? kill_block_super+0x1c/0x40
    [ 2373.758674] ? kill_f2fs_super+0x64/0xb0
    [ 2373.758678] ? deactivate_locked_super+0x2d/0xb0
    [ 2373.758682] ? cleanup_mnt+0x65/0xa0
    [ 2373.758688] ? task_work_run+0x7f/0xa0
    [ 2373.758693] ? exit_to_usermode_loop+0x9c/0xa0
    [ 2373.758698] ? do_syscall_64+0xc7/0xf0
    [ 2373.758703] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 2373.758706] ---[ end trace 5d3639907c56271b ]---
    [ 2373.758780] print_req_error: I/O error, dev loop1, sector 143048
    [ 2373.758800] print_req_error: I/O error, dev loop1, sector 152200
    [ 2373.758808] print_req_error: I/O error, dev loop1, sector 8192
    [ 2373.758819] print_req_error: I/O error, dev loop1, sector 12272

    This patch detects a read-only device in write_checkpoint() to avoid
    triggering write IOs on it.

    Reported-by: Park Ju Hyung
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Park Ju Hyung reported in mailing list:

    https://sourceforge.net/p/linux-f2fs/mailman/message/36639787/

    generic_make_request: Trying to write to read-only block-device loop0 (partno 0)
    WARNING: CPU: 0 PID: 23437 at block/blk-core.c:2174 generic_make_request_checks+0x594/0x630

    generic_make_request+0x46/0x3d0
    submit_bio+0x30/0x110
    __submit_merged_bio+0x68/0x390
    f2fs_submit_page_write+0x1bb/0x7f0
    f2fs_do_write_meta_page+0x7f/0x160
    __f2fs_write_meta_page+0x70/0x140
    f2fs_sync_meta_pages+0x140/0x250
    f2fs_write_checkpoint+0x5c5/0x17b0
    f2fs_sync_fs+0x9c/0x110
    sync_filesystem+0x66/0x80
    f2fs_recover_fsync_data+0x790/0xa30
    f2fs_fill_super+0xe4e/0x1980
    mount_bdev+0x518/0x610
    mount_fs+0x34/0x13f
    vfs_kern_mount.part.11+0x4f/0x120
    do_mount+0x2d1/0xe40
    __x64_sys_mount+0xbf/0xe0
    do_syscall_64+0x4a/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    print_req_error: I/O error, dev loop0, sector 4096

    If the block device is read-only, we should never trigger write IO from
    the filesystem layer, but previously, orphan and journal recovery didn't
    consider this condition, which resulted in the above warning. Fix it.

    Reported-by: Park Ju Hyung
    Tested-by: Park Ju Hyung
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • For large_nat_bitmap feature, there is a design flaw:

    Previous:

    struct f2fs_checkpoint layout:
    +--------------------------+ 0x0000
    | checkpoint_ver |
    | ...... |
    | checksum_offset |------+
    | ...... | |
    | sit_nat_version_bitmap[] |
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Previously, f2fs_checkpoint.checksum_offset pointed to a fixed position in
    the f2fs_checkpoint structure:

    "#define CP_CHKSUM_OFFSET 4092"

    It is unnecessary, and it breaks the contiguity of the nat and sit
    bitmaps stored across the checkpoint pack block and payload blocks.

    This patch allows f2fs to handle an unfixed .checksum_offset.

    In addition, for the case where the checksum value is stored in the middle
    of the checkpoint pack, calculate the checksum value with the superposition
    method, as we did for inode_checksum.
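
    One plausible reading of this approach is to checksum the bytes before
    the checksum field, skip the field itself, and continue the same
    running checksum over the bytes after it. The sketch below shows that
    idea in userspace C. It is an assumption-laden illustration, not the
    patch's code: crc32_update() is a plain bitwise CRC-32 standing in for
    f2fs's own CRC helper, and block_checksum() is a hypothetical name. It
    assumes chksum_ofs + 4 <= blk_size.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32 (reflected, polynomial 0xEDB88320) that can be continued
 * from a previous result, so crc32_update(crc32_update(0, a), b) equals
 * the CRC of a followed by b.
 */
uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/*
 * Checksum a block whose 4-byte checksum field sits at chksum_ofs:
 * cover the bytes before the field, skip it, then continue over the rest.
 * When the field is at the very end, only the first part is covered.
 */
uint32_t block_checksum(const uint8_t *blk, size_t blk_size,
                        size_t chksum_ofs)
{
    uint32_t crc = crc32_update(0, blk, chksum_ofs);
    size_t rest = chksum_ofs + sizeof(uint32_t);

    if (rest < blk_size)
        crc = crc32_update(crc, blk + rest, blk_size - rest);
    return crc;
}
```

    Because the field is skipped, the stored checksum can be written into
    the block afterwards without changing the value that verification
    recomputes.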

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch changes the code as follows:
    - don't use is_read_io() as a condition to judge the meta IO.
    - use .is_por to replace .is_meta to indicate IO is from recovery explicitly.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu