06 Apr, 2019

5 commits

  • [ Upstream commit ac92985864e187a1735502f6a02f54eaa655b2aa ]

    When setting /sys/fs/f2fs/<disk>/iostat_enable with a non-bool value, UBSAN
    reports the following warning.

    [ 7562.295484] ================================================================================
    [ 7562.296531] UBSAN: Undefined behaviour in fs/f2fs/f2fs.h:2776:10
    [ 7562.297651] load of value 64 is not a valid value for type '_Bool'
    [ 7562.298642] CPU: 1 PID: 7487 Comm: dd Not tainted 4.20.0-rc4+ #79
    [ 7562.298653] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    [ 7562.298662] Call Trace:
    [ 7562.298760] dump_stack+0x46/0x5b
    [ 7562.298811] ubsan_epilogue+0x9/0x40
    [ 7562.298830] __ubsan_handle_load_invalid_value+0x72/0x90
    [ 7562.298863] f2fs_file_write_iter+0x29f/0x3f0
    [ 7562.298905] __vfs_write+0x115/0x160
    [ 7562.298922] vfs_write+0xa7/0x190
    [ 7562.298934] ksys_write+0x50/0xc0
    [ 7562.298973] do_syscall_64+0x4a/0xe0
    [ 7562.298992] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 7562.299001] RIP: 0033:0x7fa45ec19c00
    [ 7562.299004] Code: 73 01 c3 48 8b 0d 88 92 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d dd eb 2c 00 00 75 10 b8 01 00 00 00 0f 05 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ce 8f 01 00 48 89 04 24
    [ 7562.299044] RSP: 002b:00007ffca52b49e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    [ 7562.299052] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa45ec19c00
    [ 7562.299059] RDX: 0000000000000400 RSI: 000000000093f000 RDI: 0000000000000001
    [ 7562.299065] RBP: 000000000093f000 R08: 0000000000000004 R09: 0000000000000000
    [ 7562.299071] R10: 00007ffca52b47b0 R11: 0000000000000246 R12: 0000000000000400
    [ 7562.299077] R13: 000000000093f000 R14: 000000000093f400 R15: 0000000000000000
    [ 7562.299091] ================================================================================

    So, if iostat_enable is enabled, set its value to true.

    Signed-off-by: Sheng Yong
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Sheng Yong
     
  • [ Upstream commit 500e0b28ecd3c5aade98f3c3a339d18dcb166bb6 ]

    We use the condition below to check the inline_xattr_size boundary:

    if (!F2FS_OPTION(sbi).inline_xattr_size ||
        F2FS_OPTION(sbi).inline_xattr_size >=
            DEF_ADDRS_PER_INODE -
            F2FS_TOTAL_EXTRA_ATTR_SIZE -
            DEF_INLINE_RESERVED_SIZE -
            DEF_MIN_INLINE_SIZE)

    There are three problems with that check:
    - we should allow inline_xattr_size to equal the minimum size of the
      inline {data,dentry} area.
    - F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size use different size
      units: the former counts 4-byte words, the latter single bytes.
    - DEF_MIN_INLINE_SIZE only covers the minimum size of the inline data
      area; we also need to consider the minimum size of the inline dentry
      area. A minimal inline dentry must hold at least two entries, '.' and
      '..', so the minimum inline dentry size is 40 bytes.

    .bitmap        1 * 1 =  1
    .reserved      1 * 1 =  1
    .dentry       11 * 2 = 22
    .filename      8 * 2 = 16
    total                = 40

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Chao Yu
     
  • [ Upstream commit 9083977dabf3833298ddcd40dee28687f1e6b483 ]

    Fix the warning below, caused by taking a mutex lock in atomic context.

    BUG: sleeping function called from invalid context at kernel/locking/mutex.c:98
    in_atomic(): 1, irqs_disabled(): 0, pid: 585, name: sh
    Preemption disabled at: __radix_tree_preload+0x28/0x130
    Call trace:
    dump_backtrace+0x0/0x2b4
    show_stack+0x20/0x28
    dump_stack+0xa8/0xe0
    ___might_sleep+0x144/0x194
    __might_sleep+0x58/0x8c
    mutex_lock+0x2c/0x48
    f2fs_trace_pid+0x88/0x14c
    f2fs_set_node_page_dirty+0xd0/0x184

    Do not use f2fs_radix_tree_insert(), to avoid calling cond_resched()
    while a spin_lock() is held.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Sahitya Tummala
     
  • [ Upstream commit aadcef64b22f668c1a107b86d3521d9cac915c24 ]

    As Jiqun Li reported in bugzilla:

    https://bugzilla.kernel.org/show_bug.cgi?id=202883

    Sometimes we hit a deadlock when the SYS_getdents64 system call races
    with fsync() called by another process (seen with monkey running on
    android 9.0):

    1. task 9785 holds sbi->cp_rwsem and waits on lock_page()
    2. task 10349 holds mm_sem and waits on sbi->cp_rwsem
    3. task 9709 holds the page lock and waits on mm_sem

    So this is a deadlock scenario.

    The task stacks, as shown by the crash tool, are:

    crash_arm64> bt ffffffc03c354080
    PID: 9785 TASK: ffffffc03c354080 CPU: 1 COMMAND: "RxIoScheduler-3"
    >> #7 [ffffffc01b50fac0] __lock_page at ffffff80081b11e8

    crash-arm64> bt 10349
    PID: 10349 TASK: ffffffc018b83080 CPU: 1 COMMAND: "BUGLY_ASYNC_UPL"
    >> #3 [ffffffc01f8cfa40] rwsem_down_read_failed at ffffff8008a93afc
    PC: 00000033 LR: 00000000 SP: 00000000 PSTATE: ffffffffffffffff

    crash-arm64> bt 9709
    PID: 9709 TASK: ffffffc03e7f3080 CPU: 1 COMMAND: "IntentService[A"
    >> #3 [ffffffc001e67850] rwsem_down_read_failed at ffffff8008a93afc
    >> #8 [ffffffc001e67b80] el1_ia at ffffff8008084fc4
    PC: ffffff8008274114 [compat_filldir64+120]
    LR: ffffff80083584d4 [f2fs_fill_dentries+448]
    SP: ffffffc001e67b80 PSTATE: 80400145
    X29: ffffffc001e67b80 X28: 0000000000000000 X27: 000000000000001a
    X26: 00000000000093d7 X25: ffffffc070d52480 X24: 0000000000000008
    X23: 0000000000000028 X22: 00000000d43dfd60 X21: ffffffc001e67e90
    X20: 0000000000000011 X19: ffffff80093a4000 X18: 0000000000000000
    X17: 0000000000000000 X16: 0000000000000000 X15: 0000000000000000
    X14: ffffffffffffffff X13: 0000000000000008 X12: 0101010101010101
    X11: 7f7f7f7f7f7f7f7f X10: 6a6a6a6a6a6a6a6a X9: 7f7f7f7f7f7f7f7f
    X8: 0000000080808000 X7: ffffff800827409c X6: 0000000080808000
    X5: 0000000000000008 X4: 00000000000093d7 X3: 000000000000001a
    X2: 0000000000000011 X1: ffffffc070d52480 X0: 0000000000800238
    >> #9 [ffffffc001e67be0] f2fs_fill_dentries at ffffff80083584d0
    PC: 0000003c LR: 00000000 SP: 00000000 PSTATE: 000000d9
    X12: f48a02ff X11: d4678960 X10: d43dfc00 X9: d4678ae4
    X8: 00000058 X7: d4678994 X6: d43de800 X5: 000000d9
    X4: d43dfc0c X3: d43dfc10 X2: d46799c8 X1: 00000000
    X0: 00001068

    Below is the potential deadlock between the three threads:

    Thread A                Thread B                Thread C
    - f2fs_do_sync_file
     - f2fs_write_checkpoint
      - down_write(&sbi->node_change) -- 1)
                            - do_page_fault
                             - down_write(&mm->mmap_sem) -- 2)
                              - do_wp_page
                               - f2fs_vm_page_mkwrite
                                                    - getdents64
                                                     - f2fs_read_inline_dir
                                                      - lock_page -- 3)
      - f2fs_sync_node_pages
       - lock_page -- 3)
                              - __do_map_lock
                               - down_read(&sbi->node_change) -- 1)
                                                     - f2fs_fill_dentries
                                                      - dir_emit
                                                       - compat_filldir64
                                                        - do_page_fault
                                                         - down_read(&mm->mmap_sem) -- 2)

    Since f2fs_readdir is protected by inode.i_rwsem, there should not be
    any concurrent updates to the inode page, so we are safe to look up
    dentries in the inode page without holding its page lock. Drop the lock
    to improve readdir concurrency and avoid the potential deadlock.

    Reported-by: Jiqun Li
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Chao Yu
     
  • [ Upstream commit 2c28aba8b2e2a51749fa66e01b68e1cd5b53e022 ]

    With the testcase below, we fail to find an existing xattr entry:

    1. mkfs.f2fs -O extra_attr -O flexible_inline_xattr /dev/zram0
    2. mount -t f2fs -o inline_xattr_size=1 /dev/zram0 /mnt/f2fs/
    3. touch /mnt/f2fs/file
    4. setfattr -n "user.name" -v 0 /mnt/f2fs/file
    5. getfattr -n "user.name" /mnt/f2fs/file

    /mnt/f2fs/file: user.name: No such attribute

    The reason is that, for an inode with a very small inline xattr size,
    __find_inline_xattr() fails to traverse any entry because the first
    entry may not have been loaded from the xattr node yet; later, we may
    skip checking the remaining xattr data in __find_xattr(), resulting in
    this wrong lookup result.

    This patch adds a check for this case to avoid the issue.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Chao Yu
     

27 Mar, 2019

1 commit

  • commit 48432984d718c95cf13e26d487c2d1b697c3c01f upstream.

    Thread A                                Thread B
    - __fput
     - f2fs_release_file
      - drop_inmem_pages
       - mutex_lock(&fi->inmem_lock)
       - __revoke_inmem_pages
        - lock_page(page)
                                            - open
                                             - f2fs_setattr
                                              - truncate_setsize
                                               - truncate_inode_pages_range
                                                - lock_page(page)
                                                - truncate_cleanup_page
                                                 - f2fs_invalidate_page
                                                  - drop_inmem_page
                                                   - mutex_lock(&fi->inmem_lock);

    We may encounter the ABBA deadlock above, as reported by Kyungtae Kim:

    I'm reporting a bug in linux-4.17.19: "INFO: task hung in
    drop_inmem_page" (no reproducer)

    I think this might be somehow related to the following:
    https://groups.google.com/forum/#!searchin/syzkaller-bugs/INFO$3A$20task$20hung$20in$20%7Csort:date/syzkaller-bugs/c6soBTrdaIo/AjAzPeIzCgAJ

    =========================================
    INFO: task syz-executor7:10822 blocked for more than 120 seconds.
    Not tainted 4.17.19 #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    syz-executor7 D27024 10822 6346 0x00000004
    Call Trace:
    context_switch kernel/sched/core.c:2867 [inline]
    __schedule+0x721/0x1e60 kernel/sched/core.c:3515
    schedule+0x88/0x1c0 kernel/sched/core.c:3559
    schedule_preempt_disabled+0x18/0x30 kernel/sched/core.c:3617
    __mutex_lock_common kernel/locking/mutex.c:833 [inline]
    __mutex_lock+0x5bd/0x1410 kernel/locking/mutex.c:893
    mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:908
    drop_inmem_page+0xcb/0x810 fs/f2fs/segment.c:327
    f2fs_invalidate_page+0x337/0x5e0 fs/f2fs/data.c:2401
    do_invalidatepage mm/truncate.c:165 [inline]
    truncate_cleanup_page+0x261/0x330 mm/truncate.c:187
    truncate_inode_pages_range+0x552/0x1610 mm/truncate.c:367
    truncate_inode_pages mm/truncate.c:478 [inline]
    truncate_pagecache+0x6d/0x90 mm/truncate.c:801
    truncate_setsize+0x81/0xa0 mm/truncate.c:826
    f2fs_setattr+0x44f/0x1270 fs/f2fs/file.c:781
    notify_change+0xa62/0xe80 fs/attr.c:313
    do_truncate+0x12e/0x1e0 fs/open.c:63
    do_last fs/namei.c:2955 [inline]
    path_openat+0x2042/0x29f0 fs/namei.c:3505
    do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
    do_sys_open+0x35e/0x4e0 fs/open.c:1101
    __do_sys_open fs/open.c:1119 [inline]
    __se_sys_open fs/open.c:1114 [inline]
    __x64_sys_open+0x89/0xc0 fs/open.c:1114
    do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x4497b9
    RSP: 002b:00007f734e459c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
    RAX: ffffffffffffffda RBX: 00007f734e45a6cc RCX: 00000000004497b9
    RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080
    RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e45a700
    INFO: task syz-executor7:10858 blocked for more than 120 seconds.
    Not tainted 4.17.19 #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    syz-executor7 D28880 10858 6346 0x00000004
    Call Trace:
    context_switch kernel/sched/core.c:2867 [inline]
    __schedule+0x721/0x1e60 kernel/sched/core.c:3515
    schedule+0x88/0x1c0 kernel/sched/core.c:3559
    __rwsem_down_write_failed_common kernel/locking/rwsem-xadd.c:565 [inline]
    rwsem_down_write_failed+0x5e6/0xc90 kernel/locking/rwsem-xadd.c:594
    call_rwsem_down_write_failed+0x17/0x30 arch/x86/lib/rwsem.S:117
    __down_write arch/x86/include/asm/rwsem.h:142 [inline]
    down_write+0x58/0xa0 kernel/locking/rwsem.c:72
    inode_lock include/linux/fs.h:713 [inline]
    do_truncate+0x120/0x1e0 fs/open.c:61
    do_last fs/namei.c:2955 [inline]
    path_openat+0x2042/0x29f0 fs/namei.c:3505
    do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
    do_sys_open+0x35e/0x4e0 fs/open.c:1101
    __do_sys_open fs/open.c:1119 [inline]
    __se_sys_open fs/open.c:1114 [inline]
    __x64_sys_open+0x89/0xc0 fs/open.c:1114
    do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x4497b9
    RSP: 002b:00007f734e3b4c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
    RAX: ffffffffffffffda RBX: 00007f734e3b56cc RCX: 00000000004497b9
    RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080
    RBP: 000000000071c238 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e3b5700
    INFO: task syz-executor5:10829 blocked for more than 120 seconds.
    Not tainted 4.17.19 #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    syz-executor5 D28760 10829 6308 0x80000002
    Call Trace:
    context_switch kernel/sched/core.c:2867 [inline]
    __schedule+0x721/0x1e60 kernel/sched/core.c:3515
    schedule+0x88/0x1c0 kernel/sched/core.c:3559
    io_schedule+0x21/0x80 kernel/sched/core.c:5179
    wait_on_page_bit_common mm/filemap.c:1100 [inline]
    __lock_page+0x2b5/0x390 mm/filemap.c:1273
    lock_page include/linux/pagemap.h:483 [inline]
    __revoke_inmem_pages+0xb35/0x11c0 fs/f2fs/segment.c:231
    drop_inmem_pages+0xa3/0x3e0 fs/f2fs/segment.c:306
    f2fs_release_file+0x2c7/0x330 fs/f2fs/file.c:1556
    __fput+0x2c7/0x780 fs/file_table.c:209
    ____fput+0x1a/0x20 fs/file_table.c:243
    task_work_run+0x151/0x1d0 kernel/task_work.c:113
    exit_task_work include/linux/task_work.h:22 [inline]
    do_exit+0x8ba/0x30a0 kernel/exit.c:865
    do_group_exit+0x13b/0x3a0 kernel/exit.c:968
    get_signal+0x6bb/0x1650 kernel/signal.c:2482
    do_signal+0x84/0x1b70 arch/x86/kernel/signal.c:810
    exit_to_usermode_loop+0x155/0x190 arch/x86/entry/common.c:162
    prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
    syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
    do_syscall_64+0x445/0x4e0 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x4497b9
    RSP: 002b:00007f1c68e74ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
    RAX: fffffffffffffe00 RBX: 000000000071bf80 RCX: 00000000004497b9
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000071bf80
    RBP: 000000000071bf80 R08: 0000000000000000 R09: 000000000071bf58
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    R13: 0000000000000000 R14: 00007f1c68e759c0 R15: 00007f1c68e75700

    This patch uses trylock_page() to mitigate this deadlock condition.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
     

19 Mar, 2019

1 commit

  • commit 31867b23d7d1ee3535136c6a410a6cf56f666bfc upstream.

    Otherwise, we can get wrong counts, incurring a checkpoint hang.

    IO_W (CP: -24, Data: 24, Flush: ( 0 0 1), Discard: ( 0 0))

    Thread A                             Thread B
    - f2fs_write_data_pages
     - __write_data_page
      - f2fs_submit_page_write
       - inc_page_count(F2FS_WB_DATA)
         type is F2FS_WB_DATA since the file is a non-atomic one
                                         - f2fs_ioc_start_atomic_write
                                          - set_inode_flag(FI_ATOMIC_FILE)
     - f2fs_write_end_io
      - dec_page_count(F2FS_WB_CP_DATA)
        type is F2FS_WB_CP_DATA since the file has become an atomic one

    Cc:
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Jaegeuk Kim
     

13 Feb, 2019

5 commits

  • [ Upstream commit e4589fa545e0020dbbc3c9bde35f35f949901392 ]

    When there is a failure in f2fs_fill_super() after/during the recovery
    of fsync'd nodes, it frees the current sbi and retries. This time the
    mount succeeds, but the files that got recovered before the retry still
    hold their extent trees, whose extent node lists are corrupted since
    sbi and sbi->extent_list were freed up. The list_del corruption is
    observed when the file system is unmounted and those recovered files'
    extent nodes are freed in the context below.

    list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)

    kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
    lr : __list_del_entry_valid+0x94/0xb4
    pc : __list_del_entry_valid+0x94/0xb4

    Call trace:
    __list_del_entry_valid+0x94/0xb4
    __release_extent_node+0xb0/0x114
    __free_extent_tree+0x58/0x7c
    f2fs_shrink_extent_tree+0xdc/0x3b0
    f2fs_leave_shrinker+0x28/0x7c
    f2fs_put_super+0xfc/0x1e0
    generic_shutdown_super+0x70/0xf4
    kill_block_super+0x2c/0x5c
    kill_f2fs_super+0x44/0x50
    deactivate_locked_super+0x60/0x8c
    deactivate_super+0x68/0x74
    cleanup_mnt+0x40/0x78
    __cleanup_mnt+0x1c/0x28
    task_work_run+0x48/0xd0
    do_notify_resume+0x678/0xe98
    work_pending+0x8/0x14

    Fix this by not creating extents for those recovered files if the
    shrinker is not registered yet. Once the mount succeeds and the
    shrinker is registered, those files can have extents again.

    Signed-off-by: Sahitya Tummala
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Sahitya Tummala
     
  • [ Upstream commit 60aa4d5536ab7fe32433ca1173bd9d6633851f27 ]

    iput() on sbi->node_inode can update sbi->stat_info in the context
    below if f2fs_write_checkpoint() has failed with an error.

    f2fs_balance_fs_bg+0x1ac/0x1ec
    f2fs_write_node_pages+0x4c/0x260
    do_writepages+0x80/0xbc
    __writeback_single_inode+0xdc/0x4ac
    writeback_single_inode+0x9c/0x144
    write_inode_now+0xc4/0xec
    iput+0x194/0x22c
    f2fs_put_super+0x11c/0x1e8
    generic_shutdown_super+0x70/0xf4
    kill_block_super+0x2c/0x5c
    kill_f2fs_super+0x44/0x50
    deactivate_locked_super+0x60/0x8c
    deactivate_super+0x68/0x74
    cleanup_mnt+0x40/0x78

    Fix this by moving f2fs_destroy_stats() further below iput() in
    both f2fs_put_super() and f2fs_fill_super() paths.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Sahitya Tummala
     
  • [ Upstream commit f6176473a0c7472380eef72ebeb330cf9485bf0a ]

    When call f2fs_acl_create_masq() failed, the caller f2fs_acl_create()
    should return -EIO instead of -ENOMEM, this patch makes it consistent
    with posix_acl_create() which has been fixed in commit beaf226b863a
    ("posix_acl: don't ignore return value of posix_acl_create_masq()").

    Fixes: 83dfe53c185e ("f2fs: fix reference leaks in f2fs_acl_create")
    Signed-off-by: Tiezhu Yang
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Tiezhu Yang
     
  • [ Upstream commit 2866fb16d67992195b0526d19e65acb6640fb87f ]

    The following race could lead to inconsistent SIT bitmap:

    Task A                          Task B
    ======                          ======
    f2fs_write_checkpoint
      block_operations
        f2fs_lock_all
          down_write(node_change)
          down_write(node_write)
          ... sync ...
          up_write(node_change)
                                    f2fs_file_write_iter
                                      set_inode_flag(FI_NO_PREALLOC)
                                      ......
                                      f2fs_write_begin(index=0, has inline data)
                                        prepare_write_begin
                                          __do_map_lock(AIO) => down_read(node_change)
                                          f2fs_convert_inline_page => update SIT
                                          __do_map_lock(AIO) => up_read(node_change)
      f2fs_flush_sit_entries
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Sheng Yong
     
  • [ Upstream commit b61ac5b720146c619c7cdf17eff2551b934399e5 ]

    This patch moves the dir data flush into the checkpoint-writing
    process; by doing this, it may reduce the time spent in dir fsync.

    pre:
    -f2fs_do_sync_file enter
    -file_write_and_wait_range
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Yunlei He
     

10 Jan, 2019

3 commits

  • commit 64beba0558fce7b59e9a8a7afd77290e82a22163 upstream.

    There is a security report where f2fs_getxattr() has a hole that
    exposes a wrong memory region when the image is malformed, like this:

    f2fs_getxattr: entry->e_name_len: 4, size: 12288, buffer_size: 16384, len: 4

    Cc:
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Jaegeuk Kim
     
  • commit 88960068f25fcc3759455d85460234dcc9d43fef upstream.

    Treat "block_count" from struct f2fs_super_block as 64-bit little endian
    value in sanity_check_raw_super() because struct f2fs_super_block
    declares "block_count" as "__le64".

    This fixes a bug where the superblock validation fails on big endian
    devices with the following error:
    F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
    F2FS-fs (sda1): Can't find valid F2FS filesystem in 1th superblock
    F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
    F2FS-fs (sda1): Can't find valid F2FS filesystem in 2th superblock
    As a result, the partition cannot be mounted.

    With this patch applied the superblock validation works fine and the
    partition can be mounted again:
    F2FS-fs (sda1): Mounted with checkpoint version = 7c84

    My little endian x86-64 hardware was able to mount the partition without
    this fix.
    To confirm that mounting f2fs filesystems works on big endian machines
    again I tested this on a 32-bit MIPS big endian (lantiq) device.

    Fixes: 0cfe75c5b01199 ("f2fs: enhance sanity_check_raw_super() to avoid potential overflows")
    Cc: stable@vger.kernel.org
    Signed-off-by: Martin Blumenstingl
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Martin Blumenstingl
     
  • commit 0ea295dd853e0879a9a30ab61f923c26be35b902 upstream.

    The function truncate_node frees the page with f2fs_put_page. However,
    the page index is read after that. So, the patch reads the index before
    freeing the page.

    Fixes: bf39c00a9a7f ("f2fs: drop obsolete node page when it is truncated")
    Cc:
    Signed-off-by: Pan Bian
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Pan Bian
     

14 Nov, 2018

11 commits

  • commit 4c58ed076875f36dae0f240da1e25e99e5d4afb8 upstream.

    The race below can cause a reversed reference on the dirty count; fix
    it by relocating __submit_bio() and inc_page_count().

    Thread A                        Thread B
    - f2fs_inplace_write_data
     - f2fs_submit_page_bio
      - __submit_bio
                                    - f2fs_write_end_io
                                     - dec_page_count
     - inc_page_count

    Cc:
    Fixes: d1b3e72d5490 ("f2fs: submit bio of in-place-update pages")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
     
  • commit ef2a007134b4eaa39264c885999f296577bc87d2 upstream.

    Testcase to reproduce this bug:
    1. mkfs.f2fs /dev/sdd
    2. mount -t f2fs /dev/sdd /mnt/f2fs
    3. touch /mnt/f2fs/file
    4. sync
    5. chattr +A /mnt/f2fs/file
    6. xfs_io -f /mnt/f2fs/file -c "fsync"
    7. godown /mnt/f2fs
    8. umount /mnt/f2fs
    9. mount -t f2fs /dev/sdd /mnt/f2fs
    10. chattr -A /mnt/f2fs/file
    11. xfs_io -f /mnt/f2fs/file -c "fsync"
    12. umount /mnt/f2fs
    13. mount -t f2fs /dev/sdd /mnt/f2fs
    14. lsattr /mnt/f2fs/file

    -----------------N- /mnt/f2fs/file

    But actually, the correct result we expect is:

    -------A---------N- /mnt/f2fs/file

    The reason is that in step 9) we failed to recover the cold bit flag in
    the inode block, so later, in fsync, we skip writing the inode block
    due to the condition check below, losing data across another SPOR.

    f2fs_fsync_node_pages()
        if (!IS_DNODE(page) || !is_cold_node(page))
            continue;

    Note that I guess some non-dir inodes had already lost their cold bit
    during POR, so in order to re-enable recovery for those inodes, let's
    try to recover the cold bit in f2fs_iget() to save more fsynced data.

    Fixes: c56675750d7c ("f2fs: remove unneeded set_cold_node()")
    Cc: 4.17+
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
     
  • commit 89d13c38501df730cbb2e02c4499da1b5187119d upstream.

    This patch fixes missing up_read call.

    Fixes: c9b60788fc76 ("f2fs: fix to do sanity check with block address in main area")
    Cc: # 4.19+
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Jaegeuk Kim
     
  • commit 164a63fa6b384e30ceb96ed80bc7dc3379bc0960 upstream.

    This reverts commit 66110abc4c931f879d70e83e1281f891699364bf.

    If we clear the cold data flag outside the writeback flow, we can
    miscount by -1 in end_io, which incurs a deadlock caused by all I/Os
    being blocked during heavy GC.

    Balancing F2FS Async:
    - IO (CP: 1, Data: -1, Flush: ( 0 0 1), Discard: ( ...

    GC thread:                      IRQ:
    - move_data_page()
     - set_page_dirty()
      - clear_cold_data()
                                    - f2fs_write_end_io()
                                     - type = WB_DATA_TYPE(page);
                                       here, we get the wrong type
                                     - dec_page_count(sbi, type);
    - f2fs_wait_on_page_writeback()

    Cc:
    Reported-and-Tested-by: Park Ju Hyung
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Jaegeuk Kim
     
  • [ Upstream commit 1378752b9921e60749eaf18ec6c47b33f9001abb ]

    generic/417 reported as below:

    ------------[ cut here ]------------
    kernel BUG at /home/yuchao/git/devf2fs/inode.c:695!
    invalid opcode: 0000 [#1] PREEMPT SMP
    CPU: 1 PID: 21697 Comm: umount Tainted: G W O 4.18.0-rc2+ #39
    Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
    Call Trace:
    ? _raw_spin_unlock+0x2c/0x50
    evict+0xa8/0x170
    dispose_list+0x34/0x40
    evict_inodes+0x118/0x120
    generic_shutdown_super+0x41/0x100
    ? rcu_read_lock_sched_held+0x97/0xa0
    kill_block_super+0x22/0x50
    kill_f2fs_super+0x6f/0x80 [f2fs]
    deactivate_locked_super+0x3d/0x70
    deactivate_super+0x40/0x60
    cleanup_mnt+0x39/0x70
    __cleanup_mnt+0x10/0x20
    task_work_run+0x81/0xa0
    exit_to_usermode_loop+0x59/0xa7
    do_fast_syscall_32+0x1f5/0x22c
    entry_SYSENTER_32+0x53/0x86
    EIP: f2fs_evict_inode+0x556/0x580 [f2fs]

    It can be reproduced simply with the scripts below; enable the quota
    feature during mkfs.

    Testcase1:
    1. mkfs.f2fs /dev/zram0
    2. mount -t f2fs /dev/zram0 /mnt/f2fs
    3. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4k" -c "fsync"
    4. godown /mnt/f2fs
    5. umount /mnt/f2fs
    6. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
    7. umount /mnt/f2fs

    Testcase2:
    1. mkfs.f2fs /dev/zram0
    2. mount -t f2fs /dev/zram0 /mnt/f2fs
    3. touch /mnt/f2fs/file
    4. create process[pid = x] do:
    a) open /mnt/f2fs/file;
    b) unlink /mnt/f2fs/file
    5. godown -f /mnt/f2fs
    6. kill process[pid = x]
    7. umount /mnt/f2fs
    8. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
    9. umount /mnt/f2fs

    The reason is: during recovery, the i_{c,m}time of an inode is updated,
    and then the inode can be set dirty without being tracked in the
    sbi->inode_list[DIRTY_META] global list, so a later write_checkpoint
    will not flush such a dirty inode into its node page.

    Once umount is called, sync_filesystem() in generic_shutdown_super()
    will skip syncing dirty inodes due to the sb_rdonly check, leaving
    dirty inodes behind.

    To solve this issue, during umount, remove the SB_RDONLY flag from
    sb->s_flags to make sure sync_filesystem() is not skipped.

    Signed-off-by: Chao Yu

    Signed-off-by: Jaegeuk Kim

    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
     
  • [ Upstream commit cda9cc595f0bb6ffa51a4efc4b6533dfa4039b4c ]

    Now, we depend on fsck to ensure quota file data is OK, so we scan the
    whole partition if the checkpoint lacks the umount flag. The same
    applies to the quota-off error case, which may make quota file data
    inconsistent.

    generic/019 reports below error:

    __quota_error: 1160 callbacks suppressed
    Quota error (device zram1): write_blk: dquota write failed
    Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
    Quota error (device zram1): write_blk: dquota write failed
    Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
    Quota error (device zram1): write_blk: dquota write failed
    Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
    Quota error (device zram1): write_blk: dquota write failed
    Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
    Quota error (device zram1): write_blk: dquota write failed
    Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
    VFS: Busy inodes after unmount of zram1. Self-destruct in 5 seconds. Have a nice day...

    If we fail in the path below due to a failure to write a dquot block,
    we miss releasing the quota inode; fix it.

    - f2fs_put_super
     - f2fs_quota_off_umount
      - f2fs_quota_off
       - f2fs_quota_sync
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Yunlei He
     
  • [ Upstream commit b430f7263673eab1dc40e662ae3441a9619d16b8 ]

    In the call trace below, we might sleep in dput().

    So, in order to avoid sleeping under a spin_lock, remove
    f2fs_mark_inode_dirty_sync from __try_update_largest_extent and
    __drop_largest_extent.

    BUG: sleeping function called from invalid context at fs/dcache.c:796
    Call trace:
    dump_backtrace+0x0/0x3f4
    show_stack+0x24/0x30
    dump_stack+0xe0/0x138
    ___might_sleep+0x2a8/0x2c8
    __might_sleep+0x78/0x10c
    dput+0x7c/0x750
    block_dump___mark_inode_dirty+0x120/0x17c
    __mark_inode_dirty+0x344/0x11f0
    f2fs_mark_inode_dirty_sync+0x40/0x50
    __insert_extent_tree+0x2e0/0x2f4
    f2fs_update_extent_tree_range+0xcf4/0xde8
    f2fs_update_extent_cache+0x114/0x12c
    f2fs_update_data_blkaddr+0x40/0x50
    write_data_page+0x150/0x314
    do_write_data_page+0x648/0x2318
    __write_data_page+0xdb4/0x1640
    f2fs_write_cache_pages+0x768/0xafc
    __f2fs_write_data_pages+0x590/0x1218
    f2fs_write_data_pages+0x64/0x74
    do_writepages+0x74/0xe4
    __writeback_single_inode+0xdc/0x15f0
    writeback_sb_inodes+0x574/0xc98
    __writeback_inodes_wb+0x190/0x204
    wb_writeback+0x730/0xf14
    wb_check_old_data_flush+0x1bc/0x1c8
    wb_workfn+0x554/0xf74
    process_one_work+0x440/0x118c
    worker_thread+0xac/0x974
    kthread+0x1a0/0x1c8
    ret_from_fork+0x10/0x1c

    Signed-off-by: Zhikang Zhang
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Zhikang Zhang
     
  • [ Upstream commit 19c73a691ccf6fb2f12d4e9cf9830023966cec88 ]

    Testcase to reproduce this bug:
    1. mkfs.f2fs /dev/sdd
    2. mount -t f2fs /dev/sdd /mnt/f2fs
    3. touch /mnt/f2fs/file
    4. sync
    5. chattr +A /mnt/f2fs/file
    6. xfs_io -f /mnt/f2fs/file -c "fsync"
    7. godown /mnt/f2fs
    8. umount /mnt/f2fs
    9. mount -t f2fs /dev/sdd /mnt/f2fs
    10. lsattr /mnt/f2fs/file

    -----------------N- /mnt/f2fs/file

    But actually, the correct result we expect is:

    -------A---------N- /mnt/f2fs/file

    The reason is that we didn't recover the inode.i_flags field during
    mount; fix it.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
     
  • [ Upstream commit 5cd1f387a13b5188b4edb4c834310302a85a6ea2 ]

    Testcase to reproduce this bug:
    1. mkfs.f2fs -O extra_attr -O inode_crtime /dev/sdd
    2. mount -t f2fs /dev/sdd /mnt/f2fs
    3. touch /mnt/f2fs/file
    4. xfs_io -f /mnt/f2fs/file -c "fsync"
    5. godown /mnt/f2fs
    6. umount /mnt/f2fs
    7. mount -t f2fs /dev/sdd /mnt/f2fs
    8. xfs_io -f /mnt/f2fs/file -c "statx -r"

    stat.btime.tv_sec = 0
    stat.btime.tv_nsec = 0

    This patch fixes this by recovering the inode creation time fields
    during mount.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
     
  • [ Upstream commit fb7d70db305a1446864227abf711b756568f8242 ]

    When running the fault injection test, I hit the following wrong
    behavior in f2fs_gc -> gc_data_segment():

    0. fault injection generated some PageError'ed pages

    1. gc_data_segment
    -> f2fs_get_read_data_page(REQ_RAHEAD)

    2. move_data_page
    -> f2fs_get_lock_data_page()
    -> f2fs_get_read_data_page()
    -> f2fs_submit_page_read()
    -> submit_bio(READ)
    -> return EIO due to PageError
    -> fail to move data
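
    The failure mode above can be modeled in a few lines. This is a hedged
    userspace sketch, not the f2fs patch itself: `struct page_state`,
    `read_page()`, and `read_page_fixed()` are toy names standing in for the
    page flags and read path; the point is only that a stale error flag left
    by the failed readahead must be dropped before the synchronous re-read.

```c
#include <stdbool.h>

/* Toy page flags mirroring the scenario above (illustrative only). */
struct page_state {
    bool uptodate;
    bool error;      /* stand-in for PageError left by the failed RA read */
};

/* A later synchronous read that trusts the stale error flag keeps
 * returning -EIO even though the device could now serve the block. */
static int read_page(struct page_state *pg, bool device_ok)
{
    if (pg->error)
        return -5;              /* -EIO: the stale failure wins */
    if (!device_ok) {
        pg->error = true;       /* fault injection marks the page */
        return -5;
    }
    pg->uptodate = true;
    return 0;
}

/* Clearing the stale flag before retrying lets the data move succeed. */
static int read_page_fixed(struct page_state *pg, bool device_ok)
{
    pg->error = false;          /* drop state from the failed RA read */
    return read_page(pg, device_ok);
}
```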

    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jaegeuk Kim
     
  • [ Upstream commit 78efac537de33faab9a4302cc05a70bb4a8b3b63 ]

    Now that we support cgroup writeback, it depends on correct IO
    accounting by the underlying filesystem.

    But in commit d1b3e72d5490 ("f2fs: submit bio of in-place-update pages"),
    we split write paths from f2fs_submit_page_mbio() to two:
    - f2fs_submit_page_bio() for IPU path
    - f2fs_submit_page_mbio() for OPU path

    But we still account write IO only in f2fs_submit_page_mbio(),
    resulting in incorrect IO accounting; fix it by adding the missing
    accounting to the IPU path.
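
    A minimal counter model makes the bug easy to see. This is a sketch,
    not the f2fs code: `struct io_stat`, `account_write_io()`, and the two
    `submit_*` helpers are illustrative names, and the real accounting is
    kept per-type under a spinlock. The point is that both write paths must
    bump the same counter, or cgroup writeback sees skewed numbers.

```c
#include <stddef.h>

/* Toy per-filesystem IO statistics. */
struct io_stat {
    unsigned long fs_write_bytes;
};

/* Account one chunk of filesystem write IO. */
static void account_write_io(struct io_stat *st, size_t bytes)
{
    st->fs_write_bytes += bytes;
}

/* IPU path: submit one page in place.  The bug was that this path
 * skipped the accounting call that the OPU path already made. */
static void submit_page_bio(struct io_stat *st, size_t page_size)
{
    account_write_io(st, page_size);   /* the missing call */
    /* ... build and submit a single-page bio ... */
}

/* OPU path: merge the page into the pending bio -- already accounted. */
static void submit_page_mbio(struct io_stat *st, size_t page_size)
{
    account_write_io(st, page_size);
    /* ... merge into the in-flight write bio ... */
}
```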

    Fixes: d1b3e72d5490 ("f2fs: submit bio of in-place-update pages")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
     

23 Aug, 2018

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've tuned f2fs to improve general performance by
    serializing block allocation and enhancing discard flows like fstrim
    which avoids user IO contention. And we've added fsync_mode=nobarrier
    which gives an option to user where it skips issuing cache_flush
    commands to underlying flash storage. And there are many bug fixes
    related to fuzzed images, revoked atomic writes, quota ops, and minor
    direct IO.

    Enhancements:
    - add fsync_mode=nobarrier which bypasses cache_flush command
    - enhance the discarding flow which avoids user IOs and issues in
    LBA order
    - readahead some encrypted blocks during GC
    - enable in-memory inode checksum to verify the blocks if
    F2FS_CHECK_FS is set
    - enhance nat_bits behavior
    - set -o discard by default
    - set REQ_RAHEAD to bio in ->readpages

    Bug fixes:
    - fix a corner case to corrupt atomic_writes revoking flow
    - revisit i_gc_rwsem to fix race conditions
    - fix some dio behaviors captured by xfstests
    - correct handling errors given by quota-related failures
    - add many sanity check flows to avoid fuzz test failures
    - add more error number propagation to their callers
    - fix several corner cases to continue fault injection w/ shutdown
    loop"

    * tag 'f2fs-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (89 commits)
    f2fs: readahead encrypted block during GC
    f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc
    f2fs: fix performance issue observed with multi-thread sequential read
    f2fs: fix to skip verifying block address for non-regular inode
    f2fs: rework fault injection handling to avoid a warning
    f2fs: support fault_type mount option
    f2fs: fix to return success when trimming meta area
    f2fs: fix use-after-free of dicard command entry
    f2fs: support discard submission error injection
    f2fs: split discard command in prior to block layer
    f2fs: wake up gc thread immediately when gc_urgent is set
    f2fs: fix incorrect range->len in f2fs_trim_fs()
    f2fs: refresh recent accessed nat entry in lru list
    f2fs: fix avoid race between truncate and background GC
    f2fs: avoid race between zero_range and background GC
    f2fs: fix to do sanity check with block address in main area v2
    f2fs: fix to do sanity check with inline flags
    f2fs: fix to reset i_gc_failures correctly
    f2fs: fix invalid memory access
    f2fs: fix to avoid broken of dnode block list
    ...

    Linus Torvalds
     

21 Aug, 2018

3 commits

  • During GC, for each encrypted block, we read the block synchronously
    into a meta page, and then submit it into the current cold data log
    area.

    This 4KB-granularity read model performs poorly, so, as with
    migrating non-encrypted blocks, let's readahead encrypted blocks as
    well to improve migration performance.

    To implement this, we choose the meta page whose index is the old
    block address of the encrypted block, and readahead the ciphertext
    into that page; later, if the readahead page is still up-to-date, we
    load its data into the target meta page and submit the write IO.

    Note that for OPU, truncation, and deletion, we need to invalidate
    the meta page after we invalidate the old block address, to make
    sure we won't load stale data from the target meta page during
    encrypted block migration.
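
    The caching scheme above can be sketched in miniature. This is a hedged
    model, not the f2fs implementation: `struct meta_cache` and the three
    helpers are invented names, and the slot array stands in for meta pages
    indexed by the old block address. It shows the two rules that matter:
    readahead populates a slot keyed by the old address, and any path that
    invalidates the old address must also drop the slot.

```c
#include <stdbool.h>
#include <stddef.h>

#define META_SLOTS 16

/* Toy meta-page cache indexed by the block's *old* address. */
struct meta_cache {
    bool valid[META_SLOTS];
    unsigned char data[META_SLOTS][8];  /* stand-in for 4K pages */
};

/* Readahead ciphertext into the slot keyed by the old block address. */
static void ra_encrypted_block(struct meta_cache *mc, unsigned old_blkaddr,
                               const unsigned char *ciphertext, size_t len)
{
    unsigned slot = old_blkaddr % META_SLOTS;
    for (size_t i = 0; i < len && i < 8; i++)
        mc->data[slot][i] = ciphertext[i];
    mc->valid[slot] = true;
}

/* OPU/truncation/deletion invalidated the old address: the cached
 * slot must be dropped too, so a stale block is never migrated. */
static void invalidate_block(struct meta_cache *mc, unsigned old_blkaddr)
{
    mc->valid[old_blkaddr % META_SLOTS] = false;
}

/* Returns true if the readahead copy may be used; false forces a
 * fresh synchronous read, as before the optimization. */
static bool use_cached_block(const struct meta_cache *mc,
                             unsigned old_blkaddr)
{
    return mc->valid[old_blkaddr % META_SLOTS];
}
```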

    for ((i = 0; i < 1000; i++)); do
        xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync"
    done

    for ((i = 0; i < 1000; i += 2)); do
        rm /mnt/f2fs/dir/$i
    done

    ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);

    Before:
    gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
    gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
    gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
    gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
    gc-6549 [001] .... 214682.213902: block_plug: [gc]
    gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
    gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
    gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
    gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
    gc-6549 [001] .... 214682.226414: block_plug: [gc]
    gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
    gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
    gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
    gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
    gc-6549 [001] .... 214682.226911: block_plug: [gc]
    gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
    gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1

    After:
    gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
    gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
    gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
    gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
    gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
    gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
    gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
    gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
    gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
    gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
    gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
    gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
    gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
    gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
    gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
    gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
    gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
    gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
    gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
    gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
    gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
    gc-5678 [003] .... 214327.026046: block_plug: [gc]

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • The f2fs_gc() called by f2fs_balance_fs() must run outside of
    fi->i_gc_rwsem[WRITE], since f2fs_gc() can try to grab it in a loop.

    If it hits the maximum number of retries in GC, let's give it a
    chance to release gc_mutex for a short time so as not to go into
    livelock in the worst case.
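
    The livelock avoidance can be sketched as a retry loop that
    periodically drops and reacquires the lock. All names here are toy
    stand-ins (`toy_mutex`, `balance_fs_gc`, `MAX_GC_RETRIES`), not the
    f2fs symbols; the shape is what matters: after a bounded number of
    failed passes, release the mutex so blocked writers can make progress,
    then take it back and keep going.

```c
#include <stdbool.h>

/* Toy mutex so the sketch is self-contained. */
struct toy_mutex { bool held; };
static void toy_lock(struct toy_mutex *m)   { m->held = true; }
static void toy_unlock(struct toy_mutex *m) { m->held = false; }

#define MAX_GC_RETRIES 8

/* One GC pass; returns true when enough space was reclaimed.
 * 'attempts_needed' simulates how many passes the workload requires. */
static bool gc_one_pass(int *done, int attempts_needed)
{
    return ++(*done) >= attempts_needed;
}

/* After hitting the retry limit, briefly drop the mutex so tasks
 * blocked on it can make progress, avoiding the livelock described
 * above.  Sketch only; not the in-kernel code. */
static int balance_fs_gc(struct toy_mutex *gc_mutex, int attempts_needed)
{
    int done = 0;

    toy_lock(gc_mutex);
    while (!gc_one_pass(&done, attempts_needed)) {
        if (done % MAX_GC_RETRIES == 0) {
            toy_unlock(gc_mutex);  /* give others a chance */
            /* a short sleep / cond_resched() would go here */
            toy_lock(gc_mutex);
        }
    }
    toy_unlock(gc_mutex);
    return done;
}
```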

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This reverts commit b93f771 ("f2fs: remove writepages lock") to fix
    the drop in sequential read throughput.

    Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
    device: UFS

    Before -
    read throughput: 185 MB/s
    total read requests: 85177 (of these ~80000 are 4KB size requests).
    total write requests: 2546 (of these ~2208 requests are written in 512KB).

    After -
    read throughput: 758 MB/s
    total read requests: 2417 (of these ~2042 are 512KB reads).
    total write requests: 2701 (of these ~2034 requests are written in 512KB).

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

18 Aug, 2018

1 commit

  • a_ops->readpages() is only ever used for read-ahead, yet we don't flag
    the IO being submitted as such. Fix that up. Any file system that uses
    mpage_readpages() as its ->readpages() implementation will now get this
    right.

    Since we're passing in whether the IO is read-ahead or not, we don't
    need to pass in the 'gfp' separately, as it is dependent on the IO being
    read-ahead. Kill off that member.

    Add some documentation notes on ->readpages() being purely for
    read-ahead.

    Link: http://lkml.kernel.org/r/20180621010725.17813-3-axboe@kernel.dk
    Signed-off-by: Jens Axboe
    Reviewed-by: Andrew Morton
    Cc: Al Viro
    Cc: Chris Mason
    Cc: Christoph Hellwig
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jens Axboe
     

15 Aug, 2018

2 commits

  • generic/184 1s ... [failed, exit status 1]- output mismatch
    --- tests/generic/184.out 2015-01-11 16:52:27.643681072 +0800
    QA output created by 184 - silence is golden
    +rm: cannot remove '/mnt/f2fs/null': Bad address
    +mknod: '/mnt/f2fs/null': Bad address
    +chmod: cannot access '/mnt/f2fs/null': Bad address
    +./tests/generic/184: line 36: /mnt/f2fs/null: Bad address
    ...

    F2FS-fs (zram0): access invalid blkaddr:259
    EIP: f2fs_is_valid_blkaddr+0x14b/0x1b0 [f2fs]
    f2fs_iget+0x927/0x1010 [f2fs]
    f2fs_lookup+0x26e/0x630 [f2fs]
    __lookup_slow+0xb3/0x140
    lookup_slow+0x31/0x50
    walk_component+0x185/0x1f0
    path_lookupat+0x51/0x190
    filename_lookup+0x7f/0x140
    user_path_at_empty+0x36/0x40
    vfs_statx+0x61/0xc0
    __do_sys_stat64+0x29/0x40
    sys_stat64+0x13/0x20
    do_fast_syscall_32+0xaa/0x22c
    entry_SYSENTER_32+0x53/0x86

    In f2fs_iget(), we check the inode's first block address; if it is
    valid, we set the FI_FIRST_BLOCK_WRITTEN flag in the inode.

    But we should only do this for regular inodes; for a special inode,
    i_addr[0] stores device info instead of a block address, so it will
    obviously fail the validity check.

    So for non-regular inodes, let's skip verifying the address and
    setting the flag.
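
    The condition being added is small enough to sketch. This is a hedged
    toy model, not the f2fs code: `toy_inode`, `check_first_block()`, and
    the mode enum are invented, and the real check validates against the
    filesystem's block address range. The key line is the early return for
    anything that is not a regular file.

```c
#include <stdbool.h>

/* Simplified inode modes (illustrative, not S_IFMT values). */
enum toy_mode { TOY_REG, TOY_CHR, TOY_BLK };

struct toy_inode {
    enum toy_mode mode;
    unsigned i_addr0;     /* block address OR encoded device number */
    bool first_block_written;
};

static bool blkaddr_valid(unsigned blkaddr, unsigned max_blkaddr)
{
    return blkaddr != 0 && blkaddr < max_blkaddr;
}

/* Only regular files keep a block address in i_addr[0]; for device
 * nodes it encodes major/minor, so validating it as a block address
 * (as the buggy code did) trips the sanity check.  Sketch only. */
static int check_first_block(struct toy_inode *inode, unsigned max_blkaddr)
{
    if (inode->mode != TOY_REG)
        return 0;                       /* skip: not a block address */
    if (!blkaddr_valid(inode->i_addr0, max_blkaddr))
        return -14;                     /* -EFAULT, as in the report */
    inode->first_block_written = true;
    return 0;
}
```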

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an
    unused label:

    fs/f2fs/segment.c: In function '__submit_discard_cmd':
    fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label]

    This could be fixed by adding another #ifdef around it, but the more
    reliable way of doing this seems to be to remove the other #ifdefs
    where that is easily possible.

    By defining time_to_inject() as a trivial stub, most of the checks for
    CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer
    formatting of the code.
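
    The stub pattern described above looks roughly like this. The shape is
    standard kernel practice; `submit_discard()` and the argument are
    illustrative, and `CONFIG_F2FS_FAULT_INJECTION` here just stands in for
    the real Kconfig symbol at compile time.

```c
#include <stdbool.h>

/* When the feature is compiled out, give the hook a trivial stub so
 * callers need no #ifdef of their own. */
#ifdef CONFIG_F2FS_FAULT_INJECTION
bool time_to_inject(int fault_type);   /* real implementation elsewhere */
#else
static inline bool time_to_inject(int fault_type)
{
    (void)fault_type;
    return false;       /* never inject: the branch folds away */
}
#endif

/* A caller can now be written once, with no conditional compilation;
 * the compiler drops the dead branch when injection is disabled. */
static int submit_discard(void)
{
    if (time_to_inject(/* FAULT_DISCARD */ 0))
        return -5;      /* injected -EIO */
    return 0;           /* normal submission path */
}
```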

    Signed-off-by: Arnd Bergmann
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Arnd Bergmann
     

14 Aug, 2018

7 commits

  • Previously, once fault injection was on, all kinds of faults would
    be injected into f2fs by default; if we wanted to trigger a single
    type or a specific combination during a test, we had to configure a
    sysfs entry, which is inconvenient to integrate into a test suite
    such as xfstests.

    So this patch introduces a new mount option 'fault_type' to assist
    the old option 'fault_injection'; with these two mount options, we
    can specify any fault rate/type at mount time.
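
    The rate/type selection can be sketched as a bitmask check. This is a
    hedged model: the enum values, `struct fault_info` layout, and
    `should_inject()` are illustrative rather than the f2fs definitions.
    It shows the intent of the option: the mask picks which fault classes
    are armed, and the rate says every Nth opportunity fires.

```c
#include <stdbool.h>

/* Illustrative fault types as bit positions (not the f2fs enum). */
enum { FAULT_KMALLOC, FAULT_PAGE_ALLOC, FAULT_DISCARD, FAULT_MAX };

struct fault_info {
    unsigned inject_rate;   /* from fault_injection=<rate> */
    unsigned inject_type;   /* bitmask from fault_type=<mask> */
    unsigned count;         /* opportunities since last injection */
};

/* With fault_type as a bitmask, a test can enable exactly one fault
 * (or any combination) at mount time instead of poking sysfs. */
static bool should_inject(struct fault_info *fi, int type)
{
    if (!fi->inject_rate)
        return false;
    if (!(fi->inject_type & (1u << type)))
        return false;               /* this fault class is disabled */
    if (++fi->count < fi->inject_rate)
        return false;               /* fire every Nth opportunity */
    fi->count = 0;
    return true;
}
```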

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • generic/251
    --- tests/generic/251.out 2016-05-03 20:20:11.381899000 +0800
    QA output created by 251
    Running the test: done.
    +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument
    +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument
    +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument
    +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument
    +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument
    ...
    Ran: generic/251
    Failures: generic/251

    The reason is that the range covered by fstrim lies in the meta
    area; previously we just returned -EINVAL for such a case, making
    generic/251 fail. To fix this problem, let's relax the restriction
    and return success with no blocks discarded.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Dan Carpenter reported:

    The patch 20ee4382322c: "f2fs: issue small discard by LBA order" from
    Jul 8, 2018, leads to the following Smatch warning:

    fs/f2fs/segment.c:1277 __issue_discard_cmd_orderly()
    warn: 'dc' was already freed.

    See also:
    fs/f2fs/segment.c:2550 __issue_discard_cmd_range() warn: 'dc' was already freed.

    In order to fix this issue, let's get the error from
    __submit_discard_cmd(), and release the current discard command only
    after we have referenced the next one.
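
    The ordering the fix enforces is the classic safe-iteration pattern:
    take the pointer to the next entry before the current one can be
    freed. A minimal userspace sketch (toy `struct cmd` list, invented
    helper names, `malloc`/`free` standing in for the command lifetime):

```c
#include <stdlib.h>

/* Minimal singly linked list of pending commands (toy version). */
struct cmd {
    int err;
    struct cmd *next;
};

/* Buggy shape: submission may free 'dc' on error, yet the loop then
 * reads dc->next -- the use-after-free Smatch flagged.  The fix is
 * exactly this ordering: reference the next entry first, and only
 * then release the current one. */
static int walk_and_release(struct cmd *head, int *errors)
{
    struct cmd *dc = head;
    int n = 0;

    while (dc) {
        struct cmd *next = dc->next;  /* reference next BEFORE freeing */
        if (dc->err)
            (*errors)++;
        free(dc);                     /* now safe to release current */
        dc = next;
        n++;
    }
    return n;
}

/* Helper to build a small list for demonstration. */
static struct cmd *push(struct cmd *head, int err)
{
    struct cmd *c = malloc(sizeof(*c));
    c->err = err;
    c->next = head;
    return c;
}
```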

    Reported-by: Dan Carpenter
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch adds to support discard submission error injection for testing
    error handling of __submit_discard_cmd().

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Some devices have small max_{hw,}discard_sectors, so that in
    __blkdev_issue_discard() one big discard bio can be split into
    multiple small discard bios, resulting in heavy load on the IO
    scheduler and device, which can hang other sync IO for a long time.

    Now, f2fs is trying to control discard commands more precisely, in
    order to reduce conflict between discard IO and user IO and enhance
    application performance; so in this patch, we split the discard bio
    in f2fs prior to the block layer, to avoid issuing multiple discard
    bios in a short time.
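
    The pre-splitting amounts to chunking one large range by the device's
    advertised limit. A hedged sketch, with invented names
    (`split_discard`, `struct discard_seg`) and block-granularity units
    rather than the sector units the block layer actually uses:

```c
/* Split one large discard range into chunks no bigger than the
 * device's advertised limit, so the block layer never has to split
 * a huge bio itself. */
struct discard_seg { unsigned long start, len; };

static int split_discard(unsigned long start, unsigned long len,
                         unsigned long max_discard_blocks,
                         struct discard_seg *out, int out_cap)
{
    int n = 0;

    while (len && n < out_cap) {
        unsigned long chunk = len < max_discard_blocks
                                ? len : max_discard_blocks;
        out[n].start = start;
        out[n].len = chunk;
        start += chunk;
        len -= chunk;
        n++;
    }
    return n;   /* number of pre-split discard commands produced */
}
```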

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Fixes: 5b0e95398e2b ("f2fs: introduce sbi->gc_mode to determine the policy")
    Signed-off-by: Sheng Yong
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sheng Yong
     
  • generic/260 reported below error:

    [+] Default length with start set (should succeed)
    [+] Length beyond the end of fs (should succeed)
    [+] Length beyond the end of fs with start set (should succeed)
    +./tests/generic/260: line 94: [: 18446744073709551615: integer expression expected
    +./tests/generic/260: line 104: [: 18446744073709551615: integer expression expected
    Test done
    ...

    In f2fs_trim_fs(), if no blocks were discarded, we need to correct
    range->len before returning.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu