07 Sep, 2019

2 commits

  • This patch fixes skipping node page writes when checkpoint is disabled.
    In this period, we can't rely on checkpoint to flush node pages.

    Fixes: fd8c8caf7e7c ("f2fs: let checkpoint flush dnode page of regular")
    Fixes: 4354994f097d ("f2fs: checkpoint disabling")
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • As Eric reported:

    On xfstest generic/204 on f2fs, I'm getting a kernel BUG.

    allocate_segment_by_default+0x9d/0x100 [f2fs]
    f2fs_allocate_data_block+0x3c0/0x5c0 [f2fs]
    do_write_page+0x62/0x110 [f2fs]
    f2fs_do_write_node_page+0x2b/0xa0 [f2fs]
    __write_node_page+0x2ec/0x590 [f2fs]
    f2fs_sync_node_pages+0x756/0x7e0 [f2fs]
    block_operations+0x25b/0x350 [f2fs]
    f2fs_write_checkpoint+0x104/0x1150 [f2fs]
    f2fs_sync_fs+0xa2/0x120 [f2fs]
    f2fs_balance_fs_bg+0x33c/0x390 [f2fs]
    f2fs_write_node_pages+0x4c/0x1f0 [f2fs]
    do_writepages+0x1c/0x70
    __writeback_single_inode+0x45/0x320
    writeback_sb_inodes+0x273/0x5c0
    wb_writeback+0xff/0x2e0
    wb_workfn+0xa1/0x370
    process_one_work+0x138/0x350
    worker_thread+0x4d/0x3d0
    kthread+0x109/0x140

    The root cause of this issue is that, on a very small partition, e.g.
    in the generic/204 testcase of the xfstests suite, the filesystem's free
    space is 50MB, so at most we can write 12800 inline inodes with the command:
    `echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i`,
    then the filesystem will have:
    - 12800 dirty inline data pages
    - 12800 dirty inode pages
    - and 12800 dirty imeta (dirty inodes)

    When we flush node_inode's page cache, we can also flush inline
    data with each inode page. However, this exhausts the free space
    on the device, so once a checkpoint is triggered, there is no room for
    the huge number of imeta. At this point GC is useless, as there are no
    dirty segments at all.

    In order to fix this, we try to recognize inode page during
    node_inode's page flushing, and update inode page from dirty inode,
    so that later another imeta (dirty inode) flush can be avoided.
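
    The 12800 figure above follows from dividing the free space by the
    block size. A minimal sketch of that arithmetic, assuming a 4 KB block
    and one block consumed per inline inode (the helper name is
    hypothetical and is not an f2fs function):

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: how many inodes fit into
 * free_bytes if each one consumes exactly one block of block_size bytes.
 */
static uint64_t max_inline_inodes(uint64_t free_bytes, uint64_t block_size)
{
	return free_bytes / block_size;
}
```

    With 50 MB of free space and 4096-byte blocks this gives 12800, which
    matches the count of dirty inode pages listed in the report.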

    Reported-and-tested-by: Eric Biggers
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

23 Aug, 2019

1 commit

  • In mkfs, we have counted the quota file's node number in cp.valid_node_count,
    so we have to avoid a wrong subtraction of the quota node number in the
    .available_nid/.avail_node_count calculation.

    f2fs_write_check_point_pack()
    {
    ..
    set_cp(valid_node_count, 1 + c.quota_inum + c.lpf_inum);

    Fixes: 292c196a3695 ("f2fs: reserve nid resource for quota sysfile")
    Fixes: 7b63f72f73af ("f2fs: fix to do sanity check on valid node/block count")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

03 Jul, 2019

2 commits

  • f2fs has always used EFAULT as the error number to indicate that the
    filesystem is corrupted, but generic filesystems use EUCLEAN for this
    condition, so we need to change to follow the others.

    This patch adds the two new macros below to wrap the more generic error
    code macros, and spreads them through the code.

    EFSBADCRC       EBADMSG     /* Bad CRC detected */
    EFSCORRUPTED    EUCLEAN     /* Filesystem is corrupted */
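
    As a rough user-space sketch of how such wrapper macros might be used,
    assuming a Linux <errno.h> that defines EUCLEAN: the two #define lines
    follow the mapping listed above, while check_block_header() and its
    parameters are hypothetical and do not come from f2fs.

```c
#include <errno.h>

/* Wrap the generic error codes, following the mapping in the message. */
#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */

/*
 * Hypothetical validator (not an f2fs function): returns 0 on success
 * or a negative errno value, in the usual kernel style.
 */
static int check_block_header(unsigned int stored_crc,
			      unsigned int computed_crc, int layout_ok)
{
	if (stored_crc != computed_crc)
		return -EFSBADCRC;	/* checksum mismatch */
	if (!layout_ok)
		return -EFSCORRUPTED;	/* structure is inconsistent */
	return 0;
}
```

    Callers can then tell a checksum failure from structural corruption
    without either case being reported as EFAULT.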

    Reported-by: Pavel Machek
    Signed-off-by: Chao Yu
    Acked-by: Pavel Machek
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • - Add and use f2fs_ macros
    - Convert f2fs_msg to f2fs_printk
    - Remove level from f2fs_printk and embed the level in the format
    - Coalesce formats and align multi-line arguments
    - Remove unnecessary duplicate extern f2fs_msg f2fs.h

    Signed-off-by: Joe Perches
    Signed-off-by: Chao Yu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Joe Perches
     

31 May, 2019

1 commit

  • make C=2 CHECKFLAGS="-D__CHECK_ENDIAN__"

    CHECK dir.c
    dir.c:842:50: warning: cast from restricted __le32
    CHECK node.c
    node.c:2759:40: warning: restricted __le32 degrades to integer

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

14 May, 2019

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "Another round of various bug fixes came in. Damien improved SMR drive
    support a bit, and Chao replaced BUG_ON() with reporting errors to
    user since we've not hit from users but did hit from crafted images.
    We've found a disk layout bug in large_nat_bits feature which supports
    very large NAT entries enabled at mkfs. If the feature is enabled, it
    will give a notice to run fsck to correct the on-disk layout.

    Enhancements:
    - reduce memory consumption for SMR drive
    - better discard handling for multiple partitions
    - tracepoints for f2fs_file_write_iter/f2fs_filemap_fault
    - allow to change CP_CHKSUM_OFFSET
    - detect wrong layout of large_nat_bitmap feature
    - enhance checking valid data indices

    Bug fixes:
    - Multiple partition support for SMR drive
    - deadlock problem in f2fs_balance_fs_bg
    - add boundary checks to fix abnormal behaviors on fuzzed images
    - inline_xattr space calculations
    - replace f2fs_bug_on with errors

    In addition, this series contains various memory boundary check and
    sanity check of on-disk consistency"

    * tag 'f2fs-for-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
    f2fs: fix to avoid accessing xattr across the boundary
    f2fs: fix to avoid potential race on sbi->unusable_block_count access/update
    f2fs: add tracepoint for f2fs_filemap_fault()
    f2fs: introduce DATA_GENERIC_ENHANCE
    f2fs: fix to handle error in f2fs_disable_checkpoint()
    f2fs: remove redundant check in f2fs_file_write_iter()
    f2fs: fix to be aware of readonly device in write_checkpoint()
    f2fs: fix to skip recovery on readonly device
    f2fs: fix to consider multiple device for readonly check
    f2fs: relocate chksum_offset for large_nat_bitmap feature
    f2fs: allow unfixed f2fs_checkpoint.checksum_offset
    f2fs: Replace spaces with tab
    f2fs: insert space before the open parenthesis '('
    f2fs: allow address pointer number of dnode aligning to specified size
    f2fs: introduce f2fs_read_single_page() for cleanup
    f2fs: mark is_extension_exist() inline
    f2fs: fix to set FI_UPDATE_WRITE correctly
    f2fs: fix to avoid panic in f2fs_inplace_write_data()
    f2fs: fix to do sanity check on valid block count of segment
    f2fs: fix to do sanity check on valid node/block count
    ...

    Linus Torvalds
     

09 May, 2019

5 commits

  • Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) only checked
    whether @blkaddr is located in the main area.

    That check is weak: a block address within the main area's range can
    still point to an address that is not valid in the segment info table.
    We cannot detect that condition, and may suffer worse corruption as the
    system continues running.

    So this patch introduces DATA_GENERIC_ENHANCE to enhance the sanity check,
    which triggers a SIT bitmap check rather than only a range check.
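
    The difference between the range-only check and the enhanced check can
    be sketched in user-space C. Everything here is a simplified model: the
    struct, the function names and the LSB-first bit order are assumptions
    made for illustration and do not mirror the real f2fs structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a main area with a per-block validity bitmap. */
struct demo_area {
	uint32_t main_start;		/* first block address of the area */
	uint32_t main_end;		/* one past the last block address */
	const uint8_t *valid_map;	/* one bit per block, 1 = valid */
};

/* Weak check: only asks whether the address lies inside the area. */
static bool demo_in_main_area(const struct demo_area *a, uint32_t blkaddr)
{
	return blkaddr >= a->main_start && blkaddr < a->main_end;
}

/* Enhanced check: range check first, then the validity bitmap. */
static bool demo_valid_enhanced(const struct demo_area *a, uint32_t blkaddr)
{
	uint32_t off;

	if (!demo_in_main_area(a, blkaddr))
		return false;
	off = blkaddr - a->main_start;
	return (a->valid_map[off / 8] >> (off % 8)) & 1;
}
```

    An address can pass demo_in_main_area() and still fail
    demo_valid_enhanced() when its bit is clear, which is the case the
    range-only check could not detect.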

    This patch also makes the changes below:
    - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
    - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
    - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
    - spread blkaddr check in:
    * f2fs_get_node_info()
    * __read_out_blkaddrs()
    * f2fs_submit_page_read()
    * ra_data_block()
    * do_recover_data()

    This patch can fix bug reported from bugzilla below:

    https://bugzilla.kernel.org/show_bug.cgi?id=203215
    https://bugzilla.kernel.org/show_bug.cgi?id=203223
    https://bugzilla.kernel.org/show_bug.cgi?id=203231
    https://bugzilla.kernel.org/show_bug.cgi?id=203235
    https://bugzilla.kernel.org/show_bug.cgi?id=203241

    = Update by Jaegeuk Kim =

    DATA_GENERIC_ENHANCE was enhanced to validate block addresses on read/write paths.
    But xfstest/generic/446 complains about generated kernel messages saying an invalid
    bitmap was detected when reading a block. The reason is that, when we get the
    block addresses from extent_cache, there is no lock to synchronize against
    blocks being truncated in parallel.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch makes the dnode layout more scalable: it allows the number of
    address pointers in a dnode to be aligned to a specified size (currently
    one byte by default). Later the number can be aligned to the compress
    cluster size (1 << n bytes, n=[2,..)), which prevents a cluster from
    spanning two dnodes and keeps the design of the compress meta layout simple.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Jungyeon reported in bugzilla:

    https://bugzilla.kernel.org/show_bug.cgi?id=203225

    - Overview
    When mounting the attached crafted image and unmounting it, following errors are reported.
    Additionally, it hangs on sync after unmounting.

    The image is intentionally fuzzed from a normal f2fs image for testing.
    Compile options for F2FS are as follows.
    CONFIG_F2FS_FS=y
    CONFIG_F2FS_STAT_FS=y
    CONFIG_F2FS_FS_XATTR=y
    CONFIG_F2FS_FS_POSIX_ACL=y
    CONFIG_F2FS_CHECK_FS=y

    - Reproduces
    mkdir test
    mount -t f2fs tmp.img test
    touch test/t
    umount test
    sync

    - Messages
    kernel BUG at fs/f2fs/node.c:3073!
    RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
    Call Trace:
    f2fs_put_super+0xf4/0x270
    generic_shutdown_super+0x62/0x110
    kill_block_super+0x1c/0x50
    kill_f2fs_super+0xad/0xd0
    deactivate_locked_super+0x35/0x60
    cleanup_mnt+0x36/0x70
    task_work_run+0x75/0x90
    exit_to_usermode_loop+0x93/0xa0
    do_syscall_64+0xba/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300

    The NAT table is corrupted, so reserved meta/node inode ids were added
    to the free list incorrectly. During file creation, because the reserved
    id is already cached in the inode hash, the creation fails and the
    preallocated nid cannot be released later, resulting in a kernel panic.

    To fix this issue, let's do nid boundary check during free nid loading.
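
    A minimal sketch of such a boundary check, as a user-space model. It
    assumes that ids below the root inode number are reserved (for the node
    and meta inodes) and that valid nids lie below a maximum; DEMO_ROOT_INO
    and both function names are hypothetical, not the f2fs implementation.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t nid_t;

/* Assumed for illustration: ids below this value are reserved. */
#define DEMO_ROOT_INO	3

/* Return true if nid may legitimately appear on the free nid list. */
static bool demo_nid_in_range(nid_t nid, nid_t max_nid)
{
	return nid >= DEMO_ROOT_INO && nid < max_nid;
}

/* Count only those candidate nids that pass the boundary check. */
static unsigned int demo_count_free_nids(const nid_t *candidates,
					 unsigned int n, nid_t max_nid)
{
	unsigned int i, count = 0;

	for (i = 0; i < n; i++)
		if (demo_nid_in_range(candidates[i], max_nid))
			count++;
	return count;
}
```

    With this filter in place, a corrupted table entry that names a
    reserved id is skipped at load time instead of being handed out later
    during file creation.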

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Jungyeon reported in bugzilla:

    https://bugzilla.kernel.org/show_bug.cgi?id=203221

    - Overview
    When mounting the attached crafted image and running program, this error is reported.

    The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.

    - Reproduces
    cc poc_07.c
    mkdir test
    mount -t f2fs tmp.img test
    cp a.out test
    cd test
    sudo ./a.out

    - Messages
    kernel BUG at fs/f2fs/node.c:1279!
    RIP: 0010:read_node_page+0xcf/0xf0
    Call Trace:
    __get_node_page+0x6b/0x2f0
    f2fs_iget+0x8f/0xdf0
    f2fs_lookup+0x136/0x320
    __lookup_slow+0x92/0x140
    lookup_slow+0x30/0x50
    walk_component+0x1c1/0x350
    path_lookupat+0x62/0x200
    filename_lookup+0xb3/0x1a0
    do_fchmodat+0x3e/0xa0
    __x64_sys_chmod+0x12/0x20
    do_syscall_64+0x43/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    On the paths below, we have an opportunity to read ahead an inode page:
    - gc_node_segment -> f2fs_ra_node_page
    - gc_data_segment -> f2fs_ra_node_page
    - f2fs_fill_dentries -> f2fs_ra_node_page

    Unlike a synchronous read, the readahead path can set the page uptodate
    before verifying the page's checksum, so read_node_page() will trigger a
    kernel panic once it encounters an uptodate page with an incorrect checksum.

    So, considering the readahead scenario, we have to verify the checksum each
    time we load an inode page, even if it is uptodate.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Jungyeon reported in bugzilla:

    https://bugzilla.kernel.org/show_bug.cgi?id=203219

    - Overview
    When mounting the attached crafted image and running program, I got this error.
    Additionally, it hangs on sync after running the program.

    The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.

    - Reproduces
    cc poc_06.c
    mkdir test
    mount -t f2fs tmp.img test
    cp a.out test
    cd test
    sudo ./a.out
    sync

    - Messages
    kernel BUG at fs/f2fs/node.c:1183!
    RIP: 0010:f2fs_remove_inode_page+0x294/0x2d0
    Call Trace:
    f2fs_evict_inode+0x2a3/0x3a0
    evict+0xba/0x180
    __dentry_kill+0xbe/0x160
    dentry_kill+0x46/0x180
    dput+0xbb/0x100
    do_renameat2+0x3c9/0x550
    __x64_sys_rename+0x17/0x20
    do_syscall_64+0x43/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    The reason is that f2fs_remove_inode_page() triggers a kernel panic due to
    an inconsistent i_blocks value in the inode.

    To avoid the panic, let's just print a debug message and set SBI_NEED_FSCK
    to hint fsck to repair the potential image corruption later.

    Signed-off-by: Chao Yu
    [Jaegeuk Kim: fix build warning and add unlikely]
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

09 Apr, 2019

1 commit

  • In preparation to enabling -Wimplicit-fallthrough, mark switch cases
    where we are expecting to fall through.

    This patch fixes the following warnings:

    fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]

    Warning level 3 was used: -Wimplicit-fallthrough=3

    This patch is part of the ongoing efforts to enabling
    -Wimplicit-fallthrough.

    Reviewed-by: Kees Cook
    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

13 Mar, 2019

1 commit

  • As Gao Xiang reported in bugzilla:

    https://bugzilla.kernel.org/show_bug.cgi?id=202749

    f2fs may skip pageout() due to incorrect page reference count.

    The problem here is that MM defined the rule [1] very clearly that
    once page was set with PG_private flag, we should increment the
    refcount in that page, also main flows like pageout(), migrate_page()
    will assume there is one additional page reference count if
    page_has_private() returns true.

    But currently, f2fs won't add/del refcount when changing PG_private
    flag. Anyway, f2fs should follow MM's rule to make MM's related flows
    running as expected.
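
    The rule can be illustrated with a small user-space model. This is a
    deliberate simplification: the struct, the helper names and the single
    base reference are assumptions for illustration, not kernel code.

```c
#include <stdbool.h>

/* Toy page: a reference count plus a "private" flag. */
struct demo_page {
	int refcount;
	bool has_private;
};

/* Setting the private flag takes one extra reference. */
static void demo_set_private(struct demo_page *page)
{
	if (page->has_private)
		return;
	page->refcount++;
	page->has_private = true;
}

/* Clearing the private flag drops that reference again. */
static void demo_clear_private(struct demo_page *page)
{
	if (!page->has_private)
		return;
	page->has_private = false;
	page->refcount--;
}

/*
 * Models the assumption made by reclaim-style code: a page with the
 * private flag is expected to carry exactly one additional reference.
 */
static bool demo_is_freeable(const struct demo_page *page)
{
	return page->refcount == 1 + (page->has_private ? 1 : 0);
}
```

    If the flag is set without the matching increment, demo_is_freeable()
    returns false for a page that should be reclaimable, which corresponds
    to the skipped pageout() described in the report.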

    [1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com/

    Reported-by: Gao Xiang
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

16 Feb, 2019

1 commit

  • Some works after roll-forward recovery can get an error which will release
    all the data structures. Let's flush them in order to make it clean.

    One possible corruption came from:

    [ 90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
    [ 90.675349] Call trace:
    [ 90.677869] __list_del_entry_valid+0x94/0xb4
    [ 90.682351] remove_dirty_inode+0xac/0x114
    [ 90.686563] __f2fs_write_data_pages+0x6a8/0x6c8
    [ 90.691302] f2fs_write_data_pages+0x40/0x4c
    [ 90.695695] do_writepages+0x80/0xf0
    [ 90.699372] __writeback_single_inode+0xdc/0x4ac
    [ 90.704113] writeback_sb_inodes+0x280/0x440
    [ 90.708501] wb_writeback+0x1b8/0x3d0
    [ 90.712267] wb_workfn+0x1a8/0x4d4
    [ 90.715765] process_one_work+0x1c0/0x3d4
    [ 90.719883] worker_thread+0x224/0x344
    [ 90.723739] kthread+0x120/0x130
    [ 90.727055] ret_from_fork+0x10/0x18

    Reported-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

27 Dec, 2018

2 commits

  • For all ordered cases in f2fs_wait_on_page_writeback(), we need to
    check PageWriteback status, so let's clean up to relocate the check
    into f2fs_wait_on_page_writeback().

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • One report says memalloc failure during mount.

    (unwind_backtrace) from [] (show_stack+0x10/0x14)
    (show_stack) from [] (dump_stack+0x8c/0xa0)
    (dump_stack) from [] (warn_alloc+0xc4/0x160)
    (warn_alloc) from [] (__alloc_pages_nodemask+0x3f4/0x10d0)
    (__alloc_pages_nodemask) from [] (kmalloc_order_trace+0x2c/0x120)
    (kmalloc_order_trace) from [] (build_node_manager+0x35c/0x688)
    (build_node_manager) from [] (f2fs_fill_super+0xf0c/0x16cc)
    (f2fs_fill_super) from [] (mount_bdev+0x15c/0x188)
    (mount_bdev) from [] (f2fs_mount+0x18/0x20)
    (f2fs_mount) from [] (mount_fs+0x158/0x19c)
    (mount_fs) from [] (vfs_kern_mount+0x78/0x134)
    (vfs_kern_mount) from [] (do_mount+0x474/0xca4)
    (do_mount) from [] (SyS_mount+0x94/0xbc)
    (SyS_mount) from [] (ret_fast_syscall+0x0/0x48)

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

14 Dec, 2018

1 commit

  • This patch reorders flow from

    - update page
    - set_page_dirty
    - wait_on_page_writeback

    to

    - wait_on_page_writeback
    - update page
    - set_page_dirty

    The reason is:
    - set_page_dirty will increase the reference count of the dirty page; the
      reference should be cleared before wait_on_page_writeback to keep it
      consistent.
    - some devices need stable pages during page writeback, so we
      should not change the page's data.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

27 Nov, 2018

2 commits

  • The function truncate_node frees the page with f2fs_put_page. However,
    the page index is read after that. So, the patch reads the index before
    freeing the page.
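
    The pattern of the fix can be sketched in user-space C: copy the field
    you still need into a local variable before releasing the object. The
    struct and both functions are hypothetical stand-ins, with free()
    playing the role of f2fs_put_page().

```c
#include <stdlib.h>

/* Toy page carrying only the field of interest. */
struct demo_page {
	unsigned long index;
};

/* Hypothetical stand-in for the release call: frees the page. */
static void demo_put_page(struct demo_page *page)
{
	free(page);
}

/* Releases the page and returns the index it had. */
static unsigned long demo_truncate(struct demo_page *page)
{
	unsigned long index = page->index;	/* read while still valid */

	demo_put_page(page);
	return index;				/* uses the saved copy only */
}
```

    Reading page->index after demo_put_page() would be a use-after-free;
    saving it first avoids touching the freed memory.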

    Fixes: bf39c00a9a7f ("f2fs: drop obsolete node page when it is truncated")
    Cc:
    Signed-off-by: Pan Bian
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Pan Bian
     
  • In F2FS_HAS_FEATURE(), we use F2FS_SB(sb) to get the sbi pointer to
    access the .raw_super field. To avoid the unneeded pointer conversion,
    this patch changes F2FS_HAS_FEATURE() to accept the sbi parameter directly.

    Just do cleanup, no logic change.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

29 Oct, 2018

1 commit

  • Pull XArray conversion from Matthew Wilcox:
    "The XArray provides an improved interface to the radix tree data
    structure, providing locking as part of the API, specifying GFP flags
    at allocation time, eliminating preloading, less re-walking the tree,
    more efficient iterations and not exposing RCU-protected pointers to
    its users.

    This patch set

    1. Introduces the XArray implementation

    2. Converts the pagecache to use it

    3. Converts memremap to use it

    The page cache is the most complex and important user of the radix
    tree, so converting it was most important. Converting the memremap
    code removes the only other user of the multiorder code, which allows
    us to remove the radix tree code that supported it.

    I have 40+ followup patches to convert many other users of the radix
    tree over to the XArray, but I'd like to get this part in first. The
    other conversions haven't been in linux-next and aren't suitable for
    applying yet, but you can see them in the xarray-conv branch if you're
    interested"

    * 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits)
    radix tree: Remove multiorder support
    radix tree test: Convert multiorder tests to XArray
    radix tree tests: Convert item_delete_rcu to XArray
    radix tree tests: Convert item_kill_tree to XArray
    radix tree tests: Move item_insert_order
    radix tree test suite: Remove multiorder benchmarking
    radix tree test suite: Remove __item_insert
    memremap: Convert to XArray
    xarray: Add range store functionality
    xarray: Move multiorder_check to in-kernel tests
    xarray: Move multiorder_shrink to kernel tests
    xarray: Move multiorder account test in-kernel
    radix tree test suite: Convert iteration test to XArray
    radix tree test suite: Convert tag_tagged_items to XArray
    radix tree: Remove radix_tree_clear_tags
    radix tree: Remove radix_tree_maybe_preload_order
    radix tree: Remove split/join code
    radix tree: Remove radix_tree_update_node_t
    page cache: Finish XArray conversion
    dax: Convert page fault handlers to XArray
    ...

    Linus Torvalds
     

21 Oct, 2018

1 commit


17 Oct, 2018

3 commits

  • Commit 7735730d39d7 ("f2fs: fix to propagate error from __get_meta_page()")
    added disable_nat_bits() in the error path of __get_nat_bitmaps(), but it is
    unneeded: because the mount will fail, we won't have a chance to change nid
    usage status w/o nat full/empty bitmaps.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Testcase to reproduce this bug:
    1. mkfs.f2fs /dev/sdd
    2. mount -t f2fs /dev/sdd /mnt/f2fs
    3. touch /mnt/f2fs/file
    4. sync
    5. chattr +A /mnt/f2fs/file
    6. xfs_io -f /mnt/f2fs/file -c "fsync"
    7. godown /mnt/f2fs
    8. umount /mnt/f2fs
    9. mount -t f2fs /dev/sdd /mnt/f2fs
    10. chattr -A /mnt/f2fs/file
    11. xfs_io -f /mnt/f2fs/file -c "fsync"
    12. umount /mnt/f2fs
    13. mount -t f2fs /dev/sdd /mnt/f2fs
    14. lsattr /mnt/f2fs/file

    -----------------N- /mnt/f2fs/file

    But actually, the correct result we expect is:

    -------A---------N- /mnt/f2fs/file

    The reason is that in step 9) we failed to recover the cold bit flag in the
    inode block, so later, in fsync, we skip writing the inode block due to the
    condition check below, resulting in data loss in another SPOR.

    f2fs_fsync_node_pages()
        if (!IS_DNODE(page) || !is_cold_node(page))
            continue;

    Note that, I guess some non-dir inodes have already lost the cold bit
    during POR, so in order to re-enable recovery for those inodes, let's
    try to recover the cold bit in f2fs_iget() to save more fsynced data.

    Fixes: c56675750d7c ("f2fs: remove unneeded set_cold_node()")
    Cc: 4.17+
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • When migrating encrypted blocks from the background GC thread, we only add
    them to f2fs's inner bio cache but forget to submit the cached bio. This
    may cause a potential deadlock when we are waiting for page writeback;
    fix it.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

01 Oct, 2018

1 commit

  • There is one case where we can leave a bio in f2fs, resulting in a hung
    page writeback waiter.

    Thread A                                Thread B
    - f2fs_write_cache_pages
     - f2fs_submit_page_write
       page #0 cached in bio #0 of cold log
     - f2fs_submit_page_write
       page #1 cached in bio #1 of warm log
                                            - f2fs_write_cache_pages
                                             - f2fs_submit_page_write
                                               bio is full, submit bio #1 contain page #1
     - f2fs_submit_merged_write_cond(, page #1)
       fail to submit bio #0 due to page #1 is not in any cached bios.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

29 Sep, 2018

2 commits


27 Sep, 2018

1 commit

  • Testcase to reproduce this bug:
    1. mkfs.f2fs -O extra_attr -O inode_crtime /dev/sdd
    2. mount -t f2fs /dev/sdd /mnt/f2fs
    3. touch /mnt/f2fs/file
    4. xfs_io -f /mnt/f2fs/file -c "fsync"
    5. godown /mnt/f2fs
    6. umount /mnt/f2fs
    7. mount -t f2fs /dev/sdd /mnt/f2fs
    8. xfs_io -f /mnt/f2fs/file -c "statx -r"

    stat.btime.tv_sec = 0
    stat.btime.tv_nsec = 0

    This patch fixes to recover inode creation time fields during
    mount.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

21 Sep, 2018

1 commit


13 Sep, 2018

1 commit


15 Aug, 2018

1 commit

  • When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an
    unused label:

    fs/f2fs/segment.c: In function '__submit_discard_cmd':
    fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label]

    This could be fixed by adding another #ifdef around it, but the more
    reliable way of doing this seems to be to remove the other #ifdefs
    where that is easily possible.

    By defining time_to_inject() as a trivial stub, most of the checks for
    CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer
    formatting of the code.
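
    The stub approach can be sketched as follows. DEMO_FAULT_INJECTION
    stands in for CONFIG_F2FS_FAULT_INJECTION, and the function names and
    the countdown logic are hypothetical; only the idea of a trivial stub
    replacing scattered #ifdefs comes from the description above.

```c
#include <errno.h>
#include <stdbool.h>

#ifdef DEMO_FAULT_INJECTION
/* Injection enabled: report a fault once the countdown reaches zero. */
static unsigned int demo_inject_countdown = 1;

static bool demo_time_to_inject(void)
{
	if (demo_inject_countdown == 0)
		return false;
	return --demo_inject_countdown == 0;
}
#else
/* Injection disabled: a trivial stub, so callers need no #ifdef. */
static inline bool demo_time_to_inject(void)
{
	return false;
}
#endif

/* The caller is written once, without preprocessor conditionals. */
static int demo_submit(void)
{
	if (demo_time_to_inject())
		return -EIO;	/* simulated failure */
	return 0;
}
```

    Because the stub always returns false, the compiler can drop the
    failure branch when injection is disabled, and no label or variable is
    left unused.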

    Signed-off-by: Arnd Bergmann
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Arnd Bergmann
     

14 Aug, 2018

1 commit


11 Aug, 2018

2 commits

  • The f2fs recovery flow relies on the dnode block linked list: recovery of
    an fsynced file depends on the persistence of the previous dnodes in the
    list. So during fsync() we should wait until all regular inodes' dnodes
    are written back before issuing a flush.

    This way, we avoid the dnode block list being broken by out-of-order
    IO submission caused by the IO scheduler or driver.

    Sheng Yong helps to do the test with this patch:

    Target:/data (f2fs, -)
    64MB / 32768KB / 4KB / 8

    1 / PERSIST / Index

    Base:
         SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)  Update(TPS)  Delete(TPS)
    1    867.82        204.15        41440.03      41370.54      680.8        1025.94      1031.08
    2    871.87        205.87        41370.3       40275.2       791.14       1065.84      1101.7
    3    866.52        205.69        41795.67      40596.16      694.69       1037.16      1031.48
    Avg  868.7366667   205.2366667   41535.33333   40747.3       722.21       1042.98      1054.753333

    After:
         SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)  Update(TPS)  Delete(TPS)
    1    798.81        202.5         41143         40613.87      602.71       838.08       913.83
    2    805.79        206.47        40297.2       41291.46      604.44       840.75       924.27
    3    814.83        206.17        41209.57      40453.62      602.85       834.66       927.91
    Avg  806.4766667   205.0466667   40883.25667   40786.31667   603.3333333  837.83       922.0033333

    Patched/Original:
         0.928332713   0.999074239   0.984300676   1.000957528   0.835398753  0.803303994  0.874141189

    It looks like atomic writes will suffer a performance regression.

    I suspect the culprit is that we force a wait for all dnodes to reach the
    storage cache before we issue PREFLUSH+FUA.

    BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
    cause the problem: we will lose data of last transaction after SPO, even if
    atomic write return no error:

    - atomic_open();
    - write() P1, P2, P3;
    - atomic_commit();
    - writeback data: P1, P2, P3;
    - writeback node: N1, N2, N3;
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • There is a subtle race condition to invoke f2fs_bug_on() in shutdown tests. I've
    confirmed that the last checkpoint is preserved in consistent state, so it'd be
    fine to just return error at this moment.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

02 Aug, 2018

4 commits

  • The fsyncer waits for writeback of all dnode pages of regular inodes before
    flushing; if async dnode pages are blocked by the IO scheduler, fsync's
    performance may decrease.

    In this patch, we let f2fs_balance_fs_bg() trigger a checkpoint to flush
    these dnode pages of regular inodes, so async IO of dnode pages can be
    eliminated and the fsyncer only needs to wait for sync IO.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Just cleanup, no logic change.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • If caller of __get_meta_page() can handle error, let's propagate error
    from __get_meta_page().

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch adds sanity checks on the fields below:
    - cp_pack_total_block_count
    - blkaddr of data/node
    - extent info

    - Overview
    BUG() in verify_block_addr() when writing to a corrupted f2fs image

    - Reproduce (4.18 upstream kernel)

    - POC (poc.c)

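    #define _GNU_SOURCE /* for asprintf() */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Overwrite <mpoint>/foo/bar/baz with zeroes and sync it to disk. */
    static void activity(char *mpoint) {

      char *foo_bar_baz;
      int err;

      static int buf[8192];
      memset(buf, 0, sizeof(buf));

      err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);
      if (err < 0)
        return;

      int fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
      if (fd >= 0) {
        write(fd, (char *)buf, sizeof(buf));
        fdatasync(fd);
        close(fd);
      }
      free(foo_bar_baz);
    }

    int main(int argc, char *argv[]) {
      if (argc < 2)
        return 1; /* expects the mount point as the first argument */
      activity(argv[1]);
      return 0;
    }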

    - Kernel message
    [ 689.349473] F2FS-fs (loop0): Mounted with checkpoint version = 3
    [ 699.728662] WARNING: CPU: 0 PID: 1309 at fs/f2fs/segment.c:2860 f2fs_inplace_write_data+0x232/0x240
    [ 699.728670] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
    [ 699.729056] CPU: 0 PID: 1309 Comm: a.out Not tainted 4.18.0-rc1+ #4
    [ 699.729064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    [ 699.729074] RIP: 0010:f2fs_inplace_write_data+0x232/0x240
    [ 699.729076] Code: ff e9 cf fe ff ff 49 8d 7d 10 e8 39 45 ad ff 4d 8b 7d 10 be 04 00 00 00 49 8d 7f 48 e8 07 49 ad ff 45 8b 7f 48 e9 fb fe ff ff 0b f0 41 80 4d 48 04 e9 65 fe ff ff 90 66 66 66 66 90 55 48 8d
    [ 699.729130] RSP: 0018:ffff8801f43af568 EFLAGS: 00010202
    [ 699.729139] RAX: 000000000000003f RBX: ffff8801f43af7b8 RCX: ffffffffb88c9113
    [ 699.729142] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8802024e5540
    [ 699.729144] RBP: ffff8801f43af590 R08: 0000000000000009 R09: ffffffffffffffe8
    [ 699.729147] R10: 0000000000000001 R11: ffffed0039b0596a R12: ffff8802024e5540
    [ 699.729149] R13: ffff8801f0335500 R14: ffff8801e3e7a700 R15: ffff8801e1ee4450
    [ 699.729154] FS: 00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
    [ 699.729156] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 699.729159] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
    [ 699.729171] Call Trace:
    [ 699.729192] f2fs_do_write_data_page+0x2e2/0xe00
    [ 699.729203] ? f2fs_should_update_outplace+0xd0/0xd0
    [ 699.729238] ? memcg_drain_all_list_lrus+0x280/0x280
    [ 699.729269] ? __radix_tree_replace+0xa3/0x120
    [ 699.729276] __write_data_page+0x5c7/0xe30
    [ 699.729291] ? kasan_check_read+0x11/0x20
    [ 699.729310] ? page_mapped+0x8a/0x110
    [ 699.729321] ? page_mkclean+0xe9/0x160
    [ 699.729327] ? f2fs_do_write_data_page+0xe00/0xe00
    [ 699.729331] ? invalid_page_referenced_vma+0x130/0x130
    [ 699.729345] ? clear_page_dirty_for_io+0x332/0x450
    [ 699.729351] f2fs_write_cache_pages+0x4ca/0x860
    [ 699.729358] ? __write_data_page+0xe30/0xe30
    [ 699.729374] ? percpu_counter_add_batch+0x22/0xa0
    [ 699.729380] ? kasan_check_write+0x14/0x20
    [ 699.729391] ? _raw_spin_lock+0x17/0x40
    [ 699.729403] ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
    [ 699.729413] ? iov_iter_advance+0x113/0x640
    [ 699.729418] ? f2fs_write_end+0x133/0x2e0
    [ 699.729423] ? balance_dirty_pages_ratelimited+0x239/0x640
    [ 699.729428] f2fs_write_data_pages+0x329/0x520
    [ 699.729433] ? generic_perform_write+0x250/0x320
    [ 699.729438] ? f2fs_write_cache_pages+0x860/0x860
    [ 699.729454] ? current_time+0x110/0x110
    [ 699.729459] ? f2fs_preallocate_blocks+0x1ef/0x370
    [ 699.729464] do_writepages+0x37/0xb0
    [ 699.729468] ? f2fs_write_cache_pages+0x860/0x860
    [ 699.729472] ? do_writepages+0x37/0xb0
    [ 699.729478] __filemap_fdatawrite_range+0x19a/0x1f0
    [ 699.729483] ? delete_from_page_cache_batch+0x4e0/0x4e0
    [ 699.729496] ? __vfs_write+0x2b2/0x410
    [ 699.729501] file_write_and_wait_range+0x66/0xb0
    [ 699.729506] f2fs_do_sync_file+0x1f9/0xd90
    [ 699.729511] ? truncate_partial_data_page+0x290/0x290
    [ 699.729521] ? __sb_end_write+0x30/0x50
    [ 699.729526] ? vfs_write+0x20f/0x260
    [ 699.729530] f2fs_sync_file+0x9a/0xb0
    [ 699.729534] ? f2fs_do_sync_file+0xd90/0xd90
    [ 699.729548] vfs_fsync_range+0x68/0x100
    [ 699.729554] ? __fget_light+0xc9/0xe0
    [ 699.729558] do_fsync+0x3d/0x70
    [ 699.729562] __x64_sys_fdatasync+0x24/0x30
    [ 699.729585] do_syscall_64+0x78/0x170
    [ 699.729595] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 699.729613] RIP: 0033:0x7f9bf930d800
    [ 699.729615] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
    [ 699.729668] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
    [ 699.729673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
    [ 699.729675] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
    [ 699.729678] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
    [ 699.729680] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
    [ 699.729683] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
    [ 699.729687] ---[ end trace 4ce02f25ff7d3df5 ]---
    [ 699.729782] ------------[ cut here ]------------
    [ 699.729785] kernel BUG at fs/f2fs/segment.h:654!
    [ 699.731055] invalid opcode: 0000 [#1] SMP KASAN PTI
    [ 699.732104] CPU: 0 PID: 1309 Comm: a.out Tainted: G W 4.18.0-rc1+ #4
    [ 699.733684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    [ 699.735611] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
    [ 699.736649] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
    [ 699.740524] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
    [ 699.741573] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
    [ 699.743006] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
    [ 699.744426] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
    [ 699.745833] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
    [ 699.747256] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
    [ 699.748683] FS: 00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
    [ 699.750293] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 699.751462] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
    [ 699.752874] Call Trace:
    [ 699.753386] ? f2fs_inplace_write_data+0x93/0x240
    [ 699.754341] f2fs_inplace_write_data+0xd2/0x240
    [ 699.755271] f2fs_do_write_data_page+0x2e2/0xe00
    [ 699.756214] ? f2fs_should_update_outplace+0xd0/0xd0
    [ 699.757215] ? memcg_drain_all_list_lrus+0x280/0x280
    [ 699.758209] ? __radix_tree_replace+0xa3/0x120
    [ 699.759164] __write_data_page+0x5c7/0xe30
    [ 699.760002] ? kasan_check_read+0x11/0x20
    [ 699.760823] ? page_mapped+0x8a/0x110
    [ 699.761573] ? page_mkclean+0xe9/0x160
    [ 699.762345] ? f2fs_do_write_data_page+0xe00/0xe00
    [ 699.763332] ? invalid_page_referenced_vma+0x130/0x130
    [ 699.764374] ? clear_page_dirty_for_io+0x332/0x450
    [ 699.765347] f2fs_write_cache_pages+0x4ca/0x860
    [ 699.766276] ? __write_data_page+0xe30/0xe30
    [ 699.767161] ? percpu_counter_add_batch+0x22/0xa0
    [ 699.768112] ? kasan_check_write+0x14/0x20
    [ 699.768951] ? _raw_spin_lock+0x17/0x40
    [ 699.769739] ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
    [ 699.770885] ? iov_iter_advance+0x113/0x640
    [ 699.771743] ? f2fs_write_end+0x133/0x2e0
    [ 699.772569] ? balance_dirty_pages_ratelimited+0x239/0x640
    [ 699.773680] f2fs_write_data_pages+0x329/0x520
    [ 699.774603] ? generic_perform_write+0x250/0x320
    [ 699.775544] ? f2fs_write_cache_pages+0x860/0x860
    [ 699.776510] ? current_time+0x110/0x110
    [ 699.777299] ? f2fs_preallocate_blocks+0x1ef/0x370
    [ 699.778279] do_writepages+0x37/0xb0
    [ 699.779026] ? f2fs_write_cache_pages+0x860/0x860
    [ 699.779978] ? do_writepages+0x37/0xb0
    [ 699.780755] __filemap_fdatawrite_range+0x19a/0x1f0
    [ 699.781746] ? delete_from_page_cache_batch+0x4e0/0x4e0
    [ 699.782820] ? __vfs_write+0x2b2/0x410
    [ 699.783597] file_write_and_wait_range+0x66/0xb0
    [ 699.784540] f2fs_do_sync_file+0x1f9/0xd90
    [ 699.785381] ? truncate_partial_data_page+0x290/0x290
    [ 699.786415] ? __sb_end_write+0x30/0x50
    [ 699.787204] ? vfs_write+0x20f/0x260
    [ 699.787941] f2fs_sync_file+0x9a/0xb0
    [ 699.788694] ? f2fs_do_sync_file+0xd90/0xd90
    [ 699.789572] vfs_fsync_range+0x68/0x100
    [ 699.790360] ? __fget_light+0xc9/0xe0
    [ 699.791128] do_fsync+0x3d/0x70
    [ 699.791779] __x64_sys_fdatasync+0x24/0x30
    [ 699.792614] do_syscall_64+0x78/0x170
    [ 699.793371] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 699.794406] RIP: 0033:0x7f9bf930d800
    [ 699.795134] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
    [ 699.798960] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
    [ 699.800483] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
    [ 699.801923] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
    [ 699.803373] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
    [ 699.804798] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
    [ 699.806233] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
    [ 699.807667] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
    [ 699.817079] ---[ end trace 4ce02f25ff7d3df6 ]---
    [ 699.818068] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
    [ 699.819114] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
    [ 699.822919] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
    [ 699.823977] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
    [ 699.825436] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
    [ 699.826881] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
    [ 699.828292] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
    [ 699.829750] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
    [ 699.831192] FS: 00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
    [ 699.832793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 699.833981] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
    [ 699.835556] ==================================================================
    [ 699.837029] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x38c/0x3e0
    [ 699.838462] Read of size 8 at addr ffff8801f43af970 by task a.out/1309

    [ 699.840086] CPU: 0 PID: 1309 Comm: a.out Tainted: G D W 4.18.0-rc1+ #4
    [ 699.841603] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    [ 699.843475] Call Trace:
    [ 699.843982] dump_stack+0x7b/0xb5
    [ 699.844661] print_address_description+0x70/0x290
    [ 699.845607] kasan_report+0x291/0x390
    [ 699.846351] ? update_stack_state+0x38c/0x3e0
    [ 699.853831] __asan_load8+0x54/0x90
    [ 699.854569] update_stack_state+0x38c/0x3e0
    [ 699.855428] ? __read_once_size_nocheck.constprop.7+0x20/0x20
    [ 699.856601] ? __save_stack_trace+0x5e/0x100
    [ 699.857476] unwind_next_frame.part.5+0x18e/0x490
    [ 699.858448] ? unwind_dump+0x290/0x290
    [ 699.859217] ? clear_page_dirty_for_io+0x332/0x450
    [ 699.860185] __unwind_start+0x106/0x190
    [ 699.860974] __save_stack_trace+0x5e/0x100
    [ 699.861808] ? __save_stack_trace+0x5e/0x100
    [ 699.862691] ? unlink_anon_vmas+0xba/0x2c0
    [ 699.863525] save_stack_trace+0x1f/0x30
    [ 699.864312] save_stack+0x46/0xd0
    [ 699.864993] ? __alloc_pages_slowpath+0x1420/0x1420
    [ 699.865990] ? flush_tlb_mm_range+0x15e/0x220
    [ 699.866889] ? kasan_check_write+0x14/0x20
    [ 699.867724] ? __dec_node_state+0x92/0xb0
    [ 699.868543] ? lock_page_memcg+0x85/0xf0
    [ 699.869350] ? unlock_page_memcg+0x16/0x80
    [ 699.870185] ? page_remove_rmap+0x198/0x520
    [ 699.871048] ? mark_page_accessed+0x133/0x200
    [ 699.871930] ? _cond_resched+0x1a/0x50
    [ 699.872700] ? unmap_page_range+0xcd4/0xe50
    [ 699.873551] ? rb_next+0x58/0x80
    [ 699.874217] ? rb_next+0x58/0x80
    [ 699.874895] __kasan_slab_free+0x13c/0x1a0
    [ 699.875734] ? unlink_anon_vmas+0xba/0x2c0
    [ 699.876563] kasan_slab_free+0xe/0x10
    [ 699.877315] kmem_cache_free+0x89/0x1e0
    [ 699.878095] unlink_anon_vmas+0xba/0x2c0
    [ 699.878913] free_pgtables+0x101/0x1b0
    [ 699.879677] exit_mmap+0x146/0x2a0
    [ 699.880378] ? __ia32_sys_munmap+0x50/0x50
    [ 699.881214] ? kasan_check_read+0x11/0x20
    [ 699.882052] ? mm_update_next_owner+0x322/0x380
    [ 699.882985] mmput+0x8b/0x1d0
    [ 699.883602] do_exit+0x43a/0x1390
    [ 699.884288] ? mm_update_next_owner+0x380/0x380
    [ 699.885212] ? f2fs_sync_file+0x9a/0xb0
    [ 699.885995] ? f2fs_do_sync_file+0xd90/0xd90
    [ 699.886877] ? vfs_fsync_range+0x68/0x100
    [ 699.887694] ? __fget_light+0xc9/0xe0
    [ 699.888442] ? do_fsync+0x3d/0x70
    [ 699.889118] ? __x64_sys_fdatasync+0x24/0x30
    [ 699.889996] rewind_stack_do_exit+0x17/0x20
    [ 699.890860] RIP: 0033:0x7f9bf930d800
    [ 699.891585] Code: Bad RIP value.
    [ 699.892268] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
    [ 699.893781] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
    [ 699.895220] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
    [ 699.896643] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
    [ 699.898069] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
    [ 699.899505] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000

    [ 699.901241] The buggy address belongs to the page:
    [ 699.902215] page:ffffea0007d0ebc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
    [ 699.903811] flags: 0x2ffff0000000000()
    [ 699.904585] raw: 02ffff0000000000 0000000000000000 ffffffff07d00101 0000000000000000
    [ 699.906125] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000
    [ 699.907673] page dumped because: kasan: bad access detected

    [ 699.909108] Memory state around the buggy address:
    [ 699.910077] ffff8801f43af800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00
    [ 699.911528] ffff8801f43af880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [ 699.912953] >ffff8801f43af900: 00 00 00 00 00 00 00 00 f1 01 f4 f4 f4 f2 f2 f2
    [ 699.914392] ^
    [ 699.915758] ffff8801f43af980: f2 00 f4 f4 00 00 00 00 f2 00 00 00 00 00 00 00
    [ 699.917193] ffff8801f43afa00: 00 00 00 00 00 00 00 00 00 f3 f3 f3 00 00 00 00
    [ 699.918634] ==================================================================

    - Location
    https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.h#L644

    Reported-by: Wen Xu
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

29 Jul, 2018

1 commit

  • In a synchronous scenario such as checkpoint(), we flush dirty node
    pages to the device synchronously. Writing back a node page can
    easily fail because trylock_page() fails, especially under intensive
    lock contention, and this can cause long checkpoint() latency. So
    let's use lock_page() in the synchronous scenario to avoid this
    issue.

    Signed-off-by: Yunlei He
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
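
The difference between the trylock and lock paths described in the 29 Jul, 2018 commit above can be illustrated with a small userspace analogy. This is only a sketch under stated assumptions, not f2fs code: `struct demo_page`, `demo_trylock()`, `demo_lock()`, `demo_unlock()`, `demo_page_init()` and `demo_flush()` are hypothetical names, the page lock is modeled with a C11 `atomic_flag`, and the blocking path spins where the kernel's lock_page() would sleep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a node page: a lock bit plus a dirty flag. */
struct demo_page {
    atomic_flag locked;
    bool dirty;
};

/* Put a page into a known state: unlocked, with the given dirty flag. */
static inline void demo_page_init(struct demo_page *p, bool dirty)
{
    atomic_flag_clear(&p->locked);
    p->dirty = dirty;
}

/* Non-blocking attempt: returns true only if the lock was free. */
static inline bool demo_trylock(struct demo_page *p)
{
    return !atomic_flag_test_and_set_explicit(&p->locked,
                                              memory_order_acquire);
}

/* Blocking acquire. Assumption: this sketch busy-waits instead of sleeping. */
static inline void demo_lock(struct demo_page *p)
{
    while (!demo_trylock(p))
        ; /* spin until the current holder releases the lock */
}

static inline void demo_unlock(struct demo_page *p)
{
    atomic_flag_clear_explicit(&p->locked, memory_order_release);
}

/*
 * "Flush" every dirty page and return how many dirty pages were skipped
 * because their lock was busy.
 *   sync == false: use trylock and skip contended pages.
 *   sync == true:  wait for each lock, so no dirty page is skipped.
 */
size_t demo_flush(struct demo_page *pages, size_t n, bool sync)
{
    size_t skipped = 0;

    for (size_t i = 0; i < n; i++) {
        struct demo_page *p = &pages[i];

        if (!p->dirty)
            continue;

        if (sync) {
            demo_lock(p);
        } else if (!demo_trylock(p)) {
            skipped++;      /* page stays dirty and needs another pass */
            continue;
        }

        p->dirty = false;   /* stands in for the actual writeback */
        demo_unlock(p);
    }
    return skipped;
}
```

With one page locked by another party, a non-synchronous pass leaves that page dirty and reports it as skipped, so a caller that needs every page written has to loop again. A synchronous pass waits for the lock and finishes in one traversal, which is the behaviour the commit asks for in checkpoint().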