05 Mar, 2020

1 commit

  • commit 3e5e479a39ce9ed60cd63f7565cc1d9da77c2a4e upstream.

    As Youling reported in mailing list:

    https://www.linuxquestions.org/questions/linux-newbie-8/the-file-system-f2fs-is-broken-4175666043/

    https://www.linux.org/threads/the-file-system-f2fs-is-broken.26490/

    There is a test case that can corrupt an f2fs image:
    - dd if=/dev/zero of=/swapfile bs=1M count=4096
    - chmod 600 /swapfile
    - mkswap /swapfile
    - swapon --discard /swapfile

    The root cause is that f2fs_swap_activate() intends to return zero to
    setup_swap_extents() to enable SWP_FS mode (the swap file goes through the
    fs); in this flow, setup_swap_extents() sets up swap extents with a wrong
    block address range, resulting in discard_swap() erasing an incorrect
    address range.

    Because f2fs_swap_activate() has pinned the swapfile, its data block
    addresses will not change, so it's safe to let swap handle IO through the
    raw device. We can therefore get rid of SWP_FS mode and initialize the swap
    extents inside f2fs_swap_activate(); this way, the later discard_swap() can
    trim the right address range.
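    A minimal sketch of that pattern with made-up helpers (example_map_contiguous()
    stands in for the filesystem's block-mapping logic; add_swap_extent() and the
    ->swap_activate() signature are the real swap-layer interfaces): registering
    the extents itself and returning a positive count keeps setup_swap_extents()
    from falling back to SWP_FS, so swap IO and discard hit the raw device blocks.

    #include <linux/fs.h>
    #include <linux/mm.h>
    #include <linux/swap.h>

    /* Hypothetical helper: map the contiguous file range starting at page
     * offset 'pgoff' to on-disk block 'pblock', returning its length in
     * pages via 'len'. */
    static int example_map_contiguous(struct inode *inode, pgoff_t pgoff,
                                      sector_t *pblock, unsigned long *len);

    static int example_swap_activate(struct swap_info_struct *sis,
                                     struct file *file, sector_t *span)
    {
        struct inode *inode = file_inode(file);
        pgoff_t nr_pages = i_size_read(inode) >> PAGE_SHIFT;
        pgoff_t pgoff = 0;
        int nr_extents = 0;

        while (pgoff < nr_pages) {
            sector_t pblock;
            unsigned long len;
            int ret;

            ret = example_map_contiguous(inode, pgoff, &pblock, &len);
            if (ret)
                return ret;

            /* Register the extent directly with the swap layer. */
            ret = add_swap_extent(sis, pgoff, len, pblock);
            if (ret < 0)
                return ret;
            nr_extents += ret;
            pgoff += len;
        }

        *span = nr_pages;
        return nr_extents;      /* > 0: extents are set up, no SWP_FS fallback */
    }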

    Fixes: 4969c06a0d83 ("f2fs: support swap file w/ DIO")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
     

24 Feb, 2020

5 commits

  • [ Upstream commit fe396ad8e7526f059f7b8c7290d33a1b84adacab ]

    If kobject_init_and_add() fails, the caller needs to invoke kobject_put()
    to release the kobject explicitly.
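    A minimal sketch of the pattern this fix enforces (illustrative only, not the
    f2fs change itself; struct example_obj and example_ktype are made-up): once
    kobject_init_and_add() has run, the kobject holds a live reference, so the
    error path must drop it with kobject_put() rather than freeing the object
    directly.

    #include <linux/kobject.h>
    #include <linux/slab.h>

    struct example_obj {
        struct kobject kobj;
        /* ... */
    };

    static void example_release(struct kobject *kobj)
    {
        kfree(container_of(kobj, struct example_obj, kobj));
    }

    static struct kobj_type example_ktype = {
        .release = example_release,
    };

    static int example_register(struct example_obj *obj, struct kobject *parent)
    {
        int err;

        err = kobject_init_and_add(&obj->kobj, &example_ktype, parent, "example");
        if (err) {
            /* The refcount is live even on failure: release it explicitly. */
            kobject_put(&obj->kobj);
            return err;
        }
        return 0;
    }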

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Chao Yu
     
  • [ Upstream commit 820d366736c949ffe698d3b3fe1266a91da1766d ]

    Detected kmemleak.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Jaegeuk Kim
     
  • [ Upstream commit 5b1dbb082f196278f82b6a15a13848efacb9ff11 ]

    This patch moves setting I_LINKABLE early in rename2(whiteout) to avoid the
    below warning.

    [ 3189.163385] WARNING: CPU: 3 PID: 59523 at fs/inode.c:358 inc_nlink+0x32/0x40
    [ 3189.246979] Call Trace:
    [ 3189.248707] f2fs_init_inode_metadata+0x2d6/0x440 [f2fs]
    [ 3189.251399] f2fs_add_inline_entry+0x162/0x8c0 [f2fs]
    [ 3189.254010] f2fs_add_dentry+0x69/0xe0 [f2fs]
    [ 3189.256353] f2fs_do_add_link+0xc5/0x100 [f2fs]
    [ 3189.258774] f2fs_rename2+0xabf/0x1010 [f2fs]
    [ 3189.261079] vfs_rename+0x3f8/0xaa0
    [ 3189.263056] ? tomoyo_path_rename+0x44/0x60
    [ 3189.265283] ? do_renameat2+0x49b/0x550
    [ 3189.267324] do_renameat2+0x49b/0x550
    [ 3189.269316] __x64_sys_renameat2+0x20/0x30
    [ 3189.271441] do_syscall_64+0x5a/0x230
    [ 3189.273410] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 3189.275848] RIP: 0033:0x7f270b4d9a49

    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Jaegeuk Kim
     
  • [ Upstream commit bdf03299248916640a835a05d32841bb3d31912d ]

    Otherwise, we can hit deadlock by waiting for the locked page in
    move_data_block in GC.

    Thread A                              Thread B
    - do_page_mkwrite
     - f2fs_vm_page_mkwrite
      - lock_page
                                          - f2fs_balance_fs
                                           - mutex_lock(gc_mutex)
                                            - f2fs_gc
                                             - do_garbage_collect
                                              - ra_data_block
                                               - grab_cache_page
       - f2fs_balance_fs
        - mutex_lock(gc_mutex)
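    A sketch of the reordering that avoids the deadlock above (illustrative, not
    the actual f2fs diff; example_balance_fs() is a made-up stand-in for the
    balance/GC entry point): enter the gc_mutex path before taking the page
    lock, so GC never waits on a page this thread holds.

    #include <linux/fs.h>
    #include <linux/mm.h>
    #include <linux/pagemap.h>

    static void example_balance_fs(struct super_block *sb);    /* hypothetical */

    static vm_fault_t example_page_mkwrite(struct vm_fault *vmf)
    {
        struct page *page = vmf->page;
        struct inode *inode = file_inode(vmf->vma->vm_file);

        /* May block on gc_mutex; must not hold the page lock yet. */
        example_balance_fs(inode->i_sb);

        lock_page(page);
        /* ... verify the mapping, reserve blocks, dirty the page ... */
        return VM_FAULT_LOCKED;     /* page is returned locked */
    }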

    Fixes: 39a8695824510 ("f2fs: refactor ->page_mkwrite() flow")
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Jaegeuk Kim
     
  • [ Upstream commit 47501f87c61ad2aa234add63e1ae231521dbc3f5 ]

    The previous preallocation and DIO decision was as below:

                                  allow_outplace_dio           !allow_outplace_dio
    f2fs_force_buffered_io    (*) No_Prealloc / Buffered_IO    Prealloc / Buffered_IO
    !f2fs_force_buffered_io       No_Prealloc / DIO            Prealloc / DIO

    But Javier reported Case (*), where a zoned device bypassed preallocation but
    fell back to buffered writes in f2fs_direct_IO(), resulting in stale data
    being read.

    In order to fix the issue, we actually need to preallocate blocks whenever
    we fall back to buffered IO, as below (a code restatement follows the
    table). No change is made in the other cases.

                                  allow_outplace_dio           !allow_outplace_dio
    f2fs_force_buffered_io    (*) Prealloc / Buffered_IO       Prealloc / Buffered_IO
    !f2fs_force_buffered_io       No_Prealloc / DIO            Prealloc / DIO
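    A compact restatement of the fixed decision table as code, with hypothetical
    predicate names mirroring the row and column labels (this is not the actual
    f2fs logic, just the table above expressed as a function):

    #include <linux/fs.h>

    static bool example_force_buffered_io(struct inode *inode);    /* hypothetical */
    static bool example_allow_outplace_dio(struct inode *inode);   /* hypothetical */

    static bool example_should_preallocate(struct kiocb *iocb, struct inode *inode)
    {
        bool dio = iocb->ki_flags & IOCB_DIRECT;

        if (!dio)
            return true;                        /* plain buffered write */
        if (example_force_buffered_io(inode))
            return true;                        /* row (*): DIO falls back to buffered */
        if (example_allow_outplace_dio(inode))
            return false;                       /* No_Prealloc / DIO */
        return true;                            /* Prealloc / DIO */
    }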

    Reported-and-tested-by: Javier Gonzalez
    Signed-off-by: Damien Le Moal
    Tested-by: Shin'ichiro Kawasaki
    Reviewed-by: Chao Yu
    Reviewed-by: Javier González
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Jaegeuk Kim
     

11 Feb, 2020

6 commits

  • commit 80f2388afa6ef985f9c5c228e36705c4d4db4756 upstream.

    Since ->d_compare() and ->d_hash() can be called in RCU-walk mode,
    ->d_parent and ->d_inode can be concurrently modified, and in
    particular, ->d_inode may be changed to NULL. For f2fs_d_hash() this
    resulted in a reproducible NULL dereference if a lookup is done in a
    directory being deleted, e.g. with:

    int main()
    {
        if (fork()) {
            for (;;) {
                mkdir("subdir", 0700);
                rmdir("subdir");
            }
        } else {
            for (;;)
                access("subdir/file", 0);
        }
    }

    ... or by running the 't_encrypted_d_revalidate' program from xfstests.
    Both repros work in any directory on a filesystem with the encoding
    feature, even if the directory doesn't actually have the casefold flag.

    I couldn't reproduce a crash in f2fs_d_compare(), but it appears that a
    similar crash is possible there.

    Fix these bugs by reading ->d_parent and ->d_inode using READ_ONCE() and
    falling back to the case sensitive behavior if the inode is NULL.
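    An illustrative sketch of that access pattern (not the exact f2fs code;
    example_ci_compare() is a made-up stand-in for the case-insensitive
    comparison): sample ->d_parent and ->d_inode once with READ_ONCE() and fall
    back to an exact comparison when the inode has become NULL.

    #include <linux/dcache.h>
    #include <linux/fs.h>
    #include <linux/string.h>

    static int example_ci_compare(const struct inode *dir, const struct qstr *name,
                                  const char *str, unsigned int len); /* hypothetical */

    static int example_d_compare(const struct dentry *dentry, unsigned int len,
                                 const char *str, const struct qstr *name)
    {
        const struct dentry *parent = READ_ONCE(dentry->d_parent);
        const struct inode *dir = READ_ONCE(parent->d_inode);

        /* Directory already gone, or not casefolded: exact match only. */
        if (!dir || !IS_CASEFOLDED(dir)) {
            if (len != name->len)
                return 1;
            return memcmp(str, name->name, len);
        }

        return example_ci_compare(dir, name, str, len);
    }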

    Reported-by: Al Viro
    Fixes: 2c2eb7a300cd ("f2fs: Support case-insensitive file name lookups")
    Cc: # v5.4+
    Signed-off-by: Eric Biggers
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
     
  • commit 5515eae647426169e4b7969271fb207881eba7f6 upstream.

    Do the name comparison for non-casefolded directories correctly.

    This is analogous to ext4's commit 66883da1eee8 ("ext4: fix dcache
    lookup of !casefolded directories").

    Fixes: 2c2eb7a300cd ("f2fs: Support case-insensitive file name lookups")
    Cc: # v5.4+
    Signed-off-by: Eric Biggers
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
     
  • commit bf2cbd3c57159c2b639ee8797b52ab5af180bf83 upstream.

    Call min_not_zero() to simplify the complicated prjquota
    limit comparison in f2fs_statfs_project().
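    For reference, min_not_zero() returns the smaller of two values while
    treating zero as "no limit"; a sketch of the kind of simplification this
    enables, using the generic quota fields only as an assumed illustration:

    #include <linux/kernel.h>
    #include <linux/quota.h>

    /* Before: open-coded "smaller non-zero of soft and hard limit". */
    static u64 limit_open_coded(const struct mem_dqblk *db)
    {
        u64 limit = 0;

        if (db->dqb_bsoftlimit)
            limit = db->dqb_bsoftlimit;
        if (db->dqb_bhardlimit &&
            (!limit || db->dqb_bhardlimit < limit))
            limit = db->dqb_bhardlimit;
        return limit;
    }

    /* After: a single expression. */
    static u64 limit_simplified(const struct mem_dqblk *db)
    {
        return min_not_zero(db->dqb_bsoftlimit, db->dqb_bhardlimit);
    }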

    Signed-off-by: Chengguang Xu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Chengguang Xu
     
  • commit acdf2172172a511f97fa21ed0ee7609a6d3b3a07 upstream.

    statfs calculates the Total/Used/Avail disk space in block units, so we
    should translate the soft/hard prjquota limits to block units as well.

    The testing result below shows that the block/inode numbers of
    Total/Used/Avail from the df command are all correct after applying this
    patch.

    [root@localhost quota-tools]# ./repquota -P /dev/sdb1

    Chengguang Xu
     
  • commit 909110c060f22e65756659ec6fa957ae75777e00 upstream.

    Setting a softlimit larger than the hardlimit seems meaningless for disk
    quota, but currently it is allowed. In this case, there may be a bit of
    confusion for users when they run the df command on a directory which has
    project quota.

    For example, we set a 20M softlimit and a 10M hardlimit for the block usage
    limit of the project quota of test_dir (project id 123).

    [root@hades f2fs]# repquota -P -a

    Chengguang Xu
     
  • commit eb31e2f63d85d1bec4f7b136f317e03c03db5503 upstream.

    Push clamping timestamps into notify_change(), so in-kernel
    callers like nfsd and overlayfs will get similar timestamp
    set behavior as utimes.

    AV: get rid of clamping in ->setattr() instances; we don't need
    to bother with that there, with notify_change() doing normalization
    in all cases now (it already did for implicit case, since current_time()
    clamps).

    Suggested-by: Miklos Szeredi
    Fixes: 42e729b9ddbb ("utimes: Clamp the timestamps before update")
    Cc: stable@vger.kernel.org # v5.4
    Cc: Deepa Dinamani
    Cc: Jeff Layton
    Signed-off-by: Amir Goldstein
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     

18 Jan, 2020

1 commit

  • commit 1f0d5c911b64165c9754139a26c8c2fad352c132 upstream.

    We expect a 64-bit calculation result from the statement below; however, on
    a 32-bit machine, the left shift operation on a pgoff_t type variable may
    overflow. Fix it by forcing a type cast.

    page->index << PAGE_SHIFT;
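    A small userspace illustration of the overflow (assuming 4 KiB pages and a
    32-bit unsigned int standing in for a 32-bit pgoff_t): without widening
    before the shift, byte offsets past 4 GiB wrap around; the fix casts to a
    64-bit type before shifting.

    #include <stdio.h>

    int main(void)
    {
        unsigned int page_shift = 12;       /* 4 KiB pages */
        unsigned int index = 0x200000;      /* page index at the 8 GiB mark */

        /* 32-bit shift: the result wraps to 0. */
        unsigned int wrong = index << page_shift;

        /* Widen first, as the fix does with a cast to a 64-bit type. */
        unsigned long long right = (unsigned long long)index << page_shift;

        printf("wrong: 0x%x  right: 0x%llx\n", wrong, right);
        return 0;
    }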

    Fixes: 26de9b117130 ("f2fs: avoid unnecessary updating inode during fsync")
    Fixes: 0a2aa8fbb969 ("f2fs: refactor __exchange_data_block for speed up")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
     

05 Jan, 2020

3 commits

  • [ Upstream commit 677017d196ba2a4cfff13626b951cc9a206b8c7c ]

    The FS got stuck in the stack below when the storage is in an almost
    full/dirty condition (while FG_GC is being done).

    schedule_timeout
    io_schedule_timeout
    congestion_wait
    f2fs_drop_inmem_pages_all
    f2fs_gc
    f2fs_balance_fs
    __write_node_page
    f2fs_fsync_node_pages
    f2fs_do_sync_file
    f2fs_ioctl

    The root cause for this issue is a potential infinite loop in
    f2fs_drop_inmem_pages_all() for the case where gc_failure is true and there
    is an inode whose i_gc_failures[GC_FAILURE_ATOMIC] is not set. Fix this by
    keeping track of the total number of atomic files currently opened and
    using that to exit from this condition.

    Fix-suggested-by: Chao Yu
    Signed-off-by: Chao Yu
    Signed-off-by: Sahitya Tummala
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Sahitya Tummala
     
  • [ Upstream commit 2a60637f06ac94869b2e630eaf837110d39bf291 ]

    As Eric reported:

    RENAME_EXCHANGE support was just added to fsstress in xfstests:

    commit 65dfd40a97b6bbbd2a22538977bab355c5bc0f06
    Author: kaixuxia
    Date: Thu Oct 31 14:41:48 2019 +0800

    fsstress: add EXCHANGE renameat2 support

    This is causing xfstest generic/579 to fail due to fsck.f2fs reporting errors.
    I'm not sure what the problem is, but it still happens even with all the
    fs-verity stuff in the test commented out, so that the test just runs fsstress.

    generic/579 23s ... [10:02:25]
    [ 7.745370] run fstests generic/579 at 2019-11-04 10:02:25
    _check_generic_filesystem: filesystem on /dev/vdc is inconsistent
    (see /results/f2fs/results-default/generic/579.full for details)
    [10:02:47]
    Ran: generic/579
    Failures: generic/579
    Failed 1 of 1 tests
    Xunit report: /results/f2fs/results-default/result.xml

    Here's the contents of 579.full:

    _check_generic_filesystem: filesystem on /dev/vdc is inconsistent
    *** fsck.f2fs output ***
    [ASSERT] (__chk_dots_dentries:1378) --> Bad inode number[0x24] for '..', parent parent ino is [0xd10]

    The root cause is that we forgot to update the directory's i_pino during
    cross_rename. Fix it.

    Fixes: 32f9bc25cbda0 ("f2fs: support ->rename2()")
    Signed-off-by: Chao Yu
    Tested-by: Eric Biggers
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Chao Yu
     
  • [ Upstream commit fe1897eaa6646f5a64a4cee0e6473ed9887d324b ]

    generic/018 reports an inconsistent atime status; the test case is as
    below:
    - open file with O_SYNC
    - write file to construct fraged space
    - calc md5 of file
    - record {a,c,m}time
    - defrag file --- do nothing
    - umount & mount
    - check {a,c,m}time

    The root cause is that, as f2fs enables lazytime by default, an atime update
    only dirties the VFS inode rather than the f2fs inode (by setting
    FI_DIRTY_INODE), so f2fs_write_inode(), called later from the VFS, will fail
    to update the inode page because of our skip:

    f2fs_write_inode()
        if (is_inode_flag_set(inode, FI_DIRTY_INODE))
            return 0;

    So eventually, after evict(), we lose the last atime forever.

    To fix this issue, we need to check whether {a,c,m,cr}time is consistent
    between the inode cache and the inode page, and only skip
    f2fs_update_inode() if the f2fs inode is not dirty and the times are
    consistent as well.
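    An illustrative sketch of that consistency check (struct example_disk_inode
    and example_update_inode_page() are hypothetical; is_inode_flag_set() and
    FI_DIRTY_INODE come from fs/f2fs/f2fs.h, as in the snippet above): only
    skip the update when the inode is clean and the cached times still match
    the on-disk copy.

    #include <linux/fs.h>

    /* Hypothetical on-disk inode layout, used only for this sketch. */
    struct example_disk_inode {
        __le64 i_atime;
        __le64 i_ctime;
        __le64 i_mtime;
    };

    static int example_update_inode_page(struct inode *inode);     /* hypothetical */

    static bool example_times_consistent(struct inode *inode,
                                         struct example_disk_inode *ri)
    {
        return inode->i_atime.tv_sec == le64_to_cpu(ri->i_atime) &&
               inode->i_ctime.tv_sec == le64_to_cpu(ri->i_ctime) &&
               inode->i_mtime.tv_sec == le64_to_cpu(ri->i_mtime);
    }

    static int example_write_inode(struct inode *inode, struct example_disk_inode *ri)
    {
        if (is_inode_flag_set(inode, FI_DIRTY_INODE) ||
            !example_times_consistent(inode, ri))
            return example_update_inode_page(inode);

        return 0;       /* clean and the times match: safe to skip */
    }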

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Chao Yu
     

22 Sep, 2019

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we introduced casefolding support in f2fs, and fixed
    various bugs in individual features such as IO alignment,
    checkpoint=disable, quota, and swapfile.

    Enhancement:
    - support casefolding w/ enhancement in ext4
    - support fiemap for directory
    - support FS_IO_GET|SET_FSLABEL

    Bug fix:
    - fix IO stuck during checkpoint=disable
    - avoid infinite GC loop
    - fix panic/overflow related to IO alignment feature
    - fix livelock in swap file
    - fix discard command leak
    - disallow dio for atomic_write"

    * tag 'f2fs-for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits)
    f2fs: add a condition to detect overflow in f2fs_ioc_gc_range()
    f2fs: fix to add missing F2FS_IO_ALIGNED() condition
    f2fs: fix to fallback to buffered IO in IO aligned mode
    f2fs: fix to handle error path correctly in f2fs_map_blocks
    f2fs: fix extent corrupotion during directIO in LFS mode
    f2fs: check all the data segments against all node ones
    f2fs: Add a small clarification to CONFIG_FS_F2FS_FS_SECURITY
    f2fs: fix inode rwsem regression
    f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()
    f2fs: avoid infinite GC loop due to stale atomic files
    f2fs: Fix indefinite loop in f2fs_gc()
    f2fs: convert inline_data in prior to i_size_write
    f2fs: fix error path of f2fs_convert_inline_page()
    f2fs: add missing documents of reserve_root/resuid/resgid
    f2fs: fix flushing node pages when checkpoint is disabled
    f2fs: enhance f2fs_is_checkpoint_ready()'s readability
    f2fs: clean up __bio_alloc()'s parameter
    f2fs: fix wrong error injection path in inc_valid_block_count()
    f2fs: fix to writeout dirty inode during node flush
    f2fs: optimize case-insensitive lookups
    ...

    Linus Torvalds
     

20 Sep, 2019

1 commit

  • Pull y2038 vfs updates from Arnd Bergmann:
    "Add inode timestamp clamping.

    This series from Deepa Dinamani adds a per-superblock minimum/maximum
    timestamp limit for a file system, and clamps timestamps as they are
    written, to avoid random behavior from integer overflow as well as
    having different time stamps on disk vs in memory.

    At mount time, a warning is now printed for any file system that can
    represent current timestamps but not future timestamps more than 30
    years into the future, similar to the arbitrary 30 year limit that was
    added to settimeofday().

    This was picked as a compromise to warn users to migrate to other file
    systems (e.g. ext4 instead of ext3) when they need the file system to
    survive beyond 2038 (or similar limits in other file systems), but not
    get in the way of normal usage"

    * tag 'y2038-vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
    ext4: Reduce ext4 timestamp warnings
    isofs: Initialize filesystem timestamp ranges
    pstore: fs superblock limits
    fs: omfs: Initialize filesystem timestamp ranges
    fs: hpfs: Initialize filesystem timestamp ranges
    fs: ceph: Initialize filesystem timestamp ranges
    fs: sysv: Initialize filesystem timestamp ranges
    fs: affs: Initialize filesystem timestamp ranges
    fs: fat: Initialize filesystem timestamp ranges
    fs: cifs: Initialize filesystem timestamp ranges
    fs: nfs: Initialize filesystem timestamp ranges
    ext4: Initialize timestamps limits
    9p: Fill min and max timestamps in sb
    fs: Fill in max and min timestamps in superblock
    utimes: Clamp the timestamps before update
    mount: Add mount warning for impending timestamp expiry
    timestamp_truncate: Replace users of timespec64_trunc
    vfs: Add timestamp_truncate() api
    vfs: Add file timestamp range support

    Linus Torvalds
     

19 Sep, 2019

1 commit

  • Pull fs-verity support from Eric Biggers:
    "fs-verity is a filesystem feature that provides Merkle tree based
    hashing (similar to dm-verity) for individual readonly files, mainly
    for the purpose of efficient authenticity verification.

    This pull request includes:

    (a) The fs/verity/ support layer and documentation.

    (b) fs-verity support for ext4 and f2fs.

    Compared to the original fs-verity patchset from last year, the UAPI
    to enable fs-verity on a file has been greatly simplified. Lots of
    other things were cleaned up too.

    fs-verity is planned to be used by two different projects on Android;
    most of the userspace code is in place already. Another userspace tool
    ("fsverity-utils"), and xfstests, are also available. e2fsprogs and
    f2fs-tools already have fs-verity support. Other people have shown
    interest in using fs-verity too.

    I've tested this on ext4 and f2fs with xfstests, both the existing
    tests and the new fs-verity tests. This has also been in linux-next
    since July 30 with no reported issues except a couple minor ones I
    found myself and folded in fixes for.

    Ted and I will be co-maintaining fs-verity"

    * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
    f2fs: add fs-verity support
    ext4: update on-disk format documentation for fs-verity
    ext4: add fs-verity read support
    ext4: add basic fs-verity support
    fs-verity: support builtin file signatures
    fs-verity: add SHA-512 support
    fs-verity: implement FS_IOC_MEASURE_VERITY ioctl
    fs-verity: implement FS_IOC_ENABLE_VERITY ioctl
    fs-verity: add data verification hooks for ->readpages()
    fs-verity: add the hook for file ->setattr()
    fs-verity: add the hook for file ->open()
    fs-verity: add inode and superblock fields
    fs-verity: add Kconfig and the helper functions for hashing
    fs: uapi: define verity bit for FS_IOC_GETFLAGS
    fs-verity: add UAPI header
    fs-verity: add MAINTAINERS file entry
    fs-verity: add a documentation file

    Linus Torvalds
     

18 Sep, 2019

1 commit


16 Sep, 2019

9 commits

    In f2fs_allocate_data_block(), reset fio.retry for the IO alignment feature
    instead of the IO serialization feature.

    In addition, use F2FS_IO_ALIGNED() throughout to check the IO alignment
    feature status explicitly.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
    In LFS mode, we allow OPU for direct IO; however, we didn't consider the IO
    alignment feature, so direct IO can trigger unaligned IO. Let's just fall
    back to buffered IO to keep correct IO alignment semantics in all places.

    Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • In f2fs_map_blocks(), we should bail out once __allocate_data_block()
    failed.

    Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • In LFS mode, por_fsstress testcase reports a bug as below:

    [ASSERT] (fsck_chk_inode_blk: 931) --> ino: 0x12fe has wrong ext: [pgofs:142, blk:215424, len:16]

    Since commit f847c699cff3 ("f2fs: allow out-place-update for direct
    IO in LFS mode"), we have allowed OPU mode for direct IO; however, we
    missed updating the extent cache in __allocate_data_block(), which leaves
    the extent field inconsistent with the physical block address. Fix it.

    Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
    As a part of the sanity checking while mounting, it is verified that data
    and node segments are assigned distinct segment numbers. Fix a small bug in
    this verification: we need to check all the data segments against all the
    node segments, as sketched below.
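    A sketch of the strengthened cross-check (the function and parameter names
    here are made up; the checkpoint stores the current segment numbers as
    little-endian values): every current data segment number is compared
    against every current node segment number.

    #include <linux/types.h>
    #include <linux/errno.h>
    #include <asm/byteorder.h>

    static int example_check_cur_segments(const __le32 *cur_data_segno,
                                          const __le32 *cur_node_segno,
                                          int nr_data, int nr_node)
    {
        int i, j;

        for (i = 0; i < nr_data; i++)
            for (j = 0; j < nr_node; j++)
                if (le32_to_cpu(cur_data_segno[i]) ==
                    le32_to_cpu(cur_node_segno[j]))
                    return -EINVAL;     /* the same segment is used twice */

        return 0;
    }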

    Fixes: 042be0f849e5f ("f2fs: fix to do sanity check with current segment number")
    Signed-off-by: Surbhi Palande
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Surbhi Palande
     
  • Signed-off-by: Lockywolf
    Signed-off-by: Jaegeuk Kim

    Lockywolf
     
    This is similar to 942491c9e6d6 ("xfs: fix AIM7 regression"). Apparently
    our current rwsem code doesn't like the trylock-then-lock-for-real scheme,
    so change our read/write methods to just do the trylock for the RWF_NOWAIT
    case.

    We don't need a check for IOCB_NOWAIT and !direct-IO because it
    is checked in generic_write_checks().
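    A sketch of the locking pattern described above (illustrative only;
    example_do_write() is a made-up stand-in for the actual write path): for
    IOCB_NOWAIT, only try the lock and bail out with -EAGAIN, instead of a
    trylock followed by a blocking lock.

    #include <linux/fs.h>
    #include <linux/uio.h>

    static ssize_t example_do_write(struct kiocb *iocb, struct iov_iter *from); /* hypothetical */

    static ssize_t example_write_iter(struct kiocb *iocb, struct iov_iter *from)
    {
        struct inode *inode = file_inode(iocb->ki_filp);
        ssize_t ret;

        if (iocb->ki_flags & IOCB_NOWAIT) {
            if (!inode_trylock(inode))
                return -EAGAIN;         /* caller asked not to block */
        } else {
            inode_lock(inode);
        }

        ret = example_do_write(iocb, from);
        inode_unlock(inode);
        return ret;
    }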

    Fixes: b91050a80cec ("f2fs: add nowait aio support")
    Signed-off-by: Goldwyn Rodrigues
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Goldwyn Rodrigues
     
    If an inode is newly created, its inode page may not yet be synchronized
    with the inode cache, so fields like .i_inline or .i_extra_isize could be
    wrong. In the call path below, we may access such wrong fields, resulting
    in failure to migrate a valid target block.

    Thread A                              Thread B
    - f2fs_create
     - f2fs_add_link
      - f2fs_add_dentry
       - f2fs_init_inode_metadata
        - f2fs_add_inline_entry
         - f2fs_new_inode_page
          - f2fs_put_page
          : inode page wasn't updated with inode cache
                                          - gc_data_segment
                                           - is_alive
                                            - f2fs_get_node_page
                                            - datablock_addr
                                            - offset_in_addr
                                            : access uninitialized fields

    Fixes: 7a2af766af15 ("f2fs: enhance on-disk inode structure scalability")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
    If committing atomic pages fails in f2fs_do_sync_file(), we can end up with
    committed pages but atomic_file still set, like:

    - inmem: 0, atomic IO: 4 (Max. 10), volatile IO: 0 (Max. 0)

    If GC selects this block, we can get an infinite loop like this:

    f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
    f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 18533696, size = 4096
    f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
    f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
    f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
    f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 18533696, size = 4096
    f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
    f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c

    In that moment, we can observe:

    [Before]
    Try to move 5084219 blocks (BG: 384508)
    - data blocks : 4962373 (274483)
    - node blocks : 121846 (110025)
    Skipped : atomic write 4534686 (10)

    [After]
    Try to move 5088973 blocks (BG: 384508)
    - data blocks : 4967127 (274483)
    - node blocks : 121846 (110025)
    Skipped : atomic write 4539440 (10)

    So, refactor the atomic_write flow like this:
    1. start_atomic_write
       - add to inmem_list and set atomic_file

    2. write()
       - register it in inmem_pages

    3. commit_atomic_write
       - if no error, f2fs_drop_inmem_pages()
       - f2fs_commit_inmem_pages() failed
         : __revoke_inmem_pages() was done
       - f2fs_do_sync_file failed
         : abort_atomic_write later

    4. abort_atomic_write
       - f2fs_drop_inmem_pages

    5. f2fs_drop_inmem_pages
       - clear atomic_file
       - remove from inmem_list

    Based on this change, when GC fails to move block in atomic_file,
    f2fs_drop_inmem_pages_all() can call f2fs_drop_inmem_pages().

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

09 Sep, 2019

1 commit

  • Policy - foreground GC, LFS mode and greedy GC mode.

    Under this policy, f2fs_gc() loops forever as it doesn't have enough free
    segments to proceed and thus keeps calling gc_more for the same victim
    segment. This can happen if the selected victim segment could not be GC'd
    due to a failed blkaddr validity check, i.e. is_alive() returns false for
    the blocks set in the current validity map.

    Fix this by not resetting sbi->cur_victim_sec to NULL_SEGNO when the
    selected segment could not be GC'd. This helps to select another segment
    for GC and thus to proceed forward with GC.

    [Note]
    This can happen due to is_alive as well as atomic_file, which skips GC.

    Signed-off-by: Sahitya Tummala
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     

07 Sep, 2019

8 commits

    In the call path below, we change i_size before inline conversion; however,
    if we fail to convert the inline inode, the inode may end up with a wrong
    i_size that is larger than the max inline size, resulting in inline inode
    corruption.

    - f2fs_setattr
    - truncate_setsize
    - f2fs_convert_inline_inode

    This patch reorders truncate_setsize() and f2fs_convert_inline_inode()
    to guarantee inline_data has valid i_size.
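    A sketch of the ordering (the helpers and the inline-size limit are
    hypothetical; truncate_setsize() is the real VFS helper): convert away from
    inline data first, and only then publish the larger i_size.

    #include <linux/fs.h>
    #include <linux/mm.h>

    #define EXAMPLE_MAX_INLINE_SIZE 3400    /* arbitrary placeholder, not the real f2fs limit */

    static bool example_has_inline_data(struct inode *inode);       /* hypothetical */
    static int example_convert_inline_inode(struct inode *inode);   /* hypothetical */

    static int example_setattr_size(struct inode *inode, loff_t newsize)
    {
        int err;

        /* Convert first, so an inline inode never carries an oversized i_size. */
        if (example_has_inline_data(inode) && newsize > EXAMPLE_MAX_INLINE_SIZE) {
            err = example_convert_inline_inode(inode);
            if (err)
                return err;
        }

        truncate_setsize(inode, newsize);
        return 0;
    }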

    Fixes: 0cab80ee0c9e ("f2fs: fix to convert inline inode in ->setattr")
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    In the error path of f2fs_convert_inline_page(), we missed truncating the
    newly reserved block in .i_addrs[0] once get_node_info() failed. Fix it.

    Fixes: 7735730d39d7 ("f2fs: fix to propagate error from __get_meta_page()")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch fixes skipping node page writes when checkpoint is disabled.
    In this period, we can't rely on checkpoint to flush node pages.

    Fixes: fd8c8caf7e7c ("f2fs: let checkpoint flush dnode page of regular")
    Fixes: 4354994f097d ("f2fs: checkpoint disabling")
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    This patch changes the semantics of f2fs_is_checkpoint_ready()'s return
    value: return true when the checkpoint is ready, otherwise return false.
    This improves the readability of the conditions below.

    f2fs_submit_page_write()
        ...
        if (is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN) ||
            !f2fs_is_checkpoint_ready(sbi))
                __submit_merged_bio(io);

    f2fs_balance_fs()
        ...
        if (!f2fs_is_checkpoint_ready(sbi))
                return;

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Just cleanup, no logic change.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
    If FAULT_BLOCK type error injection is on, we may incorrectly decrease the
    sbi->alloc_valid_block_count percpu stat count in inc_valid_block_count().
    Fix it.

    Fixes: 36b877af7992 ("f2fs: Keep alloc_valid_block_count in sync")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Eric reported:

    On xfstest generic/204 on f2fs, I'm getting a kernel BUG.

    allocate_segment_by_default+0x9d/0x100 [f2fs]
    f2fs_allocate_data_block+0x3c0/0x5c0 [f2fs]
    do_write_page+0x62/0x110 [f2fs]
    f2fs_do_write_node_page+0x2b/0xa0 [f2fs]
    __write_node_page+0x2ec/0x590 [f2fs]
    f2fs_sync_node_pages+0x756/0x7e0 [f2fs]
    block_operations+0x25b/0x350 [f2fs]
    f2fs_write_checkpoint+0x104/0x1150 [f2fs]
    f2fs_sync_fs+0xa2/0x120 [f2fs]
    f2fs_balance_fs_bg+0x33c/0x390 [f2fs]
    f2fs_write_node_pages+0x4c/0x1f0 [f2fs]
    do_writepages+0x1c/0x70
    __writeback_single_inode+0x45/0x320
    writeback_sb_inodes+0x273/0x5c0
    wb_writeback+0xff/0x2e0
    wb_workfn+0xa1/0x370
    process_one_work+0x138/0x350
    worker_thread+0x4d/0x3d0
    kthread+0x109/0x140

    The root cause of this issue is that in a very small partition, e.g. in the
    generic/204 test case of the fstests suite, the filesystem's free space is
    50MB, so at most we can write 12800 inline inodes with the command
    `echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i`; the filesystem
    will then have:
    - 12800 dirty inline data pages
    - 12800 dirty inode pages
    - and 12800 dirty imeta (dirty inodes)

    When we flush the node inode's page cache, we can also flush inline data
    with each inode page; however, this runs out of free space on the device,
    and once a checkpoint is triggered there is no room for the huge number of
    imeta. At this point, GC is useless, as there is no dirty segment at all.

    In order to fix this, we try to recognize inode pages during the node
    inode's page flushing and update the inode page from the dirty inode, so
    that another imeta (dirty inode) flush can be avoided later.

    Reported-and-tested-by: Eric Biggers
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
    This patch ports the casefold enhancement patch below from ext4 to f2fs:

    commit 3ae72562ad91 ("ext4: optimize case-insensitive lookups")

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

30 Aug, 2019

1 commit

  • Update the inode timestamp updates to use timestamp_truncate()
    instead of timespec64_trunc().

    The change was mostly generated by the following coccinelle
    script.

    virtual context
    virtual patch

    @r1 depends on patch forall@
    struct inode *inode;
    identifier i_xtime =~ "^i_[acm]time$";
    expression e;
    @@

    inode->i_xtime =
    - timespec64_trunc(
    + timestamp_truncate(
    ...,
    - e);
    + inode);

    Signed-off-by: Deepa Dinamani
    Acked-by: Greg Kroah-Hartman
    Acked-by: Jeff Layton
    Cc: adrian.hunter@intel.com
    Cc: dedekind1@gmail.com
    Cc: gregkh@linuxfoundation.org
    Cc: hch@lst.de
    Cc: jaegeuk@kernel.org
    Cc: jlbec@evilplan.org
    Cc: richard@nod.at
    Cc: tj@kernel.org
    Cc: yuchao0@huawei.com
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: linux-mtd@lists.infradead.org

    Deepa Dinamani
     

23 Aug, 2019

1 commit