30 May, 2020

1 commit

  • Under heavy fsstress, we may trigger a panic while issuing discard,
    because __check_sit_bitmap() detects that a discard command may erase
    valid data blocks. The root cause is the race described in the stack
    below: since we removed the lock when flushing quota data, quota data
    writeback may race with write_checkpoint(), causing an inconsistency
    between the cached discard entry and the segment bitmap.

    - f2fs_write_checkpoint
     - block_operations
      - set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH)
     - f2fs_flush_sit_entries
      - add_discard_addrs
       - __set_bit_le(i, (void *)de->discard_map);
                                        - f2fs_write_data_pages
                                         - f2fs_write_single_data_page
                                           : inode is quota one, cp_rwsem won't be locked
                                          - f2fs_do_write_data_page
                                           - f2fs_allocate_data_block
                                            - f2fs_wait_discard_bio
                                              : discard entry has not been added yet.
                                            - update_sit_entry
     - f2fs_clear_prefree_segments
      - f2fs_issue_discard
        : add discard entry

    In order to fix this, this patch uses node_write to serialize
    f2fs_allocate_data_block and checkpoint.

    Fixes: 435cbab95e39 ("f2fs: fix quota_sync failure due to f2fs_lock_op")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

18 Apr, 2020

3 commits

  • When a discard_cmd needs to be split due to dpolicy->max_requests,
    the remaining length is either merged into another cmd or turned
    into a new discard_cmd. In this case, the remaining length gets
    accounted into dcc->undiscard_blks twice, which shows up as an
    incorrect value in the stats.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • In case a discard_cmd is split into several bios, the dc->error
    must not be overwritten once an error is reported by a bio. Also,
    move it under dc->lock.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • F2FS already has a default timeout of 5 secs for discards that
    can be issued during umount, but it can take more than the 5 sec
    timeout if the underlying UFS device queue is already full and there
    are no more available free tags to be used. Fix this by submitting a
    small batch of discard requests so that it won't cause the device
    queue to be full at any time and thus doesn't incur its wait time
    in the umount context.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     

31 Mar, 2020

1 commit

  • Data flush can generate heavy IO and cause long latency during
    flush, so it's not appropriate to trigger it in foreground
    operation.

    And also, we may face below potential deadlock during data flush:
    - f2fs_write_multi_pages
    - f2fs_write_raw_pages
    - f2fs_write_single_data_page
    - f2fs_balance_fs
    - f2fs_balance_fs_bg
    - f2fs_sync_dirty_inodes
    - filemap_fdatawrite -- stuck on flushing the same cluster

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

20 Mar, 2020

4 commits

  • Add an "f2fs_" prefix to slab cache names in order to avoid
    polluting the global slab cache namespace.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Geert Uytterhoeven reported:

    for the parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

    On some platforms HZ can be less than 50, in which case integer
    division yields an unexpected timeout of 0 jiffies in
    congestion_wait().

    This patch introduces a macro DEFAULT_IO_TIMEOUT wrapping the
    determinate value msecs_to_jiffies(20) to replace HZ/50 and avoid
    the issue.

    Quoted from Geert Uytterhoeven:

    "A timeout of HZ means 1 second.
    HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

    If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
    as that takes care of the special cases, and never returns 0."

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch removes the F2FS_MOUNT_ADAPTIVE and F2FS_MOUNT_LFS mount
    options, and adds F2FS_OPTION.fs_mode with the two states below to
    indicate the filesystem mode.

    enum {
        FS_MODE_ADAPTIVE,	/* use both lfs/ssr allocation */
        FS_MODE_LFS,		/* use lfs allocation only */
    };

    It can enhance code readability and fs mode's scalability.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Let's show mounted time.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

18 Jan, 2020

3 commits

  • A mutex lock won't serialize callers; to avoid starving an unlucky
    caller, let's use an rwsem lock instead.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Setting 0x40 in /sys/fs/f2fs/dev/ipu_policy gives a way to turn off
    the bio cache, which is useful for checking whether a block layer
    using a hardware encryption engine merges IOs correctly.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch tries to support compression in f2fs.

    - A new term, cluster, is defined as the basic unit of compression;
    a file can be divided into multiple clusters logically. One cluster
    includes 4 << n (n >= 0) logical pages, the compression size equals
    the cluster size, and each cluster can be compressed or left
    uncompressed.

    - In the cluster metadata layout, one special flag is used to
    indicate whether a cluster is a compressed one or a normal one; for
    a compressed cluster, the following metadata maps the cluster to
    [1, 4 << n - 1] physical blocks, in which f2fs stores data including
    the compress header and the compressed data.

    - In order to eliminate write amplification during overwrite, F2FS
    only supports compression on write-once files: data can be
    compressed only when all logical blocks in the file are valid and
    the cluster's compress ratio is lower than the specified threshold.

    - To enable compression on regular inode, there are three ways:
    * chattr +c file
    * chattr +c dir; touch dir/file
    * mount w/ -o compress_extension=ext; touch file.ext

    Compress metadata layout:
                                 [Dnode Structure]
                 +-----------------------------------------------+
                 | cluster 1 | cluster 2 | ......... | cluster N |
                 +-----------------------------------------------+
                 .           .                       .           .
           .                     .                 .                    .
      .       Compressed Cluster      .       .        Normal Cluster        .
    +----------+---------+---------+---------+  +---------+---------+---------+---------+
    |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
    +----------+---------+---------+---------+  +---------+---------+---------+---------+
               .                             .
             .                                      .
           .                                               .
      +-------------+-------------+----------+----------------------------+
      | data length | data chksum | reserved |      compressed data       |
      +-------------+-------------+----------+----------------------------+

    Changelog:

    20190326:
    - fix error handling of read_end_io().
    - remove unneeded comments in f2fs_encrypt_one_page().

    20190327:
    - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
    - don't jump into loop directly to avoid uninitialized variables.
    - add TODO tag in error path of f2fs_write_cache_pages().

    20190328:
    - fix wrong merge condition in f2fs_read_multi_pages().
    - check compressed file in f2fs_post_read_required().

    20190401
    - allow overwrite on non-compressed cluster.
    - check cluster meta before writing compressed data.

    20190402
    - don't preallocate blocks for compressed file.

    - add lz4 compress algorithm
    - process multiple post read works in one workqueue
    f2fs currently processes post read work in multiple workqueues,
    which shows low performance due to the scheduling overhead of
    multiple workqueues executing in order.

    20190921
    - compress: support buffered overwrite
    C: compress cluster flag
    V: valid block address
    N: NEW_ADDR

    One cluster contains 4 blocks:

        before overwrite -> after overwrite

        - VVVV -> CVNN
        - CVNN -> VVVV

        - CVNN -> CVNN
        - CVNN -> CVVV

        - CVVV -> CVNN
        - CVVV -> CVVV

    20191029
    - add kconfig F2FS_FS_COMPRESSION to isolate compression related
    codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
    note: the lzo backend will be removed if Jaegeuk agrees with that too.
    - update codes according to Eric's comments.

    20191101
    - apply fixes from Jaegeuk

    20191113
    - apply fixes from Jaegeuk
    - split workqueue for fsverity

    20191216
    - apply fixes from Jaegeuk

    20200117
    - fix to avoid NULL pointer dereference

    [Jaegeuk Kim]
    - add tracepoint for f2fs_{,de}compress_pages()
    - fix many bugs and add some compression stats
    - fix overwrite/mmap bugs
    - address 32bit build error, reported by Geert.
    - bug fixes when handling errors and i_compressed_blocks

    Reported-by:
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

16 Jan, 2020

3 commits

  • Remove the duplicate sbi->aw_cnt stats counter that tracks the
    number of atomic files currently opened (it also shows an incorrect
    value sometimes). Use the more reliable sbi->atomic_files to show in
    the stats.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • To catch f2fs bugs in write pointer handling code for zoned block
    devices, check write pointers of non-open zones that current segments do
    not point to. Do this check at mount time, after the fsync data recovery
    and current segments' write pointer consistency fix. Or when fsync data
    recovery is disabled by mount option, do the check when there is no fsync
    data.

    Two items are checked, comparing write pointers with the valid block
    maps in SIT. The first is a check for zones with no valid blocks:
    when a zone has no valid blocks, the write pointer should be at the
    start of the zone; if not, the next write to the zone will cause an
    unaligned write error, so reset the write pointer to the zone start.

    The second is a check between the write pointer position and the
    last valid block in the zone. The last valid block position is not
    expected to be beyond the write pointer; in such a case, report it
    as a bug. No fix is required for such a zone, because the zone is
    not selected for the next write operation until it gets discarded.

    Signed-off-by: Shin'ichiro Kawasaki
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Shin'ichiro Kawasaki
     
  • On sudden f2fs shutdown, write pointers of zoned block devices can
    go further, while f2fs metadata keeps current segments at positions
    from before the write operations. After remounting f2fs, this
    inconsistency causes writes that land away from the write pointers,
    and an "Unaligned write command" error is reported.

    To avoid the error, compare current segments with write pointers of open
    zones the current segments point to, during mount operation. If the write
    pointer position is not aligned with the current segment position, assign
    a new zone to the current segment. Also check the newly assigned zone has
    write pointer at zone start. If not, reset write pointer of the zone.

    Perform the consistency check during fsync recovery. So as not to
    lose the fsync data, do the check after the fsync data is restored
    and before the checkpoint commit, which flushes data at the current
    segment positions. To avoid conflict with kworker's dirty data/node
    flush, do the fix within SBI_POR_DOING protection.

    Signed-off-by: Shin'ichiro Kawasaki
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Shin'ichiro Kawasaki
     

01 Dec, 2019

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've introduced fairly small number of patches as below.

    Enhancements:
    - improve the in-place-update IO flow
    - allocate segment to guarantee no GC for pinned files

    Bug fixes:
    - fix updatetime in lazytime mode
    - potential memory leak in f2fs_listxattr
    - record parent inode number in rename2 correctly
    - fix deadlock in f2fs_gc along with atomic writes
    - avoid needless data migration in GC"

    * tag 'f2fs-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
    f2fs: stop GC when the victim becomes fully valid
    f2fs: expose main_blkaddr in sysfs
    f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()
    f2fs: Fix deadlock in f2fs_gc() context during atomic files handling
    f2fs: show f2fs instance in printk_ratelimited
    f2fs: fix potential overflow
    f2fs: fix to update dir's i_pino during cross_rename
    f2fs: support aligned pinned file
    f2fs: avoid kernel panic on corruption test
    f2fs: fix wrong description in document
    f2fs: cache global IPU bio
    f2fs: fix to avoid memory leakage in f2fs_listxattr
    f2fs: check total_segments from devices in raw_super
    f2fs: update multi-dev metadata in resize_fs
    f2fs: mark recovery flag correctly in read_raw_super_block()
    f2fs: fix to update time in lazytime mode

    Linus Torvalds
     

20 Nov, 2019

2 commits

  • The FS got stuck in the below stack when the storage is almost
    full/dirty (i.e. when FG_GC is being done):

    schedule_timeout
    io_schedule_timeout
    congestion_wait
    f2fs_drop_inmem_pages_all
    f2fs_gc
    f2fs_balance_fs
    __write_node_page
    f2fs_fsync_node_pages
    f2fs_do_sync_file
    f2fs_ioctl

    The root cause of this issue is a potential infinite loop in
    f2fs_drop_inmem_pages_all() for the case where gc_failure is true
    and there is an inode whose i_gc_failures[GC_FAILURE_ATOMIC] is not
    set. Fix this by keeping track of the total number of atomic files
    currently opened and using that to exit from this condition.

    Fix-suggested-by: Chao Yu
    Signed-off-by: Chao Yu
    Signed-off-by: Sahitya Tummala
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • As Eric mentioned, bare printk{,_ratelimited} won't show which
    filesystem instance these messages come from; this patch shows the
    fs instance with the sb->s_id field in all the places we missed
    before.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

08 Nov, 2019

1 commit

  • This patch supports 2MB-aligned pinned file, which can guarantee no GC at all
    by allocating fully valid 2MB segment.

    Check free segments by has_not_enough_free_secs() with large budget.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

07 Nov, 2019

1 commit

  • Zoned block devices (ZBC and ZAC devices) allow an explicit control
    over the condition (state) of zones. The operations allowed are:
    * Open a zone: Transition to open condition to indicate that a zone will
    actively be written
    * Close a zone: Transition to closed condition to release the drive
    resources used for writing to a zone
    * Finish a zone: Transition an open or closed zone to the full
    condition to prevent write operations

    To enable this control for in-kernel zoned block device users, define
    the new request operations REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE
    and REQ_OP_ZONE_FINISH as well as the generic function
    blkdev_zone_mgmt() for submitting these operations on a range of zones.
    This results in blkdev_reset_zones() removal and replacement with this
    new zone magement function. Users of blkdev_reset_zones() (f2fs and
    dm-zoned) are updated accordingly.

    Contains contributions from Matias Bjorling, Hans Holmberg,
    Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig.

    Reviewed-by: Javier González
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ajay Joshi
    Signed-off-by: Matias Bjorling
    Signed-off-by: Hans Holmberg
    Signed-off-by: Dmitry Fomichev
    Signed-off-by: Keith Busch
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Ajay Joshi
     

26 Oct, 2019

1 commit

  • In commit 8648de2c581e ("f2fs: add bio cache for IPU"), we added
    f2fs_submit_ipu_bio() in __write_data_page() as below:

    __write_data_page()

        if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) {
            f2fs_submit_ipu_bio(sbi, bio, page);
            ....
        }

    in order to avoid below deadlock:

    Thread A                                Thread B
    - __write_data_page (inode x, page y)
     - f2fs_do_write_data_page
      - set_page_writeback      ---- set writeback flag in page y
      - f2fs_inplace_write_data
                                            - f2fs_balance_fs
                                             - lock gc_mutex
       - f2fs_balance_fs
        - lock gc_mutex
                                             - f2fs_gc
                                              - do_garbage_collect
                                               - gc_data_segment
                                                - move_data_page
                                                 - f2fs_wait_on_page_writeback
                                                  - wait_on_page_writeback
                                                    --- wait writeback of page y

    However, the bio submission breaks the merge of IPU IOs.

    So in this patch let's add a global bio cache for merged IPU pages;
    then f2fs_wait_on_page_writeback() is able to submit the bio if a
    page under writeback is cached in the global bio cache.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

16 Sep, 2019

2 commits

  • In f2fs_allocate_data_block(), we will reset fio.retry for IO
    alignment feature instead of IO serialization feature.

    In addition, spread F2FS_IO_ALIGNED() to check IO alignment
    feature status explicitly.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • If committing atomic pages fails in f2fs_do_sync_file(), we can end
    up with committed pages but with atomic_file still set, like:

    - inmem: 0, atomic IO: 4 (Max. 10), volatile IO: 0 (Max. 0)

    If GC selects this block, we can get an infinite loop like this:

    f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
    f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 18533696, size = 4096
    f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
    f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
    f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
    f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 18533696, size = 4096
    f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
    f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c

    In that moment, we can observe:

    [Before]
    Try to move 5084219 blocks (BG: 384508)
    - data blocks : 4962373 (274483)
    - node blocks : 121846 (110025)
    Skipped : atomic write 4534686 (10)

    [After]
    Try to move 5088973 blocks (BG: 384508)
    - data blocks : 4967127 (274483)
    - node blocks : 121846 (110025)
    Skipped : atomic write 4539440 (10)

    So, refactor the atomic_write flow like this:

    1. start_atomic_write
       - add to inmem_list and set atomic_file

    2. write()
       - register it in inmem_pages

    3. commit_atomic_write
       - if no error, f2fs_drop_inmem_pages()
       - if f2fs_commit_inmem_pages() failed
         : __revoke_inmem_pages() was done
       - if f2fs_do_sync_file failed
         : abort_atomic_write later

    4. abort_atomic_write
       - f2fs_drop_inmem_pages

    5. f2fs_drop_inmem_pages
       - clear atomic_file
       - remove from inmem_list

    Based on this change, when GC fails to move block in atomic_file,
    f2fs_drop_inmem_pages_all() can call f2fs_drop_inmem_pages().

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

07 Sep, 2019

1 commit

  • This patch changes the semantics of f2fs_is_checkpoint_ready()'s
    return value: return true when the checkpoint is ready, otherwise
    return false. This improves the readability of the conditions below.

    f2fs_submit_page_write()
    ...
        if (is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN) ||
                !f2fs_is_checkpoint_ready(sbi))
            __submit_merged_bio(io);

    f2fs_balance_fs()
    ...
        if (!f2fs_is_checkpoint_ready(sbi))
            return;

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

23 Aug, 2019

6 commits

  • Policy - Foreground GC, LFS and greedy GC mode.

    Under this policy, f2fs_gc() loops forever as it doesn't have enough
    free segments to proceed, and thus keeps calling gc_more for the
    same victim segment. This can happen if the selected victim segment
    could not be GC'd due to a failed blkaddr validity check, i.e.
    is_alive() returns false for the blocks set in the current validity
    map.

    Fix this by keeping track of such invalid segments and skipping them
    during selection in get_victim_by_default(), to avoid an endless GC
    loop under such error scenarios. Currently, add this logic under
    CONFIG_F2FS_CHECK_FS to be able to root-cause the issue in debug
    builds.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    [Jaegeuk Kim: fix wrong bitmap size]
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • build_sit_info() allocates all bitmaps for each segment one by one,
    which is quite inefficient. This patch changes it to allocate one
    large contiguous region at a time, then divide it up and assign the
    pieces to each segment's bitmaps. For large images, this should
    improve mount speed.

    Signed-off-by: Chen Gong
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • There is one case that can cause data corruption.

    - write 4k to fileA
    - fsync fileA; the 4k of data is written back to lbaA
    - write 4k to fileA
    - kworker flushes the 4k to lbaB; the dnode containing lbaB hasn't
      been persisted yet
    - write 4k to fileB
    - kworker flushes the 4k to lbaA due to SSR
    - SPOR -> the dnode with lbaA will be recovered; however, lbaA now
      contains fileB's data

    One solution is to track every fsynced file's block history and
    disallow SSR overwrites of newly invalidated blocks of that file.

    However, during recovery, no matter whether the dnode was flushed or
    fsynced, all previous dnodes up to the last fsynced one in the node
    chain can be recovered; that means we would need to record every
    block change in flushed dnodes, which would be too costly. So let's
    just use the simple fix of forbidding SSR overwrite directly.

    Fixes: 5b6c6be2d878 ("f2fs: use SSR for warm node as well")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Pavel Machek reported:

    "We normally use -EUCLEAN to signal filesystem corruption. Plus, it is
    good idea to report it to the syslog and mark filesystem as "needing
    fsck" if filesystem can do that."

    Still, we would need to improve the original patch with:
    - use of the unlikely keyword
    - a message print
    - returning EUCLEAN

    However, after rethinking this patch, I don't think we should add
    such a condition check here, for the below reasons:
    - We have already checked the field in f2fs_sanity_check_ckpt().
    - If there is fs corruption or a security vulnerability, nothing
      guarantees the field stays intact after the check, unless we do
      the check before each of its uses; however, no filesystem does
      that.
    - We only have a similar check for bitmaps, which was added because
      bitmap corruption had happened at f2fs runtime in a product.
    - There are many key fields in SB/CP/NAT that don't have such a
      check after f2fs_sanity_check_{sb,cp,..}.

    So I propose to revert this unneeded check.

    This reverts commit 56f3ce675103e3fb9e631cfb4131fc768bc23e9a.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • We do not need to set the SBI_NEED_FSCK flag in the error paths: if
    we return an error here, we will not update the checkpoint flag, so
    the code is useless; just remove it.

    Signed-off-by: Lihong Kou
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Lihong Kou
     
  • =============================================================================
    BUG discard_cmd (Tainted: G B OE ): Objects remaining in discard_cmd on __kmem_cache_shutdown()
    -----------------------------------------------------------------------------

    INFO: Slab 0xffffe1ac481d22c0 objects=36 used=2 fp=0xffff936b4748bf50 flags=0x2ffff0000000100
    Call Trace:
    dump_stack+0x63/0x87
    slab_err+0xa1/0xb0
    __kmem_cache_shutdown+0x183/0x390
    shutdown_cache+0x14/0x110
    kmem_cache_destroy+0x195/0x1c0
    f2fs_destroy_segment_manager_caches+0x21/0x40 [f2fs]
    exit_f2fs_fs+0x35/0x641 [f2fs]
    SyS_delete_module+0x155/0x230
    ? vtime_user_exit+0x29/0x70
    do_syscall_64+0x6e/0x160
    entry_SYSCALL64_slow_path+0x25/0x25

    INFO: Object 0xffff936b4748b000 @offset=0
    INFO: Object 0xffff936b4748b070 @offset=112
    kmem_cache_destroy discard_cmd: Slab cache still has objects
    Call Trace:
    dump_stack+0x63/0x87
    kmem_cache_destroy+0x1b4/0x1c0
    f2fs_destroy_segment_manager_caches+0x21/0x40 [f2fs]
    exit_f2fs_fs+0x35/0x641 [f2fs]
    SyS_delete_module+0x155/0x230
    do_syscall_64+0x6e/0x160
    entry_SYSCALL64_slow_path+0x25/0x25

    Recovery can cache discard commands, so in the error path of
    fill_super() we need to give them a chance to be handled; otherwise
    it leads to a leak of the discard_cmd slab cache.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

11 Jul, 2019

2 commits

  • blkoff_off might exceed 512 due to fs corruption or a security
    vulnerability, and should be checked before being used.

    Use ENTRIES_IN_SUM to guard against invalid values in
    cur_data_blkoff.

    Signed-off-by: Ocean Chen
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Ocean Chen
     
  • On umount, we give a constant amount of time to handle pending
    discards. Previously, __issue_discard_cmd() missed checking the
    timeout condition in its loop, resulting in a long delay; fix it.

    Signed-off-by: Heng Xiao
    [Chao Yu: add commit message]
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Heng Xiao
     

03 Jul, 2019

4 commits

  • f2fs has always used EFAULT as the error number to indicate that the
    filesystem is corrupted, but generic filesystems use EUCLEAN for
    that condition, so we should change f2fs to follow the others.

    This patch adds two new macros as below to wrap more generic error
    code macros, and spread them in code.

    #define EFSBADCRC	EBADMSG		/* Bad CRC detected */
    #define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */

    Reported-by: Pavel Machek
    Signed-off-by: Chao Yu
    Acked-by: Pavel Machek
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Pavel reported, once we detect filesystem inconsistency in
    f2fs_inplace_write_data(), it will be better to print kernel message as
    we did in other places.

    Reported-by: Pavel Machek
    Signed-off-by: Chao Yu
    Acked-by: Pavel Machek
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • - Add and use f2fs_ macros
    - Convert f2fs_msg to f2fs_printk
    - Remove level from f2fs_printk and embed the level in the format
    - Coalesce formats and align multi-line arguments
    - Remove unnecessary duplicate extern f2fs_msg f2fs.h

    Signed-off-by: Joe Perches
    Signed-off-by: Chao Yu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Joe Perches
     
  • This ioctl shrinks a given length (aligned to sections) from end of the
    main area. Any cursegs and valid blocks will be moved out before
    invalidating the range.

    This feature can be used for adjusting partition sizes online.

    History of the patch:

    Sahitya Tummala:
    - Add this ioctl for f2fs_compat_ioctl() as well.
    - Fix debugfs status to reflect the online resize changes.
    - Fix potential race between online resize path and allocate new data
    block path or gc path.

    Others:
    - Rename some identifiers.
    - Add some error handling branches.
    - Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range.
    - Implement this interface like ext4's, and change the parameter
    from bytes shrunk to the new block count of F2FS.
    - During resizing, force to empty sit_journal and forbid adding new
    entries to it, in order to avoid invalid segno in journal after resize.
    - Reduce sbi->user_block_count before resize starts.
    - Commit the updated superblock first, and then update in-memory metadata
    only when the former succeeds.
    - Target block count must align to sections.
    - Write checkpoint before and after committing the new superblock, w/o
    CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if
    resize fails after the new superblock is committed.
    - In free_segment_range(), reduce granularity of gc_mutex.
    - Add protection on curseg migration.
    - Add freeze_bdev() and thaw_bdev() for resize fs.
    - Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation.
    - Recover super_block and FS metadata when resize fails.
    - No need to clear CP_FSCK_FLAG in update_ckpt_flags().
    - Clean up the sb and fs metadata update functions for resize_fs.

    Geert Uytterhoeven:
    - Use div_u64*() for 64-bit divisions

    Arnd Bergmann:
    - Not all architectures support get_user() with a 64-bit argument:
    ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined!
    Use copy_from_user() here, this will always work.

    Signed-off-by: Qiuyang Sun
    Signed-off-by: Chao Yu
    Signed-off-by: Sahitya Tummala
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Qiuyang Sun
     

04 Jun, 2019

2 commits

  • This extends the checkpoint option to allow checkpoint=disable:%u[%].
    This lets you specify how much of the disk you are willing to lose
    access to while mounting with checkpoint=disable. If the amount lost
    would be higher, the mount will return -EAGAIN. The value can be
    given as a percent of total space, or in blocks.

    Currently, we need to run garbage collection until the amount of holes
    is smaller than the OVP space. With the new option, f2fs can mark
    space as unusable up front instead of requiring garbage collection until
    the number of holes is small enough.

    Signed-off-by: Daniel Rosenberg
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Daniel Rosenberg
     
  • The existing threshold for allowable holes at checkpoint=disable time is
    too high. The OVP space contains reserved segments, which are always in
    the form of free segments. These must be subtracted from the OVP value.

    The current threshold is meant to be the maximum value of holes of a
    single type we can have and still guarantee that we can fill the disk
    without failing to find space for a block of a given type.

    If the disk is full, ignoring current reserved, which only helps us,
    the amount of unused blocks is equal to the OVP area. Of that, there
    are reserved segments, which must be free segments, and the rest of the
    ovp area, which can come from either free segments or holes. The maximum
    possible amount of holes is OVP-reserved.

    Now, consider the disk when mounting with checkpoint=disable.
    We must be able to fill all available free space with either data or
    node blocks. When we start with checkpoint=disable, holes are locked to
    their current type. Say we have H of one type of hole, and H+X of the
    other. We can fill H of that space with arbitrary typed blocks via SSR.
    For the remaining H+X blocks, we may not have any of a given block type
    left at all. For instance, if we were to fill the disk entirely with
    blocks of the type with fewer holes, the H+X blocks of the opposite type
    would not be used. If H+X > OVP-reserved, there would be more holes than
    could possibly exist, and we would have failed to find a suitable block
    earlier on, leading to a crash in update_sit_entry.

    If H+X
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Daniel Rosenberg