10 Jun, 2020

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've added some knobs to enhance compression feature
    and harden testing environment. In addition, we've fixed several bugs
    reported from Android devices such as long discarding latency, device
    hanging during quota_sync, etc.

    Enhancements:
    - support lzo-rle algorithm
    - add two ioctls to release and reserve blocks for compression
    - support partial truncation/fiemap on compressed file
    - introduce sysfs entries to attach IO flags explicitly
    - add iostat trace point along with read io stat

    Bug fixes:
    - fix long discard latency
    - flush quota data by f2fs_quota_sync correctly
    - fix to recover parent inode number for power-cut recovery
    - fix lz4/zstd output buffer budget
    - parse checkpoint mount option correctly
    - avoid infinite loop to wait for flushing node/meta pages
    - manage discard space correctly

    And some refactoring and clean up patches were added"

    * tag 'f2fs-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits)
    f2fs: attach IO flags to the missing cases
    f2fs: add node_io_flag for bio flags likewise data_io_flag
    f2fs: remove unused parameter of f2fs_put_rpages_mapping()
    f2fs: handle readonly filesystem in f2fs_ioc_shutdown()
    f2fs: avoid utf8_strncasecmp() with unstable name
    f2fs: don't return vmalloc() memory from f2fs_kmalloc()
    f2fs: fix retry logic in f2fs_write_cache_pages()
    f2fs: fix wrong discard space
    f2fs: compress: don't compress any datas after cp stop
    f2fs: remove unneeded return value of __insert_discard_tree()
    f2fs: fix wrong value of tracepoint parameter
    f2fs: protect new segment allocation in expand_inode_data
    f2fs: code cleanup by removing ifdef macro surrounding
    f2fs: avoid inifinite loop to wait for flushing node pages at cp_error
    f2fs: flush dirty meta pages when flushing them
    f2fs: fix checkpoint=disable:%u%%
    f2fs: compress: fix zstd data corruption
    f2fs: add compressed/gc data read IO stat
    f2fs: fix potential use-after-free issue
    f2fs: compress: don't handle non-compressed data in workqueue
    ...

    Linus Torvalds
     

09 Jun, 2020

2 commits

  • This patch adds another way to attach bio flags to node writes.

    Description: Give a way to attach REQ_META|FUA to node writes
    given temperature-based bits. Now the bits indicate:
    *      REQ_META     |      REQ_FUA      |
    *    5 |    4 |   3 |    2 |    1 |   0 |
    * Cold | Warm | Hot | Cold | Warm | Hot |
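
    The bit layout above can be decoded as in the following user-space
    sketch. It is only an illustration under stated assumptions: the
    DEMO_* names and demo_io_flags() are hypothetical, the Hot/Warm/Cold
    indices 0/1/2 are assumed from the bit order in the table, and the
    two DEMO_REQ_* values are stand-ins rather than the block layer's
    real REQ_FUA and REQ_META bits.

```c
#include <assert.h>

/* Temperature indices, assumed to match the Hot/Warm/Cold bit order above. */
enum demo_temp { DEMO_HOT = 0, DEMO_WARM = 1, DEMO_COLD = 2, DEMO_NR_TEMP = 3 };

/* Hypothetical stand-ins for the block layer's REQ_FUA and REQ_META bits. */
#define DEMO_REQ_FUA  (1u << 0)
#define DEMO_REQ_META (1u << 1)

/*
 * Decode a 6-bit io flag value for one temperature:
 * bits 0-2 select REQ_FUA, bits 3-5 select REQ_META.
 */
static unsigned int demo_io_flags(unsigned int io_flag, enum demo_temp temp)
{
	unsigned int temp_mask = (1u << DEMO_NR_TEMP) - 1;
	unsigned int fua_bits = io_flag & temp_mask;
	unsigned int meta_bits = (io_flag >> DEMO_NR_TEMP) & temp_mask;
	unsigned int op_flags = 0;

	assert(temp < DEMO_NR_TEMP);
	if (fua_bits & (1u << temp))
		op_flags |= DEMO_REQ_FUA;
	if (meta_bits & (1u << temp))
		op_flags |= DEMO_REQ_META;
	return op_flags;
}
```

    With this decoding, a value of 0x20 (bit 5 only) would attach the
    META stand-in to Cold writes and nothing to Hot or Warm writes.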

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • kmalloc() returns kmalloc'ed memory, and kvmalloc() returns either
    kmalloc'ed or vmalloc'ed memory. But the f2fs wrappers, f2fs_kmalloc()
    and f2fs_kvmalloc(), both return both kinds of memory.

    It's redundant to have two functions that do the same thing, and also
    breaking the standard naming convention is causing bugs since people
    assume it's safe to kfree() memory allocated by f2fs_kmalloc(). See
    e.g. the various allocations in fs/f2fs/compress.c.

    Fix this by making f2fs_kmalloc() just use kmalloc(). And to avoid
    re-introducing the allocation failures that the vmalloc fallback was
    intended to fix, convert the largest allocations to use f2fs_kvmalloc().

    Signed-off-by: Eric Biggers
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Eric Biggers
     

03 Jun, 2020

2 commits

  • Since the new pair of functions has been introduced, we can call them to
    clean up the code in f2fs.h.

    Signed-off-by: Guoqing Jiang
    Signed-off-by: Andrew Morton
    Acked-by: Chao Yu
    Cc: Jaegeuk Kim
    Link: http://lkml.kernel.org/r/20200517214718.468-6-guoqing.jiang@cloud.ionos.com
    Signed-off-by: Linus Torvalds

    Guoqing Jiang
     
  • ext4 and f2fs have duplicated the guts of the readahead code so they can
    read past i_size. Instead, separate out the guts of the readahead code
    so they can call it directly.

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Tested-by: Eric Biggers
    Reviewed-by: Christoph Hellwig
    Reviewed-by: William Kucharski
    Reviewed-by: Eric Biggers
    Cc: Chao Yu
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: John Hubbard
    Cc: Joseph Qi
    Cc: Junxiao Bi
    Cc: Michal Hocko
    Cc: Zi Yan
    Cc: Johannes Thumshirn
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-14-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

19 May, 2020

2 commits

  • v1 encryption policies are deprecated in favor of v2, and some new
    features (e.g. encryption+casefolding) are only being added for v2.

    Therefore, the "test_dummy_encryption" mount option (which is used for
    encryption I/O testing with xfstests) needs to support v2 policies.

    To do this, extend its syntax to be "test_dummy_encryption=v1" or
    "test_dummy_encryption=v2". The existing "test_dummy_encryption" (no
    argument) also continues to be accepted, to specify the default setting
    -- currently v1, but the next patch changes it to v2.
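
    The extended option syntax can be sketched as a small user-space
    parser. This is a hedged illustration, not the ext4/f2fs code: the
    demo_* names and the result enum are hypothetical, and the patch
    itself keeps a pointer to a dummy fscrypt_context rather than an
    enum value.

```c
#include <string.h>

/* Hypothetical result codes used only by this sketch. */
enum demo_policy {
	DEMO_POLICY_NONE,	/* not a test_dummy_encryption option */
	DEMO_POLICY_V1,
	DEMO_POLICY_V2,
	DEMO_POLICY_INVALID,	/* unknown argument after '=' */
};

/* Default used when no argument is given; v1 as of this patch. */
#define DEMO_DEFAULT_POLICY DEMO_POLICY_V1

static enum demo_policy demo_parse_dummy_encryption(const char *opt)
{
	static const char name[] = "test_dummy_encryption";
	size_t len = sizeof(name) - 1;

	if (strncmp(opt, name, len) != 0)
		return DEMO_POLICY_NONE;	/* some other mount option */
	if (opt[len] == '\0')
		return DEMO_DEFAULT_POLICY;	/* bare form, no argument */
	if (opt[len] != '=')
		return DEMO_POLICY_NONE;	/* longer, different option name */
	if (strcmp(opt + len + 1, "v1") == 0)
		return DEMO_POLICY_V1;
	if (strcmp(opt + len + 1, "v2") == 0)
		return DEMO_POLICY_V2;
	return DEMO_POLICY_INVALID;
}
```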

    To cleanly support both v1 and v2 while also making it easy to support
    specifying other encryption settings in the future (say, accepting
    "$contents_mode:$filenames_mode:v2"), make ext4 and f2fs maintain a
    pointer to the dummy fscrypt_context rather than using mount flags.

    To avoid concurrency issues, don't allow test_dummy_encryption to be set
    or changed during a remount. (The former restriction is new, but
    xfstests doesn't run into it, so no one should notice.)

    Tested with 'gce-xfstests -c {ext4,f2fs}/encrypt -g auto'. On ext4,
    there are two regressions, both of which are test bugs: ext4/023 and
    ext4/028 fail because they set an xattr and expect it to be stored
    inline, but the increase in size of the fscrypt_context from
    24 to 40 bytes causes this xattr to be spilled into an external block.

    Link: https://lore.kernel.org/r/20200512233251.118314-4-ebiggers@kernel.org
    Acked-by: Jaegeuk Kim
    Reviewed-by: Theodore Ts'o
    Signed-off-by: Eric Biggers

    Eric Biggers
     
  • When parsing the mount option, sbi->user_block_count is not available
    yet, so the option should be processed after that value has been obtained.

    Cc:
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

12 May, 2020

10 commits

  • Add compressed and GC data read IO stats, in order to account data read
    IOs more accurately.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Sahitya raised an issue:
    - prevent meta updates while checkpoint is in progress

    allocate_segment_for_resize() can cause metapage updates if
    it needs to change the current node/data segments for resizing.
    Stop these meta updates when there is a checkpoint already
    in progress to prevent inconsistent CP data.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch introduces a new ioctl to rollback all compress inode
    status:
    - add reserved blocks in dnode blocks
    - increase i_compr_blocks, i_blocks, total_valid_block_count
    - remove immutable flag

    Then compress inode can be restored to support overwrite
    functionality again.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • There could be a scenario where f2fs_sync_node_pages gets
    called during checkpoint, which in turn tries to flush
    inline data and calls iput(). This results in deadlock as
    iput() tries to hold cp_rwsem, which is already held at the
    beginning by checkpoint->block_operations().

    Call stack:

    Thread A                            Thread B
    f2fs_write_checkpoint()
    - block_operations(sbi)
     - f2fs_lock_all(sbi);
      - down_write(&sbi->cp_rwsem);

                                        - open()
                                         - igrab()
                                        - write() write inline data
                                        - unlink()
    - f2fs_sync_node_pages()
     - if (is_inline_node(page))
      - flush_inline_data()
       - ilookup()
         page = f2fs_pagecache_get_page()
         if (!page)
          goto iput_out;
         iput_out:
                                        - close()
                                        - iput()
           iput(inode);
           - f2fs_evict_inode()
            - f2fs_truncate_blocks()
             - f2fs_lock_op()
               - down_read(&sbi->cp_rwsem);

    Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
    Signed-off-by: Sayali Lokhande
    Signed-off-by: Jaegeuk Kim

    Sayali Lokhande
     
  • update_sit_info should be f2fs_update_sit_info,
    otherwise the build fails when CONFIG_F2FS_STAT_FS is not set.

    Fixes: fc7100ea2a52 ("f2fs: Add f2fs stats to sysfs")
    Signed-off-by: YueHaibing
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    YueHaibing
     
  • There are still reserved blocks on a compressed inode. This patch
    introduces a new ioctl to release those reserved blocks back to the
    filesystem, so that userspace can reuse the freed space.

    ----
    Daeho fixed a bug like below.

    Now, if writing pages and releasing compress blocks occur
    simultaneously, and releasing cblocks is executed more than once
    on a file, the total block count of the filesystem and the block count
    of the file can become incorrect.

    We have to release compress blocks only once per file, without
    interference from the writepages path.
    ---

    Signed-off-by: Chao Yu
    Signed-off-by: Daeho Jeong
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Rework f2fs's handling of filenames to use a new 'struct f2fs_filename'.
    Similar to 'struct ext4_filename', this stores the usr_fname, disk_name,
    dirhash, crypto_buf, and casefolded name. Some of these names can be
    NULL in some cases. 'struct f2fs_filename' differs from
    'struct fscrypt_name' mainly in that the casefolded name is included.

    For user-initiated directory operations like lookup() and create(),
    initialize the f2fs_filename by translating the corresponding
    fscrypt_name, then computing the dirhash and casefolded name if needed.

    This makes the dirhash and casefolded name be cached for each syscall,
    so we don't have to recompute them repeatedly. (Previously, f2fs
    computed the dirhash once per directory level, and the casefolded name
    once per directory block.) This improves performance.

    This rework also makes it much easier to correctly handle all
    combinations of normal, encrypted, casefolded, and encrypted+casefolded
    directories. (The fourth isn't supported yet but is being worked on.)

    The only other cases where an f2fs_filename gets initialized are for two
    filesystem-internal operations: (1) when converting an inline directory
    to a regular one, we grab the needed disk_name and hash from an existing
    f2fs_dir_entry; and (2) when roll-forward recovering a new dentry, we
    grab the needed disk_name from f2fs_inode::i_name and compute the hash.

    Signed-off-by: Eric Biggers
    Signed-off-by: Jaegeuk Kim

    Eric Biggers
     
  • Sharing f2fs_ci_compare() between comparing cached dentries
    (f2fs_d_compare()) and comparing on-disk dentries (f2fs_match_name())
    doesn't work as well as intended, as these actions fundamentally differ
    in several ways (e.g. whether the task may sleep, whether the directory
    is stable, whether the casefolded name was precomputed, whether the
    dentry will need to be decrypted once we allow casefold+encrypt, etc.)

    Just make f2fs_d_compare() implement what it needs directly, and rework
    f2fs_ci_compare() to be specialized for f2fs_match_name().

    Signed-off-by: Eric Biggers
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Eric Biggers
     
  • The LZO-RLE extension (run-length encoding) was introduced to improve
    the performance of the LZO algorithm when the data contains many zeros,
    and zram has switched to this extended algorithm by default. This
    patch adds support for the extension. To enable it, set the F2FS_FS_LZO
    and F2FS_FS_LZORLE config options and specify the
    "compress_algorithm=lzo-rle" mount option.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • If the compression feature is on and free memory is low, the page
    refault ratio is higher than before. The root cause is:
    - The {,de}compression flow needs to allocate intermediate pages to store
    compressed data in a cluster, so during their allocation, the VM may
    reclaim mmaped pages.
    - If the reclaimed pages belong to a compressed cluster, refaulting them
    may cause more intermediate page allocations, which results in
    reclaiming more mmaped pages.

    So this patch introduces a mempool for intermediate page allocation in
    order to avoid the high refault ratio. By default, the number of
    preallocated pages in the pool is 512; the user can change it by
    setting the 'num_compress_pages' parameter during module initialization.
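
    The idea of a preallocated reserve can be illustrated with a small
    user-space sketch. This is not the kernel mempool API and not the
    code from this patch: all demo_* names are hypothetical, the reserve
    is shrunk from 512 to 4 pages to keep the example short, and the
    pluggable allocator exists only so a failure can be simulated.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096
#define DEMO_POOL_MIN  4	/* the patch defaults to 512; kept small here */

struct demo_pool {
	void *reserve[DEMO_POOL_MIN];	/* pages set aside at init time */
	size_t nr;			/* pages currently held in reserve */
	void *(*alloc)(size_t);		/* normal allocator, may fail */
};

/* Allocator that always fails, used to simulate memory pressure. */
static void *demo_failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Fill the reserve up front; returns 0 on success, -1 on failure. */
static int demo_pool_init(struct demo_pool *p, void *(*alloc)(size_t))
{
	p->nr = 0;
	p->alloc = alloc;
	while (p->nr < DEMO_POOL_MIN) {
		void *page = malloc(DEMO_PAGE_SIZE);

		if (!page) {
			while (p->nr)
				free(p->reserve[--p->nr]);
			return -1;
		}
		p->reserve[p->nr++] = page;
	}
	return 0;
}

/* Try the normal allocator first, then fall back to the reserve. */
static void *demo_pool_alloc(struct demo_pool *p)
{
	void *page = p->alloc(DEMO_PAGE_SIZE);

	if (page)
		return page;
	return p->nr ? p->reserve[--p->nr] : NULL;
}

/* Refill the reserve before handing memory back to the system. */
static void demo_pool_free(struct demo_pool *p, void *page)
{
	if (p->nr < DEMO_POOL_MIN)
		p->reserve[p->nr++] = page;
	else
		free(page);
}
```

    Because the fallback path takes pages that were set aside earlier, an
    allocation under memory pressure does not have to trigger further
    reclaim, which is the effect the patch is after.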

    Ma Feng found warnings in the original patch and fixed like below.

    Fix the following sparse warning:
    fs/f2fs/compress.c:501:5: warning: symbol 'num_compress_pages' was not declared.
    Should it be static?
    fs/f2fs/compress.c:530:6: warning: symbol 'f2fs_compress_free_page' was not
    declared. Should it be static?

    Reported-by: Hulk Robot
    Signed-off-by: Chao Yu
    Signed-off-by: Ma Feng
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

08 May, 2020

3 commits


18 Apr, 2020

2 commits


17 Apr, 2020

1 commit

  • This patch introduces a way to attach REQ_META/FUA explicitly
    to all the data writes given temperature.

    -> attach REQ_FUA to Hot Data writes

    -> attach REQ_FUA to Hot|Warm Data writes

    -> attach REQ_FUA to Hot|Warm|Cold Data writes

    -> attach REQ_FUA to Hot|Warm|Cold Data writes as well as
    REQ_META to Hot Data writes

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

08 Apr, 2020

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've mainly focused on fixing bugs and addressing
    issues in recently introduced compression support.

    Enhancement:
    - add zstd support, and set LZ4 by default
    - add ioctl() to show # of compressed blocks
    - show mount time in debugfs
    - replace rwsem with spinlock
    - avoid lock contention in DIO reads

    Some major bug fixes wrt compression:
    - compressed block count
    - memory access and leak
    - remove obsolete fields
    - flag controls

    Other bug fixes and clean ups:
    - fix overflow when handling .flags in inode_info
    - fix SPO issue during resize FS flow
    - fix compression with fsverity enabled
    - potential deadlock when writing compressed pages
    - show missing mount options"

    * tag 'f2fs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits)
    f2fs: keep inline_data when compression conversion
    f2fs: fix to disable compression on directory
    f2fs: add missing CONFIG_F2FS_FS_COMPRESSION
    f2fs: switch discard_policy.timeout to bool type
    f2fs: fix to verify tpage before releasing in f2fs_free_dic()
    f2fs: show compression in statx
    f2fs: clean up dic->tpages assignment
    f2fs: compress: support zstd compress algorithm
    f2fs: compress: add .{init,destroy}_decompress_ctx callback
    f2fs: compress: fix to call missing destroy_compress_ctx()
    f2fs: change default compression algorithm
    f2fs: clean up {cic,dic}.ref handling
    f2fs: fix to use f2fs_readpage_limit() in f2fs_read_multi_pages()
    f2fs: xattr.h: Make stub helpers inline
    f2fs: fix to avoid double unlock
    f2fs: fix potential .flags overflow on 32bit architecture
    f2fs: fix NULL pointer dereference in f2fs_verity_work()
    f2fs: fix to clear PG_error if fsverity failed
    f2fs: don't call fscrypt_get_encryption_info() explicitly in f2fs_tmpfile()
    f2fs: don't trigger data flush in foreground operation
    ...

    Linus Torvalds
     

04 Apr, 2020

3 commits


31 Mar, 2020

4 commits

  • f2fs_inode_info.flags is an unsigned long variable, so it has only 32
    bits on 32-bit architectures. Since the FI_MMAP_FILE flag was introduced
    with data compression support, we may access memory beyond the end of
    the .flags field and corrupt the .i_sem field, resulting in the
    deadlock below.

    To fix this issue, let's expand .flags into an array with enough
    space to store the new flags.
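
    A user-space sketch of sizing a flag bitmap from the flag count,
    rather than assuming one unsigned long is wide enough, might look as
    follows. The demo_* names, the flag count of 40 and the i_sem
    stand-in are all hypothetical; only the approach (an array of longs
    indexed by flag number) mirrors what the message describes.

```c
#include <limits.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) \
	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Hypothetical flag count, chosen to exceed 32 bits. */
#define DEMO_FI_MAX 40

struct demo_inode_info {
	/* Sized from the flag count, so it grows with new flags. */
	unsigned long flags[DEMO_BITS_TO_LONGS(DEMO_FI_MAX)];
	int i_sem;	/* stand-in for the neighbouring field */
};

static void demo_set_flag(struct demo_inode_info *fi, unsigned int flag)
{
	fi->flags[flag / DEMO_BITS_PER_LONG] |=
		1UL << (flag % DEMO_BITS_PER_LONG);
}

static bool demo_test_flag(const struct demo_inode_info *fi, unsigned int flag)
{
	return (fi->flags[flag / DEMO_BITS_PER_LONG] >>
		(flag % DEMO_BITS_PER_LONG)) & 1UL;
}
```

    With a 32-bit long the array has two elements and flag 35 lands in
    the second one; with a 64-bit long a single element is enough.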

    Call Trace:
    __schedule+0x8d0/0x13fc
    ? mark_held_locks+0xac/0x100
    schedule+0xcc/0x260
    rwsem_down_write_slowpath+0x3ab/0x65d
    down_write+0xc7/0xe0
    f2fs_drop_nlink+0x3d/0x600 [f2fs]
    f2fs_delete_inline_entry+0x300/0x440 [f2fs]
    f2fs_delete_entry+0x3a1/0x7f0 [f2fs]
    f2fs_unlink+0x500/0x790 [f2fs]
    vfs_unlink+0x211/0x490
    do_unlinkat+0x483/0x520
    sys_unlink+0x4a/0x70
    do_fast_syscall_32+0x12b/0x683
    entry_SYSENTER_32+0xaa/0x102

    Fixes: 4c8ff7095bef ("f2fs: support data compression")
    Tested-by: Ondrej Jirman
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Data flush can generate heavy IO and cause long latency during
    flush, so it's not appropriate to trigger it in foreground
    operation.

    In addition, we may hit the potential deadlock below during data flush:
    - f2fs_write_multi_pages
     - f2fs_write_raw_pages
      - f2fs_write_single_data_page
       - f2fs_balance_fs
        - f2fs_balance_fs_bg
         - f2fs_sync_dirty_inodes
          - filemap_fdatawrite -- stuck on flush same cluster

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Merge below two conditions into f2fs_may_encrypt() for cleanup
    - IS_ENCRYPTED()
    - DUMMY_ENCRYPTION_ENABLED()

    Check IS_ENCRYPTED(inode) condition in f2fs_init_inode_metadata()
    is enough since we have already set encrypt flag in f2fs_new_inode().

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • - f2fs_iget
     - do_read_inode
      - set_inode_flag(, FI_COMPRESSED_FILE)
       - __mark_inode_dirty_flag(, true)

    Marking the inode dirty on this path is unnecessary, so let's only mark
    a compressed inode dirty during compressed inode conversion.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

25 Mar, 2020

1 commit


23 Mar, 2020

1 commit

  • It's been observed that kzalloc() in lookup_all_xattrs() is called millions
    of times on Android, quickly becoming the top abuser of the slub memory
    allocator.

    Use a dedicated kmem cache pool for xattr lookups to mitigate this.

    Signed-off-by: Park Ju Hyung
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

20 Mar, 2020

7 commits

  • __f2fs_bio_alloc() cannot fail because it is backed by a memory pool, so
    remove the unneeded __GFP_NOFAIL flag in __f2fs_bio_alloc().

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • With this newly introduced interface, the user can get the number of
    blocks that compression saved in the target inode.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • If we are in write IO path, we need to avoid using GFP_KERNEL.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Geert Uytterhoeven reported:

    for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

    On some platforms, HZ can be less than 50, then unexpected 0 timeout
    jiffies will be set in congestion_wait().

    This patch introduces a macro DEFAULT_IO_TIMEOUT that wraps a fixed
    value, msecs_to_jiffies(20), and uses it instead of HZ/50 to avoid this
    issue.

    Quoted from Geert Uytterhoeven:

    "A timeout of HZ means 1 second.
    HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

    If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
    as that takes care of the special cases, and never returns 0."
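
    The arithmetic behind the quote can be checked with a short sketch.
    Both helpers are hypothetical, and demo_msecs_to_jiffies() is only a
    simplified round-up model of the kernel's msecs_to_jiffies(), not its
    actual implementation.

```c
/* Old expression: integer division truncates to 0 when hz < 50. */
static unsigned long demo_hz_div_50(unsigned long hz)
{
	return hz / 50;
}

/*
 * Simplified model of msecs_to_jiffies(): convert milliseconds to ticks
 * and round up, so a non-zero timeout never becomes 0 jiffies.
 */
static unsigned long demo_msecs_to_jiffies(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999) / 1000;
}
```

    For hz = 24, for instance, the old expression gives 0 while the
    rounded-up conversion of 20 ms gives 1; for hz = 100 both give 2.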

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • There are three states for background gc: on, off and sync. It is
    a little confusing to use combinations of test_opt(BG_GC) and
    test_opt(FORCE_FG_GC) to indicate the state of background gc.

    So let's remove the F2FS_MOUNT_BG_GC and F2FS_MOUNT_FORCE_FG_GC mount
    options, and add F2FS_OPTION().bggc_mode with the three states below
    to clean up the code and make the bggc mode easier to extend.

    enum {
        BGGC_MODE_ON,       /* background gc is on */
        BGGC_MODE_OFF,      /* background gc is off */
        BGGC_MODE_SYNC,     /*
                             * background gc is on, migrating blocks
                             * like foreground gc
                             */
    };
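
    As a rough illustration of why a single three-state field is easier
    to work with than two flag tests, the sketch below maps the state
    names from this message onto an enum. The demo_* names and both
    helper functions are hypothetical and are not taken from the patch.

```c
#include <string.h>

enum demo_bggc_mode {
	DEMO_BGGC_MODE_ON,	/* background gc is on */
	DEMO_BGGC_MODE_OFF,	/* background gc is off */
	DEMO_BGGC_MODE_SYNC,	/* on, migrating blocks like foreground gc */
};

/* Map a state name to a mode; returns 0 on success, -1 if unknown. */
static int demo_parse_bggc_mode(const char *arg, enum demo_bggc_mode *mode)
{
	if (strcmp(arg, "on") == 0)
		*mode = DEMO_BGGC_MODE_ON;
	else if (strcmp(arg, "off") == 0)
		*mode = DEMO_BGGC_MODE_OFF;
	else if (strcmp(arg, "sync") == 0)
		*mode = DEMO_BGGC_MODE_SYNC;
	else
		return -1;
	return 0;
}

/* One comparison answers what previously needed a flag combination. */
static int demo_bggc_enabled(enum demo_bggc_mode mode)
{
	return mode != DEMO_BGGC_MODE_OFF;
}
```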

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch removes the F2FS_MOUNT_ADAPTIVE and F2FS_MOUNT_LFS mount options,
    and adds F2FS_OPTION.fs_mode with the two states below to indicate the
    filesystem mode.

    enum {
        FS_MODE_ADAPTIVE,   /* use both lfs/ssr allocation */
        FS_MODE_LFS,        /* use lfs allocation only */
    };

    It can enhance code readability and fs mode's scalability.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Previously, the 'norecovery' mount option was shown as
    'disable_roll_forward'; fix this to show the original option name.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu