06 Jan, 2021

1 commit

  • commit e584bbe821229a3e7cc409eecd51df66f9268c21 upstream.

    syzbot reported a bug that could cause a shift-out-of-bounds issue;
    fix it.

    Call Trace:
    __dump_stack lib/dump_stack.c:79 [inline]
    dump_stack+0x107/0x163 lib/dump_stack.c:120
    ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
    __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
    sanity_check_raw_super fs/f2fs/super.c:2812 [inline]
    read_raw_super_block fs/f2fs/super.c:3267 [inline]
    f2fs_fill_super.cold+0x16c9/0x16f6 fs/f2fs/super.c:3519
    mount_bdev+0x34d/0x410 fs/super.c:1366
    legacy_get_tree+0x105/0x220 fs/fs_context.c:592
    vfs_get_tree+0x89/0x2f0 fs/super.c:1496
    do_new_mount fs/namespace.c:2896 [inline]
    path_mount+0x12ae/0x1e70 fs/namespace.c:3227
    do_mount fs/namespace.c:3240 [inline]
    __do_sys_mount fs/namespace.c:3448 [inline]
    __se_sys_mount fs/namespace.c:3425 [inline]
    __x64_sys_mount+0x27f/0x300 fs/namespace.c:3425
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Reported-by: syzbot+ca9a785f8ac472085994@syzkaller.appspotmail.com
    Signed-off-by: Anant Thazhemadam
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Chao Yu
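
A minimal sketch of the kind of check that avoids this class of bug, assuming the out-of-range shift comes from an exponent stored on disk (called log_blocksize here for illustration). The idea is to compare the exponent against the expected value rather than computing 1 << exponent from untrusted input. The names below are hypothetical and this is not the upstream patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: only 4 KiB blocks (2^12 bytes) are supported. */
#define SKETCH_BLKSIZE_BITS 12u

/*
 * Validate the exponent itself. Evaluating 1u << log_blocksize first would be
 * undefined behavior whenever log_blocksize >= 32, which a corrupted image
 * can easily provide.
 */
bool sketch_log_blocksize_is_valid(uint32_t log_blocksize)
{
    return log_blocksize == SKETCH_BLKSIZE_BITS;
}
```

Because no shift is performed, any value in the field, including 0xffffffff, is rejected without tripping UBSAN.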
     

30 Dec, 2020

1 commit

  • [ Upstream commit 89ff6005039a878afac87889fee748fa3f957c3a ]

    When fill_super is retried with skip_recovery, s_encoding for
    casefold is not loaded again even though it has already been freed,
    because the pointer is not NULL.
    Set it to NULL after freeing to prevent a double free on unmount.

    Fixes: eca4873ee1b6 ("f2fs: Use generic casefolding support")
    Signed-off-by: Hyeongseok Kim
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sasha Levin

    Hyeongseok Kim
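
The free-then-clear pattern described above can be sketched in userspace C as follows. The struct and function names are hypothetical stand-ins, and free() stands in for whatever unload routine the kernel uses.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the superblock state that owns the encoding. */
struct sketch_sb {
    void *encoding;
};

/* Free the encoding and clear the pointer so a later call is a harmless no-op. */
void sketch_unload_encoding(struct sketch_sb *sb)
{
    free(sb->encoding);   /* free(NULL) is defined to do nothing */
    sb->encoding = NULL;  /* without this, a retry or unmount would free it again */
}
```

Clearing the pointer also lets a "load only if NULL" check on the retry path see that the encoding has to be loaded again.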
     

25 Oct, 2020

1 commit

  • Pull misc vfs updates from Al Viro:
    "Assorted stuff all over the place (the largest group here is
    Christoph's stat cleanups)"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: remove KSTAT_QUERY_FLAGS
    fs: remove vfs_stat_set_lookup_flags
    fs: move vfs_fstatat out of line
    fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat
    fs: remove vfs_statx_fd
    fs: omfs: use kmemdup() rather than kmalloc+memcpy
    [PATCH] reduce boilerplate in fsid handling
    fs: Remove duplicated flag O_NDELAY occurring twice in VALID_OPEN_FLAGS
    selftests: mount: add nosymfollow tests
    Add a "nosymfollow" mount option.

    Linus Torvalds
     

17 Oct, 2020

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've added new features such as zone capacity for ZNS
    and a new GC policy, ATGC, along with in-memory segment management. In
    addition, we could improve the decompression speed significantly by
    changing virtual mapping method. Even though we've fixed lots of small
    bugs in compression support, I feel that it becomes more stable so
    that I could give it a try in production.

    Enhancements:
    - support zone capacity in NVMe Zoned Namespace devices
    - introduce in-memory current segment management
    - add standard casefolding support
    - support age threshold based garbage collection
    - improve decompression speed by changing virtual mapping method

    Bug fixes:
    - fix condition checks in some ioctl() such as compression, move_range, etc
    - fix 32/64bits support in data structures
    - fix memory allocation in zstd decompress
    - add some boundary checks to avoid kernel panic on corrupted image
    - fix disallowing compression for non-empty file
    - fix slab leakage of compressed block writes

    In addition, it includes code refactoring for better readability and
    minor bug fixes for compression and zoned device support"

    * tag 'f2fs-for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits)
    f2fs: code cleanup by removing unnecessary check
    f2fs: wait for sysfs kobject removal before freeing f2fs_sb_info
    f2fs: fix writecount false positive in releasing compress blocks
    f2fs: introduce check_swap_activate_fast()
    f2fs: don't issue flush in f2fs_flush_device_cache() for nobarrier case
    f2fs: handle errors of f2fs_get_meta_page_nofail
    f2fs: fix to set SBI_NEED_FSCK flag for inconsistent inode
    f2fs: reject CASEFOLD inode flag without casefold feature
    f2fs: fix memory alignment to support 32bit
    f2fs: fix slab leak of rpages pointer
    f2fs: compress: fix to disallow enabling compress on non-empty file
    f2fs: compress: introduce cic/dic slab cache
    f2fs: compress: introduce page array slab cache
    f2fs: fix to do sanity check on segment/section count
    f2fs: fix to check segment boundary during SIT page readahead
    f2fs: fix uninit-value in f2fs_lookup
    f2fs: remove unneeded parameter in find_in_block()
    f2fs: fix wrong total_sections check and fsmeta check
    f2fs: remove duplicated code in sanity_check_area_boundary
    f2fs: remove unused check on version_bitmap
    ...

    Linus Torvalds
     

30 Sep, 2020

2 commits


29 Sep, 2020

5 commits

  • As syzbot reported:

    BUG: KASAN: slab-out-of-bounds in init_min_max_mtime fs/f2fs/segment.c:4710 [inline]
    BUG: KASAN: slab-out-of-bounds in f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792
    Read of size 8 at addr ffff8880a1b934a8 by task syz-executor682/6878

    CPU: 1 PID: 6878 Comm: syz-executor682 Not tainted 5.9.0-rc6-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x198/0x1fd lib/dump_stack.c:118
    print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
    __kasan_report mm/kasan/report.c:513 [inline]
    kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
    init_min_max_mtime fs/f2fs/segment.c:4710 [inline]
    f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792
    f2fs_fill_super+0x381a/0x6e80 fs/f2fs/super.c:3633
    mount_bdev+0x32e/0x3f0 fs/super.c:1417
    legacy_get_tree+0x105/0x220 fs/fs_context.c:592
    vfs_get_tree+0x89/0x2f0 fs/super.c:1547
    do_new_mount fs/namespace.c:2875 [inline]
    path_mount+0x1387/0x20a0 fs/namespace.c:3192
    do_mount fs/namespace.c:3205 [inline]
    __do_sys_mount fs/namespace.c:3413 [inline]
    __se_sys_mount fs/namespace.c:3390 [inline]
    __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    The root cause is: if segs_per_sec is larger than one, and segment count
    in last section is less than segs_per_sec, we will suffer out-of-boundary
    memory access on sit_i->sentries[] in init_min_max_mtime().

    Fix this by adding sanity check among segment count, section count and
    segs_per_sec value in sanity_check_raw_super().

    Reported-by: syzbot+481a3ffab50fed41dcc0@syzkaller.appspotmail.com
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
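
A hedged sketch of a consistency check among the three values named in the message. It only illustrates the relationship total_sections * segs_per_sec <= segment_count; the function name and exact conditions are assumptions, not the code added to sanity_check_raw_super().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when every section can be backed by a full set of segments.
 * If the last section had fewer than segs_per_sec segments, code that walks
 * "section index * segs_per_sec + offset" would index past the segment array.
 */
bool sketch_segment_layout_is_sane(uint32_t segment_count,
                                   uint32_t total_sections,
                                   uint32_t segs_per_sec)
{
    if (segs_per_sec == 0 || total_sections == 0)
        return false;
    /* Division avoids overflow in total_sections * segs_per_sec. */
    return segment_count / segs_per_sec >= total_sections;
}
```

For example, 9 segments with 2 segments per section can back only 4 full sections, so a superblock claiming 5 sections would be rejected.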
     
  • Meta area is not included in section_count computation.
    So the minimum number of total_sections is 1, and it cannot be
    greater than segment_count_main.

    The minimum number of meta segments is 8 (SB + 2 (CP + SIT + NAT) + SSA).

    Signed-off-by: Wang Xiaojun
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Wang Xiaojun
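
The arithmetic in the message can be spelled out as a small sketch. It assumes each metadata area occupies at least one segment and takes the factor of 2 directly from the formula above; all identifiers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: each metadata area needs at least one segment. */
enum {
    SKETCH_SB_SEGS  = 1, /* superblock */
    SKETCH_CP_SEGS  = 1, /* checkpoint */
    SKETCH_SIT_SEGS = 1, /* segment information table */
    SKETCH_NAT_SEGS = 1, /* node address table */
    SKETCH_SSA_SEGS = 1  /* segment summary area */
};

/* SB + 2 * (CP + SIT + NAT) + SSA, as written in the commit message. */
uint32_t sketch_min_meta_segments(void)
{
    return SKETCH_SB_SEGS
         + 2 * (SKETCH_CP_SEGS + SKETCH_SIT_SEGS + SKETCH_NAT_SEGS)
         + SKETCH_SSA_SEGS;
}

/* total_sections must be at least 1 and at most segment_count_main. */
bool sketch_total_sections_in_range(uint32_t total_sections,
                                    uint32_t segment_count_main)
{
    return total_sections >= 1 && total_sections <= segment_count_main;
}
```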
     
  • Use seg_end_blkaddr instead of "segment0_blkaddr + (segment_count <<
    log_blocks_per_seg)".

    Signed-off-by: Wang Xiaojun
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Wang Xiaojun
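
For reference, the expression being replaced computes the first block address past the last segment. A minimal sketch, with a hypothetical helper name and 64-bit arithmetic chosen here only to keep the illustration free of overflow:

```c
#include <stdint.h>

/*
 * End block address of the segment area:
 * segment0_blkaddr + (segment_count << log_blocks_per_seg).
 * Assumes log_blocks_per_seg has already been validated to be small.
 */
uint64_t sketch_seg_end_blkaddr(uint32_t segment0_blkaddr,
                                uint32_t segment_count,
                                uint32_t log_blocks_per_seg)
{
    return (uint64_t)segment0_blkaddr +
           ((uint64_t)segment_count << log_blocks_per_seg);
}
```

With segment0_blkaddr = 512, 10 segments, and 2^9 blocks per segment, the end address is 512 + 10 * 512 = 5632.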
     
  • Relocate the blkzoned feature check into parse_options() like
    the other feature checks.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • sbi->devs is initialized only if the image enables the multiple device
    feature or the blkzoned feature. If the blkzoned feature flag is set by
    a fuzzer on a non-blkzoned device, we will hit the panic below:

    get_zone_idx fs/f2fs/segment.c:4892 [inline]
    f2fs_usable_zone_blks_in_seg fs/f2fs/segment.c:4943 [inline]
    f2fs_usable_blks_in_seg+0x39b/0xa00 fs/f2fs/segment.c:4999
    Call Trace:
    check_block_count+0x69/0x4e0 fs/f2fs/segment.h:704
    build_sit_entries fs/f2fs/segment.c:4403 [inline]
    f2fs_build_segment_manager+0x51da/0xa370 fs/f2fs/segment.c:5100
    f2fs_fill_super+0x3880/0x6ff0 fs/f2fs/super.c:3684
    mount_bdev+0x32e/0x3f0 fs/super.c:1417
    legacy_get_tree+0x105/0x220 fs/fs_context.c:592
    vfs_get_tree+0x89/0x2f0 fs/super.c:1547
    do_new_mount fs/namespace.c:2896 [inline]
    path_mount+0x12ae/0x1e70 fs/namespace.c:3216
    do_mount fs/namespace.c:3229 [inline]
    __do_sys_mount fs/namespace.c:3437 [inline]
    __se_sys_mount fs/namespace.c:3414 [inline]
    __x64_sys_mount+0x27f/0x300 fs/namespace.c:3414
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46

    Add a sanity check for inconsistency among these factors: blkzoned flag,
    device path and device character, to avoid the above panic.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

22 Sep, 2020

2 commits

  • fscrypt_set_test_dummy_encryption() requires that the optional argument
    to the test_dummy_encryption mount option be specified as a substring_t.
    That doesn't work well with filesystems that use the new mount API,
    since the new way of parsing mount options doesn't use substring_t.

    Make it take the argument as a 'const char *' instead.

    Instead of moving the match_strdup() into the callers in ext4 and f2fs,
    make them just use arg->from directly. Since the pattern is
    "test_dummy_encryption=%s", the argument will be null-terminated.

    Acked-by: Jeff Layton
    Link: https://lore.kernel.org/r/20200917041136.178600-14-ebiggers@kernel.org
    Signed-off-by: Eric Biggers

    Eric Biggers
     
  • The behavior of the test_dummy_encryption mount option is that when a
    new file (or directory or symlink) is created in an unencrypted
    directory, it's automatically encrypted using a dummy encryption policy.
    That's it; in particular, the encryption (or lack thereof) of existing
    files (or directories or symlinks) doesn't change.

    Unfortunately the implementation of test_dummy_encryption is a bit weird
    and confusing. When test_dummy_encryption is enabled and a file is
    being created in an unencrypted directory, we set up an encryption key
    (->i_crypt_info) for the directory. This isn't actually used to do any
    encryption, however, since the directory is still unencrypted! Instead,
    ->i_crypt_info is only used for inheriting the encryption policy.

    One consequence of this is that the filesystem ends up providing a
    "dummy context" (policy + nonce) instead of a "dummy policy". In
    commit ed318a6cc0b6 ("fscrypt: support test_dummy_encryption=v2"), I
    mistakenly thought this was required. However, actually the nonce only
    ends up being used to derive a key that is never used.

    Another consequence of this implementation is that it allows for
    'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge
    case that can be forgotten about. For example, currently
    FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the
    dummy encryption policy when the filesystem is mounted with
    test_dummy_encryption. That seems like the wrong thing to do, since
    again, the directory itself is not actually encrypted.

    Therefore, switch to a more logical and maintainable implementation
    where the dummy encryption policy inheritance is done without setting up
    keys for unencrypted directories. This involves:

    - Adding a function fscrypt_policy_to_inherit() which returns the
    encryption policy to inherit from a directory. This can be a real
    policy, a dummy policy, or no policy.

    - Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc.
    with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc.

    - Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead
    of an inode.

    Acked-by: Jaegeuk Kim
    Acked-by: Jeff Layton
    Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.org
    Signed-off-by: Eric Biggers

    Eric Biggers
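
The three possible outcomes of the inheritance lookup (real policy, dummy policy, or no policy) can be sketched as below. The types and field names are hypothetical simplifications; the real fscrypt_policy_to_inherit() works on inodes and may also need to set up the directory's key.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the fscrypt structures. */
struct sketch_policy {
    int version;
};

struct sketch_fs {
    /* Non-NULL only when mounted with test_dummy_encryption. */
    const struct sketch_policy *dummy_policy;
};

struct sketch_dir {
    bool encrypted;
    const struct sketch_policy *policy; /* meaningful only when encrypted */
    const struct sketch_fs *fs;
};

/*
 * Policy a new file in `dir` should inherit:
 * - the directory's own policy if it is encrypted,
 * - otherwise the filesystem's dummy policy, if any,
 * - otherwise NULL (the new file stays unencrypted).
 */
const struct sketch_policy *sketch_policy_to_inherit(const struct sketch_dir *dir)
{
    if (dir->encrypted)
        return dir->policy;
    return dir->fs->dummy_policy;
}
```

Note that the unencrypted directory never gets any key state of its own in this model, which is the property the commit is after.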
     

19 Sep, 2020

1 commit


12 Sep, 2020

3 commits

  • writepages() can be concurrently invoked for the same file by different
    threads, such as a thread fsyncing the file and a kworker kernel thread.
    So changing i_compr_blocks without protection is racy, and we need to
    protect it by making it an atomic type. Also, we don't need
    a 64-bit value for i_compr_blocks, so we will just use an atomic value,
    not atomic64.

    Signed-off-by: Daeho Jeong
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Daeho Jeong
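
As a userspace analogy only, the same idea can be sketched with C11 atomics. The kernel uses its own atomic_t API, so atomic_int and the helper names below are stand-ins chosen for this illustration.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the per-inode counter of compressed blocks. */
struct sketch_inode {
    atomic_int compr_blocks; /* 32-bit is enough; no 64-bit atomic needed */
};

/* Each update is a single atomic read-modify-write, so concurrent
 * writers cannot lose each other's increments or decrements. */
void sketch_add_compr_blocks(struct sketch_inode *inode, int blocks)
{
    atomic_fetch_add(&inode->compr_blocks, blocks);
}

void sketch_sub_compr_blocks(struct sketch_inode *inode, int blocks)
{
    atomic_fetch_sub(&inode->compr_blocks, blocks);
}

int sketch_get_compr_blocks(struct sketch_inode *inode)
{
    return atomic_load(&inode->compr_blocks);
}
```

A plain `inode->compr_blocks += blocks` would be a separate load and store, which is where two concurrent writepages() callers could overwrite each other.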
     
  • This keeps the behavior consistent with passing a compress mount option
    to a kernel w/o the compression feature, so that mount does not fail in
    such a condition.

    Reported-by: Kyungmin Park
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • There are several issues in the current background GC algorithm:
    - valid blocks is one of the key factors in the cost overhead
    calculation, so if a segment has few valid blocks, the CB algorithm will
    still choose it as the victim even if its age is young or it is located
    in a hot segment, which is not appropriate.
    - GCed data/node will go to existing logs, whether or not the update
    frequency of the data already there is the same, so it may mix hot and
    cold data again.
    - the GC allocator mainly uses LFS type segments, so it consumes free
    segments more quickly.

    This patch introduces a new algorithm named age threshold based
    garbage collection to solve the above issues. There are three main
    steps:

    1. select a source victim:
    - set an age threshold, and select candidates based on the threshold:
    e.g.
    0 means youngest, 100 means oldest; if we set the age threshold to 80,
    then select dirty segments whose age is in the range [80, 100] as
    candidates;
    - set a candidate_ratio threshold, and select candidates based on the
    ratio, so that we can shrink the candidates to the oldest segments;
    - select the target segment with the fewest valid blocks in order to
    migrate blocks with minimum cost;

    2. select a target victim:
    - select candidates based on the age threshold;
    - set a candidate_radius threshold, and search for candidates whose age
    is around the source victim's; the search radius should be less than
    the radius threshold;
    - select the target segment with the most valid blocks in order to avoid
    migrating the current target segment.

    3. merge valid blocks from the source victim into the target victim with
    the SSR allocator.

    Test steps:
    - create 160 dirty segments:
    * half of them have 128 valid blocks per segment
    * the rest of them have 384 valid blocks per segment
    - run background GC

    Benefit: GC count and block movement count both decrease noticeably:

    - Before:
    - Valid: 86
    - Dirty: 1
    - Prefree: 11
    - Free: 6001 (6001)

    GC calls: 162 (BG: 220)
    - data segments : 160 (160)
    - node segments : 2 (2)
    Try to move 41454 blocks (BG: 41454)
    - data blocks : 40960 (40960)
    - node blocks : 494 (494)

    IPU: 0 blocks
    SSR: 0 blocks in 0 segments
    LFS: 41364 blocks in 81 segments

    - After:

    - Valid: 87
    - Dirty: 0
    - Prefree: 4
    - Free: 6008 (6008)

    GC calls: 75 (BG: 76)
    - data segments : 74 (74)
    - node segments : 1 (1)
    Try to move 12813 blocks (BG: 12813)
    - data blocks : 12544 (12544)
    - node blocks : 269 (269)

    IPU: 0 blocks
    SSR: 12032 blocks in 77 segments
    LFS: 855 blocks in 2 segments

    Signed-off-by: Chao Yu
    [Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
    Signed-off-by: Jaegeuk Kim

    Chao Yu
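
Step 1 of the algorithm (choosing a source victim) can be sketched as a simple scan. This is a simplified illustration under stated assumptions: every entry in the array is a dirty segment, ages are already normalized to 0..100, and the candidate_ratio narrowing is left out. The struct and function names are hypothetical and do not reflect the kernel's data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a dirty segment for this sketch. */
struct sketch_segment {
    uint32_t age;          /* 0 = youngest, 100 = oldest */
    uint32_t valid_blocks; /* blocks that would have to be migrated */
};

/*
 * Among segments whose age is at or above the threshold, pick the one with
 * the fewest valid blocks, since it is the cheapest to migrate.
 * Returns the index of the chosen segment, or -1 if none qualifies.
 */
long sketch_select_source_victim(const struct sketch_segment *segs,
                                 size_t count, uint32_t age_threshold)
{
    long best = -1;

    for (size_t i = 0; i < count; i++) {
        if (segs[i].age < age_threshold)
            continue; /* too young to be a candidate */
        if (best < 0 || segs[i].valid_blocks < segs[best].valid_blocks)
            best = (long)i;
    }
    return best;
}
```

With a threshold of 80, a young segment holding only 10 valid blocks is skipped even though it would be the cheapest to move, which is the behavior the cost-benefit policy could not guarantee.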
     

11 Sep, 2020

3 commits

  • This switches f2fs over to the generic support provided in
    the previous patch.

    Since casefolded dentries behave the same in ext4 and f2fs, we decrease
    the maintenance burden by unifying them, and any optimizations will
    immediately apply to both.

    Signed-off-by: Daniel Rosenberg
    Reviewed-by: Eric Biggers
    Signed-off-by: Jaegeuk Kim

    Daniel Rosenberg
     
  • The previous implementation of aligned pinfile allocation will:
    - allocate a new segment on the cold data log whether or not the last
    used segment is partially used, which makes IOs more random;
    - force concurrent cold data/GCed IO into the warm data area, which
    can hurt hot/cold data separation.

    In this patch, we introduce a new type of log named 'inmem curseg'.
    Its differences from a normal curseg are:
    - it reuses an existing segment type (CURSEG_XXX_NODE/DATA);
    - it only exists in memory; its segno, blkofs, and summary will not be
    persisted into the checkpoint area.

    With this new feature, we can enhance the scalability of logs; special
    allocators can be created for these purposes:
    - a pure lfs allocator for aligned pinfile allocation or file
    defragmentation
    - a pure ssr allocator for a later feature

    So let's update aligned pinfile allocation to use this new
    inmem curseg framework.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
    Zone-capacity indicates the maximum number of sectors that are usable in
    a zone beginning from the first sector of the zone. This makes the
    sectors after the zone-capacity up to zone-size unusable.
    This patch set tracks zone-size and zone-capacity in zoned devices and
    calculates the usable blocks per segment and usable segments per section.

    If zone-capacity is less than zone-size mark only those segments which
    start before zone-capacity as free segments. All segments at and beyond
    zone-capacity are treated as permanently used segments. In cases where
    zone-capacity does not align with segment size the last segment will start
    before zone-capacity and end beyond the zone-capacity of the zone. For
    such spanning segments only sectors within the zone-capacity are used.

    During writes and GC, manage the usable segments in a section and usable
    blocks per segment. Segments which are beyond zone-capacity are never
    allocated and do not need to be garbage collected; only the segments
    which are before zone-capacity need to be garbage collected.
    For spanning segments based on the number of usable blocks in that
    segment, write to blocks only up to zone-capacity.

    Zone-capacity is device specific and cannot be configured by the user.
    Since NVMe ZNS device zones are sequentially write only, a block device
    with conventional zones or any normal block device is needed along with
    the ZNS device for the metadata operations of F2fs.

    A typical nvme-cli output of a zoned device shows zone start and capacity
    and write pointer as below:

    SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
    SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
    SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ

    Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
    are in EMPTY state. For each zone, only zone start + 49MB is usable area,
    any lba/sector after 49MB cannot be read or written to, the drive will fail
    any attempts to read/write. So, the second zone starts at 64MB and is
    usable till 113MB (64 + 49) and the range between 113 and 128MB is
    again unusable. The next zone starts at 128MB, and so on.

    Signed-off-by: Aravind Ramesh
    Signed-off-by: Damien Le Moal
    Signed-off-by: Niklas Cassel
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Aravind Ramesh
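
The usable-blocks rule for full, spanning, and unusable segments can be sketched as follows. The function name is hypothetical, and the figures in the usage note assume 512-byte sectors (consistent with 0x20000 sectors being 64MB), 4 KiB blocks, and 512 blocks per segment.

```c
#include <stdint.h>

/*
 * Number of usable blocks in a segment that starts seg_start_blk blocks into
 * its zone, given the zone capacity in blocks:
 * - segments entirely below the capacity are fully usable,
 * - a segment spanning the capacity is usable only up to the capacity,
 * - segments starting at or beyond the capacity are not usable at all.
 */
uint32_t sketch_usable_blocks_in_seg(uint32_t seg_start_blk,
                                     uint32_t blks_per_seg,
                                     uint32_t zone_cap_blks)
{
    uint32_t remaining;

    if (seg_start_blk >= zone_cap_blks)
        return 0;
    remaining = zone_cap_blks - seg_start_blk;
    return remaining < blks_per_seg ? remaining : blks_per_seg;
}
```

Under those assumptions, Cap 0x18800 sectors is 12544 blocks (49MB). Segments 0 to 23 of a zone are fully usable, segment 24 starts at block 12288 and keeps 256 usable blocks, and segments 25 to 31 are unusable.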
     

11 Aug, 2020

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've added two small interfaces: (a) GC_URGENT_LOW
    mode for performance and (b) F2FS_IOC_SEC_TRIM_FILE ioctl for
    security.

    The new GC mode allows Android to run some lower priority GCs in
    background, while new ioctl discards user information without race
    condition when the account is removed.

    In addition, some patches were merged to address latency-related
    issues. We've also fixed some compression-related bugs as well as edge
    race conditions.

    Enhancements:
    - add GC_URGENT_LOW mode in gc_urgent
    - introduce F2FS_IOC_SEC_TRIM_FILE ioctl
    - bypass racy readahead to improve read latencies
    - shrink node_write lock coverage to avoid long latency

    Bug fixes:
    - fix missing compression flag control, i_size, and mount option
    - fix deadlock between quota writes and checkpoint
    - remove inode eviction path in synchronous path to avoid deadlock
    - fix to wait GCed compressed page writeback
    - fix a kernel panic in f2fs_is_compressed_page
    - check page dirty status before writeback
    - wait page writeback before update in node page write flow
    - fix a race condition between f2fs_write_end_io and f2fs_del_fsync_node_entry

    We've added some minor sanity checks and refactored trivial code
    blocks for better readability and debugging information"

    * tag 'f2fs-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits)
    f2fs: prepare a waiter before entering io_schedule
    f2fs: update_sit_entry: Make the judgment condition of f2fs_bug_on more intuitive
    f2fs: replace test_and_set/clear_bit() with set/clear_bit()
    f2fs: make file immutable even if releasing zero compression block
    f2fs: compress: disable compression mount option if compression is off
    f2fs: compress: add sanity check during compressed cluster read
    f2fs: use macro instead of f2fs verity version
    f2fs: fix deadlock between quota writes and checkpoint
    f2fs: correct comment of f2fs_exist_written_data
    f2fs: compress: delay temp page allocation
    f2fs: compress: fix to update isize when overwriting compressed file
    f2fs: space related cleanup
    f2fs: fix use-after-free issue
    f2fs: Change the type of f2fs_flush_inline_data() to void
    f2fs: add F2FS_IOC_SEC_TRIM_FILE ioctl
    f2fs: should avoid inode eviction in synchronous path
    f2fs: segment.h: delete a duplicated word
    f2fs: compress: fix to avoid memory leak on cc->cpages
    f2fs: use generic names for generic ioctls
    f2fs: don't keep meta inode pages used for compressed block migration
    ...

    Linus Torvalds
     

04 Aug, 2020

1 commit


24 Jul, 2020

1 commit

  • During umount, f2fs_put_super() unregisters procfs entries after
    f2fs_destroy_segment_manager(), which may cause a use-after-free
    issue when umount races with procfs access. Fix it by relocating
    f2fs_unregister_sysfs().

    [Chao Yu: change commit title/message a bit]

    Signed-off-by: Li Guifu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Li Guifu
     

09 Jul, 2020

1 commit

  • Wire up f2fs to support inline encryption via the helper functions which
    fs/crypto/ now provides. This includes:

    - Adding a mount option 'inlinecrypt' which enables inline encryption
    on encrypted files where it can be used.

    - Setting the bio_crypt_ctx on bios that will be submitted to an
    inline-encrypted file.

    - Not adding logically discontiguous data to bios that will be submitted
    to an inline-encrypted file.

    - Not doing filesystem-layer crypto on inline-encrypted files.

    This patch includes a fix for a race during IPU by Sahitya Tummala.

    Signed-off-by: Satya Tangirala
    Acked-by: Jaegeuk Kim
    Reviewed-by: Eric Biggers
    Reviewed-by: Chao Yu
    Link: https://lore.kernel.org/r/20200702015607.1215430-4-satyat@google.com
    Co-developed-by: Eric Biggers
    Signed-off-by: Eric Biggers

    Satya Tangirala
     

08 Jul, 2020

1 commit

  • If two readahead threads with the same offset enter readpages, every read
    IO is split and issued to the disk, which gives lower bandwidth.

    This patch tries to avoid redundant readahead calls.

    Fixes one build error reported by Randy.
    Fix build error when F2FS_FS_COMPRESSION is not set/enabled.
    This label is needed in either case.

    ../fs/f2fs/data.c: In function ‘f2fs_mpage_readpages’:
    ../fs/f2fs/data.c:2327:5: error: label ‘next_page’ used but not defined
    goto next_page;

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

19 Jun, 2020

2 commits


10 Jun, 2020

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've added some knobs to enhance compression feature
    and harden testing environment. In addition, we've fixed several bugs
    reported from Android devices such as long discarding latency, device
    hanging during quota_sync, etc.

    Enhancements:
    - support lzo-rle algorithm
    - add two ioctls to release and reserve blocks for compression
    - support partial truncation/fiemap on compressed file
    - introduce sysfs entries to attach IO flags explicitly
    - add iostat trace point along with read io stat

    Bug fixes:
    - fix long discard latency
    - flush quota data by f2fs_quota_sync correctly
    - fix to recover parent inode number for power-cut recovery
    - fix lz4/zstd output buffer budget
    - parse checkpoint mount option correctly
    - avoid infinite loop to wait for flushing node/meta pages
    - manage discard space correctly

    And some refactoring and clean up patches were added"

    * tag 'f2fs-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits)
    f2fs: attach IO flags to the missing cases
    f2fs: add node_io_flag for bio flags likewise data_io_flag
    f2fs: remove unused parameter of f2fs_put_rpages_mapping()
    f2fs: handle readonly filesystem in f2fs_ioc_shutdown()
    f2fs: avoid utf8_strncasecmp() with unstable name
    f2fs: don't return vmalloc() memory from f2fs_kmalloc()
    f2fs: fix retry logic in f2fs_write_cache_pages()
    f2fs: fix wrong discard space
    f2fs: compress: don't compress any datas after cp stop
    f2fs: remove unneeded return value of __insert_discard_tree()
    f2fs: fix wrong value of tracepoint parameter
    f2fs: protect new segment allocation in expand_inode_data
    f2fs: code cleanup by removing ifdef macro surrounding
    f2fs: avoid inifinite loop to wait for flushing node pages at cp_error
    f2fs: flush dirty meta pages when flushing them
    f2fs: fix checkpoint=disable:%u%%
    f2fs: compress: fix zstd data corruption
    f2fs: add compressed/gc data read IO stat
    f2fs: fix potential use-after-free issue
    f2fs: compress: don't handle non-compressed data in workqueue
    ...

    Linus Torvalds
     

09 Jun, 2020

1 commit

  • kmalloc() returns kmalloc'ed memory, and kvmalloc() returns either
    kmalloc'ed or vmalloc'ed memory. But the f2fs wrappers, f2fs_kmalloc()
    and f2fs_kvmalloc(), both return both kinds of memory.

    It's redundant to have two functions that do the same thing, and also
    breaking the standard naming convention is causing bugs since people
    assume it's safe to kfree() memory allocated by f2fs_kmalloc(). See
    e.g. the various allocations in fs/f2fs/compress.c.

    Fix this by making f2fs_kmalloc() just use kmalloc(). And to avoid
    re-introducing the allocation failures that the vmalloc fallback was
    intended to fix, convert the largest allocations to use f2fs_kvmalloc().

    Signed-off-by: Eric Biggers
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Eric Biggers
     

19 May, 2020

2 commits

  • v1 encryption policies are deprecated in favor of v2, and some new
    features (e.g. encryption+casefolding) are only being added for v2.

    Therefore, the "test_dummy_encryption" mount option (which is used for
    encryption I/O testing with xfstests) needs to support v2 policies.

    To do this, extend its syntax to be "test_dummy_encryption=v1" or
    "test_dummy_encryption=v2". The existing "test_dummy_encryption" (no
    argument) also continues to be accepted, to specify the default setting
    -- currently v1, but the next patch changes it to v2.

    To cleanly support both v1 and v2 while also making it easy to support
    specifying other encryption settings in the future (say, accepting
    "$contents_mode:$filenames_mode:v2"), make ext4 and f2fs maintain a
    pointer to the dummy fscrypt_context rather than using mount flags.

    To avoid concurrency issues, don't allow test_dummy_encryption to be set
    or changed during a remount. (The former restriction is new, but
    xfstests doesn't run into it, so no one should notice.)

    Tested with 'gce-xfstests -c {ext4,f2fs}/encrypt -g auto'. On ext4,
    there are two regressions, both of which are test bugs: ext4/023 and
    ext4/028 fail because they set an xattr and expect it to be stored
    inline, but the increase in size of the fscrypt_context from
    24 to 40 bytes causes this xattr to be spilled into an external block.

    Link: https://lore.kernel.org/r/20200512233251.118314-4-ebiggers@kernel.org
    Acked-by: Jaegeuk Kim
    Reviewed-by: Theodore Ts'o
    Signed-off-by: Eric Biggers

    Eric Biggers
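
The extended option syntax can be illustrated with a small parser sketch. It assumes the caller has already split off the text after "test_dummy_encryption=" (or passes NULL when no argument was given) and uses v1 as the default, as the message states for this commit. The names and the -1 error return are choices made for this illustration, not the fscrypt API.

```c
#include <stddef.h>
#include <string.h>

enum {
    SKETCH_POLICY_V1 = 1,
    SKETCH_POLICY_V2 = 2
};

/* Default per the message above: v1 at this commit (a later patch changes it). */
#define SKETCH_DEFAULT_POLICY SKETCH_POLICY_V1

/*
 * arg is the null-terminated option argument, or NULL for the bare
 * "test_dummy_encryption" form. Returns the policy version, or -1 for an
 * unrecognized argument.
 */
int sketch_parse_dummy_encryption(const char *arg)
{
    if (arg == NULL)
        return SKETCH_DEFAULT_POLICY;
    if (strcmp(arg, "v1") == 0)
        return SKETCH_POLICY_V1;
    if (strcmp(arg, "v2") == 0)
        return SKETCH_POLICY_V2;
    return -1;
}
```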
     
  • When parsing the mount option, we don't have sbi->user_block_count.
    Should do it after getting it.

    Cc:
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

12 May, 2020

4 commits

  • Sahitya raised an issue:
    - prevent meta updates while checkpoint is in progress

    allocate_segment_for_resize() can cause metapage updates if
    it requires to change the current node/data segments for resizing.
    Stop these meta updates when there is a checkpoint already
    in progress to prevent inconsistent CP data.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This reserved space isn't committed yet but cannot be used for
    allocations. For userspace it has no difference from used space.

    See the same fix in ext4 commit f06925c73942 ("ext4: report delalloc
    reserve as non-free in statfs for project quota").

    Fixes: ddc34e328d06 ("f2fs: introduce f2fs_statfs_project")
    Signed-off-by: Konstantin Khlebnikov
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Konstantin Khlebnikov
     
  • The LZO-RLE extension (run length encoding) was introduced to improve
    the performance of the LZO algorithm when data contains many zeros,
    and zram has changed to use this extended algorithm by default. This
    patch adds support for this algorithm extension. To enable it,
    enable both the F2FS_FS_LZO and F2FS_FS_LZORLE config options
    and specify the "compress_algorithm=lzo-rle" mount option.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • If the compression feature is on and there is not enough free memory,
    the page refault ratio is higher than before. The root cause is:
    - the {,de}compression flow needs to allocate intermediate pages to store
    compressed data in a cluster, so during their allocation, the vm may
    reclaim mmapped pages.
    - if the reclaimed pages above belong to a compressed cluster, their
    refault may cause more intermediate page allocation, resulting in
    more mmapped pages being reclaimed.

    So this patch introduces a mempool for intermediate page allocation
    in order to avoid a high refault ratio. By default, the number of
    preallocated pages in the pool is 512; the user can change the number by
    assigning the 'num_compress_pages' parameter during module initialization.

    Ma Feng found warnings in the original patch and fixed like below.

    Fix the following sparse warning:
    fs/f2fs/compress.c:501:5: warning: symbol 'num_compress_pages' was not declared.
    Should it be static?
    fs/f2fs/compress.c:530:6: warning: symbol 'f2fs_compress_free_page' was not
    declared. Should it be static?

    Reported-by: Hulk Robot
    Signed-off-by: Chao Yu
    Signed-off-by: Ma Feng
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

08 May, 2020

1 commit


18 Apr, 2020

1 commit

  • Added a tracepoint to see the iostat of f2fs. Its default period
    is 3 seconds. This tracepoint can be used to monitor
    I/O statistics periodically.

    Signed-off-by: Daeho Jeong
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Daeho Jeong
     

08 Apr, 2020

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've mainly focused on fixing bugs and addressing
    issues in recently introduced compression support.

    Enhancement:
    - add zstd support, and set LZ4 by default
    - add ioctl() to show # of compressed blocks
    - show mount time in debugfs
    - replace rwsem with spinlock
    - avoid lock contention in DIO reads

    Some major bug fixes wrt compression:
    - compressed block count
    - memory access and leak
    - remove obsolete fields
    - flag controls

    Other bug fixes and clean ups:
    - fix overflow when handling .flags in inode_info
    - fix SPO issue during resize FS flow
    - fix compression with fsverity enabled
    - potential deadlock when writing compressed pages
    - show missing mount options"

    * tag 'f2fs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits)
    f2fs: keep inline_data when compression conversion
    f2fs: fix to disable compression on directory
    f2fs: add missing CONFIG_F2FS_FS_COMPRESSION
    f2fs: switch discard_policy.timeout to bool type
    f2fs: fix to verify tpage before releasing in f2fs_free_dic()
    f2fs: show compression in statx
    f2fs: clean up dic->tpages assignment
    f2fs: compress: support zstd compress algorithm
    f2fs: compress: add .{init,destroy}_decompress_ctx callback
    f2fs: compress: fix to call missing destroy_compress_ctx()
    f2fs: change default compression algorithm
    f2fs: clean up {cic,dic}.ref handling
    f2fs: fix to use f2fs_readpage_limit() in f2fs_read_multi_pages()
    f2fs: xattr.h: Make stub helpers inline
    f2fs: fix to avoid double unlock
    f2fs: fix potential .flags overflow on 32bit architecture
    f2fs: fix NULL pointer dereference in f2fs_verity_work()
    f2fs: fix to clear PG_error if fsverity failed
    f2fs: don't call fscrypt_get_encryption_info() explicitly in f2fs_tmpfile()
    f2fs: don't trigger data flush in foreground operation
    ...

    Linus Torvalds
     

04 Apr, 2020

1 commit


31 Mar, 2020

1 commit