29 Sep, 2020

2 commits


12 Sep, 2020

1 commit

  • The current background GC algorithm has several issues:
    - The valid block count is one of the key factors in the cost
    calculation, so a segment with few valid blocks will still be
    chosen as victim by the CB algorithm even if it is young or is a
    hot segment, which is not appropriate.
    - GCed data/nodes go to the existing logs whether or not the data
    already there has the same update frequency, so hot and cold data
    may be mixed again.
    - The GC allocator mainly uses LFS type segments, so it consumes
    free segments more quickly.

    This patch introduces a new algorithm named age threshold based
    garbage collection to solve the above issues. There are three main
    steps:

    1. select a source victim:
    - set an age threshold, and select candidates based on the
    threshold: e.g. 0 means youngest, 100 means oldest, if we set the
    age threshold to 80 then select dirty segments whose age is in the
    range [80, 100] as candidates;
    - set a candidate_ratio threshold, and select candidates based on
    the ratio, so that we can shrink the candidates to the oldest
    segments;
    - select the target segment with the fewest valid blocks in order to
    migrate blocks with minimum cost;

    2. select a target victim:
    - select candidates based on the age threshold;
    - set a candidate_radius threshold, and search for candidates whose
    age is around the source victim's; the search radius should be less
    than the radius threshold;
    - select the target segment with the most valid blocks in order to
    avoid migrating the current target segment.

    3. merge valid blocks from the source victim into the target victim
    with the SSR allocator.
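
    The selection in steps 1 and 2 can be sketched roughly as below. This
    is a simplified illustration, not the kernel implementation: struct
    seg_info, pick_source_victim() and pick_target_victim() are
    hypothetical names, the candidate_ratio filter is left out, and the
    age scale of 0 to 100 is taken from the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-segment summary; not the kernel's data structures. */
struct seg_info {
    unsigned int age;          /* 0 = youngest, 100 = oldest */
    unsigned int valid_blocks; /* valid blocks still in the segment */
    int dirty;                 /* nonzero if the segment is dirty */
};

/*
 * Step 1: among dirty segments whose age is in [age_threshold, 100],
 * return the index of the one with the fewest valid blocks, or -1.
 */
static int pick_source_victim(const struct seg_info *segs, size_t n,
                              unsigned int age_threshold)
{
    int best = -1;

    assert(age_threshold <= 100);
    for (size_t i = 0; i < n; i++) {
        if (!segs[i].dirty || segs[i].age < age_threshold)
            continue;
        if (best < 0 || segs[i].valid_blocks < segs[best].valid_blocks)
            best = (int)i;
    }
    return best;
}

/*
 * Step 2: among the other dirty segments above the age threshold whose
 * age differs from the source's by less than `radius`, return the index
 * of the one with the most valid blocks, or -1.
 */
static int pick_target_victim(const struct seg_info *segs, size_t n, int src,
                              unsigned int age_threshold, unsigned int radius)
{
    int best = -1;

    assert(src >= 0 && (size_t)src < n);
    for (size_t i = 0; i < n; i++) {
        unsigned int a = segs[i].age, b = segs[src].age;
        unsigned int diff = a > b ? a - b : b - a;

        if ((int)i == src || !segs[i].dirty || a < age_threshold)
            continue;
        if (diff >= radius)
            continue;
        if (best < 0 || segs[i].valid_blocks > segs[best].valid_blocks)
            best = (int)i;
    }
    return best;
}
```

    Choosing the emptiest source and the fullest nearby target keeps the
    number of migrated blocks small while merging data of similar age.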

    Test steps:
    - create 160 dirty segments:
    * half of them have 128 valid blocks per segment
    * the rest have 384 valid blocks per segment
    - run background GC

    Benefit: GC count and block movement count both decrease noticeably:

    - Before:
    - Valid: 86
    - Dirty: 1
    - Prefree: 11
    - Free: 6001 (6001)

    GC calls: 162 (BG: 220)
    - data segments : 160 (160)
    - node segments : 2 (2)
    Try to move 41454 blocks (BG: 41454)
    - data blocks : 40960 (40960)
    - node blocks : 494 (494)

    IPU: 0 blocks
    SSR: 0 blocks in 0 segments
    LFS: 41364 blocks in 81 segments

    - After:

    - Valid: 87
    - Dirty: 0
    - Prefree: 4
    - Free: 6008 (6008)

    GC calls: 75 (BG: 76)
    - data segments : 74 (74)
    - node segments : 1 (1)
    Try to move 12813 blocks (BG: 12813)
    - data blocks : 12544 (12544)
    - node blocks : 269 (269)

    IPU: 0 blocks
    SSR: 12032 blocks in 77 segments
    LFS: 855 blocks in 2 segments

    Signed-off-by: Chao Yu
    [Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

11 Sep, 2020

2 commits

  • The previous implementation of aligned pinfile allocation will:
    - allocate a new segment on the cold data log no matter whether the
    last used segment is partially used or not, which makes IOs more
    random;
    - force concurrent cold data/GCed IO into the warm data area, which
    can hurt hot/cold data separation;

    In this patch, we introduce a new type of log named 'inmem curseg'.
    The differences from a normal curseg are:
    - it reuses an existing segment type (CURSEG_XXX_NODE/DATA);
    - it only exists in memory; its segno, blkofs and summary will not be
    persisted into the checkpoint area;

    With this new feature, we can enhance the scalability of logs, and
    special allocators can be created for specific purposes:
    - pure lfs allocator for aligned pinfile allocation or file
    defragmentation
    - pure ssr allocator for later feature

    So let's update aligned pinfile allocation to use this new
    inmem curseg framework.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
    Zone-capacity indicates the maximum number of sectors that are usable in
    a zone beginning from the first sector of the zone. This makes the
    sectors after the zone-capacity till zone-size unusable.
    This patch set tracks zone-size and zone-capacity in zoned devices and
    calculates the usable blocks per segment and usable segments per section.

    If zone-capacity is less than zone-size, mark only those segments which
    start before zone-capacity as free segments. All segments at and beyond
    zone-capacity are treated as permanently used segments. In cases where
    zone-capacity does not align with segment size the last segment will start
    before zone-capacity and end beyond the zone-capacity of the zone. For
    such spanning segments only sectors within the zone-capacity are used.

    During writes and GC, manage the usable segments in a section and usable
    blocks per segment. Segments which are beyond zone-capacity are never
    allocated and do not need to be garbage collected; only the segments
    which are before zone-capacity need to be garbage collected.
    For spanning segments, based on the number of usable blocks in that
    segment, write to blocks only up to zone-capacity.

    Zone-capacity is device specific and cannot be configured by the user.
    Since NVMe ZNS device zones are sequentially write only, a block device
    with conventional zones or any normal block device is needed along with
    the ZNS device for the metadata operations of F2FS.

    A typical nvme-cli output of a zoned device shows zone start and capacity
    and write pointer as below:

    SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
    SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
    SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ

    Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
    are in EMPTY state. For each zone, only zone start + 49MB is usable area,
    any lba/sector after 49MB cannot be read or written to, the drive will fail
    any attempts to read/write. So, the second zone starts at 64MB and is
    usable till 113MB (64 + 49) and the range between 113 and 128MB is
    again unusable. The next zone starts at 128MB, and so on.
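
    The usable-segment and usable-block arithmetic for this example can be
    sketched as below. This is only an illustration under stated
    assumptions: 512-byte LBAs (so 0x20000 LBAs is the 64MB zone), 4KB
    blocks and 2MB segments of 512 blocks. The helper names are
    hypothetical and are not the functions added by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BYTES  512u   /* assumed LBA size: 0x20000 LBAs = 64MB */
#define BLOCK_BYTES   4096u  /* assumed filesystem block size */
#define BLKS_PER_SEG  512u   /* 512 blocks * 4KB = 2MB segment */

static uint64_t sectors_to_blocks(uint64_t sectors)
{
    return sectors * SECTOR_BYTES / BLOCK_BYTES;
}

/* Number of segments in a zone that start before zone-capacity. */
static uint32_t usable_segs_in_zone(uint64_t zone_sectors, uint64_t cap_sectors)
{
    uint64_t cap_blocks = sectors_to_blocks(cap_sectors);

    assert(cap_sectors <= zone_sectors);
    return (uint32_t)((cap_blocks + BLKS_PER_SEG - 1) / BLKS_PER_SEG);
}

/* Usable blocks in segment seg_idx (0-based within its zone). */
static uint32_t usable_blks_in_seg(uint64_t cap_sectors, uint32_t seg_idx)
{
    uint64_t cap_blocks = sectors_to_blocks(cap_sectors);
    uint64_t seg_start = (uint64_t)seg_idx * BLKS_PER_SEG;

    if (seg_start >= cap_blocks)
        return 0;                         /* entirely beyond zone-capacity */
    if (cap_blocks - seg_start >= BLKS_PER_SEG)
        return BLKS_PER_SEG;              /* entirely within zone-capacity */
    return (uint32_t)(cap_blocks - seg_start);  /* spanning segment */
}
```

    With Cap: 0x18800 (49MB) this gives 25 usable segments per zone: 24
    full segments plus one spanning segment with 256 usable blocks.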

    Signed-off-by: Aravind Ramesh
    Signed-off-by: Damien Le Moal
    Signed-off-by: Niklas Cassel
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Aravind Ramesh
     

21 Jul, 2020

1 commit


19 Jun, 2020

1 commit

  • Assume each section has 4 segments:
    .___________________________.
    |_Segment0_|_..._|_Segment3_|
    . .
    . .
    .__________.
    |_section0_|

    Segments 0~2 have 0 valid blocks, segment 3 has 512 valid blocks.
    It will fail if we want to gc section0 in this scenario,
    because none of the 4 segments in section0 is dirty.
    So we should use the dirty section bitmap instead of the dirty segment
    bitmap to get the right victim section.
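
    The scenario can be reproduced with a small sketch. It assumes a
    simplified notion of "dirty" (some, but not all, blocks valid) and
    512 blocks per segment; seg_is_dirty() and sec_is_dirty() are
    hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

#define BLKS_PER_SEG 512u

/* A segment is dirty when it holds some, but not all, valid blocks. */
static bool seg_is_dirty(unsigned int valid)
{
    return valid > 0 && valid < BLKS_PER_SEG;
}

/* A section is dirty when its total valid count is partial. */
static bool sec_is_dirty(const unsigned int *valid, unsigned int segs_per_sec)
{
    unsigned int total = 0;

    assert(segs_per_sec > 0);
    for (unsigned int i = 0; i < segs_per_sec; i++)
        total += valid[i];
    return total > 0 && total < segs_per_sec * BLKS_PER_SEG;
}
```

    For valid counts {0, 0, 0, 512} no single segment is dirty, yet the
    section as a whole is, which is why a per-section bitmap is needed.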

    Signed-off-by: Jack Qiu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jack Qiu
     

08 May, 2020

1 commit

  • This patch corrects the SPDX License Identifier style in
    header files related to F2FS File System support.
    For C header files Documentation/process/license-rules.rst
    mandates C-like comments (opposed to C source files where
    C++ style should be used).

    Changes made by using a script provided by Joe Perches here:
    https://lkml.org/lkml/2019/2/7/46.

    Suggested-by: Joe Perches
    Signed-off-by: Nishad Kamdar
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Nishad Kamdar
     

20 Mar, 2020

1 commit


18 Jan, 2020

3 commits

  • This patch adds missing fsync_mode entry in f2fs document.

    Fixes: 04485987f053 ("f2fs: introduce async IPU policy")
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Setting 0x40 in /sys/fs/f2fs/dev/ipu_policy gives a way to turn off
    bio cache, which is useful to check whether block layer using hardware
    encryption engine merges IOs correctly.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch tries to support compression in f2fs.

    - A new term named cluster is defined as the basic unit of compression; a
    file can be divided into multiple clusters logically. One cluster includes
    4 << n (n >= 0) logical pages, compression size is also cluster size, and
    each cluster can be compressed or not.

    - In the cluster metadata layout, one special flag is used to indicate
    whether a cluster is a compressed one or a normal one. For a compressed
    cluster, the following metadata maps the cluster to [1, (4 << n) - 1]
    physical blocks, in which f2fs stores data including the compress header
    and compressed data.

    - In order to eliminate write amplification during overwrite, F2FS only
    supports compression on write-once files; data can be compressed only when
    all logical blocks in file are valid and the cluster compress ratio is
    lower than the specified threshold.

    - To enable compression on regular inode, there are three ways:
    * chattr +c file
    * chattr +c dir; touch dir/file
    * mount w/ -o compress_extension=ext; touch file.ext
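
    The cluster arithmetic described above can be sketched as follows. The
    helper names are hypothetical; the only facts taken from the text are
    that a cluster has 4 << n logical pages and that a compressed cluster
    occupies at least 1 and at most one block fewer than the cluster size.

```c
#include <assert.h>

/* Logical pages per cluster for a given n, following the 4 << n rule. */
static unsigned int cluster_pages(unsigned int n)
{
    assert(n < 28);  /* keep the shift within unsigned int range */
    return 4u << n;
}

/* Index of the cluster that contains logical page page_idx. */
static unsigned long cluster_index(unsigned long page_idx, unsigned int n)
{
    return page_idx / cluster_pages(n);
}

/* Largest number of physical blocks a compressed cluster may occupy. */
static unsigned int max_compressed_blocks(unsigned int n)
{
    return cluster_pages(n) - 1;
}
```

    Note the parentheses: the upper bound is one less than the cluster
    size, whereas the C expression 4 << n - 1 would parse as 4 << (n - 1).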

    Compress metadata layout:
                                 [Dnode Structure]
                 +-----------------------------------------------+
                 | cluster 1 | cluster 2 | ......... | cluster N |
                 +-----------------------------------------------+
                 .           .                       .           .
           .                       .                .                      .
      .         Compressed Cluster       .        .        Normal Cluster            .
    +----------+---------+---------+---------+  +---------+---------+---------+---------+
    |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
    +----------+---------+---------+---------+  +---------+---------+---------+---------+
               .                             .
             .                                           .
           .                                                           .
          +-------------+-------------+----------+----------------------------+
          | data length | data chksum | reserved |      compressed data       |
          +-------------+-------------+----------+----------------------------+

    Changelog:

    20190326:
    - fix error handling of read_end_io().
    - remove unneeded comments in f2fs_encrypt_one_page().

    20190327:
    - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
    - don't jump into loop directly to avoid uninitialized variables.
    - add TODO tag in error path of f2fs_write_cache_pages().

    20190328:
    - fix wrong merge condition in f2fs_read_multi_pages().
    - check compressed file in f2fs_post_read_required().

    20190401
    - allow overwrite on non-compressed cluster.
    - check cluster meta before writing compressed data.

    20190402
    - don't preallocate blocks for compressed file.

    - add lz4 compress algorithm
    - process multiple post read works in one workqueue
    Now f2fs supports processing post read work in multiple workqueue,
    it shows low performance due to schedule overhead of multiple
    workqueue executing orderly.

    20190921
    - compress: support buffered overwrite
    C: compress cluster flag
    V: valid block address
    N: NEW_ADDR

    One cluster contain 4 blocks

    before overwrite after overwrite

    - VVVV -> CVNN
    - CVNN -> VVVV

    - CVNN -> CVNN
    - CVNN -> CVVV

    - CVVV -> CVNN
    - CVVV -> CVVV

    20191029
    - add kconfig F2FS_FS_COMPRESSION to isolate compression related
    codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
    note that: will remove lzo backend if Jaegeuk agreed that too.
    - update codes according to Eric's comments.

    20191101
    - apply fixes from Jaegeuk

    20191113
    - apply fixes from Jaegeuk
    - split workqueue for fsverity

    20191216
    - apply fixes from Jaegeuk

    20200117
    - fix to avoid NULL pointer dereference

    [Jaegeuk Kim]
    - add tracepoint for f2fs_{,de}compress_pages()
    - fix many bugs and add some compression stats
    - fix overwrite/mmap bugs
    - address 32bit build error, reported by Geert.
    - bug fixes when handling errors and i_compressed_blocks

    Reported-by:
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

08 Nov, 2019

1 commit

  • This patch supports 2MB-aligned pinned file, which can guarantee no GC at all
    by allocating fully valid 2MB segment.

    Check free segments by has_not_enough_free_secs() with large budget.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

07 Sep, 2019

1 commit

  • This patch changes the semantics of f2fs_is_checkpoint_ready()'s return
    value: return true when the checkpoint is ready, otherwise return false.
    This improves the readability of the conditions below.

    f2fs_submit_page_write()
    ...
        if (is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN) ||
                    !f2fs_is_checkpoint_ready(sbi))
            __submit_merged_bio(io);

    f2fs_balance_fs()
    ...
        if (!f2fs_is_checkpoint_ready(sbi))
            return;

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

23 Aug, 2019

2 commits

  • Policy - Foreground GC, LFS and greedy GC mode.

    Under this policy, f2fs_gc() loops forever to GC as it doesn't have
    enough free segments to proceed and thus it keeps calling gc_more
    for the same victim segment. This can happen if the selected victim
    segment could not be GC'd due to failed blkaddr validity check i.e.
    is_alive() returns false for the blocks set in current validity map.

    Fix this by keeping track of such invalid segments and skip those
    segments for selection in get_victim_by_default() to avoid endless
    GC loop under such error scenarios. Currently, add this logic under
    CONFIG_F2FS_CHECK_FS to be able to root cause the issue in debug
    version.
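
    The idea of remembering segments that could not be GC'd and skipping
    them during victim selection can be sketched as below. This is a
    minimal illustration with hypothetical names (test_bit_ul(),
    set_bit_ul(), pick_victim()) and a plain greedy policy; it is not the
    code in get_victim_by_default().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static int test_bit_ul(const unsigned long *map, size_t nr)
{
    return (int)((map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL);
}

static void set_bit_ul(unsigned long *map, size_t nr)
{
    map[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/*
 * Greedy pick: the segment with the fewest valid blocks among those not
 * marked in invalid_map. Returns the segment index, or -1 if none is left.
 */
static long pick_victim(const unsigned int *valid, size_t nsegs,
                        const unsigned long *invalid_map)
{
    long best = -1;

    assert(valid != NULL && invalid_map != NULL);
    for (size_t i = 0; i < nsegs; i++) {
        if (test_bit_ul(invalid_map, i))
            continue;  /* GC of this segment failed before; skip it */
        if (best < 0 || valid[i] < valid[best])
            best = (long)i;
    }
    return best;
}
```

    A caller would set the bit for a victim whose blocks failed the
    validity check, so the next selection moves on to another segment
    instead of retrying the same one forever.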

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    [Jaegeuk Kim: fix wrong bitmap size]
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • build_sit_info() allocates the bitmaps for each segment one by one,
    which is quite inefficient. This patch changes it to allocate one large
    contiguous chunk of memory at a time and divide it among the bitmaps
    of each segment. For large images, this can be expected to improve
    mount speed.
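
    The allocation pattern can be sketched in userspace C as follows. The
    struct, field names and the choice of two 64-byte bitmaps per segment
    are assumptions made for illustration; the kernel code uses its own
    allocators and structures.

```c
#include <assert.h>
#include <stdlib.h>

#define MAP_BYTES 64  /* assumed: 512 blocks per segment / 8 bits per byte */

/* Hypothetical, simplified segment entry with two bitmaps. */
struct seg_maps {
    unsigned char *cur_map;
    unsigned char *ckpt_map;
};

/*
 * Allocate one zeroed buffer for all bitmaps and hand out slices of it.
 * Returns the buffer (to be freed once by the caller), or NULL on failure.
 */
static unsigned char *alloc_seg_bitmaps(struct seg_maps *segs, size_t nsegs)
{
    unsigned char *buf, *p;

    assert(segs != NULL && nsegs > 0);
    buf = calloc(nsegs * 2, MAP_BYTES);
    if (!buf)
        return NULL;

    p = buf;
    for (size_t i = 0; i < nsegs; i++) {
        segs[i].cur_map = p;
        p += MAP_BYTES;
        segs[i].ckpt_map = p;
        p += MAP_BYTES;
    }
    return buf;
}
```

    One large allocation replaces 2 * nsegs small ones, and teardown
    becomes a single free() of the returned buffer.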

    Signed-off-by: Chen Gong
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

03 Jul, 2019

3 commits

  • f2fs uses EFAULT as error number to indicate filesystem is corrupted
    all the time, but generic filesystems use EUCLEAN for such a condition,
    so we need to change to follow the others.

    This patch adds two new macros as below to wrap more generic error
    code macros, and spread them in code.

    EFSBADCRC EBADMSG /* Bad CRC detected */
    EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
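
    Written out as C, the two wrappers might look like the sketch below.
    It assumes a Linux errno.h (EUCLEAN is not available everywhere), and
    check_crc() is a hypothetical helper that only shows how such a macro
    would be returned in the kernel's negative-errno style.

```c
#include <assert.h>
#include <errno.h>

#define EFSBADCRC     EBADMSG   /* Bad CRC detected */
#define EFSCORRUPTED  EUCLEAN   /* Filesystem is corrupted */

/* The point of the change: corruption is no longer reported as EFAULT. */
static_assert(EFSCORRUPTED != EFAULT, "corruption must not map to EFAULT");

/* Map a checksum comparison to 0 or a negative error code. */
static int check_crc(unsigned int stored, unsigned int computed)
{
    return stored == computed ? 0 : -EFSBADCRC;
}
```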

    Reported-by: Pavel Machek
    Signed-off-by: Chao Yu
    Acked-by: Pavel Machek
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Replace the open-coded divisions with round-up by calls to the
    DIV_ROUND_UP() helper macro.
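
    As an illustration of the replacement, the sketch below shows an
    open-coded round-up division next to the helper form. The macro body
    mirrors the usual kernel definition; the two wrapper functions are
    hypothetical and exist only for comparison.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Open-coded: blocks needed to hold `bytes` bytes. */
static unsigned int blocks_open_coded(unsigned int bytes, unsigned int blksize)
{
    assert(blksize != 0);
    return (bytes + blksize - 1) / blksize;
}

/* Same computation expressed with the helper macro. */
static unsigned int blocks_helper(unsigned int bytes, unsigned int blksize)
{
    assert(blksize != 0);
    return DIV_ROUND_UP(bytes, blksize);
}
```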

    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Geert Uytterhoeven
     
  • - Add and use f2fs_ macros
    - Convert f2fs_msg to f2fs_printk
    - Remove level from f2fs_printk and embed the level in the format
    - Coalesce formats and align multi-line arguments
    - Remove unnecessary duplicate extern f2fs_msg f2fs.h

    Signed-off-by: Joe Perches
    Signed-off-by: Chao Yu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Joe Perches
     

09 May, 2019

2 commits

  • Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
    whether @blkaddr locates in main area or not.

    That check is weak, since a block address within the range of the main
    area can point to an address which is not valid in the segment info
    table. We cannot detect such a condition, so we may suffer worse
    corruption as the system continues running.

    So this patch introduces DATA_GENERIC_ENHANCE to enhance the sanity check,
    which triggers a SIT bitmap check rather than only a range check.
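
    The difference between the two checks can be sketched as below. This
    is a simplified model: struct fs_layout, its fields and the LSB-first
    bit order are assumptions for illustration and do not match the
    on-disk SIT format or the kernel helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: main area [main_blkaddr, max_blkaddr) plus a bitmap. */
struct fs_layout {
    uint32_t main_blkaddr;
    uint32_t max_blkaddr;
    const uint8_t *valid_map;  /* one bit per block in the main area */
};

/* Weak check: the address only has to fall inside the main area. */
static bool blkaddr_in_main_area(const struct fs_layout *fs, uint32_t blkaddr)
{
    return blkaddr >= fs->main_blkaddr && blkaddr < fs->max_blkaddr;
}

/* Enhanced check: in range and marked valid in the bitmap. */
static bool blkaddr_is_valid(const struct fs_layout *fs, uint32_t blkaddr)
{
    uint32_t off;

    assert(fs != NULL && fs->valid_map != NULL);
    if (!blkaddr_in_main_area(fs, blkaddr))
        return false;
    off = blkaddr - fs->main_blkaddr;
    return (fs->valid_map[off / 8] >> (off % 8)) & 1;
}
```

    An address such as main_blkaddr + 1 with its bit cleared passes the
    range check but fails the enhanced one, which is the case the weaker
    check could not detect.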

    This patch also makes the changes below:
    - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
    - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
    - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
    - spread blkaddr check in:
    * f2fs_get_node_info()
    * __read_out_blkaddrs()
    * f2fs_submit_page_read()
    * ra_data_block()
    * do_recover_data()

    This patch can fix the bugs reported in bugzilla below:

    https://bugzilla.kernel.org/show_bug.cgi?id=203215
    https://bugzilla.kernel.org/show_bug.cgi?id=203223
    https://bugzilla.kernel.org/show_bug.cgi?id=203231
    https://bugzilla.kernel.org/show_bug.cgi?id=203235
    https://bugzilla.kernel.org/show_bug.cgi?id=203241

    = Update by Jaegeuk Kim =

    DATA_GENERIC_ENHANCE was enhanced to validate block addresses on read/write
    paths. But xfstest/generic/446 complains about generated kernel messages
    saying an invalid bitmap was detected when reading a block. The reason is
    that, when we get the block addresses from extent_cache, there is no lock
    to synchronize it against blocks being truncated in parallel.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Jungyeon reported in bugzilla:

    https://bugzilla.kernel.org/show_bug.cgi?id=203233

    - Overview
    When mounting the attached crafted image and running program, following errors are reported.
    Additionally, it hangs on sync after running program.

    The image is intentionally fuzzed from a normal f2fs image for testing.
    Compile options for F2FS are as follows.
    CONFIG_F2FS_FS=y
    CONFIG_F2FS_STAT_FS=y
    CONFIG_F2FS_FS_XATTR=y
    CONFIG_F2FS_FS_POSIX_ACL=y
    CONFIG_F2FS_CHECK_FS=y

    - Reproduces
    cc poc_13.c
    mkdir test
    mount -t f2fs tmp.img test
    cp a.out test
    cd test
    sudo ./a.out
    sync

    - Kernel messages
    F2FS-fs (sdb): Bitmap was wrongly set, blk:4608
    kernel BUG at fs/f2fs/segment.c:2102!
    RIP: 0010:update_sit_entry+0x394/0x410
    Call Trace:
    f2fs_allocate_data_block+0x16f/0x660
    do_write_page+0x62/0x170
    f2fs_do_write_node_page+0x33/0xa0
    __write_node_page+0x270/0x4e0
    f2fs_sync_node_pages+0x5df/0x670
    f2fs_write_checkpoint+0x372/0x1400
    f2fs_sync_fs+0xa3/0x130
    f2fs_do_sync_file+0x1a6/0x810
    do_fsync+0x33/0x60
    __x64_sys_fsync+0xb/0x10
    do_syscall_64+0x43/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    sit.vblocks and the sum of valid blocks counted in sit.valid_map may be
    inconsistent. A segment w/ zero vblocks will be treated as a free
    segment, and while allocating in a free segment we may allocate a
    free block whose bitmap bit was previously set as valid, which can
    cause a kernel crash due to bitmap verification failure.

    Anyway, to avoid further serious metadata inconsistency and
    corruption, it is necessary and worthwhile to detect SIT
    inconsistency. So let's enable check_block_count() to verify
    vblocks and valid_map all the time rather than only when
    CONFIG_F2FS_CHECK_FS is enabled.
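
    The kind of consistency check described here can be sketched as below.
    It is an illustration only: count_valid_bits() and check_sit_entry()
    are hypothetical helpers, and the real check_block_count() works on
    the on-disk SIT entry and reports errors differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the bits set in a validity bitmap. */
static unsigned int count_valid_bits(const uint8_t *map, size_t bytes)
{
    unsigned int n = 0;

    for (size_t i = 0; i < bytes; i++)
        for (uint8_t b = map[i]; b; b &= (uint8_t)(b - 1))
            n++;  /* each iteration clears the lowest set bit */
    return n;
}

/*
 * Return 0 when the recorded count matches the bitmap and does not
 * exceed the segment size, -1 when the entry is inconsistent.
 */
static int check_sit_entry(unsigned int vblocks, const uint8_t *map,
                           size_t bytes, unsigned int blks_per_seg)
{
    assert(map != NULL);
    if (vblocks > blks_per_seg)
        return -1;
    if (count_valid_bits(map, bytes) != vblocks)
        return -1;
    return 0;
}
```

    The failing case from the report corresponds to vblocks being 0 while
    the bitmap still has bits set, which this check rejects.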

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

16 Feb, 2019

1 commit


27 Nov, 2018

1 commit


17 Oct, 2018

1 commit

  • Note that, it requires "f2fs: return correct errno in f2fs_gc".

    This adds a lightweight non-persistent snapshotting scheme to f2fs.

    To use, mount with the option checkpoint=disable, and to return to
    normal operation, remount with checkpoint=enable. If the filesystem
    is shut down before remounting with checkpoint=enable, it will revert
    back to its apparent state when it was first mounted with
    checkpoint=disable. This is useful for situations where you wish to be
    able to roll back the state of the disk in case of some critical
    failure.

    Signed-off-by: Daniel Rosenberg
    [Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY]
    Signed-off-by: Jaegeuk Kim

    Daniel Rosenberg
     

13 Sep, 2018

1 commit


21 Aug, 2018

1 commit

  • The f2fs_gc() called by f2fs_balance_fs() requires to be called outside of
    fi->i_gc_rwsem[WRITE], since f2fs_gc() can try to grab it in a loop.

    If it hits the maximum retries in GC, let's give a chance to release
    gc_mutex for a short time in order not to go into live lock in the worst
    case.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

02 Aug, 2018

2 commits

  • For the case when sbi->segs_per_sec > 1, take section:segment = 5 for
    example: if segment 1 has just been used up and a new segment 2 is
    allocated, and the blocks of segment 1 are then invalidated, the previous
    code will use __set_test_and_free to free the section in free_secmap and
    do free_sections++. This is not correct since it is still the current
    section, so fix it.

    Signed-off-by: Yunlong Song
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Yunlong Song
     
  • This patch adds sanity checks on the fields below:
    - cp_pack_total_block_count
    - blkaddr of data/node
    - extent info

    - Overview
    BUG() in verify_block_addr() when writing to a corrupted f2fs image

    - Reproduce (4.18 upstream kernel)

    - POC (poc.c)

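    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    static void activity(char *mpoint) {

      char *foo_bar_baz;
      int err;

      static int buf[8192];
      memset(buf, 0, sizeof(buf));

      err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);
      if (err < 0)
        return;

      /* Overwrite an existing file on the image and force writeback. */
      int fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
      if (fd >= 0) {
        write(fd, (char *)buf, sizeof(buf));
        fdatasync(fd);
        close(fd);
      }
      free(foo_bar_baz);
    }

    int main(int argc, char *argv[]) {
      activity(argv[1]);
      return 0;
    }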

    - Kernel message
    [ 689.349473] F2FS-fs (loop0): Mounted with checkpoint version = 3
    [ 699.728662] WARNING: CPU: 0 PID: 1309 at fs/f2fs/segment.c:2860 f2fs_inplace_write_data+0x232/0x240
    [ 699.728670] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
    [ 699.729056] CPU: 0 PID: 1309 Comm: a.out Not tainted 4.18.0-rc1+ #4
    [ 699.729064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    [ 699.729074] RIP: 0010:f2fs_inplace_write_data+0x232/0x240
    [ 699.729076] Code: ff e9 cf fe ff ff 49 8d 7d 10 e8 39 45 ad ff 4d 8b 7d 10 be 04 00 00 00 49 8d 7f 48 e8 07 49 ad ff 45 8b 7f 48 e9 fb fe ff ff 0b f0 41 80 4d 48 04 e9 65 fe ff ff 90 66 66 66 66 90 55 48 8d
    [ 699.729130] RSP: 0018:ffff8801f43af568 EFLAGS: 00010202
    [ 699.729139] RAX: 000000000000003f RBX: ffff8801f43af7b8 RCX: ffffffffb88c9113
    [ 699.729142] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8802024e5540
    [ 699.729144] RBP: ffff8801f43af590 R08: 0000000000000009 R09: ffffffffffffffe8
    [ 699.729147] R10: 0000000000000001 R11: ffffed0039b0596a R12: ffff8802024e5540
    [ 699.729149] R13: ffff8801f0335500 R14: ffff8801e3e7a700 R15: ffff8801e1ee4450
    [ 699.729154] FS: 00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
    [ 699.729156] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 699.729159] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
    [ 699.729171] Call Trace:
    [ 699.729192] f2fs_do_write_data_page+0x2e2/0xe00
    [ 699.729203] ? f2fs_should_update_outplace+0xd0/0xd0
    [ 699.729238] ? memcg_drain_all_list_lrus+0x280/0x280
    [ 699.729269] ? __radix_tree_replace+0xa3/0x120
    [ 699.729276] __write_data_page+0x5c7/0xe30
    [ 699.729291] ? kasan_check_read+0x11/0x20
    [ 699.729310] ? page_mapped+0x8a/0x110
    [ 699.729321] ? page_mkclean+0xe9/0x160
    [ 699.729327] ? f2fs_do_write_data_page+0xe00/0xe00
    [ 699.729331] ? invalid_page_referenced_vma+0x130/0x130
    [ 699.729345] ? clear_page_dirty_for_io+0x332/0x450
    [ 699.729351] f2fs_write_cache_pages+0x4ca/0x860
    [ 699.729358] ? __write_data_page+0xe30/0xe30
    [ 699.729374] ? percpu_counter_add_batch+0x22/0xa0
    [ 699.729380] ? kasan_check_write+0x14/0x20
    [ 699.729391] ? _raw_spin_lock+0x17/0x40
    [ 699.729403] ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
    [ 699.729413] ? iov_iter_advance+0x113/0x640
    [ 699.729418] ? f2fs_write_end+0x133/0x2e0
    [ 699.729423] ? balance_dirty_pages_ratelimited+0x239/0x640
    [ 699.729428] f2fs_write_data_pages+0x329/0x520
    [ 699.729433] ? generic_perform_write+0x250/0x320
    [ 699.729438] ? f2fs_write_cache_pages+0x860/0x860
    [ 699.729454] ? current_time+0x110/0x110
    [ 699.729459] ? f2fs_preallocate_blocks+0x1ef/0x370
    [ 699.729464] do_writepages+0x37/0xb0
    [ 699.729468] ? f2fs_write_cache_pages+0x860/0x860
    [ 699.729472] ? do_writepages+0x37/0xb0
    [ 699.729478] __filemap_fdatawrite_range+0x19a/0x1f0
    [ 699.729483] ? delete_from_page_cache_batch+0x4e0/0x4e0
    [ 699.729496] ? __vfs_write+0x2b2/0x410
    [ 699.729501] file_write_and_wait_range+0x66/0xb0
    [ 699.729506] f2fs_do_sync_file+0x1f9/0xd90
    [ 699.729511] ? truncate_partial_data_page+0x290/0x290
    [ 699.729521] ? __sb_end_write+0x30/0x50
    [ 699.729526] ? vfs_write+0x20f/0x260
    [ 699.729530] f2fs_sync_file+0x9a/0xb0
    [ 699.729534] ? f2fs_do_sync_file+0xd90/0xd90
    [ 699.729548] vfs_fsync_range+0x68/0x100
    [ 699.729554] ? __fget_light+0xc9/0xe0
    [ 699.729558] do_fsync+0x3d/0x70
    [ 699.729562] __x64_sys_fdatasync+0x24/0x30
    [ 699.729585] do_syscall_64+0x78/0x170
    [ 699.729595] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 699.729613] RIP: 0033:0x7f9bf930d800
    [ 699.729615] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
    [ 699.729668] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
    [ 699.729673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
    [ 699.729675] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
    [ 699.729678] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
    [ 699.729680] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
    [ 699.729683] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
    [ 699.729687] ---[ end trace 4ce02f25ff7d3df5 ]---
    [ 699.729782] ------------[ cut here ]------------
    [ 699.729785] kernel BUG at fs/f2fs/segment.h:654!
    [ 699.731055] invalid opcode: 0000 [#1] SMP KASAN PTI
    [ 699.732104] CPU: 0 PID: 1309 Comm: a.out Tainted: G W 4.18.0-rc1+ #4
    [ 699.733684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    [ 699.735611] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
    [ 699.736649] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
    [ 699.740524] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
    [ 699.741573] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
    [ 699.743006] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
    [ 699.744426] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
    [ 699.745833] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
    [ 699.747256] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
    [ 699.748683] FS: 00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
    [ 699.750293] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 699.751462] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
    [ 699.752874] Call Trace:
    [ 699.753386] ? f2fs_inplace_write_data+0x93/0x240
    [ 699.754341] f2fs_inplace_write_data+0xd2/0x240
    [ 699.755271] f2fs_do_write_data_page+0x2e2/0xe00
    [ 699.756214] ? f2fs_should_update_outplace+0xd0/0xd0
    [ 699.757215] ? memcg_drain_all_list_lrus+0x280/0x280
    [ 699.758209] ? __radix_tree_replace+0xa3/0x120
    [ 699.759164] __write_data_page+0x5c7/0xe30
    [ 699.760002] ? kasan_check_read+0x11/0x20
    [ 699.760823] ? page_mapped+0x8a/0x110
    [ 699.761573] ? page_mkclean+0xe9/0x160
    [ 699.762345] ? f2fs_do_write_data_page+0xe00/0xe00
    [ 699.763332] ? invalid_page_referenced_vma+0x130/0x130
    [ 699.764374] ? clear_page_dirty_for_io+0x332/0x450
    [ 699.765347] f2fs_write_cache_pages+0x4ca/0x860
    [ 699.766276] ? __write_data_page+0xe30/0xe30
    [ 699.767161] ? percpu_counter_add_batch+0x22/0xa0
    [ 699.768112] ? kasan_check_write+0x14/0x20
    [ 699.768951] ? _raw_spin_lock+0x17/0x40
    [ 699.769739] ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
    [ 699.770885] ? iov_iter_advance+0x113/0x640
    [ 699.771743] ? f2fs_write_end+0x133/0x2e0
    [ 699.772569] ? balance_dirty_pages_ratelimited+0x239/0x640
    [ 699.773680] f2fs_write_data_pages+0x329/0x520
    [ 699.774603] ? generic_perform_write+0x250/0x320
    [ 699.775544] ? f2fs_write_cache_pages+0x860/0x860
    [ 699.776510] ? current_time+0x110/0x110
    [ 699.777299] ? f2fs_preallocate_blocks+0x1ef/0x370
    [ 699.778279] do_writepages+0x37/0xb0
    [ 699.779026] ? f2fs_write_cache_pages+0x860/0x860
    [ 699.779978] ? do_writepages+0x37/0xb0
    [ 699.780755] __filemap_fdatawrite_range+0x19a/0x1f0
    [ 699.781746] ? delete_from_page_cache_batch+0x4e0/0x4e0
    [ 699.782820] ? __vfs_write+0x2b2/0x410
    [ 699.783597] file_write_and_wait_range+0x66/0xb0
    [ 699.784540] f2fs_do_sync_file+0x1f9/0xd90
    [ 699.785381] ? truncate_partial_data_page+0x290/0x290
    [ 699.786415] ? __sb_end_write+0x30/0x50
    [ 699.787204] ? vfs_write+0x20f/0x260
    [ 699.787941] f2fs_sync_file+0x9a/0xb0
    [ 699.788694] ? f2fs_do_sync_file+0xd90/0xd90
    [ 699.789572] vfs_fsync_range+0x68/0x100
    [ 699.790360] ? __fget_light+0xc9/0xe0
    [ 699.791128] do_fsync+0x3d/0x70
    [ 699.791779] __x64_sys_fdatasync+0x24/0x30
    [ 699.792614] do_syscall_64+0x78/0x170
    [ 699.793371] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 699.794406] RIP: 0033:0x7f9bf930d800
    [ 699.795134] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
    [ 699.798960] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
    [ 699.800483] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
    [ 699.801923] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
    [ 699.803373] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
    [ 699.804798] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
    [ 699.806233] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
    [ 699.807667] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
    [ 699.817079] ---[ end trace 4ce02f25ff7d3df6 ]---
    [ 699.818068] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
    [ 699.819114] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
    [ 699.822919] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
    [ 699.823977] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
    [ 699.825436] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
    [ 699.826881] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
    [ 699.828292] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
    [ 699.829750] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
    [ 699.831192] FS: 00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
    [ 699.832793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 699.833981] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
    [ 699.835556] ==================================================================
    [ 699.837029] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x38c/0x3e0
    [ 699.838462] Read of size 8 at addr ffff8801f43af970 by task a.out/1309

    [ 699.840086] CPU: 0 PID: 1309 Comm: a.out Tainted: G D W 4.18.0-rc1+ #4
    [ 699.841603] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    [ 699.843475] Call Trace:
    [ 699.843982] dump_stack+0x7b/0xb5
    [ 699.844661] print_address_description+0x70/0x290
    [ 699.845607] kasan_report+0x291/0x390
    [ 699.846351] ? update_stack_state+0x38c/0x3e0
    [ 699.853831] __asan_load8+0x54/0x90
    [ 699.854569] update_stack_state+0x38c/0x3e0
    [ 699.855428] ? __read_once_size_nocheck.constprop.7+0x20/0x20
    [ 699.856601] ? __save_stack_trace+0x5e/0x100
    [ 699.857476] unwind_next_frame.part.5+0x18e/0x490
    [ 699.858448] ? unwind_dump+0x290/0x290
    [ 699.859217] ? clear_page_dirty_for_io+0x332/0x450
    [ 699.860185] __unwind_start+0x106/0x190
    [ 699.860974] __save_stack_trace+0x5e/0x100
    [ 699.861808] ? __save_stack_trace+0x5e/0x100
    [ 699.862691] ? unlink_anon_vmas+0xba/0x2c0
    [ 699.863525] save_stack_trace+0x1f/0x30
    [ 699.864312] save_stack+0x46/0xd0
    [ 699.864993] ? __alloc_pages_slowpath+0x1420/0x1420
    [ 699.865990] ? flush_tlb_mm_range+0x15e/0x220
    [ 699.866889] ? kasan_check_write+0x14/0x20
    [ 699.867724] ? __dec_node_state+0x92/0xb0
    [ 699.868543] ? lock_page_memcg+0x85/0xf0
    [ 699.869350] ? unlock_page_memcg+0x16/0x80
    [ 699.870185] ? page_remove_rmap+0x198/0x520
    [ 699.871048] ? mark_page_accessed+0x133/0x200
    [ 699.871930] ? _cond_resched+0x1a/0x50
    [ 699.872700] ? unmap_page_range+0xcd4/0xe50
    [ 699.873551] ? rb_next+0x58/0x80
    [ 699.874217] ? rb_next+0x58/0x80
    [ 699.874895] __kasan_slab_free+0x13c/0x1a0
    [ 699.875734] ? unlink_anon_vmas+0xba/0x2c0
    [ 699.876563] kasan_slab_free+0xe/0x10
    [ 699.877315] kmem_cache_free+0x89/0x1e0
    [ 699.878095] unlink_anon_vmas+0xba/0x2c0
    [ 699.878913] free_pgtables+0x101/0x1b0
    [ 699.879677] exit_mmap+0x146/0x2a0
    [ 699.880378] ? __ia32_sys_munmap+0x50/0x50
    [ 699.881214] ? kasan_check_read+0x11/0x20
    [ 699.882052] ? mm_update_next_owner+0x322/0x380
    [ 699.882985] mmput+0x8b/0x1d0
    [ 699.883602] do_exit+0x43a/0x1390
    [ 699.884288] ? mm_update_next_owner+0x380/0x380
    [ 699.885212] ? f2fs_sync_file+0x9a/0xb0
    [ 699.885995] ? f2fs_do_sync_file+0xd90/0xd90
    [ 699.886877] ? vfs_fsync_range+0x68/0x100
    [ 699.887694] ? __fget_light+0xc9/0xe0
    [ 699.888442] ? do_fsync+0x3d/0x70
    [ 699.889118] ? __x64_sys_fdatasync+0x24/0x30
    [ 699.889996] rewind_stack_do_exit+0x17/0x20
    [ 699.890860] RIP: 0033:0x7f9bf930d800
    [ 699.891585] Code: Bad RIP value.
    [ 699.892268] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
    [ 699.893781] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
    [ 699.895220] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
    [ 699.896643] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
    [ 699.898069] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
    [ 699.899505] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000

    [ 699.901241] The buggy address belongs to the page:
    [ 699.902215] page:ffffea0007d0ebc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
    [ 699.903811] flags: 0x2ffff0000000000()
    [ 699.904585] raw: 02ffff0000000000 0000000000000000 ffffffff07d00101 0000000000000000
    [ 699.906125] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000
    [ 699.907673] page dumped because: kasan: bad access detected

    [ 699.909108] Memory state around the buggy address:
    [ 699.910077] ffff8801f43af800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00
    [ 699.911528] ffff8801f43af880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [ 699.912953] >ffff8801f43af900: 00 00 00 00 00 00 00 00 f1 01 f4 f4 f4 f2 f2 f2
    [ 699.914392] ^
    [ 699.915758] ffff8801f43af980: f2 00 f4 f4 00 00 00 00 f2 00 00 00 00 00 00 00
    [ 699.917193] ffff8801f43afa00: 00 00 00 00 00 00 00 00 00 f3 f3 f3 00 00 00 00
    [ 699.918634] ==================================================================

    - Location
    https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.h#L644

    Reported-by: Wen Xu
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

27 Jul, 2018

1 commit

  • This patch introduces verify_blkaddr to check that a meta/data block
    address falls within its valid range, so that bugs are detected earlier.

    In addition, once we encounter an invalid blkaddr, notify the user to
    run fsck to fix it, and let the kernel panic.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
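
The range check described in the verify_blkaddr commit above can be sketched in plain C. This is only an illustration, not the kernel code: the `layout` struct, its field names, and `data_blkaddr_in_range()` are hypothetical, and the half-open range is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout description; field names are illustrative only. */
struct layout {
    uint32_t main_blkaddr; /* first block of the main (data) area */
    uint32_t max_blkaddr;  /* one past the last valid block (assumed) */
};

/* Return true when blkaddr lies inside [main_blkaddr, max_blkaddr). */
static bool data_blkaddr_in_range(const struct layout *l, uint32_t blkaddr)
{
    return blkaddr >= l->main_blkaddr && blkaddr < l->max_blkaddr;
}
```

A caller would run such a check before using the address and report the corruption when it fails, rather than submitting I/O to a bogus location.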
     

05 Jun, 2018

1 commit

  • If we change the system time to the past, get_mtime() will return an
    overflowed time, and SIT_I(sbi)->max_mtime will be updated
    incorrectly. This patch fixes both issues.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
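
The overflow described in the get_mtime() commit above comes from unsigned subtraction when the clock moves backwards. The following is a minimal sketch under assumed names (`time_info`, `mtime_naive()`, `mtime_guarded()` are hypothetical and do not reproduce the kernel function); it only shows the wrap-around and one possible guard.

```c
#include <stdint.h>

/* Hypothetical bookkeeping, loosely modelled on the commit description. */
struct time_info {
    uint64_t mounted_time; /* wall-clock seconds recorded at mount */
    uint64_t elapsed_time; /* seconds accumulated before this mount */
};

/* Naive version: wraps around when now < mounted_time + ... allows. */
static uint64_t mtime_naive(const struct time_info *t, uint64_t now)
{
    return t->elapsed_time + now - t->mounted_time;
}

/* Guarded version: never returns a wrapped value. */
static uint64_t mtime_guarded(const struct time_info *t, uint64_t now)
{
    if (now >= t->mounted_time)
        return t->elapsed_time + (now - t->mounted_time);

    /* The clock was set to the past: subtract, clamping at zero. */
    uint64_t back = t->mounted_time - now;
    return t->elapsed_time >= back ? t->elapsed_time - back : 0;
}
```

Any maximum tracked from the naive value would be poisoned by the wrapped result, which is why the guard matters for a field such as max_mtime.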
     

01 Jun, 2018

4 commits

  • f2fs doesn't allow abuse of the atomic write interface, so besides
    limiting the total memory used by in-memory pages, we also need to
    limit atomic-write usage when the filesystem is seriously fragmented.
    Otherwise foreground GC may loop forever because the target blocks in
    the victim segment belong to a file that has been held open for atomic
    write for a long time.

    Now we detect foreground GC failures caused by atomic writes; if the
    count exceeds a threshold, we drop all atomically written data in the
    cache. I expect this to keep the system running safely and to prevent
    DoS attacks.

    In addition, this patch shows GC skip information in debugfs; for now
    it only shows the count of skips caused by atomic writes.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
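
The counting scheme described in the atomic-write commit above can be sketched as follows. The threshold value, the `gc_stats` struct, and `note_fggc_result()` are all hypothetical; the commit text does not give the real threshold or the exact reset policy, so resetting after a drop is an assumption.

```c
#include <stdbool.h>

/* Hypothetical threshold; the commit text does not state the real value. */
#define MAX_ATOMIC_SKIPS 16U

struct gc_stats {
    unsigned int atomic_skips; /* foreground GC failures blamed on atomic writes */
};

/*
 * Record the outcome of one foreground GC attempt. Returns true when the
 * caller should drop all cached atomic-write data.
 */
static bool note_fggc_result(struct gc_stats *st, bool failed_on_atomic)
{
    if (failed_on_atomic)
        st->atomic_skips++;

    if (st->atomic_skips > MAX_ATOMIC_SKIPS) {
        st->atomic_skips = 0; /* assumed: start counting again after the drop */
        return true;
    }
    return false;
}
```

The point of the threshold is that a single skipped round is harmless, while a long run of skips means foreground GC cannot make progress until the atomic data is released.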
     
  • - rename is_valid_blkaddr() to is_valid_meta_blkaddr() for readability.
    - introduce is_valid_blkaddr() for cleanup.

    No logic change in this patch.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • For an extreme case:
    10 sections, op = 10%, no_fggc_threshold = 90%
    Usage of all sections: 85% 85% 85% 85% 90% 90% 95% 95% 95% 95%

    During foreground GC, if we skip selecting dirty sections whose usage
    is larger than no_fggc_threshold, we can only recycle 80% invalid
    space from the four 85% usage sections and the two 90% usage sections,
    resulting in an out-of-space issue.

    This reverts commit e93b9865251a0503d83fd570e7d5a7c8bc351715 to
    fix this issue. Besides, we keep the logic of scanning all dirty
    sections when searching for a victim, so that GC can select the victim
    with the fewest valid blocks.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
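
The arithmetic in the no_fggc_threshold commit above can be checked with a few lines of C. The helper name `reclaimable_invalid()` is hypothetical; it sums the invalid space, in percent of one section, of every section whose usage does not exceed the threshold.

```c
#include <stddef.h>

/*
 * Sum the invalid space (in percent of one section) that GC may reclaim
 * when sections with usage above `threshold` are skipped.
 */
static unsigned int reclaimable_invalid(const unsigned int *usage, size_t n,
                                        unsigned int threshold)
{
    unsigned int total = 0;

    for (size_t i = 0; i < n; i++)
        if (usage[i] <= threshold)
            total += 100 - usage[i];
    return total;
}
```

For the usage list in the commit message, a threshold of 90 gives 4 * 15 + 2 * 10 = 80, while considering every section (threshold 100) gives 100, that is, one full section's worth of invalid space.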
     
  • Related to https://lkml.org/lkml/2018/4/8/661

    Sometimes we need to write metadata to a newly allocated block address.
    We then allocate a zeroed page in an inner inode's address space, fill
    in part of the data, and leave the rest zeroed, which means those
    fields are in their initial state.

    Two inner inodes (the meta inode and the node inode) set __GFP_ZERO.
    I have checked both of them: in both cases we can avoid __GFP_ZERO
    and do the initialization ourselves, avoiding unneeded/redundant
    zeroing by mm.

    Cc:
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

19 Mar, 2018

1 commit


13 Mar, 2018

1 commit


26 Jan, 2018

1 commit

  • This patch rebuilds the SIT page from the in-memory SIT info instead
    of issuing a read I/O.

    I tested this method and the results are below:

    Pre:
    mmc_perf_test-12061 [001] ...1 976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-12061 [001] ...1 976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-12061 [003] ...1 998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-12061 [003] ...1 999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-12061 [003] ...1 1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-12061 [003] ...1 1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-12061 [002] ...1 1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-12061 [003] ...1 1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-12061 [003] ...1 1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-12061 [003] ...1 1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-12061 [003] ...1 1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-12061 [003] ...1 1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-12061 [001] ...1 1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-12061 [001] ...1 1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-12061 [003] ...1 1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-12061 [002] ...1 1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-12061 [002] ...1 1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-12061 [002] ...1 1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit

    Some SIT flushes take more than 50 ms.

    Now:
    mmc_perf_test-2187 [007] ...1 196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-2187 [007] ...1 196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-2187 [007] ...1 219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-2187 [007] ...1 219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-2187 [002] ...1 243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-2187 [000] ...1 243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-2187 [002] ...1 265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-2187 [002] ...1 265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-2187 [000] ...1 290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-2187 [000] ...1 290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-2187 [003] ...1 317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-2187 [003] ...1 317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-2187 [005] ...1 343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-2187 [005] ...1 343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-2187 [000] ...1 370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-2187 [000] ...1 370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-2187 [001] ...1 397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-2187 [001] ...1 397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
    mmc_perf_test-2187 [003] ...1 425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
    mmc_perf_test-2187 [003] ...1 425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit

    Most SIT flushes take no more than 1 ms.

    Signed-off-by: Yunlei He
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Yunlei He
     

23 Jan, 2018

1 commit

  • This patch splits need_inplace_update into two functions:
    a. should_update_inplace() includes all conditions under which we must use IPU.
    b. should_update_outplace() includes all conditions under which we must use OPU.

    This way, in f2fs_ioc_set_pin_file() and f2fs_defragment_range(), we can
    use the corresponding function to check whether OPU/IPU can be triggered.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
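
The benefit of splitting the check into a "must use IPU" predicate and a "must use OPU" predicate, as the commit above describes, can be sketched like this. Everything here is hypothetical: the `file_state` flags stand in for whatever conditions the real functions test, and `can_pin()` / `can_defragment()` only illustrate how callers might consume the two predicates.

```c
#include <stdbool.h>

/* Hypothetical per-file state; the real conditions are not listed in the commit. */
struct file_state {
    bool pinned;             /* assumed example of a condition forcing IPU */
    bool log_structured_only; /* assumed example of a condition forcing OPU */
};

static bool must_update_inplace(const struct file_state *f)
{
    return f->pinned;
}

static bool must_update_outplace(const struct file_state *f)
{
    return f->log_structured_only;
}

/* Pinning relies on in-place updates, so it is refused when OPU is mandatory. */
static bool can_pin(const struct file_state *f)
{
    return !must_update_outplace(f);
}

/* Defragmentation relocates blocks, so it is refused when IPU is mandatory. */
static bool can_defragment(const struct file_state *f)
{
    return !must_update_inplace(f);
}
```

With a single combined predicate, a caller cannot tell "IPU is required" apart from "OPU is not required"; two predicates make both questions answerable.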
     

03 Jan, 2018

1 commit


06 Nov, 2017

1 commit

  • When we are close to triggering foreground GC and there are only a
    few dirty metas, we can log these dirty metas in the space left in the
    open segments instead of triggering foreground GC.

    With this patch, the total count of foreground GCs triggered by
    test/generic/* of the fstest suite drops from 254 to 184.

    So let's do the check before foreground GC anyway.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
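
The check described in the commit above, performed before foreground GC is triggered, can be sketched as follows. The `space_info` fields and `need_foreground_gc()` are hypothetical, and the "close to triggering" condition is modelled here, as an assumption, by comparing free sections with a reserve.

```c
#include <stdbool.h>

/* Hypothetical counters; names do not come from the kernel source. */
struct space_info {
    unsigned int free_sections;        /* sections currently free */
    unsigned int reserved_sections;    /* assumed low-water mark for foreground GC */
    unsigned int dirty_meta_blocks;    /* dirty metadata blocks waiting to be logged */
    unsigned int open_seg_free_blocks; /* unused blocks left in the open segments */
};

static bool need_foreground_gc(const struct space_info *s)
{
    /* Plenty of free space: no foreground GC needed. */
    if (s->free_sections > s->reserved_sections)
        return false;

    /* The few dirty metas fit into the open segments: log them there. */
    if (s->dirty_meta_blocks <= s->open_seg_free_blocks)
        return false;

    return true;
}
```

The second test is the cheap one the commit adds: it avoids a costly foreground GC round whenever the pending metadata can be absorbed by space that is already allocated.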