05 Sep, 2013

2 commits

  • This patch improves the gc efficiency by optimizing the victim
    selection policy. With this optimization, the random re-write
    performance could increase up to 20%.

    For f2fs, when the disk is short of free space, gc selects dirty
    segments and moves their valid blocks elsewhere to make more space
    available. The gc cost of a segment is determined by the number of
    valid blocks in the segment: the fewer the valid blocks, the higher
    the efficiency. The ideal victim segment is the one that has the most
    garbage blocks.

    Currently, gc searches up to 20 dirty segments for a victim segment.
    When there are many more dirty segments than that, the selected victim
    is unlikely to be the best one. Why not search more dirty segments for
    a better victim? The cost of searching dirty segments is negligible in
    comparison to moving blocks.

    This patch enlarges MAX_VICTIM_SEARCH to 4096 so that the search looks
    more aggressively for a better victim. Since the limit also applies to
    victim selection for SSR, it will likely improve the SSR efficiency
    as well.
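
    The effect of the search limit can be illustrated with a minimal
    user-space sketch. It is not the kernel code: select_victim() and its
    array of valid-block counts are hypothetical, and only the greedy
    "fewest valid blocks wins" idea is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Greedy victim selection (illustrative only): scan at most max_search
 * dirty segments and return the index of the one with the fewest valid
 * blocks, or -1 if there is nothing to scan.
 */
int select_victim(const unsigned int *valid_blocks, int nr_dirty,
                  int max_search)
{
    int best = -1;
    int i;

    assert(max_search > 0);
    for (i = 0; i < nr_dirty && i < max_search; i++) {
        /* Fewer valid blocks means less data to move during gc. */
        if (best < 0 || valid_blocks[i] < valid_blocks[best])
            best = i;
    }
    return best;
}
```

    With the made-up counts { 300, 120, 450, 90, 10, 200 }, a limit of 3
    picks segment 1 (120 valid blocks), while a limit that covers all six
    segments picks segment 4 (10 valid blocks).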

    The test case is simple. It creates files of 32KB each until the disk
    is full. Then it writes 100000 records of 4KB each to random offsets
    of random files in sync mode. The testing was done on a 2GB partition
    of an SDHC card. Here are the test results of f2fs without and with
    the patch.

    ---------------------------------------
    2GB partition, SDHC
    create 52023 files of size 32768 bytes
    random re-write 100000 records of 4KB
    ---------------------------------------
               | file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
    [no patch] |        341        |       4227       |   1174   |      174840       |
    [patched]  |        324        |       2958       |    645   |      106682       |

    With the patch, f2fs finishes the test in 20+% less time than without
    the patch (rewrite time drops from 4227 s to 2958 s). Internally it
    also runs gc far less often (645 vs. 1174 times), and each gc run
    reclaims more garbage blocks on average.

    Since the performance improvement is related to gc, it might not be so
    obvious for other tests that do not trigger gc as often as this one.
    (This is because f2fs selects dirty segments for SSR use most of the
    time when free space is in shortage.) The well-known iozone test tool
    was not used for benchmarking the patch because it does not seem to
    have a test case that performs random re-write on a full disk.

    This patch is the revised version based on the suggestion from
    Jaegeuk Kim.

    Signed-off-by: Jin Xu
    [Jaegeuk Kim: suggested simpler solution]
    Reviewed-by: Jaegeuk Kim
    Signed-off-by: Jaegeuk Kim

    Jin Xu
     
  • Previously, we experienced bio traces as follows when running a simple
    sequential write test.

    f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500104928, size = 4K
    f2fs_do_submit_bio: type = NODE, io = no sync, sector = 499922208, size = 368K
    f2fs_do_submit_bio: type = NODE, io = no sync, sector = 499914752, size = 140K

    -> total 512K

    The first one is to write an indirect node block, and the others are to write
    direct node blocks.

    The reason why there are two separate bios for direct node blocks is:
    0. initial state
    ------------------    ------------------
    |                |    |xxxxxxxx        |
    ------------------    ------------------

    1. write 368K
    ------------------    ------------------
    |                |    |xxxxxxxxWWWWWWWW|
    ------------------    ------------------

    2. write 140K
    ------------------    ------------------
    |WWWWWWW         |    |xxxxxxxxWWWWWWWW|
    ------------------    ------------------

    This is because f2fs_write_node_pages tries to write just 512K in total, so
    we lose the chance to merge more bios nicely.

    After this patch is applied, we can get the following bio traces.

    f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500103168, size = 8K
    f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500111368, size = 4K
    f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500107272, size = 512K
    f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500108296, size = 512K
    f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500109320, size = 500K

    And finally, we can improve the sequential write performance,
    from 458.775 MB/s to 479.945 MB/s on SSD.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

03 Sep, 2013

2 commits


27 Aug, 2013

2 commits


26 Aug, 2013

6 commits

  • 0. modified inode structure
    --------------------------------------
    metadata (e.g., i_mtime, i_ctime, etc)
    --------------------------------------
    direct pointers [0 ~ 873]

    inline xattrs (200 bytes by default)

    indirect pointers [0 ~ 4]
    --------------------------------------
    node footer
    --------------------------------------

    1. setxattr flow
    - read_all_xattrs copies all the xattrs from inline and xattr node block.
    - handle xattr entries
    - write_all_xattrs copies modified xattrs into inline and xattr node block.

    2. getxattr flow
    - read_all_xattrs copies all the xattrs from inline and xattr node block.
    - check target entries

    3. Usage
    # mount -t f2fs -o inline_xattr $DEV $MNT

    Once mounted with the inline_xattr option, f2fs marks all newly created
    files so that they explicitly reserve an amount of inline xattr space inside
    the inode block. Without the mount option, f2fs touches neither existing
    files nor newly created ones.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • The truncate_xattr_node function will be used by inline xattr.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • __find_xattr searches for the wanted xattr entry, starting from
    base_addr.

    If it is not found, the returned entry is the last, empty xattr entry,
    which can be newly allocated.
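
    A simplified sketch of this search contract follows. It assumes
    fixed-size entries terminated by an entry whose name length is 0; real
    f2fs entries are variable-length, and struct xattr_entry, find_xattr()
    and xattr_is_empty_slot() are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, fixed-size entry (assumption: not the on-disk layout). */
struct xattr_entry {
    unsigned char name_index;   /* attribute namespace */
    unsigned char name_len;     /* 0 marks the terminating empty entry */
    char name[14];
};

/*
 * Walk the entries starting at base and return the matching one. If no
 * entry matches, return the terminating empty entry, which is where a
 * new attribute could be written.
 */
struct xattr_entry *find_xattr(struct xattr_entry *base, int name_index,
                               size_t name_len, const char *name)
{
    struct xattr_entry *entry;

    assert(base != NULL && name != NULL);
    for (entry = base; entry->name_len != 0; entry++) {
        if (entry->name_index != name_index)
            continue;
        if (entry->name_len != name_len)
            continue;
        if (memcmp(entry->name, name, name_len) == 0)
            break;
    }
    return entry;
}

/* A caller tells "found" from "not found" by checking for the empty slot. */
int xattr_is_empty_slot(const struct xattr_entry *entry)
{
    return entry->name_len == 0;
}
```

    Returning the empty slot instead of NULL lets a setxattr-style caller
    reuse the same pointer as the place to append a new entry.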

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch enables the number of direct pointers inside on-disk inode block to
    be changed dynamically according to the size of inline xattr space.

    The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file
    has the inline xattr flag.

    The number of direct pointers that will be used by inline xattrs is defined as
    F2FS_INLINE_XATTR_ADDRS.
    The current patch sets F2FS_INLINE_XATTR_ADDRS to 0 temporarily.
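
    The arithmetic can be sketched as below. The constants are assumptions
    for illustration: 923 direct pointers in an inode without inline
    xattrs, and 4 bytes per pointer slot. struct inode_info and the helper
    names are hypothetical, and the reserved slot count is passed in as a
    parameter rather than fixed.

```c
#include <assert.h>

#define DEF_ADDRS_PER_INODE 923u  /* assumed pointer count without inline xattrs */
#define ADDR_SLOT_BYTES       4u  /* assumed size of one pointer slot */

/* Hypothetical in-memory inode state: only the flag matters here. */
struct inode_info {
    int has_inline_xattr;
};

/* Pointer slots given up to inline xattrs; 0 unless the flag is set. */
unsigned int inline_xattr_addrs(const struct inode_info *fi,
                                unsigned int reserved)
{
    assert(reserved <= DEF_ADDRS_PER_INODE);
    return fi->has_inline_xattr ? reserved : 0;
}

/* Direct pointers left for data blocks in this inode. */
unsigned int addrs_per_inode(const struct inode_info *fi,
                             unsigned int reserved)
{
    return DEF_ADDRS_PER_INODE - inline_xattr_addrs(fi, reserved);
}

/* Bytes available for inline xattrs in this inode. */
unsigned int inline_xattr_bytes(const struct inode_info *fi,
                                unsigned int reserved)
{
    return inline_xattr_addrs(fi, reserved) * ADDR_SLOT_BYTES;
}
```

    Under these assumptions, reserving 50 slots on a flagged inode leaves
    873 direct pointers and 200 bytes of inline xattr space, while
    reserving 0 slots (as this patch does for now) changes nothing.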

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch adds a basic inode flag for inline xattrs, F2FS_INLINE_XATTR,
    and adds a mount option, inline_xattr, which is enabled when xattr is set.

    If the mount option is enabled, all the files are marked with the inline_xattrs
    flag.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Return -ENOMEM instead of 0 in the error handling case for kset create
    and add, as done elsewhere in this function.

    Introduced by commit b59d0bae6ca30c496f298881616258f9cde0d9c6.
    (f2fs: add sysfs support for controlling the gc_thread)

    Signed-off-by: Wei Yongjun
    Acked-by: Namjae Jeon
    [Jaegeuk Kim: merge the patch with previous modification]
    Signed-off-by: Jaegeuk Kim

    Wei Yongjun
     

20 Aug, 2013

2 commits

  • This patch removes a BUG_ON that raised false alarms.
    The previous BUG_ON condition didn't cover the following valid scenario.

    In f2fs_add_link, 1) get_new_data_page gives an uptodate page successfully,
    and then, 2) init_inode_metadata returns -ENOSPC.
    At this moment, a new clean data page remains in the page cache, but its
    block address still indicates NEW_ADDR.
    After that, even if sync is called, this clean data page cannot be written to
    the disk because it is clean.

    So this means that get_lock_data_page should make a new empty page when its
    block address is NEW_ADDR and its page is not uptodate.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • When the creation of any cache fails in init_f2fs_fs(), the caches that were
    already created successfully should be freed.
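
    The usual shape of such unwinding is sketched below in user space.
    cache_create() and cache_destroy() are hypothetical stand-ins for the
    kernel slab calls, the three cache names are made up, and fail_at
    exists only to simulate a failure at a chosen step.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct cache { int id; };

int live_caches;   /* number of caches currently allocated */
struct cache *node_cache, *segment_cache, *gc_cache;

/* Stand-in for a cache constructor; returns NULL on (simulated) failure. */
struct cache *cache_create(int should_fail)
{
    struct cache *c = should_fail ? NULL : malloc(sizeof(*c));

    if (c)
        live_caches++;
    return c;
}

/* Stand-in for a cache destructor. */
void cache_destroy(struct cache *c)
{
    assert(c != NULL);
    free(c);
    live_caches--;
}

/* Create three caches; on failure, free the ones created so far. */
int init_caches(int fail_at)
{
    node_cache = cache_create(fail_at == 1);
    if (!node_cache)
        goto fail;
    segment_cache = cache_create(fail_at == 2);
    if (!segment_cache)
        goto free_node;
    gc_cache = cache_create(fail_at == 3);
    if (!gc_cache)
        goto free_segment;
    return 0;

free_segment:
    cache_destroy(segment_cache);
free_node:
    cache_destroy(node_cache);
fail:
    return -ENOMEM;
}
```

    The labels are ordered so that a failure at step N falls through the
    destructors for steps N-1 down to 1, leaving nothing allocated.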

    Signed-off-by: Zhao Hongjiang
    Signed-off-by: Jaegeuk Kim

    Zhao Hongjiang
     

19 Aug, 2013

3 commits

  • An error "label at end of compound statement" will occur if CONFIG_F2FS_STAT_FS
    is disabled.
    fs/f2fs/segment.c:556:1: error: label at end of compound statement
    So clean up the 'out' label to fix it.

    Reported-by: Fengguang Wu
    Signed-off-by: Gu Zheng
    Signed-off-by: Jaegeuk Kim

    Gu Zheng
     
  • In f2fs_write_inode, updating the inode after f2fs_balance_fs is not
    optimal when f2fs_gc is performed ahead of it. The inode page will be
    unnecessarily written out twice: once in
    f2fs_gc->...->sync_node_pages and once in update_inode_page.

    Let's update the inode page prior to f2fs_balance_fs to avoid
    this.

    To reproduce it,
    $ touch file (before this step, bring the device to a state that needs f2fs_gc)
    $ sync (or wait for the bdi to write the dirty inode)

    Signed-off-by: Jin Xu
    Signed-off-by: Jaegeuk Kim

    Jin Xu
     
  • alloc_page() returns NULL on failure; it never returns an ERR_PTR.
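
    Why the distinction matters can be shown with a user-space imitation.
    err_ptr() and is_err() below mimic the kernel's ERR_PTR() and IS_ERR()
    helpers under the assumption that error pointers occupy the top 4095
    addresses; fake_alloc_page() is a hypothetical stand-in that, like
    alloc_page(), reports failure with NULL only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (imitation of ERR_PTR). */
void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True only for pointers in the error range; NULL is not in that range. */
int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Stand-in allocator: NULL on failure, a valid pointer otherwise. */
void *fake_alloc_page(int should_fail)
{
    static char page[4096];

    return should_fail ? NULL : page;
}

/* Wrong check for a NULL-returning allocator: the failure is missed. */
int check_with_is_err(void *page)
{
    return is_err(page) ? -ENOMEM : 0;
}

/* Right check for a NULL-returning allocator. */
int check_with_null(void *page)
{
    return page ? 0 : -ENOMEM;
}
```

    Since is_err(NULL) is false, check_with_is_err() reports success for a
    failed allocation, and the caller would go on to use a NULL page.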

    Signed-off-by: Dan Carpenter
    Signed-off-by: Jaegeuk Kim

    Dan Carpenter
     

12 Aug, 2013

3 commits


09 Aug, 2013

3 commits

  • This patch introduces a new inline function, cur_cp_version, to reduce redundant
    code.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Previously, xattr node blocks were stored in the COLD_NODE log, which means that
    our roll-forward mechanism doesn't recover the xattr node blocks at all.
    Only the direct node blocks in the WARM_NODE log can be recovered.

    So, let's resolve the issue simply by conducting a checkpoint during fsync when a
    file has a modified xattr node block.

    This approach can degrade performance, but normally the checkpoint
    overhead appears only at the first fsync call after the xattr entry changes.
    Once the checkpoint is done, no additional overhead occurs.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch fixes the use of XATTR_NODE_OFFSET.

    o The offset should not use the several MSB bits that are used for marking
    node blocks.

    o IS_DNODE should handle XATTR_NODE_OFFSET to avoid potential abnormality
    during the fsync call.
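
    The first point can be illustrated with a small sketch. It assumes the
    node offset shares one 32-bit field with two mark bits kept in the low
    positions, so the offset is shifted up and its top two bits do not
    survive. MARK_BITS, XATTR_OFFSET_EXAMPLE and the pack/unpack helpers
    are hypothetical names, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define MARK_BITS 2u                        /* assumed number of mark bits */
#define MARK_MASK ((1u << MARK_BITS) - 1u)

/* Largest offset that survives packing: all ones, top MARK_BITS cleared. */
#define XATTR_OFFSET_EXAMPLE ((((uint32_t)-1) << MARK_BITS) >> MARK_BITS)

/* Store the offset above the mark bits in one 32-bit field. */
uint32_t pack_flag(uint32_t offset, uint32_t marks)
{
    return (offset << MARK_BITS) | (marks & MARK_MASK);
}

/* Recover the offset; bits shifted out during packing are gone. */
uint32_t unpack_offset(uint32_t flag)
{
    return flag >> MARK_BITS;
}

/* Recover the mark bits. */
uint32_t unpack_marks(uint32_t flag)
{
    return flag & MARK_MASK;
}
```

    In this sketch an all-ones offset (0xFFFFFFFF) comes back as
    0x3FFFFFFF after a pack/unpack round trip, whereas an offset that
    already avoids the top bits round-trips unchanged.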

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

08 Aug, 2013

1 commit

  • This patch should resolve the following error reported by kbuild test robot.

    All error/warnings:

    In file included from fs/f2fs/dir.c:13:0:
    >> fs/f2fs/f2fs.h:435:17: error: field 's_kobj' has incomplete type
    struct kobject s_kobj;

    The failure was caused by missing the kobject header file in dir.c.
    So, this patch moves the header file to the right location, f2fs.h.

    CC: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

06 Aug, 2013

4 commits

  • This patch fixes a deadlock bug that occurs quite often when there are
    concurrent write and fsync on the same file.

    The following is the simplified call trace when tasks hang.

    fsync thread:
    - f2fs_sync_file
    ...
    - f2fs_write_data_pages
    ...
    - update_extent_cache
    ...
    - update_inode
    - wait_on_page_writeback

    bdi writeback thread
    - __writeback_single_inode
    - f2fs_write_data_pages
    - mutex_lock(sbi->writepages)

    The deadlock happens when the fsync thread waits on an inode page that has
    been added to f2fs' cached bio sbi->bio[NODE], and unfortunately,
    no one else is able to submit the cached bio to the block layer for
    writeback. This is because the fsync thread already holds sbi->fs_lock and
    the sbi->writepages lock, causing the bdi thread to block when it attempts
    to write data pages for the same inode. At the same time, the f2fs_gc thread
    does not notice the situation and cannot help. Even the sync syscall
    gets blocked.

    To fix it, we submit the cached bio first, before waiting on an inode page
    that is being written back.

    Signed-off-by: Jin Xu
    [Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback]
    Signed-off-by: Jaegeuk Kim

    Jin Xu
     
  • This code was used for the nobh_write_end() function.
    But since the f2fs_write_end function has now been added,
    there is no need for this code.

    Signed-off-by: Namjae Jeon
    Signed-off-by: Pankaj Kumar
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     
  • Add a sysfs entry, gc_idle, to control the gc policy.
    gc_idle = 1 selects the cost-benefit approach, while
    gc_idle = 2 selects the greedy approach to garbage
    collection. The selections are mutually exclusive: only one
    approach is in effect at any point. If gc_idle = 0, this
    option is disabled.
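
    The mapping can be sketched as a small selection function. The enum
    names and select_gc_mode() are hypothetical. The fallback used when
    gc_idle is 0 (cost-benefit for background gc, greedy for foreground
    gc) is an assumption, as is treating any other value like 0; neither
    is stated above.

```c
#include <assert.h>

enum gc_mode { GC_MODE_COST_BENEFIT, GC_MODE_GREEDY };
enum gc_type { GC_BACKGROUND, GC_FOREGROUND };

/*
 * Pick the victim selection policy. gc_idle overrides the default:
 * 1 forces cost-benefit, 2 forces greedy, and 0 (or any other value,
 * by assumption) leaves the default for the gc type in place.
 */
enum gc_mode select_gc_mode(int gc_idle, enum gc_type type)
{
    /* Assumed default when the option is disabled. */
    enum gc_mode mode = (type == GC_BACKGROUND) ?
                        GC_MODE_COST_BENEFIT : GC_MODE_GREEDY;

    if (gc_idle == 1)
        mode = GC_MODE_COST_BENEFIT;
    else if (gc_idle == 2)
        mode = GC_MODE_GREEDY;
    return mode;
}
```

    Because a single integer carries the choice, the two policies cannot
    be selected at the same time.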

    Cc: Gu Zheng
    Signed-off-by: Namjae Jeon
    Signed-off-by: Pankaj Kumar
    Reviewed-by: Gu Zheng
    [Jaegeuk Kim: change the select_gc_type() flow slightly]
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     
  • Add sysfs entries to control the timing parameters for
    the f2fs gc thread.

    The sysfs options introduced are:
    gc_min_sleep_time: minimum sleep time for GC in ms
    gc_max_sleep_time: maximum sleep time for GC in ms
    gc_no_gc_sleep_time: default sleep time for GC in ms

    Cc: Gu Zheng
    Signed-off-by: Namjae Jeon
    Signed-off-by: Pankaj Kumar
    Reviewed-by: Gu Zheng
    [Jaegeuk Kim: fix an umount bug and some minor changes]
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     

31 Jul, 2013

1 commit


30 Jul, 2013

10 commits


08 Jul, 2013

1 commit

  • After Al Viro's previous readdir patch set, a bug occurs when
    running xfstest 006, as follows.

    [Error output]
    alpha size = 4, name length = 6, total files = 4096, nproc=1
    1023 files created
    rm: cannot remove `/mnt/f2fs/permname.15150/a': Directory not empty

    [Correct output]
    alpha size = 4, name length = 6, total files = 4096, nproc=1
    4097 files created

    This bug is due to an incorrect update of the directory position in ctx.
    So, this patch fixes this.

    [AV: fixed a braino]

    CC: Al Viro
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Al Viro

    Jaegeuk Kim