22 Aug, 2015

6 commits

  • This patch alters to replace total hit stat with rbtree hit stat,
    and then adjust showing of extent cache stat:

    Hit Count:
    L1-1: for largest node hit count;
    L1-2: for last cached node hit count;
    L2: for extent node hit after lookuping in rbtree.

    Hit Ratio:
    ratio (hit count / total lookup count)

    Inner Struct Count:
    tree count, node count.

    Before:
    Extent Hit Ratio: 0 / 2

    Extent Tree Count: 3

    Extent Node Count: 2

    Patched:
    Exten Cacache:
    - Hit Count: L1-1:4871 L1-2:2074 L2:208
    - Hit Ratio: 1% (7153 / 550751)
    - Inner Struct Count: tree: 26560, node: 11824

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch adds to stat the hit count of largest/cached node for showing
    in debugfs.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • The test step is like below:
    1. touch file
    2. truncate -s $((1024*1024)) file
    3. fallocate -o 0 -l $((1024*1024)) file
    4. fibmap.f2fs file

    Our result of fibmap.f2fs showed below is not correct:

    file_pos start_blk end_blk blks
    0 -937166132 -937166132 1
    4096 -937166132 -937166132 1
    8192 -937166132 -937166132 1
    12288 -937166132 -937166132 1
    16384 -937166132 -937166132 1
    20480 -937166132 -937166132 1
    ...
    1040384 -937166132 -937166132 1
    1044480 -937166132 -937166132 1

    This is because f2fs_map_blocks will return with no error when meeting
    a hole or preallocated block, the caller __get_data_block will map the
    uninitialized variable value to bh->b_blocknr.

    Unfortunately generic_block_bmap will neither check the return value of
    get_data() nor check mapping info of buffer_head, result in returning
    the random block address.

    After fixing the issue, our result shows correctly:

    file_pos start_blk end_blk blks
    0 0 0 256

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Add annotation to let us know more clearly about space utilization
    information of regular dentry and inline dentry.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • In f2fs_lookup_extent_tree, et->cached_en was read and updated with only
    read lock held,
    it could cause __lookup_extent_tree within return entirely wrong
    extent_node, if other
    thread update et->cached_en just before __lookup_extent_tree return.

    However, there are two things about this patch that need to be noticed:
    1. It does no good to arrange the order of concurrent read/write, the result
    would still
    be random in such case.
    2. It's built on this assumption: the mix up of reads and writes on a single
    pointer would
    not make the pointer partially wrong at any time. Please let me know if I'm
    wrong, thx.

    Signed-off-by: Fan li
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Fan Li
     
  • Fix typo.

    Signed-off-by: Junesung Lee
    Signed-off-by: Jaegeuk Kim

    Junesung Lee
     

21 Aug, 2015

12 commits


20 Aug, 2015

1 commit


15 Aug, 2015

3 commits


12 Aug, 2015

2 commits

  • Previously, we use radix tree to index all registered page entries for
    atomic file, but now we only use radix tree to see whether current page
    is indexed or not, since the other user of radix tree is gone in commit
    042b7816aaeb ("f2fs: remove unnecessary call to invalidate inmemory pages").

    So in this patch, we try to use one more efficient way:
    Introducing a macro ATOMIC_WRITTEN_PAGE, and setting it as page private
    value to indicate page indexing status. By using this way, we can save
    memory and lookup time.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • We run ltp testcase with f2fs and obtain a TFAIL in diotest4, the result in
    detail is as fallow:

    dio04

    <<>>
    tag=dio04 stime=1432278894
    cmdline="diotest4"
    contacts=""
    analysis=exit
    <<>>
    diotest4 1 TPASS : Negative Offset
    diotest4 2 TPASS : removed
    diotest4 3 TFAIL : diotest4.c:129: write allows odd count.returns 1: Success
    diotest4 4 TFAIL : diotest4.c:183: Odd count of read and write
    diotest4 5 TPASS : Read beyond the file size
    ......

    the result of ext4 with same environment:

    dio04

    <<>>
    tag=dio04 stime=1432259643
    cmdline="diotest4"
    contacts=""
    analysis=exit
    <<>>
    diotest4 1 TPASS : Negative Offset
    diotest4 2 TPASS : removed
    diotest4 3 TPASS : Odd count of read and write
    diotest4 4 TPASS : Read beyond the file size
    ......

    The reason is that when triggering DIO in f2fs, we will return zero value
    in ->direct_IO if writer's buffer offset, file offset and transfer size is
    not alignment to block size of filesystem, resulting in falling back into
    buffered write instead of returning -EINVAL.

    This patch fixes that problem by returning correct error number for above
    case, and removing the judgement condition in check_direct_IO to make sure
    the verification will be enabled for direct reader too.

    Besides, Jaegeuk Kim pointed out that there is expectional cases we should
    always make direct-io falling back into buffered write, such as dio in
    encrypted file.

    Signed-off-by: Yunlei He
    [Chao Yu make small change and add detail description in commit message]
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

11 Aug, 2015

1 commit

  • fill_zero can fail due to a lot of reason, but previously we do not handle
    its return value, so its callers such as punch_hole/f2fs_zero_range may
    report success, but actually can fail because of error occurs inside
    fill_zero.

    This patch fixes to report correct return value of fill_zero.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

06 Aug, 2015

2 commits

  • When testing with generic/101 in xfstests, error message outputed as below:

    --- tests/generic/101.out
    +++ results//generic/101.out.bad
    @@ -10,10 +10,14 @@
    File foo content after log replay:
    0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
    *
    -0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    +0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
    *
    0372000
    ...
    (Run 'diff -u tests/generic/101.out results/generic/101.out.bad' to see the entire diff)

    The test flow is like below:
    1. pwrite foo -S 0xaa 0 64K
    2. pwrite foo -S 0xbb 64K 61K
    3. sync
    4. truncate foo 64K
    5. truncate foo 125K
    6. fsync foo
    7. flakey drop writes
    8. umount

    After this test, we expect the data of recovered file will have the first
    64k of data filling with value 0xaa and the next 61k of data filling with
    value 0x00 because we have fsynced it before dropping writes in dm.

    In f2fs, during recovering, we will only recover the valid block address
    in direct node page if it is marked as a fsynced dnode, but block address
    which means invalid/reserved (with value NULL_ADDR/NEW_ADDR) will not be
    recovered. So, the file recovered shows its incorrect data 0xbb in range of
    [61k, 125k].

    In this patch, we fix to recover invalid/reserved block during recover flow.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • In some cases, we only need the block address when we call
    f2fs_reserve_block,
    other fields of struct dnode_of_data aren't necessary.
    We can try extent cache first for such cases in order to speed up the
    process.

    Signed-off-by: Fan li
    Signed-off-by: Jaegeuk Kim

    Fan Li
     

05 Aug, 2015

13 commits

  • To avoid meeting garbage data in next free node block at the end of warm
    node chain when doing recovery, we will try to zero out that invalid block.

    If the device is not support discard, our way for zeroing out block is:
    grabbing a temporary zeroed page in meta inode, then, issue write request
    with this page.

    But, we forget to release that temporary page, so our memory usage will
    increase without gaining any hit ratio benefit, so it's better to free it
    for saving memory.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • In following call path, we will pass a locked and referenced ipage
    pointer to get_new_data_page:
    - init_inode_metadata
    - make_empty_dir
    - get_new_data_page

    There are two exit paths in get_new_data_page when error occurs:
    1) grab_cache_page fails, ipage will not be released;
    2) f2fs_reserve_block fails, ipage will be released in callee.

    So, it's not consistent for error handling in get_new_data_page.

    For f2fs_reserve_block, it's not very easy to change the rule
    of error handling, since it's already complicated.

    Here we deside to choose an easy way to fix this issue:
    If any error occur in get_new_data_page, we will ensure releasing
    ipage in this function.

    The same issue is in f2fs_convert_inline_dir, fix that too.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Replace BUG_ON with f2fs_bug_on to deal with
    block and segment validity check failed.

    Signed-off-by: Xue Liu
    Signed-off-by: Jaegeuk Kim

    Liu Xue
     
  • In get_meta_page, we guarantee no failure for the returned page,
    but sometimes, IO error from device will incur returning an
    non-updated page.

    Then, we still use this page as updated one, exception could happen
    when using this kind of page.

    So in this condition, we'd better freeze fs by making fs readonly and
    and stop doing checkpoint.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • some backing devices need pages to be stable during writeback. It doesn't
    matter if
    the page is completely overwritten or already uptodate, it needs to wait
    before write.

    Signed-off-by: Fan li
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Fan Li
     
  • This patch adds to handle error cases in commit_inmem_pages.
    If an error occurs, it stops to write the pages and return the error right
    away.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • When there is no enough free nids in free nid cache, we will try to
    readahead FREE_NID_PAGES:4 nat pages into page cache of meta_inode,
    then, reading nat entries in nat page for adding free nids to free nid
    cache.

    But when traversing all nat pages we readaheaded in a circulation,
    our exit condition is not set right, one more nat page will be scanned
    without readaheading, resulting worse read performance.

    This patch fixes to read the correct number nat pages to avoid bad
    performance.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • If we clear inline data/dentry flag in handle_failed_inode, we will fail
    to decline the stat count of inline data/dentry in f2fs_evict_inode due
    to no flag in inode. So remove the wrong clearing.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • In f2fs_ioc_start_{atomic,volatile}_write, if we failed in converting
    inline data, we will report error to user, but still remain atomic/volatile
    flag in inode, it will impact further writes for this file. Fix it.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch fixes the incorrect range (0, LONG_MAX) which is used
    in ranged fsync. If we use LONG_MAX as the parameter for indicating
    the end of file we want to synchronize, in 32-bits architecture
    machine, these datas after 4GB offset may not be persisted in
    storage after ->fsync returned.

    Here, we alter LONG_MAX to LLONG_MAX to fix this issue.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • When flushing comes from background, if there is no dirty page in the
    mapping of inode, we'd better to skip seeking dirty page from mapping
    for writebacking.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • The if statement "goto continue_unlock" is exactly the same when
    each if condition is true that is depended on the value of both
    "step" and "is_cold_data(page)" are 0 or 1. That means when the
    value of "step" equals to "is_cold_data(page)", the if condition
    is true and the if statement "goto continue_unlock" appears only
    once, so it can be optimized to reduce the duplicated code.

    Signed-off-by: Tiezhu Yang
    Signed-off-by: Jaegeuk Kim

    Tiezhu Yang
     
  • In handle_failed_inode, there is a potential deadlock which can happen
    in below call path:

    - f2fs_create
    - f2fs_lock_op down_read(cp_rwsem)
    - f2fs_add_link
    - __f2fs_add_link
    - init_inode_metadata
    - f2fs_init_security failed
    - truncate_blocks failed
    - handle_failed_inode
    - f2fs_truncate
    - truncate_blocks(..,true)
    - write_checkpoint
    - block_operations
    - f2fs_lock_all down_write(cp_rwsem)
    - f2fs_lock_op down_read(cp_rwsem)

    So in this path, we pass parameter to f2fs_truncate to make sure
    cp_rwsem in truncate_blocks will not be locked again.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu