13 Oct, 2016

1 commit


01 Oct, 2016

6 commits


13 Sep, 2016

1 commit


08 Sep, 2016

1 commit

  • In order to enhance performance, we try to readahead node pages during
    GC, but before loading a node page we need its block address, which is
    stored in the NAT table, so a synchronous read of a single NAT page
    blocks our readahead flow.

    f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
    f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 0x72d7a, newaddr = 0x72d7a, rw = READAHEAD, type = NODE
    f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
    f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 0x72d7d, newaddr = 0x72d7d, rw = READAHEAD, type = NODE
    f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 0x72d7f, newaddr = 0x72d7f, rw = READAHEAD, type = NODE
    f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 0x72d86, newaddr = 0x72d86, rw = READAHEAD, type = NODE

    This patch adds a phase that readaheads NAT pages in batch before
    readaheading node pages, for better efficiency.

    f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
    f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
    f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
    f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
    f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
    f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
    f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
    f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
    f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
    f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
    f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
    f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 0x6be00, newaddr = 0x6be00, rw = READAHEAD, type = NODE
    f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 0x6be17, newaddr = 0x6be17, rw = READAHEAD, type = NODE
    f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 0x6be1a, newaddr = 0x6be1a, rw = READAHEAD, type = NODE
    f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 0x6be20, newaddr = 0x6be20, rw = READAHEAD, type = NODE
    f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 0x6be24, newaddr = 0x6be24, rw = READAHEAD, type = NODE
    f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 0x6be25, newaddr = 0x6be25, rw = READAHEAD, type = NODE

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu

30 Aug, 2016

1 commit

  • To clean up the foreground GC code, this patch checks the valid block
    count of a GCed section directly, instead of checking the count in each
    segment of the section one by one.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu

28 Jul, 2016

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "The major change in this version is mitigating cpu overheads on write
    paths by replacing redundant inode page updates with mark_inode_dirty
    calls. And we tried to reduce lock contentions as well to improve
    filesystem scalability. Another feature is setting up F2FS automatically
    when a host-managed SMR device is detected.

    Enhancements:
    - ioctl to move a range of data between files
    - inject orphan inode errors
    - avoid flush commands congestion
    - support lazytime

    Bug fixes:
    - return proper results for some dentry operations
    - fix deadlock in add_link failure
    - disable extent_cache for fcollapse/finsert"

    * tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
    f2fs: clean up coding style and redundancy
    f2fs: get victim segment again after new cp
    f2fs: handle error case with f2fs_bug_on
    f2fs: avoid data race when deciding checkpoint in f2fs_sync_file
    f2fs: support an ioctl to move a range of data blocks
    f2fs: fix to report error number of f2fs_find_entry
    f2fs: avoid memory allocation failure due to a long length
    f2fs: reset default idle interval value
    f2fs: use blk_plug in all the possible paths
    f2fs: fix to avoid data update racing between GC and DIO
    f2fs: add maximum prefree segments
    f2fs: disable extent_cache for fcollapse/finsert inodes
    f2fs: refactor __exchange_data_block for speed up
    f2fs: fix ERR_PTR returned by bio
    f2fs: avoid mark_inode_dirty
    f2fs: move i_size_write in f2fs_write_end
    f2fs: fix to avoid redundant discard during fstrim
    f2fs: avoid mismatching block range for discard
    f2fs: fix incorrect f_bfree calculation in ->statfs
    f2fs: use percpu_rw_semaphore
    ...

    Linus Torvalds

23 Jul, 2016

1 commit

  • A previously selected segment may become free after write_checkpoint;
    if we garbage collect this segment and then new_curseg happens to reuse
    it, it may trigger the f2fs_bug_on below.

    panic+0x154/0x29c
    do_garbage_collect+0x15c/0xaf4
    f2fs_gc+0x2dc/0x444
    f2fs_balance_fs.part.22+0xcc/0x14c
    f2fs_balance_fs+0x28/0x34
    f2fs_map_blocks+0x5ec/0x790
    f2fs_preallocate_blocks+0xe0/0x100
    f2fs_file_write_iter+0x64/0x11c
    new_sync_write+0xac/0x11c
    vfs_write+0x144/0x1e4
    SyS_write+0x60/0xc0

    Here, the f2fs_bug_on likely comes from the SIT and SSA type checks in
    reset_curseg. So we check whether the segment has become stale, and
    select a new victim to avoid this.

    Signed-off-by: Yunlei He
    Signed-off-by: Jaegeuk Kim

    Yunlei He

21 Jul, 2016

1 commit

  • These two are confusing leftovers of the old world order, combining
    values of the REQ_OP_ and REQ_ namespaces. For callers that don't
    special-case, we mostly just replace bi_rw with bio_data_dir or
    op_is_write, except for the few cases where a switch over the REQ_OP_
    values makes more sense. Any check for READA is replaced with an
    explicit check for REQ_RAHEAD. Also remove the READA alias for
    REQ_RAHEAD.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Christoph Hellwig

16 Jul, 2016

2 commits

  • This patch reverts 19a5f5e2ef37 (f2fs: drop any block plugging), and
    additionally adds blk_plug in the write paths.

    The main reason is that blk_start_plug can be used to wake the device up
    from low-power mode before submitting further bios.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
  • Data in a file can be operated on by GC and DIO simultaneously, so we
    face the race cases below:

    For write case:
    Thread A:                                Thread B:
    - generic_file_direct_write
      - invalidate_inode_pages2_range
      - f2fs_direct_IO
        - do_blockdev_direct_IO
          - do_direct_IO
            - get_more_blocks
                                             - f2fs_gc
                                               - do_garbage_collect
                                                 - gc_data_segment
                                                   - move_data_page
                                                     - do_write_data_page
                                                       (migrate data block to new block address)
      - dio_bio_submit
        (update user data at old block address)

    For read case:
    Thread A:                                Thread B:
    - generic_file_direct_write
      - invalidate_inode_pages2_range
      - f2fs_direct_IO
        - do_blockdev_direct_IO
          - do_direct_IO
            - get_more_blocks
                                             - f2fs_balance_fs
                                               - f2fs_gc
                                                 - do_garbage_collect
                                                   - gc_data_segment
                                                     - move_data_page
                                                       - do_write_data_page
                                                         (migrate data block to new block address)
                                             - write_checkpoint
                                               - do_checkpoint
                                                 - clear_prefree_segments
                                                   - f2fs_issue_discard
                                                     (discard old block address)
      - dio_bio_submit
        (update user buffer from obsolete block address)

    In order to fix this, for a given file, we should make DIO and GC
    mutually exclusive.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu

09 Jul, 2016

2 commits

  • If we fail to move a data page during foreground GC, we should give
    another chance to write back that page, which was previously set dirty
    by the writer.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
  • In the procedure of a synchronized read, after sending out the read
    request, the reader locks the page to wait for the device to finish the
    read and unlock the page. Meanwhile, truncation can race with the
    reader, so after the reader gets the page lock, it should check the
    page's mapping to detect whether someone truncated the page in the
    meantime; the reader then has the chance to retry if truncation
    happened, since the read would otherwise fail the preceding condition
    check.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu

09 Jun, 2016

2 commits


08 Jun, 2016

1 commit

  • Separate the op from the rq_flag_bits and have f2fs set/get the bio op
    using bio_set_op_attrs/bio_op.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie

03 Jun, 2016

1 commit


08 May, 2016

1 commit


28 Apr, 2016

1 commit

  • For foreground GC, we cache node blocks in the victim section and set
    them dirty, then we call sync_node_pages to flush these node pages; but
    meanwhile, node pages that are not located in the victim section get
    flushed together, so extra bandwidth and continuous free space are
    consumed.

    For this case, it's better to leave those unrelated node pages in the
    cache for further write hits, and let checkpoint or the VM flush them
    afterward.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu

27 Apr, 2016

1 commit


27 Feb, 2016

2 commits

  • Add a new helper, f2fs_update_data_blkaddr, to clean up redundant code.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
  • For now, the flow of GCing an encrypted data page is:
    1) try to grab the meta page in the meta inode's mapping, with the index
    being the old block address of that data page
    2) load the ciphertext into the meta page
    3) allocate a new block address
    4) write the meta page to the new block address
    5) update the block address pointer in the direct node page.

    Other readers/writers use f2fs_wait_on_encrypted_page_writeback to check
    and wait on GCed encrypted data cached in meta pages being written back,
    in order to avoid inconsistency among the data page cache, the meta page
    cache, and the on-disk data when updating.

    However, we then use the new block address updated in step 5) as the
    index to look up the meta page in the inner bio buffer. That is wrong:
    we will never find the GCing meta page, since we used the old block
    address as that page's index in step 1).

    This patch fixes the issue by swapping the order of steps 1) and 3), and
    in step 1) grabbing the page with the index generated in step 3).

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu

23 Feb, 2016

8 commits

  • This patch enables tracing the old block address of a CoWed page for
    better debugging.

    f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90, rw = WRITE_SYNC, type = NODE
    f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91, rw = WRITE_SYNC, type = NODE
    f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92, rw = WRITE_SYNC, type = NODE

    f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe, rw = WRITE, type = DATA
    f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf, rw = WRITE, type = DATA
    f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0, rw = WRITE, type = DATA

    f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631, rw = WRITE, type = DATA
    f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632, rw = WRITE, type = DATA
    f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633, rw = WRITE, type = DATA

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
  • With a partition formatted with multiple segments per section, we
    reported the count of GC operations incorrectly.

    e.g., for a partition with segs_per_sec = 4

    cat /sys/kernel/debug/f2fs/status

    GC calls: 208 (BG: 7)
    - data segments : 104 (52)
    - node segments : 104 (24)

    The GC call count should be (104 (data segs) + 104 (node segs)) / 4 =
    52, rather than 208. Fix it.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
  • This patch avoids keeping an inefficient last victim segment number
    left over from a previous selection.

    For example, if all the dirty segments have the same number of valid
    blocks, we can end up picking victim segments in descending order
    because a wrong last segment number is kept.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
  • There are redundant pointer conversions in the following call stack:
    - at position a, the inode is converted to f2fs_inode_info;
    - at position b, the f2fs_inode_info is converted back to an inode.

    - truncate_blocks(inode, ..)
      - fi = F2FS_I(inode)                  ---a
      - ADDRS_PER_PAGE(node_page, fi)
        - addrs_per_inode(fi)
          - inode = &fi->vfs_inode          ---b
          - f2fs_has_inline_xattr(inode)
            - fi = F2FS_I(inode)
              - is_inode_flag_set(fi, ..)

    In order to avoid the unneeded conversions, alter ADDRS_PER_PAGE and
    addrs_per_inode to accept a parameter of type struct inode *.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
  • The variable nsearched in get_victim_by_default() indicates the number
    of dirty segments we have already checked. There are two problems with
    the way it is updated:
    1. When p.ofs_unit is greater than 1, the victim we find consists of
    multiple segments, possibly more than one of them dirty, but nsearched
    always increases by 1.
    2. If segments have been found but not chosen, nsearched doesn't
    increase; so even when we have checked all dirty segments, nsearched
    may still be less than p.max_search.
    Both problems can cause unnecessary searching after all dirty segments
    have already been checked.

    Signed-off-by: Fan li
    Signed-off-by: Jaegeuk Kim

    Fan Li
  • In write_begin, if the storage supports stable pages, we don't need to
    wait for writeback to update the page contents. This patch uses
    wait_for_stable_page instead of wait_on_page_writeback.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
  • If a section is configured to consist of multiple segments, foreground
    GC does the garbage collection with the following approach:

    for each segment in victim section
        blk_start_plug
        for each valid block in segment
            write out by OPU method
        submit bio cache

    Signed-off-by: Jaegeuk Kim

    Chao Yu
  • The scenario is:
    1. create lots of node blocks
    2. sync
    3. write lots of inline_data
    -> got a panic due to no free space

    In that case, we should flush node blocks when writing inline_data in #3,
    and trigger gc as well.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim

12 Jan, 2016

1 commit


31 Dec, 2015

1 commit

  • Sometimes we keep quiet when an IO error occurs in a lower-layer device,
    so the user receives no error return value for some operations, even
    though the operation actually failed.

    This should be avoided, so this patch reports such errors to the user.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu

05 Dec, 2015

1 commit


14 Oct, 2015

2 commits

  • Once f2fs_gc is done, wait_ms is changed once more, so its tracepoint
    should be located after that change.

    Reported-by: He YunLei
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
  • Since we use different page caches (normally the inode's page cache for
    R/W and the meta inode's page cache for GC) to cache the same physical
    block belonging to an encrypted inode, writeback of these two page
    caches should be exclusive. But we didn't handle the writeback state
    well, so there may be a potential racing problem:

    a)
    kworker:                                 f2fs_gc:
    - f2fs_write_data_pages
      - f2fs_write_data_page
        - do_write_data_page
          - write_data_page
            - f2fs_submit_page_mbio
              (page#1 in the inode's page cache is
               queued in the f2fs bio cache, ready
               to be written to the new blkaddr)
                                             - gc_data_segment
                                               - move_encrypted_block
                                                 - pagecache_get_page
                                                   (page#2 in the meta inode's
                                                    page cache is cached with the
                                                    invalid data of the physical
                                                    block located at the new
                                                    blkaddr)
                                                 - f2fs_submit_page_mbio
                                                   (page#1 was submitted; later,
                                                    page#2 with invalid data will
                                                    be submitted)

    b)
    f2fs_gc:                                 user thread:
    - gc_data_segment
      - move_encrypted_block
        - f2fs_submit_page_mbio
          (page#1 in the meta inode's page cache
           is queued in the f2fs bio cache, ready
           to be written to the new blkaddr)
                                             - f2fs_write_begin
                                               - f2fs_submit_page_bio
                                                 (we submit the request to the
                                                  block layer to update page#2 in
                                                  the inode's page cache with the
                                                  physical block located at the
                                                  new blkaddr, so here we may read
                                                  garbage data from the new
                                                  blkaddr since GC hasn't written
                                                  back page#1 yet)

    This patch fixes the above potential racing problem for encrypted
    inodes.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu

13 Oct, 2015

1 commit

  • Now, we use ra_meta_pages to read as many continuous physical blocks as
    possible to improve the performance of subsequent reads. However,
    ra_meta_pages uses a synchronous readahead approach, submitting the bio
    with READ. As READ has high priority, it can not be used for preloading
    blocks when it is not certain when the RAed pages will be used.

    This patch supports asynchronous readahead in ra_meta_pages by tagging
    the bio with the READA flag in order to allow preloading.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
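
    Callers then choose the priority through the new flag; for example
    (meta types and the flag position assumed from context):

        /* preload blocks nobody is waiting on yet: low priority (READA) */
        ra_meta_pages(sbi, start_blk, nrpages, META_POR, false);

        /* checkpoint-critical reads keep the synchronous READ priority */
        ra_meta_pages(sbi, start_blk, nrpages, META_CP, true);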