01 Oct, 2016

3 commits


14 Sep, 2016

1 commit


13 Sep, 2016

1 commit


08 Sep, 2016

2 commits

  • Add a roll-forward recovery process for encrypted dentries, so the first
    fsync issued to an encrypted file does not need to write a checkpoint.

    This improves the performance of the following test with thousands of
    small files: open -> write -> fsync -> close

    Signed-off-by: Shuoran Liu
    Acked-by: Chao Yu
    [Jaegeuk Kim: modify kernel message to show encrypted names]
    Signed-off-by: Jaegeuk Kim

    Shuoran Liu
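The open -> write -> fsync -> close sequence above can be exercised with a small userspace loop. This is a minimal sketch only; `small_file_fsync_loop` and the file-name prefix are hypothetical, and the loop measures the syscall pattern, not f2fs internals:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Run the commit message's test pattern on n small files under dir:
 * open -> write -> fsync -> close.  Returns the number of files fully
 * processed; files are unlinked afterwards so repeated runs stay clean. */
static int small_file_fsync_loop(const char *dir, int n)
{
    char path[256];
    char buf[4096];
    int done = 0;

    memset(buf, 0xa5, sizeof(buf));
    for (int i = 0; i < n; i++) {
        snprintf(path, sizeof(path), "%s/fsync_bench_%d", dir, i);
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
        if (fd < 0)
            break;
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf) ||
            fsync(fd) != 0) {
            close(fd);
            unlink(path);
            break;
        }
        close(fd);
        unlink(path);
        done++;
    }
    return done;
}
```

On an f2fs mount with this patch, each of these fsyncs can complete via roll-forward recovery metadata instead of a full checkpoint.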
     
  • Like most filesystems, f2fs issues discard commands synchronously, so
    when the user triggers fstrim through ioctl, multiple discard commands
    are issued serially in sync mode, which results in poor performance.

    This patch adds support for async discard, so that all discard commands
    can be issued first and then waited on for endio in a batch, improving
    performance.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
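The submit-all-then-wait structure can be modeled in a few lines. This is a structural sketch only, under the assumption that completion is observed after submission; the struct and function names are stand-ins, not f2fs's real discard machinery:

```c
#include <stddef.h>

/* Model of the batched-discard idea: in sync mode each discard is
 * submitted and waited on before the next one; in async mode every
 * command is submitted first and completions are waited on in one batch. */
struct discard_cmd {
    unsigned long start_blk;
    unsigned long len;
    int submitted;
    int done;
};

/* "Submit" a command without waiting (stands in for submitting a
 * DISCARD bio); completion is observed later in the wait phase. */
static void submit_discard(struct discard_cmd *dc)
{
    dc->submitted = 1;
    dc->done = 1;   /* model: completion has arrived by wait time */
}

/* Async flow: one pass to submit everything, one pass to wait. */
static int issue_discards_batched(struct discard_cmd *cmds, int n)
{
    int completed = 0;

    for (int i = 0; i < n; i++)        /* submit phase */
        submit_discard(&cmds[i]);
    for (int i = 0; i < n; i++)        /* batched wait phase */
        if (cmds[i].submitted && cmds[i].done)
            completed++;
    return completed;
}
```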
     

21 Jul, 2016

2 commits


14 Jun, 2016

1 commit


03 Jun, 2016

3 commits


21 May, 2016

1 commit


19 May, 2016

1 commit


12 May, 2016

1 commit


08 May, 2016

2 commits

  • When testing f2fs with inline_dentry option, generic/342 reports:
    VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds. Have a nice day...

    After rmmod of the f2fs module, the kernel shows the following dmesg:
    =============================================================================
    BUG f2fs_inode_cache (Tainted: G O ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
    -----------------------------------------------------------------------------

    Disabling lock debugging due to kernel taint
    INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
    CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
    Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
    c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
    616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
    Call Trace:
    [] dump_stack+0x5f/0x8f
    [] slab_err+0x76/0x80
    [] ? __kmem_cache_shutdown+0x100/0x2f0
    [] ? __kmem_cache_shutdown+0x100/0x2f0
    [] __kmem_cache_shutdown+0x125/0x2f0
    [] kmem_cache_destroy+0x158/0x1f0
    [] ? mutex_unlock+0xd/0x10
    [] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
    [] SyS_delete_module+0x16c/0x1d0
    [] ? do_fast_syscall_32+0x30/0x1c0
    [] ? __this_cpu_preempt_check+0xf/0x20
    [] ? trace_hardirqs_on_caller+0xdd/0x210
    [] ? trace_hardirqs_off+0xb/0x10
    [] do_fast_syscall_32+0xa1/0x1c0
    [] sysenter_past_esp+0x45/0x74
    INFO: Object 0xd1e6d9e0 @offset=6624
    kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
    CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
    Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
    c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
    f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
    Call Trace:
    [] dump_stack+0x5f/0x8f
    [] kmem_cache_destroy+0x1e7/0x1f0
    [] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
    [] SyS_delete_module+0x16c/0x1d0
    [] ? do_fast_syscall_32+0x30/0x1c0
    [] ? __this_cpu_preempt_check+0xf/0x20
    [] ? trace_hardirqs_on_caller+0xdd/0x210
    [] ? trace_hardirqs_off+0xb/0x10
    [] do_fast_syscall_32+0xa1/0x1c0
    [] sysenter_past_esp+0x45/0x74

    The reason: in the recovery flow, we use a delayed iput mechanism for a
    directory that has a recovered dentry block. This means the inode
    reference is held until the last dirty dentry page has been written back.

    But when we mount f2fs with the inline_dentry option, during recovery a
    dirent may be recovered only into the dir inode page rather than a
    dentry page, so there is no chance for us to release the inode reference
    in ->writepage when writing back the last dentry page.

    We could call paired iget/iput explicitly for the inline_dentry case,
    but for the non-inline_dentry case, iput calls writeback_single_inode to
    write all data pages synchronously, while during recovery ->writepages
    of f2fs skips writing all pages, resulting in lost dirents.

    This patch fixes the issue by retiring the old mechanism and introducing
    a new dir_list that holds all directory inodes with recovered data until
    recovery finishes.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
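The dir_list idea above reduces to a hold-references-then-release-at-end pattern. A minimal userspace model, with `fake_inode` standing in for `struct inode` and the two helpers standing in for iget/iput (names are illustrative, not the kernel API):

```c
#include <stddef.h>
#include <stdlib.h>

/* Model of the dir_list fix: instead of relying on ->writepage to drop
 * the inode reference, recovery keeps every directory inode with
 * recovered data on a list and releases them all when recovery ends. */
struct fake_inode {
    int refcount;
};

struct dir_entry {
    struct fake_inode *inode;
    struct dir_entry *next;
};

/* Take a reference and remember the directory for later release. */
static int add_to_dir_list(struct dir_entry **list, struct fake_inode *inode)
{
    struct dir_entry *e = malloc(sizeof(*e));

    if (!e)
        return -1;
    inode->refcount++;          /* stands in for iget() */
    e->inode = inode;
    e->next = *list;
    *list = e;
    return 0;
}

/* At the end of recovery, drop every held reference. */
static void release_dir_list(struct dir_entry **list)
{
    while (*list) {
        struct dir_entry *e = *list;

        *list = e->next;
        e->inode->refcount--;   /* stands in for iput() */
        free(e);
    }
}
```

Because the release happens unconditionally at the end of recovery, it no longer matters whether the dirent landed in an inline dir inode page or a regular dentry page.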
     
  • In find_fsync_dnodes, get_tmp_page reads the dnode page synchronously;
    the preceding ra_meta_page call did the same work, which is redundant,
    so remove it.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

04 May, 2016

1 commit


27 Apr, 2016

1 commit


15 Apr, 2016

1 commit


05 Apr, 2016

1 commit

  • The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
    time ago with the promise that one day it would be possible to implement
    the page cache with chunks bigger than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE, and it's a constant source of confusion about whether a
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in the page cache are special. They
    are not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <nothing>;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <nothing>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header
    files, so I've run spatch on them manually.

    The only adjustment after coccinelle is reverting the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

23 Feb, 2016

3 commits

  • f2fs supports atomic writes with the following semantics:
    1. open db file
    2. ioctl start atomic write
    3. (write db file) * n
    4. ioctl commit atomic write
    5. close db file

    With this flow we can avoid the file becoming corrupted on an abnormal
    power cut, because we hold the data of the transaction in referenced
    pages linked into the inmem_pages list of the inode without setting them
    dirty, so the data won't be persisted unless we commit it in step 4.

    But we still have to hold the journal db file in memory by using
    volatile writes, because our 'atomic write support' semantics are
    incomplete: in step 4 we could fail to submit all dirty data of the
    transaction, and once partial dirty data has been committed to storage,
    then after a checkpoint and an abnormal power cut, the db file will be
    corrupted forever.

    So this patch tries to improve the atomic write flow by adding a revoke
    flow: if an internal error occurs while committing, this gives us
    another chance to revoke the partially submitted data of the current
    transaction, making the commit operation closer to an atomic one.

    If we're unlucky and the revoke operation also fails, EAGAIN is reported
    to the user, suggesting either recovery with the held journal file or
    retrying the current transaction.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
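The commit/revoke decision logic can be sketched as a small state model. This is not the kernel code: `mem_page`, the `fail_at` knob, and `commit_with_revoke` are hypothetical stand-ins that only capture the three outcomes the message describes (full commit, successful revoke, failed revoke reported as EAGAIN):

```c
#include <errno.h>

/* Stand-in for an in-memory page of the transaction. */
struct mem_page {
    int committed;
};

/* Try to commit n pages; fail_at simulates a submit error at that index
 * (-1 means no error).  On partial failure, revoke the already committed
 * pages if possible; if the revoke also fails, report -EAGAIN so the user
 * can retry or recover from the held journal file. */
static int commit_with_revoke(struct mem_page *pages, int n,
                              int fail_at, int revoke_ok)
{
    int i;

    for (i = 0; i < n; i++) {
        if (i == fail_at)
            break;                      /* submit error mid-transaction */
        pages[i].committed = 1;
    }
    if (i == n)
        return 0;                       /* whole transaction committed */
    if (revoke_ok) {
        for (int j = 0; j < i; j++)
            pages[j].committed = 0;     /* undo the partial commit */
        return -1;                      /* original commit error */
    }
    return -EAGAIN;                     /* revoke failed: retry/recover */
}
```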
     
  • There are redundant pointer conversions in the following call stack:
    - at position a, the inode is converted to f2fs_inode_info.
    - at position b, the f2fs_inode_info is converted back to an inode.

    - truncate_blocks(inode,..)
    - fi = F2FS_I(inode) ---a
    - ADDRS_PER_PAGE(node_page, fi)
    - addrs_per_inode(fi)
    - inode = &fi->vfs_inode ---b
    - f2fs_has_inline_xattr(inode)
    - fi = F2FS_I(inode)
    - is_inode_flag_set(fi,..)

    In order to avoid the unneeded conversions, alter ADDRS_PER_PAGE and
    addrs_per_inode to accept an inode pointer parameter.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
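Both conversions a and b above are pure pointer arithmetic over an embedded struct, which is why round-tripping them is wasted work. A minimal model (local stand-in structs, not the real kernel definitions; F2FS_I here is hand-rolled container_of arithmetic):

```c
#include <stddef.h>

/* Simplified stand-ins mirroring the embedding: the VFS inode lives
 * inside the filesystem-private f2fs_inode_info. */
struct inode {
    unsigned long i_ino;
};

struct f2fs_inode_info {
    unsigned long i_flags;
    struct inode vfs_inode;     /* embedded VFS inode */
};

/* Conversion a: inode -> containing f2fs_inode_info (container_of). */
static struct f2fs_inode_info *F2FS_I(struct inode *inode)
{
    return (struct f2fs_inode_info *)
        ((char *)inode - offsetof(struct f2fs_inode_info, vfs_inode));
}

/* Conversion b then a, as in the call stack above: the round trip is
 * the identity, so passing the inode pointer straight through lets the
 * callee do a single conversion instead of two. */
static int round_trip_is_identity(struct f2fs_inode_info *fi)
{
    struct inode *inode = &fi->vfs_inode;   /* ---b */
    return F2FS_I(inode) == fi;             /* ---a */
}
```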
     
  • In write_begin, if the storage supports stable pages, we don't need to
    wait for writeback to update the page's contents.
    This patch uses wait_for_stable_page instead of wait_on_page_writeback.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

31 Dec, 2015

1 commit

  • do_checkpoint and write_checkpoint can fail for reasons such as being
    triggered on a readonly fs or encountering an I/O error on the storage
    device.

    So it's better to report such errors to the user, letting the user know
    that the checkpoint failed.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
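The shape of the change is ordinary error propagation: the checkpoint path returns an errno instead of swallowing it. A hedged sketch with illustrative names (`fake_sbi` and its flags are not the real f2fs structures):

```c
#include <errno.h>

/* Illustrative superblock-info stand-in with the two failure causes
 * named in the commit message. */
struct fake_sbi {
    int readonly;
    int io_error;
};

/* Inner checkpoint work: report I/O errors instead of ignoring them. */
static int do_checkpoint(struct fake_sbi *sbi)
{
    if (sbi->io_error)
        return -EIO;
    return 0;
}

/* Outer entry point: refuse on a readonly fs, otherwise propagate
 * whatever do_checkpoint returns up to the caller/user. */
static int write_checkpoint(struct fake_sbi *sbi)
{
    if (sbi->readonly)
        return -EROFS;
    return do_checkpoint(sbi);
}
```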
     

05 Dec, 2015

2 commits


13 Oct, 2015

2 commits

  • Currently, we use ra_meta_pages to read as many contiguous physical
    blocks as possible to improve the performance of subsequent reads.
    However, ra_meta_pages uses a synchronous readahead approach, submitting
    the bio with READ. Since READ has high priority, it is not suitable for
    preloading blocks, where it is uncertain when the readahead pages will
    actually be used.

    This patch supports asynchronous readahead in ra_meta_pages by tagging
    the bio with the READA flag in order to allow preloading.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • In the recovery and checkpoint flows, we grab pages temporarily in the
    meta inode's mapping to cache temporary data. The data in these pages is
    not actually f2fs metadata, yet we still tag the pages with the REQ_META
    flag, and a lower device such as eMMC may apply optimizations to data of
    that type. To avoid mis-optimization, we'd better remove the flag for
    temporary non-meta pages.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

10 Oct, 2015

2 commits

  • Protecting the recovery flow with cp_rwsem is not needed, since we
    already prevent any checkpoint from being triggered by holding cp_mutex.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • We have a potential overflow issue when calculating an object's size:
    when we left-shift an index by PAGE_CACHE_SHIFT bits, if the index's
    type has only 32 bits on a 32-bit architecture, the left shift
    overflows, i.e.:

    pgoff_t index = 0xFFFFFFFF;
    loff_t size = index << PAGE_CACHE_SHIFT;
    size: 0xFFFFF000

    So we should cast the index to a 64-bit type to avoid this issue.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

20 Aug, 2015

1 commit


06 Aug, 2015

1 commit

  • When testing with generic/101 in xfstests, the following error message
    is output:

    --- tests/generic/101.out
    +++ results//generic/101.out.bad
    @@ -10,10 +10,14 @@
    File foo content after log replay:
    0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
    *
    -0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    +0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
    *
    0372000
    ...
    (Run 'diff -u tests/generic/101.out results/generic/101.out.bad' to see the entire diff)

    The test flow is like below:
    1. pwrite foo -S 0xaa 0 64K
    2. pwrite foo -S 0xbb 64K 61K
    3. sync
    4. truncate foo 64K
    5. truncate foo 125K
    6. fsync foo
    7. flakey drop writes
    8. umount

    After this test, we expect the recovered file to contain 64k of data
    filled with 0xaa followed by 61k filled with 0x00, because we fsynced it
    before dropping writes in dm.

    In f2fs, during recovery we only recover the valid block addresses in a
    direct node page if it is marked as an fsynced dnode; block addresses
    meaning invalid/reserved (with value NULL_ADDR/NEW_ADDR) are not
    recovered. So the recovered file shows incorrect data 0xbb in the range
    [64k, 125k].

    This patch fixes the issue by also recovering invalid/reserved block
    addresses during the recovery flow.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
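The fix can be modeled as a copy loop over a dnode's address slots. A toy sketch, assuming NULL_ADDR is 0 and using 0xFFFFFFFF as a stand-in for NEW_ADDR; `recover_dnode_addrs` and its `recover_invalid` switch are illustrative, not the real recovery code:

```c
#define NULL_ADDR 0u
#define NEW_ADDR  0xFFFFFFFFu   /* stand-in value for the reserved address */

/* Copy block addresses from the fsynced dnode (src) into the on-disk
 * dnode (dst).  With recover_invalid == 0 (old behaviour), NULL_ADDR/
 * NEW_ADDR slots are skipped, so stale pre-truncate addresses survive
 * in dst; with recover_invalid == 1 (the fix), every slot is copied and
 * truncated regions come back as holes.  Returns the number of slots
 * copied. */
static int recover_dnode_addrs(const unsigned int *src, unsigned int *dst,
                               int n, int recover_invalid)
{
    int copied = 0;

    for (int i = 0; i < n; i++) {
        if (!recover_invalid &&
            (src[i] == NULL_ADDR || src[i] == NEW_ADDR))
            continue;           /* old behaviour: stale dst[i] survives */
        dst[i] = src[i];
        copied++;
    }
    return copied;
}
```

In generic/101 terms, the skipped NULL_ADDR slots are exactly the [64k, 125k] region that kept showing the stale 0xbb blocks.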
     

05 Aug, 2015

1 commit

  • To avoid hitting garbage data in the next free node block at the end of
    the warm node chain when doing recovery, we try to zero out that invalid
    block.

    If the device does not support discard, our way of zeroing out the
    block is: grab a temporary zeroed page in the meta inode, then issue a
    write request with this page.

    But we forgot to release that temporary page, so memory usage grows
    without gaining any hit-ratio benefit; it's better to free the page to
    save memory.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
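The leak is a classic unbalanced get/put. A minimal refcount model (the struct and helpers are stand-ins for the kernel's page grab/put pair, not the real API):

```c
/* Stand-in for a cached page with a reference count. */
struct fake_page {
    int count;
};

static void get_page_ref(struct fake_page *p) { p->count++; }
static void put_page_ref(struct fake_page *p) { p->count--; }

/* Zero-out sequence from the commit message; returns the page's
 * refcount after the sequence.  Without the fix the temporary page
 * keeps its extra reference forever. */
static int zero_out_block(struct fake_page *p, int with_fix)
{
    get_page_ref(p);       /* grab a temporary zeroed page */
    /* ... issue the write request with this page ... */
    if (with_fix)
        put_page_ref(p);   /* the missing release this commit adds */
    return p->count;
}
```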
     

03 Jun, 2015

1 commit

  • This patch cleans up code by:
    1. renaming f2fs_replace_block to __f2fs_replace_block().
    2. introducing a new f2fs_replace_block() that wraps
       __f2fs_replace_block() together with the common related code around
       it.

    The newly introduced f2fs_replace_block() can then be used by the
    following patch.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
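The refactor follows the common kernel wrapper pattern: the core logic moves into a double-underscore helper and the public function carries the shared setup/teardown. A sketch with placeholder bodies (the real functions operate on segment summaries, not a bare int):

```c
/* Stands in for the shared setup/teardown (e.g. locking) that used to
 * be duplicated around every call site of the core logic. */
static int common_depth;

/* Core replacement step, now a private helper. */
static void __f2fs_replace_block(int *blkaddr, int new_blkaddr)
{
    *blkaddr = new_blkaddr;
}

/* Public wrapper: common code before and after the core, so new
 * callers (like the following patch) reuse it automatically. */
static int f2fs_replace_block(int *blkaddr, int new_blkaddr)
{
    common_depth++;                 /* common code before the core */
    __f2fs_replace_block(blkaddr, new_blkaddr);
    common_depth--;                 /* common code after the core */
    return *blkaddr;
}
```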
     

29 May, 2015

2 commits


08 May, 2015

1 commit


17 Apr, 2015

1 commit