12 Jan, 2017

1 commit

  • commit 230436b3ef3fd7d4a1da19edf5e87bb2d74e0fc2 upstream.

    gcc is unsure about the use of last_ofs_in_node, which might happen
    without a prior initialization:

    fs/f2fs/data.c: In function ‘f2fs_map_blocks’:
    fs/f2fs/data.c:799:54: warning: ‘last_ofs_in_node’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    if (prealloc && dn.ofs_in_node != last_ofs_in_node + 1) {

    As pointed out by Chao Yu, the code is actually correct: 'prealloc'
    is only set once last_ofs_in_node has been set, and the two are always
    updated together.

    This initializes last_ofs_in_node to dn.ofs_in_node for each
    new dnode at the start of the 'next_block' loop, which at that
    point is a correct initialization as well. I assume that compilers
    that correctly track the contents of the variables and do not
    warn about the condition also figure out that they can eliminate
    the extra assignment here.

    Fixes: 46008c6d4232 ("f2fs: support in batch multi blocks preallocation")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
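
The warning pattern described in the entry above can be reproduced in a small user-space sketch. This is not the f2fs code: `model_count_breaks` and its arguments are hypothetical names, and only the guarded condition mirrors the line quoted in the warning. The flag and the variable are always written together, so the explicit initial assignment is redundant for correctness but lets the compiler see a defined value on every path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: count how often an offset does not directly follow
 * the previous one. 'prealloc' guards every read of 'last_ofs'.
 */
static size_t model_count_breaks(const unsigned int *ofs, size_t n)
{
    bool prealloc = false;
    unsigned int last_ofs;
    size_t breaks = 0;

    if (n == 0)
        return 0;
    assert(ofs != NULL);

    /*
     * Redundant for correctness (prealloc is false until last_ofs is
     * written below), but it removes the "may be used uninitialized" doubt.
     */
    last_ofs = ofs[0];

    for (size_t i = 0; i < n; i++) {
        if (prealloc && ofs[i] != last_ofs + 1)
            breaks++;

        /* The flag and the variable are always updated together. */
        prealloc = true;
        last_ofs = ofs[i];
    }
    return breaks;
}
```

For the offsets 3, 4, 5, 9, 10 the function reports one break, between 5 and 9.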
     

12 Oct, 2016

1 commit

  • The mapping_set_error() helper sets the correct AS_ flag for the mapping
    so there is no reason to open code it. Use the helper directly.

    [akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO]
    Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
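
As an illustration of what such a helper centralizes, here is a user-space sketch under stated assumptions: `model_mapping`, `MODEL_AS_EIO` and `MODEL_AS_ENOSPC` are hypothetical stand-ins for the kernel's address_space and AS_ flags, and the mapping of errors to flags (-ENOSPC kept, everything else recorded as an I/O error) is modelled on the "-ENXIO to -EIO" remark in the entry above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's AS_EIO / AS_ENOSPC flags. */
enum {
    MODEL_AS_EIO    = 1u << 0,
    MODEL_AS_ENOSPC = 1u << 1
};

struct model_mapping {
    unsigned long flags;
};

/* Record an error on the mapping; callers no longer pick the flag themselves. */
static void model_mapping_set_error(struct model_mapping *m, int error)
{
    assert(m != NULL);
    if (!error)
        return;
    if (error == -ENOSPC)
        m->flags |= MODEL_AS_ENOSPC;
    else
        m->flags |= MODEL_AS_EIO;   /* e.g. -ENXIO is reported as an I/O error */
}
```

A caller that previously tested the error code and set a flag by hand would simply call `model_mapping_set_error(&mapping, err)`.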
     

01 Oct, 2016

3 commits

  • When we call ->writepages, there are two cases:
    a. we did not write out any dirty pages, because another thread wrote
    them back concurrently.
    b. we wrote out dirty pages and have already submitted the bio to the
    block layer.

    In both cases no additional bio flush is needed; an unnecessary flush
    may split the cached bio into smaller ones.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Previously, we only supported a global fault injection configuration, so
    configuring the type/rate of fault injection through sysfs or a mount
    option affected every f2fs partition in use.

    This does not make sense: it is inconvenient for a developer who wants
    to test separate partitions with different fault injection rates/types
    simultaneously, and it is impossible to enable fault injection on one
    partition while disabling it on another.

    From now on, the global fault injection configuration in the module
    moves into the superblock, so injection testing can be more flexible.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
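
A minimal sketch of the per-superblock idea from the entry above, assuming a simple "inject once every N eligible calls" policy. All names here (`model_sb`, `model_fault_info`, `model_time_to_inject`) are hypothetical and are not taken from the patch; the point is only that each mounted instance carries its own rate, type mask and counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-instance fault configuration. */
struct model_fault_info {
    unsigned int inject_rate;  /* inject once every inject_rate calls; 0 disables */
    unsigned int inject_type;  /* bitmask of enabled fault types */
    unsigned int inject_ops;   /* eligible calls seen since the last injection */
};

/* Hypothetical superblock: the configuration lives here, not in a global. */
struct model_sb {
    struct model_fault_info fault;
};

static bool model_time_to_inject(struct model_sb *sb, unsigned int type)
{
    struct model_fault_info *f;

    assert(sb != NULL);
    f = &sb->fault;

    if (!f->inject_rate)
        return false;
    if (!(f->inject_type & (1u << type)))
        return false;

    if (++f->inject_ops >= f->inject_rate) {
        f->inject_ops = 0;
        return true;
    }
    return false;
}
```

With this layout one instance can inject a fault on every second eligible call while another, with `inject_rate` left at 0, never injects.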
     
  • This patch improves the migration of dirty pages and allows migrating atomic
    written pages that F2FS uses in Page Cache. Instead of the fallback releasing
    page path, it provides better performance for memory compaction, CMA and other
    users of memory page migrating. For dirty pages, there is no need to write back
    first when migrating. For an atomic written page before committing, we can
    migrate the page and update the related 'inmem_pages' list at the same time.

    Signed-off-by: Weichao Guo
    Reviewed-by: Chao Yu
    [Jaegeuk Kim: fix some coding style]
    Signed-off-by: Jaegeuk Kim

    Weichao Guo
     

23 Sep, 2016

2 commits


14 Sep, 2016

1 commit

  • Previously, f2fs_write_begin set PageUptodate unconditionally. But when the user
    tries to update the entire page (i.e., len == PAGE_SIZE), we need to consider
    that the page may only be copied partially afterwards. In that case,
    we would lose the remaining region of the page.

    This patch fixes this by setting PageUptodate in f2fs_write_end according to the
    copied result. In the short copy case, it returns zero to let generic_perform_write
    retry copying the user data.

    As a result, f2fs_write_end() works as follows:
       PageUptodate  len   copied  return  retry
    1. no            4096  4096    4096    false  -> return 4096
    2. no            4096  1024    0       true   -> goto #1 case
    3. yes           2048  2048    2048    false  -> return 2048
    4. yes           2048  1024    1024    false  -> return 1024

    Suggested-by: Al Viro
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
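
The four f2fs_write_end() cases listed in the entry above can be captured in a short user-space model. This is a sketch of the decision only, with hypothetical names (`model_page`, `model_write_end`, `MODEL_PAGE_SIZE`); it does not touch real page state, locking or i_size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u

/* Hypothetical page with only the state the table cares about. */
struct model_page {
    bool uptodate;
};

/*
 * Returns the number of bytes accepted. A return of 0 tells the caller
 * to retry the copy, as in case 2 of the table.
 */
static unsigned int model_write_end(struct model_page *page,
                                    unsigned int len, unsigned int copied)
{
    assert(page != NULL);
    assert(copied <= len && len <= MODEL_PAGE_SIZE);

    if (!page->uptodate) {
        /* A short copy into a page with unknown contents cannot be kept. */
        if (copied != MODEL_PAGE_SIZE)
            return 0;
        page->uptodate = true;
    }
    return copied;
}
```

A page that is already up to date accepts a short copy as-is (case 4), because the bytes outside the copied range are still valid.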
     

13 Sep, 2016

1 commit


08 Sep, 2016

1 commit


30 Aug, 2016

3 commits

  • In write_begin(), we skip checking the dnode block for preallocation
    when the whole block is to be updated, since its block was already
    preallocated in f2fs_preallocate_blocks. For a partially updated block,
    we still try to lock its node and preallocate in write_begin(), so
    f2fs_preallocate_blocks should not preallocate its block.

    Previously, however, the number of blocks to preallocate was calculated
    incorrectly; fix it.

    Signed-off-by: Chao Yu
    [Jaegeuk Kim: fix a bug]
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Fixes the following sparse warning:

    fs/f2fs/data.c:969:12: warning:
    symbol 'f2fs_grab_bio' was not declared. Should it be static?

    Signed-off-by: Wei Yongjun
    Acked-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Wei Yongjun
     
  • If we preallocate blocks with f2fs_reserve_blocks in f2fs_map_blocks, we
    should call f2fs_balance_fs for checking and reclaiming space, fix it.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

19 Aug, 2016

1 commit

  • This reverts commit a2ee0a300344a6da76186129b078113354fe13d2.

    When testing with generic/032 of the xfstests suite, the following failure is
    reported:

    generic/032 8s ... [failed, exit status 1] - output mismatch (see results/generic/032.out.bad)
    --- tests/generic/032.out 2015-01-11 16:52:27.643681072 +0800
    +++ results/generic/032.out.bad 2016-08-06 13:44:43.861330500 +0800
    @@ -1,5 +1,5 @@
    QA output created by 032
    -100 iterations
    -0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
    -*
    -0100000
    +1: [768..775]: unwritten
    +Unwritten extents found!
    ...
    (Run 'diff -u tests/generic/032.out results/generic/032.out.bad' to see the entire diff)
    Ran: generic/032
    Failures: generic/032
    Failed 1 of 1 tests

    In write_end(), we should update the inode's i_size before unlocking the page;
    otherwise, we will lose newly updated data in the following race condition.

    Thread A                        Thread B
    - write_end
     - unlock page
                                    - writepages
                                     - lock_page
                                      - writepage
                                        if page is out-of-range of file size,
                                        we will skip writing the page.
     - update i_size

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
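
The effect of the ordering in the race above can be shown with a single-threaded sketch. It assumes, as the entry describes, that writeback skips a page lying entirely beyond the file size; `model_writepage_skips` and `MODEL_PAGE_SIZE` are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096ull

/*
 * Hypothetical check: a page whose first byte is at or beyond i_size is
 * out of range, so writeback skips it.
 */
static bool model_writepage_skips(unsigned long long page_index,
                                  unsigned long long i_size)
{
    return page_index * MODEL_PAGE_SIZE >= i_size;
}
```

Consider an append that fills page index 1 of a file whose size was 4096 bytes. If writeback runs while i_size is still 4096, `model_writepage_skips(1, 4096)` is true and the new data is not written; once i_size is 8192, `model_writepage_skips(1, 8192)` is false. Updating i_size before unlocking the page closes that window.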
     

05 Aug, 2016

1 commit


28 Jul, 2016

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "The major change in this version is mitigating cpu overheads on write
    paths by replacing redundant inode page updates with mark_inode_dirty
    calls. And we tried to reduce lock contentions as well to improve
    filesystem scalability. Other feature is setting F2FS automatically
    when detecting host-managed SMR.

    Enhancements:
    - ioctl to move a range of data between files
    - inject orphan inode errors
    - avoid flush commands congestion
    - support lazytime

    Bug fixes:
    - return proper results for some dentry operations
    - fix deadlock in add_link failure
    - disable extent_cache for fcollapse/finsert"

    * tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
    f2fs: clean up coding style and redundancy
    f2fs: get victim segment again after new cp
    f2fs: handle error case with f2fs_bug_on
    f2fs: avoid data race when deciding checkpoin in f2fs_sync_file
    f2fs: support an ioctl to move a range of data blocks
    f2fs: fix to report error number of f2fs_find_entry
    f2fs: avoid memory allocation failure due to a long length
    f2fs: reset default idle interval value
    f2fs: use blk_plug in all the possible paths
    f2fs: fix to avoid data update racing between GC and DIO
    f2fs: add maximum prefree segments
    f2fs: disable extent_cache for fcollapse/finsert inodes
    f2fs: refactor __exchange_data_block for speed up
    f2fs: fix ERR_PTR returned by bio
    f2fs: avoid mark_inode_dirty
    f2fs: move i_size_write in f2fs_write_end
    f2fs: fix to avoid redundant discard during fstrim
    f2fs: avoid mismatching block range for discard
    f2fs: fix incorrect f_bfree calculation in ->statfs
    f2fs: use percpu_rw_semaphore
    ...

    Linus Torvalds
     

27 Jul, 2016

2 commits

  • Merge updates from Andrew Morton:

    - a few misc bits

    - ocfs2

    - most(?) of MM

    * emailed patches from Andrew Morton: (125 commits)
    thp: fix comments of __pmd_trans_huge_lock()
    cgroup: remove unnecessary 0 check from css_from_id()
    cgroup: fix idr leak for the first cgroup root
    mm: memcontrol: fix documentation for compound parameter
    mm: memcontrol: remove BUG_ON in uncharge_list
    mm: fix build warnings in
    mm, thp: convert from optimistic swapin collapsing to conservative
    mm, thp: fix comment inconsistency for swapin readahead functions
    thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
    shmem: split huge pages beyond i_size under memory pressure
    thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
    khugepaged: add support of collapse for tmpfs/shmem pages
    shmem: make shmem_inode_info::lock irq-safe
    khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
    thp: extract khugepaged from mm/huge_memory.c
    shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
    shmem: add huge pages support
    shmem: get_unmapped_area align huge page
    shmem: prepare huge= mount option and sysfs knob
    mm, rmap: account shmem thp pages
    ...

    Linus Torvalds
     
  • Vladimir has noticed that we might declare memcg oom even during
    readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
    restriction) while __do_page_cache_readahead uses
    page_cache_alloc_readahead which adds __GFP_NORETRY to prevent from
    OOMs. This gfp mask discrepancy is really unfortunate and easily
    fixable. Drop page_cache_alloc_readahead() which only has one user and
    outsource the gfp_mask logic into readahead_gfp_mask and propagate this
    mask from __do_page_cache_readahead down to read_pages.

    This alone would have only very limited impact as most filesystems are
    implementing ->readpages and the common implementation mpage_readpages
    does GFP_KERNEL (with mapping_gfp restriction) again. We can tell it to
    use readahead_gfp_mask instead as this function is called only during
    readahead as well. The same applies to read_cache_pages.

    ext4 has its own ext4_mpage_readpages but the path which has pages !=
    NULL can use the same gfp mask. Btrfs, cifs, f2fs and orangefs are
    doing a very similar pattern to mpage_readpages so the same can be
    applied to them as well.

    [akpm@linux-foundation.org: coding-style fixes]
    [mhocko@suse.com: restrict gfp mask in mpage_alloc]
    Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Chris Mason
    Cc: Steve French
    Cc: Theodore Ts'o
    Cc: Jan Kara
    Cc: Mike Marshall
    Cc: Jaegeuk Kim
    Cc: Changman Lee
    Cc: Chao Yu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
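
The mask discrepancy described in the entry above can be illustrated with plain bit operations. This is a sketch only: the `MODEL_GFP_*` values are hypothetical bit assignments rather than the kernel's, and the two helpers are simplified stand-ins for "GFP_KERNEL with mapping_gfp restriction" and for a readahead mask that adds a no-retry flag.

```c
#include <assert.h>

typedef unsigned int model_gfp_t;

/* Hypothetical bit values, chosen only for this illustration. */
#define MODEL_GFP_IO       (1u << 0)
#define MODEL_GFP_FS       (1u << 1)
#define MODEL_GFP_RECLAIM  (1u << 2)
#define MODEL_GFP_NORETRY  (1u << 3)
#define MODEL_GFP_KERNEL   (MODEL_GFP_IO | MODEL_GFP_FS | MODEL_GFP_RECLAIM)

/* Keep only the bits of 'wanted' that the mapping allows. */
static model_gfp_t model_gfp_constraint(model_gfp_t mapping_mask,
                                        model_gfp_t wanted)
{
    return mapping_mask & wanted;
}

/* Readahead mask: the mapping's own mask plus the no-retry hint. */
static model_gfp_t model_readahead_gfp_mask(model_gfp_t mapping_mask)
{
    return mapping_mask | MODEL_GFP_NORETRY;
}
```

For a mapping that forbids filesystem recursion, the constrained mask drops the FS bit but carries no no-retry hint, while the readahead mask does; passing one mask down the whole readahead path removes that difference.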
     

26 Jul, 2016

1 commit


16 Jul, 2016

3 commits

  • This patch reverts 19a5f5e2ef37 (f2fs: drop any block plugging),
    and adds blk_plug in write paths additionally.

    The main reason is that blk_start_plug can be used to wake up from low-power
    mode before submitting further bios.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Data in a file can be operated on by GC and DIO simultaneously, so we
    face the races below:

    For write case:
    Thread A                                Thread B
    - generic_file_direct_write
     - invalidate_inode_pages2_range
     - f2fs_direct_IO
      - do_blockdev_direct_IO
       - do_direct_IO
        - get_more_blocks
                                            - f2fs_gc
                                             - do_garbage_collect
                                              - gc_data_segment
                                               - move_data_page
                                                - do_write_data_page
                                                migrate data block to new block address
       - dio_bio_submit
       update user data to old block address

    For read case:
    Thread A                                Thread B
    - generic_file_direct_write
     - invalidate_inode_pages2_range
     - f2fs_direct_IO
      - do_blockdev_direct_IO
       - do_direct_IO
        - get_more_blocks
                                            - f2fs_balance_fs
                                             - f2fs_gc
                                              - do_garbage_collect
                                               - gc_data_segment
                                                - move_data_page
                                                 - do_write_data_page
                                                 migrate data block to new block address
                                             - write_checkpoint
                                              - do_checkpoint
                                               - clear_prefree_segments
                                                - f2fs_issue_discard
                                                discard old block address
       - dio_bio_submit
       update user buffer from obsolete block address

    To fix this, DIO and GC on the same file must exclude each other.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This is to fix wrong error pointer handling flow reported by Dan.

    Reported-by: Dan Carpenter
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

09 Jul, 2016

5 commits

  • We don't need to do i_size_write under page lock.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • SetPageUptodate() issues a memory barrier, resulting in performance degradation.
    Let's avoid that.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch adds f2fs_set_page_dirty_nobuffer() copied from __set_page_dirty_buffer.
    When appending 4KB blocks in f2fs on pmem with multiple cores, this improves the
    overall performance.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • In a synchronous read, after sending out the read request, the reader
    tries to lock the page to wait for the device to finish the read and
    unlock the page. Meanwhile, a truncater can race with the reader, so
    after the reader gets the page lock it should check the page's mapping
    to detect whether someone has truncated the page in advance. With this
    check the reader can retry when truncation happened; without it, the
    read could fail on the previous condition check.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • For encrypted inode, if user overwrites data of the inode, f2fs will read
    encrypted data into page cache, and then do the decryption.

    However reader can race with overwriter, and it will see encrypted data
    which has not been decrypted by overwriter yet. Fix it by moving decrypting
    work to background and keep page non-uptodated until data is decrypted.

    Thread A                                Thread B
    - f2fs_file_write_iter
     - __generic_file_write_iter
      - generic_perform_write
       - f2fs_write_begin
        - f2fs_submit_page_bio
                                            - generic_file_read_iter
                                             - do_generic_file_read
                                              - lock_page_killable
                                              - unlock_page
                                              - copy_page_to_iter
                                              hit the encrypted data in updated page
        - lock_page
        - fscrypt_decrypt_page

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

07 Jul, 2016

2 commits


14 Jun, 2016

1 commit


09 Jun, 2016

2 commits


08 Jun, 2016

2 commits


03 Jun, 2016

6 commits

  • Previously, f2fs_write_data_pages() calls __f2fs_writepage() which calls
    f2fs_write_data_page().
    If f2fs_write_data_page() returns AOP_WRITEPAGE_ACTIVATE, __f2fs_writepage()
    calls mapping_set_error(). But this should not happen every time, since
    f2fs_write_data_page() sometimes skips writing pages without an error.
    For example, volatile_write() gives EIO all the time, as Shuoran Liu pointed
    out.

    Reported-by: Shuoran Liu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • If there is no cold page, we don't need to do a loop to flush dirty
    data pages.

    On /dev/pmem0,

    1. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 conv=fsync
    Before : 1.1 GB/s
    After : 1.2 GB/s

    2. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048
    Before : 2.2 GB/s
    After : 2.3 GB/s

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • For data pages, let's try to flush as much as possible in background.

    On /dev/pmem0,

    1. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 conv=fsync
    Before : 800 MB/s
    After : 1.1 GB/s

    2. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048
    Before : 1.3 GB/s
    After : 2.2 GB/s

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch removes writepages lock.
    We can improve multi-threading performance.

    tiobench, 32 threads, 4KB write per fsync on SSD
    Before: 25.88 MB/s
    After: 28.03 MB/s

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • If roll-forward recovery can recover i_size, we don't need to update inode's
    metadata during fsync.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch reduces calls to the following functions across the whole tree:
    - sync_inode_page()
    - update_inode_page()
    - update_inode()
    - f2fs_write_inode()

    Instead, checkpoint will flush all the dirty inode metadata before syncing
    node pages.
    Note that this is doable because we call mark_inode_dirty_sync() for every
    inode field change that also needs to update the on-disk inode.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
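
The deferral described in the entry above, where field changes only mark the inode dirty and checkpoint writes it out once, can be sketched in user space as follows. The names (`model_inode`, `model_set_size`, `model_checkpoint`) are hypothetical and the "disk write" is just a counter; the sketch only shows why several updates cost a single flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory inode with a counter standing in for disk I/O. */
struct model_inode {
    unsigned long long i_size;
    bool dirty;
    unsigned int disk_writes;
};

/* Changing a field only marks the inode dirty; nothing is written yet. */
static void model_set_size(struct model_inode *inode, unsigned long long size)
{
    assert(inode != NULL);
    inode->i_size = size;
    inode->dirty = true;
}

/* Checkpoint flushes the inode once if it is dirty, then clears the flag. */
static void model_checkpoint(struct model_inode *inode)
{
    assert(inode != NULL);
    if (inode->dirty) {
        inode->disk_writes++;
        inode->dirty = false;
    }
}
```

Three size updates followed by one checkpoint produce one write in this model; a second checkpoint with no intervening change produces none.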