29 Jul, 2016

1 commit

  • Pull vfs updates from Al Viro:
    "Assorted cleanups and fixes.

    Probably the most interesting part long-term is ->d_init() - that will
    have a bunch of followups in (at least) ceph and lustre, but we'll
    need to sort the barrier-related rules before it can get used for
    really non-trivial stuff.

    Another fun thing is the merge of ->d_iput() callers (dentry_iput()
    and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
    except the one in __d_lookup_lru())"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
    fs/dcache.c: avoid soft-lockup in dput()
    vfs: new d_init method
    vfs: Update lookup_dcache() comment
    bdev: get rid of ->bd_inodes
    Remove last traces of ->sync_page
    new helper: d_same_name()
    dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
    vfs: clean up documentation
    vfs: document ->d_real()
    vfs: merge .d_select_inode() into .d_real()
    unify dentry_iput() and dentry_unlink_inode()
    binfmt_misc: ->s_root is not going anywhere
    drop redundant ->owner initializations
    ufs: get rid of redundant checks
    orangefs: constify inode_operations
    missed comment updates from ->direct_IO() prototype change
    file_inode(f)->i_mapping is f->f_mapping
    trim fsnotify hooks a bit
    9p: new helper - v9fs_parent_fid()
    debugfs: ->d_parent is never NULL or negative
    ...

    Linus Torvalds
     

28 Jul, 2016

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "The major change in this version is mitigating CPU overhead on write
    paths by replacing redundant inode page updates with mark_inode_dirty
    calls. We also tried to reduce lock contention to improve filesystem
    scalability. Another feature is configuring F2FS automatically when
    host-managed SMR is detected.

    Enhancements:
    - ioctl to move a range of data between files
    - inject orphan inode errors
    - avoid flush commands congestion
    - support lazytime

    Bug fixes:
    - return proper results for some dentry operations
    - fix deadlock in add_link failure
    - disable extent_cache for fcollapse/finsert"

    * tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
    f2fs: clean up coding style and redundancy
    f2fs: get victim segment again after new cp
    f2fs: handle error case with f2fs_bug_on
    f2fs: avoid data race when deciding checkpoint in f2fs_sync_file
    f2fs: support an ioctl to move a range of data blocks
    f2fs: fix to report error number of f2fs_find_entry
    f2fs: avoid memory allocation failure due to a long length
    f2fs: reset default idle interval value
    f2fs: use blk_plug in all the possible paths
    f2fs: fix to avoid data update racing between GC and DIO
    f2fs: add maximum prefree segments
    f2fs: disable extent_cache for fcollapse/finsert inodes
    f2fs: refactor __exchange_data_block for speed up
    f2fs: fix ERR_PTR returned by bio
    f2fs: avoid mark_inode_dirty
    f2fs: move i_size_write in f2fs_write_end
    f2fs: fix to avoid redundant discard during fstrim
    f2fs: avoid mismatching block range for discard
    f2fs: fix incorrect f_bfree calculation in ->statfs
    f2fs: use percpu_rw_semaphore
    ...

    Linus Torvalds
     

27 Jul, 2016

2 commits

  • Merge updates from Andrew Morton:

    - a few misc bits

    - ocfs2

    - most(?) of MM

    * emailed patches from Andrew Morton: (125 commits)
    thp: fix comments of __pmd_trans_huge_lock()
    cgroup: remove unnecessary 0 check from css_from_id()
    cgroup: fix idr leak for the first cgroup root
    mm: memcontrol: fix documentation for compound parameter
    mm: memcontrol: remove BUG_ON in uncharge_list
    mm: fix build warnings in
    mm, thp: convert from optimistic swapin collapsing to conservative
    mm, thp: fix comment inconsistency for swapin readahead functions
    thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
    shmem: split huge pages beyond i_size under memory pressure
    thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
    khugepaged: add support of collapse for tmpfs/shmem pages
    shmem: make shmem_inode_info::lock irq-safe
    khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
    thp: extract khugepaged from mm/huge_memory.c
    shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
    shmem: add huge pages support
    shmem: get_unmapped_area align huge page
    shmem: prepare huge= mount option and sysfs knob
    mm, rmap: account shmem thp pages
    ...

    Linus Torvalds
     
  • Vladimir has noticed that we might declare memcg OOM even during
    readahead, because read_pages only uses GFP_KERNEL (with mapping_gfp
    restriction) while __do_page_cache_readahead uses
    page_cache_alloc_readahead, which adds __GFP_NORETRY to prevent OOMs.
    This gfp mask discrepancy is really unfortunate and easily fixable.
    Drop page_cache_alloc_readahead(), which only has one user, move the
    gfp_mask logic into readahead_gfp_mask, and propagate this mask from
    __do_page_cache_readahead down to read_pages.

    This alone would have only a very limited impact, as most filesystems
    implement ->readpages and the common implementation mpage_readpages
    uses GFP_KERNEL (with mapping_gfp restriction) again. We can tell it to
    use readahead_gfp_mask instead, as this function is also called only
    during readahead. The same applies to read_cache_pages.

    ext4 has its own ext4_mpage_readpages, but the path where pages !=
    NULL can use the same gfp mask. Btrfs, cifs, f2fs and orangefs follow
    a very similar pattern to mpage_readpages, so the same change can be
    applied to them as well.

    [akpm@linux-foundation.org: coding-style fixes]
    [mhocko@suse.com: restrict gfp mask in mpage_alloc]
    Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Chris Mason
    Cc: Steve French
    Cc: Theodore Ts'o
    Cc: Jan Kara
    Cc: Mike Marshall
    Cc: Jaegeuk Kim
    Cc: Changman Lee
    Cc: Chao Yu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
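    The mask propagation described in this commit can be modelled in userspace C. This is a minimal sketch under stated assumptions: the flag bit values and every sketch_* helper are hypothetical stand-ins, not the kernel's gfp_t definitions or the real readahead_gfp_mask().

```c
#include <stdint.h>

/* Hypothetical flag bits; the kernel's real gfp_t values differ. */
typedef uint32_t sketch_gfp_t;

#define SKETCH_GFP_RECLAIM 0x01u
#define SKETCH_GFP_IO      0x02u
#define SKETCH_GFP_FS      0x04u
#define SKETCH_GFP_NORETRY 0x08u /* fail the allocation rather than go OOM */
#define SKETCH_GFP_NOWARN  0x10u /* do not log the failure */
#define SKETCH_GFP_KERNEL  (SKETCH_GFP_RECLAIM | SKETCH_GFP_IO | SKETCH_GFP_FS)

/* Ordinary read: GFP_KERNEL restricted by what the mapping allows. */
sketch_gfp_t sketch_read_gfp(sketch_gfp_t mapping_mask)
{
    return mapping_mask & SKETCH_GFP_KERNEL;
}

/* Readahead is speculative, so a failed allocation beats an OOM. */
sketch_gfp_t sketch_readahead_gfp(sketch_gfp_t mapping_mask)
{
    return sketch_read_gfp(mapping_mask) | SKETCH_GFP_NORETRY | SKETCH_GFP_NOWARN;
}

/* Before: the lower layer recomputes its own mask and loses NORETRY. */
sketch_gfp_t sketch_read_pages_before(sketch_gfp_t mapping_mask,
                                      sketch_gfp_t caller_mask)
{
    (void)caller_mask; /* ignored: this is the discrepancy */
    return sketch_read_gfp(mapping_mask);
}

/* After: the caller's readahead mask is passed down unchanged. */
sketch_gfp_t sketch_read_pages_after(sketch_gfp_t mapping_mask,
                                     sketch_gfp_t caller_mask)
{
    (void)mapping_mask; /* already folded into caller_mask */
    return caller_mask;
}
```

    The point of the "after" variant is that the lower layer no longer builds a mask of its own, so the NORETRY bit chosen by the readahead caller survives down to the allocation, while the mapping's restrictions still apply.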
     

26 Jul, 2016

1 commit


23 Jul, 2016

1 commit

  • A previously selected victim segment may become free after
    write_checkpoint. If we garbage-collect this segment and new_curseg
    happens to reuse it, f2fs_bug_on can be triggered as below.

    panic+0x154/0x29c
    do_garbage_collect+0x15c/0xaf4
    f2fs_gc+0x2dc/0x444
    f2fs_balance_fs.part.22+0xcc/0x14c
    f2fs_balance_fs+0x28/0x34
    f2fs_map_blocks+0x5ec/0x790
    f2fs_preallocate_blocks+0xe0/0x100
    f2fs_file_write_iter+0x64/0x11c
    new_sync_write+0xac/0x11c
    vfs_write+0x144/0x1e4
    SyS_write+0x60/0xc0

    Here, reset_curseg may check the SIT and SSA types of the segment. So
    we check whether the victim segment is stale and, if so, select a new
    victim to avoid this.

    Signed-off-by: Yunlei He
    Signed-off-by: Jaegeuk Kim

    Yunlei He
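    A rough model of the fix: after the checkpoint, re-check the victim before collecting it. The segment structure, the greedy selection rule and all sketch_* names are hypothetical; f2fs's real check works on its SIT/SSA metadata, which this sketch does not model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-segment state; not f2fs's real SIT/SSA structures. */
struct sketch_seg {
    unsigned valid_blocks; /* 0 means the segment is free */
    bool is_curseg;        /* currently used for new allocations */
};

/* Greedy choice: the in-use, non-current segment with the fewest valid blocks.
 * Returns -1 if there is no candidate. */
int sketch_pick_victim(const struct sketch_seg *segs, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (segs[i].valid_blocks == 0 || segs[i].is_curseg)
            continue;
        if (best < 0 || segs[i].valid_blocks < segs[(size_t)best].valid_blocks)
            best = (int)i;
    }
    return best;
}

/*
 * A victim chosen before write_checkpoint may have become free and been
 * reused as a current segment; if so, select a new one instead.
 */
int sketch_revalidate_victim(const struct sketch_seg *segs, size_t n, int victim)
{
    if (victim >= 0 && (size_t)victim < n &&
        segs[victim].valid_blocks > 0 && !segs[victim].is_curseg)
        return victim;
    return sketch_pick_victim(segs, n);
}
```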
     

21 Jul, 2016

5 commits

  • These two are confusing leftovers of the old world order, combining
    values of the REQ_OP_ and REQ_ namespaces. For callers that don't
    special case we mostly just replace bi_rw with bio_data_dir or
    op_is_write, except for the few cases where a switch over the REQ_OP_
    values makes more sense. Any check for READA is replaced with an
    explicit check for REQ_RAHEAD. Also remove the READA alias for
    REQ_RAHEAD.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • For error cases, it is enough to report BUG or WARN via f2fs_bug_on,
    so there is no need to carry on with a corrupted filesystem.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • When the fs is almost full, f2fs_sync_file should do a checkpoint if
    there is not enough space for a later roll-forward (i.e.
    space_for_roll_forward). However, there is currently no lock protecting
    sbi->alloc_valid_block_count, which results in a race condition.

    In rare cases, we can get -ENOSPC during roll-forward, which triggers
    the following in do_recover_data:

        if (is_valid_blkaddr(sbi, dest, META_POR)) {
            if (src == NULL_ADDR) {
                err = reserve_new_block(&dn);
                f2fs_bug_on(sbi, err);
                ...
            }
            ...
        }

    So, this patch avoids that situation in advance.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
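    The race comes from reading the block counters without the lock their writers hold. A minimal userspace sketch, assuming a pthread mutex in place of the kernel lock and a hypothetical space condition (checkpointed blocks plus newly allocated blocks must fit in the user-visible blocks). Only alloc_valid_block_count is a name taken from the commit; the other fields are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical subset of superblock counters. */
struct sketch_sbi {
    pthread_mutex_t stat_lock;             /* stands in for the kernel lock */
    unsigned long cp_valid_blocks;         /* hypothetical: valid at last checkpoint */
    unsigned long alloc_valid_block_count; /* blocks allocated since then */
    unsigned long user_blocks;             /* hypothetical: user-visible capacity */
};

/* Allocation path: updates the counter under the lock. */
void sketch_alloc_blocks(struct sketch_sbi *sbi, unsigned long count)
{
    pthread_mutex_lock(&sbi->stat_lock);
    sbi->alloc_valid_block_count += count;
    pthread_mutex_unlock(&sbi->stat_lock);
}

/* fsync path: read the counters under the same lock so the decision is consistent. */
bool sketch_space_for_roll_forward(struct sketch_sbi *sbi)
{
    bool ok;

    pthread_mutex_lock(&sbi->stat_lock);
    ok = sbi->cp_valid_blocks + sbi->alloc_valid_block_count <= sbi->user_blocks;
    pthread_mutex_unlock(&sbi->stat_lock);
    return ok;
}
```

    If sketch_space_for_roll_forward() returns false, the fsync path would fall back to a full checkpoint instead of relying on roll-forward recovery.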
     
  • This patch implements moving a range of data blocks from a source file
    to a destination file.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch fixes f2fs_find_entry to report the correct error number to
    its caller.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

19 Jul, 2016

1 commit


16 Jul, 2016

7 commits

  • The default idle interval is 2 mins, but most of the time, even after
    the screen is turned off, there are still operations within that 2-min
    interval, and the GC thread sleeps for about 30 to 60 secs, so it
    almost never gets a chance to do garbage collection.

    To fix this, change the default idle interval from 2 mins to 5 secs.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch reverts 19a5f5e2ef37 (f2fs: drop any block plugging),
    and additionally adds blk_plug to the write paths.

    The main reason is that blk_start_plug can be used to wake up from low-power
    mode before submitting further bios.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Data in a file can be operated on by GC and DIO simultaneously, so we
    can hit the following races:

    For write case:
    Thread A                               Thread B
    - generic_file_direct_write
    - invalidate_inode_pages2_range
    - f2fs_direct_IO
    - do_blockdev_direct_IO
    - do_direct_IO
    - get_more_blocks
                                           - f2fs_gc
                                           - do_garbage_collect
                                           - gc_data_segment
                                           - move_data_page
                                           - do_write_data_page
                                           migrate data block to new block address
    - dio_bio_submit
    update user data to old block address

    For read case:
    Thread A                               Thread B
    - generic_file_direct_write
    - invalidate_inode_pages2_range
    - f2fs_direct_IO
    - do_blockdev_direct_IO
    - do_direct_IO
    - get_more_blocks
                                           - f2fs_balance_fs
                                           - f2fs_gc
                                           - do_garbage_collect
                                           - gc_data_segment
                                           - move_data_page
                                           - do_write_data_page
                                           migrate data block to new block address
                                           - write_checkpoint
                                           - do_checkpoint
                                           - clear_prefree_segments
                                           - f2fs_issue_discard
                                           discard old block address
    - dio_bio_submit
    update user buffer from obsolete block address

    To fix this, DIO and GC on the same file should be made mutually
    exclusive.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
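    One way to picture the exclusion is a per-file lock that DIO holds from block mapping until bio submission, and that GC must take before migrating that file's blocks. This sketch uses a pthread mutex and hypothetical names; making GC skip a file with DIO in flight is an assumption, since the commit does not say whether GC waits or skips.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-file state; names are illustrative only. */
struct sketch_inode {
    pthread_mutex_t dio_gc_lock; /* serializes DIO and GC for this file */
    unsigned long blkaddr;       /* on-disk address of the file's block */
};

/* DIO: hold the lock from block mapping until the bio is submitted. */
unsigned long sketch_dio_begin(struct sketch_inode *inode)
{
    pthread_mutex_lock(&inode->dio_gc_lock);
    return inode->blkaddr; /* stays valid until sketch_dio_end() */
}

void sketch_dio_end(struct sketch_inode *inode)
{
    pthread_mutex_unlock(&inode->dio_gc_lock);
}

/* GC: migrate only when no DIO is in flight; otherwise skip this file. */
bool sketch_gc_try_migrate(struct sketch_inode *inode, unsigned long new_addr)
{
    if (pthread_mutex_trylock(&inode->dio_gc_lock) != 0)
        return false;
    inode->blkaddr = new_addr;
    pthread_mutex_unlock(&inode->dio_gc_lock);
    return true;
}
```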
     
  • On 1TB storage, we would need to admit 22841 prefree segments, which
    consumes too many segments.
    This patch caps the prefree segments at 8GB in that case.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
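    The cap is simple arithmetic. Assuming f2fs's default 2MB segment size, the 22841 segments mentioned in this commit come to about 44.6GB, while an 8GB cap corresponds to 4096 segments. The helper below is hypothetical and only expresses that clamp.

```c
/* Assumed: 2MB segments (f2fs default) and the 8GB cap from the commit. */
#define SKETCH_SEG_SIZE_MB      2UL
#define SKETCH_MAX_PREFREE_MB   (8UL * 1024UL)
#define SKETCH_MAX_PREFREE_SEGS (SKETCH_MAX_PREFREE_MB / SKETCH_SEG_SIZE_MB) /* 4096 */

/* Hypothetical helper: clamp a computed prefree-segment limit to the cap. */
unsigned long sketch_prefree_limit(unsigned long uncapped)
{
    return uncapped > SKETCH_MAX_PREFREE_SEGS ? SKETCH_MAX_PREFREE_SEGS : uncapped;
}
```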
     
  • This reduces the elapsed time to run xfstests/generic/017.

    Before: 458 s
    After: 390 s

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This reduces the elapsed time to run xfstests/generic/017.

    Before: 715 s
    After: 458 s

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This fixes the incorrect error-pointer handling flow reported by Dan.

    Reported-by: Dan Carpenter
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

09 Jul, 2016

15 commits

  • Let's check the inode's dirtiness before calling mark_inode_dirty.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • We don't need to do i_size_write under page lock.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • With the test steps below, f2fs issues redundant discards when doing
    fstrim. The reason is that we issue discards both for prefree segments
    and for the consecutive freed region the user wants to trim, and these
    regions partly overlap. Here, we change f2fs to not issue any discards
    for prefree segments inside the trimmed range.

    1. mount -t f2fs -o discard /dev/zram0 /mnt/f2fs
    2. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/
    3. dd if=/dev/zero of=/mnt/f2fs/a bs=2M count=1
    4. dd if=/dev/zero of=/mnt/f2fs/b bs=1M count=1
    5. sync
    6. rm /mnt/f2fs/a /mnt/f2fs/b
    7. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/

    Before:
    -5428 [001] ...1 9511.052125: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x200
    -5428 [001] ...1 9511.052787: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300

    After:
    -6764 [000] ...1 9720.382504: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
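    The fix amounts to an interval check: a prefree segment that lies entirely inside the range fstrim already discards needs no discard of its own. A minimal sketch with hypothetical types and names; ranges are half-open [start, start + len) in blocks, and the test reuses block numbers from the trace in this commit purely as an example.

```c
#include <stdbool.h>

/* Hypothetical half-open block range [start, start + len). */
struct sketch_range {
    unsigned long start;
    unsigned long len;
};

/* True if r lies entirely inside the range being trimmed. */
bool sketch_range_within(struct sketch_range r, struct sketch_range trim)
{
    return r.start >= trim.start && r.start + r.len <= trim.start + trim.len;
}

/* Returns true if a separate discard should still be issued for seg. */
bool sketch_should_discard_prefree(struct sketch_range seg, struct sketch_range trim)
{
    return !sketch_range_within(seg, trim);
}
```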
     
  • This patch skips discarding a block range that is smaller than
    trim_minlen and cannot be merged with its neighbours.

    Signed-off-by: Yunlei He
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Yunlei He
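    Merging neighbours before applying trim_minlen matters because two small adjacent free ranges may together be long enough to discard. A sketch of that rule under assumptions: extents are sorted by start block and non-overlapping, and the types and names are hypothetical rather than f2fs's own.

```c
#include <stddef.h>

/* Hypothetical free extent in blocks. */
struct sketch_extent {
    unsigned long start;
    unsigned long len;
};

/*
 * Merge adjacent extents (input sorted by start), then keep only merged
 * extents of at least minlen blocks. Writes results to out, returns count.
 */
size_t sketch_discard_candidates(const struct sketch_extent *in, size_t n,
                                 unsigned long minlen, struct sketch_extent *out)
{
    size_t count = 0;
    size_t i = 0;

    while (i < n) {
        struct sketch_extent cur = in[i++];

        /* absorb neighbours that start exactly where cur ends */
        while (i < n && in[i].start == cur.start + cur.len) {
            cur.len += in[i].len;
            i++;
        }
        if (cur.len >= minlen) /* too-short, unmergeable ranges are skipped */
            out[count++] = cur;
    }
    return count;
}
```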
     
  • As the manual describes, f_bfree indicates the total number of free
    blocks in the fs. In f2fs, this includes two parts: visible free blocks
    and over-provision blocks. This patch corrects the calculation.

    fsblkcnt_t f_bfree; /* free blocks in fs */

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
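    The corrected accounting can be written as a small calculation. This sketch is hypothetical: the parameter names are illustrative, and it assumes f_bavail keeps counting only user-visible free blocks, which the commit message does not state.

```c
/* Hypothetical subset of statfs results. */
struct sketch_statfs {
    unsigned long long bfree;  /* all free blocks, as f_bfree */
    unsigned long long bavail; /* assumed: user-visible free blocks only */
};

struct sketch_statfs sketch_f2fs_statfs(unsigned long long user_blocks,
                                        unsigned long long valid_user_blocks,
                                        unsigned long long ovp_blocks)
{
    struct sketch_statfs st;

    st.bavail = user_blocks - valid_user_blocks; /* visible free blocks */
    st.bfree = st.bavail + ovp_blocks;           /* plus over-provision blocks */
    return st;
}
```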
     
  • This patch replaces rw_semaphore with percpu_rw_semaphore for:
    sbi->cp_rwsem
    nm_i->nat_tree_lock

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • If the node page is up-to-date, it should be alive.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch shrinks the critical section protected by spin_lock.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • SetPageUptodate() issues a memory barrier, which degrades performance.
    Let's avoid that.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
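    A common way to avoid a redundant barrier is to test the flag cheaply first and perform the ordered store only when the flag is not yet set. The sketch below models this with C11 atomics; the page structure and function are hypothetical, and whether f2fs uses exactly this check-before-set pattern is an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page with an "uptodate" flag; not the kernel's struct page. */
struct sketch_flag_page {
    atomic_bool uptodate;
};

/*
 * Set the flag with a release store (standing in for the barrier), but
 * only if a cheap relaxed load shows it is not set yet. Skipping is only
 * safe when the caller does not rely on this store to publish data it
 * has just written. Returns true if the ordered store was performed.
 */
bool sketch_set_uptodate(struct sketch_flag_page *page)
{
    if (atomic_load_explicit(&page->uptodate, memory_order_relaxed))
        return false; /* already up to date: no barrier issued */
    atomic_store_explicit(&page->uptodate, true, memory_order_release);
    return true;
}
```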
     
  • This patch adds f2fs_set_page_dirty_nobuffer() copied from __set_page_dirty_buffer.
    When appending 4KB blocks in f2fs on pmem with multiple cores, this improves the
    overall performance.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • When base_addr is NULL, there is no need to call kzfree; it should
    return -ENOMEM directly. Additionally, it is better to initialize the
    variable 'error' to 0.

    Signed-off-by: Tiezhu Yang
    Signed-off-by: Jaegeuk Kim

    Tiezhu Yang
     
  • This patch adds 'nodiscard' mount option.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • If we fail to move a data page during foreground GC, we should give
    that page, which the writer previously set dirty, another chance to be
    written back.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • During a synchronized read, after sending out the read request, the
    reader tries to lock the page, waiting for the device to finish the read
    and unlock the page. Meanwhile, a truncater can race with the reader, so
    after the reader gets the page lock, it should check the page's mapping
    to detect whether someone has truncated the page in the meantime. If
    truncation happened, the reader gets a chance to retry; without this
    check, the read can fail because of the previous condition check.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
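    The check-after-lock pattern can be sketched with a mutex standing in for the page lock. The page and mapping types, the NULL-mapping convention for a truncated page, and the result codes are all hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical page/mapping model; not the kernel's struct page. */
struct sketch_mapping {
    int id;
};

struct sketch_cached_page {
    pthread_mutex_t lock;           /* stands in for the page lock */
    struct sketch_mapping *mapping; /* NULL once the page is truncated */
};

enum sketch_read_result { SKETCH_READ_OK, SKETCH_READ_RETRY };

/*
 * After the read I/O completes, the reader takes the page lock. If a
 * truncate raced in meanwhile, the page no longer belongs to this mapping,
 * so the caller should look the page up again instead of failing the read.
 */
enum sketch_read_result sketch_lock_and_check(struct sketch_cached_page *page,
                                              const struct sketch_mapping *mapping)
{
    pthread_mutex_lock(&page->lock);
    if (page->mapping != mapping) {
        pthread_mutex_unlock(&page->lock);
        return SKETCH_READ_RETRY;
    }
    return SKETCH_READ_OK; /* page stays locked for the caller */
}
```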
     
  • For an encrypted inode, if a user overwrites data of the inode, f2fs
    reads the encrypted data into the page cache and then decrypts it.

    However, a reader can race with the overwriter and see encrypted data
    that the overwriter has not decrypted yet. Fix it by moving the
    decryption work to the background and keeping the page non-uptodate
    until the data is decrypted.

    Thread A                               Thread B
    - f2fs_file_write_iter
    - __generic_file_write_iter
    - generic_perform_write
    - f2fs_write_begin
    - f2fs_submit_page_bio
                                           - generic_file_read_iter
                                           - do_generic_file_read
                                           - lock_page_killable
                                           - unlock_page
                                           - copy_page_to_iter
                                           hit the encrypted data in updated page
    - lock_page
    - fscrypt_decrypt_page

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

07 Jul, 2016

6 commits