02 Apr, 2014

1 commit

  • We should unlock the page in the ->readpage() path, and should also unlock and
    release the page in the error path of ->write_begin(), to avoid deadlocks and
    memory leaks.
    So let's add release code to fix the problem when we fail to read inline data.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
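    The unlock-and-release rule above can be sketched in user space. Everything
    below (the struct, the helpers, the flag) is an illustrative stand-in, not
    the real kernel API:

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Minimal user-space stand-ins for page state; all names here are
     * illustrative, not the kernel's. */
    struct page { bool locked; int refcount; };

    static void unlock_page(struct page *p) { p->locked = false; }
    static void put_page(struct page *p)    { p->refcount--; }

    /* f2fs_put_page-style helper: unlock if requested, then drop the ref. */
    static void release_page(struct page *p, bool unlock)
    {
        if (unlock)
            unlock_page(p);
        put_page(p);
    }

    /* Sketch of a ->write_begin() error path: on failure the page must be
     * both unlocked and released, or it leaks and later lockers deadlock. */
    static int write_begin_sketch(struct page *p, bool read_fails)
    {
        p->locked = true;
        p->refcount++;

        if (read_fails) {              /* e.g. failed to read inline data */
            release_page(p, true);     /* unlock AND release */
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        struct page p = { false, 0 };
        assert(write_begin_sketch(&p, true) == -1);
        assert(!p.locked && p.refcount == 0);  /* no leak, no deadlock */
        printf("ok\n");
        return 0;
    }
    ```

    The point of the helper is that the error path pairs every lock with an
    unlock and every reference with a put before returning.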
     

01 Apr, 2014

1 commit


18 Mar, 2014

3 commits


28 Feb, 2014

1 commit

  • We should de-account the dirty counters for a page when it is redirtied in ->writepage().

    Wu Fengguang described this in commit 971767caf632190f77a40b4011c19948232eed75:
    "writeback: fix dirtied pages accounting on redirty
    De-account the accumulative dirty counters on page redirty.

    Page redirties (very common in ext4) will introduce mismatch between
    counters (a) and (b)

    a) NR_DIRTIED, BDI_DIRTIED, tsk->nr_dirtied
    b) NR_WRITTEN, BDI_WRITTEN

    This will introduce systematic errors in balanced_rate and result in
    dirty page position errors (ie. the dirty pages are no longer balanced
    around the global/bdi setpoints)."

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
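    The accounting mismatch quoted above can be modeled with a toy counter
    pair; a minimal sketch, with all names illustrative rather than the real
    writeback API:

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Toy model of writeback accounting; the names mirror the counters in
     * the commit message but this is not kernel code. */
    static long nr_dirtied, nr_written;

    static void account_page_dirtied(void) { nr_dirtied++; }
    static void account_page_written(void) { nr_written++; }

    /* On redirty inside ->writepage(), undo the dirty accounting so that
     * NR_DIRTIED and NR_WRITTEN stay balanced over time. */
    static void account_page_redirty(void) { nr_dirtied--; }

    static void writepage(int redirty)
    {
        if (redirty) {
            account_page_dirtied();   /* page becomes dirty again ... */
            account_page_redirty();   /* ... so de-account it */
        } else {
            account_page_written();
        }
    }

    int main(void)
    {
        /* dirty 3 pages; one is redirtied once before finally going out */
        for (int i = 0; i < 3; i++)
            account_page_dirtied();
        writepage(1);                 /* redirtied */
        writepage(0); writepage(0); writepage(0);
        assert(nr_dirtied == nr_written);   /* counters stay in sync */
        printf("dirtied=%ld written=%ld\n", nr_dirtied, nr_written);
        return 0;
    }
    ```

    Without the de-account step, every redirty would push nr_dirtied ahead of
    nr_written permanently, which is exactly the systematic error described.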
     

24 Feb, 2014

1 commit

  • Even if f2fs_write_data_page is called from the page reclaim path, we should
    not write the page, so that enough free segments remain for the worst-case
    scenario. Otherwise, f2fs can run out of free segments while GC is conducted,
    resulting in:

    ------------[ cut here ]------------
    kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/segment.c:565!
    RIP: 0010:[] [] new_curseg+0x331/0x340 [f2fs]
    Call Trace:
    allocate_segment_by_default+0x204/0x280 [f2fs]
    allocate_data_block+0x108/0x210 [f2fs]
    write_data_page+0x8a/0xc0 [f2fs]
    do_write_data_page+0xe1/0x2a0 [f2fs]
    move_data_page+0x8a/0xf0 [f2fs]
    f2fs_gc+0x446/0x970 [f2fs]
    f2fs_balance_fs+0xb6/0xd0 [f2fs]
    f2fs_write_begin+0x50/0x350 [f2fs]
    ? unlock_page+0x27/0x30
    ? unlock_page+0x27/0x30
    generic_file_buffered_write+0x10a/0x280
    ? file_update_time+0xa3/0xf0
    __generic_file_aio_write+0x1c8/0x3d0
    ? generic_file_aio_write+0x52/0xb0
    ? generic_file_aio_write+0x52/0xb0
    generic_file_aio_write+0x65/0xb0
    do_sync_write+0x5a/0x90
    vfs_write+0xc5/0x1f0
    SyS_write+0x55/0xa0
    system_call_fastpath+0x16/0x1b

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

17 Feb, 2014

3 commits


31 Jan, 2014

1 commit

  • Pull core block IO changes from Jens Axboe:
    "The major piece in here is the immutable bio_vecs series from Kent, the
    rest is fairly minor. It was supposed to go in last round, but
    various issues pushed it to this release instead. The pull request
    contains:

    - Various smaller blk-mq fixes from different folks. Nothing major
    here, just minor fixes and cleanups.

    - Fix for a memory leak in the error path in the block ioctl code
    from Christian Engelmayer.

    - Header export fix from CaiZhiyong.

    - Finally the immutable biovec changes from Kent Overstreet. This
    enables some nice future work on making arbitrarily sized bios
    possible, and splitting more efficient. Related fixes to immutable
    bio_vecs:

    - dm-cache immutable fixup from Mike Snitzer.
    - btrfs immutable fixup from Muthu Kumar.

    - bio-integrity fix from Nic Bellinger, which is also going to stable"

    * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
    xtensa: fixup simdisk driver to work with immutable bio_vecs
    block/blk-mq-cpu.c: use hotcpu_notifier()
    blk-mq: for_each_* macro correctness
    block: Fix memory leak in rw_copy_check_uvector() handling
    bio-integrity: Fix bio_integrity_verify segment start bug
    block: remove unrelated header files and export symbol
    blk-mq: uses page->list incorrectly
    blk-mq: use __smp_call_function_single directly
    btrfs: fix missing increment of bi_remaining
    Revert "block: Warn and free bio if bi_end_io is not set"
    block: Warn and free bio if bi_end_io is not set
    blk-mq: fix initializing request's start time
    block: blk-mq: don't export blk_mq_free_queue()
    block: blk-mq: make blk_sync_queue support mq
    block: blk-mq: support draining mq queue
    dm cache: increment bi_remaining when bi_end_io is restored
    block: fixup for generic bio chaining
    block: Really silence spurious compiler warnings
    block: Silence spurious compiler warnings
    block: Kill bio_pair_split()
    ...

    Linus Torvalds
     

22 Jan, 2014

1 commit


20 Jan, 2014

1 commit


16 Jan, 2014

2 commits

  • When doing sync_meta_pages with META_FLUSH at checkpoint time, we override rw
    with WRITE_FLUSH_FUA. At this time, we should also set
    REQ_META|REQ_PRIO.

    Signed-off-by: Changman Lee
    Signed-off-by: Jaegeuk Kim

    Changman Lee
     
  • This patch should resolve the following bug.

    =========================================================
    [ INFO: possible irq lock inversion dependency detected ]
    3.13.0-rc5.f2fs+ #6 Not tainted
    ---------------------------------------------------------
    kswapd0/41 just changed the state of lock:
    (&sbi->gc_mutex){+.+.-.}, at: [] f2fs_balance_fs+0xae/0xd0 [f2fs]
    but this lock took another, RECLAIM_FS-READ-unsafe lock in the past:
    (&sbi->cp_rwsem){++++.?}

    and interrupts could create inverse lock ordering between them.

    other info that might help us debug this:
    Chain exists of:
    &sbi->gc_mutex --> &sbi->cp_mutex --> &sbi->cp_rwsem

    Possible interrupt unsafe locking scenario:

    CPU0                            CPU1
    ----                            ----
    lock(&sbi->cp_rwsem);
                                    local_irq_disable();
                                    lock(&sbi->gc_mutex);
                                    lock(&sbi->cp_mutex);

    lock(&sbi->gc_mutex);

    *** DEADLOCK ***

    This bug is due to the f2fs_balance_fs call in f2fs_write_data_page.
    If f2fs_write_data_page is triggered by wbc->for_reclaim via kswapd, it should
    not call f2fs_balance_fs which tries to get a mutex grabbed by original syscall
    flow.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
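    The fix described above amounts to guarding the f2fs_balance_fs call on
    wbc->for_reclaim. A user-space sketch under that assumption (the structs
    and function bodies here are illustrative stubs, not the real code):

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Stand-in for the kernel's writeback_control; illustrative only. */
    struct writeback_control { int for_reclaim; };

    static int balance_calls;
    static void f2fs_balance_fs(void) { balance_calls++; }

    /* When ->writepage() is entered from direct reclaim (kswapd), skip
     * f2fs_balance_fs() so we never try to take gc_mutex from a
     * RECLAIM_FS context while cp_rwsem may be held elsewhere. */
    static void f2fs_write_data_page(struct writeback_control *wbc)
    {
        /* ... write the page ... */
        if (!wbc->for_reclaim)
            f2fs_balance_fs();  /* safe only outside the reclaim path */
    }

    int main(void)
    {
        struct writeback_control reclaim = { 1 }, normal = { 0 };

        f2fs_write_data_page(&reclaim);
        assert(balance_calls == 0);   /* reclaim path must not balance */

        f2fs_write_data_page(&normal);
        assert(balance_calls == 1);   /* normal path still balances */
        printf("ok\n");
        return 0;
    }
    ```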
     

14 Jan, 2014

1 commit


06 Jan, 2014

3 commits

  • get_dnode_of_data() nullifies the inode and node pages when an error occurs.

    There are two cases that pass the inode page into get_dnode_of_data().

    1. make_empty_dir()
    -> get_new_data_page()
    -> f2fs_reserve_block(ipage)
    -> get_dnode_of_data()

    2. f2fs_convert_inline_data()
    -> __f2fs_convert_inline_data()
    -> f2fs_reserve_block(ipage)
    -> get_dnode_of_data()

    This patch adds correct error handling codes when get_dnode_of_data() returns
    an error.

    At first, f2fs_reserve_block() calls f2fs_put_dnode() whenever reserve_new_block
    returns an error.
    So, the rule of f2fs_reserve_block() is to nullify the inode page whenever there
    is any error internally.

    Finally, the two callers of f2fs_reserve_block() should call f2fs_put_dnode()
    appropriately if they get an error after a successful f2fs_reserve_block().

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Change log from v1:
    o handle a NULL pointer return of grab_cache_page_write_begin(), pointed out by Chao Yu.

    This patch refactors f2fs_convert_inline_data to check a couple of conditions
    internally for deciding whether it needs to convert inline_data or not.

    So, the new f2fs_convert_inline_data initially checks:
    1) f2fs_has_inline_data(), and
    2) the data size to be changed.

    If the inode has inline_data but the size to fill is less than MAX_INLINE_DATA,
    then we don't need to convert the inline_data with data allocation.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • In f2fs_write_begin(), if f2fs_convert_inline_data() returns an error like
    -ENOSPC, f2fs should call f2fs_put_page().
    Otherwise, the page remains locked, resulting in the following bug.

    [] sleep_on_page+0xe/0x20
    [] __lock_page+0x67/0x70
    [] truncate_inode_pages_range+0x368/0x5d0
    [] truncate_inode_pages+0x15/0x20
    [] truncate_pagecache+0x4b/0x70
    [] truncate_setsize+0x12/0x20
    [] f2fs_setattr+0x72/0x270 [f2fs]
    [] notify_change+0x213/0x400
    [] do_truncate+0x66/0xa0
    [] vfs_truncate+0x191/0x1b0
    [] do_sys_truncate+0x5c/0xa0
    [] SyS_truncate+0xe/0x10
    [] system_call_fastpath+0x16/0x1b
    [] 0xffffffffffffffff

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

26 Dec, 2013

3 commits

  • Hook inline data read/write, truncate, fallocate, setattr, etc.

    Files need to meet the following two requirements to be inlined:
    1) the file size is not greater than MAX_INLINE_DATA;
    2) the file doesn't pre-allocate data blocks by fallocate().

    FI_INLINE_DATA is not set while creating a new regular inode, because
    most files are bigger than ~3.4K. Setting FI_INLINE_DATA only when data is
    submitted to the block layer, rather than while creating a new inode, also
    avoids converting data from inline to a normal data block
    and vice versa.

    While writing inline data to the inode block, the first data block should be
    released if the file has a block indexed by i_addr[0].

    On the other hand, when a file operation is applied to a file with inline
    data, we need to test whether this file can remain inline after the
    operation; otherwise it should be converted into a normal file by reserving
    a new data block, copying the inline data to this new block, and clearing
    the FI_INLINE_DATA flag. Because reserving a new data block here makes use
    of i_addr[0], if we saved inline data in i_addr[0..872], the first
    4 bytes would be overwritten. This problem is avoided simply by
    not using i_addr[0] for inline data.

    Signed-off-by: Huajun Li
    Signed-off-by: Haicheng Li
    Signed-off-by: Weihong Xu
    Signed-off-by: Jaegeuk Kim

    Huajun Li
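    The "~3.4K" figure above follows from the layout the message describes:
    assuming 873 four-byte block-address slots i_addr[0..872], with i_addr[0]
    kept out of the inline area so conversion can reserve a real block there,
    the inline capacity works out as:

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Back-of-the-envelope for the "~3.4K" inline limit. Assumption taken
     * from the text: 873 block-address slots i_addr[0..872], 4 bytes each,
     * with i_addr[0] excluded from the inline data area. */
    enum { NR_ADDR_SLOTS = 873, ADDR_BYTES = 4 };

    int main(void)
    {
        int max_inline_data = (NR_ADDR_SLOTS - 1) * ADDR_BYTES;
        assert(max_inline_data == 3488);   /* ~3.4 KiB */
        printf("MAX_INLINE_DATA = %d bytes\n", max_inline_data);
        return 0;
    }
    ```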
     
  • f2fs supports a 4KB block size. If the user requests a dwrite of less than 4KB
    of data, it allocates a new 4KB data block.
    However, f2fs doesn't write zero data into the untouched area inside the
    newly allocated data block.

    This incurs an error during xfstest #263, as follows.

    263 12s ... [failed, exit status 1] - output mismatch (see 263.out.bad)
    --- 263.out 2013-03-09 03:37:15.043967603 +0900
    +++ 263.out.bad 2013-12-27 04:20:39.230203114 +0900
    @@ -1,3 +1,976 @@
    QA output created by 263
    fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z
    -fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z
    +fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z
    +truncating to largest ever: 0x12a00
    +truncating to largest ever: 0x75400
    +fallocating to largest ever: 0x79cbf
    ...
    (Run 'diff -u 263.out 263.out.bad' to see the entire diff)
    Ran: 263
    Failures: 263
    Failed 1 of 1 tests

    It turns out that, when the test tries to write 2KB of data with dio, the new dio
    path allocates a 4KB data block without zero-filling the remaining 2KB area.
    As a result, the output file contains garbage data in that region.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
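    The fix amounts to zero-filling the tail of a freshly allocated block
    after a sub-block write. A plain user-space sketch of the idea (the
    helper name is illustrative):

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Sketch of the fix for sub-block direct writes: after allocating a
     * fresh 4KB block for a 2KB dio write, the untouched tail must be
     * zero-filled or it exposes stale garbage. */
    enum { BLKSIZE = 4096 };

    static void fill_new_block(char *blk, const char *data, size_t len)
    {
        memcpy(blk, data, len);               /* the user's data */
        memset(blk + len, 0, BLKSIZE - len);  /* zero the remaining area */
    }

    int main(void)
    {
        char blk[BLKSIZE];
        memset(blk, 0xAB, BLKSIZE);           /* simulate stale garbage */

        char data[2048];
        memset(data, 'x', sizeof(data));
        fill_new_block(blk, data, sizeof(data));

        assert(blk[0] == 'x' && blk[2047] == 'x');
        assert(blk[2048] == 0 && blk[BLKSIZE - 1] == 0);  /* no garbage */
        printf("ok\n");
        return 0;
    }
    ```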
     
  • When get_dnode_of_data() in get_data_block() returns a successful dnode, we
    should put the dnode.
    But previously, if its data block address was equal to NEW_ADDR, we didn't do
    that, resulting in a deadlock condition.
    So, this patch splits the original error conditions from this case, and then
    calls f2fs_put_dnode before finishing the function.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

23 Dec, 2013

18 commits

  • Update several comments:
    1. use f2fs_{un}lock_op instead of mutex_{un}lock_op.
    2. update comment of get_data_block().
    3. update description of node offset.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • When using the f2fs_io_info in the low level, we still need to merge
    rw and rw_flag, so use rw to hold all the IO flags directly,
    and remove the rw_flag field.

    P.S. It is based on the previous patch:
    f2fs: move all the bio initialization into __bio_alloc

    Signed-off-by: Gu Zheng
    Signed-off-by: Jaegeuk Kim

    Gu Zheng
     
  • Move all the bio initialization into __bio_alloc, and some minor cleanups are
    also added.

    v3:
    Use 'bool' rather than 'int' as Kim suggested.

    v2:
    Use 'is_read' rather than 'rw' as Yu Chao suggested.
    Remove the needless initialization of bio->bi_private.

    Signed-off-by: Gu Zheng
    Signed-off-by: Jaegeuk Kim

    Gu Zheng
     
  • Previously, f2fs didn't support high-performance direct IO; it pushed every
    write request through the buffered write path, resulting in severe
    performance degradation due to memory operations like copy_from_user.

    This patch introduces a new direct IO path in which every write request is
    processed by the generic blockdev_direct_IO() with an enhanced get_block function.

    The get_data_block() in f2fs handles:
    1. if the original data blocks are already allocated, give them to blockdev.
    2. otherwise,
    a. preallocate the requested block addresses,
    b. do not use the extent cache, for better performance,
    c. give the block addresses to blockdev.

    This policy implies that:
    - newly allocated data is written sequentially to the disk,
    - updated data is written randomly to the disk,
    - f2fs guarantees consistency of its file metadata, not its file data.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
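    The get_block decision above can be sketched against a toy in-memory
    block map; the names, the NULL_ADDR sentinel, and the sequential
    allocator are all illustrative assumptions, not the real implementation:

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Toy logical->physical block map. A zero entry means "unallocated". */
    enum { NR_BLKS = 16, NULL_ADDR = 0 };

    static unsigned int blk_map[NR_BLKS];
    static unsigned int next_free = 100;  /* new blocks come out sequentially */

    /* Sketch of the enhanced get_block policy: hand already-allocated
     * blocks straight to blockdev, otherwise preallocate a new address. */
    static unsigned int get_data_block(unsigned int lblk)
    {
        if (blk_map[lblk] != NULL_ADDR)
            return blk_map[lblk];         /* 1. already allocated: reuse */
        blk_map[lblk] = next_free++;      /* 2a. preallocate for the dio */
        return blk_map[lblk];
    }

    int main(void)
    {
        /* new data gets consecutive addresses (sequential on disk) */
        assert(get_data_block(3) == 100);
        assert(get_data_block(4) == 101);
        /* an update to an existing block keeps its allocated address */
        assert(get_data_block(3) == 100);
        printf("ok\n");
        return 0;
    }
    ```

    This mirrors the stated policy: fresh allocations land sequentially,
    while updates go back to wherever the block already lives.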
     
  • We need to capture the trace before submit_bio, since bi_sector is remapped
    during submit_bio.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch introduces f2fs_io_info to mitigate the complex parameter list.

    struct f2fs_io_info {
        enum page_type type;    /* contains DATA/NODE/META/META_FLUSH */
        int rw;                 /* contains R/RS/W/WS */
        int rw_flag;            /* contains REQ_META/REQ_PRIO */
    };

    1. f2fs_write_data_pages
    - DATA
    - WRITE_SYNC is set when wbc->WB_SYNC_ALL.

    2. sync_node_pages
    - NODE
    - WRITE_SYNC all the time

    3. sync_meta_pages
    - META
    - WRITE_SYNC all the time
    - REQ_META | REQ_PRIO all the time

    ** f2fs_submit_merged_bio() handles META_FLUSH.

    4. ra_nat_pages, ra_sit_pages, ra_sum_pages
    - META
    - READ_SYNC

    Cc: Fan Li
    Cc: Changman Lee
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Previously f2fs submitted most write requests using WRITE_SYNC, but
    f2fs_write_data_pages submitted the last write request using the sync_mode
    flag the caller passed.

    This caused a performance problem, since continuous pages with different sync
    flags can't be merged in the CFQ IO scheduler (thanks to Chao Yu for pointing
    it out), and synchronous requests often take more time.

    This patch makes the following modifications to DATA writeback:

    1. every page is written back using the sync mode the caller passes.
    2. only pages with the same sync mode can be merged into one bio request.

    These changes are restricted to DATA pages. Other types of writeback remain
    synchronous.

    In my test with tiotest, f2fs sequential write performance improves by about
    7%-10%, and this patch has no obvious impact on other performance tests.

    Signed-off-by: Fan Li
    Signed-off-by: Jaegeuk Kim

    Fan Li
     
  • This patch adds the unlikely() macro to most of the code.
    The basic rule is to add it when:
    - checking unusual errors,
    - checking page mappings,
    - and in other unlikely conditions.

    Change log from v1:
    - Don't add unlikely for the NULL test and error test: advised by Andi Kleen.

    Cc: Chao Yu
    Cc: Andi Kleen
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
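    In the kernel, unlikely() is built on GCC's __builtin_expect; a minimal
    user-space equivalent shows that it only steers the compiler's branch
    layout, while the condition's result is unchanged (the function here is
    an illustrative example, not f2fs code):

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* User-space equivalent of the kernel's unlikely(): a hint that the
     * condition is almost always false, so the compiler keeps the error
     * branch off the hot path. Semantics are unaffected. */
    #define unlikely(x) __builtin_expect(!!(x), 0)

    static int submit_page(int mapping_gone)
    {
        if (unlikely(mapping_gone))   /* page mapping check: rarely true */
            return -1;
        return 0;                     /* common case falls straight through */
    }

    int main(void)
    {
        assert(submit_page(1) == -1);  /* result identical with the hint */
        assert(submit_page(0) == 0);
        printf("ok\n");
        return 0;
    }
    ```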
     
  • As we know, some of our branch conditions will rarely be true. So we can add
    'unlikely' to let the compiler optimize this code, dropping unneeded 'jump'
    assembly instructions to improve performance.

    change log:
    o add *unlikely* as many as possible across the whole source files at once
    suggested by Jaegeuk Kim.

    Suggested-by: Jaegeuk Kim
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch integrates redundant bio operations on read and write IOs.

    1. Move bio-related code to the top of data.c.
    2. Replace f2fs_submit_bio with f2fs_submit_merged_bio, which additionally
    handles read bios.
    3. Introduce __submit_merged_bio to submit the merged bio.
    4. Change f2fs_readpage to f2fs_submit_page_bio.
    5. Introduce f2fs_submit_page_mbio to integrate the previous submit_read_page
    and submit_write_page.

    Reviewed-by: Gu Zheng
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • We should return an error if we do not get an updated page in find_data_page
    when f2fs_readpage fails.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch fixes some bit overflows by the shift operations.

    Dan Carpenter reported potential bugs on bit overflows as follows.

    fs/f2fs/segment.c:910 submit_write_page()
    warn: should 'blk_addr << ((sbi)->log_blocksize - 9)' be a 64 bit type?
    fs/f2fs/checkpoint.c:429 get_valid_checkpoint()
    warn: should '1 << ()' be a 64 bit type?
    fs/f2fs/data.c:408 f2fs_readpage()
    warn: should 'blk_addr << ((sbi)->log_blocksize - 9)' be a 64 bit type?
    fs/f2fs/data.c:457 submit_read_page()
    warn: should 'blk_addr << ((sbi)->log_blocksize - 9)' be a 64 bit type?
    fs/f2fs/data.c:525 get_data_block_ro()
    warn: should 'i << blkbits' be a 64 bit type?

    Bug-Reported-by: Dan Carpenter
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
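    The class of bug behind these warnings is easy to demonstrate: shifting a
    32-bit block address left truncates the high bits unless the value is
    widened to 64 bits first. With a 4KB block size, log_blocksize is 12 and
    the block-to-sector shift is 12 - 9 = 3:

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t blk_addr = 0x40000000;   /* block 2^30: with 4KB blocks,
                                             4 TiB into the device */
        int shift = 12 - 9;               /* block number -> 512B sector */

        uint64_t wrong = blk_addr << shift;            /* 32-bit shift */
        uint64_t right = (uint64_t)blk_addr << shift;  /* widen first  */

        assert(wrong == 0);               /* high bits silently lost */
        assert(right == 0x200000000ULL);  /* correct sector number */
        printf("wrong=%llx right=%llx\n",
               (unsigned long long)wrong, (unsigned long long)right);
        return 0;
    }
    ```

    The fix in each warned site is the same shape: cast to a 64-bit type
    before the shift, not after.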
     
  • Add the function f2fs_reserve_block() to easily reserve new blocks, and
    use it to clean up more code.

    Signed-off-by: Huajun Li
    Signed-off-by: Haicheng Li
    Signed-off-by: Weihong Xu
    Signed-off-by: Jaegeuk Kim

    Huajun Li
     
  • This patch adds a tracepoint for f2fs_submit_read_bio.

    Signed-off-by: Chao Yu
    [Jaegeuk Kim: integrate tracepoints of f2fs_submit_read(_write)_bio]
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch adds a tracepoint for submit_read_page.

    Signed-off-by: Chao Yu
    [Jaegeuk Kim: integrate tracepoints of f2fs_submit_read(_write)_page]
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • For better read performance, we add a new function to support merging
    contiguous reads, like the one for writes.

    v1-->v2:
    o add declarations here as Gu Zheng suggested.
    o use new structure f2fs_bio_info introduced by Jaegeuk Kim.

    Signed-off-by: Chao Yu
    Acked-by: Gu Zheng

    Chao Yu
     
  • f2fs manages an extent cache to look up a run of consecutive data blocks
    very quickly.

    However, it conducts unnecessary cache operations if the file is highly
    fragmented and has no valid extent cache.

    In that case, we don't need to maintain the extent cache and can simply
    disable the cache facility.

    Nevertheless, this patch gives one more chance to enable the extent cache.

    For example,
    1. create a file
    2. write data sequentially which produces a large valid extent cache
    3. update some data, resulting in a fragmented extent
    4. if the fragmented extent is too small, then drop extent cache
    5. close the file

    6. open the file again
    7. give another chance to make a new extent cache
    8. write data sequentially again which creates another big extent cache.
    ...

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
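    The drop-and-retry policy in steps 1-8 can be modeled as a tiny state
    machine; the threshold, struct, and helpers below are all hypothetical
    illustrations of the policy, not the real implementation:

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Hypothetical threshold: below this, the extent is "too small". */
    enum { MIN_USEFUL_LEN = 64 };

    struct inode_sketch { int ext_len; int cache_enabled; };

    /* Keep the extent cache only while it covers a usefully large run;
     * drop it once fragmentation shrinks it. */
    static void update_extent(struct inode_sketch *i, int new_len)
    {
        if (!i->cache_enabled)
            return;
        i->ext_len = new_len;
        if (i->ext_len < MIN_USEFUL_LEN)
            i->cache_enabled = 0;   /* too fragmented: stop maintaining it */
    }

    /* On reopen, give the cache one more chance. */
    static void reopen(struct inode_sketch *i)
    {
        i->cache_enabled = 1;
        i->ext_len = 0;
    }

    int main(void)
    {
        struct inode_sketch ino = { 0, 1 };
        update_extent(&ino, 1024);   /* sequential write: big extent */
        assert(ino.cache_enabled);
        update_extent(&ino, 8);      /* fragmented update: tiny extent */
        assert(!ino.cache_enabled);  /* cache dropped */
        reopen(&ino);                /* reopen: another chance */
        update_extent(&ino, 2048);   /* sequential again: big extent */
        assert(ino.cache_enabled && ino.ext_len == 2048);
        printf("ok\n");
        return 0;
    }
    ```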
     
  • This patch removes an unnecessary semaphore (i.e., sbi->bio_sem).
    There is no reason to use a semaphore when f2fs submits read and write IOs.
    Instead, let's use a write mutex and cover sbi->bio[] with the lock.

    Change log from v1:
    o split write_mutex suggested by Chao Yu

    Chao described,
    "All DATA/NODE/META bio buffers in superblock is protected by
    'sbi->write_mutex', but each bio buffer area is independent, So we
    should split write_mutex to three for DATA/NODE/META."

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
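    Chao's split can be sketched in user space with pthreads: one mutex per
    bio buffer type, so DATA, NODE, and META submissions don't serialize
    against each other. The names are illustrative stand-ins:

    ```c
    #include <assert.h>
    #include <pthread.h>
    #include <stdio.h>

    /* One write mutex per independent bio buffer area. */
    enum page_type { DATA, NODE, META, NR_PAGE_TYPE };

    static pthread_mutex_t write_mutex[NR_PAGE_TYPE] = {
        PTHREAD_MUTEX_INITIALIZER,
        PTHREAD_MUTEX_INITIALIZER,
        PTHREAD_MUTEX_INITIALIZER,
    };

    int main(void)
    {
        /* Holding the DATA lock does not block a NODE submission ... */
        assert(pthread_mutex_lock(&write_mutex[DATA]) == 0);
        assert(pthread_mutex_trylock(&write_mutex[NODE]) == 0);

        /* ... but a second DATA submission would have to wait. */
        assert(pthread_mutex_trylock(&write_mutex[DATA]) != 0);

        pthread_mutex_unlock(&write_mutex[NODE]);
        pthread_mutex_unlock(&write_mutex[DATA]);
        printf("ok\n");
        return 0;
    }
    ```

    With a single lock, the failed trylock above would apply across all three
    types; splitting it restores concurrency between the independent buffers.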