25 Aug, 2016

1 commit

  • We came across an error as below:

    [build_nat_area_bitmap:1710] nid[0x 1718] addr[0x 1c18ddc] ino[0x 1718]
    [build_nat_area_bitmap:1710] nid[0x 1719] addr[0x 1c193d5] ino[0x 1719]
    [build_nat_area_bitmap:1710] nid[0x 171a] addr[0x 1c1736e] ino[0x 171a]
    [build_nat_area_bitmap:1710] nid[0x 171b] addr[0x 58b3ee8f] ino[0x815f92ed]
    [build_nat_area_bitmap:1710] nid[0x 171c] addr[0x fcdc94b] ino[0x49366377]
    [build_nat_area_bitmap:1710] nid[0x 171d] addr[0x 7cd2facf] ino[0xb3c55300]
    [build_nat_area_bitmap:1710] nid[0x 171e] addr[0x bd4e25d0] ino[0x77c34c09]

    ... ...

    A NAT block may be stepped on by a data block, so this patch forbids the
    write when the block address is illegal.
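
    A hedged sketch of the kind of guard this implies; the helper name is
    made up here, and the bounds come from f2fs's segment macros (the main
    area spans [MAIN_BLKADDR, MAX_BLKADDR)):

        /* refuse a target block address outside the main area, so that
         * metadata such as NAT blocks cannot be stepped on by data writes */
        static bool blkaddr_is_legal(struct f2fs_sb_info *sbi, block_t blkaddr)
        {
                return blkaddr >= MAIN_BLKADDR(sbi) &&
                       blkaddr < MAX_BLKADDR(sbi);
        }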

    Signed-off-by: Yunlei He
    Signed-off-by: Jaegeuk Kim

    Yunlei He
     

08 May, 2016

1 commit

  • Restructure struct seg_entry to eliminate holes in it. After that, on a
    32-bit machine it shrinks from 32 bytes to 24 bytes; on a 64-bit
    machine, from 56 bytes to 40 bytes.
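
    The general technique, as a hedged sketch: group members of equal
    alignment so the compiler inserts no padding between them. The field set
    below is illustrative (the "before" struct is renamed so both versions
    compile side by side), and the actual patch may pack fields differently,
    e.g. with bitfields:

        /* before: 56 bytes on 64-bit; each short/char placed before a
         * pointer or the 8-byte mtime drags in alignment padding */
        struct seg_entry_old {
                unsigned short valid_blocks;      /* 2 bytes + 6-byte hole */
                unsigned char *cur_valid_map;
                unsigned short ckpt_valid_blocks; /* 2 bytes + 6-byte hole */
                unsigned char *ckpt_valid_map;
                unsigned char *discard_map;
                unsigned char type;               /* 1 byte + 7-byte hole */
                unsigned long long mtime;
        };

        /* after: 40 bytes on 64-bit; pointers and mtime first, the small
         * members packed together at the tail */
        struct seg_entry {
                unsigned char *cur_valid_map;
                unsigned char *ckpt_valid_map;
                unsigned char *discard_map;
                unsigned long long mtime;
                unsigned short valid_blocks;
                unsigned short ckpt_valid_blocks;
                unsigned char type;               /* 3 bytes tail padding */
        };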

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

23 Feb, 2016

3 commits

  • In the curseg cache, f2fs caches two different parts:
    - data of the current summary block, i.e. summary entries and footer info.
    - journal info, i.e. sparse nat/sit entries or io stat info.

    With this approach: 1) it may cause higher lock contention when we access
    or update both parts of the cache, since the same mutex lock curseg_mutex
    protects them; 2) the current summary block, with the last journal info,
    is written back to the device as a normal summary block when flushing,
    yet we treat journal info as valid only in the current summary, so most
    normal summary blocks contain junk journal data, which wastes the
    remaining space of the summary block.

    So, in order to fix the above issues, we split the curseg cache into two
    parts:
    a) current summary block, protected by the original mutex lock curseg_mutex
    b) journal cache, protected by a newly introduced r/w semaphore journal_rwsem

    When loading the curseg cache during ->mount, we store summary info and
    journal info in different caches; when doing a checkpoint, we combine the
    data of the two caches into the current summary block for persistence.
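
    A minimal sketch of the split, assuming the lock and field names from
    the description (the in-tree struct curseg_info carries more state):

        /* curseg cache split into a summary part and a journal part */
        struct curseg_info {
                struct mutex curseg_mutex;         /* guards the summary block */
                struct f2fs_summary_block *sum_blk;
                struct rw_semaphore journal_rwsem; /* guards the journal cache */
                struct f2fs_journal *journal;      /* sparse nat/sit entries */
        };

        /* journal lookups now take only the rwsem, not curseg_mutex */
        static bool lookup_journal_entry(struct curseg_info *curseg)
        {
                bool found;

                down_read(&curseg->journal_rwsem);
                found = false; /* search curseg->journal here */
                up_read(&curseg->journal_rwsem);
                return found;
        }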

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • f2fs supports atomic writes with the following semantics:
    1. open db file
    2. ioctl start atomic write
    3. (write db file) * n
    4. ioctl commit atomic write
    5. close db file

    With this flow we can avoid the file becoming corrupted on an abnormal
    power cut, because we hold the data of the transaction in referenced
    pages linked in the inmem_pages list of the inode, without setting them
    dirty, so the data won't be persisted unless we commit it in step 4.

    But we should still hold the journal db file in memory by using volatile
    writes, because our 'atomic write support' semantics are incomplete: in
    step 4, we could fail to submit all the dirty data of the transaction,
    and once partial dirty data has been committed to storage, then after a
    checkpoint plus an abnormal power cut, the db file will be corrupted
    forever.

    So this patch tries to improve the atomic write flow by adding a revoke
    flow: once an internal error occurs during committing, it gives us
    another chance to revoke the partially submitted data of the current
    transaction, which makes the commit operation closer to atomic.

    If we're not lucky and the revoke operation fails, EAGAIN is reported to
    the user, suggesting either recovery from the held journal file or
    retrying the current transaction.
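
    A hedged userspace sketch of reacting to the new EAGAIN result on commit
    (the ioctl definitions match the kernel's f2fs header; the recovery
    policy itself is illustrative):

        #include <errno.h>
        #include <sys/ioctl.h>
        #include <linux/ioctl.h>

        #define F2FS_IOCTL_MAGIC             0xf5
        #define F2FS_IOC_COMMIT_ATOMIC_WRITE _IO(F2FS_IOCTL_MAGIC, 2)

        /* returns 0 on success; on EAGAIN the engine still holds its own
         * journal file and can recover from it or replay the transaction */
        static int commit_transaction(int db_fd)
        {
                if (ioctl(db_fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) == 0)
                        return 0;
                if (errno == EAGAIN) {
                        /* partial data was revoked: recover from the held
                         * journal, or restart from step 2 */
                }
                return -1;
        }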

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch fixes a conflict on the page->private value between
    f2fs_trace_pid and atomic pages.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

10 Oct, 2015

2 commits

  • This patch introduces background_gc=sync, enabling synchronous cleaning
    in the background.
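
    For illustration, the option is passed like any other f2fs mount option;
    a minimal sketch using mount(2), with device and mount point as
    placeholders:

        #include <sys/mount.h>

        int main(void)
        {
                /* run the background garbage collector synchronously */
                return mount("/dev/sdb1", "/mnt/f2fs", "f2fs", 0,
                             "background_gc=sync");
        }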

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Previously, we skipped dentry block writes when wbc is SYNC_NONE, there
    is no memory pressure, and the number of dirty pages is pretty small.

    But we didn't skip normal data writes, so the skipping gave us no big
    impact on overall performance. Moreover, by skipping some writes, the
    kworker falls into an infinite loop trying to write blocks when many dir
    inodes have only one dentry block.

    So, this patch removes the skipping of these writes.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

12 Aug, 2015

1 commit

  • Previously, we used a radix tree to index all registered page entries of
    an atomic file, but now we only use the radix tree to see whether the
    current page is indexed or not, since the other user of the radix tree
    is gone in commit 042b7816aaeb ("f2fs: remove unnecessary call to
    invalidate inmemory pages").

    So in this patch, we try a more efficient way: introduce a macro
    ATOMIC_WRITTEN_PAGE, and set it as the page private value to indicate
    the page's indexing status. This way, we can save memory and lookup time.
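
    A sketch of the tagging scheme with the names described above; the
    sentinel's exact value here is an assumption:

        /* sentinel stored in page->private to mark registered atomic pages */
        #define ATOMIC_WRITTEN_PAGE     ((unsigned long)-1)

        #define IS_ATOMIC_WRITTEN_PAGE(page) \
                (page_private(page) == ATOMIC_WRITTEN_PAGE)

        static void tag_inmem_page(struct page *page)
        {
                /* an O(1) flag test replaces the radix-tree lookup */
                if (!IS_ATOMIC_WRITTEN_PAGE(page))
                        set_page_private(page, ATOMIC_WRITTEN_PAGE);
        }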

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

05 Aug, 2015

1 commit


26 Jun, 2015

1 commit

  • Pull cgroup writeback support from Jens Axboe:
    "This is the big pull request for adding cgroup writeback support.

    This code has been in development for a long time, and it has been
    simmering in for-next for a good chunk of this cycle too. This is one
    of those problems that has been talked about for at least half a
    decade, finally there's a solution and code to go with it.

    Also see last week's writeup on LWN:

    http://lwn.net/Articles/648292/"

    * 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
    writeback, blkio: add documentation for cgroup writeback support
    vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
    writeback: do foreign inode detection iff cgroup writeback is enabled
    v9fs: fix error handling in v9fs_session_init()
    bdi: fix wrong error return value in cgwb_create()
    buffer: remove unusued 'ret' variable
    writeback: disassociate inodes from dying bdi_writebacks
    writeback: implement foreign cgroup inode bdi_writeback switching
    writeback: add lockdep annotation to inode_to_wb()
    writeback: use unlocked_inode_to_wb transaction in inode_congested()
    writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
    writeback: implement [locked_]inode_to_wb_and_lock_list()
    writeback: implement foreign cgroup inode detection
    writeback: make writeback_control track the inode being written back
    writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
    mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
    writeback: implement memcg writeback domain based throttling
    writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
    writeback: implement memcg wb_domain
    writeback: update wb_over_bg_thresh() to use wb_domain aware operations
    ...

    Linus Torvalds
     

02 Jun, 2015

2 commits

  • With the planned cgroup writeback support, backing-dev related
    declarations will be more widely used across block and cgroup;
    unfortunately, including backing-dev.h from include/linux/blkdev.h
    makes cyclic include dependency quite likely.

    This patch separates out backing-dev-defs.h which only has the
    essential definitions and updates blkdev.h to include it. c files
    which need access to more backing-dev details now include
    backing-dev.h directly. This takes backing-dev.h off the common
    include dependency chain making it a lot easier to use it across block
    and cgroup.

    v2: fs/fat build failure fixed.

    Signed-off-by: Tejun Heo
    Reviewed-by: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback)
    and the role of the separation is unclear. For cgroup support for
    writeback IOs, a bdi will be updated to host multiple wb's where each
    wb serves writeback IOs of a different cgroup on the bdi. To achieve
    that, a wb should carry all states necessary for servicing writeback
    IOs for a cgroup independently.

    This patch moves bandwidth related fields from backing_dev_info into
    bdi_writeback.

    * The moved fields are: bw_time_stamp, dirtied_stamp, written_stamp,
    write_bandwidth, avg_write_bandwidth, dirty_ratelimit,
    balanced_dirty_ratelimit, completions and dirty_exceeded.

    * writeback_chunk_size() and over_bground_thresh() now take @wb
    instead of @bdi.

    * bdi_writeout_fraction(bdi, ...) -> wb_writeout_fraction(wb, ...)
    bdi_dirty_limit(bdi, ...) -> wb_dirty_limit(wb, ...)
    bdi_position_ratio(bdi, ...) -> wb_position_ratio(wb, ...)
    bdi_update_write_bandwidth(bdi, ...) -> wb_update_write_bandwidth(wb, ...)
    [__]bdi_update_bandwidth(bdi, ...) -> [__]wb_update_bandwidth(wb, ...)
    bdi_{max|min}_pause(bdi, ...) -> wb_{max|min}_pause(wb, ...)
    bdi_dirty_limits(bdi, ...) -> wb_dirty_limits(wb, ...)

    * Init/exits of the relocated fields are moved to bdi_wb_init/exit()
    respectively. Note that explicit zeroing is dropped in the process
    as wb's are cleared in entirety anyway.

    * As there's still only one bdi_writeback per backing_dev_info, all
    uses of bdi->stat[] are mechanically replaced with bdi->wb.stat[]
    introducing no behavior changes.

    v2: Typo in description fixed as suggested by Jan.

    Signed-off-by: Tejun Heo
    Reviewed-by: Jan Kara
    Cc: Jens Axboe
    Cc: Wu Fengguang
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Signed-off-by: Jens Axboe

    Tejun Heo
     

29 May, 2015

1 commit

  • This patch adds a bitmap for discards issued from f2fs_trim_fs.
    The rule therein is to issue discard commands only for blocks
    invalidated after mount.
    Once mount is done, f2fs_trim_fs trims out the whole invalid area.
    After that, it will not issue redundant discards.
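
    For reference, f2fs_trim_fs sits behind the generic FITRIM ioctl; a
    hedged userspace sketch (the range values are placeholders):

        #include <fcntl.h>
        #include <unistd.h>
        #include <sys/ioctl.h>
        #include <linux/fs.h>           /* FITRIM, struct fstrim_range */

        static int trim_whole_fs(const char *mntpoint)
        {
                struct fstrim_range range = { .start = 0, .len = (__u64)-1 };
                int fd = open(mntpoint, O_RDONLY);
                int ret;

                if (fd < 0)
                        return -1;
                /* with the bitmap, repeating this call won't re-discard
                 * areas that are still clean */
                ret = ioctl(fd, FITRIM, &range);
                close(fd);
                return ret;
        }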

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

11 Apr, 2015

1 commit

  • In __set_free we check whether all segments of a section are free when
    freeing one segment, in order to set the section to free status. But the
    searched region of the segmap runs from the start segno to the last
    segno of the main area, which is unnecessary. So let's check only the
    segment bitmap of the target section.
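
    A minimal sketch of the narrowed scan, assuming f2fs-style names
    (free_segmap, where a set bit means "in use", and segs_per_sec); this is
    not the verbatim patch:

        /* scan only the section that owns 'segno' for any in-use segment */
        static bool section_is_free(unsigned long *free_segmap,
                                    unsigned int segno,
                                    unsigned int segs_per_sec)
        {
                unsigned int start = (segno / segs_per_sec) * segs_per_sec;
                unsigned int next = find_next_bit(free_segmap,
                                                  start + segs_per_sec, start);

                /* no set bit inside the section: every segment is free */
                return next >= start + segs_per_sec;
        }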

    Signed-off-by: Wanpeng Li
    Signed-off-by: Jaegeuk Kim

    Wanpeng Li
     

12 Feb, 2015

3 commits

  • rwlock can provide better concurrency when there are many more readers
    than writers, because readers can hold the rwlock simultaneously.

    But now, for the segmap_lock rwlock in struct free_segmap_info, there is
    only one reader, 'mount', from the call path below:
    ->f2fs_fill_super
      ->build_segment_manager
        ->build_dirty_segmap
          ->init_dirty_segmap
            ->find_next_inuse
              read_lock
              ...
              read_unlock

    Since concurrency cannot be improved (there is no other reader of this
    lock), we do not need the rwlock_t type for segmap_lock; let's replace
    it with spinlock_t.
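
    The shape of the change, as a sketch (only the lock type and the
    lock/unlock calls change; the struct is trimmed to the relevant fields):

        struct free_segmap_info {
                spinlock_t segmap_lock;   /* was: rwlock_t segmap_lock */
                unsigned long *free_segmap;
        };

        static void scan_free_segmap(struct free_segmap_info *free_i)
        {
                spin_lock(&free_i->segmap_lock);   /* was: read_lock() */
                /* ... walk free_i->free_segmap ... */
                spin_unlock(&free_i->segmap_lock); /* was: read_unlock() */
        }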

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Instead of using variable length arrays, this patch preallocates the
    memory for them.
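
    The general shape of such a change, as a sketch with made-up names (the
    patch's own arrays differ):

        #include <linux/slab.h>

        /* before: a stack VLA whose size is only known at runtime
         *      struct f2fs_summary sums[nrpages];
         * after: allocate once up front and reuse */
        static struct f2fs_summary *sums;

        static int init_sums(unsigned int max_pages)
        {
                sums = kmalloc_array(max_pages, sizeof(*sums), GFP_KERNEL);
                return sums ? 0 : -ENOMEM;
        }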

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Currently, there are several variables with Boolean type, as below:

    struct f2fs_sb_info {
    ...
    int s_dirty;
    bool need_fsck;
    bool s_closing;
    ...
    bool por_doing;
    ...
    }

    This has some issues:
    1. some space in f2fs_sb_info is wasted due to the alignment the
    compiler inserts after Boolean-type variables.
    2. if we keep adding new flags into f2fs_sb_info, the structure will get
    messy.

    So in this patch, we try to:
    1. switch s_dirty to a Boolean-type variable, since it has only the two
    statuses 0/1.
    2. merge the s_dirty/need_fsck/s_closing/por_doing variables into s_flag.
    3. introduce an enum type which can indicate the different states of sbi.
    4. use the newly introduced universal interfaces
    is_sbi_flag_set/{set,clear}_sbi_flag to operate on flags for sbi.

    After that, the above issues are fixed.
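
    A sketch of the resulting scheme, using the interface names the patch
    describes (the enum members here are inferred from the merged variables):

        enum {
                SBI_IS_DIRTY,   /* was: s_dirty */
                SBI_NEED_FSCK,  /* was: need_fsck */
                SBI_IS_CLOSE,   /* was: s_closing */
                SBI_POR_DOING,  /* was: por_doing */
        };

        static inline bool is_sbi_flag_set(struct f2fs_sb_info *sbi, int type)
        {
                return sbi->s_flag & (1 << type);
        }

        static inline void set_sbi_flag(struct f2fs_sb_info *sbi, int type)
        {
                sbi->s_flag |= (1 << type);
        }

        static inline void clear_sbi_flag(struct f2fs_sb_info *sbi, int type)
        {
                sbi->s_flag &= ~(1 << type);
        }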

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

07 Oct, 2014

1 commit

  • This patch introduces a very limited functionality for atomic write support.
    In order to support atomic write, this patch adds two ioctls:
    o F2FS_IOC_START_ATOMIC_WRITE
    o F2FS_IOC_COMMIT_ATOMIC_WRITE

    The database engine should be aware of the following sequence.
    1. open
    -> ioctl(F2FS_IOC_START_ATOMIC_WRITE);
    2. writes
    : all the written data will be treated as atomic pages.
    3. commit
    -> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
    : this flushes all the data blocks to the disk, which will be shown as
    all or nothing by the f2fs recovery procedure.
    4. repeat from #2.

    The IO patterns should be:

       ,- START_ATOMIC_WRITE           ,- COMMIT_ATOMIC_WRITE
    CP | D D D D D D | FSYNC | D D D D | FSYNC ...
                     `- COMMIT_ATOMIC_WRITE
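
    A hedged userspace sketch of the sequence (ioctl definitions as added by
    this patch to the f2fs headers; file name and buffer are placeholders):

        #include <fcntl.h>
        #include <unistd.h>
        #include <sys/ioctl.h>
        #include <linux/ioctl.h>

        #define F2FS_IOCTL_MAGIC              0xf5
        #define F2FS_IOC_START_ATOMIC_WRITE   _IO(F2FS_IOCTL_MAGIC, 1)
        #define F2FS_IOC_COMMIT_ATOMIC_WRITE  _IO(F2FS_IOCTL_MAGIC, 2)

        int main(void)
        {
                char buf[4096] = { 0 };
                int fd = open("db.file", O_RDWR);          /* 1. open   */

                ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE);    /* start     */
                write(fd, buf, sizeof(buf));               /* 2. writes */
                ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE);   /* 3. commit */
                close(fd);
                return 0;
        }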

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

01 Oct, 2014

2 commits

  • My static checker complains that segment is a u64 but only the lower 31
    bits can be used before we hit a shift wrapping bug.
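
    The generic shape of such a fix, as an illustrative snippet rather than
    the patch itself: widen the operand before shifting so the intermediate
    value cannot wrap in 32-bit arithmetic.

        #include <stdint.h>

        uint64_t seg_start_block(uint32_t segno, unsigned int log_blocks_per_seg)
        {
                /* buggy: segno << log_blocks_per_seg is evaluated in 32 bits
                 * and wraps once segno needs more than (32 - shift) bits */
                /* return segno << log_blocks_per_seg; */

                /* fixed: promote to 64 bits before the shift */
                return (uint64_t)segno << log_blocks_per_seg;
        }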

    Signed-off-by: Dan Carpenter
    Signed-off-by: Jaegeuk Kim

    Dan Carpenter
     
  • This patch cleans up the existing and new macros for readability.

    Rule is like this.

                 ,------------------------------------------> MAX_BLKADDR -,
                 |  ,-------------- TOTAL_BLKS ---------------------------,
                 |  |                                                     |
                 |  ,- seg0_blkaddr   ,----- sit/nat/ssa/main blkaddress  |
    block        |  | (SEG0_BLKADDR)  | | | | (e.g., MAIN_BLKADDR)        |
    address      0..x................ a b c d .............................
                    |                       |
    global seg#     0...................... m .............................
                    |                       |                             |
                    |                       `------- MAIN_SEGS -----------'
                    `-------------- TOTAL_SEGS ---------------------------'
                                            |
    seg#                                    0..........xx..................

    = Note =
    o GET_SEGNO_FROM_SEG0 : blk address -> global segno
    o GET_SEGNO : blk address -> segno
    o START_BLOCK : segno -> starting block address
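
    A hedged sketch of what those conversions look like, simplified from the
    in-tree macros (which route the main-area translation through free_i's
    start segno; FREE_I_START_SEGNO stands in for that here):

        /* blk address -> global segno (relative to seg0_blkaddr) */
        #define GET_SEGNO_FROM_SEG0(sbi, blk_addr)                      \
                (((blk_addr) - SEG0_BLKADDR(sbi)) >>                    \
                 (sbi)->log_blocks_per_seg)

        /* blk address -> segno (relative to the main area) */
        #define GET_SEGNO(sbi, blk_addr)                                \
                (GET_SEGNO_FROM_SEG0(sbi, blk_addr) -                   \
                 FREE_I_START_SEGNO(sbi))

        /* segno -> starting block address */
        #define START_BLOCK(sbi, segno)                                 \
                (SEG0_BLKADDR(sbi) +                                    \
                 (((block_t)(segno) + FREE_I_START_SEGNO(sbi)) <<       \
                  (sbi)->log_blocks_per_seg))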

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

24 Sep, 2014

5 commits

  • Previously, f2fs activated SSR once the # of free segments reached the #
    of overprovisioned segments.
    In this case, SSR starts to use dirty segments only, so the
    overprovisioned space cannot be selected for new data.
    This means that we had no chance to utilize the overprovisioned space at
    all.

    This patch fixes that by allowing LFS allocations until the # of free
    segments reaches the last threshold, the reserved space.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch changes the ipu_policy setting to use any combination of orthogonal policies.
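
    A sketch of the bitmask scheme this enables, assuming the policy names
    from f2fs's in-place-update code (the exact bit layout is illustrative):

        /* each policy is one bit, so ipu_policy can combine several */
        enum {
                F2FS_IPU_FORCE,    /* always update in place */
                F2FS_IPU_SSR,      /* when SSR mode is activated */
                F2FS_IPU_UTIL,     /* when utilization crosses a threshold */
                F2FS_IPU_SSR_UTIL, /* SSR mode plus the threshold */
                F2FS_IPU_FSYNC,    /* on fsync with few dirty pages */
        };

        static bool ipu_policy_enabled(unsigned int ipu_policy, int policy)
        {
                return ipu_policy & (1 << policy);
        }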

    Signed-off-by: Changman Lee
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • The block size in f2fs is 4096 bytes, so theoretically f2fs can support
    a 4096-byte sector device at maximum. But f2fs previously supported only
    512-byte sectors, so a block device such as zRAM, which uses the page
    cache as its block storage space, could not be mounted, because of the
    mismatch between zRAM's sector size and the sector size f2fs supported.

    With this patch f2fs supports large sector sizes, so block devices with
    sector sizes of 512/1024/2048/4096 bytes can be supported.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch cleans up a simple macro.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Previously, all the dnode pages had to be read during roll-forward
    recovery.
    Even worse, the whole chain was traversed twice.
    This patch removes those redundant and costly read operations by using
    the page cache of meta_inode and the readahead function as well.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

16 Sep, 2014

1 commit

  • If the user writes F2FS_IPU_FSYNC:4 to /sys/fs/f2fs/ipu_policy,
    f2fs_sync_file only starts to try in-place updates.
    And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks,
    it keeps the out-of-place manner; otherwise, it triggers in-place updates.

    This may be used by storage showing very high random write performance.

    For example, it can be used when,

    Seq. writes (Data) + wait + Seq. writes (Node)

    is pretty much slower than,

    Rand. writes (Data)

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

10 Sep, 2014

3 commits

  • In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for
    reducing NAT writes"), we described the issue as below:

    "Although building the NAT journal in cursum reduces the read/write work
    for the NAT block, the previous design leaves us with lower performance
    when writing checkpoints frequently, in these cases:
    1. if the journal in cursum is already full, it's a bit of a waste that
    we flush all nat entries to the page for persistence, but do not cache
    any entries.
    2. if the journal in cursum is not full, we fill nat entries into the
    journal until the journal is full, then flush the remaining dirty
    entries to disk without merging the journaled entries, so these
    journaled entries may be flushed to disk at the next checkpoint, having
    lost the chance to be flushed last time."

    Actually, we have the same problem in using the SIT journal area.

    In this patch, we firstly update the sit journal with as many dirty
    entries as possible. Secondly, if there is no space in the sit journal,
    we remove all entries from the journal and walk through the whole dirty
    entry bitmap of SIT, accounting the dirty sit entries located in the
    same SIT block to one sit entry set. All entry sets are linked into the
    list sit_entry_set in sm_info, sorted in ascending order by the count of
    entries in each set. Later we flush the sets with the fewest entries
    into the journal, as many as we can, and then flush the dense sets with
    merged entries to disk.

    In this way we can use the sit journal area more effectively; we also
    reduce SIT updates, resulting in a performance gain and a longer
    lifetime for the flash device.
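
    A sketch of the set bookkeeping described above, assuming the names from
    this description (sit_entry_set, the list in sm_info); simplified from
    the in-tree code:

        /* dirty sit entries of one SIT block, grouped for batched flushing */
        struct sit_entry_set {
                struct list_head set_list; /* on sm_info's sit_entry_set list */
                unsigned int start_segno;  /* first segno of the SIT block */
                unsigned int entry_cnt;    /* # of dirty entries in the block */
        };

        /* keep the list sorted ascending by entry_cnt: sparse sets are
         * flushed into the journal first, dense sets go to disk merged */
        static void adjust_sit_entry_set(struct sit_entry_set *ses,
                                         struct list_head *head)
        {
                struct sit_entry_set *next = ses;

                if (list_is_last(&ses->set_list, head))
                        return;

                list_for_each_entry_continue(next, head, set_list)
                        if (ses->entry_cnt <= next->entry_cnt)
                                break;

                list_move_tail(&ses->set_list, &next->set_list);
        }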

    In my testing environment, this patch helps reduce SIT block updates
    noticeably.

    virtual machine + hard disk:
    fsstress -p 20 -n 400 -l 5

                sit page num   cp count   sit pages/cp
    based       2006.50        1349.75    1.486
    patched     1566.25        1463.25    1.070

    The latency of the merging op is small when handling a great number of
    dirty SIT entries in flush_sit_entries:

    latency(ns)   dirty sit count
    36038         2151
    49168         2123
    37174         2232

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • sit_i in the macros SIT_BLOCK_OFFSET/START_SEGNO is not used; remove it.
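
    The shape of the cleanup, as a sketch (SIT_ENTRY_PER_BLOCK is the
    existing constant; the old form is shown commented out):

        /* before: the sit_i argument was accepted but never used */
        /* #define SIT_BLOCK_OFFSET(sit_i, segno) \
                ((segno) / SIT_ENTRY_PER_BLOCK) */

        /* after: the dead parameter is dropped here and at every call site */
        #define SIT_BLOCK_OFFSET(segno) ((segno) / SIT_ENTRY_PER_BLOCK)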

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch replaces BUG cases with f2fs_bug_on, so that the
    inconsistency information is retained for fsck.f2fs. And it implements
    some void functions to initiate fsck.f2fs too.
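
    A sketch of the idea; this mirrors how f2fs_bug_on is described here,
    with the need_fsck flag standing in for the exact bookkeeping:

        /* instead of crashing the kernel, warn once and record that
         * fsck.f2fs should be run on this filesystem */
        #define f2fs_bug_on(sbi, condition)                             \
                do {                                                    \
                        if (unlikely(condition)) {                      \
                                WARN_ON(1);                             \
                                sbi->need_fsck = true;                  \
                        }                                               \
                } while (0)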

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim