30 Apr, 2013

2 commits


29 Apr, 2013

3 commits

  • We call lock_page when we need to update a page after readpage.
    Between grabbing the page and locking it, the page can be truncated by another thread.
    So, after lock_page, we should check whether the page was truncated.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • In order to avoid build_free_nid lock contention, let's change the order of
    function calls as follows.

    At first, check whether there is enough free nids.
    - If available, just get a free nid with spin_lock without any overhead.
    - Otherwise, conduct build_free_nids.
    : scan nat pages, journal nat entries, and nat cache entries.

    We should be careful not to serve free nids while build_free_nids is still
    producing them.
    We can get stable free nids only after build_free_nids is done.
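
    The ordering described above can be sketched in user space as follows. This
    is a minimal, single-threaded illustration and not the f2fs code: struct
    nid_pool, MAX_FREE_NIDS, and alloc_nid's signature are hypothetical, the
    spin_lock is omitted, and the slow path hands out consecutive numbers
    instead of scanning NAT pages, journal entries, and the NAT cache.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FREE_NIDS 8  /* hypothetical capacity of the free nid list */

/* Hypothetical stand-in for the node manager state (no locking shown). */
struct nid_pool {
    unsigned int nids[MAX_FREE_NIDS];
    size_t count;            /* stable free nids currently available */
    unsigned int next_scan;  /* where the next build pass resumes */
    unsigned int builds;     /* how many times the slow path ran */
};

/* Slow path: refill the list in one complete pass. */
static void build_free_nids(struct nid_pool *p)
{
    while (p->count < MAX_FREE_NIDS)
        p->nids[p->count++] = p->next_scan++;
    p->builds++;
}

/* Fast path first; fall back to a full build only when the list is empty. */
static bool alloc_nid(struct nid_pool *p, unsigned int *nid)
{
    if (p->count == 0)
        build_free_nids(p);  /* nids are served only after the build is done */
    if (p->count == 0)
        return false;
    *nid = p->nids[--p->count];
    return true;
}
```

    In this sketch the slow path runs once per MAX_FREE_NIDS allocations, and
    no caller sees a nid from a build that has not finished.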

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This can help when debugging the free nid allocation flows.

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

26 Apr, 2013

4 commits

  • It is more obvious that add_free_nid checks whether the free nid is zero or not.

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Adding REQ_META for all the metadata requests can help in improving the
    FS performance, if the underlying device supports TAGGING.
    So, in the submit_bio path for all the f2fs requests, we can
    add REQ_META for all the META requests.
    As a precursor to this change we considered the commit
    4265900e0be653f5b78baf2816857ef57cf1332f 'mmc: MMC-4.5 Data Tag Support'

    Signed-off-by: Namjae Jeon
    Signed-off-by: Amit Sahrawat
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     
  • Previously, background GC submits many 4KB read requests to load victim blocks
    and/or its (i)node blocks.

    ...
    f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
    f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
    f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
    f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
    f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
    f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
    ...

    However, since many of the IOs are sequential, we can give the IO scheduler
    a chance to merge them.
    In order to do that, let's use blk_plug.

    ...
    f2fs_gc : f2fs_iget: ino = 143
    f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
    f2fs_gc : f2fs_iget: ino = 143
    f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
    : block_rq_complete: 8,16 R () 1519616 + 8 [0]
    : block_rq_complete: 8,16 R () 1519848 + 8 [0]
    : block_rq_complete: 8,16 R () 1520432 + 96 [0]
    : block_rq_complete: 8,16 R () 1520536 + 104 [0]
    : block_rq_complete: 8,16 R () 1521008 + 112 [0]
    : block_rq_complete: 8,16 R () 1521440 + 152 [0]
    : block_rq_complete: 8,16 R () 1521688 + 144 [0]
    : block_rq_complete: 8,16 R () 1522128 + 192 [0]
    : block_rq_complete: 8,16 R () 1523256 + 328 [0]
    ...

    Note that this issue should be addressed in checkpoint, and some readahead
    flows too.
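
    As a toy model of the merging that plugging makes possible, the sketch
    below coalesces back-to-back read requests. It does not show how blk_plug
    works internally; struct io_req and merge_sequential are hypothetical
    names, and the input is assumed to be sorted by sector.

```c
#include <stddef.h>

/* A read request: starting sector and length in sectors. */
struct io_req {
    unsigned long long sector;
    unsigned int nr_sectors;
};

/*
 * Merge back-to-back requests in place. Assumes reqs[] is sorted by sector.
 * Returns the number of requests left after merging.
 */
static size_t merge_sequential(struct io_req *reqs, size_t n)
{
    size_t out = 0;

    if (n == 0)
        return 0;
    for (size_t i = 1; i < n; i++) {
        if (reqs[out].sector + reqs[out].nr_sectors == reqs[i].sector)
            reqs[out].nr_sectors += reqs[i].nr_sectors;  /* contiguous: extend */
        else
            reqs[++out] = reqs[i];                       /* gap: start a new one */
    }
    return out + 1;
}
```

    Applied to the three 8-sector reads in the first trace (499854968,
    499854976, 499854984), this yields a single 24-sector request.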

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • If no victim segments are selected by background GC, let's wait
    a little longer to collect dirty segments.
    By default, let's give 5 minutes.

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

23 Apr, 2013

8 commits

  • Add tracepoints to debug checkpoint request.

    Signed-off-by: Namjae Jeon
    Signed-off-by: Pankaj Kumar
    Acked-by: Steven Rostedt
    [Jaegeuk: change expressions]
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     
  • Add tracepoints to debug the various page write operation
    like data pages, meta pages.

    Signed-off-by: Namjae Jeon
    Signed-off-by: Pankaj Kumar
    Acked-by: Steven Rostedt
    [Jaegeuk: remove unnecessary tracepoints]
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     
  • Add tracepoints to debug the block allocation & fallocate.

    Signed-off-by: Namjae Jeon
    Signed-off-by: Pankaj Kumar
    Acked-by: Steven Rostedt
    [Jaegeuk: enhance information]
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     
  • Add tracepoints for tracing the garbage collector
    threads in f2fs with status of collection & type.

    Signed-off-by: Namjae Jeon
    Signed-off-by: Pankaj Kumar
    Acked-by: Steven Rostedt
    [Jaegeuk: modify slightly to show information]
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     
  • Add tracepoints for page i/o operations and block allocation
    tracing during page read operation.

    Signed-off-by: Namjae Jeon
    Signed-off-by: Pankaj Kumar
    Acked-by: Steven Rostedt
    [Jaegeuk: combine and modify the tracepoint structures]
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     
  • Add tracepoints for tracing the truncate operations
    like truncate node/data blocks, f2fs_truncate etc.

    Tracepoints are added at entry and exit of operation
    to trace the success & failure of operation.

    Signed-off-by: Namjae Jeon
    Signed-off-by: Pankaj Kumar
    Acked-by: Steven Rostedt
    [Jaegeuk: combine and modify the tracepoint structures]
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     
  • Add tracepoints in f2fs for tracing the syncing
    operations like filesystem sync, file sync enter/exit.
    It will help to trace the code under debugging scenarios.

    Also add tracepoints for tracing the various inode operations
    like building inode, eviction of inode, link/unlink of
    inodes.

    Signed-off-by: Namjae Jeon
    Signed-off-by: Pankaj Kumar
    Acked-by: Steven Rostedt
    [Jaegeuk: combine and modify the tracepoint structures]
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     
  • The conditions inside the function is_multimedia_file are the reverse
    of its name, i.e., we need to negate the return value to actually
    check if the file is a multimedia file. So, change the code and its
    callers so that the name and the comparison conditions agree.
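
    A minimal sketch of a predicate whose return value matches its name is
    shown below. It is not the kernel implementation: the signature and the
    case-insensitive suffix comparison are assumptions made for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Return true when name ends in ".<ext>", compared case-insensitively. */
static bool is_multimedia_file(const char *name, const char *ext)
{
    size_t nlen = strlen(name);
    size_t elen = strlen(ext);

    /* Need at least one character, then '.', then the extension. */
    if (elen == 0 || nlen < elen + 2 || name[nlen - elen - 1] != '.')
        return false;
    for (size_t i = 0; i < elen; i++) {
        unsigned char a = (unsigned char)name[nlen - elen + i];
        unsigned char b = (unsigned char)ext[i];
        if (tolower(a) != tolower(b))
            return false;
    }
    return true;
}
```

    Callers can then write if (is_multimedia_file(name, ext)) without negating
    the result.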

    Signed-off-by: Namjae Jeon
    Signed-off-by: Amit Sahrawat
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     

22 Apr, 2013

1 commit

  • Fix to return a negative error code from the error handling
    case instead of 0, as returned elsewhere in this function.
    Introduced by commit c0d39e(f2fs: fix return values from validate superblock)

    Signed-off-by: Wei Yongjun
    Acked-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Wei Yongjun
     

09 Apr, 2013

5 commits

  • Fix typo mistakes.
    1. I think that it should be 'L' instead of 'V'.
    2. and try to fix 'Front' instead of 'Frone'

    Signed-off-by: Namjae Jeon
    Signed-off-by: Amit Sahrawat
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     
  • In order to be aware of prefree and free sections during FG_GC, let's start with
    write_checkpoint().

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • If (ofs % (NIDS_PER_BLOCK + 1) == 0), the node is an indirect node block.
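
    The modulo test can be checked by hand with a small sketch. It assumes
    that ofs is a relative offset into a run of groups, each made of one
    indirect node block followed by NIDS_PER_BLOCK direct node blocks. The
    value 3 is a hypothetical stand-in for the real constant, and
    is_indirect_node is a hypothetical helper name.

```c
#include <stdbool.h>

#define NIDS_PER_BLOCK 3  /* tiny hypothetical value, for illustration only */

/*
 * Assumed layout, by relative offset:
 *   0: indirect, 1..3: direct, 4: indirect, 5..7: direct, 8: indirect, ...
 */
static bool is_indirect_node(unsigned int ofs)
{
    return ofs % (NIDS_PER_BLOCK + 1) == 0;
}
```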

    Signed-off-by: Zhihui Zhang
    Signed-off-by: Jaegeuk Kim

    Zhihui Zhang
     
  • In the previous version, f2fs uses global locks according to the usage types,
    such as directory operations, block allocation, block write, and so on.

    Reference the following lock types in f2fs.h.
    enum lock_type {
    RENAME, /* for renaming operations */
    DENTRY_OPS, /* for directory operations */
    DATA_WRITE, /* for data write */
    DATA_NEW, /* for data allocation */
    DATA_TRUNC, /* for data truncate */
    NODE_NEW, /* for node allocation */
    NODE_TRUNC, /* for node truncate */
    NODE_WRITE, /* for node write */
    NR_LOCK_TYPE,
    };

    In that case, we lose performance in a multi-threaded environment,
    since every type of operation must be conducted one at a time.

    In order to address the problem, let's share the locks globally with a mutex
    array regardless of operation type.
    So, let users grab a mutex and perform their jobs in parallel as much as
    possible.

    For this, I propose a new global lock scheme as follows.

    0. Data structure
    - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
    - f2fs_sb_info -> node_write

    1. mutex_lock_op(sbi)
    - try to get an available lock from the array.
    - returns the index of the acquired lock.

    2. mutex_unlock_op(sbi, index of the lock)
    - unlock the given index of the lock.

    3. mutex_lock_all(sbi)
    - grab all the locks in the array before the checkpoint.

    4. mutex_unlock_all(sbi)
    - release all the locks in the array after checkpoint.

    5. block_operations()
    - call mutex_lock_all()
    - sync_dirty_dir_inodes()
    - grab node_write
    - sync_node_pages()

    Note that,
    the pairs of mutex_lock_op()/mutex_unlock_op() and
    mutex_lock_all()/mutex_unlock_all() should be used together.
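
    The four lock helpers can be sketched in user space as shown below. This
    is an illustration under stated assumptions, not the kernel code: C11
    atomic_flag spinlocks stand in for kernel mutexes, struct sb_info stands
    in for f2fs_sb_info, NR_GLOBAL_LOCKS is set to a hypothetical 8, the
    round-robin fallback used when every lock is busy is an assumption, and
    node_write is left out.

```c
#include <stdatomic.h>

#define NR_GLOBAL_LOCKS 8  /* hypothetical array size */

struct sb_info {
    atomic_flag mutex_lock[NR_GLOBAL_LOCKS];
    atomic_uint next_lock_num;  /* assumed round-robin slot for waiters */
};

static void sb_init(struct sb_info *sbi)
{
    for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
        atomic_flag_clear(&sbi->mutex_lock[i]);
    atomic_init(&sbi->next_lock_num, 0);
}

/* Take any free lock; if all are busy, wait on one slot. Returns its index. */
static int mutex_lock_op(struct sb_info *sbi)
{
    for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
        if (!atomic_flag_test_and_set(&sbi->mutex_lock[i]))
            return i;

    int idx = (int)(atomic_fetch_add(&sbi->next_lock_num, 1) % NR_GLOBAL_LOCKS);
    while (atomic_flag_test_and_set(&sbi->mutex_lock[idx]))
        ;  /* spin; a kernel mutex would sleep instead */
    return idx;
}

static void mutex_unlock_op(struct sb_info *sbi, int idx)
{
    atomic_flag_clear(&sbi->mutex_lock[idx]);
}

/* Checkpoint side: take every lock, always in ascending index order. */
static void mutex_lock_all(struct sb_info *sbi)
{
    for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
        while (atomic_flag_test_and_set(&sbi->mutex_lock[i]))
            ;
}

static void mutex_unlock_all(struct sb_info *sbi)
{
    for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
        atomic_flag_clear(&sbi->mutex_lock[i]);
}
```

    An operation calls idx = mutex_lock_op(sbi), does its work, and then calls
    mutex_unlock_op(sbi, idx). Since an operation holds only one slot at a
    time and mutex_lock_all() walks the array in a fixed order, the two paths
    in this sketch cannot wait on each other in a cycle.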

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Move the f2fs_balance_fs out of the truncate_hole function and only
    perform that in punch_hole use case. The commit:

    ed60b1644e7f7e5dd67d21caf7e4425dff05dad0

    intended to do this but moved it into truncate_hole to cover more
    cases. However, a deadlock scenario is possible when deleting an inode
    entry under specific conditions:

    f2fs_delete_entry()
    mutex_lock_op(sbi, DENTRY_OPS);
    truncate_hole()
    f2fs_balance_fs()
    mutex_lock(&sbi->gc_mutex);
    f2fs_gc()
    write_checkpoint()
    block_operations()
    mutex_lock_op(sbi, DENTRY_OPS);

    Let's move it into the punch_hole case to cover the original intent of
    avoiding it during fallocate's expand_inode_data case.

    Change-Id: I29f8ea1056b0b88b70ba8652d901b6e8431bb27e
    Signed-off-by: Jason Hrycay
    Signed-off-by: Jaegeuk Kim

    Jason Hrycay
     

03 Apr, 2013

11 commits

  • This patch reduces redundant spin_lock operations in alloc_nid_failed().
    The alloc_nid_failed() does not need to delete entry and add one again
    by triggering spin_lock and spin_unlock redundantly.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Commit - fa9150a84c - replaces a call to generic_writepages() in
    f2fs_write_data_pages() with write_cache_pages(), with a function pointer
    argument pointing to routine: __f2fs_writepage.

    -> https://git.kernel.org/linus/fa9150a84ca333f68127097c4fa1eda4b3913a22

    This patch adds a NULL pointer check in f2fs_write_data_pages() to avoid
    a possible NULL pointer dereference in case mapping->a_ops->writepage
    is NULL.

    Signed-off-by: P J P
    Signed-off-by: Jaegeuk Kim

    P J P
     
  • Like below, there are 8 segment bitmaps for SSR victim candidates.

    enum dirty_type {
    DIRTY_HOT_DATA, /* dirty segments assigned as hot data logs */
    DIRTY_WARM_DATA, /* dirty segments assigned as warm data logs */
    DIRTY_COLD_DATA, /* dirty segments assigned as cold data logs */
    DIRTY_HOT_NODE, /* dirty segments assigned as hot node logs */
    DIRTY_WARM_NODE, /* dirty segments assigned as warm node logs */
    DIRTY_COLD_NODE, /* dirty segments assigned as cold node logs */
    DIRTY, /* to count # of dirty segments */
    PRE, /* to count # of entirely obsolete segments */
    NR_DIRTY_TYPE
    };

    The upper 6 bitmaps indicate the segments dirtied by each active log area.
    And, the DIRTY bitmap integrates all the 6 bitmaps.

    For example,
    o DIRTY_HOT_DATA : 1010000
    o DIRTY_WARM_DATA: 0100000
    o DIRTY_COLD_DATA: 0001000
    o DIRTY_HOT_NODE : 0000010
    o DIRTY_WARM_NODE: 0000001
    o DIRTY_COLD_NODE: 0000000
    In this case,
    o DIRTY : 1111011,

    which means that we should strictly guarantee the consistency between DIRTY
    and the other bitmaps.

    However, the SSR mode selects victims freely from any log types, which can set
    multiple bits across the various bitmap types.

    So, this patch eliminates this inconsistency.
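
    The relation between DIRTY and the six per-log bitmaps in the example
    above can be sketched as follows. Each 7-digit pattern is treated as a
    plain binary number, the helper names are hypothetical, and the overlap
    check reflects one reading of "multiple bits across the various bitmap
    types": the same segment marked in more than one per-log bitmap.

```c
#include <stdbool.h>

enum { NR_LOG_TYPES = 6 };  /* hot/warm/cold data and hot/warm/cold node */

/* DIRTY is expected to equal the OR of the six per-log bitmaps. */
static unsigned int union_of_logs(const unsigned int maps[NR_LOG_TYPES])
{
    unsigned int all = 0;

    for (int i = 0; i < NR_LOG_TYPES; i++)
        all |= maps[i];
    return all;
}

/* True when some segment bit is set in more than one per-log bitmap. */
static bool logs_overlap(const unsigned int maps[NR_LOG_TYPES])
{
    unsigned int seen = 0;

    for (int i = 0; i < NR_LOG_TYPES; i++) {
        if (seen & maps[i])
            return true;
        seen |= maps[i];
    }
    return false;
}
```

    With the example values (0x50, 0x20, 0x08, 0x02, 0x01, 0x00) the union is
    0x7B, i.e. 1111011, and no two bitmaps overlap. If one segment is marked
    in two per-log bitmaps, the union is unchanged while logs_overlap()
    reports it, so comparing against DIRTY alone would not catch that case.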

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • In order to do GC more reliably, I'd like to lock the victim summary page
    until its GC is completed, and also prevent any checkpoint process.

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch adds a new condition that allocates free segments in the current
    active section even if SSR is needed.
    Otherwise, f2fs cannot allocate the remaining free segments in the section,
    since SSR finds dirty segments only.

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Foreground GCs are triggered when there are not enough free sections.
    So, we should not skip moving valid blocks in the victim segments.

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch removes a bitmap for victim segments selected by foreground GC, and
    modifies the other bitmap for victim segments selected by background GC.

    1) foreground GC bitmap
    : We don't need to manage this, since we need only the previous victim section
    number instead of the whole victim history.
    f2fs uses the victim section number in order not to allocate the section
    currently being GC'ed to the current active logs.

    2) background GC bitmap
    : This bitmap is used to avoid selecting victims repeatedly by background GCs.
    In addition, the victims are able to be selected by foreground GCs, since
    there is no need to read victim blocks during foreground GCs.

    Since the foreground GC reclaims segments in units of a section, it'd
    be better to manage this bitmap at section granularity.

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • When allocating a new segment under the LFS mode, we should keep the section
    boundary.

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • In get_node_page, we do not need to call lock_page all the time.

    If the node page is cached as uptodate,

    1. grab_cache_page locks the page,
    2. read_node_page unlocks the page, and
    3. lock_page is called for further process.

    Let's avoid this.

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Let's use a macro to get the total number of sections.

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • A macro should not use duplicate parameter names.

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

31 Mar, 2013

1 commit


27 Mar, 2013

4 commits

  • When we recover fsync'ed data after power-off-recovery, we should guarantee
    that the parent inode number is correct for each direct inode block.

    So, let's make the following rules.

    - fsync should do a checkpoint for any inode that has had hard
    links.

    - So, only normal files can be recovered by roll-forward.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • In the checkpoint flow, the f2fs investigates the total nat cache entries.
    Previously, if an entry has NULL_ADDR, f2fs drops the entry and adds the
    obsolete nid to the free nid list.
    However, this free nid will be reused soon, resulting in a miss on its nat entry.
    In order to avoid this, we don't need to drop the nat cache entry at this moment.

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch removes data_version check flow during the fsync call.
    The original purpose of data_version was to avoid writing inode
    pages redundantly when fsync is called repeatedly.
    However, when a user modifies file metadata and then calls fsync, we should
    not skip the fsync procedure.
    So, let's remove this condition check and hope that users trigger fsync in
    the right manner.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • We should handle errors during the recovery flow correctly.
    For example, if we get -ENOMEM, we should report a mount failure instead of
    conducting the remaining mount procedure.
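
    The intended behaviour can be sketched as below. This is a user-space
    illustration only: recover_data_stub and fill_super_sketch are
    hypothetical names, and the simulate_oom flag exists only to exercise the
    failure path.

```c
#include <errno.h>

/* Hypothetical recovery step; fails with -ENOMEM when simulate_oom is set. */
static int recover_data_stub(int simulate_oom)
{
    return simulate_oom ? -ENOMEM : 0;
}

/* Mount path: stop and report the error instead of continuing the mount. */
static int fill_super_sketch(int simulate_oom, int *mounted)
{
    int err = recover_data_stub(simulate_oom);

    if (err) {
        *mounted = 0;
        return err;  /* propagate the negative error code to the caller */
    }
    /* ... the remaining mount procedure would run only on success ... */
    *mounted = 1;
    return 0;
}
```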

    Reviewed-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

20 Mar, 2013

1 commit