27 Apr, 2016

1 commit


10 Mar, 2016

1 commit


24 Sep, 2015

1 commit


03 May, 2015

1 commit

  • Currently it is possible to lose a whole file system block's worth of
    data when we hit a specific interaction between unwritten and delayed
    extents in the extent status tree.

    The problem is that once we insert a delayed extent into the extent
    status tree, the only way to get rid of it is to write out the delayed
    buffer. However, there is a limitation in the extent status tree
    implementation: when inserting an unwritten extent, should there be even
    a single delayed block, the whole unwritten extent is marked as delayed.

    At this point, there is no way to get rid of the delayed extents,
    because there are no delayed buffers to write out. So when we write
    into said unwritten extent we will convert it to written, but it still
    remains delayed.

    When we try to write into that block later, ext4_da_map_blocks() will
    set the buffer new and delayed and map it to an invalid block, which
    causes the rest of the block to be zeroed, losing already written data.

    For now we can fix this by simply not allowing the delayed status to be
    set on a written extent in the extent status tree. Also add WARN_ON() to
    make sure that we notice if this happens in the future.

    This problem can be easily reproduced by running the following xfs_io
    commands:

    xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
    -c "falloc 0 131072" \
    -c "pwrite -S 0xbb 65536 2048" \
    -c "fsync" /mnt/test/fff

    echo 3 > /proc/sys/vm/drop_caches
    xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff

    In theory this can also be reproduced at random by running fsx, but
    that is not very reliable. On machines with a bigger page size (like
    ppc) it can be seen more often (especially with xfstest generic/127).

    Signed-off-by: Lukas Czerner
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Lukas Czerner
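
The rule described in the message above (never set the delayed status on a written extent, and warn if the combination ever shows up) can be sketched in userspace C. This is only an illustration under stated assumptions: the flag values and the helper names `es_status_for_insert()` and `es_status_is_suspicious()` are hypothetical and are not taken from the kernel source.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status bits for this sketch; not the kernel's definitions. */
enum {
    ES_WRITTEN   = 1u << 0,
    ES_UNWRITTEN = 1u << 1,
    ES_DELAYED   = 1u << 2,
    ES_HOLE      = 1u << 3,
};

/*
 * Decide the status to record for a newly mapped range.
 * has_delayed_block says whether any block in the range is still
 * marked delayed in the tree.
 */
static unsigned int es_status_for_insert(unsigned int status,
                                         int has_delayed_block)
{
    /* Only carry the delayed bit over to extents that are not written. */
    if (has_delayed_block && !(status & ES_WRITTEN))
        status |= ES_DELAYED;
    return status;
}

/* Returns 1 for the combination the commit wants to be warned about. */
static int es_status_is_suspicious(unsigned int status)
{
    int bad = (status & ES_WRITTEN) && (status & ES_DELAYED);

    if (bad)
        fprintf(stderr, "warning: written extent marked delayed\n");
    return bad;
}
```

In this sketch a written extent keeps its status unchanged even when a delayed block overlaps it, while an unwritten extent still picks up the delayed bit.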
     

03 Apr, 2015

1 commit


26 Nov, 2014

5 commits

  • Introduce simple aging to the extent status tree. Each extent has a
    REFERENCED bit which gets set when the extent is used. The shrinker
    then skips entries with the REFERENCED bit set and clears the bit. Thus
    frequently used extents have a higher chance of staying in memory.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
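
The aging scheme in the message above resembles a second-chance policy. A minimal userspace sketch follows, assuming a flat array instead of a tree; `struct entry`, `entry_touch()` and `shrink_pass()` are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached entry; 'referenced' models the REFERENCED bit. */
struct entry {
    int referenced;  /* set on use, cleared by the shrinker */
    int present;     /* 1 while the entry is still cached */
};

/* Mark an entry as recently used. */
static void entry_touch(struct entry *e)
{
    e->referenced = 1;
}

/*
 * One shrinker pass: entries with the bit set get a second chance
 * (bit cleared, entry kept); entries without it are reclaimed.
 * Returns the number of entries reclaimed.
 */
static size_t shrink_pass(struct entry *entries, size_t n)
{
    size_t reclaimed = 0;

    for (size_t i = 0; i < n; i++) {
        struct entry *e = &entries[i];

        if (!e->present)
            continue;
        if (e->referenced) {
            e->referenced = 0;
            continue;
        }
        e->present = 0;
        reclaimed++;
    }
    return reclaimed;
}
```

An entry that is touched between passes survives the next pass; one that is not touched again is reclaimed on the pass after that.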
     
  • Currently flags for the extent status tree are defined twice, once
    shifted and once without being shifted. Consolidate these definitions
    into one place and make some computations automatic to make adding
    flags less error prone. The compiler should be clever enough to figure
    out these are constants and generate the same code.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
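
One way to define each flag once and derive the shifted form automatically is to list bit indices in an enum and compute everything else from them. The sketch below assumes four status flags stored in the top bits of a 64-bit value; the macro names are illustrative and should not be read as the exact definitions in the source.

```c
#include <assert.h>
#include <stdint.h>

/* Bit indices, listed once; ES_FLAGS ends up as the number of flags. */
enum {
    ES_WRITTEN_B,
    ES_UNWRITTEN_B,
    ES_DELAYED_B,
    ES_HOLE_B,
    ES_FLAGS
};

/* Unshifted flag value derived from its index. */
#define ES_FLAG(bit)    (1u << (bit))

/* The flags occupy the top ES_FLAGS bits of a 64-bit block number. */
#define ES_SHIFT        (64 - ES_FLAGS)
#define ES_MASK         (~(uint64_t)0 << ES_SHIFT)

/* Shifted form computed from the unshifted one rather than retyped. */
#define ES_SHIFTED(bit) ((uint64_t)ES_FLAG(bit) << ES_SHIFT)
```

Adding a flag then means adding one enum entry before `ES_FLAGS`; the shift and the mask follow from it, and all of these are compile-time constants.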
     
  • Currently we scan extent status trees of inodes until we reclaim nr_to_scan
    extents. This can however require a lot of scanning when there are lots
    of delayed extents (as those cannot be reclaimed).

    Change the shrinker to work as shrinkers are supposed to and *scan* only
    nr_to_scan extents regardless of how many extents we actually
    reclaim. We however need to be careful and avoid scanning each status
    tree from the beginning - that could lead to a situation where we would
    not be able to reclaim anything at all when the first nr_to_scan extents
    in the tree are always unreclaimable. With each inode we remember the
    offset where we stopped scanning and continue from there when we next
    come across the inode.

    Note that we also need to update places calling __es_shrink() manually
    to pass a reasonable nr_to_scan to have a chance of reclaiming anything,
    and not just 1.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
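
The scanning policy described above (charge the budget per extent examined, and resume where the previous scan stopped) can be sketched as follows. The sketch flattens a tree into arrays; `struct tree`, `scan_start` and `tree_scan()` are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-inode tree flattened into arrays for the sketch. */
struct tree {
    int *reclaimable;   /* 1 if entry i may be reclaimed, 0 if delayed */
    int *present;       /* 1 while entry i is cached */
    size_t n;           /* number of slots */
    size_t scan_start;  /* where the previous scan stopped */
};

/*
 * Scan at most *nr_to_scan entries starting at the remembered offset.
 * The budget is charged per entry examined, not per entry reclaimed.
 * Returns the number of entries reclaimed.
 */
static size_t tree_scan(struct tree *t, size_t *nr_to_scan)
{
    size_t reclaimed = 0;
    size_t visited = 0;

    while (*nr_to_scan > 0 && visited < t->n) {
        size_t i = t->scan_start;

        t->scan_start = (i + 1) % t->n;
        visited++;
        if (!t->present[i])
            continue;
        (*nr_to_scan)--;
        if (t->reclaimable[i]) {
            t->present[i] = 0;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

With two unreclaimable entries at the front, a first call with a budget of 2 reclaims nothing but advances the offset, so the next call reaches the reclaimable entries instead of rescanning the same prefix.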
     
  • Currently callers adding extents to extent status tree were responsible
    for adding the inode to the list of inodes with freeable extents. This
    is error prone and puts list handling in unnecessarily many places.

    Just add inode to the list automatically when the first non-delay extent
    is added to the tree and remove inode from the list when the last
    non-delay extent is removed.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
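
Moving the list handling into the tree code amounts to keeping a per-inode count of non-delayed extents and reacting to the 0 to 1 and 1 to 0 transitions. A minimal sketch, assuming hypothetical names (`struct inode_es`, `es_account_add()`, `es_account_remove()`) and a flag standing in for real list membership:

```c
#include <assert.h>

/* Hypothetical per-inode bookkeeping for the sketch. */
struct inode_es {
    unsigned int nr_reclaimable;  /* non-delayed extents in the tree */
    int on_list;                  /* 1 while on the shrinker's list */
};

/* Called by the tree code itself whenever an extent is added. */
static void es_account_add(struct inode_es *ei, int delayed)
{
    if (delayed)
        return;
    if (ei->nr_reclaimable++ == 0)
        ei->on_list = 1;          /* first reclaimable extent */
}

/*
 * Called whenever an extent is removed. The caller is assumed to
 * remove only extents that were previously added.
 */
static void es_account_remove(struct inode_es *ei, int delayed)
{
    if (delayed)
        return;
    if (--ei->nr_reclaimable == 0)
        ei->on_list = 0;          /* last reclaimable extent gone */
}
```

Callers that insert or remove extents no longer touch the list at all in this arrangement; membership follows from the counter.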
     
  • In this commit we discard the lru algorithm for inodes with extent
    status tree because it takes significant effort to maintain a lru list
    in extent status tree shrinker and the shrinker can take a long time to
    scan this lru list in order to reclaim some objects.

    We replace the lru ordering with a simple round-robin. After that we
    never need to keep a lru list. That means that the list needn't be
    sorted if the shrinker can not reclaim any objects in the first round.

    Cc: Andreas Dilger
    Signed-off-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Zheng Liu
     

21 Oct, 2014

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "A large number of cleanups and bug fixes, with some (minor) journal
    optimizations"

    [ This got sent to me before -rc1, but was stuck in my spam folder. - Linus ]

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (67 commits)
    ext4: check s_chksum_driver when looking for bg csum presence
    ext4: move error report out of atomic context in ext4_init_block_bitmap()
    ext4: Replace open coded mdata csum feature to helper function
    ext4: delete useless comments about ext4_move_extents
    ext4: fix reservation overflow in ext4_da_write_begin
    ext4: add ext4_iget_normal() which is to be used for dir tree lookups
    ext4: don't orphan or truncate the boot loader inode
    ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT
    ext4: optimize block allocation on grow indepth
    ext4: get rid of code duplication
    ext4: fix over-defensive complaint after journal abort
    ext4: fix return value of ext4_do_update_inode
    ext4: fix mmap data corruption when blocksize < pagesize
    vfs: fix data corruption when blocksize < pagesize for mmaped data
    ext4: fold ext4_nojournal_sops into ext4_sops
    ext4: support freezing ext2 (nojournal) file systems
    ext4: fold ext4_sync_fs_nojournal() into ext4_sync_fs()
    ext4: don't check quota format when there are no quota files
    jbd2: simplify calling convention around __jbd2_journal_clean_checkpoint_list
    jbd2: avoid pointless scanning of checkpoint lists
    ...

    Linus Torvalds
     

02 Sep, 2014

4 commits

  • This commit adds some statistics to the extent status tree shrinker.
    The purpose is to collect more details when we encounter a stall
    caused by the extent status tree shrinker. Here we count
    the following statistics:
    stats:
      the number of all objects on all extent status trees
      the number of reclaimable objects on lru list
      cache hits/misses
      the last sorted interval
      the number of inodes on lru list
    average:
      scan time for shrinking some objects
      the number of shrunk objects
    maximum:
      the inode that has max nr. of objects on lru list
      the maximum scan time for shrinking some objects

    The output looks like below:
    $ cat /proc/fs/ext4/sda1/es_shrinker_info
    stats:
      28228 objects
      6341 reclaimable objects
      5281/631 cache hits/misses
      586 ms last sorted interval
      250 inodes on lru list
    average:
      153 us scan time
      128 shrunk objects
    maximum:
      255 inode (255 objects, 198 reclaimable)
      125723 us max scan time

    If the lru list has never been sorted, the following line will not be
    printed:
      586 ms last sorted interval
    If the lru list is empty, the following lines will not be printed
    either:
      250 inodes on lru list
      ...
    maximum:
      255 inode (255 objects, 198 reclaimable)
      0 us max scan time

    Meanwhile in this commit a new trace point is defined to print some
    details in __ext4_es_shrink().

    Cc: Andreas Dilger
    Cc: Jan Kara
    Reviewed-by: Jan Kara
    Signed-off-by: Zheng Liu
    Signed-off-by: Theodore Ts'o

    Zheng Liu
     
  • This commit improves the trace points of the extent status tree. We
    rename trace_ext4_es_shrink_enter in ext4_es_count() because it is also
    used in ext4_es_scan() and we cannot tell the two apart in the output.

    Further, this commit fixes a variable name in a trace point to keep it
    consistent with the others.

    Cc: Andreas Dilger
    Cc: Jan Kara
    Reviewed-by: Jan Kara
    Signed-off-by: Zheng Liu
    Signed-off-by: Theodore Ts'o

    Zheng Liu
     
  • Make the function name less redundant.

    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     
  • Teach ext4_ext_drop_refs() to accept a NULL argument, much like
    kfree(). This allows us to drop a lot of checks to make sure path is
    non-NULL before calling ext4_ext_drop_refs().

    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
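
The convention of accepting NULL, as kfree() and free() do, moves the check from every caller into the callee. The sketch below illustrates the pattern with a hypothetical `struct path_elem` and `path_drop_refs()`; the real function's signature and data structures are not shown in this log and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a path element holding a buffer reference. */
struct path_elem {
    int *refcount;   /* NULL when this level holds no buffer */
};

/*
 * Release the references held by a path of 'depth + 1' elements.
 * Like free(), a NULL path is accepted and treated as a no-op.
 */
static void path_drop_refs(struct path_elem *path, int depth)
{
    if (path == NULL)
        return;
    for (int i = 0; i <= depth; i++) {
        if (path[i].refcount != NULL) {
            (*path[i].refcount)--;
            path[i].refcount = NULL;
        }
    }
}
```

Callers can then write `path_drop_refs(path, depth);` unconditionally on their error paths instead of guarding each call with `if (path)`.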
     

13 Jul, 2014

1 commit

  • This fixes the following lockdep complaint:

    [ INFO: possible circular locking dependency detected ]
    3.16.0-rc2-mm1+ #7 Tainted: G O
    -------------------------------------------------------
    kworker/u24:0/4356 is trying to acquire lock:
    (&(&sbi->s_es_lru_lock)->rlock){+.+.-.}, at: [] __ext4_es_shrink+0x4f/0x2e0

    but task is already holding lock:
    (&ei->i_es_lock){++++-.}, at: [] ext4_es_insert_extent+0x71/0x180

    which lock already depends on the new lock.

    Possible unsafe locking scenario:

    CPU0                    CPU1
    ----                    ----
    lock(&ei->i_es_lock);
                            lock(&(&sbi->s_es_lru_lock)->rlock);
                            lock(&ei->i_es_lock);
    lock(&(&sbi->s_es_lru_lock)->rlock);

    *** DEADLOCK ***

    6 locks held by kworker/u24:0/4356:
    #0: ("writeback"){.+.+.+}, at: [] process_one_work+0x180/0x560
    #1: ((&(&wb->dwork)->work)){+.+.+.}, at: [] process_one_work+0x180/0x560
    #2: (&type->s_umount_key#22){++++++}, at: [] grab_super_passive+0x44/0x90
    #3: (jbd2_handle){+.+...}, at: [] start_this_handle+0x189/0x5f0
    #4: (&ei->i_data_sem){++++..}, at: [] ext4_map_blocks+0x132/0x550
    #5: (&ei->i_es_lock){++++-.}, at: [] ext4_es_insert_extent+0x71/0x180

    stack backtrace:
    CPU: 0 PID: 4356 Comm: kworker/u24:0 Tainted: G O 3.16.0-rc2-mm1+ #7
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    Workqueue: writeback bdi_writeback_workfn (flush-253:0)
    ffffffff8213dce0 ffff880014b07538 ffffffff815df0bb 0000000000000007
    ffffffff8213e040 ffff880014b07588 ffffffff815db3dd ffff880014b07568
    ffff880014b07610 ffff88003b868930 ffff88003b868908 ffff88003b868930
    Call Trace:
    [] dump_stack+0x4e/0x68
    [] print_circular_bug+0x1fb/0x20c
    [] __lock_acquire+0x163e/0x1d00
    [] ? retint_restore_args+0xe/0xe
    [] ? __slab_alloc+0x4a8/0x4ce
    [] ? __ext4_es_shrink+0x4f/0x2e0
    [] lock_acquire+0x87/0x120
    [] ? __ext4_es_shrink+0x4f/0x2e0
    [] ? ext4_es_free_extent+0x5d/0x70
    [] _raw_spin_lock+0x39/0x50
    [] ? __ext4_es_shrink+0x4f/0x2e0
    [] ? kmem_cache_alloc+0x18b/0x1a0
    [] __ext4_es_shrink+0x4f/0x2e0
    [] ext4_es_insert_extent+0xc8/0x180
    [] ext4_map_blocks+0x1c4/0x550
    [] ext4_writepages+0x6d4/0xd00
    ...

    Reported-by: Minchan Kim
    Signed-off-by: Theodore Ts'o
    Reported-by: Minchan Kim
    Cc: stable@vger.kernel.org
    Cc: Zheng Liu

    Theodore Ts'o
     

13 May, 2014

1 commit

  • In ext4_es_can_be_merged() when checking whether we can merge two
    extents we should use EXT_MAX_BLOCKS instead of defining it manually.
    Also if it is really the case we should notify userspace because clearly
    there is a bug in extent status tree implementation since this should
    never happen.

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Zheng Liu

    Lukas Czerner
     

21 Apr, 2014

1 commit

  • Currently in ext4 there is quite a mess when it comes to naming
    unwritten extents. Sometimes we call it uninitialized and sometimes we
    refer to it as unwritten.

    The right name for the extent which has been allocated but does not
    contain any written data is _unwritten_. Other file systems are
    using this name consistently, even the buffer head state refers to it as
    unwritten. We need to fix this confusion in ext4.

    This commit changes every reference to an uninitialized extent (meaning
    allocated but unwritten) to unwritten extent. This includes comments,
    function names and variable names. It even covers abbreviation of the
    word uninitialized (such as uninit) and some misspellings.

    This commit does not change any of the code paths at all. This has been
    confirmed by comparing md5sums of the assembly code of each object file
    after all the function names were stripped from it.

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"

    Lukas Czerner
     

07 Apr, 2014

1 commit

  • '0x7FDEADBEEF' will be truncated to a 32-bit number under unicore32.
    We need to append 'ULL' to it.

    The related warning (with allmodconfig under unicore32):

    CC [M] fs/ext4/extents_status.o
    fs/ext4/extents_status.c: In function "__es_remove_extent":
    fs/ext4/extents_status.c:813: warning: integer constant is too large for "long" type

    Signed-off-by: Chen Gang
    Signed-off-by: "Theodore Ts'o"

    Chen Gang
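
The constant in question needs 39 bits, so it does not fit a 32-bit long. A small sketch of what the suffix guarantees and what truncation would leave behind; the macro name `POISON_PBLK` and both helper functions are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 39-bit value; the ULL suffix makes the constant unsigned long long. */
#define POISON_PBLK 0x7FDEADBEEFULL

/* The bits above bit 31, which a 32-bit type cannot hold. */
static uint64_t poison_high_bits(void)
{
    return (uint64_t)POISON_PBLK >> 32;
}

/* What remains if the value is squeezed into 32 bits. */
static uint32_t poison_truncated(void)
{
    return (uint32_t)POISON_PBLK;
}
```

With the suffix the constant is at least 64 bits wide on every target, independent of the width of long.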
     

21 Feb, 2014

1 commit


20 Feb, 2014

1 commit

  • Avoid false positives by static code analysis tools such as sparse and
    coverity caused by the fact that we set the physical block, and then
    the status in the extent_status structure. It is also more efficient
    to set both of these values at once.

    Addresses-Coverity-Id: #989077
    Addresses-Coverity-Id: #989078
    Addresses-Coverity-Id: #1080722

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Zheng Liu

    Theodore Ts'o
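
Setting the physical block and the status in a single store might look like the sketch below. It assumes a 64-bit field with four status bits at the top; the struct name and helper names are illustrative rather than copied from the source.

```c
#include <assert.h>
#include <stdint.h>

#define ES_FLAGS 4
#define ES_SHIFT (64 - ES_FLAGS)
#define ES_MASK  (~(uint64_t)0 << ES_SHIFT)

/* Hypothetical stand-in for the extent_status structure. */
struct extent_status_sketch {
    uint64_t es_pblk;   /* status in the top bits, block number below */
};

/* Write block number and status together, in one assignment. */
static void es_store_pblock_status(struct extent_status_sketch *es,
                                   uint64_t pblk, unsigned int status)
{
    es->es_pblk = ((uint64_t)status << ES_SHIFT) | (pblk & ~ES_MASK);
}

/* Extract the block number, dropping the status bits. */
static uint64_t es_pblock(const struct extent_status_sketch *es)
{
    return es->es_pblk & ~ES_MASK;
}

/* Extract the status as a small unshifted value. */
static unsigned int es_status(const struct extent_status_sketch *es)
{
    return (unsigned int)(es->es_pblk >> ES_SHIFT);
}
```

Because the combined store never reads the old field value, an analyzer has no read of a partially initialized field to complain about.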
     

11 Sep, 2013

1 commit

  • Convert the filesystem shrinkers to use the new API, and standardise some
    of the behaviours of the shrinkers at the same time. For example,
    nr_to_scan means the number of objects to scan, not the number of objects
    to free.

    I refactored the CIFS idmap shrinker a little - it really needs to be
    broken up into a shrinker per tree and keep an item count with the tree
    root so that we don't need to walk the tree every time the shrinker needs
    to count the number of objects in the tree (i.e. all the time under
    memory pressure).

    [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
    [assorted fixes folded in]
    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Acked-by: Mel Gorman
    Acked-by: Artem Bityutskiy
    Acked-by: Jan Kara
    Acked-by: Steven Whitehouse
    Cc: Adrian Hunter
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton

    Signed-off-by: Al Viro

    Dave Chinner
     

29 Aug, 2013

1 commit

  • After applying commit 4a092d73, we have reduced the number of
    source files that need to #include ext4_extents.h. But we can do
    better.

    This commit defines ext4_zeroout_es() in extents.c and moves
    EXT_MAX_BLOCKS into ext4.h in order not to include ext4_extents.h in
    indirect.c and ioctl.c. Meanwhile we only need to include this file in
    extents_status.c when ES_AGGRESSIVE_TEST is defined. In addition, this
    commit removes a duplicated declaration in trace/events/ext4.h.

    After applying this patch, we only need to include ext4_extents.h
    in {super,migrate,move_extents,extents}.c, which makes it easier for us
    to define a new extent disk layout.

    Signed-off-by: Zheng Liu
    Signed-off-by: "Theodore Ts'o"

    Zheng Liu
     

17 Aug, 2013

3 commits

  • Add a new fiemap flag which forces all of the extents in an inode
    to be cached in the extent_status tree. This is critically important
    when using AIO to a preallocated file, since if we need to read in
    blocks from the extent tree, the io_submit(2) system call becomes
    synchronous, and the AIO is no longer "A", which is bad.

    In addition, for most files which have an external leaf tree block,
    the cost of caching the information in the extent status tree will be
    less than caching the entire 4k block in the buffer cache. So it is
    generally a win to keep the extent information cached.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • When we read in an extent tree leaf block from disk, arrange to have
    all of its entries cached. In nearly all cases the in-memory
    representation will be more compact than the on-disk representation in
    the buffer cache, and it allows us to get the information without
    having to traverse the extent tree for successive extents.

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Zheng Liu

    Theodore Ts'o
     
  • Don't use an unsigned long long for the es_status flags; this requires
    that we pass 64-bit values around which is painful on 32-bit systems.
    Instead pass the extent status flags around using the low 4 bits of an
    unsigned int, and shift them into place when we are reading or writing
    es_pblk.

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Zheng Liu

    Theodore Ts'o
     

15 Jul, 2013

1 commit

  • Some callers of ext4_es_remove_extent() and ext4_es_insert_extent()
    may not be completely robust against ENOMEM failures (or the
    consequences of reflecting ENOMEM back up to userspace may lead to
    xfstest or user application failure).

    To mitigate against this, when trying to insert an entry in the extent
    status tree, try to shrink the inode's extent status tree before
    returning ENOMEM. If there are entries which don't record information
    about extents under delayed allocations, freeing one of them is
    preferable to returning ENOMEM.

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Zheng Liu

    Theodore Ts'o
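
The mitigation described above is a retry: if allocation fails, free one entry that is not needed for delayed allocation and try again. A minimal userspace sketch, where a fixed capacity stands in for memory and all names (`struct cache`, `cache_alloc()`, `cache_shrink_one()`, `cache_insert()`) are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cache with a fixed capacity standing in for memory. */
struct cache {
    size_t used;         /* entries currently allocated */
    size_t capacity;     /* allocation fails when used == capacity */
    size_t reclaimable;  /* entries not tied to delayed allocation */
};

/* Allocate one entry or fail with -ENOMEM when the cache is full. */
static int cache_alloc(struct cache *c)
{
    if (c->used >= c->capacity)
        return -ENOMEM;
    c->used++;
    return 0;
}

/* Free one reclaimable entry; returns how many were freed (0 or 1). */
static size_t cache_shrink_one(struct cache *c)
{
    if (c->reclaimable == 0)
        return 0;
    c->reclaimable--;
    c->used--;
    return 1;
}

/* Insert, retrying once after shrinking before giving up. */
static int cache_insert(struct cache *c)
{
    int err = cache_alloc(c);

    if (err == -ENOMEM && cache_shrink_one(c) > 0)
        err = cache_alloc(c);
    return err;
}
```

The error still reaches the caller when nothing can be reclaimed, so callers must continue to handle -ENOMEM; the retry only makes it rarer.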
     

13 Jul, 2013

1 commit


01 Jul, 2013

1 commit

  • Now we maintain a proper in-order LRU list in ext4 to reclaim entries
    from the extent status tree when we are under heavy memory pressure. To
    keep this order, a spin lock is used to protect the list. But this
    lock burns a lot of CPU time. We can use the following steps to trigger
    it.

    % cd /dev/shm
    % dd if=/dev/zero of=ext4-img bs=1M count=2k
    % mkfs.ext4 ext4-img
    % mount -t ext4 -o loop ext4-img /mnt
    % cd /mnt
    % for ((i=0;i

    Zheng Liu
     

03 May, 2013

1 commit

  • We (Linux Kernel Performance project) found a regression introduced
    by commit:

    f7fec032aa ext4: track all extent status in extent status tree

    The commit causes about 20% performance decrease in fio random write
    test. Profiler shows that rb_next() uses a lot of CPU time. The call
    stack is:

    rb_next
    ext4_es_find_delayed_extent
    ext4_map_blocks
    _ext4_get_block
    ext4_get_block_write
    __blockdev_direct_IO
    ext4_direct_IO
    generic_file_direct_write
    __generic_file_aio_write
    ext4_file_write
    aio_rw_vect_retry
    aio_run_iocb
    do_io_submit
    sys_io_submit
    system_call_fastpath
    io_submit
    td_io_getevents
    io_u_queued_complete
    thread_main
    main
    __libc_start_main

    The cause is that ext4_es_find_delayed_extent() doesn't have an
    upper bound; it keeps searching until a delayed extent is found.
    When there are a lot of non-delayed entries in the extent status
    tree, ext4_es_find_delayed_extent() may use a lot of CPU time.

    Reported-by: LKP project
    Signed-off-by: Yan, Zheng
    Signed-off-by: Zheng Liu
    Cc: "Theodore Ts'o"

    Yan, Zheng
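
Giving the search an upper bound means the walk can stop as soon as an entry starts past the end of the range of interest. The sketch below uses a sorted array in place of the red-black tree and a linear skip in place of the tree lookup; `struct ext` and `find_delayed_range()` are hypothetical names, and every entry is assumed to have a non-zero length that does not wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened extent entry, sorted by starting block. */
struct ext {
    uint32_t lblk;   /* first logical block */
    uint32_t len;    /* number of blocks, assumed >= 1 */
    int delayed;     /* 1 if this is a delayed extent */
};

/*
 * Return the first delayed extent overlapping [lblk, end], or NULL.
 * The walk stops once an entry starts beyond 'end', so a long run of
 * non-delayed entries after the range is never visited.
 */
static const struct ext *find_delayed_range(const struct ext *v, size_t n,
                                            uint32_t lblk, uint32_t end)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t last = v[i].lblk + v[i].len - 1;

        if (last < lblk)
            continue;            /* entirely before the range */
        if (v[i].lblk > end)
            break;               /* past the upper bound: stop */
        if (v[i].delayed)
            return &v[i];
    }
    return NULL;
}
```

A caller that only cares about the blocks it is about to map passes that range as the bound, which keeps the cost proportional to the range rather than to the size of the tree.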
     

11 Mar, 2013

3 commits

  • When we try to split an extent, this extent could be zeroed out and
    marked as initialized. But we don't know this in ext4_map_blocks because
    it only returns the length of the allocated extent. Meanwhile we will
    mark this extent as uninitialized because we only check m_flags.

    This commit updates the extent status tree when we try to split an
    unwritten extent. We don't need to worry about the status of this extent
    because we always mark it as initialized.

    Signed-off-by: Zheng Liu
    Signed-off-by: "Theodore Ts'o"
    Cc: Dmitry Monakhov

    Zheng Liu
     
  • This commit adds a self-testing infrastructure, like the extent tree
    has, to do a sanity check for the extent status tree. Now that the
    status tree serves as an extent cache, we had better make sure that it
    caches the right result.

    After applying this commit, we will get a lot of messages like the
    following when we run xfstests.

    ...
    kernel: ES len assertation failed for inode: 230 retval 1 != map->m_len
    3 in ext4_map_blocks (allocation)
    ...
    kernel: ES cache assertation failed for inode: 230 es_cached ex
    [974/2/4781/20] != found ex [974/1/4781/1000]
    ...
    kernel: ES insert assertation failed for inode: 635 ex_status
    [0/45/21388/w] != es_status [44/1/21432/u]
    ...

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Zheng Liu
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     
  • Check the length of an extent to avoid a potential overflow in
    ext4_es_can_be_merged().

    Signed-off-by: Zheng Liu
    Signed-off-by: "Theodore Ts'o"
    Cc: Dmitry Monakhov

    Zheng Liu
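
The overflow the commit guards against comes from adding two 32-bit lengths. A hedged sketch of such a check, widening to 64 bits before the addition; the limit `MAX_BLOCKS_SKETCH` and the reduced `struct ext` are assumptions for the illustration, and the real function compares more fields than these.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed upper limit on extent length for this sketch. */
#define MAX_BLOCKS_SKETCH 0xffffffffULL

/* Hypothetical extent reduced to start block and length. */
struct ext {
    uint32_t lblk;
    uint32_t len;
};

/* Two extents can be merged only if adjacent and the total length fits. */
static int can_be_merged(const struct ext *a, const struct ext *b)
{
    /* Widen before adding so the sum cannot wrap in 32 bits. */
    if ((uint64_t)a->len + b->len > MAX_BLOCKS_SKETCH)
        return 0;
    /* The second extent must start right after the first one ends. */
    if ((uint64_t)a->lblk + a->len != b->lblk)
        return 0;
    return 1;
}
```

Without the widening, two lengths of 0x80000000 would sum to 0 in 32-bit arithmetic and the merged extent would silently get a bogus length.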
     

02 Mar, 2013

1 commit

  • Use a percpu counter rather than atomic types for shrinker accounting.
    There's no need for ultimate accuracy in the shrinker, so this
    should come a little more cheaply. The percpu struct is somewhat
    large, but there was a big gap before the cache-aligned
    s_es_lru_lock anyway, and it fits nicely in there.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

01 Mar, 2013

1 commit

  • When the system is under memory pressure, ext4_es_shrink() will get
    called very often. So optimize returning the number of items in the
    file system's extent status cache by keeping a per-filesystem count,
    instead of calculating it each time by scanning all of the inodes in
    the extent status cache.

    Also rename the slab used for the extent status cache to be
    "ext4_extent_status" so it's obvious the slab in question is created
    by ext4.

    Signed-off-by: "Theodore Ts'o"
    Cc: Zheng Liu

    Theodore Ts'o
     

23 Feb, 2013

1 commit

  • A len of 0 means no extent needs to be removed, so return immediately.
    Otherwise it could trigger the following BUG_ON() in
    ext4_es_remove_extent()

    end = lblk + len - 1;
    BUG_ON(end < lblk);

    This could be reproduced by a simple truncate(1) command by an
    unprivileged user

    truncate -s $(($((2**32 - 1)) * 4096)) /mnt/ext4/testfile

    The same is true for __es_insert_extent().

    Patched kernel passed xfstests regression test.

    Signed-off-by: Eryu Guan
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Zheng Liu

    Eryu Guan
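
The arithmetic behind the BUG_ON() is plain unsigned wraparound: with len equal to 0, `lblk + len - 1` is `lblk - 1`, which is below lblk for any non-zero lblk. A small sketch of the early return, using a hypothetical `check_range()` helper that reports the failure instead of crashing:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical range check mirroring the arithmetic in the message.
 * Returns 0 for "nothing to do", 1 for a valid range, and -1 when the
 * end would wrap below the start (the case BUG_ON() would catch).
 */
static int check_range(uint32_t lblk, uint32_t len, uint32_t *end)
{
    if (len == 0)
        return 0;               /* nothing to remove or insert */
    *end = lblk + len - 1;      /* unsigned 32-bit arithmetic */
    if (*end < lblk)
        return -1;
    return 1;
}
```

Checking len first keeps the zero-length case away from the subtraction altogether, so the wraparound test only has to deal with genuinely oversized ranges.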
     

18 Feb, 2013

4 commits

  • Although extent status is loaded on demand, we also need to reclaim
    extents from the tree when we are under heavy memory pressure, because
    in some cases a fragmented extent tree causes the status tree to use
    too much memory.

    Here we maintain an lru list in super_block. When the extent status of
    an inode is accessed and changed, this inode is moved to the tail
    of the list. The inode is dropped from this list when it is
    cleared. In the inode, a counter is added to count the number of
    cached objects in the extent status tree. Only written/unwritten/hole
    extents are counted, because delayed extents cannot be reclaimed:
    fiemap, bigalloc and seek_data/hole need them. The counter is
    incremented when a new extent is allocated and decremented when an
    extent is freed.

    In this commit we use the normal shrinker framework to reclaim memory
    from the status tree. ext4_es_reclaim_extents_count() traverses the lru
    list to count the number of reclaimable extents. ext4_es_shrink() tries
    to reclaim written/unwritten/hole extents from the extent status tree.
    An inode that has been shrunk is moved to the tail of the lru list.

    Signed-off-by: Zheng Liu
    Signed-off-by: "Theodore Ts'o"
    Cc: Jan kara

    Zheng Liu
     
  • This commit changes some interfaces in the extent status tree because
    we need the inode to count the cached objects in an extent status tree.

    Signed-off-by: Zheng Liu
    Signed-off-by: "Theodore Ts'o"
    Cc: Jan kara

    Zheng Liu
     
  • After tracking all extent status, we already have an extent cache in
    memory. Every time we want to look up a block mapping, we can first
    try to look it up in the extent status tree to avoid potential disk I/O.

    A new function called ext4_es_lookup_extent is defined to do this
    work. When we look up a block mapping, we always call
    ext4_map_blocks and/or ext4_da_map_blocks, so in these functions we
    first try to look up the block mapping in the extent status tree.

    A new flag EXT4_GET_BLOCKS_NO_PUT_HOLE is used in ext4_da_map_blocks
    in order not to put a hole into the extent status tree, because this
    hole will be converted to a delayed extent in the tree immediately.

    Signed-off-by: Zheng Liu
    Signed-off-by: "Theodore Ts'o"
    Cc: Jan kara

    Zheng Liu
     
  • This commit renames ext4_es_find_extent to ext4_es_find_delayed_extent
    and improves the function. First, we split the input and output
    parameters. Second, this function never returns the first block of the
    next delayed extent after 'es'.

    Signed-off-by: Zheng Liu
    Signed-off-by: "Theodore Ts'o"
    Cc: Jan kara

    Zheng Liu