14 Jul, 2008

6 commits

  • journal_try_to_free_buffers() could race with the jbd commit transaction
    when the latter is holding a buffer reference while waiting for the
    data buffer to flush to disk. If the caller of
    journal_try_to_free_buffers() tries hard to release the buffers,
    it will treat the failure as an error and return it to its caller. We
    have seen direct IO fail due to this race. Some callers of
    releasepage() also expect the buffer to be dropped when GFP_KERNEL is
    passed to releasepage()->journal_try_to_free_buffers().

    With this patch, if the caller passes GFP_KERNEL to indicate that
    this call may wait and try_to_free_buffers() fails, we
    wait for journal_commit_transaction() to finish committing the current
    committing transaction, then try to free those buffers again with
    the journal locked.

    Signed-off-by: Mingming Cao
    Reviewed-by: Badari Pulavarty
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
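
The retry logic described in the entry above can be sketched in userspace as follows. This is only an illustration under stated assumptions: struct page_state, try_free(), wait_for_commit() and release_page() are hypothetical stand-ins, not the jbd functions, and the boolean can_wait plays the role of the GFP_KERNEL hint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page whose buffers a commit may still hold. */
struct page_state {
    bool held_by_commit;
};

/* Mock of the first, non-blocking attempt: fails while the commit holds a reference. */
static bool try_free(const struct page_state *p)
{
    return !p->held_by_commit;
}

/* Mock of waiting for the committing transaction to finish. */
static void wait_for_commit(struct page_state *p)
{
    p->held_by_commit = false;
}

/* Try once; if that fails and the caller may block, wait and retry. */
bool release_page(struct page_state *p, bool can_wait)
{
    assert(p != NULL);
    if (try_free(p))
        return true;
    if (!can_wait)
        return false;   /* caller cannot block: report the failure */
    wait_for_commit(p);
    return try_free(p);
}
```

A caller that cannot block still sees the failure immediately; only a caller that signals it may wait pays for the second attempt.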
     
  • __FUNCTION__ is gcc-specific, use __func__ instead

    Signed-off-by: Stoyan Gaydarov
    Cc: Theodore Ts'o
    Cc: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Theodore Ts'o

    Stoyan Gaydarov
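
As a small illustration of the entry above, the standard C99 identifier __func__ can be used wherever the gcc-specific __FUNCTION__ was. The TRACE macro and the function names below are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* C99 defines __func__ inside every function body. */
const char *whoami(void)
{
    return __func__;
}

/* Hypothetical trace macro; __func__ is expanded at the call site. */
#define TRACE(fmt, ...) \
    fprintf(stderr, "%s: " fmt "\n", __func__, __VA_ARGS__)

int traced_add(int a, int b)
{
    TRACE("a=%d b=%d", a, b);
    return a + b;
}

/* Sanity check that the reported name matches the function's name. */
void check_func_name(void)
{
    assert(strcmp(whoami(), "whoami") == 0);
}
```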
     
  • The error processing of the return value of mb_free_blocks is meaningless
    because it only returns 0. This fix includes

    - make mb_free_blocks return void

    - remove the error processing part in callers

    - unlock group before calling ext4_error in mb_free_blocks

    Signed-off-by: Shen Feng
    Cc: Mingming Cao
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Theodore Ts'o

    Shen Feng
     
  • If the fs/ext4 directory cannot be created under proc, no entries
    should be created under it.

    Signed-off-by: Shen Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Theodore Ts'o

    Shen Feng
     
  • Linus Torvalds
     
  • # cat devices.list
    c 1:3 r
    # echo 'c 1:3 w' > sub/devices.allow
    # cat sub/devices.list
    c 1:3 w

    As illustrated, the parent group has no write permission to /dev/null, so
    its child should not be allowed to add this write permission.

    Signed-off-by: Li Zefan
    Acked-by: Serge Hallyn
    Cc: Serge Hallyn
    Cc: Paul Menage
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
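
The rule stated in the entry above (a child group may not gain access its parent lacks) amounts to a subset check. The sketch below assumes a simple bitmask model; the ACC_* values and child_may_add() are hypothetical and are not taken from the device cgroup code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical access bits, loosely modeled on the r/w/m letters above. */
enum {
    ACC_READ  = 1,
    ACC_WRITE = 2,
    ACC_MKNOD = 4,
    ACC_ALL   = ACC_READ | ACC_WRITE | ACC_MKNOD
};

/* A child may only be granted access bits that its parent already holds. */
bool child_may_add(unsigned parent_access, unsigned requested)
{
    assert((requested & ~(unsigned)ACC_ALL) == 0);
    return (requested & ~parent_access) == 0;
}
```

With the parent holding only read access, as in the transcript, a request for write access is refused.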
     

12 Jul, 2008

34 commits

  • Update group infos when updating a group's descriptor.
    Add group infos when adding a group's descriptor.
    Refresh cache pages used by mb_alloc when changes occur.
    This will probably need modifications once META_BG resizing is allowed.

    Signed-off-by: Frederic Bohe
    Signed-off-by: Mingming Cao

    Frederic Bohe
     
  • Use the BUFFER_FNS functions (set_buffer_foo) to set buffer
    head state atomically instead of nonatomic __set_bit().

    Signed-off-by: Eric Sandeen
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
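
The difference between an atomic and a non-atomic bit update can be shown with C11 atomics. This is a userspace analogy only: struct buf_state, the bit numbers and the helper names are hypothetical, and atomic_fetch_or() stands in for the kernel's atomic bit helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state word standing in for a buffer head's flag bits. */
struct buf_state {
    atomic_ulong flags;
};

/* Illustrative bit numbers; not the kernel's actual values. */
enum { BUF_UPTODATE = 0, BUF_DIRTY = 1 };

#define FLAG_BITS ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Atomic read-modify-write: concurrent updates to other bits are not lost. */
void buf_set_flag(struct buf_state *b, int bit)
{
    assert(bit >= 0 && bit < FLAG_BITS);
    atomic_fetch_or(&b->flags, 1UL << bit);
}

bool buf_test_flag(struct buf_state *b, int bit)
{
    assert(bit >= 0 && bit < FLAG_BITS);
    return (atomic_load(&b->flags) >> bit) & 1UL;
}
```

A plain `flags |= mask` on a non-atomic word is a separate load and store, so two threads setting different bits at the same time can overwrite each other's update.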
     
  • Set sbi->s_journal to NULL after we call journal_destroy(). This
    will be needed later because, after journal_destroy() is called,
    ext4_clear_inode() can still be called for some inodes (e.g. the root
    inode) and we will need to detect there that the journal no longer
    exists.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • mballoc allocation missed the check for blocks reserved for root users.
    Add an ext4_has_free_blocks() check before allocation. Also modify
    ext4_has_free_blocks() to support multi-block allocation requests.

    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
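
One plausible shape for a multi-block free-space check that honors the root reserve is sketched below. The log does not show the real signature or policy of ext4_has_free_blocks(), so blocks_grantable() and its parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns how many of the nblocks requested may be
 * granted. Unprivileged callers cannot dip into the root reserve.
 */
uint64_t blocks_grantable(uint64_t free_blocks, uint64_t reserved_for_root,
                          bool privileged, uint64_t nblocks)
{
    uint64_t root_blocks = privileged ? 0 : reserved_for_root;
    uint64_t avail;

    assert(nblocks > 0);
    if (free_blocks <= root_blocks)
        return 0;   /* nothing left outside the reserve */
    avail = free_blocks - root_blocks;
    return nblocks < avail ? nblocks : avail;
}
```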
     
  • To ensure that bits are truly on-disk after an fsync,
    we should call blkdev_issue_flush if barriers are supported.

    Inspired by an old thread on barriers, by reiserfs & xfs,
    which do the same, and by a patch SuSE ships with their kernel.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • Move the code related to block allocation to a single function and add helper
    functions to differentiate allocation for data and metadata blocks.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • When mballoc is enabled, blocks for old block-based
    files are allocated using the mballoc allocator instead of the old
    block-based allocator. The old ext3 block reservation is turned
    off when mballoc is turned on.

    However, in-core preallocation is not enabled for block-based
    (non-extent) file block allocation. This results in a performance
    regression, as we now have neither "reservation" nor in-core preallocation
    to prevent interleaved fragmentation in multiple-writer workloads.

    This patch fixes this by enabling per-inode in-core preallocation
    for non-extent files when mballoc is used.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • When the meta_bg feature is enabled and s_first_meta_bg != 0,
    ext4_init_block_bitmap() miscalculates the number of blocks used by
    the group descriptor table (0 or 1 for a metablock block group).

    This patch fixes this by using ext4_bg_num_gdb().

    Signed-off-by: Akinobu Mita
    Cc: Andrew Morton
    Cc: Stephen Tweedie
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"
    Acked-by: Andreas Dilger

    Akinobu Mita
     
  • Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • dx_root_limit() had some dead code which forced it to always return
    20, and dx_node_limit() to always return 22, for debugging purposes.
    Remove it.

    Acked-by: Andreas Dilger
    Signed-off-by: Li Zefan
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Li Zefan
     
  • Fix ext4_ext_journal_restart() so it returns any errors reported by
    ext4_journal_extend() and ext4_journal_restart().

    Signed-off-by: Shen Feng
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Shen Feng
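
The error propagation described in the entry above can be sketched with mocked steps. Everything here is hypothetical (struct handle, mock_extend(), mock_restart(), ensure_credits()); the sketch only assumes that the extend step returns 0 on success, a positive value when a restart is needed, and a negative value on error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a journal handle with scripted results. */
struct handle {
    int credits;         /* credits currently available */
    int extend_result;   /* value the mocked extend step returns */
    int restart_result;  /* value the mocked restart step returns */
};

/* Mock: 0 = extended, > 0 = cannot extend (restart needed), < 0 = error. */
static int mock_extend(struct handle *h, int needed)
{
    if (h->extend_result == 0)
        h->credits = needed + 1;
    return h->extend_result;
}

/* Mock: 0 = restarted with fresh credits, < 0 = error. */
static int mock_restart(struct handle *h, int needed)
{
    if (h->restart_result == 0)
        h->credits = needed + 1;
    return h->restart_result;
}

/* Every failure from either step reaches the caller unchanged. */
int ensure_credits(struct handle *h, int needed)
{
    int err;

    assert(h != NULL);
    if (h->credits > needed)
        return 0;
    err = mock_extend(h, needed);
    if (err <= 0)
        return err;   /* success (0) or a real error (< 0) */
    return mock_restart(h, needed);
}
```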
     
  • When pos=0 or depth, the fields of ext4_ext_path are not
    completely filled. This patch also removes some unnecessary code.

    Signed-off-by: Shen Feng
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Shen Feng
     
  • ext4_ext_create_new_leaf must return an error when its
    call to ext4_ext_split fails.

    Signed-off-by: Shen Feng
    Signed-off-by: Mingming Cao
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Shen Feng
     
  • When allocating uninitialized space at the end of a file that had been
    preallocated with the FALLOC_FL_KEEP_SIZE option, the file size is not
    updated at that time. Later, however, we also fail to update the file
    size when writing to that preallocated space.

    These changes are for code correctness. This patch allows us to update
    the i_disksize at the write_end() callback of filesystem properly.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • The quota allocation is not released when the kmem_cache_alloc call in
    ext4_mb_new_blocks fails. Also make sure the allocation context is freed
    on the error path.

    Signed-off-by: Shen Feng
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Shen Feng
     
  • The previous sb_min_blocksize() has already set the block size.

    Signed-off-by: Li Zefan
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Li Zefan
     
  • This patch mostly controls the way inodes are allocated in order to
    make ialloc aware of flex_bg block group grouping. It achieves this
    by bypassing the Orlov allocator when block group meta-data are packed
    together through mke2fs. Since the impact on the block allocator is
    minimal, this patch should have little or no effect on other block
    allocation algorithms. By controlling the inode allocation, it can
    basically control where the initial search for new blocks begins and
    thus indirectly manipulate the block allocator.

    This allocator favors data and meta-data locality so the disk will
    gradually be filled from block group zero upward. This helps improve
    performance by reducing seek time. Since the group of inode tables
    within one flex_bg is treated as one giant inode table, uninitialized
    block groups would not need to partially initialize as many inode
    tables as with Orlov, which would help fsck time as filesystem usage
    goes up.

    Signed-off-by: Jose R. Santos
    Signed-off-by: Valerie Clement
    Signed-off-by: "Theodore Ts'o"

    Jose R. Santos
     
  • Carlo Wood has demonstrated that it's possible to recover deleted
    files from the journal. Something that will make this easier is if we
    can put the time of the commit into commit block.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • ext4_next_entry() is used by the debugging function dx_show_leaf(), so
    it must be defined before that function.

    Signed-off-by: Li Zefan
    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Li Zefan
     
  • Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • Since this is a non-static function, make it ext4-specific to avoid
    potential conflicts with other filesystems.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • remove the definitions of macros XATTR_TRUSTED_PREFIX and XATTR_USER_PREFIX
    since they are defined in linux/xattr.h

    Signed-off-by: Shen Feng
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Shen Feng
     
  • ext4_mb_seq_history_open(): check if sbi->s_mb_history is NULL

    ext4_mb_history_init(): replace kmalloc and memset with kzalloc

    ext4_mb_init_backend(): remove memset since kzalloc is used

    ext4_mb_init(): the return value of ext4_mb_init_backend is int,
    but i is unsigned, replace it with a new int variable.

    Signed-off-by: Shen Feng
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Shen Feng
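
The cleanups listed in the entry above have direct userspace analogues: calloc() plays the role of kzalloc() (allocation plus zeroing in one call), and a status code is kept in an int to match the callee's return type. All names below are hypothetical, including the error code returned for a missing history.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical history buffer; the field names are illustrative only. */
struct history {
    size_t nentries;
    int *entries;
};

/* calloc() returns zeroed memory, so no separate memset() is needed. */
struct history *history_new(size_t nentries)
{
    struct history *h = calloc(1, sizeof(*h));

    if (!h)
        return NULL;
    h->entries = calloc(nentries, sizeof(*h->entries));
    if (!h->entries) {
        free(h);
        return NULL;
    }
    h->nentries = nentries;
    return h;
}

/* Refuse to open a history that was never allocated. */
int history_open(const struct history *h)
{
    return h ? 0 : -ENOMEM;   /* hypothetical error code */
}

void history_free(struct history *h)
{
    if (h) {
        free(h->entries);
        free(h);
    }
}

/* Keep the status in an int, matching the callee's return type. */
int subsystem_init(struct history **out, size_t nentries)
{
    int ret;

    assert(out != NULL);
    *out = history_new(nentries);
    ret = history_open(*out);
    return ret;
}
```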
     
  • Add error processing for ext4_mb_load_buddy when it calls
    ext4_mb_init_cache.

    Signed-off-by: Shen Feng
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Shen Feng
     
  • ext4_mb_init_cache() incorrectly always returns EIO on success. This
    causes the caller of ext4_mb_init_cache() to fail when it checks the
    return value.

    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
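
The log does not show how the stray EIO arose. One common way such a bug appears is a pessimistic default error code that the success path never resets; the sketch below illustrates that pattern and its fix with hypothetical functions, and should not be read as the actual ext4 code.

```c
#include <assert.h>
#include <errno.h>

/* Buggy pattern: err starts as -EIO and is never cleared on success. */
int init_cache_buggy(int nsteps, int fail_at)
{
    int err = -EIO;   /* pessimistic default */
    int i;

    assert(nsteps >= 0);
    for (i = 0; i < nsteps; i++) {
        if (i == fail_at)
            goto out;   /* relies on the default error code */
    }
    /* the success path forgets to reset err */
out:
    return err;
}

/* Fixed pattern: start from success and set the error where it happens. */
int init_cache_fixed(int nsteps, int fail_at)
{
    int err = 0;
    int i;

    assert(nsteps >= 0);
    for (i = 0; i < nsteps; i++) {
        if (i == fail_at) {
            err = -EIO;
            goto out;
        }
    }
out:
    return err;
}
```

Passing a negative fail_at means no step fails, which exposes the difference between the two versions.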
     
  • * remove unnecessary code in free_rb_tree_fname

    * rename free_rb_tree_fname to ext4_htree_create_dir_info
    since it and ext4_htree_free_dir_info are a pair

    * replace kmalloc with kzalloc in ext4_htree_free_dir_info

    All of these make the code simpler and more readable.
    PS: this patch is also suitable for ext3.

    Signed-off-by: Shen Feng
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Shen Feng
     
  • Signed-off-by: Alexey Dobriyan
    Cc: Mingming Cao
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton

    Alexey Dobriyan
     
  • if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
    side-effects to allow a definition of BUG_ON that drops the code completely.

    The semantic patch that makes this change is as follows:
    (http://www.emn.fr/x-info/coccinelle/)

    // <smpl>
    @ disable unlikely @ expression E,f; @@

    (
      if (<... f(...) ...>) { BUG(); }
    |
    - if (unlikely(E)) { BUG(); }
    + BUG_ON(E);
    )

    @@ expression E,f; @@

    (
      if (<... f(...) ...>) { BUG(); }
    |
    - if (E) { BUG(); }
    + BUG_ON(E);
    )
    // </smpl>

    Signed-off-by: Julia Lawall
    Signed-off-by: Andrew Morton
    Signed-off-by: "Theodore Ts'o"

    Julia Lawall
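
The transformation in the entry above can be tried out in userspace with stand-in macros. The BUG() and BUG_ON() definitions below are assumptions written for this sketch, not the kernel's definitions; assert() is included only as the closest standard-library relative.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-ins; the real macros live in kernel headers. */
#define BUG() do { \
    fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
    abort(); \
} while (0)

#define BUG_ON(cond) do { if (cond) BUG(); } while (0)

/* Before the change this would read: if (len < 0) { BUG(); } */
int checked_len(int len)
{
    BUG_ON(len < 0);   /* condition has no side effects */
    return len;
}

/* The same check written with the standard assert(), for comparison. */
int checked_len_assert(int len)
{
    assert(len >= 0);
    return len;
}
```

Because the condition has no side effects, a build that defines BUG_ON() to expand to nothing would not change what checked_len() computes for valid input.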
     
  • With mballoc we search for the best extent using different
    criteria. We should always use the goal group when we are
    starting with a new criterion.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • Change second/third to fourth.

    Signed-off-by: Shen Feng
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Shen Feng
     
  • Some architectures implement ext4_find_next_bit and
    ext4_find_next_zero_bit in such a way that they return
    greater than max for some input values. Make sure
    mb_find_next_bit and mb_find_next_zero_bit return the
    right values.

    On 2.6.25 we have include/asm-x86/bitops_32.h
    static inline unsigned find_first_bit(const unsigned long *addr, unsigned size)
    {
            unsigned x = 0;

            while (x < size) {
                    unsigned long val = *addr++;
                    if (val)
                            return __ffs(val) + x;
                    x += (sizeof(*addr)<<3);
            }
            return x;
    }

    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
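
The overshoot described in the entry above can be reproduced in userspace. The sketch below assumes that clamping the result to the search limit is an acceptable way to "return the right values"; raw_find_first_bit(), clamped_find_first_bit() and lowest_set_bit() are hypothetical names, and the raw search deliberately steps one word at a time.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD ((unsigned)(sizeof(unsigned long) * CHAR_BIT))

/* Index of the lowest set bit; val must be non-zero. */
static unsigned lowest_set_bit(unsigned long val)
{
    unsigned n = 0;

    while (!(val & 1UL)) {
        val >>= 1;
        n++;
    }
    return n;
}

/* Word-granular search: the result can exceed size. */
unsigned raw_find_first_bit(const unsigned long *addr, unsigned size)
{
    unsigned x = 0;

    assert(addr != NULL);
    while (x < size) {
        unsigned long val = *addr++;

        if (val)
            return lowest_set_bit(val) + x;
        x += BITS_PER_WORD;
    }
    return x;
}

/* Wrapper that never reports a position past the limit. */
unsigned clamped_find_first_bit(const unsigned long *addr, unsigned size)
{
    unsigned ret = raw_find_first_bit(addr, size);

    return ret > size ? size : ret;
}
```

Callers that treat "result == size" as "not found" then behave the same whether the underlying search overshoots or not.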
     
  • ext4_dx_find_entry uses ext4_next_entry without verifying that the entry is
    valid. If its rec_len == 0 this causes an infinite loop. Refactor the loop
    to check the validity of entries before checking whether they match and
    moving on to the next one.

    There are other uses of ext4_next_entry in this file which also look
    problematic. They should be reviewed and fixed if/when we have a test-case
    that triggers them.

    This patch fixes the first case (image hdb.25.softlockup.gz) reported in
    http://bugzilla.kernel.org/show_bug.cgi?id=10882.

    Signed-off-by: Duane Griffin
    Signed-off-by: Theodore Ts'o

    Duane Griffin
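
The "validate before matching and advancing" order described in the entry above can be sketched on a simplified record format. struct rec_hdr and find_record() are hypothetical; real ext4 directory entries carry more fields and more checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal record header. */
struct rec_hdr {
    uint16_t rec_len;   /* total length of this record, header included */
    uint16_t tag;       /* stand-in for the name being matched */
};

/*
 * Returns the offset of the first record with the given tag,
 * -1 if no record matches, or -2 if a corrupted record is found.
 */
long find_record(const unsigned char *buf, size_t len, uint16_t tag)
{
    size_t off = 0;

    assert(buf != NULL);
    while (off + sizeof(struct rec_hdr) <= len) {
        struct rec_hdr h;

        memcpy(&h, buf + off, sizeof(h));   /* avoids unaligned access */

        /* Validate first: rec_len == 0 would otherwise loop forever. */
        if (h.rec_len < sizeof(h) || h.rec_len > len - off)
            return -2;
        if (h.tag == tag)
            return (long)off;
        off += h.rec_len;
    }
    return -1;
}
```

A record whose length is zero, too small, or past the end of the buffer stops the walk with an error instead of being followed.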
     
  • While freeing indirect blocks we attach a journal head to the parent buffer
    head, free the blocks, then journal the parent. If the indirect block list
    is corrupted and points to the parent, the journal head will be detached
    when the block is cleared, causing an oops.

    Check for that explicitly and handle it gracefully.

    This patch fixes the third case (image hdb.20000057.nullderef.gz)
    reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882.

    Signed-off-by: Duane Griffin
    Signed-off-by: Theodore Ts'o

    Duane Griffin
     
  • If the orphan node list includes valid, untruncatable nodes with nlink > 0
    the ext4_orphan_cleanup loop which attempts to delete them will not do so,
    causing it to loop forever. Fix by checking for such nodes in the
    ext4_orphan_get function.

    This patch fixes the second case (image hdb.20000009.softlockup.gz)
    reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882.

    Signed-off-by: Duane Griffin
    Signed-off-by: Theodore Ts'o

    Duane Griffin