10 Sep, 2011

1 commit


06 Jun, 2011

1 commit

  • Kazuya Mio reported that he was able to hit BUG_ON(next == lblock)
    in ext4_ext_put_gap_in_cache() while creating a sparse file in extent
    format and fill the tail of file up to its end. We will hit the BUG_ON
    when we write the last block (2^32-1) into the sparse file.

    The root cause of the problem lies in the fact that we specifically set
    s_maxbytes so that block at s_maxbytes fit into on-disk extent format,
    which is 32 bit long. However, we are not storing start and end block
    number, but rather start block number and length in blocks. It means
    that in order to cover extent from 0 to EXT_MAX_BLOCK we need
    EXT_MAX_BLOCK+1 to fit into len (because we counting block 0 as well) -
    and it does not.

    The only way to fix it without changing the meaning of the struct
    ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes
    by one fs block so we can cover the whole extent we can get by the
    on-disk extent format.

    Also in many places EXT_MAX_BLOCK is used as length instead of maximum
    logical block number as the name suggests, it is all a bit messy. So
    this commit renames it to EXT_MAX_BLOCKS and change its usage in some
    places to actually be maximum number of blocks in the extent.

    The bug which this commit fixes can be reproduced as follows:

    dd if=/dev/zero of=/mnt/mp1/file bs= count=1 seek=$((2**32-2))
    sync
    dd if=/dev/zero of=/mnt/mp1/file bs= count=1 seek=$((2**32-1))

    Reported-by: Kazuya Mio
    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"

    Lukas Czerner
     

19 May, 2011

1 commit


28 Oct, 2010

1 commit


27 Jul, 2010

1 commit


03 Jun, 2010

1 commit


17 May, 2010

2 commits


11 May, 2010

1 commit

  • Making sure ee_block is initialized to zero to prevent gcc from
    kvetching. It's harmless (although it's not obvious that it's
    harmless) from code inspection:

    fs/ext4/move_extent.c:478: warning: 'start_ext.ee_block' may be used
    uninitialized in this function

    Thanks to Stefan Richter for first bringing this to the attention of
    linux-ext4@vger.kernel.org.

    Signed-off-by: LiuQi
    Signed-off-by: "Theodore Ts'o"
    Cc: Stefan Richter

    Steven Liu
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

04 Mar, 2010

3 commits


16 Feb, 2010

1 commit


07 Dec, 2009

1 commit

  • This patch fixes three problems in the handling of the
    EXT4_IOC_MOVE_EXT ioctl:

    1. In current EXT4_IOC_MOVE_EXT, there are read access mode checks for
    original and donor files, but they allow the illegal write access to
    donor file, since donor file is overwritten by original file data. To
    fix this problem, change access mode checks of original (r->r/w) and
    donor (r->w) files.

    2. Disallow the use of donor files that have a setuid or setgid bits.

    3. Call mnt_want_write() and mnt_drop_write() before and after
    ext4_move_extents() calling to get write access to a mount.

    Signed-off-by: Akira Fujita
    Signed-off-by: "Theodore Ts'o"

    Akira Fujita
     

24 Nov, 2009

3 commits

  • Integrate duplicate lines (acquire/release semaphore and invalidate
    extent cache in move_extent_per_page()) into mext_replace_branches(),
    to reduce source and object code size.

    Signed-off-by: Akira Fujita
    Signed-off-by: "Theodore Ts'o"

    Akira Fujita
     
  • The move_extent.moved_len is used to pass back the number of exchanged
    blocks count to user space. Currently the caller must clear this
    field; but we spend more code space checking for this requirement than
    simply zeroing the field ourselves, so let's just make life easier for
    everyone all around.

    Signed-off-by: Kazuya Mio
    Signed-off-by: Akira Fujita
    Signed-off-by: "Theodore Ts'o"

    Kazuya Mio
     
  • At the beginning of ext4_move_extent(), we call
    ext4_discard_preallocations() to discard inode PAs of orig and donor
    inodes. But in the following case, blocks can be double freed, so
    move ext4_discard_preallocations() to the end of ext4_move_extents().

    1. Discard inode PAs of orig and donor inodes with
    ext4_discard_preallocations() in ext4_move_extents().

    orig : [ DATA1 ]
    donor: [ DATA2 ]

    2. While data blocks are exchanging between orig and donor inodes, new
    inode PAs is created to orig by other process's block allocation.
    (Since there are semaphore gaps in ext4_move_extents().) And new
    inode PAs is used partially (2-1).

    2-1 Create new inode PAs to orig inode
    orig : [ DATA1 | used PA1 | free PA1 ]
    donor: [ DATA2 ]

    3. Donor inode which has old orig inode's blocks is deleted after
    EXT4_IOC_MOVE_EXT finished (3-1, 3-2). So the block bitmap
    corresponds to old orig inode's blocks are freed.

    3-1 After EXT4_IOC_MOVE_EXT finished
    orig : [ DATA2 | free PA1 ]
    donor: [ DATA1 | used PA1 ]

    3-2 Delete donor inode
    orig : [ DATA2 | free PA1 ]
    donor: [ FREE SPACE(DATA1) | FREE SPACE(used PA1) ]

    4. The double-free of blocks is occurred, when close() is called to
    orig inode. Because ext4_discard_preallocations() for orig inode
    frees used PA1 and free PA1, though used PA1 is already freed in 3.

    4-1 Double-free of blocks is occurred
    orig : [ DATA2 | FREE SPACE(free PA1) ]
    donor: [ FREE SPACE(DATA1) | DOUBLE FREE(used PA1) ]

    Signed-off-by: Akira Fujita
    Signed-off-by: "Theodore Ts'o"

    Akira Fujita
     

23 Nov, 2009

4 commits

  • Fix a few spelling typos in move_extent.c

    Signed-off-by: Akira Fujita
    Signed-off-by: "Theodore Ts'o"

    Akira Fujita
     
  • If CONFIG_PROVE_LOCKING is enabled, the double_down_write_data_sem()
    will trigger a false-positive warning of a recursive lock. Since we
    take i_data_sem for the two inodes ordered by their inode numbers,
    this isn't a problem. Use of down_write_nested() will notify the lock
    dependency checker machinery that there is no problem here.

    This problem was reported by Brian Rogers:

    http://marc.info/?l=linux-ext4&m=125115356928011&w=1

    Reported-by: Brian Rogers
    Signed-off-by: Akira Fujita
    Signed-off-by: "Theodore Ts'o"

    Akira Fujita
     
  • ext4_move_extents() checks the logical block contiguousness
    of original file with ext4_find_extent() and mext_next_extent().
    Therefore the extent which ext4_ext_path structure indicates
    must not be changed between above functions.

    But in current implementation, there is no i_data_sem protection
    between ext4_ext_find_extent() and mext_next_extent(). So the extent
    which ext4_ext_path structure indicates may be overwritten by
    delalloc. As a result, ext4_move_extents() will exchange wrong blocks
    between original and donor files. I change the place where
    acquire/release i_data_sem to solve this problem.

    Moreover, I changed move_extent_per_page() to start transaction first,
    and then acquire i_data_sem. Without this change, there is a
    possibility of the deadlock between mmap() and ext4_move_extents():

    * NOTE: "A", "B" and "C" mean different processes

    A-1: ext4_ext_move_extents() acquires i_data_sem of two inodes.

    B: do_page_fault() starts the transaction (T),
    and then tries to acquire i_data_sem.
    But process "A" is already holding it, so it is kept waiting.

    C: While "A" and "B" running, kjournald2 tries to commit transaction (T)
    but it is under updating, so kjournald2 waits for it.

    A-2: Call ext4_journal_start with holding i_data_sem,
    but transaction (T) is locked.

    Signed-off-by: Akira Fujita
    Signed-off-by: "Theodore Ts'o"

    Akira Fujita
     
  • If the EXT4_IOC_MOVE_EXT ioctl fails, the number of blocks that were
    exchanged before the failure should be returned to the userspace
    caller. Unfortunately, currently if the block size is not the same as
    the page size, the returned block count that is returned is the
    page-aligned block count instead of the actual block count. This
    commit addresses this bug.

    Signed-off-by: Akira Fujita
    Signed-off-by: "Theodore Ts'o"

    Akira Fujita
     

29 Sep, 2009

2 commits


17 Sep, 2009

6 commits

  • There's no reason to redefine the maximum allowable offset
    in an extent-based file just for defrag;
    EXT_MAX_BLOCK already does this.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • If logical block offset of original file which is passed to
    EXT4_IOC_MOVE_EXT is different from donor file's,
    a calculation error occurs in ext4_calc_swap_extents(),
    therefore wrong block is exchanged between original file and donor file.
    As a result, we hit ext4_error() in check_block_validity().
    To detect the logical offset difference in EXT4_IOC_MOVE_EXT,
    add checks to mext_calc_swap_extents() and handle it as error,
    since data exchange must be done between the same blocks in EXT4_IOC_MOVE_EXT.

    Reported-by: Peng Tao
    Signed-off-by: Akira Fujita
    Signed-off-by: "Theodore Ts'o"

    Akira Fujita
     
  • There is the possibility that path structure which is taken
    by ext4_ext_find_extent() indicates null extents.
    Because during data block exchanging in ext4_move_extents(),
    constitution of an extent tree may be changed.
    As a solution, the patch adds null extent check
    to ext_get_path().

    Reported-by: Peng Tao
    Signed-off-by: Akira Fujita
    Signed-off-by: "Theodore Ts'o"

    Akira Fujita
     
  • Replace BUG_ON calls with a call to ext4_error()
    to print an error message if EXT4_IOC_MOVE_EXT failed
    with some kind of reasons. This will help to debug.
    Ted pointed this out, thanks.

    Signed-off-by: Akira Fujita
    Signed-off-by: "Theodore Ts'o"

    Akira Fujita
     
  • Replace get_ext_path macro with an inline function,
    since this macro looks like a function call but its arguments
    get modified. Ted pointed this out, thanks.

    Signed-off-by: Akira Fujita
    Signed-off-by: "Theodore Ts'o"

    Akira Fujita
     
  • The mext_check_arguments() function in move_extents.c has wrong
    comparisons. orig_start which is passed from user-space is block
    unit, but i_size of inode is byte unit, therefore the checks do not
    work fine. This mis-check leads to the overflow of 'len' and then
    hits BUG_ON() in ext4_move_extents(). The patch fixes this issue.

    Signed-off-by: Akira Fujita
    Reviewed-by: Greg Freemyer
    Signed-off-by: "Theodore Ts'o"

    Akira Fujita
     

06 Sep, 2009

2 commits

  • This function means moving extents every page, so change its name from
    move_exgtent_par_page().

    Signed-off-by: Akira Fujita
    Signed-off-by: "Theodore Ts'o"

    Akira Fujita
     
  • The ext4_move_extents() functions checks with BUG_ON() whether the
    exchanged blocks count accords with request blocks count. But, if the
    target range (orig_start + len) includes sparse block(s), 'moved_len'
    (exchanged blocks count) does not agree with 'len' (request blocks
    count), since sparse block is not counted in 'moved_len'. This causes
    us to hit the BUG_ON(), even though the function succeeded.

    Signed-off-by: Akira Fujita
    Signed-off-by: "Theodore Ts'o"

    Akira Fujita
     

11 Aug, 2009

1 commit

  • move_extent_par_page calls a_ops->write_begin() to increase journal
    handler's reference count. However, if either mext_replace_branches()
    or ext4_get_block fails, the increased reference count isn't
    decreased. This will cause a later attempt to umount of the fs to hang
    forever. The patch addresses the issue by calling ext4_journal_stop()
    if page is not NULL (which means a_ops->write_end() isn't invoked).

    Signed-off-by: Peng Tao
    Signed-off-by: "Theodore Ts'o"

    Peng Tao
     

18 Jun, 2009

1 commit

  • The EXT4_IOC_MOVE_EXT exchanges the blocks between orig_fd and donor_fd,
    and then write the file data of orig_fd to donor_fd.
    ext4_mext_move_extent() is the main fucntion of ext4 online defrag,
    and this patch includes all functions related to ext4 online defrag.

    Signed-off-by: Akira Fujita
    Signed-off-by: Takashi Sato
    Signed-off-by: Kazuya Mio
    Signed-off-by: "Theodore Ts'o"

    Akira Fujita