16 May, 2008

1 commit

  • If the block allocator returns blocks from the system zone, ext4 calls
    ext4_error(). If the file system is mounted with errors=continue, we
    retry the block allocation. We need to mark the system zone blocks as
    in use so that the retry doesn't pick them again.

    The system zone is the block range that holds the block bitmap, inode
    bitmap and inode table.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
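
The retry behaviour described above can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct group`, `NBLOCKS`, `first_free()` and `alloc_block()` are hypothetical names, and a plain `bool` array stands in for the block bitmap; it is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 16 /* illustrative group size, not an ext4 constant */

/* Hypothetical stand-in for one block group and its system zone. */
struct group {
    bool in_use[NBLOCKS]; /* simplified block bitmap */
    size_t zone_start;    /* first block of the system zone */
    size_t zone_len;      /* number of blocks in the system zone */
};

static bool in_system_zone(const struct group *g, size_t blk)
{
    return blk >= g->zone_start && blk < g->zone_start + g->zone_len;
}

/* Return the first block the bitmap claims is free, or -1 if none. */
static long first_free(const struct group *g)
{
    for (size_t i = 0; i < NBLOCKS; i++)
        if (!g->in_use[i])
            return (long)i;
    return -1;
}

/*
 * Allocate one block. If the (possibly corrupted) bitmap hands out a
 * system zone block, mark the whole zone as in use and retry, so the
 * retry cannot pick the same blocks again.
 */
long alloc_block(struct group *g)
{
    assert(g->zone_start + g->zone_len <= NBLOCKS);
    for (;;) {
        long blk = first_free(g);
        if (blk < 0)
            return -1; /* group is full */
        if (in_system_zone(g, (size_t)blk)) {
            for (size_t i = 0; i < g->zone_len; i++)
                g->in_use[g->zone_start + i] = true;
            continue; /* the errors=continue case: retry */
        }
        g->in_use[blk] = true;
        return blk;
    }
}
```

Without the marking step, every retry would find the same system zone block first and the loop would never make progress.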
     

15 May, 2008

1 commit

  • This fixes the uninitialized bs when we try to replace an xattr entry in
    ibody with a new value that requires more than the available free space.

    This situation only happens when we format ext3/4 with an inode size larger
    than 128 and have put xattr entries both in ibody and in a block. The
    consequence of this bug is that we lose the xattr block pointed to by
    i_file_acl, with all xattr entries in it. We allocate a new xattr block and
    put the large value entry in it. The old xattr block becomes an orphan block.

    Signed-off-by: Tiger Yang
    Cc: Andreas Gruenbacher
    Acked-by: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tiger Yang
     

14 May, 2008

5 commits

  • In case of inode preallocation, the number of blocks to allocate depends
    on the file size and it is calculated in ext4_mb_normalize_request().
    Each group in the filesystem is then checked to find one that can be
    used for allocation; this is done in ext4_mb_good_group().

    When a file bigger than 4MB is created, the requested number of blocks
    to preallocate, as calculated by ext4_mb_normalize_request(), is 4096.
    However, for a filesystem with a 1KB block size, the maximum size of the
    block buddies used by the multiblock allocator is 2048, so none of the
    groups in the filesystem satisfies the search criteria in
    ext4_mb_good_group(). Scanning all the filesystem groups impacts
    performance.

    This was demonstrated by using a freshly created, 70GB, 1k block
    filesystem, with caches dropped right before the test via
    /proc/sys/vm/drop_caches, and with the filesystem mounted with
    nodelalloc and with nodelalloc,nomballoc. Writing an 8 megabyte
    file using "dd if=/dev/zero of=/mnt/test/fo bs=8k count=1k conv=fsync"
    took 35.5091 seconds (236kB/s) with nodelalloc, and 0.233754 seconds
    (35.9 MB/s) with the nodelalloc,nomballoc options. With a 1TB partition,
    it took several minutes to write 8MB!

    This patch modifies the algorithm in ext4_mb_normalize_group_request to
    calculate the number of blocks to allocate by taking into account the
    maximum size of free blocks chunks handled by the multiblock allocator.

    It has also been tested for filesystems with 2KB and 4KB block sizes to
    ensure that those cases don't regress.

    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: Valerie Clement
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Valerie Clement
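
As a rough illustration of the idea in this commit message, the request size can be capped at the largest chunk the buddy allocator can track. This is a sketch only: `clamp_prealloc_request()` and its parameters are hypothetical names, and the real patch may compute the limit differently.

```c
#include <assert.h>

/*
 * Cap a preallocation request (in blocks) at the largest free chunk the
 * buddy data can describe, so that at least one group can qualify.
 * Both parameter names are assumptions made for this sketch.
 */
unsigned int clamp_prealloc_request(unsigned int requested_blocks,
                                    unsigned int max_chunk_blocks)
{
    assert(max_chunk_blocks > 0);
    return requested_blocks > max_chunk_blocks ? max_chunk_blocks
                                               : requested_blocks;
}
```

With the numbers quoted above (a 4096-block request against a 2048-block maximum on a 1KB block filesystem), the clamped request becomes 2048, which a group with a free chunk of that size can satisfy.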
     
  • Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • In journal=data mode, it is not enough to do write_inode_now as done in
    vfs_quota_on() to write all data to their final location (which is
    needed for quota_read to work correctly). Calling journal_flush() does
    its job.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • When quota is disabled, we should not print 'journaled quota not
    supported' when the user tries to mount with non-journaled quota. Also
    fix a typo in the message.

    Signed-off-by: Jan Kara
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • We should not allow the user to change quota mount options when quota is
    just suspended. It would make mount options and internal quota state
    inconsistent. We should also not allow the user to change the quota format
    when quota is turned on. On the other hand, we can silently ignore an
    option that is set to the value it already has (mount does this on
    remount).

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

13 May, 2008

1 commit

  • bdevname() fills the buffer that it is given as a parameter, so calling
    strcpy() or snprintf() on the returned value is redundant (and probably not
    guaranteed to work - I don't think strcpy and snprintf support overlapping
    buffers.)

    Signed-off-by: Jean Delvare
    Cc: Stephen Tweedie
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jean Delvare
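
The pattern being removed can be illustrated in user space with a stand-in function. `fake_devname()` below is hypothetical and only mimics the "fill the caller's buffer and return it" contract described above; it is not the kernel's bdevname().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32

/* Hypothetical stand-in: fills buf (NAME_LEN bytes) and returns it. */
static const char *fake_devname(int minor, char *buf)
{
    snprintf(buf, NAME_LEN, "sda%d", minor);
    return buf;
}

const char *describe_device(int minor, char *buf)
{
    assert(buf != NULL);
    /*
     * Redundant, and undefined behaviour in ISO C because source and
     * destination overlap:
     *     strcpy(buf, fake_devname(minor, buf));
     * The call alone already leaves the name in buf.
     */
    return fake_devname(minor, buf);
}
```

The returned pointer is the caller's own buffer, so nothing needs to be copied afterwards.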
     

30 Apr, 2008

14 commits

  • In ext4_mb_init_backend() 'i' is of type ext4_group_t. Since that type is
    unsigned, i >= 0 is always true, so fix the hot spins after err_freebuddy:
    and -meta: and prevent decrementing i when it is zero.

    Signed-off-by: Roel Kluin
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Roel Kluin
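
A common way to write a terminating reverse loop over an unsigned index is shown below. It is a minimal sketch: `group_t` is a stand-in for ext4_group_t, and `release_in_reverse()` is a hypothetical helper, not the cleanup code in mballoc.c.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int group_t; /* stand-in for ext4_group_t (assumption) */

/*
 * Clear slots[n-1] .. slots[0] and return how many were cleared.
 *
 * The broken form would be:
 *     for (i = n - 1; i >= 0; i--)
 * With an unsigned i the condition is always true, and i wraps around
 * to a huge value after reaching zero.
 */
unsigned int release_in_reverse(int *slots, group_t n)
{
    unsigned int released = 0;
    group_t i = n;

    assert(slots != NULL || n == 0);
    while (i-- > 0) { /* test first, then decrement: stops at zero */
        slots[i] = 0;
        released++;
    }
    return released;
}
```

The `while (i-- > 0)` form also handles n == 0 without touching the array.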
     
  • 'copied' is unsigned, whereas 'ret2' is not, so the test (copied < 0)
    can never be true.

    Signed-off-by: Roel Kluin
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Roel Kluin
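
One way to avoid this class of bug is to test the signed return value before storing it in the unsigned variable. The sketch below assumes a hypothetical callee, `fake_write_end()`, that returns a byte count or a negative error code; it does not reproduce the ext4 write path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee: returns bytes copied, or a negative error code. */
static int fake_write_end(int simulate_error)
{
    return simulate_error ? -5 : 42;
}

/*
 * Store the result in *copied only after the signed value has been
 * checked. Testing an unsigned variable for "< 0" would always be false.
 */
int finish_write(int simulate_error, size_t *copied)
{
    assert(copied != NULL);
    int ret2 = fake_write_end(simulate_error);

    if (ret2 < 0)
        return ret2; /* error is still visible while the value is signed */
    *copied = (size_t)ret2;
    return 0;
}
```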
     
  • Move function and structure definitions out of mballoc.c and put them in
    a new header file, mballoc.h

    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
     
  • This patch allows compiling mballoc with:
    #define AGGRESSIVE_CHECK
    #define DOUBLE_CHECK
    #define MB_DEBUG

    It fixes:
    Compilation errors:
    fs/ext4/mballoc.c: In function '__mb_check_buddy':
    fs/ext4/mballoc.c:605: error: 'struct ext4_prealloc_space' has no member named 'group_list'
    fs/ext4/mballoc.c:606: error: 'struct ext4_prealloc_space' has no member named 'pstart'
    fs/ext4/mballoc.c:608: error: 'struct ext4_prealloc_space' has no member named 'len'

    Compilation warnings:
    fs/ext4/mballoc.c: In function 'ext4_mb_normalize_group_request':
    fs/ext4/mballoc.c:2863: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'
    fs/ext4/mballoc.c: In function 'ext4_mb_use_inode_pa':
    fs/ext4/mballoc.c:3103: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'

    Sparse check:
    fs/ext4/mballoc.c:3818:2: warning: context imbalance in 'ext4_mb_show_ac' - different lock contexts for basic block

    Signed-off-by: Solofo Ramangalahy
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Solofo Ramangalahy
     
  • Because ext4_check_descriptors is called at mount time, you can't use
    ext4_error, as it calls ext4_commit_sb, which causes bad things to happen
    (i.e. a panic) because the sb isn't fully initialized yet. This patch
    changes the ext4_error calls to printk calls to avoid this problem.

    Signed-off-by: Josef Bacik
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Josef Bacik
     
  • We should mark the inode dirty only after initializing the extent
    tree. Also if we fail during extent initialization we need
    to call DQUOT_FREE_INODE.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • The recently announced "Linux POSIX file system test suite"
    caught a truncate issue when using extents:
    mtime and ctime are not updated when truncate is successful.

    This is the single issue caught with "default" ext4 (mkfs and mount
    with minimal options).
    The testsuite does not report failure with -o noextents.

    With the following patch, all tests of the testsuite pass.

    Signed-off-by: Solofo Ramangalahy
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Solofo Ramangalahy
     
  • We can't do GFP_NOFS allocation after taking ext4_lock_group

    BUG: sleeping function called from invalid context at mm/slab.c:3054
    in_atomic():1, irqs_disabled():0
    1 lock held by vi/2426:
    #0: (&ei->i_data_sem){----}, at: [] ext4_release_file+0x23/0x66
    Pid: 2426, comm: vi Not tainted 2.6.25-rc7 #24
    [] __might_sleep+0xbe/0xc5
    [] kmem_cache_alloc+0x22/0xa6
    [] ext4_mb_release_inode_pa+0x73/0x1b3
    [] ext4_mb_discard_inode_preallocations+0x22d/0x2d4
    [] ? param_set_ushort+0x32/0x39
    [] ext4_discard_reservation+0x27/0x6a
    [] ext4_release_file+0x2a/0x66
    [] __fput+0xae/0x155
    [] fput+0x17/0x19
    [] filp_close+0x50/0x5a
    [] sys_close+0x71/0xad
    [] sysenter_past_esp+0x5f/0xa5

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • Move ext4 headers out of include/linux. This is just the trivial move;
    there are some more things that could be done later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Christoph Hellwig
     
  • This fixes the problem of allocating with GFP_KERNEL while under a
    transaction in ext4. This patch is the same as its ext3 counterpart; it just
    switches these allocations to GFP_NOFS.

    Signed-off-by: Josef Bacik
    Signed-off-by: Andrew Morton
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Josef Bacik
     
  • Call dquot_drop() from ext4_dquot_drop() even if we fail to start a
    transaction. Otherwise we never get to dropping references to quota structures
    from the inode and umount will hang indefinitely. Thanks to Payphone LIOU for
    spotting the problem.

    Signed-off-by: Jan Kara
    Signed-off-by: Mingming Cao
    CC: Payphone LIOU
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • The patch below makes ext4 update mtime and ctime of the directory
    into which we move a file even if the directory entry already exists.

    Signed-off-by: Jan Kara
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • I checked ext4_ioctl and it looked largely safe to be used
    without the BKL. So convert it over to unlocked_ioctl.

    Signed-off-by: Andi Kleen
    Signed-off-by: Theodore Ts'o

    Andi Kleen
     
  • When we convert an uninitialized extent to an initialized extent
    we need to make sure we return the number of blocks in the
    extent counted from the file system block corresponding to the
    requested logical file block. Otherwise we cache wrong extent
    details and this results in file system corruption.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
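
The arithmetic behind "the number of blocks from the block corresponding to the logical file block" can be sketched as follows. `struct extent` and `blocks_from()` are simplified, hypothetical stand-ins for illustration, not the ext4 extent structures.

```c
#include <assert.h>

/* Simplified extent: len blocks, logical and physical both contiguous. */
struct extent {
    unsigned int first_logical;        /* first file block covered */
    unsigned long long first_physical; /* matching file system block */
    unsigned int len;                  /* number of blocks in the extent */
};

/*
 * For a request starting at 'logical' inside the extent, report the
 * matching physical block and return how many blocks remain from that
 * point to the end of the extent (not the length of the whole extent).
 */
unsigned int blocks_from(const struct extent *ex, unsigned int logical,
                         unsigned long long *physical)
{
    assert(logical >= ex->first_logical &&
           logical - ex->first_logical < ex->len);

    unsigned int offset = logical - ex->first_logical;

    *physical = ex->first_physical + offset;
    return ex->len - offset;
}
```

For an extent covering file blocks 100 to 109 at physical block 5000, a request for block 103 maps to physical block 5003 with 7 blocks remaining; returning 10 instead would describe blocks beyond the end of the extent.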
     

29 Apr, 2008

7 commits

  • mballoc.c is a whole lot of static functions, which gcc seems to
    really like to inline.

    With the changes below, on x86, I can at least get from:

    432 ext4_mb_new_blocks
    240 ext4_mb_free_blocks
    208 ext4_mb_discard_group_preallocations
    188 ext4_mb_seq_groups_show
    164 ext4_mb_init_cache
    152 ext4_mb_release_inode_pa
    136 ext4_mb_seq_history_show
    ...

    to

    220 ext4_mb_free_blocks
    188 ext4_mb_seq_groups_show
    176 ext4_mb_regular_allocator
    164 ext4_mb_init_cache
    156 ext4_mb_new_blocks
    152 ext4_mb_release_inode_pa
    136 ext4_mb_seq_history_show
    124 ext4_mb_release_group_pa
    ...

    which still has some big functions in there, but not 432 bytes!

    Signed-off-by: Eric Sandeen
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • ext4_ext_get_blocks() returns the number of blocks allocated with buffer
    head unmapped for a read from prealloc space. This is needed so that
    delayed allocation doesn't do block reservation for prealloc space
    since the blocks are already reserved on disk. Mark the buffer head
    unwritten. Some code paths try to read the block if the buffer_head is
    not new and not uptodate. Marking the buffer head unwritten avoids this
    reading.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • ext4_ext_get_blocks() returns the number of blocks allocated with buffer
    heads unmapped for a read from prealloc space. This is needed so that
    delayed allocation doesn't do block reservation for prealloc space since
    the blocks are already reserved on disk. Fix ext4_ext_get_blocks to not
    return more than max_blocks, since some of the code paths cannot
    handle such a return value.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • ext4_fallocate needs to update file size in each transaction. Otherwise
    if we crash the file size won't be seen. We were also not marking
    the inode dirty after updating file size before. Also when we try to
    retry allocation due to ENOSPC, make sure we reset the variable ret so
    that we actually do a retry.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
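
The retry detail mentioned above (resetting ret before trying again) can be illustrated with a small user-space loop. This is a sketch under assumptions: `struct fake_alloc`, `fake_allocate()` and `allocate_with_retry()` are hypothetical, and the loop only imitates the general shape of an allocate-until-done loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical allocator that fails with -ENOSPC a fixed number of times. */
struct fake_alloc {
    int failures_left;
};

static int fake_allocate(struct fake_alloc *a, int want)
{
    if (a->failures_left > 0) {
        a->failures_left--;
        return -ENOSPC;
    }
    return want; /* all requested blocks allocated */
}

/* Returns the number of blocks allocated, or a negative error code. */
int allocate_with_retry(struct fake_alloc *a, int total, int max_retries)
{
    int retries = 0, done, ret;

    assert(a != NULL && total > 0);
retry:
    ret = 0; /* without this reset, the loop below is skipped on retry */
    done = 0;
    while (ret >= 0 && done < total) {
        ret = fake_allocate(a, total - done);
        if (ret > 0)
            done += ret;
    }
    if (ret == -ENOSPC && retries++ < max_retries)
        goto retry;
    return ret < 0 ? ret : done;
}
```

If ret kept its -ENOSPC value across the jump, the `ret >= 0` loop condition would be false immediately and the "retry" would do nothing.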
     
  • Fail migrate if we allocated new blocks via mmap write.

    If we write to holes in the file via mmap, we end up allocating
    new blocks. This block allocation happens without taking inode->i_mutex.
    Since migrate is protected by i_mutex and migrate expects that no
    new blocks get allocated during migrate, fail migrate if new blocks
    get allocated.

    We can't take inode->i_mutex in the mmap write path because that
    would result in a locking order violation between i_mutex and mmap_sem.
    Also, adding a separate rw_semaphore for protection is really high overhead
    for a rare operation such as migrate.

    Signed-off-by: Aneesh Kumar K.V
    Acked-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • This patch handles possible ENOSPC errors when writing to an
    uninitialized extent in case the filesystem is full.

    A write to a prealloc area causes the split of an uninitialized extent
    into initialized and uninitialized extents. If we don't have
    space to add new extent information, instead of returning error,
    convert the existing uninitialized extent to initialized one. We
    need to zero out the blocks corresponding to the entire extent to
    prevent uninitialized data reaching userspace.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • This patch enables extent-formatted normal symlinks. Using extents
    format allows a symlink to refer to a block number larger than 2^32
    on large filesystems. We still don't enable extent format for fast
    symlinks, which are contained in the inode itself.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     

17 Apr, 2008

11 commits