07 Nov, 2008

3 commits

  • When initializing an uninitialized block group in ext4_new_inode(),
    its block group checksum must be re-calculated. This fixes a race
    when several threads try to allocate a new inode in an UNINIT'd group.

    There is some question whether we need to be initializing the block
    bitmap in ext4_new_inode() at all, but for now, if we are going to
    init the block group, let's eliminate the race.

    Signed-off-by: Frederic Bohe
    Signed-off-by: "Theodore Ts'o"

    Frederic Bohe
     
  • We need to make sure we mark the buffer_heads as dirty and uptodate
    so that block_write_full_page writes them correctly.

    This fixes mmap corruptions that can occur in low memory situations.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • This fixes a 2.6.27 regression which was introduced in commit a02908f1.

    We weren't passing the chunk parameter down to the two subfunctions,
    ext4_indirect_trans_blocks() and ext4_ext_index_trans_blocks(), with
    the result that we massively overestimated the amount of credits
    needed by ext4_da_writepages, especially in the non-extents case.
    This causes failures especially on /boot partitions, which tend to be
    small and not to use extents, since GRUB doesn't handle extents.

    This patch fixes the bug reported by Joseph Fannin at:
    http://bugzilla.kernel.org/show_bug.cgi?id=11964

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

04 Nov, 2008

3 commits

  • In ext4_sync_fs, we only wait for a commit to finish if we started it,
    but there may be one already in progress which will not be synced.

    In the case of a data=ordered umount with pending long symlinks which
    are delayed due to a long list of other I/O on the backing block
    device, this causes the buffer associated with the long symlinks to
    not be moved to the inode dirty list in the second phase of
    fsync_super. Then, before they can be dirtied again, kjournald exits,
    seeing the UMOUNT flag and the dirty pages are never written to the
    backing block device, causing long symlink corruption and exposing new
    or previously freed block data to userspace.

    To ensure all commits are synced, we flush all journal commits now
    when sync_fs'ing ext4.

    Signed-off-by: Arthur Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: "Theodore Ts'o"
    Cc: Eric Sandeen

    Theodore Ts'o
     
  • Use le16_to_cpu to read the s_reserved_gdt_blocks values
    from super block.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
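
A minimal user-space sketch of why the conversion matters: ext4 stores
superblock fields little-endian on disk, so a 16-bit field such as
s_reserved_gdt_blocks has to be decoded before use on a big-endian host.
The helper name read_le16() is hypothetical; it only illustrates the
effect of le16_to_cpu() and is not the kernel implementation.

```c
#include <stdint.h>

/* Decode a 16-bit little-endian on-disk field into host byte order.
 * Working on raw bytes gives the same result on little- and big-endian
 * hosts, which is the guarantee le16_to_cpu() provides in the kernel. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}
```

Reading the raw field without such a conversion happens to work on x86,
which is how this class of bug tends to go unnoticed.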
     
  • If we try to free a block which is already freed, the code was
    returning without first unlocking the group.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
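
The usual shape of this kind of fix is a single exit path that always
drops the lock. The sketch below assumes a toy lock and bitmap; struct
group, group_lock(), group_unlock() and free_block() are hypothetical
names, not the ext4 functions touched by the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a per-group lock and its block bitmap. */
struct group {
    bool locked;
    unsigned long bitmap;   /* bit set = block in use */
};

static void group_lock(struct group *g)   { g->locked = true; }
static void group_unlock(struct group *g) { g->locked = false; }

/* Free one block (bit must be smaller than the width of unsigned long).
 * Returns 0 on success, -1 if the block was already free.
 * Both outcomes leave through the same label, so the lock is always
 * released. */
static int free_block(struct group *g, unsigned int bit)
{
    int err = 0;

    group_lock(g);
    if (!(g->bitmap & (1UL << bit))) {
        err = -1;           /* double free: report it, but still unlock */
        goto out;
    }
    g->bitmap &= ~(1UL << bit);
out:
    group_unlock(g);
    return err;
}
```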
     

28 Oct, 2008

3 commits

  • As reported by Eric Paris, the capable() check in ext4_has_free_blocks()
    sometimes causes SELinux denials.

    We can rearrange the logic so that we only try to use the root-reserved
    blocks when necessary, and even then we can move the capable() test
    to last, to avoid the check most of the time.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
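
The reordering described above can be sketched with short-circuit
evaluation. This is a simplified illustration, not the ext4 code: it
ignores dirty (delayed-allocation) blocks, and check_privilege() is a
hypothetical stand-in for capable() that counts how often it is called.

```c
#include <stdbool.h>
#include <stdint.h>

static int capable_calls;      /* how often the privilege check has run */
static bool is_privileged;     /* hypothetical result of the check */

static bool check_privilege(void)
{
    capable_calls++;
    return is_privileged;
}

/* Return true if nblocks can be allocated.
 * free_blocks: blocks currently free; root_blocks: blocks reserved for
 * root. */
static bool has_free_blocks(uint64_t free_blocks, uint64_t root_blocks,
                            uint64_t nblocks)
{
    /* Common case: enough space without touching the reserve. */
    if (free_blocks >= root_blocks + nblocks)
        return true;

    /* Consult the privilege check last, and only if dipping into the
     * reserve would actually satisfy the request. */
    if (free_blocks >= nblocks && check_privilege())
        return true;

    return false;
}
```

With this ordering an unprivileged process that never gets near the
reserve never triggers the check, so no denial is logged for it.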
     
  • Mingming pointed out that ext4_claim_free_blocks & ext4_has_free_blocks
    are largely cut & pasted; they can be collapsed/merged as follows.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • Vegard Nossum reported a bug which accesses freed memory (found via
    kmemcheck). When the journal has been aborted, ext4_put_super() calls
    ext4_abort() after freeing the journal_t object, and then ext4_abort()
    accesses it. This patch fixes it.

    Signed-off-by: Hidehiro Kawai
    Acked-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Hidehiro Kawai
     

26 Oct, 2008

1 commit

  • Fix a regression caused by commit d0156417, "ext4: fix ext4_dx_readdir
    hash collision handling", where deleting files in a large directory
    (requiring more than one getdents system call) results in some
    filenames being returned twice. This was caused by a failure to
    update info->curr_hash and info->curr_minor_hash, so that if the
    directory had been modified since the last getdents() system call
    (as would be the case if the user is running "rm -r" or "git clean"),
    a directory entry would be returned twice to userspace.

    This patch fixes the bug reported by Markus Trippelsdorf at:
    http://bugzilla.kernel.org/show_bug.cgi?id=11844

    Signed-off-by: "Theodore Ts'o"
    Tested-by: Markus Trippelsdorf

    Theodore Ts'o
     

24 Oct, 2008

2 commits

  • Signed-off-by: Christoph Hellwig
    [ All users removed in "switch all filesystems over to d_obtain_alias",
    aka commit 440037287c5ebb07033ab927ca16bb68c291d309 ]
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
    [PATCH] kill the rest of struct file propagation in block ioctls
    [PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
    [PATCH] get rid of blkdev_locked_ioctl()
    [PATCH] get rid of blkdev_driver_ioctl()
    [PATCH] sanitize blkdev_get() and friends
    [PATCH] remember mode of reiserfs journal
    [PATCH] propagate mode through swsusp_close()
    [PATCH] propagate mode through open_bdev_excl/close_bdev_excl
    [PATCH] pass fmode_t to blkdev_put()
    [PATCH] kill the unused bsize on the send side of /dev/loop
    [PATCH] trim file propagation in block/compat_ioctl.c
    [PATCH] end of methods switch: remove the old ones
    [PATCH] switch sr
    [PATCH] switch sd
    [PATCH] switch ide-scsi
    [PATCH] switch tape_block
    [PATCH] switch dcssblk
    [PATCH] switch dasd
    [PATCH] switch mtd_blkdevs
    [PATCH] switch mmc
    ...

    Linus Torvalds
     

23 Oct, 2008

2 commits


21 Oct, 2008

2 commits


18 Oct, 2008

1 commit


17 Oct, 2008

4 commits


16 Oct, 2008

3 commits


14 Oct, 2008

3 commits


13 Oct, 2008

1 commit

  • fs/ext4/super.c: In function 'ext4_fill_super':
    fs/ext4/super.c:2226: error: 'ext4_ui_proc_fops' undeclared (first use
    in this function)
    fs/ext4/super.c:2226: error: (Each undeclared identifier is reported
    only once
    fs/ext4/super.c:2226: error: for each function it appears in.)

    Signed-off-by: Alexander Beregalov
    Signed-off-by: Theodore Ts'o

    Alexander Beregalov
     

11 Oct, 2008

5 commits

  • We need to make sure we don't reuse the data blocks released
    during the transaction until the transaction commits. We force
    this mode only for ordered and journalled mode. Writeback mode
    already doesn't provide data consistency.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Theodore Ts'o

    Aneesh Kumar K.V
     
  • During filesystem recovery we may be doing a truncate
    which expects some of the mballoc data structures to
    be initialized. So do ext4_mb_init before recovery.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Theodore Ts'o

    Aneesh Kumar K.V
     
  • If the journal doesn't abort when it gets an IO error in file data
    blocks, the file data corruption will spread silently. Because
    most applications and commands do buffered writes without fsync(),
    they don't notice the IO error. That is scary for mission-critical
    systems. On the other hand, if the journal aborts whenever it gets
    an IO error in file data blocks, the system will easily become
    inoperable. So this patch introduces a filesystem option to
    determine whether it aborts the journal or just calls printk() when
    it gets an IO error in file data.

    If you mount an ext4 fs with the data_err=abort option, it aborts on
    a file data write error. If you mount it with data_err=ignore, it
    doesn't abort; it just calls printk(). data_err=ignore is the default.

    Here is the corresponding patch of the ext3 version:
    http://kerneltrap.org/mailarchive/linux-kernel/2008/9/9/3239374

    Signed-off-by: Hidehiro Kawai
    Signed-off-by: Theodore Ts'o

    Hidehiro Kawai
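
The policy the option selects can be sketched in a few lines. This is a
user-space illustration under assumed names: enum data_err_mode, struct
journal_state and handle_data_io_error() are hypothetical, and
fprintf() stands in for printk().

```c
#include <stdbool.h>
#include <stdio.h>

/* Mirrors the two values of the data_err= mount option. */
enum data_err_mode { DATA_ERR_IGNORE, DATA_ERR_ABORT };

struct journal_state {
    enum data_err_mode mode;
    bool aborted;
};

/* Called when writing a file data block fails. The error is always
 * logged; the journal is aborted only in data_err=abort mode. */
static void handle_data_io_error(struct journal_state *j)
{
    fprintf(stderr, "I/O error while writing file data\n");
    if (j->mode == DATA_ERR_ABORT)
        j->aborted = true;
}
```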
     
  • If the journal has aborted due to a checkpointing failure, we
    have to keep the contents of the journal space. Otherwise, the
    filesystem will lose uncheckpointed metadata completely and
    become inconsistent. To avoid this, we need to keep the
    needs_recovery flag if the checkpoint has failed.

    With this patch, ext4_put_super() detects a checkpointing failure
    from the return value of journal_destroy(), then it invokes
    ext4_abort() to make the filesystem read-only and keep the
    needs_recovery flag. Errors from jbd2_journal_flush() are also
    handled by this patch in some places.

    Signed-off-by: Hidehiro Kawai
    Signed-off-by: Theodore Ts'o

    Hidehiro Kawai
     
  • The ext4 filesystem is getting stable enough that it's time to drop
    the "dev" prefix. Also remove the requirement for the TEST_FILESYS
    flag.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

10 Oct, 2008

3 commits

  • This fixes a bug which caused on-line resizing of filesystems with a
    1k blocksize to fail. The root cause of this bug was the fact that if
    an uninitialized bitmap block gets read in by userspace (which
    e2fsprogs does try to avoid, but can happen when the blocksize is less
    than the pagesize and an adjacent block is read into memory)
    ext4_read_block_bitmap() was erroneously depending on the buffer
    uptodate flag to decide whether it needed to initialize the bitmap
    block in memory --- i.e., to set the standard set of blocks in use by
    a block group (superblock, bitmaps, inode table, etc.). Essentially,
    ext4_read_block_bitmap() assumed it was the only routine that might
    try to read a block containing a block bitmap, which is simply not
    true.

    To fix this, ext4_read_block_bitmap() and ext4_read_inode_bitmap()
    must always initialize uninitialized bitmap blocks. Once a block or
    inode is allocated out of that bitmap, it will be marked as
    initialized in the block group descriptor, so in general this won't
    result in any extra unnecessary work.

    Signed-off-by: Frederic Bohe
    Signed-off-by: "Theodore Ts'o"

    Frederic Bohe
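
The difference between the two decision rules can be sketched as
follows. All names here are hypothetical, the flag value is chosen for
illustration, and the "standard set of blocks" is reduced to a fixed
pattern; the point is only that the group descriptor flag, not the
buffer's uptodate state, decides whether the bitmap is rebuilt.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BG_BLOCK_UNINIT 0x2     /* illustrative flag value for this sketch */

struct group_desc { uint16_t flags; };

struct bitmap_buf {
    bool uptodate;              /* someone has already read this block in */
    uint8_t bits[8];
};

/* Returns true if the bitmap was rebuilt in memory. */
static bool read_block_bitmap(const struct group_desc *gd,
                              struct bitmap_buf *bh)
{
    if (gd->flags & BG_BLOCK_UNINIT) {
        /* Rebuild even when the buffer is uptodate: another reader may
         * have pulled in whatever happened to be on disk. */
        memset(bh->bits, 0, sizeof(bh->bits));
        bh->bits[0] = 0x0f;     /* pretend 4 blocks hold group metadata */
        bh->uptodate = true;
        return true;
    }
    /* Initialized group: the on-disk contents are authoritative. A real
     * implementation would read the block here if it is not uptodate. */
    bh->uptodate = true;
    return false;
}
```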
     
  • Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • With modern hard drives, reading 64k takes roughly the same time as
    reading a 4k block. So request readahead for adjacent inode table
    blocks to reduce the time it takes when iterating over directories
    (especially when doing this in htree sort order) in a cold cache case.
    With this patch, the time it takes to run "git status" on a kernel
    tree after flushing the caches via "echo 3 > /proc/sys/vm/drop_caches"
    is reduced by 21%.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
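
One way to pick the adjacent blocks is to round the requested block down
to a power-of-two window and clamp the window to the inode table. The
sketch below assumes that approach; struct ra_range and
inode_readahead_range() are hypothetical names, and a 16-block window
corresponds to 64k with 4k blocks.

```c
#include <stdint.h>

struct ra_range {
    uint64_t begin;   /* first block to read ahead (inclusive) */
    uint64_t end;     /* one past the last block (exclusive) */
};

/* ra_blks must be a power of two. table_start and table_end bound the
 * group's inode table; table_end is exclusive. */
static struct ra_range inode_readahead_range(uint64_t block, uint64_t ra_blks,
                                             uint64_t table_start,
                                             uint64_t table_end)
{
    struct ra_range r;

    r.begin = block & ~(ra_blks - 1);   /* round down to window boundary */
    r.end = r.begin + ra_blks;
    if (r.begin < table_start)
        r.begin = table_start;          /* do not read before the table */
    if (r.end > table_end)
        r.end = table_end;              /* do not read past the table */
    return r;
}
```

For an inode table spanning blocks 1030 to 1099, a request for block
1037 with a 16-block window yields the range 1030 to 1039.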
     

09 Oct, 2008

1 commit

  • ext4_xattr_set_handle() eventually ends up calling
    ext4_mark_inode_dirty(), which tries to expand the inode by shifting
    the EAs. This causes the xattr_sem to be downed a second time,
    resulting in a deadlock.

    This patch makes sure that if ext4_xattr_set_handle() is in the
    call-chain, ext4_mark_inode_dirty() will not expand the inode.

    Signed-off-by: Kalpak Shah
    Signed-off-by: "Theodore Ts'o"

    Kalpak Shah
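
A toy model of the call chain shows how a per-inode "do not expand" flag
breaks the cycle. Everything here is a hypothetical stand-in: the
struct, the flag and the function names are not the kernel's, and a
counter replaces the real semaphore so that a second acquisition is
recorded instead of blocking forever.

```c
#include <stdbool.h>

struct inode_sketch {
    bool xattr_sem_held;    /* stand-in for the xattr semaphore */
    bool no_expand;         /* set while the xattr setter is on the stack */
    int expand_attempts;    /* how often inode expansion ran */
    int deadlocks;          /* attempts to take the semaphore twice */
};

static void take_xattr_sem(struct inode_sketch *i)
{
    if (i->xattr_sem_held)
        i->deadlocks++;     /* real code would block here forever */
    i->xattr_sem_held = true;
}

static void mark_inode_dirty(struct inode_sketch *i)
{
    if (i->no_expand)
        return;             /* caller already holds the semaphore */
    take_xattr_sem(i);
    i->expand_attempts++;   /* shifting the EAs would happen here */
    i->xattr_sem_held = false;
}

static void xattr_set_handle(struct inode_sketch *i)
{
    take_xattr_sem(i);
    i->no_expand = true;    /* forbid expansion further down the chain */
    mark_inode_dirty(i);
    i->no_expand = false;
    i->xattr_sem_held = false;
}
```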
     

07 Oct, 2008

2 commits


06 Oct, 2008

1 commit