11 Aug, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
    no need for list_for_each_entry_safe()/resetting with superblock list
    Fix sget() race with failing mount
    vfs: don't hold s_umount over close_bdev_exclusive() call
    sysv: do not mark superblock dirty on remount
    sysv: do not mark superblock dirty on mount
    btrfs: remove junk sb_dirt change
    BFS: clean up the superblock usage
    AFFS: wait for sb synchronization when needed
    AFFS: clean up dirty flag usage
    cifs: truncate fallout
    mbcache: fix shrinker function return value
    mbcache: Remove unused features
    add f_flags to struct statfs(64)
    pass a struct path to vfs_statfs
    update VFS documentation for method changes.
    All filesystems that need invalidate_inode_buffers() are doing that explicitly
    convert remaining ->clear_inode() to ->evict_inode()
    Make ->drop_inode() just return whether inode needs to be dropped
    fs/inode.c:clear_inode() is gone
    fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
    ...

    Fix up trivial conflicts in fs/nilfs2/super.c

    Linus Torvalds
     

10 Aug, 2010

1 commit


05 Aug, 2010

1 commit

  • commit 3d0518f4, "ext4: New rec_len encoding for very
    large blocksizes" made several changes to this path, but from
    a perf perspective, un-inlining ext4_rec_len_from_disk() seems
    most significant. This function is called from ext4_check_dir_entry(),
    which on a file-creation workload is called extremely often.

    I tested this with bonnie:

    # bonnie++ -u root -s 0 -f -x 200 -d /mnt/test -n 32

    (this does 200 iterations) and got this for the file creations:

    ext4 stock: Average = 21206.8 files/s
    ext4 inlined: Average = 22346.7 files/s (+5%)

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
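
The commit message above does not spell out the rec_len encoding, so the following is only a sketch of what a small decode helper of this kind could look like, and why it is a natural candidate for inlining on a hot path. The bit layout (low two bits of the 16-bit on-disk value carrying the high bits of the length when blocks can be 64KiB or larger), the `sketch_` names, and the `large_blocks` parameter are assumptions made for illustration, not a copy of the kernel source.

```c
#include <stdint.h>

#define SKETCH_MAX_REC_LEN 0xFFFFu /* assumed sentinel meaning "whole block" */

/*
 * Decode a 16-bit on-disk record length.
 * dlen is assumed to be in CPU byte order already.
 * large_blocks is a hypothetical switch standing in for a build-time
 * check on whether block sizes of 64KiB or more are possible.
 */
static inline unsigned int
sketch_rec_len_from_disk(uint16_t dlen, unsigned int blocksize, int large_blocks)
{
    unsigned int len = dlen;

    if (!large_blocks)
        return len; /* common case: the stored value is the length */

    /* Assumed convention: 0 or the sentinel means "the whole block". */
    if (len == SKETCH_MAX_REC_LEN || len == 0)
        return blocksize;

    /* Lengths are multiples of 4, so the low two bits are free to
     * hold bits 16 and 17 of the real length. */
    return (len & 65532u) | ((len & 3u) << 16);
}
```

In the common small-block case the helper reduces to returning its argument, which is why paying for a function call on every directory entry check can show up in a file-creation benchmark.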
     

02 Aug, 2010

1 commit


27 Jul, 2010

7 commits


30 Jun, 2010

1 commit


29 Jun, 2010

2 commits


15 Jun, 2010

1 commit


12 Jun, 2010

1 commit

  • We don't need to set s_dirt in most of the ext4 code when journaling
    is enabled. In ext3/4 some of the summary statistics for # of free
    inodes, blocks, and directories are calculated from the per-block
    group statistics when the file system is mounted or unmounted. As a
    result the superblock doesn't have to be updated, either via the
    journal or by setting s_dirt. There are a few exceptions, most
    notably when resizing the file system, where the superblock needs to
    be modified --- and in that case it should be done as a journalled
    operation if possible, and s_dirt set only in no-journal mode.

    This patch will optimize out some unneeded disk writes when using ext4
    with a journal.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

28 May, 2010

1 commit


17 May, 2010

8 commits

  • Add a new ext4 state to tell us when a file has been newly created; use
    that state in ext4_sync_file in no-journal mode to tell us when we need
    to sync the parent directory as well as the inode and data itself. This
    fixes a problem in which a panic or power failure may lose the entire
    file even when using fsync, since the parent directory entry is lost.

    Addresses-Google-Bug: #2480057

    Signed-off-by: Frank Mayhar
    Signed-off-by: "Theodore Ts'o"

    Frank Mayhar
     
  • This patch was generated using:

    #!/usr/bin/perl -i
    while (<>) {
    s/[ ]+$//;
    print;
    }

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • struct ext4_new_group_input needs to be converted because u64 has
    only 32-bit alignment on some 32-bit architectures, notably i386.

    Signed-off-by: Ben Hutchings
    Signed-off-by: "Theodore Ts'o"

    Ben Hutchings
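
The alignment problem described above can be sketched in user space. The struct and type names below are hypothetical and the field list is shortened; the point is only that a 64-bit field aligned to 4 bytes (as on i386) yields a different layout from the same field aligned to 8 bytes, so a 64-bit kernel has to copy a 32-bit caller's struct field by field. The `aligned(4)` typedef relies on a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit integer with only 4-byte alignment, mimicking the i386 ABI.
 * GCC and Clang allow a typedef to lower alignment this way. */
typedef uint64_t sketch_compat_u64 __attribute__((aligned(4)));

/* Layout as a native 64-bit build sees it (hypothetical, shortened). */
struct sketch_group_input {
    uint32_t group;
    uint64_t block_bitmap;
    uint64_t inode_bitmap;
    uint32_t blocks_count;
};

/* Layout as a 32-bit i386 caller would pass it: no padding after group. */
struct sketch_compat_group_input {
    uint32_t group;
    sketch_compat_u64 block_bitmap;
    sketch_compat_u64 inode_bitmap;
    uint32_t blocks_count;
};

/* Convert field by field; a plain memcpy would misplace every field
 * after the first on ABIs where the two layouts differ. */
static struct sketch_group_input
sketch_from_compat(const struct sketch_compat_group_input *in)
{
    struct sketch_group_input out;

    out.group = in->group;
    out.block_bitmap = in->block_bitmap;
    out.inode_bitmap = in->inode_bitmap;
    out.blocks_count = in->blocks_count;
    return out;
}
```

With these definitions the compat struct is 24 bytes with `block_bitmap` at offset 4, whereas on a typical x86_64 build the native struct places `block_bitmap` at offset 8.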
     
  • It is unnecessary, and in general impossible, to define the compat
    ioctl numbers except when building the filesystem with CONFIG_COMPAT
    defined.

    Signed-off-by: Ben Hutchings
    Signed-off-by: "Theodore Ts'o"

    Ben Hutchings
     
  • At several places we modify EXT4_I(inode)->i_flags without holding
    i_mutex (ext4_do_update_inode, ...). These modifications are racy and
    we can lose updates to i_flags. So convert handling of i_flags to use
    bitops which are atomic.

    https://bugzilla.kernel.org/show_bug.cgi?id=15792

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
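
To illustrate the idea of the fix above, here is a minimal user-space sketch using C11 atomics rather than the kernel's bitops. The `sketch_` names and the struct are hypothetical. A plain `flags |= mask` is a separate load, modify, and store, so two unlocked writers can overwrite each other's update; an atomic read-modify-write does not have that window.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-inode flags word. */
struct sketch_inode {
    atomic_ulong flags;
};

/* Atomically set one flag bit; concurrent setters cannot lose updates. */
static inline void sketch_set_flag(struct sketch_inode *inode, unsigned int bit)
{
    atomic_fetch_or(&inode->flags, 1UL << bit);
}

/* Atomically clear one flag bit. */
static inline void sketch_clear_flag(struct sketch_inode *inode, unsigned int bit)
{
    atomic_fetch_and(&inode->flags, ~(1UL << bit));
}

/* Read one flag bit. */
static inline bool sketch_test_flag(struct sketch_inode *inode, unsigned int bit)
{
    return (atomic_load(&inode->flags) >> bit) & 1UL;
}
```

Callers would go through these helpers everywhere instead of touching the flags word directly, which is what makes the update safe without holding a mutex.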
     
  • EXT4_ERROR_INODE() tends to provide better error information and in a
    more consistent format. Some errors were not even identifying the inode
    or directory which was corrupted, which made them not very useful.

    Addresses-Google-Bug: #2507977

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • Jack up ext4_get_blocks() and add a new function, ext4_map_blocks(),
    which uses a much smaller structure, struct ext4_map_blocks, which is
    20 bytes, as opposed to a struct buffer_head, which is nearly 5 times
    bigger on an x86_64 machine. By switching callers over to
    ext4_map_blocks(), we can save stack space, since we avoid
    allocating a struct buffer_head on the stack.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
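
As a rough illustration of the size argument above, the sketch below models a small mapping request/result struct whose fields add up to 20 bytes. The field widths are an assumption chosen to match that figure, and the flag value and the toy `sketch_map_linear()` mapper are hypothetical; none of this is the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAP_MAPPED 0x1u /* hypothetical flag value */

/* Small request/result descriptor that can live on the caller's stack. */
struct sketch_map_blocks {
    uint64_t m_pblk;      /* out: first physical block */
    uint32_t m_lblk;      /* in: first logical block */
    unsigned int m_len;   /* in/out: number of blocks */
    unsigned int m_flags; /* out: state of the mapping */
};

/* Sum of the field sizes, ignoring any tail padding the compiler adds. */
static inline size_t sketch_map_payload_bytes(void)
{
    const struct sketch_map_blocks *m = NULL; /* only used inside sizeof */

    return sizeof(m->m_pblk) + sizeof(m->m_lblk) +
           sizeof(m->m_len) + sizeof(m->m_flags);
}

/* Toy mapper: pretend logical block N lives at physical block base + N. */
static inline int sketch_map_linear(struct sketch_map_blocks *map, uint64_t base)
{
    map->m_pblk = base + map->m_lblk;
    map->m_flags = SKETCH_MAP_MAPPED;
    return (int)map->m_len; /* number of blocks mapped */
}
```

On x86_64 `sizeof(struct sketch_map_blocks)` may be reported as 24 because of tail padding, which is still far smaller than a descriptor roughly five times the 20-byte payload.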
     
  • This adds a new field in ext4_group_info to cache the largest available
    block range in a block group, and avoids loading the buddy pages until
    *after* we've done a sanity check on the block group.

    With large allocation requests (e.g., fallocate(), 8MiB) and relatively full
    partitions, it's easy to have no block groups with a block extent large
    enough to satisfy the input request length. This currently causes the loop
    during cr == 0 in ext4_mb_regular_allocator() to load the buddy bitmap pages
    for EVERY block group. That can be a lot of pages. The patch below allows
    us to call ext4_mb_good_group() BEFORE we load the buddy pages (although we
    have to check again after we lock the block group).

    Addresses-Google-Bug: #2578108
    Addresses-Google-Bug: #2704453

    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: "Theodore Ts'o"

    Curt Wohlgemuth
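
The effect of checking a cached summary before loading per-group data can be sketched as follows. Everything here is a simplified model: the struct, the `largest_free_order` field name, and the scan function are hypothetical, and the real allocator also locks the group and re-checks after loading.

```c
#include <stddef.h>

/* Hypothetical in-memory summary kept for each block group. */
struct sketch_group_info {
    unsigned int free_blocks;
    unsigned int largest_free_order; /* cached log2 of largest free chunk */
};

/* Cheap check that uses only the cached summary. */
static int sketch_good_group(const struct sketch_group_info *g, unsigned int order)
{
    return g->free_blocks > 0 && g->largest_free_order >= order;
}

/*
 * Scan groups for one that can satisfy a 2^order block request.
 * Returns how many groups would have needed their buddy data loaded;
 * *found receives the chosen group index, or -1 if none qualifies.
 */
static size_t sketch_scan(const struct sketch_group_info *groups, size_t n,
                          unsigned int order, long *found)
{
    size_t loads = 0;

    *found = -1;
    for (size_t i = 0; i < n; i++) {
        if (!sketch_good_group(&groups[i], order))
            continue; /* skipped without touching any buddy pages */
        loads++;      /* a real allocator would load, lock, and re-check here */
        *found = (long)i;
        break;
    }
    return loads;
}
```

On a nearly full file system where no group has a large enough free chunk, the scan in this model finishes with zero loads instead of one load per group.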
     

06 Mar, 2010

3 commits

  • * 'write_inode2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    pass writeback_control to ->write_inode
    make sure data is on disk before calling ->write_inode

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (36 commits)
    ext4: fix up rb_root initializations to use RB_ROOT
    ext4: Code cleanup for EXT4_IOC_MOVE_EXT ioctl
    ext4: Fix the NULL reference in double_down_write_data_sem()
    ext4: Fix insertion point of extent in mext_insert_across_blocks()
    ext4: consolidate in_range() definitions
    ext4: cleanup to use ext4_grp_offs_to_block()
    ext4: cleanup to use ext4_group_first_block_no()
    ext4: Release page references acquired in ext4_da_block_invalidatepages
    ext4: Fix ext4_quota_write cross block boundary behaviour
    ext4: Convert BUG_ON checks to use ext4_error() instead
    ext4: Use direct_IO_no_locking in ext4 dio read
    ext4: use ext4_get_block_write in buffer write
    ext4: mechanical rename some of the direct I/O get_block's identifiers
    ext4: make "offset" consistent in ext4_check_dir_entry()
    ext4: Handle non empty on-disk orphan link
    ext4: explicitly remove inode from orphan list after failed direct io
    ext4: fix error handling in migrate
    ext4: deprecate obsoleted mount options
    ext4: Fix fencepost error in chosing choosing group vs file preallocation.
    jbd2: clean up an assertion in jbd2_journal_commit_transaction()
    ...

    Linus Torvalds
     
  • This gives the filesystem more information about the writeback that
    is happening. Trond requested this for the NFS unstable write handling,
    and other filesystems might benefit from this too by being able to
    distinguish between the different callers in more detail.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

05 Mar, 2010

1 commit

  • Allocate uninitialized extent before ext4 buffer write and
    convert the extent to initialized after io completes.
    The purpose is to make sure an extent can only be marked
    initialized after it has been written with new data so
    we can safely drop the i_mutex lock in ext4 DIO read without
    exposing stale data. This helps to improve multi-thread DIO
    read performance on high-speed disks.

    Skip the nobh and data=journal mount cases to make things simple for now.

    Signed-off-by: Jiaying Zhang
    Signed-off-by: "Theodore Ts'o"

    Jiaying Zhang
     

04 Mar, 2010

1 commit

  • There are duplicate macro definitions of in_range() in mballoc.h and
    balloc.c. This consolidates these two definitions into ext4.h, and
    changes extents.c to use in_range() as well.

    Signed-off-by: Akinobu Mita
    Signed-off-by: "Theodore Ts'o"
    Cc: Andreas Dilger

    Akinobu Mita
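
For readers unfamiliar with the helper being consolidated, a range check of this kind can be sketched as below. This is written as an inline function rather than a macro, and it assumes the intended meaning is "block b lies in the len-block range starting at first"; the name and the overflow-safe formulation are choices made for this sketch, not the definition from ext4.h.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if b falls inside [first, first + len - 1].
 * Comparing b - first against len avoids overflow in first + len.
 */
static inline bool sketch_in_range(uint64_t b, uint64_t first, uint64_t len)
{
    return len != 0 && b >= first && b - first < len;
}
```

Having a single shared definition means every caller agrees on whether the upper bound is inclusive, which is an easy place for off-by-one differences between duplicated copies.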
     

03 Mar, 2010

2 commits


24 Feb, 2010

1 commit

  • fallocate() may potentially instantiate blocks past EOF, depending
    on the flags used when it is called.

    e2fsck currently has a test for blocks past i_size, and it
    sometimes trips up - noticeably on xfstests 013 which runs fsstress.

    This patch from Jiaying does fix it up - it (along with
    e2fsprogs updates and other patches recently from Aneesh) has
    survived many fsstress runs in a row.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Jiaying Zhang
    Signed-off-by: "Theodore Ts'o"

    Jiaying Zhang
     

17 Feb, 2010

1 commit

  • Add __percpu sparse annotations to fs.

    These annotations are to make sparse consider percpu variables to be
    in a different address space and warn if accessed without going
    through percpu accessors. This patch doesn't affect normal builds.

    Signed-off-by: Tejun Heo
    Cc: "Theodore Ts'o"
    Cc: Trond Myklebust
    Cc: Alex Elder
    Cc: Christoph Hellwig
    Cc: Alexander Viro

    Tejun Heo
     

16 Feb, 2010

1 commit


25 Jan, 2010

3 commits


15 Jan, 2010

1 commit


01 Jan, 2010

1 commit

  • In the past, ext4_calc_metadata_amount(), and its sub-functions
    ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount()
    badly over-estimated the number of metadata blocks that might be
    required for delayed allocation blocks. This didn't matter as much
    when functions which managed the reserved metadata blocks were more
    aggressive about dropping reserved metadata blocks as delayed
    allocation blocks were written, but unfortunately they were too
    aggressive. This was fixed in commit 0637c6f, but as a result the
    over-estimation by ext4_calc_metadata_amount() would lead to reserving
    2-3 times the number of pending delayed allocation blocks as
    potentially required metadata blocks. So if there is 1 megabyte of
    blocks which have not yet been allocated, up to 3 megabytes of
    space would get reserved out of the user's quota and from the file
    system free space pool until all of the inode's data blocks have been
    allocated.

    This commit addresses this problem by much more accurately estimating
    the number of metadata blocks that will be required. It will still
    somewhat over-estimate the number of blocks needed, since it must make
    a worst case estimate not knowing which physical blocks will be
    needed, but it is much more accurate than before.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o