11 Jan, 2011

1 commit

  • The addition of 64k block capability in the rec_len_from_disk
    and rec_len_to_disk functions added a bit of math overhead which
    slows down file create workloads needlessly when the architecture
    cannot even support 64k blocks, thanks to page size limits.

    Similar changes already exist in the ext4 codebase.

    The directory entry checking can also be optimized a bit
    by sprinkling in some unlikely() conditions to move the
    error handling out of line.

    bonnie++ sequential file creates on a 512MB ramdisk speed up
    from about 77,000/s to about 82,000/s, about a 6% improvement.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Jan Kara

    Eric Sandeen
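
A minimal userspace sketch of the two ideas in this message, under stated assumptions: the 64k special case is compiled in only when the page size could support 64k blocks, and the rarely-taken error branches in the entry check are marked with unlikely(). The SKETCH_* names, the function names and the exact set of checks are hypothetical and are not copied from the ext3 source; the on-disk value is treated as already converted to host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* unlikely() is a hint for GCC/Clang; elsewhere it degrades to a no-op. */
#if defined(__GNUC__)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define unlikely(x) (x)
#endif

/* Hypothetical stand-in for the kernel's page size constant. */
#ifndef SKETCH_PAGE_SIZE
#define SKETCH_PAGE_SIZE 4096
#endif

#define SKETCH_HEADER_LEN  8u   /* inode + rec_len + name_len + file_type */
#define SKETCH_MIN_REC_LEN 12u  /* header plus a 1-byte name, rounded up to 4 */

static inline unsigned sketch_rec_len_from_disk(uint16_t dlen)
{
#if SKETCH_PAGE_SIZE >= 65536
    /* Only built when 64k blocks are possible at all. */
    if (dlen == 0xffff)
        return 1u << 16;
#endif
    return dlen;
}

/* Returns NULL when the entry looks sane, else a short description. */
const char *sketch_check_entry(unsigned rec_len, unsigned name_len,
                               size_t offset, size_t block_size)
{
    assert(offset <= block_size);
    if (unlikely(rec_len < SKETCH_MIN_REC_LEN))
        return "rec_len is smaller than minimal";
    if (unlikely(rec_len % 4 != 0))
        return "rec_len is not a multiple of 4";
    if (unlikely(rec_len < SKETCH_HEADER_LEN + ((name_len + 3) & ~3u)))
        return "rec_len is too small for name_len";
    if (unlikely(rec_len > block_size - offset))
        return "directory entry crosses the block boundary";
    return NULL;
}
```

With the default 4096-byte page size assumed here, the 0xffff comparison is not compiled at all, which is the kind of saving the message describes.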
     

27 May, 2010

1 commit

  • The problem with this is that 17d9ddc72fb8bba0d4f678 ("rbtree: Add support
    for augmented rbtrees") in the linux-next tree adds a new field to that
    struct which needs to be NULL as well. This patch uses RB_ROOT as the
    initializer so all of the relevant fields will be NULL'd.

    Signed-off-by: Venkatesh Pallipadi
    Cc: Eric Paris
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Jan Kara

    Venkatesh Pallipadi
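
The difference between clearing one field by hand and assigning a whole-struct initializer can be sketched in userspace C. Everything prefixed sketch_ or SKETCH_ is a hypothetical stand-in: the second struct member only represents "a field added later", and the macro mimics the shape of an RB_ROOT-style initializer rather than reproducing the kernel definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_rb_node;

struct sketch_rb_root {
    struct sketch_rb_node *rb_node;
    /* Stands in for a field added to the struct after callers were written. */
    void (*augment_cb)(struct sketch_rb_node *node);
};

/* Compound literal: members not listed are zero-initialized. */
#define SKETCH_RB_ROOT (struct sketch_rb_root){ NULL, }

struct sketch_dir_info {
    struct sketch_rb_root root;
};

/* Simulates memory that was allocated but never cleared. */
void sketch_fill_with_garbage(struct sketch_dir_info *info)
{
    assert(info != NULL);
    memset(info, 0xAA, sizeof *info);
}

/* Old style: clears only the field the author knew about. */
void sketch_init_fieldwise(struct sketch_dir_info *info)
{
    assert(info != NULL);
    info->root.rb_node = NULL;
}

/* New style: every member of the root is reset, including later additions. */
void sketch_init_with_macro(struct sketch_dir_info *info)
{
    assert(info != NULL);
    info->root = SKETCH_RB_ROOT;
}
```

After sketch_init_fieldwise() the second member still holds whatever was in memory; after sketch_init_with_macro() both members are NULL without the caller naming them.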
     

16 Jul, 2009

1 commit

  • Get rid of extenddisksize parameter of ext3_get_blocks_handle(). This seems to
    be a relic from some old days and setting disksize in this function does not
    make much sense. Currently it was set only by ext3_getblk(). Since the
    parameter has some effect only if create == 1, it is easy to check that the
    three callers which end up calling ext3_getblk() with create == 1 (ext3_append,
    ext3_quota_write, ext3_mkdir) do the right thing and set disksize themselves.

    Signed-off-by: Jan Kara

    Jan Kara
     

03 Apr, 2009

1 commit


26 Oct, 2008

1 commit

  • Fix a regression caused by commit 6a897cf4, "ext3: fix ext3_dx_readdir
    hash collision handling", where deleting files in a large directory
    (requiring more than one getdents system call), results in some
    filenames being returned twice. This was caused by a failure to
    update info->curr_hash and info->curr_minor_hash, so that if the
    directory had gotten modified since the last getdents() system call
    (as would be the case if the user is running "rm -r" or "git clean"),
    a directory entry would get returned twice to the userspace.

    This patch fixes the bug reported by Markus Trippelsdorf at:
    http://bugzilla.kernel.org/show_bug.cgi?id=11844

    Signed-off-by: "Theodore Ts'o"
    Tested-by: Markus Trippelsdorf

    Theodore Ts'o
     

20 Oct, 2008

2 commits

  • A very large directory with many read failures (either due to storage
    problems, or due to invalid size & blocks from corruption) will generate a
    printk storm as the filesystem continues to try to read all the blocks.
    This flood of messages can tie up the box until it is complete - which may
    be a very long time, especially for very large corrupted values.

    This is fixed by only reporting the corruption once each time we try to
    read the directory.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"
    Cc: Eugene Teo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
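
The "report once per directory read" idea can be sketched as a flag that is reset at the start of each scan. This is a hedged userspace model: the array of per-block read results, the struct and the function name are all hypothetical, and the counter stands in for the kernel's log message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_scan_result {
    size_t failed_blocks;    /* blocks that could not be read */
    size_t messages_logged;  /* how many times we would have printed */
};

/*
 * block_ok[i] says whether reading block i succeeds (hypothetical input).
 * Every failure is counted, but only the first one in this scan is reported.
 */
struct sketch_scan_result sketch_scan_directory(const bool *block_ok,
                                                size_t nblocks)
{
    struct sketch_scan_result res = { 0, 0 };
    bool reported = false;  /* reset for each scan of the directory */

    assert(block_ok != NULL || nblocks == 0);
    for (size_t b = 0; b < nblocks; b++) {
        if (block_ok[b])
            continue;
        res.failed_blocks++;
        if (!reported) {
            res.messages_logged++;  /* stands in for the error message */
            reported = true;
        }
    }
    return res;
}
```

A scan with three unreadable blocks would record three failures but only one message, so the log volume no longer grows with the size of the corruption.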
     
  • This fixes a bug where readdir() would return a directory entry twice
    if there was a hash collision in a hash tree indexed directory.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Eugene Dashevsky
    Signed-off-by: Mike Snitzer
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eugene Dashevsky
     

26 Jul, 2008

1 commit

  • - remove unnecessary code in free_rb_tree_fname
    - rename free_rb_tree_fname to ext3_htree_create_dir_info
    since it and ext3_htree_free_dir_info are a pair
    - replace kmalloc with kzalloc in ext3_htree_free_dir_info

    Signed-off-by: Shen Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shen Feng
     

15 Nov, 2007

1 commit

  • With 64KB blocksize, a directory entry can have size 64KB which does not
    fit into the 16 bits we have for entry length. So we store 0xffff instead and
    convert the value when it is read from / written to disk. The patch also
    converts some places to use ext3_next_entry() when we are changing them anyway.

    [akpm@linux-foundation.org: coding-style cleanups]
    Signed-off-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
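
The encoding described here can be sketched as a pair of conversion helpers. This is an illustration only: the names are hypothetical, byte-order conversion is left out (the 16-bit value is assumed to be in host order already), and the precondition is checked with assert() rather than whatever the kernel does.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_REC_LEN_SENTINEL 0xffffu     /* value stored in the 16-bit field */
#define SKETCH_FULL_BLOCK_LEN   (1u << 16)  /* 64KB, one bit too wide for 16 bits */

/* 16-bit stored value -> in-memory length. */
unsigned sketch_len_from_disk(uint16_t dlen)
{
    if (dlen == SKETCH_REC_LEN_SENTINEL)
        return SKETCH_FULL_BLOCK_LEN;
    return dlen;
}

/* In-memory length -> 16-bit stored value. */
uint16_t sketch_len_to_disk(unsigned len)
{
    assert(len <= SKETCH_FULL_BLOCK_LEN);  /* nothing larger is representable */
    if (len == SKETCH_FULL_BLOCK_LEN)
        return SKETCH_REC_LEN_SENTINEL;
    return (uint16_t)len;
}
```

The scheme relies on 0xffff never being needed as a real length; this sketch simply assumes that, for example because entry lengths are kept 4-byte aligned.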
     

17 Oct, 2007

3 commits

  • CONFIG_EXT3_INDEX is not an exposed config option in the kernel, and it is
    unconditionally defined in ext3_fs.h. tune2fs is already able to turn off
    dir indexing, so at this point it's just cluttering up the code. Remove
    it.

    Signed-off-by: Eric Sandeen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • Fix f_version type: should be u64 instead of long

    There is a type inconsistency between struct inode i_version and struct file
    f_version.

    fs.h:

    struct inode
    u64 i_version;

    and

    struct file
    unsigned long f_version;

    Users do:

    fs/ext3/dir.c:

    if (filp->f_version != inode->i_version) {

    So why isn't f_version a u64? It becomes a problem if the version gets
    higher than 2^32 and we are on an architecture where longs are 32 bits.

    This patch changes the f_version type to u64, and updates the users accordingly.

    It applies to 2.6.23-rc2-mm2.

    Signed-off-by: Mathieu Desnoyers
    Cc: Martin Bligh
    Cc: "Randy.Dunlap"
    Cc: Al Viro
    Cc:
    Cc: Mark Fasheh
    Cc: Christoph Hellwig
    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     
  • Combine the file_ra_state members
    unsigned long prev_index
    unsigned int prev_offset
    into
    loff_t prev_pos

    It is more consistent and better supports huge files.

    Thanks to Peter for the nice proposal!

    [akpm@linux-foundation.org: fix shift overflow]
    Cc: Peter Zijlstra
    Signed-off-by: Fengguang Wu
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
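
A hedged sketch of folding a page index and an in-page offset into a single byte position. The bracketed note above mentions a shift overflow fix without giving details, so the "narrow" function below only illustrates one way such an overflow can arise when the index is 32 bits wide; it is an assumption, not the actual hunk. All names and the 4KB page size are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

typedef int64_t sketch_loff_t;

/* Widen first, then shift: safe for any 32-bit page index. */
sketch_loff_t sketch_make_pos(uint32_t index, uint32_t offset)
{
    assert(offset < SKETCH_PAGE_SIZE);
    return ((sketch_loff_t)index << SKETCH_PAGE_SHIFT) | offset;
}

/* Shift in 32 bits, then widen: the high bits are already gone. */
sketch_loff_t sketch_make_pos_narrow(uint32_t index, uint32_t offset)
{
    assert(offset < SKETCH_PAGE_SIZE);
    return (sketch_loff_t)(uint32_t)(index << SKETCH_PAGE_SHIFT) | offset;
}

uint32_t sketch_pos_index(sketch_loff_t pos)
{
    return (uint32_t)(pos >> SKETCH_PAGE_SHIFT);
}

uint32_t sketch_pos_offset(sketch_loff_t pos)
{
    return (uint32_t)(pos & (SKETCH_PAGE_SIZE - 1));
}
```

For a page index of 0x200000 (an 8GB file offset with 4KB pages) the widened version round-trips, while the narrow version collapses to the in-page offset alone.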
     

20 Jul, 2007

2 commits

  • Split ondemand readahead interface into two functions. I think this makes it
    a little clearer for non-readahead experts (like Rusty).

    Internally they both call ondemand_readahead(), but the page argument is
    changed to an obvious boolean flag.

    Signed-off-by: Rusty Russell
    Signed-off-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Convert ext3/ext4 dir reads to use on-demand readahead.

    Readahead for dirs operates _not_ on file level, but on blockdev level. This
    makes a difference when the data blocks are not contiguous. And the read
    routine is somewhat opaque: there's no handy info about the status of the
    current page. So a simplified call scheme is employed: call into readahead
    whenever the current page falls outside the readahead window.

    Signed-off-by: Fengguang Wu
    Cc: Steven Pratt
    Cc: Ram Pai
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     

09 May, 2007

1 commit


09 Dec, 2006

1 commit


08 Dec, 2006

1 commit

  • I've been using Steve Grubb's purely evil "fsfuzzer" tool, at
    http://people.redhat.com/sgrubb/files/fsfuzzer-0.4.tar.gz

    Basically it makes a filesystem, splats some random bits over it, then
    tries to mount it and do some simple filesystem actions.

    At best, the filesystem catches the corruption gracefully. At worst,
    things spin out of control.

    As you might guess, we found a couple places in ext3 where things spin out
    of control :)

    First, we had a corrupted directory that was never checked for
    consistency... it was corrupt, and pointed to another bad "entry" of
    length 0. The for() loop looped forever, since the length of
    ext3_next_entry(de) was 0, and we kept looking at the same pointer over and
    over and over and over... I modeled this check and subsequent action on
    what is done for other directory types in ext3_readdir...

    (adding this check adds some computational expense; I am testing a followup
    patch to reduce the number of times we check and re-check these directory
    entries, in all cases. Thanks for the idea, Andreas).

    Next we had a root directory inode which had a corrupted size, claimed to
    be > 200M on a 4M filesystem. There was only really 1 block in the
    directory, but because the size was so large, readdir kept coming back for
    more, spewing thousands of printk's along the way.

    Per Andreas' suggestion, if we're in this read error condition and we're
    trying to read an offset which is greater than i_blocks worth of bytes,
    stop trying, and break out of the loop.

    With these two changes fsfuzz test survives quite well on ext3.

    Signed-off-by: Eric Sandeen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
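
Both guards described above can be sketched in userspace. The walker below assumes a simplified entry layout with a 16-bit little-endian length at byte offset 4 of each entry and an 8-byte header; the names, the minimum length of 12 and the 512-byte unit for i_blocks are assumptions made for the illustration rather than quotes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HEADER_LEN  8u
#define SKETCH_MIN_REC_LEN 12u

/*
 * Walk the entries of one directory block.
 * Returns the number of entries, or -1 if a length is implausible.
 * Without the length check, a zero length would never advance 'off'.
 */
int sketch_count_entries(const unsigned char *block, size_t block_size)
{
    size_t off = 0;
    int n = 0;

    assert(block != NULL);
    while (off < block_size) {
        if (block_size - off < SKETCH_HEADER_LEN)
            return -1;
        /* Decode the little-endian 16-bit length by hand. */
        unsigned rec_len = block[off + 4] | ((unsigned)block[off + 5] << 8);
        if (rec_len < SKETCH_MIN_REC_LEN || rec_len > block_size - off)
            return -1;  /* includes the rec_len == 0 case */
        n++;
        off += rec_len;
    }
    return n;
}

/* True when 'offset' lies beyond the bytes that i_blocks could cover. */
bool sketch_past_allocated(uint64_t offset, uint64_t i_blocks)
{
    return offset > i_blocks * 512u;  /* assumes 512-byte units */
}
```

The first function turns the endless loop into an error return; the second gives a read loop a reason to stop when the claimed size is far larger than what is actually allocated.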
     

01 Oct, 2006

1 commit


27 Sep, 2006

3 commits


21 Apr, 2006

1 commit


29 Mar, 2006

1 commit

  • This is a conversion to make the various file_operations structs in fs/
    const. Basically a regexp job, with a few manual fixups.

    The goal is both to increase correctness (harder to accidentally write to
    shared datastructures) and to reduce the false sharing of cachelines with
    things that get dirty in .data (while .rodata is nicely read only and thus
    cache clean).

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

27 Mar, 2006

1 commit

  • Currently ext3_get_block() only maps or allocates one block at a time. This
    is quite inefficient for sequential IO workload.

    I have posted an early implementation of simple multiple block mapping and
    allocation for current ext3. The basic idea is to allocate the 1st block in
    the existing way, and to attempt to allocate the next adjacent blocks on a
    best effort basis. More description of the implementation can be found here:
    http://marc.theaimsgroup.com/?l=ext2-devel&m=112162230003522&w=2

    The following is the latest version of the patch. It breaks the original patch
    into 5 patches, reworks some logic, and fixes some bugs. The breakdown is:

    [patch 1] Adding map multiple blocks at a time in ext3_get_blocks()
    [patch 2] Extend ext3_get_blocks() to support multiple block allocation
    [patch 3] Implement multiple block allocation in ext3-try-to-allocate
    (called via ext3_new_block()).
    [patch 4] Proper accounting updates in ext3_new_blocks()
    [patch 5] Adjust reservation window size properly (by the given number
    of blocks to allocate) before block allocation to increase the
    possibility of allocating multiple blocks in a single call.

    Tests done so far include fsx, tiobench and dbench. The following numbers,
    collected from Direct IO tests (1G file creation/read), show that system time
    has been greatly reduced (more than 50% on my 8 cpu system) with the patches.

    1G file DIO write:
             2.6.15       2.6.15+patches
    real     0m31.275s    0m31.161s
    user     0m0.000s     0m0.000s
    sys      0m3.384s     0m0.564s

    1G file DIO read:
             2.6.15       2.6.15+patches
    real     0m30.733s    0m30.624s
    user     0m0.000s     0m0.004s
    sys      0m0.748s     0m0.380s

    Some previous tests we did on buffered IO using multiple block allocation
    and delayed allocation showed a noticeable improvement in throughput and
    system time.

    This patch:

    Add support of mapping multiple blocks in one call.

    This is useful for DIO reads and re-writes (where blocks are already
    allocated), and is also in line with Christoph's proposal of using
    getblocks() in mpage_readpage() or mpage_readpages().

    Signed-off-by: Mingming Cao
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
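
Mapping several already-allocated blocks in one call can be sketched as "find the first block, then extend while the next logical block is physically adjacent". The flat lookup array, the use of 0 for an unmapped block, and the function name are hypothetical simplifications; the real lookup walks the on-disk block tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * map[i] holds the physical block number of logical block i, or 0 if the
 * block is not allocated (hypothetical representation).
 * Returns how many blocks starting at 'start' form one contiguous physical
 * run, capped at max_blocks, and stores the first physical block number.
 */
size_t sketch_map_blocks(const uint32_t *map, size_t nblocks, size_t start,
                         size_t max_blocks, uint32_t *first_phys)
{
    assert(map != NULL && first_phys != NULL);
    if (start >= nblocks || max_blocks == 0 || map[start] == 0)
        return 0;

    *first_phys = map[start];
    size_t count = 1;
    /* Extend the run while the next logical block is physically adjacent. */
    while (count < max_blocks &&
           start + count < nblocks &&
           map[start + count] == map[start] + count)
        count++;
    return count;
}
```

A caller that needs N blocks can then issue one request per contiguous run instead of one per block, which is where a reduction in system time would be expected to come from.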
     

23 Mar, 2006

1 commit

  • Linus points out that ext3_readdir's readahead only cuts in when
    ext3_readdir() is operating at the very start of the directory. So for large
    directories we end up performing no readahead at all and we suck.

    So take it all out and use the core VM's page_cache_readahead(). This means
    that ext3 directory reads will use all of readahead's dynamic sizing goop.

    Note that we're using the directory's filp->f_ra to hold the readahead state,
    but readahead is actually being performed against the underlying blockdev's
    address_space. Fortunately the readahead code is all set up to handle this.

    Tested with printk. It works. I was struggling to find a real workload which
    actually cared.

    (The patch also exports page_cache_readahead() to GPL modules)

    Cc: "Stephen C. Tweedie"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds