11 Jan, 2011

1 commit


20 Dec, 2010

1 commit


28 Oct, 2010

1 commit

  • The llseek system call should return EINVAL if passed a seek offset
    which results in a write error. What this maximum offset should be
    depends on whether the huge_file file system feature is set, and
    whether the file is extent based.

    If the file has no "EXT4_EXTENTS_FL" flag, the maximum size which can be
    written (write systemcall) is different from the maximum size which can be
    sought (lseek systemcall).

    For example, the following two cases demonstrate the differences
    between the maximum size which can be written and the seek offset
    allowed by the llseek system call:

    #1: mkfs.ext3 ; mount -t ext4
    #2: mkfs.ext3 ; tune2fs -Oextent,huge_file ; mount -t ext4

    Table: the maximum file size which we can write or seek to, for each
    filesystem feature tuning and file flag setting

    +======+==========================+==========================+
    | Case | !EXT4_EXTENTS_FL         | EXT4_EXTENTS_FL          |
    +------+--------------------------+--------------------------+
    | #1   | write:  2194719883264    | write: --------------    |
    |      | seek:   2199023251456    | seek:  --------------    |
    +------+--------------------------+--------------------------+
    | #2   | write:  4402345721856    | write: 17592186044415    |
    |      | seek:  17592186044415    | seek:  17592186044415    |
    +------+--------------------------+--------------------------+

    The differences exist because ext4 has two maxbytes values:
    sb->s_maxbytes (the extent-mapped maximum) and
    EXT4_SB(sb)->s_bitmap_maxbytes (the block-mapped maximum). However,
    generic_file_llseek(), which ext4_file_operations uses for llseek,
    only checks sb->s_maxbytes.

    Therefore we create an ext4-specific llseek function which uses both
    maxbytes values.

    The new function is derived from generic_file_llseek(). If the
    "EXT4_EXTENTS_FL" file flag is not set, the function uses
    EXT4_SB(inode->i_sb)->s_bitmap_maxbytes in place of
    inode->i_sb->s_maxbytes.
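
    The limit selection described above can be sketched in user-space C.
    This is only an illustration, not the kernel function: struct limits,
    seek_limit(), seek_set() and block_mapped_max() are hypothetical names,
    only SEEK_SET is modelled, and the 4 KiB block size used in the note
    below is an assumption about the test setup.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag bit as used by ext4; treated here as an illustrative constant. */
#define EXT4_EXTENTS_FL 0x00080000u

/* Hypothetical stand-in for the two limits kept per superblock. */
struct limits {
    int64_t s_maxbytes;        /* extent-mapped maximum */
    int64_t s_bitmap_maxbytes; /* block-mapped maximum */
};

/* Pick the limit that matches how the file is mapped. */
static int64_t seek_limit(const struct limits *l, uint32_t i_flags)
{
    return (i_flags & EXT4_EXTENTS_FL) ? l->s_maxbytes
                                       : l->s_bitmap_maxbytes;
}

/* SEEK_SET only: return the new offset, or -EINVAL if it is out of range. */
static int64_t seek_set(const struct limits *l, uint32_t i_flags,
                        int64_t offset)
{
    if (offset < 0 || offset > seek_limit(l, i_flags))
        return -EINVAL;
    return offset;
}

/*
 * Size addressable through 12 direct blocks plus one single, one double
 * and one triple indirect tree, with 32-bit block pointers.
 */
static int64_t block_mapped_max(int64_t block_size)
{
    int64_t n = block_size / 4; /* pointers per indirect block */

    assert(block_size >= 1024);
    return (12 + n + n * n + n * n * n) * block_size;
}
```

    With a 4096-byte block size, block_mapped_max() gives 4402345721856,
    which matches the case #2 write limit in the table for files without
    EXT4_EXTENTS_FL.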

    Signed-off-by: Toshiyuki Okajima
    Signed-off-by: "Theodore Ts'o"
    Cc: Andreas Dilger

    Toshiyuki Okajima
     

27 Jul, 2010

3 commits


14 Jun, 2010

1 commit


17 May, 2010

3 commits


05 Mar, 2010

1 commit


02 Mar, 2010

1 commit

  • The callers of ext4_check_dir_entry() usually pass in the "file
    offset" (ext4_readdir, htree_dirblock_to_tree, search_dirblock,
    ext4_dx_find_entry, empty_dir), but a few callers (add_dirent_to_buf,
    ext4_delete_entry) only pass in the buffer offset.

    To accommodate those last two (which would be hard to fix otherwise),
    this patch changes ext4_check_dir_entry() to print the physical block
    number and the relative offset as well as the passed-in offset.

    Signed-off-by: Toshiyuki Okajima
    Signed-off-by: "Theodore Ts'o"

    Toshiyuki Okajima
     

16 Feb, 2010

1 commit


14 May, 2009

2 commits


15 Feb, 2009

1 commit

  • The rec_len field in the directory entry is 16 bits, so encoding
    blocksizes larger than 64k becomes problematic. This patch allows us
    to support block sizes up to 256k, by using the low 2 bits to extend
    the range of rec_len to 2**18-1 (since valid rec_len sizes must be a
    multiple of 4). We use the convention that a rec_len of 0 or 65535
    means the filesystem block size, for compatibility with older kernels.

    It's unlikely we'll see VM pages of up to 256k, but at some point we
    might find that the Linux VM has been enhanced to support filesystem
    block sizes larger than the VM page size, at which point it might be
    useful for some applications to allow very large filesystem block sizes.
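
    The encoding described above can be sketched as a pair of conversion
    helpers. This is a user-space approximation under stated assumptions:
    the names rec_len_from_disk(), rec_len_to_disk() and MAX_REC_LEN are
    illustrative, and byte-order conversion of the 16-bit field is left
    out.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REC_LEN 65535u /* (1 << 16) - 1 */

/* Decode a 16-bit on-disk rec_len (already in CPU byte order). */
static uint32_t rec_len_from_disk(uint16_t dlen, uint32_t blocksize)
{
    /* 0 and 65535 both mean "the whole block". */
    if (dlen == MAX_REC_LEN || dlen == 0)
        return blocksize;
    /* Bits 2..15 are stored in place; the low 2 bits hold bits 16..17. */
    return (dlen & 0xFFFCu) | ((uint32_t)(dlen & 3u) << 16);
}

/* Encode a record length into the 16-bit on-disk field. */
static uint16_t rec_len_to_disk(uint32_t len, uint32_t blocksize)
{
    assert(len % 4 == 0 && len <= blocksize && blocksize <= (1u << 18));
    if (len < 65536u)
        return (uint16_t)len;
    if (len == blocksize)
        return blocksize == 65536u ? MAX_REC_LEN : 0;
    return (uint16_t)((len & 0xFFFCu) | ((len >> 16) & 3u));
}
```

    For example, with a 256k block size a record length of 131072 is
    stored as 2 (only bit 17 is set, so it moves into the low 2 bits), and
    decoding 2 gives 131072 back.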

    Signed-off-by: Wei Yongjun
    Signed-off-by: "Theodore Ts'o"

    Wei Yongjun
     

06 Jan, 2009

1 commit


05 Nov, 2008

1 commit

  • Convert the unsigned longs that are most responsible for bloating the
    stack usage on 64-bit systems.

    Nearly every place in the ext3/4 code which uses "unsigned long" is
    probably a bug, since on 32-bit systems a ulong is 32 bits, which means
    we are wasting stack space on 64-bit systems.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

26 Oct, 2008

1 commit

  • Fix a regression caused by commit d0156417, "ext4: fix ext4_dx_readdir
    hash collision handling", where deleting files in a large directory
    (requiring more than one getdents system call) results in some
    filenames being returned twice. This was caused by a failure to
    update info->curr_hash and info->curr_minor_hash, so that if the
    directory had been modified since the last getdents() system call
    (as would be the case if the user is running "rm -r" or "git clean"),
    a directory entry would be returned twice to userspace.

    This patch fixes the bug reported by Markus Trippelsdorf at:
    http://bugzilla.kernel.org/show_bug.cgi?id=11844

    Signed-off-by: "Theodore Ts'o"
    Tested-by: Markus Trippelsdorf

    Theodore Ts'o
     

09 Oct, 2008

1 commit

  • Note: some people think this represents a security bug, since it
    might make the system go away while it is printing a large number of
    console messages, especially if a serial console is involved. Hence,
    it has been assigned CVE-2008-3528, but it requires that the attacker
    either has physical access to your machine to insert a USB disk with a
    corrupted filesystem image (at which point why not just hit the power
    button), or is otherwise able to convince the system administrator to
    mount an arbitrary filesystem image (at which point why not just
    include a setuid shell or world-writable hard disk device file or some
    such). Me, I think they're just being silly. --tytso

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"
    Cc: linux-ext4@vger.kernel.org
    Cc: Eugene Teo

    Eric Sandeen
     

09 Sep, 2008

2 commits


20 Aug, 2008

1 commit


15 Jul, 2008

1 commit

  • This patch does block reservation for delayed
    allocation, to avoid ENOSPC later at page flush time.

    Blocks (data and metadata) are reserved at da_write_begin()
    time, the freeblocks counter is updated at that point, and the number
    of reserved blocks is stored in a per-inode counter.

    At writepage time, the unused reserved metadata blocks are returned.
    At unlink/truncate time, reserved blocks are properly released.
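
    The accounting described above can be modelled with a toy sketch.
    Everything here is hypothetical (struct toy_fs, struct toy_inode and
    the three helpers are not kernel names); there is no locking, and it
    assumes all of an inode's reserved data blocks are written in one go.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical toy state standing in for the real counters. */
struct toy_fs    { uint64_t free_blocks; };
struct toy_inode { uint64_t reserved_data; uint64_t reserved_meta; };

/* Write-begin time: claim data and worst-case metadata blocks up front. */
static int reserve_blocks(struct toy_fs *fs, struct toy_inode *ino,
                          uint64_t data, uint64_t meta)
{
    if (data + meta > fs->free_blocks)
        return -ENOSPC; /* fail now, not later at page flush time */
    fs->free_blocks -= data + meta;
    ino->reserved_data += data;
    ino->reserved_meta += meta;
    return 0;
}

/*
 * Writepage time: `data` data blocks and `meta_used` metadata blocks were
 * really allocated; hand the unused metadata reservation back.
 * Assumes meta_used <= reserved_meta and data <= reserved_data.
 */
static void allocate_reserved(struct toy_fs *fs, struct toy_inode *ino,
                              uint64_t data, uint64_t meta_used)
{
    ino->reserved_data -= data;
    fs->free_blocks += ino->reserved_meta - meta_used;
    ino->reserved_meta = 0;
}

/* Unlink/truncate time: drop whatever is still reserved. */
static void release_reservation(struct toy_fs *fs, struct toy_inode *ino)
{
    fs->free_blocks += ino->reserved_data + ino->reserved_meta;
    ino->reserved_data = 0;
    ino->reserved_meta = 0;
}
```

    The point of the design is that the ENOSPC decision moves to write
    time, when the error can still be reported to the caller.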

    Updated with a fix from Aneesh Kumar K.V that corrects the old
    allocator's block reservation accounting with delalloc, adds a lock to
    guard the counters, and fixes the reservation for metadata blocks.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: Theodore Ts'o

    Mingming Cao
     

12 Jul, 2008

1 commit

  • * remove unnecessary code in free_rb_tree_fname

    * rename free_rb_tree_fname to ext4_htree_create_dir_info
    since it and ext4_htree_free_dir_info are a pair

    * replace kmalloc with kzalloc in ext4_htree_free_dir_info

    All these make the code more readable and simple.
    PS: this patch is also suitable for ext3.

    Signed-off-by: Shen Feng
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Shen Feng
     

30 Apr, 2008

2 commits


26 Feb, 2008

1 commit


29 Jan, 2008

2 commits

  • This patch adds a new data type ext4_lblk_t to represent
    the logical file blocks.

    This is the preparatory patch to support large files in ext4.
    The follow-up patch will convert the ext4_inode i_blocks to
    represent the number of blocks in file system block size units. This
    change makes it possible to have a block number of 2**32 - 1, which
    will result in overflow if the block number is represented by a
    signed long. This patch converts all the block numbers to type
    ext4_lblk_t, which is a typedef for __u32.

    Also remove dead code ext4_ext_walk_space.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: Eric Sandeen

    Aneesh Kumar K.V
     
  • With 64KB blocksize, a directory entry can have size 64KB, which does not
    fit into the 16 bits we have for the entry length. So we store 0xffff
    instead and convert the value when it is read from / written to disk. The
    patch also converts some places to use ext4_next_entry() when we are
    changing them anyway.

    Signed-off-by: Jan Kara
    Signed-off-by: Mingming Cao

    Jan Kara
     

18 Oct, 2007

1 commit

  • CONFIG_EXT4_INDEX is not an exposed config option in the kernel, and it is
    unconditionally defined in ext4_fs.h. tune2fs is already able to turn off
    dir indexing, so at this point it's just cluttering up the code. Remove
    it.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton

    Eric Sandeen
     

17 Oct, 2007

2 commits

  • Fix f_version type: should be u64 instead of long

    There is a type inconsistency between struct inode i_version and struct file
    f_version.

    fs.h:

    struct inode
    u64 i_version;

    and

    struct file
    unsigned long f_version;

    Users do:

    fs/ext3/dir.c:

    if (filp->f_version != inode->i_version) {

    So why isn't f_version a u64? It becomes a problem if versions get
    higher than 2^32 and we are on an architecture where longs are 32 bits.

    This patch changes the f_version type to u64, and updates the users accordingly.
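
    The mismatch can be illustrated with fixed-width types. This is a
    sketch only: the toy_* structs are hypothetical, and uint32_t stands in
    for unsigned long on an architecture where longs are 32 bits.

```c
#include <stdint.h>

struct toy_inode  { uint64_t i_version; };
struct toy_file32 { uint32_t f_version; }; /* models a 32-bit unsigned long */
struct toy_file64 { uint64_t f_version; }; /* after the type change */

/* Nonzero means "the directory changed since f_version was recorded". */
static int stale32(const struct toy_file32 *f, const struct toy_inode *i)
{
    return f->f_version != i->i_version;
}

static int stale64(const struct toy_file64 *f, const struct toy_inode *i)
{
    return f->f_version != i->i_version;
}
```

    If i_version is 2^32 + 7 and is copied into both file structs, the
    32-bit copy keeps only 7, so stale32() reports a change that never
    happened, while stale64() compares equal.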

    It applies to 2.6.23-rc2-mm2.

    Signed-off-by: Mathieu Desnoyers
    Cc: Martin Bligh
    Cc: "Randy.Dunlap"
    Cc: Al Viro
    Cc:
    Cc: Mark Fasheh
    Cc: Christoph Hellwig
    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     
  • Combine the file_ra_state members
    unsigned long prev_index
    unsigned int prev_offset
    into
    loff_t prev_pos

    It is more consistent and better supports huge files.
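
    How the two members fold into one position can be sketched as
    follows. The names and the 4 KiB page size (TOY_PAGE_SHIFT of 12) are
    assumptions made for illustration; the widening before the shift is
    what avoids an overflow where unsigned long is 32 bits.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumes 4 KiB pages */

typedef int64_t toy_loff_t; /* stand-in for loff_t */

/* Combine a page index and an in-page offset into one byte position. */
static toy_loff_t make_prev_pos(unsigned long prev_index,
                                unsigned int prev_offset)
{
    /* Widen to 64 bits before shifting so no high bits are lost. */
    return (toy_loff_t)(((uint64_t)prev_index << TOY_PAGE_SHIFT) |
                        prev_offset);
}

/* Recover the page index from a position. */
static unsigned long pos_index(toy_loff_t pos)
{
    return (unsigned long)(pos >> TOY_PAGE_SHIFT);
}

/* Recover the offset within the page from a position. */
static unsigned int pos_offset(toy_loff_t pos)
{
    return (unsigned int)(pos & ((1 << TOY_PAGE_SHIFT) - 1));
}
```

    For instance, page index 0x100000 with offset 100 becomes position
    4294967396, which does not fit in 32 bits.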

    Thanks to Peter for the nice proposal!

    [akpm@linux-foundation.org: fix shift overflow]
    Cc: Peter Zijlstra
    Signed-off-by: Fengguang Wu
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     

20 Jul, 2007

2 commits

  • Split ondemand readahead interface into two functions. I think this makes it
    a little clearer for non-readahead experts (like Rusty).

    Internally they both call ondemand_readahead(), but the page argument is
    changed to an obvious boolean flag.

    Signed-off-by: Rusty Russell
    Signed-off-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Convert ext3/ext4 dir reads to use on-demand readahead.

    Readahead for dirs operates _not_ on file level, but on blockdev level. This
    makes a difference when the data blocks are not contiguous. And the read
    routine is somewhat opaque: there's no handy info about the status of the
    current page. So a simplified call scheme is employed: call into readahead
    whenever the current page falls outside the readahead window.

    Signed-off-by: Fengguang Wu
    Cc: Steven Pratt
    Cc: Ram Pai
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     

09 May, 2007

1 commit


09 Dec, 2006

1 commit


08 Dec, 2006

1 commit

  • I've been using Steve Grubb's purely evil "fsfuzzer" tool, at
    http://people.redhat.com/sgrubb/files/fsfuzzer-0.4.tar.gz

    Basically it makes a filesystem, splats some random bits over it, then
    tries to mount it and do some simple filesystem actions.

    At best, the filesystem catches the corruption gracefully. At worst,
    things spin out of control.

    As you might guess, we found a couple places in ext4 where things spin out
    of control :)

    First, we had a corrupted directory that was never checked for
    consistency... it was corrupt, and pointed to another bad "entry" of
    length 0. The for() loop looped forever, since the length of
    ext4_next_entry(de) was 0, and we kept looking at the same pointer over and
    over and over and over... I modeled this check and subsequent action on
    what is done for other directory types in ext4_readdir...

    (adding this check adds some computational expense; I am testing a followup
    patch to reduce the number of times we check and re-check these directory
    entries, in all cases. Thanks for the idea, Andreas).

    Next we had a root directory inode which had a corrupted size, claimed to
    be > 200M on a 4M filesystem. There was only really 1 block in the
    directory, but because the size was so large, readdir kept coming back for
    more, spewing thousands of printk's along the way.

    Per Andreas' suggestion, if we're in this read error condition and we're
    trying to read an offset which is greater than i_blocks worth of bytes,
    stop trying, and break out of the loop.
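
    Both guards can be sketched in user-space C. This is an illustration
    under assumptions, not the patched kernel code: count_entries() and
    past_allocated() are hypothetical helpers, the 12-byte minimum record
    length is assumed, block sizes below 64k are assumed, and i_blocks is
    assumed to count 512-byte units.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_DIRENT_MIN 12 /* 8-byte header plus a padded 1-char name */

/* rec_len is the little-endian 16-bit field at byte 4 of an entry. */
static uint16_t entry_rec_len(const uint8_t *entry)
{
    return (uint16_t)(entry[4] | (entry[5] << 8));
}

/* Count entries in a block; return -1 at the first corrupt rec_len. */
static int count_entries(const uint8_t *block, size_t blocksize)
{
    size_t off = 0;
    int n = 0;

    while (off < blocksize) {
        uint16_t len;

        if (blocksize - off < 8)
            return -1;
        len = entry_rec_len(block + off);
        /* A zero length would leave `off` unchanged and loop forever. */
        if (len < TOY_DIRENT_MIN || len % 4 != 0 || len > blocksize - off)
            return -1;
        off += len;
        n++;
    }
    return n;
}

/* On a read error, give up once the offset is past the allocated bytes. */
static int past_allocated(uint64_t pos, uint64_t i_blocks)
{
    return pos > i_blocks * 512;
}
```

    The first helper rejects the zero-length entry instead of revisiting
    the same pointer; the second is the condition for breaking out when a
    corrupted size claims more data than the inode has blocks for.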

    With these two changes fsfuzz test survives quite well on ext4.

    Signed-off-by: Eric Sandeen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     

12 Oct, 2006

2 commits

  • Someone's tab key is emitting spaces. Attempt to repair some of the damage.

    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • On disk extents format:
    /*
     * this is the extent on-disk structure
     * it's used at the bottom of the tree
     */
    struct ext3_extent {
        __le32 ee_block;    /* first logical block extent covers */
        __le16 ee_len;      /* number of blocks covered by extent */
        __le16 ee_start_hi; /* high 16 bits of physical block */
        __le32 ee_start;    /* low 32 bits of physical block */
    };
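
    The split physical block number can be reassembled as in the sketch
    below. struct toy_extent is a hypothetical in-memory mirror whose
    fields are assumed to be already converted from little-endian to CPU
    byte order, and the helper names are illustrative.

```c
#include <stdint.h>

/* In-memory mirror of the on-disk extent, fields in CPU byte order. */
struct toy_extent {
    uint32_t ee_block;    /* first logical block extent covers */
    uint16_t ee_len;      /* number of blocks covered by extent */
    uint16_t ee_start_hi; /* high 16 bits of physical block */
    uint32_t ee_start;    /* low 32 bits of physical block */
};

/* Join the two halves into one 48-bit physical block number. */
static uint64_t extent_pblock(const struct toy_extent *ex)
{
    return ((uint64_t)ex->ee_start_hi << 32) | ex->ee_start;
}

/* Split a physical block number across the two fields. */
static void extent_store_pblock(struct toy_extent *ex, uint64_t pb)
{
    ex->ee_start = (uint32_t)(pb & 0xffffffffu);
    ex->ee_start_hi = (uint16_t)((pb >> 32) & 0xffff);
}
```

    Storing block 0x123456789abc puts 0x1234 in ee_start_hi and
    0x56789abc in ee_start.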

    Signed-off-by: Alex Tomas
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Tomas