20 Sep, 2007

2 commits

  • The do_split() function for htree dir blocks is intended to split a leaf
    block to make room for a new entry. It sorts the entries in the original
    block by hash value, then moves the last half of the entries to the new
    block - without accounting for how much space this actually moves. (IOW,
    it moves half of the entry *count* not half of the entry *space*). If by
    chance we have both large & small entries, and we move only the smallest
    entries, and we have a large new entry to insert, we may not have created
    enough space for it.

    The patch below stores each record size when calculating the dx_map, and
    then walks the hash-sorted dx_map, calculating how many entries must be
    moved to more evenly split the existing entries between the old block and
    the new block, guaranteeing enough space for the new entry.

    The dx_map "offs" member is reduced to u16 so that the overall map size
    does not change - it is temporarily stored at the end of the new block, and
    if it grows too large it may be overwritten. By making offs and size both
    u16, we won't grow the map size.

    Also add a few comments to the functions involved.

    This fixes the testcase reported by hooanon05@yahoo.co.jp on the
    linux-ext4 list, "ext3 dir_index causes an error"

    Thanks to Andreas Dilger for discussing the problem & solution with me.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Andreas Dilger
    Tested-by: Junjiro Okajima
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • Convert asserts (BUGs) in dx_probe that are triggered by bad on-disk data
    into recoverable errors with helpful warnings. Duane Griffin helped catch
    other asserts.

    Signed-off-by: Eric Sandeen
    Acked-by: Duane Griffin
    Acked-by: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
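The size-aware split in the do_split() commit above can be sketched in userspace C as follows. This is a minimal sketch, not the kernel's code: `struct dx_map_entry` with a hash and u16 `offs`/`size` follows the commit's description, while `split_index()` and its "half of the used bytes" target are assumptions for illustration. Walking the hash-sorted map from the end, it moves entries to the new block until more than half of the next entry would fall past the midpoint.

```c
#include <assert.h>
#include <stdint.h>

/* Map entry as described in the commit: hash plus 16-bit offset and
 * record size, so the map does not grow. */
struct dx_map_entry {
    uint32_t hash;
    uint16_t offs;
    uint16_t size;
};

/* map[] is sorted by hash. Returns the index of the first entry to move
 * to the new block, choosing it by bytes rather than by entry count. */
static int split_index(const struct dx_map_entry *map, int count)
{
    unsigned int total = 0, moved = 0;
    int i;

    assert(count > 0);
    for (i = 0; i < count; i++)
        total += map[i].size;

    /* Move entries from the high-hash end; always keep map[0] in the
     * old block. */
    for (i = count - 1; i > 0; i--) {
        if (moved + map[i].size / 2 > total / 2)
            break; /* more than half of this entry lies past the midpoint */
        moved += map[i].size;
    }
    return i + 1;
}
```

With hash-sorted record sizes {200, 200, 12, 12, 12, 12}, a count-based split would move only the last three 12-byte records (36 of 448 bytes); split_index() returns 1 and moves 248 bytes instead.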
     

12 Sep, 2007

1 commit

  • If we fail to start a transaction when releasing a dquot, we have to call
    dquot_release() anyway to mark the dquot structure as inactive. Otherwise
    we end up in an infinite loop inside dqput().

    Signed-off-by: Jan Kara
    Cc: xb
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

01 Aug, 2007

1 commit

  • Yan Zheng wrote:

    > I think I found a bug in ext4/extents.c: "ext4_ext_put_in_cache" uses
    > "__u32" to receive the physical block number. "ext4_ext_put_in_cache" is
    > used in "ext4_ext_get_blocks"; it sets the ext4 inode's extent cache
    > according to the most recent tree lookup (the higher 16 bits of the saved
    > physical block number are always zero). When serving a mapping request,
    > "ext4_ext_get_blocks" first checks whether the logical block is in the
    > inode's extent cache. If the logical block is in the cache and the
    > cached region isn't a gap, "ext4_ext_get_blocks" gets the physical block
    > number by using the cached region's physical block number and the offset
    > in the cached region. As described above, "ext4_ext_get_blocks" may
    > return a wrong result when there are physical block numbers bigger than
    > 0xffffffff.
    >

    You are right. Thanks for reporting this!

    Signed-off-by: Mingming Cao
    Cc: Yan Zheng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
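A userspace sketch of the truncation Yan Zheng reported. The single-entry cache structure and the helper names are hypothetical; `fsblk_t` stands in for ext4's 64-bit physical block type. Passing a block number above 0xffffffff through a 32-bit parameter drops its upper bits, so the cached mapping points at the wrong physical block.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for ext4's 64-bit physical block number type. */
typedef uint64_t fsblk_t;

/* Hypothetical single-entry extent cache: logical start, length and
 * physical start of the most recently looked-up extent. */
struct ext_cache {
    uint32_t lblk;
    uint32_t len;
    fsblk_t  pblk;
};

/* Buggy variant: the physical block arrives as a 32-bit value, so any
 * bits above 31 are lost before they reach the cache. */
static void put_in_cache_u32(struct ext_cache *c, uint32_t lblk,
                             uint32_t len, uint32_t pblk)
{
    c->lblk = lblk;
    c->len = len;
    c->pblk = pblk;
}

/* Fixed variant: keep the full-width physical block number. */
static void put_in_cache(struct ext_cache *c, uint32_t lblk,
                         uint32_t len, fsblk_t pblk)
{
    c->lblk = lblk;
    c->len = len;
    c->pblk = pblk;
}

/* Serve a mapping request from the cache: physical start plus the
 * offset of the logical block inside the cached extent. */
static fsblk_t map_from_cache(const struct ext_cache *c, uint32_t lblk)
{
    assert(lblk >= c->lblk && lblk - c->lblk < c->len);
    return c->pblk + (lblk - c->lblk);
}
```

Caching an extent that starts at physical block 0x100000010 and then mapping logical block 5 yields 0x100000015 with put_in_cache(), but 0x15 with put_in_cache_u32().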
     

27 Jul, 2007

1 commit

  • ext[234]_check_descriptors sanity checks block group descriptor geometry at
    mount time, testing whether the block bitmap, inode bitmap, and inode table
    reside wholly within the block group. However, the inode table test is off
    by one so that if the last block in the inode table resides on the last
    block of the block group, the test incorrectly fails. This is because it
    tests the last block as (start + length) rather than (start + length - 1).

    This can be seen by trying to mount a filesystem made as follows:

    mkfs.ext2 -F -b 1024 -m 0 -g 256 -N 3744 fsfile 1024

    which yields:

    EXT2-fs error (device loop0): ext2_check_descriptors: Inode table for group 0 not in group (block 101)!
    EXT2-fs: group descriptors corrupted!

    There is a similar bug in e2fsprogs; a patch has already been sent for that.

    (I wonder if inside(), outside(), and/or in_range() should someday be
    used in this and other tests throughout the ext filesystems...)

    Signed-off-by: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
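The off-by-one in the inode table check can be shown with two versions of the range test. The function and parameter names are illustrative, not the kernel's, and both group bounds are taken as inclusive. A table whose last block is exactly the group's last block is rejected by the (start + length) test and accepted by the (start + length - 1) test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* first_block and last_block bound the group (both inclusive); the inode
 * table starts at itable and is itb_per_group blocks long. */
static bool itable_in_group_buggy(uint32_t itable, uint32_t itb_per_group,
                                  uint32_t first_block, uint32_t last_block)
{
    /* Wrong: itable + itb_per_group is one past the table's last block. */
    return !(itable < first_block || itable + itb_per_group > last_block);
}

static bool itable_in_group(uint32_t itable, uint32_t itb_per_group,
                            uint32_t first_block, uint32_t last_block)
{
    assert(itb_per_group > 0);
    /* The table's last block is itable + itb_per_group - 1. */
    return !(itable < first_block ||
             itable + itb_per_group - 1 > last_block);
}
```

For a hypothetical group spanning blocks 1 to 256, a 57-block table starting at block 200 ends on block 256: itable_in_group_buggy() rejects it, while itable_in_group() accepts it.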
     

20 Jul, 2007

4 commits

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     
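  • Looking at the current linus-git tree, the jbd_debug() define in
    include/linux/jbd2.h is:

    extern u8 journal_enable_debug;

    #define jbd_debug(n, f, a...) \
    do { \
    if ((n) <= journal_enable_debug) { \
    printk (KERN_DEBUG "(%s, %d): %s: ", \
    __FILE__, __LINE__, __FUNCTION__); \
    printk (f, ## a); \
    } \
    } while (0)

    > fs/ext4/inode.c: In function ‘ext4_write_inode’: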
    > fs/ext4/inode.c:2906: warning: comparison is always true due to limited
    > range of data type
    >
    > fs/jbd2/recovery.c: In function ‘jbd2_journal_recover’:
    > fs/jbd2/recovery.c:254: warning: comparison is always true due to
    > limited range of data type
    > fs/jbd2/recovery.c:257: warning: comparison is always true due to
    > limited range of data type
    >
    > fs/jbd2/recovery.c: In function ‘jbd2_journal_skip_recovery’:
    > fs/jbd2/recovery.c:301: warning: comparison is always true due to
    > limited range of data type
    >
    I noticed that all of the warnings occur when the debug level is 0. Then I
    found that the "jbd2: Move jbd2-debug file to debugfs" patch
    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0f49d5d019afa4e94253bfc92f0daca3badb990b

    changed jbd2_journal_enable_debug from int to u8, which makes the
    jbd_debug comparison always true when the debugging level is 0. Hence
    the compile warning.

    I thought about changing the jbd2_journal_enable_debug data type back to
    int, but that is not possible: the jbd2-debug file has moved to debugfs,
    and debugfs_create_u8(), which creates the debugfs entry, needs the value
    to be of type u8.

    Even if we changed the data type back to int, the code would still be
    buggy: the kernel should not print jbd2 debug messages when
    jbd2_journal_enable_debug is set to 0, but it does.

    The fix is to change the debugging level to 1. The same should be fixed
    in ext3/JBD, but ext3's jbd-debug via /proc is currently broken, so we
    should probably fix it all together.

    Signed-off-by: Mingming Cao
    Cc: Jeff Garzik
    Cc: Theodore Tso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • Split ondemand readahead interface into two functions. I think this makes it
    a little clearer for non-readahead experts (like Rusty).

    Internally they both call ondemand_readahead(), but the page argument is
    changed to an obvious boolean flag.

    Signed-off-by: Rusty Russell
    Signed-off-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Convert ext3/ext4 dir reads to use on-demand readahead.

    Readahead for dirs operates _not_ at the file level, but at the blockdev
    level. This makes a difference when the data blocks are not contiguous.
    And the read routine is somewhat opaque: there's no handy info about the
    status of the current page. So a simplified call scheme is employed: call
    into readahead whenever the current page falls outside the readahead
    window.

    Signed-off-by: Fengguang Wu
    Cc: Steven Pratt
    Cc: Ram Pai
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
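A minimal model of why level-0 jbd_debug() calls misbehave in the jbd2 debug-level commit, assuming the macro tests `(n) <= journal_enable_debug` and that the level is an unsigned u8. Since no u8 is less than 0, a level-0 call always prints, even with debugging disabled, and the compiler flags the comparison as always true. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Would jbd_debug(n, ...) print at the given debug level?
 * Assumes the macro's test is (n) <= journal_enable_debug. */
static bool jbd_debug_prints(unsigned int n, uint8_t journal_enable_debug)
{
    return n <= journal_enable_debug;
}

/* Shows the problem with level 0 and why level 1 fixes it. */
static void jbd_debug_demo(void)
{
    assert(jbd_debug_prints(0, 0));  /* level-0 call prints with debugging off */
    assert(!jbd_debug_prints(1, 0)); /* level-1 call stays silent when off */
    assert(jbd_debug_prints(1, 1));  /* and prints once debugging is enabled */
}
```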
     

18 Jul, 2007

18 commits

  • Use the EXT_LAST_INDEX macro; that's what it's there for.

    Clean up ext4_ext_grow_indepth() so the correct EXT_FIRST_INDEX or
    EXT_FIRST_EXTENT is used as necessary. The two macros are equivalent, so
    the compiler will collapse the if statement, but it makes the code much
    more readable.

    Signed-off-by: Dmitry Monakhov
    Acked-by: Alex Tomas
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     
  • Signed-off-by: Dmitry Monakhov
    Acked-by: Alex Tomas
    Signed-off-by: Dave Kleikamp
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     
  • ext4_change_inode_journal_flag() is only called from one location:
    ext4_ioctl(EXT3_IOC_SETFLAGS). That ioctl case already has an IS_RDONLY()
    call in it, so this one is superfluous.

    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Dave Kleikamp
    Signed-off-by: "Theodore Ts'o"

    Dave Hansen
     
  • Replace (n & (n-1)) in the context of power of 2 checks with
    is_power_of_2()

    Signed-off-by: Vignesh Babu
    Signed-off-by: Andrew Morton
    Signed-off-by: Dave Kleikamp
    Signed-off-by: "Theodore Ts'o"

    Vignesh Babu
     
  • Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • This patch adds support to ext4 for allowing more than 65000
    subdirectories. Currently the maximum number of subdirectories is capped
    at 32000.

    If we exceed 65000 subdirectories in an htree directory it sets the
    inode link count to 1 and no longer counts subdirectories. The
    directory link count is not actually used when determining if a
    directory is empty, as that only counts subdirectories and not regular
    files that might be in there.

    An EXT4_FEATURE_RO_COMPAT_DIR_NLINK flag has been added, and it is set if
    the subdir count for any directory crosses 65000. A later fsck will clear
    EXT4_FEATURE_RO_COMPAT_DIR_NLINK if there are no longer any directories
    with >65000 subdirs.

    Signed-off-by: Andreas Dilger
    Signed-off-by: Kalpak Shah
    Signed-off-by: "Theodore Ts'o"

    Andreas Dilger
     
  • We need to make sure that existing ext3 filesystems can also make use of
    the new fields that have been added to the ext4 inode. We use
    s_want_extra_isize and s_min_extra_isize to decide by how much we should
    expand the inode. If the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature is
    set, then we expand the inode by max(s_want_extra_isize,
    s_min_extra_isize, sizeof(ext4_inode) - EXT4_GOOD_OLD_INODE_SIZE) bytes.
    It is still an open question whether users should be able to set
    s_*_extra_isize smaller than the known fields.

    This patch also adds the functionality to expand inodes to include the
    newly added fields. We start by trying to expand by s_want_extra_isize
    bytes and, if that fails, we try to expand by s_min_extra_isize bytes. This
    is done by changing the i_extra_isize if enough space is available in
    the inode and no EAs are present. If EAs are present and there is enough
    space in the inode then the EAs in the inode are shifted to make space.
    If enough space is not available in the inode due to the EAs then 1 or
    more EAs are shifted to the external EA block. In the worst case when
    even the external EA block does not have enough space we inform the user
    that some EA would need to be deleted or s_min_extra_isize would have to
    be reduced.

    Signed-off-by: Andreas Dilger
    Signed-off-by: Kalpak Shah
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Kalpak Shah
     
  • This patch adds nanosecond timestamps for ext4. This involves adding
    *time_extra fields to the ext4_inode to extend the timestamps to
    64-bits. Creation time is also added by this patch.

    These extended fields will fit into an inode if the filesystem was
    formatted with large inodes (-I 256 or larger) and there are currently
    no EAs consuming all of the available space. For new inodes we always
    reserve enough space for the kernel's known extended fields, but for
    inodes created with an old kernel this might not have been the case. So
    this patch also adds the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature flag,
    which indicates whether the fields fitting inside s_min_extra_isize are
    available. The flag is ro-compat so that older kernels can't create inodes
    with a smaller extra_isize. If the expansion of inodes is unsuccessful,
    this feature is disabled. This feature is only enabled if requested by
    the sysadmin.

    None of the extended inode fields is critical for correct filesystem
    operation.

    Signed-off-by: Andreas Dilger
    Signed-off-by: Kalpak Shah
    Signed-off-by: Eric Sandeen
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Kalpak Shah
     
  • When the JBD code was forked to create the new JBD2 code base, the
    references to CONFIG_JBD_DEBUG were never changed to
    CONFIG_JBD2_DEBUG. This patch fixes that.

    Signed-off-by: Jose R. Santos
    Signed-off-by: "Theodore Ts'o"

    Jose R. Santos
     
  • Set the journal's JBD2_FEATURE_INCOMPAT_64BIT flag at mount time on
    devices whose block numbers need more than 32 bits. This ensures the
    proper record length when writing to the journal.

    Signed-off-by: Jose R. Santos
    Signed-off-by: Andreas Dilger
    Signed-off-by: Mingming Cao
    Signed-off-by: Laurent Vivier
    Signed-off-by: "Theodore Ts'o"

    Jose R. Santos
     
  • Add more run-time checking of extent header fields and remove BUG_ON
    checks so we don't panic the kernel just because the on-disk filesystem
    is corrupted.

    Signed-off-by: Alex Tomas
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Alex Tomas
     
  • Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
    ext4-specific i_flags. Quota code changes these flags on quota files
    (to make it harder for sysadmin to screw himself) and these changes were
    not correctly propagated into the filesystem.

    (This is a forward port patch from ext3)

    Signed-off-by: Jan Kara
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • Turn on extents feature by default in ext4 filesystem, to get wider
    testing of extents feature in ext4dev. This can be disabled using
    -o noextents.

    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
     
  • This change was suggested by Andreas Dilger.
    This patch changes the EXT_MAX_LEN value and extent code which marks/checks
    uninitialized extents. With this change it will be possible to have
    initialized extents with 2^15 blocks (earlier the max blocks we could have
    was 2^15 - 1). This way we can have better extent-to-block alignment.
    Now, maximum number of blocks we can have in an initialized extent is 2^15
    and in an uninitialized extent is 2^15 - 1.

    Signed-off-by: Amit Arora

    Amit Arora
     
  • This patch adds write support to the uninitialized extents that get
    created when a preallocation is done using fallocate(). It takes care of
    splitting the extents into multiple (up to three) extents and merging the
    new split extents with neighbouring ones, if possible.

    Signed-off-by: Amit Arora

    Amit Arora
     
  • This patch implements ->fallocate() inode operation in ext4. With this
    patch users of ext4 file systems will be able to use fallocate() system
    call for persistent preallocation. The current implementation only
    supports preallocation for regular files (directories are not supported
    yet) with extent maps. This patch does not currently support
    block-mapped files. Only the FALLOC_ALLOCATE and FALLOC_RESV_SPACE modes
    are supported for now.

    Signed-off-by: Amit Arora

    Amit Arora
     
  • Introduce is_owner_or_cap() macro in fs.h, and convert over relevant
    users to it. This is done because we want to avoid bugs in the future
    where we check for only effective fsuid of the current task against a
    file's owning uid, without simultaneously checking for CAP_FOWNER as
    well, thus violating its semantics.
    [ XFS uses special macros and structures, and in general looked ...
    untouchable, so we leave it alone -- but it has been looked over. ]

    The (current->fsuid != inode->i_uid) check in generic_permission() and
    exec_permission_lite() is left alone, because those operations are
    covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations
    falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone.

    Signed-off-by: Satyam Sharma
    Cc: Al Viro
    Acked-by: Serge E. Hallyn
    Signed-off-by: Linus Torvalds

    Satyam Sharma
     
  • Currently the export_operation structure and helpers related to it are in
    fs.h. fs.h is already far too large and there are very few places needing the
    export bits, so split them off into a separate header.

    [akpm@linux-foundation.org: fix cifs build]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Neil Brown
    Cc: Steven French
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
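The is_power_of_2() conversion among the 18 Jul commits matters because the open-coded test (n & (n-1)) == 0 also accepts n == 0. The helper below is a userspace sketch written to match the check performed by the kernel's is_power_of_2() in <linux/log2.h>; open_coded_pow2() and pow2_demo() are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

/* Same logic as the kernel's is_power_of_2(): nonzero with one bit set. */
static inline bool is_power_of_2(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* The open-coded test being replaced; it also accepts n == 0. */
static inline bool open_coded_pow2(unsigned long n)
{
    return (n & (n - 1)) == 0;
}

static void pow2_demo(void)
{
    assert(open_coded_pow2(0)); /* 0 & ULONG_MAX == 0, so 0 slips through */
    assert(!is_power_of_2(0));
    assert(is_power_of_2(4096) && !is_power_of_2(4095));
}
```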
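A sketch of the subdirectory link-count rule from the DIR_NLINK commit. The struct, the `SUBDIR_LINK_MAX` name and the global feature flag are hypothetical; only the 65000 threshold and the pin-to-1 behavior come from the commit message. Once an htree directory reaches the limit its link count is set to 1, and later increments leave it at 1 (an increment from 1 to 2 is folded back).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical name for the 65000 subdirectory limit. */
#define SUBDIR_LINK_MAX 65000u

struct dir_inode {
    unsigned int nlink; /* 2 + number of subdirs, or 1 once no longer counted */
    bool is_htree;      /* directory uses htree (dir_index) */
};

/* Models EXT4_FEATURE_RO_COMPAT_DIR_NLINK being set on the filesystem. */
static bool dir_nlink_feature;

/* Account for a new subdirectory created in dir. */
static void inc_subdir_count(struct dir_inode *dir)
{
    assert(dir->nlink > 0);
    dir->nlink++;
    /* nlink == 2 here means the count was already pinned at 1. */
    if (dir->is_htree &&
        (dir->nlink >= SUBDIR_LINK_MAX || dir->nlink == 2)) {
        dir->nlink = 1; /* stop counting subdirectories */
        dir_nlink_feature = true;
    }
}
```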
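A sketch of packing a nanosecond timestamp into a 32-bit seconds field plus a 32-bit extra field, for the nanosecond-timestamp commit. The commit only says that the *time_extra fields extend the timestamps; the layout here (low 2 bits extend the seconds, upper 30 bits hold nanoseconds) is an assumption modeled on ext4's on-disk format, and this version treats seconds as unsigned, ignoring pre-1970 times. 30 bits suffice because nanoseconds never exceed 999,999,999, which is below 2^30.

```c
#include <assert.h>
#include <stdint.h>

#define EPOCH_BITS 2u
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1u) /* 0x00000003 */
#define NSEC_MASK  (~0u << EPOCH_BITS)       /* 0xfffffffc */

/* Pack sec (up to 34 bits) and nsec into the two 32-bit on-disk fields. */
static void encode_time(uint64_t sec, uint32_t nsec,
                        uint32_t *disk_sec, uint32_t *extra)
{
    assert(nsec < 1000000000u);
    assert((sec >> (32 + EPOCH_BITS)) == 0);
    *disk_sec = (uint32_t)sec;                    /* low 32 bits */
    *extra = (uint32_t)((sec >> 32) & EPOCH_MASK) /* seconds bits 32-33 */
           | (nsec << EPOCH_BITS);                /* nanoseconds */
}

/* Inverse of encode_time(). */
static void decode_time(uint32_t disk_sec, uint32_t extra,
                        uint64_t *sec, uint32_t *nsec)
{
    *sec = (uint64_t)disk_sec | ((uint64_t)(extra & EPOCH_MASK) << 32);
    *nsec = (extra & NSEC_MASK) >> EPOCH_BITS;
}
```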
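A sketch of the extent-length encoding implied by the EXT_MAX_LEN change, assuming a 16-bit ee_len in which values up to 2^15 mean an initialized extent of that length and larger values mean an uninitialized extent of (ee_len - 2^15) blocks. Helper names are hypothetical. This shows why the two maximums differ: 0x8000 is taken by the 32768-block initialized extent, so an uninitialized extent can hold at most 32767 blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INIT_MAX_LEN   (1u << 15)          /* 32768 blocks */
#define UNINIT_MAX_LEN (INIT_MAX_LEN - 1u) /* 32767 blocks */

/* Encode a block count into the 16-bit length field. */
static uint16_t encode_len(unsigned int len, bool uninit)
{
    if (uninit) {
        assert(len >= 1 && len <= UNINIT_MAX_LEN);
        return (uint16_t)(len + INIT_MAX_LEN); /* 0x8001 .. 0xffff */
    }
    assert(len >= 1 && len <= INIT_MAX_LEN);
    return (uint16_t)len;                      /* 0x0001 .. 0x8000 */
}

static bool len_is_uninit(uint16_t ee_len)
{
    return ee_len > INIT_MAX_LEN;
}

static unsigned int decode_len(uint16_t ee_len)
{
    return ee_len <= INIT_MAX_LEN ? ee_len : ee_len - INIT_MAX_LEN;
}
```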
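A sketch of the up-to-three-way split from the commit adding write support to uninitialized extents. Extents are reduced to a logical start, a length and an uninitialized flag; physical blocks, journaling and the merge with neighbouring extents are left out, and all names are hypothetical. Writing into the middle of an uninitialized extent leaves an uninitialized head, an initialized written range and an uninitialized tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct extent {
    uint32_t lblk;  /* first logical block */
    uint32_t len;   /* number of blocks */
    bool uninit;    /* allocated but not yet written */
};

/* Split uninitialized extent ex around a write of [lblk, lblk + len).
 * Stores up to three extents in out[] and returns how many. */
static int split_uninit(struct extent ex, uint32_t lblk, uint32_t len,
                        struct extent out[3])
{
    uint32_t ex_end = ex.lblk + ex.len;
    uint32_t w_end = lblk + len;
    int n = 0;

    assert(ex.uninit && len > 0);
    assert(lblk >= ex.lblk && w_end <= ex_end);

    if (lblk > ex.lblk) /* untouched head stays uninitialized */
        out[n++] = (struct extent){ ex.lblk, lblk - ex.lblk, true };
    out[n++] = (struct extent){ lblk, len, false }; /* written range */
    if (w_end < ex_end) /* untouched tail stays uninitialized */
        out[n++] = (struct extent){ w_end, ex_end - w_end, true };
    return n;
}
```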
     

17 Jul, 2007

6 commits

  • This is a patch that speeds up statfs. It is very simple - the "overhead"
    calculation, which takes a huge amount of time for large filesystems, never
    changes unless the size of the filesystem itself changes. That means we can
    store it in memory and only recalculate if the filesystem has been resized
    (almost never).

    It also fixes a minor problem: we never update the on-disk superblock free
    blocks/inodes counts until the filesystem is unmounted. While not fatal, we
    may as well update that on disk when we have the information, and it makes
    things like debugfs and dumpe2fs report a bit more accurate info.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Fix error handling in ext4_create_journal according to kernel conventions.

    Signed-off-by: Borislav Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • In ext4_new_blocks(), one of the two ext4_block_bitmap() calls should be
    an ext4_inode_bitmap() call. It is not harmful in normal processing, but
    it should be fixed.

    Signed-off-by: Toshiyuki Okajima
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshiyuki Okajima
     
  • The ext4_orphan_add() and ext4_orphan_del() functions lock sb->s_lock
    with a transaction started, and ext4_mark_recovery_complete() waits for a
    transaction while holding sb->s_lock, thus leading to a possible deadlock.
    By the time we call ext4_mark_recovery_complete() from ext4_remount(), we
    have done all the work needed for remounting, and thus it is safe to drop
    sb->s_lock before we wait for transactions to commit. Note that at this
    point we are still guarded by the s_umount lock against other
    remounts/umounts.

    Signed-off-by: Jan Kara
    Cc: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • After ext3 orphan list check has been added into ext3_destroy_inode()
    (please see my previous patch) the following situation has been detected:

    EXT3-fs warning (device sda6): ext3_unlink: Deleting nonexistent file (37901290), 0
    Inode 00000101a15b7840: orphan list check failed!
    00000773 6f665f00 74616d72 00000573 65725f00 06737270 66000000 616d726f
    ...
    Call Trace: [] ext3_destroy_inode+0x79/0x90
    [] sys_unlink+0x126/0x1a0
    [] error_exit+0x0/0x81
    [] system_call+0x7e/0x83

    The first message says that the unlinked inode has i_nlink=0; then
    ext3_unlink() adds this inode to the orphan list.

    The second message means that this inode has not been removed from the
    orphan list. The inode dump showed that i_fop = &bad_file_ops, which can
    be set only in make_bad_inode(). Then I found that ext3_read_inode() can
    call make_bad_inode() without any error/warning messages, for example in
    the following case:

    ...
    if (inode->i_nlink == 0) {
    if (inode->i_mode == 0 ||
    !(EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ORPHAN_FS)) {
    /* this inode is deleted */
    brelse (bh);
    goto bad_inode;
    ...

    A bad inode can live for some time, and ext3_unlink() can add it to the
    orphan list, but ext3_delete_inode() does not remove this inode from the
    orphan list. As a result, we can have orphan list corruption detected in
    ext3_destroy_inode().

    However, it is not clear to me how to fix this issue correctly.

    As far as I can see, is_bad_inode() is called after iget() in all places
    except ext3_lookup() and ext3_get_parent(). I believe it makes sense to
    add a bad inode check to these functions too and call iput() if a bad
    inode is detected.

    Signed-off-by: Vasily Averin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Averin
     
  • Customers reported ext3-related errors; investigation showed that the
    ext3 orphan list had been corrupted and contained a reference to a
    non-ext3 inode. The following debug code helps to understand the causes
    of this issue.

    [akpm@linux-foundation.org: update for print_hex_dump() changes]
    Signed-off-by: Vasily Averin
    Cc: "Randy.Dunlap"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Averin
     

10 Jul, 2007

1 commit


24 Jun, 2007

1 commit


01 Jun, 2007

4 commits


17 May, 2007

1 commit

  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter