18 Dec, 2007

1 commit

  • As it turns out, the kernel divides by EXT3_INODES_PER_GROUP(s) when
    mounting an ext3 filesystem. If that number is zero, a crash follows.
    Below a patch.

    This crash was reported by Joeri de Ruiter, Carst Tankink and Pim Vullers.

    Cc:
    Acked-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andries E. Brouwer
     

22 Oct, 2007

2 commits

  • Now that nfsd has stopped writing to the find_exported_dentry member we an
    mark the export_operations const

    Signed-off-by: Christoph Hellwig
    Cc: Neil Brown
    Cc: "J. Bruce Fields"
    Cc:
    Cc: Dave Kleikamp
    Cc: Anton Altaparmakov
    Cc: David Chinner
    Cc: Timothy Shimmin
    Cc: OGAWA Hirofumi
    Cc: Hugh Dickins
    Cc: Chris Mason
    Cc: Jeff Mahoney
    Cc: "Vladimir V. Saveliev"
    Cc: Steven Whitehouse
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Trivial switch over to the new generic helpers.

    Signed-off-by: Christoph Hellwig
    Cc: Neil Brown
    Cc: "J. Bruce Fields"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

18 Oct, 2007

8 commits

  • Convert s_r_blocks_count and s_free_blocks_count to
    s_r_blocks_count_lo and s_free_blocks_count_lo

    This helps in finding BUGs due to direct partial access of
    these split 64 bit values

    Also fix direct partial access in ext4 code

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • Convert s_blocks_count to s_blocks_count_lo
    This helps in finding BUGs due to direct partial access of
    these split 64 bit values

    Also fix direct partial access in ext4 code

    Signed-off-by: Aneesh Kumar K.V

    Aneesh Kumar K.V
     
  • Convert bg_inode_bitmap and bg_inode_table to bg_inode_bitmap_lo
    and bg_inode_table_lo. This helps in finding BUGs due to
    direct partial access of these split 64 bit values

    Also fix one direct partial access

    Signed-off-by: Aneesh Kumar K.V

    Aneesh Kumar K.V
     
  • Convert bg_block_bitmap to bg_block_bitmap_lo
    This helps in catching some BUGS due to direct
    partial access of these split fields.

    Signed-off-by: Aneesh Kumar K.V

    Aneesh Kumar K.V
     
  • This feature relaxes check restrictions on where each block groups meta
    data is located within the storage media. This allows for the allocation
    of bitmaps or inode tables outside the block group boundaries in cases
    where bad blocks forces us to look for new blocks which the owning block
    group can not satisfy. This will also allow for new meta-data allocation
    schemes to improve performance and scalability.

    Signed-off-by: Jose R. Santos
    Cc:
    Signed-off-by: Andrew Morton

    Jose R. Santos
     
  • In pass1 of e2fsck, every inode table in the fileystem is scanned and checked,
    regardless of whether it is in use. This is this the most time consuming part
    of the filesystem check. The unintialized block group feature can greatly
    reduce e2fsck time by eliminating checking of uninitialized inodes.

    With this feature, there is a a high water mark of used inodes for each block
    group. Block and inode bitmaps can be uninitialized on disk via a flag in the
    group descriptor to avoid reading or scanning them at e2fsck time. A checksum
    of each group descriptor is used to ensure that corruption in the group
    descriptor's bit flags does not cause incorrect operation.

    The feature is enabled through a mkfs option

    mke2fs /dev/ -O uninit_groups

    A patch adding support for uninitialized block groups to e2fsprogs tools has
    been posted to the linux-ext4 mailing list.

    The patches have been stress tested with fsstress and fsx. In performance
    tests testing e2fsck time, we have seen that e2fsck time on ext3 grows
    linearly with the total number of inodes in the filesytem. In ext4 with the
    uninitialized block groups feature, the e2fsck time is constant, based
    solely on the number of used inodes rather than the total inode count.
    Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
    greatly reduce e2fsck time for users. With performance improvement of 2-20
    times, depending on how full the filesystem is.

    The attached graph shows the major improvements in e2fsck times in filesystems
    with a large total inode count, but few inodes in use.

    In each group descriptor if we have

    EXT4_BG_INODE_UNINIT set in bg_flags:
    Inode table is not initialized/used in this group. So we can skip
    the consistency check during fsck.
    EXT4_BG_BLOCK_UNINIT set in bg_flags:
    No block in the group is used. So we can skip the block bitmap
    verification for this group.

    We also add two new fields to group descriptor as a part of
    uninitialized group patch.

    __le16 bg_itable_unused; /* Unused inodes count */
    __le16 bg_checksum; /* crc16(sb_uuid+group+desc) */

    bg_itable_unused:

    If we have EXT4_BG_INODE_UNINIT not set in bg_flags
    then bg_itable_unused will give the offset within
    the inode table till the inodes are used. This can be
    used by fsck to skip list of inodes that are marked unused.

    bg_checksum:
    Now that we depend on bg_flags and bg_itable_unused to determine
    the block and inode usage, we need to make sure group descriptor
    is not corrupt. We add checksum to group descriptor to
    detect corruption. If the descriptor is found to be corrupt, we
    mark all the blocks and inodes in the group used.

    Signed-off-by: Avantika Mathur
    Signed-off-by: Andreas Dilger
    Signed-off-by: Mingming Cao
    Signed-off-by: Aneesh Kumar K.V

    Andreas Dilger
     
  • Fragment support in ext2/3/4 was never implemented, and it probably will
    never be implemented. So remove it from ext4.

    Signed-off-by: Coly Li
    Acked-by: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: "Theodore Ts'o"

    Coly Li
     
  • change JBD_XXX macros to JBD2_XXX in JBD2/Ext4

    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
     

17 Oct, 2007

6 commits


12 Sep, 2007

1 commit

  • If we fail to start a transaction when releasing dquot, we have to call
    dquot_release() anyway to mark dquot structure as inactive. Otherwise we
    end in an infinite loop inside dqput().

    Signed-off-by: Jan Kara
    Cc: xb
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

27 Jul, 2007

1 commit

  • ext[234]_check_descriptors sanity checks block group descriptor geometry at
    mount time, testing whether the block bitmap, inode bitmap, and inode table
    reside wholly within the blockgroup. However, the inode table test is off
    by one so that if the last block in the inode table resides on the last
    block of the block group, the test incorrectly fails. This is because it
    tests the last block as (start + length) rather than (start + length - 1).

    This can be seen by trying to mount a filesystem made such as:

    mkfs.ext2 -F -b 1024 -m 0 -g 256 -N 3744 fsfile 1024

    which yields:

    EXT2-fs error (device loop0): ext2_check_descriptors: Inode table for group 0 not in group (block 101)!
    EXT2-fs: group descriptors corrupted!

    There is a similar bug in e2fsprogs, patch already sent for that.

    (I wonder if inside(), outside(), and/or in_range() should someday be
    used in this and other tests throughout the ext filesystems...)

    Signed-off-by: Eric Sandeen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     

20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

18 Jul, 2007

5 commits

  • Replace (n & (n-1)) in the context of power of 2 checks with
    is_power_of_2()

    Signed-off-by: Vignesh Babu
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Dave Kleikamp
    Signed-off-by: "Theodore Ts'o"

    Vignesh Babu
     
  • This patch adds nanosecond timestamps for ext4. This involves adding
    *time_extra fields to the ext4_inode to extend the timestamps to
    64-bits. Creation time is also added by this patch.

    These extended fields will fit into an inode if the filesystem was
    formatted with large inodes (-I 256 or larger) and there are currently
    no EAs consuming all of the available space. For new inodes we always
    reserve enough space for the kernel's known extended fields, but for
    inodes created with an old kernel this might not have been the case. So
    this patch also adds the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature
    flag(ro-compat so that older kernels can't create inodes with a smaller
    extra_isize). which indicates if the fields fitting inside
    s_min_extra_isize are available or not. If the expansion of inodes if
    unsuccessful then this feature will be disabled. This feature is only
    enabled if requested by the sysadmin.

    None of the extended inode fields is critical for correct filesystem
    operation.

    Signed-off-by: Andreas Dilger
    Signed-off-by: Kalpak Shah
    Signed-off-by: Eric Sandeen
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Kalpak Shah
     
  • Set the journals JBD2_FEATURE_INCOMPAT_64BIT on devices with more
    than 32bit block sizes during mount time. This ensure proper record
    lenth when writing to the journal.

    Signed-off-by: Jose R. Santos
    Signed-off-by: Andreas Dilger
    Signed-off-by: Mingming Cao
    Signed-off-by: Laurent Vivier
    Signed-off-by: "Theodore Ts'o"

    Jose R. Santos
     
  • Turn on extents feature by default in ext4 filesystem, to get wider
    testing of extents feature in ext4dev. This can be disabled using
    -o noextents.

    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
     
  • currently the export_operation structure and helpers related to it are in
    fs.h. fs.h is already far too large and there are very few places needing the
    export bits, so split them off into a separate header.

    [akpm@linux-foundation.org: fix cifs build]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Neil Brown
    Cc: Steven French
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

17 Jul, 2007

4 commits

  • This is a patch that speeds up statfs. It is very simple - the "overhead"
    calculation, which takes a huge amount of time for large filesystems, never
    changes unless the size of the filesystem itself changes. That means we can
    store it in memory and only recalculate if the filesystem has been resized
    (almost never).

    It also fixes a minor problem that we never update the on-disk superblock free
    blocks/inodes counts until the filesystem is unmounted. While not fatal, we
    may as well update that on disk when we have the information, and it makes
    things like debugfs and dumpe2fs report a bit more accurate info.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Fix error handling in ext4_create_journal according to kernel conventions.

    Signed-off-by: Borislav Petkov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • ext4_orphan_add() and ext4_orphan_del() functions lock sb->s_lock with a
    transaction started with ext4_mark_recovery_complete() waits for a transaction
    holding sb->s_lock, thus leading to a possible deadlock. At the moment we
    call ext4_mark_recovery_complete() from ext4_remount() we have done all the
    work needed for remounting and thus we are safe to drop sb->s_lock before we
    wait for transactions to commit. Note that at this moment we are still
    guarded by s_umount lock against other remounts/umounts.

    Signed-off-by: Jan Kara
    Cc: Eric Sandeen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Customers claims to ext3-related errors, investigation showed that ext3
    orphan list has been corrupted and have the reference to non-ext3 inode.
    The following debug helps to understand the reasons of this issue.

    [akpm@linux-foundation.org: update for print_hex_dump() changes]
    Signed-off-by: Vasily Averin
    Cc: "Randy.Dunlap"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Averin
     

01 Jun, 2007

1 commit


17 May, 2007

1 commit

  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

08 May, 2007

2 commits

  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code object
    manipulation of the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there would be code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand and there are easier ways to accomplish the
    same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
    been used in 6 years (so akpm says).

    find * -name \*.[ch] | xargs grep -l invalidate_bdev |
    while read file; do
    quilt add $file;
    sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
    done

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

13 Feb, 2007

1 commit


12 Feb, 2007

2 commits

  • Fix insecure default behaviour reported by Tigran Aivazian: if an ext2 or
    ext3 or ext4 filesystem is tuned to mount with "acl", but mounted by a
    kernel built without ACL support, then umask was ignored when creating
    inodes - though root or user has umask 022, touch creates files as 0666,
    and mkdir creates directories as 0777.

    This appears to have worked right until 2.6.11, when a fix to the default
    mode on symlinks (always 0777) assumed VFS applies umask: which it does,
    unless the mount is marked for ACLs; but ext[234] set MS_POSIXACL in
    s_flags according to s_mount_opt set according to def_mount_opts.

    We could revert to the 2.6.10 ext[234]_init_acl (adding an S_ISLNK test);
    but other filesystems only set MS_POSIXACL when ACLs are configured. We
    could fix this at another level; but it seems most robust to avoid setting
    the s_mount_opt flag in the first place (at the expense of more ifdefs).

    Likewise don't set the XATTR_USER flag when built without XATTR support.

    Signed-off-by: Hugh Dickins
    Cc: Tigran Aivazian
    Cc:
    Cc: Andreas Gruenbacher
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • In the rare case where we have skipped orphan inode processing due to a
    readonly block device, and the block device subsequently changes back to
    read-write, disallow a remount,rw transition of the filesystem when we have an
    unprocessed orphan inodes as this would corrupt the list.

    Ideally we should process the orphan inode list during the remount, but that's
    trickier, and this plugs the hole for now.

    Signed-off-by: Eric Sandeen
    Cc: "Stephen C. Tweedie"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     

08 Dec, 2006

4 commits

  • If you do something like:

    # touch foo
    # tail -f foo &
    # rm foo
    #
    #

    you'll panic, because ext3/4 tries to do orphan list processing on the
    readonly snapshot device, and:

    kernel: journal commit I/O error
    kernel: Assertion failure in journal_flush_Rsmp_e2f189ce() at journal.c:1356: "!journal->j_checkpoint_transactions"
    kernel: Kernel panic: Fatal exception

    for a truly readonly underlying device, it's reasonable and necessary
    to just skip orphan list processing.

    Signed-off-by: Eric Sandeen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • Update ext4_statfs to return an FSID that is a 64 bit XOR of the 128 bit
    filesystem UUID as suggested by Andreas Dilger. See the following Bugzilla
    entry for details:

    http://bugzilla.kernel.org/show_bug.cgi?id=136

    Cc: Andreas Dilger
    Cc: Stephen Tweedie
    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     
  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
    quilt add $file
    sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
    mv /tmp/$$ $file
    quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_NOFS is an alias of GFP_NOFS.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter