18 Oct, 2007

9 commits

  • Convert s_r_blocks_count and s_free_blocks_count to
    s_r_blocks_count_lo and s_free_blocks_count_lo

    This helps in finding BUGs due to direct partial access of
    these split 64 bit values

    Also fix direct partial access in ext4 code

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
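
    The renames above make it harder to read only half of a split 64-bit value by accident. A minimal sketch of the accessor pattern this encourages is shown below. The struct, the `_hi` field name, and the helper names are hypothetical stand-ins, and on-disk byte order conversion is left out for brevity.

```c
#include <stdint.h>

/* Hypothetical stand-in for the on-disk superblock; only one split
 * field is shown. The _hi name is an assumption made for illustration. */
struct sb_sketch {
    uint32_t s_blocks_count_lo; /* low 32 bits of the block count */
    uint32_t s_blocks_count_hi; /* high 32 bits of the block count */
};

/* Read the full 64-bit value instead of touching one half directly. */
static uint64_t sketch_blocks_count(const struct sb_sketch *sb)
{
    return ((uint64_t)sb->s_blocks_count_hi << 32) | sb->s_blocks_count_lo;
}

/* Write both halves together so they cannot drift apart. */
static void sketch_set_blocks_count(struct sb_sketch *sb, uint64_t blocks)
{
    sb->s_blocks_count_lo = (uint32_t)blocks;
    sb->s_blocks_count_hi = (uint32_t)(blocks >> 32);
}
```

    With the old field name gone, any code that still reads the field directly fails to compile, which is how the rename helps find partial accesses.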
     
  • Convert s_blocks_count to s_blocks_count_lo
    This helps in finding BUGs due to direct partial access of
    these split 64 bit values

    Also fix direct partial access in ext4 code

    Signed-off-by: Aneesh Kumar K.V

    Aneesh Kumar K.V
     
  • Convert bg_inode_bitmap and bg_inode_table to bg_inode_bitmap_lo
    and bg_inode_table_lo. This helps in finding BUGs due to
    direct partial access of these split 64 bit values

    Also fix one direct partial access

    Signed-off-by: Aneesh Kumar K.V

    Aneesh Kumar K.V
     
  • Convert bg_block_bitmap to bg_block_bitmap_lo
    This helps in catching some BUGS due to direct
    partial access of these split fields.

    Signed-off-by: Aneesh Kumar K.V

    Aneesh Kumar K.V
     
  • This feature relaxes check restrictions on where each block group's
    metadata is located within the storage media. This allows for the allocation
    of bitmaps or inode tables outside the block group boundaries in cases
    where bad blocks force us to look for new blocks which the owning block
    group cannot satisfy. This will also allow for new metadata allocation
    schemes to improve performance and scalability.

    Signed-off-by: Jose R. Santos
    Cc:
    Signed-off-by: Andrew Morton

    Jose R. Santos
     
  • Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Andrew Morton

    Aneesh Kumar K.V
     
  • In pass1 of e2fsck, every inode table in the filesystem is scanned and checked,
    regardless of whether it is in use. This is the most time-consuming part
    of the filesystem check. The uninitialized block group feature can greatly
    reduce e2fsck time by eliminating checking of uninitialized inodes.

    With this feature, there is a high water mark of used inodes for each block
    group. Block and inode bitmaps can be uninitialized on disk via a flag in the
    group descriptor to avoid reading or scanning them at e2fsck time. A checksum
    of each group descriptor is used to ensure that corruption in the group
    descriptor's bit flags does not cause incorrect operation.

    The feature is enabled through a mkfs option

    mke2fs /dev/ -O uninit_groups

    A patch adding support for uninitialized block groups to e2fsprogs tools has
    been posted to the linux-ext4 mailing list.

    The patches have been stress tested with fsstress and fsx. In performance
    tests of e2fsck time, we have seen that e2fsck time on ext3 grows
    linearly with the total number of inodes in the filesystem. In ext4 with the
    uninitialized block groups feature, the e2fsck time is independent of the
    total inode count and depends solely on the number of used inodes.
    Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
    greatly reduce e2fsck time for users, with a performance improvement of 2-20
    times, depending on how full the filesystem is.

    The attached graph shows the major improvements in e2fsck times in filesystems
    with a large total inode count, but few inodes in use.

    In each group descriptor if we have

    EXT4_BG_INODE_UNINIT set in bg_flags:
    Inode table is not initialized/used in this group. So we can skip
    the consistency check during fsck.
    EXT4_BG_BLOCK_UNINIT set in bg_flags:
    No block in the group is used. So we can skip the block bitmap
    verification for this group.

    We also add two new fields to group descriptor as a part of
    uninitialized group patch.

    __le16 bg_itable_unused; /* Unused inodes count */
    __le16 bg_checksum; /* crc16(sb_uuid+group+desc) */

    bg_itable_unused:

    If EXT4_BG_INODE_UNINIT is not set in bg_flags, then
    bg_itable_unused gives the offset within the inode table
    up to which inodes are in use. fsck can use this to skip
    the inodes that are marked unused.

    bg_checksum:
    Now that we depend on bg_flags and bg_itable_unused to determine
    the block and inode usage, we need to make sure group descriptor
    is not corrupt. We add a checksum to the group descriptor to
    detect corruption. If the descriptor is found to be corrupt, we
    mark all the blocks and inodes in the group used.

    Signed-off-by: Avantika Mathur
    Signed-off-by: Andreas Dilger
    Signed-off-by: Mingming Cao
    Signed-off-by: Aneesh Kumar K.V

    Andreas Dilger
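
    The rules above can be sketched as a small decision function for how many inodes a checker would still need to scan in one group. This is only an illustration: the struct and function names are hypothetical, the flag value is an assumption, bg_itable_unused is read as a count of unused inodes at the end of the table (following the field comment), and the checksum result is passed in rather than computed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag value, for illustration only. */
#define SKETCH_BG_INODE_UNINIT 0x0001

/* Hypothetical cut-down group descriptor with just the fields used here. */
struct gd_sketch {
    uint16_t bg_flags;
    uint16_t bg_itable_unused; /* unused inodes count */
};

/* Return how many inodes of this group a checker would have to scan. */
static uint32_t sketch_inodes_to_scan(const struct gd_sketch *gd,
                                      uint32_t inodes_per_group,
                                      bool checksum_ok)
{
    /* Corrupt descriptor: trust nothing, treat every inode as used. */
    if (!checksum_ok)
        return inodes_per_group;

    /* Inode table not initialized: nothing to scan in this group. */
    if (gd->bg_flags & SKETCH_BG_INODE_UNINIT)
        return 0;

    /* Guard against a nonsensical count before subtracting. */
    if (gd->bg_itable_unused > inodes_per_group)
        return inodes_per_group;

    /* Only the inodes below the high water mark need checking. */
    return inodes_per_group - gd->bg_itable_unused;
}
```

    Falling back to a full scan on a bad checksum mirrors the statement that a corrupt descriptor causes all blocks and inodes in the group to be marked used.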
     
  • CONFIG_EXT4_INDEX is not an exposed config option in the kernel, and it is
    unconditionally defined in ext4_fs.h. tune2fs is already able to turn off
    dir indexing, so at this point it's just cluttering up the code. Remove
    it.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton

    Eric Sandeen
     
  • Fragment support in ext2/3/4 was never implemented, and it probably will
    never be implemented. So remove it from ext4.

    Signed-off-by: Coly Li
    Acked-by: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: "Theodore Ts'o"

    Coly Li
     

18 Jul, 2007

6 commits

  • This patch adds support to ext4 for allowing more than 65000
    subdirectories. Currently the maximum number of subdirectories is capped
    at 32000.

    If we exceed 65000 subdirectories in an htree directory it sets the
    inode link count to 1 and no longer counts subdirectories. The
    directory link count is not actually used when determining if a
    directory is empty, as that only counts subdirectories and not regular
    files that might be in there.

    An EXT4_FEATURE_RO_COMPAT_DIR_NLINK flag has been added and it is set if
    the subdir count for any directory crosses 65000. A later fsck will clear
    EXT4_FEATURE_RO_COMPAT_DIR_NLINK if there are no longer any directories
    with >65000 subdirs.

    Signed-off-by: Andreas Dilger
    Signed-off-by: Kalpak Shah
    Signed-off-by: "Theodore Ts'o"

    Andreas Dilger
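
    The link count behaviour described above could be sketched roughly as follows. The function name, the constant name, and the exact boundary at which the count collapses to 1 are assumptions for illustration; the failure path for non-htree directories that hit the limit is not modelled.

```c
#include <stdbool.h>

/* Assumed limit, taken from the 65000 figure in the description. */
#define SKETCH_DIR_LINK_MAX 65000u

/* Return the parent directory's link count after one more subdirectory
 * is created in it. A count of 1 means "subdirectories are no longer
 * counted". */
static unsigned sketch_dir_nlink_after_mkdir(unsigned nlink, bool is_htree)
{
    /* Already in the uncounted state: stay there. */
    if (nlink == 1)
        return 1;

    nlink++;

    /* An htree directory that reaches the limit stops counting. */
    if (is_htree && nlink >= SKETCH_DIR_LINK_MAX)
        return 1;

    return nlink;
}
```

    Because emptiness checks do not rely on the link count, pinning it at 1 does not affect rmdir.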
     
  • We need to make sure that existing ext3 filesystems can also make use of
    the new fields that have been added to the ext4 inode. We use
    s_want_extra_isize and s_min_extra_isize to decide by how much we should
    expand the inode. If EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature is set
    then we expand the inode by max(s_want_extra_isize, s_min_extra_isize,
    sizeof(ext4_inode) - EXT4_GOOD_OLD_INODE_SIZE) bytes. Actually it is
    still an open question about whether users should be able to set
    s_*_extra_isize smaller than the known fields or not.

    This patch also adds the functionality to expand inodes to include the
    newly added fields. We start by trying to expand by s_want_extra_isize
    bytes and if it fails we try to expand by s_min_extra_isize bytes. This
    is done by changing the i_extra_isize if enough space is available in
    the inode and no EAs are present. If EAs are present and there is enough
    space in the inode then the EAs in the inode are shifted to make space.
    If enough space is not available in the inode due to the EAs then 1 or
    more EAs are shifted to the external EA block. In the worst case when
    even the external EA block does not have enough space we inform the user
    that some EA would need to be deleted or s_min_extra_isize would have to
    be reduced.

    Signed-off-by: Andreas Dilger
    Signed-off-by: Kalpak Shah
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Kalpak Shah
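
    The expansion size named above is a three-way maximum. A minimal sketch, with a hypothetical function name and with the "known fields" size (sizeof(ext4_inode) - EXT4_GOOD_OLD_INODE_SIZE) passed in as a plain number:

```c
#include <stdint.h>

/* Pick the number of extra bytes to expand an inode by: the largest of
 * the wanted size, the minimum size, and the size of the fields this
 * kernel knows about. */
static uint16_t sketch_extra_isize(uint16_t want_extra_isize,
                                   uint16_t min_extra_isize,
                                   uint16_t known_fields_size)
{
    uint16_t size = want_extra_isize;

    if (min_extra_isize > size)
        size = min_extra_isize;
    if (known_fields_size > size)
        size = known_fields_size;

    return size;
}
```

    If expanding by this amount fails, the description says a smaller expansion by s_min_extra_isize is attempted next; that retry is not shown here.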
     
  • This patch adds nanosecond timestamps for ext4. This involves adding
    *time_extra fields to the ext4_inode to extend the timestamps to
    64-bits. Creation time is also added by this patch.

    These extended fields will fit into an inode if the filesystem was
    formatted with large inodes (-I 256 or larger) and there are currently
    no EAs consuming all of the available space. For new inodes we always
    reserve enough space for the kernel's known extended fields, but for
    inodes created with an old kernel this might not have been the case. So
    this patch also adds the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature
    flag (ro-compat so that older kernels can't create inodes with a smaller
    extra_isize), which indicates if the fields fitting inside
    s_min_extra_isize are available or not. If the expansion of inodes is
    unsuccessful then this feature will be disabled. This feature is only
    enabled if requested by the sysadmin.

    None of the extended inode fields is critical for correct filesystem
    operation.

    Signed-off-by: Andreas Dilger
    Signed-off-by: Kalpak Shah
    Signed-off-by: Eric Sandeen
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Kalpak Shah
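
    One way a 32-bit *time_extra field can carry both nanoseconds and extra epoch bits is sketched below. The layout used here (low 2 bits extend the seconds, upper 30 bits hold nanoseconds) is an assumption that is not stated in the commit message, all names are hypothetical, and only non-negative times are handled.

```c
#include <stdint.h>

/* Assumed layout: 2 epoch bits in the low end, 30 nanosecond bits above. */
#define SKETCH_EPOCH_BITS 2
#define SKETCH_EPOCH_MASK ((1u << SKETCH_EPOCH_BITS) - 1)

/* Hypothetical pair of on-disk fields for one timestamp. */
struct ts_sketch {
    uint32_t xtime;       /* low 32 bits of the seconds */
    uint32_t xtime_extra; /* nanoseconds plus extra epoch bits */
};

/* Pack seconds and nanoseconds; nsec must be below 1000000000. */
static struct ts_sketch sketch_encode_time(uint64_t sec, uint32_t nsec)
{
    struct ts_sketch t;

    t.xtime = (uint32_t)sec;
    t.xtime_extra = (nsec << SKETCH_EPOCH_BITS) |
                    ((uint32_t)(sec >> 32) & SKETCH_EPOCH_MASK);
    return t;
}

/* Rebuild the seconds from the base field and the epoch bits. */
static uint64_t sketch_decode_sec(struct ts_sketch t)
{
    return ((uint64_t)(t.xtime_extra & SKETCH_EPOCH_MASK) << 32) | t.xtime;
}

/* Extract the nanoseconds from the upper bits of the extra field. */
static uint32_t sketch_decode_nsec(struct ts_sketch t)
{
    return t.xtime_extra >> SKETCH_EPOCH_BITS;
}
```

    An inode without room for the extra field would simply keep second resolution, which fits the note that none of the extended fields is critical for correct operation.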
     
  • When the JBD code was forked to create the new JBD2 code base, the
    references to CONFIG_JBD_DEBUG were never changed to
    CONFIG_JBD2_DEBUG. This patch fixes that.

    Signed-off-by: Jose R. Santos
    Signed-off-by: "Theodore Ts'o"

    Jose R. Santos
     
  • Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
    ext4-specific i_flags. Quota code changes these flags on quota files
    (to make it harder for sysadmin to screw himself) and these changes were
    not correctly propagated into the filesystem.

    (This is a forward port patch from ext3)

    Signed-off-by: Jan Kara
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • This patch implements the ->fallocate() inode operation in ext4. With this
    patch, users of ext4 file systems will be able to use the fallocate() system
    call for persistent preallocation. The current implementation only supports
    preallocation for regular files with extent maps (directories are not yet
    supported). This patch does not currently support block-mapped files.
    Only the FALLOC_ALLOCATE and FALLOC_RESV_SPACE modes are supported for
    now.

    Signed-off-by: Amit Arora

    Amit Arora
     

01 Jun, 2007

2 commits


13 Feb, 2007

1 commit

  • Many struct inode_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
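
    The pattern described above, shown on a toy operations table. The struct and function names are hypothetical and unrelated to the real kernel definitions.

```c
/* Hypothetical operations table, standing in for struct inode_operations. */
struct sketch_inode_ops {
    int (*getattr)(int ino);
};

/* Toy implementation that just echoes the inode number back. */
static int sketch_getattr(int ino)
{
    return ino;
}

/* Declared const: the table can be placed in read-only data, and an
 * assignment such as sketch_file_ops.getattr = 0 is rejected by the
 * compiler. */
static const struct sketch_inode_ops sketch_file_ops = {
    .getattr = sketch_getattr,
};
```

    Callers still invoke the operation through the table as before, for example sketch_file_ops.getattr(42).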
     

12 Oct, 2006

9 commits