12 Oct, 2006

1 commit

  • Current error behaviour for ext2 and ext3 filesystems does not fully
    correspond to the documentation and should be fixed.

    According to man 8 mount, ext2 and ext3 file systems allow one of 3
    different on-error behaviours to be set:

    ---- start of quote man 8 mount ----

    errors=continue / errors=remount-ro / errors=panic

    Define the behaviour when an error is encountered. (Either ignore
    errors and just mark the file system erroneous and continue, or remount
    the file system read-only, or panic and halt the system.) The default is
    set in the filesystem superblock, and can be changed using tune2fs(8).

    ---- end of quote ----

    However, EXT3_ERRORS_CONTINUE is not read from the superblock, and thus
    ERRORS_CONT is not saved in sbi->s_mount_opt. This leads to incorrect
    handling of errors on ext3.

    We then checked the corresponding code in ext2 and discovered that it is
    buggy as well:

    - EXT2_ERRORS_CONTINUE is not read from the superblock (the same problem);

    - parse_option() does not clear the alternative values and thus something
    like (ERRORS_CONT|ERRORS_RO) can be set;

    - if options are omitted, parse_option() does not set any of these options.

    Therefore it is possible to set any combination of these options on ext2:

    - none of them may be set: EXT2_ERRORS_CONTINUE on superblock / empty mount
    options;

    - any one of them may be set using mount options;

    - any 2 options may be set: by using EXT2_ERRORS_RO/EXT2_ERRORS_PANIC on the
    superblock and another value in the mount options;

    - and finally all three options may be set by adding a third option on remount.

    Currently ext2 uses these values only in ext2_error(), so this does not lead
    to any noticeable trouble. However, somebody may be discouraged when they try
    to work around EXT2_ERRORS_PANIC on the superblock by using errors=continue in
    the mount options.

    This patch:

    EXT3_ERRORS_CONTINUE should be taken from the superblock as default value for
    error behaviour.
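
    The intended rule (exactly one error policy active, seeded from the
    superblock and replaceable by a mount option) can be sketched as below.
    This is a minimal illustration, not the kernel code: the SB_* and OPT_*
    constants and both helper names are hypothetical stand-ins, and treating
    an unknown superblock value as "continue" is an assumption of the sketch.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the on-disk s_errors values and the
 * in-memory mount flags; the real kernel constants may differ. */
enum { SB_ERRORS_CONTINUE = 1, SB_ERRORS_RO = 2, SB_ERRORS_PANIC = 3 };
enum { OPT_ERRORS_CONT = 0x1, OPT_ERRORS_RO = 0x2, OPT_ERRORS_PANIC = 0x4 };
#define OPT_ERRORS_MASK (OPT_ERRORS_CONT | OPT_ERRORS_RO | OPT_ERRORS_PANIC)

/* Replace whatever error policy is currently set with exactly one flag,
 * leaving all unrelated option bits untouched. */
static uint32_t set_errors_opt(uint32_t opts, uint32_t flag)
{
    return (opts & ~(uint32_t)OPT_ERRORS_MASK) | flag;
}

/* Seed the policy from the superblock default, including "continue".
 * Assumption: any unrecognised value falls back to "continue". */
static uint32_t default_errors_opt(uint32_t opts, int sb_errors)
{
    switch (sb_errors) {
    case SB_ERRORS_PANIC:
        return set_errors_opt(opts, OPT_ERRORS_PANIC);
    case SB_ERRORS_RO:
        return set_errors_opt(opts, OPT_ERRORS_RO);
    default:
        return set_errors_opt(opts, OPT_ERRORS_CONT);
    }
}
```

    With helpers like these, errors=continue given at mount time would
    override a panic default from the superblock instead of being OR-ed
    together with it.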

    Signed-off-by: Dmitry Mishin
    Acked-by: Vasily Averin
    Acked-by: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Mishin
     

01 Oct, 2006

7 commits


27 Sep, 2006

13 commits

  • This eliminates the i_blksize field from struct inode. Filesystems that want
    to provide a per-inode st_blksize can do so by providing their own getattr
    routine instead of using the generic_fillattr() function.

    Note that some filesystems were providing pretty much random (and incorrect)
    values for i_blksize.

    [bunk@stusta.de: cleanup]
    [akpm@osdl.org: generic_fillattr() fix]
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     
  • * Roughly half of the callers already do it by not checking the return value
    * Code in drivers/acpi/osl.c does the following to be sure:

    (void)kmem_cache_destroy(cache);

    * Those who check it printk something; however, slab_error has already
    printed the name of the failed cache.
    * XFS BUGs on a failed kmem_cache_destroy, which is not a decision a
    low-level filesystem driver should make. Converted to ignore.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • * Removing useless casts
    * Removing useless wrapper
    * Conversion from kmalloc+memset to kzalloc
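
    As a user-space analogy only (kmalloc/kzalloc are kernel-only, so
    malloc/calloc stand in for them here), the shape of the conversion is
    roughly the following. The struct and both function names are
    hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object type used only for this illustration. */
struct item {
    int id;
    char name[16];
};

/* Before: allocate, cast needlessly, then clear by hand. */
static struct item *item_alloc_old(void)
{
    struct item *it = (struct item *)malloc(sizeof(struct item));

    if (it)
        memset(it, 0, sizeof(struct item));
    return it;
}

/* After: a single zeroing allocation and no cast. */
static struct item *item_alloc_new(void)
{
    return calloc(1, sizeof(struct item));
}
```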

    Signed-off-by: Panagiotis Issaris
    Acked-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Panagiotis Issaris
     
  • Conversions from kmalloc+memset to kzalloc.

    Signed-off-by: Panagiotis Issaris
    Jffs2-bit-acked-by: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Panagiotis Issaris
     
  • Some of the changes in balloc.c are just cosmetic, as Andreas pointed out -
    if they overflow they'll then underflow and things are fine.

    5th hunk actually fixes an overflow problem.

    Also check for potential overflows in inode & block counts when resizing.

    Signed-off-by: Eric Sandeen
    Cc: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • Fixing up some endian-ness warnings in preparation to clone ext4 from ext3.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • More white space cleanups in preparation of cloning ext4 from ext3.
    Removing spaces that precede a tab.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • SWsoft Virtuozzo/OpenVZ Linux kernel team has discovered that ext3 error
    behavior was broken in linux kernels since 2.5.x versions by the following
    patch:

    2002/10/31 02:15:26-05:00 tytso@snap.thunk.org
    Default mount options from superblock for ext2/3 filesystems
    http://linux.bkbits.net:8080/linux-2.6/gnupatch@3dc0d88eKbV9ivV4ptRNM8fBuA3JBQ

    When an ext3 file system is mounted with errors=continue
    (EXT3_ERRORS_CONTINUE), errors should be ignored when possible. However, at
    present the kernel aborts the journal and remounts the filesystem read-only
    on any error. This behavior was hit a number of times and noted to differ
    from that of 2.4.x kernels.

    This patch fixes this:
    - do nothing in case of EXT3_ERRORS_CONTINUE,
    - set EXT3_MOUNT_ABORT and call journal_abort() in all other cases
    - panic() should be called after ext3_commit_super() to save
    sb marked as EXT3_ERROR_FS
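
    The three rules above can be modelled as a small decision function. This
    is a sketch based only on the description in this message: the enum, the
    struct and the function name are hypothetical, and committing the
    superblock in every case is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical model of the three errors= policies. */
enum err_policy { ERR_CONTINUE, ERR_REMOUNT_RO, ERR_PANIC };

/* What the error handler would do, in order, for a given policy. */
struct err_actions {
    bool abort_journal;      /* set the abort flag and abort the journal */
    bool remount_ro;         /* switch the filesystem to read-only */
    bool commit_super;       /* write the superblock marked as erroneous */
    bool panic_after_commit; /* panic only once the superblock is saved */
};

static struct err_actions plan_error_handling(enum err_policy policy)
{
    /* Assumption: the superblock is committed for every policy. */
    struct err_actions a = { false, false, true, false };

    if (policy == ERR_CONTINUE)
        return a;            /* nothing beyond recording the error */

    a.abort_journal = true;  /* all other cases abort the journal */
    if (policy == ERR_REMOUNT_RO)
        a.remount_ro = true;
    if (policy == ERR_PANIC)
        a.panic_after_commit = true;
    return a;
}
```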

    Signed-off-by: Vasily Averin
    Acked-by: Kirill Korotaev
    Cc: Theodore Ts'o
    Cc: "Stephen C. Tweedie"
    Cc: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Averin
     
  • Signed-off-by: Mingming Cao
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • In the past there were a few kernel panics related to failures of block
    reservation tree operations (insert/remove etc). It would be very useful to
    get the block allocation reservation map info when such an error happens.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • This is primarily format string fixes, with changes to ialloc.c where large
    inode counts could overflow, and also pass around journal_inum as an
    unsigned long, just to be pedantic about it....

    Signed-off-by: Eric Sandeen
    Cc: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • I need to do some actual IO testing now, but this gets things mounting for
    a 16T ext3 filesystem. (patched up e2fsprogs is needed too, I'll send that
    off the kernel list)

    This patch fixes these issues in the kernel:

    o sbi->s_groups_count overflows in ext3_fill_super()

    sbi->s_groups_count = (le32_to_cpu(es->s_blocks_count) -
    le32_to_cpu(es->s_first_data_block) +
    EXT3_BLOCKS_PER_GROUP(sb) - 1) /
    EXT3_BLOCKS_PER_GROUP(sb);

    at 16T, s_blocks_count is already maxed out; adding
    EXT3_BLOCKS_PER_GROUP(sb) overflows it and groups_count comes out to 0.
    Not really what we want, and causes a failed mount.

    Feel free to check my math (actually, please do!), but changing it this
    way should work & avoid the overflow:

    (A + B - 1)/B changed to: ((A - 1)/B) + 1

    o ext3_check_descriptors() overflows range checks

    ext3_check_descriptors() iterates over all block groups making sure
    that various bits are within the right block ranges... on the last pass
    through, it is checking the error case

    [item] >= block + EXT3_BLOCKS_PER_GROUP(sb)

    where "block" is the first block in the last block group. The last
    block in this group (and the last one that will fit in 32 bits) is block
    + EXT3_BLOCKS_PER_GROUP(sb)- 1. block + EXT3_BLOCKS_PER_GROUP(sb) wraps
    back around to 0.

    so, make things clearer with "first_block" and "last_block" where those
    are first and last, inclusive, and use > rather than >=.

    Finally, the last block group may be smaller than the rest, so account
    for this on the last pass through: last_block = sb->s_blocks_count - 1;

    (a similar patch could be done for ext2; does anyone in their right mind
    use ext2 at 16T? I'll send an ext2 patch doing the same thing if that's
    warranted)
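
    For anyone checking the math, both overflows can be reproduced in user
    space with 32-bit unsigned arithmetic. This is an illustrative sketch:
    the function names are made up, and A and B stand for the block count
    and blocks per group from the formula above.

```c
#include <stdint.h>

/* (A + B - 1) / B: the addition can wrap around in 32 bits. */
static uint32_t groups_naive(uint32_t a, uint32_t b)
{
    return (a + b - 1) / b;
}

/* ((A - 1) / B) + 1: same result for A >= 1, with no wrapping sum. */
static uint32_t groups_safe(uint32_t a, uint32_t b)
{
    return ((a - 1) / b) + 1;
}

/* Range check with inclusive bounds, so the upper bound never has to be
 * "one past the end" and cannot wrap back to 0. */
static int block_in_group(uint32_t item, uint32_t first_block,
                          uint32_t last_block)
{
    return item >= first_block && item <= last_block;
}
```

    With A = 0xFFFFFFFF and B = 32768 the naive form yields 0 groups while
    the rewritten form yields 131072; the rewritten form is only valid for
    A >= 1.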

    Signed-off-by: Eric Sandeen
    Cc: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • Remove whitespace from ext3 and jbd, before we clone ext4.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     

17 Sep, 2006

2 commits

  • ext3-get-blocks support caused a ~20% degradation in sequential read
    performance (tiobench). The problem is with marking the buffer boundary
    so IO can be submitted right away. Here is the patch to fix it.

    2.6.18-rc6:
    -----------
    # ./iotest
    1048576+0 records in
    1048576+0 records out
    4294967296 bytes (4.3 GB) copied, 75.2726 seconds, 57.1 MB/s

    real 1m15.285s
    user 0m0.276s
    sys 0m3.884s

    2.6.18-rc6 + fix:
    -----------------
    [root@elm3a241 ~]# ./iotest
    1048576+0 records in
    1048576+0 records out
    4294967296 bytes (4.3 GB) copied, 62.9356 seconds, 68.2 MB/s

    The boundary block check in ext3_get_blocks_handle needs to be adjusted
    against the count of blocks mapped in this call, now that it can map
    more than one block.

    Signed-off-by: Suparna Bhattacharya
    Tested-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Suparna Bhattacharya
     
  • Inodes earlier than the 'first' inode (e.g. journal, resize) should be
    rejected early - except the root inode. Also inode numbers that are too
    big should be rejected early.
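
    The check described above amounts to something like the following
    sketch. The function name and parameters are hypothetical; only the
    root inode number (2) is taken from the ext2/ext3 on-disk format.

```c
#include <stdint.h>

#define ROOT_INO 2u /* root directory inode number in ext2/ext3 */

/* Accept the root inode, or any inode from the first non-reserved one up
 * to the total inode count; reject everything else early. */
static int valid_inum(uint32_t ino, uint32_t first_ino, uint32_t inodes_count)
{
    if (ino == ROOT_INO)
        return 1;
    return ino >= first_ino && ino <= inodes_count;
}
```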

    [akpm@osdl.org: cleanup]
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

09 Sep, 2006

1 commit

  • It has been reported that ext3_getblk() is not doing the right thing and
    is triggering the following WARN():

    BUG: warning at fs/ext3/inode.c:1016/ext3_getblk()
    ext3_getblk+0x98/0x2a6 md_wakeup_thread+0x26/0x2a
    ext3_bread+0x1f/0x88 ext3_quota_read+0x136/0x1ae
    v1_read_dqblk+0x61/0xac dquot_acquire+0xf6/0x107
    ext3_acquire_dquot+0x46/0x68 dqget+0x155/0x1e7
    dquot_transfer+0x3e0/0x3e9 dput+0x23/0x13e
    ext3_setattr+0xc3/0x240 current_fs_time+0x52/0x6a
    notify_change+0x2bd/0x30d chown_common+0x9c/0xc5
    strncpy_from_user+0x3b/0x68 do_path_lookup+0xdf/0x266
    __user_walk_fd+0x44/0x5a sys_chown+0x4a/0x55
    vfs_write+0xe7/0x13c sys_mkdir+0x1f/0x23
    syscall_call+0x7/0xb

    Looking at the code, it looks like it's not handling HOLE correctly. It ends
    up returning -EIO. Here is the patch to fix it.

    If we really want to be paranoid, we can allow return values 0 (HOLE), 1
    (we asked for one block) and return -EIO for more than 1 block. But I
    really don't see a reason for doing it - all we need is the block# here.
    (doesn't matter how many blocks are mapped).

    ext3_get_blocks_handle() returns the number of blocks it mapped. It returns 0
    in case of a HOLE. ext3_getblk() should handle a HOLE properly (currently it's
    dumping a warning stack and returning -EIO).
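
    In other words, the caller has three outcomes to tell apart. A minimal
    sketch of that interpretation, with a hypothetical struct and function
    name and no claim to match the actual patch:

```c
#include <errno.h>

/* Hypothetical summary of what the caller should do next. */
struct blk_result {
    int err;        /* negative errno to propagate, or 0 */
    int have_block; /* 1 if at least one block was mapped */
};

/* Interpret the "number of blocks mapped" return value:
 *   < 0  error, passed through unchanged
 *   == 0 hole: not an error, but there is no block to read
 *   > 0  mapped; only the block number matters, not how many */
static struct blk_result interpret_mapping(int mapped)
{
    struct blk_result r = { 0, 0 };

    if (mapped < 0)
        r.err = mapped;
    else if (mapped > 0)
        r.have_block = 1;
    return r;
}
```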

    Signed-off-by: Badari Pulavarty
    Acked-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

28 Aug, 2006

1 commit

  • To handle the earlier bogus ENOSPC error caused by a filesystem full of block
    reservations, the current code falls back to allocating without block
    reservation, and starts to allocate block(s) from the goal allocation block
    group as if there were no block reservation.

    In this case the current code needs to re-load the corresponding block group
    descriptor for the initial goal block group. The patch fixes this.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     

01 Aug, 2006

2 commits

  • For files other than IFREG, the nobh option doesn't make sense. Modifications
    to them are journalled and need buffer heads to do that. Without this
    patch, we get a kernel oops in page_buffers().

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • The inode number out of an NFS file handle gets passed eventually to
    ext3_get_inode_block() without any checking. If ext3_get_inode_block()
    allows it to trigger an error, then bad filehandles can have an unpleasant
    effect - ext3_error() will usually cause a forced read-only remount, or a
    panic if `errors=panic' was used.

    So remove the call to ext3_error there and put a matching check in
    ext3/namei.c where inode numbers are read off storage.

    [akpm@osdl.org: fix off-by-one error]
    Signed-off-by: Neil Brown
    Signed-off-by: Jan Kara
    Cc: Marcel Holtmann
    Cc: "Stephen C. Tweedie"
    Cc: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Brown
     

11 Jul, 2006

1 commit


04 Jul, 2006

1 commit

  • The quota code plays interesting games with the lock ordering; to quote Jan:

    | i_mutex of inode containing quota file is acquired after all other
    | quota locks. i_mutex of all other inodes is acquired before quota
    | locks. Quota code makes sure (by resetting inode operations and
    | setting special flag on inode) that noone tries to enter quota code
    | while holding i_mutex on a quota file...

    The good news is that all of this special case i_mutex grabbing happens in the
    (per filesystem) low level quota write function. For this special case we
    need a new I_MUTEX_* nesting level, since this is just entirely outside any of
    the regular VFS locking rules for i_mutex. I trust Jan on his blue eyes that
    this is not ever going to deadlock; and based on that the patch below is what
    it takes to inform lockdep of these very interesting new locking rules.

    The new locking rule for the I_MUTEX_QUOTA nesting level is that this is the
    deepest possible level of nesting for i_mutex, and that this only should be
    used in quota write (and possibly read) function of filesystems. This makes
    the lock ordering of the I_MUTEX_* levels:

    I_MUTEX_PARENT -> I_MUTEX_CHILD -> I_MUTEX_NORMAL -> I_MUTEX_QUOTA

    Has no effect on non-lockdep kernels.

    Signed-off-by: Arjan van de Ven
    Acked-by: Ingo Molnar
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

01 Jul, 2006

1 commit


29 Jun, 2006

1 commit


27 Jun, 2006

1 commit

  • This patch adds a "-o bh" option to force use of buffer_heads. This option
    is needed once we make "nobh" the default, in case we run into problems.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

26 Jun, 2006

5 commits

  • The variables nlen and rlen are defined/initialized but not used in
    ext3_add_entry().

    Signed-off-by: Johann Lombardi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johann Lombardi
     
  • Convert the ext3 in-kernel filesystem blocks to ext3_fsblk_t. Convert the
    remaining unsigned long in-kernel filesystem block variables to ext3_fsblk_t,
    and update the printk format strings correspondingly.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • Some of the in-kernel ext3 block variables are treated as signed 4-byte
    ints, which limits an ext3 filesystem to 8TB (with a 4K block size). While
    trying to fix them, it became clear that the ext3 code is quite confusing:
    some blocks are filesystem-wide blocks, while others are group-relative
    offsets that need to be signed (as -1 has a special meaning). So it seems
    saner to define two types of physical blocks: filesystem-wide blocks and
    group-relative blocks. The following patches clarify these two types of
    blocks in the ext3 code, and fix the type bugs which limit the current
    32-bit ext3 filesystem to 8TB.

    With this series of patches and the percpu counter data type changes in the mm
    tree, we are able to extend the ext3 filesystem limit to 16TB.

    This work is also a prerequisite for the recent >32 bit ext3 work, and makes
    it a lot easier for the kernel to address 48-bit ext3 blocks: simply redefine
    ext3_fsblk_t from unsigned long to sector_t and redefine the format string for
    ext3 filesystem blocks correspondingly.

    Two RFCs with a series of patches have been posted to the ext2-devel list and
    have been reviewed and discussed:
    http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2

    http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2

    Patches are tested on both 32-bit and 64-bit machines, 8TB ext3 filesystem
    (with the latest to-be-released e2fsprogs-1.39). Tests include overnight
    fsx, tiobench, dbench and fsstress.

    This patch:

    Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for
    filesystem wide blocks.

    This patch classifies all block-group-relative blocks and ext3_fsblk_t blocks
    that occur in the same function, which used to be confusing. It also includes
    kernel bug fixes for filesystem-wide in-kernel block variables. Some
    filesystem-wide blocks are currently treated as int/unsigned int in the kernel,
    especially in the ext3 block allocation and reservation code. This patch fixes
    those bugs by converting those variables to the ext3_fsblk_t (unsigned long)
    type.
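
    A user-space sketch of the two-type split follows. The typedef and
    macro names are shortened stand-ins for ext3_fsblk_t, ext3_grpblk_t and
    the printk format macro; using int for the group-relative type and the
    three helper functions are assumptions made for illustration.

```c
#include <stdio.h>

typedef unsigned long fsblk_t; /* filesystem-wide block number */
typedef int grpblk_t;          /* group-relative offset; signed so that
                                * -1 can keep its special meaning */
#define FSBLK_FMT "%lu"        /* format string for fsblk_t */

/* Offset of a filesystem-wide block inside its block group. */
static grpblk_t fsblk_to_grpblk(fsblk_t block, fsblk_t first_data_block,
                                unsigned long blocks_per_group)
{
    return (grpblk_t)((block - first_data_block) % blocks_per_group);
}

/* Rebuild the filesystem-wide block from a group number and offset. */
static fsblk_t grpblk_to_fsblk(unsigned long group, grpblk_t offset,
                               fsblk_t first_data_block,
                               unsigned long blocks_per_group)
{
    return first_data_block + group * blocks_per_group + (fsblk_t)offset;
}

/* Print a filesystem-wide block with the matching format string. */
static int format_fsblk(char *buf, size_t len, fsblk_t block)
{
    return snprintf(buf, len, "block " FSBLK_FMT, block);
}
```

    Block 3000000000 is above 2^31, so it would go negative in a signed
    32-bit int, yet it still round-trips through these helpers.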

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • This was reported as Debian bug #336604.

    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     
  • If an ext3 filesystem is larger than 2TB, and sector_t is a u32 (i.e.
    CONFIG_LBD not defined in the kernel), the calculation of the disk sector
    will overflow. Add a check to ext3_fill_super() and ext3_group_extend() to
    prevent mounting, remounting or resizing a >2TB ext3 filesystem if sector_t
    is 4 bytes.

    Verified this patch on a 32-bit platform without CONFIG_LBD defined
    (sector_t is 32 bits long): mount refuses to mount a 10TB ext3 filesystem.
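
    The arithmetic behind the check can be sketched in user space as below.
    The function and its parameters are hypothetical; it assumes 512-byte
    sectors and a block size of at least 512 bytes (blocksize_bits >= 9),
    and it is not the exact expression used in the patch.

```c
#include <stdint.h>

/* Returns 1 when a filesystem with blocks_count blocks of
 * (1 << blocksize_bits) bytes cannot be addressed in 512-byte sectors
 * using a sector type that is sector_bits wide. */
static int sectors_overflow(uint64_t blocks_count, unsigned blocksize_bits,
                            unsigned sector_bits)
{
    uint64_t max_sector = sector_bits >= 64
        ? UINT64_MAX
        : (UINT64_C(1) << sector_bits) - 1;

    /* Compare in units of blocks so nothing is shifted left and wraps. */
    return blocks_count > (max_sector >> (blocksize_bits - 9));
}
```

    For a 10TB filesystem with 4K blocks (10 << 28 blocks) this reports an
    overflow with a 32-bit sector type and none with a 64-bit one.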

    Signed-off-by: Mingming Cao
    Acked-by: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     

23 Jun, 2006

3 commits

  • The percpu counter data types are changed in this set of patches to support
    more users like ext3 who need more than 32 bits to store the free blocks
    total in the filesystem.

    - Generic percpu counter data type changes. The sizes of the global counter
    and local counter are explicitly specified using s64 and s32. The global
    counter is changed from long to s64, while the local counter is changed from
    long to s32, so we can avoid doing 64-bit updates in most cases.

    - Users of the percpu counters are updated to make use of the new
    percpu_counter_init() routine now taking an additional parameter to allow
    users to pass the initial value of the global counter.
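
    A single-threaded model of the idea (64-bit global count, small 32-bit
    per-CPU deltas folded in only when they reach a batch size) might look
    like this. All names, the CPU count and the batch size are assumptions
    of the sketch, and the locking a real implementation needs is omitted.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 4 /* hypothetical CPU count */
#define SKETCH_BATCH 32  /* hypothetical fold threshold */

struct pc_counter {
    int64_t count;                 /* global value, 64-bit */
    int32_t local[SKETCH_NR_CPUS]; /* per-CPU deltas, 32-bit */
};

/* Initialise with a caller-supplied starting value for the global count. */
static void pc_init(struct pc_counter *c, int64_t initial)
{
    c->count = initial;
    for (int i = 0; i < SKETCH_NR_CPUS; i++)
        c->local[i] = 0;
}

/* Most updates touch only the 32-bit local delta; the 64-bit global is
 * updated only when the delta reaches the batch size. */
static void pc_mod(struct pc_counter *c, int cpu, int32_t amount)
{
    int32_t v = c->local[cpu] + amount;

    if (v >= SKETCH_BATCH || v <= -SKETCH_BATCH) {
        c->count += v;
        v = 0;
    }
    c->local[cpu] = v;
}

/* Exact value: global count plus all outstanding local deltas. */
static int64_t pc_sum(const struct pc_counter *c)
{
    int64_t sum = c->count;

    for (int i = 0; i < SKETCH_NR_CPUS; i++)
        sum += c->local[i];
    return sum;
}
```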

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • Steven Rostedt points out that `rsv' here is usually
    NULL, so we should avoid calling kfree().

    Also, fix up some nearby whitespace damage.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Give the statfs superblock operation a dentry pointer rather than a superblock
    pointer.

    This complements the get_sb() patch. That reduced the significance of
    sb->s_root, allowing NFS to place a fake root there. However, NFS does
    require a dentry to use as a target for the statfs operation. This permits
    the root in the vfsmount to be used instead.

    linux/mount.h has been added where necessary to make allyesconfig build
    successfully.

    Interest has also been expressed for use with the FUSE and XFS filesystems.

    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Nathan Scott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells