09 Dec, 2006

2 commits

  • This facility provides three entry points:

    ilog2() Log base 2 of unsigned long
    ilog2_u32() Log base 2 of u32
    ilog2_u64() Log base 2 of u64

    These facilities can either be used inside functions on dynamic data:

    int do_something(long q)
    {
        ...;
        y = ilog2(q);
        ...;
    }

    Or can be used to statically initialise global variables with constant values:

    unsigned n = ilog2(27);

    When performing static initialisation, the compiler will report "error:
    initializer element is not constant" if asked to take a log of zero or of
    something not reducible to a constant. These functions treat negative
    numbers as unsigned.

    When not dealing with a constant, they fall back to using fls() which permits
    them to use arch-specific log calculation instructions - such as BSR on
    x86/x86_64 or SCAN on FRV - if available.
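
    As a rough userspace illustration of the non-constant fallback path (not
    the kernel implementation), the sketch below finds the highest set bit
    with a portable loop and derives the log from it. The names fls_sketch()
    and ilog2_sketch() are hypothetical; a real architecture would use an
    instruction such as BSR instead of the loop.

```c
#include <assert.h>

/* Hypothetical stand-in for fls(): 1-based index of the highest set bit,
 * or 0 when no bit is set. */
static int fls_sketch(unsigned long n)
{
	int pos = 0;

	while (n != 0) {
		n >>= 1;
		pos++;
	}
	return pos;
}

/* Integer log base 2, rounded down. The log of zero is undefined, so the
 * sketch asserts instead of returning a made-up value. */
static int ilog2_sketch(unsigned long n)
{
	assert(n != 0);
	return fls_sketch(n) - 1;
}
```

    With this sketch, ilog2_sketch(27) evaluates to 4, since 27 lies between
    16 and 32.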

    [akpm@osdl.org: MMC fix]
    Signed-off-by: David Howells
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Herbert Xu
    Cc: David Howells
    Cc: Wojtek Kaniewski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the ext3
    filesystem.

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
     

08 Dec, 2006

15 commits

  • Port the fix for the off-by-one in find_next_usable_block's memscan from
    ext2 to ext3. The bug did not cause a serious problem for ext3, because the
    additional ext3_test_allocatable check rescued it from the error.

    Signed-off-by: Mingming Cao
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • ext3_new_blocks has a nice io_error label for setting -EIO, so goto that in
    the one place that doesn't already use it.

    Signed-off-by: Mingming Cao
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • The reservations tree is an rb_tree not a list, so it's less confusing to use
    rb_entry() than list_entry() - though they're both just container_of().
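
    To illustrate why the two macros are interchangeable in effect, here is a
    minimal userspace sketch of the container_of() idea. The macro name
    container_of_sketch and both struct types are hypothetical and only
    mimic the shape of a reservation with an embedded tree node.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace version of the container_of() idea: step back from a pointer to
 * an embedded member to the structure that contains it. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical node types, for illustration only. */
struct tree_node_sketch {
	struct tree_node_sketch *left, *right;
};

struct reservation_sketch {
	unsigned long start;
	unsigned long end;
	struct tree_node_sketch node;	/* embedded tree linkage */
};

/* Recover the reservation from a pointer to its embedded tree node. */
static struct reservation_sketch *
node_to_reservation(struct tree_node_sketch *n)
{
	assert(n != NULL);
	return container_of_sketch(n, struct reservation_sketch, node);
}
```

    Whatever the wrapper macro is called, the pointer arithmetic is the same;
    the name only documents which data structure the node belongs to.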

    Signed-off-by: Mingming Cao
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • rsv_end is the last block within the reservation, so alloc_new_reservation
    should accept start_block == rsv_end as success.
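
    A small sketch of the inclusive-bound comparison, assuming a simplified
    window structure (rsv_window_sketch and block_in_window() are
    hypothetical names, not the ext3 ones):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reservation window; both bounds are inclusive. */
struct rsv_window_sketch {
	unsigned long rsv_start;
	unsigned long rsv_end;	/* last block inside the window */
};

/* A candidate block is inside the window when it is no later than rsv_end.
 * Using '<' for the upper bound would wrongly reject the final block. */
static bool block_in_window(const struct rsv_window_sketch *w,
			    unsigned long start_block)
{
	assert(w->rsv_start <= w->rsv_end);
	return start_block >= w->rsv_start && start_block <= w->rsv_end;
}
```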

    Signed-off-by: Mingming Cao
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • grp_goal 0 is a genuine goal (unlike -1), so ext3_try_to_allocate_with_rsv
    should treat it as such.
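
    The distinction can be sketched as a one-line predicate, assuming -1 is
    the only "no goal" value (has_goal() is a hypothetical helper):

```c
#include <assert.h>
#include <stdbool.h>

/* Group-relative goal: -1 means "no goal"; 0 and above are real block
 * offsets within the group. */
static bool has_goal(long grp_goal)
{
	assert(grp_goal >= -1);
	return grp_goal >= 0;	/* '> 0' would wrongly discard offset 0 */
}
```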

    Signed-off-by: Mingming Cao
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • ext3_new_blocks should reset the reservation window size to 0 when squeezing
    the last blocks out of an almost full filesystem, so the retry doesn't skip
    any groups with less than half that free, reporting ENOSPC too soon.

    Signed-off-by: Mingming Cao
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • If you do something like:

    # touch foo
    # tail -f foo &
    # rm foo
    #
    #

    you'll panic, because ext3/4 tries to do orphan list processing on the
    readonly snapshot device, and:

    kernel: journal commit I/O error
    kernel: Assertion failure in journal_flush_Rsmp_e2f189ce() at journal.c:1356: "!journal->j_checkpoint_transactions"
    kernel: Kernel panic: Fatal exception

    For a truly read-only underlying device, it's reasonable and necessary
    to just skip orphan list processing.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • Hugh Dickins wrote:
    > Not found anything relevant, but I keep noticing these lines
    > in ext2_try_to_allocate_with_rsv(), ext3 and ext4 similar:
    >
    > } else if (grp_goal > 0 &&
    > (my_rsv->rsv_end - grp_goal + 1) < *count)
    > try_to_extend_reservation(my_rsv, sb,
    > *count-my_rsv->rsv_end + grp_goal - 1);
    >
    > They're wrong, a no-op in most groups, aren't they? rsv_end is an
    > absolute block number, whereas grp_goal is group-relative, so the
    > calculation ought to bring in group_first_block? Or I'm confused.
    >
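
    The unit mismatch raised in the quoted mail can be sketched as follows.
    This is an illustration under assumptions, not the patch itself: the
    helper blocks_short() and its parameter list are hypothetical. It
    converts the group-relative goal to an absolute block number before
    comparing it with rsv_end. With mixed units, rsv_end - grp_goal + 1 is a
    huge number in every group but the first, so the "< *count" test almost
    never fires.

```c
#include <assert.h>

/* rsv_end is an absolute block number and grp_goal is an offset within the
 * group, so the goal is made absolute before the two are compared. Returns
 * how many more blocks the window would need to satisfy the request. */
static unsigned long blocks_short(unsigned long rsv_end,
				  unsigned long group_first_block,
				  unsigned long grp_goal,
				  unsigned long count)
{
	unsigned long goal_abs = group_first_block + grp_goal;
	unsigned long avail;

	assert(count > 0);
	if (goal_abs > rsv_end)
		return count;	/* goal lies past the window */
	avail = rsv_end - goal_abs + 1;	/* blocks from goal to window end */

	return avail < count ? count - avail : 0;
}
```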

    Signed-off-by: Mingming Cao
    Cc: "linux-ext4@vger.kernel.org"
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • In journal=ordered or journal=data mode, the retry in ext3_prepare_write()
    breaks the requirements of journaling data with respect to metadata.
    The fix is to call commit_write to commit the allocated zero blocks before
    retrying.

    Signed-off-by: Kirill Korotaev
    Cc: Ingo Molnar
    Cc: Ken Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Savochkin
     
  • Saves nearly 4kbytes on x86.

    Cc: Arnaldo Carvalho de Melo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • I've been using Steve Grubb's purely evil "fsfuzzer" tool, at
    http://people.redhat.com/sgrubb/files/fsfuzzer-0.4.tar.gz

    Basically it makes a filesystem, splats some random bits over it, then
    tries to mount it and do some simple filesystem actions.

    At best, the filesystem catches the corruption gracefully. At worst,
    things spin out of control.

    As you might guess, we found a couple places in ext3 where things spin out
    of control :)

    First, we had a corrupted directory that was never checked for
    consistency... it was corrupt, and pointed to another bad "entry" of
    length 0. The for() loop looped forever, since the length of
    ext3_next_entry(de) was 0, and we kept looking at the same pointer over and
    over and over and over... I modeled this check and subsequent action on
    what is done for other directory types in ext3_readdir...

    (adding this check adds some computational expense; I am testing a followup
    patch to reduce the number of times we check and re-check these directory
    entries, in all cases. Thanks for the idea, Andreas).

    Next we had a root directory inode which had a corrupted size, claimed to
    be > 200M on a 4M filesystem. There was only really 1 block in the
    directory, but because the size was so large, readdir kept coming back for
    more, spewing thousands of printk's along the way.

    Per Andreas' suggestion, if we're in this read error condition and we're
    trying to read an offset which is greater than i_blocks worth of bytes,
    stop trying, and break out of the loop.

    With these two changes fsfuzz test survives quite well on ext3.
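
    The first failure mode (a zero-length entry that pins the cursor in
    place) can be reproduced in a few lines of userspace C. The record
    layout below is a hypothetical simplification, not the ext3 directory
    entry format; only the length field is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal record header: only the length field matters here. */
struct rec_sketch {
	uint16_t rec_len;	/* distance in bytes to the next record */
};

/* Count the records in buf, or return -1 if a record is corrupt. A zero or
 * undersized rec_len would keep the cursor on the same record forever, and
 * an oversized one would run past the buffer, so both are rejected. */
static int count_records(const unsigned char *buf, size_t size)
{
	size_t off = 0;
	int n = 0;

	assert(buf != NULL);
	while (off + sizeof(struct rec_sketch) <= size) {
		struct rec_sketch rec;

		memcpy(&rec, buf + off, sizeof(rec));	/* avoid unaligned reads */
		if (rec.rec_len < sizeof(rec) || rec.rec_len > size - off)
			return -1;
		off += rec.rec_len;
		n++;
	}
	return n;
}
```

    Without the length check, the loop would never advance past a record
    whose rec_len is 0.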

    Signed-off-by: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • lock_super() is unnecessary for setting super-block feature flags. Use the
    provided *_SET_COMPAT_FEATURE() macros as well.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Gruenbacher
     
  • Update ext3_statfs to return an FSID that is a 64 bit XOR of the 128 bit
    filesystem UUID as suggested by Andreas Dilger. See the following Bugzilla
    entry for details:

    http://bugzilla.kernel.org/show_bug.cgi?id=136
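
    A minimal sketch of folding a 128-bit UUID into a 64-bit value by XOR.
    The helper names, the little-endian interpretation of the two halves,
    and the split into two 32-bit words are assumptions made for the
    illustration rather than details taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read 8 bytes as a little-endian 64-bit value, independent of host order. */
static uint64_t le64_load(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Fold a 128-bit UUID into 64 bits by XORing its two halves, then split the
 * result into a low and a high 32-bit word. */
static void uuid_to_fsid(const unsigned char uuid[16], uint32_t fsid[2])
{
	uint64_t folded;

	assert(uuid != NULL && fsid != NULL);
	folded = le64_load(uuid) ^ le64_load(uuid + 8);
	fsid[0] = (uint32_t)(folded & 0xFFFFFFFFu);
	fsid[1] = (uint32_t)(folded >> 32);
}
```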

    Cc: Andreas Dilger
    Cc: Stephen Tweedie
    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     
  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
        quilt add $file
        sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
        mv /tmp/$$ $file
        quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_NOFS is an alias of GFP_NOFS.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

12 Oct, 2006

1 commit

  • Current error behaviour for ext2 and ext3 filesystems does not fully
    correspond to the documentation and should be fixed.

    According to man 8 mount, ext2 and ext3 file systems allow one of three
    on-error behaviours to be selected:

    ---- start of quote man 8 mount ----

    errors=continue / errors=remount-ro / errors=panic

    Define the behaviour when an error is encountered. (Either ignore
    errors and just mark the file system erroneous and continue, or remount
    the file system read-only, or panic and halt the system.) The default is
    set in the filesystem superblock, and can be changed using tune2fs(8).

    ---- end of quote ----

    However, EXT3_ERRORS_CONTINUE is not read from the superblock, and thus
    ERRORS_CONT is not saved in sbi->s_mount_opt. This leads to incorrect
    handling of errors on ext3.

    We then checked the corresponding code in ext2 and discovered that it is
    buggy as well:

    - EXT2_ERRORS_CONTINUE is not read from the superblock (the same);

    - parse_option() does not clear the alternative values, so something
    like (ERRORS_CONT|ERRORS_RO) can be set;

    - if options are omitted, parse_option() does not set any of these options.

    Therefore it is possible to set any combination of these options on ext2:

    - none of them may be set: EXT2_ERRORS_CONTINUE on the superblock and empty
    mount options;

    - any one of them may be set using mount options;

    - any two of them may be set: by using EXT2_ERRORS_RO/EXT2_ERRORS_PANIC on
    the superblock and another value in the mount options;

    - and finally all three options may be set by adding a third option on
    remount.

    Currently ext2 uses these values only in ext2_error(), so this does not lead
    to any noticeable trouble. However, somebody may be discouraged when they
    try to work around EXT2_ERRORS_PANIC on the superblock by using
    errors=continue in the mount options.

    This patch:

    EXT3_ERRORS_CONTINUE should be taken from the superblock as default value for
    error behaviour.
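
    The mutual-exclusion problem described for option parsing can be
    sketched like this. The flag names and values are hypothetical
    placeholders, not the kernel's definitions; the point is only that the
    alternatives are cleared before the chosen one is set.

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only. */
#define ERRORS_CONT_SKETCH	0x1u
#define ERRORS_RO_SKETCH	0x2u
#define ERRORS_PANIC_SKETCH	0x4u
#define ERRORS_MASK_SKETCH \
	(ERRORS_CONT_SKETCH | ERRORS_RO_SKETCH | ERRORS_PANIC_SKETCH)

/* Select one error behaviour: clear all three alternatives first so that
 * exactly one of them remains set afterwards. Other option bits are kept. */
static unsigned int set_errors_option(unsigned int opts, unsigned int choice)
{
	assert(choice == ERRORS_CONT_SKETCH ||
	       choice == ERRORS_RO_SKETCH ||
	       choice == ERRORS_PANIC_SKETCH);
	opts &= ~ERRORS_MASK_SKETCH;
	return opts | choice;
}
```

    Applied to a default taken from the superblock, a later errors=continue
    then replaces a panic default instead of being ORed on top of it.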

    Signed-off-by: Dmitry Mishin
    Acked-by: Vasily Averin
    Acked-by: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Mishin
     

01 Oct, 2006

7 commits


27 Sep, 2006

13 commits

  • This eliminates the i_blksize field from struct inode. Filesystems that want
    to provide a per-inode st_blksize can do so by providing their own getattr
    routine instead of using the generic_fillattr() function.

    Note that some filesystems were providing pretty much random (and incorrect)
    values for i_blksize.

    [bunk@stusta.de: cleanup]
    [akpm@osdl.org: generic_fillattr() fix]
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     
  • * Roughly half of the callers already do it by not checking the return
    value
    * Code in drivers/acpi/osl.c does the following to be sure:

    (void)kmem_cache_destroy(cache);

    * Those who check it printk something; however, slab_error has already
    printed the name of the failed cache.
    * XFS BUGs on a failed kmem_cache_destroy, which is not a decision a
    low-level filesystem driver should make. Converted to ignore.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • * Removing useless casts
    * Removing useless wrapper
    * Conversion from kmalloc+memset to kzalloc
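
    A userspace analogue of the kmalloc+memset to kzalloc conversion, using
    malloc()/memset() versus calloc(). The struct and function names are
    hypothetical; the kernel allocators take additional arguments (such as
    GFP flags) that are not modelled here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object type, for illustration only. */
struct item_sketch {
	int id;
	char name[16];
};

/* Two-step pattern: allocate, then clear by hand. */
static struct item_sketch *item_alloc_two_step(void)
{
	struct item_sketch *it = malloc(sizeof(*it));

	if (it)
		memset(it, 0, sizeof(*it));
	return it;
}

/* One-step pattern: calloc() returns memory that is already zeroed, which
 * is the userspace counterpart of a zeroing allocator. */
static struct item_sketch *item_alloc_zeroed(void)
{
	return calloc(1, sizeof(struct item_sketch));
}
```

    Both functions return an equivalent object; the one-step form is shorter
    and cannot forget the clearing step.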

    Signed-off-by: Panagiotis Issaris
    Acked-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Panagiotis Issaris
     
  • Conversions from kmalloc+memset to kzalloc.

    Signed-off-by: Panagiotis Issaris
    Jffs2-bit-acked-by: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Panagiotis Issaris
     
  • Some of the changes in balloc.c are just cosmetic, as Andreas pointed out -
    if they overflow they'll then underflow and things are fine.

    The 5th hunk actually fixes an overflow problem.

    Also check for potential overflows in inode & block counts when resizing.

    Signed-off-by: Eric Sandeen
    Cc: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • Fixing up some endian-ness warnings in preparation to clone ext4 from ext3.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • More white space cleanups in preparation of cloning ext4 from ext3.
    Removing spaces that precede a tab.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • The SWsoft Virtuozzo/OpenVZ Linux kernel team has discovered that ext3
    error behavior has been broken in Linux kernels since the 2.5.x versions
    by the following patch:

    2002/10/31 02:15:26-05:00 tytso@snap.thunk.org
    Default mount options from superblock for ext2/3 filesystems
    http://linux.bkbits.net:8080/linux-2.6/gnupatch@3dc0d88eKbV9ivV4ptRNM8fBuA3JBQ

    When an ext3 file system is mounted with errors=continue
    (EXT3_ERRORS_CONTINUE), errors should be ignored when possible. However,
    at present the kernel aborts the journal and remounts the filesystem
    read-only on any error. This behavior has been hit a number of times and
    was noted to differ from that of 2.4.x kernels.

    This patch fixes this:
    - do nothing in case of EXT3_ERRORS_CONTINUE,
    - set EXT3_MOUNT_ABORT and call journal_abort() in all other cases
    - panic() should be called after ext3_commit_super() to save
    sb marked as EXT3_ERROR_FS

    Signed-off-by: Vasily Averin
    Acked-by: Kirill Korotaev
    Cc: Theodore Ts'o
    Cc: "Stephen C. Tweedie"
    Cc: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Averin
     
  • Signed-off-by: Mingming Cao
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • In the past there were a few kernel panics related to failures of block
    reservation tree operations (insert, remove, etc.). It would be very
    useful to get the block allocation reservation map info when such an
    error happens.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • This is primarily format string fixes, with changes to ialloc.c where large
    inode counts could overflow, and also pass around journal_inum as an
    unsigned long, just to be pedantic about it....

    Signed-off-by: Eric Sandeen
    Cc: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • I need to do some actual IO testing now, but this gets things mounting for
    a 16T ext3 filesystem. (patched up e2fsprogs is needed too, I'll send that
    off the kernel list)

    This patch fixes these issues in the kernel:

    o sbi->s_groups_count overflows in ext3_fill_super()

    sbi->s_groups_count = (le32_to_cpu(es->s_blocks_count) -
                           le32_to_cpu(es->s_first_data_block) +
                           EXT3_BLOCKS_PER_GROUP(sb) - 1) /
                          EXT3_BLOCKS_PER_GROUP(sb);

    at 16T, s_blocks_count is already maxed out; adding
    EXT3_BLOCKS_PER_GROUP(sb) overflows it and groups_count comes out to 0.
    Not really what we want, and causes a failed mount.

    Feel free to check my math (actually, please do!), but changing it this
    way should work & avoid the overflow:

    (A + B - 1)/B changed to: ((A - 1)/B) + 1

    o ext3_check_descriptors() overflows range checks

    ext3_check_descriptors() iterates over all block groups making sure
    that various bits are within the right block ranges... on the last pass
    through, it is checking the error case

    [item] >= block + EXT3_BLOCKS_PER_GROUP(sb)

    where "block" is the first block in the last block group. The last
    block in this group (and the last one that will fit in 32 bits) is block
    + EXT3_BLOCKS_PER_GROUP(sb)- 1. block + EXT3_BLOCKS_PER_GROUP(sb) wraps
    back around to 0.

    So, make things clearer with "first_block" and "last_block", where those
    are first and last, inclusive, and use < and > rather than <= and >=.

    Finally, the last block group may be smaller than the rest, so account
    for this on the last pass through: last_block = sb->s_blocks_count - 1;

    (a similar patch could be done for ext2; does anyone in their right mind
    use ext2 at 16T? I'll send an ext2 patch doing the same thing if that's
    warranted)
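
    The arithmetic in both items can be checked with a short userspace
    sketch using 32-bit unsigned values. The function names are
    hypothetical, and A is passed in directly rather than being derived from
    superblock fields.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up division written as (A + B - 1) / B: the addition wraps in
 * 32 bits when A is close to UINT32_MAX. */
static uint32_t groups_naive(uint32_t blocks, uint32_t per_group)
{
	return (blocks + per_group - 1) / per_group;
}

/* Same round-up written as ((A - 1) / B) + 1: no intermediate value exceeds
 * A, so it cannot wrap. Requires blocks >= 1 and per_group >= 1. */
static uint32_t groups_safe(uint32_t blocks, uint32_t per_group)
{
	assert(blocks >= 1 && per_group >= 1);
	return (blocks - 1) / per_group + 1;
}

/* Inclusive range check: comparing against last_block avoids computing
 * first_block + per_group, which wraps to 0 for the final group. */
static int block_in_group(uint32_t item, uint32_t first_block,
			  uint32_t last_block)
{
	return item >= first_block && item <= last_block;
}
```

    For A = 0xFFFFFFFF and B = 32768, the naive form yields 0 groups while
    the rewritten form yields 131072; for small values the two agree.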

    Signed-off-by: Eric Sandeen
    Cc: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • Remove whitespace from ext3 and jbd, before we clone ext4.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     

17 Sep, 2006

2 commits

  • ext3-get-blocks support caused a ~20% degradation in sequential read
    performance (tiobench). The problem is with marking the buffer boundary
    so that IO can be submitted right away. Here is the patch to fix it.

    2.6.18-rc6:
    -----------
    # ./iotest
    1048576+0 records in
    1048576+0 records out
    4294967296 bytes (4.3 GB) copied, 75.2726 seconds, 57.1 MB/s

    real 1m15.285s
    user 0m0.276s
    sys 0m3.884s

    2.6.18-rc6 + fix:
    -----------------
    [root@elm3a241 ~]# ./iotest
    1048576+0 records in
    1048576+0 records out
    4294967296 bytes (4.3 GB) copied, 62.9356 seconds, 68.2 MB/s

    The boundary block check in ext3_get_blocks_handle needs to be adjusted
    against the count of blocks mapped in this call, now that it can map
    more than one block.

    Signed-off-by: Suparna Bhattacharya
    Tested-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Suparna Bhattacharya
     
  • Inodes earlier than the 'first' inode (e.g. journal, resize) should be
    rejected early - except the root inode. Also inode numbers that are too
    big should be rejected early.
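
    The acceptance rule described above can be sketched as a small
    predicate. The function name and parameters are hypothetical; the first
    usable inode number and the inode count are passed in rather than read
    from a superblock.

```c
#include <stdbool.h>

#define ROOT_INO_SKETCH 2UL	/* root directory inode number on ext2/ext3 */

/* Accept the root inode, or any inode from first_ino up to inodes_count
 * inclusive. Reserved inodes below first_ino and out-of-range numbers are
 * rejected before any lookup is attempted. */
static bool inum_acceptable(unsigned long ino, unsigned long first_ino,
			    unsigned long inodes_count)
{
	if (ino == ROOT_INO_SKETCH)
		return true;
	return ino >= first_ino && ino <= inodes_count;
}
```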

    [akpm@osdl.org: cleanup]
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown