12 Oct, 2006

40 commits

  • Currently the ioprio_best function first checks whether aioprio or bioprio
    equals IOPRIO_CLASS_NONE (the ioprio_valid() macro does that) and, if so,
    returns bioprio or aioprio accordingly. Thus the next four lines, which set
    aclass/bclass to IOPRIO_CLASS_BE if aclass/bclass == IOPRIO_CLASS_NONE, are
    never executed.

    The second problem: if an aioprio of class IOPRIO_CLASS_NONE and a bioprio of
    class IOPRIO_CLASS_IDLE are passed to ioprio_best, it returns the
    IOPRIO_CLASS_IDLE one. This means that during __make_request we can merge two
    requests and set the priority of the merged request to IDLE, even though one
    of the original requests comes from a process with NONE (default) priority.
    So a process with the default ioprio can experience I/O starvation even when
    there is no real-time-class process in the system.

    Just removing the ioprio_valid() check should correct the situation.
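
    A minimal userspace sketch of the corrected selection logic, assuming local
    stand-ins for the kernel's ioprio macros (class stored above bit 13, lower
    class value meaning higher priority). IOPRIO_CLASS_SHIFT and
    IOPRIO_PRIO_VALUE here are local definitions for illustration. With the early
    return gone, NONE is treated as best-effort, so NONE versus IDLE yields the
    NONE request's priority:

```c
/* Local stand-ins for the kernel's ioprio definitions (class in the top bits). */
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_PRIO_CLASS(mask) ((mask) >> IOPRIO_CLASS_SHIFT)
#define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | (data))

enum {
	IOPRIO_CLASS_NONE,
	IOPRIO_CLASS_RT,
	IOPRIO_CLASS_BE,
	IOPRIO_CLASS_IDLE,
};

/* Return the "better" of two priorities; a lower class value wins. */
static unsigned short ioprio_best(unsigned short aprio, unsigned short bprio)
{
	unsigned short aclass = IOPRIO_PRIO_CLASS(aprio);
	unsigned short bclass = IOPRIO_PRIO_CLASS(bprio);

	/* No early return for IOPRIO_CLASS_NONE: treat it as best-effort. */
	if (aclass == IOPRIO_CLASS_NONE)
		aclass = IOPRIO_CLASS_BE;
	if (bclass == IOPRIO_CLASS_NONE)
		bclass = IOPRIO_CLASS_BE;

	if (aclass == bclass)
		return aprio < bprio ? aprio : bprio; /* same class: lower level wins */
	return aclass > bclass ? bprio : aprio;
}
```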

    Signed-off-by: Vasily Tarasov
    Signed-off-by: Jens Axboe

    Vasily Tarasov
     
  • Don't jump to the unlock+release path, we already did that.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • - Calculate a variable in bvec_alloc_bs() only when it is needed, not earlier
    (bio.o down from 18408 to 18376 bytes, 32 bytes saved, probably due to
    data locality improvements).

    - Initialize the variable idx to silence a gcc warning which already existed
    in the unmodified original base file (bvec_alloc_bs() handles idx correctly,
    so there's no need for the warning):

    fs/bio.c: In function `bio_alloc_bioset':
    fs/bio.c:169: warning: `idx' may be used uninitialized in this function

    Signed-off-by: Andreas Mohr
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Mohr
     
  • The attached patch destroys all the dentries attached to a superblock in one go
    by:

    (1) Destroying the tree rooted at s_root.

    (2) Destroying every entry in the anon list, one at a time.

    (3) Each entry in the anon list has its subtree consumed from the leaves
    inwards.

    This reduces the amount of work generic_shutdown_super() does, and avoids
    iterating through the dentry_unused list.
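
    Consuming a subtree "from the leaves inwards", as in step (3), can be done
    iteratively: descend to a leaf, free it, climb to its parent, repeat. The
    sketch below uses a hypothetical struct node (parent, first-child and
    next-sibling links) rather than the real struct dentry, and omits all dcache
    bookkeeping such as hash and unused-list removal:

```c
#include <stdlib.h>

/* Hypothetical tree node standing in for a dentry. */
struct node {
	struct node *parent;
	struct node *child; /* first child */
	struct node *next;  /* next sibling */
};

/* Hypothetical helper: allocate a node and link it in as parent's first child. */
static struct node *node_add(struct node *parent)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		abort();
	n->parent = parent;
	if (parent) {
		n->next = parent->child;
		parent->child = n;
	}
	return n;
}

/* Free the subtree rooted at root, leaves first, without recursion.
 * The caller must already have unlinked root from any parent.
 * Returns the number of nodes freed. */
static size_t destroy_subtree(struct node *root)
{
	size_t freed = 0;
	struct node *cur = root;

	for (;;) {
		/* Descend to a leaf, always via the first child. */
		while (cur->child)
			cur = cur->child;

		struct node *parent = cur->parent;
		int was_root = (cur == root);

		/* cur is its parent's first child, so advance the child list head. */
		if (!was_root)
			parent->child = cur->next;
		free(cur);
		freed++;
		if (was_root)
			return freed;
		cur = parent; /* climb up; the loop then descends into the next child */
	}
}
```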

    Note that locking is almost entirely absent in the shrink_dcache_for_umount*()
    functions added by this patch. This is because:

    (1) at the point the filesystem calls generic_shutdown_super(), it is not
    permitted to further touch the superblock's set of dentries, nor may it
    remove aliases from inodes;

    (2) the dcache memory shrinker now skips dentries that are being unmounted;
    and

    (3) the superblock no longer has any external references through which the VFS
    can reach it.

    Given these points, the only locking we need to do is when we remove dentries
    from the unused list and the name hashes, which we do a directory's worth at a
    time.

    We also don't need to guard against reference counts going to zero
    unexpectedly and removing bits of the tree we're working on, as nothing else
    can call dput().

    A cut-down version of dentry_iput() has been folded into the
    shrink_dcache_for_umount_subtree() function. Apart from not needing to unlock
    things, it also doesn't need to check for inotify watches.

    In this version of the patch, the complaint about a dentry still being in use
    has been expanded from a single BUG_ON() into a message that gives much more
    information.

    Signed-off-by: David Howells
    Acked-by: NeilBrown
    Acked-by: Ian Kent
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Make sure all dentry refs are released before calling kill_anon_super(), so
    that the assumption that generic_shutdown_super() can completely destroy the
    dentry tree, because there will be no external references, holds true.

    What was being done in the put_super() superblock op is now done in the
    kill_sb() filesystem op instead, prior to calling kill_anon_super().

    This makes the struct autofs_sb_info::root member variable redundant (since
    sb->s_root is still available), and so it is removed. The calls to
    shrink_dcache_sb() are also removed, since they are redundant too:
    shrink_dcache_for_umount() will now be called after the cleanup routine.

    Signed-off-by: David Howells
    Acked-by: Ian Kent
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Make sure all dentry refs are released before calling kill_block_super(),
    so that the assumption that generic_shutdown_super() can completely destroy
    the dentry tree, because there will be no external references, holds true.

    What was being done in the put_super() superblock op is now done in the
    kill_sb() filesystem op instead, prior to calling kill_block_super().

    Changes made in [try #2]:

    (*) reiserfs_kill_sb() now checks that the superblock FS info pointer is set
    before trying to dereference it.

    Signed-off-by: David Howells
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Signed-off-by: Alexey Dobriyan
    Cc: David Woodhouse
    Cc: David Howells
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • A couple of flush_dcache_page()s are missing on the I/O-error paths.

    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Monakhov Dmitriy
     
  • Since all callers dereference sb, and this function does so earlier too, we
    don't need the check.

    Signed-off-by: Eric Sesterhenn
    Acked-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sesterhenn
     
  • A couple of HDIO IOCTLs are not yet handled, and a few others are marked
    as using a pointer rather than an unsigned long. The former include:

    HDIO_GET_WCACHE, HDIO_GET_ACOUSTIC, HDIO_GET_ADDRESS and
    HDIO_GET_BUSSTATE. The latter are: HDIO_SET_MULTCOUNT,
    HDIO_SET_UNMASKINTR, HDIO_SET_KEEPSETTINGS, HDIO_SET_32BIT,
    HDIO_SET_NOWERR, HDIO_SET_DMA, HDIO_SET_PIO_MODE and HDIO_SET_NICE.

    Additionally, 0x330 used to be HDIO_GETGEO_BIG and may be issued by a 32-bit
    `hdparm' running on a 64-bit kernel, making Linux complain loudly.

    This is a fix for these issues.

    Signed-off-by: Maciej W. Rozycki
    Cc: Alan Cox
    Acked-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Maciej W. Rozycki
     
  • Current error behaviour for ext2 and ext3 filesystems does not fully
    correspond to the documentation and should be fixed.

    According to man 8 mount, ext2 and ext3 file systems allow setting one of
    three different on-error behaviours:

    ---- start of quote man 8 mount ----

    errors=continue / errors=remount-ro / errors=panic

    Define the behaviour when an error is encountered. (Either ignore
    errors and just mark the file system erroneous and continue, or remount
    the file system read-only, or panic and halt the system.) The default is
    set in the filesystem superblock, and can be changed using tune2fs(8).

    ---- end of quote ----

    However, EXT3_ERRORS_CONTINUE is not read from the superblock, and thus
    ERRORS_CONT is not saved in sbi->s_mount_opt. This leads to incorrect
    handling of errors on ext3.

    We then checked the corresponding code in ext2 and discovered that it is
    buggy as well:

    - EXT2_ERRORS_CONTINUE is not read from the superblock (the same);

    - parse_option() does not clear the alternative values, and thus something
    like (ERRORS_CONT|ERRORS_RO) can be set;

    - if options are omitted, parse_option() does not set any of these options.

    Therefore it is possible to set any combination of these options on ext2:

    - none of them may be set: EXT2_ERRORS_CONTINUE on the superblock / empty
    mount options;

    - any one of them may be set using mount options;

    - any two of them may be set: by using EXT2_ERRORS_RO/EXT2_ERRORS_PANIC on the
    superblock and another value in the mount options;

    - and finally all three options may be set by adding the third option on
    remount.

    Currently ext2 uses these values only in ext2_error(), and this does not lead
    to any noticeable trouble. However, someone may be disappointed when they try
    to work around EXT2_ERRORS_PANIC on the superblock by using errors=continue
    in the mount options.

    This patch:

    EXT2_ERRORS_CONTINUE should be read from the superblock as the default value
    for error behaviour. parse_option() should clear the alternative options and
    should not change the default value taken from the superblock.
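
    The two rules of this patch (superblock value as the default, and each
    errors= option clearing its alternatives) can be sketched as follows. The
    EXT2_ERRORS_* values mirror the on-disk s_errors codes; the OPT_ERRORS_* bits
    and both helper functions are hypothetical stand-ins for the s_mount_opt
    flags and parse_option(), and falling back to remount-ro for unrecognised
    superblock values is an assumption of this sketch:

```c
#include <string.h>

/* On-disk s_errors values from the ext2 superblock. */
#define EXT2_ERRORS_CONTINUE 1
#define EXT2_ERRORS_RO       2
#define EXT2_ERRORS_PANIC    3

/* Hypothetical mount-option bits standing in for ERRORS_* in s_mount_opt. */
#define OPT_ERRORS_CONT  0x1u
#define OPT_ERRORS_RO    0x2u
#define OPT_ERRORS_PANIC 0x4u
#define OPT_ERRORS_MASK  (OPT_ERRORS_CONT | OPT_ERRORS_RO | OPT_ERRORS_PANIC)

/* Default error behaviour taken from the superblock, including CONTINUE. */
static unsigned int errors_default_from_sb(int s_errors)
{
	if (s_errors == EXT2_ERRORS_PANIC)
		return OPT_ERRORS_PANIC;
	if (s_errors == EXT2_ERRORS_CONTINUE)
		return OPT_ERRORS_CONT;
	return OPT_ERRORS_RO; /* assumed fallback for any other value */
}

/* Apply one "errors=" value: clear the alternatives, then set exactly one bit.
 * Unknown values leave opts unchanged here; a real parser would reject them. */
static unsigned int apply_errors_option(unsigned int opts, const char *value)
{
	unsigned int bit;

	if (strcmp(value, "continue") == 0)
		bit = OPT_ERRORS_CONT;
	else if (strcmp(value, "remount-ro") == 0)
		bit = OPT_ERRORS_RO;
	else if (strcmp(value, "panic") == 0)
		bit = OPT_ERRORS_PANIC;
	else
		return opts;
	return (opts & ~OPT_ERRORS_MASK) | bit;
}
```

    Because the mask is cleared on every option, a panic default in the
    superblock followed by errors=continue leaves only the continue bit set.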

    Signed-off-by: Vasily Averin
    Acked-by: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Averin
     
  • Current error behaviour for ext2 and ext3 filesystems does not fully
    correspond to the documentation and should be fixed.

    According to man 8 mount, ext2 and ext3 file systems allow setting one of
    three different on-error behaviours:

    ---- start of quote man 8 mount ----

    errors=continue / errors=remount-ro / errors=panic

    Define the behaviour when an error is encountered. (Either ignore
    errors and just mark the file system erroneous and continue, or remount
    the file system read-only, or panic and halt the system.) The default is
    set in the filesystem superblock, and can be changed using tune2fs(8).

    ---- end of quote ----

    However, EXT3_ERRORS_CONTINUE is not read from the superblock, and thus
    ERRORS_CONT is not saved in sbi->s_mount_opt. This leads to incorrect
    handling of errors on ext3.

    We then checked the corresponding code in ext2 and discovered that it is
    buggy as well:

    - EXT2_ERRORS_CONTINUE is not read from the superblock (the same);

    - parse_option() does not clear the alternative values, and thus something
    like (ERRORS_CONT|ERRORS_RO) can be set;

    - if options are omitted, parse_option() does not set any of these options.

    Therefore it is possible to set any combination of these options on ext2:

    - none of them may be set: EXT2_ERRORS_CONTINUE on the superblock / empty
    mount options;

    - any one of them may be set using mount options;

    - any two of them may be set: by using EXT2_ERRORS_RO/EXT2_ERRORS_PANIC on the
    superblock and another value in the mount options;

    - and finally all three options may be set by adding the third option on
    remount.

    Currently ext2 uses these values only in ext2_error(), and this does not lead
    to any noticeable trouble. However, someone may be disappointed when they try
    to work around EXT2_ERRORS_PANIC on the superblock by using errors=continue
    in the mount options.

    This patch:

    EXT3_ERRORS_CONTINUE should be taken from the superblock as the default value
    for error behaviour.

    Signed-off-by: Dmitry Mishin
    Acked-by: Vasily Averin
    Acked-by: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Mishin
     
  • If grow_buffers() is for some reason passed a block number which wants to lie
    outside the maximum-addressable pagecache range (PAGE_SIZE * 4G bytes), then
    it will accidentally truncate `index' and will then instantiate a page at the
    wrong pagecache offset. This causes __getblk_slow() to go into an infinite
    loop.

    This can happen with corrupted disks, or with software errors elsewhere.

    Detect that, and handle it.
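
    One way to detect the truncation is to compute the index in the narrow type
    and compare it back against the shift done in the wide type. The sketch below
    assumes a hypothetical 32-bit pgoff32_t (as on 32-bit kernels) and a
    hypothetical helper block_to_index(); each page holds 2^sizebits blocks:

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit pagecache index type. */
typedef uint32_t pgoff32_t;

/* Map a block number to a pagecache index. Returns 0 and sets *index on
 * success, or -1 if the block lies beyond what a 32-bit index can address. */
static int block_to_index(uint64_t block, unsigned int sizebits, pgoff32_t *index)
{
	pgoff32_t idx = (pgoff32_t)(block >> sizebits); /* may silently truncate */

	/* Compare in the wide type: a mismatch means bits were lost. */
	if ((uint64_t)idx != block >> sizebits)
		return -1;
	*index = idx;
	return 0;
}
```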

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Implement the epoll_pwait system call, which extends the event wait
    mechanism with the same logic that ppoll and pselect use. The definition of
    epoll_pwait is:

    int epoll_pwait(int epfd, struct epoll_event *events, int maxevents,
                    int timeout, const sigset_t *sigmask, size_t sigsetsize);

    The difference between the vanilla epoll_wait and epoll_pwait is that the
    latter allows the caller to specify a signal mask to be set while waiting
    for events. Hence epoll_pwait will wait until either a monitored event or
    an unmasked signal occurs. If sigmask is NULL, the epoll_pwait system call
    will act exactly like epoll_wait. For the POSIX definition of pselect,
    information is available here:

    http://www.opengroup.org/onlinepubs/009695399/functions/select.html
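
    A minimal usage sketch, assuming a glibc-based Linux system: the glibc
    wrapper takes five arguments and supplies sigsetsize to the kernel itself.
    The pipe setup and the helper name wait_on_pipe_with_mask() are illustrative
    only. The mask passed in is installed just for the duration of the wait:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Wait for a pipe to become readable while the signal mask {SIGINT} is in
 * effect. Returns the epoll_pwait() result (ready count), or -1 on error. */
static int wait_on_pipe_with_mask(void)
{
	int fds[2], epfd, n;
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event out[4];
	sigset_t mask;

	if (pipe(fds) == -1)
		return -1;
	epfd = epoll_create1(0);
	if (epfd == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	ev.data.fd = fds[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == -1 ||
	    write(fds[1], "x", 1) != 1) {
		n = -1;
		goto out;
	}

	sigemptyset(&mask);
	sigaddset(&mask, SIGINT); /* only SIGINT is blocked during the wait */
	n = epoll_pwait(epfd, out, 4, 1000, &mask); /* 1000 ms timeout */
out:
	close(epfd);
	close(fds[0]);
	close(fds[1]);
	return n;
}
```

    Passing NULL instead of &mask would make the call behave like epoll_wait.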

    Signed-off-by: Davide Libenzi
    Cc: David Woodhouse
    Cc: Andi Kleen
    Cc: Michael Kerrisk
    Cc: Ulrich Drepper
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • Someone's tab key is emitting spaces. Attempt to repair some of the damage.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Current error behaviour for ext2 and ext3 filesystems does not fully
    correspond to the documentation and should be fixed.

    According to man 8 mount, ext2 and ext3 file systems allow setting one of
    three different on-error behaviours:

    ---- start of quote man 8 mount ----

    errors=continue / errors=remount-ro / errors=panic

    Define the behaviour when an error is encountered. (Either ignore
    errors and just mark the file system erroneous and continue, or remount
    the file system read-only, or panic and halt the system.) The default is
    set in the filesystem superblock, and can be changed using tune2fs(8).

    ---- end of quote ----

    However, EXT3_ERRORS_CONTINUE is not read from the superblock, and thus
    ERRORS_CONT is not saved in sbi->s_mount_opt. This leads to incorrect
    handling of errors on ext3.

    We then checked the corresponding code in ext2 and discovered that it is
    buggy as well:

    - EXT2_ERRORS_CONTINUE is not read from the superblock (the same);

    - parse_option() does not clear the alternative values, and thus something
    like (ERRORS_CONT|ERRORS_RO) can be set;

    - if options are omitted, parse_option() does not set any of these options.

    Therefore it is possible to set any combination of these options on ext2:

    - none of them may be set: EXT2_ERRORS_CONTINUE on the superblock / empty
    mount options;

    - any one of them may be set using mount options;

    - any two of them may be set: by using EXT2_ERRORS_RO/EXT2_ERRORS_PANIC on the
    superblock and another value in the mount options;

    - and finally all three options may be set by adding the third option on
    remount.

    Currently ext2 uses these values only in ext2_error(), and this does not lead
    to any noticeable trouble. However, someone may be disappointed when they try
    to work around EXT2_ERRORS_PANIC on the superblock by using errors=continue
    in the mount options.

    This patch:

    EXT4_ERRORS_CONTINUE should be taken from the superblock as the default value
    for error behaviour.

    Signed-off-by: Dmitry Mishin
    Acked-by: Vasily Averin
    Acked-by: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Mishin
     
  • I assume this means "logical sb block". So call it that.

    I still don't understand the name though. A block is a block. What's
    different about this one?

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • With CONFIG_LBD=n, sector_div() expands to a plain old divide. But ext4 is
    _not_ passing in a sector_t as the first argument, so...

    fs/built-in.o: In function `ext4_get_group_no_and_offset':
    fs/ext4/balloc.c:39: undefined reference to `__umoddi3'
    fs/ext4/balloc.c:41: undefined reference to `__udivdi3'
    fs/built-in.o: In function `find_group_orlov':
    fs/ext4/ialloc.c:278: undefined reference to `__udivdi3'
    fs/built-in.o: In function `ext4_fill_super':
    fs/ext4/super.c:1488: undefined reference to `__udivdi3'
    fs/ext4/super.c:1488: undefined reference to `__umoddi3'
    fs/ext4/super.c:1594: undefined reference to `__udivdi3'
    fs/ext4/super.c:1601: undefined reference to `__umoddi3'

    Fix that up by calling do_div() directly.

    Also cast the arg to u64. do_div() is only defined on u64, and ext4_fsblk_t
    is supposed to be opaque.

    Note especially the changes to find_group_orlov(). It was attempting to do

    do_div(int, unsigned long long);

    which is royally screwed up. Switched it to plain old divide.
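
    In the kernel, do_div(n, base) is a macro that divides the 64-bit n in place
    and returns the 32-bit remainder; it exists because the kernel does not link
    libgcc's __udivdi3/__umoddi3 helpers. The sketch below is a userspace
    stand-in: do_div_u64() and get_group_no_and_offset() are hypothetical names,
    and the latter is a simplified version of the group/offset computation with
    the superblock fields passed in as plain parameters:

```c
#include <stdint.h>

/* Userspace stand-in for do_div(): divides *n in place, returns the remainder. */
static uint32_t do_div_u64(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* Split a filesystem block number into (group, offset within group).
 * Block numbers are counted from first_data_block. */
static void get_group_no_and_offset(uint64_t blocknr, uint32_t first_data_block,
				    uint32_t blocks_per_group,
				    uint64_t *group, uint32_t *offset)
{
	blocknr -= first_data_block;
	*offset = do_div_u64(&blocknr, blocks_per_group); /* blocknr becomes quotient */
	*group = blocknr;
}
```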

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Way too big to inline.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Move the '_hi' bits of block numbers into the larger part of the
    block group descriptor structure.

    Signed-off-by: Alexandre Ratchov
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Ratchov
     
  • Make the block group descriptor larger.

    Signed-off-by: Alexandre Ratchov
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Ratchov
     
  • Similar to ext4, change blocks in JBD2 from sector_t to unsigned long long.

    Signed-off-by: Mingming Cao
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • Previously, when the in-kernel ext4 block type was sector_t, it was only 4
    bytes long on some 32-bit architectures (when CONFIG_LBD is not on). So we
    needed to check the size of sector_t before reading 48-bit on-disk block
    numbers into in-kernel blocks.

    These checks are unnecessary now, as we changed the in-kernel block type to
    unsigned long long.

    Signed-off-by: Mingming Cao
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • Change ext4 in-kernel block type (ext4_fsblk_t) from sector_t to unsigned
    long long. Remove the ext4 block type format-string macro E3FSBLK, replacing
    it with "%llu".

    [akpm@osdl.org: build fix]
    Signed-off-by: Mingming Cao
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • In-kernel super block changes to support >32 bit free blocks numbers.

    Signed-off-by: Laurent Vivier
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Alexandre Ratchov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Laurent Vivier
     
  • As we are planning to support 48-bit block numbers for ext4, we need to
    support 48-bit block numbers for extended attributes. In the short term, we
    can do this by reusing the (on-disk) 16-bit padding (linux2.i_pad1, currently
    used only by "hurd") as the high-order bits of the xattr block number. This
    patch basically does that.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • JBD layer in-kernel block variable type fixes to support >32-bit block
    numbers, converting them to the sector_t type.

    Signed-off-by: Mingming Cao
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • Here is the patch to JBD to handle 64 bit block numbers, originally from Zach
    Brown. This patch is useful only after adding support for 64-bit block
    numbers in the filesystem.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Zach Brown
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • Signed-off-by: Randy Dunlap
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Make it possible to add file preallocation support in future as an RO_COMPAT
    feature by recognizing uninitialized extents as holes and limiting extent
    length to keep the top bit of ee_len free for marking uninitialized extents.
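
    Reserving the top bit of the 16-bit ee_len means lengths are capped at 15
    bits. The sketch below assumes a hypothetical encoding in which a set top bit
    marks an uninitialized extent (to be read back as a hole) and the low 15 bits
    hold the length; the macro and helper names are illustrative, and the exact
    on-disk encoding used by the kernel may differ:

```c
#include <stdint.h>

/* Hypothetical encoding: top bit = uninitialized flag, low 15 bits = length. */
#define EE_LEN_UNINIT_BIT 0x8000u
#define EE_LEN_MAX        0x7FFFu /* longest extent that keeps the top bit free */

static int ee_len_is_uninit(uint16_t ee_len)
{
	return (ee_len & EE_LEN_UNINIT_BIT) != 0;
}

static uint16_t ee_len_actual(uint16_t ee_len)
{
	return (uint16_t)(ee_len & EE_LEN_MAX);
}

/* Encode a length in 1..EE_LEN_MAX; returns 0 if the length does not fit. */
static uint16_t ee_len_encode(uint16_t len, int uninit)
{
	if (len == 0 || len > EE_LEN_MAX)
		return 0;
	return (uint16_t)(uninit ? (len | EE_LEN_UNINIT_BIT) : len);
}
```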

    Signed-off-by: Suparna Bhattacharya
    Signed-off-by: Mingming Cao
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Suparna Bhattacharya
     
  • Signed-off-by: Alex Tomas
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Tomas
     
  • Redefine ext3 in-kernel filesystem block type (ext3_fsblk_t) from unsigned
    long to sector_t, to allow kernel to handle >32 bit ext3 blocks.

    Signed-off-by: Mingming Cao
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • On-disk extents format:

    /*
     * this is extent on-disk structure
     * it's used at the bottom of the tree
     */
    struct ext3_extent {
        __le32 ee_block;    /* first logical block extent covers */
        __le16 ee_len;      /* number of blocks covered by extent */
        __le16 ee_start_hi; /* high 16 bits of physical block */
        __le32 ee_start;    /* low 32 bits of physical block */
    };
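
    The physical start block is split across ee_start (low 32 bits) and
    ee_start_hi (high 16 bits), giving 48 bits in total. The sketch below uses a
    hypothetical host-order struct extent; the real fields are little-endian
    __le types and would need le32_to_cpu()/le16_to_cpu() conversion first:

```c
#include <stdint.h>

/* Hypothetical host-order view of struct ext3_extent. */
struct extent {
	uint32_t ee_block;    /* first logical block extent covers */
	uint16_t ee_len;      /* number of blocks covered by extent */
	uint16_t ee_start_hi; /* high 16 bits of physical block */
	uint32_t ee_start;    /* low 32 bits of physical block */
};

/* Combine the split fields into one 48-bit physical block number. */
static uint64_t extent_pblock(const struct extent *ex)
{
	return (uint64_t)ex->ee_start | ((uint64_t)ex->ee_start_hi << 32);
}

/* Split a physical block number (must fit in 48 bits) into the two fields. */
static void extent_set_pblock(struct extent *ex, uint64_t pb)
{
	ex->ee_start = (uint32_t)(pb & 0xFFFFFFFFu);
	ex->ee_start_hi = (uint16_t)((pb >> 32) & 0xFFFFu);
}
```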

    Signed-off-by: Alex Tomas
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Tomas
     
  • Reworked from a patch by Mingming Cao and Randy Dunlap

    Signed-off-By: Randy Dunlap
    Signed-off-by: Mingming Cao
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • jbd and jbd2 currently use the same slab names, which must be unique. The
    patch below just renames jbd2's slabs.

    Signed-off-by: Johann Lombardi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johann Lombardi
     
  • Mingming Cao originally did this work, and Shaggy reproduced it using some
    scripts from her.

    Signed-off-by: Mingming Cao
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • This is a simple copy of the files in fs/jbd to fs/jbd2 and
    /usr/include/linux/[ext4_]jbd.h to /usr/include/[ext4_]jbd2.h

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • Originally part of a patch from Mingming Cao and Randy Dunlap. Reorganized
    by Shaggy.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Mingming Cao
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • Mingming Cao originally did this work, and Shaggy reproduced it using some
    scripts from her.

    Signed-off-by: Mingming Cao
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao