10 May, 2007

1 commit


09 May, 2007

3 commits

  • A patch that stores inode flags such as S_IMMUTABLE, S_APPEND, etc. from
    i_flags to EXT3_I(inode)->i_flags when inode is written to disk. The same
    thing is done on GETFLAGS ioctl.

    Quota code changes these flags on quota files (to make it harder for
    sysadmin to screw himself) and these changes were not correctly propagated
    into the filesystem (especially, lsattr did not show them and users were
    wondering...).

    Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
    ext3-specific i_flags. Hence, when someone sets these flags via a
    different interface than ioctl, they are stored correctly.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
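
A minimal sketch of the propagation this commit describes, written as plain user-space C. The flag values and the name toy_propagate_flags are hypothetical stand-ins chosen for illustration; they are not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the VFS-level bits kept in inode->i_flags. */
#define TOY_S_APPEND        0x1u
#define TOY_S_IMMUTABLE     0x2u

/* Hypothetical stand-ins for the filesystem-specific flag bits. */
#define TOY_FS_IMMUTABLE_FL 0x10u
#define TOY_FS_APPEND_FL    0x20u
#define TOY_FS_OTHER_FL     0x40u /* an unrelated bit the mapping must keep */

/*
 * Refresh the filesystem-specific flags from the VFS flags.
 * Only the two mirrored bits are rewritten; every other bit is preserved.
 */
unsigned int toy_propagate_flags(unsigned int vfs_flags, unsigned int fs_flags)
{
    fs_flags &= ~(TOY_FS_IMMUTABLE_FL | TOY_FS_APPEND_FL);
    if (vfs_flags & TOY_S_IMMUTABLE)
        fs_flags |= TOY_FS_IMMUTABLE_FL;
    if (vfs_flags & TOY_S_APPEND)
        fs_flags |= TOY_FS_APPEND_FL;
    return fs_flags;
}
```

Calling such a helper both before the inode is written out and before the flags are reported to user space would keep the two views in step, which is the behaviour the message asks for.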
     
  • Remove includes of where it is not used/needed.
    Suggested by Al Viro.

    Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
    sparc64, and arm (all 59 defconfigs).

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Taken from http://bugzilla.kernel.org/show_bug.cgi?id=5079

    A signed long ranges from -2,147,483,648 to 2,147,483,647 on 32-bit x86.

    10000011110110100100111110111101 .. -2,082,844,739 (read as signed)
    10000011110110100100111110111101 .. 2,212,122,557 (read as unsigned)

    Andreas says:

    This patch is now treating timestamps with the high bit set as negative
    times (before Jan 1, 1970). This means we lose 1/2 of the possible range
    of timestamps (lopping off 68 years before unix timestamp overflow -
    now only 30 years away :-) to handle the extremely rare case of setting
    timestamps into the distant past.

    If we are only interested in fixing the underflow case, we could just
    limit the values to 0 instead of storing negative values. At worst this
    will skew the timestamp by a few hours for timezones in the far east
    (files would still show Jan 1, 1970 in "ls -l" output).

    That said, it seems 32-bit systems (mine at least) allow files to be set
    into the past (01/01/1907 works fine) so it seems this patch is bringing
    the x86_64 behaviour into sync with other kernels.

    On the plus side, we have a patch that is ready to add nanosecond timestamps
    to ext3 and as an added bonus adds 2 high bits to the on-disk timestamp so
    this extends the maximum date to 2242.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Rechberger
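
The two readings of the same bit pattern quoted in this message can be checked with a short user-space sketch. The function names are hypothetical; stamp_clamped models the alternative Andreas mentions of limiting values to 0 rather than storing negative ones.

```c
#include <assert.h>
#include <stdint.h>

/* Read a stored 32-bit pattern as a signed count of seconds since 1970. */
int64_t stamp_as_signed(uint32_t raw)
{
    /* Explicit two's-complement decoding, avoiding implementation-defined casts. */
    if (raw >= 0x80000000u)
        return (int64_t)raw - 4294967296LL;
    return (int64_t)raw;
}

/* Read the same pattern as an unsigned count of seconds since 1970. */
int64_t stamp_as_unsigned(uint32_t raw)
{
    return (int64_t)raw;
}

/* Alternative from the discussion: clamp pre-1970 times to the epoch. */
int64_t stamp_clamped(int64_t seconds)
{
    return seconds < 0 ? 0 : seconds;
}
```

The pattern 10000011110110100100111110111101 is 0x83DA4FBD, so stamp_as_signed(0x83DA4FBD) gives -2082844739 and stamp_as_unsigned(0x83DA4FBD) gives 2212122557, matching the two values in the message.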
     

03 Apr, 2007

1 commit

  • Revert e92a4d595b464c4aae64be39ca61a9ffe9c8b278.

    Dmitry points out

    "When block_prepare_write() fails during ext3_prepare_write(), we jump to
    the "failure" label and call ext3_prepare_failure(), which searches for the
    last mapped bh and invokes commit_write up to it. This is wrong!! because
    some bh from the beginning to the last mapped bh may not be uptodate. As a
    result we commit to disk page content that is not uptodate and contains
    garbage from previous usage."

    and

    "Unexpected file size increasing."

    The call trace is the same as in the first issue, but the result is different.
    For example, we have a file whose i_size is zero. We want to write two blocks,
    but the fs has only one free block.

    ->ext3_prepare_write(...from == 0, to == 2048)
    retry:
    ->block_prepare_write() == -ENOSPC # we failed but allocated one block here.
    ->ext3_prepare_failure()
    ->commit_write( from == 0, to == 1024) # after this i_size becomes 1024 :)
    if (ret == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
    goto retry;

    Finally, when all retries are spent, ext3_prepare_failure returns
    -ENOSPC, but i_size was increased and later block trim procedures can't
    help here.

    We don't appear to have the horsepower to fix these issues, so let's put
    things back the way they were for now.

    Cc: Kirill Korotaev
    Cc: Ingo Molnar
    Cc: Ken Chen
    Cc: Andrey Savochkin
    Cc: Dmitriy Monakhov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
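
The second failure described above (the file size growing although the write failed) can be reproduced in a toy model. Everything here is a hypothetical user-space simplification: toy_file, toy_prepare_write and the 1024-byte block size only mirror the numbers in the trace, and the range is assumed to be block-aligned.

```c
#include <assert.h>
#include <errno.h>

#define TOY_BLOCK_SIZE 1024L

struct toy_file {
    long i_size;      /* file size in bytes */
    long free_blocks; /* blocks the toy filesystem can still hand out */
};

/*
 * Model of the reverted failure path: allocate blocks for [from, to) and,
 * when space runs out, commit whatever was mapped before reporting failure.
 */
int toy_prepare_write(struct toy_file *f, long from, long to)
{
    long pos = from;

    while (pos < to && f->free_blocks > 0) {
        f->free_blocks--;
        pos += TOY_BLOCK_SIZE;
    }
    if (pos < to) {
        /* Partial commit: the size grows even though the write failed. */
        if (pos > f->i_size)
            f->i_size = pos;
        return -ENOSPC;
    }
    return 0;
}
```

With an empty file and one free block, a two-block request returns -ENOSPC but leaves i_size at 1024, which is the "unexpected file size increasing" case.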
     

12 Feb, 2007

1 commit


08 Dec, 2006

1 commit


01 Oct, 2006

1 commit


27 Sep, 2006

4 commits

  • This eliminates the i_blksize field from struct inode. Filesystems that want
    to provide a per-inode st_blksize can do so by providing their own getattr
    routine instead of using the generic_fillattr() function.

    Note that some filesystems were providing pretty much random (and incorrect)
    values for i_blksize.

    [bunk@stusta.de: cleanup]
    [akpm@osdl.org: generic_fillattr() fix]
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     
  • More white space cleanups in preparation for cloning ext4 from ext3.
    Removing spaces that precede a tab.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • This is primarily format string fixes, with changes to ialloc.c where large
    inode counts could overflow, and also pass around journal_inum as an
    unsigned long, just to be pedantic about it....

    Signed-off-by: Eric Sandeen
    Cc: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • Remove whitespace from ext3 and jbd, before we clone ext4.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     

17 Sep, 2006

1 commit

  • ext3-get-blocks support caused a ~20% degradation in sequential read
    performance (tiobench). The problem is with marking the buffer boundary
    so that IO can be submitted right away. Here is the patch to fix it.

    2.6.18-rc6:
    -----------
    # ./iotest
    1048576+0 records in
    1048576+0 records out
    4294967296 bytes (4.3 GB) copied, 75.2726 seconds, 57.1 MB/s

    real 1m15.285s
    user 0m0.276s
    sys 0m3.884s

    2.6.18-rc6 + fix:
    -----------------
    [root@elm3a241 ~]# ./iotest
    1048576+0 records in
    1048576+0 records out
    4294967296 bytes (4.3 GB) copied, 62.9356 seconds, 68.2 MB/s

    The boundary block check in ext3_get_blocks_handle needs to be adjusted
    against the count of blocks mapped in this call, now that it can map
    more than one block.

    Signed-off-by: Suparna Bhattacharya
    Tested-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Suparna Bhattacharya
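
A rough sketch of the adjustment the last paragraph describes. It assumes a variable, here called blocks_to_boundary, that counts the blocks following the first mapped block before the boundary is reached; both function names are hypothetical and the code is an illustration, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-block view: the mapping is on the boundary only if nothing follows it. */
bool boundary_single(unsigned long blocks_to_boundary)
{
    return blocks_to_boundary == 0;
}

/*
 * Multi-block view: a run of 'count' mapped blocks reaches the boundary
 * when it extends past the blocks that remain before it.
 */
bool boundary_multi(unsigned long blocks_to_boundary, unsigned long count)
{
    return count > blocks_to_boundary;
}
```

For a one-block mapping the two checks agree; for longer runs only the second notices that the last mapped block sits on the boundary.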
     

09 Sep, 2006

1 commit

  • It has been reported that ext3_getblk() is not doing the right thing and
    is triggering the following WARN():

    BUG: warning at fs/ext3/inode.c:1016/ext3_getblk()
    ext3_getblk+0x98/0x2a6 md_wakeup_thread+0x26/0x2a
    ext3_bread+0x1f/0x88 ext3_quota_read+0x136/0x1ae
    v1_read_dqblk+0x61/0xac dquot_acquire+0xf6/0x107
    ext3_acquire_dquot+0x46/0x68 dqget+0x155/0x1e7
    dquot_transfer+0x3e0/0x3e9 dput+0x23/0x13e
    ext3_setattr+0xc3/0x240 current_fs_time+0x52/0x6a
    notify_change+0x2bd/0x30d chown_common+0x9c/0xc5
    strncpy_from_user+0x3b/0x68 do_path_lookup+0xdf/0x266
    __user_walk_fd+0x44/0x5a sys_chown+0x4a/0x55
    vfs_write+0xe7/0x13c sys_mkdir+0x1f/0x23
    syscall_call+0x7/0xb

    Looking at the code, it looks like it does not handle a HOLE correctly. It
    ends up returning -EIO. Here is the patch to fix it.

    If we really want to be paranoid, we can allow return values 0 (HOLE), 1
    (we asked for one block) and return -EIO for more than 1 block. But I
    really don't see a reason for doing it - all we need is the block# here.
    (doesn't matter how many blocks are mapped).

    ext3_get_blocks_handle() returns the number of blocks it mapped. It returns
    0 in case of a HOLE. ext3_getblk() should handle a HOLE properly (currently
    it's dumping a warning stack and returning -EIO).

    Signed-off-by: Badari Pulavarty
    Acked-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
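
The return-value convention described above (negative for an error, 0 for a HOLE, a positive block count otherwise) can be sketched as a small classifier. The enum and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <errno.h>

enum toy_lookup {
    TOY_ERROR,  /* the lookup itself failed */
    TOY_HOLE,   /* nothing is mapped at this offset; not an error */
    TOY_MAPPED  /* one or more blocks were mapped */
};

/* Interpret the "number of blocks mapped" return value of a block lookup. */
enum toy_lookup toy_classify(int mapped)
{
    if (mapped < 0)
        return TOY_ERROR;
    if (mapped == 0)
        return TOY_HOLE;
    /* Any positive count is fine: only the first block number is needed. */
    return TOY_MAPPED;
}
```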
     

01 Aug, 2006

2 commits

  • For files other than IFREG, the nobh option doesn't make sense.
    Modifications to them are journalled and need buffer heads to do that.
    Without this patch, we get a kernel oops in page_buffers().

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • The inode number out of an NFS file handle eventually gets passed to
    ext3_get_inode_block() without any checking. If ext3_get_inode_block()
    allows it to trigger an error, then bad filehandles can have unpleasant
    effects - ext3_error() will usually cause a forced read-only remount, or a
    panic if `errors=panic' was used.

    So remove the call to ext3_error there and put a matching check in
    ext3/namei.c where inode numbers are read off storage.

    [akpm@osdl.org: fix off-by-one error]
    Signed-off-by: Neil Brown
    Signed-off-by: Jan Kara
    Cc: Marcel Holtmann
    Cc: "Stephen C. Tweedie"
    Cc: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Brown
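
A simplified range check of the kind this message talks about, as a user-space sketch. The function name and parameters are hypothetical; a real filesystem may reserve further special inode numbers, and inode numbers are assumed to be 1-based here, so the last valid number equals the inode count.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Accept the root inode, or any ordinary inode number in
 * [first_ino, inodes_count]. Note the inclusive upper bound.
 */
bool toy_valid_inum(unsigned long ino, unsigned long root_ino,
                    unsigned long first_ino, unsigned long inodes_count)
{
    if (ino == root_ino)
        return true;
    return ino >= first_ino && ino <= inodes_count;
}
```

Rejecting an out-of-range number up front lets a bad file handle be refused quietly instead of being reported as filesystem corruption.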
     

29 Jun, 2006

1 commit


26 Jun, 2006

2 commits

  • Convert the ext3 in-kernel filesystem blocks to ext3_fsblk_t. Convert the
    remaining unsigned long in-kernel filesystem blocks to ext3_fsblk_t, and
    replace the printk format strings correspondingly.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • Some of the in-kernel ext3 block variables are treated as signed 4-byte
    ints, which limits an ext3 filesystem to 8TB (with a 4K block size). While
    trying to fix them, it became clear that the ext3 code is quite confusing:
    some blocks are filesystem-wide blocks, while others are group-relative
    offsets that need to be signed (as -1 has a special meaning). So it seems
    saner to define two types of physical blocks: filesystem-wide blocks and
    group-relative blocks. The following patches clarify these two types of
    blocks in the ext3 code and fix the type bugs that limit the current 32-bit
    ext3 filesystem to 8TB.

    With this series of patches and the percpu counter data type changes in the mm
    tree, we are able to extend the ext3 filesystem limit to 16TB.

    This work is also a prerequisite for the recent >32-bit ext3 work, and makes
    it a lot easier for the kernel to address 48-bit ext3 blocks: simply redefine
    ext3_fsblk_t from unsigned long to sector_t and redefine the format string for
    ext3 filesystem blocks correspondingly.

    Two RFCs with a series of patches have been posted to the ext2-devel list
    and have been reviewed and discussed:
    http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2

    http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2

    Patches are tested on both 32-bit and 64-bit machines with an 8TB ext3
    filesystem (with the latest, to-be-released e2fsprogs-1.39). Tests
    include overnight fsx, tiobench, dbench and fsstress.

    This patch:

    Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for
    filesystem wide blocks.

    This patch distinguishes block-group-relative blocks from ext3_fsblk_t
    blocks where both occur in the same function, which used to be confusing.
    It also includes kernel bug fixes for filesystem-wide in-kernel block
    variables. Some filesystem-wide blocks are currently treated as
    int/unsigned int in the kernel, especially in the ext3 block allocation and
    reservation code. This patch fixes those bugs by converting those variables
    to the ext3_fsblk_t (unsigned long) type.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
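
The 8TB and 16TB figures follow from the width of the block number and the 4K block size. The sketch below works that arithmetic out and shows the two kinds of block type side by side; the toy_ names are hypothetical, and only the unsigned long / signed split is taken from the message.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned long toy_fsblk_t; /* filesystem-wide block number */
typedef int toy_grpblk_t;          /* group-relative offset; -1 is special */

/*
 * Largest filesystem size in bytes when block numbers have 'bits' usable
 * bits. Callers must keep bits small enough that the product fits in 64 bits.
 */
uint64_t toy_max_fs_bytes(unsigned int bits, uint64_t block_size)
{
    return ((uint64_t)1 << bits) * block_size;
}

/* Turn a non-negative group-relative offset into a filesystem-wide block. */
toy_fsblk_t toy_to_fsblk(toy_fsblk_t group_first, toy_grpblk_t offset)
{
    return group_first + (toy_fsblk_t)offset;
}
```

A signed 32-bit block number leaves 31 usable bits, and 2^31 blocks of 4096 bytes is 2^43 bytes (8TB); an unsigned 32-bit number gives 2^32 blocks, or 2^44 bytes (16TB).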
     

04 May, 2006

1 commit

  • Some places in the ext3 multiple block allocation code (in 2.6.17-rc3)
    don't handle little endian well. This was resulting in *wrong* block
    numbers being assigned to in-memory block variables and then eventually
    stored on disk. The following patch has been verified to fix an ext3
    filesystem failure when running the LTP tests on a 64-bit machine.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
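
The class of bug described here comes from using an on-disk little-endian value directly as a CPU integer. A portable decode, sketched in user-space C with a hypothetical function name, reads the bytes explicitly so the result is the same on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value from raw bytes, whatever the host order. */
uint32_t toy_le32_decode(const unsigned char bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

On a little-endian host the raw value and the decoded value happen to coincide, which is why a missing conversion can go unnoticed until the code runs on a big-endian machine.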
     

27 Mar, 2006

6 commits

  • Mingming Cao recently added multi-block allocation support for ext3,
    currently used only by DIO. I added support to map multiple blocks for
    mpage_readpages(). This patch adds support for ext3_get_block() to deal
    with multi-block mapping. Basically it renames ext3_direct_io_get_blocks()
    as ext3_get_block().

    Signed-off-by: Badari Pulavarty
    Cc: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • - Clean up a few little layout things and comments.

    - Add a WARN_ON to a case which I was wondering about.

    - Tune up some inlines.

    Cc: Mingming Cao
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Now that get_block() can handle mapping multiple disk blocks, there is no
    need to have ->get_blocks(). This patch removes the fs-specific
    ->get_blocks() added for DIO and makes its users use get_block() instead.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Add support for multiple block allocation in ext3-get-blocks().

    Look up the disk block mapping and count the total number of blocks to
    allocate, then pass it to ext3_new_block(), where the real block allocation is
    performed. Once multiple blocks are allocated, prepare the branch with those
    just allocated blocks info and finally splice the whole branch into the block
    mapping tree.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • Currently ext3_get_block() only maps or allocates one block at a time. This
    is quite inefficient for sequential IO workload.

    I have posted an early implementation of simple multiple-block mapping and
    allocation for current ext3. The basic idea is allocating the 1st block in
    the existing way, and attempting to allocate the next adjacent blocks on a
    best effort basis. More description of the implementation can be found here:
    http://marc.theaimsgroup.com/?l=ext2-devel&m=112162230003522&w=2

    The following is the latest version of the patch: it breaks the original
    patch into 5 patches, reworks some logic, and fixes some bugs. The pieces are:

    [patch 1] Adding map multiple blocks at a time in ext3_get_blocks()
    [patch 2] Extend ext3_get_blocks() to support multiple block allocation
    [patch 3] Implement multiple block allocation in ext3-try-to-allocate
              (called via ext3_new_block()).
    [patch 4] Proper accounting updates in ext3_new_blocks()
    [patch 5] Adjust reservation window size properly (by the given number
              of blocks to allocate) before block allocation to increase the
              possibility of allocating multiple blocks in a single call.

    Tests done so far include fsx, tiobench and dbench. The following numbers,
    collected from Direct IO tests (1G file creation/read), show that system time
    has been greatly reduced (more than 50% on my 8 cpu system) with the patches.

    1G file DIO write:
                2.6.15       2.6.15+patches
    real        0m31.275s    0m31.161s
    user        0m0.000s     0m0.000s
    sys         0m3.384s     0m0.564s

    1G file DIO read:
                2.6.15       2.6.15+patches
    real        0m30.733s    0m30.624s
    user        0m0.000s     0m0.004s
    sys         0m0.748s     0m0.380s

    Some previous tests we did on buffered IO using multiple block allocation
    and delayed allocation showed noticeable improvements in throughput and
    system time.

    This patch:

    Add support for mapping multiple blocks in one call.

    This is useful for DIO reads and re-writes (where blocks are already
    allocated), also is in line with Christoph's proposal of using getblocks() in
    mpage_readpage() or mpage_readpages().

    Signed-off-by: Mingming Cao
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
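
One way to picture "mapping multiple blocks in one call" is counting how many consecutive slots of a block-pointer array hold physically adjacent block numbers. The sketch below is a user-space illustration under that assumption; the function name, the flat slots array and the convention that 0 means "unmapped" are hypothetical simplifications.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Starting at slot 'start', count the slots whose block numbers form one
 * physically contiguous run, capped at 'max_blocks'. Returns 0 if the
 * first slot is out of range or unmapped.
 */
unsigned long toy_count_contiguous(const uint32_t *slots, unsigned long nslots,
                                   unsigned long start, unsigned long max_blocks)
{
    unsigned long count;

    if (start >= nslots || max_blocks == 0 || slots[start] == 0)
        return 0;

    count = 1;
    while (count < max_blocks &&
           start + count < nslots &&
           (uint64_t)slots[start + count] == (uint64_t)slots[start] + count)
        count++;
    return count;
}
```

A caller that already has the blocks allocated, as with DIO reads and re-writes, could then describe the whole run in one mapping instead of asking once per block.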
     
  • The return value of this function is never used, so let's be honest and
    declare it as void.

    Some places where invalidatepage returned 0, I have inserted comments
    suggesting a BUG_ON.

    [akpm@osdl.org: JBD BUG fix]
    [akpm@osdl.org: rework for git-nfs]
    [akpm@osdl.org: don't go BUG in block_invalidate_page()]
    Signed-off-by: Neil Brown
    Acked-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

23 Mar, 2006

2 commits

  • ext3's truncate_sem is always released in the same function it's taken
    in, and it otherwise behaves as a mutex as well.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • Linus points out that ext3_readdir's readahead only cuts in when
    ext3_readdir() is operating at the very start of the directory. So for large
    directories we end up performing no readahead at all and we suck.

    So take it all out and use the core VM's page_cache_readahead(). This means
    that ext3 directory reads will use all of readahead's dynamic sizing goop.

    Note that we're using the directory's filp->f_ra to hold the readahead state,
    but readahead is actually being performed against the underlying blockdev's
    address_space. Fortunately the readahead code is all set up to handle this.

    Tested with printk. It works. I was struggling to find a real workload which
    actually cared.

    (The patch also exports page_cache_readahead() to GPL modules)

    Cc: "Stephen C. Tweedie"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

12 Mar, 2006

1 commit

  • One can do "chattr +j" on a file to change its journalling mode. Fix
    writeback mode with "nobh" handling for it.

    Even though we mount an ext3 filesystem in writeback mode with the "nobh"
    option, someone can do "chattr +j" on a single file to force it into
    journalled mode. In order to do journaling, ext3_block_truncate_page() needs
    to fall back to the default case of creating buffers and adding them to the
    transaction etc.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

02 Feb, 2006

1 commit

  • Migrate a page with buffers without requiring writeback

    This introduces a new address space operation migratepage() that may be used
    by a filesystem to implement its own version of page migration.

    A version is provided that migrates buffers attached to pages. Some
    filesystems (ext2, ext3, xfs) are modified to utilize this feature.

    The swapper address space operations are modified so that a regular
    migrate_page() will occur for anonymous pages without writeback (migrate_pages
    forces every anonymous page to have a swap entry).

    Signed-off-by: Mike Kravetz
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

14 Nov, 2005

1 commit


31 Oct, 2005

2 commits

  • This patch adds tests for the return value of sb_getblk() in the ext2/3
    filesystems. In fs/buffer.c it is stated that the getblk() function never
    fails. However, it can return NULL in some situations due to I/O
    errors, which may lead us to NULL pointer dereferences.

    Signed-off-by: Glauber de Oliveira Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber de Oliveira Costa
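
The pattern this patch adds, testing a lookup result before using it, looks roughly like the following in user-space C. toy_getblk and toy_touch_block are hypothetical stand-ins; the NULL return simply models the failure case the message mentions, and returning -EIO is an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_buffer {
    unsigned long blocknr;
};

/* Hypothetical lookup that may fail; NULL models the I/O-error case. */
struct toy_buffer *toy_getblk(struct toy_buffer *pool, size_t npool,
                              unsigned long blocknr)
{
    for (size_t i = 0; i < npool; i++)
        if (pool[i].blocknr == blocknr)
            return &pool[i];
    return NULL;
}

/* Check the result before dereferencing it and report an error instead. */
int toy_touch_block(struct toy_buffer *pool, size_t npool, unsigned long blocknr)
{
    struct toy_buffer *bh = toy_getblk(pool, npool, blocknr);

    if (!bh)
        return -EIO;
    /* Safe to use bh from here on. */
    return 0;
}
```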
     
  • I noticed some problems while running ext3 with the debug flag set on.
    More precisely, I was unable to umount the filesystem. Some investigation
    took me to the patch that follows.

    At first glance, the lock/unlock I've taken out does not seem necessary,
    as the main code (outside debug) does not lock the super. The only
    additional dangerous operations that the debug code introduces seem to be
    related to the bitmap, but bitmap operations tend to be atomic anyway.

    I also took the opportunity to fix 2 spelling errors.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber de Oliveira Costa
     

28 Oct, 2005

1 commit

  • - ->releasepage() annotated (s/int/gfp_t), instances updated
    - missing gfp_t in fs/* added
    - fixed misannotation from the original sweep caught by bitwise checks:
      XFS used __nocast both for gfp_t and for flags used by XFS allocator.
      The latter left with unsigned int __nocast; we might want to add a
      different type for those but for now let's leave them alone. That,
      BTW, is a case when __nocast use had been actively confusing - it had
      been used in the same code for two different and similar types, with
      no way to catch misuses. Switch of gfp_t to bitwise had caught that
      immediately...

    One tricky bit is left alone to be dealt with later - mapping->flags is
    a mix of gfp_t and error indications. Left alone for now.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

10 Sep, 2005

1 commit

  • Update the file systems in fs/ implementing a delete_inode() callback to
    call truncate_inode_pages(). One implementation note: In developing this
    patch I put the calls to truncate_inode_pages() at the very top of those
    filesystems' delete_inode() callbacks in order to retain the previous
    behavior. I'm guessing that some of those could probably be optimized.

    Signed-off-by: Mark Fasheh
    Acked-by: Christoph Hellwig
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Fasheh
     

08 Jul, 2005

1 commit


24 Jun, 2005

1 commit


06 May, 2005

1 commit


01 May, 2005

1 commit

  • The extra race-with-truncate-then-retry logic around
    ext3_get_block_handle(), which was inherited from ext2, becomes unnecessary
    for ext3, since we have already obtained the ei->truncate_sem in
    ext3_get_block_handle() before calling ext3_alloc_branch(). The
    ei->truncate_sem is already there to block concurrent truncate and block
    allocation on the same inode. So the inode's indirect addressing tree
    won't be changed after we grab that semaphore.

    After getting the semaphore, we can re-verify whether the branch is up to
    date. If it has been changed, we get the updated branch. If we still
    need block allocation, we will have a safe version of the branch to work
    with in ext3_find_goal()/ext3_splice_branch().

    The code becomes more readable after removing that retry logic. The patch
    also cleans up some gotos in ext3_get_block_handle() to make it more
    readable.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds