09 May, 2007

1 commit

  • A patch that stores inode flags such as S_IMMUTABLE, S_APPEND, etc. from
    i_flags to EXT3_I(inode)->i_flags when inode is written to disk. The same
    thing is done on GETFLAGS ioctl.

    Quota code changes these flags on quota files (to make it harder for
    sysadmin to screw himself) and these changes were not correctly propagated
    into the filesystem (especially, lsattr did not show them and users were
    wondering...).

    Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
    ext3-specific i_flags. Hence, when someone sets these flags via a
    different interface than ioctl, they are stored correctly.

    Signed-off-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
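
The propagation described above can be sketched in plain C. This is a minimal userspace illustration, not the kernel code: the `demo_` names and the bit values are assumptions made up for the example, and only the append/immutable pair is shown.

```c
#include <stdint.h>

/* Illustrative bit values only; the real S_* and EXT3_*_FL constants
 * are defined in the kernel headers and are not reproduced here. */
#define DEMO_S_APPEND         0x04u  /* stands in for the VFS S_APPEND */
#define DEMO_S_IMMUTABLE      0x08u  /* stands in for the VFS S_IMMUTABLE */
#define DEMO_FS_IMMUTABLE_FL  0x10u  /* stands in for the ext3 on-disk flag */
#define DEMO_FS_APPEND_FL     0x20u  /* stands in for the ext3 on-disk flag */

struct demo_inode {
    uint32_t vfs_flags;  /* plays the role of inode->i_flags */
    uint32_t fs_flags;   /* plays the role of EXT3_I(inode)->i_flags */
};

/* Copy the generic flags into the filesystem-specific word. The target
 * bits are cleared first so that a flag dropped on the generic side is
 * dropped here too; unrelated filesystem flags are left untouched. */
static void demo_propagate_flags(struct demo_inode *in)
{
    in->fs_flags &= ~(DEMO_FS_APPEND_FL | DEMO_FS_IMMUTABLE_FL);
    if (in->vfs_flags & DEMO_S_APPEND)
        in->fs_flags |= DEMO_FS_APPEND_FL;
    if (in->vfs_flags & DEMO_S_IMMUTABLE)
        in->fs_flags |= DEMO_FS_IMMUTABLE_FL;
}
```

Calling such a helper both before the inode is written out and before the flags are reported back to userspace is what keeps the two views consistent.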
     

13 Feb, 2007

1 commit

  • Many struct inode_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
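
As a rough illustration of the pattern, the sketch below declares an operations table as const. All `demo_` names are hypothetical and the struct is far smaller than the kernel's struct inode_operations; whether the object really lands in .rodata depends on the toolchain.

```c
struct demo_inode;

/* A tiny stand-in for an operations table of function pointers. */
struct demo_inode_operations {
    int (*getattr)(const struct demo_inode *inode);
};

struct demo_inode {
    int size;
    const struct demo_inode_operations *ops;  /* pointer to a const table */
};

static int demo_getattr(const struct demo_inode *inode)
{
    return inode->size;
}

/* const: the table is never written after initialisation, so the
 * compiler is free to place it in read-only data. */
static const struct demo_inode_operations demo_file_iops = {
    .getattr = demo_getattr,
};

/* An accidental write such as
 *     demo_file_iops.getattr = 0;
 * is rejected at compile time because the object is const. */

static int demo_stat(const struct demo_inode *inode)
{
    return inode->ops->getattr(inode);
}
```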
     

01 Oct, 2006

2 commits


27 Sep, 2006

2 commits


24 Sep, 2006

1 commit


01 Aug, 2006

1 commit

  • The inode number out of an NFS file handle eventually gets passed to
    ext3_get_inode_block() without any checking. If ext3_get_inode_block()
    allows it to trigger an error, then bad filehandles can have an unpleasant
    effect: ext3_error() will usually cause a forced read-only remount, or a
    panic if `errors=panic' was used.

    So remove the call to ext3_error there and put a matching check in
    ext3/namei.c where inode numbers are read off storage.

    [akpm@osdl.org: fix off-by-one error]
    Signed-off-by: Neil Brown
    Signed-off-by: Jan Kara
    Cc: Marcel Holtmann
    Cc:
    Cc: "Stephen C. Tweedie"
    Cc: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Brown
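
A range check of the kind described above might look like the following sketch. It is a simplified assumption of what such a check involves: the `demo_` names are hypothetical, and a real filesystem may treat more reserved inode numbers as valid than just the root inode.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ROOT_INO 2u  /* the root directory inode number on ext2/ext3 */

struct demo_sb {
    uint32_t first_ino;     /* first non-reserved inode number */
    uint32_t inodes_count;  /* total inodes; numbering starts at 1 */
};

/* Return true if `ino` may be looked up without risking a filesystem
 * error. Both ends of the range are inclusive because inode numbers
 * start at 1, so `inodes_count` itself is a valid number. */
static bool demo_valid_inum(const struct demo_sb *sb, uint32_t ino)
{
    if (ino == DEMO_ROOT_INO)
        return true;
    return ino >= sb->first_ino && ino <= sb->inodes_count;
}
```

With a check like this, a bogus number from a file handle can simply be rejected with an error return, while a bogus number read from a directory entry on disk can still be reported as corruption.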
     

26 Jun, 2006

2 commits

  • Convert the ext3 in-kernel filesystem blocks to ext3_fsblk_t. Convert all
    remaining in-kernel filesystem block variables of type unsigned long to
    ext3_fsblk_t, and update the printk format strings correspondingly.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • Some of the in-kernel ext3 block variables are treated as signed 4-byte
    ints, which limits an ext3 filesystem to 8TB (with a 4k block size). While
    trying to fix them, it became clear that the ext3 code is quite confusing:
    some blocks are filesystem-wide blocks, while others are group-relative
    offsets that need to be signed values (as -1 has a special meaning). So it
    seems saner to define two types of physical blocks: filesystem-wide blocks
    and group-relative blocks. The following patches clarify these two types of
    blocks in the ext3 code, and fix the type bugs which limit the current 32 bit
    ext3 filesystem to 8TB.

    With this series of patches and the percpu counter data type changes in the mm
    tree, we are able to extend the ext3 filesystem limit to 16TB.

    This work is also a prerequisite for the recent >32 bit ext3 work, and makes
    it a lot easier for the kernel to address 48 bit ext3 blocks: simply redefine
    ext3_fsblk_t from unsigned long to sector_t and redefine the format string for
    ext3 filesystem blocks correspondingly.

    Two RFCs with a series of patches have been posted to the ext2-devel list
    and have been reviewed and discussed:
    http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2

    http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2

    Patches are tested on both 32 bit and 64 bit machines with an 8TB ext3
    filesystem (with the latest to-be-released e2fsprogs-1.39). Tests include
    overnight fsx, tiobench, dbench and fsstress.

    This patch:

    Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for
    filesystem wide blocks.

    This patch classifies all block group relative blocks and ext3_fsblk_t blocks
    that occur in the same function, which used to be confusing. It also includes
    kernel bug fixes for filesystem-wide in-kernel block variables. Some
    filesystem-wide blocks are currently treated as int/unsigned int in the
    kernel, especially in the ext3 block allocation and reservation code. This
    patch fixes those bugs by converting those variables to the ext3_fsblk_t
    (unsigned long) type.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
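
The 8TB and 16TB figures follow from the number of usable bits in a block number: 2^31 blocks of 4096 bytes is 8TB, and 2^32 blocks is 16TB. The sketch below works through that arithmetic and shows the two block types side by side. The `demo_` names are hypothetical, and `uint32_t` stands in for unsigned long on a 32 bit machine.

```c
#include <stdint.h>

typedef uint32_t demo_fsblk_t;   /* filesystem-wide block number (unsigned) */
typedef int32_t  demo_grpblk_t;  /* group-relative offset; -1 can mean "none" */

/* Bytes addressable when block numbers have `bits` usable bits.
 * Callers must keep bits small enough that the product fits in 64 bits. */
static uint64_t demo_max_fs_bytes(unsigned bits, uint64_t block_size)
{
    return ((uint64_t)1 << bits) * block_size;
}

/* Turn a (group, offset) pair into a filesystem-wide block number.
 * The arithmetic is done in the unsigned filesystem-wide type; doing it
 * in a signed 32 bit int would overflow once the result passes 2^31 - 1.
 * `offset` must be non-negative here. */
static demo_fsblk_t demo_to_fsblk(demo_fsblk_t first_data_block,
                                  uint32_t group,
                                  uint32_t blocks_per_group,
                                  demo_grpblk_t offset)
{
    return first_data_block
         + (demo_fsblk_t)group * blocks_per_group
         + (demo_fsblk_t)offset;
}
```

For instance, group 70000 with 32768 blocks per group starts at block 2293760000, which is above 2^31 - 1 and so cannot be held in a signed 4-byte int.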
     

05 May, 2006

1 commit


25 Apr, 2006

1 commit


29 Mar, 2006

1 commit

  • This is a conversion to make the various file_operations structs in fs/
    const. Basically a regexp job, with a few manual fixups.

    The goal is both to increase correctness (harder to accidentally write to
    shared data structures) and to reduce the false sharing of cachelines with
    things that get dirty in .data (while .rodata is nicely read-only and thus
    cache clean).

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

27 Mar, 2006

2 commits

  • Change ext3_try_to_allocate() (called via ext3_new_blocks()) to try to
    allocate the requested number of blocks on a best effort basis: after
    allocating the first block, it will always attempt to allocate the next few
    adjacent blocks (up to the requested size and not beyond the reservation
    window) at the same time.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • Currently ext3_get_block() only maps or allocates one block at a time. This
    is quite inefficient for sequential IO workloads.

    I have posted an early implementation of simple multiple block mapping and
    allocation with current ext3. The basic idea is allocating the 1st block in
    the existing way, and attempting to allocate the next adjacent blocks on a
    best effort basis. More description of the implementation can be found here:
    http://marc.theaimsgroup.com/?l=ext2-devel&m=112162230003522&w=2

    The following is the latest version of the patch: it breaks the original
    patch into 5 patches, reworks some logic, and fixes some bugs. The breakup is:

    [patch 1] Adding map multiple blocks at a time in ext3_get_blocks()
    [patch 2] Extend ext3_get_blocks() to support multiple block allocation
    [patch 3] Implement multiple block allocation in ext3-try-to-allocate
              (called via ext3_new_block()).
    [patch 4] Proper accounting updates in ext3_new_blocks()
    [patch 5] Adjust reservation window size properly (by the given number
              of blocks to allocate) before block allocation to increase the
              possibility of allocating multiple blocks in a single call.

    Tests done so far include fsx, tiobench and dbench. The following numbers,
    collected from Direct IO tests (1G file creation/read), show that system time
    has been greatly reduced (more than 50% on my 8 cpu system) with the patches.

    1G file DIO write:
            2.6.15       2.6.15+patches
    real    0m31.275s    0m31.161s
    user    0m0.000s     0m0.000s
    sys     0m3.384s     0m0.564s

    1G file DIO read:
            2.6.15       2.6.15+patches
    real    0m30.733s    0m30.624s
    user    0m0.000s     0m0.004s
    sys     0m0.748s     0m0.380s

    Some previous tests we did on buffered IO using multiple block allocation
    and delayed allocation showed a noticeable improvement in throughput and
    system time.

    This patch:

    Add support for mapping multiple blocks in one call.

    This is useful for DIO reads and re-writes (where blocks are already
    allocated), and is also in line with Christoph's proposal of using
    getblocks() in mpage_readpage() or mpage_readpages().

    Signed-off-by: Mingming Cao
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
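
The best-effort idea shared by both commits above (take the first block the usual way, then extend to adjacent free blocks up to the requested count and not past the reservation window) can be sketched with a simple in-memory bitmap. This is an illustrative assumption, not the ext3 allocator: `demo_alloc_run` and its parameters are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Allocate a run of adjacent blocks on a best-effort basis.
 *
 * in_use      one entry per block, true if the block is taken
 * goal        block at which to start searching
 * window_end  end of the reservation window (exclusive)
 * want        number of blocks requested
 * first       set to the first allocated block on success
 *
 * Returns how many blocks were allocated, which may be fewer than
 * `want`, or 0 if no free block exists in [goal, window_end). */
static size_t demo_alloc_run(bool *in_use, size_t goal, size_t window_end,
                             size_t want, size_t *first)
{
    size_t start = goal;
    size_t n = 0;

    if (want == 0)
        return 0;

    /* Find the first free block, as a single-block allocator would. */
    while (start < window_end && in_use[start])
        start++;
    if (start >= window_end)
        return 0;

    /* Extend to adjacent free blocks; stop at the first used block,
     * at the window boundary, or once the request is satisfied. */
    while (n < want && start + n < window_end && !in_use[start + n]) {
        in_use[start + n] = true;
        n++;
    }

    *first = start;
    return n;
}
```

The caller gets back a count and can issue another call for the remainder, so a short run never turns into a failure.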
     

23 Mar, 2006

1 commit

  • Linus points out that ext3_readdir's readahead only cuts in when
    ext3_readdir() is operating at the very start of the directory. So for large
    directories we end up performing no readahead at all and we suck.

    So take it all out and use the core VM's page_cache_readahead(). This means
    that ext3 directory reads will use all of readahead's dynamic sizing goop.

    Note that we're using the directory's filp->f_ra to hold the readahead state,
    but readahead is actually being performed against the underlying blockdev's
    address_space. Fortunately the readahead code is all set up to handle this.

    Tested with printk. It works. I was struggling to find a real workload which
    actually cared.

    (The patch also exports page_cache_readahead() to GPL modules)

    Cc: "Stephen C. Tweedie"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

08 Sep, 2005

1 commit

  • If /etc/mtab is a regular file all of the mount options (of a file system)
    are written to /etc/mtab by the mount command. The quota tools look there
    for the quota strings for their operation. If, however, /etc/mtab is a
    symlink to /proc/mounts (a "good thing" in some environments) the tools
    don't write anything - they assume the kernel will take care of things.

    While the quota options are sent down to the kernel via the mount system
    call and the file system codes handle them properly, unfortunately there is
    no code to echo the quota strings into /proc/mounts, and the quota tools
    fail in the symlink case.

    The attached patches modify the EXT[2|3] and JFS codes to add the necessary
    hooks. The show_options function of each file system in these patches
    currently deals with only those things that seemed related to quotas;
    especially in the EXT3 case more can be done (later?).

    Jan Kara also noted the difficulty in moving these changes above the FS
    codes, responding similarly to myself to Andrew's comment about possible
    VFS migration. Issue summary:

    - FS codes have to process the entire string of options anyway.

    - Only FS codes that use quotas must have a show_options function (for
      quotas to work properly); however, quotas are only used in a small
      number of FS.

    - Since most of the quota-using FS support other options, these FS codes
      should have a show_options function to show those options - and the
      quota echoing becomes virtually negligible.

    Based on feedback I have modified my patches from the original:

    JFS: a missing patch has been restored to the posting

    EXT[2|3] and JFS: always use the show_options function
    - Each FS has at least one FS specific option displayed
    - QUOTA output is under a CONFIG_QUOTA ifdef
    - a follow-on patch will add a multitude of options for each FS

    EXT[2|3] and JFS: "quota" is treated as "usrquota"

    EXT3: journalled data check for journalled quota removed

    EXT[2|3]: mount when quota specified but not compiled in
    - no changes from my original patch. I tested the patch and the codes
      warn but still mount. With all due respect I believe the comments
      otherwise were a misread of the patch. Please reread/test and comment.

    XFS: patch removed - the XFS team already made the necessary changes

    EXT3: mixing old and new quotas is handled differently (not purely
    exclusive)
    - if old and new quotas for the same type are used together the old
      type is silently deprecated for compatibility (e.g. usrquota and
      usrjquota)
    - mixing of old and new quotas is an error (e.g. usrjquota and
      grpquota)

    Signed-off-by: Mark Bellon
    Acked-by: Dave Kleikamp
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Bellon
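
The quota handling described above can be sketched in userspace C as two steps: resolve old-style against journalled quota options, then echo the surviving options as a comma-separated string of the kind seen in /proc/mounts. Everything named `demo_` is hypothetical, the output format is an assumption, and a kernel implementation would write to a seq_file rather than a character buffer.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct demo_opts {
    bool usrquota;          /* old-style user quota ("quota" maps here) */
    bool grpquota;          /* old-style group quota */
    const char *usrjquota;  /* journalled user quota file, or NULL */
    const char *grpjquota;  /* journalled group quota file, or NULL */
};

/* Same type given in both styles: silently drop the old one.
 * Old and journalled styles mixed across types: error (-1). */
static int demo_resolve_quota(struct demo_opts *o)
{
    if (o->usrjquota)
        o->usrquota = false;
    if (o->grpjquota)
        o->grpquota = false;
    if ((o->usrquota || o->grpquota) && (o->usrjquota || o->grpjquota))
        return -1;
    return 0;
}

/* Append ",key" or ",key=value" to buf without overflowing it. */
static void demo_append(char *buf, size_t len, const char *key,
                        const char *value)
{
    size_t used = strlen(buf);

    if (used >= len)
        return;
    if (value)
        snprintf(buf + used, len - used, ",%s=%s", key, value);
    else
        snprintf(buf + used, len - used, ",%s", key);
}

/* Echo the quota-related options into buf. */
static void demo_show_options(const struct demo_opts *o, char *buf, size_t len)
{
    buf[0] = '\0';
    if (o->usrjquota)
        demo_append(buf, len, "usrjquota", o->usrjquota);
    if (o->grpjquota)
        demo_append(buf, len, "grpjquota", o->grpjquota);
    if (o->usrquota)
        demo_append(buf, len, "usrquota", NULL);
    if (o->grpquota)
        demo_append(buf, len, "grpquota", NULL);
}
```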
     

13 Jul, 2005

1 commit


24 Jun, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds