09 Jan, 2012

1 commit

  • Delete any instances of include module.h that were not strictly
    required. In the case of ext2, the declarations of MODULE_LICENSE
    etc. were in inode.c but the module_init/exit were in super.c, so
    relocate the MODULE_LICENSE/AUTHOR block to super.c, which makes it
    consistent with ext3 and ext4 at the same time.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: Jan Kara

    Paul Gortmaker
     

02 Nov, 2011

1 commit


21 Jul, 2011

2 commits

  • Simple filesystems always pass inode->i_sb->s_bdev as the block device
    argument, and never need an end_io handler. Let's simplify things for
    them and for my grepping activity by dropping these arguments. The
    only thing not falling into that scheme is ext4, which passes an
    end_io handler without needing special flags (yet), but given how
    messy the direct I/O code there is, use of __blockdev_direct_IO
    in one instead of two out of three cases isn't going to make a large
    difference anyway.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Let filesystems handle waiting for direct I/O requests themselves instead
    of doing it beforehand. This means filesystem-specific locks to prevent
    new dio references from appearing can be held. This is important to allow
    generalizing i_dio_count to non-DIO_LOCKING filesystems.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

31 Mar, 2011

1 commit


10 Mar, 2011

1 commit

  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So let's kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
     

26 Oct, 2010

1 commit

  • Add a new helper to write out the inode using the writeback code,
    that is, including the correct dirty bit and list manipulation. A few
    filesystems already open-code this, and a lot of others should be
    using it instead of using write_inode_now which also writes out the
    data.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

23 Sep, 2010

1 commit


10 Aug, 2010

8 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Make sure we check the truncate constraints early on in ->setattr by adding
    those checks to inode_change_ok. Also clean up and document inode_change_ok
    to make this obvious.

    As a fallout we don't have to call inode_newsize_ok from simple_setsize and
    simplify it down to a truncate_setsize which doesn't return an error. This
    simplifies a lot of setattr implementations and means we use truncate_setsize
    almost everywhere. Get rid of fat_setsize now that it's trivial and mark
    ext2_setsize static to make the calling convention obvious.

    Keep the inode_newsize_ok in vmtruncate for now as all callers need an
    audit for its removal anyway.

    Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
    needs a deeper audit, but that is left for later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Despite its name it's not a generic implementation of ->setattr, but
    rather a helper to copy attributes from a struct iattr to the inode.
    Rename it to setattr_copy to reflect this fact.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate to get rid of excessive blocks to the callers
    in preparation for the new truncate sequence and rename the non-truncating
    version to block_write_begin.

    While we're at it also remove several unused arguments to block_write_begin.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • For filesystems that implement directories in pagecache we call
    block_write_begin with an already allocated page for this code, while the
    normal regular file write path uses the default block_write_begin behaviour.

    Get rid of the __foofs_write_begin helper and opencode the normal write_begin
    call in foofs_write_begin, while adding a new foofs_prepare_chunk helper for
    the directory code. The added benefit is that foofs_prepare_chunk has
    a much saner calling convention.

    Note that the interruptible flag passed into block_write_begin is always
    ignored if we already pass in a page (see next patch for details), and
    we never were doing truncations of excessive blocks for this case either so we
    can switch directly to block_write_begin_newtrunc.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate to get rid of excessive blocks to the only
    remaining caller and rename the non-truncating version to nobh_write_begin.

    Get rid of the superfluous file argument to it while we're at it.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate to get rid of excessive blocks to the callers
    in preparation for the new truncate calling sequence. This was only done
    for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
    was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
    its _newtrunc variant while at it, as just opencoding the two additional
    parameters is shorter than the name suffix.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

05 Jun, 2010

1 commit

  • mtime and ctime should be changed only if the file size has actually
    changed. Patches changing ext2 and tmpfs from vmtruncate to the new truncate
    sequence have caused regressions where they always update timestamps.

    There are some strange cases in POSIX where truncate(2) must not update
    times unless the size has actually changed, see 6e656be89.

    This area is all still rather buggy in different ways in a lot of
    filesystems and needs a cleanup and audit (ideally the vfs will provide
    a simple attribute or call to direct all filesystems exactly which
    attributes to change). But coming up with the best solution will take a
    while and is not appropriate for rc anyway.

    So fix the recent regression for now.

    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
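
The rule stated in the commit above (touch mtime and ctime only when the size really changes) can be illustrated with a small user-space model. This is a sketch only: `struct toy_inode` and `toy_setsize` are hypothetical names, not the kernel's inode structure or the actual ext2/tmpfs fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few inode fields that matter here. */
struct toy_inode {
	long long size;
	long long mtime;
	long long ctime;
};

/*
 * Set a new size. Timestamps are updated only if the size changes;
 * returns true when a change was made.
 */
bool toy_setsize(struct toy_inode *inode, long long newsize, long long now)
{
	assert(inode != NULL && newsize >= 0);

	if (newsize == inode->size)
		return false;	/* same size: leave mtime/ctime alone */

	inode->size = newsize;
	inode->mtime = now;
	inode->ctime = now;
	return true;
}
```

Calling `toy_setsize` with the current size is a no-op for the timestamps, which is the behaviour the regression broke.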
     

28 May, 2010

1 commit


22 May, 2010

3 commits

  • Quota must be initialized if size or uid/gid changes are requested.
    But initialization is performed in two different places:
    in the case of i_size, the file system is responsible for dquot init,
    but in the case of uid/gid, init will be called internally in
    dquot_transfer().
    This ambiguity makes the code harder to understand.
    Let's move this logic to one common helper function.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jan Kara

    Dmitry Monakhov
     
  • The BKL is still used in ext2_put_super(), ext2_fill_super(), ext2_sync_fs(),
    ext2_remount() and ext2_write_inode(). Of these calls, ext2_put_super(),
    ext2_fill_super() and ext2_remount() are protected against each other by
    the struct super_block s_umount rw semaphore. The call in ext2_write_inode()
    could only protect the modification of the ext2_sb_info through
    ext2_update_dynamic_rev() against concurrent ext2_sync_fs() or ext2_remount().
    ext2_fill_super() and ext2_put_super() can be left out because you need a
    valid filesystem reference in all three cases, which you do not have when
    you are in one of these functions.

    If the BKL is only protecting the modification of the ext2_sb_info it can
    safely be removed since this is protected by the struct ext2_sb_info s_lock.

    Signed-off-by: Jan Blunck
    Cc: Jan Kara
    Signed-off-by: Jan Kara

    Jan Blunck
     
  • Add a spinlock that protects against concurrent modifications of
    s_mount_state, s_blocks_last, s_overhead_last and the content of the
    superblock's buffer pointed to by sbi->s_es. The spinlock is now used in
    ext2_xattr_update_super_block() which was setting the
    EXT2_FEATURE_COMPAT_EXT_ATTR flag on the superblock without protection
    before. Likewise the spinlock is used in ext2_show_options() to have a
    consistent view of the mount options.

    This is a preparation patch for removing the BKL from ext2 in the next
    patch.

    Signed-off-by: Jan Blunck
    Cc: Andi Kleen
    Cc: Jan Kara
    Cc: OGAWA Hirofumi
    Signed-off-by: Jan Kara

    Jan Blunck
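
The first commit of this date describes moving the "does this setattr need quota initialized?" decision into one common helper. A minimal user-space sketch of such a predicate follows. All names here (`CHG_*`, `struct toy_inode_ids`, `struct toy_attr`, `needs_quota_init`) are hypothetical and only model the condition stated in the message: a size change, or a uid/gid change, is actually being requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "which fields are being set" flags. */
enum { CHG_SIZE = 1 << 0, CHG_UID = 1 << 1, CHG_GID = 1 << 2 };

struct toy_inode_ids {
	long long size;
	unsigned int uid;
	unsigned int gid;
};

struct toy_attr {
	unsigned int valid;	/* mask of CHG_* flags */
	long long size;
	unsigned int uid;
	unsigned int gid;
};

/* True if the request changes something quota accounting cares about. */
bool needs_quota_init(const struct toy_inode_ids *inode,
		      const struct toy_attr *attr)
{
	assert(inode != NULL && attr != NULL);

	return ((attr->valid & CHG_SIZE) && attr->size != inode->size) ||
	       ((attr->valid & CHG_UID) && attr->uid != inode->uid) ||
	       ((attr->valid & CHG_GID) && attr->gid != inode->gid);
}
```

With a single predicate like this, a setattr path can decide in one place whether to initialize quota before either truncating or transferring ownership.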
     

06 Mar, 2010

2 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
    quota: stop using QUOTA_OK / NO_QUOTA
    dquot: cleanup dquot initialize routine
    dquot: move dquot initialization responsibility into the filesystem
    dquot: cleanup dquot drop routine
    dquot: move dquot drop responsibility into the filesystem
    dquot: cleanup dquot transfer routine
    dquot: move dquot transfer responsibility into the filesystem
    dquot: cleanup inode allocation / freeing routines
    dquot: cleanup space allocation / freeing routines
    ext3: add writepage sanity checks
    ext3: Truncate allocated blocks if direct IO write fails to update i_size
    quota: Properly invalidate caches even for filesystems with blocksize < pagesize
    quota: generalize quota transfer interface
    quota: sb_quota state flags cleanup
    jbd: Delay discarding buffers in journal_unmap_buffer
    ext3: quota_write cross block boundary behaviour
    quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
    quota: split out compat_sys_quotactl support from quota.c
    quota: split out netlink notification support from quota.c
    quota: remove invalid optimization from quota_sync_all
    ...

    Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c

    Linus Torvalds
     
  • This gives the filesystem more information about the writeback that
    is happening. Trond requested this for the NFS unstable write handling,
    and other filesystems might benefit from this too by being able to
    distinguish between the different callers in more detail.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

05 Mar, 2010

3 commits

  • Get rid of the initialize dquot operation - it is now always called from
    the filesystem and if a filesystem really needs its own (which none
    currently does) it can just call into its own routine directly.

    Rename the now static low-level dquot_initialize helper to __dquot_initialize
    and vfs_dq_init to dquot_initialize to have a consistent namespace.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently various places in the VFS call vfs_dq_init directly. This means
    we tie the quota code into the VFS. Get rid of that and make the
    filesystem responsible for the initialization. For most metadata operations
    this is a straightforward move into the methods, but for truncate and
    open it's a bit more complicated.

    For truncate we currently only call vfs_dq_init for the sys_truncate case
    because open already takes care of it for ftruncate and open(O_TRUNC) - the
    new code causes an additional vfs_dq_init for those which is harmless.

    For open the initialization is moved from do_filp_open into the open method,
    which means it happens slightly earlier now, and only for regular files.
    The latter is fine because we don't need to initialize it for operations
    on special files, and we already do it as part of the namespace operations
    for directories.

    Add a dquot_file_open helper that filesystems that support generic quotas
    can use to fill in ->open.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Get rid of the transfer dquot operation - it is now always called from
    the filesystem and if a filesystem really needs its own (which none
    currently does) it can just call into its own routine directly.

    Rename the now static low-level dquot_transfer helper to __dquot_transfer
    and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
    and make the new dquot_transfer return a normal negative errno value
    which all callers expect.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     

10 Dec, 2009

1 commit

  • Make messages produced by ext2 more uniform, so that they are
    easy to parse.

    dmesg before patch:
    [ 4893.684892] reservations ON
    [ 4893.684896] xip option not supported
    [ 4893.684961] EXT2-fs warning: mounting ext3 filesystem as ext2
    [ 4893.684964] EXT2-fs warning: maximal mount count reached, running
    e2fsck is recommended
    [ 4893.684990] EXT II FS: 0.5b, 95/08/09, bs=1024, fs=1024, gc=2,
    bpg=8192, ipg=1280, mo=80010]

    dmesg after patch:
    [ 4893.684892] EXT2-fs (loop0): reservations ON
    [ 4893.684896] EXT2-fs (loop0): xip option not supported
    [ 4893.684961] EXT2-fs (loop0): warning: mounting ext3 filesystem as
    ext2
    [ 4893.684964] EXT2-fs (loop0): warning: maximal mount count reached,
    running e2fsck is recommended
    [ 4893.684990] EXT2-fs (loop0): 0.5b, 95/08/09, bs=1024, fs=1024, gc=2,
    bpg=8192, ipg=1280, mo=80010]

    Signed-off-by: Alexey Fisher
    Reviewed-by: Andreas Dilger
    Signed-off-by: Jan Kara

    Alexey Fisher
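
The "after patch" format above is a fixed `EXT2-fs (<device>): ` prefix followed by the message. A user-space sketch of a formatter producing that shape is shown below. `fmt_ext2_msg` is a hypothetical helper written for illustration; it is not the kernel's logging function and it writes to a caller-supplied buffer rather than the kernel log.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format "EXT2-fs (<dev>): <message>" into buf.
 * Returns the number of characters that the full message needs
 * (snprintf-style), or a negative value on a formatting error.
 */
int fmt_ext2_msg(char *buf, size_t size, const char *dev,
		 const char *fmt, ...)
{
	va_list args;
	int prefix, body;

	assert(buf != NULL && size > 0 && dev != NULL && fmt != NULL);

	prefix = snprintf(buf, size, "EXT2-fs (%s): ", dev);
	if (prefix < 0 || (size_t)prefix >= size)
		return prefix;	/* error, or prefix alone was truncated */

	va_start(args, fmt);
	body = vsnprintf(buf + prefix, size - (size_t)prefix, fmt, args);
	va_end(args);

	return body < 0 ? body : prefix + body;
}
```

Because every line starts with the same prefix and names the device, a log parser can match on `EXT2-fs (` and split at the first `): `.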
     

16 Sep, 2009

1 commit

  • Enable removing of corrupted pages through truncation
    for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
    These should cover most server needs.

    I chose the set of migration aware file systems for this
    for now, assuming they have been especially audited.
    But in general it should be safe for all file systems
    on the data area that support read/write and truncate.

    Caveat: the hardware error handler does not take i_mutex
    for now before calling the truncate function. Is that ok?

    Cc: tytso@mit.edu
    Cc: hch@infradead.org
    Cc: mfasheh@suse.com
    Cc: aia21@cantab.net
    Cc: hugh.dickins@tiscali.co.uk
    Cc: swhiteho@redhat.com
    Signed-off-by: Andi Kleen

    Andi Kleen
     

14 Sep, 2009

1 commit


24 Jun, 2009

1 commit


12 Jun, 2009

1 commit


14 Apr, 2009

1 commit

  • If two writers allocating blocks to file race with each other (e.g.
    because writepages races with ordinary write or two writepages race with
    each other), ext2_getblock() can be called on the same inode in parallel.
    Before we are going to allocate new blocks, we have to recheck the block
    chain we have obtained so far without holding truncate_mutex. Otherwise
    we could overwrite the indirect block pointer set by the other writer
    leading to data loss.

    The test program below, by Ying, is able to reproduce the data loss with ext2
    on BRD in a few minutes if the machine is under memory pressure:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    long kMemSize = 50 << 20;
    int kPageSize = 4096;

    int main(int argc, char **argv) {
        int status;
        int count = 0;
        int i;
        char *fname = "/mnt/test.mmap";
        char *mem;
        unlink(fname);
        int fd = open(fname, O_CREAT | O_EXCL | O_RDWR, 0600);
        status = ftruncate(fd, kMemSize);
        mem = mmap(0, kMemSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        // Fill the memory with 1s.
        memset(mem, 1, kMemSize);
        sleep(2);
        // Count pages whose first byte has been lost (reads back as 0).
        for (i = 0; i < kMemSize; i++) {
            int byte_good = mem[i] != 0;
            if (!byte_good && ((i % kPageSize) == 0)) {
                //printf("%d ", i / kPageSize);
                count++;
            }
        }
        munmap(mem, kMemSize);
        close(fd);
        unlink(fname);

        if (count > 0) {
            printf("Running %d bad page\n", count);
            return 1;
        }
        return 0;
    }

    Cc: Ying Han
    Cc: Nick Piggin
    Signed-off-by: Jan Kara
    Cc: Mingming Cao
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
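
The fix described above amounts to "re-validate what you read without the lock before acting on it". A minimal user-space model of that recheck is sketched here. `struct chain_link` and `chain_still_valid` are hypothetical names; the real ext2 code works on its own indirect-block structures and performs the check while holding truncate_mutex, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One step of a block lookup: where the pointer lives and what was read. */
struct chain_link {
	const uint32_t *slot;	/* location of the block pointer */
	uint32_t seen;		/* value observed during the lockless walk */
};

/*
 * Return true if every recorded pointer still matches its slot, i.e.
 * no other writer has changed the chain since it was walked.
 */
bool chain_still_valid(const struct chain_link *chain, size_t depth)
{
	assert(chain != NULL);

	for (size_t i = 0; i < depth; i++) {
		if (*chain[i].slot != chain[i].seen)
			return false;	/* someone else filled in this pointer */
	}
	return true;
}
```

If the check fails, the caller would redo the lookup instead of allocating, so it never overwrites a pointer that another writer has already set.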
     

26 Mar, 2009

1 commit


09 Jan, 2009

1 commit


01 Jan, 2009

1 commit


04 Oct, 2008

1 commit

  • Any block based fs (this patch includes ext3) just has to declare its own
    fiemap() function and then call this generic function with its own
    get_block_t. This works well for block based filesystems that will map
    multiple contiguous blocks at one time, but will also work for filesystems
    that only map one block at a time; you will just end up with an "extent"
    for each block. One gotcha is that this will not play nicely where there is
    hole+data after the EOF. This function will assume it has hit the end of
    the data as soon as it hits a hole after the EOF, so if there is any data
    past that it will not pick that up. AFAIK no block based fs does this
    anyway, but it's in the comments of the function just in case.

    Signed-off-by: Josef Bacik
    Signed-off-by: Mark Fasheh
    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org

    Josef Bacik
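
To illustrate how a per-block mapping callback can be turned into extents, here is a small user-space sketch. It is not the kernel's generic fiemap helper: `map_block_fn`, `collect_extents`, and `demo_map` are hypothetical, physical block 0 is assumed to mean "hole", and the EOF handling mentioned in the commit is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct extent {
	uint64_t logical;	/* first logical block */
	uint64_t physical;	/* first physical block */
	uint64_t length;	/* number of blocks */
};

/* Hypothetical mapper: returns the physical block, or 0 for a hole. */
typedef uint64_t (*map_block_fn)(uint64_t logical);

/* Walk blocks [0, nblocks) and merge contiguous mappings into extents. */
size_t collect_extents(map_block_fn map, uint64_t nblocks,
		       struct extent *out, size_t max_out)
{
	size_t n = 0;

	assert(map != NULL && out != NULL);

	for (uint64_t lblk = 0; lblk < nblocks; lblk++) {
		uint64_t pblk = map(lblk);

		if (pblk == 0)
			continue;	/* hole: nothing to report */
		if (n > 0 &&
		    out[n - 1].logical + out[n - 1].length == lblk &&
		    out[n - 1].physical + out[n - 1].length == pblk) {
			out[n - 1].length++;	/* extends the previous extent */
			continue;
		}
		if (n == max_out)
			break;		/* output array is full */
		out[n].logical = lblk;
		out[n].physical = pblk;
		out[n].length = 1;
		n++;
	}
	return n;
}

/* Example layout: blocks 0-2 contiguous, 3 is a hole, 4 and 5 scattered. */
uint64_t demo_map(uint64_t logical)
{
	static const uint64_t table[] = { 100, 101, 102, 0, 200, 300 };

	return logical < 6 ? table[logical] : 0;
}
```

A mapper that only ever returns one block at a time still works here; physically scattered blocks simply produce one extent each, as the commit message notes.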
     

29 Jul, 2008

1 commit

  • When we read some part of a file through pagecache, if there is a
    pagecache page at the corresponding index but this page is not uptodate,
    read IO is issued to bring this page uptodate.

    I think this is good for a pagesize == blocksize environment but there is
    room for improvement in a pagesize != blocksize environment, because in
    this case a page can have multiple buffers and even if a page is not
    uptodate, some buffers can be uptodate.

    So I suggest that when all buffers which correspond to a part of a file
    that we want to read are uptodate, use this pagecache and copy data from
    this pagecache to user buffer even if a page is not uptodate. This can
    reduce read IO and improve system throughput.

    I wrote a benchmark program and measured the results with it.

    The benchmark does the following:

    1: mount and open a test file.

    2: create a 512MB file.

    3: close a file and umount.

    4: mount and again open a test file.

    5: pwrite randomly 300000 times on a test file. offset is aligned
    by IO size(1024bytes).

    6: measure time of preading randomly 100000 times on a test file.

    The result was:
    2.6.26
    330 sec

    2.6.26-patched
    226 sec

    Arch:i386
    Filesystem:ext3
    Blocksize:1024 bytes
    Memory: 1GB

    On ext3/4, a file is written through buffer/block. So random read/write
    mixed workloads or random read after random write workloads are optimized
    with this patch under pagesize != blocksize environment. This test result
    showed this.

    The benchmark program is as follows:

    /* Header names were lost from this listing; inferred from the calls below. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/mount.h>

    #define LEN 1024
    #define LOOP (1024*512) /* 512MB */

    int main(void)
    {
        unsigned long i, offset, filesize;
        int fd;
        char buf[LEN];
        time_t t1, t2;

        if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
            perror("cannot mount\n");
            exit(1);
        }
        memset(buf, 0, LEN);
        /* O_CREAT requires a mode argument. */
        fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC, 0644);
        if (fd < 0) {
            perror("cannot open file\n");
            exit(1);
        }
        /* Create the 512MB test file. */
        for (i = 0; i < LOOP; i++)
            write(fd, buf, LEN);
        close(fd);
        if (umount("/root/test1/") < 0) {
            perror("cannot umount\n");
            exit(1);
        }
        /* Remount so that the page cache starts out cold. */
        if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
            perror("cannot mount\n");
            exit(1);
        }
        fd = open("/root/test1/testfile", O_RDWR);
        if (fd < 0) {
            perror("cannot open file\n");
            exit(1);
        }

        filesize = LEN * LOOP;
        /* Random writes, aligned to the IO size. */
        for (i = 0; i < 300000; i++) {
            offset = (random() % filesize) & (~(LEN - 1));
            pwrite(fd, buf, LEN, offset);
        }
        printf("start test\n");
        time(&t1);
        /* Timed random reads, aligned to the IO size. */
        for (i = 0; i < 100000; i++) {
            offset = (random() % filesize) & (~(LEN - 1));
            pread(fd, buf, LEN, offset);
        }
        time(&t2);
        printf("%ld sec\n", (long)(t2 - t1));
        close(fd);
        if (umount("/root/test1/") < 0) {
            perror("cannot umount\n");
            exit(1);
        }
        return 0;
    }

    Signed-off-by: Hisashi Hifumi
    Cc: Nick Piggin
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hisashi Hifumi
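
The idea in the commit above is to serve a read from a not-fully-uptodate page when every buffer covering the requested byte range is itself uptodate. A user-space sketch of that range check follows. `range_is_uptodate` and the plain `bool` array are hypothetical simplifications; the kernel tracks this state on buffer heads attached to the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * uptodate[i] says whether buffer i of the page holds valid data.
 * Returns true if every buffer overlapping [from, from + count)
 * is valid, so the range can be copied without issuing read IO.
 */
bool range_is_uptodate(const bool *uptodate, size_t nbufs, size_t blocksize,
		       size_t from, size_t count)
{
	assert(uptodate != NULL && blocksize > 0);

	if (count == 0 || from + count > nbufs * blocksize)
		return false;	/* empty or out-of-page request */

	size_t first = from / blocksize;
	size_t last = (from + count - 1) / blocksize;

	for (size_t i = first; i <= last; i++) {
		if (!uptodate[i])
			return false;
	}
	return true;
}
```

With a 4096-byte page and 1024-byte blocks, as in the benchmark setup, a 1024-byte aligned read touches exactly one buffer, so a previous write to that block is enough to satisfy it from the cache.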
     

28 Apr, 2008

3 commits

  • Use ext2_fsblk_t type for filesystem-wide blocks number

    Signed-off-by: Akinobu Mita
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Use ext2_group_first_block_no() and assign the return values to
    ext2_fsblk_t variables.

    Signed-off-by: Akinobu Mita
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Convert XIP to support non-struct page backed memory, using VM_MIXEDMAP for
    the user mappings.

    This requires the get_xip_page API to be changed to an address based one.
    Improve the API layering a little bit too, while we're here.

    This is required in order to support XIP filesystems on memory that isn't
    backed with struct page (but memory with struct page is still supported too).

    Signed-off-by: Nick Piggin
    Acked-by: Carsten Otte
    Cc: Jared Hulbert
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin