22 Nov, 2011

1 commit

  • This patch separates the code pertaining to allocations into two
    parts: quota-related information and block reservations.
    This patch also moves all the block reservation structure allocations to
    function gfs2_inplace_reserve to simplify the code, and moves
    the frees to function gfs2_inplace_release.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

21 Oct, 2011

2 commits

  • This means that after the initial allocation for any inode, the
    last used resource group is cached in the inode for future use.
    This drastically reduces the number of lookups of resource
    groups in the common case, and thus the contention on that
    data structure.

    The allocation algorithm is the same as previously, except that we
    always check to see if the goal block is within the cached rgrp
    first before going to the rbtree to look one up.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
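
    The lookup order described in this commit can be sketched in userspace C.
    This is only an illustration under stated assumptions: the structures and
    the names find_rgrp(), slow_lookup() and slow_lookups are hypothetical,
    and a binary search over a sorted array stands in for the rbtree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct rgrp {
	uint64_t start; /* first block covered by this resource group */
	uint64_t len;   /* number of blocks covered */
};

struct inode_alloc_state {
	const struct rgrp *cached;  /* last resource group used by this inode */
	unsigned long slow_lookups; /* how often we fell back to the shared index */
};

static int rgrp_contains(const struct rgrp *rgd, uint64_t block)
{
	return block >= rgd->start && block - rgd->start < rgd->len;
}

/* Stand-in for the rbtree lookup: binary search over a sorted table. */
static const struct rgrp *slow_lookup(const struct rgrp *table, size_t n,
				      uint64_t block)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (block < table[mid].start)
			hi = mid;
		else if (!rgrp_contains(&table[mid], block))
			lo = mid + 1;
		else
			return &table[mid];
	}
	return NULL;
}

/* Check the per-inode cached rgrp first; only then consult the shared index. */
static const struct rgrp *find_rgrp(struct inode_alloc_state *st,
				    const struct rgrp *table, size_t n,
				    uint64_t goal)
{
	const struct rgrp *rgd;

	assert(st != NULL && table != NULL);
	if (st->cached && rgrp_contains(st->cached, goal))
		return st->cached; /* common case: no shared lookup needed */

	st->slow_lookups++;
	rgd = slow_lookup(table, n, goal);
	if (rgd)
		st->cached = rgd; /* remember it for the next allocation */
	return rgd;
}
```

    In this sketch, repeated allocations near the same goal block touch the
    shared index only once, which is the effect the message attributes to
    the cache.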
     
  • The aim of this patch is to use the newly enhanced ->dirty_inode()
    super block operation to deal with atime updates, rather than
    piggy backing that code into ->write_inode() as is currently
    done.

    The net result is a simplification of the code in various places
    and a reduction of the number of gfs2_dinode_out() calls since
    this is now implied by ->dirty_inode().

    Some of the mark_inode_dirty() calls have been moved under glocks
    in order to take advantage of then being able to avoid locking in
    ->dirty_inode() when we already have suitable locks.

    One consequence is that generic_write_end() now correctly deals
    with file size updates, so that we do not need a separate check
    for that afterwards. This also, indirectly, means that fdatasync
    should work correctly on GFS2 - the current code always syncs the
    metadata whether it needs to or not.

    Has survived testing with postmark (with and without atime) and
    also fsx.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

14 Jul, 2011

1 commit

  • This patch contains a few misc fixes which resolve a recently
    reported issue. This patch has been a real team effort and has
    received a lot of testing.

    The first issue is that the ail lock needs to be held over a few
    more operations. The lock that's added into gfs2_releasepage() may
    possibly be a candidate for replacing with RCU at some future
    point, but at this stage we've gone for the obvious fix.

    The second issue is that gfs2_write_inode() can end up calling
    a glock recursively when called from gfs2_evict_inode() via the
    syncing code, so it needs a guard added.

    The third issue is that we either need to not truncate the metadata
    pages of inodes which have zero link count, but which we cannot
    deallocate due to them still being in use by other nodes, or we need
    to ensure that those pages have all made it through the journal and
    ail lists first. This patch takes the former approach, but the
    latter has also been tested and there is nothing to choose between
    them performance-wise. So again, we could revise that decision
    in the future.

    Also, the inode eviction process is now better documented.

    Signed-off-by: Steven Whitehouse
    Tested-by: Bob Peterson
    Tested-by: Abhijith Das
    Reported-by: Barry J. Marson
    Reported-by: David Teigland

    Steven Whitehouse
     

03 May, 2011

1 commit

  • If the buffer is dirty or pinned, then as well as printing a
    warning, we should also refuse to release the page in
    question.

    Currently this can occur if there is a race between mmap()ed
    writers and O_DIRECT on the same file. With the addition of
    ->launder_page() in the future, we should be able to close
    this gap.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

18 Apr, 2011

1 commit

  • I did an audit of gfs2's transaction glock for bugzilla bug
    658619 and ran across this:

    In function gfs2_write_end, in the unlikely event that
    gfs2_meta_inode_buffer returns an error, the code may forget
    to unlock the transaction lock because the "failed" label
    appears after the call to function gfs2_trans_end.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
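
    The label-placement problem described above can be shown with a small
    userspace sketch. It is not the gfs2_write_end() code: struct trans and
    the trans_begin()/trans_end() helpers are hypothetical stand-ins, and the
    function bodies are reduced to the control flow that matters.

```c
#include <assert.h>

/* Hypothetical stand-in for the transaction lock state. */
struct trans {
	int held;
};

static void trans_begin(struct trans *t)
{
	assert(!t->held);
	t->held = 1;
}

static void trans_end(struct trans *t)
{
	assert(t->held);
	t->held = 0;
}

/* Buggy shape: "failed" sits after trans_end(), so the error path skips it. */
static int write_end_buggy(struct trans *t, int buffer_error)
{
	int ret = 0;

	trans_begin(t);
	if (buffer_error) {
		ret = buffer_error;
		goto failed;
	}
	/* ... normal work would go here ... */
	trans_end(t);
failed:
	return ret;
}

/* Fixed shape: both the success and the error path pass through trans_end(). */
static int write_end_fixed(struct trans *t, int buffer_error)
{
	int ret = 0;

	trans_begin(t);
	if (buffer_error) {
		ret = buffer_error;
		goto failed;
	}
	/* ... normal work would go here ... */
failed:
	trans_end(t);
	return ret;
}
```

    With a non-zero buffer_error, the first variant returns with t->held
    still set, while the second always releases it.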
     

25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

14 Mar, 2011

1 commit

  • gfs2_write_begin() calls grab_cache_page_write_begin(), which returns a
    *locked* page. The corresponding error-handling path lacks an unlock_page() call:

    > out:
    > if (error == 0)
    > return 0;
    >
    > page_cache_release(page);

    The whole system hangs if gfs2_unstuff_dinode() called from gfs2_write_begin()
    failed for some reason.

    Reported-by: Maxim
    Signed-off-by: Maxim
    Signed-off-by: Steven Whitehouse

    Maxim
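
    A userspace sketch of the corrected error path follows. The struct page
    here and the sketch_* helpers are hypothetical stand-ins for the kernel
    page and its lock/reference calls, and the unstuff_error parameter simply
    models a failure from gfs2_unstuff_dinode().

```c
#include <assert.h>

/* Hypothetical stand-in for a page cache page. */
struct page {
	int locked;
	int refcount;
};

/* Models a call that returns the page locked and with a reference held. */
static void sketch_grab_locked_page(struct page *p)
{
	p->locked = 1;
	p->refcount++;
}

static void sketch_unlock_page(struct page *p)
{
	assert(p->locked);
	p->locked = 0;
}

static void sketch_page_release(struct page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

static int write_begin_sketch(struct page *p, int unstuff_error)
{
	int error;

	sketch_grab_locked_page(p);
	error = unstuff_error; /* models a later step failing */

	if (error == 0)
		return 0; /* success: caller receives a locked, referenced page */

	/* Error path: unlock before dropping the reference. Without the
	 * unlock, the next task that tries to lock this page waits forever. */
	sketch_unlock_page(p);
	sketch_page_release(p);
	return error;
}
```

    On failure the page ends up unlocked with no reference held; on success
    the lock and the reference are handed to the caller.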
     

10 Mar, 2011

1 commit

  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So let's kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
     

26 Oct, 2010

1 commit

  • __block_write_begin and block_prepare_write are identical except for slightly
    different calling conventions. Convert all callers to the __block_write_begin
    calling conventions and drop block_prepare_write.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

28 Sep, 2010

2 commits

  • This shouldn't really be required, but gcc can't tell that
    "al" is only accessed when initialised.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Some of the functions in GFS2 were not reserving space in the transaction for
    the resource group header and the resource group's bitblocks that get added
    when you do allocation. GFS2 now makes sure to reserve space for the
    resource group header and either all the bitblocks in the resource group, or
    one for each block that it may allocate, whichever is smaller, using the new
    gfs2_rg_blocks() inline function.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
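
    The reservation rule in this message amounts to a small calculation. The
    sketch below follows the wording of the message rather than the kernel
    source, so the function name rg_blocks_sketch() and its two parameters
    are assumptions; in particular, it assumes the header block is counted
    separately from the bitblocks.

```c
#include <assert.h>

/*
 * Blocks to reserve in the transaction for one resource group:
 * the header, plus the smaller of "every bitblock in the group" and
 * "one bitblock per block that may be allocated".
 */
static unsigned int rg_blocks_sketch(unsigned int bitblocks,
				     unsigned int requested)
{
	unsigned int bitmap_cost;

	assert(bitblocks > 0);
	bitmap_cost = requested < bitblocks ? requested : bitblocks;
	return 1 + bitmap_cost; /* 1 for the resource group header */
}
```

    For a group with 5 bitblocks, a request for 2 blocks reserves 3 blocks,
    while a request for 100 blocks is capped at 6.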
     

20 Sep, 2010

3 commits

  • This patch adds support for fallocate to gfs2. Since gfs2 does not support
    uninitialized data blocks, it must write out zeros to all the blocks. However,
    since it does not need to lock any pages to read from, gfs2 can write out the
    zero blocks much more efficiently. On a moderately full filesystem, fallocate
    works around 5 times faster on average. The fallocate call also allows gfs2 to
    add blocks to the file without changing the filesize, which will make it
    possible for gfs2 to preallocate space for the rindex file, so that gfs2 can
    grow a completely full filesystem.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • With the update of the truncate code, ip->i_disksize and
    inode->i_size are merely copies of each other. This means
    we can remove ip->i_disksize and use inode->i_size exclusively
    reducing the size of a GFS2 inode by 8 bytes.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This updates GFS2's truncate code to use the new truncate
    sequence correctly. This is a stepping stone to being
    able to remove ip->i_disksize in favour of using i_size
    everywhere now that the two sizes are always identical.

    Signed-off-by: Steven Whitehouse
    Cc: Nick Piggin
    Cc: Christoph Hellwig

    Steven Whitehouse
     

11 Aug, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
    no need for list_for_each_entry_safe()/resetting with superblock list
    Fix sget() race with failing mount
    vfs: don't hold s_umount over close_bdev_exclusive() call
    sysv: do not mark superblock dirty on remount
    sysv: do not mark superblock dirty on mount
    btrfs: remove junk sb_dirt change
    BFS: clean up the superblock usage
    AFFS: wait for sb synchronization when needed
    AFFS: clean up dirty flag usage
    cifs: truncate fallout
    mbcache: fix shrinker function return value
    mbcache: Remove unused features
    add f_flags to struct statfs(64)
    pass a struct path to vfs_statfs
    update VFS documentation for method changes.
    All filesystems that need invalidate_inode_buffers() are doing that explicitly
    convert remaining ->clear_inode() to ->evict_inode()
    Make ->drop_inode() just return whether inode needs to be dropped
    fs/inode.c:clear_inode() is gone
    fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
    ...

    Fix up trivial conflicts in fs/nilfs2/super.c

    Linus Torvalds
     

10 Aug, 2010

2 commits

  • Make sure we check the truncate constraints early on in ->setattr by adding
    those checks to inode_change_ok. Also clean up and document inode_change_ok
    to make this obvious.

    As a fallout we don't have to call inode_newsize_ok from simple_setsize and
    simplify it down to a truncate_setsize which doesn't return an error. This
    simplifies a lot of setattr implementations and means we use truncate_setsize
    almost everywhere. Get rid of fat_setsize now that it's trivial and mark
    ext2_setsize static to make the calling convention obvious.

    Keep the inode_newsize_ok in vmtruncate for now as all callers need an
    audit for its removal anyway.

    Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
    needs a deeper audit, but that is left for later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate to get rid of excessive blocks to the callers
    in preparation for the new truncate calling sequence. This was only done
    for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
    was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
    its _newtrunc variant while at it, as just open-coding the two additional
    parameters is shorter than the name suffix.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

29 Jul, 2010

2 commits

  • Function gfs2_write_alloc_required always returned zero as its
    return code. Therefore, it doesn't need to return a return code
    at all. Given that, we can use the return value to return whether
    or not the dinode needs block allocations rather than passing
    that value in, which in turn simplifies a bunch of error checking.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • Use nobh_writepage rather than calling mpage_writepage directly.

    Signed-off-by: Steven Whitehouse
    Cc: Christoph Hellwig

    Steven Whitehouse
     

28 May, 2010

1 commit

  • Lots of filesystems call vmtruncate despite not implementing the old
    ->truncate method. Switch them to use simple_setsize and add some
    comments about the truncate code where it seems fitting.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
     

29 Mar, 2010

1 commit

  • If the inode size was corrupt for stuffed files, it was possible
    for the copying of data to overrun the block and/or page. This patch
    checks for that condition so that this is no longer possible.

    This is also preparation for the new truncate sequence patch which
    requires the ability to have stuffed files with larger sizes than
    (disk block size - sizeof(on disk inode)) with the restriction that
    only the initial part of the file may be non-zero.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
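
    The kind of check described above can be sketched as a clamp on the copy
    length. The sizes and the name copy_stuffed_sketch() are made up for
    illustration and are far smaller than real block, inode header and page
    sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy sizes; real values are much larger. */
enum {
	SK_BLOCK_SIZE = 64,  /* on-disk block holding header + stuffed data */
	SK_HEADER_SIZE = 16, /* stands in for sizeof(on disk inode) */
	SK_PAGE_SIZE = 128   /* destination page */
};

/*
 * Copy stuffed file data from the inode block into a page.
 * inode_size comes from disk and may be corrupt, or (with the new
 * truncate sequence) legitimately larger than the block can hold.
 * Returns the number of bytes actually copied.
 */
static size_t copy_stuffed_sketch(unsigned char *page,
				  const unsigned char *block,
				  size_t inode_size)
{
	const size_t max_payload = SK_BLOCK_SIZE - SK_HEADER_SIZE;
	size_t n = inode_size;

	assert(page != NULL && block != NULL);
	if (n > max_payload)
		n = max_payload; /* never read past the end of the block */

	memcpy(page, block + SK_HEADER_SIZE, n);
	memset(page + n, 0, SK_PAGE_SIZE - n); /* rest of the page reads as zero */
	return n;
}
```

    Clamping rather than failing also fits the second paragraph of the
    message: a size beyond the block's capacity is tolerated, and everything
    past the stored data reads back as zeros.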
     

01 Mar, 2010

1 commit

  • Since the start of GFS2, an "extra" inode has been used to store
    the metadata belonging to each inode. The only reason for using
    this inode was to have an extra address space, the other fields
    were unused. This means that the memory usage was rather inefficient.

    The reason for keeping each inode's metadata in a separate address
    space is that when glocks are requested on remote nodes, we need to
    be able to efficiently locate the data and metadata relating
    to that glock (inode) in order to sync or sync and invalidate it
    (depending on the remotely requested lock mode).

    This patch adds a new type of glock which, in addition to
    its normal fields, has an address space. This applies to all
    inode and rgrp glocks (but to no other glock types which remain
    as before). As a result, we no longer need to have the second
    inode.

    This results in three major improvements:
    1. A saving of approx 25% of memory used in caching inodes
    2. A removal of the circular dependency between inodes and glocks
    3. No confusion between "normal" and "metadata" inodes in super.c

    Although the first of these is the more immediately apparent, the
    second is just as important as it now enables a number of clean
    ups at umount time. Those will be the subject of future patches.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

03 Dec, 2009

2 commits

  • No one is calling wb_writeback and write_cache_pages with
    wbc.nonblocking=1 any more. And lumpy pageout will want to do
    nonblocking writeback without the congestion wait.

    Signed-off-by: Wu Fengguang
    Signed-off-by: Steven Whitehouse

    Wu Fengguang
     
  • When a gfs2 filesystem is grown, it needs to rebuild the rindex list to be able
    to use the new space. gfs2 does this when the rindex is marked not uptodate,
    which happens when the rindex glock is dropped. However, on a single node
    setup, there is never any reason to drop the rindex glock, so gfs2 never
    invalidates the rindex. This patch makes gfs2 automatically drop the
    rindex glock after filesystem grows, so it can refresh the rindex list.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

16 Sep, 2009

1 commit

  • Enable removing of corrupted pages through truncation
    for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
    These should cover most server needs.

    I chose the set of migration aware file systems for this
    for now, assuming they have been especially audited.
    But in general it should be safe for all file systems
    on the data area that support read/write and truncate.

    Caveat: the hardware error handler does not take i_mutex
    for now before calling the truncate function. Is that ok?

    Cc: tytso@mit.edu
    Cc: hch@infradead.org
    Cc: mfasheh@suse.com
    Cc: aia21@cantab.net
    Cc: hugh.dickins@tiscali.co.uk
    Cc: swhiteho@redhat.com
    Signed-off-by: Andi Kleen

    Andi Kleen
     

30 Jul, 2009

1 commit

  • GFS2 wasn't syncing its statfs info on grows. This causes a problem
    when you grow the filesystem on multiple nodes. GFS2 would calculate
    the new space based on the resource groups (which are always current),
    and then assume that the filesystem had grown from the existing
    statfs size. If you grew the filesystem on two different nodes in a
    short time, the second node wouldn't see the statfs size change from the
    first node, and would assume that it was grown by a larger amount than
    it was. When all these changes were synced out, the total filesystem
    size would be incorrect (the first grow would be counted twice).

    This patch makes GFS2 read in the statfs changes from disk before
    a grow, and write them out after the grow, while the master statfs inode
    is locked.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

22 May, 2009

1 commit

  • This patch renames the ops_*.c files which have no counterpart
    without the ops_ prefix in order to shorten the name and make
    it more readable. In addition, ops_address.h (which was very
    small) is moved into inode.h and inode.h is cleaned up by
    adding extern where required.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse