22 Nov, 2011

1 commit

  • This patch separates the code pertaining to allocations into two
    parts: quota-related information and block reservations.
    This patch also moves all the block reservation structure allocations to
    function gfs2_inplace_reserve to simplify the code, and moves
    the frees to function gfs2_inplace_release.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

21 Nov, 2011

1 commit

  • This patch is a revision of the one I previously posted.
    I tried to integrate all the suggestions Steve gave.
    The purpose of the patch is to change function gfs2_alloc_block
    (allocate either a dinode block or an extent of data blocks)
    to a more generic gfs2_alloc_blocks function that can
    allocate both a dinode _and_ an extent of data blocks in the
    same call. This will ultimately help us create a multi-block
    reservation scheme to reduce file fragmentation.

    This patch moves more toward a generic multi-block allocator that
    takes a pointer to the number of data blocks to allocate, plus whether
    or not to allocate a dinode. In theory, it could be called to allocate
    (1) a single dinode block, (2) a group of one or more data blocks, or
    (3) a dinode plus several data blocks.
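
    As a rough illustration of that calling convention, here is a minimal
    user-space sketch. It is not the kernel code: the in-memory map, the
    enum values and the name toy_alloc_blocks are all hypothetical. It only
    shows how one call can cover the three cases by taking a pointer to the
    data block count plus a dinode flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block states for a toy in-memory map (not on-disk values). */
enum blk_state { BLK_FREE = 0, BLK_DATA, BLK_DINODE };

/*
 * Toy combined allocator. Finds the first free block, optionally claims it
 * as a dinode, then claims up to *ndata contiguous free blocks as data.
 * On success *first is the first block claimed and *ndata is updated to
 * the number of data blocks actually claimed (possibly fewer than asked).
 * Returns 0 on success, -1 if nothing was requested or nothing is free.
 */
static int toy_alloc_blocks(enum blk_state *map, size_t nblocks,
                            size_t *first, unsigned int *ndata, bool dinode)
{
    size_t pos = 0;
    unsigned int got = 0;

    if (!dinode && *ndata == 0)
        return -1;

    while (pos < nblocks && map[pos] != BLK_FREE)
        pos++;
    if (pos == nblocks)
        return -1;

    *first = pos;
    if (dinode)
        map[pos++] = BLK_DINODE;

    /* Extend the extent until the request is met or an obstacle is hit. */
    while (got < *ndata && pos < nblocks && map[pos] == BLK_FREE) {
        map[pos++] = BLK_DATA;
        got++;
    }
    *ndata = got;
    return 0;
}
```

    Passing the count by pointer lets the caller learn how long the extent
    actually turned out to be without a second return channel.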

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

15 Nov, 2011

1 commit

  • GFS2 functions gfs2_alloc_block and gfs2_alloc_di do basically
    the same things, with a few exceptions. This patch combines
    the two functions into a slightly more generic gfs2_alloc_block.
    Having one centralized block allocation function will reduce
    code redundancy and make it easier to implement multi-block
    reservations to reduce file fragmentation in the future.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

08 Nov, 2011

1 commit


21 Oct, 2011

6 commits

  • Move the recently added readahead of the indirect pointer
    tree during deallocation into its own function in order
    that we can use it elsewhere in the future. Also this
    fixes the resetting of the "first" variable in the
    original patch.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • GFS2's fallocate code currently goes through the page cache. Since it's only
    writing to the end of the file or to holes in it, it doesn't need to, and it
    was causing issues on low memory environments. This patch pulls in some of
    Steve's block allocation work, and uses it to simply allocate the blocks for
    the file, and zero them out at allocation time. It provides a slight
    performance increase, and it dramatically simplifies the code.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • This patch improves the performance of delete/unlink
    operations in a GFS2 file system where the files are large
    by adding a layer of metadata read-ahead for indirect blocks.
    Mileage will vary, but on my system, deleting an 8.6G file
    dropped from 22 seconds to about 4.5 seconds.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • Each block which is deallocated requires a call to gfs2_rlist_add(),
    and each of those calls was calling gfs2_blk2rgrpd() in order to
    figure out which rgrp the block belonged in. This can be sped up
    by making use of the rgrp cached in the inode. We also reset this
    cached rgrp in case the block has changed rgrp. This should provide
    a big reduction in gfs2_blk2rgrpd() calls during deallocation.
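
    The caching idea can be sketched in user space as follows. This is only
    an illustration under assumed, simplified types: toy_rgrp, toy_inode,
    slow_lookup and lookup_cached are hypothetical names, and the linear
    search merely stands in for the real block-to-rgrp lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified resource group: a contiguous range of blocks. */
struct toy_rgrp {
    uint64_t start;   /* first block in the group */
    uint64_t len;     /* number of blocks in the group */
};

struct toy_inode {
    const struct toy_rgrp *cached;  /* last rgrp used, may be NULL */
    unsigned long slow_lookups;     /* full searches performed so far */
};

static int rgrp_contains(const struct toy_rgrp *rg, uint64_t blk)
{
    return blk >= rg->start && blk - rg->start < rg->len;
}

/* Full search: stands in for the expensive block-to-rgrp lookup. */
static const struct toy_rgrp *slow_lookup(const struct toy_rgrp *tbl,
                                          size_t n, uint64_t blk)
{
    for (size_t i = 0; i < n; i++)
        if (rgrp_contains(&tbl[i], blk))
            return &tbl[i];
    return NULL;
}

/* Try the cached rgrp first; fall back and reset the cache on a miss. */
static const struct toy_rgrp *lookup_cached(struct toy_inode *ip,
                                            const struct toy_rgrp *tbl,
                                            size_t n, uint64_t blk)
{
    if (ip->cached && rgrp_contains(ip->cached, blk))
        return ip->cached;
    ip->slow_lookups++;
    ip->cached = slow_lookup(tbl, n, blk);
    return ip->cached;
}
```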

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The recursive_scan() function only ever takes a single "bc"
    argument, so we might as well just call do_strip() directly
    from recursive_scan() rather than pass it in as an argument.

    Also the "data" argument is always a struct strip_mine, so
    we can pass that in, rather than using a void pointer.

    This also moves do_strip() ahead of recursive_scan() so that
    we don't need to add a prototype.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Since we have ruled out supporting online filesystem shrink,
    it is possible to make the resource group list append only
    during the life of a super block. This gives several benefits:

    Firstly, we only need to read new rindex elements as they are added
    rather than needing to reread the whole rindex file each time one
    element is added.

    Secondly, the rindex glock can be held for much shorter periods of
    time, and is completely removed from the fast path for allocations.
    The lock is taken in shared mode only when updating the resource
    groups when the first allocation occurs, and after a grow has
    taken place.

    Thirdly, this results in a reduction in code size, and everything
    gets a lot simpler to understand in this area.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

23 Jul, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits)
    vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp
    isofs: Remove global fs lock
    jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory
    fix IN_DELETE_SELF on overwriting rename() on ramfs et.al.
    mm/truncate.c: fix build for CONFIG_BLOCK not enabled
    fs:update the NOTE of the file_operations structure
    Remove dead code in dget_parent()
    AFS: Fix silly characters in a comment
    switch d_add_ci() to d_splice_alias() in "found negative" case as well
    simplify gfs2_lookup()
    jfs_lookup(): don't bother with . or ..
    get rid of useless dget_parent() in btrfs rename() and link()
    get rid of useless dget_parent() in fs/btrfs/ioctl.c
    fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
    drivers: fix up various ->llseek() implementations
    fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek
    Ext4: handle SEEK_HOLE/SEEK_DATA generically
    Btrfs: implement our own ->llseek
    fs: add SEEK_HOLE and SEEK_DATA flags
    reiserfs: make reiserfs default to barrier=flush
    ...

    Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new
    shrinker callout for the inode cache, that clashed with the xfs code to
    start the periodic workers later.

    Linus Torvalds
     

21 Jul, 2011

1 commit

  • Let filesystems handle waiting for direct I/O requests themselves instead
    of doing it beforehand. This means filesystem-specific locks to prevent
    new dio references from appearing can be held. This is important to allow
    generalizing i_dio_count to non-DIO_LOCKING filesystems.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

15 Jul, 2011

1 commit

  • __gfs2_free_data and __gfs2_free_meta are almost identical, and
    can be trivially combined.

    [This is as per Eric's original patch minus gfs2_free_data() which had
    no callers left and plus the conversion of the bmap.c calls to these
    functions. All in all, a nice clean up]

    Signed-off-by: Eric Sandeen
    Signed-off-by: Steven Whitehouse

    Eric Sandeen
     

21 May, 2011

1 commit

  • The deallocation code for directories in GFS2 is largely divided into
    two parts. The first part deallocates any directory leaf blocks and
    marks the directory as being a regular file when that is complete. The
    second stage was identical to deallocating regular files.

    Regular files have their data blocks in a different
    address space to directories, and thus what would have been normal data
    blocks in a regular file (the hash table in a GFS2 directory) were
    deallocated correctly. However, a reference to these blocks was left in the
    journal (assuming of course that some previous activity had resulted in
    those blocks being in the journal or ail list).

    This patch uses the i_depth as a test of whether the inode is an
    exhash directory (we cannot test the inode type as that has already
    been changed to a regular file at this stage in deallocation).

    The original issue was reported by Chris Hertel as an issue he encountered
    running bonnie++

    Reported-by: Christopher R. Hertel
    Cc: Abhijith Das
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

31 Mar, 2011

1 commit


24 Feb, 2011

1 commit

  • This patch is a performance improvement to GFS2's dealloc code.
    Rather than update the quota file and statfs file for every
    single block that's stripped off in unlink function do_strip,
    this patch keeps track and updates them once for every layer
    that's stripped. This is done entirely inside the existing
    transaction, so there should be no risk of corruption.
    The other functions that deallocate blocks will be unaffected
    because they are using wrapper functions that do the same
    thing that they do today.
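
    The batching can be pictured with the small user-space sketch below.
    It is an assumption-laden illustration, not the patch itself: the
    toy_accounts structure and both function names are hypothetical, and
    the counters merely stand in for the quota and statfs files.

```c
#include <stddef.h>

/* Hypothetical counters standing in for the quota and statfs files. */
struct toy_accounts {
    long quota_blocks;      /* blocks charged to the owner */
    long statfs_free;       /* free blocks in the filesystem */
    unsigned long updates;  /* how many times the accounts were written */
};

/* One accounting update covering nfreed blocks. */
static void account_freed(struct toy_accounts *acc, long nfreed)
{
    acc->quota_blocks -= nfreed;
    acc->statfs_free += nfreed;
    acc->updates++;
}

/*
 * Strip one layer of pointers. A nonzero entry is an allocated block.
 * The freed count is accumulated locally and the accounts are updated
 * once for the whole layer instead of once per block.
 */
static void strip_layer(struct toy_accounts *acc,
                        unsigned long long *ptrs, size_t n)
{
    long freed = 0;

    for (size_t i = 0; i < n; i++) {
        if (!ptrs[i])
            continue;
        ptrs[i] = 0;
        freed++;
    }
    if (freed)
        account_freed(acc, freed);
}
```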

    I tested this code on my roth cluster by creating 200
    files in a directory, each of which is 100MB, then on
    four nodes, I simultaneously deleted the files, thus competing
    for GFS2 resources (but different files). The commands
    I used were:

    [root@roth-01]# time for i in `seq 1 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
    [root@roth-02]# time for i in `seq 2 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
    [root@roth-03]# time for i in `seq 3 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
    [root@roth-05]# time for i in `seq 4 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done

    The performance increase was significant:

                roth-01     roth-02     roth-03     roth-05
                ---------   ---------   ---------   ---------
    old: real   0m34.027s   0m25.021s   0m23.906s   0m35.646s
    new: real   0m22.379s   0m24.362s   0m24.133s   0m18.562s

    Total time spent deleting:
    old: 118.6s
    new:  89.4s

    For this particular case, this showed a 25% performance increase for
    GFS2 unlinks.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

30 Nov, 2010

2 commits


28 Sep, 2010

1 commit

  • Some of the functions in GFS2 were not reserving space in the transaction for
    the resource group header and the resource group's bitblocks that get added
    when you do allocation. GFS2 now makes sure to reserve space for the
    resource group header and either all the bitblocks in the resource group, or
    one for each block that it may allocate, whichever is smaller, using the new
    gfs2_rg_blocks() inline function.
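
    Read literally, the rule above is "one block for the header, plus the
    smaller of the two bitblock counts". A minimal sketch of that
    arithmetic follows. The name toy_rg_blocks and its two parameters are
    hypothetical, and the sketch assumes bitmap_blocks does not already
    include the header block; the real inline function may count things
    differently.

```c
/*
 * Journal blocks to reserve for one resource group:
 *   requested     - number of blocks the caller may allocate
 *   bitmap_blocks - number of bitmap blocks in the resource group
 *                   (assumed here to exclude the header block)
 * Each allocated block dirties at most one bitblock, so the reservation
 * never needs more bitblocks than min(requested, bitmap_blocks).
 */
static unsigned int toy_rg_blocks(unsigned int requested,
                                  unsigned int bitmap_blocks)
{
    unsigned int bitblocks =
        requested < bitmap_blocks ? requested : bitmap_blocks;

    return 1 + bitblocks;   /* 1 for the resource group header */
}
```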

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

20 Sep, 2010

2 commits

  • With the update of the truncate code, ip->i_disksize and
    inode->i_size are merely copies of each other. This means
    we can remove ip->i_disksize and use inode->i_size exclusively
    reducing the size of a GFS2 inode by 8 bytes.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This updates GFS2's truncate code to use the new truncate
    sequence correctly. This is a stepping stone to being
    able to remove ip->i_disksize in favour of using i_size
    everywhere now that the two sizes are always identical.

    Signed-off-by: Steven Whitehouse
    Cc: Nick Piggin
    Cc: Christoph Hellwig

    Steven Whitehouse
     

30 Jul, 2010

1 commit


29 Jul, 2010

1 commit

  • Function gfs2_write_alloc_required always returned zero as its
    return code. Therefore, it doesn't need to return a return code
    at all. Given that, we can use the return value to return whether
    or not the dinode needs block allocations rather than passing
    that value in, which in turn simplifies a bunch of error checking.
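
    The change in calling convention looks roughly like the sketch below.
    Both functions are hypothetical user-space stand-ins, and the test
    "does the write extend past what is already allocated" is a deliberate
    simplification of whatever the real function checks.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Old shape (illustrative): the answer comes back through a pointer and
 * the return code is always zero, yet every caller still has to check it.
 */
static int toy_alloc_required_old(uint64_t allocated, uint64_t off,
                                  uint64_t len, int *required)
{
    *required = (len != 0 && off + len > allocated);
    return 0;
}

/* New shape: the return value is the answer, so no error path is needed. */
static bool toy_alloc_required(uint64_t allocated, uint64_t off,
                               uint64_t len)
{
    return len != 0 && off + len > allocated;
}
```

    With the second form a caller can write "if (toy_alloc_required(...))"
    directly, with no dead error branch.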

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

15 Jul, 2010

1 commit


21 May, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
    GFS2: Fix typo
    GFS2: stuck in inode wait, no glocks stuck
    GFS2: Eliminate useless err variable
    GFS2: Fix writing to non-page aligned gfs2_quota structures
    GFS2: Add some useful messages
    GFS2: fix quota state reporting
    GFS2: Various gfs2_logd improvements
    GFS2: glock livelock
    GFS2: Clean up stuffed file copying
    GFS2: docs update
    GFS2: Remove space from slab cache name

    Linus Torvalds
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

29 Mar, 2010

1 commit

  • If the inode size was corrupt for stuffed files, it was possible
    for the copying of data to overrun the block and/or page. This patch
    checks for that condition so that this is no longer possible.

    This is also preparation for the new truncate sequence patch which
    requires the ability to have stuffed files with larger sizes than
    (disk block size - sizeof(on disk inode)) with the restriction that
    only the initial part of the file may be non-zero.
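
    A bounds check of that kind might look like the sketch below. It is a
    user-space illustration under assumed names (toy_copy_stuffed and its
    parameters are hypothetical): the copy length is clamped to both the
    stuffed area and the destination page, and the remainder of the page
    is zeroed, which also matches the "only the initial part may be
    non-zero" restriction.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the data of a stuffed file into a page-sized buffer.
 *   stuffed_capacity - bytes available after the on-disk inode header
 *   inode_size       - file size recorded in the inode (may be corrupt,
 *                      or legitimately larger than the stuffed area)
 * Returns the number of bytes actually copied.
 */
static size_t toy_copy_stuffed(unsigned char *page, size_t page_size,
                               const unsigned char *stuffed,
                               size_t stuffed_capacity, size_t inode_size)
{
    size_t n = inode_size;

    if (n > stuffed_capacity)   /* never read past the disk block */
        n = stuffed_capacity;
    if (n > page_size)          /* never write past the page */
        n = page_size;

    memcpy(page, stuffed, n);
    memset(page + n, 0, page_size - n);
    return n;
}
```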

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

12 Feb, 2010

1 commit

  • This patch solves a corner case during allocation which occurs if both
    metadata (indirect) and data blocks are required but there is an
    obstacle in the filesystem (e.g. a resource group header or another
    allocated block) such that when the allocation is requested only
    enough blocks for the metadata are returned.

    By changing the exit condition of this loop, we ensure that a
    minimum of one data block will always be returned.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

12 Jun, 2009

1 commit

  • This patch adds the ability to trace various aspects of the GFS2
    filesystem. The trace points are divided into three groups,
    glocks, logging and bmap. These points have been chosen because
    they allow inspection of the major internal functions of GFS2
    and they are also generic enough that they are unlikely to need
    any major changes as the filesystem evolves.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

10 Jun, 2009

1 commit


22 May, 2009

1 commit

  • This patch renames the ops_*.c files which have no counterpart
    without the ops_ prefix in order to shorten the name and make
    it more readable. In addition, ops_address.h (which was very
    small) is moved into inode.h and inode.h is cleaned up by
    adding extern where required.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

20 May, 2009

1 commit

  • This patch improves the error handling in the case where we
    discover that the summary information in the resource group
    doesn't match the bitmap information while in the process of
    allocating blocks. Originally this resulted in a kernel bug,
    but this patch changes that so that we return -EIO and print
    some messages explaining what went wrong, and how to fix it.

    We also remember locally not to try and allocate from the
    same rgrp again, so that a subsequent allocation in a
    different rgrp should succeed.
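
    The behaviour described above can be sketched in user space like this.
    The structure, the skip flag and the function name are hypothetical,
    and the free-block counters are a stand-in for the real summary and
    bitmap data.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource group with a summary count and a bitmap count. */
struct toy_rg {
    unsigned int summary_free;  /* free blocks according to the header */
    unsigned int bitmap_free;   /* free blocks actually in the bitmap */
    bool skip;                  /* set once a mismatch has been seen */
};

/*
 * Allocate one block. Returns the index of the rgrp used, -EIO if the
 * rgrp that was tried has a summary/bitmap mismatch (it is then marked
 * so later calls avoid it), or -ENOSPC if no usable rgrp has space.
 */
static int toy_alloc_from(struct toy_rg *rgs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct toy_rg *rg = &rgs[i];

        if (rg->skip || rg->summary_free == 0)
            continue;
        if (rg->bitmap_free == 0) {
            /* Summary claims free space but the bitmap has none. */
            rg->skip = true;
            return -EIO;
        }
        rg->summary_free--;
        rg->bitmap_free--;
        return (int)i;
    }
    return -ENOSPC;
}
```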

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

24 Mar, 2009

1 commit

  • This is the big patch that I've been working on for some time
    now. There are many reasons for wanting to make this change
    such as:
    o Reducing overhead by eliminating duplicated fields between structures
    o Simplification of the code (reduces the code size by a fair bit)
    o The locking interface is now the DLM interface itself as proposed
    some time ago.
    o Fewer lookups of glocks when processing replies from the DLM
    o Fewer memory allocations/deallocations for each glock
    o Scope to do further optimisations in the future (but this patch is
    more than big enough for now!)

    Please note that (a) this patch relates to the lock_dlm module and
    not the DLM itself, that is still a separate module; and (b) that
    we retain the ability to build GFS2 as a standalone single node
    filesystem without requiring the DLM.

    This patch needs a lot of testing, hence my keeping it until I restarted
    my -git tree after the last merge window. That way, this has the maximum
    exposure before it's merged. This is (modulo a few minor bug fixes) the
    same patch that I've been posting on and off for the last three months,
    and it has passed a number of different tests so far.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

05 Jan, 2009

3 commits


25 Jun, 2008

1 commit

  • This patch fixes bz 450641.

    This patch changes the computation for zero_metapath_length(), which it
    renames to metapath_branch_start(). When you are extending the metadata
    tree, the indirect blocks that point to the new data block must
    diverge from the existing tree either at the inode or at the first
    indirect block. They can diverge at the first indirect block because the
    inode has room for 483 pointers while the indirect blocks have room for
    509 pointers, so when the tree is grown, there is some free space in the
    first indirect block. What metapath_branch_start() now computes is the
    height where the first indirect block for the new data block is located.
    It can either be 1 (if the indirect block diverges from the inode) or 2
    (if it diverges from the first indirect block).
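
    The two pointer counts can be reproduced with the arithmetic below,
    assuming a 4096-byte block, 8-byte block pointers, a 232-byte on-disk
    dinode header and a 24-byte metadata header. Those sizes are not
    stated in this message; they are assumptions that happen to give 483
    and 509. The toy_branch_start helper is likewise only an illustration:
    it assumes the metapath is an array of per-height indices, and that an
    index of 0 at the inode level means the path runs through the existing
    first indirect block.

```c
#include <stddef.h>

/* Assumed sizes, in bytes; see the note above. */
enum {
    TOY_BLOCK_SIZE = 4096,
    TOY_DINODE_HDR = 232,
    TOY_META_HDR   = 24,
    TOY_PTR_SIZE   = 8
};

/* Number of block pointers that fit after a header of the given size. */
static size_t ptrs_per_block(size_t block_size, size_t header_size)
{
    return (block_size - header_size) / TOY_PTR_SIZE;
}

/*
 * Height at which the first new indirect block is needed:
 *   2 if the path goes through pointer 0 of the inode, i.e. through the
 *     existing first indirect block, which still has spare slots;
 *   1 otherwise, i.e. the path diverges at the inode itself.
 */
static unsigned int toy_branch_start(const unsigned int *mp_list)
{
    return mp_list[0] == 0 ? 2 : 1;
}
```

    The 26-slot difference (509 - 483) is the free space in the first
    indirect block that the message refers to.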

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

31 Mar, 2008

3 commits

  • This patch streamlines the quota checking in the "no quota" case by
    making the check inline in the calling function, thus reducing the
    number of function calls. Eventually we might be able to remove the
    checks from the gfs2_quota_lock() and gfs2_quota_check() functions, but
    currently we can't, as there are still a few places in the code which
    need to call these functions directly.

    Signed-off-by: Steven Whitehouse
    Cc: Abhijith Das

    Steven Whitehouse
     
  • gfs2_alloc_get may fail so we have to check it to prevent
    NULL pointer dereference.

    Signed-off-by: Cyrill Gorcunov
    Signed-off-by: Steven Whitehouse

    Cyrill Gorcunov
     
  • We've supported mapping of extents when no block allocation is required
    for some time. This patch extends that to mapping of extents when an
    allocation has been requested. In that case we try to allocate as many
    blocks as are requested, but we might return fewer in case there is
    something preventing us from returning the complete amount (e.g. an
    already allocated block is in the way).

    Currently the only code path which can actually request multiple data
    blocks in a single bmap call is the page_mkwrite path and even then it
    only happens if there are multiple blocks per page. What this patch does
    do however, is merge the allocation requests for metadata (growing the
    metadata tree in either height or depth) with the allocation of the data
    blocks in the case that both are needed. This results in lower overheads
    even in the single block allocation case.

    The one thing which we can't handle here at the moment is unstuffing. I
    would like to be able to do that, but the problem which arises is that
    in order to unstuff one has to get a locked page from the page cache
    which results in locking problems in the (usual) case that the caller is
    holding the page lock on the page it wishes to map. So that case will
    have to be addressed in future patches.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse