22 Nov, 2011

1 commit

  • This patch separates the code pertaining to allocations into two
    parts: quota-related information and block reservations.
    This patch also moves all the block reservation structure allocations to
    function gfs2_inplace_reserve to simplify the code, and moves
    the frees to function gfs2_inplace_release.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

21 Nov, 2011

1 commit

  • This patch is a revision of the one I previously posted.
    I tried to integrate all the suggestions Steve gave.
    The purpose of the patch is to change function gfs2_alloc_block
    (allocate either a dinode block or an extent of data blocks)
    to a more generic gfs2_alloc_blocks function that can
    allocate both a dinode _and_ an extent of data blocks in the
    same call. This will ultimately help us create a multi-block
    reservation scheme to reduce file fragmentation.

    This patch moves more toward a generic multi-block allocator that
    takes a pointer to the number of data blocks to allocate, plus whether
    or not to allocate a dinode. In theory, it could be called to allocate
    (1) a single dinode block, (2) a group of one or more data blocks, or
    (3) a dinode plus several data blocks.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

15 Nov, 2011

1 commit

  • GFS2 functions gfs2_alloc_block and gfs2_alloc_di do basically
    the same things, with a few exceptions. This patch combines
    the two functions into a slightly more generic gfs2_alloc_block.
    Having one centralized block allocation function will reduce
    code redundancy and make it easier to implement multi-block
    reservations to reduce file fragmentation in the future.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

09 Nov, 2011

1 commit


08 Nov, 2011

1 commit

  • This patch adds read-ahead capability to GFS2's
    directory hash table management. It greatly improves
    performance for some directory operations. For example:
    In one of my file systems that has 1000 directories, each
    of which has 1000 files, time to execute a recursive
    ls (time ls -fR /mnt/gfs2 > /dev/null) was reduced
    from 2m2.814s on a stock kernel to 0m45.938s.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

21 Oct, 2011

4 commits

  • Each block which is deallocated, requires a call to gfs2_rlist_add()
    and each of those calls was calling gfs2_blk2rgrpd() in order to
    figure out which rgrp the block belonged in. This can be speeded up
    by making use of the rgrp cached in the inode. We also reset this
    cached rgrp in case the block has changed rgrp. This should provide
    a big reduction in gfs2_blk2rgrpd() calls during deallocation.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Since we have ruled out supporting online filesystem shrink,
    it is possible to make the resource group list append only
    during the life of a super block. This gives several benefits:

    Firstly, we only need to read new rindex elements as they are added
    rather than needing to reread the whole rindex file each time one
    element is added.

    Secondly, the rindex glock can be held for much shorter periods of
    time, and is completely removed from the fast path for allocations.
    The lock is taken in shared mode only when updating the resource
    groups when the first allocation occurs, and after a grow has
    taken place.

    Thirdly, this results in a reduction in code size, and everything
    gets a lot simpler to understand in this area.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The aim of this patch is to use the newly enhanced ->dirty_inode()
    super block operation to deal with atime updates, rather than
    piggy backing that code into ->write_inode() as is currently
    done.

    The net result is a simplification of the code in various places
    and a reduction of the number of gfs2_dinode_out() calls since
    this is now implied by ->dirty_inode().

    Some of the mark_inode_dirty() calls have been moved under glocks
    in order to take advantage of then being able to avoid locking in
    ->dirty_inode() when we already have suitable locks.

    One consequence is that generic_write_end() now correctly deals
    with file size updates, so that we do not need a separate check
    for that afterwards. This also, indirectly, means that fdatasync
    should work correctly on GFS2 - the current code always syncs the
    metadata whether it needs to or not.

    Has survived testing with postmark (with and without atime) and
    also fsx.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Since there is now only a single caller to gfs2_dir_read_data()
    and it has a number of constant arguments, we can factor
    those out. Also some tests relating to the inode size were
    being done twice.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

15 Jul, 2011

1 commit

  • This patch adds a cache for the hash table to the directory code
    in order to help simplify the way in which the hash table is
    accessed. This is intended to be a first step towards introducing
    some performance improvements in the directory code.

    There are two follow ups that I'm hoping to see fairly shortly. One
    is to simplify the hash table reading code now that we always read the
    complete hash table, whether we want one entry or all of them. The
    other is to introduce readahead on the heads of the hash chains
    which are referred to from the table.

    The hash table is a maximum of 128k in size, so it is not worth trying
    to read it in small chunks.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

09 May, 2011

2 commits

  • This adds an increment of the link count when we add a new directory
    entry, if that entry is itself a directory. This means that we no
    longer need separate code to perform this operation.

    Now that both adding and removing directory entries automatically
    update the parent directory's link count if required, that makes
    the code shorter and simpler than before.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • When we remove an entry from a directory, we can save ourselves
    some trouble if we know the type of the entry in question, since
    if it is itself a directory, we can update the link count of the
    parent at the same time as removing the directory entry.

    In addition this patch also merges the rmdir and unlink code which
    was almost identical anyway. This eliminates the calls to remove
    the . and .. directory entries on each rmdir (not needed since the
    directory will be deallocated, anyway) which was the only thing preventing
    passing the dentry to gfs2_dir_del(). The passing of the dentry
    rather than just the name allows us to figure out the type of the entry
    which is being removed, and thus adjust the link count when required.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

20 Apr, 2011

4 commits

  • The previous patches made function gfs2_dir_exhash_dealloc do nothing
    but call function foreach_leaf. This patch simplifies the code by
    moving the entire function foreach_leaf into gfs2_dir_exhash_dealloc.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • Function foreach_leaf used to look up the leaf block address and get
    a buffer_head. Then it would call leaf_dealloc which did the same
    lookup. This patch combines the two operations by making foreach_leaf
    pass the leaf bh to leaf_dealloc.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • At the end of function gfs2_dir_exhash_dealloc, it was setting the dinode
    type to "file" to prevent directory corruption in case of a crash.
    It was doing so in its own journal transaction. This patch makes the
    change occur when the last call is make to leaf_dealloc, since it needs
    to rewrite the directory dinode at that time anyway.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • Since foreach_leaf is only called with leaf_dealloc as its only possible
    call function, we can simplify the code by making it call leaf_dealloc
    directly. This simplifies the code and eliminates the need for
    leaf_call_t, the generic call method. This is a first small step in
    simplifying the directory leaf deallocation code.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

18 Apr, 2011

1 commit

  • This patch fixes a deadlock in GFS2 where two processes are trying
    to reclaim an unlinked dinode:
    One holds the inode glock and calls gfs2_lookup_by_inum trying to look
    up the inode, which it can't, due to I_FREEING. The other has set
    I_FREEING from vfs and is at the beginning of gfs2_delete_inode
    waiting for the glock, which is held by the first. The solution is to
    add a new non_block parameter to the gfs2_iget function that causes it
    to return -ENOENT if the inode is being freed.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

20 Sep, 2010

2 commits

  • Rather than calculating the qstrs for . and .. each time
    we need them, its better to keep a constant version of
    these and just refer to them when required.

    Signed-off-by: Steven Whitehouse
    Reviewed-by: Christoph Hellwig

    Steven Whitehouse
     
  • With the update of the truncate code, ip->i_disksize and
    inode->i_size are merely copies of each other. This means
    we can remove ip->i_disksize and use inode->i_size exclusively
    reducing the size of a GFS2 inode by 8 bytes.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

29 Jul, 2010

2 commits

  • The k[mc]allocs in dr_split_leaf() and dir_double_exhash() are failable,
    so remove __GFP_NOFAIL from their masks.

    Cc: Bob Peterson
    Signed-off-by: David Rientjes
    Signed-off-by: Steven Whitehouse

    David Rientjes
     
  • If we don't need a huge amount of memory in ->readdir() then
    we can use kmalloc rather than vmalloc to allocate it. This
    should cut down on the greater overheads associated with
    vmalloc for smaller directories.

    We may be able to eliminate vmalloc entirely at some stage,
    but this is easy to do right away.

    Also using GFP_NOFS to avoid any issues wrt to deleting inodes
    while under a glock, and suggestion from Linus to factor out
    the alloc/dealloc.

    I've given this a test with a variety of different sized
    directories and it seems to work ok.

    Cc: Andrew Morton
    Cc: Nick Piggin
    Cc: Prarit Bhargava
    Signed-off-by: Steven Whitehouse
    Signed-off-by: Linus Torvalds

    Steven Whitehouse
     

15 Jul, 2010

1 commit

  • This patch fixes a kernel Oops in the GFS2 rename code.

    The problem was in the way the gfs2 directory code was trying
    to re-use sentinel directory entries.

    In the failing case, gfs2's rename function was renaming a
    file to another name that had the same non-trivial length.
    The file being renamed happened to be the first directory
    entry on the leaf block.

    First, the rename code (gfs2_rename in ops_inode.c) found the
    original directory entry and decided it could do its job by
    simply replacing the directory entry with another. Therefore
    it determined correctly that no block allocations were needed.

    Next, the rename code deleted the old directory entry prior to
    replacing it with the new name. Therefore, the soon-to-be
    replaced directory entry was temporarily made into a directory
    entry "sentinel" or a place holder at the start of a leaf block.

    Lastly, it went to re-add the replacement directory entry in
    that leaf block. However, when gfs2_dirent_find_space was
    looking for space in the leaf block, it used the wrong value
    for the sentinel. That threw off its calculations so later
    it decides it can't really re-use the sentinel and therefore
    must allocate a new leaf block. But because it previously decided
    to re-use the directory entry, it didn't waste the time to
    grab a new block allocation for the inode. Therefore, the
    inode's i_alloc pointer was still NULL and it crashes trying to
    reference it.

    In the case of sentinel directory entries, the entire dirent is
    reused, not just the "free space" portion of it, and therefore
    the function gfs2_dirent_find_space should use the value 0
    rather than GFS2_DIRENT_SIZE(0) for the actual dirent size.

    Fixing this calculation enables the reproducer programs to work
    properly.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

14 Apr, 2010

1 commit

  • This patch fixes a couple gfs2 problems with the reclaiming of
    unlinked dinodes. First, there were a couple of livelocks where
    everything would come to a halt waiting for a glock that was
    seemingly held by a process that no longer existed. In fact, the
    process did exist, it just had the wrong pid number in the holder
    information. Second, there was a lock ordering problem between
    inode locking and glock locking. Third, glock/inode contention
    could sometimes cause inodes to be improperly marked invalid by
    iget_failed.

    Signed-off-by: Bob Peterson

    Bob Peterson
     

03 Dec, 2009

1 commit

  • This function only had one caller left, and that caller only
    called it for leaf blocks, hence one branch of the "if" was
    never taken. In addition the call to get_left had already
    verified the metadata type, so the function can be reduced
    to a single line of code in its caller.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

20 May, 2009

1 commit

  • This patch improves the error handling in the case where we
    discover that the summary information in the resource group
    doesn't match the bitmap information while in the process of
    allocating blocks. Originally this resulted in a kernel bug,
    but this patch changes that so that we return -EIO and print
    some messages explaining what went wrong, and how to fix it.

    We also remember locally not to try and allocate from the
    same rgrp again, so that a subsequent allocation in a
    different rgrp should succeed.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

24 Mar, 2009

1 commit

  • This is the big patch that I've been working on for some time
    now. There are many reasons for wanting to make this change
    such as:
    o Reducing overhead by eliminating duplicated fields between structures
    o Simplifcation of the code (reduces the code size by a fair bit)
    o The locking interface is now the DLM interface itself as proposed
    some time ago.
    o Fewer lookups of glocks when processing replies from the DLM
    o Fewer memory allocations/deallocations for each glock
    o Scope to do further optimisations in the future (but this patch is
    more than big enough for now!)

    Please note that (a) this patch relates to the lock_dlm module and
    not the DLM itself, that is still a separate module; and (b) that
    we retain the ability to build GFS2 as a standalone single node
    filesystem with out requiring the DLM.

    This patch needs a lot of testing, hence my keeping it I restarted
    my -git tree after the last merge window. That way, this has the maximum
    exposure before its merged. This is (modulo a few minor bug fixes) the
    same patch that I've been posting on and off the the last three months
    and its passed a number of different tests so far.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

05 Jan, 2009

3 commits


10 Apr, 2008

1 commit

  • There are several places where GFP_KERNEL allocations happen under a glock,
    which will result in hangs if we're under memory pressure and go to re-enter the
    fs in order to flush stuff out. This patch changes the culprits to GFS_NOFS to
    keep this problem from happening. Thank you,

    Signed-off-by: Josef Bacik
    Signed-off-by: Steven Whitehouse

    Josef Bacik
     

31 Mar, 2008

9 commits

  • gfs2_alloc_get may fail so we have to check it to prevent
    NULL pointer dereference.

    Signed-off-by: Cyrill Gorcunov
    Signed-off-by: Steven Whitehouse

    Cyrill Gorcunov
     
  • We've supported mapping of extents when no block allocation is required
    for some time. This patch extends that to mapping of extents when an
    allocation has been requested. In that case we try to allocate as many
    blocks as are requested, but we might return fewer in case there is
    something preventing us from returning the complete amount (e.g. an
    already allocated block is in the way).

    Currently the only code path which can actually request multiple data
    blocks in a single bmap call is the page_mkwrite path and even then it
    only happens if there are multiple blocks per page. What this patch does
    do however, is merge the allocation requests for metadata (growing the
    metadata tree in either height or depth) with the allocation of the data
    blocks in the case that both are needed. This results in lower overheads
    even in the single block allocation case.

    The one thing which we can't handle here at the moment is unstuffing. I
    would like to be able to do that, but the problem which arises is that
    in order to unstuff one has to get a locked page from the page cache
    which results in locking problems in the (usual) case that the caller is
    holding the page lock on the page it wishes to map. So that case will
    have to be addressed in future patches.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • replace all:
    big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) +
    expression_in_cpu_byteorder);
    with:
    beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder);
    generated with semantic patch

    Signed-off-by: Marcin Slusarz
    Signed-off-by: Steven Whitehouse

    Marcin Slusarz
     
  • The blocks counter is almost a duplicate of the i_blocks
    field in the VFS inode. The only difference is that i_blocks
    can be only 32bits long for 32bit arch without large single file
    support. Since GFS2 doesn't handle the non-large single file
    case (for 32 bit anyway) this adds a new config dependency on
    64BIT || LSF. This has always been the case, however we've never
    explicitly said so before.

    Even if we do add support for the non-LSF case, we will still
    not require this field to be duplicated since we will not be
    able to access oversized files anyway.

    So the net result of all this is that we shave 8 bytes from a gfs2_inode
    and get our config deps correct.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Rather than having to allocate a single block at a time, this patch
    allows the block allocator to allocate an extent. Since there is
    no difference (so far as the block allocator is concerned) between
    data blocks and indirect blocks, it is posible to allocate a single
    extent and for the caller to unrevoke just the blocks required
    for indirect blocks.

    Currently the only bit of GFS2 to make use of this feature is the
    build height function. The intention is that gfs2_block_map will
    be changed to make use of this feature in future patches.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Thanks to the preceeding patches, the only difference between
    these two functions is their name. We can thus merge them
    and call the new function gfs2_alloc_block to reflect the
    fact that it can allocate either kind of block.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • By adding an extra argument to gfs2_trans_add_unrevoke we can now
    specify an extent length of blocks to unrevoke. This means that
    we only need to make one pass through the list for each extent
    rather than each block. Currently the only extent length which
    is used is 1, but that will change in the future.

    Also gfs2_trans_add_unrevoke is removed from gfs2_alloc_meta
    since its the only difference between this and gfs2_alloc_data
    which is left. This will allow a future patch to merge these
    two functions into one (i.e. one call to allocate both data
    and metadata in a single extent in the future).

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch forms a pair with the previous patch which shrunk
    di_height. Like that patch di_depth is renamed i_depth and moved
    into struct gfs2_inode directly. Also the field goes from 16 bits
    to 8 bits since it is also limited to a max value which is rather
    small (17 in this case). In addition we also now validate the field
    against this maximum value when its read in.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch removed the unnecessary parameter from function
    gfs2_rlist_alloc. The parameter was always passed in as 0.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

08 Feb, 2008

1 commit