21 Oct, 2011

1 commit

  • The aim of this patch is to use the newly enhanced ->dirty_inode()
    super block operation to deal with atime updates, rather than
    piggy backing that code into ->write_inode() as is currently
    done.

    The net result is a simplification of the code in various places
    and a reduction of the number of gfs2_dinode_out() calls since
    this is now implied by ->dirty_inode().

    Some of the mark_inode_dirty() calls have been moved under glocks
    in order to take advantage of then being able to avoid locking in
    ->dirty_inode() when we already have suitable locks.

    One consequence is that generic_write_end() now correctly deals
    with file size updates, so that we do not need a separate check
    for that afterwards. This also, indirectly, means that fdatasync
    should work correctly on GFS2 - the current code always syncs the
    metadata whether it needs to or not.

    Has survived testing with postmark (with and without atime) and
    also fsx.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

20 Jul, 2011

1 commit


13 May, 2011

1 commit

  • This moves the symlink specific parts of inode creation
    into the function where we initialise the rest of the
    dinode. As a result we have one less place where we need
    to look up the inode's buffer.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

09 May, 2011

2 commits

  • This function was intended for debugging purposes, but it is not very
    useful. If we want to know what is on disk then all we need is a
    block number and gfs2_edit can give us much better information about
    what is there. Otherwise, if we are interested in what is stored in
    the in-core inode, it doesn't help us out there either.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This adds an increment of the link count when we add a new directory
    entry, if that entry is itself a directory. This means that we no
    longer need separate code to perform this operation.

    Now that both adding and removing directory entries automatically
    update the parent directory's link count if required, that makes
    the code shorter and simpler than before.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

20 Apr, 2011

2 commits

  • This patch adds writeback_control to writing back the AIL
    list. This means that we can then take advantage of the
    information we get in ->write_inode() in order to set off
    some pre-emptive writeback.

    In addition, the AIL code is cleaned up a bit to make it
    a bit simpler to understand.

    There is still more which can usefully be done in this area,
    but this is a good start at least.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The GLF_LRU flag introduced in the previous patch can be
    used to check if a glock is on the lru list when a new
    holder is queued and if so remove it, without having first
    to get the lru_lock.

    The main purpose of this patch however is to optimise the
    glocks left over when an inode at end of life is being
    evicted. Previously such glocks were left with the GLF_LFLUSH
    flag set, so that when reclaimed, each one required a log flush.
    This patch resets the GLF_LFLUSH flag when there is nothing
    left to flush thus preventing later log flushes as glocks are
    reused or demoted.

    In order to do this, we need to keep track of the number of
    revokes which are outstanding, and also to clear the GLF_LFLUSH
    bit after a log commit when only revokes have been processed.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

18 Apr, 2011

1 commit

  • This patch fixes a deadlock in GFS2 where two processes are trying
    to reclaim an unlinked dinode:
    One holds the inode glock and calls gfs2_lookup_by_inum trying to look
    up the inode, which it can't, due to I_FREEING. The other has set
    I_FREEING from vfs and is at the beginning of gfs2_delete_inode
    waiting for the glock, which is held by the first. The solution is to
    add a new non_block parameter to the gfs2_iget function that causes it
    to return -ENOENT if the inode is being freed.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

18 Jan, 2011

1 commit

  • In the (impossible, except if there is fs corruption) error path
    in gfs2_lookup_by_inum() if the call to gfs2_inode_refresh()
    fails, it was leaving the function by calling iput() rather
    than iget_failed(). This would cause future lookups of the same
    inode to block forever.

    This patch fixes the problem by moving the call to gfs2_inode_refresh()
    into gfs2_inode_lookup() where iget_failed() is part of the error path
    already. Also this cleans up some unreachable code and makes
    gfs2_set_iop() static.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

07 Jan, 2011

1 commit


15 Nov, 2010

1 commit

  • This area of the code has always been a bit delicate due to the
    subtleties of lock ordering. The problem is that for "normal"
    alloc/dealloc, we always grab the inode locks first and the rgrp lock
    later.

    In order to ensure no races in looking up the unlinked, but still
    allocated inodes, we need to hold the rgrp lock when we do the lookup,
    which means that we can't take the inode glock.

    The solution is to borrow the technique already used by NFS to solve
    what is essentially the same problem (given an inode number, look up
    the inode carefully, checking that it really is in the expected
    state).

    We cannot do that directly from the allocation code (lock ordering
    again) so we give the job to the pre-existing delete workqueue and
    carry on with the allocation as normal.

    If we find there is no space, we do a journal flush (required anyway
    if space from a deallocation is to be released) which should block
    against the pending deallocations, so we should always get the space
    back.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

20 Sep, 2010

2 commits

  • This patch adds support for fallocate to gfs2. Since the gfs2 does not support
    uninitialized data blocks, it must write out zeros to all the blocks. However,
    since it does not need to lock any pages to read from, gfs2 can write out the
    zero blocks much more efficiently. On a moderately full filesystem, fallocate
    works around 5 times faster on average. The fallocate call also allows gfs2 to
    add blocks to the file without changing the filesize, which will make it
    possible for gfs2 to preallocate space for the rindex file, so that gfs2 can
    grow a completely full filesystem.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • With the update of the truncate code, ip->i_disksize and
    inode->i_size are merely copies of each other. This means
    we can remove ip->i_disksize and use inode->i_size exclusively
    reducing the size of a GFS2 inode by 8 bytes.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

21 May, 2010

1 commit

  • The previous patch I wrote for reclaiming unlinked dinodes
    had some shortcomings and did not prevent all hangs.
    This version is much cleaner and more logical, and has
    passed very difficult testing. Sorry for the churn.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

14 Apr, 2010

1 commit

  • This patch fixes a couple gfs2 problems with the reclaiming of
    unlinked dinodes. First, there were a couple of livelocks where
    everything would come to a halt waiting for a glock that was
    seemingly held by a process that no longer existed. In fact, the
    process did exist, it just had the wrong pid number in the holder
    information. Second, there was a lock ordering problem between
    inode locking and glock locking. Third, glock/inode contention
    could sometimes cause inodes to be improperly marked invalid by
    iget_failed.

    Signed-off-by: Bob Peterson

    Bob Peterson
     

22 May, 2009

4 commits


15 Apr, 2009

1 commit


24 Mar, 2009

1 commit

  • This is the big patch that I've been working on for some time
    now. There are many reasons for wanting to make this change
    such as:
    o Reducing overhead by eliminating duplicated fields between structures
    o Simplifcation of the code (reduces the code size by a fair bit)
    o The locking interface is now the DLM interface itself as proposed
    some time ago.
    o Fewer lookups of glocks when processing replies from the DLM
    o Fewer memory allocations/deallocations for each glock
    o Scope to do further optimisations in the future (but this patch is
    more than big enough for now!)

    Please note that (a) this patch relates to the lock_dlm module and
    not the DLM itself, that is still a separate module; and (b) that
    we retain the ability to build GFS2 as a standalone single node
    filesystem with out requiring the DLM.

    This patch needs a lot of testing, hence my keeping it I restarted
    my -git tree after the last merge window. That way, this has the maximum
    exposure before its merged. This is (modulo a few minor bug fixes) the
    same patch that I've been posting on and off the the last three months
    and its passed a number of different tests so far.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

05 Jan, 2009

2 commits

  • The final field in gfs2_dinode_host was the i_flags field. Thats
    renamed to i_diskflags in order to avoid confusion with the existing
    inode flags, and moved into the inode proper at a suitable location
    to avoid creating a "hole".

    At that point struct gfs2_dinode_host is no longer needed and as
    promised (quite some time ago!) it can now be removed completely.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Move the contents of some headers which contained very
    little into more sensible places, and remove the original
    header files. This should make it easier to find things.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

18 Sep, 2008

1 commit

  • Until now, we've used the same scheme as GFS1 for atime. This has failed
    since atime is a per vfsmnt flag, not a per fs flag and as such the
    "noatime" flag was not getting passed down to the filesystems. This
    patch removes all the "special casing" around atime updates and we
    simply use the VFS's atime code.

    The net result is that GFS2 will now support all the same atime related
    mount options of any other filesystem on a per-vfsmnt basis. We do lose
    the "lazy atime" updates, but we gain "relatime". We could add lazy
    atime to the VFS at a later date, if there is a requirement for that
    variant still - I suspect relatime will be enough.

    Also we lose about 100 lines of code after this patch has been applied,
    and I have a suspicion that it will speed things up a bit, even when
    atime is "on". So it seems like a nice clean up as well.

    From a user perspective, everything stays the same except the loss of
    the per-fs atime quantum tweekable (ought to be per-vfsmnt at the very
    least, and to be honest I don't think anybody ever used it) and that a
    number of options which were ignored before now work correctly.

    Please let me know if you've got any comments. I'm pushing this out
    early so that you can all see what my plans are.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

27 Aug, 2008

1 commit

  • This patch fixes a locking issue in the rename code by ensuring that we hold
    the per sb rename lock over both directory and "other" renames which involve
    different parent directories.

    At the same time, this moved the (only called from one place) function
    gfs2_ok_to_move into the file that its called from, so we can mark it
    static. This should make a code a bit easier to follow.

    Signed-off-by: Steven Whitehouse
    Cc: Peter Staubach

    Steven Whitehouse
     

27 Jul, 2008

1 commit


10 Jul, 2008

1 commit


03 Jul, 2008

1 commit

  • GFS2 calls permission() to verify permissions after locks on the files
    have been taken.

    For this it's sufficient to call gfs2_permission() instead. This
    results in the following changes:

    - IS_RDONLY() check is not performed
    - IS_IMMUTABLE() check is not performed
    - devcgroup_inode_permission() is not called
    - security_inode_permission() is not called

    IS_RDONLY() should be unnecessary anyway, as the per-mount read-only
    flag should provide protection against read-only remounts during
    operations. do_gfs2_set_flags() has been fixed to perform
    mnt_want_write()/mnt_drop_write() to protect against remounting
    read-only.

    IS_IMMUTABLE has been added to gfs2_permission()

    Repeating the security checks seems to be pointless, as they don't
    normally change, and if they do, it's independent of the filesystem
    state.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Steven Whitehouse

    Miklos Szeredi
     

31 Mar, 2008

2 commits

  • The blocks counter is almost a duplicate of the i_blocks
    field in the VFS inode. The only difference is that i_blocks
    can be only 32bits long for 32bit arch without large single file
    support. Since GFS2 doesn't handle the non-large single file
    case (for 32 bit anyway) this adds a new config dependency on
    64BIT || LSF. This has always been the case, however we've never
    explicitly said so before.

    Even if we do add support for the non-LSF case, we will still
    not require this field to be duplicated since we will not be
    able to access oversized files anyway.

    So the net result of all this is that we shave 8 bytes from a gfs2_inode
    and get our config deps correct.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch improves the calculation of the tree height in order to reduce
    the number of operations which are carried out on each call to gfs2_block_map.
    In the common case, we now make a single comparison, rather than calculating
    the required tree height from scratch each time. Also in the case that the
    tree does need some extra height, we start from the current height rather from
    zero when we work out what the new height ought to be.

    In addition the di_height field is moved into the inode proper and reduced
    in size to a u8 since the value must be between 0 and GFS2_MAX_META_HEIGHT (10).

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

25 Jan, 2008

2 commits

  • Just like ext3 we now have three sets of address space operations
    to cover the cases of writeback, ordered and journalled data
    writes. This means that the individual operations can now become
    less complicated as we are able to remove some of the tests for
    file data mode from the code.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This adds a function "gfs2_is_writeback()" along the lines of the
    existing "gfs2_is_jdata()" in order to clean up the code and make
    the various tests for the inode mode more obvious. It also fixes
    the PageChecked() logic where we were resetting the flag too early
    in the case of an error path.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

10 Oct, 2007

1 commit

  • There is a possible deadlock between two processes on the same node, where one
    process is deleting an inode, and another process is looking for allocated but
    unused inodes to delete in order to create more space.

    process A does an iput() on inode X, and it's i_count drops to 0. This causes
    iput_final() to be called, which puts an inode into state I_FREEING at
    generic_delete_inode(). There no point between when iput_final() is called, and
    when I_FREEING is set where GFS2 could acquire any glocks. Once I_FREEING is
    set, no other process on that node can successfully look up that inode until
    the delete finishes.

    process B locks the the resource group for the same inode in get_local_rgrp(),
    which is called by gfs2_inplace_reserve_i()

    process A tries to lock the resource group for the inode in
    gfs2_dinode_dealloc(), but it's already locked by process B

    process B waits in find_inode for the inode to have the I_FREEING state cleared.

    Deadlock.

    This patch solves the problem by adding an alternative to gfs2_iget(),
    gfs2_iget_skip(), that simply skips any inodes that are in the I_FREEING
    state.o The alternate test function is just like the original one, except that
    it fails if the inode is being freed, and sets a skipped flag. The alternate
    set function is just like the original, except that it fails if the skipped
    flag is set. Only try_rgrp_unlink() calls gfs2_iget_skip() instead of
    gfs2_iget().

    Signed-off-by: Benjamin E. Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

09 Jul, 2007

4 commits

  • GFS2 has been passing i_mode within NFS File Handle. Other than the
    wrong assumption that there is always room for this extra 16 bit value,
    the current gfs2_get_dentry doesn't really need the i_mode to work
    correctly. Note that GFS2 NFS code does go thru the same lookup code
    path as direct file access route (where the mode is obtained from name
    lookup) but gfs2_get_dentry() is coded for different purpose. It is not
    used during lookup time. It is part of the file access procedure call.
    When the call is invoked, if on-disk inode is not in-memory, it has to
    be read-in. This makes i_mode passing a useless overhead.

    Signed-off-by: S. Wendy Cheng
    Signed-off-by: Steven Whitehouse

    Wendy Cheng
     
  • GFS2 lookup code doesn't ask for inode shared glock. This implies during
    in-memory inode creation for existing file, GFS2 will not disk-read in
    the inode contents. This leaves no_formal_ino un-initialized during
    lookup time. The un-initialized no_formal_ino is subsequently encoded
    into file handle. Clients will get ESTALE error whenever it tries to
    access these files.

    Signed-off-by: S. Wendy Cheng
    Signed-off-by: Steven Whitehouse

    Wendy Cheng
     
  • This patch fixes some sign issues which were accidentally introduced
    into the quota & statfs code during the endianess annotation process.
    Also included is a general clean up which moves all of the _host
    structures out of gfs2_ondisk.h (where they should not have been to
    start with) and into the places where they are actually used (often only
    one place). Also those _host structures which are not required any more
    are removed entirely (which is the eventual plan for all of them).

    The conversion routines from ondisk.c are also moved into the places
    where they are actually used, which for almost every one, was just one
    single place, so all those are now static functions. This also cleans up
    the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__.

    The net result is a reduction of about 100 lines of code, many functions
    now marked static plus the bug fixes as mentioned above. For good
    measure I ran the code through sparse after making these changes to
    check that there are no warnings generated.

    This fixes Red Hat bz #239686

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch cleans up the inode number handling code. The main difference
    is that instead of looking up the inodes using a struct gfs2_inum_host
    we now use just the no_addr member of this structure. The tests relating
    to no_formal_ino can then be done by the calling code. This has
    advantages in that we want to do different things in different code
    paths if the no_formal_ino doesn't match. In the NFS patch we want to
    return -ESTALE, but in the ->lookup() path, its a bug in the fs if the
    no_formal_ino doesn't match and thus we can withdraw in this case.

    In order to later fix bz #201012, we need to be able to look up an inode
    without knowing no_formal_ino, as the only information that is known to
    us is the on-disk location of the inode in question.

    This patch will also help us to fix bz #236099 at a later date by
    cleaning up a lot of the code in that area.

    There are no user visible changes as a result of this patch and there
    are no changes to the on-disk format either.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

30 Nov, 2006

3 commits

  • The gfs2_glock_nq_m_atime function is unused in so far as its only
    ever called with num_gh = 1, and this falls through to the
    gfs2_glock_nq_atime function, so we might as well call that directly.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This function wasn't really doing the right thing. There was no need
    to update the inode size at this point and the updating of the
    i_blocks field has now been moved to the places where di_blocks is
    updated. A result of this patch and some those preceeding it is that
    unlocking a glock is now a much more efficient process, since there
    is no longer any requirement to copy data from the gfs2 inode into
    the vfs inode at this point.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Remove the di_[amc]time fields and use inode->i_[amc]time
    fields instead. This saves 24 bytes from the gfs2_inode.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse