10 Oct, 2007

2 commits

  • There is a possible deadlock between two processes on the same node, where one
    process is deleting an inode, and another process is looking for allocated but
    unused inodes to delete in order to create more space.

    process A does an iput() on inode X, and it's i_count drops to 0. This causes
    iput_final() to be called, which puts an inode into state I_FREEING at
    generic_delete_inode(). There no point between when iput_final() is called, and
    when I_FREEING is set where GFS2 could acquire any glocks. Once I_FREEING is
    set, no other process on that node can successfully look up that inode until
    the delete finishes.

    process B locks the the resource group for the same inode in get_local_rgrp(),
    which is called by gfs2_inplace_reserve_i()

    process A tries to lock the resource group for the inode in
    gfs2_dinode_dealloc(), but it's already locked by process B

    process B waits in find_inode for the inode to have the I_FREEING state cleared.

    Deadlock.

    This patch solves the problem by adding an alternative to gfs2_iget(),
    gfs2_iget_skip(), that simply skips any inodes that are in the I_FREEING
    state.o The alternate test function is just like the original one, except that
    it fails if the inode is being freed, and sets a skipped flag. The alternate
    set function is just like the original, except that it fails if the skipped
    flag is set. Only try_rgrp_unlink() calls gfs2_iget_skip() instead of
    gfs2_iget().

    Signed-off-by: Benjamin E. Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • Fix a nasty inode meta data corruption issue by keeping the buffer head in
    icache array. This buffer needs to stay in memory until journal flush occurs
    Otherwise, gfs2_meta_inode_buffer could do a disk read before the inode hits
    disk. It ends up with meta data corruptions. The buffer will be released as
    part of the existing journal flush logic.

    Signed-off-by: S. Wendy Cheng
    Signed-off-by: Steven Whitehouse

    Wendy Cheng
     

09 Jul, 2007

9 commits

  • GFS2 has been passing i_mode within NFS File Handle. Other than the
    wrong assumption that there is always room for this extra 16 bit value,
    the current gfs2_get_dentry doesn't really need the i_mode to work
    correctly. Note that GFS2 NFS code does go thru the same lookup code
    path as direct file access route (where the mode is obtained from name
    lookup) but gfs2_get_dentry() is coded for different purpose. It is not
    used during lookup time. It is part of the file access procedure call.
    When the call is invoked, if on-disk inode is not in-memory, it has to
    be read-in. This makes i_mode passing a useless overhead.

    Signed-off-by: S. Wendy Cheng
    Signed-off-by: Steven Whitehouse

    Wendy Cheng
     
  • GFS2 lookup code doesn't ask for inode shared glock. This implies during
    in-memory inode creation for existing file, GFS2 will not disk-read in
    the inode contents. This leaves no_formal_ino un-initialized during
    lookup time. The un-initialized no_formal_ino is subsequently encoded
    into file handle. Clients will get ESTALE error whenever it tries to
    access these files.

    Signed-off-by: S. Wendy Cheng
    Signed-off-by: Steven Whitehouse

    Wendy Cheng
     
  • There were two issues during deallocation of unlinked inodes. The
    first was relating to the use of a "try" lock which in the case of
    the inode lock wasn't trying hard enough to deallocate in all
    circumstances (now changed to a normal glock) and in the case of
    the iopen lock didn't wait for the demotion of the shared lock before
    attempting to get the exclusive lock, and thereby sometimes (timing dependent)
    not completing the deallocation when it should have done.

    The second issue related to the lack of a way to invalidate dcache entries
    on remote nodes (now fixed by this patch) which meant that unlinks were
    taking a long time to return disk space to the fs. By adding some code to
    invalidate the dcache entries across the cluster for unlinked inodes, that
    is now fixed.

    This patch was written jointly by Abhijith Das and Steven Whitehouse.

    Signed-off-by: Abhijith Das
    Signed-off-by: Steven Whitehouse

    Abhijith Das
     
  • fs/gfs2/inode.c: In function 'gfs2_lookupi':
    fs/gfs2/inode.c:392: warning: 'error' may be used uninitialized in this function

    Looks like a real bug to me.

    Cc: Steven Whitehouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Steven Whitehouse

    akpm@linux-foundation.org
     
  • Under certain circumstances its possible (though rather unlikely) that
    inodes which were unlinked by one node while still open on another might
    get "lost" in the sense that they don't get deallocated if the node
    which held the inode open crashed before it was unlinked.

    This patch adds the recovery code which allows automatic deallocation of
    the inode if its found during block allocation (the sensible time to
    look for such inodes since we are scanning the rgrp's bitmaps anyway at
    this time, so it adds no overhead to do this).

    Since the inode will have had its i_nlink set to zero, all we need to
    trigger recovery is a lookup and an iput(), and the normal deallocation
    code takes care of the rest.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This fixes a bug in the ordering of operations in the error path of
    createi. Its not valid to do an iput() when holding the inode's glock
    since the iput() will (in this case) result in delete_inode() being
    called which needs to grab the lock itself. This was causing the
    recursive lock checking code to trigger.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This adds a nanosecond timestamp feature to the GFS2 filesystem. Due
    to the way that the on-disk format works, older filesystems will just
    appear to have this field set to zero. When mounted by an older version
    of GFS2, the filesystem will simply ignore the extra fields so that
    it will again appear to have whole second resolution, so that its
    trivially backward compatible.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch fixes some sign issues which were accidentally introduced
    into the quota & statfs code during the endianess annotation process.
    Also included is a general clean up which moves all of the _host
    structures out of gfs2_ondisk.h (where they should not have been to
    start with) and into the places where they are actually used (often only
    one place). Also those _host structures which are not required any more
    are removed entirely (which is the eventual plan for all of them).

    The conversion routines from ondisk.c are also moved into the places
    where they are actually used, which for almost every one, was just one
    single place, so all those are now static functions. This also cleans up
    the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__.

    The net result is a reduction of about 100 lines of code, many functions
    now marked static plus the bug fixes as mentioned above. For good
    measure I ran the code through sparse after making these changes to
    check that there are no warnings generated.

    This fixes Red Hat bz #239686

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch cleans up the inode number handling code. The main difference
    is that instead of looking up the inodes using a struct gfs2_inum_host
    we now use just the no_addr member of this structure. The tests relating
    to no_formal_ino can then be done by the calling code. This has
    advantages in that we want to do different things in different code
    paths if the no_formal_ino doesn't match. In the NFS patch we want to
    return -ESTALE, but in the ->lookup() path, its a bug in the fs if the
    no_formal_ino doesn't match and thus we can withdraw in this case.

    In order to later fix bz #201012, we need to be able to look up an inode
    without knowing no_formal_ino, as the only information that is known to
    us is the on-disk location of the inode in question.

    This patch will also help us to fix bz #236099 at a later date by
    cleaning up a lot of the code in that area.

    There are no user visible changes as a result of this patch and there
    are no changes to the on-disk format either.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

08 Mar, 2007

2 commits

  • The following patch fixes Red Hat bz 229831. Without this patch its
    possible for the wrong inode to be returned in certain cases. It is a
    pretty unusual event, so that its taken some time to track down. Thanks
    and due to Josef Whiter who did a lot of the testing required to thrack
    this down and fix it.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • ok, the following is the minimum changes to get NFSD going before we
    settle down this issue .. would appreciate this in the tree so other NFS
    related works can get done in parallel.

    Signed-off-by: S. Wendy Cheng
    Signed-off-by: Steven Whitehouse

    Wendy Cheng
     

06 Feb, 2007

7 commits

  • Move the glock acquisition to outside of the transactions.

    Lock odering must be preserved in order to prevent ABBA
    deadlocks. The current gfs2_change_nlink code would tries
    to grab the glock after having started a transaction and thus is holding
    the log lock. This is inconsistent with other code paths in
    gfs that grab the resource group glock prior to staring
    a tranactions.

    One problem with this fix is that the resource group
    lock is always grabbed now even if the inode still has
    ref count and can not be marked for unlink.

    Signed-off-by: Russell Cattelan
    Signed-off-by: Steven Whitehouse

    Russell Cattelan
     
  • In certain cases, its possible for NFS to call the lookup code while
    holding the glock (when doing a readdirplus operation) so we need to
    check for that and not try and lock the glock twice. This also fixes a
    typo in a previous NFS related GFS2 patch.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • I was looking something else up and came across this...

    I don't honestly have a good reason to change it other than to make it
    like every other Linux filesystem in this regard. ;-) It doesn't
    functionally change anything, but makes some lines shorter. :)

    I'm also curious; why does gfs2 have 64-bits of on-disk timestamps, but
    not in timespec_t format, and only stores second resolutions? Seems like
    you're halfway to sub-second resolutions already.

    I suppose if that gets implemented then all of the below should
    instead be CURRENT_TIME not CURRENT_TIME_SEC.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Steven Whitehouse

    Eric Sandeen
     
  • On Thu, Jan 11, 2007 at 10:26:27PM -0800, Andrew Morton wrote:
    >...
    > Changes since 2.6.20-rc3-mm1:
    >...
    > git-gfs2-nmw.patch
    >...
    > git trees
    >...

    This patch makes the needlessly globlal gfs2_change_nlink_i() static.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Steven Whitehouse

    Adrian Bunk
     
  • Second round of gfs2_rename lock re-ordering to allow Anaconda adding
    root partition on top of gfs2. Previous to this patch the recursive
    lock detector in glock.c can be triggered due to attempting to lock
    the rgrp twice. This fixes it by checking to see whether the rgrp
    is already locked.

    This fixes Red Hat bugzilla #221237

    Signed-off-by: S. Wendy Cheng
    Signed-off-by: Steven Whitehouse

    S. Wendy Cheng
     
  • Update the quilt header comments to match the
    code changes.

    Change gfs2_lookup_simple to return an error in the case
    of a NULL inode.
    The callers of gfs2_lookup_simple do not check for NULL
    in the no entry case and such would end up dereferencing a NULL ptr.

    This fixes:
    http://projects.info-pull.com/mokb/MOKB-15-11-2006.html

    Signed-off-by: Russell Cattelan
    Signed-off-by: Steven Whitehouse

    Russell Cattelan
     
  • Bugzilla 215088

    Fix deadlock in gfs2_change_nlink() while installing RHEL5 into GFS2
    partition. The gfs2_rename() apparently needs block allocation for the
    new name (into the directory) where it requires rg locks. At the same
    time, while updating the nlink count for the replaced file,
    gfs2_change_nlink() tries to return the inode meta-data back to resource
    group where it needs rg locks too. Our logic doesn't allow process to
    acquire these locks recursively by the same process (RHEL installer)
    that results a BUG call. This only happens within rename code path and
    only if the destination file exists before the rename operation.

    Signed-off-by: S. Wendy Cheng
    Signed-off-by: Steven Whitehouse

    S. Wendy Cheng
     

30 Nov, 2006

20 commits