10 Oct, 2007

2 commits

  • There is a possible deadlock between two processes on the same node, where one
    process is deleting an inode, and another process is looking for allocated but
    unused inodes to delete in order to create more space.

    process A does an iput() on inode X, and it's i_count drops to 0. This causes
    iput_final() to be called, which puts an inode into state I_FREEING at
    generic_delete_inode(). There no point between when iput_final() is called, and
    when I_FREEING is set where GFS2 could acquire any glocks. Once I_FREEING is
    set, no other process on that node can successfully look up that inode until
    the delete finishes.

    process B locks the the resource group for the same inode in get_local_rgrp(),
    which is called by gfs2_inplace_reserve_i()

    process A tries to lock the resource group for the inode in
    gfs2_dinode_dealloc(), but it's already locked by process B

    process B waits in find_inode for the inode to have the I_FREEING state cleared.

    Deadlock.

    This patch solves the problem by adding an alternative to gfs2_iget(),
    gfs2_iget_skip(), that simply skips any inodes that are in the I_FREEING
    state.o The alternate test function is just like the original one, except that
    it fails if the inode is being freed, and sets a skipped flag. The alternate
    set function is just like the original, except that it fails if the skipped
    flag is set. Only try_rgrp_unlink() calls gfs2_iget_skip() instead of
    gfs2_iget().

    Signed-off-by: Benjamin E. Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • This was missing from the dir_split_leaf() function although in
    most cases its not a problem due to other functions having
    already previously called gfs2_trans_add_bh. This makes certain
    that it is correct.

    Signed-off-by: Steven Whitehouse
    Cc: Wendy Cheng

    Steven Whitehouse
     

09 Jul, 2007

4 commits

  • GFS2 lookup code doesn't ask for inode shared glock. This implies during
    in-memory inode creation for existing file, GFS2 will not disk-read in
    the inode contents. This leaves no_formal_ino un-initialized during
    lookup time. The un-initialized no_formal_ino is subsequently encoded
    into file handle. Clients will get ESTALE error whenever it tries to
    access these files.

    Signed-off-by: S. Wendy Cheng
    Signed-off-by: Steven Whitehouse

    Wendy Cheng
     
  • This adds a nanosecond timestamp feature to the GFS2 filesystem. Due
    to the way that the on-disk format works, older filesystems will just
    appear to have this field set to zero. When mounted by an older version
    of GFS2, the filesystem will simply ignore the extra fields so that
    it will again appear to have whole second resolution, so that its
    trivially backward compatible.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch fixes some sign issues which were accidentally introduced
    into the quota & statfs code during the endianess annotation process.
    Also included is a general clean up which moves all of the _host
    structures out of gfs2_ondisk.h (where they should not have been to
    start with) and into the places where they are actually used (often only
    one place). Also those _host structures which are not required any more
    are removed entirely (which is the eventual plan for all of them).

    The conversion routines from ondisk.c are also moved into the places
    where they are actually used, which for almost every one, was just one
    single place, so all those are now static functions. This also cleans up
    the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__.

    The net result is a reduction of about 100 lines of code, many functions
    now marked static plus the bug fixes as mentioned above. For good
    measure I ran the code through sparse after making these changes to
    check that there are no warnings generated.

    This fixes Red Hat bz #239686

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch cleans up the inode number handling code. The main difference
    is that instead of looking up the inodes using a struct gfs2_inum_host
    we now use just the no_addr member of this structure. The tests relating
    to no_formal_ino can then be done by the calling code. This has
    advantages in that we want to do different things in different code
    paths if the no_formal_ino doesn't match. In the NFS patch we want to
    return -ESTALE, but in the ->lookup() path, its a bug in the fs if the
    no_formal_ino doesn't match and thus we can withdraw in this case.

    In order to later fix bz #201012, we need to be able to look up an inode
    without knowing no_formal_ino, as the only information that is known to
    us is the on-disk location of the inode in question.

    This patch will also help us to fix bz #236099 at a later date by
    cleaning up a lot of the code in that area.

    There are no user visible changes as a result of this patch and there
    are no changes to the on-disk format either.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

01 May, 2007

2 commits

  • alpha:

    fs/gfs2/dir.c: In function 'gfs2_dir_read_leaf':
    fs/gfs2/dir.c:1322: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'sector_t'
    fs/gfs2/dir.c: In function 'gfs2_dir_read':
    fs/gfs2/dir.c:1455: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type '__u64'

    Cc: Steven Whitehouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Steven Whitehouse

    akpm@linux-foundation.org
     
  • This patch detects when the number of entries in a leaf block or inode
    block (in the case of stuffed directories) is corrupt and informs the
    user. It prevents us from running off the end of the array thats been
    allocated for the sorting in this case,

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

15 Feb, 2007

1 commit

  • After Al Viro (finally) succeeded in removing the sched.h #include in module.h
    recently, it makes sense again to remove other superfluous sched.h includes.
    There are quite a lot of files which include it but don't actually need
    anything defined in there. Presumably these includes were once needed for
    macros that used to live in sched.h, but moved to other header files in the
    course of cleaning it up.

    To ease the pain, this time I did not fiddle with any header files and only
    removed #includes from .c-files, which tend to cause less trouble.

    Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
    arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
    allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
    configs in arch/arm/configs on arm. I also checked that no new warnings were
    introduced by the patch (actually, some warnings are removed that were emitted
    by unnecessarily included header files).

    Signed-off-by: Tim Schmielau
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

06 Feb, 2007

2 commits

  • I was looking something else up and came across this...

    I don't honestly have a good reason to change it other than to make it
    like every other Linux filesystem in this regard. ;-) It doesn't
    functionally change anything, but makes some lines shorter. :)

    I'm also curious; why does gfs2 have 64-bits of on-disk timestamps, but
    not in timespec_t format, and only stores second resolutions? Seems like
    you're halfway to sub-second resolutions already.

    I suppose if that gets implemented then all of the below should
    instead be CURRENT_TIME not CURRENT_TIME_SEC.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Steven Whitehouse

    Eric Sandeen
     
  • This removes the extra filldir callback which gfs2 was using to
    enclose an attempt at readahead for inodes during readdir. The
    code was too complicated and also hurts performance badly in the
    case that the getdents64/readdir call isn't being followed by
    stat() and it wasn't even getting it right all the time when it
    was.

    As a result, on my test box an "ls" of a directory containing 250000
    files fell from about 7mins (freshly mounted, so nothing cached) to
    between about 15 to 25 seconds. When the directory content was cached,
    the time taken fell from about 3mins to about 4 or 5 seconds.

    Interestingly in the cached case, running "ls -l" once reduced the time
    taken for subsequent runs of "ls" to about 6 secs even without this
    patch. Now it turns out that there was a special case of glocks being
    used for prefetching the metadata, but because of the timeouts for these
    locks (set to 10 secs) the metadata was being timed out before it was
    being used and this the prefetch code was constantly trying to prefetch
    the same data over and over.

    Calling "ls -l" meant that the inodes were brought into memory and once
    the inodes are cached, the glocks are not disposed of until the inodes
    are pushed out of the cache, thus extending the lifetime of the glocks,
    and thus bringing down the time for subsequent runs of "ls"
    considerably.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

30 Nov, 2006

7 commits

  • When deleting directory entries, we set the inum.no_addr to zero
    in a dirent when its the first dirent in a block and thus cannot
    be merged into the previous dirent as is the usual case. In gfs1,
    inum.no_formal_ino was used instead.

    This patch changes gfs2 to set both inum.no_addr and inum.no_formal_ino
    to zero. It also changes the test from just looking at inum.no_addr to
    look at both inum.no_addr and inum.no_formal_ino and a sentinel is
    now considered to be a dirent in which _either_ (or both) of them
    is set to zero.

    This resolves Red Hat bugzillas: #215809, #211465

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This function wasn't really doing the right thing. There was no need
    to update the inode size at this point and the updating of the
    i_blocks field has now been moved to the places where di_blocks is
    updated. A result of this patch and some those preceeding it is that
    unlocking a glock is now a much more efficient process, since there
    is no longer any requirement to copy data from the gfs2 inode into
    the vfs inode at this point.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This is almost never used. Its there for backward
    compatibility with GFS1. It doesn't need its own
    field since it can always be calculated from the
    inode mode & flags. This saves a bit more space
    in the gfs2_inode.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Remove the di_[amc]time fields and use inode->i_[amc]time
    fields instead. This saves 24 bytes from the gfs2_inode.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Everywhere this was called, a struct gfs2_inode was available,
    but despite that, it was always called with a struct gfs2_dinode
    as an argument. By making this change it paves the way to start
    eliminating fields duplicated between the kernel's struct inode
    and the struct gfs2_dinode.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Signed-off-by: Al Viro
    Signed-off-by: Steven Whitehouse

    Al Viro
     
  • Signed-off-by: Al Viro
    Signed-off-by: Steven Whitehouse

    Al Viro
     

20 Oct, 2006

4 commits


25 Sep, 2006

1 commit


22 Sep, 2006

1 commit

  • Fix a bug in the directory reading code, where we might have dereferenced
    a NULL pointer in case of OOM. Updated the directory code to use the new
    & improved version of gfs2_meta_ra() which now returns the first block
    that was being read. Previously it was releasing it requiring following
    code to grab the block again at each point it was called.

    Also turned off readahead on directory lookups since we are reading a
    hash table, and therefore reading the entries in order is very
    unlikely. Readahead is still used for all other calls to the
    directory reading function (e.g. when growing the hash table).

    Removed the DIO_START constant. Everywhere this was used, it was
    used to unconditionally start i/o aside from a couple of places, so
    I've removed it and made the couple of exceptions to this rule into
    separate functions.

    Also hunted through the other DIO flags and removed them as arguments
    from functions which were always called with the same combination of
    arguments.

    Updated gfs2_meta_indirect_buffer to be a bit more efficient and
    hopefully also be a bit easier to read.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

19 Sep, 2006

1 commit

  • lm_interface.h has a few out of the tree clients such as GFS1
    and userland tools.

    Right now, these clients keeps a copy of the file in their build tree
    that can go out of sync.

    Move lm_interface.h to include/linux, export it to userland and
    clean up fs/gfs2 to use the new location.

    Signed-off-by: Fabio M. Di Nitto
    Signed-off-by: Steven Whitehouse

    Fabio Massimo Di Nitto
     

07 Sep, 2006

1 commit


05 Sep, 2006

3 commits


01 Sep, 2006

1 commit

  • As per comments from Jan Engelhardt this
    updates the copyright message to say "version" in full rather than
    "v.2". Also incore.h has been updated to remove forward structure
    declarations which are not required.

    The gfs2_quota_lvb structure has now had endianess annotations added
    to it. Also quota.c has been updated so that we now store the
    lvb data locally in endian independant format to avoid needing
    a structure in host endianess too. As a result the endianess
    conversions are done as required at various points and thus the
    conversion routines in lvb.[ch] are no longer required. I've
    moved the one remaining constant in lvb.h thats used into lm.h
    and removed the unused lvb.[ch].

    I have not changed the HIF_ constants. That is left to a later patch
    which I hope will unify the gh_flags and gh_iflags fields of the
    struct gfs2_holder.

    Cc: Jan Engelhardt
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

02 Aug, 2006

1 commit

  • This was a nasty bug which resulted in corruption of hash tables
    in the directory code with larger directories. We forgot to
    increment a pointer in the read/write routines internal to the
    directory code.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

26 Jul, 2006

1 commit

  • Tidy up gfs2_unstuffer_page by:

    a) Moving it into bmap.c
    b) Making it static
    c) Calling it directly from gfs2_unstuff_dinode
    d) Updating all callers of gfs2_unstuff_dinode due to one less
    required argument.

    It doesn't change the behaviour at all.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

17 Jul, 2006

1 commit


12 Jul, 2006

1 commit


10 Jul, 2006

1 commit


15 Jun, 2006

1 commit

  • This patch fixes the way we have been dealing with unlinked,
    but still open files. It removes all limits (other than memory
    for inodes, as per every other filesystem) on numbers of these
    which we can support on GFS2. It also means that (like other
    fs) its the responsibility of the last process to close the file
    to deallocate the storage, rather than the person who did the
    unlinking. Note that with GFS2, those two events might take place
    on different nodes.

    Also there are a number of other changes:

    o We use the Linux inode subsystem as it was intended to be
    used, wrt allocating GFS2 inodes
    o The Linux inode cache is now the point which we use for
    local enforcement of only holding one copy of the inode in
    core at once (previous to this we used the glock layer).
    o We no longer use the unlinked "special" file. We just ignore it
    completely. This makes unlinking more efficient.
    o We now use the 4th block allocation state. The previously unused
    state is used to track unlinked but still open inodes.
    o gfs2_inoded is no longer needed
    o Several fields are now no longer needed (and removed) from the in
    core struct gfs2_inode
    o Several fields are no longer needed (and removed) from the in core
    superblock

    There are a number of future possible optimisations and clean ups
    which have been made possible by this patch.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

19 May, 2006

1 commit


06 May, 2006

1 commit

  • This adds readpages support (and also corrects a small bug in
    the readpage error path at the same time). Hopefully this will
    improve performance by allowing GFS to submit larger lumps of
    I/O at a time.

    In order to simplify the setting of BH_Boundary, it currently gets
    set when we hit the end of a indirect pointer block. There is
    always a boundary at this point with the current allocation code.
    It doesn't get all the boundaries right though, so there is still
    room for improvement in this.

    See comments in fs/gfs2/ops_address.c for further information about
    readpages with GFS2.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

28 Apr, 2006

1 commit

  • This patch contains the following possible cleanups:
    - make needlessly global code static
    - #if 0 unused functions
    - remove the following global function that was both unused and
    unimplemented:
    - super.c: gfs2_do_upgrade()

    Signed-off-by: Adrian Bunk
    Signed-off-by: Steven Whitehouse

    Adrian Bunk
     

24 Apr, 2006

1 commit