11 Jan, 2012

3 commits

  • If the first mounter fails to recover one of the journals
    during mount, the mount should fail.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Previously, a spectator mount would not even attempt to do
    journal recovery for a failed node. This meant that if all
    mounted nodes were spectators, everyone would be stuck after
    a node failed, all waiting for recovery to be performed.
    This is unnecessary since the failed node had a clean journal.

    Instead, allow a spectator mount to do a partial "read only"
    recovery, which means it will check if the failed journal is
    clean, and if so, report a successful recovery. If the failed
    journal is not clean, it reports that journal recovery failed.
    This makes it work the same as a read only mount on a read only
    block device.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • This new method of managing recovery is an alternative to
    the previous approach of using the userland gfs_controld.

    - use dlm slot numbers to assign journal id's
    - use dlm recovery callbacks to initiate journal recovery
    - use a dlm lock to determine the first node to mount fs
    - use a dlm lock to track journals that need recovery

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

29 Sep, 2010

1 commit

  • The tests further down the recovery function relating to
    unlocking the journal need to be updated to match the
    intial test. Also, a test in the umount code which was
    surplus to requirements has been removed. Umounting
    spectator mounts now works correctly, as expected.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

27 Sep, 2010

1 commit


23 Jul, 2010

1 commit

  • Workqueue can now handle high concurrency. Convert gfs to use
    workqueue instead of slow-work.

    * Steven pointed out that recovery path might be run from allocation
    path and thus requires forward progress guarantee without memory
    allocation. Create and use gfs_recovery_wq with rescuer. Please
    note that forward progress wasn't guaranteed with slow-work.

    * Updated to use non-reentrant workqueue.

    Signed-off-by: Tejun Heo
    Acked-by: Steven Whitehouse

    Tejun Heo
     

03 Dec, 2009

1 commit

  • There are two spare field in the header common to all GFS2
    metadata. One is just the right size to fit a journal id
    in it, and this patch updates the journal code so that each
    time a metadata block is modified, we tag it with the journal
    id of the node which is performing the modification.

    The reason for this is that it should make it much easier to
    debug issues which arise if we can tell which node was the
    last to modify a particular metadata block.

    Since the field is updated before the block is written into
    the journal, each journal should only contain metadata which
    is tagged with its own journal id. The one exception to this
    is the journal header block, which might have a different node's
    id in it, if that journal was recovered by another node in the
    cluster.

    Thus each journal will contain a record of which nodes recovered
    it, via the journal header.

    The other field in the metadata header could potentially be
    used to hold information about what kind of operation was
    performed, but for the time being we just zero it on each
    transaction so that if we use it for that in future, we'll
    know that the information (where it exists) is reliable.

    I did consider using the other field to hold the journal
    sequence number, however since in GFS2's journaling we write
    the modified data into the journal and not the original
    data, this gives no information as to what action caused the
    modification, so I think we can probably come up with a better
    use for those 64 bits in the future.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

21 Nov, 2009

1 commit


20 Nov, 2009

1 commit


19 May, 2009

1 commit

  • This patch fixes a race condition where we can receive recovery
    requests part way through processing a umount. This was causing
    problems since the recovery thread had already gone away.

    Looking in more detail at the recovery code, it was really trying
    to implement a slight variation on a work queue, and that happens to
    align nicely with the recently introduced slow-work subsystem. As a
    result I've updated the code to use slow-work, rather than its own home
    grown variety of work queue.

    When using the wait_on_bit() function, I noticed that the wait function
    that was supplied as an argument was appearing in the WCHAN field, so
    I've updated the function names in order to produce more meaningful
    output.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

24 Mar, 2009

1 commit

  • This is the big patch that I've been working on for some time
    now. There are many reasons for wanting to make this change
    such as:
    o Reducing overhead by eliminating duplicated fields between structures
    o Simplifcation of the code (reduces the code size by a fair bit)
    o The locking interface is now the DLM interface itself as proposed
    some time ago.
    o Fewer lookups of glocks when processing replies from the DLM
    o Fewer memory allocations/deallocations for each glock
    o Scope to do further optimisations in the future (but this patch is
    more than big enough for now!)

    Please note that (a) this patch relates to the lock_dlm module and
    not the DLM itself, that is still a separate module; and (b) that
    we retain the ability to build GFS2 as a standalone single node
    filesystem with out requiring the DLM.

    This patch needs a lot of testing, hence my keeping it I restarted
    my -git tree after the last merge window. That way, this has the maximum
    exposure before its merged. This is (modulo a few minor bug fixes) the
    same patch that I've been posting on and off the the last three months
    and its passed a number of different tests so far.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

05 Jan, 2009

2 commits

  • The functions which are being moved can all be marked
    static in their new locations, since they only have
    a single caller each. Their new locations are more
    logical than before and some of the functions are
    small enough that the compiler might well inline them.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • By moving gfs2_recoverd, we can make an additional function static
    and it also leaves only (the already scheduled for removal) gfs2_glockd
    in daemon.c.

    At the same time the declaration of gfs2_quotad is moved to quota.h
    to reflect the new location of gfs2_quotad in a previous patch. Also
    the recovery.h and quota.h headers are cleaned up.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

27 Jun, 2008

2 commits

  • This patch merges the lock_nolock module into GFS2 itself. As well as removing
    some of the overhead of the module, it also means that its now impossible to
    build GFS2 without a lock module (which would be a pointless thing to do
    anyway).

    We also plan to merge lock_dlm into GFS2 in the future, but that is a more
    tricky task, and will therefore be a separate patch.

    Signed-off-by: Steven Whitehouse
    Cc: David Teigland

    Steven Whitehouse
     
  • This patch implements a number of cleanups to the core of the
    GFS2 glock code. As a result a lot of code is removed. It looks
    like a really big change, but actually a large part of this patch
    is either removing or moving existing code.

    There are some new bits too though, such as the new run_queue()
    function which is considerably streamlined. Highlights of this
    patch include:

    o Fixes a cluster coherency bug during SH -> EX lock conversions
    o Removes the "glmutex" code in favour of a single bit lock
    o Removes the ->go_xmote_bh() for inodes since it was duplicating
    ->go_lock()
    o We now only use the ->lm_lock() function for both locks and
    unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
    o The fast path is considerably shortly, giving performance gains
    especially with lock_nolock
    o The glock_workqueue is now used for all the callbacks from the DLM
    which allows us to simplify the lock_dlm module (see following patch)
    o The way is now open to make further changes such as eliminating the two
    threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
    scheme.

    This patch has undergone extensive testing with various test suites
    so it should be pretty stable by now.

    Signed-off-by: Steven Whitehouse
    Cc: Bob Peterson

    Steven Whitehouse
     

10 Apr, 2008

1 commit

  • There are several places where GFP_KERNEL allocations happen under a glock,
    which will result in hangs if we're under memory pressure and go to re-enter the
    fs in order to flush stuff out. This patch changes the culprits to GFS_NOFS to
    keep this problem from happening. Thank you,

    Signed-off-by: Josef Bacik
    Signed-off-by: Steven Whitehouse

    Josef Bacik
     

31 Mar, 2008

2 commits

  • fs/gfs2/recovery.c: In function 'get_log_header':
    fs/gfs2/recovery.c:152: warning: 'lh.lh_sequence' may be used uninitialized in this function
    fs/gfs2/recovery.c:152: warning: 'lh.lh_flags' may be used uninitialized in this function
    fs/gfs2/recovery.c:152: warning: 'lh.lh_tail' may be used uninitialized in this function
    fs/gfs2/recovery.c:152: warning: 'lh.lh_blkno' may be used uninitialized in this function
    fs/gfs2/recovery.c:152: warning: 'lh.lh_hash' may be used uninitialized in this function

    Cc: David Teigland
    Cc: Bob Peterson
    Signed-off-by: Andrew Morton
    Signed-off-by: Steven Whitehouse

    akpm@linux-foundation.org
     
  • The functions in lm.c were just wrappers which were mostly
    only used in one other file. By moving the functions to
    the files where they are being used, they can be marked
    static and also this will usually result in them being inlined
    since they are often only used from one point in the code.

    A couple of really trivial functions have been inlined by hand
    into the function which called them as it makes the code clearer
    to do that.

    We also gain from one fewer function call in the glock lock and
    unlock paths.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

03 Feb, 2008

1 commit


25 Jan, 2008

2 commits

  • This patch allows gfs2 to perform journal recovery even if it is mounted
    read-only. Strictly speaking, a read-only mount should not be writing to
    the filesystem, but we do this only to perform journal recovery. A
    read-only mount will fail if we don't recover the dirty journal. Also,
    when gfs2 is used as a root filesystem, it will be mounted read-only
    before being mounted read-write during the boot sequence. A failed
    read-only mount will panic the machine during bootup.

    Signed-off-by: Abhijith Das
    Signed-off-by: Steven Whitehouse

    Abhijith Das
     
  • This patch is just a cleanup. Function gfs2_get_block() just calls
    function gfs2_block_map reversing the last two parameters. By
    reversing the parameters, gfs2_block_map() may be called directly
    and function gfs2_get_block may be eliminated altogether.
    Since this function is done for every block operation,
    this streamlines the code and makes it a little bit more efficient.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

10 Oct, 2007

1 commit

  • This is for bugzilla bug #248176: GFS2: invalid metadata block

    Patches 1 thru 3 were accepted upstream, but there were problems
    with 4 and 5. Those issues have been resolved and now the recovery
    tests are passing without errors. This code has gone through
    41 * 3 successful gfs2 recovery tests before it hit an
    unrelated (openais) problem. I'm continuing to test it.

    This is a complete rewrite of patch 5 for bug #248176, written by
    Steve Whitehouse. This is referred to in the bugzilla record as
    "new 6" and "a different solution".

    The problem was that the journal inodes, although protected by
    a glock, were not synched with the other nodes because they don't
    use the inode glock synch operations (i.e. no "glops" were defined).
    Therefore, journal recovery on a journal-recovering node were causing
    the blocks to get out of sync with the node that was actually trying
    to use that journal as it comes back up from a reboot.

    There are two possible solutions: (1) To make the journals use the
    normal inode glock sync operations, or (2) To make the journal
    operations take effect immediately (i.e. no caching). Although
    option 1 works, it turns out to be a lot more code. Steve opted
    for option 2, which is much simpler and therefore less prone to
    regression errors.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    --

    Bob Peterson
     

09 Jul, 2007

1 commit

  • This patch fixes some sign issues which were accidentally introduced
    into the quota & statfs code during the endianess annotation process.
    Also included is a general clean up which moves all of the _host
    structures out of gfs2_ondisk.h (where they should not have been to
    start with) and into the places where they are actually used (often only
    one place). Also those _host structures which are not required any more
    are removed entirely (which is the eventual plan for all of them).

    The conversion routines from ondisk.c are also moved into the places
    where they are actually used, which for almost every one, was just one
    single place, so all those are now static functions. This also cleans up
    the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__.

    The net result is a reduction of about 100 lines of code, many functions
    now marked static plus the bug fixes as mentioned above. For good
    measure I ran the code through sparse after making these changes to
    check that there are no warnings generated.

    This fixes Red Hat bz #239686

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

15 Feb, 2007

1 commit

  • After Al Viro (finally) succeeded in removing the sched.h #include in module.h
    recently, it makes sense again to remove other superfluous sched.h includes.
    There are quite a lot of files which include it but don't actually need
    anything defined in there. Presumably these includes were once needed for
    macros that used to live in sched.h, but moved to other header files in the
    course of cleaning it up.

    To ease the pain, this time I did not fiddle with any header files and only
    removed #includes from .c-files, which tend to cause less trouble.

    Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
    arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
    allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
    configs in arch/arm/configs on arm. I also checked that no new warnings were
    introduced by the patch (actually, some warnings are removed that were emitted
    by unnecessarily included header files).

    Signed-off-by: Tim Schmielau
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

06 Dec, 2006

1 commit

  • As per comments from Andrew Morton and Jan Engelhardt, this fixes the
    indent and removes the "static" from a variable declaration since its
    not needed in this case (now allocated on the stack of the function
    in question).

    Signed-off-by: Steven Whitehouse
    Cc: Jan Engelhardt
    Cc: Andrew Morton

    Steven Whitehouse
     

30 Nov, 2006

2 commits


20 Oct, 2006

1 commit

  • This fix means that bmap will map extents of the length requested
    by the VFS rather than guessing at it, or just mapping one block
    at a time. The other callers of gfs2_block_map are audited to ensure
    they send the correct max extent lengths (i.e. set bh->b_size correctly).

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

25 Sep, 2006

1 commit


22 Sep, 2006

1 commit

  • Fix a bug in the directory reading code, where we might have dereferenced
    a NULL pointer in case of OOM. Updated the directory code to use the new
    & improved version of gfs2_meta_ra() which now returns the first block
    that was being read. Previously it was releasing it requiring following
    code to grab the block again at each point it was called.

    Also turned off readahead on directory lookups since we are reading a
    hash table, and therefore reading the entries in order is very
    unlikely. Readahead is still used for all other calls to the
    directory reading function (e.g. when growing the hash table).

    Removed the DIO_START constant. Everywhere this was used, it was
    used to unconditionally start i/o aside from a couple of places, so
    I've removed it and made the couple of exceptions to this rule into
    separate functions.

    Also hunted through the other DIO flags and removed them as arguments
    from functions which were always called with the same combination of
    arguments.

    Updated gfs2_meta_indirect_buffer to be a bit more efficient and
    hopefully also be a bit easier to read.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

19 Sep, 2006

2 commits

  • lm_interface.h has a few out of the tree clients such as GFS1
    and userland tools.

    Right now, these clients keeps a copy of the file in their build tree
    that can go out of sync.

    Move lm_interface.h to include/linux, export it to userland and
    clean up fs/gfs2 to use the new location.

    Signed-off-by: Fabio M. Di Nitto
    Signed-off-by: Steven Whitehouse

    Fabio Massimo Di Nitto
     
  • This is a tidy up of the GFS2 bmap code. The main change is that the
    bh is passed to gfs2_block_map allowing the flags to be set directly
    rather than having to repeat that code several times in ops_address.c.

    At the same time, the extent mapping code from gfs2_extent_map has
    been moved into gfs2_block_map. This allows all calls to gfs2_block_map
    to map extents in the case that no allocation is taking place. As a
    result reads and non-allocating writes should be faster. A quick test
    with postmark appears to support this.

    There is a limit on the number of blocks mapped in a single bmap
    call in that it will only ever map blocks which are pointed to
    from a single pointer block. So in other words, it will never try
    to do additional i/o in order to satisfy read-ahead. The maximum
    number of blocks is thus somewhat less than 512 (the GFS2 4k block
    size minus the header divided by sizeof(u64)). I've further limited
    the mapping of "normal" blocks to 32 blocks (to avoid extra work)
    since readpages() will currently read a maximum of 32 blocks ahead (128k).

    Some further work will probably be needed to set a suitable value
    for DIO as well, but for now thats left at the maximum 512 (see
    ops_address.c:gfs2_get_block_direct).

    There is probably a lot more that can be done to improve bmap for GFS2,
    but this is a good first step.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

06 Sep, 2006

1 commit


05 Sep, 2006

1 commit


01 Sep, 2006

1 commit

  • As per comments from Jan Engelhardt this
    updates the copyright message to say "version" in full rather than
    "v.2". Also incore.h has been updated to remove forward structure
    declarations which are not required.

    The gfs2_quota_lvb structure has now had endianess annotations added
    to it. Also quota.c has been updated so that we now store the
    lvb data locally in endian independant format to avoid needing
    a structure in host endianess too. As a result the endianess
    conversions are done as required at various points and thus the
    conversion routines in lvb.[ch] are no longer required. I've
    moved the one remaining constant in lvb.h thats used into lm.h
    and removed the unused lvb.[ch].

    I have not changed the HIF_ constants. That is left to a later patch
    which I hope will unify the gh_flags and gh_iflags fields of the
    struct gfs2_holder.

    Cc: Jan Engelhardt
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

11 Aug, 2006

1 commit

  • recovery.c add a brelse to deal with gfs2_replay_read_block being called
    twice on the same block.

    add a dput to drop the ref count on the root inode.
    This was causing lingering glocks and thus causing
    a mount failure to hang.

    Fix a endian conversion macro that was was swizzling
    16bits when it should have been swizzling 32.

    Signed-off-by: Russell Cattelan
    Signed-off-by: Steven Whitehouse

    Russell Cattelan
     

05 Aug, 2006

1 commit

  • Mmapped files were able to trigger a lock ordering bug. Private
    maps do not need to take the glock so early on. Shared maps do
    unfortunately, however we can get around that by adding a flag
    into the flags for the struct gfs2_file. This only works because
    we are taking an exclusive lock at this point, so we know that
    nobody else can be racing with us.

    Fixes Red Hat bugzilla: #201196

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

15 Jun, 2006

1 commit

  • This patch fixes the way we have been dealing with unlinked,
    but still open files. It removes all limits (other than memory
    for inodes, as per every other filesystem) on numbers of these
    which we can support on GFS2. It also means that (like other
    fs) its the responsibility of the last process to close the file
    to deallocate the storage, rather than the person who did the
    unlinking. Note that with GFS2, those two events might take place
    on different nodes.

    Also there are a number of other changes:

    o We use the Linux inode subsystem as it was intended to be
    used, wrt allocating GFS2 inodes
    o The Linux inode cache is now the point which we use for
    local enforcement of only holding one copy of the inode in
    core at once (previous to this we used the glock layer).
    o We no longer use the unlinked "special" file. We just ignore it
    completely. This makes unlinking more efficient.
    o We now use the 4th block allocation state. The previously unused
    state is used to track unlinked but still open inodes.
    o gfs2_inoded is no longer needed
    o Several fields are now no longer needed (and removed) from the in
    core struct gfs2_inode
    o Several fields are no longer needed (and removed) from the in core
    superblock

    There are a number of future possible optimisations and clean ups
    which have been made possible by this patch.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

19 May, 2006

2 commits