22 Jul, 2016

1 commit

  • Before this patch, if you used gfs2_jadd to add new journals of a
    size smaller than the existing journals, replaying those new journals
    would withdraw. That's because function gfs2_replay_incr_blk was
    using the number of journal blocks (jd_blocks) from the superblock's
    journal pointer. In other words, "My journal's max size" rather than
    "the journal we're replaying's size." This patch changes the function
    to use the size of the pertinent journal rather than always using the
    journal we happen to be using.
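
    A minimal sketch of the idea, assuming a simplified, hypothetical
    journal descriptor (struct journal_desc and nr_blocks are
    illustrative names, not the kernel's definitions). The wrap point
    comes from the journal being replayed, which is passed in
    explicitly:

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a journal descriptor. */
struct journal_desc {
    unsigned int nr_blocks; /* size of this journal, in blocks */
};

/* Advance *blk by one, wrapping at the end of the journal being replayed. */
static void replay_incr_blk(const struct journal_desc *jd, unsigned int *blk)
{
    assert(jd->nr_blocks > 0 && *blk < jd->nr_blocks);
    if (++*blk == jd->nr_blocks)
        *blk = 0;
}
```

    With an 8-block journal, block 7 wraps to 0. With a 32-block
    journal, the same starting block simply advances to 8, which is
    why the size of the wrong journal leads replay astray.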

    Signed-off-by: Bob Peterson

    Bob Peterson
     

13 Jan, 2015

1 commit


16 Jul, 2014

1 commit

  • The current "wait_on_bit" interface requires an 'action'
    function to be provided which does the actual waiting.
    There are over 20 such functions, many of them identical.
    Most cases can be satisfied by one of just two functions, one
    which uses io_schedule() and one which just uses schedule().

    So:
    Rename wait_on_bit and wait_on_bit_lock to
    wait_on_bit_action and wait_on_bit_lock_action
    to make it explicit that they need an action function.

    Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
    which are *not* given an action function but implicitly use
    a standard one.
    The decision to error-out if a signal is pending is now made
    based on the 'mode' argument rather than being encoded in the action
    function.

    All instances of the old wait_on_bit and wait_on_bit_lock which
    can use the new version have been changed accordingly and their
    action functions have been discarded.
    wait_on_bit{,_lock} does not return any specific error code in the
    event of a signal, so the caller must check for non-zero and
    substitute their own error code as appropriate.
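
    The caller-side pattern can be sketched in userspace as follows.
    Everything here is a hypothetical stand-in: mock_wait_on_bit(),
    enum wait_mode and the signal_pending argument only imitate the
    behaviour described above and are not the kernel interface. The
    choice of -EINTR is likewise just one possible error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins: not the kernel's definitions. */
enum wait_mode { MODE_UNINTERRUPTIBLE, MODE_INTERRUPTIBLE };

/*
 * Mock of a wait_on_bit-style call: returns 0 if the wait completed, or
 * non-zero (no particular value) if an interruptible wait saw a signal.
 */
static int mock_wait_on_bit(const unsigned long *word, int bit,
                            enum wait_mode mode, bool signal_pending)
{
    assert(bit >= 0 && bit < (int)(8 * sizeof(*word)));
    if (!(*word & (1UL << bit)))
        return 0; /* bit already clear, nothing to wait for */
    if (mode == MODE_INTERRUPTIBLE && signal_pending)
        return 1; /* interrupted: value carries no meaning */
    return 0;     /* pretend another thread eventually cleared the bit */
}

/* The caller, not the wait primitive, chooses the error code. */
static int wait_for_flag(const unsigned long *word, int bit,
                         bool signal_pending)
{
    if (mock_wait_on_bit(word, bit, MODE_INTERRUPTIBLE, signal_pending))
        return -EINTR;
    return 0;
}
```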

    The wait_on_bit() call in __fscache_wait_on_invalidate() was
    ambiguous as it specified TASK_UNINTERRUPTIBLE but used
    fscache_wait_bit_interruptible as an action function.
    David Howells confirms this should be uniformly
    "uninterruptible"

    The main remaining user of wait_on_bit{,_lock}_action is NFS
    which needs to use a freezer-aware schedule() call.

    A comment in fs/gfs2/glock.c notes that having multiple 'action'
    functions is useful as they display differently in the 'wchan'
    field of 'ps' (and /proc/$PID/wchan).
    As the new bit_wait{,_io} functions are tagged "__sched", they
    will not show up at all, but something higher in the stack. So
    the distinction will still be visible, only with different
    function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
    gfs2/glock.c case).

    Since the first version of this patch (against 3.15) two new action
    functions appeared, one in NFS and one in CIFS. CIFS also now
    uses an action function that makes the same freezer aware
    schedule call as NFS.

    Signed-off-by: NeilBrown
    Acked-by: David Howells (fscache, keys)
    Acked-by: Steven Whitehouse (gfs2)
    Acked-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Steve French
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
    Signed-off-by: Ingo Molnar

    NeilBrown
     

04 Jun, 2014

1 commit

  • …teve/gfs2-3.0-nmw into next

    Pull gfs2 updates from Steven Whitehouse:
    "This must be about the smallest merge window patch set ever for GFS2.
    It is probably also the first one without a single patch from me.
    That is down to a combination of factors, and I have some things in
    the works that are not quite ready yet, that I hope to put in next
    time around.

    Returning to what is here this time... we have 3 patches which fix
    various warnings. Two are bug fixes (for quotas and also a rare
    recovery race condition). The final patch, from Ben Marzinski, is an
    important change in the freeze code which has been in progress for
    some time. This removes the need to take and drop the transaction
    lock for every single transaction, when the only time it was used, was
    at file system freeze time. Ben's patch integrates the freeze
    operation into the journal flush code as an alternative with lower
    overheads and also lands up resolving some difficult to fix races at
    the same time"

    * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
    GFS2: Prevent recovery before the local journal is set
    GFS2: fs/gfs2/file.c: kernel-doc warning fixes
    GFS2: fs/gfs2/bmap.c: kernel-doc warning fixes
    GFS2: remove transaction glock
    GFS2: lops.c: replace 0 by NULL for pointers
    GFS2: quotas not being refreshed in gfs2_adjust_quota

    Linus Torvalds
     

14 May, 2014

1 commit

  • GFS2 has a transaction glock, which must be grabbed for every
    transaction, whose purpose is to deal with freezing the filesystem.
    Aside from this involving a large amount of locking, it is very easy to
    make the current fsfreeze code hang on unfreezing.

    This patch rewrites how gfs2 handles freezing the filesystem. The
    transaction glock is removed. In its place is a freeze glock, which is
    cached (but not held) in a shared state by every node in the cluster
    when the filesystem is mounted. This lock only needs to be grabbed on
    freezing, and actions which need to be safe from freezing, like
    recovery.

    When a node wants to freeze the filesystem, it grabs this glock
    exclusively. When the freeze glock state changes on the nodes (either
    from shared to unlocked, or shared to exclusive), the filesystem does a
    special log flush. gfs2_log_flush() does all the work for flushing out
    and shutting down the incore log, and then it tries to grab the
    freeze glock in a shared state again. Since the filesystem is stuck in
    gfs2_log_flush(), no new transaction can start, and nothing can be written
    to disk. Unfreezing the filesystem simply involves dropping the freeze
    glock, allowing gfs2_log_flush() to grab and then release the shared
    lock, so it is cached for next time.

    However, in order for the unfreezing ioctl to occur, gfs2 needs to get a
    shared lock on the filesystem root directory inode to check permissions.
    If that glock has already been grabbed exclusively, fsfreeze will be
    unable to get the shared lock and unfreeze the filesystem.

    In order to allow the unfreeze, this patch makes gfs2 grab a shared lock
    on the filesystem root directory during the freeze, and hold it until it
    unfreezes the filesystem. The functions which need to grab a shared
    lock in order to allow the unfreeze ioctl to be issued now use the lock
    grabbed by the freeze code instead.

    The freeze and unfreeze code take care to make sure that this shared
    lock will not be dropped while another process is using it.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

18 Apr, 2014

1 commit

  • Mostly scripted conversion of the smp_mb__* barriers.

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul E. McKenney
    Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
    Cc: Linus Torvalds
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

07 Mar, 2014

1 commit

  • If multiple nodes fail and their recovery work runs simultaneously, they
    would use the same unprotected variables in the superblock. For example,
    they would stomp on each other's revoked blocks lists, which resulted
    in file system metadata corruption. This patch moves the necessary
    variables so that each journal has its own separate area for tracking
    its journal replay.
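
    A rough sketch of the data-structure change, assuming a
    hypothetical struct journal_replay and a fixed-size revoke array
    chosen only to keep the example short. The point is that the
    revoke list hangs off each journal rather than off one shared
    superblock, so two recoveries never touch the same list:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REVOKES 16 /* arbitrary bound for the sketch */

/* Hypothetical per-journal replay state; each journal owns one copy. */
struct journal_replay {
    unsigned long long revokes[MAX_REVOKES];
    size_t nr_revokes;
};

/* Record a revoked block number in this journal's own list. */
static int replay_add_revoke(struct journal_replay *jr,
                             unsigned long long blkno)
{
    assert(jr != NULL);
    if (jr->nr_revokes == MAX_REVOKES)
        return -1; /* list full */
    jr->revokes[jr->nr_revokes++] = blkno;
    return 0;
}

/* Return 1 if blkno was revoked in this journal, 0 otherwise. */
static int replay_is_revoked(const struct journal_replay *jr,
                             unsigned long long blkno)
{
    size_t i;

    for (i = 0; i < jr->nr_revokes; i++)
        if (jr->revokes[i] == blkno)
            return 1;
    return 0;
}
```

    A revoke recorded while replaying one journal is invisible when
    replaying another, which is the isolation the patch is after.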

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

11 Jan, 2012

3 commits

  • If the first mounter fails to recover one of the journals
    during mount, the mount should fail.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Previously, a spectator mount would not even attempt to do
    journal recovery for a failed node. This meant that if all
    mounted nodes were spectators, everyone would be stuck after
    a node failed, all waiting for recovery to be performed.
    This is unnecessary since the failed node had a clean journal.

    Instead, allow a spectator mount to do a partial "read only"
    recovery, which means it will check if the failed journal is
    clean, and if so, report a successful recovery. If the failed
    journal is not clean, it reports that journal recovery failed.
    This makes it work the same as a read only mount on a read only
    block device.
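
    The decision described above might be sketched like this. The
    names (struct journal_scan, recover_journal, replay_stub) and the
    choice of -EROFS as the failure code are assumptions made for
    illustration, not taken from the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what scanning a failed node's journal found. */
struct journal_scan {
    bool clean; /* true if the journal needs no replay */
};

/* Stand-in for a real replay routine; always reports success. */
static int replay_stub(void)
{
    return 0;
}

/*
 * Recovery decision for one failed journal. A read-only (spectator)
 * mounter never replays; it only reports whether replay was needed.
 * Returns 0 on success or a negative errno-style code.
 */
static int recover_journal(const struct journal_scan *scan, bool read_only,
                           int (*replay)(void))
{
    assert(scan != NULL);
    if (scan->clean)
        return 0;      /* nothing to replay: recovery succeeded */
    if (read_only)
        return -EROFS; /* dirty journal, but we may not write */
    return replay ? replay() : -EINVAL;
}
```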

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • This new method of managing recovery is an alternative to
    the previous approach of using the userland gfs_controld.

    - use dlm slot numbers to assign journal id's
    - use dlm recovery callbacks to initiate journal recovery
    - use a dlm lock to determine the first node to mount fs
    - use a dlm lock to track journals that need recovery

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

29 Sep, 2010

1 commit

  • The tests further down the recovery function relating to
    unlocking the journal need to be updated to match the
    initial test. Also, a test in the umount code which was
    surplus to requirements has been removed. Umounting
    spectator mounts now works correctly, as expected.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

27 Sep, 2010

1 commit


23 Jul, 2010

1 commit

  • Workqueue can now handle high concurrency. Convert gfs to use
    workqueue instead of slow-work.

    * Steven pointed out that recovery path might be run from allocation
    path and thus requires forward progress guarantee without memory
    allocation. Create and use gfs_recovery_wq with rescuer. Please
    note that forward progress wasn't guaranteed with slow-work.

    * Updated to use non-reentrant workqueue.

    Signed-off-by: Tejun Heo
    Acked-by: Steven Whitehouse

    Tejun Heo
     

03 Dec, 2009

1 commit

    There are two spare fields in the header common to all GFS2
    metadata. One is just the right size to fit a journal id
    in it, and this patch updates the journal code so that each
    time a metadata block is modified, we tag it with the journal
    id of the node which is performing the modification.

    The reason for this is that it should make it much easier to
    debug issues which arise if we can tell which node was the
    last to modify a particular metadata block.

    Since the field is updated before the block is written into
    the journal, each journal should only contain metadata which
    is tagged with its own journal id. The one exception to this
    is the journal header block, which might have a different node's
    id in it, if that journal was recovered by another node in the
    cluster.

    Thus each journal will contain a record of which nodes recovered
    it, via the journal header.

    The other field in the metadata header could potentially be
    used to hold information about what kind of operation was
    performed, but for the time being we just zero it on each
    transaction so that if we use it for that in future, we'll
    know that the information (where it exists) is reliable.
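
    As an illustration only, the tagging step could look like the
    sketch below. struct meta_header and its field names are
    hypothetical, and htonl() from POSIX stands in for whatever
    big-endian conversion the on-disk format requires:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h> /* htonl/ntohl: POSIX big-endian conversion */

/* Hypothetical on-disk header layout; field names are illustrative only. */
struct meta_header {
    uint32_t magic;   /* stored big-endian on disk */
    uint32_t type;
    uint64_t spare64; /* the other spare field, zeroed for now */
    uint32_t format;
    uint32_t jid;     /* journal id of the last node to modify the block */
};

/* Tag a header just before its block is written into the journal. */
static void tag_header(struct meta_header *mh, uint32_t journal_id)
{
    assert(mh != NULL);
    mh->spare64 = 0;             /* keep the unused field reliable */
    mh->jid = htonl(journal_id); /* record who modified the block */
}
```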

    I did consider using the other field to hold the journal
    sequence number, however since in GFS2's journaling we write
    the modified data into the journal and not the original
    data, this gives no information as to what action caused the
    modification, so I think we can probably come up with a better
    use for those 64 bits in the future.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

21 Nov, 2009

1 commit


20 Nov, 2009

1 commit


19 May, 2009

1 commit

  • This patch fixes a race condition where we can receive recovery
    requests part way through processing a umount. This was causing
    problems since the recovery thread had already gone away.

    Looking in more detail at the recovery code, it was really trying
    to implement a slight variation on a work queue, and that happens to
    align nicely with the recently introduced slow-work subsystem. As a
    result I've updated the code to use slow-work, rather than its own home
    grown variety of work queue.

    When using the wait_on_bit() function, I noticed that the wait function
    that was supplied as an argument was appearing in the WCHAN field, so
    I've updated the function names in order to produce more meaningful
    output.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

24 Mar, 2009

1 commit

  • This is the big patch that I've been working on for some time
    now. There are many reasons for wanting to make this change
    such as:
    o Reducing overhead by eliminating duplicated fields between structures
    o Simplification of the code (reduces the code size by a fair bit)
    o The locking interface is now the DLM interface itself as proposed
    some time ago.
    o Fewer lookups of glocks when processing replies from the DLM
    o Fewer memory allocations/deallocations for each glock
    o Scope to do further optimisations in the future (but this patch is
    more than big enough for now!)

    Please note that (a) this patch relates to the lock_dlm module and
    not the DLM itself, that is still a separate module; and (b) that
    we retain the ability to build GFS2 as a standalone single node
    filesystem without requiring the DLM.

    This patch needs a lot of testing, which is why I restarted my -git
    tree after the last merge window. That way, this has the maximum
    exposure before it's merged. This is (modulo a few minor bug fixes) the
    same patch that I've been posting on and off for the last three months
    and it's passed a number of different tests so far.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

05 Jan, 2009

2 commits

  • The functions which are being moved can all be marked
    static in their new locations, since they only have
    a single caller each. Their new locations are more
    logical than before and some of the functions are
    small enough that the compiler might well inline them.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • By moving gfs2_recoverd, we can make an additional function static
    and it also leaves only (the already scheduled for removal) gfs2_glockd
    in daemon.c.

    At the same time the declaration of gfs2_quotad is moved to quota.h
    to reflect the new location of gfs2_quotad in a previous patch. Also
    the recovery.h and quota.h headers are cleaned up.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

27 Jun, 2008

2 commits

  • This patch merges the lock_nolock module into GFS2 itself. As well as removing
    some of the overhead of the module, it also means that it's now impossible to
    build GFS2 without a lock module (which would be a pointless thing to do
    anyway).

    We also plan to merge lock_dlm into GFS2 in the future, but that is a more
    tricky task, and will therefore be a separate patch.

    Signed-off-by: Steven Whitehouse
    Cc: David Teigland

    Steven Whitehouse
     
  • This patch implements a number of cleanups to the core of the
    GFS2 glock code. As a result a lot of code is removed. It looks
    like a really big change, but actually a large part of this patch
    is either removing or moving existing code.

    There are some new bits too though, such as the new run_queue()
    function which is considerably streamlined. Highlights of this
    patch include:

    o Fixes a cluster coherency bug during SH -> EX lock conversions
    o Removes the "glmutex" code in favour of a single bit lock
    o Removes the ->go_xmote_bh() for inodes since it was duplicating
    ->go_lock()
    o We now only use the ->lm_lock() function for both locks and
    unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
    o The fast path is considerably shorter, giving performance gains
    especially with lock_nolock
    o The glock_workqueue is now used for all the callbacks from the DLM
    which allows us to simplify the lock_dlm module (see following patch)
    o The way is now open to make further changes such as eliminating the two
    threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
    scheme.

    This patch has undergone extensive testing with various test suites
    so it should be pretty stable by now.

    Signed-off-by: Steven Whitehouse
    Cc: Bob Peterson

    Steven Whitehouse
     

10 Apr, 2008

1 commit

  • There are several places where GFP_KERNEL allocations happen under a glock,
    which will result in hangs if we're under memory pressure and go to re-enter the
    fs in order to flush stuff out. This patch changes the culprits to GFP_NOFS to
    keep this problem from happening. Thank you,

    Signed-off-by: Josef Bacik
    Signed-off-by: Steven Whitehouse

    Josef Bacik
     

31 Mar, 2008

2 commits

  • fs/gfs2/recovery.c: In function 'get_log_header':
    fs/gfs2/recovery.c:152: warning: 'lh.lh_sequence' may be used uninitialized in this function
    fs/gfs2/recovery.c:152: warning: 'lh.lh_flags' may be used uninitialized in this function
    fs/gfs2/recovery.c:152: warning: 'lh.lh_tail' may be used uninitialized in this function
    fs/gfs2/recovery.c:152: warning: 'lh.lh_blkno' may be used uninitialized in this function
    fs/gfs2/recovery.c:152: warning: 'lh.lh_hash' may be used uninitialized in this function

    Cc: David Teigland
    Cc: Bob Peterson
    Signed-off-by: Andrew Morton
    Signed-off-by: Steven Whitehouse

    akpm@linux-foundation.org
     
  • The functions in lm.c were just wrappers which were mostly
    only used in one other file. By moving the functions to
    the files where they are being used, they can be marked
    static and also this will usually result in them being inlined
    since they are often only used from one point in the code.

    A couple of really trivial functions have been inlined by hand
    into the function which called them as it makes the code clearer
    to do that.

    We also gain from one fewer function call in the glock lock and
    unlock paths.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

03 Feb, 2008

1 commit


25 Jan, 2008

2 commits

  • This patch allows gfs2 to perform journal recovery even if it is mounted
    read-only. Strictly speaking, a read-only mount should not be writing to
    the filesystem, but we do this only to perform journal recovery. A
    read-only mount will fail if we don't recover the dirty journal. Also,
    when gfs2 is used as a root filesystem, it will be mounted read-only
    before being mounted read-write during the boot sequence. A failed
    read-only mount will panic the machine during bootup.

    Signed-off-by: Abhijith Das
    Signed-off-by: Steven Whitehouse

    Abhijith Das
     
  • This patch is just a cleanup. Function gfs2_get_block() just calls
    function gfs2_block_map reversing the last two parameters. By
    reversing the parameters, gfs2_block_map() may be called directly
    and function gfs2_get_block may be eliminated altogether.
    Since this function is done for every block operation,
    this streamlines the code and makes it a little bit more efficient.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

10 Oct, 2007

1 commit

  • This is for bugzilla bug #248176: GFS2: invalid metadata block

    Patches 1 thru 3 were accepted upstream, but there were problems
    with 4 and 5. Those issues have been resolved and now the recovery
    tests are passing without errors. This code has gone through
    41 * 3 successful gfs2 recovery tests before it hit an
    unrelated (openais) problem. I'm continuing to test it.

    This is a complete rewrite of patch 5 for bug #248176, written by
    Steve Whitehouse. This is referred to in the bugzilla record as
    "new 6" and "a different solution".

    The problem was that the journal inodes, although protected by
    a glock, were not synched with the other nodes because they don't
    use the inode glock synch operations (i.e. no "glops" were defined).
    Therefore, journal recovery on a journal-recovering node was causing
    the blocks to get out of sync with the node that was actually trying
    to use that journal as it comes back up from a reboot.

    There are two possible solutions: (1) To make the journals use the
    normal inode glock sync operations, or (2) To make the journal
    operations take effect immediately (i.e. no caching). Although
    option 1 works, it turns out to be a lot more code. Steve opted
    for option 2, which is much simpler and therefore less prone to
    regression errors.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

09 Jul, 2007

1 commit

  • This patch fixes some sign issues which were accidentally introduced
    into the quota & statfs code during the endianness annotation process.
    Also included is a general clean up which moves all of the _host
    structures out of gfs2_ondisk.h (where they should not have been to
    start with) and into the places where they are actually used (often only
    one place). Also those _host structures which are not required any more
    are removed entirely (which is the eventual plan for all of them).

    The conversion routines from ondisk.c are also moved into the places
    where they are actually used, which for almost every one, was just one
    single place, so all those are now static functions. This also cleans up
    the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__.

    The net result is a reduction of about 100 lines of code, many functions
    now marked static plus the bug fixes as mentioned above. For good
    measure I ran the code through sparse after making these changes to
    check that there are no warnings generated.

    This fixes Red Hat bz #239686

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

15 Feb, 2007

1 commit

  • After Al Viro (finally) succeeded in removing the sched.h #include in module.h
    recently, it makes sense again to remove other superfluous sched.h includes.
    There are quite a lot of files which include it but don't actually need
    anything defined in there. Presumably these includes were once needed for
    macros that used to live in sched.h, but moved to other header files in the
    course of cleaning it up.

    To ease the pain, this time I did not fiddle with any header files and only
    removed #includes from .c-files, which tend to cause less trouble.

    Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
    arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
    allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
    configs in arch/arm/configs on arm. I also checked that no new warnings were
    introduced by the patch (actually, some warnings are removed that were emitted
    by unnecessarily included header files).

    Signed-off-by: Tim Schmielau
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

06 Dec, 2006

1 commit

  • As per comments from Andrew Morton and Jan Engelhardt, this fixes the
    indent and removes the "static" from a variable declaration since it's
    not needed in this case (now allocated on the stack of the function
    in question).

    Signed-off-by: Steven Whitehouse
    Cc: Jan Engelhardt
    Cc: Andrew Morton

    Steven Whitehouse
     

30 Nov, 2006

2 commits


20 Oct, 2006

1 commit

  • This fix means that bmap will map extents of the length requested
    by the VFS rather than guessing at it, or just mapping one block
    at a time. The other callers of gfs2_block_map are audited to ensure
    they send the correct max extent lengths (i.e. set bh->b_size correctly).

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

25 Sep, 2006

1 commit


22 Sep, 2006

1 commit

  • Fix a bug in the directory reading code, where we might have dereferenced
    a NULL pointer in case of OOM. Updated the directory code to use the new
    & improved version of gfs2_meta_ra() which now returns the first block
    that was being read. Previously it released that block, requiring the
    following code to grab the block again at each point it was called.

    Also turned off readahead on directory lookups since we are reading a
    hash table, and therefore reading the entries in order is very
    unlikely. Readahead is still used for all other calls to the
    directory reading function (e.g. when growing the hash table).

    Removed the DIO_START constant. Everywhere this was used, it was
    used to unconditionally start i/o aside from a couple of places, so
    I've removed it and made the couple of exceptions to this rule into
    separate functions.

    Also hunted through the other DIO flags and removed them as arguments
    from functions which were always called with the same combination of
    arguments.

    Updated gfs2_meta_indirect_buffer to be a bit more efficient and
    hopefully also be a bit easier to read.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

19 Sep, 2006

2 commits

  • lm_interface.h has a few out of the tree clients such as GFS1
    and userland tools.

    Right now, these clients keep a copy of the file in their build tree
    that can go out of sync.

    Move lm_interface.h to include/linux, export it to userland and
    clean up fs/gfs2 to use the new location.

    Signed-off-by: Fabio M. Di Nitto
    Signed-off-by: Steven Whitehouse

    Fabio Massimo Di Nitto
     
  • This is a tidy up of the GFS2 bmap code. The main change is that the
    bh is passed to gfs2_block_map allowing the flags to be set directly
    rather than having to repeat that code several times in ops_address.c.

    At the same time, the extent mapping code from gfs2_extent_map has
    been moved into gfs2_block_map. This allows all calls to gfs2_block_map
    to map extents in the case that no allocation is taking place. As a
    result reads and non-allocating writes should be faster. A quick test
    with postmark appears to support this.

    There is a limit on the number of blocks mapped in a single bmap
    call in that it will only ever map blocks which are pointed to
    from a single pointer block. So in other words, it will never try
    to do additional i/o in order to satisfy read-ahead. The maximum
    number of blocks is thus somewhat less than 512 (the GFS2 4k block
    size minus the header divided by sizeof(u64)). I've further limited
    the mapping of "normal" blocks to 32 blocks (to avoid extra work)
    since readpages() will currently read a maximum of 32 blocks ahead (128k).
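
    The arithmetic behind those limits can be checked with a small
    helper. The 24-byte header used in the usage note is an assumed
    figure for illustration; the text above only says the result is
    "somewhat less than 512":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 64-bit pointers that fit in one pointer block after its header. */
static size_t ptrs_per_block(size_t block_size, size_t header_size)
{
    assert(header_size <= block_size);
    return (block_size - header_size) / sizeof(uint64_t);
}

/* Bytes covered by mapping nr_blocks blocks of block_size bytes each. */
static size_t span_bytes(size_t nr_blocks, size_t block_size)
{
    return nr_blocks * block_size;
}
```

    With a 4096-byte block and an assumed 24-byte header,
    ptrs_per_block(4096, 24) gives 509, while a headerless block would
    hold exactly 512. span_bytes(32, 4096) gives 131072 bytes, the
    128k readahead window mentioned above.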

    Some further work will probably be needed to set a suitable value
    for DIO as well, but for now that's left at the maximum 512 (see
    ops_address.c:gfs2_get_block_direct).

    There is probably a lot more that can be done to improve bmap for GFS2,
    but this is a good first step.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

06 Sep, 2006

1 commit