11 Jan, 2012

1 commit

  • This new method of managing recovery is an alternative to
    the previous approach of using the userland gfs_controld.

    - use dlm slot numbers to assign journal ids
    - use dlm recovery callbacks to initiate journal recovery
    - use a dlm lock to determine the first node to mount fs
    - use a dlm lock to track journals that need recovery
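
    As a rough sketch (illustrative names, not the exact kernel code),
    the recover_done callback receives our slot number, from which the
    journal id follows directly, since dlm slots count from 1 and
    journals from 0:

      struct example_fs {
              int jid;
              uint32_t generation;
      };

      static void example_recover_done(void *arg, struct dlm_slot *slots,
                                       int num_slots, int our_slot,
                                       uint32_t generation)
      {
              struct example_fs *fs = arg;

              fs->jid = our_slot - 1;       /* slot N -> journal id N-1 */
              fs->generation = generation;  /* remember for recovery tracking */
      }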

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

04 Jan, 2012

1 commit

  • These new callbacks notify the dlm user about lock recovery.
    GFS2, and possibly others, need to be aware of when the dlm
    will be doing lock recovery for a failed lockspace member.

    In the past, this coordination has been done between dlm and
    file system daemons in userspace, which then direct their
    kernel counterparts. These callbacks allow the same
    coordination directly, and more simply.
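
    A minimal sketch of how a dlm user registers these callbacks when
    creating a lockspace; the dlm_lockspace_ops structure and the
    dlm_new_lockspace() prototype are from <linux/dlm.h>, while the
    callback bodies here are placeholders:

      static void my_recover_prep(void *arg)
      {
              /* a lockspace member failed: pause new lock activity */
      }

      static void my_recover_slot(void *arg, struct dlm_slot *slot)
      {
              /* slot->nodeid and slot->slot identify the failed member */
      }

      static void my_recover_done(void *arg, struct dlm_slot *slots,
                                  int num_slots, int our_slot,
                                  uint32_t generation)
      {
              /* dlm lock recovery finished: safe to start fs recovery */
      }

      static const struct dlm_lockspace_ops my_ops = {
              .recover_prep = my_recover_prep,
              .recover_slot = my_recover_slot,
              .recover_done = my_recover_done,
      };

      /* registered via:
       * dlm_new_lockspace(name, cluster, flags, lvblen,
       *                   &my_ops, my_arg, &ops_result, &lockspace);
       * ops_result reports whether the running dlm supports the ops. */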

    Signed-off-by: David Teigland

    David Teigland
     

09 Mar, 2011

1 commit

  • This patch fixes a race in deallocating glocks which was introduced
    in the RCU glock patch. We need to ensure that the glock count is
    kept correct even in the case that there is a race to add a new
    glock into the hash table. Also, to avoid having to wait for an
    RCU grace period, the glock counter can be decremented before
    call_rcu() is called.
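
    A condensed sketch of the resulting ordering (field and callback
    names are illustrative):

      static void glock_free(struct gfs2_glock *gl)
      {
              /* drop the count before the grace period, not after, so
                 waiters see an accurate count immediately */
              atomic_dec(&gl->gl_sbd->sd_glock_disposal);
              call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
      }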

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

21 Jan, 2011

1 commit

  • This has a number of advantages:

    - Reduces contention on the hash table lock
    - Makes the code smaller and simpler
    - Should speed up glock dumps when under load
    - Removes ref count changing in examine_bucket
    - No longer need hash chain lock in glock_put() in common case

    There are some further changes which this enables and which
    we may do in the future. One is to look at using SLAB_RCU,
    and another is to look at using a per-cpu counter for the
    per-sb glock counter, since that is touched twice in the
    lifetime of each glock (but only used at umount time).
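
    The kind of lock-free lookup this enables looks roughly like the
    following (a sketch, not the exact patch):

      static struct gfs2_glock *find_glock_rcu(struct hlist_bl_head *bucket,
                                               u64 number)
      {
              struct hlist_bl_node *pos;
              struct gfs2_glock *gl;

              rcu_read_lock();
              hlist_bl_for_each_entry_rcu(gl, pos, bucket, gl_list) {
                      /* only take a reference if the glock is still live */
                      if (gl->gl_name.ln_number == number &&
                          atomic_inc_not_zero(&gl->gl_ref)) {
                              rcu_read_unlock();
                              return gl;
                      }
              }
              rcu_read_unlock();
              return NULL;
      }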

    Signed-off-by: Steven Whitehouse
    Cc: Paul E. McKenney

    Steven Whitehouse
     

30 Nov, 2010

2 commits

  • We can only merge the fields into a bitfield if the locking
    rules for them are the same. In this case gl_spin covers all
    of the fields (write side) but a couple of them are used
    with GLF_LOCK as the read side lock, which should be ok
    since we know that the fields in question won't be changing
    at the time.

    The gl_req setting has to be done earlier (in glock.c) in order
    to place it under gl_spin. The gl_reply setting also has to be
    brought under gl_spin in order to comply with the new rules.

    This saves 4*sizeof(unsigned int) per glock.
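
    Roughly what the merged fields look like (bit widths illustrative):

      struct gfs2_glock {
              spinlock_t gl_spin;             /* write side for all below */
              unsigned int gl_state:2,        /* current lock state */
                           gl_target:2,       /* state being aimed for */
                           gl_demote_state:2, /* state requested by remote node */
                           gl_req:2,          /* state requested from the dlm */
                           gl_reply:8;        /* last reply from the dlm */
              /* remaining fields unchanged */
      };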

    Signed-off-by: Steven Whitehouse
    Cc: Bob Peterson

    Steven Whitehouse
     
  • The DLM never returns -EAGAIN in response to dlm_lock(), and even
    if it did, the test in gdlm_lock() was wrong anyway. Once that
    test is removed, the code can be greatly simplified by using a
    "normal" error return code (0 for success).

    We then no longer need the LM_OUT_ASYNC return code, which can
    be removed.
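
    The simplified path then looks roughly like this (condensed; the
    make_mode()/make_flags() helpers translate GFS2 lock modes and
    flags into their dlm equivalents):

      static int gdlm_lock(struct gfs2_glock *gl, unsigned int req_state,
                           unsigned int flags)
      {
              struct lm_lockstruct *ls = &gl->gl_sbd->sd_lockstruct;
              int req = make_mode(req_state);
              u32 lkf = make_flags(gl->gl_lksb.sb_lkid, flags, req);

              /* 0 means the request was queued; gdlm_ast() completes it */
              return dlm_lock(ls->ls_dlm, req, &gl->gl_lksb, lkf,
                              gl->gl_strname, GDLM_STRNAME_BYTES - 1,
                              0, gdlm_ast, gl, gdlm_bast);
      }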

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h, making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include so that its order conforms
    to its surroundings. It is put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints
    out an error message indicating which .h file needs to be added to
    the file.
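
    For illustration, the effect on a typical converted file (a
    hypothetical one) is simply:

      /* before: kmalloc()/kfree() compiled only because slab.h arrived
         implicitly via percpu.h */
      #include <linux/percpu.h>

      /* after: the dependency is stated directly */
      #include <linux/percpu.h>
      #include <linux/slab.h>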

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, and for others it was more appropriate
    to add the include to an implementation .h or embedding .c file.
    This step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as a bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

01 Mar, 2010

1 commit

  • Since the start of GFS2, an "extra" inode has been used to store
    the metadata belonging to each inode. The only reason for using
    this inode was to have an extra address space; the other fields
    were unused. This meant that the memory usage was rather inefficient.

    The reason for keeping each inode's metadata in a separate address
    space is that when glocks are requested on remote nodes, we need to
    be able to efficiently locate the data and metadata relating to
    that glock (inode) in order to sync or sync and invalidate it
    (depending on the remotely requested lock mode).

    This patch adds a new type of glock which, in addition to its
    normal fields, has an address space. This applies to all inode
    and rgrp glocks (but to no other glock types, which remain as
    before). As a result, we no longer need to have the second
    inode.

    This results in three major improvements:
    1. A saving of approx 25% of memory used in caching inodes
    2. A removal of the circular dependency between inodes and glocks
    3. No confusion between "normal" and "metadata" inodes in super.c

    Although the first of these is the more immediately apparent, the
    second is just as important as it now enables a number of clean
    ups at umount time. Those will be the subject of future patches.
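
    A sketch of the resulting layout (naming illustrative): inode and
    rgrp glocks are allocated with an address space appended, so the
    mapping can be found from the glock itself rather than via a
    second inode:

      struct gfs2_glock_aspace {
              struct gfs2_glock glock;
              struct address_space mapping;  /* pages protected by this glock */
      };

      static inline struct address_space *glock_mapping(struct gfs2_glock *gl)
      {
              return &container_of(gl, struct gfs2_glock_aspace, glock)->mapping;
      }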

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

03 Feb, 2010

2 commits

  • Although all glocks are, by the time of the umount glock wait,
    scheduled for demotion, some of them haven't made it far
    enough through the process for the original waiting code
    to catch them.

    This extends the ref count to the whole glock lifetime in order
    to ensure that the waiting does catch all glocks. It does make
    it a bit more invasive, but it seems the only sensible solution
    at the moment.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch adds a wait on umount between the point at which we
    dispose of all glocks and the point at which we unmount the
    lock protocol. This ensures that we've received all the replies
    to our unlock requests before we stop the locking.
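
    A sketch of the mechanism (field names illustrative): each glock
    handed to the dlm for unlocking bumps a disposal counter, each
    reply drops it, and umount sleeps until it reaches zero:

      static void wait_for_unlock_replies(struct gfs2_sbd *sdp)
      {
              /* every queued unlock incremented sd_glock_disposal; each
                 dlm reply decrements it and wakes sd_glock_wait */
              wait_event(sdp->sd_glock_wait,
                         atomic_read(&sdp->sd_glock_disposal) == 0);
      }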

    Signed-off-by: Steven Whitehouse
    Reported-by: Fabio M. Di Nitto

    Steven Whitehouse
     

24 Mar, 2009

2 commits

  • After calling out to the dlm, GFS2 sets the new state of a glock to
    gl_target in gdlm_ast(). However, gl_target is not always the lock
    state that was requested. If a conversion from shared to exclusive
    fails, finish_xmote() will call do_xmote() with LM_ST_UNLOCKED, instead
    of gl->gl_target, so that it can reacquire the lock in exclusive mode the next
    time around. In this case, setting the lock to gl_target in gdlm_ast()
    will make GFS2 think that it has the glock in exclusive mode, when
    really, it doesn't have the glock locked at all. This patch adds a new
    field to the gfs2_glock structure, gl_req, to track the mode that was
    requested.
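
    In sketch form (condensed from the flow described above):

      /* when the request is issued, in do_xmote(): */
      gl->gl_req = target;        /* may be LM_ST_UNLOCKED, not gl_target */

      /* when the dlm reply arrives, in gdlm_ast(): */
      gl->gl_state = gl->gl_req;  /* the mode we actually asked for */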

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • This is the big patch that I've been working on for some time
    now. There are many reasons for wanting to make this change
    such as:
    o Reducing overhead by eliminating duplicated fields between structures
    o Simplification of the code (reduces the code size by a fair bit)
    o The locking interface is now the DLM interface itself as proposed
    some time ago.
    o Fewer lookups of glocks when processing replies from the DLM
    o Fewer memory allocations/deallocations for each glock
    o Scope to do further optimisations in the future (but this patch is
    more than big enough for now!)

    Please note that (a) this patch relates to the lock_dlm module and
    not the DLM itself, which is still a separate module; and (b) that
    we retain the ability to build GFS2 as a standalone single node
    filesystem without requiring the DLM.

    This patch needs a lot of testing, hence my keeping it since I
    restarted my -git tree after the last merge window. That way, it
    has had the maximum exposure before it is merged. This is (modulo
    a few minor bug fixes) the same patch that I've been posting on
    and off for the last three months, and it has passed a number of
    different tests so far.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse