02 Nov, 2011

1 commit


21 Oct, 2011

5 commits

  • Unfortunately, it is not enough to just ignore locked buffers during
    the AIL flush from fsync. We need to be able to ignore all buffers
    which are locked, dirty or pinned at this stage as they might have
    been added subsequent to the log flush earlier in the fsync function.

    In addition, this means that we no longer need to rely on i_mutex to
    keep out writes during fsync, so we can, as a side-effect, remove
    that protection too.

    Signed-off-by: Steven Whitehouse
    Tested-By: Abhijith Das

    Steven Whitehouse
     
  • Since we have ruled out supporting online filesystem shrink,
    it is possible to make the resource group list append only
    during the life of a super block. This gives several benefits:

    Firstly, we only need to read new rindex elements as they are added
    rather than needing to reread the whole rindex file each time one
    element is added.

    Secondly, the rindex glock can be held for much shorter periods of
    time, and is completely removed from the fast path for allocations.
    The lock is taken in shared mode only when updating the resource
    groups when the first allocation occurs, and after a grow has
    taken place.

    Thirdly, this results in a reduction in code size, and everything
    gets a lot simpler to understand in this area.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Here is an update of Bob's original rbtree patch which, in addition, also
    resolves the rather strange ref counting that was being done relating to
    the bitmap blocks.

    Originally we had a dual system for journaling resource groups. The metadata
    blocks were journaled and also the rgrp itself was added to a list. The reason
    for adding the rgrp to the list in the journal was so that the "repolish
    clones" code could be run to update the free space, and potentially send any
    discard requests when the log was flushed. This was done by comparing the
    "cloned" bitmap with what had been written back on disk during the transaction
    commit.

    Due to this, there was a requirement to hang on to the rgrps' bitmap buffers
    until the journal had been flushed. For that reason, there was a rather
    complicated set up in the ->go_lock ->go_unlock functions for rgrps involving
    both a mutex and a spinlock (the ->sd_rindex_spin) to maintain a reference
    count on the buffers.

    However, the journal maintains a reference count on the buffers anyway, since
    they are being journaled as metadata buffers. So by moving the code which deals
    with the post-journal accounting for bitmap blocks to the metadata journaling
    code, we can entirely dispense with the rather strange buffer ref counting
    scheme and also the requirement to journal the rgrps.

    The net result of all this is that the ->sd_rindex_spin is left to do exactly
    one job, and that is to look after the rbtree or rgrps.

    This patch is designed to be a stepping stone towards using RCU for the rbtree
    of resource groups, however the reduction in the number of uses of the
    ->sd_rindex_spin is likely to have benefits for multi-threaded workloads,
    anyway.

    The patch retains ->go_lock and ->go_unlock for rgrps, however these maybe also
    be removed in future in favour of calling the functions directly where required
    in the code. That will allow locking of resource groups without needing to
    actually read them in - something that could be useful in speeding up statfs.

    In the mean time though it is valid to dereference ->bi_bh only when the rgrp
    is locked. This is basically the same rule as before, modulo the references not
    being valid until the following journal flush.

    Signed-off-by: Steven Whitehouse
    Signed-off-by: Bob Peterson
    Cc: Benjamin Marzinski

    Bob Peterson
     
  • Journaled data requires that a complete flush of all dirty data for
    the file is done, in order that the ail flush which comes after
    will succeed.

    Also the recently enhanced bug trap can trigger falsely in case
    an ail flush from fsync races with a page read. This updates the
    bug trap such that it will ignore buffers which are locked and
    only trigger on dirty and/or pinned buffers when the ail flush
    is run from fsync. The original bug trap is retained when ail
    flush is run from ->go_sync()

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The assert was being tested under the wrong lock, a
    legacy of the original code. Also, if it does trigger,
    the resulting information was not always a lot of help.

    This moves the patch under the correct lock and also
    prints out more useful information in tacking down the
    source of the problem.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

15 Jul, 2011

3 commits

  • This adds S_NOSEC support to GFS2. We set/reset the flag either when
    a user calls setattr or when we have just regained the glock
    from another node. The flag is only set if there are no xattrs
    on the inode and there is no suid bit set.

    Signed-off-by: Steven Whitehouse
    Reviewed-by: Andi Kleen
    Cc: Al Viro

    Steven Whitehouse
     
  • This patch is a performance improvement for GFS2 in a clustered
    environment. It makes the glock hold time self-adjusting.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This patch adds a cache for the hash table to the directory code
    in order to help simplify the way in which the hash table is
    accessed. This is intended to be a first step towards introducing
    some performance improvements in the directory code.

    There are two follow ups that I'm hoping to see fairly shortly. One
    is to simplify the hash table reading code now that we always read the
    complete hash table, whether we want one entry or all of them. The
    other is to introduce readahead on the heads of the hash chains
    which are referred to from the table.

    The hash table is a maximum of 128k in size, so it is not worth trying
    to read it in small chunks.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

14 Jul, 2011

1 commit

  • This patch contains a few misc fixes which resolve a recently
    reported issue. This patch has been a real team effort and has
    received a lot of testing.

    The first issue is that the ail lock needs to be held over a few
    more operations. The lock thats added into gfs2_releasepage() may
    possibly be a candidate for replacing with RCU at some future
    point, but at this stage we've gone for the obvious fix.

    The second issue is that gfs2_write_inode() can end up calling
    a glock recursively when called from gfs2_evict_inode() via the
    syncing code, so it needs a guard added.

    The third issue is that we either need to not truncate the metadata
    pages of inodes which have zero link count, but which we cannot
    deallocate due to them still being in use by other nodes, or we need
    to ensure that those pages have all made it through the journal and
    ail lists first. This patch takes the former approach, but the
    latter has also been tested and there is nothing to choose between
    them performance-wise. So again, we could revise that decision
    in the future.

    Also, the inode eviction process is now better documented.

    Signed-off-by: Steven Whitehouse
    Tested-by: Bob Peterson
    Tested-by: Abhijith Das
    Reported-by: Barry J. Marson
    Reported-by: David Teigland

    Steven Whitehouse
     

12 Jul, 2011

1 commit

  • Right now, there is nothing that forces the log to get flushed when a node
    drops its rindex glock so that another node can grow the filesystem. If the
    log doesn't get flushed, GFS2 can corrupt the sd_log_le_rg list in the
    following way.

    A node puts an rgd on the list in rg_lo_add(), and then the rindex glock is
    dropped so the other node can grow the filesystem. When the node reacquires the
    rindex glock, that rgd gets deleted in clear_rgrpdi() before ever being
    removed from the list by gfs2_log_flush().

    This code simply forces a log flush when the rindex glock is invalidated,
    solving the problem.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

09 May, 2011

1 commit


20 Apr, 2011

1 commit

  • This patch is designed to clean up GFS2's fsync
    implementation and ensure that it really does get everything on
    disk. Since ->write_inode() has been updated, we can call that
    via the vfs library function sync_inode_metadata() and the only
    remaining thing that has to be done is to ensure that we get
    any revoke records in the log after the inode has been written back.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

18 Apr, 2011

1 commit


11 Mar, 2011

1 commit

  • The log lock is currently used to protect the AIL lists and
    the movements of buffers into and out of them. The lists
    are self contained and no log specific items outside the
    lists are accessed when starting or emptying the AIL lists.

    Hence the operation of the AIL does not require the protection
    of the log lock so split them out into a new AIL specific lock
    to reduce the amount of traffic on the log lock. This will
    also reduce the amount of serialisation that occurs when
    the gfs2_logd pushes on the AIL to move it forward.

    This reduces the impact of log pushing on sequential write
    throughput.

    Signed-off-by: Dave Chinner
    Signed-off-by: Steven Whitehouse

    Dave Chinner
     

21 Jan, 2011

1 commit

  • This has a number of advantages:

    - Reduces contention on the hash table lock
    - Makes the code smaller and simpler
    - Should speed up glock dumps when under load
    - Removes ref count changing in examine_bucket
    - No longer need hash chain lock in glock_put() in common case

    There are some further changes which this enables and which
    we may do in the future. One is to look at using SLAB_RCU,
    and another is to look at using a per-cpu counter for the
    per-sb glock counter, since that is touched twice in the
    lifetime of each glock (but only used at umount time).

    Signed-off-by: Steven Whitehouse
    Cc: Paul E. McKenney

    Steven Whitehouse
     

16 Dec, 2010

1 commit

  • There is no requirement to flush the delete workqueue before a
    gfs2 filesystem is suspended. The workqueue's work will just
    be suspended along with the rest of the tasks on the filesystem.

    The resolves a deadlock situation where the transaction lock's
    demotion code was trying to flush the delete workqueue while at
    the same time, the workqueue was waiting for the transaction
    lock.

    The delete workqueue is flushed by gfs2_make_fs_ro() already, so
    that umount/remount are correctly protected anyway.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

06 Oct, 2010

1 commit


20 Sep, 2010

1 commit

  • With the update of the truncate code, ip->i_disksize and
    inode->i_size are merely copies of each other. This means
    we can remove ip->i_disksize and use inode->i_size exclusively
    reducing the size of a GFS2 inode by 8 bytes.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

01 Mar, 2010

1 commit

  • Since the start of GFS2, an "extra" inode has been used to store
    the metadata belonging to each inode. The only reason for using
    this inode was to have an extra address space, the other fields
    were unused. This means that the memory usage was rather inefficient.

    The reason for keeping each inode's metadata in a separate address
    space is that when glocks are requested on remote nodes, we need to
    be able to efficiently locate the data and metadata which relating
    to that glock (inode) in order to sync or sync and invalidate it
    (depending on the remotely requested lock mode).

    This patch adds a new type of glock, which has in addition to
    its normal fields, has an address space. This applies to all
    inode and rgrp glocks (but to no other glock types which remain
    as before). As a result, we no longer need to have the second
    inode.

    This results in three major improvements:
    1. A saving of approx 25% of memory used in caching inodes
    2. A removal of the circular dependency between inodes and glocks
    3. No confusion between "normal" and "metadata" inodes in super.c

    Although the first of these is the more immediately apparent, the
    second is just as important as it now enables a number of clean
    ups at umount time. Those will be the subject of future patches.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

03 Dec, 2009

1 commit


30 Jul, 2009

1 commit

  • When a file is deleted from a gfs2 filesystem on one node, a dcache
    entry for it may still exist on other nodes in the cluster. If this
    happens, gfs2 will be unable to free this file on disk. Because of this,
    it's possible to have a gfs2 filesystem with no files on it and no free
    space. With this patch, when a node receives a callback notifying it
    that the file is being deleted on another node, it schedules a new
    workqueue thread to remove the file's dcache entry.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

20 May, 2009

1 commit

  • This patch improves the error handling in the case where we
    discover that the summary information in the resource group
    doesn't match the bitmap information while in the process of
    allocating blocks. Originally this resulted in a kernel bug,
    but this patch changes that so that we return -EIO and print
    some messages explaining what went wrong, and how to fix it.

    We also remember locally not to try and allocate from the
    same rgrp again, so that a subsequent allocation in a
    different rgrp should succeed.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

20 Apr, 2009

1 commit


24 Mar, 2009

4 commits

  • This cleans up a number of bits of code mostly based in glops.c.
    A couple of simple functions have been merged into the callers
    to make it more obvious what is going on, the mysterious raising
    of i_writecount around the truncate_inode_pages() call has been
    removed. The meta_go_* operations have been renamed rgrp_go_*
    since that is the only lock type that they are used with.

    The unused argument of gfs2_read_sb has been removed. Also
    a bug has been fixed where a check for the rindex inode was
    in the wrong callback. More comments are added, and the
    debugging code is improved too.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This adds a sysfs file called demote_rq to GFS2's
    per filesystem directory. Its possible to use this
    file to demote arbitrary glocks in exactly the same
    way as if a request had come in from a remote node.

    This is intended for testing issues relating to caching
    of data under glocks. Despite that, the interface is
    generic enough to send requests to any type of glock,
    but be careful as its not always safe to send an
    arbitrary message to an arbitrary glock. For that reason
    and to prevent DoS, this interface is restricted to root
    only.

    The messages look like this:

    :

    Example:

    echo -n "2:13324 EX" >/sys/fs/gfs2/unity:myfs/demote_rq

    Which means "please demote inode glock (type 2) number 13324 so that
    I can get an EX (exclusive) lock". The lock modes are those which
    would normally be sent by a remote node in its callback so if you
    want to unlock a glock, you use EX, to demote to shared, use SH or PR
    (depending on whether you like GFS2 or DLM lock modes better!).

    If the glock doesn't exist, you'll get -ENOENT returned. If the
    arguments don't make sense, you'll get -EINVAL returned.

    The plan is that this interface will be used in combination with
    the blktrace patch which I recently posted for comments although
    it is, of course, still useful in its own right.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch fixes a deadlock when the journal is flushed and there
    are dirty inodes other than the one which caused the journal flush.
    Originally the journal flushing code was trying to obtain the
    transaction glock while running the flush code for an inode glock.
    We no longer require the transaction glock at this point in time
    since we know that any attempt to get the transaction glock from
    another node will result in a journal flush. So if we are flushing
    the journal, we can be sure that the transaction lock is still
    cached from when the transaction was started.

    By inlining a version of gfs2_trans_begin() (minus the bit which
    gets the transaction glock) we can avoid the deadlock problems
    caused if there is a demote request queued up on the transaction
    glock.

    In addition I've also moved the umount rwsem so that it covers
    the glock workqueue, since it all demotions are done by this
    workqueue now. That fixes a bug on umount which I came across
    while fixing the original problem.

    Reported-by: David Teigland
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This is the big patch that I've been working on for some time
    now. There are many reasons for wanting to make this change
    such as:
    o Reducing overhead by eliminating duplicated fields between structures
    o Simplifcation of the code (reduces the code size by a fair bit)
    o The locking interface is now the DLM interface itself as proposed
    some time ago.
    o Fewer lookups of glocks when processing replies from the DLM
    o Fewer memory allocations/deallocations for each glock
    o Scope to do further optimisations in the future (but this patch is
    more than big enough for now!)

    Please note that (a) this patch relates to the lock_dlm module and
    not the DLM itself, that is still a separate module; and (b) that
    we retain the ability to build GFS2 as a standalone single node
    filesystem with out requiring the DLM.

    This patch needs a lot of testing, hence my keeping it I restarted
    my -git tree after the last merge window. That way, this has the maximum
    exposure before its merged. This is (modulo a few minor bug fixes) the
    same patch that I've been posting on and off the the last three months
    and its passed a number of different tests so far.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

05 Jan, 2009

4 commits

  • This patch removes the two daemons, gfs2_scand and gfs2_glockd
    and replaces them with a shrinker which is called from the VM.

    The net result is that GFS2 responds better when there is memory
    pressure, since it shrinks the glock cache at the same rate
    as the VFS shrinks the dcache and icache. There are no longer
    any time based criteria for shrinking glocks, they are kept
    until such time as the VM asks for more memory and then we
    demote just as many glocks as required.

    There are potential future changes to this code, including the
    possibility of sorting the glocks which are to be written back
    into inode number order, to get a better I/O ordering. It would
    be very useful to have an elevator based workqueue implementation
    for this, as that would automatically deal with the read I/O cases
    at the same time.

    This patch is my answer to Andrew Morton's remark, made during
    the initial review of GFS2, asking why GFS2 needs so many kernel
    threads, the answer being that it doesn't :-) This patch is a
    net loss of about 200 lines of code.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Following on from the recent clean up of gfs2_quotad, this patch moves
    the processing of "truncate in progress" inodes from the glock workqueue
    into gfs2_quotad. This fixes a hang due to the "truncate in progress"
    processing requiring glocks in order to complete.

    It might seem odd to use gfs2_quotad for this particular item, but
    we have to use a pre-existing thread since creating a thread implies
    a GFP_KERNEL memory allocation which is not allowed from the glock
    workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd
    may deadlock if used for this operation. gfs2_scand and gfs2_glockd are
    both scheduled for removal at some (hopefully not too distant) future
    point. That leaves only gfs2_quotad whose workload is generally fairly
    light and is easily adapted for this extra task.

    Also, as a result of this change, it opens the way for a future patch to
    make the reading of the inode's information asynchronous with respect to
    the glock workqueue, which is another improvement that has been on the list
    for some time now.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Although the glock dumps print quite a lot of information about
    the glocks themselves, there are more things which can be
    usefully added to the dump realting to the objects themselves.

    This patch adds a few more fields to the inode and resource
    group lines, which should be useful for debugging.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The final field in gfs2_dinode_host was the i_flags field. Thats
    renamed to i_diskflags in order to avoid confusion with the existing
    inode flags, and moved into the inode proper at a suitable location
    to avoid creating a "hole".

    At that point struct gfs2_dinode_host is no longer needed and as
    promised (quite some time ago!) it can now be removed completely.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

27 Jun, 2008

1 commit

  • This patch implements a number of cleanups to the core of the
    GFS2 glock code. As a result a lot of code is removed. It looks
    like a really big change, but actually a large part of this patch
    is either removing or moving existing code.

    There are some new bits too though, such as the new run_queue()
    function which is considerably streamlined. Highlights of this
    patch include:

    o Fixes a cluster coherency bug during SH -> EX lock conversions
    o Removes the "glmutex" code in favour of a single bit lock
    o Removes the ->go_xmote_bh() for inodes since it was duplicating
    ->go_lock()
    o We now only use the ->lm_lock() function for both locks and
    unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
    o The fast path is considerably shortly, giving performance gains
    especially with lock_nolock
    o The glock_workqueue is now used for all the callbacks from the DLM
    which allows us to simplify the lock_dlm module (see following patch)
    o The way is now open to make further changes such as eliminating the two
    threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
    scheme.

    This patch has undergone extensive testing with various test suites
    so it should be pretty stable by now.

    Signed-off-by: Steven Whitehouse
    Cc: Bob Peterson

    Steven Whitehouse
     

12 May, 2008

1 commit

  • This patch fixes a GFS2 filesystem consistency error reported from
    function do_strip. The problem was caused by a timing window
    that allowed two vfs inodes to be created in memory that point
    to the same file. The problem is fixed by making the vfs's
    iget_test, iget_set mechanism check and set a new bit in the
    in-core gfs2_inode structure while the vfs inode spin_lock is held.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

31 Mar, 2008

1 commit


25 Jan, 2008

3 commits

  • Previously we were doing (write data, wait for data, write metadata, wait
    for metadata). After this patch we so (write metadata, write data, wait for
    data, wait for metadata) which should be more efficient.

    Also I noticed that the drop_bh and xmote_bh functions were almost
    identical. In fact the only difference was a single test, and that
    test is such that in the drop_bh case, it would always evaluate to
    the correct result. As such we can use the xmote_bh functions in
    all the places where we were using the drop_bh function and remove
    the drop_bh functions.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The i_cache was designed to keep references to the indirect blocks
    used during block mapping so that they didn't have to be looked
    up continually. The idea failed because there are too many places
    where the i_cache needs to be freed, and this has in the past been
    the cause of many bugs.

    In addition there was no performance benefit being gained since the
    disk blocks in question were cached anyway. So this patch removes
    it in order to simplify the code to prepare for other changes which
    would otherwise have had to add further support for this feature.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This cleans up the mmap() code path for GFS2 by implementing the
    page_mkwrite function for GFS2. We are thus able to use the
    generic filemap_fault function for our ->fault() implementation.

    This now means that shared writable mappings will be much more
    efficiently shared across the cluster if there is a reasonable
    proportion of read activity (the greater proportion, the better).

    As a side effect, it also reduces the size of the code, removes
    special cases from readpage and readpages, and makes the code
    path easier to follow.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

10 Oct, 2007

1 commit

  • The following alters gfs2_trans_add_revoke() to take a struct
    gfs2_bufdata as an argument. This eliminates the memory allocation which
    was previously required by making use of the already existing struct
    gfs2_bufdata. It makes some sanity checks to ensure that the
    gfs2_bufdata has been removed from all the lists before its recycled as
    a revoke structure. This saves one memory allocation and one free per
    revoke structure.

    Also as a result, and to simplify the locking, since there is no longer
    any blocking code in gfs2_trans_add_revoke() we must hold the log lock
    whenever this function is called. This reduces the amount of times we
    take and unlock the log lock.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse