20 Apr, 2011

1 commit

  • This adds support for two new flags. One keeps track of whether
    the glock is on the LRU list or not. The other isn't really a
    flag as such, but an indication of whether the glock has an
    attached object or not. This indication is reported without
    any locking, which is ok since we do not dereference the object
    pointer but merely report whether it is NULL or not.

    Also, this adds a tracepoint that was missing at the point where
    deallocated blocks are removed from the journal.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
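A minimal userspace sketch of how the two indications might be rendered in a glock dump. The struct, its field names, and the 'L'/'o' letters are illustrative assumptions, not the actual GFS2 definitions. What it shows is that the object indication only compares the pointer against NULL, so reading it without a lock cannot lead to a bad dereference.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down glock: only the fields this sketch needs. */
struct glock_sketch {
	bool on_lru;	/* true while the glock is on the LRU list */
	void *object;	/* attached object (e.g. an inode); may be NULL */
};

/*
 * Fill buf with a short flags string: 'L' = on the LRU list,
 * 'o' = an object is attached. gl->object is read without any
 * locking, which is tolerable only because the pointer is compared
 * against NULL and never dereferenced.
 */
static const char *glock_flags_str(const struct glock_sketch *gl,
				   char *buf, size_t len)
{
	size_t i = 0;

	if (len == 0)
		return buf;
	if (gl->on_lru && i + 1 < len)
		buf[i++] = 'L';
	if (gl->object != NULL && i + 1 < len)
		buf[i++] = 'o';
	buf[i] = '\0';
	return buf;
}
```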
     

25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

14 Mar, 2011

1 commit


10 Mar, 2011

2 commits

  • With the plugging now being explicitly controlled by the
    submitter, callers need not pass down unplugging hints
    to the block layer. If they want to unplug, it's because they
    manually plugged on their own - in which case, they should just
    unplug at will.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So let's kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
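To illustrate the explicit on-stack plugging model these two commits describe, here is a small userspace simulation: the submitter opens a plug, requests are batched on it, and nothing reaches the "device" until the submitter itself unplugs. All names are hypothetical; in the kernel the corresponding calls are blk_start_plug() and blk_finish_plug() on a struct blk_plug, whose internals differ from this sketch.

```c
#include <stddef.h>

#define PLUG_MAX 16
#define DEV_MAX 64

/* Hypothetical on-stack plug: requests wait here until the submitter unplugs. */
struct plug_sketch {
	int pending[PLUG_MAX];
	size_t count;
};

/* Hypothetical device: records the order in which requests arrive. */
struct device_sketch {
	int dispatched[DEV_MAX];
	size_t count;
};

static void plug_start(struct plug_sketch *plug)
{
	plug->count = 0;
}

static void dispatch(struct device_sketch *dev, int req)
{
	if (dev->count < DEV_MAX)
		dev->dispatched[dev->count++] = req;
}

/* Unplug: hand every batched request to the device, in order. */
static void plug_finish(struct plug_sketch *plug, struct device_sketch *dev)
{
	for (size_t i = 0; i < plug->count; i++)
		dispatch(dev, plug->pending[i]);
	plug->count = 0;
}

/* With a plug, batch the request; without one, dispatch it at once. */
static void submit(struct plug_sketch *plug, struct device_sketch *dev, int req)
{
	if (plug == NULL) {
		dispatch(dev, req);
		return;
	}
	if (plug->count == PLUG_MAX)
		plug_finish(plug, dev);	/* plug full: flush early */
	plug->pending[plug->count++] = req;
}
```

Because the plug lives on the submitter's stack and is flushed by the submitter, there is no need to pass unplug hints further down the stack.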
     

27 Oct, 2010

1 commit

  • This removes more dead code that was somehow missed by commit 0d99519efef
    (writeback: remove unused nonblocking and congestion checks). There is
    no behavior change except for the removal of two entries from one of the
    ext4 tracing interfaces.

    The nonblocking checks in ->writepages are no longer used because the
    flusher now prefers to block on get_request_wait() rather than skip inodes
    on IO congestion. The latter will lead to more seeky IO.

    The nonblocking checks in ->writepage are no longer used because they are
    redundant with the WB_SYNC_NONE check.

    We no longer set ->nonblocking in VM page out and page migration, because
    a) it's effectively redundant with WB_SYNC_NONE in current code
    b) its old semantic of "Don't get stuck on request queues" is misbehavior:
    that would skip some dirty inodes on congestion and page out others, which
    is unfair in terms of LRU age.

    Inspired by Christoph Hellwig. Thanks!

    Signed-off-by: Wu Fengguang
    Cc: Theodore Ts'o
    Cc: David Howells
    Cc: Sage Weil
    Cc: Steve French
    Cc: Chris Mason
    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
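The redundancy argument can be checked mechanically. The sketch below is a simplified model, not kernel code: it assumes, as the message states for current callers, that ->nonblocking was only ever set together with WB_SYNC_NONE, and compares the old skip-if-busy rule with the new one. The enum and struct names are invented for illustration.

```c
#include <stdbool.h>

/* Stand-ins for WB_SYNC_NONE / WB_SYNC_ALL (names invented for this sketch). */
enum sync_mode_sk { SYNC_NONE_SK, SYNC_ALL_SK };

struct wbc_sk {
	enum sync_mode_sk sync_mode;
	bool nonblocking;	/* the hint that the commit stops using */
};

/* Old rule: skip a busy page if either hint asks for it. */
static bool skip_busy_old(const struct wbc_sk *wbc, bool busy)
{
	return busy && (wbc->nonblocking || wbc->sync_mode == SYNC_NONE_SK);
}

/* New rule: the sync mode alone decides. */
static bool skip_busy_new(const struct wbc_sk *wbc, bool busy)
{
	return busy && wbc->sync_mode == SYNC_NONE_SK;
}
```

The two rules can only disagree when nonblocking is combined with the WB_SYNC_ALL stand-in, which is the combination the message says current code never produces.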
     

08 Aug, 2010

1 commit

  • Remove the current bio flags and reuse the request flags for the bio, too.
    This makes it easier to trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superfluous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name - as
    blkdev.h includes bio.h, that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
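A sketch of what a single flag namespace buys: the flags a filesystem sets on a bio can be copied into the request unchanged, so a trace taken at the driver still sees them. The REQ_SK_* names and bit values are hypothetical stand-ins; only the bi_rw and cmd_flags field names echo the kernel structures of that era.

```c
#include <stdint.h>

/* One shared set of flag bits for bios and requests (names/values hypothetical). */
enum {
	REQ_SK_WRITE = 1u << 0,
	REQ_SK_SYNC  = 1u << 1,
	REQ_SK_META  = 1u << 2,
	REQ_SK_AHEAD = 1u << 3,	/* read-ahead, formerly a bio-only flag */
};

struct bio_sk { uint32_t bi_rw; };
struct request_sk { uint32_t cmd_flags; };

/* With a common namespace, no translation table is needed: a plain copy. */
static void init_request_from_bio(struct request_sk *rq, const struct bio_sk *bio)
{
	rq->cmd_flags = bio->bi_rw;
}
```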
     

12 May, 2010

1 commit


05 May, 2010

1 commit

  • This patch contains various tweaks to how log flushes and active item writeback
    work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reserve now waits
    for gfs2_logd to do the log flushing. Multiple functions were rewritten to
    remove the need to call gfs2_log_lock(). Instead of using one test to see if
    gfs2_logd had work to do, there are now separate tests to check if there
    are too many buffers in the incore log or if there are too many items on the
    active items list.

    This patch is a port of a patch Steve Whitehouse wrote about a year ago, with
    some minor changes. Since gfs2_ail1_start always submits all the active items,
    it no longer needs to keep track of the first ai submitted, so this has been
    removed. In gfs2_log_reserve(), the order of the calls to
    prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has
    been switched. If it called wake_up first there was a small window for a race,
    where logd could run and return before gfs2_log_reserve was ready to get woken
    up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve()
    would be left waiting for gfs2_logd to eventually run because it timed out.
    Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times
    out, and flushes the log, can now be set on mount with ar_commit.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
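The ordering fix in gfs2_log_reserve() is a classic lost-wakeup race. The deterministic model below assumes a wait queue whose wakeup only affects tasks already queued, and that logd runs to completion as soon as it is kicked. It is a sketch of the reasoning, not the GFS2 code; all names are invented.

```c
#include <stdbool.h>

/* Hypothetical wait queue with one possible waiter. */
struct waitq_sk {
	bool waiter_queued;
	bool waiter_woken;
};

/* The reserving task puts itself on the queue (cf. prepare_to_wait_exclusive()). */
static void queue_waiter_sk(struct waitq_sk *wq)
{
	wq->waiter_queued = true;
}

/* Kicking logd: in this model logd runs to completion at once and then
 * wakes whoever is queued at that moment; nobody queued = lost wakeup. */
static void kick_logd_sk(struct waitq_sk *wq)
{
	if (wq->waiter_queued)
		wq->waiter_woken = true;
}

/* Returns true if the reserving task receives logd's wakeup. */
static bool reserve_gets_wakeup(bool queue_before_kick)
{
	struct waitq_sk wq = { false, false };

	if (queue_before_kick) {
		queue_waiter_sk(&wq);	/* fixed order */
		kick_logd_sk(&wq);
	} else {
		kick_logd_sk(&wq);	/* old order: logd may finish first */
		queue_waiter_sk(&wq);
	}
	return wq.waiter_woken;
}
```

In the old order the waiter misses the wakeup and, as the message notes, is left relying on logd's timeout.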
     

01 Mar, 2010

1 commit

  • Since the start of GFS2, an "extra" inode has been used to store
    the metadata belonging to each inode. The only reason for using
    this inode was to have an extra address space, the other fields
    were unused. This means that the memory usage was rather inefficient.

    The reason for keeping each inode's metadata in a separate address
    space is that when glocks are requested on remote nodes, we need to
    be able to efficiently locate the data and metadata which relate
    to that glock (inode) in order to sync or sync and invalidate it
    (depending on the remotely requested lock mode).

    This patch adds a new type of glock which, in addition to
    its normal fields, has an address space. This applies to all
    inode and rgrp glocks (but to no other glock types, which remain
    as before). As a result, we no longer need to have the second
    inode.

    This results in three major improvements:
    1. A saving of approx 25% of memory used in caching inodes
    2. A removal of the circular dependency between inodes and glocks
    3. No confusion between "normal" and "metadata" inodes in super.c

    Although the first of these is the more immediately apparent, the
    second is just as important as it now enables a number of clean
    ups at umount time. Those will be the subject of future patches.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
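One plausible way to give a glock an address space without a second inode is to allocate both in one block, with the address space placed directly after the glock, so that either can be found from the other by pointer arithmetic. The types and helpers below are simplified, hypothetical stand-ins for illustration only.

```c
#include <stdlib.h>

/* Hypothetical, trimmed-down types. */
struct aspace_sk { unsigned long nrpages; };
struct glock_sk {
	unsigned long flags;
	unsigned int type;	/* e.g. inode, rgrp, ... */
	int has_aspace;		/* only inode and rgrp glocks would set this */
};

/* Allocate a glock, optionally with an address space right behind it. */
static struct glock_sk *glock_alloc_sk(unsigned int type, int with_aspace)
{
	size_t size = sizeof(struct glock_sk) +
		      (with_aspace ? sizeof(struct aspace_sk) : 0);
	struct glock_sk *gl = calloc(1, size);

	if (gl) {
		gl->type = type;
		gl->has_aspace = with_aspace;
	}
	return gl;
}

/* The address space, when present, sits immediately after the glock. */
static struct aspace_sk *glock2aspace_sk(struct glock_sk *gl)
{
	return gl->has_aspace ? (struct aspace_sk *)(gl + 1) : NULL;
}

/* Reverse mapping: from the address space back to its owning glock. */
static struct glock_sk *aspace2glock_sk(struct aspace_sk *as)
{
	return (struct glock_sk *)as - 1;
}
```

The reverse mapping is what removes the need for an extra inode: code holding only the mapping can reach the glock directly.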
     

11 Jan, 2010

1 commit


22 May, 2009

1 commit

  • This patch renames the ops_*.c files which have no counterpart
    without the ops_ prefix in order to shorten the name and make
    it more readable. In addition, ops_address.h (which was very
    small) is moved into inode.h and inode.h is cleaned up by
    adding extern where required.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

11 May, 2009

2 commits

  • This adds a GFS2 specific writepage for metadata, rather than
    continuing to use the VFS function. As a result we now tag all
    our metadata I/O with the correct flag so that blktraces will
    now be less confusing.

    Also, the generic function was checking for a number of corner
    cases which cannot happen on the metadata address spaces so that
    this should be faster too.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • After Jens' recent updates:
    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a1f242524c3c1f5d40f1c9c343427e34d1aadd6e
    et al. this is a patch to bring gfs2 up to date with the core
    code. Also I've managed to squash another call to ll_rw_block()
    along the way.

    There is still one part of the GFS2 I/O paths which is not correctly
    annotated and that is due to the sharing of the writeback code between
    the data and metadata address spaces. I would like to change that too,
    but this patch is still worth doing on its own, I think.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

24 Mar, 2009

2 commits

  • This cleans up a number of bits of code mostly based in glops.c.
    A couple of simple functions have been merged into the callers
    to make it more obvious what is going on, the mysterious raising
    of i_writecount around the truncate_inode_pages() call has been
    removed. The meta_go_* operations have been renamed rgrp_go_*
    since that is the only lock type that they are used with.

    The unused argument of gfs2_read_sb has been removed. Also
    a bug has been fixed where a check for the rindex inode was
    in the wrong callback. More comments are added, and the
    debugging code is improved too.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This is the big patch that I've been working on for some time
    now. There are many reasons for wanting to make this change
    such as:
    o Reducing overhead by eliminating duplicated fields between structures
    o Simplification of the code (reduces the code size by a fair bit)
    o The locking interface is now the DLM interface itself as proposed
    some time ago.
    o Fewer lookups of glocks when processing replies from the DLM
    o Fewer memory allocations/deallocations for each glock
    o Scope to do further optimisations in the future (but this patch is
    more than big enough for now!)

    Please note that (a) this patch relates to the lock_dlm module and
    not the DLM itself, that is still a separate module; and (b) that
    we retain the ability to build GFS2 as a standalone single node
    filesystem without requiring the DLM.

    This patch needs a lot of testing, which is why I restarted
    my -git tree after the last merge window. That way, this has the maximum
    exposure before it's merged. This is (modulo a few minor bug fixes) the
    same patch that I've been posting on and off over the last three months,
    and it has passed a number of different tests so far.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

27 Jun, 2008

1 commit

  • This patch implements a number of cleanups to the core of the
    GFS2 glock code. As a result a lot of code is removed. It looks
    like a really big change, but actually a large part of this patch
    is either removing or moving existing code.

    There are some new bits too though, such as the new run_queue()
    function which is considerably streamlined. Highlights of this
    patch include:

    o Fixes a cluster coherency bug during SH -> EX lock conversions
    o Removes the "glmutex" code in favour of a single bit lock
    o Removes the ->go_xmote_bh() for inodes since it was duplicating
    ->go_lock()
    o We now only use the ->lm_lock() function for both locks and
    unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
    o The fast path is considerably shorter, giving performance gains
    especially with lock_nolock
    o The glock_workqueue is now used for all the callbacks from the DLM
    which allows us to simplify the lock_dlm module (see following patch)
    o The way is now open to make further changes such as eliminating the two
    threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
    scheme.

    This patch has undergone extensive testing with various test suites
    so it should be pretty stable by now.

    Signed-off-by: Steven Whitehouse
    Cc: Bob Peterson

    Steven Whitehouse
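The point that an unlock is just a lock with target mode LM_ST_UNLOCKED can be sketched as a single entry point handling every state transition. The state names mimic the LM_ST_* style, but the enum values and the struct are assumptions made for this example, and a real lock manager grants requests asynchronously.

```c
/* Lock states in the LM_ST_* style; numeric values are illustrative only. */
enum lm_state_sk { ST_UNLOCKED, ST_SHARED, ST_DEFERRED, ST_EXCLUSIVE };

struct lock_sk {
	enum lm_state_sk state;
	unsigned int requests;	/* how many times the single entry point ran */
};

/* One entry point for every transition, including SH -> EX conversions. */
static void lm_lock_sk(struct lock_sk *lk, enum lm_state_sk target)
{
	lk->requests++;
	lk->state = target;	/* granted immediately in this sketch */
}

/* Unlocking is simply a request whose target is ST_UNLOCKED. */
static void lm_unlock_sk(struct lock_sk *lk)
{
	lm_lock_sk(lk, ST_UNLOCKED);
}
```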
     

12 May, 2008

1 commit

  • This patch fixes a GFS2 filesystem consistency error reported from
    function do_strip. The problem was caused by a timing window
    that allowed two vfs inodes to be created in memory that point
    to the same file. The problem is fixed by making the vfs's
    iget_test, iget_set mechanism check and set a new bit in the
    in-core gfs2_inode structure while the vfs inode spin_lock is held.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

25 Jan, 2008

4 commits

  • This patch fixes a minor typo. Surprisingly, it still compiled.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This patch optimizes function gfs2_meta_read. Basically, gfs2_meta_wait
    was being called regardless of whether a disk read was requested.
    This just pulls that wait into the if that triggers the read.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
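A before/after model of the gfs2_meta_read() change, with counters standing in for real I/O. The helper names, the buffer struct, and the use of an "uptodate" test as the condition that triggers the read are assumptions for illustration.

```c
#include <stdbool.h>

/* Counters standing in for real I/O (hypothetical). */
struct io_stats_sk { int reads; int waits; };

struct buf_sk { bool uptodate; };

/* Before: the wait happened even when no read was issued. */
static void meta_read_old(struct buf_sk *bh, struct io_stats_sk *st)
{
	if (!bh->uptodate) {
		st->reads++;
		bh->uptodate = true;
	}
	st->waits++;	/* unconditional wait */
}

/* After: only wait for I/O that this call actually started. */
static void meta_read_new(struct buf_sk *bh, struct io_stats_sk *st)
{
	if (!bh->uptodate) {
		st->reads++;
		st->waits++;
		bh->uptodate = true;
	}
}
```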
     
  • This set of address space operations was missing a sync_page
    operation.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The i_cache was designed to keep references to the indirect blocks
    used during block mapping so that they didn't have to be looked
    up continually. The idea failed because there are too many places
    where the i_cache needs to be freed, and this has in the past been
    the cause of many bugs.

    In addition there was no performance benefit being gained since the
    disk blocks in question were cached anyway. So this patch removes
    it in order to simplify the code to prepare for other changes which
    would otherwise have had to add further support for this feature.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

10 Oct, 2007

4 commits

  • * GFS2 has been using the i_cache array to store its indirect meta blocks.
    Its flush routine doesn't correctly clean up all the entries. The
    problem shows up when multiple nodes write to the same file
    simultaneously. Upon an exclusive glock transfer, if the file is a large
    sparse file whose indirect meta blocks span multiple array entries with
    "zero" entries in between, the flush routine stops prematurely and
    leaves old (stale) entries around.
    This leads to several nasty issues, including data corruption.
    * Fix gfs2_get_block_noalloc checking to correctly return EIO upon
    an unmapped buffer.

    Signed-off-by: Wendy Cheng
    Signed-off-by: Steven Whitehouse

    Wendy Cheng
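The flush bug is easy to reproduce in miniature. In the sketch below, a hypothetical fixed-size array of cached buffers has a hole in it, as a sparse file would produce; the buggy loop treats the first empty slot as the end of the array, while the fixed loop visits every slot. Names and the array size are invented.

```c
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 8

/* Hypothetical per-inode cache of indirect-block buffers; NULL = empty slot. */
struct icache_sk { void *slot[CACHE_SLOTS]; };

/* Buggy: stops at the first empty slot, leaving later (stale) entries behind. */
static size_t flush_buggy(struct icache_sk *c)
{
	size_t freed = 0;

	for (size_t i = 0; i < CACHE_SLOTS && c->slot[i] != NULL; i++) {
		free(c->slot[i]);
		c->slot[i] = NULL;
		freed++;
	}
	return freed;
}

/* Fixed: visits every slot, since holes may sit between valid entries. */
static size_t flush_fixed(struct icache_sk *c)
{
	size_t freed = 0;

	for (size_t i = 0; i < CACHE_SLOTS; i++) {
		if (c->slot[i] != NULL) {
			free(c->slot[i]);
			c->slot[i] = NULL;
			freed++;
		}
	}
	return freed;
}
```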
     
  • This patch cleans up the code for writing journaled data into the log.
    It also removes the need to allocate a small "tag" structure for each
    block written into the log. Instead we just keep count of the outstanding
    I/O so that we can be sure that it's all been written at the correct time.
    Another result of this patch is that a number of ll_rw_block() calls
    have become submit_bh() calls, closing some races at the same time.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
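A minimal sketch of replacing per-block tag allocations with a single in-flight counter, using C11 atomics. The struct and function names are hypothetical; in the kernel this would typically be an atomic counter incremented at submission and decremented in the I/O completion handler.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One counter replaces a small "tag" allocation per logged block (sketch). */
struct log_io_sk { atomic_int pending; };

/* Submission path: one more block in flight. */
static void log_submit_sk(struct log_io_sk *io)
{
	atomic_fetch_add(&io->pending, 1);
}

/* Completion path: one block has reached the disk. */
static void log_end_io_sk(struct log_io_sk *io)
{
	atomic_fetch_sub(&io->pending, 1);
}

/* Everything submitted so far has been written once the count is zero. */
static bool log_all_written_sk(struct log_io_sk *io)
{
	return atomic_load(&io->pending) == 0;
}
```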
     
  • The following alters gfs2_trans_add_revoke() to take a struct
    gfs2_bufdata as an argument. This eliminates the memory allocation which
    was previously required by making use of the already existing struct
    gfs2_bufdata. It makes some sanity checks to ensure that the
    gfs2_bufdata has been removed from all the lists before it's recycled as
    a revoke structure. This saves one memory allocation and one free per
    revoke structure.

    Also as a result, and to simplify the locking, since there is no longer
    any blocking code in gfs2_trans_add_revoke() we must hold the log lock
    whenever this function is called. This reduces the number of times we
    take and release the log lock.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
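The recycling check can be sketched with a tiny circular list in which an unlinked node points at itself, similar in spirit to the kernel's list_empty() convention. The bufdata fields and function names below are hypothetical, and returning false is a simplification: a real implementation would more likely treat a still-linked buffer as a fatal assertion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular list node; an unlinked node points at itself. */
struct list_sk { struct list_sk *prev, *next; };

static void list_init_sk(struct list_sk *n) { n->prev = n->next = n; }
static bool list_empty_sk(const struct list_sk *n) { return n->next == n; }

static void list_add_sk(struct list_sk *n, struct list_sk *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Hypothetical buffer descriptor with two list memberships. */
struct bufdata_sk {
	struct list_sk le_list;		/* log element list */
	struct list_sk ail_list;	/* active items list */
};

/* Reuse an existing bufdata as a revoke entry instead of allocating one.
 * Caller is assumed to hold the log lock; nothing here can block. */
static bool trans_add_revoke_sk(struct bufdata_sk *bd, struct list_sk *revokes)
{
	if (!list_empty_sk(&bd->le_list) || !list_empty_sk(&bd->ail_list))
		return false;	/* still in use: refuse to recycle */
	list_add_sk(&bd->le_list, revokes);
	return true;
}
```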
     
  • gfs2_pin and gfs2_unpin are only used in lops.c, despite being
    defined in meta_io.c, so this patch moves them into lops.c and
    makes them static. At the same time, it's possible to clean up
    the locking in the buf and databuf _lo_add() functions so that
    we only need to grab the spinlock once. We also have to move
    lock_buffer() around the _lo_add() functions, because it can no
    longer be done in gfs2_pin() now that we hold the spinlock
    for the duration of that function.

    As a result, the code shrinks by 12 lines and we do far fewer
    operations when adding buffers to the log. It also makes the
    code somewhat easier to read & understand.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

09 Jul, 2007

1 commit

  • This patch passes all my nasty tests that were causing the code to
    fail under one circumstance or another. Here is a complete summary
    of all changes from today's git tree, in order of appearance:

    1. There are now separate variables for metadata buffer accounting.
    2. Variable sd_log_num_hdrs is no longer needed, since the header
    accounting is taken care of by the reserve/refund sequence.
    3. Fixed a tiny grammatical problem in a comment.
    4. Added a new function "calc_reserved" to calculate the reserved
    log space. This isn't entirely necessary, but it has two benefits:
    First, it simplifies the gfs2_log_refund function greatly.
    Second, it allows for easier debugging because I could sprinkle the
    code with calls to this function to make sure the accounting is
    proper (by adding asserts and printks) at strategic points of the code.
    5. In log_pull_tail there apparently was a kludge to fix up the
    accounting based on a "pull" parameter. The buffer accounting is
    now done properly, so the kludge was removed.
    6. File sync operations were making a call to gfs2_log_flush that
    writes another journal header. Since that header was unplanned
    for (reserved) by the reserve/refund sequence, the free space had
    to be decremented so that when log_pull_tail gets called, the free
    space is adjusted properly. (Did I hear you call that a kludge?
    Well, maybe, but a lot more justifiable than the one I removed).
    7. In the gfs2_log_shutdown code, it optionally syncs the log by
    specifying the PULL parameter to log_write_header. I'm not sure
    this is necessary anymore. It just seems to me there could be
    cases where shutdown is called while there are outstanding log
    buffers.
    8. In the (data)buf_lo_before_commit functions, I changed some offset
    values from being calculated on the fly to being constants. That
    simplified some code and we might as well let the compiler do the
    calculation once rather than redoing those cycles at run time.
    9. This version has my rewritten databuf_lo_add function.
    This version is much more like its predecessor, buf_lo_add, which
    makes it easier to understand. Again, this might not be necessary,
    but it seems as if this one works as well as the previous one,
    maybe even better, so I decided to leave it in.
    10. In databuf_lo_before_commit, a previous data corruption problem
    was caused by going off the end of the buffer. The proper solution
    is to have the proper limit in place, rather than stopping earlier.
    (Thus my previous attempt to fix it is wrong).
    If you don't wrap the buffer, you're stopping too early and that
    causes more log buffer accounting problems.
    11. In lops.h there are two new (previously mentioned) constants for
    figuring out the data offset for the journal buffers.
    12. There are also two new functions, buf_limit and databuf_limit to
    calculate how many entries will fit in the buffer.
    13. In function gfs2_meta_wipe, it needs to distinguish between pinned
    metadata buffers and journaled data buffers for proper journal buffer
    accounting. It can't use the JDATA gfs2_inode flag because it's
    sometimes passed the "real" inode and sometimes the "metadata
    inode" and the inode flags will be random bits in a metadata
    gfs2_inode. It needs to base its decision on which was passed in.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Robert Peterson
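Items 11 and 12 describe computing how many log entries fit in one journal block. A sketch of that arithmetic, assuming a 72-byte log descriptor header (an illustrative figure, not taken from the message), one 64-bit block number per metadata entry, and two 64-bit words per journaled-data entry. Under those assumptions a 4096-byte block holds (4096 - 72) / 8 = 503 metadata entries and (4096 - 72) / 16 = 251 data entries.

```c
#include <stdint.h>

/* ASSUMPTION: descriptor header size, used only to make the arithmetic concrete. */
#define LOG_DESC_HDR_SK 72u

/* Metadata (buf) entries: one 64-bit block number each. */
static unsigned int buf_limit_sk(unsigned int bsize)
{
	return (bsize - LOG_DESC_HDR_SK) / sizeof(uint64_t);
}

/* Journaled-data (databuf) entries: two 64-bit words each (assumed). */
static unsigned int databuf_limit_sk(unsigned int bsize)
{
	return (bsize - LOG_DESC_HDR_SK) / (2 * sizeof(uint64_t));
}
```

Using the exact limit, rather than stopping early, is what keeps the per-block accounting consistent with the reserve/refund sequence described in item 10.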
     

12 Feb, 2007

1 commit

  • Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
    corresponding "kmem_cache_zalloc()" call.

    Signed-off-by: Robert P. J. Day
    Cc: "Luck, Tony"
    Cc: Andi Kleen
    Cc: Roland McGrath
    Cc: James Bottomley
    Cc: Greg KH
    Acked-by: Joel Becker
    Cc: Steven Whitehouse
    Cc: Jan Kara
    Cc: Michael Halcrow
    Cc: "David S. Miller"
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
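A userspace analogue of this conversion: calloc() plays the role of kmem_cache_zalloc() here, replacing an allocate-then-memset pair with one call that returns zeroed memory. The struct is an invented example.

```c
#include <stdlib.h>
#include <string.h>

/* Invented example object. */
struct obj_sk { int a; long b; char name[16]; };

/* Old pattern: allocate, then clear by hand. */
static struct obj_sk *obj_alloc_then_clear(void)
{
	struct obj_sk *o = malloc(sizeof(*o));

	if (o)
		memset(o, 0, sizeof(*o));
	return o;
}

/* New pattern: a single call that returns zeroed memory. */
static struct obj_sk *obj_zalloc(void)
{
	return calloc(1, sizeof(struct obj_sk));
}
```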
     

30 Nov, 2006

3 commits

  • Since the superblock and the address_space are determined by the
    glock, we might as well just pass that as the argument since all
    the callers already have that available.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • By moving gfs2_meta_syncfs() into log.c, gfs2_ail1_start()
    can be made static.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This fixes a bug which resulted in poor performance due to flushing
    the journal too often. The code path in question was via the inode_go_sync()
    function in glops.c. The solution is not to flush the journal immediately
    when inodes are ejected from memory, but batch up the work for glockd to
    deal with later on. This means that glocks may now live on beyond the end of
    the lifetime of their inodes (but not very much longer in the normal case).

    Also fixed in this patch is a bug (which was hidden by the bug mentioned above) in
    the calculation of the number of free journal blocks.

    The gfs2_logd process has been altered to be more responsive to the journal
    filling up. We now wake it up when the number of uncommitted journal blocks
    has reached the threshold level rather than trying to flush directly at the
    end of each transaction. This again means doing fewer, but larger, log
    flushes in general.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
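A simple model of the new batching policy: each transaction only adds to an uncommitted-block count, and logd is woken once the count reaches a threshold. The struct, the threshold of 10 blocks, and the 2-block transactions used below are assumptions chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical journal accounting. */
struct journal_sk {
	unsigned int uncommitted;	/* blocks logged but not yet flushed */
	unsigned int threshold;		/* wake logd at this many blocks */
	unsigned int flushes;
	bool logd_woken;
};

/* What logd does once woken: one larger flush covering everything pending. */
static void logd_flush_sk(struct journal_sk *j)
{
	j->flushes++;
	j->uncommitted = 0;
	j->logd_woken = false;
}

/* End of transaction: account the blocks; only wake logd past the threshold. */
static void trans_end_sk(struct journal_sk *j, unsigned int blocks)
{
	j->uncommitted += blocks;
	if (j->uncommitted >= j->threshold)
		j->logd_woken = true;
}
```

With those numbers, 20 transactions produce 4 flushes, where flushing at the end of each transaction would produce 20.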
     

03 Oct, 2006

1 commit


02 Oct, 2006

1 commit


28 Sep, 2006

1 commit

  • The following patches reduce the size of the VFS inode structure by 28 bytes
    on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction
    in the inode size on a UP kernel that is configured in a production mode
    (i.e., with no spinlock or other debugging functions enabled; if you want to
    save memory taken up by in-core inodes, the first thing you should do is
    disable the debugging options; they are responsible for a huge amount of bloat
    in the VFS inode structure).

    This patch:

    The filesystem or device-specific pointer in the inode is inside a union,
    which is pretty pointless given that all 30+ users of this field have been
    using the void pointer. Get rid of the union and rename it to i_private, with
    a comment to explain who is allowed to use the void pointer. This is just a
    cleanup, but it allows us to reuse the union 'u' for something where
    the union will actually be used.

    Signed-off-by: "Theodore Ts'o"
    Cc: Steven Whitehouse
    Signed-off-by: Andrew Morton

    Theodore Ts'o
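A sketch of the resulting pattern: with the union gone, a filesystem or driver stores its in-core data behind the single void pointer and recovers it with a small accessor. Only the i_private field name comes from the message; the other types and helpers are hypothetical.

```c
#include <stddef.h>

/* Trimmed-down inode: the union is gone, one void pointer remains. */
struct inode_sk {
	unsigned long i_ino;
	void *i_private;	/* fs or device private pointer */
};

/* Hypothetical per-filesystem in-core data. */
struct myfs_info_sk { int generation; };

static void myfs_attach_sk(struct inode_sk *inode, struct myfs_info_sk *info)
{
	inode->i_private = info;
}

/* Accessor hides the void pointer conversion from the rest of the fs. */
static struct myfs_info_sk *myfs_i_sk(const struct inode_sk *inode)
{
	return inode->i_private;
}
```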
     

22 Sep, 2006

1 commit

  • Fix a bug in the directory reading code, where we might have dereferenced
    a NULL pointer in case of OOM. Updated the directory code to use the new
    & improved version of gfs2_meta_ra() which now returns the first block
    that was being read. Previously it released that block, requiring the
    calling code to grab it again at each point it was called.

    Also turned off readahead on directory lookups since we are reading a
    hash table, and therefore reading the entries in order is very
    unlikely. Readahead is still used for all other calls to the
    directory reading function (e.g. when growing the hash table).

    Removed the DIO_START constant. Apart from a couple of places, every
    caller that used it was unconditionally starting I/O, so I've removed
    it and made the couple of exceptions to this rule into separate
    functions.

    Also hunted through the other DIO flags and removed them as arguments
    from functions which were always called with the same combination of
    arguments.

    Updated gfs2_meta_indirect_buffer to be a bit more efficient and
    hopefully also be a bit easier to read.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

21 Sep, 2006

1 commit


19 Sep, 2006

1 commit

  • lm_interface.h has a few out-of-tree clients such as GFS1
    and userland tools.

    Right now, these clients keep a copy of the file in their build tree
    that can go out of sync.

    Move lm_interface.h to include/linux, export it to userland and
    clean up fs/gfs2 to use the new location.

    Signed-off-by: Fabio M. Di Nitto
    Signed-off-by: Steven Whitehouse

    Fabio Massimo Di Nitto
     

05 Sep, 2006

3 commits