08 Apr, 2013

1 commit

  • In order to allow transactions and log flushes to happen at the same
    time, gfs2 needs to move the transaction accounting and active items
    list code into the gfs2_trans structure. As a first step toward this,
    this patch removes the gfs2_ail structure, and handles the active items
    list in the gfs_trans structure. This keeps gfs2 from allocating an ail
    structure on log flushes, and gives us a struture that can later be used
    to store the transaction accounting outside of the gfs2 superblock
    structure.

    With this patch, at the end of a transaction, gfs2 will add the
    gfs2_trans structure to the superblock if there is not one already.
    This structure now has the active items fields that were previously in
    gfs2_ail. This is not necessary in the case where the transaction was
    simply used to add revokes, since these are never written outside of the
    journal, and thus, don't need an active items list.

    Also, in order to make sure that the transaction structure is not
    removed while it's still in use by gfs2_trans_end, unlocking the
    sd_log_flush_lock has to happen slightly later in ending the
    transaction.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

29 Jan, 2013

1 commit

  • Instead of using a list of buffers to write ahead of the journal
    flush, this now uses a list of inodes and calls ->writepages
    via filemap_fdatawrite() in order to achieve the same thing. For
    most use cases this results in a shorter ordered write list,
    as well as much larger i/os being issued.

    The ordered write list is sorted by inode number before writing
    in order to retain the disk block ordering between inodes as
    per the previous code.

    The previous ordered write code used to conflict in its assumptions
    about how to write out the disk blocks with mpage_writepages()
    so that with this updated version we can also use mpage_writepages()
    for GFS2's ordered write, writepages implementation. So we will
    also send larger i/os from writeback too.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

02 May, 2012

1 commit

  • This patch eliminates the gfs2_log_element data structure and
    rolls its two components into the gfs2_bufdata. This makes the code
    easier to understand and makes it easier to migrate to a rbtree
    to keep the list sorted.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

24 Apr, 2012

4 commits

  • This patch removes a log lock from around atomic operation where
    it is not needed, removes an unused variable, and also changes
    a void pointer used incorrectly to a struct page pointer.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This is another clean up in the logging code. This per-transaction
    list was largely unused. Its main function was to ensure that the
    number of buffers in a transaction was correct, however that counter
    was only used to check the number of buffers in the bd_list_tr, plus
    an assert at the end of each transaction. With the assert now changed
    to use the calculated buffer counts, we can remove both bd_list_tr and
    its associated counter.

    This should make the code easier to understand as well as shrinking
    a couple of structures.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Prior to this patch, we have two ways of sending i/o to the log.
    One of those is used when we need to allocate both the data
    to be written itself and also a buffer head to submit it. This
    is done via sb_getblk and friends. This is used mostly for writing
    log headers.

    The other method is used when writing blocks which have some
    in-place counterpart. This is the case for all the metadata
    blocks which are journalled, and when journaled data is in use,
    for unescaped journalled data blocks.

    This patch replaces both of those two methods, and about half
    a dozen separate i/o submission points with a single i/o
    submission function. We also go direct to bio rather than
    using buffer heads, since this allows us to build i/o
    requests of the maximum size for the block device in
    question. It also reduces the memory required for flushing
    the log, which can be very useful in low memory situations.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The "pull" argument to log_write_header() is only used
    for debug purposes and it is not really needed any more. There
    are other tests for this particular problem, so I think we can
    dispose of it in order to simplify the code.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

09 Mar, 2012

1 commit

  • We already send both a pre and post flush to the block device
    when writing a journal header. There is no need to wait for
    the previous I/O specifically when we do this, unless we've
    turned "barriers" off.

    As a side effect, this also cleans up the code path for flushing
    the journal and makes it more readable.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

29 Feb, 2012

3 commits


09 Jan, 2012

1 commit

  • * 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
    PM / Hibernate: Implement compat_ioctl for /dev/snapshot
    PM / Freezer: fix return value of freezable_schedule_timeout_killable()
    PM / shmobile: Allow the A4R domain to be turned off at run time
    PM / input / touchscreen: Make st1232 use device PM QoS constraints
    PM / QoS: Introduce dev_pm_qos_add_ancestor_request()
    PM / shmobile: Remove the stay_on flag from SH7372's PM domains
    PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume
    PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode
    PM: Drop generic_subsys_pm_ops
    PM / Sleep: Remove forward-only callbacks from AMBA bus type
    PM / Sleep: Remove forward-only callbacks from platform bus type
    PM: Run the driver callback directly if the subsystem one is not there
    PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
    PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412.
    PM / Sleep: Merge internal functions in generic_ops.c
    PM / Sleep: Simplify generic system suspend callbacks
    PM / Hibernate: Remove deprecated hibernation snapshot ioctls
    PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
    ARM: S3C64XX: Implement basic power domain support
    PM / shmobile: Use common always on power domain governor
    ...

    Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused
    XBT_FORCE_SLEEP bit

    Linus Torvalds
     

22 Nov, 2011

1 commit

  • There is no reason to export two functions for entering the
    refrigerator. Calling refrigerator() instead of try_to_freeze()
    doesn't save anything noticeable or removes any race condition.

    * Rename refrigerator() to __refrigerator() and make it return bool
    indicating whether it scheduled out for freezing.

    * Update try_to_freeze() to return bool and relay the return value of
    __refrigerator() if freezing().

    * Convert all refrigerator() users to try_to_freeze().

    * Update documentation accordingly.

    * While at it, add might_sleep() to try_to_freeze().

    Signed-off-by: Tejun Heo
    Cc: Samuel Ortiz
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Steven Whitehouse
    Cc: Andrew Morton
    Cc: Jan Kara
    Cc: KONISHI Ryusuke
    Cc: Christoph Hellwig

    Tejun Heo
     

08 Nov, 2011

1 commit

  • Christoph has split up REQ_PRIO from REQ_META. That means that
    we can drop REQ_PRIO from places where is it not needed. I'm
    not at all sure that the combination WRITE_FLUSH_FUA | REQ_PRIO
    makes any kind of sense, anyway.

    In addition, I've added REQ_META to one place in the code where
    it was missing. REQ_PRIO has been left for read/writes triggered
    by glock acquisition and writeback only. We can adjust it again
    if required, but these are the most important points from a
    performance perspective.

    Signed-off-by: Steven Whitehouse
    Cc: Christoph Hellwig

    Steven Whitehouse
     

23 Aug, 2011

1 commit

  • Add a new REQ_PRIO to let requests preempt others in the cfq I/O schedule,
    and lave REQ_META purely for marking requests as metadata in blktrace.

    All existing callers of REQ_META except for XFS are updated to also
    set REQ_PRIO for now.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Namhyung Kim
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

14 Jul, 2011

1 commit

  • This patch contains a few misc fixes which resolve a recently
    reported issue. This patch has been a real team effort and has
    received a lot of testing.

    The first issue is that the ail lock needs to be held over a few
    more operations. The lock thats added into gfs2_releasepage() may
    possibly be a candidate for replacing with RCU at some future
    point, but at this stage we've gone for the obvious fix.

    The second issue is that gfs2_write_inode() can end up calling
    a glock recursively when called from gfs2_evict_inode() via the
    syncing code, so it needs a guard added.

    The third issue is that we either need to not truncate the metadata
    pages of inodes which have zero link count, but which we cannot
    deallocate due to them still being in use by other nodes, or we need
    to ensure that those pages have all made it through the journal and
    ail lists first. This patch takes the former approach, but the
    latter has also been tested and there is nothing to choose between
    them performance-wise. So again, we could revise that decision
    in the future.

    Also, the inode eviction process is now better documented.

    Signed-off-by: Steven Whitehouse
    Tested-by: Bob Peterson
    Tested-by: Abhijith Das
    Reported-by: Barry J. Marson
    Reported-by: David Teigland

    Steven Whitehouse
     

22 May, 2011

1 commit

  • The ail flush code has always relied upon log flushing to prevent
    it from spinning needlessly. This fixes it to wait on the last
    I/O request submitted (we don't need to wait for all of it)
    instead of either spinning with io_schedule or sleeping.

    As a result cpu usage of gfs2_logd is much reduced with certain
    workloads.

    Reported-by: Abhijith Das
    Tested-by: Abhijith Das
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

03 May, 2011

1 commit

  • In the recent patches to update the AIL list code, I managed to
    forget that the ail list lock got dropped, even though I
    added a comment specifically to remind myself :(

    Reported-by: Barry Marson
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

20 Apr, 2011

3 commits


25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

14 Mar, 2011

1 commit


11 Mar, 2011

1 commit

  • The log lock is currently used to protect the AIL lists and
    the movements of buffers into and out of them. The lists
    are self contained and no log specific items outside the
    lists are accessed when starting or emptying the AIL lists.

    Hence the operation of the AIL does not require the protection
    of the log lock so split them out into a new AIL specific lock
    to reduce the amount of traffic on the log lock. This will
    also reduce the amount of serialisation that occurs when
    the gfs2_logd pushes on the AIL to move it forward.

    This reduces the impact of log pushing on sequential write
    throughput.

    Signed-off-by: Dave Chinner
    Signed-off-by: Steven Whitehouse

    Dave Chinner
     

10 Mar, 2011

1 commit

  • With the plugging now being explicitly controlled by the
    submitter, callers need not pass down unplugging hints
    to the block layer. If they want to unplug, it's because they
    manually plugged on their own - in which case, they should just
    unplug at will.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

19 Oct, 2010

1 commit


17 Sep, 2010

1 commit


10 Sep, 2010

1 commit


08 Aug, 2010

2 commits

  • Remove the current bio flags and reuse the request flags for the bio, too.
    This allows to more easily trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superflous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name - as
    blkdev.h includes bio.h that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • A barrier request should by defintion have priority in get_request
    and let the queue be unplugged immediately as it's blocking all forward
    progress due to the queue draining.

    Most filesystems already get this implicitly by the way how submit_bh
    treats the buffer_ordered flag, and gfs2 sets it explicitly. But btrfs
    and XFS are still forgetting to set the flag, as is blkdev_issue_flush
    and some places in DM/MD.

    For XFS on metadata heavy workloads this gives a consistent speedup
    in the 2-3% range.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

21 May, 2010

1 commit

  • The previous patch I wrote for reclaiming unlinked dinodes
    had some shortcomings and did not prevent all hangs.
    This version is much cleaner and more logical, and has
    passed very difficult testing. Sorry for the churn.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

06 May, 2010

1 commit

  • The following patch adds a message to indicate when barriers have been
    disabled due to a block device which doesn't support them. You could
    already tell this via the mount options in /proc/mounts, but all the
    other filesystems also log a message at the same time.

    Also, the same mechanisms are used to indicate when the lock
    demote interface has been used (only ever used for debugging)
    which is a request from our support team.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

05 May, 2010

1 commit

  • This patch contains various tweaks to how log flushes and active item writeback
    work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reseve now waits
    for gfs2_logd to do the log flushing. Multiple functions were rewritten to
    remove the need to call gfs2_log_lock(). Instead of using one test to see if
    gfs2_logd had work to do, there are now seperate tests to check if there
    are two many buffers in the incore log or if there are two many items on the
    active items list.

    This patch is a port of a patch Steve Whitehouse wrote about a year ago, with
    some minor changes. Since gfs2_ail1_start always submits all the active items,
    it no longer needs to keep track of the first ai submitted, so this has been
    removed. In gfs2_log_reserve(), the order of the calls to
    prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has
    been switched. If it called wake_up first there was a small window for a race,
    where logd could run and return before gfs2_log_reserve was ready to get woken
    up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve()
    would be left waiting for gfs2_logd to eventualy run because it timed out.
    Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times
    out, and flushes the log, can now be set on mount with ar_commit.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

11 Mar, 2010

1 commit

  • GFS2 tracks the number of revokes and unrevokes that are part of committed
    transactions via sd_log_commited_revoke. It is possible for one process to add
    revokes during its transaction, while another process unrevokes them during its
    transaction. If the second process finishes its transaction first,
    sd_log_commited_revoke will be decremented by the number of unrevokes that the
    second process did, without first being incremented by the number of revokes
    the first process did. This is fine, since all started transactions must be
    completed before the journal can be flushed. However, sd_log_commited_revoke
    is an unsigned integer, and log_refund() causes an assertion failure if it
    would go negative at the end of a transaction. This patch makes
    sd_log_commited_revoke a signed integer and allows it to go negative.
    __gfs2_log_flush() still checks that it mataches the actual number of revokes.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

03 Dec, 2009

1 commit

  • There are two spare field in the header common to all GFS2
    metadata. One is just the right size to fit a journal id
    in it, and this patch updates the journal code so that each
    time a metadata block is modified, we tag it with the journal
    id of the node which is performing the modification.

    The reason for this is that it should make it much easier to
    debug issues which arise if we can tell which node was the
    last to modify a particular metadata block.

    Since the field is updated before the block is written into
    the journal, each journal should only contain metadata which
    is tagged with its own journal id. The one exception to this
    is the journal header block, which might have a different node's
    id in it, if that journal was recovered by another node in the
    cluster.

    Thus each journal will contain a record of which nodes recovered
    it, via the journal header.

    The other field in the metadata header could potentially be
    used to hold information about what kind of operation was
    performed, but for the time being we just zero it on each
    transaction so that if we use it for that in future, we'll
    know that the information (where it exists) is reliable.

    I did consider using the other field to hold the journal
    sequence number, however since in GFS2's journaling we write
    the modified data into the journal and not the original
    data, this gives no information as to what action caused the
    modification, so I think we can probably come up with a better
    use for those 64 bits in the future.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

12 Jun, 2009

2 commits


11 May, 2009

1 commit

  • After Jens recent updates:
    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a1f242524c3c1f5d40f1c9c343427e34d1aadd6e
    et al. this is a patch to bring gfs2 uptodate with the core
    code. Also I've managed to squash another call to ll_rw_block()
    along the way.

    There is still one part of the GFS2 I/O paths which are not correctly
    annotated and that is due to the sharing of the writeback code between
    the data and metadata address spaces. I would like to change that too,
    but this patch is still worth doing on its own, I think.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

24 Mar, 2009

1 commit

  • This is the big patch that I've been working on for some time
    now. There are many reasons for wanting to make this change
    such as:
    o Reducing overhead by eliminating duplicated fields between structures
    o Simplifcation of the code (reduces the code size by a fair bit)
    o The locking interface is now the DLM interface itself as proposed
    some time ago.
    o Fewer lookups of glocks when processing replies from the DLM
    o Fewer memory allocations/deallocations for each glock
    o Scope to do further optimisations in the future (but this patch is
    more than big enough for now!)

    Please note that (a) this patch relates to the lock_dlm module and
    not the DLM itself, that is still a separate module; and (b) that
    we retain the ability to build GFS2 as a standalone single node
    filesystem with out requiring the DLM.

    This patch needs a lot of testing, hence my keeping it I restarted
    my -git tree after the last merge window. That way, this has the maximum
    exposure before its merged. This is (modulo a few minor bug fixes) the
    same patch that I've been posting on and off the the last three months
    and its passed a number of different tests so far.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

26 Sep, 2008

1 commit

  • This patch adds barrier support to GFS2. There is not a lot of change
    really... we just add the barrier flag when we write journal header
    blocks. If the underlying device refuses to support them, we fall back
    to the previous way of doing things (wait for the I/O and hope) since
    there is nothing else we can do. There is no user configuration,
    barriers will always be on unless the device refuses to support them.
    This seems a reasonable solution to me since this is a correctness
    issue.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse