07 Mar, 2011

1 commit

  • Convert the xfs log operations to use the new error logging
    interfaces. This removes the xlog_{warn,panic} wrappers and makes
    almost all errors emit the device they belong to instead of just
    referring to "XFS".

    Signed-off-by: Dave Chinner
    Reviewed-by: Alex Elder
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

21 Dec, 2010

6 commits

  • The only thing that the grant lock remains to protect is the grant head
    manipulations when adding or removing space from the log. These calculations
    are already based on atomic variables, so we can update them safely
    without locks. However, the grant head manipulations require atomic
    multi-step calculations to be executed, which the algorithms
    currently don't allow.

    To make these multi-step calculations atomic, convert the algorithms to
    compare-and-exchange loops on the atomic variables. That is, we sample the old
    value, perform the calculation and use atomic64_cmpxchg() to attempt to update
    the head with the new value. If the head has not changed since we sampled it,
    it will succeed and we are done. Otherwise, we rerun the calculation
    from a new sample of the head.

    This allows us to remove the grant lock from around all the grant
    head space manipulations, which removes the grant lock from the log
    completely.
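
    A minimal sketch of the compare-and-exchange loop described above
    (illustrative C only; the helper name and byte-count parameter are
    made up, not the exact kernel code):

        /* retry-loop update of an atomic grant head */
        static void grant_head_add(atomic64_t *head, int bytes)
        {
                int64_t old, new;

                do {
                        old = atomic64_read(head);  /* sample the head */
                        new = old + bytes;          /* the calculation */
                        /* retry from a new sample if the head moved */
                } while (atomic64_cmpxchg(head, old, new) != old);
        }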

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • The log grant ticket wait queues are currently protected by the log
    grant lock. However, the queues are functionally independent from
    each other, and operations on them only require serialisation
    against other queue operations now that all of the other log
    variables they use are atomic values.

    Hence, we can make them independent of the grant lock by introducing
    new locks just to protect the list operations. Because the lists
    are independent, we can use a lock per list and ensure that reserve
    and write head queuing do not contend.

    To ensure forced shutdowns work correctly in conjunction with the
    new fast paths, ensure that we check whether the log has been shut
    down in the grant functions once we hold the relevant spin locks but
    before we go to sleep. This is needed to co-ordinate correctly with
    the wakeups that are issued on the ticket queues so we don't leave
    any processes sleeping on the queues during a shutdown.
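
    A rough sketch of that ordering (the lock, queue and helper names
    follow this series but are illustrative here, not the exact code):

        spin_lock(&log->l_grant_reserve_lock);
        if (XLOG_FORCED_SHUTDOWN(log)) {
                /* check under the queue lock, before we sleep */
                spin_unlock(&log->l_grant_reserve_lock);
                return XFS_ERROR(EIO);
        }
        list_add_tail(&tic->t_queue, &log->l_reserveq);
        /* xlog_wait() drops the lock before scheduling */
        xlog_wait(&tic->t_wait, &log->l_grant_reserve_lock);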

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • log->l_tail_lsn is currently protected by the log grant lock. The
    lock is only needed for serialising readers against writers, so we
    don't really need the lock if we make the l_tail_lsn variable an
    atomic. Converting the l_tail_lsn variable to an atomic64_t means we
    can start to peel back the grant lock from various operations.

    Also, provide functions to safely crack an atomic LSN variable into
    its component pieces and to recombine the components into an
    atomic variable. Use them where appropriate.

    This also removes the need for explicitly holding a spinlock to read
    the l_tail_lsn on 32 bit platforms.
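
    The helpers look roughly like this (a sketch consistent with the
    description; CYCLE_LSN and BLOCK_LSN are the existing LSN accessors):

        static inline void
        xlog_crack_atomic_lsn(atomic64_t *lsn, uint *cycle, uint *block)
        {
                xfs_lsn_t val = atomic64_read(lsn);

                *cycle = CYCLE_LSN(val);
                *block = BLOCK_LSN(val);
        }

        static inline void
        xlog_assign_atomic_lsn(atomic64_t *lsn, uint cycle, uint block)
        {
                atomic64_set(lsn, ((xfs_lsn_t)cycle << 32) | block);
        }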

    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • The log grant queues are one of the few places left using sv_t
    constructs for waiting. Given we are touching this code, we should
    convert them to plain wait queues. While there, convert all the
    other sv_t users in the log code as well.

    Seeing as this removes the last users of the sv_t type, remove the
    header file defining the wrapper and the fragments that still
    reference it.
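
    The replacement is an ordinary wait-queue sleep, roughly like this
    (a sketch of the pattern, assuming the caller holds the spinlock that
    serialises the queue):

        DECLARE_WAITQUEUE(wait, current);

        add_wait_queue_exclusive(&tic->t_wait, &wait);
        __set_current_state(TASK_UNINTERRUPTIBLE);
        spin_unlock(lock);              /* don't sleep holding the lock */
        schedule();
        remove_wait_queue(&tic->t_wait, &wait);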

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • Prepare for switching the grant heads to atomic variables by
    combining the two 32 bit values that make up the grant head into a
    single 64 bit variable. Provide wrapper functions to combine and
    split the grant heads appropriately for calculations and use them as
    necessary.
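
    The wrappers are simple shift-and-mask helpers, along these lines
    (illustrative: cycle in the high 32 bits, space count in the low 32):

        static inline int64_t
        xlog_assign_grant_head_val(int cycle, int space)
        {
                return ((int64_t)cycle << 32) | space;
        }

        static inline void
        xlog_crack_grant_head_val(int64_t head, int *cycle, int *space)
        {
                *cycle = head >> 32;
                *space = head & 0xffffffff;
        }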

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • The grant write and reserve queues use a roll-your-own doubly linked
    list, so convert it to a standard list_head structure and convert
    all the list traversals to use list_for_each_entry(). We can also
    get rid of the XLOG_TIC_IN_Q flag as we can use the list_empty()
    check to tell if the ticket is in a list or not.
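
    Roughly, the conversion looks like this (field and list names are
    illustrative):

        struct xlog_ticket {
                struct list_head        t_queue; /* was prev/next pointers */
                /* ... */
        };

        /* traversals become standard list walks */
        list_for_each_entry(tic, &log->l_reserveq, t_queue)
                wake_up(&tic->t_wait);

        /* list_empty() replaces the XLOG_TIC_IN_Q flag */
        if (!list_empty(&tic->t_queue))
                list_del_init(&tic->t_queue);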

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

03 Dec, 2010

2 commits

  • Convert the log grant heads to atomic64_t types in preparation for
    converting the accounting algorithms to atomic operations. This patch
    just converts the variables; the algorithmic changes are in a
    separate patch for clarity.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • log->l_last_sync_lsn is updated in only one critical spot - log
    buffer IO completion - and is protected there by the grant lock. This
    requires the grant lock to be taken for every log buffer IO
    completion. Converting the l_last_sync_lsn variable to an atomic64_t
    means that we do not need to take the grant lock in log buffer IO
    completion to update it.

    This also removes the need for explicitly holding a spinlock to read
    the l_last_sync_lsn on 32 bit platforms.
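
    With the atomic type the IO completion update becomes a single
    lockless store, something like:

        /* illustrative: no grant lock needed to update the LSN */
        atomic64_set(&log->l_last_sync_lsn,
                     be64_to_cpu(iclog->ic_header.h_lsn));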

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

29 Sep, 2010

1 commit

  • I have been seeing occasional pauses in transaction throughput up to
    30s long under heavy parallel workloads. The only notable thing was
    that the xfsaild was trying to be active during the pauses, but
    making no progress. It was running exactly 20 times a second (on the
    50ms no-progress backoff), and the number of pushbuf events was
    constant across this time as well. IOWs, the xfsaild appeared to be
    stuck on buffers that it could not push out.

    Further investigation indicated that it was trying to push out inode
    buffers that were pinned and/or locked. The xfsbufd was also getting
    woken at the same frequency (by the xfsaild, no doubt) to push out
    delayed write buffers. The xfsbufd was not making any progress
    because all the buffers in the delwri queue were pinned. This scan-
    and-make-no-progress dance went on in the trace for some seconds,
    before the xfssyncd came along and issued a log force, and then
    things started going again.

    However, I noticed something strange about the log force - there
    were way too many IOs issued. 516 log buffers were written, to be
    exact. That added up to 129MB of log IO, which got me very
    interested because it's almost exactly 25% of the size of the log.
    The delayed logging code is supposed to aggregate the minimum of 25%
    of the log or 8MB worth of changes before flushing. That's what
    really puzzled me - why did a log force write 129MB instead of only
    8MB?

    Essentially what has happened is that no CIL pushes had occurred
    since the previous tail push which cleared out 25% of the log space.
    That caused all the new transactions to block because there wasn't
    log space for them, but they kicked the xfsaild to push the tail.
    However, the xfsaild was not making progress because there were
    buffers it could not lock and flush, and the xfsbufd could not flush
    them because they were pinned. As a result, both the xfsaild and the
    xfsbufd could not move the tail of the log forward without the CIL
    first committing.

    The cause of the problem was that the background CIL push, which
    should happen when 8MB of aggregated changes have been committed, is
    being held off by the concurrent transaction commit load. The
    background push does a down_write_trylock() which will fail if there
    is a concurrent transaction commit holding the push lock in read
    mode. With 8 CPUs all doing transactions as fast as they can, there
    were enough concurrent transaction commits to hold off the background
    push until tail-pushing could no longer free log space, and the halt
    would occur.

    It should be noted that there is no reason why it would halt at 25%
    of log space used by a single CIL checkpoint. This bug could
    definitely violate the "no transaction should be larger than half
    the log" requirement and hence result in corruption if the system
    crashed under heavy load. This sort of bug is exactly the reason why
    delayed logging was tagged as experimental....

    The fix is to start blocking background pushes once the threshold
    has been exceeded. Rework the threshold calculations to keep the
    amount of log space a CIL checkpoint can use to below that of the
    AIL push threshold to avoid the problem completely.
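
    The shape of the fix, as a sketch (the hard-limit macro name follows
    the reworked thresholds described above; not the exact code):

        if (cil->xc_ctx->space_used > XLOG_CIL_HARD_SPACE_LIMIT(log)) {
                /* over the blocking threshold: wait for the push lock */
                down_write(&cil->xc_ctx_lock);
        } else if (!down_write_trylock(&cil->xc_ctx_lock)) {
                /* background push still backs off under commit load */
                return;
        }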

    Signed-off-by: Dave Chinner
    Reviewed-by: Alex Elder
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

24 Aug, 2010

1 commit

  • Delayed logging adds some serialisation to the log force process to
    ensure that it does not dereference a bad commit context structure
    when determining if a CIL push is necessary or not. It does this by
    grabbing the CIL context lock exclusively, then dropping it before
    pushing the CIL if necessary. This causes serialisation of all log
    forces and pushes regardless of whether a force is necessary or not.
    As a result fsync heavy workloads (like dbench) can be significantly
    slower with delayed logging than without.

    To avoid this penalty, copy the current sequence from the context to
    the CIL structure when they are swapped. This allows us to do
    unlocked checks on the current sequence without having to worry
    about dereferencing context structures that may have already been
    freed. Hence we can remove the CIL context locking in the forcing
    code and only call into the push code if the current context matches
    the sequence we need to force.

    By passing the sequence into the push code, we can check the
    sequence again once we have the CIL lock held exclusive and abort if
    the sequence has already been pushed. This avoids a lock round-trip
    and unnecessary CIL pushes when we have racing push calls.
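
    In outline (illustrative; field names follow this series):

        /* unlocked check: only push if the current context matches */
        if (cil->xc_current_sequence == sequence)
                xlog_cil_push(log, sequence);

        /* in the push code, re-check once the CIL lock is held */
        if (push_seq < cil->xc_ctx->sequence)
                goto out_skip;  /* that sequence was already pushed */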

    The result is that the regression in dbench performance goes away -
    this change improves dbench performance on a ramdisk from ~2100MB/s
    to ~2500MB/s. This compares favourably to not using delayed logging
    which returns ~2500MB/s for the same workload.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

24 May, 2010

3 commits

  • If we let the CIL grow without bound, it will grow large enough to violate
    recovery constraints (must be at least one complete transaction in the log at
    all times) or take forever to write out through the log buffers. Hence we need
    a check during asynchronous transactions as to whether the CIL needs to be
    pushed.

    We track the amount of log space the CIL consumes, so it is relatively simple
    to limit it on a pure size basis. Make the limit the minimum of just under half
    the log size (recovery constraint) or 8MB of log space (which is an awful lot
    of metadata).
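
    As a sketch, the threshold reduces to a simple macro (constants here
    are approximate; the kernel's exact values differ slightly):

        /* push once the CIL holds min(~half the log, 8MB) of changes */
        #define XLOG_CIL_SPACE_LIMIT(log)       \
                min_t(int, (log)->l_logsize >> 1, 8 * 1024 * 1024)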

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • The delayed logging code only changes in-memory structures and as
    such can be enabled and disabled with a mount option. Add the mount
    option and emit a warning that this is an experimental feature that
    should not be used in production yet.

    We also need infrastructure to track committed items that have not
    yet been written to the log. This is what the Committed Item List
    (CIL) is for.

    The log item also needs to be extended to track the current log
    vector, the associated memory buffer and its location in the Committed
    Item List. Extend the log item and log vector structures to enable
    this tracking.

    To maintain the current log format for transactions with delayed
    logging, we need to introduce a checkpoint transaction and a context
    for tracking each checkpoint from initiation to transaction
    completion. This includes adding a log ticket for tracking the log
    space required/used by the checkpoint context.

    To track all the changes we need an IO vector array per log item,
    rather than a single array for the entire transaction. Using the new
    log vector structure for this requires two passes - the first to
    allocate the log vector structures and chain them together, and the
    second to fill them out. This log vector chain can then be passed
    to the CIL for formatting, pinning and insertion into the CIL.
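
    The chain is built from per-item log vector structures shaped roughly
    like this (a sketch; field names are illustrative):

        struct xfs_log_vec {
                struct xfs_log_vec      *lv_next;    /* chain of vectors */
                int                     lv_niovecs;  /* iovecs in array */
                struct xfs_log_iovec    *lv_iovecp;  /* the iovec array */
                struct xfs_log_item     *lv_item;    /* owning log item */
                char                    *lv_buf;     /* formatted copy */
        };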

    Formatting of the log vector chain is relatively simple - it's just
    a loop over the iovecs on each log vector, but it is made slightly
    more complex because we re-write the iovec after the copy to point
    back at the memory buffer we just copied into.
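
    Using the xfs_log_vec sketch above, the format pass is roughly:

        for (lv = lv_chain; lv; lv = lv->lv_next) {
                for (i = 0; i < lv->lv_niovecs; i++) {
                        int len = lv->lv_iovecp[i].i_len;

                        memcpy(ptr, lv->lv_iovecp[i].i_addr, len);
                        /* re-point the iovec at the copy we just made */
                        lv->lv_iovecp[i].i_addr = ptr;
                        ptr += len;
                }
        }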

    This code also needs to pin log items. If the log item is not
    already tracked in this checkpoint context, then it needs to be
    pinned. Otherwise it is already pinned and we don't need to pin it
    again.

    The only other complexity is calculating the amount of new log space
    the formatting has consumed. This needs to be accounted to the
    transaction in progress, and the accounting is made more complex
    because we also need to steal space from it for log metadata in the
    checkpoint transaction. Calculate all this at insert time and update
    all the tickets, counters, etc correctly.

    Once we've formatted all the log items in the transaction, attach
    the busy extents to the checkpoint context so the busy extents live
    until checkpoint completion and can be processed at that point. The
    transaction itself can then be freed.

    Now we need to issue checkpoints - we are tracking the amount of log space
    used by the items in the CIL, so we can trigger background checkpoints when the
    space usage gets to a certain threshold. Otherwise, checkpoints need to be
    triggered when a log synchronisation point is reached - a log force event.

    Because the log write code already handles chained log vectors, writing the
    transaction is trivial, too. Construct a transaction header, add it
    to the head of the chain and write it into the log, then issue a
    commit record write. Then we can release the checkpoint log ticket
    and attach the context to the log buffer so it can be called during
    IO completion to complete the checkpoint.

    We also need to allow for synchronising multiple in-flight
    checkpoints. This is needed for two things - the first is to ensure
    that checkpoint commit records appear in the log in the correct
    sequence order (so they are replayed in the correct order). The
    second is so that xfs_log_force_lsn() operates correctly and only
    flushes and/or waits for the specific sequence it was provided with.

    To do this we need a wait variable and a list tracking the
    checkpoint commits in progress. We can walk this list and wait for
    the checkpoints to change state or complete easily, and this provides
    the necessary synchronisation for correct operation in both cases.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • The ticket ID is needed to uniquely identify transactions when doing busy
    extent matching. Delayed logging changes the lifecycle of busy extents with
    respect to the transaction structure lifecycle. Hence we can no longer use
    the transaction structure as a means of determining the owner of the busy
    extent as it may be freed and reused while the busy extent is still active.

    This commit provides the infrastructure to access the xlog_tid_t held in the
    ticket from a transaction handle. This avoids the need for callers to peek
    into the transaction and log structures to find this out.
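
    The accessor amounts to a one-line helper, something like this
    (illustrative name and layout):

        xlog_tid_t
        xfs_log_get_trans_ident(struct xfs_trans *tp)
        {
                return tp->t_ticket->t_tid;
        }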

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     

19 May, 2010

3 commits

  • There remains only one user of the l_sectbb_mask field in the log
    structure. Just kill it off and compute the mask where needed from
    the power-of-2 sector size.

    (Only update from last post is to accommodate the changes in the
    previous patch in the series.)
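
    With a power-of-2 sector size the mask is trivially computed on
    demand, e.g.:

        /* illustrative; l_sectBBsize is a power of 2 */
        sectbb_mask = log->l_sectBBsize - 1;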

    Signed-off-by: Alex Elder
    Reviewed-by: Christoph Hellwig

    Alex Elder
     
  • Change struct log so it keeps track of the size (in basic blocks) of
    a log sector in l_sectBBsize rather than the log-base-2 of that
    value (previously, l_sectbb_log). The name was chosen for
    consistency with the other fields in the structure that represent
    a number of basic blocks.

    (Updated so that a variable used in computing and verifying a log's
    sector size is named "log2_size". Also added the "BB" to the
    structure field name, based on feedback from Eric Sandeen. Also
    dropped some superfluous parentheses.)

    Signed-off-by: Alex Elder
    Reviewed-by: Eric Sandeen

    Alex Elder
     
  • Replace the awkward xlog_write_adv_cnt with an inline helper that makes
    it more obvious that it is modifying its parameters, and replace the use
    of an integer type for "ptr" with a real void pointer. Also move
    xlog_write_adv_cnt to xfs_log_priv.h as it will be used outside of
    xfs_log.c in the delayed logging series.
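
    The helper advances the data pointer and both counters together,
    roughly:

        static inline void
        xlog_write_adv_cnt(void **ptr, int *len, int *off, size_t bytes)
        {
                *ptr += bytes;          /* void * arithmetic, as in gcc */
                *len -= bytes;
                *off += bytes;
        }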

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     

15 Dec, 2009

1 commit

  • Convert the old xfs tracing support that could only be used with the
    out of tree kdb and xfsidbg patches to use the generic event tracer.

    To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
    all xfs trace channels by:

    echo 1 > /sys/kernel/debug/tracing/events/xfs/enable

    or alternatively enable single events by just doing the same in one
    event subdirectory, e.g.

    echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable

    or set more complex filters, etc. All this is described in more
    detail in Documentation/trace/events.txt. To read the events, do a

    cat /sys/kernel/debug/tracing/trace

    Compared to the last posting this patch converts the tracing mostly to
    the one tracepoint per callsite model that other users of the new
    tracing facility also employ. This allows a very fine-grained control
    of the tracing, a cleaner output of the traces and also enables the
    perf tool to use each tracepoint as a virtual performance counter,
    allowing us to e.g. count how often certain workloads hit various
    spots in XFS. Take a look at

    http://lwn.net/Articles/346470/

    for some examples.

    Also, the btree tracing isn't included at all yet, as it requires
    additional core tracing features that are not yet in mainline; I plan
    to deliver it later.

    And the really nice thing about this patch is that it actually removes
    many lines of code while adding this nice functionality:

    fs/xfs/Makefile | 8
    fs/xfs/linux-2.6/xfs_acl.c | 1
    fs/xfs/linux-2.6/xfs_aops.c | 52 -
    fs/xfs/linux-2.6/xfs_aops.h | 2
    fs/xfs/linux-2.6/xfs_buf.c | 117 +--
    fs/xfs/linux-2.6/xfs_buf.h | 33
    fs/xfs/linux-2.6/xfs_fs_subr.c | 3
    fs/xfs/linux-2.6/xfs_ioctl.c | 1
    fs/xfs/linux-2.6/xfs_ioctl32.c | 1
    fs/xfs/linux-2.6/xfs_iops.c | 1
    fs/xfs/linux-2.6/xfs_linux.h | 1
    fs/xfs/linux-2.6/xfs_lrw.c | 87 --
    fs/xfs/linux-2.6/xfs_lrw.h | 45 -
    fs/xfs/linux-2.6/xfs_super.c | 104 ---
    fs/xfs/linux-2.6/xfs_super.h | 7
    fs/xfs/linux-2.6/xfs_sync.c | 1
    fs/xfs/linux-2.6/xfs_trace.c | 75 ++
    fs/xfs/linux-2.6/xfs_trace.h | 1369 +++++++++++++++++++++++++++++++++++++++++
    fs/xfs/linux-2.6/xfs_vnode.h | 4
    fs/xfs/quota/xfs_dquot.c | 110 ---
    fs/xfs/quota/xfs_dquot.h | 21
    fs/xfs/quota/xfs_qm.c | 40 -
    fs/xfs/quota/xfs_qm_syscalls.c | 4
    fs/xfs/support/ktrace.c | 323 ---------
    fs/xfs/support/ktrace.h | 85 --
    fs/xfs/xfs.h | 16
    fs/xfs/xfs_ag.h | 14
    fs/xfs/xfs_alloc.c | 230 +-----
    fs/xfs/xfs_alloc.h | 27
    fs/xfs/xfs_alloc_btree.c | 1
    fs/xfs/xfs_attr.c | 107 ---
    fs/xfs/xfs_attr.h | 10
    fs/xfs/xfs_attr_leaf.c | 14
    fs/xfs/xfs_attr_sf.h | 40 -
    fs/xfs/xfs_bmap.c | 507 +++------------
    fs/xfs/xfs_bmap.h | 49 -
    fs/xfs/xfs_bmap_btree.c | 6
    fs/xfs/xfs_btree.c | 5
    fs/xfs/xfs_btree_trace.h | 17
    fs/xfs/xfs_buf_item.c | 87 --
    fs/xfs/xfs_buf_item.h | 20
    fs/xfs/xfs_da_btree.c | 3
    fs/xfs/xfs_da_btree.h | 7
    fs/xfs/xfs_dfrag.c | 2
    fs/xfs/xfs_dir2.c | 8
    fs/xfs/xfs_dir2_block.c | 20
    fs/xfs/xfs_dir2_leaf.c | 21
    fs/xfs/xfs_dir2_node.c | 27
    fs/xfs/xfs_dir2_sf.c | 26
    fs/xfs/xfs_dir2_trace.c | 216 ------
    fs/xfs/xfs_dir2_trace.h | 72 --
    fs/xfs/xfs_filestream.c | 8
    fs/xfs/xfs_fsops.c | 2
    fs/xfs/xfs_iget.c | 111 ---
    fs/xfs/xfs_inode.c | 67 --
    fs/xfs/xfs_inode.h | 76 --
    fs/xfs/xfs_inode_item.c | 5
    fs/xfs/xfs_iomap.c | 85 --
    fs/xfs/xfs_iomap.h | 8
    fs/xfs/xfs_log.c | 181 +----
    fs/xfs/xfs_log_priv.h | 20
    fs/xfs/xfs_log_recover.c | 1
    fs/xfs/xfs_mount.c | 2
    fs/xfs/xfs_quota.h | 8
    fs/xfs/xfs_rename.c | 1
    fs/xfs/xfs_rtalloc.c | 1
    fs/xfs/xfs_rw.c | 3
    fs/xfs/xfs_trans.h | 47 +
    fs/xfs/xfs_trans_buf.c | 62 -
    fs/xfs/xfs_vnodeops.c | 8
    70 files changed, 2151 insertions(+), 2592 deletions(-)

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

16 Mar, 2009

1 commit

  • Most callers of xlog_bread need to call xlog_align to get the actual offset.
    Consolidate that call into the main xlog_bread and provide a _xlog_bread
    for those few that don't want the actual offset.
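
    After the change, callers get the aligned data pointer back directly,
    along these lines (illustrative):

        /* offset points into bp's data once the read completes */
        error = xlog_bread(log, blk_no, bblks, bp, &offset);
        if (error)
                return error;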

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     

01 Dec, 2008

1 commit

  • Move all fields from xlog_iclog_fields_t into xlog_in_core_t instead of having
    them in a substructure and then using #defines to make it look like they were
    directly in xlog_in_core_t. Also document that xlog_in_core_2_t is grossly
    misnamed, and make all references to it typesafe.

    (First sent on September 15th)

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     

17 Nov, 2008

1 commit

  • When an I/O error occurs during an intermediate commit on a rolling
    transaction, xfs_trans_commit() will free the transaction structure
    and the related ticket. However, the duplicate transaction that
    gets used as the transaction continues still contains a pointer
    to the ticket. Hence when the duplicate transaction is cancelled
    and freed, we free the ticket a second time.

    Add reference counting to the ticket so that we hold an extra
    reference to the ticket over the transaction commit. We drop the
    extra reference once we have checked that the transaction commit
    did not return an error, thus avoiding a double free on commit
    error.
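
    The pattern is a plain get/put pair around the commit (helper names
    illustrative):

        struct xlog_ticket *tic = tp->t_ticket;

        xfs_log_ticket_get(tic);  /* extra reference over the commit */
        error = xfs_trans_commit(tp, flags);
        /* the transaction, and its ticket pointer, may be gone now */
        xfs_log_ticket_put(tic);  /* safe: we still hold a reference */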

    Credit to Nick Piggin for tripping over the problem.

    SGI-PV: 989741

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Dave Chinner
     

30 Oct, 2008

1 commit

  • When we need to go from the log to the AIL, we have to go via the
    xfs_mount. Add an xfs_ail pointer to the log so we can go directly to the
    AIL associated with the log.

    SGI-PV: 988143

    SGI-Modid: xfs-linux-melb:xfs-kern:32351a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Christoph Hellwig

    David Chinner
     

17 Sep, 2008

1 commit

  • Memory allocations for log->l_grant_trace and iclog->ic_trace are done on
    demand when the first event is logged. In xlog_state_get_iclog_space() we
    call xlog_trace_iclog() under a spinlock and allocating memory here can
    cause us to sleep with a spinlock held and deadlock the system.

    For the log grant tracing we use KM_NOSLEEP but that means we can lose
    trace entries. Since there is no locking to serialize the log grant
    tracing we could race and have multiple allocations and leak memory.

    So move the allocations to where we initialize the log/iclog structures.
    Use KM_NOFS to avoid recursing into the filesystem and drop log->l_trace
    since it's not even used.

    SGI-PV: 983738

    SGI-Modid: xfs-linux-melb:xfs-kern:31896a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Christoph Hellwig

    Lachlan McIlroy
     

13 Aug, 2008

2 commits

  • Remove all the useless flags and code keyed off it in xfs_mountfs.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31831a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • A lot of code has been converted away from semaphores, but there are still
    comments that reference semaphore behaviour. The log code is the worst
    offender. Update the comments to reflect what the code really does now.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31814a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    David Chinner
     

28 Jul, 2008

1 commit

  • The l_flushsema doesn't exactly have completion semantics, nor mutex
    semantics. It's used as a list of tasks which are waiting to be notified
    that a flush has completed. It was also being used in a way that was
    potentially racy, depending on the semaphore implementation.

    By using a sv_t instead of a semaphore we avoid the need for a separate
    counter, since we know we just need to wake everything on the queue.

    Original waitqueue implementation from Matthew Wilcox. Cleanup and
    conversion to sv_t by Christoph Hellwig.

    SGI-PV: 981507
    SGI-Modid: xfs-linux-melb:xfs-kern:31059a

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Christoph Hellwig
    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    Matthew Wilcox
     

18 Apr, 2008

4 commits

  • To reduce contention on the log on large CPU count machines, separate
    out the different parts of the xlog_t structure onto different
    cachelines. Move each lock
    onto a different cacheline along with all the members that are
    accessed/modified while that lock is held.

    Also, move the debugging code into debug-only code.

    SGI-PV: 978729
    SGI-Modid: xfs-linux-melb:xfs-kern:30772a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • The ticket allocator is just a simple slab implementation internal to the
    log. It requires the icloglock to be held when manipulating it and this
    contributes to contention on that lock.

    Just kill the entire allocator and use a memory zone instead. While there,
    allow us to gracefully fail allocation with ENOMEM.

    SGI-PV: 978729
    SGI-Modid: xfs-linux-melb:xfs-kern:30771a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • Rather than use the icloglock for protecting the iclog completion callback
    chain, use a new per-iclog lock so that walking the callback chain doesn't
    require holding a global lock.

    This reduces contention on the icloglock during transaction commit and log
    I/O completion by reducing the number of times we need to hold the global
    icloglock during these operations.

    SGI-PV: 978729
    SGI-Modid: xfs-linux-melb:xfs-kern:30770a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • Now that we update the log tail LSN less frequently on transaction
    completion, we pass the contention straight to the global log state lock
    (l_iclog_lock) during transaction completion.

    We currently have to take this lock to decrement the iclog reference
    count. There is a reference count on each iclog, so we need to take the
    global lock for all refcount changes.

    When large numbers of processes are all doing small transactions, the
    iclog reference counts will be quite high, and the state change that
    absolutely requires the l_iclog_lock is the exception rather than the
    norm.

    Change the reference counting on the iclogs to use atomic_inc/dec so that
    we can use atomic_dec_and_lock during transaction completion and avoid the
    need for grabbing the l_iclog_lock for every reference count decrement
    except the one that matters - the last.
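
    atomic_dec_and_lock() only takes the lock when the count reaches
    zero, which is exactly the case that needs it (a sketch):

        if (atomic_dec_and_lock(&iclog->ic_refcnt, &log->l_icloglock)) {
                /* last reference: do the state change under the lock */
                iclog->ic_state = XLOG_STATE_SYNCING;   /* illustrative */
                spin_unlock(&log->l_icloglock);
        }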

    SGI-PV: 975671
    SGI-Modid: xfs-linux-melb:xfs-kern:30505a

    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin
    Signed-off-by: Lachlan McIlroy

    David Chinner
     

07 Feb, 2008

5 commits

  • Mostly trivial conversion with one exception: h_num_logops was kept in
    native endian previously and only converted to big endian in xlog_sync,
    but we always keep it big endian now. With today's CPUs' fast byteswap
    instructions that's not an issue, and the new variant keeps the code
    clean and maintainable.

    SGI-PV: 971186
    SGI-Modid: xfs-linux-melb:xfs-kern:29821a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Tim Shimmin

    Christoph Hellwig
     
  • - the various assign lsn macros are replaced by a single inline,
    xlog_assign_lsn, which is equivalent to ASSIGN_ANY_LSN_HOST except
    for a more sane calling convention. ASSIGN_LSN_DISK is replaced
    by xlog_assign_lsn and a manual byteswap, and ASSIGN_LSN by the same,
    except we pass the cycle and block arguments explicitly instead of a
    log parameter. The latter two variants only had two and one users,
    respectively, anyway.
    - GET_CYCLE is replaced by an xlog_get_cycle inline with exactly the
    same calling conventions.
    - GET_CLIENT_ID is replaced by xlog_get_client_id, which drops the
    unused arch argument. Instead of conditional definitions depending on
    host endianness we now do an unconditional swap and shift, which
    generates equal code.
    - the unused XLOG_SET macro is removed.
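
    For example, the new assign helper reduces to a shift-and-or
    (a sketch):

        static inline xfs_lsn_t
        xlog_assign_lsn(uint cycle, uint block)
        {
                return ((xfs_lsn_t)cycle << 32) | block;
        }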

    SGI-PV: 971186
    SGI-Modid: xfs-linux-melb:xfs-kern:29820a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Tim Shimmin

    Christoph Hellwig
     
  • - the various assign lsn macros are replaced by a single inline,
    xlog_assign_lsn, which is equivalent to ASSIGN_ANY_LSN_HOST except
    for a more sane calling convention. ASSIGN_LSN_DISK is replaced
    by xlog_assign_lsn and a manual byteswap, and ASSIGN_LSN by the same,
    except we pass the cycle and block arguments explicitly instead of a
    log parameter. The latter two variants only had two and one users,
    respectively, anyway.
    - GET_CYCLE is replaced by an xlog_get_cycle inline with exactly the
    same calling conventions.
    - GET_CLIENT_ID is replaced by xlog_get_client_id, which drops the
    unused arch argument. Instead of conditional definitions depending on
    host endianness we now do an unconditional swap and shift, which
    generates equal code.
    - the unused XLOG_SET macro is removed.

    SGI-PV: 971186
    SGI-Modid: xfs-linux-melb:xfs-kern:29819a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Tim Shimmin

    Christoph Hellwig
     
  • Un-obfuscate GRANT_LOCK, remove GRANT_LOCK->mutex_lock->spin_lock macros,
    call spin_lock directly, remove extraneous cookie holdover from old xfs
    code, and change lock type to spinlock_t.
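
    Before and after, roughly:

        /* before: obfuscated macros with an unused cookie */
        s = GRANT_LOCK(log);
        /* ... critical section ... */
        GRANT_UNLOCK(log, s);

        /* after: direct calls on a plain spinlock_t */
        spin_lock(&log->l_grant_lock);
        /* ... critical section ... */
        spin_unlock(&log->l_grant_lock);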

    SGI-PV: 970382
    SGI-Modid: xfs-linux-melb:xfs-kern:29741a

    Signed-off-by: Eric Sandeen
    Signed-off-by: Donald Douwsma
    Signed-off-by: Tim Shimmin

    Eric Sandeen
     
  • Un-obfuscate LOG_LOCK, remove LOG_LOCK->mutex_lock->spin_lock macros, call
    spin_lock directly, remove extraneous cookie holdover from old xfs code,
    and change lock type to spinlock_t.

    SGI-PV: 970382
    SGI-Modid: xfs-linux-melb:xfs-kern:29740a

    Signed-off-by: Eric Sandeen
    Signed-off-by: Donald Douwsma
    Signed-off-by: Tim Shimmin

    Eric Sandeen